Monday, July 19, 2021

Correlated subqueries

Correlated subqueries 

  • Uses values from the outer query to generate a result
  • Re-run for every row generated in the final data set
A simple subquery in SELECT FROM and WHERE


SELECT 
    -- Select the stage and average goals from s
    s.stage,
    ROUND(s.avg_goals,2) AS avg_goal,
    -- Select the overall average for 2012/2013
    (SELECT AVG(home_goal + away_goal) FROM match WHERE season = '2012/2013') AS overall_avg
FROM 
    -- Select the stage and average goals in 2012/2013 from match
    (SELECT
         stage,
         AVG(home_goal + away_goal) AS avg_goals
     FROM match
     WHERE season = '2012/2013'
     GROUP BY stage) AS s
WHERE 
    -- Filter the main query using the subquery
    s.avg_goals > (SELECT AVG(home_goal + away_goal) 
                    FROM match WHERE season = '2012/2013');









Both the above queries gives identical OUTPUT:







Why Correlated subqueries in the first place?


Correlated subqueries take longer to produce results as they are evaluated in loops and this significantly slows down query runtime, but they often prevent you from having to create multiple subqueries. They are useful for matching data across multiple column (for selecting data broken into multiple categories).


These queries take a while to load.


Basic Correlated Subqueries 

  •  Correlated Subqueries are subqueries that reference one or more columns in the main query.
  • They depend on the information in the main query to run and thus cannot be execute on their own.
  • They are evaluated in SQL once per row of data retrieved -- a process that takes a lot more computing power and time than a simple subquery.
Exercise:

Examine matches with scores that are extreme outliers for each country -- above 3 times the average score!

SELECT 
    -- Select country ID, date, home, and away goals from match
    main.country_id,
    date,
    main.home_goal, 
    main.away_goal
FROM match AS main
WHERE 
    -- Filter the main query by the subquery
    (home_goal + away_goal) > 
        (SELECT AVG((sub.home_goal + sub.away_goal) * 3)
         FROM match AS sub
         -- Join the main query to the subquery in WHERE
         WHERE main.country_id = sub.country_id);




Exercise:

What was the highest scoring match for each country, in each season?


SELECT 
    -- Select country ID, date, home, and away goals from match
    main.country_id,
    date,
    main.home_goal,
    main.away_goal
FROM match AS main
WHERE 
    -- Filter for matches with the highest number of goals scored
    (home_goal + away_goal) IN 
        (SELECT MAX(sub.home_goal + sub.away_goal)
         FROM match AS sub
         WHERE main.country_id = sub.country_id
               AND main.season = sub.season);



Thank you!



No comments: