Correlated subqueries
- Uses values from the outer query to generate a result
- Re-run for every row generated in the final data set
A simple subquery in SELECT FROM and WHERE
SELECT
-- Select the stage and average goals from s
s.stage,
ROUND(s.avg_goals,2) AS avg_goal,
-- Select the overall average for 2012/2013
(SELECT AVG(home_goal + away_goal) FROM match WHERE season = '2012/2013') AS overall_avg
FROM
-- Select the stage and average goals in 2012/2013 from match
(SELECT
stage,
AVG(home_goal + away_goal) AS avg_goals
FROM match
WHERE season = '2012/2013'
GROUP BY stage) AS s
WHERE
-- Filter the main query using the subquery
s.avg_goals > (SELECT AVG(home_goal + away_goal)
FROM match WHERE season = '2012/2013');
Both the above queries gives identical OUTPUT:
Why Correlated subqueries in the first place?
Correlated subqueries take longer to produce results as they are evaluated in loops and this significantly slows down query runtime, but they often prevent you from having to create multiple subqueries. They are useful for matching data across multiple column (for selecting data broken into multiple categories).
These queries take a while to load.
Basic Correlated Subqueries
- Correlated Subqueries are subqueries that reference one or more columns in the main query.
- They depend on the information in the main query to run and thus cannot be execute on their own.
- They are evaluated in SQL once per row of data retrieved -- a process that takes a lot more computing power and time than a simple subquery.
Exercise:
Examine matches with scores that are extreme outliers for each country -- above 3 times the average score!
SELECT
-- Select country ID, date, home, and away goals from match
main.country_id,
date,
main.home_goal,
main.away_goal
FROM match AS main
WHERE
-- Filter the main query by the subquery
(home_goal + away_goal) >
(SELECT AVG((sub.home_goal + sub.away_goal) * 3)
FROM match AS sub
-- Join the main query to the subquery in WHERE
WHERE main.country_id = sub.country_id);
Exercise:
What was the highest scoring match for each country, in each season?
SELECT
-- Select country ID, date, home, and away goals from match
main.country_id,
date,
main.home_goal,
main.away_goal
FROM match AS main
WHERE
-- Filter for matches with the highest number of goals scored
(home_goal + away_goal) IN
(SELECT MAX(sub.home_goal + sub.away_goal)
FROM match AS sub
WHERE main.country_id = sub.country_id
AND main.season = sub.season);
Thank you!
No comments:
Post a Comment