What is a subquery?
- Query nested inside another query
- Useful for intermediary transformations
- Can return a variety of information
- Scalar quantities (3.14159, -2, 0.001)
- A list (id = (12, 25, 392, 401, 939))
- A table
- Can be in any part of a query
- WHERE, SELECT, FROM, ON statement in JOINS, GROUP BY
Why subquery?
- Comparing groups to summarized values
- How did Liverpool compare to the English Premier League's average performance for that year?
- Reshaping data
- What is the highest monthly average of goals scored in the Bundesliga?
- Combining data that cannot be joined
- How do you get both the home and away team nams into a table of match results?
Subqueries inside WHERE clause (returns only one column)
Most common: Inside WHERE
These are useful for filtering results based on information you'd have to calculate separately beforehand.
Subqueries are also useful for generating a filtering list.
You can filter data based on single, scalar values using a subquery in ways you cannot by using WHERE statements or joins.
SELECT name, fert_rate
FROM states
WHERE continent = 'Asia'
AND fert_rate <
(SELECT AVG(fert_rate
FROM states);
Question: Which countries had high average life expectancies (at the country level) in 2015.
-- Select fields
SELECT *
-- From populations
FROM populations
-- Where life_expectancy is greater than
WHERE life_expectancy >
-- 1.15 * subquery
1.15 * (SELECT AVG(life_expectancy)
FROM populations
WHERE year = 2015) AND
year = 2015;
Question: Get the urban area population for only capital cities.
-- Select fields
SELECT c1.name, c1.country_code, urbanarea_pop
-- From cities
FROM cities c1
-- Where city name in the field of capital cities
WHERE c1.name IN
-- Subquery
(SELECT capital
FROM countries)
ORDER BY urbanarea_pop DESC;
Question:
Generate a list of matches where the total goals scored (for both teams in total) is more than 3 times the average for games in the matches_2013_2014
table, which includes all games played in the 2013/2014 season.
SELECT
-- Select the date, home goals, and away goals scored
date,
home_goal,
away_goal
FROM matches_2013_2014
-- Filter for matches where total goals exceeds 3x the average
WHERE (home_goal + away_goal) >
(SELECT 3 * AVG(home_goal + away_goal)
FROM matches_2013_2014);
Question:
Generate a list of teams that never played a game in their home city. Using a subquery, generate a list of unique hometeam_ID
values from the unfiltered match
table to exclude in the team
table's team_api_ID
column.
In addition to filtering using a single-value (scalar) subquery, you can create a list of values in a subquery to filter data based on a complex set of conditions. This type of subquery generates a one column reference list for the main query. As long as the values in your list match a column in your main query's table, you don't need to use a join -- even if the list is from a separate table.
SELECT
-- Select the team long and short names
team_long_name,
team_short_name
FROM team
-- Exclude all values from the subquery
WHERE team_api_id NOT IN
(SELECT DISTINCT hometeam_ID FROM match);
Question:
Create a list of teams that scored 8 or more goals in a home match.
SELECT
-- Select the team long and short names
team_long_name,
team_short_name
FROM team
-- Filter for teams with 8 or more home goals
WHERE team_api_id IN
(SELECT hometeam_id
FROM match
WHERE home_goal >= 8);
Question
Generate a subquery using the match
table, and then join that subquery to the country
table to calculate information about matches with 10 or more goals in total!
SELECT
-- Select country name and the count match IDs
c.name AS country_name,
COUNT(sub.id) AS matches
FROM country AS c
-- Inner join the subquery onto country
-- Select the country id and match id columns
INNER JOIN (SELECT country_id, id
FROM match
-- Filter the subquery by matches with 10+ goals
WHERE (home_goal + away_goal) >= 10) AS sub
ON c.id = sub.country_id
GROUP BY country_name;
SELECT
-- Select country, date, home, and away goals from the subquery
country,
date,
home_goal,
away_goal
FROM
-- Select country name, date, home_goal, away_goal, and total goals in the subquery
(SELECT c.name AS country,
m.date,
m.home_goal,
m.away_goal,
(m.home_goal + m.away_goal) AS total_goals
FROM match AS m
LEFT JOIN country AS c
ON m.country_id = c.id) AS subq
-- Filter by total goals scored in the main query
WHERE total_goals >= 10;
Subqueries inside FROM clause (returns a temporary table)
Usage:
- Restructure and transform data into different shape
- A subquery in
FROM
is an effective way of answering detailed questions that requires filtering or transforming data before including it in your final results. - Pre-filtering data
- Calculating aggregates of aggregates
- Which 3 teams has the highest average of home goals scored? (AVG, MAX)
Things to remember:
- You can create multiple subqueries in one FROM statement
- You can join a subquery to a table in FROM
- Include a joining columns in both tables
Subquery as a temporary table in FROM clause. We can have multiple table using a comma in FROM clause.
SELECT DISTINCT monarchs.continent, subquery.max_perc
FROM monarchs,
(SELECT continent, MAX(women_parli_perc) AS max_perc
FROM states
GROUP BY continent) AS subquery
WHERE monarch.continent = subquery.continent
ORDER BY continent;
Question: Determine the number of languages spoken for each country, identified by the country's local name!
-- Select fields
SELECT local_name, subquery.lang_num
-- From countries
FROM countries,
-- Subquery (alias as subquery)
(SELECT code, COUNT(*) AS lang_num
FROM languages
GROUP BY code) AS subquery
-- Where codes match
WHERE countries.code = subquery.code
-- Order by descending number of languages
ORDER BY lang_num DESC;
Advanced subquery
Question: Identify which country had the maximum inflation rate, and how high it was, using multiple subqueries. (Note: Again, there are multiple ways to get to this solution using only joins, but the focus here is on showing you an introduction into advanced subqueries.)
-- Select fields
SELECT name, continent, inflation_rate
-- From countries
FROM countries
-- Join to economies
INNER JOIN economies
-- Match on code
ON countries.code = economies.code
-- Where year is 2015
WHERE year = 2015
-- And inflation rate in subquery (alias as subquery)
AND inflation_rate IN (
SELECT MAX(inflation_rate) AS max_inf
FROM (
SELECT name, continent, inflation_rate
FROM countries
INNER JOIN economies
ON countries.code = economies.code
WHERE year = 2015) AS subquery
-- Group by continent
GROUP BY continent);
Question:
Use a subquery to get 2015 economic data for countries that do not have
gov_form
of 'Constitutional Monarchy'
or 'Republic'
in their gov_form
.
Here, gov_form
stands for the form of the government for each country.
-- Select fields
SELECT code, inflation_rate, unemployment_rate
-- From economies
FROM economies
-- Where year is 2015 and code is not in
WHERE year = 2015 AND code NOT IN
-- Subquery
(SELECT code
FROM countries
WHERE (gov_form = 'Constitutional Monarchy' OR gov_form LIKE '%Republic%'))
-- Order by inflation rate
ORDER BY inflation_rate;
Challenge 1:
Get the country names and other 2015 data in the economies
table and the countries
table for Central American countries with an official language.
-- Select fields
SELECT DISTINCT c.name, total_investment, imports
-- From table (with alias)
FROM countries AS c
-- Join with table (with alias)
LEFT JOIN economies AS e
-- Match on code
ON (c.code = e.code
-- and code in Subquery
AND c.code IN (
SELECT l.code
FROM languages AS l
WHERE official = 'true'
) )
-- Where region and year are correct
WHERE region = 'Central America' AND year = 2015
-- Order by field
ORDER BY c.name;
As above, a subquery can be used inside of ON statement of join.
Challenge 2:
Calculate the average fertility rate for each region in 2015.
-- Select fields
SELECT region, continent, AVG(fertility_rate) AS avg_fert_rate
-- From left table
FROM countries AS c
-- Join to right table
INNER JOIN populations AS p
-- Match on join condition
ON c.code = p.country_code
-- Where specific records matching some condition
WHERE year = 2015
-- Group appropriately
GROUP BY continent, region
-- Order appropriately
ORDER BY avg_fert_rate;
Challenge 3:
Determining the top 10 capital cities in Europe and the Americas in terms of a calculated percentage using city_proper_pop
and metroarea_pop
in cities
.
Do not use table aliasing in this exercise.
-- Select fields
SELECT name, country_code, city_proper_pop, metroarea_pop,
-- Calculate city_perc
city_proper_pop / metroarea_pop * 100 AS city_perc
-- From appropriate table
FROM cities
-- Where
WHERE name IN
-- Subquery
(SELECT capital
FROM countries
WHERE (continent = 'Europe'
OR continent LIKE '%America'))
AND metroarea_pop IS NOT NULL
-- Order appropriately
ORDER BY city_perc DESC
-- Limit amount
LIMIT 10;
Subqueries inside SELECT clause
Second most common: Inside SELECT
Usage:
- Returns s single value
- Include aggregate values to compare to individuals values
- Used in mathematical calculations
Keep things in mind:
- Need to return a single value
- Properly filter both the main and the subquery
Some queries can be written using either a join or a subquery
SELECT DISTINCT continent,
(SELECT COUNT(*)
FROM states
WHERE prime_ministers.continent = states.continent) AS countries_num
FROM prime_ministers;
Examples:
Joins: To get summarized results from multiple table we used GROUP BY and INNER JOIN.
Question: Select the top nine countries in terms of number of cities (popular cities) appearing in the cities
table.
SELECT countries.name AS country, COUNT(*) AS cities_num
FROM cities
INNER JOIN countries
ON countries.code = cities.country_code
GROUP BY country
ORDER BY cities_num DESC, country
LIMIT 9;
/*
SELECT ___ AS ___,
(SELECT ___
FROM ___
WHERE countries.code = cities.country_code) AS cities_num
FROM ___
ORDER BY ___ ___, ___
LIMIT 9;
*/
Subquery: Convert the GROUP BY code to use a subquery inside of SELECT
/*
SELECT countries.name AS country, COUNT(*) AS cities_num
FROM cities
INNER JOIN countries
ON countries.code = cities.country_code
GROUP BY country
ORDER BY cities_num DESC, country
LIMIT 9;
*/
SELECT countries.name AS country,
-- Subquery
(SELECT COUNT(*)
FROM cities
WHERE countries.code = cities.country_code) AS cities_num
FROM countries
ORDER BY cities_num DESC, country
LIMIT 9;