Nothing Special   »   [go: up one dir, main page]

Window Functions

Download as pdf or txt
Download as pdf or txt
You are on page 1of 29

Window Functions

I N T E R M E D I AT E S Q L

Mona Khalil
Data Scientist, Greenhouse Software
Working with aggregate values
Requires you to use GROUP BY with all non-aggregate columns

SELECT
country_id,
season,
date,
AVG(home_goal) AS avg_home
FROM match
GROUP BY country_id;

ERROR: column "match.season" must appear in the GROUP BY


clause or be used in an aggregate function

INTERMEDIATE SQL
Introducing window functions! The OVER() clause allows you to pass an aggregate function
down a data set, similar to subqueries in SELECT. The
OVER() clause offers significant benefits over subqueries in
select -- namely, your queries will run faster, and the OVER()

Perform calculations on an already generated result set (a window)


clause has a wide range of additional functions and clauses
you can include with it that we will cover later on in this
chapter.

Aggregate calculations
Similar to subqueries in SELECT

Running totals, rankings, moving averages

INTERMEDIATE SQL
What's a window function?
How many goals were scored in each match in 2011/2012, and how did
that compare to the average?
SELECT
date,
(home_goal + away_goal) AS goals,
(SELECT AVG(home_goal + away_goal)
FROM match
WHERE season = '2011/2012') AS overall_avg
FROM match
WHERE season = '2011/2012';

| date | goals | overall_avg |


|------------|-------|-------------------|
| 2011-07-29 | 3 | 2.71646 |
| 2011-07-30 | 2 | 2.71646 |
| 2011-07-30 | 4 | 2.71646 |
| 2011-07-30 | 1 | 2.71646 |

INTERMEDIATE SQL
What's a window function?
How many goals were scored in each match in 2011/2012, and how did
that compare to the average?
SELECT
date,
(home_goal + away_goal) AS goals,
AVG(home_goal + away_goal) OVER() AS overall_avg this clause tells sql to 'pass this aggregate value over this existing result set'

FROM match
WHERE season = '2011/2012';

| date | goals | overall_avg |


|------------|-------|-------------------|
| 2011-07-29 | 3 | 2.71646 |
| 2011-07-30 | 2 | 2.71646 |
| 2011-07-30 | 4 | 2.71646 |
| 2011-07-30 | 1 | 2.71646 |

INTERMEDIATE SQL
Generate a RANK
What is the rank of matches based on number of goals scored?

SELECT
date,
(home_goal + away_goal) AS goals
FROM match
WHERE season = '2011/2012';

| date | goals |
|------------|-------|
| 2011-07-29 | 3 |
| 2011-07-30 | 2 |
| 2011-07-30 | 4 |
| 2011-07-30 | 1 |

INTERMEDIATE SQL
Generate a RANK
What is the rank of matches based on number of goals scored?

SELECT
date,
(home_goal + away_goal) AS goals,
RANK() OVER(ORDER BY home_goal + away_goal) AS goals_rank
FROM match
WHERE season = '2011/2012';

| date | goals | goals_rank |


|------------|-------|------------|
| 2012-04-28 | 0 | 1 |
| 2011-12-26 | 0 | 1 |
| 2011-09-10 | 0 | 1 |
| 2011-08-27 | 0 | 1 |

INTERMEDIATE SQL
Generate a RANK
What is the rank of matches based on number of goals scored?

SELECT
date,
(home_goal + away_goal) AS goals,
RANK() OVER(ORDER BY home_goal + away_goal DESC) AS goals_rank
FROM match
WHERE season = '2011/2012';

| date | goals | goals_rank |


|------------|-------|------------|
| 2011-11-06 | 10 | 1 |
| 2011-08-28 | 10 | 1 |
| 2012-05-12 | 9 | 3 |
| 2012-02-12 | 9 | 3 |

INTERMEDIATE SQL
Key Differences
Processed after every part of query except ORDER BY
Uses information in result set rather than database

Available in PostgreSQL, Oracle, MySQL, SQL Server...


...but NOT SQLite

INTERMEDIATE SQL
Let's Practice!
I N T E R M E D I AT E S Q L
Window Partitions
I N T E R M E D I AT E S Q L

Mona Khalil
Data Scientist, Greenhouse Software
OVER and PARTITION BY
Calculate separate values for different categories

Calculate different calculations in the same column

AVG(home_goal) OVER(PARTITION BY season)

INTERMEDIATE SQL
Partition your data
How many goals were scored in each match, and how did that compare
to the overall average?
SELECT
date,
(home_goal + away_goal) AS goals,
AVG(home_goal + away_goal) OVER() AS overall_avg
FROM match;

| date | goals | overall_avg |


|------------|-------|-------------|
| 2011-12-17 | 3 | 2.73210 |
| 2012-05-01 | 2 | 2.73210 |
| 2012-11-27 | 4 | 2.73210 |
| 2013-04-20 | 1 | 2.73210 |
| 2013-11-09 | 5 | 2.73210 |

INTERMEDIATE SQL
Partition your data
How many goals were scored in each match, and how did that compare
to the season's average?
SELECT
date,
(home_goal + away_goal) AS goals,
AVG(home_goal + away_goal) OVER(PARTITION BY season) AS season_avg
FROM match;

| date | goals | season_avg |


|------------|-------|-------------|
| 2011-12-17 | 3 | 2.71646 |
| 2012-05-01 | 2 | 2.71646 |
| 2012-11-27 | 4 | 2.77270 |
| 2013-04-20 | 1 | 2.77270 |
| 2013-11-09 | 5 | 2.76682 |

INTERMEDIATE SQL
PARTITION by Multiple Columns
SELECT
c.name,
m.season,
(home_goal + away_goal) AS goals,
AVG(home_goal + away_goal)
OVER(PARTITION BY m.season, c.name) AS season_ctry_avg
FROM country AS c
LEFT JOIN match AS m
ON c.id = m.country_id

| name | season | goals | season_ctry_avg |


|-------------|-----------|-----------|-----------------|
| Belgium | 2011/2012 | 1 | 2.88 |
| Netherlands | 2014/2015 | 1 | 3.08 |
| Belgium | 2011/2012 | 1 | 2.88 |
| Spain | 2014/2015 | 2 | 2.66 |

INTERMEDIATE SQL
PARTITION BY considerations
Can partition data by 1 or more columns

Can partition aggregate calculations, ranks, etc

INTERMEDIATE SQL
Let's Practice!
I N T E R M E D I AT E S Q L
Sliding Windows
I N T E R M E D I AT E S Q L

Mona Khalil
Data Scientist, Greenhouse Software
Sliding Windows Sliding windows allow you to create running calculations between any two points in a window

Perform calculations relative to the current row

Can be used to calculate running totals, sums, averages, etc

Can be partitioned by one or more columns

INTERMEDIATE SQL
Sliding Window Keywords
ROWS BETWEEN <start> AND <finish>

PRECEDING
used to specify the num of rows before or after the current row that you want to include in a calculation

FOLLOWING
UNBOUNDED PRECEDING
tell sql that you want to include every row since the beginning, or the end, of the dataset in your calculations
UNBOUNDED FOLLOWING
CURRENT ROW tells sql that you want to stop your calculation at the curretn row

INTERMEDIATE SQL
Sliding Window Example
-- Manchester City Home Games
SELECT
date,
home_goal,
away_goal,
SUM(home_goal)
OVER(ORDER BY date ROWS BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM match
WHERE hometeam_id = 8456 AND season = '2011/2012';

| date | home_goal | away_goal | running_total |


|------------|-----------|-----------|---------------|
| 2011-08-15 | 4 | 0 | 4 |
| 2011-09-10 | 3 | 0 | 7 |
| 2011-09-24 | 2 | 0 | 9 |
| 2011-10-15 | 4 | 1 | 13 |

INTERMEDIATE SQL
Sliding Window Frame
-- Manchester City Home Games
SELECT date,
home_goal,
away_goal,
SUM(home_goal)
OVER(ORDER BY date
ROWS BETWEEN 1 PRECEDING
AND CURRENT ROW) AS last2
FROM match
WHERE hometeam_id = 8456
AND season = '2011/2012';

INTERMEDIATE SQL
Let's Practice!
I N T E R M E D I AT E S Q L
Bringing it all
Together
I N T E R M E D I AT E S Q L

Mona Khalil
Data Scientist, Greenhouse Software
What you've learned so far
CASE statements

Simple subqueries

Nested and correlated subqueries

Common table expressions

Window functions

INTERMEDIATE SQL
Let's do a case study!
Who defeated Manchester United in the 2013/2014 season?

INTERMEDIATE SQL
Steps to construct the query
Get team names with CTEs

Get match outcome with CASE statements

Determine how badly they lost with a window function

INTERMEDIATE SQL
Getting the database for yourself
Now that you have a query identifying the home team in a match, you will perform a similar set of

Full European Soccer Database steps to identify the away team. Just like the previous step, you will join the match and team ta-
bles. Each of these two queries will be declared as a Common Table Expression in the following
step.
For this exercise, you will be using all of these concepts to generate a
list of matches in which Manchester United was defeated during the The primary difference in this query is that you will be joining the tables on awayteam_id, and
2014/2015 English Premier League season. reversing the match outcomes in the CASE statement.

Your first task is to create the first query that filters for matches where When altering CASE statement logic in your own work, you can reverse either the logical
Manchester United played as the home team. This will become a condition (i.e., home_goal > away_goal) or the outcome in THEN -- just make sure you only rever-
common table expression in a later exercise. se one of the two!

SELECT
SELECT
m.id,
m.id,
t.team_long_name,
t.team_long_name,
-- Identify matches as home/away wins or ties
-- Identify matches as home/away wins or ties
case when m.home_goal < m.away_goal then 'MU Win'
case when m.home_goal > m.away_goal then 'MU Win'
when m.home_goal > m.away_goal then 'MU Loss'
when m.home_goal < m.away_goal then 'MU Loss'
else 'Tie' end AS outcome
else 'Tie' end AS outcome
FROM match AS m
FROM match AS m
-- Left join team on the home team ID and team API id
-- Left join team on the home team ID and team API id
LEFT JOIN team AS t
LEFT JOIN team AS t
ON m.awayteam_id = t.team_api_id
ON m.hometeam_id = t.team_api_id
WHERE
WHERE
-- Filter for 2014/2015 and Manchester United as the home team
-- Filter for 2014/2015 and Manchester United as the home team
season = '2014/2015'
season = '2014/2015'
AND t.team_long_name = 'Manchester United';
AND t.team_long_name = 'Manchester United';

INTERMEDIATE SQL
it's time to rearrange your query with the home and away subqueries as Common Table You now have a result set that retrieves the match date, home team, away team, and the goals
Expressions (CTEs). You'll notice that the main query includes the phrase, SELECT scored by each team. You have one final component of the question left -- how badly did
DISTINCT. Without identifying only DISTINCT matches, you will return a duplicate record for Manchester United lose in each match?
each game played.
In order to determine this, let's add a window function to the main query that ranks matches by the
Continue building the query to extract all matches played by Manchester United in the 2014/ absolute value of the difference between home_goal and away_goal. This allows us to directly
2015 season. compare the difference in scores without having to consider whether Manchester United played as
the home or away team!

-- Set up the home team CTE -- Set up the home team CTE
with home as ( WITH home AS (
SELECT m.id, t.team_long_name, SELECT m.id, t.team_long_name,
CASE WHEN m.home_goal > m.away_goal THEN 'MU Win' CASE WHEN m.home_goal > m.away_goal THEN 'MU Win'
WHEN m.home_goal < m.away_goal THEN 'MU Loss' WHEN m.home_goal < m.away_goal THEN 'MU Loss'
ELSE 'Tie' END AS outcome ELSE 'Tie' END AS outcome
FROM match AS m FROM match AS m
LEFT JOIN team AS t ON m.hometeam_id = t.team_api_id), LEFT JOIN team AS t ON m.hometeam_id = t.team_api_id),

Let's Practice!
-- Set up the away team CTE
-- Set up the away team CTE away AS (
away as ( SELECT m.id, t.team_long_name,
SELECT m.id, t.team_long_name, CASE WHEN m.home_goal > m.away_goal THEN 'MU Loss'
CASE WHEN m.home_goal > m.away_goal THEN 'MU Win' WHEN m.home_goal < m.away_goal THEN 'MU Win'
WHEN m.home_goal < m.away_goal THEN 'MU Loss' ELSE 'Tie' END AS outcome
ELSE 'Tie' END AS outcome FROM match AS m
I N T E R M E D I AT E S Q L
FROM match AS m LEFT JOIN team AS t ON m.awayteam_id = t.team_api_id)
LEFT JOIN team AS t ON m.awayteam_id = t.team_api_id) -- Select columns and and rank the matches by date
-- Select team names, the date and goals SELECT DISTINCT
SELECT DISTINCT m.date,
m.date, home.team_long_name AS home_team,
home.team_long_name AS home_team, away.team_long_name AS away_team,
away.team_long_name AS away_team, m.home_goal, m.away_goal,
m.home_goal, RANK() OVER(ORDER BY ABS(home_goal - away_goal) DESC) as match_rank
m.away_goal -- Join the CTEs onto the match table
-- Join the CTEs onto the match table FROM match AS m
FROM match AS m LEFT JOIN home ON m.id = home.id
left JOIN home ON m.id = home.id LEFT JOIN AWAY ON m.id = away.id
left JOIN away ON m.id = away.id WHERE m.season = '2014/2015'
WHERE m.season = '2014/2015' AND ((home.team_long_name = 'Manchester United' AND home.outcome = 'MU Loss')
AND (home.team_long_name = 'Manchester United' OR (away.team_long_name = 'Manchester United' AND away.outcome = 'MU Loss'));
OR away.team_long_name = 'Manchester United');

You might also like