Window Functions (PostgreSQL)
A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. However, window functions do not cause rows to become grouped into a single output row like non-window aggregate calls would. Instead, the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result. 1
PARTITION BY
and WINDOW
clauses
SELECT category, count(*) OVER (PARTITION BY category) as category_count, count(*) OVER () as total_count FROM posts ORDER BY category;
will output both the count for each category, and the total amount of rows:
category | category_count | total_count ----------+----------------+------------ 1 | 2 | 3 1 | 2 | 3 3 | 1 | 3
you can also define aliases for the window frames.
SELECT category, count(*) OVER (w1) as category_count, count(*) OVER (w2) as total_count FROM posts WINDOW w1 as (PARTITION BY category), w2 as () ORDER BY category;
Frames
- Partitions can be divided into
frames
Frames can either be absolute or relative
SELECT -- (...) FROM table WINDOW w as ( PARTITION BY ... ORDER BY ... MODE BETWEEN f_start AND f_end [EXCLUDE exclusion_mode] )
The default frame is equivalent to:
PARTITION BY ... ORDER BY ... RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
To span an entire partition, set the mode equals to
ROWS
and useUNBOUNDED FOLLOWING
.PARTITION BY ... ORDER BY ... ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
If you want to limit the amount of records inside a frame:
PARTITION BY ... ORDER BY ... ROWS BETWEEN M PRECEDING AND N FOLLOWING
You can also use
GROUPS
instead ofROWS
, to span a group of records:PARTITION BY ... ORDER BY ... GROUPS BETWEEN ... AND ...
EXCLUDE
can used to drop certain records from the frame, by default this is the toNO OTHERS
. This can be set to:PARTITION BY ... ORDER BY ... ROWS BETWEEN M PRECEDING AND N FOLLOWING EXCLUDE CURRENT ROW -- or EXCLUDE CURRENT GROUP -- or EXCLUDE TIES
Introducing some useful functions
ROW_NUMBER
SELECT category, row_number() OVER (PARTITION BY category) as category_count, count(*) OVER () as total_count FROM posts ORDER BY category;
SUM
SELECT SUM(amount) OVER w FROM table WINDOW w as ( PARTITION BY id ORDER BY created_on )
LEAD
& LAG
The
LAG
function returns a value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, it instead returns the default (which must be of the same type as the value). Both the offset and the default are evaluated with respect to the current row. If omitted,OFFSET
defaults to 1 and default to null (Ferrari and Pirozzi 2023, 163 chap.6)
- The
LEAD(field, N)
function is used to retrieve data from the next N rows. The
LAG(field, N)
function is used to retrieve data from the previous N rows.SELECT id, created_at, LEAD(created_at, 1) OVER w as prev_timestamp, LAG(created_at, 1) OVER w as next_timestamp FROM table WINDOW w as ( PARTITION BY id ORDER BY created_at )
FIRST_VALUE()
and LAST_VALUE()
You can reference the beginning/end of a partition.
SELECT id, created_at, created_at - FIRST_VALUE(created_at) OVER w as timestamp_since_first, LAST_VALUE(created_at) - created_at OVER w as timestamp_until_last FROM table WINDOW w as ( PARTITION BY id ORDER BY created_at RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING )
RANK
and DENSE_RANK
The
RANK
function ranks the current row within its partition with gaps. If we don't specify aPARTITION BY
clause, the function doesn't know how to correlate the current tuple, so the function correlates to itself. (…). If we add theORDER BY
clause, the function ranks in the assigned order. (…). If we add thePARTITION BY
clause, the working mechanism is the same; the only difference is that the ranking is calculated within the partition and not on the whole table. (Ferrari and Pirozzi 2023, 161–62 chap.6)
id |
name |
month |
sold_products |
---|---|---|---|
1 | Carl | 1 | 2200 |
2 | Grace | 1 | 2200 |
3 | Alex | 1 | 1200 |
1 | Carl | 2 | 1200 |
2 | Grace | 2 | 500 |
3 | Alex | 2 | 2200 |
1 | Carl | 3 | 3000 |
2 | Grace | 3 | 3000 |
SELECT RANK() OVER(w) AS rank, name, month, sold_products FROM sales WINDOW w AS ( ORDER BY sold products DESC );
rank |
name |
month |
sold_products |
---|---|---|---|
1 | Carl | 1 | 2200 |
1 | Grace | 1 | 2500 |
3 | Alex | 1 | 1200 |
1 | Alex | 2 | 2200 |
2 | Carl | 2 | 1200 |
3 | Grace | 2 | 500 |
1 | Carl | 3 | 3000 |
1 | Grace | 3 | 3000 |
Similar to the RANK function. The difference is that the
DENSE_RANK
function ranks the current row within its partition without gaps.
SELECT DENSE_RANK() OVER(w) AS rank, name, month, sold_products FROM sales WINDOW w AS ( ORDER BY sold products DESC );
rank |
name |
month |
sold_products |
---|---|---|---|
1 | Carl | 1 | 2200 |
1 | Grace | 1 | 2500 |
2 | Alex | 1 | 1200 |
1 | Alex | 2 | 2200 |
2 | Carl | 2 | 1200 |
3 | Grace | 2 | 500 |
1 | Carl | 3 | 3000 |
1 | Grace | 3 | 3000 |
PERCENTILE_CONT
WITH test_rows AS ( -- Generates a series of random numbers between 0 and 100, distributed -- over 10 fake ids. SELECT floor(random() * 10) AS id, floor(random() * 100) AS value FROM generate_series(1, 100) ) SELECT tr.id, PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY tr.value) as median, PERCENTILE_CONT(0.75) WITHIN GROUP(ORDER BY tr.value) as p75, PERCENTILE_CONT(0.95) WITHIN GROUP(ORDER BY tr.value) as p95, PERCENTILE_CONT(0.99) WITHIN GROUP(ORDER BY tr.value) as p99 FROM test_rows tr GROUP BY tr.id ORDER BY tr.id ASC
id | median | p75 | p95 | p99 ----+--------+-------+-------------------+------------------- 0 | 45 | 59.5 | 71.5 | 75.9 1 | 32 | 81 | 83.4 | 84.68 2 | 62 | 72.5 | 88.89999999999998 | 95.38 3 | 54 | 69.25 | 92.15 | 92.83 4 | 65 | 81.5 | 96.1 | 96.82 5 | 46.5 | 82 | 96.6 | 97.72 6 | 28 | 60 | 71.39999999999999 | 74.28 7 | 48 | 59.5 | 86.25 | 88.45 8 | 44 | 53.25 | 73.8 | 75.56 9 | 39.5 | 59.5 | 90.5 | 97.30000000000001
PERCENTILE_DISC
- Discrete version of
PERCENTILE_CONT
, gets the closest value, does not perform interpolation.
WITH test_rows AS ( -- Generates a series of random numbers between 0 and 100, distributed -- over 10 fake ids. SELECT floor(random() * 10) AS id, floor(random() * 100) AS value FROM generate_series(1, 100) ) SELECT tr.id, PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY tr.value) as median, PERCENTILE_DISC(0.75) WITHIN GROUP(ORDER BY tr.value) as p75, PERCENTILE_DISC(0.95) WITHIN GROUP(ORDER BY tr.value) as p95, PERCENTILE_DISC(0.99) WITHIN GROUP(ORDER BY tr.value) as p99 FROM test_rows tr GROUP BY tr.id ORDER BY tr.id ASC
id | median | p75 | p95 | p99 ----+--------+-----+-----+----- 0 | 54 | 83 | 89 | 89 1 | 58 | 67 | 91 | 91 2 | 43 | 82 | 98 | 98 3 | 55 | 85 | 94 | 94 4 | 48 | 89 | 97 | 97 5 | 52 | 78 | 96 | 96 6 | 41 | 74 | 98 | 98 7 | 66 | 76 | 89 | 89 8 | 44 | 87 | 91 | 91 9 | 61 | 77 | 98 | 98
CUME_DIST
The
CUME_DIST
function computes the fraction of partition rows that are less than or equal to the current row and its peers. (Ferrari and Pirozzi 2023, 165 chap.6)
SELECT x, CUME_DIST() OVER w FROM (SELECT generate_series(1,10) AS x) WINDOW w AS (ORDER BY x); -- Outputs: x | cume_dist ----+----------- 1 | 0.1 2 | 0.2 3 | 0.3 4 | 0.4 5 | 0.5 6 | 0.6 7 | 0.7 8 | 0.8 9 | 0.9 10 | 1
PERCENT_RANK
- Calculates percent values in a current partition.
WITH test_rows AS ( -- Generates a series of random numbers between 0 and 100, distributed -- over 5 fake ids. SELECT floor(random() * 10) AS id, floor(random() * 100) AS value FROM generate_series(1, 20) ) SELECT tr.id, tr.value, CUME_DIST() OVER w as c_rank, PERCENT_RANK() OVER w as p_rank FROM test_rows tr WINDOW w AS ( PARTITION BY tr.id ORDER BY tr.value ); #+end_src sql #+begin_src id | value | c_rank | p_rank ----+-------+--------------------+-------------------- 0 | 61 | 0.5 | 0 0 | 92 | 1 | 1 1 | 21 | 0.5 | 0 1 | 31 | 1 | 1 2 | 4 | 0.3333333333333333 | 0 2 | 15 | 0.6666666666666666 | 0.5 2 | 68 | 1 | 1 3 | 29 | 0.5 | 0 3 | 72 | 1 | 1 4 | 7 | 0.25 | 0 4 | 31 | 0.5 | 0.3333333333333333 4 | 35 | 0.75 | 0.6666666666666666 4 | 63 | 1 | 1 6 | 39 | 1 | 0 7 | 26 | 0.5 | 0 7 | 43 | 1 | 1 8 | 18 | 0.25 | 0 8 | 62 | 0.5 | 0.3333333333333333 8 | 66 | 0.75 | 0.6666666666666666 8 | 96 | 1 | 1
NTILE
- Divides an ordered partition into a number of ranked groups, each having close-to-equal size, if feasible.
SELECT id, NTILE(3) OVER w FROM (SELECT generate_series(1,6) AS id) WINDOW w AS (ORDER BY id);
id | ntile ----+------- 1 | 1 2 | 1 3 | 2 4 | 2 5 | 3 6 | 3
Using Advanced Statement Window Functions
Value Functions
References:
Footnotes:
Source: PostgreSQL Docs