/*
I'm going to explain the moving parts one by one and we will bring it all
together at the end.
First thing we need to do is gather all of the data we want to use as
columns in our pivoted output, plus any data we want to break by on
those columns.
Lets just do the first part to start with and then we will come back and show
how to break by other data.
So lets start by just trying to sum all of the amounts by month and have a
column for each month. This data is for the year 2022, and I want to make sure
we dont miss any empty months in our output where we might have had no revenue,
so we will use generate_series to get a data point for each month.
*/
SELECT
generate_series::DATE dte
FROM GENERATE_SERIES('2022-01-01'::DATE, '2022-12-31', '1 month')
dte
2022-01-01
2022-02-01
2022-03-01
2022-04-01
2022-05-01
2022-06-01
2022-07-01
2022-08-01
2022-09-01
2022-10-01
2022-11-01
2022-12-01
…
hidden batch(es)
/*
Now, lets take our columns and get the data for each - this will give us a
normal relational dataset where our pivoted "columns" will be their own
column and the data for each will be another column.
We select from our columns data set and LEFT OUTER JOIN so that if there was
no data for that column we would still get it in our output.
Notice that March does not have any amount, but we still have the row.
*/
WITH columns AS (
SELECT
generate_series::DATE dte
FROM GENERATE_SERIES('2022-01-01'::DATE, '2022-12-31', '1 month')
)
SELECT
columns.dte
, SUM(COALESCE(invoice.amount, 0)) total
FROM columns
LEFT OUTER
JOIN invoice
ON DATE_PART('year', invoice_date) = DATE_PART('year', columns.dte)
AND DATE_PART('month', invoice_date) = DATE_PART('month', columns.dte)
GROUP BY
columns.dte
ORDER BY
columns.dte
dte
total
2022-01-01
22030
2022-02-01
22753
2022-03-01
0
2022-04-01
9456
2022-05-01
7798
2022-06-01
38278
2022-07-01
18736
2022-08-01
6794
2022-09-01
21033
2022-10-01
28576
2022-11-01
10172
2022-12-01
41901
…
hidden batch(es)
/*
Now comes the secret sauce, the JSONB_OBJECT_AGG() that makes our pivot work.
We aggregate our data together, passing the "column" we want to use as the
first argument and the data we want to aggregate using that column as
the second argument.
Basically the first argument is your pivoted "column" column from the data set
(or a formatted version of it, like in our example) and the second arg is
the "cell" of data you want for that "column".
You get a json object with a key for each "column" and its value being its data.
(Remember to scroll down for a more complicated example where we are
aggregating this data by customer)
*/
WITH columns AS (
SELECT
generate_series::DATE dte
FROM GENERATE_SERIES('2022-01-01'::DATE, '2022-12-31', '1 month')
)
, data AS (
SELECT
columns.dte
, SUM(COALESCE(invoice.amount, 0)) total
FROM columns
LEFT OUTER
JOIN invoice
ON DATE_PART('year', invoice_date) = DATE_PART('year', columns.dte)
AND DATE_PART('month', invoice_date) = DATE_PART('month', columns.dte)
GROUP BY
columns.dte
)
SELECT
JSONB_OBJECT_AGG(TO_CHAR(dte, 'YYYY-MM'), total) pivotData
FROM
data;
/*
If your application can use the json object as it is you can stop
at the last query, but if you need to get actual columns out and
you dont mind creating a dynamic sql query, you can pull the data
out into columns like this example.
*But* this requires you to know the columns you want before hand.
*/
WITH columns AS (
SELECT
generate_series::DATE dte
FROM GENERATE_SERIES('2022-01-01'::DATE, '2022-12-31', '1 month')
)
, data AS (
SELECT
columns.dte
, SUM(COALESCE(invoice.amount, 0)) total
FROM columns
LEFT OUTER
JOIN invoice
ON DATE_PART('year', invoice_date) = DATE_PART('year', columns.dte)
AND DATE_PART('month', invoice_date) = DATE_PART('month', columns.dte)
GROUP BY
columns.dte
), result AS (
SELECT
JSONB_OBJECT_AGG(TO_CHAR(dte, 'YYYY-MM'), total) pivotData
FROM
data
)
SELECT
(pivotData->>'2022-01') "jan"
, (pivotData->>'2022-02') "feb"
, (pivotData->>'2022-03') "mar"
, (pivotData->>'2022-04') "apr"
, (pivotData->>'2022-05') "may"
, (pivotData->>'2022-06') "jun"
, (pivotData->>'2022-07') "jul"
, (pivotData->>'2022-08') "aug"
, (pivotData->>'2022-09') "sep"
, (pivotData->>'2022-10') "oct"
, (pivotData->>'2022-11') "nov"
, (pivotData->>'2022-12') "dec"
FROM
result
;
jan
feb
mar
apr
may
jun
jul
aug
sep
oct
nov
dec
22030
22753
0
9456
7798
38278
18736
6794
21033
28576
10172
41901
…
hidden batch(es)
/*
Now for a more complicated example, lets say you want to do the same thing,
but you want to get totals for each customer. You just need to pull
that grouped data along for the ride.
Your "columns" query needs to create a column for each month, plus the data
you're grouping by. We use a cross join here with a list of all of our
customers, again so that if we are missing data for one month for a customer
we will still have that column in our output.
*/
SELECT
generate_series dte
, customer_id
FROM generate_series('2022-01-01'::date, '2022-12-31', '1 month')
CROSS JOIN
(
SELECT DISTINCT
customer_id
FROM
invoice
) AS customers
dte
customer_id
2022-01-01 00:00:00+00
2
2022-01-01 00:00:00+00
1
2022-02-01 00:00:00+00
2
2022-02-01 00:00:00+00
1
2022-03-01 00:00:00+00
2
2022-03-01 00:00:00+00
1
2022-04-01 00:00:00+01
2
2022-04-01 00:00:00+01
1
2022-05-01 00:00:00+01
2
2022-05-01 00:00:00+01
1
2022-06-01 00:00:00+01
2
2022-06-01 00:00:00+01
1
2022-07-01 00:00:00+01
2
2022-07-01 00:00:00+01
1
2022-08-01 00:00:00+01
2
2022-08-01 00:00:00+01
1
2022-09-01 00:00:00+01
2
2022-09-01 00:00:00+01
1
2022-10-01 00:00:00+01
2
2022-10-01 00:00:00+01
1
2022-11-01 00:00:00+00
2
2022-11-01 00:00:00+00
1
2022-12-01 00:00:00+00
2
2022-12-01 00:00:00+00
1
…
hidden batch(es)
/*
Now we need to group our "data" query by the grouping column and
sum for each "column".
You can see some examples this time of months with no money.
*/
WITH columns AS (
SELECT
generate_series dte
, customer_id
FROM generate_series('2022-01-01'::date, '2022-12-31', '1 month')
CROSS JOIN
(
SELECT DISTINCT
customer_id
FROM
invoice
) AS customers
)
SELECT
columns.dte
, columns.customer_id
, SUM(COALESCE(invoice.amount, 0)) total
FROM columns
LEFT OUTER
JOIN invoice
ON DATE_PART('year', invoice_date) = DATE_PART('year', columns.dte)
AND DATE_PART('month', invoice_date) = DATE_PART('month', columns.dte)
AND columns.customer_id = invoice.customer_id
GROUP BY
columns.dte
, columns.customer_id;
dte
customer_id
total
2022-12-01 00:00:00+00
1
14447
2022-04-01 00:00:00+01
2
4057
2022-09-01 00:00:00+01
1
14
2022-03-01 00:00:00+00
1
0
2022-03-01 00:00:00+00
2
0
2022-05-01 00:00:00+01
1
0
2022-06-01 00:00:00+01
2
23457
2022-01-01 00:00:00+00
2
22030
2022-12-01 00:00:00+00
2
27454
2022-10-01 00:00:00+01
2
13110
2022-08-01 00:00:00+01
2
6794
2022-08-01 00:00:00+01
1
0
2022-02-01 00:00:00+00
1
10170
2022-02-01 00:00:00+00
2
12583
2022-10-01 00:00:00+01
1
15466
2022-11-01 00:00:00+00
1
3675
2022-05-01 00:00:00+01
2
7798
2022-01-01 00:00:00+00
1
0
2022-06-01 00:00:00+01
1
14821
2022-04-01 00:00:00+01
1
5399
2022-07-01 00:00:00+01
2
10809
2022-09-01 00:00:00+01
2
21019
2022-07-01 00:00:00+01
1
7927
2022-11-01 00:00:00+00
2
6497
…
hidden batch(es)
/*
Now we do our final aggregation, passing our "column" heading as the first arg,
the "cell" as the second arg, and the grouping by our customer_id.
Again if your application can handle the json data then you may be done, but if
you want to turn it back into columns see the next example.
*/
WITH columns AS (
SELECT
generate_series dte
, customer_id
FROM generate_series('2022-01-01'::date, '2022-12-31', '1 month')
CROSS JOIN
(
SELECT DISTINCT
customer_id
FROM
invoice
) AS customers
), data AS (
SELECT
columns.dte
, columns.customer_id
, SUM(COALESCE(invoice.amount, 0)) total
FROM columns
LEFT OUTER
JOIN invoice
ON DATE_PART('year', invoice_date) = DATE_PART('year', columns.dte)
AND DATE_PART('month', invoice_date) = DATE_PART('month', columns.dte)
AND columns.customer_id = invoice.customer_id
GROUP BY
columns.dte
, columns.customer_id
)
SELECT
customer_id
, JSONB_OBJECT_AGG(TO_CHAR(dte, 'YYYY-MM'), total) pivotData
FROM
data
GROUP BY
customer_id
;
/*
And here is our final output back as columns.
*/
WITH columns AS (
SELECT
generate_series dte
, customer_id
FROM generate_series('2022-01-01'::date, '2022-12-31', '1 month')
CROSS JOIN
(
SELECT DISTINCT
customer_id
FROM
invoice
) AS customers
), data AS (
SELECT
columns.dte
, columns.customer_id
, SUM(COALESCE(invoice.amount, 0)) total
FROM columns
LEFT OUTER
JOIN invoice
ON DATE_PART('year', invoice_date) = DATE_PART('year', columns.dte)
AND DATE_PART('month', invoice_date) = DATE_PART('month', columns.dte)
AND columns.customer_id = invoice.customer_id
GROUP BY
columns.dte
, columns.customer_id
), result AS (
SELECT
customer_id
, JSONB_OBJECT_AGG(TO_CHAR(dte, 'YYYY-MM'), total) pivotData
FROM
data
GROUP BY
customer_id
)
SELECT
customer_id
, (pivotData->>'2022-01') "jan"
, (pivotData->>'2022-02') "feb"
, (pivotData->>'2022-03') "mar"
, (pivotData->>'2022-04') "apr"
, (pivotData->>'2022-05') "may"
, (pivotData->>'2022-06') "jun"
, (pivotData->>'2022-07') "jul"
, (pivotData->>'2022-08') "aug"
, (pivotData->>'2022-09') "sep"
, (pivotData->>'2022-10') "oct"
, (pivotData->>'2022-11') "nov"
, (pivotData->>'2022-12') "dec"
FROM
result
ORDER BY customer_id