I have to display a table like this:
Year
Month
Delivered
Not delivered
Not Received
2021
Jan
10
86
75
2021
Feb
13
36
96
2021
March
49
7
61
2021
Apr
3
21
72
Using raw data generated by this query:
SELECT
year,
TO_CHAR( creation_date, 'Month') AS month,
marking,
COUNT(*) AS count
FROM invoices
GROUP BY 1,2,3
I have tried using crosstab() but I got error:
SELECT * FROM crosstab('
SELECT
year,
TO_CHAR( creation_date, ''Month'') AS month,
marking,
COUNT(*) AS count
FROM invoices
GROUP BY 1,2,3
') AS ct(year text, month text, marking text)
I would prefer to not manually type all marking values because they are a lot.
ERROR: invalid source data SQL statement
DETAIL: The provided SQL must return 3 columns: rowid, category, and values.
1. Static solution with a limited list of marking values :
SELECT year
, TO_CHAR( creation_date, 'Month') AS month
, COUNT(*) FILTER (WHERE marking = 'Delivered') AS Delivered
, COUNT(*) FILTER (WHERE marking = 'Not delivered') AS "Not delivered"
, COUNT(*) FILTER (WHERE marking = 'Not Received') AS "Not Received"
FROM invoices
GROUP BY 1,2
2. Full dynamic solution with a large list of marking values :
This proposal is an alternative solution to the crosstab solution as proposed in A and B.
The proposed solution here just requires a dedicated composite type which can be dynamically created and then it relies on the jsonb type and standard functions :
Starting from your query which counts the number of rows per year, month and marking value :
Using the jsonb_object_agg function, the resulting rows are first
aggregated by year and month into jsonb objects whose jsonb keys
correspond to the marking values and whose jsonb values
correspond to the counts.
the resulting jsonb objects are then converted into records using the jsonb_populate_record function and the dedicated composite type.
First we dynamically create a composite type which corresponds to the ordered list of marking values :
CREATE OR REPLACE PROCEDURE create_composite_type() LANGUAGE plpgsql AS $$
DECLARE
column_list text ;
BEGIN
SELECT string_agg(DISTINCT quote_ident(marking) || ' bigint', ',' ORDER BY quote_ident(marking) || ' bigint' ASC)
INTO column_list
FROM invoices ;
EXECUTE 'DROP TYPE IF EXISTS composite_type' ;
EXECUTE 'CREATE TYPE composite_type AS (' || column_list || ')' ;
END ;
$$ ;
CALL create_composite_type() ;
Then the expected result is provided by the following query :
SELECT a.year
, TO_CHAR(a.year_month, 'Month') AS month
, (jsonb_populate_record( null :: composite_type
, jsonb_object_agg(a.marking, a.count)
)
).*
FROM
( SELECT year
, date_trunc('month', creation_date) AS year_month
, marking
, count(*) AS count
FROM invoices AS v
GROUP BY 1,2,3
) AS a
GROUP BY 1,2
ORDER BY month
Obviously, if the list of marking values may vary in time, then you have to recall the create_composite_type() procedure just before executing the query. If you don't update the composite_type, the query will still work (no error !) but some old marking values may be obsolete (not used anymore), and some new marking values may be missing in the query result (not displayed as columns).
See the full demo in dbfiddle.
You need to generate the crosstab() call dynamically.
But since SQL does not allow dynamic return types, you need a two-step workflow:
Generate query
Execute query
If you are unfamiliar with crosstab(), read this first:
PostgreSQL Crosstab Query
It's odd to generate the month from creation_date, but not the year. To simplify, I use a combined column year_month instead.
Query to generate the crosstab() query:
SELECT format(
$f$SELECT * FROM crosstab(
$q$
SELECT to_char(date_trunc('month', creation_date), 'YYYY_Month') AS year_month
, marking
, COUNT(*) AS ct
FROM invoices
GROUP BY date_trunc('month', creation_date), marking
ORDER BY date_trunc('month', creation_date) -- optional
$q$
, $c$VALUES (%s)$c$
) AS ct(year_month text, %s);
$f$, string_agg(quote_literal(sub.marking), '), (')
, string_agg(quote_ident (sub.marking), ' int, ') || ' int'
)
FROM (SELECT DISTINCT marking FROM invoices ORDER BY 1) sub;
If the table invoices is big with only few distinct values for marking (which seems likely) there are faster ways to get distinct values. See:
Optimize GROUP BY query to retrieve latest row per user
Generates a query of the form:
SELECT * FROM crosstab(
$q$
SELECT to_char(date_trunc('month', creation_date), 'YYYY_Month') AS year_month
, marking
, COUNT(*) AS ct
FROM invoices
GROUP BY date_trunc('month', creation_date), marking
ORDER BY date_trunc('month', creation_date) -- optional
$q$
, $c$VALUES ('Delivered'), ('Not Delivered'), ('Not Received')$c$
) AS ct(year_month text, "Delivered" int, "Not Delivered" int, "Not Received" int);
The simplified query does not need "extra columns. See:
Pivot on Multiple Columns using Tablefunc
Note the use date_trunc('month', creation_date) in GROUP BY and ORDER BY. This produces a valid sort order, and faster, too. See:
Cumulative sum of values by month, filling in for missing months
How to get rows by max(date) group by Year-Month in Postgres?
Also note the use of dollar-quotes to avoid quoting hell. See:
Insert text with single quotes in PostgreSQL
Months without entries don't show up in the result, and no markings for an existing month show as NULL. You can adapt either if need be. See:
Join a count query on generate_series() and retrieve Null values as '0'
Then execute the generated query.
db<>fiddle here (reusing
Edouard's fiddle, kudos!)
See:
Execute a dynamic crosstab query
In psql
In psql you can use \qexec to immediately execute the generated query. See:
Simulate CREATE DATABASE IF NOT EXISTS for PostgreSQL?
In Postgres 9.6 or later, you can also use the meta-command \crosstabview instead of crosstab():
test=> SELECT to_char(date_trunc('month', creation_date), 'YYYY_Month') AS year_month
test-> , marking
test-> , COUNT(*) AS count
test-> FROM invoices
test-> GROUP BY date_trunc('month', creation_date), 2
test-> ORDER BY date_trunc('month', creation_date)\crosstabview
year_month | Not Received | Delivered | Not Delivered
----------------+--------------+-----------+---------------
2020_January | 1 | 1 | 1
2020_March | | 2 | 2
2021_January | 1 | 1 | 2
2021_February | 1 | |
2021_March | | 1 |
2021_August | 2 | 1 | 1
2022_August | | 2 |
2022_November | 1 | 2 | 3
2022_December | 2 | |
(9 rows)
Note that \crosstabview - unlike crosstab() - does not support "extra" columns. If you insist on separate year and month columns, you need crosstab().
See:
How do I generate a pivoted CROSS JOIN where the resulting table definition is unknown?
Hello I have created a view, but I want to pivot it with dynamic years.
Output before pivoting:
Expected output:
My query :
SELECT *
FROM crosstab(
' select b.jenisiuran,
date_part(''year''::text, a.insertdate) AS tahun,
sum(b.jumlah_amt) AS jumlah
FROM blm_dpembayaraniuran a
JOIN blm_dpembayaraniuranline b ON a.blm_dpembayaraniuran_key::text = b.blm_dpembayaraniuran_key::text
GROUP BY date_part(''year''::text, a.insertdate), b.jenisiuran'
) AS (TRANSAKSI TEXT, "2019" NUMERIC, "2020" NUMERIC, "2021" numeric);
and I'm getting error like this :
ERROR: invalid return type
Detail: SQL rowid datatype does not match return rowid datatype.
Thanks for helping me
I find using filtered aggregation easier to work with than crosstab()
select b.jenisiuran as transaksi,
sum(b.jumlah_amt) filter (where extract(year from a.insertdate) = 2019) as "2019",
sum(b.jumlah_amt) filter (where extract(year from a.insertdate) = 2020) as "2020",
sum(b.jumlah_amt) filter (where extract(year from a.insertdate) = 2021) as "2021",
sum(b.jumlah_amt) as total
FROM blm_dpembayaraniuran a
JOIN blm_dpembayaraniuranline b ON a.blm_dpembayaraniuran_key::text = b.blm_dpembayaraniuran_key::text
WHERE a.insertdate >= date '2019-01-01'
AND a.insertdate < date '2022-01-01'
GROUP b.jenisiuran;
Adding a range condition on inserdate should improve performance as the grouping only needs to be done for the rows in the desired range, not on all rows in the both tables.
I'm having a hard time filtering this view by CreateDate. The CreateDate in the table is in the following format: 2013-10-14 15:53:33.900
I managed to DATEPART the year month and day into separate columns, but now it's not letting me use my WHERE clause on those newly created columns. Specifically, the error is "Invalid Column Name CreateYear" for both lines. What am I doing wrong here guys? Is there a better/easier way to do this than parse out the day, month, and year? It seems overkill. I've spent quite a bit of hours on this to no avail.
SELECT convert(varchar, DATEPART(month,v.CreateDate)) CreateMonth,
convert(varchar, DATEPART(DAY,v.CreateDate)) CreateDay,
convert(varchar, DATEPART(YEAR,v.CreateDate)) CreateYear,
v.CreateDate,
v.customerName
From
vw_Name_SQL_DailyPartsUsage v
full outer join
ABC.serviceteamstechnicians t on v.TechnicianNumber = t.AgentNumber
full outer join
ABC.ServiceTeams s on t.STID = s.STID
where
CreateYear >= '02/01/2018'
and
CreateYear <= '02/20/2018'
You cannot reference an alias from the select in the where
Even if you could why would you expect year to be '02/01/2018'
Why are you converting to varchar
where year(v.CreateDate) = 2018
or
select crdate, cast(crdate as date), year(crdate), month(crdate), day(crdate)
from sysObjects
where cast(crdate as date) <= '2014-2-20'
and cast(crdate as date) >= '2000-2-10'
order by crdate
You could use:
SELECT convert(varchar, DATEPART(month,v.CreateDate)) CreateMonth,
convert(varchar, DATEPART(DAY,v.CreateDate)) CreateDay,
convert(varchar, DATEPART(YEAR,v.CreateDate)) CreateYear,
v.CreateDate,
v.customerName
From vw_Name_SQL_DailyPartsUsage v
full outer join
ABC.serviceteamstechnicians t on v.TechnicianNumber = t.AgentNumber
full outer join
ABC.ServiceTeams s on t.STID = s.STID
where CreateDate BETWEEN '20180102' and '20180220';
More info about the logical query processing is that you cannot refer to a column alias at SELECT in the WHERE clause without using a subquery/CROSS APPLY.
I have a very simpl postgres (9.3) query that looks like this:
SELECT a.date, b.status
FROM sis.table_a a
JOIN sis.table_b b ON a.thing_id = b.thing_id
WHERE EXTRACT(MONTH FROM a.date) = 06
AND EXTRACT(YEAR FROM a.date) = 2015
Some days of the month of June do not exist in table_a and thus are obviously not joined to table_b. What is the best way to create records for these not represented days and assign a placeholder (e.g. 'EMPTY') to their 'status' column? Is this even possible to do using pure SQL?
Basically, you need LEFT JOIN and it looks like you also need generate_series() to provide the full set of days:
SELECT d.date
, a.date IS NOT NULL AS a_exists
, COALESCE(b.status, 'status_missing') AS status
FROM (
SELECT date::date
FROM generate_series('2015-06-01'::date
, '2015-06-30'::date
, interval '1 day') date
) d
LEFT JOIN sis.table_a a USING (date)
LEFT JOIN sis.table_b b USING (thing_id)
ORDER BY 1;
Use sargable WHERE conditions. What you had cannot use a plain index on date and has to default to a much more expensive sequential scan. (There are no more WHERE conditions in my final query.)
Aside: don't use the basic type name (and reserved word in standard SQL) date as identifier.
Related (2nd chapter):
PostgreSQL: running count of rows for a query 'by minute'
Here's the very basic query that i want to accomplish in Greenplum Database (like postgresql 8.2.15).
The field create_date in table t is timestamp w/o time zone.
Could anyone point me to right query to accomplish this? Thanks.
select * from generate_series ((select EXTRACT (YEAR FROM MIN(t1.create_date)) from t1),(select EXTRACT (YEAR FROM MAX(t1.create_date)) from t1))
Its throwing error
ERROR: function generate_series(double precision, double precision) does not exist
LINE 1: select * from generate_series ((select EXTRACT (YEAR FROM MI...
^
HINT: No function matches the given name and argument types. You may need to add explicit type casts.
You can explicitly cast arguments to integer:
select *
from generate_series (
(select EXTRACT (YEAR FROM MIN(t1.create_date)) from t1)::int,
(select EXTRACT (YEAR FROM MAX(t1.create_date)) from t1)::int
)
sql fiddle demo