Retrieve data from the same column in two columns - postgresql

I have a table in PostgreSQL, something like this:
ID NAME
450 China
525 Germany
658 Austria
I'd like to query every name where ID < 500 and, at the same time, every name where ID > 500, and return the result in two columns using
array_to_string(array_agg(NAME), ', ').
I need the following result:
|column1 (ID < 500)|column2 (ID > 500)|
|------------------|------------------|
|China             |Germany, Austria  |

Try using conditional aggregation:
SELECT
    STRING_AGG(CASE WHEN ID < 500 THEN NAME END, ', ') AS ID_lt_500,
    STRING_AGG(CASE WHEN ID > 500 THEN NAME END, ', ') AS ID_gt_500
FROM yourTable;
Edit:
If you are using a version of Postgres which does not support STRING_AGG, then do as you were already doing:
SELECT
    ARRAY_TO_STRING(ARRAY_AGG(CASE WHEN ID < 500 THEN NAME END), ', ') AS ID_lt_500,
    ARRAY_TO_STRING(ARRAY_AGG(CASE WHEN ID > 500 THEN NAME END), ', ') AS ID_gt_500
FROM yourTable;

Something like:
select (select string_agg(name, ', ')
from the_table
where id <= 500) as column1,
(select string_agg(name, ', ')
from the_table
where id > 500) as column2;
Alternatively:
select string_agg(name, ', ') filter (where id <= 500) as column1,
string_agg(name, ', ') filter (where id > 500) as column2
from the_table;
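
For readers who want to check the expected output outside the database, here is a minimal Python sketch of the same conditional aggregation. The function name `split_names` and the `threshold` parameter are made up for illustration, and the sketch assumes the strict `<` / `>` bounds from the question, so a row with ID exactly 500 lands in neither column.

```python
# Sample rows from the question: (ID, NAME)
rows = [(450, "China"), (525, "Germany"), (658, "Austria")]


def split_names(rows, threshold=500):
    """Return (names with ID < threshold, names with ID > threshold)."""
    below = [name for id_, name in rows if id_ < threshold]
    above = [name for id_, name in rows if id_ > threshold]

    def join(names):
        # Like STRING_AGG, an empty group yields None (SQL NULL), not ''.
        return ", ".join(names) if names else None

    return join(below), join(above)


column1, column2 = split_names(rows)
print(column1, "|", column2)
```

Each list comprehension plays the role of one `CASE WHEN ... END` (or one `FILTER (WHERE ...)`) inside the aggregate: rows that fail the condition simply do not contribute.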

Related

In PostgreSQL, how can I optimize a query with which I obtain the differences between the current column and the immediately previous one?

I have this audit table
|User|date      |text |text 2|
|----|----------|-----|------|
|u1  |2023-01-01|hi   |yes   |
|u1  |2022-12-20|hi   |no    |
|u1  |2022-12-01|hello|maybe |
And I need as a result, something like this:
|User|date      |text|text 2|
|----|----------|----|------|
|u1  |2023-01-01|null|x     |
|u1  |2022-12-20|x   |x     |
|u1  |2022-12-01|null|null  |
That way I can see which columns changed since the previous entry.
Something like this works, but I think there may be a way to optimize it, or at least to write a query that is easier to read? (I need this information for almost 20 columns, not only 3.)
SELECT
ta.audit_date,
ta.audit_user,
CASE
WHEN ta.audit_operation = 'I' THEN 'Insert'
WHEN ta.audit_operation = 'U' THEN 'Update'
END AS action,
CASE WHEN ta.column1 <> (SELECT column1
FROM audit_table ta1
WHERE ta1.id = 9207 AND ta1.audit_date < ta.audit_date
ORDER BY ta1.audit_date DESC
LIMIT 1)
THEN 'X' ELSE null END column1,
CASE WHEN ta.column2 <> (SELECT column2
FROM audit_table ta1
WHERE ta1.id = 9207 AND ta1.audit_date < ta.audit_date
ORDER BY ta1.audit_date DESC
LIMIT 1)
THEN 'X' ELSE null END column2,
CASE WHEN ta.column3 <> (SELECT column3
FROM audit_table ta1
WHERE ta1.id = 9207 AND ta1.audit_date < ta.audit_date
ORDER BY ta1.audit_date DESC
LIMIT 1)
THEN 'X' ELSE null END column3
FROM
audit_table ta
WHERE
ta.id = 9207
ORDER BY
audit_date DESC
Thank you!
I think you can just use the LAG() analytic function here. If I understand correctly:
SELECT *,
    CASE WHEN text != LAG(text) OVER (ORDER BY date) THEN 'x' END AS text_label,
    CASE WHEN text2 != LAG(text2) OVER (ORDER BY date) THEN 'x' END AS text2_label
FROM yourTable
ORDER BY date;
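
Since the question mentions almost 20 columns, it may help to see the LAG() comparison as a loop over a column list. The following is only a sketch in plain Python using the sample audit rows. The function `flag_changes` and the dictionary keys are hypothetical names, not part of the original schema.

```python
from datetime import date

# Sample audit rows from the question
audit = [
    {"user": "u1", "date": date(2023, 1, 1), "text": "hi", "text2": "yes"},
    {"user": "u1", "date": date(2022, 12, 20), "text": "hi", "text2": "no"},
    {"user": "u1", "date": date(2022, 12, 1), "text": "hello", "text2": "maybe"},
]


def flag_changes(rows, columns):
    """Mark a column with 'x' when it differs from the previous row by date."""
    flagged = []
    previous = None  # plays the role of LAG(): the first row has no predecessor
    for row in sorted(rows, key=lambda r: r["date"]):
        flags = {"user": row["user"], "date": row["date"]}
        for col in columns:
            # Note: unlike SQL, a None value compared with a real value counts
            # as a change here; in SQL the comparison would yield NULL.
            changed = previous is not None and row[col] != previous[col]
            flags[col] = "x" if changed else None
        flagged.append(flags)
        previous = row
    return flagged


result = flag_changes(audit, ["text", "text2"])
for entry in reversed(result):  # newest first, as in the question
    print(entry)
```

In SQL the equivalent is one `CASE WHEN col != LAG(col) OVER (...)` expression per column, which could share a named `WINDOW` clause so the ordering is written only once.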

how do i string_agg these accounts together?

i've got a table that's kinda like this
|id    |account  |
|------|---------|
|111111|333-333-2|
|111111|333-333-1|
|222222|444-444-1|
|222222|555-555-1|
|222222|555-555-2|
and i'm trying to aggregate everything up to look like this
|id    |account                 |
|------|------------------------|
|111111|333-333-1, -2           |
|222222|444-444-1, 555-555-1, -2|
so far i've got this
SELECT
id,
CONCAT((STRING_AGG(DISTINCT SUBSTRING(account FROM '^(([^-]*-){2})'), ', ')),
(STRING_AGG(DISTINCT SUBSTRING(account FROM '[^-]*$'), ', '))) account
GROUP BY id
but this produces
|id    |account               |
|------|----------------------|
|111111|333-333-1, 2          |
|222222|444-444-, 555-555-1, 2|
, A AS (
SELECT id,
SUBSTRING(account FROM '^(([^-]*-){2})') first_account,
STRING_AGG(DISTINCT SUBSTRING(account FROM '[^-]*$'), ', ') second_account
FROM table
GROUP BY id, first_account
)
select id, STRING_AGG(DISTINCT first_account || second_account, ', ')
FROM A
GROUP BY id
i ended up figuring it out and this worked for me :))
I would suggest a different approach: first split the account numbers into main part and suffix, then do separate grouping operations on them:
SELECT
id,
string_agg(accounts, ', ') AS account
FROM (
SELECT
id,
concat(account_main, string_agg(account_suffix, ', ')) AS accounts
FROM (
SELECT
id,
substr(account, 1, 7) AS account_main,
substr(account, 8, 9) AS account_suffix
FROM
example
) AS t1
GROUP BY
id,
account_main
) AS t2
GROUP BY
id;
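
To make the two grouping levels concrete, here is a small Python sketch of the same idea: group by id and main part first, then join the per-main-part strings for each id. It is an illustration only. The function name `compact_accounts` is invented, and it splits on the last hyphen instead of using fixed character positions, which is an assumption about the account format.

```python
from collections import defaultdict

# Sample rows from the question: (id, account)
accounts = [
    (111111, "333-333-2"),
    (111111, "333-333-1"),
    (222222, "444-444-1"),
    (222222, "555-555-1"),
    (222222, "555-555-2"),
]


def compact_accounts(rows):
    """Collapse accounts sharing a main part into 'main-1, -2' per id."""
    # First level: id -> main part -> list of suffixes such as '-1'
    grouped = defaultdict(lambda: defaultdict(list))
    for id_, account in rows:
        main, _, last = account.rpartition("-")
        grouped[id_][main].append("-" + last)

    result = {}
    for id_, mains in grouped.items():
        # Inner aggregation: main part followed by its sorted suffixes
        parts = [main + ", ".join(sorted(mains[main])) for main in sorted(mains)]
        # Outer aggregation: one string per id
        result[id_] = ", ".join(parts)
    return result


for id_, account in compact_accounts(accounts).items():
    print(id_, account)
```

The sorting here corresponds to adding `ORDER BY` inside each `string_agg` call, which the query above leaves unspecified.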

Checking Slowly Changing Dimension 2

I have a table that looks like this:
A slowly changing dimension type 2, according to Kimball.
Key is just a surrogate key, a key to make rows unique.
As you can see there are three rows for product A.
The timelines for this product are OK. Over time, the description of the product changes.
From 1-1-2020 up until 4-1-2020 the description of this product was ProdA1.
From 5-1-2020 up until 12-2-2020 the description of this product was ProdA2 etc.
If you look at product B, you see there are gaps in the timeline.
We use DB2 V12 for z/OS. How can I check whether there are gaps in the timelines for each and every product?
I tried this, but it doesn't work:
with selectie (key, tel) as
(select product, count(*)
from PROD_TAB
group by product
having count(*) > 1)
Select * from
PROD_TAB A
inner join selectie B
on A.product = B.product
Where not exists
(SELECT 1 from PROD_TAB C
WHERE A.product = C.product
AND A.END_DATE + 1 DAY = C.START_DATE
)
Does anyone know the answer?
The following query returns all gaps for all products.
The idea is to enumerate (RN column) all periods inside each product by START_DATE and join each record with its next period record.
WITH
/*
MYTAB (PRODUCT, DESCRIPTION, START_DATE, END_DATE) AS
(
SELECT 'A', 'ProdA1', DATE('2020-01-01'), DATE('2020-01-04') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'A', 'ProdA2', DATE('2020-01-05'), DATE('2020-02-12') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'A', 'ProdA3', DATE('2020-02-13'), DATE('2020-12-31') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB1', DATE('2020-01-05'), DATE('2020-01-09') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB2', DATE('2020-01-12'), DATE('2020-03-14') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB3', DATE('2020-03-15'), DATE('2020-04-18') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB4', DATE('2020-04-16'), DATE('2020-05-03') FROM SYSIBM.SYSDUMMY1
)
,
*/
MYTAB_ENUM AS
(
SELECT
T.*
, ROW_NUMBER() OVER (PARTITION BY PRODUCT ORDER BY START_DATE) RN
FROM MYTAB T
)
SELECT A.PRODUCT, A.END_DATE + 1 START_DT, B.START_DATE - 1 END_DT
FROM MYTAB_ENUM A
JOIN MYTAB_ENUM B ON B.PRODUCT = A.PRODUCT AND B.RN = A.RN + 1
WHERE A.END_DATE + 1 <> B.START_DATE
AND A.END_DATE < B.START_DATE;
The result is:
|PRODUCT|START_DT |END_DT |
|-------|----------|----------|
|B |2020-01-10|2020-01-11|
A possibly more efficient way:
WITH MYTAB2 AS
(
SELECT
T.*
, LAG(END_DATE) OVER (PARTITION BY PRODUCT ORDER BY START_DATE) END_DATE_PREV
FROM MYTAB T
)
SELECT PRODUCT, END_DATE_PREV + 1 START_DATE, START_DATE - 1 END_DATE
FROM MYTAB2
WHERE END_DATE_PREV + 1 <> START_DATE
AND END_DATE_PREV < START_DATE;
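
The LAG-based check can be sanity-tested against the sample rows with a short Python sketch. This is an illustration of the logic under the assumption that periods are inclusive on both ends, as in the commented-out sample data. The name `find_gaps` is invented.

```python
from datetime import date, timedelta

# Sample periods from the commented-out MYTAB: (product, start, end)
periods = [
    ("A", date(2020, 1, 1), date(2020, 1, 4)),
    ("A", date(2020, 1, 5), date(2020, 2, 12)),
    ("A", date(2020, 2, 13), date(2020, 12, 31)),
    ("B", date(2020, 1, 5), date(2020, 1, 9)),
    ("B", date(2020, 1, 12), date(2020, 3, 14)),
    ("B", date(2020, 3, 15), date(2020, 4, 18)),
    ("B", date(2020, 4, 16), date(2020, 5, 3)),
]

ONE_DAY = timedelta(days=1)


def find_gaps(periods):
    """Return (product, gap_start, gap_end) for every hole in a timeline."""
    gaps = []
    previous_end = {}  # product -> END_DATE of the preceding period (the LAG)
    for product, start, end in sorted(periods, key=lambda p: (p[0], p[1])):
        prev_end = previous_end.get(product)
        # A gap exists only when the next period starts more than one day
        # after the previous one ended; overlaps are deliberately ignored.
        if prev_end is not None and prev_end + ONE_DAY < start:
            gaps.append((product, prev_end + ONE_DAY, start - ONE_DAY))
        previous_end[product] = end
    return gaps


print(find_gaps(periods))
```

The single `prev_end + ONE_DAY < start` test combines the two SQL predicates: the previous end must precede the start, and the start must not be the very next day.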
Thnx Mark, will try this one of these days.
Never heard of LAG in DB2 V12 for z/Os
Will read about it
Thnx

How to show the maximum number for each combination of customer and product in a specific state in Postgresql?

I just began learning PostgreSQL recently.
I have a table named 'sales':
create table sales
(
    cust varchar(20),
    prod varchar(20),
    day integer,
    month integer,
    year integer,
    state char(2),
    quant integer
);
insert into sales values ('Bloom', 'Pepsi', 2, 12, 2001, 'NY', 4232);
insert into sales values ('Knuth', 'Bread', 23, 5, 2005, 'PA', 4167);
insert into sales values ('Emily', 'Pepsi', 22, 1, 2006, 'CT', 4404);
insert into sales values ('Emily', 'Fruits', 11, 1, 2000, 'NJ', 4369);
insert into sales values ('Helen', 'Milk', 7, 11, 2006, 'CT', 210);
......
It looks like this:
And there are 500 rows in total.
Now I want to use the query to implement this:
For each combination of customer and product, output the maximum sales quantities for
NY and minimum sales quantities for NJ and CT in 3 separate columns. Like the first
report, display the corresponding dates (i.e., dates of those maximum and minimum sales
quantities). Furthermore, for CT and NJ, include only the sales that occurred after 2000;
for NY, include all sales.
It should be like this:
I have tried the following query:
SELECT
cust customer,
prod product,
MAX(CASE WHEN rn3 = 1 THEN quant END) NY_MAX,
MAX(CASE WHEN rn3 = 1 THEN TO_DATE(year || '-' || month || '-' || day, 'YYYY-MM-DD') END) date,
MIN(CASE WHEN rn2 = 1 THEN quant END) NJ_MIN,
MIN(CASE WHEN rn2 = 1 THEN TO_DATE(year || '-' || month || '-' || day, 'YYYY-MM-DD') END) date,
MIN(CASE WHEN rn1 = 1 THEN quant END) CT_MIN,
MIN(CASE WHEN rn1 = 1 THEN TO_DATE(year || '-' || month || '-' || day, 'YYYY-MM-DD') END) date
FROM (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY cust, prod ORDER BY quant) rn1,
ROW_NUMBER() OVER(PARTITION BY cust, prod ORDER BY quant) rn2,
ROW_NUMBER() OVER(PARTITION BY cust, prod ORDER BY quant DESC) rn3
FROM sales
) x
WHERE rn1 = 1 OR rn2 = 1 or rn3 = 1
GROUP BY cust, prod;
This is the result:
This is wrong because it shows the maximum and minimum across all states, not for the specific states I want. I also have no idea how to handle the year condition that the question asks for.
We can handle this using a separate CTE per state, along with a CTE listing every distinct customer/product combination:
WITH custprod AS (
SELECT DISTINCT cust, prod
FROM sales
),
ny_sales AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY cust, prod ORDER BY quant DESC) rn
FROM sales
WHERE state = 'NY'
),
nj_sales AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY cust, prod ORDER BY quant) rn
FROM sales
WHERE state = 'NJ' AND year > 2000 -- only sales after 2000
),
ct_sales AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY cust, prod ORDER BY quant) rn
FROM sales
WHERE state = 'CT' AND year > 2000 -- only sales after 2000
)
SELECT
cp.cust,
cp.prod,
nys.quant AS ny_max,
nys.year::text || '-' || nys.month::text || '-' || nys.day::text AS ny_date,
njs.quant AS nj_min,
njs.year::text || '-' || njs.month::text || '-' || njs.day::text AS nj_date,
cts.quant AS ct_min,
cts.year::text || '-' || cts.month::text || '-' || cts.day::text AS ct_date
FROM custprod cp
LEFT JOIN ny_sales nys
ON cp.cust = nys.cust AND cp.prod = nys.prod AND nys.rn = 1
LEFT JOIN nj_sales njs
ON cp.cust = njs.cust AND cp.prod = njs.prod AND njs.rn = 1
LEFT JOIN ct_sales cts
ON cp.cust = cts.cust AND cp.prod = cts.prod AND cts.rn = 1
ORDER BY
cp.cust,
cp.prod;
Note: You didn't provide comprehensive sample data, but the above seems to work in my demo.
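
As a cross-check of the requirement itself, the per-state logic can be sketched in plain Python over the five sample rows. This is a hedged illustration, not the query: `state_extremes` is an invented name, and "after 2000" is assumed to mean `year > 2000`.

```python
# Sample rows from the question: (cust, prod, day, month, year, state, quant)
sales = [
    ("Bloom", "Pepsi", 2, 12, 2001, "NY", 4232),
    ("Knuth", "Bread", 23, 5, 2005, "PA", 4167),
    ("Emily", "Pepsi", 22, 1, 2006, "CT", 4404),
    ("Emily", "Fruits", 11, 1, 2000, "NJ", 4369),
    ("Helen", "Milk", 7, 11, 2006, "CT", 210),
]


def state_extremes(rows):
    """Per (cust, prod): NY maximum, NJ and CT minimum, each with its date."""
    report = {}
    for cust, prod, day, month, year, state, quant in rows:
        # Every combination gets a row, even with no NY/NJ/CT sales,
        # mirroring the LEFT JOINs from the distinct customer/product list.
        entry = report.setdefault((cust, prod), {"NY": None, "NJ": None, "CT": None})
        if state not in entry:
            continue
        # Assumption: "after 2000" means year > 2000, applied to NJ and CT only.
        if state in ("NJ", "CT") and year <= 2000:
            continue
        candidate = (quant, f"{year}-{month:02d}-{day:02d}")
        current = entry[state]
        if current is None:
            entry[state] = candidate
        elif state == "NY" and quant > current[0]:
            entry[state] = candidate  # NY keeps the maximum
        elif state != "NY" and quant < current[0]:
            entry[state] = candidate  # NJ and CT keep the minimum
    return report


for key, value in sorted(state_extremes(sales).items()):
    print(key, value)
```

With the sample rows, Emily's NJ sale from 2000 is filtered out, so that combination still appears but with an empty NJ column.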

Postgresql sorting string_agg

I run the following query string_agg(DISTINCT grades, '|'), which executed and outputted my result in this order 01|02|03|04|05|KG|PK.
How can I rearrange it this way PK|KG|01|02|03|04|05?
SELECT
U.CUSTOM_100000001 AS USERID
, SC.TITLE
, U.FIRST_NAME
, U.LAST_NAME
, string_AGG(DISTINCT SGL.short_name, '|')
FROM
USERS U
, COURSE_PERIODS CP
, SCHOOLS SC
, school_gradelevels SGL
WHERE
CP.SCHOOL_ID=SC.ID
AND
U.STAFF_ID = CP.TEACHER_ID
AND
SGL.SCHOOL_ID = SC.ID
AND
CP.SYEAR =2015
AND
SGL.short_name in('PK','KG','01','02','03','04','05','06','07','08')
AND
SC.CUSTOM_327 IN ('0021','0025','0051','0061','0071','0073','0081','0101','0111','0131','0211','0221','0294','0301','0321','0341','0361','0371','0291')
GROUP BY
U.CUSTOM_100000001, SC.TITLE, U.FIRST_NAME, U.LAST_NAME
It's possible in PostgreSQL 9.0+:
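SELECT
string_agg(SGL.short_name
, '|' ORDER BY
(substring(SGL.short_name, '^[0-9]+'))::int NULLS FIRST,
substring(SGL.short_name, '[^0-9_]+$') DESC)
FROM (
-- Deduplicate first: with DISTINCT inside the aggregate, PostgreSQL only
-- accepts ORDER BY expressions that appear in the argument list.
SELECT DISTINCT short_name FROM school_gradelevels
) SGL;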
Test example:
WITH tbl(grade) AS (
VALUES
('01'),
('02'),
('03'),
('PK'),
('KG')
)
SELECT grade
FROM tbl
ORDER BY (substring(grade, '^[0-9]+'))::int NULLS FIRST, substring(grade, '[^0-9_]+$') DESC;
Result:
grade
-------
PK
KG
01
02
03
(5 rows)
See "Aggregate Expressions" in the PostgreSQL documentation.
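
The ordering rule used above (non-numeric grades first in descending order, then numeric grades ascending by value) can be tried out with a short Python sketch. The function name `order_grades` is invented for illustration, and the sketch assumes a grade is "numeric" when it starts with digits.

```python
import re

LEADING_DIGITS = re.compile(r"[0-9]+")


def order_grades(grades):
    """Join distinct grades with '|': named grades first, then numbered ones."""
    unique = set(grades)  # same effect as DISTINCT
    # Grades without leading digits sort first (NULLS FIRST), descending,
    # which puts PK before KG.
    named = sorted((g for g in unique if not LEADING_DIGITS.match(g)), reverse=True)
    # Grades with leading digits sort by their integer value.
    numbered = sorted(
        (g for g in unique if LEADING_DIGITS.match(g)),
        key=lambda g: int(LEADING_DIGITS.match(g).group()),
    )
    return "|".join(named + numbered)


print(order_grades(["01", "KG", "02", "PK", "03", "04", "05", "02"]))
```

Sorting by the integer value rather than the raw string matters once grades such as '10' appear next to unpadded values like '9'.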