Postgresql sorting string_agg

Postgresql sorting string_agg - postgresql

I run the following query string_agg(DISTINCT grades, '|'), which executed and outputted my result in this order 01|02|03|04|05|KG|PK.
How can I rearrange it this way PK|KG|01|02|03|04|05?
SELECT
U.CUSTOM_100000001 AS USERID
, SC.TITLE
, U.FIRST_NAME
, U.LAST_NAME
, string_AGG(DISTINCT SGL.short_name, '|')
FROM
USERS U
, COURSE_PERIODS CP
, SCHOOLS SC
, school_gradelevels SGL
WHERE
CP.SCHOOL_ID=SC.ID
AND
U.STAFF_ID = CP.TEACHER_ID
AND
SGL.SCHOOL_ID = SC.ID
AND
CP.SYEAR =2015
AND
SGL.short_name in('PK','KG','01','02','03','04','05','06','07','08')
AND
SC.CUSTOM_327 IN ('0021','0025','0051','0061','0071','0073','0081','0101','0111','0131','0211','0‌221','0294','0301','0321','0341','0361','0371','0291')
GROUP BY
U.CUSTOM_100000001, SC.TITLE, U.FIRST_NAME, U.LAST_NAME

It's possible in PostgreSQL 9.0+:
SELECT
string_agg(DISTINCT SGL.short_name
, '|' ORDER BY
(substring(SGL.short_name, '^[0-9]+'))::int NULLS FIRST,
substring(SGL.short_name, '[^0-9_]+$') DESC)
FROM school_gradelevels SGL;
Test example:
WITH tbl(grade) AS (
VALUES
('01'),
('02'),
('03'),
('PK'),
('KG')
)
SELECT grade
FROM tbl
ORDER BY (substring(grade, '^[0-9]+'))::int NULLS FIRST, substring(grade, '[^0-9_]+$') DESC;
Result:
grade
-------
PK
KG
01
02
03
(5 rows)
Aggregate Expressions

Related

How to collapse overlapping date periods with acceptable gaps using T-SQL?

We want to group our members' enrollments into "continuous enrollments," allowing for a gap of up to 45 days. I know how to use LEAD to determine if an enrollment should be grouped with the next, but I don't know how to group them. Would it be more appropriate to add 45 to the term date and subtract 45 from the effective date, then check for overlapping date periods? My goal is to have a SQL view that returns the results similar to the final query below. Thank you for your help.
SELECT '101' AS MemID, '2021-01-01' AS EffDate, '2021-01-31' AS TermDate INTO #T1 UNION
SELECT '101', '2021-02-01', '2021-02-28' UNION
SELECT '101', '2021-03-01', '2021-03-31' UNION
SELECT '101', '2021-06-01', '2021-06-30' UNION
SELECT '999', '2021-01-01', '2021-01-15' UNION
SELECT '999', '2021-09-01', '2021-09-28' UNION
SELECT '999', '2021-10-01', '2021-10-31'
SELECT *
, LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate) AS LeadEffDate
, DATEDIFF(DAY, TermDate, (LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate))) AS DaysToNextEnrollment
, CASE WHEN (DATEDIFF(DAY, TermDate, (LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate)))) <= 45 THEN 1 ELSE 0 END AS CombineWithNextRecord
FROM #T1
-- result objective
SELECT 101 AS MemID, '2021-01-01' AS EffDate, '2021-03-31' AS TermDate UNION
SELECT 101, '2021-06-01', '2021-06-30' UNION
SELECT 999, '2021-01-01', '2021-01-15' UNION
SELECT 999, '2021-09-01', '2021-10-31'

I think you are really close. Your question is very similar to
TSQL - creating from-to date table while ignoring in-between steps with conditions with a logic difference on what you want to consider to be the same group.
My basic approach is to use the LAG() function to figure out the previous values for MemID and TermDate and combine that with your 45 day rule to define a group. And finally get the first and last values of each group.
Here is my response to that question modified to your situation.
SELECT
a4.MemID
, CONVERT (DATE, a4.First_EffDate) AS [EffDate]
, CONVERT (DATE, a4.TermDate) AS [TermDate]
FROM (
SELECT
a3.MemID
, a3.EffDate
, a3.TermDate
, a3.MemID_group
, FIRST_VALUE (a3.EffDate) OVER (PARTITION BY a3.MemID_group ORDER BY a3.EffDate) AS [First_EffDate]
, ROW_NUMBER () OVER (PARTITION BY a3.MemID_group ORDER BY a3.EffDate DESC) AS [Row_number]
FROM (
SELECT
a2.MemID
, a2.EffDate
, a2.TermDate
, a2.Previous_MemID
, a2.Previous_TermDate
, a2.New_group
, SUM (a2.New_group) OVER (ORDER BY a2.MemID, a2.EffDate) AS [MemID_group]
FROM (
SELECT
a1.MemID
, a1.EffDate
, a1.TermDate
, a1.Previous_MemID
, a1.Previous_TermDate
---------------------------------------------------------------------------------
-- new group if the MemID is different from the previous row OR
-- if the MemID is the same as the previous row AND it has been more than 45 days
-- between the TermDate of the previous row and the EffDate of the current row
,
IIF((a1.MemID <> a1.Previous_MemID)
OR (
a1.MemID = a1.Previous_MemID
AND DATEDIFF (DAY, a1.Previous_TermDate, a1.EffDate) > 45
)
, 1
, 0) AS [New_group]
---------------------------------------------------------------------------------
FROM (
SELECT
MemID
, EffDate
, TermDate
, LAG (MemID) OVER (ORDER BY MemID) AS [Previous_MemID]
, LAG (TermDate) OVER (PARTITION BY MemID ORDER BY EffDate) AS [Previous_TermDate]
FROM #T1
) a1
) a2
) a3
) a4
WHERE a4.[Row_number] = 1;
Here is the dbfiddle.

How to get number of consecutive days from current date using postgres?

I want to get the number of consecutive days from the current date using Postgres SQL.
enter image description here
Above is the scenario in which I have highlighted consecutive days count should be like this.
Below is the SQL query which I have created but it's not returning the expected result
with grouped_dates as (
select user_id, created_at::timestamp::date,
(created_at::timestamp::date - (row_number() over (partition by user_id order by created_at::timestamp::date) || ' days')::interval)::date as grouping_date
from watch_history
)
select * , dense_rank() over (partition by grouping_date order by created_at::timestamp::date) as in_streak
from grouped_dates where user_id = 702
order by created_at::timestamp::date
Can anyone please help me to resolve this issue?
If anyhow we can able to apply distinct for created_at field to below query then I will get solutions for my issue.
WITH list AS
(
SELECT user_id,
(created_at::timestamp::date - (row_number() over (partition by user_id order by created_at::timestamp::date) || ' days')::interval)::date as next_day
FROM watch_history
)
SELECT user_id, count(*) AS number_of_consecutive_days
FROM list
WHERE next_day IS NOT NULL
GROUP BY user_id
Does anyone have an idea how to apply distinct to created_at for the above mentioned query ?

To get the "number of consecutive days" for the same user_id :
WITH list AS
(
SELECT user_id
, array_agg(created_at) OVER (PARTITION BY user_id ORDER BY created_at RANGE BETWEEN CURRENT ROW AND '1 day' FOLLOWING) AS consecutive_days
FROM watch_history
)
SELECT user_id, count(DISTINCT d.day) AS number_of_consecutive_days
FROM list
CROSS JOIN LATERAL unnest(consecutive_days) AS d(day)
WHERE array_length(consecutive_days, 1) > 1
GROUP BY user_id
To get the list of "consecutive days" for the same user_id :
WITH list AS
(
SELECT user_id
, array_agg(created_at) OVER (PARTITION BY user_id ORDER BY created_at RANGE BETWEEN CURRENT ROW AND '1 day' FOLLOWING) AS consecutive_days
FROM watch_history
)
SELECT user_id
, array_agg(DISTINCT d.day ORDER BY d.day) AS list_of_consecutive_days
FROM list
CROSS JOIN LATERAL unnest(consecutive_days) AS d(day)
WHERE array_length(consecutive_days, 1) > 1
GROUP BY user_id
full example & result in dbfiddle

Redshift PostgreSQL Distinct ON Operator

I have a data set that I want to parse for to see multi-touch attribution. The data set is made up by leads who responded to a marketing campaign and their marketing source.
Each lead can respond to multiple campaigns and I want to get their first marketing source and their last marketing source in the same table.
I was thinking I could create two tables and use a select statement from both.
The first table would attempt to create a table with the most recent marketing source from every person (using email as their unique ID).
create table temp.multitouch1 as (
select distinct on (email) email, date, market_source as last_source
from sf.campaignmember
where date >= '1/1/2016' ORDER BY DATE DESC);
Then I would create a table with deduped emails but this time for the first source.
create table temp.multitouch2 as (
select distinct on (email) email, date, market_source as first_source
from sf.campaignmember
where date >= '1/1/2016' ORDER BY DATE ASC);
Finally I wanted to simply select the email and join the first and last market sources to it each in their own column.
select a.email, a.last_source, b.first_source, a.date
from temp.multitouch1 a
left join temp.multitouch b on b.email = a.email
Since distinct on doesn't work on redshift's postgresql version I was hoping someone had an idea to solve this issue in another way.
EDIT 2/22: For more context I'm dealing with people and campaigns they've responded to. Each record is a "campaign response" and every person can have more than one campaign response with multiple sources. I'm trying make a select statement which would dedupe by person and then have columns for the first campaign/marketing source they've responded to and the last campaign/marketing source they've responded to respectively.
EDIT 2/24: Ideal output is a table with 4 columns: email, last_source, first_source, date.
The first and last source columns would be the same for people with only 1 campaign member record and different for everyone who has more than 1 campaign member record.

I believe you could use row_number() inside case expressions like this:
SELECT
email
, MIN(first_source) AS first_source
, MIN(date) first_date
, MAX(last_source) AS last_source
, MAX(date) AS last_date
FROM (
SELECT
email
, date
, CASE
WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date ASC) = 1 THEN market_source
ELSE NULL
END AS first_source
, CASE
WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date DESC) = 1 THEN market_source
ELSE NULL
END AS last_source
FROM sf.campaignmember
WHERE date >= '2016-01-01'
) s
WHERE first_source IS NOT NULL
OR last_source IS NOT NULL
GROUP BY
email
tested here: SQL Fiddle
PostgreSQL 9.3 Schema Setup:
CREATE TABLE campaignmember
(email varchar(3), date timestamp, market_source varchar(1))
;
INSERT INTO campaignmember
(email, date, market_source)
VALUES
('a#a', '2016-01-02 00:00:00', 'x'),
('a#a', '2016-01-03 00:00:00', 'y'),
('a#a', '2016-01-04 00:00:00', 'z'),
('b#b', '2016-01-02 00:00:00', 'x')
;
Query 1:
SELECT
email
, MIN(first_source) AS first_source
, MIN(date) first_date
, MAX(last_source) AS last_source
, MAX(date) AS last_date
FROM (
SELECT
email
, date
, CASE
WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date ASC) = 1 THEN market_source
ELSE NULL
END AS first_source
, CASE
WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date DESC) = 1 THEN market_source
ELSE NULL
END AS last_source
FROM campaignmember
WHERE date >= '2016-01-01'
) s
WHERE first_source IS NOT NULL
OR last_source IS NOT NULL
GROUP BY
email
Results:
| email | first_source | first_date | last_source | last_date |
|-------|--------------|---------------------------|-------------|---------------------------|
| a#a | x | January, 02 2016 00:00:00 | z | January, 04 2016 00:00:00 |
| b#b | x | January, 02 2016 00:00:00 | x | January, 02 2016 00:00:00 |
& a small extension to the request, count the number of contact points.
SELECT
email
, MIN(first_source) AS first_source
, MIN(date) first_date
, MAX(last_source) AS last_source
, MAX(date) AS last_date
, MAX(numof) AS Numberof_Contacts
FROM (
SELECT
email
, date
, CASE
WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date ASC) = 1 THEN market_source
ELSE NULL
END AS first_source
, CASE
WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date DESC) = 1 THEN market_source
ELSE NULL
END AS last_source
, COUNT(*) OVER (PARTITION BY email) as numof
FROM campaignmember
WHERE date >= '2016-01-01'
) s
WHERE first_source IS NOT NULL
OR last_source IS NOT NULL
GROUP BY
email

You can use the good old left join groupwise maximum.
SELECT DISTINCT c1.email, c1.date, c1.market_source
FROM sf.campaignmember c1
LEFT JOIN sf.campaignmember c2
ON c1.email = c2.email AND c1.date > c2.date AND c1.id > c2.id
LEFT JOIN sf.campaignmember c3
ON c1.email = c3.email AND c1.date < c3.date AND c1.id > c3.id
WHERE c1.date >= '1/1/2016' AND c2.date >= '1/1/2016'
AND (c2.email IS NULL OR c3.email IS NULL)
This assumes you have an unique id column, if (date, email) is unique id is not needed.

How to order UNPIVOT

I have the following UNPIVOT code and I would like to order it by the FactSheetSummary columns so that when it is converted to rows it is order 1 - 12:
INSERT INTO #Results
SELECT DISTINCT ReportingDate, PortfolioID,ISIN, PortfolioNme, Section,REPLACE(REPLACE(Risks,'‘',''''),'’','''')
FROM
(SELECT DISTINCT
ReportingDate
, PortfolioID
, ISIN
, PortfolioNme
, Section
, FactSheetSummary_1, FactSheetSummary_2, FactSheetSummary_3
, FactSheetSummary_4, FactSheetSummary_5, FactSheetSummary_6
, FactSheetSummary_7, FactSheetSummary_8, FactSheetSummary_9
, FactSheetSummary_10, FactSheetSummary_11, FactSheetSummary_12
FROM #WorkingTableFactsheet) p
UNPIVOT
(Risks FOR FactsheetSummary IN
( FactSheetSummary_1, FactSheetSummary_2, FactSheetSummary_3
, FactSheetSummary_4, FactSheetSummary_5, FactSheetSummary_6
, FactSheetSummary_7, FactSheetSummary_8, FactSheetSummary_9
, FactSheetSummary_10, FactSheetSummary_11, FactSheetSummary_12)
)AS unpvt;
--DELETE records where there are no Risk Narratives
DELETE FROM #Results
WHERE Risks = ''
SELECT
ReportingDate
, PortfolioID
, ISIN
, PortfolioNme
, Section
, Risks
, ROW_NUMBER() OVER(PARTITION BY ISIN,Section ORDER BY ISIN,Section,Risks) as SortOrder
FROM #Results
order by ISIN, Risks
Is it possible to do this? I thought the IN of the UNPIVOT would dictate the order? Do I need to add a column to dictate which I would like to be 1 through to 12?

Just use case expression to order:
order by case FactsheetSummary
when 'FactSheetSummary_1' then 1
when 'FactSheetSummary_2' then 2
when 'FactSheetSummary_12' then 12 end

TSQL invalid HAVING count

I am using SSMS 2008 and trying to use a HAVING statement. This should be a real simple query. However, I am only getting one record returned event though there are numerous duplicates.
Am I doing something wrong with the HAVING statement here? Or is there some other function that I could use instead?
select
address_desc,
people_id
from
dbo.address_view
where people_id is not NULL
group by people_id , address_desc
having count(*) > 1
sample data from address_view:
people_id address_desc
---------- ------------
Murfreesboro, TN 37130 F15D1135-9947-4F66-B778-00E43EC44B9E
11 Mohawk Rd., Burlington, MA 01803 C561918F-C2E9-4507-BD7C-00FB688D2D6E
Unknown, UN 00000 C561918F-C2E9-4507-BD7C-00FB688D2D6E
Jacksonville, NC 28546 FC7C78CD-8AEA-4C8E-B93D-010BF8E4176D
Memphis, TN 38133 8ED8C601-5D35-4EB7-9217-012905D6E9F1
44 Maverick St., Fitchburg, MA 8ED8C601-5D35-4EB7-9217-012905D6E9F1

The GROUP BY is going to lump your duplicates together into a single row.
I think instead, you want to find all people_id values with duplicate address_desc:
SELECT a.address_desc, a.people_id
FROM dbo.address_view a
INNER JOIN (SELECT address_desc
FROM dbo.address_view
GROUP BY address_desc
HAVING COUNT(*) > 1) t
ON a.address_desc = t.address_desc

using row_number and partition you can find the duplicate occurrences where row_num>1
select address_desc,
people_id,
row_num
from
(
select
address_desc,
people_id,
row_number() over (partition by address_desc order by address_desc) row_num
from
dbo.address_view
where people_id is not NULL
) x
where row_num>1

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse