Postgres: dynamic select statement with COALESCE

Given tables and data like this:
CREATE TABLE solicitations
(
id SERIAL PRIMARY KEY,
name text
);
CREATE TABLE donations
(
id SERIAL PRIMARY KEY,
solicitation_id integer REFERENCES solicitations, -- can be null
created_at timestamp without time zone NOT NULL DEFAULT (now() at time zone 'utc'),
amount bigint NOT NULL DEFAULT 0
);
INSERT INTO solicitations (name) VALUES
('solicitation1'), ('solicitation2');
INSERT INTO donations (created_at, solicitation_id, amount) VALUES
('2018-06-26', null, 10), ('2018-06-26', 1, 20), ('2018-06-26', 2, 30),
('2018-06-27', null, 10), ('2018-06-27', 1, 20),
('2018-06-28', null, 10), ('2018-06-28', 1, 20), ('2018-06-28', 2, 30);
How can I make the solicitation IDs dynamic in the following select statement, using only Postgres?
SELECT
"created_at"
-- make dynamic this begins
, COALESCE("no_solicitation", 0) AS "no_solicitation"
, COALESCE("1", 0) AS "1"
, COALESCE("2", 0) AS "2"
-- make dynamic this ends
FROM crosstab(
$source_sql$
SELECT
created_at::date as row_id
, COALESCE(solicitation_id::text, 'no_solicitation') as category
, SUM(amount) as value
FROM donations
GROUP BY row_id, category
ORDER BY row_id, category
$source_sql$
, $category_sql$
-- parametrize with ids from here begins
SELECT unnest('{no_solicitation}'::text[] || ARRAY(SELECT DISTINCT id::text FROM solicitations ORDER BY id))
-- parametrize with ids from here ends
$category_sql$
) AS ct (
"created_at" date
-- make dynamic this begins
, "no_solicitation" bigint
, "1" bigint
, "2" bigint
-- make dynamic this ends
)
The select should return data like this:
created_at   no_solicitation   1    2
_____________________________________
2018-06-26   10                20   30
2018-06-27   10                20   0
2018-06-28   10                20   30
The solicitation IDs that parametrize the select are the same as those returned by
SELECT unnest('{no_solicitation}'::text[] || ARRAY(SELECT DISTINCT id::text FROM solicitations ORDER BY id))
You can fiddle with the code here

I decided to use json, which is much simpler than crosstab.
WITH
all_solicitation_ids AS (
SELECT
unnest('{no_solicitation}'::text[] ||
ARRAY(SELECT DISTINCT id::text FROM solicitations ORDER BY id))
AS col
)
, all_days AS (
SELECT
-- compute days ad hoc, from the min to the max created_at day of donations
generate_series(min(created_at)::date, max(created_at)::date, '1 day'::interval)::date
AS col
FROM donations
)
, all_days_and_all_solicitation_ids AS (
SELECT
all_days.col AS created_at
, all_solicitation_ids.col AS solicitation_id
FROM all_days, all_solicitation_ids
ORDER BY all_days.col, all_solicitation_ids.col
)
, donations_ AS (
SELECT
created_at::date as created_at
, COALESCE(solicitation_id::text, 'no_solicitation') as solicitation_id
, SUM(amount) as amount
FROM donations
GROUP BY 1, 2 -- group by the date and the coalesced id, not the raw input columns
ORDER BY 1, 2
)
, donations__ AS (
SELECT
all_days_and_all_solicitation_ids.created_at
, all_days_and_all_solicitation_ids.solicitation_id
, COALESCE(donations_.amount, 0) AS amount
FROM all_days_and_all_solicitation_ids
LEFT JOIN donations_
ON all_days_and_all_solicitation_ids.created_at = donations_.created_at
AND all_days_and_all_solicitation_ids.solicitation_id = donations_.solicitation_id
)
SELECT
jsonb_object_agg(solicitation_id, amount) ||
jsonb_object_agg('date', created_at)
AS data
FROM donations__
GROUP BY created_at
which results in
data
______________________________________________________________
{"1": 20, "2": 30, "date": "2018-06-28", "no_solicitation": 10}
{"1": 20, "2": 30, "date": "2018-06-26", "no_solicitation": 10}
{"1": 20, "2": 0, "date": "2018-06-27", "no_solicitation": 10}
Though it's not quite what I requested.
It returns a single data column instead of separate date, no_solicitation, 1, 2, ... columns. To get those I would need json_to_record, but I don't know how to produce its column definition list dynamically.
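A single static SQL statement cannot return a dynamic set of columns, so the usual workaround (a sketch of mine, not from the original post, assuming the tablefunc extension and psql's \gexec) is to generate the crosstab query text, with one output column per solicitation id, and then execute the generated text:
-- Build the pivot query; in psql, end the statement with \gexec instead of ';'
-- to run the generated query directly (assumes at least one row in solicitations).
SELECT format(
$fmt$
SELECT "created_at"
, COALESCE("no_solicitation", 0) AS "no_solicitation"
%1$s
FROM crosstab(
$q$ SELECT created_at::date, COALESCE(solicitation_id::text, 'no_solicitation'), SUM(amount)
    FROM donations GROUP BY 1, 2 ORDER BY 1, 2 $q$
, $q$ SELECT unnest('{no_solicitation}'::text[] || ARRAY(SELECT id::text FROM solicitations ORDER BY id)) $q$
) AS ct (
"created_at" date
, "no_solicitation" bigint
%2$s
)
$fmt$
, string_agg(format(', COALESCE(%I, 0) AS %I', id::text, id::text), E'\n' ORDER BY id)
, string_agg(format(', %I bigint', id::text), E'\n' ORDER BY id)
)
FROM solicitations;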

Related

Grouping user id columns together with string_agg on PostgreSQL 13

This is my emails table:
create table emails (
id bigint not null primary key generated by default as identity,
name text not null
);
And contacts table:
create table contacts (
id bigint not null primary key generated by default as identity,
email_id bigint not null,
user_id bigint not null,
full_name text not null,
ordering int not null
);
As you can see, I have a user_id field here. There can be multiple identical user IDs in my result, so I want to join them with a comma.
Insert some data into the tables:
insert into emails (name)
values
('dennis1'),
('dennis2');
insert into contacts (id, email_id, user_id, full_name, ordering)
values
(5, 1, 1, 'dennis1', 9),
(6, 2, 1, 'dennis1', 5),
(7, 2, 1, 'dennis1', 1),
(8, 1, 3, 'john', 2),
(9, 2, 4, 'dennis7', 1),
(10, 2, 4, 'dennis7', 1);
My query is:
select em.name,
c.user_ids
from emails em
join (
select email_id, string_agg(user_id::text, ',' order by ordering desc) as user_ids
from contacts
group by email_id
) c on c.email_id = em.id
order by em.name;
Actual result:
name     user_ids
dennis1  1,3
dennis2  1,1,4,4
Expected result:
name     user_ids
dennis1  1,3
dennis2  1,4
On my real-world data, I get the same user ID up to 50 times, when it should appear only once. In the example above, you can see users 1 and 4 each appear twice for dennis2.
How can I deduplicate them?
Demo: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=2e957b52eb46742f3ddea27ec36effb1
P.S.: I tried adding user_id to the GROUP BY, but then I got duplicate rows...
demo:db<>fiddle
SELECT
name,
string_agg(user_id::text, ',' order by ordering desc)
FROM (
SELECT DISTINCT ON (em.id, c.user_id)
*
FROM emails em
JOIN contacts c ON c.email_id = em.id
ORDER BY em.id, c.user_id, ordering DESC -- makes the row kept by DISTINCT ON deterministic
) s
GROUP BY name
Join the tables.
DISTINCT ON the email id and the user_id, so that for each email record there are no duplicate users.
Aggregate.
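If the ordering DESC sort is not essential, a shorter variant (my sketch, not part of the original answer) uses aggregate-level DISTINCT; note that you then cannot sort by ordering, because with DISTINCT the ORDER BY expression would have to match the aggregated expression:
SELECT em.name,
string_agg(DISTINCT c.user_id::text, ',') AS user_ids
FROM emails em
JOIN contacts c ON c.email_id = em.id
GROUP BY em.name
ORDER BY em.name;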

Redshift PostgreSQL Distinct ON Operator

I have a data set that I want to parse to see multi-touch attribution. The data set is made up of leads who responded to a marketing campaign, along with their marketing source.
Each lead can respond to multiple campaigns, and I want to get their first marketing source and their last marketing source in the same table.
I was thinking I could create two tables and select from both.
The first table would hold the most recent marketing source for every person (using email as their unique ID).
create table temp.multitouch1 as (
select distinct on (email) email, date, market_source as last_source
from sf.campaignmember
where date >= '1/1/2016' ORDER BY DATE DESC);
Then I would create a table with deduplicated emails, but this time for the first source.
create table temp.multitouch2 as (
select distinct on (email) email, date, market_source as first_source
from sf.campaignmember
where date >= '1/1/2016' ORDER BY DATE ASC);
Finally, I wanted to select the email and join the first and last market sources to it, each in its own column.
select a.email, a.last_source, b.first_source, a.date
from temp.multitouch1 a
left join temp.multitouch2 b on b.email = a.email
Since DISTINCT ON doesn't work in Redshift's version of PostgreSQL, I was hoping someone had an idea for solving this another way.
EDIT 2/22: For more context, I'm dealing with people and the campaigns they've responded to. Each record is a "campaign response", and every person can have more than one campaign response with multiple sources. I'm trying to make a select statement that dedupes by person and has columns for the first campaign/marketing source they responded to and the last campaign/marketing source they responded to, respectively.
EDIT 2/24: The ideal output is a table with 4 columns: email, last_source, first_source, date.
The first and last source columns would be the same for people with only one campaign member record, and different for everyone with more than one.
I believe you could use row_number() inside case expressions like this:
SELECT
email
, MIN(first_source) AS first_source
, MIN(date) first_date
, MAX(last_source) AS last_source
, MAX(date) AS last_date
FROM (
SELECT
email
, date
, CASE
WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date ASC) = 1 THEN market_source
ELSE NULL
END AS first_source
, CASE
WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date DESC) = 1 THEN market_source
ELSE NULL
END AS last_source
FROM sf.campaignmember
WHERE date >= '2016-01-01'
) s
WHERE first_source IS NOT NULL
OR last_source IS NOT NULL
GROUP BY
email
tested here: SQL Fiddle
PostgreSQL 9.3 Schema Setup:
CREATE TABLE campaignmember
(email varchar(3), date timestamp, market_source varchar(1))
;
INSERT INTO campaignmember
(email, date, market_source)
VALUES
('a#a', '2016-01-02 00:00:00', 'x'),
('a#a', '2016-01-03 00:00:00', 'y'),
('a#a', '2016-01-04 00:00:00', 'z'),
('b#b', '2016-01-02 00:00:00', 'x')
;
Query 1 is the same query as above, just with the unqualified table name campaignmember.
Results:
| email | first_source | first_date | last_source | last_date |
|-------|--------------|---------------------------|-------------|---------------------------|
| a#a | x | January, 02 2016 00:00:00 | z | January, 04 2016 00:00:00 |
| b#b | x | January, 02 2016 00:00:00 | x | January, 02 2016 00:00:00 |
And a small extension to the request: count the number of contact points.
SELECT
email
, MIN(first_source) AS first_source
, MIN(date) first_date
, MAX(last_source) AS last_source
, MAX(date) AS last_date
, MAX(numof) AS Numberof_Contacts
FROM (
SELECT
email
, date
, CASE
WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date ASC) = 1 THEN market_source
ELSE NULL
END AS first_source
, CASE
WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date DESC) = 1 THEN market_source
ELSE NULL
END AS last_source
, COUNT(*) OVER (PARTITION BY email) as numof
FROM campaignmember
WHERE date >= '2016-01-01'
) s
WHERE first_source IS NOT NULL
OR last_source IS NOT NULL
GROUP BY
email
You can use the good old LEFT JOIN groupwise-maximum technique.
SELECT DISTINCT c1.email, c1.date, c1.market_source
FROM sf.campaignmember c1
-- c2: any earlier row for the same email (id breaks date ties)
LEFT JOIN sf.campaignmember c2
ON c1.email = c2.email AND c2.date >= '2016-01-01'
AND (c1.date > c2.date OR (c1.date = c2.date AND c1.id > c2.id))
-- c3: any later row for the same email
LEFT JOIN sf.campaignmember c3
ON c1.email = c3.email AND c3.date >= '2016-01-01'
AND (c1.date < c3.date OR (c1.date = c3.date AND c1.id < c3.id))
WHERE c1.date >= '2016-01-01'
AND (c2.email IS NULL OR c3.email IS NULL)
This assumes you have a unique id column to break date ties; if (email, date) is unique, the id conditions are not needed. Note that the c2/c3 date filters belong in the join conditions rather than in WHERE, otherwise the outer joins degenerate into inner joins and the IS NULL tests can never succeed.
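A window-function alternative (my sketch; Redshift supports FIRST_VALUE and LAST_VALUE, though they require an explicit frame clause so the window spans the whole partition):
SELECT DISTINCT
email
, FIRST_VALUE(market_source) OVER (PARTITION BY email ORDER BY date, id
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_source
, LAST_VALUE(market_source) OVER (PARTITION BY email ORDER BY date, id
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_source
, MIN(date) OVER (PARTITION BY email) AS first_date
, MAX(date) OVER (PARTITION BY email) AS last_date
FROM sf.campaignmember
WHERE date >= '2016-01-01';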

Select Based on Column Value in Postgres

I want to get the values of TIMESTAMP and STRING_VALUE based on selected IDs.
Suppose my selected IDs are 4226 and 4311.
Then it should select TIMESTAMP and STRING_VALUE for the selected IDs using a CASE statement.
I have tried the query below, but it returns an error.
CREATE TABLE "DRL_FTO3_DI1_A0"
(
"VARIABLE" integer,
"CALCULATION" integer,
"TIMESTAMP_S" integer,
"TIMESTAMP_MS" integer,
"VALUE" double precision,
"STATUS" integer,
"GUID" character(36),
"STRVALUE" character varying(50)
);
INSERT INTO "DRL_FTO3_DI1_A0"(
"VARIABLE", "CALCULATION", "TIMESTAMP_S", "TIMESTAMP_MS", "VALUE",
"STATUS", "GUID", "STRVALUE")
VALUES (4226, 0, 1451120925, 329, 0, 1078067200, '', 'BATCH 1'),
(4306, 0, 1451120925, 329, 0, 1078067200, '', 'BATCH 2'),
(4311, 0, 1451120925, 329, 0, 1078067200, '', '2');
Now suppose that out of the three variables (4226, 4306, 4311) I want to select 4226 and 4311:
SELECT ((TIMESTAMP WITHOUT Time Zone 'epoch' + "TIMESTAMP_S" * INTERVAL '1 second') AT TIME ZONE 'UTC')::TIMESTAMP WITHOUT Time Zone,
SUM(CASE WHEN "VARIABLE" = 4226 Then "STRVALUE" END) as 'A',
SUM(CASE WHEN "VARIABLE" = 4311 Then "STRVALUE" END) as 'B'
FROM "DRL_FTO3_DI1_A0"
GROUP BY "TIMESTAMP_S"
ORDER BY "TIMESTAMP_S";
TIMESTAMP_S          A        B
2015-12-26 14:38:45  BATCH 1  2
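For reference, the attempt above fails because SUM cannot be applied to a varchar column and because 'A' is a string literal rather than a column alias (aliases take double quotes). A corrected sketch of the conditional aggregation, using MAX, which does work on text:
SELECT ((TIMESTAMP WITHOUT TIME ZONE 'epoch' + "TIMESTAMP_S" * INTERVAL '1 second')
        AT TIME ZONE 'UTC')::TIMESTAMP WITHOUT TIME ZONE AS "TIMESTAMP",
MAX(CASE WHEN "VARIABLE" = 4226 THEN "STRVALUE" END) AS "A",
MAX(CASE WHEN "VARIABLE" = 4311 THEN "STRVALUE" END) AS "B"
FROM "DRL_FTO3_DI1_A0"
GROUP BY "TIMESTAMP_S"
ORDER BY "TIMESTAMP_S";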
This is the query using crosstab, and it works:
SELECT *
FROM crosstab (
$$SELECT "VARIABLE", "TIMESTAMP_S", "STRVALUE"
FROM "DRL_FTO3_DI1_A0"
WHERE "VARIABLE" = ANY (array[4306,4226])
ORDER BY 1,2$$
)
AS
t (
"TIMESTAMP_S" integer,
"A" character varying,
"B" character varying
);
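Note that crosstab comes from the tablefunc extension, which has to be installed once per database before the query above can run:
CREATE EXTENSION IF NOT EXISTS tablefunc;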

tsql selecting record based upon date and null

I have a table:
ID as int, ParentId as int, FreeFromText as varchar(max), ActiveUntil as DateTime
As an example, within this table I have two records.
1, 100, 'Some text', '2015-11-30 12:10:09.0000000'
2, 100, 'New text', null
What I am trying to do is get the currently active record, which in the case above would be record 1. To do that I just select with the following criteria:
ActiveUntil > GETDATE()
This works great, but if I change the first date to 2015-10-30, I need to get the null record instead, as that record then takes precedence.
So I changed the code to be:
((ActiveUntil is NULL) OR (ActiveUntil > GETDATE()))
But this does not work.
Here is some example with union:
DECLARE @t TABLE ( d DATETIME )
INSERT INTO @t
VALUES ( NULL ),
( '2015-11-30' )
SELECT TOP 1 *
FROM ( SELECT * , 1 AS ordering
FROM @t
WHERE d > GETDATE()
UNION ALL
SELECT * , 2 AS ordering
FROM @t
WHERE d IS NULL
) t
ORDER BY ordering, d
For 2015-11-30 it returns 2015-11-30. For 2015-10-30 it returns null.
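The same precedence can be expressed without the UNION (my sketch, reusing the @t table variable from above): keep only rows that are still active or NULL, and sort dated rows first so the NULL fallback only wins when no active date exists.
SELECT TOP 1 *
FROM @t
WHERE d > GETDATE() OR d IS NULL
ORDER BY CASE WHEN d IS NOT NULL THEN 0 ELSE 1 END, d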
Try like this:
((ActiveUntil is NULL) OR (CONVERT(char(10), ActiveUntil, 126) > GETDATE()))
Refer to MSDN for CAST and CONVERT. The format specifier 126 is for yyyy-mm-dd. Or you can use CAST:
((ActiveUntil is NULL) OR (CAST(ActiveUntil AS date) > GETDATE()))

Merge row columns from a query in sql server

I have a simple query which returns the following rows:
Current rows:
Empl ECode DCode LCode Earn Dedn Liab
==== ==== ===== ===== ==== ==== ====
123 PerHr Null Null 13 0 0
123 Null Union Null 0 10 0
123 Null Per Null 0 20 0
123 Null Null MyHealth 0 0 5
123 Null Null 401 0 0 10
123 Null Null Train 0 0 15
123 Null Null CAFTA 0 0 20
However, I need to see the above rows as follows:
Empl ECode DCode LCode Earn Dedn Liab
==== ==== ===== ===== ==== ==== ====
123 PerHr Union MyHealth 13 10 5
123 Null Per 401 0 20 10
123 Null Null Train 0 0 15
123 Null Null CAFTA 0 0 20
It's more like merging the succeeding rows into the preceding rows wherever Nulls are encountered for ECode, DCode and LCode. Actually, what I want is to roll everything up into the preceding rows.
In Oracle we had the LAST_VALUE function, which we could use, but in this case I simply cannot figure out what to do (LAST_VALUE only arrived in SQL Server 2012, so it is not available to us).
In the example above, ECode's value column is Earn, DCode's is Dedn, and LCode's is Liab; notice that whenever one of ECode, DCode, or LCode is not null, there is a corresponding value in the Earn, Dedn, or Liab column.
By the way, we are using SQL Server 2008 R2 at work.
Hoping for your advice, thanks.
This is basically the same technique as Tango_Guy's, but without the temporary tables and with the sort made explicit. Because the number of rows per Empl in each code column is <= the number of rows already in place, I didn't need to make a dummy table for the leftmost table; I just filtered the base data to rows where there was a match amongst the 3 codes. Also, per your discussion, Earn and ECode move together; in fact a non-zero Earn in a row without an ECode is effectively lost (a good case for a constraint: a non-zero Earn should not be allowed when ECode is NULL):
http://sqlfiddle.com/#!3/7bd04/3
CREATE TABLE data(ID INT IDENTITY NOT NULL,
Empl VARCHAR(3),
ECode VARCHAR(8),
DCode VARCHAR(8),
LCode VARCHAR(8),
Earn INT NOT NULL,
Dedn INT NOT NULL,
Liab INT NOT NULL ) ;
INSERT INTO data (Empl, ECode, DCode, LCode, Earn, Dedn, Liab)
VALUES ('123', 'PerHr', NULL, NULL, 13, 0, 0),
('123', NULL, 'Union', NULL, 0, 10, 0),
('123', NULL, 'Per', NULL, 0, 20, 0),
('123', NULL, NULL, 'MyHealth', 0, 0, 5),
('123', NULL, NULL, '401', 0, 0, 10),
('123', NULL, NULL, 'Train', 0, 0, 15),
('123', NULL, NULL, 'CAFTA', 0, 0, 20);
WITH basedata AS (
SELECT *, ROW_NUMBER () OVER(ORDER BY ID) AS OrigSort, ROW_NUMBER () OVER(PARTITION BY Empl ORDER BY ID) AS EmplSort
FROM data
),
E AS (
SELECT Empl, ECode, Earn, ROW_NUMBER () OVER(PARTITION BY Empl ORDER BY OrigSort) AS EmplSort
FROM basedata
WHERE ECode IS NOT NULL
),
D AS (
SELECT Empl, DCode, Dedn, ROW_NUMBER () OVER(PARTITION BY Empl ORDER BY OrigSort) AS EmplSort
FROM basedata
WHERE DCode IS NOT NULL
),
L AS (
SELECT Empl, LCode, Liab, ROW_NUMBER () OVER(PARTITION BY Empl ORDER BY OrigSort) AS EmplSort
FROM basedata
WHERE LCode IS NOT NULL
)
SELECT basedata.Empl, E.ECode, D.Dcode, L.LCode, E.Earn, D.Dedn, L.Liab
FROM basedata
LEFT JOIN E
ON E.Empl = basedata.Empl AND E.EmplSort = basedata.EmplSort
LEFT JOIN D
ON D.Empl = basedata.Empl AND D.EmplSort = basedata.EmplSort
LEFT JOIN L
ON L.Empl = basedata.Empl AND L.EmplSort = basedata.EmplSort
WHERE E.ECode IS NOT NULL OR D.DCode IS NOT NULL OR L.LCode IS NOT NULL
ORDER BY basedata.Empl, basedata.EmplSort
Not sure if it is what you need, but have you tried COALESCE?
SELECT Name, Class, Color, ProductNumber,
COALESCE(Class, Color, ProductNumber) AS FirstNotNull
FROM Production.Product ;
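For what it's worth (my note, not the answerer's), COALESCE returns the first non-null value among the columns of a single row, so by itself it cannot pull values up from succeeding rows:
-- returns 'Union': the first non-null argument wins
SELECT COALESCE(NULL, 'Union', NULL) AS first_not_null;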
I have a solution, but it is very kludgy. If anyone has something better, that would be great.
However, here is an algorithm:
1) Get row numbers for each distinct list of values in the columns.
2) Join all columns based on row number.
Example (rowid here stands for a row number you would have to generate yourself, e.g. with ROW_NUMBER(); SQL Server has no built-in rowid):
select Distinct ECode into #Ecode from source_table order by rowid;
select Distinct DCode into #Dcode from source_table order by rowid;
select Distinct LCode into #Lcode from source_table order by rowid;
select Distinct Earn into #Earn from source_table order by rowid;
select Distinct Dedn into #Dedn from source_table order by rowid;
select Distinct Liab into #Liab from source_table order by rowid;
select b.ECode, c.DCode, d.LCode, e.Earn, f.Dedn, g.Liab
from source_table a -- Note: a source for row numbers that will be >= the below
left outer join #Ecode b on a.rowid = b.rowid
left outer join #DCode c on a.rowid = c.rowid
left outer join #LCode d on a.rowid = d.rowid
left outer join #Earn e on a.rowid = e.rowid
left outer join #Dedn f on a.rowid = f.rowid
left outer join #Liab g on a.rowid = g.rowid
where
b.ecode is not null or
c.dcode is not null or
d.lcode is not null or
e.earn is not null or
f.dedn is not null or
g.liab is not null;
I didn't include Empl, since I don't know what role you want it to play. If this is all true for a given Empl, then you could just add it, join on it, and carry it through.
I don't like this solution at all, so hopefully someone else will come up with something more elegant.
Best,
David