Grouping user id columns together with string_agg on PostgreSQL 13

This is my emails table:
create table emails (
  id bigint not null primary key generated by default as identity,
  name text not null
);
And the contacts table:
create table contacts (
  id bigint not null primary key generated by default as identity,
  email_id bigint not null,
  user_id bigint not null,
  full_name text not null,
  ordering int not null
);
As you can see, I have a user_id field here. There can be multiple rows with the same user ID in my result, so I want to join them with a comma (,).
Insert some data into the tables:
insert into emails (name)
values
('dennis1'),
('dennis2');
insert into contacts (id, email_id, user_id, full_name, ordering)
values
(5, 1, 1, 'dennis1', 9),
(6, 2, 1, 'dennis1', 5),
(7, 2, 1, 'dennis1', 1),
(8, 1, 3, 'john', 2),
(9, 2, 4, 'dennis7', 1),
(10, 2, 4, 'dennis7', 1);
My query is:
select em.name,
       c.user_ids
from emails em
join (
  select email_id, string_agg(user_id::text, ',' order by ordering desc) as user_ids
  from contacts
  group by email_id
) c on c.email_id = em.id
order by em.name;
Actual Result:
name     user_ids
dennis1  1,3
dennis2  1,1,4,4
Expected Result:
name     user_ids
dennis1  1,3
dennis2  1,4
On my real-world data, the same user id can appear some 50 times when it should appear only once. In the example above, users 1 and 4 each appear twice for dennis2.
How can I deduplicate them?
Demo: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=2e957b52eb46742f3ddea27ec36effb1
P.S.: I tried adding user_id to the GROUP BY, but then I get duplicate rows...

demo:db<>fiddle
SELECT
  name,
  string_agg(user_id::text, ',' ORDER BY ordering DESC)
FROM (
  SELECT DISTINCT ON (em.id, c.user_id)
    *
  FROM emails em
  JOIN contacts c ON c.email_id = em.id
) s
GROUP BY name;
Join the tables.
DISTINCT ON the email id and the user_id, so that for every email record there are no duplicate users.
Aggregate.
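If the output order of the ids doesn't matter, you can also deduplicate inside the aggregate itself. A sketch; note that Postgres won't accept ORDER BY ordering together with DISTINCT here, because in an aggregate with DISTINCT the ORDER BY expression must appear in the argument list:
SELECT em.name,
       string_agg(DISTINCT c.user_id::text, ',') AS user_ids
FROM emails em
JOIN contacts c ON c.email_id = em.id
GROUP BY em.name
ORDER BY em.name;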

Related

Enforcing a unique relationship over multiple columns where one column is nullable

Given the table
ID  PERSON_ID  PLAN  EMPLOYER_ID  TERMINATION_DATE
1   123        ABC   321          2020-01-01
2   123        DEF   321          (null)
3   123        ABC   321          (null)
4   123        ABC   321          (null)
I want to exclude the 4th entry. (The 3rd entry shows the person was re-hired and therefore is a new relationship. I'm only showing relevant fields)
My first attempt was to simply create a unique index over PERSON_ID / PLAN / EMPLOYER_ID / TERMINATION_DATE, thinking that DB2 for IBM i considered nulls equal in a unique index. I was evidently wrong...
Is there a way to enforce uniqueness over these columns, or,
is there a better way to approach the value of termination date? (null is not technically correct; I'm thinking of it as more true/false, but the business logic needs a date)
Edit
According to the docs for 7.3:
UNIQUE
Prevents the table from containing two or more rows with the same value of the index key. When UNIQUE is used, all null values for a column are considered equal. For example, if the key is a single column that can contain null values, that column can contain only one null value. The constraint is enforced when rows of the table are updated or new rows are inserted.
The constraint is also checked during the execution of the CREATE INDEX statement. If the table already contains rows with duplicate key values, the index is not created.
UNIQUE WHERE NOT NULL
Prevents the table from containing two or more rows with the same value of the index key, where all null values for a column are not considered equal. Multiple null values in a column are allowed. Otherwise, this is identical to UNIQUE.
So, the behavior I'm seeing looks more like UNIQUE WHERE NOT NULL. When I generate SQL for this table, I see
ADD CONSTRAINT TERMEMPPLANSSN
UNIQUE( TERMINATION_DATE , EMPLOYERID , PLAN_CODE , SSN ) ;
(note this is showing the real field names, not the ones I used in my example)
Edit 2
Bottom line: Constraint !== Index. When I went back and created an actual index, I got the desired behavior.
CREATE TABLE PERSON
(
ID INT NOT NULL
, PERSON_ID INT NOT NULL
, PLAN CHAR(3) NOT NULL
, EMPLOYER_ID INT
, TERMINATION_DATE DATE
);
INSERT INTO PERSON (ID, PERSON_ID, PLAN, EMPLOYER_ID, TERMINATION_DATE)
VALUES
(1, 123, 'ABC', 321, DATE('2020-01-01'))
, (2, 123, 'DEF', 321, CAST(NULL AS DATE))
, (3, 123, 'ABC', 321, CAST(NULL AS DATE))
WITH NC;
--- To not allow: ---
INSERT INTO PERSON (ID, PERSON_ID, PLAN, EMPLOYER_ID, TERMINATION_DATE) VALUES
(4, 123, 'ABC', 321, CAST(NULL AS DATE))
or
(4, 123, 'ABC', 321, DATE('2020-01-01'))
You may:
CREATE UNIQUE INDEX PERSON_U1 ON PERSON
(PERSON_ID, PLAN, EMPLOYER_ID, TERMINATION_DATE);
--- To not allow: ---
INSERT INTO PERSON (ID, PERSON_ID, PLAN, EMPLOYER_ID, TERMINATION_DATE) VALUES
(4, 123, 'ABC', 321, DATE('2020-01-01'))
but allow multiple:
(X, 123, 'ABC', 321, CAST(NULL AS DATE))
(Y, 123, 'ABC', 321, CAST(NULL AS DATE))
...
You may:
CREATE UNIQUE WHERE NOT NULL INDEX PERSON_U2 ON PERSON
(PERSON_ID, PLAN, EMPLOYER_ID, TERMINATION_DATE);
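As an aside, since this compilation is Postgres-tagged: a plain unique index in PostgreSQL treats nulls as distinct, i.e. it behaves like DB2's UNIQUE WHERE NOT NULL, while PostgreSQL 15+ can mimic DB2's plain UNIQUE via NULLS NOT DISTINCT. A sketch:
-- PostgreSQL 15+: nulls compare equal, so only one null TERMINATION_DATE
-- is allowed per (PERSON_ID, PLAN, EMPLOYER_ID)
CREATE UNIQUE INDEX PERSON_U3 ON PERSON
(PERSON_ID, PLAN, EMPLOYER_ID, TERMINATION_DATE) NULLS NOT DISTINCT;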

Postgres very hard dynamic select statement with COALESCE

Having tables and data like this:
CREATE TABLE solicitations
(
id SERIAL PRIMARY KEY,
name text
);
CREATE TABLE donations
(
id SERIAL PRIMARY KEY,
solicitation_id integer REFERENCES solicitations, -- can be null
created_at timestamp without time zone NOT NULL DEFAULT (now() at time zone 'utc'),
amount bigint NOT NULL DEFAULT 0
);
INSERT INTO solicitations (name) VALUES
('solicitation1'), ('solicitation2');
INSERT INTO donations (created_at, solicitation_id, amount) VALUES
('2018-06-26', null, 10), ('2018-06-26', 1, 20), ('2018-06-26', 2, 30),
('2018-06-27', null, 10), ('2018-06-27', 1, 20),
('2018-06-28', null, 10), ('2018-06-28', 1, 20), ('2018-06-28', 2, 30);
How can I make the solicitation ids dynamic in the following select statement, using only Postgres?
SELECT
"created_at"
-- make dynamic this begins
, COALESCE("no_solicitation", 0) AS "no_solicitation"
, COALESCE("1", 0) AS "1"
, COALESCE("2", 0) AS "2"
-- make dynamic this ends
FROM crosstab(
$source_sql$
SELECT
created_at::date as row_id
, COALESCE(solicitation_id::text, 'no_solicitation') as category
, SUM(amount) as value
FROM donations
GROUP BY row_id, category
ORDER BY row_id, category
$source_sql$
, $category_sql$
-- parametrize with ids from here begins
SELECT unnest('{no_solicitation}'::text[] || ARRAY(SELECT DISTINCT id::text FROM solicitations ORDER BY id))
-- parametrize with ids from here ends
$category_sql$
) AS ct (
"created_at" date
-- make dynamic this begins
, "no_solicitation" bigint
, "1" bigint
, "2" bigint
-- make dynamic this ends
)
The select should return data like this:
created_at  no_solicitation  1   2
----------------------------------
2018-06-26  10               20  30
2018-06-27  10               20  0
2018-06-28  10               20  30
The solicitation ids that should parametrize select are the same as in
SELECT unnest('{no_solicitation}'::text[] || ARRAY(SELECT DISTINCT id::text FROM solicitations ORDER BY id))
One can fiddle the code here
I decided to use json, which is much simpler than crosstab:
WITH
all_solicitation_ids AS (
SELECT
unnest('{no_solicitation}'::text[] ||
ARRAY(SELECT DISTINCT id::text FROM solicitations ORDER BY id))
AS col
)
, all_days AS (
SELECT
-- TODO: compute days ad hoc, from min created_at day of donations to max created_at day of donations
generate_series('2018-06-26', '2018-06-28', '1 day'::interval)::date
AS col
)
, all_days_and_all_solicitation_ids AS (
SELECT
all_days.col AS created_at
, all_solicitation_ids.col AS solicitation_id
FROM all_days, all_solicitation_ids
ORDER BY all_days.col, all_solicitation_ids.col
)
, donations_ AS (
SELECT
created_at::date as created_at
, COALESCE(solicitation_id::text, 'no_solicitation') as solicitation_id
, SUM(amount) as amount
FROM donations
GROUP BY created_at, solicitation_id
ORDER BY created_at, solicitation_id
)
, donations__ AS (
SELECT
all_days_and_all_solicitation_ids.created_at
, all_days_and_all_solicitation_ids.solicitation_id
, COALESCE(donations_.amount, 0) AS amount
FROM all_days_and_all_solicitation_ids
LEFT JOIN donations_
ON all_days_and_all_solicitation_ids.created_at = donations_.created_at
AND all_days_and_all_solicitation_ids.solicitation_id = donations_.solicitation_id
)
SELECT
jsonb_object_agg(solicitation_id, amount) ||
jsonb_object_agg('date', created_at)
AS data
FROM donations__
GROUP BY created_at
which results in:
data
______________________________________________________________
{"1": 20, "2": 30, "date": "2018-06-28", "no_solicitation": 10}
{"1": 20, "2": 30, "date": "2018-06-26", "no_solicitation": 10}
{"1": 20, "2": 0, "date": "2018-06-27", "no_solicitation": 10}
Though it's not quite what I requested: it returns only a single data column instead of date, no_solicitation, 1, 2, ....
To get separate columns I'd need json_to_record, but I don't know how to produce its column-definition argument dynamically.
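A sketch for the TODO above: the all_days CTE can derive the day range from the data instead of hard-coding it (this assumes donations has at least one row; Postgres evaluates the aggregates first, then applies the set-returning function to that single row):
, all_days AS (
SELECT
generate_series(min(created_at)::date,
                max(created_at)::date,
                '1 day'::interval)::date
AS col
FROM donations
)
As for turning the json back into named columns: a SQL query's output column list must be known at parse time, so a fully dynamic column set requires dynamic SQL, e.g. a plpgsql function that builds the crosstab or json_to_record call as a string and EXECUTEs it.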

How to filter a query based on jsonb data?

I'm not even sure if it's possible to do this kind of query in Postgres; at least, I'm stuck.
I have two tables: a product recommendation list, containing multiple products to be recommended to a particular customer; and a transaction table indicating the product bought by a customer and the transaction details.
I'm trying to track the performance of my recommendations by plotting all the transactions that match the recommendations (both customer and product).
Below is my test case.
Kindly help
create table if not exists productRec( --Product Recommendation list
task_id int,
customer_id int,
detail jsonb);
truncate productRec;
insert into productRec values (1, 2, '{"1":{"score":5, "name":"KitKat"},
"4":{"score":2, "name":"Yuppi"}
}'),
(1, 3, '{"1":{"score":3, "name":"Yuppi"},
"4":{"score":2, "name":"GoldenSnack"}
}'),
(1, 4, '{"1":{"score":3, "name":"Chickies"},
"4":{"score":2, "name":"Kitkat"}
}');
drop table if exists txn;
create table if not exists txn( --Transaction table
customer_id int,
item_id text,
txn_value numeric,
txn_date date);
truncate txn;
insert into txn values (1, 'Yuppi', 500, DATE '2001-01-01'), (2, 'Kitkat', 2000, DATE '2001-01-01'),
(3, 'Kitkat', 2000, DATE '2001-02-01'), (4, 'Chickies', 200, DATE '2001-09-01');
--> Query must plot:
--Transaction value vs date where the item_id is inside the recommendation for that customer
--ex: (2000, 2001-01-01), (200, 2001-09-01)
We can get each recommendation as its own row with jsonb_each. I don't know what to do with the keys, so I just take the value (still jsonb) and then the name inside it (the ->> operator outputs text).
select
customer_id,
(jsonb_each(detail)).value->>'name' as name
from productrec
So now we have a list of customer_ids and the item names they were recommended. Now we can just join this with the transactions.
select
txn.txn_value,
txn.txn_date
from txn
join (
select
customer_id,
(jsonb_each(detail)).value->>'name' as name
from productrec
) p ON (
txn.customer_id = p.customer_id AND
lower(txn.item_id) = lower(p.name)
);
In your example data you spelled Kitkat differently in the recommendation table for customer 2. I added lowercasing in the join condition to counter that, but it might not be the right solution.
txn_value | txn_date
-----------+------------
2000 | 2001-01-01
200 | 2001-09-01
(2 rows)
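The same idea reads a bit more cleanly with a lateral join instead of a set-returning function in the SELECT list; a sketch, equivalent under the same assumptions:
SELECT
  t.txn_value,
  t.txn_date
FROM productrec p
CROSS JOIN LATERAL jsonb_each(p.detail) AS rec(key, value)
JOIN txn t ON (
  t.customer_id = p.customer_id AND
  lower(t.item_id) = lower(rec.value ->> 'name')
);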

Create a GROUP BY query to show the latest row

So my tables are:
user_msgs: http://sqlfiddle.com/#!9/7d6a9
token_msgs: http://sqlfiddle.com/#!9/3ac0f
There are only these 4 users, as listed. When a user sends a message to another user, the query checks the token_msgs table's from_id and to_id to see whether a conversation between those two users has already started; if no token exists, it creates one and uses it in the user_msgs table. So the token is a unique field in these two tables.
Now, I want to list the users with whom user 1 has started a conversation. So if from_id or to_id is 1, those conversations should be listed.
There are multiple rows in the user_msgs table for conversations between the same users.
I think I need to use group_concat, but I'm not sure. I am trying to build a query that does this and shows the latest message of each conversation at the top, hence ORDER BY time DESC:
SELECT * FROM (SELECT * FROM user_msgs ORDER BY time DESC) as temp_messages GROUP BY token
Please help in building the query.
Thanks.
CREATE TABLE `token_msgs` (
`id` int(11) NOT NULL,
`from_id` int(100) NOT NULL,
`to_id` int(100) NOT NULL,
`token` varchar(50) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
--
-- Dumping data for table `token_msgs`
--
INSERT INTO `token_msgs` (`id`, `from_id`, `to_id`, `token`) VALUES
(1, 1, 2, '1omcda84om2'),
(2, 1, 3, '1omd0666om3'),
(3, 4, 1, '4om6713bom1'),
(4, 3, 4, '3om0e1abom4');
---
CREATE TABLE `user_msgs` (
`id` int(11) NOT NULL,
`token` varchar(50) NOT NULL,
`from_id` int(50) NOT NULL,
`to_id` int(50) NOT NULL,
`message` text NOT NULL,
`time` datetime NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
--
-- Dumping data for table `user_msgs`
--
INSERT INTO `user_msgs` (`id`, `token`, `from_id`, `to_id`, `message`, `time`) VALUES
(1, '1omcda84om2', 1, 2, '1 => 2\r\nCan I have your picture so I can show Santa what I want for Christmas?', '2016-08-14 22:50:34'),
(2, '1omcda84om2', 2, 1, 'Makeup tip: You\'re not in the circus.\r\n2=>1', '2016-08-14 22:51:26'),
(3, '1omd0666om3', 1, 3, 'Behind every fat woman there is a beautiful woman. No seriously, your in the way. 1=>3', '2016-08-14 22:52:08'),
(4, '1omd0666om3', 3, 1, 'Me: Siri, why am I alone? Siri: *opens front facing camera*', '2016-08-14 22:53:24'),
(5, '1omcda84om2', 1, 2, 'I know milk does a body good, but damn girl, how much have you been drinking? 1 => 2', '2016-08-14 22:54:36'),
(6, '4om6713bom1', 4, 1, 'Hi, Im interested in your profile. Please send your contact number and I will call you.', '2016-08-15 00:18:11'),
(7, '3om0e1abom4', 3, 4, 'Girl you\'re like a car accident, cause I just can\'t look away. 3=>4', '2016-08-15 00:42:57'),
(8, '3om0e1abom4', 3, 4, 'Hola!! \r\n3=>4', '2016-08-15 00:43:34'),
(9, '1omd0666om3', 3, 1, 'Sometext from 3=>1', '2016-08-15 13:53:54'),
(10, '3om0e1abom4', 3, 4, 'More from 3->4', '2016-08-15 13:54:46');
Let's try this (on fiddle):
SELECT *
FROM (SELECT * FROM user_msgs
WHERE from_id = 1 OR to_id = 1
ORDER BY id DESC
) main
GROUP BY from_id + to_id
ORDER BY id DESC
A thing to mention: GROUP BY from_id + to_id. This works because, with user 1 fixed on one side by the WHERE clause, the sum identifies each conversation between two persons: from 1 to 3 is the same as from 3 to 1. (In general a bare sum could collide, e.g. 1+4 = 2+3, so it relies on that filter.) No need for an extra table, which would make this harder to maintain.
UPDATE:
Because this kind of GROUPing sometimes works weirdly in MySQL, I've created a new approach to this problem:
SELECT
a.*
FROM user_msgs a
LEFT JOIN user_msgs b
ON ((b.`from_id` = a.`from_id` AND b.`to_id` = a.`to_id`)
OR (b.`from_id` = a.`to_id` AND b.`to_id` = a.`from_id`))
AND a.`id` < b.`id`
WHERE (a.from_id = 1 OR a.to_id = 1)
AND b.`id` IS NULL
ORDER BY a.id DESC
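On MySQL 8+ (or Postgres), a window-function variant avoids the self-join entirely. A sketch, using LEAST/GREATEST to treat 1 => 3 and 3 => 1 as the same conversation:
SELECT *
FROM (
  SELECT m.*,
         ROW_NUMBER() OVER (
           PARTITION BY LEAST(from_id, to_id), GREATEST(from_id, to_id)
           ORDER BY id DESC
         ) AS rn
  FROM user_msgs m
  WHERE from_id = 1 OR to_id = 1
) t
WHERE rn = 1
ORDER BY id DESC;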

How to query the data in a join table by two sets of joined records?

I've got three tables: users, courses, and grades, the latter of which joins users and courses with some metadata like the user's score for the course. I've created a SQLFiddle, though the site doesn't appear to be working at the moment. The schema looks like this:
CREATE TABLE users(
id INT,
name VARCHAR,
PRIMARY KEY (ID)
);
INSERT INTO users VALUES
(1, 'Beth'),
(2, 'Alice'),
(3, 'Charles'),
(4, 'Dave');
CREATE TABLE courses(
id INT,
title VARCHAR,
PRIMARY KEY (ID)
);
INSERT INTO courses VALUES
(1, 'Biology'),
(2, 'Algebra'),
(3, 'Chemistry'),
(4, 'Data Science');
CREATE TABLE grades(
id INT,
user_id INT,
course_id INT,
score INT,
PRIMARY KEY (ID)
);
INSERT INTO grades VALUES
(1, 2, 2, 89),
(2, 2, 1, 92),
(3, 1, 1, 93),
(4, 1, 3, 88);
I'd like to know how (if possible) to construct a query which specifies some users.id values (1, 2, 3) and courses.id values (1, 2, 3) and returns those users' grades.score values for those courses:
| name | Algebra | Biology | Chemistry |
|---------|---------|---------|-----------|
| Alice | 89 | 92 | |
| Beth | | 93 | 88 |
| Charles | | | |
In my application logic, I'll be receiving an array of user_ids and course_ids, so the query needs to select those users and courses dynamically by primary key. (The actual data set contains millions of users and tens of thousands of courses—the examples above are just a sample to work with.)
Ideally, the query would:
use the course titles as dynamic attributes/column headers for the users' score data
sort the row and column headers alphabetically
include empty/NULL cells if the user-course pair has no grades relationship
I suspect I may need some combination of JOINs and Postgresql's crosstab, but I can't quite wrap my head around it.
Update: learning that the terminology for this is "dynamic pivot", I found this SO answer which appears to be trying to solve a related problem in Postgres with crosstab()
I think a simple pivot query should work here, since you only have 4 courses in your data set to pivot.
SELECT t1.name,
MAX(CASE WHEN t3.title = 'Biology' THEN t2.score ELSE NULL END) AS Biology,
MAX(CASE WHEN t3.title = 'Algebra' THEN t2.score ELSE NULL END) AS Algebra,
MAX(CASE WHEN t3.title = 'Chemistry' THEN t2.score ELSE NULL END) AS Chemistry,
MAX(CASE WHEN t3.title = 'Data Science' THEN t2.score ELSE NULL END) AS Data_Science
FROM users t1
LEFT JOIN grades t2
ON t1.id = t2.user_id
LEFT JOIN courses t3
ON t2.course_id = t3.id
GROUP BY t1.name
Follow the link below for a running demo. I used MySQL because, as you have noticed, SQLFiddle seems to be perpetually busted for the other databases.
SQLFiddle
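To fold in the other requirements (restricting to the given ids, sorting alphabetically, keeping users with no grades), a sketch; the id lists here are the ones from the question and would be interpolated by your application:
SELECT u.name,
       MAX(CASE WHEN c.title = 'Algebra'   THEN g.score END) AS algebra,
       MAX(CASE WHEN c.title = 'Biology'   THEN g.score END) AS biology,
       MAX(CASE WHEN c.title = 'Chemistry' THEN g.score END) AS chemistry
FROM users u
LEFT JOIN grades g
  ON u.id = g.user_id AND g.course_id IN (1, 2, 3)
LEFT JOIN courses c
  ON g.course_id = c.id
WHERE u.id IN (1, 2, 3)
GROUP BY u.name
ORDER BY u.name;
The course filter lives in the join condition rather than the WHERE clause so that users with no matching grades (like Charles) still show up as all-null rows.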