Return rows which have the same values in two columns, but different values in another - postgresql

I have a table that looks like this:
id | name | address | code
-----------+--------------------------+--------------------+----------
101 | joe smith | 1 long road | SC1
102 | joe smith | 6 long road | SC1
103 | amy hughes | 5 hillside lane | SC5
104 | amy hughes | 5 hillside lane | SC5
I want to return the rows that are duplications based on name and code but have different address fields.
I had something like this originally (which looked for duplications across the name, address and code columns):
SELECT name, address, code, count(*)
FROM table_name
GROUP BY 1,2,3
HAVING count(*) >1;
Is there a way I can expand on the above to only return rows that have the same name and code but different address fields?
In my example data above, I would only want to return:
id | name | address | code
-----------+--------------------------+--------------------+----------
101 | joe smith | 1 long road | SC1
102 | joe smith | 6 long road | SC1

Remove address from the select list and GROUP BY and use count(DISTINCT):
SELECT name, code, count(DISTINCT address)
FROM table_name
GROUP BY name, code
HAVING count(DISTINCT address) > 1;
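The query above returns one line per (name, code) pair rather than the individual rows. A minimal sketch for returning the rows themselves, as in the expected output, is to keep every row for which another row exists with the same name and code but a different address. It assumes the table is called table_name, as in the question, and that address is never NULL (with NULLs, IS DISTINCT FROM would be needed instead of <>).

```sql
-- Return each row that has a "sibling" with the same name and code
-- but a different address.
SELECT t.id, t.name, t.address, t.code
FROM table_name AS t
WHERE EXISTS (
    SELECT 1
    FROM table_name AS o
    WHERE o.name = t.name
      AND o.code = t.code
      AND o.address <> t.address   -- assumes address is NOT NULL
)
ORDER BY t.id;
```

With the sample data, the two amy hughes rows share an address, so neither has a sibling with a different one and both are filtered out.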

Related

How to select rows based on properties of another row?

Had a question..
| a_id | name | r_id | message | date
_____________________________________________
| 1 | bob | 77 | bob here | 1-jan
| 1 | bob | 77 | bob here again | 2-jan
| 2 | jack | 77 | jack here. | 2-jan
| 1 | bob | 79 | in another room| 3-feb
| 3 | gill | 79 | gill here | 4-feb
These are basically accounts (a_id) chatting inside different rooms (r_id)
I'm trying to find the last chat message for every room that jack (a_id = 2) is chatting in.
What I've tried so far is using DISTINCT ON (r_id) ... ORDER BY r_id, date DESC.
But this incorrectly gives me the last message in every room instead of only the last message in every room that jack belongs to.
| 2 | jack | 77 | jack here. | 2-jan
| 3 | gill | 79 | gill here | 4-feb
Is this a partition problem instead of a DISTINCT ON one?
I would suggest:
to group the rows by r_id with a GROUP BY clause
to select only the groups where a_id = 2 is included, with a HAVING clause which aggregates the a_id of each group: HAVING array_agg(a_id) @> array[2]
to select the latest message of each selected group by aggregating its rows in an array with ORDER BY date DESC and selecting the first element of the array: (array_agg(t.*))[1]
to convert the selected rows into a json object and then display the expected result by using the json_populate_record function
The full query is:
SELECT (json_populate_record(null :: my_table, (array_agg(to_json(t.*)))[1])).*
FROM my_table AS t
GROUP BY r_id
HAVING array_agg(a_id) @> array[2]
and the result is :
a_id | name | r_id | message  | date
-----+------+------+----------+-----------
1    | bob  | 77   | bob here | 2022-01-01
see dbfiddle
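An alternative sketch, closer to the DISTINCT ON attempt in the question: keep only the rooms that contain a row with a_id = 2, then take the newest row per room. It assumes the table is called my_table, as in the answer above, and that date is a real date or timestamp column. The full query above has no ORDER BY inside array_agg, so which row it returns per room is not guaranteed; here the ordering is explicit, although two messages with the same date in one room are still tied and either may be returned.

```sql
-- Latest message per room, restricted to rooms in which account 2 has posted.
SELECT DISTINCT ON (t.r_id) t.*
FROM my_table AS t
WHERE EXISTS (
    SELECT 1
    FROM my_table AS j
    WHERE j.r_id = t.r_id
      AND j.a_id = 2          -- jack's account id, from the question
)
ORDER BY t.r_id, t.date DESC; -- the first row per r_id is the newest one
```

Adding a tie-breaking column to the ORDER BY (for example a message id, if the table had one) would make the result deterministic.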
For the last message in every chat room, it would simply be:
select a_id, name, r_id, to_char(max(date),'dd-mon') from chats
where a_id =2
group by r_id, a_id,name;
Fiddle https://www.db-fiddle.com/f/keCReoaXg2eScrhFetEq1b/0
Or, to see the messages as well:
with last_message as (
select a_id, name, r_id, to_char(max(date),'dd-mon') date from chats
where a_id =1
group by r_id, a_id,name
)
select l.*, c.message
from last_message l
join chats c on (c.a_id= l.a_id and l.r_id=c.r_id and l.date=to_char(c.date,'dd-mon'));
Fiddle https://www.db-fiddle.com/f/keCReoaXg2eScrhFetEq1b/1
Though all this complication could be avoided with a primary key on your table.

Postgres join when only one row is equal

I have two tables and I am wanting to do an inner join between table_1 and table_2 but only when there is one row in table_2 that meets the join criteria.
For example:
table_1
id | name | age |
-----------------+------------------+--------------+
1 | john jones | 10 |
2 | pete smith | 15 |
3 | mary lewis | 12 |
4 | amy roberts | 13 |
table_2
id | name | age | hair | height |
-----------------+------------------+--------------+--------------+--------------+
1 | john jones | 10 | brown | 100 |
2 | john jones | 10 | blonde | 132 |
3 | mary lewis | 12 | brown | 146 |
4 | pete smith | 15 | black | 171 |
So I want to do a join when name is equal, but only when there is one corresponding matching name in table_2
So my results would look like this:
id | name | age | hair |
-----------------+------------------+--------------+--------------+
2 | pete smith | 15 | black |
3 | mary lewis | 12 | brown |
As you can see, John Jones isn't in the results as there are two corresponding rows in table_2.
My initial code looks like this:
select tb.id,tb.name,tb.age,sc.hair
from table_1 tb
inner join table_2 sc
on tb.name = sc.name and tb.age = sc.age
Can I apply a clause within the join so that it only joins on rows which are unique matches?
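Group by the table_1 columns only and apply having count(*) = 1. Grouping by sc.hair as well would put each of john jones's two table_2 rows into its own group of one, so both would pass the filter.
select tb.id,tb.name,tb.age,max(sc.hair) as hair
from table_1 tb
join table_2 sc
on tb.name = sc.name and tb.age = sc.age
group by tb.id,tb.name,tb.age
having count(*) = 1
Each remaining group contains exactly one table_2 row, so max(sc.hair) just returns that row's hair value. The interesting thing to note is that you don't need the aggregate expression (in this case count(*)) in the select clause.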
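Another way to express "join only on unique matches" is to count the matching rows in table_2 before joining and keep those whose count is 1. This is a sketch using a window function; the alias matches is made up for illustration, and the table and column names are the ones from the question.

```sql
-- Count how many table_2 rows share each (name, age) pair,
-- then join only the rows that are the sole match.
SELECT tb.id, tb.name, tb.age, sc.hair
FROM table_1 AS tb
JOIN (
    SELECT name,
           age,
           hair,
           count(*) OVER (PARTITION BY name, age) AS matches  -- hypothetical alias
    FROM table_2
) AS sc
  ON sc.name = tb.name
 AND sc.age  = tb.age
WHERE sc.matches = 1;
```

This form keeps sc.hair as a plain column, so no aggregate is needed in the select list.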

Postgres GROUP BY an array column

I have a list of students and parents and would like to group them into families using the student id's. Parents who share common student id's can be considered to be a family while also students who share common parent id's can be considered to be a family. This is a sample table:
p_id | parent_name | s_id | student_name |
------------------------------------------|
1 | John Doe | 100 | Mike Doe |
3 | Jane Doe | 100 | Mike Doe |
3 | Jane Doe | 105 | Lisa Doe |
5 | Will Willy | 108 | William Son |
I'd like to end up with something like:
parents | students |
-------------------|------------------------|
John Doe, Jane Doe | Mike Doe, Lisa Doe |
Will Willy | William Son |
To achieve this I'm currently using:
SELECT array_agg(parents) AS parents FROM (
SELECT array_agg(p_id) AS par_ids, array_agg(parent_name) AS parents, student_name, s_id
FROM (
/* sub query */
)b
GROUP BY s_id, student_name
ORDER BY parents ASC
)c
GROUP BY unnest(par_ids)
ORDER BY parents ASC
But I get an error: ERROR: cannot accumulate arrays of different dimensionality. SQL state: 2202E
How can I attain the desired results?
The inner query from the above statement returns:
| par_ids | parents | student_name | s_id |
--------------------------------|------------------------|
| {1,3} | {John Doe, Jane Doe}| Mike Doe | 100 |
| {3} | {Jane Doe} | Lisa Doe | 105 |
| {5} | {Will Willy} | William Son | 108 |
Grouping these students now to the parents is where I'm stuck.
I did something similar (but a bit more complex) already here: https://stackoverflow.com/a/53129510/3984221
step-by-step demo:db<>fiddle
SELECT
    array_agg(parent_name) as parents,                  -- 4
    array_agg(student_name) as students
FROM (
    SELECT DISTINCT ON (t.s_id)                         -- 3
        *
    FROM (
        SELECT
            s_id,
            array_agg(p_id) as parents                  -- 1
        FROM mytable
        GROUP BY s_id
    ) s JOIN mytable t ON t.p_id = ANY(s.parents)       -- 2
    ORDER BY t.s_id, CARDINALITY(parents) DESC          -- 3
) s
GROUP BY parents
1. Aggregate the p_id values into an array:
s_id | parents
-----+--------
108  | {5}
105  | {3}
100  | {1,3}
2. Self-join the original table on this array:
s_id | parents | p_id | parent_name | s_id | student_name
-----+---------+------+-------------+------+-------------
100  | {1,3}   | 1    | John Doe    | 100  | Mike Doe
105  | {3}     | 3    | Jane Doe    | 100  | Mike Doe
100  | {1,3}   | 3    | Jane Doe    | 100  | Mike Doe
105  | {3}     | 3    | Jane Doe    | 105  | Lisa Doe
100  | {1,3}   | 3    | Jane Doe    | 105  | Lisa Doe
108  | {5}     | 5    | Will Willy  | 108  | William Son
3. Remove all duplicate student records. The remaining ones should be the records with the most complete p_id array. This can be done using DISTINCT ON(s_id) on a descending order by the array length:
s_id | parents | p_id | parent_name | s_id | student_name
-----+---------+------+-------------+------+-------------
100  | {1,3}   | 1    | John Doe    | 100  | Mike Doe
100  | {1,3}   | 3    | Jane Doe    | 105  | Lisa Doe
108  | {5}     | 5    | Will Willy  | 108  | William Son
4. Finally you can group by the p_id array and aggregate the two name columns:
parents                 | students
------------------------+------------------------
{"John Doe","Jane Doe"} | {"Mike Doe","Lisa Doe"}
{"Will Willy"}          | {"William Son"}
If you don't want to get an array, but a string list, you can use string_agg(name_column, ',') instead of array_agg(name_column)
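As a small illustration of the string_agg variant, the sketch below applies it to a hand-written stand-in for the de-duplicated rows (the output of the DISTINCT ON step). The CTE name deduplicated and the column name par_ids are made up for this example; in the real query the final SELECT would read from the subquery instead.

```sql
-- Hypothetical stand-in for the rows left after the DISTINCT ON step.
WITH deduplicated (par_ids, parent_name, student_name) AS (
    VALUES
        (ARRAY[1,3], 'John Doe',   'Mike Doe'),
        (ARRAY[1,3], 'Jane Doe',   'Lisa Doe'),
        (ARRAY[5],   'Will Willy', 'William Son')
)
SELECT
    string_agg(d.parent_name,  ', ') AS parents,   -- comma-separated text, not an array
    string_agg(d.student_name, ', ') AS students
FROM deduplicated AS d
GROUP BY d.par_ids;
```

Without an ORDER BY inside string_agg the order of names within each list is not guaranteed; string_agg(d.parent_name, ', ' ORDER BY d.parent_name) would fix it.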

postgresql write a materialized view query to include base record and no of records matching

I have two tables one is users and another is orders in postgresql.
users table
userid | username | usertype
1 | John | F
2 | Bob | P
orders table
userid | orderid | ordername
1 | 001 | Mobile
1 | 002 | TV
1 | 003 | Laptop
2 | 001 | Book
2 | 002 | Kindle
Now I want to write a query for a PostgreSQL materialized view that gives output like this:
userid | username | Base Order Name |No of Orders | User Type
1 | John | Mobile | 3 | F - Free
2 | Bob | Book | 2 | P- Premium
I have tried the query below, but it returns five records instead of two, and I couldn't figure out how to show usertype as F - Free / P - Premium:
CREATE MATERIALIZED VIEW userorders
TABLESPACE pg_default
AS
SELECT
u.userid,
username,
(select count(orderid) from orders where userid = u.userid)
as no_of_orders,
(select ordername from orders where orderid=1 and userid = u.userid)
as baseorder
FROM users u
INNER JOIN orders o ON u.userid = o.userid
WITH DATA;
It's giving result like below
userid | username | no_of_orders | baseorder
1 | John | 3 | Mobile
1 | John | 3 | Mobile
1 | John | 3 | Mobile
2 | Bob | 2 | Book
2 | Bob | 2 | Book
Assume base order id is always 001. In the final materialized view user type will return F - Free/ P - Premium by some mapping in query.
Use a group by and this becomes pretty trivial. The only slightly complex part is getting the base order name, but this can be accomplished using FILTER:
select users.userid,
username,
max(ordername) FILTER (WHERE orderid='001') as "Base Order Name",
count(orderid) as "No of Orders",
CASE WHEN usertype = 'F' THEN 'F - Free'
WHEN usertype = 'P' THEN 'P- Premium'
END as "User Type"
FROM users
JOIN orders on users.userid = orders.userid
GROUP BY users.userid, users.username, users.usertype;
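To turn this into the materialized view the question asks for, the same SELECT can be wrapped in CREATE MATERIALIZED VIEW. This is a sketch under a few assumptions: the view name userorders is taken from the question, orderid is stored as text (so the base order is '001'; with an integer column the comparison would be against 1), and the snake_case column aliases are made up here to avoid quoted identifiers.

```sql
CREATE MATERIALIZED VIEW userorders AS
SELECT u.userid,
       u.username,
       -- ordername of the base order only; other orders are ignored by FILTER
       max(o.ordername) FILTER (WHERE o.orderid = '001') AS base_order_name,
       count(o.orderid)                                  AS no_of_orders,
       CASE u.usertype
            WHEN 'F' THEN 'F - Free'
            WHEN 'P' THEN 'P - Premium'
       END                                               AS user_type
FROM users AS u
JOIN orders AS o ON o.userid = u.userid
GROUP BY u.userid, u.username, u.usertype   -- one row per user instead of one per order
WITH DATA;

-- The view is a snapshot; refresh it after users or orders change.
REFRESH MATERIALIZED VIEW userorders;
```

The five rows in the original attempt came from the join producing one row per order with no GROUP BY to collapse them.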

PostgreSQL COUNT DISTINCT on one column while checking duplicates of another column

I have a query that results in such a table:
guardian_id | child_id | guardian_name | relation | child_name |
------------|----------|---------------|----------|------------|
1 | 1 | John Doe | father | Doe Son |
2 | 1 | Jane Doe | mother | Doe Son |
3 | 2 | Peter Pan | father | Pan Dghter |
4 | 2 | Pet Pan | mother | Pan Dghter |
1 | 3 | John Doe | father | Doe Dghter |
2 | 3 | Jane Doe | mother | Doe Dghter |
So from these results, I need to count the families, that is, distinct children with the same guardians. From the results above, there are 3 children but 2 families. How can I achieve this?
If I do:
SELECT COUNT(DISTINCT child_id) as families FROM (
-- larger query
)a
I'll get 3 which is not correct.
Alternatively, how can I incorporate a WHERE clause that checks DISTINCT guardian_id's? Any other approaches?
Also note that there are instances where a child may have one guardian only.
To get the distinct family you can try the following approach.
select distinct array_agg(distinct guardian_id)
from family
group by child_id;
The above query will return the list of unique families.
eg.
{1,2}
{3,4}
Now you can apply the count on top of it.
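Applying the count could look like the sketch below, which wraps the query in a subquery and counts its rows. It assumes the same family table as above; the aliases guardians and f are made up for illustration.

```sql
-- Each distinct set of guardians is one family; count those sets.
SELECT count(*) AS families
FROM (
    SELECT DISTINCT array_agg(DISTINCT guardian_id) AS guardians
    FROM family
    GROUP BY child_id
) AS f;
```

array_agg(DISTINCT ...) returns the ids in sorted order, so two children with the same guardians produce identical arrays, and a child with a single guardian simply yields a one-element array.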