Return rows which have the same values in two columns, but different values in another

I have a table that looks like this:
 id  | name       | address         | code
-----+------------+-----------------+------
 101 | joe smith  | 1 long road     | SC1
 102 | joe smith  | 6 long road     | SC1
 103 | amy hughes | 5 hillside lane | SC5
 104 | amy hughes | 5 hillside lane | SC5
I want to return the rows that are duplicates on name and code but have different address values.
I originally had something like this, which looks for duplicates across the name, address and code columns:
SELECT name, address, code, count(*)
FROM table_name
GROUP BY 1,2,3
HAVING count(*) >1;
Is there a way I can expand on the above to only return rows that have the same name and code but different address fields?
In my example data above, I would only want to return:
 id  | name      | address     | code
-----+-----------+-------------+------
 101 | joe smith | 1 long road | SC1
 102 | joe smith | 6 long road | SC1

Remove address from the select list and GROUP BY and use count(DISTINCT):
SELECT name, code, count(DISTINCT address)
FROM table_name
GROUP BY name, code
HAVING count(DISTINCT address) > 1;
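The query above returns one row per (name, code) group rather than the individual rows. If the original rows with their id values are needed, one option is an EXISTS check for another row with the same name and code but a different address. A minimal sketch, assuming the sample data from the question (the temporary table is created here only for illustration):

```sql
-- Sample data from the question, recreated as a temporary table (assumption for illustration).
CREATE TEMP TABLE table_name (id int, name text, address text, code text);
INSERT INTO table_name VALUES
    (101, 'joe smith',  '1 long road',     'SC1'),
    (102, 'joe smith',  '6 long road',     'SC1'),
    (103, 'amy hughes', '5 hillside lane', 'SC5'),
    (104, 'amy hughes', '5 hillside lane', 'SC5');

-- Keep a row only if some other row shares name and code but differs in address.
SELECT t.id, t.name, t.address, t.code
FROM table_name t
WHERE EXISTS (
    SELECT 1
    FROM table_name o
    WHERE o.name = t.name
      AND o.code = t.code
      AND o.address <> t.address
)
ORDER BY t.id;
```

On the sample data this should return rows 101 and 102. A NULL address never satisfies <>, so use IS DISTINCT FROM instead if NULL should count as a different value.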

Related

How to select rows based on properties of another row?

I have a table like this:
 a_id | name | r_id | message         | date
------+------+------+-----------------+-------
 1    | bob  | 77   | bob here        | 1-jan
 1    | bob  | 77   | bob here again  | 2-jan
 2    | jack | 77   | jack here.      | 2-jan
 1    | bob  | 79   | in another room | 3-feb
 3    | gill | 79   | gill here       | 4-feb
These are accounts (a_id) chatting inside different rooms (r_id).
I'm trying to find the last chat message for every room that jack (a_id = 2) is chatting in.
What I've tried so far is DISTINCT ON (r_id) ... ORDER BY r_id, date DESC.
But this incorrectly gives me the last message in every room, instead of only the last message in every room that jack belongs to:
 a_id | name | r_id | message    | date
------+------+------+------------+-------
 2    | jack | 77   | jack here. | 2-jan
 3    | gill | 79   | gill here  | 4-feb
Is this a partitioning problem rather than a DISTINCT ON problem?

I would suggest:
to group the rows by r_id with a GROUP BY clause
to select only the groups that include a_id = 2, with a HAVING clause that aggregates the a_id values of each group: HAVING array_agg(a_id) @> array[2]
to select the latest message of each selected group by aggregating its rows into an array with ORDER BY date DESC and taking the first element of the array: (array_agg(t.*))[1]
to convert the selected row into a json object and then display the expected result using the json_populate_record function
The full query is:
SELECT (json_populate_record(null :: my_table, (array_agg(to_json(t.*)))[1])).*
FROM my_table AS t
GROUP BY r_id
HAVING array_agg(a_id) @> array[2]
and the result is:
 a_id | name | r_id | message  | date
------+------+------+----------+------------
 1    | bob  | 77   | bob here | 2022-01-01
see dbfiddle

For the last message in every chat room, it would simply be:
select a_id, name, r_id, to_char(max(date), 'dd-mon')
from chats
where a_id = 2
group by r_id, a_id, name;
Fiddle https://www.db-fiddle.com/f/keCReoaXg2eScrhFetEq1b/0
Or, to see the messages as well:
with last_message as (
    select a_id, name, r_id, to_char(max(date), 'dd-mon') date
    from chats
    where a_id = 1
    group by r_id, a_id, name
)
select l.*, c.message
from last_message l
join chats c on (c.a_id = l.a_id and l.r_id = c.r_id and l.date = to_char(c.date, 'dd-mon'));
Fiddle https://www.db-fiddle.com/f/keCReoaXg2eScrhFetEq1b/1
Though all this complication could be avoided with a primary key on your table.
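Since the question already uses DISTINCT ON, another way to restrict it to jack's rooms is to filter the rooms with EXISTS before picking the latest row per room. A minimal sketch, assuming a table named chats with a real date column and the year 2022 for the sample dates (both are assumptions for illustration):

```sql
-- Sample data from the question; table name, column types and the year are assumed.
CREATE TEMP TABLE chats (a_id int, name text, r_id int, message text, date date);
INSERT INTO chats VALUES
    (1, 'bob',  77, 'bob here',        '2022-01-01'),
    (1, 'bob',  77, 'bob here again',  '2022-01-02'),
    (2, 'jack', 77, 'jack here.',      '2022-01-02'),
    (1, 'bob',  79, 'in another room', '2022-02-03'),
    (3, 'gill', 79, 'gill here',       '2022-02-04');

-- Keep only rooms in which a_id = 2 has posted, then take the newest row per room.
SELECT DISTINCT ON (c.r_id) c.*
FROM chats c
WHERE EXISTS (
    SELECT 1
    FROM chats j
    WHERE j.r_id = c.r_id
      AND j.a_id = 2
)
ORDER BY c.r_id, c.date DESC;
```

On the sample data this should return a single row for room 77 and nothing for room 79. Two messages in room 77 share the date 2 Jan, so which of them is returned is arbitrary unless the table has a finer timestamp or another column to add as a tie-breaker in the ORDER BY.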

Postgres join when only one row is equal

I have two tables and I want to do an inner join between table_1 and table_2, but only when exactly one row in table_2 meets the join criteria.
For example:
table_1
 id | name        | age
----+-------------+-----
 1  | john jones  | 10
 2  | pete smith  | 15
 3  | mary lewis  | 12
 4  | amy roberts | 13
table_2
 id | name       | age | hair   | height
----+------------+-----+--------+--------
 1  | john jones | 10  | brown  | 100
 2  | john jones | 10  | blonde | 132
 3  | mary lewis | 12  | brown  | 146
 4  | pete smith | 15  | black  | 171
So I want to join when name is equal, but only when there is exactly one matching name in table_2.
So my results would look like this:
 id | name       | age | hair
----+------------+-----+-------
 2  | pete smith | 15  | black
 3  | mary lewis | 12  | brown
As you can see, John Jones isn't in the results as there are two corresponding rows in table_2.
My initial code looks like this:
select tb.id,tb.name,tb.age,sc.hair
from table_1 tb
inner join table_2 sc
on tb.name = sc.name and tb.age = sc.age
Can I apply a clause within the join so that it only joins on rows which are unique matches?

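Group by the table_1 columns and apply having count(*) = 1. Keep sc.hair out of the GROUP BY: otherwise each distinct hair value forms its own group with a count of 1, and John Jones is returned twice. Since every surviving group holds exactly one row, max(sc.hair) simply returns that row's value.
select tb.id, tb.name, tb.age, max(sc.hair) as hair
from table_1 tb
join table_2 sc
on tb.name = sc.name and tb.age = sc.age
group by tb.id, tb.name, tb.age
having count(*) = 1
The interesting thing to note is that you don't need the aggregate expression (in this case count(*)) in the select clause.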
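An alternative that avoids grouping the joined result is to count the matches per (name, age) inside table_2 with a window function and join only to rows whose count is 1. This is a sketch under the assumption that the sample tables look as in the question; the temporary tables and the matches alias are illustrative only:

```sql
-- Sample data from the question, recreated as temporary tables (assumption for illustration).
CREATE TEMP TABLE table_1 (id int, name text, age int);
CREATE TEMP TABLE table_2 (id int, name text, age int, hair text, height int);
INSERT INTO table_1 VALUES
    (1, 'john jones', 10), (2, 'pete smith', 15),
    (3, 'mary lewis', 12), (4, 'amy roberts', 13);
INSERT INTO table_2 VALUES
    (1, 'john jones', 10, 'brown', 100), (2, 'john jones', 10, 'blonde', 132),
    (3, 'mary lewis', 12, 'brown', 146), (4, 'pete smith', 15, 'black', 171);

-- Count how many table_2 rows share each (name, age), then join only to unique ones.
SELECT tb.id, tb.name, tb.age, sc.hair
FROM table_1 tb
JOIN (
    SELECT name, age, hair,
           count(*) OVER (PARTITION BY name, age) AS matches
    FROM table_2
) sc ON sc.name = tb.name AND sc.age = tb.age
WHERE sc.matches = 1
ORDER BY tb.id;
```

On the sample data this should return pete smith (black) and mary lewis (brown), with john jones excluded because two table_2 rows match him.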

Postgres GROUP BY an array column

I have a list of students and parents and would like to group them into families using the student IDs. Parents who share a student ID can be considered a family, and students who share a parent ID can likewise be considered a family. This is a sample table:
 p_id | parent_name | s_id | student_name
------+-------------+------+--------------
 1    | John Doe    | 100  | Mike Doe
 3    | Jane Doe    | 100  | Mike Doe
 3    | Jane Doe    | 105  | Lisa Doe
 5    | Will Willy  | 108  | William Son
I'd like to end up with something like:
 parents            | students
--------------------+--------------------
 John Doe, Jane Doe | Mike Doe, Lisa Doe
 Will Willy         | William Son
To achieve this I'm currently using:
SELECT array_agg(parents) AS parents FROM (
    SELECT array_agg(p_id) AS par_ids, array_agg(parent_name) AS parents, student_name, s_id
    FROM (
        /* sub query */
    ) b
    GROUP BY s_id, student_name
    ORDER BY parents ASC
) c
GROUP BY unnest(par_ids)
ORDER BY parents ASC
But I get an error: ERROR: cannot accumulate arrays of different dimensionality. SQL state: 2202E
How can I attain the desired results?
The inner query from the above statement returns:
 par_ids | parents              | student_name | s_id
---------+----------------------+--------------+------
 {1,3}   | {John Doe, Jane Doe} | Mike Doe     | 100
 {3}     | {Jane Doe}           | Lisa Doe     | 105
 {5}     | {Will Willy}         | William Son  | 108
Grouping these students now to the parents is where I'm stuck.

I did something similar (but a bit more complex) already here: https://stackoverflow.com/a/53129510/3984221
step-by-step demo:db<>fiddle
SELECT
    array_agg(parent_name) as parents,                 -- 4
    array_agg(student_name) as students
FROM (
    SELECT DISTINCT ON (t.s_id)                        -- 3
        *
    FROM (
        SELECT
            s_id,
            array_agg(p_id) as parents                 -- 1
        FROM mytable
        GROUP BY s_id
    ) s JOIN mytable t ON t.p_id = ANY(s.parents)      -- 2
    ORDER BY t.s_id, CARDINALITY(parents) DESC         -- 3
) s
GROUP BY parents
1. Aggregate the p_id values into an array:
 s_id | parents
------+---------
 108  | {5}
 105  | {3}
 100  | {1,3}
2. Self-join the original table on this array:
 s_id | parents | p_id | parent_name | s_id | student_name
------+---------+------+-------------+------+--------------
 100  | {1,3}   | 1    | John Doe    | 100  | Mike Doe
 105  | {3}     | 3    | Jane Doe    | 100  | Mike Doe
 100  | {1,3}   | 3    | Jane Doe    | 100  | Mike Doe
 105  | {3}     | 3    | Jane Doe    | 105  | Lisa Doe
 100  | {1,3}   | 3    | Jane Doe    | 105  | Lisa Doe
 108  | {5}     | 5    | Will Willy  | 108  | William Son
3. Remove all duplicate student records. The remaining ones should be the records with the most complete p_id array. This can be done using DISTINCT ON(s_id) on a descending order by the array length:
 s_id | parents | p_id | parent_name | s_id | student_name
------+---------+------+-------------+------+--------------
 100  | {1,3}   | 1    | John Doe    | 100  | Mike Doe
 100  | {1,3}   | 3    | Jane Doe    | 105  | Lisa Doe
 108  | {5}     | 5    | Will Willy  | 108  | William Son
4. Finally you can group by the p_id array and aggregate the two name columns:
 parents                 | students
-------------------------+-------------------------
 {"John Doe","Jane Doe"} | {"Mike Doe","Lisa Doe"}
 {"Will Willy"}          | {"William Son"}
If you don't want an array but a string list, you can use string_agg(name_column, ',') instead of array_agg(name_column).
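To illustrate only the difference between the two aggregates (not the full family grouping), here is a small sketch that lists each student's parents both ways. It assumes the sample table from the question under the name mytable; the ORDER BY inside the aggregates is added here to make the element order deterministic:

```sql
-- Sample data from the question, recreated as a temporary table (assumption for illustration).
CREATE TEMP TABLE mytable (p_id int, parent_name text, s_id int, student_name text);
INSERT INTO mytable VALUES
    (1, 'John Doe',   100, 'Mike Doe'),
    (3, 'Jane Doe',   100, 'Mike Doe'),
    (3, 'Jane Doe',   105, 'Lisa Doe'),
    (5, 'Will Willy', 108, 'William Son');

-- array_agg returns a text[] value, string_agg returns one delimited text value.
SELECT s_id,
       array_agg(parent_name ORDER BY p_id)        AS parents_array,
       string_agg(parent_name, ', ' ORDER BY p_id) AS parents_list
FROM mytable
GROUP BY s_id
ORDER BY s_id;
```

For s_id 100 this should give the array {"John Doe","Jane Doe"} and the string John Doe, Jane Doe.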

postgresql write a materialized view query to include base record and no of records matching

I have two tables in PostgreSQL: users and orders.
users table
 userid | username | usertype
--------+----------+----------
 1      | John     | F
 2      | Bob      | P
orders table
 userid | orderid | ordername
--------+---------+-----------
 1      | 001     | Mobile
 1      | 002     | TV
 1      | 003     | Laptop
 2      | 001     | Book
 2      | 002     | Kindle
Now I want to write a query for a PostgreSQL materialized view that gives output like this:
 userid | username | Base Order Name | No of Orders | User Type
--------+----------+-----------------+--------------+------------
 1      | John     | Mobile          | 3            | F - Free
 2      | Bob      | Book            | 2            | P- Premium
I have tried the query below, but it returns five records instead of two, and I couldn't figure out how to show usertype as F - Free / P - Premium.
CREATE MATERIALIZED VIEW userorders
TABLESPACE pg_default
AS
SELECT
    u.userid,
    username,
    (select count(orderid) from orders where userid = u.userid) as no_of_orders,
    (select ordername from orders where orderid=1 and userid = u.userid) as baseorder
FROM users u
INNER JOIN orders o ON u.userid = o.userid
WITH DATA;
It gives a result like this:
 userid | username | no_of_orders | baseorder
--------+----------+--------------+-----------
 1      | John     | 3            | Mobile
 1      | John     | 3            | Mobile
 1      | John     | 3            | Mobile
 2      | Bob      | 2            | Book
 2      | Bob      | 2            | Book
Assume the base order id is always 001. In the final materialized view, the user type should be returned as F - Free / P - Premium through some mapping in the query.

Use a group by and this becomes pretty trivial. The only slightly complex part is getting the base order name, but this can be accomplished using FILTER:
select users.userid,
       username,
       max(ordername) FILTER (WHERE orderid='001') as "Base Order Name",
       count(orderid) as "No of Orders",
       CASE WHEN usertype = 'F' THEN 'F - Free'
            WHEN usertype = 'P' THEN 'P- Premium'
       END as "User Type"
FROM users
JOIN orders on users.userid = orders.userid
GROUP BY users.userid, users.username, users.usertype;
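Wrapped into the materialized view the question asks for, the grouped query might look like the sketch below. It assumes orderid is stored as text (it is shown as 001), uses regular tables because PostgreSQL does not allow a materialized view over temporary tables, and picks plain column aliases such as base_order_name for illustration:

```sql
-- Sample data from the question; column types are assumptions.
CREATE TABLE users  (userid int, username text, usertype text);
CREATE TABLE orders (userid int, orderid text, ordername text);
INSERT INTO users VALUES (1, 'John', 'F'), (2, 'Bob', 'P');
INSERT INTO orders VALUES
    (1, '001', 'Mobile'), (1, '002', 'TV'), (1, '003', 'Laptop'),
    (2, '001', 'Book'),   (2, '002', 'Kindle');

-- One row per user: base order name, order count, and a mapped user type label.
CREATE MATERIALIZED VIEW userorders AS
SELECT u.userid,
       u.username,
       max(o.ordername) FILTER (WHERE o.orderid = '001') AS base_order_name,
       count(o.orderid) AS no_of_orders,
       CASE u.usertype
            WHEN 'F' THEN 'F - Free'
            WHEN 'P' THEN 'P - Premium'
       END AS user_type
FROM users u
JOIN orders o ON o.userid = u.userid
GROUP BY u.userid, u.username, u.usertype
WITH DATA;

SELECT * FROM userorders ORDER BY userid;
```

Because of the inner join, users without any orders do not appear in the view; a LEFT JOIN would keep them with a count of 0. The stored result only changes when REFRESH MATERIALIZED VIEW userorders is run.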

PostgreSQL COUNT DISTINCT on one column while checking duplicates of another column

I have a query that results in such a table:
guardian_id | child_id | guardian_name | relation | child_name |
------------|----------|---------------|----------|------------|
1 | 1 | John Doe | father | Doe Son |
2 | 1 | Jane Doe | mother | Doe Son |
3 | 2 | Peter Pan | father | Pan Dghter |
4 | 2 | Pet Pan | mother | Pan Dghter |
1 | 3 | John Doe | father | Doe Dghter |
2 | 3 | Jane Doe | mother | Doe Dghter |
So from these results, I need to count the families, that is, distinct children with the same guardians. From the results above, there are 3 children but 2 families. How can I achieve this?
If I do:
SELECT COUNT(DISTINCT child_id) as families FROM (
    -- larger query
) a
I'll get 3, which is not correct.
Alternatively, how can I incorporate a WHERE clause that checks for DISTINCT guardian_id values? Any other approaches?
Also note that there are instances where a child may have only one guardian.

To get the distinct families, you can try the following approach.
select distinct array_agg(distinct guardian_id)
from family
group by child_id;
The above query will return the list of unique families,
e.g.
{1,2}
{3,4}
Now you can apply the count on top of it.
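Applying the count on top could be sketched as follows, assuming the query result from the question is available as a table named family (recreated here as a temporary table for illustration). The ORDER BY inside array_agg is added so that the same set of guardians always yields the same array:

```sql
-- Sample data from the question, recreated as a temporary table (assumption for illustration).
CREATE TEMP TABLE family (guardian_id int, child_id int, guardian_name text,
                          relation text, child_name text);
INSERT INTO family VALUES
    (1, 1, 'John Doe',  'father', 'Doe Son'),
    (2, 1, 'Jane Doe',  'mother', 'Doe Son'),
    (3, 2, 'Peter Pan', 'father', 'Pan Dghter'),
    (4, 2, 'Pet Pan',   'mother', 'Pan Dghter'),
    (1, 3, 'John Doe',  'father', 'Doe Dghter'),
    (2, 3, 'Jane Doe',  'mother', 'Doe Dghter');

-- Build one guardian set per child, deduplicate the sets, then count them.
SELECT count(*) AS families
FROM (
    SELECT DISTINCT array_agg(DISTINCT guardian_id ORDER BY guardian_id) AS guardians
    FROM family
    GROUP BY child_id
) AS f;
```

On the sample data this should return 2. A child with a single guardian forms a one-element array, so it is counted as its own family unless another child has exactly the same guardian.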
