PostgreSQL COUNT DISTINCT on one column while checking duplicates of another column

I have a query that returns a table like this:
guardian_id | child_id | guardian_name | relation | child_name |
------------|----------|---------------|----------|------------|
1 | 1 | John Doe | father | Doe Son |
2 | 1 | Jane Doe | mother | Doe Son |
3 | 2 | Peter Pan | father | Pan Dghter |
4 | 2 | Pet Pan | mother | Pan Dghter |
1 | 3 | John Doe | father | Doe Dghter |
2 | 3 | Jane Doe | mother | Doe Dghter |
From these results, I need to count the families, where a family is a distinct set of guardians: children who share exactly the same guardians belong to the same family. From the results above, there are 3 children but 2 families. How can I achieve this?
If I do:
SELECT COUNT(DISTINCT child_id) AS families FROM (
-- larger query
) a
I get 3, which is not correct.
Alternatively, how can I incorporate a WHERE clause that checks for distinct guardian_ids? Are there any other approaches?
Also note that in some cases a child has only one guardian.

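To get the distinct families, you can try the following approach: aggregate each child's guardian_ids into a sorted array, then keep only the distinct arrays. The ORDER BY inside array_agg makes sure that two children with the same guardians always produce identical arrays.
select distinct array_agg(distinct guardian_id order by guardian_id)
from family
group by child_id;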
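The above query returns one array per distinct set of guardians, i.e. one per family; a child with only one guardian simply gets a one-element array. For example:
{1,2}
{3,4}
Now you can apply the count on top of it, e.g. select count(*) from (...) f.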
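Plain Python has no array_agg, so as a rough sketch (not run against PostgreSQL) the function below mirrors the same three steps on the question's sample rows: collect each child's guardian_ids (GROUP BY child_id with array_agg), keep the distinct sets (SELECT DISTINCT), then count them. The name count_families is hypothetical, and the docstring shows the PostgreSQL query being mirrored, assuming the larger query is available as family.

```python
from collections import defaultdict

# Sample rows from the question as (guardian_id, child_id) pairs.
ROWS = [(1, 1), (2, 1), (3, 2), (4, 2), (1, 3), (2, 3)]


def count_families(rows):
    """Count distinct guardian sets. Mirrors (hypothetically) this PostgreSQL query:

    SELECT count(*) FROM (
        SELECT DISTINCT array_agg(DISTINCT guardian_id ORDER BY guardian_id)
        FROM family
        GROUP BY child_id
    ) f;
    """
    guardians_by_child = defaultdict(set)  # plays the role of GROUP BY child_id
    for guardian_id, child_id in rows:
        guardians_by_child[child_id].add(guardian_id)  # DISTINCT guardian_id
    # A sorted tuple stands in for the ordered array; the set comprehension is SELECT DISTINCT.
    families = {tuple(sorted(g)) for g in guardians_by_child.values()}
    return len(families)


print(count_families(ROWS))  # 2
# A child with a single guardian (hypothetical row) forms a one-element set.
print(count_families(ROWS + [(5, 4)]))  # 3
```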

Related

Return rows which have the same values in two columns, but different values in another

I have a table that looks like this:
id | name | address | code
-----------+--------------------------+--------------------+----------
101 | joe smith | 1 long road | SC1
102 | joe smith | 6 long road | SC1
103 | amy hughes | 5 hillside lane | SC5
104 | amy hughes | 5 hillside lane | SC5
I want to return the rows that are duplicates on name and code but have different addresses.
Originally I had something like this, which looks for duplicates across the name, address and code columns:
SELECT name, address, code, count(*)
FROM table_name
GROUP BY 1,2,3
HAVING count(*) >1;
Is there a way I can expand on the above to only return rows that have the same name and code but different address fields?
In my example data above, I would only want to return:
id | name | address | code
-----------+--------------------------+--------------------+----------
101 | joe smith | 1 long road | SC1
102 | joe smith | 6 long road | SC1
Remove address from the SELECT list and the GROUP BY, and use count(DISTINCT address):
SELECT name, code, count(DISTINCT address)
FROM table_name
GROUP BY name, code
HAVING count(DISTINCT address) > 1;
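The query above returns one summary row per name/code pair. If you need the original rows with their ids, as in the expected output, one option is an EXISTS subquery. A minimal sketch, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (the SQL is standard and should also run there). Note that `<>` never matches a NULL address, so rows with NULL addresses would not be reported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, name TEXT, address TEXT, code TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?)",
    [
        (101, "joe smith", "1 long road", "SC1"),
        (102, "joe smith", "6 long road", "SC1"),
        (103, "amy hughes", "5 hillside lane", "SC5"),
        (104, "amy hughes", "5 hillside lane", "SC5"),
    ],
)

# The answer's query: one summary row per (name, code) with more than one address.
groups = conn.execute(
    """SELECT name, code, count(DISTINCT address)
       FROM table_name
       GROUP BY name, code
       HAVING count(DISTINCT address) > 1"""
).fetchall()

# Full rows: keep a row if another row shares its name and code but not its address.
rows = conn.execute(
    """SELECT t.*
       FROM table_name t
       WHERE EXISTS (
           SELECT 1 FROM table_name t2
           WHERE t2.name = t.name
             AND t2.code = t.code
             AND t2.address <> t.address)
       ORDER BY t.id"""
).fetchall()

print(groups)                # [('joe smith', 'SC1', 2)]
print([r[0] for r in rows])  # [101, 102]
```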

Postgres join when only one row is equal

I have two tables and I want to do an inner join between table_1 and table_2, but only when exactly one row in table_2 meets the join criteria.
For example:
table_1
id | name | age |
-----------------+------------------+--------------+
1 | john jones | 10 |
2 | pete smith | 15 |
3 | mary lewis | 12 |
4 | amy roberts | 13 |
table_2
id | name | age | hair | height |
-----------------+------------------+--------------+--------------+--------------+
1 | john jones | 10 | brown | 100 |
2 | john jones | 10 | blonde | 132 |
3 | mary lewis | 12 | brown | 146 |
4 | pete smith | 15 | black | 171 |
So I want to join when the name is equal, but only when there is exactly one matching name in table_2.
So my results would look like this:
id | name | age | hair |
-----------------+------------------+--------------+--------------+
2 | pete smith | 15 | black |
3 | mary lewis | 12 | brown |
As you can see, John Jones isn't in the results as there are two corresponding rows in table_2.
My initial code looks like this:
select tb.id,tb.name,tb.age,sc.hair
from table_1 tb
inner join table_2 sc
on tb.name = sc.name and tb.age = sc.age
Can I apply a clause within the join so that it only joins on rows which are unique matches?
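Group by the table_1 columns only and apply having count(*) = 1. Because sc.hair is then not in the GROUP BY, it has to be wrapped in an aggregate; each remaining group contains exactly one row, so min(sc.hair) is simply that row's hair:
select tb.id, tb.name, tb.age, min(sc.hair) as hair
from table_1 tb
join table_2 sc
on tb.name = sc.name and tb.age = sc.age
group by tb.id, tb.name, tb.age
having count(*) = 1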
Note that the aggregate used in the HAVING clause (here count(*)) does not need to appear in the SELECT list.
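A quick check against the sample data, using Python's sqlite3 module as a stand-in for PostgreSQL. It also shows why only the table_1 columns should be in the GROUP BY: if sc.hair is grouped as well, each of John Jones's two matches forms its own group with count 1, so he is not filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_1 (id INTEGER, name TEXT, age INTEGER);
    CREATE TABLE table_2 (id INTEGER, name TEXT, age INTEGER, hair TEXT, height INTEGER);
    INSERT INTO table_1 VALUES
        (1, 'john jones', 10), (2, 'pete smith', 15),
        (3, 'mary lewis', 12), (4, 'amy roberts', 13);
    INSERT INTO table_2 VALUES
        (1, 'john jones', 10, 'brown', 100), (2, 'john jones', 10, 'blonde', 132),
        (3, 'mary lewis', 12, 'brown', 146), (4, 'pete smith', 15, 'black', 171);
    """
)

# Group only by table_1's columns: both John Jones matches land in one group (count 2).
unique_matches = conn.execute(
    """SELECT tb.id, tb.name, tb.age, min(sc.hair) AS hair
       FROM table_1 tb
       JOIN table_2 sc ON tb.name = sc.name AND tb.age = sc.age
       GROUP BY tb.id, tb.name, tb.age
       HAVING count(*) = 1
       ORDER BY tb.id"""
).fetchall()

# Also grouping by sc.hair splits John Jones into two groups of count 1 each.
grouped_with_hair = conn.execute(
    """SELECT tb.id, tb.name, tb.age, sc.hair
       FROM table_1 tb
       JOIN table_2 sc ON tb.name = sc.name AND tb.age = sc.age
       GROUP BY tb.id, tb.name, tb.age, sc.hair
       HAVING count(*) = 1
       ORDER BY tb.id"""
).fetchall()

print(unique_matches)  # [(2, 'pete smith', 15, 'black'), (3, 'mary lewis', 12, 'brown')]
print([row[0] for row in grouped_with_hair])  # [1, 1, 2, 3]
```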

Reset column with numeric value that represents the order when destroying a row

I have a table of users with a column called order that represents the order in which they will be elected.
So, for example, the table might look like:
| id | name | order |
|-----|--------|-------|
| 1 | John | 2 |
| 2 | Mike | 0 |
| 3 | Lisa | 1 |
Now say Lisa is destroyed. In the same transaction that destroys Lisa, I would like to update the table so the order stays consistent, so the expected result would be:
| id | name | order |
|-----|--------|-------|
| 1 | John | 1 |
| 2 | Mike | 0 |
Or, if Mike were the one to be deleted, the expected result would be:
| id | name | order |
|-----|--------|-------|
| 1 | John | 1 |
| 3 | Lisa | 0 |
How can I do this in PostgreSQL?
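If you are deleting just one row, one option is a CTE whose DELETE ... RETURNING clause feeds the deleted row's position to an UPDATE. (The column is called ord here because order is a reserved word; as a column name it would have to be double-quoted.)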
with del as (
delete from mytable where name = 'Lisa'
returning ord
)
update mytable
set ord = ord - 1
from del d
where mytable.ord > d.ord
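SQLite cannot put a DELETE inside a WITH clause, so this sketch (Python's sqlite3, with hypothetical helper names make_table, delete_and_renumber and orders) runs the same steps as separate statements inside one explicit transaction: read the deleted row's ord, delete the row, then decrement every larger ord. Like the CTE version, it assumes a single row is deleted.

```python
import sqlite3


def make_table():
    # isolation_level=None puts the connection in autocommit mode,
    # so the BEGIN/COMMIT below control the transaction explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, ord INTEGER)")
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?, ?)",
        [(1, "John", 2), (2, "Mike", 0), (3, "Lisa", 1)],
    )
    return conn


def delete_and_renumber(conn, name):
    """Delete the row with this name and shift later positions down by one."""
    conn.execute("BEGIN")
    try:
        row = conn.execute("SELECT ord FROM mytable WHERE name = ?", (name,)).fetchone()
        if row is not None:
            conn.execute("DELETE FROM mytable WHERE name = ?", (name,))
            conn.execute("UPDATE mytable SET ord = ord - 1 WHERE ord > ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def orders(conn):
    # Map each remaining name to its position.
    return dict(conn.execute("SELECT name, ord FROM mytable ORDER BY id"))


conn = make_table()
delete_and_renumber(conn, "Lisa")
print(orders(conn))  # {'John': 1, 'Mike': 0}
```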
As a more general approach, I would recommend against renumbering the whole table after every delete: it is inefficient and gets tedious for multi-row deletes.
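Instead, you could build a view on top of the table that computes the order on the fly (row_number() starts at 1, so subtract 1 to keep the 0-based numbering):
create view myview as
select id, name, row_number() over (order by ord) - 1 as ord
from mytable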
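A minimal sketch of the view approach in SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later). The base table is never renumbered after the delete; the view derives a gap-free, 0-based order from the remaining ord values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, ord INTEGER);
    INSERT INTO mytable VALUES (1, 'John', 2), (2, 'Mike', 0), (3, 'Lisa', 1);
    -- row_number() starts at 1, so subtract 1 to keep the question's 0-based order
    CREATE VIEW myview AS
        SELECT id, name, row_number() OVER (ORDER BY ord) - 1 AS ord
        FROM mytable;
    DELETE FROM mytable WHERE name = 'Lisa';  -- leaves a gap in mytable.ord
    """
)

view_rows = conn.execute("SELECT name, ord FROM myview ORDER BY ord").fetchall()
print(view_rows)  # [('Mike', 0), ('John', 1)]
```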

SUM of two level group by in postgresql

I have three tables, as shown below:
student
id | name | stand_id | sub_id | gender
---------------------------------------
1 | Joe | 1 | 1 | M
2 | Saun | 2 | 1 | F
3 | Paul | 1 | 2 | F
4 | Sena | 2 | 2 | M
Subject
id name
1 Math
2 English
Standard
id name
1 First
2 Second
How can I achieve this kind of multi-level grouping: by standard, then by subject, then the total number of boys and girls?
Should I use WITH, UNION or UNION ALL?
First
  Math
    boys total
    girls total
Second
  Math
    boys total
    girls total
It's not completely clear what you are attempting. My interpretation is that you are looking for the total of students by standard, subject and gender.
If that is correct, you need to join together the tables and count the students at the appropriate grain, like so:
SELECT
sta.name AS standard_name,
sub.name AS subject_name,
CASE stu.gender WHEN 'M' THEN 'Boys' ELSE 'Girls' END AS student_gender,
COUNT(stu.id) AS total
FROM
student stu
JOIN
subject sub
ON (stu.sub_id = sub.id)
JOIN
standard sta
ON (stu.stand_id = sta.id)
GROUP BY
standard_name,
subject_name,
student_gender;
Based on your sample data, it would return this:
standard_name | subject_name | student_gender | total
-----------------------------------------------------
First | Math | Boys | 1
First | English | Girls | 1
Second | Math | Girls | 1
Second | English | Boys | 1
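A small check of this query against the question's sample data, using Python's sqlite3 as a stand-in for PostgreSQL (both accept output-column aliases in GROUP BY). The ORDER BY is an addition, only there to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (id INTEGER, name TEXT, stand_id INTEGER, sub_id INTEGER, gender TEXT);
    CREATE TABLE subject (id INTEGER, name TEXT);
    CREATE TABLE standard (id INTEGER, name TEXT);
    INSERT INTO student VALUES
        (1, 'Joe', 1, 1, 'M'), (2, 'Saun', 2, 1, 'F'),
        (3, 'Paul', 1, 2, 'F'), (4, 'Sena', 2, 2, 'M');
    INSERT INTO subject VALUES (1, 'Math'), (2, 'English');
    INSERT INTO standard VALUES (1, 'First'), (2, 'Second');
    """
)

rows = conn.execute(
    """
    SELECT sta.name AS standard_name,
           sub.name AS subject_name,
           CASE stu.gender WHEN 'M' THEN 'Boys' ELSE 'Girls' END AS student_gender,
           COUNT(stu.id) AS total
    FROM student stu
    JOIN subject sub ON stu.sub_id = sub.id
    JOIN standard sta ON stu.stand_id = sta.id
    GROUP BY standard_name, subject_name, student_gender
    ORDER BY standard_name, subject_name
    """
).fetchall()

for row in rows:
    print(row)
```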
Is this what you are looking for?
SELECT sd.name,
sj.name,
count(st.gender) filter (
WHERE st.gender='M') AS MALE,
count(st.gender) filter (
WHERE st.gender='F') AS FEMALE
FROM Standard sd
INNER JOIN Student st ON (st.stand_id=sd.id)
INNER JOIN Subject sj ON (sj.id=st.sub_id)
GROUP BY sd.name,
sj.name;
name | name | male | female
--------+---------+------+--------
First | Math | 1 | 0
First | English | 0 | 1
Second | English | 2 | 1
Second | Math | 0 | 1
(4 rows)
I added some more rows for Second/English, which is why its counts differ from the sample data.
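Not every engine supports the FILTER clause. A portable equivalent is COUNT(CASE WHEN ... THEN 1 END): the CASE yields NULL for the other rows, and COUNT ignores NULLs. A sketch with Python's sqlite3 on the original sample data only, so Second/English shows 1 and 0 rather than the 2 and 1 from the answer's extra rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (id INTEGER, name TEXT, stand_id INTEGER, sub_id INTEGER, gender TEXT);
    CREATE TABLE subject (id INTEGER, name TEXT);
    CREATE TABLE standard (id INTEGER, name TEXT);
    INSERT INTO student VALUES
        (1, 'Joe', 1, 1, 'M'), (2, 'Saun', 2, 1, 'F'),
        (3, 'Paul', 1, 2, 'F'), (4, 'Sena', 2, 2, 'M');
    INSERT INTO subject VALUES (1, 'Math'), (2, 'English');
    INSERT INTO standard VALUES (1, 'First'), (2, 'Second');
    """
)

rows = conn.execute(
    """
    SELECT sd.name AS standard_name,
           sj.name AS subject_name,
           -- CASE gives NULL for non-matching rows and COUNT skips NULLs,
           -- so each COUNT behaves like count(...) FILTER (WHERE ...)
           COUNT(CASE WHEN st.gender = 'M' THEN 1 END) AS male,
           COUNT(CASE WHEN st.gender = 'F' THEN 1 END) AS female
    FROM standard sd
    JOIN student st ON st.stand_id = sd.id
    JOIN subject sj ON sj.id = st.sub_id
    GROUP BY sd.name, sj.name
    ORDER BY sd.name, sj.name
    """
).fetchall()

for row in rows:
    print(row)
```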

MS Access Group By breaks when using a date

For some reason, using a date/time field in a select query with Group By in Access 2010 breaks the grouping: records are not properly grouped by the text field, and the same aTextField value appears multiple times. I can replicate the issue with a simple, one-table query. For example:
SELECT aTextField, SUM(aIntField) AS SumOfaIntField
FROM simpleTable
GROUP BY aTextField, aDateField
HAVING aDateField >= Date()
ORDER BY aTextField;
As soon as I remove aDateField from the query (the Group By and HAVING lines), it works properly. Even if I remove only the HAVING line, it still breaks, which leads me to believe the problem is in the Group By.
Any feedback would be great. Thanks!
EDIT: More details
**simpleTable**
--------------------------------------------
| ID | aTextField | aIntField | aDateField |
============================================
| 1 | John Doe | 1 | 3/14/2013 |
| 2 | John Doe | | 3/15/2013 |
| 3 | Jane Doe | 1 | 3/15/2013 |
| 4 | John Doe | 2 | 3/18/2013 |
| 5 | Jane Doe | 1 | 3/19/2013 |
| 6 | John Doe | | 3/20/2013 |
| 7 | John Doe | 3 | 3/21/2013 |
| 8 | Jane Doe | 1 | 3/19/2013 |
| 9 | John Doe | | 3/22/2013 |
| 10 | Jane Doe | 2 | 3/20/2013 |
| 11 | Jane Doe | | 3/21/2013 |
| 12 | Jane Doe | | 3/22/2013 |
--------------------------------------------
**Expected Result**
-------------------------------
| aTextField | SumOfaIntField |
===============================
| Jane Doe | 4 |
| John Doe | 3 |
-------------------------------
**Actual Result**
-------------------------------
| aTextField | SumOfaIntField |
===============================
| Jane Doe | 2 |
| Jane Doe | 2 |
| Jane Doe | |
| Jane Doe | |
| John Doe | |
| John Doe | 3 |
| John Doe | |
-------------------------------
So it appears that there is a separate row for each date as well. I only need to filter by the date, not group by it. However, Access will not accept the query unless the date is grouped. Options?
You're grouping by aTextField and aDateField. Perhaps simpleTable includes rows where the date is the same, but the time of day is different. In that case your grouping would produce a row for each date/time combination.
Whether or not that is the explanation, you should check what the database engine actually groups on by including aDateField in the SELECT list:
SELECT aTextField, aDateField, SUM(aIntField)
FROM simpleTable
GROUP BY aTextField, aDateField
HAVING aDateField >= Date()
ORDER BY aTextField;
Also consider using a WHERE instead of HAVING clause:
WHERE aDateField >= Date()
Based on your sample data, I suspect you want this (note that WHERE must come before GROUP BY):
SELECT aTextField, SUM(aIntField)
FROM simpleTable
WHERE aDateField >= Date()
GROUP BY aTextField
ORDER BY aTextField;
You should be able to use the following:
SELECT aTextField, SUM(aIntField) AS SumOfaIntField
FROM simpleTable
WHERE aDateField >= Date()
GROUP BY aTextField
ORDER BY aTextField;
You will notice that I removed aDateField from the GROUP BY and moved the date condition into a WHERE clause, which filters rows before they are grouped. Since you want one total per aTextField, you do not need to group by the date; grouping by it produces a separate row for each distinct date.
Note: this query was tested in MS Access 2010 and generated your desired result.
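A reproduction of both result tables with Python's sqlite3, under explicit assumptions: dates are stored as ISO strings, and a fixed cutoff of 2013-03-19 stands in for Access's Date(), because that is the cutoff that reproduces both the expected and the actual results from the sample data. Blank sums come back as None, since SUM over only NULLs is NULL.

```python
import sqlite3

CUTOFF = "2013-03-19"  # assumed stand-in for Date(); ISO date strings compare correctly

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE simpleTable (ID INTEGER, aTextField TEXT, aIntField INTEGER, aDateField TEXT)"
)
conn.executemany(
    "INSERT INTO simpleTable VALUES (?, ?, ?, ?)",
    [
        (1, "John Doe", 1, "2013-03-14"), (2, "John Doe", None, "2013-03-15"),
        (3, "Jane Doe", 1, "2013-03-15"), (4, "John Doe", 2, "2013-03-18"),
        (5, "Jane Doe", 1, "2013-03-19"), (6, "John Doe", None, "2013-03-20"),
        (7, "John Doe", 3, "2013-03-21"), (8, "Jane Doe", 1, "2013-03-19"),
        (9, "John Doe", None, "2013-03-22"), (10, "Jane Doe", 2, "2013-03-20"),
        (11, "Jane Doe", None, "2013-03-21"), (12, "Jane Doe", None, "2013-03-22"),
    ],
)

# Original query: grouping by the date yields one row per name/date pair.
per_date = conn.execute(
    """SELECT aTextField, SUM(aIntField) FROM simpleTable
       GROUP BY aTextField, aDateField
       HAVING aDateField >= ?
       ORDER BY aTextField""",
    (CUTOFF,),
).fetchall()

# Fixed query: filter rows first with WHERE, then group by the name only.
per_name = conn.execute(
    """SELECT aTextField, SUM(aIntField) AS SumOfaIntField FROM simpleTable
       WHERE aDateField >= ?
       GROUP BY aTextField
       ORDER BY aTextField""",
    (CUTOFF,),
).fetchall()

print(len(per_date))  # 7
print(per_name)       # [('Jane Doe', 4), ('John Doe', 3)]
```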
I think you are misunderstanding how GROUP BY works. You should expect to see the same aTextField value once for each unique aTextField/aDateField combination.
Sample data:
a 2012-01-01
a 2012-01-01
b 2012-01-01
b 2012-01-02
b 2012-01-02
Result of group by aTextField, aDateField:
a 2012-01-01
b 2012-01-01
b 2012-01-02
Result of group by aTextField:
a
b
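The same illustration as a runnable sketch (Python's sqlite3; the table name t is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (aTextField TEXT, aDateField TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", "2012-01-01"), ("a", "2012-01-01"), ("b", "2012-01-01"),
     ("b", "2012-01-02"), ("b", "2012-01-02")],
)

# One output row per distinct (text, date) pair.
by_both = conn.execute(
    "SELECT aTextField, aDateField FROM t GROUP BY aTextField, aDateField ORDER BY 1, 2"
).fetchall()

# One output row per distinct text value.
by_text = conn.execute(
    "SELECT aTextField FROM t GROUP BY aTextField ORDER BY 1"
).fetchall()

print(by_both)  # [('a', '2012-01-01'), ('b', '2012-01-01'), ('b', '2012-01-02')]
print(by_text)  # [('a',), ('b',)]
```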