Compare number of rows in two tables - tsql

I have two tables, which share a column, that is not unique. I want all records where table A has more values of the shared column than TABLE B.
TABLE A:
Shared_Column|User_ID|Department
123 | joe| sales
123 | joe| sales
123 | joe| sales
124 | sam| ops
124 | sam| ops
TABLE B
Shared_Column|Other_Column
123 | 1
123 | 1
124 | 4
124 | 4
From this data, I want joe|sales but not sam|ops. I could also work with this as output:
USER|TABLE_A_COUNT|TABLE_B_COUNT
joe| 3| 2
sam| 2| 2
edit: I've tried to do a join like this:
select a.user_ID, count(a.shared_column) as 'TABLE_A_COUNT', count(b.shared_column) as 'TABLE_B_COUNT'
from a inner join b on a.shared_column = b.shared_column
group by a.user_ID
but that seems to produce a cross join and I get joe|6|6 instead of 3 and 2
Thanks!

It seems like you want something like this:
select a.user_id,
       count(a.shared_column) as TableA,
       b.TableB
from tablea a
inner join
(
    select count(*) as TableB, shared_column
    from tableb
    group by shared_column
) b
    on a.shared_column = b.shared_column
group by a.user_id, b.TableB
Result:
| USER_ID | TABLEA | TABLEB |
-----------------------------
| joe | 3 | 2 |
| sam | 2 | 2 |
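If only the rows where table A has more entries than table B are wanted (joe but not sam), both sides can be counted first and then compared. A minimal sketch, assuming the sample data from the question; it is run through Python's sqlite3 module only so that it is self-contained, and the aliases cnt, table_a_count and table_b_count are made up for illustration:

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (shared_column INTEGER, user_id TEXT, department TEXT);
    CREATE TABLE tableb (shared_column INTEGER, other_column INTEGER);
    INSERT INTO tablea VALUES
        (123, 'joe', 'sales'), (123, 'joe', 'sales'), (123, 'joe', 'sales'),
        (124, 'sam', 'ops'), (124, 'sam', 'ops');
    INSERT INTO tableb VALUES (123, 1), (123, 1), (124, 4), (124, 4);
""")

# Count each side separately so the join cannot multiply rows,
# then keep only the groups where A has more rows than B.
query = """
    SELECT a.user_id, a.department, a.cnt AS table_a_count, b.cnt AS table_b_count
    FROM (SELECT shared_column, user_id, department, COUNT(*) AS cnt
          FROM tablea
          GROUP BY shared_column, user_id, department) AS a
    JOIN (SELECT shared_column, COUNT(*) AS cnt
          FROM tableb
          GROUP BY shared_column) AS b
      ON a.shared_column = b.shared_column
    WHERE a.cnt > b.cnt
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because this is an inner join, a shared_column value that never appears in table B is left out entirely; a left join with coalesce(b.cnt, 0) would be needed to include those.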

Related

Postgres join when only one row is equal

I have two tables and I am wanting to do an inner join between table_1 and table_2 but only when there is one row in table_2 that meets the join criteria.
For example:
table_1
id | name | age |
-----------------+------------------+--------------+
1 | john jones | 10 |
2 | pete smith | 15 |
3 | mary lewis | 12 |
4 | amy roberts | 13 |
table_2
id | name | age | hair | height |
-----------------+------------------+--------------+--------------+--------------+
1 | john jones | 10 | brown | 100 |
2 | john jones | 10 | blonde | 132 |
3 | mary lewis | 12 | brown | 146 |
4 | pete smith | 15 | black | 171 |
So I want to do a join when name is equal, but only when there is one corresponding matching name in table_2
So my results would look like this:
id | name | age | hair |
-----------------+------------------+--------------+--------------+
2 | pete smith | 15 | black |
3 | mary lewis | 12 | brown |
As you can see, John Jones isn't in the results as there are two corresponding rows in table_2.
My initial code looks like this:
select tb.id,tb.name,tb.age,sc.hair
from table_1 tb
inner join table_2 sc
on tb.name = sc.name and tb.age = sc.age
Can I apply a clause within the join so that it only joins on rows which are unique matches?
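Group by the table_1 columns and apply having count(*) = 1. Grouping by sc.hair as well would put each of John Jones's two rows in its own group of one, so hair is aggregated instead; with exactly one matching row per group, min(sc.hair) is simply that row's value:
select tb.id, tb.name, tb.age, min(sc.hair) as hair
from table_1 tb
join table_2 sc
on tb.name = sc.name and tb.age = sc.age
group by tb.id, tb.name, tb.age
having count(*) = 1
The interesting thing to note is that you don't need the aggregate expression (in this case count(*)) in the select clause.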
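Another way to get only the unique matches, sketched here as an assumption rather than the answer's own method, is to first pick the (name, age) pairs that occur exactly once in table_2 and join through them. The alias uniq is made up, and Python's sqlite3 module is used only to make the example runnable on the question's sample data:

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER, name TEXT, age INTEGER);
    CREATE TABLE table_2 (id INTEGER, name TEXT, age INTEGER, hair TEXT, height INTEGER);
    INSERT INTO table_1 VALUES
        (1, 'john jones', 10), (2, 'pete smith', 15),
        (3, 'mary lewis', 12), (4, 'amy roberts', 13);
    INSERT INTO table_2 VALUES
        (1, 'john jones', 10, 'brown', 100), (2, 'john jones', 10, 'blonde', 132),
        (3, 'mary lewis', 12, 'brown', 146), (4, 'pete smith', 15, 'black', 171);
""")

# uniq holds the (name, age) pairs that appear exactly once in table_2;
# joining through it drops every pair with multiple candidate rows.
query = """
    SELECT tb.id, tb.name, tb.age, sc.hair
    FROM table_1 tb
    JOIN table_2 sc
      ON tb.name = sc.name AND tb.age = sc.age
    JOIN (SELECT name, age
          FROM table_2
          GROUP BY name, age
          HAVING COUNT(*) = 1) AS uniq
      ON uniq.name = sc.name AND uniq.age = sc.age
    ORDER BY tb.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

John Jones has two rows in table_2, so he is filtered out by the uniq subquery, and Amy Roberts has none, so the inner join drops her.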

How to use join with aggregate function in postgresql?

I have 4 tables
Table1
id | name
1 | A
2 | B
Table2
id | name1
1 | C
2 | D
Table3
id | name2
1 | E
2 | F
Table4
id | name1_id | name2_id | name3_id
1 | 1 | 2 | 1
2 | 2 | 2 | 2
3 | 1 | 2 | 1
4 | 2 | 1 | 1
5 | 1 | 1 | 2
6 | 2 | 2 | 1
7 | 1 | 1 | 2
8 | 2 | 1 | 1
9 | 1 | 2 | 1
10 | 2 | 2 | 1
Now I want to join all of the tables with Table4 and get this type of output:
name | count
{A,B} | {5, 5}
{C,D} | {5, 6}
{E,F} | {7, 3}
I tried this
select array_agg(distinct(t1.name)), array_agg(distinct(temp.test))
from
(select t4.name1_id, (count(t4.name1_id)) "test"
from table4 t4 group by t4.name1_id
) temp
join table1 t1
on temp.name1_id = t1.id
This is what I am trying to achieve. Can anybody help me?
Calculate the counts for every table separately and union the results:
select
    array_agg(name order by name) as name,
    array_agg(count order by name) as count
from (
    select 1 as t, name, count(*)
    from table4
    join table1 t1 on t1.id = name1_id
    group by name
    union all
    -- Table2 keeps its label in name1
    select 2 as t, name1, count(*)
    from table4
    join table2 t2 on t2.id = name2_id
    group by name1
    union all
    -- Table3 keeps its label in name2
    select 3 as t, name2, count(*)
    from table4
    join table3 t3 on t3.id = name3_id
    group by name2
) s
group by t
order by t;
name | count
-------+-------
{A,B} | {5,5}
{C,D} | {4,6}
{E,F} | {7,3}
(3 rows)
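The per-table counts can be checked independently of array_agg. The sketch below is an assumption-laden stand-in, not the answer's query: it uses Python's sqlite3 module (which has no array_agg), so the union returns plain (t, name, cnt) rows and the arrays are assembled in Python. The column alias cnt and the variable grouped are made up for illustration.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name1 TEXT);
    CREATE TABLE table3 (id INTEGER, name2 TEXT);
    CREATE TABLE table4 (id INTEGER, name1_id INTEGER, name2_id INTEGER, name3_id INTEGER);
    INSERT INTO table1 VALUES (1, 'A'), (2, 'B');
    INSERT INTO table2 VALUES (1, 'C'), (2, 'D');
    INSERT INTO table3 VALUES (1, 'E'), (2, 'F');
    INSERT INTO table4 VALUES
        (1, 1, 2, 1), (2, 2, 2, 2), (3, 1, 2, 1), (4, 2, 1, 1), (5, 1, 1, 2),
        (6, 2, 2, 1), (7, 1, 1, 2), (8, 2, 1, 1), (9, 1, 2, 1), (10, 2, 2, 1);
""")

# One count per lookup table, tagged with t so the rows can be regrouped.
query = """
    SELECT 1 AS t, t1.name AS name, COUNT(*) AS cnt
    FROM table4 JOIN table1 t1 ON t1.id = name1_id GROUP BY t1.name
    UNION ALL
    SELECT 2, t2.name1, COUNT(*)
    FROM table4 JOIN table2 t2 ON t2.id = name2_id GROUP BY t2.name1
    UNION ALL
    SELECT 3, t3.name2, COUNT(*)
    FROM table4 JOIN table3 t3 ON t3.id = name3_id GROUP BY t3.name2
    ORDER BY t, name
"""

# Collect names and counts per t, mirroring what array_agg does in PostgreSQL.
grouped = {}
for t, name, cnt in conn.execute(query):
    names, counts = grouped.setdefault(t, ([], []))
    names.append(name)
    counts.append(cnt)
print(grouped)
```

With the sample data, name2_id takes the value 1 four times and the value 2 six times, which is why the second row comes out as {4,6} rather than the {5,6} written in the question.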

Duplicate row after left join

I am trying to write a query which is as follows:
select distinct bsg.id as bsgId,
s.system_id as sysId,
g.code_no as gameNo,
u.user_name as nameOfUser,
s.score_code as scoreId,
p.name as cityOfGame
from score s
join scoreGr sg on sg.id = s.scoreGr_id
join bigScoreGr bsg on sg.bigScoreGr_id = bsg.id
join game g on bsg.fld_case_id = g.id
join user u on s.user_id = u.id
join system_number sn on g.id = sn.game_id
join system_doc sd on sd.system_number_id = sn.id
left join parameter p on sd.city_id = p.id
Until I join with the parameter table, the result is as expected. It looks like this:
bsgId| sysId | gameNo | nameOfUser | scoreId
--------------------------------------------------
1234 | abcde | G-12 | admin | G-12/1/1
1235 | abcdf | G-15 | admin | G-15/1/3
1234 | abcdf | G-12 | user1 | G-12/1/8
1237 | abcdf | G-16 | user1 | G-16/2/4
However, the parameter table is fairly big, and system_doc has some null values in its city_id column. When I add the left join part of my query, the result becomes:
bsgId| sysId | gameNo | nameOfUser | scoreId | city
--------------------------------------------------
1234 | abcde | G-12 | admin | G-12/1/1 | city1
1235 | abcdf | G-15 | admin | G-15/1/3 | city5
1235 | abcdf | G-15 | admin | G-15/1/3 |
1234 | abcdg | G-12 | user1 | G-12/1/8 | city4
1234 | abcdg | G-12 | user1 | G-12/1/8 |
1237 | abcdf | G-16 | user1 | G-16/2/4 |
I do not want rows like the 3rd and 5th ones. A null city is acceptable on its own, as in the last row, but when another row has exactly the same data plus a city, the null row is redundant: row #2 makes row #3 useless, so I only want row #2. I tried distinct on (scoreId), but it did not work, since it dropped row #2 instead of row #3.
How can I eliminate the duplicate rows that have null in their city field? I hope my question is clear.
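This is not a PostgreSQL bug: the left join is keeping the system_doc rows whose city_id is null. The original join condition was:
left join parameter p on sd.city_id = p.id
Try this, which moves the condition into the WHERE clause:
left join parameter p on p.id = p.id
WHERE sd.city_id = p.id
Note that this makes the join behave like an inner join, so a row whose only city is null (such as the last row above) is dropped as well.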
It seems like you have a composite key. Try to mention all columns of the composite key, i.e. if you have primary key(pk1, pk2) then select * from table1 left join table2 on table1.pk1=table2.pk1 and table1.pk2=table2.pk2
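One way to collapse such rows is to group by the non-city columns and take max() of the city, since aggregates skip nulls: a group with a real city keeps it, and a group with only nulls stays null. This is a sketch on a cut-down, hypothetical schema (score_doc stands in for everything joined before parameter, and the city ids are invented), run through Python's sqlite3 module only so that it is self-contained:

```python
import sqlite3

# Hypothetical, simplified stand-in for the question's schema:
# score_doc represents the rows produced before the join to parameter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parameter (id INTEGER, name TEXT);
    CREATE TABLE score_doc (score_code TEXT, city_id INTEGER);
    INSERT INTO parameter VALUES (1, 'city1'), (4, 'city4'), (5, 'city5');
    INSERT INTO score_doc VALUES
        ('G-12/1/1', 1),
        ('G-15/1/3', 5), ('G-15/1/3', NULL),
        ('G-12/1/8', 4), ('G-12/1/8', NULL),
        ('G-16/2/4', NULL);
""")

# MAX ignores NULLs, so each score keeps its city when one exists
# and falls back to NULL only when no document has a city.
query = """
    SELECT sd.score_code AS scoreId, MAX(p.name) AS cityOfGame
    FROM score_doc sd
    LEFT JOIN parameter p ON sd.city_id = p.id
    GROUP BY sd.score_code
    ORDER BY sd.score_code
"""
rows = conn.execute(query).fetchall()
print(rows)
```

If a score can have two different non-null cities, max() silently keeps just one of them, so this only fits when at most one real city per score is expected.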

Resolve many to many relationship in SQL

I'm using Postgresql. Let's say I have 3 tables:
Classes
id | name
1 | Biology
2 | Math
Students
id | name
1 | John
2 | Jane
Student_Classes
id | student_id | class_id | registration_token
1 | 1 | 1 | abc
2 | 1 | 2 | def
3 | 2 | 1 | zxc
I want to obtain a result set like this:
Results
student_name | biology | math
John | abc | def
Jane | zxc | NULL
I can get this result set with this query:
SELECT
    Students.name as student_name,
    biology.registration_token as biology,
    math.registration_token as math
FROM
    Students
LEFT JOIN (
    -- student_id must be selected so the ON clause below can use it
    SELECT student_id, registration_token FROM Student_Classes WHERE class_id = (
        SELECT id FROM Classes WHERE name = 'Biology'
    )
) AS biology
ON Students.id = biology.student_id
LEFT JOIN (
    SELECT student_id, registration_token FROM Student_Classes WHERE class_id = (
        SELECT id FROM Classes WHERE name = 'Math'
    )
) AS math
ON Students.id = math.student_id
Is there a way to get this same result set without having a join statement for each class? With this solution, if I want to add a class, I need to add another join statement.
You can do this with crosstab from the PostgreSQL tablefunc extension, but such presentation requirements may be better handled outside of SQL.
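A middle ground that avoids one join per class is conditional aggregation: join once and pick each class's token with a CASE inside max(). This is only a sketch, not the crosstab approach mentioned above; it is run through Python's sqlite3 module so that it is self-contained, and it still needs one select expression per class:

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Classes (id INTEGER, name TEXT);
    CREATE TABLE Students (id INTEGER, name TEXT);
    CREATE TABLE Student_Classes (id INTEGER, student_id INTEGER,
                                  class_id INTEGER, registration_token TEXT);
    INSERT INTO Classes VALUES (1, 'Biology'), (2, 'Math');
    INSERT INTO Students VALUES (1, 'John'), (2, 'Jane');
    INSERT INTO Student_Classes VALUES
        (1, 1, 1, 'abc'), (2, 1, 2, 'def'), (3, 2, 1, 'zxc');
""")

# Each CASE yields the token only for its own class and NULL otherwise;
# MAX then picks the single non-NULL token per student, if there is one.
query = """
    SELECT s.name AS student_name,
           MAX(CASE WHEN c.name = 'Biology' THEN sc.registration_token END) AS biology,
           MAX(CASE WHEN c.name = 'Math' THEN sc.registration_token END) AS math
    FROM Students s
    LEFT JOIN Student_Classes sc ON sc.student_id = s.id
    LEFT JOIN Classes c ON c.id = sc.class_id
    GROUP BY s.id, s.name
    ORDER BY s.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Adding a class means adding one more MAX(CASE ...) line rather than another subquery and join; a fully dynamic column list still has to be generated outside of a single static query.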

Join tables and count instances of different values

user
---------------------------
| ID | Name |
---------------------------
| 1 | Jim Rice |
| 2 | Wade Boggs |
| 3 | Bill Buckner |
---------------------------
at_bats
----------------------
| ID | User | Bases |
----------------------
| 1 | 1 | 2 |
| 2 | 2 | 1 |
| 3 | 1 | 2 |
| 4 | 3 | 0 |
| 5 | 1 | 3 |
----------------------
What I want my query to do is get the count of the different base values in a join table like:
count_of_hits
---------------------
| ID | 1B | 2B | 3B |
---------------------
| 1 | 0 | 2 | 1 |
| 2 | 1 | 0 | 0 |
| 3 | 0 | 0 | 0 |
---------------------
I had a query where I was able to get the bases individually, but not them all unless I did some complicated Joins and I'd imagine there is a better way. This was the foundational query though:
SELECT "user".id, COUNT(ab.id)
FROM "user"
LEFT OUTER JOIN (SELECT * FROM at_bats WHERE at_bats.bases=2) ab ON ab."user"="user".id
GROUP BY "user".id
PostgreSQL 9.4+ provides a much cleaner way to do this:
SELECT
users,
count(*) FILTER (WHERE bases=1) As B1,
count(*) FILTER (WHERE bases=2) As B2,
count(*) FILTER (WHERE bases=3) As B3
FROM at_bats
GROUP BY users
ORDER BY users;
I think the following query would solve your problem. However, I am not sure if it is the best approach:
select distinct a.users, coalesce(b.B1, 0) As B1, coalesce(c.B2, 0) As B2 ,coalesce(d.B3, 0) As B3
FROM at_bats a
LEFT JOIN (SELECT users, count(bases) As B1 FROM at_bats WHERE bases = 1 GROUP BY users) as b ON a.users=b.users
LEFT JOIN (SELECT users, count(bases) As B2 FROM at_bats WHERE bases = 2 GROUP BY users) as c ON a.users=c.users
LEFT JOIN (SELECT users, count(bases) As B3 FROM at_bats WHERE bases = 3 GROUP BY users) as d ON a.users=d.users
Order by users
The coalesce() function just replaces the nulls with zeros. I hope this query helps you :D
UPDATE 1
I found a better way to do it:
SELECT users,
count(case bases when 1 then 1 else null end) As B1,
count(case bases when 2 then 1 else null end) As B2,
count(case bases when 3 then 1 else null end) As B3
FROM at_bats
GROUP BY users
ORDER BY users;
It is more efficient than my first query. You can check the performance by putting EXPLAIN ANALYSE before the query.
Thanks to Guffa from this post: https://stackoverflow.com/a/1400115/4453190
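The queries above read only from at_bats, so a user with no at-bats at all would not appear in the result. A sketch that starts from the user table instead, assuming the table and column names shown in the question ("user" is quoted because it is a reserved word in PostgreSQL) and run through Python's sqlite3 module so that it is self-contained:

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE "user" (id INTEGER, name TEXT);
    CREATE TABLE at_bats (id INTEGER, "user" INTEGER, bases INTEGER);
    INSERT INTO "user" VALUES (1, 'Jim Rice'), (2, 'Wade Boggs'), (3, 'Bill Buckner');
    INSERT INTO at_bats VALUES (1, 1, 2), (2, 2, 1), (3, 1, 2), (4, 3, 0), (5, 1, 3);
''')

# COUNT skips NULLs, and a CASE without ELSE yields NULL,
# so each column counts only the at-bats with that bases value.
query = '''
    SELECT u.id,
           COUNT(CASE WHEN ab.bases = 1 THEN 1 END) AS b1,
           COUNT(CASE WHEN ab.bases = 2 THEN 1 END) AS b2,
           COUNT(CASE WHEN ab.bases = 3 THEN 1 END) AS b3
    FROM "user" u
    LEFT JOIN at_bats ab ON ab."user" = u.id
    GROUP BY u.id
    ORDER BY u.id
'''
rows = conn.execute(query).fetchall()
print(rows)
```

The rows match the count_of_hits table from the question, including the all-zero row for user 3.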