SQL: find overlapping cell values across the same column of many tables - postgresql

I'm working with postgresql and I'm a bit of a newbie to SQL, generally.
I'm attempting to write a query that checks whether values overlap between a master list and multiple tables. The values in question are usernames, and the multiple tables (30 in all) each represent event data for a different game.
Each game has its own table, and all 30 tables have identical columns, something like this:
table name: game1...game30
USERNAME EVENT_TIMESTAMP OTHER_FIELDS
2592761928AF756E45891527ED49A7A9 2016-02-01 02:38:05 ...
79460FE440ADB429F542D2F08A763D50 2016-02-01 02:38:35 ...
3945B26DD9F6FD2D49574856ECF9FA7D 2016-02-01 02:44:12 ...
A597AE2CF6E15497EE7AC2A02CEEB32E 2016-02-01 02:46:57 ...
65DE308FC39980CCD37DBDE8A432F221 2016-02-01 02:46:57 ...
...
I have a list of specified user_ids that I've used to create a "key table". I'm attempting to write a query that will tell me whether or not any of the users in my key table show up in the games' event data.
My key table is only two columns and looks something like this:
table name: username_key
EMAIL HASHED_EMAIL
asd0#asd.com 79460FE440ADB429F542D2F08A763D50
asd1#asd.com 0C450FAC330D69A315604CDE61C7A65E
asd2#asd.com F2D7714CBA1048A940231087549F1D95
bob#asd.com FE793A075E0633441B5EE5535FAAEDD2
asd7#asd.com 47FAFD07C174B81BADD28AD9BE64E26B
...
(Note: the usernames in both the game tables and the key table are hashed emails, hence the column name HASHED_EMAIL.)
My query currently looks like this:
create temp table players as select ky.hashed_email from username_key as ky
inner join game1 g1 on ky.hashed_email = g1.username
inner join game2 g2 on ky.hashed_email = g2.username
inner join game3 g3 on ky.hashed_email = g3.username
inner join game4 g4 on ky.hashed_email = g4.username
...
inner join game30 g30 on ky.hashed_email = g30.username
When I try to run this query it hangs for hours and eventually times out.
I'm hoping to return a list of users that show up in one or more of the game events tables, or return an empty list (which would tell me that no one in my key table list has played the games).
Am I on the right track with my query?
Is there a faster/more efficient way to accomplish this task than the way I'm doing it?
How would you, a postgresql expert, solve this problem (finding specific occurrences of usernames across many different tables)?

If you only care whether a user appears in any of the tables, rather than in several of them, you have the following alternatives. (Your chain of inner joins only returns users who appear in all 30 tables, and for each such user it produces every combination of their event rows across the tables, which is likely why it runs for hours.)
IN with UNION:
SELECT * FROM username_key WHERE hashed_email IN (
SELECT username FROM game1
UNION SELECT username FROM game2
UNION SELECT username FROM game3
...
)
IN with OR:
SELECT * FROM username_key WHERE hashed_email IN (SELECT username FROM game1)
OR hashed_email IN (SELECT username FROM game2)
OR hashed_email IN (SELECT username FROM game3)
...
EXISTS:
SELECT * FROM username_key WHERE EXISTS (SELECT 1 FROM game1 WHERE username=hashed_email)
OR EXISTS (SELECT 1 FROM game2 WHERE username=hashed_email)
OR EXISTS (SELECT 1 FROM game3 WHERE username=hashed_email)
...
There are probably quite a few other alternatives. Use EXPLAIN or EXPLAIN ANALYZE to find out which is the most efficient, though I wouldn't be surprised if all three resulted in a substantially similar query plan.
Note that an index on username in each of the game* tables would help a lot.
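If it also matters which games each user showed up in, one possible variation is to tag every username with its source table before matching. The following is only a sketch for PostgreSQL: it uses two temp game tables instead of 30, and the sample rows (including the game2 timestamp) are made up for illustration.

```sql
-- Hypothetical miniature of the schema: two game tables instead of 30.
CREATE TEMP TABLE username_key (email text, hashed_email text);
CREATE TEMP TABLE game1 (username text, event_timestamp timestamp);
CREATE TEMP TABLE game2 (username text, event_timestamp timestamp);

-- Illustrative rows only.
INSERT INTO username_key VALUES
  ('asd0#asd.com', '79460FE440ADB429F542D2F08A763D50'),
  ('asd1#asd.com', '0C450FAC330D69A315604CDE61C7A65E');
INSERT INTO game1 VALUES
  ('79460FE440ADB429F542D2F08A763D50', '2016-02-01 02:38:35'),
  ('2592761928AF756E45891527ED49A7A9', '2016-02-01 02:38:05');
INSERT INTO game2 VALUES
  ('79460FE440ADB429F542D2F08A763D50', '2016-02-02 10:00:00');

-- One index per game table, so each username lookup can avoid a full scan.
CREATE INDEX ON game1 (username);
CREATE INDEX ON game2 (username);

-- Tag each username with the game it came from, then keep only key-table users.
SELECT ky.hashed_email,
       array_agg(DISTINCT g.game ORDER BY g.game) AS games_played
FROM username_key AS ky
JOIN (
    SELECT 'game1' AS game, username FROM game1
    UNION ALL
    SELECT 'game2', username FROM game2
) AS g ON g.username = ky.hashed_email
GROUP BY ky.hashed_email;
```

Users from the key table with no events in any game simply do not appear in the result, so an empty result means nobody on the list has played.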

Related

Problem adding a number of students registered to a course (as a row) to a SELECT statement in POSTGRES

I have a database of courses. I need to get a name, a topic, a teacher, a duration and the number of students registered. I get the first four successfully, but not the last one.
Here is what my tables look like:
(Screenshots in the original post: courses table, journal table, all tables.)
That's the successful part for the first four:
SELECT c.name, t.topic_name AS topic, u.name || ' '|| u.surname AS TEACHER, ((c.end_date - c.start_date) / 7)::int AS duration
FROM public.course c
RIGHT JOIN public.topic t ON c.topic_id = t.topic_id
RIGHT JOIN public.teacher_course tc ON c.course_id = tc.course_id
RIGHT JOIN public.user u ON tc.teacher_id = u.user_id
WHERE u.role_id = 2;
Basically, to know the number of registered students per course, I only need to count records in the journal table for each course, but when I add
count(j.id_record) AS students_registered
it just breaks and asks me to group by everything.
I'm confused about that. How to get this number correctly for each course?

SQL Natural Join

Okay. So the question that I got asked by the teacher was this:
(5 marks) Construct a SQL query on the dvdrental database that uses a natural join of two or more tables and an additional where condition. (E.g. find the titles of films rented by a particular customer.) Note the hints on the course news page if your query returns nothing.
Here is the layout of the database I'm working with:
http://www.postgresqltutorial.com/wp-content/uploads/2013/05/PostgreSQL-Sample-Database.png
The hint to us was this:
PostgreSQL hint:
If a natural join doesn't produce any results in the dvdrental DB, it is because many tables have the last_update timestamp field, and thus the natural join tries to join on that field as well as the intended field.
e.g.
select *
from film natural join inventory;
does not work because of this - it produces an empty table (no results).
Instead, use
select *
from film, inventory
where film.film_id = inventory.film_id;
This is what I did:
select *
from film, customer
where film.film_id = customer.customer_id;
The problem is I cannot get a particular customer.
I tried adding customer_id = 2; but it returns an error.
Really need help!
Well, it seems that you would like to join two tables that have no direct relation with each other. That's your issue:
where film.film_id = customer.customer_id
To find which films are rented by which customer you would have to join customer table with rental, then with inventory and finally with film.
The task description states:
Construct a SQL query on the dvdrental database that uses a natural join of two or more tables and an additional where condition.
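The join path from customer through rental and inventory to film could be sketched as below. Both queries are meant to run against the dvdrental sample database and are not self-contained; customer 2 is just an illustrative value. The second query is one possible way to still use NATURAL JOIN: each subquery projects only the join and output columns, so last_update never takes part in the join.

```sql
-- Explicit joins: customer -> rental -> inventory -> film.
SELECT DISTINCT film.title
FROM customer
JOIN rental    USING (customer_id)
JOIN inventory USING (inventory_id)
JOIN film      USING (film_id)
WHERE customer_id = 2;

-- NATURAL JOIN variant: the subqueries drop last_update before joining.
SELECT DISTINCT title
FROM (SELECT customer_id, inventory_id FROM rental) AS r
NATURAL JOIN (SELECT inventory_id, film_id FROM inventory) AS i
NATURAL JOIN (SELECT film_id, title FROM film) AS f
WHERE customer_id = 2;
```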

Complex Joins in Postgresql

It's possible I'm stupid, but I've been querying and checking for hours and I can't seem to find the answer to this, so I apologize in advance if the post is redundant... but I can't seem to find its doppelganger.
OK: I have a PostgreSQL db with the following tables:
Key (containing two fields in which I'm interested, ID and Name)
and a second table, Data.
Data contains well... data, sorted by ID. ID is unique, but each Name has multiple IDs. E.g. if Bill enters the building this is ID 1 for Bill. Mary enters the building, ID 2 for Mary, Bill re-enters the building, ID 3 for Bill.
The ID field is in both the Key table, and the DATA table.
What I want to do is... find
The MAX (e.g. last) ID, unique to EACH NAME, and the Data associated with it.
E.g. Bill - Last Login: ID 10. Time: 123UTC Door: West and so on.
So... I'm trying the following query:
SELECT
*
FROM
Data, Key
WHERE
Key.ID = (
SELECT
MAX (ID)
FROM
Key
GROUP BY ID
)
Here's the kicker, there's about... something like 800M items in these tables, so errors are... time consuming. Can anyone help to see if this query is gonna do what I expect?
Thanks so much.
To get the maximum key for each name . . .
select Name, max(ID) as max_id
from data
group by Name;
Join that to your other table.
select *
from key t1
inner join (select Name, max(ID) as max_id
from data
group by Name) t2
on t1.id = t2.max_id
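In PostgreSQL, another option for "last row per name" is DISTINCT ON. The sketch below assumes that Name lives in Key and the payload in Data, as the question's table list suggests; the door column and the sample rows are made up for illustration.

```sql
-- Hypothetical miniature tables: key maps each ID to a name, data holds the payload.
CREATE TEMP TABLE key  (id int PRIMARY KEY, name text);
CREATE TEMP TABLE data (id int PRIMARY KEY, door text);
INSERT INTO key  VALUES (1, 'Bill'), (2, 'Mary'), (3, 'Bill');
INSERT INTO data VALUES (1, 'West'), (2, 'East'), (3, 'North');

-- DISTINCT ON keeps the first row per name in ORDER BY order,
-- which is the row with the highest id because of the DESC sort.
SELECT DISTINCT ON (k.name) k.name, d.*
FROM key AS k
JOIN data AS d ON d.id = k.id
ORDER BY k.name, k.id DESC;
```

With these sample rows the query keeps ID 3 for Bill and ID 2 for Mary. On tables of this size it is worth checking the plan with EXPLAIN before running it.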

Finding duplicates between two tables

I've got two SQL2008 tables, one is a "Import" table containing new data and the other a "Destination" table with the live data. Both tables are similar but not identical (there's more columns in the Destination table updated by a CRM system), but both tables have three "phone number" fields - Tel1, Tel2 and Tel3. I need to remove all records from the Import table where any of the phone numbers already exist in the destination table.
I've tried knocking together a simple query (just a SELECT to test with just now):
select t2.account_id
from ImportData t2, Destination t1
where
(t2.Tel1!='' AND (t2.Tel1 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
or
(t2.Tel2!='' AND (t2.Tel2 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
or
(t2.Tel3!='' AND (t2.Tel3 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
... but I'm aware this is almost certainly Not The Way To Do Things, especially as it's very slow. Can anyone point me in the right direction?
This query requires a little more information than you've given. To write it efficiently we need to know whether each load contains mostly duplicates or mostly new records. I assume that account_id is the primary key and has a clustered index.
I would use the temporary table approach: create a normalized table #tmp holding one row per phone number and account_id, like
SELECT Phone, account_id INTO #tmp
FROM
(SELECT account_id, Tel1, Tel2, Tel3
FROM Destination) p
UNPIVOT
(Phone FOR SourceColumn IN -- SourceColumn receives the name Tel1, Tel2 or Tel3
(Tel1, Tel2, Tel3)
) AS unpvt;
Create a nonclustered index on this table with the phone number as the first column and the account number as the second. You can't escape one full table scan, so I assume you can scan the import table (probably the smaller one). Then just join with this table and use the NOT EXISTS qualifier as explained. Then, of course, drop the table after processing.
luke
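The remaining steps of the temporary-table approach might look like the sketch below (T-SQL). It builds the phone list with UNION ALL instead of UNPIVOT purely to keep the example independent. #dest_phones and IX_dest_phones are hypothetical names, and the table and column names are assumed from the question.

```sql
-- One row per non-empty destination phone number.
SELECT Phone, account_id
INTO #dest_phones
FROM (
    SELECT Tel1 AS Phone, account_id FROM Destination
    UNION ALL
    SELECT Tel2, account_id FROM Destination
    UNION ALL
    SELECT Tel3, account_id FROM Destination
) AS p
WHERE Phone <> '';   -- also filters out NULL phone numbers

-- Phone first so lookups by number can seek on the index.
CREATE NONCLUSTERED INDEX IX_dest_phones ON #dest_phones (Phone, account_id);

-- Remove import rows where any phone number already exists in Destination.
DELETE i
FROM ImportData AS i
WHERE EXISTS (
    SELECT 1
    FROM #dest_phones AS d
    WHERE d.Phone IN (i.Tel1, i.Tel2, i.Tel3)
);

DROP TABLE #dest_phones;
```

Swapping the DELETE for a SELECT with the same WHERE clause first is a cheap way to check which rows would be removed.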
I am not sure about the performance of this query, but since I made the effort of writing it I will post it anyway...
;with aaa(tel)
as
(
select Tel1
from Destination
union
select Tel2
from Destination
union
select Tel3
from Destination
)
,bbb(tel, id)
as
(
select Tel1, account_id
from ImportData
union
select Tel2, account_id
from ImportData
union
select Tel3, account_id
from ImportData
)
select distinct b.id
from bbb b
where b.tel in
(
select a.tel
from aaa a
intersect
select b2.tel
from bbb b2
)
Exists will short-circuit the query and not do a full traversal of the table like a join. You could refactor the where clause as well, if this still doesn't perform the way you want.
SELECT *
FROM ImportData t2
WHERE NOT EXISTS (
select 1
from Destination t1
where (t2.Tel1!='' AND (t2.Tel1 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
or
(t2.Tel2!='' AND (t2.Tel2 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
or
(t2.Tel3!='' AND (t2.Tel3 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
)

T-SQL - How to write query to get records that match ALL records in a many to many join

(I don't think I have titled this question correctly - but I don't know how to describe it)
Here is what I am trying to do:
Let's say I have a Person table that has a PersonID field. And let's say that a Person can belong to many Groups. So there is a Group table with a GroupID field and a GroupMembership table that is a many-to-many join between the two tables and the GroupMembership table has a PersonID field and a GroupID field. So far, it is a simple many to many join.
Given a list of GroupIDs I would like to be able to write a query that returns all of the people that are in ALL of those groups (not any one of those groups). And the query should be able to handle any number of GroupIDs. I would like to avoid dynamic SQL.
Is there some simple way of doing this that I am missing?
Thanks,
Corey
select PersonID, count(*) from GroupMembership
where GroupID in ([your list of group ids])
group by PersonID
having count(*) = [size of your list of group ids]
Edited: thank you dotjoe!
Basically you are looking for people for whom there is no group in your list that they are not a member of, so
select *
from Person p
where not exists (
select 1
from [Group] g -- Group is a reserved word, so it needs brackets
where g.GroupID in ([your list of group ids])
and not exists (
select 1
from GroupMembership gm
where gm.PersonID = p.PersonID
and gm.GroupID = g.GroupID
)
)
You're basically not going to avoid "dynamic" SQL in the sense of dynamically generating the query at query time. There's no way to hand a list around in SQL (well, there is, table variables, but getting them into the system from C# is either impossible (2005 & below) or else annoying (2008)).
One way that you could do it with multiple queries is to insert your list into a work table (probably a process-keyed table) and join against that table. The only other option would be to use a dynamic query such as the ones specified by Jonathan and hongliang.
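The work-table idea can be combined with the count-based query so that the list size never has to be hard-coded. This is a minimal T-SQL sketch: @WantedGroups is a hypothetical table variable standing in for the work table, the group IDs 1, 2 and 3 are illustrative, and the multi-row VALUES syntax needs SQL Server 2008 or later.

```sql
-- Hypothetical work table holding the requested group IDs.
DECLARE @WantedGroups TABLE (GroupID int PRIMARY KEY);
INSERT INTO @WantedGroups (GroupID) VALUES (1), (2), (3);

-- A person qualifies when they match every row in @WantedGroups.
SELECT gm.PersonID
FROM GroupMembership AS gm
JOIN @WantedGroups AS w ON w.GroupID = gm.GroupID
GROUP BY gm.PersonID
HAVING COUNT(DISTINCT gm.GroupID) = (SELECT COUNT(*) FROM @WantedGroups);
```

COUNT(DISTINCT ...) guards against duplicate membership rows inflating the count, and the primary key on the table variable keeps the wanted list itself free of duplicates.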