Complex Joins in PostgreSQL

It's possible I'm stupid, but I've been querying and checking for hours and I can't seem to find the answer to this, so I apologize in advance if the post is redundant... but I can't seem to find its doppelganger.
OK: I have a PostgreSQL db with the following tables:
Key (containing two fields in which I'm interested, ID and Name)
and a second table, Data.
Data contains, well... data, sorted by ID. ID is unique, but each Name has multiple IDs. E.g. if Bill enters the building, this is ID 1 for Bill. Mary enters the building, ID 2 for Mary. Bill re-enters the building, ID 3 for Bill.
The ID field is in both the Key table, and the DATA table.
What I want to do is... find
The MAX (e.g. last) ID, unique to EACH NAME, and the Data associated with it.
E.g. Bill - Last Login: ID 10. Time: 123UTC Door: West and so on.
So... I'm trying the following query:
SELECT
*
FROM
Data, Key
WHERE
Key.ID = (
SELECT
MAX (ID)
FROM
Key
GROUP BY ID
)
Here's the kicker, there's about... something like 800M items in these tables, so errors are... time consuming. Can anyone help to see if this query is gonna do what I expect?
Thanks so much.

To get the maximum ID for each name (Name lives in the Key table):
select Name, max(ID) as max_id
from key
group by Name;
Join that to your other table to pick up the data for each of those IDs.
select *
from data t1
inner join (select Name, max(ID) as max_id
from key
group by Name) t2
on t1.id = t2.max_id;
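With 800M rows, it may be worth checking the shape of the query on a handful of rows first. Below is a minimal sketch of that idea, using Python's built-in sqlite3 as a stand-in for PostgreSQL and the Bill/Mary example from the question. It assumes Name lives in Key and the details live in Data; the columns entered_at and door are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the real PostgreSQL tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "key" (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- entered_at and door are hypothetical columns, not from the question.
    CREATE TABLE data (id INTEGER PRIMARY KEY, entered_at TEXT, door TEXT);
    INSERT INTO "key" VALUES (1, 'Bill'), (2, 'Mary'), (3, 'Bill');
    INSERT INTO data VALUES
        (1, '08:00', 'East'), (2, '08:05', 'North'), (3, '12:30', 'West');
""")

# Find the highest id per name, then join back to data for the details.
LATEST_PER_NAME = """
    SELECT latest.name, d.id, d.entered_at, d.door
    FROM data AS d
    JOIN (SELECT name, MAX(id) AS max_id
          FROM "key"
          GROUP BY name) AS latest
      ON d.id = latest.max_id
    ORDER BY latest.name
"""
rows = conn.execute(LATEST_PER_NAME).fetchall()
for row in rows:
    print(row)
```

Each name should come back exactly once, paired with the data row for its highest ID. Once the small case looks right, the same SELECT can be run with EXPLAIN in PostgreSQL before committing to the full tables.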

Related

How can I delete one duplicate record per user in PostgreSQL

How can I delete duplicate records so that each user keeps exactly one unique record?
The same record may exist for several different users, but one user cannot have more than one copy of the same record.
For example, if user 1 searches for the spiderman movie, that is a unique record,
but if they search for spiderman again under exactly the same name, the second
record should be deleted. If user 2 searches for the same spiderman movie, that record is kept, because the user ID of user 2 differs from that of user 1, and so on.
My question is how to delete a user's duplicate records and keep only one unique record per user.
I am using two different tables to track users and match them with the movie table.
I would like to delete any movie that is not unique for a user across the user table and the movie table.
Here are my tables:
https://i.stack.imgur.com/yrc97.png
DELETE FROM movie_media t1
USING accountMovie t2
WHERE id IN
(SELECT id
FROM
(SELECT id,
ROW_NUMBER() OVER( PARTITION BY title
ORDER BY id) AS row_num
FROM movie_media ) t
WHERE t.row_num > 1)
AND t2."accountId" = $1
I'm looking for something like this. I know the code will not work; I'm just trying to convey the general idea of what I'm after.
The first thing you want to do is check for duplicate records before inserting them. I do this at the code level, but you can also do it at the database level. An ounce of prevention is worth a pound of cure!
If I'm understanding your tables correctly, you should be able to use this to delete your duplicate rows:
DELETE FROM movie
WHERE id IN
(SELECT id
FROM
(SELECT id,
ROW_NUMBER() OVER( PARTITION BY id,
mediaId
ORDER BY id ) AS row_num
FROM movie ) t
WHERE t.row_num > 1 );
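Since the real tables are only shown in the image, here is a minimal sketch on a made-up table (search_log, with hypothetical columns account_id and title), run through Python's sqlite3 as a stand-in for PostgreSQL. It partitions by the columns that define a duplicate (user and title), keeps the lowest id in each partition, and then adds a unique index as one way of doing the database-level prevention mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: one row per search, possibly duplicated per account.
    CREATE TABLE search_log (
        id INTEGER PRIMARY KEY, account_id INTEGER, title TEXT);
    INSERT INTO search_log (account_id, title) VALUES
        (1, 'spiderman'), (1, 'spiderman'), (2, 'spiderman'),
        (1, 'batman'), (2, 'spiderman');
""")

# Number the rows inside each (account_id, title) group and delete
# everything after the first row of each group.
conn.execute("""
    DELETE FROM search_log
    WHERE id IN (
        SELECT id
        FROM (SELECT id,
                     ROW_NUMBER() OVER (PARTITION BY account_id, title
                                        ORDER BY id) AS row_num
              FROM search_log) AS t
        WHERE t.row_num > 1)
""")
remaining = conn.execute(
    "SELECT id, account_id, title FROM search_log ORDER BY id").fetchall()

# Prevention: a unique index rejects any future duplicate for the same account.
conn.execute(
    "CREATE UNIQUE INDEX one_title_per_account ON search_log (account_id, title)")
try:
    conn.execute(
        "INSERT INTO search_log (account_id, title) VALUES (1, 'spiderman')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The key design choice is the PARTITION BY list: it should name exactly the columns that make two rows "the same", and must not include a column that is unique per row.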

Postgres Select one record per matching condition

I have some issues while trying to get only one record per matching condition..
Let's suppose I have Certifications table with the following columns:
Id, EmployeeId, DepartmentId, CertificationTitle, PassedDate
An employee can have more than one record in this table, but I need to get only one record per employee (based on the latest PassedDate).
SELECT Id, EmployeeId, CertificationTitle
FROM certifications c
ORDER BY EmployeeId, PassedDate DESC
From this select I need somehow to get only the first record for each EmployeeId.
Does anyone have any ideas how I can achieve this, Is it possible?
The Id is the Primary Key on the table, so it is different on each record.
I need to keep all these columns specified in the Select query.
The Group By didn't work for me, or maybe I did it wrong...
Use DISTINCT ON. This returns exactly the first row of each group according to the ORDER BY. You ordered correctly by PassedDate DESC to get the most recent record first. The group for DISTINCT ON, naturally, is EmployeeId:
SELECT DISTINCT ON (EmployeeId)
Id,
EmployeeId,
CertificationTitle
FROM certifications c
ORDER BY EmployeeId, PassedDate DESC
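DISTINCT ON is specific to PostgreSQL. If the query ever has to run elsewhere, the same "latest row per employee" result can be had with ROW_NUMBER(). A minimal sketch follows, run through Python's sqlite3 (which has no DISTINCT ON) with made-up sample rows. The extra Id DESC tie-breaker is my own assumption, for employees with two certifications on the same date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE certifications (
        Id INTEGER PRIMARY KEY, EmployeeId INTEGER, DepartmentId INTEGER,
        CertificationTitle TEXT, PassedDate TEXT);
    -- Sample rows invented for illustration.
    INSERT INTO certifications VALUES
        (1, 10, 1, 'SQL Basics',   '2019-03-01'),
        (2, 10, 1, 'SQL Advanced', '2020-06-15'),
        (3, 20, 2, 'Networking',   '2018-11-30');
""")

# Rank each employee's rows from newest to oldest and keep rank 1 only.
latest = conn.execute("""
    SELECT Id, EmployeeId, CertificationTitle
    FROM (SELECT c.*,
                 ROW_NUMBER() OVER (PARTITION BY EmployeeId
                                    ORDER BY PassedDate DESC, Id DESC) AS rn
          FROM certifications AS c) AS ranked
    WHERE rn = 1
    ORDER BY EmployeeId
""").fetchall()
```

In PostgreSQL itself, DISTINCT ON is usually the shorter way to write this; the window-function form is mainly useful for portability or when more than one row per group is wanted (rn <= 2, say).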

Count max values in PostgreSQL

I'm having trouble formulating an SQL query in PostgreSQL and am hoping to get some help here.
I have three tables: employee, visitor, and visit. I want to find out which employee (fk_employee_id) has been responsible for the most visits that haven't been checked out.
I want a query that returns just the number one result (with the max function, maybe?) instead of my current one, which returns a ranked list (and this ranked list doesn't work either if the number one position is shared by two people).
This is my current query:
select visitor.fk_employee_id, count(visitor.fk_employee_id)
From Visit
Inner Join visitor on visit.fk_visitor_id = visitor.visitor_id
WHERE check_out_time IS NULL
group by visitor.fk_employee_id, visitor.fk_employee_id
Limit 1
Anyone know how to do this?
To avoid confusion, I will rename the columns:
visitor table, the FK to employee id: employee_in_charge_id
visit table, the FK to employee id: employee_to_meet_id
From your explanation in the comments, you are looking for the employee who has the most visits that are not checked out.
If more than one employee shares the same maximum number of visits that are not checked out, this query lists all of them:
SELECT * FROM
(
SELECT
r.employee_in_charge_id,
count(*) cnt,
rank() over (ORDER BY count(*) DESC)
FROM visit v
JOIN visitor r ON v.visitor_id = r.id
WHERE v.check_out_time IS NULL
GROUP BY r.employee_in_charge_id
) a
WHERE rank = 1;
See the SQL Fiddle: http://sqlfiddle.com/#!17/423d9/2
Side Note:
To me, it sounds more correct for employee_in_charge_id to be part of the visit table rather than the visitor table. My assumption is that for each visit there is one employee (A) who is responsible for handling the visit, and the visitor is meeting one employee (B). So one visitor can make multiple visits, which are handled by different employees.
Anyway, my answer above is based on your original schema design.
Assuming a standard n:m implementation, this would be one way to do it:
SELECT fk_employee_id
FROM visit
WHERE check_out_time IS NULL
GROUP BY fk_employee_id
ORDER BY count(*) DESC
LIMIT 1;
Assuming referential integrity, you do not need to include the table visitor in the query at all.
count(*) is a bit faster than count(fk_employee_id) and does the same in this case (assuming fk_employee_id is NOT NULL). See:
PostgreSQL: running count of rows for a query 'by minute'
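The practical difference between the two answers shows up when there is a tie. Here is a minimal sketch using the column names from the question's own query (visit.fk_visitor_id, visitor.visitor_id, visitor.fk_employee_id), with invented rows and Python's sqlite3 standing in for PostgreSQL. The visit_id column and the employee_id alias are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visitor (visitor_id INTEGER PRIMARY KEY, fk_employee_id INTEGER);
    CREATE TABLE visit (
        visit_id INTEGER PRIMARY KEY, fk_visitor_id INTEGER, check_out_time TEXT);
    INSERT INTO visitor VALUES (1, 100), (2, 200), (3, 300);
    -- Employees 100 and 200 each have two open visits; 300 has one.
    INSERT INTO visit (fk_visitor_id, check_out_time) VALUES
        (1, NULL), (1, NULL), (2, NULL), (2, NULL), (3, NULL), (3, '17:00');
""")

# Open (not checked out) visits per responsible employee.
COUNTS = """
    SELECT r.fk_employee_id AS employee_id, COUNT(*) AS cnt
    FROM visit AS v
    JOIN visitor AS r ON v.fk_visitor_id = r.visitor_id
    WHERE v.check_out_time IS NULL
    GROUP BY r.fk_employee_id
"""

# rank() = 1 keeps every employee tied for first place.
all_leaders = conn.execute(f"""
    WITH counts AS ({COUNTS})
    SELECT employee_id, cnt
    FROM (SELECT employee_id, cnt, RANK() OVER (ORDER BY cnt DESC) AS rnk
          FROM counts) AS ranked
    WHERE rnk = 1
    ORDER BY employee_id
""").fetchall()

# ORDER BY ... LIMIT 1 returns a single row, picked arbitrarily among ties.
one_leader = conn.execute(f"""
    WITH counts AS ({COUNTS})
    SELECT employee_id, cnt FROM counts ORDER BY cnt DESC LIMIT 1
""").fetchall()
```

So LIMIT 1 is fine when any one of the top employees will do, while the rank() version is the one to use when shared first place has to be reported.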

PostgreSQL: custom logic for determining distinct rows?

Here's my problem. Suppose I have a table called persons containing, among other things, fields for the person's name and national identification number, with the latter being optional. There can be multiple rows for each actual person.
Now suppose I want to select exactly one row for each actual person. For the purposes of the application, two rows are considered to refer to the same person if a) their ID numbers match, or b) their names match and the ID number of one or both is NULL. SELECT DISTINCT is no good here: I cannot do a DISTINCT ON (name, id) because then two rows with the same name where the ID of one is NULL wouldn't match (which is incorrect, they should be considered the same). I cannot do a DISTINCT ON (name) because then rows with the same name but different IDs would match (again incorrect, they should be considered different). And I cannot do a DISTINCT ON (id) because then all the rows where ID is NULL would be considered the same (obviously incorrect).
Is there any way to redefine the way PostgreSQL compares rows to determine whether or not they're identical? I guess the default behaviour for DISTINCT ON (name, id) would be something like IF a.name = b.name AND a.id = b.id THEN IDENTICAL ELSE DISTINCT. I'd like to redefine it to something like IF a.id = b.id OR (a.name = b.name AND (a.id IS NULL OR b.id IS NULL)) THEN IDENTICAL ELSE DISTINCT.
It's pretty late and I might have missed something obvious, so other suggestions on how to achieve what I want would also be welcome. Anything to enable me to select distinct rows based on more complex criteria than a simple list of columns. Thanks in advance.
With Window Functions
--
-- First, SELECT those names with NULL national IDs not shadowed by the same
-- name with a national ID. Each one is a unique person.
--
SELECT DISTINCT name, id
FROM persons
WHERE NOT EXISTS (SELECT 1
FROM persons p
WHERE p.name = persons.name AND p.id IS NOT NULL)
--
-- Second, collapse each national ID into the "first" row with that ID,
-- whatever the name. Each ID is a unique person.
--
UNION ALL
SELECT name, id
FROM (SELECT name, id, ROW_NUMBER() OVER (PARTITION BY id)
FROM persons
WHERE id IS NOT NULL) d
WHERE d.row_number = 1;
Without Window Functions
Replace the second branch of the UNION ALL above with a GROUP BY that takes the first (MIN()) name for each non-NULL id:
...
UNION ALL
SELECT MIN(name) AS name, id
FROM persons
WHERE id IS NOT NULL
GROUP BY id
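To see how the two branches behave together, here is a minimal sketch of the GROUP BY variant on a few invented rows, run through Python's sqlite3 as a stand-in for PostgreSQL. It uses SELECT DISTINCT in the first branch so that several NULL-id rows with the same name collapse into one. The sample names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (name TEXT NOT NULL, id INTEGER);
    INSERT INTO persons VALUES
        ('Ann', 1), ('Ann', NULL),    -- NULL row is shadowed by Ann / 1
        ('Bob', NULL), ('Bob', NULL), -- same name, no id anywhere: one person
        ('Cid', 2), ('Cyd', 2);       -- same id, different spelling: one person
""")

people = conn.execute("""
    -- Names that never appear with an id: one row per name.
    SELECT DISTINCT name, id
    FROM persons
    WHERE NOT EXISTS (SELECT 1
                      FROM persons AS p
                      WHERE p.name = persons.name AND p.id IS NOT NULL)
    UNION ALL
    -- One row per non-NULL id, using the alphabetically first name.
    SELECT MIN(name) AS name, id
    FROM persons
    WHERE id IS NOT NULL
    GROUP BY id
    ORDER BY name
""").fetchall()
```

Note that this treats a NULL-id row as belonging to whichever same-named person has an id. If two people with different ids share a name, the query cannot tell which of them the NULL row refers to.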
It seems like the main problem is the layout of your database. I don't know the details of your specific application, but having multiple rows and null IDs for the same person is usually a bad idea. If possible you may want to consider creating a separate table for any of the information that requires multiple rows, with persons only containing one row per person and a unique identifier for each row.
But, if you can't do that... I don't think just a distinct is going to solve this problem.
What's the problem with:
select distinct name, id
from persons
where id is not null
Do you have some persons that have a name, but not an ID? Or do you need some specific data from the other rows?
Here's another problem: if there are two rows with the same name and null IDs, and multiple people with the same name and different IDs, how do you know which person the null rows match?

T-SQL - How to write query to get records that match ALL records in a many to many join

(I don't think I have titled this question correctly - but I don't know how to describe it)
Here is what I am trying to do:
Let's say I have a Person table that has a PersonID field. And let's say that a Person can belong to many Groups. So there is a Group table with a GroupID field and a GroupMembership table that is a many-to-many join between the two tables and the GroupMembership table has a PersonID field and a GroupID field. So far, it is a simple many to many join.
Given a list of GroupIDs I would like to be able to write a query that returns all of the people that are in ALL of those groups (not any one of those groups). And the query should be able to handle any number of GroupIDs. I would like to avoid dynamic SQL.
Is there some simple way of doing this that I am missing?
Thanks,
Corey
select person_id, count(*) from groupmembership
where group_id in ([your list of group ids])
group by person_id
having count(*) = [size of your list of group ids]
Edited: thank you dotjoe!
Basically you are looking for persons for whom there is no group in your list that they are not a member of, so
select *
from Person p
where not exists (
select 1
from [Group] g
where g.GroupID in ([your list of group ids])
and not exists (
select 1
from GroupMembership gm
where gm.PersonID = p.PersonID
and gm.GroupID = g.GroupID
)
)
You're basically not going to avoid "dynamic" SQL in the sense of dynamically generating the query at query time. There's no way to hand a list around in SQL. (Well, there is, via table variables, but getting them into the system from C# is either impossible (SQL Server 2005 and below) or else annoying (SQL Server 2008).)
One way that you could do it with multiple queries is to insert your list into a work table (probably a process-keyed table) and join against that table. The only other option would be to use a dynamic query such as the ones specified by Jonathan and hongliang.
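The work-table idea can be sketched as follows. This is only an illustration, using Python's sqlite3 rather than SQL Server. The work table name WantedGroup and the helper people_in_all_groups are hypothetical, and the query text itself stays fixed no matter how many group IDs are passed in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE GroupMembership (PersonID INTEGER, GroupID INTEGER);
    -- Hypothetical work table holding the requested group ids.
    CREATE TABLE WantedGroup (GroupID INTEGER PRIMARY KEY);
    INSERT INTO GroupMembership VALUES
        (1, 10), (1, 20), (1, 30), (2, 10), (3, 20), (3, 10);
""")

def people_in_all_groups(conn, group_ids):
    """Return the PersonIDs that belong to every group in group_ids."""
    # Load the list into the work table; the IDs travel as parameters,
    # so no SQL text is built from them.
    conn.execute("DELETE FROM WantedGroup")
    conn.executemany("INSERT INTO WantedGroup VALUES (?)",
                     [(g,) for g in set(group_ids)])
    rows = conn.execute("""
        SELECT gm.PersonID
        FROM GroupMembership AS gm
        JOIN WantedGroup AS w ON w.GroupID = gm.GroupID
        GROUP BY gm.PersonID
        -- A person qualifies when they match as many distinct groups
        -- as there are rows in the work table.
        HAVING COUNT(DISTINCT gm.GroupID) = (SELECT COUNT(*) FROM WantedGroup)
        ORDER BY gm.PersonID
    """).fetchall()
    return [person_id for (person_id,) in rows]

both = people_in_all_groups(conn, [10, 20])
all_three = people_in_all_groups(conn, [10, 20, 30])
```

COUNT(DISTINCT ...) guards against duplicate membership rows inflating the count. In a multi-user system the work table would need a per-process key (or be a temporary table) so that concurrent callers do not overwrite each other's lists.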