Join returns duplicate rows - postgresql

I am learning PostgreSQL using the Sakila database. There's a table called actor that has an actor_id, first_name and last_name. There is another table that has actors mapped to films through an actor_id and film_id combination.
I expect the following query to return one row for each actor with the maximum value for film_id for that actor, but I am getting multiple rows instead of one (the maximum of film_id for that actor).
SELECT actor.first_name, actor.last_name, MAX(film_actor.film_id)
FROM actor
LEFT JOIN film_actor ON film_actor.actor_id = actor.actor_id
GROUP BY film_actor.film_id, actor.first_name, actor.last_name
ORDER BY film_actor.film_id;
I appreciate your help in understanding how to get this right using joins (I already have the solution to achieve this using a sub-query).
PS: I am sure this question is often asked by beginners to SQL, but I have not seen an answer that works yet.

You must remove film_actor.film_id from:
GROUP BY film_actor.film_id, actor.first_name, actor.last_name
because you want to group by actor only.
So change to this:
GROUP BY actor.actor_id, actor.first_name, actor.last_name
I also added actor.actor_id in case two actors share the same name.
Then change the ORDER BY clause to:
ORDER BY MAX(film_actor.film_id)
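To see the effect of the grouping key, here is a minimal sketch that runs both versions of the query against a tiny in-memory SQLite database via Python's sqlite3 module. SQLite only stands in for PostgreSQL here, and the three actors and their film ids are made-up sample rows, not the real Sakila data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
-- Hypothetical sample rows; actor 3 has no films at all.
INSERT INTO actor VALUES (1, 'PENELOPE', 'GUINESS'), (2, 'NICK', 'WAHLBERG'), (3, 'ED', 'CHASE');
INSERT INTO film_actor VALUES (1, 1), (1, 23), (1, 25), (2, 3), (2, 31);
""")

# Original grouping: film_id is part of the key, so each (actor, film) pair
# forms its own group and MAX() just echoes that single film_id.
per_film = conn.execute("""
    SELECT actor.first_name, actor.last_name, MAX(film_actor.film_id)
    FROM actor
    LEFT JOIN film_actor ON film_actor.actor_id = actor.actor_id
    GROUP BY film_actor.film_id, actor.first_name, actor.last_name
""").fetchall()

# Corrected grouping: one group per actor, so MAX() sees all of that actor's films.
per_actor = conn.execute("""
    SELECT actor.first_name, actor.last_name, MAX(film_actor.film_id) AS max_film_id
    FROM actor
    LEFT JOIN film_actor ON film_actor.actor_id = actor.actor_id
    GROUP BY actor.actor_id, actor.first_name, actor.last_name
    ORDER BY max_film_id
""").fetchall()

print(len(per_film), "rows when grouping by film_id")
print(len(per_actor), "rows when grouping by actor")
```

Because of the LEFT JOIN, the actor without films still appears, with NULL as the maximum. Where that NULL sorts differs between engines, so add NULLS FIRST or NULLS LAST in PostgreSQL if the position matters.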

Related

PostgreSQL how to GROUP BY single field from returned table

I have a complicated query; simplified, it looks like this:
SELECT
t.*,
SUM(a.hours) AS spent_hours
FROM (
SELECT
person.id,
person.name,
person.age,
SUM(contacts.id) AS contact_count
FROM
person
JOIN contacts ON contacts.person_id = person.id
) AS t
JOIN activities AS a ON a.person_id = t.id
GROUP BY t.id
Such a query works fine in MySQL, but Postgres needs to know that the GROUP BY field is unique. Even though it actually is, in this case I would need to GROUP BY every field returned from the t subquery.
I can do that, but I don't believe it will work efficiently with big data.
I can't JOIN with activities directly in the first query, because a person can have several contacts, which would make the query count the hours of each activity once for every joined contact.
Is there a Postgres way to make this query work? Maybe a way to force Postgres to treat t.id as unique, or some other solution that achieves the same thing the Postgres way?
This query will not work as written in either database system: the inner query uses an aggregate function without a GROUP BY (unless you use window functions). MySQL is a special case: it accepts the query if ONLY_FULL_GROUP_BY is removed from sql_mode. So MySQL allows this usage because of a server setting, but PostgreSQL does not.
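One way to sidestep both the missing GROUP BY and the double counting the asker worries about is to aggregate each child table in its own subquery and join the already-aggregated results to person. The sketch below runs that idea in an in-memory SQLite database through Python's sqlite3 module. The sample rows are invented, and it assumes contact_count was meant to be a row count, so it uses COUNT(*) where the question has SUM(contacts.id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE contacts (id INTEGER PRIMARY KEY, person_id INTEGER);
CREATE TABLE activities (id INTEGER PRIMARY KEY, person_id INTEGER, hours INTEGER);
-- Hypothetical sample rows: Ann has 2 contacts and 2 activities, Bob has 1 of each.
INSERT INTO person VALUES (1, 'Ann', 30), (2, 'Bob', 40);
INSERT INTO contacts VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO activities VALUES (1, 1, 2), (2, 1, 3), (3, 2, 4);
""")

# Naive version: joining both child tables first multiplies the rows,
# so Ann's 5 hours are counted once per contact.
naive = conn.execute("""
    SELECT p.id, SUM(a.hours)
    FROM person AS p
    JOIN contacts AS c ON c.person_id = p.id
    JOIN activities AS a ON a.person_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

# Pre-aggregated version: each subquery yields one row per person,
# so the outer query needs no GROUP BY at all.
fixed = conn.execute("""
    SELECT p.id, p.name, p.age, c.contact_count, a.spent_hours
    FROM person AS p
    JOIN (SELECT person_id, COUNT(*) AS contact_count
          FROM contacts GROUP BY person_id) AS c ON c.person_id = p.id
    JOIN (SELECT person_id, SUM(hours) AS spent_hours
          FROM activities GROUP BY person_id) AS a ON a.person_id = p.id
    ORDER BY p.id
""").fetchall()

print(naive)
print(fixed)
```

Use LEFT JOIN for the two subqueries if people without contacts or activities should still be listed.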
I knew MySQL allowed indeterminate grouping, but I honestly never knew how it implemented it... it always seemed imprecise to me, conceptually.
So depending on what that means (I'm too lazy to look it up), you might need one of two possible solutions, or maybe a third.
If your intent is to see all rows (perform the aggregate function but not consolidate/group rows), then you want a window function, invoked with PARTITION BY. Here is a really dumbed-down version of your query:
SELECT
t.*,
SUM (a.hours) over (partition by t.id) AS spent_hours
FROM t
JOIN activities AS a ON a.person_id = t.id
This means you want all records in table t, not one record per t.id. But each row will also contain the sum of the hours across all rows that share its id.
For example the sum column would look like this:
Name   Hours  Sum Hours
-----  -----  ---------
Smith     20        120
Jones     30         30
Smith    100        120
Whereas a group by would have had Smith once and could not have displayed the hours column in detail.
If you really did only want one row per t.id, then Postgres will require you to tell it how to determine which row. In the example above for Smith, do you want to see the 20 or the 100?
There is another possibility, but I think I'll let you reply first. My gut tells me option 1 is what you're after and you want the analytic function.
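The difference between the two options can be checked with the Smith/Jones figures from the table above. This is a sketch only: it uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for Postgres (window functions need SQLite 3.25 or newer), and the single activity table with a name column is a simplification invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical one-table layout holding the rows from the example table.
CREATE TABLE activity (name TEXT, hours INTEGER);
INSERT INTO activity VALUES ('Smith', 20), ('Jones', 30), ('Smith', 100);
""")

# Window function: every detail row is kept and carries its partition's total.
windowed = conn.execute("""
    SELECT name, hours, SUM(hours) OVER (PARTITION BY name) AS sum_hours
    FROM activity
    ORDER BY name, hours
""").fetchall()

# GROUP BY: rows are collapsed to one per name, so the detail hours are gone.
grouped = conn.execute("""
    SELECT name, SUM(hours) AS sum_hours
    FROM activity
    GROUP BY name
    ORDER BY name
""").fetchall()

print(windowed)
print(grouped)
```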

Postgresql get references from a dictionary

I'm trying to build a request to get the data from a table, but some of those columns have foreign keys I would like to replace by the associated keyword in one request.
Basically there's
table A with column 1:PKA-ID and column 2:name.
table B with column 1:PKB-ID, column 2:FKA-ID, column 3:amount.
I want to get all the lines in table B but with all foreign keys replaced by the associated names in table A.
I started building a request with a subrequest + alias to get that, but ofc I have more than one result per subrequest, yet I can't find a way to link that subrequest to the ID of table B [might be exhausted, dumb or both] from the main request. I did something like that:
SELECT (SELECT "NAME" FROM A JOIN B ON ID = FKA-ID) AS name, amount FROM TABLEB;
it feels so simple of a request yet...
You don't need a join in the subselect.
SELECT pkb_id,
(SELECT name FROM a WHERE a.pka_id = b.fka_id),
amount
FROM b;
(See it live in SQL Fiddle).
The subselect query runs for each and every row of its parent select and has the parent row available from the context.
You can also use a simple join.
SELECT b.pkb_id, a.name, b.amount
FROM b, a
WHERE a.pka_id = b.fka_id;
Note that the join version puts fewer restrictions on the PostgreSQL query optimizer, so in some cases it might run faster. (For example, in PostgreSQL 9.6 the join might use multiple CPU cores, cf. Parallel Query.)
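The two forms are not always interchangeable: when a row in b has no matching row in a, the scalar subquery returns NULL for the name, while the inner join drops the row. The sketch below shows this with made-up rows in an in-memory SQLite database (via Python's sqlite3 module, standing in for PostgreSQL); a LEFT JOIN reproduces the subquery's behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (pka_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE b (pkb_id INTEGER PRIMARY KEY, fka_id INTEGER, amount INTEGER);
-- Hypothetical sample rows; row 13 has no foreign key set.
INSERT INTO a VALUES (1, 'alpha'), (2, 'beta');
INSERT INTO b VALUES (10, 1, 5), (11, 2, 7), (12, 1, 9), (13, NULL, 1);
""")

# Correlated scalar subquery: evaluated once per row of b.
with_subquery = conn.execute("""
    SELECT pkb_id, (SELECT name FROM a WHERE a.pka_id = b.fka_id), amount
    FROM b
    ORDER BY pkb_id
""").fetchall()

# Inner join: rows of b without a match in a disappear.
with_inner_join = conn.execute("""
    SELECT b.pkb_id, a.name, b.amount
    FROM b JOIN a ON a.pka_id = b.fka_id
    ORDER BY b.pkb_id
""").fetchall()

# Left join: keeps every row of b, like the subquery version.
with_left_join = conn.execute("""
    SELECT b.pkb_id, a.name, b.amount
    FROM b LEFT JOIN a ON a.pka_id = b.fka_id
    ORDER BY b.pkb_id
""").fetchall()

print(with_subquery)
print(with_inner_join)
```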

Create a query to select two columns; (Company, No. of Films) from the database

I have created a database as part of university assignment and I have hit a snag with the question in the title.
More likely I am being asked to find out how many films each company has made. Which suggests to me a group by query. But I have no idea where to begin. It is only a two mark question but the syntax is not clicking in my head.
My schema is:
CREATE TABLE Movie
(movieID CHAR(3) ,
title CHAR(36),
year NUMBER,
company CHAR(50),
totalNoms NUMBER,
awardsWon NUMBER,
DVDPrice NUMBER(5,2),
discountPrice NUMBER(5,2))
There are other tables but at first glance I don't think they are relevant to this question.
I am using sqlplus10
The answer you need comes from three basic SQL concepts; I'll step through them with you. If you need more help building an answer from these hints, let me know and I can keep guiding you.
Group By
As you mentioned, SQL offers a GROUP BY clause that can help you.
A SQL Query utilizing GROUP BY would look like the following.
SELECT list, fields, aggregate(value)
FROM tablename
--WHERE goes here, if you need to restrict your result set
GROUP BY list, fields
A GROUP BY query can only return fields listed in the GROUP BY clause, or aggregate functions acting on each group.
Aggregate Functions
Your homework question also needs an Aggregate function called Count. This is used to count the results returned. A simple query like the following returns the count of all records returned.
SELECT Count(*)
FROM tablename
The two can be combined, allowing you to get the Count of each group in the following way.
SELECT list, fields, count(*)
FROM tablename
GROUP BY list, fields
Column Aliases
Another answer also tried to introduce you to SQL column aliases, but they did not use SQLPLUS syntax.
SELECT Count(*) as count
...
SQLPLUS column alias syntax is shown below.
SELECT Count(*) "count"
...
I'm not going to provide you the SQL, but instead a way to think about it.
What you want to do is select where the company matches and count the total rows returned. That count is the number of films made by the specified company.
Hope that points you in the right direction.
Select company, count(*) AS count
from Movie
group by company
select * group by company won't work in Oracle.
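For a quick check of what the grouped count returns, here is a small sketch using Python's sqlite3 module with an in-memory database. SQLite stands in for Oracle, the Movie table is cut down to three columns, and the titles and company names are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced, hypothetical version of the Movie table from the question.
CREATE TABLE Movie (movieID TEXT, title TEXT, company TEXT);
INSERT INTO Movie VALUES
    ('001', 'First Film', 'Studio A'),
    ('002', 'Second Film', 'Studio A'),
    ('003', 'Third Film', 'Studio B');
""")

# One output row per company; COUNT(*) counts the films in each group.
films_per_company = conn.execute("""
    SELECT company, COUNT(*) AS film_count
    FROM Movie
    GROUP BY company
    ORDER BY company
""").fetchall()

for company, film_count in films_per_company:
    print(company, film_count)
```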

T-SQL - How to write query to get records that match ALL records in a many to many join

(I don't think I have titled this question correctly - but I don't know how to describe it)
Here is what I am trying to do:
Let's say I have a Person table that has a PersonID field. And let's say that a Person can belong to many Groups. So there is a Group table with a GroupID field and a GroupMembership table that is a many-to-many join between the two tables and the GroupMembership table has a PersonID field and a GroupID field. So far, it is a simple many to many join.
Given a list of GroupIDs I would like to be able to write a query that returns all of the people that are in ALL of those groups (not any one of those groups). And the query should be able to handle any number of GroupIDs. I would like to avoid dynamic SQL.
Is there some simple way of doing this that I am missing?
Thanks,
Corey
select person_id, count(*) from groupmembership
where group_id in ([your list of group ids])
group by person_id
having count(*) = [size of your list of group ids]
Edited: thank you dotjoe!
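The HAVING approach can be wrapped so it handles any number of group ids. The sketch below uses Python's sqlite3 module with an in-memory SQLite database in place of SQL Server. The helper name people_in_all_groups and the sample memberships are hypothetical. It generates only the `?` placeholders, never the values themselves, and compares COUNT(DISTINCT group_id) against the de-duplicated list so repeated ids or duplicate membership rows do not skew the result.

```python
import sqlite3

def people_in_all_groups(conn, group_ids):
    """Return ids of people who belong to every group in group_ids."""
    wanted = sorted(set(group_ids))  # ignore repeated ids in the input
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT person_id
        FROM groupmembership
        WHERE group_id IN ({placeholders})
        GROUP BY person_id
        HAVING COUNT(DISTINCT group_id) = ?
        ORDER BY person_id
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE groupmembership (person_id INTEGER, group_id INTEGER);
-- Hypothetical memberships: person 1 is in all three groups.
INSERT INTO groupmembership VALUES
    (1, 10), (1, 20), (1, 30), (2, 10), (2, 30), (3, 20);
""")

print(people_in_all_groups(conn, [10, 30]))
print(people_in_all_groups(conn, [10, 20, 30]))
```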
Basically you are looking for Persons for whom there is no group they are not a member of (to test against a specific list, restrict the Group subquery to those GroupIDs), so
select *
from Person p
where not exists (
select 1
from [Group] g
where not exists (
select 1
from GroupMembership gm
where gm.PersonID = p.PersonID
and gm.GroupID = g.GroupID
)
)
You're basically not going to avoid "dynamic" SQL in the sense of dynamically generating the query at query time. There's no way to hand a list around in SQL (well, there is, table variables, but getting them into the system from C# is either impossible (2005 & below) or else annoying (2008)).
One way that you could do it with multiple queries is to insert your list into a work table (probably a process-keyed table) and join against that table. The only other option would be to use a dynamic query such as the ones specified by Jonathan and hongliang.
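The work-table idea might look like the sketch below, written with Python's sqlite3 module and an in-memory SQLite database rather than SQL Server. The temp table name wanted_groups, the helper people_in_all_groups_via_table and the sample rows are all hypothetical. The query text stays fixed; only the contents of the work table change between calls.

```python
import sqlite3

def people_in_all_groups_via_table(conn, group_ids):
    """Load the wanted ids into a temp table, then join against it."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS wanted_groups (group_id INTEGER PRIMARY KEY)"
    )
    conn.execute("DELETE FROM wanted_groups")  # clear ids from a previous call
    conn.executemany(
        "INSERT OR IGNORE INTO wanted_groups (group_id) VALUES (?)",
        [(g,) for g in group_ids],
    )
    rows = conn.execute("""
        SELECT gm.person_id
        FROM groupmembership AS gm
        JOIN wanted_groups AS w ON w.group_id = gm.group_id
        GROUP BY gm.person_id
        HAVING COUNT(DISTINCT gm.group_id) = (SELECT COUNT(*) FROM wanted_groups)
        ORDER BY gm.person_id
    """).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE groupmembership (person_id INTEGER, group_id INTEGER);
-- Hypothetical memberships: person 1 is in all three groups.
INSERT INTO groupmembership VALUES
    (1, 10), (1, 20), (1, 30), (2, 10), (2, 30), (3, 20);
""")

print(people_in_all_groups_via_table(conn, [10, 30]))
print(people_in_all_groups_via_table(conn, [20]))
```

In a multi-user setting the work table would need a key per caller (the "process-keyed table" mentioned above) or a session-local temp table, as used here.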

How can I write a SQL select statement to include a lookup from another table?

I'm copying data from one database to another and massaging the data while I'm at it. Both databases have tables called Clients and Jobs.
However, in database "Alpha" the Jobs table does not have a relationship to the Clients table, where database "Epsilon" does. Alpha's Jobs table just has the Clients name in an nvarchar column.
I need a select statement to lookup the Client's ID in the Client table by their name while I am inserting it into the Jobs table in Epsilon.
My unfinished SQL statement looks like this:
insert into Epsilon.dbo.Jobs (ClientId, Name, Location, DateCreated)
select ????, Name, Location, DateCreated from Alpha.dbo.Jobs
How can I modify this so that the ???? contains the ClientId from the Clients table in Epsilon? I know I need to lookup the data using the Name column in Jobs, but I can't figure out the syntax for this.
What you need is a join. Joins, contrary to what pretty much everybody thinks when starting out, don't require defined relationships in the schema of the database. They just require that the two columns you're comparing have comparable types (edit: see comments).
The question is which join do you want. Because there isn't a relationship defined, there may be clients that have jobs and clients that don't, and jobs that have clients and jobs that don't.
I'm assuming that you want all JOBS that exist, and where a ClientId matches the CLIENTS table bring in the ClientId, and where that relationship doesn't exist to leave the ClientId null. We can do this with a LEFT JOIN. Jobs LEFT JOIN Clients will bring in all records on the LEFT, even where the relationship defined with Clients on the right doesn't exist. We could reverse the two and do a RIGHT JOIN, but that's not what people usually do. I'll leave it to you to read up on other types of joins and how they work.
So your select statement would look like:
select c.ClientId, j.Name, j.Location, j.DateCreated
from Alpha.dbo.Jobs as j LEFT JOIN
Epsilon.dbo.Clients as c ON j.ClientName = c.ClientName
If Jobs.ClientName is not the same data type as c.ClientName, you can edit the schema before running the query to bring them in line with each other.
insert into Epsilon.dbo.Jobs (ClientId, Name, Location, DateCreated)
select c.ClientID, a.Name, a.Location, a.DateCreated from Alpha.dbo.Jobs a
join Epsilon.dbo.Clients c on c.Name = a.ClientName
This is a pretty optimistic join, but even if it needs to be modified this should give you the general idea.
insert into Epsilon.dbo.Jobs
(ClientId, Name, Location,
DateCreated)
select c.ClientId, j.Name, j.Location, j.DateCreated from Alpha.dbo.Jobs as j
inner join Epsilon.dbo.Clients as c On
(j.ClientName = c.Name)
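To try the insert-with-lookup pattern end to end, here is a sketch using Python's sqlite3 module and an in-memory SQLite database. SQLite has no Alpha.dbo / Epsilon.dbo naming, so the tables alpha_jobs, epsilon_clients and epsilon_jobs are stand-ins, and the column names (ClientName in the old Jobs table, Name in Clients) and sample rows are assumptions. It uses a LEFT JOIN so that jobs whose client name has no match are still copied, with a NULL ClientId.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-ins for Alpha.dbo.Jobs, Epsilon.dbo.Clients and Epsilon.dbo.Jobs.
CREATE TABLE alpha_jobs (ClientName TEXT, Name TEXT, Location TEXT, DateCreated TEXT);
CREATE TABLE epsilon_clients (ClientId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE epsilon_jobs (ClientId INTEGER, Name TEXT, Location TEXT, DateCreated TEXT);
-- Hypothetical sample rows; 'Unknown Ltd' has no client record.
INSERT INTO alpha_jobs VALUES
    ('Acme', 'Roof repair', 'Leeds', '2009-01-05'),
    ('Globex', 'Survey', 'York', '2009-02-11'),
    ('Unknown Ltd', 'Audit', 'Hull', '2009-03-01');
INSERT INTO epsilon_clients VALUES (1, 'Acme'), (2, 'Globex');
""")

# Copy every job, looking up the client id by name while inserting.
conn.execute("""
    INSERT INTO epsilon_jobs (ClientId, Name, Location, DateCreated)
    SELECT c.ClientId, j.Name, j.Location, j.DateCreated
    FROM alpha_jobs AS j
    LEFT JOIN epsilon_clients AS c ON c.Name = j.ClientName
""")

copied = conn.execute(
    "SELECT ClientId, Name FROM epsilon_jobs ORDER BY Name"
).fetchall()
print(copied)
```

Switching to an inner join would skip the unmatched job instead of inserting it with a NULL client. If client names are not unique in the target table, the join would duplicate jobs, so that is worth checking before running the copy for real.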