LEFT JOIN trouble with multiple tables - postgresql

I have the following query
SELECT a.account_id, sum(p.amount) AS amount
FROM accounts a
LEFT JOIN users_accounts ua
JOIN users u
JOIN payments p on p.meta_id = u.user_id
ON u.user_id = ua.user_id
ON ua.account_id = a.account_id
WHERE p.date_prcsd BETWEEN '2017-08-01 00:00:00' AND '2017-08-31 23:59:59'
GROUP BY a.account_id
ORDER BY account_id ASC;
What I want is all the rows from accounts a, with zeroes where amount data is missing. Instead, I get the same result set whatever join types and join structures I try: only the rows that have some payments in p.
Where do I go wrong?

Simplified:
SELECT a.account_id
,sum(coalesce(p2.amount, 0)) AS amount
FROM accounts a
LEFT JOIN users_accounts ua ON (a.account_id = ua.account_id)
LEFT JOIN users u ON (ua.user_id = u.user_id)
LEFT JOIN (
SELECT p.meta_id
,p.amount
FROM payments p
WHERE p.date BETWEEN '2017-08-01' AND '2017-08-10'
) AS p2 ON (u.user_id = p2.meta_id)
GROUP BY a.account_id
ORDER BY account_id ASC;
Result:
account_id | amount
------------+--------
1 | 4
2 | 0
3 | 0
(3 rows)
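Explanation: you need to handle the NULL values that the outer joins return for unmatched rows; coalesce() does that for you. The WHERE clause is the real problem in your query: because it tests a column of p, it discards every row where p is NULL (accounts without matching payments), which effectively turns your LEFT JOIN into an inner join. Filtering the payments in a subquery before joining them keeps those rows. On top of that, you used plain (inner) joins for the other tables instead of left joins. I created a simplified test database: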
$ cat tables.sql
drop table if exists users_accounts;
drop table if exists payments;
drop table if exists users;
drop table if exists accounts;
create table accounts (account_id serial primary key, name varchar not null);
create table users (user_id serial primary key, name varchar not null);
create table users_accounts (user_id int references users(user_id),
                             account_id int references accounts(account_id));
create table payments (meta_id int references users(user_id), amount int not null, date date);
insert into accounts (account_id, name) values (1, 'Account A'), (2, 'Account B'), (3, 'Account C');
insert into users (user_id, name) values (1, 'Marc'), (2, 'Ruben'), (3, 'Isaak');
insert into users_accounts (user_id, account_id) values (1, 1), (2, 1);
insert into payments (meta_id, amount, date) values (1, 1, '2017-08-01'),
  (1, 2, '2017-08-11'), (1, 3, '2017-08-03'), (2, 1, null), (2, 2, null), (2, 3, null);
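To see both points side by side, here is a minimal sketch that loads the same test data into an in-memory SQLite database through Python's standard sqlite3 module. SQLite is an assumption standing in for PostgreSQL (serial becomes INTEGER PRIMARY KEY, the date column is stored as ISO text), and the query constant names are hypothetical. Even with LEFT JOINs on every table, testing p.date in the WHERE clause keeps only account 1, while filtering payments inside the joined subquery keeps all three accounts.

```python
import sqlite3

SCHEMA = """
CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE users_accounts (user_id INTEGER REFERENCES users(user_id),
                             account_id INTEGER REFERENCES accounts(account_id));
CREATE TABLE payments (meta_id INTEGER REFERENCES users(user_id),
                       amount INTEGER NOT NULL, date TEXT);
INSERT INTO accounts VALUES (1, 'Account A'), (2, 'Account B'), (3, 'Account C');
INSERT INTO users VALUES (1, 'Marc'), (2, 'Ruben'), (3, 'Isaak');
INSERT INTO users_accounts VALUES (1, 1), (2, 1);
INSERT INTO payments VALUES (1, 1, '2017-08-01'), (1, 2, '2017-08-11'),
    (1, 3, '2017-08-03'), (2, 1, NULL), (2, 2, NULL), (2, 3, NULL);
"""

# Date filter in WHERE: rows where p is NULL fail the test, so the
# LEFT JOINs behave like inner joins for the final result.
FILTER_IN_WHERE = """
SELECT a.account_id, SUM(p.amount) AS amount
FROM accounts a
LEFT JOIN users_accounts ua ON ua.account_id = a.account_id
LEFT JOIN users u ON u.user_id = ua.user_id
LEFT JOIN payments p ON p.meta_id = u.user_id
WHERE p.date BETWEEN '2017-08-01' AND '2017-08-10'
GROUP BY a.account_id
ORDER BY a.account_id
"""

# Date filter inside the joined subquery: every account survives and
# COALESCE turns the missing amounts into 0.
FILTER_IN_SUBQUERY = """
SELECT a.account_id, SUM(COALESCE(p2.amount, 0)) AS amount
FROM accounts a
LEFT JOIN users_accounts ua ON a.account_id = ua.account_id
LEFT JOIN users u ON ua.user_id = u.user_id
LEFT JOIN (
    SELECT p.meta_id, p.amount
    FROM payments p
    WHERE p.date BETWEEN '2017-08-01' AND '2017-08-10'
) AS p2 ON u.user_id = p2.meta_id
GROUP BY a.account_id
ORDER BY a.account_id
"""


def run(query):
    """Build a fresh in-memory database and return the query's rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run(FILTER_IN_WHERE))     # [(1, 4)]
    print(run(FILTER_IN_SUBQUERY))  # [(1, 4), (2, 0), (3, 0)]
```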


PostgreSQL count other values of ID that have the same value of other column

Let's say we have the following table that stores the id of an observation and its address_id. You can create the table with the following code:
drop table if exists schema.pl_address_cnt;
create table schema.pl_address_cnt (
id serial,
address_id int);
insert into schema.pl_address_cnt(address_id) values
(100), (101), (100), (101), (100), (125), (128), (200), (200), (100);
My task is to count, for each id, how many other ids (hence the -1) have the same address_id. I've come up with a solution that turns out to be quite expensive (according to EXPLAIN) on the original dataset. I wonder whether my solution can be optimised somehow.
with tmp_table as (select address_id
, count(distinct id) as id_count
from schema.pl_address_cnt
group by address_id
)
select id
, id_count - 1
from schema.pl_address_cnt as pac
left join tmp_table as tt on tt.address_id=pac.address_id;
You can try to omit the CTE and instead do a self left join on the same address_id but a different id, and then aggregate.
SELECT pac1.id,
count(pac2.id)
FROM pl_address_cnt pac1
LEFT JOIN pl_address_cnt pac2
ON pac1.address_id = pac2.address_id
AND pac1.id <> pac2.id
GROUP BY pac1.id
ORDER BY pac1.id;
For performance you can try indexes on (address_id, id) and (id).
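A small sketch, run against SQLite through Python's sqlite3 module rather than PostgreSQL (an assumption: the schema prefix is dropped and serial becomes INTEGER PRIMARY KEY), that checks the self-join returns the same counts as the original CTE on the sample data. The index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pl_address_cnt (id INTEGER PRIMARY KEY, address_id INTEGER);
INSERT INTO pl_address_cnt (address_id) VALUES
    (100), (101), (100), (101), (100), (125), (128), (200), (200), (100);
-- The (address_id, id) index suggested above; id is already indexed as the key.
CREATE INDEX pac_address_id_id ON pl_address_cnt (address_id, id);
""")

# The question's approach: count ids per address, then subtract 1.
CTE_QUERY = """
WITH tmp_table AS (
    SELECT address_id, COUNT(DISTINCT id) AS id_count
    FROM pl_address_cnt
    GROUP BY address_id
)
SELECT id, id_count - 1
FROM pl_address_cnt AS pac
LEFT JOIN tmp_table AS tt ON tt.address_id = pac.address_id
ORDER BY id
"""

# The answer's approach: pair each row with the other rows at the same
# address; COUNT(pac2.id) ignores the NULLs produced for unmatched rows.
SELF_JOIN_QUERY = """
SELECT pac1.id, COUNT(pac2.id)
FROM pl_address_cnt pac1
LEFT JOIN pl_address_cnt pac2
       ON pac1.address_id = pac2.address_id
      AND pac1.id <> pac2.id
GROUP BY pac1.id
ORDER BY pac1.id
"""

cte_rows = conn.execute(CTE_QUERY).fetchall()
self_join_rows = conn.execute(SELF_JOIN_QUERY).fetchall()
print(self_join_rows == cte_rows)  # True
print(self_join_rows[:3])          # [(1, 3), (2, 1), (3, 3)]
```

Whether the self-join is actually cheaper depends on the data: an address_id shared by k rows produces k*(k-1) joined pairs, so heavily repeated addresses can make it more expensive than the CTE. Comparing EXPLAIN ANALYZE output on the real table is the way to decide.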

Selecting one specific data row (required), and 3 others (specific data row must be included)

I need to select a specific row and 2 other rows that are not that specific row (a total of 3). The specific row must always be included in the 3 results. How should I go about it? I think it can be done with a UNION ALL, but do I have any other choice? Thanks all! :)
Here are my scripts to create the sample tables:
create table users (
user_id serial primary key,
user_name varchar(20) not null
);
create table result_table1 (
result_id serial primary key,
user_id int4 references users(user_id),
result_1 int4 not null
);
create table result_table2 (
result_id serial primary key,
user_id int4 references users(user_id),
result_2 int4 not null
);
insert into users (user_name) values ('Kevin'),('John'),('Batman'),('Someguy');
insert into result_table1 (user_id, result_1) values (1, 20),(2, 40),(3, 70),(4, 42);
insert into result_table2 (user_id, result_2) values (1, 4),(2, 3),(3, 7),(4, 5);
Here is my UNION query:
SELECT result_table1.user_id,
result_1,
result_2
FROM result_table1
INNER JOIN (
SELECT user_id
FROM users
) users
ON users.user_id = result_table1.user_id
INNER JOIN (
SELECT result_table2.user_id,
result_2
FROM result_table2
) result_table2
ON result_table2.user_id = result_table1.user_id
WHERE users.user_id = 1
UNION ALL
SELECT result_table1.user_id,
result_1,
result_2
FROM result_table1
INNER JOIN (
SELECT user_id
FROM users
) users
ON users.user_id = result_table1.user_id
INNER JOIN (
SELECT result_table2.user_id,
result_2
FROM result_table2
) result_table2
ON result_table2.user_id = result_table1.user_id
WHERE users.user_id != 1
LIMIT 3;
Are there any options other than a UNION? The query works and does what I want for now, but will it always include user_id = 1 if I had a larger set of rows (assume that user_id = 1 will always be there)? :(
Thank you all! :)
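Neither UNION ALL nor a plain SELECT guarantees row order unless the query has an ORDER BY, and the LIMIT 3 at the end applies to the whole UNION ALL result, so nothing in the query itself promises that the user_id = 1 row is among the three returned. A minimal sketch of one alternative, assuming it is acceptable for the required row to sort first: a single query that orders by the boolean (user_id = required id) descending, then applies LIMIT. It runs on SQLite through Python's sqlite3 as a stand-in for PostgreSQL (where true also sorts after false, so DESC puts the match first, and the placeholder would be $1 instead of ?). The tiebreaker on user_id and the helper name pick_three are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT NOT NULL);
CREATE TABLE result_table1 (result_id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id), result_1 INTEGER NOT NULL);
CREATE TABLE result_table2 (result_id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id), result_2 INTEGER NOT NULL);
INSERT INTO users (user_name) VALUES ('Kevin'), ('John'), ('Batman'), ('Someguy');
INSERT INTO result_table1 (user_id, result_1) VALUES (1, 20), (2, 40), (3, 70), (4, 42);
INSERT INTO result_table2 (user_id, result_2) VALUES (1, 4), (2, 3), (3, 7), (4, 5);
""")

# One query, no UNION: (r1.user_id = ?) is 1 for the required row and 0 for
# the others, so DESC sorts it first; LIMIT is applied after ORDER BY.
QUERY = """
SELECT r1.user_id, r1.result_1, r2.result_2
FROM result_table1 r1
JOIN users u ON u.user_id = r1.user_id
JOIN result_table2 r2 ON r2.user_id = r1.user_id
ORDER BY (r1.user_id = ?) DESC, r1.user_id
LIMIT 3
"""


def pick_three(required_user_id):
    """Return the required user's row plus two others (lowest user_id first)."""
    return conn.execute(QUERY, (required_user_id,)).fetchall()


print(pick_three(1))  # [(1, 20, 4), (2, 40, 3), (3, 70, 7)]
print(pick_three(4))  # [(4, 42, 5), (1, 20, 4), (2, 40, 3)]
```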

Join two tables with count from first table

I know there is an obvious answer to this question, but I'm a noob trying to remember how to write queries. I have the following table structure in PostgreSQL:
CREATE TABLE public.table1 (
accountid BIGINT NOT NULL,
rpt_start DATE NOT NULL,
rpt_end DATE NOT NULL,
CONSTRAINT table1_pkey PRIMARY KEY(accountid, rpt_start, rpt_end)
)
WITH (oids = false);
CREATE TABLE public.table2 (
customer_id BIGINT NOT NULL,
read VARCHAR(255),
CONSTRAINT table2_pkey PRIMARY KEY(customer_id)
)
WITH (oids = false);
The objective of the query is to display a result set with each accountid, the number of rows for that accountid in table1, and read from table2. The join is on table1.accountid = table2.customer_id.
The result set should appear as follows:
accountid count read
1234 2 100
1235 9 110
1236 1 91
The count column reflects the number of rows in table1 for each accountid. The read column is the value from table2 associated with the same accountid.
select accountid, "count", read
from
(
select accountid, count(*) "count"
from table1
group by accountid
) t1
inner join
table2 t2 on t1.accountid = t2.customer_id
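order by accountid;
-- COUNT(table1.accountid) counts only matched rows, so a customer with no table1 rows gets 0 instead of 1
SELECT table2.customer_id, COUNT(table1.accountid), table2.read
FROM table2
LEFT JOIN table1 ON (table2.customer_id = table1.accountid)
GROUP BY table2.customer_id, table2.read;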
SELECT t2.customer_id, t2.read, COUNT(*) AS the_count
FROM table2 t2
JOIN table1 t1 ON t1.accountid = t2.customer_id
GROUP BY t2.customer_id, t2.read
;
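The answers differ in one detail worth checking: the inner-join versions only list accounts present in both tables, while the LEFT JOIN version also lists customers with no table1 rows. For such a customer COUNT(*) reports 1, because the NULL-extended row is still a row, whereas COUNT(table1.accountid) reports 0. A minimal sketch with SQLite via Python's sqlite3 (standing in for PostgreSQL) and hypothetical data shaped like the expected output, plus a made-up customer 1237 with no table1 rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (accountid INTEGER NOT NULL, rpt_start TEXT NOT NULL,
    rpt_end TEXT NOT NULL, PRIMARY KEY (accountid, rpt_start, rpt_end));
CREATE TABLE table2 (customer_id INTEGER PRIMARY KEY, read TEXT);
INSERT INTO table2 VALUES (1234, '100'), (1235, '110'), (1236, '91'), (1237, '50');
""")
# Hypothetical report rows: 2, 9 and 1 rows per account; 1237 gets none.
for accountid, n_rows in [(1234, 2), (1235, 9), (1236, 1)]:
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?, ?)",
        [(accountid, f"2017-{m:02d}-01", f"2017-{m:02d}-28") for m in range(1, n_rows + 1)],
    )

# Aggregate first, then inner join: only accounts present in both tables.
INNER_JOIN = """
SELECT accountid, "count", read
FROM (SELECT accountid, COUNT(*) AS "count" FROM table1 GROUP BY accountid) t1
JOIN table2 t2 ON t1.accountid = t2.customer_id
ORDER BY accountid
"""

# LEFT JOIN from table2: every customer appears; the count expression matters.
LEFT_JOIN_STAR = """
SELECT table2.customer_id, COUNT(*), table2.read
FROM table2 LEFT JOIN table1 ON table2.customer_id = table1.accountid
GROUP BY table2.customer_id, table2.read
ORDER BY table2.customer_id
"""
LEFT_JOIN_COLUMN = LEFT_JOIN_STAR.replace("COUNT(*)", "COUNT(table1.accountid)")

inner_rows = conn.execute(INNER_JOIN).fetchall()
star_rows = conn.execute(LEFT_JOIN_STAR).fetchall()
column_rows = conn.execute(LEFT_JOIN_COLUMN).fetchall()
print(inner_rows)       # [(1234, 2, '100'), (1235, 9, '110'), (1236, 1, '91')]
print(star_rows[-1])    # (1237, 1, '50')
print(column_rows[-1])  # (1237, 0, '50')
```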

Is it possible to use an aggregate in an aggregate to get a specific single value?

I have been playing around with code for a while now, and I have come across a problem where I must get the number of certain rows whose average is above a certain value, grouped by two fields from different tables.
Here is my code and my expectations:
SELECT C.Course,S.Name, COUNT(*) as Average FROM Students S
INNER JOIN Student_Modules SM ON
SM.StudentID = S.ID
INNER JOIN Courses_Template C
ON C.ID = SM.CourseID
Group by C.Course,S.Name
Having AVG(SM.Percentage_Obtained) > 80
This returns rows containing the course name, the student's name, and the number of module results for each student whose average is above 80%.
For me, this counts as "the number of students that passed the course". I would like to know whether it is possible to force this query to give me the number of students who have passed the course, instead of the number of modules each student has passed.
EDIT 1:
STUDENT LAYOUT
CREATE TABLE Students
(ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
,StudentNumber VARCHAR(20)
,Name VARCHAR(40)
,Surname VARCHAR(40)
,Student_ID VARCHAR(13)
,Languages VARCHAR(200)
,[Address] Varchar (512)
,Contact_Number varchar(20)
,Email Varchar (150)
,Days_Absent INT
,Student_Web_Username varchar(40)
,Student_Web_Password varchar(MAX)
,BranchID int
,Constraint FKStudentBranch FOREIGN KEY (BranchID) REFERENCES Branches(ID)
,CONSTRAINT Unq_StudentNumber UNIQUE (StudentNumber)
,CONSTRAINT Unq_Student_ID UNIQUE (Student_ID));
STUDENT_MODULE LAYOUT
CREATE TABLE Student_Modules
(ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
,ModuleID INT
,StudentID INT
,CourseID INT
,Percentage_Obtained INT Check (Percentage_Obtained >= -1 AND Percentage_Obtained <= 100)
,CONSTRAINT FKStudentModulesChosen FOREIGN KEY (ModuleID) REFERENCES Modules_Template(ID) ON DELETE CASCADE
,CONSTRAINT FKStudentModules FOREIGN KEY (StudentID) REFERENCES Students(ID) ON DELETE CASCADE);
COURSES_TEMPLATE LAYOUT
CREATE TABLE COURSES_TEMPLATE
(ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
,Course VARCHAR(40)
,Price SMALLMONEY CHECK(Price > 0)
,BranchID INT
,CONSTRAINT FKCourseBranches FOREIGN KEY (BranchID) REFERENCES Branches(ID) ON DELETE CASCADE);
If they need an average above 80% across all modules to pass:
SELECT C.Course, COUNT(DISTINCT S.ID) as [Average] -- count students, not their module rows
FROM Students S
INNER JOIN Student_Modules SM ON S.ID = SM.StudentID
INNER JOIN Courses_Template C ON SM.CourseID = C.ID
INNER JOIN (
SELECT SM.StudentID, SM.CourseID
FROM Student_Modules SM
Group by SM.StudentID, SM.CourseID
Having AVG(SM.Percentage_Obtained) > 80
) Pass ON SM.StudentID = Pass.StudentID AND SM.CourseID = Pass.CourseID
GROUP BY C.Course
If they need more than 80% in each module to pass the course:
SELECT C.Course, COUNT(DISTINCT S.ID) as [Average] -- count students, not their module rows
FROM Students S
INNER JOIN Student_Modules SM ON S.ID = SM.StudentID
INNER JOIN Courses_Template C ON SM.CourseID = C.ID
LEFT OUTER JOIN (
SELECT DISTINCT SM.StudentID, SM.CourseID
FROM Student_Modules SM
WHERE SM.Percentage_Obtained <= 80
) as NotPass ON SM.StudentID = NotPass.StudentID AND SM.CourseID = NotPass.CourseID
WHERE NotPass.StudentID IS NULL
GROUP BY C.Course
This is untested; let me know of any errors, or paste the incorrect output along with the expected output.
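A minimal sketch that runs both pass rules on SQLite through Python's sqlite3, standing in for SQL Server, with hypothetical data (three students, one course, two modules each) and only the columns the queries use. It counts COUNT(DISTINCT S.ID) because, after joining back to Student_Modules, there is one row per module, so COUNT(*) counts modules rather than students; the naive variant shows that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Courses_Template (ID INTEGER PRIMARY KEY, Course TEXT);
CREATE TABLE Student_Modules (ID INTEGER PRIMARY KEY, ModuleID INTEGER,
    StudentID INTEGER, CourseID INTEGER, Percentage_Obtained INTEGER);
INSERT INTO Students VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cat');
INSERT INTO Courses_Template VALUES (1, 'Maths');
-- Hypothetical marks. Ann 90/85: passes both rules; Bob 95/70: average only; Cat 60/70: neither.
INSERT INTO Student_Modules (ModuleID, StudentID, CourseID, Percentage_Obtained) VALUES
    (1, 1, 1, 90), (2, 1, 1, 85), (1, 2, 1, 95), (2, 2, 1, 70), (1, 3, 1, 60), (2, 3, 1, 70);
""")

# Rule 1: average across the course's modules above 80.
AVERAGE_RULE = """
SELECT C.Course, COUNT(DISTINCT S.ID) AS Passed
FROM Students S
JOIN Student_Modules SM ON S.ID = SM.StudentID
JOIN Courses_Template C ON SM.CourseID = C.ID
JOIN (
    SELECT StudentID, CourseID
    FROM Student_Modules
    GROUP BY StudentID, CourseID
    HAVING AVG(Percentage_Obtained) > 80
) Pass ON SM.StudentID = Pass.StudentID AND SM.CourseID = Pass.CourseID
GROUP BY C.Course
"""

# Rule 2: no module at or below 80 (anti-join against the failures).
EVERY_MODULE_RULE = """
SELECT C.Course, COUNT(DISTINCT S.ID) AS Passed
FROM Students S
JOIN Student_Modules SM ON S.ID = SM.StudentID
JOIN Courses_Template C ON SM.CourseID = C.ID
LEFT JOIN (
    SELECT DISTINCT StudentID, CourseID
    FROM Student_Modules
    WHERE Percentage_Obtained <= 80
) NotPass ON SM.StudentID = NotPass.StudentID AND SM.CourseID = NotPass.CourseID
WHERE NotPass.StudentID IS NULL
GROUP BY C.Course
"""

# Same as rule 1 but with COUNT(*): counts each passing student's module rows.
NAIVE_AVERAGE_RULE = AVERAGE_RULE.replace("COUNT(DISTINCT S.ID)", "COUNT(*)")

average_rows = conn.execute(AVERAGE_RULE).fetchall()
every_module_rows = conn.execute(EVERY_MODULE_RULE).fetchall()
naive_rows = conn.execute(NAIVE_AVERAGE_RULE).fetchall()
print(average_rows)       # [('Maths', 2)]
print(every_module_rows)  # [('Maths', 1)]
print(naive_rows)         # [('Maths', 4)]
```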
It looks like you want the number of students that passed each course? If so wouldn't you just need to group by C.Course and then have a Count(S.Name) as NumWhoPassed for the display?

Getting those records that have a match of all records in another table

Within the realm of this problem I have 3 entities:
User
Position
License
Then I have two relational (many-to-many) tables:
PositionLicense - connects Position with License, i.e. which licenses are required for a particular position
UserLicense - connects User with License, i.e. which licenses a particular user has, but with an additional complexity: user licenses have a validity date range (ValidFrom and ValidTo)
The problem
These are input variables:
UserID that identifies a particular User
RangeFrom defines the lower date range limit
RangeTo defines the upper date range limit
What do I need to get? For a particular user (and date range), I need a list of positions that this particular user can work at. The problem is that the user must hold at least all of the licenses required by a position for that position to match.
I'm having huge problems writing a SQL query to get this list.
If at all possible I would like to do this using a single SQL query (it can have additional CTEs, of course). If you can convince me that doing it in several queries would be more efficient, I'm willing to listen.
Some workable data
Copy and run this script: 3 users, 3 positions, 6 licenses. Mark and John should have a match, but not Jane.
create table [User] (
UserID int identity not null
primary key,
Name nvarchar(100) not null
)
go
create table Position (
PositionID int identity not null
primary key,
Name nvarchar(100) not null
)
go
create table License (
LicenseID int identity not null
primary key,
Name nvarchar(100) not null
)
go
create table UserLicense (
UserID int not null
references [User](UserID),
LicenseID int not null
references License(LicenseID),
ValidFrom date not null,
ValidTo date not null,
check (ValidFrom < ValidTo),
primary key (UserID, LicenseID)
)
go
create table PositionLicense (
PositionID int not null
references Position(PositionID),
LicenseID int not null
references License(LicenseID),
primary key (PositionID, LicenseID)
)
go
insert [User] (Name) values ('Mark the mechanic');
insert [User] (Name) values ('John the pilot');
insert [User] (Name) values ('Jane only has arts PhD but not medical.');
insert Position (Name) values ('Mechanic');
insert Position (Name) values ('Pilot');
insert Position (Name) values ('Doctor');
insert License (Name) values ('Mecha');
insert License (Name) values ('Flying');
insert License (Name) values ('Medicine');
insert License (Name) values ('PhD');
insert License (Name) values ('Phycho');
insert License (Name) values ('Arts');
insert PositionLicense (PositionID, LicenseID) values (1, 1);
insert PositionLicense (PositionID, LicenseID) values (2, 2);
insert PositionLicense (PositionID, LicenseID) values (2, 5);
insert PositionLicense (PositionID, LicenseID) values (3, 3);
insert PositionLicense (PositionID, LicenseID) values (3, 4);
insert UserLicense (UserID, LicenseID, ValidFrom, ValidTo) values (1, 1, '20110101', '20120101');
insert UserLicense (UserID, LicenseID, ValidFrom, ValidTo) values (2, 2, '20110101', '20120101');
insert UserLicense (UserID, LicenseID, ValidFrom, ValidTo) values (2, 5, '20110101', '20120101');
insert UserLicense (UserID, LicenseID, ValidFrom, ValidTo) values (3, 4, '20110101', '20120101');
insert UserLicense (UserID, LicenseID, ValidFrom, ValidTo) values (3, 6, '20110101', '20120101');
Resulting solution
I've set up my resulting solution based on the accepted answer, which provides the simplest solution to this problem. If you'd like to play with the query, just hit edit/clone (whether you're logged in or not). What can be changed:
three variables:
two variables to set the date range (#From and #To)
user ID (#User)
you can toggle the commented code in the first CTE to switch between requiring fully overlapping user licenses or only partially overlapping ones.
This makes a number of assumptions (ignores presence of time in the datetime columns, assumes fairly obvious primary keys) and skips the joins to pull in user name, position details, and the like. (And you implied that the user had to hold all the licenses for the full period specified, right?)
SELECT pl.PositionId
from PositionLicense pl
left outer join (-- All licenses the user holds for the entirety of the specified date range
select LicenseId
from UserLicense
where UserId = @UserId
and ValidFrom <= @RangeFrom
and ValidTo >= @RangeTo) li
on li.LicenseId = pl.LicenseId
group by pl.PositionId
-- Where all licenses required by position are held by user
having count(pl.LicenseId) = count(li.LicenseId)
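A minimal sketch of this LEFT JOIN / HAVING technique (a form of relational division: a position qualifies when every required license finds a match), run on the question's sample data in SQLite through Python's sqlite3 instead of SQL Server. Assumptions: dates are stored as ISO 'YYYY-MM-DD' text so string comparison orders them correctly, only the two link tables are created, and "held for the entirety of the range" is taken to mean ValidFrom <= range start and ValidTo >= range end. The function name positions_for is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserLicense (UserID INTEGER NOT NULL, LicenseID INTEGER NOT NULL,
    ValidFrom TEXT NOT NULL, ValidTo TEXT NOT NULL, PRIMARY KEY (UserID, LicenseID));
CREATE TABLE PositionLicense (PositionID INTEGER NOT NULL, LicenseID INTEGER NOT NULL,
    PRIMARY KEY (PositionID, LicenseID));
-- Positions 1 = Mechanic, 2 = Pilot, 3 = Doctor; users 1 = Mark, 2 = John, 3 = Jane.
INSERT INTO PositionLicense VALUES (1, 1), (2, 2), (2, 5), (3, 3), (3, 4);
INSERT INTO UserLicense VALUES
    (1, 1, '2011-01-01', '2012-01-01'), (2, 2, '2011-01-01', '2012-01-01'),
    (2, 5, '2011-01-01', '2012-01-01'), (3, 4, '2011-01-01', '2012-01-01'),
    (3, 6, '2011-01-01', '2012-01-01');
""")

QUERY = """
SELECT pl.PositionID
FROM PositionLicense pl
LEFT JOIN (
    -- licenses the user holds for the whole requested range
    SELECT LicenseID
    FROM UserLicense
    WHERE UserID = :user_id
      AND ValidFrom <= :range_from
      AND ValidTo >= :range_to
) li ON li.LicenseID = pl.LicenseID
GROUP BY pl.PositionID
-- every license the position requires found a match
HAVING COUNT(pl.LicenseID) = COUNT(li.LicenseID)
ORDER BY pl.PositionID
"""


def positions_for(user_id, range_from, range_to):
    """Return the PositionIDs whose required licenses the user holds for the range."""
    params = {"user_id": user_id, "range_from": range_from, "range_to": range_to}
    return [row[0] for row in conn.execute(QUERY, params)]


print(positions_for(1, "2011-03-01", "2011-06-30"))  # [1]
print(positions_for(2, "2011-03-01", "2011-06-30"))  # [2]
print(positions_for(3, "2011-03-01", "2011-06-30"))  # []
```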
There's no data, so I can't debug or test it, but this or something very close to it should do the trick.
Select ...
From [User] As U
Cross Join Position As P
Where Exists (
-- the user holds at least one required license that overlaps the range...
Select 1
From PositionLicense As PL1
Join UserLicense As UL1
On UL1.LicenseId = PL1.LicenseId
And UL1.ValidFrom <= @RangeTo
And UL1.ValidTo >= @RangeFrom
Where PL1.PositionId = P.PositionId
And UL1.UserId = U.UserId
Except
-- ...and no required license is missing (otherwise Except leaves no row)
Select 1
From PositionLicense As PL2
Left Join UserLicense As UL2
On UL2.LicenseId = PL2.LicenseId
And UL2.ValidFrom <= @RangeTo
And UL2.ValidTo >= @RangeFrom
And UL2.UserId = U.UserId
Where PL2.PositionId = P.PositionId
And UL2.UserId Is Null
)
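The same EXISTS ... EXCEPT shape, sketched in SQLite through Python's sqlite3 on the question's sample data as a stand-in for SQL Server (the User table name is quoted, like the question's [User]). Here a license only has to overlap the requested range, so a range that merely touches the end of the 2011 validity period still matches. matches() is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "User" (UserID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Position (PositionID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE UserLicense (UserID INTEGER, LicenseID INTEGER, ValidFrom TEXT, ValidTo TEXT);
CREATE TABLE PositionLicense (PositionID INTEGER, LicenseID INTEGER);
INSERT INTO "User" (Name) VALUES ('Mark'), ('John'), ('Jane');
INSERT INTO Position (Name) VALUES ('Mechanic'), ('Pilot'), ('Doctor');
INSERT INTO PositionLicense VALUES (1, 1), (2, 2), (2, 5), (3, 3), (3, 4);
INSERT INTO UserLicense VALUES
    (1, 1, '2011-01-01', '2012-01-01'), (2, 2, '2011-01-01', '2012-01-01'),
    (2, 5, '2011-01-01', '2012-01-01'), (3, 4, '2011-01-01', '2012-01-01'),
    (3, 6, '2011-01-01', '2012-01-01');
""")

QUERY = """
SELECT U.UserID, P.PositionID
FROM "User" AS U
CROSS JOIN Position AS P
WHERE EXISTS (
    -- at least one required license overlaps the range ...
    SELECT 1
    FROM PositionLicense AS PL1
    JOIN UserLicense AS UL1
      ON UL1.LicenseID = PL1.LicenseID
     AND UL1.ValidFrom <= :range_to
     AND UL1.ValidTo >= :range_from
    WHERE PL1.PositionID = P.PositionID
      AND UL1.UserID = U.UserID
    EXCEPT
    -- ... minus a row whenever some required license has no overlapping match
    SELECT 1
    FROM PositionLicense AS PL2
    LEFT JOIN UserLicense AS UL2
      ON UL2.LicenseID = PL2.LicenseID
     AND UL2.ValidFrom <= :range_to
     AND UL2.ValidTo >= :range_from
     AND UL2.UserID = U.UserID
    WHERE PL2.PositionID = P.PositionID
      AND UL2.UserID IS NULL
)
ORDER BY U.UserID, P.PositionID
"""


def matches(range_from, range_to):
    """Return (UserID, PositionID) pairs whose required licenses overlap the range."""
    return conn.execute(QUERY, {"range_from": range_from, "range_to": range_to}).fetchall()


print(matches("2011-03-01", "2011-06-30"))  # [(1, 1), (2, 2)]
print(matches("2011-12-01", "2012-06-30"))  # [(1, 1), (2, 2)]
print(matches("2013-01-01", "2013-02-01"))  # []
```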
If the requirement is that you want users and positions that are valid across the entire range, that is trickier:
With Calendar As
(
Select @RangeFrom As [Date]
Union All
Select DateAdd(d, 1, [Date])
From Calendar
-- stop at @RangeTo (a <= test here would also emit @RangeTo + 1)
Where [Date] < @RangeTo
)
Select ...
From [User] As U
Cross Join Position As P
Where Exists (
Select 1
From UserLicense As UL1
Join PositionLicense As PL1
On PL1.LicenseId = UL1.LicenseId
Where UL1.UserId = U.UserId
And PL1.PositionId = P.PositionId
And UL1.ValidFrom <= @RangeTo
And UL1.ValidTo >= @RangeFrom
Except
-- one row for any day in the range on which a required license is not held
Select 1
From Calendar As C1
Cross Join [User] As U1
Cross Join PositionLicense As PL1
Where U1.UserId = U.UserId
And PL1.PositionId = P.PositionId
And Not Exists (
Select 1
From UserLicense As UL2
Where UL2.LicenseId = PL1.LicenseId
And UL2.UserId = U1.UserId
And C1.[Date] Between UL2.ValidFrom And UL2.ValidTo
)
)
Option ( MaxRecursion 0 );
WITH PositionRequirements AS (
SELECT p.PositionID, COUNT(*) AS LicenseCt
FROM #Position AS p
INNER JOIN #PositionLicense AS posl
ON posl.PositionID = p.PositionID
GROUP BY p.PositionID
)
,Satisfied AS (
SELECT u.UserID, posl.PositionID, COUNT(*) AS LicenseCt
FROM #User AS u
INNER JOIN #UserLicense AS perl
ON perl.UserID = u.UserID
-- AND #Date BETWEEN perl.ValidFrom AND perl.ValidTo
AND '20110101' BETWEEN perl.ValidFrom AND perl.ValidTo
INNER JOIN #PositionLicense AS posl
ON posl.LicenseID = perl.LicenseID
-- WHERE u.UserID = #UserID -- Not strictly necessary, we can go over all people
GROUP BY u.UserID, posl.PositionID
)
SELECT PositionRequirements.PositionID, Satisfied.UserID
FROM PositionRequirements
INNER JOIN Satisfied
ON Satisfied.PositionID = PositionRequirements.PositionID
AND PositionRequirements.LicenseCt = Satisfied.LicenseCt
You could probably turn this into an inline table-valued function parameterized on effective date.
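SQLite has no table-valued functions, so as a rough stand-in for the inline table-valued function suggested above, this sketch wraps the two CTEs in a Python function that binds the effective date as a query parameter (standard-library sqlite3; qualified_as_of is a hypothetical name). The #User and #Position joins are dropped because they do not change which (PositionID, UserID) pairs come back for this data. Like the answer, it assumes a user holds each license at most once; duplicate UserLicense rows would inflate LicenseCt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserLicense (UserID INTEGER, LicenseID INTEGER, ValidFrom TEXT, ValidTo TEXT);
CREATE TABLE PositionLicense (PositionID INTEGER, LicenseID INTEGER);
INSERT INTO PositionLicense VALUES (1, 1), (2, 2), (2, 5), (3, 3), (3, 4);
INSERT INTO UserLicense VALUES
    (1, 1, '2011-01-01', '2012-01-01'), (2, 2, '2011-01-01', '2012-01-01'),
    (2, 5, '2011-01-01', '2012-01-01'), (3, 4, '2011-01-01', '2012-01-01'),
    (3, 6, '2011-01-01', '2012-01-01');
""")

QUERY = """
WITH PositionRequirements AS (
    -- how many licenses each position needs
    SELECT PositionID, COUNT(*) AS LicenseCt
    FROM PositionLicense
    GROUP BY PositionID
),
Satisfied AS (
    -- how many of those licenses each user holds on the effective date
    SELECT perl.UserID, posl.PositionID, COUNT(*) AS LicenseCt
    FROM UserLicense AS perl
    JOIN PositionLicense AS posl
      ON posl.LicenseID = perl.LicenseID
     AND :effective_date BETWEEN perl.ValidFrom AND perl.ValidTo
    GROUP BY perl.UserID, posl.PositionID
)
SELECT pr.PositionID, s.UserID
FROM PositionRequirements AS pr
JOIN Satisfied AS s
  ON s.PositionID = pr.PositionID
 AND s.LicenseCt = pr.LicenseCt
ORDER BY pr.PositionID, s.UserID
"""


def qualified_as_of(effective_date):
    """Return (PositionID, UserID) pairs fully licensed on effective_date."""
    return conn.execute(QUERY, {"effective_date": effective_date}).fetchall()


print(qualified_as_of("2011-01-01"))  # [(1, 1), (2, 2)]
print(qualified_as_of("2012-06-01"))  # []
```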