Getting those records that have a match of all records in another table - tsql

Within the realm of this problem I have 3 entities:
User
Position
License
Then I have two relational (many-to-many) tables:
PositionLicense - this one connects Position with License ie. which licenses are required for a particular position
UserLicense - this one connects User with License ie. which licenses a particular user has. But with an additional complexity: user licenses have validity date range (ValidFrom and ValidTo)
The problem
These are input variables:
UserID that identifies a particular User
RangeFrom defines the lower date range limit
RangeTo defines the upper date range limit
What I need to get? For a particular user (and date range) I need to get a list of positions that this particular user can work at. The problem is that user must have at least all licenses required by every matching position.
I'm having huge problems writing a SQL query to get this list.
If at all possible I would like to do this using a single SQL query (it can have additional CTEs of course). If you can convince me that doing it in several queries would be more efficient, I'm willing to listen.
Some workable data
Copy and run this script. 3 users, 3 positions, 6 licenses. Mark and John should have a match but not Jane.
create table [User] (
UserID int identity not null
primary key,
Name nvarchar(100) not null
)
go
create table Position (
PositionID int identity not null
primary key,
Name nvarchar(100) not null
)
go
create table License (
LicenseID int identity not null
primary key,
Name nvarchar(100) not null
)
go
create table UserLicense (
UserID int not null
references [User](UserID),
LicenseID int not null
references License(LicenseID),
ValidFrom date not null,
ValidTo date not null,
check (ValidFrom < ValidTo),
primary key (UserID, LicenseID)
)
go
create table PositionLicense (
PositionID int not null
references Position(PositionID),
LicenseID int not null
references License(LicenseID),
primary key (PositionID, LicenseID)
)
go
insert [User] (Name) values ('Mark the mechanic');
insert [User] (Name) values ('John the pilot');
insert [User] (Name) values ('Jane only has arts PhD but not medical.');
insert Position (Name) values ('Mechanic');
insert Position (Name) values ('Pilot');
insert Position (Name) values ('Doctor');
insert License (Name) values ('Mecha');
insert License (Name) values ('Flying');
insert License (Name) values ('Medicine');
insert License (Name) values ('PhD');
insert License (Name) values ('Phycho');
insert License (Name) values ('Arts');
insert PositionLicense (PositionID, LicenseID) values (1, 1);
insert PositionLicense (PositionID, LicenseID) values (2, 2);
insert PositionLicense (PositionID, LicenseID) values (2, 5);
insert PositionLicense (PositionID, LicenseID) values (3, 3);
insert PositionLicense (PositionID, LicenseID) values (3, 4);
insert UserLicense (UserID, LicenseID, ValidFrom, ValidTo) values (1, 1, '20110101', '20120101');
insert UserLicense (UserID, LicenseID, ValidFrom, ValidTo) values (2, 2, '20110101', '20120101');
insert UserLicense (UserID, LicenseID, ValidFrom, ValidTo) values (2, 5, '20110101', '20120101');
insert UserLicense (UserID, LicenseID, ValidFrom, ValidTo) values (3, 4, '20110101', '20120101');
insert UserLicense (UserID, LicenseID, ValidFrom, ValidTo) values (3, 6, '20110101', '20120101');
Resulting solution
I've set up my resulting solution based on the accepted answer, which provides the simplest solution to this problem. If you'd like to play with the query just hit edit/clone (whether you're logged in or not). What can be changed:
three variables:
two variables to set the date range (@From and @To)
user ID (@User)
you can toggle the commented code in the first CTE to switch between fully overlapping user licenses and partially overlapping ones.

This makes a number of assumptions (ignores presence of time in the datetime columns, assumes fairly obvious primary keys) and skips the joins to pull in user name, position details, and the like. (And you implied that the user had to hold all the licenses for the full period specified, right?)
SELECT pl.PositionId
from PositionLicense pl
left outer join (-- All licenses user has for the entirety (sp?) of the specified date range
select LicenseId
from UserLicense
where UserId = @UserId
and ValidFrom <= @RangeFrom
and ValidTo >= @RangeTo) li
on li.LicenseId = pl.LicenseId
group by pl.PositionId
-- Where all licenses required by position are held by user
having count(pl.LicenseId) = count(li.LicenseId)
No data so I can't debug or test it, but this or something very close to it should do the trick.
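For anyone who wants to try the counting idea above without a SQL Server instance, here is a minimal sketch that runs the same query shape against the question's sample data. It assumes SQLite (via Python's sqlite3) as a stand-in for T-SQL, stores dates as ISO text, and only checks licenses that cover the whole range; build_db and positions_for_user are names made up for this illustration.

```python
import sqlite3


def build_db():
    """Load the question's sample data into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Position (PositionID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
        CREATE TABLE PositionLicense (PositionID INTEGER, LicenseID INTEGER,
                                      PRIMARY KEY (PositionID, LicenseID));
        CREATE TABLE UserLicense (UserID INTEGER, LicenseID INTEGER,
                                  ValidFrom TEXT, ValidTo TEXT,
                                  PRIMARY KEY (UserID, LicenseID));
        INSERT INTO Position VALUES (1, 'Mechanic'), (2, 'Pilot'), (3, 'Doctor');
        INSERT INTO PositionLicense VALUES (1, 1), (2, 2), (2, 5), (3, 3), (3, 4);
        INSERT INTO UserLicense VALUES
            (1, 1, '2011-01-01', '2012-01-01'),
            (2, 2, '2011-01-01', '2012-01-01'),
            (2, 5, '2011-01-01', '2012-01-01'),
            (3, 4, '2011-01-01', '2012-01-01'),
            (3, 6, '2011-01-01', '2012-01-01');
    """)
    return conn


def positions_for_user(conn, user_id, range_from, range_to):
    """Names of positions whose every required license the user holds for the whole range."""
    sql = """
        SELECT p.Name
        FROM PositionLicense AS pl
        JOIN Position AS p ON p.PositionID = pl.PositionID
        LEFT JOIN (
            -- licenses the user holds for the entire requested range
            SELECT LicenseID
            FROM UserLicense
            WHERE UserID = :user_id
              AND ValidFrom <= :range_from
              AND ValidTo >= :range_to
        ) AS li ON li.LicenseID = pl.LicenseID
        GROUP BY p.PositionID, p.Name
        -- COUNT(li.LicenseID) skips NULLs, so the counts match only if nothing is missing
        HAVING COUNT(pl.LicenseID) = COUNT(li.LicenseID)
        ORDER BY p.PositionID
    """
    params = {"user_id": user_id, "range_from": range_from, "range_to": range_to}
    return [row[0] for row in conn.execute(sql, params)]


conn = build_db()
for user_id in (1, 2, 3):
    print(user_id, positions_for_user(conn, user_id, "2011-03-01", "2011-06-01"))
```

With the sample data this should give Mechanic for Mark, Pilot for John and nothing for Jane, which matches what the question expects.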

Select ...
From User As U
Cross Join Position As P
Where Exists (
Select 1
From PositionLicense As PL1
Join UserLicense As UL1
On UL1.LicenseId = PL1.LicenseId
And UL1.ValidFrom <= @RangeTo
And UL1.ValidTo >= @RangeFrom
Where PL1.PositionId = P.Id
And UL1.UserId = U.Id
Except
Select 1
From PositionLicense As PL2
Left Join UserLicense As UL2
On UL2.LicenseId = PL2.LicenseId
And UL2.ValidFrom <= @RangeTo
And UL2.ValidTo >= @RangeFrom
And UL2.UserId = U.Id
Where PL2.PositionId = P.Id
And UL2.UserId Is Null
)
If the requirement is that you want users and positions that are valid across the entire range, that is trickier:
With Calendar As
(
Select @RangeFrom As [Date]
Union All
Select DateAdd(d, 1, [Date])
From Calendar
Where [Date] < @RangeTo
)
Select ...
From User As U
Cross Join Position As P
Where Exists (
Select 1
From UserLicense As UL1
Join PositionLicense As PL1
On PL1.LicenseId = UL1.LicenseId
Where UL1.UserId = U.Id
And PL1.PositionId = P.Id
And UL1.ValidFrom <= @RangeTo
And UL1.ValidTo >= @RangeFrom
Except
Select 1
From Calendar As C1
Cross Join User As U1
Cross Join PositionLicense As PL1
Where U1.Id = U.Id
And PL1.PositionId = P.Id
And Not Exists (
Select 1
From UserLicense As UL2
Where UL2.LicenseId = PL1.LicenseId
And UL2.UserId = U1.Id
And C1.Date Between UL2.ValidFrom And UL2.ValidTo
)
)
Option ( MaxRecursion 0 );

Runnable Version Here
WITH PositionRequirements AS (
SELECT p.PositionID, COUNT(*) AS LicenseCt
FROM #Position AS p
INNER JOIN #PositionLicense AS posl
ON posl.PositionID = p.PositionID
GROUP BY p.PositionID
)
,Satisfied AS (
SELECT u.UserID, posl.PositionID, COUNT(*) AS LicenseCt
FROM #User AS u
INNER JOIN #UserLicense AS perl
ON perl.UserID = u.UserID
-- AND @Date BETWEEN perl.ValidFrom AND perl.ValidTo
AND '20110101' BETWEEN perl.ValidFrom AND perl.ValidTo
INNER JOIN #PositionLicense AS posl
ON posl.LicenseID = perl.LicenseID
-- WHERE u.UserID = @UserID -- Not strictly necessary, we can go over all people
GROUP BY u.UserID, posl.PositionID
)
SELECT PositionRequirements.PositionID, Satisfied.UserID
FROM PositionRequirements
INNER JOIN Satisfied
ON Satisfied.PositionID = PositionRequirements.PositionID
AND PositionRequirements.LicenseCt = Satisfied.LicenseCt
You could probably turn this into an inline table-valued function parameterized on effective date.

Related

LEFT JOIN trouble with multiple tables

I have the following query
SELECT a.account_id, sum(p.amount) AS amount
FROM accounts a
LEFT JOIN users_accounts ua
JOIN users u
JOIN payments p on p.meta_id = u.user_id
ON u.user_id = ua.user_id
ON ua.account_id = a.account_id
WHERE p.date_prcsd BETWEEN '2017-08-01 00:00:00' AND '2017-08-31 23:59:59'
GROUP BY a.account_id
ORDER BY account_id ASC;
What I want is all the rows from accounts a, with zeroes where amount data is missing. I get the same result set with different types of joins and different join structures: only rows that have some payments in p.
Where do I go wrong?
Simplified:
SELECT a.account_id
,sum(coalesce(p2.amount, 0)) AS amount
FROM accounts a
LEFT JOIN users_accounts ua ON (a.account_id = ua.account_id)
LEFT JOIN users u ON (ua.user_id = u.user_id)
LEFT JOIN (
SELECT p.meta_id
,p.amount
FROM payments p
WHERE p.date BETWEEN '2017-08-01' AND '2017-08-10'
) AS p2 ON (u.user_id = p2.meta_id)
GROUP BY a.account_id
ORDER BY account_id ASC;
Result:
account_id | amount
------------+--------
1 | 4
2 | 0
3 | 0
(3 rows)
Explanation: you need to take care of all the NULL values that come back. coalesce() does that for you. The WHERE clause is actually the real problem in your solution, because it filters out rows that you want to keep in your end result. On top of that, you left out the LEFT JOIN for the other tables. I created a simplified test db:
$ cat tables.sql
drop table users_accounts;
drop table payments;
drop table users;
drop table accounts;
create table accounts (account_id serial primary key, name varchar not null);
create table users (user_id serial primary key, name varchar not null);
create table users_accounts(user_id int references users(user_id), account_id int references accounts(account_id));
create table payments(meta_id int references users(user_id), amount int not null, date date);
insert into accounts (account_id, name) values (1, 'Account A'), (2, 'Account B'), (3, 'Account C');
insert into users (user_id, name) values (1, 'Marc'), (2, 'Ruben'), (3, 'Isaak');
insert into users_accounts (user_id, account_id) values (1,1),(2,1);
insert into payments(meta_id, amount, date) values (1,1, '2017-08-01'), (1,2, '2017-08-11'),(1,3, '2017-08-03'),(2,1, null),(2,2, null),(2,3, null);
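To see the effect of the filter placement in isolation, here is a small sketch that runs both variants over the same test data. It assumes SQLite (via Python's sqlite3) instead of PostgreSQL and skips the users table for brevity, joining payments straight to users_accounts; the variable names are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE users_accounts (user_id INTEGER, account_id INTEGER);
    CREATE TABLE payments (meta_id INTEGER, amount INTEGER NOT NULL, date TEXT);
    INSERT INTO accounts VALUES (1, 'Account A'), (2, 'Account B'), (3, 'Account C');
    INSERT INTO users_accounts VALUES (1, 1), (2, 1);
    INSERT INTO payments VALUES
        (1, 1, '2017-08-01'), (1, 2, '2017-08-11'), (1, 3, '2017-08-03'),
        (2, 1, NULL), (2, 2, NULL), (2, 3, NULL);
""")

# Filter in WHERE: rows where p.date is NULL (no payment matched) are thrown away,
# so accounts without payments disappear from the result.
filter_in_where = conn.execute("""
    SELECT a.account_id, SUM(p.amount)
    FROM accounts AS a
    LEFT JOIN users_accounts AS ua ON ua.account_id = a.account_id
    LEFT JOIN payments AS p ON p.meta_id = ua.user_id
    WHERE p.date BETWEEN '2017-08-01' AND '2017-08-10'
    GROUP BY a.account_id
    ORDER BY a.account_id
""").fetchall()

# Filter in the join condition: every account survives, missing sums become 0.
filter_in_on = conn.execute("""
    SELECT a.account_id, COALESCE(SUM(p.amount), 0)
    FROM accounts AS a
    LEFT JOIN users_accounts AS ua ON ua.account_id = a.account_id
    LEFT JOIN payments AS p
           ON p.meta_id = ua.user_id
          AND p.date BETWEEN '2017-08-01' AND '2017-08-10'
    GROUP BY a.account_id
    ORDER BY a.account_id
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

Putting the date range in the ON clause does the same job as the filtered subquery in the answer; which one reads better is mostly a matter of taste.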

Selecting one specific data row (required), and 3 others (specific data row must be included)

I need to select a specific row and 2 other rows that are not that specific row (a total of 3). The specific row must always be included in the 3 results. How should I go about it? I think it can be done with a UNION ALL, but do I have another choice? Thanks all! :)
Here are my scripts to create the sample tables:
create table users (
user_id serial primary key,
user_name varchar(20) not null
);
create table result_table1 (
result_id serial primary key,
user_id int4 references users(user_id),
result_1 int4 not null
);
create table result_table2 (
result_id serial primary key,
user_id int4 references users(user_id),
result_2 int4 not null
);
insert into users (user_name) values ('Kevin'),('John'),('Batman'),('Someguy');
insert into result_table1 (user_id, result_1) values (1, 20),(2, 40),(3, 70),(4, 42);
insert into result_table2 (user_id, result_2) values (1, 4),(2, 3),(3, 7),(4, 5);
Here is my UNION query:
SELECT result_table1.user_id,
result_1,
result_2
FROM result_table1
INNER JOIN (
SELECT user_id
FROM users
) users
ON users.user_id = result_table1.user_id
INNER JOIN (
SELECT result_table2.user_id,
result_2
FROM result_table2
) result_table2
ON result_table2.user_id = result_table1.user_id
WHERE users.user_id = 1
UNION ALL
SELECT result_table1.user_id,
result_1,
result_2
FROM result_table1
INNER JOIN (
SELECT user_id
FROM users
) users
ON users.user_id = result_table1.user_id
INNER JOIN (
SELECT result_table2.user_id,
result_2
FROM result_table2
) result_table2
ON result_table2.user_id = result_table1.user_id
WHERE users.user_id != 1
LIMIT 3;
Are there any options other than a UNION? The query works and does what I want for now, but will it always include user_id = 1 if I had a larger set of rows (assume that user_id = 1 will always be there)? :(
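One possible alternative to the UNION, sketched under a few assumptions: without an ORDER BY, SQL gives no guarantee about which rows a LIMIT keeps, so the idea is to sort the required row to the front explicitly and then cut off at 3. The sketch uses SQLite through Python's sqlite3 rather than PostgreSQL, drops the users table since it adds nothing to the result, and pick_three is a made-up helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE result_table1 (result_id INTEGER PRIMARY KEY,
                                user_id INTEGER, result_1 INTEGER NOT NULL);
    CREATE TABLE result_table2 (result_id INTEGER PRIMARY KEY,
                                user_id INTEGER, result_2 INTEGER NOT NULL);
    INSERT INTO result_table1 (user_id, result_1) VALUES (1, 20), (2, 40), (3, 70), (4, 42);
    INSERT INTO result_table2 (user_id, result_2) VALUES (1, 4), (2, 3), (3, 7), (4, 5);
""")


def pick_three(conn, required_user_id):
    """Return the required user's row first, then two other rows, in one query."""
    sql = """
        SELECT r1.user_id, r1.result_1, r2.result_2
        FROM result_table1 AS r1
        JOIN result_table2 AS r2 ON r2.user_id = r1.user_id
        -- 0 for the required row, 1 for everything else, so it always sorts first
        ORDER BY CASE WHEN r1.user_id = ? THEN 0 ELSE 1 END, r1.user_id
        LIMIT 3
    """
    return conn.execute(sql, (required_user_id,)).fetchall()


print(pick_three(conn, 1))
print(pick_three(conn, 4))
```

The second sort key (r1.user_id) just makes the two filler rows deterministic; any other ordering for them would work as well.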
Thank you all! :)

Converting Traditional IF EXIST UPDATE ELSE INSERT into MERGE is not working?

I am going to use MERGE to insert into or update a table depending upon whether the row exists or not. This is my query,
declare @t table
(
id int,
name varchar(10)
)
insert into @t values(1,'a')
MERGE INTO @t t1
USING (SELECT id FROM @t WHERE ID = 2) t2 ON (t1.id = t2.id)
WHEN MATCHED THEN
UPDATE SET name = 'd', id = 3
WHEN NOT MATCHED THEN
INSERT (id, name)
VALUES (2, 'b');
select * from @t;
The result is,
id name
1 a
I think it should be,
id name
1 a
2 b
You have your USING part slightly messed up, that's where to put what you want to match against (although in this case you're only using id)
declare @t table
(
id int,
name varchar(10)
)
insert into @t values(1,'a')
MERGE INTO @t t1
USING (SELECT 2, 'b') AS t2 (id, name) ON (t1.id = t2.id)
WHEN MATCHED THEN
UPDATE SET name = 'd', id = 3
WHEN NOT MATCHED THEN
INSERT (id, name)
VALUES (2, 'b');
select * from @t;
As Mikhail pointed out, your query in the USING clause doesn't contain any rows.
If you want to do an upsert, put the new data into the USING clause:
MERGE INTO @t t1
USING (SELECT 2 as id, 'b' as name) t2 ON (t1.id = t2.id) --This no longer has an artificial dependency on @t
WHEN MATCHED THEN
UPDATE SET name = t2.name
WHEN NOT MATCHED THEN
INSERT (id, name)
VALUES (t2.id, t2.name);
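The same point can be tried outside SQL Server. The sketch below is only an analogy: SQLite has no MERGE, so it assumes a plain UPDATE followed by a conditional INSERT instead, run through Python's sqlite3. It shows that a source query reading from the target comes back empty, while an upsert driven by the new values does what was intended. The helper name upsert is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'a')")

# The original USING query: it reads from the target itself, finds no id = 2,
# and therefore hands the merge zero rows to work with.
empty_source = conn.execute("SELECT id FROM t WHERE id = 2").fetchall()


def upsert(conn, row_id, name):
    """Update the row if it exists, otherwise insert it (stand-in for MERGE)."""
    cur = conn.execute("UPDATE t SET name = ? WHERE id = ?", (name, row_id))
    if cur.rowcount == 0:  # nothing matched, so this is the NOT MATCHED branch
        conn.execute("INSERT INTO t (id, name) VALUES (?, ?)", (row_id, name))


upsert(conn, 2, "b")  # no row with id 2 yet: inserted
after_insert = conn.execute("SELECT id, name FROM t ORDER BY id").fetchall()

upsert(conn, 2, "c")  # row exists now: updated
after_update = conn.execute("SELECT id, name FROM t ORDER BY id").fetchall()

print(empty_source, after_insert, after_update)
```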
This query won't return anything:
SELECT id FROM @t WHERE ID = 2
Because there are no rows in the table with ID = 2, there is nothing to merge into the table.
Besides, in the MATCHED clause you are updating the ID column on which you are joining the tables; I think that's forbidden.
If the DML operation runs inside an explicit transaction, you have to commit it (COMMIT marks the end of a successful transaction). Only then will the latest data be visible to other sessions.
For example :
GO
BEGIN TRANSACTION;
GO
DELETE FROM HumanResources.JobCandidate
WHERE JobCandidateID = 13;
GO
COMMIT TRANSACTION;
GO

Select value from an enumerated list in PostgreSQL

I want to select from an enumeration that is not in the database.
E.g. SELECT id FROM my_table returns values like 1, 2, 3
I want to display 1 -> 'chocolate', 2 -> 'coconut', 3 -> 'pizza' etc. SELECT CASE works but is too complicated and hard to overview for many values. I think of something like
SELECT id, array['chocolate','coconut','pizza'][id] FROM my_table
But I couldn't succeed with arrays. Is there an easy solution? So this is a simple query, not a plpgsql script or something like that.
with food (fid, name) as (
values
(1, 'chocolate'),
(2, 'coconut'),
(3, 'pizza')
)
select t.id, f.name
from my_table t
join food f on f.fid = t.id;
or without a CTE (but using the same idea):
select t.id, f.name
from my_table t
join (
values
(1, 'chocolate'),
(2, 'coconut'),
(3, 'pizza')
) f (fid, name) on f.fid = t.id;
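If the list of labels lives in application code, the VALUES list can also be generated with bound parameters rather than typed by hand. This is just a sketch of that idea, assuming SQLite through Python's sqlite3 as a stand-in for PostgreSQL, a non-empty label mapping, and a hypothetical helper called label_ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?)", [(1,), (2,), (3,), (4,)])


def label_ids(conn, labels):
    """Join my_table against an inline VALUES list built from a dict of id -> name.

    Assumes labels is non-empty; an empty VALUES list would not be valid SQL.
    """
    placeholders = ", ".join("(?, ?)" for _ in labels)
    params = [value for pair in labels.items() for value in pair]
    sql = f"""
        WITH food (fid, name) AS (VALUES {placeholders})
        SELECT t.id, f.name
        FROM my_table AS t
        JOIN food AS f ON f.fid = t.id
        ORDER BY t.id
    """
    return conn.execute(sql, params).fetchall()


rows = label_ids(conn, {1: "chocolate", 2: "coconut", 3: "pizza"})
print(rows)
```

Because this is an inner join, ids without a label (4 in this sample) are dropped; a LEFT JOIN would keep them with a NULL name.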
This is the correct syntax:
SELECT id, (array['chocolate','coconut','pizza'])[id] FROM my_table
But you should create a referenced table with those values.
What about creating another table that enumerates all the cases, and joining to it?
CREATE TABLE table_case
(
case_id bigserial NOT NULL,
case_name character varying,
CONSTRAINT table_case_pkey PRIMARY KEY (case_id)
)
WITH (
OIDS=FALSE
);
and when you select from your table:
SELECT id, case_name FROM my_table
inner join table_case on case_id = my_table.id;

TSQL: Remove duplicates based on max(date)

I am searching for a query to select the maximum date (a datetime column) and keep its id and row_id. The desire is to DELETE the rows in the source table.
Source Data
id date row_id(unique)
1 11/11/2009 1
1 12/11/2009 2
1 13/11/2009 3
2 1/11/2009 4
Expected Survivors
1 13/11/2009 3
2 1/11/2009 4
What query would I need to achieve the results I am looking for?
Tested on PostgreSQL:
delete from table where (id, date) not in (select id, max(date) from table group by id);
There are various ways of doing this, but the basic idea is the same:
- Identify the rows you want to keep
- Compare each row in your table to the ones you want to keep
- Delete any that don't match
DELETE
[source]
FROM
yourTable AS [source]
LEFT JOIN
yourTable AS [keep]
ON [keep].id = [source].id
AND [keep].date = [source].date
AND [keep].date = (SELECT MAX(date) FROM yourTable WHERE id = [keep].id)
WHERE
[keep].id IS NULL
DELETE
[yourTable]
FROM
[yourTable]
LEFT JOIN
(
SELECT id, MAX(date) AS date FROM yourTable GROUP BY id
)
AS [keep]
ON [keep].id = [yourTable].id
AND [keep].date = [yourTable].date
WHERE
[keep].id IS NULL
DELETE
[source]
FROM
yourTable AS [source]
WHERE
[source].row_id != (SELECT TOP 1 row_id FROM yourTable WHERE id = [source].id ORDER BY date DESC)
DELETE
[source]
FROM
yourTable AS [source]
WHERE
EXISTS (SELECT id FROM yourTable GROUP BY id HAVING id = [source].id AND MAX(date) != [source].date)
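The identify / compare / delete steps can be checked against the question's sample rows with a small sketch like the one below. It assumes SQLite (through Python's sqlite3) rather than SQL Server, a hypothetical table name src, and dates rewritten as ISO text so that MAX() orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER, date TEXT, row_id INTEGER PRIMARY KEY);
    INSERT INTO src (id, date, row_id) VALUES
        (1, '2009-11-11', 1),
        (1, '2009-11-12', 2),
        (1, '2009-11-13', 3),
        (2, '2009-11-01', 4);
""")

# 1. identify the rows to keep (latest date per id),
# 2. compare every row against that set,
# 3. delete whatever is not in it.
cur = conn.execute("""
    DELETE FROM src
    WHERE row_id NOT IN (
        SELECT s.row_id
        FROM src AS s
        JOIN (SELECT id, MAX(date) AS max_date FROM src GROUP BY id) AS keep
          ON keep.id = s.id
         AND keep.max_date = s.date
    )
""")
deleted = cur.rowcount
survivors = conn.execute("SELECT id, date, row_id FROM src ORDER BY row_id").fetchall()
print(deleted, survivors)
```

Note that rows tied on the latest date for the same id would all survive here, which is the caveat to keep in mind with any date-based rule.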
Because you are using SQL Server 2000, you're not able to use the ROW_NUMBER() OVER technique of setting up a sequence to identify the top row for each unique id.
So, your proposed technique is to use a datetime column to get the top 1 row to remove duplicates. That might work, but there is a possibility that you might still get duplicates having the same datetime value. That's easy enough to check for.
First check the assumption that all rows are unique based on the id and date columns:
CREATE TABLE #TestTable (rowid INT IDENTITY(1,1), thisid INT, thisdate DATETIME)
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '11/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '12/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '12/12/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (2, '1/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (2, '1/11/2009')
SELECT COUNT(*) AS thiscount
FROM #TestTable
GROUP BY thisid, thisdate
HAVING COUNT(*) > 1
This example returns one row with a count of 2, indicating that you would still end up with duplicates even after using the date column to remove duplicates. If the query returns no rows, then you have proven that your proposed technique will work.
When de-duping production data, I think one should take some precautions and test before and after. You should create a table to hold the rows you plan to remove so you can recover them easily if you need to after the delete statement has been executed.
Also, it's a good idea to know beforehand how many rows you plan to remove so you can verify the count before and after - and you can gauge the magnitude of the delete operation. Based on how many rows will be affected, you can plan when to run the operation.
To test before the de-duping process, find the occurrences.
-- Get occurrences of duplicates
SELECT COUNT(*) AS thiscount
FROM
#TestTable
GROUP BY thisid
HAVING COUNT(*) > 1
ORDER BY thisid
That gives you the rows with more than one row with the same id. Capture the rows from this query into a temporary table and then run a query using the SUM to get the total number of rows that are not unique based on your key.
To get the number of rows you plan to delete, you need the count of rows that are duplicate based on your unique key, and the number of distinct rows based on your unique key. You subtract the distinct rows from the count of occurrences. All that is pretty straightforward - so I'll leave you to it.
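For what it's worth, the arithmetic described above can be sketched in a couple of queries. The example assumes SQLite through Python's sqlite3 instead of SQL Server 2000, a hypothetical table name test_table, and ISO-formatted dates chosen to mirror the test rows above (the exact calendar dates do not matter for the counts).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_table (row_id INTEGER PRIMARY KEY, thisid INTEGER, thisdate TEXT);
    INSERT INTO test_table (thisid, thisdate) VALUES
        (1, '2009-11-11'), (1, '2009-11-12'), (1, '2009-12-12'),
        (2, '2009-01-11'), (2, '2009-01-11');
""")

# Rows that share an id with at least one other row, minus one survivor per id,
# is the number of rows the de-dup is expected to remove.
rows_to_delete = conn.execute("""
    SELECT COALESCE(SUM(cnt), 0) - COUNT(*)
    FROM (
        SELECT thisid, COUNT(*) AS cnt
        FROM test_table
        GROUP BY thisid
        HAVING COUNT(*) > 1
    )
""").fetchone()[0]

# Ties on (id, date): these would survive a purely date-based rule.
ties = conn.execute("""
    SELECT thisid, thisdate, COUNT(*)
    FROM test_table
    GROUP BY thisid, thisdate
    HAVING COUNT(*) > 1
""").fetchall()

print(rows_to_delete, ties)
```

Comparing rows_to_delete with the row count reported by the actual DELETE is a cheap sanity check; a non-empty ties list means the date alone will not get you down to one row per id.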
Try this
declare @t table (id int, dt DATETIME,rowid INT IDENTITY(1,1))
INSERT INTO @t (id,dt) VALUES (1, '11/11/2009')
INSERT INTO @t (id,dt) VALUES (1, '11/12/2009')
INSERT INTO @t (id,dt) VALUES (1, '11/13/2009')
INSERT INTO @t (id,dt) VALUES (2, '11/01/2009')
Query:
delete from @t where rowid not in(
select t.rowid from @t t
inner join(
select id, MAX(dt) maxdate
from @t
group by id) X
on t.id = X.id and t.dt = X.maxdate )
select * from @t
Output:
id dt rowid
1 2009-11-13 00:00:00.000 3
2 2009-11-01 00:00:00.000 4
delete from temp where row_id not in (
select t.row_id from temp t
right join
(select id,MAX(dt) as dt from temp group by id) d
on t.dt = d.dt and t.id = d.id)
I have tested this answer..
INSERT INTO @t (id,dt) VALUES (1, '11/11/2009')
INSERT INTO @t (id,dt) VALUES (1, '11/12/2009')
INSERT INTO @t (id,dt) VALUES (1, '11/13/2009')
INSERT INTO @t (id,dt) VALUES (2, '11/01/2009')
select * from @t
;WITH T AS(
select dense_rank() over(partition by id order by dt desc)NO,DT,ID,rowid from @t )
DELETE T WHERE NO>1