Why is my subselect not working? - db2

I have this query and I want to get all records that have both ER and (OM, MT, or NM) money types, but this query brings back everything with just ER and disregards my subselect. The record's key is SSN, PLAN, MONEY_TYPE_CD, so I want all the SSNs that have ER and (OM, MT, or NM). Hope this is clearer. Any help is appreciated.
SELECT A.SSN, A.MONEY_TYPE_CD, A.PLAN_TYPE, A.ER_NUM
FROM FUND_DTL A
WHERE A.MONEY_TYPE_CD = 'ER'
AND A.SSN IN (
SELECT SSN
FROM FUND_DTL
WHERE MONEY_TYPE_CD IN ('OM', 'MT', 'NM')
)
WITH UR;

The WHERE A.MONEY_TYPE_CD = 'ER' is restricting the results set to just the 'ER' codes.
You want something more like so...
SELECT A.SSN, A.MONEY_TYPE_CD, A.PLAN_TYPE, A.ER_NUM
FROM FUND_DTL A
WHERE A.SSN IN (SELECT SSN
FROM FUND_DTL
WHERE MONEY_TYPE_CD IN ('OM', 'MT', 'NM'))
AND A.SSN IN (SELECT SSN
FROM FUND_DTL
WHERE MONEY_TYPE_CD = 'ER')
WITH UR;
Another version using INTERSECT
SELECT A.SSN, A.MONEY_TYPE_CD, A.PLAN_TYPE, A.ER_NUM
FROM FUND_DTL A
WHERE A.SSN IN (SELECT SSN
FROM FUND_DTL
WHERE MONEY_TYPE_CD IN ('OM', 'MT', 'NM')
INTERSECT
SELECT SSN
FROM FUND_DTL
WHERE MONEY_TYPE_CD = 'ER')
WITH UR;
Both say: give me any record whose SSN appears in both the set of SSNs with 'ER' and the set of SSNs with ('OM', 'MT', 'NM').
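To see the difference between the two shapes of query on a small data set, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for DB2, so WITH UR is dropped, and the four sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for DB2, so WITH UR is omitted.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FUND_DTL "
    "(SSN TEXT, PLAN_TYPE TEXT, MONEY_TYPE_CD TEXT, ER_NUM INTEGER)"
)
conn.executemany(
    "INSERT INTO FUND_DTL VALUES (?, ?, ?, ?)",
    [
        ("111", "A", "ER", 1),  # SSN 111 has ER and OM, so it qualifies
        ("111", "A", "OM", 1),
        ("222", "A", "ER", 2),  # SSN 222 has only ER
        ("333", "A", "MT", 3),  # SSN 333 has only MT
    ],
)

# Query from the question: the outer filter keeps only the ER rows
# of the SSNs that also have OM, MT or NM.
er_only_rows = conn.execute("""
    SELECT A.SSN, A.MONEY_TYPE_CD
    FROM FUND_DTL A
    WHERE A.MONEY_TYPE_CD = 'ER'
      AND A.SSN IN (SELECT SSN FROM FUND_DTL
                    WHERE MONEY_TYPE_CD IN ('OM', 'MT', 'NM'))
    ORDER BY A.SSN, A.MONEY_TYPE_CD
""").fetchall()

# INTERSECT version: every row of each SSN that has both kinds of money type.
all_rows = conn.execute("""
    SELECT A.SSN, A.MONEY_TYPE_CD
    FROM FUND_DTL A
    WHERE A.SSN IN (SELECT SSN FROM FUND_DTL
                    WHERE MONEY_TYPE_CD IN ('OM', 'MT', 'NM')
                    INTERSECT
                    SELECT SSN FROM FUND_DTL
                    WHERE MONEY_TYPE_CD = 'ER')
    ORDER BY A.SSN, A.MONEY_TYPE_CD
""").fetchall()

print(er_only_rows)
print(all_rows)
```

With this data the first query returns only the ER row of SSN 111, while the INTERSECT version returns both the ER and the OM row of that SSN.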

Related

Selecting distinct values

The domain is:
company (id, name, adress)
employee (id, name, adress, company_id, expertise_id)
dependantrelative (id, name, employee_id)
expertise (id, name, class)
I want to know how to get the number of dependantrelatives of each employee who is the only one with their expertise in their company.
The Query below does not return the correct answer. Can you help me?
SELECT DISTINCT dependantrelative.employee_id
, COUNT(*) AS qty_dependantrelatives
FROM dependantrelative
INNER JOIN employee
ON employee.id = dependantrelative.employee_id
GROUP BY dependantrelative.employee_id
I just tried out the Query below and it works, but I want to know if there is a faster and simple way of getting the answer.
SELECT employee.id
,COUNT(dependantrelative.employee_id) AS qty_dependantrelatives
FROM (
SELECT employee.company_id
, employee.expertise_id AS expert
, COUNT(employee.expertise_id)
FROM employee
GROUP BY employee.company_id
, employee.expertise_id
HAVING COUNT(employee.expertise_id)<2
) AS uniexpert
LEFT JOIN employee
ON employee.expertise_id = uniexpert.expert
LEFT JOIN dependantrelative
ON dependantrelative.employee_id = employee.id
GROUP BY employee.id
ORDER BY employee.id

Where condition inside json_agg in postgres

I'm trying to return a JSON object from postgres that looks something like this:
[
{id: 1, name: "some organisation name", alias: [{alias:"alt name"}, {alias:"another name"}]}
...]
My query below works fine, except that I want to add a where condition, referencing the org_aliases table
SELECT json_build_object('id', table1.id, 'name', table1.name, 'alias', a.alias) as json
FROM orgs table1
CROSS JOIN LATERAL (
SELECT json_agg(agg) AS alias
FROM (
SELECT table2.alias as name
FROM org_aliases table2
WHERE
table2.org_id = table1.id
) agg
) a
WHERE
table1.name ilike 'nspcc'
or table2.alias ilike 'nspcc';
It fails on the last line (missing FROM-clause entry for table "table2"). I can see why it doesn't allow this, as I'm referencing something inside a subquery.
My question is: what's the best way to handle this?
My only idea is that I need to join org_aliases again so I can add a where condition. But if anyone has a better idea for how to structure the query to avoid duplication, that would be amazing.
Not sure if it's exactly what you're looking for, but it does give you the required result for your example:
SELECT
json_build_object('id', orgs.id
, 'name', name
, 'alias', json_agg(
json_build_object('name', alias)
)
)
FROM orgs
LEFT JOIN org_aliases ON orgs.id = org_aliases.org_id
WHERE name ILIKE '%dave%'
OR alias ILIKE '%dave%'
GROUP BY
orgs.id, orgs.name;
Edit: I would use a CTE to first find my target ids and then get all the data that I need. This makes it easy to understand (and debug):
WITH target AS (
SELECT DISTINCT orgs.id
FROM orgs
LEFT JOIN org_aliases ON orgs.id = org_aliases.org_id
WHERE name ILIKE '%dave%'
OR alias ILIKE '%dave%'
)
SELECT
json_build_object('id', orgs.id
, 'name', name
, 'alias', json_agg(
json_build_object('name', alias)
)
)
FROM target
JOIN orgs ON target.id = orgs.id
LEFT JOIN org_aliases ON orgs.id = org_aliases.org_id
GROUP BY
orgs.id, orgs.name;

In psql how to run a Loop for a Select query with CTEs and get the output shown if I run it in a read-only db?

My initial question is posted here (In psql how to run a Loop for a Select query with CTEs and get the output shown in read-only db?), but it wasn't well defined, so I am creating a new question here.
I want to know how I can use a loop variable (or something similar) inside a Select query with CTEs.
I hope the following is a minimal reproducible example:
CREATE TABLE Persons (
PersonID int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
);
insert into persons values (4,'Smith','Eric','713 Louise Circle','Paris');
insert into persons values (5,'Smith2','Eric2','715 Louise Circle','London');
insert into persons values (8,'Smith3','Eric3','718 Louise Circle','Madrid');
Now I run the following, replacing <ROWNUMBER> with each of the values 1, 2 and 3:
WITH params AS
(
SELECT <ROWNUMBER> AS rownumber ),
person AS
(
SELECT personid, lastname, firstname, address
FROM params, persons
ORDER BY personid DESC
LIMIT 1
OFFSET ( SELECT rownumber - 1
FROM params) ),
filtered AS
(
SELECT *
FROM person
WHERE address ~ (SELECT rownumber::text FROM params)
)
SELECT *
FROM filtered;
and I get these outputs for 1, 2 and 3 respectively:
| personid | lastname | firstname | address
|----------|----------|-----------|-------------------
| 8 | Smith3 | Eric3 | 718 Louise Circle
(1 row)
| personid | lastname | firstname | address
|----------|----------|-----------|---------
(0 rows)
| personid | lastname | firstname | address
|----------|----------|-----------|-------------------
| 4 | Smith | Eric | 713 Louise Circle
(1 row)
My goal is to have a single query, with a loop or any other means, that returns the union of all 3 select runs above. I only have read-only access to the db, so I can't output into a new table. The GUI software I use has options to output to an internal window or export to a plain text file. The desired result would be:
|personid | lastname | firstname | address
|----------|----------|-----------|-------------------
| 4 | Smith | Eric | 713 Louise Circle
| 8 | Smith3 | Eric3 | 718 Louise Circle
(2 rows)
In reality the loop variable is used in a more complicated way.
If I decipher this right, you basically want to select all people where the row number according to the descending ID appears in the address. The final result should then be limited to certain of these row numbers.
Then you don't need to use that cumbersome LIMIT/OFFSET construct at all. You can simply use the row_number() window function.
To filter for the row numbers you can simply use IN. Depending on what you want, you can use a list of literals (especially if the numbers aren't consecutive), generate_series() to generate a list of consecutive numbers, or a subquery when the numbers are stored in another table.
With a list of literals that would look something like this:
SELECT pn.personid,
pn.lastname,
pn.firstname,
pn.address,
pn.city
FROM (SELECT p.personid,
p.lastname,
p.firstname,
p.address,
p.city,
row_number() OVER (ORDER BY p.personid DESC) n
FROM persons p) pn
WHERE pn.address LIKE concat('%', pn.n, '%')
AND pn.n IN (1, 2, 4);
If you want to use generate_series() an example would be:
SELECT pn.personid,
pn.lastname,
pn.firstname,
pn.address,
pn.city
FROM (SELECT p.personid,
p.lastname,
p.firstname,
p.address,
p.city,
row_number() OVER (ORDER BY p.personid DESC) n
FROM persons p) pn
WHERE pn.address LIKE concat('%', pn.n, '%')
AND pn.n IN (SELECT s.n
FROM generate_series(1, 3) s (n));
And a subquery of another table could be used like so:
SELECT pn.personid,
pn.lastname,
pn.firstname,
pn.address,
pn.city
FROM (SELECT p.personid,
p.lastname,
p.firstname,
p.address,
p.city,
row_number() OVER (ORDER BY p.personid DESC) n
FROM persons p) pn
WHERE pn.address LIKE concat('%', pn.n, '%')
AND pn.n IN (SELECT t.nmuloc
FROM elbat t);
For larger sets of numbers you can also consider using an INNER JOIN on the numbers instead of IN.
Using generate_series():
SELECT pn.personid,
pn.lastname,
pn.firstname,
pn.address,
pn.city
FROM (SELECT p.personid,
p.lastname,
p.firstname,
p.address,
p.city,
row_number() OVER (ORDER BY p.personid DESC) n
FROM persons p) pn
INNER JOIN generate_series(1, 1000000) s (n)
ON s.n = pn.n
WHERE pn.address LIKE concat('%', pn.n, '%');
Or when the numbers are in another table:
SELECT pn.personid,
pn.lastname,
pn.firstname,
pn.address,
pn.city
FROM (SELECT p.personid,
p.lastname,
p.firstname,
p.address,
p.city,
row_number() OVER (ORDER BY p.personid DESC) n
FROM persons p) pn
INNER JOIN elbat t
ON t.nmuloc = pn.n
WHERE pn.address LIKE concat('%', pn.n, '%');
Note that I also changed the regular expression pattern matching to a simple LIKE. That would make the queries a bit more portable. But you can of course replace that by any expression you really need.
db<>fiddle (with some of the variants)
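As a quick check that the row_number() approach reproduces the desired result from the question, here is a minimal sketch. It assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module as a stand-in for PostgreSQL, and uses the || operator in place of concat().

```python
import sqlite3

# SQLite stands in for PostgreSQL here; window functions need SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE persons (personid INTEGER, lastname TEXT, "
    "firstname TEXT, address TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?, ?, ?)",
    [
        (4, "Smith", "Eric", "713 Louise Circle", "Paris"),
        (5, "Smith2", "Eric2", "715 Louise Circle", "London"),
        (8, "Smith3", "Eric3", "718 Louise Circle", "Madrid"),
    ],
)

# Number the rows by descending personid, then keep the rows whose
# address contains their own row number, limited to row numbers 1..3.
rows = conn.execute("""
    SELECT pn.personid, pn.lastname, pn.firstname, pn.address
    FROM (SELECT p.*,
                 row_number() OVER (ORDER BY p.personid DESC) AS n
          FROM persons p) pn
    WHERE pn.address LIKE '%' || pn.n || '%'
      AND pn.n IN (1, 2, 3)
    ORDER BY pn.personid
""").fetchall()

for row in rows:
    print(row)
```

Row numbers 1, 2 and 3 go to personid 8, 5 and 4. Only 718 contains "1" and 713 contains "3", so the two rows from the desired result come back in a single query.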

Using "UNION ALL" and "GROUP BY" to implement "Intersect"

I've written the following query to find the records common to two data sets, but I have so many records in my DB that it's difficult to verify that the query is correct.
Is it OK to implement INTERSECT between the "Customers" and "Employees" tables by using UNION ALL and applying GROUP BY to the result, like below?
SELECT D.Country, D.Region, D.City
FROM (SELECT DISTINCT Country, Region, City
FROM Customers
UNION ALL
SELECT DISTINCT Country, Region, City
FROM Employees) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
So can we say that any record which exists in the result of this query also exists in the Intersect set between "Customers & Employees" tables AND any record that exists in Intersect set between "Customers & Employees" tables will be in the result of this query too?
So is it right to say any record in result of this query is in
"Intersect" set between "Customers & Employees" "AND" any record that
exist in "Intersect" set between "Customers & Employees" is in result
of this query too?
YES.
... Yes, but it won't be as efficient because you are filtering out duplicates three times instead of once. In your query you're:
1. Using DISTINCT to pull unique records from employees
2. Using DISTINCT to pull unique records from customers
3. Combining both queries using UNION ALL
4. Using GROUP BY in your outer query to filter the records you retrieved in steps 1, 2 and 3.
Using INTERSECT will return identical results but more efficiently. To see for yourself you can create the sample data below and run both queries:
use tempdb
go
if object_id('dbo.customers') is not null drop table dbo.customers;
if object_id('dbo.employees') is not null drop table dbo.employees;
create table dbo.customers
(
customerId int identity,
country varchar(50),
region varchar(50),
city varchar(100)
);
create table dbo.employees
(
employeeId int identity,
country varchar(50),
region varchar(50),
city varchar(100)
);
insert dbo.customers(country, region, city)
values ('us', 'N/E', 'New York'), ('us', 'N/W', 'Seattle'),('us', 'Midwest', 'Chicago');
insert dbo.employees
values ('us', 'S/E', 'Miami'), ('us', 'N/W', 'Portland'),('us', 'Midwest', 'Chicago');
Run these queries:
SELECT D.Country, D.Region, D.City
FROM
(
SELECT DISTINCT Country, Region, City
FROM Customers
UNION ALL
SELECT DISTINCT Country, Region, City
FROM Employees
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
SELECT Country, Region, City
FROM dbo.customers
INTERSECT
SELECT Country, Region, City
FROM dbo.employees;
Results:
Country Region City
----------- ---------- ----------
us Midwest Chicago
Country Region City
----------- ---------- ----------
us Midwest Chicago
If using INTERSECT is not an option OR you want a faster query you could improve the query you posted a couple different ways, such as:
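Option 1: let GROUP BY handle ALL the de-duplication like this:
This is the same as what you posted but without the DISTINCTs. Because a combination can now appear more than once within a single table, each branch is tagged with its source and the HAVING clause counts distinct sources rather than rows:
SELECT D.Country, D.Region, D.City
FROM
(
SELECT Country, Region, City, 'C' AS Src
FROM Customers
UNION ALL
SELECT Country, Region, City, 'E' AS Src
FROM Employees
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(DISTINCT D.Src) = 2;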
Option 2: Use ROW_NUMBER
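This would be my preference and will likely be most efficient. Note that it is only correct when neither table contains duplicate (Country, Region, City) rows, because a combination that appears twice in one table alone also reaches rn = 2.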
SELECT Country, Region, City
FROM
(
SELECT
rn = row_number() over (partition by D.Country, D.Region, D.City order by (SELECT null)),
D.Country, D.Region, D.City
FROM
(
SELECT Country, Region, City
FROM Customers
UNION ALL
SELECT Country, Region, City
FROM Employees
) AS D
) uniquify
WHERE rn = 2;
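One way to check how the UNION ALL plus GROUP BY idea compares with INTERSECT is to run both on data where one table holds a duplicate row. The sketch below assumes SQLite (through Python's sqlite3 module) as a stand-in for SQL Server. The duplicated New York customer and the src tag column are made up for illustration.

```python
import sqlite3

# SQLite stands in for SQL Server; the sample rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (country TEXT, region TEXT, city TEXT)")
conn.execute("CREATE TABLE employees (country TEXT, region TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("us", "N/E", "New York"),
        ("us", "N/E", "New York"),  # duplicate inside customers only
        ("us", "Midwest", "Chicago"),
    ],
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("us", "S/E", "Miami"), ("us", "Midwest", "Chicago")],
)

# Reference result.
intersect_rows = conn.execute("""
    SELECT country, region, city FROM customers
    INTERSECT
    SELECT country, region, city FROM employees
    ORDER BY city
""").fetchall()

# Counting rows with no DISTINCT: the duplicated customer row also reaches 2.
naive_rows = conn.execute("""
    SELECT country, region, city
    FROM (SELECT country, region, city FROM customers
          UNION ALL
          SELECT country, region, city FROM employees) AS d
    GROUP BY country, region, city
    HAVING COUNT(*) = 2
    ORDER BY city
""").fetchall()

# Counting distinct sources instead of rows (src is a hypothetical tag column).
tagged_rows = conn.execute("""
    SELECT country, region, city
    FROM (SELECT country, region, city, 'C' AS src FROM customers
          UNION ALL
          SELECT country, region, city, 'E' AS src FROM employees) AS d
    GROUP BY country, region, city
    HAVING COUNT(DISTINCT src) = 2
    ORDER BY city
""").fetchall()

print(intersect_rows)
print(naive_rows)
print(tagged_rows)
```

On this data the plain row count also returns New York, which exists only in customers, while counting distinct sources gives the same single Chicago row as INTERSECT.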

TSQL Group By with an "OR"?

This query for creating a list of Candidate duplicates is easy enough:
SELECT Count(*), Can_FName, Can_HPhone, Can_EMail
FROM Can
GROUP BY Can_FName, Can_HPhone, Can_EMail
HAVING Count(*) > 1
But if the actual rule I want to check against is FName and (HPhone OR Email) - how can I adjust the GROUP BY to work with this?
I'm fairly certain I'm going to end up with a UNION SELECT here (i.e. do FName, HPhone on one and FName, EMail on the other and combine the results) - but I'd love to know if anyone knows an easier way to do it.
Thank you in advance for any help.
Scott in Maine
Before I can advise anything, I need to know the answer to this question:
name phone email
John 555-00-00 john#example.com
John 555-00-01 john#example.com
John 555-00-01 john-other#example.com
What COUNT(*) do you want for this data?
Update:
If you just want to know that a record has any duplicates, use this:
WITH q AS (
SELECT 1 AS id, 'John' AS name, '555-00-00' AS phone, 'john#example.com' AS email
UNION ALL
SELECT 2 AS id, 'John', '555-00-01', 'john#example.com'
UNION ALL
SELECT 3 AS id, 'John', '555-00-01', 'john-other#example.com'
UNION ALL
SELECT 4 AS id, 'James', '555-00-00', 'james#example.com'
UNION ALL
SELECT 5 AS id, 'James', '555-00-01', 'james-other#example.com'
)
SELECT *
FROM q qo
WHERE EXISTS
(
SELECT NULL
FROM q qi
WHERE qi.id <> qo.id
AND qi.name = qo.name
AND (qi.phone = qo.phone OR qi.email = qo.email)
)
It's more efficient, but doesn't tell you where the duplicate chain started.
This query selects all entries along with a special field, chainid, that indicates where the duplicate chain started.
WITH q AS (
SELECT 1 AS id, 'John' AS name, '555-00-00' AS phone, 'john#example.com' AS email
UNION ALL
SELECT 2 AS id, 'John', '555-00-01', 'john#example.com'
UNION ALL
SELECT 3 AS id, 'John', '555-00-01', 'john-other#example.com'
UNION ALL
SELECT 4 AS id, 'James', '555-00-00', 'james#example.com'
UNION ALL
SELECT 5 AS id, 'James', '555-00-01', 'james-other#example.com'
),
dup AS (
SELECT id AS chainid, id, name, phone, email, 1 as d
FROM q
UNION ALL
SELECT chainid, qo.id, qo.name, qo.phone, qo.email, d + 1
FROM dup
JOIN q qo
ON qo.name = dup.name
AND (qo.phone = dup.phone OR qo.email = dup.email)
AND qo.id > dup.id
),
chains AS
(
SELECT *
FROM dup do
WHERE chainid NOT IN
(
SELECT id
FROM dup di
WHERE di.chainid < do.chainid
)
)
SELECT *
FROM chains
ORDER BY
chainid
None of these answers is correct. Quassnoi's is a decent approach, but you will notice one fatal flaw in the expressions "qo.id > dup.id" and "di.chainid < do.chainid": comparisons made by ID! This is ALWAYS bad practice because it depends on some inherent ordering in the IDs. IDs should NEVER be given any implicit meaning and should ONLY participate in equality or null testing. You can easily break Quassnoi's solution in this example by simply reordering the IDs in the data.
The essential problem is a disjunctive condition with a grouping, which leads to the possibility of two records being related through an intermediate, though they are not directly relatable.
e.g., you stated these records should all be grouped:
(1) John 555-00-00 john#example.com
(2) John 555-00-01 john#example.com
(3) John 555-00-01 john-other#example.com
You can see that #1 and #2 are relatable, as are #2 and #3, but clearly #1 and #3 are not directly relatable as a group.
This establishes that a recursive or iterative solution is the ONLY possible solution.
So, recursion is not viable since you can easily end up in a looping situation. This is what Quassnoi was trying to avoid with his ID comparisons, but in doing so he broke the algorithm. You could try limiting the levels of recursion, but you may not then complete all relations, and you will still potentially be following loops back upon yourself, leading to excessive data size and prohibitive inefficiency.
The best solution is ITERATIVE: Start a result set by tagging each ID as a unique group ID, and then spin through the result set and update it, combining IDs into the same unique group ID as they match on the disjunctive condition. Repeat the process on the updated set each time until no further updates can be made.
I will create example code for this soon.
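The iterative idea described above can be sketched outside the database. The following is a minimal illustration in plain Python, not the poster's promised code: the function name group_duplicates and the tuple layout (id, name, phone, email) are assumptions made for the example, and IDs are only ever compared for equality.

```python
def group_duplicates(records):
    """Assign a group label to every record id.

    records: list of (id, name, phone, email) tuples (assumed layout).
    Two records are related when the names match and either the phone
    or the email matches. Groups are merged until nothing changes.
    """
    # Start with every record in its own group, labelled by its own id.
    group = {rec[0]: rec[0] for rec in records}
    changed = True
    while changed:
        changed = False
        for a in records:
            for b in records:
                if a[0] == b[0]:
                    continue
                related = a[1] == b[1] and (a[2] == b[2] or a[3] == b[3])
                if related and group[a[0]] != group[b[0]]:
                    # Merge b's whole group into a's group.
                    old, new = group[b[0]], group[a[0]]
                    for rec_id in group:
                        if group[rec_id] == old:
                            group[rec_id] = new
                    changed = True
    return group


# Sample rows taken from the thread.
records = [
    (1, "John", "555-00-00", "john#example.com"),
    (2, "John", "555-00-01", "john#example.com"),
    (3, "John", "555-00-01", "john-other#example.com"),
    (4, "James", "555-00-00", "james#example.com"),
    (5, "James", "555-00-01", "james-other#example.com"),
]
groups = group_duplicates(records)
print(groups)
```

Records 1 and 3 end up in the same group through record 2 even though they are not directly related, and the two James rows stay separate. The nested pass over all pairs is quadratic per iteration, so this is meant to show the idea rather than to scale.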
GROUP BY doesn't support OR - it's implicitly AND and must include every non-aggregated column in the select list.
I assume you also have a unique ID integer as the primary key on this table. If you don't, it's a good idea to have one, for this purpose and many others.
Find those duplicates by a self-join:
select
c1.ID
, c1.Can_FName
, c1.Can_HPhone
, c1.Can_Email
, c2.ID
, c2.Can_FName
, c2.Can_HPhone
, c2.Can_Email
from
(
select
min(ID) as ID,
Can_FName,
Can_HPhone,
Can_Email
from Can
group by
Can_FName,
Can_HPhone,
Can_Email
) c1
inner join Can c2 on c1.ID < c2.ID
where
c1.Can_FName = c2.Can_FName
and (c1.Can_HPhone = c2.Can_HPhone OR c1.Can_Email = c2.Can_Email)
order by
c1.ID
The query gives you N-1 rows for each N duplicate combinations - if you want just a count along with each unique combination, count the rows grouped by the "left" side:
select count(1) + 1
, c1.Can_FName
, c1.Can_HPhone
, c1.Can_Email
from
(
select
min(ID) as ID,
Can_FName,
Can_HPhone,
Can_Email
from Can
group by
Can_FName,
Can_HPhone,
Can_Email
) c1
inner join Can c2 on c1.ID < c2.ID
where
c1.Can_FName = c2.Can_FName
and (c1.Can_HPhone = c2.Can_HPhone OR c1.Can_Email = c2.Can_Email)
group by
c1.Can_FName
, c1.Can_HPhone
, c1.Can_Email
Granted, this is more involved than a union - but I think it illustrates a good way of thinking about duplicates.
Project the desired transformation first from a derived table, then do the aggregation:
SELECT COUNT(*)
, CAN_FName
, Can_HPhoneOrEMail
FROM (
SELECT Can_FName
, ISNULL(Can_HPhone,'') + ISNULL(Can_EMail,'') AS Can_HPhoneOrEMail
FROM Can) AS Can_Transformed
GROUP BY Can_FName, Can_HPhoneOrEMail
HAVING Count(*) > 1
Adjust your 'OR' operation as needed in the derived table project list.
I know this answer will be criticised for the use of the temp table, but it will work anyway:
-- create temp table to give the table a unique key
create table #tmp(
ID int identity,
can_Fname varchar(200) null, -- real type and len here
can_HPhone varchar(200) null, -- real type and len here
can_Email varchar(200) null, -- real type and len here
)
-- just copy the rows where a duplicate fname exists
-- (better performance, especially for a big table)
insert into #tmp
select can_fname,can_hphone,can_email
from Can
where can_fname in (select can_fname from Can
group by can_fname having count(*)>1)
-- select the rows that have the same fname and
-- at least the same phone or email
select can_Fname, can_Hphone, can_Email
from #tmp a where exists
(select * from #tmp b where
a.ID<>b.ID and a.can_fname = b.can_fname
and (isnull(a.can_HPhone,'')=isnull(b.can_HPhone,'')
or isnull(a.can_email,'')=isnull(b.can_email,'')))
Try this:
SELECT Can_FName, COUNT(*)
FROM (
SELECT
rank() over(partition by Can_FName order by Can_FName,Can_HPhone) rnk_p,
rank() over(partition by Can_FName order by Can_FName,Can_EMail) rnk_m,
Can_FName
FROM Can
) X
WHERE rnk_p=1 or rnk_m =1
GROUP BY Can_FName
HAVING COUNT(*)>1