Using "UNION ALL" and "GROUP BY" to implement "Intersect" - tsql

I've written the following query to find the records common to two data sets, but it's hard for me to verify that it is correct because the tables in my DB hold a lot of records.
Is it OK to implement Intersect between "Customers" & "Employees" tables using UNION ALL and apply GROUP BY on the result like below?
SELECT D.Country, D.Region, D.City
FROM (SELECT DISTINCT Country, Region, City
FROM Customers
UNION ALL
SELECT DISTINCT Country, Region, City
FROM Employees) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
So can we say that any record which exists in the result of this query also exists in the Intersect set between "Customers & Employees" tables AND any record that exists in Intersect set between "Customers & Employees" tables will be in the result of this query too?

Yes. Because each branch applies DISTINCT, a row can reach COUNT(*) = 2 only when it appears in both tables, so your query returns exactly the rows in the "Intersect" set between "Customers & Employees". It won't be as efficient, though, because you are filtering out duplicates three times instead of once. In your query you're:
1. Using DISTINCT to pull unique records from customers
2. Using DISTINCT to pull unique records from employees
3. Combining both queries using UNION ALL
4. Using GROUP BY in your outer query to filter the records you retrieved in steps 1, 2 and 3.
Using INTERSECT will return identical results but more efficiently. To see for yourself you can create the sample data below and run both queries:
use tempdb
go
if object_id('dbo.customers') is not null drop table dbo.customers;
if object_id('dbo.employees') is not null drop table dbo.employees;
create table dbo.customers
(
customerId int identity,
country varchar(50),
region varchar(50),
city varchar(100)
);
create table dbo.employees
(
employeeId int identity,
country varchar(50),
region varchar(50),
city varchar(100)
);
insert dbo.customers(country, region, city)
values ('us', 'N/E', 'New York'), ('us', 'N/W', 'Seattle'),('us', 'Midwest', 'Chicago');
insert dbo.employees
values ('us', 'S/E', 'Miami'), ('us', 'N/W', 'Portland'),('us', 'Midwest', 'Chicago');
Run these queries:
SELECT D.Country, D.Region, D.City
FROM
(
SELECT DISTINCT Country, Region, City
FROM Customers
UNION ALL
SELECT DISTINCT Country, Region, City
FROM Employees
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
SELECT Country, Region, City
FROM dbo.customers
INTERSECT
SELECT Country, Region, City
FROM dbo.employees;
Results:
Country Region City
----------- ---------- ----------
us Midwest Chicago
Country Region City
----------- ---------- ----------
us Midwest Chicago
If using INTERSECT is not an option OR you want a faster query you could improve the query you posted a couple different ways, such as:
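Option 1: let GROUP BY handle ALL the de-duplication like this:
This is what you posted without the DISTINCTs. Each branch is tagged with its source table, because without DISTINCT a plain COUNT(*) = 2 would also match a row that appears twice in one table and never in the other.
SELECT D.Country, D.Region, D.City
FROM
(
SELECT Country, Region, City, 'C' AS Src
FROM Customers
UNION ALL
SELECT Country, Region, City, 'E' AS Src
FROM Employees
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(DISTINCT D.Src) = 2;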
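Option 2: Use ROW_NUMBER
This would be my preference when (Country, Region, City) is unique within each table, and it will likely be the most efficient. With duplicates inside a single table it is not equivalent to INTERSECT, because a combination that appears twice in one table and never in the other also gets rn = 2.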
SELECT Country, Region, City
FROM
(
SELECT
rn = row_number() over (partition by D.Country, D.Region, D.City order by (SELECT null)),
D.Country, D.Region, D.City
FROM
(
SELECT Country, Region, City
FROM Customers
UNION ALL
SELECT Country, Region, City
FROM Employees
) AS D
) uniquify
WHERE rn = 2;
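One thing worth checking once the DISTINCTs are dropped is how duplicate rows inside a single table affect the count. The following is a minimal sketch, not a T-SQL benchmark: it uses Python's built-in sqlite3 module (assuming a SQLite build with window functions, 3.25 or later), hypothetical sample rows in which New York appears twice in customers and never in employees, and a source tag column named src that is not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (country TEXT, region TEXT, city TEXT);
CREATE TABLE employees (country TEXT, region TEXT, city TEXT);
-- Hypothetical rows: New York is duplicated inside customers only.
INSERT INTO customers VALUES
  ('us', 'N/E', 'New York'), ('us', 'N/E', 'New York'),
  ('us', 'Midwest', 'Chicago');
INSERT INTO employees VALUES
  ('us', 'S/E', 'Miami'), ('us', 'Midwest', 'Chicago');
""")

# Both tables stacked without DISTINCT; src records which table a row came from.
COMBINED = """
    SELECT country, region, city, 'C' AS src FROM customers
    UNION ALL
    SELECT country, region, city, 'E' AS src FROM employees
"""


def run(sql):
    """Run a query and return its rows as a sorted list of tuples."""
    return sorted(conn.execute(sql).fetchall())


# Reference result.
intersect_rows = run("""
    SELECT country, region, city FROM customers
    INTERSECT
    SELECT country, region, city FROM employees
""")

# COUNT(*) = 2 without DISTINCT: a duplicate pair in one table also counts 2.
count_star_rows = run(f"""
    SELECT country, region, city FROM ({COMBINED}) AS d
    GROUP BY country, region, city
    HAVING COUNT(*) = 2
""")

# ROW_NUMBER with rn = 2 (ORDER BY omitted, SQLite allows that).
row_number_rows = run(f"""
    SELECT country, region, city
    FROM (SELECT ROW_NUMBER() OVER (PARTITION BY country, region, city) AS rn,
                 country, region, city
          FROM ({COMBINED}) AS d) AS uniquify
    WHERE rn = 2
""")

# Counting distinct sources requires the row to come from both tables.
tagged_rows = run(f"""
    SELECT country, region, city FROM ({COMBINED}) AS d
    GROUP BY country, region, city
    HAVING COUNT(DISTINCT src) = 2
""")

print("INTERSECT:            ", intersect_rows)
print("COUNT(*) = 2:         ", count_star_rows)
print("ROW_NUMBER rn = 2:    ", row_number_rows)
print("COUNT(DISTINCT src)=2:", tagged_rows)
```

On this data only the source-tagged variant matches INTERSECT; the other two also return the New York row. If the three columns are known to be unique within each table, all four queries agree.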

Related

can you use max in this query?

From this table, I'm trying to determine the nation(s) that have the highest number of teams (a nation X has a team if that team has at least one athlete from country X).
driver(id,name, team, country)
This solution returns all countries in descending order. Would it be possible to return only the one(s) with the most teams and not all of them? I think I should use the 'max' function, but I'm not sure.
SELECT (country) ,count(distinct team)
FROM driver
GROUP BY country
order by count(distinct team) DESC;
I would use your query as a CTE and then select from it like this -
WITH t AS
(
SELECT country, count(distinct team) cnt
FROM driver
GROUP BY country
)
SELECT country, cnt FROM t
WHERE cnt = (SELECT max(cnt) FROM t);
You can combine this with a window function:
with counts as (
SELECT country,
count(distinct team) as num_teams,
dense_rank() over (order by count(distinct team) desc) as rnk
FROM driver
GROUP BY country
)
select country, num_teams
from counts
where rnk = 1;
If you are using Postgres 13 or later, you can use fetch first with the with ties option:
SELECT country,
count(distinct team) as num_teams
FROM driver
GROUP BY country
order by count(distinct team) desc
fetch first 1 rows with ties;
If two countries have the same highest number of teams, this would return both. Without the with ties option (which was introduced in Postgres 13) only one of them would be returned.
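To see how ties behave, here is a small sketch using Python's sqlite3 module rather than Postgres (so fetch first ... with ties is not shown; SQLite does not support it). The driver rows are hypothetical, and the ranking is done in a second CTE named ranked, which is my own restructuring to keep the SQL portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE driver (id INTEGER PRIMARY KEY, name TEXT, team TEXT, country TEXT);
-- Hypothetical data: IT and UK both have drivers in 2 distinct teams, DE in 1.
INSERT INTO driver (name, team, country) VALUES
  ('a', 'T1', 'IT'), ('b', 'T2', 'IT'),
  ('c', 'T1', 'UK'), ('d', 'T3', 'UK'), ('e', 'T3', 'UK'),
  ('f', 'T2', 'DE');
""")

# Approach 1: compare each count with the maximum count.
max_rows = conn.execute("""
    WITH t AS (
        SELECT country, COUNT(DISTINCT team) AS cnt
        FROM driver
        GROUP BY country
    )
    SELECT country, cnt FROM t
    WHERE cnt = (SELECT MAX(cnt) FROM t)
    ORDER BY country
""").fetchall()

# Approach 2: rank the counts and keep rank 1 (needs SQLite 3.25+).
rank_rows = conn.execute("""
    WITH counts AS (
        SELECT country, COUNT(DISTINCT team) AS num_teams
        FROM driver
        GROUP BY country
    ),
    ranked AS (
        SELECT country, num_teams,
               DENSE_RANK() OVER (ORDER BY num_teams DESC) AS rnk
        FROM counts
    )
    SELECT country, num_teams FROM ranked
    WHERE rnk = 1
    ORDER BY country
""").fetchall()

# A plain LIMIT 1 keeps a single row and silently drops the other tied country.
limit_rows = conn.execute("""
    SELECT country, COUNT(DISTINCT team) AS num_teams
    FROM driver
    GROUP BY country
    ORDER BY num_teams DESC
    LIMIT 1
""").fetchall()

print(max_rows)
print(rank_rows)
print(limit_rows)
```

Both the MAX and the DENSE_RANK versions return the two tied countries, while the LIMIT 1 version returns only one of them.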

find all companies where all employees in specific state

I have a table employees with columns:
company_id,
id,
opt_state (ceased_membership, ignition, opted_out, opted_in),
opt_out_on.
I want to query all companies where every employee's opt_state is in ('ceased_membership', 'ignition', 'opted_out'), together with the date (opt_out_on) on which the last employee left.
I have tried this but it didn't work:
select company_id from employees where id=all(select id from
employees
where opt_state in ('ceased_membership', 'ignition','opted_out')
Then I wrote the query below, which worked very well and gave me the result I was looking for. However, I'd like to ask whether this can be done differently, more elegantly.
SELECT
e.company_id
, max_opt_out
FROM (
SELECT DISTINCT
company_id
, count(id)
OVER (
PARTITION BY company_id ) opt_out
FROM employees
WHERE opt_state IN ('ceased_membership', 'ignition', 'opted_out')) e
LEFT JOIN (
SELECT
company_id
, count(id) opt_in
, max(opt_out_on) max_opt_out
FROM employees
GROUP BY company_id) S
ON e.company_id = s.company_id
WHERE e.opt_out = s.opt_in;
This seems like a good time to use the HAVING clause
SELECT company_id, max(opt_out_on)
FROM employees e
GROUP BY company_id
HAVING bool_and( opt_state in ('ceased_membership', 'ignition','opted_out'));
HAVING is a bit like WHERE, but the condition applies to whole groups.
bool_and is an aggregate function that is true only when the condition is true for all the records in the group.
I'd say that you want the maximum opt_out_on for each company that only has employees in a given set of states, which means the company does not have any employee outside that set of states.
So, translated to SQL:
select company_id, max(opt_out_on)
from employees e
where not exists(
select 1 from employees
where company_id=e.company_id
and opt_state not in ('ceased_membership', 'ignition','opted_out')
)
group by company_id;
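The two formulations ("every row satisfies the condition" and "no row violates it") can be compared on a small table. This is a sketch under stated assumptions: it runs on SQLite through Python's sqlite3 module, the rows are hypothetical, and because SQLite has no bool_and, the HAVING variant swaps in a conditional SUM that counts violating rows instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (company_id INTEGER, id INTEGER,
                        opt_state TEXT, opt_out_on TEXT);
-- Hypothetical data: company 2 still has one opted_in employee.
INSERT INTO employees VALUES
  (1, 1, 'opted_out',         '2020-01-10'),
  (1, 2, 'ignition',          '2020-03-05'),
  (2, 3, 'opted_out',         '2020-02-01'),
  (2, 4, 'opted_in',          NULL),
  (3, 5, 'ceased_membership', '2019-12-31');
""")

LEFT_STATES = "('ceased_membership', 'ignition', 'opted_out')"

# HAVING variant: keep groups with zero rows outside the allowed states.
# (Stand-in for Postgres bool_and, which SQLite does not provide.)
having_rows = conn.execute(f"""
    SELECT company_id, MAX(opt_out_on)
    FROM employees
    GROUP BY company_id
    HAVING SUM(CASE WHEN opt_state NOT IN {LEFT_STATES} THEN 1 ELSE 0 END) = 0
    ORDER BY company_id
""").fetchall()

# NOT EXISTS variant: drop every company that has a violating employee.
not_exists_rows = conn.execute(f"""
    SELECT company_id, MAX(opt_out_on)
    FROM employees e
    WHERE NOT EXISTS (
        SELECT 1 FROM employees x
        WHERE x.company_id = e.company_id
          AND x.opt_state NOT IN {LEFT_STATES}
    )
    GROUP BY company_id
    ORDER BY company_id
""").fetchall()

print(having_rows)
print(not_exists_rows)
```

Both queries keep companies 1 and 3 and drop company 2. The dates are stored as ISO-formatted text here, so MAX compares them correctly as strings.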

Query-Sql Developer

I am creating some queries for my project, but I am having difficulty with the following ones:
A SELECT statement containing a subquery to retrieve a list of Locations (location id and street_address) that have employees with higher salary than the average of their department. The list must contain the number of those employees and their total salary per location. Name these aggregates respectively "emp" and "totalsalary". The locations in the list must be ordered by location_id.
Select LOCATION_ID, STREET_ADDRESS
from HR.LOCATIONS IN
(Select Employee_id
from HR.Employees
Where Salary > round(avg(SALARY)))
order by location_id;
error: SQL command not properly ended
and the second query is the following
The JOB_HISTORY table can contain more than one entry for an employee who was hired more than once. Create a query to retrieve a list of Employees that were hired more than once. Include the columns EMPLOYEE_ID, LAST_NAME, FIRST_NAME and the aggregate "Times Hired".
SELECT FIRST_NAME,LAST_NAME,EMPLOYEE_ID,
count (*)as TIMES_HIRED
from HR.JOB_HISTORY, HR.EMPLOYEES
where EMPLOYEE_ID= LAST_NAME
having COUNT(*) >1;
error: not a single-group
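Try these, hope they help. I am assuming the standard HR sample schema, in which EMPLOYEES has no LOCATION_ID column and reaches LOCATIONS through DEPARTMENTS. The correlated subquery compares each salary with the average of that employee's own department, and grouping only by location gives one row per location with the correct emp and totalsalary:
SELECT l.LOCATION_ID, l.STREET_ADDRESS,
COUNT(*) AS emp, SUM(e.SALARY) AS totalsalary
FROM HR.LOCATIONS l
INNER JOIN HR.DEPARTMENTS d ON d.LOCATION_ID = l.LOCATION_ID
INNER JOIN HR.EMPLOYEES e ON e.DEPARTMENT_ID = d.DEPARTMENT_ID
WHERE e.SALARY > (SELECT AVG(e2.SALARY)
FROM HR.EMPLOYEES e2
WHERE e2.DEPARTMENT_ID = e.DEPARTMENT_ID)
GROUP BY l.LOCATION_ID, l.STREET_ADDRESS
ORDER BY l.LOCATION_ID;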
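One way to express "salary higher than the average of their own department" is a correlated subquery. The sketch below tries that idea on SQLite via Python's sqlite3 module, not on Oracle. The three tables are a cut-down, hypothetical imitation of the HR layout (employees linked to locations through departments), and all rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations   (location_id INTEGER PRIMARY KEY, street_address TEXT);
CREATE TABLE departments (department_id INTEGER PRIMARY KEY, location_id INTEGER);
CREATE TABLE employees   (employee_id INTEGER PRIMARY KEY,
                          department_id INTEGER, salary INTEGER);
-- Hypothetical data: departments 10 and 20 share location 1000.
INSERT INTO locations   VALUES (1000, '1 Main St'), (2000, '2 High St');
INSERT INTO departments VALUES (10, 1000), (20, 1000), (30, 2000);
INSERT INTO employees   VALUES
  (1, 10, 1000), (2, 10, 3000),                 -- dept 10 average: 2000
  (3, 20, 5000), (4, 20, 7000), (5, 20, 9000),  -- dept 20 average: 7000
  (6, 30, 4000);                                -- dept 30 average: 4000
""")

rows = conn.execute("""
    SELECT l.location_id, l.street_address,
           COUNT(*)      AS emp,
           SUM(e.salary) AS totalsalary
    FROM locations l
    JOIN departments d ON d.location_id = l.location_id
    JOIN employees   e ON e.department_id = d.department_id
    -- Correlated subquery: average salary of this employee's own department.
    WHERE e.salary > (SELECT AVG(e2.salary)
                      FROM employees e2
                      WHERE e2.department_id = e.department_id)
    GROUP BY l.location_id, l.street_address
    ORDER BY l.location_id
""").fetchall()

print(rows)
```

Employees 2 and 5 are the only ones above their department average, and both work at location 1000, so that is the only location listed. Location 2000 drops out because its single employee earns exactly the department average.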
For the second question:
SELECT e.FIRST_NAME, e.LAST_NAME, e.EMPLOYEE_ID,
COUNT(*) AS TIMES_HIRED
FROM HR.JOB_HISTORY jh
INNER JOIN HR.EMPLOYEES e ON jh.EMPLOYEE_ID = e.EMPLOYEE_ID
GROUP BY e.FIRST_NAME, e.LAST_NAME, e.EMPLOYEE_ID
HAVING COUNT(*) > 1;

Group by multiple columns in PostgreSQL

I have two queries:
SELECT city, count(id) as num_of_applicants
FROM(
select distinct(students.id), city
FROM STUDENTS INNER JOIN APPLICATIONS ON STUDENTS.ID = APPLICATIONS.STUDENT_ID
WHERE APPLICATIONS.COLLEGE_ID = '28'
) AS derivedTable
GROUP BY city;
SELECT city, count(id) as num_of_accepted_applicants
FROM
(select applications.id, city FROM
STUDENTS INNER JOIN APPLICATIONS ON STUDENTS.ID = APPLICATIONS.STUDENT_ID
WHERE status = 'Accepted' and college_id = '28') as tbl
GROUP BY city
One gives the number of applicants in each city and the other gives the number of accepted applicants in each city (both for a single college), but I want to get both in one query instead, with a result something like:
city | number_of_applicants | number_of_accepted_applicants
You can simplify (fyi: I didn't understand why you used the derived tables, you could have just put the COUNT and GROUP BY on the inner queries) and combine the queries as this:
SELECT city
, COUNT(DISTINCT STUDENTS.ID) AS num_of_applicants
, SUM( CASE
WHEN status = 'Accepted' THEN 1
ELSE 0
END
) AS num_of_accepted_applicants
FROM STUDENTS
JOIN APPLICATIONS
ON STUDENTS.ID = APPLICATIONS.STUDENT_ID
WHERE college_id='28'
GROUP BY city;
Another way is to continue with the technique of derived tables. Make each of your queries a derived table and JOIN on the city - but that would not perform as well.
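The single-pass approach with conditional aggregation can be tried on a toy data set. This is a sketch on SQLite through Python's sqlite3 module, with hypothetical rows. It also assumes, which the question does not state, that a student may apply to the same college more than once, so it shows COUNT(*) next to COUNT(DISTINCT s.id) to make the difference visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students     (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE applications (id INTEGER PRIMARY KEY, student_id INTEGER,
                           college_id TEXT, status TEXT);
INSERT INTO students VALUES (1, 'Rome'), (2, 'Rome'), (3, 'Milan');
-- Hypothetical data: student 1 applied twice to college 28.
INSERT INTO applications VALUES
  (1, 1, '28', 'Accepted'),
  (2, 1, '28', 'Rejected'),
  (3, 2, '28', 'Rejected'),
  (4, 3, '28', 'Accepted'),
  (5, 3, '99', 'Accepted');
""")

rows = conn.execute("""
    SELECT s.city,
           COUNT(*)             AS num_of_applications,  -- one per application row
           COUNT(DISTINCT s.id) AS num_of_applicants,    -- one per student
           SUM(CASE WHEN a.status = 'Accepted' THEN 1 ELSE 0 END)
                                AS num_of_accepted_applicants
    FROM students s
    JOIN applications a ON s.id = a.student_id
    WHERE a.college_id = '28'
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()

for row in rows:
    print(row)
```

For Rome the query reports 3 applications from 2 distinct applicants, 1 of them accepted. Which of the two counts is wanted depends on whether "applicants" means students or application rows.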

Union Select Distinct syntax?

I have a huge table that contains both shipping address information and billing address information. I can get unique shipping and billing addresses in two separate tables with the following:
SELECT DISTINCT ShipToName, ShipToAddress1, ShipToAddress2, ShipToAddress3, ShipToCity, ShipToZipCode
FROM Orders
ORDER BY Orders.ShipToName
SELECT DISTINCT BillToName, BillToAddress1, BillToAddress2, BillToAddress3, BillToCity, BillToZipCode
FROM Orders
ORDER BY Orders.BillToName
How can I get the distinct intersection of the two? I am unsure of the syntax.
something like this?
SELECT DISTINCT
toname, addr1, addr2, addr3, city, zip
FROM
(SELECT DISTINCT
ShipToName AS toName,
ShipToAddress1 AS addr1,
ShipToAddress2 AS addr2,
ShipToAddress3 AS addr3,
ShipToCity AS city,
ShipToZipCode AS zip
FROM
Orders
UNION ALL
SELECT DISTINCT
BillToName AS toName,
BillToAddress1 AS addr1,
BillToAddress2 AS addr2,
BillToAddress3 AS addr3,
BillToCity AS city,
BillToZipCode AS zip
FROM
Orders) o
ORDER BY ToName
You say "Intersection" but you accepted the Union answer so I guess you just want the UNION DISTINCT. No need for derived tables and the three DISTINCT. You can use the simple:
SELECT
ShipToName AS Name,
ShipToAddress1 AS Address1,
ShipToAddress2 AS Address2,
ShipToAddress3 AS Address3,
ShipToCity AS City,
ShipToZipCode AS ZipCode
FROM
Orders
UNION --- UNION means UNION DISTINCT
SELECT
BillToName,
BillToAddress1,
BillToAddress2,
BillToAddress3,
BillToCity,
BillToZipCode
FROM
Orders
ORDER BY
Name ;
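You can join both sets on all fields and this will return the records that match (note that = never matches NULL, so an address with a NULL in any of these columns will not be returned by the join):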
SELECT *
FROM Orders o1
INNER JOIN Orders o2
ON o1.ShipToName = o2.BillToName
AND o1.ShipToAddress1 = o2.BillToAddress1
AND o1.ShipToAddress2 = o2.BillToAddress2
AND o1.ShipToAddress3 = o2.BillToAddress3
AND o1.ShipToCity = o2.BillToCity
AND o1.ShipToZipCode = o2.BillToZipCode
Or you should be able to use INTERSECT:
SELECT ShipToName, ShipToAddress1, ShipToAddress2, ShipToAddress3, ShipToCity, ShipToZipCode
FROM Orders
INTERSECT
SELECT BillToName, BillToAddress1, BillToAddress2, BillToAddress3, BillToCity, BillToZipCode
FROM Orders
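Or, if what you want is the combined list of distinct addresses rather than only those present in both sets, a UNION query (UNION removes duplicates between two sets of data):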
SELECT ShipToName, ShipToAddress1, ShipToAddress2, ShipToAddress3, ShipToCity, ShipToZipCode
FROM Orders
UNION
SELECT BillToName, BillToAddress1, BillToAddress2, BillToAddress3, BillToCity, BillToZipCode
FROM Orders
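The join and INTERSECT variants differ when an address column holds NULL, which is common for fields such as a second address line. A minimal sketch of that difference, run on SQLite through Python's sqlite3 module: the Orders table here is a hypothetical, trimmed version with only three ship-to and three bill-to columns, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (
    ShipToName TEXT, ShipToAddress1 TEXT, ShipToAddress2 TEXT,
    BillToName TEXT, BillToAddress1 TEXT, BillToAddress2 TEXT
);
-- Hypothetical data: Ann's address has no second line (NULL).
INSERT INTO Orders VALUES
  ('Ann', '1 Elm', NULL,    'Bob', '9 Oak', 'Apt 2'),
  ('Bob', '9 Oak', 'Apt 2', 'Ann', '1 Elm', NULL);
""")

# INTERSECT treats two NULLs as the same value when comparing rows.
intersect_rows = conn.execute("""
    SELECT ShipToName, ShipToAddress1, ShipToAddress2 FROM Orders
    INTERSECT
    SELECT BillToName, BillToAddress1, BillToAddress2 FROM Orders
    ORDER BY 1
""").fetchall()

# An equality join does not: NULL = NULL is not true, so Ann never matches.
join_rows = conn.execute("""
    SELECT DISTINCT o1.ShipToName, o1.ShipToAddress1, o1.ShipToAddress2
    FROM Orders o1
    INNER JOIN Orders o2
       ON o1.ShipToName     = o2.BillToName
      AND o1.ShipToAddress1 = o2.BillToAddress1
      AND o1.ShipToAddress2 = o2.BillToAddress2
    ORDER BY 1
""").fetchall()

print(intersect_rows)
print(join_rows)
```

Both addresses appear as ship-to and as bill-to, so INTERSECT returns both, while the join returns only Bob's address because Ann's has a NULL second line.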