Group by multiple columns in PostgreSQL - group-by

I have two queries:
SELECT city, count(id) as num_of_applicants
FROM(
select distinct(students.id), city
FROM STUDENTS INNER JOIN APPLICATIONS ON STUDENTS.ID = APPLICATIONS.STUDENT_ID
WHERE APPLICATIONS.COLLEGE_ID = '28'
) AS derivedTable
GROUP BY city;
SELECT city, count(id) as num_of_accepted_applicants
FROM
(select applications.id, city FROM
STUDENTS INNER JOIN APPLICATIONS ON STUDENTS.ID = APPLICATIONS.STUDENT_ID
WHERE status = 'Accepted' and college_id = '28') as tbl
GROUP BY city
One gives the number of applicants for a college and the other gives the number of accepted applicants for that college, each grouped by city, but I want to get both in one query (instead of two), where the result is something like:
city | number_of_applicants | number_of_accepted_applicants

You can simplify (FYI: I don't understand why you used the derived tables; you could have put the COUNT and GROUP BY directly on the inner queries) and combine the queries like this:
SELECT city
, COUNT(*) AS num_of_applicants
, SUM( CASE
WHEN status = 'Accepted' THEN 1
ELSE 0
END
) AS num_of_accepted_applicants
FROM STUDENTS
JOIN APPLICATIONS
ON STUDENTS.ID = APPLICATIONS.STUDENT_ID
WHERE college_id='28'
GROUP BY city;
Another way is to continue with the technique of derived tables. Make each of your queries a derived table and JOIN on the city - but that would not perform as well.
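To see the conditional aggregation in action, here is a minimal sketch that runs the combined query against an in-memory SQLite database through Python's sqlite3 module. The sample rows are made up for illustration, and SQLite stands in for PostgreSQL only because it needs no server; the SUM(CASE ...) pattern itself is plain SQL.

```python
import sqlite3

# Hypothetical sample data; only the column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE applications (
    id INTEGER PRIMARY KEY,
    student_id INTEGER REFERENCES students(id),
    college_id TEXT,
    status TEXT
);
INSERT INTO students VALUES (1, 'Oslo'), (2, 'Oslo'), (3, 'Bergen');
INSERT INTO applications VALUES
    (1, 1, '28', 'Accepted'),
    (2, 2, '28', 'Rejected'),
    (3, 3, '28', 'Accepted'),
    (4, 3, '99', 'Accepted');
""")

# One pass over the join: COUNT(*) counts every application row,
# SUM(CASE ...) counts only the rows whose status is 'Accepted'.
QUERY = """
SELECT city,
       COUNT(*) AS num_of_applicants,
       SUM(CASE WHEN status = 'Accepted' THEN 1 ELSE 0 END)
           AS num_of_accepted_applicants
FROM students
JOIN applications ON students.id = applications.student_id
WHERE college_id = ?
GROUP BY city
ORDER BY city
"""

rows = conn.execute(QUERY, ("28",)).fetchall()
for row in rows:
    print(row)
```

One thing to keep in mind: COUNT(*) counts applications, while the first query in the question counted distinct students. If a student can apply to the same college more than once, COUNT(DISTINCT students.id) would be the closer match.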

Related

find all companies where all employees in specific state

I have a table employees with columns:
company_id,
id,
opt_state (ceased_membership, ignition, opted_out, opted_in),
opt_out_on.
I want to query all companies where every employee's opt_state is in ('ceased_membership', 'ignition', 'opted_out'), together with the opt_out_on date when the last employee left.
I have tried this but it didn't work
select company_id from employees where id=all(select id from
employees
where opt_state in ('ceased_membership', 'ignition','opted_out')
Then I wrote the query below, which worked very well and gave me the result I was looking for. However, I'd like to ask whether this can be done differently, more elegantly.
SELECT
e.company_id
, max_opt_out
FROM (
SELECT DISTINCT
company_id
, count(id)
OVER (
PARTITION BY company_id ) opt_out
FROM employees
WHERE opt_state IN ('ceased_membership', 'ignition', 'opted_out')) e
LEFT JOIN (
SELECT
company_id
, count(id) opt_in
, max(opt_out_on) max_opt_out
FROM employees
GROUP BY company_id) S
ON e.company_id = s.company_id
WHERE e.opt_out = s.opt_in;
This seems like a good time to use the HAVING clause:
SELECT company_id, max(opt_out_on)
FROM employees e
GROUP BY company_id
HAVING bool_and( opt_state in ('ceased_membership', 'ignition','opted_out'));
HAVING is a bit like WHERE, but the condition applies to whole groups.
bool_and is an aggregate function that is true only when the condition is true for every record in the group.
I'd say that you want the maximum opt_out_on for each company that only has employees in the given set of states, which means companies that do not have any employee outside that set.
So, translated to SQL:
select company_id, max(opt_out_on)
from employees e
where not exists(
select 1 from employees
where company_id=e.company_id
and opt_state not in ('ceased_membership', 'ignition','opted_out')
)
group by company_id;
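Both approaches can be checked side by side. The sketch below uses Python's sqlite3 module with made-up rows. SQLite has no bool_and, so the HAVING variant here swaps in SUM(CASE ...) = 0 (count the employees outside the set and require zero); that is a portable substitute, not the PostgreSQL query from the answer above.

```python
import sqlite3

# Hypothetical sample data: company 20 still has one opted_in employee.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    company_id INTEGER,
    opt_state TEXT,
    opt_out_on TEXT
);
INSERT INTO employees VALUES
    (1, 10, 'opted_out',         '2020-01-05'),
    (2, 10, 'ceased_membership', '2020-03-01'),
    (3, 20, 'opted_out',         '2020-02-01'),
    (4, 20, 'opted_in',          NULL),
    (5, 30, 'ignition',          '2021-06-30');
""")

# Variant 1: keep a company only if no employee is outside the set.
NOT_EXISTS = """
SELECT company_id, MAX(opt_out_on)
FROM employees e
WHERE NOT EXISTS (
    SELECT 1 FROM employees
    WHERE company_id = e.company_id
      AND opt_state NOT IN ('ceased_membership', 'ignition', 'opted_out')
)
GROUP BY company_id
ORDER BY company_id
"""

# Variant 2: group first, then require zero employees outside the set.
HAVING_SUM = """
SELECT company_id, MAX(opt_out_on)
FROM employees
GROUP BY company_id
HAVING SUM(CASE WHEN opt_state IN ('ceased_membership', 'ignition', 'opted_out')
                THEN 0 ELSE 1 END) = 0
ORDER BY company_id
"""

not_exists_rows = conn.execute(NOT_EXISTS).fetchall()
having_rows = conn.execute(HAVING_SUM).fetchall()
print(not_exists_rows)
print(having_rows)
```

On this data both variants return companies 10 and 30 with their latest opt_out_on, and drop company 20.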

Using "UNION ALL" and "GROUP BY" to implement "Intersect"

I've written the following query to find the records common to two data sets, but it's difficult for me to verify that it is correct because I have a lot of records in my DB.
Is it OK to implement Intersect between "Customers" & "Employees" tables using UNION ALL and apply GROUP BY on the result like below?
SELECT D.Country, D.Region, D.City
FROM (SELECT DISTINCT Country, Region, City
FROM Customers
UNION ALL
SELECT DISTINCT Country, Region, City
FROM Employees) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
So can we say that any record which exists in the result of this query also exists in the Intersect set between "Customers & Employees" tables AND any record that exists in Intersect set between "Customers & Employees" tables will be in the result of this query too?
So is it right to say any record in result of this query is in
"Intersect" set between "Customers & Employees" "AND" any record that
exist in "Intersect" set between "Customers & Employees" is in result
of this query too?
YES.
... Yes, but it won't be as efficient because you are filtering out duplicates three times instead of once. In your query you're:
1. Using DISTINCT to pull unique records from employees
2. Using DISTINCT to pull unique records from customers
3. Combining both queries using UNION ALL
4. Using GROUP BY in your outer query to filter the records you retrieved in steps 1, 2 and 3.
Using INTERSECT will return identical results but more efficiently. To see for yourself you can create the sample data below and run both queries:
use tempdb
go
if object_id('dbo.customers') is not null drop table dbo.customers;
if object_id('dbo.employees') is not null drop table dbo.employees;
create table dbo.customers
(
customerId int identity,
country varchar(50),
region varchar(50),
city varchar(100)
);
create table dbo.employees
(
employeeId int identity,
country varchar(50),
region varchar(50),
city varchar(100)
);
insert dbo.customers(country, region, city)
values ('us', 'N/E', 'New York'), ('us', 'N/W', 'Seattle'),('us', 'Midwest', 'Chicago');
insert dbo.employees
values ('us', 'S/E', 'Miami'), ('us', 'N/W', 'Portland'),('us', 'Midwest', 'Chicago');
Run these queries:
SELECT D.Country, D.Region, D.City
FROM
(
SELECT DISTINCT Country, Region, City
FROM Customers
UNION ALL
SELECT DISTINCT Country, Region, City
FROM Employees
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
SELECT Country, Region, City
FROM dbo.customers
INTERSECT
SELECT Country, Region, City
FROM dbo.employees;
Results:
Country Region City
----------- ---------- ----------
us Midwest Chicago
Country Region City
----------- ---------- ----------
us Midwest Chicago
If using INTERSECT is not an option OR you want a faster query you could improve the query you posted a couple different ways, such as:
Option 1: let GROUP BY handle ALL the de-duplication like this:
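This is the same as what you posted but without the DISTINCTs. Note that it is only equivalent if neither table contains duplicate (Country, Region, City) rows on its own; otherwise COUNT(*) = 2 can match a row that appears twice in one table, or miss one that appears three times in total.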
SELECT D.Country, D.Region, D.City
FROM
(
SELECT Country, Region, City
FROM Customers
UNION ALL
SELECT Country, Region, City
FROM Employees
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
Option 2: Use ROW_NUMBER
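This would be my preference and will likely be most efficient, provided neither table has duplicate (Country, Region, City) rows on its own (otherwise rn = 2 can also match a row that appears twice in just one table)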
SELECT Country, Region, City
FROM
(
SELECT
rn = row_number() over (partition by D.Country, D.Region, D.City order by (SELECT null)),
D.Country, D.Region, D.City
FROM
(
SELECT Country, Region, City
FROM Customers
UNION ALL
SELECT Country, Region, City
FROM Employees
) AS D
) uniquify
WHERE rn = 2;
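Dropping the DISTINCTs changes the result as soon as one table repeats a row. The sketch below (Python's sqlite3, made-up data with 'New York' listed twice in customers and not at all in employees) compares INTERSECT, the UNION ALL query without DISTINCT, and one possible repair that tags each row with its source table and counts distinct sources. The src column and its 'c'/'e' values are names invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (country TEXT, region TEXT, city TEXT);
CREATE TABLE employees (country TEXT, region TEXT, city TEXT);
INSERT INTO customers VALUES
    ('us', 'N/E', 'New York'),
    ('us', 'N/E', 'New York'),
    ('us', 'Midwest', 'Chicago');
INSERT INTO employees VALUES
    ('us', 'S/E', 'Miami'),
    ('us', 'Midwest', 'Chicago');
""")

# Reference result: rows present in both tables.
INTERSECT = """
SELECT country, region, city FROM customers
INTERSECT
SELECT country, region, city FROM employees
"""

# No DISTINCT: the duplicated customers row alone reaches COUNT(*) = 2.
NO_DISTINCT = """
SELECT country, region, city
FROM (SELECT country, region, city FROM customers
      UNION ALL
      SELECT country, region, city FROM employees) AS d
GROUP BY country, region, city
HAVING COUNT(*) = 2
ORDER BY city
"""

# Tag each row with its source and require both sources to be present.
TAGGED = """
SELECT country, region, city
FROM (SELECT 'c' AS src, country, region, city FROM customers
      UNION ALL
      SELECT 'e' AS src, country, region, city FROM employees) AS d
GROUP BY country, region, city
HAVING COUNT(DISTINCT src) = 2
"""

intersect_rows = conn.execute(INTERSECT).fetchall()
no_distinct_rows = conn.execute(NO_DISTINCT).fetchall()
tagged_rows = conn.execute(TAGGED).fetchall()
print(intersect_rows)
print(no_distinct_rows)   # includes the false positive for New York
print(tagged_rows)
```

So the version without DISTINCT is safe only when each table is already free of duplicates on these three columns; the tagged version does not need that assumption.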

Can't solve this SQL query

I have a difficulty dealing with a SQL query. I use PostgreSQL.
The task says: show the customers that have placed at least one order containing products from 3 different categories. The result should have 2 columns: CustomerID and the number of orders. I have written this code but I don't think it's correct.
select SalesOrderHeader.CustomerID,
count(SalesOrderHeader.SalesOrderID) AS amount_of_orders
from SalesOrderHeader
inner join SalesOrderDetail on
(SalesOrderHeader.SalesOrderID=SalesOrderDetail.SalesOrderID)
inner join Product on
(SalesOrderDetail.ProductID=Product.ProductID)
where SalesOrderDetail.SalesOrderDetailID in
(select DISTINCT count(ProductCategoryID)
from Product
group by ProductCategoryID
having count(DISTINCT ProductCategoryID)>=3)
group by SalesOrderHeader.CustomerID;
Here are the database tables needed for the query:
where SalesOrderDetail.SalesOrderDetailID in
(select DISTINCT count(ProductCategoryID)
This is never going to give you a result, as an ID (SalesOrderDetailID) will never logically match a COUNT (count(ProductCategoryID)).
This should get you the output I think you want.
SELECT soh.CustomerID, COUNT(DISTINCT soh.SalesOrderID) AS amount_of_orders
FROM SalesOrderHeader soh
INNER JOIN SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
INNER JOIN Product p ON sod.ProductID = p.ProductID
GROUP BY soh.CustomerID
-- HAVING must follow GROUP BY; note this counts categories across all of a customer's orders
HAVING COUNT(DISTINCT p.ProductCategoryID) >= 3;
Try this:
select CustomerID,count(*) as amount_of_order from
SalesOrderHeader join
(
select SalesOrderID,count(distinct ProductCategoryID) CategoryCount
from SalesOrderDetail JOIN Product using (ProductId)
group by 1
) CatCount using (SalesOrderId)
group by 1
having bool_or(CategoryCount>=3) -- At least one CategoryCount>=3
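The difference between "3 categories within one order" and "3 categories across all of a customer's orders" is easy to miss, so here is a small check using Python's sqlite3 and made-up data. Customer 200 buys from three categories, but never in a single order. SQLite has no bool_or, so the per-order query below uses MAX(CategoryCount) >= 3 instead, which should express the same condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, ProductCategoryID INTEGER);
CREATE TABLE SalesOrderHeader (SalesOrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
CREATE TABLE SalesOrderDetail (
    SalesOrderDetailID INTEGER PRIMARY KEY,
    SalesOrderID INTEGER,
    ProductID INTEGER
);
INSERT INTO Product VALUES (1, 1), (2, 2), (3, 3), (4, 1);
INSERT INTO SalesOrderHeader VALUES (1, 100), (2, 100), (3, 200), (4, 200), (5, 200);
INSERT INTO SalesOrderDetail VALUES
    (1, 1, 1), (2, 1, 2), (3, 1, 3),
    (4, 2, 1),
    (5, 3, 1), (6, 3, 4),
    (7, 4, 2),
    (8, 5, 3);
""")

# Count categories per order first, then keep customers with a qualifying order.
PER_ORDER = """
SELECT soh.CustomerID, COUNT(*) AS amount_of_orders
FROM SalesOrderHeader soh
JOIN (
    SELECT sod.SalesOrderID,
           COUNT(DISTINCT p.ProductCategoryID) AS CategoryCount
    FROM SalesOrderDetail sod
    JOIN Product p ON p.ProductID = sod.ProductID
    GROUP BY sod.SalesOrderID
) cat ON cat.SalesOrderID = soh.SalesOrderID
GROUP BY soh.CustomerID
HAVING MAX(cat.CategoryCount) >= 3
ORDER BY soh.CustomerID
"""

# Counting categories per customer also accepts customer 200.
ACROSS_ORDERS = """
SELECT soh.CustomerID, COUNT(DISTINCT soh.SalesOrderID) AS amount_of_orders
FROM SalesOrderHeader soh
JOIN SalesOrderDetail sod ON sod.SalesOrderID = soh.SalesOrderID
JOIN Product p ON p.ProductID = sod.ProductID
GROUP BY soh.CustomerID
HAVING COUNT(DISTINCT p.ProductCategoryID) >= 3
ORDER BY soh.CustomerID
"""

per_order_rows = conn.execute(PER_ORDER).fetchall()
across_rows = conn.execute(ACROSS_ORDERS).fetchall()
print(per_order_rows)
print(across_rows)
```

Only the per-order version matches the task as worded; which one you want depends on how you read "an order that contains products from 3 different categories".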

Union Select Distinct syntax?

I have a huge table that contains both shipping address information and billing address information. I can get unique shipping and billing addresses in two separate tables with the following:
SELECT DISTINCT ShipToName, ShipToAddress1, ShipToAddress2, ShipToAddress3, ShipToCity, ShipToZipCode
FROM Orders
ORDER BY Orders.ShipToName
SELECT DISTINCT BillToName, BillToAddress1, BillToAddress2, BillToAddress3, BillToCity, BillToZipCode
FROM Orders
ORDER BY Orders.BillToName
How can I get the distinct intersection of the two? I am unsure of the syntax.
Something like this?
SELECT DISTINCT
toname, addr1, addr2, addr3, city, zip
FROM
(SELECT DISTINCT
ShipToName AS toName,
ShipToAddress1 AS addr1,
ShipToAddress2 AS addr2,
ShipToAddress3 AS addr3,
ShipToCity AS city,
ShipToZipCode AS zip
FROM
Orders
UNION ALL
SELECT DISTINCT
BillToName AS toName,
BillToAddress1 AS addr1,
BillToAddress2 AS addr2,
BillToAddress3 AS addr3,
BillToCity AS city,
BillToZipCode AS zip
FROM
Orders) o
ORDER BY ToName
You say "Intersection" but you accepted the Union answer so I guess you just want the UNION DISTINCT. No need for derived tables and the three DISTINCT. You can use the simple:
SELECT
ShipToName AS Name,
ShipToAddress1 AS Address1,
ShipToAddress2 AS Address2,
ShipToAddress3 AS Address3,
ShipToCity AS City,
ShipToZipCode AS ZipCode
FROM
Orders
UNION --- UNION means UNION DISTINCT
SELECT
BillToName,
BillToAddress1,
BillToAddress2,
BillToAddress3,
BillToCity,
BillToZipCode
FROM
Orders
ORDER BY
Name ;
You can join both sets on all fields and this will return the records that match (note that rows where any compared column is NULL, e.g. an empty Address3, will not match with =):
SELECT *
FROM Orders o1
INNER JOIN Orders o2
ON o1.ShipToName = o2.BillToName
AND o1.ShipToAddress1 = o2.BillToAddress1
AND o1.ShipToAddress2 = o2.BillToAddress2
AND o1.ShipToAddress3 = o2.BillToAddress3
AND o1.ShipToCity = o2.BillToCity
AND o1.ShipToZipCode = o2.BillToZipCode
Or you should be able to use INTERSECT:
SELECT ShipToName, ShipToAddress1, ShipToAddress2, ShipToAddress3, ShipToCity, ShipToZipCode
FROM Orders
INTERSECT
SELECT BillToName, BillToAddress1, BillToAddress2, BillToAddress3, BillToCity, BillToZipCode
FROM Orders
Or even a UNION query (UNION removes duplicates between two sets of data):
SELECT ShipToName, ShipToAddress1, ShipToAddress2, ShipToAddress3, ShipToCity, ShipToZipCode
FROM Orders
UNION
SELECT BillToName, BillToAddress1, BillToAddress2, BillToAddress3, BillToCity, BillToZipCode
FROM Orders
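The join and INTERSECT versions are not quite interchangeable when address lines can be NULL. The sketch below uses Python's sqlite3 with a cut-down, hypothetical orders table (two columns per address instead of six) to show the difference. The IS operator in the last query is SQLite's NULL-safe comparison; in other databases the usual spelling is IS NOT DISTINCT FROM, if it is supported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    ship_name TEXT, ship_addr2 TEXT,
    bill_name TEXT, bill_addr2 TEXT
);
INSERT INTO orders VALUES
    ('Ann', NULL,      'Ann', NULL),
    ('Bob', 'Suite 1', 'Cy',  'Suite 9');
""")

# Plain equality: NULL = NULL is not true, so Ann's address never matches.
JOIN_EQ = """
SELECT DISTINCT o1.ship_name, o1.ship_addr2
FROM orders o1
JOIN orders o2
  ON o1.ship_name = o2.bill_name
 AND o1.ship_addr2 = o2.bill_addr2
"""

# Set operators treat NULLs as equal when comparing rows.
INTERSECT = """
SELECT ship_name, ship_addr2 FROM orders
INTERSECT
SELECT bill_name, bill_addr2 FROM orders
"""

# NULL-safe join (SQLite syntax) to get the same rows as INTERSECT.
JOIN_NULL_SAFE = """
SELECT DISTINCT o1.ship_name, o1.ship_addr2
FROM orders o1
JOIN orders o2
  ON o1.ship_name IS o2.bill_name
 AND o1.ship_addr2 IS o2.bill_addr2
"""

join_rows = conn.execute(JOIN_EQ).fetchall()
intersect_rows = conn.execute(INTERSECT).fetchall()
null_safe_rows = conn.execute(JOIN_NULL_SAFE).fetchall()
print(join_rows)
print(intersect_rows)
print(null_safe_rows)
```

With optional columns such as Address2 and Address3, INTERSECT is probably the less surprising choice.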

How do you perform a search on a 1-to-many relationship when the criteria could be on either table?

I am using T-SQL. I have what I thought would be an easy search. There is a 1-to-many relationship between SalesPerson and TradeShow: one salesperson could have gone to many trade shows. I need to be able to search on the SalesPerson. I also need to be able to search on the LAST trade show they attended. I thought I would be able to do a simple join and group on their last trade show, but I cannot display the City or State.
SELECT SalesPerson.SalesPersonID, FirstName, LastName, TradeShow.DateLastWent
FROM SalesPerson INNER JOIN
(SELECT SalesPersonID, MAX(DateLastWent) AS DateLastWent
FROM TradeShow
GROUP BY SalesPersonID) AS TradeShow ON SalesPerson.SalesPersonID = TradeShow.SalesPersonID
This works, but the TradeShow table also has City and State. I need to be able to search on and display city and state. But if I include them in the subquery, I have to wrap them in an aggregate function, and if I do that, I get the incorrect city and state.
The tables are simple:
SALESPERSON
salespersonID PK
firstname
lastname
TRADESHOW
tradeshowID PK
datelastwent
city
state
salespersonID FK
Re-word it: what you want is the salesperson, plus the information from the last show that they have been to.
Select
SalesPerson.SalesPersonID,
FirstName,
LastName,
TradeShow.DateLastWent,
TradeShow.City,
TradeShow.State
From
SalesPerson
Inner Join TradeShow
On SalesPerson.SalesPersonID = TradeShow.SalesPersonID
Where
TradeShow.TradeShowID =
(Select Top 1 Latest.TradeShowID
From TradeShow As Latest
Where SalesPerson.SalesPersonID = Latest.SalesPersonID
Order By Latest.DateLastWent Desc)
You can join TradeShow twice:
SELECT SalesPerson.SalesPersonID, FirstName, LastName, TS1.DateLastWent,
TS2.City, TS2.State
FROM SalesPerson INNER JOIN
(SELECT SalesPersonID, MAX(DateLastWent) AS DateLastWent
FROM TradeShow
GROUP BY SalesPersonID
) AS TS1 ON (SalesPerson.SalesPersonID = TS1.SalesPersonID)
INNER JOIN TradeShow TS2 ON
(TS2.SalesPersonID = TS1.SalesPersonID AND TS2.DateLastWent = TS1.DateLastWent)
WHERE TS2.City = 'CityName'
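The join-twice idea can be tried out without a SQL Server instance. The sketch below uses Python's sqlite3 with made-up salespeople and shows; the helper name latest_show_in is invented for this example. It filters on the city of each person's latest show only, so an older show in that city does not count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesPerson (
    SalesPersonID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT
);
CREATE TABLE TradeShow (
    TradeShowID INTEGER PRIMARY KEY,
    DateLastWent TEXT, City TEXT, State TEXT,
    SalesPersonID INTEGER REFERENCES SalesPerson(SalesPersonID)
);
INSERT INTO SalesPerson VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Kim');
INSERT INTO TradeShow VALUES
    (1, '2019-05-01', 'Austin', 'TX', 1),
    (2, '2021-09-10', 'Denver', 'CO', 1),
    (3, '2020-02-02', 'Denver', 'CO', 2),
    (4, '2022-01-01', 'Boise',  'ID', 2);
""")

# "latest" finds each person's most recent date; joining TradeShow again
# on (person, date) brings back the City and State of that exact show.
QUERY = """
SELECT sp.SalesPersonID, sp.FirstName, sp.LastName,
       ts.DateLastWent, ts.City, ts.State
FROM SalesPerson sp
JOIN (
    SELECT SalesPersonID, MAX(DateLastWent) AS DateLastWent
    FROM TradeShow
    GROUP BY SalesPersonID
) latest ON latest.SalesPersonID = sp.SalesPersonID
JOIN TradeShow ts
  ON ts.SalesPersonID = latest.SalesPersonID
 AND ts.DateLastWent = latest.DateLastWent
WHERE ts.City = ?
ORDER BY sp.SalesPersonID
"""


def latest_show_in(city):
    """Return salespeople whose most recent trade show was in the given city."""
    return conn.execute(QUERY, (city,)).fetchall()


denver = latest_show_in("Denver")
austin = latest_show_in("Austin")
boise = latest_show_in("Boise")
print(denver)
print(austin)
print(boise)
```

If a salesperson has two shows on the same latest date, this approach returns both rows; picking by TradeShowID (as in the Top 1 answer) or by ROW_NUMBER avoids that.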
There is likely a more elegant way to solve this, but my first thought is to simply grab the newest TradeShow record to join with
SELECT SalePersonID, FirstName, LastName, TradeShow.DateLastWent
FROM SalesPerson
INNER JOIN (
SELECT *
FROM (
SELECT TradeShowId, DateLastWent, City, State, SalesPersonId
FROM TradeShow
ORDER BY datelastwent DESC
)
WHERE ROWNUM <= 1
) ON SalesPerson.SalesPersonId = TradeShow.SalesPersonId
Edit
Oops... been playing with Oracle too much.
ROW_NUMBER() OVER(ORDER BY date) or SELECT TOP X
would be the SQL Server way of doing this... I don't have an instance of SQL Server running, but I'm pretty sure the syntax ends up being something like:
SELECT SalesPerson.SalesPersonId, FirstName, LastName, TradeShow.DateLastWent, TradeShow.City, TradeShow.State
FROM SalesPerson
INNER JOIN (
SELECT TradeShowId, DateLastWent, City, State, SalesPersonId, ROW_NUMBER() OVER(PARTITION BY TradeShow.SalesPersonId ORDER BY DateLastWent DESC) RowNumber
FROM TradeShow
) AS TradeShow ON SalesPerson.SalesPersonId = TradeShow.SalesPersonId AND TradeShow.RowNumber = 1