Case When + Join + Group By + Having SQL Query


Please click here for sample tables and description.
I have three tables, PROJECTS, PROJECTMANAGER and MANAGERS, as shown in the attached images.
I need a query that lists the projects managed by managers of type "PRODUCT", grouped by project status.
If STATUS = 2, the project counts as a Completed Project; otherwise it counts as an In Progress Project.
The result table should look like the one shown in the attached image.
Desired Result: http://www.dbasupport.com/forums/attachment.php?attachmentid=588&d=1336691473
The query I need should be generic so that it can be used in any database (MySQL/Oracle/MSSQL/DB2).
Please help.
BTW, this is not homework!
I used sample tables.
I have tried CASE WHEN statements, but I don't know how to join and use GROUP BY at the same time.
CREATE TABLE PROJECTS
(
PROJECT_ID varchar(20),
PROJECT_NAME varchar(30),
STATUS int
);
CREATE TABLE PROJECTMANAGER
(
PROJECT_ID varchar(20),
MANAGER_ID varchar(20)
);
CREATE TABLE MANAGERS
(
MANAGER_ID varchar(20),
MANAGER_NAME varchar(20),
TYPE varchar(20)
);
INSERT INTO PROJECTS (PROJECT_ID, PROJECT_NAME, STATUS) VALUES
('project_001', 'Project 001', 0),
('project_002', 'Project 002', 1),
('project_003', 'Project 003', 2),
('project_004', 'Project 004', 0),
('project_005', 'Project 005', 2),
('project_006', 'Project 006', 0),
('project_007', 'Project 007', 1);
INSERT INTO PROJECTMANAGER (PROJECT_ID , MANAGER_ID) VALUES
('project_001', 'mgr_001'),
('project_002', 'mgr_001'),
('project_001', 'mgr_002'),
('project_002', 'mgr_003'),
('project_001', 'mgr_003'),
('project_005', 'mgr_001'),
('project_004', 'mgr_002');
INSERT INTO MANAGERS (MANAGER_ID, MANAGER_NAME, TYPE) VALUES
('mgr_001', 'Manager 001', 'PRODUCT'),
('mgr_002', 'Manager 002', 'HR'),
('mgr_003', 'Manager 003', 'PRODUCT'),
('mgr_004', 'Manager 004', 'FINANCE'),
('mgr_005', 'Manager 005', 'PRODUCT');
Resulting table:
MANAGER_ID | MANAGER_NAME | COMPLETED_PROJECTS | IN_PROGRESS_PROJECTS |
mgr_001 | Manager 001 | 1 | 2 |
mgr_003 | Manager 003 | 0 | 1 |
mgr_005 | Manager 005 | 0 | 0 |

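Try something like this:
SELECT
m.manager_id,
m.manager_name,
SUM(CASE WHEN p.status = 2 THEN 1 ELSE 0 END) AS completed,
SUM(CASE WHEN p.status <> 2 THEN 1 ELSE 0 END) AS in_progress
FROM managers m
LEFT JOIN projectmanager pm ON (m.manager_id = pm.manager_id)
LEFT JOIN projects p ON (p.project_id = pm.project_id)
-- match the stored value exactly: string comparison is case-sensitive in several databases
WHERE m.type = 'PRODUCT'
-- group by every non-aggregated column, as Oracle, MSSQL and DB2 require
GROUP BY m.manager_id, m.manager_name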
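To check the query against the sample rows from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's `sqlite3` module. SQLite is only a stand-in engine here (an assumption, since the question targets MySQL/Oracle/MSSQL/DB2), and the query is written with `<>`, the exact `'PRODUCT'` literal and both non-aggregated columns in the GROUP BY so that it stays portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (project_id TEXT, project_name TEXT, status INTEGER);
CREATE TABLE projectmanager (project_id TEXT, manager_id TEXT);
CREATE TABLE managers (manager_id TEXT, manager_name TEXT, type TEXT);
INSERT INTO projects VALUES
  ('project_001','Project 001',0),('project_002','Project 002',1),
  ('project_003','Project 003',2),('project_004','Project 004',0),
  ('project_005','Project 005',2),('project_006','Project 006',0),
  ('project_007','Project 007',1);
INSERT INTO projectmanager VALUES
  ('project_001','mgr_001'),('project_002','mgr_001'),('project_001','mgr_002'),
  ('project_002','mgr_003'),('project_001','mgr_003'),('project_005','mgr_001'),
  ('project_004','mgr_002');
INSERT INTO managers VALUES
  ('mgr_001','Manager 001','PRODUCT'),('mgr_002','Manager 002','HR'),
  ('mgr_003','Manager 003','PRODUCT'),('mgr_004','Manager 004','FINANCE'),
  ('mgr_005','Manager 005','PRODUCT');
""")

QUERY = """
SELECT m.manager_id, m.manager_name,
       SUM(CASE WHEN p.status = 2 THEN 1 ELSE 0 END)  AS completed_projects,
       SUM(CASE WHEN p.status <> 2 THEN 1 ELSE 0 END) AS in_progress_projects
FROM managers m
LEFT JOIN projectmanager pm ON m.manager_id = pm.manager_id
LEFT JOIN projects p ON p.project_id = pm.project_id
WHERE m.type = 'PRODUCT'
GROUP BY m.manager_id, m.manager_name
ORDER BY m.manager_id
"""

# The LEFT JOINs keep mgr_005, who has no projects; its NULL status falls
# through to ELSE 0 in both CASE expressions.
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With the sample rows exactly as posted, mgr_003 is linked to two projects that are not completed (project_001 and project_002), so this sketch reports 2 in-progress projects for that manager rather than the 1 shown in the question's result table.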

Related

Postgres: Query for list of ids in a mapping table and create If they don't exist

Assume we have the following table whose purpose is to autogenerate a numeric id for distinct (name, location) tuples:
CREATE TABLE mapping
(
id bigserial PRIMARY KEY,
name text NOT NULL,
location text NOT NULL
);
CREATE UNIQUE INDEX idx_name_loc ON mapping (name, location);
What is the most efficient way to query for a set of (name, location) tuples and autocreate any mappings that don't already exist, with all mappings (including the ones we created) being returned to the user?
My naive implementation would be something like:
SELECT id, name, location
FROM mapping
WHERE (name, location) IN ((name_1, location_1)...(name_n, location_n))
Then do something with the results in a programming language of my choice to work out which results are missing.
INSERT
INTO mapping (name, location)
VALUES (missing_name_1, missing_loc_1), ... (missing_name_2, missing_loc_2)
ON CONFLICT DO NOTHING
This gets the job done, but I get the feeling there's probably something that a) can be done in pure SQL and b) is more efficient.

You can use DISTINCT to get all possible values for the two columns, and CROSS JOIN to get their Cartesian product.
Then LEFT JOIN with the original table to get the actual records (if any):
CREATE TABLE mapping
( id bigserial PRIMARY KEY
, name text NOT NULL
, location text NOT NULL
, UNIQUE (name, location)
);
INSERT INTO mapping(name, location) VALUES ('Alice', 'kitchen'), ('Bob', 'bedroom' );
SELECT * FROM mapping;
SELECT n.name, l.location, m.id
FROM (SELECT DISTINCT name from mapping) n
CROSS JOIN (SELECT DISTINCT location from mapping) l
LEFT JOIN mapping m ON m.name = n.name AND m.location = l.location
;
Results:
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 2
id | name | location
----+-------+----------
1 | Alice | kitchen
2 | Bob | bedroom
(2 rows)
name | location | id
-------+----------+----
Alice | kitchen | 1
Alice | bedroom |
Bob | kitchen |
Bob | bedroom | 2
(4 rows)
And if you want to physically INSERT the missing combinations:
INSERT INTO mapping(name, location)
SELECT n.name, l.location
FROM (SELECT DISTINCT name from mapping) n
CROSS JOIN (SELECT DISTINCT location from mapping) l
WHERE NOT EXISTS(
SELECT *
FROM mapping m
WHERE m.name = n.name AND m.location = l.location
)
;
SELECT * FROM mapping;
INSERT 0 2
id | name | location
----+-------+----------
1 | Alice | kitchen
2 | Bob | bedroom
3 | Alice | bedroom
4 | Bob | kitchen
(4 rows)
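For the lookup described in the question (a specific set of requested tuples rather than every name/location combination), a rough sketch of the insert-then-select idea follows. It assumes SQLite through Python's `sqlite3` module as a stand-in for Postgres; the helper name `get_or_create` and the `wanted` CTE are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mapping (
    id INTEGER PRIMARY KEY,          -- stands in for bigserial
    name TEXT NOT NULL,
    location TEXT NOT NULL,
    UNIQUE (name, location)
);
INSERT INTO mapping (name, location) VALUES ('Alice', 'kitchen'), ('Bob', 'bedroom');
""")


def get_or_create(conn, pairs):
    """Insert any missing (name, location) pairs, then return all requested rows."""
    # Step 1: the unique constraint silently rejects pairs that already exist.
    conn.executemany(
        "INSERT INTO mapping (name, location) VALUES (?, ?) ON CONFLICT DO NOTHING",
        pairs,
    )
    # Step 2: read back every requested pair, old and new, in one query.
    placeholders = ", ".join(["(?, ?)"] * len(pairs))
    flat = [value for pair in pairs for value in pair]
    sql = f"""
        WITH wanted(name, location) AS (VALUES {placeholders})
        SELECT m.id, m.name, m.location
        FROM mapping m
        JOIN wanted w ON m.name = w.name AND m.location = w.location
        ORDER BY m.id
    """
    return conn.execute(sql, flat).fetchall()


result = get_or_create(conn, [("Alice", "kitchen"), ("Carol", "garden")])
print(result)
```

Calling the helper again with the same pairs returns the same ids, because the second insert hits the unique constraint and does nothing.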

Count with group by on Postgresql

I have a postgresql type and a table
CREATE TYPE mem_status AS ENUM('waiting', 'active', 'expired');
CREATE TABLE mems (
id BIGSERIAL PRIMARY KEY,
status mem_status NOT NULL
);
Dataset:
INSERT INTO mems(id, status) VALUES
(1, 'active'), (2, 'active'), (3, 'expired');
I want to query counts grouped by status, so I tried the query below.
WITH mem_statuses AS (
SELECT unnest(enum_range(NULL::mem_status)) AS status
)
SELECT m.status, count(1)
FROM mems m
RIGHT JOIN mem_statuses ms ON ms.status = m.status
GROUP BY m.status;
But if there are no waiting mems, the result looks like this:
status | count
================
NULL | 1 <- problem
'active' | 2
'expired' | 1
I want to get result like this.
status | count
================
'waiting' | 0
'active' | 2
'expired' | 1
How can I do that?

Use count(id):
WITH mem_statuses AS (
SELECT unnest(enum_range(NULL::mem_status)) AS status
)
SELECT ms.status, count(id)
FROM mems m
RIGHT JOIN mem_statuses ms ON ms.status = m.status
GROUP BY ms.status;
or:
select status, count(id)
from unnest(enum_range(null::mem_status)) as status
left join mems using(status)
group by status
status | count
---------+-------
waiting | 0
active | 2
expired | 1
(3 rows)
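Per the documentation, count(expression) gives the
"number of input rows for which the value of expression is not null",
so the unmatched 'waiting' row, whose id is NULL after the outer join, contributes 0 instead of 1.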
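The difference between counting rows and counting a nullable column can be seen side by side in a small sketch. It assumes SQLite through Python's `sqlite3` module; because SQLite has no enum types, a hypothetical `statuses` lookup table plays the role of `unnest(enum_range(...))`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE statuses (status TEXT PRIMARY KEY);   -- stand-in for the enum values
CREATE TABLE mems (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
INSERT INTO statuses VALUES ('waiting'), ('active'), ('expired');
INSERT INTO mems VALUES (1, 'active'), (2, 'active'), (3, 'expired');
""")

# COUNT(*) counts joined rows, including the NULL-extended row produced for
# 'waiting'; COUNT(m.id) skips that row because m.id is NULL there.
rows = conn.execute("""
SELECT s.status, COUNT(*) AS count_star, COUNT(m.id) AS count_id
FROM statuses s
LEFT JOIN mems m ON m.status = s.status
GROUP BY s.status
ORDER BY s.status
""").fetchall()

for row in rows:
    print(row)
```

Grouping by the column from the enum side (`s.status` here, `ms.status` in the answer) matters as well: grouping by `m.status` would label the empty group NULL.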

You need to modify the join and the aggregate a bit:
select ms.status, count(m.status)
from (select unnest(enum_range(null::mem_status))) as ms(status)
left join mems as m
on ms.status = m.status
group by ms.status;

Optimising T-SQL reporting performance

I have the table below. I need to delete opposite rows between two dates, in pairs, based on the PerCode value.
In other words, we delete rows inside the date range that have the same PerCode and equal and opposite values.
The problem is that the begin date and end date are provided by users as report parameters, and the query takes too much time if I try to delete these rows at runtime.
Example:
Begin date = 01/01/2018
End date = 31/12/2018
I should delete rows 3 and 4.
Do you have any idea how to do that while optimising performance (the table has 200 million rows)?
+----+------------+---------+---------+-----------+
| Id | Date | PerCode | Value | IsDeleted |
+----+------------+---------+---------+-----------+
| 1 | 01/10/2017 | C1 | 10 | |
| 2 | 01/01/2018 | C1 | -10 | |
| 3 | 15/02/2018 | C2 | 20 | 1 |
| 4 | 10/03/2018 | C2 | -20 | 1 |
| 5 | 01/12/2018 | C3 | 15 | |
| 6 | 01/02/2019 | C3 | -15 | |
+----+------------+---------+---------+-----------+

I had a quick go at this, using a table variable so I could knock together a query using your test data. However, this might not perform well when used over 200 million rows.
DECLARE @table TABLE (id INT, [date] DATE, percode CHAR(2), [value] INT, isdeleted BIT);
INSERT INTO @table
SELECT 1, '20171001', 'C1', 10, NULL
UNION ALL
SELECT 2, '20180101', 'C1', -10, NULL
UNION ALL
SELECT 3, '20180215', 'C2', 20, NULL
UNION ALL
SELECT 4, '20180310', 'C2', -20, NULL
UNION ALL
SELECT 5, '20181201', 'C3', 15, NULL
UNION ALL
SELECT 6, '20190201', 'C3', -15, NULL;
DECLARE @date_from DATE = '20180101';
DECLARE @date_to DATE = '20181231';
WITH ordered AS (
SELECT
id,
percode,
[value],
ROW_NUMBER() OVER (PARTITION BY percode, [value] ORDER BY [value]) AS order_id
FROM
@table
WHERE
[date] BETWEEN @date_from AND @date_to
AND ISNULL(isdeleted, 0) != 1),
matches AS (
SELECT
m1.id AS match_1_id,
m2.id AS match_2_id
FROM
ordered m1
INNER JOIN ordered m2 ON m1.percode = m2.percode AND m1.[value] = m2.[value] * -1 AND m1.order_id = m2.order_id)
UPDATE
t
SET
isdeleted = 1
FROM
@table t
INNER JOIN matches m ON m.match_1_id = t.id OR m.match_2_id = t.id;
SELECT * FROM @table;
Results:
id date percode value isdeleted
1 2017-10-01 C1 10 NULL
2 2018-01-01 C1 -10 NULL
3 2018-02-15 C2 20 1
4 2018-03-10 C2 -20 1
5 2018-12-01 C3 15 NULL
6 2019-02-01 C3 -15 NULL
How does it work? Well I broke the task down into steps:
make a list of all rows in the date period specified, where they aren't already deleted;
for each row of data assign it a running count number, grouped by the percode and the value. So the first C1 10 would be number #1, then the second C1 10 would be number #2, etc.;
to find matches it's simply a case of finding any value that has the same percode, the equal and opposite value to another value group, and the same running count number;
where there's a match set the isdeleted flag to 1.
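The four steps above can be sketched outside SQL Server as well. The version below is an illustration only: it assumes SQLite (window functions require SQLite 3.25 or later) through Python's `sqlite3` module, a hypothetical table name `ledger`, ISO-formatted date strings, and `ORDER BY id` inside the window so the numbering is deterministic. It says nothing about performance on 200 million rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ledger (
    id INTEGER PRIMARY KEY, date TEXT, percode TEXT, value INTEGER, isdeleted INTEGER
);
INSERT INTO ledger VALUES
  (1, '2017-10-01', 'C1',  10, NULL), (2, '2018-01-01', 'C1', -10, NULL),
  (3, '2018-02-15', 'C2',  20, NULL), (4, '2018-03-10', 'C2', -20, NULL),
  (5, '2018-12-01', 'C3',  15, NULL), (6, '2019-02-01', 'C3', -15, NULL);
""")

conn.execute("""
WITH ordered AS (
    -- steps 1 and 2: rows in range, not yet deleted, numbered per (percode, value)
    SELECT id, percode, value,
           ROW_NUMBER() OVER (PARTITION BY percode, value ORDER BY id) AS order_id
    FROM ledger
    WHERE date BETWEEN :date_from AND :date_to
      AND COALESCE(isdeleted, 0) <> 1
),
matches AS (
    -- step 3: same percode, opposite value, same running number
    SELECT m1.id AS match_1_id, m2.id AS match_2_id
    FROM ordered m1
    JOIN ordered m2
      ON m1.percode = m2.percode
     AND m1.value = -m2.value
     AND m1.order_id = m2.order_id
)
-- step 4: flag both sides of every matched pair
UPDATE ledger
SET isdeleted = 1
WHERE id IN (SELECT match_1_id FROM matches UNION SELECT match_2_id FROM matches)
""", {"date_from": "2018-01-01", "date_to": "2018-12-31"})

flagged = [r[0] for r in conn.execute(
    "SELECT id FROM ledger WHERE isdeleted = 1 ORDER BY id")]
print(flagged)
```

Rows 2 and 5 fall inside the date range but stay unflagged, because their opposite rows (1 and 6) lie outside it.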

Here is my code, but it does not perform well enough over 200 million rows in real time.
In real life, PerCode is a concatenation of 5 columns (date, varchar(13), varchar(2), varchar(1) and varchar(50)) and Value is 4 numeric columns.
I am looking for other ideas.
--DECLARE @table TABLE (id INT, [date] DATE, percode CHAR(2), [value] INT, isdeleted BIT);
Select * INTO #MasterTable FROM
(
SELECT 1 id, '20171001' [date], 'C1' percode, 10 [value], NULL isdeleted
UNION ALL
SELECT 2, '20180101', 'C1', -10, NULL
UNION ALL
SELECT 3, '20180215', 'C2', 20, NULL
UNION ALL
SELECT 4, '20180310', 'C2', -20, NULL
UNION ALL
SELECT 5, '20181201', 'C3', 15, NULL
UNION ALL
SELECT 6, '20190201', 'C3', -15, NULL
) T ;
DECLARE @date_from DATE = '20180101';
DECLARE @date_to DATE = '20181231';
select F.id
Into #TmpTable
from
(
select Id, PerCode, Value
,ROW_NUMBER() over (partition by PerCode, Value order by (select 0)) Rn2
from
#MasterTable ) F
inner join (
select
PerCode
, Rn1
from (
select
PerCode
,Value
,ROW_NUMBER() over (partition by PerCode, Value order by (select 0)) Rn1
FROM #MasterTable
where
[date] BETWEEN @date_from AND @date_to
) A
group by PerCode , Rn1
having sum(Value) = 0 and count(*)>1
) B on F.PerCode = B.PerCode
and F.Rn2 = B.Rn1
update R
set IsDeleted = 1
from #MasterTable R
inner join #TmpTable P
on R.id = P.id
select * from #MasterTable
drop table #MasterTable ;
drop table #TmpTable;

TSQL query to return values from a table where there are multiple rows with same ID into a single row but each unique value in a different column

I'm trying to return values from a table so that I get 1 row per purchaseID and return multiple columns with Buyers First and Last Names.
E.g., I have a table with the following data:
| PurchaseID | FirstName | LastName |
|------------|-----------|----------|
| 1          | Joe       | Smith    |
| 1          | Peter     | Pan      |
| 2          | Max       | Power    |
| 2          | Jack      | Frost    |
I'm trying to write a query that returns the values like so:
| PurchaseID | Buyer1FirstName | Buyer1LastName | Buyer2FirstName | Buyer2LastName |
|------------|-----------------|----------------|-----------------|----------------|
| 1          | Joe             | Smith          | Peter           | Pan            |
| 2          | Max             | Power          | Jack            | Frost          |
I've been looking online but because I'm not sure how to explain in words what I want to do, I'm not having much luck. I'm hoping with a more visual explanation someone could point me in the right direction.
Any help would be awesome.

You can use ROW_NUMBER as shown below:
DECLARE @Tbl TABLE (PurchaseID INT, FirstName VARCHAR(50), LastName VARCHAR(50))
INSERT INTO @Tbl
VALUES
(1, 'Joe', 'Smith'),
(1, 'Peter', 'Pan'),
(2, 'Max', 'Power'),
(2, 'Jack', 'Frost'),
(2, 'Opss', 'Sspo')
;WITH CTE
AS
(
SELECT
*, ROW_NUMBER() OVER (PARTITION BY PurchaseID ORDER BY PurchaseID) RowId
FROM @Tbl
)
SELECT
A.PurchaseID,
MIN(CASE WHEN A.RowId = 1 THEN A.FirstName END) Buyer1FirstName,
MIN(CASE WHEN A.RowId = 1 THEN A.LastName END ) Buyer1LastName ,
MIN(CASE WHEN A.RowId = 2 THEN A.FirstName END) Buyer2FirstName ,
MIN(CASE WHEN A.RowId = 2 THEN A.LastName END )Buyer2LastName,
MIN(CASE WHEN A.RowId = 3 THEN A.FirstName END) Buyer3FirstName ,
MIN(CASE WHEN A.RowId = 3 THEN A.LastName END )Buyer3LastName,
MIN(CASE WHEN A.RowId = 4 THEN A.FirstName END) Buyer4FirstName ,
MIN(CASE WHEN A.RowId = 4 THEN A.LastName END )Buyer4LastName
FROM
CTE A
GROUP BY
A.PurchaseID
Result:
PurchaseID  Buyer1FirstName  Buyer1LastName  Buyer2FirstName  Buyer2LastName  Buyer3FirstName  Buyer3LastName  Buyer4FirstName  Buyer4LastName
----------  ---------------  --------------  ---------------  --------------  ---------------  --------------  ---------------  --------------
1           Joe              Smith           Peter            Pan             NULL             NULL            NULL             NULL
2           Max              Power           Jack             Frost           Opss             Sspo            NULL             NULL
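The same number-then-pivot idea, reduced to two buyers per purchase, looks like this as a runnable sketch. It assumes SQLite (3.25 or later, for window functions) through Python's `sqlite3` module, a hypothetical `buyers` table, and an added surrogate `id` column so that "buyer 1" and "buyer 2" are assigned in a repeatable order; ordering the window by the partition column alone leaves that order up to the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE buyers (
    id INTEGER PRIMARY KEY,      -- assumed surrogate key, used only for ordering
    purchase_id INTEGER, first_name TEXT, last_name TEXT
);
INSERT INTO buyers (purchase_id, first_name, last_name) VALUES
  (1, 'Joe', 'Smith'), (1, 'Peter', 'Pan'),
  (2, 'Max', 'Power'), (2, 'Jack', 'Frost');
""")

rows = conn.execute("""
WITH numbered AS (
    -- number the buyers within each purchase
    SELECT purchase_id, first_name, last_name,
           ROW_NUMBER() OVER (PARTITION BY purchase_id ORDER BY id) AS rn
    FROM buyers
)
-- each CASE yields a value for exactly one row per group and NULL otherwise,
-- so MIN simply picks that single non-NULL value
SELECT purchase_id,
       MIN(CASE WHEN rn = 1 THEN first_name END) AS buyer1_first_name,
       MIN(CASE WHEN rn = 1 THEN last_name  END) AS buyer1_last_name,
       MIN(CASE WHEN rn = 2 THEN first_name END) AS buyer2_first_name,
       MIN(CASE WHEN rn = 2 THEN last_name  END) AS buyer2_last_name
FROM numbered
GROUP BY purchase_id
ORDER BY purchase_id
""").fetchall()

for row in rows:
    print(row)
```

The number of buyer columns is fixed by the query text, so a purchase with more buyers than there are column pairs would lose the extra buyers.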

PostgreSQL UNION don't merge lines properly

I have 3 tables in a PostgreSQL database:
localities (loc, 12561 rows)
plants (pl, 17052 rows)
specimens or samples (esp, 9211 rows)
pl and esp each have a field loc, to specify where that tagged plant lives, or where that sample (usually a branch with leaves and flowers) came from.
I need a report of the places that have plants or samples, and the number of plants and samples in each place. The best I did up to now is the union of two subqueries, that runs very fast (33 ms to fetch 69 rows):
(select l.id,l.nome,count(pl.id) pls,null esps
from loc l
left join pl on pl.loc = l.id
where l.id in
(select distinct pl.loc
from pl
where pl.loc > 0)
group by l.id,l.nome
union
select l.id,l.nome,null pls,count(e.id) esps
from loc l
left join esp e on e.loc = l.id
where l.id in
(select distinct e.loc
from esp e
where e.loc > 0)
group by l.id,l.nome)
order by id
The point is, when the same place has both plants and samples, it becomes two distinct lines, like:
11950 | San Martin | | 5 |
11950 | San Martin | 61 | |
Of course what I want is:
11950 | San Martin | 61 | 5 |
Before that, I have tried doing all in one query:
select l.id,l.nome,count(pl.id),count(e.id) esps
from loc l
left join pl on pl.loc = l.id
left join esp e on e.loc = l.id
where l.id in
(select distinct pl.loc
from pl
where pl.loc > 0)
or l.id in
(select distinct e.loc
from esp e
where e.loc > 0)
group by l.id,l.nome
but it returns a strange repetition (it's multiplying both results and showing the result twice):
11950 | San Martin | 305 | 305 |
I have tried without subqueries, but it was taking about 13 seconds, which is too long.

I created test layout with:
create table localities (id integer, loc_name text);
create table plants (plant_id integer, loc_id integer);
create table samples (sample_id integer, loc_id integer);
insert into localities select x, ('Loc ' || x::text) from generate_series(1, 12561) x ;
insert into plants select x, (random()*12561)::integer from generate_series(1, 17052) x;
insert into samples select x, (random()*12561)::integer from generate_series(1, 9211) x;
The trick is to build an intermediate result from plants and samples that share the same structure. Where a column doesn't apply (a plant has no sample_id), you fill it with null:
select loc_id, plant_id, null as sample_id from plants
union all
select loc_id, null as plant_id, sample_id from samples
This table has unified structure and you can then aggregate on it (I'm using WITH to make it a bit more readable.):
with localities_used as (
select loc_id, plant_id, null as sample_id from plants
union all
select loc_id, null as plant_id, sample_id from samples)
select
localities_used.loc_id,
count(localities_used.plant_id) plant_count,
count(localities_used.sample_id) sample_count
from
localities_used
group by
localities_used.loc_id;
If you need additional data from localities, you can join them on the aggregated table:
with localities_used as (
select loc_id, plant_id, null as sample_id from plants
union all
select loc_id, null as plant_id, sample_id from samples),
aggregated as (
select
localities_used.loc_id,
count(localities_used.plant_id) plant_count,
count(localities_used.sample_id) sample_count
from
localities_used
group by
localities_used.loc_id)
select * from aggregated left outer join localities on aggregated.loc_id = localities.id;
This takes 75ms on my laptop all together.
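A compact way to try the UNION ALL approach on a handful of rows is sketched below. It assumes SQLite through Python's `sqlite3` module rather than PostgreSQL, and a few made-up rows in place of the random test data, so the counts can be checked by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE localities (id INTEGER PRIMARY KEY, loc_name TEXT);
CREATE TABLE plants (plant_id INTEGER PRIMARY KEY, loc_id INTEGER);
CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, loc_id INTEGER);
INSERT INTO localities VALUES (1, 'San Martin'), (2, 'Rome'), (3, 'Ghost Town');
INSERT INTO plants VALUES (1, 1), (2, 1), (3, 1), (4, 2);
INSERT INTO samples VALUES (1, 1), (2, 1);
""")

rows = conn.execute("""
WITH localities_used AS (
    -- one row per plant or sample; the column that does not apply is NULL
    SELECT loc_id, plant_id, NULL AS sample_id FROM plants
    UNION ALL
    SELECT loc_id, NULL AS plant_id, sample_id FROM samples
)
-- COUNT(column) ignores NULLs, so each column counts only its own kind of row
SELECT l.id, l.loc_name,
       COUNT(u.plant_id)  AS plant_count,
       COUNT(u.sample_id) AS sample_count
FROM localities_used u
JOIN localities l ON l.id = u.loc_id
GROUP BY l.id, l.loc_name
ORDER BY l.id
""").fetchall()

for row in rows:
    print(row)
```

Localities with neither plants nor samples ('Ghost Town' here) never appear in `localities_used`, so they drop out without any extra filter.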

This should be as easy as
select * from (
select
location.*,
(select count(id) from plant where plant.location = location.id) as plants,
(select count(id) from sample where sample.location = location.id) as samples
from location
) subquery
where subquery.plants > 0 or subquery.samples > 0;
id | name | plants | samples
----+------------+--------+---------
1 | San Martin | 2 | 1
2 | Rome | 1 | 2
3 | Dallas | 3 | 1
(3 rows)
This is the database I quickly set up to experiment with:
create table location(id serial primary key, name text);
create table plant(id serial primary key, name text, location integer references location(id));
create table sample(id serial primary key, name text, location integer references location(id));
insert into location (name) values ('San Martin'), ('Rome'), ('Dallas'), ('Ghost Town');
insert into plant (name, location) values ('San Martin Dandelion', 1),('San Martin Camomile', 1), ('Rome Raspberry', 2), ('Dallas Locoweed', 3), ('Dallas Lemongrass', 3), ('Dallas Setaria', 3);
insert into sample (name, location) values ('San Martin Bramble', 1), ('Rome Iris', 2), ('Rome Eucalypt', 2), ('Dallas Dogbane', 3);
tests=# select * from location;
id | name
----+------------
1 | San Martin
2 | Rome
3 | Dallas
4 | Ghost Town
(4 rows)
tests=# select * from plant;
id | name | location
----+----------------------+----------
1 | San Martin Dandelion | 1
2 | San Martin Camomile | 1
3 | Rome Raspberry | 2
4 | Dallas Locoweed | 3
5 | Dallas Lemongrass | 3
6 | Dallas Setaria | 3
(6 rows)
tests=# select * from sample;
id | name | location
----+--------------------+----------
1 | San Martin Bramble | 1
2 | Rome Iris | 2
3 | Rome Eucalypt | 2
4 | Dallas Dogbane | 3
(4 rows)

I didn't test this, but I think it could be something like this:
SELECT
l.id,
l.nome,
COUNT(DISTINCT pl.id) AS plants_count,
COUNT(DISTINCT e.id) AS esp_count
FROM loc l
LEFT JOIN pl ON pl.loc = l.id
LEFT JOIN esp e ON e.loc = l.id
GROUP BY l.id, l.nome
The point is to count the distinct non-null ids of each type. Joining both pl and esp to the same locality pairs every plant with every sample, so counting plain rows would return the product of the two counts (61 * 5 = 305 in the question's example).
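Why joining two child tables to one parent inflates the counts, and how counting distinct ids avoids it, can be checked with a small sketch. It assumes SQLite through Python's `sqlite3` module and a few made-up rows that reuse the question's table names (`loc`, `pl`, `esp`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE loc (id INTEGER PRIMARY KEY, nome TEXT);
CREATE TABLE pl  (id INTEGER PRIMARY KEY, loc INTEGER);
CREATE TABLE esp (id INTEGER PRIMARY KEY, loc INTEGER);
INSERT INTO loc VALUES (1, 'San Martin'), (2, 'Rome');
INSERT INTO pl  VALUES (1, 1), (2, 1), (3, 1), (4, 2);   -- 3 plants in loc 1
INSERT INTO esp VALUES (1, 1), (2, 1);                   -- 2 samples in loc 1
""")

JOINS = """
FROM loc l
LEFT JOIN pl    ON pl.loc = l.id
LEFT JOIN esp e ON e.loc  = l.id
GROUP BY l.id
ORDER BY l.id
"""

# Loc 1 produces 3 * 2 = 6 joined rows, so plain counts both report 6.
naive = conn.execute(
    "SELECT l.id, COUNT(pl.id), COUNT(e.id) " + JOINS).fetchall()

# Counting distinct ids collapses the repeated pairings back to 3 and 2.
distinct = conn.execute(
    "SELECT l.id, COUNT(DISTINCT pl.id), COUNT(DISTINCT e.id) " + JOINS).fetchall()

print(naive)
print(distinct)
```

COUNT(DISTINCT ...) still has to process the multiplied rows, so on large tables aggregating each child table separately may be cheaper; that trade-off is not measured here.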
