SQL Subquery for each - postgresql

I have the following tables:
create table players
(
name varchar(30) not null primary key
);
create table injuries
(
bId int not null primary key,
date DATE not null,
name varchar(30),
foreign key(name) references players
);
create table sportsBegins
(
cId int not null primary key,
date DATE,
sportname varchar(20),
name varchar(30),
foreign key(name) references players
);
Following example data:
players
name
John
Jane
George
shows players in db
sportsBegins
cId | date | sportname | name
1 2020-01-01 Basketball John
2 2020-02-02 Basketball John
3 2020-01-01 Soccer John
4 2020-02-02 Basketball Jane
5 2020-01-03 Basketball George
6 2020-01-04 Badminton George
shows what date players begin playing a sport
injuries
bId | date | name
1 2020-01-01 John
2 2020-02-03 Jane
3 2020-01-05 George
shows the date these players reported injuries.
I want to count the number of DISTINCT players that have experienced an injury in Basketball AFTER the first day they got assigned the sport (not the same day).
So for each player, I need to grab only the first date they started playing basketball. Then, for that player, I need to compare his name AND date to the name AND date in the injuries table to see if he ever reported an injury after the date he got the sport assigned.
Example
In the example data I provided this would be the output
Total basketball injuries
2
Explanation of answer
John got assigned basketball twice. Only look at first date he got assigned basketball. Then look at injuries table. He only reported an injury on that day, but never after, so ignore. Jane and George reported injuries after first day assigned basketball so count them

This should get you the desired result
SELECT count(distinct injuries.name)
FROM injuries
INNER JOIN (
SELECT name, min(date) as startDate
FROM sportsBegins
WHERE sportname = 'Basketball'
GROUP BY name
) as startDates
ON injuries.name = startDates.name
AND injuries.date > startDates.startDate
Quick explanation:
startDates extracts the first date each player started playing basketball
the join condition filters only injuries which happened after the first start date for each player
count(distinct injuries.name) ensures each player only gets counted once even if he/she reported more than one injury after the first start date
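As a quick check, the query can be run against the question's sample rows. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only to keep the example self-contained); dates are stored as ISO strings, which compare correctly as text.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sportsBegins (cId INTEGER PRIMARY KEY, date TEXT, sportname TEXT, name TEXT);
    CREATE TABLE injuries (bId INTEGER PRIMARY KEY, date TEXT, name TEXT);
    INSERT INTO sportsBegins VALUES
        (1, '2020-01-01', 'Basketball', 'John'),
        (2, '2020-02-02', 'Basketball', 'John'),
        (3, '2020-01-01', 'Soccer', 'John'),
        (4, '2020-02-02', 'Basketball', 'Jane'),
        (5, '2020-01-03', 'Basketball', 'George'),
        (6, '2020-01-04', 'Badminton', 'George');
    INSERT INTO injuries VALUES
        (1, '2020-01-01', 'John'),
        (2, '2020-02-03', 'Jane'),
        (3, '2020-01-05', 'George');
""")

query = """
    SELECT count(DISTINCT injuries.name)
    FROM injuries
    INNER JOIN (
        -- first basketball start date per player
        SELECT name, min(date) AS startDate
        FROM sportsBegins
        WHERE sportname = 'Basketball'
        GROUP BY name
    ) AS startDates
      ON injuries.name = startDates.name
     AND injuries.date > startDates.startDate  -- strictly after, not the same day
"""
total = conn.execute(query).fetchone()[0]
print(total)  # 2 (Jane and George; John's only injury is on his start date)
```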

Related

How do I produce a report to show the number of occurrences an employee has been absent from work

I have been asked to generate a report to show the number of occurrences an employee is absent from work sick.
If an employee is absent from work for 3 consecutive days this will be counted as 1 occurrence. If they then return to work and are then absent again for another 2 consecutive days this will be recorded as 2 occurrences.
I need to generate a report to show the number of occurrences an employee is away from work sick within a 6 month period.
I have set out an example below of the data showing an employee's absence records and how i need the report to look.
How data shows in database:
Name Absence Dates
John Smith 01-Sep-19
John Smith 02-Sep-19
John Smith 03-Sep-19
John Smith 10-Sep-19
John Smith 11-Sep-19
How i wish for the report to look:
Name Occurrences
John Smith 2
I would be grateful for any assistance with writing the code to achieve this result.
Not a full answer, as you should really do some of this yourself. However, based on what you have detailed in your question, you could use the approach below to count up any spells of absence within a 6 month period.
This assumes you are using SQL Server and that each absence is stored as one row with an absence date and a return date.
declare @absences table (empid nvarchar(10), [abs date] date, [ret date] date);
declare @staff table ([empid] int, [name1] nvarchar(50), [name2] nvarchar(50), [surname] nvarchar(50));
-- put some test values in the staff table to work with
insert into @staff
values
(1, 'John', 'Lewis', 'Smith'), -- using a unique ID here, in any good system this should be an incremental number for each new staff member added to the table
(2, 'James', 'Thomas', 'Brown');
-- put some test values in the absences table to work with
insert into @absences
values
(1, '2019-07-01', '2019-07-04'), -- userid, absence date & return date
(1, '2019-08-04', '2019-08-06'),
(2, '2019-07-02', '2019-07-05'),
(2, '2019-08-05', '2019-08-07');
select count(*) spellsoff, empid, name1, name2, surname, [days absent]
from
(
select
s.empid,
s.name1,
s.name2,
s.surname,
a.[abs date],
a.[ret date],
datediff(d,a.[abs date], a.[ret date]) [days absent]
from @staff s
left join @absences a
on s.empid = a.empid
where [abs date] >= DATEADD(M,-6,GETDATE()) -- pull back those employees that have been absent in the last 6 months from today's date
)doff
group by empid, name1, name2, surname, [days absent]
Gives you the following breakdown:
spellsoff empid name1 name2 surname days absent
1 1 John Lewis Smith 2
1 1 John Lewis Smith 3
1 2 James Thomas Brown 2
1 2 James Thomas Brown 3
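The answer above works from absence/return date pairs, while the question's data has one row per absent day. The question's rule (a run of consecutive days counts as one occurrence) can be sketched as follows. This is only an illustration in Python, not the answer's method; `count_occurrences` is a hypothetical helper, and it assumes consecutive calendar days, so an absence spanning a weekend would be split unless weekend rows are also recorded.

```python
from datetime import date, timedelta


def count_occurrences(absence_dates):
    """Count runs of consecutive calendar days in an iterable of dates."""
    occurrences = 0
    previous = None
    for day in sorted(set(absence_dates)):
        # A new occurrence starts whenever this day does not directly
        # follow the previous absence day.
        if previous is None or day - previous > timedelta(days=1):
            occurrences += 1
        previous = day
    return occurrences


# John Smith's rows from the question: 1-3 Sep and 10-11 Sep 2019.
john = [
    date(2019, 9, 1), date(2019, 9, 2), date(2019, 9, 3),
    date(2019, 9, 10), date(2019, 9, 11),
]
print(count_occurrences(john))  # 2
```

The same idea in SQL is usually written as a "gaps and islands" query, grouping rows by the difference between the date and a per-employee row number.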

Select a specific row from a table with duplicated entries based on one field

I have a table which holds data in the following format, however I would like to be able to create a query that checks whether the reference number is duplicated and only return the entry with the latest date_issued.
ref_no name gender place date_issued
xgb/358632/p John Smith M London 02.08.2016
Xgb/358632/p John Smith M London 14.06.2017
Rtu/638932/k Jane Doe F Birmingham 04.09.2017
The result from the query should be;
ref_no name gender place date_issued
Xgb/358632/p John Smith M London 14.06.2017
Rtu/638932/k Jane Doe F Birmingham 04.09.2017
Is there a fairly straightforward solution for this?
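Assuming the date_issued column is of type date or timestamp:
select distinct on (ref_no) * from tablename order by ref_no, date_issued desc;
This works because DISTINCT ON keeps only the first row of each group of rows that share the same value of the expression in parentheses, and the ORDER BY puts the latest date_issued first within each ref_no.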
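To make the "keep one row per key, picking the latest" behaviour concrete, here is a small Python sketch of the same selection on the question's rows. `latest_per_ref` is a hypothetical helper. Note that the sample refs differ in case (`xgb` vs `Xgb`); the sketch assumes they are meant to be the same reference and compares them case-insensitively. In PostgreSQL the comparison is case-sensitive by default, so the equivalent there would be to use lower(ref_no) in both the DISTINCT ON and the ORDER BY.

```python
from datetime import datetime

rows = [
    ("xgb/358632/p", "John Smith", "M", "London", "02.08.2016"),
    ("Xgb/358632/p", "John Smith", "M", "London", "14.06.2017"),
    ("Rtu/638932/k", "Jane Doe", "F", "Birmingham", "04.09.2017"),
]


def latest_per_ref(rows):
    """Return one row per ref_no: the one with the latest date_issued."""
    latest = {}
    for row in rows:
        key = row[0].lower()  # assumption: refs differing only in case match
        issued = datetime.strptime(row[4], "%d.%m.%Y")
        if key not in latest or issued > latest[key][0]:
            latest[key] = (issued, row)
    return [row for _, row in latest.values()]


result = latest_per_ref(rows)
for row in result:
    print(row)
```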

How can 'brand new, never before seen' IDs be counted per month in redshift?

A fair amount of material is available detailing methods utilising dense_rank() and the like to count distinct somethings per month, however, I've been unable to find anything that allows a count of distinct per month which also removes/discounts any id's that have been seen in prior month groups.
The data can be imagined like so:
id (int8 type) | observed time (timestamp utc)
------------------
1 | 2017-01-01
2 | 2017-01-02
1 | 2017-01-02
1 | 2017-02-02
2 | 2017-02-03
3 | 2017-02-04
1 | 2017-03-01
3 | 2017-03-01
4 | 2017-03-01
5 | 2017-03-02
The process of the count can be seen as:
1: in 2017-01 we saw devices 1 and 2 so the count is 2
2: in 2017-02 we saw devices 1, 2 and 3. We know already about devices 1 and 2, but not 3, so the count is 1
3: in 2017-03 we saw devices 1, 3, 4 and 5. We already know about 1 and 3, but not 4 or 5, so the count is 2.
with the desired output being something like:
observed time | count of new id
--------------------------
2017-01 | 2
2017-02 | 1
2017-03 | 2
Explicitly, I am looking to have a new table, with an aggregated month per row, with a count of how many new ids occur within that month that have not been seen at all before.
The IRL case allows devices to be seen more than once in a month, but this shouldn't impact the count. It also uses integer for storage (both positive and negative) of the id, and time periods will be to the second in true timestamps. The size of the data set is also significant.
My initial attempt is along the lines of:
WITH records_months AS (
SELECT *,
date_trunc('month', observed_time) AS month_group
FROM my_table
WHERE observed_time > '2017-01-01'),
id_months AS (
SELECT DISTINCT
month_group,
id
FROM records_months
GROUP BY month_group, id)
SELECT *
FROM id_months
However, I'm stuck on the next part i.e counting the number of new ID that were not seen in prior months. I believe the solution might be a window function, but I'm having trouble working out which or how.
First thing I thought of. The idea is to
(innermost query) calculate the earliest month that each id was seen,
(next level up) join that back to the main my_table dataset, and then
(outer query) count distinct ids by month after nulling out the already-seen ids.
I tested it out and got the desired result set. Joining the earliest month back to the original table seemed like the most natural thing to do (vs. a window function). Hopefully this is performant enough for your Redshift!
select observed_month,
-- Null out the id if the observed_month that we're grouping by
-- is NOT the earliest month that the id was seen.
-- Then count distinct id
count(distinct(case when observed_month != earliest_month then null else id end)) as num_new_ids
from (
select t.id,
date_trunc('month', t.observed_time) as observed_month,
earliest.earliest_month
from my_table t
join (
-- What's the earliest month an id was seen?
select id,
date_trunc('month', min(observed_time)) as earliest_month
from my_table
group by 1
) earliest
on t.id = earliest.id
) joined
group by 1
order by 1;
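The approach can be checked against the question's sample data. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Redshift (an assumption for the sake of a self-contained example); SQLite has no date_trunc, so `strftime('%Y-%m', ...)` plays that role here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, observed_time TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", [
    (1, "2017-01-01"), (2, "2017-01-02"), (1, "2017-01-02"),
    (1, "2017-02-02"), (2, "2017-02-03"), (3, "2017-02-04"),
    (1, "2017-03-01"), (3, "2017-03-01"), (4, "2017-03-01"), (5, "2017-03-02"),
])

query = """
    SELECT observed_month,
           -- keep the id only in the month it was first seen, then count distinct
           count(DISTINCT CASE WHEN observed_month = earliest_month THEN id END)
               AS num_new_ids
    FROM (
        SELECT t.id,
               strftime('%Y-%m', t.observed_time) AS observed_month,
               earliest.earliest_month
        FROM my_table t
        JOIN (
            -- earliest month each id was seen
            SELECT id, strftime('%Y-%m', min(observed_time)) AS earliest_month
            FROM my_table
            GROUP BY id
        ) earliest ON t.id = earliest.id
    ) joined
    GROUP BY observed_month
    ORDER BY observed_month
"""
new_ids_per_month = conn.execute(query).fetchall()
print(new_ids_per_month)  # [('2017-01', 2), ('2017-02', 1), ('2017-03', 2)]
```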

Distinct Count after Sum

So I am looking to do a count after aggregation. Basically I want to be able to total up the Inventory count with a sum and then count how many times each employee has a non zero inventory count.
So for this data Jack/Jimmy would have a count of 1, Sam would have a count of 2 and Steve would have a count of 0. I could easily do this in SQL on the back end but I also want them to be able to use a date parameter. So if they shifted the date to only 1/1/17 Sam would have a count of 1 and everyone else would have a 0. Any help would be much appreciated!
Data
Emp Item Inventory Date
Sam Crackers 1 1/1/2017
Jack Crackers 1 1/1/2017
Jack Crackers -1 2/1/2017
Jimmy Crackers -2 1/1/2017
Sam Apples 1 1/1/2017
Steve Apples -1 1/1/2017
Sam Cheese 1 1/1/2017
With Date>= '1/1/17':
Emp NonZeroCount
Sam 2
Jack 1
Jimmy 1
Steve 0
With Date = '1/1/17':
Emp NonZeroCount
Sam 1
Jack 0
Jimmy 0
Steve 0
SQL I envision it replacing
Create Table #Test(
Empl varchar(50),
Item Varchar (50),
Inventory int,
Date Date
)
Declare @DateParam Date
Set @DateParam = '1/1/17'
Insert into #Test (Empl,Item,Inventory,Date)
Values
('Sam','Crackers',1,'1/1/2017'),
('Jack','Crackers',1,'1/1/2017'),
('Jack','Crackers',-1,'2/1/2017'),
('Jimmy','Crackers',-2,'1/1/2017'),
('Sam','Apples',1,'1/1/2017'),
('Steve','Apples',-1,'1/1/2017'),
('Sam','Cheese',1,'1/1/2017');
Select
Item,Sum(Inventory) as Total
into #badItems
from #Test
Where Date >= @DateParam
group by Item
having Sum(Inventory) <> 0
Select
T.Empl,Count(Distinct BI.Item)
From #Test T
Inner Join #badItems BI on BI.Item = T.Item
group by T.Empl
This is a good case for creating a set in Tableau.
Select the Item field in the data pane on the left, and right click to create a set based on that field. Name it Bad Items, and define it using the following formula on the Condition tab, which assumes you've defined a parameter named [DateParam] of type Date.
sum(if [Date] >= [DateParam] then [Inventory] end) <> 0
You can then use the set on the filter shelf, row shelf, in calculations or combine with other sets as desired.
P.S. I used an alias to display the text "Bad Items" instead of "In" in the table, set a manual default sort order for the Emp field (in case you are trying to reproduce this exactly)
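For readers who want to see the two-step logic outside Tableau, here is a small Python sketch of it. It mirrors the question's SQL (the date filter is applied only when finding the non-zero items, not when counting per employee) and is not a description of how Tableau evaluates the set. `non_zero_counts` and its `keep` parameter are hypothetical names, and dates are assumed to be month/day/year.

```python
from collections import defaultdict
from datetime import date

rows = [
    ("Sam", "Crackers", 1, date(2017, 1, 1)),
    ("Jack", "Crackers", 1, date(2017, 1, 1)),
    ("Jack", "Crackers", -1, date(2017, 2, 1)),
    ("Jimmy", "Crackers", -2, date(2017, 1, 1)),
    ("Sam", "Apples", 1, date(2017, 1, 1)),
    ("Steve", "Apples", -1, date(2017, 1, 1)),
    ("Sam", "Cheese", 1, date(2017, 1, 1)),
]


def non_zero_counts(rows, keep):
    """Count, per employee, the items whose filtered inventory sum is non-zero."""
    # Step 1: total inventory per item over the rows that pass the date filter.
    totals = defaultdict(int)
    for _, item, qty, day in rows:
        if keep(day):
            totals[item] += qty
    bad_items = {item for item, total in totals.items() if total != 0}

    # Step 2: distinct "bad" items per employee (employees with none stay at 0).
    items_by_emp = {emp: set() for emp, *_ in rows}
    for emp, item, _, _ in rows:
        if item in bad_items:
            items_by_emp[emp].add(item)
    return {emp: len(items) for emp, items in items_by_emp.items()}


cutoff = date(2017, 1, 1)
print(non_zero_counts(rows, lambda d: d >= cutoff))
# {'Sam': 2, 'Jack': 1, 'Jimmy': 1, 'Steve': 0}
print(non_zero_counts(rows, lambda d: d == cutoff))
# {'Sam': 1, 'Jack': 0, 'Jimmy': 0, 'Steve': 0}
```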

How to check if something from one table is not used in another

I have three tables:
CREATE TABLE activities (
activity varchar(20) Primary key
);
with data:
Table_Tennis1
Table_Tennis2
Table_Tennis3
and
CREATE TABLE times (
time varchar(5)
);
with data
09:00
10:00
11:00
12:00
13:00
14:00
15:00
16:00
17:00
18:00
19:00
20:00
and finally
CREATE TABLE planner (
day varchar(9) foreign key
time varchar(5) foreign key
activity varchar(20) foreign key
member bigint foreign key
);
and Primary Key = (day, time, activity)
with data
friday,09:00,Table_Tennis1,4
friday,10:00,Table_Tennis2,2
I was wondering if it is possible to find all the Table_Tennis rooms that are not being used at a certain time on a certain day, or all rooms that have not yet been booked at all times for one day.
So it should give me a result set of:
09:00, Table_Tennis2, Table_Tennis3
10:00, Table_Tennis1, Table_Tennis3
11:00, Table_Tennis1, Table_Tennis2, Table_Tennis3 etc.
All the Table_Tennis rooms that are not being used at a certain time on a certain day:
SELECT activity
FROM activities a
WHERE NOT EXISTS (
SELECT *
FROM planner p
WHERE p.activity ~~ 'Table_Tennis%' -- may or may not be needed
AND p.day = 'friday'
AND p.time = '09:00'
AND p.activity = a.activity -- was missing in my 1st draft
);
All rooms that have not yet been booked on all times for one day:
SELECT a.activity
FROM activities a
LEFT JOIN (
SELECT activity
FROM planner p
WHERE day = 'friday'
GROUP BY 1
HAVING count(*) = 12 -- assuming there are exactly 12 slots
) p USING (activity)
WHERE p.activity IS NULL; -- excludes all fully booked rooms
Or:
SELECT activity
FROM activities a
WHERE NOT EXISTS (
SELECT activity
FROM planner p
WHERE day = 'friday'
AND p.activity = a.activity -- correlate with the outer room
GROUP BY 1
HAVING count(*) = 12 -- assuming there are exactly 12 slots
);
But not:
SELECT activity
FROM activities a
JOIN (
SELECT activity
FROM planner p
WHERE day = 'friday'
GROUP BY 1
HAVING count(*) < 12
) p USING (activity);
... because that would drop rooms with no entries for the day at all.
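The question's expected output lists the free rooms for every time slot of a day. One way to get that, sketched here under assumptions, is to cross join times with activities and keep the pairs that have no matching planner row. The example uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, loads only three of the twelve slots for brevity, and drops the foreign keys that the question's planner table refers to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activities (activity TEXT PRIMARY KEY);
    CREATE TABLE times (time TEXT);
    CREATE TABLE planner (
        day TEXT, time TEXT, activity TEXT, member INTEGER,
        PRIMARY KEY (day, time, activity)
    );
    INSERT INTO activities VALUES ('Table_Tennis1'), ('Table_Tennis2'), ('Table_Tennis3');
    INSERT INTO times VALUES ('09:00'), ('10:00'), ('11:00');
    INSERT INTO planner VALUES
        ('friday', '09:00', 'Table_Tennis1', 4),
        ('friday', '10:00', 'Table_Tennis2', 2);
""")

query = """
    SELECT t.time, a.activity
    FROM times t
    CROSS JOIN activities a          -- every (slot, room) combination
    WHERE NOT EXISTS (               -- ... that has no booking on that day
        SELECT 1
        FROM planner p
        WHERE p.day = ?
          AND p.time = t.time
          AND p.activity = a.activity
    )
    ORDER BY t.time, a.activity
"""

# Group the free rooms by slot for display.
free = {}
for slot, room in conn.execute(query, ("friday",)):
    free.setdefault(slot, []).append(room)

for slot, rooms in free.items():
    print(slot, ", ".join(rooms))
# 09:00 Table_Tennis2, Table_Tennis3
# 10:00 Table_Tennis1, Table_Tennis3
# 11:00 Table_Tennis1, Table_Tennis2, Table_Tennis3
```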
You might consider using
slot time
instead of
time varchar(5)
time should not be used as identifier. It is a reserved word in all SQL standards and a type name in PostgreSQL.
Also, the data type time is a better fit for your purpose and occupies less space than varchar(5).
And
day date foreign key ...
,slot time foreign key ...
instead of
day varchar(9) foreign key ...
,time varchar(5) foreign key ...
The names of weekdays would let you cover one week. I assume you want more than that.