I have a Postgres table that contains a date and status field. I want to create a query that will return the date, plus the total number of records and then the total number of records for each status on that date.
Source Table:
job_id, process_datetime, process_status
The results I would like:
process_date | total_925_jobs | total_completed_925_jobs
2022-01-02 | 50 | 45
2022-01-03 | 150 | 135
I tried joining to subqueries, but Postgres does not like the calculated date field:
SELECT
date(all_records.create_datetime) AS process_date,
total_jobs.total_925_jobs,
total_completed.total_completed_925_jobs
from "925-FilePreprocessing"
all_records
INNER JOIN
( SELECT
date("925-FilePreprocessing".create_datetime) AS total_process_date,
"925-FilePreprocessing".process_status,
COUNT("925-FilePreprocessing".file_preprocessing_id) as total_925_jobs
FROM
"925-FilePreprocessing"
where
"925-FilePreprocessing".create_datetime > '2022-01-01'
GROUP BY
total_process_date, process_status
) as "total_jobs"
ON date(all_records.create_datetime) = date(total_jobs.total_process_date)
INNER JOIN
(SELECT
date("925-FilePreprocessing".create_datetime) AS completed_process_date,
COUNT("925-FilePreprocessing".file_preprocessing_id) as total_completed_925_jobs
FROM
"925-FilePreprocessing"
where
"925-FilePreprocessing".create_datetime > '2022-01-01'
and ("925-FilePreprocessing".process_status = 'completed'
or "925-FilePreprocessing".process_status = 'completed-duplicated'
or "925-FilePreprocessing".process_status = 'completed-duplicated-published'
or "925-FilePreprocessing".process_status = 'completed-not_a_drawing'
)
GROUP BY
completed_process_date
) as "total_completed"
ON all_records.process_date = total_completed.completed_process_date
ORDER BY
process_date
I get an error:
ERROR: column all_records.process_date does not exist
LINE 42: ON all_records.process_date = total_completed.completed_pro...
^
A conditional count may be useful here.
Old way (using sum), before PostgreSQL 9.4:
select
a.process_datetime::DATE,
count(*) total_925_jobs,
sum ( case when a.process_status in ('completed',
'completed-duplicated',
'completed-duplicated-published',
'completed-not_a_drawing')
then 1
else 0 end) total_completed_925_jobs
from "925-FilePreprocessing" a
where a.process_datetime::DATE >= '2021-01-01'
group by a.process_datetime::DATE
New way, from PostgreSQL 9.4 (using filter):
select
a.process_datetime::DATE,
count(*) total_925_jobs,
count(*) filter (where a.process_status in ('completed', 'completed-duplicated', 'completed-duplicated-published', 'completed-not_a_drawing')) total_completed_925_jobs
from "925-FilePreprocessing" a
where a.process_datetime::DATE >= '2021-01-01'
group by a.process_datetime::DATE
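A minimal sketch of the same conditional count, run against an in-memory SQLite database from Python so it can be tried without a Postgres server. The table name jobs and the statuses 'failed' and 'pending' are made up for illustration; the SUM(CASE ...) form is used because it is the more portable of the two variants.

```python
import sqlite3

COMPLETED = (
    "completed",
    "completed-duplicated",
    "completed-duplicated-published",
    "completed-not_a_drawing",
)


def daily_counts(rows):
    """rows: iterable of (job_id, process_datetime, process_status)."""
    con = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the real table.
    con.execute(
        "CREATE TABLE jobs (job_id INTEGER, process_datetime TEXT, process_status TEXT)"
    )
    con.executemany("INSERT INTO jobs VALUES (?, ?, ?)", rows)
    placeholders = ", ".join("?" for _ in COMPLETED)
    sql = f"""
        SELECT date(process_datetime) AS process_date,
               COUNT(*) AS total_jobs,
               SUM(CASE WHEN process_status IN ({placeholders})
                        THEN 1 ELSE 0 END) AS total_completed
        FROM jobs
        GROUP BY date(process_datetime)
        ORDER BY process_date
    """
    result = con.execute(sql, COMPLETED).fetchall()
    con.close()
    return result


# Made-up sample rows: one scan of the table yields both counts per day.
sample = [
    (1, "2022-01-02 08:00:00", "completed"),
    (2, "2022-01-02 09:30:00", "failed"),
    (3, "2022-01-03 10:00:00", "completed-duplicated"),
    (4, "2022-01-03 11:00:00", "completed"),
    (5, "2022-01-03 12:00:00", "pending"),
]
print(daily_counts(sample))
```

Every row adds 1 to the total, and only rows with a completed-type status add 1 to the second column, so no join between subqueries is needed.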
Going back to your query: I get the error "column 925-FilePreprocessing.create_datetime does not exist", which is different from yours. Check whether the table definition you posted is complete.
The result you want:
process_date | total_925_jobs | total_completed_925_jobs
2022-01-02 | 50 | 45
2022-01-03 | 150 | 135
Since total_completed has far fewer rows than total_jobs, this means there are only two dates later than '2022-01-01'.
The following query should get your result. I removed a lot of unnecessary code.
For what GROUP BY 1 means, see: https://www.cybertec-postgresql.com/en/postgresql-group-by-expression/
WITH total_jobs AS (
SELECT
create_datetime::date AS total_process_date,
process_status,
COUNT(file_preprocessing_id) AS total_925_jobs
FROM
"925-FilePreprocessing"
WHERE
create_datetime::date > '2022-01-01'::date
GROUP BY
1,
2
),
total_completed AS (
SELECT
date("925-FilePreprocessing".create_datetime) AS completed_process_date,
COUNT(file_preprocessing_id) AS total_completed_925_jobs
FROM
"925-FilePreprocessing"
WHERE
create_datetime::date > '2022-01-01'
AND process_status IN ('completed', 'completed-duplicated', 'completed-duplicated-published', 'completed-not_a_drawing')
GROUP BY
1
)
SELECT
tk.*,
tp.total_completed_925_jobs
FROM
total_jobs tk
JOIN total_completed tp ON tk.total_process_date = tp.completed_process_date
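On the GROUP BY 1 shorthand used in the query above: the number refers to the position of a column in the SELECT list. A small sketch, again using an in-memory SQLite database from Python (the table t and its rows are invented for illustration), checks that grouping by position gives the same result as repeating the expression.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table with a timestamp and a status column.
con.execute("CREATE TABLE t (created TEXT, status TEXT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("2022-01-02 08:00:00", "completed"),
        ("2022-01-02 09:00:00", "failed"),
        ("2022-01-03 10:00:00", "completed"),
    ],
)

# GROUP BY 1 means "group by the first item in the SELECT list".
by_position = con.execute(
    "SELECT date(created) AS d, COUNT(*) FROM t GROUP BY 1 ORDER BY 1"
).fetchall()

# The long form repeats the expression instead of its position.
by_expression = con.execute(
    "SELECT date(created) AS d, COUNT(*) FROM t "
    "GROUP BY date(created) ORDER BY date(created)"
).fetchall()

print(by_position)
print(by_position == by_expression)
con.close()
```

The positional form is shorter, but it silently changes meaning if the SELECT list is reordered, so the explicit expression can be the safer choice in long-lived queries.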
I have data something like this:
ID 1 1 1 1 1 1 1 1 1 1 1 1
Month J F M A M J J A S O N D
Status 1 0 0 1 0 1 0 0 1 1 1 1
ID 2 2 2 2 2 2 2 2 2 2 2 2
Month J F M A M J J A S O N D
Status 1 0 1 0 1 0 1 0 1 0 1 1
ID 3 3 3 3 3 3 3 3 3 3 3 3
Month J F M A M J J A S O N D
Status 0 0 0 0 0 0 0 0 0 0 0 1
Using T-SQL, I am trying to capture the month corresponding to the first STATUS = 1 in the last group of 1s for each ID, i.e., September, November and December in this example.
Here is the code I'm using:
IF OBJECT_ID('tempdb..#Temp1') IS NOT NULL DROP TABLE #Temp1
;WITH PARTITIONED1 AS
(SELECT t0.ID
, t0.Year_Month
, LAST_VALUE(t0.Year_Month) OVER (PARTITION BY t0.Account_Number ORDER BY t0.Year_Month) AS STATUS
, ROW_NUMBER() OVER (PARTITION BY t0.Account_Number ORDER BY t0.Year_Month) AS rn1
FROM #Temp0 t0
)
SELECT *
INTO #Temp1
FROM PARTITIONED1 p1
ORDER BY t0.ID
, t0.Year_Month
IF OBJECT_ID('tempdb..#Temp') IS NOT NULL DROP TABLE #Temp
SELECT *
INTO #Temp
FROM #Temp1 t1
WHERE t1.rn1 = (SELECT MAX(b.rn1) + 1 FROM #Temp1 b WHERE b.STATUS = 0)
GROUP BY t1.ID
, t1.Year_Month
, t1.rn1
However, this just returns the last instance where STATUS = 1 is achieved overall as the first 1 of the last group of 1s, in this case January.
I've tried using CASE statements and grouping in various combinations (hence the intermediate step reading the data into #Temp1), but have not been able to get results for all three IDs; is anyone able to assist?
Thanks in advance!
Assuming two-letter codes wherever a single letter would be ambiguous (Ma/My for March/May, Ap/Au for April/August, Ju/Jl for June/July):
--Sample Data
IF OBJECT_ID('tempdb..#Temp0') IS NOT NULL DROP TABLE #Temp0
CREATE TABLE #Temp0 (ID INT, Year_Month VARCHAR(2), Status INT)
INSERT INTO #Temp0
VALUES(1,'J',1),(1,'F',0),(1,'Ma',0),(1,'Ap',1),(1,'My',0),(1,'Ju',1),(1,'Jl',0),(1,'Au',0),(1,'S',1),(1,'O',1),(1,'N',1),(1,'D',1),(2,'J',1),(2,'F',0),(2,'Ma',1),(2,'Ap',0),(2,'My',1),(2,'Ju',0),(2,'Jl',1),(2,'Au',0),(2,'S',1),(2,'O',0),(2,'N',1),(2,'D',1),(3,'J',0),(3,'F',0),(3,'Ma',0),(3,'Ap',0),(3,'My',0),(3,'Ju',0),(3,'Jl',0),(3,'Au',0),(3,'S',0),(3,'O',0),(3,'N',0),(3,'D',1);
--Query
WITH A
AS ( SELECT *,
-- Map each month code to its calendar position
CASE Year_Month
WHEN 'J' THEN 1
WHEN 'F' THEN 2
WHEN 'Ma' THEN 3
WHEN 'Ap' THEN 4
WHEN 'My' THEN 5
WHEN 'Ju' THEN 6
WHEN 'Jl' THEN 7
WHEN 'Au' THEN 8
WHEN 'S' THEN 9
WHEN 'O' THEN 10
WHEN 'N' THEN 11
WHEN 'D' THEN 12
END
AS MonthNumber
FROM #Temp0 ),
StartingPoints
AS ( SELECT ID,
Year_Month,
MonthNumber,
Status
FROM A
WHERE NOT EXISTS
(
SELECT 1
FROM A
AS B
WHERE B.ID=A.ID
AND B.Status=A.Status-1
) ),
MonthRanking
AS ( SELECT A.*,
ROW_NUMBER( ) OVER( PARTITION BY A.ID ORDER BY A.MonthNumber )
AS rownum
FROM A
INNER JOIN
(
SELECT ID,
MAX( MonthNumber )+1
AS StartOfLastGroup
FROM StartingPoints
GROUP BY ID
)
AS B
ON A.ID=B.ID
AND A.MonthNumber>=B.StartOfLastGroup )
SELECT *
FROM MonthRanking
WHERE rownum=1;
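The idea behind the query above is "find the last month with Status = 0 and take the first row after it". A small Python sketch of that idea, for one ID at a time, may make it easier to check by hand. The function name is hypothetical, and treating an ID with no zeros as starting at its first month is an assumption of this sketch, not something the query does.

```python
def first_month_of_last_run(rows):
    """rows: (month_number, status) pairs for one ID, in any order."""
    ordered = sorted(rows)
    # Month of the last status-0 row; 0 if there is none (an assumption of
    # this sketch, so that an all-1 series starts at its first month).
    last_zero = max((m for m, s in ordered if s == 0), default=0)
    # First row after that month, or None if the series ends on a 0.
    return next((m for m, s in ordered if m > last_zero), None)


# Status rows from the question, paired with month numbers 1..12.
id1 = list(enumerate([1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1], start=1))
id2 = list(enumerate([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1], start=1))
id3 = list(enumerate([0] * 11 + [1], start=1))

for rows in (id1, id2, id3):
    print(first_month_of_last_run(rows))
```

Note that this approach only finds the last group of 1s when the series ends with a 1; if the final month has Status = 0, nothing comes after the last zero.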
If month names are recorded in full (June, July, and so on), then this would work as well:
WITH StartingPoints
AS (SELECT ID,
Year_Month,
MonthNumber = MONTH('01-'+Year_Month+'-2010'),
Status
FROM #Temp0
WHERE NOT EXISTS
(
SELECT 1
FROM #Temp0 AS B
WHERE B.ID = #Temp0.ID
AND B.Status = #Temp0.Status - 1
)),
MonthRanking
AS (SELECT A.*,
ROW_NUMBER() OVER(PARTITION BY A.ID ORDER BY MONTH('01-'+A.Year_Month+'-2010')) AS rownum
FROM #Temp0 AS A
INNER JOIN
(
SELECT ID,
MAX(MonthNumber) + 1 AS StartOfLastGroup
FROM StartingPoints
GROUP BY ID
) AS B ON A.ID = B.ID
AND MONTH('01-'+A.Year_Month+'-2010') >= B.StartOfLastGroup)
SELECT *
FROM MonthRanking
WHERE rownum = 1;
And if we assume that the data is as Iamdave assumes, then it is simply:
WITH StartingPoints
AS (SELECT ID,
Year_Month,
Status
FROM #Temp0
WHERE NOT EXISTS
(
SELECT 1
FROM #Temp0 AS B
WHERE B.ID = #Temp0.ID
AND B.Status = #Temp0.Status - 1
)),
MonthRanking
AS (SELECT A.*,
ROW_NUMBER() OVER(PARTITION BY A.ID ORDER BY Year_Month) AS rownum
FROM #Temp0 AS A
INNER JOIN
(
SELECT ID,
MAX(Year_Month) + 1 AS StartOfLastGroup
FROM StartingPoints
GROUP BY ID
) AS B ON A.ID = B.ID
AND A.Year_Month >= B.StartOfLastGroup)
SELECT *
FROM MonthRanking
WHERE rownum = 1;
You can do this with a couple of derived tables that stack two window functions on top of one another (which can't be done in the same select). I have assumed that your data is slightly different from the table you have provided, based on the column names in your query. If they are not as I have them below, I strongly recommend having a look at how you store your data:
create table #t (ID int, YearMonth int, StatusValue bit);
insert into #t values (1,201501,1),(1,201502,0),(1,201503,0),(1,201504,1),(1,201505,0),(1,201506,1),(1,201507,0),(1,201508,0),(1,201509,1),(1,201510,1),(1,201511,1),(1,201512,1),(2,201601,1),(2,201602,0),(2,201603,1),(2,201604,0),(2,201605,1),(2,201606,0),(2,201607,1),(2,201608,0),(2,201609,1),(2,201610,0),(2,201611,1),(2,201612,1),(3,201701,0),(3,201702,0),(3,201703,0),(3,201704,0),(3,201705,0),(3,201706,0),(3,201707,0),(3,201708,0),(3,201709,0),(3,201710,0),(3,201711,0),(3,201712,1);
with c as
(
select ID
,YearMonth
,StatusValue
,case when StatusValue = 1
and lead(StatusValue,1,1) over (partition by ID
order by YearMonth desc) = 0
then 1
else 0
end as c
from #t
), sc as
(
select ID
,YearMonth
,StatusValue
,sum(c) over (partition by ID order by YearMonth desc) as sc
from c
where c = 1
)
select ID
,YearMonth
,StatusValue
from sc
where sc = 1
order by ID;
Output:
+----+-----------+-------------+
| ID | YearMonth | StatusValue |
+----+-----------+-------------+
| 1 | 201509 | 1 |
| 2 | 201611 | 1 |
| 3 | 201712 | 1 |
+----+-----------+-------------+
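The flag-then-pick logic of the query above can be sketched in plain Python to make the steps easier to follow: mark every row where the status is 1 and the previous row's status is 0, then keep the latest marked row per ID. The function and helper names are hypothetical. One deliberate difference, which is an assumption of this sketch: a missing previous row is treated as 0, whereas the lead(StatusValue,1,1) call in the query defaults it to 1, so here a run that begins on an ID's first row also counts as a start.

```python
from itertools import groupby
from operator import itemgetter


def last_run_starts(rows):
    """rows: (id, year_month, status) tuples; returns {id: year_month}."""
    result = {}
    # Sorting orders rows by id, then by year_month within each id.
    for key, group in groupby(sorted(rows), key=itemgetter(0)):
        prev = 0  # assumption: a missing earlier row counts as status 0
        for _, year_month, status in group:
            if status == 1 and prev == 0:
                # A new run of 1s starts here; later starts overwrite earlier ones.
                result[key] = year_month
            prev = status
    return result


def series(id_, year, statuses):
    """Helper that builds (id, YYYYMM, status) rows for months 1..12."""
    return [(id_, year * 100 + m, s) for m, s in enumerate(statuses, start=1)]


rows = (
    series(1, 2015, [1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1])
    + series(2, 2016, [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1])
    + series(3, 2017, [0] * 11 + [1])
)
print(last_run_starts(rows))  # {1: 201509, 2: 201611, 3: 201712}
```

Unlike the "first row after the last zero" approach, this one still finds the last group of 1s when the series ends on a 0.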