I have a table with two columns, ID and Date, containing the data below. For a given range, say from 2022-09-01 to 2022-09-10, I want to return the missing dates for each ID along with the ID value, as shown in the expected output below. How can I achieve this?
Data inside table:
ID  Date
1   2022-09-01
1   2022-09-07
1   2022-09-08
1   2022-09-09
2   2022-09-01
2   2022-09-02
2   2022-09-03
2   2022-09-04
Expected Output:
ID  Missing Dates
1   2022-09-02
1   2022-09-03
1   2022-09-04
1   2022-09-05
1   2022-09-06
1   2022-09-10
2   2022-09-05
2   2022-09-06
2   2022-09-07
2   2022-09-08
2   2022-09-09
2   2022-09-10
I wrote a sample query for you:
CREATE TABLE test1 (
id int4 NULL,
pdate date NULL
);
INSERT INTO test1 (id, pdate) VALUES(1, '2022-09-01');
INSERT INTO test1 (id, pdate) VALUES(1, '2022-09-07');
INSERT INTO test1 (id, pdate) VALUES(1, '2022-09-08');
INSERT INTO test1 (id, pdate) VALUES(1, '2022-09-09');
INSERT INTO test1 (id, pdate) VALUES(2, '2022-09-01');
INSERT INTO test1 (id, pdate) VALUES(2, '2022-09-02');
INSERT INTO test1 (id, pdate) VALUES(2, '2022-09-03');
INSERT INTO test1 (id, pdate) VALUES(2, '2022-09-04');
select t1.id, t1.datelist
from (
    select t.id, generate_series(t.startdate, t.enddate, '1 day')::date as datelist
    from (
        select distinct id, '2022-09-01'::date as startdate, '2022-09-10'::date as enddate
        from test1
    ) t
) t1
left join test1 t2 on t2.pdate = t1.datelist and t1.id = t2.id
where t2.pdate is null
Result:
id datelist
1 2022-09-02
1 2022-09-03
1 2022-09-04
1 2022-09-05
1 2022-09-06
1 2022-09-10
2 2022-09-05
2 2022-09-06
2 2022-09-07
2 2022-09-08
2 2022-09-09
2 2022-09-10
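If you prefer, the same anti-join can also be written as a NOT EXISTS over a cross join of the distinct IDs and the generated calendar. This is just a sketch against the same test1 table, with the same hard-coded range, and should return the same rows:
select i.id, d::date as missing_date
from (select distinct id from test1) i
cross join generate_series('2022-09-01'::date, '2022-09-10'::date, interval '1 day') as d
where not exists (
    select 1
    from test1 t
    where t.id = i.id
      and t.pdate = d::date
)
order by i.id, missing_date;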
I am trying to get a distinct count over multiple columns, but it's not working in DB2:
select count(distinct col1, col2) from table
It throws a syntax error because COUNT has multiple columns.
Is there any way to achieve this?
column 1 column 2 date
1 a 2022-12-01
1 a 2022-12-01
2 a 2022-11-30
2 b 2022-11-30
1 b 2022-12-01
I want this output:
column1 column2 date count
1 a 2022-12-01 2
2 a 2022-11-30 1
2 b 2022-11-30 1
1 b 2022-12-01 1
The following query returns exactly what you want.
WITH MYTAB (column1, column2, date) AS
(
VALUES
(1, 'a', '2022-12-01')
, (1, 'a', '2022-12-01')
, (2, 'a', '2022-11-30')
, (2, 'b', '2022-11-30')
, (1, 'b', '2022-12-01')
)
SELECT
column1
, column2
, date
, COUNT (*) AS CNT
FROM MYTAB
GROUP BY
column1
, column2
, date
COLUMN1  COLUMN2  DATE        CNT
1        a        2022-12-01  2
1        b        2022-12-01  1
2        a        2022-11-30  1
2        b        2022-11-30  1
Not exactly sure what you are looking for, but
select count(distinct col1), count(distinct col2) from table
or
select count(distinct col1 CONCAT col2) from table
are how I would interpret "distinct count of multiple values" in a table.
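If what you actually need is a single number, i.e. how many distinct (col1, col2) pairs exist, a portable alternative that avoids both the multi-column COUNT syntax and possible CONCAT collisions is to count the rows of a DISTINCT derived table. A sketch, where mytable is a placeholder for your real table name:
SELECT COUNT(*) AS distinct_pairs
FROM (
    SELECT DISTINCT col1, col2
    FROM mytable   -- placeholder table name
) AS t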
I have a table of datestamped events that I need to bundle into 7-day groups, starting with the earliest occurrence of each event_id.
The final output should return each bundle's start and end date and 'value' column of the most recent event from each bundle.
There is no predetermined start date, and the '7-day' windows are arbitrary, not 'week of the year'.
I've tried a ton of examples from other posts, but none quite fit my needs, or they use things I'm not sure how to refactor for BigQuery.
Sample data:
Event_Id  Event_Date  Value
1         2022-01-01  010203
1         2022-01-02  040506
1         2022-01-03  070809
1         2022-01-20  101112
1         2022-01-23  131415
2         2022-01-02  161718
2         2022-01-08  192021
3         2022-02-12  212223
Expected output:
Event_Id  Start_Date  End_Date    Value
1         2022-01-01  2022-01-03  070809
1         2022-01-20  2022-01-23  131415
2         2022-01-02  2022-01-08  192021
3         2022-02-12  2022-02-12  212223
You might consider the approach below.
CREATE TEMP FUNCTION cumsumbin(a ARRAY<INT64>) RETURNS INT64
LANGUAGE js AS """
  // Walk the day-gaps between consecutive events in order, accumulating them.
  // Whenever the running total would exceed 6 days, start a new bin, i.e. the
  // row falls into the next 7-day bundle, and reset the running total.
  bin = 0;
  a.reduce((c, v) => {
    if (c + Number(v) > 6) { bin += 1; return 0; }
    else return c += Number(v);
  }, 0);
  return bin;
""";
WITH sample_data AS (
select 1 event_id, DATE '2022-01-01' event_date, '010203' value union all
select 1 event_id, '2022-01-02' event_date, '040506' value union all
select 1 event_id, '2022-01-03' event_date, '070809' value union all
select 1 event_id, '2022-01-20' event_date, '101112' value union all
select 1 event_id, '2022-01-23' event_date, '131415' value union all
select 2 event_id, '2022-01-02' event_date, '161718' value union all
select 2 event_id, '2022-01-08' event_date, '192021' value union all
select 3 event_id, '2022-02-12' event_date, '212223' value
),
binning AS (
SELECT *, cumsumbin(ARRAY_AGG(diff) OVER w1) bin
FROM (
SELECT *, DATE_DIFF(event_date, LAG(event_date) OVER w0, DAY) AS diff
FROM sample_data
WINDOW w0 AS (PARTITION BY event_id ORDER BY event_date)
) WINDOW w1 AS (PARTITION BY event_id ORDER BY event_date)
)
SELECT event_id,
MIN(event_date) start_date,
ARRAY_AGG(
STRUCT(event_date AS end_date, value) ORDER BY event_date DESC LIMIT 1
)[OFFSET(0)].*
FROM binning GROUP BY event_id, bin;
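If you want to sanity-check how rows are assigned to 7-day bundles before aggregating, you can temporarily swap the final SELECT for a query over the intermediate binning CTE. A sketch, using the same CTEs as above; it just exposes the per-row gap and bin number:
-- replace the final SELECT above with this while inspecting the bundles
SELECT event_id, event_date, diff, bin
FROM binning
ORDER BY event_id, event_date;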
Is it possible to produce a statistic, with a query, starting from data configured as follows?
Table a: registry
id (key)
name
Table b: holidays
id (key)
id_anagrafica (foreign key)
data_start
data_end
Query:
SELECT b.id, a.name, b.data_start, b.data_end
FROM registry a INNER JOIN
holidays b ON (a.id = b.id_anagrafica)
WHERE b.data_start >= getdate()
Doing so, I get:
id, name, data_start, data_end
1, Mario, 01/06/2018, 30/06/2018
2, Marino, 08/06/2018, 25/06/2018
3, Maria, 01/07/2018, 05/07/2018
-
-
-
Having only a start date and an end date, I cannot tell how many people are on holiday on any given day.
What I need is:
data, num_pers_in_ferie
01/06/2018, 1
02/06/2018, 1
03/06/2018, 1
-
-
08/06/2018, 2
Can you help me?
Thanks in advance
Check the approach below:
create table #registry (id int, name nvarchar(50))
insert into #registry values
(1, 'Mario'),
(2, 'Marino'),
(3, 'Maria')
create table #holidays (id int,id_anagrafica int,data_start date,data_end date)
insert into #holidays
select id, id, '2018-06-01', '2018-06-30'
from #registry
update #holidays set data_start = dateadd(day, 20, data_start), data_end = dateadd(day, -5, data_end)
where id = 2
update #holidays set data_start = dateadd(day, 14, data_start)--, data_end = dateadd(day, -10, data_end)
where id = 3
SELECT b.id, a.name, b.data_start, b.data_end
FROM #registry a
INNER JOIN
#holidays b ON (a.id = b.id_anagrafica)
WHERE b.data_start >= getdate()
DECLARE @startDate DATETIME = CAST(MONTH(GETDATE()) AS VARCHAR) + '/' + '01/' + CAST(YEAR(GETDATE()) AS VARCHAR) -- mm/dd/yyyy
DECLARE @endDate DATETIME = GETDATE() -- mm/dd/yyyy
select [DATA] = convert(date, DATEADD(Day, Number, @startDate)),
-- if you need it in Italian, use the line below instead
--[DATA] = CONVERT(varchar, DATEADD(Day, Number, @startDate), 103)
SUM(case when DATEADD(Day, Number, @startDate) between data_start and data_end then 1 else 0 end) Pers_in_Ferie
from master..spt_values c,
#registry a
INNER JOIN
#holidays b ON (a.id = b.id_anagrafica)
where c.Type = 'P' and DATEADD(Day, Number, @startDate) >= data_start and DATEADD(Day, Number, @startDate) <= data_end
group by DATEADD(Day, Number, @startDate)
order by [DATA]
drop table #holidays
drop table #registry
Output:
DATA Pers_in_Ferie
---------- -------------
2018-06-01 1
2018-06-02 1
2018-06-03 1
2018-06-04 1
2018-06-05 1
2018-06-06 1
2018-06-07 1
2018-06-08 1
2018-06-09 1
2018-06-10 1
2018-06-11 1
2018-06-12 1
2018-06-13 1
2018-06-14 1
2018-06-15 2
2018-06-16 2
2018-06-17 2
2018-06-18 2
2018-06-19 2
2018-06-20 2
2018-06-21 3
2018-06-22 3
2018-06-23 3
2018-06-24 3
2018-06-25 3
2018-06-26 2
2018-06-27 2
2018-06-28 2
2018-06-29 2
2018-06-30 2
(30 rows affected)
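If you would rather not rely on master..spt_values, the same day-by-day expansion can be built with a small recursive-CTE calendar instead. This is only a sketch against the same #registry/#holidays temp tables (before they are dropped), with the range hard-coded to June 2018 to match the sample data:
;with calendar as (
    -- 30 rows of recursion, well under the default MAXRECURSION limit of 100
    select cast('2018-06-01' as date) as [DATA]
    union all
    select dateadd(day, 1, [DATA]) from calendar where [DATA] < '2018-06-30'
)
select c.[DATA], count(*) as Pers_in_Ferie
from calendar c
inner join #holidays b on c.[DATA] between b.data_start and b.data_end
group by c.[DATA]
order by c.[DATA]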
Table1
sub-id ref-id Name
1 1 Project 1
2 1 Project 2
3 2 Project 3
4 2 Project 4
Table2
sub-id ref-id log_stamp Recepient log_type
----------------------------------------------------
1 1 06/06/2011 person A 1
1 1 06/14/2011 person B 2
1 1 06/16/2011 person C 2
1 1 06/17/2011 person D 3
2 1 06/18/2011 person E 2
2 1 06/19/2011 person F 2
3 2 06/20/2011 person G 1
4 2 06/23/2011 person H 3
Result
Name ref-id start_date Recepient latest_comment Recepient completion_date Recepient
Project1 1 06/06/2011 person A 06/19/2011 person F 06/17/2011 person D
Project3 2 06/20/2011 person G NULL NULL 06/23/2011 person H
log_type of 1 stands for start_date
log_type of 2 stands for latest_comment
log_type of 3 stands for completion_date
The Name of the project is just the name of the top-most row within the same ref-id group.
I have tried this so far:
;with T as (
    select
        Table2.[ref-id],
        Table2.log_stamp,
        case Table2.log_type
            when 1 then '1'
            when 2 then '2'
            when 3 then '3'
        end as title
    from
        Table1 inner join Table2 on Table1.[sub-id] = Table2.[sub-id]
)
select * from T
pivot (
    max(log_stamp)
    for title IN ([1],[2],[3],[5],[6],[9],[11])
I was unable to do it as a PIVOT; I don't think it is possible as described.
DECLARE @table1 TABLE (sub_id INT, ref_id INT, name VARCHAR(50))
INSERT @table1 VALUES (1, 1, 'Project 1')
INSERT @table1 VALUES (2, 1, 'Project 2')
INSERT @table1 VALUES (3, 2, 'Project 3')
INSERT @table1 VALUES (4, 2, 'Project 4')
DECLARE @table2 TABLE (sub_id INT, ref_id INT, log_stamp DATETIME, recepient VARCHAR(10), logtype INT)
INSERT @table2 VALUES(1,1,'06/06/2011','person A',1)
INSERT @table2 VALUES(1,1,'06/14/2011','person B',2)
INSERT @table2 VALUES(1,1,'06/16/2011','person C',2)
INSERT @table2 VALUES(1,1,'06/17/2011','person D',3)
INSERT @table2 VALUES(2,1,'06/18/2011','person E',2)
INSERT @table2 VALUES(2,1,'06/19/2011','person F',2)
INSERT @table2 VALUES(3,2,'06/20/2011','person G',1)
INSERT @table2 VALUES(3,2,'06/23/2011','person H',3)
;WITH a as (
SELECT RN = ROW_NUMBER() OVER (PARTITION BY t1.sub_id, t1.ref_id, t1.name, t2.logtype ORDER BY log_stamp DESC), t1.sub_id, t1.ref_id, t1.name, t2.Recepient , t2.logtype ,log_stamp
FROM @table1 t1 JOIN @table2 t2 ON t1.ref_id = t2.ref_id AND
t1.sub_id = t2.sub_id),
b as (SELECT * FROM a WHERE RN = 1)
SELECT b1.name, b1.ref_id,b1.log_stamp start_date , b1.Recepient, b2.log_stamp latest_comment , b2.Recepient, b3.log_stamp completion_date , b3.Recepient
FROM b b1
LEFT JOIN b b2 ON b1.sub_id=b2.sub_id AND b1.ref_id = b2.ref_id AND b2.logtype = 2
LEFT JOIN b b3 ON b1.sub_id=b3.sub_id AND b1.ref_id = b3.ref_id AND b3.logtype = 3
WHERE b1.logtype = 1
Result:
name ref_id start_date Recepient latest_comment Recepient completion_date Recepient
------------ ----------- ----------------------- ---------- ----------------------- ---------- ----------------------- ----------
Project 1 1 2011-06-06 00:00:00.000 person A 2011-06-16 00:00:00.000 person C 2011-06-17 00:00:00.000 person D
Project 3 2 2011-06-20 00:00:00.000 person G NULL NULL 2011-06-23 00:00:00.000 person H
Userid FirstName LastName UserUpdate
1 Dan Kramer 1/1/2005
1 Dan Kramer 1/1/2007
1 Dan Kramer 1/1/2009
2 Pamella Slattery 1/1/2005
2 Pam Slattery 1/1/2006
2 Pam Slattery 1/1/2008
3 Samantha Cohen 1/1/2008
3 Sam Cohen 1/1/2009
I need to extract the latest update for each of these users; basically, here's what I'm looking for:
Userid FirstName LastName UserUpdate
1 Dan Kramer 1/1/2009
2 Pam Slattery 1/1/2008
3 Sam Cohen 1/1/2009
Now when I run the following:
SELECT Userid, FirstName, LastName, Max(UserUpdate) AS MaxDate
FROM Table
GROUP BY Userid, FirstName, LastName
I still get duplicates, something like this:
Userid FirstName LastName UserUpdate
1 Dan Kramer 1/1/2009
2 Pamella Slattery 1/1/2005
2 Pam Slattery 1/1/2008
3 Samantha Cohen 1/1/2008
3 Sam Cohen 1/1/2009
try:
declare @Table table (userid int, firstname varchar(10), lastname varchar(20), userupdate datetime)
INSERT @Table VALUES (1, 'Dan' ,'Kramer' ,'1/1/2005')
INSERT @Table VALUES (1, 'Dan' ,'Kramer' ,'1/1/2007')
INSERT @Table VALUES (1, 'Dan' ,'Kramer' ,'1/1/2009')
INSERT @Table VALUES (2, 'Pamella' ,'Slattery' ,'1/1/2005')
INSERT @Table VALUES (2, 'Pam' ,'Slattery' ,'1/1/2006')
INSERT @Table VALUES (2, 'Pam' ,'Slattery' ,'1/1/2008')
INSERT @Table VALUES (3, 'Samantha' ,'Cohen' ,'1/1/2008')
INSERT @Table VALUES (3, 'Sam' ,'Cohen' ,'1/1/2009')
SELECT
dt.Userid, dt.MaxDate
,MIN(a.FirstName) AS FirstName, MIN(a.LastName) AS LastName
FROM (SELECT
Userid, Max(UserUpdate) AS MaxDate
FROM @Table GROUP BY Userid
) dt
INNER JOIN @Table a ON dt.Userid = a.Userid and dt.MaxDate = a.UserUpdate
GROUP BY dt.Userid,dt.MaxDate
OUTPUT:
Userid MaxDate FirstName LastName
----------- ----------------------- ---------- --------------------
1 2009-01-01 00:00:00.000 Dan Kramer
2 2008-01-01 00:00:00.000 Pam Slattery
3 2009-01-01 00:00:00.000 Sam Cohen
You aren't getting duplicates. 'Pam' is not equal to 'Pamella' from the perspective of the database; the fact that one is a colloquial shortening of the other doesn't mean anything to the database engine. There really is no reliable, universal way to do this (since there are names that have multiple abbreviations, like "Rob" or "Bob" for "Robert", as well as abbreviations that can suit multiple names like "Kel" for "Kelly" or "Kelsie", let alone the fact that names can have alternate spellings).
For your simple example, you could simply select and group by SUBSTRING(FirstName, 1, 3) instead of FirstName, but that's just a coincidence based upon your sample data; other name abbreviations would not fit this pattern.
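To illustrate the SUBSTRING idea on this particular sample data (again, it only works here because each pair of spellings happens to share its first three letters), here is a sketch, with YourTable standing in for the real table name:
SELECT Userid,
       SUBSTRING(FirstName, 1, 3) AS FirstName,
       LastName,
       MAX(UserUpdate) AS UserUpdate
FROM YourTable   -- placeholder for your actual table
GROUP BY Userid, SUBSTRING(FirstName, 1, 3), LastName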
Or use a subquery...
SELECT
a.userID,
a.FirstName,
a.LastName,
b.MaxDate
FROM
myTable a
INNER JOIN
( SELECT
UserID,
Max(ISNULL(UserUpdate,GETDATE())) as MaxDate
FROM
myTable
GROUP BY
UserID
) b
ON
a.UserID = b.UserID
AND a.UserUpdate = b.MaxDate
The subquery (named "b") returns the following:
Userid UserUpdate
1 1/1/2009
2 1/1/2008
3 1/1/2009
The INNER JOIN between the subquery and the original table causes the original table to be filtered for matching records only -- i.e., only records with a UserID/UserUpdate pair that matches a UserID/MaxDate pair from the subquery will be returned, giving you the unduplicated result set you were looking for:
Userid FirstName LastName UserUpdate
1 Dan Kramer 1/1/2009
2 Pam Slattery 1/1/2008
3 Sam Cohen 1/1/2009
Of course, this is just a work-around. If you really want to solve the problem for the long-term, you should normalize your original table by splitting it into two.
Table1:
Userid FirstName LastName
1 Dan Kramer
2 Pam Slattery
3 Sam Cohen
Table2:
Userid UserUpdate
1 1/1/2007
2 1/1/2007
3 1/1/2007
1 1/1/2008
2 1/1/2008
3 1/1/2008
1 1/1/2009
2 1/1/2009
3 1/1/2009
This would be a more standard way to store data, and would be much easier to query (without having to resort to a subquery). In that case, the query would look like this:
SELECT
T1.UserID,
T1.FirstName,
T1.LastName,
MAX(ISNULL(T2.UserUpdate,GETDATE()))
FROM
Table1 T1
LEFT JOIN
Table2 T2
ON
T1.UserID = T2.UserID
GROUP BY
T1.UserID,
T1.FirstName,
T1.LastName
Another alternative, if you have SQL Server 2005 (I think?) or later, is to use a Common Table Expression: pull the user id and max date out of the table, then join against that to get the matching first name and last name on the max date. NOTE: this assumes that userid + date is always unique; the query will break if you get two rows with the same userid and date. As others have already pointed out, this is pretty awful database design, but sometimes that's life and the problem must still be solved. e.g.
declare @Table table (userid int, firstname varchar(10), lastname varchar(20), userupdate datetime)
INSERT @Table VALUES (1, 'Dan' ,'Kramer' ,'1/1/2005')
INSERT @Table VALUES (1, 'Dan' ,'Kramer' ,'1/1/2007')
INSERT @Table VALUES (1, 'Dan' ,'Kramer' ,'1/1/2009')
INSERT @Table VALUES (2, 'Pamella' ,'Slattery' ,'1/1/2005')
INSERT @Table VALUES (2, 'Pam' ,'Slattery' ,'1/1/2006')
INSERT @Table VALUES (2, 'Pam' ,'Slattery' ,'1/1/2008')
INSERT @Table VALUES (3, 'Samantha' ,'Cohen' ,'1/1/2008')
INSERT @Table VALUES (3, 'Sam' ,'Cohen' ,'1/1/2009');
with cte (userid, maxdt) as
(select userid,
max(userupdate)
from @Table
group by userid)
SELECT dt.Userid,
dt.firstname,
dt.lastname,
cte.maxdt
FROM
@Table dt
join cte on cte.userid = dt.userid and dt.userupdate = cte.maxdt
Output
Userid firstname lastname maxdt
----------- ---------- -------------------- -----------------------
3 Sam Cohen 2009-01-01 00:00:00.000
2 Pam Slattery 2008-01-01 00:00:00.000
1 Dan Kramer 2009-01-01 00:00:00.000