Joining based on dates - tsql

I have two sets of data that I need to join.
However, there is no ID or similar key to join the sets on; the only thing the two sets have in common is the dates.
Unfortunately the dates are not 100% identical; they typically differ by a few seconds or minutes.
What I’m hoping is that it's possible to join on the dates whenever the difference between them is, for example, less than 10 minutes.
Is this possible?
Example data:
> EventDate WindSpeed
> 2018-01-09 12:00:18.000 3
> 2018-01-10 12:00:03.000 4
Then join with:
> ReadingDate ReadingValue
> 2018-01-09 12:00:00.000 4,6
> 2018-01-10 12:00:00.000 5
So far I have the two queries below, which I’m not able to join by myself. I would appreciate any help or hints to get this working:
SELECT tu.Name,
tv.VoyageNo,
tv.ExternalVoyageNo,
tve.VoyageEventCode,
tve.EventDate,
tve.WindSpeed,
tve.WindDirection,
tve.SeaStateCode,
tve.PositionLatitude,
tve.PositionLongitude,
tve.Speed,
tve.DistanceByLog,
tve.DistanceOverGround,
tve.HFOStock,
tve.LSFOStock,
tve.MDOStock,
tve.MGOStock,
tve.DraughxcID,
CASE
WHEN ts.IsBallast = 0
THEN 'Laden'
WHEN ts.IsBallast = 1
THEN 'Ballast'
ELSE 'In Port'
END AS Condition,
tcf.Mt
FROM dbo.xcVoyageEvent tve
INNER JOIN dbo.xcUnit tu ON tve.xcUnitID = tu.xcUnitID
INNER JOIN dbo.xcVoyage tv ON tve.xcVoyageID = tv.xcVoyageID
LEFT JOIN dbo.xcSailing ts ON tve.xcSailingID = ts.xcSailingID
LEFT JOIN dbo.xcCargo tc ON tve.xcVoyageID = tc.xcVoyageID
LEFT JOIN dbo.xcCargoFigure tcf ON tc.xcCargoID = tcf.xcCargoID
WHERE tve.RowDeleted = 0
AND tv.RowDeleted = 0
AND ts.RowDeleted = 0
AND tu.RowDeleted = 0
AND tve.VoyageEventCode IN('Commence Sea Passage', 'Sea Passage Suspended',
'Sea Passage Resumed', 'End Of Sea Passage', 'xcdailyreport',
'Noon position', 'morning position', 'voyage commenced', 'voyage complete',
'Enter Magallanes Strait', 'Enter Suez Canal', 'All Clear', 'All Fast')
ORDER BY tu.Name,
tve.EventDate;
SELECT tu.name,
tc.Code,
tc.Name,
xc.Name,
xcr.ReadingValue,
xcr.ReadingDate,
xcr.Comment
FROM dbo.xcMeasurement xc
INNER JOIN dbo.xcMeasurementReading xcr ON xc.xcMeasurementID = xcr.xcMeasurementID
INNER JOIN dbo.xcComponent tc ON xc.xcComponentID = tc.xcComponentID
INNER JOIN dbo.xcUnit tu ON xc.xcUnitID = tu.xcUnitID
WHERE xc.RowDeleted = 0
AND xcr.RowDeleted = 0
AND tu.RowDeleted = 0
AND xc.Consumption = 1
ORDER BY tu.Name,
tc.code,
xcr.ReadingDate

You mentioned that the dates differ slightly between the two tables.
If there is just one "event" per day, you could simply join on the date part of the datetime.
If you have multiple events per day, you'd also have to compare the times with DATEDIFF; in my example I allowed a difference of up to 600 seconds.
See my example below:
create table #temp1
( EventDate Datetime2, Windspeed int)
create table #temp2
( Readingdate Datetime2, Readingvalue nvarchar(50))
insert into #temp1 Values
('2018-01-09 12:00:18.000', 3),
('2018-01-09 13:10:00.000', 3),
('2018-01-10 12:00:03.000', 4)
insert into #temp2 Values
('2018-01-09 12:00:00.000', '5,6'),
('2018-01-09 13:00:00.000', '6'),
('2018-01-10 12:00:00.000', '5')
SELECT
*, DATEDIFF(SECOND, T2.Readingdate, T1.EventDate) AS [dif_in_sec]
FROM #temp1 AS T1
-- same calendar day, and the event falls 0 to 600 seconds after the reading
INNER JOIN #temp2 AS T2 ON CAST(T1.EventDate AS date) = CAST(T2.Readingdate AS date)
AND DATEDIFF(SECOND, T2.Readingdate, T1.EventDate) BETWEEN 0 AND 600
You might have to tweak the values (for example, compare ABS(DATEDIFF(...)) to 600 if an event can come before its reading).

To get rows where EventDate and ReadingDate are within 10 minutes of each other, add this condition to your join (ON) or WHERE clause:
ABS(DATEDIFF(MINUTE, EventDate, ReadingDate)) <= 10
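One detail worth keeping in mind: in SQL Server, DATEDIFF counts the number of datepart boundaries crossed, so DATEDIFF(MINUTE, '12:00:00', '12:10:59') is 10 even though the gap is almost 11 minutes. Comparing in seconds avoids that. The sketch below reproduces the tolerance join on the sample data from the first answer. It is only an illustration: it uses Python's built-in sqlite3 as a stand-in for SQL Server, the table names `events` and `readings` are made up, and SQLite's `strftime('%s', ...)` replaces DATEDIFF.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events   (EventDate TEXT, WindSpeed INTEGER);
CREATE TABLE readings (ReadingDate TEXT, ReadingValue TEXT);
INSERT INTO events VALUES
  ('2018-01-09 12:00:18.000', 3),
  ('2018-01-09 13:10:00.000', 3),
  ('2018-01-10 12:00:03.000', 4);
INSERT INTO readings VALUES
  ('2018-01-09 12:00:00.000', '5,6'),
  ('2018-01-09 13:00:00.000', '6'),
  ('2018-01-10 12:00:00.000', '5');
""")

TOLERANCE_SECONDS = 600  # 10 minutes, in either direction

# strftime('%s', x) yields Unix seconds; the difference plays the role of
# DATEDIFF(SECOND, ReadingDate, EventDate) in T-SQL.
matches = conn.execute("""
SELECT e.EventDate, r.ReadingDate,
       CAST(strftime('%s', e.EventDate) AS INTEGER)
     - CAST(strftime('%s', r.ReadingDate) AS INTEGER) AS diff_sec
FROM events e
JOIN readings r
  ON ABS(CAST(strftime('%s', e.EventDate) AS INTEGER)
       - CAST(strftime('%s', r.ReadingDate) AS INTEGER)) <= ?
ORDER BY e.EventDate
""", (TOLERANCE_SECONDS,)).fetchall()

for row in matches:
    print(row)
```

Each event pairs with exactly one reading here. If several readings can fall inside the tolerance window of one event, the join returns all of them, and you would need an extra step to keep only the closest.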

Related

How to repeat some data points in query results?

I am trying to get the max date by account from 3 different tables and view those dates side by side. I created a separate query for each table, merged the results with UNION ALL, and then wrapped all that in a PIVOT.
The first 2 sections in the link/pic below show what I have been able to accomplish and the 3rd section is what I would like to do.
[Image: Query results by step]
How can I get the results from 2 of the tables to repeat? Is that possible?
--define var_ent_type = 'ACOM'
--define var_ent_id = '52766'
--define var_dict_id = 113
SELECT
*
FROM
(
SELECT
E.ENTITY_TYPE,
E.ENTITY_ID,
'PERF_SUMMARY' as "TableName",
PS.DICTIONARY_ID,
to_char(MAX(PS.END_EFFECTIVE_DATE), 'YYYY-MM-DD') as "MaxDate"
FROM
RULESDBO.ENTITY E
INNER JOIN PERFORMDBO.PERF_SUMMARY PS ON (PS.ENTITY_ID = E.ENTITY_ID)
WHERE
1=1
-- AND E.ENTITY_TYPE = '&var_ent_type'
-- AND E.ENTITY_ID = '&var_ent_id'
AND PS.DICTIONARY_ID >= 100
AND (E.ACTIVE_STATUS <> 'N' )--and E.TERMINATION_DATE is null )
GROUP BY
E.ENTITY_TYPE,
E.ENTITY_ID,
'PERF_SUMMARY',
PS.DICTIONARY_ID
union all
SELECT
E.ENTITY_TYPE,
E.ENTITY_ID,
'POSITION' as "TableName",
0 as DICTIONARY_ID,
to_char(MAX(H.EFFECTIVE_DATE), 'YYYY-MM-DD') as "MaxDate"
FROM
RULESDBO.ENTITY E
INNER JOIN HOLDINGDBO.POSITION H ON (H.ENTITY_ID = E.ENTITY_ID)
WHERE
1=1
-- AND E.ENTITY_TYPE = '&var_ent_type'
-- AND E.ENTITY_ID = '&var_ent_id'
AND (E.ACTIVE_STATUS <> 'N' )--and E.TERMINATION_DATE is null )
GROUP BY
E.ENTITY_TYPE,
E.ENTITY_ID,
'POSITION',
1
union all
SELECT
E.ENTITY_TYPE,
E.ENTITY_ID,
'CASH_ACTIVITY' as "TableName",
0 as DICTIONARY_ID,
to_char(MAX(C.EFFECTIVE_DATE), 'YYYY-MM-DD') as "MaxDate"
FROM
RULESDBO.ENTITY E
INNER JOIN CASHDBO.CASH_ACTIVITY C ON (C.ENTITY_ID = E.ENTITY_ID)
WHERE
1=1
-- AND E.ENTITY_TYPE = '&var_ent_type'
-- AND E.ENTITY_ID = '&var_ent_id'
AND (E.ACTIVE_STATUS <> 'N' )--and E.TERMINATION_DATE is null )
GROUP BY
E.ENTITY_TYPE,
E.ENTITY_ID,
'CASH_ACTIVITY',
1
--ORDER BY
-- 2,3, 4
)
PIVOT
(
MAX("MaxDate")
FOR "TableName"
IN ('CASH_ACTIVITY', 'PERF_SUMMARY','POSITION')
)
Everything is possible. You only need a window function to make the value repeat across the rows without data.
-- Assuming your current query is QC
With QC as (
...
)
select code, account, grouping,
--cash,
first_value(cash) over (partition by code, account order by grouping asc rows unbounded preceding) as cash_repeat,
perf,
--pos,
first_value(pos) over (partition by code, account order by grouping asc rows unbounded preceding) as pos_repeat
from QC
;
See first_value() help here: https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/FIRST_VALUE.html#GUID-D454EC3F-370C-4C64-9B11-33FCB10D95EC
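To see the effect of first_value() on a pivoted result, here is a small sketch. It is not Oracle: it runs the same window expression in SQLite (3.25 or later) through Python's sqlite3, and the table `qc`, its columns and all the dates are invented purely for illustration.

```python
import sqlite3

# In-memory SQLite stand-in for the pivoted result (hypothetical table "qc").
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE qc (entity_id TEXT, dictionary_id INTEGER,
                 cash TEXT, perf TEXT, pos TEXT);
-- Only the dictionary_id = 0 row carries cash/pos dates, as in the pivot output.
INSERT INTO qc VALUES
  ('52766',   0, '2020-01-31', NULL,         '2020-01-30'),
  ('52766', 113, NULL,         '2019-12-31', NULL),
  ('52766', 114, NULL,         '2019-11-30', NULL);
""")

# first_value() copies the value of the first row in each partition
# (the dictionary_id = 0 row) onto every row of that partition.
rows = conn.execute("""
SELECT entity_id, dictionary_id,
       FIRST_VALUE(cash) OVER (PARTITION BY entity_id ORDER BY dictionary_id
                               ROWS UNBOUNDED PRECEDING) AS cash_repeat,
       perf,
       FIRST_VALUE(pos)  OVER (PARTITION BY entity_id ORDER BY dictionary_id
                               ROWS UNBOUNDED PRECEDING) AS pos_repeat
FROM qc
ORDER BY entity_id, dictionary_id
""").fetchall()

for row in rows:
    print(row)
```

This relies on the row that holds the cash and position dates sorting first within each partition, which is the case when it uses dictionary id 0 and all the others are 100 or higher.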

Correlated subquery in Postgres

I have a query like the one below to find the stock details of certain products. The query works fine, but I think it is not efficient or fast enough (DB: PostgreSQL version 11).
There is a CTE "result_set" in this code where I need to find the quantity of a product ordered (qty_last_7d_from_oos_date) during the 7 days leading up to the out-of-stock date. In the same way I also have to find the revenue.
So what I did was write the same subquery twice, one returning the revenue and the other the quantity, which is not efficient. Does anyone have suggestions on how to rewrite this more efficiently?
WITH final as
(
SELECT product_id,product_name,item_sku,out_of_stock_at
,out_of_stock_at - INTERVAL '7 days' as previous_7_days
,back_in_stock_at
FROM oos_base
)
SELECT product_id,product_name,item_sku,out_of_stock_at,previous_7_days
,back_in_stock_at
,(SELECT coalesce(sum(i.qty_ordered), 0) AS qty_last_7d_from_oos_date
FROM ol.orders o
LEFT JOIN ol.items i ON i.order_id = o.order_id
LEFT JOIN ol.products p ON p.product_id = i.product_id AND i.store_id = p.store_id
WHERE o.order_state_2 IN('complete','processing')
AND f.product_id=p.product_id
AND o.created_at_order :: DATE BETWEEN f.previous_7_days::DATE AND COALESCE(f.out_of_stock_at::DATE,current_date)
)
,( SELECT coalesce(sum(i.row_amount_minus_discount_order), 0) AS rev_last_7d_from_oos_date
FROM ol.orders o
LEFT JOIN ol.items i ON i.order_id = o.order_id
LEFT JOIN ol.products p ON p.product_id = i.product_id AND i.store_id = p.store_id
WHERE o.order_state_2 IN('complete','processing')
AND f.product_id=p.product_id
AND o.created_at_order :: DATE BETWEEN f.previous_7_days::DATE AND COALESCE(f.out_of_stock_at::DATE,current_date)
)
FROM final f
In the above code the CTE "final" gives you two dates, "out_of_stock_at" and "previous_7_days". I want to find the quantity and revenue of a product for the period between these two dates, i.e. between "previous_7_days" and "out_of_stock_at".
The query below gives both the quantity and the revenue of the products, but it still needs the period between "previous_7_days" and "out_of_stock_at" from the above CTE.
As of now I have used the code below twice to obtain the revenue and the quantity.
SELECT coalesce(sum(i.qty_ordered), 0) AS qty ,
coalesce(sum(i.row_amount_minus_discount_order), 0)
FROM ol.orders o
LEFT JOIN ol.items i ON i.order_id = o.order_id
LEFT JOIN ol.products p ON p.product_id = i.product_id AND i.store_id = p.store_id
WHERE o.order_state_2 IN('complete','processing')
AND f.product_id=p.product_id
AND o.created_at_order :: DATE BETWEEN f.previous_7_days::DATE AND COALESCE(f.out_of_stock_at::DATE,current_date)
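One way to avoid running the subquery twice is to join the filtered orders once and compute both sums in the same aggregation. The sketch below shows the idea under several assumptions: it uses Python's sqlite3 instead of PostgreSQL, the sample rows are invented, the join through ol.products and store_id is left out, and the COALESCE to current_date is dropped for brevity. In PostgreSQL the same shape could also be written as a single LEFT JOIN LATERAL subquery that returns both columns.

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL tables (simplified, invented data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE oos_base (product_id INTEGER, out_of_stock_at TEXT);
CREATE TABLE orders (order_id INTEGER, order_state_2 TEXT, created_at_order TEXT);
CREATE TABLE items (order_id INTEGER, product_id INTEGER, qty_ordered INTEGER,
                    row_amount_minus_discount_order REAL);
INSERT INTO oos_base VALUES (1, '2021-03-10 08:00:00'), (2, '2021-03-12 09:30:00');
INSERT INTO orders VALUES
  (1, 'complete',   '2021-03-05 10:00:00'),
  (2, 'processing', '2021-03-09 17:45:00'),
  (3, 'canceled',   '2021-03-08 12:00:00'),
  (4, 'complete',   '2021-02-20 09:00:00');
INSERT INTO items VALUES
  (1, 1, 2, 20.0), (2, 1, 1, 10.0), (3, 1, 5, 50.0), (4, 1, 7, 70.0);
""")

# One join, two aggregates: quantity and revenue come from the same scan.
totals = conn.execute("""
SELECT f.product_id,
       COALESCE(SUM(s.qty_ordered), 0) AS qty_last_7d_from_oos_date,
       COALESCE(SUM(s.amount), 0)      AS rev_last_7d_from_oos_date
FROM oos_base f
LEFT JOIN (
    SELECT i.product_id, i.qty_ordered,
           i.row_amount_minus_discount_order AS amount,
           date(o.created_at_order) AS order_day
    FROM orders o
    JOIN items i ON i.order_id = o.order_id
    WHERE o.order_state_2 IN ('complete', 'processing')
) s ON s.product_id = f.product_id
   AND s.order_day BETWEEN date(f.out_of_stock_at, '-7 days')
                       AND date(f.out_of_stock_at)
GROUP BY f.product_id, f.out_of_stock_at
ORDER BY f.product_id
""").fetchall()

for row in totals:
    print(row)
```

Product 1 picks up only the two qualifying orders inside its 7-day window, and product 2, which has no orders, still appears with zeros because of the LEFT JOIN.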

How do you organize this query by week

Here is my Query so far:
select one.week, total, comeback, round(comeback)::Numeric / total::numeric * 100 as comeback_percent
FROM
(
SELECT count(username) as total, week
FROM
(
select row_number () over (partition by u.id order by creation_date) as row, username, date_trunc ('month', creation_date)::date AS week
FROM users u
left join entries e on u.id = e.user_id
where ((entry_type = 0 and distance >= 1) or (entry_type = 1 and seconds_running >= 600))
) x
where row = 1
group by week
order by week asc
) one
join
(
SELECT count(username) as comeback, week
FROM
(
select row_number () over (partition by u.id order by creation_date) as row, username, runs_completed, date_trunc ('month', creation_date)::date AS week
FROM entries e
left join users u on e.user_id = u.id
where ((entry_type = 0 and distance >= 1) or (entry_type = 1 and seconds_running >= 600))
) y
where runs_completed > 1 and row = 1
group by week
order by week asc
) two
on one.week = two.week
What I want to accomplish is a line graph of users who have completed one run with us, grouped by week, with a percentage for each week of those users who have completed a second run EVER, not just within that week. Our funnel has improved by a factor of 5 since we started, yet the line graph that is produced does not show similar results.
I could be incorrectly joining them together, or there may be a cleaner way to use CTE or window functions to perform this query, I am open to any and all suggestions. Thanks!
If you need tables or further information, let me know. I'm happy to provide anything that may be needed.
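Note that the query above truncates creation_date to 'month' even though the column is called week. A possible shape for the weekly cohort is sketched below: one row per user with the date of the first qualifying run and the total number of qualifying runs, then grouped by the week of that first run. This is only a sketch under assumptions: it runs on SQLite through Python's sqlite3 rather than PostgreSQL, the sample rows are invented, it counts qualifying entries instead of using runs_completed, and `date(x, 'weekday 0', '-6 days')` stands in for date_trunc('week', x), which also starts weeks on Monday.

```python
import sqlite3

# In-memory SQLite stand-in with invented entries (user_id, type, distance, seconds, date).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries (user_id INTEGER, entry_type INTEGER, distance REAL,
                      seconds_running INTEGER, creation_date TEXT);
INSERT INTO entries VALUES
  (1, 0, 2.0,   0, '2021-03-01'),
  (1, 1, 0.0, 900, '2021-03-20'),
  (2, 0, 1.5,   0, '2021-03-03'),
  (2, 0, 0.5,   0, '2021-03-04'),  -- too short, does not count as a run
  (3, 1, 0.0, 600, '2021-03-09'),
  (3, 0, 3.0,   0, '2021-03-10');
""")

cohorts = conn.execute("""
WITH per_user AS (
    -- first qualifying run and lifetime number of qualifying runs per user
    SELECT user_id, MIN(creation_date) AS first_run, COUNT(*) AS run_count
    FROM entries
    WHERE (entry_type = 0 AND distance >= 1)
       OR (entry_type = 1 AND seconds_running >= 600)
    GROUP BY user_id
)
SELECT date(first_run, 'weekday 0', '-6 days') AS week,  -- Monday of that week
       COUNT(*) AS total,
       SUM(CASE WHEN run_count > 1 THEN 1 ELSE 0 END) AS comeback
FROM per_user
GROUP BY week
ORDER BY week
""").fetchall()

# Percentage of each weekly cohort that ever came back for a second run.
report = [(week, total, comeback, round(100.0 * comeback / total, 1))
          for week, total, comeback in cohorts]

for row in report:
    print(row)
```

Because the comeback flag is computed per user over all time, a second run counts no matter which week it happened in, and every user is counted exactly once in the week of their first run.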

SUM(CASE WHEN ...) returns a greater number than COUNT(DISTINCT..)

I have written a query in two forms, but I can't figure out why the second query returns a greater number than the first one. The number returned by the first one, which uses COUNT(DISTINCT ...), is correct:
WITH types(id) AS (VALUES('{1, 4, 5, 3}'::INTEGER[])),
date_gen64 AS
(
SELECT CAST (generate_series(date '10/1/2017', date '11/15/2017', interval
'1 day') AS date) as days ORDER BY days)
SELECT cl.class_date AS c_date,
count(DISTINCT (CASE WHEN co.id = 1 THEN p.id END)),
count(DISTINCT (CASE WHEN co.id = 2 THEN p.id END))
FROM person p
JOIN envelope e ON e.personID = p.id
JOIN "class" cl on cl.id = p.classID
JOIN course co ON co.id = cl.course_id AND co.id = 1
JOIN types ON cr.type_id = ANY (types.id)
RIGHT JOIN date_gen64 dg ON dg.days = cl.class_date
GROUP BY cl.class_date
ORDER BY cl.class_date
The above query returns 26, but the following query returns 27!
I rewrote it with SUM because the first query was too slow.
But my question is: why does the second one count more?
WITH types(id) AS (VALUES('{1, 4, 5, 3}'::INTEGER[]))
SELECT tmpcl.days,
SUM(CASE WHEN tmp80.course_id = 1 THEN 1
ELSE 0 END),
SUM(CASE WHEN tmp80.course_id = 2 THEN 1
ELSE 0 END)
FROM (
SELECT CAST (generate_series(date '10/1/2017', date '11/15/2017',
interval '1 day') AS date) as days ORDER BY days) tmpcl
LEFT JOIN (
SELECT DISTINCT p.id AS "person_id",
cl.class_date AS c_date,
co.id AS "course_id"
FROM person p
JOIN envelope e ON e.personID = p.id
JOIN "class" cl on cl.id = p.classID
JOIN course co ON co.id = cl.course_id
JOIN types ON cr.type_id = ANY (types.id)
WHERE co.id IN ( 1 , 2 )
) tmp80 ON tmpcl.days = tmp80.c_date
GROUP BY tmpcl.days
ORDER BY tmpcl.days
You can theoretically have multiple people enrolled in the same class on the same day. Indeed that would seem to be the main point of having classes. So each time there are multiple people assigned to the same class on the same day you can have a higher count than you would in your first query. Does that make sense?
You don't appear to be using p.id in that inner query so simply remove it and your counts should match.
WITH types(id) AS (VALUES('{1, 4, 5, 3}'::INTEGER[]))
SELECT tmpcl.days,
SUM(CASE WHEN tmp80.course_id = 1 THEN 1
ELSE 0 END),
SUM(CASE WHEN tmp80.course_id = 2 THEN 1
ELSE 0 END)
FROM (
SELECT CAST (generate_series(date '10/1/2017', date '11/15/2017',
interval '1 day') AS date) as days ORDER BY days) tmpcl
LEFT JOIN (
SELECT DISTINCT cl.class_date AS c_date,
co.id AS "course_id"
FROM person p
JOIN envelope e ON e.personID = p.id
JOIN "class" cl on cl.id = p.classID
JOIN course co ON co.id = cl.course_id
JOIN types ON cr.type_id = ANY (types.id)
WHERE co.id IN ( 1 , 2 )
) tmp80 ON tmpcl.days = tmp80.c_date
GROUP BY tmpcl.days
ORDER BY tmpcl.days
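As a general illustration of how the two styles of counting can diverge (not a diagnosis of the data in the question), the sketch below joins a person table to an envelope table in which one person has two envelopes. COUNT(DISTINCT ...) ignores the duplicated join rows, while SUM(CASE ...) adds one per row. It uses Python's sqlite3 with invented, simplified tables.

```python
import sqlite3

# In-memory SQLite stand-in with invented data: person 1 has two envelopes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER, course_id INTEGER);
CREATE TABLE envelope (personID INTEGER);
INSERT INTO person VALUES (1, 1), (2, 1);
INSERT INTO envelope VALUES (1), (1), (2);
""")

# The join produces three rows for two people, so the two counts differ.
distinct_count, row_sum = conn.execute("""
SELECT COUNT(DISTINCT p.id),
       SUM(CASE WHEN p.course_id = 1 THEN 1 ELSE 0 END)
FROM person p
JOIN envelope e ON e.personID = p.id
""").fetchone()

print(distinct_count, row_sum)
```

Whenever a SUM-based count comes out higher than the COUNT(DISTINCT ...) it replaces, it is worth checking which column combination makes the joined rows differ.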

TSQL - COUNT number of rows in a different state than current row

It's kind of hard to explain, but from this example it should be clear.
Table TABLE:
Name State Time
--------------------
A 1 1/4/2012
B 0 1/3/2012
C 0 1/2/2012
D 1 1/1/2012
Would like to
select * from TABLE where state=1 order by Time desc
plus an additional column 'Skipped' containing the number of rows in state 0 that follow each state=1 row, up to the next state=1 row. In other words, the output should look like this:
Name State Time Skipped
A 1 1/4/2012 2 -- 2 rows after A where State != 1
D 1 1/1/2012 0 -- 0 rows after D where State != 1
0 should also be reported when two consecutive rows are in state = 1, i.e. there is nothing between these rows in a state other than 1.
It seems like a CTE is a must here, but I can't figure out how to count the rows where state != 1.
Any help will be appreciated.
(MS Sql Server 2008)
I've used a CTE to establish RowNo, so that you're not dependent on consecutive dates:
WITH CTE_Rows as
(
select name,state,time,
rowno = ROW_NUMBER() over (order by [time])
from MyTable
)
select name,state,time,
gap = isnull(r.rowno - x.rowno - 1,0)
from
CTE_Rows r
outer apply (
select top 1 rowno
from CTE_Rows sub
where sub.rowno < r.rowno and sub.state = 1
order by sub.rowno desc) x
where r.state = 1
If you just want to do it by date, then it's simpler; you just need an outer apply:
select name,state,r.time,
gap = convert(int,isnull(r.time - x.time - 1,0))
from
MyTable r
outer apply (
select top 1 time
from MyTable sub
where sub.time < r.time and sub.state = 1
order by sub.time desc) x
where r.state = 1
FYI, the test data I used was created as follows:
create table MyTable
(Name char(1), [state] tinyint, [Time] datetime)
insert MyTable
values
('E',1,'2012-01-05'),
('A',1,'2012-01-04'),
('B',0,'2012-01-03'),
('C',0,'2012-01-02'),
('D',1,'2012-01-01')
Okay, here you go (it gets a little messy):
SELECT U.CurrentTime,
(SELECT COUNT(*)
FROM StateTable AS T3
WHERE T3.State=0
AND T3.Time BETWEEN U.LastTime AND U.CurrentTime) AS Skipped
FROM (SELECT T1.Time AS CurrentTime,
(SELECT TOP 1 T2.Time
FROM StateTable AS T2
WHERE T2.Time < T1.Time AND T2.State=1
ORDER BY T2.Time DESC) AS LastTime
FROM StateTable AS T1 WHERE T1.State = 1) AS U
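The same counting logic can be tried out quickly on the question's four rows. The sketch below is an approximation in SQLite through Python's sqlite3, not T-SQL: for each state=1 row it counts the rows in another state that lie strictly between it and the previous state=1 row. One assumption differs from the query above: for the earliest state=1 row, any older rows in another state would be counted rather than reported as 0.

```python
import sqlite3

# In-memory SQLite stand-in holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StateTable (Name TEXT, State INTEGER, Time TEXT)")
conn.executemany(
    "INSERT INTO StateTable VALUES (?, ?, ?)",
    [("A", 1, "2012-01-04"), ("B", 0, "2012-01-03"),
     ("C", 0, "2012-01-02"), ("D", 1, "2012-01-01")],
)

# For each state=1 row: count rows with state <> 1 that are older than it
# but newer than the previous state=1 row (empty string if there is none).
rows = conn.execute("""
SELECT cur.Name, cur.Time,
       (SELECT COUNT(*)
        FROM StateTable s
        WHERE s.State <> 1
          AND s.Time < cur.Time
          AND s.Time > COALESCE((SELECT MAX(p.Time)
                                 FROM StateTable p
                                 WHERE p.State = 1 AND p.Time < cur.Time), '')
       ) AS Skipped
FROM StateTable cur
WHERE cur.State = 1
ORDER BY cur.Time DESC
""").fetchall()

for row in rows:
    print(row)
```

On this data A gets 2 (rows B and C) and D gets 0, matching the output requested in the question.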