In a system that handles milk quality data, there is a table for lab results; the setup code further down recreates an example of what the data looks like.
This is all well and good, except that the ProducerID can change over time. For instance, if a milk producer changes processors, a new ProducerID gets assigned to that producer. Unfortunately, this ID-change phenomenon is not within the control of the company using this system.
To track ProducerID changes, this system has a table that records that information; the setup code below recreates an example which corresponds with the lab data above.
The problem is that when a ProducerID change occurs, the ProducerID in the lab results table is updated to the new ProducerID on ALL the rows. I am tasked with adding another column to the lab results table, ProducerID_Corrected, and populating it using the data in the ProducerID changes table to reflect what the correct ProducerID was at the time of each lab result.
However, I am unsure how to go about populating the ProducerID_Corrected column.
Here is what the lab results should look like:
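Based on the sample data in the setup code below and the two recorded ID changes (1451 -> 1539 on 3/7/2017, then 1539 -> 1650 on 3/12/2017), the corrected values work out to:

ProducerID  TestDate   TestType   TestValue  ProducerID_Corrected
------------------------------------------------------------------
1650        3/1/2017   butterfat  4.7        1451
1650        3/6/2017   butterfat  4.1        1451
1650        3/7/2017   butterfat  3.9        1539
1650        3/8/2017   butterfat  4.0        1539
1650        3/10/2017  butterfat  4.5        1539
1650        3/12/2017  butterfat  4.6        1650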
Here is the code to recreate the two tables described above:
select
OldProducerID,
NewProducerID,
EffectiveDate
into #ProducerIDChanges
from
(
values
(1451, 1539, '3/7/2017'),
(1539, 1650, '3/12/2017')
) d (OldProducerID, NewProducerID, EffectiveDate);
select * from #ProducerIDChanges;
select
ProducerID,
TestDate,
TestType,
TestValue,
ProducerID_Corrected = cast(null as int)
into #LabResults
from
(
values
(1650, '3/1/2017', 'butterfat', 4.7),
(1650, '3/6/2017', 'butterfat', 4.1),
(1650, '3/7/2017', 'butterfat', 3.9),
(1650, '3/8/2017', 'butterfat', 4.0),
(1650, '3/10/2017', 'butterfat', 4.5),
(1650, '3/12/2017', 'butterfat', 4.6)
) d (ProducerID, TestDate, TestType, TestValue);
select * from #LabResults;
Here is the solution used for this problem. It yields a #LabResults table that matches the expected results shown above.
--add and populate a column that can be used for joining to #LabResults
alter table #ProducerIDChanges add CurrentProducerID int;
go
while exists (select top 1 1
from #ProducerIDChanges a
inner join #ProducerIDChanges b
on isnull(a.CurrentProducerID, a.NewProducerID) = b.OldProducerID
and cast(a.EffectiveDate as date) < cast(b.EffectiveDate as date))
begin
update a
set a.CurrentProducerID = b.NewProducerID
from #ProducerIDChanges a
inner join #ProducerIDChanges b
on isnull(a.CurrentProducerID, a.NewProducerID) = b.OldProducerID
and cast(a.EffectiveDate as date) < cast(b.EffectiveDate as date)
end;
update #ProducerIDChanges
set CurrentProducerID = NewProducerID
where CurrentProducerID is null;
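--for the sample data, the loop walks the chain 1451 -> 1539 -> 1650, so after this
--step both change rows end up with CurrentProducerID = 1650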
--create a temp table that holds effective and termination dates for each ProducerID
select
CurrentProducerID,
FormerProducerID,
EffectiveDate = isnull(effective, '1/1/1900'),
TerminationDate = isnull(termination, '1/1/3000')
into #ProducerIDsWithDates
from
(
select CurrentProducerID, FormerProducerID = OldProducerID, DateType = 'termination', DateValue = cast(dateadd(d, -1, EffectiveDate) as date) from #ProducerIDChanges
union all
select CurrentProducerID, FormerProducerID = NewProducerID, DateType = 'effective', DateValue = EffectiveDate from #ProducerIDChanges
) d
pivot (min(DateValue) for DateType in (effective, termination)) p;
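--for the sample data, #ProducerIDsWithDates now holds one validity window per historical ID:
--  CurrentProducerID  FormerProducerID  EffectiveDate  TerminationDate
--  1650               1451              1900-01-01     2017-03-06
--  1650               1539              2017-03-07     2017-03-11
--  1650               1650              2017-03-12     3000-01-01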
--update ProducerID_Corrected column in #LabResults table
update l
set l.ProducerID_Corrected = d.FormerProducerID
from #LabResults l
inner join #ProducerIDsWithDates d
on l.ProducerID = d.CurrentProducerID
and l.TestDate between d.EffectiveDate and d.TerminationDate;
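A quick sanity check is to select the table back out and compare against the expected results shown earlier:
--verify the corrected IDs
select ProducerID, TestDate, TestType, TestValue, ProducerID_Corrected
from #LabResults
order by cast(TestDate as date);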
This statement works in SQLite, but not in Postgres:
SELECT A.*, B.*
FROM Readings A
LEFT JOIN Offsets B ON A.MeterNum = B.MeterNo AND A.DateTime > B.TimeDate
WHERE A.MeterNum = 1
GROUP BY A.DateTime
ORDER BY A.DateTime DESC
The Readings table contains electric submeter readings, each with a date stamp. The Offsets table holds an adjustment that the user enters after a failed meter is replaced with a new one that starts again at zero. Without the GROUP BY, the query returns a row for each meter reading combined with every prior adjustment made before the reading date, while I only want the last adjustment.
All the docs I've seen on GROUP BY for Postgres indicate I should be including an aggregate function, which I don't need and can't use (the Reading column contains the Modbus string returned from the meter).
Just pick the latest offset per meter in a derived table. In Postgres this can be done quite efficiently using distinct on ():
SELECT A.*, B.*
FROM readings A
left join (
select distinct on (meterno) o.*
from offsets o
order by o.meterno, o.timedate desc
) B ON A.MeterNum = B.MeterNo AND A.DateTime > B.TimeDate
WHERE A.meternum = 1
ORDER BY A.DateTime DESC
distinct on (meterno) returns only one row per meterno, and that is the "latest" row thanks to the order by ..., timedate desc.
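If distinct on is unfamiliar, here is a tiny self-contained sketch (the values are made up purely for illustration):
select distinct on (meterno) meterno, timedate, offset_value
from (values
        (1, timestamp '2019-01-01 00:00', 10),
        (1, timestamp '2019-03-01 00:00', 25),
        (2, timestamp '2019-02-01 00:00', 5)
     ) as o(meterno, timedate, offset_value)
order by meterno, timedate desc;
-- returns one row per meterno: (1, 2019-03-01, 25) and (2, 2019-02-01, 5)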
The query might even be faster by pushing the condition on datetime > timedate into the derived table using a lateral join:
SELECT A.*, B.*
FROM readings A
left join lateral (
select distinct on (meterno) o.*
from offsets o
where a.datetime > o.timedate
order by o.meterno, o.timedate desc
) B ON A.MeterNum = B.MeterNo
WHERE A.meternum = 1
ORDER BY A.DateTime DESC
SELECT
CAST(c.DT AS DATE) AS 'Date'
, COUNT(p.PatternID) AS 'Count'
FROM CalendarMain c
LEFT OUTER JOIN Pattern p
ON c.DT = p.PatternDate
INNER JOIN Result r
ON p.PatternID = r.PatternID
INNER JOIN Detail d
ON p.PatternID = d.PatternID
WHERE r.Type = 7
AND d.Panel = 501
AND CAST(c.DT AS DATE)
BETWEEN '20190101' AND '20190201'
GROUP BY CAST(c.DT AS DATE)
ORDER BY CAST(c.DT AS DATE)
The query above isn't working for me. It still skips the days where the count is zero for that c.DT.
c.DT and p.PatternDate are both of type DateTime, although c.DT can't be NULL; it is actually the PK for the table. It is populated with a DateTime for every single day from 2015 to 2049, so the records for those days exist.
Another weird thing I noticed is that nothing returns at all when I join C.DT = p.PatternDate without a CAST or CONVERT to a Date style. Not sure why when they are both DateTimes.
There are a few things to talk about here. At this stage it's not clear what you're actually trying to count. If it's the number of "patterns" per day for the month of Jan 2019, then:
Your BETWEEN will also count any activity occurring on 1 Feb,
It looks like one pattern could have multiple results, potentially causing a miscount
It looks like one pattern could have multiple details, potentially causing a miscount
If one pattern has say 3 eligible results, and also 4 details, you'll get the cross product of them. Your count will be 12.
I'm going to assume:
you only want the distinct number of patterns, regardless of the number of details and results.
You only want January's activity
--Set up some dummy data
DROP TABLE IF EXISTS #CalendarMain
SELECT cast('20190101' as datetime) as DT
INTO #CalendarMain
UNION ALL SELECT '20190102' as DT
UNION ALL SELECT '20190103' as DT
UNION ALL SELECT '20190104' as DT
UNION ALL SELECT '20190105' as DT --etc etc
;
DROP TABLE IF EXISTS #Pattern
SELECT cast('1'as int) as PatternID
,cast('20190101 13:00' as datetime) as PatternDate
INTO #Pattern
UNION ALL SELECT 2,'20190101 14:00'
UNION ALL SELECT 3,'20190101 15:00'
UNION ALL SELECT 4,'20190104 11:00'
UNION ALL SELECT 5,'20190104 14:00'
;
DROP TABLE IF EXISTS #Result
SELECT cast(100 as int) as ResultID
,cast(1 as int) as PatternID
,cast(7 as int) as [Type]
INTO #Result
UNION ALL SELECT 101,1,7
UNION ALL SELECT 102,1,8
UNION ALL SELECT 103,1,9
UNION ALL SELECT 104,2,8
UNION ALL SELECT 105,2,7
UNION ALL SELECT 106,3,7
UNION ALL SELECT 107,3,8
UNION ALL SELECT 108,4,7
UNION ALL SELECT 109,5,7
UNION ALL SELECT 110,5,8
;
DROP TABLE IF EXISTS #Detail
SELECT cast(201 as int) as DetailID
,cast(1 as int) as PatternID
,cast(501 as int) as Panel
INTO #Detail
UNION ALL SELECT 202,1,502
UNION ALL SELECT 203,1,503
UNION ALL SELECT 204,1,502
UNION ALL SELECT 205,1,502
UNION ALL SELECT 206,1,502
UNION ALL SELECT 207,2,502
UNION ALL SELECT 208,2,503
UNION ALL SELECT 209,2,502
UNION ALL SELECT 210,4,502
UNION ALL SELECT 211,4,501
;
-- create some variables
DECLARE @start_date as date = '20190101';
DECLARE @end_date as date = '20190201'; --I assume this is an exclusive end date
SELECT cal.DT
,isnull(patterns.[count],0) as [Count]
FROM #CalendarMain cal
LEFT JOIN ( SELECT cast(p.PatternDate as date) as PatternDate
,COUNT(DISTINCT p.PatternID) as [Count]
FROM #Pattern p
JOIN #Result r ON p.PatternID = r.PatternID
JOIN #Detail d ON p.PatternID = d.PatternID
WHERE r.[Type] = 7
and d.Panel = 501
GROUP BY cast(p.PatternDate as date)
) patterns ON cal.DT = patterns.patternDate
WHERE cal.DT >= @start_date
and cal.DT < @end_date --Your code would have included 1 Feb, which I assume was unintentional.
ORDER BY cal.DT
;
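For reference, with the dummy data above only patterns 1 and 4 have both a Type 7 result and a Panel 501 detail, so the output should look something like:

DT                      Count
-----------------------------
2019-01-01 00:00:00.000 1
2019-01-02 00:00:00.000 0
2019-01-03 00:00:00.000 0
2019-01-04 00:00:00.000 1
2019-01-05 00:00:00.000 0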
The code is roughly like this:
WITH cte AS
(
SELECT TOP 4 id, due_date, [check]
FROM table_a a
INNER JOIN table_b b ON a.linkid = b.linkid
WHERE
b.status = 1
AND due_date > GetDate()
ORDER BY due_date, id
)
UPDATE cte
SET [check] = 1
OUTPUT
INSERTED.id,
INSERTED.due_date
Note: the actual data has rows with the same due_date.
When I run just the SELECT statement inside the cte, I get the result in order, e.g. 1, 2, 3, 4.
But after the UPDATE statement, the OUTPUT results come back as 4, 1, 2, 3.
Why is this order change happening?
How can I keep or re-order the results back to 1, 2, 3, 4 in this same single query?
On MSDN (https://msdn.microsoft.com/pl-pl/library/ms177564(v=sql.110).aspx) you can read that:
There is no guarantee that the order in which the changes are applied
to the table and the order in which the rows are inserted into the
output table or table variable will correspond.
That means you can't solve your problem with only one query, but you can still use one batch to do what you need. Because the output doesn't guarantee order, you have to save it into a table variable and order it after the update. This code will return the output values in the order you expect:
declare @outputTable table( id int, due_date date);
with cte as (
select top 4 id, due_date, [check]
from table_a a
inner join table_b b on a.linkid = b.linkid
where b.status = 1
and due_date > GetDate()
order by due_date, id
)
update cte
set [check] = 1
output inserted.id, inserted.due_date
into @outputTable;
select *
from @outputTable
order by due_date, id;
I have two tables which I need to combine, as some dates are found in table A and not in table B, and vice versa. My desired result is that overlapping or consecutive date ranges be combined.
I'm using PostgreSQL.
Table A
id startdate enddate
--------------------------
101 12/28/2013 12/31/2013
Table B
id startdate enddate
--------------------------
101 12/15/2013 12/15/2013
101 12/16/2013 12/16/2013
101 12/28/2013 12/28/2013
101 12/29/2013 12/31/2013
Desired Result
id startdate enddate
-------------------------
101 12/15/2013 12/16/2013
101 12/28/2013 12/31/2013
Right. I have a query that I think works. It certainly works on the sample records you provided. It uses a recursive CTE.
First, you need to merge the two tables. Next, use a recursive CTE to get the sequences of overlapping dates. Finally, get the start and end dates, and join back to the "merged" table to get the id.
with recursive allrecords as -- this merges the input tables. Add a unique row identifier
(
select *, row_number() over (ORDER BY startdate) as rowid from
(select * from table1
UNION
select * from table2) a
),
path as ( -- the recursive CTE. This gets the sequences
select rowid as parent,rowid,startdate,enddate from allrecords a
union
select p.parent,b.rowid,b.startdate,b.enddate from allrecords b join path p on (p.enddate + interval '1 day')>=b.startdate and p.startdate <= b.startdate
)
SELECT id,g.startdate,g.enddate FROM -- outer query to get the id
-- inner query to get the start and end of each sequence
(select parent,min(startdate) as startdate, max(enddate) as enddate from
(
select *, row_number() OVER (partition by rowid order by parent,startdate) as row_number from path
) a
where row_number = 1 -- We only want the first occurrence of each record
group by parent)g
INNER JOIN allrecords a on a.rowid = parent
The fragment below does what you intend (but it will probably be very slow). The problem is that detecting (non-)overlapping date ranges is impossible with standard range operators, since a range could be split into two parts.
So, my code does the following:
split the date ranges from table_A into atomic records, with one date per record
[the same for table_b]
full join these two tables (we are only interested in A_not_in_B and B_not_in_A), remembering which of the L/R outer join wings each row came from
re-aggregate the resulting records into date ranges.
-- EXPLAIN ANALYZE
--
WITH RECURSIVE ranges AS (
-- Chop up the a-table into atomic date units
WITH ar AS (
SELECT generate_series(a.startdate,a.enddate , '1day'::interval)::date AS thedate
, 'A'::text AS which
, a.id
FROM a
)
-- Same for the b-table
, br AS (
SELECT generate_series(b.startdate,b.enddate, '1day'::interval)::date AS thedate
, 'B'::text AS which
, b.id
FROM b
)
-- combine the two sets, retaining a_not_in_b plus b_not_in_a
, moments AS (
SELECT COALESCE(ar.id,br.id) AS id
, COALESCE(ar.which, br.which) AS which
, COALESCE(ar.thedate, br.thedate) AS thedate
FROM ar
FULL JOIN br ON br.id = ar.id AND br.thedate = ar.thedate
WHERE ar.id IS NULL OR br.id IS NULL
)
-- use a recursive CTE to re-aggregate the atomic moments into ranges
SELECT m0.id, m0.which
, m0.thedate AS startdate
, m0.thedate AS enddate
FROM moments m0
WHERE NOT EXISTS ( SELECT * FROM moments nx WHERE nx.id = m0.id AND nx.which = m0.which
AND nx.thedate = m0.thedate -1
)
UNION ALL
SELECT rr.id, rr.which
, rr.startdate AS startdate
, m1.thedate AS enddate
FROM ranges rr
JOIN moments m1 ON m1.id = rr.id AND m1.which = rr.which AND m1.thedate = rr.enddate +1
)
SELECT * FROM ranges ra
WHERE NOT EXISTS (SELECT * FROM ranges nx
-- suppress partial subassemblies
WHERE nx.id = ra.id AND nx.which = ra.which
AND nx.startdate = ra.startdate
AND nx.enddate > ra.enddate
)
;
I have created a partitioned table which is partitioned based on the year of a certain field:
e.g. partition 1 - Year 2011 |
partition 2 - Year 2012 |
partition 3 - Year 2013
How can I show the max date for each partition?
e.g. partition 1 - 2011/12/15 |
partition 2 - 2012/12/25 |
partition 3 - 2013/12/16
Just group the records by the year of the field you used to partition the table. More specifically, you can use this:
select year(my_date_field), max(my_date_field)
from my_partitioned_table
group by year(my_date_field)
order by year(my_date_field)
The order by above is not required, but gives you a nicely ordered result set.
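With boundaries like the example in the question, that would return something along the lines of:

year  max
------------------
2011  2011-12-15
2012  2012-12-25
2013  2013-12-16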
Use the max() function (Example):
Select max(y1) y1, max(y2) y2, max(y3) y3
From t
Let me try again. I think the query below will do the trick:
SELECT year(cast(rv.value as date)) _year,
p.partition_number,
max_date
FROM sys.partitions p
JOIN sys.indexes i
ON (p.object_id = i.object_id AND p.index_id = i.index_id)
JOIN sys.partition_schemes ps
ON (ps.data_space_id = i.data_space_id)
JOIN sys.partition_functions f
ON (f.function_id = ps.function_id)
LEFT JOIN sys.partition_range_values rv
ON (f.function_id = rv.function_id AND p.partition_number = rv.boundary_id)
JOIN sys.destination_data_spaces dds
ON (dds.partition_scheme_id = ps.data_space_id
AND dds.destination_id = p.partition_number)
JOIN sys.filegroups fg
ON (dds.data_space_id = fg.data_space_id)
INNER JOIN (
select year([partitioncol]) year,
max([partitioncol]) max_date
from TABLE1 group by year([partitioncol])
) table1
ON (table1.year=year(cast(rv.value as date)))
WHERE i.index_id < 2
AND i.object_id = Object_Id('TABLE1')
With my test data (which I will explain how to set up below), I got these results:
_year partition_number max_date
2011 1 2011-12-31 08:00:16.920
2012 2 2012-12-31 08:00:13.397
2013 3 2013-10-02 08:00:10.660
The key system table here is sys.partition_range_values. It gives us the boundary values for each partition, which we join with the aggregation query that returns the max dates per year.
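To get a feel for what sys.partition_range_values contains, a standalone query like this (independent of the main query above) lists just the boundary values per partition function:
select f.name as partition_function,
       rv.boundary_id,
       rv.value as boundary_value
from sys.partition_functions f
join sys.partition_range_values rv
  on rv.function_id = f.function_id
order by f.name, rv.boundary_id;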
To reproduce my results, follow the instructions in the link below (it's a post about creating partitioned tables). That will create and populate TABLE1.
http://www.mssqltips.com/sqlservertip/2888/how-to-partition-an-existing-sql-server-table/
Then run the query at the beginning of my post. I adapted it from a query presented at the link below:
http://www.sqlsuperfast.com/post/2011/02/22/T-SQL-Get-Partition-Details.aspx