Is it possible to merge pairs of rows with similar (<500 ms apart) datetime values, and leave the other rows as they are?
I have the following table of events:
ID DateTime FileName Event
=================================================
001 2011-04-04 12:30:15.000 File_A Deleted
002 2011-04-04 15:30:37.000 File_A Created
003 2011-04-05 08:30:25.000 File_A Deleted
004 2011-04-05 08:30:25.050 File_A Created
If a Deleted event and a Created event fall within a timespan of 500 ms, the query should merge these two rows into one and set Event to "Modified".
Result should be:
DateTime FileName Event
============================================
2011-04-04 12:30:15.000 File_A Deleted
2011-04-04 15:30:37.000 File_A Created
2011-04-05 08:30:25.000 File_A Modified
Thanks in advance..
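SELECT t1.DateTime, t1.FileName, 'Modified' as Event
FROM Table t1
JOIN Table t2 ON t2.FileName = t1.FileName
AND t1.Event = 'Deleted' AND t2.Event = 'Created'
-- Created must follow Deleted by less than 500 ms (a bare DATEDIFF < 500 also matches negative differences)
AND t2.DateTime >= t1.DateTime
AND t2.DateTime < DATEADD(millisecond, 500, t1.DateTime)
UNION ALL
SELECT t3.DateTime, t3.FileName, t3.Event
FROM Table t3
WHERE NOT EXISTS(SELECT 1
FROM Table t4
JOIN Table t5 ON t5.FileName = t4.FileName
AND t4.Event = 'Deleted' AND t5.Event = 'Created'
AND t5.DateTime >= t4.DateTime
AND t5.DateTime < DATEADD(millisecond, 500, t4.DateTime)
AND t3.ID IN (t4.ID, t5.ID)
)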
...okay, there has to be a better way to pull the non-merged results back. But this should work?
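For comparison, the same pairing rule can be sketched outside the database. This is only a rough illustration in Python, not part of the answer above: the function name merge_events and the (datetime, filename, event) tuple layout are assumptions, and it only pairs a Deleted row with the row immediately following it for the same file.

```python
from datetime import datetime, timedelta


def merge_events(events, window=timedelta(milliseconds=500)):
    """Collapse a Deleted row followed by a Created row for the same file
    within `window` into one Modified row; keep every other row as it is.

    events: iterable of (timestamp, filename, event) tuples (assumed layout).
    """
    # Sort by file, then time, so the two rows of a pair end up adjacent.
    rows = sorted(events, key=lambda e: (e[1], e[0]))
    merged = []
    i = 0
    while i < len(rows):
        ts, name, event = rows[i]
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        if (event == "Deleted" and nxt is not None
                and nxt[1] == name and nxt[2] == "Created"
                and nxt[0] - ts < window):
            merged.append((ts, name, "Modified"))
            i += 2  # consume both rows of the pair
        else:
            merged.append((ts, name, event))
            i += 1
    return sorted(merged)  # back to chronological order


# The sample rows from the question.
events = [
    (datetime(2011, 4, 4, 12, 30, 15), "File_A", "Deleted"),
    (datetime(2011, 4, 4, 15, 30, 37), "File_A", "Created"),
    (datetime(2011, 4, 5, 8, 30, 25), "File_A", "Deleted"),
    (datetime(2011, 4, 5, 8, 30, 25, 50000), "File_A", "Created"),
]
result = merge_events(events)
```

With the sample rows this should leave the first two events untouched and turn the last two into a single Modified row stamped with the Deleted time.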
Try this:
select cast(DATEPART(hour, DateColumn) as varchar(2)) + ':' + cast(DATEPART(minute, DateColumn) as varchar(2)) + ':' + cast(DATEPART(second, DateColumn) as varchar(2)) as DateTime,
max(filename) as FileName, Max(event) as event
from table
group by
cast(DATEPART(hour, DateColumn) as varchar(2)) + ':' + cast(DATEPART(minute, DateColumn) as varchar(2)) + ':' + cast(DATEPART(second, DateColumn) as varchar(2))
I have a table in Postgres which looks like below:
CREATE TABLE my_features
(
id uuid NOT NULL,
feature_id uuid NOT NULL,
begin_time timestamptz NOT NULL,
duration integer NOT NULL
)
For each feature_id there may be multiple rows with time ranges specified by begin_time .. (begin_time + duration). duration is in milliseconds. They may overlap. I'm looking for a fast way to find all feature_ids that have any overlaps.
I have referred to this question - Query Overlapping time range - which is similar but works with a fixed end time.
I have tried the below query but it is throwing an error.
Query:
select c1.*
from my_features c1
where exists (select 1
from my_features c2
where tsrange(c2.begin_time, c2.begin_time + '30 minutes'::INTERVAL, '[]') && tsrange(c1.begin_time, c1.begin_time + '30 minutes'::INTERVAL, '[]')
and c2.feature_id = c1.feature_id
and c2.id <> c1.id);
Error:
ERROR: function tsrange(timestamp with time zone, timestamp with time zone, unknown) does not exist
LINE 5: where tsrange(c2.begin_time, c2.begin_time...
I have used a default time interval here because I did not understand how to convert the time into minutes and substitute it with 'n minutes'.
If you need a solution faster than O(n²), then you can use constraints on ranges with btree_gist extension, possibly on a temporary table:
CREATE TEMPORARY TABLE my_features_ranges (
id uuid NOT NULL,
feature_id uuid NOT NULL,
range tstzrange NOT NULL,
EXCLUDE USING GIST (feature_id WITH =, range WITH &&)
);
INSERT INTO my_features_ranges (id, feature_id, range)
select id, feature_id, tstzrange(begin_time, begin_time+duration*'1ms'::interval)
from my_features
on conflict do nothing;
select id from my_features except select id from my_features_ranges;
Using OVERLAPS predicate:
SELECT * -- DISTINCT f1.*
FROM my_features f1
JOIN my_features f2
ON f1.feature_id = f2.feature_id
AND f1.id <> f2.id
AND (f1.begin_time, f1.begin_time + f1.duration * INTERVAL '1 millisecond')
OVERLAPS (f2.begin_time, f2.begin_time + f2.duration * INTERVAL '1 millisecond');
db<>fiddle demo
Or try this
select c1.*
from jak.my_features c1
where exists (select 1
from jak.my_features c2
where tsrange(c2.begin_time::date, c2.begin_time::date + '30 minutes'::INTERVAL, '[]') && tsrange(c1.begin_time::date, c1.begin_time::date + '30 minutes'::INTERVAL, '[]') and
c2.feature_id = c1.feature_id
and c2.id <> c1.id);
The problem was that I was using tsrange on a timestamp with time zone column; for timestamp with time zone there is a separate function called tstzrange.
Below worked for me:
EDIT: Added changes suggested by @a_horse_with_no_name
select c1.*
from my_features c1
where exists (select 1
from my_features c2
where tstzrange(c2.begin_time, c2.begin_time + make_interval(secs => c2.duration / 1000.0), '[]') && tstzrange(c1.begin_time, c1.begin_time + make_interval(secs => c1.duration / 1000.0), '[]')
and c2.feature_id = c1.feature_id
and c2.id <> c1.id);
However, the part of calculating interval dynamically is still pending
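As a cross-check for the queries above, here is a minimal Python sketch of the same overlap test done in O(n log n) per feature by sorting the ranges. The function name and the (feature_id, begin_time, duration_ms) tuple layout are assumptions for illustration; it treats ranges as closed, like the '[]' bounds above, and assumes durations are not negative.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def features_with_overlaps(rows):
    """rows: iterable of (feature_id, begin_time, duration_ms) (assumed layout).
    Returns the set of feature_ids that have at least two overlapping ranges."""
    by_feature = defaultdict(list)
    for feature_id, begin, duration_ms in rows:
        end = begin + timedelta(milliseconds=duration_ms)
        by_feature[feature_id].append((begin, end))

    overlapping = set()
    for feature_id, ranges in by_feature.items():
        ranges.sort()  # by begin time
        latest_end = ranges[0][1]
        for begin, end in ranges[1:]:
            # Closed ranges: a range starting exactly at latest_end counts.
            if begin <= latest_end:
                overlapping.add(feature_id)
                break
            latest_end = max(latest_end, end)
    return overlapping


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
rows = [
    ("f1", t0, 60_000),                          # 00:00:00 .. 00:01:00
    ("f1", t0 + timedelta(seconds=30), 60_000),  # overlaps the row above
    ("f2", t0, 1_000),
    ("f2", t0 + timedelta(seconds=5), 1_000),    # no overlap
]
found = features_with_overlaps(rows)
```

Here only "f1" should be reported, since its second range starts before the first one ends.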
I have a table punches that looks like this
EMP_ID INpunchDATETIME OUTpunchDATETIME
-----------------------------------------------
1 2017-11-10 11:59 2017-11-10 13:30
1 2017-11-10 9:00 2017-11-10 10:30
I need to create a table #temptable from the previous table that looks like this
Emp_ID InPunch1 InPunch2 OUTpunch1 OUTpunch2
----------------------------------------------------------------------------
1 2017-11-10 9:00 2017-11-10 11:59 2017-11-10 10:30 2017-11-10 13:30
I'm trying to use PIVOT but if that's wrong I can change
DECLARE @temptable TABLE (
EMP_ID int,
InPunch1 datetime,
InPunch2 datetime,
OutPunch1 datetime,
OutPunch2 datetime);
SELECT
Emp_ID, InPunch1, InPunch2, Outpunch1, Outpunch2
INTO
#temptable
FROM
(SELECT
EMP_ID, INPunchDATETIME, OUTpunchDATETIME
FROM
punches) AS p
PIVOT
(
That's as far as I've got.
Sample Data Setup
create table dbo.punches
(
emp_id int
, INpunchDATETIME datetime
, OUTpunchDATETIME datetime
)
insert into dbo.punches
values (1, '2017-11-10 11:59','2017-11-10 13:30')
, (1, '2017-11-10 9:00','2017-11-10 10:30')
Answer
The punches table has the in/out punches in two separate columns, so the innermost query moves both types of punches into one column to allow all of the data to be pivoted at once. The next query puts them in chronological order and creates the values in punch_ind that will become the column names. The last step is to pivot the data and select the final output.
select post.emp_id
, post.InPunch1
, post.InPunch2
, post.OutPunch1
, post.OutPunch2
from (
--decide which punch is in1/in2/etc.
select sub.emp_id
, sub.punch_type + 'Punch' + cast(row_number() over (partition by sub.emp_id, sub.punch_type order by sub.punch_ts) as varchar(10)) as punch_ind --punch indicator
, sub.punch_ts
from (
--get all of the data in one column to enable pivot
select p.emp_id
, 'In' as punch_type
, p.INpunchDATETIME as punch_ts
from dbo.punches as p
union all
select p.emp_id
, 'Out' as punch_type
, p.OUTpunchDATETIME as punch_ts
from dbo.punches as p
) as sub
) as pre --before the pivot
pivot (max(pre.punch_ts) for pre.punch_ind in ([InPunch1], [InPunch2], [OutPunch1], [OutPunch2])) as post --after the pivot
Just take this final output and insert the records into the temp table/table variable of your choice.
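If it helps to see the three steps (unpivot, number, pivot) without the PIVOT syntax, here is a rough Python sketch of the same idea. The function name pivot_punches and the (emp_id, in_ts, out_ts) tuple layout are assumptions made only for this illustration.

```python
from collections import defaultdict
from datetime import datetime


def pivot_punches(punches):
    """punches: iterable of (emp_id, in_ts, out_ts) (assumed layout).
    Returns {emp_id: {"InPunch1": ..., "OutPunch1": ..., ...}}."""
    # Step 1: unpivot both columns into (emp_id, punch_type) -> [timestamps].
    stacked = defaultdict(list)
    for emp_id, in_ts, out_ts in punches:
        stacked[(emp_id, "In")].append(in_ts)
        stacked[(emp_id, "Out")].append(out_ts)

    # Steps 2 and 3: number each punch chronologically within its type and
    # use "<type>Punch<n>" as the output column name.
    result = defaultdict(dict)
    for (emp_id, punch_type), stamps in stacked.items():
        for n, ts in enumerate(sorted(stamps), start=1):
            result[emp_id][f"{punch_type}Punch{n}"] = ts
    return dict(result)


# The two sample rows from the question.
punches = [
    (1, datetime(2017, 11, 10, 11, 59), datetime(2017, 11, 10, 13, 30)),
    (1, datetime(2017, 11, 10, 9, 0), datetime(2017, 11, 10, 10, 30)),
]
wide = pivot_punches(punches)
```

For employee 1 this should give the 9:00 punch as InPunch1 and the 11:59 punch as InPunch2, matching the desired output in the question.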
I have two tables which I need to combine, as some dates are found in table A and not in table B and vice versa. My desired result is that ranges which overlap or fall on consecutive days are combined.
I'm using PostgreSQL.
Table A
id startdate enddate
--------------------------
101 12/28/2013 12/31/2013
Table B
id startdate enddate
--------------------------
101 12/15/2013 12/15/2013
101 12/16/2013 12/16/2013
101 12/28/2013 12/28/2013
101 12/29/2013 12/31/2013
Desired Result
id startdate enddate
-------------------------
101 12/15/2013 12/16/2013
101 12/28/2013 12/31/2013
Right. I have a query that I think works. It certainly works on the sample records you provided. It uses a recursive CTE.
First, you need to merge the two tables. Next, use a recursive CTE to get the sequences of overlapping dates. Finally, get the start and end dates, and join back to the "merged" table to get the id.
with recursive allrecords as -- this merges the input tables. Add a unique row identifier
(
select *, row_number() over (ORDER BY startdate) as rowid from
(select * from table1
UNION
select * from table2) a
),
path as ( -- the recursive CTE. This gets the sequences
select rowid as parent,rowid,startdate,enddate from allrecords a
union
select p.parent,b.rowid,b.startdate,b.enddate from allrecords b join path p on (p.enddate + interval '1 day')>=b.startdate and p.startdate <= b.startdate
)
SELECT id,g.startdate,g.enddate FROM -- outer query to get the id
-- inner query to get the start and end of each sequence
(select parent,min(startdate) as startdate, max(enddate) as enddate from
(
select *, row_number() OVER (partition by rowid order by parent,startdate) as row_number from path
) a
where row_number = 1 -- We only want the first occurrence of each record
group by parent)g
INNER JOIN allrecords a on a.rowid = parent
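To make the intended result easier to check, here is a small Python sketch of the same merge done as a single sorted pass rather than a recursive CTE. It is a different technique from the query above, and the function name and the (id, startdate, enddate) tuple layout are assumptions for illustration.

```python
from datetime import date, timedelta


def merge_ranges(rows):
    """rows: iterable of (id, startdate, enddate) with inclusive dates.
    Ranges for the same id that overlap or fall on consecutive days are merged."""
    merged = []
    # set() plays the role of UNION (drops exact duplicates); sorting puts
    # ranges for the same id in start-date order.
    for rid, start, end in sorted(set(rows)):
        last = merged[-1] if merged else None
        if last and last[0] == rid and start <= last[2] + timedelta(days=1):
            # Overlapping or adjacent: extend the open range.
            merged[-1] = (rid, last[1], max(last[2], end))
        else:
            merged.append((rid, start, end))
    return merged


table_a = [(101, date(2013, 12, 28), date(2013, 12, 31))]
table_b = [
    (101, date(2013, 12, 15), date(2013, 12, 15)),
    (101, date(2013, 12, 16), date(2013, 12, 16)),
    (101, date(2013, 12, 28), date(2013, 12, 28)),
    (101, date(2013, 12, 29), date(2013, 12, 31)),
]
combined = merge_ranges(table_a + table_b)
```

On the sample data this should yield the two ranges from the desired result, 12/15 to 12/16 and 12/28 to 12/31.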
The fragment below does what you intend (but it will probably be very slow). The problem is that detecting (non-)overlapping date ranges is impossible with standard range operators, since a range could be split into two parts.
So, my code does the following:
split the date ranges from table_A into atomic records, with one date per record
[the same for table_b]
full outer join these two sets, keeping only the unmatched rows (A_not_in_B and B_not_in_A) and remembering which side of the join each row came from
re-aggregate the resulting records into date ranges.
-- EXPLAIN ANALYZE
--
WITH RECURSIVE ranges AS (
-- Chop up the a-table into atomic date units
WITH ar AS (
SELECT generate_series(a.startdate,a.enddate , '1day'::interval)::date AS thedate
, 'A'::text AS which
, a.id
FROM a
)
-- Same for the b-table
, br AS (
SELECT generate_series(b.startdate,b.enddate, '1day'::interval)::date AS thedate
, 'B'::text AS which
, b.id
FROM b
)
-- combine the two sets, retaining a_not_in_b plus b_not_in_a
, moments AS (
SELECT COALESCE(ar.id,br.id) AS id
, COALESCE(ar.which, br.which) AS which
, COALESCE(ar.thedate, br.thedate) AS thedate
FROM ar
FULL JOIN br ON br.id = ar.id AND br.thedate = ar.thedate
WHERE ar.id IS NULL OR br.id IS NULL
)
-- use a recursive CTE to re-aggregate the atomic moments into ranges
SELECT m0.id, m0.which
, m0.thedate AS startdate
, m0.thedate AS enddate
FROM moments m0
WHERE NOT EXISTS ( SELECT * FROM moments nx WHERE nx.id = m0.id AND nx.which = m0.which
AND nx.thedate = m0.thedate -1
)
UNION ALL
SELECT rr.id, rr.which
, rr.startdate AS startdate
, m1.thedate AS enddate
FROM ranges rr
JOIN moments m1 ON m1.id = rr.id AND m1.which = rr.which AND m1.thedate = rr.enddate +1
)
SELECT * FROM ranges ra
WHERE NOT EXISTS (SELECT * FROM ranges nx
-- suppress partial subassemblies
WHERE nx.id = ra.id AND nx.which = ra.which
AND nx.startdate = ra.startdate
AND nx.enddate > ra.enddate
)
;
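The same steps (split into days, keep the days found on only one side, re-aggregate) can be sketched in Python to check what the query is expected to return. The function names and tuple layouts here are assumptions for illustration only.

```python
from datetime import date, timedelta


def days(rows, which):
    """Expand (id, startdate, enddate) rows into a set of (id, which, day)."""
    out = set()
    for rid, start, end in rows:
        for offset in range((end - start).days + 1):
            out.add((rid, which, start + timedelta(days=offset)))
    return out


def unmatched_ranges(table_a, table_b):
    a, b = days(table_a, "A"), days(table_b, "B")
    a_keys = {(rid, d) for rid, _, d in a}
    b_keys = {(rid, d) for rid, _, d in b}
    # Keep only days present on one side (A-not-in-B plus B-not-in-A).
    moments = sorted(
        [m for m in a if (m[0], m[2]) not in b_keys]
        + [m for m in b if (m[0], m[2]) not in a_keys]
    )
    # Re-aggregate consecutive days into (id, which, startdate, enddate).
    ranges = []
    for rid, which, day in moments:
        last = ranges[-1] if ranges else None
        if last and last[:2] == (rid, which) and day == last[3] + timedelta(days=1):
            ranges[-1] = (rid, which, last[2], day)
        else:
            ranges.append((rid, which, day, day))
    return ranges


table_a = [(101, date(2013, 12, 28), date(2013, 12, 31))]
table_b = [
    (101, date(2013, 12, 15), date(2013, 12, 15)),
    (101, date(2013, 12, 16), date(2013, 12, 16)),
    (101, date(2013, 12, 28), date(2013, 12, 28)),
    (101, date(2013, 12, 29), date(2013, 12, 31)),
]
only_one_side = unmatched_ranges(table_a, table_b)
```

With the sample tables, only 12/15 and 12/16 exist on one side (table B), so the result should be a single B range covering those two days.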
I'm attempting to convert data from two columns (one with text and one with numbers) to a range.
I've searched and been unable to find something that works for this:
Table:
ColumnA Nvarchar(50)
ColumnB Int
Table Sample:
ColumnA ColumnB
AA 1
AA 2
AA 3
AA 4
AA 5
AB 1
AB 2
AB 3
AB 4
Desired Output:
AA:1-5, AB:1-4
Any help would be greatly appreciated
Note I am assuming the reason you're asking the question is that you can have broken ranges and you're not simply looking for the min/max ColumnB for each ColumnA.
If you ask me, this type of thing is probably best handled in code on either an intermediate layer or directly in your presentation layer. Sort the rows by (ColumnA, ColumnB) in your query, then you can get the desired results in a single pass as you read rows - by comparing the current values with the previous row, and outputting a row when either ColumnA changes or ColumnB is not adjacent.
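A minimal sketch of that single pass, in Python purely as an example of an intermediate layer (the function name and the tuple layout are assumptions, and the rows are expected to arrive already sorted by ColumnA, ColumnB):

```python
def collapse_ranges(rows):
    """rows: iterable of (column_a, column_b), sorted by both columns.
    Yields (column_a, range_start, range_end) for each run of adjacent values."""
    current = None  # [column_a, range_start, range_end] of the open run
    for a, b in rows:
        if current and current[0] == a and b == current[2] + 1:
            current[2] = b  # adjacent to the previous row: extend the run
        else:
            if current:
                yield tuple(current)  # ColumnA changed or a gap was found
            current = [a, b, b]
    if current:
        yield tuple(current)


rows = [("AA", n) for n in range(1, 6)] + [("AB", n) for n in range(1, 5)]
text = ", ".join(f"{a}:{start}-{end}" for a, start, end in collapse_ranges(rows))
```

For the sample data in the question, text should come out as AA:1-5, AB:1-4.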
However, if you're bent on doing this in SQL, you can use a recursive CTE. The basic premise would be to correlate each row with an adjacent row and hold on to the beginning value of ColumnB as you proceed. An adjacent row is defined as a row with the same value of ColumnA and the next value of ColumnB (i.e. the previous row + 1).
Something like the following ought to do:
;with cte as (
select a.ColumnA, a.ColumnB, a.ColumnB as rangeStart
from myTable a
where not exists ( --make sure we don't keep 'intermediate rows' as start rows
select 1
from myTable b
where b.ColumnA = a.ColumnA
and b.ColumnB = a.ColumnB - 1
)
union all
select a.ColumnA, b.ColumnB, a.rangeStart
from cte a
join myTable b on a.ColumnA = b.ColumnA
and b.ColumnB = a.ColumnB + 1 --correlate with 'next' row
)
select ColumnA, rangeStart, max(ColumnB) as rangeEnd
from cte
group by ColumnA, rangeStart
And given your sample data, indeed it does.
And for kicks, here is another Fiddle with data having gaps in ColumnB.
Note the group by clause for the continuous values by doing some math.
DECLARE @Data table (ColumnA Nvarchar(50), ColumnB Int)
INSERT @Data VALUES
('AA', 1),
('AA', 2),
('AA', 3),
--('AA', 4),
('AA', 5),
('AB', 1),
('AB', 2),
('AB', 3),
('AB', 4)
;WITH Ordered AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY ColumnA ORDER BY ColumnB) AS Seq,
*
FROM @Data
)
)
SELECT
ColumnA,
CASE
WHEN 1 = 0 THEN ''
-- if the ColumnA only has 1 row, the display is 1-1? or just 1?
--WHEN MIN(ColumnB) = MAX(ColumnB) THEN CONVERT(varchar(10), MIN(ColumnB))
ELSE CONVERT(varchar(10), MIN(ColumnB)) + '-' + CONVERT(varchar(10), MAX(ColumnB))
END AS Range
FROM Ordered
GROUP BY
ColumnA,
ColumnB - Seq -- The math
ORDER BY ColumnA, MIN(ColumnB)
SQL Fiddle
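The "math" in that GROUP BY is that ColumnB minus its row number stays constant within a run of consecutive values and changes whenever there is a gap. A small Python sketch of the same trick, with an assumed function name and assuming ColumnB has no duplicates within a ColumnA value:

```python
from itertools import groupby


def ranges_by_offset(rows):
    """rows: iterable of (column_a, column_b).
    Returns (column_a, range_start, range_end) per run of consecutive values."""
    out = []
    for a, group in groupby(sorted(rows), key=lambda r: r[0]):
        # Seq restarts at 1 for each ColumnA, like PARTITION BY ColumnA.
        numbered = enumerate((b for _, b in group), start=1)
        # value - seq is constant inside a run of consecutive values.
        for _, run in groupby(numbered, key=lambda p: p[1] - p[0]):
            values = [b for _, b in run]
            out.append((a, values[0], values[-1]))
    return out


data = [("AA", 1), ("AA", 2), ("AA", 3), ("AA", 5),
        ("AB", 1), ("AB", 2), ("AB", 3), ("AB", 4)]
runs = ranges_by_offset(data)
```

With ('AA', 4) missing, as in the test data above, AA should split into 1-3 and 5-5 while AB stays 1-4.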
My requirement is as follows:
I am using PostgreSQL and iReport 4.0.1 to generate this report.
I have four tables: g_employee, g_year, g_period and g_salary. I join these four tables and pass two parameters, fromDate and toDate, with values such as '01/02/14' and '01/05/14'. Depending on these parameters, the months displayed in the headings will vary, as shown in the example below:
EmpName 01/02/14 01/03/14 01/04/14 01/05/14
---------------------------------------------
abc 2000 3000 3000 2000
Can anyone help me get this output?
What you're describing sounds like the number of columns would grow or shrink based on the number of months between the two parameters, which a single static SQL statement cannot do.
I don't know of any way to add columns based on the interval between two parameters without generating the SQL statement in procedural code.
What is possible is:
emp_id1 period1 salary
emp_id1 period2 salary
emp_id1 period3 salary
emp_id1 period4 salary
emp_id2 period1 salary
emp_id2 period2 salary
emp_id2 period3 salary
emp_id2 period4 salary
generated with something like:
select g_employee_id,
g_period_start,
sum(g_salary_amt) as g_salary_amt
from g_employee, g_year, g_period, g_salary
where <join everything>
and g_period_start between date_param_1 and date_param_2
group by g_employee_id, g_period_start;
It is hard to get more specific without the table structure.
As the range between date_param_1 and date_param_2 grows, the number of rows grows for each employee with pay in that "g_period".
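The long format above can then be turned into one column per period in whatever code sits between the query and the report. A rough Python sketch, assuming the rows arrive as a list of (employee, period_start, salary) tuples with period_start as an ISO date string (the function name and that layout are illustrative, not taken from the tables in the question):

```python
def pivot_salaries(rows):
    """rows: list of (employee, period_start, salary) in long format.
    Returns (header, table); header lists the distinct periods in order."""
    periods = sorted({period for _, period, _ in rows})
    by_emp = {}
    for emp, period, salary in rows:
        by_emp.setdefault(emp, {})[period] = salary
    header = ["EmpName"] + periods
    # One output row per employee; None where an employee has no pay in a period.
    table = [[emp] + [by_emp[emp].get(p) for p in periods] for emp in sorted(by_emp)]
    return header, table


long_rows = [
    ("abc", "2014-02-01", 2000),
    ("abc", "2014-03-01", 3000),
    ("abc", "2014-04-01", 3000),
    ("abc", "2014-05-01", 2000),
]
header, table = pivot_salaries(long_rows)
```

The number of columns in header then follows whatever periods the date parameters let through, which is the part a fixed SELECT list cannot express.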
EDIT - Other option:
The less dynamic option which requires more parameters would be:
select g_employee_id,
(select g_salary_amount
from g_period, g_salary
where g_period_id = g_salary_period_id
and g_salary_emp_id = g_employee_id
and g_period_start = <DATE_PARAM_1> ) as "DATE_PARAM_1_desc",
(select g_salary_amount
from g_period, g_salary
where g_period_id = g_salary_period_id
and g_salary_emp_id = g_employee_id
and g_period_start = <DATE_PARAM_2> ) as "DATE_PARAM_2_desc",
(select g_salary_amount
from g_period, g_salary
where g_period_id = g_salary_period_id
and g_salary_emp_id = g_employee_id
and g_period_start = <DATE_PARAM_3> ) as "DATE_PARAM_3_desc"
,..... -- dynamic not possible
from g_employee;
I created a table #g_employee and inserted some dummy data:
create table #g_employee(empid int,yearid int,periodid int,salary int)
insert into #g_employee(empid,yearid,periodid,salary)
select 1,2014,02,2000
union
select 2,2014,02,2000
union
select 3,2014,02,2000
union
select 3,2014,03,2000
union
select 1,2014,03,3000
union
select 1,2014,04,4000
Query producing the output you require:
Solution 1 :
select empid, max(Case when periodid=2 and yearid=2014 then salary end) as '01/02/2014'
, max(Case when periodid=3 and yearid=2014 then salary end) as '01/03/2014'
, max(Case when periodid=4 and yearid=2014 then salary end) as '01/04/2014'
from #g_employee
group by empid
You can also do this with dynamic SQL:
Solution 2 :
DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT ',' + QUOTENAME(periodid)
from #g_employee
group by periodid
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = 'SELECT empid,' + @cols + ' from
(
select salary, periodid,empid
from #g_employee
) x
pivot
(
max(salary)
for periodid in (' + @cols + ')
) p '
execute(@query)
hope this will help