I have one table that records when an employee has clocked in or clocked out, but it tracks those events as separate rows.
Input:
+------------+------------+---------------+
| EmployeeID | Event_Date | Event_Type    |
+------------+------------+---------------+
| 2450770    | 2020/01/02 | 'Clocked Out' | -- Doesn't have a clocked-in time within the desired time range
| 2195326    | 2020/01/06 | 'Clocked In'  |
| 2195326    | 2020/01/10 | 'Clocked Out' |
| 800455     | 2020/01/15 | 'Clocked In'  |
| 2450770    | 2020/01/15 | 'Clocked In'  | -- No clock-out time yet
| 800455     | 2020/01/22 | 'Clocked Out' |
| 2195326    | 2020/01/23 | 'Clocked In'  |
| 2331340    | 2020/01/25 | 'Clocked In'  |
| 2195326    | 2020/01/27 | 'Clocked Out' |
| 2331340    | 2020/02/01 | 'Clocked Out' |
| 2515957    | 2020/02/05 | 'Clocked In'  |
| ...        | ...        | ...           | -- etc
Desired Output:
+------------+------------+-------------+
| EmployeeID | Clocked_In | Clocked_Out |
+------------+------------+-------------+
| 2195326    | 2020/01/06 | 2020/01/10  |
| 800455     | 2020/01/15 | 2020/01/22  |
| 2450770    | 2020/01/15 | NULL        |
| 2195326    | 2020/01/23 | 2020/01/27  |
| 2331340    | 2020/01/25 | 2020/02/01  |
+------------+------------+-------------+
This problem seems simple enough, but I can't quite figure it out: how do I join the Clocked In and Clocked Out events together, so that I can see everyone who clocked in during January along with their corresponding clocked-out date (if it exists), with no date restriction on the clock-out? Complications include people without matching clocked-in and clocked-out dates, people clocking in and out in different months, and people clocking in and out multiple times in a month. As far as I can tell, there are no cases of anyone clocking in twice in a row (or out twice in a row).
Something like this gets the row numbers of your data, partitioning by EmployeeID and ordering by Event_Date. Once you have that, join each Clocked In row to the row with the same employee ID and the next row number.
Doing it this way will handle most, if not all, of your complications, including the employee whose first event is a Clocked Out (i.e. the clocked-in date is greater than the clocked-out date).
DROP TABLE IF EXISTS #temp
CREATE TABLE #temp (EmployeeID VARCHAR(256), Event_Date DATE, Event_Type varchar(256))
INSERT INTO #temp
VALUES
( '2450770', '2020/01/02', 'Clocked Out'),
( '2195326', '2020/01/06', 'Clocked In'),
( '2195326', '2020/01/10', 'Clocked Out'),
( '800455', '2020/01/15', 'Clocked In'),
( '2450770', '2020/01/15', 'Clocked In'),
( '800455', '2020/01/22', 'Clocked Out'),
( '2195326', '2020/01/23', 'Clocked In'),
( '2331340', '2020/01/25', 'Clocked In'),
( '2195326', '2020/01/27', 'Clocked Out'),
( '2331340', '2020/02/01', 'Clocked Out'),
( '2515957', '2020/02/05', 'Clocked In')
DROP TABLE IF EXISTS #output
SELECT
EmployeeID,
Event_Date,
Event_Type,
ROW_NUMBER() OVER(PARTITION BY EmployeeID ORDER BY Event_Date) AS RN
INTO #output
FROM #temp
SELECT
I.EmployeeID,
I.Event_Date AS [Clocked In],
O.Event_Date AS [Clocked Out]
FROM #output I
LEFT JOIN #output O
    ON O.EmployeeID = I.EmployeeID   -- match within the same employee...
    AND O.RN = I.RN + 1              -- ...to that employee's next event
    AND O.Event_Type = 'Clocked Out'
WHERE I.Event_Type = 'Clocked In'
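If you're on SQL Server 2012 or later, a LEAD()-based variant avoids the self-join entirely. A minimal sketch, leaning on your guarantee that nobody clocks in (or out) twice in a row:
SELECT EmployeeID, Event_Date AS [Clocked In], Next_Date AS [Clocked Out]
FROM (
    SELECT EmployeeID, Event_Date, Event_Type,
           LEAD(Event_Date) OVER (PARTITION BY EmployeeID ORDER BY Event_Date) AS Next_Date, -- the employee's next event date
           LEAD(Event_Type) OVER (PARTITION BY EmployeeID ORDER BY Event_Date) AS Next_Type
    FROM #temp
) x
WHERE Event_Type = 'Clocked In'
  AND (Next_Type = 'Clocked Out' OR Next_Type IS NULL) -- keep the matching clock-out, or NULL if still clocked in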
I have a table (table_name: raw_data) whose schema and data appear to be this:
name | category | clear_date |
A    | GOOD     | 2020-05-30 |
A    | GOOD     | 2020-05-30 |
A    | GOOD     | 2020-05-30 |
A    | GOOD     | 2020-05-30 |
A    | BAD      | 2020-05-30 |
A    | BAD      | 2020-05-31 |
Now, if I perform a "group by" operation using the following statement:
SELECT name, category, date(clear_date), count(clear_date)
FROM raw_data
GROUP BY name, category, date(clear_date)
ORDER BY name
I get the following result:
name | category | date       | count |
A    | GOOD     | 2020-05-30 | 4     |
A    | BAD      | 2020-05-30 | 1     |
A    | BAD      | 2020-05-31 | 1     |
In order to produce the pivot in the following format:
name | category | 2020-05-30 | 2020-05-31 |
A    | GOOD     | 4          | NULL       |
A    | BAD      | 1          | 1          |
I am using the following query:
select * from crosstab (
    'select name, category, date(clear_date), count(clear_date) from raw_data group by name, category, date(clear_date) order by 1,2,3',
    'select distinct date(clear_date) from raw_data order by 1'
)
as newtable (
    name varchar, category varchar, "2020-05-30" integer, "2020-05-31" integer
)
ORDER BY name
But I am getting the following results:
name | category | 2020-05-30 | 2020-05-31 |
A    | BAD      | 4          | 1          |
Can anyone please suggest how I can achieve the result mentioned above? It appears crosstab removes the duplicate entries of A automatically.
Not sure if this is possible using crosstab, because you have missing records on some dates. Here is an example of how to get the expected result, though I'm not sure it's exactly what you need. Anyway, hope this helps.
SELECT r1.*, r2.counter AS "2020-05-30", r3.counter AS "2020-05-31"
FROM (
SELECT DISTINCT name, category
FROM raw_data
) AS r1
LEFT JOIN (
SELECT name, category, count(*) AS counter
FROM raw_data
WHERE clear_date = '2020-05-30'
GROUP BY name, category
) AS r2 ON (r2.category = r1.category AND r2.name = r1.name)
LEFT JOIN (
SELECT name, category, count(*) AS counter
FROM raw_data
WHERE clear_date = '2020-05-31'
GROUP BY name, category
) AS r3 ON (r3.category = r1.category AND r3.name = r1.name)
ORDER BY r1.category DESC;
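For what it's worth, crosstab can be coaxed into this as well: it folds rows by the first column of the source query, so the trick is to make that column unique per (name, category) pair and pass name and category through as "extra" columns. This is only a sketch, assuming the tablefunc extension is installed and the date columns are known up front:
SELECT name, category, "2020-05-30", "2020-05-31"
FROM crosstab(
    -- first column is the row key; name and category ride along as "extra" columns
    $$SELECT name || '|' || category, name, category, date(clear_date), count(*)::int
      FROM raw_data
      GROUP BY 1, 2, 3, 4
      ORDER BY 1, 4$$,
    $$SELECT DISTINCT date(clear_date) FROM raw_data ORDER BY 1$$
) AS t (key text, name varchar, category varchar, "2020-05-30" integer, "2020-05-31" integer);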
I am trying to figure out how to SUM the time by unique ID (meaning only one time value per ID). Here is a mock-up of some of the data. I need to GROUP BY fname, lname, and Area. I also need a group count (a distinct count of the IDs) and group participants (a simple count of the IDs).
+----+------+-------+-------+------+
| ID | Time | fname | lname | Area |
+----+------+-------+-------+------+
| 1  | 3:30 | Jeff  | Chose | LA   |
| 1  | 3:30 | Jeff  | Chose | LA   |
| 1  | 3:30 | Jeff  | Chose | LA   |
| 2  | 4:00 | Jeff  | Chose | LA   |
+----+------+-------+-------+------+
The result should look like:
+------+-------+-------+------+-------------+--------------------+
| Time | fname | lname | Area | Group Count | Group Participants |
+------+-------+-------+------+-------------+--------------------+
| 7:30 | Jeff  | Chose | LA   | 2           | 4                  |
+------+-------+-------+------+-------------+--------------------+
BONUS: If you can convert 3:30 to 3.5
WITH cte as (
    SELECT fname, lname, Area,
           COUNT(DISTINCT id) as "Group Count",
           COUNT(id) as "Group Participants",
           SUM(DISTINCT "time"::interval) as "Time"  -- sum() is not defined for time, so cast to interval
    FROM Table1
    GROUP BY fname, lname, Area, id
)
SELECT fname, lname, Area,
       SUM("Group Count") as "Group Count",
       SUM("Group Participants") as "Group Participants",
       SUM("Time") as "Time"
FROM cte
GROUP BY fname, lname, Area;
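For the bonus, a minimal sketch assuming the interval cast above: EXTRACT(EPOCH FROM ...) turns an interval into seconds, and dividing by 3600 gives decimal hours.
SELECT EXTRACT(EPOCH FROM '7:30'::interval) / 3600.0 AS hours;  -- returns 7.5
The same expression can wrap SUM("Time") in the outer query.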
I have a 'prices' table with hundreds of millions of records and only four columns: uid, price, unit, dt. dt is a datetime in a standard format like '2017-05-01 00:00:00.585'.
I can quite easily select a period using
SELECT uid, price, unit from prices
WHERE dt > '2017-05-01 00:00:00.000'
AND dt < '2017-05-01 02:59:59.999'
What I can't understand is how to select the price of the last record within each second. (I also need the very first one of each second, but I guess that will be a similar, separate query.) There are some similar examples (here), but they did not work for me: when I tried to adapt them to my needs, they generated errors.
Could someone please help me crack this nut?
Let's say there is a table which has been generated with the help of this command:
CREATE TABLE test AS
SELECT timestamp '2017-09-16 20:00:00' + x * interval '0.1' second As my_timestamp
from generate_series(0,100) x
This table contains an increasing series of timestamps; each timestamp differs by 100 milliseconds (0.1 second) from its neighbors, so there are 10 records within each second.
| my_timestamp |
|------------------------|
| 2017-09-16T20:00:00Z |
| 2017-09-16T20:00:00.1Z |
| 2017-09-16T20:00:00.2Z |
| 2017-09-16T20:00:00.3Z |
| 2017-09-16T20:00:00.4Z |
| 2017-09-16T20:00:00.5Z |
| 2017-09-16T20:00:00.6Z |
| 2017-09-16T20:00:00.7Z |
| 2017-09-16T20:00:00.8Z |
| 2017-09-16T20:00:00.9Z |
| 2017-09-16T20:00:01Z |
| 2017-09-16T20:00:01.1Z |
| 2017-09-16T20:00:01.2Z |
| 2017-09-16T20:00:01.3Z |
.......
The below query determines and prints the first and the last timestamp within each second:
SELECT my_timestamp,
CASE
WHEN rn1 = 1 THEN 'First'
WHEN rn2 = 1 THEN 'Last'
ELSE 'Somewhere in the middle'
END as Which_row_within_a_second
FROM (
select *,
row_number() over( partition by date_trunc('second', my_timestamp)
order by my_timestamp
) rn1,
row_number() over( partition by date_trunc('second', my_timestamp)
order by my_timestamp DESC
) rn2
from test
) xx
WHERE 1 IN (rn1, rn2 )
ORDER BY my_timestamp
;
| my_timestamp | which_row_within_a_second |
|------------------------|---------------------------|
| 2017-09-16T20:00:00Z | First |
| 2017-09-16T20:00:00.9Z | Last |
| 2017-09-16T20:00:01Z | First |
| 2017-09-16T20:00:01.9Z | Last |
| 2017-09-16T20:00:02Z | First |
| 2017-09-16T20:00:02.9Z | Last |
| 2017-09-16T20:00:03Z | First |
| 2017-09-16T20:00:03.9Z | Last |
| 2017-09-16T20:00:04Z | First |
| 2017-09-16T20:00:04.9Z | Last |
| 2017-09-16T20:00:05Z | First |
| 2017-09-16T20:00:05.9Z | Last |
A working demo can be found here.
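Applied to the prices table from the question, PostgreSQL's DISTINCT ON is a more compact way to keep just the last row of each second. This is only a sketch (flip DESC to ASC to get the first record of each second instead):
SELECT DISTINCT ON (date_trunc('second', dt))
       uid, price, unit, dt
FROM prices
WHERE dt >= '2017-05-01 00:00:00.000'
  AND dt <  '2017-05-01 03:00:00.000'
ORDER BY date_trunc('second', dt), dt DESC;  -- within each second, the latest row wins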
Let's say I have the following:
table_a
| id | date       | order_id | sku | price |
--------------------------------------------
| 10 | 2016-08-18 | 111      | ABC | 10    |
table_b
| id | date       | order_id | description | type | notes | valid |
-------------------------------------------------------------------
| 50 | 2016-08-18 | 111      | test        | AA   |       | true  |
I want to get all columns from both tables, so the resulting table looks like this:
| id | date       | order_id | sku | price | description | type | notes | valid |
---------------------------------------------------------------------------------
| 10 | 2016-08-18 | 111      | ABC | 10    |             |      |       |       |
---------------------------------------------------------------------------------
| 50 | 2016-08-18 | 111      |     |       | test        | AA   |       | true  |
I tried union:
(
SELECT *
from table_a
where table_a.date > Date('today')
)
UNION
(
SELECT *
from table_b
where table_b.date > Date('today')
)
But I get a:
ERROR: each UNION query must have the same number of columns
How can this be fixed / is there another way to do this?
Easily :)
(
SELECT id, date, order_id, sku, price, NULL AS description, NULL AS type, NULL AS notes, NULL AS valid
from table_a
where table_a.date > Date('today')
)
UNION
(
SELECT id, date, order_id, NULL AS sku, NULL AS price, description, type, notes, valid
from table_b
where table_b.date > Date('today')
)
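As a side note, the two branches here can never produce identical rows, so UNION ALL does the same job without the duplicate-elimination step; only the operator changes:
SELECT id, date, order_id, sku, price, NULL AS description, NULL AS type, NULL AS notes, NULL AS valid
FROM table_a
WHERE table_a.date > Date('today')
UNION ALL
SELECT id, date, order_id, NULL, NULL, description, type, notes, valid
FROM table_b
WHERE table_b.date > Date('today')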
Alternatively, instead of UNION you can just JOIN them:
SELECT *
FROM table_a A
JOIN table_b B USING ( order_id )
WHERE A.date > TIMESTAMP 'TODAY'
AND B.date > TIMESTAMP 'TODAY';
See more options: https://www.postgresql.org/docs/9.5/static/queries-table-expressions.html#QUERIES-JOIN
I have an Actions table, which has rows ordered by time:
| time | session |
|----------|-----------|
| 16:10:10 | session_1 |
| 16:13:05 | null |
| 16:16:43 | null |
| 16:23:12 | null |
| 16:24:01 | session_2 |
| 16:41:32 | null |
| 16:43:56 | session_3 |
| 16:51:22 | session_4 |
I want to write a select that will fill each null with the previous meaningful value. How can I get this result with PostgreSQL?
| time | session |
|----------|-----------|
| 16:10:10 | session_1 |
| 16:13:05 | session_1 |
| 16:16:43 | session_1 |
| 16:23:12 | session_1 |
| 16:24:01 | session_2 |
| 16:41:32 | session_2 |
| 16:43:56 | session_3 |
| 16:51:22 | session_4 |
update Actions a
set session = (
select session
from Actions
where time = (
select max(time) from Actions b
where b.time < a.time and session is not null
)
) where session is null;
I tried this with 'time' as int and 'session' as int [easier to add data].
drop table Actions;
create table Actions (time int, session int);
insert into Actions values (1,10),(2,null),(3,null),(4,2),(5,null),(6,3),(7,4);
select * from Actions order by time;
update Actions a ...;
select * from Actions order by time;
EDIT
Response to your modified question.
select a1.time, a2.session
from Actions a1
inner join
Actions a2
on a2.time = (
select max(time) from Actions b
where b.time <= a1.time and session is not null
)
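If you'd rather not modify the table (or run a correlated subquery per row), the same gap-filling can be done read-only with window functions. A sketch that relies on count() ignoring nulls, so each run of nulls lands in the same group as the non-null row that precedes it:
SELECT time,
       first_value(session) OVER (PARTITION BY grp ORDER BY time) AS session
FROM (
    SELECT time, session,
           count(session) OVER (ORDER BY time) AS grp  -- increments only on non-null sessions
    FROM Actions
) s
ORDER BY time;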