postgresql pivot using crosstab


I have trouble using crosstab() in postgresql-11.
Here is my table,
CREATE TABLE monitor(tz timestamptz, level int, table_name text, status text);
The table monitors events on other tables. It contains:
table_name (the table on which the event occurred)
tz (the time at which the event occurred)
level (the level of the event)
status (start/end of the event)
Here is some sample data:
tz | level | status | table_name
----------------------------------+-------+--------+--------------
2019-10-24 16:18:34.89435+05:30 | 2 | start | test_table_2
2019-10-24 16:18:58.922523+05:30 | 2 | end | test_table_2
2019-11-01 10:31:08.948459+05:30 | 3 | start | test_table_3
2019-11-01 10:41:22.863529+05:30 | 3 | end | test_table_3
2019-11-01 10:51:44.009129+05:30 | 3 | start | test_table_3
2019-11-01 12:35:23.280294+05:30 | 3 | end | test_table_3
Given a timestamp, I want to list all events that are in progress at that time. That could be done using the criteria
start_time <= 'given_timestamp' and end_time >= 'given_timestamp'
So I tried to use crosstab() to pivot the table over the columns table_name, status and tz. My query is:
with q1 (table_name, start_time,end_time) as
(select * from crosstab
('select table_name, status, tz from monitor ')
as finalresult (table_name text, start_time timestamptz, end_time timestamptz)),
q2 (level,start_time,end_time) as
(select * from crosstab('select level, status, tz from monitor ')
as finalresult (level int, start_time timestamptz, end_time timestamptz))
select q1.table_name,q2.level,q1.start_time,q1.end_time
from q1,q2
where q1.start_time=q2.start_time;
The output of the query is,
table_name | level | start_time | end_time
--------------+-------+----------------------------------+----------------------------------
test_table_2 | 2 | 2019-10-24 16:18:34.89435+05:30 | 2019-10-24 16:18:58.922523+05:30
test_table_3 | 3 | 2019-11-01 10:31:08.948459+05:30 | 2019-11-01 10:41:22.863529+05:30
But my expected output is,
table_name | level | start_time | end_time
--------------+-------+----------------------------------+----------------------------------
test_table_2 | 2 | 2019-10-24 16:18:34.89435+05:30 | 2019-10-24 16:18:58.922523+05:30
test_table_3 | 3 | 2019-11-01 10:31:08.948459+05:30 | 2019-11-01 10:41:22.863529+05:30
test_table_3 | 3 | 2019-11-01 10:51:44.009129+05:30 | 2019-11-01 12:35:23.280294+05:30
How do I achieve the expected output? Or is there any better way other than crosstab?

I would use a self join for this. To keep the rows on the same level and table together you can use a window function to assign numbers to them so they can be distinguished.
with numbered as (
select tz, level, table_name, status,
row_number() over (partition by table_name, status order by tz) as rn
from monitor
)
select st.table_name, st.level, st.tz as start_time, et.tz as end_time
from numbered as st
join numbered as et on st.table_name = et.table_name
and et.status = 'end'
and et.level = st.level
and et.rn = st.rn
where st.status = 'start'
order by st.table_name, st.level;
This assumes that there will never be a row with status = 'end' and an earlier timestamp than the corresponding row with status = 'start'.
Online example: https://rextester.com/QYJK57764
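For readers who want to check the pairing logic outside the database, here is a minimal Python sketch of the same idea. It numbers rows per (table_name, status) in timestamp order and matches each 'start' with the 'end' that has the same number. The names pair_events and active_at are hypothetical helpers, not part of the answer, and the fractional seconds are padded to six digits so that datetime.fromisoformat accepts them on older Python versions.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (tz, level, status, table_name).
ROWS = [
    ("2019-10-24 16:18:34.894350+05:30", 2, "start", "test_table_2"),
    ("2019-10-24 16:18:58.922523+05:30", 2, "end", "test_table_2"),
    ("2019-11-01 10:31:08.948459+05:30", 3, "start", "test_table_3"),
    ("2019-11-01 10:41:22.863529+05:30", 3, "end", "test_table_3"),
    ("2019-11-01 10:51:44.009129+05:30", 3, "start", "test_table_3"),
    ("2019-11-01 12:35:23.280294+05:30", 3, "end", "test_table_3"),
]


def pair_events(rows):
    """Pair the n-th 'start' with the n-th 'end' of each table."""
    parsed = sorted(
        (datetime.fromisoformat(tz), level, status, table_name)
        for tz, level, status, table_name in rows
    )
    counters = defaultdict(int)  # row_number() per (table_name, status)
    starts, ends = [], {}
    for tz, level, status, table_name in parsed:
        counters[(table_name, status)] += 1
        key = (table_name, level, counters[(table_name, status)])
        if status == "start":
            starts.append((key, tz))
        elif status == "end":
            ends[key] = tz
    # Inner join on table_name, level and rn; unmatched starts are dropped.
    return sorted(
        (table_name, level, tz, ends[(table_name, level, rn)])
        for (table_name, level, rn), tz in starts
        if (table_name, level, rn) in ends
    )


def active_at(pairs, moment):
    """Events in progress at `moment`: start_time <= moment <= end_time."""
    return [p for p in pairs if p[2] <= moment <= p[3]]


pairs = pair_events(ROWS)
```

With the sample data this yields three (table_name, level, start_time, end_time) tuples, and active_at(pairs, ...) can then answer the original "what was running at this time" question.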

Related

Need help to merge overlapping time intervals

I need some help with merging overlapping time intervals if the interval is not more than 4 minutes (for example, only where id = 1).
I have the following table:
--------------------------------------
id | action | date
--------------------------------------
1 | started | 2020-08-18 13:51:02
1 | suspended | 2020-08-18 13:51:04
2 | started | 2020-08-18 13:52:14
2 | suspended | 2020-08-18 13:52:17
3 | started | 2020-08-18 13:52:21
3 | suspended | 2020-08-18 13:52:24
1 | started | 2020-08-18 13:57:21
1 | suspended | 2020-08-18 13:57:22
1 | started | 2020-08-18 15:07:56
1 | suspended | 2020-08-18 15:08:56
1 | started | 2020-08-18 15:09:11
1 | suspended | 2020-08-18 15:09:11
1 | started | 2020-08-18 15:09:11
1 | suspended | 2020-08-18 15:09:13
Expected result:
--------------------------------------
id | action | date
--------------------------------------
1 | started | 2020-08-18 13:51:02
1 | suspended | 2020-08-18 13:51:04
1 | started | 2020-08-18 13:57:21
1 | suspended | 2020-08-18 13:57:22
1 | started | 2020-08-18 15:07:56
1 | suspended | 2020-08-18 15:09:13
How can it be done? I will be very grateful for your help!

You want to eliminate suspended/start pairs that are for the same id and within 4 minutes. Use lag() and lead():
select t.*
from (select t.*,
lag(date) over (partition by id order by date) as prev_date,
lead(date) over (partition by id order by date) as next_date
from t
) t
where (action = 'started' and
prev_date > date - interval '4 minute'
) or
(action = 'suspended' and
next_date < date + interval '4 minute'
);
Date/time functions are notoriously database dependent. This is just adding or subtracting 4 minutes, which any database can do but the syntax might vary.
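As a cross-check of the expected result, the merge rule can be sketched in Python on (started, suspended) pairs. This is only an illustration under assumptions: the rows for id = 1 are already paired into intervals, and two intervals are merged when the next one starts less than 4 minutes after the previous one ends. The function name merge_close is hypothetical.

```python
from datetime import datetime, timedelta


def merge_close(intervals, max_gap=timedelta(minutes=4)):
    """Merge intervals whose gap to the previous interval is below max_gap."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] < max_gap:
            # Close enough to the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# The (started, suspended) pairs for id = 1 from the question.
ID_1 = [
    (ts("2020-08-18 13:51:02"), ts("2020-08-18 13:51:04")),
    (ts("2020-08-18 13:57:21"), ts("2020-08-18 13:57:22")),
    (ts("2020-08-18 15:07:56"), ts("2020-08-18 15:08:56")),
    (ts("2020-08-18 15:09:11"), ts("2020-08-18 15:09:11")),
    (ts("2020-08-18 15:09:11"), ts("2020-08-18 15:09:13")),
]

merged = merge_close(ID_1)
```

For this data the last three intervals collapse into one running from 15:07:56 to 15:09:13, which matches the expected result in the question.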

You want to filter out certain rows; what do the rows you are removing have in common?
It seems you want the first 'started' and last 'suspended' rows. Can you just ignore 'started' rows if there is another 'started' row in the previous 4 minutes, and ignore 'suspended' rows if there is another 'suspended' row in the next 4 minutes?
select a.*
from my_table a
where action = 'started' and not exists (
select 1 from my_table b
where b.id = a.id and b.action = 'started'
and b.date < a.date -- only earlier rows, so a row never matches itself
and datediff(minute, b.date, a.date) <= 4 -- row exists in the previous 4 min
)
Do the same for 'suspended', but looking forward instead of back. That doesn't work if the difference between the last 'started' and 'suspended' is more than 4 minutes, but that can be overcome with another condition that checks for no 'started' row within the last 4 minutes.

If you need to find overlapping intervals whose total duration is not more than 4 minutes, you can use this query:
--cte where creating groups with time intervals
with base_cte as
(
select Tab.id,Tab.NumGr,Tab.date,
Tab.action from
(
select * from
(
--selecting only values where time difference <= 4 min
select *,sum(TimeDiff)over(partition by id,NumGr order by date rows unbounded preceding)SumTimeInterval from
(
--creating a group
select sum(Num)over(partition by id order by date rows unbounded preceding )NumGr, * from
(
select date,lead(date)over(partition by id order by date)lead_date,id,action,
lead(action)over(partition by id order by date)lead_action,
--split intervals between overlaps (240seconds)
iif(TimeDiff>240,1,0)Num,TimeDiff from
(
--find time difference in seconds between current and next date (partition by id)
select datediff(second,date,LEAD(date)over(partition by id order by date))TimeDiff,* from Table
)A
)B
)C
--selecting only pairs within time intervals
where TimeDiff<=240
--checking duration interval:<=4 min
)D where SumTimeInterval<=240
)E
CROSS JOIN LATERAL
(values (id,NumGr,date,action),
(id,NumGr,lead_date,lead_action)
)Tab(id,NumGr,date,action)
)
--selecting data with start/end overlapping time interval
select id,date,action from base_cte base
where date
in (select max(date) from base_cte where NumGr=base.NumGr)
or date in
(select min(date) from base_cte where NumGr=base.NumGr)

MySQL group by timestamp difference

I need to write mysql query which will group results by difference between timestamps.
Is it possible?
I have table with locations and every row has created_at (timestamp) and I want to group results by difference > 1min.
Example:
id | lat | lng | created_at
1. | ... | ... | 2020-05-03 06:11:35
2. | ... | ... | 2020-05-03 06:11:37
3. | ... | ... | 2020-05-03 06:11:46
4. | ... | ... | 2020-05-03 06:12:48
5. | ... | ... | 2020-05-03 06:12:52
Result of this data should be 2 groups (1,2,3) and (4,5)

It depends on what you actually want. If you want to group together records that belong to the same minute, regardless of the difference with the previous record, then simple aggregation is enough:
select
date_format(created_at, '%Y-%m-%d %H:%i:00') date_minute,
min(id) min_id,
max(id) max_id,
min(created_at) min_created_at,
max(created_at) max_created_at,
count(*) no_records
from mytable
group by date_minute
On the other hand, if you want to build groups of consecutive records that have less than a 1 minute gap in between, this is a gaps-and-islands problem. Here is one way to solve it using window functions (available in MySQL 8.0):
select
min(id) min_id,
max(id) max_id,
min(created_at) min_created_at,
max(created_at) max_created_at,
count(*) no_records
from (
select
t.*,
sum(case when created_at < lag_created_at + interval 1 minute then 0 else 1 end)
over(order by created_at) grp
from (
select
t.*,
lag(created_at) over(order by created_at) lag_created_at
from mytable t
) t
) t
group by grp
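The gaps-and-islands rule in the second query can be sketched in a few lines of Python, which may help when checking what the query should return. This is an illustration only: group_by_gap is a hypothetical name, and the rows are the (id, created_at) pairs from the question.

```python
from datetime import datetime, timedelta


def group_by_gap(rows, max_gap=timedelta(minutes=1)):
    """Start a new group whenever a row is not within max_gap of the previous row."""
    groups = []
    previous = None
    for row_id, created_at in sorted(rows, key=lambda r: r[1]):
        # Same rule as the SQL CASE: stay in the current group only if this
        # row comes less than max_gap after the previous one.
        if previous is not None and created_at < previous + max_gap:
            groups[-1].append(row_id)
        else:
            groups.append([row_id])
        previous = created_at
    return groups


def ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


LOCATIONS = [
    (1, ts("2020-05-03 06:11:35")),
    (2, ts("2020-05-03 06:11:37")),
    (3, ts("2020-05-03 06:11:46")),
    (4, ts("2020-05-03 06:12:48")),
    (5, ts("2020-05-03 06:12:52")),
]

groups = group_by_gap(LOCATIONS)
```

Rows 3 and 4 are 62 seconds apart, so the sample data splits into the two groups the question asks for.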

Limiting/Aggregating Results When Selecting on an ID followed by a CASE

Trying to pull some data from a Log table.
Simplified version of the query I'm using below:
SELECT
type_id AS type_id
, CASE WHEN LogOperation = 'Insert' THEN CreateDT ELSE NULL END AS create_date
, CASE WHEN (AuditID > 0 OR AuditID IS NOT NULL) AND OldVal_AuditID IS NULL THEN LogTimeStamp ELSE NULL END AS audit_date
FROM TypeLOG
ORDER BY type_id
Sample result set:
|---------|-------------|------------|
| type_id | create_date | audit_date |
|---------|-------------|------------|
| 176 | 2019-06-01 | NULL |
| 177 | 2019-06-01 | NULL |
| 177 | NULL | 2019-06-03 |
| 178 | 2019-06-03 | NULL |
| 178 | NULL | 2019-06-04 |
| 178 | NULL | NULL |
How could I pull this data so only 1 row per type_id is shown? This is a log table so naturally other operations such as updates and deletes create multiple rows per type_id.
Related to this, there can theoretically be multiple 'Inserts' per type_id... I would want the first 'Insert' per type_id, I suppose based on the earliest LogTimeStamp.
How do I need to modify my query to reflect this?
Thanks!

You need to use CTEs or self joins. It should look like this:
WITH CTE1 AS (
SELECT TYPE_ID,
CASE WHEN LogOperation = 'Insert' THEN CreateDT ELSE NULL END AS create_date
FROM TypeLOG
), CTE2 AS (
SELECT TYPE_ID,
CASE WHEN (AuditID > 0 OR AuditID IS NOT NULL) AND OldVal_AuditID IS NULL THEN LogTimeStamp ELSE NULL END AS audit_date
FROM TypeLOG
)
SELECT CTE1.TYPE_ID, CREATE_DATE, AUDIT_DATE
FROM CTE1 JOIN CTE2
ON CTE1.TYPE_ID = CTE2.TYPE_ID
ORDER BY CTE1.TYPE_ID

Get spare time out of stored activities start and end times

I am trying to implement a function that calculates spare time from the start and end times of stored activities. My database runs on PostgreSQL 9.5.3. This is what the activity table looks like:
activity_id | user_id | activity_title | starts_at | ends_at
(serial) | (integer) | (text) | (timestamp without time zone) |(timestamp without time zone)
---------------------------------------------------------------------------------------------------------------------------
1 | 1 | Go to school | 2016-06-12 08:00:00 | 2016-06-12 14:00:00
2 | 1 | Visit my uncle | 2016-06-12 16:00:00 | 2016-06-12 17:30:00
3 | 1 | Go shopping | 2016-06-12 18:00:00 | 2016-06-12 21:15:00
4 | 1 | Go to Library | 2016-06-13 10:00:00 | 2016-06-13 12:00:00
5 | 1 | Install some programs on my laptop | 2016-06-13 18:00:00 | 2016-06-13 19:00:00
Actual table definition of my real table:
CREATE TABLE public.activity (
activity_id serial,
user_id integer NOT NULL,
activity_title text,
starts_at timestamp without time zone NOT NULL,
start_tz text NOT NULL,
ends_at timestamp without time zone NOT NULL,
end_tz text NOT NULL,
recurrence text NOT NULL DEFAULT 'none'::text,
lat numeric NOT NULL,
lon numeric NOT NULL,
CONSTRAINT pk_activity PRIMARY KEY (activity_id),
CONSTRAINT fk_user_id FOREIGN KEY (user_id)
REFERENCES public.users (user_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
I want to calculate the spare time for each day for this user with a PL/pgSQL function that takes (user_id INTEGER, range_start TIMESTAMP, range_end TIMESTAMP) as parameters. I want the output of this SQL statement:
SELECT * from calculate_spare_time(1, '2016-06-12', '2016-06-13');
to be like this:
spare_time_id | user_id | starts_at | ends_at
(serial) | (integer) | (timestamp without time zone) |(timestamp without time zone)
----------------------------------------------------------------------------------------
1 | 1 | 2016-06-12 00:00:00 | 2016-06-12 08:00:00
2 | 1 | 2016-06-12 14:00:00 | 2016-06-12 16:00:00
3 | 1 | 2016-06-12 17:30:00 | 2016-06-12 18:00:00
4 | 1 | 2016-06-12 21:15:00 | 2016-06-13 00:00:00
5 | 1 | 2016-06-13 00:00:00 | 2016-06-13 10:00:00
6 | 1 | 2016-06-13 12:00:00 | 2016-06-13 18:00:00
7 | 1 | 2016-06-13 19:00:00 | 2016-06-14 00:00:00
My idea is to subtract the end time of one activity from the start time of the next activity on the same date, but I am stuck implementing that in PL/pgSQL, especially on how to work with two rows at the same time.

To simplify things, I suggest creating a view, or better yet a MATERIALIZED VIEW, showing gaps in the activities per user:
CREATE MATERIALIZED VIEW mv_gap AS
SELECT user_id, tsrange(a, z) AS gap
FROM (
SELECT user_id, ends_at AS a
, lead(starts_at) OVER (PARTITION BY user_id ORDER BY starts_at) AS z
FROM activity
) sub
WHERE z > a; -- weed out simple overlaps and the dangling "gap" till infinity
Note the range type tsrange.
ATTENTION: You mentioned possible overlaps, which complicate things. If one time range of a single user can be included in another, you need to do more! Merge time ranges to identify earliest start and latest end per block.
Remember to refresh the MV when needed.
Then your function can simply be:
CREATE OR REPLACE FUNCTION f_freetime(_user_id int, _from timestamp, _to timestamp)
RETURNS TABLE (rn int, gap tsrange) AS
$func$
SELECT row_number() OVER (ORDER BY g.gap)::int AS rn
, g.gap * tsrange(_from, _to) AS gap
FROM mv_gap g
WHERE g.user_id = _user_id
AND g.gap && tsrange(_from, _to)
ORDER BY g.gap;
$func$ LANGUAGE sql STABLE;
Call:
SELECT * FROM f_freetime(1, '2016-06-12 0:0', '2016-06-13 0:0');
Note the range operators * and &&.
Also note that I use a simple SQL function, after the problem has been simplified enough. If you need to add more, you might want to switch back to plpgsql and use RETURN QUERY ...
Or just use the query without function wrapper.
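To see what the view and the function compute together, here is a rough Python equivalent for a single user. It is a sketch under assumptions: activities are (starts_at, ends_at) pairs that do not overlap, and free_time is a hypothetical name. Like the view, it only reports gaps between two consecutive activities, so free time before the first or after the last activity is not included.

```python
from datetime import datetime


def free_time(activities, range_start, range_end):
    """Gaps between consecutive activities, clipped to [range_start, range_end)."""
    ordered = sorted(activities)
    gaps = []
    for (_, ends_at), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > ends_at:  # same filter as WHERE z > a in the view
            lo = max(ends_at, range_start)
            hi = min(next_start, range_end)
            if lo < hi:  # keep only gaps that overlap the requested range
                gaps.append((lo, hi))
    return gaps


def ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


# Activities of user 1 from the question: (starts_at, ends_at).
ACTIVITIES = [
    (ts("2016-06-12 08:00"), ts("2016-06-12 14:00")),
    (ts("2016-06-12 16:00"), ts("2016-06-12 17:30")),
    (ts("2016-06-12 18:00"), ts("2016-06-12 21:15")),
    (ts("2016-06-13 10:00"), ts("2016-06-13 12:00")),
    (ts("2016-06-13 18:00"), ts("2016-06-13 19:00")),
]

gaps = free_time(ACTIVITIES, ts("2016-06-12 00:00"), ts("2016-06-13 00:00"))
```

The max/min clipping plays the role of the range intersection operator *, and the lo < hi check plays the role of the overlap operator &&.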
Performance
If you have many rows per user, to optimize query times, add an SP-GiST index (one reason to use a MV):
CREATE INDEX activity_gap_spgist_idx on mv_gap USING spgist (gap);
In addition to an index on (user_id).
Details in this related answer:
Perform this hours of operation query in PostgreSQL

Can window function LAG reference the column which value is being calculated?

I need to calculate value of some column X based on some other columns of the current record and the value of X for the previous record (using some partition and order). Basically I need to implement query in the form
SELECT <some fields>,
<some expression using LAG(X) OVER(PARTITION BY ... ORDER BY ...)> AS X
FROM <table>
This is not possible because only existing columns can be used in a window function, so I'm looking for a way to work around this.
Here is an example. I have a table with events. Each event has type and time_stamp.
create table event (id serial, type integer, time_stamp integer);
I want to find "duplicate" events (to skip them). By duplicate I mean the following. Let's order all events for a given type by time_stamp ascending. Then
the first event is not a duplicate
all events that follow a non-duplicate and are within some time frame after it (that is, their time_stamp is not greater than the time_stamp of the previous non-duplicate plus some constant TIMEFRAME) are duplicates
the next event whose time_stamp is greater than that of the previous non-duplicate by more than TIMEFRAME is not a duplicate
and so on
For this data
insert into event (type, time_stamp)
values
(1, 1), (1, 2), (2, 2), (1,3), (1, 10), (2,10),
(1,15), (1, 21), (2,13),
(1, 40);
and TIMEFRAME=10 result should be
time_stamp | type | duplicate
-----------------------------
1 | 1 | false
2 | 1 | true
3 | 1 | true
10 | 1 | true
15 | 1 | false
21 | 1 | true
40 | 1 | false
2 | 2 | false
10 | 2 | true
13 | 2 | false
I could calculate the value of duplicate field based on current time_stamp and time_stamp of the previous non-duplicate event like this:
WITH evt AS (
SELECT
time_stamp,
CASE WHEN
time_stamp - LAG(current_non_dupl_time_stamp) OVER w >= TIMEFRAME
THEN
time_stamp
ELSE
LAG(current_non_dupl_time_stamp) OVER w
END AS current_non_dupl_time_stamp
FROM event
WINDOW w AS (PARTITION BY type ORDER BY time_stamp ASC)
)
SELECT time_stamp, time_stamp != current_non_dupl_time_stamp AS duplicate
FROM evt;
But this does not work because the field which is calculated cannot be referenced in LAG:
ERROR: column "current_non_dupl_time_stamp" does not exist.
So the question: can I rewrite this query to achieve the effect I need?

Naive recursive chain knitter:
-- temp view to avoid nested CTE
CREATE TEMP VIEW drag AS
SELECT e.type,e.time_stamp
, ROW_NUMBER() OVER www as rn -- number the records
, FIRST_VALUE(e.time_stamp) OVER www as fst -- the "group leader"
, EXISTS (SELECT * FROM event x
WHERE x.type = e.type
AND x.time_stamp < e.time_stamp) AS is_dup
FROM event e
WINDOW www AS (PARTITION BY type ORDER BY time_stamp)
;
WITH RECURSIVE ttt AS (
SELECT d0.*
FROM drag d0 WHERE d0.is_dup = False -- only the "group leaders"
UNION ALL
SELECT d1.type, d1.time_stamp, d1.rn
, CASE WHEN d1.time_stamp - ttt.fst > 20 THEN d1.time_stamp
ELSE ttt.fst END AS fst -- new "group leader"
, CASE WHEN d1.time_stamp - ttt.fst > 20 THEN False
ELSE True END AS is_dup
FROM drag d1
JOIN ttt ON d1.type = ttt.type AND d1.rn = ttt.rn+1
)
SELECT * FROM ttt
ORDER BY type, time_stamp
;
Results:
CREATE TABLE
INSERT 0 10
CREATE VIEW
type | time_stamp | rn | fst | is_dup
------+------------+----+-----+--------
1 | 1 | 1 | 1 | f
1 | 2 | 2 | 1 | t
1 | 3 | 3 | 1 | t
1 | 10 | 4 | 1 | t
1 | 15 | 5 | 1 | t
1 | 21 | 6 | 1 | t
1 | 40 | 7 | 40 | f
2 | 2 | 1 | 2 | f
2 | 10 | 2 | 2 | t
2 | 13 | 3 | 2 | t
(10 rows)

An alternative to a recursive approach is a custom aggregate. Once you master the technique of writing your own aggregates, creating transition and final functions is easy and logical.
State transition function:
create or replace function is_duplicate(st int[], time_stamp int, timeframe int)
returns int[] language plpgsql as $$
begin
if st is null or st[1] + timeframe <= time_stamp
then
st[1] := time_stamp;
end if;
st[2] := time_stamp;
return st;
end $$;
Final function:
create or replace function is_duplicate_final(st int[])
returns boolean language sql as $$
select st[1] <> st[2];
$$;
Aggregate:
create aggregate is_duplicate_agg(time_stamp int, timeframe int)
(
sfunc = is_duplicate,
stype = int[],
finalfunc = is_duplicate_final
);
Query:
select *, is_duplicate_agg(time_stamp, 10) over w
from event
window w as (partition by type order by time_stamp asc)
order by type, time_stamp;
id | type | time_stamp | is_duplicate_agg
----+------+------------+------------------
1 | 1 | 1 | f
2 | 1 | 2 | t
4 | 1 | 3 | t
5 | 1 | 10 | t
7 | 1 | 15 | f
8 | 1 | 21 | t
10 | 1 | 40 | f
3 | 2 | 2 | f
6 | 2 | 10 | t
9 | 2 | 13 | f
(10 rows)
Read in the documentation: 37.10. User-defined Aggregates and CREATE AGGREGATE.
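The state the aggregate carries from row to row is easy to follow in a plain loop. The Python sketch below mirrors the transition function under the same rule (a row becomes the new non-duplicate "leader" when leader + timeframe <= time_stamp). It is only an illustration, and flag_duplicates is a hypothetical name.

```python
from itertools import groupby


def flag_duplicates(events, timeframe):
    """events: iterable of (type, time_stamp); returns (type, time_stamp, is_dup)."""
    result = []
    ordered = sorted(events)  # by type, then time_stamp, like the window w
    for event_type, group in groupby(ordered, key=lambda e: e[0]):
        leader = None  # plays the role of st[1] in the aggregate state
        for _, time_stamp in group:
            if leader is None or leader + timeframe <= time_stamp:
                leader = time_stamp  # this row starts a new time frame
            result.append((event_type, time_stamp, time_stamp != leader))
    return result


# The rows inserted in the question: (type, time_stamp).
EVENTS = [
    (1, 1), (1, 2), (2, 2), (1, 3), (1, 10), (2, 10),
    (1, 15), (1, 21), (2, 13),
    (1, 40),
]

flags = flag_duplicates(EVENTS, 10)
```

Each partition starts with an empty state, which is why leader is reset to None for every type.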

This feels more like a recursive problem than a window-function problem. The following query produced the desired results:
WITH RECURSIVE base(type, time_stamp) AS (
-- 3. base of recursive query
SELECT x.type, x.time_stamp, y.next_time_stamp
FROM
-- 1. start with the initial records of each type
( SELECT type, min(time_stamp) AS time_stamp
FROM event
GROUP BY type
) x
LEFT JOIN LATERAL
-- 2. for each of the initial records, find the next TIMEFRAME (10) in the future
( SELECT MIN(time_stamp) next_time_stamp
FROM event
WHERE type = x.type
AND time_stamp > (x.time_stamp + 10)
) y ON true
UNION ALL
-- 4. recursive join, same logic as base
SELECT e.type, e.time_stamp, z.next_time_stamp
FROM event e
JOIN base b ON (e.type = b.type AND e.time_stamp = b.next_time_stamp)
LEFT JOIN LATERAL
( SELECT MIN(time_stamp) next_time_stamp
FROM event
WHERE type = e.type
AND time_stamp > (e.time_stamp + 10)
) z ON true
)
-- The actual query:
-- 5a. All records from base are not duplicates
SELECT time_stamp, type, false
FROM base
UNION
-- 5b. All records from event that are not in base are duplicates
SELECT time_stamp, type, true
FROM event
WHERE (type, time_stamp) NOT IN (SELECT type, time_stamp FROM base)
ORDER BY type, time_stamp
There are a lot of caveats with this. It assumes no duplicate time_stamp for a given type. Really the joins should be based on a unique id rather than type and time_stamp. I didn't test this much, but it may at least suggest an approach.
This is my first time trying a LATERAL join, so there may be a way to simplify it more. What I really wanted to do was a recursive CTE with the recursive part using MIN(time_stamp) based on time_stamp > (x.time_stamp + 10), but aggregate functions are not allowed in CTEs in that manner. It seems the lateral join can be used in the CTE, though.
