Get spare time out of stored activities start and end times - postgresql

I am trying to implement a function that calculates spare time from the start and end times of stored activities. My database runs on PostgreSQL 9.5.3. This is what the activity table looks like:
activity_id | user_id | activity_title | starts_at | ends_at
(serial) | (integer) | (text) | (timestamp without time zone) |(timestamp without time zone)
---------------------------------------------------------------------------------------------------------------------------
1 | 1 | Go to school | 2016-06-12 08:00:00 | 2016-06-12 14:00:00
2 | 1 | Visit my uncle | 2016-06-12 16:00:00 | 2016-06-12 17:30:00
3 | 1 | Go shopping | 2016-06-12 18:00:00 | 2016-06-12 21:15:00
4 | 1 | Go to Library | 2016-06-13 10:00:00 | 2016-06-13 12:00:00
5 | 1 | Install some programs on my laptop | 2016-06-13 18:00:00 | 2016-06-13 19:00:00
Actual table definition of my real table:
CREATE TABLE public.activity (
activity_id serial,
user_id integer NOT NULL,
activity_title text,
starts_at timestamp without time zone NOT NULL,
start_tz text NOT NULL,
ends_at timestamp without time zone NOT NULL,
end_tz text NOT NULL,
recurrence text NOT NULL DEFAULT 'none'::text,
lat numeric NOT NULL,
lon numeric NOT NULL,
CONSTRAINT pk_activity PRIMARY KEY (activity_id),
CONSTRAINT fk_user_id FOREIGN KEY (user_id)
REFERENCES public.users (user_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
I want to calculate the spare time of a user for every day with a PL/pgSQL function that takes (user_id INTEGER, range_start TIMESTAMP, range_end TIMESTAMP) as parameters. I want the output of this SQL statement:
SELECT * from calculate_spare_time(1, '2016-06-12', '2016-06-13');
to be like this:
spare_time_id | user_id | starts_at | ends_at
(serial) | (integer) | (timestamp without time zone) |(timestamp without time zone)
----------------------------------------------------------------------------------------
1 | 1 | 2016-06-12 00:00:00 | 2016-06-12 08:00:00
2 | 1 | 2016-06-12 14:00:00 | 2016-06-12 16:00:00
3 | 1 | 2016-06-12 17:30:00 | 2016-06-12 18:00:00
4 | 1 | 2016-06-12 21:15:00 | 2016-06-13 00:00:00
5 | 1 | 2016-06-13 00:00:00 | 2016-06-13 10:00:00
6 | 1 | 2016-06-13 12:00:00 | 2016-06-13 18:00:00
7 | 1 | 2016-06-13 19:00:00 | 2016-06-14 00:00:00
My idea is to subtract the end time of one activity from the start time of the next activity on the same date, but I am stuck implementing that in PL/pgSQL, especially with how to deal with two rows at the same time.

To simplify things, I suggest creating a view, or better yet a MATERIALIZED VIEW, showing the gaps in the activities per user:
CREATE MATERIALIZED VIEW mv_gap AS
SELECT user_id, tsrange(a, z) AS gap
FROM (
SELECT user_id, ends_at AS a
, lead(starts_at) OVER (PARTITION BY user_id ORDER BY starts_at) AS z
FROM activity
) sub
WHERE z > a; -- weed out simple overlaps and the dangling "gap" till infinity
Note the range type tsrange.
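As a sanity check of the logic, here is a plain-Python model of what mv_gap computes for non-overlapping activities. It is a sketch, not PostgreSQL code: the helper name gaps_per_user and the tuple layout (user_id, starts_at, ends_at) are assumptions made for illustration.

```python
from datetime import datetime
from itertools import groupby


def gaps_per_user(activities):
    """Model of mv_gap: activities are (user_id, starts_at, ends_at) tuples.

    For each user, pair every activity with the next one by start time
    (what lead(starts_at) OVER (PARTITION BY user_id ORDER BY starts_at) does)
    and keep the pair only if the next start lies after the current end.
    """
    result = []
    ordered = sorted(activities, key=lambda r: (r[0], r[1]))
    for user_id, group in groupby(ordered, key=lambda r: r[0]):
        rows = list(group)
        # zip stops before the last row: its lead() would be NULL and get filtered out
        for cur, nxt in zip(rows, rows[1:]):
            a, z = cur[2], nxt[1]
            if z > a:  # same filter as WHERE z > a
                result.append((user_id, a, z))
    return result


# Sample rows from the question
activities = [
    (1, datetime(2016, 6, 12, 8, 0), datetime(2016, 6, 12, 14, 0)),
    (1, datetime(2016, 6, 12, 16, 0), datetime(2016, 6, 12, 17, 30)),
    (1, datetime(2016, 6, 12, 18, 0), datetime(2016, 6, 12, 21, 15)),
    (1, datetime(2016, 6, 13, 10, 0), datetime(2016, 6, 13, 12, 0)),
    (1, datetime(2016, 6, 13, 18, 0), datetime(2016, 6, 13, 19, 0)),
]
```

Like the view, this yields only the gaps between activities; the time before a user's first activity and after the last one is not included.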
ATTENTION: You mentioned possible overlaps, which complicate things. If one time range of a single user can be included in another, you need to do more! Merge time ranges to identify earliest start and latest end per block.
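One common way to merge, shown here in plain Python rather than SQL, is to sort the ranges by start and sweep once, extending the current block while the next range starts before (or exactly at) its end. The names merge_ranges and gaps are hypothetical. The sample shows why merging matters: with the lead() approach alone, the contained 09:00-10:00 activity would produce a bogus gap starting at 10:00.

```python
from datetime import datetime


def merge_ranges(ranges):
    """Merge overlapping or touching [start, end) ranges into disjoint blocks."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the current block: extend its end if needed
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(block) for block in merged]


def gaps(ranges):
    """Free time between merged blocks (earliest start, latest end per block)."""
    blocks = merge_ranges(ranges)
    return [(prev[1], nxt[0]) for prev, nxt in zip(blocks, blocks[1:])]


day = datetime(2016, 6, 12)
ranges = [
    (day.replace(hour=8), day.replace(hour=14)),
    (day.replace(hour=9), day.replace(hour=10)),  # contained in the first range
    (day.replace(hour=16), day.replace(hour=17, minute=30)),
]
```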
Remember to refresh the MV when needed.
Then your function can simply be:
CREATE OR REPLACE FUNCTION f_freetime(_user_id int, _from timestamp, _to timestamp)
RETURNS TABLE (rn int, gap tsrange) AS
$func$
SELECT row_number() OVER (ORDER BY g.gap)::int AS rn
, g.gap * tsrange(_from, _to) AS gap
FROM mv_gap g
WHERE g.user_id = _user_id
AND g.gap && tsrange(_from, _to)
ORDER BY g.gap;
$func$ LANGUAGE sql STABLE;
Call:
SELECT * FROM f_freetime(1, '2016-06-12 0:0', '2016-06-13 0:0');
Note the range operators * (intersection) and && (overlaps).
Also note that I use a simple SQL function, now that the problem has been simplified enough. If you need to add more, you might want to switch back to plpgsql and use RETURN QUERY ...
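For intuition, the two operators and the function can be modeled in plain Python for the default half-open [lower, upper) bounds that tsrange(a, b) produces. This sketch ignores empty, unbounded and inclusive-upper ranges, and overlaps, intersect and freetime are hypothetical names, not PostgreSQL functions.

```python
from datetime import datetime


def overlaps(r1, r2):
    """Model of && for non-empty, half-open [lower, upper) ranges."""
    return r1[0] < r2[1] and r2[0] < r1[1]


def intersect(r1, r2):
    """Model of * for half-open ranges; None stands in for an empty range."""
    lower, upper = max(r1[0], r2[0]), min(r1[1], r2[1])
    return (lower, upper) if lower < upper else None


def freetime(gap_ranges, frm, to):
    """Model of f_freetime: clip each overlapping gap to [frm, to) and number it."""
    window = (frm, to)
    clipped = [intersect(g, window) for g in sorted(gap_ranges) if overlaps(g, window)]
    return list(enumerate(clipped, start=1))


# Gaps for user 1, derived from the question's sample data
gap_ranges = [
    (datetime(2016, 6, 12, 14, 0), datetime(2016, 6, 12, 16, 0)),
    (datetime(2016, 6, 12, 17, 30), datetime(2016, 6, 12, 18, 0)),
    (datetime(2016, 6, 12, 21, 15), datetime(2016, 6, 13, 10, 0)),
    (datetime(2016, 6, 13, 12, 0), datetime(2016, 6, 13, 18, 0)),
]
```

With the window 2016-06-12 00:00 to 2016-06-13 00:00, the gap that runs past midnight is cut at the window's upper bound, which is what g.gap * tsrange(_from, _to) does, and the gap on the 13th is filtered out by &&.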
Or just use the query without function wrapper.
Performance
If you have many rows per user, add an SP-GiST index to optimize query times (one reason to use a MV):
CREATE INDEX activity_gap_spgist_idx ON mv_gap USING spgist (gap);
Also add a plain index on (user_id).
Details in this related answer:
Perform this hours of operation query in PostgreSQL

Related

PostgreSQL query to return free rooms for booking availability calender

I need a bit of help writing an SQL query.
The scenario: I have a table named BookedRooms in which three columns matter most: checkInDate and checkOutDate, both of type timestamp, and roomId, which is a foreign key to the Rooms table.
The Rooms table has a PK, a name column and a roomNo column.
This is the BookedRooms table:
+----+----------------------+----------------------+--------+
| PK | checkInDate          | checkOutDate         | roomId |
+----+----------------------+----------------------+--------+
| 1  | 2022-05-26T00:00:00Z | 2022-05-29T00:00:00Z | 2      |
| 2  | 2022-05-29T00:00:00Z | 2022-05-30T00:00:00Z | 3      |
+----+----------------------+----------------------+--------+
This is the Rooms table:
+----+--------+--------+
| PK | name   | roomNo |
+----+--------+--------+
| 2  | Deluxe | 102    |
| 3  | King   | 103    |
+----+--------+--------+
Now I want to write a query where, if I pass a month number such as 4, it returns the name and roomNo of the rooms that are free on each particular day of that month.
The rule for whether a room is occupied: if, for example, room 102 has a check-in date of 3 April and a check-out date of 6 April, the query must not include this room in the result set until the check-out date. From the check-out date onwards it should include room 102 again, until the room appears in another checkInDate.
I recommend creating an exclusion constraint on bookedrooms. Not only can the GiST index that implements the constraint speed up the search you want, but it will also exclude double booking.
CREATE EXTENSION IF NOT EXISTS btree_gist;
ALTER TABLE bookedrooms ADD EXCLUDE USING gist (
tstzrange(checkindate, checkoutdate) WITH &&,
roomid WITH =
);
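To see what the constraint enforces, here is a toy in-memory model in Python. The class BookingTable is hypothetical, and PostgreSQL performs the check through the GiST index rather than a linear scan. A new row is rejected only when it has the same roomid and its [checkin, checkout) range overlaps an existing one; with the default half-open bounds, a stay that starts on another stay's checkout date is allowed.

```python
from datetime import datetime


class BookingTable:
    """Toy model of EXCLUDE USING gist (tstzrange(checkin, checkout) WITH &&, roomid WITH =)."""

    def __init__(self):
        self.rows = []  # list of (roomid, checkin, checkout)

    def insert(self, roomid, checkin, checkout):
        for other_room, other_in, other_out in self.rows:
            # Conflict only if BOTH operators match: same room (=) and overlap (&&)
            if other_room == roomid and checkin < other_out and other_in < checkout:
                raise ValueError(f"conflicting booking for room {roomid}")
        self.rows.append((roomid, checkin, checkout))
```

For example, after the two sample bookings, inserting room 2 for 29-30 May succeeds, while room 2 for 28-30 May raises ValueError.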
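The query you need is something like this (month 4, i.e. April of the current year):
SELECT name, roomno FROM rooms
EXCEPT
SELECT r.name, r.roomno
FROM rooms AS r
JOIN bookedrooms AS b ON b.roomid = r.pk -- assumes the key column of rooms is named pk
WHERE tstzrange(b.checkindate, b.checkoutdate) &&
tstzrange(
date_trunc('year', current_timestamp) + INTERVAL '1 month' * (4 - 1), -- 1 April
date_trunc('year', current_timestamp) + INTERVAL '1 month' * 4 -- 1 May
);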
&& is the "overlaps" operator for ranges.
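A month-level range check returns rooms that are free for the whole month. For the per-day list the question asks about, the same overlap test can be applied one day at a time. A plain-Python sketch, assuming dates instead of timestamps and a rooms mapping keyed by the PK column (free_rooms_per_day is a hypothetical name):

```python
import calendar
from datetime import date, timedelta


def free_rooms_per_day(rooms, bookings, year, month):
    """rooms: {pk: (name, room_no)}; bookings: [(room_pk, checkin, checkout)].

    A booking occupies [checkin, checkout), so the room is free again on the
    checkout date. Returns {day: sorted list of (name, room_no) free that day}.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    result = {}
    for offset in range(days_in_month):
        day = date(year, month, 1) + timedelta(days=offset)
        next_day = day + timedelta(days=1)
        result[day] = sorted(
            rooms[pk]
            for pk in rooms
            # Same overlap test as &&: booking [in, out) vs. the day [day, next_day)
            if not any(b_pk == pk and b_in < next_day and day < b_out
                       for b_pk, b_in, b_out in bookings)
        )
    return result


rooms = {2: ("Deluxe", 102), 3: ("King", 103)}
bookings = [
    (2, date(2022, 5, 26), date(2022, 5, 29)),
    (3, date(2022, 5, 29), date(2022, 5, 30)),
]
```

With this data, room 102 is missing from 26 to 28 May and reappears on its check-out date, 29 May.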

Postgres time comparison with time zone

Today I encountered some strange Postgres behavior. Let me explain.
Here is the table I am working on:
=># \d planning_time_slot
Table "public.planning_time_slot"
Column | Type | Collation | Nullable | Default
-------------+---------------------------+-----------+----------+------------------------------------------------
id | integer | | not null | nextval('planning_time_slot_id_seq'::regclass)
planning_id | integer | | not null |
day | character varying(255) | | not null |
start_time | time(0) without time zone | | not null |
end_time | time(0) without time zone | | not null |
day_id | integer | | not null | 0
Indexes:
"planning_time_slot_pkey" PRIMARY KEY, btree (id)
"idx_a9e3f3493d865311" btree (planning_id)
Foreign-key constraints:
"fk_a9e3f3493d865311" FOREIGN KEY (planning_id) REFERENCES planning(id)
What I want to do is something like this:
select * from planning_time_slot where start_time > (CURRENT_TIME AT TIME ZONE 'Europe/Paris');
But it seems that Postgres compares the times before the time zone conversion.
Here are my tests:
=># select * from planning_time_slot where start_time > (CURRENT_TIME AT TIME ZONE 'Europe/Paris');
id | planning_id | day | start_time | end_time | day_id
-----+-------------+-----+------------+----------+--------
157 | 6 | su | 16:00:00 | 16:30:00 | 0
(1 row)
=># select (CURRENT_TIME AT TIME ZONE 'Europe/Paris');
timezone
--------------------
16:35:48.591002+02
(1 row)
When I try with many rows, it appears that the comparison is done between start_time and CURRENT_TIME without the time zone conversion.
I also tried:
select * from planning_time_slot where start_time > timezone('Europe/Paris', CURRENT_TIME);
It has the exact same result.
I also tried changing the column type to time(0) with time zone. It gives exactly the same result.
One last important point: I really need to be able to set the time zone, because later on I will change it dynamically depending on other things. So it will not always be 'Europe/Paris'.
Does anyone have a clue or a hint?
psql (PostgreSQL) 11.2 (Debian 11.2-1.pgdg90+1)
(CURRENT_TIME AT TIME ZONE 'Europe/Paris') is, for example, 17:52:17.872082+02, which is the same moment as 15:52:17.872082+00. timetz (time with time zone) values are compared by that UTC equivalent; the offset only records which zone the clock time is expressed in. Changing the time zone does not change what point in time it represents.
So when you compare it with a time, the time is implicitly cast to timetz using the session's TimeZone setting (UTC in these examples)...
# select '17:00:00'::time < '17:52:17+02'::timetz;
?column?
----------
f
That is really...
# select '17:00:00'::time < '15:52:17'::time;
?column?
----------
f
Casting a timetz to a time drops the time zone offset and keeps the local clock reading.
test=# select (CURRENT_TIME AT TIME ZONE 'Europe/Paris')::time;
timezone
-----------------
17:55:57.099863
(1 row)
test=# select '17:00:00' < (CURRENT_TIME AT TIME ZONE 'Europe/Paris')::time;
?column?
----------
t
Note that this sort of comparison only makes sense if you want to store the notion that a thing happens at 17:00 according to the clock on the wall. For example, if you had a mobile phone game where an event starts "at 17:00" meaning 17:00 where the user is. This is referred to as a "floating time zone".
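Python's datetime.time behaves similarly and is handy for experimenting with the idea. In this sketch a fixed +02:00 offset stands in for Europe/Paris in summer (an assumption; a real zone also has DST rules). Aware times are compared by their UTC equivalent, and replace(tzinfo=None) keeps only the wall-clock reading, much like the ::time cast. One difference: Python refuses to order a naive time against an aware one and raises TypeError instead of casting.

```python
from datetime import time, timedelta, timezone

# Fixed +02:00 offset standing in for Europe/Paris during summer time
paris_summer = timezone(timedelta(hours=2))

t_paris = time(17, 52, 17, tzinfo=paris_summer)
t_utc = time(15, 52, 17, tzinfo=timezone.utc)

# Aware times are compared after subtracting their UTC offsets,
# so these two denote the same moment, like equal timetz values.
same_moment = t_paris == t_utc

# Dropping the offset keeps the wall-clock reading, like casting timetz to time
wall_clock = t_paris.replace(tzinfo=None)
after_five = time(17, 0) < wall_clock  # 17:00 vs 17:52:17 on the clock
```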
Assuming day is "day of week", I suggest storing it as an integer. It's easier to compare and localize.
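Instead of separate start and end times, consider a single range of times. PostgreSQL has no built-in timerange type, but you can create one with CREATE TYPE timerange AS RANGE (subtype = time); then you can use range operators.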
I think you have deeper problems.
You have a day, a start time and an end time, but no notion of time zone. So this will mean something different depending on the time zone of the observer.
I think you should add a tz column that stores which time zone that information is in. Then, if day were a date (rather than a day-of-week label such as 'su'), you could get the start time like this:
WHERE (day + start_time) AT TIME ZONE tz > current_timestamp

postgresql pivot using crosstab

I am having trouble using crosstab() in PostgreSQL 11.
Here is my table:
CREATE TABLE monitor(tz timestamptz, level int, table_name text, status text);
The table monitors events on other tables. It contains:
table_name (the table on which the event occurred)
tz (the timestamp at which the event occurred)
level (the level of the event)
status (the start or end of the event)
Here is some sample data:
tz | level | status | table_name
----------------------------------+-------+--------+--------------
2019-10-24 16:18:34.89435+05:30 | 2 | start | test_table_2
2019-10-24 16:18:58.922523+05:30 | 2 | end | test_table_2
2019-11-01 10:31:08.948459+05:30 | 3 | start | test_table_3
2019-11-01 10:41:22.863529+05:30 | 3 | end | test_table_3
2019-11-01 10:51:44.009129+05:30 | 3 | start | test_table_3
2019-11-01 12:35:23.280294+05:30 | 3 | end | test_table_3
Given a timestamp, I want to list out all current events at that time. It could be done using the criteria,
start_time <= 'given_timestamp' and end_time >= 'given_timestamp'
So I tried to use crosstab() to pivot the table over the columns table_name, status and tz. My query is:
with q1 (table_name, start_time,end_time) as
(select * from crosstab
('select table_name, status, tz from monitor ')
as finalresult (table_name text, start_time timestamptz, end_time timestamptz)),
q2 (level,start_time,end_time) as
(select * from crosstab('select level, status, tz from monitor ')
as finalresult (level int, start_time timestamptz, end_time timestamptz))
select q1.table_name,q2.level,q1.start_time,q1.end_time
from q1,q2
where q1.start_time=q2.start_time;
The output of the query is:
table_name | level | start_time | end_time
--------------+-------+----------------------------------+----------------------------------
test_table_2 | 2 | 2019-10-24 16:18:34.89435+05:30 | 2019-10-24 16:18:58.922523+05:30
test_table_3 | 3 | 2019-11-01 10:31:08.948459+05:30 | 2019-11-01 10:41:22.863529+05:30
But my expected output is:
table_name | level | start_time | end_time
--------------+-------+----------------------------------+----------------------------------
test_table_2 | 2 | 2019-10-24 16:18:34.89435+05:30 | 2019-10-24 16:18:58.922523+05:30
test_table_3 | 3 | 2019-11-01 10:31:08.948459+05:30 | 2019-11-01 10:41:22.863529+05:30
test_table_3 | 3 | 2019-11-01 10:51:44.009129+05:30 | 2019-11-01 12:35:23.280294+05:30
How do I achieve the expected output? Or is there any better way other than crosstab?
I would use a self join for this. To keep the rows on the same level and table together you can use a window function to assign numbers to them so they can be distinguished.
with numbered as (
select tz, level, table_name, status,
row_number() over (partition by table_name, status order by tz) as rn
from monitor
)
select st.table_name, st.level, st.tz as start_time, et.tz as end_time
from numbered as st
join numbered as et on st.table_name = et.table_name
and et.status = 'end'
and et.level = st.level
and et.rn = st.rn
where st.status = 'start'
order by st.table_name, st.level;
This assumes that there is never a row with status = 'end' that has an earlier timestamp than the corresponding row with status = 'start'.
Online example: https://rextester.com/QYJK57764
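The pairing logic of the query, plus the filter from the question (an event is current at a given moment if it started at or before it and ended at or after it), can be modeled in plain Python. This is a sketch: pair_events and current_events are hypothetical names, and the timestamps are naive datetimes with the +05:30 offset and fractional seconds dropped for brevity.

```python
from collections import defaultdict
from datetime import datetime


def pair_events(rows):
    """rows: (tz, level, table_name, status) tuples.

    Number rows per (table_name, status) in time order, like
    row_number() OVER (PARTITION BY table_name, status ORDER BY tz),
    then join start n with end n of the same table when the levels match.
    """
    numbered = defaultdict(list)
    for tz, level, table_name, status in sorted(rows):
        numbered[(table_name, status)].append((tz, level))
    pairs = []
    for (table_name, status), starts in numbered.items():
        if status != 'start':
            continue
        ends = numbered.get((table_name, 'end'), [])
        for (start_tz, level), (end_tz, end_level) in zip(starts, ends):
            if level == end_level:
                pairs.append((table_name, level, start_tz, end_tz))
    return sorted(pairs)  # ORDER BY table_name, level (then start time)


def current_events(pairs, at):
    """Events that started at or before `at` and ended at or after it."""
    return [p for p in pairs if p[2] <= at <= p[3]]


monitor_rows = [
    (datetime(2019, 10, 24, 16, 18, 34), 2, 'test_table_2', 'start'),
    (datetime(2019, 10, 24, 16, 18, 58), 2, 'test_table_2', 'end'),
    (datetime(2019, 11, 1, 10, 31, 8), 3, 'test_table_3', 'start'),
    (datetime(2019, 11, 1, 10, 41, 22), 3, 'test_table_3', 'end'),
    (datetime(2019, 11, 1, 10, 51, 44), 3, 'test_table_3', 'start'),
    (datetime(2019, 11, 1, 12, 35, 23), 3, 'test_table_3', 'end'),
]
```

At 2019-11-01 11:00 only the second test_table_3 event is current; at 10:45 none is.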

selecting records without value

I'm having trouble getting the desired result. The task looks simple: make a daily count of event occurrences for the top countries.
The main table looks like this:
id | date | country | col1 | col2 | ...
1 | 2018-01-01 21:21:21 | US | value 1 | value 2 | ...
2 | 2018-01-01 22:32:54 | UK | value 1 | value 2 | ...
From this table, I get the daily event counts by country with:
SELECT date::DATE AT TIME ZONE 'UTC', country, COALESCE(count(id),0) FROM tab1
GROUP BY 1, 2
The problem arises when no event was made by a UK user on 2 January 2018:
country_events
date | country | count
2018-01-01 | US | 23
2018-01-01 | UK | 5
2018-01-02 | US | 30
2018-01-02 | UK | 0 -> is desired result, but row is missing
I've tried generating a date series and a series of the countries I'm looking for, then CROSS JOINing the two. I then left joined this helper, with columns date and country, to my result table like this:
SELECT * FROM helper h
LEFT JOIN country_events c ON c.date::DATE = h.date::DATE AND c.country = h.country
I'm using PostgreSQL.
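You need an outer join from the generated days (and the countries you want) to the event table, and you must group by the generated values, not by columns of tab1, which are NULL for days without events:
SELECT ts.d::date AS day, c.country, count(tab1.id) AS events -- count(tab1.id) is 0 when nothing matched
FROM generate_series(TIMESTAMP '2018-01-01 00:00:00',
TIMESTAMP '2018-01-31 00:00:00',
INTERVAL '1 day') AS ts(d)
CROSS JOIN (VALUES ('US'), ('UK')) AS c(country) -- the countries of interest
LEFT JOIN tab1 ON tab1.country = c.country
AND tab1.date >= ts.d AND tab1.date < ts.d + INTERVAL '1 day'
GROUP BY ts.d, c.country
ORDER BY ts.d, c.country;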
This will give the desired list for January 2018.
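The idea behind the fix is a dense grid: every (day, country) combination gets a row, and a missing match counts as zero. A plain-Python model of that, with made-up sample events and a hypothetical daily_counts helper:

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_counts(events, countries, first_day, last_day):
    """events: (timestamp, country) pairs.

    Returns one (day, country, count) row for every day in
    [first_day, last_day] and every listed country; combinations
    without events get 0, like count(tab1.id) after the LEFT JOIN.
    """
    counts = Counter((ts.date(), country) for ts, country in events)
    rows = []
    day = first_day
    while day <= last_day:
        for country in countries:
            # Counter returns 0 for a missing key
            rows.append((day, country, counts[(day, country)]))
        day += timedelta(days=1)
    return rows


# Made-up sample events: no UK event on 2018-01-02
events = [
    (datetime(2018, 1, 1, 21, 21, 21), 'US'),
    (datetime(2018, 1, 1, 22, 32, 54), 'UK'),
    (datetime(2018, 1, 2, 9, 0, 0), 'US'),
]
```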

Grouping based on every N days in postgresql

I have a table that includes ID, date, values (temperature) and some other stuff. My table looks like this:
+-----+--------------+------------+
| ID | temperature | Date |
+-----+--------------+------------+
| 1 | 26.3 | 2012-02-05 |
| 2 | 27.8 | 2012-02-06 |
| 3 | 24.6 | 2012-02-07 |
| 4 | 29.6 | 2012-02-08 |
+-----+--------------+------------+
I want to perform aggregation queries like sum and mean for every 10 days.
Is this possible in PostgreSQL?
select
"date",
temperature,
avg(temperature) over(order by "date" rows 10 preceding) mean
from t
order by "date"
select id,
temperature,
sum(temperature) over (order by "date" rows between 10 preceding and current row)
from the_table;
It might not be exactly what you want, as it computes a moving sum over the current row and the 10 preceding rows, which is not necessarily the same as the last 10 days.
Since Postgres 11, you can use a range based on an interval instead:
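A row-based frame counts rows, not days, and includes the current row, so 10 preceding covers up to 11 rows. A minimal Python model (moving_sum_rows is a hypothetical name):

```python
def moving_sum_rows(values, preceding):
    """Sum over the current value and up to `preceding` earlier values,
    like ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW."""
    return [sum(values[max(0, i - preceding): i + 1]) for i in range(len(values))]
```

For example, moving_sum_rows([1, 2, 3, 4], 2) returns [1, 3, 6, 9]; the first rows simply have shorter frames.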
select id,
temperature,
avg(temperature) over (order by "date"
range between interval '10 days' preceding and current row)
from the_table;
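With a RANGE frame the window is defined by date distance, so gaps in the data shrink the frame instead of pulling in older rows. A minimal Python model, with hypothetical sample data that skips some days (moving_avg_range is also a hypothetical name):

```python
from datetime import date, timedelta


def moving_avg_range(rows, days):
    """rows: (date, value) pairs. For each row, average the values whose date
    lies in [date - days, date], like RANGE BETWEEN INTERVAL '<days> days'
    PRECEDING AND CURRENT ROW (rows with the same date are all included)."""
    window = timedelta(days=days)
    result = []
    for current, _ in rows:
        in_frame = [v for d, v in rows if current - window <= d <= current]
        result.append(sum(in_frame) / len(in_frame))
    return result


rows = [(date(2012, 2, 5), 10), (date(2012, 2, 6), 20), (date(2012, 2, 20), 30)]
```

Here moving_avg_range(rows, 10) gives [10.0, 15.0, 30.0]: the 2012-02-20 row is alone in its 10-day frame, whereas a ROWS frame with 10 preceding would average all three rows (20.0).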