How to correctly test whether a date is within time intervals - postgresql

I have a timestamp of a user action, and several time intervals during which the user has grants to perform the action.
I need to check whether the timestamp of this action is within at least one of those time intervals.
Table with users:
CREATE TABLE ausers (
  id serial PRIMARY KEY,
  user_name VARCHAR(255) default NULL,
  action_date TIMESTAMP
);
INSERT INTO ausers VALUES(1,'Jhon', '2018-02-21 15:05:06');
INSERT INTO ausers VALUES(2,'Bob', '2018-05-24 12:22:26');
 # | id | user_name | action_date
---+----+-----------+---------------------
 1 |  1 | Jhon      | 21.02.2018 15:05:06
 2 |  2 | Bob       | 24.05.2018 12:22:26
Table with grants:
CREATE TABLE user_grants (
  id serial PRIMARY KEY,
  user_id INTEGER,
  start_date TIMESTAMP,
  end_date TIMESTAMP
);
INSERT INTO user_grants VALUES(1, 1, '2018-01-01 00:00:01', '2018-03-01 00:00:00');
INSERT INTO user_grants VALUES(2, 1, '2018-06-01 00:00:01', '2018-09-01 00:00:00');
INSERT INTO user_grants VALUES(3, 2, '2018-01-01 00:00:01', '2018-02-01 00:00:00');
INSERT INTO user_grants VALUES(4, 2, '2018-02-01 00:00:01', '2018-03-01 00:00:00');
 # | id | user_id | start_date          | end_date
---+----+---------+---------------------+---------------------
 1 |  1 |       1 | 01.01.2018 00:00:01 | 01.03.2018 00:00:00
 2 |  2 |       1 | 01.06.2018 00:00:01 | 01.09.2018 00:00:00
 3 |  3 |       2 | 01.01.2018 00:00:01 | 01.02.2018 00:00:00
 4 |  4 |       2 | 01.02.2018 00:00:01 | 01.03.2018 00:00:00
The query:
select u.user_name,
       case
         when array_agg(gr.range) @> array_agg(tstzrange(u.action_date, u.action_date, '[]')) then 'Yes'
         else 'No'
       end as "permition was granted"
from ausers u
left join (select tstzrange(ug.start_date, ug.end_date, '[]') as range, ug.user_id as uid
           from user_grants ug) as gr on gr.uid = u.id
group by u.user_name;
Result:
 # | user_name | permition was granted
---+-----------+-----------------------
 1 | Bob       | No
 2 | Jhon      | No
Timestamp '01.02.2018 15:05:06' is within the "01.01.2018 00:00:01, 01.03.2018 00:00:00" range, so "Bob" had a grant to perform the action and there should be "Yes" in the first row, not "No".
The expected output is like this:
 # | user_name | permition was granted
---+-----------+-----------------------
 1 | Bob       | Yes
 2 | Jhon      | No
I tried to test like this:
select array_agg(tstzrange('2018-02-21 15:05:06', '2018-02-21 15:05:06', '[]')) <@ array_agg(tstzrange('2018-01-01 00:00:01', '2018-03-01 00:00:01', '[]'));
 # | ?column?
---+----------
   | false
Result is "false".
But if I remove the array_agg function:
select tstzrange('2018-02-21 15:05:06', '2018-02-21 15:05:06', '[]') <@ tstzrange('2018-01-01 00:00:01', '2018-03-01 00:00:01', '[]');
 # | ?column?
---+----------
   | true
it works fine - the result is "true". Why? What's wrong with array_agg?
I have to use array_agg because I have several time intervals to compare.
I have to make a "fake" time interval
array_agg(tstzrange(u.action_date, u.action_date, '[]'))
from a single timestamp, because the @> operator doesn't allow comparing a timestamp with an array of time intervals.
How can I check that one date is within at least one time interval from an array of time intervals?

There are several @> operators in PostgreSQL:
tstzrange @> tstzrange tests if the first range contains the second.
anyarray @> anyarray tests if the first array contains all elements of the second array.
In your query that will test whether, for each range in the second array, there is an equal range in the first array.
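To see the difference concretely, a small illustration (int4range used for brevity; the same holds for tstzrange):
select int4range(2, 3) <@ int4range(1, 10);               -- true: range containment
select array[int4range(2, 3)] <@ array[int4range(1, 10)]; -- false: array containment compares
                                                          -- elements for equality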
There is a way to test if a range is contained in one of the elements of an array of ranges:
someinterval <@ ANY (array_of_intervals)
but there is no straightforward way to express your condition with an operator.
Do it without an aggregate: join the two tables on @> and count the result rows, as in the sketch below.
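Applied to the tables above, an untested sketch of that rewrite (using only the columns from the question):
select u.user_name,
       case when count(ug.id) > 0 then 'Yes' else 'No' end as "permition was granted"
from ausers u
left join user_grants ug
       on ug.user_id = u.id
      and tstzrange(ug.start_date, ug.end_date, '[]') @> u.action_date::timestamptz
group by u.user_name;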

Since all three dates are scalar quantities, Postgres range checking is not required; a simple BETWEEN comparison suffices.
select au.user_name
     , case when ug.user_id is null then 'No' else 'Yes' end authorized
from ausers au
left join user_grants ug
       on ( au.id = ug.user_id
            and au.action_date between ug.start_date and ug.end_date
          );
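If a user can have several matching grant rows, the LEFT JOIN above emits one row per match; an EXISTS probe (a sketch against the same schema) avoids the duplicates:
select au.user_name
     , case when exists (select 1
                         from user_grants ug
                         where ug.user_id = au.id
                           and au.action_date between ug.start_date and ug.end_date)
            then 'Yes' else 'No' end authorized
from ausers au;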
BTW, I think the expected results you posted are backwards. Neither user has a timestamp of '01.02.2018 15:05:06' as indicated in the description.

Related

Postgres Find missing dates between a dataset of two columns

I'm trying to create a query that returns the missing dates between two columns, across multiple rows.
Example:
leases
 move_in    | move_out   | hotel_id
------------+------------+----------
 2021-04-01 | 2021-04-14 | 1
 2021-04-17 | 2021-04-30 | 1
 2021-04-01 | 2021-04-14 | 2
 2021-04-17 | 2021-04-30 | 2
Result should be
 date       | hotel_id
------------+----------
 2021-04-15 | 1
 2021-04-16 | 1
 2021-04-15 | 2
 2021-04-16 | 2
You're finding the difference between two sets. One is the leased hotel days. The other is all days in the month of April. And you're doing this for all hotels.
We can make a set of all days of April for all hotels. First we need to build the set of all days in the month of April: generate_series('2021-04-01'::date, '2021-04-30'::date, '1 day').
Then we need to cross join this with all hotel IDs.
select *
from generate_series('2021-04-01'::date, '2021-04-30'::date, '1 day') as dates(day)
cross join (
  select distinct hotel_id as id
  from leases
) as hotels(id)
Now for each day we can left join this with the leases for that day.
select *
from generate_series('2021-04-01'::date, '2021-04-30'::date, '1 day') as dates(day)
cross join (
  select distinct hotel_id as id
  from leases
) as hotels(id)
left join leases on day between leases.move_in and leases.move_out and leases.hotel_id = hotels.id
Any days without a lease won't have a lease.id, so filter on that.
select day, hotels.id
from generate_series('2021-04-01'::date, '2021-04-30'::date, '1 day') as dates(day)
cross join (
  select distinct hotel_id as id
  from leases
) as hotels(id)
left join leases on day between leases.move_in and leases.move_out and leases.hotel_id = hotels.id
where leases.id is null
order by hotels.id, day
If you're using PostgreSQL 14+ you can use multiranges to do this:
CREATE TEMP TABLE t (
  "move_in"  DATE,
  "move_out" DATE,
  "hotel_id" INTEGER
);
INSERT INTO t ("move_in", "move_out", "hotel_id")
VALUES ('2021-04-01', '2021-04-14', '1')
     , ('2021-04-17', '2021-04-30', '1')
     , ('2021-05-03', '2021-05-30', '1') -- added this as a test case
     , ('2021-04-01', '2021-04-14', '2')
     , ('2021-04-17', '2021-04-30', '2');
SELECT hotel_id, datemultirange(DATERANGE(MIN(move_in), MAX(move_out), '[]')) - range_agg(DATERANGE(move_in, move_out, '[]')) AS r
FROM t
GROUP BY hotel_id
returns
+--------+-------------------------------------------------+
|hotel_id|r                                                |
+--------+-------------------------------------------------+
|2       |{[2021-04-15,2021-04-17)}                        |
|1       |{[2021-04-15,2021-04-17),[2021-05-01,2021-05-03)}|
+--------+-------------------------------------------------+
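For reference, PostgreSQL normalizes ranges over discrete types such as date to the inclusive-lower/exclusive-upper form, which is why each availability gap starts the day after a move_out and its exclusive upper bound is the next move_in:
SELECT DATERANGE('2021-04-01', '2021-04-14', '[]');  -- [2021-04-01,2021-04-15)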
If you want to have 1 row per day you can use unnest and generate_series to expand the multiranges:
WITH available_ranges AS (
  SELECT hotel_id, unnest(datemultirange(DATERANGE(MIN(move_in), MAX(move_out), '[]')) - range_agg(DATERANGE(move_in, move_out, '[]'))) AS r
  FROM t
  GROUP BY hotel_id
)
SELECT hotel_id, generate_series(lower(r), upper(r) - 1, '1 day'::interval)
FROM available_ranges
ORDER BY 1, 2
;
returns
+--------+---------------------------------+
|hotel_id|generate_series |
+--------+---------------------------------+
|1 |2021-04-15 00:00:00.000000 +00:00|
|1 |2021-04-16 00:00:00.000000 +00:00|
|1 |2021-05-01 00:00:00.000000 +00:00|
|1 |2021-05-02 00:00:00.000000 +00:00|
|2 |2021-04-15 00:00:00.000000 +00:00|
|2 |2021-04-16 00:00:00.000000 +00:00|
+--------+---------------------------------+

How to Shorten Execution Time for A View

I have 3 tables: a user table, an admin table, and a cust table. Both the admin and cust tables are foreign-keyed to the user table. Basically, every user has a user record, and the type of user they are is determined by whether they have a record in the admin or the cust table.
user        admin                 cust
user_id     user_id | admin_id    user_id | cust_id
--------    --------|----------   --------|---------
1           1       | a           2       | dd
2           4       | b           3       | ff
3
4
Then I have a login_history table that records the user_id and a login timestamp every time a user logs into the app:
login_history
 user_id | login_on
---------+---------------------
 1       | 2022-01-01 13:22:43
 1       | 2022-01-02 16:16:27
 3       | 2022-01-05 21:17:52
 2       | 2022-01-11 11:12:26
 3       | 2022-01-12 03:34:47
I would like to create a view that contains the date of the first day of each week of the year (starting from Jan 1st), plus a count of the unique admin users and a count of the unique cust users that logged in during that week. So the resulting view should contain the following 53 records, one for each week.
login_counts_view
 week_start_date | admin_count | cust_count
-----------------+-------------+------------
 2022-01-01      | 1           | 1
 2022-01-08      | 0           | 2
 2022-01-15      | 0           | 0
 ...
 2022-12-31      | 0           | 0
Note that the first week (2022-01-01) only has 1 count for admin_count even though the admin with user_id 1 logged in twice that week.
Below is the current query I have for the view. However, the tables are pretty large and it takes over 10 seconds to retrieve all records from the view, mainly because of the left-joined date comparisons.
CREATE VIEW login_counts_view AS
SELECT
    week_start_dates.week_start_date::text AS week_start_date,
    count(distinct a.user_id) AS admin_count,
    count(distinct c.user_id) AS cust_count
FROM (
    SELECT to_char(i::date, 'YYYY-MM-DD') AS week_start_date
    FROM generate_series(date_trunc('year', NOW()), to_char(NOW(), 'YYYY-12-31')::date, '1 week') i
) week_start_dates
LEFT JOIN login_history l ON l.login_on::date BETWEEN week_start_dates.week_start_date::date
                                                  AND (week_start_dates.week_start_date::date + INTERVAL '6 day')::date
LEFT JOIN admin a ON a.user_id = l.user_id
LEFT JOIN cust c ON c.user_id = l.user_id
GROUP BY week_start_date;
Does anyone have any tips as to how to make this query execute more efficiently?
Idea
Compute the pseudo-week of each login date: partition the year into 7-day slices and number them consecutively. The pseudo-week of a given date would be the ordinal number of the slice it falls into.
Then perform the joins on integers representing the pseudo-weeks instead of on date values and comparisons.
Implementation
A view to implement this follows:
CREATE VIEW login_counts_view_fast AS
WITH RECURSIVE Numbers(i) AS ( SELECT 0 UNION ALL SELECT i + 1 FROM Numbers WHERE i < 52 )
SELECT CAST ( date_trunc('year', NOW()) AS DATE ) + 7 * n.i week_start_date
     , count(distinct lw.admin_id) admin_count
     , count(distinct lw.cust_id) cust_count
FROM (
    SELECT i FROM Numbers
) n
LEFT JOIN (
    SELECT admin_id
         , cust_id
         , base
         , pit
         , pit - base delta
         , (pit - base) / (3600 * 24 * 7) week
    FROM (
        SELECT a.user_id admin_id
             , c.user_id cust_id
             , CAST ( EXTRACT ( EPOCH FROM l.login_on ) AS INTEGER ) pit
             , CAST ( EXTRACT ( EPOCH FROM date_trunc('year', NOW()) ) AS INTEGER ) base
        FROM login_history l
        LEFT JOIN admin a ON a.user_id = l.user_id
        LEFT JOIN cust c ON c.user_id = l.user_id
    ) le
) lw ON lw.week = n.i
GROUP BY n.i
;
Some remarks:
The epoch values are the number of seconds elapsed since an absolute base datetime (specifically 1970-01-01 00:00).
The CASTs are necessary to convert doubles to integers and timestamps to dates, as mandated by the signatures of the PostgreSQL date functions, and to enforce integer arithmetic.
The recursive subquery is a generator of consecutive integers. It could possibly be replaced by a generate_series call (untested; a sketch follows).
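For reference, a minimal sketch of that replacement: the recursive Numbers CTE emits the integers 0 through 52, which generate_series can produce directly, so the derived table (SELECT i FROM Numbers) n could become generate_series(0, 52) AS n(i). Standalone:
SELECT g.i FROM generate_series(0, 52) AS g(i);  -- 53 rows: 0 .. 52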
Evaluation
In a db fiddle test, the query plan indicates savings of 50-70% in execution time.

confusion in using select command in postgresql with timestamp column

I have a table with this structure:
CREATE TABLE users (
  id serial NOT NULL,
  created_at timestamp NOT NULL
);
I have more than 1000 records in this table.
This is my first query:
1st Query
select id,created_at from users where id in (1051,1052)
This returns two rows, which is as expected. However, when I use
2nd Query
select id,created_at from users where created_at = '2020-06-28'
or
select id,created_at from users where created_at = date '2020-06-28'
It returns nothing. This is not the expected result, as it should return two rows.
Similarly if I use this
3rd Query
select id, created_at from users where created_at between date '2020-06-28' and date '2020-06-28'
It returns nothing; however, I think this should also return two rows.
While this
4th Query
select id, created_at from users where created_at between date '2020-06-28' and date '2020-06-29'
returns two rows.
SHOW timezone returns the correct time zone, the one I am currently in.
I don't understand why the results differ between the 2nd, 3rd and 4th queries. How can I get the same result as query 1 using the 3rd query?
The single reason behind all your queries' behavior is that you are comparing a timestamp with a date.
In Query 2
you are comparing 2020-06-28 13:02:53 = 2020-06-28 00:00:00, which will not match, so no records are returned.
In Query 3
you are using BETWEEN, i.e. 2020-06-28 13:02:53 between 2020-06-28 00:00:00 and 2020-06-28 00:00:00, which will not match either, so no records are returned.
In Query 4
you are using BETWEEN, i.e. 2020-06-28 13:02:53 between 2020-06-28 00:00:00 and 2020-06-29 00:00:00. Here both records fall within those bounds, so you get the records.
So you have to compare date values. As the right operand is a date value, you have to convert the left operand to a date. Try this:
For the 2nd query:
select id,created_at from users where date(created_at) = '2020-06-28'
For the 3rd query:
select id, created_at from users where date(created_at) between date '2020-06-28' and date '2020-06-28'
You should opt for the 3rd method if you want to compare a date range. For a single day, use the 2nd query.
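One caveat worth adding: wrapping the column in date() prevents a plain index on created_at from being used. If that matters, an equivalent, index-friendly sketch compares the raw column against a half-open range:
select id, created_at
from users
where created_at >= date '2020-06-28'
  and created_at <  date '2020-06-29';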
Because what you are doing is:
test(5432)=# select '2020-06-28'::timestamp;
timestamp
---------------------
06/28/2020 00:00:00
You are selecting rows whose created_at is exactly at midnight, and there are none. The same thing happens when you do:
select id, created_at from users where created_at between date '2020-06-28' and date '2020-06-28'
You already corrected the mistake in your 3rd query in the 4th query:
select id, created_at from users where created_at between date '2020-06-28' and date '2020-06-29'
where you span the time from midnight of 06/28/2020 to midnight of 06/29/2020.
An alternate solution is:
create table dt_test(id integer, ts_fld timestamp);
insert into dt_test values (1, '07/04/2020 8:00'), (2, '07/05/2020 1:00'), (3, '07/05/2020 8:15');
select * from dt_test ;
id | ts_fld
----+---------------------
1 | 07/04/2020 08:00:00
2 | 07/05/2020 01:00:00
3 | 07/05/2020 08:15:00
select * from dt_test where date_trunc('days', ts_fld) = '07/05/2020'::date;
id | ts_fld
----+---------------------
2 | 07/05/2020 01:00:00
3 | 07/05/2020 08:15:00
In your case:
select id, created_at from users where date_trunc('days', created_at) = '2020-06-28'::date;

Difference between the max date and the penultimate max for specific employee - postgresql

I'm a bit stuck on a problem: trying to find the difference between two dates in PostgreSQL.
I have a table emp with many employees in it:
emp_id, date
1, 31-10-2017
1, 08-08-2017
1, 02-06-2017
I want it to look like this:
emp_id, max_date, penultimate_date, difference
1, 31-10-2017, 08-08-2017, 84 days
Obviously you can use max(date) and group by emp_id, but how do you retrieve the penultimate date? I have tried things like:
order by date desc limit 1 offset 1
I have also tried to put these in subqueries, but that hasn't worked, as there are many employee numbers and I need one row for each employee.
Can anyone help???
Thanks,
pp84
As kindly suggested by @Haleemur Ali, order by date desc limit 1 offset 1 would not work with several emp_id values:
t=# with d(emp_id, date)as (values(1, '31-10-2017'::date),(1, '08-08-2017'),(1, '02-06-2017' ),(2,'2016-01-01'),(2,'2016-02-02'),(2,'2016-03-03'))
select distinct emp_id
, max(date) over (partition by emp_id) max_date
, nth_value(date,2) over (partition by emp_id) penultimate_date
, max(date) over (partition by emp_id) - nth_value(date,2) over (partition by emp_id) diff
from d
;
emp_id | max_date | penultimate_date | diff
--------+------------+------------------+------
2 | 2016-03-03 | 2016-02-02 | 30
1 | 2017-10-31 | 2017-08-08 | 84
(2 rows)
Time: 0.756 ms
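One caveat on the demo above: without an ORDER BY in the window, nth_value(date, 2) sees the partition rows in an unspecified order, so the result depends on insertion order. A deterministic variant might look like this (a sketch using unambiguous ISO dates):
with d(emp_id, date) as (values
    (1, '2017-10-31'::date), (1, '2017-08-08'), (1, '2017-06-02'),
    (2, '2016-01-01'), (2, '2016-02-02'), (2, '2016-03-03'))
select distinct emp_id
     , max(date)          over w as max_date
     , nth_value(date, 2) over w as penultimate_date
     , max(date) over w - nth_value(date, 2) over w as diff
from d
window w as (partition by emp_id order by date desc
             rows between unbounded preceding and unbounded following);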
WITH emps (emp_id, date) AS (
    VALUES (1, '2017-10-31'::DATE)
         , (1, '2017-08-08'::DATE)
         , (1, '2017-06-02'::DATE)
)
SELECT DISTINCT ON (emp_id)
       emp_id
     , "date" max_date
     , LEAD("date") OVER w penultimate_date
     , "date" - LEAD("date") OVER w difference
FROM emps
WINDOW w AS (PARTITION BY emp_id ORDER BY "date" DESC)
ORDER BY emp_id, date DESC
With the rows ordered in descending order, LEAD("date") OVER w gives the date value from the next (older) row.
The DISTINCT ON limits the result set to 1 row (the first row encountered) per emp_id.
With our ordering, this first row must contain the greatest date, and LEAD(...) OVER w therefore returns the penultimate date. This gives us the following result:
emp_id | max_date | penultimate_date | difference
--------+------------+------------------+------------
1 | 2017-10-31 | 2017-08-08 | 84
(1 row)

adding missing date in a table in PostgreSQL

I have a table that contains data for every day in 2002, but it has some missing dates: 354 records for 2002 instead of 365. For my calculations, I need to have the missing dates in the table with NULL values:
+-----+------------+------------+
| ID | rainfall | date |
+-----+------------+------------+
| 100 | 110.2 | 2002-05-06 |
| 101 | 56.6 | 2002-05-07 |
| 102 | 65.6 | 2002-05-09 |
| 103 | 75.9 | 2002-05-10 |
+-----+------------+------------+
You can see that 2002-05-08 is missing. I want my final table to be like:
+-----+------------+------------+
| ID | rainfall | date |
+-----+------------+------------+
| 100 | 110.2 | 2002-05-06 |
| 101 | 56.6 | 2002-05-07 |
| 102 | | 2002-05-08 |
| 103 | 65.6 | 2002-05-09 |
| 104 | 75.9 | 2002-05-10 |
+-----+------------+------------+
Is there a way to do that in PostgreSQL?
It doesn't matter if I have the result just as a query result (not necessarily an updated table).
date is a reserved word in standard SQL and the name of a data type in PostgreSQL. PostgreSQL allows it as an identifier, but that doesn't make it a good idea. I use thedate as the column name instead.
Don't rely on the absence of gaps in a surrogate ID. That's almost always a bad idea. Treat such an ID as unique number without meaning, even if it seems to carry certain other attributes most of the time.
In this particular case, as @Clodoaldo commented, thedate seems to be a perfect primary key and the column id is just cruft - which I removed:
CREATE TEMP TABLE tbl (thedate date PRIMARY KEY, rainfall numeric);
INSERT INTO tbl(thedate, rainfall) VALUES
('2002-05-06', 110.2)
, ('2002-05-07', 56.6)
, ('2002-05-09', 65.6)
, ('2002-05-10', 75.9);
Query
Full table by query:
SELECT x.thedate, t.rainfall  -- rainfall automatically NULL for missing rows
FROM (
    SELECT generate_series(min(thedate), max(thedate), '1d')::date AS thedate
    FROM tbl
) x
LEFT JOIN tbl t USING (thedate)
ORDER BY x.thedate
Similar to what @a_horse_with_no_name posted, but simplified and ignoring the pruned id.
Fills in gaps between the first and last date found in the table. If there can be leading / lagging gaps, extend accordingly. You can use date_trunc() like @Clodoaldo demonstrated - but his query (cleaned up below) can be simpler.
INSERT missing rows
The fastest and most readable way to do it is a NOT EXISTS anti-semi-join.
INSERT INTO tbl (thedate, rainfall)
SELECT x.thedate, NULL
FROM (
    SELECT generate_series(min(thedate), max(thedate), '1d')::date AS thedate
    FROM tbl
) x
WHERE NOT EXISTS (SELECT 1 FROM tbl t WHERE t.thedate = x.thedate)
Just do an outer join against a query that returns all dates in 2002:
with all_dates as (
    select date '2002-01-01' + i as date_col
    from generate_series(0, extract(doy from date '2002-12-31')::int - 1) as i
)
select row_number() over (order by ad.date_col) as id,
       t.rainfall,
       ad.date_col as date
from all_dates ad
left join your_table t on ad.date_col = t.date
order by ad.date_col;
This will not change your table, it will just produce the result as desired.
Note that the generated id column will not contain the same values as the ID column in your table as it is merely a counter in the result set.
You could also replace the row_number() function with extract(doy from ad.date_col)
To fill the gaps (this will not reorder the IDs):
insert into t (rainfall, "date")
select null, s.d::date
from generate_series(
       (select date_trunc('year', min("date")) from t)::timestamp,
       (select max("date") from t),
       '1 day'
     ) s(d)
left join t on t."date" = s.d::date
where t."date" is null;
You would have to fully re-create your table, as the indexes would have to change.
The better way to do it is to use your preferred DBI language: make a loop that ignores the old ID and puts the values into a new table with newly serialized IDs.
for day in (whole needed calendar)
    value = select rainfall from oldbrokentable where date = day
    insert into newcleanedtable date=day, rainfall=value, id=serialized
(That's not real code! Just a concept to be adapted to your preferred scripting language.)
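For PostgreSQL specifically, the same conceptual loop can also stay inside the database as a plpgsql DO block. A hypothetical sketch, assuming a source table oldbrokentable(thedate date, rainfall numeric) (column renamed to thedate per the advice above) and a target newcleanedtable(id serial, thedate date, rainfall numeric):
DO $$
DECLARE
    d date;
    v numeric;
BEGIN
    FOR d IN
        SELECT generate_series(min(thedate), max(thedate), '1 day')::date
        FROM oldbrokentable
    LOOP
        v := NULL;
        SELECT rainfall INTO v FROM oldbrokentable WHERE thedate = d;  -- stays NULL for missing days
        INSERT INTO newcleanedtable (thedate, rainfall) VALUES (d, v); -- id assigned by the serial sequence
    END LOOP;
END
$$;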