Column of counts for time intervals - postgresql

I want to get a table that constructs a column that tracks how many times an id appears in a given week. If the id appears once it is given a 1, if it appears twice it is given a 2, but if it appears more than two times it is given a 0.
id date
a 2015-11-10
a 2015-11-25
a 2015-11-09
b 2015-11-10
b 2015-11-09
a 2015-11-05
b 2015-11-23
b 2015-11-28
b 2015-12-04
a 2015-11-10
b 2015-12-04
a 2015-12-07
a 2015-12-09
c 2015-11-30
a 2015-12-06
c 2015-10-31
c 2015-11-04
b 2015-12-01
a 2015-10-30
a 2015-12-14
the one week intervals are given as follows
1 - 2015-10-30 to 2015-11-05
2 - 2015-11-06 to 2015-11-12
3 - 2015-11-13 to 2015-11-19
4 - 2015-11-20 to 2015-11-26
5 - 2015-11-27 to 2015-12-03
6 - 2015-12-04 to 2015-12-10
7 - 2015-12-11 to 2015-12-17
The table should look like this.
id interval count
a 1 2
b 1 0
c 1 2
a 2 0
b 2 2
c 2 0
a 3 0
b 3 0
c 3 0
a 4 1
b 4 1
c 4 0
a 5 0
b 5 2
c 5 1
a 6 0
b 6 2
c 6 0
a 7 1
b 7 0
c 7 0
The interval column doesn't have to be there, I simply added it for clarity.
I am new to sql and am unsure how to break the dates into intervals. The only thing I have is grouping by date and counting.
Select id ,date, count (*) as frequency
from data_1
group by id, date having frequency <= 2;

Looking at just the data you provided, this does the trick:
SELECT v.id,
i.interval,
coalesce((CASE WHEN sub.cnt < 3 THEN sub.cnt ELSE 0 END), 0) AS count
FROM (VALUES('a'), ('b'), ('c')) v(id)
CROSS JOIN generate_series(1, 7) i(interval)
LEFT JOIN (
SELECT id, ((date - '2015-10-30')/7 + 1)::int AS interval, count(*) AS cnt
FROM my_table
GROUP BY 1, 2) sub USING (id, interval)
ORDER BY 2, 1;
A few words of explanation:
You have three id values which are here recreated with a VALUES clause. If you have many more or don't know beforehand which id's to enumerate, you can always replace the VALUES clause with a sub-query.
You provide a specific date range over 7 weeks. Since you might have weeks where a certain id is not present you need to generate a series of the interval values and CROSS JOIN that to the id values above. This yields the 21 rows you are looking for.
Then you calculate the occurrences of ids in intervals. You can subtract a date from another date which will give you the number of days in between. So subtract the date of the row from the earliest date, divide that by 7 to get the interval period, add 1 to make the interval 1-based and convert to integer. You can then convert counts of > 2 to 0 and NULL to 0 with a combination of CASE and coalesce().
The query outputs the interval too, otherwise you will have no clue what the data refers to. Optionally, you can turn this into a column which shows the date range of the interval.
More flexible solution
If you have more ids and a larger date range, you can use the below version which first determines the distinct ids and the date range. Note that the interval is now 0-based to make calculations easier. Not that it matters much because instead of the interval number, the corresponding date range is displayed.
WITH mi AS (
SELECT min(date) AS min, ((max(date) - min(date))/7)::int AS intv FROM my_table)
SELECT v.id,
to_char((mi.min + i.intv * 7)::timestamp, 'YYYY-mm-dd') || ' - ' ||
to_char((mi.min + i.intv * 7 + 6)::timestamp, 'YYYY-mm-dd') AS period,
coalesce((CASE WHEN sub.cnt < 3 THEN sub.cnt ELSE 0 END), 0) AS count
FROM mi,
(SELECT DISTINCT id FROM my_table) v
CROSS JOIN LATERAL generate_series(0, mi.intv) i(intv)
LEFT JOIN LATERAL (
SELECT id, ((date - mi.min)/7)::int AS intv, count(*) AS cnt
FROM my_table
GROUP BY 1, 2) sub USING (id, intv)
ORDER BY 2, 1;
SQLFiddle with both solutions.

Assuming you have a table of all users, this will do the trick.
select
users.id,
interval_table.id,
CASE
WHEN count(log_table.user_id)>2 THEN 0
ELSE count(log_table.user_id)
END
from users
cross join interval_table
left outer join log_table
on users.id = log_table.user_id
and log_table.event_date >= interval_table.start_interval
and log_table.event_date < interval_table.stop_interval
group by users.id, interval_table.id
order by interval_table.id, users.id
Check it out: http://sqlfiddle.com/#!15/1a822/21

Related

How to Shorten Execution Time for A View

I have 3 tables, a user table, an admin table, and a cust table. Both admin and cust tables are foreign keyed to the user_account table. Basically, every user has a user record, and the type of user they are is determined by if they have a record in the admin or the cust table.
user admin cust
user_id user_id | admin_id user_id | cust_id
--------- ---------|---------- ---------|---------
1 1 | a 2 | dd
2 4 | b 3 | ff
3
4
Then I have a login_history table that records the user_id and login timestamp every time a user logs into the app
login_history
user_id | login_on
---------|---------------------
1 | 2022-01-01 13:22:43
1 | 2022-01-02 16:16:27
3 | 2022-01-05 21:17:52
2 | 2022-01-11 11:12:26
3 | 2022-01-12 03:34:47
I would like to create a view that would contain all dates for the first day of each week in the year starting from jan 1st, and a count column that contains the count of unique admin users that logged in that week and a count of unique cust users that logged in that week. So the resulting view should contain the following 53 records, one for each week.
login_counts_view
week_start_date | admin_count | cust_count
-----------------|-------------|------------
2022-01-01 | 1 | 1
2022-01-08 | 0 | 2
2022-01-15 | 0 | 0
.
.
.
2022-12-31 | 0 | 0
Note that the first week (2022-01-01) only has 1 count for admin_count even though the admin with user_id 1 logged in twice that week.
Below is the current query I have for the view. However, the tables are pretty large and it takes over 10 seconds to retrieve all records from the view, mainly because of the left joined date comparisons.
CREATE VIEW login_counts_view AS
SELECT
week_start_dates.week_start_date::text AS week_start_date,
count(distinct a.user_id) AS admin_count,
count(distinct c.user_id) AS cust_count
FROM (
SELECT
to_char(i::date, 'YYYY-MM-DD') AS week_start_date
FROM
generate_series(date_trunc('year', NOW()), to_char(NOW(), 'YYYY-12-31')::date, '1 week') i
) week_start_dates
LEFT JOIN login_history l ON l.login_on::date BETWEEN week_start_dates.week_start_date::date AND (week_start_dates.week_start_date::date + INTERVAL '6 day')::date
LEFT JOIN admin a ON a.user_id = l.user_id
LEFT JOIN cust c ON c.user_id = l.user_id
GROUP BY week_start_date;
Does anyone have any tips as to how to make this query execute more efficiently?
Idea
Compute the pseudo-week of each login date: partition the year into 7-day slices and number them consecutively. The pseudo-week of a given date would be the ordinal number of the slice it falls into.
Then operate the joins on integers representing the pseudo-weeks instead of date values and comparisons.
Implementation
A view to implement this follows:
CREATE VIEW login_counts_view_fast AS
WITH RECURSIVE Numbers(i) AS ( SELECT 0 UNION ALL SELECT i + 1 FROM Numbers WHERE i < 52 )
SELECT CAST ( date_trunc('year', NOW()) AS DATE) + 7 * n.i week_start_date
, count(distinct lw.admin_id) admin_count
, count(distinct lw.cust_id) cust_count
FROM (
SELECT i FROM Numbers
) n
LEFT JOIN (
SELECT admin_id
, cust_id
, base
, pit
, pit-base delta
, (pit-base) / (3600 * 24 * 7) week
FROM (
SELECT a.user_id admin_id
, c.user_id cust_id
, CAST ( EXTRACT ( EPOCH FROM l.login_on ) AS INTEGER ) pit
, CAST ( EXTRACT ( EPOCH FROM date_trunc('year', NOW()) ) AS INTEGER ) base
FROM login_history l
LEFT JOIN admin a ON a.user_id = l.user_id
LEFT JOIN cust c ON c.user_id = l.user_id
) le
) lw
ON lw.week = n.i
GROUP BY n.i
;
Some remarks:
The epoch values are the number of seconds elapsed since an absolute base datetime (specifically 1/1/1970 0h00).
CASTS are necessary to convert doubles to integers and timestamps to dates as mandated by the signatures of postgresql date functions and in order to enforce integer arithmetics.
The recursive subquery is a generator of consecutive integers. It could possibly be replaced by a generate_series call (untested)
Evaluation
See it in action in this db fiddle
The query plan indicates savings of 50-70% in execution time.

How to calculate the number of messages within 10 seconds before the previous one?

I have a table with messages and I need to find chats where were two or more messages in period of 10 seconds. table
id message_id time
1 1 2021.11.10 13:09:00
1 2 2021.11.10 13:09:01
1 3 2021.11.10 13:09:50
2 1 2021.11.10 15:18:00
2 2 2021.11.10 15:20:00
3 1 2021.11.12 15:00:00
3 2 2021.11.12 15:10:00
3 2 2021.11.12 15:10:10
So the result looks like
id
1
3
I can't come up with the idea how to group by a period or maybe it can be done other way?
select id
from t
group by id, ?
having count(message_id) > 1
You can join the table with itself, matching them on the chat id and your timeframe.
create table messages (chat_id integer,message_id integer,"time" timestamp);
insert into messages values
(1,1,'2021.11.10 13:09:00'),
(1,2,'2021.11.10 13:09:01'),
(1,3,'2021.11.10 13:09:50'),
(2,1,'2021.11.10 15:18:00'),
(2,2,'2021.11.10 15:20:00'),
(3,1,'2021.11.12 15:00:00'),
(3,2,'2021.11.12 15:10:00'),
(3,2,'2021.11.12 15:10:10');
select target_chat,
target_message,
count(*) "number of messages preceding by no more than 10 seconds"
from
(select t1.chat_id target_chat,
t1.message_id target_message,
t1.time,
t2.chat_id,
t2.message_id,
t2.time
from messages t1
inner join messages t2
on t1.chat_id=t2.chat_id
and t1.message_id<>t2.message_id
and (t2.time<=t1.time-'10 seconds'::interval and t2.time<=t1.time)) a
group by 1,2;
-- target_chat | target_message | number of messages preceding by no more than 10 seconds
---------------+----------------+---------------------------------------------------------
-- 1 | 3 | 2
-- 2 | 2 | 1
-- 3 | 2 | 2
--(3 rows)
From that you can select the records with your desired number of preceding messages.
this is a simple query that finds every previous value that is included in our interval
select id from test_table t where
t.time + interval '10 second' >=
(select time from test_table where id=t.id and time>t.time limit 1)
group by id;
results
id
----
1
3
To find rows within an period of time, you can tipically use a window function which avoids a self join on the table :
SELECT id, count(*) OVER (ORDER BY time RANGE BETWEEN CURRENT ROW AND '10 minutes' FOLLOWING)
FROM t
GROUP BY id
Then you can use this query as a sub-query if you only want the id with count(*) > 1 :
SELECT DISTINCT ON (l.id) l.id
FROM
( SELECT id, count(*) OVER (ORDER BY time RANGE BETWEEN CURRENT ROW AND '10 minutes' FOLLOWING) AS ct
FROM t
GROUP BY id
) AS l
WHERE l.ct > 1 ;

Select query for fetching data on the interval of 4 hours

I am using postgres 8.1 database and I want to write a query which select data on the interval of 4 hours.
So as image show subscriber_id with date, this how currently data available in the database and
I want data like
No. of Subscriber | Interval
0 0-4
0 4-8
7 8-12
1 12-16
0 16-20
0 20-24
basically in each day we have 24 hours, if I divide 24/4=6 means I have total 6 intervals for each day
0-4
4-8
8-12
12-16
16-20
20-24
So I need count of subscribers within these intervals. Is there any data function in postgres which solve my problem or how can I write a query for this problem ?
NOTE : Please write your solution according to postgres 8.1 version
Use generate_series() to generate periods and left join date_time with appropriate periods, e.g.:
with my_table(date_time) as (
values
('2016-10-24 11:10:00'::timestamp),
('2016-10-24 11:20:00'),
('2016-10-24 15:10:00'),
('2016-10-24 21:10:00')
)
select
format('%s-%s', p, p+4) as "interval",
sum((date_time notnull)::int) as "no of subscriber"
from generate_series(0, 20, 4) p
left join my_table
on extract(hour from date_time) between p and p+ 4
group by p
order by p;
interval | no of subscriber
----------+------------------
0-4 | 0
4-8 | 0
8-12 | 2
12-16 | 1
16-20 | 0
20-24 | 1
(6 rows)
I wouldn't suppose that there is a live guy who remembers version 8.1. You can try:
create table periods(p integer);
insert into periods values (0),(4),(8),(12),(16),(20);
select
p as "from", p+4 as "to",
sum((date_time notnull)::int) as "no of subscriber"
from periods
left join my_table
on extract(hour from date_time) between p and p+ 4
group by p
order by p;
from | to | no of subscriber
------+----+------------------
0 | 4 | 0
4 | 8 | 0
8 | 12 | 2
12 | 16 | 1
16 | 20 | 0
20 | 24 | 1
(6 rows)
In Postgres, you can do this by generating all the intervals for your time periods. This is a little tricky, because you have to pick out the dates in your data. However, generate_series() is really helpful.
The rest is just a left join and aggregation:
select dt.dt, count(t.t)
from (select generate_series(min(d.dte), max(d.dte) + interval '23 hour', interval '4 hour') as dt
from (select distinct date_trunc('day', t.t)::date as dte from t) d
) dt left join
t
on t.t >= dt.dt and t.t < dt.dt + interval '4 hour'
group by dt.dt
order by dt.dt;
Note that this keeps the period as the date/time of the beginning of the period. You can readily convert this to a date and an interval number, if that is more helpful.
The Previous Solution is also working...
Adding one more option,
Instead of creating a table for periods we can also use array and unnest function of arrays in this query
My code is
select
p as "from", p+4 as "to",
sum((date_time not null)::int) as "no of subscriber"
from unnest(ARRAY[0,4,8,12,16,20]) as p
left join my_table
on extract(hour from date_time) between p and p+ 4
group by p
order by p;
I think if you run six different queries since you know the time intervals (lower and upper limit) will be better.

PostgreSQL - time series interpolation

I have a table "performances" with columns "date" and "count".
However, the rows are sparse i.e. there are many days for which there is no row, which implicitly means that the count = 0.
Is there a query I can do that when run on this:
date count
2016-7-15 3
2016-7-12 1
2016-7-11 2
Would give me this:
date count
2016-7-15 3
2016-7-14 0
2016-7-13 0
2016-7-12 1
2016-7-11 2
?
You can use generate_series() and a left join:
with q as (<your query here>)
select s.dte, coalesce(q.count, 0) as count
from (select generate_series(min(q.date), max(q.date), interval '1 day') as dte
from q
) s left join
q
on s.dte = q.date;

combining results of CTEs

I have several CTEs. CTE1A counts number of type A shops in area 1. CTE1B counts number of type B shops in area 1 and so on up to CTE1D. Similarly, CTE2B counts number of type B shops in area 2 and so on. shop_types CTE selects all types of shops: A,B,C,D. How to display a table that shows for each area (column) how many shops of each type there is (rows).
For example:
1 2 3 4 5
A 0 7 4 0 0
B 2 3 8 2 9
C 8 5 8 1 6
D 7 1 5 4 3
Database has 2 tables:
Table regions: shop_id, region_id
Table shops: shop_id, shop_type
WITH
shop_types AS (SELECT DISTINCT shops.shop_type AS type FROM shops WHERE shops.shop_type!='-9999' AND shops.shop_type!='Other'),
cte1A AS (
SELECT regions.region_id, COUNT(regions.shop_id) AS shops_number, shops.shop_type
FROM regions
RIGHT JOIN shops
ON shops.shop_id=regions.shop_id
WHERE regions.region_id=1
AND shops.shop_type='A'
GROUP BY shops.shop_type,regions.region_id)
SELECT * FROM cte1A
I'm not entirely sure I understand why you are after, but it seems you are looking for something like this:
select sh.shop_type,
count(case when r.region_id = 1 then 1 end) as region_1_count,
count(case when r.region_id = 2 then 1 end) as region_2_count,
count(case when r.region_id = 3 then 1 end) as region_3_count
from shops sh
left join regions r on r.shop_id = sh.shop_id
group by sh.shop_type
order by sh.shop_type;
You need to add one case statement for each region you want to have in the output.
If you are using Postgres 9.4 you can replace the case statements using a filter condition which kind of makes the intention a bit easier to understand (I think)
count(*) filter (where r.region_id = 1) as region_1_count,
count(*) filter (where r.region_id = 2) as region_2_count,
...
SQLFiddle: http://sqlfiddle.com/#!1/98391/1
And before you ask: no you can't make the number of columns "dynamic" based on a select statement. The column list for a query must be defined before the statement is actually executed.