I have a PostgreSQL database in which users can order from a start date to an end date.
I want to know, for each day, how many users would be able to order.
Let's make an example, given:
| id | from | to |
| --- | --- | --- |
| A | 2023-01-01 | 2023-01-03 |
| B | 2023-01-02 | 2023-01-07 |
| C | 2023-01-02 | 2023-01-09 |
| D | 2023-01-10 | 2023-01-12 |
For the first two weeks (let's suppose that 2023-01-01 was a Monday), I would have (the third column is not needed):
| day | n_of_users_that_can_order | who |
| --- | --- | --- |
| 2023-01-01 | 1 | A |
| 2023-01-02 | 3 | A, B, C |
| 2023-01-03 | 3 | A, B, C |
| 2023-01-04 | 2 | B, C |
| 2023-01-05 | 2 | B, C |
| 2023-01-06 | 2 | B, C |
| 2023-01-07 | 2 | B, C |
| 2023-01-08 | 1 | C |
| 2023-01-09 | 1 | C |
| 2023-01-10 | 1 | D |
| 2023-01-11 | 1 | D |
| 2023-01-12 | 0 | |
| 2023-01-13 | 0 | |
| 2023-01-14 | 0 | |
My end result should be the above table aggregated per week:
| week | total |
| --- | --- |
| 2023-01-01 to 2023-01-07 | 15 |
| 2023-01-08 to 2023-01-14 | 4 |
I don't know how to do it with cube.dev; this is the idea I have for now:
cube(`Users`, {
sql: `SELECT * FROM public.users`,
measures: {
usersPerDayThatCanOrder: {
type: `count`,
sql: `id`,
rollingWindow: {
trailing: `1 day`,
offset: `start`
},
filters: [
{ sql: `${CUBE}.can_order_from >= ${TODAY} AND ${CUBE}.can_order_to <= ${TODAY}` }, // NOTE: today doesn't exist
]
}
},
dimensions: {
id: {
sql: `id`,
type: `number`,
primaryKey: true
},
canOrderFrom: {
sql: `can_order_from`,
type: `time`
},
canOrderTo: {
sql: `can_order_to`,
type: `time`
},
},
dataSource: `default`
});
But it does not work, because I do not know how to get the real ${TODAY} value, nor how to aggregate per week.
Here is a SQL query which provides your expected result, assuming that your table is named test:
SELECT d.week AS "week start"
, (d.week + interval '6 days') :: date AS "week end"
, sum( upper(daterange(d.week, (d.week + interval '1 week') :: date, '[)') * daterange(t."from", t."to", '[]'))
- lower(daterange(d.week, (d.week + interval '1 week') :: date, '[)') * daterange(t."from", t."to", '[]'))
)
FROM
( SELECT generate_series(min(date_trunc('week', "from")), max("to"), interval '1 week') :: date AS week
FROM test
) AS d
INNER JOIN test AS t
ON daterange(d.week, (d.week + interval '1 week') :: date, '[)') && daterange(t."from", t."to", '[]')
GROUP BY d.week
ORDER BY d.week
The subquery calculates the start dates of the weeks covered by table test.
The INNER JOIN clause intersects the weeks with the user date ranges.
Then the rows are grouped by week and the number of overlapping days is summed over all the users.
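To make the range arithmetic concrete, here is a minimal, self-contained sketch of the intersection used above, with the dates of user B and one seven-day week of the example (assuming PostgreSQL's built-in daterange type):

-- '*' returns the intersection of two ranges; upper() - lower() counts the days in it.
SELECT upper(r) - lower(r) AS overlapping_days
FROM (SELECT daterange('2023-01-01', '2023-01-08', '[)')       -- the week, end-exclusive
            * daterange('2023-01-02', '2023-01-07', '[]') AS r  -- user B, end-inclusive
     ) AS s;
-- returns 6: user B can order on 6 of that week's 7 days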
By the way, 2023-01-01 seems to be a Sunday, not a Monday.
Result:

| week start | week end | sum |
| --- | --- | --- |
| 2022-12-26 | 2023-01-01 | 1 |
| 2023-01-02 | 2023-01-08 | 15 |
| 2023-01-09 | 2023-01-15 | 4 |
See the test result in dbfiddle.
I have a table that looks the following way
| time | group | sub_group | count |
| --- | --- | --- | --- |
| 2022-01-01 | A | True | 3 |
| 2022-01-01 | A | False | 1 |
| 2022-01-01 | B | True | 2 |
| 2022-01-01 | B | False | 1 |
| 2022-01-02 | A | False | 2 |
| 2022-01-02 | A | True | 5 |
| 2022-01-02 | B | False | 3 |
| 2022-01-03 | A | False | 3 |
| 2022-01-03 | B | False | 4 |
| 2022-01-03 | B | True | 3 |
So there is an increasing count per group+sub_group per day, but on a day when the count did not change for a group+sub_group, the row is missing.
In the example above, the missing rows would be:
...
| 2022-01-02 | B | True | 2 |
...
| 2022-01-03 | A | True | 5 |
...
For ease of data handling, I need a continuous timestamp per day for all groups+sub_groups. So the result would look like this:
| time | group | sub_group | count |
| --- | --- | --- | --- |
| 2022-01-01 | A | True | 3 |
| 2022-01-01 | A | False | 1 |
| 2022-01-01 | B | True | 2 |
| 2022-01-01 | B | False | 1 |
| 2022-01-02 | A | False | 2 |
| 2022-01-02 | A | True | 5 |
| 2022-01-02 | B | False | 3 |
| 2022-01-02 | B | True | 2 |
| 2022-01-03 | A | False | 3 |
| 2022-01-03 | A | True | 5 |
| 2022-01-03 | B | False | 4 |
| 2022-01-03 | B | True | 3 |
How could I achieve this? Probably some partition by ... over select construct, but I can't wrap my head around how to partition by timestamps from other groups in this case, as I don't have the NULL counts to forward-fill for each group as an intermediate step.
Update:
So far, I seem to have reached an intermediate state that fills in the missing timestamps (basically, just daily frequency is fine here) between groups, like this:
with time_range as (
select min(time) as start_time, -- current_timestamp - interval '2 day'
max(time) as end_time
from my_table-- current_timestamp
),
interested_events as (
select e.group, e.sub_group, e.time, e.count
from my_table e
),
classes_having_events as (
select distinct "group", sub_group
from interested_events
ORDER BY "group", sub_group
),
periods as (
select ts as period_start, ts + interval '1 day' as period_end
from generate_series(
(select start_time from time_range),
(select end_time from time_range) - interval '1 second',
interval '1 day') ts
), resampled as (
SELECT period_start,
period_end,
classes_having_events.group,
classes_having_events.sub_group,
interested_events.count
FROM periods
CROSS JOIN classes_having_events
LEFT JOIN interested_events
ON time >= period_start AND time < period_end
AND interested_events.group = classes_having_events.group
AND interested_events.sub_group = classes_having_events.sub_group
ORDER BY period_start DESC
)
Okay, seems like I was pretty close and rubber duck debugging helped.
This seems to do what I wanted:
WITH time_range AS (
SELECT MIN(time) AS start_time, -- current_timestamp - interval '2 day'
MAX(time) AS end_time
FROM my_table-- current_timestamp
),
interested_events AS (
SELECT e.group, e.sub_group, e.time, e.count
FROM my_table e
),
classes_having_events AS (
SELECT DISTINCT "group", sub_group
FROM interested_events
ORDER BY "group", sub_group
),
periods AS (
SELECT ts AS period_start, ts + INTERVAL '1 day' AS period_end
FROM GENERATE_SERIES(
(
SELECT start_time
FROM time_range
),
(
SELECT end_time
FROM time_range
) - INTERVAL '1 second',
INTERVAL '1 day') ts
),
resampled AS (
SELECT period_start,
period_end,
classes_having_events.group,
classes_having_events.sub_group,
interested_events.count
FROM periods
CROSS JOIN classes_having_events
LEFT JOIN interested_events
ON time >= period_start AND time < period_end
AND interested_events.group = classes_having_events.group
AND interested_events.sub_group = classes_having_events.sub_group
ORDER BY period_start DESC
)
SELECT period_start AS time,
"group",
sub_group,
MAX(count) OVER (PARTITION BY "group", "sub_group" ORDER BY period_start) AS count
FROM resampled
ORDER BY period_start DESC, "group", sub_group;
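The key step in the final SELECT is the running MAX: because the counts never decrease, MAX(count) OVER (PARTITION BY "group", sub_group ORDER BY period_start) carries the last known count forward into the days where the LEFT JOIN produced a NULL. A minimal, self-contained sketch of that forward-fill trick with made-up inline values:

-- Running MAX as a forward fill; this works only because the count is non-decreasing.
SELECT d, grp,
       MAX(cnt) OVER (PARTITION BY grp ORDER BY d) AS filled_cnt
FROM (VALUES (date '2022-01-01', 'A', 3),
             (date '2022-01-02', 'A', NULL::int),  -- the day with no row
             (date '2022-01-03', 'A', 5)) AS v(d, grp, cnt)
ORDER BY d;
-- filled_cnt: 3, 3, 5 -- the gap on 2022-01-02 is filled with the previous value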
6. Hive: Given a table t with schema (date, revenue), like this:
date r
Jan. 1 100
Jan. 2 120
Jan. 3 80
Jan. 4 150
Jan. 5 50
What does the following query do?
SELECT t1.date AS date, sum(t2.revenue) AS revenue
FROM t as t1 JOIN t as t2 ON t2.date <= t1.date GROUP BY 1 ORDER BY 1
I have the following query in PostgreSQL for dates between 2 ranges:
select generate_series('2019-04-01'::timestamp, '2020-03-31', '1 month')
as g_date
I need to generate a specific date in every month, i.e. the 15th of every month. The following is my query to generate the series:
DO $$
DECLARE
compdate date = '2019-04-15';
BEGIN
CREATE TEMP TABLE tmp_table ON COMMIT DROP AS
select *,
case
when extract('day' from d) <> extract('day' from compdate) then 0
when ( extract('month' from d)::int - extract('month' from compdate)::int ) % 1 = 0 then 1
else 0
end as c
from generate_series('2019-04-01'::timestamp, '2020-03-31', '1 day') d;
END $$;
SELECT * FROM tmp_table
WHERE c = 1;
But everything works perfectly if the input day is between 1 and 29, e.g. compdate = 2019-04-25:
2019-04-25
2019-05-25
2019-06-25
2019-07-25
2019-08-25
2019-09-25
2019-10-25
2019-11-25
2019-12-25
2020-01-25
2020-02-25
2020-03-25
But if I give compdate 31-04-2019 or 30-04-2019, it gives this output:
2019-05-31
2019-07-31
2019-08-31
2019-10-31
2019-12-31
2020-01-31
2020-03-31
Expected Output:
date flag
2019-04-01 0 ----start_date
2019-04-30 1
2019-05-31 1
2019-06-30 1
2019-07-31 1
2019-08-31 1
2019-09-30 1
2019-10-31 1
2019-11-30 1
2019-12-31 1
2020-01-31 1
2020-02-29 1
2020-03-31 0 ---end_date
If the matching day is not found in a month, it should take the last day of that month; i.e. if the 31st does not exist in February it should take 2020-02-29, and in April, instead of the 31st, it should take 2019-04-30.
Please suggest.
To generate the last days of the month, just generate the first days and subtract a 1-day interval.
For example, the following generates all the last days of the months in the year 2010:
SELECT x - interval '1 day' FROM
GENERATE_SERIES('2010-02-01', '2011-01-01', interval '1 month') x
You cannot accomplish what you want with generate_series. That is because generate_series applies a fixed increment to the previously generated value, in your case 1 month. Postgres will compute a correct end-of-month date from one step to the next: for example, 1 month from 31-Jan yields 28-Feb (or 29), because 31-Feb would be an invalid date, so Postgres adjusts it. However, that same interval from 28-Feb gives the valid date 28-Mar, so no end-of-month adjustment is made, and generate_series will return the 28th of the month from then on. The same applies to 30- vs. 31-day months.
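A minimal sketch of that drift, assuming standard PostgreSQL interval arithmetic:

SELECT g::date
FROM generate_series(timestamp '2020-01-31', timestamp '2020-04-30', interval '1 month') AS g;
-- 2020-01-31, 2020-02-29, 2020-03-29, 2020-04-29: once the series is clamped to the
-- end of February, it never returns to the 31st (or 30th) of the later months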
But you can achieve what you're after with a recursive CTE, by applying a varying interval to the same initial start date. If the resulting date would be invalid, the necessary end-of-month adjustment is made. The following does that:
create or replace function constant_monthly_date
( start_date timestamp
, end_date timestamp
)
returns setof date
language sql strict
as $$
with recursive date_set as
(select start_date ds, start_date sd, end_date ed, 1 cnt
union all
select (sd + cnt*interval '1 month') ds, sd, ed, cnt+1
from date_set
where ds<end_date
)
select ds::date from date_set;
$$;
-- test
select * from constant_monthly_date(date '2020-01-15', date '2020-12-15' );
select * from constant_monthly_date(date '2020-01-31', date '2020-12-31' );
Use the least function to take the smaller of the computed day and the end of the month.
create or replace function test1(day int) returns table (t timestamptz) as $$
    select least(date_trunc('day', t) + make_interval(days => day - 1),
                 date_trunc('day', t) + interval '1 month' - interval '1 day')
    from generate_series('2019-04-01', '2020-03-31', interval '1 month') t
$$ language sql;
select test1(31);
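To see the least() clamp in isolation, here is a minimal sketch for a single month (February 2020 with day = 31; values assume standard PostgreSQL date arithmetic):

SELECT least(date '2020-02-01' + (31 - 1),                              -- would be 2020-03-02
             date '2020-02-01' + interval '1 month' - interval '1 day'  -- end of February
            );
-- returns 2020-02-29 00:00:00, i.e. the end of the month wins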
There is one table:
ID DATE
1 2017-09-16 20:12:48
2 2017-09-16 20:38:54
3 2017-09-16 23:58:01
4 2017-09-17 00:24:48
5 2017-09-17 00:26:42
..
The result I need is the last 7 days of data with an hourly aggregated count of rows:
COUNT DATE
2 2017-09-16 21:00:00
0 2017-09-16 22:00:00
0 2017-09-16 23:00:00
1 2017-09-17 00:00:00
2 2017-09-17 01:00:00
..
I tried different things with EXTRACT, DISTINCT and also the generate_series function (mostly taken from similar Stack Overflow questions).
This was my best attempt so far:
SELECT
date_trunc('hour', demotime) as date,
COUNT(demotime) as count
FROM demo
GROUP BY date
How can I generate an hourly series for 7 days and fill in the count of rows?
SQL DEMO
SELECT dd, count("demotime")
FROM generate_series
( current_date - interval '7 days'
, current_date
, '1 hour'::interval) dd
LEFT JOIN Table1
ON dd = date_trunc('hour', demotime)
GROUP BY dd;
To work from now and now - 7 days:
SELECT dd, count("demotime")
FROM generate_series
( date_trunc('hour', NOW()) - interval '7 days'
, date_trunc('hour', NOW())
, '1 hour'::interval) dd
LEFT JOIN Table1
ON dd = date_trunc('hour', demotime)
GROUP BY dd;
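The LEFT JOIN plus count(demotime) is what fills the gaps: count over a column ignores NULLs, so hours with no matching row come out as 0 instead of disappearing. A minimal, self-contained sketch of the same pattern, with two inline sample rows instead of Table1:

SELECT dd, count(ts) AS cnt
FROM generate_series(timestamp '2017-09-16 20:00', timestamp '2017-09-16 23:00', interval '1 hour') dd
LEFT JOIN (VALUES (timestamp '2017-09-16 20:12:48'),
                  (timestamp '2017-09-16 20:38:54')) AS v(ts)
       ON dd = date_trunc('hour', v.ts)
GROUP BY dd
ORDER BY dd;
-- 20:00 -> 2, then 0 for 21:00, 22:00 and 23:00 (count(ts) skips the NULLs)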
user timespent(in sec) date(in timestamp)
u1 10 t1(2015-08-15)
u1 20 t2(2015-08-19)
u1 15 t3(2015-08-28)
u1 16 t4(2015-09-06)
Above is the format of my table, which represents the time spent by a user on a course, ordered by timestamp. I want to get the sum of timespent for a particular user, say u1, per week, in this format:
start_date end_date sum
2015-08-15 2015-08-21 30
2015-08-22 2015-08-28 15
2015-08-29 2015-09-04 0
2015-09-05 2015-09-11 16
The difficulty lies in the fact that the seven-day periods that you want are not regular weeks starting on Monday.
You therefore cannot use the standard functions to get the week number from the date, and have to build your own week generator using generate_series(), as sketched below.
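On its own, the week generator just produces the period boundaries (dates as in the example that follows):

-- Seven-day periods anchored at 2015-08-15 instead of at Mondays.
select d::date start_date, d::date + 6 end_date
from generate_series('2015-08-15', '2015-09-06', '7 days'::interval) d;
-- 2015-08-15..2015-08-21, 2015-08-22..2015-08-28,
-- 2015-08-29..2015-09-04, 2015-09-05..2015-09-11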
Example data:
create table sessions (user_name text, time_spent int, session_date timestamp);
insert into sessions values
('u1', 10, '2015-08-15'),
('u1', 20, '2015-08-19'),
('u1', 15, '2015-08-28'),
('u1', 16, '2015-09-06');
The query for an arbitrarily chosen period from 2015-08-15 to 2015-09-06:
with weeks as (
select d::date start_date, d::date+ 6 end_date
from generate_series('2015-08-15', '2015-09-06', '7d'::interval) d
)
select w.start_date, w.end_date, coalesce(sum(time_spent), 0) total
from weeks w
left join (
select start_date, end_date, coalesce(time_spent, 0) time_spent
from weeks
join sessions
on session_date between start_date and end_date
where user_name = 'u1'
) s
on w.start_date = s.start_date and w.end_date = s.end_date
group by 1, 2
order by 1;
start_date | end_date | total
------------+------------+-------
2015-08-15 | 2015-08-21 | 30
2015-08-22 | 2015-08-28 | 15
2015-08-29 | 2015-09-04 | 0
2015-09-05 | 2015-09-11 | 16
(4 rows)
select
ui,
date_trunc('week', the_date)::date as start_date,
date_trunc('week', the_date)::date + 6 as end_date,
sum(timespent) as "sum"
from t
group by 1, 2, 3
order by 1,2
Something like this (assuming that by timestamp you mean the data type timestamp).
In order to make the first day of the week Sunday, I added an extra day to "date" in the GROUP BY.
select (start_date - date_part('dow', start_date) * interval '1 day')::date start_date,
(start_date + (6 - date_part('dow', start_date)) * interval '1 day')::date end_date,
total_time_spent
from (
select min("date") start_date, sum(timespent) total_time_spent
from mytable
where "user" = 'u1'
group by date_part('year', "date"), date_part('week', "date" + interval '1 day')) "tmp"
order by start_date
This is a more generic approach, for any date interval.
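A minimal check of the extra-day trick (a sketch, assuming ISO week numbering, where weeks start on Monday): shifting the date by one day moves Sundays into the following ISO week, so grouping on the shifted week number yields Sunday-to-Saturday buckets.

SELECT d::date,
       to_char(d, 'Dy')                        AS day_name,
       date_part('week', d)                    AS iso_week,
       date_part('week', d + interval '1 day') AS shifted_week
FROM generate_series(date '2015-08-16', date '2015-08-17', interval '1 day') d;
-- 2015-08-16 (Sun): iso_week 33, shifted_week 34 -> grouped with the following Monday
-- 2015-08-17 (Mon): iso_week 34, shifted_week 34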