I have a table which contains id (unique), system number (system_id), timestamp (time_change) and status. When status is 1 the system is unavailable; when it is 0, available:
CREATE temp TABLE temp_data_test (
id int8 NULL,
system_id int8 NULL,
time_change timestamptz NULL,
status int4 NULL
);
INSERT INTO temp_data_test (id, system_id, time_change, status) VALUES
(53,1,'2022-04-02 13:57:07.000',1),
(54,1,'2022-04-02 14:10:26.000',0),
(55,1,'2022-04-02 14:28:45.000',1),
(56,1,'2022-04-02 14:32:19.000',0),
(57,1,'2022-04-05 03:20:18.000',1),
(58,3,'2022-04-05 03:21:18.000',1),
(59,2,'2022-04-05 03:21:22.000',1),
(60,2,'2022-04-06 02:27:15.000',0),
(61,3,'2022-04-06 02:27:15.000',0),
(62,1,'2022-04-06 02:28:17.000',0);
And a table date_dict with dates (just one column, date_of_day date).
As you can see, the status doesn't change every day, but I need statistics for each calendar day for each system.
So for days that are not in the table I need to add 2 rows per system. The first has timestamp 'date 00:00:00' and a status opposite to the nearest status with a higher date (which may be not in that day, but tomorrow).
The second has timestamp 'date 23:59:59' and a status opposite to the nearest status with a lower date (today, yesterday, etc.).
For this table I need something like:
id system_id time_change status
63 1 '2022-04-02 00:00:00' 0
64 1 '2022-04-02 23:59:59' 1
65 1 '2022-04-03 00:00:00' 0
66 1 '2022-04-03 23:59:59' 1 -- because the system was available from 2 April
67 1 '2022-04-04 00:00:00' 0
68 1 '2022-04-04 23:59:59' 1
69 1 '2022-04-05 00:00:00' 0
70 1 '2022-04-05 23:59:59' 0 -- because it became unavailable at 2022-04-05 03:20:18
And so on for the other systems.
I suppose it can be divided into 2 parts: the first rows and the second (00:00:00 and 23:59:59). My attempts lead to NULLs in the dates, and I tried to group by date, which didn't work as far as I can see.
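A possible sketch (untested; it assumes date_dict covers every day you want to report on, and uses the rule from the question): for each (day, system) pair, look up the nearest status before and after the day with LATERAL subqueries and emit the two boundary rows.

```sql
-- Sketch only: generate the 00:00:00 and 23:59:59 boundary rows per day and
-- system, with status opposite to the nearest later / earlier known status.
SELECT d.date_of_day::timestamptz AS time_change,
       s.system_id,
       1 - nxt.status AS status              -- opposite of nearest later status
FROM date_dict d
CROSS JOIN (SELECT DISTINCT system_id FROM temp_data_test) s
CROSS JOIN LATERAL (
    SELECT status
    FROM temp_data_test t
    WHERE t.system_id = s.system_id
      AND t.time_change >= d.date_of_day
    ORDER BY t.time_change
    LIMIT 1
) nxt

UNION ALL

SELECT (d.date_of_day + time '23:59:59')::timestamptz,
       s.system_id,
       1 - prv.status                        -- opposite of nearest earlier status
FROM date_dict d
CROSS JOIN (SELECT DISTINCT system_id FROM temp_data_test) s
CROSS JOIN LATERAL (
    SELECT status
    FROM temp_data_test t
    WHERE t.system_id = s.system_id
      AND t.time_change < d.date_of_day + interval '1 day'
    ORDER BY t.time_change DESC
    LIMIT 1
) prv;
```

Note that days before a system's first event or after its last one are silently dropped by the CROSS JOIN LATERAL (the subquery returns no row); switch to LEFT JOIN LATERAL ... ON true if you need those days too.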
In general, I have something similar to a calendar.
In my database I have repeatable events. To simplify working with them, I generate time slots during which the booking room will be reserved.
Table event
id long
room_uuid varchar
start_date timestamp
end_date timestamp
repeat_every_min long
duration_min long
And another table:
Table event_time_slot
id long
event_id long (fk)
start_date timestamp
end_date timestamp
Here is how it looks with mock data:
Table event mock data
id 1
room_uuid 267cb70a-6911-488c-aa9e-9deb506f785b
start_date "2023-01-05 10:00:00"
end_date "2023-01-05 10:57:00"
repeat_every_min 15
duration_min 10
As a result, the table event_time_slot will contain the following records:
id 1
event_id 1
start_date "2023-01-05 10:00:00"
end_date "2023-01-05 10:10:00"
____________________________________
id 2
event_id 1
start_date "2023-01-05 10:15:00"
end_date "2023-01-05 10:20:00"
____________________________________
id 3
event_id 1
start_date "2023-01-05 10:30:00"
end_date "2023-01-05 10:35:00"
____________________________________
id 4
event_id 1
start_date "2023-01-05 10:45:00"
end_date "2023-01-05 10:55:00"
Basically, I will generate time slots while
((startTime + N * duration) + repeatEveryMin) < endTime
My current flow to check whether 2 repeatable events conflict is quite simple:
I generate the time slots for the event, and I do
select from event_time_slot ts
join event_time_slot its on its.event_id = ts.id
where
-- condition that any of the saved slots overlaps with the first generated slot
(its.start_date < (*endTime*) AND its.start_date > (*startTime*))
or
-- condition that any of the saved slots exactly coincides with the first generated slot
(its.start_date = (*endTime*) AND its.start_date = (*startTime*))
The problem is that this forces me to generate a lot of time slots just to execute this query.
Moreover, if I have an event with 100 time slots, I need to check that none of the previously saved event time slots overlap with the 100 I am about to save.
My question is:
Does Postgres have any functionality which can simplify working with repeatable events?
Is there any other technology which solves this problem?
What I have tried:
Generating the time slots for the event. The problem is that the query is too complex, and if I have more than 5000 time slots for 1 event I have to do multiple queries to the DB, because I get a memory error in my app.
I am expecting feedback, or a technology showing how Postgres can simplify the current flow.
My primary question is: does Postgres have any functionality to remove working with time slots at all?
For example, I pass startDate + endDate + repeatInterval to the query and SQL shows me the overlapping events.
I want to avoid creating a condition for every time_slot of the event I want to check.
This query generates 4 time slots:
SELECT
tsrange(ts, ts + INTERVAL '10 MINUTE', '[)')
FROM generate_series(
'2023-01-05 10:00:00'::timestamp
, '2023-01-05 10:57:00'::timestamp
, INTERVAL '15 MINUTE') g(ts)
WHERE ts::time BETWEEN '09:00' AND '17:55' -- business hours
AND EXTRACT(DOW FROM ts) BETWEEN 1 AND 5 -- Monday to Friday
-- other conditions
Check the manual for all the options you have with ranges, including the very powerful exclusion constraints to prevent overlapping events.
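To make that last point concrete, here is a minimal sketch of such an exclusion constraint (the room_slot table is hypothetical; it assumes the btree_gist extension is available so that plain equality on room_uuid can live inside a GiST index):

```sql
-- The database itself rejects overlapping slots for the same room,
-- so no application-side slot-by-slot comparison is needed.
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE room_slot (
    room_uuid text NOT NULL,
    during    tsrange NOT NULL,
    EXCLUDE USING gist (room_uuid WITH =, during WITH &&)
);

-- The first insert succeeds; the second overlaps it and fails
-- with a conflict error:
INSERT INTO room_slot VALUES
    ('room-1', tsrange('2023-01-05 10:00', '2023-01-05 10:10', '[)'));
INSERT INTO room_slot VALUES
    ('room-1', tsrange('2023-01-05 10:05', '2023-01-05 10:20', '[)'));
```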
I need a query that returns the initial and final number of listeners for some artists over the last 30 days, ordered from the highest increase in listeners to the lowest.
To better understand what I mean, here are the tables involved.
The artist table stores the information of a Spotify artist:
id | name     | Spotify_id
1  | Shakira  | 0EmeFodog0BfCgMzAIvKQp
2  | Bizarrap | 716NhGYqD1jl2wI1Qkgq36
The platform_information table stores which piece of information I want to get about the artists, and on which platform:
id | platform | information
1  | spotify  | monthly_listeners
2  | spotify  | followers
The platform_information_artist table stores the value for each artist, platform metric, and date:
id | platform_information_id | artist_id | date       | value
1  | 1                       | 1         | 2022-11-01 | 100000
2  | 1                       | 1         | 2022-11-15 | 101000
3  | 1                       | 1         | 2022-11-30 | 102000
4  | 1                       | 2         | 2022-11-02 | 85000
5  | 1                       | 2         | 2022-11-06 | 90000
6  | 1                       | 2         | 2022-11-26 | 100000
Right now I have this query:
SELECT (SELECT value
FROM platform_information_artist
WHERE artist_id = 1
AND platform_information_id =
(SELECT id from platform_information WHERE platform = 'spotify' AND information = 'monthly_listeners')
AND DATE(date) >= DATE(NOW()) - INTERVAL 30 DAY
ORDER BY date ASC
LIMIT 1) as month_start,
(SELECT value
FROM platform_information_artist
WHERE artist_id = 1
AND platform_information_id =
(SELECT id from platform_information WHERE platform = 'spotify' AND information = 'monthly_listeners')
AND DATE(date) >= DATE(NOW()) - INTERVAL 30 DAY
ORDER BY date DESC
LIMIT 1) as month_end,
(SELECT month_end - month_start) as difference
ORDER BY month_start;
Which returns the following:
month_start | month_end | difference
100000      | 102000    | 2000
The problem is that this query only returns the artist I specify.
And I need the information like this:
artist_id | name     | platform_information_id | month_start_value | month_end_value | difference
2         | Bizarrap | 1                       | 85000             | 100000          | 15000
1         | Shakira  | 1                       | 100000            | 102000          | 2000
The query should return the 5 artists that have grown the most in number of monthly listeners over the last 30 days, along with the starting value 30 days ago, and the current value.
Thanks for the help.
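A hedged sketch of how the per-artist version could look with window functions (table and column names taken from the schema above; the syntax shown is Postgres-flavoured, so the date arithmetic may need adjusting for your database):

```sql
-- One row per artist: first and last value in the 30-day window,
-- ranked by growth, top 5.
WITH ranked AS (
    SELECT pia.artist_id,
           pia.platform_information_id,
           FIRST_VALUE(pia.value) OVER w AS month_start_value,
           LAST_VALUE(pia.value)  OVER (w ROWS BETWEEN UNBOUNDED PRECEDING
                                          AND UNBOUNDED FOLLOWING) AS month_end_value,
           ROW_NUMBER() OVER w AS rn
    FROM platform_information_artist pia
    JOIN platform_information pi ON pi.id = pia.platform_information_id
    WHERE pi.platform = 'spotify'
      AND pi.information = 'monthly_listeners'
      AND pia.date >= CURRENT_DATE - INTERVAL '30 days'
    WINDOW w AS (PARTITION BY pia.artist_id ORDER BY pia.date)
)
SELECT r.artist_id,
       a.name,
       r.platform_information_id,
       r.month_start_value,
       r.month_end_value,
       r.month_end_value - r.month_start_value AS difference
FROM ranked r
JOIN artist a ON a.id = r.artist_id
WHERE r.rn = 1            -- keep one row per artist
ORDER BY difference DESC
LIMIT 5;
```

The LAST_VALUE frame is widened to the whole partition because the default frame stops at the current row, which would make month_end_value equal to the current row's value.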
I have a table which contains id (unique), system id (system_id), timestamp and status.
When status is 1 the system is unavailable; when 0, available:
CREATE temp TABLE temp_data_test (
id int8 NULL,
system_id int8 NULL,
time timestamptz NULL,
status int4 NULL
);
INSERT INTO temp_data_test (id, system_id, time, status) VALUES
(53,1,'2022-04-02 13:57:07.000',1),
(54,1,'2022-04-02 14:10:26.000',0),
(55,1,'2022-04-02 14:28:45.000',1),
(56,1,'2022-04-02 14:32:19.000',0),
(57,1,'2022-04-05 03:20:18.000',1),
(58,3,'2022-04-05 03:21:18.000',1),
(59,2,'2022-04-05 03:21:22.000',1),
(60,2,'2022-04-06 02:27:15.000',0),
(61,3,'2022-04-06 02:27:15.000',0),
(62,1,'2022-04-06 02:28:17.000',0);
It works like this: when the system becomes unavailable we get a 1; when it becomes available, a 0.
I need a result table showing how many hours each system was available and unavailable on each day.
For this table the result should be:
date system available unavailable
2022-04-02 1 13:57:07+00:18:19+09:27:40 =23:43:06 23:59:59-23:43:06=..
2022-04-02 2 24 0
2022-04-02 3 24 0
2022-04-03 1 24 0
2022-04-03 2 24 0
2022-04-03 3 24 0
...
2022-04-05 1 03:20:18 23:59:59-03:20:18=..
2022-04-05 3 03:21:18 23:59:59-03:21:18=..
2022-04-05 2 03:21:22 23:59:59-03:21:22=..
2022-04-06 1 23:59:59-02:28:17=.. 02:28:17
2022-04-06 3 23:59:59-02:27:15=.. 02:27:15
2022-04-06 2 23:59:59-02:27:15=.. 02:27:15
I tried to do it with OVER (PARTITION BY ...) and recursion, but got more intervals than I need.
I wrote a sample query for calculating the intervals and showing them as hours, using your table structure (note that a status-1 row starts an unavailable interval, so the two subqueries are labelled accordingly):
with d_test as (
    select
        row_number() over (order by system_id, "time") as r_num,
        "time"::date as onlydate,
        "time",
        system_id,
        status
    from temp_data_test
)
select unavi.*, avi."available", avi."available_hours" from (
    select
        d1.system_id,
        d1.onlydate,
        sum(d2."time" - d1."time") as "unavailable",
        (extract(day from sum(d2."time" - d1."time"))*24 + extract(hour from sum(d2."time" - d1."time")))::text || ' hours' as "unavailable_hours"
    from d_test d1
    inner join d_test d2 on d1.r_num+1 = d2.r_num and d1.system_id = d2.system_id
    where d1.status = 1      -- a status-1 row starts an unavailable interval
    group by d1.system_id, d1.onlydate
) unavi
left join (
    select
        d1.system_id,
        d1.onlydate,
        sum(d2."time" - d1."time") as "available",
        (extract(day from sum(d2."time" - d1."time"))*24 + extract(hour from sum(d2."time" - d1."time")))::text || ' hours' as "available_hours"
    from d_test d1
    inner join d_test d2 on d1.r_num+1 = d2.r_num and d1.system_id = d2.system_id
    where d1.status = 0      -- a status-0 row starts an available interval
    group by d1.system_id, d1.onlydate
) avi on unavi.system_id = avi.system_id and unavi.onlydate = avi.onlydate
order by unavi.system_id
Result of this query:
system_id | onlydate   | unavailable    | unavailable_hours | available       | available_hours
1         | 2022-04-02 | 00:16:53       | 0 hours           | 2 days 13:06:18 | 61 hours
1         | 2022-04-05 | 23:07:59       | 23 hours          |                 |
2         | 2022-04-05 | 23:05:53       | 23 hours          |                 |
3         | 2022-04-05 | 1 day 02:05:57 | 26 hours          |                 |
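The query above attributes a whole multi-day interval to its start day. A different sketch (untested) splits each interval at midnight so that every day gets at most 24 hours per system: build [change, next change) intervals with lead(), expand each interval over the days it spans, and clip. Days with no events at all would still need a cross join against a calendar table such as date_dict.

```sql
-- Sketch only: per-day available/unavailable durations, clipping each
-- status interval to the calendar days it covers.
WITH intervals AS (
    SELECT system_id,
           status,
           "time" AS start_time,
           LEAD("time") OVER (PARTITION BY system_id ORDER BY "time") AS end_time
    FROM temp_data_test
)
SELECT d::date AS day,
       i.system_id,
       SUM(LEAST(i.end_time, d + interval '1 day') -
           GREATEST(i.start_time, d)) FILTER (WHERE i.status = 1) AS unavailable,
       SUM(LEAST(i.end_time, d + interval '1 day') -
           GREATEST(i.start_time, d)) FILTER (WHERE i.status = 0) AS available
FROM intervals i
CROSS JOIN LATERAL generate_series(date_trunc('day', i.start_time),
                                   date_trunc('day', i.end_time),
                                   interval '1 day') d
WHERE i.end_time IS NOT NULL          -- drop the open-ended last interval
GROUP BY day, i.system_id
ORDER BY day, i.system_id;
```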
I have this table (products with publish start and end information):
SKU start_time end_time
id1 21.01.2020 14:10:00 22.01.2020 16:18:05
id1 23.01.2020 16:18:05 24.01.2020 19:03:14
id2 21.01.2020 16:18:05 21.01.2020 18:33:50
id3 25.01.2020 18:33:50 25.01.2020 19:03:14
and expect active products by day, in two variants (the comments in parentheses are not part of the output):
date active_sku active sku_end_of_day
21.01.2020 2 (id1,id2) 1 (id1)
22.01.2020 1 (id1) 0
23.01.2020 1 (id1) 1 (id1)
24.01.2020 1 (id1) 0
25.01.2020 1 (id3) 0
Below is for BigQuery Standard SQL.
Assuming that start_time and end_time are of the timestamp data type, consider below:
select date,
count(distinct SKU) as active_sku,
count(distinct if(offset = 0, null, SKU)) as active_sku_end_of_date
from `project.dataset.table`,
unnest(array_reverse(generate_date_array(date(start_time),date(end_time)))) date with offset
group by date
# order by date
If applied to the sample data in your question, the output matches the expected result above.
In case start_time and end_time are strings, you should use the parse_timestamp() function to parse the timestamp from the string, as in the example below:
select date,
count(distinct SKU) as active_sku,
count(distinct if(offset = 0, null, SKU)) as active_sku_end_of_date
from `project.dataset.table`,
unnest(array_reverse(generate_date_array(date(parse_timestamp('%d.%m.%Y %T', start_time)),date(parse_timestamp('%d.%m.%Y %T', end_time))))) date with offset
group by date
# order by date
Here is the schema I'm working with
-- Table Definition ----------------------------------------------
CREATE TABLE transactions (
id BIGSERIAL PRIMARY KEY,
date date,
amount double precision,
category character varying,
full_category character varying,
transaction_id character varying,
created_at timestamp(6) without time zone NOT NULL,
updated_at timestamp(6) without time zone NOT NULL
);
-- Indices -------------------------------------------------------
CREATE UNIQUE INDEX transactions_pkey ON transactions(id int8_ops);
I would like to group the data with the following columns:
Category, January Total, February Total, March Total, and so on for every month.
This is as far as I've got:
SELECT
category, sum(amount) as january_total
from transactions
where category NOT IN ('Transfer', 'Payment', 'Deposit', 'Income')
AND date >= '2021-01-01' AND date < '2021-02-01'
group by category
Order by january_total asc
How do I add a column for every month to this output?
Here is the solution I came up with:
SELECT
category,
SUM(CASE WHEN date >= '2021-01-01' AND date <'2021-02-01' THEN amount ELSE 0.00 END) AS january,
SUM(CASE WHEN date >= '2021-02-01' AND date <'2021-03-01' THEN amount ELSE 0.00 END) AS february,
SUM(CASE WHEN date >= '2021-03-01' AND date <'2021-04-01' THEN amount ELSE 0.00 END) AS march,
SUM(CASE WHEN date >= '2021-04-01' AND date <'2021-05-01' THEN amount ELSE 0.00 END) AS april
from transactions
where category NOT IN ('Transfer', 'Payment', 'Deposit', 'Income')
GROUP BY category
Order by category
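As a side note, the same pivot can be written with PostgreSQL's aggregate FILTER clause instead of CASE expressions. The month columns still have to be listed one by one (the tablefunc extension's crosstab() is the usual route to a fully dynamic pivot):

```sql
-- Equivalent monthly pivot using FILTER; COALESCE keeps empty months at 0.
SELECT category,
       COALESCE(SUM(amount) FILTER (WHERE date >= '2021-01-01' AND date < '2021-02-01'), 0) AS january,
       COALESCE(SUM(amount) FILTER (WHERE date >= '2021-02-01' AND date < '2021-03-01'), 0) AS february,
       COALESCE(SUM(amount) FILTER (WHERE date >= '2021-03-01' AND date < '2021-04-01'), 0) AS march,
       COALESCE(SUM(amount) FILTER (WHERE date >= '2021-04-01' AND date < '2021-05-01'), 0) AS april
FROM transactions
WHERE category NOT IN ('Transfer', 'Payment', 'Deposit', 'Income')
GROUP BY category
ORDER BY category;
```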