ClickHouse group by with difference less than - group-by

date_start | date_end | value
2020-12-05 11:00:00 | 2020-12-05 11:15:00 | 1
2020-12-05 11:15:00 | 2020-12-05 11:30:00 | 2
2020-12-05 11:30:00 | 2020-12-05 11:45:00 | 3
2020-12-05 13:00:00 | 2020-12-05 13:15:00 | 4
If the difference between consecutive date_start values is 15 minutes or less, group the rows and calculate the sum of their values.
Expected result
date_start | date_end | sum
2020-12-05 11:00:00 | 2020-12-05 11:45:00 | 6
2020-12-05 13:00:00 | 2020-12-05 13:15:00 | 4
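One way to approach this in ClickHouse (a minimal sketch, assuming a table named events and a version with window function support, 21.x or newer; the names events, flagged and grp are made up for illustration): flag each row whose gap to the previous date_start exceeds 15 minutes, turn the flags into group ids with a running sum, then aggregate per group.
WITH flagged AS (
    SELECT
        date_start,
        date_end,
        value,
        -- 1 when the gap to the previous row is more than 15 minutes (900 s), i.e. a new group starts
        if(date_start - lagInFrame(date_start)
               OVER (ORDER BY date_start ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) > 900,
           1, 0) AS is_new_group
    FROM events
)
SELECT
    min(date_start) AS date_start,
    max(date_end) AS date_end,
    sum(value) AS sum
FROM (
    -- running sum of the flags numbers the groups
    SELECT *, sum(is_new_group) OVER (ORDER BY date_start) AS grp
    FROM flagged
)
GROUP BY grp
ORDER BY date_start
On versions without window functions, neighbor() or groupArray() plus arraySplit() can play the same role.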

Related

PostgreSQL build working range from one date column

I'm using PostgreSQL v. 11.2
I have a table
|id | bot_id | date |
| 1 | 1 | 2020-04-20 16:00:00|
| 2 | 2 | 2020-04-22 12:00:00|
| 3 | 3 | 2020-04-24 04:00:00|
| 4 | 1 | 2020-04-27 09:00:00|
And, for example, I have the DateTime range 2020-03-30 00:00:00 to 2020-04-30 00:00:00.
I need to get the working ranges in order to count the total working hours of each bot.
Like this:
|bot_id | start_date | end_date |
| 1 | 2020-03-30 00:00:00 | 2020-04-20 16:00:00 |
| 2 | 2020-04-20 16:00:00 | 2020-04-22 12:00:00 |
| 3 | 2020-04-22 12:00:00 | 2020-04-24 04:00:00 |
| 1 | 2020-04-24 04:00:00 | 2020-04-27 09:00:00 |
| 1 | 2020-04-27 09:00:00 | 2020-04-30 00:00:00 |
I've tried to use LAG(date) but I'm not getting the first and last dates of the range.
You could use a UNION ALL, with one part building the start_date/end_date pairs from your values and the other part filling in the last period (from the last date to 2020-04-30 00:00:00):
WITH vals (id, bot_id, date) AS (  -- "values" is a reserved word in PostgreSQL, so the CTE is named vals
    VALUES (1, 1, '2020-04-20 16:00:00'::TIMESTAMP)
         , (2, 2, '2020-04-22 12:00:00')
         , (3, 3, '2020-04-24 04:00:00')
         , (4, 1, '2020-04-27 09:00:00')
)
(
    SELECT bot_id
         , LAG(date, 1, '2020-03-30 00:00:00') OVER (ORDER BY id) AS start_date
         , date AS end_date
    FROM vals
)
UNION ALL
(
    SELECT bot_id
         , date AS start_date
         , '2020-04-30 00:00:00' AS end_date
    FROM vals
    ORDER BY id DESC
    LIMIT 1
)
+------+--------------------------+--------------------------+
|bot_id|start_date |end_date |
+------+--------------------------+--------------------------+
|1 |2020-03-30 00:00:00.000000|2020-04-20 16:00:00.000000|
|2 |2020-04-20 16:00:00.000000|2020-04-22 12:00:00.000000|
|3 |2020-04-22 12:00:00.000000|2020-04-24 04:00:00.000000|
|1 |2020-04-24 04:00:00.000000|2020-04-27 09:00:00.000000|
|1 |2020-04-27 09:00:00.000000|2020-04-30 00:00:00.000000|
+------+--------------------------+--------------------------+

How to generate max, min and avg grouped by minute in a given day for a time range in PostgreSQL?

Hello, I have been trying to generate a report based on some DB data.
I need to calculate per finished DAY. In this case let's say that the day for the calculation will be 2001-01-02 while the current date is 2001-01-03, so basically the day before the current date.
Grouped PER place_id and PER each minute of that day, I need:
- the MAX count of locker_orders occupancy in that day + the time of occurrence (peak max load of lockers per place)
- the MIN count of locker_orders occupancy in that day + the time of occurrence (peak min load of lockers per place)
- the AVG count of locker_orders occupancy in that day (average load in that day, based on min, max and the number of lockers per place)
- the NUMBER of all lockers in the store on that day (it may change over time)
Where there is no pickup date the locker is still occupied - it may span into the following days.
I was able to write a simple query that groups by place and by the minute the locker order was created at, but I currently have a problem restricting it to the scope of the target day.
Here is a representation of the timeline (handmade ;)).
Given a schema of data containing:
DB DATA
LOCKERS
------------------------------------
| id | created_at |
------------------------------------
| 1 | 2001-01-01 00:00 (DATETIME) |
------------------------------------
| 2 | 2001-01-01 00:00 (DATETIME) |
------------------------------------
| 3 | 2001-01-01 00:00 (DATETIME) |
------------------------------------
| 4 | 2001-01-01 00:00 (DATETIME) |
------------------------------------
| 5 | 2001-01-01 00:00 (DATETIME) |
------------------------------------
LOCKER_ORDERS
------------------------------------------------------------------------------------
| id | created_at | pickup_date | place_id | locker_id |
------------------------------------------------------------------------------------
| 1 | 2001-01-02 10:00 (DATETIME) | 2001-01-02 13:25 (DATETIME) | 1 | 2 |
------------------------------------------------------------------------------------
| 2 | 2001-01-02 07:45 (DATETIME) | 2001-01-02 11:50 (DATETIME) | 1 | 1 |
------------------------------------------------------------------------------------
| 3 | 2001-01-02 19:30 (DATETIME) | NULL | 1 | 4 |
------------------------------------------------------------------------------------
| 4 | 2001-01-01 14:40 (DATETIME) | 2001-01-01 21:15 (DATETIME) | 1 | 5 |
-------------------------------------------------------------------------------------
| 5 | 2001-01-02 12:25 (DATETIME) | NULL | 1 | 3 |
-------------------------------------------------------------------------------------
| 6 | 2001-01-02 13:30 (DATETIME) | 2001-01-02 18:40 (DATETIME) | 1 | 2 |
-------------------------------------------------------------------------------------
| 7 | 2001-01-02 12:45 (DATETIME) | 2001-01-02 20:50 (DATETIME) | 1 | 1 |
-------------------------------------------------------------------------------------
| 8 | 2001-01-02 07:40 (DATETIME) | 2001-01-02 18:15 (DATETIME) | 1 | 5 |
-------------------------------------------------------------------------------------
OUTPUT DATA - the desired output
# | Date (day) | place_id | min | max | avg | NO of all lockers in that day in given place |
---------------------------------------------------------------------------------------------
# | 2001-01-02 | 1 | 0 | 4 | 2 | 8 |
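A possible starting point (a minimal sketch, assuming the tables above; the CTE names minutes, places and per_minute are made up for illustration): generate one row per minute of the previous day with generate_series, count the orders occupying a locker at each minute, then aggregate per place.
WITH minutes AS (
    SELECT generate_series(
               date_trunc('day', now()) - interval '1 day',
               date_trunc('day', now()) - interval '1 minute',
               interval '1 minute'
           ) AS minute
),
places AS (
    SELECT DISTINCT place_id FROM locker_orders
),
per_minute AS (
    SELECT p.place_id,
           m.minute,
           -- an order occupies its locker from created_at until pickup_date
           -- (indefinitely while pickup_date is NULL)
           (SELECT count(*)
              FROM locker_orders o
             WHERE o.place_id = p.place_id
               AND o.created_at <= m.minute
               AND (o.pickup_date IS NULL OR o.pickup_date > m.minute)) AS occupied
    FROM places p
    CROSS JOIN minutes m
)
SELECT min(minute)::date AS day,
       place_id,
       min(occupied) AS min,
       max(occupied) AS max,
       round(avg(occupied)) AS avg
FROM per_minute
GROUP BY place_id;
The number of lockers available on that day can be joined in separately (e.g. a count over LOCKERS filtered on created_at before the start of the day), and the times of occurrence of the min and max can be recovered from per_minute with an ORDER BY occupied and LIMIT 1 per place.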

Aggregate all previous rows for a specific time difference

I have a Spark DataFrame with the following entries:
| order id | time | amt |
| 1 | 2017-10-01 12:00 | 100 |
| 2 | 2017-10-01 15:00 | 100 |
| 3 | 2017-10-01 17:00 | 100 |
| 4 | 2017-10-02 16:00 | 100 |
| 5 | 2017-10-02 23:00 | 100 |
I want to add a column amt_24h that has, for each order id, the sum of amt over all orders in the preceding 24 hours.
| order id | time | amt | amt_24h
| 1 | 2017-10-01 12:00 | 100 | 0
| 2 | 2017-10-01 15:00 | 100 | 100
| 3 | 2017-10-01 17:00 | 100 | 200
| 4 | 2017-10-02 16:00 | 100 | 100
| 5 | 2017-10-02 23:00 | 100 | 100
How would I go about doing it?
This is PySpark code; the Scala API is very similar.
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Convert the string timestamp to epoch seconds so rangeBetween can operate on it.
df = df.withColumn('time_uts', F.unix_timestamp('time', format='yyyy-MM-dd HH:mm'))
# Sum amt over the preceding 24 hours, excluding the current row; empty frames become 0.
df = df.withColumn('amt_24h', F.sum('amt').over(Window.orderBy('time_uts').rangeBetween(-24 * 3600, -1))).fillna(0, subset='amt_24h')
I hope this helps.
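For reference, here is an equivalent formulation in Spark SQL (a sketch, assuming the DataFrame has been registered as a temporary view named orders; the view name is made up for illustration):
SELECT *,
       COALESCE(SUM(amt) OVER (
           ORDER BY unix_timestamp(`time`, 'yyyy-MM-dd HH:mm')
           -- 24 hours = 86400 seconds; ending at 1 PRECEDING excludes the current row
           RANGE BETWEEN 86400 PRECEDING AND 1 PRECEDING
       ), 0) AS amt_24h
FROM orders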

Postgresql - increment counter in rows where a column has duplicate value

I have added a column (seq) to a table used for scheduling so the front end can manage the order in which each item is displayed. Is it possible to craft an SQL query that populates this column with an incremental counter based on duplicate values in the date_time column?
Before
------------------------------------
| name | date_time | seq |
------------------------------------
| ABC1 | 15-01-2017 11:00:00 | |
| ABC2 | 16-01-2017 11:30:00 | |
| ABC1 | 16-01-2017 11:30:00 | |
| ABC3 | 17-01-2017 10:00:00 | |
| ABC3 | 18-01-2017 12:30:00 | |
| ABC4 | 18-01-2017 12:30:00 | |
| ABC1 | 18-01-2017 12:30:00 | |
------------------------------------
After
------------------------------------
| name | date_time | seq |
------------------------------------
| ABC1 | 15-01-2017 11:00:00 | 0 |
| ABC2 | 16-01-2017 11:30:00 | 0 |
| ABC1 | 16-01-2017 11:30:00 | 1 |
| ABC3 | 17-01-2017 10:00:00 | 0 |
| ABC3 | 18-01-2017 12:30:00 | 0 |
| ABC4 | 18-01-2017 12:30:00 | 1 |
| ABC1 | 18-01-2017 12:30:00 | 2 |
------------------------------------
Solved, thanks to both answers.
To make it easier for anybody who finds this, the working code is:
UPDATE my_table f
SET seq = seq2
FROM (
    -- ctid identifies the physical row, handy here since the table has no primary key
    SELECT ctid,
           ROW_NUMBER() OVER (PARTITION BY date_time ORDER BY ctid) - 1 AS seq2
    FROM my_table
) s
WHERE f.ctid = s.ctid;
Use the window function row_number():
with my_table (name, date_time) as (
    values
        ('ABC1', '15-01-2017 11:00:00'),
        ('ABC2', '16-01-2017 11:30:00'),
        ('ABC1', '16-01-2017 11:30:00'),
        ('ABC3', '17-01-2017 10:00:00'),
        ('ABC3', '18-01-2017 12:30:00'),
        ('ABC4', '18-01-2017 12:30:00'),
        ('ABC1', '18-01-2017 12:30:00')
)
select *,
       row_number() over (partition by name order by date_time) - 1 as seq
from my_table
order by date_time;
name | date_time | seq
------+---------------------+-----
ABC1 | 15-01-2017 11:00:00 | 0
ABC1 | 16-01-2017 11:30:00 | 1
ABC2 | 16-01-2017 11:30:00 | 0
ABC3 | 17-01-2017 10:00:00 | 0
ABC1 | 18-01-2017 12:30:00 | 2
ABC3 | 18-01-2017 12:30:00 | 1
ABC4 | 18-01-2017 12:30:00 | 0
(7 rows)
Read this answer for a similar question about updating existing records with a unique integer.
Check out ROW_NUMBER():
SELECT name, date_time, ROW_NUMBER() OVER (PARTITION BY date_time ORDER BY name) - 1 AS seq
FROM my_table;

Symfony2 Query to find last working date from Holiday Calendar

I have a calendar entity in my project which manages the open and close times of business days for the whole year.
Below are the records of a specific month:
id | today_date | year | month_of_year | day_of_month | is_business_day
-------+---------------------+------+---------------+--------------+-----------------
10103 | 2016-02-01 00:00:00 | 2016 | 2 | 1 | t
10104 | 2016-02-02 00:00:00 | 2016 | 2 | 2 | t
10105 | 2016-02-03 00:00:00 | 2016 | 2 | 3 | t
10106 | 2016-02-04 00:00:00 | 2016 | 2 | 4 | t
10107 | 2016-02-05 00:00:00 | 2016 | 2 | 5 | t
10108 | 2016-02-06 00:00:00 | 2016 | 2 | 6 | f
10109 | 2016-02-07 00:00:00 | 2016 | 2 | 7 | f
10110 | 2016-02-08 00:00:00 | 2016 | 2 | 8 | t
10111 | 2016-02-09 00:00:00 | 2016 | 2 | 9 | t
10112 | 2016-02-10 00:00:00 | 2016 | 2 | 10 | t
10113 | 2016-02-11 00:00:00 | 2016 | 2 | 11 | t
10114 | 2016-02-12 00:00:00 | 2016 | 2 | 12 | t
10115 | 2016-02-13 00:00:00 | 2016 | 2 | 13 | f
10116 | 2016-02-14 00:00:00 | 2016 | 2 | 14 | f
10117 | 2016-02-15 00:00:00 | 2016 | 2 | 15 | t
10118 | 2016-02-16 00:00:00 | 2016 | 2 | 16 | t
10119 | 2016-02-17 00:00:00 | 2016 | 2 | 17 | t
10120 | 2016-02-18 00:00:00 | 2016 | 2 | 18 | t
I want to get the today_date of the 7th last working day. Suppose today_date is 2016-02-18; then the date of the 7th last working day is 2016-02-09.
You can use row_number() for this, like so:
SELECT *
FROM (
    SELECT t.*, row_number() OVER (ORDER BY today_date DESC) AS rnk
    FROM Calender t
    WHERE today_date <= current_date
      AND is_business_day = 't'
) ranked  -- PostgreSQL requires an alias on a derived table
WHERE rnk = 7
This will give you the row of the 7th business day back from today's date.
I see that you tagged your question with Doctrine, ORM and Datetime. Were you after a QueryBuilder solution? Maybe this is closer to what you want:
$qb->select('c.today_date')
->from(Calendar::class, 'c')
->where("c.today_date <= :today")
->andWhere("c.is_business_day = 't'")
->setMaxResults(7)
->orderBy("c.today_date", "DESC")
->setParameter('today', new \DateTime('now'), \Doctrine\DBAL\Types\Type::DATETIME);