ksql - Aggregating data based on last 1 year (365 days) - apache-kafka

I would like to build a SQL query that aggregates over the last year, i.e. with a window size of 365 days. Is this possible in ksql? For example, with a query like the following:
SELECT regionid, regioncity, COUNT(*) FROM pageviews
WINDOW HOPPING (SIZE 30 SECONDS, ADVANCE BY 10 SECONDS)
GROUP BY regionid, regioncity ;
My question is: when we specify 'SIZE 30 SECONDS' or 'SIZE 3600 SECONDS', does the window start from the beginning of the stream or from the latest message?

The start date and time of windows is based on the Unix epoch. It does not depend on the timestamps of the messages on the source topic.
Windows larger than a day also start at the Unix epoch and increment from there. Because there's no YEAR size and you're using 365 days, the windows drift against the calendar: each leap year since 1970 moves the window start one day earlier, so you'll find that the window now starts on 20th December rather than 1st January.
SELECT TIMESTAMPTOSTRING(WINDOWSTART(),'yyyy-MM-dd HH:mm:ss','Europe/London') AS WINDOW_START_TS,
CUSTOMER,
SUM(COST)
FROM SOURCE_DATA
WINDOW TUMBLING (SIZE 365 DAYS)
GROUP BY CUSTOMER
EMIT CHANGES ;
+-----------------------+----------+------------+
|WINDOW_START_TS |CUSTOMER |KSQL_COL_2 |
+-----------------------+----------+------------+
|2018-12-20 00:00:00 |A |4 |
|2019-12-20 00:00:00 |A |2 |
I've written this up here:
https://rmoff.net/2020/01/09/exploring-ksqldb-window-start-time/
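The 20th December start dates can be checked with plain arithmetic. Below is a minimal Python sketch, assuming windows are aligned to the Unix epoch as described above; `window_start` is a hypothetical helper written for illustration, not part of ksqlDB.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts: datetime, size: timedelta) -> datetime:
    """Start of the epoch-aligned tumbling window that contains ts."""
    # Whole windows elapsed since the epoch (timedelta // timedelta gives an int).
    windows_elapsed = (ts - EPOCH) // size
    return EPOCH + windows_elapsed * size


size = timedelta(days=365)
# Two arbitrary event times, chosen only to land in the two windows shown above.
print(window_start(datetime(2019, 6, 1, tzinfo=timezone.utc), size))
print(window_start(datetime(2020, 1, 9, tzinfo=timezone.utc), size))
```

The twelve leap days between 1970 and 2018 account for the twelve-day shift from 1st January back to 20th December.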

Related

calculate monthly average for each column based on the last recorded timestamp for each day Sql server

select [SName], MAX([DateTime]) as Dates
from table
where [DateTime] between @MNT1 and @MXT1 and [DateTime] <= @MXT1
group by [SName], CONVERT(Date,[DateTime])
order by [SName], CONVERT(Date,[DateTime])
I'd like to calculate the average using only the row with the max timestamp per day, for each column.
@MNT1 and @MXT1 are set to the start date and end date of the report.
ID  SERVERNAME  COL2  COL3  COLN  DATETIME
1   A           1.1   1.5         2022-08-01 23:50:13.0000000
2   A           1.1   1.6         2022-08-01 23:55:13.55530000
3   B           1.7   2           2022-08-01 23:50:13.7530000
4   C           3.4   2           2022-08-02 23:50:13.7530000
5   C           1.4   5           2022-08-08 23:50:13.7530000
I'm trying to calculate an average based on the last value logged per day for each server. For example, server A will have timestamps throughout each day in August until the server is shut down at midnight, so for A I'll have a last recorded timestamp for every day from the 1st of August through to the end of August. I need to capture those rows and calculate the monthly average based solely on them, and do the same for each server.

Truncate date by month with custom start day in postgres

I am getting some statistics using a query like
SELECT date_trunc('month', created_at) AS time, count(DISTINCT "user_id") AS mau
FROM "session"
GROUP BY time
ORDER BY time;
This works fine if I want monthly active users for each calendar month. But I would like to shift the result to show the last X months counting back from today instead of actual calendar months. How can I do this? Can I add an offset in some way?
EDIT
As an example, I am currently getting results like
time | mau
2022-04-01 | 10
2022-05-01 | 20
2022-06-01 | 30
But I would like it to be something like (where 2022-06-07 is today)
time | mau
2022-04-07 | 10
2022-05-07 | 20
2022-06-07 | 30

how to get date difference between rows for each 100th instance in Postgresql

I have a table that records my product subscription data: date, amount, product, plan. I want to show the number of days elapsed for every 100 subscriptions.
Subscription Range | Days
1-100 | 10 days
101-200 | 7 days
201-300 | 8 days
Please help me with the query to achieve this.

Calculating total working hours based on shifts

I would like to calculate how many hours each employee has worked for a certain time period, based on information from this table:
start employee_id
2014-08-10 18:10:00 5
2014-08-10 13:30:00 7
2014-08-10 09:00:00 7
2014-08-09 23:55:00 4
2014-08-09 16:23:00 12
2014-08-09 03:59:00 9
2014-08-08 20:05:00 7
2014-08-08 13:00:00 8
Each employee replaces another employee, and that is where the previous employee's shift ends, so there are no empty slots.
The desired format of the result would be the following:
employee_id total_minutes_worked
I'm trying to think of the best way to achieve this, so any help will be appreciated!
You can get the total time as:
select employee_id, sum(stop - start)
from (
select start, lead(start) over (order by start) as stop, employee_id
from t
) as x
group by employee_id;
It remains to format the time, but I assume this is not what puzzles you.
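To sanity-check what the LEAD() query computes, here is a minimal Python sketch of the same pairing logic applied to the sample rows from the question. It is an illustration outside the database, and `minutes_worked` is a hypothetical helper name. As in the SQL, the most recent shift has no following start, so it contributes nothing.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (start, employee_id).
rows = [
    ("2014-08-10 18:10:00", 5),
    ("2014-08-10 13:30:00", 7),
    ("2014-08-10 09:00:00", 7),
    ("2014-08-09 23:55:00", 4),
    ("2014-08-09 16:23:00", 12),
    ("2014-08-09 03:59:00", 9),
    ("2014-08-08 20:05:00", 7),
    ("2014-08-08 13:00:00", 8),
]


def minutes_worked(rows):
    """Sum minutes per employee, treating the next shift's start as the stop time."""
    shifts = sorted(
        (datetime.strptime(start, "%Y-%m-%d %H:%M:%S"), emp) for start, emp in rows
    )
    totals = defaultdict(int)
    # Pair each shift with the one that follows it, like LEAD(start) OVER (ORDER BY start).
    for (start, emp), (stop, _next_emp) in zip(shifts, shifts[1:]):
        totals[emp] += int((stop - start).total_seconds() // 60)
    return dict(totals)


for employee_id, total in sorted(minutes_worked(rows).items()):
    print(employee_id, total)
```

Employee 5 does not appear in the result because that shift is still open in the sample data.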
You should use a GROUP BY clause to first group the rows by employee id,
then calculate the time from the start time and end time of work in each slot.
(NOTE - you should store both the start time and the end time for each slot of an employee's shift.)

How can I use the PERIOD feature of Temporal Postgres / Postgres 9.2 on Heroku?

I am building an app that deals with times and durations, and intersections between given units of time and start/end times in a database, for example:
Database:
Row # | start - end
Row 1 | 1:00 - 4:00
Row 2 | 3:00 - 6:00
I want to be able to select sums of time between two certain times, or GROUP BY an INTERVAL such that the returned recordset will have one row for each sum during a given interval, something like:
SELECT length( (start, end) ) WHERE (start, end) INTERSECTS (2:00,4:00)
(in this case (start,end) is a PERIOD which is a new data type in Postgres Temporal and pg9.2)
which would return
INTERVAL 3 HOURS
since Row 1 has two hours between 2:00 - 4:00 and Row 2 has one hour during that time.
Further, I'd like to be able to run something like:
SELECT "lower bound of start", length( (start, end) ) GROUP BY INTERVAL 1 HOUR
which I would like to return:
1:00 | 1
2:00 | 1
3:00 | 2
4:00 | 2
5:00 | 1
which shows one row for each hour during the given interval and the sum of time at the beginning of that interval
I think that the PERIOD type can be used for this, which is in Postgres Temporal and Postgres 9.2. However, these are not available on Heroku at this time as far as I can tell - So,
How can I enable these sorts of maths on Heroku?
Try running:
heroku addons:add heroku-postgresql:dev --version=9.2
That should give you version 9.2, which supports range types. As this is currently very alpha, any feedback would be greatly appreciated at dod-feedback@heroku.com
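For reference, the overlap arithmetic the question describes (3 hours for the 2:00 to 4:00 window) can be sketched independently of the database. The following is plain Python, not the Postgres range API; `overlap` and `at` are hypothetical helpers, and the date used is an arbitrary placeholder since the question only gives times of day. In Postgres 9.2 the same idea would presumably be expressed with a range type such as `tsrange` and its `&&` (overlaps) and `*` (intersection) operators.

```python
from datetime import datetime, timedelta


def overlap(start, end, win_start, win_end):
    """Length of the intersection of [start, end) with [win_start, win_end)."""
    lo = max(start, win_start)
    hi = min(end, win_end)
    # No intersection yields a negative difference, so clamp to zero.
    return max(hi - lo, timedelta(0))


DAY = datetime(2000, 1, 1)  # arbitrary placeholder date


def at(hour):
    return DAY + timedelta(hours=hour)


# Rows from the question: 1:00 - 4:00 and 3:00 - 6:00.
rows = [(at(1), at(4)), (at(3), at(6))]
total = sum((overlap(s, e, at(2), at(4)) for s, e in rows), timedelta(0))
print(total)
```

Row 1 contributes two hours and Row 2 one hour, matching the INTERVAL 3 HOURS result the question expects.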