PostgreSQL / TimescaleDB query that groups data by a custom interval

I have a working query that sums the value field for each day.
SELECT sum(value) as sum_value, to_char(time::date,'DD-MM-YY') as day
FROM "measures"
GROUP BY "day"
ORDER BY "day" asc
Is it possible to run the same query but, instead of grouping by day, group by 2 days or by another specific duration (whole days only, not hours)?

You can use the TimescaleDB extension's time_bucket() function, as @Mike Organek commented, or use a native PostgreSQL expression like this:
SELECT
sum(value) AS sum_value,
-- first day present in each bucket; the raw "time" column cannot be selected when grouping only by time_bucket
to_char(min("time")::date, 'DD-MM-YY') AS day,
floor(extract(epoch from "time")/(60*60*24*2)) AS time_bucket
FROM "measures"
GROUP BY "time_bucket"
ORDER BY "time_bucket" ASC;
In the query above we extract the Unix time from the datetime column, divide it by the grouping period in seconds (60*60*24*2, meaning 2 days or 48 hours), and floor the result for use in the GROUP BY clause.
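The bucket arithmetic can be checked outside the database. The following is a minimal Python sketch, not part of the original answer: the sample rows and the helper name bucket_of are hypothetical, and the timestamps are assumed to be UTC. It shows that buckets are aligned to the Unix epoch rather than to the first date in the data.

```python
from collections import defaultdict
from datetime import datetime, timezone

BUCKET_SECONDS = 60 * 60 * 24 * 2  # 2 days, the same period as in the SQL expression


def bucket_of(ts: datetime) -> int:
    """Mirror floor(extract(epoch from "time") / BUCKET_SECONDS)."""
    return int(ts.timestamp() // BUCKET_SECONDS)


# Hypothetical sample rows (time, value); not taken from the question.
rows = [
    (datetime(2021, 1, 1, 8, tzinfo=timezone.utc), 10),
    (datetime(2021, 1, 2, 9, tzinfo=timezone.utc), 20),
    (datetime(2021, 1, 3, 10, tzinfo=timezone.utc), 5),
    (datetime(2021, 1, 4, 11, tzinfo=timezone.utc), 7),
]

# Sum the values per bucket, like GROUP BY time_bucket.
sums = defaultdict(int)
for ts, value in rows:
    sums[bucket_of(ts)] += value

for bucket, total in sorted(sums.items()):
    print(bucket, total)
```

With this data, 1 and 2 January fall into one bucket and 3 and 4 January into the next. A different start date could split the pairs differently, because the bucket boundaries are fixed multiples of the period counted from 1970-01-01.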

Related

How to select data between two dates using only the start date?

I have a problem selecting data between two dates when only the start_date is available.
For example, I want to see which discount_nr was active between 2020-07-01 and 2020-07-15, or on a single day such as 2020-07-14. I tried different solutions (date ranges, generate_series, and so on) but was not able to get it to work.
The table only has start dates, no end dates.
Example:
discount_nr, start_date
1, 2020-06-30
2, 2020-07-03
3, 2020-07-10
4, 2020-07-15
You can get the end dates by looking at the start date of the next row. This is done with lead. lead(start_date) over(order by start_date asc) will get you the start_date of the next row. If we take 1 day from that we'll get the inclusive end date.
Rather than separate start/end columns, a single daterange column is easier to work with. You can use that as a CTE or create a view.
create view discount_durations as
select
discount_nr,
daterange(
start_date,
lead(start_date) over(order by start_date asc)
) as duration
from discounts
Now querying it is easy using range operators. Use @> to check if the range contains a date.
select *
from discount_durations
where duration @> '2020-07-14'::date
And use && to see if they have any overlap.
select *
from discount_durations
where duration && daterange('2020-07-01', '2020-07-15', '[]'); -- '[]' includes 2020-07-15; the default '[)' excludes it
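The same logic can be traced by hand. Below is a minimal Python sketch, not the answer's code, that pairs each start date with the next one (what lead does) and then applies the containment and overlap checks. The function names are hypothetical and the rows come from the example table in the question.

```python
from datetime import date

# Rows from the example table: (discount_nr, start_date).
discounts = [
    (1, date(2020, 6, 30)),
    (2, date(2020, 7, 3)),
    (3, date(2020, 7, 10)),
    (4, date(2020, 7, 15)),
]


def durations(rows):
    """Pair each start date with the next start date, like lead(start_date)."""
    ordered = sorted(rows, key=lambda r: r[1])
    next_starts = [r[1] for r in ordered[1:]] + [None]  # None: last range is open-ended
    return [(nr, start, end) for (nr, start), end in zip(ordered, next_starts)]


def active_on(rows, day):
    """Discounts whose half-open range [start, next_start) contains day."""
    return [nr for nr, start, end in durations(rows)
            if start <= day and (end is None or day < end)]


def active_between(rows, first, last):
    """Discounts whose range overlaps the inclusive period first..last."""
    return [nr for nr, start, end in durations(rows)
            if start <= last and (end is None or first < end)]


print(active_on(discounts, date(2020, 7, 14)))
print(active_between(discounts, date(2020, 7, 1), date(2020, 7, 15)))
```

The last discount has no following row, so its range is treated as unbounded. A PostgreSQL daterange behaves the same way when its upper bound is NULL.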

Do continuous aggregates in Postgres/TimescaleDB require the time_bucket function?

I have a SELECT-query which gives me the aggregated sum(minutes_per_hour_used) of some stuff. Grouped by id, weekday and observed hour.
SELECT id,
extract(dow from observed_date) AS weekday, -- observed_date is type date
observed_hour, -- is type timestamp without timezone, every full hour 00:00:00, 01:00:00, ...
sum(minutes_per_hour_used)
FROM base_table
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;
The result looks nice, but now I would like to store it in a self-maintained view that only aggregates the last 8 weeks. I thought continuous aggregates were the right way, but I can't make them work (https://blog.timescale.com/blog/continuous-aggregates-faster-queries-with-automatically-maintained-materialized-views/). It seems I need to use the time_bucket function somehow, but I don't know how. Any ideas or hints?
I am using Postgres with TimescaleDB.
EDIT: This gives me the desired output, but I can't put it in a continuous aggregate:
SELECT id,
extract(dow from observed_date) AS weekday,
observed_hour,
sum(minutes_per_hour_used)
FROM base_table
WHERE observed_date >= now() - interval '8 weeks'
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;
EDIT: Prepending this with
CREATE VIEW my_view
WITH (timescaledb.continuous) AS
gives me [0A000] ERROR: invalid SELECT query for continuous aggregate
Continuous aggregates require grouping by time_bucket:
SELECT <grouping_exprs>, <aggregate_functions>
FROM <hypertable>
[WHERE ... ]
GROUP BY time_bucket( <const_value>, <partition_col_of_hypertable> ),
[ <optional grouping exprs> ]
[HAVING ...]
time_bucket should be applied to the partitioning column, which is usually the time dimension column used when the hypertable was created. Also, ORDER BY is not supported.
The aggregate query in the question does not group by a time column. Neither weekday nor observed_hour is a valid time column: their values do not increase with time but repeat regularly. weekday repeats every 7 days and observed_hour repeats every 24 hours. This breaks the requirements for continuous aggregates.
Since there is no ready solution for this use case, one approach is to use a continuous aggregate to reduce the amount of data for the targeted query, e.g., by bucketing by day:
CREATE MATERIALIZED VIEW daily
WITH (timescaledb.continuous) AS
SELECT id,
time_bucket('1day', observed_date) AS day,
observed_hour,
sum(minutes_per_hour_used) AS minutes_per_hour_used -- alias so the follow-up query can reference the column by name
FROM base_table
GROUP BY 1, 2, 3;
Then execute the targeted aggregate query on top of it:
SELECT id,
extract(dow from day) AS weekday,
observed_hour,
sum(minutes_per_hour_used)
FROM daily
WHERE day >= now() - interval '8 weeks'
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;
Another approach is to use a PostgreSQL materialized view and refresh it on a regular basis with a custom job run by TimescaleDB's job scheduling framework. Note that each refresh recalculates the entire view, which in this example covers 8 weeks of data. The materialized view can be written in terms of the original table base_table or in terms of the continuous aggregate suggested above.
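The two-stage approach works because sums can be re-aggregated: adding up daily partial sums per weekday gives the same totals as aggregating the raw rows directly. Here is a minimal Python sketch of that idea. The sample rows, the integer observed_hour values, and the fixed "today" date are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical rows: (id, observed_date, observed_hour, minutes_per_hour_used).
rows = [
    (1, date(2020, 12, 7), 8, 99),  # Monday, older than 8 weeks
    (1, date(2021, 3, 1), 8, 30),   # Monday
    (1, date(2021, 3, 1), 8, 10),   # Monday, same hour
    (1, date(2021, 3, 8), 8, 20),   # following Monday
    (1, date(2021, 3, 2), 9, 15),   # Tuesday
]

# Stage 1: one partial sum per (id, day, hour), like the "daily" continuous aggregate.
daily = defaultdict(int)
for id_, day, hour, minutes in rows:
    daily[(id_, day, hour)] += minutes

# Stage 2: re-aggregate the partial sums by weekday over the last 8 weeks.
today = date(2021, 3, 10)  # fixed stand-in for now() so the sketch is reproducible
cutoff = today - timedelta(weeks=8)
by_weekday = defaultdict(int)
for (id_, day, hour), minutes in daily.items():
    if day >= cutoff:
        dow = day.isoweekday() % 7  # Sunday = 0, matching extract(dow from ...)
        by_weekday[(id_, dow, hour)] += minutes

for key, total in sorted(by_weekday.items()):
    print(key, total)
```

This holds for sum, count, min and max. An average would need the sum and the count stored separately in the first stage.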

Sum with aggregate date postgresql

I have a table with a date column and a value column. I need to sum the values per month and show a single date column.
Ex:
I have this:
date value
2018-01-01 150
2018-01-23 140
what I need:
date sum(value)
2018-01 290
Simple solution to get sums per month:
SELECT to_char(date, 'YYYY-MM') AS mon, sum(value) AS sum_value
FROM tbl
GROUP BY 1;
For large tables it's cheaper to group on date_trunc('month', date) instead.
Related:
Concatenate multiple result rows of one column into one, group by another column
Group and count events per time intervals, plus running total
How to get the date and time from timestamp in PostgreSQL select query?

Calculate best sale between several sellers

I'm using PostgreSQL.
Let's say there are 5 sellers.
Each seller's monthly sales are recorded in the database like this (userId: 6, january: 10000$, february: 20000$, march: 10000$, ..., december: 50000$, year: 2018).
I need to calculate, if possible with a single query, the best sale of each month in one array of this format: (january: 15000$, february: 30000$, march: 40000$, year: 2018). I don't need the userId. I simply need to compare the sales per month and display the best amount.
For now I have this code, which works well and gives me user 6's sales per month for a given year:
SELECT date_trunc('month', date_vente) AS txn_month, sum(prix_vente) as monthly_sum,count(prix_vente) AS monthly_count
FROM crm_vente
WHERE 1=1
AND date_part('year', date_vente) = 2018
AND id_user = 6
GROUP BY txn_month ORDER BY txn_month
I wonder if somebody could tell me what technique I could use to get the best sales for each of the 12 months across the 5 employees.
Could I use a view? Or would it be better to write a for loop in PHP over each user's monthly sales and then build a comparison array?
No need to give me a full solution, but maybe some advice on how to do this directly in PostgreSQL? My only solution for now is to use PHP, and the code is not nice.
Nice day, ill check on MOnday
Sorry for my english
WITH monthly_sales AS (
SELECT
date_trunc('month', date_vente) AS txn_month,
id_user,
sum(prix_vente) AS monthly_sum
FROM crm_vente
WHERE 1=1
AND date_part('year', date_vente) = 2018
GROUP BY txn_month, id_user
ORDER BY txn_month, id_user),
rank_monthly_sales_by_user_id AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY txn_month ORDER BY monthly_sum DESC) AS rank
FROM monthly_sales)
SELECT
txn_month,
monthly_sum
FROM rank_monthly_sales_by_user_id
WHERE rank = 1
ORDER BY txn_month ASC;
First, get the totals per month by user. This is the top subquery, called monthly_sales, which sums each user's sales by month.
Next, to get the top user for each month in terms of total sales, you have to rank the rows returned by the previous subquery. This is done by ROW_NUMBER().
ROW_NUMBER() returns the row number within a specified window. In this case it numbers the rows from monthly_sales for each month (the numbering restarts at 1 each month). The PARTITION BY clause defines the window in which the rows are numbered. Here it is the month, since we want to order each user's sales within a month. The ORDER BY clause says how to order the rows from 1 to n. We're using monthly_sum in descending order, so the highest monthly sum gets 1 and, with 5 sellers, the lowest gets 5.
The final query selects only the rows from rank_monthly_sales_by_user_id that are the top sales for the month (WHERE rank = 1).
This leaves us with an output where each row is a month, with the highest sale for that month.
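The two steps can be sketched in plain Python to show what the query computes. This is only an illustration with hypothetical sample data. Since only the amount is needed and not the seller, taking the maximum of the per-user totals gives the same value as filtering on rank = 1.

```python
from collections import defaultdict

# Hypothetical sales rows: (id_user, month, prix_vente).
sales = [
    (6, "2018-01", 10000),
    (6, "2018-01", 2000),
    (7, "2018-01", 15000),
    (6, "2018-02", 20000),
    (7, "2018-02", 30000),
    (8, "2018-02", 5000),
]

# Step 1: total per (month, user), like the monthly_sales CTE.
monthly_sum = defaultdict(int)
for id_user, month, amount in sales:
    monthly_sum[(month, id_user)] += amount

# Step 2: keep the largest total per month, like keeping the rows with rank = 1.
best = {}
for (month, _id_user), total in monthly_sum.items():
    best[month] = max(best.get(month, 0), total)

for month in sorted(best):
    print(month, best[month])
```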
Let me know if that was what you needed help with

how do you sum over a related period

I need to sum values that are + 2 months or within a quarter period (related date table)
is there a way to use dense rank to partition those periods (custom periods)?
select
FiscalMonth
,Value
from table
The sql will have to do the following:
Join the value table and the period table
Include the period in the select list and sum the value, grouping by the period
i.e
select b.period, sum(a.value)
from table a
inner join period b on a.FiscalMonth between b.StartMonth and b.EndMonth
group by b.period
Note: The join condition will have to be modified based on what data you actually have in the period table.
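To make the join condition concrete, here is a minimal Python sketch of the same idea. The period rows, the YYYYMM integer encoding of FiscalMonth, and the sample values are assumptions, since the question does not show the period table.

```python
from collections import defaultdict

# Hypothetical period table: (period, StartMonth, EndMonth), months encoded as YYYYMM.
periods = [("Q1", 201801, 201803), ("Q2", 201804, 201806)]

# Hypothetical value table: (FiscalMonth, Value).
values = [(201801, 10), (201802, 20), (201803, 30), (201804, 5)]

# Join each value to every period whose range contains its month, then sum per period.
totals = defaultdict(int)
for fiscal_month, value in values:
    for period, start, end in periods:
        if start <= fiscal_month <= end:  # the BETWEEN join condition
            totals[period] += value

for period, total in sorted(totals.items()):
    print(period, total)
```

If the periods overlap (for example a rolling "+2 months" window defined per month), a value is counted once in every period that contains it, which is what the BETWEEN join does as well.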
Hope this helps
Well, if you need the values from an X interval by month, you could use something like:
SELECT *
FROM yourTable
WHERE MONTH(some_date) = MONTH(CURRENT_DATE - INTERVAL 1 MONTH) -- could be any X interval; MONTH() and this INTERVAL form are MySQL syntax
This example shows the rows from the previous month, relative to the current one. The point is that the query can be adapted by applying functions to intervals.
Of course, you could use the SUM function to add the values.