I have a table with data for the whole of 2021 and 2022. I need to calculate the cumulative amount of the field by date for the previous year. For example, the row with date 2022-03-01 must have the value of the cumulative amount for 2021-03-01
I am trying this window function:
SUM(fact_mln) OVER(PARTITION BY date - INTERVAL '1 year' ORDER BY date)
But the - INTERVAL '1 year' part is not working.
How can this window function be converted, or is there any other solution?
You don't need PARTITION BY:
CREATE TABLE laurenz.amounts (
id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
date date NOT NULL,
fact_mln double precision NOT NULL
);
INSERT INTO amounts (date, fact_mln) VALUES
('2021-03-01', 1),
('2021-03-01', 2),
('2021-03-02', 3),
('2021-03-02', 4),
('2021-03-03', 5),
('2022-02-28', 6),
('2022-03-01', 7),
('2022-03-01', 8),
('2022-03-02', 9);
SELECT date, fact_mln,
sum(fact_mln) OVER (ORDER BY date
RANGE BETWEEN INTERVAL '1' YEAR PRECEDING
AND INTERVAL '1' YEAR PRECEDING)
FROM amounts;
date │ fact_mln │ sum
════════════╪══════════╪═════
2021-03-01 │ 1 │ ∅
2021-03-01 │ 2 │ ∅
2021-03-02 │ 3 │ ∅
2021-03-02 │ 4 │ ∅
2021-03-03 │ 5 │ ∅
2022-02-28 │ 6 │ ∅
2022-03-01 │ 7 │ 3
2022-03-01 │ 8 │ 3
2022-03-02 │ 9 │ 7
(9 rows)
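If "cumulative" should mean a running total of everything up to the same date one year earlier (rather than only that single day's total), widening the start of the frame should work on PostgreSQL 11 or later. A sketch against the same table:
SELECT date, fact_mln,
       -- running total through the same date one year earlier (requires PostgreSQL 11+)
       sum(fact_mln) OVER (ORDER BY date
                           RANGE BETWEEN UNBOUNDED PRECEDING
                                 AND INTERVAL '1' YEAR PRECEDING) AS prev_year_running_sum
FROM amounts;
With that frame, the 2022-03-02 row would show 10 (1+2+3+4) instead of 7.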
I need to aggregate time-series data (with average functions) on different timeslots, like:
Today
last X days
Last weekend
This week
Last X weeks
This month
etc...
Q1: Can it be done within GROUP BY statement or at least with a single query?
Q2: Do I need any Materialized View for that?
The table is partitioned by Month and sharded by UserID
All queries are within UserID (single shard)
GROUP BY with ROLLUP
create table xrollup(metric Int64, b date, v Int64 ) engine=MergeTree partition by tuple() order by tuple();
insert into xrollup values (1,'2018-01-01', 1), (1,'2018-01-02', 1), (1,'2018-02-01', 1), (1,'2017-03-01', 1);
insert into xrollup values (2,'2018-01-01', 1), (2,'2018-02-02', 1);
SELECT metric, toYear(b) y, toYYYYMM(b) m, SUM(v) AS val
FROM xrollup
GROUP BY metric, y, m with ROLLUP
ORDER BY metric, y, m
┌─metric─┬────y─┬──────m─┬─val─┐
│ 0 │ 0 │ 0 │ 6 │ overall
│ 1 │ 0 │ 0 │ 4 │ overall by metric1
│ 1 │ 2017 │ 0 │ 1 │ overall by metric1 for 2017
│ 1 │ 2017 │ 201703 │ 1 │ overall by metric1 for march 2017
│ 1 │ 2018 │ 0 │ 3 │
│ 1 │ 2018 │ 201801 │ 2 │
│ 1 │ 2018 │ 201802 │ 1 │
│ 2 │ 0 │ 0 │ 2 │
│ 2 │ 2018 │ 0 │ 2 │
│ 2 │ 2018 │ 201801 │ 1 │
│ 2 │ 2018 │ 201802 │ 1 │
└────────┴──────┴────────┴─────┘
Although there's an accepted answer, I had to do something similar but found an alternative route using aggregate function combinators, specifically -If, to select specific date ranges.
I needed to group by a content ID but retrieve unique views for the whole time range and also for specific buckets to generate a histogram (ClickHouse's histogram() function wasn't suitable because there's no option for sub-aggregation).
You could do something along these lines:
SELECT
group_field,
avgIf(metric, date BETWEEN toDate('2022-09-03') AND toDate('2022-09-10')) AS week_avg,
avgIf(metric, date BETWEEN toDate('2022-08-10') AND toDate('2022-09-10')) AS month_avg
FROM data
GROUP BY group_field
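For the per-bucket unique views mentioned above, the same -If combinator works with uniq(). This is only a sketch; the table name views and the columns content_id, user_id and event_date are assumptions:
SELECT
    content_id,
    -- unique viewers in assumed date buckets, relative to today()
    uniqIf(user_id, event_date >= today() - 7)  AS unique_views_last_7d,
    uniqIf(user_id, event_date >= today() - 30) AS unique_views_last_30d,
    uniq(user_id)                               AS unique_views_total
FROM views
GROUP BY content_id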
I'm using PostgreSQL & Sequelize. I have to find the consumption from the reading table. Currently, I have a query that subtracts the value from the previous row. The problem is that if a value is less than the previous value, I have to ignore that row and wait for a greater value before making the calculation.
Current Query
select "readingValue",
"readingValue" - coalesce(lag("readingValue") over (order by "id")) as consumption
from public."EnergyReadingTbl";
Example Record & Current Output
id readingValue consumption
65479 "35.8706703186035" "3.1444168090820"
65480 "39.0491638183594" "3.1784934997559"
65481 "42.1287002563477" "3.0795364379883"
65482 "2.38636064529419" "-39.74233961105351"
65483 "5.91744041442871" "3.53107976913452"
65484 "9.59204387664795" "3.67460346221924"
65485 "14.3925561904907" "4.80051231384275"
65486 "19.4217891693115" "5.0292329788208"
65487 "24.2393398284912" "4.8175506591797"
65488 "29.2515335083008" "5.0121936798096"
65489 "34.2519302368164" "5.0003967285156"
65490 "38.6513633728027" "4.3994331359863"
65491 "43.7513643778087" "5.1000010050060"
In this data, the last max value was 42.1287002563477. I have to wait until I get a value greater than 42.1287002563477 to make the calculation: the next greater value minus 42.1287002563477, i.e. 43.7513643778087 - 42.1287002563477.
Expected Output
id readingValue consumption
65479 "35.8706703186035" "3.1444168090820"
65480 "39.0491638183594" "3.1784934997559"
65481 "42.1287002563477" "3.0795364379883"
65482 "2.38636064529419" "0"
65483 "5.91744041442871" "0"
65484 "9.59204387664795" "0"
65485 "14.3925561904907" "0"
65486 "19.4217891693115" "0"
65487 "24.2393398284912" "0"
65488 "29.2515335083008" "0"
65489 "34.2519302368164" "0"
65490 "38.6513633728027" "0"
65491 "43.7513643778087" "1.1226641214710"
Is there any way to resolve this in the query?
You can use ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING to limit the frame of the window function, so you can subtract the MAX of the rows up to but excluding the current row from the MAX up to and including the current row:
SELECT readingValue,
MAX(readingValue) OVER (ORDER BY id) - MAX(readingValue) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
FROM e;
┌──────────────────┬─────────────────┐
│ readingvalue │ ?column? │
├──────────────────┼─────────────────┤
│ 35.8706703186035 │ (null) │
│ 39.0491638183594 │ 3.1784934997559 │
│ 42.1287002563477 │ 3.0795364379883 │
│ 2.38636064529419 │ 0 │
│ 5.91744041442871 │ 0 │
│ 9.59204387664795 │ 0 │
│ 14.3925561904907 │ 0 │
│ 19.4217891693115 │ 0 │
│ 24.2393398284912 │ 0 │
│ 29.2515335083008 │ 0 │
│ 34.2519302368164 │ 0 │
│ 38.6513633728027 │ 0 │
│ 43.7513643778087 │ 1.622664121461 │
└──────────────────┴─────────────────┘
(13 rows)
Time: 0,430 ms
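If you prefer 0 instead of NULL on the very first row, and want to keep the quoted identifiers from the original table, the same idea can be wrapped in COALESCE. Just a sketch against the table from the question:
SELECT "id", "readingValue",
       -- COALESCE turns the NULL on the first row into 0
       COALESCE(
           MAX("readingValue") OVER (ORDER BY "id")
         - MAX("readingValue") OVER (ORDER BY "id" ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
           0) AS consumption
FROM public."EnergyReadingTbl";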
I have a PostgreSQL table that I am trying to create. This is my CTE, and I am inserting values here:
BEGIN;
CREATE TABLE section (
id SERIAL PRIMARY KEY,
parent_id INTEGER REFERENCES section(id) DEFERRABLE,
name TEXT NOT NULL UNIQUE );
SET CONSTRAINTS ALL DEFERRED;
INSERT INTO section VALUES (1, NULL, 'animal');
INSERT INTO section VALUES (2, NULL, 'mineral');
INSERT INTO section VALUES (3, NULL, 'vegetable');
INSERT INTO section VALUES (4, 1, 'dog');
INSERT INTO section VALUES (5, 1, 'cat');
INSERT INTO section VALUES (6, 4, 'doberman');
INSERT INTO section VALUES (7, 4, 'dachshund');
INSERT INTO section VALUES (8, 3, 'carrot');
INSERT INTO section VALUES (9, 3, 'lettuce');
INSERT INTO section VALUES (10, 11, 'paradox1');
INSERT INTO section VALUES (11, 10, 'paradox2');
SELECT setval('section_id_seq', (select max(id) from section));
WITH RECURSIVE last_run(parent_id, id_list, name_list) AS (
???
SELECT id_list, name_list
FROM last_run ???
WHERE ORDER BY id_list;
ROLLBACK;
I know that a recursive query is the best possible way, but I am not sure exactly how to implement it. What exactly goes in the ???
What I'm trying to get is the table below:
id_list | name_list
---------+------------------------
{1} | animal
{2} | mineral
{3} | vegetable
{4,1} | dog, animal
{5,1} | cat, animal
{6,4,1} | doberman, dog, animal
{7,4,1} | dachshund, dog, animal
{8,3} | carrot, vegetable
{9,3} | lettuce, vegetable
{10,11} | paradox1, paradox2
{11,10} | paradox2, paradox1
You could use several recursive CTEs in a single query: one for the valid tree and another one for the paradoxes:
with recursive
cte as (
select *, array[id] as ids, array[name] as names
from section
where parent_id is null
union all
select s.*, s.id||c.ids, s.name||c.names
from section as s join cte as c on (s.parent_id = c.id)),
paradoxes as (
select *, array[id] as ids, array[name] as names
from section
where id not in (select id from cte)
union all
select s.*, s.id||p.ids, s.name||p.names
from section as s join paradoxes as p on (s.parent_id = p.id)
where s.id <> all(p.ids) -- To break loops
)
select * from cte
union all
select * from paradoxes;
Result:
┌────┬───────────┬───────────┬─────────┬────────────────────────┐
│ id │ parent_id │ name │ ids │ names │
├────┼───────────┼───────────┼─────────┼────────────────────────┤
│ 1 │ ░░░░ │ animal │ {1} │ {animal} │
│ 2 │ ░░░░ │ mineral │ {2} │ {mineral} │
│ 3 │ ░░░░ │ vegetable │ {3} │ {vegetable} │
│ 4 │ 1 │ dog │ {4,1} │ {dog,animal} │
│ 5 │ 1 │ cat │ {5,1} │ {cat,animal} │
│ 8 │ 3 │ carrot │ {8,3} │ {carrot,vegetable} │
│ 9 │ 3 │ lettuce │ {9,3} │ {lettuce,vegetable} │
│ 6 │ 4 │ doberman │ {6,4,1} │ {doberman,dog,animal} │
│ 7 │ 4 │ dachshund │ {7,4,1} │ {dachshund,dog,animal} │
│ 10 │ 11 │ paradox1 │ {10} │ {paradox1} │
│ 11 │ 10 │ paradox2 │ {11} │ {paradox2} │
│ 11 │ 10 │ paradox2 │ {11,10} │ {paradox2,paradox1} │
│ 10 │ 11 │ paradox1 │ {10,11} │ {paradox1,paradox2} │
└────┴───────────┴───────────┴─────────┴────────────────────────┘
Demo
As you can see, the result includes two unwanted rows: {10}, {paradox1} and {11}, {paradox2}. It is up to you how to filter them out; one option is sketched below.
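One possible way to drop them is to keep only the longest path found for each id. This sketch reuses the cte and paradoxes definitions from the query above:
-- replace the final SELECT of the query above with:
select distinct on (id) id, ids as id_list, names as name_list
from (select * from cte
      union all
      select * from paradoxes) as t
order by id, array_length(ids, 1) desc;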
It is also not clear what the desired result would be if you appended yet another row, for instance INSERT INTO section VALUES (12, 10, 'paradox3');
select
c_elementvalue.value AS "VALUE",
c_elementvalue.name AS "NAME",
rv_fact_acct.postingtype AS "POSTINGTYPE",
sum(rv_fact_acct.amtacct) AS "AMNT",
'YTDB' AS "TYPE",
c_period.enddate AS "ENDDATE",
max(ad_client.description) AS "COMPANY"
from
adempiere.c_period,
adempiere.rv_fact_acct,
adempiere.c_elementvalue,
adempiere.ad_client
where
(rv_fact_acct.ad_client_id = ad_client.ad_client_id ) and
(rv_fact_acct.c_period_id = c_period.c_period_id) and
(rv_fact_acct.account_id = c_elementvalue.c_elementvalue_id) and
(rv_fact_acct.dateacct BETWEEN to_date( to_char( '2017-03-01' ,'YYYY') ||'-04-01', 'yyyy-mm-dd') AND '2017-03-31' ) AND
(rv_fact_acct.ad_client_id = 1000000) and
(rv_fact_acct.c_acctschema_id = 1000000 )and
(rv_fact_acct.postingtype = 'B')and
(rv_fact_acct.accounttype in ('R','E') )
group by c_elementvalue.value , c_elementvalue.name , rv_fact_acct.postingtype , c_period.enddate
order by 5 asc, 1 asc
I got an error message when executing the above SQL statement (Postgres).
Error message:
[Err] ERROR: function to_char(unknown, unknown) is not unique
LINE 68: (rv_fact_acct.dateacct BETWEEN to_date( to_char( '2017-03-...
^
HINT: Could not choose a best candidate function. You might need to add explicit type casts.
This part of your query is problematic:
to_date( to_char( '2017-03-01' ,'YYYY') ||'-04-01', 'yyyy-mm-dd')
There is no to_char function that takes a string as its first parameter.
postgres=# \df to_char
List of functions
┌────────────┬─────────┬──────────────────┬───────────────────────────────────┬────────┐
│ Schema │ Name │ Result data type │ Argument data types │ Type │
╞════════════╪═════════╪══════════════════╪═══════════════════════════════════╪════════╡
│ pg_catalog │ to_char │ text │ bigint, text │ normal │
│ pg_catalog │ to_char │ text │ double precision, text │ normal │
│ pg_catalog │ to_char │ text │ integer, text │ normal │
│ pg_catalog │ to_char │ text │ interval, text │ normal │
│ pg_catalog │ to_char │ text │ numeric, text │ normal │
│ pg_catalog │ to_char │ text │ real, text │ normal │
│ pg_catalog │ to_char │ text │ timestamp without time zone, text │ normal │
│ pg_catalog │ to_char │ text │ timestamp with time zone, text │ normal │
└────────────┴─────────┴──────────────────┴───────────────────────────────────┴────────┘
(8 rows)
You can cast the string '2017-03-01' to the date type. PostgreSQL cannot do it by itself, because there are several candidate types: numeric, timestamp, ...
postgres=# select to_date( to_char( '2017-03-01'::date ,'YYYY') ||'-04-01', 'yyyy-mm-dd');
┌────────────┐
│ to_date │
╞════════════╡
│ 2017-04-01 │
└────────────┘
(1 row)
Usually, using string operations for date/time manipulation is a mistake. PostgreSQL (like all SQL databases) has good functions for date arithmetic.
For example, the task "get the first date of the following month" can be done with this expression:
postgres=# select date_trunc('month', current_date + interval '1month')::date;
┌────────────┐
│ date_trunc │
╞════════════╡
│ 2017-05-01 │
└────────────┘
(1 row)
You can write a custom function in the SQL language (essentially a macro):
postgres=# create or replace function next_month(date)
returns date as $$
select date_trunc('month', $1 + interval '1month')::date $$
language sql;
CREATE FUNCTION
postgres=# select next_month(current_date);
┌────────────┐
│ next_month │
╞════════════╡
│ 2017-05-01 │
└────────────┘
(1 row)
It isn't clear what logic you intend to use for filtering by the account date, but your current use of to_char() and to_date() appears to be the cause of the error. If you just want to grab records from March 2017, then use the following:
rv_fact_acct.dateacct BETWEEN '2017-03-01' AND '2017-03-31'
If you give us more information about what you are trying to do, this can be updated accordingly.
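If dateacct can ever contain a time component, or you simply want an index-friendly filter, a half-open range is a common alternative. A sketch of the same March 2017 condition:
rv_fact_acct.dateacct >= DATE '2017-03-01'
AND rv_fact_acct.dateacct <  DATE '2017-04-01'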
I have the below cron expression:
"0 0 0 ? * SUN"
When exactly is this executed? At midnight of Sunday or of Saturday?
Thanks!
See the tutorial
* * * * * * (year optional)
┬ ┬ ┬ ┬ ┬ ┬
│ │ │ │ │ │
│ │ │ │ │ └── day of week (0 - 7) (0 or 7 is Sun, or use names)
│ │ │ │ └──── month (1 - 12)
│ │ │ └────── day of month (1 - 31)
│ │ └──────── hour (0 - 23)
│ └────────── min (0 - 59)
└──────────── seconds
Wild-cards (the * character) can be used to say "every" possible value of this field. Therefore the * character in the "Month" field of the previous example simply means "every month". A '*' in the Day-Of-Week field would therefore obviously mean "every day of the week".
The '?' character is allowed for the day-of-month and day-of-week fields. It is used to specify "no specific value". This is useful when you need to specify something in one of the two fields, but not the other. See the examples below (and CronTrigger JavaDoc) for clarification.
So it means every Sunday at midnight.
This is not actually a cron expression. It is a quartz-schedule expression.
http://quartz-scheduler.org/documentation/quartz-2.1.x/tutorials/crontrigger
The 0 0 0 means midnight (second, minutes, hour)
The ? means it depends on other fields.
The * means all months.
The SUN means on Sunday.
The trigger will fire at 00:00:00 every Sunday (morning). 0 is the beginning of a day, not the end, so it fires one second after 23:59:59 on Saturday.