What does the cron expression below mean? - quartz-scheduler

I have the following cron expression:
"0 0 0 ? * SUN"
When exactly is this executed? At midnight of Sunday or of Saturday?
Thanks!

See the tutorial
* * * * * *  (year optional)
┬ ┬ ┬ ┬ ┬ ┬
│ │ │ │ │ │
│ │ │ │ │ └───── day of week (1 - 7) (1 is Sun, or use names SUN-SAT)
│ │ │ │ └─────── month (1 - 12, or use names JAN-DEC)
│ │ │ └───────── day of month (1 - 31)
│ │ └─────────── hour (0 - 23)
│ └───────────── minute (0 - 59)
└─────────────── second (0 - 59)
Wild-cards (the * character) can be used to say "every" possible value of this field. Therefore the * character in the "Month" field of the previous example simply means "every month". A '*' in the Day-Of-Week field would therefore obviously mean "every day of the week".
The '?' character is allowed for the day-of-month and day-of-week fields. It is used to specify "no specific value". This is useful when you need to specify something in one of the two fields, but not the other. See the examples below (and CronTrigger JavaDoc) for clarification.
So it means every Sunday at midnight.

This is not actually a standard cron expression; it is a Quartz scheduler expression.
http://quartz-scheduler.org/documentation/quartz-2.1.x/tutorials/crontrigger
The 0 0 0 means midnight (seconds, minutes, hours).
The ? means "no specific value" for the day-of-month field; the day is determined by the day-of-week field instead.
The * means all months.
The SUN means on Sunday.

The trigger will fire at 00:00:00 every Sunday (morning). 0 is the beginning of the day, not the end, so it fires in the second right after 23:59:59 on Saturday.

Related

Date range in window functions PostgreSQL

I have a table with data for the whole of 2021 and 2022. I need to calculate the cumulative amount of the field by date for the previous year. For example, the row with date 2022-03-01 must have the value of the cumulative amount for 2021-03-01.
I am trying this window function:
SUM(fact_mln) OVER(PARTITION BY date - INTERVAL '1 year' ORDER BY date)
But the - INTERVAL '1 year' part is not working.
How can this window function be converted or is there any other solution?
You don't need PARTITION BY:
CREATE TABLE amounts (
    id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    date date NOT NULL,
    fact_mln double precision NOT NULL
);
INSERT INTO amounts (date, fact_mln) VALUES
('2021-03-01', 1),
('2021-03-01', 2),
('2021-03-02', 3),
('2021-03-02', 4),
('2021-03-03', 5),
('2022-02-28', 6),
('2022-03-01', 7),
('2022-03-01', 8),
('2022-03-02', 9);
SELECT date, fact_mln,
       sum(fact_mln) OVER (ORDER BY date
                           RANGE BETWEEN INTERVAL '1' YEAR PRECEDING
                                     AND INTERVAL '1' YEAR PRECEDING)
FROM amounts;
date │ fact_mln │ sum
════════════╪══════════╪═════
2021-03-01 │ 1 │ ∅
2021-03-01 │ 2 │ ∅
2021-03-02 │ 3 │ ∅
2021-03-02 │ 4 │ ∅
2021-03-03 │ 5 │ ∅
2022-02-28 │ 6 │ ∅
2022-03-01 │ 7 │ 3
2022-03-01 │ 8 │ 3
2022-03-02 │ 9 │ 7
(9 rows)
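If the goal is a running total up to the same date one year earlier (rather than the previous-year sum for that single date), a sketch under that reading is to open the frame at UNBOUNDED PRECEDING (the cum_prev_year alias is mine):
SELECT date, fact_mln,
       sum(fact_mln) OVER (ORDER BY date
                           RANGE BETWEEN UNBOUNDED PRECEDING
                                     AND INTERVAL '1' YEAR PRECEDING) AS cum_prev_year
FROM amounts;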

How to get the first n% of a group in polars?

Q1: In polars-rust, when you do .groupby().agg(), we can use .head(10) to get the first 10 elements in a column. But the groups have different lengths and I need to get the first 20% of elements in each group (like the first 24 elements of a 120-element group). How can I make that work?
Q2: With a dataframe sample like the one below, my goal is to loop over the dataframe. Because polars is column-major, I downcasted the df into several ChunkedArrays and iterated via iter().zip(). I found this is faster than doing the same thing after groupby(col("date")), which loops over some list elements. Why is that?
In my understanding, the df is shorter after groupby, which should mean a shorter loop.
Date        Stock  Price
2010-01-01  IBM    1000
2010-01-02  IBM    1001
2010-01-03  IBM    1002
2010-01-01  AAPL   2900
2010-01-02  AAPL   2901
2010-01-03  AAPL   2902
I don't really understand your 2nd question. Maybe you can create another question with a small example.
I will answer the 1st question:
we can use head(10) to get the first 10 elements in a column. But the groups have different lengths and I need to get the first 20% of elements in each group, like the first 24 elements of a 120-element group. How to make it work?
We can use expressions to take a head(n) where n = 0.2 * group_size.
import polars as pl

df = pl.DataFrame({
    "groups": ["a"] * 10 + ["b"] * 20,
    "values": range(30)
})

(df.groupby("groups")
   .agg(pl.all().head(pl.count() * 0.2))
   .explode(pl.all().exclude("groups"))
)
which outputs:
shape: (6, 2)
┌────────┬────────┐
│ groups ┆ values │
│ --- ┆ --- │
│ str ┆ i64 │
╞════════╪════════╡
│ a ┆ 0 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ a ┆ 1 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ b ┆ 10 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ b ┆ 11 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ b ┆ 12 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ b ┆ 13 │
└────────┴────────┘

Counting the same positional bits in PostgreSQL bitmasks

I am trying to count the bits at each position across multiple bitmasks in PostgreSQL. Here is an example of the problem:
Suppose I have three bitmasks (in binary) like:
011011011100110
100011010100101
110110101010101
Now what I want to do is to get the total count of bits in each separate column, considering the above masks as three rows and multiple columns.
e.g. the first column has count 2, the second one has count 2, the third one has count 1, and so on...
In actuality, I have a total of 30 bits in each bitmask in my database. I want to do it in PostgreSQL. I am open to further explanation of the problem if needed.
You could do it by using the get_bit function and a couple of cross joins:
SELECT sum(bit) FILTER (WHERE i = 0) AS count_0,
       sum(bit) FILTER (WHERE i = 1) AS count_1,
       ...
       sum(bit) FILTER (WHERE i = 29) AS count_29
FROM bits
CROSS JOIN generate_series(0, 29) AS i
CROSS JOIN LATERAL get_bit(b, i) AS bit;
The column with the bit string is b in my example.
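If 30 FILTER columns are too verbose, a minimal sketch of the same get_bit idea that returns one row per bit position instead (assuming the same bits table and b column as above):
SELECT i AS position,
       sum(get_bit(b, i)) AS set_bits
FROM bits
CROSS JOIN generate_series(0, 29) AS i
GROUP BY i
ORDER BY i;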
You could use the bitwise AND (&) operator and bigint arithmetic, so long as your bit strings contain 63 bits or fewer:
# create table bmasks (mask bit(15));
CREATE TABLE
# insert into bmasks values ('011011011100110'), ('100011010100101'), ('110110101010101');
INSERT 0 3
# with masks as (
      select (2 ^ x)::bigint::bit(15) as mask, x as posn
      from generate_series(0, 14) as gs(x)
  )
  select m.posn, m.mask, sum((b.mask & m.mask > 0::bit(15))::int) as set_bits
  from masks m
  cross join bmasks b
  group by m.posn, m.mask;
┌──────┬─────────────────┬──────────┐
│ posn │ mask │ set_bits │
├──────┼─────────────────┼──────────┤
│ 0 │ 000000000000001 │ 2 │
│ 1 │ 000000000000010 │ 1 │
│ 2 │ 000000000000100 │ 3 │
│ 3 │ 000000000001000 │ 0 │
│ 4 │ 000000000010000 │ 1 │
│ 5 │ 000000000100000 │ 2 │
│ 6 │ 000000001000000 │ 2 │
│ 7 │ 000000010000000 │ 2 │
│ 8 │ 000000100000000 │ 1 │
│ 9 │ 000001000000000 │ 2 │
│ 10 │ 000010000000000 │ 3 │
│ 11 │ 000100000000000 │ 1 │
│ 12 │ 001000000000000 │ 1 │
│ 13 │ 010000000000000 │ 2 │
│ 14 │ 100000000000000 │ 2 │
└──────┴─────────────────┴──────────┘
(15 rows)

ClickHouse: Efficient way to aggregate data by different time ranges at

I need to aggregate time-series data (with average functions) on different timeslots, like:
Today
Last X days
Last weekend
This week
Last X weeks
This month
etc...
Q1: Can it be done within a GROUP BY statement, or at least with a single query?
Q2: Do I need any Materialized View for that?
The table is partitioned by Month and sharded by UserID
All queries are within UserID (single shard)
Use GROUP BY ... WITH ROLLUP:
create table xrollup(metric Int64, b date, v Int64 ) engine=MergeTree partition by tuple() order by tuple();
insert into xrollup values (1,'2018-01-01', 1), (1,'2018-01-02', 1), (1,'2018-02-01', 1), (1,'2017-03-01', 1);
insert into xrollup values (2,'2018-01-01', 1), (2,'2018-02-02', 1);
SELECT metric, toYear(b) y, toYYYYMM(b) m, SUM(v) AS val
FROM xrollup
GROUP BY metric, y, m with ROLLUP
ORDER BY metric, y, m
┌─metric─┬────y─┬──────m─┬─val─┐
│ 0 │ 0 │ 0 │ 6 │ overall
│ 1 │ 0 │ 0 │ 4 │ overall by metric1
│ 1 │ 2017 │ 0 │ 1 │ overall by metric1 for 2017
│ 1 │ 2017 │ 201703 │ 1 │ overall by metric1 for march 2017
│ 1 │ 2018 │ 0 │ 3 │
│ 1 │ 2018 │ 201801 │ 2 │
│ 1 │ 2018 │ 201802 │ 1 │
│ 2 │ 0 │ 0 │ 2 │
│ 2 │ 2018 │ 0 │ 2 │
│ 2 │ 2018 │ 201801 │ 1 │
│ 2 │ 2018 │ 201802 │ 1 │
└────────┴──────┴────────┴─────┘
Although there's an accepted answer, I had to do something similar and found an alternative route using aggregate function combinators, specifically -If, to select specific date ranges.
I needed to group by a content ID but retrieve unique views for the whole time range and also for specific buckets to generate a histogram (ClickHouse's histogram() function wasn't suitable because there's no option for sub-aggregation).
You could do something along these lines:
SELECT
    group_field,
    avgIf(metric, date BETWEEN toDate('2022-09-03') AND toDate('2022-09-10')) AS week_avg,
    avgIf(metric, date BETWEEN toDate('2022-08-10') AND toDate('2022-09-10')) AS month_avg
FROM data
GROUP BY group_field
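For the buckets in the question (today, last X days, this month, ...), the same -If combinator idea can be anchored to today() instead of fixed dates; a rough sketch, with the data table and column names assumed from the answer above:
SELECT
    group_field,
    avgIf(metric, date = today())                                 AS today_avg,
    avgIf(metric, date >= today() - 7)                            AS last_7_days_avg,
    avgIf(metric, toStartOfMonth(date) = toStartOfMonth(today())) AS this_month_avg
FROM data
GROUP BY group_field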

Subtract value from previous row value if it is greater than the max value

I'm using PostgreSQL & Sequelize. I have to find the consumption from the reading table. Currently, I have a query that subtracts the previous row's value from the current value. But the problem is: if the value is less than the previous value, I have to ignore the row and wait for a greater value before making the calculation.
Current Query
select "readingValue",
"readingValue" - coalesce(lag("readingValue") over (order by "id")) as consumption
from public."EnergyReadingTbl";
Example Record & Current Output
id readingValue consumption
65479 "35.8706703186035" "3.1444168090820"
65480 "39.0491638183594" "3.1784934997559"
65481 "42.1287002563477" "3.0795364379883"
65482 "2.38636064529419" "-39.74233961105351"
65483 "5.91744041442871" "3.53107976913452"
65484 "9.59204387664795" "3.67460346221924"
65485 "14.3925561904907" "4.80051231384275"
65486 "19.4217891693115" "5.0292329788208"
65487 "24.2393398284912" "4.8175506591797"
65488 "29.2515335083008" "5.0121936798096"
65489 "34.2519302368164" "5.0003967285156"
65490 "38.6513633728027" "4.3994331359863"
65491 "43.7513643778087" "5.1000010050060"
In this data, the last max value was 42.1287002563477. I have to wait until I get a value greater than 42.1287002563477 to make the calculation: the next greater value - 42.1287002563477. In this case, 43.7513643778087 - 42.1287002563477.
Expected Output
id readingValue consumption
65479 "35.8706703186035" "3.1444168090820"
65480 "39.0491638183594" "3.1784934997559"
65481 "42.1287002563477" "3.0795364379883"
65482 "2.38636064529419" "0"
65483 "5.91744041442871" "0"
65484 "9.59204387664795" "0"
65485 "14.3925561904907" "0"
65486 "19.4217891693115" "0"
65487 "24.2393398284912" "0"
65488 "29.2515335083008" "0"
65489 "34.2519302368164" "0"
65490 "38.6513633728027" "0"
65491 "43.7513643778087" "1.1226641214710"
Is there any chance to resolve this issue in the query?
You can use ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING to limit the frame of the window function, so you can subtract the MAX of the rows up to but excluding the current row from the MAX up to and including the current row:
SELECT readingValue,
       MAX(readingValue) OVER (ORDER BY id)
         - MAX(readingValue) OVER (ORDER BY id
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
FROM e;
┌──────────────────┬─────────────────┐
│ readingvalue │ ?column? │
├──────────────────┼─────────────────┤
│ 35.8706703186035 │ (null) │
│ 39.0491638183594 │ 3.1784934997559 │
│ 42.1287002563477 │ 3.0795364379883 │
│ 2.38636064529419 │ 0 │
│ 5.91744041442871 │ 0 │
│ 9.59204387664795 │ 0 │
│ 14.3925561904907 │ 0 │
│ 19.4217891693115 │ 0 │
│ 24.2393398284912 │ 0 │
│ 29.2515335083008 │ 0 │
│ 34.2519302368164 │ 0 │
│ 38.6513633728027 │ 0 │
│ 43.7513643778087 │ 1.622664121461 │
└──────────────────┴─────────────────┘
(13 rows)
Time: 0,430 ms
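To get the zeros from the expected output instead of NULLs, the window difference can be wrapped in COALESCE; a sketch using the quoted table and column names from the question (assuming they match the real schema):
SELECT "id", "readingValue",
       coalesce(
           MAX("readingValue") OVER (ORDER BY "id")
           - MAX("readingValue") OVER (ORDER BY "id"
                                       ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
           0) AS consumption
FROM public."EnergyReadingTbl";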