Polars truncate to day of a non-UTC time-zone-aware datetime series - python-polars

Let's say I have:
import polars as pl
from datetime import datetime

test = pl.DataFrame({'dt': pl.date_range(low=datetime(2022, 11, 1), high=datetime(2022, 11, 4), interval='1h')})
I can do:
test.with_column(pl.col('dt').dt.truncate('1d').alias('trunced'))
and the trunced column comes out as expected.
┌─────────────────────┬─────────────────────┐
│ dt ┆ trunced │
│ --- ┆ --- │
│ datetime[μs] ┆ datetime[μs] │
╞═════════════════════╪═════════════════════╡
│ 2022-11-01 00:00:00 ┆ 2022-11-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-01 01:00:00 ┆ 2022-11-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-01 02:00:00 ┆ 2022-11-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-01 03:00:00 ┆ 2022-11-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ... ┆ ... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-03 21:00:00 ┆ 2022-11-03 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-03 22:00:00 ┆ 2022-11-03 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-03 23:00:00 ┆ 2022-11-03 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-04 00:00:00 ┆ 2022-11-04 00:00:00 │
└─────────────────────┴─────────────────────┘
However, if I localize my dt column, it doesn't work. For instance:
test = test.with_column(pl.col('dt').dt.tz_localize('America/New_York'))
test.with_column(pl.col('dt').dt.truncate('1d').alias('trunced'))
┌────────────────────────────────┬────────────────────────────────┐
│ dt ┆ trunced │
│ --- ┆ --- │
│ datetime[μs, America/New_York] ┆ datetime[μs, America/New_York] │
╞════════════════════════════════╪════════════════════════════════╡
│ 2022-11-01 00:00:00 EDT ┆ 2022-10-31 20:00:00 EDT │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-01 01:00:00 EDT ┆ 2022-10-31 20:00:00 EDT │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-01 02:00:00 EDT ┆ 2022-10-31 20:00:00 EDT │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-01 03:00:00 EDT ┆ 2022-10-31 20:00:00 EDT │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ... ┆ ... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-03 21:00:00 EDT ┆ 2022-11-03 20:00:00 EDT │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-03 22:00:00 EDT ┆ 2022-11-03 20:00:00 EDT │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-03 23:00:00 EDT ┆ 2022-11-03 20:00:00 EDT │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-04 00:00:00 EDT ┆ 2022-11-03 20:00:00 EDT │
└────────────────────────────────┴────────────────────────────────┘
It looks like it's converting to UTC, truncating, and then giving back the America/New_York representation of that instant.
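That suspicion can be reproduced in plain Python with the standard library (a sketch using zoneinfo, purely to illustrate the suspected order of operations):
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

ny = ZoneInfo("America/New_York")
t = datetime(2022, 11, 1, 0, 0, tzinfo=ny)   # 2022-11-01 00:00:00 EDT
as_utc = t.astimezone(timezone.utc)          # 2022-11-01 04:00:00 UTC
truncated = as_utc.replace(hour=0)           # midnight UTC, not midnight local
print(truncated.astimezone(ny))              # 2022-10-31 20:00:00-04:00
which matches the first row of the trunced column above.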
For that matter, I get similar behavior with strftime:
test.with_column(pl.col('dt').dt.strftime('%Y-%m-%dT%H:%M:%S').alias('string'))
┌────────────────────────────────┬─────────────────────┐
│ dt ┆ string │
│ --- ┆ --- │
│ datetime[μs, America/New_York] ┆ str │
╞════════════════════════════════╪═════════════════════╡
│ 2022-11-01 00:00:00 EDT ┆ 2022-11-01T04:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-01 01:00:00 EDT ┆ 2022-11-01T05:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-01 02:00:00 EDT ┆ 2022-11-01T06:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-01 03:00:00 EDT ┆ 2022-11-01T07:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ... ┆ ... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-03 21:00:00 EDT ┆ 2022-11-04T01:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-03 22:00:00 EDT ┆ 2022-11-04T02:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-03 23:00:00 EDT ┆ 2022-11-04T03:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-04 00:00:00 EDT ┆ 2022-11-04T04:00:00 │
└────────────────────────────────┴─────────────────────┘
With strftime it looks like it's converting to UTC before giving the string representation.
The dt functions that extract the individual parts do work; for example, I can do this:
test.with_column(
    (
        pl.col('dt').dt.year().cast(pl.Utf8)
        + pl.lit("-")
        + pl.col('dt').dt.month().cast(pl.Utf8).str.zfill(2)
        + pl.lit("-")
        + pl.col('dt').dt.day().cast(pl.Utf8).str.zfill(2)
    ).str.strptime(pl.Date(), "%Y-%m-%d").alias('long_way')
)
┌────────────────────────────────┬────────────┐
│ dt ┆ long_way │
│ --- ┆ --- │
│ datetime[μs, America/New_York] ┆ date │
╞════════════════════════════════╪════════════╡
│ 2022-11-01 00:00:00 EDT ┆ 2022-11-01 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-01 01:00:00 EDT ┆ 2022-11-01 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-01 02:00:00 EDT ┆ 2022-11-01 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-01 03:00:00 EDT ┆ 2022-11-01 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ... ┆ ... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-03 21:00:00 EDT ┆ 2022-11-03 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-03 22:00:00 EDT ┆ 2022-11-03 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-03 23:00:00 EDT ┆ 2022-11-03 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-11-04 00:00:00 EDT ┆ 2022-11-04 │
└────────────────────────────────┴────────────┘
Am I doing something wrong or is this a bug?

This is fixed as of version 0.15.1.
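A quick way to confirm on an upgraded install (a sketch; it assumes polars >= 0.15.1 and the localized test frame from above):
import polars as pl
print(pl.__version__)  # should be >= 0.15.1

# after the fix, truncation happens on local (wall-clock) time, so e.g.
# 2022-11-01 00:00:00 EDT should truncate to 2022-11-01 00:00:00 EDT
test.with_column(pl.col('dt').dt.truncate('1d').alias('trunced'))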

Related

Polars get count of events prior to "this" event, but within given duration

I have been struggling to create a feature: a counter that counts the number of events prior to each event, where each prior event must have occurred within a given duration (dt). I know how to do it for all previous events; that is easy using cumsum and over on the given column. But if I want to do this with only the events within, e.g., the last 2 days, how do I do that?
Below is how I do it (the wrong way) with cumsum.
import polars as pl
from datetime import date

df = pl.DataFrame(
    data={
        "Event": ["Rain", "Sun", "Rain", "Sun", "Rain", "Sun", "Rain", "Sun"],
        "Date": [
            date(2022, 1, 1),
            date(2022, 1, 2),
            date(2022, 1, 2),
            date(2022, 1, 3),
            date(2022, 1, 3),
            date(2022, 1, 5),
            date(2022, 1, 5),
            date(2022, 1, 8),
        ],
    }
)
df.with_column(
    pl.col("Date").cumcount().over("Event").alias("cum_sum")
)
outputting
shape: (8, 3)
┌───────┬────────────┬─────────┐
│ Event ┆ Date ┆ cum_sum │
│ --- ┆ --- ┆ --- │
│ str ┆ date ┆ u32 │
╞═══════╪════════════╪═════════╡
│ Rain ┆ 2022-01-01 ┆ 0 │
│ Sun ┆ 2022-01-02 ┆ 0 │
│ Rain ┆ 2022-01-02 ┆ 1 │
│ Sun ┆ 2022-01-03 ┆ 1 │
│ Rain ┆ 2022-01-03 ┆ 2 │
│ Sun ┆ 2022-01-05 ┆ 2 │
│ Rain ┆ 2022-01-05 ┆ 3 │
│ Sun ┆ 2022-01-08 ┆ 3 │
└───────┴────────────┴─────────┘
What I would like to output is this:
shape: (8, 3)
┌───────┬────────────┬─────────┐
│ Event ┆ Date ┆ cum_sum │
│ --- ┆ --- ┆ --- │
│ str ┆ date ┆ u32 │
╞═══════╪════════════╪═════════╡
│ Rain ┆ 2022-01-01 ┆ 0 │
│ Sun ┆ 2022-01-02 ┆ 0 │
│ Rain ┆ 2022-01-02 ┆ 1 │
│ Sun ┆ 2022-01-03 ┆ 1 │
│ Rain ┆ 2022-01-03 ┆ 2 │
│ Sun ┆ 2022-01-05 ┆ 1 │
│ Rain ┆ 2022-01-05 ┆ 1 │
│ Sun ┆ 2022-01-08 ┆ 0 │
└───────┴────────────┴─────────┘
(Preferably a solution that scales reasonably well.)
Thanks
Tried this without success
You can try a groupby_rolling for this.
(
    df
    .groupby_rolling(
        index_column="Date",
        period="2d",
        by="Event",
        closed='both',
    )
    .agg([
        pl.count() - 1
    ])
    .sort(["Date", "Event"], reverse=[False, True])
)
shape: (8, 3)
┌───────┬────────────┬───────┐
│ Event ┆ Date ┆ count │
│ --- ┆ --- ┆ --- │
│ str ┆ date ┆ u32 │
╞═══════╪════════════╪═══════╡
│ Rain ┆ 2022-01-01 ┆ 0 │
│ Sun ┆ 2022-01-02 ┆ 0 │
│ Rain ┆ 2022-01-02 ┆ 1 │
│ Sun ┆ 2022-01-03 ┆ 1 │
│ Rain ┆ 2022-01-03 ┆ 2 │
│ Sun ┆ 2022-01-05 ┆ 1 │
│ Rain ┆ 2022-01-05 ┆ 1 │
│ Sun ┆ 2022-01-08 ┆ 0 │
└───────┴────────────┴───────┘
We subtract one in the agg because we do not want to count the current event, only prior events. (The sort at the end is just to order the rows to match the original data.)
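As a small optional tweak (not part of the original snippet), the aggregation can be aliased so the column name matches the desired output:
(
    df
    .groupby_rolling(index_column="Date", period="2d", by="Event", closed="both")
    .agg([(pl.count() - 1).alias("cum_sum")])
    .sort(["Date", "Event"], reverse=[False, True])
)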

Nested time-based groupby operations/sub-groups without apply()?

I want to understand the polars way to create temporal sub-groups out of the groups from a groupby_rolling() operation.
I'm looking to do this while keeping things parallel, i.e. without using apply() (see that approach) and without using secondary/merged dataframes.
Example input:
┌─────┬─────────────────────┬───────┐
│ row ┆ date                ┆ price │
│ --- ┆ ---                 ┆ ---   │
│ i64 ┆ datetime[μs]        ┆ i64   │
╞═════╪═════════════════════╪═══════╡
│ 1   ┆ 2022-01-01 10:00:00 ┆ 10    │
│ 2   ┆ 2022-01-01 10:05:00 ┆ 20    │
│ 3   ┆ 2022-01-01 10:10:00 ┆ 30    │
│ 4   ┆ 2022-01-01 10:15:00 ┆ 40    │
│ 5   ┆ 2022-01-01 10:20:00 ┆ 50    │
│ 6   ┆ 2022-01-01 10:25:00 ┆ 60    │
│ 7   ┆ 2022-01-01 10:30:00 ┆ 70    │
│ 8   ┆ 2022-01-01 10:35:00 ┆ 80    │
│ 9   ┆ 2022-01-01 10:40:00 ┆ 90    │
│ 10  ┆ 2022-01-01 10:45:00 ┆ 100   │
│ 11  ┆ 2022-01-01 10:50:00 ┆ 110   │
│ 12  ┆ 2022-01-01 10:55:00 ┆ 120   │
│ 13  ┆ 2022-01-01 11:00:00 ┆ 130   │
└─────┴─────────────────────┴───────┘
Desired output:
┌─────┬─────────────────────┬───────┬──────────────────────────────────┐
│ row ┆ date                ┆ price ┆ 10_min_groups_mean_price_history │
│ --- ┆ ---                 ┆ ---   ┆ ---                              │
│ i64 ┆ datetime[μs]        ┆ i64   ┆ list[i64]                        │
╞═════╪═════════════════════╪═══════╪══════════════════════════════════╡
│ 1   ┆ 2022-01-01 10:00:00 ┆ 10    ┆ [10]                             │
│ 2   ┆ 2022-01-01 10:05:00 ┆ 20    ┆ [15]                             │
│ 3   ┆ 2022-01-01 10:10:00 ┆ 30    ┆ [25, 10]                         │
│ 4   ┆ 2022-01-01 10:15:00 ┆ 40    ┆ [35, 15]                         │
│ 5   ┆ 2022-01-01 10:20:00 ┆ 50    ┆ [45, 25, 10]                     │
│ 6   ┆ 2022-01-01 10:25:00 ┆ 60    ┆ [55, 35, 15]                     │
│ 7   ┆ 2022-01-01 10:30:00 ┆ 70    ┆ [65, 45, 25]                     │
│ 8   ┆ 2022-01-01 10:35:00 ┆ 80    ┆ [75, 55, 35]                     │
│ 9   ┆ 2022-01-01 10:40:00 ┆ 90    ┆ [85, 65, 45]                     │
│ 10  ┆ 2022-01-01 10:45:00 ┆ 100   ┆ [95, 75, 55]                     │
│ 11  ┆ 2022-01-01 10:50:00 ┆ 110   ┆ [105, 85, 65]                    │
│ 12  ┆ 2022-01-01 10:55:00 ┆ 120   ┆ [115, 95, 75]                    │
│ 13  ┆ 2022-01-01 11:00:00 ┆ 130   ┆ [125, 105, 85]                   │
└─────┴─────────────────────┴───────┴──────────────────────────────────┘
What is happening above?
A rolling window is applied over the dataframe, producing a window per row.
Each window includes all rows within the last 30min (including the current row).
Then, each 30min window is divided into 10min sub-groups.
The mean price is calculated for each 10min sub-group.
All mean prices from the sub-groups are returned as a list (most recent first) to the "10_min_groups_mean_price_history" column.
Worked example (using row 5 as an example):
The rolling window for row 5 captures the previous 30min of data, which is rows 1 to 5.
These rows are sub-grouped into 10min windows, creating three sub-groups that capture rows [[5,4],[3,2],[1]].
The mean price of the rows in each sub-group is calculated and produced as a list → [45, 25, 10].
Mental model:
I'm conceptualising this as treating each window from a groupby_rolling() operation as a dataframe that can be computed on as needed (in this case by performing a groupby_dynamic() operation on it, with the intent of returning aggregations on those sub-groups as a list), but I'm not sure if that is the right way to think about it.
If the sub-group data were categorical, this would be a simple case of using over(); however, I'm not aware of an equivalent when the requirement is to sub-group by a time window.
I am also under the impression that this operation should be parallelisable, as each window is independent of the others (it's just more calculation steps), but please point out if there's a reason it can't be.
Thanks in advance!
Full dummy data set:
If you want to run this with a realistic sized dataset you can use
import numpy as np
import polars as pl
from datetime import datetime, timedelta

df_dummy = pl.DataFrame({
    'date': pl.date_range(
        datetime(2000, 1, 1, 9),
        datetime(2000, 1, 1, 16, 59, 59),
        timedelta(seconds=1),
    )
})
df_dummy = df_dummy.with_column(
    pl.Series(np.random.uniform(0.5, 0.95, len(df_dummy)) * 100).alias('price')
)
Other ways that people might ask this question (for others searching):
groupby_dynamic() within groupby_rolling()
How to access polars RollingGroupBy[Dataframe] Object
Treat each groupby_rolling() window as a dataframe to aggregate on
Nested dataframes within groupby context
Nested groupby contexts
Could you .explode() the .groupby_rolling() result, then use the resulting column for your .groupby_dynamic()?
(df.groupby_rolling(index_column="date", period="30m", closed="both")
   .agg(pl.col("date").alias("window"))
   .explode("window"))
shape: (70, 2)
┌─────────────────────┬─────────────────────┐
│ date | window │
│ --- | --- │
│ datetime[μs] | datetime[μs] │
╞═════════════════════╪═════════════════════╡
│ 2022-01-01 10:00:00 | 2022-01-01 10:00:00 │
│ 2022-01-01 10:05:00 | 2022-01-01 10:00:00 │
│ 2022-01-01 10:05:00 | 2022-01-01 10:05:00 │
│ 2022-01-01 10:10:00 | 2022-01-01 10:00:00 │
│ 2022-01-01 10:10:00 | 2022-01-01 10:05:00 │
│ 2022-01-01 10:10:00 | 2022-01-01 10:10:00 │
│ 2022-01-01 10:15:00 | 2022-01-01 10:00:00 │
│ 2022-01-01 10:15:00 | 2022-01-01 10:05:00 │
│ 2022-01-01 10:15:00 | 2022-01-01 10:10:00 │
│ 2022-01-01 10:15:00 | 2022-01-01 10:15:00 │
│ ... | ... │
│ 2022-01-01 10:55:00 | 2022-01-01 10:45:00 │
│ 2022-01-01 10:55:00 | 2022-01-01 10:50:00 │
│ 2022-01-01 10:55:00 | 2022-01-01 10:55:00 │
│ 2022-01-01 11:00:00 | 2022-01-01 10:30:00 │
│ 2022-01-01 11:00:00 | 2022-01-01 10:35:00 │
│ 2022-01-01 11:00:00 | 2022-01-01 10:40:00 │
│ 2022-01-01 11:00:00 | 2022-01-01 10:45:00 │
│ 2022-01-01 11:00:00 | 2022-01-01 10:50:00 │
│ 2022-01-01 11:00:00 | 2022-01-01 10:55:00 │
│ 2022-01-01 11:00:00 | 2022-01-01 11:00:00 │
└─────────────────────┴─────────────────────┘
Something along the lines of:
[Edit: Removed the unneeded .join() per #ΩΠΟΚΕΚΡΥΜΜΕΝΟΣ's help.]
(df.groupby_rolling(index_column="date", period="30m", closed="both")
   .agg([pl.col("date").alias("window"), pl.col("price")])
   .explode(["window", "price"])
   .groupby_dynamic(by="date", index_column="window", every="10m", closed="right")
   .agg(pl.col("price"))  # pl.col("price").mean()
   .groupby("date", maintain_order=True)
   .agg(pl.all()))
shape: (13, 3)
┌─────────────────────┬─────────────────────────────────────┬──────────────────────────────────┐
│ date | window | price │
│ --- | --- | --- │
│ datetime[μs] | list[datetime[μs]] | list[list[i64]] │
╞═════════════════════╪═════════════════════════════════════╪══════════════════════════════════╡
│ 2022-01-01 10:00:00 | [2022-01-01 09:50:00] | [[10]] │
│ 2022-01-01 10:05:00 | [2022-01-01 09:50:00, 2022-01-01... | [[10], [20]] │
│ 2022-01-01 10:10:00 | [2022-01-01 09:50:00, 2022-01-01... | [[10], [20, 30]] │
│ 2022-01-01 10:15:00 | [2022-01-01 09:50:00, 2022-01-01... | [[10], [20, 30], [40]] │
│ 2022-01-01 10:20:00 | [2022-01-01 09:50:00, 2022-01-01... | [[10], [20, 30], [40, 50]] │
│ 2022-01-01 10:25:00 | [2022-01-01 09:50:00, 2022-01-01... | [[10], [20, 30], ... [60]] │
│ 2022-01-01 10:30:00 | [2022-01-01 09:50:00, 2022-01-01... | [[10], [20, 30], ... [60, 70]] │
│ 2022-01-01 10:35:00 | [2022-01-01 10:00:00, 2022-01-01... | [[20, 30], [40, 50], ... [80]] │
│ 2022-01-01 10:40:00 | [2022-01-01 10:00:00, 2022-01-01... | [[30], [40, 50], ... [80, 90]] │
│ 2022-01-01 10:45:00 | [2022-01-01 10:10:00, 2022-01-01... | [[40, 50], [60, 70], ... [100]] │
│ 2022-01-01 10:50:00 | [2022-01-01 10:10:00, 2022-01-01... | [[50], [60, 70], ... [100, 110]] │
│ 2022-01-01 10:55:00 | [2022-01-01 10:20:00, 2022-01-01... | [[60, 70], [80, 90], ... [120]] │
│ 2022-01-01 11:00:00 | [2022-01-01 10:20:00, 2022-01-01... | [[70], [80, 90], ... [120, 130]] │
└─────────────────────┴─────────────────────────────────────┴──────────────────────────────────┘
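To collapse those nested lists into mean prices per sub-group, most recent first, here is a hedged variant (the final column name follows the question; note these sub-groups are aligned to calendar 10-minute buckets, so the values can differ slightly from the row-anchored windows sketched in the question):
(df.groupby_rolling(index_column="date", period="30m", closed="both")
   .agg([pl.col("date").alias("window"), pl.col("price")])
   .explode(["window", "price"])
   .groupby_dynamic(by="date", index_column="window", every="10m", closed="right")
   .agg(pl.col("price").mean())
   .groupby("date", maintain_order=True)
   .agg(pl.col("price").reverse().alias("10_min_groups_mean_price_history")))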

Efficient way to rename columns from pivot

Currently, pivot joins the name of the "values" column and the value from the "columns" column with an underscore to form the new column name. Example from the data below: new column name = "monthly_qty" + "_" + "product_a".
>>> data = pl.DataFrame({"month":["Jan", "Jan", "Feb", "Feb", "Mar", "Mar"], "type":["product_a", "product_b"]*3, "monthly_qty":[10,20]*3, "monthly_amt":[5., 8.]*3})
>>> data
shape: (6, 4)
┌───────┬───────────┬─────────────┬─────────────┐
│ month ┆ type ┆ monthly_qty ┆ monthly_amt │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ f64 │
╞═══════╪═══════════╪═════════════╪═════════════╡
│ Jan ┆ product_a ┆ 10 ┆ 5.0 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Jan ┆ product_b ┆ 20 ┆ 8.0 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Feb ┆ product_a ┆ 10 ┆ 5.0 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Feb ┆ product_b ┆ 20 ┆ 8.0 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Mar ┆ product_a ┆ 10 ┆ 5.0 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Mar ┆ product_b ┆ 20 ┆ 8.0 │
└───────┴───────────┴─────────────┴─────────────┘
>>> data = data.pivot(index="month", columns="type", values=["monthly_qty", "monthly_amt"])
>>> data
shape: (3, 5)
┌───────┬───────────────────────┬───────────────────────┬───────────────────────┬───────────────────────┐
│ month ┆ monthly_qty_product_a ┆ monthly_qty_product_b ┆ monthly_amt_product_a ┆ monthly_amt_product_b │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ f64 ┆ f64 │
╞═══════╪═══════════════════════╪═══════════════════════╪═══════════════════════╪═══════════════════════╡
│ Jan ┆ 10 ┆ 20 ┆ 5.0 ┆ 8.0 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Feb ┆ 10 ┆ 20 ┆ 5.0 ┆ 8.0 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Mar ┆ 10 ┆ 20 ┆ 5.0 ┆ 8.0 │
└───────┴───────────────────────┴───────────────────────┴───────────────────────┴───────────────────────┘
I wish to rename the columns as below, but I'm not sure what the most efficient way is.
old column = "monthly_qty_product_a"
new_column = "product_a:monthly_qty"
This is what I can think of for now, provided that the number of underscores is fixed.
>>> new_cols = {col: col if col == "month" else f"{'_'.join(col.split('_')[2:])}:{'_'.join(col.split('_')[0:2])}" for col in data.columns}
>>> data.rename(new_cols)
shape: (3, 5)
┌───────┬───────────────────────┬───────────────────────┬───────────────────────┬───────────────────────┐
│ month ┆ product_a:monthly_qty ┆ product_b:monthly_qty ┆ product_a:monthly_amt ┆ product_b:monthly_amt │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ f64 ┆ f64 │
╞═══════╪═══════════════════════╪═══════════════════════╪═══════════════════════╪═══════════════════════╡
│ Jan ┆ 10 ┆ 20 ┆ 5.0 ┆ 8.0 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Feb ┆ 10 ┆ 20 ┆ 5.0 ┆ 8.0 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Mar ┆ 10 ┆ 20 ┆ 5.0 ┆ 8.0 │
└───────┴───────────────────────┴───────────────────────┴───────────────────────┴───────────────────────┘
This will not work if a value column has more than one underscore, e.g. "monthly_growth_pct".
Is there a better way of doing this? Any advice is much appreciated.
Thanks!
There is no way in DataFrame.pivot to control this naming.
I would suggest modifying your long-format dataframe (6 x 4) slightly by renaming the column monthly_qty to monthly_qty<CHAR>, where <CHAR> is a character you are quite sure is not present, for example !:
data = data.rename({"monthly_qty":"monthly_qty!"})
Proceed with the pivot, and then split on ! in your renaming logic.
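A minimal sketch of that suggestion, assuming the same values_column + "_" + value naming convention shown above (the out variable is just for illustration):
data = data.rename({"monthly_qty": "monthly_qty!", "monthly_amt": "monthly_amt!"})
out = data.pivot(index="month", columns="type", values=["monthly_qty!", "monthly_amt!"])
# "monthly_qty!_product_a" splits cleanly on "!_", no matter how many
# underscores the value column itself contains
out = out.rename({
    col: f"{col.split('!_')[1]}:{col.split('!_')[0]}"
    for col in out.columns
    if "!_" in col
})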

iterate through groupby like pandas with a tuple

So when I iterate through a pandas.groupby(), what I get back is a tuple. This was important because I could do [x for x in df_pandas.sort('date').groupby('grouping_column')] and then sort this list of tuples based on x[0].
In pandas, the result is also automatically sorted after a groupby.
I did that to get consistent output in plotly (an area chart).
Now with polars, I can't do the same; I just get the dataframe back. Is there any way to accomplish the same thing?
I tried adding a sort([pl.col('date'), pl.col('grouping_column')]) but it had no effect.
What I have in mind for polars is this:
for value in df.select('grouping_column').unique().to_numpy():
    sub_df = df.filter(pl.col('grouping_column') == value)
    ...
This will in fact give the desired results, because it always iterates through the same sequence, whereas the groupby order is effectively random and doesn't seem to matter at all.
My problem is that this second solution does not seem very efficient.
The other thing I could do is
[(sub_df['some_col'].to_numpy()[0], sub_df) for sub_df in df.groupby('some_col')]
and then use Python's sort to sort the list based on the key in the tuple x[0], and reiterate through the list. However, this solution seems quite ugly as well.
You can use the partition_by function to create a dictionary of key-value pairs, where the keys are your grouping_column values and the values are DataFrames.
For example, let's say we have this data:
import polars as pl
from datetime import datetime

df = pl.DataFrame({"grouping_column": [1, 2, 3]}).join(
    pl.DataFrame(
        {
            "date": pl.date_range(datetime(2020, 1, 1), datetime(2020, 3, 1), "1mo"),
        }
    ),
    how="cross",
)
df
shape: (9, 2)
┌─────────────────┬─────────────────────┐
│ grouping_column ┆ date │
│ --- ┆ --- │
│ i64 ┆ datetime[ns] │
╞═════════════════╪═════════════════════╡
│ 1 ┆ 2020-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1 ┆ 2020-02-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1 ┆ 2020-03-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ 2020-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ... ┆ ... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ 2020-03-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ 2020-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ 2020-02-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ 2020-03-01 00:00:00 │
└─────────────────┴─────────────────────┘
We can split the DataFrame into a dictionary.
df.partition_by(groups='grouping_column', maintain_order=True, as_dict=True)
{1: shape: (3, 2)
┌─────────────────┬─────────────────────┐
│ grouping_column ┆ date │
│ --- ┆ --- │
│ i64 ┆ datetime[ns] │
╞═════════════════╪═════════════════════╡
│ 1 ┆ 2020-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1 ┆ 2020-02-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1 ┆ 2020-03-01 00:00:00 │
└─────────────────┴─────────────────────┘,
2: shape: (3, 2)
┌─────────────────┬─────────────────────┐
│ grouping_column ┆ date │
│ --- ┆ --- │
│ i64 ┆ datetime[ns] │
╞═════════════════╪═════════════════════╡
│ 2 ┆ 2020-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ 2020-02-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ 2020-03-01 00:00:00 │
└─────────────────┴─────────────────────┘,
3: shape: (3, 2)
┌─────────────────┬─────────────────────┐
│ grouping_column ┆ date │
│ --- ┆ --- │
│ i64 ┆ datetime[ns] │
╞═════════════════╪═════════════════════╡
│ 3 ┆ 2020-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ 2020-02-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ 2020-03-01 00:00:00 │
└─────────────────┴─────────────────────┘}
From there, you can create the tuples using the items method of the Python dictionary.
for x in df.partition_by(groups='grouping_column', maintain_order=True, as_dict=True).items():
    print("next item")
    print(x)
next item
(1, shape: (3, 2)
┌─────────────────┬─────────────────────┐
│ grouping_column ┆ date │
│ --- ┆ --- │
│ i64 ┆ datetime[ns] │
╞═════════════════╪═════════════════════╡
│ 1 ┆ 2020-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1 ┆ 2020-02-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1 ┆ 2020-03-01 00:00:00 │
└─────────────────┴─────────────────────┘)
next item
(2, shape: (3, 2)
┌─────────────────┬─────────────────────┐
│ grouping_column ┆ date │
│ --- ┆ --- │
│ i64 ┆ datetime[ns] │
╞═════════════════╪═════════════════════╡
│ 2 ┆ 2020-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ 2020-02-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ 2020-03-01 00:00:00 │
└─────────────────┴─────────────────────┘)
next item
(3, shape: (3, 2)
┌─────────────────┬─────────────────────┐
│ grouping_column ┆ date │
│ --- ┆ --- │
│ i64 ┆ datetime[ns] │
╞═════════════════╪═════════════════════╡
│ 3 ┆ 2020-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ 2020-02-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ 2020-03-01 00:00:00 │
└─────────────────┴─────────────────────┘)
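If a fixed ordering is needed regardless of appearance order (as in the plotly use case above), the dictionary items can simply be sorted by key before iterating; a small sketch:
parts = df.partition_by(groups='grouping_column', maintain_order=True, as_dict=True)
for key, sub_df in sorted(parts.items()):
    # the keys always come back in the same (sorted) order,
    # so e.g. an area chart gets a stable trace order
    print(key, sub_df.shape)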

convert a pandas loc operation that needed the index to assign values to polars

In this example I have three columns: 'DayOfWeek', 'Time' and 'Risk'.
I want to group by 'DayOfWeek', take only the first element, and assign a high risk to it. This means the first known hour in each day of the week is the one that has the highest risk. The rest are initialized to 'Low' risk.
In pandas I had an additional column for the index, but in polars I do not. I could artificially create one, but is it even necessary?
Can I do this in a smarter way with polars?
df['risk'] = "Low"
df = df.sort('Time')
df.loc[df.groupby("DayOfWeek").head(1).index, "risk"] = "High"
The index is unique in this case and is simply range(n).
Here is my solution, by the way (I don't really like it):
df = df.with_column(pl.arange(0, df.shape[0]).alias('pseudo_index'))
# find the lowest time per day
indexes_df = df.sort('Time').groupby('DayOfWeek').head(1)
# set 'High' as the risk for all rows coming out of the groupby
indexes_df = indexes_df.select('pseudo_index').with_column(pl.lit('High').alias('risk'))
# a left join generates null values for all rows whose 'pseudo_index' is not in indexes_df
df = df.join(indexes_df, how='left', on='pseudo_index').select([
    pl.all().exclude(['pseudo_index', 'risk']),
    pl.col('risk').fill_null(pl.lit('Low')),
])
You can use window functions to find where the first "index" of the "DayOfWeek" group equals the "index" column.
For that we only need to set an "index" column. We can do that easily with:
A method: df.with_row_count(<name>)
An expression: pl.arange(0, pl.count()).alias(<name>)
After that we can use this predicate:
pl.first("index").over("DayOfWeek") == pl.col("index")
Finally we use a when -> then -> otherwise expression to use that condition and create our new "Risk" column.
Example
Let's start with some data. In the snippet below I create an hourly date range and then determine the weekdays from that.
Preparing data
import polars as pl
from datetime import datetime

df = pl.DataFrame({
    "Time": pl.date_range(datetime(2022, 6, 1), datetime(2022, 6, 30), "1h")
            .sample(frac=1.5, with_replacement=True)
            .sort(),
}).select([
    pl.arange(0, pl.count()).alias("index"),
    pl.all(),
    pl.col("Time").dt.weekday().alias("DayOfWeek"),
])
print(df)
shape: (1045, 3)
┌───────┬─────────────────────┬───────────┐
│ index ┆ Time ┆ DayOfWeek │
│ --- ┆ --- ┆ --- │
│ i64 ┆ datetime[ns] ┆ u32 │
╞═══════╪═════════════════════╪═══════════╡
│ 0 ┆ 2022-06-29 22:00:00 ┆ 3 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 1 ┆ 2022-06-14 11:00:00 ┆ 2 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ 2022-06-11 21:00:00 ┆ 6 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ 2022-06-27 20:00:00 ┆ 1 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ ... ┆ ... ┆ ... │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 1041 ┆ 2022-06-11 09:00:00 ┆ 6 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 1042 ┆ 2022-06-18 22:00:00 ┆ 6 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 1043 ┆ 2022-06-18 01:00:00 ┆ 6 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 1044 ┆ 2022-06-23 18:00:00 ┆ 4 │
└───────┴─────────────────────┴───────────┘
Computing Risk values
df = df.with_column(
    pl.when(
        pl.first("index").over("DayOfWeek") == pl.col("index")
    ).then(
        "High"
    ).otherwise(
        "Low"
    ).alias("Risk")
).drop("index")
print(df)
shape: (1045, 3)
┌─────────────────────┬───────────┬──────┐
│ Time ┆ DayOfWeek ┆ Risk │
│ --- ┆ --- ┆ --- │
│ datetime[ns] ┆ u32 ┆ str │
╞═════════════════════╪═══════════╪══════╡
│ 2022-06-29 22:00:00 ┆ 3 ┆ High │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2022-06-14 11:00:00 ┆ 2 ┆ High │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2022-06-11 21:00:00 ┆ 6 ┆ High │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2022-06-27 20:00:00 ┆ 1 ┆ High │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ ... ┆ ... ┆ ... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2022-06-11 09:00:00 ┆ 6 ┆ Low │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2022-06-18 22:00:00 ┆ 6 ┆ Low │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2022-06-18 01:00:00 ┆ 6 ┆ Low │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2022-06-23 18:00:00 ┆ 4 ┆ Low │
└─────────────────────┴───────────┴──────┘
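For completeness, a hedged sketch of the same computation using the with_row_count method mentioned earlier instead of pl.arange (same era API assumed):
df = (
    df.with_row_count("index")
      .with_column(
          pl.when(pl.first("index").over("DayOfWeek") == pl.col("index"))
            .then("High")
            .otherwise("Low")
            .alias("Risk")
      )
      .drop("index")
)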