groupby_dynamic with a self-designed index - python-polars

How can I do something like groupby_dynamic, but with a user-defined index?
groupby_dynamic supports a time index, so it can be used for resample-style operations.
However, it seems to support only non-overlapping ranges. For example, given this column:
time
day1 9:00
day1 15:00
day2 9:00
day2 15:00
day3 9:00
day3 15:00
a dynamic groupby with a 1D window gives these groups:
day1 9:00
day1 15:00
--------------
day2 9:00
day2 15:00
-------------
day3 9:00
day3 15:00
The feature I am asking for is a more user-defined dynamic groupby, where the windows may overlap, so the same index values can appear in more than one group:
day1 9:00
day1 15:00
day2 9:00
day2 15:00
-------------
day2 9:00
day2 15:00
day3 9:00
day3 15:00
--------------
I can use rolling on a Series, but rolling_apply wastes a lot of time because it rolls over every index:
day1 9:00
day1 15:00
day2 9:00
day2 15:00
-------------
day1 15:00
day2 9:00
day2 15:00
day3 9:00
-------------- -------> this window is useless
day2 9:00
day2 15:00
day3 9:00
day3 15:00
-------------
day2 15:00
day3 9:00
day3 15:00
day4 9:00
------------ -------> this window is useless

The solution is to give every and period different values.
every sets the spacing between window starts, which determines the index of the output.
period sets the length of each window.
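Example
import datetime
import polars as pl

# 13 timestamps, 12 hours apart, from 2021-12-16 to 2021-12-22.
df = pl.DataFrame(
    {
        "time": pl.date_range(
            low=datetime.datetime(2021, 12, 16),
            high=datetime.datetime(2021, 12, 22),
            interval="12h",
        ),
        "n": [1 for i in range(13)],
    }
)
# A new window starts every day and each window spans two days,
# so consecutive windows overlap by one day.
df.groupby_dynamic(
    "time", period="2d", every="1d", include_boundaries=True, truncate=False, closed="right"
).agg(pl.col("n").sum())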
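To make the roles of every and period concrete, here is a minimal pure-Python sketch of the windowing idea. It is not how polars implements groupby_dynamic, and the exact boundaries polars picks may differ; the function name dynamic_windows and the choice to start at the first timestamp are assumptions made for illustration only.

```python
from datetime import datetime, timedelta


def dynamic_windows(times, values, every, period):
    """Sum values over windows (start, start + period], with starts spaced `every` apart.

    Hypothetical helper: the first window starts at the earliest timestamp.
    """
    start = min(times)
    last = max(times)
    out = []
    while start <= last:
        end = start + period
        # closed="right": exclude the left boundary, include the right one
        total = sum(v for t, v in zip(times, values) if start < t <= end)
        out.append((start, end, total))
        start += every
    return out


# Same data as the example above: 13 timestamps, 12 hours apart.
times = [datetime(2021, 12, 16) + timedelta(hours=12 * i) for i in range(13)]
values = [1] * 13

result = dynamic_windows(
    times, values, every=timedelta(days=1), period=timedelta(days=2)
)
for start, end, total in result:
    print(start, end, total)
```

Because period is twice every, each timestamp away from the edges falls into two windows, which is the overlapping grouping the question asks for.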

Related

How to transpose specific columns in PySpark

I have the following PySpark dataframe:
year week date time value
2020 1 20201203 2:00 - 2:15 23.9
2020 1 20201203 2:15 - 2:30 45.87
2020 1 20201203 2:30 - 2:45 87.76
2020 1 20201203 2:45 - 3:00 12.87
I want to transpose the time and value column. The desired output should be:
year week date 2:00 - 2:15 2:15 - 2:30 2:30 - 2:45 2:45 - 3:00
2020 1 20201203 23.9 45.87 87.76 12.87

You can use groupby and pivot.
df = df.groupby('year', 'week', 'date').pivot('time').max('value')
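As a rough illustration of what the groupby and pivot combination computes, here is a plain-Python sketch that does not use Spark. The function name pivot_max is hypothetical; it only mirrors the idea of grouping by (year, week, date), turning each time label into a column, and keeping the maximum value per cell.

```python
from collections import defaultdict

# The rows from the question: (year, week, date, time, value)
rows = [
    (2020, 1, "20201203", "2:00 - 2:15", 23.9),
    (2020, 1, "20201203", "2:15 - 2:30", 45.87),
    (2020, 1, "20201203", "2:30 - 2:45", 87.76),
    (2020, 1, "20201203", "2:45 - 3:00", 12.87),
]


def pivot_max(rows):
    """Group by (year, week, date); spread `time` into columns, keeping the max value."""
    table = defaultdict(dict)
    for year, week, date, time, value in rows:
        cell = table[(year, week, date)]
        cell[time] = max(value, cell.get(time, value))
    return dict(table)


pivoted = pivot_max(rows)
for key, columns in pivoted.items():
    print(key, columns)
```

Each group key ends up as one output row, with one column per distinct time label.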

Need to add date ranges between two date columns in pyspark?

I have input pyspark dataframe with columns like ID,StartDatetime,EndDatetime. I want to add new column named newdate based on startdatetime and enddatetime.
Input DF :-
ID StartDatetime EndDatetime
1 21-06-2021 07:00 24-06-2021 16:00
2 21-06-2021 07:00 22-06-2021 16:00
required output :-
ID StartDatetime EndDatetime newdate
1 21-06-2021 07:00 24-06-2021 16:00 21-06-2021
1 21-06-2021 07:00 24-06-2021 16:00 22-06-2021
1 21-06-2021 07:00 24-06-2021 16:00 23-06-2021
1 21-06-2021 07:00 24-06-2021 16:00 24-06-2021
2 21-06-2021 07:00 22-06-2021 16:00 21-06-2021
2 21-06-2021 07:00 22-06-2021 16:00 22-06-2021

You can use explode and array_repeat to duplicate the rows.
I use a combination of row_number and date functions to get the date ranges between start and end dates:
from pyspark.sql import functions as F
from pyspark.sql.window import Window

w = Window.partitionBy("ID").orderBy("StartDatetime")
fmt = "dd-MM-yyyy HH:mm"
start_dt = F.to_date(F.unix_timestamp("StartDatetime", fmt).cast("timestamp"))
end_dt = F.to_date(F.unix_timestamp("EndDatetime", fmt).cast("timestamp"))
output_df = (
    df.withColumn("diff", 1 + F.datediff(end_dt, start_dt))  # days in the range, inclusive
    .withColumn("diff", F.expr("explode(array_repeat(diff, int(diff)))"))  # one row per day
    .withColumn("diff", F.row_number().over(w))  # 1, 2, ... within each ID
    .withColumn("start_dt", start_dt)
    .withColumn("newdate", F.date_format(F.expr("date_add(start_dt, diff - 1)"), "dd-MM-yyyy"))
    .drop("diff", "start_dt")
)
Output:
output_df.orderBy("ID", "newdate").show()
+---+----------------+----------------+----------+
| ID| StartDatetime| EndDatetime| newdate|
+---+----------------+----------------+----------+
| 1|21-06-2021 07:00|24-06-2021 16:00|21-06-2021|
| 1|21-06-2021 07:00|24-06-2021 16:00|22-06-2021|
| 1|21-06-2021 07:00|24-06-2021 16:00|23-06-2021|
| 1|21-06-2021 07:00|24-06-2021 16:00|24-06-2021|
| 2|21-06-2021 07:00|22-06-2021 16:00|21-06-2021|
| 2|21-06-2021 07:00|22-06-2021 16:00|22-06-2021|
+---+----------------+----------------+----------+
I dropped the diff column, but displaying it will help you understand the logic if it's not clear.
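The same logic can be sketched in plain Python, which may make the role of diff easier to follow: count the days in the range (inclusive), then emit one row per day offset. This is only an illustration without Spark; the helper name expand_dates is hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%d-%m-%Y %H:%M"  # matches the dd-MM-yyyy HH:mm strings in the question


def expand_dates(row_id, start, end):
    """Return one output row per calendar day from start to end, inclusive."""
    first = datetime.strptime(start, FMT).date()
    last = datetime.strptime(end, FMT).date()
    n_days = (last - first).days + 1  # plays the role of the diff column
    return [
        (row_id, start, end, (first + timedelta(days=i)).strftime("%d-%m-%Y"))
        for i in range(n_days)
    ]


rows = [
    (1, "21-06-2021 07:00", "24-06-2021 16:00"),
    (2, "21-06-2021 07:00", "22-06-2021 16:00"),
]
expanded = [out for row in rows for out in expand_dates(*row)]
for row in expanded:
    print(row)
```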

How to convert List as String from API to List of String

I'm getting the following String from an API:
"[4:00 PM - 5:00 PM, 5:00 PM - 6:00 PM, 6:00 PM - 7:00 PM, 7:00 PM - 8:00 PM, 10:00 AM - 11:00 AM, 11:00 AM - 12:00 PM, 12:00 PM - 1:00 PM, 1:00 PM - 2:00 PM, 2:00 PM - 3:00 PM]"
I want to convert this String to a List.
How to convert it?

Code :
void main() {
  String mylist = "[4:00 PM - 5:00 PM, 5:00 PM - 6:00 PM, 6:00 PM - 7:00 PM, 7:00 PM - 8:00 PM, 10:00 AM - 11:00 AM, 11:00 AM - 12:00 PM, 12:00 PM - 1:00 PM, 1:00 PM - 2:00 PM, 2:00 PM - 3:00 PM]";
  // Strip the surrounding brackets.
  mylist = mylist.replaceAll('[', '');
  mylist = mylist.replaceAll(']', '');
  // Split on comma plus space so the items have no leading space.
  List<String> newList = mylist.split(', ');
  print(newList[0]);
}
Output :
4:00 PM - 5:00 PM

void main() {
  String list = "[4:00 PM - 5:00 PM, 5:00 PM - 6:00 PM,"
      " 6:00 PM - 7:00 PM, 7:00 PM - 8:00 PM,"
      " 10:00 AM - 11:00 AM, 11:00 AM - 12:00 PM,"
      " 12:00 PM - 1:00 PM, 1:00 PM - 2:00 PM, 2:00 PM - 3:00 PM]";
  List<String> out = [];
  String batch = "";
  for (String s in list.split("")) {
    if (s == "[" || s == "]") continue;
    if (s == ",") {
      out.add(batch.trim());
      batch = "";
    } else {
      batch += s;
    }
  }
  // Add the final item, which has no trailing comma.
  if (batch.isNotEmpty) out.add(batch.trim());
  print(out);
}
Output -> [4:00 PM - 5:00 PM, 5:00 PM - 6:00 PM, 6:00 PM - 7:00 PM, 7:00 PM - 8:00 PM, 10:00 AM - 11:00 AM, 11:00 AM - 12:00 PM, 12:00 PM - 1:00 PM, 1:00 PM - 2:00 PM, 2:00 PM - 3:00 PM]
Hope this helps! 😋

This works:
String data =
    "[4:00 PM - 5:00 PM, 5:00 PM - 6:00 PM, 6:00 PM - 7:00 PM, 7:00 PM - 8:00 PM, 10:00 AM - 11:00 AM, 11:00 AM - 12:00 PM, 12:00 PM - 1:00 PM, 1:00 PM - 2:00 PM, 2:00 PM - 3:00 PM]";
List<String> dataList = data.replaceAll('[', '').replaceAll(']', '').split(', ');

You can use the split() method in Dart. It will look like this:
void main() {
  String times = "4:00 PM - 5:00 PM, 5:00 PM - 6:00 PM, 6:00 PM - 7:00 PM, 7:00 PM - 8:00 PM, 10:00 AM - 11:00 AM, 11:00 AM - 12:00 PM, 12:00 PM - 1:00 PM, 1:00 PM - 2:00 PM, 2:00 PM - 3:00 PM";
  // Split the string on each comma-plus-space separator.
  List<String> timesList = times.split(", ");
  print(timesList);
}
// Output:
// [4:00 PM - 5:00 PM, 5:00 PM - 6:00 PM, 6:00 PM - 7:00 PM, 7:00 PM - 8:00 PM, 10:00 AM - 11:00 AM, 11:00 AM - 12:00 PM, 12:00 PM - 1:00 PM, 1:00 PM - 2:00 PM, 2:00 PM - 3:00 PM]

How to add a separator line after specific item in NSTableView (Cocoa/OS X)

I'm building a program that gets weather forecasts from the Met Office using their DataPoint API. This gives me forecast data for five days into the future, with eight forecast points (every three hours) for every day. It displays them in an NSTableView, which currently looks like this:
27/3/2016 at 0:00 6°C ...
27/3/2016 at 3:00 4°C ...
27/3/2016 at 6:00 4°C ...
27/3/2016 at 9:00 7°C ...
27/3/2016 at 12:00 8°C ...
27/3/2016 at 15:00 8°C ...
27/3/2016 at 18:00 8°C ...
27/3/2016 at 21:00 6°C ...
28/3/2016 at 0:00 6°C ...
28/3/2016 at 3:00 8°C ...
28/3/2016 at 6:00 8°C ...
...
However, when viewing five days of this all at once, it is very hard to read it day by day (determine where each day starts and ends without looking at the dates) and this is not user friendly in the slightest. So, I want to add a separator line (or separator cell, depending on how it can be done) after the last forecast step of each day, to separate the string of data out into easily recognisable days. I want it to look like this:
27/3/2016 at 0:00 6°C ...
27/3/2016 at 3:00 4°C ...
27/3/2016 at 6:00 4°C ...
27/3/2016 at 9:00 7°C ...
27/3/2016 at 12:00 8°C ...
27/3/2016 at 15:00 8°C ...
27/3/2016 at 18:00 8°C ...
27/3/2016 at 21:00 6°C ...
-------------------------------
28/3/2016 at 0:00 6°C ...
28/3/2016 at 3:00 8°C ...
28/3/2016 at 6:00 8°C ...
...
What is the best way that I can do this (in a single NSTableView, put separators at select points in the dataset)? Thanks

Get COUNT of hours on 30 mins Interval

I have this below data.
Date Interval
2014-01-01 12:00 AM
2014-01-01 12:30 AM
2014-01-01 1:00 AM
2014-01-01 1:30 AM
2014-01-01 2:00 AM
2014-01-01 2:30 AM
2014-01-01 3:00 AM
2014-01-01 3:30 AM
2014-01-01 4:00 AM
2014-01-01 4:30 AM
I need to extract the hour of the interval column.
I could do it using EXTRACT('hour', Interval) which gives me the hour numbers as int.
Result will be as follows:
Date Interval HourCount
2014-01-01 12:00 AM 0
2014-01-01 12:30 AM 0
2014-01-01 1:00 AM 1
2014-01-01 1:30 AM 1
2014-01-01 2:00 AM 2
2014-01-01 2:30 AM 2
2014-01-01 3:00 AM 3
2014-01-01 3:30 AM 3
2014-01-01 4:00 AM 4
2014-01-01 4:30 AM 4
But what I'm looking for is a count that increases by 1 for every 30-minute interval.
Example of the data I'm looking for:
Date Interval HourCount
2014-01-01 12:00 AM 1
2014-01-01 12:30 AM 2
2014-01-01 1:00 AM 3
2014-01-01 1:30 AM 4
2014-01-01 2:00 AM 5
2014-01-01 2:30 AM 6
2014-01-01 3:00 AM 7
2014-01-01 3:30 AM 8
2014-01-01 4:00 AM 9
2014-01-01 4:30 AM 10
This way, I'll get 48 intervals in a day.
I could use ROW_NUMBER() OVER (PARTITION BY Date ORDER BY Date), but this will give the wrong count if any interval is missing.
Suppose the row below is missing:
2014-01-01 4:00 AM 9
Then I'll get 9 instead of 10 as the HourCount for this row:
2014-01-01 4:30 AM 9
Can someone help me get the count of hours on a 30-minute interval?

Try something like:
with series as
(select interval_num, interval_num * interval '30 minutes' as interval_time
from generate_series(0,47) series(interval_num))
select *
from data_table
right join series on series.interval_time = data_table.interval

As you wrote, you can extract the hour. What you're missing is that you can also extract the minutes and check whether they are greater than or equal to 30. Adding 1 makes the count start at 1 rather than 0, as in your expected output.
SELECT date_part('hour', interval) * 2 + CASE WHEN date_part('minute', interval) >= 30 THEN 1 ELSE 0 END + 1
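The arithmetic behind this approach can be checked with a small Python sketch. It assumes the interval labels look like "12:30 AM", as in the question, and that counting starts at 1; the function name half_hour_slot is hypothetical.

```python
from datetime import datetime


def half_hour_slot(label):
    """Map a time label such as '4:30 AM' to its 1-based half-hour slot (1..48)."""
    t = datetime.strptime(label, "%I:%M %p")
    # Two slots per hour, plus one more if we are in the second half of the hour.
    return t.hour * 2 + (1 if t.minute >= 30 else 0) + 1


for label in ["12:00 AM", "12:30 AM", "4:00 AM", "4:30 AM", "11:30 PM"]:
    print(label, half_hour_slot(label))
```

Because the slot is derived from the time itself rather than from the row position, a missing row does not shift the numbers of the rows after it.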
