I have the following PySpark DataFrame:
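+----+----+--------+-----------+-----+
|year|week|    date|       time|value|
+----+----+--------+-----------+-----+
|2020|   1|20201203|2:00 - 2:15| 23.9|
|2020|   1|20201203|2:15 - 2:30|45.87|
|2020|   1|20201203|2:30 - 2:45|87.76|
|2020|   1|20201203|2:45 - 3:00|12.87|
+----+----+--------+-----------+-----+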
I want to pivot the time and value columns so that each time slot becomes its own column holding the corresponding value. The desired output is:
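+----+----+--------+-----------+-----------+-----------+-----------+
|year|week|    date|2:00 - 2:15|2:15 - 2:30|2:30 - 2:45|2:45 - 3:00|
+----+----+--------+-----------+-----------+-----------+-----------+
|2020|   1|20201203|       23.9|      45.87|      87.76|      12.87|
+----+----+--------+-----------+-----------+-----------+-----------+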
You can use groupby and pivot.
df = df.groupby('year', 'week', 'date').pivot('time').max('value')
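To see what that line does without a cluster, here is a plain-Python sketch of the same group-then-spread logic. It only illustrates the semantics, not how Spark executes a pivot, and `pivot_max` is a made-up name. Rows sharing (year, week, date) collapse into one, each distinct `time` becomes a key, and `.max('value')` decides what happens if a slot appears more than once in a group.

```python
from collections import defaultdict


def pivot_max(rows):
    """Group rows by (year, week, date) and spread `time` into keys, keeping max(value)."""
    grouped = defaultdict(dict)
    for year, week, date, time_slot, value in rows:
        cells = grouped[(year, week, date)]
        # Like .max('value'): if a (group, time) pair repeats, keep the largest value
        cells[time_slot] = max(value, cells.get(time_slot, value))
    return dict(grouped)


rows = [
    (2020, 1, "20201203", "2:00 - 2:15", 23.9),
    (2020, 1, "20201203", "2:15 - 2:30", 45.87),
    (2020, 1, "20201203", "2:30 - 2:45", 87.76),
    (2020, 1, "20201203", "2:45 - 3:00", 12.87),
]
print(pivot_max(rows))
```

In PySpark, `pivot` also accepts an explicit list of values, for example `pivot('time', ['2:00 - 2:15', '2:15 - 2:30', '2:30 - 2:45', '2:45 - 3:00'])`, which fixes the column order and saves Spark the extra pass it otherwise makes to collect the distinct values.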
I have an input PySpark DataFrame with the columns ID, StartDatetime and EndDatetime. I want to add a new column, newdate, so that each input row is repeated once for every calendar day from StartDatetime to EndDatetime (inclusive).
Input DataFrame:
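+---+----------------+----------------+
| ID|   StartDatetime|     EndDatetime|
+---+----------------+----------------+
|  1|21-06-2021 07:00|24-06-2021 16:00|
|  2|21-06-2021 07:00|22-06-2021 16:00|
+---+----------------+----------------+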
Required output:
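+---+----------------+----------------+----------+
| ID|   StartDatetime|     EndDatetime|   newdate|
+---+----------------+----------------+----------+
|  1|21-06-2021 07:00|24-06-2021 16:00|21-06-2021|
|  1|21-06-2021 07:00|24-06-2021 16:00|22-06-2021|
|  1|21-06-2021 07:00|24-06-2021 16:00|23-06-2021|
|  1|21-06-2021 07:00|24-06-2021 16:00|24-06-2021|
|  2|21-06-2021 07:00|22-06-2021 16:00|21-06-2021|
|  2|21-06-2021 07:00|22-06-2021 16:00|22-06-2021|
+---+----------------+----------------+----------+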
You can use explode and array_repeat to duplicate the rows.
I use a combination of row_number and date functions to get the date ranges between start and end dates:
from pyspark.sql import functions as F
from pyspark.sql.window import Window

w = Window.partitionBy("ID").orderBy("StartDatetime")

# Parse the string columns into dates (the time of day is discarded)
start_date = F.to_date(F.unix_timestamp("StartDatetime", "dd-MM-yyyy HH:mm").cast("timestamp"))
end_date = F.to_date(F.unix_timestamp("EndDatetime", "dd-MM-yyyy HH:mm").cast("timestamp"))

output_df = (
    df
    # Number of calendar days in the range, counting both ends
    .withColumn("diff", 1 + F.datediff(end_date, start_date))
    # Duplicate each row `diff` times
    .withColumn("diff", F.expr("explode(array_repeat(diff, int(diff)))"))
    # Number the duplicates 1..diff within each ID
    .withColumn("diff", F.row_number().over(w))
    .withColumn("start_dt", start_date)
    # Shift the start date by 0..diff-1 days and format it like the input
    .withColumn("newdate", F.date_format(F.expr("date_add(start_dt, diff - 1)"), "dd-MM-yyyy"))
    .drop("diff", "start_dt")
)
Output:
output_df.orderBy("ID", "newdate").show()
+---+----------------+----------------+----------+
| ID|   StartDatetime|     EndDatetime|   newdate|
+---+----------------+----------------+----------+
|  1|21-06-2021 07:00|24-06-2021 16:00|21-06-2021|
|  1|21-06-2021 07:00|24-06-2021 16:00|22-06-2021|
|  1|21-06-2021 07:00|24-06-2021 16:00|23-06-2021|
|  1|21-06-2021 07:00|24-06-2021 16:00|24-06-2021|
|  2|21-06-2021 07:00|22-06-2021 16:00|21-06-2021|
|  2|21-06-2021 07:00|22-06-2021 16:00|22-06-2021|
+---+----------------+----------------+----------+
I dropped the diff column, but displaying it will help you understand the logic if it's not clear.
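The same arithmetic (count the days inclusively, repeat the row that many times, offset the start date by 0, 1, 2, ...) can be sketched in plain Python with `datetime`, which may make the role of `diff` easier to see. This is an illustration of the logic only, not a Spark replacement; `expand_days` is a hypothetical name.

```python
from datetime import datetime, timedelta

FMT = "%d-%m-%Y %H:%M"


def expand_days(rows):
    """Return one (ID, StartDatetime, EndDatetime, newdate) tuple per calendar day in the range."""
    out = []
    for row_id, start_s, end_s in rows:
        start = datetime.strptime(start_s, FMT).date()
        end = datetime.strptime(end_s, FMT).date()
        # 1 + datediff(end, start): number of days, counting both ends
        n_days = (end - start).days + 1
        # offset plays the role of row_number() - 1 in the Spark version
        for offset in range(n_days):
            newdate = (start + timedelta(days=offset)).strftime("%d-%m-%Y")
            out.append((row_id, start_s, end_s, newdate))
    return out


rows = [
    (1, "21-06-2021 07:00", "24-06-2021 16:00"),
    (2, "21-06-2021 07:00", "22-06-2021 16:00"),
]
for row in expand_days(rows):
    print(row)
```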
I'm getting the following String from an API:
"[4:00 PM - 5:00 PM, 5:00 PM - 6:00 PM, 6:00 PM - 7:00 PM, 7:00 PM - 8:00 PM, 10:00 AM - 11:00 AM, 11:00 AM - 12:00 PM, 12:00 PM - 1:00 PM, 1:00 PM - 2:00 PM, 2:00 PM - 3:00 PM]"
I want to convert this string to a List<String>.
How can I do that?
Code:
void main() {
  String mylist = "[4:00 PM - 5:00 PM, 5:00 PM - 6:00 PM, 6:00 PM - 7:00 PM, 7:00 PM - 8:00 PM, 10:00 AM - 11:00 AM, 11:00 AM - 12:00 PM, 12:00 PM - 1:00 PM, 1:00 PM - 2:00 PM, 2:00 PM - 3:00 PM]";
  // Remove the surrounding brackets (a String pattern is matched literally)
  mylist = mylist.replaceAll('[', '').replaceAll(']', '');
  // Split on commas, then trim the space that follows each comma
  List<String> newList = mylist.split(',').map((s) => s.trim()).toList();
  print(newList[0]);
}
Output :
4:00 PM - 5:00 PM
void main() {
  String list = "[4:00 PM - 5:00 PM, 5:00 PM - 6:00 PM,"
      " 6:00 PM - 7:00 PM, 7:00 PM - 8:00 PM,"
      " 10:00 AM - 11:00 AM, 11:00 AM - 12:00 PM,"
      " 12:00 PM - 1:00 PM, 1:00 PM - 2:00 PM, 2:00 PM - 3:00 PM]";
  List<String> out = [];
  String batch = "";
  // Walk the string one character at a time
  for (String s in list.split("")) {
    if (s == "[" || s == "]") continue; // skip the brackets
    if (s == ",") {
      // A comma ends the current item
      out.add(batch.trim());
      batch = "";
    } else {
      batch += s;
    }
  }
  // The last item has no trailing comma, so add it after the loop
  if (batch.trim().isNotEmpty) out.add(batch.trim());
  print(out);
}
Output -> [4:00 PM - 5:00 PM, 5:00 PM - 6:00 PM, 6:00 PM - 7:00 PM, 7:00 PM - 8:00 PM, 10:00 AM - 11:00 AM, 11:00 AM - 12:00 PM, 12:00 PM - 1:00 PM, 1:00 PM - 2:00 PM, 2:00 PM - 3:00 PM]
Hope this helps! 😋
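The character-by-character approach only emits an item when it reaches a comma, so whatever follows the last comma has to be added once the loop ends, and each item should be trimmed because every comma is followed by a space. Here is a plain-Python sketch of the same scan (Python rather than Dart, purely to illustrate the logic; `split_by_scanning` is a made-up name):

```python
def split_by_scanning(raw):
    """Split a string like "[a, b, c]" into ["a", "b", "c"] one character at a time."""
    out = []
    batch = ""
    for ch in raw:
        if ch in "[]":
            continue  # skip the surrounding brackets
        if ch == ",":
            out.append(batch.strip())  # a comma closes the current item
            batch = ""
        else:
            batch += ch
    # The last item is not followed by a comma, so flush it after the loop
    if batch.strip():
        out.append(batch.strip())
    return out


raw = ("[4:00 PM - 5:00 PM, 5:00 PM - 6:00 PM, 6:00 PM - 7:00 PM, 7:00 PM - 8:00 PM, "
       "10:00 AM - 11:00 AM, 11:00 AM - 12:00 PM, 12:00 PM - 1:00 PM, "
       "1:00 PM - 2:00 PM, 2:00 PM - 3:00 PM]")
print(split_by_scanning(raw))
```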
This works:
String data =
    "[4:00 PM - 5:00 PM, 5:00 PM - 6:00 PM, 6:00 PM - 7:00 PM, 7:00 PM - 8:00 PM, 10:00 AM - 11:00 AM, 11:00 AM - 12:00 PM, 12:00 PM - 1:00 PM, 1:00 PM - 2:00 PM, 2:00 PM - 3:00 PM]";
List<String> dataList = data.replaceAll('[', '').replaceAll(']', '')
    .split(',').map((s) => s.trim()).toList(); // trim the space after each comma
You can use Dart's split() method. It looks like this:
void main() {
  // The brackets have already been removed from this string
  String times = "4:00 PM - 5:00 PM, 5:00 PM - 6:00 PM, 6:00 PM - 7:00 PM, 7:00 PM - 8:00 PM, 10:00 AM - 11:00 AM, 11:00 AM - 12:00 PM, 12:00 PM - 1:00 PM, 1:00 PM - 2:00 PM, 2:00 PM - 3:00 PM";
  // Split the string on commas and trim the space after each one
  List<String> timesList = times.split(",").map((s) => s.trim()).toList();
  print(timesList);
}
// Output:
//[4:00 PM - 5:00 PM, 5:00 PM - 6:00 PM, 6:00 PM - 7:00 PM, 7:00 PM - 8:00 PM, 10:00 AM - 11:00 AM, 11:00 AM - 12:00 PM, 12:00 PM - 1:00 PM, 1:00 PM - 2:00 PM, 2:00 PM - 3:00 PM]
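All of the split-based answers come down to three steps: remove the brackets, split on commas, and trim each item, since splitting on "," leaves a space in front of every item but the first. A plain-Python sketch of those steps, for comparison only (`parse_slot_list` is a hypothetical name; it also returns an empty list for "[]" instead of a list holding one empty string):

```python
from typing import List


def parse_slot_list(raw: str) -> List[str]:
    """Turn a string like "[a, b, c]" into ["a", "b", "c"]."""
    inner = raw.strip().strip("[]")  # drop surrounding whitespace, then the brackets
    if not inner.strip():
        return []  # "[]" -> [] rather than [""]
    # Splitting on "," leaves a leading space on every item but the first
    return [part.strip() for part in inner.split(",")]


raw = ("[4:00 PM - 5:00 PM, 5:00 PM - 6:00 PM, 6:00 PM - 7:00 PM, 7:00 PM - 8:00 PM, "
       "10:00 AM - 11:00 AM, 11:00 AM - 12:00 PM, 12:00 PM - 1:00 PM, "
       "1:00 PM - 2:00 PM, 2:00 PM - 3:00 PM]")
print(parse_slot_list(raw))
```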
I'm building a program that gets weather forecasts from the Met Office using their DataPoint API. This gives me forecast data for five days into the future, with eight forecast points (every three hours) for every day. It displays them in an NSTableView, which currently looks like this:
27/3/2016 at 0:00 6°C ...
27/3/2016 at 3:00 4°C ...
27/3/2016 at 6:00 4°C ...
27/3/2016 at 9:00 7°C ...
27/3/2016 at 12:00 8°C ...
27/3/2016 at 15:00 8°C ...
27/3/2016 at 18:00 8°C ...
27/3/2016 at 21:00 6°C ...
28/3/2016 at 0:00 6°C ...
28/3/2016 at 3:00 8°C ...
28/3/2016 at 6:00 8°C ...
...
However, with five days of data on screen at once, it is very hard to read day by day: you cannot tell where each day starts and ends without looking at the dates, which is not user friendly at all. So I want to add a separator line (or a separator cell, depending on how it can be done) after the last forecast step of each day, to split the data into easily recognisable days. I want it to look like this:
27/3/2016 at 0:00 6°C ...
27/3/2016 at 3:00 4°C ...
27/3/2016 at 6:00 4°C ...
27/3/2016 at 9:00 7°C ...
27/3/2016 at 12:00 8°C ...
27/3/2016 at 15:00 8°C ...
27/3/2016 at 18:00 8°C ...
27/3/2016 at 21:00 6°C ...
-------------------------------
28/3/2016 at 0:00 6°C ...
28/3/2016 at 3:00 8°C ...
28/3/2016 at 6:00 8°C ...
...
What is the best way to do this, i.e. to put separators at selected points in the data within a single NSTableView? Thanks
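One common pattern, offered here as an assumption rather than a confirmed answer, is to store separator entries in the table's data source alongside the forecasts, so the table gets one extra row per day boundary, and have the table's delegate draw those rows differently (NSTableViewDelegate has a `tableView(_:isGroupRow:)` hook for rows that should look distinct). The data-model half of that idea, deciding where the separators go, can be sketched in plain Python. `SEPARATOR` and `with_day_separators` are hypothetical names, and the sample rows come from the listing in the question.

```python
from typing import List, Tuple

# Hypothetical marker row; a real table view would render this entry as a separator
SEPARATOR = "-" * 31


def with_day_separators(forecasts: List[Tuple[str, str, int]]) -> List[str]:
    """Insert SEPARATOR after the last forecast step of each day (none after the final day)."""
    rows: List[str] = []
    for i, (day, time_of_day, temp_c) in enumerate(forecasts):
        # A change of date means the previous row was the last step of its day
        if i > 0 and day != forecasts[i - 1][0]:
            rows.append(SEPARATOR)
        rows.append(f"{day} at {time_of_day} {temp_c}°C")
    return rows


forecasts = [
    ("27/3/2016", "0:00", 6), ("27/3/2016", "3:00", 4), ("27/3/2016", "6:00", 4),
    ("27/3/2016", "9:00", 7), ("27/3/2016", "12:00", 8), ("27/3/2016", "15:00", 8),
    ("27/3/2016", "18:00", 8), ("27/3/2016", "21:00", 6),
    ("28/3/2016", "0:00", 6), ("28/3/2016", "3:00", 8), ("28/3/2016", "6:00", 8),
]
for row in with_day_separators(forecasts):
    print(row)
```

Because the separator rows shift the row indices, any code that maps a table row back to a forecast has to account for them; keeping forecasts and separators in one array, as above, avoids a separate index translation.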
I have the following data:
Date        Interval
2014-01-01  12:00 AM
2014-01-01  12:30 AM
2014-01-01  1:00 AM
2014-01-01  1:30 AM
2014-01-01  2:00 AM
2014-01-01  2:30 AM
2014-01-01  3:00 AM
2014-01-01  3:30 AM
2014-01-01  4:00 AM
2014-01-01  4:30 AM
I need to extract the hour from the Interval column.
I can do that with EXTRACT(HOUR FROM Interval), which returns the hour as an integer.
The result is:
Date        Interval  HourCount
2014-01-01  12:00 AM  0
2014-01-01  12:30 AM  0
2014-01-01  1:00 AM   1
2014-01-01  1:30 AM   1
2014-01-01  2:00 AM   2
2014-01-01  2:30 AM   2
2014-01-01  3:00 AM   3
2014-01-01  3:30 AM   3
2014-01-01  4:00 AM   4
2014-01-01  4:30 AM   4
But that is not what I'm looking for. I need each 30-minute interval to count as 1, so the number increases by one every half hour.
Example of what I'm looking for:
Date        Interval  HourCount
2014-01-01  12:00 AM  1
2014-01-01  12:30 AM  2
2014-01-01  1:00 AM   3
2014-01-01  1:30 AM   4
2014-01-01  2:00 AM   5
2014-01-01  2:30 AM   6
2014-01-01  3:00 AM   7
2014-01-01  3:30 AM   8
2014-01-01  4:00 AM   9
2014-01-01  4:30 AM   10
This way, each day has 48 intervals.
I could use ROW_NUMBER() OVER (PARTITION BY Date ORDER BY Interval), but that gives the wrong count if any interval is missing.
Suppose the following row is missing:
2014-01-01  4:00 AM   9
Then I get 9 as the HourCount for the next row instead of 10:
2014-01-01  4:30 AM   9
Can someone help me number the rows by 30-minute interval so that a missing row doesn't shift the count?
Try something like:
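with series as
(select interval_num + 1 as hour_count,  -- 1-based, to match the desired HourCount
        interval_num * interval '30 minutes' as interval_time
 from generate_series(0, 47) series(interval_num))
select *
from data_table
right join series on series.interval_time = data_table.interval
-- if data_table.interval is a time column rather than an interval,
-- compare it against time '00:00' + series.interval_time instead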
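The series approach amounts to a lookup table of all 48 half-hour slots, each with a fixed number, so a missing row cannot shift the numbering of the others. A plain-Python sketch of that idea, assuming the Interval values look like "4:30 AM" (the question doesn't say how the column is stored; `SERIES` and `hour_count_by_lookup` are illustrative names):

```python
from datetime import datetime, timedelta

# Plays the role of the `series` CTE: each half-hour start time -> interval number 1..48
SERIES = {
    (datetime.min + timedelta(minutes=30 * n)).time(): n + 1
    for n in range(48)
}


def hour_count_by_lookup(interval_text):
    """Return the interval number for a time such as "4:30 AM", or None if it is off the grid."""
    t = datetime.strptime(interval_text, "%I:%M %p").time()
    return SERIES.get(t)


for text in ["12:00 AM", "12:30 AM", "4:30 AM", "11:30 PM"]:
    print(text, hour_count_by_lookup(text))
```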
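As you wrote, you can extract the hour. What you're missing is that you can also extract the minutes and check whether they are greater than or equal to 30. Adding 1 at the end makes the first interval (12:00 AM) count as 1, as in your expected output:
SELECT date_part('hour', interval) * 2
       + CASE WHEN date_part('minute', interval) >= 30 THEN 1 ELSE 0 END
       + 1 AS HourCount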
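The formula depends only on each row's own time, which is why it survives a missing row while a row counter does not. A plain-Python sketch comparing the two on the question's data with the 4:00 AM row removed, assuming the times are strings like "4:30 AM" (`hour_count` is an illustrative name):

```python
from datetime import datetime


def hour_count(interval_text):
    """hour * 2, plus 1 for the second half of the hour, plus 1 so counting starts at 1."""
    t = datetime.strptime(interval_text, "%I:%M %p")
    return t.hour * 2 + (1 if t.minute >= 30 else 0) + 1


# The question's intervals with the 4:00 AM row missing
intervals = ["12:00 AM", "12:30 AM", "1:00 AM", "1:30 AM", "2:00 AM",
             "2:30 AM", "3:00 AM", "3:30 AM", "4:30 AM"]
computed = [hour_count(s) for s in intervals]
row_numbers = list(range(1, len(intervals) + 1))  # what ROW_NUMBER() would assign
print(computed[-1], row_numbers[-1])  # 10 9
```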