How to filter data with starting and ending conditions? - date

I'm trying to filter my data based on two conditions dependent on sequential dates.
I am looking for values of 2 or less on 5+ sequential dates,
with a "cushion period" of values from 2 to 5 for up to 3 sequential days.
It would look something like this (sorry for the terrible excel attempt here):
Day 1 to Day 10 would be included and day 11 would not be. Days 6 to 8 would be considered the "cushion period." I hope this makes sense!!
Right now, I am able to get only the cushion period (in the reprex), but I can't figure out how to add the starting and ending condition: values of 2 or less on 5 sequential dates must be included (the 5 days could be broken up with the cushion period in between, but I feel like this might complicate things).
Any help would be GREATLY appreciated!
For my reprex (below), the dates that would be included in the final df are in blue (dates from 1/1/2000 to 1/9/2000, and 1/22/2000 to 1/30/2000) and the dates in grey would not be.
Reprex:
library("dplyr")
#Goal: include all values with values of 2 or less for 5 consecutive days and allow for a "cushion" period of values of 2 to 5 for up to 3 days
data <- data.frame(Date = c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06", "2000-01-07", "2000-01-08", "2000-01-09", "2000-01-10", "2000-01-11", "2000-01-12", "2000-01-13", "2000-01-14", "2000-01-15", "2000-01-16", "2000-01-17", "2000-01-18", "2000-01-19", "2000-01-20", "2000-01-21", "2000-01-22", "2000-01-23", "2000-01-24", "2000-01-25", "2000-01-26", "2000-01-27", "2000-01-28", "2000-01-29", "2000-01-30"),
Value = c(2,3,4,5,2,2,1,0,1,8,7,9,4,5,2,3,4,5,7,2,6,0,2,1,2,0,3,4,0,1))
head(data)
#Goal: values should include dates from 1/1/2000 to 1/9/2000, and 1/22/2000 to 1/30/2000
#I am able to subset the "cushion period" but I'm not sure how to add the starting and ending conditions for it
attempt1 <- data %>%
group_by(group_id = as.integer(gl(n(),3,n()))) %>%
filter(Value <= 5 & Value >=3) %>%
ungroup() %>%
select(-group_id)
head(attempt1)

If I get it correctly, you need to keep groups of consecutive values that are below or equal to 5 with at least 5 consecutive values below or equal to 2 within it. Here's a way to do that, with some explanation:
library(dplyr)
result <- data %>%
  mutate(under_three = Value <= 2) %>%
  # under_three = TRUE if Value is below or equal to 2
  group_by(rl_two = data.table::rleid(Value <= 2)) %>%
  # Group by runs of consecutive rows that share the same under_three status
  mutate(big = n() >= 5 & all(under_three)) %>%
  # big = TRUE if the run has 5 or more consecutive values that are below or equal to 2
  group_by(rl_five = data.table::rleid(Value <= 5)) %>%
  # Replace the rl_two grouping with rl_five, i.e. runs of consecutive values that are below or equal to 5
  filter(any(big))
  # Keep the rl_five groups that contain at least one big = TRUE row; remove the other groups.
Output:
result %>%
  ungroup() %>%
  select(Date, Value)
Date Value
1 2000-01-01 2
2 2000-01-02 3
3 2000-01-03 4
4 2000-01-04 5
5 2000-01-05 2
6 2000-01-06 2
7 2000-01-07 1
8 2000-01-08 0
9 2000-01-09 1
10 2000-01-22 0
11 2000-01-23 2
12 2000-01-24 1
13 2000-01-25 2
14 2000-01-26 0
15 2000-01-27 3
16 2000-01-28 4
17 2000-01-29 0
18 2000-01-30 1
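For anyone who wants to check the run-length logic outside R, here is a minimal Python sketch of the same idea. The function name `keep_runs` and its parameters are made up for illustration and are not part of the answer above. Like the R code, it keeps every run of values at or below 5 that contains at least 5 consecutive values at or below 2, and it does not enforce the 3-day cap on the cushion period.

```python
from itertools import groupby

# Values from the reprex, one per day from 2000-01-01 to 2000-01-30.
values = [2, 3, 4, 5, 2, 2, 1, 0, 1, 8, 7, 9, 4, 5, 2,
          3, 4, 5, 7, 2, 6, 0, 2, 1, 2, 0, 3, 4, 0, 1]


def keep_runs(values, low=2, high=5, min_len=5):
    """Return 0-based positions of the values to keep (hypothetical helper)."""
    kept = []
    position = 0
    # Outer runs: consecutive values that are all <= high (or all > high).
    for within_high, run in groupby(values, key=lambda v: v <= high):
        run = list(run)
        if within_high:
            # Inner runs: longest stretch of consecutive values <= low.
            longest_low = max(
                (len(list(g))
                 for is_low, g in groupby(run, key=lambda v: v <= low)
                 if is_low),
                default=0,
            )
            if longest_low >= min_len:
                kept.extend(range(position, position + len(run)))
        position += len(run)
    return kept


kept_days = [i + 1 for i in keep_runs(values)]  # day of January 2000
print(kept_days)
```

With the reprex values this should keep days 1 to 9 and 22 to 30, matching the output above.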

Related

Condition to compare to previous period rate?

I'm having trouble coming up with a condition to compare the rate variation between different time periods.
VAR flag =
CALCULATE (
MAX('rates'[€/KG]),
'rates', EARLIER ( 'rates'[START_ETD] ) > 'rates'[START_ETD], 'rates'[LP_DESC] = EARLIER ('rates'[LP_DESC])
)
RETURN
'rates'[€/KG]
- IF ( flag = BLANK (), 'rates'[€/KG], flag )
The idea would be not to get the MAX but the previous rate number before the time period/date we are considering! Could someone help me?
Table
LP_DESC  €/kg  START_ETD   END_ETD     VARIATION
A        2,5   1/07/2022   14/07/2022  1,5
A        3     15/07/2022  31/07/2022  0,5
B        1,5   1/07/2022   14/07/2022  -3
B        3,5   15/07/2022  31/07/2022  2
A        3,5   1/06/2022   14/06/2022  -
A        1     15/06/2022  31/06/2022  0,5
B        2,5   1/06/2022   14/06/2022  -
B        4,5   15/06/2022  31/06/2022  2

Table sort by month

I have a table in MATLAB with attributes in the first three columns and data from the fourth column onwards. I was trying to sort the entire table based on the first three columns. However, one of the columns (Column C) contains months ('January', 'February' ...etc). The sortrows function would only let me choose 'ascend' or 'descend' but not a custom option to sort by month. Any help would be greatly appreciated. Below is the code I used.
sortrows(Table, {'Column A','Column B','Column C'} , {'ascend' , 'ascend' , '???' } )
As @AnonSubmitter85 suggested, the best thing you can do is to convert your month names to numeric values from 1 (January) to 12 (December) as follows:
c = {
7 1 'February';
1 0 'April';
2 1 'December';
2 1 'January';
5 1 'January';
};
t = cell2table(c,'VariableNames',{'ColumnA' 'ColumnB' 'ColumnC'});
t.ColumnC = month(datenum(t.ColumnC,'mmmm'));
This lets you apply a standard sort order to ColumnC as well (ascending, in this example):
t = sortrows(t,{'ColumnA' 'ColumnB' 'ColumnC'},{'ascend', 'ascend', 'ascend'});
If, for whatever reason, you have to keep your months as text, you can use a workaround: sort a clone of the table using the approach described above, then apply the resulting row indices to the original table:
c = {
7 1 'February';
1 0 'April';
2 1 'December';
2 1 'January';
5 1 'January';
};
t_original = cell2table(c,'VariableNames',{'ColumnA' 'ColumnB' 'ColumnC'});
t_clone = t_original;
t_clone.ColumnC = month(datenum(t_clone.ColumnC,'mmmm'));
[~,idx] = sortrows(t_clone,{'ColumnA' 'ColumnB' 'ColumnC'},{'ascend', 'ascend', 'ascend'});
t_original = t_original(idx,:);
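The same idea (map each month name to its number, then sort on the number) can be sketched outside MATLAB. This is a minimal Python illustration using the sample rows from above; the names `MONTHS`, `MONTH_NUMBER` and `sorted_rows` are made up for the example, and it assumes English month names spelled exactly as in the list.

```python
# English month names in calendar order; position + 1 gives the month number.
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
MONTH_NUMBER = {name: number for number, name in enumerate(MONTHS, start=1)}

# Same sample rows as the MATLAB cell array: (ColumnA, ColumnB, ColumnC).
rows = [
    (7, 1, "February"),
    (1, 0, "April"),
    (2, 1, "December"),
    (2, 1, "January"),
    (5, 1, "January"),
]

# Sort ascending on ColumnA, then ColumnB, then the month number of ColumnC.
# The month names themselves are left untouched in the result.
sorted_rows = sorted(rows, key=lambda r: (r[0], r[1], MONTH_NUMBER[r[2]]))

for row in sorted_rows:
    print(row)
```

Because the month number is only used as a sort key, this corresponds to the second MATLAB variant, where the original text values are preserved.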

How to use GroupByKey in Spark to calculate nonlinear-groupBy task

I have a table that looks like this:
Time ID Value1 Value2
1 a 1 4
2 a 2 3
3 a 5 9
1 b 6 2
2 b 4 2
3 b 9 1
4 b 2 5
1 c 4 7
2 c 2 0
Here are the tasks and requirements:
I want to set the column ID as the key, not the column Time, but I don't want to delete the column Time. Is there a way in Spark to set a primary key?
The aggregation function is non-linear, which means you cannot use "reduceByKey". All the data must be shuffled to a single node before the calculation. For example, the aggregation function may look like the Nth root of the summed values, where N is the number of records (count) for each ID:
output = root(sum(value1), count(*)) + root(sum(value2), count(*))
To make it clear, for ID="a", the aggregated output value should be
output = root(1 + 2 + 5, 3) + root(4 + 3 + 9, 3)
The second argument is 3 because we have 3 records for "a". For ID="b", it is:
output = root(6 + 4 + 9 + 2, 4) + root(2 + 2 + 1 + 5, 4)
The combination is non-linear. Therefore, in order to get correct results, all the data with the same "ID" must be in one executor.
I looked at UDFs and Aggregators in Spark 2.0. As far as I understand, they all assume a "linear combination".
Is there a way to handle such a nonlinear combination, ideally one that takes advantage of parallel computing with Spark?
The function you use doesn't require any special treatment. You can use plain SQL with a join:
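import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{count, lit, sum, pow}
// The $"..." column syntax assumes spark.implicits._ is in scope (it is by default in spark-shell)
// root(l, r) is the r-th root of l
def root(l: Column, r: Column) = pow(l, lit(1) / r)
val out = root(sum($"value1"), count("*")) + root(sum($"value2"), count("*"))
// Aggregate per id, then join back so the original rows (including Time) are kept
df.groupBy("id").agg(out.alias("outcome")).join(df, Seq("id"))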
or window functions:
import org.apache.spark.sql.expressions.Window
val w = Window.partitionBy("id")
val outw = root(sum($"value1").over(w), count("*").over(w)) +
root(sum($"value2").over(w), count("*").over(w))
df.withColumn("outcome", outw)
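To check the numbers for the sample table, here is a small plain-Python sketch of the same computation. It is not Spark code, and the names `outcome_by_id` and `result` are made up for illustration. It shows why no special treatment is needed: the per-ID sums and the count can each be accumulated incrementally, and the non-linear root is only applied once at the end.

```python
from collections import defaultdict

# Sample table from the question: (Time, ID, Value1, Value2).
rows = [
    (1, "a", 1, 4), (2, "a", 2, 3), (3, "a", 5, 9),
    (1, "b", 6, 2), (2, "b", 4, 2), (3, "b", 9, 1), (4, "b", 2, 5),
    (1, "c", 4, 7), (2, "c", 2, 0),
]


def root(total, n):
    """n-th root of total, mirroring pow(l, 1 / r) in the answer."""
    return total ** (1.0 / n)


def outcome_by_id(rows):
    # Accumulate [sum(Value1), sum(Value2), count] per ID.
    partial = defaultdict(lambda: [0, 0, 0])
    for _time, key, value1, value2 in rows:
        acc = partial[key]
        acc[0] += value1
        acc[1] += value2
        acc[2] += 1
    # Apply the non-linear part only after the partial aggregates are complete.
    return {key: root(s1, n) + root(s2, n)
            for key, (s1, s2, n) in partial.items()}


result = outcome_by_id(rows)
for key, value in sorted(result.items()):
    print(key, round(value, 4))
```

For ID "a" this evaluates root(8, 3) + root(16, 3), as in the question.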

Find how many sundays in a month asp classic

I am trying to use asp classic to find how many working days (mon - sat) are in the month and how many are left.
any help or pointers greatly appreciated!
Here's how you can find the number of Sundays in a month without iteration. Somebody posted a JavaScript solution a few months back and I ported it to VBScript:
Function GetSundaysInMonth(intMonth, intYear)
dtmStart = DateSerial(intYear, intMonth, 1)
intDays = Day(DateAdd("m", 1, dtmStart) - 1)
GetSundaysInMonth = Int((intDays + (Weekday(dtmStart) + 5) Mod 7) / 7)
End Function
So, your total work days would just be the number of days in the month minus the number of Sundays.
Edit:
As @Lankymart pointed out in the comments, the above function gives you the number of Sundays in the month but it doesn't tell you how many are left.
Here's another version that does just that. Pass in any date and it will tell you how many Sundays are left in the month starting with that date. If you want to know how many Sundays are in a full month, just pass in the first day of the month (e.g., DateSerial(2014, 8, 1)).
Function GetSundaysRemainingInMonth(dtmStart)
intDays = Day(DateSerial(Year(dtmStart), Month(dtmStart) + 1, 1) - 1)
intDays = intDays - Day(dtmStart) + 1
GetSundaysRemainingInMonth = Int((intDays + (Weekday(dtmStart) + 5) Mod 7) / 7)
End Function
Edit 2:
@Cheran Shunmugavel was interested in some specifics about how this works. First, I just want to restate that I didn't develop this method originally. I just ported it to VBScript and tailored it to the OP's requirement (Sundays).
Imagine a February during a leap year. We have 29 days during the month. We know from the start that we have four full weeks, so each weekday will be represented at least four times. But that still leaves one additional day that's unaccounted for (29 Mod 7 = 1). How do we know if we get an extra Sunday from that one day? Well, in this case, it's pretty simple. Only if our start date is a Sunday can we count an extra Sunday for the month.
What if the month has 30 days? Then we have two extra days to account for. In that case, the start date can be a Saturday or a Sunday and we can count an extra Sunday for the month. And so it goes: if the number of additional days is enough to reach the next Sunday from the start date, we can count an extra Sunday.
Let's put this in tabular form:
              Addl Days Needed
Day           To Count Sunday
----------    ----------------
Sunday        1
Saturday      2
Friday        3
Thursday      4
Wednesday     5
Tuesday       6
Monday        7
So what we need is a formula that we can apply to these situations so that they all result in the same value. We'll need to assign some value to each day and combine that value with the number of additional days needed for Sunday to count. It seems reasonable that if we assign an inverse value to the weekdays and add that to the number of additional days, we can get the same result.
              Addl Days Needed    Value Assigned
Day           To Count Sunday     To Weekday        Sum
----------    ----------------    --------------    ---
Sunday        1                   6                 7
Saturday      2                   5                 7
Friday        3                   4                 7
Thursday      4                   3                 7
Wednesday     5                   2                 7
Tuesday       6                   1                 7
Monday        7                   0                 7
So, if weekday_value + addl_days = 7 then we count an extra Sunday. (We'll divide this by 7 later to give us 1 additional Sunday). But how do we assign the values we want to the weekdays? Well, VBScript's Weekday() function already does this but, unfortunately, it doesn't use the values we need by default (it uses 1 for Sunday through 7 for Saturday). We could change the way Weekday() works by using the second param, or we could just use a Mod(). This is where the + 5 Mod 7 comes in. If we take the Weekday() value and add 5, then mod that by 7, we get the values we need.
Day           Weekday()    +5    Mod 7
----------    ---------    --    -----
Sunday        1            6     6
Saturday      7            12    5
Friday        6            11    4
Thursday      5            10    3
Wednesday     4            9     2
Tuesday       3            8     1
Monday        2            7     0
That's how the + 5 Mod 7 was determined. And, with that solved, the rest is easy(er)!
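The reasoning above can be checked against a brute-force count. Below is a minimal Python sketch of the same formula; the function names are made up for illustration and this is a port for verification, not the original author's code. Python's `date.weekday()` numbers Monday as 0 through Sunday as 6, so the sketch first converts it to VBScript's default `Weekday()` numbering (Sunday = 1 through Saturday = 7) to keep the `+ 5 Mod 7` step visible.

```python
import calendar
from datetime import date, timedelta


def sundays_remaining(start):
    """Sundays from `start` to the end of its month, using the formula above."""
    days_in_month = calendar.monthrange(start.year, start.month)[1]
    days_left = days_in_month - start.day + 1
    vb_weekday = (start.weekday() + 1) % 7 + 1   # Sunday=1 ... Saturday=7
    weekday_value = (vb_weekday + 5) % 7         # Monday=0 ... Sunday=6
    return (days_left + weekday_value) // 7


def sundays_in_month(month, year):
    """Sundays in a full month: start counting from the 1st."""
    return sundays_remaining(date(year, month, 1))


def sundays_remaining_brute_force(start):
    """Reference implementation: walk every remaining day and count Sundays."""
    count = 0
    day = start
    while day.month == start.month:
        if day.weekday() == 6:  # Sunday in Python's numbering
            count += 1
        day += timedelta(days=1)
    return count


print(sundays_in_month(8, 2014))             # August 2014
print(sundays_remaining(date(2014, 8, 25)))  # from Monday 25 August 2014
```

Total working days (Monday to Saturday) for a month are then the number of days in the month minus `sundays_in_month(month, year)`, as described earlier.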
@Zam is on the right track: you need to use the WeekDay() function. Here is a basic idea of how to script it:
<%
Dim month_start, month_end, currentdate, dayofmonth
Dim num_weekdays, num_past, num_future
Dim msg
'This can be configured how you like; you could even derive it from Date().
'Note that CDate() parses the string according to the server's locale.
month_start = CDate("01/08/2014")
month_end = DateAdd("d", -1, DateAdd("m", 1, month_start))
'MsgBox is not available in server-side ASP, so no debug popup here.
For dayofmonth = 1 To Day(month_end)
    'Offset by dayofmonth - 1 so the loop starts on the first day of the month
    currentdate = DateAdd("d", dayofmonth - 1, month_start)
    'Only ignore Sundays
    If WeekDay(currentdate) <> vbSunday Then
        num_weekdays = num_weekdays + 1
        If currentdate <= Date() Then
            num_past = num_past + 1
        Else
            num_future = num_future + 1
        End If
    End If
Next
msg = ""
msg = msg & "Start: " & month_start & "<br />"
msg = msg & "End: " & month_end & "<br />"
msg = msg & "Number of Weekdays: " & num_weekdays & "<br />"
msg = msg & "Weekdays Past: " & num_past & "<br />"
msg = msg & "Weekdays Future: " & num_future & "<br />"
Response.Write msg
%>
How about using "The Weekday function returns a number between 1 and 7, that represents the day of the week." ?

Use Lag function in SAS find difference and delete if the value is less than 30

Eg.
Subject Date
1 2/10/13
1 2/15/13
1 2/27/13
1 3/15/13
1 3/29/13
2 1/11/13
2 1/31/13
2 2/15/13
For each subject, I need to keep only the rows whose dates are more than 30 days apart.
required output:
Subject Date
1 2/10/13
1 3/15/13
2 1/11/13
2 2/15/13
This is a very interesting problem. I'll use the retain statement in the DATA step.
Since we are trying to compare dates between different observations, it's a bit more difficult. We can take advantage of the fact that SAS can convert dates to SAS date values (i.e. number of days after Jan 1 1960). Then we can compare these numeric values using conditional statements.
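data work.test;
    input Subject Date anydtdte15.;
    sasdate = Date;   /* Date is already a numeric SAS date value */
    retain x;         /* x holds the date of the last kept row across iterations */
    /* Drop rows within 30 days of the last kept row. Note that x is not reset per Subject. */
    if -30 <= sasdate - x <= 30 then delete;
    else x = sasdate;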
datalines;
1 2/10/13
1 2/15/13
1 2/27/13
1 3/15/13
1 3/29/13
2 1/11/13
2 1/31/13
2 2/15/13
;
run;
proc print data=test;
format Date mmddyy8.;
var Subject Date;
run;
OUTPUT as required:
Obs Subject Date
1 1 02/10/13
2 1 03/15/13
3 2 01/11/13
4 2 02/15/13
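The filtering rule can also be sketched in Python to make the logic explicit. This is an illustration only, with a made-up function name `thin_by_gap`. One deliberate difference from the SAS step above: the sketch remembers the last kept date separately for each subject, whereas the DATA step carries a single retained value across all rows. Both give the same result on this sample data.

```python
from datetime import datetime

# Sample data from the question: (Subject, Date as m/d/yy text).
records = [
    (1, "2/10/13"), (1, "2/15/13"), (1, "2/27/13"), (1, "3/15/13"),
    (1, "3/29/13"), (2, "1/11/13"), (2, "1/31/13"), (2, "2/15/13"),
]


def thin_by_gap(records, min_gap_days=30):
    """Keep a row only if it is more than min_gap_days from the last kept row
    of the same subject (hypothetical helper, assumes rows sorted by date)."""
    kept = []
    last_kept = {}  # subject -> date of the last kept row
    for subject, text in records:
        day = datetime.strptime(text, "%m/%d/%y").date()
        previous = last_kept.get(subject)
        if previous is None or abs((day - previous).days) > min_gap_days:
            kept.append((subject, text))
            last_kept[subject] = day
    return kept


for subject, text in thin_by_gap(records):
    print(subject, text)
```

Tracking the last kept date per subject avoids dropping the first row of a new subject when it happens to fall within 30 days of the previous subject's last kept date.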