Use Lag function in SAS to find the difference between dates and delete the row if the difference is less than 30

Eg.
Subject Date
1 2/10/13
1 2/15/13
1 2/27/13
1 3/15/13
1 3/29/13
2 1/11/13
2 1/31/13
2 2/15/13
I need to keep only the rows, within each subject, whose dates are more than 30 days apart from the previously kept date.
required output:
Subject Date
1 2/10/13
1 3/15/13
2 1/11/13
2 2/15/13

This is a very interesting problem. I'll use the retain statement in the DATA step.
Since we are trying to compare dates between different observations, it's a bit more difficult. We can take advantage of the fact that SAS can convert dates to SAS date values (i.e. number of days after Jan 1 1960). Then we can compare these numeric values using conditional statements.
data work.test;
input Subject Date anydtdte15.;
sasdate = Date;
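retain x; /* x keeps the date of the last kept observation across iterations */
/* Delete rows within 30 days of the last kept date. x is not reset per subject; */
/* the first row of subject 2 survives only because its date is more than 30 days earlier. */
if -30 <= sasdate - x <= 30 then delete;
else x = sasdate;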
datalines;
1 2/10/13
1 2/15/13
1 2/27/13
1 3/15/13
1 3/29/13
2 1/11/13
2 1/31/13
2 2/15/13
;
run;
proc print data=test;
format Date mmddyy8.;
var Subject Date;
run;
OUTPUT as required:
Obs Subject Date
1 1 02/10/13
2 1 03/15/13
3 2 01/11/13
4 2 02/15/13
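The same keep-or-drop logic can be sketched outside SAS. The Python version below is only an illustration and not part of the original answer: the function name `filter_by_gap` and the `min_gap` parameter are made up here, the rows are assumed to be sorted by subject and date, and unlike the DATA step above it resets the reference date explicitly whenever the subject changes.

```python
from datetime import date

def filter_by_gap(rows, min_gap=30):
    """Keep a row only if it starts a new subject or lies more than
    min_gap days after the last kept row of the same subject."""
    kept = []
    last_subject = None
    last_date = None
    for subject, d in rows:
        if subject != last_subject or (d - last_date).days > min_gap:
            kept.append((subject, d))
            # Only kept rows move the reference date forward
            last_subject, last_date = subject, d
    return kept

rows = [
    (1, date(2013, 2, 10)),
    (1, date(2013, 2, 15)),
    (1, date(2013, 2, 27)),
    (1, date(2013, 3, 15)),
    (1, date(2013, 3, 29)),
    (2, date(2013, 1, 11)),
    (2, date(2013, 1, 31)),
    (2, date(2013, 2, 15)),
]
result = filter_by_gap(rows)
for subject, d in result:
    print(subject, d.strftime("%m/%d/%y"))
```

Resetting on a subject change avoids relying on the negative bound of the comparison, which would fail if a new subject started within 30 days of the previous subject's last kept date.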

How to filter data with starting and ending conditions?

I'm trying to filter my data based on two conditions dependent on sequential dates.
I am looking for values below 2 for 5+ sequential dates,
with a "cushion period" of values 2 to 5 for up to 3 sequential days.
It would look something like this (sorry for the terrible excel attempt here):
Day 1 to Day 10 would be included and day 11 would not be. Days 6 to 8 would be considered the "cushion period." I hope this makes sense!!
Right now, I am able to get the cushion period (in the reprex) only but I cant figure out how to add the start and ending condition for values under 2 for 5 sequential dates to be included (the 5 days could be broken up with the cushion period inbetween but I feel like this might complicate things).
Any help would be GREATLY appreciated!
For my reprex (below), the dates that would be included in the final df are in blue (dates from 1/1/2000 to 1/9/2000, and 1/22/2000 to 1/30/2000) and the dates in grey would not be.
Reprex:
library("dplyr")
#Goal: include all values with values of 2 or less for 5 consecutive days and allow for a "cushion" period of values of 2 to 5 for up to 3 days
data <- data.frame(Date = c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06", "2000-01-07", "2000-01-08", "2000-01-09", "2000-01-10", "2000-01-11", "2000-01-12", "2000-01-13", "2000-01-14", "2000-01-15", "2000-01-16", "2000-01-17", "2000-01-18", "2000-01-19", "2000-01-20", "2000-01-21", "2000-01-22", "2000-01-23", "2000-01-24", "2000-01-25", "2000-01-26", "2000-01-27", "2000-01-28", "2000-01-29", "2000-01-30"),
Value = c(2,3,4,5,2,2,1,0,1,8,7,9,4,5,2,3,4,5,7,2,6,0,2,1,2,0,3,4,0,1))
head(data)
#Goal: values should include dates from 1/1/2000 to 1/9/2000, and 1/22/2000 to 1/30/2000
#I am able to subset the "cushion period" but I'm not sure how to add the starting and ending conditions for it
attempt1 <- data %>%
group_by(group_id = as.integer(gl(n(),3,n()))) %>%
filter(Value <= 5 & Value >=3) %>%
ungroup() %>%
select(-group_id)
head(attempt1)
If I understand correctly, you need to keep groups of consecutive values that are below or equal to 5 and that contain at least 5 consecutive values below or equal to 2. Here's a way to do that, with some explanation:
library(dplyr)
data %>%
mutate(under_three = Value <= 2) %>%
# under_three = TRUE if Value is below or equal to 2
group_by(rl_two = data.table::rleid(Value <= 2)) %>%
# Group by sequence of values that are under_three
mutate(big = n() >= 5 & all(under_three)) %>%
# big = TRUE if there are 5 or more consecutive values that are below or equal to 2
group_by(rl_five = data.table::rleid(Value <= 5)) %>%
# ungroup by rl_two, and group by rl_five, i.e. consecutive values that are below or equal to 5
filter(any(big)) %>%
# keep from the data frame groups of rl_five if they have at least one big = T; remove other groups.
ungroup() %>%
select(Date, Value)
Output:
Date Value
1 2000-01-01 2
2 2000-01-02 3
3 2000-01-03 4
4 2000-01-04 5
5 2000-01-05 2
6 2000-01-06 2
7 2000-01-07 1
8 2000-01-08 0
9 2000-01-09 1
10 2000-01-22 0
11 2000-01-23 2
12 2000-01-24 1
13 2000-01-25 2
14 2000-01-26 0
15 2000-01-27 3
16 2000-01-28 4
17 2000-01-29 0
18 2000-01-30 1
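For readers who do not use R, the run-length idea can be sketched in Python with `itertools.groupby`. This is an illustrative translation under my own naming (`keep_runs`, `low`, `high` and `min_len` are not from the answer): first split the series into runs of values at or below 5, then keep a run only if it contains at least 5 consecutive values at or below 2.

```python
from itertools import groupby

def keep_runs(values, low=2, high=5, min_len=5):
    """Return the indices of runs of values <= high that contain
    at least min_len consecutive values <= low."""
    keep = []
    pos = 0
    for within, run in groupby(values, key=lambda v: v <= high):
        run = list(run)
        if within:
            # Length of the longest stretch of values <= low inside this run
            longest = max(
                (len(list(g))
                 for is_low, g in groupby(run, key=lambda v: v <= low)
                 if is_low),
                default=0,
            )
            if longest >= min_len:
                keep.extend(range(pos, pos + len(run)))
        pos += len(run)
    return keep

values = [2, 3, 4, 5, 2, 2, 1, 0, 1, 8, 7, 9, 4, 5, 2,
          3, 4, 5, 7, 2, 6, 0, 2, 1, 2, 0, 3, 4, 0, 1]
kept = keep_runs(values)
# Indices are 0-based, so index 0 is 2000-01-01 and index 29 is 2000-01-30
print(kept)
```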

Table sort by month

I have a table in MATLAB with attributes in the first three columns and data from the fourth column onwards. I was trying to sort the entire table based on the first three columns. However, one of the columns (Column C) contains months ('January', 'February' ...etc). The sortrows function would only let me choose 'ascend' or 'descend' but not a custom option to sort by month. Any help would be greatly appreciated. Below is the code I used.
sortrows(Table, {'Column A','Column B','Column C'} , {'ascend' , 'ascend' , '???' } )
As @AnonSubmitter85 suggested, the best thing you can do is to convert your month names to numeric values from 1 (January) to 12 (December) as follows:
c = {
7 1 'February';
1 0 'April';
2 1 'December';
2 1 'January';
5 1 'January';
};
t = cell2table(c,'VariableNames',{'ColumnA' 'ColumnB' 'ColumnC'});
t.ColumnC = month(datenum(t.ColumnC,'mmmm'));
This will facilitate the access to a standard sorting criterion for your ColumnC too (in this example, ascending):
t = sortrows(t,{'ColumnA' 'ColumnB' 'ColumnC'},{'ascend', 'ascend', 'ascend'});
If, for whatever reason, you are forced to keep your months as literals, you can use a workaround: sort a clone of the table using the approach described above, then apply the resulting row indices to the original table:
c = {
7 1 'February';
1 0 'April';
2 1 'December';
2 1 'January';
5 1 'January';
};
t_original = cell2table(c,'VariableNames',{'ColumnA' 'ColumnB' 'ColumnC'});
t_clone = t_original;
t_clone.ColumnC = month(datenum(t_clone.ColumnC,'mmmm'));
[~,idx] = sortrows(t_clone,{'ColumnA' 'ColumnB' 'ColumnC'},{'ascend', 'ascend', 'ascend'});
t_original = t_original(idx,:);
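The underlying idea, sorting on a numeric month key while leaving the month names untouched, is language independent. A minimal Python sketch, using the same five rows and an explicit English month list (the name `MONTH_NUMBER` is mine):

```python
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
# Map each month name to its calendar position, 1 (January) to 12 (December)
MONTH_NUMBER = {name: i for i, name in enumerate(MONTHS, start=1)}

rows = [
    (7, 1, "February"),
    (1, 0, "April"),
    (2, 1, "December"),
    (2, 1, "January"),
    (5, 1, "January"),
]

# Sort by column A, then column B, then the month's calendar position
sorted_rows = sorted(rows, key=lambda r: (r[0], r[1], MONTH_NUMBER[r[2]]))
for row in sorted_rows:
    print(row)
```

The month names stay as text in the result; only the sort key is numeric, which mirrors the clone-and-index workaround above.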

Average similar time periods MATLAB

I have a matrix with a timestamp and several column variables.
The matrix spans a month of half hourly variables. Here is a sample of four columns of the matrix
11/11/2015 20:15 31.26410236 35.70104634 35.93171056
11/11/2015 20:45 32.10746291 35.48806277 35.9647747
.
.
.
12/11/2015 20:15 32.10746291 35.48806277 35.9647747
12/11/2015 20:45 32.10746291 35.48806277 35.9647747
.
.
.
13/11/2015 20:15 32.68310429 35.58753807 37.26447422
13/11/2015 20:45 33.05141516 34.8432801 36.48033884
.
.
.
14/11/2015 20:15 32.08328579 34.66482668 34.65446868
14/11/2015 20:45 32.19994433 34.40562145 34.34035989
What is the easiest way to find the average of identical times in terms of hours and minutes?
E.g. mean of each variable at time 20:45 for all days of the month.
I know I could achieve this by converting the timestamp to a datenum, taking the fractional part of datenum and sorting the data by the fractional part of datenum. After that I could block average the rows with similar fractional datenums. Is there a more efficient and more elegant way?
With MATLAB you can work directly with dates and times without converting them to a timestamp in milliseconds or seconds:
http://es.mathworks.com/help/matlab/date-and-time-operations-1.html
Or an easy way is to convert dates to a date vector like this:
DateVector = datevec(DateString,formatIn)
then compare the columns you want:
[Y,M,D,H,MN,S] = datevec(___)
>> A = datevec('13/11/2015 20:45','dd/mm/yyyy HH:MM')
A =
2015 11 13 20 45 0
>> B = datevec('14/11/2015 20:45','dd/mm/yyyy HH:MM')
B =
2015 11 14 20 45 0
With this it is easy to compare dates:
>> A - B
ans =
0 0 -1 0 0 0
exactly one day difference
This is what I ended up doing to solve this problem:
timestamp=linspace(datenum('2015-11-01 00:00', 'yyyy-mm-dd HH:MM'),datenum('2015-11-30 23:30', 'yyyy-mm-dd HH:MM'),1440); % 30 days of half-hourly samples (48 per day), both endpoints included
timestamp=timestamp';
time_of_day=datetime(datevec(timestamp(1:48)),'Format','HH:mm');
numdays=30;
data=rand(length(timestamp),2);
means=NaN(48,3);
for tt=1:48
means(tt,:)=[datenum(time_of_day(tt)) nanmean(data(tt:48:48*numdays,:),1)];
end
figure;
plot(time_of_day,means(:,2:3));
xlim([time_of_day(1) time_of_day(48)]); % the x-axis holds datetime values, so the limits must be datetime too
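The grouping step itself, averaging every row that shares the same hour and minute, can also be sketched without any index arithmetic. The Python example below is an illustration only, using the eight sample rows from the question; the function name `mean_by_time_of_day` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def mean_by_time_of_day(records):
    """Average each variable over all rows that share the same (hour, minute)."""
    grouped = defaultdict(list)
    for stamp, values in records:
        grouped[(stamp.hour, stamp.minute)].append(values)
    # zip(*rows) turns the grouped rows into columns, one per variable
    return {
        key: [sum(col) / len(col) for col in zip(*rows)]
        for key, rows in grouped.items()
    }

raw = [
    ("11/11/2015 20:15", [31.26410236, 35.70104634, 35.93171056]),
    ("11/11/2015 20:45", [32.10746291, 35.48806277, 35.9647747]),
    ("12/11/2015 20:15", [32.10746291, 35.48806277, 35.9647747]),
    ("12/11/2015 20:45", [32.10746291, 35.48806277, 35.9647747]),
    ("13/11/2015 20:15", [32.68310429, 35.58753807, 37.26447422]),
    ("13/11/2015 20:45", [33.05141516, 34.8432801, 36.48033884]),
    ("14/11/2015 20:15", [32.08328579, 34.66482668, 34.65446868]),
    ("14/11/2015 20:45", [32.19994433, 34.40562145, 34.34035989]),
]
records = [(datetime.strptime(s, "%d/%m/%Y %H:%M"), v) for s, v in raw]
means = mean_by_time_of_day(records)
for key in sorted(means):
    print(key, means[key])
```

Grouping on the (hour, minute) pair does not require the rows to be regularly spaced or sorted, which the `tt:48:48*numdays` indexing above does assume.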

Tableau: last value calculation in calculated field

my dataset is like this
KPI VALUE TYPE DATE
coffee break duration 11 0 30/06/2015
coffee break duration 12 0 31/07/2015
coffee break duration 10 0 30/11/2014
coffee break duration 10 0 31/12/2014
coffee expense 20 1 31/07/2015
coffee expense 20 1 31/12/2014
coffee consumers 15 -1 31/07/2015
coffee consumers 17 -1 31/12/2014
for Type, 0 means minutes, 1 means dollars and -1 means people
I want to get a table like this
KPI Year(date) YTD
coffee break duration 2015 11,5
coffee break duration 2014 10
....
YTD calculation is:
if sum([TYPE]) = 0 then avg([VALUE])
elseif sum([TYPE]) > 0 then sum([VALUE])
elseif sum([TYPE]) < 0 then [last value for the considered year]
end
By [Last value for the considered year] I mean the last entry available, in a year if my table is set to Year, otherwise it has to change dynamically based on what Timespan I want to show.
What can I do to have [last value for the considered year] as a calc field ready to use in my YTD calc?
Many thanks,
Stefania
If I understand your question, then you can use an LOD expression in the IF statement:
if sum(type) = 0 then avg([value])
elseif sum([type]) > 0 then sum(value)
elseif sum([type]) < 0 then max(if [date] = { INCLUDE kpi: max(date)} then [value] end)
end
If there are several values on the last day of the considered year, it takes the largest of them.
I slightly modified your data to show that results are working correctly
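To make the three branches of the calculation concrete, here is a rough Python sketch of the same rule applied to the question's data, grouped by KPI and year. It is not Tableau code and the function name `ytd` is mine; it only mirrors the logic: average when the TYPE sum is 0, sum when it is positive, and the value on the latest date when it is negative (taking the largest value if that date has several).

```python
from collections import defaultdict
from datetime import date

rows = [
    ("coffee break duration", 11, 0, date(2015, 6, 30)),
    ("coffee break duration", 12, 0, date(2015, 7, 31)),
    ("coffee break duration", 10, 0, date(2014, 11, 30)),
    ("coffee break duration", 10, 0, date(2014, 12, 31)),
    ("coffee expense", 20, 1, date(2015, 7, 31)),
    ("coffee expense", 20, 1, date(2014, 12, 31)),
    ("coffee consumers", 15, -1, date(2015, 7, 31)),
    ("coffee consumers", 17, -1, date(2014, 12, 31)),
]

def ytd(rows):
    """Apply the avg / sum / last-value rule per (KPI, year)."""
    groups = defaultdict(list)
    for kpi, value, kind, day in rows:
        groups[(kpi, day.year)].append((day, value, kind))
    result = {}
    for key, items in groups.items():
        type_sum = sum(kind for _, _, kind in items)
        values = [value for _, value, _ in items]
        if type_sum == 0:
            result[key] = sum(values) / len(values)
        elif type_sum > 0:
            result[key] = sum(values)
        else:
            last_day = max(day for day, _, _ in items)
            result[key] = max(v for day, v, _ in items if day == last_day)
    return result

result = ytd(rows)
for key in sorted(result):
    print(key, result[key])
```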

Find how many sundays in a month asp classic

I am trying to use asp classic to find how many working days (mon - sat) are in the month and how many are left.
any help or pointers greatly appreciated!
Here's how you can find the number of Sundays in a month without iteration. Somebody posted a JavaScript solution a few months back and I ported it to VBScript:
Function GetSundaysInMonth(intMonth, intYear)
dtmStart = DateSerial(intYear, intMonth, 1)
intDays = Day(DateAdd("m", 1, dtmStart) - 1)
GetSundaysInMonth = Int((intDays + (Weekday(dtmStart) + 5) Mod 7) / 7)
End Function
So, your total work days would just be the number of days in the month minus the number of Sundays.
Edit:
As @Lankymart pointed out in the comments, the above function gives you the number of Sundays in the month but it doesn't tell you how many are left.
Here's another version that does just that. Pass in any date and it will tell you how many Sundays are left in the month starting with that date. If you want to know how many Sundays are in a full month, just pass in the first day of the month (e.g., DateSerial(2014, 8, 1)).
Function GetSundaysRemainingInMonth(dtmStart)
intDays = Day(DateSerial(Year(dtmStart), Month(dtmStart) + 1, 1) - 1)
intDays = intDays - Day(dtmStart) + 1
GetSundaysRemainingInMonth = Int((intDays + (Weekday(dtmStart) + 5) Mod 7) / 7)
End Function
Edit 2:
@Cheran Shunmugavel was interested in some specifics about how this works. First, I just want to restate that I didn't develop this method originally. I just ported it to VBScript and tailored it to the OP's requirement (Sundays).
Imagine a February during a leap year. We have 29 days during the month. We know from the start that we have four full weeks, so each weekday will be represented at least four times. But that still leaves one additional day that's unaccounted for (29 Mod 7 = 1). How do we know if we get an extra Sunday from that one day? Well, in this case, it's pretty simple. Only if our start date is a Sunday can we count an extra Sunday for the month.
What if the month has 30 days? Then we have two extra days to account for. In that case, the start date can be a Saturday or a Sunday and we can count an extra Sunday for the month. And so it goes. So we can see that if the number of additional days is enough to reach the upcoming Sunday, we can count an extra Sunday.
Let's put this in tabular form:
              Addl Days Needed
Day           To Count Sunday
----------    ----------------
Sunday        1
Saturday      2
Friday        3
Thursday      4
Wednesday     5
Tuesday       6
Monday        7
So what we need is a formula that we can apply to these situations so that they all result in the same value. We'll need to assign some value to each day and combine that value with the number of additional days needed for Sunday to count. It seems reasonable that if we assign an inverse value to the weekdays and add that to the number of additional days, we can get the same result.
              Addl Days Needed    Value Assigned
Day           To Count Sunday     To Weekday        Sum
----------    ----------------    --------------    ---
Sunday        1                   6                 7
Saturday      2                   5                 7
Friday        3                   4                 7
Thursday      4                   3                 7
Wednesday     5                   2                 7
Tuesday       6                   1                 7
Monday        7                   0                 7
So, if weekday_value + addl_days = 7 then we count an extra Sunday. (We'll divide this by 7 later to give us 1 additional Sunday). But how do we assign the values we want to the weekdays? Well, VBScript's Weekday() function already does this but, unfortunately, it doesn't use the values we need by default (it uses 1 for Sunday through 7 for Saturday). We could change the way Weekday() works by using the second param, or we could just use a Mod(). This is where the + 5 Mod 7 comes in. If we take the Weekday() value and add 5, then mod that by 7, we get the values we need.
Day           Weekday()    +5    Mod 7
----------    ---------    --    -----
Sunday        1            6     6
Saturday      7            12    5
Friday        6            11    4
Thursday      5            10    3
Wednesday     4            9     2
Tuesday       3            8     1
Monday        2            7     0
That's how the + 5 Mod 7 was determined. And, with that solved, the rest is easy(er)!
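The formula is easy to check against a brute-force count. The Python sketch below is an illustration, not part of the original answer; the function names are mine. Conveniently, Python's `date.weekday()` already returns 0 for Monday through 6 for Sunday, which is exactly the value that `(Weekday() + 5) Mod 7` produces in VBScript, so no remapping is needed here.

```python
import calendar
from datetime import date

def sundays_remaining(start):
    """Closed-form count of Sundays from start to the end of its month."""
    days_in_month = calendar.monthrange(start.year, start.month)[1]
    days_left = days_in_month - start.day + 1
    # start.weekday() is 0 (Monday) .. 6 (Sunday), matching (Weekday() + 5) Mod 7
    return (days_left + start.weekday()) // 7

def sundays_brute_force(start):
    """Count Sundays by visiting every remaining day of the month."""
    days_in_month = calendar.monthrange(start.year, start.month)[1]
    return sum(
        1
        for day in range(start.day, days_in_month + 1)
        if date(start.year, start.month, day).weekday() == 6
    )

# Passing the first day of the month gives the count for the full month
print(sundays_remaining(date(2014, 8, 1)))
```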
@Zam is on the right track: you need to use the WeekDay() function. Here is a basic idea of how to script it:
<%
Dim month_start, month_end, currentdate, dayofmonth
Dim num_weekdays, num_past, num_future
Dim msg
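'This can be configured however you like; you could even derive it from Date().
'Note: CDate parses this string using the server locale (dd/mm/yyyy is assumed here).
month_start = CDate("01/08/2014")
month_end = DateAdd("d", -1, DateAdd("m", 1, month_start))
For dayofmonth = 1 To Day(month_end)
'Offset by dayofmonth - 1 so the loop covers the 1st through the last day of the month
currentdate = DateAdd("d", dayofmonth - 1, month_start)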
'Only ignore Sundays
If WeekDay(currentdate) <> vbSunday Then
num_weekdays = num_weekdays + 1
If currentdate <= Date() Then
num_past = num_past + 1
Else
num_future = num_future + 1
End If
End If
Next
msg = ""
msg = msg & "Start: " & month_start & "<br />"
msg = msg & "End: " & month_end & "<br />"
msg = msg & "Number of Weekdays: " & num_weekdays & "<br />"
msg = msg & "Weekdays Past: " & num_past & "<br />"
msg = msg & "Weekdays Future: " & num_future & "<br />"
Response.Write msg
%>
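The same loop can be sketched in Python to show the expected counts. This is an illustration under my own naming (`working_days` is hypothetical), with "today" passed in explicitly so the result is reproducible; it counts Monday to Saturday as working days, as the question asks.

```python
import calendar
from datetime import date

def working_days(year, month, today):
    """Count Monday-to-Saturday days in a month and split them into
    days up to and including today (past) and days after today (future)."""
    days_in_month = calendar.monthrange(year, month)[1]
    total = past = future = 0
    for day in range(1, days_in_month + 1):
        current = date(year, month, day)
        if current.weekday() == 6:  # 6 is Sunday in date.weekday()
            continue
        total += 1
        if current <= today:
            past += 1
        else:
            future += 1
    return total, past, future

total, past, future = working_days(2014, 8, date(2014, 8, 15))
print(total, past, future)
```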
How about using "The Weekday function returns a number between 1 and 7, that represents the day of the week." ?