Condition to compare to previous period rate?

I'm having trouble coming up with a condition to compare the rate variation between different time periods:
VAR flag =
CALCULATE (
MAX('rates'[€/KG]),
'rates', EARLIER ( 'rates'[START_ETD] ) > 'rates'[START_ETD], 'rates'[LP_DESC] = EARLIER ('rates'[LP_DESC])
)
RETURN
'rates'[€/KG]
- IF ( flag = BLANK (), 'rates'[€/KG], flag )
The idea is to get not the MAX but the previous rate, i.e. the rate of the period immediately before the time period/date we are considering. Could someone help me?
Table
LP_DESC  €/kg  START_ETD   END_ETD     VARIATION
A        2,5   1/07/2022   14/07/2022  1,5
A        3     15/07/2022  31/07/2022  0,5
B        1,5   1/07/2022   14/07/2022  -3
B        3,5   15/07/2022  31/07/2022  2
A        3,5   1/06/2022   14/06/2022  -
A        1     15/06/2022  31/06/2022  0,5
B        2,5   1/06/2022   14/06/2022  -
B        4,5   15/06/2022  31/06/2022  2
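The "previous rate" idea can be sketched outside DAX to make the intended logic concrete. The following is only an illustration in Python, not a DAX solution: the function name `variation` and the tuple layout are made up for this example, and the rows are copied from the table above using START_ETD only. Note that under this logic the row for A starting 15/06/2022 comes out as -2.5, whereas the table lists 0,5.

```python
from datetime import date

# Rows copied from the question's table: (LP_DESC, rate in EUR/kg, START_ETD).
rates = [
    ("A", 2.5, date(2022, 7, 1)),
    ("A", 3.0, date(2022, 7, 15)),
    ("B", 1.5, date(2022, 7, 1)),
    ("B", 3.5, date(2022, 7, 15)),
    ("A", 3.5, date(2022, 6, 1)),
    ("A", 1.0, date(2022, 6, 15)),
    ("B", 2.5, date(2022, 6, 1)),
    ("B", 4.5, date(2022, 6, 15)),
]


def variation(rows):
    """Return {(lp_desc, start): rate - previous rate}, None for a first period."""
    result = {}
    last_rate = {}  # most recent rate seen so far for each LP_DESC
    # Visit rows chronologically so "previous" means the latest earlier START_ETD.
    for lp_desc, rate, start in sorted(rows, key=lambda r: r[2]):
        previous = last_rate.get(lp_desc)
        result[(lp_desc, start)] = None if previous is None else rate - previous
        last_rate[lp_desc] = rate
    return result


variations = variation(rates)
for (lp_desc, start), delta in sorted(variations.items()):
    print(lp_desc, start.isoformat(), delta)
```

The key point is that the lookup takes the rate with the latest earlier START_ETD for the same LP_DESC, rather than the maximum rate over all earlier periods.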

How to filter data with starting and ending conditions?

I'm trying to filter my data based on two conditions dependent on sequential dates.
I am looking for values below 2 for 5+ sequential dates,
with a "cushion period" of values 2 to 5 for up to 3 sequential days.
It would look something like this (sorry for the terrible excel attempt here):
Day 1 to Day 10 would be included and day 11 would not be. Days 6 to 8 would be considered the "cushion period." I hope this makes sense!!
Right now, I am able to get only the cushion period (in the reprex), but I can't figure out how to add the start and end condition, i.e. that values under 2 for 5 sequential dates must be included (the 5 days could be broken up with the cushion period in between, but I feel like this might complicate things).
Any help would be GREATLY appreciated!
For my reprex (below), the dates that would be included in the final df are in blue (dates from 1/1/2000 to 1/9/2000, and 1/22/2000 to 1/30/2000) and the dates in grey would not be.
Reprex:
library("dplyr")
#Goal: include all values with values of 2 or less for 5 consecutive days and allow for a "cushion" period of values of 2 to 5 for up to 3 days
data <- data.frame(Date = c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06", "2000-01-07", "2000-01-08", "2000-01-09", "2000-01-10", "2000-01-11", "2000-01-12", "2000-01-13", "2000-01-14", "2000-01-15", "2000-01-16", "2000-01-17", "2000-01-18", "2000-01-19", "2000-01-20", "2000-01-21", "2000-01-22", "2000-01-23", "2000-01-24", "2000-01-25", "2000-01-26", "2000-01-27", "2000-01-28", "2000-01-29", "2000-01-30"),
Value = c(2,3,4,5,2,2,1,0,1,8,7,9,4,5,2,3,4,5,7,2,6,0,2,1,2,0,3,4,0,1))
head(data)
#Goal: values should include dates from 1/1/2000 to 1/9/2000, and 1/22/2000 to 1/30/2000
#I am able to subset the "cushion period" but I'm not sure how to add the starting and ending conditions for it
attempt1 <- data %>%
group_by(group_id = as.integer(gl(n(),3,n()))) %>%
filter(Value <= 5 & Value >=3) %>%
ungroup() %>%
select(-group_id)
head(attempt1)
If I get it correctly, you need to keep groups of consecutive values that are below or equal to 5 with at least 5 consecutive values below or equal to 2 within it. Here's a way to do that, with some explanation:
library(dplyr)
result <- data %>%
  # under_three = TRUE if Value is below or equal to 2
  mutate(under_three = Value <= 2) %>%
  # group by runs of consecutive values that share the same under_three status
  group_by(rl_two = data.table::rleid(Value <= 2)) %>%
  # big = TRUE if the run has 5 or more consecutive values that are below or equal to 2
  mutate(big = n() >= 5 & all(under_three)) %>%
  # replace the rl_two grouping with rl_five, i.e. runs of consecutive values that are below or equal to 5
  group_by(rl_five = data.table::rleid(Value <= 5)) %>%
  # keep the rl_five groups that contain at least one big = TRUE row; drop the other groups
  filter(any(big))
Output:
result %>%
  ungroup() %>%
  select(Date, Value)
Date Value
1 2000-01-01 2
2 2000-01-02 3
3 2000-01-03 4
4 2000-01-04 5
5 2000-01-05 2
6 2000-01-06 2
7 2000-01-07 1
8 2000-01-08 0
9 2000-01-09 1
10 2000-01-22 0
11 2000-01-23 2
12 2000-01-24 1
13 2000-01-25 2
14 2000-01-26 0
15 2000-01-27 3
16 2000-01-28 4
17 2000-01-29 0
18 2000-01-30 1
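For readers who want to check the run-length logic independently of dplyr and data.table, here is a rough Python sketch of the same idea using `itertools.groupby`. The function name `keep_indices` and its parameters are invented for this illustration; the values are the ones from the reprex, and index i corresponds to the date 2000-01-(i+1).

```python
from itertools import groupby

# Values from the reprex, in date order (2000-01-01 .. 2000-01-30).
values = [2, 3, 4, 5, 2, 2, 1, 0, 1, 8, 7, 9, 4, 5, 2,
          3, 4, 5, 7, 2, 6, 0, 2, 1, 2, 0, 3, 4, 0, 1]


def keep_indices(values, low=2, high=5, min_low_run=5):
    """Indices in runs of values <= high that contain >= min_low_run consecutive values <= low."""
    kept = []
    indexed = list(enumerate(values))
    # Outer runs: maximal stretches where every value is <= high.
    for within_high, outer in groupby(indexed, key=lambda iv: iv[1] <= high):
        outer = list(outer)
        if not within_high:
            continue
        # Inner runs: maximal stretches of values <= low inside the outer run.
        has_long_low_run = any(
            is_low and len(list(inner)) >= min_low_run
            for is_low, inner in groupby(outer, key=lambda iv: iv[1] <= low)
        )
        if has_long_low_run:
            kept.extend(i for i, _ in outer)
    return kept


kept = keep_indices(values)
print(kept)
```

The outer grouping plays the role of rl_five and the inner grouping the role of rl_two in the answer above.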

Table sort by month

I have a table in MATLAB with attributes in the first three columns and data from the fourth column onwards. I was trying to sort the entire table based on the first three columns. However, one of the columns (Column C) contains months ('January', 'February' ...etc). The sortrows function would only let me choose 'ascend' or 'descend' but not a custom option to sort by month. Any help would be greatly appreciated. Below is the code I used.
sortrows(Table, {'Column A','Column B','Column C'} , {'ascend' , 'ascend' , '???' } )
As @AnonSubmitter85 suggested, the best thing you can do is to convert your month names to numeric values from 1 (January) to 12 (December) as follows:
c = {
7 1 'February';
1 0 'April';
2 1 'December';
2 1 'January';
5 1 'January';
};
t = cell2table(c,'VariableNames',{'ColumnA' 'ColumnB' 'ColumnC'});
t.ColumnC = month(datenum(t.ColumnC,'mmmm'));
This lets you apply a standard sort order to your ColumnC too (in this example, ascending):
t = sortrows(t,{'ColumnA' 'ColumnB' 'ColumnC'},{'ascend', 'ascend', 'ascend'});
If, for any reason that is unknown to us, you are forced to keep your months as literals, you can use a workaround that consists in sorting a clone of the table using the approach described above, and then applying to it the resulting indices:
c = {
7 1 'February';
1 0 'April';
2 1 'December';
2 1 'January';
5 1 'January';
};
t_original = cell2table(c,'VariableNames',{'ColumnA' 'ColumnB' 'ColumnC'});
t_clone = t_original;
t_clone.ColumnC = month(datenum(t_clone.ColumnC,'mmmm'));
[~,idx] = sortrows(t_clone,{'ColumnA' 'ColumnB' 'ColumnC'},{'ascend', 'ascend', 'ascend'});
t_original = t_original(idx,:);
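The same "map month names to numbers, then sort" idea is language-independent. As a small illustration outside MATLAB, the Python sketch below sorts the example rows while keeping the month names as text; the names `MONTHS`, `MONTH_NUMBER` and `sorted_rows` are made up for this example, and English month names are assumed.

```python
# English month names in calendar order (assumed spelling, as in the example data).
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
# Map 'January' -> 1 ... 'December' -> 12.
MONTH_NUMBER = {name: number for number, name in enumerate(MONTHS, start=1)}

# Same rows as the MATLAB example: (ColumnA, ColumnB, ColumnC).
rows = [
    (7, 1, "February"),
    (1, 0, "April"),
    (2, 1, "December"),
    (2, 1, "January"),
    (5, 1, "January"),
]

# Sort on all three columns, using the month number only as the sort key,
# so the month names themselves stay untouched in the result.
sorted_rows = sorted(rows, key=lambda r: (r[0], r[1], MONTH_NUMBER[r[2]]))
for row in sorted_rows:
    print(row)
```

This corresponds to the second MATLAB variant: the numeric month is used only to determine the order, not stored in the table.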

split rowcount of a table by 3 ways in perl

I am getting the row count of a Sybase table in Perl. For example, if the table has 100 rows, then n=100.
I want to split this value into 3 parts:
1-33 | 34-66 | 67-99 or 100
Please advise how to do this in Perl.
Reason for this split: I need to pass the values 1 and 33 as input parameters to a stored proc to select rows whose identity column value is between 1 and 33.
The same goes for 34-66 and 67-99.
The interesting part is deciding where each range starts. From there it's easy to decide that each range ends at one less than the start of the next range.
This partition() function will determine the start points for given number of partitions within a given number of elements starting at a given offset.
sub partition {
    my ($offset, $n_elements, $n_partitions) = @_;
    die "Cannot create $n_partitions partitions from $n_elements elements.\n"
        if $n_partitions > $n_elements;
    my $step = int($n_elements / $n_partitions);
    return map {$step * $_ + $offset} 0 .. $n_partitions - 1;
}
Here's how it works:
First, determine what the step should be by dividing the number of elements by the number of partitions, and preserving the integer by truncating any trailing decimal places.
Next, walk through the steps by starting at zero and multiplying the step by the step number (or the partition number). So if the step is 5, then 5*0=0, 5*1=5, 5*2=10, and so on. We will not look at the last step, because it makes more sense to include an "off by one" in the last partition than to start a new partition with only one element.
Finally, we allow for an offset to be applied, so that partition(0,100,5) means to find the starting element positions for five partitions starting at zero and continuing for 100 elements (so a range of 0 to 99). And partition(1,100,5) would mean start at 1 and continue for 100 elements, partitioning in five segments, so a range of 1 to 100.
Here's an example of putting the function to use to find the partition points in a set of several ranges:
use strict;
use warnings;
use Test::More;
sub partition {
    my ($offset, $n_elements, $n_partitions) = @_;
    die "Cannot create $n_partitions partitions from $n_elements elements.\n"
        if $n_partitions > $n_elements;
    my $step = int($n_elements / $n_partitions);
    return map {$step * $_ + $offset} 0 .. $n_partitions - 1;
}
while(<DATA>) {
    chomp;
    next unless length;
    my ($off, $n_elems, $n_parts, @starts) = split /,\s*/;
    local $" = ',';
    is_deeply
        [partition($off, $n_elems, $n_parts)],
        [@starts],
        "Partitioning $n_elems elements starting at $off by $n_parts yields start positions of [@starts]";
}
done_testing();
__DATA__
0,10,2,0,5
1,11,2,1,6
0,3,2,0,1
0,7,3,0,2,4
0,21,3,0,7,14
0,21,7,0,3,6,9,12,15,18
0,20,3,0,6,12
0,100,4,0,25,50,75
1,100,4,1,26,51,76
1,100,3,1,34,67
0,10,1,0
1,10,10,1,2,3,4,5,6,7,8,9,10
This yields the following output:
ok 1 - Partitioning 10 elements starting at 0 by 2 yields start positions of [0,5]
ok 2 - Partitioning 11 elements starting at 1 by 2 yields start positions of [1,6]
ok 3 - Partitioning 3 elements starting at 0 by 2 yields start positions of [0,1]
ok 4 - Partitioning 7 elements starting at 0 by 3 yields start positions of [0,2,4]
ok 5 - Partitioning 21 elements starting at 0 by 3 yields start positions of [0,7,14]
ok 6 - Partitioning 21 elements starting at 0 by 7 yields start positions of [0,3,6,9,12,15,18]
ok 7 - Partitioning 20 elements starting at 0 by 3 yields start positions of [0,6,12]
ok 8 - Partitioning 100 elements starting at 0 by 4 yields start positions of [0,25,50,75]
ok 9 - Partitioning 100 elements starting at 1 by 4 yields start positions of [1,26,51,76]
ok 10 - Partitioning 100 elements starting at 1 by 3 yields start positions of [1,34,67]
ok 11 - Partitioning 10 elements starting at 0 by 1 yields start positions of [0]
ok 12 - Partitioning 10 elements starting at 1 by 10 yields start positions of [1,2,3,4,5,6,7,8,9,10]
1..12
For additional examples look at Split range 0 to M into N non-overlapping (roughly equal) ranges. on PerlMonks.
Your question is looking for complete range start and end points. This method makes it rather trivial:
sub partition {
    my ($offset, $n_elements, $n_partitions) = @_;
    my $step = int($n_elements / $n_partitions);
    return map {$step * $_ + $offset} 0 .. $n_partitions - 1;
}
my $n_elems = 100;
my $offset = 1;
my $n_parts = 3;
my @starts = partition($offset, $n_elems, $n_parts);
# Each range ends one before the next start; the last range ends at the final element.
my @ranges = map{
    [
        $starts[$_],
        ($starts[$_+1] // $n_elems+$offset)-1,
    ]
} 0..$#starts;
print "($_->[0], $_->[1])\n" foreach @ranges;
The output:
(1, 33)
(34, 66)
(67, 100)
Even more implementation examples appear in Algorithm for dividing a range into ranges and then finding which range a number belongs to on the StackExchange Software Engineering forum.
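For comparison, here is a minimal Python sketch of the same start-point and range calculation. It is only a port of the Perl logic above for illustration; the function name `ranges` is made up here and is not part of any library.

```python
def partition(offset, n_elements, n_partitions):
    """Start positions of n_partitions roughly equal partitions."""
    if n_partitions > n_elements:
        raise ValueError(
            f"Cannot create {n_partitions} partitions from {n_elements} elements."
        )
    # Integer division drops any remainder; the last partition absorbs it.
    step = n_elements // n_partitions
    return [step * i + offset for i in range(n_partitions)]


def ranges(offset, n_elements, n_partitions):
    """(start, end) pairs; each end is one less than the next start."""
    starts = partition(offset, n_elements, n_partitions)
    ends = [start - 1 for start in starts[1:]] + [offset + n_elements - 1]
    return list(zip(starts, ends))


for start, end in ranges(1, 100, 3):
    print(f"({start}, {end})")
```

As in the Perl version, any remainder ends up in the last range, which is why the final pair for 100 rows in 3 parts is 67 to 100 rather than 67 to 99.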

Tableau: last value calculation in calculated field

My dataset looks like this:
KPI VALUE TYPE DATE
coffee break duration 11 0 30/06/2015
coffee break duration 12 0 31/07/2015
coffee break duration 10 0 30/11/2014
coffee break duration 10 0 31/12/2014
coffee expense 20 1 31/07/2015
coffee expense 20 1 31/12/2014
coffee consumers 15 -1 31/07/2015
coffee consumers 17 -1 31/12/2014
for Type, 0 means minutes, 1 means dollars and -1 means people
I want to get a table like this
KPI Year(date) YTD
coffee break duration 2015 11,5
coffee break duration 2014 10
....
YTD calculation is:
if sum([TYPE]) = 0 then avg([VALUE])
elseif sum([TYPE]) > 0 then sum([VALUE])
elseif sum([TYPE]) < 0 then [last value for the considered year]
end
By [Last value for the considered year] I mean the last entry available, in a year if my table is set to Year, otherwise it has to change dynamically based on what Timespan I want to show.
What can I do to have [last value for the considered year] as a calc field ready to use in my YTD calc?
Many thanks,
Stefania
If I understand your question, then you can use an LOD expression in the IF statement:
if sum([TYPE]) = 0 then avg([VALUE])
elseif sum([TYPE]) > 0 then sum([VALUE])
elseif sum([TYPE]) < 0 then max(if [DATE] = { INCLUDE [KPI]: max([DATE]) } then [VALUE] end)
end
If there are several values on the last day of the considered year, this takes the largest of them.
I slightly modified your data to show that the results are correct.
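To make the three branches of the YTD rule easier to verify by hand, here is a small Python sketch that applies the same rule to the question's original rows, grouped by KPI and year. It is only a check of the logic, not Tableau syntax; the function name `ytd` and the tuple layout are invented for this example.

```python
from collections import defaultdict
from datetime import date

# Rows from the question: (KPI, VALUE, TYPE, DATE).
rows = [
    ("coffee break duration", 11, 0, date(2015, 6, 30)),
    ("coffee break duration", 12, 0, date(2015, 7, 31)),
    ("coffee break duration", 10, 0, date(2014, 11, 30)),
    ("coffee break duration", 10, 0, date(2014, 12, 31)),
    ("coffee expense", 20, 1, date(2015, 7, 31)),
    ("coffee expense", 20, 1, date(2014, 12, 31)),
    ("coffee consumers", 15, -1, date(2015, 7, 31)),
    ("coffee consumers", 17, -1, date(2014, 12, 31)),
]


def ytd(rows):
    """Apply the YTD rule per (KPI, year): average, sum, or last value."""
    groups = defaultdict(list)
    for kpi, value, kind, day in rows:
        groups[(kpi, day.year)].append((day, value, kind))
    result = {}
    for key, items in groups.items():
        type_sum = sum(kind for _, _, kind in items)
        values = [value for _, value, _ in items]
        if type_sum == 0:
            result[key] = sum(values) / len(values)  # average
        elif type_sum > 0:
            result[key] = sum(values)  # total
        else:
            # Last value: the entry on the latest date in the group;
            # ties on that date resolve to the largest value.
            last_day = max(day for day, _, _ in items)
            result[key] = max(value for day, value, _ in items if day == last_day)
    return result


ytd_by_kpi_year = ytd(rows)
for key in sorted(ytd_by_kpi_year):
    print(key, ytd_by_kpi_year[key])
```

Grouping by year here stands in for whatever timespan is shown in the view; changing the grouping key changes what "last value" means in the same way.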

Use Lag function in SAS find difference and delete if the value is less than 30

Eg.
Subject Date
1 2/10/13
1 2/15/13
1 2/27/13
1 3/15/13
1 3/29/13
2 1/11/13
2 1/31/13
2 2/15/13
For each subject, I need to keep only the rows whose dates are more than 30 days apart.
required output:
Subject Date
1 2/10/13
1 3/15/13
2 1/11/13
2 2/15/13
This is a very interesting problem. I'll use the retain statement in the DATA step.
Since we are trying to compare dates between different observations, it's a bit more difficult. We can take advantage of the fact that SAS can convert dates to SAS date values (i.e. number of days after Jan 1 1960). Then we can compare these numeric values using conditional statements.
data work.test;
input Subject Date anydtdte15.;
sasdate = Date;
retain x;
if -30 <= sasdate - x <= 30 then delete;
else x = sasdate;
datalines;
1 2/10/13
1 2/15/13
1 2/27/13
1 3/15/13
1 3/29/13
2 1/11/13
2 1/31/13
2 2/15/13
;
run;
proc print data=test;
format Date mmddyy8.;
var Subject Date;
run;
OUTPUT as required:
Obs Subject Date
1 1 02/10/13
2 1 03/15/13
3 2 01/11/13
4 2 02/15/13
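The retain-based filter can also be traced outside SAS. The Python sketch below mirrors the DATA step's logic for illustration only: `last_kept` plays the role of the retained variable x, and the function name `thin` is made up. Like the SAS code, it does not reset the comparison date when the subject changes; with this data that makes no difference, because the first date of subject 2 is more than 30 days away from the last kept date of subject 1.

```python
from datetime import date

# Rows from the example: (Subject, Date).
rows = [
    (1, date(2013, 2, 10)),
    (1, date(2013, 2, 15)),
    (1, date(2013, 2, 27)),
    (1, date(2013, 3, 15)),
    (1, date(2013, 3, 29)),
    (2, date(2013, 1, 11)),
    (2, date(2013, 1, 31)),
    (2, date(2013, 2, 15)),
]


def thin(rows, min_gap_days=30):
    """Keep a row only if it is more than min_gap_days from the last kept row."""
    kept = []
    last_kept = None  # date of the most recently kept row (the retained x)
    for subject, day in rows:
        if last_kept is not None and abs((day - last_kept).days) <= min_gap_days:
            continue  # within 30 days of the last kept date: drop the row
        kept.append((subject, day))
        last_kept = day
    return kept


kept_rows = thin(rows)
for subject, day in kept_rows:
    print(subject, day.strftime("%m/%d/%y"))
```

Each row is compared with the last kept row rather than with the immediately preceding row, which is why 3/15/13 is kept (33 days after 2/10/13) even though it is only 16 days after 2/27/13.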