Consecutive episode - spss-modeler

Good afternoon.
I have data like this
ID Indicator
1 0
1 1
1 0
1 1
1 0
1 1
2 0
2 1
2 1
2 1
2 1
2 1
2 1
2 1
I need to find every ID that has at least 4 consecutive rows with Indicator = 1. In this example I should get ID = 2, since it has at least 4 consecutive rows with Indicator = 1. There are two columns, ID and Indicator; ID 1 has 6 rows and ID 2 has 8 rows. For ID 1 the indicators are 0, 1, 0, 1, 0, 1; for ID 2 the first indicator is 0 and all the others are 1. How can I do this in SPSS Modeler? Thank you so much for your help.
To be precise: I want to output the ID that has 4 or more indicators set to 1 consecutively.

What you first need is a way to count the number of consecutive Indicator = 1 records for the same ID.
For this, you can use the "Derive" node with the following settings:
Set the 'Derive as' option to Count
Set the 'Increment when' to ID = @OFFSET(ID, 1) and INDICATOR = 1
Set the 'Increment by' to 1
Set the 'Reset when' to INDICATOR = 0
Following the 'Derive' node, you can then use a 'Select' node to keep only the records where the count of consecutive 1s is equal to 4 (any run of 4 or more passes through 4), and finally use a 'Distinct' node to keep only one record for each ID.
I have shared a sample stream that shows the process here.
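For readers without SPSS Modeler at hand, the counting logic of the Derive, Select and Distinct nodes can be sketched in plain Python. This is only an illustration, not the shared stream: the function name ids_with_run and the min_run parameter are made up here, and the sketch assumes the rows are already ordered so that each ID's records are contiguous. Unlike the Derive settings above, it restarts the count when a new ID begins with Indicator = 1.

```python
# Sample rows (ID, Indicator) copied from the question.
rows = [
    (1, 0), (1, 1), (1, 0), (1, 1), (1, 0), (1, 1),
    (2, 0), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1),
]

def ids_with_run(rows, min_run=4):
    """Return IDs with at least `min_run` consecutive Indicator = 1 rows."""
    found = []        # plays the role of the Distinct node
    count = 0         # the running counter built by the Derive node
    prev_id = None
    for row_id, indicator in rows:
        if indicator == 1 and row_id == prev_id:
            count += 1            # 'Increment when'
        elif indicator == 1:
            count = 1             # assumption: a new ID restarts the run
        else:
            count = 0             # 'Reset when' Indicator = 0
        if count >= min_run and row_id not in found:
            found.append(row_id)  # Select, then keep one record per ID
        prev_id = row_id
    return found

print(ids_with_run(rows))
```

With the sample data from the question, only ID 2 should be reported.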

Related

Counting presence or absence of tallies by rows

Hi, I am not sure how to explain what I need, but I'll try. I need a query (if there is one) that counts whether a tally was present at a point (column), so that a species with more than one tally at a point counts only once.
This is how the data looks:
Sp Site Pnt1 Pnt2 Pnt3 Total
A 1 1 1 1 3
A 2 1 1 2
A 3 1 1
B 1 1 1 1 3
B 2 1 1
C 1 1 1 2
C 2 0
I want to count whether the sites have a tally or not; if tallies are repeated at a point (for the same species), I want to count them as one. I would like the resulting table to look like this:
Sp Pnt1 Pnt2 Pnt3 Total
A 1 1 1 3
B 1 1 1 3
C 0 1 1 2
Thanks for all the help you can provide.

Merge two unequal data sets in SAS with replacement

I generated propensity scores in SAS to match two unequal groups with replacement. Now I'm trying to create a dataset with an equal number of observations for both groups, i.e. observations in group b should repeat, since that is the smaller group. Below is synthetic data to demonstrate what I'm trying to get.
Indicator Income Matchid
1 7 1
1 8 2
1 4 1
0 6 1
0 9 2
And I want it to look like this
Indicator Income Matchid
1 7 1
1 8 2
1 4 1
0 6 1
0 9 2
0 6 1
In a view, you can create a variable that holds a sequence number within each group, suitable for modulus arithmetic. In a DATA step, load the two indicator groups into separate hashes; then, for each group, loop over the larger group size and select rows by index modulo that group's size.
Example:
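/* Sample data from the question */
data have;
  input Indicator Income Matchid;
datalines;
1 7 1
1 8 2
1 4 1
0 6 1
0 9 2
;
/* Number the rows within each contiguous indicator group, starting at 0 */
data have_v;
  set have;
  by indicator notsorted;
  if first.indicator then group_seq=0; else group_seq+1;
run;
data want;
  /* Define the host variables for the hashes without reading any rows */
  if 0 then set have_v;
  /* One hash per indicator group, keyed on the within-group sequence number */
  declare hash i1 (dataset:'have_v(where=(indicator=1))', ordered:'a');
  i1.defineKey('group_seq');
  i1.defineData(all:'yes');
  i1.defineDone();
  declare hash i0 (dataset:'have_v(where=(indicator=0))', ordered:'a');
  i0.defineKey('group_seq');
  i0.defineData(all:'yes');
  i0.defineDone();
  /* Emit as many rows as the larger group has, cycling through group 1 */
  do index = 0 to max(i0.num_items, i1.num_items)-1;
    group_seq = mod(index,i1.num_items);
    i1.find();
    output;
  end;
  /* Same for group 0; the smaller group wraps around and repeats rows */
  do index = 0 to max(i0.num_items, i1.num_items)-1;
    group_seq = mod(index,i0.num_items);
    i0.find();
    output;
  end;
  stop;
  drop index group_seq;
run;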
If the two groups were split into separate data sets, you could do similar processing using the SET options nobs= and point=.
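The index-modulus idea is independent of SAS. As a rough sketch in Python (the function name balance_with_replacement and its key parameter are hypothetical, chosen only for this illustration), each group is cycled until it reaches the size of the largest group:

```python
# Sample rows copied from the question.
rows = [
    {"Indicator": 1, "Income": 7, "Matchid": 1},
    {"Indicator": 1, "Income": 8, "Matchid": 2},
    {"Indicator": 1, "Income": 4, "Matchid": 1},
    {"Indicator": 0, "Income": 6, "Matchid": 1},
    {"Indicator": 0, "Income": 9, "Matchid": 2},
]

def balance_with_replacement(rows, key="Indicator"):
    """Repeat rows of the smaller groups so every group has equal size."""
    # Split the rows into groups, keeping the original order within each.
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        for index in range(target):
            # index modulo group size wraps around once the group runs out
            balanced.append(members[index % len(members)])
    return balanced

for row in balance_with_replacement(rows):
    print(row["Indicator"], row["Income"], row["Matchid"])
```

For the sample data this repeats the first Indicator = 0 row (Income 6) once, which matches the desired output in the question.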

KDB+/Q: How to create a column that increments the occurrence of unique values of another column?

I am trying to create a column that increments the occurrence of unique (not the same as the previous) values in another column as such:
x y
=====
1 | 0
1 | 0
2 | 1
4 | 2
1 | 3
How could one achieve this functionality in kdb+?
Thanks
Does this work?
q)t:([]x:1 1 2 4 1)
q)update y:(sums 0b,1_differ x)from t
x y
---
1 0
1 0
2 1
4 2
1 3
differ looks at a list (or column of a table) and returns a list that is 1b in positions where the item is different to the item before that. It always starts with 1b though, so we have to drop the first element of the list using 1_ and add a 0b at the beginning with 0b,. Then we just take the running sum using sums.
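For readers who do not use q, the same two steps (flag the changes, then take a running sum) can be sketched in Python. This is only an analogue of the q expression above; the function name change_counter is made up for the illustration.

```python
from itertools import accumulate

def change_counter(xs):
    """Count, per position, how many times the value has changed so far."""
    if not xs:
        return []
    # Like 0b,1_differ x: the first flag is forced to False, later flags
    # are True where an item differs from the one before it.
    changed = [False] + [cur != prev for prev, cur in zip(xs, xs[1:])]
    # Like sums: running total of the flags.
    return list(accumulate(int(flag) for flag in changed))

print(change_counter([1, 1, 2, 4, 1]))  # [0, 0, 1, 2, 3]
```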

Calculated field in Tableau

I have a very simple problem, but I am totally new to Tableau, so I need some help solving it.
My data set contains
Year_Track_4,Year_Track_5,Year_Track_6,Year_Track_7,.... N
Each Year_Track column contains 1/0 values: 1 means graduated and 0 means did not graduate (failed).
y4 y5 N
1 8
0 5
1 6
0 1
1 2
1 5
1 7
1 8
1 5
0 7
1 5
1 8
1 6
1 1
So I want to create a placeholder in Tableau (a calculated field or a parameter) to select one year and count the number who graduated or did not graduate.
I need to create the same for OverAll_0 and OverAll_1 as one calculated field containing the values 1 and 0, so that I can use SUM(N) to calculate it.
I tried an IIF statement to solve this problem:
IIF(Year_Track_4 = 0) then 'graduated in 4 year '
.......
......

groupby functions to get subsequent value

In my data I have stock volumes by order sequence and time. I need to go through each part of an order and find when it ends, by grabbing the time of the next part of the chain.
I am just starting in Python, and I would do this by subsetting each stock into its own pool and then running another loop to find the time of the next order for that sequence. In R/Matlab you could ultimately write X$time[1:end-1] <- X$time[2:end,]
My question: can I use df.groupby['sequence'].{for each entry get the time from the subsequent entry}???
I think last() would give me the last value of that entire sequence; I would like the time at which the next entry of that sequence starts/appears.
I have data of this form:
sequence time
a 1
b 1
a 3
a 5
b 2
I would like
sequence time nexttime
a 1 3
b 1 2
a 3 5
a 5 999
b 2 999
In [24]: df
Out[24]:
sequence time
0 a 1
1 b 1
2 a 3
3 a 5
4 b 2
In [25]: df['nexttime'] = df.groupby('sequence').time.shift(-1).fillna(999)
In [26]: df
Out[26]:
sequence time nexttime
0 a 1 3
1 b 1 2
2 a 3 5
3 a 5 999
4 b 2 999
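To make it clearer what groupby('sequence').time.shift(-1).fillna(999) computes, here is a plain-Python sketch of the same idea without pandas. It is only an illustration: the function name next_time_per_group and the fill parameter are hypothetical, and the rows are given as (sequence, time) tuples.

```python
def next_time_per_group(records, fill=999):
    """For each (sequence, time) row, return the time of the next row
    with the same sequence, or `fill` if there is none."""
    next_seen = {}                    # latest time seen per sequence, scanning backwards
    result = [None] * len(records)
    for i in range(len(records) - 1, -1, -1):
        seq, time = records[i]
        result[i] = next_seen.get(seq, fill)
        next_seen[seq] = time
    return result

records = [("a", 1), ("b", 1), ("a", 3), ("a", 5), ("b", 2)]
print(next_time_per_group(records))  # [3, 2, 5, 999, 999]
```

Scanning from the end means each row only needs the most recent later time for its own sequence, which is the value a per-group shift by -1 would hand it.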