Merge two unequal data sets in SAS with replacement

I generated propensity scores in SAS to match two unequal groups with replacement. Now I'm trying to create a data set with an equal number of observations for both groups, i.e. some observations in group b (Indicator = 0 in the example) should repeat, since that is the smaller group. Below is synthetic data that demonstrates what I'm trying to get.
Indicator Income Matchid
1 7 1
1 8 2
1 4 1
0 6 1
0 9 2
And I want it to look like this:
Indicator Income Matchid
1 7 1
1 8 2
1 4 1
0 6 1
0 9 2
0 6 1

First, create a variable, group_seq, that numbers the rows within each indicator group starting at 0, so that it can be used with a modulus. Then, in a DATA step, load the two indicator groups into separate hashes keyed on group_seq. For each group, loop from 0 to the size of the largest group minus 1, and look up the row whose group_seq is the loop index modulo that group's size.
Example:
data have;
  input Indicator Income Matchid;
  datalines;
1 7 1
1 8 2
1 4 1
0 6 1
0 9 2
;
run;

/* number the rows within each indicator group: 0, 1, 2, ... */
data have_v;
  set have;
  by indicator notsorted;
  if first.indicator then group_seq = 0;
  else group_seq + 1;
run;

data want;
  /* never executes; only defines the variables for the hashes */
  if 0 then set have_v;

  /* load each indicator group into its own hash, keyed by group_seq */
  declare hash i1 (dataset:'have_v(where=(indicator=1))', ordered:'a');
  i1.defineKey('group_seq');
  i1.defineData(all:'yes');
  i1.defineDone();

  declare hash i0 (dataset:'have_v(where=(indicator=0))', ordered:'a');
  i0.defineKey('group_seq');
  i0.defineData(all:'yes');
  i0.defineDone();

  /* output as many rows per group as the largest group has, */
  /* cycling through each group with mod()                    */
  do index = 0 to max(i0.num_items, i1.num_items) - 1;
    group_seq = mod(index, i1.num_items);
    i1.find();
    output;
  end;

  do index = 0 to max(i0.num_items, i1.num_items) - 1;
    group_seq = mod(index, i0.num_items);
    i0.find();
    output;
  end;

  stop;
  drop index group_seq;
run;
If the two groups were in separate data sets, you could do similar processing with the SET statement options NOBS= and POINT=.
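The same cycling idea can be sketched in Python. Each group is extended to the size of the largest group by taking row `index mod group size`. This only illustrates the logic and is not SAS code. The function name `balance_groups` and the (Indicator, Income, Matchid) tuple layout are assumptions made for the example.

```python
def balance_groups(rows):
    """Repeat each group's rows cyclically until every group has as many
    rows as the largest group. rows are (Indicator, Income, Matchid) tuples."""
    groups = {}
    for row in rows:
        # group by Indicator, keeping first-seen order (dicts preserve insertion order)
        groups.setdefault(row[0], []).append(row)
    target = max(len(g) for g in groups.values())
    result = []
    for g in groups.values():
        for index in range(target):
            # index mod group size restarts the smaller group from its first row
            result.append(g[index % len(g)])
    return result


have = [(1, 7, 1), (1, 8, 2), (1, 4, 1), (0, 6, 1), (0, 9, 2)]
want = balance_groups(have)
for row in want:
    print(*row)
```

As in the hash version, each group comes out in its original order and then restarts from the top. That is why the extra Indicator = 0 row is the first one of that group (0 6 1).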


Consecutive episode

Good afternoon.
I have data like this:
ID Indicator
1 0
1 1
1 0
1 1
1 0
1 1
2 0
2 1
2 1
2 1
2 1
2 1
2 1
2 1
I need to find each ID that has at least 4 consecutive rows with Indicator = 1. In this example I should get ID = 2, since it has at least 4 consecutive rows with Indicator = 1. Please help me do this in SPSS Modeler. Thank you so much for your help. For ID 1, the indicators in rows 1 to 6 are 0, 1, 0, 1, 0, 1. For ID 2, the first indicator is 0 and all the others are 1. There are two columns, ID and Indicator; ID 1 has 6 rows and ID 2 has 8 rows.
To be precise: I want to output the ID that has 4 or more indicators set to 1 consecutively.
What you first need is a way to count the number of consecutive Indicator = 1 records for the same ID.
For this, you can use the "Derive" node with the following settings:
Set the 'Derive as' option to Count
Set the 'Increment when' to ID = @OFFSET(ID, 1) and INDICATOR = 1
Set the 'Increment by' to 1
Set the 'Reset when' to INDICATOR = 0
Following the 'Derive' node, you can then use a 'Select' node to only select the records where the number of consecutive 1's is equal to 4, and finally, use a 'Distinct' node to keep only one record for each ID.
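The counting logic can be checked against the sample data with a plain Python sketch. This is not SPSS Modeler code, and the function name `ids_with_run` is hypothetical. The sketch also resets the count whenever the ID changes. That reset is an assumption about the intended behaviour, because the Derive settings above check the ID only in the increment condition.

```python
def ids_with_run(rows, k=4):
    """Return the IDs that have at least k consecutive rows with Indicator == 1.

    rows: (id, indicator) pairs in their original order.
    Assumption: the running count restarts when the ID changes.
    """
    found = []
    count = 0
    prev_id = None
    for rid, ind in rows:
        if rid != prev_id:
            count = 0                      # new ID: start counting again
        count = count + 1 if ind == 1 else 0  # Indicator = 0 resets the count
        if count == k and rid not in found:   # like Select (count = 4) + Distinct
            found.append(rid)
        prev_id = rid
    return found


data = [(1, 0), (1, 1), (1, 0), (1, 1), (1, 0), (1, 1),
        (2, 0), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1)]
print(ids_with_run(data))
```

Checking for a count equal to k, rather than greater than or equal to k, matches the Select node above. Every run of k or more ones passes through exactly k once.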
I have shared a sample stream that shows the process here.

Qlik Sense - Create bar chart by summing multiple columns

I have several columns of data that are numeric and I want Qlik to create a boxplot by Group which is the sum of each column:
Group Data1 Data2 Data3 Data4 Data5 Data6
1 0 2 0 0 1 0
1 1 3 1 0.5 0 1
2 2 4 0 0 0 0
2 3 5 0 0 0 2
3 3 5 0 0 0 2
If Qlik took the sum of each column, it would look like this:
Obviously, you could choose which groups to plot. The problem is that I'm trying to plot multiple columns in a single boxplot, and I can't find anything similar. I don't know where to begin with this, so any help would be appreciated.
--- update ---
The problem is that boxplots are generally created from a single column, to which an aggregate function (sum, count, etc.) is applied. So a typical boxplot would use Group and then count the number of times each group occurs. In my case, I need a single chart with the sum of each Data column (Data1 to Data6) plotted by group. So my problem is that the single boxplot is derived from several columns, not one.
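For reference, a short Python sketch can compute the numbers such a chart would display (one sum per Data column for each Group) from the sample table. It uses no Qlik API and only shows the aggregation. In Qlik Sense this would typically mean Group as the dimension plus one measure per column, such as Sum(Data1) through Sum(Data6). That chart setup is an assumption here, not something the question confirms.

```python
# Sample table from the question: (Group, [Data1, ..., Data6])
rows = [
    (1, [0, 2, 0, 0, 1, 0]),
    (1, [1, 3, 1, 0.5, 0, 1]),
    (2, [2, 4, 0, 0, 0, 0]),
    (2, [3, 5, 0, 0, 0, 2]),
    (3, [3, 5, 0, 0, 0, 2]),
]


def sums_by_group(rows):
    """Sum every Data column separately within each Group."""
    totals = {}
    for group, values in rows:
        acc = totals.setdefault(group, [0] * len(values))
        for i, v in enumerate(values):
            acc[i] += v
    return totals


totals = sums_by_group(rows)
for group, sums in totals.items():
    print(group, sums)
```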

Calculated field in Tableau

I have a very simple problem, but I am totally new to Tableau, so I need some help solving it.
My data set contains:
Year_Track_4,Year_Track_5,Year_Track_6,Year_Track_7,.... N
Each Year_Track column contains 1/0 values: 1 means graduated and 0 means did not graduate or failed.
y4 y5 N
1 8
0 5
1 6
0 1
1 2
1 5
1 7
1 8
1 5
0 7
1 5
1 8
1 6
1 1
So I want to create a placeholder in Tableau (a calculated field or a parameter) that selects one year and counts the number who graduated or did not graduate.
I need to do the same for OverAll_0 and OverAll_1 as a single calculated field containing the values 1 and 0, so that I can use SUM(N) to calculate it.
I tried to solve this with an IIF statement:
IIF([Year_Track_4] = 1, 'graduated in 4 years', ...)
.......
......
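Here is a minimal Python sketch of the intended calculation. It picks one year, then sums N separately for rows flagged 1 (graduated) and 0 (did not graduate) in that year's Year_Track column. The sample rows and the function name `graduation_counts` are hypothetical, because the table above does not clearly show which value belongs to which column. In Tableau, the `year` argument would play the role of a parameter.

```python
# Hypothetical rows: 0/1 Year_Track_* flags plus the count N
rows = [
    {"Year_Track_4": 1, "Year_Track_5": 0, "N": 8},
    {"Year_Track_4": 0, "Year_Track_5": 1, "N": 5},
    {"Year_Track_4": 1, "Year_Track_5": 1, "N": 6},
]


def graduation_counts(rows, year):
    """Sum N for graduated (flag 1) and not graduated (flag 0) in the chosen year."""
    column = f"Year_Track_{year}"
    totals = {"graduated": 0, "not graduated": 0}
    for row in rows:
        label = "graduated" if row[column] == 1 else "not graduated"
        totals[label] += row["N"]   # the SUM(N) part
    return totals


print(graduation_counts(rows, 4))
```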

Subsetting rows from Matlab for which specific column has value greater than zero

I want to subset the rows of a matrix for which the value in the third column is greater than zero. For example, I have a matrix:
test =
1 2 3
4 5 0
4 4 1
4 4 0
Now I want to subset it so that I have
subset =
1 2 3
4 4 1
Any quick suggestion on how I can do this in matlab?
Simply make a logical array that is true for every row you want to keep, and pass it as the index to the rows:
subset = test(test(:,3)>0, :)
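For comparison, here is the same logical-indexing idea in plain Python. It builds one True/False value per row from the third column, then keeps only the rows marked True. MATLAB's column 3 is index 2 in Python.

```python
test = [
    [1, 2, 3],
    [4, 5, 0],
    [4, 4, 1],
    [4, 4, 0],
]

# One True/False per row, like test(:,3)>0 in MATLAB
mask = [row[2] > 0 for row in test]
# Keep only the rows whose mask entry is True, like test(mask, :)
subset = [row for row, keep in zip(test, mask) if keep]
print(subset)
```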

How to select random rows from a matrix and take their elements sequentially, without selecting a row again once all its elements have been taken

Let's say I have this matrix:
m =
3 1 2 4 6 5
2 3 5 6 1 4
3 4 6 1 2 5
2 1 3 4 5 6
3 2 5 6 1 4
2 4 6 1 5 3
which has 6 rows and 6 columns.
I want to select rows at random and take the elements of each row sequentially: the first time a row is selected, its first element is taken. If all the elements in row 5 have been taken and row 5 is selected at random again, I don't want the program to select from it again.
For example, if random iteration 1 selects row 1, it takes the first element of row 1, which is in column 1. If iteration 2 selects row 1 again, it takes the second element of row 1, which is in column 2. Suppose I have reached column 6 in row 1 and a later iteration selects row 1 again. Because my matrix has only 6 columns, I want to select at random another row that has not yet reached the sixth column.
Let's say that each time a row is selected, I set the value in that column to one. So if I run 20 iterations:
JM =
1 1 1 1 1 0
1 1 0 0 0 0
1 1 1 1 0 0
1 1 0 0 0 0
1 1 1 0 0 0
1 1 1 1 0 0
By luck, I didn't reach the sixth column in any of the rows.
But if:
JM =
1 1 1 1 1 0
1 1 1 1 0 0
1 1 0 0 0 0
1 1 1 1 1 1
1 0 0 0 0 0
1 1 0 0 0 0
I get this error:
Attempted to access m(5,7); index out of bounds because size(m)=[6,6].
How can I continue the random selection without returning to a row that is already filled, like row 4 above?
I hope this is easy to understand.
I am using this method to create a chromosome, which is in the form
m, row, column = machine, job, operation
Thanks. I am using MATLAB.
I can see three slightly different methods to achieve this.
Create an array of integers that stores how many elements you have already taken from each row. Let's assume your matrix is [n*n], so the array will be [n]; let's call it a[n].
First method:
1. Generate a random row number rNumber.
2. Check whether you can take an element from this row (if (a[rNumber] < n)). If you can't, go back to (1); otherwise continue to (3).
3. Take the element m[rNumber][a[rNumber]] and increment a[rNumber].
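Here is a minimal Python sketch of the first method (retry until a row that is not full comes up), applied to the matrix from the question. Python is used only for illustration. The function name `method1`, the seeded `random.Random` generator and the `JM` reconstruction are assumptions for the example, and indices are 0-based.

```python
import random

# The 6 x 6 matrix from the question
M = [
    [3, 1, 2, 4, 6, 5],
    [2, 3, 5, 6, 1, 4],
    [3, 4, 6, 1, 2, 5],
    [2, 1, 3, 4, 5, 6],
    [3, 2, 5, 6, 1, 4],
    [2, 4, 6, 1, 5, 3],
]


def method1(m, steps, rng):
    """Draw a random row; if it is already full, draw again."""
    n_rows, n_cols = len(m), len(m[0])
    a = [0] * n_rows          # a[r] = number of elements already taken from row r
    taken = []                # (row, element) pairs in selection order
    for _ in range(min(steps, n_rows * n_cols)):  # never ask for more than exist
        while True:
            r = rng.randrange(n_rows)   # step 1: random row number
            if a[r] < n_cols:           # step 2: row still has elements left
                break                   # otherwise go back to step 1
        taken.append((r, m[r][a[r]]))   # step 3: next element of that row
        a[r] += 1
    return taken, a


taken, a = method1(M, 20, random.Random(0))
# JM marks the columns already used in each row, as in the question
JM = [[1 if c < a[r] else 0 for c in range(len(M[0]))] for r in range(len(M))]
for row in JM:
    print(*row)
```

On each draw, every row that still has elements is equally likely. Near the end, though, when most rows are full, many draws are wasted.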
Second method:
1. Generate a random row number rNumber.
2. Check whether you can take an element from this row (if (a[rNumber] < n)).
3. If you can't, increment rNumber (wrapping around to 0 after the last row) and go back to (2); otherwise continue to (4).
4. Take the element m[rNumber][a[rNumber]] and increment a[rNumber].
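The second method replaces the retry with a move to the next row. Here is a Python sketch under the same assumptions: the function name is hypothetical, indices are 0-based, and the scan wraps around from the last row to row 0.

```python
import random

# The 6 x 6 matrix from the question
M = [
    [3, 1, 2, 4, 6, 5],
    [2, 3, 5, 6, 1, 4],
    [3, 4, 6, 1, 2, 5],
    [2, 1, 3, 4, 5, 6],
    [3, 2, 5, 6, 1, 4],
    [2, 4, 6, 1, 5, 3],
]


def method2(m, steps, rng):
    """Draw a random row; while it is full, move on to the next row
    (wrapping around after the last row)."""
    n_rows, n_cols = len(m), len(m[0])
    a = [0] * n_rows          # a[r] = number of elements already taken from row r
    taken = []                # (row, element) pairs in selection order
    for _ in range(min(steps, n_rows * n_cols)):
        r = rng.randrange(n_rows)       # step 1: random row number
        while a[r] >= n_cols:           # steps 2-3: row is full, try the next one
            r = (r + 1) % n_rows
        taken.append((r, m[r][a[r]]))   # step 4: next element of that row
        a[r] += 1
    return taken, a


taken, a = method2(M, 20, random.Random(0))
print(a)
```

Each draw needs at most n checks, but the selection is not uniform. A row that directly follows one or more full rows is picked more often, because it also receives their draws.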
The last method also needs to store rawCount, the number of rows from which you have already taken all the elements.
1. Generate a random row number rNumber between 0 and n-rawCount-1.
2. Walk through the rows, skipping every row that is already full (a[currentRaw] = n), until you reach the non-full row at position rNumber among the non-full rows (counting from 0); call it currentRaw.
3. Take the element m[currentRaw][a[currentRaw]] and increment a[currentRaw].
4. If a[currentRaw] = n, increment rawCount.
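The third method draws only among the rows that are not yet full, so no draw is wasted. Here is a Python sketch in which `row_count` stands for rawCount and `current` for currentRaw. Both are renamed for illustration, and indices are 0-based.

```python
import random

# The 6 x 6 matrix from the question
M = [
    [3, 1, 2, 4, 6, 5],
    [2, 3, 5, 6, 1, 4],
    [3, 4, 6, 1, 2, 5],
    [2, 1, 3, 4, 5, 6],
    [3, 2, 5, 6, 1, 4],
    [2, 4, 6, 1, 5, 3],
]


def method3(m, steps, rng):
    """Draw uniformly among the rows that are not full yet."""
    n_rows, n_cols = len(m), len(m[0])
    a = [0] * n_rows          # a[r] = number of elements already taken from row r
    row_count = 0             # number of rows already full (rawCount)
    taken = []                # (row, element) pairs in selection order
    for _ in range(min(steps, n_rows * n_cols)):
        r_number = rng.randrange(n_rows - row_count)  # 0 .. n-rawCount-1
        # Walk the rows, counting only non-full ones, until the r_number-th one
        seen = -1
        for current in range(n_rows):
            if a[current] < n_cols:
                seen += 1
                if seen == r_number:
                    break
        taken.append((current, m[current][a[current]]))
        a[current] += 1
        if a[current] == n_cols:        # this row just became full
            row_count += 1
    return taken, a


taken, a = method3(M, 20, random.Random(0))
print(a)
```

Like the first method, each draw is uniform over the remaining rows. The cost per draw, however, is a single pass over the rows instead of an unbounded number of retries.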
Here is something I wrote in C++. I hope this is what you were looking for; don't hesitate to ask if you don't understand something.
#include <fstream>
#include <ctime>
#include <cstdlib>  // srand, rand
#include <iostream>
using namespace std;

ifstream f("stack2.in");
ofstream g("stack2.out");
int a[10][10];  // n and m must not exceed 10
int solution[100], n, m, count_taken = 0;  // not named 'index': that clashes with POSIX index()

void build(int n, int m)  // create an n x m matrix and fill it with the values 1..n*m
{
    int zz = 1;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
        {
            a[i][j] = zz;
            zz++;
        }
}

void solve()
{
    srand((unsigned)time(0));
    int i, j;  // line (row) and column
    while (count_taken < m * n)
    {
        i = rand() % n;  // random line
        j = rand() % m;  // random column
        if (a[i][j] != 0)  // a non-zero value has not been taken yet, so add it to the solution list
        {
            solution[count_taken] = a[i][j];
            count_taken++;
            a[i][j] = 0;  // set matrix[i][j] to 0 so we don't 'visit' it again
        }
        else  // already taken: search the rest of line i for a value that is not 0
        {
            while (j + 1 < m)  // stop before running past the last column
            {
                if (a[i][j + 1] != 0)  // matrix[i][j+1] not visited yet: take it, zero it and exit the while
                {
                    solution[count_taken] = a[i][j + 1];
                    a[i][j + 1] = 0;
                    count_taken++;
                    j = m;  // exit the while
                }
                else
                    j++;  // keep searching for a value on that line
            }
        }
    }
    for (int k = 0; k < m * n; k++)  // print the values in the order they were picked
        g << solution[k] << " ";
}

int main()
{
    n = 10;  // or read the values from stack2.in using f >> n >> m
    m = 10;  // the file should contain the number of lines and columns on one line, e.g.: 5 7
    build(n, m);
    solve();
    return 0;
}