I have several data files and totals within those data files that I need to re-calculate.
The variables are broken out by race/ethnicity * sex and then a total is given.
The pattern is repeated for several measures and I cannot re-structure the data files. I have to keep the structure intact.
UPDATED: For example below are the first 32 variables (and 10 rows of data) in one of the files -- Hispanic males, Hispanic females, American Indian males, American Indian females....total males, and total females for grade 8 and then grade 9.
I have over 100 of these totals to compute, so I want to automate the process. How can I select the 7 preceding variables that end in _M or _F and sum them (or something to that effect)? TIA!!!
G08_HI_M G08_HI_F G08_AM_M G08_AM_F G08_AS_M G08_AS_F G08_HP_M G08_HP_F G08_BL_M G08_BL_F G08_WH_M G08_WH_F G08_TR_M G08_TR_F TOT_G08_M TOT_G08_F G09_HI_M G09_HI_F G09_AM_M G09_AM_F G09_AS_M G09_AS_F G09_HP_M G09_HP_F G09_BL_M G09_BL_F G09_WH_M G09_WH_F G09_TR_M G09_TR_F TOT_G09_M TOT_G09_F
5 2 9 6 2 3 6 9 7 4 1 4 8 4 . . 7 11 2 13 4 2 14 10 10 13 2 11 9 5 . .
7 1 8 10 2 4 8 0 1 2 8 3 4 5 . . 7 13 12 13 5 15 3 2 2 13 11 15 3 15 . .
7 8 10 9 0 4 7 9 8 0 3 10 7 1 . . 15 9 11 9 11 9 6 7 14 9 12 8 6 14 . .
4 8 9 0 10 6 4 3 10 9 2 5 8 2 . . 13 2 5 13 3 14 5 15 10 15 7 11 9 6 . .
7 6 5 1 4 5 7 4 5 1 8 3 4 4 . . 9 7 7 2 4 8 3 4 3 10 9 8 7 7 . .
3 1 0 2 4 10 2 10 5 9 7 1 8 8 . . 7 9 5 7 13 6 12 13 10 6 2 13 3 12 . .
5 7 4 1 7 9 6 8 3 1 3 2 10 4 . . 14 12 8 5 6 2 2 5 6 4 12 6 4 5 . .
8 9 3 2 3 10 6 5 9 10 8 1 4 5 . . 10 2 3 8 3 15 3 14 9 14 3 12 4 12 . .
4 3 2 6 4 1 2 5 5 6 4 5 4 1 . . 3 14 12 12 15 10 14 11 5 8 9 14 7 15 . .
1 10 4 2 1 3 9 8 3 3 3 0 3 1 . . 12 9 5 7 14 9 13 9 6 14 5 7 13 13 . .
Is this your data? You mention you cannot change the data structure, so you will need to "program the variable names". See below, where I show an example of creating arrays (variable lists) that group the names by grade and sex, as implied by your TOT_: variables.
data G;
input G08_HI_M G08_HI_F G08_AM_M G08_AM_F G08_AS_M G08_AS_F G08_HP_M G08_HP_F
G08_BL_M G08_BL_F G08_WH_M G08_WH_F G08_TR_M G08_TR_F TOT_G08_M TOT_G08_F
G09_HI_M G09_HI_F G09_AM_M G09_AM_F G09_AS_M G09_AS_F G09_HP_M G09_HP_F
G09_BL_M G09_BL_F G09_WH_M G09_WH_F G09_TR_M G09_TR_F TOT_G09_M TOT_G09_F;
cards;
5 2 9 6 2 3 6 9 7 4 1 4 8 4 . . 7 11 2 13 4 2 14 10 10 13 2 11 9 5 . .
7 1 8 10 2 4 8 0 1 2 8 3 4 5 . . 7 13 12 13 5 15 3 2 2 13 11 15 3 15 . .
7 8 10 9 0 4 7 9 8 0 3 10 7 1 . . 15 9 11 9 11 9 6 7 14 9 12 8 6 14 . .
4 8 9 0 10 6 4 3 10 9 2 5 8 2 . . 13 2 5 13 3 14 5 15 10 15 7 11 9 6 . .
7 6 5 1 4 5 7 4 5 1 8 3 4 4 . . 9 7 7 2 4 8 3 4 3 10 9 8 7 7 . .
3 1 0 2 4 10 2 10 5 9 7 1 8 8 . . 7 9 5 7 13 6 12 13 10 6 2 13 3 12 . .
5 7 4 1 7 9 6 8 3 1 3 2 10 4 . . 14 12 8 5 6 2 2 5 6 4 12 6 4 5 . .
8 9 3 2 3 10 6 5 9 10 8 1 4 5 . . 10 2 3 8 3 15 3 14 9 14 3 12 4 12 . .
4 3 2 6 4 1 2 5 5 6 4 5 4 1 . . 3 14 12 12 15 10 14 11 5 8 9 14 7 15 . .
1 10 4 2 1 3 9 8 3 3 3 0 3 1 . . 12 9 5 7 14 9 13 9 6 14 5 7 13 13 . .
;;;;
run;
You can do something like this to create arrays that can be used in SAS statistical functions.
proc transpose data=g(obs=0 drop=tot_:) out=gnames;
var _all_;
run;
data gnames;
set gnames;
g = input(substrn(_name_,2,3),2.);
length race $2; race=scan(_name_,2,'_');
length sex $1; sex =scan(_name_,3,'_');
run;
proc sort;
by g sex;
run;
proc print;
run;
data _null_;
set gnames;
by g sex;
if first.sex then put +3 'Array G' g z2. sex $1. '[*] ' @;
put _name_ @;
if last.sex then put ';';
run;
The log then contains the generated array statements:
Array G08F[*] G08_HI_F G08_AM_F G08_AS_F G08_HP_F G08_BL_F G08_WH_F G08_TR_F ;
Array G08M[*] G08_HI_M G08_AM_M G08_AS_M G08_HP_M G08_BL_M G08_WH_M G08_TR_M ;
Array G09F[*] G09_HI_F G09_AM_F G09_AS_F G09_HP_F G09_BL_F G09_WH_F G09_TR_F ;
Array G09M[*] G09_HI_M G09_AM_M G09_AS_M G09_HP_M G09_BL_M G09_WH_M G09_TR_M ;
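As a language-neutral illustration of the grouping that the generated array statements express, here is a minimal Python sketch (Python is used only for illustration; it is not part of the SAS answer). It assumes every name has the form grade_race_sex, and the `names` list is a shortened, hypothetical subset of the columns in the question.

```python
from collections import defaultdict

# Hypothetical subset of the column names from the question (totals excluded).
names = ["G08_HI_M", "G08_HI_F", "G08_AM_M", "G08_AM_F",
         "G09_HI_M", "G09_HI_F", "G09_AM_M", "G09_AM_F"]

groups = defaultdict(list)
for name in names:
    grade, race, sex = name.split("_")   # e.g. "G08", "HI", "M"
    groups[grade + sex].append(name)     # key mirrors the array name, e.g. "G08M"

for key in sorted(groups):
    print(key, groups[key])
```

Each key corresponds to one array statement, and each list holds the variables that would feed one total.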
It seems the totals are interspersed among the variables to be summed, so each total can be computed as the sum of every variable since the last total that meets some criterion, such as ending in '_F'.
This can be done as shown below. I used a simplified data set, but the totals are the sum of every variable since the last total for each sex. I used proc contents to get the variable list, then went down that list building one sum expression for males and one for females. When a variable whose name contains tot is encountered, a finished line of the form tot1_M=sum(var1_M,var2_M,var3_M); is output. These lines are collected in the macro variable totals and inserted into a data step.
If you know it is always 7 variables for males, 7 for females, and then the totals, you can rely on position alone rather than names; there is an easier solution for that case further below.
data old;
var1_M=1;
var1_F=2;
var2_M=3;
var2_F=4;
var3_M=5;
var3_F=6;
tot1_M=.;
tot1_F=.;
var4_M=7;
var4_F=8;
var5_M=9;
var5_F=10;
var6_M=11;
var6_F=12;
tot2_M=.;
tot2_F=.;
run;
proc contents data=old out=contents noprint;
run;
proc sort data=contents;
by varnum;
run;
data temp;
set contents;
length sumline_F sumline_M $400;
if _n_=1 then do;
sumline_M="sum(";
sumline_F="sum(";
end;
retain sumline_M sumline_F;
if find(name, "_M")>0 and find(name,"tot")=0 then sumline_M=cat(strip(sumline_M),strip(name), ", ");
else if find(name, "_F")>0 and find(name,"tot")=0 then sumline_F=cat(strip(sumline_F), strip(name), ", ");
if find(name,"tot")>0 and find(name,"_M")>0 then do;
sumline_M=substr(sumline_M,1, length(sumline_M)-1);
finline=cat(strip(name), "=", strip(sumline_M),");");
sumline_M="sum(";
end;
if find(name,"tot")>0 and find(name,"_F")>0 then do;
sumline_F=substr(sumline_F,1, length(sumline_F)-1);
finline=cat(strip(name), "=", strip(sumline_F),");");
sumline_F="sum(";
end;
run;
proc sql noprint;
select finline
into :totals separated by " "
from temp
where not missing(finline);
quit;
data new;
set old;
&totals;
run;
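The name-walking logic of this answer can be sketched in Python, purely as an illustration of the idea rather than a replacement for the SAS code. The helper `build_total_statements` is hypothetical, and it assumes that every name ends in M or F and that total variables start with "tot" (the SAS code above instead looks for "tot" anywhere in the name).

```python
def build_total_statements(names):
    """Return one 'total=sum(...)' string per total variable, in order."""
    pending = {"M": [], "F": []}      # variables seen since the last total, per sex
    statements = []
    for name in names:
        sex = name[-1].upper()        # assumption: every name ends in M or F
        if name.lower().startswith("tot"):
            statements.append(f"{name}=sum({','.join(pending[sex])});")
            pending[sex] = []         # start collecting for the next total
        else:
            pending[sex].append(name)
    return statements

# Made-up variable list in data-set order, similar to the simplified example.
names = ["var1_M", "var1_F", "var2_M", "var2_F", "var3_M", "var3_F",
         "tot1_M", "tot1_F", "var4_M", "var4_F", "tot2_M", "tot2_F"]
statements = build_total_statements(names)
for line in statements:
    print(line)
```

The printed lines have the same shape as the statements collected in the macro variable totals.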
If the order is always the same (always Male-female), you could go like this:
/* Defining data. Note that _M _F are always alternating, with no variables missing*/
data old;
var1_M=1;
var1_F=2;
var2_M=3;
var2_F=4;
var3_M=5;
var3_F=6;
var4_M=5;
var4_F=6;
var5_M=5;
var5_F=6;
var6_M=5;
var6_F=6;
var7_M=5;
var7_F=6;
tot1_M=.;
tot1_F=.;
var8_M=7;
var8_F=8;
var9_M=9;
var9_F=10;
var10_M=11;
var10_F=12;
var11_M=11;
var11_F=12;
var12_M=11;
var12_F=12;
var13_M=11;
var13_F=12;
var14_M=11;
var14_F=12;
tot2_M=.;
tot2_F=.;
run;
/* We have 7 _M and 7 _F variables, so the first sum variable is number 15 and the next is number 16. Adding 16 gives the positions of the following sum variables. */
data totals;
do i=15 to 200 by 16;
output;
end;
do i=16 to 200 by 16;
output;
end;
run;
/* Puts the index of the sum variables into a macro variable*/
proc sql;
select i
into :sumvars separated by " "
from totals;
/* Loop variables using an array. If it is a sum variable, it's the sum of the 7 last variables, skipping every other.*/
data new;
set old;
array vars{*} _all_;
do i=1 to dim(vars);
if i in (&sumvars) then do;
vars{i}=sum(vars{i-2}, vars{i-4}, vars{i-6}, vars{i-8}, vars{i-10}, vars{i-12}, vars{i-14});
end;
end;
drop i;
run;
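The position-based idea can also be sketched in Python for illustration. The helper `fill_totals` is hypothetical; it assumes each block is laid out as M, F, M, F, ... followed by the male and female totals, and that there are no missing inputs (SAS sum() skips missing values, while this sketch does not handle them). The sample row is the grade 8 part of the first data row in the question.

```python
def fill_totals(row, per_sex=7):
    """Fill the total slots of a row laid out as M,F,...,M,F,TOT_M,TOT_F, repeated."""
    block = 2 * per_sex + 2                     # 14 values plus 2 totals = 16
    out = list(row)
    for start in range(0, len(out) - block + 1, block):
        males = out[start:start + 2 * per_sex:2]        # every other value, starting at M
        females = out[start + 1:start + 2 * per_sex:2]  # every other value, starting at F
        out[start + 2 * per_sex] = sum(males)
        out[start + 2 * per_sex + 1] = sum(females)
    return out

# Grade 8 values from the first data row; None marks the empty totals.
row = [5, 2, 9, 6, 2, 3, 6, 9, 7, 4, 1, 4, 8, 4, None, None]
filled = fill_totals(row)
print(filled[14], filled[15])   # male total, female total
```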
I have a function that takes 2 parameters: date and sym. I would like to do this for multiple dates and multiple sym. I have a list for each parameter. I can currently loop through 1 list using
raze function[2020.07.07;] peach symlist
How can I do something similar but looping through the list of dates too?
You may try the following:
Create a list of pairs of input parameters.
Write an anonymous function which calls your function, and use peach on the list of paired parameters.
For example:
symlist: `A`B`C; // symlist defined for testing
function: {(x;y)}; // function defined for testing
raze {function . x} peach (2020.07.07 2020.07.08 2020.07.09) cross symlist
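For readers less familiar with q, the same pattern (build all date/sym pairs, then apply the function to each pair in parallel) can be sketched in Python. This is only a loose analogy: `function` is a stand-in like the test function above, the dates are plain strings, and a thread pool is assumed in place of peach.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def function(date, sym):
    # Stand-in for the real two-argument function.
    return (date, sym)

dates = ["2020.07.07", "2020.07.08"]
symlist = ["A", "B", "C"]

pairs = list(product(dates, symlist))   # roughly `dates cross symlist`
with ThreadPoolExecutor(max_workers=4) as pool:
    # Apply function to each pair, roughly `{function . x} peach pairs`.
    results = list(pool.map(lambda p: function(*p), pairs))

print(results)
```

`pool.map` returns results in the order of `pairs`, so the output lines up with the input pairs.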
I think this could work:
raze function'[2020.07.07 2020.07.08 2020.07.09;] peach symlist
If not, here are some more things to consider. Could you change your function to accept a sym list instead of individual syms by including an each/peach inside it? Then you could each over the dates.
Alternatively, you could pair each date with the symlist and write a new function that takes such a pair, separates its elements, and does whatever the initial function did.
q)dates
2020.08.31 2020.09.01 2020.09.02
q)sym
`llme`obpb`dhca`mhod`mgpg`jokg`kgnd`nhke`oofi`fnca`jffe`hjca`mdmc
q)func
{[date;syms]string[date],/:string peach syms}
q)func2
{[list]func[list 0;list 1]}
q)\t res1:func[;sym]each dates
220
q)\t res2:func[;sym]peach dates
102
q)
q)func2
{[list]func[list 0;list 1]}
q)dateSymList:dates,\:enlist sym
q)\t res3:func2 peach dateSymList
80
q)res3~res2
1b
q)res3~res1
1b
Let us know if any of those solutions work, thanks.
Some possible ways to do this:
You can turn the dyadic f into a monadic function with (f .) and parallelise over a list of argument pairs:
q)a:"ABC";b:til 3;f:{(x;y)}
q)\s 4
q)(f .)peach l:raze a,\:/:b
"A" 0
"B" 0
"C" 0
"A" 1
"B" 1
"C" 1
"A" 2
"B" 2
"C" 2
Or you could define the function to take a dictionary argument and parallelise over a table:
q)f:{x`c1`c2}
q)f peach flip`c1`c2!flip l
"A" 0
"B" 0
"C" 0
"A" 1
"B" 1
"C" 1
"A" 2
"B" 2
"C" 2
Jason
I'll generalize everything. Suppose you have a given function foo which operates on an atom dt and a vector s:
q)foo:{[dt;s] dt +\: s}
q)dt:10?10
q)s:100?10
q)dt
8 1 9 5 4 6 6 1 8 5
q)s
4 9 2 7 0 1 9 2 1 8 8 1 7 2 4 5 4 2 7 8 5 6 4 1 3 3 7 8 2 1 4 2 8 0 5 8 5 2 8..
q)foo[;s] each dt
12 17 10 15 8 9 17 10 9 16 16 9 15 10 12 13 12 10 15 16 13 14 12 9 11 11 ..
5 10 3 8 1 2 10 3 2 9 9 2 8 3 5 6 5 3 8 9 6 7 5 2 4 4 ..
13 18 11 16 9 10 18 11 10 17 17 10 16 11 13 14 13 11 16 17 14 15 13 10 12 12 ..
9 14 7 12 5 6 14 7 6 13 13 6 12 7 9 10 9 7 12 13 10 11 9 6 8 8 ..
The solution is to project the symList over the function in question, then use each (or peach) for the date variable.
If your function requires an atomic date and sym, then you can just create a new function to implement this
q)bar:{[x;y] foo[x;] each y};
datelist:`date$10?10
symlist:10?`IBM`MSFT`GOOG
function:{0N!(x;y)}
{.[function;x]} each datelist cross symlist
I would like to explore a better way to apply a binary function that iterates over each element of its two arguments. Let me make the question simpler by using the function below as an example:
func:{x+y}
a:til 10
q)a
0 1 2 3 4 5 6 7 8 9
b:a
q)b:a
q)b
0 1 2 3 4 5 6 7 8 9
What I want to get is the cross product, such that each element of one argument is combined with each element of the other and the function is applied to every pair. My expected result is:
0 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 10 2 3 4 5 6 7 8 9 10 11 3 4 5 6 7 8 9 10 11 12 4 5 6 7 8 9 10 11 12 13 5 6 7 8 9 10 11 12 13 14 6 7 8 9 10 11 12 13 14 15 7 8 9 10 11 12 13 14 15 16 8 9 10 11 12 13 14 15 16 17 9 10 11 12 13 14 15 16 17 18
My current solution is to cross the lists of arguments first:
(func/) each (a cross b)
I wonder whether there is a better way to do that? Simply using func'[a;b] just gives a pairwise result, which is not what I want.
The following should be what you are looking for:
a +/:\: b
The same can apply for other defined functions too, for example:
a {x mod y}/:\: b
You do not need cross for this, just each-right or each-left. Because + is a vector function, you can iterate over one list and pass the other list as a whole vector.
q) a+/:b
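To make the relationship between the two approaches concrete, here is a small Python sketch (for illustration only, not q). It assumes the same `func`, `a` and `b` as in the question and shows that applying the function over every combination gives a 10 by 10 table whose flattened form matches the cross-based result.

```python
def func(x, y):
    return x + y

a = list(range(10))
b = list(range(10))

# One row per element of a, one column per element of b (like a func/:\: b).
table = [[func(x, y) for y in b] for x in a]

# Flattening the table row by row (like raze in q).
flat = [value for row in table for value in row]

# Applying func to every pair in the cross product (like (func/) each a cross b).
via_cross = [func(x, y) for x in a for y in b]

print(flat == via_cross)
```

The table form keeps the row structure; flatten it only if a single list is really what you need.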
I currently have a 3 by 3 matrix "m":
1 2 3
4 5 6
7 8 9
I would like to add a row to matrix 'm' to have a resultant matrix of:
1 2 3
4 5 6
7 8 9
10 11 12
A matrix in q is just a list of lists where inner lists represent rows.
m: ((1 2 3);(4 5 6);(7 8 9))
In order to add one more row all you have to do is add one more inner list to it:
m: m,enlist 10 11 12
enlist is important here, without it you'll end up with this:
q)((1 2 3);(4 5 6);(7 8 9)),10 11 12
1 2 3
4 5 6
7 8 9
10
11
12
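The same distinction exists for nested lists in other languages. As a rough Python analogy (not q), wrapping the new row in an outer list plays the role of enlist: joining the wrapped row appends one row, while joining the bare row appends its three numbers as separate items.

```python
m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
new_row = [10, 11, 12]

with_row = m + [new_row]    # wrapped: one extra row, 4 rows in total
flattened = m + new_row     # bare: three extra items, 6 items in total

print(with_row)
print(flattened)
```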
I agree; using 0N!x to view the structure is very useful.
To achieve what you want, you can simply do:
q)show m:3 cut 1+til 9 /create matrix
1 2 3
4 5 6
7 8 9
q)show m,:10 11 12 /join new 'row'
1 2 3
4 5 6
7 8 9
10 11 12
q)
I have a matrix A and a vector x laid out as follows (left side),
where S0, H0, ... are the row counts of each block. I want to rearrange these blocks so that S0 and S1, and H0 and H1, are next to each other, as on the right side. This is my code:
S0=3;
H0=2;
N0=2;
S1=4;
H1=5;
N1=4;
Cols=5;
Rows=S0+H0+N0+S1+H1+N1;
A=randi(10,[ Rows Cols]);
x=randi(10,[Rows 1]);
%% Exchange two block
temp=A(S0+H0+1:S0+H0+N0,1:end);
A(S0+H0+1:S0+H0+H1,1:end)=A(S0+H0+N0+S1+1:S0+H0+N0+S1+H1,1:end);
A(S0+H0+N0+S1+1:S0+H0+N0+S1+H1,1:end)=temp;
%% How exchange x
The above code does not work. How can I fix it in MATLAB? Thanks in advance.
One approach with mat2cell and cell2mat -
grps = [S0,H0,N0,S1,H1,N1]
new_pattern = [1 4 2 5 3 6]
celldata_roworder = mat2cell((1:size(A,1))',grps);
newx = cell2mat(celldata_roworder(new_pattern)).';
newA = A(newx,:)
Sample run -
Input :
A =
6 8 9 8 7
4 8 8 3 4
3 8 2 1 10
5 2 6 8 3
5 7 4 7 7
4 5 6 8 7
6 3 4 7 4
8 1 5 5 2
5 9 2 4 1
5 2 3 9 5
2 2 1 4 2
1 7 10 9 8
3 9 7 8 4
4 6 10 9 9
7 8 2 6 8
10 2 10 7 6
10 10 8 10 2
5 6 6 5 10
3 7 5 1 3
8 1 3 9 10
grps =
3 2 2 4 5 4
new_pattern =
1 4 2 5 3 6
Output:
newx =
1 2 3 8 9 10 11 4 5 12 ...
13 14 15 16 6 7 17 18 19 20
newA =
3 3 2 5 8
4 3 3 7 7
1 5 2 8 1
4 6 4 1 4
7 1 5 8 8
4 9 10 10 8
7 10 10 4 3
7 3 1 6 9
2 9 2 6 10
1 1 7 10 3
10 10 10 4 7
9 1 8 9 5
8 7 4 5 7
9 8 7 5 3
1 10 7 6 8
8 1 10 6 1
4 6 3 3 2
7 9 3 2 9
6 9 7 4 8
6 7 6 8 10
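The index-building step can be sketched in Python for illustration (this is not MATLAB code). The helper `block_order` is hypothetical; it returns 0-based row indices, and the vector `x` here is made-up data. The point is that a single row order can be applied to both A and x, which also covers the "how to exchange x" part of the question.

```python
from itertools import accumulate

def block_order(sizes, pattern):
    """Row indices (0-based) after rearranging blocks; pattern holds 1-based block numbers."""
    starts = [0] + list(accumulate(sizes))[:-1]   # first row of each block
    order = []
    for block in pattern:
        start = starts[block - 1]
        order.extend(range(start, start + sizes[block - 1]))
    return order

sizes = [3, 2, 2, 4, 5, 4]        # S0, H0, N0, S1, H1, N1
pattern = [1, 4, 2, 5, 3, 6]      # S0, S1, H0, H1, N0, N1
order = block_order(sizes, pattern)

x = list(range(101, 121))         # made-up vector with one entry per row
new_x = [x[i] for i in order]     # the same order would be applied to the rows of A

print([i + 1 for i in order])     # 1-based, for comparison with newx above
```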
I assume you are using a 2-dimensional matrix with Rows rows and Cols columns.
You can use the colon : as a second index to address a full row, e.g. for the third row:
A(3, :)
(equivalent to A(3, 1:end), but a little clearer).
So you could split your matrix into groups of rows and re-arrange them like this (concatenating the groups back into a two-dimensional matrix):
A = [ A(3:4, :); A(1:2, :); A(5:end, :) ]
This moves rows 3 and 4 to the beginning, followed by the old rows 1 and 2, and then all the rest. Does this help you?
Hint: you can use eye for experimenting.
I have an array A, and I want to arrange each row in descending order to get a new array B. How could I do this?
E.g.
Array A (original array):
11 9 13 10
12 4 1 6
13 5 12 11
Array B (rearranged array):
13 11 10 9
12 6 4 1
13 12 11 5
>> A=[11 9 13 10;12 4 1 6;13 5 12 11]
A =
11 9 13 10
12 4 1 6
13 5 12 11
>> sort(A,2,'descend')
ans =
13 11 10 9
12 6 4 1
13 12 11 5
For details, type help sort at the MATLAB command window.
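For comparison only, the same row-wise descending sort can be sketched in Python on plain nested lists (this is not MATLAB; `A` holds the values from the question).

```python
A = [[11, 9, 13, 10],
     [12, 4, 1, 6],
     [13, 5, 12, 11]]

# Sort each row independently, largest value first.
B = [sorted(row, reverse=True) for row in A]

for row in B:
    print(row)
```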