Looping through a set of variables sharing the same prefix - macros

I routinely work with student exam files, where each response to an exam item is recorded in points. I want to transform that variable into 1 or 0 (effectively reducing each item to Correct or Incorrect).
Every dataset has the same nomenclature, where the variable is prefixed with points_ and then followed by an identification number (e.g., points_18616). I'm using the following syntax:
RECODE points_18616 (0=Copy) (SYSMIS=SYSMIS) (ELSE=1) INTO Binary_18616.
VARIABLE LABELS Binary_18616 'Binary Conversion of Item_18616'.
EXECUTE.
So I end up creating this syntax for each variable, and since every dataset is different, it gets tedious. Is there a way to loop through a dataset and perform this transformation on all variables that are prefixed with points_?

Here is a way to do it:
First I'll create a little fake data to demonstrate on:
data list list/points_18616 points_18617 points_18618 points_18619 (4f2).
begin data
4 5 6 7
5 6 7 8
6 7 8 9
7 8 9 9
end data.
* the following code stores a list of all the relevant variables in a macro called !list.
SPSSINC SELECT VARIABLES MACRONAME="!list" /PROPERTIES PATTERN = "points_*".
* now we'll use the created list in a macro to loop your syntax over all the vars.
define !doList ()
!do !lst !in(!eval(!list))
RECODE !lst (0=Copy) (SYSMIS=SYSMIS) (ELSE=1) INTO !concat("Binary", !substr(!lst,7)).
VARIABLE LABELS !concat("Binary", !substr(!lst,7)) !concat("'Binary Conversion of Item",!substr(!lst,7) ,"'.").
!doend
!enddefine.
!doList.
EXECUTE.
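To check what the macro expands to for each matching variable, the same text substitution can be sketched outside SPSS. This is only an illustration in Python: the helper name build_recode_syntax and the sample variable names are made up for the example, and it just builds the syntax text rather than running anything in SPSS.

```python
def build_recode_syntax(var_names, prefix="points_"):
    """Return SPSS syntax lines mirroring the macro's expansion (illustrative helper)."""
    lines = []
    for name in var_names:
        if not name.startswith(prefix):
            continue  # skip variables that do not carry the prefix
        # Keep the underscore, like !substr(!lst,7) does for "points_18616".
        suffix = name[len(prefix) - 1:]
        target = "Binary" + suffix
        lines.append(f"RECODE {name} (0=Copy) (SYSMIS=SYSMIS) (ELSE=1) INTO {target}.")
        lines.append(f"VARIABLE LABELS {target} 'Binary Conversion of Item{suffix}'.")
    lines.append("EXECUTE.")
    return lines


# Hypothetical variable list: one non-matching name and two exam items.
syntax = build_recode_syntax(["id", "points_18616", "points_18617"])
print("\n".join(syntax))
```

The generated text could be pasted into a syntax window, which is handy for verifying the macro's output on a new dataset.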

How to put multiple where statements into function on kdb+

I'm trying to write a function using kdb+ which will look at the list, and find the values that simply meet two conditions.
Let's call the list DR (for data range). And I want a function that will combine these two conditions
"DR where (DR mod 7) in 2"
and
"DR where (DR.dd) in 1"
I'm able to apply them one at a time but I really need to combine them into one function. I was hoping I could do this
"DR where (DR.dd mod 7) in 2 and DR where (DR.dd) in 1"
but this obviously didn't work. Any advice?
You can utilize the and function to help with this, which is the same as &:
q)dr:.z.d+til 100
q)and
&
q)2=dr mod 7
10000001000000100000010000001000000100000010000001000000100000010000001000000..
q)1=dr.dd
00000000000000000000000001000000000000000000000000000000100000000000000000000..
q)(1=dr.dd)&2=dr mod 7
00000000000000000000000000000000000000000000000000000000100000000000000000000..
q)dr where(1=dr.dd)&2=dr mod 7
2021.02.01 2021.03.01
It's necessary to wrap the first part in brackets because kdb reads code from right to left. The format changes slightly in a where clause: the brackets aren't needed, because each clause between the commas is parsed separately. However, it is essentially doing the same thing as the code above.
q)t:([]date:dr)
q)select from t where 1=date.dd,2=date mod 7
date
----------
2021.02.01
2021.03.01
You could also do this using min to achieve similar results, like so:
DR where min(1=DR.dd;2=DR mod 7)
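For readers less used to q, the same idea (build one boolean mask per condition, AND them element by element, then filter) can be sketched in Python. The start date below is an assumption, chosen so that the 100-day range covers the two dates in the output above, and the function name is made up for the example. q counts dates as days from 2000.01.01, which was a Saturday, so a remainder of 2 after mod 7 corresponds to a Monday.

```python
from datetime import date, timedelta

Q_EPOCH = date(2000, 1, 1)  # q stores dates as day counts from this date


def first_of_month_mondays(start, n_days):
    """Return dates that are both the 1st of a month and a Monday."""
    days = [start + timedelta(days=i) for i in range(n_days)]
    is_first = [d.day == 1 for d in days]                    # like 1=dr.dd
    is_monday = [(d - Q_EPOCH).days % 7 == 2 for d in days]  # like 2=dr mod 7
    both = [a and b for a, b in zip(is_first, is_monday)]    # like the & of the masks
    return [d for d, keep in zip(days, both) if keep]        # like dr where ...


# Assumed start date; the original session used .z.d (today's date).
result = first_of_month_mondays(date(2020, 12, 7), 100)
print(result)
```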

How to find max&min of all variables with SPSS and display in table?

I have a table with about 500 variables and 2000 cases. The type of these variables varies. My supervisor has asked me to produce a table listing all the numeric variables, along with their maximums and minimums. I am supposed to use SPSS because R apparently messes up the value labels.
I've only done very basic things in SPSS before this, like finding statistics for single variables, and I'm not sure how to do this. I think I should probably do something like:
*Create new table*
DATASET DECLARE maxAndMin.
*Loop through all variables: Use conditional statement to identify numeric variables*
DO REPEAT R=var1 TO varN.
FREQUENCIES VARIABLES /STATISTICS=MINIMUM
END REPEAT
*Find max and minimum*
I'm not sure how to go about this though. Any suggestions would be appreciated.
The following code will first make a list of all numeric variables in the dataset (and store it in a macro called !nums) and then it will run an analysis of those variables to tell you the mean, maximum and minimum of each:
SPSSINC SELECT VARIABLES MACRONAME="!nums" /PROPERTIES TYPE= NUMERIC.
DESCRIPTIVES !nums /STATISTICS=MEAN MIN MAX.
You can use the following code to create a tiny dataset to test the above code on:
data list list/n1 (f1) t1(a1) n2(f1) t2(a1).
begin data
1 "a" 34 "b"
2 "a" 23 "b"
3 "a" 52 "b"
4 "a" 71 "b"
end data.
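The logic behind those two commands (keep only the numeric variables, then report each one's minimum and maximum) can be sketched in plain Python on the same tiny dataset. This is only an illustration of the selection step, not a replacement for the SPSS syntax; the function name numeric_min_max is made up for the example.

```python
# The same four cases as the test dataset above.
rows = [
    {"n1": 1, "t1": "a", "n2": 34, "t2": "b"},
    {"n1": 2, "t1": "a", "n2": 23, "t2": "b"},
    {"n1": 3, "t1": "a", "n2": 52, "t2": "b"},
    {"n1": 4, "t1": "a", "n2": 71, "t2": "b"},
]


def numeric_min_max(rows):
    """Map each numeric column name to a (minimum, maximum) pair."""
    summary = {}
    for name in rows[0]:
        # Ignore missing values, represented here as None.
        present = [r[name] for r in rows if r[name] is not None]
        if present and all(isinstance(v, (int, float)) for v in present):
            summary[name] = (min(present), max(present))
    return summary


summary = numeric_min_max(rows)
for name, (lo, hi) in summary.items():
    print(name, lo, hi)
```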
If SUMMARIZE produces a nice enough table for you, here is a "non-extension" way of doing it.
file handle mydata /name="<whatever/wherever>".
data list free /x (f1) y (a5) z (F4.2).
begin data.
1 yes 45.67
2 no 32.00
3 maybe .
4 yes 22.02
5 no 12.79
end data.
oms select tables
/destination format=sav outfile=mydata
/if subtypes="Descriptive Statistics" /tag="x".
des var all.
omsend tag="x".
get file mydata.
summarize Var1 Mean Minimum Maximum /format list nocasenum nototal
/cells none /statistics none /title "Numeric Variables Only".
Or use a DATASET command instead of a file handle if you don't need the file on disk.

Conditional processing in SPSS

I would like to conditionally process blocks of syntax where the condition is based on the active data set.
Within an SPSS macro, you can conditionally process a block of syntax using the !IF/!IFEND macro command. However, as far as I can tell, the user is required to explicitly give a value to the flag by either using the !LET command (!LET !FLAG = 1), or by using a Macro input variable. This is wildly different from my experience with other languages, where I can write code that has branching logic based on the data I'm working with.
Say that there is a block of syntax that I only want to run if there are at least 2 records in the active data set. I can create a variable in the data set which is equal to the number of records using the AGGREGATE function, but I can't find a way to make a macro variable equal to that value in a way that is usable as a !IF condition. Below is a very simple version of what I'd like to do.
COMPUTE DUMMY=1.
AGGREGATE
/OUTFILE = * MODE = ADDVARIABLES
/BREAK DUMMY
/NUMBER_OF_CASES = N.
!LET !N_CASES = NUMBER_OF_CASES.
!IF (!N_CASES > 1) !THEN
MEANS TABLES = VAR1 VAR2 VAR3.
!IFEND
Is what I'm attempting possible? Thanks in advance for your time and consideration.
Following is a way to put a value from the dataset into a macro, which you can then use wherever you need - including in another macro.
First we'll make a little dataset to recreate your example:
data list free/var1 var2 var3.
begin data
1 1 1 2 2 2 3 3 3
end data.
* this will create the number of cases value:
AGGREGATE /OUTFILE = * MODE = ADDVARIABLES /BREAK /NUMBER_OF_CASES = N.
Now we can send the value into a macro - by writing a separate syntax file with the macro definition.
do if $casenum=1.
write out='SomePath\N_CASES.sps' /"define !N_CASES() ", NUMBER_OF_CASES, " !enddefine.".
end if.
exe.
insert file='SomePath\N_CASES.sps'.
The macro is now defined and you can use the value in calculations (e.g., if you want to use it for analysis of a different dataset, or later in your syntax when the current data is not available).
For example:
compute just_checking= !N_CASES .
You can also use it in your macro as in your example. Note that the new macro can't read the !N_CASES macro as is, which is why you need the !eval() function:
define !cond_means ()
!IF (!eval(!N_CASES) > 1) !THEN
MEANS TABLES = VAR1 VAR2 VAR3.
!IFEND
!enddefine.
Now running the macro will produce nothing if there is just one line in your data, and will run MEANS if there is more than one line:
!cond_means.

Scalding: Create list from column in Pipe

I need to take a pipe that has a column of labels with associated values, and pivot that pipe so that there is a column for each label with the correct values in each column. So for example if I have this:
Id Label Value
1 Red 5
1 Blue 6
2 Red 7
2 Blue 8
3 Red 9
3 Blue 10
I need to turn it into this:
ID Red Blue
1 5 6
2 7 8
3 9 10
I know how to do this using the pivot command, but I have to explicitly know the values of the labels. How can I dynamically read the labels from the “label” column into a list that I can then pass into the pivot command? I have tried to create a list with:
pipe.groupBy('id) {_.toList('label) }
, but I get a type mismatch saying it found a symbol but is expecting (cascading.tuple.Fields, cascading.tuple.Fields). Also, from reading online, it sounds like using toList is frowned upon. The number of things in 'label is finite and not that big (30-50 items maybe), but may be different depending on what sample of data I am working with.
Any suggestions you have would be great. Thanks very much!
I think you're on the right track, you just need to map the desired values to Symbols:
val newHeaders = lines
.map(_.split(" "))
.map(a=>a(1))
.distinct
.map(f=>Symbol(f))
.toList
Using the Execution type lets you combine this step with the subsequent pivot, which helps performance.
Note that I'm using a TypedPipe for the lines variable.
If you want your code to be super-concise, you could combine the first two map calls, but it's just a stylistic choice:
map(_.split(" ")(1))
Try using Execution to get the list of values from the data. More info on executions: https://github.com/twitter/scalding/wiki/Calling-Scalding-from-inside-your-application
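The two-step idea (first collect the distinct labels from the data, then use that list to drive the pivot) can be sketched in plain Python. This is an in-memory illustration only, not Scalding code and not distributed; the function name pivot_by_label is made up for the example.

```python
def pivot_by_label(rows):
    """Pivot (id, label, value) tuples into one row per id, one column per label."""
    labels = []  # distinct labels, in first-seen order
    table = {}   # id -> {label: value}
    for row_id, label, value in rows:
        if label not in labels:
            labels.append(label)
        table.setdefault(row_id, {})[label] = value
    header = ["Id"] + labels
    # A label that is missing for some id shows up as None in that row.
    body = [[row_id] + [cells.get(label) for label in labels]
            for row_id, cells in table.items()]
    return header, body


data = [
    (1, "Red", 5), (1, "Blue", 6),
    (2, "Red", 7), (2, "Blue", 8),
    (3, "Red", 9), (3, "Blue", 10),
]
header, body = pivot_by_label(data)
print(header)
for row in body:
    print(row)
```

With 30-50 labels, as in the question, the label list is small enough to hold in memory even when the data itself is not.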

Convert Matlab Console output to new expression

In order to debug a very complex set of functions, I want to isolate a subfunction from the workspace in order to run different tests. Therefore I need selected values from the function workspace to be defined already. By setting a breakpoint at the specific position I can "look" into the current workspace by displaying the values in the console, like the variable HF33
HF33 =
1.0777 0.0865 0.0955
-0.1891 0.8110 -0.1889
0.0935 0.0846 1.0755
Is there some function / script that could convert this result to a new Matlab expression that can be pasted somewhere else (for example at the head of a new script), e.g.:
HF33 = [ 1.0777, 0.0865, 0.0955;
-0.1891, 0.8110, -0.1889;
0.0935, 0.0846, 1.0755 ];
With that I could test the subfunction and its behavior by easily changing the given values and see what's happening without having the huge debug workspace running.
Is there some easy function like res2exp(HF33)?
First: Create this function to get the variable name
function out = varname(var)
out = inputname(1);
end
Then you can print it directly to the console:
fprintf('%s =%s\n',varname(varToSave),mat2str(varToSave));
Or use fopen and fprintf to write it to a file
fop = fopen('filename','w');
fprintf(fop,'%s = %s' ,varname(varToSave),mat2str(varToSave));
fclose(fop);
I think this will help you
It might be a function like mat2str() you are looking for but it will not give exactly the printout you are asking for. Here is an example of how it could be used:
>> A = magic(4)
A =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
>> B = mat2str(A)
B =
[16 2 3 13;5 11 10 8;9 7 6 12;4 14 15 1]
And if you want the output to be totally copy/paste-able you could use:
disp(['C = ',mat2str(A)])
C = [16 2 3 13;5 11 10 8;9 7 6 12;4 14 15 1]
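The string that mat2str builds is simple enough to sketch by hand: values in a row are separated by spaces and rows by semicolons. The Python function below only illustrates that format and is not how MATLAB implements it; the name to_matlab_literal is made up for the example.

```python
def to_matlab_literal(name, matrix):
    """Format a list of rows as a one-line MATLAB assignment."""
    # Values within a row are separated by spaces, rows by semicolons.
    rows = [" ".join(repr(value) for value in row) for row in matrix]
    return f"{name} = [{';'.join(rows)}];"


magic4 = [
    [16, 2, 3, 13],
    [5, 11, 10, 8],
    [9, 7, 6, 12],
    [4, 14, 15, 1],
]
literal = to_matlab_literal("A", magic4)
print(literal)
```

The resulting line can be pasted at the head of a script, which is the use case described in the question.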
I made this up just now. It is not formatted beautifully, but it achieves what you are trying to do - if I understand you correctly.
a = [ 2 3 4 5
4 5 5 6
3 4 5 6];
fprintf('\nb = [\n\n');
disp(a);
fprintf(']\n\n');
Copy and paste this and see if it does what you want. It's also very simple code, so you could modify it if the spacing and newline characters aren't where you want them.
You could also make a small function out of this if you wanted to.
If you want me to make a function of it, let me know... I can do it tomorrow. But you can probably figure it out.
Ehh, I just made the function. It didn't take long.
function reprint_matrix(matrix)
var_name = inputname(1);
fprintf('\n%s = [\n\n', var_name);
disp(matrix);
fprintf(']\n\n');
end
I'm not sure what you are looking for, but I think this will help you:
http://www.mathworks.com/matlabcentral/fileexchange/24447-generate-m-file-code-for-any-matlab-variable/content/examples/html/gencode_example.html
I have not used it myself because I use mat-files to transfer data.
You can combine it with the clipboard function:
clipboard('copy',gencode(ans))
Though there are several ways to write variables to text, saving variables as text is definitely bad practice if it can be avoided. Hence, the best advice I can give you is to solve your problem in a different way.
Suppose you want to use HF33 in your subfunction, then here is what I would recommend:
First of all, save your variable of interest:
save HF33 HF33
Then when you are in the function where you want to use this variable:
load HF33
This assumes that your working directory (not workspace) is the same in both cases, but otherwise you can simply add the path in your save or load command. If you want to display it you can now simply call the variable HF33 without a semicolon (this is probably the only safe way to display it exactly the way you expect in all cases).
Note that this method can easily be adapted to transfer multiple variables at once.