How to find max&min of all variables with SPSS and display in table? - macros

I have a table with about 500 variables and 2000 cases. The type of these variables varies. My supervisor has asked me to produce a table listing all the numeric variables, along with their maximums and minimums. I am supposed to use SPSS because R apparently messes up the value labels.
I've only done very basic things in SPSS before this, like finding statistics for single variables, and I'm not sure how to do this. I think I should probably do something like:
*Create new table*
DATASET DECLARE maxAndMin.
*Loop through all variables: Use conditional statement to identify numeric variables*
DO REPEAT R=var1 TO varN.
FREQUENCIES VARIABLES /STATISTICS=MINIMUM
END REPEAT
*Find max and minimum*
I'm not sure how to go about this though. Any suggestions would be appreciated.

The following code will first make a list of all numeric variables in the dataset (and store it in a macro called !nums) and then it will run an analysis of those variables to tell you the mean, maximum and minimum of each:
SPSSINC SELECT VARIABLES MACRONAME="!nums" /PROPERTIES TYPE= NUMERIC.
DESCRIPTIVES !nums /STATISTICS=MEAN MIN MAX.
You can use the following code to create a tiny dataset to test the above code on:
data list list/n1 (f1) t1(a1) n2(f1) t2(a1).
begin data
1 "a" 34 "b"
2 "a" 23 "b"
3 "a" 52 "b"
4 "a" 71 "b"
end data.

If SUMMARIZE produces a nice enough table for you, here is a "non-extension" way of doing it.
file handle mydata /name="<whatever/wherever>".
data list free /x (f1) y (a5) z (F4.2).
begin data.
1 yes 45.67
2 no 32.00
3 maybe .
4 yes 22.02
5 no 12.79
end data.
oms select tables
/destination format=sav outfile=mydata
/if subtypes="Descriptive Statistics" /tag="x".
des var all.
omsend tag="x".
get file mydata.
summarize Var1 Mean Minimum Maximum /format list nocasenum nototal
/cells none /statistics none /title "Numeric Variables Only".
or use a DATASET command instead of file handle if you don't need the file on disk.

Related

How to put multiple where statements into function on kdb+

I'm trying to write a function using kdb+ which will look at the list, and find the values that simply meet two conditions.
Let's call the list DR (for data range). And I want a function that will combine these two conditions
"DR where (DR mod 7) in 2"
and
"DR where (DR.dd) in 1"
I'm able to apply them one at a time but I really need to combine them into one function. I was hoping I could do this
"DR was (DR.dd mod 7) in 2 and DR where (DR.dd) in 1"
but this obviously didn't work. Any advice?
You can utilize the and function to help with this, which is the same as &:
q)dr:.z.d+til 100
q)and
&
q)2=dr mod 7
10000001000000100000010000001000000100000010000001000000100000010000001000000..
q)1=dr.dd
00000000000000000000000001000000000000000000000000000000100000000000000000000..
q)(1=dr.dd)&2=dr mod 7
00000000000000000000000000000000000000000000000000000000100000000000000000000..
q)dr where(1=dr.dd)&2=dr mod 7
2021.02.01 2021.03.01
Its necessary wrap the first part in brackets due to how kdb reads code from right to left. This format changes slightly when doing this in a where clause, the brackets arent needed due to how each where clause is parsed, that is each clause between the commas are parsed seperately. However it is essentially doing the same thing as the code above.
q)t:([]date:dr)
q)select from t where 1=date.dd,2=date mod 7
date
----------
2021.02.01
2021.03.01
You could also do this using min to achieve similar results, like so:
DR where min(1=DR.dd;2=DR mod 7)

How to import dates correctly from this .csv file into Matlab?

I have a .csv file with the first column containing dates, a snippet of which looks like the following:
date,values
03/11/2020,1
03/12/2020,2
3/14/20,3
3/15/20,4
3/16/20,5
04/01/2020,6
I would like to import this data into Matlab (I think the best way would probably be using the readtable() function, see here). My goal is to bring the dates into Matlab as a datetime array. As you can see above, the problem is that the dates in the original .csv file are not consistently formatted. Some of them are in the format mm/dd/yyyy and some of them are mm/dd/yy.
Simply calling data = readtable('myfile.csv') on the .csv file results in the following, which is not correct:
'03/11/2020' 1
'03/12/2020' 2
'03/14/0020' 3
'03/15/0020' 4
'03/16/0020' 5
'04/01/2020' 6
Does anyone know a way to automatically account for this type of data in the import?
Thank you!
My version: Matlab R2017a
EDIT ---------------------------------------
Following the suggestion of Max, I have tried specifiying some of the input options for the read command using the following:
T = readtable('example.csv',...
'Format','%{dd/MM/yyyy}D %d',...
'Delimiter', ',',...
'HeaderLines', 0,...
'ReadVariableNames', true)
which results in:
date values
__________ ______
03/11/2020 1
03/12/2020 2
NaT 3
NaT 4
NaT 5
04/01/2020 6
and you can see that this is not working either.
If you are sure all the dates involved do not go back more than 100 years, you can easily apply the pivot method which was in use in the last century (before th 2K bug warned the world of the danger of the method).
They used to code dates in 2 digits only, knowing that 87 actually meant 1987. A user (or a computer) would add the missing years automatically.
In your case, you can read the full table, parse the dates, then it is easy to detect which dates are inconsistent. Identify them, correct them, and you are good to go.
With your example:
a = readtable(tfile) ; % read the file
dates = datetime(a.date) ; % extract first column and convert to [datetime]
idx2change = dates.Year < 2000 ; % Find which dates where on short format
dates.Year(idx2change) = dates.Year(idx2change) + 2000 ; % Correct truncated years
a.date = dates % reinject corrected [datetime] array into the table
yields:
a =
date values
___________ ______
11-Mar-2020 1
12-Mar-2020 2
14-Mar-2020 3
15-Mar-2020 4
16-Mar-2020 5
01-Apr-2020 6
Instead of specifying the format explicitly (as I also suggested before), one should use the delimiterImportoptions and in the case of a csv-file, use the delimitedTextImportOptions
opts = delimitedTextImportOptions('NumVariables',2,...% how many variables per row?
'VariableNamesLine',1,... % is there a header? If yes, in which line are the variable names?
'DataLines',2,... % in which line does the actual data starts?
'VariableTypes',{'datetime','double'})% as what data types should the variables be read
readtable('myfile.csv',opts)
because the neat little feature recognizes the format of the datetime automatically, as it knows that it must be a datetime-object =)

Looping through a set of variables sharing the same prefix

I routinely work with student exam files, where each response to an exam item is recorded in points. I want to transform that variable into 1 or 0 (effectively reducing each item to Correct or Incorrect).
Every dataset has the same nomenclature, where the variable is prefixed with points_ and then followed by an identification number (e.g., points_18616). I'm using the following syntax:
RECODE points_18616 (0=Copy) (SYSMIS=SYSMIS) (ELSE=1) INTO Binary_18616.
VARIABLE LABELS Binary_18616 'Binary Conversion of Item_18616'.
EXECUTE.
So I end up creating this syntax for each variable, and since every dataset is different, it gets tedious. Is there a way to loop through a dataset and perform this transformation on all variables that are prefixed with points_?
Here is a way to do it:
First I'll create a little fake data to demonstrate on:
data list list/points_18616 points_18617 points_18618 points_18619 (4f2).
begin data
4 5 6 7
5 6 7 8
6 7 8 9
7 8 9 9
end data.
* the following code will create a list of all the relevant variables in a new file.
SPSSINC SELECT VARIABLES MACRONAME="!list" /PROPERTIES PATTERN = "points_*".
* now we'll use the created list in a macro to loop your syntax over all the vars.
define !doList ()
!do !lst !in(!eval(!list))
RECODE !lst (0=Copy) (SYSMIS=SYSMIS) (ELSE=1) INTO !concat("Binary", !substr(!lst,7)).
VARIABLE LABELS !concat("Binary", !substr(!lst,7)) !concat("'Binary Conversion of Item",!substr(!lst,7) ,"'.").
!doend
!enddefine.
!doList.
EXECUTE.

Conditional processing in SPSS

I would like to conditionally process blocks of syntax where the condition is based on the active data set.
Within an SPSS macro, you can conditionally process a block of syntax using the !IF/!IFEND macro command. However, as far as I can tell, the user is required to explicitly give a value to the flag by either using the !LET command (!LET !FLAG = 1), or by using a Macro input variable. This is wildly different from my experience with other languages, where I can write code that has branching logic based on the data I'm working with.
Say that there is a block of syntax that I only want to run if there are at least 2 records in the active data set. I can create a variable in the data set which is equal to the number of records using the AGGREGATE function, but I can't find a way to make a macro variable equal to that value in a way that is usable as a !IF condition. Below is a very simple version of what I'd like to do.
COMPUTE DUMMY=1.
AGGREGATE
/OUTFILE = * MODE = ADDVARIABLES
/BREAK DUMMY
/NUMBER_OF_CASES = N.
!LET !N_CASES = NUMBER_OF_CASES.
!IF (!N_CASES > 1) !THEN
MEANS TABLES = VAR1 VAR2 VAR3.
!IFEND
Is what I'm attempting possible? Thanks in advance for your time and consideration.
Following is a way to put a value from the dataset into a macro, which you can then use wherever you need - including in another macro.
First we'll make a little dataset to recreate your example:
data list free/var1 var2 var3.
begin data
1 1 1 2 2 2 3 3 3
end data.
* this will create the number of cases value:
AGGREGATE /OUTFILE = * MODE = ADDVARIABLES /BREAK /NUMBER_OF_CASES = N.
Now we can send the value into a macro - by writing a separate syntax file with the macro definition.
do if $casenum=1.
write out='SomePath\N_CASES.sps' /"define !N_CASES() ", NUMBER_OF_CASES, " !enddefine.".
end if.
exe.
insert file='SomePath\N_CASES.sps'.
The macro is now defined and you can use the value in calculations (e.g if you want to use it for analysis of a different dataset, or later in your syntax when the current data is not available).
for example:
compute just_checking= !N_CASES .
You can also use it in your macro as in your example - you'll see that the new macro can't read the !N_CASES macro as is, that's why you need the !eval() function:
define !cond_means ()
!IF (!eval(!N_CASES) > 1) !THEN
MEANS TABLES = VAR1 VAR2 VAR3.
!IFEND
!enddefine.
Now running the macro will produce nothing if there is just one line in your data, and will run means if there was more than one line:
!cond_means.

Find all differences between .mat files

I am looking for a way to list the differences between two .mat files, something that can be usefull for many people.
Though I searched everywhere I could think of, I have not found anything that meets my requirements:
Pick 2 mat files
Find the differences
Save them properly
The closest I have come is visdiff. As long as I stay within matlab, it will allow me to browse the differences, but when I save the result it only shows me the top level.
Here is a simplified example of what my files typically look like:
a = 6;
b.c.d = 7;
b.c.e = 'x';
save f1
f = a;
clear a
b.c.e = 'y';
save f2
visdiff('f1.mat','f2.mat')
If I click here on b, I can find the difference. However if I run this and use 'file>save', I am not able to click on b. Thus I still don't know what has been changed.
Note: I don't have Simulink
Hence my question is:
How can I show all differences between 2 mat files to someone without Matlab
Here are the answers that I personally consider to be most suitable for different situations:
Answer for users with Simulink
General answer
Answer displaying all value differences
Find all differences between mat files without MATLAB?
You can find the differences between HDF5 based .mat files with the HDF5 Tools.
Example
Let me shorten your MATLAB example and assume you create two mat files with
clear ; a = 6 ; b.c = 'hello' ; save -v7.3 f1
clear ; a = 7 ; b.e = 'world' ; save -v7.3 f2
Outside MATLAB use
h5ls -v -r f1.mat
to get a listing about the kind of data included f1.mat:
Opened "f1.mat" with sec2 driver.
/ Group
Location: 1:96
Links: 1
/a Dataset {1/1, 1/1}
Attribute: MATLAB_class scalar
Type: 6-byte null-terminated ASCII string
Data: "double"
Location: 1:2576
Links: 1
Storage: 8 logical bytes, 8 allocated bytes, 100.00% utilization
Type: native double
/b Group
Attribute: MATLAB_class scalar
Type: 6-byte null-terminated ASCII string
Data: "struct"
Location: 1:800
Links: 1
/b/c Dataset {5/5, 1/1}
Attribute: H5PATH scalar
Type: 2-byte null-terminated ASCII string
Data: "/b"
Attribute: MATLAB_class scalar
Type: 4-byte null-terminated ASCII string
Data: "char"
Attribute: MATLAB_int_decode scalar
Type: native int
Data: 2
Location: 1:1832
Links: 1
Storage: 10 logical bytes, 10 allocated bytes, 100.00% utilization
Type: native unsigned short
Use of
h5ls -d -r f1.mat
returns the values of the stored data:
/ Group
/a Dataset {1, 1}
Data:
(0,0) 6
/b Group
/b/c Dataset {5, 1}
Data:
(0,0) 104, 101, 108, 108, 111
The data 104, 101, 108, 108, 111 represents the word hello, which can be seen with
h5ls -d -r f1.mat | tail -1 | awk '{FS=",";printf("%c%c%c%c%c \n",$2,$3,$4,$5,$6)}'
You can get the same listing for f2.mat and compare the two outputs with the tool of your choice.
Comparison also works directly with HDF5 Tools. To compare the two numbers a from both files use
h5diff -r f1.mat f2.mat /a
which will show you the values and their difference
dataset: </a> and </a>
size: [1x1] [1x1]
position a a difference
------------------------------------------------------------
[ 0 0 ] 6 7 1
1 differences found
attribute: <MATLAB_class of </a>> and <MATLAB_class of </a>>
0 differences found
Remarks
There are a few more commands and options in the HDF5 Tools, which may help to get your real problem solved.
Binary distributions are available for Linux and Windows from The HDF Group. For OS X you can get them installed via MacPorts. If needed there is also a GUI: HDFView.
If you have simulink you can use Simulink.saveVars to generate an m-file that upon execution creates the same variables in work space:
a = 6;
b.c.d = 7;
b.c.e = 'x';
Simulink.saveVars('f1');
f = a;
clear a
b.c.e = 'y';
Simulink.saveVars('f2');
visdiff('f1.m','f2.m')
as illustrated in this sctreenshot
Note that by default it limits the number of elements in arrays to 1000 and you can increase it to 10000. Arrays larger than that limit will be saved in a separate mat-file.
UPDATE: From R2014a a new function similar to Simulink.saveVars has been added to MATLAB. see matlab.io.saveVariablesToScript
This is only part of the answer, but maybe it helps.
You could use gencode, a Matlab function that generates Matlab code from a variable such that running the code reproduces the variable. You do this for all of the variables in each mat-file (takes some programming, but should be doable) and put the results in different .m-files.
Then you use a standard text comparison tool (maybe even visdiff) to compare the .m-files.
There are several good tools to compare XML-Files, this I would proceed this way:
Download struct2xml.m
Load both matfiles
Export each with struct2xml
compare, using XMLSpy or similar
Simple general answer, without displaying value differences
Due to the insight I gained from the answers of #BHF, #Daniel R and #Dennis Jaheruddin, I have managed to find a simple scalable solution:
[fs1, fs2, er] = comp_struct(load('f1.mat'),load('f2.mat'))
Note that it works for .mat containing an arbritrary number of variables.
This uses the Compare Structures - File Exchange submission.
Answer for small files, displaying all value differences
Based on the suggestion by #A. Donda I have tried to use gencode to create a variable for everything.
Though it works for my toy example, it is quite slow and tells me that I exceed the allowed amount of variables for my real .mat files.
Anyway, for those who are looking for something that works with small files, I will post this option:
wList=who;
for iLoop = 1:numel(wList)
eval(['generated_' wList{iLoop} '= gencode(' wList{iLoop} ');'])
for jLoop = 1:numel(eval(['generated_' wList{iLoop}]))
eval(['generated_' wList{iLoop} '_' num2str(jLoop) '= generated_' wList{iLoop} '(' num2str(jLoop) ');' ])
end
end
Though it may work, I don't feel like this is the best way to go.
General answer, without displaying value differences
Due to the insight I gained from the answers of #BHF and #Daniel R I have managed to find a reasonably scalable solution.
Step 1: Save all variables from each files as a single struct
This uses the Save workspace to struct - File Exchange submission.
Here are the steps to take assuming you want to compare f1.mat and f2.mat:
clear
load f1
myStruct1 = ws2struct;
save myStruct1 myStruct1
clear
load f2
myStruct2 = ws2struct;
save myStruct2 myStruct2
clear
load myStruct1
load myStruct2
Step 2: Compare the structs
This uses the Compare Structures - File Exchange submission
Given that you want to compare myStruct1 and myStruct2 you can simply call:
[fs1, fs2, er] = comp_struct(myStruct1,myStruct2)
I was positively surprised at how readable the list of differences in er is, here is the output for the example that was used in the question:
er =
's2 is missing field a'
's1(1).b(1).c(1).e and s2(1).b(1).c(1).e do not match'
Note that it will not show values, from a technical point of view it is probably not too hard to change the m file if value difference displays are desirable. However, especially if there are some big matrices I suppose this could result in problematic output.