Matching Files in SPSS using Table or In - merge

I keep running into an error when trying to add variables of one spss file to another. File 1 has 1.800.000 cases [payments], File 2 has 800.000 cases [recipients]. They both have an ID number to match cases on.
For every payment in File 1 I want to add the recipient, from File 2. The recipients should thus be able to match for multiple payments.
This are the two codes I have been trying, which don't work:
code using IN
DATASET ACTIVATE DataSet1.
SORT CASES BY recipientid(A).
DATASET ACTIVATE DataSet2.
SORT CASES BY recipientid(A).
Match Files /File=DataSet1
/In=DataSet2
/BY globalrecipientid.
execute
When I use /In I don't get any errors, but the files don't properly match sin it doesn't add any variables.
code using TABLE
DATASET ACTIVATE DataSet1.
SORT CASES BY recipientid(A).
DATASET ACTIVATE DataSet2.
SORT CASES BY recipientid(A).
Match Files /File=DataSet1
/TABLE=DataSet2
/BY globalrecipientid.
execute
When I use /TABLE I get the following error:
Warning # 5132
Undefined error #5132 - Cannot open text file 'S:\Progra~1\spss\IBM\SPSS\STATIS~1\20\lang\en\spss.err": No such file or directory
I have run out of tricks, wouldn't dare try this in Ruby, and excel sadly is too small to handle this.. Any thoughts?

Your first solution is wring because you are using IN subcommand wrongly. In other words you are matching Dataset1 with nothing.
IN creates a new variable in the resulting file that indicates whether
a case came from the input file named on the preceding FILE
subcommand.
Your second solution. You are sorting dataset by variable recipientid but the match files is done by the variable globalrecipientid. Why do you sort by one variable but match by another? This could be a problem. And dataset names should be in quotes.
Solution 1:
DATASET ACTIVATE DataSet1.
SORT CASES BY recipientid (A).
DATASET ACTIVATE DataSet2.
SORT CASES BY recipientid (A).
Match Files
/File = "DataSet1"
/TABLE = "DataSet2"
/BY recipientid.
execute.
Solution 2. I never liked the implementation of datasets in SPSS. I did not trusted them. Other solution is to save datasets as files and do the match of files.
get "file1.sav".
SORT CASES BY recipientid (A).
save out "file1s.sav".
get "file2.sav".
SORT CASES BY recipientid (A).
save out "file2s.sav".
Match Files
/File = "file1s.sav"
/TABLE = "file2s.sav"
/BY recipientid.
execute.

My syntax looks somwhat different:
DATASET ACTIVATE DatenSet1.
MATCH FILES /FILE=*
/FILE='DatenSet2'
/RENAME VarsToRename
/BY ID
/DROP= Vars
EXECUTE.
Maybe this helps?

Related

Merge copy values to dup ID?

I want to merge two files using a shared ID variable.
Vars in file 1 (mother):
IDofMother Age Education
Vars in file 2 (child):
IDofMother (matches ID in file 1), Sex, Education, Child's name, Child's age.
Problem is that if a mother has more than one child, the same mothers ID will appear for each child in file 2, but SPSS does not duplicate values of cases associated with same IDofmother (Sex, Education, Child's name, Child's age) in file 2 - second and subsequent values of cases for the same IDofmother appear as missing values.
Any way I can force SPSS to copy the the second subsequent values for the same IDofmother at file 1?
I tried Merge files -> Add variables -> Match cases on key variables - Both files provide cases, but it didn't help.
It's a little unclear to me from your question, but I assume you want to add columns to file2 [child] from file1 [mother] -- namely, mother's age and mother's education.
You can use MATCH FILES to do that, but you should rename your variables first to make sure the two files only share variable names when variables refer to the same thing. Also, sort by ID (required for MATCH FILES).
For instance, in file1 [mother] you might use:
RENAME VARIABLES
(ID=IDofMother)
(Age=AgeOfMother)
(Education=EducationOfMother) .
SORT CASES BY IDofMother .
SAVE OUTFILE='mother.sav' .
and in file2 [child]:
RENAME VARIABLES
(Sex=SexOfChild)
(Education=EducationOfChild) .
SORT CASES BY IDofMother .
SAVE OUTFILE='child.sav' .
From there:
MATCH FILES FILE='child.sav' /* file2 */
/TABLE='mother.sav' /* file1 */
/BY IDofMother .
EXE .
The solution offered by #user45392 is the right way to go, this is the same just a bit shorter (and without saving extra files to disk)
get file='path\mothers data.sav'.
sort cases by ID.
dataset name mother.
get file='path\children data.sav'.
sort cases by IDofmother.
dataset name child.
MATCH FILES FILE=child /rename (sex Education=sexChl EducationChl)
/TABLE='mother.sav' /rename (ID Age Education=IDofMother AgeMoth EducationMoth)
/BY IDofMother .
EXE .

saving multiple fields of structure as separate mat files and creating directory non-existing directories

There are two parts of my query
1) How to save different fields of structures as separate files(each file containing only named field of structure )?
2) Forcing save command to create directories in the save path when intermediate directories do not exist?
For first part:
data.a.name='a';
data.a.age=5;
data.b.name='b';
data.b.age=6;
data.c.name='c';
data.c.age=7;
fields=fieldnames(data);
for i=1:length(fields)
save(['E:\data\' fields{i} '.mat'],'-struct','data');
end
I want to save each field of struct data as a separate .mat file. So that after executing the loop, I should have 3 files inside E:\data viz. a.mat,b.mat and c.mat and a.mat contains only data of field 'a', b.mat contains only data of field 'b' and so on.
When I exeucte the above code, I get three files in my directory but each file contains identical content of all three variables a, b and c, instead of individual variables in each file.
Following command does not work:
for i=1:length(fields)
save(['E:\data\' fields{i} '.mat'],'-struct',['data.' fields{i} ]);
end
Error using save
The argument to -STRUCT must be the name of a scalar structure variable.
Is there some way to use save command to achieve my purpose without having to create temporary vaiables for saving each field?
For Second Part:
I have large number of files which need to stored in a directory structure. I want following to work.
test='abcdefgh';
save(['E:\data\' test(1:2) '\' test(3:4) '\' test(5:6) '\result.mat'])
But it showing following error
Error using save
Cannot create 'result.mat' because 'E:\data\ab\cd\ef' does not exist.
If any intermediate directory are not present, then they should be created by save command. I can get this part to work by checking if directory is present or not using exist command and then create directory using mkdir. I am wondering if there is some way to force save command to do the work using some argument I am not aware of.
Your field input argument to save is wrong. Per the documentation, the format is:
'-struct',structName,field1,...,fieldN
So the appropriate save syntax is:
data.a.name='a';
data.a.age=5;
data.b.name='b';
data.b.age=6;
data.c.name='c';
data.c.age=7;
fields = fieldnames(data);
for ii = 1:length(fields)
save(['E:\data\' fields{ii} '.mat'], '-struct', 'data', fields{ii});
end
And no, you cannot force save to generate the intermediate directories. Check for the existence of the save path first and create it if necessary.

The value in "sym" file disappears when using splayed tables

I am using the following line:
`:c:/dir/ set .Q.en[`:c:/dir; tablename]
Everything is ok if I don't exit KDB, but if I do and then try to load the table using
get `dir
all the symbol columns are integer. I would really appreciate your help into understanding why this happens.
It looks like you forgot to repeat the table name on the l.h.s. of set.
Try
q)`:c:/dir/tablename/ set .Q.en[`:c:/dir; tablename]
This will correctly save table columns in c:/dir/tablename subdirectory and place the sym file alongside. Now you should be able to load both your table and the sym file by using the \l command or specifying c:/dir on the command line when you restart q
q c:/dir
or
q
q)\l c:/dir
(no backticks or leading :'s in either of those commands)
If you want to use get on this table, you will have to load sym separately:
q)load`:c:/dir/sym
q)get`:c:/dir/tablename/
(note the leading : in the path specs)
Finally, you may want to take a look at the rsave command which will save your table without you having to write tablename twice.
.Q.en takes 2 oarams - file handle and table data
Your first param isnt a hsym - should be backtick then colon then path to your db root
Also set takes 2 params - first in this case should be the path to where you want to save like dir/splayedTableName/

How to batch rename files to 3-digit numbers?

I apologize in advance that this question is not specific. But my goal is to take a bunch of image files, which are currently named as: 0.tif, 1.tif, 2.tif, etc... and rename them just as numbers to 000.tif, 001.tif, 002.tif, ... , 010.tif, etc...
The reason I want to do this is because I am trying to load the images into matlab and for batch processing but matlab does not order them correctly. I use the dir command as dir(*.tif) to get all the images and load them into an array of files that I can iterate over and process, but in this array element 1 is 0.tif, element 2 is 1.tif, element 3 is 10.tif, element 4 is 100.tif, and so on.
I want to keep the ordering of the elements as I process them. However, I do not care if I have to change the order of the elements BEFORE processing them (i.e. I can make it work to rename, for example, 2.tif to 10.tif if I had to) but I am looking for a way to convert the file names the way I initially described.
If there is a better way to get matlab to properly order the files when it loads them into the array using dir please let me know because that would be much easier.
Thanks!!
You can do this without having to rename the files, if you want. When you grab the files using dir, you'll have a list of files like so:
files =
'0.tif'
'1.tif'
'10.tif'
...
You can grab just the numeric part using regexp:
nums = regexp(files,'\d+','match');
nums = str2double([nums{:}]);
nums =
0 1 10 11 12 ...
regexp returns its matches as a cell-array, the second line converts it back to actual numbers.
We can now get an actual numeric order by sorting the resulting array:
[~,order] = sort(nums);
and then put the files in the correct order:
files = files(order);
This should (I haven't tested it, I don't have a folder full of numerically labelled files handy) produce a list of files like so:
files=
'0.tif'
'1.tif'
'2.tif'
'3.tif'
...
this is partially dependent on the version of matlab you have. If you have a version with findstr this should work well
num_files_to_rename = numel(name_array);
for ii=1:num_files_to_rename
%in my test i used cells to store my strings you may need to
%change the bracket type for your application
curr_file = name_array{ii};
%locates the period in the file name (assume there is only one)
period_idx = findstr(curr_file ,'.');
%takes everything to the left of the period (excluding the period)
file_name = str2num(curr_file(1:period_idx-1));
%zeropads the file name to 3 spaces using a 0
new_file_name = sprintf('%03d.tiff',file_name)
%you can uncomment this after you are sure it works as you planned
%movefile(curr_file, new_file_name);
end
the actual rename operation movefile is commented out for now. make sure the output names are as you expect before uncommenting it and renaming all the files.
EDIT there is no real error checking in this code, it just assumes every file name has one and only one period, and an actual number as the name
The Batch file below do the rename of the files you want:
#echo off
setlocal EnableDelayedExpansion
for /F "delims=" %%f in ('dir /B *.tif') do (
set "name=00%%~Nf"
ren "%%f" "!name:~-3!.tif"
)
Note that this solution preserve the same order of your original files, even if there are missing numbers in the sequence..

Store user input as wildcard

I am having some trouble with a data processing function in MATLAB. The function takes the name of the file to be processed as an input, finds the desired files, and reads in the data.
However, several of the desired files are variants, such as Data_00.dat, Data.dat, or Data_1_March.dat. Within my function, I would like to search for all files containing Data and condense them into one usable file for processing.
To solve this, I would like desiredfile to be converted into a wildcard.
Here is the statement I would like to use.
selectedfiles = dir *desiredfile*.dat % Search for file names containing desiredfile
This returns all files containing the variable name desiredfile, rather than the user input.
The only solution that I can think of is writing a separate function that manually condenses all the variants into one file before my function is run, but I am trying to keep the number of files used down and would like to avoid this.
You could concatenate strings for that. Considering desiredFile as a variable.
desiredFile = input('Files: ');
selectedfiles = dir(['*' desiredfile '*.dat']) % Search for file names containing desiredfile
Enclosing strings between square brackets [string1 string2 ... stringN]concatenates them. Matlab's dir function receives a string.
I believe you can achieve that using the dir command.
dataSets = dir('/path/to/dir/containing/Data*.dat');
dataSets = {dataSets.name};
Now simply loop over them, more information here.
To quote the matlab help:
dir lists the files and folders in the MATLABĀ® current folder. Results appear in the order returned by the operating system.
dir name lists the files and folders that match the string name. When name is a folder, dir lists the contents of the folder. Specify name using absolute or relative path names. You can use wildcards (*).