I want to merge two files using a shared ID variable.
Vars in file 1 (mother):
IDofMother Age Education
Vars in file 2 (child):
IDofMother (matches ID in file 1), Sex, Education, Child's name, Child's age.
The problem is that if a mother has more than one child, the same mother's ID appears for each child in file 2, but SPSS does not duplicate the values associated with the same IDofMother (Sex, Education, Child's name, Child's age) in file 2: the second and subsequent cases for the same IDofMother appear as missing values.
Is there any way I can force SPSS to copy the second and subsequent values for the same IDofMother at file 1?
I tried Merge files -> Add variables -> Match cases on key variables - Both files provide cases, but it didn't help.
It's a little unclear to me from your question, but I assume you want to add columns to file2 [child] from file1 [mother] -- namely, mother's age and mother's education.
You can use MATCH FILES to do that, but you should rename your variables first to make sure the two files only share variable names when variables refer to the same thing. Also, sort by ID (required for MATCH FILES).
For instance, in file1 [mother] you might use:
RENAME VARIABLES
(ID=IDofMother)
(Age=AgeOfMother)
(Education=EducationOfMother) .
SORT CASES BY IDofMother .
SAVE OUTFILE='mother.sav' .
and in file2 [child]:
RENAME VARIABLES
(Sex=SexOfChild)
(Education=EducationOfChild) .
SORT CASES BY IDofMother .
SAVE OUTFILE='child.sav' .
From there:
MATCH FILES FILE='child.sav' /* file2 */
/TABLE='mother.sav' /* file1 */
/BY IDofMother .
EXE .
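To make the behaviour of /TABLE concrete, here is a minimal Python sketch of the same lookup logic. It is not SPSS syntax, and the sample records and the function name table_lookup_merge are made up for illustration: every child case is kept, and the matching mother's values are copied onto each of her children.

```python
# Hypothetical sample data standing in for mother.sav and child.sav.
mothers = [
    {"IDofMother": 1, "AgeOfMother": 34, "EducationOfMother": "college"},
    {"IDofMother": 2, "AgeOfMother": 29, "EducationOfMother": "high school"},
]
children = [
    {"IDofMother": 1, "SexOfChild": "F", "EducationOfChild": "primary"},
    {"IDofMother": 1, "SexOfChild": "M", "EducationOfChild": "none"},
    {"IDofMother": 2, "SexOfChild": "F", "EducationOfChild": "none"},
]


def table_lookup_merge(cases, table, key):
    """Keep every row of `cases`; copy in the matching `table` row, if any."""
    lookup = {row[key]: row for row in table}  # one table row per key value
    merged = []
    for case in cases:
        extra = lookup.get(case[key], {})  # no match: nothing is added
        merged.append({**extra, **case})
    return merged


merged = table_lookup_merge(children, mothers, "IDofMother")
for row in merged:
    print(row)
```

Both children of mother 1 end up with her age and education, which is the many-to-one result the question asks for.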
The solution offered by @user45392 is the right way to go. The following does the same thing but is a bit shorter (and does not save extra files to disk):
get file='path\mothers data.sav'.
sort cases by ID.
dataset name mother.
get file='path\children data.sav'.
sort cases by IDofmother.
dataset name child.
MATCH FILES FILE=child /rename (sex Education=sexChl EducationChl)
/TABLE=mother /rename (ID Age Education=IDofMother AgeMoth EducationMoth)
/BY IDofMother .
EXE .
Let's say the string is a variable file name, as in the examples below:
file1_name_cr_001.csv
file2_name1_name2.nn.123.456_updt_000.csv
filename_2012.444.1234_utc_del_004.csv
The last 8 characters always have a fixed length (_001.csv, _000.csv, _004.csv). We only need to extract the values cr, updt, del.
How can we get everything before _cr, _updt, _del as a single value?
Any suggestions?
The output should look like this:
file1_name/cr/001
file2_name1_name2.nn.123.456/updt/000
filename_2012.444.1234_utc/del/004
I reproduced the scenario and got the results below.
First, I stored a sample file name in a variable named sample, using a Set variable activity.
Then I took the substring from the start up to length minus 8 and stored it in a variable named before_8:
@substring(variables('sample'),0,sub(length(variables('sample')),8))
For the end folder:
@replace(split(substring(variables('sample'),sub(length(variables('sample')),8), 8),'.')[0],'_','')
For the start folder:
@substring(variables('before_8'), 0, lastIndexOf(variables('before_8'), '_'))
For the middle folder:
@split(variables('before_8'), '_')[sub(length(split(variables('before_8'), '_')), 1)]
Result folder structure:
@concat(variables('start'),'/',variables('middle'),'/',variables('end'))
Use this variable as the source folder path in the Copy activity, and it will generate the folder structure for you.
For multiple file names, first store all the file names in an array, then use a ForEach activity and perform the same operations inside it.
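The same string logic can be checked outside the pipeline. The following is a minimal Python sketch, not a pipeline expression; the function name folder_path is hypothetical, and it assumes, as stated in the question, that the last 8 characters always have the form _NNN.csv.

```python
def folder_path(file_name: str) -> str:
    """Turn 'a_b_cr_001.csv' into 'a_b/cr/001' (fixed 8-character tail assumed)."""
    before_8 = file_name[:-8]  # everything before the '_NNN.csv' tail
    tail = file_name[-8:]      # for example '_001.csv'
    end = tail.split(".")[0].replace("_", "")  # '001'
    # Split once at the last underscore: left part is the start folder,
    # right part is the middle folder (cr, updt, del).
    start, middle = before_8.rsplit("_", 1)
    return f"{start}/{middle}/{end}"


samples = [
    "file1_name_cr_001.csv",
    "file2_name1_name2.nn.123.456_updt_000.csv",
    "filename_2012.444.1234_utc_del_004.csv",
]
for name in samples:
    print(folder_path(name))
```

Looping over the list of samples plays the role of the ForEach over an array of file names.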
I have a requirement where I have a list of files to be renamed based on a pattern. There are multiple such patterns.
e.g.:
file1_type1.txt
file2_type1.txt
file1_type2.txt
file2.type2.txt
file.type3.txt
I want to rename the above files as:
1.a_(namingconvention)type1.txt
1.b(namingconvention)type1.txt
2.a(namingconvention)type2.txt
2.b(namingconvention)_type2.txt
3.(namingconvention)_type3.txt
The logic to rename should be:
Look at the type of file -> type1, type2, type3, etc.
If multiple files are present, the first file is to be taken as 1.a-(namingconvention)
namingconvention should be a variable.
Please help me out, as I am not familiar with PowerShell.
I have multiple SPSS files, each with a different number of variables (col1, col2, ... col150). I am trying to write common code to restructure the files using VARSTOCASES. I need to KEEP three variables (col1, col34, col66) that are common to all files; the remaining variables differ from file to file. I know the usual way is to list all the remaining variables in the MAKE subcommand, as shown below:
VARSTOCASES
/MAKE VariableName1 FROM Col1 Col2 Col3 ....etc(except 3)
/INDEX=VariableName(VariableName1)
/KEEP=Col1 Col34 Col66
Instead, I want to create a variable list using the SPSSINC SELECT VARIABLES command. I got the idea but have no examples for it. The selection should be short, meaning it should dynamically select all variables except these three (Col1 Col34 Col66), because the files share these three variables but differ in the rest, including in how many variables they contain.
If I have a variable list that is generated dynamically by excluding the three, I can reference it in the MAKE subcommand. Can anyone help?
One way to go about this is to rename these specific columns and then select all other variables whose names start with "col":
rename variables (col1 col34 col66=var1 var34 var66).
spssinc select variables MACRONAME = "!allCOL"
/PROPERTIES PATTERN="Col*".
Now all variables with names starting with "Col" are in the list called "!allCOL" which you can use in your syntax, for example:
VARSTOCASES
/MAKE VariableName1 FROM !allCOL /INDEX=VariableName(VariableName1) .
EDIT: another solution
The solution above is valid only if all the variables you want on the list follow a constant pattern. If that is not the case, the following solution lets you name the variables that you don't want on the list and put all the rest on the list.
* first we define a new attribute in which we mark the
variables we don't want on the list.
VARIABLE ATTRIBUTE VARIABLES=Car_Model_1 Car_Model_2
ATTRIBUTE=IncludeInMake ("no").
* now we create the list, leaving out the unwanted variables.
spssinc select variables MACRONAME = "!forMake"
/ATTRVALUES NAME=IncludeInMake VALUE="".
VARSTOCASES /MAKE Val FROM !forMake /INDEX=var(val) .
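The idea behind this second approach, naming the unwanted variables and taking everything else, can be sketched in a few lines of Python. This is only an illustration of the selection logic, not a replacement for SPSSINC SELECT VARIABLES; the function name variables_for_make and the sample variable list are hypothetical.

```python
def variables_for_make(all_variables, excluded):
    """Return every variable name that is not in the excluded set."""
    # SPSS variable names are case-insensitive, so compare in lower case.
    excluded_lower = {name.lower() for name in excluded}
    return [name for name in all_variables if name.lower() not in excluded_lower]


# Hypothetical variable list for one file; other files would differ.
all_variables = ["Col1", "Col2", "Col3", "Col34", "Col35", "Col66"]
for_make = variables_for_make(all_variables, ["col1", "col34", "col66"])
print(" ".join(for_make))
```

The resulting list plays the role of the !forMake macro: it contains whatever variables a given file has, minus the three that are kept.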
I have a folder (Enroll) that contains 100 or more subfolders, each of which contains one image. I want to read this image and do some processing on it. My problem is how to read the images from the different folders.
Notes:
(the subfolder names are numbers such as 1, 2, ...; the number is supplied by the user)
(the image names are also numbers, but different and not sequential, such as 433535.bmp, 126554.bmp, ...)
foldername=1; % name of the subfolder, supplied by the user
d4= dir('C:\Users\Sarah\Desktop\Log\Log\Enroll\',foldername,'\*.bmp'); % the problem occurs here when I use the foldername variable
foldername2=d4(1).name;
w=imread(fullfile('C:\Users\Sarah\Desktop\Log\Log\Enroll\',foldername,'\*.bmp', foldername2));
help me please :(
foldername is not a string, so you need to convert it to one. I believe what you want is
d4= dir(['C:\Users\Sarah\Desktop\Log\Log\Enroll\' num2str(foldername) '\*.bmp']);
Notes:
1- You need to convert the number to a string, whatever number you have. If foldername is already a string, num2str is not needed.
2- You need to concatenate the character arrays explicitly; it doesn't happen automatically. That is why you need to add the brackets [].
I keep running into an error when trying to add variables from one SPSS file to another. File 1 has 1.800.000 cases [payments], File 2 has 800.000 cases [recipients]. They both have an ID number to match cases on.
For every payment in File 1 I want to add the recipient from File 2. Each recipient should therefore be able to match multiple payments.
These are the two pieces of syntax I have been trying, neither of which works:
code using IN
DATASET ACTIVATE DataSet1.
SORT CASES BY recipientid(A).
DATASET ACTIVATE DataSet2.
SORT CASES BY recipientid(A).
Match Files /File=DataSet1
/In=DataSet2
/BY globalrecipientid.
execute
When I use /In I don't get any errors, but the files don't match properly, since it doesn't add any variables.
code using TABLE
DATASET ACTIVATE DataSet1.
SORT CASES BY recipientid(A).
DATASET ACTIVATE DataSet2.
SORT CASES BY recipientid(A).
Match Files /File=DataSet1
/TABLE=DataSet2
/BY globalrecipientid.
execute
When I use /TABLE I get the following error:
Warning # 5132
Undefined error #5132 - Cannot open text file 'S:\Progra~1\spss\IBM\SPSS\STATIS~1\20\lang\en\spss.err": No such file or directory
I have run out of tricks, wouldn't dare try this in Ruby, and Excel sadly is too small to handle this. Any thoughts?
Your first attempt is wrong because you are using the IN subcommand incorrectly. In other words, you are matching DataSet1 with nothing.
IN creates a new variable in the resulting file that indicates whether
a case came from the input file named on the preceding FILE
subcommand.
In your second attempt, you sort the datasets by the variable recipientid, but MATCH FILES matches on the variable globalrecipientid. Why do you sort by one variable but match by another? This could be the problem. Also, dataset names should be in quotes.
Solution 1:
DATASET ACTIVATE DataSet1.
SORT CASES BY recipientid (A).
DATASET ACTIVATE DataSet2.
SORT CASES BY recipientid (A).
Match Files
/File = "DataSet1"
/TABLE = "DataSet2"
/BY recipientid.
execute.
Solution 2. I never liked the implementation of datasets in SPSS and never trusted them. Another solution is to save the datasets as files and match the files.
get file="file1.sav".
SORT CASES BY recipientid (A).
save outfile="file1s.sav".
get file="file2.sav".
SORT CASES BY recipientid (A).
save outfile="file2s.sav".
Match Files
/File = "file1s.sav"
/TABLE = "file2s.sav"
/BY recipientid.
execute.
My syntax looks somewhat different:
DATASET ACTIVATE DatenSet1.
MATCH FILES /FILE=*
/FILE='DatenSet2'
/RENAME VarsToRename
/BY ID
/DROP= Vars.
EXECUTE.
Maybe this helps?