The following code was used to generate a list of numeric variables and their maxima and minima from a datafile containing >500 variables and >2000 cases:
OMS select tables
/if commands=["descriptives"]
subtypes=["descriptive statistics"]
/DESTINATION FORMAT = SAV
OUTFILE = "C:\statyMcStatFace.sav".
SPSSINC SELECT VARIABLES MACRONAME="!nums" /PROPERTIES TYPE= NUMERIC.
DESCRIPTIVES !nums /STATISTICS=MIN MAX.
omsend.
Sadly, the variables weren't listed in the same order in the output file as they were in the original file, nor according to any discernible order I can see. For example, if you run the given code on plantar_fascitiitis.csv at
kaggle.com/rameessahlu/plantar-fasciitis
you'll find that the order of the variables in the original table is age, sex, weight, etc., while the order in which the variables are listed in the macro is Status, TendernessOfFoot, Alignment, Burning, etc. Why does this happen, and is there a way for me to order the variables as they are in the original table?
When you create your list of numeric variables with SPSSINC SELECT VARIABLES, there is an option to keep the list in the original order of the dataset. All you have to do is add it to the command:
SPSSINC SELECT VARIABLES MACRONAME="!nums" /PROPERTIES TYPE= NUMERIC /OPTIONS ORDER=FILE.
I have multiple SPSS files, each with a different number of variables (col1, col2, ... col150). I am trying to write common code that restructures each file using VARSTOCASES. I need to KEEP 3 variables (col1, col34, col66); these are common to all files, but the remaining variables differ. I know the normal way, in which all the remaining variables are added to the MAKE subcommand, as shown below:
VARSTOCASES
/MAKE VariableName1 FROM Col1 Col2 Col3 ....etc(except 3)
/INDEX=VariableName(VariableName1)
/KEEP=Col1 Col34 Col66
Instead of this, I want to create a variable list using the SPSSINC SELECT VARIABLES command. I got this idea but I don't have any examples of it. The selection must be short, meaning it should dynamically select all the variables except these 3 (Col1 Col34 Col66), because I have different SPSS files in which these 3 variables are the same but the rest differ, and each file contains a different number of variables.
If I have a variable list (dynamically generated by excluding the 3), then I can reference it in the MAKE subcommand. Can anyone help me, please?
One way to go about this could be to rename these specific columns and then select all other variables that start with "col":
rename variables (col1 col34 col66=var1 var34 var66).
spssinc select variables MACRONAME = "!allCOL"
/PROPERTIES PATTERN="Col*".
Now all variables with names starting with "Col" are in the list called "!allCOL" which you can use in your syntax, for example:
VARSTOCASES
/MAKE VariableName1 FROM !allCOL /INDEX=VariableName(VariableName1) .
EDIT: another solution
The solution above is valid only if all the variables you want on the list share a constant naming pattern. If that is not the case, the following solution lets you name the variables that you don't want on the list and puts all the rest on the list.
* first we define a new attribute in which we mark the
variables we don't want on the list.
VARIABLE ATTRIBUTE VARIABLES=Car_Model_1 Car_Model_2
ATTRIBUTE=IncludeInMake ("no").
* now we create the list, leaving out the unwanted variables.
spssinc select variables MACRONAME = "!forMake"
/ATTRVALUES NAME=IncludeInMake VALUE="".
VARSTOCASES /MAKE Val FROM !forMake /INDEX=var(val) .
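To make the intent of the exclusion approach concrete, here is a minimal Python sketch of the same idea: take every variable name in file order and leave out the ones you named. It is only an illustration, not SPSS code; the function name make_list and the sample names are hypothetical, and the case-insensitive comparison is there because SPSS variable names are not case sensitive.

```python
def make_list(all_vars, excluded_vars):
    """Return all_vars in their original order, minus the excluded names."""
    # SPSS variable names are case-insensitive, so compare in lower case.
    excluded = {name.lower() for name in excluded_vars}
    return [name for name in all_vars if name.lower() not in excluded]


# Hypothetical file layout: three fixed columns plus whatever else is present.
variables = ["Col1", "Col2", "Col3", "Col34", "Col35", "Col66"]
for_make = make_list(variables, ["col1", "col34", "col66"])
print(" ".join(for_make))
```

The resulting list plays the role of !forMake: whatever is left after the exclusion goes to the MAKE subcommand, regardless of how many variables a given file has.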
For example, say I've got the output of:
SELECT
$text$col1, col2, col3
0,my-value,text
7,value2,string
0,also a value,fort
$text$;
Would it be possible to populate a table directly from it with the COPY command?
Sort of. You would have to strip the first two and last lines of your example in order to use the data with COPY. You could do this by using the PROGRAM keyword:
COPY table_name FROM PROGRAM 'sed -e ''1,2d;$d'' inputfile';
This is direct in that you are doing everything from the COPY command, and indirect in that you are setting up an outside program to filter your input.
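If it helps to see what that sed filter does to the example, here is a rough Python equivalent. It is just a sketch for illustration (the function name strip_wrapper is made up); it drops lines 1 and 2 and the last line, which is what '1,2d;$d' does.

```python
def strip_wrapper(lines):
    """Mimic sed -e '1,2d;$d': drop the first two lines and the last one."""
    return lines[2:-1]


example = [
    "SELECT",
    "$text$col1, col2, col3",
    "0,my-value,text",
    "7,value2,string",
    "0,also a value,fort",
    "$text$;",
]
data_rows = strip_wrapper(example)
for row in data_rows:
    print(row)
```

Note that in this example the header (col1, col2, col3) shares a line with the opening $text$ marker, so it is dropped along with it and only the data rows remain.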
In SSIS 2008 I have a variable called @[User::EANcode]. It contains a string with a product EAN code like '1234567891123'. The value is derived from a filename like '1234567891123.jpg' via a foreach loop.
However, sometimes the filenames contain an extra '_1', '_2' etc. at the end, like '1234567891123_1.jpg', resulting in a value '1234567891123_1' in the EANcode variable.
This happens when there is more than one image for the same EAN code (product). The _N addition is always a number and it is always at the end of the name/string.
What is the expression to find/catch the '_1' (or _2 or _N etc.) so you can store it in another variable called @[User::Addition]?
If there is no addition, the variable stays empty which is fine.
The reason I need to get this _N addition into a separate variable is that I later on need it to rename the filename but paste the addition back at the end.
Thanks!
I think you're looking for CHARINDEX() in conjunction with SUBSTRING(). With that, you can split off that _# to another variable like this (copy/paste and execute to see; play with the @temp1 variable to see the limitations of the code):
declare @temp1 varchar(20), @temp2 varchar(20)
set @temp1 = '1234567891123_12'
IF CHARINDEX('_', @temp1) > 1
set @temp2 = SUBSTRING(@temp1,CHARINDEX('_', @temp1),LEN(@temp1)-CHARINDEX('_',@temp1)+1)
select @temp1, @temp2
Hope it helps!
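For comparison, the same split can be sketched in Python. This is not an SSIS expression, only an illustration of the logic; the function name split_addition is hypothetical, and it assumes (as the question states) that the addition is always an underscore followed by digits at the end of the string.

```python
def split_addition(ean_code):
    """Split '1234567891123_12' into ('1234567891123', '_12').

    Returns the input unchanged and an empty addition when there is no
    trailing '_N' part.
    """
    base, sep, suffix = ean_code.rpartition("_")
    if sep and suffix.isdigit():
        return base, sep + suffix
    return ean_code, ""


code, addition = split_addition("1234567891123_12")
print(code, addition)
```

Keeping the underscore in the addition means it can be pasted straight back onto the end of the new filename later.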
I keep running into an error when trying to add variables of one spss file to another. File 1 has 1.800.000 cases [payments], File 2 has 800.000 cases [recipients]. They both have an ID number to match cases on.
For every payment in File 1 I want to add the recipient, from File 2. The recipients should thus be able to match for multiple payments.
These are the two pieces of code I have been trying, neither of which works:
code using IN
DATASET ACTIVATE DataSet1.
SORT CASES BY recipientid(A).
DATASET ACTIVATE DataSet2.
SORT CASES BY recipientid(A).
Match Files /File=DataSet1
/In=DataSet2
/BY globalrecipientid.
execute
When I use /In I don't get any errors, but the files don't match properly, since it doesn't add any variables.
code using TABLE
DATASET ACTIVATE DataSet1.
SORT CASES BY recipientid(A).
DATASET ACTIVATE DataSet2.
SORT CASES BY recipientid(A).
Match Files /File=DataSet1
/TABLE=DataSet2
/BY globalrecipientid.
execute
When I use /TABLE I get the following error:
Warning # 5132
Undefined error #5132 - Cannot open text file 'S:\Progra~1\spss\IBM\SPSS\STATIS~1\20\lang\en\spss.err": No such file or directory
I have run out of tricks, wouldn't dare try this in Ruby, and Excel sadly is too small to handle this. Any thoughts?
Your first solution is wrong because you are using the IN subcommand incorrectly. In other words, you are matching DataSet1 with nothing.
IN creates a new variable in the resulting file that indicates whether
a case came from the input file named on the preceding FILE
subcommand.
As for your second solution: you sort both datasets by the variable recipientid, but MATCH FILES matches by the variable globalrecipientid. Why do you sort by one variable but match by another? This could be the problem. Also, dataset names should be in quotes.
Solution 1:
DATASET ACTIVATE DataSet1.
SORT CASES BY recipientid (A).
DATASET ACTIVATE DataSet2.
SORT CASES BY recipientid (A).
Match Files
/File = "DataSet1"
/TABLE = "DataSet2"
/BY recipientid.
execute.
Solution 2. I never liked the implementation of datasets in SPSS, and I never trusted them. The other solution is to save the datasets as files and match the files.
GET FILE="file1.sav".
SORT CASES BY recipientid (A).
SAVE OUTFILE="file1s.sav".
GET FILE="file2.sav".
SORT CASES BY recipientid (A).
SAVE OUTFILE="file2s.sav".
Match Files
/File = "file1s.sav"
/TABLE = "file2s.sav"
/BY recipientid.
execute.
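To illustrate what the /TABLE lookup in the solutions above is meant to do, here is a small Python sketch of a many-to-one match: every payment keeps its own row and picks up the variables of its recipient. It is only a sketch under assumptions; the function name table_match and the sample fields (amount, name) are made up, and it assumes each recipientid occurs once in the recipients table.

```python
def table_match(payments, recipients, key="recipientid"):
    """Add recipient variables to every payment with the same key."""
    # Build the lookup table: one entry per recipient.
    lookup = {row[key]: row for row in recipients}
    merged = []
    for payment in payments:
        extra = lookup.get(payment[key], {})
        # On name clashes the payment's own values win over the table's.
        merged.append({**extra, **payment})
    return merged


payments = [
    {"recipientid": 1, "amount": 100},
    {"recipientid": 1, "amount": 250},
    {"recipientid": 2, "amount": 75},
    {"recipientid": 9, "amount": 10},
]
recipients = [
    {"recipientid": 1, "name": "Alpha"},
    {"recipientid": 2, "name": "Beta"},
]
result = table_match(payments, recipients)
for row in result:
    print(row)
```

The same recipient is attached to several payments, and a payment with no matching recipient (id 9 here) is kept without the extra variables.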
My syntax looks somewhat different:
DATASET ACTIVATE DatenSet1.
MATCH FILES /FILE=*
/FILE='DatenSet2'
/RENAME VarsToRename
/BY ID
/DROP= Vars.
EXECUTE.
Maybe this helps?
I don't think SPSS macros can return values, so instead of assigning a value like VIXL3 = !getLastAvail target=VIX level=3 I figured I need to do something like this:
/* computes last available entry of target at given level */
define !compLastAvail(name !Tokens(1) /target !Tokens(1) /level !Tokens(1))
compute tmpid= $casenum.
dataset copy tmpset1.
select if not miss(!target).
compute !name= lag(!target, !level).
match files /file= * /file= tmpset1 /by tmpid.
exec.
delete variables tmpid.
dataset close tmpset1.
!enddefine.
/* compute last values */
!compLastAvail name="VIXCL3" target=VIXC level=3.
The compute !name= ... line is where the problem is.
How should this be done properly? The above returns:
>Error # 4285 in column 9. Text: VIXCL3
>Incorrect variable name: either the name is more than 64 characters, or it is
>not defined by a previous command.
>Execution of this command stops.
When you pass tokens to the macro, they get interpreted literally. So when you specify
!compLastAvail name="VIXCL3"
It gets passed to the corresponding compute statement as "VIXCL3", instead of just a variable name without quotation marks (e.g. VIXCL3).
Two other general pieces of advice:
If you run the command set mprint on before you execute your macro, you will see how your tokens are passed to the macro. In this instance, if you had taken that step, you would have seen the offending compute statement along with the error message.
Sometimes you do want to use quotation marks in tokens, and when that is the case the string functions !QUOTE and !UNQUOTE come in handy.
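If it helps to picture the literal substitution, here is a toy Python sketch of it. This is not how SPSS implements macros; expand and unquote are hypothetical helpers that only imitate plain text replacement and the effect of stripping the quotes, so you can see why the quoted token breaks the compute statement.

```python
def expand(template, **args):
    """Replace each !argument in the template with its token, verbatim."""
    out = template
    for name, token in args.items():
        out = out.replace("!" + name, token)
    return out


def unquote(token):
    """Strip one pair of matching surrounding quotes, if present."""
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "\"'":
        return token[1:-1]
    return token


TEMPLATE = "compute !name= lag(!target, !level)."

# The call from the question: the quotes travel along with the token.
broken = expand(TEMPLATE, name='"VIXCL3"', target="VIXC", level="3")
# Passing the bare name (or unquoting it) gives a valid variable name.
fixed = expand(TEMPLATE, name=unquote('"VIXCL3"'), target="VIXC", level="3")

print(broken)
print(fixed)
```

The first line printed contains "VIXCL3" with its quotation marks, which is not a valid variable name; the second contains the bare name.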