I have a dataset where I need to separate the cases into several different files.
Currently I am running:
DATASET COPY DATA1 WINDOW = FRONT.
... (repeats) ...
DATASET COPY DATA25 WINDOW = FRONT.
After it makes 25 copies, I use a series of SELECT IF commands on each one to pick only the cases I want, and save them as DATA1 through DATA25.
What I want to do is set up some kind of macro or loop so I can say:
LET %X = 1 to 25
loop
DATASET COPY DATA'%X' WINDOW = FRONT.
end loop
Instead of needing 25 lines of near-identical syntax. This is just a very simple use case, but I hope to branch out from there and do many other things with this kind of syntax, such as opening multiple files using a wildcard for the part of the filename that changes, or opening 'sheet 1' of an Excel file and then repeating for sheets 2 to 10.
Is this something I can do with SPSS? Do I need a Python or R extension? Everything I have seen thus far only lets you loop through running a set of commands on a range of variables.
If you are using SPSS version 23, you don't need to install a Python extension; it is included in the original SPSS installation.
If you do not want to use Python or R, you can do what you described with syntax alone using an SPSS macro: look up DEFINE - !ENDDEFINE.
For example, the following macro loops 25 times. On each pass it activates the original dataset, copies it to a dataset called DATA#, selects the cases where MyFilterVar=#, and saves them to a separate file named after the dataset:
define CreateCopies ()
!do !i=1 !to 25
dataset activate Orig.
dataset copy !concat('DATA',!i).
dataset activate !concat('DATA',!i).
select if MyFilterVar=!i.
!let !FileName=!concat('MyPath\DATA',!i,".sav")
save out=!quote(!FileName).
!doend
!enddefine.
After defining the macro, you need to name your original dataset and then call the macro:
dataset name orig.
CreateCopies.
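Since the question also asks about Python, here is a minimal sketch of the same idea in Python: it only builds the repeated syntax as text. The function name build_copy_syntax and its default arguments are illustrative assumptions that mirror the placeholder names used in the macro above (Orig, MyFilterVar, MyPath).

```python
def build_copy_syntax(n_copies, source="Orig", filter_var="MyFilterVar", out_dir="MyPath"):
    """Return SPSS syntax that copies, filters, and saves one dataset per value.

    All names are placeholders taken from the macro example, not real paths.
    """
    lines = []
    for i in range(1, n_copies + 1):
        name = f"DATA{i}"
        lines += [
            f"DATASET ACTIVATE {source}.",
            f"DATASET COPY {name}.",
            f"DATASET ACTIVATE {name}.",
            f"SELECT IF {filter_var}={i}.",
            f"SAVE OUTFILE='{out_dir}\\{name}.sav'.",
        ]
    return "\n".join(lines)


syntax = build_copy_syntax(25)
print(syntax.splitlines()[1])  # second generated command, for DATA1
```

Inside a BEGIN PROGRAM PYTHON block, the resulting string could presumably be handed to spss.Submit to run it; outside SPSS it can simply be pasted into a syntax window.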
I am working on a project which requires analyzing a large (>50GB) dataset on a server, both in Stata and MATLAB. Both parts are required and I cannot use only one of them.
My ultimate goal is to generate a .tex file named something like commands.tex which looks like this:
\newcommand{\var1}{val1}
\newcommand{\var2}{val2} % MATLAB file matlab_file.m on DD/MM/YYYY
\newcommand{\var3}{val3} % Stata file stata_file.m on DD/MM/YYYY
...
where variables are ordered alphabetically and each value is most probably a number. Note that the comments would help me trace where I generated each value. The file is meant to be used so that, after a preamble, I can write LaTeX in the following way:
<preamble>
\input{commands.tex}
\begin{document}
Variable 1 has a value of \var1 and variable 2 has a value of \var2.
\end{document}
The purpose of this is so that I can analyze locally (or remotely) a sample, say of 0.1 or 10 percent of the total observations, write a report with those, and then run the analysis again with a bigger size. I want to completely eliminate the chances of me copying a number wrong.
I am trying to write some code both in MATLAB and Stata, but I think that is beyond my expertise, and would be very grateful if someone could help me figure out how to do it. To be honest, I feel I would be able to do the MATLAB part, but I have no idea how to do the Stata part.
Stata code
What I am trying to do is to generate a command that takes as an input a name and a scalar and as an output defines the corresponding variable in my commands.tex file detailed above. My goal is to be able to generate something like this:
sysuse auto
reg price weight
define_variable PriceWeight = _b[weight], format(%4.2f)
and what I hope the code to do is that:
If \newcommand{\PriceWeight} does not exist in commands.tex then it adds its value to the list, preserving the alphabetical order.
If the variable exists then it deletes its value and rewrites above it, with the value given in the scalar.
I know how to give the values to a program in Stata, but I do not exactly know how to use those values and perform the necessary commands. The syntax is something like:
program define define_variable
syntax anything = X, [format(string)]
<other code>
end
Note: Of course, I need something way deeper than regression coefficients, but as a simple example this would suffice.
MATLAB code
This seems to be easier in MATLAB, but I do not know exactly how to automate the process. In MATLAB what I want to be able to do is something like:
clc; clear;
PriceWeight = 3
define_variable('PriceWeight',PriceWeight,format)
again where it automatically goes to the single file and updates it accordingly. Any help would be very much appreciated.
Based on your comments, and assuming that your file with all relevant variables is not huge, I would suggest getting your data from Stata into MATLAB and updating your variables there as necessary (using functions such as exist, or strcmp if you have a list of names). A quick Google search gives me this link for Stata to MATLAB.
To make it easy to process, you might want to create a cell array (which I will call C), where one column contains all variable names and one column contains the scalar values.
Then, once you have assembled all your variables, you can sort your cell array alphabetically and write it to a file using this. Of course you would write a .tex file, and then iterate over the rows of C with something like
fprintf(fID,'\\newcommand{\\%s}{%f}\n',C{i,1},C{i,2});
I hope this is understandable and helps.
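The update rule described in the question (add the command if it is missing, overwrite it if it exists, keep the file alphabetical) can be sketched independently of Stata or MATLAB. The following Python version is only an illustration of that logic: the function name define_variable is borrowed from the question, while the source argument and the assumption that commands.tex contains nothing but \newcommand lines are mine.

```python
import re
from pathlib import Path

# One entry per line: \newcommand{\Name}{value} with an optional trailing % comment.
# LaTeX command names may contain letters only, so the pattern accepts letters only.
LINE_RE = re.compile(r"^\\newcommand\{\\([A-Za-z]+)\}\{(.*?)\}(\s*%.*)?$")


def define_variable(path, name, value, fmt="{:.2f}", source=""):
    """Insert or overwrite \\name in the file at path, keeping alphabetical order.

    Assumption: lines that are not \\newcommand entries are not preserved.
    """
    path = Path(path)
    entries = {}
    if path.exists():
        for line in path.read_text().splitlines():
            match = LINE_RE.match(line.strip())
            if match:
                entries[match.group(1)] = line.strip()
    comment = f" % {source}" if source else ""
    # A new value replaces any existing entry with the same name.
    entries[name] = f"\\newcommand{{\\{name}}}{{{fmt.format(value)}}}{comment}"
    ordered = sorted(entries, key=str.lower)
    path.write_text("\n".join(entries[key] for key in ordered) + "\n")


# Example call (hypothetical values):
# define_variable("commands.tex", "PriceWeight", 2.044, source="Stata file stata_file.do")
```

Rewriting the whole file on every call keeps the logic simple, which should be acceptable as long as the file only holds a modest number of commands.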
I have a long syntax (1800 lines) and this one portion has been giving me trouble. I can't for the life of me figure out what I'm doing incorrectly.
It is supposed to take an existing file and narrow it down to just the variables listed in the /KEEP statement. Then every variable is renamed to a similar variable name, but "oldxxxx". Later my syntax matches the new file to this updated variable file and points out any changes in the values, giving a list of reasons in the recoded file.
Once the syntax reaches the first RENAME VARIABLES I get the following error:
RENAME VARIABLES Duplicate variable names from RENAME.
Thank you in advance!
First a couple of remarks: It would be better practice to save to a different file name. In your syntax the original file gets saved over and you can't go back... Also I recommend you follow @Andy W's advice regarding how to keep only the variables you need in your file.
Now, in the sample syntax you posted I see an error - possibly that's your problem:
RENAME VARIABLES (total_EMFASYS_award=oldgrant).
The new name is oldgrant instead of oldtotal_EMFASYS_award. Possibly further down you've got another command saying
RENAME VARIABLES (grant=oldgrant).
hence the double name.
To avoid such errors and shorten your syntax, you could use the following macro:
define renVars (!pos=!cmdend)
rename variables
!do !i !in (!1) !i = !concat("old",!i)
!doend .
!enddefine.
After running this macro definition you can run the macro by stating the macro name and the full list of variables you want renamed, like this:
renVars
Student_ID rl_highschoolgpa comb need qualitygrp NewUpfrontGrant meritgrant
targetcounty_housing housinggrant tuitiongrant athlete_recruit .
One thing to note about the RENAME VARIABLES command: it also works like this:
RENAME VARIABLES (list_of_starting_variable_names = list_of_final_variable_names).
You would just need to provide the two lists of names, and the renaming is done in the order in which the names are provided (the 1st variable in list 1 gets renamed to the 1st name in list 2, ..., the n-th variable in list 1 to the n-th name in list 2).
This should avoid the Duplicate Variable Names error you are getting, as all renames are done in one go. However, it would require you to alter the original syntax a bit, and it makes it a bit harder to spot which variable gets renamed to which name.
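If typing the two lists by hand feels error-prone, the list form of the command can be generated. This is a small Python sketch, not part of either answer: build_rename is a hypothetical helper that prefixes every name and refuses to produce the command if two target names would collide (SPSS variable names are case-insensitive, so the check ignores case).

```python
def build_rename(var_names, prefix="old"):
    """Build one RENAME VARIABLES command in the (list = list) form."""
    new_names = [prefix + name for name in var_names]
    # Catch collisions before SPSS reports them.
    if len({name.lower() for name in new_names}) != len(new_names):
        raise ValueError("duplicate target names in rename list")
    return "RENAME VARIABLES ({} = {}).".format(" ".join(var_names), " ".join(new_names))


print(build_rename(["Student_ID", "comb", "need"]))
```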
So I have many files in a MATLAB workspace all in the same format,
"project1day1", "project1day2" etc. and instead of having them all in the same workspace, I want to save them as their own individual .mat files with the same name.
So, I want the "project1day1" variable in the workspace to go to a "project1day1.mat" file.
I have 7 projects, and all of them except for project 1 have 3 "days". I was having trouble working out the exact syntax to do it. I want to loop through my workspace data in a general fashion. I want to execute something along the lines of:
maxdays=3;
maxprojects=7;
for i = 1:maxprojects;
for j = 1:maxdays;
save('project%dday%d','project%dday%d,i,j,i,j)
end
end
Two things:
1) The save option isn't working
2) I need to include some sort of ~if(exist '...') for the case where there isn't a 3rd day, but I'm having trouble doing so.
As rayryeng wrote, I think in most cases it would be better to either save the variables in one file, or (you wrote they are all in the same format) use a structure or a cell array, which makes it much easier to access them later.
If you really need to save all variables in the workspace to separate files you can do something like this:
vars = who;
for i=1:length(vars)
save([vars{i} '.mat'], vars{i});
end
But again, I wouldn't do this if it is not (for some reason) absolutely necessary!
I have a .txt file that I need to attach in a column of my sheet, and I have the path to this file.
So I need to read this path and attach the file in another column programmatically. Is there a way to do it?
thanks in advance.
Indeed there is! And by using macros it is quite easy to do.
Enabling macros
Go to the Tools > Options menu and click on the Security section under OpenOffice.org. Once there, click the Macro Security button. Now on the Security Level Tab, make sure that your settings will allow you to run Macros.
My settings are on low because I'm the author of all the macros I run; if you are not sure that this will be your case, you might want to use a higher setting.
Note: Be careful, if you are unlucky or live in the 90's an evil macro can cause serious damage!
Creating a new macro
Now that you can run them, you must create a new macro. OpenOffice accepts a wide range of languages including Python, but since you didn't specify any I'll use OpenOffice's version of Basic here.
Go to Tools > Macros > Organize Macros > OpenOffice.org Basic, and once there add a new module under your file's tree. Give it a meaningful name.
The actual macro
Once you create a new module the editor screen will pop up, write this code below:
Sub DataFromFile
    Dim FileNo As Integer
    Dim CurrentLine As String
    Dim FileName As String
    Dim CurrentSheet As Object
    Dim CurrentCell As Object
    Dim I As Integer
    ' Get the file path from the cell, in this case B1.
    CurrentSheet = ThisComponent.CurrentController.ActiveSheet
    FileName = CurrentSheet.getCellRangeByName("B1").getString()
    ' Get a free file handle and open the file for reading
    FileNo = FreeFile
    Open FileName For Input As #FileNo
    I = 0
    ' Read the file until EOF is reached
    Do While Not EOF(FileNo)
        ' Read one line
        Line Input #FileNo, CurrentLine
        ' Take the I-th cell of the target range A4:A999
        CurrentCell = CurrentSheet.getCellRangeByName("A4:A999").getCellByPosition(0, I)
        ' Put the current line of the file there
        CurrentCell.String = CurrentLine
        ' Increase I by one
        I = I + 1
    Loop
    Close #FileNo
End Sub
To test it, just create a text file and put something in it, then put the path to it in cell B1 and run the macro. You can run the macro in many ways; for test purposes just use the Run button in the same window that you used to create the module. As a result, each line of the text file should appear in its own cell in column A, starting at A4.
Note: The file path in my example is a Linux path; if you are unfamiliar with Linux, don't be intimidated by it. The macro works the same on Windows with its own file path structure.
Further improving the macro
I wrote the code above with the goal of making it as easy to understand as possible, therefore the macro has plenty of room for improvement, such as:
Being able to show the data retrieved on multiple columns/A single column/Something else
Once you have retrieved the data from the file, you can display it on your spreadsheet in nearly any way you want. Let me know if the way you initially intended was not addressed and I will edit the answer.
Having to re-run the macro every time you want the data updated.
This is easily fixed. There are many ways to automate the macro execution; the one I'm most familiar with consists of running it in a loop with a delay of, say, 5 seconds, and making it start as soon as the file loads.
Sub Main
Do While True
DataFromFile()
Wait(5000)
Loop
End Sub
And from now on you should call the Main sub instead of the DataFromFile.
To make the macro run at start-up, go to Tools > Customize, open the Events tab, select Open Document from the list, then click on the Macro button. In the dialog to select the macro, pick Main. Now close the document, reopen it, and voilà!
Using Cell Ranges
It's easier to keep your code and make changes to it if you name the cell ranges and use their names instead of their absolute address. To name a range (or a single cell) you must first select it then click on Data > Define Range to give it a name, for example B1 could be called 'FilePath' and A4:A999 could be called 'DataRange'. This way if you ever need to change them, you don't have to change the macro, just the defined range name.
Don't forget to update the code to look for the range instead of the address, for example, this bit of code:
getCellRangeByName("A4:A999")
would be rewritten to
getCellRangeByName("DataRange")
Error checking
It is a good idea to check for and deal with errors or unexpected events. What if the file doesn't exist? What if it is bigger than the defined range?
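As mentioned earlier, OpenOffice macros can also be written in Python, so here is a rough sketch of those two checks as a plain Python helper. It is deliberately independent of the spreadsheet API; read_lines_for_range is a hypothetical name, and the 996-row limit simply corresponds to the range A4:A999 used above.

```python
import os


def read_lines_for_range(path, max_rows):
    """Read a text file and return (lines, truncated).

    Raises FileNotFoundError if the path does not point to a file, and
    keeps at most max_rows lines so the data cannot overflow the range.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No such file: {path}")
    with open(path, encoding="utf-8") as handle:
        lines = [line.rstrip("\n") for line in handle]
    truncated = len(lines) > max_rows
    return lines[:max_rows], truncated


# A4:A999 holds 999 - 4 + 1 = 996 rows:
# lines, truncated = read_lines_for_range("/path/to/file.txt", 996)
```

The caller can then decide what to do when truncated is True, for example show a message or enlarge the range.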
Further reading
Official reference regarding files for OpenOffice Basic macros.
A guide on different ways to run a macro
A great introduction to macro programming
I have several .csv files with similar filenames except a numeric month (i.e. 03_data.csv, 04_data.csv, 05_data.csv, etc.) that I'd like to read into R.
I have two questions:
1) Is there a function in R similar to MATLAB's varname and assignin that will let me create/declare a variable name within a function or loop that will allow me to read the respective .csv file - i.e. 03_data.csv into 03_data data.frame, etc.? I want to write a quick loop to do this because the filenames are similar.
2) As an alternative, is it better to create one dataframe with the first file and then append the rest using a for loop? How would I do that?
You could look at this related question. You can create the file names easily with a paste command:
file.names <- paste(sprintf("%02d",1:10), "_data.csv", sep="")
Once you have your file names (whether by creating them or by reading them from the directory as in the other question), you can import them quickly with an lapply:
import.list <- lapply(file.names, read.csv)
Lastly, to combine the list into one dataframe, the easiest approach is to use the merge_recurse function from the reshape package:
library(reshape)
data <- merge_recurse(import.list)
It is also very easy to read the contents of a directory, including the use of regular expressions to focus on certain names only, e.g.
filestoread <- list.files(someDir, pattern="\\.csv$", full.names=TRUE)
returns all files (fully formed, including the full path) in the given directory someDir whose names end in ".csv". You can get fancier with better regular expressions, which are documented in many places.
Once you have your list of files, it is straightforward to read them all using apply or lapply or a loop.
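For comparison, the same pattern (list the files, filter by a regular expression, read each one, then optionally stack them) looks like this in Python using only the standard library. The function names and the default pattern are assumptions for illustration; the point is that a dictionary keyed by file name usually replaces the need to create variables like 03_data dynamically.

```python
import csv
import os
import re


def read_monthly_csvs(directory, pattern=r"^\d{2}_data\.csv$"):
    """Read every matching CSV into a dict keyed by file name without extension."""
    matcher = re.compile(pattern)
    tables = {}
    for fname in sorted(os.listdir(directory)):
        if matcher.match(fname):
            with open(os.path.join(directory, fname), newline="") as handle:
                tables[os.path.splitext(fname)[0]] = list(csv.DictReader(handle))
    return tables


def combine(tables):
    """Stack all tables into one list of rows, tagging each row with its source."""
    rows = []
    for name, records in tables.items():
        for record in records:
            rows.append({"source": name, **record})
    return rows
```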