Importing Flat file dynamically using %MACRO and dataset values SAS - macros

I have a folder with various flat files. There will be new files added every month and I need to import this raw data using an automated job. I have managed everything except for the final little piece.
Here's my logic:
1) I scan the folder and get all the file names that fit a certain description.
2) I store all these file names and paths in a dataset.
3) A macro checks whether a file has already been imported. If it has, nothing happens; if it has not, the file is imported.
The final part I need to get right is looping through all the records in the dataset created in step 2 and executing the macro from step 3 for each file name.
What is the best way to do this?

Look into call execute for executing a macro from a data step.

The method I most often use is to write the macro calls to a temporary file and use %include to submit it. I guess call execute, as Reeza suggested, is better, but I feel more in control when I do it like this:
filename s temp;
data _null_;
  set table;
  file s;
  /* write one macro call per observation, e.g. %macrocall( value ); */
  put '%macrocall(' variable ');';
run;
/* submit the generated macro calls */
%inc s;


(sas) concatenate multiple files from different folders

I'm a relatively new SAS user, so please bear with me!
I have 63 folders that each contain a uniquely named xls file, all containing the same variables in the same order. I need to concatenate them into a single file. I would post some of the code I've tried but, trust me, it's all gone horribly awry and is totally useless. Below is the basic library structure in a libname statement, though:
`libname JC 'W:\JCs\JC Analyses 2016-2017\JC Data 2016-2017\2 - Received from JCs\&jcname.\2016_&jcname..xls'`
(there are 63 unique &jcname values)
Any ideas?
Thanks in advance!!!
This is a common requirement, but it requires a fairly uncommon knowledge of multiple SAS functions to execute well.
I like to approach this problem with a two-step solution:
1) Get a list of filenames
2) Process each filename in a loop
While you can process each filename as you read it, it's a lot easier to debug and maintain code that separates these steps.
Step 1: Read filenames
I think the best way to get a list of filenames is to use dread() to read directory entries into a dataset as follows:
filename myfiles 'c:\myfolder';
data filenames (keep=filename);
  dir = dopen('myfiles');          /* open the directory through its fileref */
  do file = 1 to dnum(dir);        /* dnum() gives the number of entries */
    filename = dread(dir, file);   /* name of the entry at this position */
    output;
  end;
  rc = dclose(dir);
run;
After this step you can verify that the correct filenames have been read by printing the dataset. You could also modify the code to output only certain types of files. I leave this as an exercise for the reader.
Step 2: use the files
Given a list of names in a dataset, I prefer to use call execute() inside a data step to process each file.
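data _null_;
  set filenames;
  /* strip() drops trailing blanks; %nrstr() delays the macro call until the data step has finished */
  call execute('%nrstr(%import(' || strip(filename) || '))');
run;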
I haven't included a macro to read in the Excel files and concatenate the datasets (partly because I don't have a suitable list of Excel files to test, but also because it's a situational problem). The stub macro below just writes the filenames to the log, to verify that it's running:
%macro import(filename);
  /* This is a dummy macro. Here is where you would do something with the file */
  %put &filename;
%mend;
Notes:
Arguably, there are many examples of how to do this in multiple places on the web, e.g.:
this SAS knowledge base article (http://support.sas.com/kb/41/880.html)
or this paper from SUGI.
However, most of them rely on a pipe to run a dir or ls command, which I feel is the wrong approach because it's platform-dependent and in many modern environments the ability to pipe shell commands is disabled.
I based this on an answer by Daniel Santos in communities.sas.com, but, given the superior functionality of stackoverflow I'd much rather see a good answer here.

How to save some variable values in a file

At the end of my program I want to take the values stored in certain variables and append them to a file, let's say "result". I am going to run it several times (for different parameters) at night and then check the results in the morning.
Basically, I am looking for something similar to redirection in Linux (>>) for MATLAB.
I am using the diary function to store all the messages from my program, and I want to keep those for verifying later.
But here what I want is just some specific values. So how do I do it?
It does not necessarily have to be in the same file. If I can get each result in a separate file, that is OK too.
You can use a combination of diary and any function which can append data to a text file, but you have to turn off diary before writing. A short example using save:
f = 'example.txt';
diary(f);
for ix = 1:10
    disp(ix);
    diary off   % diary off to flush
    save(f, 'ix', '-append', '-ascii')
    diary(f);
end
Instead of save you can also use fprintf or dlmwrite.
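The fprintf route mentioned above could look roughly like the sketch below. The variable names alpha and score and the file name result.txt are placeholders, not taken from the question. Opening the file with 'a' appends to it, much like >> in a shell, and because the results go to their own file the diary should not need to be toggled.

```matlab
% Hypothetical result values; replace with the variables your program computes.
alpha = 0.5;
score = 0.8731;

resultFile = 'result.txt';        % assumed file name
fid = fopen(resultFile, 'a');     % 'a' = append, creates the file if missing
if fid == -1
    error('Could not open %s for appending.', resultFile);
end
% One line per run: timestamp followed by the values of interest.
fprintf(fid, '%s alpha=%g score=%.4f\n', datestr(now), alpha, score);
fclose(fid);
```

Each run adds one line, so the file can simply be read top to bottom in the morning.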

Write RDD in txt file

I have the following type of data:
`org.apache.spark.rdd.RDD[org.apache.spark.rdd.RDD[((String, String),Int)]] = MapPartitionsRDD[29] at map at <console>:38`
I'd like to write those data in a txt file to have something like
((like,chicken),2) ((like,dog),3) etc.
I store the data in a variable called res.
So far I have tried this:
res.coalesce(1).saveAsTextFile("newfile.txt")
But it doesn't seem to work...
If my assumption is correct, you expected the output to be a single .txt file because the RDD was coalesced down to one partition. This is not how Spark is built. It is meant for distributed work, and saveAsTextFile always writes a directory of part files rather than one plain file, so it should not be shoe-horned into a non-distributed form. If you need a single plain file, use a more generic command line tool for that.
All that said, you should see a folder named newfile.txt which contains the data files (with coalesce(1), a single part file such as part-00000) holding your expected output.

MATLAB- Individually Saving Workspace Variables into Many Individual .mat Files

So I have many variables in a MATLAB workspace, all named in the same format, "project1day1", "project1day2" etc., and instead of having them all in the same workspace, I want to save them as their own individual .mat files with the same name.
So, I want the "project1day1" variable in the workspace to go to a "project1day1.mat" file.
I have 7 projects, and all of them except for project 1 have 3 "days". I am having trouble with the exact syntax. I want to loop through my workspace data in a general fashion, and execute something along the lines of:
maxdays=3;
maxprojects=7;
for i = 1:maxprojects;
for j = 1:maxdays;
save('project%dday%d','project%dday%d,i,j,i,j)
end
end
Two things:
1) The save option isn't working
2) I need to include some sort of ~if(exist '...') for the case where there isn't a 3rd day, but I'm having trouble doing so.
As rayryeng wrote, I think in most cases it would be better to either save the variables in one file, or (you wrote they are all in the same format) use a structure or a cell array, which makes it much easier to access them later.
If you really need to save all variables in the workspace to separate files you can do something like this:
vars = who;                              % names of all workspace variables
for i = 1:length(vars)
    save([vars{i} '.mat'], vars{i});     % one .mat file per variable
end
But again, I wouldn't do this if it is not (for some reason) absolutely necessary!
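If only the projectXdayY variables should be saved, and some days may be missing, one possible sketch is to build each name with sprintf and test it with exist(name, 'var') before saving. The two variables created at the top are placeholders so the example runs on its own; the loop bounds are the ones from the question.

```matlab
% Placeholder data so the sketch is self-contained; in practice these
% variables already exist in the workspace.
project1day1 = rand(3);
project2day3 = rand(3);

maxprojects = 7;
maxdays = 3;
for i = 1:maxprojects
    for j = 1:maxdays
        name = sprintf('project%dday%d', i, j);   % e.g. 'project1day1'
        if exist(name, 'var')                     % skip days that do not exist
            save([name '.mat'], name);            % file and variable share the name
        end
    end
end
```

Run as a script (or at the command line), exist checks the same workspace that holds the variables, so missing days are silently skipped.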

Script for running (testing) another matlab script?

I need to create a MATLAB m-file that will run another MATLAB file with default values given in a txt file. It's meant to be useful for testing programs, so that the user can specify values in a txt file and, instead of entering values every time he starts the program, my script will give the program default values and the user will only see the result.
My idea is to load the tested file into a variable, change 'variable=input('...');' to variable = default_variable;, save it to a tmp file, execute it, and then delete the tmp file. Is this going to do the job?
I have only two problems:
1) How to eliminate the problem of duplicated variable names. I mean this must work for all scripts, and I don't know the names of the variables used in the tested script.
2) As I wrote before, is this going to work fine? Or maybe I missed an easier way to do it. For example, maybe I don't have to create a tmp file?
I really need your help!
Thanks in advance!
If the person who has to edit the default values has access to MATLAB, I would recommend saving the values in a mat file and loading them when needed. Otherwise you could just write a small script that contains the assignments to certain variables, but make sure to keep this small. For example:
maxRuns = 100;
clusters = 12;
So much for setting up the defaults. Regarding the process, my main advice is to wrap the thing that you want to test into a function. This way variables used in the code that calls the 'script' will not interfere, as a function gets its own separate workspace. Check doc function if you are not familiar with functions.
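As a rough sketch of that advice, the defaults could be loaded from a mat file into a struct and passed to the code under test as function arguments. The names runWithDefaults and programUnderTest and the body of the tested function are made up for illustration; only maxRuns and clusters come from the example above.

```matlab
% runWithDefaults.m (hypothetical file name)
function result = runWithDefaults(defaultsFile)
    % load() with an output argument returns a struct, so the defaults
    % cannot clash with variable names used elsewhere.
    defaults = load(defaultsFile);   % expected fields: maxRuns, clusters
    result = programUnderTest(defaults.maxRuns, defaults.clusters);
end

function result = programUnderTest(maxRuns, clusters)
    % Placeholder for the real program: it takes its inputs as arguments
    % instead of calling input(), and runs in its own workspace.
    result = maxRuns * clusters;
end
```

To try it, save the two defaults once with save('defaults.mat', 'maxRuns', 'clusters') and then call runWithDefaults('defaults.mat').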