Writing a script for reading many .csv files with similar filenames - matlab

I have several .csv files with similar filenames except a numeric month (i.e. 03_data.csv, 04_data.csv, 05_data.csv, etc.) that I'd like to read into R.
I have two questions:
Is there a function in R similar to
MATLAB's varname and assignin that
will let me create/declare a variable name
within a function or loop that will allow me to
read the respective .csv file - i.e.
03_data.csv into 03_data data.frame,
etc.? I want to write a quick loop to
do this because the filenames are
similar.
As an alternative, is it better to
create one dataframe with the first
file and then append the rest using a
for loop? How would I do that?

You could look at this related question. You can create the file names easily with a paste command:
file.names <- paste(sprintf("%02d",1:10), "_data.csv", sep="")
Once you have your file names (whether by creating them or by reading them from the directory as in the other question), you can import them quickly with an lapply:
import.list <- lapply(file.names, read.csv)
Lastly, to combine the list into one dataframe, the easiest approach is to use the reshape function below:
library(reshape)
data <- merge_recurse(import.list)

It is also very easy to read the content of a directory including use of regular expressions to skip focus on certain names only, e.g.
filestoread <- list.files(someDir, pattern="\\.csv$", full.names=TRUE)
returns all (fully-formed, including full path) files in the given directory someDir that end on ".csv". You can get fancier with better regular expressions which are documented in many places.
Once you have your list of files, it is straightforward to read them all using apply or lapply or a loop.

Related

(sas) concatenate multiple files from different folders

I'm a relatively new SAS user, so please bear with me!
I have 63 folders that each contain a uniquely named xls file, all containing the same variables in the same order. I need to concatenate them into a single file. I would post some of the code I've tried but, trust me, it's all gone horribly awry and is totally useless. Below is the basic library structure in a libname statement, though:
`libname JC 'W:\JCs\JC Analyses 2016-2017\JC Data 2016-2017\2 - Received from JCs\&jcname.\2016_&jcname..xls`
(there are 63 unique &jcname values)
Any ideas?
Thanks in advance!!!
This is a common requirement, but it requires a fairly uncommon knowledge of multiple SAS functions to execute well.
I like to approach this problem with a two step solution:
Get a list of filenames
Process each filename in a loop
While you can process each filename as you read it, it's a lot easier to debug and maintain code that separates these steps.
Step 1: Read filenames
I think the best way to get a list of filenames is to use dread() to read
directory entries into a dataset as follows:
filename myfiles 'c:\myfolder';
data filenames (keep=filename);
dir = dopen('myfiles');
do file = 1 to dnum(dir);
filename = dread(dir,file);
output;
end;
rc = dclose(dir);
run;
After this step you can verify that the correct filenames have been read be printing the dataset. You could also modify the code to only output certain types of files. I leave this as an exercise for the reader.
Step 2: use the files
Given a list of names in a dataset, I prefer to use call execute() inside a data step to process each file.
data _null_;
set filenames;
call execute('%import('||filename||')');
run;
I haven't included a macro to read in the Excel files and concatenate the dataset (partly because I don't have a suitable list of Excel files to test, but also because it's a situational problem). The stub macro below just outputs the filenames to the log, to verify that it's running:
%macro import(filename);
/* This is a dummy macro. Here is where you would do something with the file */
%put &filename;
%mend;
Notes:
Arguably there are many are many examples of how to do this in multiple places on the web, e.g.:
this SAS knowledge base article (http://support.sas.com/kb/41/880.html)
or this paper from SUGI,
However, most of them rely on the use of pipe to run a dir or ls command, which I feel is the wrong approach because it's platform dependent and in many modern environments the ability to pipe shell commands will be disabled.
I based this on an answer by Daniel Santos in communities.sas.com, but, given the superior functionality of stackoverflow I'd much rather see a good answer here.

How do i open a number of xlsx files when I only know part of the filename in Matlab?

I am using Matlab 2013b. I have a list of excel files in a directory and I want to open chosen ones within a loop to read out the data.
I only know the start of each of the file names however, not the ending. I have a vector of numbers that provide the identifying portion of the file's name (filenumber), and I want to loop through the excel files one by one, opening them and extracting the data, then closing them.
There are 500 files, each of the format: img_****ff*******.xlsx, where the first set of asterisks is my filenumber, while the second set of asterisks is unknown.
So far, I have tried listing what is in the directory using:
list=dir('E:\processed\Img*');
filenames={list.name}
This provides with with the full file names.
I tried then within a loop to create the part of the filename I know exists:
x = sprintf('Img_%d_FF_',img(1,1));
I then thought I could use 'Find' to look for my my partial filename/string in the 'filenames' structure above. I don't think I have the code correct for this datatype though:
index = find(strcmp({list.name}, x)==1)
You're pretty close, but the issue is that strcmp compares the whole string and since you only have the beginning part, it is not going to match. I would use strncmp to compare only the first n characters of the string. We can determine what n is based upon the length of your string x.
matches = strncmp({list.name}, x, numel(x));
thisfile = list(matches).name;

Rename specific a component of file names using a list

I have a number of files that look like this:
imgDATA_subj001_log000_sess001_at.img
imgDATA_subj001_log000_sess001_cn.img
imgDATA_subj001_log000_sess001_cx.img
imgDATA_subj001_log000_sess002_at.img
imgDATA_subj001_log000_sess002_cn.img
imgDATA_subj001_log000_sess002_cx.img
imgDATA_subj002_log000_sess001_at.img
...
I want to rename a specific numeric part of the file name after subj . For instance, subj001, subj002, subj003, etc. would be renamed to subj014, subj027,subj65, etc. but preserve the rest of the file name. I have the list of new names but not sure how to look for old names and match with the new names then do the renaming. I tried using loops and fileparts but I don't know how to isolate the subj*** component. I could do move but that would be very inefficient. Can anyone help?
If you know the portions of the filename that you want to replace specifically, ie you know that subj001 = subj014 then what you should do is use a dir command to get the list of files in the existing directory.
This will give you a list of the files,
f=dir(imgData*.img)
for somecounter=1:length(f)
filename=f(somecounter).name
newname=strrep(filename,'subj001','subj014')
movefile(filename,newname)
end
Obviously you'll want to setup an array of each of the individual names to match up and iterate through that.

Write RDD in txt file

I have the following type of data:
`org.apache.spark.rdd.RDD[org.apache.spark.rdd.RDD[((String, String),Int)]] = MapPartitionsRDD[29] at map at <console>:38`
I'd like to write those data in a txt file to have something like
((like,chicken),2) ((like,dog),3) etc.
I store the data in a variable called res
But for the moment I tried with this:
res.coalesce(1).saveAsTextFile("newfile.txt")
But it doesn't seem to work...
If my assumption is correct, then you feel that the output should be a single .txt file if it was coalesced down to one worker. This is not how Spark is built. It is meant for distributed work and should not be attempted to be shoe-horned into a form where the output is not distributed. You should use a more generic command line tool for that.
All that said, you should see a folder named newfile.txt which contains data files with your expected output.

Count number of files in a FTP folder

I use this code to see existed files in a specific folder from FTP :
ftp_client = ftp('IP','Username','Password');
aa = dir(ftp_client,'First_folder/Second_folder');
I can see file names with these codes :
aa(1,1).name
aa(2,1).name
aa(3,1).name
How can I see all files names in a cell aray in this specific folder? Is there any command for it?
How can I count number of existed files in this folder?
How can I count number of existed files in this folder with a specific format?
Thanks.
A simple way is to collect the values into the cell array by using curly brackets: filenames = {aa.name};
The simplest method would be length(aa); or length(filenames);
A couple ways. You can either refine your dir call, aa = dir(ftp_client,'First_folder/Second_folder/*.jpeg') for example, or use your own filter (regexp is one option) on your filenames to return the indices of what you want.
As a general aside, if you're utilizing this program on different operating systems, I would recommend utilizing fullfile (or filesep at the very least) to build your full pathnames to ensure the right separator is used. Though I didn't do it in my example above...