How do I sort files into multiple folders in MATLAB? - matlab

I am a little stuck on a problem. I have tons of files generated daily, and I need to sort them by file name and date so that my MATLAB script can read them. I currently do this manually, but was wondering whether there is an easier way to sort and copy files in MATLAB.
My file names look like:
data1_2009_12_12_9.10
data1_2009_12_12_9.20
data1_2009_12_12_9.30
data1_2009_12_12_9.40
data2_2009_12_12_9.10
data2_2009_12_12_9.20
data2_2009_12_12_9.30
data2_2009_12_12_9.40
data3_2009_12_12_9.10
data3_2009_12_12_9.20
data3_2009_12_12_9.30
data3_2009_12_12_9.40
...
and tons of files like this.
Addition to the above problem:
There has to be an easier way to stitch the files together.
I mean copying file 'data1_2009_12_12_9.20' after file 'data1_2009_12_12_9.10' and so on, such that I am left with one huge txt file at the end, named data1_2009_12_12 (or whatever), containing all the data stitched together.
The only way I know of right now is to open each file with an individual dlmread command in MATLAB and xlswrite them one after another (or, more trivially, to copy and paste manually).

Working in the field of functional imaging research, I've often had to sort large sets of files into a particular order for processing. Here's an example of how you can find files, parse the file names for certain identifier strings, and then sort the file names by a given criterion...
Collecting the files...
You can first get a list of all the file names from your directory using the DIR function:
dirData = dir('your_directory'); %# Get directory contents
dirData = dirData(~[dirData.isdir]); %# Use only the file data
fileNames = {dirData.name}; %# Get file names
Parsing the file names with a regular expression...
Your file names appear to have the following format:
'data(an integer)_(a date)_(a time)'
so we can use REGEXP to parse the file names that match the above format and extract the integer following data, the three values for the date, and the two values for the time. The expression used for the matching will therefore capture 6 "tokens" per valid file name:
expr = '^data(\d+)\_(\d+)\_(\d+)\_(\d+)\_(\d+)\.(\d+)$';
fileData = regexp(fileNames,expr,'tokens'); %# Find tokens
index = ~cellfun('isempty',fileData); %# Find index of matches
fileData = [fileData{index}]; %# Remove non-matches
fileData = vertcat(fileData{:}); %# Format token data
fileNames = fileNames(index); %# Remove non-matching file names
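As a quick check of the expression above, here is a small sketch run on a hard-coded list instead of a real directory. The sample names (including the non-matching 'readme.txt') are made up for illustration:

```matlab
% Hypothetical sample names; 'readme.txt' is included to show a non-match
fileNames = {'data2_2009_12_12_9.10', 'readme.txt', 'data1_2009_12_12_9.40'};
expr = '^data(\d+)\_(\d+)\_(\d+)\_(\d+)\_(\d+)\.(\d+)$';
fileData = regexp(fileNames, expr, 'tokens');   % one cell per file name
index = ~cellfun('isempty', fileData);          % true where the name matched
fileData = [fileData{index}];                   % drop the non-matches
fileData = vertcat(fileData{:});                % nFiles-by-6 cell array of strings
fileNames = fileNames(index);                   % keep the matching names only
disp(fileData)
```

Each row of fileData then holds the six string tokens of one matching file name, in the order data number, year, month, day, hour, minute.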
Sorting based on the tokens...
You can convert the above string tokens to numbers (using the STR2DOUBLE function) and then convert the date and time values to a date number (using the function DATENUM):
nFiles = size(fileData,1); %# Number of files matching format
fileData = str2double(fileData); %# Convert from strings to numbers
fileData = [fileData zeros(nFiles,1)]; %# Add a zero column (for the seconds)
fileData = [fileData(:,1) datenum(fileData(:,2:end))]; %# Format dates
The variable fileData will now be an nFiles-by-2 matrix of numeric values. You can sort these values using the function SORTROWS. The following code will sort first by the integer following the word data and next by the date number:
[fileData,index] = sortrows(fileData,1:2); %# Sort numeric values
fileNames = fileNames(index); %# Apply sort to file names
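To see the sort keys in isolation, here is a minimal sketch that starts from already-converted numeric tokens. The three rows and names are made-up sample values, not real data:

```matlab
% Hypothetical token values: [dataNumber year month day hour minute]
vals = [2 2009 12 12 9 10;
        1 2009 12 12 9 40;
        1 2009 12 12 9 10];
names = {'data2_2009_12_12_9.10', 'data1_2009_12_12_9.40', 'data1_2009_12_12_9.10'};
dateNums = datenum([vals(:,2:6) zeros(size(vals,1),1)]);  % append seconds = 0
keys = [vals(:,1) dateNums];                              % nFiles-by-2 sort keys
[~, order] = sortrows(keys, [1 2]);                       % data number, then time
sortedNames = names(order);
disp(sortedNames(:))
```

The two data1 files come first, ordered by time, followed by the data2 file.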
Concatenating the files...
The fileNames variable now contains a cell array of all the files in the given directory that match the desired file name format, sorted first by the integer following the word data and then by the date. If you now want to concatenate all of these files into one large file, you could try using the SYSTEM function to call a system command to do this for you. If you are using a Windows machine, you can do something like what I describe in this answer to another SO question where I show how you can use the DOS for command to concatenate text files. You can try something like the following:
inFiles = strcat({'"'},fileNames,{'", '}); %# Add quotes, commas, and spaces
inFiles = [inFiles{:}]; %# Create a single string
inFiles = inFiles(1:end-2); %# Remove last comma and space
outFile = 'total_data.txt'; %# Output file name
system(['for %f in (' inFiles ') do type "%f" >> "' outFile '"']);
This should create a single file total_data.txt containing all of the data from the individual files concatenated in the order that their names appear in the variable fileNames. Keep in mind that each file will probably have to end with a new line character to get things to concatenate correctly.
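If you would rather stay inside MATLAB and avoid platform-specific shell commands, the concatenation can also be sketched with plain file I/O. This is a swapped-in alternative to the DOS command, not the approach described above; the temporary folder and the two sample files are created only so that the example runs on its own:

```matlab
% Sketch: concatenate files byte-for-byte in the order given by fileNames.
% The input files are created here only to make the example self-contained.
workDir = tempname;  mkdir(workDir);
fileNames = {'data1_2009_12_12_9.10', 'data1_2009_12_12_9.20'};
sampleText = {sprintf('1 2\n'), sprintf('3 4\n')};
for k = 1:numel(fileNames)
    fid = fopen(fullfile(workDir, fileNames{k}), 'w');
    fwrite(fid, sampleText{k});
    fclose(fid);
end

outFile = fullfile(workDir, 'total_data.txt');   % hypothetical output name
fout = fopen(outFile, 'w');
for k = 1:numel(fileNames)
    fin = fopen(fullfile(workDir, fileNames{k}), 'r');
    fwrite(fout, fread(fin, Inf, '*uint8'));     % copy the raw bytes unchanged
    fclose(fin);
end
fclose(fout);
disp(fileread(outFile))
```

As with the shell version, each input file should end with a newline so that the last row of one file does not run into the first row of the next.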

An alternative to what @gnovice suggested is to loop over the file names and use sscanf() to recover the different sections in the filenames you are interested in:
n = sscanf(filename, 'data%d_%d_%d_%d_%d.%d')
n(1) %# data number
n(2) %# the year
...
Example:
files = dir('data*'); %# list all entries beginning with 'data'
parts = zeros(length(files), 6); %# read all the 6 parts into this matrix
for i=1:length(files)
parts(i,:) = sscanf(files(i).name, 'data%d_%d_%d_%d_%d.%d')'; %'#transposed
end
[parts idx] = sortrows(parts, [6 1]); %# sort by one/multiple columns of choice
files = files(idx); %# apply the new order to the files struct
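As a quick check of the format string, here is a sketch that parses a single name taken from the question and turns the date and time parts into a date number (assuming the last two fields are hour and minute):

```matlab
% Parse one sample name; sscanf returns the six numbers as a column vector
n = sscanf('data3_2009_12_12_9.40', 'data%d_%d_%d_%d_%d.%d');
dataNumber = n(1);                     % the integer following 'data'
timeStamp  = datenum([n(2:6).' 0]);    % year, month, day, hour, minute, second = 0
fprintf('data %d recorded at %s\n', dataNumber, datestr(timeStamp));
```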
EDIT:
I just saw your edit about merging those files. That can be done easily from the shell. For example, let's create one big file for all data from the year 2009 (assuming it makes sense to stack files on top of each other):
on Windows:
type data*_2009_* > 2009.backup
on Unix:
cat data*_2009_* > 2009.backup

In Matlab the function call
files = dir('.');
returns a structure (called files) with fields
name
date
bytes
isdir
datenum
You can use your usual MATLAB techniques for manipulating the file names; the field is called name, so {files.name} collects all names into a cell array.
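A minimal sketch of working with that structure is shown below. It creates a temporary folder with two empty files (made-up names) so that it can run anywhere, then filters out directories and sorts by name and by modification time:

```matlab
% Sketch: list plain files in a folder and order them.
% The temporary folder and its two files exist only for this example.
workDir = tempname;  mkdir(workDir);
fclose(fopen(fullfile(workDir, 'b.txt'), 'w'));
fclose(fopen(fullfile(workDir, 'a.txt'), 'w'));

files = dir(workDir);                 % struct array, one element per entry
files = files(~[files.isdir]);        % drop '.', '..' and any subfolders
names = sort({files.name});           % cell array of names, sorted alphabetically
[~, order] = sort([files.datenum]);   % indices from oldest to newest
disp(names)
```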

Related

How do you call a function that takes in a MAT file, manipulate the data in that file, and create a new textfile with that same MAT file name?

The filename in question is a MAT file that contains elements in the form of "a - bi" where 'i' signifies an imaginary number. The objective is to separate the real, a, and imaginary, b, parts of these elements and put them into two arrays. Afterwards, a text file with the same name as the MAT file will be created to store the data of the newly created arrays.
Code:
function separate(filename)
realArray = real(filename)
imagArray = imag(filename)
fileIDname = strcat(filename, '.txt')
fileID = fopen(fileIDname, 'w')
% more code here - omitted for readability
end
I am trying to run the above code via command window. Here's what I've tried so far:
%attempt 1
separate testFileName
This does not work, as the output does not contain the correct data from the MAT file. Instead, realArray and imagArray contain data based on the ASCII characters of "testFileName".
e.g. first element of realArray corresponds to the integer value of 't', the second - 'e', third - 's', etc. So the array contains only the number of elements as the number of characters in the file name (12 in this case) instead of what is actually in the MAT file.
%attempt 2
load testFileName
separate(testFileName)
So I tried to load the testFileName MAT variable first. However this throws an error:
Complex values cannot be converted to chars
Error in strcat (line 87)
s(1:pos) = str;
Error in separate (line xx)
fileIDname = strcat(filename, '.txt')
Basically, you cannot concatenate the elements of an array to '.txt' (of course). But I am trying to concatenate the name of the MAT file to '.txt'.
So either I get the wrong output or I manage to successfully separate the elements but cannot save to a text file of the same name after I do so (an important feature to make this function re-usable for multiple MAT files).
Any ideas?
A function to read complex data, modify it and save it in a txt file with the same name would look approximately like this (call it with the file name as a string, e.g. dosomestuff('testFileName')):
function dosomestuff(fname)
% load the MAT file into a struct; its fields are the saved variables
data = load(fname);
% get your data; you need to know the variable name, let's assume it is called "datacomplex"
z = data.datacomplex + sqrt(-2); % "modify the data"
% create the txt file and write one "a+bj" value per line
fid = fopen([fname,'.txt'],'w');
% interleave real and imaginary parts so that fprintf consumes them in pairs
fprintf(fid, '%f%+fj\n', [real(z(:)) imag(z(:))].');
fclose(fid);
end
This makes quite a few assumptions about the data and format, but I can't do more without extra information.
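Here is a self-contained sketch of the same idea that can be pasted into a script. It first creates a small MAT file so there is something to load; the file name, the variable name "datacomplex" and the two-column output layout are all assumptions made for the example:

```matlab
% Sketch: save a complex array to a MAT file, load it back by file name, and
% write its real and imaginary parts to a text file with the same base name.
matFile = fullfile(tempdir, 'testFileName.mat');
datacomplex = [1-2i; 3+4i];                 % made-up sample data
save(matFile, 'datacomplex');

s = load(matFile);                          % struct with field "datacomplex"
z = s.datacomplex;
[folder, base] = fileparts(matFile);        % strip the ".mat" extension
txtFile = fullfile(folder, [base '.txt']);
fid = fopen(txtFile, 'w');
fprintf(fid, '%f %f\n', [real(z(:)) imag(z(:))].');   % one "a b" pair per line
fclose(fid);
type(txtFile)
```

The key point is that the function receives the file name as text and does the loading itself, so the name is still available for building the output file name.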

Select specific filenames from an array of filenames containing a date in the name

If I have a group of .wav files and I'm trying to pick files month-wise, compute daily or night-only PSD (power spectral density) averages, or choose the files belonging to a given month, how do I go about it? The following are the first 10 .wav file names from a .txt file that is read into the MATLAB code:
AMAR168.1.20150823T200235Z.wav
AMAR168.1.20150823T201040Z.wav
AMAR168.1.20150823T201845Z.wav
AMAR168.1.20150823T202650Z.wav
AMAR168.1.20150823T203455Z.wav
AMAR168.1.20150823T204300Z.wav
AMAR168.1.20150823T205105Z.wav
AMAR168.1.20150823T205910Z.wav
AMAR168.1.20150823T210715Z.wav
The yyyymmddTHHMMSSZ.wav part of the name encodes the date and time, which gives a sense of some of the parameters.
Many thanks
You need to be more specific.
Do all files always start with "AMAR168.1." for instance?
Anyway, here's a general approach to get you started:
AllFilenames = fileread ('filenames.dat');
FileNames = strsplit (AllFilenames, '\n');
for i = FileNames
if ~isempty (strfind (i{:}, '20150823')); disp(i{:}); end
end
Your filename examples aren't very useful because they all have the same date, but, anyway, you get the point.
Alternatively, if the filenames always have the same format and size, you could do, e.g.:
AllFilenames = fileread ('filenames.dat');
AllFilenames = strvcat (strsplit (AllFilenames, '\n'));
LogicalIndices = categorical (cellstr (AllFilenames(:,15:16))) == '08';
to obtain all rows where the month is '08', for instance. This assumes that the month is always at positions 15 to 16 of the string.
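If your MATLAB version has datetime (R2014b or later), another option is to parse the timestamp out of each name and then filter on month or hour. This is only a sketch: the list is hard-coded, the third name is made up to have a different month, and the definition of "night" as 20:00 to 06:00 is an assumption:

```matlab
% Sketch: parse the timestamp out of each name and select by month or hour.
names = {'AMAR168.1.20150823T200235Z.wav', ...
         'AMAR168.1.20150823T210715Z.wav', ...
         'AMAR168.1.20150901T031500Z.wav'};   % last name is hypothetical
stamps = regexp(names, '\d{8}T\d{6}', 'match', 'once');       % e.g. '20150823T200235'
t = datetime(stamps, 'InputFormat', 'yyyyMMdd''T''HHmmss');   % datetime array
augustFiles = names(month(t) == 8);                 % files recorded in August
nightFiles  = names(hour(t) >= 20 | hour(t) < 6);   % assumed night window
disp(augustFiles(:))
```

Because the regular expression looks for the timestamp pattern itself, this does not depend on the prefix always being "AMAR168.1.".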

Saving strings from the input .txt filename - MATLAB

Via textscan, I am reading a number of .txt files:
fid1 = fopen('Ev_An_OM2_l5_5000.txt','r');
This is a simplification as in reality I am loading several hundred .txt files via:
files = dir('Ev_An*.txt');
Important information not present within the .txt files themselves are instead part of the filename.
Is there a way to concisely extract portions of the filename and save them as strings/numbers? For example saving 'OM2' and '5000' from the above filename as variables.
fileparts appears to require the full path of the file rather than just defaulting to the MATLAB folder as with textscan.
It depends on how fixed your filename is. If your filename is in the string filename, then you can use regexp to extract parts of your filename, like so:
filename = 'Ev_An_OM2_l5_5000.txt'; %or whatever
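parts = regexp(filename,'[^_]+_[^_]+_([^_]+)_[^_]+_([^\.]+)\.txt','tokens','once');
With the 'once' option, parts is a cell array of the two tokens, so parts{1} is 'OM2' and parts{2} is '5000' (without 'once' the tokens are nested one level deeper, as parts{1}{1} and parts{1}{2}). This assumes that your filename is always in the form of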
something_something_somethingofinterest_something_somethingofinterest.txt
Update:
If you like structs more than cells, then you can name your tokens like so:
parts = regexp(filename,'[^_]+_[^_]+_(?<first>[^_]+)_[^_]+_(?<second>[^\.]+)\.txt','names');
in which case parts.first=='OM2' and parts.second=='5000'. You can obviously name your tokens according to their actual meaning, since they are important. You just have to change first and second accordingly in the code above.
Update2:
If you use dir to get your filenames, you should have a struct array with loads of unnecessary information. If you really just need the file names, I'd use a for loop like so:
files = dir('Ev_An*.txt');
for i=1:length(files)
filename=files(i).name;
parts = regexp(filename,'[^_]+_[^_]+_(?<first>[^_]+)_[^_]+_(?<second>[^\.]+)\.txt','names');
%here do what you want with parts.first, parts.second and the file itself
end
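To keep the extracted values for later use, you could collect them into arrays inside the loop. The sketch below uses a hard-coded list in place of dir, and the second file name is made up for illustration:

```matlab
% Sketch: collect the named tokens for several file names.
fileList = {'Ev_An_OM2_l5_5000.txt', 'Ev_An_OM3_l5_7500.txt'};  % second name is hypothetical
pattern = '[^_]+_[^_]+_(?<first>[^_]+)_[^_]+_(?<second>[^\.]+)\.txt';
labels = cell(size(fileList));
values = zeros(size(fileList));
for k = 1:numel(fileList)
    parts = regexp(fileList{k}, pattern, 'names');
    labels{k} = parts.first;                 % e.g. 'OM2', kept as text
    values(k) = str2double(parts.second);    % e.g. 5000, converted to a number
end
disp(labels); disp(values)
```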

Find and replace text file Matlab

I'm writing MATLAB code that generates an array of numbers. Each number should be converted to a string and used to replace all instances of a given word in a text file that already exists. So far I have this:
ita='"';
for i=1:size(z,2)
word_to_replace=input('Replace? ','s');
tik=input('Replacement? ','s');
coluna=input('Column? ');
files = dir('*.txt');
for i = 1:numel(files)
if ~files(i).isdir % make sure it is not a directory
contents = fileread(files(i).name);
fh = fopen(files(i).name,'w');
val=num2str(z(i,coluna));
word_replacement=strcat(tik,val,ita);
contents = regexprep(contents,'word_to_replace','word_replacement');
fprintf(fh,contents); % write "replaced" string to file
fclose(fh) % close out file
end
end
end
I want the code to open file #1 ('file.txt'), find all instances of 'word_to_replace', replace them with 'word_replacement', and save to the same file. The number of txt files is not fixed; it could be 100 or 10000.
Many thanks in advance.
The problem with your code is the following statement:
contents = regexprep(contents,'word_to_replace','word_replacement');
You are using regular expressions to find any instances of word_to_replace in your text files and changing them to word_replacement. Looking at your code, it seems that these are both variables that contain strings. I'm assuming that you want the contents of the variables instead of the actual name of the variables.
As such, simply remove the quotation marks around the second and third arguments of regexprep and this should work.
In other words, do this:
contents = regexprep(contents, word_to_replace, word_replacement);
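Two further details may matter in practice: regexprep treats the search text as a regular expression, and fprintf(fh,contents) interprets any % or \ in the file text as formatting. Here is a sketch of the read, replace, write loop that sidesteps both by using strrep (a literal replacement, swapped in for regexprep) and an explicit '%s' format. The folder, file name and words are made up so that the example runs on its own:

```matlab
% Sketch: replace a literal word in every .txt file of a folder.
workDir = tempname;  mkdir(workDir);
fid = fopen(fullfile(workDir, 'file1.txt'), 'w');
fprintf(fid, '%s', 'speed = "OLD"; 100% done');   % made-up file contents
fclose(fid);

word_to_replace  = '"OLD"';
word_replacement = '"NEW1"';
files = dir(fullfile(workDir, '*.txt'));
for k = 1:numel(files)
    fname = fullfile(workDir, files(k).name);
    contents = fileread(fname);
    contents = strrep(contents, word_to_replace, word_replacement);  % literal match
    fh = fopen(fname, 'w');
    fprintf(fh, '%s', contents);   % '%s' keeps any % or \ in the text intact
    fclose(fh);
end
disp(fileread(fullfile(workDir, 'file1.txt')))
```

If you do need pattern matching, keep regexprep with the variables as shown in the answer.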

How to avoid the repeated paragraphs of long txt files being ignored by importdata in MATLAB

I am trying to import all double from a txt file, which has this form
#25x1 string
#9999x2 double
.
.
.
#(repeat ten times)
However, when I use the Import Wizard, only the first
25x1 string
9999x2 double.
was successfully loaded; the other 9 were simply ignored.
How can I import all the data? (Does importdata have a maximum length or something?)
Thanks
It has nothing to do with a maximum length; importdata is just not set up for the sort of data file you describe. From the help file:
For ASCII files and spreadsheets, importdata expects
to find numeric data in a rectangular form (that is, like a matrix).
Text headers can appear above or to the left of the numeric data,
as follows:
Column headers or file description text at the top of the file, above
the numeric data. Row headers to the left of the numeric data.
So what is happening is that the first section of your file, which does match the format importdata expects, is being read, and the rest ignored. Instead of importdata, you'll need to use textscan, in particular, this style:
C = textscan(fileID,formatSpec,N)
fileID is returned from fopen. formatSpec tells textscan what to expect, and N how many times to repeat it. As long as fileID remains open, repeated calls to textscan continue to read the file from wherever the last read action stopped, rather than going back to the start of the file. So we can do this:
fileID = fopen('myfile.txt');
repeats = 10;
C = cell(repeats,2); % preallocate: column 1 for strings, column 2 for numbers
for n = 1:repeats
    % read one string, 25 times
    C{n,1} = textscan(fileID,'%s',25);
    % read two floats, 9999 times
    C{n,2} = textscan(fileID,'%f %f',9999);
end
fclose(fileID); % release the file handle when done
You can then extract your numerical data out of the cell array (if you need it in one block you may want to try using 'CollectOutput',1 as an option).
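Here is a scaled-down sketch of the whole round trip, including stacking the numeric blocks into one matrix. The block sizes (3 text lines and 4 numeric rows, repeated twice) and the file contents are made up; the file in the question would use 25, 9999 and 10 instead:

```matlab
% Sketch: write a small file with repeated text + numeric blocks, read it back
% block by block, and stack the numeric parts into one matrix.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
for b = 1:2
    fprintf(fid, 'header%d\n', 1:3);          % three one-word text lines
    fprintf(fid, '%f %f\n', (1:8) + 10*b);    % four rows of two numbers
end
fclose(fid);

fid = fopen(fname, 'r');
blocks = cell(2, 1);
for b = 1:2
    textscan(fid, '%s', 3);                                   % skip the text lines
    num = textscan(fid, '%f %f', 4, 'CollectOutput', true);   % 4-by-2 matrix in num{1}
    blocks{b} = num{1};
end
fclose(fid);
allData = vertcat(blocks{:});   % 8-by-2 numeric matrix
disp(size(allData))
```

Note that '%s' reads whitespace-delimited words, so this assumes each text line contains a single word; text lines with spaces would need a different format or fgetl.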