Converting multiple text file to Mat file in Matlab - matlab

I have multiple text files like Symbol1010, Symbol1020...SymbolXXXX.
I want to know if there is any easiest way to process those files in to mat files.
Specifications:
All the files have the same header (strings) in the first row.
All the files have the date in their first column
All the files have the same number of rows and columns.
I tried using importdata and it works good for single file.

If "importdata" works well for your files I would strongly suggest using it in a loop. If you encounter problems while implementing that, please be more specific in your question. Below is a sample that might be a good starting point.
prefix = 'Symbol';
suffixes = (1010:10:1100);
for idx = 1 : length(suffixes)
filename = [prefix, num2str(suffixes(idx))];
A = importdata(filename);
save(filename,'A');
end

Your question is missing quite a lot of detail so I can only give you a general answer, but I'm going to assume that you already know that you should put the single-file code in a loop and that in your single-file example you currently hardcode the name of the file.
Your first problem would then be how to get the list of files. The functions you want are dir and possibly fullfile, you should check out the documentation by typing doc dir in the console. Matlab has extensive documentation and you can often find answers in there very quickly indeed.
If you need more specific answers you would need to post the code that you have so far, a description of what you want to happen and what is happening. I recommend the stackoverflow.com/tour as a good introduction to how to pose a good question.

Thanks michael and xenoclast for the help. I got this
d = dir('*.txt');
nfiles = length(d);
%Conversion of data in text format to Mat format
data = cell(1, nfiles);
for k = 1:nfiles
data{k} = importdata(d(k).name);
end

Related

MATLAB find references to function

MATLAB has some great tools, among which this dependency listing function stands out. I'm wondering, is there a way to perform the inverse operation?
That is, fList = matlab.codetools.requiredFilesAndProducts(files) takes a function or script and returns a list of all of the dependencies. I am trying to do the opposite: given a function, I want to find all the functions where this function is called, perhaps limited to the scope of my working directory.
The only solution I can think of is a brute force approach (which would be painfully slow given the speed of matlab.codetools.requiredFilesAndProducts). In MATLAB-esque pseudocode:
foi = file of interest
files = empty set of file lists
i = 0;
for all files f in dir
files{i} = matlab.codetools.requiredFilesAndProducts(f);
i = i + 1;
end
find indices in files where list contains foi
Surely there must be better way.
The best solution I have found is to use the MATLAB "find files" tool (in the latest versions, it is a button on the editor window). It is actually extremely fast, and you can have it search all the .m files in a directory structure, and return every line where a particular string is used - like say, the name of your function.
See if the Parents listing in the dependency report is what you're looking for. It only looks in the current directory, and it has some exclusions.

How to create blank .mat file from terminal?

Is there any way to create an empty .mat file from a terminal session? Basically, what I am doing is brain graph analysis. The software I am using, if an entire brain is scrubbed (ie, if the displacement of the brain is greater than a certain threshold) the output file will be left out or will be very small. When analyzing, however, I need to be able to eliminate both subjects from the analysis if the entire brain is scrubbed/too much of the brain is scrubbed. To accomplish this, the easiest way would be to simply check the dimensions of the output file within matlab, and if they are below the arbitrary threshold I decide then both subjects will just be skipped over for analysis. The issue is, I can easily check if a file contains too few remaining frames, however, if the resulting file contains no frames, it will entirely just not exist. As the outputs are all sorted, the only thing I need to do is check consecutive files' dimensions, and if one of the files does not contain enough values, then I can simply skip over it entirely. Simply touching a blank file obviously will not work, since it will not contain any encoding. I hope this is a good explanation for my motivation to do this, and if any of you know of any suggestions, please let me know.
A simple solution would be to create an empty file from Matlab and duplicate the file when needed from the console.
Just open Matlab, set to the destination folder and type this:
clear all
save empty.mat
Then, when needed, copy the file from the console. :)
Saving the contents of an empty struct creates an empty .mat file:
emptyStruct = struct;
save('myFile.mat','-struct','emptyStruct');

MATLAB - Stitch Together Multiple Files

I am new to MATLAB programming and some of the syntax escapes me. So I need a little help. Plus I need some complex looping ideas.
Here's the breakdown of what I have:
12 seperate .dat files, each titled something like output_1_x.dat, output_2_x.dat, etc.
each file is actually one piece of a whole that was seperated and processed
each .dat file is approx. 3.9 GB
Here's what I need to do:
create a single file containing all the data from each seperate file, i.e. I need to recreate the original file.
call this complete output file something like output_final.dat
it has to be done in MATLAB, there are no other alternatives (actually there maybe; see note below)
What is implied:
I will have to fread each 3.9 GBfile into chunks or packets, probably 100 mb at a time (using an imbedded loop?)
these packets will have to be read then written sequentially
after one file is read then written into output_final.dat, the next file is automatically read & written (the master loop).
Well, that's pretty much it. I did a search for 'merging mulitple files' and found this. That isn't exactly what I need to do...I don't need to take part of a file, or data from files, and write it to a new one. I'm simply...concatenating...? This would be simple in Java or Perl, but I only have MATLAB as a tool.
Note: I am however running KDE in OpenSUSE on a pretty powerful box. Maybe someone who is also an expert in terminal knows a command/script to do this from the kernel?
So on this site we usually would point you to whathaveyoutried.com but this question is well phrased.
I wont write the code but i will give you how I would do it. So first I am a bit confused about why you need to fread the file. Are you just appending one file onto the end of another?
You can actually use unix commands to achieve what you want:
files = dir('*.dat');
for i = 1:length(files)
string = sprintf('cat %s >> output_final.dat.temp', files(i).name);
unix(string);
end
That code should loop through all the files and pipe all of the content into output_final.dat.temp (then just rename it, we didn't want it to be included in anything);
But if you really want to use fread because you want to parse the lines in some manner then you can use the same process:
files = dir('*.dat');
fidF = fopen('output_final.dat', 'w');
for i = 1:length(files)
fid = fopen(files(i).name);
while(~feof(fid))
string = fgetl(fid) %You may choose to parse the string in some manner here
fprintf(fidF, '%s', string)
end
end
Just remember, if you are not parsing the lines this will take much much longer.
Hope this helps.
I suggest using a matlab.io.matfileclass objects on two of the files:
matObj1 = matfile('datafile1.mat')
matObj2 = matfile('datafile2.mat')
This does not load any data into memory. Then you can use the objects' methods to sequentialy save a variable from one file to another.
matObj1.varName = matObj2.varName
You can get all the variables in one file with fieldnames(mathObj1) and loop through to copy contents from one file to another. You can then clear some space by removing the copied fields. Or you can use a bit more risky procedure by directly moving the data:
matObj1.varName = rmfield(matObj2,'varName')
Just a disclaimer: haven't tried it, use at own risk.

vectorization - importing excel files into matlab

I've written the following function for importing excel files into matlab. The function works fine, where by inserting the path name of the files, the scripts imports them into the workspace. The function is shown below:
function Data = xls_function(pathName);
%Script imports the relevant .xls files into matlab - ensure that the .xls
%files are stored in a folder specified by 'pathName'.
%--------------------------------------------------------------------------
TopFolder = pathName;
dirListing = dir(TopFolder);%Lists the folders in the directory specified
%by pathName.
dirListing = dirListing(3:end);%Remove the first two structures as they
%are only pointers.
for i = 1:length(dirListing);
SubFolder{i} = dirListing(i,1).name;%obtain the name of each folder in
%the specified path.
SubFolderPath{i} = fullfile(pathName, dirListing(i,1).name);%obtain
%the path name for each of the folders.
ExcelFile{i} = dir(fullfile(SubFolderPath{i},'*.xls'));%find the
%number of .xls files in each of the SubFolders.
for j = 1:length(ExcelFile{1,i});
ExcelFileName{1,i}{j,1} = ExcelFile{1,i}(j,1).name;%find the name
%of each .xls file in each of the SubFolders.
for k = 1:length(ExcelFileName);
for m = 1:length(ExcelFileName{1,k});
[status{1,k}{m,1},sheets{1,k}{m,1},format{1,k}{m,1}]...
= xlsfinfo((fullfile(pathName,SubFolder{1,k},...
ExcelFileName{1,k}{m,1})));%gather information on the
%.xls files i.e. worksheet names.
Name_worksheet{1,k}{m,1} = sheets{1,k}{m,1}{1,end};%obtain
%the name of each of the .xls worksheets within
%each spreadsheet.
end
end
end
end
for n = 1:length(ExcelFileName);
for o = 1:length(ExcelFileName{1,n});
%require two loops as the number of excel spreadsheets varies
%from the number of worksheets in each spreadsheet.
TXT{1,n}{o,1} = xlsread(fullfile(pathName,SubFolder{1,n},...
ExcelFileName{1,n}{o,1}),Name_worksheet{1,n}{o,1});%import the
%relevant data from excel by using the spreadsheet and
%worksheet names previously obtained.
Data.(SubFolder{n}){o,1} = TXT{1,n}{o,1};
end
end
The only problem with the script is that it takes too long to run if the number of .xls files is large. I've read that vectorization would improve the running time, therefore I am asking for any advice on how I could alter this code to run faster, through vectorization.
I realise that reading a code like this isn't easy (especially as my form of coding is by no means as efficient as I would like) but any advice provided would be much appreciated.
I don't think vectorization applies to your problem - but one after the other.
As an example for your data you could use cellfun to turn a loop vectorized:
tmp = ExcelFileName{1,n}
result_cell = cellfun(#(x) xlsread(fullfile(pathName,x)),tmp, 'UniformOutput', false))
But the key problem is the poor implementation of xlsread and the other excel related functions in matlab. What they do is with every(!) function call they create a new excel process (which is hidden) in which they perform your command and then end it.
I remember a tool at matlab central that reused the same excel instance and thus was very quick - but unfortunately I can no longer find it. But maybe you can find an example there on which you can base your own reader which reuses it.
On a related note - Excel has the stupid limitation that it doesn't allow you two files with the same name to be opened at the same time - and then fails with some error. So if you run your reading vectorized/parallel you are in for a whole new fun of strange errors :D
For myself I found the only propper way to deal with these documents through java with Apache POI libraries. These have the nice advantage you don't need Excel installed - but unfortunatly require some programming.

Reading large csv files with strings containing commas as one field

I have a large .csv file (~26000 rows). I want to be able to read it into matlab. Another problem is that it contains a collection of strings delimited by commas in one of the fields.
I'm having trouble reading it. I tried stuff like tdfread, which won't work here. Any tricks with textscan i should be aware about?
Is there any other way?
I'm not sure what is generating your CSV file but that is your problem.
The point of a CSV file, is that the file itself designates separation of fields. If the text of the CSV contains commas, then nothing you can do will help you. How would ANY program know when the text in a single field contains commas, or when that comma is a field delimiter?
Proper CSV would have a text qualifier. Some generators/readers gives you the option to use one. The standard text qualifier is a " (quote). Its changeable, though, because your text may contain those, too.
Again, its all about generating proper CSV content.
There's a chance that xlsread won't give you the answer you expect -- do the strings always appear in the same columns, for example? I think (as everyone else seems to :-) that it would be more robust to just use
fid = fopen('yourfile.csv');
and then either textscan
t = textscan(fid, '%s', delimiter', sprintf('\n'));
t = t{1};
or just fgetl (the example in the help is perfect).
After that you can do some line-by-line processing -- using textscan again on the text content of each line, for example, is a nice, quick way to get a cell-array that will allow fast analysis of each line.
You have a problem because you're reading it in as a .csv, and you have commas within your data. You can get it in Excel and manipulate the date, possibly extract the unwanted commas with Excel formulas. I work with .csv files for DB imports quite a bit. I imagine matLab has similar rules, which is - no commas in your data.
Can you tell us more about your data? Are there commas throughout, our just one column? Maybe you can read it in as tab delimited?
Are you using a Unix system? The reason I am asking is that you could use a command-line function such as sed and regular expressions to clean those data files before you pass them into Matlab. Here is a link that explains how to do exactly what you are looking for.
Since, as others have observed, your file is CSV with commas inside what you think of as a single field, it's going to be hard to persuade Matlab that that really is only one field. I think your best strategy is going to be to read one line at a time, into a string acting as a buffer, and to translate it, field-by-field, into the variables or other data structures that you want. Since Matlab has in-built regular expression capabilities this shouldn't be too hard.
And, as others have already suggested, posting a sample of your data would help us to help you.
One easy solution is:
path='C:\folder1\folder2\';
data = 'data.csv';
data = dataset('xlsfile',sprintf('%s\%s', path,data));
Of course you could also do the following:
[data,path] = uigetfile('C:\folder1\folder2\*.csv');
data = dataset('xlsfile',sprintf('%s\%s', path,data));
now you will have loaded the data as dataset. An easy way to get a column 1 for example is
double(data(1))