Workaround for indexing limitations of matfile command - matlab

I have a very large number of large data files. I would like to categorize the data in each file and then save the filename to a cell array, such that at the end I'll have one cell array of filenames for each category of data, which I could then save to a MAT-file so that I can come back later and run analysis on each category. It might look something like this:
MatObj = matfile('listOfCategorizedFilenames.mat');
MatObj.boring = {};
MatObj.interesting = {};
files = dir(directory);
K = numel(files);
for k = 1:K
    load(files(k).name, 'data');
    metric = testfunction(data);
    if metric < threshold
        MatObj.boring{end+1} = files(k).name;       % this cell indexing is what fails
    else
        MatObj.interesting{end+1} = files(k).name;
    end
end
Because the list of files is very long and testfunction can be slow, I'd like to set this to run unattended overnight or over the weekend (this is a stripped-down version; metric might return one of several different categories), and in case of crashes or unforeseen errors I'd like to save the data on the fly rather than populating a cell array in memory and dumping it to disk at the end.
The problem is that matfile does not allow cell indexing, so the save step throws an error. My question is: is there a workaround for this limitation? Is there a better way to incrementally write the filenames to a list that would be easy to retrieve later?
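(For reference, one possible workaround is to read the whole cell array out of the matfile object, grow it in memory, and assign the whole variable back; matfile allows whole-variable reads and writes even though it disallows indexing into cells. The sketch below re-writes the variable on every iteration, so it may be slow for very long lists, but it does persist the list after each file.)
MatObj = matfile('listOfCategorizedFilenames.mat', 'Writable', true);
MatObj.boring = {};
MatObj.interesting = {};
for k = 1:K
    load(files(k).name, 'data');
    metric = testfunction(data);
    if metric < threshold
        tmp = MatObj.boring;            % read the whole variable out
        tmp{end+1} = files(k).name;     % grow it in memory
        MatObj.boring = tmp;            % write the whole variable back
    else
        tmp = MatObj.interesting;
        tmp{end+1} = files(k).name;
        MatObj.interesting = tmp;
    end
end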

I have no experience with matfile, so I cannot help you with that. As a quick and dirty solution, I would just write the filenames to two different text files. Quick testing suggests that the data is flushed to disk straight away and that the text files are OK even if you close MATLAB without calling fclose (to simulate a crash). Untested code:
files = dir(directory);
K = numel(files);
boring = fopen('boring.txt', 'w');
interesting = fopen('interesting.txt', 'w');
for k = 1:K
    load(files(k).name, 'data');
    metric = testfunction(data);
    if metric < threshold
        fprintf(boring, '%s\n', files(k).name);
    else
        fprintf(interesting, '%s\n', files(k).name);
    end
end
% be nice and close the files
fclose(boring);
fclose(interesting);
Processing the boring/interesting text files afterwards should be trivial. If you also write the directory listing to a separate file before starting the loop (as sketched below), it should be pretty easy, either by hand or automatically, to figure out where to continue in case of a crash.
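A minimal sketch of writing that directory listing up front (the file name filelist.txt is just an assumption for illustration):
% write the full file list once, before the classification loop
files = dir(directory);
listing = fopen('filelist.txt', 'w');
fprintf(listing, '%s\n', files.name);   % the format is re-applied, so one name per line
fclose(listing);
After a crash, comparing filelist.txt against boring.txt and interesting.txt shows which files still need processing.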

MAT-files are probably the most efficient way to store lists of files, but whenever I've had this problem, I make a cell array and save it using xlswrite or fprintf into a document that I can just reload later.
You said the save step throws an error, so I assume this part is okay, right?
for k = 1:K
    load(files(k).name, 'data');
    metric = testfunction(data);
    if metric < threshold
        MatObj.boring{end+1} = files(k).name;
    else
        MatObj.interesting{end+1} = files(k).name;
    end
end
Personally, I then just write:
xlswrite('name.xls', MatObj.interesting, 1, 'A1');
[~, ~, list] = xlsread('name.xls'); % later on
Or if you prefer text,
% I'm assuming here that it's just a single list of text strings.
fid = fopen('name.txt', 'w');
nrows = numel(MatObj.interesting);
for row = 1:nrows
    fprintf(fid, '%s\n', MatObj.interesting{row});
end
fclose(fid);
And then later read it back with fscanf (a sketch using textscan follows below). I just use xlswrite. I've never had a problem with it, and it's not slow enough to put me off using it. I know my answer is just a workaround rather than a real solution, but I hope it helps.
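A minimal sketch of reading that text file back into a cell array later; the answer mentions fscanf, but textscan is swapped in here because it easily keeps whole lines (assuming one filename per line, as written above):
fid = fopen('name.txt', 'r');
scanned = textscan(fid, '%s', 'Delimiter', '\n');   % one cell per line
fclose(fid);
interestingFiles = scanned{1};                       % cell array of filenames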

Related

Matlab uses too much memory

I am using MATLAB to run some evaluations, and then I want to save the struct where the results are stored for future use.
Problem: The first thing I noticed was that the execution took too long, maybe 8 hours, and then saving the struct took maybe a further 2 hours. After the process blocked several times and I redid it, I finally managed to save a copy of the data. What I find confusing is that the file is 150 GB.
Process: The code structure is as follows: it iterates over the .csv files in a folder (50,000 of them), reads each one in, extracts the needed columns, and computes the results.
My view: I guess the whole iteration and extraction of data from each file uses a lot of cache, which could slow the process down over time. But I still don't understand why the final .mat file takes so much space, since in the past, for the same data but different parameters, it didn't need that much space to save the results.
Question(s): Is it possible to reduce the size of the final file without affecting the results? I base this question on the assumption that MATLAB may be saving additional information from the process.
Code Schema:
clc; close all; clear all; fclose('all');
result = struct('values_a', [], 'other', []);
counter = 1;
counterFailed = 0;
for i = 1:length(dataNames)
    try
        structRead = ezread(nameOfFile, ',');
        values_a = structRead.timestamp;
        for j = 1:length(values_a)
            if (strcmp(values_a(j), 'N'))
                if (j == 1)
                    values_a(j) = values_a(j+1);
                elseif (j == length(values_a))
                    values_a(j) = values_a(j-1);
                else
                    values_a(j) = (values_a(j-1) + values_a(j+1))/2;
                end
            end
            result(counter).values_a(j) = values_a(j);
        end
        counter = counter + 1;
    catch
        counterFailed = counterFailed + 1;
    end
end
save(path2save, 'result', '-v7.3');

Faster way to load .csv files in folder and display them using imshow in MATLAB

I have a piece of MATLAB code that works fine, but I wanted to know whether there is any faster way of performing the same task, where each .csv file is a 768*768 matrix.
Current code:
for k = 1:143
    matFileName = sprintf('ang_thresholded%d.csv', k);
    matData = load(matFileName);
    imshow(matData)
end
Any help in this regard will be very helpful. Thank You!
In general, it's better to separate the loading, computation, and graphics.
If you have enough memory, you should try to change your code to:
n_files = 143;
% If you know the size of your images a priori:
matData = zeros(768, 768, n_files); % preallocate for speed
for k = 1:n_files
    matFileName = sprintf('ang_thresholded%d.csv', k);
    matData(:,:,k) = load(matFileName);
end
seconds = 0.01;
for k = 1:n_files
    % clf; % not needed in your case, but needed if you want to plot more than one thing (hold on)
    imshow(matData(:,:,k));
    pause(seconds); % control the "framerate"
end
Note the use of pause().
Here is another option using MATLAB's datastores, which are designed to work with large datasets or lots of smaller sets. The TabularTextDatastore is specifically for this kind of text-based data.
Something like the following. However, note that since I don't have any test files, this is more of a notional example...
ttds = tabularTextDatastore('.\yourDirPath\*.csv'); % create the datastore
while ttds.hasdata % turns false after reading the last file
    temp = read(ttds); % returns a MATLAB table
    imshow(temp.Variables)
end
Since it looks like your filenames' numbering is not zero-padded (e.g. 1 instead of 001), the file order might get messed up, so that may need to be addressed as well (see the sketch below). Anyway, I thought this might be a good alternative approach worth considering, depending on what else you want to do with the data and how much of it there might be.
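A minimal sketch of putting the datastore's file list into numeric order; the pattern ang_thresholded%d.csv comes from the question, and sorting on the trailing number is an assumption about how you might fix the ordering:
ttds = tabularTextDatastore('.\yourDirPath\*.csv');
files = ttds.Files;                                  % full paths, in dictionary order
nums = cellfun(@(f) str2double(regexp(f, '\d+(?=\.csv$)', 'match', 'once')), files);
[~, order] = sort(nums);                             % numeric order: 1, 2, ..., 10, 11, ...
ttds.Files = files(order);                           % the file list can be reassigned on the datastore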

save variables: add new values to a new row in each iteration MATLAB

I have a loop as below
for chnum = 1:300
    PI = ....
    area = ....
    save('Result.mat', 'chnum', 'PI', 'area', '-append')
    %% I'd like to have something like below:
    % 1, 1.2, 3.7
    % 2, 1.8, 7.8
    % .....
end
but it doesn't save. Do you have any idea why?
Best
Analysis of the Problem
The MATLAB help page for save states that the -append option appends new variables to the saved file. It will not append new rows to already saved matrices.
Solution
To achieve what you intended, you have to store your data in matrices and save the complete matrices with a single call to save().
PI = zeros(300,1);
area = zeros(300,1);
for chnum = 1:300
    PI(chnum) = .... ;
    area(chnum) = .... ;
end
save('Result.mat', 'chnum', 'PI', 'area');
For nicer memory management I have added a pre-allocation of the arrays.
Well, even if it's not part of the question, I don't think you are using a good approach to saving your calculations. Read/write operations performed on disk (saving data to a file falls into this case) are very expensive in terms of time. This is why I suggest you proceed as follows:
res = NaN(300,2);
for chnum = 1:300
    PI = ...
    area = ...
    res(chnum,:) = [PI area]; % saving chnum is a bit of an overkill since you can retrieve it with size(res,1) when you need it
end
save('Result.mat', 'res');
Basically, instead of processing a row and saving it to the file, then processing another row and saving it to the file, and so on, you accumulate all of your data in a matrix and save only the final result to file.

How to get number from file in matlab?

In my program I want to save a counter value: for example 1, and if the counter is incremented, the 1 should be replaced with 2, and so on. I created a file test.txt in the directory, manually entered the number 1, and used this code to read that number.
f=fopen('test.txt');
cno=fread(f);
cno
fclose(f);
But the value of cno appears to be ASCII codes, I guess because that's how it is stored in the file.
I tried to use functions like parseInt, but that didn't work.
Please tell me how to write as well as read a number from a file.
Also, is there any other way to save that counter value instead of using a file? I want to retain the value even if I close MATLAB; that's why I am saving it to a file.
There are a couple of methods for this. If you don't need to ensure compatibility with external programs, then MATLAB's save and load commands should be more than sufficient.
A basic example:
a = 5;
save('test.mat', 'a');
clear a
load('test.mat');
disp(a)
See the documentation for save for syntax information.
As a general note I would advise calling load with an output declared, which will load all of the variables in your saved *.mat file into a structure, preventing them from overwriting existing data in your workspace.
Using test.mat from the previous example:
mydata = load('test.mat');
disp(mydata.a)
EDIT: Now, if you wanted to store this to a generic file, the most common method would be to use fprintf and fscanf:
a = 5;
fID = fopen('test.txt', 'w+');
fprintf(fID, '%u', a);
fclose(fID);
clear a
fID = fopen('test.txt', 'r');
a = fscanf(fID, '%u');
fclose(fID);
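Putting the two halves together for the counter use case, a minimal read-increment-write sketch (the file name counter.txt and the default of 0 when the file doesn't exist yet are assumptions):
counterFile = 'counter.txt';
% read the previous value, defaulting to 0 if the file doesn't exist yet
if exist(counterFile, 'file')
    fID = fopen(counterFile, 'r');
    counter = fscanf(fID, '%u');
    fclose(fID);
else
    counter = 0;
end
counter = counter + 1;            % whatever increments the counter
% write the new value back, overwriting the old one
fID = fopen(counterFile, 'w');
fprintf(fID, '%u', counter);
fclose(fID);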

How to create array data-structures in MATLAB?

I basically have a large data set file, and I want to write a MATLAB script that creates a data structure for it. I have tried to read up on using struct arrays in MATLAB, but I haven't found a solution for how to do this. I don't really have a lot of experience writing MATLAB scripts.
Edited: My data set is a large list of items with, say, 10 different characteristics written down for each item. So, for example, say 100,000 listings of houses, where the characteristics given could be price, county, state, date when sold, etc. This file is in .txt, .xls, or any format you like to play with.
I would like to write a MATLAB script that creates a data structure of it say in the format:
house(i).price
house(i).county
house(i).state
house(i).date
etc
Any suggestions pointing me in the right direction, or examples showing how to do this, would be greatly appreciated.
This seems like a very reasonable question, and one that can be easily addressed.
The format of the file really makes this problem easy or hard. I really don't like .xls files for this kind of work myself, but I realize you get what you get. Let's assume it's in a tab-delimited text file like:
Price County State Date
100000 Sherlock London 2001-10-01
134000 Holmes Dartmoor 2011-12-30
123456 Watson Boston 2003-04-15
I would just read the whole thing in, parse the field-name row, and use dynamic field names to build the array of structures.
fid = fopen('data.txt','r');
tline = fgetl(fid);
flds = regexp(tline, '\s+', 'split');   % split the header row on runs of whitespace
% initialize the first prototype struct
data = struct();
for ii = 1:length(flds)
    data.(flds{ii}) = [];
end
ii = 1;
% get the first line of data
tline = fgetl(fid);
while ischar(tline)
    % parse the data
    rowData = regexp(tline, '\s+', 'split');
    % we're assuming no missing data, etc.
    % populate the structure
    for jj = 1:length(flds)
        data(ii).(flds{jj}) = rowData{jj};
    end
    % since we don't know how many lines we have,
    % we could figure that out, but we won't now;
    % we'll just use the size-extending feature of
    % MATLAB arrays, even though it's slow, just
    % to show how we would do it
    tline = fgetl(fid);
    ii = ii + 1;
end
fclose(fid);
Hope this gets you started!
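As a small follow-up (not part of the original answer): everything parsed this way ends up as text, so numeric fields such as Price still need converting before you can compute with them. A sketch, assuming the Price field from the example header:
% convert the Price field from text to numbers after parsing
prices = str2double({data.Price});   % numeric vector, one entry per house
cheap = data(prices < 150000);       % e.g. filter the struct array by price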