MATLAB memory allocation: loop over mat-files

I have n mat-files containing parts of a single huge matrix. I want to load the mat-files and concatenate the row, column and value vectors to build the matrix W. Right now I'm doing the following, but it's really slow, and I know I shouldn't dynamically increase the size of rows, cols and vals:
rows=[];cols=[];vals=[];
for ii=1:n
var = load( sprintf('w%d.mat', ii) );
[r,c,v] = find( var.w );
rows = [rows; r];
cols = [cols; c];
vals = [vals; v];
end
W = sparse( rows, cols, vals );
Do you know a better way? Thanks in advance!
Solution
Based on Daniel R's suggestion, I solved it like this:
% maxSize must be an upper bound on the total number of nonzeros in all files
rows=zeros(maxSize,1);
cols=zeros(maxSize,1);
vals=zeros(maxSize,1);
idx = 1;
for ii=1:n
var = load( sprintf('w%d.mat', ii) );
[r,c,v] = find( var.w );
len = size(r,1);
rows(idx:(idx-1+len))=r;
cols(idx:(idx-1+len))=c;
vals(idx:(idx-1+len))=v;
idx = idx+len;
end
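% keep only the filled part: if maxSize overestimates the count, the leftover zero indices would make sparse fail
W = sparse( rows(1:idx-1), cols(1:idx-1), vals(1:idx-1) );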
Extremely fast, thanks a lot!
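You may not know an upper bound such as maxSize in advance. One alternative, sketched here and not the asker's method, keeps each file's triplets in cell arrays and concatenates once after the loop. The demo writes three tiny made-up parts, w1.mat to w3.mat, to tempdir so it runs on its own. Each file is assumed to hold a variable named w, as in the question.

```matlab
% Demo data (made up): three parts, each with one nonzero on the diagonal
n = 3;
for ii = 1:n
    w = sparse(ii, ii, ii, 3, 3);               % 3x3 with value ii at (ii, ii)
    save(fullfile(tempdir, sprintf('w%d.mat', ii)), 'w');
end

% Collect the (row, col, value) triplets of every file in cell arrays
R = cell(n, 1); C = cell(n, 1); V = cell(n, 1);
for ii = 1:n
    S = load(fullfile(tempdir, sprintf('w%d.mat', ii)), 'w');
    [R{ii}, C{ii}, V{ii}] = find(S.w);
end

% Concatenate once; sparse sums any duplicate (row, col) pairs
W = sparse(vertcat(R{:}), vertcat(C{:}), vertcat(V{:}));
```

The size of W comes from the largest row and column index found. If trailing rows or columns of the full matrix can be empty, pass the dimensions explicitly as sparse(i, j, v, m, n).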

You need to preallocate the arrays. Assuming r, c and v each have size a-by-b:
In total you need a*n rows and b columns, so preallocate using rows=nan(a*n,b).
To write the data into the array, you have to set the correct indices: rows((ii-1)*a+1:ii*a,1:end)=r. (Here find returns column vectors, so b is 1, and a is only fixed if every file has the same number of nonzeros.)

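A few ideas.
It looks to me like the matrices all have the same size.
Another bottleneck might be the find function.
Could it happen that two matrices have different values at the same index? That might lead to problems: sparse adds up values that share a (row, column) pair, whereas the code below lets later files overwrite earlier ones.
You could do: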
var = load( 'w1.mat' );
W = var.w;
for ii = 2:n
var = load( sprintf('w%d.mat', ii) );
ig = var.w~=0;
W(ig) = var.w(ig);
end
If var.w is not sparse, add W = sparse(W) after the loop.
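Here is a small made-up example of the shared-index concern. The triplet approach from the question adds the values at a repeated position. The overwrite loop instead keeps the value from the later file.

```matlab
% Two made-up parts that both have a nonzero at position (1,1)
w1 = [1 0; 0 4];
w2 = [3 0; 0 0];

% Triplet approach from the question: values at the same (row, col) are added
[r1, c1, v1] = find(w1);
[r2, c2, v2] = find(w2);
Wsum = sparse([r1; r2], [c1; c2], [v1; v2]);   % Wsum(1,1) is 1 + 3 = 4

% Overwrite approach from this answer: later parts replace earlier values
Wover = w1;
ig = w2 ~= 0;
Wover(ig) = w2(ig);                            % Wover(1,1) is 3
```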

Here is a trick that sometimes speeds things up almost as if your variables had been preallocated. Sometimes it does not help at all, though, so the only way to find out is to try:
Replace lines like:
rows = [rows; r];
by lines like:
rows(end+(1:numel(r))) = r;
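Since the only way to find out is to try, here is a rough timing sketch. It compares concatenation, end+ indexing and full preallocation on made-up data. N and the chunk r are arbitrary choices, and the times you get will depend on your MATLAB or Octave version and machine.

```matlab
% Build the same N-element column three ways and time each one
N = 1e4;
r = (1:10)';                           % stand-in chunk appended each iteration
nchunks = N / numel(r);

tic; a = [];
for k = 1:nchunks
    a = [a; r];                        % concatenation: reallocates every time
end
tConcat = toc;

tic; b = [];
for k = 1:nchunks
    b(end+(1:numel(r)), 1) = r;        % indexed growth past the current end
end
tEnd = toc;

tic; c = zeros(N, 1);                  % preallocated once
for k = 1:nchunks
    c((k-1)*numel(r) + (1:numel(r))) = r;
end
tPre = toc;

fprintf('concat %.4f s, end+ %.4f s, prealloc %.4f s\n', tConcat, tEnd, tPre);
```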

Related

Function call with variable number of input arguments when number of input arguments is not explicitly known

I have a variable pth, which is a cell array of dimension 1xn, where n is a user input. Each of the elements in pth is itself a cell array, and length(pth{k}) for k=1:n is variable (the result of another function). Each element pth{k}{kk}, where k=1:n and kk=1:length(pth{k}), is a 1D vector of integers/node numbers, again of variable length. So to summarise, I have a variable number of variable-length vectors organised in a variable number of cell arrays.
I would like to find all possible intersections when you take a vector at random from pth{1}, pth{2}, pth{3}, etc. There are various functions on the File Exchange that seem to do that, for example this one or this one. The problem I have is that you need to call the function this way:
mintersect(v1,v2,v3,...)
and I can't write out all the inputs in the general case because I don't know explicitly how many there are (this would be n above). Ideally, I would like to do something like this:
mintersect(pth{1}{1},pth{2}{1},pth{3}{1},...,pth{n}{1})
mintersect(pth{1}{1},pth{2}{2},pth{3}{1},...,pth{n}{1})
mintersect(pth{1}{1},pth{2}{3},pth{3}{1},...,pth{n}{1})
etc...
mintersect(pth{1}{1},pth{2}{length(pth{2})},pth{3}{1},...,pth{n}{1})
mintersect(pth{1}{1},pth{2}{1},pth{3}{2},...,pth{n}{1})
etc...
and keep going through all possible combinations, but I can't write this in code. This function from the File Exchange looks like a good way to find all possible combinations, but again I have the same problem with the function call and the variable number of inputs:
allcomb(1:length(pth{1}),1:length(pth{2}),...,1:length(pth{n}))
Does anybody know how to work around this issue of function calls with a variable number of input arguments, when you can't physically specify all the input arguments because their number is variable? This applies equally to MATLAB and Octave, hence the two tags. Any other suggestions on how to find all possible combinations/intersections when taking a vector at random from each pth{k} are welcome!
EDIT 27/05/20
Thanks to Mad Physicist's answer, I have ended up using the following which works:
disp('Computing intersections for all possible paths...')
grids = cellfun(@(x) 1:numel(x), pth, 'UniformOutput', false);
idx = cell(1, numel(pth));
[idx{:}] = ndgrid(grids{:});
idx = cellfun(@(x) x(:), idx, 'UniformOutput', false);
idx = cat(2, idx{:});
valid_comb = [];
k = 1;
for ii = idx'
indices = reshape(num2cell(ii), size(pth));
selection = cellfun(@(p,k) p{k}, pth, indices, 'UniformOutput', false);
if my_intersect(selection{:})
valid_comb = [valid_comb k];
end
k = k+1;
end
My own version is similar, but uses a for loop instead of cellfun to collect the selected vectors:
disp('Computing intersections for all possible paths...')
grids = cellfun(@(x) 1:numel(x), pth, 'UniformOutput', false);
idx = cell(1, numel(pth));
[idx{:}] = ndgrid(grids{:});
idx = cellfun(@(x) x(:), idx, 'UniformOutput', false);
idx = cat(2, idx{:});
[n_comb,~] = size(idx);
n_pipes = numel(pth); % number of cells in pth
temp = cell(n_pipes,1);
valid_comb = [];
for k = 1:n_comb
for kk = 1:n_pipes
temp{kk} = pth{kk}{idx(k,kk)};
end
if my_intersect(temp{:})
valid_comb = [valid_comb k];
end
end
In both cases, valid_comb has the indices of the valid combinations, which I can then retrieve using something like:
valid_idx = idx(valid_comb(1),:);
for k = 1:n_pipes
pth{k}{valid_idx(k)} % do something with this
end
When I benchmarked the two approaches with some sample data (pth being 4x1 and the 4 elements of pth being 2x1, 9x1, 8x1 and 69x1), I got the following results:
>> benchmark
Elapsed time is 51.9075 seconds.
valid_comb = 7112
Elapsed time is 66.6693 seconds.
valid_comb = 7112
So Mad Physicist's approach was about 15s faster.
I also misunderstood what mintersect does, which isn't what I wanted. I wanted to find a combination where no element is present in two or more vectors, so I ended up writing my own version of mintersect:
function valid_comb = my_intersect(varargin)
% Returns true for a valid combination, i.e. when no two of the
% input vectors have any elements in common
comb_idx = nchoosek(1:nargin,2); % all pairs of input indices (combnk needs the Statistics Toolbox)
nr = size(comb_idx,1);
valid_comb = true;
k = 1;
% Use a while loop so that execution stops as soon as an intersection is found
while valid_comb && (k<=nr)
temp = intersect(varargin{comb_idx(k,1)},varargin{comb_idx(k,2)});
valid_comb = isempty(temp);
k = k+1;
end
end
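A possible loop-free alternative to my_intersect is sketched below with made-up inputs. The name all_disjoint is hypothetical. The vectors are pairwise disjoint exactly when no value appears twice in their concatenation, once duplicates within each vector have been removed. Unlike the while loop, this check cannot stop early, but it avoids calling intersect once per pair. It could be called as all_disjoint(selection{:}) wherever my_intersect(selection{:}) is used.

```matlab
% Duplicates inside one vector are removed first, matching intersect's set semantics
uniqueCols   = @(c) cellfun(@(v) unique(v(:)), c, 'UniformOutput', false);
stackCols    = @(c) vertcat(c{:});                   % one long column
noRepeats    = @(x) numel(x) == numel(unique(x));    % true if no value repeats
all_disjoint = @(varargin) noRepeats(stackCols(uniqueCols(varargin)));

ok1 = all_disjoint([1 2 3], [4 5], [6 7]);   % true: no shared values
ok2 = all_disjoint([1 2 3], [4 5], [6 2]);   % false: 2 is in two vectors
```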
A couple of helpful points for constructing a solution:
This post shows you how to construct a Cartesian product between arbitrary arrays using ndgrid.
cellfun accepts multiple cell arrays simultaneously, which you can use to index specific elements.
You can capture a variable number of arguments from a function using cell arrays, as shown here.
So let's get the inputs to ndgrid from your outermost array:
grids = cellfun(@(x) 1:numel(x), pth, 'UniformOutput', false);
Now you can create an index that contains the product of the grids:
index = cell(1, numel(pth));
[index{:}] = ndgrid(grids{:});
You want to make all the grids into column vectors and concatenate them sideways. The rows of that matrix will represent the Cartesian indices to select the elements of pth at each iteration:
index = cellfun(@(x) x(:), index, 'UniformOutput', false);
index = cat(2, index{:});
If you turn a row of index into a cell array, you can run it in lockstep over pth to select the correct elements and call mintersect on the result.
for i = index'
indices = num2cell(i');
selection = cellfun(@(p, i) p{i}, pth, indices, 'UniformOutput', false);
mintersect(selection{:});
end
This is written under the assumption that pth is a row array. If that is not the case, you can change the first line of the loop to indices = reshape(num2cell(i), size(pth)); for the general case, or simply indices = num2cell(i); for the column case. The key is that the cell form of indices must have the same shape as pth, so that cellfun can iterate over both in lockstep. It is already generated with the same number of elements.
I believe this does the trick. It calls mintersect on all possible combinations of vectors in pth{k}{kk} for k=1:n and kk=1:length(pth{k}).
This uses eval together with some sprintf/compose string building. Note that the use of eval is generally very much discouraged. I can add more comments if this is what you need.
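To see what index contains, here is the same construction on a tiny made-up pth. Its first cell holds two vectors and its second holds three. Each row of index is one combination of positions, and the first column changes fastest.

```matlab
% Made-up pth: two vectors in the first cell, three in the second
pth = {{[1 2], 3}, {4, [5 6], 7}};

grids = cellfun(@(x) 1:numel(x), pth, 'UniformOutput', false);  % {1:2, 1:3}
index = cell(1, numel(pth));
[index{:}] = ndgrid(grids{:});
index = cellfun(@(x) x(:), index, 'UniformOutput', false);
index = cat(2, index{:});
% index is
%   1 1
%   2 1
%   1 2
%   2 2
%   1 3
%   2 3
```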
% generate some data
n = 5;
pth = cell(1,n);
for k = 1:n
pth{k} = cell(1,randi([1 10]));
for kk = 1:numel(pth{k})
pth{k}{kk} = randi([1 100], randi([1 10]), 1);
end
end
% get all combs
str_to_eval = compose('1:length(pth{%i})', 1:numel(pth));
str_to_eval = strjoin(str_to_eval,',');
str_to_eval = sprintf('allcomb(%s)',str_to_eval);
% use eval to get all combinations for a given pth
all_combs = eval(str_to_eval);
% and make strings to eval in intersect
comp = num2cell(1:numel(pth));
comp = [comp ;repmat({'%i'}, 1, numel(pth))];
str_pattern = sprintf('pth{%i}{%s},', comp{:});
str_pattern = str_pattern(1:end-1); % get rid of last ,
% one string per combination, i.e. per row of all_combs (length() would return the larger dimension)
strings_to_eval = cell(size(all_combs,1),1);
for k = 1:size(all_combs,1)
strings_to_eval{k} = sprintf(str_pattern, all_combs(k,:));
end
% and run eval on all those strings
result = cell(size(all_combs,1),1);
for k = 1:size(all_combs,1)
result{k} = eval(['mintersect(' strings_to_eval{k} ')']);
%fprintf(['mintersect(' strings_to_eval{k} ')\n']); % for debugging
end
For a randomly generated pth, the code produces the following strings to evaluate (where some pth{k} have only one cell for illustration):
mintersect(pth{1}{1},pth{2}{1},pth{3}{1},pth{4}{1},pth{5}{1})
mintersect(pth{1}{1},pth{2}{1},pth{3}{1},pth{4}{2},pth{5}{1})
mintersect(pth{1}{1},pth{2}{1},pth{3}{1},pth{4}{3},pth{5}{1})
mintersect(pth{1}{1},pth{2}{1},pth{3}{2},pth{4}{1},pth{5}{1})
mintersect(pth{1}{1},pth{2}{1},pth{3}{2},pth{4}{2},pth{5}{1})
mintersect(pth{1}{1},pth{2}{1},pth{3}{2},pth{4}{3},pth{5}{1})
mintersect(pth{1}{2},pth{2}{1},pth{3}{1},pth{4}{1},pth{5}{1})
mintersect(pth{1}{2},pth{2}{1},pth{3}{1},pth{4}{2},pth{5}{1})
mintersect(pth{1}{2},pth{2}{1},pth{3}{1},pth{4}{3},pth{5}{1})
mintersect(pth{1}{2},pth{2}{1},pth{3}{2},pth{4}{1},pth{5}{1})
mintersect(pth{1}{2},pth{2}{1},pth{3}{2},pth{4}{2},pth{5}{1})
mintersect(pth{1}{2},pth{2}{1},pth{3}{2},pth{4}{3},pth{5}{1})
mintersect(pth{1}{3},pth{2}{1},pth{3}{1},pth{4}{1},pth{5}{1})
mintersect(pth{1}{3},pth{2}{1},pth{3}{1},pth{4}{2},pth{5}{1})
mintersect(pth{1}{3},pth{2}{1},pth{3}{1},pth{4}{3},pth{5}{1})
mintersect(pth{1}{3},pth{2}{1},pth{3}{2},pth{4}{1},pth{5}{1})
mintersect(pth{1}{3},pth{2}{1},pth{3}{2},pth{4}{2},pth{5}{1})
mintersect(pth{1}{3},pth{2}{1},pth{3}{2},pth{4}{3},pth{5}{1})
mintersect(pth{1}{4},pth{2}{1},pth{3}{1},pth{4}{1},pth{5}{1})
mintersect(pth{1}{4},pth{2}{1},pth{3}{1},pth{4}{2},pth{5}{1})
mintersect(pth{1}{4},pth{2}{1},pth{3}{1},pth{4}{3},pth{5}{1})
mintersect(pth{1}{4},pth{2}{1},pth{3}{2},pth{4}{1},pth{5}{1})
mintersect(pth{1}{4},pth{2}{1},pth{3}{2},pth{4}{2},pth{5}{1})
mintersect(pth{1}{4},pth{2}{1},pth{3}{2},pth{4}{3},pth{5}{1})
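The eval strings can also be avoided. For one row of all_combs, collect the selected vectors in a cell array and expand it with {:} into separate arguments. In this sketch, pth and comb are made-up, and horzcat stands in for mintersect because horzcat also accepts any number of inputs.

```matlab
% Made-up pth with a varying number of vectors per cell
pth = {{[1 2], [3 4 5]}, {6}, {[7 8], 9, [10 11]}};
comb = [2 1 3];                       % one row of all_combs

% Same arguments as the string 'pth{1}{2},pth{2}{1},pth{3}{3}'
args = cell(1, numel(pth));
for k = 1:numel(pth)
    args{k} = pth{k}{comb(k)};
end

% args{:} expands into three separate input arguments
out = horzcat(args{:});               % [3 4 5 6 10 11]
```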
As Mad Physicist pointed out, I misunderstood the structure of your initial cell array; however, the point stands. The way to pass an unknown number of arguments to a function is via comma-separated-list generation, and your function needs to support it by being declared with varargin. Updated example below.
Create a helper function to collect a random subcell from each main cell:
% in getRandomVectors.m
function Out = getRandomVectors(C) % C: a double-jagged array, as described
N = length(C);
Out = cell(1, N);
for i = 1 : length(C)
Out{i} = C{i}{randi( length(C{i}) )};
end
end
Then, assuming you already have a mintersect function defined something like this:
% in mintersect.m
function Intersections = mintersect( varargin )
Vectors = varargin;
N = length( Vectors );
for i = 1 : N; for j = 1 : N
Intersections{i,j} = intersect( Vectors{i}, Vectors{j} );
end; end
end
Then call this like so:
C = { { 1:5, 2:4, 3:7 }, {1:8}, {2:4, 3:9, 2:8} }; % example double-jagged array
In = getRandomVectors(C); % In is a cell array of randomly selected vectors
Out = mintersect( In{:} ); % Note the csl-generator syntax
PS. I note that your definition of mintersect differs from those linked. It may just be that you didn't describe what you want very well, in which case my mintersect function is not what you want. Mine produces the intersection of every pair of the vectors provided; the one you linked to produces a single intersection that is common to all vectors provided. Use whichever suits you best. The underlying rationale for using it is the same, though.
PS. It is also not entirely clear from your description whether what you're after is a random vector k for each n, or the entire space of possible vectors over all n and k. The above solution does the former. If you want the latter, see Mad Physicist's solution on how to create a Cartesian product of all possible indices instead.

insert value in a matrix in a for loop

I wrote this MATLAB code in order to concatenate the results of integrating all the columns of a matrix extracted from a multi-matrix array.
"datimf" is a matrix composed of 100 matrices, each 224x640, vertically concatenated.
In the first loop I select every single matrix.
In the second loop I integrate every single column of the selected matrix,
obtaining a row of 640 elements.
The third loop must concatenate vertically all the rows previously calculated.
However, I always get a problem with the third loop. Where is the error?
singleframe = zeros(224,640);
int_frame_all = zeros(1,640);
conc = zeros(100,640);
for i=0:224:(22400-224)
for j = 1:640
for k = 1:100
singleframe(:,:) = datimf([i+1:(i+223)+1],:);
int_frame_all(:,j) = trapz(singleframe(:,j));
conc(:,k) = vertcat(int_frame_all);
end
end
end
An alternate way to do this without using any explicit loops (edited in response to rayryeng's comment below. It's also worth noting that using cellfun may not be more efficient than explicitly looping.):
nmats = 100;
nrows = 224;
ncols = 640;
datimf = rand(nmats*nrows, ncols);
% convert to an nmats x 1 cell array containing each matrix
cellOfMats = mat2cell(datimf, ones(1, nmats)*nrows, ncols);
% Apply trapz to the contents of each cell
cellOfIntegrals = cellfun(@trapz, cellOfMats, 'UniformOutput', false);
% concatenate the results
conc = cat(1, cellOfIntegrals{:});
Taking inspiration from user2305193's answer, here's an even better "loop-free" solution, based on reshaping the matrix and applying trapz along the appropriate dimension:
datReshaped = reshape(datimf, nrows, nmats, ncols);
solution = squeeze(trapz(datReshaped, 1));
% verify solutions are equivalent:
all(solution(:) == conc(:)) % ans = true
I think I understand what you want. The third loop is unnecessary, as the outer loop and the innermost loop both run 100 times. Also, the way you have it, you are assigning singleframe many more times than necessary, since it does not depend on the inner loops j or k. You were also trying to add int_frame_all to conc before int_frame_all had been fully populated.
On top of that the j loop isn't required either since trapz can operate on the entire matrix at once anyway.
I think this is closer to what you intended:
datimf = rand(224*100,640);
singleframe = zeros(224,640);
int_frame_all = zeros(1,640);
conc = zeros(100,640);
for i=1:100
idx = (i-1)*224+1;
singleframe(:,:) = datimf(idx:idx+223,:);
% for j = 1:640
% int_frame_all(:,j) = trapz(singleframe(:,j));
% end
% The loop is unnecessary as trapz can operate on the entire matrix at once.
int_frame_all = trapz(singleframe,1);
%I think this is what you really want...
conc(i,:) = int_frame_all;
end
It looks like you're processing frames in a video.
The most efficient approach in my experience would be to reshape datimf to be 3-dimensional. This can be done with the reshape and permute commands.
Because the frames are stacked vertically, a plain reshape(datimf,224,640,[]) would mix rows from different frames. Something along the lines of vid=permute(reshape(datimf,224,[],640),[1 3 2]); puts each 224x640 frame in its own page, with time as the 3rd dimension. vid(:,:,1) is then the first frame of the video.
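The frame layout can be checked on made-up sizes. Reshaping the vertically stacked array to rows x frames x columns and then permuting dimensions 2 and 3 puts frame m in page m. Integrating down dimension 1 of that 3-D array should reproduce the per-frame loop from the answers above.

```matlab
% Made-up stand-in for datimf: nframes matrices of nrows x ncols, stacked vertically
nrows = 3; ncols = 5; nframes = 4;
datimf = rand(nframes*nrows, ncols);

% Put frame m in page m: result is nrows x ncols x nframes
vid = permute(reshape(datimf, nrows, nframes, ncols), [1 3 2]);

% Reference: integrate each frame's columns with an explicit loop
conc = zeros(nframes, ncols);
for m = 1:nframes
    fr = (m-1)*nrows + (1:nrows);       % rows of datimf belonging to frame m
    conc(m, :) = trapz(datimf(fr, :), 1);
end

% Same numbers from the 3-D array: trapz down dim 1 gives 1 x ncols x nframes
fromVid = reshape(trapz(vid, 1), ncols, nframes).';
```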

How to make the MATLAB for loop faster

I am working on a dataset of 3750 text files, each containing 10240x2 data. The file names run from "Data_F_Ind0001" to "Data_F_Ind3750". I have written code that reads each column of every file one by one and performs empirical mode decomposition (EMD). The EMD produces four variables, I1 to I4, and another function (petropy) is applied to each of these four. The problem is that the code is very slow. Can anyone suggest how to make it faster? I appreciate your help. Thank you.
I have given sample code for processing the first 9 files out of 3750. I have used the same for loop for the remaining files.
clear all;
close all;
l =1;
for k = 1:9
filename = sprintf('Data_F_Ind000%d.txt',k);
% a(:,:,k) = load(filename);
data = load (filename);
x = data(:,1);
y = data (:,2);
alldata = eemd(x,0.01,10);
I1 = alldata (1,:);
I2 = alldata (2,:);
I3 = alldata (3,:);
I4 = alldata (4,:);
imf = {I1, I2, I3, I4};
for j = 1:4
m1(k,j)= petropy(imf{j},3,1,'order');
j=j+1;
l=l+1;
end
end
You don't seem to preallocate memory for m1(k,j). Add m1 = zeros(3750,4) in front of the for loop.
I'm assuming m1() is an array; if it's a struct or something else, change it accordingly.
Edit:
E.g. like this:
clear all;
close all;
l =1;
m1 = zeros(3750,4);
for k = 1:9
....
Preallocation is an important topic when for loops generate data iteratively; I'd suggest reading this article.
These are the things that come to mind when looking at your code:
Don't put the columns of data into variables x and y. By doing this you are using twice the memory. In your call to eemd, simply use data(:,1) as the input. The same applies to I1 to I4, but I guess it has less effect as they are small variables.
You can try textscan or fscanf instead of load. This should also improve your code.
Here is a more optimised version of your code:
m1 = zeros(3750, 4);
for k = 1:9
filename = sprintf('Data_F_Ind000%d.txt',k);
data = load(filename);
alldata = eemd(data(:, 1),0.01,10);
% for j = 1:4
% m1(k,j)= petropy(alldata(j, :), 3, 1, 'order');
% end
m1(k, :) = arrayfun(@(j) petropy(alldata(j, :), 3, 1, 'order'), 1:4);
end
I have replaced the inner for loop with arrayfun. If you find arrayfun hard to follow, you can use the for loop I have commented out instead.
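The file-name pattern 'Data_F_Ind000%d.txt' only produces the right names for k = 1 to 9; for k = 10 it gives Data_F_Ind00010.txt. A zero-padded format such as %04d covers all 3750 files in a single loop. A quick check:

```matlab
% %04d pads the file number with leading zeros to four digits
k = [1 10 3750];
names = arrayfun(@(n) sprintf('Data_F_Ind%04d.txt', n), k, 'UniformOutput', false);
% names = {'Data_F_Ind0001.txt', 'Data_F_Ind0010.txt', 'Data_F_Ind3750.txt'}
```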

Variably name histogram in for loop

I'm trying to use the hist function in a for loop, because I work with a varying number of datasets each time and it's much faster and easier than having to edit a script each time, but I can't get it right. Can I have some help please? In essence, I'm trying to put this in a for loop for a variable number of unc{i} datasets and the corresponding [h{i},x{i}] result arrays:
[h1,x1] = hist(unc1,range);
[h2,x2] = hist(unc2,range);
[h3,x3] = hist(unc3,range);
[h4,x4] = hist(unc4,range);
Any help would be greatly appreciated. Thanking you in advance
Disclaimer: the use of eval is dangerous!
Let's say you have n unc arrays. You can use a struct to store them:
for ii=1:n
cmd = sprintf( 's.unc%d = unc%d;', ii, ii );
eval( cmd );
end
Once you have the uncs in a struct, you can simply do:
for ii=n:-1:1
[h{ii} x{ii}] = hist( s.(sprintf('unc%d',ii)), range );
end
Notes:
1. Note that I used a backward loop for computing the histograms: this is a nice trick to preallocate h and x, see this thread.
2. It is extremely unwise to use eval, so it would be wiser to create the different unc arrays as struct fields to begin with, skipping the first part of this answer.
You can put each of your input datasets in a cell array, and the output of the histograms in a second cell array.
For example,
unc1 = rand(5,1);
unc2 = rand(5,1);
unc3 = rand(5,1);
unc_cell = {unc1, unc2, unc3};
h_cell = cell(3, 1);
x_cell = cell(3, 1);
for ii = 1:3
[h_cell{ii}, x_cell{ii}] = hist(unc_cell{ii});
end
This does require preloading all of the datasets and holding them in memory simultaneously. If this would use too much memory, you can load the datasets in the for loop rather than preloading them.
For example,
h_cell = cell(3, 1);
x_cell = cell(3, 1);
for ii = 1:3
S = load(sprintf('data_%d.mat', ii)); % you would replace this with your file name
unc = S.unc; % load returns a struct; assumes each file stores a variable named unc
[h_cell{ii}, x_cell{ii}] = hist(unc);
end
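If the loop itself is not needed, cellfun can return both outputs of hist at once. Here is a sketch with made-up data and bin centers range = 1:3.

```matlab
% Made-up datasets and bin centers
unc_cell = {[1 1 2], [2 3 3 3]};
range = 1:3;

% Two outputs per cell: counts and bin centers, each collected in a cell array
[h_cell, x_cell] = cellfun(@(u) hist(u, range), unc_cell, 'UniformOutput', false);
% h_cell{1} holds the counts [2 1 0], h_cell{2} holds [0 1 3]
```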

How to solve the .mat file sorting?

I have three .mat files, A.mat, B.mat and C.mat. Each contains a matrix r2 in the format r2 = [rs2, cs2, sortedValues2].
How can I sort the rs2 values (e.g. 3468, 3909, ...) of the three .mat files together in increasing order, and count the number of appearances of each element of rs2?
Can anyone give me a suggestion?
Original
First, you're going to want to load each file's r2, then pull its rs2 values (the first column) into a column vector, then sort that column vector. Once you have the vector, use logical indexing to determine how many times each element repeats. Finally, put those two columns together to attach each element to its corresponding count. The output vector will have duplicate rows, but I assume that won't be a problem.
A = load('A.mat','r2'); B = load('B.mat','r2'); C = load('C.mat','r2'); % load returns a struct
allData = [A.r2; B.r2; C.r2];
colVector = allData(:, 1);
sortedVec = sort(colVector);
countVec = zeros(size(sortedVec));
for ii = 1:length(sortedVec)
countVec(ii) = sum(sortedVec==sortedVec(ii));
end
outputVec = [sortedVec, countVec]
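The counting loop above compares every element with every other one. A common alternative is unique plus accumarray, which returns each distinct value once, together with its count. It is sketched here on made-up values.

```matlab
% Made-up rs2 values from the three files
colVector = [3909; 3468; 3909; 5000; 3468; 3909];

% vals: sorted distinct values; j maps each element to its position in vals
[vals, ~, j] = unique(colVector);
counts = accumarray(j(:), 1);      % how many times each value occurs

outputVec = [vals, counts];        % [3468 2; 3909 3; 5000 1]
```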
Edit
Since your edited question is easy, and almost the same as your original, I'll answer it here. Most of the code is the same, you just need to get the data out of the cell array instead of the files. Like so:
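cols = cellfun(@(v) v(:), yourCellArray(:), 'UniformOutput', false); % assumes each cell holds a vector of rs2 values
colVector = vertcat(cols{:}); % one column, so outputVec below has two columns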
sortedVec = sort(colVector);
countVec = zeros(size(sortedVec));
for ii = 1:length(sortedVec)
countVec(ii) = sum(sortedVec==sortedVec(ii));
end
outputVec = [sortedVec, countVec]
load A
rs2_1 = r2(:,1);
clearvars r2
load B
rs2_2 = r2(:,1);
clearvars r2
load C
rs2_3 = r2(:,1);
clearvars r2
% to combine
rs2_combined = [rs2_1;rs2_2;rs2_3];
% to sort
rs2_sorted = sort(rs2_combined);
% to count for appearance
rs2_count = hist(rs2_combined, min(rs2_combined):1:max(rs2_combined));
EDIT: using cell arrays
% recreate your situation
R = cell(11,10);
R = cellfun(@(x) [randi(3000,50,1),ones(50,1),ones(50,1)*-.008], R,'UniformOutput', false);
% extract rs2
r = cell2mat( reshape(R, [1,1,11,10]) );
rs2 = reshape( r(:,1,:,:), [50*11*10,1] );
% do what you want
rs2_sorted = sort(rs2);
rs2_count = hist(rs2, min(rs2):1:max(rs2));
Note: I assumed you have 50x3 arrays. If they are merely 50x1, then reshape(R, [1,11,10]) and reshape( r, [50*11*10,1] ); also work.
hist puts the numbers into bins, one per integer value from the minimum to the maximum. It's equivalent to doing:
rs2_scale = min(rs2_combined):1:max(rs2_combined);
rs2_count = zeros(1, length(rs2_scale));
for ii = 1:length(rs2_scale)
rs2_count(ii) = sum( rs2_combined == rs2_scale(ii) );
end
To remove zero-count numbers -
rs2_count(rs2_count==0) = [];
Then you can calculate the probability -
rs2_prob = rs2_count / sum(rs2_count);
Validate this answer by
>> sum(rs2_prob)
ans =
1.0000