Memory usage of cell array approach - MATLAB

I'm having some issues with a particular piece of code (see below). I need to end up with the variable Thetas, but before the code executes I don't know how long Thetas will be or what the dimensions of its matrices are, since it all depends on the variable "hidden_layers".
I decided to build a blank cell array and append the matrices to it as they are created, for later use. But I was wondering what the cheapest way to do this would be. I have used the same approach many times throughout my code, and I'm having some issues with memory usage.
% Prompt the user for how many layers the neural network will have
hidden_layers = input('How many layers on the NN?: ');
% Group of variables required to get Thetas
X = csvread('myData.csv');
n = size(X, 2);
hidden_layer_size = 60;
num_labels = 39;
% The length of Thetas{} will be hidden_layers + 1
Thetas = {};
% Function randomize(element1, element2) returns a random matrix with
% dimensions: element2 x (element1 + 1)
% There will always be Thetas{1}
Thetas{1} = randomize(n,hidden_layer_size);
% Now build the rest of the Thetas based on the number of hidden_layers
% (indexing with i+1 so that Thetas{1} is not overwritten and the final
% length is hidden_layers + 1, as stated above)
for i = 1:hidden_layers
    if (i == hidden_layers)
        Thetas{i+1} = randomize(hidden_layer_size, num_labels);
    else
        Thetas{i+1} = randomize(hidden_layer_size, hidden_layer_size);
    end
end
PS: I've also tried forgetting about the cell array and building a j×1 vector with all the Thetas appended, which I reshape later on for further computations. However, it doesn't seem to save any processing time according to the profiler, and it's tiring to be reshaping it all the time.
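For reference, the cell array can at least be preallocated with cell(), since its final length (hidden_layers + 1) is known before the loop; growing {} one element at a time forces MATLAB to reallocate the cell header repeatedly. A minimal sketch, reusing the same randomize helper as above:
% Preallocate: the length is known to be hidden_layers + 1
Thetas = cell(hidden_layers + 1, 1);
Thetas{1} = randomize(n, hidden_layer_size);
for i = 1:hidden_layers
    if i == hidden_layers
        Thetas{i+1} = randomize(hidden_layer_size, num_labels);
    else
        Thetas{i+1} = randomize(hidden_layer_size, hidden_layer_size);
    end
end
Note that a cell array only stores headers/pointers to its contents, so the matrices themselves are not copied when the cell array grows; the memory cost is dominated by the matrices either way.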

Related

Fastest approach to copying/indexing variable parts of 3D matrix

I have large sets of 3D data consisting of 1D signals acquired in 2D space.
The first step in processing this data is thresholding all signals to find the arrival of a high-amplitude pulse. This pulse is present in all signals and arrives at different times.
After thresholding, the 3D data set should be reordered so that every signal starts at the arrival of the pulse and what came before is thrown away (the end of the signals is of no importance; for now I concatenate zeros to the end of all signals so the data stays the same size).
Now, I have implemented this in the following manner:
First, I calculate the sample number of the first sample exceeding the threshold in all signals:
M = randn(1000,500,500); % example matrix of realistic size
threshold = 0.25*max(M(:,1,1)); % 25% of the maximum in the first signal as threshold
[~,index] = max(M>threshold); % indices of first sample exceeding threshold in all signals
Next, I want all signals to be shifted so that they all start with the pulse. For now, I have implemented it this way:
outM = zeros(size(M)); % preallocation for speed
for i = 1:size(M,2)
    for j = 1:size(M,3)
        outM(1:size(M,1)+1-index(1,i,j),i,j) = M(index(1,i,j):end,i,j);
    end
end
This works fine, and I know for-loops are not that slow anymore, but this easily takes a few seconds for my datasets on my machine. A single iteration of the for-loop takes about 0.05-0.1 s, which seems slow to me for just copying a vector of 500-2000 double values.
Therefore, I have looked into the best way to tackle this, but so far I haven't found anything better.
I have tried several things: 3D masks, linear indexing, and parallel loops (parfor).
For the 3D masks, I checked whether any improvement was possible. I first construct a logical mask, and then compare the speed of logical-mask indexing/copying to the double nested for-loop.
%% set up for logical mask copying
AA = true(500,1); % only copy the first 500 values after the threshold value
Mask = false(size(M));
Jepla = zeros(500,size(M,2),size(M,3));
for i = 1:size(M,2)
    for j = 1:size(M,3)
        Mask(index(1,i,j):index(1,i,j)+499,i,j) = AA;
    end
end
%% speed comparison
tic
Jepla = M(Mask);
toc
tic
for i = 1:size(M,2)
    for j = 1:size(M,3)
        outM(1:size(M,1)+1-index(1,i,j),i,j) = M(index(1,i,j):end,i,j);
    end
end
toc
The for-loop is faster every time, even though it copies more data.
Next, linear indexing.
%% setup for linear index copying
%put all indices in 1 long column
LongIndex = reshape(index,numel(index),1);
% convert to linear indices and store in new variable
linearIndices = sub2ind(size(M),LongIndex,repmat(1:size(M,2),1,size(M,3))',repelem(1:size(M,3),size(M,2))');
% extend linear indices with those of all values to copy
k = zeros(numel(M),1);
count = 1;
for i = 1:numel(LongIndex)
    values = linearIndices(i):size(M,1)*i;
    k(count:count+length(values)-1) = values;
    count = count + length(values);
end
k = k(1:count-1);
% get linear indices of locations in new matrix
l = zeros(length(k),1);
count = 1;
for i = 1:numel(LongIndex)
    values = repelem(LongIndex(i)-1,size(M,1)-LongIndex(i)+1);
    l(count:count+length(values)-1) = values;
    count = count + length(values);
end
l = k-l;
% create new matrix
outM = zeros(size(M));
%% speed comparison
tic
outM(l) = M(k);
toc
tic
for i = 1:size(M,2)
    for j = 1:size(M,3)
        outM(1:size(M,1)+1-index(1,i,j),i,j) = M(index(1,i,j):end,i,j);
    end
end
toc
Again, the alternative approach, linear indexing, is (a lot) slower.
After this failed, I learned about parallelisation, and thought this would surely speed up my code.
By reading some of the documentation around parfor and trying it out a bit, I changed my code to the following:
gcp;
outM = zeros(size(M));
inM = mat2cell(M,size(M,1),ones(size(M,2),1),size(M,3));
tic
parfor i = 1:500
    for j = 1:500
        outM(:,i,j) = [inM{i}(index(1,i,j):end,1,j); zeros(index(1,i,j)-1,1)];
    end
end
toc
I changed it so that "outM" and "inM" would both be sliced variables, as I read that this works best. Still, this is very slow, a lot slower than the original for-loop.
So now the question: should I give up on trying to improve the speed of this operation? Or is there another way to do it? I have searched a lot, and so far I do not see how to speed this up.
Sorry for the long question, but I wanted to show what I tried.
Thank you in advance!
Not sure if it's an option in your situation, but it looks like cell arrays are actually faster here:
outM2 = cell(size(M,2),size(M,3));
tic;
for i = 1:size(M,2)
    for j = 1:size(M,3)
        outM2{i,j} = M(index(1,i,j):end,i,j);
    end
end
toc
And a second idea which also came out faster, batch all data which have to be shifted by the same value:
tic;
for i = unique(index).'
    outM(1:size(M,1)+1-i,index==i) = M(i:end,index==i);
end
toc
Whether this approach is actually faster depends entirely on your data.
And yes, integer-valued and logical indexing can be mixed.
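For instance, a tiny demo of mixing the two subscript styles:
A = magic(4);
B = A(1:2, logical([1 0 1 0])); % rows by integer range, columns by logical mask; B is 2x2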

to find mean square error of two cell arrays of different sizes

I have two cell arrays. One, 'trans_blk', is of size <232324x1> and consists of cells of size <8x8>; the other, 'ca', is of size <1024x1> and also consists of cells of size <8x8>.
I want to compute the mean square error (MSE) of each cell of 'ca' with respect to every cell of 'trans_blk'.
I used the following code to compute:
m=0;
for ii=0:7
    for jj=0:7
        m=m+((trans_blk{:,1}(ii,jj)-ca{:,1}(ii,jj))^2);
    end
end
m=m/(size of cell); //size of cell=8*8
disp('MSE=',m);
It's giving an error: "Bad cell reference operation" in MATLAB.
A couple of ways that I figured you could go:
% First define the MSE function
mse = @(x,y) sum(sum((x-y).^2))./numel(x);
I'm a big fan of using bsxfun for things like this, but unfortunately it doesn't operate on cell arrays. So, I borrowed the singleton expansion form of the answer from here.
% Singleton expansion way:
mask = bsxfun(@or, true(size(A)), true(size(B))');
idx_A = bsxfun(@times, mask, reshape(1:numel(A), size(A)));
idx_B = bsxfun(@times, mask, reshape(1:numel(B), size(B))');
func = @(x,y) cellfun(@(a,b) mse(a,b),x,y);
C = func(A(idx_A), B(idx_B));
Now, if that is a bit too crazy (or if explicitly making the arrays by A(idx_A) is too big), then you could always try a loop approach such as the one below.
% Or a quick loop:
results = zeros(length(A),length(B));
y = B{1};
for iter = 1:length(B)
    y = B{iter};
    results(:,iter) = cellfun(@(x) mse(x,y), A);
end
If you run out of memory: think of what you are allocating: a matrix of doubles with 232324 x 1024 elements. At 8 bytes per double, that is 232324 × 1024 × 8 bytes ≈ 1.9 GB of memory...
If you can't hold it all in memory, then you might have to decide what you are going to do with all the MSE's and either do it in batches, or find a machine that you can run the full simulation/code on.
EDIT
If you only want to keep the sum of all the MSEs (as the OP states in the comments below), then you can save on memory by summing as you go along:
% Sum it as you go along:
results = zeros(length(A),1);
y = B{1};
for iter = 1:length(B)
    y = B{iter};
    results = results + cellfun(@(x) mse(x,y), A);
end
results = sum(results);

Regarding loop structure in Matlab for an iterative procedure

I'm trying to code a loop in Matlab that iteratively solves for an optimal vector s of zeros and ones. This is my code
N = 150;
s = ones(N,1);
for i = 1:N
    if s(i) == 0
        i = i + 1;
    else
        i = i;
    end
    select = s;
    HI = (item_c' * (weights.*s)) * (1/(weights'*s));
    s(i) = 0;
    CI = (item_c' * (weights.*s)) * (1/(weights'*s));
    standarderror_afterex = sqrt(var(CI - CM));
    standarderror_priorex = sqrt(var(HI - CM));
    ratio = (standarderror_afterex - standarderror_priorex)/(abs(mean(weights.*s) - weights'*select));
    ratios(i) = ratio;
    s(i) = 1;
end
[M,I] = min(ratios);
s(I) = 0;
This code sets to zero the element of s that has the lowest ratio. But I need this procedure to start all over again, using the new s with one zero, to find the ratios and exclude the element of s that has the lowest ratio. I need this repeated over and over until no ratios are negative.
Do I need another loop, or am I missing something?
I hope that my question is clear enough, just tell me if you need me to explain more.
Thank you in advance, for helping out a newbie programmer.
Edit
I think that I need to add some form of while loop as well, but I can't see how to structure it. This is the flow that I want:
1. With all items included (s(i) = 1 for all i), calculate HI, CI and the standard errors and list the ratios; exclude item i (s(I) = 0), which corresponds to the lowest negative ratio.
2. With the new s, including all ones but one zero, calculate HI, CI and the standard errors and list the ratios; exclude the item i which corresponds to the lowest negative ratio.
3. With the new s, now including all ones but two zeros, repeat the process.
4. Do this until there is no negative element in ratios left to exclude.
I hope that it is clearer now.
Ok. I want to go through a few things before I list my code. These are just how I would try to do it. Not necessarily the best way, or fastest way even (though I'd think it'd be pretty quick). I tried to keep the structure as you had in your code, so you could follow it nicely (even though I'd probably meld all the calculations down into a single function or line).
Some features that I'm using in my code:
bsxfun: Learn this! It is amazing how it works and can speed up code, and makes some things easier.
v = rand(n,1);
A = rand(n,4);
% The two lines below compute the same value:
W = bsxfun(@(x,y)x.*y,v,A);
W_= repmat(v,1,4).*A;
bsxfun element-wise multiplies the vector v with each column of A.
Both W and W_ are matrices of the same size as A, but the first will usually be much faster.
Precalculating dropouts: I made select a matrix, where before it was a vector. This allows me to form a variable included using logical constructs. The ~eye(N) produces an identity matrix and negates it. By logically "and"ing it with select, the ith column becomes select with the ith element dropped out.
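A tiny demo of those dropout columns, with N = 3:
select = true(3);
included = ~eye(3) & select
% included =
%      0     1     1
%      1     0     1
%      1     1     0
% Column i is the full selection with element i dropped out.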
You were explicitly calculating weights'*s as the denominator in each pass of the for-loop. By using the above matrix to calculate this, we can now do sum(W), where each column of W is essentially weights.*s.
Take advantage of column-wise operations: var() works down the columns of a matrix, returning a row vector, and sqrt() then applies elementwise.
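For instance:
A = [1 2; 3 4; 5 6];
var(A)       % ans = [4 4], one sample variance per column
sqrt(var(A)) % ans = [2 2], sqrt applies elementwise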
Ok, the full thing. Any questions, let me know:
% Start with everything selected:
select = true(N);
stop = false; % Stopping flag:
while (~stop)
    % Each column leaves a variable out...
    included = ~eye(N) & select;
    % This calculates the weights with leave-one-out:
    W = bsxfun(@(x,y)x.*y,weights,included);
    % You can comment out the line below, if you'd like...
    W_= repmat(weights,1,N).*included; % This is the same as the previous line.
    % This calculates the weights before dropping the variables:
    V = bsxfun(@(x,y)x.*y,weights,select);
    % There's different syntax, depending on whether item_c is a
    % vector or a matrix...
    if(isvector(item_c))
        HI = (item_c' * V)./(sum(V));
        CI = (item_c' * W)./(sum(W));
    else
        % For example: item_c is a matrix...
        % We have to use bsxfun() again
        HI = bsxfun(@rdivide, (item_c' * V), sum(V));
        CI = bsxfun(@rdivide, (item_c' * W), sum(W));
    end
    % Note: CI (leave-one-out) is the after-exclusion statistic and HI the
    % prior one, matching the definitions in the question.
    standarderror_afterex = sqrt(var(bsxfun(@minus,CI,CM)));
    standarderror_priorex = sqrt(var(bsxfun(@minus,HI,CM)));
    % or:
    %
    % standarderror_afterex = sqrt(var(CI - repmat(CM,1,size(CI,2))));
    % standarderror_priorex = sqrt(var(HI - repmat(CM,1,size(HI,2))));
    ratios = (standarderror_afterex - standarderror_priorex)./(abs(mean(W) - sum(V)));
    % Identify the negative ratios:
    negratios = ratios < 0;
    if ~any(negratios)
        % Drop out of the while-loop:
        stop = true;
    else
        % Find the most negative ratio:
        neginds = find(negratios);
        [mn, mnind] = min(ratios(negratios));
        % Drop out the most negative one...
        select(neginds(mnind),:) = false;
    end
end % end while(~stop)
% Your output:
s = select(:,1);
If for some reason it doesn't work, please let me know.

How to accumulate submatrices without looping (subarray smoothing)?

In Matlab I need to accumulate overlapping diagonal blocks of a large matrix. The sample code is given below.
Since this piece of code needs to run several times, it consumes a lot of resources. The process is used in array signal processing for a so-called subarray smoothing or spatial smoothing. Is there any way to do this faster?
% some values for parameters
M = 1000; % size of array
m = 400; % size of subarray
n = M-m+1; % number of subarrays
R = randn(M)+1i*rand(M);
% main code
S = R(1:m,1:m);
for i = 2:n
    S = S + R(i:m+i-1,i:m+i-1);
end
ATTEMPTS:
1) I tried the following alternative vectorized version, but unfortunately it became much slower!
[X,Y] = meshgrid(1:m);
inds1 = sub2ind([M,M],Y(:),X(:));
steps = (0:n-1)*(M+1);
inds = repmat(inds1,1,n) + repmat(steps,m^2,1);
RR = sum(R(inds),2);
S = reshape(RR,m,m);
2) I used MATLAB Coder to create a MEX file and it became much slower!
I've personally had to speed up some portions of my code lately. While I'm not an expert at all, I would recommend trying the following:
1) Vectorize:
Getting rid of the for-loop
S = R(1:m,1:m);
for i = 2:n
    S = S + R(i:m+i-1,i:m+i-1);
end
and replacing it with an alternative based on cumsum should be the way to go here.
Note: I will try and work on this approach in a future edit.
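In the meantime, here is a rough sketch of what a diagonal running-sum version could look like (my own construction, untested against real data, assuming the M, m, n, R defined in the question): a cumulative sum along the diagonals turns every overlapping block sum into a difference of two entries.
% Running sum along diagonals: C(a,b) = R(a,b) + C(a-1,b-1), so the
% block sum S(a,b) becomes C(a+n-1,b+n-1) - C(a-1,b-1).
C = R;
for k = 2:M
    C(k,2:end) = C(k,2:end) + C(k-1,1:end-1); % add the diagonal predecessors
end
Cpad = zeros(M+1);          % zero row/column so (a-1,b-1) is valid at a = b = 1
Cpad(2:end,2:end) = C;
S2 = Cpad(n+1:M+1,n+1:M+1) - Cpad(1:m,1:m);
% Sanity check against the original loop:
S = R(1:m,1:m);
for i = 2:n
    S = S + R(i:m+i-1,i:m+i-1);
end
max(abs(S(:)-S2(:)))        % should be tiny (floating-point error only)
The running-sum loop touches each element of R once, so the work is O(M^2) rather than O(n*m^2); note, though, that differencing large running sums can cost a little floating-point precision compared with direct summation.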
2) Generating a MEX-file:
In some instances, you can simply fire up the MATLAB Coder app (given that you have it in your current MATLAB version).
This should generate a .mex file for you, which you can call as if it were the function that you are trying to replace.
Regardless of your choice (1) or (2)), you should time the result with tic; my_function(); toc; over a fair number of function calls, and compare it against your current implementation:
my_time = zeros(1,10000);
for count = 1:10000
    tic;
    my_function();
    my_time(count) = toc;
end
mean(my_time)
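As an aside (my addition, not something benchmarked here): timeit handles warm-up and repetition for you and is usually more robust than a hand-rolled tic/toc loop:
t = timeit(@() my_function()); % returns a robust estimate of one call's running time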

Speeding up the conditional filling of huge sparse matrices

I was wondering if there is a way of speeding up (maybe via vectorization?) the conditional filling of huge sparse matrices (e.g. ~ 1e10 x 1e10). Here's the sample code where I have a nested loop, and I fill in a sparse matrix only if a certain condition is met:
% We are given the following cell arrays of the same size:
% all_arrays_1
% all_arrays_2
% all_mapping_arrays
N = 1e10;
% The number of nnz (non-zeros) is unknown until the loop finishes
huge_sparse_matrix = sparse([],[],[],N,N);
n_iterations = numel(all_arrays_1);
for iteration = 1:n_iterations
    array_1 = all_arrays_1{iteration};
    array_2 = all_arrays_2{iteration};
    mapping_array = all_mapping_arrays{iteration};
    n_elements_in_array_1 = numel(array_1);
    n_elements_in_array_2 = numel(array_2);
    for element_1 = 1:n_elements_in_array_1
        element_2 = mapping_array(element_1);
        % Sanity check:
        if element_2 <= n_elements_in_array_2
            item_1 = array_1(element_1);
            item_2 = array_2(element_2);
            huge_sparse_matrix(item_1,item_2) = 1;
        end
    end
end
I am struggling to vectorize the nested loop. As far as I understand, filling a sparse matrix element by element is very slow when the number of entries to fill is large (~100M). I need to work with a sparse matrix since its dimensions are in the 10,000M x 10,000M range. However, this way of filling a sparse matrix in MATLAB is very slow.
Edits:
I have updated the names of the variables to reflect their nature better. There are no function calls.
Addendum:
This code builds the adjacency matrix of a huge graph. The variable all_mapping_arrays holds mapping arrays (~ adjacency relationships) between nodes of the graph in a local representation, which is why I need array_1 and array_2 to map the adjacency to a global representation.
I think it will be the incremental update of the sparse matrix, rather than the loop-based conditional, that is slowing things down.
When you add a new entry to a sparse matrix via something like A(i,j) = 1, the whole matrix data structure typically has to be re-packed. This is an expensive operation. If you're interested, MATLAB uses a CCS (compressed column storage) data structure internally, which is described under the Data Structure section here. Note the statement:
"This scheme is not efficient for manipulating matrices one element at a time."
Generally, it's far better (faster) to accumulate the non-zero entries in the matrix as a set of triplets and then make a single call to sparse. For example (warning - brain-compiled code!!):
% Inputs:
% N
% prev_array and next_array
% n_labels_prev and n_labels_next
% mapping
% allocate space for matrix entries as a set of "triplets"
ii = zeros(N,1);
jj = zeros(N,1);
xx = zeros(N,1);
nn = 0;
for next_label_ix = 1:n_labels_next
    prev_label = mapping(next_label_ix);
    if prev_label <= n_labels_prev
        prev_global_label = prev_array(prev_label);
        next_global_label = next_array(next_label_ix);
        % reallocate triplets on demand
        if (nn + 1 > length(ii))
            ii = [ii; zeros(N,1)];
            jj = [jj; zeros(N,1)];
            xx = [xx; zeros(N,1)];
        end
        % append a new triplet and increment counter
        ii(nn + 1) = next_global_label; % row index
        jj(nn + 1) = prev_global_label; % col index
        xx(nn + 1) = 1.0; % coefficient
        nn = nn + 1;
    end
end
% we may have over-allocated our triplets, so trim the arrays
% based on our final counter
ii = ii(1:nn);
jj = jj(1:nn);
xx = xx(1:nn);
% just make a single call to "sparse" to pack the triplet data
% as a sparse matrix object
sp_graph_adj_global = sparse(ii,jj,xx,N,N);
I'm allocating in chunks of N entries at a time. Assuming you know a lot about the structure of your matrix, you might be able to pick a better value here.
Hope this helps.
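As a further sketch in the same spirit (also brain-compiled, untested): for the exact nested loop in the question, the triplets can usually be collected without the inner element loop at all, by vectorizing the sanity check per iteration and making a single call to sparse at the end. This assumes each mapping_array has one entry per element of the corresponding array_1, and that all arrays are numeric vectors:
rows = cell(n_iterations,1);
cols = cell(n_iterations,1);
for iteration = 1:n_iterations
    array_1 = all_arrays_1{iteration};
    array_2 = all_arrays_2{iteration};
    mapping_array = all_mapping_arrays{iteration};
    valid = mapping_array(:) <= numel(array_2);               % the sanity check, vectorized
    rows{iteration} = reshape(array_1(valid), [], 1);
    cols{iteration} = reshape(array_2(mapping_array(valid)), [], 1);
end
rows = vertcat(rows{:});
cols = vertcat(cols{:});
% sparse() sums duplicate (i,j) pairs, so clamp back to 0/1 to match the
% original "set to 1" semantics:
huge_sparse_matrix = spones(sparse(rows, cols, 1, N, N));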