How to parallelize operations on a huge array in MATLAB

I couldn't find any relevant topics so I'm posting this one:
How can I parallelize operations/calculations on a huge array? The problem is that I use arrays of size 10000000x10, which are small enough to operate on serially, but running them through parfor causes an out-of-memory error.
The code goes:
function aggregatedRes = umbrellaFct(preparedInputsAsaCellArray)
% Description: function used to parallelize the calculation
% preparedInputsAsaCellArray - cell array of size 1x10; for example, the
% first cell {1,1} would be: {array, corr, df}
% array - a 1e7-by-10 array with data from different regions to be aggregated
% corr - correlation matrix
% df - degrees of freedom as an integer value
% Create a function handle to the child function
fcnHndl = @childFct;
% Preallocate the output as a sliced cell array
output = cell(1, numel(preparedInputsAsaCellArray));
% For each available cell - calculate and aggregate
parfor j = 1:numel(preparedInputsAsaCellArray)
    output{j} = fcnHndl(preparedInputsAsaCellArray{j}{:});
end
% Extract results
aggregatedRes = zeros(numel(output{1}), numel(output));
for i = 1:numel(output)
    aggregatedRes(:,i) = output{i};
end
end
And the child function used in the umbrella function:
function aggregated = childFct(array, corr, df)
% Description:
% array - a 1e7-by-10 array with data from different regions to be aggregated
% corr - correlation matrix
% df - degrees of freedom as an integer value
% Get the number of cases for the multivariate random numbers
cases = size(array, 1);
% Preallocate space for the reordered data
s = zeros(size(array, 1), size(array, 2));
% Calculate multivariate t random numbers
u = mvtrnd(corr, df, cases);
clear corr cases
% Calculate Student's t cumulative distribution
u = tcdf(u, df);
clear df
% Double sort to obtain the rank ordering
[~, sorted] = sort(u);
clear u
[~, corrMatrix] = sort(sorted);
clear sorted
% Reorder each column of the data by the induced ranks
for jj = 1:size(array, 2)
    s(:,jj) = array(corrMatrix(:,jj), jj);
end
clear array corrMatrix jj
aggregated = sum(s, 2);
end
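For illustration, a hypothetical call of the two functions above with small placeholder inputs (the 1e5-row arrays, identity correlation matrix, and df = 4 are made up purely to keep the example light):
% Hypothetical usage sketch with small placeholder inputs
inputs = cell(1, 10);
for k = 1:10
    inputs{k} = {rand(1e5, 10), eye(10), 4};  % {array, corr, df}
end
res = umbrellaFct(inputs);  % res is 1e5-by-10, one column per region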
I already tried distributed arrays but ultimately failed.
I would appreciate any help or hints!
Edit: The logic behind the functions is to calculate and aggregate data from different regions. In total there are ten arrays, all of size 1e7x10. My idea was to use parfor to calculate and aggregate them simultaneously, to save time. It works fine for smaller arrays (like 1e6x10) but ran out of memory for 1e7x10 (with more than 2 workers in the pool). I suspect the way I used and implemented parfor could be wrong and inefficient.
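Since each 1e7-by-10 double array is about 800 MB and every parfor worker receives its own copy of the cell it works on, peak memory grows roughly linearly with the pool size. A minimal sketch that caps the pool size before the call (the 2-worker limit is an assumption; tune it to the available RAM):
% Minimal sketch: cap the pool size so peak memory stays bounded.
% Assumption: each worker needs at least ~800 MB for its 1e7-by-10 array.
pool = gcp('nocreate'); % current pool, if any, without creating one
if ~isempty(pool) && pool.NumWorkers > 2
    delete(pool); % close an oversized pool
end
if isempty(gcp('nocreate'))
    parpool(2); % assumption: 2 workers fit in the available RAM
end
aggregatedRes = umbrellaFct(preparedInputsAsaCellArray);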


Verify Law of Large Numbers in MATLAB

The problem:
If a large number of fair N-sided dice are rolled, the average of the simulated rolls is likely to be close to the mean of 1, 2, ..., N, i.e. the expected value of a single die. For example, the expected value of a 6-sided die is 3.5.
Given N, simulate 1e8 N-sided dice rolls by creating a vector of 1e8 uniformly distributed random integers. Return the difference between the mean of this vector and the mean of integers from 1 to N.
My code:
function dice_diff = loln(N)
% the mean of integer from 1 to N
A = 1:N
meanN = sum(A)/N;
% I do not have any idea what I am doing here!
V = randi(1e8);
meanvector = V/1e8;
dice_diff = meanvector - meanN;
end
First of all, make sure every time you ask a question that it is as clear as possible, to make it easier for other users to read.
If you check how randi works, you can see this:
R = randi(IMAX,N) returns an N-by-N matrix containing pseudorandom
integer values drawn from the discrete uniform distribution on 1:IMAX.
randi(IMAX,M,N) or randi(IMAX,[M,N]) returns an M-by-N matrix.
randi(IMAX,M,N,P,...) or randi(IMAX,[M,N,P,...]) returns an
M-by-N-by-P-by-... array. randi(IMAX) returns a scalar.
randi(IMAX,SIZE(A)) returns an array the same size as A.
So, if you want to use randi in your problem, you have to use it like this:
V = randi(N, 1e8, 1);
and you need some more changes:
function dice_diff = loln(N)
%the mean of integer from 1 to N
A = 1:N;
meanN = mean(A);
V = randi(N, 1e8, 1);
meanvector = mean(V);
dice_diff = meanvector - meanN;
end
For future problems, try using the command
help randi
and MATLAB will explain how the function randi (or any other function) works.
Make sure to check that the code above gives the desired result.
As pointed out, take a closer look at the use of randi(). From the general case
X = randi([LowerInt,UpperInt],NumRows,NumColumns); % UpperInt > LowerInt
you can adapt to dice rolling by
Rolls = randi([1 NumSides],NumRolls,NumSamplePaths);
as an example. Exchanging NumRolls and NumSamplePaths will yield Rolls.', or transpose(Rolls).
According to the Law of Large Numbers, the updated sample average after each roll should converge to the true mean, ExpVal (short for expected value), as the number of rolls (trials) increases. The figure produced by the code below shows this for two sample paths.
To get the sample mean for each number of dice rolls, I used arrayfun() with
CumulativeAvg1 = arrayfun(@(jj)mean(Rolls(1:jj,1)),[1:NumRolls]);
which is equivalent to using the cumulative sum, cumsum(), to get the same result.
CumulativeAvg1 = (cumsum(Rolls(:,1))./(1:NumRolls).'); % equivalent
% MATLAB R2019a
% Create Dice
NumSides = 6; % positive nonzero integer
NumRolls = 200;
NumSamplePaths = 2;
% Roll Dice
Rolls = randi([1 NumSides],NumRolls,NumSamplePaths);
% Output Statistics
ExpVal = mean(1:NumSides);
CumulativeAvg1 = arrayfun(@(jj)mean(Rolls(1:jj,1)),[1:NumRolls]);
CumulativeAvgError1 = CumulativeAvg1 - ExpVal;
CumulativeAvg2 = arrayfun(@(jj)mean(Rolls(1:jj,2)),[1:NumRolls]);
CumulativeAvgError2 = CumulativeAvg2 - ExpVal;
% Plot
figure
subplot(2,1,1), hold on, box on
plot(1:NumRolls,CumulativeAvg1,'b--','LineWidth',1.5,'DisplayName','Sample Path 1')
plot(1:NumRolls,CumulativeAvg2,'r--','LineWidth',1.5,'DisplayName','Sample Path 2')
yline(ExpVal,'k-')
title('Average')
xlabel('Number of Trials')
ylim([1 NumSides])
subplot(2,1,2), hold on, box on
plot(1:NumRolls,CumulativeAvgError1,'b--','LineWidth',1.5,'DisplayName','Sample Path 1')
plot(1:NumRolls,CumulativeAvgError2,'r--','LineWidth',1.5,'DisplayName','Sample Path 2')
yline(0,'k-')
title('Error')
xlabel('Number of Trials')

Fast way to compute row-by-row matrix correlation

I have two very large matrices (each 228453x460) and I want to compute the correlation between their rows.
for i=1:228453
if(vec1_preprocess(i,1))
for j=1:228453
df = effdf(vec1_preprocess(i,:)',vec2_preprocess(j,:)');
corr_temp = corr(vec1_preprocess(i,:)', vec2_preprocess(j,:)');
p = calculate_p(corr_temp, df);
temp = (meanVec(i)+p)/2;
meanVec(i) = temp;
end
disp(i);
end
end
This takes ~1 day. Is there a direct way to compute this?
Edit: Code for effdf
function df = effdf(ts1,ts2);
%function df = effdf(ts1,ts2);
ts1=ts1-mean(ts1);
ts2=ts2-mean(ts2);
N=length(ts1);
ac1=xcorr(ts1);
ac1=ac1/max(ac1); % normalized autocorrelation
ac1=ac1(((length(ac1)+3)/2):((length(ac1)+3)/2+floor(N/4)));
ac2=xcorr(ts2);
ac2=ac2/max(ac2); % normalized autocorrelation
ac2=ac2(((length(ac2)+3)/2):((length(ac2)+3)/2+floor(N/4)));
df = 1/((1/N)+(2/N)*sum(((N-(1:length(ac1)))/N)'.*ac1.*ac2));
Since you didn't post the code, I assume that your custom functions calculate_p and effdf are perfectly optimized and don't represent the bottleneck of your script. Let's focus on what we have.
The first problem I see is:
if (vec1_preprocess(i,1))
A check over 228453 iterations can noticeably increase the running time. Hence, extract only the matrix rows that don't contain a 0 in the first column and perform your calculations on those:
idx = vec1_preprocess(:,1) ~= 0;
vec1_preprocess = vec1_preprocess(idx,:);
for i = 1:size(vec1_preprocess,1)
% ...
end
The second problem is corr. It seems like you are also computing p-values, using calculate_p. Why don't you use the built-in p-values returned by the function as the second output argument?
[c,p] = corr(A,B);
Alternatively, if Pearson's correlation is what you are looking for, you could replace corr with corrcoef to see if it produces a better performance.
Last but not least (in fact it's the most important thing): is there any reason why you are performing this computation row by row instead of running it on the whole matrices?
If you read the documentation, you'll see that corr computes the correlation between columns, not rows.
To convert rows into columns and columns into rows, simply transpose the matrix:
tmp1 = vec1_preprocess';
tmp2 = vec2_preprocess';
C = corr(tmp1,tmp2);
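If memory allows, the same pairwise row correlations can also be written as a single matrix product; a minimal sketch, assuming A and B are the row-oriented matrices (note that the full 228453-by-228453 result is roughly 400 GB as doubles, so in practice this has to run on blocks of rows):
% Pearson correlation of every row of A with every row of B as one
% matrix product. A is m1-by-n, B is m2-by-n; C is m1-by-m2.
Az = (A - mean(A,2)) ./ std(A,0,2); % standardize each row
Bz = (B - mean(B,2)) ./ std(B,0,2);
C = (Az * Bz.') / (size(A,2) - 1); % pairwise row correlations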

Memory usage of a cell array approach

I'm having some issues with a particular piece of code (see below). As it happens, I need to get the variable Thetas, but before the code executes I don't know how long Thetas will be or the dimensions of its matrices, since it all depends on the variable hidden_layers.
I decided to build an empty cell array and append the matrices to it as they are created, for later use. But I was wondering what the cheapest way to do this would be. I have used the same approach many times throughout my code and I'm having some issues with memory usage.
% Prompt the user for how many layers will the neural network have
hidden_layers = input('How many layers on the NN?: ');
% Group of variables required to get Thetas
X = csvread('myData.csv');
n = size(X, 2);
hidden_layer_size = 60;
num_labels = 39;
% The length of Thetas{} will be => hidden_layers + 1
Thetas = cell(1, hidden_layers + 1);
% Function randomize(element1, element2) returns a random matrix with
% dimensions: element2 x (element1 + 1)
% There will always be Thetas{1}
Thetas{1} = randomize(n, hidden_layer_size);
% Now build the rest of the Thetas based on the number of hidden_layers
% (note the i+1 index, so the loop does not overwrite Thetas{1})
for i = 1:hidden_layers
    if (i == hidden_layers)
        Thetas{i+1} = randomize(hidden_layer_size, num_labels);
    else
        Thetas{i+1} = randomize(hidden_layer_size, hidden_layer_size);
    end
end
PS: I've also tried forgetting about the cell array and building a jx1 vector with all the Thetas appended, which I reshape later on for further computations. However, it doesn't seem to save any processing time according to the profiler, and it's tiring to be reshaping it all the time.
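For reference, a minimal sketch of that pack/unpack approach (the helper names are illustrative; unrolling into one vector is the usual pattern when passing parameters to optimizers such as fminunc):
% Pack: flatten every Theta matrix into one column vector.
flat = cellfun(@(T) T(:), Thetas, 'UniformOutput', false);
thetaVec = vertcat(flat{:});
sizes = cellfun(@size, Thetas, 'UniformOutput', false); % remember shapes
% Unpack: rebuild each matrix from its recorded size.
offset = 0;
for k = 1:numel(sizes)
    rk = sizes{k}(1); ck = sizes{k}(2);
    Thetas{k} = reshape(thetaVec(offset+1 : offset+rk*ck), rk, ck);
    offset = offset + rk*ck;
end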

Find the mean square error of two cell arrays of different sizes

I have two cell arrays. One, trans_blk, is of size 232324x1 and consists of cells of size 8x8; the other, ca, is of size 1024x1 and also consists of cells of size 8x8.
I want to compute the mean square error (MSE) of each cell of ca with respect to every cell of trans_blk.
I used the following code to compute:
m=0;
for ii=0:7
for jj=0:7
m=m+((trans_blk{:,1}(ii,jj)-ca{:,1}(ii,jj))^2);
end
end
m=m/(size of cell); //size of cell=8*8
disp('MSE=',m);
It's giving an error: "Bad cell reference operation" in MATLAB.
A couple of ways that I figured you could go:
% First define the MSE function
mse = @(x,y) sum(sum((x-y).^2))./numel(x);
I'm a big fan of using bsxfun for things like this, but unfortunately it doesn't operate on cell arrays. So, I borrowed the singleton expansion form of the answer from here.
% Singleton expansion way:
mask = bsxfun(@or, true(size(A)), true(size(B))');
idx_A = bsxfun(@times, mask, reshape(1:numel(A), size(A)));
idx_B = bsxfun(@times, mask, reshape(1:numel(B), size(B))');
func = @(x,y) cellfun(@(a,b) mse(a,b),x,y);
C = func(A(idx_A), B(idx_B));
Now, if that is a bit too crazy (or if explicitly making the arrays by A(idx_A) is too big), then you could always try a loop approach such as the one below.
% Or a quick loop:
results = zeros(length(A),length(B));
y = B{1};
for iter = 1:length(B)
y = B{iter};
results(:,iter) = cellfun(@(x) mse(x,y), A);
end
If you run out of memory: think of what you are allocating: a matrix of doubles that is (232324 x 1024) elements. (That's a decent chunk of memory: 232324*1024*8 bytes, close to 2 GB...)
If you can't hold it all in memory, then you might have to decide what you are going to do with all the MSE's and either do it in batches, or find a machine that you can run the full simulation/code on.
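For example, a minimal batching sketch (the matfile-based storage and the batch size of 64 are assumptions, not part of the original answer):
% Process B in chunks and stream each block of MSEs to a MAT-file,
% so the full 232324-by-1024 result never sits in RAM at once.
batch = 64; % assumed chunk size
mf = matfile('mse_results.mat', 'Writable', true);
mf.results(length(A), length(B)) = 0; % preallocate the on-disk array
for start = 1:batch:length(B)
    stop = min(start + batch - 1, length(B));
    block = zeros(length(A), stop - start + 1);
    for iter = start:stop
        block(:, iter - start + 1) = cellfun(@(x) mse(x, B{iter}), A);
    end
    mf.results(:, start:stop) = block; % write one block at a time
end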
EDIT
If you only want to keep the sum of all the MSEs (as the OP states in the comments), then you can save on memory by
% Sum it as you go along:
results = zeros(length(A),1);
y = B{1};
for iter = 1:length(B)
y = B{iter};
results = results + cellfun(@(x) mse(x,y), A);
end
results = sum(results);

Speeding up the conditional filling of huge sparse matrices

I was wondering if there is a way of speeding up (maybe via vectorization?) the conditional filling of huge sparse matrices (e.g. ~1e10 x 1e10). Here's the sample code, where I have a nested loop and fill a sparse matrix only if a certain condition is met:
% We are given the following cell arrays of the same size:
% all_arrays_1
% all_arrays_2
% all_mapping_arrays
N = 1e10;
% The number of nnz (non-zeros) is unknown until the loop finishes
huge_sparse_matrix = sparse([],[],[],N,N);
n_iterations = numel(all_arrays_1);
for iteration=1:n_iterations
array_1 = all_arrays_1{iteration};
array_2 = all_arrays_2{iteration};
mapping_array = all_mapping_arrays{iteration};
n_elements_in_array_1 = numel(array_1);
n_elements_in_array_2 = numel(array_2);
for element_1 = 1:n_elements_in_array_1
element_2 = mapping_array(element_1);
% Sanity check:
if element_2 <= n_elements_in_array_2
item_1 = array_1(element_1);
item_2 = array_2(element_2);
huge_sparse_matrix(item_1,item_2) = 1;
end
end
end
I am struggling to vectorize the nested loop. As far as I understand, filling a sparse matrix element by element is very slow when the number of entries to fill is large (~100M). I need to work with a sparse matrix since it has dimensions in the 10,000M x 10,000M range, but this way of filling it in MATLAB is very slow.
Edits:
I have updated the names of the variables to reflect their nature better. There are no function calls.
Addendum:
This code builds the adjacency matrix for a huge graph. The variable all_mapping_arrays holds mapping arrays (~ adjacency relationships) between nodes of the graph in a local representation, which is why I need array_1 and array_2 to map the adjacency to a global representation.
I think it is the incremental update of the sparse matrix, rather than the loop-based conditional, that is slowing things down.
When you add a new entry to a sparse matrix via something like A(i,j) = 1, it typically requires that the whole matrix data structure be re-packed. This is an expensive operation. If you're interested, MATLAB uses a CCS data structure (compressed column storage) internally, which is described under the Data Structure section here. Note the statement:
This scheme is not efficient for manipulating matrices one element at a time
Generally, it's far better (faster) to accumulate the non-zero entries in the matrix as a set of triplets and then make a single call to sparse. For example (warning: brain-compiled code!):
% Inputs:
% N
% prev_array and next_array
% n_labels_prev and n_labels_next
% mapping
% allocate space for matrix entries as a set of "triplets"
ii = zeros(N,1);
jj = zeros(N,1);
xx = zeros(N,1);
nn = 0;
for next_label_ix = 1:n_labels_next
prev_label = mapping(next_label_ix);
if prev_label <= n_labels_prev
prev_global_label = prev_array(prev_label);
next_global_label = next_array(next_label_ix);
% reallocate triplets on demand
if (nn + 1 > length(ii))
ii = [ii; zeros(N,1)];
jj = [jj; zeros(N,1)];
xx = [xx; zeros(N,1)];
end
% append a new triplet and increment counter
ii(nn + 1) = next_global_label; % row index
jj(nn + 1) = prev_global_label; % col index
xx(nn + 1) = 1.0; % coefficient
nn = nn + 1;
end
end
% we may have over-allocated our triplets, so trim the arrays
% based on our final counter
ii = ii(1:nn);
jj = jj(1:nn);
xx = xx(1:nn);
% just make a single call to "sparse" to pack the triplet data
% as a sparse matrix object
sp_graph_adj_global = sparse(ii,jj,xx,N,N);
I'm allocating in chunks of N entries at a time. Assuming that you know a lot about the structure of your matrix, you might be able to use a better value here.
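As a complement, the triplet idea can also be written without the inner element loop, reusing the variable names from the question (a hedged sketch: it assumes each mapping array has one entry per element of its array_1, and spones clamps duplicate entries back to 1 because sparse sums repeated triplets):
% Vectorized triplet collection: one batch of row/column indices per
% outer iteration, then a single call to sparse at the end.
n_iterations = numel(all_arrays_1);
II = cell(n_iterations, 1);
JJ = cell(n_iterations, 1);
for iteration = 1:n_iterations
    a1 = all_arrays_1{iteration}(:);
    a2 = all_arrays_2{iteration}(:);
    m = all_mapping_arrays{iteration}(:);
    valid = m <= numel(a2); % the sanity check, vectorized
    II{iteration} = a1(valid); % row indices (item_1)
    JJ{iteration} = a2(m(valid)); % column indices (item_2)
end
% sparse sums duplicate triplets; spones restores the 0/1 pattern
huge_sparse_matrix = spones(sparse(vertcat(II{:}), vertcat(JJ{:}), 1, N, N));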
Hope this helps.