In model predictive control, an optimization problem is solved at every time instant, and it is very common to write the matrices in a compact form. Without going into the details of the optimization problem, suppose I have matrices A and B. I need to compute the stacked prediction matrices, calligraphic A and calligraphic B (A_cal and B_cal in the code below).
Note that N_p is called the "prediction horizon"; it is not the order of matrix A. How can I compute these matrices in a fast and efficient way? In MATLAB, I have done the following, but maybe there is a more efficient way to compute these matrices:
A_cal = zeros(length(A)*Np, length(A)); % calligraphic A matrix
B_cal = zeros(size(B,1)*Np, size(B,2)*Np); % calligraphic B matrix
temp = eye(size(A));
for j = 1:Np
    A_cal(1+(j-1)*length(A):j*length(A),:) = temp;
    if j > 1
        % The current block row is the previous block row shifted right by one block; only the block in the first column is computed
        B_cal(1+(j-1)*size(B,1):j*size(B,1),:) = circshift(B_cal(1+(j-2)*size(B,1):(j-1)*size(B,1),:),size(B,2),2);
        B_cal(1+(j-1)*size(B,1):j*size(B,1),1:size(B,2)) = temp_prev*B;
    end
    temp_prev = temp; % this variable contains A^(j-1)
    temp = temp * A; % reuse temp to speed up the matrix power computation
end

I assume you have already solved your problem, but here is the code from my exercise sheet that creates your matrices.
S_x is the first matrix and S_u the second.
function S_x = compute_Sx(A,N)
    S_x = eye(size(A));
    for i=1:N
        S_x = [S_x;A^i];
    end
end
function S_u = compute_Su(A,B,N)
    S_u = zeros(size(A,1)*N,size(B,2)*N);
    for i=1:N
        S_u = S_u + kron(diag(ones(N-i+1,1),-i+1),A^(i-1)*B);
    end
    S_u = [zeros(size(A,1),size(B,2)*N);S_u];
end
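If the repeated matrix powers A^i become a bottleneck, one option is to build each block row from the previous one. The following is only a sketch under my own assumptions: the system (A, B), the horizon N, x0 and U are made-up test values, and the layout follows S_x = [I; A; ...; A^N] with a leading zero block row in S_u. It checks the result against a direct simulation of x(k+1) = A*x(k) + B*u(k).

```matlab
% Sketch: build S_x and S_u recursively, one block row per step.
% A, B, N, x0 and U are made-up test values (assumptions).
A = [1 0.1; 0 1];
B = [0.005; 0.1];
N = 5;
nx = size(A,1);
nu = size(B,2);

S_x = zeros(nx*(N+1), nx);
S_u = zeros(nx*(N+1), nu*N);
S_x(1:nx,:) = eye(nx);
for k = 1:N
    prev = (k-1)*nx + (1:nx);            % rows of the state at step k-1
    rows = k*nx + (1:nx);                % rows of the state at step k
    S_x(rows,:) = A * S_x(prev,:);       % A^k = A * A^(k-1)
    S_u(rows,:) = A * S_u(prev,:);       % propagate the earlier inputs
    S_u(rows,(k-1)*nu + (1:nu)) = B;     % the newest input enters through B
end

% Check against a direct simulation of x(k+1) = A*x(k) + B*u(k)
x0 = [1; -1];
U = (1:N).';                 % stacked inputs u(0), ..., u(N-1)
X = S_x*x0 + S_u*U;          % stacked states x(0), ..., x(N)
x = x0;
Xsim = x0;
for k = 1:N
    x = A*x + B*U((k-1)*nu + (1:nu));
    Xsim = [Xsim; x];
end
max_err = max(abs(X - Xsim));
```

Each step costs one multiplication by A instead of a fresh matrix power, which should matter more as N grows.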

How can I exploit parallelism when defining values in a sparse matrix?

The following MATLAB code loops through all elements of a matrix with size 2IJ x 2IJ.
for i=1:(I-2)
    for j=1:(J-2)
        ij1 = i*J+j+1; % row
        ij2 = i*J+j+1 + I*J; % col
        D1(ij1,ij1) = 2;
        D1(ij1,ij2) = -1;
    end
end
Is there any way I can parallelize it using MATLAB's parfor command? You can assume any element not defined is 0, so this matrix ends up being sparse (mostly 0s).
Before using parfor, it is recommended to read the guidelines on deciding when to use parfor, especially this:
Generally, if you want to make code run faster, first try to vectorize it.
Here vectorization can be used effectively to compute the indices of the nonzero elements, which are then passed to the function sparse. To do so, define one of i or j as a column vector and the other as a row vector; implicit expansion then computes the indices for all (i, j) combinations.
I = 300;
J = 300;
i = (1:I-2).';
j = 1:J-2;
ij1 = i*J+j+1;
ij2 = i*J+j+1 + I*J;
D1 = sparse(ij1, ij1, 2, 2*I*J, 2*I*J) + sparse(ij1, ij2, -1, 2*I*J, 2*I*J);
However, for comparison, this is one way of using parfor (not tested):
D1 = sparse (2*I*J, 2*I*J);
parfor i=1:(I-2)
    for j=1:(J-2)
        ij1 = i*J+j+1;
        ij2 = i*J+j+1 + I*J;
        D1 = D1 + sparse([ij1;ij1], [ij1;ij2], [2;-1], 2*I*J, 2*I*J) ;
    end
end
Here D1 is used as a reduction variable.
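As a quick sanity check of the vectorized idea, here is a sketch that builds the matrix with a single sparse call from concatenated triplets and compares it with the element-by-element loop. The sizes I and J are small made-up values so the dense reference stays cheap, and meshgrid is used instead of implicit expansion only so that it also runs on older releases.

```matlab
% Sketch: one sparse call with concatenated triplets, checked against the loop.
% I and J are small made-up sizes (assumptions).
I = 6;
J = 5;
n = 2*I*J;

[jj, ii] = meshgrid(1:J-2, 1:I-2);   % all (i,j) pairs of the double loop
ij1 = ii(:)*J + jj(:) + 1;           % row (and column) of the 2 entries
ij2 = ij1 + I*J;                     % column of the -1 entries
rows = [ij1; ij1];
cols = [ij1; ij2];
vals = [2*ones(numel(ij1),1); -ones(numel(ij1),1)];
D1 = sparse(rows, cols, vals, n, n);

% Reference: the element-by-element loop from the question
D1_ref = zeros(n);
for i = 1:(I-2)
    for j = 1:(J-2)
        r = i*J + j + 1;
        D1_ref(r, r) = 2;
        D1_ref(r, r + I*J) = -1;
    end
end
same = isequal(full(D1), D1_ref);
```

No index pair is repeated here, so sparse never has to sum duplicates and the two constructions should agree exactly.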

Fastest approach to copying/indexing variable parts of 3D matrix

I have large sets of 3D data consisting of 1D signals acquired in 2D space.
The first step in processing this data is thresholding all signals to find the arrival of a high-amplitude pulse. This pulse is present in all signals and arrives at different times.
After thresholding, the 3D data set should be reordered so that every signal starts at the arrival of the pulse and what came before is thrown away (the end of the signals is of no importance; for now I pad zeros onto the end of all signals so the data remains the same size).
Now, I have implemented this in the following manner:
First, I calculate the sample number of the first sample exceeding the threshold in all signals:
M = randn(1000,500,500); % example matrix of realistic size
threshold = 0.25*max(M(:,1,1)); % 25% of the maximum in the first signal as threshold
[~,index] = max(M>threshold); % indices of first sample exceeding threshold in all signals
Next, I want all signals to be shifted so that they all start with the pulse. For now, I have implemented it this way:
outM = zeros(size(M)); % preallocation for speed
for i = 1:size(M,2)
    for j = 1:size(M,3)
        outM(1:size(M,1)+1-index(1,i,j),i,j) = M(index(1,i,j):end,i,j);
    end
end
This works fine, and I know for-loops are not that slow anymore, but this easily takes a few seconds for the datasets on my machine. A single iteration of the for-loop takes about 0.05-0.1 sec, which seems slow to me for just copying a vector containing 500-2000 double values.
Therefore, I have looked into the best way to tackle this, but for now I haven't found anything better.
I have tried several things: 3D masks, linear indexing, and parallel loops (parfor).
For 3D masks, I checked whether any improvement is possible. I first construct a logical mask, and then compare the speed of logical-mask indexing/copying with the doubly nested for-loop.
%% set up for logical mask copying
AA = true(500,1); % only copy the first 500 values after the threshold value
Mask = false(size(M));
Jepla = zeros(500,size(M,2),size(M,3));
for i = 1:size(M,2)
    for j = 1:size(M,3)
        Mask(index(1,i,j):index(1,i,j)+499,i,j) = AA;
    end
end
%% speed comparison
tic
Jepla = M(Mask);
toc
tic
for i = 1:size(M,2)
    for j = 1:size(M,3)
        outM(1:size(M,1)+1-index(1,i,j),i,j) = M(index(1,i,j):end,i,j);
    end
end
toc
The for-loop is faster every time, even though there is more that's copied.
Next, linear indexing.
%% setup for linear index copying
%put all indices in 1 long column
LongIndex = reshape(index,numel(index),1);
% convert to linear indices and store in new variable
linearIndices = sub2ind(size(M),LongIndex,repmat(1:size(M,2),1,size(M,3))',repelem(1:size(M,3),size(M,2))');
% extend linear indices with those of all values to copy
k = zeros(numel(M),1);
count = 1;
for i = 1:numel(LongIndex)
    values = linearIndices(i):size(M,1)*i;
    k(count:count+length(values)-1) = values;
    count = count + length(values);
end
k = k(1:count-1);
% get linear indices of locations in new matrix
l = zeros(length(k),1);
count = 1;
for i = 1:numel(LongIndex)
    values = repelem(LongIndex(i)-1,size(M,1)-LongIndex(i)+1);
    l(count:count+length(values)-1) = values;
    count = count + length(values);
end
l = k-l;
% create new matrix
outM = zeros(size(M));
%% speed comparison
tic, outM(l) = M(k); toc
tic
for i = 1:size(M,2)
    for j = 1:size(M,3)
        outM(1:size(M,1)+1-index(1,i,j),i,j) = M(index(1,i,j):end,i,j);
    end
end
toc
Again, the alternative approach, linear indexing, is (a lot) slower.
After this failed, I learned about parallelisation, and thought this would surely speed up my code.
By reading some of the documentation around parfor and trying it out a bit, I changed my code to the following:
outM = zeros(size(M));
inM = mat2cell(M,size(M,1),ones(size(M,2),1),size(M,3));
parfor i = 1:500
    for j = 1:500
        outM(:,i,j) = [inM{i}(index(1,i,j):end,1,j);zeros(index(1,i,j)-1,1)];
    end
end
I changed it so that "outM" and "inM" would both be sliced variables, as I read this is best. Still this is very slow, a lot slower than the original for loop.
So now the question, should I give up on trying to improve the speed of this operation? Or is there another way in which to do this? I have searched a lot, and for now do not see how to speed this up.
Sorry for the long question, but I wanted to show what I tried.
Thank you in advance!
Not sure if this is an option in your situation, but it looks like cell arrays are actually faster here:
outM2 = cell(size(M,2),size(M,3));
for i = 1:size(M,2)
    for j = 1:size(M,3)
        outM2{i,j} = M(index(1,i,j):end,i,j);
    end
end
A second idea, which also came out faster, is to batch all signals that have to be shifted by the same amount:
for i = unique(index).'
    outM(1:size(M,1)+1-i,index==i) = M(i:end,index==i);
end
Whether this approach is actually faster depends entirely on your data.
And yes, integer-valued and logical indexing can be mixed.
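For completeness, the shift can also be written as a single gather with linear indices on a 2D view of the data. This is only a sketch under my own assumptions: M is a small deterministic stand-in for the real data and the threshold is arbitrary. Because it builds index arrays as large as M, it may well be slower than the loop on big data sets, so time it on your own data.

```matlab
% Sketch: shift every signal to start at its first above-threshold sample
% using one gather with linear indices. M and threshold are made-up test data.
M = reshape(mod((1:6*4*3)*7, 11), [6 4 3]);
threshold = 5;
[~, index] = max(M > threshold);    % first sample above threshold (1 if none)

n = size(M,1);
M2 = reshape(M, n, []);             % one signal per column
start = index(:).';                 % start sample of each signal
nsig = numel(start);

src = repmat((0:n-1).', 1, nsig) + repmat(start, n, 1);   % source row of each output sample
valid = src <= n;                                         % source row still inside the signal
colOffset = repmat((0:nsig-1)*n, n, 1);                   % linear-index offset of each column

out2 = zeros(n, nsig);              % positions that are not valid stay zero (padding)
out2(valid) = M2(src(valid) + colOffset(valid));
outM = reshape(out2, size(M));

% Reference: the double loop from the question
outRef = zeros(size(M));
for i = 1:size(M,2)
    for j = 1:size(M,3)
        outRef(1:n+1-index(1,i,j),i,j) = M(index(1,i,j):end,i,j);
    end
end
same = isequal(outM, outRef);
```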

When is vectorization a better or worse solution than a loop?

In Matlab, I am trying to vectorise my code to improve the simulation time. However, the result I got was that I deteriorated the overall efficiency.
To understand the phenomenon, I created 3 distinct functions that do the same thing with different approaches:
The main file:
n = 10000;
Value = cumsum(ones(1,n));
NbLoop = 10000;
time01 = zeros(1,NbLoop);
time02 = zeros(1,NbLoop);
time03 = zeros(1,NbLoop);
for test = 1 : NbLoop
    tic
    vector1 = function01(n,Value);
    time01(test) = toc ;
    tic
    vector2 = function02(n,Value);
    time02(test) = toc ;
    tic
    vector3 = function03(n,Value);
    time03(test) = toc ;
end
hold on
plot( time01, 'b')
plot( time02, 'g')
plot( time03, 'r')
The function 01:
function vector = function01(n,Value)
    vector = zeros( 2*n,1);
    for k = 1:n
        vector(2*k -1) = Value(k);
        vector(2*k) = Value(k);
    end
end
The function 02:
function vector = function02(n,Value)
    vector = zeros( 2*n,1);
    vector(1:2:2*n) = Value;
    vector(2:2:2*n) = Value;
end
The function 03:
function vector = function03(n,Value)
    MatrixTmp = transpose([Value(:), Value(:)]);
    vector = MatrixTmp (:);
end
The blue plot corresponds to the for-loop.
[Timing plots for n = 100 and n = 10000]
When I run the code with n = 100, the most efficient solution is the first function, with the for-loop.
When n = 10000, the first function becomes the least efficient.
Do you have a way to know how and when to properly replace a for-loop by a vectorised counterpart?
What is the impact of index searching with arrays of tremendous dimensions?
Does Matlab compute in a different manner an array of dimension 3 or higher than an array of dimension 1 or 2?
Is there a clever way to replace a while loop that uses the result of an iteration for the next iteration?
Using MATLAB Online I see something different:
n            10000        100
function01   5.6248e-05   2.2246e-06
function02   1.7748e-05   1.9491e-06
function03   2.7748e-05   1.2278e-06
function04   1.1056e-05   7.3390e-07   (my version, see below)
Thus, the loop version is always slowest. Method #2 is faster for very large matrices, Method #3 is faster for very small matrices.
The reason is that method #3 makes 2 copies of the data (transposing a matrix incurs a copy), and that is bad if there's a lot of data. Method #2 uses indexing, which is expensive, but not as expensive as copying lots of data twice.
I would suggest this function instead (Method #4), which transposes only vectors (which is essentially free). It is a simple modification of your Method #3:
function vector = function04(n,Value)
    vector = [Value(:).'; Value(:).'];
    vector = vector(:);
end
Do you have a way to know how and when to properly replace a for-loop by a vectorised counterpart?
In general, vectorized code is faster as long as it does not create large intermediate matrices. For small data you can vectorize more aggressively; for large data, loops are sometimes more efficient because of the reduced memory pressure. It depends on what is needed for vectorization.
What is the impact of index searching with array of tremendous dimensions?
This refers to operations such as d = data(data==0). Much like everything else, this is efficient for small data and less so for large data, because data==0 is an intermediate array of the same size as data.
Does Matlab compute in a different manner an array of dimension 3 or higher than an array of dimension 1 or 2?
No, not in general. Functions such as sum are implemented in a dimensionality-independent way [citation needed].
Is there a clever way to replace a while loop that uses the result of an iteration for the next iteration?
It depends very much on what the operations are. Functions such as cumsum can often be used to vectorize this type of code, but not always.
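To make the cumsum remark concrete, here is a minimal sketch with made-up numbers (v and x0 are my own illustrative values): a loop in which iteration k needs the result of iteration k-1, next to its vectorized form.

```matlab
% Sketch: a running-sum recurrence written as a loop and with cumsum.
% The increments v and the start value x0 are made up (assumptions).
v = [3 -1 4 1 -5 9 2 -6];
x0 = 10;

% Loop form of the recurrence x(k) = x(k-1) + v(k)
x_loop = zeros(size(v));
prev = x0;
for k = 1:numel(v)
    prev = prev + v(k);
    x_loop(k) = prev;
end

% Vectorized form: the recurrence unrolls to x(k) = x0 + v(1) + ... + v(k)
x_vec = x0 + cumsum(v);

same = isequal(x_loop, x_vec);
```

This works because the recurrence unrolls into a plain running sum. When the next iterate depends on the previous one in a more complicated (for example nonlinear) way, there is usually no such closed form and the loop stays.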
This is my timing code, I hope it shows how to properly use timeit:
%n = 10000;
n = 100;
Value = cumsum(ones(1,n));
vector1 = function01(n,Value);
vector2 = function02(n,Value);
vector3 = function03(n,Value);
vector4 = function04(n,Value);
function vector = function01(n,Value)
    vector = zeros(2*n,1);
    for k = 1:n
        vector(2*k-1) = Value(k);
        vector(2*k) = Value(k);
    end
end
function vector = function02(n,Value)
    vector = zeros(2*n,1);
    vector(1:2:2*n) = Value;
    vector(2:2:2*n) = Value;
end
function vector = function03(n,Value)
    MatrixTmp = transpose([Value(:), Value(:)]);
    vector = MatrixTmp(:);
end
function vector = function04(n,Value)
    vector = [Value(:).'; Value(:).'];
    vector = vector(:);
end
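As a hedged illustration of the timeit pattern, the sketch below times two of the variants through anonymous function handles. The handles f3 and f4 are my own stand-ins for Methods #3 and #4, and the numbers you get will depend on your machine and MATLAB release.

```matlab
% Sketch: timing two interleaving variants with timeit.
% f3 and f4 are illustrative stand-ins for Methods #3 and #4 (assumptions).
n = 100;
Value = cumsum(ones(1,n));

f3 = @() reshape(transpose([Value(:), Value(:)]), [], 1);   % transposes an n-by-2 matrix
f4 = @() reshape([Value(:).'; Value(:).'], [], 1);          % transposes vectors only

same = isequal(f3(), f4());   % both variants must give the same result

t3 = timeit(f3);   % timeit runs the handle several times and returns the median, in seconds
t4 = timeit(f4);
fprintf('Method 3: %.3g s, Method 4: %.3g s\n', t3, t4);
```

Unlike a single tic/toc pair, timeit repeats the call for you, which makes measurements in the microsecond range far less noisy.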

Matlab Vectorize

I have a probability matrix (glcm) of size 256x256x20. I have reshaped the matrix to 65536x20, so that I can eliminate one loop (along the 3rd dimension).
I want to do the following calculation.
for y = 1:256
    for x = 1:256
        if (ismember((x + y),(2:2*256)))
            p_xplusy((x+y),:) = p_xplusy((x+y),:) + glcm(((y-1)*256+x),:);
        end
    end
end
So p_xplusy will effectively be a 511x20 matrix (one row for each value of x+y from 2 to 512), where each row is the sum along one anti-diagonal of the corresponding 256x256 slice of the original 256x256x20 matrix.
This code block is making my program inefficient and I want to vectorize this loop. Any help would be appreciated.
Since x and y both run from 1 to 256, x+y always lies in 2:2*256, so your if statement is always true. Just remove the ismember test:
for y = 1:256
    for x = 1:256
        p_xplusy((x+y),:) = p_xplusy((x+y),:) + glcm(((y-1)*256+x),:);
    end
end
This should provide a noticeable speed-up to your code.
You can reduce the number of loop iterations from N^2 to about 2*N by summing each anti-diagonal in one step, and thus improve runtime efficiency -
N = 256;
for k1 = 1:N
    idx_glcm = k1:N-1:N*(k1-1)+1;
    p_xplusy(k1+1,:) = p_xplusy(k1+1,:) + sum(glcm(idx_glcm,:),1);
end
for k1 = 2:N
    idx_glcm = k1*N:N-1:N*(N-1)+k1;
    p_xplusy(N+k1,:) = p_xplusy(N+k1,:) + sum(glcm(idx_glcm,:),1);
end
Some quick runtime tests seem to confirm our efficiency theory too.
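If you want to drop the loops entirely, one possibility is to express the anti-diagonal sums as a product with a sparse 0/1 selection matrix. This is a sketch under my own assumptions: N, the number of slices and the glcm values are small made-up test data, and S is a helper name I introduce for illustration. The reference loop omits the ismember test, which is always true for x and y in 1:N.

```matlab
% Sketch: all anti-diagonal sums in one sparse matrix product.
% N, nSlices and the glcm values are small made-up test data (assumptions).
N = 4;
nSlices = 3;
glcm = reshape(1:N*N*nSlices, N*N, nSlices);   % row (y-1)*N + x holds entry (x,y)

[x, y] = ndgrid(1:N, 1:N);       % x varies fastest, matching row (y-1)*N + x
k = x(:) + y(:);                 % target row x+y for every row of glcm
S = sparse(k, (1:N*N).', 1, 2*N, N*N);   % S(k,r) = 1 if glcm row r contributes to sum k
p_xplusy = S * glcm;             % 2N rows; row 1 stays zero because x+y >= 2

% Reference: the double loop from the question
p_ref = zeros(2*N, nSlices);
for yy = 1:N
    for xx = 1:N
        p_ref(xx+yy,:) = p_ref(xx+yy,:) + glcm((yy-1)*N+xx,:);
    end
end
same = isequal(full(p_xplusy), p_ref);
```

The selection matrix depends only on N, so it can be built once and reused for every glcm of the same size.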

Speeding up the conditional filling of huge sparse matrices

I was wondering if there is a way of speeding up (maybe via vectorization?) the conditional filling of huge sparse matrices (e.g. ~ 1e10 x 1e10). Here's the sample code where I have a nested loop, and I fill in a sparse matrix only if a certain condition is met:
% We are given the following cell arrays of the same size:
% all_arrays_1
% all_arrays_2
% all_mapping_arrays
N = 1e10;
% The number of nnz (non-zeros) is unknown until the loop finishes
huge_sparse_matrix = sparse([],[],[],N,N);
n_iterations = numel(all_arrays_1);
for iteration=1:n_iterations
    array_1 = all_arrays_1{iteration};
    array_2 = all_arrays_2{iteration};
    mapping_array = all_mapping_arrays{iteration};
    n_elements_in_array_1 = numel(array_1);
    n_elements_in_array_2 = numel(array_2);
    for element_1 = 1:n_elements_in_array_1
        element_2 = mapping_array(element_1);
        % Sanity check:
        if element_2 <= n_elements_in_array_2
            item_1 = array_1(element_1);
            item_2 = array_2(element_2);
            huge_sparse_matrix(item_1,item_2) = 1;
        end
    end
end
I am struggling to vectorize the nested loop. As far as I understand, filling a sparse matrix element by element is very slow when the number of entries to fill is large (~100M), and that is what I observe in MATLAB. I need to work with a sparse matrix since it has dimensions in the 10,000M x 10,000M range.
I have updated the names of the variables to reflect their nature better. There are no function calls.
This code builds the adjacency matrix for a huge graph. The variable all_mapping_arrays holds mapping arrays (~ adjacency relationship) between nodes of the graph in a local representation, which is why I need array_1 and array_2 to map the adjacency to a global representation.
I think it will be the incremental update of the sparse matrix, rather than the loop based conditional that will be slowing things down.
When you add a new entry to a sparse matrix via something like A(i,j) = 1, it typically requires that the whole matrix data structure is re-packed. This is an expensive operation. If you're interested, MATLAB uses a CCS data structure (compressed column storage) internally, which is described under the Data Structure section here. Note the statement:
This scheme is not efficient for manipulating matrices one element at a time.
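To see why a single insertion is costly, it helps to look at the three arrays of a compressed column layout. The sketch below only mimics that layout with ordinary MATLAB arrays; the matrix S and the names rowIdx, vals and colPtr are my own illustrative choices, and it does not access MATLAB's internal storage.

```matlab
% Sketch: the three arrays of compressed column storage for a small matrix.
% S is made up; rowIdx, vals and colPtr are illustrative names (assumptions).
S = sparse([1 3 2 3 1], [1 1 2 3 4], [10 20 30 40 50], 3, 4);

[rowIdx, colIdx, vals] = find(S);                 % nonzeros in column-major order
counts = accumarray(colIdx, 1, [size(S,2) 1]);    % number of nonzeros per column
colPtr = [0; cumsum(counts)];                     % column j uses entries colPtr(j)+1 : colPtr(j+1)

% Insert one new nonzero into column 2 and rebuild the arrays
S2 = S;
S2(1,2) = 5;
[rowIdx2, colIdx2, vals2] = find(S2);
colPtr2 = [0; cumsum(accumarray(colIdx2, 1, [size(S2,2) 1]))];
```

After the insertion, every stored entry of columns 2 to 4 sits one slot later and every later column pointer has changed. That shifting is the re-packing, and repeating it for each of ~100M entries is what makes element-wise filling slow.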
Generally, it's far better (faster) to accumulate the non-zero entries in the matrix as a set of triplets and then make a single call to sparse. For example (warning - brain compiled code!!):
% Inputs:
% N
% prev_array and next_array
% n_labels_prev and n_labels_next
% mapping
% allocate space for matrix entries as a set of "triplets"
ii = zeros(N,1);
jj = zeros(N,1);
xx = zeros(N,1);
nn = 0;
for next_label_ix = 1:n_labels_next
    prev_label = mapping(next_label_ix);
    if prev_label <= n_labels_prev
        prev_global_label = prev_array(prev_label);
        next_global_label = next_array(next_label_ix);
        % reallocate triplets on demand
        if (nn + 1 > length(ii))
            ii = [ii; zeros(N,1)];
            jj = [jj; zeros(N,1)];
            xx = [xx; zeros(N,1)];
        end
        % append a new triplet and increment counter
        ii(nn + 1) = next_global_label; % row index
        jj(nn + 1) = prev_global_label; % col index
        xx(nn + 1) = 1.0; % coefficient
        nn = nn + 1;
    end
end
% we may have over-allocated our triplets, so trim the arrays
% based on our final counter
ii = ii(1:nn);
jj = jj(1:nn);
xx = xx(1:nn);
% just make a single call to "sparse" to pack the triplet data
% as a sparse matrix object
sp_graph_adj_global = sparse(ii,jj,xx,N,N);
I'm allocating in chunks of N entries at a time. Assuming that you know a lot about the structure of your matrix, you might be able to use a better value here.
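The inner loop itself can probably be vectorized as well, since the sanity check is just a logical mask. Here is a sketch using the variable names from the question. It assumes each mapping array has one entry per element of array_1, the cell arrays and N are small made-up test data, and unique is used because a single sparse call would otherwise sum repeated (row, col) pairs instead of setting them to 1.

```matlab
% Sketch: collect triplets per iteration with a logical mask, then call
% sparse once. The cell arrays and N are small made-up test data (assumptions).
N = 8;
all_arrays_1 = {[1 2 3], [4 5]};
all_arrays_2 = {[6 7], [8 1 2]};
all_mapping_arrays = {[2 3 1], [1 3]};   % entries > numel(array_2) are skipped

n_iterations = numel(all_arrays_1);
rows = cell(n_iterations, 1);
cols = cell(n_iterations, 1);
for iteration = 1:n_iterations
    array_1 = all_arrays_1{iteration};
    array_2 = all_arrays_2{iteration};
    mapping_array = all_mapping_arrays{iteration};
    valid = mapping_array <= numel(array_2);            % vectorized sanity check
    rows{iteration} = reshape(array_1(valid), [], 1);
    cols{iteration} = reshape(array_2(mapping_array(valid)), [], 1);
end
rc = unique([vertcat(rows{:}), vertcat(cols{:})], 'rows');   % drop repeated pairs
huge_sparse_matrix = sparse(rc(:,1), rc(:,2), 1, N, N);

% Reference: element-by-element filling as in the question
ref = zeros(N);
for iteration = 1:n_iterations
    a1 = all_arrays_1{iteration};
    a2 = all_arrays_2{iteration};
    m = all_mapping_arrays{iteration};
    for e1 = 1:numel(a1)
        if m(e1) <= numel(a2)
            ref(a1(e1), a2(m(e1))) = 1;
        end
    end
end
same = isequal(full(huge_sparse_matrix), ref);
```

With ~100M entries the unique call has to sort all pairs, so whether it is worth it depends on how many duplicates you expect.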
Hope this helps.