MATLAB boxplot of high dimensional vectors with different lengths - matlab

I have been looking for a way to use boxplot for different length vectors. thanx for stackoverflow helpers, they give this solution:
A = randn(10, 1); B = randn(12, 1); C = randn(4, 1);
g = [repmat(1, [10, 1]) ; repmat(2, [12, 1]); repmat(3, [4, 1])];
figure; boxplot([A; B; C], g);
unfortunately, my data contains over 100 vectors with different lengths, I wonder if it can be done without repeating the repmat for over 100 times.

As long as your vectors have different lengths, store it in a cell array.
There are plenty was of doing it, here are 3 examples
1) "Naive" for loop
g = [];
vars_cell = {A, B, C, ....};
for it = 1 : length(vars_cell)
g = [g; repmat(it,size(vars_cell{it}))];
end
This way of doing it works but is very slow with big quantites of vectors or big vectors! It comes from the fact that you are re-defining g at each iteration, changing its size each time.
2) Not-naive for loop
vars_cell = {A, B, C, ....};
%find the sum of the length of all the vectors
total_l = sum(cellfun(#(x) length(x),vars_cell));
g = zeros(total_l,1);
acc = 1;
for it = 1 : length(vars_cell)
l = size(vars_cell{it});
g(acc:acc+l-1) = repmat(it,l);
acc = acc+l;
end
This method will be much faster than the 1st one because it defines g only once
3) The "one-liner"
vars_cell = {A, B, C, ....};
g = cell2mat(arrayfun(#(it) repmat(it, size(vars_cell{it})),1:length(vars_cell),'UniformOutput',0)');
This is qute equivalent to the 2nd solution, but if you like one line answers this is what you are looking for!

Related

Take a random draw of all possible pairs of indices in Matlab

Consider a Matlab matrix B which lists all possible unordered pairs (without repetitions) from [1 2 ... n]. For example, if n=4,
B=[1 2;
1 3;
1 4;
2 3;
2 4;
3 4]
Note that B has size n(n-1)/2 x 2
I want to take a random draw of m rows from B and store them in a matrix C. Continuing the example above, I could do that as
m=2;
C=B(randi([1 size(B,1)],m,1),:);
However, in my actual case, n=371293. Hence, I cannot create B and, then, run the code above to obtain C. This is because storing B would require a huge amount of memory.
Could you advise on how I could proceed to create C, without having to first store B? Comments on a different question suggest to
Draw at random m integers between 1 and n(n-1)/2.
I=randi([1 n*(n-1)/2],m,1);
Use ind2sub to obtain C.
Here, I'm struggling to implement the second step.
Thanks to the comments below, I wrote this
n=4;
m=10;
coord=NaN(m,2);
R= randi([1 n^2],m,1);
for i=1:m
[cr, cc]=ind2sub([n,n],R(i));
if cr>cc
coord(i,1)=cc;
coord(i,2)=cr;
elseif cr<cc
coord(i,1)=cr;
coord(i,2)=cc;
end
end
coord(any(isnan(coord),2),:) = []; %delete NaN rows from coord
I guess there are more efficient ways to implement the same thing.
You can use the function named myind2ind in this post to take random rows of all possible unordered pairs without generating all of them.
function [R , C] = myind2ind(ii, N)
jj = N * (N - 1) / 2 + 1 - ii;
r = (1 + sqrt(8 * jj)) / 2;
R = N -floor(r);
idx_first = (floor(r + 1) .* floor(r)) / 2;
C = idx_first-jj + R + 1;
end
I=randi([1 n*(n-1)/2],m,1);
[C1 C2] = myind2ind (I, n);
If you look at the odds, for i=1:n-1, the number of combinations where the first value is equal to i is (n-i) and the total number of cominations is n*(n-1)/2. You can use this law to generate the first column of C. The values of the second column of C can then be generated randomly as integers uniformly distributed in the range [i+1, n]. Here is a code that performs the desired tasks:
clc; clear all; close all;
% Parameters
n = 371293; m = 10;
% Generation of C
R = rand(m,1);
C = zeros(m,2);
s = 0;
t = n*(n-1)/2;
for i=1:n-1
if (i<n-1)
ind_i = R>=s/t & R<(s+n-i)/t;
else % To avoid rounding errors for n>>1, we impose (s+n-i)=t at the last iteration (R<(s+n-i)/t=1 always true)
ind_i = R>=s/t;
end
C(ind_i,1) = i;
C(ind_i,2) = randi([i+1,n],sum(ind_i),1);
s = s+n-i;
end
% Display
C
Output:
C =
84333 266452
46609 223000
176395 328914
84865 94391
104444 227034
221905 302546
227497 335959
188486 344305
164789 266497
153603 354932
Good luck!

MATLAB: efficient generating of block matrices using a block vector

Suppose
x = [x1;x2; ...; xn]
where each xi is a column vector with length l(i). We can set L = sum(l), the total length of x. I would like to generate 2 matrices based on x:
Let's call them A and B. For example, when x only as 2 blocks x1 and x2 then:
A = [x1*x1' zeros(l(1),l(2)); zeros(l(2),l(1)), x2*x2'];
B = [x1 zeros(l(1),1);
zeros(l(2),1), x2];
In the notation of the problem, A is always L by L and B is L by n. I can generate A and B given x using loops but it is tedious. Is there a clever (loop-free) way to generate A and B. I am using MATLAB 2018b but you can assume earlier version of MATLAB if necessary.
I think it is both short and fast:
B = x .* (repelem((1:numel(l)).',l)==(1:numel(l)));
A = B * B.';
If you have large data It is better to use sparse matrix:
B = sparse(1:numel(x), repelem(1:numel(l), l), x);
A = B * B.';
The following should work. In this case I do an inefficient conversion to cell arrays so there may be a more efficient implementation possible.
cuml = [0; cumsum(l(:))];
get_x = #(idx) x((1:l(idx))+cuml(idx));
x_cell = arrayfun(get_x, 1:numel(l), 'UniformOutput', false);
B = blkdiag(x_cell{:});
A = B*B';
Edit
After running some benchmarks I found a direct loop based implementation to be about twice as fast as the cell based approach above.
A = zeros(sum(l));
B = zeros(sum(l), numel(l));
prev = 0;
for idx = 1:numel(l)
xidx = (1:l(idx))+prev;
A(xidx, xidx) = x(xidx,1) * x(xidx,1)';
B(xidx, idx) = x(idx,1);
prev = prev + l(idx);
end
Here's an alternative approach:
s = repelem(1:numel(l), l).';
t = accumarray(s, x, [], #(x){x*x'});
A = blkdiag(t{:});
t = accumarray(s, x, [], #(x){x});
B = blkdiag(t{:});

How to compare elements in multiple vectors (of different sizes) against each other?

I have three vectors in Matlab that are possibly of different sizes. I want to compare the values in each vector against all the other values in the other vectors and only keep values that are 'close' in 2 out of 3 of the vectors. And by 'keep', I mean take the average of the close values.
For example:
a = [10+8i, 20];
b = [10+9i, 30, 40+3i, 55];
c = [10, 60, 41+3i];
If I set a closeness tolerance such that only values that are within, say, a magnitude of 1.5 of each other are kept, then the following values should be marked as close:
10 + 8i and 10 + 9i
40 + 3i and 41 + 3i
Then the routine should return a vector of length that contains the average of each of these sets of numbers:
finalVec = [10+8.5i,40.5+3i];
What is the most efficient way to do this in Matlab? Is there a better way than just straightforward looping over all elements?
Building on this solution:
a = [10+8i, 20];
b = [10+9i, 30, 40+3i, 55];
c = [10, 60, 41+3i];
M1 = compare_vectors(a , b);
M2 = compare_vectors(a , c);
M3 = compare_vectors(b , c);
finalVec = [M1, M2 , M3]
function M = compare_vectors(a , b)
% All combinations of vectors elements
[A,B] = meshgrid(a,b);
C = cat(2,A',B');
D = reshape(C,[],2);
% Find differences lower than tolerance
tolerance = 1.5
below_tolerance = abs(D(:,1) - D(:,2)) < tolerance ;
% If none, return empty
if all(below_tolerance== 0)
M = [];
return
end
% Calculate average of returned values
M = mean(D(below_tolerance,:));
end
% your data
a = [10+8i, 20];
b = [10+9i, 30, 40+3i, 55];
c = [10, 60, 41+3i];
tol = 1.5;
% call the function with each combination of vectors and concatenate the results
finalVec = cell2mat(cellfun(#closepoints, {a, a, b}, {b, c, c}, {tol, tol, tol}, 'Uni', 0))
function p = closepoints(a, b, tol)
% find the pairs of indexes of close points
% the bsxfun() part calculates the distance between all combinations of elements of the two vectors
[ii,jj] = find(abs(bsxfun(#minus, a, b.')) < tol);
% calculate the mean
p = (a(jj) + b(ii))/2;
end
Note that cellfun() isn't really faster than calling the function multiple times in a row or using a for loop. But it would be easier to add more vectors than the former and is IMO nicer to look at than the latter.

Use row indices of matrix as elements of new matrix? [duplicate]

For those super experts out there, I was wondering if you see a quick way to convert the following "for" loop into a one-line vector calculation that is more efficient.
%Define:
%A size (n,1)
%B size (n,m)
%C size (n,1)
B = [2 200; 3 300; 4 400];
C = [1;2;1];
for j=1:n
A(j) = B( j, C(j) );
end
So to be clear, is there any alternative way to express A, as a function of B and C, without having to write a loop?
Yes, there is:
A = B(sub2ind([n,m], (1:n).', C));
It depends on functions A, B, and C, but this might work:
j = 1:n;
A = B(j, C(j));

Multiply a 3D matrix with a 2D matrix

Suppose I have an AxBxC matrix X and a BxD matrix Y.
Is there a non-loop method by which I can multiply each of the C AxB matrices with Y?
As a personal preference, I like my code to be as succinct and readable as possible.
Here's what I would have done, though it doesn't meet your 'no-loops' requirement:
for m = 1:C
Z(:,:,m) = X(:,:,m)*Y;
end
This results in an A x D x C matrix Z.
And of course, you can always pre-allocate Z to speed things up by using Z = zeros(A,D,C);.
You can do this in one line using the functions NUM2CELL to break the matrix X into a cell array and CELLFUN to operate across the cells:
Z = cellfun(#(x) x*Y,num2cell(X,[1 2]),'UniformOutput',false);
The result Z is a 1-by-C cell array where each cell contains an A-by-D matrix. If you want Z to be an A-by-D-by-C matrix, you can use the CAT function:
Z = cat(3,Z{:});
NOTE: My old solution used MAT2CELL instead of NUM2CELL, which wasn't as succinct:
[A,B,C] = size(X);
Z = cellfun(#(x) x*Y,mat2cell(X,A,B,ones(1,C)),'UniformOutput',false);
Here's a one-line solution (two if you want to split into 3rd dimension):
A = 2;
B = 3;
C = 4;
D = 5;
X = rand(A,B,C);
Y = rand(B,D);
%# calculate result in one big matrix
Z = reshape(reshape(permute(X, [2 1 3]), [A B*C]), [B A*C])' * Y;
%'# split into third dimension
Z = permute(reshape(Z',[D A C]),[2 1 3]);
Hence now: Z(:,:,i) contains the result of X(:,:,i) * Y
Explanation:
The above may look confusing, but the idea is simple.
First I start by take the third dimension of X and do a vertical concatenation along the first dim:
XX = cat(1, X(:,:,1), X(:,:,2), ..., X(:,:,C))
... the difficulty was that C is a variable, hence you can't generalize that expression using cat or vertcat. Next we multiply this by Y:
ZZ = XX * Y;
Finally I split it back into the third dimension:
Z(:,:,1) = ZZ(1:2, :);
Z(:,:,2) = ZZ(3:4, :);
Z(:,:,3) = ZZ(5:6, :);
Z(:,:,4) = ZZ(7:8, :);
So you can see it only requires one matrix multiplication, but you have to reshape the matrix before and after.
I'm approaching the exact same issue, with an eye for the most efficient method. There are roughly three approaches that i see around, short of using outside libraries (i.e., mtimesx):
Loop through slices of the 3D matrix
repmat-and-permute wizardry
cellfun multiplication
I recently compared all three methods to see which was quickest. My intuition was that (2) would be the winner. Here's the code:
% generate data
A = 20;
B = 30;
C = 40;
D = 50;
X = rand(A,B,C);
Y = rand(B,D);
% ------ Approach 1: Loop (via #Zaid)
tic
Z1 = zeros(A,D,C);
for m = 1:C
Z1(:,:,m) = X(:,:,m)*Y;
end
toc
% ------ Approach 2: Reshape+Permute (via #Amro)
tic
Z2 = reshape(reshape(permute(X, [2 1 3]), [A B*C]), [B A*C])' * Y;
Z2 = permute(reshape(Z2',[D A C]),[2 1 3]);
toc
% ------ Approach 3: cellfun (via #gnovice)
tic
Z3 = cellfun(#(x) x*Y,num2cell(X,[1 2]),'UniformOutput',false);
Z3 = cat(3,Z3{:});
toc
All three approaches produced the same output (phew!), but, surprisingly, the loop was the fastest:
Elapsed time is 0.000418 seconds.
Elapsed time is 0.000887 seconds.
Elapsed time is 0.001841 seconds.
Note that the times can vary quite a lot from one trial to another, and sometimes (2) comes out the slowest. These differences become more dramatic with larger data. But with much bigger data, (3) beats (2). The loop method is still best.
% pretty big data...
A = 200;
B = 300;
C = 400;
D = 500;
Elapsed time is 0.373831 seconds.
Elapsed time is 0.638041 seconds.
Elapsed time is 0.724581 seconds.
% even bigger....
A = 200;
B = 200;
C = 400;
D = 5000;
Elapsed time is 4.314076 seconds.
Elapsed time is 11.553289 seconds.
Elapsed time is 5.233725 seconds.
But the loop method can be slower than (2), if the looped dimension is much larger than the others.
A = 2;
B = 3;
C = 400000;
D = 5;
Elapsed time is 0.780933 seconds.
Elapsed time is 0.073189 seconds.
Elapsed time is 2.590697 seconds.
So (2) wins by a big factor, in this (maybe extreme) case. There may not be an approach that is optimal in all cases, but the loop is still pretty good, and best in many cases. It is also best in terms of readability. Loop away!
Nope. There are several ways, but it always comes out in a loop, direct or indirect.
Just to please my curiosity, why would you want that anyway ?
To answer the question, and for readability, please see:
ndmult, by ajuanpi (Juan Pablo Carbajal), 2013, GNU GPL
Input
2 arrays
dim
Example
nT = 100;
t = 2*pi*linspace (0,1,nT)’;
# 2 experiments measuring 3 signals at nT timestamps
signals = zeros(nT,3,2);
signals(:,:,1) = [sin(2*t) cos(2*t) sin(4*t).^2];
signals(:,:,2) = [sin(2*t+pi/4) cos(2*t+pi/4) sin(4*t+pi/6).^2];
sT(:,:,1) = signals(:,:,1)’;
sT(:,:,2) = signals(:,:,2)’;
G = ndmult (signals,sT,[1 2]);
Source
Original source. I added inline comments.
function M = ndmult (A,B,dim)
dA = dim(1);
dB = dim(2);
# reshape A into 2d
sA = size (A);
nA = length (sA);
perA = [1:(dA-1) (dA+1):(nA-1) nA dA](1:nA);
Ap = permute (A, perA);
Ap = reshape (Ap, prod (sA(perA(1:end-1))), sA(perA(end)));
# reshape B into 2d
sB = size (B);
nB = length (sB);
perB = [dB 1:(dB-1) (dB+1):(nB-1) nB](1:nB);
Bp = permute (B, perB);
Bp = reshape (Bp, sB(perB(1)), prod (sB(perB(2:end))));
# multiply
M = Ap * Bp;
# reshape back to original format
s = [sA(perA(1:end-1)) sB(perB(2:end))];
M = squeeze (reshape (M, s));
endfunction
I highly recommend you use the MMX toolbox of matlab. It can multiply n-dimensional matrices as fast as possible.
The advantages of MMX are:
It is easy to use.
Multiply n-dimensional matrices (actually it can multiply arrays of 2-D matrices)
It performs other matrix operations (transpose, Quadratic Multiply, Chol decomposition and more)
It uses C compiler and multi-thread computation for speed up.
For this problem, you just need to write this command:
C=mmx('mul',X,Y);
here is a benchmark for all possible methods. For more detail refer to this question.
1.6571 # FOR-loop
4.3110 # ARRAYFUN
3.3731 # NUM2CELL/FOR-loop/CELL2MAT
2.9820 # NUM2CELL/CELLFUN/CELL2MAT
0.0244 # Loop Unrolling
0.0221 # MMX toolbox <===================
I would like to share my answer to the problems of:
1) making the tensor product of two tensors (of any valence);
2) making the contraction of two tensors along any dimension.
Here are my subroutines for the first and second tasks:
1) tensor product:
function [C] = tensor(A,B)
C = squeeze( reshape( repmat(A(:), 1, numel(B)).*B(:).' , [size(A),size(B)] ) );
end
2) contraction:
Here A and B are the tensors to be contracted along the dimesions i and j respectively. The lengths of these dimensions should be equal, of course. There's no check for this (this would obscure the code) but apart from this it works well.
function [C] = tensorcontraction(A,B, i,j)
sa = size(A);
La = length(sa);
ia = 1:La;
ia(i) = [];
ia = [ia i];
sb = size(B);
Lb = length(sb);
ib = 1:Lb;
ib(j) = [];
ib = [j ib];
% making the i-th dimension the last in A
A1 = permute(A, ia);
% making the j-th dimension the first in B
B1 = permute(B, ib);
% making both A and B 2D-matrices to make use of the
% matrix multiplication along the second dimension of A
% and the first dimension of B
A2 = reshape(A1, [],sa(i));
B2 = reshape(B1, sb(j),[]);
% here's the implicit implication that sa(i) == sb(j),
% otherwise - crash
C2 = A2*B2;
% back to the original shape with the exception
% of dimensions along which we've just contracted
sa(i) = [];
sb(j) = [];
C = squeeze( reshape( C2, [sa,sb] ) );
end
Any critics?
I would think recursion, but that's the only other non- loop method you can do
You could "unroll" the loop, ie write out all the multiplications sequentially that would occur in the loop