Generate boolean matrix by predicate on row and column - matlab

I have the following vector:
y = [1; 3; 2; 3; 1];
All its values are between 1 and n (in this case, 3) and denote different options.
I want to create a matrix of size size(y, 1) x n whose rows correspond to the values of y:
1 0 0 % because y(1) = 1
0 0 1 % because y(2) = 3
0 1 0 % because y(3) = 2
0 0 1
1 0 0
One way to do this would be
m = size(y, 1);
num_labels = 3;   % n, the number of options
Y = zeros(m, num_labels);
for i = 1:m
    Y(i, y(i)) = 1;
end
Is there a better way to do this, maybe in a single expression?
Basically, what I need is to generate a matrix with boolean predicate (i, j) => j == y(i).

You can try this if a is a column vector:
a = [1; 3; 2; 3; 1];
bsxfun(@eq, a, 1:max(a))
and this if it is a row vector:
a = [1; 3; 2; 3; 1]';
bsxfun(@eq, a', 1:max(a))
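A minimal sketch, assuming the labels are positive integers as in the question: forcing y into a column with y(:) lets a single bsxfun line handle both orientations. bsxfun(@eq, ...) returns a logical matrix, so a double() copy is added in case a numeric 0/1 matrix is wanted.

```matlab
% Sketch: one bsxfun call that accepts y as either a row or a column vector.
y = [1; 3; 2; 3; 1];             % example labels from the question
yc = y(:);                       % force a column so both orientations give an m-by-n result
Y = bsxfun(@eq, yc, 1:max(yc));  % logical matrix: Y(i,j) is true when j == y(i)
Yd = double(Y);                  % numeric 0/1 copy, if a double matrix is needed
disp(Yd)
```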

If you have access to Statistics Toolbox, the command dummyvar does exactly what you need.
>> y = [1; 3; 2; 3; 1];
>> dummyvar(y)
ans =
1 0 0
0 0 1
0 1 0
0 0 1
1 0 0

You can use sub2ind after initializing the matrix as follows:
y = [1; 3; 2; 3; 1];
m = length(y);
n = max(y);
Y = zeros(m, n);
Y(sub2ind(size(Y), 1:m, y')) = 1
Y =
1 0 0
0 0 1
0 1 0
0 0 1
1 0 0
The trick is that element i of y always belongs to row i, so the row subscripts are simply 1:m.
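Taking n = max(y) drops any trailing class that happens not to occur in y. A small sketch, assuming the number of classes is known up front (n = 4 below is a made-up value for illustration), passes that size to sub2ind directly:

```matlab
y = [1; 3; 2; 3; 1];
n = 4;                                   % assumed number of classes; may exceed max(y)
m = numel(y);
Y = zeros(m, n);
Y(sub2ind([m, n], (1:m).', y(:))) = 1;   % row i gets a 1 in column y(i)
disp(Y)
```

Column 4 stays all zeros because label 4 never appears in this y.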

accumarray([(1:length(y)).' y], 1)
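The one-liner above builds one (row, column) subscript pair per element of y and lets accumarray add 1 at each pair; since every row index occurs exactly once, each entry ends up 0 or 1. A slightly expanded sketch that also passes an explicit output size (assumed here to be m-by-3):

```matlab
y = [1; 3; 2; 3; 1];
n = 3;                              % assumed number of classes
m = numel(y);
subs = [(1:m).', y(:)];             % one (row, column) subscript pair per element of y
Y = accumarray(subs, 1, [m, n]);    % adds 1 at each pair; unused positions stay 0
disp(Y)
```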

As suggested by Dmitri Bouianov on the Coursera discussion forum, this also works:
Y = eye(num_labels)(y, :);
This solution uses the elements of y as indices to select rows from an identity matrix.
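Indexing directly into the result of a function call, as in eye(num_labels)(y, :), is accepted by Octave but not by MATLAB. A sketch of the same idea that stores the identity matrix in a variable first (num_labels = 3 matches the example):

```matlab
y = [1; 3; 2; 3; 1];
num_labels = 3;        % number of classes, as in the question
I = eye(num_labels);   % row k of I is the indicator row for label k
Y = I(y, :);           % select one row of I per element of y
disp(Y)
```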

In Octave (at least as of 3.6.3, not sure when it was introduced), you can use broadcasting to do this extremely easily. It works like this:
Y = y==1:3
(If y is a row vector, transpose it first. If you want Y transposed instead, use y==(1:3)' with the row vector y directly.)
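MATLAB R2016b and later apply the same implicit expansion to relational operators, so the broadcasting line works there too. A sketch with the question's y; the colon operator binds more tightly than ==, so y == 1:3 compares the column y against the row 1:3, and the logical result is converted to double in case a numeric matrix is needed:

```matlab
y = [1; 3; 2; 3; 1];
Y = y == 1:3;      % 5-by-1 compared with 1-by-3 expands to a 5-by-3 logical matrix
Yd = double(Y);    % numeric version, if needed
disp(Yd)
```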

Related

Fastest way of generating a logical matrix by given row indices of true values?

What is the most efficient way of generating
>> A
A =
0 1 1
1 1 0
1 0 1
0 0 0
with
>> B = [2 3; 1 2; 1 3]
B =
2 3
1 2
1 3
in MATLAB?
E.g., B(1, :), which is [2 3], means that A(2, 1) and A(3, 1) are true.
My attempt still requires one for loop, iterating through the rows of B. Is there a loop-free or more efficient way of doing this?
This is one way of many, though sub2ind is the dedicated function for that:
%// given row indices
B = [2 3; 1 2; 1 3]
%// size of row index matrix
[n,m] = size(B)
%// size of output matrix
[N,M] = deal( max(B(:)), n)
%// preallocation of output matrix
A = zeros(N,M)
%// get col indices to given row indices
cols = bsxfun(@times, ones(n,m),(1:n).')
%// set values
A( sub2ind([N,M],B,cols) ) = 1
A =
0 1 1
1 1 0
1 0 1
If you want a logical matrix, change the following two lines:
A = false(N,M)
A( sub2ind([N,M],B,cols) ) = true
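The cols matrix built with bsxfun(@times, ...) just repeats the row number of B across each row, so repmat produces the same thing. A sketch of the logical version written that way, using the example B from the question:

```matlab
B = [2 3; 1 2; 1 3];               % B(j,:) lists the true rows of column j
[n, m] = size(B);
N = max(B(:));                     % number of rows in A
cols = repmat((1:n).', 1, m);      % same matrix as bsxfun(@times, ones(n,m), (1:n).')
A = false(N, n);
A(sub2ind([N, n], B, cols)) = true;
disp(A)
```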
Alternative solution
%// given row indices
B = [2 3; 1 2; 1 3];
%// number of rows
r = 4; %// e.g. = max(B(:))
%// number of cols
c = 3; %// size(B,1)
%// preallocation of output matrix
A = zeros(r,c);
%// set values
A( bsxfun(@plus, B.', 0:r:(r*(c-1))) ) = 1;
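Why the offsets work: in column-major order, element (i, j) of an r-by-c matrix has linear index (j-1)*r + i, so adding (j-1)*r to the row indices in B(j,:) addresses column j. This sketch keeps the intermediate index matrix idx so it can be inspected; for the example B it is [2 5 9; 3 6 11].

```matlab
B = [2 3; 1 2; 1 3];
r = 4;                                       % rows of A (one more than max(B(:)) here)
c = size(B, 1);                              % columns of A
idx = bsxfun(@plus, B.', 0:r:(r*(c-1)));     % idx(k,j) = (j-1)*r + B(j,k)
A = zeros(r, c);
A(idx) = 1;
disp(idx)
disp(A)
```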
Here's a way, using the sparse function:
A = full(sparse(B, cumsum(ones(size(B))), 1)); % row indices from B, column indices from the row numbers of B
This gives
A =
0 1 1
1 1 0
1 0 1
If you need a predefined number of rows in the output, say r (in your example r = 4):
A = full(sparse(B, cumsum(ones(size(B))), 1, 4, size(B,1)));
which gives
A =
0 1 1
1 1 0
1 0 1
0 0 0
You can equivalently use the accumarray function:
A = accumarray([B(:), repmat((1:size(B,1)).',size(B,2),1)], 1);
gives
A =
0 1 1
1 1 0
1 0 1
Or with a predefined number of rows, r = 4,
A = accumarray([B(:), repmat((1:size(B,1)).',size(B,2),1)], 1, [r size(B,1)]);
gives
A =
0 1 1
1 1 0
1 0 1
0 0 0
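The example B happens to produce a symmetric A, which hides whether the row and column subscripts are in the right order. A sketch with a made-up, non-symmetric B (an assumption for testing only) takes the row subscripts from B and the column subscripts from the row numbers of B, and checks that sparse and accumarray agree:

```matlab
B = [4 1; 2 3; 1 2];                      % column j of A should be true in rows B(j,:)
r = 4;                                    % rows of A
c = size(B, 1);                           % columns of A
cols = repmat((1:c).', 1, size(B, 2));    % row number of B = column of A
A1 = full(sparse(B(:), cols(:), 1, r, c));    % row subscripts from B, columns from cols
A2 = accumarray([B(:), cols(:)], 1, [r c]);   % same subscript pairs, fed to accumarray
disp(A1)
```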

Matlab - How to create logical array matrix without looping [duplicate]

This question already has answers here:
Creating Indicator Matrix
I want to do something like the following:
Y = [1; 2; 3];
X = repmat(1:10, 3, 1);
for i=1:3
X(i,:) = X(i,:) == Y(i);
end
So I end up with
X =
1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0
Is there a way to do this without looping?
If you start from the vector 1:10, use bsxfun:
Y = [1; 2; 3];
X = bsxfun(@eq, Y, 1:10);
Otherwise with repmat:
Y = [1; 2; 3];
X = repmat(1:10, 3, 1);
X = repmat(Y, 1, size(X,2)) == X;
(Or use ones, as suggested by leo.)
One solution would be
Y = [1; 2; 3];
X = repmat(1:10, 3, 1);
Z = X == (Y * ones(1, 10)) ;
But I am not sure this is faster. It does not use a loop, though :)
EDIT: you could use repmat instead of * ones(1, 10).

Performance of vectorizing code to create a sparse matrix with a single 1 per row from a vector of indexes

I have a large column vector y containing integer values from 1 to 10. I wanted to convert it to a matrix where each row is full of 0s except for a 1 at the index given by the value at the respective row of y.
This example should make it clearer:
y = [3; 4; 1; 10; 9; 9; 4; 2; ...]
% gets converted to:
Y = [
0 0 1 0 0 0 0 0 0 0;
0 0 0 1 0 0 0 0 0 0;
1 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 1;
0 0 0 0 0 0 0 0 1 0;
0 0 0 0 0 0 0 0 1 0;
0 0 0 1 0 0 0 0 0 0;
0 1 0 0 0 0 0 0 0 0;
...
]
I have written the following code for this (it works):
m = length(y);
Y = zeros(m, 10);
for i = 1:m
Y(i, y(i)) = 1;
end
I know there are ways I could remove the for loop in this code (vectorizing). This post contains a few, including something like:
Y = full(sparse(1:length(y), y, ones(length(y),1)));
But I had to convert y to doubles to be able to use this, and the result is actually about 3x slower than my "for" approach, with y of length 10,000,000.
Is it likely that doing this kind of vectorization will lead to better performance for a very large y? I've read many times that vectorizing calculations leads to better performance (not only in MATLAB), but this kind of solution seems to result in more calculations.
Is there a way to actually improve performance over the for approach in this example? Maybe the problem here is simply that acting on doubles instead of ints isn't the best thing for comparison, but I couldn't find a way to use sparse otherwise.
Here is a test to compare:
function [t,v] = testIndicatorMatrix()
y = randi([1 10], [1e6 1], 'double');
funcs = {
@() func1(y);
@() func2(y);
@() func3(y);
@() func4(y);
};
t = cellfun(@timeit, funcs, 'Uniform',true);
v = cellfun(@feval, funcs, 'Uniform',false);
assert(isequal(v{:}))
end
function Y = func1(y)
m = numel(y);
Y = zeros(m, 10);
for i = 1:m
Y(i, y(i)) = 1;
end
end
function Y = func2(y)
m = numel(y);
Y = full(sparse(1:m, y, 1, m, 10, m));
end
function Y = func3(y)
m = numel(y);
Y = zeros(m,10);
Y(sub2ind([m,10], (1:m).', y)) = 1;
end
function Y = func4(y)
m = numel(y);
Y = zeros(m,10);
Y((y-1).*m + (1:m).') = 1;
end
I get:
>> testIndicatorMatrix
ans =
0.0388
0.1712
0.0490
0.0430
Such a simple for-loop can be JIT-compiled at runtime and runs very fast (here even slightly faster than the vectorized code)!
It seems you want the full numeric matrix Y as the output, so you can try this approach:
m = numel(y);
Y1(m,10) = 0; %// Faster way to pre-allocate zeros than calling the function `zeros`
%// Source - http://undocumentedmatlab.com/blog/preallocation-performance
linear_idx = (y-1)*m+(1:m)'; %// y is a column vector, so it can be used directly instead of y(:)
Y1(linear_idx)=1; %// Y1 is the desired output
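A reduced sketch of the same approach on a short made-up y. One general caveat: Y1(m,10) = 0 only yields an all-zero matrix when Y1 does not already exist, because assigning to one element of an existing array keeps its other values, so the sketch clears it first.

```matlab
y = [3; 4; 1; 10; 9];              % short example; the question uses a much longer y
m = numel(y);
clear Y1                           % Y1(m,10) = 0 only zero-fills when Y1 does not exist yet
Y1(m, 10) = 0;                     % allocates an m-by-10 double matrix of zeros
Y1((y - 1) * m + (1:m).') = 1;     % linear index of element (i, y(i)) in column-major order
disp(Y1)
```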
Benchmarking
Using Amro's benchmark and increasing the data size a bit:
y = randi([1 10], [1.5e6 1], 'double');
Then, with the faster pre-allocation scheme mentioned earlier (Y(m,10)=0; instead of Y = zeros(m,10);), I got these results on my system:
>> testIndicatorMatrix
ans =
0.1798
0.4651
0.1693
0.1457
That is, the vectorized approach mentioned here (the last one in the benchmark suite) gives more than a 15% performance improvement over your for-loop code (the first one in the benchmark suite). So, if you are working with large data sizes and need full versions of sparse matrices, this approach makes sense (in my personal opinion).
Does something like this not work for you?
tic;
N = 1e6;
y = randperm( N );
Y = spalloc( N, N, N ); % not needed: Y is overwritten by the sparse call below
inds = sub2ind( size(Y), y(:), (1:N)' ); % unused in this snippet
Y = sparse( 1:N, y, 1, N, N, N );
toc
The above outputs
Elapsed time is 0.144683 seconds.

how can I vectorize setting the index values to one in Matlab?

I have the following loop that does what I need:
> whos Y
  Name      Size       Bytes  Class     Attributes
  Y         10x5000   400000  double
> whos y
  Name      Size       Bytes  Class     Attributes
  y         5000x1     40000  double
K = 10; m = numel(y);   % sizes from the whos output above
Y = zeros(K,m);
for i=1:m
Y(y(i),i)=1;
end
I would like to vectorize it, but attempts like the following did not work:
Y = zeros(K,m);
Y(y,:)=1;
The idea is to take a vector such as
y = [9, 8, 7, .. etc]
and convert it to:
Y = [[0 0 0 0 0 0 0 0 1 0]' [0 0 0 0 0 0 0 1 0 0]' [0 0 0 0 0 0 1 0 0 0]' ... etc]
I need this for a multi-class ANN implementation.
Have you considered using sparse matrix?
n=numel(y);
Y = sparse( y, 1:n, 1, K, n ); % K rows (classes), n columns (samples)
If you really must have the full matrix, you can call
Y = full(Y);
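A sketch that passes the number of classes K as the row count, assuming K = 10 as in the question's whos output and a short made-up label vector; the result is K-by-n with exactly one nonzero per column:

```matlab
y = [9; 8; 7; 3];                     % example labels in 1..K
K = 10;                               % number of classes (rows), as in the question
n = numel(y);                         % number of samples (columns)
Y = sparse(y(:), (1:n).', 1, K, n);   % one nonzero per column: Y(y(i), i) = 1
Yfull = full(Y);
disp(Yfull)
```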
Here's one solution you could use. It's a starting point from which you could optimise:
k = 10;
n = 20;
y = randi(k, 1, n);
columns = 1:n;
offsets = k*(columns-1);
indices = offsets + y;
Y = zeros(k, n);
Y(indices) = 1
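On MATLAB R2016b and later (or Octave, which broadcasts), the same k-by-n matrix can also be written as one comparison of the column vector (1:k).' against the row vector y. This sketch checks that it matches the linear-index version from the answer above:

```matlab
k = 10;  n = 20;
y = randi(k, 1, n);              % random labels, as in the answer above
Yref = zeros(k, n);
Yref(k*((1:n) - 1) + y) = 1;     % linear-index version: offset (i-1)*k plus the row y(i)
Y = double((1:k).' == y);        % k-by-1 vs 1-by-n comparison expands to k-by-n
disp(Y)
```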

How can I generate the following matrix in MATLAB?

I want to generate a matrix that is "stairsteppy" from a vector.
Example input vector: [8 12 17]
Example output matrix:
[1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1]
Is there an easier (or built-in) way to do this than the following?
function M = stairstep(v)
M = zeros(length(v),max(v));
v2 = [0 v];
for i = 1:length(v)
M(i,(v2(i)+1):v2(i+1)) = 1;
end
You can do this via indexing.
A = eye(3);
B = A(:,[zeros(1,8)+1, zeros(1,4)+2, zeros(1,5)+3])
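The column list above is hard-coded for [8 12 17]. A sketch that derives it from v, assuming MATLAB R2015a or later for repelem, repeats the k-th column of eye(N) once for each column in the k-th step:

```matlab
v = [8 12 17];
N = numel(v);
A = eye(N);
widths = diff([0 v]);              % run lengths, [8 4 5] for this v
M = A(:, repelem(1:N, widths));    % k-th column of eye(N) repeated widths(k) times
disp(M)
```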
Here's a solution without explicit loops:
function M = stairstep(v)
L = length(v); % M will be
V = max(v); % an L x V matrix
M = zeros(L, V);
% create indices to set to one
idx = zeros(1, V);
idx(v + 1) = 1;
idx = cumsum(idx) + 1;
idx = sub2ind(size(M), idx(1:V), 1:V);
% update the output matrix
M(idx) = 1;
EDIT: fixed bug :p
There's no built-in function I know of to do this, but here's one vectorized solution:
v = [8 12 17];
N = numel(v);
M = zeros(N,max(v));
M([0 v(1:N-1)]*N+(1:N)) = 1;
M(v(1:N-1)*N+(1:N-1)) = -1;
M = cumsum(M,2);
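In the spirit of the original predicate question, row k of M is 1 exactly where v(k-1) < c <= v(k), with v(0) taken as 0. A sketch that evaluates this predicate with bsxfun instead of the +1/-1 markers and cumsum:

```matlab
v = [8 12 17];
c = 1:max(v);                  % column indices 1..17
lo = [0, v(1:end-1)].';        % row k starts after column lo(k)
hi = v(:);                     % and ends at column hi(k)
M = double(bsxfun(@gt, c, lo) & bsxfun(@le, c, hi));
disp(M)
```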
EDIT: I like the idea that Jonas had to use BLKDIAG. I couldn't help playing with the idea a bit until I shortened it further (using MAT2CELL instead of ARRAYFUN):
C = mat2cell(ones(1,max(v)),1,diff([0 v]));
M = blkdiag(C{:});
A very short version of a vectorized solution:
function out = stairstep(v)
% create lists of ones
oneCell = arrayfun(@(x)ones(1,x),diff([0,v]),'UniformOutput',false);
% create output
out = blkdiag(oneCell{:});
You can use ones to define the places where you have 1's:
http://www.mathworks.com/help/techdoc/ref/ones.html