Execute formulas ignoring zero elements - matlab

I want to run fast Matlab operations over matrices while ignoring zero elements.
So far I have just used a very slow double for-loop, e.g.
for i = 1 : size(x,1)
for j = 1 : size(x,2)
if x(i,j) ~= 0
... do something with x(i,j)
end
end
end
But how can I apply the operation to the whole matrix x at once?
E.g. how can I run
x(i,j) = log(x(i,j)) if x>0 else 0 <-- pseudo code
in Matlab on the whole matrix without for loops?
Finally I want to rewrite lines like
result = sum(sum((V.*log(V./(W*H))) - V + W*H));
with ignoring zeros.
I just need to understand the concept.
In case of need I could also use NaN instead of zero, but I didn't find e.g. the function
nanlog()

x~=0 returns a logical mask that is true wherever x is nonzero. You can use that mask to index the corresponding elements of x, as follows:
>> x = [1 0 2 3; 0 4 0 5]
x =
1 0 2 3
0 4 0 5
>> mean(x(:)) %#mean of all elements
ans =
1.8750
>> mean(x(x~=0)) %#mean of nonzero elements
ans =
3
>> x(x~=0) = x(x~=0) + 1
x =
2 0 3 4
0 5 0 6

You can use NaN as a temporary and make use of the fact that log(NaN) = NaN, like so:
x(x==0) = NaN;
y = log(x);
y(isnan(y)) = 0;
alternatively, you can use logical indexing:
x(x~=0) = log(x(x~=0));
or, if you want to preserve x,
y = x;
y(y~=0) = log(y(y~=0));
For the example you provide, you can just do
result = nansum(nansum((V.*log(V./(W*H))) - V + W*H));
assuming that V == 0 is the problem.
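One caveat, sketched here under assumptions: nansum comes from a toolbox rather than base MATLAB, and it drops the whole NaN term, including the + W*H part, wherever V is zero. If the intent is to treat 0*log(0) as 0 while keeping the other terms, a logical mask does that in base MATLAB. The values of V, W and H below are made up for illustration, and which of the two behaviors is wanted depends on the application.

```matlab
% Hypothetical small inputs, chosen only for illustration
V  = [1 0; 2 3];
W  = [1; 2];
H  = [1 1];
WH = W*H;

% Evaluate V.*log(V./WH) only where V is nonzero; zero entries contribute 0
T     = zeros(size(V));
nz    = V ~= 0;
T(nz) = V(nz) .* log(V(nz) ./ WH(nz));

% The remaining terms are summed over all elements, zeros included
result = sum(T(:) - V(:) + WH(:));
```

With these inputs the -V and +W*H terms cancel, so result is just the contribution of the log term.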

How can I randomize two binary vectors that are similar while making sure all possible combinations are respected?

Been trying to solve this simple problem for a while but just couldn't find the solution for the life of me...
I'm programming an experiment in PsychToolbox but I'll spare you the details, basically I have two vectors A and B of equal size with the same number of ones and zeroes:
A = [0 0 1 1]
B = [0 0 1 1]
Both vectors A and B must be randomized independently but in such a way that one combination of items between the two vectors is never repeated. That is, I must end up with this
A = [1 1 0 0]
B = [1 0 0 1]
or this:
A = [0 0 1 1]
B = [0 1 0 1]
but I should never end up with this:
A = [1 1 0 0]
B = [1 1 0 0]
or this
A = [0 1 0 1]
B = [0 1 0 1]
One way to determine this is to check the sum of items between the two vectors A+B, which should always contain only one 2 or only one 0:
A = [1 1 0 0]
B = [1 0 0 1]
A+B = 2 1 0 1
Been trying to make this a condition within a 'while' loop (e.g. so long as the number of zeroes in the vector obtained by A+B is greater than 1, keep randomizing A and B), but either it still produces repeated combinations or it just never stops looping. I know this is a trivial problem but I just can't get my head around it somehow. Anyone care to help?
This is a simplified version of the script I got:
A = [1 1 0 0];
B = A;
ARand = randperm(length(A));
A = A(ARand);
BRand = randperm(length(B));
B = B(BRand);
while nnz(~(A+B)) > 1
ARand = randperm(length(A));
A = A(ARand);
BRand = randperm(length(B));
B = B(BRand);
end
Still, I end up with repeated combinations.
% If you are only looking for an answer to this scenario the easiest way is
% as follows:
A = [0 0 1 1];
B = [0 0 1 1];
nn = length(A);
keepset = [0 0 1 1;0 1 0 1];
keepset = keepset(:,randperm(nn))
% If you want a more general solution for arbitrary A & B (for example)
A = [0 0 0 1 1 1 2 2 2];
B = [0 0 0 1 1 1 2 2 2];
nn = length(A);
Ai = A(randperm(nn));
Bi = B(randperm(nn));
% initialize keepset with the first combination of A & B
keepset = [Ai(1);Bi(1)];
loopcnt = 0;
while (size(keepset,2) < nn)
% randomize the elements in A and B independently
Ai = A(randperm(nn));
Bi = B(randperm(nn));
% test each combination of Ai and Bi to see if it is already in the
% keepset
for ii = 1:nn
tstcombo = [Ai(ii);Bi(ii)];
matchtest = bsxfun(@eq,tstcombo,keepset);
matchind = find((matchtest(1,:) & matchtest(2,:)));
if isempty(matchind)
keepset = [keepset tstcombo];
end
end
loopcnt = loopcnt + 1;
if loopcnt > 1000
disp('Execution halted after 1000 attempts')
break
elseif (size(keepset,2) >= nn)
disp(sprintf('Completed in %0.f iterations',loopcnt))
end
end
keepset
It's much more efficient to permute the combinations randomly than shuffling the arrays independently and handling the inevitable matching A/B elements.
There are lots of ways to generate all possible pairs, see
How to generate all pairs from two vectors in MATLAB using vectorised code?
For this example I'll use
allCombs = combvec([0,1],[0,1]);
% = [ 0 1 0 1
% 0 0 1 1 ]
Now you just want to select some amount of unique (non-repeating) columns from this array in a random order. In all of your examples you select all 4 columns. The randperm function is perfect for this, from the docs:
p = randperm(n,k) returns a row vector containing k unique integers selected randomly from 1 to n inclusive.
n = size(allCombs,2); % number of combinations (or columns) to choose from
k = 4; % number of columns to choose for output
AB = allCombs( :, randperm(n,k) ); % random selection of pairs
If you need this split into two variables then you have
A = AB(1,:);
B = AB(2,:);
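As far as I know, combvec ships with a toolbox rather than base MATLAB. If it is not available, the same table of pairs can be built with ndgrid; the following is a minimal sketch of that substitution, not the code from the answer above.

```matlab
% Build every (A,B) pair from the values 0 and 1 using base functions only
[a, b]   = ndgrid([0 1], [0 1]);
allCombs = [a(:).'; b(:).'];          % 2-by-4, one pair per column

% Shuffle the columns; every pair still appears exactly once
AB = allCombs(:, randperm(size(allCombs, 2)));
A  = AB(1, :);
B  = AB(2, :);
```

Because whole columns are permuted, A and B each keep two ones and two zeros and no pair can repeat.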
Here's a possible solution:
A = [0 0 1 1];
B = [0 0 1 1];
% Randomize A and B independently
ARand = randperm(length(A));
A = A(ARand);
BRand = randperm(length(B));
B = B(BRand);
% Keep randomizing A and B until each pairing occurs exactly once
while nnz(A+B == 2) ~= 1 || nnz(A+B == 0) ~= 1
ARand = randperm(length(A));
A = A(ARand);
BRand = randperm(length(B));
B = B(BRand);
end
This solution checks that A+B contains exactly one 2 (a single 1/1 pair) and exactly one 0 (a single 0/0 pair). Note that sum(A+B) cannot be used for this test, because it always equals sum(A)+sum(B) however the vectors are shuffled. Until the condition is met, the vectors A and B are randomized again.

Matlab/Octave: how to write n-dimensional zero padding algorithm without eval

I would like to write a "syntactical sugar" Octave or Matlab zero-padding function, to which the user sends an n-dimensional object and a vector of <= n entries. The vector contains new, equal or larger dimensions for the object, and the object is zero-padded to match these dimensions. Any dimensions not specified are left alone. One expected use is, given for example a 5d block X of 3d medical image volumes, I can call
y = simplepad(X, [128 128 128]);
and thus pad the first three dimensions to a power of two for wavelet analysis (in fact I use a separate function nextpwr2 to find these dimensions) while leaving the others.
I have racked my brains on how to write this method avoiding the dreaded eval, but cannot thus far find a way. Can anyone suggest a solution? Here is more or less what I have:
function y = simplepad(x, pad)
szx = size(x);
n_pad = numel(pad);
szy = [pad szx(n_pad+1:end)];
y = zeros(szy);
indices_string = '(';
for n = 1:numel(szx)
indices_string = [indices_string, '1:', num2str(szx(n))];
if n < numel(szx)
indices_string = [indices_string, ','];
else
indices_string = [indices_string, ')'];
end
end
command = ['y',indices_string,'=x;'];
eval(command);
end
Here's a solution that should handle all the little corner cases:
function A = simplepad(A, pad)
% Add singleton dimensions (i.e. ones) to the ends of the old size of A
% or pad as needed so they can be compared directly to one another:
oldSize = size(A);
dimChange = numel(pad)-numel(oldSize);
oldSize = [oldSize ones(1, dimChange)];
pad = [pad ones(1, -dimChange)];
% If all of the sizes in pad are less than or equal to the sizes in
% oldSize, there is no padding done:
if all(pad <= oldSize)
return
end
% Use implicit zero expansion to pad:
pad = num2cell(pad);
A(pad{:}) = 0;
end
And a few test cases:
>> M = magic(3)
M =
8 1 6
3 5 7
4 9 2
>> simplepad(M, [1 1]) % No change, since all the values are smaller
ans =
8 1 6
3 5 7
4 9 2
>> simplepad(M, [1 4]) % Ignore the 1, pad the rows
ans =
8 1 6 0
3 5 7 0
4 9 2 0
>> simplepad(M, [4 4]) % Pad rows and columns
ans =
8 1 6 0
3 5 7 0
4 9 2 0
0 0 0 0
>> simplepad(M, [4 4 2]) % Pad rows and columns and add a third dimension
ans(:,:,1) =
8 1 6 0
3 5 7 0
4 9 2 0
0 0 0 0
ans(:,:,2) =
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
As I understand it, you just want to pass a variable number of index arguments to an indexing expression.
You can do this by converting the arguments to a cell array and expanding its contents in the call. So, your function will look like:
function y = simplepad(x, pad)
szx = size(x);
n_pad = numel(pad);
szy = [pad szx(n_pad+1:end)];
y = x;
szyc = num2cell(szy);
y(szyc{:}) = 0; % warning: assumes at least one dimension grows; otherwise an existing element of y is overwritten with 0
end
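The num2cell version writes a zero at the target corner unconditionally, so it changes an existing element whenever no dimension actually grows (for example when pad equals the current size). A guarded variant might look like the sketch below; the name simplepad_guarded and the choice to ignore requested sizes smaller than the current ones are assumptions made for illustration.

```matlab
function y = simplepad_guarded(x, pad)
    % Current size, extended with trailing singleton dimensions if pad is longer
    szx = size(x);
    nd  = max(numel(szx), numel(pad));
    szx(end+1:nd) = 1;

    % Target size: never smaller than the current size in any dimension
    target = szx;
    target(1:numel(pad)) = max(pad, szx(1:numel(pad)));

    y = x;
    if any(target > szx)
        % Assigning to the far corner grows y and fills the new cells with zeros
        idx = num2cell(target);
        y(idx{:}) = 0;
    end
end
```

For instance, simplepad_guarded(magic(3), [3 3]) returns the input unchanged, while simplepad_guarded(magic(3), [4 4 2]) returns a 4-by-4-by-2 array.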

Performance of vectorizing code to create a sparse matrix with a single 1 per row from a vector of indexes

I have a large column vector y containing integer values from 1 to 10. I wanted to convert it to a matrix where each row is full of 0s except for a 1 at the index given by the value at the respective row of y.
This example should make it clearer:
y = [3; 4; 1; 10; 9; 9; 4; 2; ...]
% gets converted to:
Y = [
0 0 1 0 0 0 0 0 0 0;
0 0 0 1 0 0 0 0 0 0;
1 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 1;
0 0 0 0 0 0 0 0 1 0;
0 0 0 0 0 0 0 0 1 0;
0 0 0 1 0 0 0 0 0 0;
0 1 0 0 0 0 0 0 0 0;
...
]
I have written the following code for this (it works):
m = length(y);
Y = zeros(m, 10);
for i = 1:m
Y(i, y(i)) = 1;
end
I know there are ways I could remove the for loop in this code (vectorizing). This post contains a few, including something like:
Y = full(sparse(1:length(y), y, ones(length(y),1)));
But I had to convert y to doubles to be able to use this, and the result is actually about 3x slower than my "for" approach, using 10.000.000 as the length of y.
Is it likely that doing this kind of vectorization will lead to better performance for a very large y? I've read many times that vectorizing calculations leads to better performance (not only in MATLAB), but this kind of solution seems to result in more calculations.
Is there a way to actually improve performance over the for approach in this example? Maybe the problem here is simply that acting on doubles instead of ints isn't the best thing for comparison, but I couldn't find a way to use sparse otherwise.
Here is a test to compare:
function [t,v] = testIndicatorMatrix()
y = randi([1 10], [1e6 1], 'double');
funcs = {
@() func1(y);
@() func2(y);
@() func3(y);
@() func4(y);
};
t = cellfun(@timeit, funcs, 'Uniform',true);
v = cellfun(@feval, funcs, 'Uniform',false);
assert(isequal(v{:}))
end
function Y = func1(y)
m = numel(y);
Y = zeros(m, 10);
for i = 1:m
Y(i, y(i)) = 1;
end
end
function Y = func2(y)
m = numel(y);
Y = full(sparse(1:m, y, 1, m, 10, m));
end
function Y = func3(y)
m = numel(y);
Y = zeros(m,10);
Y(sub2ind([m,10], (1:m).', y)) = 1;
end
function Y = func4(y)
m = numel(y);
Y = zeros(m,10);
Y((y-1).*m + (1:m).') = 1;
end
I get:
>> testIndicatorMatrix
ans =
0.0388
0.1712
0.0490
0.0430
Such a simple for-loop can be dynamically JIT-compiled at runtime, and would run really fast (even slightly faster than vectorized code)!
It seems you are looking for that full numeric matrix Y as the output. So, you can try this approach -
m = numel(y);
Y1(m,10) = 0; % Faster way to pre-allocate zeros than calling zeros
% Source - http://undocumentedmatlab.com/blog/preallocation-performance
linear_idx = (y-1)*m+(1:m)'; % y is a column vector, so it can be used directly instead of y(:)
Y1(linear_idx)=1; % Y1 is the desired output
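To see why the linear index works, recall that in column-major storage element (i, j) of an m-row matrix sits at position (j-1)*m + i. The small sketch below uses a three-class example (K = 3 is an assumption, the question uses 10) and cross-checks the result against row selection from an identity matrix, which is another common way to build the same matrix.

```matlab
y = [3; 1; 2];                        % class index for each row
m = numel(y);
K = 3;                                % number of classes (10 in the question)

% Column y(i), row i  ->  linear position (y(i)-1)*m + i
linear_idx = (y - 1) * m + (1:m).';
Y = zeros(m, K);
Y(linear_idx) = 1;

% Cross-check: pick row y(i) of the identity matrix for each i
I  = eye(K);
Y2 = I(y, :);
```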
Benchmarking
Using Amro's benchmark post and increasing the datasize a bit -
y = randi([1 10], [1.5e6 1], 'double');
And finally doing the faster pre-allocation scheme mentioned earlier of using Y(m,10)=0; instead of Y = zeros(m,10);, I got these results on my system -
>> testIndicatorMatrix
ans =
0.1798
0.4651
0.1693
0.1457
That is, the vectorized approach mentioned here (the last one in the benchmark suite) gives more than a 15% performance improvement over your for-loop code (the first one in the benchmark suite). So, if you are using large datasizes and intend to get full versions of sparse matrices, this approach would make sense (in my personal opinion).
Does something like this not work for you?
tic;
N = 1e6;
y = randperm( N );
Y = spalloc( N, N, N );
inds = sub2ind( size(Y), y(:), (1:N)' );
Y = sparse( 1:N, y, 1, N, N, N );
toc
The above outputs
Elapsed time is 0.144683 seconds.

Converting C style code into Matlab

I've the following code in C:
for(i=0;i<m;i++)
{
for(j=0;j<n;j++)
{
a[b[i]][c[j]]+=1;
}
}
Is there a way to write this in Matlab without using for loops? I mean the Matlab way using (:) which is faster.
Something like a(b(:),c(:))=a(b(:),c(:))+1 gives me out of memory error.
Interesting. The solution is at the bottom, but first I have a few notes and pointers:
1. The out of memory error is because you're creating a 512*256 by 512*256 element temporary matrix on the right hand side (a(b(:),c(:))+1). That is 2^34 elements: about 17 GB even at one byte per element, and eight times that for doubles. So that's why you're getting an out of memory error. Note, too, that this array isn't even what you want! Look at this example:
>> a = magic(5);
>> b = [1 5 4]; % The rows that contain the numbers 1,2,3 respectively
>> c = [3 4 5]; % The columns that contain ^ ...
Now, a(1,3) == 1, a(5,4) == 2, etc. But when you say a(b,c), you're selecting rows (1,5,4) and columns (3,4,5) for every one of those rows!
>> a(b,c)
ans =
1 8 15
25 2 9
19 21 3
All you care about is the diagonal. The solution is to use sub2ind to convert your subscript pairs to a linear index.
>> a(sub2ind(size(a),b,c))
ans =
1 2 3
2. Your proposed solution doesn't do what you want it to, either. Since Matlab lacks an increment operator, the assignment increments each element addressed by (b,c) by one exactly once, no matter how many times its index is repeated. It'll take some creative thinking to vectorize this. Use a smaller array to see what's going on:
>> a = zeros(4,4);
>> b = ones(8,4);
>> c = ones(8,4);
>> a(b,c) = a(b,c) + 1;
>> a
a =
1 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
Edit Here we go! Vectorized incrementation:
>> idxs = sub2ind(size(a),b(:),c(:)); % convert subscripts to linear indices
>> [unique_idxs,~,ic] = unique(idxs); % Get the unique indices and their locations
>> increment_counts = histc(ic,1:max(ic)); % Get the number of occurrences of each idx
>> a(unique_idxs) = a(unique_idxs) + increment_counts;
Assuming you have the following matrices:
a = zeros(256); % or initialized with other values
b = randi(256, [512 256]);
c = randi(256, [512 256]);
Here is an even faster vectorized solution:
a = a + sparse(b,c,1,size(a,1),size(a,2));
Here is another one:
a = a + accumarray([b(:) c(:)], 1, size(a));
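The difference between plain indexed assignment and accumulation shows up as soon as a subscript pair repeats. The following sketch uses small made-up vectors to compare the two.

```matlab
a = zeros(3);
b = [1 1 2];                          % the pair (1,2) occurs twice
c = [2 2 3];

% Plain indexed assignment: a repeated index is incremented only once
a1  = a;
idx = sub2ind(size(a), b, c);
a1(idx) = a1(idx) + 1;

% accumarray sums the contributions of repeated subscript pairs
a2 = a + accumarray([b(:) c(:)], 1, size(a));
```

Here a1(1,2) ends up as 1 while a2(1,2) ends up as 2, which is the count a loop over the pairs would produce.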
Answer: Yes.
a(b, c) = a(b, c) + 1;
Example:
>> a = zeros(5);
>> b = [1,3];
>> c = [2,4,5];
>> a(b,c) = a(b,c) + 1;
>> a
a =
0 1 0 1 1
0 0 0 0 0
0 1 0 1 1
0 0 0 0 0
0 0 0 0 0
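Note that a(b,c) = a(b,c) + 1 reproduces the C double loop only when neither b nor c contains repeated values. When repeats are possible, element (p,q) is hit (number of times p occurs in b) times (number of times q occurs in c), so the update is an outer product of two count vectors. A minimal sketch, assuming b and c hold valid 1-based indices into a:

```matlab
a = zeros(3);
b = [1 1 3];                          % row indices, 1 repeated
c = [2 3];                            % column indices

% Reference: direct translation of the C double loop
ref = a;
for i = 1:numel(b)
    for j = 1:numel(c)
        ref(b(i), c(j)) = ref(b(i), c(j)) + 1;
    end
end

% Vectorized: occurrence counts per row and per column, then outer product
rowCounts = accumarray(b(:), 1, [size(a,1) 1]);
colCounts = accumarray(c(:), 1, [size(a,2) 1]).';
a = a + rowCounts * colCounts;
```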

How to vectorize row-wise diagonalization of a matrix

I have an n-by-m matrix that I want to convert to a mn-by-m matrix, with each m-by-m block of the result containing the diagonal of each row.
For example, if the input is:
[1 2; 3 4; 5 6]
the output should be:
[1 0; 0 2; 3 0; 0 4; 5 0; 0 6]
Of course, I don't want to assemble the matrix step by step myself with a for loop.
Is there a vectorized and simple way to achieve this?
For a vectorized way to do this, create the linear indices of the diagonal elements into the resulting matrix, and assign directly.
%# create some input data
inArray = [10 11;12 13;14 15];
%# make the index array
[nr,nc]=size(inArray);
idxArray = reshape(1:nr*nc,nc,nr)';
idxArray = bsxfun(@plus,idxArray,0:nr*nc:nr*nc^2-1);
%# create output
out = zeros(nr*nc,nc);
out(idxArray) = inArray(:);
out =
10 0
0 11
12 0
0 13
14 0
0 15
Here's a simple vectorized solution, assuming X is the input matrix:
Y = repmat(eye(size(X, 2)), size(X, 1), 1);
Y(find(Y)) = X;
Another alternative is to use sparse, and this can be written as a neat one-liner:
Y = full(sparse(1:numel(X), repmat(1:size(X, 2), 1, size(X, 1)), X'));
The easiest way I see to do this is actually quite simple, using simple index referencing and the reshape function:
I = [1 2; 3 4; 5 6];
J(:,[1,4]) = I;
K = reshape(J',2,6)';
If you examine J, it looks like this:
J =
1 0 0 2
3 0 0 4
5 0 0 6
Matrix K is just what was wanted:
K =
1 0
0 2
3 0
0 4
5 0
0 6
As Eitan T has noted in the comments, the above is specific to the example, and doesn't cover the general solution. So below is the general solution, with m and n as described in the question.
J(:,1:(m+1):m^2) = I;
K=reshape(J',m,m*n)';
If you want to test it to see it working, just use
I=reshape(1:(m*n),m,n)';
Note: if J already exists, this can cause problems. In this case, you need to also use
J=zeros(n,m^2);
It may not be the most computationally efficient solution, but here's a 1-liner using kron:
A = [1 2; 3 4; 5 6];
B = diag(reshape(A', 6, 1)) * kron(ones(3, 1), eye(2))
% B =
% 1 0
% 0 2
% 3 0
% 0 4
% 5 0
% 0 6
This can be generalized if A is n x m:
diag(reshape(A.', n*m, 1)) * kron(ones(n,1), eye(m))
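As a quick sanity check of the generalized expression, the sketch below compares it with a straightforward loop for a non-square input. The 2-by-3 test matrix is an arbitrary choice for illustration.

```matlab
A = [1 2 3; 4 5 6];
[n, m] = size(A);

% Generalized kron expression
B = diag(reshape(A.', n*m, 1)) * kron(ones(n, 1), eye(m));

% Reference: stack diag(row) blocks one row at a time
ref = zeros(n*m, m);
for r = 1:n
    ref((r-1)*m + (1:m), :) = diag(A(r, :));
end
```

Note that diag builds a full (n*m)-by-(n*m) intermediate matrix, so for large inputs one of the index-based answers is likely to use less memory.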