MATLAB: Replace leading zeros of every column with NaN

I have a 3D matrix called mat. Each column may start with a variable number of leading zeros (possibly none), and I need to replace those with NaN. Note that a column may contain further zeros after its first non-zero element, so simply indexing ALL zeros in the matrix and replacing them with NaN won't give the correct result.
I do have a working solution, but it contains two for-loops. I am wondering whether it's possible to vectorize it and get rid of the loops. In reality, mat could be very big, something like 10000x15x10000, so I am quite sensitive to execution speed.
Here's my toy example:
% Create test matrix
mat = randi(100,20,5,2);
mat(1:5,1,1) = 0;
mat(1:7,2,1) = 0;
mat(1:3,4,1) = 0;
mat(1:10,5,1) = 0;
mat(1:2,1,2) = 0;
mat(1:3,3,2) = 0;
mat(1:7,4,2) = 0;
mat(1:4,5,2) = 0;
% Find first non-zero element in every column
[~, firstNonZero] = max( mat ~= 0 );
% Replace leading zeros with NaN
% How to vectorize this part???
[nRows, nCols, nPlanes] = size(mat);
for j = 1 : nPlanes
    for i = 1 : nCols
        mat(1:firstNonZero(1, i, j)-1, i, j) = NaN;
    end
end

You could use cumsum to compute a cumulative sum down each column: all leading zeros then have a cumulative sum of zero, whilst all intermediate zeros have a cumulative sum greater than zero.
mat( cumsum(mat,1) == 0 ) = NaN;
As suggested in the comments, if your mat has negative values then the cumulative sum could return to 0 later in a column. In that case, use the sum of absolute values instead:
mat( cumsum(abs(mat),1) == 0 ) = NaN;
Note that by default cumsum operates along the first non-singleton dimension; you can use the optional dim argument to specify the dimension. I've used dim=1 to enforce column-wise operation in case your mat could be of height 1, but this is already the default for any matrix with height greater than 1.
Note also that this uses == for comparison. You may want to read "Why is 24.0000 not equal to 24.0000 in MATLAB?" and use a threshold for your equality comparison.
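A variant that sidesteps both the negative-value caveat and the equality-threshold question is to accumulate the logical mask mat ~= 0 rather than the values themselves. The following is only a sketch on a small made-up matrix (the data and the names ref and out are illustrative assumptions); it compares the result against a loop like the one in the question:

```matlab
% Small made-up test matrix with leading and interior zeros (illustrative data)
mat = [0  0  5;
       0  3  0;
       2 -3  0;
       0  4  7];

% Reference result using a loop like the one in the question
ref = mat;
[~, firstNonZero] = max(ref ~= 0, [], 1);
for i = 1:size(ref, 2)
    ref(1:firstNonZero(i)-1, i) = NaN;
end

% Vectorized: count the non-zeros seen so far down each column;
% a count of 0 means we are still inside the leading zeros
out = mat;
out(cumsum(out ~= 0, 1) == 0) = NaN;

disp(isequaln(out, ref))   % isequaln treats NaN as equal to NaN
```

Column 2 contains 3 followed by -3, which is exactly the case where a plain cumsum(mat,1) would return to zero. One behavioural difference to be aware of: a column that is entirely zero is left untouched by the loop (max returns index 1), whereas any cumsum-based mask turns the whole column into NaN.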

Related

Mean value of each column of a matrix

I have a 64 X 64 matrix that I need to find the column-wise mean values for.
However, instead of dividing by the total number of elements in each column (i.e. 64), I need to divide by the total number of non-zeros in the matrix.
I managed to get it to work for a single column as shown below. For reference, the function that generates my matrix is titled fmu2(i,j).
q = 0;
for i = 1:64
    if fmu2(i,1) ~= 0;
        q = q + 1;
    end
end
for i = 1:64
    mv = (1/q).*sum(fmu2(i,1));
end
This works for generating the "mean" value of the first column. However, I'm having trouble looping this procedure so that I will get the mean for each column. I tried doing a nested for loop, but it just calculated the mean for the entire 64 X 64 matrix instead of one column at a time. Here's what I tried:
q = 0;
for i = 1:64
    for j = 1:64
        if fmu2(i,j) ~= 0;
            q = q +1;
        end
    end
end
for i = 1:64
    for j = 1:64
        mv = (1/q).*sum(fmu2(i,j));
    end
end
Like I said, this just gave me one value for the entire matrix instead of 64 individual "means" for each column. Any help would be appreciated.
For one thing, do not call the function that generates your matrix in each iteration of a loop. This is extremely inefficient and will cause major problems if your function is complex enough to have side effects. Store the return value in a variable once, and refer to that variable from then on.
Secondly, you do not need any loops here at all. The total number of nonzeros is given by the nnz function (short for number of non-zeros). The sum function accepts an optional dimension argument, so you can tell it to sum down each column (dimension 1) rather than across the rows or over the whole matrix.
m = fmu2;   % store the full 64-by-64 matrix once (call fmu2 with whatever arguments it needs)
averages = sum(m, 1) / nnz(m)
averages will be a 64-element row vector with one value per column, since sum(m, 1) returns the 64 column sums and nnz(m) is a scalar.
One of the great things about MATLAB is that it provides vectorized implementations of just about everything. If you do it right, you should almost never have to use an explicit loop to do any mathematical operations at all.
If you want the column-wise mean of the non-zero elements, you can do the following:
m = randi([0,5], 5, 5); % some data
avg = sum(m,1) ./ sum(m~=0,1);
This is the column-wise sum of values divided by the column-wise number of elements not equal to 0. The result is a row vector where each element is the average of the corresponding column in m.
Note that this is very flexible: you could use any condition in place of ~=0.
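To make the difference between the two answers concrete, here is a small sketch on made-up data (the matrix and the names avgTotal, avgCol and avgSafe are illustrative assumptions). It also shows what happens when a column has no non-zero entries at all:

```matlab
m = [1 0 0;
     3 0 4;
     0 0 8];                      % made-up data; column 2 is all zeros

% Divide each column sum by the total number of non-zeros in the matrix
avgTotal = sum(m, 1) / nnz(m);    % nnz(m) is 4

% Divide each column sum by that column's own non-zero count
counts = sum(m ~= 0, 1);          % [2 0 2]
avgCol = sum(m, 1) ./ counts;     % 0/0 gives NaN for the all-zero column

% Optional: report 0 instead of NaN for columns without non-zeros
avgSafe = sum(m, 1) ./ max(counts, 1);

disp(avgTotal); disp(avgCol); disp(avgSafe);
```

Which denominator is right depends on how the question is read: the first form matches "the total number of non-zeros in the matrix", the second a per-column mean of the non-zero entries.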

Columnwise removal of first ones from binary matrix. MATLAB

I have a binary matrix. I want to remove the first one from each column, but keep it if it is the only one in that column. I have some code that produces the correct result, but it looks ugly: I have to iterate through all columns.
Could you give me some advice on how to improve my code?
Non-vectorised code:
% Dummy matrix for SE
M = 10^3;
N = 10^2;
ExampleMatrix = (rand(M,N)>0.9);
ExampleMatrix1=ExampleMatrix;
% Iterate columns
for iColumn = 1:size(ExampleMatrix,2)
    idx = find(ExampleMatrix(:,iColumn)); % row indices of all non-zero elements
    if numel(idx) > 1
        % remove the first one; the remaining ones are kept
        ExampleMatrix(idx(1),iColumn) = 0;
    end
end
I think this does what you want:
ind_col = find(sum(ExampleMatrix, 1)>1); % index of relevant columns
[~, ind_row] = max(ExampleMatrix(:,ind_col), [], 1); % index of first max of each column
ExampleMatrix(ind_row + (ind_col-1)*size(ExampleMatrix,1)) = 0; % linear indexing
The code uses:
the fact that the second output of max gives the index of the first maximum value. In this case max is applied along the first dimension, to find the first maximum of each column;
linear indexing.
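As a worked illustration of those two ingredients, the sketch below runs the same three steps on a tiny made-up matrix (the matrix A and the name lin are assumptions for illustration) and checks the hand-computed linear indices against sub2ind:

```matlab
A = logical([0 1 0;
             1 0 0;
             1 1 0;
             0 1 1]);                          % made-up 4-by-3 binary matrix

ind_col = find(sum(A, 1) > 1);                 % columns with more than one 1
[~, ind_row] = max(A(:, ind_col), [], 1);      % row of the first 1 in those columns

% Column-major storage: element (r, c) lives at linear index r + (c-1)*nRows
lin = ind_row + (ind_col - 1) * size(A, 1);
assert(isequal(lin, sub2ind(size(A), ind_row, ind_col)));

A(lin) = false;                                % clear the first 1 in each selected column
disp(A)
```

Column 3 holds a single 1, so it is excluded by ind_col and left unchanged.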

Generate a random sparse matrix with N non-zero-elements

I've written a function that generates a sparse matrix of size d-by-n and puts 2 non-zero values in each column.
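function [M] = generateSparse(n,d)
    M = sparse(d,n);                % d-by-n all-zero sparse matrix
    sz = size(M);
    nnzs = 2;                       % non-zeros per column
    val = ceil(rand(nnzs,n));       % values to store (always 1 here)
    inds = zeros(nnzs,n);           % row indices, one column per matrix column
    for i=1:n
        ind = randperm(d,nnzs);
        inds(:,i) = ind;
    end
    points = (1:n);
    nnzInds = zeros(nnzs,n);        % linear indices of the non-zeros
    for i=1:nnzs
        nnzInd = sub2ind(sz, inds(i,:), points);
        nnzInds(i,:) = nnzInd;
    end
    M(nnzInds) = val;
end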
However, I'd like to be able to give the function another parameter, num-nnz, which will make it randomly choose num-nnz cells and put 1 there.
I can't use sprand, as it requires a density and I need the number of non-zero entries to be independent of the matrix size; a density inherently depends on the matrix size.
I am a bit confused about how to pick the indices and fill them. I did it with a loop, which is extremely costly, and would appreciate help.
EDIT:
Everything has to be sparse. A big enough matrix will crash in memory if I don't do it in a sparse way.
You seem close!
You could pick num_nnz random (unique) integers between 1 and the number of elements in the matrix, then assign the value 1 to the elements at those indices.
To pick the random unique integers, use randperm. To get the number of elements in the matrix use numel.
M = sparse(d, n); % create dxn sparse matrix
num_nnz = 10; % number of non-zero elements
idx = randperm(numel(M), num_nnz); % get unique random indices
M(idx) = 1; % Assign 1 to those indices
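Assigning into an existing sparse matrix can be slow when many entries are added, because the sparse storage has to be rearranged. A possible alternative, sketched here under the assumption that d and n are known up front (the sizes below are made up), is to convert the linear indices to subscripts and build the matrix in a single call to sparse:

```matlab
d = 1000;  n = 2000;                     % example sizes (illustrative assumption)
num_nnz = 10;                            % number of non-zero elements

idx = randperm(d * n, num_nnz);          % unique random linear indices
[r, c] = ind2sub([d, n], idx);           % convert to row/column subscripts
M = sparse(r, c, 1, d, n);               % build the d-by-n sparse matrix in one call

disp(nnz(M))                             % equals num_nnz because the indices are unique
```

Because the indices from randperm are unique, no two entries land in the same cell, so sparse never has to sum duplicates and the non-zero count is exactly num_nnz.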

delete elements from a matrix and calculate mean

I have an N-by-M matrix as input called GR, which consists of the following numbers: -3,0,2,4,7,10,12
I have to return a vector. If M=1, then it should just return the input.
If M>1, it should remove the lowest number from the matrix and then calculate the mean of the remaining numbers.
However, if one of the numbers in the row is -3, it should return the value -3 in the output.
My thoughts of the problem:
Is it possible to make a for loop?
for i=1:length(GR(:,1))
If length(GR(1,:))==1
GR=GR
end
If length(GR(1,:))>1
x=min(GR(i,:))=[] % for removing the lowest number in the row
GR=sum(x)/length(x(i,:))
I just don't have any idea how to detect whether any of the numbers in the row is -3 and then return that value instead of calculating the mean. Also, when I tried to delete the lowest number in the matrix using x=min(GR(i,:)), MATLAB gave me this error message: 'Deletion requires an existing variable.'
I put in a break statement. As soon as it detects a -3 value it breaks from the loop. The same goes for the single-column case.
Note that it is an i,j (M*N) matrix, so you might need to change your loop.
res = zeros(size(GR,1), 1);        % one output value per row
for i = 1:size(GR,1)
    if any(GR(i,:) == -3)
        res = -3;                  % a -3 was found: return -3 and stop
        break
    end
    if size(GR,2) == 1
        res = GR;                  % single column: return the input unchanged
        break
    end
    x = GR(i,:);
    [~, k] = min(x);
    x(k) = [];                     % remove the lowest number in the row
    res(i) = sum(x)/length(x);     % mean of the remaining numbers
end
You can use NaNs together with nanmean, any, and the dim argument of these functions:
% generate random matrix
M = randi(3);
N = randi(3);
nums = [-3,0,2,4,7,10,12];
GR = reshape(randsample(nums,N*M,true),[N M]);
% computation:
% find if GR has only one column
if size(GR,2) == 1
    res = GR;
else
    % find indexes of rows with -3 in them
    idxs3 = any(GR == -3,2);
    % the (column) index of the min. value in each row
    [~,minCol] = min(GR,[],2);
    % convert [row,col] index pair into linear index
    minInd = sub2ind(size(GR),1:size(GR,1),minCol');
    % set minimum value in each row to nan - to ignore it on averaging
    GR(minInd) = nan;
    % averaging each row (except for the NaNs)
    res = nanmean(GR,2);
    % set each row with (-3) in it to (-3)
    res(idxs3) = -3;
end
disp(res)
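If the Statistics Toolbox functions used above (randsample, nanmean) are not available, the same per-row result can be obtained with base functions only, because dropping one minimum from a row of M values leaves (row sum - row min) / (M - 1). This is a sketch on a made-up input, not part of the original answer:

```matlab
GR = [ 4  7  2;
      -3 10 12;
       0  0  4];                                 % made-up example input

if size(GR, 2) == 1
    res = GR;                                    % single column: return the input
else
    % Mean after dropping one minimum = (row sum - row min) / (M - 1)
    res = (sum(GR, 2) - min(GR, [], 2)) / (size(GR, 2) - 1);
    res(any(GR == -3, 2)) = -3;                  % rows containing -3 report -3
end
disp(res)
```

As in the NaN-based version, only one occurrence of the minimum is dropped when a row contains the minimum more than once (see the third row).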

How to generate this matrix in matlab

The H matrix is n-by-n, with n=10000. I can use a loop to generate this matrix in MATLAB. I just wonder if there is any method that can do this without looping.
You can see that, for each column j >= 2, the elements above the diagonal are 1/sqrt(j*(j-1)) and the diagonal element is -(j-1)/sqrt(j*(j-1)); the first column consists of 1/sqrt(n), and the rest of the elements are zero.
We can first generate a full matrix whose first column is all 1/sqrt(n) and whose remaining columns are filled with 1/sqrt(j*(j-1)), and then modify that matrix to include the rest of what you want.
As such, let's concentrate on the elements that start from row 2, column 2 as these follow a pattern. Once we're done, we can construct the other things that build up the final matrix.
x = 2:n;
Hsmall = repmat([1./sqrt(x.*(x-1))], n-1, 1);
Next, we will tackle the diagonal elements:
Hsmall(logical(eye(n-1))) = -(x-1)./sqrt(x.*(x-1));
Now, let's zero the rest of the elements:
Hsmall(tril(logical(ones(n-1)),-1)) = 0;
Now that we're done, let's create a new matrix that pieces all of this together:
H = [1/sqrt(n) 1./sqrt(x.*(x-1)); repmat(1/sqrt(n), n-1, 1) Hsmall];
Therefore, the full code is:
x = 2:n;
Hsmall = repmat([1./sqrt(x.*(x-1))], n-1, 1);
Hsmall(logical(eye(n-1))) = -(x-1)./sqrt(x.*(x-1));
Hsmall(tril(logical(ones(n-1)),-1)) = 0;
H = [1/sqrt(n) 1./sqrt(x.*(x-1)); repmat(1/sqrt(n), n-1, 1) Hsmall];
Here's an example with n = 6:
>> H
H =
  Columns 1 through 3
   0.408248290463863   0.707106781186547   0.408248290463863
   0.408248290463863  -0.707106781186547   0.408248290463863
   0.408248290463863                   0  -0.816496580927726
   0.408248290463863                   0                   0
   0.408248290463863                   0                   0
   0.408248290463863                   0                   0
  Columns 4 through 6
   0.288675134594813   0.223606797749979   0.182574185835055
   0.288675134594813   0.223606797749979   0.182574185835055
   0.288675134594813   0.223606797749979   0.182574185835055
  -0.866025403784439   0.223606797749979   0.182574185835055
                   0  -0.894427190999916   0.182574185835055
                   0                   0  -0.912870929175277
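As a quick sanity check (an addition, not part of the answer above): the columns of H built this way are orthonormal, so H'*H should equal the identity up to round-off. The sketch below rebuilds H for n = 6 and measures the deviation; the name err is illustrative, and true(n-1) is used as an equivalent shorthand for logical(ones(n-1)).

```matlab
n = 6;
x = 2:n;
Hsmall = repmat(1./sqrt(x.*(x-1)), n-1, 1);             % fill columns 2..n
Hsmall(logical(eye(n-1))) = -(x-1)./sqrt(x.*(x-1));     % diagonal entries
Hsmall(tril(true(n-1), -1)) = 0;                        % zero below the diagonal
H = [1/sqrt(n), 1./sqrt(x.*(x-1)); repmat(1/sqrt(n), n-1, 1), Hsmall];

% Columns are orthonormal, so H'*H should be the identity up to round-off
err = norm(H' * H - eye(n));
fprintf('deviation from identity (2-norm): %g\n', err);
```

A check like this is cheap for small n and catches sign or indexing mistakes before scaling up to n = 10000.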
Since you are working with a pretty large n value of 10000, you might want to squeeze out as much performance as possible.
Going with that, you can use an efficient approach based on cumsum -
%// Values to be set in each column for the upper triangular region
upper_tri = 1./sqrt([1:n].*(0:n-1));
%// Diagonal indices
diag_idx = [1:n+1:n*n];
%// Setup output array
out = zeros(n,n);
%// Set the first row of output array with upper triangular values
out(1,:) = upper_tri;
%// Set the diagonal elements with the negative triangular values.
%// The intention here is to perform CUMSUM down each column later on,
%// thus there would be zeros from the diagonal positions onward in each column
out(diag_idx) = -upper_tri;
%// Set the first element of output array with n^(-1/2)
out(1) = 1/sqrt(n);
%// Finally, perform CUMSUM as suggested earlier
out = cumsum(out,1);
%// Set the diagonal elements with the actually expected values
out(diag_idx(2:end)) = upper_tri(2:end).*[-1:-1:-(n-1)];
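To see why the cumsum step works, it may help to follow a single column by hand. The sketch below uses a made-up small case (column j = 4 of a 5-by-5 matrix; the names col and u are illustrative): a value seeded in the first row and cancelled on the diagonal turns, after cumsum, into a constant run above the diagonal and zeros from the diagonal downward.

```matlab
n = 5;  j = 4;               % small example: column 4 of a 5-by-5 matrix (assumption)
u = 1/sqrt(j*(j-1));         % value wanted above the diagonal in column j

col = zeros(n, 1);
col(1) = u;                  % seed at the top of the column
col(j) = -u;                 % cancelling entry on the diagonal
col = cumsum(col);           % u in rows 1..j-1, 0 from row j downward

col(j) = -(j-1)*u;           % finally overwrite the diagonal with its real value
disp(col.')
```

The full code does the same thing for all columns at once, which is why the diagonal has to be overwritten again after the cumsum.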
Runtime Tests
(I) With n = 10000, the runtime at my end was - Elapsed time is 0.457543 seconds.
(II) Now, as a final performance-squeezing practice, you can replace the pre-allocation step for out with a faster pre-allocation scheme, as described on the Undocumented MATLAB blog. This assumes out does not already exist in the workspace. Thus, the pre-allocation step would look like this -
out(n,n) = 0;
The runtime with this edited code was - Elapsed time is 0.400399 seconds.
(III) The runtime for n = 10000 with the other answer by @rayryeng was - Elapsed time is 1.306339 seconds.