remove columns of zeros for regstats function - matlab

I'm performing linear regression analyses (y=xb, solving for b with a given [nx1] vector y and [nxm] matrix x) on a pretty large set of data, using the regstats() function from the Matlab statistics toolbox and looping through a series of matrix/vector pairs. The problem is that regstats returns NaN if there are columns of all zeros, because it can't perform the regression. There are columns of zeros in all of my x-matrices, but they do not always appear in the same column numbers. Since each column in my x-matrices represents a real-world variable, I can't simply remove the columns of zeros and run the regression. I need to remove the zero columns, remember which columns have been removed, run the regression, and then insert 0 values into the b vector result in the appropriate places. That way all of my results represent the same number of variables in the same order, with zeros in the places where that particular variable was not included in the regression. I did this manually with a small set of test data, but now I need to run it for about 800 regression pairs, so I need some way to automate finding and restoring the zero columns.

IZEROS = find(all(M==0));
IZEROS will be a list of the indices of the columns that have all zeros.

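allzero = all(x == 0, 1);    % true for every column that is entirely zero
goodcols = find(~allzero);   % indices of the columns to keep
m = size(x, 2);              % total number of variables
b = zeros(m, 1);             % removed variables keep a coefficient of 0
b(goodcols) = x(:, goodcols) \ y;   % solve using only goodcols (least squares shown as one option)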
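To run this over many regression pairs, the same idea can be put in a loop. The following is only a sketch under some assumptions: the cell arrays X and Y and the result matrix B are hypothetical names, the data are made up, and the backslash operator stands in for the actual regression call.

```matlab
% Hypothetical data: two regression pairs stored in cell arrays X and Y.
X = {[1 0 2; 2 0 1; 3 0 5; 4 0 3], [0 1 1; 0 2 3; 0 3 2; 0 4 5]};
Y = {[1; 2; 3; 4], [2; 1; 4; 3]};

nPairs = numel(X);
m = size(X{1}, 2);        % every x matrix has the same number of columns
B = zeros(m, nPairs);     % one coefficient column per pair, zeros by default

for k = 1:nPairs
    x = X{k};
    y = Y{k};
    keep = any(x ~= 0, 1);          % mask of columns that are not all zero
    B(keep, k) = x(:, keep) \ y;    % least-squares fit on the kept columns only
end
```

Each column of B then has the same length and ordering, with zeros where a variable was left out. If regstats is used instead of backslash, keep in mind that its default linear model also returns an intercept as the first coefficient, so the indices would need to be shifted by one.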

Related

Is there a way to parse each row of a matrix in Octave?

I am new to Octave and I wanted to know if there is a way to parse each row of a matrix and use it individually. Ultimately I want to use the rows to check whether they are all perpendicular to each other (the dot product has to be equal to 0 for two vectors to be perpendicular), so if you have some ideas about that I would love to hear them. I also wanted to know if there is a function to determine the length (or the magnitude) of a vector.
Thank you in advance.
If by "parse each row" you mean a loop that takes each row one by one, you only need a for loop over the transposed matrix. This works because the for loop takes successive columns of its argument.
Example:
A = [10 20; 30 40; 50 60];
for row = A.'       % loop over columns of transposed matrix
    row = row.';    % transpose back to obtain rows of the original matrix
    disp(row);      % do whatever you need with each row
end
However, loops can often be avoided in Matlab/Octave, in favour of vectorized code. For the specific case you mention, computing the dot product between each pair of rows of A is the same as computing the matrix product of A times itself transposed:
A*A.'
However, for the general case of a complex matrix, the dot product is defined with a complex conjugate, so you should use the complex-conjugate transpose:
P = A*A';
Now P(m,n) contains the dot product of the n-th and m-th rows of A. The condition you want to test is equivalent to P being a diagonal matrix:
result = isdiag(P); % gives true or false
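As for the length of a vector, norm returns the Euclidean length. Also, with floating-point data the off-diagonal dot products may come out as tiny nonzero numbers, so a tolerance-based test can be more forgiving than an exact check. A minimal sketch, where the example matrix and the tolerance tol are assumptions to be adapted:

```matlab
A = [1 0 0; 0 2 0; 0 0 3];    % example matrix with mutually orthogonal rows

len = norm(A(1, :));                      % Euclidean length of the first row
rowLengths = sqrt(sum(abs(A).^2, 2));     % lengths of all rows at once

P = A * A';                    % pairwise dot products of the rows
offDiag = P - diag(diag(P));   % keep only the off-diagonal entries
tol = 1e-10;                   % assumed tolerance; adjust to the scale of the data
isOrthogonal = all(abs(offDiag(:)) < tol);
```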

How to change the value of a random subset of elements in a matrix without using a loop?

I'm currently attempting to select a subset of 0's in a very large matrix (about 400x300 elements) and change their value to 1. I am able to do this, but it requires using a loop where each instance it selects the next value in a randperm vector. In other words, 50% of the 0's in the matrix are randomly selected, one-at-a-time, and changed to 1:
z=1;
for z=1:(.5*numberofzeroes)
A(zeroposition(rpnumberofzeroes(z),1),zeroposition(rpnumberofzeroes(z),2))=1;
z=z+1;
end
Where 'A' is the matrix, 'zeroposition' is a 2-column-wide matrix with the positions of the 0's in the matrix (the "coordinates" if you like), and 'rpnumberofzeroes' is a randperm vector from 1 to the number of zeroes in the matrix.
So for example, for z=20, the code might be something like this:
A(3557,2684)=1;
...so that the 0 which appears in this location within A will now be a 1.
It performs this loop thousands of times, because .5*numberofzeroes is a very big number. This inevitably takes a long time, so my question is can this be done without using a loop? Or at least, in some way that takes less processing resources/time?
As I said, the only thing that needs to be done is an entirely random selection of 50% (or whatever proportion) of the 0's changed to 1.
Thanks in advance for the help, and let me know if I can clear anything up! I'm new here, so apologies in advance if I've made any faux pas.
That's very easy. I'd like to introduce you to my friend sub2ind. sub2ind allows you to take row and column coordinates of a matrix and convert them into linear column-major indices so that you can access multiple values in a matrix simultaneously in a single call. As such, the equivalent code you want is:
%// rpnumberofzeroes is a randperm vector: its first half is the random
%// selection of rows of zeroposition that we want
vals = rpnumberofzeroes(1:floor(0.5*numberofzeroes));
%// Now look up the row and column coordinates of the selected zeros
rows = zeroposition(vals, 1);
cols = zeroposition(vals, 2);
%// Get linear indices via sub2ind
ind1 = sub2ind(size(A), rows, cols);
%// Now set these locations to 1
A(ind1) = 1;
The first statement takes the first half of the random permutation stored in rpnumberofzeroes. Each value in it is a row number of zeroposition, which is exactly how your loop uses rpnumberofzeroes(z). In zeroposition, the first column holds the row coordinates and the second column holds the column coordinates, so indexing it with vals gives the rows and columns of the randomly selected zeros in A. Once that's done, we use these rows and columns to index into A. sub2ind requires three inputs: the size of the matrix you are trying to access (in our case, that of A), the row coordinates and the column coordinates. The output is a set of column-major linear indices, one for each row and column pair.
The last piece of the puzzle is to use these to index into A and set the locations to 1.
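To see what sub2ind returns, here is a small sketch on made-up coordinates (the 3-by-4 matrix and the rows/cols values are only for illustration):

```matlab
A = zeros(3, 4);
rows = [1; 3; 2];
cols = [2; 4; 1];
ind = sub2ind(size(A), rows, cols);   % column-major linear indices
% For a matrix with 3 rows, the pair (r, c) maps to r + (c - 1) * 3.
A(ind) = 1;                           % sets A(1,2), A(3,4) and A(2,1) in one call
```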
This can be accomplished with linear indexing as well:
% find linear position of all zeros in matrix
ix=find(abs(A)<eps);
% set one half of those, selected at random without repetition, to one.
A(ix(randperm(numel(ix), round(numel(ix)*.5)))) = 1;

Matlab - vector divide by vector, use loop

I have two equally sized, very large vectors (columns) A and B. I would like to divide vector A by vector B. This will give me a large matrix AxB filled with zeros, except the last column. This column contains the values I'm interested in. When I simply divide the vectors in a Matlab script, I run out of memory, probably because the matrix AxB becomes very large. I can probably prevent this from happening by repeating the following:
1) calculate the first row of matrix AxB
2) extract the last value and put it into another vector C
3) delete the used row of matrix AxB
4) repeat steps 1-3 for all rows in vector A
How can I make a loop which does this?
Your question doesn't make it clear what you are trying to do, although it sounds like you want to do an element-wise division.
Try:
C = A./B
"Matrix product AxB" and "dividing vectors" are distinct operations.
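If we understood this correctly, what you do want to calculate is "C = last column of AxB". Multiplying by a selector vector that is zero everywhere except for a 1 in its last entry picks out that column:
lastcolsel=zeros(size(B,2),1);
lastcolsel(end)=1;   % the 1 selects the last column
C=(A*B)*lastcolsel;
If that code breaks your memory limit, recall that the matrix product is associative: (MxN)xP = Mx(NxP). Applying this to your example avoids ever forming the large product A*B:
lastcolsel=zeros(size(B,2),1);
lastcolsel(end)=1;
simplifier=B*lastcolsel;   % equals the last column of B
C=A*simplifier;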
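A quick check of the associativity argument on small made-up data. This sketch assumes A is a column and B is a row, so that A*B is defined, and it uses a selector e with a 1 in its last entry:

```matlab
A = [1; 2; 3];            % hypothetical column vector
B = [4 5 6];              % hypothetical row vector, so A*B is 3-by-3
e = zeros(size(B, 2), 1);
e(end) = 1;               % selects the last column
C_full = (A * B) * e;     % builds the 3-by-3 product first
C_cheap = A * (B * e);    % never forms the 3-by-3 matrix
```

Both give the same vector, but the second form only ever holds vectors in memory, which is what matters when A and B are very large.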

How to make matrix with unequal length of the vectors in Matlab?

I have different distribution for different time (t) with each distribution having 10,000 elements. I have following line of code which computes CDF for different distributions inside the loop with t varying from 1 to nT:
[f_CDF(:,t),x_CDF(:,t)]=ecdf(uncon_noise_columndata_all_nModels_diff_t(:,1,t));
Matlab's function ecdf gives CDF values which could be less than the total number of elements in the distribution because the probability for the repeated elements gets added up. As a result, the output variables f_CDF and x_CDF run into problem of ??? Subscripted assignment dimension mismatch. error because of different length of vectors at different t.
How can I sort this problem such that NaN could fill up the places where the vector's length is less than the maximum length of any vector in the whole matrix and I am able to implement the above line of code inside the loop. Thanks.
Here are two of many ways to approach this problem:
1) Use a cell array
Consider storing the results in a cell array instead of a matrix which, by definition, requires columns to have the same length.
[f_CDF{t},x_CDF{t}]=ecdf(uncon_noise_columndata_all_nModels_diff_t(:,1,t));
2) Preallocate NaN matrices
Before running the loop which calculates the CDF results, create a matrix filled with NaNs. ecdf returns one value per distinct data point plus one leading entry for the minimum, so with 10,000 elements no column will be longer than 10,001 entries.
f_CDF = NaN(10001, nT);
x_CDF = NaN(10001, nT);
for t = 1:nT
    [f_CDFTemp, x_CDFTemp] = ecdf(uncon_noise_columndata_all_nModels_diff_t(:,1,t));
    f_CDF(1:length(f_CDFTemp),t) = f_CDFTemp;   % unused rows stay NaN
    x_CDF(1:length(x_CDFTemp),t) = x_CDFTemp;
end
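The two approaches can also be combined: collect the results in a cell array first, then pad them to a common length afterwards, so the maximum length does not have to be known in advance. A sketch with made-up vectors standing in for the ecdf outputs (the names f and F are hypothetical):

```matlab
% Hypothetical results of unequal length, standing in for ecdf outputs.
f = {[0; 0.5; 1], [0; 0.25; 0.5; 0.75; 1], [0; 1]};

maxLen = max(cellfun(@numel, f));   % the longest vector decides the row count
F = NaN(maxLen, numel(f));          % NaN marks the unused tail of each column
for t = 1:numel(f)
    F(1:numel(f{t}), t) = f{t};
end
```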

How to normalize data around the average for the column in MATLAB?

I am trying to take a matrix and normalize the values in each cell around the average for that column. By normalize I mean subtract the mean of each column from the values in that column, i.e. subtract the mean for Column1 from the values in Column1...subtract the mean for ColumnN from the values in ColumnN. I am looking for a script in Matlab. Thanks!
You could use the function mean to get the mean of each column, then the function bsxfun to subtract that from each column:
M = bsxfun(@minus, M, mean(M, 1));
Additionally, starting in version R2016b, you can take advantage of the fact that MATLAB will perform implicit expansion of operands to the correct size for the arithmetic operation. This means you can simply do this:
M = M-mean(M, 1);
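A small worked example on made-up numbers, using the bsxfun form so that it also runs on older releases (the names mu and centered are only for illustration):

```matlab
M = [1 2; 3 6; 5 10];
mu = mean(M, 1);                     % row vector of column means: [3 6]
centered = bsxfun(@minus, M, mu);    % subtract each column's mean from that column
```

After the subtraction, every column of centered averages to zero, which is an easy way to check the result.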
Try the mean function for starters. Passing a matrix to it will result in all the columns being averaged and returns a row vector.
Next, you need to subtract off the mean. To do that, the matrices must be the same size, so use repmat on your mean row vector.
a=rand(10);
abar=mean(a);
abar=repmat(abar,size(a,1),1);
anorm=a-abar;
or the one-liner:
anorm=a-repmat(mean(a),size(a,1),1);
% Assuming your matrix is in A
m = mean(A);
A_norm = A - repmat(m,size(A,1),1)
As has been pointed out, you'll want the mean function, which when called without any additional arguments gives the mean of each column in the input. A slight complication then comes up because you can't simply subtract the mean -- its dimensions are different from the original matrix.
So try this:
a = magic(4)
b = a - repmat(mean(a),[size(a,1) 1]) % subtract columnwise mean from elements in a
repmat replicates the mean to match the data dimensions.