Purpose of matrix length - matlab

Matlab defines the matrix function length to return
Length of largest array dimension
What is an example of a use of knowing the largest dimension? Knowing number of rows or columns has obvious uses... but I don't know why someone would want the largest dimension regardless of whether it is rows or cols.
Thank You

In fact, most of my code wants to do things exactly once for each row, for each column or for each element.
Therefore, I typically use one of these
size(M,1)
size(M,2)
numel(V)
In particular do not depend on length to match the number of elements in a vector!
The only real convenience that I found {in older versions of matlab} for length is if I need a repeat statement rather than a while. Then it is convenient that length of vectors usually returns at least one.
Some other uses that I had for length:
A quick rough check whether something is big.
Making something square as mentioned by #Mike

This question addresses a good point and I have seen programs fail because of applying the length command on matrices (for looping). Especially when one expects to get size(M, n) because the n-th dimension should be the largest. In total, I can not see an advantage of allowing length to be applied on matrices, in fact I only see risks from probably unexpected behavior.
If I want to know the largest dimension of any matrix, I would prefer to be more explicit and use max(size(M)), which also should be much clearer for anyone reading this code.
I am not sure, whether the following example should be in this answer, but It somehow addresses the same point.
It is also useful to be explicit with dimension, when averaging over matrices. Consider the case, where you always want to average over the first dimension, i.e. over the columns of a matrix. As long as your matrix is of size n x m, where n is greater than 1, you do not have to care about specifying a dimension. But for unforseen cases, where your matrix happens to be a row-vector, things get messy:
%// good case, where num of rows is 2 or greater
size(mean(rand(2, 4), 1)) %// [1, 4]
size(mean(rand(2, 4))) %// [1, 4]
%// bad case, where num of rows is 1
size(mean(rand(1, 4), 1)) %// [1, 4]
size(mean(rand(1, 4))) %// [1, 1], returns the average of that row

If you want to create a square matrix B that can contain the input matrix A which is non-square, you can take the latter's length and use it to initialize the matrix B with zeros where the rows and columns would be of A's length, then copy the input matrix into the new zeroed matrix.

Another example - the one I use most - is when working with vectors. There it is very convenient to work with length instead of size(vec,1) or size(vec,2) as it doesn't matter if it is a row or a column vector.
As #Dennis Jaheruddin pointed out, length gave wrong results for empty vectors in some versions of MATLAB. Using numel instead of length might therefore be convenient for better backward compatibility. The readibility of the code is almost the same IMHO.
This question compares length and numel and their performance, and comes to the result that they perform similarly up to 100k elements in a vector. With more than 100k elements, numel appears to be faster. I tried to verify this (with MATLAB R2014a) and came to the following results:
Here, length is a bit slower, but as it is in the range of micro seconds, I guess it won't be a real difference in speed.

Related

spdiags and features scaling

According to libsvm faqs, the following one-line code scale each feature to the range of [0,1] in Matlab
(data - repmat(min(data,[],1),size(data,1),1))*spdiags(1./(max(data,[],1)-min(data,[],1))',0,size(data,2),size(data,2))
so I'm using this code:
v_feature_trainN=(v_feature_train - repmat(mini,size(v_feature_train,1),1))*spdiags(1./(maxi-mini)',0,size(v_feature_train,2),size(v_feature_train,2));
v_feature_testN=(v_feature_test - repmat(mini,size(v_feature_test,1),1))*spdiags(1./(maxi-mini)',0,size(v_feature_test,2),size(v_feature_test,2));
where I use the first one to train the classifier and the second one to classify...
In my humble opinion scaling should be performed by:
i.e.:
v_feature_trainN2=(v_feature_train -min(v_feature_train(:)))./(max(v_feature_train(:))-min((v_feature_train(:))));
v_feature_test_N2=(v_feature_test -min(v_feature_train(:)))./(max(v_feature_train(:))-min((v_feature_train(:))));
Now I compared the classification results using these two scaling methods and the first one outperforms the second one.
The question are:
1) What exactly does the first method? I didn't understand it.
2) Why the code suggested by libsvm outperforms the second one (e.g. 80% vs 60%)?
Thank you so much in advance
First of all:
The code described in the libsvm does something different than your code:
It maps every column independently onto the interval [0,1].
Your code however uses the global min and max to map all the columns using the same affine transformation instead of a separate transformation for each column.
The first code works in the following way:
(data - repmat(min(data,[],1),size(data,1),1))
This subtracts each column's minimum from the entire column. It does this by computing the row vector of minima min(data,[],1) which is then replicated to build a matrix the same size as data. Then it is subtracted from data.
spdiags(1./(max(data,[],1)-min(data,[],1))',0,size(data,2),size(data,2))
This generates a diagonal matrix. The entry (i,i) of this matrix is 1 divided by the difference of the maximum and the minimum of the ith column: max(data(:,i))-min(data(:,i)).
The right multiplication of this diagonal matrix means: Multiply each column of the left matrix with the corresponding diagonal entry. This effectively divides column i by max(data(:,i))-min(data(:,i)).
Instead of using a sparse diagonal matrix, you could do this even more efficiently with bsxfun:
bsxfun(#rdivide, ...
bsxfun(#minus, ...
data, min(data,[],1)), ...
max(data,[],1)-min(data,[],1))
Which is the matlab way of writing:
Divide:
The difference of:
each column and its respective minimum
by the difference of each column's max and min.
I know this has already been answered correctly, but I would like to present another solution that I think is also correct and I found more intuitive/shorther then the one presented by knedlsepp. I am new to matlab and as I was studying knedlsepp solution, I found it more intuitive to solve this problem with the following formula:
function [ output ] = feature_scaling( y)
output = (y - repmat(min(y),size(y,1),1)) * diag(1./(max(y) - min(y)));
end
I find it a bit easier to use diag this way instead of spdiags, but I believe it produces the same result for the purpose of this excercise.
Multiplying the first term by the second, effectively divides each member of the matrix (Y-min(Y)) by the scalar value 1/(max(y)-min(y)), achieving the desired result.
In case someone prefers a shorter version, maybe this can be of help.

MATLAB: why transpose single dimension array

I have the following Matlab code snippet that I'm having to translate to VBScript. However, I'm not understanding why the last line is even necessary.
clear i
for i = 1:numb_days
doy(i) = floor(dt_daily(i) - datenum(2012,12,31,0,0,0));
end
doy = doy';
Looking over the rest of the code, this happens in a lot of other places where there are single dimension arrays (?) being transposed in place. I'm a newbie when it comes to both these languages, as well as posting a question on Stack, as I'm a sleuth when it comes to finding answers, just not in this case. Thanks in advance.
All "arrays" in MATLAB have at least two dimensions, and can be treated as having any number of dimensions you wish. The transpose operator here is converting between a row (size [1 N] array) and a column (size [N 1] array). This can be significant when it comes to either concatenating the arrays, or performing other operations.
Conceptually, the dimension vector of a MATLAB array has as many trailing 1s as is required to perform an operation. This means that you can index any MATLAB array with any number of subscripts, providing you don't exceed the bounds, like so:
x = magic(4); % 4-by-4 square matrix
x(2,3,1,1,1) % pick an element
One final note: the ' operator is the complex-conjugate transpose CTRANSPOSE. The .' operator is the ordinary TRANSPOSE operator.

Matlab - vector divide by vector, use loop

I have to two evenly sized very large vectors (columns) A and B. I would like to divide vector A by vector B. This will give me a large matrix AxB filled with zeros, except the last column. This column contains the values I'm interested in. When I simple divide the vectors in a Matlab script, I run out of memory. Probably because the matrix AxB becomes very large. Probably I can prevent this from happening by repeating the following:
calculating the first row of matrix AxB
filter the last value and put it into another vector C.
delete the used row of matrix AxB
redo step 1-4 for all rows in vector A
How can I make a loop which does this?
You're question doesn't make it clear what you are trying to do, although it sounds like you want to do an element wise division.
Try:
C = A./B
"Matrix product AxB" and "dividing vectors" are distinct operations.
If we understood this correctly, what you do want to calculate is "C = last column from AxB", such that:
lastcolsel=zeros(size(B,2),1)
C=(A*B)*lastcolsel
If that code breaks your memory limit, recall that matrix product is associative (MxN)xP = Mx(NxP). Simplifying your example, we get:
lastcolsel=zeros(size(B,2),1)
simplifier=B*lastcolsel
C=A*simplifier

matlab: populating a sparse matrix with addition

preface: As the matlab guiderules state, Usually, when one wants to efficiently populate a sparse matrix in matlab, he should create a vector of indexes into the matrix and a vector of values he wants to assign, and then concentrate all the assignments into one atomic operation, so as to allow matlab to "prepare" the matrix in advance and optimize the assignment speed. A simple example:
A=sparse([]);
inds=some_index_generating_method();
vals=some_value_generating_method();
A(inds)=vals;
My question: what can I do in the case where inds contain overlapping indexes, i.e inds=[4 17 8 17 9] where 17 repeats twice.
In this case, what I would want to happen is that the matrix would be assigned the addition of all the values that are mapped to the same index, i.e for the previous example
A(17)=vals(2)+vals(4) %as inds(2)==inds(4)
Is there any straightforward and, most importantly, fast way to achieve this? I have no way of generating the indexes and values in a "smarter" way.
This might help:
S = sparse(i,j,s,m,n,nzmax) uses vectors i, j, and s to generate an m-by-n sparse matrix such that S(i(k),j(k)) = s(k), with space allocated for nzmax nonzeros. Vectors i, j, and s are all the same length. Any elements of s that are zero are ignored, along with the corresponding values of i and j. Any elements of s that have duplicate values of i and j are added together.
See more at MATLAB documentation for sparse function

MATLAB Remove Only Certain Zeros from Matrix

I have seen plenty of answers regarding how to remove leading and/or trailing zeros, and how to remove all zeros from a vector or matrix. What I need to do, though, is only remove some of them. I have two matrices, and I only want to remove the entries where both of them are zero. They are two-dimensional x and y coordinates, solved using characteristics (I can give more detail if needed) and I just want to remove the values where both matrices contain zeros at the same indices. I can easily convert the matrices into vectors and work with vectors, so any help in either case would be greatly appreciated.
For the sake of simplicity, let's assume you're using vectors called X and Y (of the same length), and you want to remove only those entries where both vectors are zero. Here's how (not tested):
% Find the indexes where either X or Y is different from zero
% (these are the indexes of the components we want to keep)
I = find(X~=0 | Y~=0);
% Select the desired components from X and Y
X=X(I);
Y=Y(I);
Edit: As Oli has pointed out below (and stefano explained further), you should use logical indexing for better performance.