Compute average distance between vector and its permutations - matlab

I have a vector, say x = [1 1.5 2]. I want to compute the expected distance between that vector and a random permutation of the vector. The assumption is that all permutations are equally likely.
For the example above, the solution should be 4/9. The first element changes 1/2 on average, the second element changes 1/3 on average, and the last one 1/2. The average change is therefore 4/9.
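The 4/9 figure can be checked by brute force. A random permutation sends each x_i to every position with probability 1/n, so the expected distance equals (1/n^2) times the sum of |x_i - x_j| over all ordered pairs (i, j). That is exactly what the asker's mean(mean(abs(...))) one-liner computes. Below is a minimal Python sketch (not MATLAB; stdlib only) that enumerates every permutation with exact fractions. It is O(n!·n), so it is only meant for tiny vectors, not the 50-100 entries in the question.

```python
from fractions import Fraction
from itertools import permutations


def expected_perm_distance(x):
    """Exact average, over all n! equally likely permutations p, of
    (1/n) * sum_i |x[i] - x[p[i]]|. Brute force: only for tiny vectors."""
    vals = [Fraction(v) for v in x]  # exact arithmetic, so 4/9 stays 4/9
    n = len(vals)
    total = Fraction(0)
    count = 0
    for p in permutations(range(n)):
        total += sum(abs(vals[i] - vals[p[i]]) for i in range(n)) / n
        count += 1
    return total / count


print(expected_perm_distance([1, 1.5, 2]))  # 4/9
```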
The problem is that this vector has about 50-100 entries. Is there a smart way to compute this expected distance?

I am now using mean(mean(abs(bsxfun(@minus,x,x')))) and this seems to do the trick.

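One of the rare cases where bsxfun does not provide the fastest solution. If you want to make use of the symmetry, use pdist. Since pdist treats rows as observations, pass x as a column vector (a row vector would be a single observation and give an empty result). Each unordered pair then appears only once, hence the factor 2:
s=sum(pdist(x(:),'cityblock'))/numel(x).^2*2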
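The same quantity can be sketched in Python in two ways. The first mirrors the pdist idea: sum over unordered pairs only, then double. The second is an O(n log n) variant that is not from the answer but follows from the same formula: after sorting, the element at 0-based position k is at least as large as the k elements before it and at most as large as the n-1-k elements after it, so it contributes x_k * (2k - n + 1) to the pair sum.

```python
def mean_abs_diff_pairs(x):
    """(1/n^2) * sum over ordered pairs |x_i - x_j|, computed from the
    n(n-1)/2 unordered pairs and doubled (the symmetry pdist exploits)."""
    n = len(x)
    s = sum(abs(x[i] - x[j]) for i in range(n) for j in range(i + 1, n))
    return 2 * s / n**2


def mean_abs_diff_sorted(x):
    """Same quantity in O(n log n): in sorted order, element k (0-based)
    is added k times and subtracted n-1-k times in the unordered pair sum."""
    xs = sorted(x)
    n = len(xs)
    s = sum(v * (2 * k - n + 1) for k, v in enumerate(xs))
    return 2 * s / n**2


print(mean_abs_diff_pairs([1, 1.5, 2]), mean_abs_diff_sorted([1, 1.5, 2]))
```

For 50-100 entries either version is instant; the sorted one only matters for much longer vectors.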

Related

spdiags and features scaling

According to the libsvm FAQ, the following one-line code scales each feature to the range [0,1] in Matlab:
(data - repmat(min(data,[],1),size(data,1),1))*spdiags(1./(max(data,[],1)-min(data,[],1))',0,size(data,2),size(data,2))
so I'm using this code:
v_feature_trainN=(v_feature_train - repmat(mini,size(v_feature_train,1),1))*spdiags(1./(maxi-mini)',0,size(v_feature_train,2),size(v_feature_train,2));
v_feature_testN=(v_feature_test - repmat(mini,size(v_feature_test,1),1))*spdiags(1./(maxi-mini)',0,size(v_feature_test,2),size(v_feature_test,2));
where I use the first one to train the classifier and the second one to classify...
In my humble opinion, scaling should be performed as (x - min) / (max - min), using the minimum and maximum of the training data,
i.e.:
v_feature_trainN2=(v_feature_train -min(v_feature_train(:)))./(max(v_feature_train(:))-min(v_feature_train(:)));
v_feature_test_N2=(v_feature_test -min(v_feature_train(:)))./(max(v_feature_train(:))-min(v_feature_train(:)));
I then compared the classification results of these two scaling methods, and the first one outperforms the second.
My questions are:
1) What exactly does the first method do? I don't understand it.
2) Why does the code suggested by libsvm outperform the second one (e.g. 80% vs. 60%)?
Thank you so much in advance
First of all:
The code from the libsvm FAQ does something different from your code:
It maps every column independently onto the interval [0,1].
Your code however uses the global min and max to map all the columns using the same affine transformation instead of a separate transformation for each column.
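To make the difference concrete, here is a small Python sketch on hypothetical toy data (two features with very different ranges). It assumes that mini and maxi in the question are the per-column minima and maxima of the training set, which the question does not show. Per-column scaling maps every feature onto [0,1]; the global version leaves the small-range feature squeezed near 0. A plausible (not verified here) reason for the accuracy gap is that a distance-based kernel then barely "sees" that feature.

```python
def fit_column_minmax(train):
    """Per-column minimum and range (max - min), from the training rows only."""
    cols = list(zip(*train))
    mins = [min(c) for c in cols]
    ranges = [max(c) - min(c) for c in cols]
    return mins, ranges


def apply_column_minmax(data, mins, ranges):
    """Map column j through (v - mins[j]) / ranges[j]; assumes no constant column."""
    return [[(v - lo) / r for v, lo, r in zip(row, mins, ranges)] for row in data]


def apply_global_minmax(data, lo, hi):
    """One affine map for every entry (the question's second method)."""
    return [[(v - lo) / (hi - lo) for v in row] for row in data]


# Hypothetical toy data: feature 0 lives in [0, 1], feature 1 in [1000, 3000].
train = [[0.0, 1000.0], [1.0, 3000.0], [0.5, 2000.0]]
test = [[0.25, 2500.0]]

mins, ranges = fit_column_minmax(train)          # role of mini and maxi - mini
print(apply_column_minmax(train, mins, ranges))  # [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
print(apply_column_minmax(test, mins, ranges))   # [[0.25, 0.75]]

flat = [v for row in train for v in row]
print(apply_global_minmax(train, min(flat), max(flat)))
# feature 0 now only spans [0, 1/3000]; feature 1 spans [1/3, 1]
```

Note that the test rows reuse the training statistics, as the question's code does with mini and maxi.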
The first code works in the following way:
(data - repmat(min(data,[],1),size(data,1),1))
This subtracts each column's minimum from the entire column. It does this by computing the row vector of minima min(data,[],1) which is then replicated to build a matrix the same size as data. Then it is subtracted from data.
spdiags(1./(max(data,[],1)-min(data,[],1))',0,size(data,2),size(data,2))
This generates a diagonal matrix. The entry (i,i) of this matrix is 1 divided by the difference of the maximum and the minimum of the ith column: max(data(:,i))-min(data(:,i)).
The right multiplication of this diagonal matrix means: Multiply each column of the left matrix with the corresponding diagonal entry. This effectively divides column i by max(data(:,i))-min(data(:,i)).
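The claim that right-multiplying by a diagonal matrix scales each column can be checked with a tiny Python sketch (dense lists here for clarity; spdiags stores the same matrix sparsely, presumably to avoid allocating all the zeros). The hypothetical `centered` matrix stands for data - repmat(min(data,[],1),...).

```python
def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def diag(values):
    """Square matrix with `values` on the main diagonal and zeros elsewhere."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


centered = [[0.0, 0.0], [1.0, 2000.0], [0.5, 1000.0]]  # data minus column minima
scale = [1 / 1.0, 1 / 2000.0]                          # 1 / (max - min) per column

via_diag = matmul(centered, diag(scale))
via_columns = [[v * s for v, s in zip(row, scale)] for row in centered]
print(via_diag == via_columns)  # True
```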
Instead of using a sparse diagonal matrix, you could do this even more efficiently with bsxfun:
bsxfun(@rdivide, ...
bsxfun(@minus, ...
data, min(data,[],1)), ...
max(data,[],1)-min(data,[],1))
Which is the matlab way of writing:
Divide:
The difference of:
each column and its respective minimum
by the difference of each column's max and min.
I know this has already been answered correctly, but I would like to present another solution that I think is also correct and that I found more intuitive and shorter than the one presented by knedlsepp. I am new to MATLAB, and while studying knedlsepp's solution I found it more intuitive to solve this problem with the following formula:
function [ output ] = feature_scaling( y)
output = (y - repmat(min(y),size(y,1),1)) * diag(1./(max(y) - min(y)));
end
I find it a bit easier to use diag this way instead of spdiags, and I believe it produces the same result for the purpose of this exercise (diag builds a full matrix, whereas spdiags builds a sparse one).
Multiplying the first term by the second scales each column j of the matrix (y - min(y)) by 1/(max(y(:,j)) - min(y(:,j))), i.e. it divides each column by its own range, which achieves the desired result.
In case someone prefers a shorter version, maybe this can be of help.

Purpose of matrix length

Matlab defines the matrix function length to return
Length of largest array dimension
What is an example use of knowing the largest dimension? Knowing the number of rows or columns has obvious uses, but I don't see why someone would want the largest dimension regardless of whether it is the rows or the columns.
Thank You
In fact, most of my code wants to do things exactly once for each row, for each column or for each element.
Therefore, I typically use one of these
size(M,1)
size(M,2)
numel(V)
In particular do not depend on length to match the number of elements in a vector!
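The pitfall can be illustrated with a shape-only Python sketch. The helper mimics MATLAB's documented behaviour of length (0 for an empty array, otherwise max(size(X))); it is a stand-in for illustration, not MATLAB itself. A loop such as for i = 1:length(M) that is meant to visit the rows of a 2-by-5 matrix would run 5 times instead of 2.

```python
def matlab_length(shape):
    """Mimics MATLAB's documented length(): 0 for an empty array,
    otherwise the largest dimension, i.e. max(size(X))."""
    return 0 if 0 in shape else max(shape)


def numel(shape):
    """Total number of elements, like numel(X)."""
    n = 1
    for d in shape:
        n *= d
    return n


for shape in [(1, 4), (4, 1), (2, 5), (5, 2), (0, 3)]:
    print(shape, "length:", matlab_length(shape),
          "size(M,1):", shape[0], "numel:", numel(shape))
```

For the 1x4 and 4x1 vectors all three agree only where expected; for the 2x5 matrix, length (5) matches neither the row count (2) nor numel (10).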
The only real convenience that I found for length (in older versions of MATLAB) is when I need a repeat-until style loop rather than a while loop. Then it is convenient that the length of a vector is usually at least one.
Some other uses that I had for length:
A quick rough check whether something is big.
Making something square, as mentioned by @Mike
This question raises a good point, and I have seen programs fail because length was applied to matrices (for looping), especially when one expects it to return size(M, n) because the n-th dimension is supposed to be the largest. Overall, I cannot see an advantage in allowing length to be applied to matrices; in fact I only see risks from possibly unexpected behavior.
If I want to know the largest dimension of any matrix, I would prefer to be more explicit and use max(size(M)), which also should be much clearer for anyone reading this code.
I am not sure whether the following example belongs in this answer, but it addresses a similar point.
It is also useful to be explicit about the dimension when averaging over matrices. Consider the case where you always want to average over the first dimension, i.e. take the mean of each column. As long as your matrix is of size n x m with n greater than 1, you do not have to specify a dimension. But in unforeseen cases where your matrix happens to be a row vector, things get messy:
%// good case, where num of rows is 2 or greater
size(mean(rand(2, 4), 1)) %// [1, 4]
size(mean(rand(2, 4))) %// [1, 4]
%// bad case, where num of rows is 1
size(mean(rand(1, 4), 1)) %// [1, 4]
size(mean(rand(1, 4))) %// [1, 1], returns the average of that row
If you want to create a square matrix B that can contain a non-square input matrix A, you can use length(A) to initialize B as a square matrix of zeros with that many rows and columns, then copy A into the new zero matrix.
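A minimal Python sketch of that padding idea, with the matrix as a list of equal-length rows. In MATLAB the same idea is roughly B = zeros(length(A)); B(1:size(A,1), 1:size(A,2)) = A; for a non-empty 2-D A.

```python
def pad_to_square(a, fill=0):
    """Copy `a` (list of equal-length rows) into the top-left corner of a
    k-by-k matrix of `fill`, where k = max(rows, cols), i.e. length(A)."""
    rows = len(a)
    cols = len(a[0]) if rows else 0
    k = max(rows, cols)
    b = [[fill] * k for _ in range(k)]
    for i in range(rows):
        for j in range(cols):
            b[i][j] = a[i][j]
    return b


print(pad_to_square([[1, 2, 3], [4, 5, 6]]))
# [[1, 2, 3], [4, 5, 6], [0, 0, 0]]
```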
Another example - the one I use most - is when working with vectors. There it is very convenient to work with length instead of size(vec,1) or size(vec,2) as it doesn't matter if it is a row or a column vector.
As @Dennis Jaheruddin pointed out, length gave wrong results for empty vectors in some versions of MATLAB. Using numel instead of length might therefore be preferable for better backward compatibility. The readability of the code is almost the same, IMHO.
This question compares length and numel and their performance, and comes to the result that they perform similarly up to 100k elements in a vector. With more than 100k elements, numel appears to be faster. I tried to verify this (with MATLAB R2014a) and came to the following results:
Here, length is a bit slower, but since the difference is in the range of microseconds, I doubt it makes a real difference in speed.

how to calculate the distance between two vectors in matlab

Can you help me? I have a 480 (rows) x 256 (columns) feature matrix extracted with the LBP operator, and I need to compute the similarity matrix to apply the verification scenario.
E.g. vector one compared with itself gives zero, vector one compared with vector two gives a score, and so on.
I am doing this because I need to calculate the false accept rate and false reject rate
(FAR, FRR) for a given threshold.
Thanks in advance
Use the pdist function. Note that it considers rows as instances (so you might want to transpose the matrix if you want to apply it to column vectors).
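pdist returns the pairwise distances as a vector; squareform(pdist(X)) turns it into the full n-by-n matrix with zeros on the diagonal. The Python sketch below builds that matrix directly (Euclidean, pdist's default metric) and adds a hypothetical FAR/FRR helper. It assumes each row is one LBP feature vector and that a label per row says which subject it belongs to; a pair counts as accepted when its distance is at or below the threshold (with similarity scores the comparison would flip).

```python
import math


def pairwise_distances(rows):
    """Full n-by-n Euclidean distance matrix between rows (instances),
    analogous to squareform(pdist(X)); the diagonal is 0."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = math.dist(rows[i], rows[j])
    return d


def far_frr(d, labels, threshold):
    """Hypothetical helper. Returns (FAR, FRR) where
    FAR = accepted impostor pairs / impostor pairs and
    FRR = rejected genuine pairs / genuine pairs."""
    fa = fr = impostor = genuine = 0
    n = len(d)
    for i in range(n):
        for j in range(i + 1, n):
            accepted = d[i][j] <= threshold
            if labels[i] == labels[j]:
                genuine += 1
                fr += not accepted
            else:
                impostor += 1
                fa += accepted
    return fa / impostor, fr / genuine


feats = [[0, 0], [0, 1], [5, 5], [5, 6]]  # toy stand-in for the 480 LBP rows
labels = ["a", "a", "b", "b"]
d = pairwise_distances(feats)
for t in (0.5, 2.0, 7.0):
    print(t, far_frr(d, labels, t))
```

Sweeping the threshold and plotting the two rates against each other gives the usual FAR/FRR trade-off.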

different sized bins in matlab

In Matlab I have a vector Muen which I want to reduce in size by dividing it into bins of different lengths. The vector has a few values that need high-accuracy bins and a lot of values that are roughly equal and could be collected into bins of up to a few hundred values.
I also need to know the index of every old bin that goes into a new bin, in order to shorten a second vector, fluence.
The goal is to speed up the summation sum(fluence.*Muen) by using different-sized bins determined by Muen, and to sum fluence into the new bins before the vector multiplication.
For this I try to use
edges=[min(Muen):0.0001:Muen(13),Muen(12:-1:1)];
[N,bin]=histc(Muen,edges)
The problem is how to build the vector edges, as there is a large difference between the maximum and minimum of Muen and a small difference between the other values. Is there a way to make the steps of edges depend on the derivative of Muen?
To get the shorter version of Muen, I would use something like
MuenShort=N.*edges;
but it did not work quite right (the fault could be in edges). Any suggestions?
I also do not really understand how bin gives the index of the values that go into the new bins.
Clarification:
What I want to do is take the elements of a vector m (Muen) that are roughly equal, replace them with one element, and at the same time keep track of which indices go into each element of the new vector n (MuenShort). Example:
{m1}->n1,(1), {m2}->n2,(2), {m3,m4}-> m3=m4=n3,(3,4), {m5,m6,m7,m8}-> m5=m6=m7=m8=n4,(5,6,7,8)...
where n1>>n2, but the difference between n3 and n4 might not be so large. The number of m-elements in each n-element should be determined by the number of m-elements that are roughly equal to each other, or rather that lie between two limits. So the bin size should vary from one element to a few hundred elements.
Then I want to use indexes to make the fluence vector shorter
fluenceShort(1:length(MuenShort))= [sum(fluence(1)),sum(fluence(2)),sum(fluence([3 4])),sum(fluence(5:8))...];
goal=sum(fluenceShort.*MuenShort)
Is there a way to implement this in Matlab?
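One possible reading of the clarification, sketched in Python: walk Muen in its original order and keep adding elements to the current bin while they stay within a relative tolerance of the bin's first element. The tolerance rule and the names `group_runs`, `rel_tol` and `shorten` are hypothetical choices for illustration, not something from the question. Each bin keeps its original indices, which is what is needed to build fluenceShort.

```python
def group_runs(values, rel_tol):
    """Hypothetical grouping: a new bin starts whenever a value differs from
    the current bin's first value by more than rel_tol (relative).
    Returns a list of bins, each a list of original indices."""
    if not values:
        return []
    bins = [[0]]
    for i in range(1, len(values)):
        start = values[bins[-1][0]]
        if abs(values[i] - start) <= rel_tol * abs(start):
            bins[-1].append(i)
        else:
            bins.append([i])
    return bins


def shorten(muen, fluence, bins):
    """MuenShort = mean Muen per bin; fluenceShort = summed fluence per bin."""
    muen_short = [sum(muen[i] for i in b) / len(b) for b in bins]
    fluence_short = [sum(fluence[i] for i in b) for b in bins]
    return muen_short, fluence_short


muen = [100.0, 10.0, 2.0, 2.01, 1.0, 1.005, 0.999, 1.002]
fluence = [1.0] * len(muen)
bins = group_runs(muen, rel_tol=0.05)
print(bins)  # [[0], [1], [2, 3], [4, 5, 6, 7]]
muen_short, fluence_short = shorten(muen, fluence, bins)
goal = sum(m * f for m, f in zip(muen_short, fluence_short))
exact = sum(m * f for m, f in zip(muen, fluence))
print(goal, exact)
```

With constant fluence inside each bin the binned sum equals the exact one; otherwise it is an approximation. Building the bins is itself O(N), so this only saves time if the same bins are reused for many fluence vectors.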
Although I don't fully understand your question, I would suggest this: sort your vector muen, pick a fixed number n, and define each bin so that it contains exactly n values of muen. For simplicity, the length of muen is assumed to be a multiple of n:
n = 10;
muen_sorted = sort(muen(:)).'; % sorted row vector (the edges line below assumes a row)
m = length(muen_sorted)/n;
edges = [-inf mean([muen_sorted(n:n:end-1); muen_sorted(n+1:n:end)]) inf ];
muen_short = mean(reshape(muen_sorted,n,m));
Note that m+1 edges (vector edges) are obtained, corresponding to m bins. Bin edges lie exactly halfway between the closest values of neighbouring bins. Thus, the upper edge of the first bin is (muen_sorted(n)+muen_sorted(n+1))/2; the upper edge of the next bin is (muen_sorted(2*n)+muen_sorted(2*n+1))/2, and so on.
The "representative value" of each bin (vector muen_short) is computed as the mean of the values that lie in that bin. Or perhaps the median would make more sense, depending on your application.
As a result of this code, muen_short(1) is the value corresponding to the bin with edges edges(1) and edges(2); muen_short(2) is the value corresponding to the bin with edges edges(2) and edges(3), etc.
You can now use the variable edges to build the histogram of fluence with those same edges.
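The equal-count scheme can be sketched in Python as well, with one addition the question asked about: keeping the original indices. Sorting an index list (like the second output of MATLAB's sort) gives, for every bin, the positions in Muen it came from, so fluence can be summed per bin directly. The function name is hypothetical, and the length is assumed to be a multiple of n, as in the answer.

```python
def equal_count_bins(muen, n):
    """Sort muen, cut it into consecutive groups of n values and return
    (bins, muen_short, edges): bins[k] lists the ORIGINAL indices in bin k,
    muen_short[k] is the bin mean, and each inner edge sits halfway between
    the largest value of one bin and the smallest value of the next."""
    if len(muen) % n:
        raise ValueError("len(muen) must be a multiple of n")
    order = sorted(range(len(muen)), key=lambda i: muen[i])  # like [~, idx] = sort(muen)
    m = len(muen) // n
    bins = [order[k * n:(k + 1) * n] for k in range(m)]
    muen_short = [sum(muen[i] for i in b) / n for b in bins]
    inner = [(muen[bins[k][-1]] + muen[bins[k + 1][0]]) / 2 for k in range(m - 1)]
    edges = [float("-inf")] + inner + [float("inf")]
    return bins, muen_short, edges


muen = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]
fluence = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
bins, muen_short, edges = equal_count_bins(muen, 2)
print(bins)        # [[1, 3], [5, 2], [0, 4]]
print(muen_short)  # [1.5, 3.5, 5.5]
print(edges)       # [-inf, 2.5, 4.5, inf]
fluence_short = [sum(fluence[i] for i in b) for b in bins]
approx = sum(m * f for m, f in zip(muen_short, fluence_short))
```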

randomly pick number from a matrix in matlab

How can I randomly pick a number from the following matrix?
A=[0.06 0.47 0.47]
I just want to randomly pick a number from the matrix above. I am doing this in the MATLAB environment. Please help.
Also, is it possible to define a variable in MATLAB that tends to zero, like we do with limits?
If your matrix is M then to pick a random element with uniform probability you can use randi:
M(randi(numel(M)))
Yes, using randi:
A(randi(numel(A)))
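The randi approach picks a random position, not a random distinct value. A Python sketch of the same idea (randrange is 0-based, whereas MATLAB's randi(numel(A)) returns 1..numel(A)) shows the consequence for this A: because 0.47 occupies two of the three positions, it comes out about two thirds of the time. Python's random.choice(A) would behave the same way.

```python
import random


def pick_uniform(a, rng=random):
    """Return one element of `a`, each POSITION equally likely,
    like A(randi(numel(A))) in MATLAB."""
    return a[rng.randrange(len(a))]


A = [0.06, 0.47, 0.47]
rng = random.Random(0)  # seeded only to make the demo reproducible
draws = [pick_uniform(A, rng) for _ in range(30000)]
print(draws.count(0.47) / len(draws))  # close to 2/3
```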