Maintaining order of corresponding matrices in MATLAB - matlab

I have a logistic regression model and I want to create a lift chart to show its efficacy. To do that, I need to order my validation set by descending predicted probability. The sort itself is easy in MATLAB, but I need to know how it changes the order of my predictions so that I can reorder the actual values of the validation set accordingly. Is there a simple way to do this without writing code?

Use the second output of sort, which returns the permutation indices:
[As,inds] = sort(A,'descend');
Bs = B(inds);
Note that if you have your vectors in a single matrix, you can use sortrows. For example, if you want to sort a matrix X according to the second column:
Y = sortrows(X,-2) % -2 means second column, descending
Y1 = Y(:,1); % first column of X sorted according to X(:,2)
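As a sketch of the lift-chart use case (the variable names here are illustrative, not from the original post), the same index vector reorders the actual outcomes:

```matlab
pred   = [0.2; 0.9; 0.5; 0.7];    % hypothetical predicted probabilities
actual = [0;   1;   0;   1];      % corresponding observed outcomes
[predSorted, order] = sort(pred, 'descend');
actualSorted = actual(order);     % actuals now aligned with the sorted predictions
cumGain = cumsum(actualSorted) / sum(actualSorted);  % cumulative gain curve for the lift chart
```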

Related

Is there a way to parse each row of a matrix in Octave?

I am new to Octave and I wanted to know if there is a way to parse each row of a matrix and use it individually. Ultimately I want to use the rows to check whether they are all perpendicular (orthogonal) to each other (the dot product has to be equal to 0 for two vectors to be perpendicular), so if you have some ideas about that I would love to hear them. I also wanted to know if there is a function to determine the length (or magnitude) of a vector.
Thank you in advance.
If by "parse each row" you mean a loop that takes each row one by one, you only need a for loop over the transposed matrix. This works because the for loop takes successive columns of its argument.
Example:
A = [10 20; 30 40; 50 60];
for row = A.'; % loop over columns of transposed matrix
row = row.'; % transpose back to obtain rows of the original matrix
disp(row); % do whatever you need with each row
end
However, loops can often be avoided in Matlab/Octave, in favour of vectorized code. For the specific case you mention, computing the dot product between each pair of rows of A is the same as computing the matrix product of A times itself transposed:
A*A.'
However, for the general case of a complex matrix, the dot product is defined with a complex conjugate, so you should use the complex-conjugate transpose:
P = A*A';
Now P(m,n) contains the dot product of the m-th and n-th rows of A. The condition you want to test is equivalent to P being a diagonal matrix:
result = isdiag(P); % gives true or false
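A minimal sketch tying both parts of the question together (the norm call addresses the vector-length part; norm(v) returns the Euclidean length of v):

```matlab
A = [1 0 0; 0 2 0; 0 0 3];   % rows are mutually orthogonal by construction
P = A * A';                  % P(m,n) = dot product of rows m and n
result = isdiag(P)           % true: all off-diagonal dot products are zero
len = norm(A(1,:))           % Euclidean length (magnitude) of the first row
```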

Pairwise Similarity and Sorting Samples

The following is a problem from an assignment that I am trying to solve:
Visualization of similarity matrix. Represent every sample with a four-dimension vector (sepal length, sepal width, petal length, petal width). For every two samples, compute their pair-wise similarity. You may do so using the Euclidean distance or other metrics. This leads to a similarity matrix where the element (i,j) stores the similarity between samples i and j. Please sort all samples so that samples from the same category appear together. Visualize the matrix using the function imagesc() or any other function.
Here is the code I have written so far:
load('iris.mat'); % create a table of the data
iris.Properties.VariableNames = {'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width' 'Class'}; % change the variable names to their actual meaning
iris_copy = iris(1:150,{'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width'}); % make a copy of the (numerical) features of the table
iris_distance = table2array(iris_copy); % convert the table to an array
% pairwise similarity
D = pdist(iris_distance); % calculate the Euclidean distance and store the result in D
W = squareform(D); % convert to squareform
figure()
imagesc(W); % visualize the matrix
Now, I think I've got the coding mostly right to answer the question. My issue is how to sort all the samples so that samples from the same category appear together because I got rid of the names when I created the copy. Is it already sorted by converting to squareform? Other suggestions? Thank you!
It should be in the same order as the original data. While you could sort it afterwards, the easiest solution is to actually sort your data by class after line 2 and before line 3.
load('iris.mat'); % create a table of the data
iris.Properties.VariableNames = {'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width' 'Class'}; % change the variable names to their actual meaning
% Sort the table here on the "Class" attribute. Don't forget to change the table name
% in the next line too if you need to.
iris_copy = iris(1:150,{'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width'}); % make a copy of the (numerical) features of the table
Consider using sortrows:
tblB = sortrows(tblA,'RowNames') sorts a table based on its row names. Row names of a table label the rows along the first dimension of the table. If tblA does not have row names, that is, if tblA.Properties.RowNames is empty, then sortrows returns tblA.
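Applied to this problem (assuming the Class variable holds the category labels), a minimal sketch of the sorting step would be:

```matlab
% Sort the whole table by class so that same-category samples are adjacent,
% then extract the numeric features from the sorted table as before.
iris_sorted = sortrows(iris, 'Class');
iris_copy = iris_sorted(:, {'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width'});
W = squareform(pdist(table2array(iris_copy)));  % pairwise Euclidean distances
imagesc(W);   % same-class blocks now appear along the diagonal
```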

Difference in Matlab results when using PCA() and PCACOV()

The closest match I can get is to run:
data=rand(100,10); % data set
[W,pc] = pca(cov(data));
then, without demeaning the data:
data2 = data;
[W2, EvalueMatrix2] = eig(cov(data2));
[W3, EvalueMatrix3] = svd(cov(data2));
In this case W2 and W3 agree, and W is the transpose of them.
It is still not clear to me why W should be the transpose of the other two.
As an extra check I use pcacov:
[W4, EvalueMatrix4] = pcacov(cov(data2));
Again it agrees with W2 and W3 but is the transpose of W?
The results are different because you're subtracting the mean of each row of the data matrix. Based on the way you're computing things, rows of the data matrix correspond to data points and columns correspond to dimensions (this is how the pca() function works too). With this setup, you should subtract the mean from each column, not row. This corresponds to 'centering' the data; the mean along each dimension is set to zero. Once you do this, results should be equivalent to pca(), up to sign flips.
Edit to address edited question:
Centering issue looks ok now. When you run the eigenvalue decomposition on the covariance matrix, remember to sort the eigenvectors in order of descending eigenvalues. This should match the output of pcacov(). When calling pca(), you have to pass it the data matrix, not the covariance matrix.

Find the minimum absolute values along the third dimension in a 3D matrix and ensuring the sign is maintained

I have an m x n x k matrix and I want to find the elements that have minimal absolute value along the third dimension for each unique 2D spatial coordinate. An additional constraint is that once I find these minimum values, the sign of these values (i.e. before I took the absolute value) must be maintained.
The code I wrote to accomplish this is shown below.
tmp = abs(dist); %the size(dist)=[m,n,k]
[v,ind] = min(tmp,[],3); %find the index of minimal absolute value in the 3rd dimension
ind = reshape(ind,m*n,1);
[col,row] = meshgrid(1:n,1:m);
row = reshape(row,m*n,1);
col = reshape(col,m*n,1);
ind2 = sub2ind(size(dist),row,col,ind); % row, col, ind are sub
dm = dist(ind2); %take the signed value from dist
dm = reshape(dm,m,n);
The resulting m x n matrix dm satisfies the constraints I mentioned previously. However, this code seems a little inefficient since I have to generate linear indices. Is there any way to improve it?
If I'm interpreting your problem statement correctly, you wish to find the minimum absolute value along the third dimension for each unique 2D spatial coordinate in your 3D matrix. This is already being done by the first two lines of your code.
However, a small caveat is that once you find these minimum values, you must ensure that the original sign of these values (i.e. before taking the absolute value) are respected. That is the purpose of the rest of the code.
If you want to select the original values, you don't have a choice but to generate the correct linear indices and sample from the original matrix. However, a lot of code is rather superfluous. There is no need to perform any kind of reshaping.
We can simplify your method by using ndgrid to generate the correct spatial coordinates to sample from the 3D matrix then use ind from your code to reference the third dimension. After, use this to sample dist and complete your code:
%// From your code
[v,ind] = min(abs(dist),[],3);
%// New code
[row,col] = ndgrid(1:size(dist,1), 1:size(dist,2));
%// Output
dm = dist(sub2ind(size(dist), row, col, ind));
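As an alternative, newer MATLAB releases (R2019a and later, if I recall the version correctly) let min return linear indices directly via the 'linear' flag, which removes the need for ndgrid and sub2ind altogether:

```matlab
% Requires the 'linear' option of min (R2019a+):
[v, indLin] = min(abs(dist), [], 3, 'linear');
dm = dist(indLin);   % signed values at the locations of the minimal absolute values
```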

Matlab code to compare two histograms

I want to compare two image histograms. They are as follows:
h1 --> double-valued 1-dimensional vector, 4096 in length.
h2 --> double-valued 1-dimensional vector, 4096 in length.
I am using this matlab function here:
http://clickdamage.com/sourcecode/code/compareHists.m
It is as follows:
% s = compareHists(h1,h2)
% returns a histogram similarity in the range 0..1
%
% Compares 2 normalised histograms using the Bhattacharyya coefficient.
% Assumes that sum(h1) == sum(h2) == 1
%
function s = compareHists(h1,h2)
s = sum(sum(sum(sqrt(h1).*sqrt(h2))));
My question is: is there a need for multiple sums?
Even if there is only one sum in the above equation, it would suffice, right?
Like this: sum(sqrt(h1).*sqrt(h2))
Can someone please explain the code above? Also, tell me whether using a single sum will be all right.
I tried both ways and got the same answer for two image histograms. I did this with only two histograms, not more, and hence want to be sure.
Thanks!
In general, sum does the sum along one dimension only. If you want to sum along multiple dimensions you either
use sum several times; or
use linear indexing to reduce to a single dimension and then use sum once: sum(sqrt(h1(:)).*sqrt(h2(:))).
In your case, if there's only one dimension, yes, a single sum would suffice.
You are right. Only one sum is needed. However, if either h1 or h2 is a multidimensional matrix, then you may want to sum as many as the dimensions. For example:
A=magic(4); % a 4 by 4 matrix of magic numbers.
sum(A) % returns [34,34,34,34], i.e. the sum of elements in each column.
sum(sum(A)) % returns 136, i.e. the sum of all elements in A.
I believe the code you downloaded was originally written to handle multiple histograms stacked as columns of a matrix. This is (IMHO) the reason for the multiple sums.
In your case you can leave it with only one sum.
You can do even better - without any sum
s = sqrt(h1(:)')*sqrt(h2(:));
The trick is to use vector multiplication!
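A quick numeric check that the three formulations agree on vector histograms (toy data, not from the post):

```matlab
h1 = rand(4096, 1); h1 = h1 / sum(h1);   % normalised toy histograms
h2 = rand(4096, 1); h2 = h2 / sum(h2);
s1 = sum(sqrt(h1) .* sqrt(h2));          % single sum over the vector
s2 = sum(sqrt(h1(:)) .* sqrt(h2(:)));    % linear indexing, same value
s3 = sqrt(h1(:)') * sqrt(h2(:));         % inner product, same value again
```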
I don't see any point in 3 sums either, but if you have a matrix of histograms rather than a vector, you will need 2 sums, like sum(sum(sqrt(h1).*sqrt(h2))), to compare them. The inner sum collapses each column; the outer sum then adds up the resulting row vector.