According to the libsvm FAQ, the following one-line code scales each feature to the range [0,1] in MATLAB:
(data - repmat(min(data,[],1),size(data,1),1))*spdiags(1./(max(data,[],1)-min(data,[],1))',0,size(data,2),size(data,2))
so I'm using this code:
v_feature_trainN=(v_feature_train - repmat(mini,size(v_feature_train,1),1))*spdiags(1./(maxi-mini)',0,size(v_feature_train,2),size(v_feature_train,2));
v_feature_testN=(v_feature_test - repmat(mini,size(v_feature_test,1),1))*spdiags(1./(maxi-mini)',0,size(v_feature_test,2),size(v_feature_test,2));
where I use the first one to train the classifier and the second one to classify...
In my humble opinion, scaling should be performed by

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$

i.e.:
v_feature_trainN2=(v_feature_train -min(v_feature_train(:)))./(max(v_feature_train(:))-min((v_feature_train(:))));
v_feature_test_N2=(v_feature_test -min(v_feature_train(:)))./(max(v_feature_train(:))-min((v_feature_train(:))));
I then compared the classification results of these two scaling methods, and the first one outperforms the second.
My questions are:
1) What exactly does the first method do? I don't understand it.
2) Why does the code suggested by libsvm outperform the second one (e.g. 80% vs. 60%)?
Thank you so much in advance
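First of all, the code described in the libsvm FAQ does something different from your code: it maps every column independently onto the interval [0,1].
Your code, however, uses the global min and max to map all the columns with the same affine transformation, instead of a separate transformation for each column. As a result, a feature whose values span a small range stays squeezed into a tiny part of [0,1] and contributes little to distance-based kernels such as the RBF kernel, while features with large ranges dominate.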
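A small sketch may make the difference concrete. The matrix below is made up for illustration (it is not the question's data): its first column spans 0 to 1000 and its second 0 to 1.

```matlab
% Toy data (made up for illustration): column 1 spans 0..1000, column 2 spans 0..1
data = [0 0; 500 0.5; 1000 1];

% Per-column scaling (libsvm FAQ): each column is mapped onto [0,1] independently
colMin    = min(data, [], 1);
colRange  = max(data, [], 1) - colMin;
perColumn = bsxfun(@rdivide, bsxfun(@minus, data, colMin), colRange);

% Global scaling (the question's version): one affine map for the whole matrix
globalScaled = (data - min(data(:))) ./ (max(data(:)) - min(data(:)));

disp(perColumn)     % both columns become [0; 0.5; 1]
disp(globalScaled)  % column 2 is squeezed into [0; 0.0005; 0.001]
```

With the global version, a distance-based kernel sees almost nothing of column 2, which is one plausible reason for the accuracy gap reported in the question.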
The first code works in the following way:
(data - repmat(min(data,[],1),size(data,1),1))
This subtracts each column's minimum from the entire column. It does this by computing the row vector of minima min(data,[],1), replicating it with repmat to build a matrix of the same size as data, and subtracting that matrix from data.
spdiags(1./(max(data,[],1)-min(data,[],1))',0,size(data,2),size(data,2))
This generates a diagonal matrix. The entry (i,i) of this matrix is 1 divided by the difference of the maximum and the minimum of the ith column: max(data(:,i))-min(data(:,i)).
Multiplying by this diagonal matrix from the right multiplies each column of the left matrix by the corresponding diagonal entry. This effectively divides column i by max(data(:,i))-min(data(:,i)).
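To see the right-multiplication rule in isolation, here is a tiny check with made-up numbers. D is built with the same spdiags pattern as the FAQ code; bsxfun(@times, ...) is included only as an equivalent cross-check.

```matlab
A = [1 2; 3 4];
d = [10 100];                 % made-up per-column factors

% Sparse 2x2 diagonal matrix with d on the main diagonal (same pattern as the FAQ code)
D = spdiags(d', 0, 2, 2);

% Right-multiplying by D scales column j of A by d(j)
B = full(A * D);
disp(B)   % [10 200; 30 400]

% Same result without building a matrix
B2 = bsxfun(@times, A, d);
```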
Instead of using a sparse diagonal matrix, you could do this even more efficiently with bsxfun:
bsxfun(@rdivide, ...
bsxfun(@minus, ...
data, min(data,[],1)), ...
max(data,[],1)-min(data,[],1))
This is the MATLAB way of writing: divide the difference between each column and its respective minimum by the difference between each column's max and min.
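Since MATLAB R2016b (and in GNU Octave), arithmetic operators expand singleton dimensions automatically ("implicit expansion"), so the bsxfun calls above can be written directly. The sketch below uses made-up data with a deliberately constant column, because max - min is then 0 and the division yields NaN; mapping such a column to 0 is just one possible convention, not something the libsvm FAQ prescribes.

```matlab
data = [1 5 7; 3 5 9; 2 5 8];   % made-up data; column 2 is constant

% bsxfun version from the answer
viaBsxfun = bsxfun(@rdivide, ...
                   bsxfun(@minus, data, min(data, [], 1)), ...
                   max(data, [], 1) - min(data, [], 1));

% Implicit expansion (MATLAB R2016b+ / Octave): the same computation
colMin   = min(data, [], 1);
colRange = max(data, [], 1) - colMin;
viaExpansion = (data - colMin) ./ colRange;

% A constant column has range 0, giving 0/0 = NaN; here it is mapped to 0 instead
colRange(colRange == 0) = 1;
safeScaled = (data - colMin) ./ colRange;
```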
I know this has already been answered correctly, but I would like to present another solution that I think is also correct and that I found more intuitive/shorter than the one presented by knedlsepp. I am new to MATLAB, and as I was studying knedlsepp's solution, I found it more intuitive to solve this problem with the following formula:
function [ output ] = feature_scaling( y)
output = (y - repmat(min(y),size(y,1),1)) * diag(1./(max(y) - min(y)));
end
I find it a bit easier to use diag this way instead of spdiags, and I believe it produces the same result for the purpose of this exercise (diag builds a full n-by-n matrix, which only matters in terms of memory when there are many features).
Multiplying the first term by the second effectively divides each column of the matrix (y - min(y)) by that column's own range, max(y) - min(y), achieving the desired result.
In case someone prefers a shorter version, maybe this can be of help.
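A quick way to convince yourself that the diag and spdiags versions agree is to evaluate both on the same input. Below, feature_scaling is rewritten as an anonymous function (featureScalingDiag, a hypothetical name) so the snippet runs as a script; featureScalingSparse is the FAQ one-liner wrapped the same way, and y is made-up data.

```matlab
% Anonymous-function version of feature_scaling from the answer above
featureScalingDiag = @(y) (y - repmat(min(y), size(y,1), 1)) * diag(1 ./ (max(y) - min(y)));

% The libsvm FAQ one-liner, wrapped the same way
featureScalingSparse = @(data) (data - repmat(min(data,[],1), size(data,1), 1)) * ...
    spdiags(1 ./ (max(data,[],1) - min(data,[],1))', 0, size(data,2), size(data,2));

y = [2 10; 4 30; 6 20];               % made-up data
a = featureScalingDiag(y);
b = full(featureScalingSparse(y));    % full() in case the product is returned as sparse
```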
Related
I would like a function to calculate the KL distance between two histograms in MATLAB. I tried this code:
http://www.mathworks.com/matlabcentral/fileexchange/13089-kldiv
However, it says that I should have two distributions P and Q of sizes n x nbins. However, I am having trouble understanding how the author of the package wants me to arrange the histograms. I thought that providing the discretized values of the random variable together with the number of bins would suffice (I would assume the algorithm would use an arbitrary support to evaluate the expectations).
Any help is appreciated.
Thanks.
The function you link to requires that the two histograms passed be aligned and thus have the same size, NBIN x N (not N x NBIN); that is, if N>1, the number of rows in the inputs should equal the number of bins in the histograms. If you are just going to compare two histograms (that is, if N=1), it doesn't really matter: you can pass either row or column vector versions of these, as long as you are consistent and the order of bins matches.
A generic call to the function looks like this:
dists = kldiv(bins,P,Q)
The implementation allows comparison of multiple histograms to each other (that is, N>1), in which case pairs of columns (with matching column index) in each array are compared and the result is a row vector with distances for each matching pair.
Array bins should be the same size as P and Q and is used to perform a very minimal check that the inputs are of the same size, but is not used in the computation. The routine expects bins to contain the numeric labels of your bins so that it can check for repeated bin labels and warn you if repeats occur, but otherwise doesn't use the information.
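Here is a minimal sketch of producing aligned inputs for the single-histogram case (N = 1), following the layout described above: both histograms are computed with the same edges, so bin k means the same interval in P and Q, and bins simply holds the labels 1..NBIN. The samples, the edges, and the choice to normalize counts to probabilities are assumptions for illustration; kldiv itself is not called here.

```matlab
% Made-up raw samples for the two distributions
samplesP = [0.1 0.4 0.4 0.7 0.9 0.9];
samplesQ = [0.2 0.3 0.6 0.6 0.8 0.95];
edges = [0 0.25 0.5 0.75 1];          % shared edges, so the bins line up

% histc counts edges(k) <= x < edges(k+1); its last entry counts x == edges(end)
countsP = reshape(histc(samplesP, edges), [], 1);
countsQ = reshape(histc(samplesQ, edges), [], 1);

% Fold the "x == 1" entry into the last real bin
countsP(end-1) = countsP(end-1) + countsP(end);  countsP(end) = [];
countsQ(end-1) = countsQ(end-1) + countsQ(end);  countsQ(end) = [];

% Normalize to probabilities: NBIN x 1 columns, plus matching bin labels
P = countsP / sum(countsP);
Q = countsQ / sum(countsQ);
bins = (1:numel(P))';
```

These three arrays would then be passed in the kldiv(bins,P,Q) form shown above.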
You could do away with bins and compute the distance with
KL = sum(P .* (log2(P)-log2(Q)));
without using the MATLAB Central versions. However, the version you link to performs the above-mentioned minimal checks and in addition allows computation of two alternative distances (consult the documentation).
The version linked to by eigenchris checks that no histogram bins are empty (which would make the computation blow up numerically) and, if there are, removes their contribution to the sum (I am not sure this is entirely appropriate; consult an expert on the subject). You should probably also be aware of the exact form of the formula; specifically, note the use of log2 above versus the natural logarithm in the version linked to by eigenchris.
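The sketch below, with made-up normalized histograms, illustrates both points: the usual convention that bins with P = 0 contribute nothing (whereas a bin with Q = 0 and P > 0 makes the divergence infinite), and the fact that the log2 and natural-log versions differ only by the constant factor log(2), i.e. bits versus nats.

```matlab
% Made-up normalized histograms; P has an empty last bin
P = [0.5 0.5 0];
Q = [0.25 0.25 0.5];

% Convention: bins with P == 0 contribute 0 (0*log(0/q) = 0).
% If Q were 0 in a bin where P > 0, the divergence would be infinite.
mask = P > 0;

KL_bits = sum(P(mask) .* (log2(P(mask)) - log2(Q(mask))));  % the log2 form used above
KL_nats = sum(P(mask) .* (log(P(mask)) - log(Q(mask))));    % natural logarithm

% Same quantity in different units: KL_bits equals KL_nats / log(2)
```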
So I was trying to spread the elements of one matrix, generated with poissrnd, onto another using a bigger (wider?) probability function (for example 100 different possibilities with different weights), and then plot both of them to see if the fluctuations went down after spreading. After seeing that it doesn't work right (the fluctuations got bigger), I tried to identify what I did wrong on a really simple example. After testing it for a really long time I still can't understand what's wrong. The example goes like this:
1) I generate a vector with poissrnd and a vector for spreading (filled with zeros at the start).
2) Each element of the Poisson vector tells me how many numbers (0.1 of the element value) to generate from the possible options [1,2,3], with corresponding weights [0.2,0.5,0.2].
3) I spread what I got onto the other vector over 3 elements: the corresponding one (the k-th), the one before it, and the one after it. For example, if k=3, most should go into the 3rd element of the other vector, and the rest should go to the 2nd and 4th elements.
4) I plot both 0.1*poiss vector and the vector after spreading to check whether the fluctuations went down.
The way I generate weighted numbers is from this thread: Weighted random numbers in MATLAB
and this is the code I'm using:
clear all
clc
eta=0.1;
N=200;
fot=10000000;
ix=linspace(-100,100,N);
mn =poissrnd(fot/N, 1, N);
dataw=zeros(1,N);
a=1:3;
w=[.25,.5,.25];
for k=1:N
% rand expects an integer number of draws, and eta*mn(1,k) is generally not an integer
[~,R] = histc(rand(1,round(eta*mn(1,k))),cumsum([0;w(:)./sum(w)]));
R = a(R);
przydz=histc(R,a);
if (k>1) && (k<N)
dataw(1,k)=dataw(1,k)+przydz(1,2);
dataw(1,k-1)=dataw(1,k-1)+przydz(1,1);
dataw(1,k+1)=dataw(1,k+1)+przydz(1,3);
elseif k==1
dataw(1,k)=dataw(1,k)+przydz(1,2);
dataw(1,N)=dataw(1,N)+przydz(1,1);
dataw(1,k+1)=dataw(1,k+1)+przydz(1,3);
else
dataw(1,k)=dataw(1,k)+przydz(1,2);
dataw(1,k-1)=dataw(1,k-1)+przydz(1,1);
dataw(1,1)=dataw(1,1)+przydz(1,3);
end
end
plot(ix,eta*mn,'g',ix,dataw,'r')
The fluctuations are still bigger, and I can't identify what's wrong... Is the method for generating weighted numbers wrong in this situation? It doesn't seem so to me. The way I'm accumulating data from the first vector seems fine too. Is there another way I could do it (so that I could then optimize it for 'bigger' probability functions)?
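On the "another way" part: below is a minimal sketch, not the poster's code, that replaces the three-branch if with modular indexing and accumarray, so offsets and w can have any length. The values in mn are a made-up stand-in for the poissrnd output, and rounding eta*mn to an integer number of draws is an assumption about how fractional counts should be handled.

```matlab
% Made-up per-bin counts (in the question these come from poissrnd)
mn  = [40 50 60 70 80];
eta = 0.1;
N   = numel(mn);

% Spreading kernel: offsets relative to bin k and their weights (any length works)
offsets = -1:1;
w       = [0.25 0.5 0.25];
edges   = cumsum([0, w / sum(w)]);
edges(end) = 1;                 % guard against cumsum round-off so every rand() lands in a bin

dataw = zeros(1, N);
for k = 1:N
    nDraws = round(eta * mn(k));                         % integer number of draws
    [~, idx] = histc(rand(1, nDraws), edges);            % idx(j) is in 1..numel(w)
    counts = accumarray(idx(:), 1, [numel(w), 1]);       % draws per offset
    target = mod(k - 1 + offsets, N) + 1;                % periodic wrap-around, as in the question
    dataw  = dataw + accumarray(target(:), counts, [N, 1])';  % sums duplicates if the kernel is wider than N
end
```

For a 100-point kernel, only offsets and w would change.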
Sorry for my terrible English.
[EDIT]:
Here is a simple picture to show what I meant (I hope it's understandable)
How about trying the negative binomial distribution? It is often used as an overdispersed analogue of the Poisson distribution. Additional links can be found in this paper, as well as some apparatus in its supplement.
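If you want to try this, note that MATLAB's nbinrnd(R,P) (Statistics and Machine Learning Toolbox) counts failures before the R-th success, with mean R(1-P)/P and variance R(1-P)/P^2. Choosing P = R/(R+lambda) matches a Poisson mean lambda while giving variance lambda + lambda^2/R. The sketch only checks these formulas; r = 100 is a hypothetical choice, and the sampling line is left as a comment so the snippet runs without the toolbox.

```matlab
lambda = 50000;          % Poisson mean in the question (fot/N = 1e7/200)
r = 100;                 % hypothetical dispersion parameter (smaller r = more spread)
p = r / (r + lambda);    % chosen so that the negative binomial mean equals lambda

nbMean = r * (1 - p) / p;      % equals lambda
nbVar  = r * (1 - p) / p^2;    % equals lambda + lambda^2 / r
poissonVar = lambda;           % Poisson: variance equals the mean

% To draw samples (requires the Statistics and Machine Learning Toolbox):
% counts = nbinrnd(r, p, 1, 200);
```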
I have two matrices X and Y, both of size m x n. I want to create a new m x m matrix O such that each (i,j)-th entry is computed by applying a function to the i-th row of X and the j-th row of Y. In my case m = 10000 and n = 500. I tried using a loop, but it takes forever. Is there an efficient way to do it?
I am targeting two functions: the dot product dot(row_i, row_j) and exp(-1*norm(row_i-row_j)). But I was wondering if there is a general way, so that I can plug in any function.
Solution #1
For the first case, it looks like you can simply use matrix multiplication after transposing Y -
X*Y'
If you are dealing with complex numbers -
conj(X*ctranspose(Y))
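A small check that the matrix product reproduces the question's double loop, using made-up 3-by-2 matrices. The complex case relies on dot conjugating its first argument, which is why the answer wraps the product in conj.

```matlab
% Made-up m-by-n data (m = 3, n = 2)
X = [1 2; 3 4; 5 6];
Y = [1 0; 0 1; 1 1];
m = size(X, 1);

% Reference: the double loop from the question
O_loop = zeros(m);
for i = 1:m
    for j = 1:m
        O_loop(i,j) = dot(X(i,:), Y(j,:));
    end
end

% One matrix product instead of m^2 dot calls
O_fast = X * Y';

% Complex case: dot conjugates its first argument, hence the outer conj
Xc = X + 1i;
Yc = Y - 2i;
Oc_loop = zeros(m);
for i = 1:m
    for j = 1:m
        Oc_loop(i,j) = dot(Xc(i,:), Yc(j,:));
    end
end
Oc_fast = conj(Xc * ctranspose(Yc));
```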
Solution #2
For the second case, you need to do a little more work. You need to use bsxfun with permute to rearrange dimensions, employ the raw form of the norm calculation, and finally use squeeze to get a 2D array output -
squeeze(exp(-1*sqrt(sum(bsxfun(@minus,X,permute(Y,[3 2 1])).^2,2))))
If you would like to avoid squeeze, you can use two permutes -
exp(-1*sqrt(sum(bsxfun(@minus,permute(X,[1 3 2]),permute(Y,[3 1 2])).^2,3)))
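The two bsxfun expressions can be checked against a plain loop on made-up data. Here Y has a different number of rows from X only to make the output orientation visible; note that the squeeze variant would return a column instead of a row if X had a single row, which is one reason to prefer the two-permute form.

```matlab
X = [0 0; 3 4; 1 1];          % made-up data
Y = [0 0; 6 8];

% Loop reference: O(i,j) = exp(-norm(X(i,:) - Y(j,:)))
O_loop = zeros(size(X,1), size(Y,1));
for i = 1:size(X,1)
    for j = 1:size(Y,1)
        O_loop(i,j) = exp(-1*norm(X(i,:) - Y(j,:)));
    end
end

% Vectorized versions from the answer
O_squeeze = squeeze(exp(-1*sqrt(sum(bsxfun(@minus, X, permute(Y,[3 2 1])).^2, 2))));
O_permute = exp(-1*sqrt(sum(bsxfun(@minus, permute(X,[1 3 2]), permute(Y,[3 1 2])).^2, 3)));
```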
I would also advise you to look into this problem - Efficiently compute pairwise squared Euclidean distance in Matlab.
In conclusion, there isn't a single most efficient approach that works for every function applied to the i-th row of X and the j-th row of Y. If you are still hell-bent on that, you can use anonymous function handles with bsxfun, but I am afraid it won't be the most efficient technique.
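For the general case, one option is sketched below. Note that it uses arrayfun over an index grid rather than bsxfun, which is a swapped-in technique, and pairwiseApply is a hypothetical helper name. It accepts any function that maps two rows to a scalar, but it calls that function m^2 times, so for m = 10000 it will be far slower than the vectorized forms.

```matlab
% Hypothetical helper: O(i,j) = f(X(i,:), Y(j,:)) for any scalar-valued f
pairwiseApply = @(f, X, Y) arrayfun( ...
    @(i, j) f(X(i,:), Y(j,:)), ...
    repmat((1:size(X,1))', 1, size(Y,1)), ...   % row index i at position (i,j)
    repmat(1:size(Y,1), size(X,1), 1));         % row index j at position (i,j)

X = [1 2; 3 4; 5 6];   % made-up data
Y = [1 0; 0 1];

O_dot  = pairwiseApply(@(a, b) dot(a, b), X, Y);
O_dist = pairwiseApply(@(a, b) exp(-norm(a - b)), X, Y);
```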
For the second part, you could also use pdist2:
result = exp(-pdist2(X,Y));
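pdist2 belongs to the Statistics and Machine Learning Toolbox. Without it, one common alternative, related to the pairwise squared Euclidean distance question linked above, is to expand ||x - y||^2 = ||x||^2 + ||y||^2 - 2*x*y'. This is a sketch on made-up data; the expansion can lose precision when distances are tiny compared with the norms, hence the clipping at zero.

```matlab
X = [0 0; 3 4; 1 1];   % made-up data
Y = [0 0; 6 8];

% ||x - y||^2 = ||x||^2 + ||y||^2 - 2*x*y'
sqX = sum(X.^2, 2);                          % m-by-1
sqY = sum(Y.^2, 2);                          % p-by-1
D2  = bsxfun(@plus, sqX, sqY') - 2 * (X * Y');
D2  = max(D2, 0);                            % clip tiny negatives caused by round-off
result = exp(-sqrt(D2));                     % same quantity as exp(-pdist2(X,Y))
```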
I'm working on doing a logistic regression using MATLAB for a simple classification problem. My covariate is one continuous variable ranging between 0 and 1, while my categorical response is a binary variable of 0 (incorrect) or 1 (correct).
I'm looking to run a logistic regression to establish a predictor that would output the probability of some input observation (e.g. the continuous variable as described above) being correct or incorrect. Although this is a fairly simple scenario, I'm having some trouble running this in MATLAB.
My approach is as follows: I have one column vector X that contains the values of the continuous variable, and another equally-sized column vector Y that contains the known classification of each value of X (i.e. 0 or 1). I'm using the following code:
[b,dev,stats] = glmfit(X,Y,'binomial','link','logit');
However, this gives me nonsensical results with a p = 1.000, coefficients (b) that are extremely high (-650.5, 1320.1), and associated standard error values on the order of 1e6.
I then tried using an additional parameter to specify the size of my binomial sample:
glm = GeneralizedLinearModel.fit(X,Y,'distr','binomial','BinomialSize',size(Y,1));
This gave me results that were more in line with what I expected. I extracted the coefficients, used glmval to create estimates (Y_fit = glmval(b,[0:0.01:1],'logit');), and created an array for the fitting (X_fit = linspace(0,1)). When I overlaid the plots of the original data and the model using figure, plot(X,Y,'o',X_fit,Y_fit,'-'), the resulting plot of the model essentially looked like the lower quarter of the 'S'-shaped curve that is typical of logistic regression plots.
My questions are as follows:
1) Why did my use of glmfit give strange results?
2) How should I go about addressing my initial question: given some input value, what's the probability that its classification is correct?
3) How do I get confidence intervals for my model parameters? glmval should be able to take the stats output from glmfit as input, but my use of glmfit is not giving correct results.
Any comments and input would be very useful, thanks!
UPDATE (3/18/14)
I found that mnrval seems to give reasonable results. I can use [b_fit,dev,stats] = mnrfit(X,Y+1); where Y+1 simply makes my binary classifier into a nominal one.
I can loop through [pihat,lower,upper] = mnrval(b_fit,loopVal(ii),stats); to get various pihat probability values, where loopVal = linspace(0,1) or some appropriate input range and ii = 1:length(loopVal).
The stats parameter has a great correlation coefficient (0.9973), but the p-values for b_fit are 0.0847 and 0.0845, which I'm not quite sure how to interpret. Any thoughts? Also, why would mnrfit work better than glmfit in my example? I should note that the p-values for the coefficients when using GeneralizedLinearModel.fit were both p<<0.001, and the coefficient estimates were quite different as well.
Finally, how does one interpret the dev output from the mnrfit function? The MATLAB document states that it is "the deviance of the fit at the solution vector. The deviance is a generalization of the residual sum of squares." Is this useful as a stand-alone value, or is this only compared to dev values from other models?
It sounds like your data may be linearly separable. In short, since your input data is one-dimensional, that means there is some value xDiv such that all values of x < xDiv belong to one class (say y = 0) and all values of x > xDiv belong to the other class (y = 1).
If your data were two-dimensional, this would mean you could draw a line through your two-dimensional space X such that all instances of a particular class are on one side of the line.
This is bad news for logistic regression (LR) as LR isn't really meant to deal with problems where the data are linearly separable.
Logistic regression is trying to fit a function of the following form:

$$y = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

This only returns exactly y = 0 or y = 1 when the expression inside the exponential in the denominator, $-(\beta_0 + \beta_1 x)$, goes to +infinity or -infinity, respectively.
Now, because your data are linearly separable and MATLAB's LR function looks for the maximum likelihood fit, the likelihood keeps increasing as the weights grow without bound, so you get extreme weight values (and correspondingly huge standard errors).
This isn't necessarily a solution, but try flipping the labels on just one of your data points (so for some index t where y(t) == 0 set y(t) = 1). This will cause your data to no longer be linearly separable and the learned weight values will be dragged dramatically closer to zero.
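A small numerical sketch of this argument, on made-up separable data and without the toolbox: it evaluates the log-likelihood of the logistic model y = 1/(1 + exp(-(b0 + b1*x))) with the boundary held at x = 0.5 (b1 = s, b0 = -0.5*s) for growing slopes s. On separable data the log-likelihood keeps rising as s grows, so no finite maximum exists; after flipping one label it falls for large s.

```matlab
% Made-up separable data: x < 0.5 -> y = 0, x > 0.5 -> y = 1
x = [0.1 0.2 0.3 0.7 0.8 0.9]';
y = [0 0 0 1 1 1]';

% Log-likelihood with b1 = s and b0 = -0.5*s, written with
% log(sigma(z)) = -log1p(exp(-z)) and log(1 - sigma(z)) = -log1p(exp(z)) for stability
logLik = @(s, yy) sum(-yy .* log1p(exp(-s*(x - 0.5))) ...
                      - (1 - yy) .* log1p(exp(s*(x - 0.5))));

scales = [1 10 100];
llSeparable = arrayfun(@(s) logLik(s, y), scales);       % keeps increasing towards 0

% Flip one label, as suggested above, so the classes overlap
yFlipped = y;
yFlipped(1) = 1;
llFlipped = arrayfun(@(s) logLik(s, yFlipped), scales);  % falls as s grows large
```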
Today I encountered a piece of MATLAB code that I can't understand well. It is
(Dpatch - min(Dpatch(:))) / (max(Dpatch(:)) - min(Dpatch(:)))
Dpatch is an n*n matrix here.
So what do we get after dividing an n*n matrix by a 1*n matrix?
Hoping for your help, thank you in advance.
I think what LuisMendo mentions in his comment is the key to your problem, but it deserves more explanation, since it's a typical MATLAB idiom: elegant but obfuscated.
Normally, min operates along one dimension only. E.g. min(Dpatch) returns the minimum of each column, and min(Dpatch, [], 2) the minimum of each row. Now, Dpatch(:) flattens the matrix to a one-dimensional array, so that min(Dpatch(:)) returns the minimum over all the values in the matrix, which is just a number. The same holds, of course, for max.
Although there seems to be an n*n by 1*n division here, there really is only an n*n by 1*1 division, i.e. a division by a scalar, which is applied elementwise. (By the way, dividing an n*n matrix by a 1*n matrix with / is defined too: it returns a least-squares solution, similar to A*pinv(B); see help slash.)
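A short check of the parenthetical remark, using magic(3) and a made-up row vector: dividing a 3-by-3 matrix by a 1-by-3 vector with / returns a 3-by-1 least-squares solution matching A*pinv(B), while dividing by a scalar is elementwise.

```matlab
A = magic(3);            % a 3-by-3 matrix
B = [1 2 3];             % made-up 1-by-3 row vector

% Matrix right division by a row vector: least-squares solution x of x*B = A
x = A / B;               % size 3-by-1, not 3-by-3
xPinv = A * pinv(B);     % pseudoinverse-based solution

% Dividing by a scalar, as in the Dpatch line, is simply elementwise
C = A / 2;               % same as A ./ 2
```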
Hence, as pointed out by AkiSuihkonen, your line of code just maps the values of Dpatch linearly from their range onto the [0, 1] range.
You can read this as (Matrix - Number)/(Number - Number), which is (Matrix - Number)/Number, which is a matrix with the same size as the original one. :)
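The same reading applied to a concrete matrix; magic(4) stands in for Dpatch and is not from the question.

```matlab
Dpatch = magic(4);       % made-up 4-by-4 matrix containing the values 1..16

lo = min(Dpatch(:));     % scalar: smallest entry of the whole matrix (1)
hi = max(Dpatch(:));     % scalar: largest entry of the whole matrix (16)

scaled = (Dpatch - lo) / (hi - lo);   % (Matrix - Number) / Number, same size as Dpatch
```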