I am doing unsupervised classification (clustering). I have n features per observation, and I want to use PCA to project the data into a new subspace and then cluster on the PCA output. I have written the following code:
mu = mean(feature);                      % column means, computed once (not inside the loop)
for c = 1:size(feature,1)
    feature(c,:) = feature(c,:) - mu;    % centre each observation (row)
end
DataCov = cov(feature);                  % covariance matrix of the centred data
[PC,latent,explained] = pcacov(DataCov);
This gives me:
PC =
0.6706 0.7348 0.0965 0.0316 -0.0003 -0.0001
0.0009 -0.0060 0.0298 0.0378 0.8157 -0.5764
0.0391 -0.1448 0.5661 0.8091 -0.0406 0.0264
0.7403 -0.6543 -0.1461 -0.0505 0.0018 -0.0005
0.0003 -0.0020 0.0193 -0.0116 0.5768 0.8166
0.0264 -0.1047 0.8048 -0.5832 -0.0151 -0.0169
latent =
0.0116
0.0001
0.0000
0.0000
0.0000
0.0000
explained =
98.8872 <-----
1.0445
0.0478
0.0205
0.0000
0.0000
explained shows that only the first component (indicated by <--) really contributes a significant amount to explained variance.
Is it possible to create new features using only the first principal component?
The following gives me a new feature set, feature_New, using all principal components. Is this the right way to create a new feature set on which I can then perform clustering?
feature_New= feature*PC;
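For reference, a minimal sketch of projecting onto just the first component; the use of kmeans and the choice of 3 clusters are illustrative assumptions, not something given above:
feature_New = feature * PC(:, 1);   % n-by-1 score along the first principal component
idx = kmeans(feature_New, 3);       % cluster on the single new feature (k = 3 is arbitrary here)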
I am encountering strange results from spap2 on some data:
The actual data is the blue curve, the red circles are the knots I am using, and the yellow curve is the plotted cubic spline.
The code is quite simple; I cannot figure out what the problem is:
spgood = spap2(knots_zY, 4, ec, Y);
plot(ec, Y);
hold on;
scatter(knots_zY, Y(ismember(ec, knots_zY)));   % data values at the knot locations
fnplt(spgood)
ec is the vector -4.12:0.02:-0.54.
Y is the following vector:
4.1291 4.0732 4.0173 4.2624 4.3826 4.3267 4.2708 4.4367 4.3808 4.1031 4.1721 3.8152 4.1572
4.1013 4.0454 3.5916 3.8367 3.7808 3.8218 3.6690 3.9141 3.7333 3.8023 3.3204 3.5656 3.4305
3.5787 3.3978 3.3419 3.2860 3.4062 3.4753 3.5706 3.2385 3.1826 3.4947 3.5315 3.1746 3.2089
3.2276 3.1940 2.9162 3.0364 3.0263 2.8155 2.7596 2.9555 2.8996 2.9081 2.7322 2.8524 2.6397
2.7662 2.5279 2.5417 2.2005 2.3409 2.5108 2.5202 2.3359 2.3660 2.3100 2.1682 2.1123 2.2140
2.1288 2.1116 1.9856 2.0089 1.8845 1.9148 1.9308 1.7273 1.7642 1.7326 1.6606 1.7378 1.6570
1.5815 1.5701 1.4630 1.5503 1.5181 1.4385 1.3083 1.3168 1.2991 1.2523 1.1390 0.9988 1.0373
0.9913 1.0113 0.9754 0.8912 0.8790 0.7491 0.7557 0.7544 0.7119 0.7031 0.6843 0.6418 0.5938
0.5193 0.5334 0.4312 0.4839 0.4437 0.3992 0.3689 0.3287 0.3348 0.3076 0.2274 0.2174 0.1970
0.2188 0.1760 0.1384 0.1773 0.1342 0.1388 0.1097 0.0830 0.0782 0.0725 0.0863 0.0581 0.0466
0.0398 0.0431 0.0187 0.0187 0.0176 0.0167 0.0231 0.0033 -0.0117 -0.0016 0.0084 -0.0055 -0.0120
-0.0080 -0.0064 -0.0075 -0.0134 -0.0075 0.0012 -0.0077 -0.0024 0.0006 0.0010 0.0043 0.0016 0.0018
0.0042 0.0030 0.0029 0.0029 0.0021 0.0013 -0.0002 -0.0020 -0.0030 -0.0032 -0.0002 -0.0013 0.0035
0.0028 -0.0000 -0.0057 -0.0032 0.0020 0.0597 0.1835 0.5083 1.0275 1.6448 3.0549
The knots are defined with the following 12 values:
-4.1200 -3.9400 -3.5400 -3.3000 -3.1400 -2.6800 -2.3600 -2.0600 -1.5000 -1.1600 -0.7000 -0.5400
I don't expect a nice fit, but I would at least expect the spline to stick to the knots ... yet here the result is completely wrong. I am stuck, unable to see where the problem is with this data sample.
Note: the knots are computed by a separate algorithm and have to be used for the interpolant; getting a good fit is not the question here. The question is why the spline fit does not pass through the knots.
I have made several errors.
First, it is a mistake to assume that the resulting spline will pass through the knots, since spap2 computes an approximation (see this answer). The approximation smooths the whole original data set, so there is no reason for it to stick to the knots.
Second, I forgot to extend the end knots to impose the boundary conditions. The default is to have all derivatives (including the 0th-order one) equal to zero at the ends, which produces this shape. The solution is to use augknt to obtain an actual cubic spline with two continuous derivatives:
spgood = spap2(augknt(knots_zY,4), 4, ec, Y);
The resulting fit is:
which is way better, given the choice of the knot sequence.
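For reference, a minimal end-to-end sketch of the corrected fit, assuming ec, Y and knots_zY as defined above (plot colours are arbitrary):
knots_aug = augknt(knots_zY, 4);                 % give the end knots multiplicity 4 (cubic order)
spgood = spap2(knots_aug, 4, ec, Y);             % least-squares cubic spline approximation
plot(ec, Y, 'b');                                % original data
hold on;
fnplt(spgood, 'y');                              % fitted spline
plot(knots_zY, fnval(spgood, knots_zY), 'ro');   % spline values at the knot locations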
I noticed that when I run an optimization in MATLAB with fmincon, for example with the active-set method, the resulting coefficients and the computation time are different every time I run the algorithm. I do not change the starting point; I just run the method repeatedly.
I understand this happens with global optimization algorithms, since they use stochastic methods under the hood, but why does the same thing happen with local minimization methods?
EDIT: adding the code
options = optimset('fmincon');
options = optimset(options, ...
    'LargeScale', 'on', ...
    'Algorithm', 'active-set', ...
    'MaxFunEvals', 10000, ...
    'Display', 'off', ...
    'TolCon', 1e-10, ...
    'TolFun', 1e-10, ...
    'TolX', 1e-10, ...
    'MaxIter', 10000);
[Param,fval,exitflag,output] = fmincon(@Cost_FCN, Param0, [],[],[],[], LB, UB, [], options, Fz, Side, Camber, List, Coeff, Fy);
For example, the parameters returned by two consecutive runs (run 1 vs. run 2):
1.8181    1.8181
0.1737    0.1737
0.0210    0.0210
0.0004    0.0004
0.0000    0.0000
9.5810    9.5811
4.7975    4.7981
1.5981    1.5991
9.9934    9.9934
0.8277    0.8277
0.0359    0.0360
0.2438    0.2437
0.0125    0.0125
0.0051    0.0051
0.0041    0.0041
Bounds and constraints are constant and the starting points are the same every time I run the method.
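As a side note, the extra arguments after options rely on an older fmincon calling convention; the documented way to pass fixed data to the objective is an anonymous function. A sketch with the same variable names as above:
costFun = @(p) Cost_FCN(p, Fz, Side, Camber, List, Coeff, Fy);   % capture the fixed data
[Param, fval, exitflag, output] = fmincon(costFun, Param0, [], [], [], [], LB, UB, [], options);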
I have a matrix with 35 columns and I'm trying to reduce the dimension using PCA. I run PCA on my data:
[coeff,score,latent,tsquared,explained,mu] = pca(data);
explained =
99.9955
0.0022
0.0007
0.0003
0.0002
0.0001
0.0001
0.0001
Then, looking at the vector explained, I noticed that the first element is about 99. Based on this, I decided to take only the first component, so I did the following:
k=1;
X = bsxfun(@minus, data, mean(data)) * coeff(:, 1:k);
and then I used X for SVM training:
svmStruct = fitcsvm(X,Y,'Standardize',true, 'Prior','uniform','KernelFunction','linear','KernelScale','auto','Verbose',0,'IterationLimit', 1000000);
However, when I tried to predict and calculate the misclassification rate:
[label,score,cost] = predict(svmStruct, X);
the result was disappointing. I noticed that when I select only one component (k = 1), every classification is wrong. However, as I increase the number of included components k, the result improves, as you can see from the diagram below. This does not make sense according to explained, which suggests that the first eigenvector alone should be enough.
Did I make a mistake?
This diagram shows the classification error as a function of the number of included eigenvectors:
This graph was generated after normalizing the data before PCA, as suggested by @zelanix:
Here is the plotted graph:
and these are the explained values obtained after normalizing before PCA:
>> [coeff,score,latent,tsquared,explained,mu] = pca(data_normalised);
Warning: Columns of X are linearly dependent to within machine precision.
Using only the first 27 components to compute TSQUARED.
> In pca>localTSquared (line 501)
In pca (line 347)
>> explained
explained =
32.9344
15.6790
5.3093
4.7919
4.0905
3.8655
3.0015
2.7216
2.6300
2.5098
2.4275
2.3078
2.2077
2.1726
2.0892
2.0425
2.0273
1.9135
1.8809
1.7055
0.8856
0.3390
0.2204
0.1061
0.0989
0.0334
0.0085
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
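For context, here is a hypothetical sketch of how such an error-vs-k curve can be produced; the training options are assumptions, and score/explained are outputs of the pca call above:
errs = zeros(1, numel(explained));
for k = 1:numel(explained)
    Xk = score(:, 1:k);                                        % scores on the first k components
    mdl = fitcsvm(Xk, Y, 'Standardize', true, 'KernelFunction', 'linear');
    errs(k) = resubLoss(mdl);                                  % resubstitution (training) error
end
figure; plot(1:numel(errs), errs);
xlabel('number of components k'); ylabel('classification error');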
Parag S. Chandakkar is absolutely right that there is no reason to expect PCA to automatically improve your classification result. It is an unsupervised method, so it is not intended to improve separability; it only finds the components with the largest variance.
But there are some other problems with your code. In particular, this line confuses me:
X = bsxfun(@minus, data, mean(data)) * coeff(:, 1:k);
You need to normalise your data before performing PCA, and each feature needs to be normalised separately. I use the following:
data_normalised = data;
for f = 1:size(data, 2)
    % z-score each feature (column) separately
    data_normalised(:, f) = data_normalised(:, f) - nanmean(data_normalised(:, f));
    data_normalised(:, f) = data_normalised(:, f) / nanstd(data_normalised(:, f));
end
pca_coeff = pca(data_normalised);
data_pca = data_normalised * pca_coeff;
You can then extract the first principal component as data_pca(:, 1).
Also, always plot your PCA results to get an idea of what is actually going on:
figure
scatter(data_pca(Y == 1, 1), data_pca(Y == 1, 2))
hold on;
scatter(data_pca(Y == 2, 1), data_pca(Y == 2, 2))
PCA gives the directions of maximum variance in the data; it does not necessarily lead to better classification. If you want to reduce the dimensionality while trying to maximize accuracy, you should use LDA.
The following picture illustrates exactly what I want to convey.
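A minimal sketch of that suggestion with MATLAB's discriminant analysis; the pseudo-linear discriminant type is an assumption, used here to cope with the linearly dependent columns reported in the warning above:
ldaModel = fitcdiscr(data, Y, 'DiscrimType', 'pseudoLinear');   % linear discriminant analysis on the original features
ldaError = resubLoss(ldaModel)                                  % fraction of misclassified training samples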
I want to use YALMIP in MATLAB to solve an SDP problem:
min  X11 + X13
s.t. X22 = 1
     X positive semidefinite
The following is the code:
P = sdpvar(3,3);
cons = [P >= 0,P(2,2)==1];
options = sdpsettings('Solver','Sedumi');
obj = [P(1,1)+P(1,3)];
solvesdp(cons,obj,options);
PP = double(P)
PP(1,1)+PP(2,3)
The results are shown below:
PP =
1.2900 0.0000 -2.2900
0.0000 0.0000 0.0000
-2.2900 0.0000 5.8700
ans =
1.2900
I am quite curious about the results: I imposed the constraint P(2,2)==1, yet in the final result PP(2,2) is not 1. Why does this happen? Can anyone help?
YALMIP assumes a symmetric decision matrix when you write sdpvar(3,3); P = sdpvar(3,3,'full') will do better.
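Independently of the parametrization, it is worth checking the solver diagnostics rather than reading double(P) unconditionally; a small sketch with the variables above:
diagnostics = solvesdp(cons, obj, options);   % solvesdp returns a diagnostics structure
disp(yalmiperror(diagnostics.problem))        % 0 means the solver reports success
check(cons)                                   % constraint residuals at the returned point
PP = double(P);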
In general I know that I can easily calculate a correlation matrix in MATLAB; there are plenty of functions for this. But what about a weighted correlation? I found this MATLAB file:
http://www.mathworks.com/matlabcentral/fileexchange/20846-weighted-correlation-matrix/content/weightedcorrs.m
but how are the weights chosen? Does it depend on one's intuition, or is there a standard way?
Let's say we have
x = randn(30,4)
x =
0.5377 0.8884 -1.0891 -1.1480
1.8339 -1.1471 0.0326 0.1049
-2.2588 -1.0689 0.5525 0.7223
0.8622 -0.8095 1.1006 2.5855
0.3188 -2.9443 1.5442 -0.6669
-1.3077 1.4384 0.0859 0.1873
-0.4336 0.3252 -1.4916 -0.0825
0.3426 -0.7549 -0.7423 -1.9330
3.5784 1.3703 -1.0616 -0.4390
2.7694 -1.7115 2.3505 -1.7947
-1.3499 -0.1022 -0.6156 0.8404
3.0349 -0.2414 0.7481 -0.8880
0.7254 0.3192 -0.1924 0.1001
-0.0631 0.3129 0.8886 -0.5445
0.7147 -0.8649 -0.7648 0.3035
-0.2050 -0.0301 -1.4023 -0.6003
-0.1241 -0.1649 -1.4224 0.4900
1.4897 0.6277 0.4882 0.7394
1.4090 1.0933 -0.1774 1.7119
1.4172 1.1093 -0.1961 -0.1941
0.6715 -0.8637 1.4193 -2.1384
-1.2075 0.0774 0.2916 -0.8396
0.7172 -1.2141 0.1978 1.3546
1.6302 -1.1135 1.5877 -1.0722
0.4889 -0.0068 -0.8045 0.9610
1.0347 1.5326 0.6966 0.1240
0.7269 -0.7697 0.8351 1.4367
-0.3034 0.3714 -0.2437 -1.9609
0.2939 -0.2256 0.2157 -0.1977
-0.7873 1.1174 -1.1658 -1.2078
and we have done
x(:,4) = sum(x,2); % Introduce correlation.
[r,p] = corrcoef(x) % Compute sample correlation and p-values.
and got
r =
1.0000 -0.0352 0.2673 0.6901
-0.0352 1.0000 -0.5101 0.2617
0.2673 -0.5101 1.0000 0.3504
0.6901 0.2617 0.3504 1.0000
This is the unweighted correlation, but how can I compute a weighted correlation with the help of that MATLAB file? Please help.
This function needs the weights of each observation as input. How you choose them is up to you.
If the observations were outputs of a simulation, for example, you could let the weights be the number of iterations performed. If they were stock results, consider using the value held in the portfolio. However, there is no standard way to get the 'best' weights in general; just keep in mind that a more reliable value should typically get more weight.
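A small usage sketch, assuming the linked File Exchange function's interface R = weightedcorrs(Y, w), where Y is a T-by-N data matrix and w a T-by-1 vector of positive observation weights:
x = randn(30, 4);
x(:, 4) = sum(x, 2);               % introduce correlation, as above
w = ones(30, 1);                   % equal weights reproduce the ordinary correlation matrix
R_equal = weightedcorrs(x, w);
w2 = linspace(1, 2, 30)';          % e.g. weight later observations more heavily
R_weighted = weightedcorrs(x, w2);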