Difference between results of MATLAB and SPSS Factor Analysis(FA) - matlab

This is my code in IBM SPSS:
FACTOR
/VARIABLES VAR00001 VAR00002 VAR00003 VAR00004 VAR00005 VAR00006
/MISSING LISTWISE
/ANALYSIS VAR00001 VAR00002 VAR00003 VAR00004 VAR00005 VAR00006
/PRINT UNIVARIATE INITIAL CORRELATION SIG DET KMO INV REPR AIC EXTRACTION ROTATION
/PLOT EIGEN ROTATION
/CRITERIA MINEIGEN(1) ITERATE(25)
/EXTRACTION PC
/CRITERIA ITERATE(25)
/ROTATION VARIMAX
/METHOD=CORRELATION.
and this is the MATLAB R2015b code to do the same:
[lambda,psi,T,stats,F]=factoran(DATA,2,'rotate','varimax');
SPSS output for the rotated component matrix:
Rotated Component Matrix
Component
1 2
VAR00001 .973 -.062
VAR00002 .911 -.134
VAR00003 .833 -.035
VAR00004 .972 -.102
VAR00005 -.236 .823
VAR00006 .062 .878
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a Rotation converged in 3 iterations.
MATLAB lambda output:
0.993085200854508 -0.0537771548307969
0.875990644597448 -0.147112975689921
0.748570753047806 -0.0343768914779775
0.987459815125692 -0.0988807726538385
-0.203059229288894 0.976610007465447
0.00719025397609984 0.475514010080256
Why are these outputs different? I want the same results in MATLAB. As you know, SPSS ignores eigenvalues smaller than 1. I want the same structure in MATLAB. How can I do this?
PS.
MATLAB T output:
0.622170579007477 -0.782881709211232
0.782881709211232 0.622170579007477
MATLAB psi output:
0.0108898014620571
0.210998162961140
0.438460057014266
0.0151457063113246
0.00500000000002244
0.773834726466399
Other SPSS outputs:
Component Matrix
Component
1 2
VAR00001 .964 .144
VAR00002 .919 .061
VAR00003 .821 .141
VAR00004 .971 .105
VAR00005 -.404 .755
VAR00006 -.124 .871
Extraction Method: Principal Component Analysis.
a 2 components extracted.
Component Transformation Matrix
Component 1 2
1 .977 -.211
2 .211 .977
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.

MATLAB's factoran extracts factors using the maximum likelihood method. I don't think you can change this. SPSS extracts factors using principal components as its default, and this is the method that you have chosen for your SPSS analysis (/EXTRACTION PC). That's yet another difference...
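To make the difference concrete, here is a minimal Python sketch of principal-component extraction with the "eigenvalue greater than 1" rule that MINEIGEN(1) asks for. It assumes the textbook definition (loadings are the eigenvectors of the correlation matrix scaled by the square roots of their eigenvalues). The correlation matrix R is made up for illustration and is not the poster's data, and the power-iteration helper only stands in for a library eigen-solver so that the sketch has no dependencies. Rotation is left out.

```python
import math

# Hypothetical 4-variable correlation matrix (illustration only, not the poster's data).
R = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def eigenpairs(sym, iters=1000):
    """Eigenpairs of a symmetric positive-definite matrix, largest first,
    found by power iteration with deflation."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(n):
        v = [1.0 / (k + 1) for k in range(n)]  # generic start vector
        for _ in range(iters):
            w = matvec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(work, v)))
        pairs.append((lam, v))
        # Deflate: remove the component that was just found.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs


pairs = eigenpairs(R)
eigenvalues = [lam for lam, _ in pairs]

# Kaiser criterion: keep only components whose eigenvalue exceeds 1.
kept = [(lam, v) for lam, v in pairs if lam > 1.0]

# Unrotated loadings: loadings[i][k] = eigenvector_k[i] * sqrt(eigenvalue_k).
loadings = [[v[i] * math.sqrt(lam) for lam, v in kept] for i in range(len(R))]

print("eigenvalues:", [round(lam, 4) for lam in eigenvalues])
print("components kept:", len(kept))
for row in loadings:
    print([round(x, 3) for x in row])
```

The sign of each loading column is arbitrary. factoran fits a different (maximum likelihood) model, so its loadings need not match these even when the number of factors is the same. MATLAB's Statistics Toolbox also has pca and rotatefactors, which may be a closer match to the SPSS settings, though I have not verified that they reproduce the SPSS table.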

Related

Why do the principal component values from Scipy and MATLAB not agree?

I was trying to do some PCA reconstructions of MNIST in Python and compare them to my (old) reconstruction in MATLAB, and I happened to discover that my reconstructions don't agree. After some debugging I decided to print a unique characteristic of the principal components of each one to reveal whether they were the same, and I discovered to my surprise that they were not. I printed the sum of all components and got different numbers. I did the following in MATLAB:
[coeff, ~, ~, ~, ~, mu] = pca(X_train);
U = coeff(:,1:K)
U_fingerprint = sum(U(:))
%print 31.0244
and in python/scipy:
pca = pca.fit(X_train)
U = pca.components_
print 'U_fingerprint', np.sum(U)
# prints 12.814
Why are the two PCAs not computing the same values?
All my attempts at solving this issue:
The way I discovered this was that, when I was reconstructing my MNIST images, the Python reconstructions were much, much closer to their original images. I got an error of 0.0221556788645 in Python, while in MATLAB I got errors of size 29.07578. To figure out where the difference was coming from, I decided to fingerprint the data sets (maybe they were normalized differently). So I got two independent copies of the MNIST data set (both normalized by dividing by 255) and computed their fingerprints (the sum of all numbers in the data set):
print np.sum(x_train) # from keras
print np.sum(X_train)+np.sum(X_cv) # from TensorFlow
6.14628e+06
6146269.1585420668
which are (essentially) the same (one copy is from TensorFlow MNIST and the other from Keras MNIST; note that the TensorFlow MNIST training set has about 1000 fewer examples, so you need to append the missing ones). To my surprise, my MATLAB data had the same fingerprint:
data_fingerprint = sum(X_train(:))
% prints data_fingerprint = 6.1463e+06
meaning the data sets are (essentially) the same. Good, so the normalization of the data is not the issue.
In my MATLAB script I am actually computing the reconstruction manually as follows:
U = coeff(:,1:K)
X_tilde_train = (U * U' * X_train);
train_error_PCA = (1/N_train)*norm( X_tilde_train - X_train ,'fro')^2
%train_error_PCA = 29.0759
so I thought that might be the problem because I was using the interface python gave for computing the reconstructions as in:
pca = PCA(n_components=k)
pca = pca.fit(X_train)
X_pca = pca.transform(X_train) # M_train x K
#print 'X_pca' , X_pca.shape
X_reconstruct = pca.inverse_transform(X_pca)
print 'tensorflow error: ',(1.0/X_train.shape[0])*LA.norm(X_reconstruct_tf - X_train)
print 'keras error: ',(1.0/x_train.shape[0])*LA.norm(X_reconstruct_keras - x_train)
#tensorflow error: 0.0221556788645
#keras error: 0.0212030354818
which results in different error values, 0.022 vs 29.07, a shocking difference!
Thus, I decided to code that exact reconstruction formula in my python script:
pca = PCA(n_components=k)
pca = pca.fit(X_train)
U = pca.components_
print 'U_fingerprint', np.sum(U)
X_my_reconstruct = np.dot( U.T , np.dot(U, X_train.T) )
print 'U error: ',(1.0/X_train.shape[0])*LA.norm(X_reconstruct_tf - X_train)
# U error: 0.0221556788645
To my surprise, it has the same error as the one I computed using the interface. Thus, I conclude that I don't have the misconception about PCA that I thought I had.
All that led me to check what the principal components actually were, and to my surprise scipy and MATLAB have different fingerprints for their PCA values.
Does anyone know why, or what's going on?
As Warren suggested, the PCA components (eigenvectors) might have different signs. After computing a fingerprint by summing the magnitudes of all components, I discovered they have the same fingerprint:
[coeff, ~, ~, ~, ~, mu] = pca(X_train);
K=12;
U = coeff(:,1:K)
U_fingerprint = sumabs(U(:))
% U_fingerprint = 190.8430
and for python:
k=12
pca = PCA(n_components=k)
pca = pca.fit(X_train)
print 'U_fingerprint', np.sum(np.absolute(U))
# U_fingerprint 190.843
which means the difference must be due to the different signs of the (PCA) U vectors. I find this very surprising: I thought that shouldn't make a big difference, and I didn't even consider it. I guess I was wrong?
I don't know if this is the problem, but it certainly could be. Principal component vectors are like eigenvectors: if you multiply the vector by -1, it is still a valid PCA vector. Some of the vectors computed by matlab might have a different sign than those computed in python. That will result in very different sums.
For example, the matlab documentation has this example:
coeff = pca(ingredients)
coeff =
-0.0678 -0.6460 0.5673 0.5062
-0.6785 -0.0200 -0.5440 0.4933
0.0290 0.7553 0.4036 0.5156
0.7309 -0.1085 -0.4684 0.4844
I have my own python PCA code, and with the same input as in matlab, it produces this coefficient array:
[[ 0.0678 0.646 -0.5673 0.5062]
[ 0.6785 0.02 0.544 0.4933]
[-0.029 -0.7553 -0.4036 0.5156]
[-0.7309 0.1085 0.4684 0.4844]]
So, instead of simply summing the coefficient array, try summing the absolute values of the coefficients. Alternatively, ensure that all the vectors have the same sign convention before summing. You could do that by, say, multiplying each column by the sign of the first element in that column (assuming none of them are zero).
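The sign convention suggested above can be sketched in a few lines of plain Python, using the two coefficient arrays quoted in this answer. The helper names (fix_signs, total, projector) are mine, not from any library. The sketch also checks that the projector U*U' built from the first k columns is unchanged by sign flips, so a sign difference alone should not change a reconstruction of the form U*U'*x.

```python
import math

# Coefficient arrays quoted above (columns are components).
matlab_coeff = [
    [-0.0678, -0.6460,  0.5673, 0.5062],
    [-0.6785, -0.0200, -0.5440, 0.4933],
    [ 0.0290,  0.7553,  0.4036, 0.5156],
    [ 0.7309, -0.1085, -0.4684, 0.4844],
]
python_coeff = [
    [ 0.0678,  0.6460, -0.5673, 0.5062],
    [ 0.6785,  0.0200,  0.5440, 0.4933],
    [-0.0290, -0.7553, -0.4036, 0.5156],
    [-0.7309,  0.1085,  0.4684, 0.4844],
]


def fix_signs(coeff):
    """Flip each column so that its first entry is non-negative."""
    signs = [1.0 if x >= 0 else -1.0 for x in coeff[0]]
    return [[x * s for x, s in zip(row, signs)] for row in coeff]


def total(coeff):
    """Sum of all entries (the 'fingerprint' used in the question)."""
    return sum(sum(row) for row in coeff)


def projector(coeff, k):
    """U * U^T built from the first k columns of coeff."""
    n = len(coeff)
    return [
        [sum(coeff[i][c] * coeff[j][c] for c in range(k)) for j in range(n)]
        for i in range(n)
    ]


print("raw sums:       ", total(matlab_coeff), total(python_coeff))
print("sign-fixed sums:", total(fix_signs(matlab_coeff)), total(fix_signs(python_coeff)))

# The projector ignores column signs, because each column enters as u * u^T.
p_matlab = projector(matlab_coeff, 2)
p_python = projector(python_coeff, 2)
same = all(
    math.isclose(a, b, abs_tol=1e-12)
    for row_a, row_b in zip(p_matlab, p_python)
    for a, b in zip(row_a, row_b)
)
print("projectors agree:", same)
```

Choosing the first entry as the reference is only one convention; using the entry with the largest magnitude in each column is more robust when the first entry is close to zero.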

Why doesn't 'pca' in MATLAB give orthogonal principal components?

Why, when using pca in MATLAB, can't I get an orthogonal principal component matrix?
For example:
A=[3,1,-1;2,4,0;4,-2,-5;11,22,20];
A =
3 1 -1
2 4 0
4 -2 -5
11 22 20
>> W=pca(A)
W =
0.2367 0.9481 -0.2125
0.6731 -0.3177 -0.6678
0.7006 -0.0150 0.7134
>> PCA=A*W
PCA =
0.6826 2.5415 -2.0186
3.1659 0.6252 -3.0962
-3.9026 4.5028 -3.0812
31.4249 3.1383 -2.7616
Here, every column is a principal component. So,
>> PCA(:,1)'*PCA(:,2)
ans =
84.7625
So the columns of the principal component matrix are not mutually orthogonal.
I checked some materials, which said the components are not only uncorrelated but strictly orthogonal. But I can't get the desired result. Can somebody tell me where I went wrong?
Thanks!
You are getting confused between the representation of A in the PCA feature space and the principal components. W are the principal components, and they will indeed be orthogonal.
Check that W(:,1).'*W(:,2) = 5.2040e-17, W(:,1).'*W(:,3) = -1.1102e-16 -- indeed orthogonal
What you are trying to do is to transform the data (i.e. A) into the PCA feature space. You should mean-center the data first and then multiply by the principal components as follows.
% A_PCA = (A-repmat(mean(A),4,1))*W
% A more efficient alternative to the above command
A_PCA = bsxfun(@minus,A,mean(A))*W
% verify that it is correct by comparing it with `score` - i.e. the PCA representation
% of A given by MATLAB.
[W, score] = pca(A); % mean centering will occur inside pca
(score-(A-repmat(mean(A),4,1))*W) % elements are of the order of 1e-14, hence equal.
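The effect of mean centering can also be checked outside MATLAB. The plain-Python sketch below reuses A and the rounded W printed in the question (so results are only accurate to about the printed precision) and compares the dot product of the first two projected columns with and without centering. The helper names are mine.

```python
# Data and (rounded) coefficients copied from the question.
A = [[3, 1, -1], [2, 4, 0], [4, -2, -5], [11, 22, 20]]
W = [
    [0.2367,  0.9481, -0.2125],
    [0.6731, -0.3177, -0.6678],
    [0.7006, -0.0150,  0.7134],
]


def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def column(m, j):
    return [row[j] for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Column means and mean-centered data.
means = [sum(col) / len(A) for col in zip(*A)]
centered = [[a - m for a, m in zip(row, means)] for row in A]

raw_proj = matmul(A, W)        # what the question computed (A*W)
scores = matmul(centered, W)   # what pca returns as `score`

raw_dot = dot(column(raw_proj, 0), column(raw_proj, 1))
score_dot = dot(column(scores, 0), column(scores, 1))

print("without centering:", raw_dot)
print("with centering:   ", score_dot)
```

Without centering, the value should land close to the 84.76 reported in the question. With centering, it should be near zero, limited only by the four-decimal rounding of W.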

Matlab SVM linear binary classification failure

I'm trying to implement a simple linear binary SVM classification in MATLAB, but I got strange results.
I have two classes g={-1;1} defined by two predictors, varX and varY. In fact, varY is enough to separate the dataset into two distinct classes (at about varY=0.38), but I will keep varX as a random variable since I will need it for other work.
Using the code below (adapted from MATLAB examples) I got a wrong classifier. The linear classifier should be close to a horizontal line at about varY=0.38, as we can see by plotting the 2D points.
The line that should separate the two classes is not displayed.
What am I doing wrong?
g(1:14,1)=1;
g(15:26,1)=-1;
m3(:,1)=rand(26,1); %varX
m3(:,2)=[0.4008; 0.3984; 0.4054; 0.4048; 0.4052; 0.4071; 0.4088; 0.4113; 0.4189;
0.4220; 0.4265; 0.4353; 0.4361; 0.4288; 0.3458; 0.3415; 0.3528;
0.3481; 0.3564; 0.3374; 0.3610; 0.3241; 0.3593; 0.3434; 0.3361; 0.3201]; %varY
SVMmodel_testm = fitcsvm(m3,g,'KernelFunction','Linear');
d = 0.005; % Step size of the grid
[x1Grid,x2Grid] = meshgrid(min(m3(:,1)):d:max(m3(:,1)),...
min(m3(:,2)):d:max(m3(:,2)));
xGrid = [x1Grid(:),x2Grid(:)]; % The grid
[~,scores2] = predict(SVMmodel_testm,xGrid); % The scores
figure();
h(1:2)=gscatter(m3(:,1), m3(:,2), g,'br','ox');
hold on
% Support vectors
h(3) = plot(m3(SVMmodel_testm.IsSupportVector,1),m3(SVMmodel_testm.IsSupportVector,2),'ko','MarkerSize',10);
% Decision boundary
contour(x1Grid,x2Grid,reshape(scores2(:,1),size(x1Grid)),[0 0],'k');
xlabel('varX'); ylabel('varY');
set(gca,'Color',[0.5 0.5 0.5]);
hold off
A common problem with SVM, or any classification method for that matter, is unnormalized data. You have one dimension that spans from 0 to 1 and another that spans from about 0.3 to 0.4. This causes an imbalance between the features. Common practice is to normalize the features somehow, for example by their standard deviation. Try this code:
g(1:14,1)=1;
g(15:26,1)=-1;
m3(:,1)=rand(26,1); %varX
m3(:,2)=[0.4008; 0.3984; 0.4054; 0.4048; 0.4052; 0.4071; 0.4088; 0.4113; 0.4189;
0.4220; 0.4265; 0.4353; 0.4361; 0.4288; 0.3458; 0.3415; 0.3528;
0.3481; 0.3564; 0.3374; 0.3610; 0.3241; 0.3593; 0.3434; 0.3361; 0.3201]; %varY
m3(:,2) = m3(:,2)./std(m3(:,2));
SVMmodel_testm = fitcsvm(m3,g,'KernelFunction','Linear');
Notice the line before the last.
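For readers who want to see what that scaling does to the numbers, here is a small Python sketch using the varY values from the question. varX is regenerated with an arbitrary seed, so it is not the poster's actual data. It shows both the division by the standard deviation used above and a full z-score as an alternative; the helper names are mine.

```python
import random
import statistics

random.seed(0)  # arbitrary seed; the question used unseeded rand(26,1)

var_y = [0.4008, 0.3984, 0.4054, 0.4048, 0.4052, 0.4071, 0.4088, 0.4113, 0.4189,
         0.4220, 0.4265, 0.4353, 0.4361, 0.4288, 0.3458, 0.3415, 0.3528,
         0.3481, 0.3564, 0.3374, 0.3610, 0.3241, 0.3593, 0.3434, 0.3361, 0.3201]
var_x = [random.random() for _ in var_y]


def scale_by_std(values):
    """Divide by the sample standard deviation (N-1), as in m3(:,2)./std(m3(:,2))."""
    s = statistics.stdev(values)
    return [v / s for v in values]


def zscore(values):
    """Subtract the mean, then divide by the sample standard deviation."""
    mu = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - mu) / s for v in values]


def spread(values):
    return max(values) - min(values)


print("spread before: varX %.3f, varY %.3f" % (spread(var_x), spread(var_y)))
print("spread after:  varX %.3f, varY %.3f"
      % (spread(scale_by_std(var_x)), spread(scale_by_std(var_y))))
print("z-scored varY: mean %.2e, std %.3f"
      % (statistics.mean(zscore(var_y)), statistics.stdev(zscore(var_y))))
```

After scaling, both features have unit standard deviation, so neither dominates the margin. fitcsvm also accepts a 'Standardize' name-value option that standardizes the predictors for you, which may be more convenient than scaling by hand.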

Principal Component Analysis (MathWorks example source code)

My question is quite elementary but I need help understanding the basic concepts.
In the following example from the Mathworks documentation page
of the princomp function
load hald;
[pc,score,latent,tsquare] = princomp(ingredients);
pc,latent
we get the following values for:
pc =
-0.0678 -0.6460 0.5673 0.5062
-0.6785 -0.0200 -0.5440 0.4933
0.0290 0.7553 0.4036 0.5156
0.7309 -0.1085 -0.4684 0.4844
latent =
517.7969
67.4964
12.4054
0.2372
score =
36.8218 -6.8709 -4.5909 0.3967
29.6073 4.6109 -2.2476 -0.3958
-12.9818 -4.2049 0.9022 -1.1261
23.7147 -6.6341 1.8547 -0.3786
-0.5532 -4.4617 -6.0874 0.1424
-10.8125 -3.6466 0.9130 -0.1350
-32.5882 8.9798 -1.6063 0.0818
22.6064 10.7259 3.2365 0.3243
-9.2626 8.9854 -0.0169 -0.5437
-3.2840 -14.1573 7.0465 0.3405
9.2200 12.3861 3.4283 0.4352
-25.5849 -2.7817 -0.3867 0.4468
-26.9032 -2.9310 -2.4455 0.4116
Legend:
latent is a vector containing the eigenvalues of the covariance matrix of X.
pc is a p-by-p matrix, each column containing coefficients for one principal component. The columns are in order of decreasing component variance.
score is the principal component scores; that is, the representation of X in the principal component space. Rows of SCORE correspond to observations, columns to components.
Can somebody explain whether the values of score are generated somehow from the values of pc and, if so, what kind of computation is performed?
Yes, it holds that score = norm_ingredients * pc, where norm_ingredients is the mean-centered version of your input matrix, so that its columns have zero mean, that is,
norm_ingredients = ingredients - repmat(mean(ingredients), size(ingredients, 1), 1)
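The relation can be checked by hand in Python. The pc values are copied from the question. The ingredients matrix is the hald data as I recall it, so treat it as an assumption and compare it with `load hald` before relying on it. Because pc is printed to four decimals, the result only matches score to roughly three decimals.

```python
# hald `ingredients` as recalled (assumption; verify against `load hald`).
ingredients = [
    [7, 26, 6, 60], [1, 29, 15, 52], [11, 56, 8, 20], [11, 31, 8, 47],
    [7, 52, 6, 33], [11, 55, 9, 22], [3, 71, 17, 6], [1, 31, 22, 44],
    [2, 54, 18, 22], [21, 47, 4, 26], [1, 40, 23, 34], [11, 66, 9, 12],
    [10, 68, 8, 12],
]

# Coefficients copied from the question (columns are components).
pc = [
    [-0.0678, -0.6460,  0.5673, 0.5062],
    [-0.6785, -0.0200, -0.5440, 0.4933],
    [ 0.0290,  0.7553,  0.4036, 0.5156],
    [ 0.7309, -0.1085, -0.4684, 0.4844],
]

n = len(ingredients)

# Subtract the column means (the repmat(mean(...)) step).
means = [sum(col) / n for col in zip(*ingredients)]
norm_ingredients = [[x - m for x, m in zip(row, means)] for row in ingredients]

# score = norm_ingredients * pc
score = [
    [sum(a * b for a, b in zip(row, col)) for col in zip(*pc)]
    for row in norm_ingredients
]

for row in score[:3]:
    print(["%.4f" % x for x in row])
```

The first printed rows can be compared directly with the first rows of score listed in the question.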

Complex-valued neural network (CVNN) error divergence

I am currently working on my undergraduate thesis on complex-valued neural networks (CVNN). My topic is based on a single-layered complex-valued neural network for real-valued classification problems. I am using the gradient-descent learning rule to classify the dataset given below:
Data Set
The algorithm I used can be found on page 946 of the following PDF, labeled as the Complex-valued neuron (CVN) Model. The main algorithm is in section 3 of that topic:
Algorithm of CVN Model
But instead of converging, my error curve diverges. Here is my error curve:
error curve at CVNN implementation
I am simulating this in MATLAB. My implementation is given below:
clc
clear all
epoch=1000;
n=8;
%x=real input value
in=dlmread('Diabetes1.txt');
x=in(1:384,1:8);
%d=desired output value
out=dlmread('Diabetes1.txt');
data_1=out(1:384,9);
data_2=out(1:384,10);
%m=complex representation of input
m=(cos((pi).*(x(:,:)-0))+1i*sin((pi).*(x(:,:)-0)));
%
%research
%m=i.*x(:,:)
%m=x(:,:)+i.*x(:,:)
%Wih=weight
%
%m=x(:,:).*(cos(pi./4)+i.*sin(pi./4));
Wih1 =0.5* exp(1i * 2*pi*rand(8,1));
Wih2 =0.5* exp(1i * 2*pi*rand(8,1));
%Pih=bias
Pih1 =0.5*exp(1i * 2*pi*rand(1,1));
Pih2 =0.5*exp(1i * 2*pi*rand(1,1));
for ite=1:epoch
% www(ite)=ite;
E_Total=0;
E1t=0;
E2t=0;
for j=1:384
%blr=learning rate
blr=0.1;
%cpat=current pattern
cpat = m(j,:);
z1=cpat*Wih1+Pih1;
u1=real(z1);
v1=imag(z1);
fu1=1/(1+exp(-u1));
fv1=1/(1+exp(-v1));
%y=actual output
%for activation function 1
y1=sqrt((fu1).^2+(fv1).^2);
%for activation function 2
% y1=(fu1-fv1).^2;
error1=(data_1(j,1)-y1);
E1=((data_1(j,1)-y1).^2);
t11=1./(1+exp(-u1));
f11=t11.*(1-t11);
t21=1./(1+exp(-v1));
f21=t21.*(1-t21);
%for activation function 1
r1= blr.*(data_1(j,1)-y1).*((t11.*f11)./y1)+i.*blr.*(data_1(j,1)-y1).*((t21.*f21)./y1);
%for activation function 2
%r1=2.*blr.*(data_1(j,1)-y1).*(t11-t21).*f11+1i.*2.*blr.*(data_1(j,1)-y1).*(t21-t11).*f21;
%
Pih1=Pih1+r1;
Wih1= Wih1+(conj(m(j,:)))'.*r1;
%////////////////////////////////////////////////
%cpat=current pattern
z2=cpat*Wih2+Pih2;
u2=real(z2);
v2=imag(z2);
fu2=1./(1+exp(-u2));
fv2=1./(1+exp(-v2));
% fu2=tanh(u2);
% fv2=tanh(v2);
%y=actual output
%for activation function 1
y2=sqrt((fu2).^2+(fv2).^2);
%for activation function 2
% y2=(fu2-fv2).^2;
error2=(data_2(j,1)-y2);
E2=((data_2(j,1)-y2).^2);
t12=1./(1+exp(-u2));
f12=t12.*(1-t12);
t22=1./(1+exp(-v2));
f22=t22.*(1-t22);
%for activation function1
r2= blr.*(data_2(j,1)-y2).*((t12.*f12)./y2)+i.*blr.*(data_2(j,1)-y2).*((t22.*f22)./y2);
%for activation function 2
%r2=2*blr*(data_2(j,1)-y2)*(t12-t22)*f12+1i*2*blr*(data_2(j,1)-y2)*(t22-t12)*f22;
Pih2=Pih2+r2;
Wih2= Wih2+(conj(m(j,:)))'.*r2;
%///////////////////////////////////////////////
E1t=E1+E1t;
E2t=E2+E2t;
E_Total=(E1+E2+E_Total);
E1;
E2;
end
Err=E_Total/(2.*384);
figure(1)
plot(ite,Err,'b--')
hold on;
%figure(1)
end
dlmwrite('weight.txt',Wih1)
dlmwrite('weight.txt', Wih2, '-append', ...
'roffset', 1, 'delimiter', ' ')
dlmwrite('weight.txt', Pih1, '-append', ...
'roffset', 1, 'delimiter', ' ')
dlmwrite('weight.txt', Pih2, '-append', ...
'roffset', 1, 'delimiter', ' ')
I still could not figure out the reason behind this opposite behavior on the dataset, so any kind of help is appreciated.
If you are doing gradient descent, a very common debugging technique is to check whether the gradient you calculated actually matches the numerical gradient of your loss function.
That is, check that
f(x+dx)-f(x) ≈ f'(x)*dx
for a variety of small dx. Usually you try along each dimension, as well as in a variety of random directions. You also want to do this check for a variety of values of x.
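As a minimal sketch of such a check, the Python below uses a stripped-down version of the per-pattern loss from the question: y = sqrt(sigmoid(u)^2 + sigmoid(v)^2) and E = (d - y)^2, treated as a function of the real part u and the imaginary part v of the net input. The function names are mine, and the analytic gradient is my own derivation by the chain rule, not the paper's update rule. It is compared against central differences.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def loss(u, v, d):
    """Squared error for one pattern, as a function of u = Re(z), v = Im(z)."""
    y = math.sqrt(sigmoid(u) ** 2 + sigmoid(v) ** 2)
    return (d - y) ** 2


def analytic_grad(u, v, d):
    """dE/du and dE/dv by the chain rule, using sigmoid'(t) = s * (1 - s)."""
    fu, fv = sigmoid(u), sigmoid(v)
    y = math.sqrt(fu ** 2 + fv ** 2)
    common = -2.0 * (d - y) / y
    return (common * fu * fu * (1.0 - fu), common * fv * fv * (1.0 - fv))


def numeric_grad(u, v, d, h=1e-6):
    """Central-difference approximation of the same gradient."""
    du = (loss(u + h, v, d) - loss(u - h, v, d)) / (2.0 * h)
    dv = (loss(u, v + h, d) - loss(u, v - h, d)) / (2.0 * h)
    return (du, dv)


# Check at a few arbitrary points and targets.
points = [(0.3, -0.7, 1.0), (-1.2, 0.4, 0.0), (2.0, 2.0, 1.0)]
max_diff = 0.0
for u, v, d in points:
    ga = analytic_grad(u, v, d)
    gn = numeric_grad(u, v, d)
    max_diff = max(max_diff, abs(ga[0] - gn[0]), abs(ga[1] - gn[1]))
    print((u, v, d), ga, gn)
print("largest mismatch:", max_diff)
```

If the update used in training does not point along the negative of a gradient that passes this kind of check (including its sign and any constant factor), the error can grow instead of shrink.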
You should take a look at this blog on complex back-propagation.
For holomorphic functions, complex BP is fairly straightforward.
Non-holomorphic functions (every CVNN must have at least one) need careful treatment.