My question is quite elementary but I need help understanding the basic concepts.
In the following example from the MathWorks documentation page for the princomp function
load hald;
[pc,score,latent,tsquare] = princomp(ingredients);
pc,latent
we get the following values for:
pc =
-0.0678 -0.6460 0.5673 0.5062
-0.6785 -0.0200 -0.5440 0.4933
0.0290 0.7553 0.4036 0.5156
0.7309 -0.1085 -0.4684 0.4844
latent =
517.7969
67.4964
12.4054
0.2372
score =
36.8218 -6.8709 -4.5909 0.3967
29.6073 4.6109 -2.2476 -0.3958
-12.9818 -4.2049 0.9022 -1.1261
23.7147 -6.6341 1.8547 -0.3786
-0.5532 -4.4617 -6.0874 0.1424
-10.8125 -3.6466 0.9130 -0.1350
-32.5882 8.9798 -1.6063 0.0818
22.6064 10.7259 3.2365 0.3243
-9.2626 8.9854 -0.0169 -0.5437
-3.2840 -14.1573 7.0465 0.3405
9.2200 12.3861 3.4283 0.4352
-25.5849 -2.7817 -0.3867 0.4468
-26.9032 -2.9310 -2.4455 0.4116
Legend:
latent is a vector containing the eigenvalues of the covariance matrix of X.
pc is a p-by-p matrix, each column containing coefficients for one principal component. The columns are in order of decreasing component variance.
score is the principal component scores; that is, the representation of X in the principal component space. Rows of SCORE correspond to observations, columns to components.
Can somebody explain whether the values of score are generated somehow using the values of pc, and if this is true, what kind of computation is performed?
Yes, it holds that score = norm_ingredients * pc, where norm_ingredients is the normalized version of your input matrix so that its columns have zero mean, that is,
norm_ingredients = ingredients - repmat(mean(ingredients), size(ingredients, 1), 1)
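If it helps to see the same computation outside MATLAB, here is a minimal numpy/scikit-learn sketch (my own illustration, with a random matrix standing in for ingredients) showing that the scores are just the column-centered data multiplied by the coefficient matrix:
import numpy as np
from sklearn.decomposition import PCA

# Random data standing in for `ingredients`; rows are observations, columns are variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(13, 4))

pca = PCA().fit(X)                    # components_ plays the role of pc (one component per row)
X_centered = X - X.mean(axis=0)       # column-wise mean removal, like the repmat line above
scores = X_centered @ pca.components_.T

print(np.allclose(scores, pca.transform(X)))   # True: score = centered data * coefficients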
Background: I am working on a problem similar to the nonlinear logistic regression described in link [1] (my problem is more complicated, but link [1] is enough for the next sections of this post). Comparing my results with those obtained in parallel with an R package, I got similar results for the coefficients, but a logLikelihood of (very approximately) the opposite sign.
Hypothesis: The logLikelihood given by fitnlm in Matlab is in fact the negative logLikelihood. (Note that this would consequently impair the BIC and AIC computed by Matlab.)
Reasoning: in [1], the same problem is solved through two different approaches: an ML approach, by defining the negative logLikelihood and optimizing it with fminsearch, and a GLS approach, by using fitnlm.
The negative LogLikelihood after the ML approach is: 380
The negative LogLikelihood after the GLS approach is: -406
I imagine the second one should be at least multiplied by (-1)?
Questions: Did I miss something? Is multiplying by (-1) enough, or would this simple correction still not be enough?
Self-contained code:
%copy-pasting code from [1]
myf = @(beta,x) beta(1)*x./(beta(2) + x);
mymodelfun = @(beta,x) 1./(1 + exp(-myf(beta,x)));
rng(300,'twister');
x = linspace(-1,1,200)';
beta = [10;2];
beta0=[3;3];
mu = mymodelfun(beta,x);
n = 50;
z = binornd(n,mu);
y = z./n;
%ML Approach
mynegloglik = @(beta) -sum(log(binopdf(z,n,mymodelfun(beta,x))));
opts = optimset('fminsearch');
opts.MaxFunEvals = Inf;
opts.MaxIter = 10000;
betaHatML = fminsearch(mynegloglik,beta0,opts)
neglogLH_MLApproach = mynegloglik(betaHatML);
%GLS Approach
wfun = @(xx) n./(xx.*(1-xx));
nlm = fitnlm(x,y,mymodelfun,beta0,'Weights',wfun)
neglogLH_GLSApproach = - nlm.LogLikelihood;
Source:
[1] https://uk.mathworks.com/help/stats/examples/nonlinear-logistic-regression.html
This answer (now) only details which code is used. Please see Tom Lane's answer below for a substantive answer.
Basically, fitnlm.m is a call to NonLinearModel.fit.
When opening NonLinearModel.m, one gets in line 1209:
model.LogLikelihood = getlogLikelihood(model);
getlogLikelihood is itself described between lines 1234-1251.
For instance:
function L = getlogLikelihood(model)
(...)
L = -(model.DFE + model.NumObservations*log(2*pi) + (...) )/2;
(...)
Please also note that this notably impacts ModelCriterion.AIC and ModelCriterion.BIC, as they are computed using model.LogLikelihood ("thinking" it is the logLikelihood).
To get the corresponding formula for BIC/AIC/..., type:
edit classreg.regr.modelutils.modelcriterion
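For reference, the usual textbook definitions (my assumption about what that utility implements; check the file itself for MATLAB's exact variants) are AIC = 2k - 2*logL and BIC = k*log(n) - 2*logL, so any sign error in logL would propagate directly:
import math

# Hypothetical helpers using the standard definitions; k = number of estimated
# parameters, n = number of observations, loglik = the model log likelihood.
def aic(loglik, k):
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik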
This is Tom from MathWorks. Take another look at the formula quoted:
L = -(model.DFE + model.NumObservations*log(2*pi) + (...) )/2;
Remember the normal distribution has a factor (1/sqrt(2*pi)), so taking logs of that gives us -log(2*pi)/2. So the minus sign comes from that and it is part of the log likelihood. The property value is not the negative log likelihood.
One reason for the difference in the two log likelihood values is that the "ML approach" value is computing something based on the discrete probabilities from the binomial distribution. Those are all between 0 and 1, and they add up to 1. The "GLS approach" is computing something based on the probability density of the continuous normal distribution. In this example, the standard deviation of the residuals is about 0.0462. That leads to density values that are much higher than 1 at the peak. So the two things are not really comparable. You would need to convert the normal values to probabilities on the same discrete intervals that correspond to individual outcomes from the binomial distribution.
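A quick way to convince yourself of the density-vs-probability point is to look at the peak of a normal density with that residual standard deviation; this is just a sketch using the 0.0462 figure quoted above:
import math

sigma = 0.0462                                   # residual standard deviation quoted above
peak_density = 1.0 / (math.sqrt(2 * math.pi) * sigma)
print(peak_density)                              # ~8.6: a density can exceed 1
print(math.log(peak_density))                    # ~2.2: a positive per-point log contribution
With around 200 observations, per-point contributions of roughly +2 add up to a log likelihood in the hundreds, which is in the right ballpark for the +406 implied above, whereas the 200 binomial log probabilities are all negative.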
I was trying to do some PCA reconstructions of MNIST in Python and compare them to my (old) reconstructions in MATLAB, and I happened to discover that the reconstructions don't agree. After some debugging I decided to print a unique characteristic of the principal components of each one to reveal whether they were the same, and I discovered to my surprise that they were not. I printed the sum of all components and got different numbers. I did the following in MATLAB:
[coeff, ~, ~, ~, ~, mu] = pca(X_train);
U = coeff(:,1:K)
U_fingerprint = sum(U(:))
%print 31.0244
and in python/scipy:
pca = pca.fit(X_train)
U = pca.components_
print 'U_fingerprint', np.sum(U)
# prints 12.814
Why are the two PCAs not computing the same values?
All my attempts at solving this issue:
The way I discovered this was because, when I was reconstructing my MNIST images, the Python reconstructions were much closer to their original images. I got an error of 0.0221556788645 in Python, while in MATLAB I got errors of size 29.07578. To figure out where the difference was coming from, I decided to fingerprint the data sets (maybe they were normalized differently). So I got two independent copies of the MNIST data set (both normalized by dividing by 255) and computed their fingerprints (summing all numbers in each data set):
print np.sum(x_train) # from keras
print np.sum(X_train)+np.sum(X_cv) # from TensorFlow
6.14628e+06
6146269.1585420668
which are (essentially) the same (one copy from the TensorFlow MNIST and the other from the Keras MNIST; note that one train set has about 1000 fewer training examples, so you need to append the missing ones). To my surprise, my MATLAB data had the same fingerprint:
data_fingerprint = sum(X_train(:))
% prints data_fingerprint = 6.1463e+06
meaning the data sets are exactly the same. Good, so the normalization of the data is not the issue.
In my MATLAB script I am actually computing the reconstruction manually as follows:
U = coeff(:,1:K)
X_tilde_train = (U * U' * X_train);
train_error_PCA = (1/N_train)*norm( X_tilde_train - X_train ,'fro')^2
%train_error_PCA = 29.0759
so I thought that might be the problem, because in Python I was using the interface provided for computing the reconstructions, as in:
pca = PCA(n_components=k)
pca = pca.fit(X_train)
X_pca = pca.transform(X_train) # M_train x K
#print 'X_pca' , X_pca.shape
X_reconstruct = pca.inverse_transform(X_pca)
print 'tensorflow error: ',(1.0/X_train.shape[0])*LA.norm(X_reconstruct_tf - X_train)
print 'keras error: ',(1.0/x_train.shape[0])*LA.norm(X_reconstruct_keras - x_train)
#tensorflow error: 0.0221556788645
#keras error: 0.0212030354818
which results in very different error values, 0.022 vs 29.07, a shocking difference!
Thus, I decided to code that exact reconstruction formula in my python script:
pca = PCA(n_components=k)
pca = pca.fit(X_train)
U = pca.components_
print 'U_fingerprint', np.sum(U)
X_my_reconstruct = np.dot( U.T , np.dot(U, X_train.T) )
print 'U error: ',(1.0/X_train.shape[0])*LA.norm(X_reconstruct_tf - X_train)
# U error: 0.0221556788645
To my surprise, it has the same error as the MNIST error computed by using the interface. Thus, I concluded that I don't have the misconception about PCA that I thought I had.
All that led me to check what the principal components actually were, and to my surprise scipy and MATLAB have different fingerprints for their PCA values.
Does anyone know why, or what's going on?
As Warren suggested, the PCA components (eigenvectors) might have different signs. After computing a fingerprint by adding the components in magnitude only, I discovered they do have the same fingerprint:
[coeff, ~, ~, ~, ~, mu] = pca(X_train);
K=12;
U = coeff(:,1:K)
U_fingerprint = sumabs(U(:))
% U_fingerprint = 190.8430
and for python:
k=12
pca = PCA(n_components=k)
pca = pca.fit(X_train)
U = pca.components_
print 'U_fingerprint', np.sum(np.absolute(U))
# U_fingerprint 190.843
which means the difference must be because of the different signs of the (PCA) U vectors. I find this very surprising; I didn't think the sign would make such a difference, and hadn't even considered it. I guess I was wrong?
I don't know if this is the problem, but it certainly could be. Principal component vectors are like eigenvectors: if you multiply the vector by -1, it is still a valid PCA vector. Some of the vectors computed by matlab might have a different sign than those computed in python. That will result in very different sums.
For example, the matlab documentation has this example:
coeff = pca(ingredients)
coeff =
-0.0678 -0.6460 0.5673 0.5062
-0.6785 -0.0200 -0.5440 0.4933
0.0290 0.7553 0.4036 0.5156
0.7309 -0.1085 -0.4684 0.4844
I have my own python PCA code, and with the same input as in matlab, it produces this coefficient array:
[[ 0.0678 0.646 -0.5673 0.5062]
[ 0.6785 0.02 0.544 0.4933]
[-0.029 -0.7553 -0.4036 0.5156]
[-0.7309 0.1085 0.4684 0.4844]]
So, instead of simply summing the coefficient array, try summing the absolute values of the coefficients. Alternatively, ensure that all the vectors have the same sign convention before summing. You could do that by, say, multiplying each column by the sign of the first element in that column (assuming none of them are zero).
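If you want to apply that sign convention programmatically, here is a small numpy sketch (my own helper, not a library function); note that scikit-learn stores one component per row of components_, while MATLAB's coeff stores one per column:
import numpy as np

def fix_signs(components, components_as_rows=True):
    """Flip each component so its first entry (assumed nonzero) is positive.

    This is only a convention; any consistent rule works for comparing two PCAs.
    """
    if components_as_rows:                            # scikit-learn style: one component per row
        return components * np.sign(components[:, [0]])
    return components * np.sign(components[[0], :])  # MATLAB coeff style: one component per column
After normalizing the signs this way, the plain sums (not just the sums of absolute values) should agree up to numerical precision.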
Why, when using pca in Matlab, can I not get an orthogonal principal component matrix?
For example:
A=[3,1,-1;2,4,0;4,-2,-5;11,22,20];
A =
3 1 -1
2 4 0
4 -2 -5
11 22 20
>> W=pca(A)
W =
0.2367 0.9481 -0.2125
0.6731 -0.3177 -0.6678
0.7006 -0.0150 0.7134
>> PCA=A*W
PCA =
0.6826 2.5415 -2.0186
3.1659 0.6252 -3.0962
-3.9026 4.5028 -3.0812
31.4249 3.1383 -2.7616
Here, every column is a principal component. So,
>> PCA(:,1)'*PCA(:,2)
ans =
84.7625
But the principal component matrix does not have mutually orthogonal columns.
I checked some references, and they say the components are not only uncorrelated but strictly orthogonal. But I can't get the desired result. Can somebody tell me where I went wrong?
Thanks!
You are getting confused between the representation of A in the PCA feature space and the principal components. W are the principal components, and they will indeed be orthogonal.
Check that W(:,1).'*W(:,2) = 5.2040e-17, W(:,1).'*W(:,3) = -1.1102e-16 -- indeed orthogonal
What you are trying to do is to transform the data (i.e. A) into the PCA feature space. You should mean-center the data first and then multiply by the principal components as follows.
% A_PCA = (A-repmat(mean(A),4,1))*W
% A more efficient alternative to the above command
A_PCA = bsxfun(@minus,A,mean(A))*W
% verify that it is correct by comparing it with `score` - i.e. the PCA representation
% of A given by MATLAB.
[W, score] = pca(A); % mean centering will occur inside pca
(score-(A-repmat(mean(A),4,1))*W) % elements are of the order of 1e-14, hence equal.
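The same check can be sketched in numpy/scikit-learn (sign conventions may differ from MATLAB, but the orthogonality does not):
import numpy as np
from sklearn.decomposition import PCA

A = np.array([[3, 1, -1],
              [2, 4, 0],
              [4, -2, -5],
              [11, 22, 20]], dtype=float)

pca = PCA()
scores = pca.fit_transform(A)            # PCA centers A internally, like MATLAB's pca
W = pca.components_.T                    # columns are principal components, like W above

print(np.allclose(W.T @ W, np.eye(W.shape[1])))        # coefficient vectors are orthonormal
print(np.allclose(scores, (A - A.mean(axis=0)) @ W))   # scores = centered data * W
print(scores.T @ scores)                 # off-diagonal entries ~0: score columns are orthogonal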
I am trying to model vector data containing 200 sample points denoting a measurement. I want to assess the "goodness of fit", and after reading I found that this can be done by predicting the next set of values (I am not that confident, though, that this is the correct way). I am stuck, since the following code gives an error and I am just unable to solve it. Can somebody please help in removing the error?
Error using *
Inner matrix dimensions must agree.
Error in data_predict (line 27)
ypred(j) = ar_coeff' * y{i}(j-1:-1:j-p);
Also, can somebody tell me how to do the same thing, i.e. get the coefficients, using nonlinear AR modelling, moving average and ARMA, since using the command nlarx() did not return any model coefficients?
CODE
if ~iscell(y); y = {y}; end
model = ar(y, 2, 'yw');
%prediction
yresiduals=[];
nsegments=length(y);
ar_coeffs = model.a;
ar_coeff=[ar_coeffs(2) ar_coeffs(3)]
for i=1:nsegments
pred = zeros(length(y{i}),1);
for j=p+1:length(y{i})
ypred(j) = ar_coeff(:)' * y{i}(j-1:-1:j-p);
end
yresiduals = [yresiduals; y{i}(p+1:end) - ypred(p+1:end)];
end
In MATLAB, * is the matrix product between two matrices. That means the number of columns in the first matrix must equal the number of rows in the second matrix. You might have intended to use .*, element-by-element multiplication. EDIT: for element-by-element multiplication, the matrices must be the same size. Check the sizes of your matrices. If they don't fit either of these conditions, something needs to change.
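If a sanity check outside MATLAB helps, the same dimension rules show up with numpy's @ (matrix/dot product) versus * (element-wise); illustrative values only, not the asker's data:
import numpy as np

coeffs = np.array([0.5, -0.25])       # two AR-style coefficients
window = np.array([1.0, 2.0, 3.0])    # a window of three past samples: lengths disagree

try:
    coeffs @ window                   # matrix/dot product: inner dimensions must agree
except ValueError as err:
    print("dot product failed:", err)

window2 = np.array([1.0, 2.0])        # same length as coeffs
print(coeffs @ window2)               # dot product: 0.5*1.0 + (-0.25)*2.0 = 0.0
print(coeffs * window2)               # element-wise product: [0.5, -0.5]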
I have the transition matrix, emission matrix and starting state for a hidden Markov model. I want to generate a sequence of observations (emissions). However, I'm stuck on one thing.
I understand how to choose among two states (or emissions). If Event A occurs with probability x, then Event B (or, really, not-A) occurs with probability 1-x. To generate a sequence of A's and B's with a random number, rand, you do the following.
for iteration in iterations:
observation[iteration] <- A if rand < x else B
I don't understand how to extend this to more than two variables. For example, if three events occur such that Event A occurs with probability x1, Event B with x2 and Event C with 1-(x1+x2), then how do I extend the above pseudocode?
I didn't find the answer Googling. In fact I get the impression that I'm missing a basic fact that many of the notes online assume. :-/
One way would be
x<-rand()
if x < x1 observation is A
else if x < x1 + x2 observation is B
else observation is C
Of course if you have a large number of alternatives it might be better to build a cumulative probability table (holding x1, x1+x2, x1+x2+x3 ...) and then do a binary search in that table given the random number. If you are willing to do more preprocessing, there is an even more efficient way, see here for example
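Here is a hedged Python/numpy sketch of that cumulative-table idea (the function name is mine, not from the linked reference):
import numpy as np

def sample_index(probs, rng):
    """Draw one index from a discrete distribution via a cumulative table + binary search."""
    cumulative = np.cumsum(probs)                 # [x1, x1+x2, x1+x2+x3, ...]
    u = rng.random()                              # uniform draw in [0, 1)
    return int(np.searchsorted(cumulative, u, side='right'))

rng = np.random.default_rng(0)
probs = [0.2, 0.5, 0.3]                           # events A, B, C
draws = [sample_index(probs, rng) for _ in range(10_000)]
print(np.bincount(draws) / len(draws))            # roughly [0.2, 0.5, 0.3]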
The two-value case is a binomial distribution, and you generate random draws from a binomial distribution (a series of coin flips, essentially).
For more than 2 variables, you need to draw samples from a multinomial distribution, which is simply a generalisation of the binomial distribution for n>2.
Regardless of what language you use, there should most likely be built-in functions to accomplish this task. Below is some code in Python, which simulates a set of observations and states given your hmm model object:
import numpy as np

def random_MN_draw(self, n, probs):
    """Get one random draw from the multinomial distribution whose probabilities are given by 'probs'."""
    mn_draw = np.random.multinomial(n, probs)  # do 1 multinomial experiment with the given probs; with probs=[0.5, 0.5] this is a coin flip
    return np.where(mn_draw == 1)[0][0]        # get the index of the state drawn, e.g. 0, 1, etc.

def simulate(self, nSteps):
    """Given an HMM = (A, B1, B2, pi), simulate state and observation sequences."""
    lenB = len(self.emission)
    observations = np.zeros((lenB, nSteps), dtype=int)  # integer array of zeros for the observed symbols
    states = np.zeros(nSteps, dtype=int)
    states[0] = self.random_MN_draw(1, self.priors)  # appoint the first state from the prior dist
    for i in range(0, lenB):  # initialise observations[i, 0] for all observed variables
        observations[i, 0] = self.random_MN_draw(1, self.emission[i][states[0], :])  # ith variable array, states[0]th row
    for t in range(1, nSteps):  # loop through t
        states[t] = self.random_MN_draw(1, self.transition[states[t-1], :])  # given prev state (t-1), pick which row of the A matrix to use
        for i in range(0, lenB):  # loop through the observed variables for each t
            observations[i, t] = self.random_MN_draw(1, self.emission[i][states[t], :])  # given current state t, pick which row of the B matrix to use
    return observations, states
In pretty much every language, you can find equivalents of
np.random.multinomial()
for multinomial and other discrete or continuous distributions as built-in functions.
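For instance, a single draw with np.random.multinomial comes back as a one-hot vector whose index is the sampled state, which is the same trick the np.where line above uses:
import numpy as np

probs = [0.2, 0.5, 0.3]                  # e.g. one row of a transition or emission matrix
draw = np.random.multinomial(1, probs)   # one trial -> one-hot result such as array([0, 1, 0])
state = int(np.argmax(draw))             # index of the sampled state/symbol
print(state)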