initial seed for sparse GP regression - matlab

I use the sparse Gaussian process regression code from Rasmussen and co-authors:
http://www.tsc.uc3m.es/~miguel/downloads.php
The syntax for predicting the mean is:
[~, mu_1, ~, ~, loghyper] = ssgpr_ui(Xtrain, Ytrain, Xtest, Ytest, m);
My question is this: the author states that the initial hyperparameter search condition differs between runs, so the model's results differ from run to run. Is there any way to ensure that the initialization or seed is set so as to obtain good-quality hyperparameters, the best predictions, and reproducible results?

In order to obtain the same predictions every time, it is possible to fix the seed:
stream = RandStream('mt19937ar','Seed',123456);
RandStream.setGlobalStream(stream);
However, there is no standard procedure for picking the best seed. Searching for one would lead to overfitting the model, since we would be handing it ideal conditions for fitting the training data, as pointed out by @mikkola.
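What you can do instead is fix a list of seeds up front and keep the restart whose hyperparameters fit best on data the test set never sees. Below is a minimal sketch; it assumes that the first output of ssgpr_ui is the normalized MSE on the evaluation inputs you pass in (worth verifying against the package documentation), and it carves a validation split out of the training data:

% Reproducible multi-restart hyperparameter selection (sketch).
% Assumption: ssgpr_ui's first output is the normalized MSE on (Xval, Yval).
nVal = round(0.2 * size(Xtrain, 1));          % hold out 20% for validation
Xval = Xtrain(end-nVal+1:end, :);  Yval = Ytrain(end-nVal+1:end, :);
Xtr  = Xtrain(1:end-nVal, :);      Ytr  = Ytrain(1:end-nVal, :);
bestNMSE = Inf;
for s = 1:10                                   % fixed seed list => reproducible
    RandStream.setGlobalStream(RandStream('mt19937ar', 'Seed', s));
    [nmse, ~, ~, ~, loghyper] = ssgpr_ui(Xtr, Ytr, Xval, Yval, m);
    if nmse < bestNMSE
        bestNMSE = nmse;  bestHyper = loghyper;  bestSeed = s;
    end
end

Because the selection criterion only ever sees the validation split, this avoids the overfitting concern above while still removing the run-to-run variability.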


Using Linear Prediction Over Time Series to Determine Next K Points

I have a time series of N sunspot data points, and I would like to predict the remaining points in the series from a subset of them, then check the correctness of the prediction.
I'm just getting introduced to linear prediction in Matlab, so I decided to use the following code segment within a loop, so that every point beyond the training set, up to the end of the given data, gets a prediction:
% x is the data; the training set is some subset of x starting from the beginning.
% 'unknown' is the number of points to extend the prediction over, starting from
% the end of the training set (i.e. the difference in length between the training
% set and data vectors). x_pred is set to x initially.
p = length(training_set);
coeffs = lpc(training_set, p);
for i = 1:unknown
    nextValue = -coeffs(2:end) * x_pred(end-unknown-1+i : -1 : end-unknown-1+i-p+1)';
    x_pred(end-unknown+i) = nextValue;
end
error = norm(x - x_pred)
I have three questions regarding this:
1) Does this appropriately do what I have described? I ask because my error seems rather large (>100) when predicting over only the last 20 points of a dataset that has hundreds of points.
2) Am I interpreting the second argument of lpc correctly, namely that it is the 'order', i.e. the number of past points used to predict the next point?
3) Is there a more efficient, single-line function in Matlab that I can call to replace the loop and compute all the necessary predictions for me, given some subset of my overall data as a training set?
I tried looking through the lpc Matlab tutorial, but it didn't seem to do the kind of prediction my needs require. I have also been using How to use aryule() in Matlab to extend a number series? as a reference.
So after much deliberation and experimentation, I have found the above approach to be correct, and there does not appear to be a single dedicated Matlab function to do this work. The large errors are reasonable, since I am using a linear prediction algorithm on a problem (sunspot prediction) that has inherently nonlinear behavior.
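That said, the loop itself can be replaced, because free-running AR prediction is just filtering zero input through 1/A(z) with the right initial state. A sketch, assuming the Signal Processing Toolbox (which also provides lpc and filtic):

% The recursion x(n) = -coeffs(2)*x(n-1) - ... - coeffs(p+1)*x(n-p)
% is the all-pole filter 1/A(z) driven by zero input.
zi     = filtic(1, coeffs, training_set(end:-1:end-p+1)); % state from the last p samples
x_tail = filter(1, coeffs, zeros(1, unknown), zi);        % the 'unknown' predictions
x_pred = [training_set(:)' x_tail];                       % same values the loop yields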
Hope this helps anyone else out there working on something similar.

leave-one-out regression using lasso in Matlab

I have 300 data samples, each with a feature of around 4000 dimensions. Each input has a 5-dimensional output in the range of -2 to 2. I am trying to fit a lasso model to this. I went through a few posts that talk about cross-validation strategies, like this one: Leave one out cross validation algorithm in matlab
But I saw that lasso does not support leaveout in Matlab! http://www.mathworks.com/help/stats/lasso.html
How can I train a model using leave one out cross validation and fit a model using lasso on my dataset? I am trying to do this in matlab. I would like to get a set of weights which I will be able to use for future predictions on other data.
I tried using glmnet: http://www.stanford.edu/~hastie/glmnet_matlab/intro.html but I couldn't compile it on my machine due to the lack of a proper mex compiler.
Any solutions to my problem? Thanks :)
EDIT
I am also trying to use the lasso function built into MATLAB. It has an option to perform cross-validation. It outputs B and FitInfo, where B holds the fitted coefficients as a p-by-L matrix, p being the number of predictors (columns) in X and L the number of Lambda values.
Now given a new test sample, how can I calculate the output using this model?
You can use a leave-one-out approach regardless of your training method. As explained here, you can use crossvalind to split the data into training and test sets.
[Train, Test] = crossvalind('LeaveMOut', N, M)
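If you want to stay with the built-in lasso, you can also write the leave-one-out loop yourself. A minimal sketch for a single output column (lasso fits one response at a time, so with a 5-dimensional output you would repeat this per column; the lambda value here is a placeholder you would have to tune):

% X: 300-by-4000 predictors, Y: 300-by-1 (one of the five output columns)
N       = size(X, 1);
lambda  = 0.1;                      % placeholder; choose by your own criterion
yhatLOO = zeros(N, 1);
for i = 1:N
    trainIdx = setdiff(1:N, i);     % leave sample i out
    [B, FitInfo] = lasso(X(trainIdx, :), Y(trainIdx), 'Lambda', lambda);
    yhatLOO(i) = X(i, :) * B + FitInfo.Intercept;
end
looMSE = mean((Y - yhatLOO).^2);    % leave-one-out error estimate

The same pattern answers the EDIT: with the p-by-L matrix B returned by a cross-validated lasso call, the prediction for a new sample xnew at the j-th lambda is xnew * B(:,j) + FitInfo.Intercept(j); common choices for j are FitInfo.IndexMinMSE or FitInfo.Index1SE.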

weighted correlation for the case of matrices

I have a question about how to calculate weighted correlations for matrices. Following the Wikipedia article, I have created the three pieces of code below.
1. Weighted mean calculation:
function y = weighted_mean(x, w)
% Weighted mean of vector x with weights w
% (assumes the weight vector and input vector have the same length)
y = sum(w(:) .* x(:)) / sum(w(:));
end
2. Weighted covariance:
function result = cov_weighted(x, y, w)
% Weighted covariance of x and y; the weighted means are hoisted out of the sum
mx = weighted_mean(x, w);
my = weighted_mean(y, w);
result = sum(w(:) .* (x(:) - mx) .* (y(:) - my)) / sum(w(:));
end
3. And finally, the weighted correlation:
function corr_weight = weighted_correlation(x, y, w)
% Weighted correlation: weighted covariance normalized by the weighted variances
corr_weight = cov_weighted(x, y, w) / sqrt(cov_weighted(x, x, w) * cov_weighted(y, y, w));
end
Now I want to apply the weighted correlation method to matrices, as in this link:
http://www.mathworks.com/matlabcentral/fileexchange/20846-weighted-correlation-matrix/content/weightedcorrs.m
I did not understand how to apply it, which is why I created the functions myself, but I need them for the case where the inputs are matrices. Thanks very much.
@dato-datuashvili Maybe I am providing too much information...
1) I would like to stress that evaluating weighted correlation matrices is quite uncommon. This is because you have to provide the weights beforehand, and unless you have a clear reason to choose particular weights, there is no obvious way to provide them.
How can you tell that one measurement in your sample is more or less important than another?
Having said that, the weights are up to you! You have to choose them!
So people usually consider just the plain correlation matrix (no weights, or all weights equal, e.g. w_i = 1).
If you have a clear way to choose good weights, just disregard this part.
2) I understand that you want to test your code. To do that, you need correlated random variables. How do you generate them?
Multivariate normal distributions are the simplest case. See the Wikipedia page about them: Multivariate Normal Distribution (the section "Drawing values from the distribution" shows how to generate random numbers from this distribution using the Cholesky decomposition). The 2-variate case is much simpler; see for instance Generate Correlated Normal Random Variables.
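For completeness, the Cholesky recipe described on that Wikipedia page looks roughly like this in Matlab (a sketch; mu is a 1-by-2 mean vector, SIGMA a 2-by-2 covariance matrix and n the number of samples):

L  = chol(SIGMA, 'lower');          % factor SIGMA = L*L'
Z  = randn(n, 2);                   % independent standard normal rows
XY = Z * L' + repmat(mu, n, 1);     % each row is now distributed as N(mu, SIGMA)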
The good news is that if you are using Matlab, there is a function for this. See Matlab: Random numbers from the multivariate normal distribution.
In order to use this function you have to provide the desired means and covariances. Note that you are playing the role of nature here: you are generating the data! In real life you will apply your function to real data, so this step is only useful for testing. Also pay attention to the fact that the Matlab function takes variances/covariances, while you are evaluating correlations (covariances normalized by standard deviations). In the 2-dimensional case (the case of your function) it is possible to provide the correlation directly; see the Math.StackExchange page I pointed you to above.
3) Finally, you can apply them to your function. Generate X and Y from a multivariate normal distribution and provide the vector of weights w to your function weighted_correlation, and you are done!
I hope this provides what you need!
Daniel
Update:
% From the Matlab page
mu    = [2 3];
SIGMA = [1 1.5; 1.5 3];
n     = 100;
XY = mvnrnd(mu, SIGMA, n);   % mvnrnd returns a single n-by-2 matrix
x  = XY(:,1);
y  = XY(:,2);
% Using your code
w = ones(n, 1);
corr_weight = weighted_correlation(x, y, w);
% Remember that SIGMA is a covariance while corr_weight is a correlation.
% To compute the comparable quantity, use result = cov_weighted(x, y, w) instead.
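To actually answer the matrix part of the question: the same three steps generalize directly to an n-by-p data matrix, where every column is a variable and every row carries one weight. A minimal sketch of a weightedcorrs-style function built on your definitions (the function name is mine):

function R = weighted_corr_matrix(X, w)
% Weighted correlation matrix of the columns of the n-by-p matrix X,
% with one weight per row (observation) in w.
w  = w(:) / sum(w(:));                        % normalize weights to sum to 1
mu = w' * X;                                  % 1-by-p weighted column means
Xc = X - repmat(mu, size(X, 1), 1);           % center every column
C  = Xc' * (Xc .* repmat(w, 1, size(X, 2)));  % weighted covariance matrix
d  = sqrt(diag(C));
R  = C ./ (d * d');                           % normalize covariances to correlations
end

For the two-column case, R(1,2) agrees with weighted_correlation(X(:,1), X(:,2), w).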

Naïve Bayes Classifier -- is normalization necessary?

We recently studied the Naïve Bayes classifier in our Machine Learning class, and now I'm trying to implement it on the Fisher Iris dataset as a self-exercise. The concept is easy and straightforward, with some trickiness involved for continuous attributes. I read several literature resources that recommended using a Gaussian approximation to compute the probabilities of test data values, so I'm going with that in my code.
Now I'm trying to run it initially for 50% training and 50% test data samples, but something is missing. The current code is always predicting class 1 (I used integers to represent the classes) for all test samples, which is obviously wrong.
My guess is that the problem may be due to normalization being omitted from the code, though I would think adding normalization would still yield proportional results; so far my attempts at normalizing have produced the same classification results.
Can someone please suggest if there is anything obvious missing here? Or if I'm not approaching this right? Since most of the code is 'mechanics', I have made prominent (****************) the 2 lines that are responsible for the calculations. Any help is appreciated, thanks!
nsamples = 75; % 50% of samples
% acquire training set and test set
[trainingSample, idx] = datasample(data, nsamples, 'Replace', false);
testData = data(setdiff(1:150, idx), :);
% define Gaussian function
%***********************************************************%
Phi = @(mu, sig2, x) (1/sqrt(2*pi*sig2)) * exp(-((x-mu)^2)/2*sig2);
%***********************************************************%
for c = 1:3 % for 3 classes in training set
    clear y x mu sig2;
    index = 1;
    for i = 1:length(trainingSample)
        if trainingSample(i,5) == c
            y(index,:) = trainingSample(i,:); % filter current class samples
            index = index + 1; % for conditional probabilities
        end
    end
    for j = 1:size(testData,1) % iterate over test samples
        clear pf p;
        for i = 1:4 % iterate over columns
            x = testData(j,i); % representing attributes
            mu = mean(y(:,i));
            sig2 = var(y(:,i));
            pf(i) = Phi(mu, sig2, x); % calc conditional probability
        end
        % calc class likelihood; prior * posterior
        %*****************************************************%
        pc(j,c) = size(y,1)/nsamples * pf(1)*pf(2)*pf(3)*pf(4);
        %*****************************************************%
    end
end
% find the predicted class for each test sample
% by taking the max probability calculated
for i = 1:size(pc,1)
    [~, q] = max(pc(i,:));
    predicted(i) = q;
    actual(i) = testData(i,5);
end
Normalization shouldn't be necessary since the features are only compared to each other.
p(class|thing) ∝ p(class) * p(thing|class)
= p(class) * p(feature_1|class) * p(feature_2|class) * ... * p(feature_N|class)
So when you fit the parameters of the per-class distribution of feature_i on rescaled data, the parameters (mu, sigma2) are simply rescaled to the new scale, and the relative class probabilities remain the same.
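A quick numerical illustration of that claim (a toy check with made-up numbers): rescaling a feature by a factor a rescales every class density by the same 1/a, so the ranking between classes is untouched.

a  = 10;                                   % arbitrary rescaling factor
x  = 1.3;  mu = [1 2];  sig = [0.5 0.7];   % one feature, two classes
p  = normpdf(x,   mu,   sig);              % class densities, original scale
pa = normpdf(a*x, a*mu, a*sig);            % class densities after rescaling
disp(pa ./ p)                              % both entries equal 1/a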
The Matlab code is hard to read because of all the indexing and the splitting of training/testing data, etc., which is a possible source of problems.
You should try something with a lot less unnecessary stuff around it (I would recommend Python with scikit-learn, for example; it has a lot of helpers for splitting data and such, http://scikit-learn.org/).
It's really important that you separate the training and test data, and only train the model with training data and test the trained model with the test data. (Is this done?)
The next step is to check the parameters, which is most easily done by either printing them out (sanity check) or, for each feature, rendering the fitted Gaussian bell next to a histogram of the data to see that they match (remember to normalize the histogram as a density: each bar's height should be number_of_samples_within_range / (total_number_of_samples * bin_width), so it is comparable to the pdf).
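As a sketch of that check, with y being the per-class sample matrix from the question's code and i a feature index:

histogram(y(:,i), 'Normalization', 'pdf'); hold on   % density-normalized bars
xs = linspace(min(y(:,i)), max(y(:,i)), 200);
plot(xs, normpdf(xs, mean(y(:,i)), std(y(:,i))), 'r', 'LineWidth', 1.5)
hold off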
Visualising the data and the model is really important for understanding what is happening.
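One concrete thing worth double-checking in the posted code is the Gaussian itself: in Matlab, / and * have equal precedence and associate left to right, so exp(-((x-mu)^2)/2*sig2) computes exp((-((x-mu)^2)/2) * sig2), i.e. it multiplies by sig2 instead of dividing by 2*sig2. The intended density needs explicit parentheses:

% Corrected anonymous function: note the parentheses around (2*sig2)
Phi = @(mu, sig2, x) (1/sqrt(2*pi*sig2)) * exp(-((x-mu)^2)/(2*sig2));

A bug like this easily produces degenerate likelihoods that push every test sample into one class.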

Issues related to plots in pattern recognition (Part 1)

I cannot follow the crossval() and cvpartition() functions given in the MATLAB documentation for crossval(). What goes into the parameters, and how would they help to compare the performance and accuracy of different classifiers? I would be obliged if a simpler version were provided here.
Let's work through Example 2 from the CROSSVAL documentation.
load('fisheriris');
y = species;
X = meas;
Here we loaded the data from the example mat-file and assigned variables X and y. The meas matrix contains different measurements of iris flowers, and species contains the three classes of iris that we are trying to predict from the data.
Cross-validation is used to train a classifier on the same data set many times. Basically, at each iteration you split the data set into training and test data; the proportion is determined by the k-fold. For example, if k is 10, 90% of the data will be used for training and the remaining 10% for testing, and you will have 10 iterations. This is done by the CVPARTITION function.
cp = cvpartition(y,'k',10); % Stratified cross-validation
You can explore the cp object if you type cp. and press Tab; you will see its different properties and methods. For example, find(cp.test(1)) will show the indices of the test set for the 1st iteration.
The next step is to prepare the prediction function. This is probably where you had the main problem. The statement below creates a function handle using an anonymous function. The @(XTRAIN, ytrain, XTEST) part declares that the function has 3 input arguments. The next part, (classify(XTEST, XTRAIN, ytrain)), defines the function body: it takes training data XTRAIN with known classes ytrain and predicts the classes of the XTEST data with the resulting model. (Those data come from cp, remember?)
classf = @(XTRAIN, ytrain, XTEST)(classify(XTEST, XTRAIN, ytrain));
Then we run the CROSSVAL function to estimate the misclassification rate (mcr), passing the complete data set, the prediction function handle and the partitioning object cp.
cvMCR = crossval('mcr',X,y,'predfun',classf,'partition',cp)
cvMCR =
0.0200
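To the original question about comparing classifiers: reuse the same cp partition so every model sees identical splits, and compare the resulting rates. A sketch using two of the discriminant types that classify already supports:

classf_lin  = @(XTRAIN, ytrain, XTEST) classify(XTEST, XTRAIN, ytrain, 'linear');
classf_quad = @(XTRAIN, ytrain, XTEST) classify(XTEST, XTRAIN, ytrain, 'quadratic');
mcr_lin  = crossval('mcr', X, y, 'predfun', classf_lin,  'partition', cp)
mcr_quad = crossval('mcr', X, y, 'predfun', classf_quad, 'partition', cp)

The classifier with the lower misclassification rate wins, and because the partition is shared, the comparison is paired rather than confounded by different random splits.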
Still have questions?