Matlab: how to change the initial probability distribution of a hidden Markov model

How do I change the initial probability distribution of a hidden Markov model in MATLAB? I know that by default MATLAB begins the HMM algorithms in state 1. To assign a different distribution of initial probabilities, the transition and emission matrices are augmented to include the prior. How can I do this? I would like an answer with a practical example, assuming I have Trans = [0.2, 0.8; 0.4, 0.6], Emis = [0.6, 0.5, 0.3; 0.4, 0.5, 0.7] and initial = [0.2, 0.8].
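One way to do this, following the augmentation approach described in the MATLAB documentation for hmmgenerate, is to add an artificial "silent" initial state whose outgoing transition probabilities are the desired initial distribution and which never emits. A minimal sketch using the matrices from the question (variable names are mine):

% Original model and desired initial distribution
Trans = [0.2, 0.8; 0.4, 0.6];
Emis  = [0.6, 0.5, 0.3; 0.4, 0.5, 0.7];
init  = [0.2, 0.8];

% Augment: new state 1 is a silent start state. It transitions according
% to init, is never re-entered (first column of zeros), and never emits.
TransAug = [0, init; zeros(2, 1), Trans];
EmisAug  = [zeros(1, 3); Emis];

% hmmgenerate starts in state 1 at step 0, before the first emission, so
% the first "real" state is drawn from init, as desired.
[seq, states] = hmmgenerate(100, TransAug, EmisAug);
states = states - 1;   % map augmented states 2..3 back to original labels 1..2

The same augmented matrices can be passed to hmmviterbi or hmmdecode. One caveat: hmmgenerate expects every row of Trans and Emis to sum to 1, and in the Emis from the question it is the columns that sum to 1, so double-check its orientation before using it.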

Related

Multitask learning in GPFlow with missing inputs?

Is it possible to do multitask learning with GPFlow where some inputs are missing? Specifically, I am trying to fit spatial data from several related individuals, but the data are not at identical input locations for all individuals. I know I ought to be doing hierarchical GPs here, but they tend not to scale well. I was hoping that multitask learning might be used instead, though the use case does not map exactly onto the typical application of this method.
Absolutely, yes.
The way GPflow does this is by stacking the output index onto the input. For example, suppose you have two outputs (0, 1) observed at locations [0.1, 0.2, 0.3] and [0.3, 0.4, 0.5]; then you'd construct the "input matrix"
[0.1 0]
[0.2 0]
[0.3 0]
[0.3 1]
[0.4 1]
[0.5 1]
Then, specify how the kernel acts on this matrix using "active_dims". The simplest kernel that could act on this input would be:
k = gpflow.kernels.Matern32(1, active_dims=[0]) * gpflow.kernels.Coregion(1, 2, 2, active_dims=[1])
This is the intrinsic coregionalization model (see Alvarez et al. [1]). You can find a more detailed demo in the GPflow documentation.
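For reference, the product kernel above evaluates to (assuming GPflow's usual Coregion parameterisation, in which B = W*W' + diag(kappa)):

k((x, i), (x', j)) = k_Matern32(x, x') * B(i, j)

so the Matern part models the dependence on the input location, while the B matrix models the correlation between the two outputs.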
Note that you probably want a more powerful model than intrinsic coregionalization: the Linear Model of Coregionalization is more powerful and still easy to implement.
[1] http://eprints.whiterose.ac.uk/114503/1/1106.6251v2.pdf
There is currently no model in GPflow that does this out of the box. However, GPflow does provide the tools to implement it easily. Two suggestions:
Use a multioutput kernel, and set the observation variance to infinity for the missing data points.
Define a multioutput kernel, and specify a custom Kuf for which the requested outputs are passed together with the corresponding input.

How's it even possible to use softmax for word2vec?

How is it possible to use softmax for word2vec? I mean, softmax outputs the probabilities of all classes, which sum up to 1, e.g. [0, 0.1, 0.8, 0.1]. But if my label is, for example, [0, 1, 0, 1, 0] (multiple correct classes), then isn't it impossible for softmax to output the correct value?
Should I be using something other than softmax? Or am I missing something?
I suppose you're talking about the Skip-Gram model (i.e., predicting a context word from the center word), because the CBOW model predicts the single center word, so it assumes exactly one correct class.
Strictly speaking, if you were to train word2vec using the SG model and an ordinary softmax loss, the correct label would be [0, 0.5, 0, 0.5, 0]. Alternatively, you can feed several examples per center word, with labels [0, 1, 0, 0, 0] and [0, 0, 0, 1, 0]. It's hard to say which one performs better, but the label must be a valid probability distribution for each input example.
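To see why these two options amount to the same thing, write out the cross-entropy loss for a softmax output p (a standard identity, nothing word2vec-specific):

L(y, p) = -sum_i y_i * log(p_i)

With y = [0, 0.5, 0, 0.5, 0] this gives L = -0.5*log(p_2) - 0.5*log(p_4), which is exactly the average of the losses for the two one-hot labels [0, 1, 0, 0, 0] and [0, 0, 0, 1, 0].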
In practice, however, ordinary softmax is rarely used, because there are too many classes: computing the full distribution is too expensive and simply not needed (almost all probabilities are nearly zero all the time). Instead, researchers use sampled loss functions for training, which approximate the softmax loss but are much more efficient. The following loss functions are particularly popular:
Negative Sampling
Noise-Contrastive Estimation
These losses are more complicated than softmax, but if you're using TensorFlow, both are implemented and can be used just as easily.
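For reference, the negative-sampling objective as formulated in Mikolov et al.'s word2vec papers, for a (center, context) pair (w, c) with k sampled negative words n_1, ..., n_k, is:

L = -log sigma(v_c' * v_w) - sum_{i=1..k} log sigma(-v_{n_i}' * v_w)

where sigma is the logistic function, v_w is the center-word ("input") vector, and v_c, v_{n_i} are context ("output") vectors. Each term is just a binary classification, so no normalisation over the entire vocabulary is required.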

Drawing from correlated marginal uniform in Matlab: Alternatives to copularnd?

Consider two random variables X and Y both uniformly distributed in [0,1] and correlated with correlation rho.
I want to draw P realisations from their joint distribution in Matlab.
Is
A = copularnd('gaussian',rho,P);
the only way to do it?
Copulas provide a very convenient way of modelling the joint distribution. Just saying that you want X~U[0,1], Y~U[0,1] and corr(X,Y)=rho is not enough to define the relationship between these two random variables. By substituting different copulas you can build different models that all satisfy this condition, and eventually choose the one that suits your use case best.
Apart from understanding the basics of what copulas are, you need to understand that there are different types of correlation, such as linear (Pearson) and rank (Spearman / Kendall), and how they relate to each other. In particular, rank correlations are preserved under monotonic transformations, unlike linear correlation. This is the key property that lets us translate the desired linear correlation of the uniform marginals into the linear correlation of the bivariate normal (or whatever distribution underlies the copula), which is the input correlation to copularnd.
There is a very good answer on Cross Validated describing exactly how to convert the correlation in your case. You should also read the Matlab guide on Using Rank Correlation Coefficients and copulas.
In a nutshell, to model the desired linear correlation of the marginals, you need to translate it into a rank correlation (Spearman is convenient, since for the uniform distribution it equals the Pearson correlation), which is the same for the corresponding normals (because the transformation between them is monotonic). All that remains is to convert that Spearman correlation for the normal into a linear Pearson correlation, which is the input to copularnd.
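In symbols, using the standard relation between Pearson and Spearman correlation for a bivariate normal:

rho_Pearson(uniforms) = rho_Spearman(uniforms) = rho_Spearman(normals)
rho_Pearson(normals)  = 2 * sin(pi/6 * rho_Spearman(normals))

The first line holds because Spearman correlation equals Pearson correlation for uniform marginals and is invariant under the monotonic normal-to-uniform transformation; the second line is the classical identity for the bivariate normal.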
Hopefully the code makes the logic clear. Obviously you don't need the temporary variables, but they should make each step explicit:
rho_uni_Pearson = 0.7;
rho_uni_Spearman = rho_uni_Pearson;
rho_normal_Spearman = rho_uni_Spearman;
rho_normal_Pearson = 2*sin(rho_normal_Spearman*pi/6);
X = copularnd('gaussian', rho_normal_Pearson, 1e7);
And the resulting linear correlation is exactly what we wanted (since we've generated a very large sample):
corr(X(:,1), X(:,2))
ans =
    0.7000
Note that for a bivariate normal copula, the relationship between linear and rank correlations is simple enough, but it can be more complex for other copulas. Matlab has a function copulaparam that translates from rank to linear correlation, so instead of writing out the explicit formula as above, we could just use:
rho_normal_Pearson = copulaparam('Gaussian', rho_normal_Spearman, 'type', 'spearman')
Now that we have covered the basics, let's go ahead and use a t copula with 5 degrees of freedom instead of a Gaussian copula:
nu = 5; % number of degrees of freedom of t distribution
rho_t_Pearson = copulaparam('t', 0.7, nu, 'type', 'spearman');
X = copularnd('t', rho_t_Pearson, nu, 1e7);
Checking that the resulting linear correlation is what we wanted:
corr(X(:,1), X(:,2))
ans =
    0.6996
It's easy to observe that the resulting distributions can be strikingly different depending on which copula you choose, even though they all give you the same linear correlation. It is up to you, the researcher, to determine which model best suits your particular data and problem. For example:
N = 500;
rho_Pearson = copulaparam('gaussian', 0.1, 'type', 'spearman');
X1 = copularnd('gaussian', rho_Pearson, N);
figure(); scatterhist(X1(:,1),X1(:,2)); title('gaussian');
nu = 1; % number of degrees of freedom of t distribution
rho_Pearson = copulaparam('t', 0.1, nu, 'type', 'spearman');
X2 = copularnd('t', rho_Pearson, nu, N);
figure(); scatterhist(X2(:,1),X2(:,2)); title('t, nu=1');

Fitness function for an inverted pendulum

What is the fitness function used to solve an inverted pendulum?
I am evolving neural networks with a genetic algorithm, and I don't know how to evaluate each individual.
I tried minimizing the angle of the pendulum and maximizing the distance traveled at the end of the evaluation time (10 s), but this did not work.
The inputs to the neural network are: cart velocity, cart position, pendulum angular velocity, and pendulum angle at time (t). The output is the force applied at time (t+1).
Thanks in advance.
I found this paper, which gives its objective function as an equation in the cart position, the pendulum angle, and the survival time, where "Xmax = 1.0, thetaMax = pi/6, X'max = 1.0, theta'Max = 3.0, N is the number of iteration steps, T = 0.02 * TS and Wk are selected positive weights." (These specific values for the angles, velocities, and positions come from the paper; you will want to use your own values depending on the boundary conditions of your pendulum.)
The paper also states: "The first and second terms determine the accumulated sum of normalised absolute deviations of X1 and X3 from zero, and the third term, when minimised, maximises the survival time."
That should be more than enough to get started with, but I highly recommend you read the whole paper. It's a great read and I found it quite educational.
You can make your own fitness function, but I think using the position, velocity, angle, and rate of change of the pendulum's angle is a good basis for one. You can, however, choose to use those variables in very different ways than the author of the paper did; a minimal sketch along these lines follows.
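For illustration, here is a minimal MATLAB sketch of a fitness function in the spirit of the paper's objective. The function name, the logged state histories, and the weights are hypothetical placeholders, not taken from the paper:

% Hypothetical fitness for one GA individual, computed from the simulated
% state history of its controller. x and theta are N-by-1 vectors of cart
% position and pendulum angle recorded at each time step; T is the step
% length; W is a 3-vector of positive weights. All names are assumptions.
function f = pendulumFitness(x, theta, T, W)
    xMax = 1.0;        % position bound used for normalisation
    thetaMax = pi/6;   % angle bound (pendulum considered fallen beyond this)
    N = numel(x);      % number of steps the controller survived
    devX     = W(1) * sum(abs(x)) / (N * xMax);         % mean normalised position deviation
    devTheta = W(2) * sum(abs(theta)) / (N * thetaMax); % mean normalised angle deviation
    survival = W(3) / (N * T);       % shrinks as the survival time N*T grows
    f = devX + devTheta + survival;  % minimise f (or use 1/f as the GA fitness)
end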
It wouldn't hurt to read up on harmonic oscillators either. They take the general form:
mx'' + Bx' + kx = A*cos(w*t)
(where B or A may be 0, depending on whether the oscillator is damped/undamped or driven/undriven, respectively).

Calculating the area under curve from classification accuracy

I have an assignment:
Using Naive Bayes we built a model on some data with 2 classes (model returns 2 probabilities - one for positive and one for negative class). We calculated the area under ROC curve AUC = 0.8 and classification accuracy CA = 0.6 with threshold set to 0.5 (if the probability of some example for positive class is higher than 0.5, we predict positive class for that example, else the negative class). Then we discovered that if we set the threshold to 0.3, classification accuracy becomes CA = 0.7. What is the AUC for the second threshold? If the result depends on initial data, present all possibilities.
How can I calculate that?
Not sure if this qualifies as an answer, but the ROC curve is traced out by sweeping the classification threshold over all possible values, plotting the true positive rate (sensitivity) against the false positive rate (1 - specificity), and the AUC is the area under that entire curve. The AUC therefore does not belong to any single threshold: changing the threshold moves you along the curve and changes the classification accuracy, but it does not change the curve itself, so the AUC remains 0.8.
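In symbols, with TPR(t) and FPR(t) denoting the true and false positive rates at threshold t, the AUC is the area swept out as t ranges over all values:

AUC = integral of TPR d(FPR) over the whole curve

Each threshold contributes a single point (FPR(t), TPR(t)) to the curve; the area is a property of how the model ranks examples, not of any particular cut-off, so it is unaffected by moving the threshold from 0.5 to 0.3.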