Hidden Markov Model Multiple Observation values for each state - matlab

I am new to Hidden Markov Models. I understand the main idea, and I have tried some of MATLAB's built-in HMM functions to help me understand more.
If I have a sequence of observations and corresponding states,
e.g.
seq = 2 6 6 1 4 1 1 1 5 4
states = 1 1 2 2 2 2 2 2 2 2
and I can use the hmmestimate function to calculate the transition and emission probability matrices as:
[TRANS_EST, EMIS_EST] = hmmestimate(seq, states)
TRANS_EST =
0.5000 0.5000
0 1.0000
EMIS_EST =
0 0.5000 0 0 0 0.5000
0.5000 0 0 0.2500 0.1250 0.1250
In the example, the observation is just a single value.
The example picture below describes my situation.
If I have the states {Sleep, Work, Sport} and a set of observations {light off, light on, heart rate > 100, ...}, and I use a number to represent each observation, then in my situation each state has multiple observations at the same time:
seq = {2,3,5} {6,1} {2} {2,3,6} {4} {1,2} {1}
states = 1 1 2 2 2 2 2
I have no idea how to implement this in MATLAB to get the transition and emission probability matrices. I am quite lost: what should I do next? Am I using the right approach?
Thanks!

If you know the hidden state sequence, then maximum likelihood estimation is trivial: it is just the normalized empirical counts. In other words, count up the transitions and emissions, then divide the elements in each row by the total count in that row.
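For instance, here is a minimal sketch of that counting, assuming seq and states are vectors of 1-based integer codes as in the question (this should reproduce TRANS_EST and EMIS_EST above; rows with no counts would need special handling):
numStates = max(states);
numSymbols = max(seq);
TR = zeros(numStates); EM = zeros(numStates, numSymbols);
for t = 1:numel(seq)-1                     % count transitions
    TR(states(t), states(t+1)) = TR(states(t), states(t+1)) + 1;
end
for t = 1:numel(seq)                       % count emissions
    EM(states(t), seq(t)) = EM(states(t), seq(t)) + 1;
end
TR = bsxfun(@rdivide, TR, sum(TR,2));      % normalize each row
EM = bsxfun(@rdivide, EM, sum(EM,2));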
In the case where you have multiple observation variables, code the observations as a vector where each element gives the value of one of the random variables on that time step, e.g. {lights = 1, computer = 0, heart rate > 100 = 1, location = 0}. The key is that you need the same number of observations at each time step, or else things will be much more difficult.
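For instance, two time steps of that example could be coded as fixed-length rows (the values here are made up):
% one row per time step, one column per binary variable:
% [lights, computer, heartRateOver100, location]
obs = [1 0 1 0;   % time step 1
       0 1 0 1];  % time step 2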

I think you have two options.
1) Encode multiple observations into one number. For example, if you know that the maximum possible observation value is N, and that each state has at most K observations, then you can encode any combination of observations as a number between 0 and (N+1)^K - 1 (one base-(N+1) digit per observation slot, with 0 for a missing observation). By doing this, you are assuming that {2,3,6} and {2,3,5} share nothing; they are two completely different observations. A sketch of this encoding follows after option 2.
2) Or you can have multiple emission distributions for each state. I haven't used the built-in HMM estimation functions in MATLAB, so I have no idea whether they support that. But the idea is: if you have multiple emission distributions at a state, the emission likelihood is just their product. This is what jerad suggests.
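A hedged sketch of option 1 (the helper packObs and the base-(N+1) padding scheme are my own illustration, not from any toolbox):
% Pack a set of up to K observation values (integers in 1..N) into a
% single 1-based symbol for hmmestimate. A missing observation becomes
% a 0 digit in base N+1, so codes range over 1..(N+1)^K.
function code = packObs(obs, N, K)
    obs = sort(obs(:))';                       % canonical order: {3,2} == {2,3}
    padded = [obs, zeros(1, K - numel(obs))];  % pad up to K digits
    code = polyval(padded, N + 1) + 1;         % read digits as a base-(N+1) number
end
With N = 6 and K = 3, the question's sequence becomes seq = [packObs([2 3 5],6,3), packObs([6 1],6,3), packObs(2,6,3), ...], after which hmmestimate(seq, states) can be applied unchanged, at the cost of an emission alphabet of size (N+1)^K.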


MATLAB: Subtracting matrix subsets by specific rows

Here is an example of a subset of the matrix I would like to use:
1 3 5
2 3 6
1 1 1
3 5 4
5 5 5
8 8 0
This matrix is in fact 3000 x 3.
For the first 3 rows, I wish to subtract the first of those three rows from each of them.
For the second 3 rows, I wish to subtract the first of those three rows from each of them, and so on.
As such, the output matrix will look like:
0 0 0
1 0 1
0 -2 -4
0 0 0
2 0 1
5 3 -4
What code in MATLAB will do this for me?
You could also do this completely vectorized by using mat2cell, cellfun, then cell2mat. Assuming our matrix is stored in A, try:
numBlocks = size(A,1) / 3;
B = mat2cell(A, 3*ones(1,numBlocks), 3);
C = cellfun(@(x) x - x([1 1 1], :), B, 'UniformOutput', false);
D = cell2mat(C); %//Output
The first line figures out how many 3 x 3 blocks we need; this assumes that the number of rows is a multiple of 3. The second line uses mat2cell to decompose the matrix into 3 x 3 blocks and places them into individual cells. The third line then uses cellfun so that for each cell in our cell array (which is a 3 x 3 matrix), the first row is subtracted from every row of the block. This is very much like what @David did, except I didn't use repmat, to minimize overhead. The fourth line then takes each of these matrices and stacks them back so that we get our final matrix in the end.
Example (this is using the matrix that was defined in your post):
A = [1 3 5; 2 3 6; 1 1 1; 3 5 4; 5 5 5; 8 8 0];
numBlocks = size(A,1) / 3;
B = mat2cell(A, 3*ones(1, numBlocks), 3);
C = cellfun(@(x) x - x([1 1 1], :), B, 'UniformOutput', false);
D = cell2mat(C);
Output:
D =
0 0 0
1 0 1
0 -2 -4
0 0 0
2 0 1
5 3 -4
In hindsight, I think @David is right with respect to performance gains. Unless this code is repeated many times, I think the for loop will be more efficient. Either way, I wanted to provide another alternative. Cool exercise!
Edit: Timing and Size Tests
Because of our discussion earlier, I have decided to do timing and size tests. These tests were performed on an Intel i7-4770 @ 3.40 GHz CPU with 16 GB of RAM, using MATLAB R2014a on Windows 7 Ultimate. Basically, I did the following:
Test #1 - Set the random seed generator to 1 for reproducibility. I wrote a loop that cycled 10000 times. For each iteration in the loop, I generated a random integer 3000 x 3 matrix, then performed each of the methods that were described here. I took note of how long it took for each method to complete after 10000 cycles. The timing results are:
David's method: 0.092129 seconds
rayryeng's method: 1.9828 seconds
natan's method: 0.20097 seconds
natan's bsxfun method: 0.10972 seconds
Divakar's bsxfun method: 0.0689 seconds
As such, Divakar's method is the fastest, followed by David's for loop method, followed closely by natan's bsxfun method, followed by natan's original kron method, followed by the sloth (a.k.a. mine).
Test #2 - I decided to see how fast this would get as you increase the size of the matrix. The setup was as follows. I did 1000 iterations, and at each iteration I increased the number of rows by 3000. As such, iteration 1 consisted of a 3000 x 3 matrix, the next iteration consisted of a 6000 x 3 matrix and so on. The random seed was set to 1 again. At each iteration, I noted the time taken to complete the code. To ensure fairness, the variables were cleared at each iteration before the processing code began. As such, here is a stem plot that shows you the timing for each size of matrix. I subsetted the plot so that it displays timings from 200000 x 3 to 300000 x 3. Take note that the horizontal axis records the number of rows at each iteration. The first stem is for 3000 rows, the next is for 6000 rows and so on. The columns remain the same at 3 (of course).
I can't explain the random spikes throughout the graph... probably something happening in RAM. However, I'm very sure I cleared the variables at each iteration to ensure no bias. In any case, Divakar and David are closely tied. Next comes natan's bsxfun method, then natan's kron method, followed last by mine. Interesting to see how Divakar's bsxfun method and David's for loop method are side by side in timing.
Test #3 - I repeated what I did for Test #2, but using natan's suggestion, I decided to go on a logarithmic scale. I did 6 iterations, starting at a 3000 x 3 matrix and increasing the number of rows 10-fold each time. As such, the second iteration had 30000 x 3, the third iteration had 300000 x 3 and so on, up until the last iteration, which is 3e8 x 3.
I have plotted on a semi-logarithmic scale on the horizontal axis, while the vertical axis is still a linear scale. Again, the horizontal axis describes the number of rows in the matrix.
I changed the vertical limits so we can see most of the methods. My method performs so poorly that it would squash the other timings towards the lower end of the graph. As such, I changed the viewing limits to take my method out of the picture. Essentially, what was seen in Test #2 is verified here.
Here's another way to implement this with bsxfun, slightly different from natan's bsxfun implementation -
t1 = reshape(a,3,[]); %// a is the input matrix
out = reshape(bsxfun(@minus,t1,t1(1,:)),[],3); %// Desired output
A slightly shorter and vectorized way would be (if a is your matrix):
b=a-kron(a(1:3:end,:),ones(3,1));
Let's test:
a=[1 3 5
2 3 6
1 1 1
3 5 4
5 5 5
8 8 0]
a-kron(a(1:3:end,:),ones(3,1))
ans =
0 0 0
1 0 1
0 -2 -4
0 0 0
2 0 1
5 3 -4
Edit
Here's a bsxfun solution (less elegant, but hopefully faster):
a-reshape(bsxfun(@times,ones(1,3),permute(a(1:3:end,:),[2 3 1])),3,[])'
ans =
0 0 0
1 0 1
0 -2 -4
0 0 0
2 0 1
5 3 -4
Edit 2
OK, this got me curious, as I know bsxfun starts to become less efficient for bigger array sizes. So I checked my two solutions using timeit (easy, because they are one-liners). Here it is:
range=3*round(logspace(1,6,200));
for n=1:numel(range)
a=rand(range(n),3);
f=@()a-kron(a(1:3:end,:),ones(3,1));
g=@() a-reshape(bsxfun(@times,ones(1,3),permute(a(1:3:end,:),[2 3 1])),3,[])';
t1(n)=timeit(f);
t2(n)=timeit(g);
end
semilogx(range,t1./t2);
So I didn't test the for loop or Divakar's bsxfun, but you can see that for arrays smaller than about 3e4 rows kron is better than bsxfun, and this changes for larger arrays (a ratio < 1 means kron took less time for that array size). This was done in MATLAB R2012a on Windows 7 (i5 machine).
Simple for loop. This does each 3x3 block separately.
A = randi(5,9,3)                 % example data: 9 x 3 random integers
B = A(1:3:end,:)                 % first row of each 3-row block
for i = 1:size(A,1)/3            % one iteration per 3-row block
    D(3*i-2:3*i,:) = A(3*i-2:3*i,:) - repmat(B(i,:),3,1);
end
D                                % display the result
Whilst it may be possible to vectorise this, I don't think the performance gains would be worth it, unless you do this many times. For a 3000x3 matrix it doesn't take long at all.
Edit: In fact this seems to be pretty fast. I think that's because MATLAB's JIT compilation can speed up simple for loops well.
You can do it using just indexing:
a(:) = a(:) - a(3*floor((0:numel(a)-1)/3)+1).';
Of course, the 3 above can be replaced by any other number. It works even if that number doesn't divide the number of rows.
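For example, on the matrix from the question (block size 3):
a = [1 3 5; 2 3 6; 1 1 1; 3 5 4; 5 5 5; 8 8 0];
a(:) = a(:) - a(3*floor((0:numel(a)-1)/3)+1).';
% a now matches the output of the other answers: the first row of each
% 3-row block has been subtracted from every row of that block.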

My neural network forgets the last training when I try to teach next set of training inputs

I'm learning neural networks (started today) and have finished a 2x2x1 network (forward data feed and backward error propagation) that can learn the AND operation for one set of inputs. It also dodges any local minima using randomized parameters. My first source for this is: http://www.codeproject.com/Articles/14342/Designing-And-Implementing-A-Neural-Network-Librar
The problem is: it learns 0 AND 0 using inputs (0,0), but when I give it (0,1) it forgets 0 AND 0 and then learns 0 AND 1. Is this a common newbie bug?
What I tried:
loop for 10000 times
learn 0 and 0
end loop
loop for 10000 times
learn 0 and 1 (forgets 0 and 0)
end loop
loop for 10000 times
learn 1 and 0 (forgets 0 and 1)
end loop
loop for 10000 times
learn 1 and 1 (forgets 1 and 0)
end loop
only one set is learned
fail
Trial 2:
loop for 10000 times
learn 0 and 0
learn 0 and 1
learn 1 and 0
learn 1 and 1
end loop
gives the same result for all input combinations.
fail.
Activation function for each neuron: hyperbolic tangent
2x2 structure: all-pairs
2x1 structure: all-pairs
Randomized learning rate: yes, re-randomized per iteration and small enough to keep the iteration from exploding
Randomized bias per neuron: yes, between -0.5 and +0.5 (just at start)
Randomized weighting: yes, between -0.5 and +0.5 (just at start)
Edit: Bias and weight updates are done for all-pairs of hidden and output layers.
Edit: All neurons(hidden+output) use same activation function.
Without specific code it is hard to say for sure, but I think the issue is that you are only giving it one case to learn at a time. You should give it a matrix of your different learning examples, with an expected result vector. Then, when you update your weights and biases, you are finding the values that minimize the error between your network output for all cases, and the expected output for all cases.
For an AND gate, your input would be (in MATLAB code, not sure what language you are using but that syntax is easy to understand):
input = [0, 0;
0, 1;
1, 0;
1, 1];
And your expected output would be:
output = [0;
0;
0;
1];
I think what you are doing now is basically finding the weights and biases that minimize the error between the network output and the expected output for just one input case, then re-training those weights and biases to minimize the error for the second case, then the third, then the fourth. If you put them in arrays like this, it should minimize the overall error for all cases. This is just my best guess, though, without any code to go on.
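To make that concrete, here is a minimal MATLAB sketch of training on all four cases per epoch; the 2x2x1 shape and tanh activations follow the question, while everything else (learning rate, epoch count, squared-error loss) is illustrative:
X = [0 0; 0 1; 1 0; 1 1];                    % inputs, one case per row
T = [0; 0; 0; 1];                            % AND targets
W1 = rand(2,2) - 0.5; b1 = rand(1,2) - 0.5;  % hidden layer, random start
W2 = rand(2,1) - 0.5; b2 = rand(1,1) - 0.5;  % output layer, random start
eta = 0.1;                                   % learning rate
for epoch = 1:10000
    H  = tanh(X*W1 + repmat(b1,4,1));        % hidden activations, all cases at once
    Y  = tanh(H*W2 + repmat(b2,4,1));        % network outputs
    dY = (Y - T) .* (1 - Y.^2);              % output delta (tanh derivative)
    dH = (dY*W2') .* (1 - H.^2);             % hidden delta, backpropagated
    W2 = W2 - eta * (H'*dY);  b2 = b2 - eta * sum(dY);
    W1 = W1 - eta * (X'*dH);  b1 = b1 - eta * sum(dH);
end
tanh(tanh(X*W1 + repmat(b1,4,1))*W2 + repmat(b2,4,1))  % should approach [0;0;0;1]
Because every weight update sees the error over all four cases at once, no single case gets "forgotten".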

Interpretation of Probability Estimate for Multi-class classification in LibSVM for MATLAB

Problem: 3 class classification with labels 1,2,3.
Tool: LibSVM for MATLAB
svmModel = svmtrain(<Trainfeatures>, <TrainclassLabels>, '-b 1 -c <someCValue> -g <someGammaValue>');
[predLabels, classAccuracy, probEstimates] = svmpredict(<TestFeatures>, <TestClassLabels>, '-b 1');
After this step, I get the first ten rows of probEstimates to be:
0.9129 0.0749 0.0122
0.9059 0.0552 0.0389
0.8231 0.0183 0.1586
0.9077 0.0098 0.0825
0.9074 0.0668 0.0257
0.8685 0.0146 0.1169
0.8962 0.0664 0.0374
0.9074 0.0548 0.0377
0.9474 0.0054 0.0472
0.9178 0.0642 0.0180
but the first ten predicted labels to be:
2
2
2
2
2
2
2
2
2
2
Questions:
My understanding was that the probability estimate is the probability that a particular item belongs to a particular class, given its feature vector. However, if that were true, then these items should belong to class 1 and not class 2. Does LibSVM change the order of the classes, or am I missing something here? If I am wrong, can someone please explain the real interpretation of the probability estimates?
If I have to move the decision boundary to increase the precision of class 1 (predict fewer items as class 1, i.e. be more conservative about the decision boundary), which of these class probabilities do I have to deal with, and how?
I came across the same problem recently.
The reason is related to the order of training data.
If you want the indices of the post-probability vector to correspond to the labels of the training data, the training data should be sorted by label.
For example, if the label of the first data point is 4, then the first entry of the post-probability vector relates to data points labeled 4.
The order of the labels stored in the model may differ from what we think it should be. You can check it using svmModel.Label; the probability estimates are output in that order.
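For example, a small sketch of that check; svmModel.Label is the field where LibSVM's MATLAB interface stores the class order, and the reordering line is illustrative:
disp(svmModel.Label)                  % class order LibSVM trained with, e.g. [2; 1; 3]
[~, order] = sort(svmModel.Label);    % permutation back to labels 1, 2, 3
probSorted = probEstimates(:, order); % column i now holds P(class = i)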

Find groups with high cross correlation matrix in Matlab

Given a lower triangular matrix (100x100) containing cross-correlation
values, where entry 'ij' is the correlation value between signal 'i'
and 'j' and so a high value means that these two signals belong to
the same class of objects, and knowing there are at most four distinct
classes in the data set, does someone know of a fast and effective way
to classify the data and assign all the signals to the 4 different
classes, rather than search and cross check all the entries against
each other? The following 7x7 matrix may help illustrate
the point:
1 0 0 0 0 0 0
.2 1 0 0 0 0 0
.8 .15 1 0 0 0 0
.9 .17 .8 1 0 0 0
.23 .8 .15 .14 1 0 0
.7 .13 .77 .83 .11 1 0
.1 .21 .19 .11 .17 .16 1
there are three classes in this example:
class 1: rows <1 3 4 6>,
class 2: rows <2 5>,
class 3: rows <7>
This is a good problem for hierarchical clustering. Using complete linkage clustering you will get compact clusters, all you have to do is determine the cutoff distance, at which two clusters should be considered different.
First, you need to convert the correlation matrix to a dissimilarity matrix. Since correlation is between 0 and 1, 1 - correlation works well: high correlations get a score close to 0, and low correlations get a score close to 1. Assume that the correlations are stored in an array corrMat.
%# remove diagonal elements
corrMat = corrMat - eye(size(corrMat));
%# and convert to a vector (as pdist)
dissimilarity = 1 - corrMat(find(corrMat))';
%# decide on a cutoff
%# remember that 0.4 corresponds to corr of 0.6!
cutoff = 0.5;
%# perform complete linkage clustering
Z = linkage(dissimilarity,'complete');
%# group the data into clusters
%# (cutoff is at a correlation of 0.5)
groups = cluster(Z,'cutoff',cutoff,'criterion','distance')
groups =
2
3
2
2
3
2
1
To confirm that everything is great, you can visualize the dendrogram
dendrogram(Z,0,'colorthreshold',cutoff)
You can use the following method instead of creating the dissimilarity matrix.
Z = linkage(corrMat,'complete','correlation')
This allows MATLAB to interpret your matrix as correlation distances, and then you can plot the dendrogram as follows:
dendrogram(Z);
One way to verify whether your dendrogram is right is to check its maximum height, which should correspond to 1 - min(corrMat). If the minimum value in corrMat is 0, then the maximum height of your tree should be 1. If the minimum value is -1 (negative correlation), the height should be 2.
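A quick way to run that check, assuming Z from the linkage call above and corrMat holding the correlation values:
maxHeight = max(Z(:,3));  % third column of Z holds the merge heights
expected  = 1 - min(corrMat(tril(true(size(corrMat)), -1)));  % min over the actual correlations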
Since it is given that there are going to be 4 groups, I'd start with a pretty simplistic two stage approach.
In the first stage you find the maximum correlation among any two elements, place those two elements in a group, then zero out their correlation in the matrix. Repeat, finding the next highest correlation among two elements and either adding those to an existing group or creating a new one until you have the correct number of groups.
Finally, check which elements aren't in a group; go to their column and identify the other element they correlate with most strongly. If that element is already in a group, place them in that group as well; otherwise skip to the next element and come back to them later.
If there is interest or anything isn't clear I can add code later. Like I said, the approach is simplistic but if you don't need to verify the number of groups I think it should be effective.
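In the meantime, here is a rough sketch of the two-stage idea, assuming mostly positive correlations and corrMat stored lower triangular as in the question (all variable names are illustrative):
C = corrMat + corrMat' - 2*eye(size(corrMat,1)); % symmetrize, zero the diagonal
n = size(C,1); groups = zeros(n,1); g = 0; nGroups = 4;
while g < nGroups                           % stage 1: seed the groups
    [v, idx] = max(C(:));                   % strongest remaining correlation
    if v <= 0, break; end                   % nothing left to pair up
    [i, j] = ind2sub(size(C), idx);
    C(i,j) = 0; C(j,i) = 0;                 % zero it out so it is not reused
    if groups(i) == 0 && groups(j) == 0     % neither grouped: new group
        g = g + 1; groups([i j]) = g;
    elseif groups(i) == 0 || groups(j) == 0 % one grouped: join that group
        groups([i j]) = max(groups([i j]));
    end                                     % both grouped already: nothing to do
end
for i = find(groups == 0)'                  % stage 2: attach the leftovers
    masked = C(i,:); masked(groups == 0) = 0; % only consider grouped peers
    [~, j] = max(masked);
    groups(i) = groups(j);
end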

How to compare different distribution means with reference truth value in Matlab?

I have production (q) values from 4 different methods stored in the 4 matrices. Each of the 4 matrices contains q values from a different method as:
Matrix_1 = 1 row x 20 column
Matrix_2 = 100 rows x 20 columns
Matrix_3 = 100 rows x 20 columns
Matrix_4 = 100 rows x 20 columns
The number of columns indicates the number of years: one row contains the production values for the 20 years. The other 99 rows of matrices 2, 3 and 4 are just different realizations (or simulation runs). So basically the other 99 rows of matrices 2, 3 and 4 are repeat cases (but not with identical values, because of random numbers).
Consider Matrix_1 as the reference truth (or base case ). Now I want to compare the other 3 matrices with Matrix_1 to see which one among those three matrices (each with 100 repeats) compares best, or closely imitates, with Matrix_1.
How can this be done in Matlab?
I know that, done manually, we would use confidence intervals (CI): plot the mean of Matrix_1, and draw the distribution of the mean of Matrix_2, the mean of Matrix_3 and the mean of Matrix_4. The largest CI among matrices 2, 3 and 4 that contains the reference truth (the mean of Matrix_1) gives the answer.
mean of Matrix_1 = (1 row x 1 column)
mean of Matrix_2 = (100 rows x 1 column)
mean of Matrix_3 = (100 rows x 1 column)
mean of Matrix_4 = (100 rows x 1 column)
I hope the question is clear and relevant to SO. Otherwise please feel free to edit/suggest anything in question. Thanks!
EDIT: The three methods I talked about are a1, a2 and a3, respectively. Here are my results:
ci_a1 =
1.0e+008 *
4.084733001497999
4.097677503988565
ci_a2 =
1.0e+008 *
5.424396063219890
5.586301025525149
ci_a3 =
1.0e+008 *
2.429145282593182
2.838897116739112
p_a1 =
8.094614835195452e-130
p_a2 =
2.824626709966993e-072
p_a3 =
3.054667629953656e-012
h_a1 = 1; h_a2 = 1; h_a3 = 1
None of the CIs from the three methods contains the reference mean ( = 3.454992884900722e+008). So do we still use the p-values to choose the best result?
If I understand correctly, the calculation in MATLAB is pretty straightforward.
Steps 1-2 (mean calculation):
k1_mean = mean(k1);
k2_mean = mean(k2);
k3_mean = mean(k3);
k4_mean = mean(k4);
Step 3, use HIST to plot distribution histograms:
hist([k2_mean; k3_mean; k4_mean]')
Step 4. You can do a t-test comparing your vectors 2, 3 and 4 against a normal distribution with mean k1_mean and unknown variance. See TTEST for details.
[h,p,ci] = ttest(k2_mean,k1_mean);
EDIT : I misinterpreted your question. See the answer of Yuk and following comments. My answer is what you need if you want to compare distributions of two vectors instead of a vector against a single value. Apparently, the latter is the case here.
Regarding your t-tests, you should keep in mind that they test against a "true" mean. Given the number of values for each matrix and the confidence intervals, it's not too difficult to guess the standard deviation of your results. This is a measure of the "spread" of your results. Now the standard error of your mean is the standard deviation of your results divided by the square root of the number of observations. And the confidence interval is calculated by multiplying that standard error by approximately 2.
This confidence interval contains the true mean in 95% of cases. So if the true mean is exactly at the border of that interval, the p-value is 0.05; the further away the mean, the lower the p-value. This can be interpreted as the chance that the values you have in matrix 2, 3 or 4 come from a population with the same mean as matrix 1. Judging by your p-values, these chances can be said to be non-existent.
So you see that when the number of values gets large, the confidence interval becomes smaller and the t-test becomes very sensitive. What this tells you is nothing more than that the three matrices differ significantly from the reference mean. If you have to choose one, I'd take a look at the distributions anyway. Otherwise the one with the closest mean seems a good guess. If you want to dig deeper into this, you could also ask on stats.stackexchange.com.
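In MATLAB terms, the quantities described above look roughly like this, with x standing for one method's 100 realization means and mu1 for the reference mean (both names illustrative; tinv is in the Statistics Toolbox):
n  = numel(x);                                  % number of realizations
se = std(x) / sqrt(n);                          % standard error of the mean
ci = mean(x) + [-1 1] * tinv(0.975, n-1) * se;  % ~95% confidence interval
containsTruth = mu1 >= ci(1) && mu1 <= ci(2);   % does it cover the truth?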
Your question and your method aren't really clear:
Is the distribution equal in all columns? This is important, as two distributions can have the same mean but differ significantly.
Is there a reason why you don't use the Central Limit Theorem? This seems to me like a very complex way of obtaining a result that can easily be found using the fact that the distribution of a mean approaches a normal distribution with sd(mean) = sd(observations)/sqrt(number of observations). That saves you quite some work (if the distributions are alike!).
Now if the question is really about comparing distributions, you should consider looking at a qqplot for a general idea, and at a two-sample Kolmogorov-Smirnov test for formal testing. But please read up on this test, as you have to understand what it does in order to interpret the results correctly.
On a side note: if you do this test on multiple cases, make sure you understand the problem of multiple comparisons and use the appropriate correction, e.g. Bonferroni or Dunn-Sidak.
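A hedged sketch of both suggestions (qqplot and kstest2 come with the Statistics Toolbox; comparing methods 2 and 3 is just for illustration):
m2 = mean(Matrix_2, 2);    % 100 realization means for method 2
m3 = mean(Matrix_3, 2);    % ... and for method 3
qqplot(m2, m3);            % visual check: do the distributions line up?
[h, p] = kstest2(m2, m3);  % formal two-sample Kolmogorov-Smirnov test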