Hidden Markov Model to predict the next state - matlab

I understood HMM so so but not training it very well. I could not figure out how I should setup training and testing in my case. I am going to explain my problem.
I am working on Foursquare check-in dataset to model a HMM to predict next venue category of a user (I assume it is the Forward algorithm).
First, I divided the area where these check-ins are reported into hexagonal grid. Let's say there are 200 hexagons. These hexagons are my hidden states. Then, I calculated transition probabilities for these hex cells by exploiting order of hex visits of users in time. The transition matrix is 200x200.
Second, I calculated emission probabilities by counting number of each category in every cell. There are 9 venue categories. Thus, emission matrix is 200x9.
Third, start probabilities are calculated by looking distribution of check-ins in each category.
Namely:
states = ('hex1', 'hex2', ..., 'hex200')
observations = ('category1', 'category2', ..., 'category9')
Till this point, everything is okay. However, I want to predict next category of a user if I know his/her previously located hexs and categories in a way.
I want to train my model, then test it and also show accuracy of my predictions according to the model. For this, the model should be trained with what kind of sequences? I cannot realize that how I predict the next category and more importantly, how can I setup the model to perform Expectation Maximization. Any effort to help me will be appreciated. If you want to explain these as code sample, you may write it in pseudo or MATLAB. Also, verbal explanation is okay.

Related

Choice of Neural Network and Activation Function

I am very new to the field of Neural Network. Apologies, if this question is very amateurish.
I am looking to build a neural network model to predict whether a particular image that I am about to post on a social media platform will get a certain engagement rate.
I have around 120 images with historical data about the engagement rate. The following information is available:
Images of size 501 px x 501 px
Type of image (Exterior photoshoot/Interior photoshoot)
Day of posting the image (Sunday/Monday/Tuesday/Wednesday/Thursday/Friday/Saturday)
Time of posting the image (18:33, 10:13, 19:36 etc)
No. of people who have seen the post (15659, 35754, 25312 etc)
Engagement rate (5.22%, 3.12%, 2.63% etc)
I would like the model to predict if a certain image when posted on a particular day and time will give an engagement rate of 3% or more.
As you may have noticed, the input data is images, text (signifying what type or day), time and numbers.
Could you please help me understand how to build a neural network for this problem?
P.S: I am very new to this field. It would be great if you can give a detailed direction how I should proceed to solve this problem.
A neural network has three kinds of neuronal layers:
Input layer. It stores the inputs this network will receive. The number of neurons must equal the number of inputs you have;
Hidden layer. It uses the inputs that come from the previous layer and it does the necessary calculations so as to obtain a result, which passes to the output layer. More complex problems may require more than one hidden layer. As far as I know, there is not an algorithm to determine the number of neurons in this layer, so I think you determine this number based on trial and error and previous experience;
Output layer. It gets the results from the hidden layer and gives it to the user for his personal use. The number of neurons from the output layer equals the number of outputs you have.
According to what you write here, your training database has 6 inputs and one output (the engagement rate). This means that your artificial neural network (ANN) will have 6 neurons on the input layer and one neuron on the output layer.
I not sure if you can pass images as inputs to a neural network. Also, because in theory there are an infinite types of images, I think you should categorize them a bit, each category receiving a number. An example of categorization would be:
Images with dogs are in category 1;
Images with hospitals are in category 2, etc.
So, your inputs will look like this:
Image category (dogs=1, hospitals=2, etc.);
Type of image (Exterior photoshoot=1, interior photoshoot=2);
Posting day (Sunday=1, Monday=2, etc.);
Time of posting the image;
Number of people who have seen the post;
Engagement rate.
The number of hidden layers and the number of each neuron from each hidden layer depends on your problem's complexity. Having 120 pictures, I think one hidden layer and 10 neurons on this layer is enough.
The ANN will have one hidden layer (the engagement rate).
Once the database containing the information about the 120 pictures is created (known as training database) is created, the next step is to train the ANN using the database. However, there is some discussion here.
Training an ANN means computing some parameters of the hidden neurons by using an optimization algorithm so as the sum of squared errors is minimum. The training process has some degree of randomness to it. To minimize the effect of the randomness factor and to get as precise estimations as possible, your training database must have:
Consistent data;
Many records;
I don't know how consistent your data are, but from my experience, a small training database with consistent data beats a huge database with non-consistent ones.
Judging by the problem, I think you should use the default activation function provided by the software you use for ANN handling.
Once you have trained your database, it is time to see how efficient this training was. The software which you use for ANN should provide you with tools to estimate this, tools which should be documented. If training is satisfactory for you, you may begin using it. If it is not, you may either re-train the ANN or use a larger database.

Error function and ReLu in a CNN

I'm trying to get a better understanding of neural networks by trying to programm a Convolution Neural Network by myself.
So far, I'm going to make it pretty simple by not using max-pooling and using simple ReLu-activation. I'm aware of the disadvantages of this setup, but the point is not making the best image detector in the world.
Now, I'm stuck understanding the details of the error calculation, propagating it back and how it interplays with the used activation-function for calculating the new weights.
I read this document (A Beginner's Guide To Understand CNN), but it doesn't help me understand much. The formula for calculating the error already confuses me.
This sum-function doesn't have defined start- and ending points, so i basically can't read it. Maybe you can simply provide me with the correct one?
After that, the author assumes a variable L that is just "that value" (i assume he means E_total?) and gives an example for how to define the new weight:
where W is the weights of a particular layer.
This confuses me, as i always stood under the impression the activation-function (ReLu in my case) played a role in how to calculate the new weight. Also, this seems to imply i simply use the error for all layers. Doesn't the error value i propagate back into the next layer somehow depends on what i calculated in the previous one?
Maybe all of this is just uncomplete and you can point me into the direction that helps me best for my case.
Thanks in advance.
You do not backpropagate errors, but gradients. The activation function plays a role in caculating the new weight, depending on whether or not the weight in question is before or after said activation, and whether or not it is connected. If a weight w is after your non-linearity layer f, then the gradient dL/dw wont depend on f. But if w is before f, then, if they are connected, then dL/dw will depend on f. For example, suppose w is the weight vector of a fully connected layer, and assume that f directly follows this layer. Then,
dL/dw=(dL/df)*df/dw //notations might change according to the shape
//of the tensors/matrices/vectors you chose, but
//this is just the chain rule
As for your cost function, it is correct. Many people write these formulas in this non-formal style so that you get the idea, but that you can adapt it to your own tensor shapes. By the way, this sort of MSE function is better suited to continous label spaces. You might want to use softmax or an svm loss for image classification (I'll come back to that). Anyway, as you requested a correct form for this function, here is an example. Imagine you have a neural network that predicts a vector field of some kind (like surface normals). Assume that it takes a 2d pixel x_i and predicts a 3d vector v_i for that pixel. Now, in your training data, x_i will already have a ground truth 3d vector (i.e label), that we'll call y_i. Then, your cost function will be (the index i runs on all data samples):
sum_i{(y_i-v_i)^t (y_i-vi)}=sum_i{||y_i-v_i||^2}
But as I said, this cost function works if the labels form a continuous space (here , R^3). This is also called a regression problem.
Here's an example if you are interested in (image) classification. I'll explain it with a softmax loss, the intuition for other losses is more or less similar. Assume we have n classes, and imagine that in your training set, for each data point x_i, you have a label c_i that indicates the correct class. Now, your neural network should produce scores for each possible label, that we'll note s_1,..,s_n. Let's note the score of the correct class of a training sample x_i as s_{c_i}. Now, if we use a softmax function, the intuition is to transform the scores into a probability distribution, and maximise the probability of the correct classes. That is , we maximse the function
sum_i { exp(s_{c_i}) / sum_j(exp(s_j))}
where i runs over all training samples, and j=1,..n on all class labels.
Finally, I don't think the guide you are reading is a good starting point. I recommend this excellent course instead (essentially the Andrew Karpathy parts at least).

Decision Level Fusion of SVR outputs

I have two sets of features predicting the same outputs. But instead of training everything at once, I would like to train them separately and fuse the decisions. In SVM classification, we can take the probability values for the classes which can be used to train another SVM. But in SVR, how can we do this?
Any ideas?
Thanks :)
There are a couple of choices here . The two most popular ones would be:
ONE)
Build the two models and simply average the results.
It tends to work well in practice.
TWO)
You could do it in a very similar fashion as when you have probabilities. The problem is, you need to control for over fitting .What I mean is that it is "dangerous" to produce a score with one set of features and apply to another where the labels are exactly the same as before (even if the new features are different). This is because the new applied score was trained on these labels and therefore over fits in it (hyper-performs).
Normally you use a Cross-validation
In your case you have
train_set_1 with X1 features and label Y
train_set_2 with X2 features and same label Y
Some psedo code:
randomly split 50-50 both train_set_1 and train_set_2 at exactly the same points along with the Y (output array)
so now you have:
a.train_set_1 (50% of training_set_1)
b.train_set_1 (the rest of 50% of training_set_1)
a.train_set_2 (50% of training_set_2)
b.train_set_2 (the rest of 50% of training_set_2)
a.Y (50% of the output array that corresponds to the same sets as a.train_set_1 and a.train_set_2)
b.Y (50% of the output array that corresponds to the same sets as b.train_set_1 and b.train_set_2)
here is the key part
Build a svr with a.train_set_1 (that contains X1 features) and output a.Y and
Apply that model's prediction as a feature to b.train_set_2 .
By this I mean, you score the b.train_set_2 base on your first model. Then you take this score and paste it next to your a.train_set_2 .So now this set will have X2 features + 1 more feature, the score produced by the first model.
Then build your final model on b.train_set_2 and b.Y
The new model , although uses the score produced from training_set_1, still it does so in an unbiased way , since the later was never trained on these labels!
You might also find this paper quite useful

Lake Visitor Modeling by Neural Networks

Let's say I want to model the amount of visitors at an arbitrary lake at specific time.
Given Data:
Time Series of Amount of Visitors for 12 lakes.
Weather Time Series for the 12 lakes
Number of Trees at lake
Percentage of grass/stone ground of the beach.
Hereby I want to use a Neural Network (NN) to model the amount of visitors and I have some essential questions which I want to introduce step by step. Note that the visitor time series shall not be used!
1) we only use the Inputs:
Time of Day
Day of Week
So there is two inputs and one output. I read of a rule of thumb which says that the hidden neurons should be chosen as
#input>=neurons>=#output.
Is the number of inputs here 2 or is it an estimate of the real amount of dependent variables (as weather, mood of persons, economical situation, ....). If yes so I should choose my hidden neurons as 1 or 2, correct?
2) If I want to include lake specific parameters as the number of treas or the ground ratio, can I just add these as additional inputs (constant for each of the twelve lakes) or would that not help for some reason? How could I assure that there is a causal connection between these inputs and the output?
3) For weather since it is a time series which weather values should I use. How do I get the optimal delay for example. Would Granger Causality be a mean to determine that?
Hope you can help. I just wanna discuss on the strength of NNs for modeling and want to hear your opinion. I would use Matlabs Neural Network Toolbox for this.
Thanks in advance.

Mapping Vision Outputs To Neural Network Inputs

I'm fairly new to MATLAB, but have acquainted myself with Simulink and Computer Vision over the past few days. My problem statement involves taking a traffic/highway video input and detecting if an accident has occurred.
I plan to do this by extracting the values of centroid to plot trajectory, velocity difference (between frames) and distance between two vehicles. I can successfully track the centroids, and aim to derive the rest of the features.
What I don't know is how to map these to ANN. I mean, every image has more than one vehicle blobs, which means, there are multiple centroids in a single frame/image. So, how does NN act on multiple inputs (the extracted features per vehicle) simultaneously? I am obviously missing the link. Help me figure it out please.
Also, am I looking at time series data?
I am not exactly sure about your question. The problem can be both time series data and not. You might be able to transform the time series version of the problem, such that it can be solved using ANN, but it is sort of a Maslow's hammer :). Also, Could you rephrase the problem.
As you said, you could give it features from two or three frames and then use the classifier to detect accident or not, but it might be difficult to train such a classifier. The problem is really difficult and the so you might need tons of training samples to get it right, esp really good negative samples (for examples cars travelling close to each other) etc.
There are multiple ways you can try to solve this problem of accident detection. For example : Build a classifier (ANN/SVM etc) to detect accidents without time series data. In which case your input would be accident images and non accident images or some sort of positive and negative samples for training and later images for test. In this specific case, you are not looking at the time series data. But here you might need lots of features to detect the same (this in some sense a single frame version of the problem).
The second method would be to use time series data, in which case you will have to detect the features, track the features (say using Lucas Kanade/Horn and Schunck) and then use the information about velocity and centroid to detect the accident. You might even be able to formulate it for HMMs.