I want to train a logistic regression model online in Matlab: present the first sample, evaluate the model, add the second sample, evaluate again, and so on.
I could do this by creating a model on the first sample, evaluating it, and throwing it away, then creating a model on samples one and two, evaluating it, and so on, but this is very inefficient. Is there a way to do 'real' online training of a logistic regression model in Matlab?
Short answer: No, Matlab does not support it (at least not that I'm aware of). You therefore need to create a whole new model every time you get new input data. Depending on the size of the task, this might still be the best choice.
Workaround: You can implement it yourself by writing a loss function that is updated every time a new sample arrives. Take a look at this paper if you decide to go this way (it covers many kinds of loss functions, but you are interested in the logistic one):
http://arxiv.org/abs/1011.1576
Or you could go Bayesian and update your priors any time a new point comes in.
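A minimal sketch of the do-it-yourself route, written in Python rather than Matlab for illustration. It assumes plain stochastic gradient descent on the logistic loss, which is one common way to do online updates and not necessarily the method of the linked paper. The class name, the `partial_fit` method, the learning rate and the toy stream are all hypothetical.

```python
import math


class OnlineLogisticRegression:
    """Logistic regression updated one sample at a time with SGD (illustrative)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clip to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, x, y):
        # For the logistic loss, the gradient w.r.t. the linear output is (p - y).
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


# Toy stream of (features, label) pairs, invented for this example.
stream = [([0.0], 0), ([1.0], 1), ([0.2], 0), ([0.9], 1)] * 50

model = OnlineLogisticRegression(n_features=1, learning_rate=0.5)
for x, y in stream:
    p_before = model.predict_proba(x)  # evaluate on the new sample first
    model.partial_fit(x, y)            # then learn from it
```

Each sample is seen once per pass and the model is never rebuilt from scratch, which is the evaluate-then-update loop the question asks for.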
In text generation tasks, we usually use the model's last output as the current input to generate the next word. More generally, I want to build a neural network that takes the next layer's final hidden state as the current layer's input, as in the following figure (what confuses me is the decoder part):
But I have read the Keras documentation and haven't found any function that does this.
Can I build this structure in Keras? How?
What you are describing is an autoencoder; you can find similar structures in Keras.
But there are certain details you should figure out on your own, including the padding strategy and the preprocessing of your input and output data. Your model cannot take a dynamic input size, so you need a fixed length for inputs and outputs. I don't know what you mean by the arrows that join in one circle, but I guess you can take a look at the Merge layer in Keras (basically adding, concatenating, etc.).
You probably need four Sequential models and one final model that represents the combined structure.
One more thing: the decoder setup of the LSTM (the language model) is not dynamic by design. In your model definition, you introduce fixed inputs and outputs for it. Then you prepare the training data accordingly, so you don't need anything dynamic. At test time, you can predict each decoded word in a loop: run the model once to predict the next output step, then run it again for the next time step, and so on.
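The test-time loop described above can be sketched in a framework-independent way. Here `predict_next` is a hypothetical stand-in for one run of the trained model (in Keras that would be a prediction on a padded, fixed-length input), and the lookup table is a toy replacement for a real network.

```python
def greedy_decode(predict_next, start_token, end_token, max_len=20):
    """Generate a sequence by feeding the model's output back in as input."""
    tokens = [start_token]
    for _ in range(max_len):
        next_token = predict_next(tokens)  # one model run per time step
        tokens.append(next_token)
        if next_token == end_token:
            break
    return tokens


def toy_predict_next(tokens):
    # Hypothetical "model": looks only at the last token.
    table = {"<s>": "the", "the": "cat", "cat": "sat", "sat": "</s>"}
    return table[tokens[-1]]


generated = greedy_decode(toy_predict_next, "<s>", "</s>")
```

The model itself stays fixed in shape; only the loop around it is dynamic.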
The structure you have shown is a custom structure, so Keras doesn't provide any class or wrapper to build it directly. But yes, you can build this kind of structure in Keras.
It looks like you need an LSTM model in the backward direction. I didn't understand the other part, which looks like it incorporates the previous sentence embedding as input to the next time step of the LSTM unit.
I would encourage you to start with simple language modeling with an LSTM. You can then tweak it into the architecture depicted in the figure.
Example:
Text generation with LSTM in Keras
I am trying to fit some inputs to predict an output in Matlab using fitnet neural networks, but as a preprocessing step before training I want to find which candidate input vector correlates most with the output.
In the figure below, the output in yellow has five input candidates, from which I need to choose. What command should I use in Matlab, and how should I prepare the data (repeated around 1000 times) so that I get a clear correlation between the input candidate and the output?
To find the correlation between a given feature and the target variable you can use R = corrcoef(A,B), but... do not do it!
This process makes no sense and will probably be harmful to the whole pipeline. You are going to remove part of the information from your data so that only features with an independent, linear relation to the target variable persist. Then you will apply a highly nonlinear model which exploits co-occurrences and correlations between features. These two steps are completely incompatible. The only case where they agree is when your data is so simple that it can be modeled well with a linear model, in which case a neural net will work as well. But then there is no point in using a neural net in the first place; just apply linear regression.
Consequently: do not perform feature selection unless you have to. Try to build a good model without it, and if you have to remove some features (maybe obtaining them is expensive?), use post-hoc model analysis to remove the features that the model does not use. Do not split your problem into multiple independent processes if you do not have to (unless you can show that this decomposition does not harm the process, but for feature selection + regressor this is not true, as you cannot construct valid feature selection supervision without a trained regressor).
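A small illustration of why a linear correlation filter can discard useful inputs, sketched in Python with made-up data. The target below is fully determined by the feature (y = x^2), yet the Pearson coefficient, which is what corrcoef reports, is essentially zero. The `pearson` helper is written out by hand for this example.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


x = [i / 10 for i in range(-50, 51)]   # symmetric around zero
y_nonlinear = [v * v for v in x]       # perfectly predictable from x
y_linear = [2 * v + 1 for v in x]

r_nonlinear = pearson(x, y_nonlinear)  # close to 0
r_linear = pearson(x, y_linear)        # close to 1
```

A filter based on this score would drop `x` for the quadratic target, even though a nonlinear model could use it perfectly.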
I have run an ANN in Matlab to predict a variable based on several response variables. All variables have numerical values. I could not get desirable results, although I changed the number of hidden neurons several times, ran the model many times, and so on. My question is: should I transform the input variables to get better results? How can I know which transformation to choose? Thanks for any help.
I strongly advise you to use methods from time series analysis, such as lagged correlation or windowed lagged correlation (with statistical tests). You can find them in most statistical packages (e.g. in R). From one small picture it's hard to deduce whether your prediction is lagged or not. Testing on a large amount of data can help reveal true dependencies and avoid trusting spurious correlations.
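As a rough sketch of what lagged correlation means, the Python snippet below correlates x[t] with y[t + lag] for a range of lags and picks the strongest one. The signals are synthetic (y is x delayed by three steps), the helper names are hypothetical, and the statistical tests mentioned above are not included.

```python
import math


def pearson(a, b):
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag] for a non-negative lag."""
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


# Synthetic example: y repeats x with a delay of 3 steps.
x = [math.sin(0.3 * t) for t in range(200)]
y = [0.0, 0.0, 0.0] + x[:-3]

scores = {lag: lagged_correlation(x, y, lag) for lag in range(11)}
best_lag = max(scores, key=scores.get)
```

On real data the peak is rarely this clean, which is why the significance tests and a large sample matter.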
I'm trying to get started using neural networks for a classification problem. I chose to use the Encog 3.x library as I'm working on the JVM (in Scala). Please let me know if this problem is better handled by another library.
I've been using resilient backpropagation. I have 1 hidden layer, and e.g. 3 output neurons, one for each of the 3 target categories. So ideal outputs are either 1/0/0, 0/1/0 or 0/0/1. Now, the problem is that the training tries to minimize the error, e.g. turn 0.6/0.2/0.2 into 0.8/0.1/0.1 if the ideal output is 1/0/0. But since I'm picking the highest value as the predicted category, this doesn't matter for me, and I'd want the training to spend more effort in actually reducing the number of wrong predictions.
So I learnt that I should use a softmax function as the output (although it is unclear to me if this becomes a 4th layer or I should just replace the activation function of the 3rd layer with softmax), and then have the training reduce the cross entropy. Now I think that this cross entropy needs to be calculated either over the entire network or over the entire output layer, but the ErrorFunction that one can customize calculates the error on a neuron-by-neuron basis (reads an array of ideal outputs and an array of actual outputs, writes an array of error values). So how does one actually do cross entropy minimization using Encog (or which other JVM-based library should I choose)?
I'm also working with Encog, but in Java, though I don't think it makes a real difference. I have a similar problem, and as far as I know you have to write your own function that minimizes cross entropy.
And as I understand it, softmax should just replace your 3rd layer.
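For reference, here is a library-independent sketch in Python of softmax with cross entropy; it makes no claim about how Encog's own classes work. The cross entropy is computed over the whole output layer, but its gradient with respect to each output neuron's pre-activation is simply `actual - ideal`, so it can still be expressed one neuron at a time. Function names are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw output-layer activations into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, ideal):
    """Cross entropy between one-hot ideal outputs and predicted probabilities."""
    return -sum(t * math.log(p) for p, t in zip(probs, ideal) if t > 0)


def output_deltas(probs, ideal):
    # With softmax outputs and cross-entropy loss, dLoss/dlogit_i = p_i - t_i.
    return [p - t for p, t in zip(probs, ideal)]


ideal = [1.0, 0.0, 0.0]
probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy(probs, ideal)
deltas = output_deltas(probs, ideal)
```

Note that cross entropy still rewards moving 0.6/0.2/0.2 towards 0.8/0.1/0.1; it optimizes the probability of the correct class, not the count of wrong predictions directly.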
I am new to image classification and am currently using an SVM (support vector machine) to classify four groups of images with the multisvm function. In my algorithm the training and testing data are randomly selected on every run, and the performance varies each time. Someone suggested cross-validation, but I do not understand why we need it or what its main purpose is. My actual data set consists of a training matrix of size 28×40000 and a testing matrix of size 17×40000. How do I do cross-validation with this data set? Thanks in advance.
Cross validation is used to select your model. The out-of-sample error can be estimated from your validation error. As a result, you would like to select the model with the least validation error. Here the model refers to the features you want to use and, more importantly, the gamma and C in your SVM. After cross validation, you train on the whole training data using the gamma and C that gave the least average validation error.
You may also need to estimate the performance of your features and parameters to avoid both high bias and high variance. Whether your model suffers from underfitting or overfitting can be observed from both the in-sample error and the validation error.
10-fold cross validation is a common choice.
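The selection step can be sketched as follows in Python. The error values are invented purely to show the mechanics; in practice each list would hold the validation errors measured on each fold for that (gamma, C) pair. The function name is hypothetical.

```python
def select_by_cv(fold_errors):
    """Pick the parameter setting with the lowest mean validation error.

    fold_errors maps a (gamma, C) tuple to a list of per-fold validation errors.
    """
    mean_error = {
        params: sum(errs) / len(errs) for params, errs in fold_errors.items()
    }
    best = min(mean_error, key=mean_error.get)
    return best, mean_error[best]


# Illustrative numbers only, not results from any real experiment.
fold_errors = {
    (0.01, 1.0): [0.30, 0.25, 0.35],
    (0.10, 1.0): [0.20, 0.15, 0.25],
    (0.10, 10.0): [0.22, 0.30, 0.20],
}

best_params, best_error = select_by_cv(fold_errors)
```

The chosen pair is then used to train one final model on the entire training set.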
I'm not familiar with multiSVM, but you may want to check out libSVM. It is a popular, free SVM library with support for a number of different programming languages.
Here they describe cross validation briefly. It is a way to avoid over-fitting the model by breaking up the training data into subgroups. In this way you can find a model (defined by a set of parameters) which fits both subgroups optimally.
For example, in the following picture they plot the validation accuracy contours for parameterized gamma and C values which are used to define the model. From this contour plot you can tell that the heuristically optimal values (from those tested) are those that give an accuracy closer to 84 instead of 81.
Refer to this link for more detailed information on cross-validation.
You always need to cross-validate your experiments in order to guarantee a correct scientific approach. For instance, if you don't cross-validate, the results you read (such as accuracy) might be highly biased by your test set. In an extreme case, your training step might have been very weak (in terms of fitting data) and your test step might have been very good. This applies to ALL machine learning and optimization experiments, not only SVMs.
To avoid such problems, divide your initial dataset in two (for instance), train on the first set and test on the second, then repeat the process inversely, training on the second and testing on the first. This will guarantee that any biases in the data are visible to you. As someone suggested, you can take the division further: 10-fold cross-validation means dividing your data set into 10 parts, training on 9 and testing on 1, then repeating the process until you have tested on every part.
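A minimal sketch of the k-fold splitting described above, in Python. It only produces index sets; training and testing the SVM on each split is left out. The sample count of 28 assumes the rows of the 28×40000 training matrix from the question are the samples, and the function name and seed are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k roughly equal parts
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(n_samples=28, k=10))
```

Every sample ends up in the test part exactly once, so averaging the accuracy over the splits gives a less noisy estimate than a single random split.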