Can I use the next layer's output as the current layer's input in Keras? - neural-network

In text generation tasks, we usually use the model's last output as the current input to generate the next word. More generally, I want to build a neural network that uses the next layer's final hidden state as the current layer's input, just like the following (what confuses me is the decoder part):
But I have read the Keras documentation and haven't found any function that achieves this.
Can I build this structure in Keras? How?

What you are asking about is an autoencoder; you can find similar structures in Keras.
But there are certain details that you will have to figure out on your own, including the padding strategy and the preprocessing of your input and output data. Your input cannot have a dynamic size, so you need a fixed length for inputs and outputs. I don't know what you mean by the arrows that join into one circle, but I guess you can take a look at the merge layers in Keras (adding, concatenating, etc.).
You probably need four sequential models and one final model that represents the combined structure.
One more thing: the decoder setup of the LSTM (the language model) is not dynamic by design. In your model definition you introduce fixed inputs and outputs for it, and if you prepare the training data accordingly, you don't need anything dynamic. Then, at test time, you can predict each decoded word in a loop: run the model once to predict the next output step, feed that prediction back in, and run it again for the next time step, and so on.
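A minimal sketch of that prediction loop, assuming a trained Keras model named decoder that maps a fixed-length window of token ids to a softmax over the vocabulary (decoder, seed_tokens, and max_len are illustrative names, not from the question):

    import numpy as np

    # Assumed setup: `decoder` is a trained Keras model that takes a batch of
    # fixed-length token-id sequences and returns a softmax over the vocabulary.
    max_len = 20                         # fixed input length the model was trained with
    seed_tokens = [4, 8, 15]             # illustrative seed sequence of token ids
    generated = list(seed_tokens)

    for _ in range(50):                  # generate 50 tokens
        window = generated[-max_len:]    # trim to the fixed window ...
        window = [0] * (max_len - len(window)) + window   # ... and left-pad it
        x = np.array([window])           # shape (1, max_len)
        probs = decoder.predict(x, verbose=0)[0]   # distribution over the vocabulary
        next_id = int(np.argmax(probs))  # greedy choice; sampling also works
        generated.append(next_id)        # feed the output back in as input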

The structure you have shown is a custom structure, so Keras doesn't provide any class or wrapper to build it directly. But yes, you can build this kind of structure in Keras.
It looks like you need an LSTM running in the backward direction. I didn't understand the other part, which probably involves incorporating the previous sentence embedding as input to the next time step of the LSTM.
I would rather encourage you to start with simple language modeling with an LSTM first. Then you can tweak the architecture later to build the one depicted in your figure.
Example:
Text generation with LSTM in Keras
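For reference, a minimal character-level language model in Keras in that spirit (the corpus, sizes, and names here are illustrative):

    import numpy as np
    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    # Toy corpus; in practice you would load real text.
    text = "hello world " * 100
    chars = sorted(set(text))
    char_to_ix = {c: i for i, c in enumerate(chars)}

    seq_len = 10
    X, y = [], []
    for i in range(len(text) - seq_len):
        X.append([char_to_ix[c] for c in text[i:i + seq_len]])
        y.append(char_to_ix[text[i + seq_len]])

    # One-hot encode: X -> (samples, seq_len, vocab), y -> (samples, vocab).
    X = np.eye(len(chars))[np.array(X)]
    y = np.eye(len(chars))[np.array(y)]

    model = Sequential([
        LSTM(64, input_shape=(seq_len, len(chars))),
        Dense(len(chars), activation="softmax"),   # next-character distribution
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    model.fit(X, y, epochs=3, batch_size=32)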

Related

create deep network in matlab with logsig layer instead of softmax layer

I want to create a deep classification net, but my classes aren't mutually exclusive (which is what a softmax layer assumes).
Is it possible to define a non-mutually-exclusive classification layer (i.e., a sample can belong to more than one class)?
One way to do it would be with a logsig function in the classification layer instead of a softmax, but I have no idea how to accomplish that.
In a CNN you can have multiple classes in the last layer, as you know. But if I understand correctly, you need the last layer to output a value in a range of numbers instead of 1 or 0 for each class. That means you need regression. If your labels support this task, that's fine, and you can do it with regression, just like what happens in bounding-box regression for localization. You don't need softmax in the last layer; just use another activation function that produces suitable output for your task.
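The logsig idea maps directly onto a sigmoid output layer with a per-class binary cross-entropy loss. A sketch of the idea in Keras rather than Matlab, since the principle is the same (the layer sizes here are illustrative):

    from keras.models import Sequential
    from keras.layers import Dense

    n_features, n_classes = 100, 5       # illustrative sizes

    model = Sequential([
        Dense(64, activation="relu", input_shape=(n_features,)),
        # Sigmoid (logsig) outputs: each class gets an independent score in
        # [0, 1], so a sample can belong to several classes at once.
        Dense(n_classes, activation="sigmoid"),
    ])
    # Binary cross-entropy treats each output unit as an independent yes/no decision.
    model.compile(loss="binary_crossentropy", optimizer="adam")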

Usage of indicator functions as features in Sequential Models

I am currently using Mallet to train a sequential model with a CRF. I understand how to provide features (that depend solely on the input sequence) to the Mallet package. Based on my understanding, in Mallet we have to compute all the values of the feature functions upfront. Now I would like to use indicator functions that depend on the label of a token. The values of these functions depend on the output label sequence, and during training I can compute them because the output label sequence is known. But when I apply this trained CRF model to a new input (whose output label sequence is unknown), how should I calculate the values of such features?
It would be very helpful if anyone could provide me with tips or relevant documents.
As you've phrased it, the question doesn't make sense: if you don't know the hidden labels, you can't set anything based on those unknown labels. An example might help.
You may not need to explicitly record these relationships. At training time, the algorithm sets the parameters of the CRF to represent the relationship between the observed features and the unobserved states. Different CRF architectures can allow you to add dependencies between multiple hidden states.

How to add a custom layer and loss function into a pretrained CNN model by matconvnet?

I'm new to MatConvNet. Recently I've wanted to try a new loss function in place of the existing one in a pretrained model, e.g. VGG-16, which usually uses a softmax loss layer. What's more, I want to use a new feature-extractor layer instead of a pooling or max layer. I know there are two CNN wrappers in MatConvNet, SimpleNN and DagNN; since VGG-16 is a linear model with a linear sequence of building blocks, I'm using the SimpleNN wrapper. How do I create a custom layer in detail, especially the procedure and the relevant concepts? For example, do I need to remove the layers behind the new feature-extractor layer, or just leave them? I know how to compute the derivative of the loss function, so the details of the computation inside the layer are not that important for this question; I just want to see the procedure expressed in code. Could someone help me? I'd appreciate it a lot!
You can remove the old error/objective layer. With the SimpleNN wrapper the layers live in a cell array, so dropping the final (loss) layer is
net.layers(end) = [];
and you can then add your new loss code in the vl_nnloss() file.

Online logistic regression

I wish to use online logistic regression training in Matlab, in which I train the model by presenting the first sample, evaluating the model, then adding the second sample, evaluating again, and so on.
I could do this by first creating a model on the first sample, evaluating it, throwing that model away, then creating a model on samples one and two, evaluating it, and so on, but this is very inefficient. Is there a way to do 'real' online training of a logistic regression model in Matlab?
Short answer: no, Matlab does not support it (at least not that I'm aware of). Therefore you need to create a whole new model every time you get new input data. Depending on the size of the task, this might still be the best choice.
Workaround: you can implement it yourself with an update rule that adjusts the model every time a new sample arrives. Take a look at this paper if you decide to go this way (it covers many kinds of loss functions, but you are interested in the logistic one):
http://arxiv.org/abs/1011.1576
Or you could go Bayesian and update your priors every time a new point comes in.
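The online update itself is just one stochastic-gradient step on the logistic loss per incoming sample. A minimal sketch, in Python/NumPy for illustration since the thread is about the idea rather than a specific Matlab API (the learning rate and the streamed data here are made up):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    d = 3                       # number of features (illustrative)
    w = np.zeros(d)             # weights, updated online
    b = 0.0                     # bias
    lr = 0.1                    # learning rate (illustrative)

    for _ in range(1000):       # stream of samples, one at a time
        x = rng.normal(size=d)              # fake incoming sample
        y = float(x.sum() > 0)              # fake binary label
        p = sigmoid(w @ x + b)              # current prediction
        # The logistic-loss gradient for a single sample is (p - y) * x,
        # so one online step per sample is:
        w -= lr * (p - y) * x
        b -= lr * (p - y)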

Neural Network bias

I'm building a feed forward neural network, and I'm trying to decide how to implement the bias. I'm not sure about two things:
1) Is there any downside to implementing the bias as a trait of the node, as opposed to a dummy input plus weight?
2) If I implement it as a dummy input, would it be input just to the first layer (from the input to the hidden layer), or would I need a dummy input in every layer?
Thanks!
P.S. I'm currently using 2D arrays to represent the weights between layers. Any ideas for other implementation structures? This isn't my main question, just food for thought.
Implementation doesn't matter as long as the behaviour is right.
Yes, it is needed in every layer.
A 2D array is the way to go.
I'd suggest including the bias as another neuron with a constant input of 1. This makes it easier to implement: you don't need a special variable for it or anything like that.
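A minimal sketch of that trick, in Python/NumPy for illustration (shapes and names are made up): append a constant 1 to each layer's input and let the weight matrix carry the bias in its last column, so every layer gets its own dummy input.

    import numpy as np

    def forward_layer(W, x):
        # One feed-forward layer with the bias folded into W.
        # W has shape (n_out, n_in + 1): the last column holds the bias weights.
        x_aug = np.append(x, 1.0)          # dummy input with constant value 1
        return np.tanh(W @ x_aug)          # any activation works here

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(4, 3 + 1))       # 3 inputs -> 4 hidden units (+ bias column)
    W2 = rng.normal(size=(2, 4 + 1))       # 4 hidden -> 2 outputs (+ bias column)

    h = forward_layer(W1, np.array([0.5, -1.0, 2.0]))
    y = forward_layer(W2, h)               # the dummy 1 is appended again here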