I have a question about using Keras (with Theano as my backend), which I'm rather new to. I'm using a many-to-one RNN (it takes in a time series as the input and computes one number as the output) as my first set of layers. So far this is trivial with Keras using the recurrent layers.
Here is where I'm having trouble:
Now I'd like to pass the output of this RNN (the one number) to a separate function (let's call it f) and then do some computation with it.
What I would like to do is take this computed output (after the function f) and train it against the expected output (via some loss such as MSE).
I'd like some advice on how to feed the post-computation output of the function f back into training, and still train via model.fit in Keras.
My pseudo code is as follows:
X = input
Y = output
#RNN layer
model.add(LSTM(....))
model.add(Activation(...))  # returns W*X
# function f, returns f(W*X)
# (needs to take in the output from the final RNN layer to generate a new number)
model.fit(X,Y,....)
In the above, I'm not sure how to write code that includes the output from the function f while training the weights of the RNN (i.e. trains f(W*X) against Y).
Any help is appreciated, thanks!!
It is not clear from your question whether the RNN's weights should be updated while training f.
1st option - They should
As Matias said - a simple Dense layer is probably what you are looking for:
X = input
Y = output
#RNN layer
model.add(LSTM(....))
model.add(Activation(...))  # returns W*X
model.add(Dense(...))       # this Dense layer plays the role of f
model.fit(X,Y,....)
2nd option - They should not
Your f function would still be a Dense layer but you will iteratively train f and the RNN separately.
Assuming you have an rnn_model that you defined as above, define a new model f:
X = input
Y = output
#RNN layer
rnn_model = Sequential()
rnn_model.add(LSTM(....))
rnn_model.add(Activation(...))  # returns W*X
f_model = Sequential()
f_model.add(rnn_model)
f_model.add(Dense(...))
Now you can train them separately by doing:
# Code to train rnn_model
rnn_model.trainable = False
# Code to train f_model
rnn_model.trainable = True
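A minimal sketch of that alternation might look like the following (the optimizer, loss, and a separate target Y_rnn for pre-training the RNN on its own are assumptions; also note that in Keras a change to trainable only takes effect after the model is compiled again):
# assumed data: X, Y_rnn (targets for the RNN alone), Y (targets after f)
rnn_model.compile(optimizer="adam", loss="mse")
rnn_model.fit(X, Y_rnn)                        # train the RNN on its own objective

rnn_model.trainable = False                    # freeze the RNN weights
f_model.compile(optimizer="adam", loss="mse")  # recompile so the freeze takes effect
f_model.fit(X, Y)                              # train only the Dense layer f

rnn_model.trainable = True                     # unfreeze before the next RNN round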
The simplest way is to add a layer to your model that does the exact computation you want. From your comments it seems you just want
f(W*X), and that is exactly what a Dense layer does, minus the bias term.
So I believe adding a dense layer with the appropriate activation function is everything you need to do. If you don't need an activation at the output then just use "linear" as activation.
Just note that the function f needs to be specified as a symbolic function using methods from keras.backend, and it should be a differentiable function.
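If your f really is something beyond a dense transformation, one way to express it is a Lambda layer built from keras.backend operations, which keeps the whole model differentiable and trainable end to end with model.fit. A minimal sketch (the layer sizes, the dummy data, and the squaring used as a stand-in for f are all assumptions):
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Lambda
import keras.backend as K

X = np.random.randn(64, 100, 1)             # 64 series, 100 timesteps, 1 feature (placeholders)
Y = np.random.randn(64, 1)                  # the expected outputs

model = Sequential()
model.add(LSTM(32, input_shape=(100, 1)))   # many-to-one RNN
model.add(Dense(1))                         # produces the single number W*X
model.add(Lambda(lambda t: K.square(t)))    # f applied symbolically, here f(z) = z^2 as an example
model.compile(optimizer="adam", loss="mse")
model.fit(X, Y)                             # the RNN weights are trained through f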
Related
When building a simple perceptron neural network, we usually pass a 2D input matrix of shape (batch_size, features) to a 2D weight matrix, similar to this simple neural network in NumPy. I always assumed a Perceptron/Dense/Linear layer of a neural network only accepts a 2D input and outputs another 2D output. But recently I came across this PyTorch model in which a Linear layer accepts a 3D input tensor and outputs another 3D tensor (o1 = self.a1(x)).
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.a1 = nn.Linear(4,4)
        self.a2 = nn.Linear(4,4)
        self.a3 = nn.Linear(9,1)

    def forward(self,x):
        o1 = self.a1(x)
        o2 = self.a2(x).transpose(1,2)
        output = torch.bmm(o1,o2)
        output = output.view(len(x),9)
        output = self.a3(output)
        return output

x = torch.randn(10,3,4)
y = torch.ones(10,1)

net = Net()
criterion = nn.MSELoss()
optimizer = optim.Adam(net.parameters())

for i in range(10):
    net.zero_grad()
    output = net(x)
    loss = criterion(output,y)
    loss.backward()
    optimizer.step()
    print(loss.item())
These are the questions I have:
Is the above neural network a valid one? That is, will the model train correctly?
Even after passing a 3D input x = torch.randn(10,3,4), why doesn't PyTorch's nn.Linear show any error, and why does it give a 3D output?
Newer versions of PyTorch allow nn.Linear to accept an N-D input tensor; the only constraint is that the last dimension of the input tensor must equal the in_features of the linear layer. The linear transformation is then applied to the last dimension of the tensor.
For instance, if in_features=5 and out_features=10 and the input tensor x has shape (2, 3, 5), then the output tensor will have shape (2, 3, 10).
If you have a look at the documentation, you will find that indeed the Linear layer accepts tensors of arbitrary shape, where only the last dimension must match with the in_features argument you specified in the constructor.
The output will have the same shape as the input, except that the last dimension will change to whatever you specified as out_features in the constructor.
It works by applying the same layer (with the same weights) to each of the (possibly) multiple inputs. In your example you have an input shape of (10, 3, 4), which is basically a set of 10 * 3 == 30 4-dimensional vectors. So your layers a1 and a2 are applied to all of these 30 vectors to generate another 10 * 3 == 30 4D vectors as the output (because you specified out_features=4 in the constructor).
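A quick way to see this shape behaviour in code (a small check using the same sizes as in the question):
import torch
import torch.nn as nn

layer = nn.Linear(4, 4)        # in_features=4, out_features=4
x = torch.randn(10, 3, 4)      # 10 * 3 == 30 four-dimensional vectors
print(layer(x).shape)          # torch.Size([10, 3, 4]) -- only the last dimension is transformed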
So, to answer your questions:
Is the above neural network a valid one? That is, will the model train correctly?
Yes, it is valid, and it will train "correctly" from a technical point of view. But, as with any other network, whether it will actually solve your problem is another question.
Even after passing a 3D input x = torch.randn(10,3,4), why doesn't PyTorch's nn.Linear show any error, and why does it give a 3D output?
Well, because it is defined to work this way.
In convolutional neural networks, how can I know the output shape of a specific conv layer? (I am using Keras to build a CNN model.)
For example, if I am using a one-dimensional conv layer, where number_of_filters=20, kernel_size=10, and input_shape=(500,1):
cnn.add(Conv1D(20,kernel_size=10,strides=1, padding="same",activation="sigmoid",input_shape=(Dimension_of_input,1)))
and if I am using a two-dimensional conv layer, where number_of_filters=64, kernel_size=(5,100), and input_shape=(5,720,1) (height, width, channel):
Conv2D(64, (5, 100),
       padding="same",
       activation="sigmoid",
       data_format="channels_last",
       input_shape=(5,720,1))
What is the output shape of each of the above two conv layers? Is there any equation that can be used to compute the output shape of a conv layer in a convolutional neural network?
Yes, there are equations for it; you can find them on the CS231n course website. But as this is a programming site, Keras provides an easy way to get this information programmatically, by using the summary function of a Model.
model = Sequential()
# fill model with layers
model.summary()
This will print in terminal/console all the layer information, such as input shapes, output shapes, and number of parameters for each layer.
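For example, with the Conv1D layer from the question (assuming the input length really is 500, as in input_shape=(500, 1)):
from keras.models import Sequential
from keras.layers import Conv1D

cnn = Sequential()
cnn.add(Conv1D(20, kernel_size=10, strides=1, padding="same",
               activation="sigmoid", input_shape=(500, 1)))
cnn.summary()   # with "same" padding and stride 1 this reports an output shape of (None, 500, 20)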
Actually, the model.summary() function might not be what you are looking for if you want to do more than just look at the model.
If you want to access the layers of your Keras model, you can do this by using model.layers, which returns all of the layers (the assignment stores them as a list). If you then want to look at a specific layer you can simply index the list:
list_of_layers = model.layers
list_of_layers[5] # gives you the 6th layer
What you get back are still just layer objects, so you probably want specific values. You just have to specify the attribute you want to look at:
list_of_layers[-1].output_shape # returns output_shape of last layer
Gives you back the output_shape tuple of the last layer in the model.
You can even skip the whole list assignment if you already know that you only want to look at the output_shape of a certain layer, and just do:
model.layers[-1].output_shape # equivalent to the above method without storing in a list
This might be useful if you want to use these values while building the model to guide its construction in a certain way (e.g. adding a pooling layer or padding), as in the sketch below.
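A rough sketch of using the shape while building (the Conv1D comes from the question; the Flatten/Dense head is just an illustration):
from keras.models import Sequential
from keras.layers import Conv1D, Flatten, Dense

model = Sequential()
model.add(Conv1D(20, kernel_size=10, padding="same", input_shape=(500, 1)))
print(model.layers[-1].output_shape)   # (None, 500, 20)

# decide on the next layers based on that shape
model.add(Flatten())
model.add(Dense(10, activation="softmax"))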
When I first worked with TensorFlow CNNs, it was very difficult to deal with the dimensions. Below is the general way to calculate them.
Consider an image of dimension (n x n) and a filter of dimension (f x f), with no padding and stride 1:
after convolution the dimensions are (n-f+1, n-f+1).
With image dimension (n x n), filter dimension (f x f) and padding p,
the output dims are (n+2p-f+1, n+2p-f+1).
If we are using padding = "same", the output dims equal the input dims; in this case the equation becomes n+2p-f+1 = n,
so p = (f-1)/2.
If we are using "valid" padding, there is no padding and p = 0.
In computer vision f is usually odd; if f is even it means we have asymmetric padding.
When we are using stride s,
the output dims are (floor((n+2p-f)/s) + 1, floor((n+2p-f)/s) + 1).
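As a quick sketch of the stride formula in Python (the helper name is mine):
import math

def conv_output_length(n, f, p, s):
    # output size = floor((n + 2p - f) / s) + 1
    return math.floor((n + 2 * p - f) / s) + 1

print(conv_output_length(500, 10, 0, 1))   # "valid" padding, stride 1 -> 491
# with "same" padding and stride 1 the output size equals the input size,
# so the Conv1D above gives (500, 20) and the Conv2D gives (5, 720, 64)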
I wrote this script (Matlab) for classification using Softmax. Now I want to use the same script for regression by replacing the Softmax output layer with a Sigmoid or ReLU activation function, but I wasn't able to do that.
X = houseInputs;
T = houseTargets;
%Train an autoencoder with a hidden layer of size 10 and a linear transfer function for the decoder. Set the L2 weight regularizer to 0.001, sparsity regularizer to 4 and sparsity proportion to 0.05.
hiddenSize = 10;
autoenc1 = trainAutoencoder(X,hiddenSize,...
'L2WeightRegularization',0.001,...
'SparsityRegularization',4,...
'SparsityProportion',0.05,...
'DecoderTransferFunction','purelin');
%%
%Extract the features in the hidden layer.
features1 = encode(autoenc1,X);
%Train a second autoencoder using the features from the first autoencoder. Do not scale the data.
hiddenSize = 10;
autoenc2 = trainAutoencoder(features1,hiddenSize,...
'L2WeightRegularization',0.001,...
'SparsityRegularization',4,...
'SparsityProportion',0.05,...
'DecoderTransferFunction','purelin',...
'ScaleData',false);
features2 = encode(autoenc2,features1);
%%
softnet = trainSoftmaxLayer(features2,T,'LossFunction','crossentropy');
%Stack the encoders and the softmax layer to form a deep network.
deepnet = stack(autoenc1,autoenc2,softnet);
%Train the deep network on the data.
deepnet = train(deepnet,X,T);
%Estimate the deep network, deepnet.
y = deepnet(X);
Regression is a different problem from classification. You have to change your loss function to something that fits a regression, e.g. mean squared error, and of course change the number of neurons to one (you will only output 1 value in your last layer).
It is possible to use a neural network to perform a regression task, but it might be overkill for many tasks. True regression means mapping one set of continuous inputs to another set of continuous outputs:
f: x -> ŷ
Changing the architecture of a neural network to make it perform a regression task is usually fairly simple. Instead of mapping the continuous input data to a specific class, as is done with the Softmax function in your case, you have to make the network use only a single output node.
This node will just sum the outputs of the previous layer (the last hidden layer) and pass that sum through an identity (linear) activation, i.e. multiply it by 1. During the training process this output ŷ will be compared to the correct ground-truth value y that comes with your dataset. As a loss function you may use the root-mean-squared error (RMSE).
Training such a network will result in a model that maps an arbitrary number of independent variables x to a dependent variable ŷ, which is basically a regression task.
To come back to your Matlab implementation: it would be incorrect to simply change the current Softmax output layer to an activation function such as a Sigmoid or ReLU. Instead you would have to implement a custom RMSE output layer for your network, which is fed with the sum of activations coming from the last hidden layer of your network.
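The idea is the same in any framework; purely as an illustration (in Keras rather than Matlab, with placeholder sizes and data), the classification head becomes a single linear output node trained with a squared-error loss:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

X = np.random.randn(200, 13)                            # placeholder inputs (e.g. 13 house features)
y = np.random.randn(200, 1)                             # placeholder continuous targets

model = Sequential()
model.add(Dense(10, activation="relu", input_dim=13))   # hidden layer
model.add(Dense(1, activation="linear"))                 # single linear output node
model.compile(optimizer="adam", loss="mse")              # squared-error loss for regression
model.fit(X, y)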
I know how the step transfer function works but how does the linear transfer function work? What equation do you use?
Please relate the answer to an AND gate with two inputs and a bias.
First of all, in general you want to apply a linear transfer function only in the output layer of an MLP and "never" in the hidden layers, where non-linear transfer functions are typically used (logistic function, step, etc.).
A linear transfer function (of the form f(x) = x, called pure linear or purelin in the literature) is typically used for function approximation / regression tasks (this is intuitive, because the step and logistic functions give binary results whereas the linear function gives continuous results).
Non-linear transfer functions are used for classification tasks.
A non-linear transfer function (a.k.a. activation function) is the most important factor that gives a simple fully connected multilayer neural network its nonlinear approximation capability.
Nevertheless, a 'linear' activation function is of course one of the many alternatives you might want to adopt. But the problem is that a pure linear transfer (f(x) = x) in the hidden layers doesn't make sense: it may be in vain to train a network whose hidden units are activated by a pure linear function.
We can see why as follows:
Assume f(x) = x is our activation function, and we try to train a single-hidden-layer network with 2 input units (x1, x2), 3 hidden units (a1, a2, a3) and 1 output unit (y).
Hence, the network computes the function:
# hidden units
a1 = f(w11*x1+w12*x2+b1) = w11*x1+w12*x2+b1
a2 = f(w21*x1+w22*x2+b2) = w21*x1+w22*x2+b2
a3 = f(w31*x1+w32*x2+b3) = w31*x1+w32*x2+b3
# output unit
y = c1*a1+c2*a2+c3*a3+b4
if we combine all these equations, it turns out:
y = c1(w11*x1+w12*x2+b1) + c2(w21*x1+w22*x2+b2) + c3(w31*x1+w32*x2+b3) + b4
= (c1*w11+c2*w21+c3*w31)*x1 + (c1*w12+c2*w22+c3*w32)*x2 + (c1*b1+c2*b2+c3*b3+b4)
= A1*x1+A2*x2+C
As shown above, a linear activation degenerates the network into a single linear input-output mapping, regardless of the structure of the network. All the training process does is factorize A1, A2 and C into various factors (the individual weights).
Even the very popular quasi-linear activation function used in deep neural networks, ReLU, is rectified rather than purely linear. In other words, pure linear activations are not used in hidden layers unless all you want is to factorize coefficients.
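To make the collapse concrete, here is a small NumPy check (the weights are random placeholders):
import numpy as np

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 2)), rng.normal(size=3)   # input -> hidden weights and biases
c, b4 = rng.normal(size=3), rng.normal()             # hidden -> output weights and bias

x = rng.normal(size=2)
y_network = c @ (W @ x + b) + b4             # the "network" with linear hidden units

A = c @ W                                    # the same mapping collapsed into A*x + C
C = c @ b + b4
y_collapsed = A @ x + C

print(np.allclose(y_network, y_collapsed))   # True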
I need to classify a dataset using Matlab MLP and show classification.
The dataset looks like this (dataset image linked in the original post).
What I have done so far is:
I have created a neural network containing a hidden layer (two neurons; maybe someone could give me some suggestions on how many neurons are suitable for my example) and an output layer (one neuron).
I have used several different learning methods such as Delta-bar-Delta and backpropagation (both with and without momentum), as well as Levenberg-Marquardt.
This is the code I used in Matlab (Levenberg-Marquardt example):
net = newff(minmax(Input),[2 1],{'logsig' 'logsig'},'trainlm');
net.trainParam.epochs = 10000;
net.trainParam.goal = 0;
net.trainParam.lr = 0.1;
[net tr outputs] = train(net,Input,Target);
The following shows the hidden-neuron classification boundaries generated by Matlab on the data. I am a little bit confused, because the network should produce a nonlinear result, yet the two boundary lines below appear to be linear (plot image linked in the original post).
The code for generating the above plot is:
figure(1)
plotpv(Input,Target);
hold on
plotpc(net.IW{1},net.b{1});
hold off
I also need to plot the output function of the output neuron, but I am stuck on this step. Can anyone give me some suggestions?
Thanks in advance.
Regarding the number of neurons in the hidden layer, for such a small example two are more than enough. The only way to know the optimum for sure is to test with different numbers. In this FAQ you can find a rule of thumb that may be useful: http://www.faqs.org/faqs/ai-faq/neural-nets/
For the output function, it is often useful to compute it in steps:
First, given the input vector x, compute the pre-activation of the neurons in the hidden layer: y = f(x) = x^T w + b, where w is the weight matrix from the input neurons to the hidden layer and b is the bias vector.
Second, apply the activation function g of the network to the result of the previous step: z = g(y).
Finally, the output is the dot product h(z) = z . v + n, where v is the weight vector from the hidden layer to the output neuron and n is the bias. In the case of more than one output neuron, repeat this for each one.
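As a rough sketch of these steps in Python with made-up weights (in Matlab the actual weights live in the net properties listed below):
import numpy as np

w = np.random.randn(2, 2)   # input -> hidden weight matrix (2 inputs, 2 hidden neurons)
b = np.random.randn(2)      # hidden-layer biases
v = np.random.randn(2)      # hidden -> output weight vector
n = np.random.randn()       # output bias

def logsig(y):
    return 1.0 / (1.0 + np.exp(-y))   # the 'logsig' activation used in the question

x = np.array([0.3, -1.2])   # an example input vector
y = w @ x + b               # step 1: weighted sum plus bias
z = logsig(y)               # step 2: activation
output = z @ v + n          # step 3: dot product with the output weights plus bias
print(output)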
I've never used the Matlab MLP functions, so I don't know how to get the weights in this case, but I'm sure the network stores them somewhere. Edit: searching the documentation, I found these properties:
net.IW numLayers-by-numInputs cell array of input weight values
net.LW numLayers-by-numLayers cell array of layer weight values
net.b numLayers-by-1 cell array of bias values