Multi-dimensional inputs to PyTorch's nn.Linear layer? - neural-network

When building a simple perceptron neural network we usually pass a 2D input matrix of shape (batch_size, features) through a 2D weight matrix, similar to this simple neural network in numpy. I always assumed a Perceptron/Dense/Linear layer of a neural network only accepts a 2D input and outputs another 2D output. But recently I came across this PyTorch model in which a Linear layer accepts a 3D input tensor and outputs another 3D tensor (o1 = self.a1(x)).
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.a1 = nn.Linear(4,4)
        self.a2 = nn.Linear(4,4)
        self.a3 = nn.Linear(9,1)

    def forward(self,x):
        o1 = self.a1(x)
        o2 = self.a2(x).transpose(1,2)
        output = torch.bmm(o1,o2)
        output = output.view(len(x),9)
        output = self.a3(output)
        return output

x = torch.randn(10,3,4)
y = torch.ones(10,1)

net = Net()
criterion = nn.MSELoss()
optimizer = optim.Adam(net.parameters())

for i in range(10):
    net.zero_grad()
    output = net(x)
    loss = criterion(output,y)
    loss.backward()
    optimizer.step()
    print(loss.item())
These are the questions I have:
Is the above neural network a valid one? That is, will the model train correctly?
Even after passing a 3D input x = torch.randn(10,3,4), why doesn't PyTorch's nn.Linear show any error, and why does it give a 3D output?

Newer versions of PyTorch allow nn.Linear to accept an N-D input tensor; the only constraint is that the last dimension of the input tensor must equal in_features of the linear layer. The linear transformation is then applied to the last dimension of the tensor.
For instance, if in_features=5 and out_features=10 and the input tensor x has shape (2, 3, 5), then the output tensor will have shape (2, 3, 10).
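A minimal sketch of those shapes (the sizes are just the ones from the example above):
import torch
import torch.nn as nn

linear = nn.Linear(in_features=5, out_features=10)
x = torch.randn(2, 3, 5)    # last dimension equals in_features
print(linear(x).shape)      # torch.Size([2, 3, 10])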

If you have a look at the documentation, you will find that indeed the Linear layer accepts tensors of arbitrary shape, where only the last dimension must match with the in_features argument you specified in the constructor.
The output will have exactly the same shape as the input, only the last dimension will change to whatever you specified as out_features in the constructor.
It works by applying the same layer (with the same weights) to each of the (possibly many) input vectors. In your example you have an input shape of (10, 3, 4), which is basically a set of 10 * 3 == 30 4-dimensional vectors. So, your layers a1 and a2 are applied to all of these 30 vectors to generate another 10 * 3 == 30 4D vectors as the output (because you specified out_features=4 in the constructor).
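You can convince yourself of this with a quick check (a sketch added for illustration, not part of the original model):
import torch
import torch.nn as nn

a1 = nn.Linear(4, 4)
x = torch.randn(10, 3, 4)
out_3d = a1(x)                                       # shape (10, 3, 4)
out_flat = a1(x.reshape(30, 4)).reshape(10, 3, 4)    # same 30 vectors, flattened first
print(torch.allclose(out_3d, out_flat))              # True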
So, to answer your questions:
Is the above neural network a valid one? That is, will the model train correctly?
Yes, it is valid and it will be trained "correctly" from a technical point of view. But, as with any other network, whether it will actually solve your problem is another question.
Even after passing a 3D input x = torch.randn(10,3,4), why doesn't PyTorch's nn.Linear show any error, and why does it give a 3D output?
Well, because it is defined to work this way.

Related

Is a fully connected layer equivalent to Flatten + Dense in Tensorflow?

A fully-connected layer, also known as a dense layer, is a layer whose neurons connect to every neuron in the preceding layer (see Wikipedia).
In the MATLAB Deep Learning Toolbox, when defining a fullyConnectedLayer(n), the output will always be (borrowing the terminology from Tensorflow) a "tensor" of shape 1×1×n.
However, defining a dense layer in Keras via tf.keras.layers.Dense(n) will not necessarily result in a rank-1 tensor; the output shape depends on the input, as explained in the Keras documentation:
For example, if input has dimensions (batch_size, d0, d1), then we create a kernel with shape (d1, units), and the kernel operates along axis 2 of the input, on every sub-tensor of shape (1, 1, d1) (there are batch_size * d0 such sub-tensors). The output in this case will have shape (batch_size, d0, units).
Am I correct in assuming that what MATLAB does in fullyConnectedLayer(n) is equivalent to cascading a Flatten() layer and a Dense(n) layer in Tensorflow? By equivalent I mean that exactly the same operation is performed.
It would appear that this is the case based on the number of weights that MATLAB requires for a fullyConnectedLayer. The weights in fact are n×M where M is the dimension of the input (see MATLAB Documentation: "At training time, Weights is an OutputSize-by-InputSize matrix"). In fact snooping around the internals of this MATLAB function, it seems to me that the InputSize is precisely the size of the input if it were "flattened", i.e. M = a*b*c if the input tensor has shape (a,b,c) (and of course I experimentally verified this by multiplying).
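To make the weight-counting argument concrete, here is a small sketch (the layer sizes are made up for illustration and are not from my actual model):
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(5, 7, 3)),   # hypothetical input shape (a, b, c)
    tf.keras.layers.Dense(10),                        # n = 10
])
model.summary()              # the Dense kernel is (5*7*3) x 10 = 105 x 10, plus 10 biases
print(model.output_shape)    # (None, 10)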
The layer I'm trying to build is towards the final stages of a categorical classifier, so I need the final output of the Keras model to be of shape (None, n) where n is the number of labels in the training data.

Calculating size of output of a Conv layer in CNN model

In convolutional neural networks, how can I know the output shape of a specific conv layer? (I am using Keras to build a CNN model.)
For example, if I am using a one-dimensional conv layer with number_of_filters=20, kernel_size=10, and input_shape=(500,1):
cnn.add(Conv1D(20,kernel_size=10,strides=1, padding="same",activation="sigmoid",input_shape=(Dimension_of_input,1)))
and if I am using a two-dimensional conv layer with number_of_filters=64, kernel_size=(5,100), and input_shape=(5,720,1) (height, width, channel):
Conv2D(64, (5, 100),
       padding="same",
       activation="sigmoid",
       data_format="channels_last",
       input_shape=(5,720,1))
What is the output shape of each of the two conv layers above? Is there an equation that can be used to work out the output shape of a conv layer in a convolutional neural network?
Yes, there are equations for it; you can find them on the CS231N course website. But as this is a programming site, Keras provides an easy way to get this information programmatically, by using the summary function of a Model.
model = Sequential()
# fill model with layers
model.summary()
This will print in terminal/console all the layer information, such as input shapes, output shapes, and number of parameters for each layer.
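For instance, for the Conv1D layer from the question (a sketch assuming tf.keras imports and a 500-step input):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D

model = Sequential()
model.add(Conv1D(20, kernel_size=10, strides=1, padding="same",
                 activation="sigmoid", input_shape=(500, 1)))
model.summary()   # with padding="same" and stride 1 the output shape is (None, 500, 20)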
Actually, the model.summary() function might not be what you are looking for if you want to do more than just look at the model.
If you want to access layers of your Keras model you can do this by using model.layers, which returns all of the layers (the assignment stores them as a list). If you then want to look at a specific layer you can simply index the list:
list_of_layers = model.layers
list_of_layers[5] # gives you the 6th layer
What you are still working with are just objects, so you probably want to get specific values. You just have to specify the attribute you want to look at:
list_of_layers[-1].output_shape # returns output_shape of last layer
Gives you back the output_shape tuple of the last layer in the model.
You can even skip the whole list assignment if you already know that you only want to look at the output_shape of a certain layer and just do:
model.layers[-1].output_shape # equivalent to the above method without storing in a list
This might be useful if you want to use these values while building the model to guide the execution in a certain way (adding a pooling layer or doing the padding etc.).
When I first worked with CNNs in TensorFlow, it was very difficult to deal with dimensions. Below is the general way to calculate them.
Consider an image of dimension (n x n) and a filter of dimension (f x f), with no padding and no stride applied:
after the convolution the dimensions are (n-f+1, n-f+1).
With an image of dimension (n x n), a filter of dimension (f x f) and padding p,
the output dims are (n+2p-f+1, n+2p-f+1).
If we are using padding = "SAME", the output dims equal the input dims, so the equation becomes n+2p-f+1 = n,
which gives p = (f-1)/2.
If we are using "VALID" padding, there is no padding and p = 0.
In computer vision f is usually odd; if f is even it means we have asymmetric padding.
In the case where we use a stride s,
the output dims are (floor((n+2p-f)/s) + 1, floor((n+2p-f)/s) + 1).
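Restating the same formula as a small Python helper (this is just the equation above; the example values are my own):
import math

def conv_output_size(n, f, p=0, s=1):
    # one spatial dimension: floor((n + 2p - f) / s) + 1
    return math.floor((n + 2 * p - f) / s) + 1

print(conv_output_size(500, 10))          # valid padding: 491
print(conv_output_size(500, 10, p=4.5))   # "same" padding keeps 500 (p is fractional because f is even)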

Manually do forward calculation for neural network trained in Matlab

I'm training a neural network in Matlab using the built-in toolbox. I use the iris dataset and put 10 hidden units in the hidden layer. The architecture of the network is quite simple: one input layer (size=4), one hidden layer (size=10) and one output layer (size=3).
After training, I can see the result is pretty good via the confusion matrix.
Now I want to extract the weight matrices from the trained network. I use the following commands:
w1 = cell2mat(net.IW);
w2 = cell2mat(net.LW);
b1 = cell2mat(net.b(1));
b2 = cell2mat(net.b(2));
Now I want to do the forward calculation manually. From the variable net saved in the Matlab environment, I know the transfer functions are tansig and softmax. I use the following commands:
B1 = repmat(b1,1,150);
A1 = tansig(w1*irisInputs + B1);
B2 = repmat(b2,1,150);
A2 = softmax(w2*A1 + B2);
[~,classes] = max(A2);
However, I can see the result classes contain a lot of wrong classifications. I find it confusing because the confusion matrix shows a very good result (~99%).
If I use the Matlab function:
output = net(irisInputs), I get the same results as those shown by the confusion matrix.
What's gone wrong here?

Keras: How to add computation after a layer and then train model?

I have a question about using Keras (with Theano as my backend), to which I'm rather new. I'm using a many-to-one RNN (it takes a time series as the input and computes one number as the output) as my first set of layers. So far, this is trivial with Keras using the recurrent layer IO.
Here is where I'm having trouble:
Now I'd like to pass the output of this RNN (the one number) to a separate function (let's call it f) and then do some computation with it.
What I would like to do is take this computed output (after the function f) and train it against the expected output (via some loss such as mse).
I'd like some advice on how to feed the output post computation from the function f and still train it via model.fit feature in Keras.
My pseudo code is as follows:
X = input
Y = output

# RNN layer
model.add(LSTM(....))
model.add(Activation(...))   # returns W*X

# function f: returns f(W*X)
# (needs to take in the output from the final RNN layer to generate a new number)

model.fit(X,Y,....)
In the above, I'm not sure how to write code that includes the output from function f while training the weights in the RNN (i.e. train f(W*X) against Y).
Any help is appreciated, thanks!!
It is not clear from your question if the RNN's weights should update with the training of f.
1st option - They should
As Matias said - a simple Dense layer is probably what you are looking for:
X = input
Y = output

# RNN layer
model.add(LSTM(....))
model.add(Activation(...))   # returns W*X
model.add(Dense(...))

model.fit(X,Y,....)
2nd option - They should not
Your f function would still be a Dense layer but you will iteratively train f and the RNN separately.
Assuming you have an rnn_model that you defined as above, define a new model f:
X = input
Y = output

# RNN layer
rnn_model = Sequential()
rnn_model.add(LSTM(....))
rnn_model.add(Activation(...))   # returns W*X

f_model = Sequential()
f_model.add(rnn_model)
f_model.add(Dense(...))
Now you can train them separately by doing:
# Code to train rnn_model
rnn_model.trainable = False
# Code to train f_model
rnn_model.trainable = True
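A minimal sketch of what those comments might expand to (the optimizer, loss and targets are placeholders, not taken from the question); note that in Keras the trainable flag only takes effect when the model is compiled:
rnn_model.trainable = False
f_model.compile(optimizer='adam', loss='mse')   # trainable flags take effect at compile time
f_model.fit(X, Y)                               # only the Dense layer f is updated here

rnn_model.trainable = True
f_model.compile(optimizer='adam', loss='mse')   # recompile so the RNN weights update again
f_model.fit(X, Y)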
The simplest way is to add a layer to your model that does the exact computation you want. From your comments it seems you just want
f(W*X), and that is exactly what a Dense layer does, minus the bias term.
So I believe adding a dense layer with the appropriate activation function is everything you need to do. If you don't need an activation at the output then just use "linear" as activation.
Just note that function f needs to be specified as a symbolic function using methods from keras.backend and it should be a differentiable function.
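For example, a sketch of such a custom differentiable step (the squaring function, the toy data and the layer sizes are purely illustrative, not from the question):
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Lambda
import keras.backend as K

X = np.random.rand(100, 20, 1)   # 100 toy sequences, 20 timesteps, 1 feature
Y = np.random.rand(100, 1)

model = Sequential()
model.add(LSTM(32, input_shape=(20, 1)))
model.add(Dense(1))                           # the "one number" output of the RNN part
model.add(Lambda(lambda t: K.square(t)))      # hypothetical f, written with keras.backend ops
model.compile(optimizer='adam', loss='mse')
model.fit(X, Y, epochs=1)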

Matlab Multilayer Perceptron Question

I need to classify a dataset using Matlab MLP and show classification.
The dataset looks like this (see the linked image).
What I have done so far is:
I have created a neural network containing a hidden layer (two neurons; maybe someone could give me some suggestions on how many neurons are suitable for my example) and an output layer (one neuron).
I have used several different learning methods, such as Delta-bar-Delta, backpropagation (both of these methods are used with or without momentum), and Levenberg-Marquardt.
This is the code I used in Matlab (Levenberg-Marquardt example):
net = newff(minmax(Input),[2 1],{'logsig' 'logsig'},'trainlm');
net.trainParam.epochs = 10000;
net.trainParam.goal = 0;
net.trainParam.lr = 0.1;
[net tr outputs] = train(net,Input,Target);
The following shows the hidden neuron classification boundaries generated by Matlab on the data (see the linked image). I am a little bit confused, because the network should produce a nonlinear result, but the plot seems to show that the two boundary lines are linear.
The code for generating above plot is:
figure(1)
plotpv(Input,Target);
hold on
plotpc(net.IW{1},net.b{1});
hold off
I also need to plot the output function of the output neuron, but I am stuck at this step. Can anyone give me some suggestions?
Thanks in advance.
Regarding the number of neurons in the hidden layer, for such a small example two are more than enough. The only way to know the optimum for sure is to test with different numbers. In this FAQ you can find a rule of thumb that may be useful: http://www.faqs.org/faqs/ai-faq/neural-nets/
For the output function, it is often useful to divide the computation into steps:
First, given the input vector x, the weighted input of the neurons in the hidden layer is y = x^T w + b, where w is the weight matrix from the input neurons to the hidden layer and b is the bias vector.
Second, you apply the activation function g of the network to the resulting vector: z = g(y).
Finally, the output is the dot product h(z) = z . v + n, where v is the weight vector from the hidden layer to the output neuron and n is the bias. In the case of more than one output neuron, you repeat this for each one.
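As a rough NumPy sketch of those steps (the variable names and shapes are my own assumptions, not Matlab's):
import numpy as np

# w: (n_inputs, n_hidden), b: (n_hidden,), v: (n_hidden,), n_bias: scalar
def mlp_forward(x, w, b, v, n_bias, g=np.tanh):
    y = x @ w + b            # weighted input of the hidden layer
    z = g(y)                 # apply the activation function
    return z @ v + n_bias    # dot product with the output weights plus the output bias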
I've never used the matlab mlp functions, so I don't know how to get the weights in this case, but I'm sure the network stores them somewhere. Edit: Searching the documentation I found the properties:
net.IW numLayers-by-numInputs cell array of input weight values
net.LW numLayers-by-numLayers cell array of layer weight values
net.b numLayers-by-1 cell array of bias values