Recently, I've been trying to use Caffe for some of the deep learning work that I'm doing. Although writing a model in Caffe is very easy, I haven't been able to find the answer to this question: how does Caffe determine the number of neurons in a hidden layer? I do know that the number of neurons in a layer and the number of hidden layers themselves cannot be determined analytically, and that rules of thumb are what people rely on here. But is there a way to define or discover the number of neurons in each layer in Caffe? And how does Caffe determine this by default?
Any help is much appreciated!
Caffe doesn't determine the number of neurons; the user does, through the num_output parameter of each layer.
This is pulled straight from Caffe's website, here: http://caffe.berkeleyvision.org/tutorial/layers.html
For example, this is a convolution layer with 96 filters (nodes, or neurons):
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # learning rate and decay multipliers for the filters
  param { lr_mult: 1 decay_mult: 1 }
  # learning rate and decay multipliers for the biases
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 96     # learn 96 filters
    kernel_size: 11    # each filter is 11x11
    stride: 4          # step 4 pixels between each filter application
    weight_filler {
      type: "gaussian" # initialize the filters from a Gaussian
      std: 0.01        # distribution with stdev 0.01 (default mean: 0)
    }
    bias_filler {
      type: "constant" # initialize the biases to zero (0)
      value: 0
    }
  }
}
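If you want to double-check how many outputs each layer ends up with, you can inspect the blob shapes with pycaffe; a minimal sketch (the prototxt path is a placeholder):

import caffe

# load the network in test mode (path is a placeholder)
net = caffe.Net('/path/to/net.prototxt', caffe.TEST)

# each blob's shape shows how many feature maps / neurons a layer produces,
# e.g. the conv1 blob above would have shape (batch, 96, H, W)
for name, blob in net.blobs.items():
    print(name, blob.data.shape)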
Related
I have a 3D input dataset with dimensions (24, 80, 42): 80 timesteps (samples), where each timestep has 24 entities and each entity is described by 42 features. How do I give this as input to an ordinary feed-forward neural network? I have already got results with an LSTM.
This is the error I'm getting; I don't know how to reshape the data to feed it in:
ValueError: Error when checking input: expected dense_3_input to have 3 dimensions, but got array with shape (1920, 42)
from keras import models, layers

input_shape = (80, 24, 42)

network = models.Sequential()
# Add fully connected layer with a ReLU activation function
network.add(layers.Dense(units=42, activation='relu',
                         input_shape=input_shape))
# Add fully connected layer with a ReLU activation function
network.add(layers.Dense(units=42, activation='relu'))
# Add fully connected layer with no activation function
network.add(layers.Dense(units=24))
network.summary()
Is this correct?
Layer (type)         Output Shape          Param #
===================================================
dense_25 (Dense)     (None, 80, 24, 42)    1806
dense_26 (Dense)     (None, 80, 24, 42)    1806
dense_27 (Dense)     (None, 80, 24, 24)    1032
===================================================
Total params: 4,644
Trainable params: 4,644
Non-trainable params: 0
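For what it's worth, a Dense layer applied to a multi-dimensional input only acts on the last axis, which is why the summary above shows higher-dimensional outputs while the training array has shape (1920, 42). One common option for a plain feed-forward network is to flatten each sample first; a minimal sketch with dummy data (the shapes and names here are assumptions, not taken from the original code):

import numpy as np
from keras import models, layers

X = np.random.rand(80, 24, 42)   # 80 samples, each with 24 entities x 42 features (dummy data)
X_flat = X.reshape(80, 24 * 42)  # flatten each sample into a single feature vector

network = models.Sequential()
network.add(layers.Dense(units=42, activation='relu', input_shape=(24 * 42,)))
network.add(layers.Dense(units=42, activation='relu'))
network.add(layers.Dense(units=24))
network.summary()                # all output shapes are now 2D: (None, units)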
I am using cross entropy with softmax as the loss function for my neural network.
The cross entropy function I have written is as follows:
import math

def CrossEntropy(calculated, desired):
    sum = 0
    n = len(calculated)
    for i in range(0, n):
        sum += (desired[i] * math.log(calculated[i])) + ((1 - desired[i]) * math.log(1 - calculated[i]))
    crossentropy = (-1) * sum / n
    return crossentropy
Now let us suppose the desired output is [1,0,0,0] and we are testing it for two calculated outputs, i.e. a=[0.1,0.9,0.1,0.1] and b=[0.1,0.1,0.1,0.9]. The problem is that for both of these calculated outputs the function returns the exact same cross entropy value. So how does the neural network learn which output is the correct one?
That is expected because you have a data symmetry in your two calculated cases.
In your example, the desired output is [1, 0, 0, 0], so the true class is the first class. However, in both a and b your prediction for the first class is the same (0.1). For the other classes (the true negatives: the 2nd, 3rd and 4th classes), you also have this data symmetry (class 2 and class 4 are equally important with respect to the loss calculation).
a -> 0.9, 0.1, 0.1   (predictions for the non-true classes 2, 3, 4)
        ^
        |  mirrored
        v
b -> 0.1, 0.1, 0.9
Thus you have the same loss which is expected.
If you remove this symmetry, you get a different cross entropy loss. See the examples below:
# The first two are from your examples.
print(CrossEntropy(calculated=[0.1, 0.9, 0.1, 0.1], desired=[1, 0, 0, 0]))
print(CrossEntropy(calculated=[0.1, 0.1, 0.1, 0.9], desired=[1, 0, 0, 0]))
# Below, the prediction for the last class is 0.75, which breaks the data symmetry.
print(CrossEntropy(calculated=[0.1, 0.1, 0.1, 0.75], desired=[1, 0, 0, 0]))
# Below, the prediction for the true class is 0.45.
print(CrossEntropy(calculated=[0.45, 0.1, 0.1, 0.9], desired=[1, 0, 0, 0]))
result:
1.20397280433
1.20397280433
0.974900121357
0.827953455132
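One additional point, not from the answer above but perhaps worth noting: even though the loss value is identical for a and b, the gradient of the loss with respect to each individual output differs between the two cases, and it is this per-output gradient that drives learning. A small sketch using the same formula as CrossEntropy above:

def CrossEntropyGrad(calculated, desired):
    # derivative of the averaged cross entropy w.r.t. each calculated output
    n = len(calculated)
    return [-(d / c - (1 - d) / (1 - c)) / n for c, d in zip(calculated, desired)]

print(CrossEntropyGrad([0.1, 0.9, 0.1, 0.1], [1, 0, 0, 0]))  # approx [-2.5, 2.5, 0.28, 0.28]
print(CrossEntropyGrad([0.1, 0.1, 0.1, 0.9], [1, 0, 0, 0]))  # approx [-2.5, 0.28, 0.28, 2.5]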
I need to know how data is padded in a 1D convolutional layer using Keras with Theano as the backend. I use "same" padding.
Assume we have an output_length of 8 and a kernel_size of 4. According to the original Keras code we have a padding of 8//4 == 2. However, when adding two zeros at the left and the right end of my horizontal data, I could compute 9 convolutions instead of 8.
Can somebody explain to me how the data is padded? Where are zeros added, and how do I compute the number of padding values on the right and left side of my data?
How to test the way Keras pads the sequences:
A very simple test you can do is to create a model with a single convolutional layer, enforce its weights to be 1 and its biases to be 0, and give it an input full of ones to see the output:
from keras.layers import *
from keras.models import Model
import numpy as np

# creating the model
inp = Input((8, 1))
out = Conv1D(filters=1, kernel_size=4, padding='same')(inp)
model = Model(inp, out)

# adjusting the weights
ws = model.layers[1].get_weights()
ws[0] = np.ones(ws[0].shape)   # weights
ws[1] = np.zeros(ws[1].shape)  # biases
model.layers[1].set_weights(ws)

# predicting the result for a sequence with 8 elements
testData = np.ones((1, 8, 1))
print(model.predict(testData))
The output of this code is:
[[[ 2.] #a result 2 shows only 2 of the 4 kernel frames were activated
[ 3.] #a result 3 shows only 3 of the 4 kernel frames were activated
[ 4.] #a result 4 shows the full kernel was used
[ 4.]
[ 4.]
[ 4.]
[ 4.]
[ 3.]]]
So we can conclude that:
Keras adds the padding to the input before performing the convolutions, not to the output afterwards, which is why the border results are partial sums (2 and 3) rather than zero.
Keras distributes the padding as equally as possible, and when the total amount is odd, the extra zero goes at the beginning.
So, it made the input data look like this before applying the convolutions
[0,0,1,1,1,1,1,1,1,1,0]
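As a quick sanity check of that conclusion, the same numbers can be reproduced with plain numpy, independently of Keras (just a sketch):

import numpy as np

padded = np.array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0])  # the padded sequence above
kernel = np.ones(4)                                    # the all-ones kernel of size 4
# a 'valid' correlation over the padded sequence reproduces the Keras output
print(np.correlate(padded, kernel, mode='valid'))      # [2. 3. 4. 4. 4. 4. 4. 3.]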
I am trying to understand the following part of a Caffe network model.
convolution_param {
  num_output: 256
  pad: 2
  kernel_size: 5
  group: 2
  weight_filler {
    type: "gaussian"
    std: 0.01
  }
  bias_filler {
    type: "constant"
    value: 1
  }
}
What I understood is that there are 256 filters used in this layer.
I want to know how the values inside those filters are selected.
Using a size of 5x5 and a std dev of 0.01 we can create one filter; how are the other filters created?
Depending on the input dimension of this layer (the "channel" shape), this layer has 256 filters of shape in-dim-by-5-by-5. Caffe initializes all of these values (according to the weight_filler param) with i.i.d. random samples from a Gaussian (normal) distribution with zero mean and std=0.01.
You can see the values in Python (assuming the layer name is "conv1"):
import caffe

net = caffe.Net('/path/to/net.prototxt', caffe.TEST)
layer_idx = list(net._layer_names).index('conv1')
weights = net.layers[layer_idx].blobs[0].data
print("filter values =", weights)
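To confirm the filler behaved as expected, you can also look at the sample statistics of those weights; a small sketch continuing from the code above (the exact shape depends on your network):

# sample statistics of the freshly initialized filters
print("shape:", weights.shape)  # e.g. (256, channels, 5, 5)
print("mean:", weights.mean())  # should be close to 0
print("std:", weights.std())    # should be close to 0.01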
I've created a neural network with the following structure:
Input1 - Input2 - Input layer.
N0 - N1 - Hidden layer. 3 Weights per node (one for bias).
N2 - Output layer. 3 Weights (one for bias).
I am trying to train it on the XOR function with the following test data:
0 1 - desired result: 1
1 0 - desired result: 1
0 0 - desired result: 0
1 1 - desired result: 0
After training, the mean square error of the test (when looking for a 1 result) {0, 1} is 0, which is good I presume. However, the mean square error of the test (when looking for a 0 result) {1, 1} is 0.5, which surely needs to be zero. During the learning stage I notice the MSE of true results drops to zero within the first few epochs, whereas the MSE of false results lingers around 0.5.
I'm using back propagation to train the network, with a sigmoid function. The issue is that when I test any combination after training, I always get a 1.0 output. The network seems to learn very fast, even with an extremely small learning rate.
If it helps, here are the weights that are produced:
N0-W0 = 0.5, N0-W1 = -0.999, N0-W2 = 0.304 (bias) - Hidden Layer
N1-W0 = 0.674, N1-W1 = -0.893, N1-W2 = 0.516 (bias) - Hidden Layer
N2-W0 = -0.243, N2-W1 = 0.955, N2-W2 = 0.369 (bias) - Output node
Thanks.
These are some steps which can help solve your problem:
Change your activation function. Here is a similar question which I answered using relu as the activation function: Neural network XOR gate not learning
Increase the number of epochs.
Change your learning rate to a larger suitable value, so that you can reach convergence faster. You can find more info here:
How to determine the learning rate and the variance in a gradient descent algorithm?
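For reference, a minimal sketch of the kind of setup those steps describe, using Keras with relu in the hidden layer and a larger learning rate (all of the values here are assumptions, not taken from the question):

import numpy as np
from keras import models, layers, optimizers

# the XOR training data from the question
X = np.array([[0, 1], [1, 0], [0, 0], [1, 1]], dtype=float)
y = np.array([[1], [1], [0], [0]], dtype=float)

model = models.Sequential()
model.add(layers.Dense(4, activation='relu', input_shape=(2,)))  # hidden layer
model.add(layers.Dense(1, activation='sigmoid'))                 # output layer
model.compile(optimizer=optimizers.SGD(lr=0.5), loss='binary_crossentropy')
model.fit(X, y, epochs=2000, verbose=0)

print(model.predict(X))  # on a successful run this approaches [1, 1, 0, 0]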