I am trying to understand the following part of a Caffe network model.
convolution_param {
  num_output: 256
  pad: 2
  kernel_size: 5
  group: 2
  weight_filler {
    type: "gaussian"
    std: 0.01
  }
  bias_filler {
    type: "constant"
    value: 1
  }
}
What I understood is that there are 256 filters used in this layer.
I want to know how the values inside those filters are selected.
Using a size of 5x5 and a std dev of 0.01 we can create one filter, but how are the other filters created?
Depending on the input dimension to this layer (the "channel" shape), this layer has 256 filters of shape in-dim-by-5-by-5. Caffe initializes all these values (according to the weight_filler param) with i.i.d. random samples from a Gaussian (normal) distribution with zero mean and std=0.01.
You can see the values in Python (assuming the layer name is "conv1"):
import caffe
net = caffe.Net('/path/to/net.prototxt', caffe.TEST)
# locate the layer by name; blobs[0] holds the weights, blobs[1] the biases
layer_idx = list(net._layer_names).index('conv1')
weights = net.layers[layer_idx].blobs[0].data
print("filter values =", weights)
I have implemented a custom loss function which takes in additional noise (a numpy array), as illustrated below:
def custom_rcae_loss(self):
    N = self.Noise
    lambda_val = self.lamda[0]
    mue = self.mue
    self.batchNo += 1
    index = self.batchNo

    def custom_rcae(y_true, y_pred):
        if N.ndim > 1:
            term1 = keras.losses.mean_squared_error(y_true, (y_pred + N))
The issue is that y_pred has shape (batch_size, 28, 28, 1).
How can I make sure my noise is also of the same shape as y_pred,
since I would like to perform (y_pred + Noise)?
For instance: if my input has 5983 samples with a batch size of 128, the samples do not split evenly into batches of 128.
How can I address this issue in Keras, making sure the noise has the same shape as y_pred?
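Concretely (a quick check of my own, not part of the model code): 5983 samples do not divide evenly by 128, so the last batch is smaller and a noise array shaped for a full batch no longer lines up:
n_samples, batch_size = 5983, 128
full_batches, remainder = divmod(n_samples, batch_size)
print(full_batches, remainder)  # 46 full batches, then a final batch of 95 samples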
Looking forward to suggestions and hints
Thanks in advance
I need to know how data is padded in a 1D convolutional layer using Keras with Theano as backend. I use "same" padding.
Assume we have an output_length of 8 and a kernel_size of 4. According to the original Keras code, we have a padding of 8//4 == 2. However, when adding two zeros at both the left and the right end of my horizontal data, I could compute 9 convolutions instead of 8.
Can somebody explain to me how the data is padded? Where are the zeros added, and how do I compute the number of padding values on the right and left side of my data?
How to test the way Keras pads the sequences:
A very simple test you can do is to create a model with a single convolutional layer, enforce its weights to be 1 and its biases to be 0, and feed it an input of ones to see the output:
from keras.layers import *
from keras.models import Model
import numpy as np

# creating the model
inp = Input((8, 1))
out = Conv1D(filters=1, kernel_size=4, padding='same')(inp)
model = Model(inp, out)

# adjusting the weights
ws = model.layers[1].get_weights()
ws[0] = np.ones(ws[0].shape)   # weights
ws[1] = np.zeros(ws[1].shape)  # biases
model.layers[1].set_weights(ws)

# predicting the result for a sequence with 8 elements
testData = np.ones((1, 8, 1))
print(model.predict(testData))
The output of this code is:
[[[ 2.] #a result 2 shows only 2 of the 4 kernel frames were activated
[ 3.] #a result 3 shows only 3 of the 4 kernel frames were activated
[ 4.] #a result 4 shows the full kernel was used
[ 4.]
[ 4.]
[ 4.]
[ 4.]
[ 3.]]]
So we can conclude that:
Keras adds the padding before performing the convolutions, not to the output afterwards, so the border results are not "zero".
Keras distributes the padding equally, and when the total padding is odd, the extra zero goes on the left (before the data).
So it made the input data look like this before applying the convolutions:
[0,0,1,1,1,1,1,1,1,1,0]
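You can verify this conclusion with plain numpy, independent of Keras (a sketch: pad with two zeros on the left and one on the right, then slide a kernel of four ones):
import numpy as np
data = np.ones(8)
padded = np.concatenate([[0, 0], data, [0]])  # 2 zeros left, 1 zero right
kernel = np.ones(4)
out = np.array([padded[i:i + 4] @ kernel for i in range(len(padded) - 3)])
print(out)  # [2. 3. 4. 4. 4. 4. 4. 3.], matching the model output above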
I'm trying to build a Neural Network using nolearn that can do regression on multiple classes.
For example:
net = NeuralNet(layers=layers_s,
                input_shape=(None, 2048),
                l1_num_units=8000,
                l2_num_units=4000,
                l3_num_units=2000,
                l4_num_units=1000,
                d1_p=0.25,
                d2_p=0.25,
                d3_p=0.25,
                d4_p=0.1,
                output_num_units=noutput,
                output_nonlinearity=None,
                regression=True,
                objective_loss_function=lasagne.objectives.squared_error,
                update_learning_rate=theano.shared(float32(0.1)),
                update_momentum=theano.shared(float32(0.8)),
                on_epoch_finished=[
                    AdjustVariable('update_learning_rate', start=0.1, stop=0.001),
                    AdjustVariable('update_momentum', start=0.8, stop=0.999),
                    EarlyStopping(patience=200),
                ],
                verbose=1,
                max_epochs=1000)
noutput is the number of classes for which I want to do regression; if I set this to 1, everything works. When I use 26 (the number of classes here) as output_num_units, I get a Theano dimension error: (dimension mismatch in args to gemm (128,1000)x(1000,26)->(128,1))
The Y labels are continuous variables, corresponding to a class. I tried to reshape the Y labels to (rows, classes), but this means I have to give a lot of the Y labels a value of 0 (because the value for that class is unknown). Is there any way to do this without setting some y labels to 0?
If you want to do multiclass (or multilabel) regression with 26 classes, your output must not have shape (1082,), but (1082, 26). To preprocess your output, you can use sklearn.preprocessing.label_binarize, which will transform your 1D output into a 2D output.
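For example (a sketch; the integer labels and the sample count of 1082 are assumptions for illustration):
import numpy as np
from sklearn.preprocessing import label_binarize
y = np.random.randint(0, 26, size=1082)         # hypothetical 1D class labels
Y = label_binarize(y, classes=list(range(26)))  # one column per class
print(Y.shape)  # (1082, 26)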
Also, your output nonlinearity should be a softmax function, so that the rows of your output sum to 1.
Recently, I've been trying to use Caffe for some of the deep learning work that I'm doing. Although writing the model in Caffe is very easy, I haven't been able to find the answer to this question: how does Caffe determine the number of neurons in a hidden layer? I do know that determining the number of neurons in a layer, and the number of hidden layers itself, are problems that cannot be solved analytically, so rules of thumb are imperative in this regard. But is there a way to define or know the number of neurons in each layer in Caffe? And by default, how does Caffe inherently determine this?
Any help is much appreciated!
Caffe doesn't determine the number of neurons; the user does.
This is pulled straight from Caffe's website, here: http://caffe.berkeleyvision.org/tutorial/layers.html
For example, this is a convolution layer of 96 nodes (or neurons):
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # learning rate and decay multipliers for the filters
  param { lr_mult: 1 decay_mult: 1 }
  # learning rate and decay multipliers for the biases
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 96     # learn 96 filters
    kernel_size: 11    # each filter is 11x11
    stride: 4          # step 4 pixels between each filter application
    weight_filler {
      type: "gaussian" # initialize the filters from a Gaussian
      std: 0.01        # distribution with stdev 0.01 (default mean: 0)
    }
    bias_filler {
      type: "constant" # initialize the biases to zero (0)
      value: 0
    }
  }
}
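Note that the number of output neurons of such a layer also follows from the input size, not just from num_output. A quick sketch of the arithmetic (the 227x227 input is an assumption, matching AlexNet's conv1):
# spatial output size of a conv layer: (input - kernel) // stride + 1
input_size, kernel_size, stride, num_filters = 227, 11, 4, 96
out = (input_size - kernel_size) // stride + 1
print(out, out * out * num_filters)  # 55, i.e. 55*55*96 = 290400 output neurons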
I've created a neural network, with the following structure:
Input1 - Input2 - Input layer.
N0 - N1 - Hidden layer. 3 weights per node (one for the bias).
N2 - Output layer. 3 weights (one for the bias).
I am trying to train it on the XOR function with the following test data:
0 1 - desired result: 1
1 0 - desired result: 1
0 0 - desired result: 0
1 1 - desired result: 0
After training, the mean squared error for the test cases looking for a 1 result, e.g. {0, 1}, is 0, which is good I presume. However the mean squared error for the test cases looking for a 0 result, e.g. {1, 1}, is 0.5, which surely needs to be zero. During the learning stage I notice the MSE of the "true" cases drops to zero within the first few epochs, whereas the MSE of the "false" cases lingers around 0.5.
I'm using backpropagation to train the network, with a sigmoid function. The issue is that when I test any combination after the training, I always get a 1.0 result output. The network seems to learn very fast, even with an extremely small learning rate.
If it helps, here are the weights that are produced:
N0-W0 = 0.5, N0-W1 = -0.999, N0-W2 = 0.304 (bias) - Hidden Layer
N1-W0 = 0.674, N1-W1 = -0.893, N1-W2 = 0.516 (bias) - Hidden Layer
N2-W0 = -0.243, N2-W1 = 0.955, N2-W2 = 0.369 (bias) - Output node
Thanks.
These are some steps which can help solve your problem:
Change your activation function. Here is a similar question which I answered using relu as the activation function: Neural network XOR gate not learning
Increase the number of epochs.
Change your learning rate to a larger suitable value, so that you can reach convergence faster. You can find more info here:
How to determine the learning rate and the variance in a gradient descent algorithm?
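As a reference point, here is a minimal numpy sketch (mine, not taken from the linked answers) of a 2-2-1 sigmoid network trained on XOR with plain backpropagation; depending on the random seed it may need more epochs to escape the flat region described in the question:
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 1], [1, 0], [0, 0], [1, 1]], dtype=float)
y = np.array([[1], [1], [0], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# one hidden layer with 2 neurons, one output neuron, biases included
W1 = rng.normal(0, 1, (2, 2)); b1 = np.zeros((1, 2))
W2 = rng.normal(0, 1, (2, 1)); b2 = np.zeros((1, 1))
lr = 1.0

for _ in range(10000):
    h = sigmoid(X @ W1 + b1)    # hidden activations, shape (4, 2)
    out = sigmoid(h @ W2 + b2)  # outputs, shape (4, 1)
    # backpropagate the squared-error gradient through the sigmoids
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out.ravel(), 2))  # should approach [1, 1, 0, 0]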