In the snippet:
criterion = nn.CrossEntropyLoss()
raw_loss = criterion(output.view(-1, ntokens), targets)
output size is torch.Size([5, 5, 8967]), targets size is torch.Size([25]), and ntokens is 8967
After modifying the code, my
output size is torch.Size([5, 8967]) and targets size is torch.Size([25])
which raises a dimension mismatch when computing the loss.
Is it sensible to increase the size of the Linear layer that produces the output by a factor of 5, so that I can later reshape the output to torch.Size([5, 5, 8967])?
The problem with increasing the size of the tensor is that ntokens can become quite large and I can easily run out of memory because of that. Is there an alternative approach?
You should do something like this:
import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable

ntokens = 8000
output = Variable(torch.randn(5, 5, ntokens))
targets = Variable(torch.from_numpy(np.random.randint(0, ntokens, size=25)))
criterion = nn.CrossEntropyLoss()
loss = criterion(output.view(-1, ntokens), targets)
print(loss)
This prints:
Variable containing:
9.4613
[torch.FloatTensor of size 1]
Here, I am assuming output contains predictions of next word for 5 sentences (minibatch size is 5) and each sentence is of length 5 (sequence length is 5). 8000 is the vocabulary size, so your model is predicting a probability distribution over the entire vocabulary.
Now you can compute the loss of predicting each word, since the flattened output has shape (25, 8000) and the target has shape (25), as required.
Please note that CrossEntropyLoss expects the input to contain unnormalized scores for each class. So the input has to be a 2D tensor of size (minibatch, C), and the target has to be a 1D tensor of size minibatch whose values are class indices in the range 0 to C-1.
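For reference, here is a minimal sketch of the same computation in more recent PyTorch, where Variable is no longer needed and plain tensors are used directly (same assumed shapes as above):

import torch
import torch.nn as nn

ntokens = 8000
# (sequence_length, batch_size, vocabulary_size) logits from the model
output = torch.randn(5, 5, ntokens)
# one class index per time step and batch element, flattened to 25 values
targets = torch.randint(0, ntokens, (25,))

criterion = nn.CrossEntropyLoss()
# flatten the logits to (25, ntokens) so each row lines up with one target index
loss = criterion(output.view(-1, ntokens), targets)
print(loss)  # a scalar tensor holding the mean cross-entropy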
Related
I have trained an RNN in Keras. Now, I want to get the values of the trained weights:
model = Sequential()
model.add(SimpleRNN(27, return_sequences=True, input_shape=(None, 27), activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
model.get_weights()
This gives me 2 arrays of shape (27, 27) and 1 array of shape (27, 1). I don't understand the meaning of these arrays. Also, I would expect 2 more arrays, of shape (27, 27) and (27, 1), that calculate the hidden state 'a' activation. How can I get these weights?
The arrays returned by model.get_weights() directly correspond to the weights used by SimpleRNNCell. They include:
The kernel matrix of size (input_shape[-1], units). In your case, input_shape=(None, 27) and units=27, so it's (27, 27). The kernel gets multiplied by the input.
The recurrent_kernel matrix of size (units, units), which also happens to be (27, 27). This matrix gets multiplied by the previous state.
The bias array of shape (units,) == (27,).
These arrays correspond to the standard formula:
# W = kernel
# U = recurrent_kernel
# B = bias
output = new_state = act(W * input + U * state + B)
Note that the Keras implementation uses a single bias vector, so all in all there are exactly three arrays.
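To make the correspondence concrete, here is a minimal sketch (assuming the model defined in the question) that unpacks get_weights() and applies one recurrent step manually in NumPy. Note that Keras multiplies row vectors from the left, i.e. it computes x @ kernel and state @ recurrent_kernel:

import numpy as np

# shapes: (27, 27), (27, 27), (27,)
kernel, recurrent_kernel, bias = model.get_weights()

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x = np.random.rand(27)   # one input vector
state = np.zeros(27)     # initial hidden state

# one SimpleRNN step: act(W * input + U * state + B)
new_state = softmax(x @ kernel + state @ recurrent_kernel + bias)

There is no separate set of weights for the hidden state: the same three arrays produce both the output and the new state (output = new_state in the formula above), which is why get_weights() returns exactly three arrays.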
I constantly get the error "Total number of RBF neurons must be some integer to the power of 'dimensions'" when using the method SetRBFCentersAndWidthsEqualSpacing in C#.
Can someone who is familiar with RBF networks in Encog check line 232 in RBFNetwork.cs? I think there may be a bug, or I am missing something:
var expectedSideLength = (int) Math.Pow(totalNumHiddenNeurons, 1.0d / dimensions);
double cmp = Math.Pow(totalNumHiddenNeurons, 1.0d / dimensions);
if (expectedSideLength != cmp) // this comparison throws the error
These two variables can't be equal unless the result is an exact integer, because the (int) cast truncates the number. It's a coincidence that it works for the XOR example; it won't work with a different dimension, such as 19.
This is how I create the RBF network (dataSet is a VersatileMLDataSet):
RBFNetwork n = new RBFNetwork(dataSet.CalculatedInputSize, dataSet.Count, 1, RBFEnum.Gaussian);
n.SetRBFCentersAndWidthsEqualSpacing(0, 1, RBFEnum.Gaussian, 2.0/(dataSet.CalculatedInputSize * dataSet.CalculatedInputSize), true);
My dataset has 19 attributes (dimensions) and 731 records.
The number of hidden neurons must be an integer raised to the power of the number of input neurons. So if you have 3 input attributes and a window size of 2, the hidden-neuron count would be some integer (say 3) raised to the power of 6 (3 x 2), i.e. 729. This limits the number of input attributes and the window size, because the required number of hidden neurons grows very large very quickly.
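As a sketch of the arithmetic behind that constraint (plain Python, not Encog code): the hidden-layer size must equal some integer side length raised to the power of the input dimension, which is what the check around line 232 of RBFNetwork.cs enforces:

def valid_hidden_counts(dimensions, max_side=4):
    # hidden-neuron counts accepted by SetRBFCentersAndWidthsEqualSpacing:
    # an integer side length raised to the power of the input dimension
    return [side ** dimensions for side in range(2, max_side + 1)]

print(valid_hidden_counts(2))   # [4, 9, 16] -- small, e.g. the XOR case
print(valid_hidden_counts(19))  # [524288, 1162261467, ...] -- explodes for 19 inputs

So with 19 input attributes, the smallest non-trivial legal hidden layer already has 2**19 = 524288 neurons, which is presumably why passing dataSet.Count (731) as the hidden-neuron count fails.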
I recently came across tf.nn.sparse_softmax_cross_entropy_with_logits and I can not figure out what the difference is compared to tf.nn.softmax_cross_entropy_with_logits.
Is the only difference that training vectors y have to be one-hot encoded when using sparse_softmax_cross_entropy_with_logits?
Reading the API, I was unable to find any other difference compared to softmax_cross_entropy_with_logits. But why do we need the extra function then?
Shouldn't softmax_cross_entropy_with_logits produce the same results as sparse_softmax_cross_entropy_with_logits, if it is supplied with one-hot encoded training data/vectors?
Having two different functions is a convenience, as they produce the same result.
The difference is simple:
For sparse_softmax_cross_entropy_with_logits, labels must have the shape [batch_size] and the dtype int32 or int64. Each label is an int in range [0, num_classes-1].
For softmax_cross_entropy_with_logits, labels must have the shape [batch_size, num_classes] and dtype float32 or float64.
The labels used in softmax_cross_entropy_with_logits are the one-hot version of the labels used in sparse_softmax_cross_entropy_with_logits.
Another tiny difference is that with sparse_softmax_cross_entropy_with_logits, you can give -1 as a label to have loss 0 on this label.
I would just like to add two things to the accepted answer that you can also find in the TF documentation.
First:
tf.nn.softmax_cross_entropy_with_logits
NOTE: While the classes are mutually exclusive, their probabilities
need not be. All that is required is that each row of labels is a
valid probability distribution. If they are not, the computation of
the gradient will be incorrect.
Second:
tf.nn.sparse_softmax_cross_entropy_with_logits
NOTE: For this operation, the probability of a given label is
considered exclusive. That is, soft classes are not allowed, and the
labels vector must provide a single specific index for the true class
for each row of logits (each minibatch entry).
Both functions compute the same result; sparse_softmax_cross_entropy_with_logits computes the cross entropy directly on the sparse labels instead of requiring them to be converted with one-hot encoding first.
You can verify this by running the following program:
import tensorflow as tf
from random import randint

dims = 8
pos = randint(0, dims - 1)

logits = tf.random_uniform([dims], maxval=3, dtype=tf.float32)
labels = tf.one_hot(pos, dims)

res1 = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)
res2 = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=tf.constant(pos))

with tf.Session() as sess:
    a, b = sess.run([res1, res2])
    print(a, b)
    print(a == b)
Here I create a random logits vector of length dims and generate one-hot encoded labels (where the element at pos is 1 and the others are 0).
After that I calculate the softmax and sparse softmax cross entropies and compare their outputs. Try rerunning it a few times to make sure that it always produces the same output.
Pithy: help with a Matlab script that takes an ImageData array and convolution weights from Caffe and returns the convolution. Please.
I am trying to recreate a convolution generated by Caffe in Matlab.
Let's make the following definitions:
W**2 = size of the input
F**2 = size of the filter
P = size of the padding
S = stride
K = number of filters
The following text describes how to generalize the convolution as a matrix multiplication:
The local regions in the input image are stretched out into columns in an operation commonly called im2col. For example, if the input is [227x227x3] and it is to be convolved with 11x11x3 filters at stride 4, then we would take [11x11x3] blocks of pixels in the input and stretch each block into a column vector of size 11*11*3 = 363. Iterating this process in the input at stride of 4 gives (227-11)/4+1 = 55 locations along both width and height, leading to an output matrix X_col of im2col of size [363 x 3025], where every column is a stretched out receptive field and there are 55*55 = 3025 of them in total. Note that since the receptive fields overlap, every number in the input volume may be duplicated in multiple distinct columns.
From this, one could draw the conclusion that the im2col function call would look something like this:
input = im2col(input, [3*F*F, ((W-F)/S+1)**2])
However, if I use the following parameter-values
W = 5
F = 3
P = 1
S = 2
K = 2
I get the following dimensions
>> size(input)
ans =
1 3 5 5
>> size(output)
ans =
1 2 3 3
>> size(filter)
ans =
2 3 3 3
And if I use the im2col function call from above, I end up with an empty matrix.
If I change the stride to 1 in the above example, the sizes of the input, the output, and the filter remain the same. If I use Matlab's convn command, the size is not the same as the actual output from Caffe.
>> size(convn(input,filter))
ans =
2 5 7 7
What is the general way to reshape the array for matrix multiplication?
You are using the second argument of im2col incorrectly; see the documentation.
You should give it the size of the filter window that you are trying to slide over the image, i.e.:
cols = im2col(input, [F, F])
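For the general reshaping question, here is a minimal NumPy sketch of the im2col-plus-matrix-multiplication approach (an illustrative re-implementation, not Caffe's or Matlab's function), using the question's parameters W = 5, F = 3, P = 1, S = 2, K = 2 with C = 3 input channels:

import numpy as np

def im2col(x, F, S, P):
    # x: (C, W, W) input; returns (C*F*F, out*out), one receptive field per column
    C, W, _ = x.shape
    x = np.pad(x, ((0, 0), (P, P), (P, P)))        # zero-pad the spatial dims
    out = (W - F + 2 * P) // S + 1                 # output side length
    cols = np.empty((C * F * F, out * out))
    for i in range(out):
        for j in range(out):
            patch = x[:, i*S:i*S+F, j*S:j*S+F]     # one F x F block per channel
            cols[:, i * out + j] = patch.ravel()   # stretched into a column
    return cols

W, F, P, S, K, C = 5, 3, 1, 2, 2, 3
x = np.random.randn(C, W, W)
filters = np.random.randn(K, C, F, F)

cols = im2col(x, F, S, P)              # (27, 9)
W_row = filters.reshape(K, -1)         # each filter stretched into a row: (2, 27)
result = (W_row @ cols).reshape(K, 3, 3)

print(cols.shape, result.shape)        # (27, 9) (2, 3, 3)

With these parameters out = (5 - 3 + 2*1)/2 + 1 = 3, so the product reshapes to (2, 3, 3), which matches the 1 2 3 3 output reported from Caffe up to the leading batch dimension.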
I guess my question is very simple, but anyway...
I've created neural network using
net = newff(entry_borders, [20, 10], {'logsig', 'logsig'}, 'traingdx');
where entry_borders is a 50x2 array: [(0,1), (0,1), ...]
That should be a network with 50 inputs, a hidden layer, and 10 outputs, shouldn't it?
But when I run this:
test_result = sim(net, zeros(50));
disp(test_result);
I get a 10x50 matrix in test_result (instead of 10 scalar values). What is that? I'm not talking about the training process yet, which is why the code here is so silly...
zeros(50) gives you a 50x50 matrix, so it is treated as 50 examples (each of dimension 50), which gives 50 predictions (each of size 10). To simulate a single example, pass one column vector instead, e.g. sim(net, zeros(50, 1)), which returns a single 10-element prediction.