Dimension out of range (expected to be in range of [-1, 0], but got 1) (pytorch) - neural-network

I have a very simple feed-forward neural network (PyTorch):

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

class Net_1(nn.Module):
    def __init__(self):
        super(Net_1, self).__init__()
        self.fc1 = nn.Linear(5*5, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, 3)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return F.log_softmax(x, dim=1)

net = Net_1()
and the input is this 5x5 array:

state = [[0, 0, 3, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 2, 1, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]
state = torch.Tensor(state).view(-1)

net(state) throws the following error:

Dimension out of range (expected to be in range of [-1, 0], but got 1)

The problem occurs when F.log_softmax() is applied.

At the point where you call return F.log_softmax(x, dim=1), x is a 1-dimensional tensor with shape torch.Size([3]).
Dimension indexing in PyTorch starts at 0, so you cannot use dim=1 for a 1-dimensional tensor; you need dim=0.
Replace return F.log_softmax(x, dim=1) with return F.log_softmax(x, dim=0) and you'll be good to go.
In the future you can check tensor shapes by adding print(x.shape) inside forward.
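Alternatively, if you would rather keep dim=1 (the usual choice once you feed batches), you can add a batch dimension to the input instead of changing the softmax axis. A minimal sketch of that variant, assuming the same net and state as above:

# hypothetical alternative: keep the original dim=1 and feed a batch of size 1
batched = state.view(1, -1)   # shape [1, 25] instead of [25]
output = net(batched)         # shape [1, 3]; dim=1 now indexes the 3 classes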

You are giving a 3-element 1-D tensor to log_softmax.
With dim=1 you are telling it to apply the softmax along an axis that doesn't exist.
Just set dim=0 for a 1-D input.
More on this function and what that parameter means here.

Related

How to construct a Sobel filter for kernel initialization in the input layer for images of size 128x128x3?

This is my code for the Sobel filter:

def init_f(shape, dtype=None):
    sobel_x = tf.constant([[-5, -4, 0, 4, 5],
                           [-8, -10, 0, 10, 8],
                           [-10, -20, 0, 20, 10],
                           [-8, -10, 0, 10, 8],
                           [-5, -4, 0, 4, 5]])
    ker = np.zeros(shape, dtype)
    ker_shape = tf.shape(ker)
    kernel = tf.tile(sobel_x, ker_shape)  # Is this correct?
    return kernel

model.add(Conv2D(filters=30, kernel_size=(5,5), kernel_initializer=init_f, strides=(1,1), activation='relu'))
So far I have managed to do this.
But this gives me an error:

Shape must be rank 2 but is rank 4 for 'conv2d_17/Tile' (op: 'Tile') with input shapes: [5,5], [4].

TensorFlow version: 2.1.0
You're close, but the arguments to tile don't appear to be correct. That is why you're getting the error "Shape must be rank 2 but is rank 4 for...". Your sobel_x must be a rank-4 tensor, so you need to add two more dimensions. I used reshape in this example.
from tensorflow import keras
import tensorflow as tf
import numpy

def kernelInitializer(shape, dtype=None):
    print(shape)
    sobel_x = tf.constant(
        [
            [-5, -4, 0, 4, 5],
            [-8, -10, 0, 10, 8],
            [-10, -20, 0, 20, 10],
            [-8, -10, 0, 10, 8],
            [-5, -4, 0, 4, 5]
        ], dtype=dtype)
    # create the missing dims
    sobel_x = tf.reshape(sobel_x, (5, 5, 1, 1))
    print(tf.shape(sobel_x))
    # tile the last 2 axes to get the expected dims
    sobel_x = tf.tile(sobel_x, (1, 1, shape[-2], shape[-1]))
    print(tf.shape(sobel_x))
    return sobel_x

x1 = keras.layers.Input((128, 128, 3))
cvl = keras.layers.Conv2D(30, kernel_size=(5, 5), kernel_initializer=kernelInitializer, strides=(2, 2), activation='relu')

model = keras.Sequential()
model.add(x1)
model.add(cvl)

data = numpy.ones((1, 128, 128, 3))
data[:, 0:64, 0:64, :] = 0

pd = model.predict(data)
print(pd.shape)

d = pd[0, :, :, 0]
for row in d:
    for col in row:
        m = '0'
        if col != 0:
            m = 'X'
        print(m, end="")
    print("")
I looked at using expand_dims instead of reshape, but there didn't appear to be any advantage. broadcast_to seems ideal, but you still have to add the dimensions first, so I don't think it was better than tile.
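For comparison, the expand_dims route mentioned above would look roughly like this inside the same initializer (a sketch under the same assumptions; two expand_dims calls produce the same (5, 5, 1, 1) tensor as the reshape):

# equivalent to tf.reshape(sobel_x, (5, 5, 1, 1))
sobel_x = tf.expand_dims(tf.expand_dims(sobel_x, -1), -1)   # (5, 5) -> (5, 5, 1, 1)
sobel_x = tf.tile(sobel_x, (1, 1, shape[-2], shape[-1]))    # -> (5, 5, in_channels, filters)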
Why 30 copies of the same filter, though? Are they going to be changed afterwards?

Basic Matlab for loop

A = 2;
for x = 0:2:4
    A = [A, A*x];
end
A
I'd appreciate any help! I can't quite piece together how the for loop condition and the third line work together.
So, here comes the walkthrough.
A = 2;
A is an array of length 1, with 2 as the only element.
for x = 0:2:4
Have a look at the Examples section of the for help. You create an "iteration variable" x, which iterates through an array with the values [0, 2, 4]. See also the Examples section of the : operator help.
A = [A, A*x];
Concatenate array A with the value of A*x (multiplying an array with a scalar results in an array of the same length, in which each element is multiplied by the given scalar), and re-assign the result to A. See also the help on Concatenating Matrices.
Initially, A = [2].
For x = 0: A = [[2], [2] * 0], i.e. A = [2, 0].
For x = 2: A = [[2, 0], [2, 0] * 2], i.e. A = [2, 0, 4, 0].
For x = 4: A = [[2, 0, 4, 0], [2, 0, 4, 0] * 4], i.e. A = [2, 0, 4, 0, 8, 0, 16, 0].
end
End of for loop.
A
Output the content of A: omitting the semicolon at the end of the line implicitly calls the display function; see here for an explanation.
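If it helps to see the same iteration outside MATLAB, here is a rough Python/NumPy sketch of the same loop (an illustration only, not part of the original question or answer):

import numpy as np

A = np.array([2])
for x in range(0, 5, 2):            # x takes the values 0, 2, 4
    A = np.concatenate([A, A * x])  # append a copy of A scaled by x
print(A)                            # [ 2  0  4  0  8  0 16  0]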

TensorFlow XOR code works fine with two dimensional target but not without?

Trying to implement a very basic XOR FFNN in TensorFlow. I may just be misunderstanding the code, but can anyone see an obvious reason why this won't work? It starts with a loss of 0 and blows up to NaNs.
The works / doesn't work variants are toggled in comments if you want to mess around with it.
Thanks!
import math
import tensorflow as tf
import numpy as np

HIDDEN_NODES = 10

x = tf.placeholder(tf.float32, [None, 2])
W_hidden = tf.Variable(tf.truncated_normal([2, HIDDEN_NODES]))
b_hidden = tf.Variable(tf.zeros([HIDDEN_NODES]))
hidden = tf.nn.relu(tf.matmul(x, W_hidden) + b_hidden)

#-----------------
#DOESN'T WORK
W_logits = tf.Variable(tf.truncated_normal([HIDDEN_NODES, 1]))
b_logits = tf.Variable(tf.zeros([1]))
logits = tf.add(tf.matmul(hidden, W_logits), b_logits)
#WORKS
# W_logits = tf.Variable(tf.truncated_normal([HIDDEN_NODES, 2]))
# b_logits = tf.Variable(tf.zeros([2]))
# logits = tf.add(tf.matmul(hidden, W_logits), b_logits)
#-----------------

y = tf.nn.softmax(logits)

#-----------------
#DOESN'T WORK
y_input = tf.placeholder(tf.float32, [None, 1])
#WORKS
#y_input = tf.placeholder(tf.float32, [None, 2])
#-----------------

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, y_input)
loss = tf.reduce_mean(cross_entropy)
loss = cross_entropy
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

init_op = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init_op)

xTrain = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
#-----------------
#DOESN'T WORK
yTrain = np.array([[0], [1], [1], [0]])
# WORKS
#yTrain = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
#-----------------

for i in xrange(500):
    _, loss_val, logitsval = sess.run([train_op, loss, logits], feed_dict={x: xTrain, y_input: yTrain})
    if i % 10 == 0:
        print "Step:", i, "Current loss:", loss_val, "logits", logitsval

print sess.run(y, feed_dict={x: xTrain})
TL;DR: For this to work, you should use
loss = tf.nn.l2_loss(logits - y_input)
...instead of tf.nn.softmax_cross_entropy_with_logits.
The tf.nn.softmax_cross_entropy_with_logits operator expects the logits and labels inputs to be a matrix of size batch_size by num_classes. Each row of logits is an unscaled probability distribution across the classes; and each row of labels is a one-hot encoding of the true class for each example in the batch. If the inputs do not match these assumptions, the training process may diverge.
In this code, the logits are batch_size by 1, which means that there is only a single class, and the softmax outputs a prediction of class 0 for all of the examples; the labels are not one-hot. If you look at the implementation of the operator, the backprop value for tf.nn.softmax_cross_entropy_with_logits is:
// backprop: prob - labels, where
// prob = exp(logits - max_logits) / sum(exp(logits - max_logits))
This will be [[1], [1], [1], [1]] - [[0], [1], [1], [0]] in every step, which clearly does not converge.
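If you want to keep the single-logit output but still use a cross-entropy-style loss, the usual alternative is a sigmoid cross-entropy. This is a suggestion beyond the original answer; the sketch below assumes the TF 1.x keyword-argument form of the call:

# single output unit with 0/1 labels: sigmoid cross-entropy instead of softmax
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=y_input, logits=logits)
loss = tf.reduce_mean(cross_entropy)
y = tf.sigmoid(logits)   # probability of class 1 instead of a softmax over one class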

How to generate vector with different prob. distributions for each element

I need to generate a vector r of N values from 1-6 (they can repeat) for a given permutation p of N elements. The values are generated with a probability distribution that depends on the i-th value of the permutation.
E.g. I have the permutation p = [2 3 1 4] and the probability distribution matrix (Nx6): Pr = [1, 0, 0, 0, 0, 0; 0, 0.5, 0, 0.5, 0, 0; 0, 0, 0, 1, 0, 0; 0.2, 0.2, 0.2, 0.2, 0.2, 0]
The i-th row represents the probability distribution of the values 1-6 for value i in the permutation p (its value, not its position); each row sums to 1.
For example, value 1 can only be assigned value 1, value 2 can be assigned value 2 or 4, etc. So the result can look like r = [2 4 1 2] or r = [4 4 1 5].
Currently I am using this code:
for i = 1:N
    r(i) = randsample(1:6, 1, true, Pr(p(i), :));
end

But it is quite slow, and I am trying to avoid the for loop, maybe with bsxfun or something similar.
Does anyone have any clue, please? :-)
A solution to your problem is basically available in this answer; all that is needed for your case is to replace the vector prob with a matrix and adjust the operations to work on matrices.
Pr = [1, 0, 0, 0, 0, 0; 0, 0.5, 0, 0.5, 0, 0; 0, 0, 0, 1, 0, 0; 0.2, 0.2, 0.2, 0.2, 0.2, 0];
p = [2 3 1 4];
prob = Pr(p,:);                % per-element distributions, one row per entry of p
r = rand(size(prob,1),1);      % one uniform draw per entry
x = sum(bsxfun(@ge, r, cumsum(padarray(prob,[0,1],'pre'),2)), 2);
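For readers who want the same vectorized idea in Python, here is a rough NumPy translation (an illustrative sketch, not part of the original MATLAB answer):

import numpy as np

Pr = np.array([[1,   0,   0,   0,   0,   0],
               [0,   0.5, 0,   0.5, 0,   0],
               [0,   0,   0,   1,   0,   0],
               [0.2, 0.2, 0.2, 0.2, 0.2, 0]])
p = np.array([2, 3, 1, 4])

prob = Pr[p - 1]                                # per-element distributions (0-based rows)
r = np.random.rand(len(p), 1)                   # one uniform draw per element
cdf = np.cumsum(np.hstack([np.zeros((len(p), 1)), prob]), axis=1)
x = np.sum(r >= cdf, axis=1)                    # sampled values in 1..6
print(x)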

Create a matrix according to a binary matrix

Here I have

A = [1, 2, 3]
B = [1, 0, 0, 1, 0, 1]

and I want to create the matrix

C = [1, 0, 0, 2, 0, 3]

You can see that B acts like a mask; the number of ones in B is equal to the number of elements in A. What I want is to place the elements of A where B is 1.
Is there any method without a loop?
Untested, but should be close:
C = zeros(size(B));
C(logical(B)) = A;
This relies on logical indexing.
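For reference, the same trick in Python/NumPy uses boolean indexing (a sketch, not part of the original MATLAB answer):

import numpy as np

A = np.array([1, 2, 3])
B = np.array([1, 0, 0, 1, 0, 1])

C = np.zeros_like(B)
C[B.astype(bool)] = A    # place A's elements where B is 1
print(C)                 # [1 0 0 2 0 3]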