Using the example network of an MLP with 2 hidden layers and two dropouts.
My load_data() function returns 400 rows of 20 features, and my label dataset is just 400 rows of one variable, which will be split into X_train, X_test, y_train, y_test, with some rows held out for validation.
My Lasagne input layer is:
l_in = lasagne.layers.InputLayer(shape=(None, 20), input_var=input_var)
and my train function is:
train_fn = theano.function([input_var, target_var], loss, updates=updates, allow_input_downcast=True)
My program fails at around this line:
train_err += train_fn(inputs, targets)
with the error:
Wrong number of dimensions: expected 1, got 2 with shape (20, 1).
The (20, 1) I understand, as I passed in twenty values on one side and 1 value on the labels side, but I thought Theano automatically flattened each array?
What can I do to fix this?
Any help would be appreciated!
The inputs that you pass to train_fn() should be an ndarray with shape (n, 20), where n is the number of examples in your minibatch. The targets should be an ndarray with shape (n,) (note that shapes (1, n) and (n, 1) won't work). Try double-checking that the arrays you actually pass to the function match these shapes.
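For example, if y_train was loaded as a column vector of shape (n, 1), flattening it before the call is usually enough. A minimal sketch (the batch size and array contents are made up for illustration):

import numpy as np

# Hypothetical minibatch: inputs match the (None, 20) InputLayer,
# but targets were loaded as a column vector
inputs = np.random.rand(32, 20).astype(np.float32)   # shape (32, 20)
targets = np.random.randint(0, 2, size=(32, 1))      # shape (32, 1) -> triggers the error

targets = targets.flatten()                          # shape (32,) -> what train_fn expects
# train_err += train_fn(inputs, targets)             # should now pass the dimension check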
I was wondering how LSTM works under Keras.
Let's take an example.
I have a maximum sentence length of 3 words.
Example: 'how are you'
I vectorize each word into a vector of length 4, so I will have a shape of (3, 4).
Now I want to use an LSTM to do translation stuff. (Just an example.)
model = Sequential()
model.add(LSTM(1, input_shape=(3,4), return_sequences=True))
model.summary()
I'm going to have an output shape of (3,1) according to Keras.
Layer (type)                 Output Shape              Param #
=================================================================
lstm_16 (LSTM)               (None, 3, 1)              24
=================================================================
Total params: 24
Trainable params: 24
Non-trainable params: 0
_________________________________________________________________
And this is what I don't understand.
Each unit of an LSTM (with return_sequences=True, so that I get the output of every timestep) should give me a vector of shape (timesteps, x), where timesteps is 3 in this case and x is the size of my word vectors (here, 4).
So why did I get an output shape of (3, 1)?
I searched everywhere, but can't figure it out.
Your interpretation of what the LSTM should return is not right. The output dimensionality doesn't need to match the input dimensionality. Concretely, the first argument of keras.layers.LSTM corresponds to the dimensionality of the output space, and you're setting it to 1.
In other words, setting:
model.add(LSTM(k, input_shape=(3,4), return_sequences=True))
will result in a (None, 3, k) output shape.
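For instance, if you wanted each timestep's output to have the same size as the word vectors, you could set k to 4. A minimal sketch under the same setup as above:

from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
# The first argument is the output dimensionality, independent of the input size
model.add(LSTM(4, input_shape=(3, 4), return_sequences=True))
model.summary()  # reports an output shape of (None, 3, 4)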
I'm trying to build a multi-layer perceptron wherein my data is made of pairs of traits, e.g. each input is an array x_1 = [v_1,v_2] where v_i are feature vectors. Therefore, my input tensor is of size [None,2,50] (each v_i is of size 50). Right now I'm trying and failing to split the input tensor into two tensors: one that will contain v_1's and one that will contain v_2's. For example, to get the v_1's modified tensor, I try:
v1 = tf.squeeze(tf.slice(input, [0, 0, 0], [-1, 1, -1]), squeeze_dims=[1])
and I get:
size must be of rank 3... It seems to me like it is. What am I doing wrong?
Thanks in advance!
Split on axis=1 (the 2nd dimension of your [None, 2, 50] tensor) and ask for an equal distribution on both sides, [1, 1]. (Experiment with [2, 0] and [0, 2] to see what happens; one side of the split won't get anything.)
v1, v2 = tf.split(input, num_or_size_splits=[1, 1], axis=1)
v1 = tf.squeeze(v1, 1)
Note that tf.split returns a list of tensors, so tf.squeeze won't work on the list itself; apply it to each element.
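Putting it together, a minimal end-to-end sketch (the batch size of 8 is arbitrary, and the placeholder name is hypothetical):

import numpy as np
import tensorflow as tf

# Placeholder matching the [None, 2, 50] input described in the question
pairs = tf.placeholder(tf.float32, shape=[None, 2, 50])

v1, v2 = tf.split(pairs, num_or_size_splits=[1, 1], axis=1)  # each is [None, 1, 50]
v1 = tf.squeeze(v1, 1)  # [None, 50]
v2 = tf.squeeze(v2, 1)  # [None, 50]

with tf.Session() as sess:
    batch = np.random.rand(8, 2, 50).astype(np.float32)
    a, b = sess.run([v1, v2], feed_dict={pairs: batch})
    print(a.shape, b.shape)  # (8, 50) (8, 50)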
I don't have any idea how to implement it with such a library and function. Can anybody give me some ideas? Just a function name, an idea, or a helpful website URL would be OK! Thanks!
I think it's different.
How does linear regression map to your problem?
Given a matrix X, the rows may represent the samples, and the columns the variables.
The column containing the numpy.nan value represents the target variable ("y"). The remaining columns represent the input variables (the x1, x2, ...).
The rows with observed values represent the training set, the rest represent the test set.
The code
Below is a code snippet that implements these points using your example matrix X.
import numpy as np
from sklearn.linear_model import LinearRegression
X = np.array([[1, 2, 3], [2, 4, np.nan], [3, 6, 9]])
# Unknown rows (test examples), i.e. rows with a nan
impute_rows = np.any(np.isnan(X), axis=1)
# Known rows (training examples), i.e. rows without a nan
full_rows = np.logical_not(impute_rows)
# Column acting as variable to predict
output_var = np.any(np.isnan(X), axis=0)
input_var = np.logical_not(output_var)
# Check that there is only one variable to predict
assert np.sum(output_var) == 1
# Construct training/test inputs/outputs
train_input = X[np.ix_(full_rows, input_var)]
train_output = X[np.ix_(full_rows, output_var)]
test_input = X[np.ix_(impute_rows, input_var)]
# Perform regression
lr = LinearRegression()
lr.fit(train_input, train_output)
lr.predict(test_input)
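If the goal is to fill the missing entry back into the matrix, one possible follow-up to the snippet above (this write-back step is my addition, not part of the original question):

# Write the prediction back into a copy of X
X_imputed = X.copy()
X_imputed[np.ix_(impute_rows, output_var)] = lr.predict(test_input)
print(X_imputed)  # the nan in the middle row is replaced by the predicted value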
Note that the specific X you provided represents an oversimplified case, where only two points are fitted, but these ideas should be applicable to larger matrices.
Also note that there exist other, more specialized methods to impute missing values in matrices (it is understood from your question that this was an exercise). This specific method may be valid in cases where there is a linear relationship between the elements of the matrix (as is the case in your simplified example).
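As one example of such dedicated imputation utilities, newer versions of scikit-learn ship SimpleImputer (older versions had sklearn.preprocessing.Imputer). A minimal sketch using column-mean imputation, which is a different model than the regression above:

import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1, 2, 3], [2, 4, np.nan], [3, 6, 9]])
imputer = SimpleImputer(strategy='mean')  # replaces each nan with its column mean
print(imputer.fit_transform(X))           # the nan becomes 6.0, the mean of 3 and 9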
I recently came across tf.nn.sparse_softmax_cross_entropy_with_logits and I cannot figure out what the difference is compared to tf.nn.softmax_cross_entropy_with_logits.
Is the only difference that training vectors y have to be one-hot encoded when using sparse_softmax_cross_entropy_with_logits?
Reading the API, I was unable to find any other difference compared to softmax_cross_entropy_with_logits. But why do we need the extra function then?
Shouldn't softmax_cross_entropy_with_logits produce the same results as sparse_softmax_cross_entropy_with_logits, if it is supplied with one-hot encoded training data/vectors?
Having two different functions is a convenience, as they produce the same result.
The difference is simple:
For sparse_softmax_cross_entropy_with_logits, labels must have the shape [batch_size] and the dtype int32 or int64. Each label is an int in range [0, num_classes-1].
For softmax_cross_entropy_with_logits, labels must have the shape [batch_size, num_classes] and dtype float32 or float64.
Labels used in softmax_cross_entropy_with_logits are the one-hot version of the labels used in sparse_softmax_cross_entropy_with_logits.
Another tiny difference is that with sparse_softmax_cross_entropy_with_logits, you can give -1 as a label to get a loss of 0 for that label.
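To make the two expected label shapes concrete, a small sketch (the logits values are arbitrary):

import tensorflow as tf

logits = tf.constant([[2.0, 0.5, 1.0], [0.1, 3.0, 0.2]])  # batch of 2, num_classes = 3

# Sparse variant: integer class indices, shape [batch_size]
sparse_labels = tf.constant([0, 1])
loss_sparse = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=sparse_labels, logits=logits)

# Dense variant: one-hot rows, shape [batch_size, num_classes]
dense_labels = tf.one_hot(sparse_labels, depth=3)
loss_dense = tf.nn.softmax_cross_entropy_with_logits(labels=dense_labels, logits=logits)

with tf.Session() as sess:
    print(sess.run([loss_sparse, loss_dense]))  # the two per-example losses match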
I would just like to add two things to the accepted answer that you can also find in the TF documentation.
First:
tf.nn.softmax_cross_entropy_with_logits
NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.
Second:
tf.nn.sparse_softmax_cross_entropy_with_logits
NOTE: For this operation, the probability of a given label is considered exclusive. That is, soft classes are not allowed, and the labels vector must provide a single specific index for the true class for each row of logits (each minibatch entry).
Both functions compute the same result; sparse_softmax_cross_entropy_with_logits computes the cross entropy directly on the sparse labels instead of requiring you to convert them with one-hot encoding first.
You can verify this by running the following program:
import tensorflow as tf
from random import randint
dims = 8
pos = randint(0, dims - 1)
logits = tf.random_uniform([dims], maxval=3, dtype=tf.float32)
labels = tf.one_hot(pos, dims)
res1 = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)
res2 = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=tf.constant(pos))
with tf.Session() as sess:
    a, b = sess.run([res1, res2])
    print(a, b)
    print(a == b)
Here I create a random logits vector of length dims and generate one-hot encoded labels (where the element at pos is 1 and the others are 0).
After that I calculate the softmax and sparse softmax cross entropies and compare their outputs. Try rerunning it a few times to make sure that it always produces the same output.
I guess my question is very simple, but anyway...
I've created neural network using
net = newff(entry_borders, [20, 10], {'logsig', 'logsig'}, 'traingdx');
where entry_borders is a 50x2 array: [(0,1), (0,1), ...]
It should be a network with a hidden layer, taking 50 inputs and producing 10 outputs, shouldn't it?
But when I run this:
test_result = sim(net, zeros(50));
disp(test_result);
I get a 10x50 matrix in test_result (instead of 10 scalar values). What's that? I'm not talking about the training process here; that's why the code is so silly...
zeros(50) gives you a 50x50 matrix, so it is treated as 50 examples (each of dimension 50), which gives 50 predictions (each of size 10). To simulate a single example, pass one 50x1 column vector instead, e.g. sim(net, zeros(50, 1)), which returns a single 10-element output.