Transforming the train argument in Chainer 5

How can I change this train argument (from older-version code) and use it in trainer extensions? What changes are necessary to make this code work in Chainer 5.4.0?
ValueError: train argument is not supported anymore. Use chainer.using_config
[AutoEncoder/StackedAutoEncoder/Regression.py](https://github.com/quolc/chainer-ML-examples/blob/master/mnist-stacked-autoencoder/net.py)
[Train.py](https://github.com/quolc/chainer-ML-examples/blob/master/mnist-stacked-autoencoder/train_mnist_sae.py)
for epoch in range(0, n_epoch):
    print(' epoch {}'.format(epoch + 1))
    perm = np.random.permutation(N)
    permed_data = np.array(input_data[perm])
    sum_loss = 0
    start = time.time()
    for i in range(0, N, batchsize):
        x = chainer.Variable(permed_data[i:i+batchsize])
        y = chainer.Variable(permed_data[i:i+batchsize])
        optimizer.update(model, x, y)
        sum_loss += float(model.loss.data) * len(y.data)
    end = time.time()
    throughput = N / (end - start)
    print(' train mean loss={}, throughput={} data/sec'.format(sum_loss / N, throughput))
    sys.stdout.flush()

# prepare train data for next layer
x = chainer.Variable(np.array(train_data))
train_data_for_next_layer = cuda.to_cpu(ae.encode(x, train=False).data)
The errors point to two different places:
1. optimizer.update(model, x, y)
2. the second line of "prepare train data for next layer", where the number of nodes in each layer mismatches. The error message is given below.
InvalidType:
Invalid operation is performed in: LinearFunction (Forward)
Expect: prod(in_types[0].shape[1:]) == in_types[1].shape[1]
Actual: 784 != 250

As for the train argument, the details are written here: https://docs.chainer.org/en/stable/upgrade_v2.html
The train argument was used by dropout in v1, but Chainer now uses the config mechanism to manage its phase: training or not.
So, there are two things to do.
First, remove the train arguments from the scripts.
Second, move the inference code into the context:
with chainer.using_config('train', False):
    # define the inference process
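For example, the last snippet in the question could be migrated like this (a minimal sketch; it assumes ae.encode simply drops its train argument, the way dropout does in v2+):

import numpy as np
import chainer
from chainer import cuda

# prepare train data for next layer; the removed `train` argument is gone,
# and inference now runs inside the config context
x = chainer.Variable(np.array(train_data))
with chainer.using_config('train', False):
    train_data_for_next_layer = cuda.to_cpu(ae.encode(x).data)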
As for the second point ("prepare train data for next layer second line where they mismatch the number of nodes in each layer"): could you share the error messages?


Multi-output GP with multi-inputs?

I am trying to implement a multi-output GP in GPFlow with multi-dimensional input data.
I have seen from this issue in GPflow that multi-dimensional input is possible: 'define a multidimensional base kernel and then apply the coregion on top of that'.
I know that for isotopic data (where all outputs are observed) one could alternatively use something like the approach described in this notebook, but since I need to try ICM, let's continue with the code below.
However, when I try running the following code:
from gpflow.gpr import GPR
import gpflow
import numpy as np
from gpflow.kernels import Coregion

def f(x):
    def _y(_x):
        function_sum = 0
        for i in np.arange(0, len(_x) - 1):
            function_sum += (1 - _x[i]) ** 2 + 100 * ((_x[i + 1] - _x[i] ** 2) ** 2)
        return function_sum
    return np.atleast_2d([_y(_x) for _x in np.atleast_2d(x)]).T
isotropic_X = np.random.rand(100, 2) * 4 - 2
Y1 = f(isotropic_X)
Y2 = f(isotropic_X) + np.random.normal(loc=2000, size=(100,1))
Y3 = f(isotropic_X) + np.random.normal(loc=-2000, size=(100,1))
# a Coregionalization kernel. The base kernel is Matern, and acts on the first ([0]) data dimension.
# the 'Coregion' kernel indexes the outputs, and acts on the second ([1]) data dimension
k1 = gpflow.kernels.Matern32(2)
coreg = Coregion(1, output_dim=3, rank=1, active_dims=[3]) # gpflow.kernels.Coregion(2, output_dim=2, rank=1)
coreg.W = np.random.rand(3, 1)
kern = k1 * coreg
# Augment the input data with an extra column indicating the required output dimension
X_augmented = np.vstack((np.hstack((isotropic_X, np.zeros(shape=(isotropic_X.shape[0], 1)))),
                         np.hstack((isotropic_X, np.ones(shape=(isotropic_X.shape[0], 1)))),
                         np.hstack((isotropic_X, 2 * np.ones(shape=(isotropic_X.shape[0], 1))))))
# Augment the Y data to indicate which likelihood we should use
Y_augmented = np.vstack((np.hstack((Y1, np.zeros(shape=(Y1.shape[0], 1)))),
                         np.hstack((Y2, np.ones(shape=(Y2.shape[0], 1)))),
                         np.hstack((Y3, 2 * np.ones(shape=(Y3.shape[0], 1))))))
# now build the GP model as normal
m = GPR(X_augmented, Y_augmented, kern=kern)
m.optimize()
print(m.predict_f(np.array([[0.2, 0.2, 0], [0.4, 0.4, 0]])))
It returns something like:
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Traceback (most recent call last):
  File "C:\Users\Administrator\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
    return fn(*args)
  File "C:\Users\Administrator\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\Administrator\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0] = 3 is not in [0, 3)
  [[{{node name.build_likelihood/name.kern.K/name.kern.coregion.K/GatherV2}}]]
So my questions are:
- What is this problem, and how can I enable a multi-output GP with multi-dimensional input?
- I didn't quite get the workflow of GPflow with Coregion. In this multi-output GP slide, the ICM builds the output GPs from an additive form of a latent process $u$ sampled from a GP, weighted by $W$. But in the GPflow notebook demo I can't see any such latent process, and the notebook says 'The 'Coregion' kernel indexes the outputs, and acts on the last ([1]) data dimension (indices) of the augmented X values', which is quite different from the slides. I am really confused by these different descriptions; any hints?
The issue is simply with your offset indexing: the coregionalisation kernel should be
coreg = Coregion(input_dim=1, output_dim=3, rank=1, active_dims=[2])
That is because active_dims=[2] indexes the third column of X_augmented, which holds the output index.
Thanks for providing a fully reproducible example! I managed to run your code and successfully optimize the model using a few steps of AdamOptimizer and then ScipyOptimizer, to a log-likelihood value of -2023.4.
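For reference, a minimal sketch of the corrected kernel construction with the names from the question (restricting the base kernel to the two input columns via active_dims is my assumption about the intended setup):

# Base kernel on the two input columns; Coregion kernel on the index column.
k1 = gpflow.kernels.Matern32(2, active_dims=[0, 1])
coreg = Coregion(input_dim=1, output_dim=3, rank=1, active_dims=[2])
coreg.W = np.random.rand(3, 1)
kern = k1 * coreg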

Training an LSTM in Keras for classification, with a data structure of 60 time steps

I have a multidimensional dataset (3500, 10) in which there is one binary variable I want to predict, y (3500, 1). I used the following code to separate X and y and create a data structure with 60 time steps to use as input for the LSTM network:
data_set = data_set.as_matrix()  # Using multiple predictors.
X_total = []
y_total = []
n_future = 1   # Number of days you want to predict into the future
n_past = 60    # Number of past days you want to use to predict the future
for i in range(60, len(data_set)):
    X_total.append(data_set[i-n_past:i, :9])
    y_total.append(data_set[i+n_future-1:i+n_future, 9])
X_total, y_total = np.array(X_total), np.array(y_total)
Then I get X_total (3440, 60, 9) and y_total (3440, 1).
How can I be sure that the network pairs each observation of X_total with the matching y_total?
It is kind of confusing: when I look into the X_total data, it seems to start at the first observation of the original data_set, while y_total starts at the 60th.
How can I check it?
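One way to check the pairing directly (a minimal sketch against the windowing code above, with n_past and n_future as defined there): for window index i, X_total[i] should hold rows i to i+59 of data_set, and y_total[i] the value from row i+60, so X starting at row 0 while y starts at row 60 is exactly the intended alignment.

# Verify that each 60-step window is paired with the target that follows it.
for i in (0, 1, len(X_total) - 1):
    assert np.array_equal(X_total[i], data_set[i:i + n_past, :9])
    assert np.array_equal(y_total[i], data_set[i + n_past + n_future - 1:i + n_past + n_future, 9])
print("windows and targets are aligned")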

Using Keras LSTM to predict a single example after using batch training

I have a network model that is trained using batch training. Once it is trained, I want to predict the output for a single example.
Here is my model code:
model = Sequential()
model.add(Dense(32, batch_input_shape=(5, 1, 1)))
model.add(LSTM(16, stateful=True))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
I have a sequence of single inputs to single outputs. I'm doing some test code to map characters to next characters (A->B, B->C, etc).
I create an input data of shape (15,1,1) and an output data of shape (15, 1) and call the function:
model.fit(x, y, nb_epoch=epochs, batch_size=5, shuffle=False, verbose=0)
The model trains, and now I want to take a single character and predict the next character (input A, it predicts B). I create an input of shape (1, 1, 1) and call:
pred = model.predict(x, batch_size=1, verbose=0)
This gives:
ValueError: Shape mismatch: x has 5 rows but z has 1 rows
I saw one suggested solution: add "dummy data" to your prediction input, so it has shape (5, 1, 1) with data [x 0 0 0 0], and then take the first element of the output as your value. However, this seems inefficient when dealing with larger batches.
I also tried to remove the batch size from the model creation, but I got the following message:
ValueError: If a RNN is stateful, a complete input_shape must be provided (including batch size).
Is there another way? Thanks for the help.
Currently (Keras v2.0.8) it takes a bit more effort to get predictions on single rows after training in batch.
Basically, the batch_size is fixed at training time, and has to be the same at prediction time.
The workaround right now is to take the weights from the trained model, and use those as the weights in a new model you've just created, which has a batch_size of 1.
The quick code for that is:
model = create_model(batch_size=64)
model.fit(X, y)
weights = model.get_weights()

single_item_model = create_model(batch_size=1)
single_item_model.set_weights(weights)
single_item_model.compile(compile_params)
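Applied to the model from the question, that helper could look like the following sketch (create_model is the placeholder above; x, y and epochs are the question's variables, x_single is a hypothetical single input of shape (1, 1, 1), and Keras 2-style arguments are assumed):

from keras.models import Sequential
from keras.layers import Dense, LSTM

def create_model(batch_size):
    # Same architecture as in the question, parameterised by batch size.
    model = Sequential()
    model.add(Dense(32, batch_input_shape=(batch_size, 1, 1)))
    model.add(LSTM(16, stateful=True))
    model.add(Dense(1, activation='linear'))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
    return model

model = create_model(batch_size=5)
model.fit(x, y, epochs=epochs, batch_size=5, shuffle=False, verbose=0)

single_item_model = create_model(batch_size=1)
single_item_model.set_weights(model.get_weights())
pred = single_item_model.predict(x_single, batch_size=1, verbose=0)  # x_single: shape (1, 1, 1)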
Here's a blog post that goes into more depth:
https://machinelearningmastery.com/use-different-batch-sizes-training-predicting-python-keras/
I've used this approach in the past to have multiple models at prediction time: one that makes predictions on big batches, one that makes predictions on small batches, and one that makes predictions on single items. Since batch predictions are much more efficient, this gives us the flexibility to take in any number of prediction rows (not just a number that is evenly divisible by batch_size), while still getting predictions pretty rapidly.
@ClimbsRocks showed a nice workaround. I cannot provide a "correct" answer in the sense of "this is how Keras intends it to be done", but I can share another workaround which might help somebody, depending on the use case.
In this workaround I use predict_on_batch(). This method lets you pass a single sample of a batch without throwing an error. Unfortunately, it returns output in the shape the target has according to the training settings; each sample slot in that output then yields the prediction for your single sample.
You can access it like this:
to_predict = #Some single sample that would be part of a batch (has to have the right shape)#
model.predict_on_batch(to_predict)[0].flatten() #Flatten is optional
The result of the prediction is exactly the same as if you had passed an entire batch to predict().
Here is a code example. The code is from my question, which also deals with this issue (but in a slightly different manner).
import numpy as np
from keras.models import Sequential
from keras.layers import GRU

sequence_size = 5
number_of_features = 1
input_shape = (sequence_size, number_of_features)
batch_size = 2

model = Sequential()
# Of course you can replace the Gated Recurrent Unit with an LSTM layer
model.add(GRU(100, return_sequences=True, activation='relu', input_shape=input_shape, batch_size=batch_size, name="GRU"))
model.add(GRU(1, return_sequences=True, activation='relu', input_shape=input_shape, batch_size=batch_size, name="GRU2"))
model.compile(optimizer='adam', loss='mse')
model.summary()
# Summary output:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
GRU (GRU)                    (2, 5, 100)               30600
_________________________________________________________________
GRU2 (GRU)                   (2, 5, 1)                 306
=================================================================
Total params: 30,906
Trainable params: 30,906
Non-trainable params: 0
def generator(data, batch_size, sequence_size, num_features):
    """Simple generator"""
    while True:
        for i in range(len(data) - (sequence_size * batch_size + sequence_size) + 1):
            start = i
            end = i + (sequence_size * batch_size)
            yield data[start:end].reshape(batch_size, sequence_size, num_features), \
                  data[end - ((sequence_size * batch_size) - sequence_size) : end + sequence_size].reshape(batch_size, sequence_size, num_features)
# Task: Predict the continuation of a linear range
data = np.arange(100)
num_features = number_of_features
# assumed: one full pass over the windows the generator yields per epoch
total_batches = len(data) - (sequence_size * batch_size + sequence_size) + 1
hist = model.fit_generator(
    generator=generator(data, batch_size, sequence_size, num_features),
    steps_per_epoch=total_batches,
    epochs=200,
    shuffle=False
)
to_predict = np.asarray([[np.asarray([x]) for x in range(95,100,1)]]) #Only single element of a batch
correct = np.asarray([100,101,102,103,104])
print( model.predict_on_batch(to_predict)[0].flatten() )
#Output:
[ 99.92908 100.95854 102.32129 103.28584 104.20213 ]

Training bias weight backpropagation

I'm trying to write an XOR solution using neural networks and the sigmoid activation function (with True = 0.9 and False = 0.1).
I'm at the backpropagation part now.
The formula I was given for computing weight adjustments is:
delta_weight(l,i,j) = gamma*output(l,i)*error_signal(l,j)
e.g. the weight adjustment for the link between layer 1 (hidden), node 2, and layer 2 (output), node 0 is:
delta_weight(1,2,0)
I chose gamma=0.5
Since bias weights are associated with a single node I guessed the weight adjustment formula was:
delta_weight(l,i) = gamma*output(l,i)
My program is not working, so clearly my guess was incorrect. Could someone help me along?
Thanks a bunch!
EDIT: CODE
def applyInputs(self, inps):
    for i in range(len(self.layers)-1):
        for n, node in enumerate(self.layers[i+1].nodes):
            ans = 0
            for m, mode in enumerate(self.layers[i].nodes):
                ans += self.links[stringify(i,m,i+1,n)].weight * mode.output
            if node.bias == True:
                ans += self.links[stringify(-1,-1,i+1,n)].weight
            node.set_output(response(ans))
    return self.layers[len(self.layers)-1].nodes[0].output

def computeErrorSignals(self, out):  # 'out' is the output of the entire network (only 1 output node)
    # output node error signal
    output_node = self.layers[len(self.layers)-1].nodes[0]
    fin_err = (out - output_node.output) * output_node.output * (1 - output_node.output)
    output_node.set_error(fin_err)
    # hidden node error signals
    for j in range(len(self.layers[1].nodes)):
        hid_node = self.layers[1].nodes[j]
        err = (hid_node.output) * (1 - hid_node.output) * self.layers[2].nodes[0].error_signal * self.links[stringify(1,j,2,0)].weight
        hid_node.set_error(err)

def computeWeightAdjustments(self):
    for i in range(len(self.layers)-1):
        for n, node in enumerate(self.layers[i+1].nodes):
            for m, mode in enumerate(self.layers[i].nodes):
                self.links[stringify(i,m,i+1,n)].weight += ((0.5) * self.layers[i+1].nodes[n].error_signal * self.layers[i].nodes[m].output)
            if node.bias == True:
                self.links[stringify(-1,-1,i+1,n)].weight += ((0.5) * self.layers[i].nodes[m].output)
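For what it's worth, the usual convention is to treat the bias as a weight on a constant input of 1, so the general formula delta_weight(l,i,j) = gamma*output(l,i)*error_signal(l,j) still applies with the output term fixed at 1. A sketch of what the bias branch in computeWeightAdjustments would then look like (my suggestion, not code from the question):

# Bias acts like a link from a constant-1 node, so its update uses the
# downstream node's error signal; the output factor is just 1.
if node.bias == True:
    self.links[stringify(-1, -1, i+1, n)].weight += (0.5) * node.error_signal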

How to perform multi-label learning with LSTM using Theano?

I have some text data with multiple labels for each document. I want to train an LSTM network on this dataset using Theano. I came across http://deeplearning.net/tutorial/lstm.html, but it only facilitates a binary classification task. If anyone has suggestions on which method to proceed with, that would be great. I just need an initial feasible direction I can work on.
Thanks,
Amit
1) Change the last layer of the model. I.e.
pred = tensor.nnet.softmax(tensor.dot(proj, tparams['U']) + tparams['b'])
should be replaced by some other layer, e.g. sigmoid:
pred = tensor.nnet.sigmoid(tensor.dot(proj, tparams['U']) + tparams['b'])
2) The cost should also be changed.
I.e.
cost = -tensor.log(pred[tensor.arange(n_samples), y] + off).mean()
should be replaced by some other cost, e.g. cross-entropy:
one = np.float32(1.0)
pred = T.clip(pred, 0.0001, 0.9999) # don't piss off the log
cost = -T.sum(y * T.log(pred) + (one - y) * T.log(one - pred), axis=1) # Sum over all labels
cost = T.mean(cost, axis=0) # Compute mean over samples
3) In the function build_model(tparams, options), you should replace:
y = tensor.vector('y', dtype='int64')
by
y = tensor.matrix('y', dtype='int64') # Each row of y is one sample's label vector, e.g. [1 0 0 1 0]. sklearn.preprocessing.MultiLabelBinarizer() may be handy; see the sketch after these steps.
4) Change pred_error() so that it supports multilabel (e.g. using some metrics like accuracy or F1 score from scikit-learn).
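As a small illustration of the label matrix mentioned in step 3, here is a sketch using scikit-learn (the label ids are made up):

from sklearn.preprocessing import MultiLabelBinarizer

# Three documents, each with a set of label ids out of 5 possible labels.
labels = [[0, 3], [1], [0, 2, 4]]
mlb = MultiLabelBinarizer(classes=[0, 1, 2, 3, 4])
y = mlb.fit_transform(labels)  # shape (n_samples, n_labels), entries 0/1
print(y)
# [[1 0 0 1 0]
#  [0 1 0 0 0]
#  [1 0 1 0 1]]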
You can change the last layer of the model. The target would then be a vector where each element is 0 or 1, depending on whether that label applies to the sample or not.