Optuna - what is a suitable metric for TFKerasPruningCallback? - neural-network

I've been using optuna to perform a hyperparameter search for a Keras neural network model (using scikit-learn's wrapper, KerasRegressor). I have been trying to implement the TFKerasPruningCallback function to prune unpromising trials, but keep getting the following error: UserWarning: The metric 'val_accuracy' is not in the evaluation logs for pruning. Please make sure you set the correct metric name.
A recommended metric provided in the docs is 'val_acc', or 'val_accuracy' for tensorflow version > 2 as shown in this example. I'm using v2.9.1.
Here's a simplified version of my code:
def nn(n_layers, neurons):
model = Sequential()
model.add(Dense(neurons, input_dim=3, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_absolute_error', optimizer='Adam')
return model
def objective(trial, X_train, y_train):
# callbacks
cb = [TFKerasPruningCallback(trial,'val_accuracy')]
params = {'neurons': trial.suggest_int('neurons', 1, 10, 1),
'batch_size': trial.suggest_int('batch_size', 10, 200, 10),
'epochs': trial.suggest_int('epochs', 100, 500, 20),
'callbacks': cb}
# create model and perform cross-validation
model = KerasRegressor(model=nn, **params)
score = np.mean(cross_val_score(model, X_train, y_train, cv=5,
scoring='neg_root_mean_squared_error'))
return score
study = optuna.create_study(direction='maximize', pruner=optuna.pruners.MedianPruner(n_startup_trials=5))
study.optimize(lambda trial: objective(trial, X_train, y_train), n_trials=100)
Everything works fine if I don't bother to implement the callback. I haven't found any reports of this issue so it might be something really silly that I've overlooked, but I'm stuck. Any advice?

Related

RandomForestClassifier has no attribute transform, so how to get predictions?

How do you get predictions out of a RandomForestClassifier? Loosely following the latest docs here, my code looks like...
# Split the data into training and test sets (30% held out for testing)
SPLIT_SEED = 64 # some const seed just for reproducibility
TRAIN_RATIO = 0.75
(trainingData, testData) = df.randomSplit([TRAIN_RATIO, 1-TRAIN_RATIO], seed=SPLIT_SEED)
print(f"Training set ({trainingData.count()}):")
trainingData.show(n=3)
print(f"Test set ({testData.count()}):")
testData.show(n=3)
# Train a RandomForest model.
rf = RandomForestClassifier(labelCol="labels", featuresCol="features", numTrees=36)
rf.fit(trainingData)
#print(rf.featureImportances)
preds = rf.transform(testData)
When running this, I get the error
AttributeError: 'RandomForestClassifier' object has no attribute 'transform'
Examining the python api docs, I see nothing that look like it relates to generating predictions from the trained model (nor feature importance for that matter). Not much experience with mllib, so not sure what to make of this. Anyone with more experience know what to do here?
by looking closely to the documentation
>>> model = rf.fit(td)
>>> model.featureImportances
SparseVector(1, {0: 1.0})
>>> allclose(model.treeWeights, [1.0, 1.0, 1.0])
True
>>> test0 = spark.createDataFrame([(Vectors.dense(-1.0),)], ["features"])
>>> result = model.transform(test0).head()
>>> result.prediction
you will notice the rf.fit return fitted models which is different than the original RandomForestClassifier class.
And the model will have the method to transform and also feature importance
so in your code
# Train a RandomForest model.
rf = RandomForestClassifier(labelCol="labels", featuresCol="features", numTrees=36)
model = rf.fit(trainingData)
#print(rf.featureImportances)
preds = model.transform(testData)

unable to add layer in keras

I am trying to following this suggestion.
outputs = Conv2DTranspose(3, (1, 1), activation='sigmoid') (c9)
model = Model(inputs=[inputs], outputs=[outputs])
model = multi_gpu_model(model, gpus=8)
model.compile(optimizer='adam', loss = bce, metrics = [mean_iou])
model.add(Lambda(lambda x: K.batch_flatten(x)))
But at that last line of code, I receive the following error:
'Model' object has no attribute 'add'
I understand that since I didn't instantiate model as sequential() as in linked post, the function add() might not be available to me. However, I'm not sure how to work around this.
Corrected to reflect the working solution:
outputs = Conv2DTranspose(3, (1, 1), activation='sigmoid') (c9)
outputs = Lambda(lambda x: K.batch_flatten(x))(outputs)
model = Model(inputs=[inputs], outputs=[outputs])
model = multi_gpu_model(model, gpus=8)
model.compile(optimizer='adam', loss = bce, metrics = [mean_iou])
Going off #Today's answer in the OP's comments,
outputs = Lambda(lambda x: K.batch_flatten(x))(outputs)

What's the purpose of nb_epoch in Keras's fit_generator?

It seems like I could get the exact same result by making num_samples bigger and keeping nb_epoch=1. I thought the purpose of multiple epochs was to iterate over the same data multiple times, but Keras doesn't reinstantiate the generator at the end of each epoch. It just keeps going. For example training this autoencoder:
import numpy as np
from keras.layers import (Convolution2D, MaxPooling2D,
UpSampling2D, Activation)
from keras.models import Sequential
rand_imgs = [np.random.rand(1, 100, 100, 3) for _ in range(1000)]
def keras_generator():
i = 0
while True:
print(i)
rand_img = rand_imgs[i]
i += 1
yield (rand_img, rand_img)
layers = ([
Convolution2D(20, 5, 5, border_mode='same',
input_shape=(100, 100, 3), activation='relu'),
MaxPooling2D((2, 2), border_mode='same'),
Convolution2D(3, 5, 5, border_mode='same', activation='relu'),
UpSampling2D((2, 2)),
Convolution2D(3, 5, 5, border_mode='same', activation='relu')])
autoencoder = Sequential()
for layer in layers:
autoencoder.add(layer)
gen = keras_generator()
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
history = autoencoder.fit_generator(gen, samples_per_epoch=100, nb_epoch=2)
It seems like I get the same result with (samples_per_epoch=100, nb_epoch=2) as I do for (samples_per_epoch=200, nb_epoch=1). Am I using fit_generator as intended?
Yes - you are right that when using keras.fit_generator these two approaches are equivalent. But - there are variety of reasons why keeping epochs is reasonable:
Logging: in this case epoch comprises the amount of data after which you want to log some important statistics about training (like e.g. time or loss at the end of the epoch).
Keeping directory structure when you are using generator to load data from your hard disk - in this case - when you know how many files you have in your directory - you may adjust the batch_size and nb_epoch to such values that epoch would comprise going through every example in your dataset.
Keeping the structure of data when using flow generator - in this case, when you have e.g. a set of pictures loaded to your Python and you want to use Keras.ImageDataGenerator to apply different kind of data transformations, setting batch_size and nb_epoch in such way that epoch comprises going through every example in your dataset might help you in keeping track of a progress of your trainning process.

How to set proper arguments to build keras Convolution2D NN model [Text Classification]?

I am trying to use 2D CNN to do text classification on Chinese Article and have trouble on setting arguments of keras Convolution2D. I know the basic flow of Convolution2D to cope with image, but stuck by using my dataset with keras.
Input data
My data is 9800 Chinese Article, max sentence length is 6810,with 200 word2vec size.
So the input shape is `(9800, 1, 6810, 200)`
Code for building model
MAX_FEATURES = 6810
# I just randomly pick one filter, seems this is the problem?
nb_filter = 128
input_shape = (1, 6810, 200)
# each word is 200 (word2vec size)
embedding_size = 200
# 3 word length
n_gram = 3
# so stride here is embedding_size*n_gram
model = Sequential()
model.add(Convolution2D(nb_filter, n_gram, embedding_size, border_mode='valid', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(100, 1), border_mode='valid'))
model.add(Dropout(0.5))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(hidden_dims))
model.add(Dropout(0.5))
model.add(Activation('relu'))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
# X is (9800, 1, 6810, 200)
model.fit(X, y, batch_size=32,
nb_epoch=5,
validation_split=0.1)
Question 1. I have problem to set Convolution2D arguments. My reseach is below,
The official docs do not contain an exmaple for 2D CNN text classifacation(though has 1D CNN).
Convolution2D defination is here https://keras.io/layers/convolutional/:
keras.layers.convolutional.Convolution2D(nb_filter, nb_row, nb_col, init='glorot_uniform', activation=None, weights=None, border_mode='valid', subsample=(1, 1), dim_ordering='default', W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None, bias=True)
nb_filter: Number of convolution filters to use.
nb_row: Number of rows in the convolution kernel.
nb_col: Number of columns in the convolution kernel.
border_mode: 'valid', 'same' or 'full'. ('full' requires the Theano backend.)
Some research about the arguments:
This issue https://github.com/fchollet/keras/issues/233 is about 2D CNN for text classification, I read all comments and pick:
(1) https://github.com/fchollet/keras/issues/233#issuecomment-117427013
model.add(Convolution2D(nb_filter=N_FILTERS, stack_size=1, nb_row=FIELD_SIZE,
nb_col=1, subsample=(STRIDE, 1)))
(2) https://github.com/fchollet/keras/issues/233#issuecomment-117700913
sequential.add(Convolution2D(nb_feature_maps, 1, n_gram, embedding_size))
But it seems has some diference to current keras version, also the arguments naming by different people are in a mess (I hope keras has an easy understandable argument expanation).
Another comment I see about current api:
https://github.com/fchollet/keras/issues/1665#issuecomment-181181000
The current API is as below:
keras.layers.convolutional.Convolution2D(nb_filter, nb_row, nb_col, init='glorot_uniform', activation='linear', weights=None, border_mode='valid', subsample=(1, 1), dim_ordering='th', W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None)
So (36,1,7,7) seems the reason, the correct arguments would be (36,7,7,...).
By above research, on my understanding of convolution, Convolution2D create a (nb_filter, nb_row, nb_col) filter , by sliding a stride to get one filter result, recurse sliding, finally combine the result into array with shape (1, one_sample_article_length[6810] / nb_filter), and go to the next layer, is that right? Is my code below set nb_row and nb_col correct ?
Question 2. What is the proper MaxPooling2D arguments? (for my dateset or for commonm, either is OK)
I refer this issue https://github.com/fchollet/keras/issues/233#issuecomment-117427013 to set the argument, there are two kinds:
MaxPooling2D(poolsize=(((nb_features - FIELD_SIZE) / STRIDE) + 1, 1))
MaxPooling2D(poolsize=(maxlen - n_gram + 1, 1))
I have no idea why they calculate MaxPooling2D argument like that.
Question 3. Any recommendation for batch_size and nb_epoch to do such text classification? I have no idea at all.

Deconv implementation in keras output_shape issue

I am implementing following Colorization Model written in Caffe. I am confused about my output_shape parameter to supply in Keras
model.add(Deconvolution2D(256,4,4,border_mode='same',
output_shape=(None,3,14,14),subsample=(2,2),dim_ordering='th',name='deconv_8.1'))
I have added a dummy output_shape parameter. But how can I determine the output parameter? In caffe model the layer is defined as:
layer {
name: "conv8_1"
type: "Deconvolution"
bottom: "conv7_3norm"
top: "conv8_1"
convolution_param {
num_output: 256
kernel_size: 4
pad: 1
dilation: 1
stride: 2
}
If I do not supply this parameter the code give parameter error but I can not understand what should I supply as output_shape
p.s. already asked on data science forum page with no response. may be due to small user base
What output shape does the Caffe deconvolution layer produce?
For this colorization model in particular you can simply refer to page 24 of their paper (which is linked in their GitHub page):
So basically the output shape of this deconvolution layer in the original model is [None, 56, 56, 128]. This is what you want to pass to Keras as output_shape. The only problem is as I mention in the section below, Keras doesn't really use this parameter to determine the output shape, so you need to run a dummy prediction to find what your other parameters need to be in order for you to get what you want.
More generally the Caffe source code for computing its Deconvolution layer output shape is:
const int kernel_extent = dilation_data[i] * (kernel_shape_data[i] - 1) + 1;
const int output_dim = stride_data[i] * (input_dim - 1)
+ kernel_extent - 2 * pad_data[i];
Which with a dilation argument equal to 1 reduces to just:
const int output_dim = stride_data[i] * (input_dim - 1)
+ kernel_shape_data[i] - 2 * pad_data[i];
Note that this matches the Keras documentation when the parameter a is zero:
Formula for calculation of the output shape 3, 4: o = s (i - 1) +
a + k - 2p
How to verify actual output shape with your Keras backend
This is tricky, because the actual output shape depends on the backend implementation and configuration. Keras is currently unable to find it on its own. So you actually have to execute a prediction on some dummy input to find the actual output shape. Here's an example of how to do this from the Keras docs for Deconvolution2D:
To pass the correct `output_shape` to this layer,
one could use a test model to predict and observe the actual output shape.
# Examples
```python
# apply a 3x3 transposed convolution with stride 1x1 and 3 output filters on a 12x12 image:
model = Sequential()
model.add(Deconvolution2D(3, 3, 3, output_shape=(None, 3, 14, 14), border_mode='valid', input_shape=(3, 12, 12)))
# Note that you will have to change the output_shape depending on the backend used.
# we can predict with the model and print the shape of the array.
dummy_input = np.ones((32, 3, 12, 12))
# For TensorFlow dummy_input = np.ones((32, 12, 12, 3))
preds = model.predict(dummy_input)
print(preds.shape)
# Theano GPU: (None, 3, 13, 13)
# Theano CPU: (None, 3, 14, 14)
# TensorFlow: (None, 14, 14, 3)
Reference: https://github.com/fchollet/keras/blob/master/keras/layers/convolutional.py#L507
Also you might be curious to know why is it that the output_shape parameter apparently doesn't really define the output shape. According to the post Deconvolution2D layer in keras this is why:
Back to Keras and how the above is implemented. Confusingly, the output_shape parameter is actually not used for determining the output shape of the layer, and instead they try to deduce it from the input, the kernel size and the stride, while assuming only valid output_shapes are supplied (though it's not checked in the code to be the case). The output_shape itself is only used as input to the backprop step. Thus, you must also specify the stride parameter (subsample in Keras) in order to get the desired result (which could've been determined by Keras from the given input shape, output shape and kernel size).