Light GBM Value Error: ValueError: For early stopping, at least one dataset and eval metric is required for evaluation - classification

Here is my code. It is a binary classification problem, and the evaluation metric is the AUC score. I looked at one solution on Stack Overflow and implemented it, but it did not work and I still get an error.
param_grid = {
'n_estimators' : [1000, 10000],
'boosting_type': ['gbdt'],
'num_leaves': [30, 35],
#'learning_rate': [0.01, 0.02, 0.05],
#'colsample_bytree': [0.8, 0.95 ],
'subsample': [0.8, 0.95],
'is_unbalance': [True, False],
#'reg_alpha' : [0.01, 0.02, 0.05],
#'reg_lambda' : [0.01, 0.02, 0.05],
'min_split_gain' :[0.01, 0.02, 0.05]
}
lgb = LGBMClassifier(random_state=42, early_stopping_rounds = 10, eval_metric = 'auc', verbose_eval=20)
grid_search = GridSearchCV(lgb, param_grid= param_grid,
scoring='roc_auc', cv=5, n_jobs=-1, verbose=1)
grid_search.fit(X_train, y_train, eval_set = (X_val, y_val))
best_model = grid_search.best_estimator_
start = time()
best_model.fit(X_train, y_train)
Train_time = round(time() - start, 4)
The error happens at best_model.fit(X_train, y_train).

Answer
This error occurs because you used early stopping during grid search, but then fit the best model on the full dataset without an evaluation set, so early stopping has nothing to evaluate.
Some keyword arguments you pass into LGBMClassifier are added to the params of the model object produced by training, including early_stopping_rounds.
To disable early stopping, you can use set_params().
best_model = grid_search.best_estimator_
# ---------------- my added code -----------------------#
# inspect current parameters
params = best_model.get_params()
print(params)
# remove early_stopping_rounds
params["early_stopping_rounds"] = None
best_model.set_params(**params)
# ------------------------------------------------------#
best_model.fit(X_train, y_train)
More Details
I made some assumptions to turn your question into a minimal reproducible example. In the future, I recommend doing that when you ask questions here. It will help you get better, faster help.
I installed lightgbm 3.1.0 with pip install lightgbm==3.1.0. I'm using Python 3.8.3 on Mac.
Things I changed from your example to make it an easier-to-use reproduction:
removed commented code
cut the number of iterations to [10, 100] and num_leaves to [8, 10] so training would run much faster
added imports
added a specific dataset and code to produce it repeatably
Reproducible example:
from lightgbm import LGBMClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
param_grid = {
'n_estimators' : [10, 100],
'boosting_type': ['gbdt'],
'num_leaves': [8, 10],
'subsample': [0.8, 0.95],
'is_unbalance': [True, False],
'min_split_gain' :[0.01, 0.02, 0.05]
}
lgb = LGBMClassifier(
random_state=42,
early_stopping_rounds = 10,
eval_metric = 'auc',
verbose_eval=20
)
grid_search = GridSearchCV(
lgb,
param_grid= param_grid,
scoring='roc_auc',
cv=5,
n_jobs=-1,
verbose=1
)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
X,
y,
test_size=0.1,
random_state=42
)
grid_search.fit(
X_train,
y_train,
eval_set = (X_test, y_test)
)
best_model = grid_search.best_estimator_
# ---------------- my added code -----------------------#
# inspect current parameters
params = best_model.get_params()
print(params)
# remove early_stopping_rounds
params["early_stopping_rounds"] = None
best_model.set_params(**params)
# ------------------------------------------------------#
best_model.fit(X_train, y_train)

Related

TF Keras code adaptation from python2.7 to python3

I am adapting Python 2.7 code that uses Keras and TensorFlow to implement a CNN, but the Keras API seems to have changed since the original code was written. I keep getting an error about "Negative dimension size" and I cannot find out what is causing it.
Unfortunately I cannot provide an executable piece of code because I was not able to make the original code work, but the repository containing all the source files can be found here.
The original code:
from keras.callbacks import EarlyStopping
from keras.layers.containers import Sequential
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.layers.core import Reshape, Flatten, Dropout, Dense
from keras.layers.embeddings import Embedding
from keras.models import Graph
from keras.preprocessing import sequence
filter_lengths = [3, 4, 5]
self.model = Graph()
'''Embedding Layer'''
self.model.add_input(name='input', input_shape=(max_len,), dtype=int)
self.model.add_node(Embedding(
max_features, emb_dim, input_length=max_len), name='sentence_embeddings', input='input')
'''Convolution Layer & Max Pooling Layer'''
for i in filter_lengths:
    model_internal = Sequential()
    model_internal.add(
        Reshape(dims=(1, self.max_len, emb_dim), input_shape=(self.max_len, emb_dim))
    )
    model_internal.add(Convolution2D(
        nb_filters, i, emb_dim, activation="relu"))
    model_internal.add(
        MaxPooling2D(pool_size=(self.max_len - i + 1, 1))
    )
    model_internal.add(Flatten())
    self.model.add_node(model_internal, name='unit_' + str(i), input='sentence_embeddings')
What I have tried:
m = tf.keras.Sequential()
m.add(tf.keras.Input(shape=(max_len, ), name="input"))
m.add(tf.keras.layers.Embedding(max_features, emb_dim, input_length=max_len))
filter_lengths = [ 3, 4, 5 ]
for i in filter_lengths:
    model_internal = tf.keras.Sequential(name=f'unit_{i}')
    model_internal.add(
        tf.keras.layers.Reshape(( 1, max_len, emb_dim ), input_shape=( max_len, emb_dim ))
    )
    model_internal.add(
        tf.keras.layers.Convolution2D(100, i, emb_dim, activation="relu")
    )
    model_internal.add(
        tf.keras.layers.MaxPooling2D(pool_size=( max_len - i + 1, 1 ))
    )
    model_internal.add(
        tf.keras.layers.Flatten()
    )
    m.add(model_internal)
I do not expect a complete solution. What I am really trying to understand is the cause of the following error:
Negative dimension size caused by subtracting 3 from 1 for '{{node conv2d_5/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], explicit_paddings=[], padding="VALID", strides=[1, 200, 200, 1], use_cudnn_on_gpu=true](Placeholder, conv2d_5/Conv2D/ReadVariableOp)' with input shapes: [?,1,300,200], [3,3,200,100].
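One way to read this error, offered as a sketch rather than a confirmed diagnosis: in tf.keras the positional parameters of Conv2D are filters, kernel_size, strides, so Convolution2D(100, i, emb_dim) requests an i x i kernel with stride emb_dim. That is consistent with the [3,3,200,100] kernel and strides=[1, 200, 200, 1] in the message. With data_format="NHWC", the reshaped input [?,1,300,200] has height 1, and a VALID convolution cannot fit a kernel of height 3 into it. The helper name valid_output_size below is hypothetical; it only reproduces the usual output-size arithmetic for VALID padding.

```python
def valid_output_size(input_size, kernel_size, stride):
    """Output length of a VALID (unpadded) convolution along one axis.

    The result is <= 0 when the kernel does not fit into the input.
    """
    return (input_size - kernel_size) // stride + 1


# Shapes taken from the error message: input [?, 1, 300, 200] in NHWC,
# kernel [3, 3, 200, 100], strides [1, 200, 200, 1].
height, width = 1, 300
kernel, stride = 3, 200

height_fits = height - kernel >= 0   # False: "subtracting 3 from 1"
width_fits = width - kernel >= 0     # True

print("height fits:", height_fits, "| width fits:", width_fits)
print("width output size:", valid_output_size(width, kernel, stride))
```

Under that reading, the directions worth investigating would be putting the channel axis last in the Reshape, for example (max_len, emb_dim, 1), and passing the kernel size as a tuple such as (i, emb_dim). This is an assumption and has not been tested against the original repository.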

dropout_rate and learning_rate do not change in RandomizedSearchCV

Does anyone have an idea why 'dropout_rate' and 'learn_rate' return only 0 and are not sampled from the ranges I gave when I run a RandomizedSearchCV on the hyperparameters?
Here is my code for an ANN using Keras/TensorFlow:
# Create the model
def create_model(neurons = 1, init_mode = 'uniform', activation='relu', inputDim = 8792, dropout_rate=0.7, learn_rate=0.01, momentum=0, weight_constraint=0): #, learn_rate=0.01, momentum=0):
    model = Sequential()
    model.add(Dense(neurons, input_dim=inputDim, kernel_initializer=init_mode, activation=activation, kernel_constraint=maxnorm(weight_constraint), kernel_regularizer=regularizers.l2(0.001))) # one inner layer
    #model.add(Dense(neurons, input_dim=inputDim, activation=activation)) # second inner layer
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, activation='sigmoid'))
    optimizer = RMSprop(lr=learn_rate)
    # compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model
# model
model = KerasClassifier(build_fn=create_model, verbose=0)
# Define K-fold cross validation test harness
kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
for train, test in kfold.split(X_train, Y_train):
    print("TRAIN:", train, "VALIDATION:", test)
# Define Hyperparameters
# specify parameters and distributions to sample from
from scipy.stats import randint as sp_randint
param_dist = {'neurons': sp_randint(300, 360), #, 175, 180, 185, 190, 195, 200],
'learn_rate': sp_randint (0.001, 0.01),
'batch_size': sp_randint(50, 60),
'epochs': sp_randint(20, 30),
'dropout_rate': sp_randint(0.2, 0.8),
'weight_constraint': sp_randint(3, 8)
}
# run randomized search
n_iter_search = 100
print("[INFO] Starting training digits")
print("[INFO] Tuning hyper-parameters for accuracy")
grid = RandomizedSearchCV(estimator=model, param_distributions=param_dist,
n_iter=n_iter_search, n_jobs=10, cv=kfold)
start = time.time()
grid_result = grid.fit(X_train, Y_train)
print("[INFO] GridSearch took {:.2f} seconds".format(time.time() - start))
My output:
[INFO] GridSearch took 1164.39 seconds
[INFO] GridSearch best score 1.000000 using parameters: {'batch_size': 54, 'dropout_rate': 0, 'epochs': 20, 'learn_rate': 0, 'neurons': 331, 'weight_constraint': 7}
[INFO] Grid scores on development set:
0.614679 (0.034327) with: {'batch_size': 54, 'dropout_rate': 0, 'epochs': 29, 'learn_rate': 0, 'neurons': 354, 'weight_constraint': 6}
0.883792 (0.008650) with: {'batch_size': 53, 'dropout_rate': 0, 'epochs': 27, 'learn_rate': 0, 'neurons': 339, 'weight_constraint': 7}
0.256881 (0.012974) with: {'batch_size': 59, 'dropout_rate': 0, 'epochs': 27, 'learn_rate': 0, 'neurons': 308, 'weight_constraint': 4}
...
Thanks for helping.
0.2 and 0.8 are not integers, so when you use sp_randint(0.2, 0.8) they are converted to integers, which makes it the same as sp_randint(0, 0). The same happens with sp_randint(0.001, 0.01) for learn_rate. You have to use an equivalent distribution that generates floating-point numbers, not integers.
For example, you can use a uniform distribution (uniform from scipy.stats) to generate real numbers.
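As a minimal sketch of that suggestion: scipy.stats.uniform(loc, scale) draws floats from the interval [loc, loc + scale], so the second argument is the width of the range, not its upper bound. The ranges below are the ones from the question; the rest of the search setup is assumed to stay unchanged.

```python
from scipy.stats import uniform

# uniform(loc, scale) samples floats from [loc, loc + scale].
param_dist = {
    "dropout_rate": uniform(0.2, 0.6),    # floats in [0.2, 0.8]
    "learn_rate": uniform(0.001, 0.009),  # floats in [0.001, 0.01]
}

# Draw a few values to confirm they are no longer stuck at 0.
dropout_samples = param_dist["dropout_rate"].rvs(size=5, random_state=0)
learn_rate_samples = param_dist["learn_rate"].rvs(size=5, random_state=0)
print(dropout_samples)
print(learn_rate_samples)
```

These two entries can replace the sp_randint ones in param_dist, since RandomizedSearchCV accepts any distribution object that provides an rvs method. The integer-valued parameters (neurons, batch_size, epochs) can stay on sp_randint.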

Incompatible shapes on tensorflow.equal() op for correct predictions evaluation

Following the TensorFlow MNIST tutorial, I am trying to build a convolutional network for face recognition with the "Database of Faces".
The images are 112x92. I use 3 more convolutional layers to reduce them to 6 x 5, as advised here.
I'm very new to convolutional networks, and most of my layer declarations are written by analogy with the TensorFlow MNIST tutorial. They may be a bit clumsy, so feel free to advise me on this.
x_image = tf.reshape(x, [-1, 112, 92, 1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
W_conv3 = weight_variable([5, 5, 64, 128])
b_conv3 = bias_variable([128])
h_conv3 = tf.nn.relu(conv2d(h_pool2, W_conv3) + b_conv3)
h_pool3 = max_pool_2x2(h_conv3)
W_conv4 = weight_variable([5, 5, 128, 256])
b_conv4 = bias_variable([256])
h_conv4 = tf.nn.relu(conv2d(h_pool3, W_conv4) + b_conv4)
h_pool4 = max_pool_2x2(h_conv4)
W_conv5 = weight_variable([5, 5, 256, 512])
b_conv5 = bias_variable([512])
h_conv5 = tf.nn.relu(conv2d(h_pool4, W_conv5) + b_conv5)
h_pool5 = max_pool_2x2(h_conv5)
W_fc1 = weight_variable([6 * 5 * 512, 1024])
b_fc1 = bias_variable([1024])
h_pool5_flat = tf.reshape(h_pool5, [-1, 6 * 5 * 512])
h_fc1 = tf.nn.relu(tf.matmul(h_pool5_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
print orlfaces.train.num_classes # 40
W_fc2 = weight_variable([1024, orlfaces.train.num_classes])
b_fc2 = bias_variable([orlfaces.train.num_classes])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
My problem appears when the session runs the "correct_prediction" op, which is
tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
At least that is what I think, given the error message:
W tensorflow/core/common_runtime/executor.cc:1027] 0x19369d0 Compute status: Invalid argument: Incompatible shapes: [8] vs. [20]
[[Node: Equal = Equal[T=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](ArgMax, ArgMax_1)]]
Traceback (most recent call last):
File "./convolutional.py", line 133, in <module>
train_accuracy = accuracy.eval(feed_dict = {x: batch[0], y_: batch[1], keep_prob: 1.0})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 405, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2728, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 345, in run
results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 419, in _do_run
e.code)
tensorflow.python.framework.errors.InvalidArgumentError: Incompatible shapes: [8] vs. [20]
[[Node: Equal = Equal[T=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](ArgMax, ArgMax_1)]]
Caused by op u'Equal', defined at:
File "./convolutional.py", line 125, in <module>
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 328, in equal
return _op_def_lib.apply_op("Equal", x=x, y=y, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 633, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1710, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 988, in __init__
self._traceback = _extract_stack()
It looks like y_conv outputs a matrix of shape 8 x batch_size instead of number_of_classes x batch_size.
If I change the batch size from 20 to 10, the error message stays the same, but instead of [8] vs. [20] I get [4] vs. [10]. From that I conclude that the problem may come from the y_conv declaration (last line of the code above).
The loss function, optimizer, training, etc. declarations are the same as in the MNIST tutorial:
cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run((tf.initialize_all_variables()))
for i in xrange(1000):
    batch = orlfaces.train.next_batch(20)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict = {x: batch[0], y_: batch[1], keep_prob: 1.0})
        print "Step %d, training accuracy %g" % (i, train_accuracy)
    train_step.run(feed_dict = {x: batch[0], y_: batch[1], keep_prob: 0.5})
print "Test accuracy %g" % accuracy.eval(feed_dict = {x: orlfaces.test.images, y_: orlfaces.test.labels, keep_prob: 1.0})
Thanks for reading, have a good day
Well, after a lot of debugging, I found that my issue was due to a bad instantiation of the labels. Instead of creating arrays full of zeros and replacing one value with a one, I created them with random values! Stupid mistake. In case someone is wondering what I did wrong there and how I fixed it, here is the change I made.
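The linked change is not reproduced here. As a minimal sketch of the approach described (start from zeros, then set exactly one entry per row to one), assuming integer class ids and NumPy; the function name one_hot and the sample ids are illustrative only:

```python
import numpy as np


def one_hot(class_ids, num_classes):
    """Build one-hot label rows: all zeros except a single 1 per row."""
    class_ids = np.asarray(class_ids)
    labels = np.zeros((class_ids.size, num_classes), dtype=np.float32)
    labels[np.arange(class_ids.size), class_ids] = 1.0
    return labels


# 40 classes, as printed by orlfaces.train.num_classes in the question.
labels = one_hot([0, 3, 39], num_classes=40)
print(labels.shape)            # (3, 40)
print(labels.argmax(axis=1))   # [ 0  3 39]
```

Because every row starts as zeros, tf.argmax(y_, 1) recovers the intended class id, which is not guaranteed when the array starts with arbitrary values.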
Anyway, while debugging to find this mistake, I found some useful information for debugging this kind of problem:
For the cross-entropy declaration, TensorFlow's MNIST tutorial uses a formula that can lead to NaN values.
This formula is
cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
Instead of this, I found two ways to declare it in a safer fashion:
cross_entropy = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y_conv, 1e-10, 1.0)))
or also:
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logit, y_))
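To illustrate why the unclipped formula can produce NaN, here is a small sketch in which NumPy stands in for TensorFlow. The arrays are made up, and y_pred imitates a softmax output that has saturated to exact zeros and ones:

```python
import numpy as np

y_true = np.array([0.0, 1.0, 0.0])   # one-hot label
y_pred = np.array([0.0, 1.0, 0.0])   # saturated softmax output

# Naive form: 0 * log(0) = 0 * (-inf) = nan, which poisons the whole sum.
with np.errstate(divide="ignore", invalid="ignore"):
    naive = -np.sum(y_true * np.log(y_pred))

# Clipped form: log never sees 0, so the result stays finite.
clipped = -np.sum(y_true * np.log(np.clip(y_pred, 1e-10, 1.0)))

print(np.isnan(naive), np.isfinite(clipped))   # True True
```

The second option in the text avoids the problem differently: it takes the pre-softmax logits and computes the softmax and the log together in a numerically stable way.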
As mrry says, printing the shape of the tensors can help to detect shape anomalies.
To get the shape of a tensor, just call its get_shape() method like this:
print "W shape:", W.get_shape()
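As a sketch of the kind of check this enables, the spatial sizes in the question can also be tallied by hand. This assumes that max_pool_2x2 uses stride 2 with SAME padding and that conv2d preserves the spatial size, as in the MNIST tutorial (the helpers are not shown in the question), so each pooling step maps a size n to ceil(n / 2). The function name pooled_size is hypothetical.

```python
import math


def pooled_size(size, num_pools):
    """Size along one axis after repeated 2x2, stride-2, SAME-padded pooling."""
    for _ in range(num_pools):
        size = math.ceil(size / 2)
    return size


height = pooled_size(112, 5)   # 112 -> 56 -> 28 -> 14 -> 7 -> 4
width = pooled_size(92, 5)     # 92 -> 46 -> 23 -> 12 -> 6 -> 3

actual_features = height * width * 512   # per image after h_pool5
assumed_features = 6 * 5 * 512           # size hard-coded in the reshape

for batch_size in (20, 10):
    rows = batch_size * actual_features // assumed_features
    print(batch_size, "images ->", rows, "rows after reshape")
```

Under these assumptions the row counts come out as 8 and 4, the same numbers as in the [8] vs. [20] and [4] vs. [10] messages, so the hard-coded 6 * 5 * 512 is worth comparing against h_pool5.get_shape().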
user1111929, in this question, uses a debug print that helped me pinpoint where the problem came from.

Keras + IndexError

I am very new to Keras and am trying to build a binary classifier for an NLP task. (My code is based on the imdb example: https://github.com/fchollet/keras/blob/master/examples/imdb_cnn.py)
Below is my code snippet:
max_features = 30
maxlen = 30
batch_size = 32
embedding_dims = 30
nb_filter = 250
filter_length = 3
hidden_dims = 250
nb_epoch = 3
(Train_X, Train_Y, Test_X, Test_Y) = load_and_split_data()
model = Sequential()
model.add(Embedding(max_features, embedding_dims, input_length=maxlen))
model.add(Convolution1D(nb_filter=nb_filter,filter_length=filter_length,border_mode="valid",activation="relu",subsample_length=1))
model.add(MaxPooling1D(pool_length=2))
model.add(Flatten())
model.add(Dense(hidden_dims))
model.add(Activation('relu'))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', class_mode="binary")
fitlog = model.fit(Train_X, Train_Y, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=True, verbose=2)
When I run model.fit(), I get the following error:
/.virtualenvs/nnet/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
857 t0_fn = time.time()
858 try:
--> 859 outputs = self.fn()
860 except Exception:
861 if hasattr(self.fn, 'position_of_error'):
IndexError: One of the index value is out of bound. Error code: 65535.\n
Apply node that caused the error: GpuAdvancedSubtensor1(<CudaNdarrayType(float32, matrix)>, Elemwise{Cast{int64}}.0)
Toposort index: 47
Inputs types: [CudaNdarrayType(float32, matrix), TensorType(int64, vector)]
Inputs shapes: [(30, 30), (3840,)]
Inputs strides: [(30, 1), (8,)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[GpuReshape{3}(GpuAdvancedSubtensor1.0, MakeVector{dtype='int64'}.0)]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
Can you please help me resolve this?
You need to pad the sequences you are using. Add these lines:
from keras.preprocessing import sequence
Train_X = sequence.pad_sequences(Train_X, maxlen=maxlen)
Test_X = sequence.pad_sequences(Test_X, maxlen=maxlen)
before building the actual model.
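For intuition about what this call does, here is a pure-Python sketch of the default behaviour of pad_sequences: zeros are added at the front, and sequences that are too long are truncated from the front. The helper pad_to_length is a hypothetical stand-in written for illustration; it is not part of Keras.

```python
def pad_to_length(sequences, maxlen, value=0):
    """Mimic the pad_sequences defaults: pad and truncate at the front."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                       # keep the last maxlen items
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded


ragged = [[5, 2], [7, 1, 4, 9, 3, 8]]
fixed = pad_to_length(ragged, maxlen=4)
print(fixed)   # [[0, 0, 5, 2], [4, 9, 3, 8]]
```

Every row then has length maxlen, matching input_length=maxlen in the Embedding layer. Padding does not change the index values themselves, so each index must still be smaller than max_features.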

Average pooling with Theano

I am trying to implement another pooling function for a neural network with Theano, besides the already existing max pooling: for example, average pooling.
Following this source, where average pooling is already implemented, my code looks like this.
Random initialization, just to test:
invals = numpy.random.RandomState(1).rand(3,2,5,5)
Definition of Theano scalars and functions:
pdim = T.scalar('pool dim', dtype='float32')
pool_inp = T.tensor4('pool input', dtype='float32')
pool_sum = TSN.images2neibs(pool_inp, (pdim, pdim))
pool_out = pool_sum.mean(axis=-1)
pool_fun = theano.function([pool_inp, pdim], pool_out, name = 'pool_fun', allow_input_downcast=True)
TSN is theano.sandbox.neighbours
And the call of the function:
pool_dim = 2
temp = pool_fun(invals, pool_dim)
temp.shape = (invals.shape[0], invals.shape[1], invals.shape[2]/pool_dim,
invals.shape[3]/pool_dim)
print ('invals[1,0,:,:]=\n', invals[1,0,:,:])
print ('output[1,0,:,:]=\n',temp[1,0,:,:])
And I am getting an error:
TypeError: neib_shape[0]=2, neib_step[0]=2 and ten4.shape[2]=5 not consistent
Apply node that caused the error: Images2Neibs{valid}(pool input, MakeVector.0, MakeVector.0)
Inputs shapes: [(3, 2, 5, 5), (2,), (2,)]
Inputs strides: [(200, 100, 20, 4), (4,), (4,)]
Inputs types: [TensorType(float32, 4D), TensorType(float32, vector), TensorType(float32, vector)]
Use the Theano flag 'exception_verbosity=high' for a debugprint of this apply node.
I don't really understand this error. I would be glad to have any suggestions on how to correct it, or examples of other pooling techniques programmed in Theano.
Thanks!
Edit: when ignoring the border, it works perfectly:
pool_sum = TSN.images2neibs(pool_inp, (pdim, pdim), mode='ignore_borders')
invals[1,0,:,:]=
[[ 0.01936696 0.67883553 0.21162812 0.26554666 0.49157316]
[ 0.05336255 0.57411761 0.14672857 0.58930554 0.69975836]
[ 0.10233443 0.41405599 0.69440016 0.41417927 0.04995346]
[ 0.53589641 0.66379465 0.51488911 0.94459476 0.58655504]
[ 0.90340192 0.1374747 0.13927635 0.80739129 0.39767684]]
output[1,0,:,:]=
[[ 0.33142066 0.30330223]
[ 0.42902038 0.64201581]]
invals has shape (5, 5) in the last two dimensions, but you want to pool over (2, 2) blocks. Since 5 is not divisible by 2, this only works if you ignore the border (i.e. the last column and the last row of invals).
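The same computation can be sketched in plain NumPy to make the border handling explicit. This is an illustration of the idea, not the Theano implementation: the incomplete last row and column are cropped, and the remaining 4 x 4 area is averaged in 2 x 2 blocks. The input is the invals[1,0,:,:] slice printed above.

```python
import numpy as np

# invals[1, 0, :, :] as printed in the question.
x = np.array([
    [0.01936696, 0.67883553, 0.21162812, 0.26554666, 0.49157316],
    [0.05336255, 0.57411761, 0.14672857, 0.58930554, 0.69975836],
    [0.10233443, 0.41405599, 0.69440016, 0.41417927, 0.04995346],
    [0.53589641, 0.66379465, 0.51488911, 0.94459476, 0.58655504],
    [0.90340192, 0.1374747, 0.13927635, 0.80739129, 0.39767684],
])

pool = 2
# Drop the rows and columns that do not fill a complete pool x pool block.
rows = (x.shape[0] // pool) * pool
cols = (x.shape[1] // pool) * pool
cropped = x[:rows, :cols]

# Split into (block_row, pool, block_col, pool) and average within each block.
pooled = cropped.reshape(rows // pool, pool, cols // pool, pool).mean(axis=(1, 3))
print(pooled)
```

The result agrees with the output[1,0,:,:] values shown in the edit, up to rounding.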