How to freeze some layers when fine-tuning ResNet50

I am trying to fine-tune ResNet50 with Keras. When I freeze all the layers in resnet50, everything works OK. However, I want to freeze only some layers of resnet50, not all of them. When I do this, I get an error. Here is my code:
base_model = ResNet50(include_top=False, weights="imagenet", input_shape=(input_size, input_size, input_channels))
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(Dense(80, activation="softmax"))

# this is where the error happens. The commented code works fine
"""
for layer in base_model.layers:
    layer.trainable = False
"""
for layer in base_model.layers[:-26]:
    layer.trainable = False

model.summary()

optimizer = Adam(lr=1e-4)
model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])

callbacks = [
    EarlyStopping(monitor='val_loss', patience=4, verbose=1, min_delta=1e-4),
    ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=2, cooldown=2, verbose=1),
    ModelCheckpoint(filepath='weights/renet50_best_weight.fold_' + str(fold_count) + '.hdf5',
                    save_best_only=True, save_weights_only=True)
]

model.load_weights(filepath="weights/renet50_best_weight.fold_1.hdf5")
model.fit_generator(generator=train_generator(), steps_per_epoch=len(df_train) // batch_size, epochs=epochs, verbose=1,
                    callbacks=callbacks, validation_data=valid_generator(), validation_steps=len(df_valid) // batch_size)
The error is as follows:
Traceback (most recent call last):
  File "/home/jamesben/ai_challenger/src/train.py", line 184, in <module>
    model.load_weights(filepath="weights/renet50_best_weight.fold_" + str(fold_count) + '.hdf5')
  File "/usr/local/lib/python3.5/dist-packages/keras/models.py", line 719, in load_weights
    topology.load_weights_from_hdf5_group(f, layers)
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/topology.py", line 3095, in load_weights_from_hdf5_group
    K.batch_set_value(weight_value_tuples)
  File "/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py", line 2193, in batch_set_value
    get_session().run(assign_ops, feed_dict=feed_dict)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 944, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (128,) for Tensor 'Placeholder_72:0', which has shape '(3, 3, 128, 128)'
Can anyone give me some help on how many layers I should freeze in ResNet50?

When using load_weights() and save_weights() with a nested model, it's very easy to get an error if the trainable settings are not the same.
To solve the error, make sure you freeze the same layers before calling model.load_weights(). That is, if the weight file is saved with all layers frozen, the procedure will be:
1. Recreate the model
2. Freeze all layers in base_model
3. Load the weights
4. Unfreeze those layers you want to train (in this case, base_model.layers[-26:])
For example,
base_model = ResNet50(include_top=False, input_shape=(224, 224, 3))
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(Dense(80, activation="softmax"))
for layer in base_model.layers:
    layer.trainable = False
model.load_weights('all_layers_freezed.h5')

for layer in base_model.layers[-26:]:
    layer.trainable = True
The underlying reason:
When you call model.load_weights(), roughly speaking, the weights for each layer are loaded by the following steps (in the function load_weights_from_hdf5_group() in topology.py):
1. Call layer.weights to get the weight tensors
2. Match each weight tensor with its corresponding weight value in the hdf5 file
3. Call K.batch_set_value() to assign the weight values to the weight tensors
If your model is a nested model, you have to be careful about trainable because of Step 1.
I'll use an example to explain it. For the same model as above, model.summary() gives:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
resnet50 (Model)             (None, 1, 1, 2048)        23587712
_________________________________________________________________
flatten_10 (Flatten)         (None, 2048)              0
_________________________________________________________________
dense_5 (Dense)              (None, 80)                163920
=================================================================
Total params: 23,751,632
Trainable params: 11,202,640
Non-trainable params: 12,548,992
_________________________________________________________________
The inner ResNet50 model is treated as a layer of model during weight loading. When loading the layer resnet50, in Step 1, calling layer.weights is equivalent to calling base_model.weights. The list of weight tensors for all layers in the ResNet50 model will be collected and returned.
Now the problem is that, when constructing the list of weight tensors, trainable weights will come before non-trainable weights. In the definition of the Layer class:
@property
def weights(self):
    return self.trainable_weights + self.non_trainable_weights
If all layers in base_model are frozen, the weight tensors will be in the following order:
for layer in base_model.layers:
    layer.trainable = False
print(base_model.weights)
[<tf.Variable 'conv1/kernel:0' shape=(7, 7, 3, 64) dtype=float32_ref>,
<tf.Variable 'conv1/bias:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/gamma:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/beta:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/moving_mean:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/moving_variance:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'res2a_branch2a/kernel:0' shape=(1, 1, 64, 64) dtype=float32_ref>,
<tf.Variable 'res2a_branch2a/bias:0' shape=(64,) dtype=float32_ref>,
...
<tf.Variable 'res5c_branch2c/kernel:0' shape=(1, 1, 512, 2048) dtype=float32_ref>,
<tf.Variable 'res5c_branch2c/bias:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/gamma:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/beta:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/moving_mean:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/moving_variance:0' shape=(2048,) dtype=float32_ref>]
However, if some layers are trainable, the weight tensors of the trainable layers will come before those of the frozen ones:
for layer in base_model.layers[-5:]:
    layer.trainable = True
print(base_model.weights)
[<tf.Variable 'res5c_branch2c/kernel:0' shape=(1, 1, 512, 2048) dtype=float32_ref>,
<tf.Variable 'res5c_branch2c/bias:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/gamma:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/beta:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'conv1/kernel:0' shape=(7, 7, 3, 64) dtype=float32_ref>,
<tf.Variable 'conv1/bias:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/gamma:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/beta:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/moving_mean:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/moving_variance:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'res2a_branch2a/kernel:0' shape=(1, 1, 64, 64) dtype=float32_ref>,
<tf.Variable 'res2a_branch2a/bias:0' shape=(64,) dtype=float32_ref>,
...
<tf.Variable 'bn5c_branch2b/moving_mean:0' shape=(512,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2b/moving_variance:0' shape=(512,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/moving_mean:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/moving_variance:0' shape=(2048,) dtype=float32_ref>]
The change in order is why you got an error about tensor shapes: the weight values saved in the hdf5 file get matched to the wrong weight tensors in Step 2 mentioned above. Everything works fine when you freeze all layers because your model checkpoint was also saved with all layers frozen, so the order matches.
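If you want to see this for yourself, a quick sanity check (a sketch, assuming the same Sequential/ResNet50 setup as above, Keras 2.x with the TensorFlow backend) is to print the weight tensors right before calling model.load_weights():
# Sketch: inspect the order that load_weights() will see for the nested model.
for w in base_model.weights:
    print(w.name, w.shape)
# If the list does not start with conv1/bn_conv1, some layers were unfrozen before
# loading, so the order no longer matches a checkpoint saved with all layers frozen.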
Possibly better solution:
You can avoid a nested model by using the functional API. For example, the following code should work without error:
base_model = ResNet50(include_top=False, weights="imagenet", input_shape=(input_size, input_size, input_channels))
x = Flatten()(base_model.output)
x = Dense(80, activation="softmax")(x)
model = Model(base_model.input, x)
for layer in base_model.layers:
    layer.trainable = False
model.save_weights("all_nontrainable.h5")

base_model = ResNet50(include_top=False, weights="imagenet", input_shape=(input_size, input_size, input_channels))
x = Flatten()(base_model.output)
x = Dense(80, activation="softmax")(x)
model = Model(base_model.input, x)

for layer in base_model.layers[:-26]:
    layer.trainable = False
model.load_weights("all_nontrainable.h5")

Related

"ValueError: max_evals=500 is too low for the Permutation explainer" shap answers me do I have to give more data (photos)?

I want to test the explainability of a multiclass semantic segmentation model, deeplab_v3plus, with shap, to know which features contribute the most to the semantic classification. However, I get ValueError: max_evals=500 is too low when running my file, and I struggle to understand the reason.
import glob
from PIL import Image
import torch
from torchvision import transforms
from torchvision.utils import make_grid
import torchvision.transforms.functional as tf
from deeplab import deeplab_v3plus
import shap

def test(args):
    # make a video prez
    model = deeplab_v3plus('resnet101', num_classes=args.nclass, output_stride=16, pretrained_backbone=True)
    model.load_state_dict(torch.load(args.seg_file, map_location=torch.device('cpu')))  # because no gpu available on sandbox environment
    model = model.to(args.device)
    model.eval()
    explainer = shap.Explainer(model)
    with torch.no_grad():
        for i, file in enumerate(args.img_folder):
            img = img2tensor(file, args)
            pred = model(img)
            print(explainer(img))

if __name__ == '__main__':
    class Arguments:
        def __init__(self):
            self.device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")
            self.seg_file = "Model_Woodscape.pth"
            self.img_folder = glob.glob("test_img/*.png")
            self.mean = [0.485, 0.456, 0.406]
            self.std = [0.229, 0.224, 0.225]
            self.h, self.w = 483, 640
            self.nclass = 10
            self.cmap = {
                1: [128, 64, 128],   # "road"
                2: [69, 76, 11],     # "lanemarks"
                3: [0, 255, 0],      # "curb"
                4: [220, 20, 60],    # "person"
                5: [255, 0, 0],      # "rider"
                6: [0, 0, 142],      # "vehicles"
                7: [119, 11, 32],    # "bicycle"
                8: [0, 0, 230],      # "motorcycle"
                9: [220, 220, 0],    # "traffic_sign"
                0: [0, 0, 0]         # "void"
            }

    args = Arguments()
    test(args)
But it returns:
(dee_env) jovyan@jupyter:~/use-cases/Scene_understanding/Code_Woodscape/deeplab_v3+$ python test_shap.py
BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
Traceback (most recent call last):
File "/home/jovyan/use-cases/Scene_understanding/Code_Woodscape/deeplab_v3+/test_shap.py", line 85, in <module>
test(args)
File "/home/jovyan/use-cases/Scene_understanding/Code_Woodscape/deeplab_v3+/test_shap.py", line 37, in test
print(explainer(img))
File "/home/jovyan/use-cases/Scene_understanding/Code_Woodscape/deeplab_v3+/dee_env/lib/python3.9/site-packages/shap/explainers/_permutation.py", line 82, in __call__
return super().__call__(
File "/home/jovyan/use-cases/Scene_understanding/Code_Woodscape/deeplab_v3+/dee_env/lib/python3.9/site-packages/shap/explainers/_explainer.py", line 266, in __call__
row_result = self.explain_row(
File "/home/jovyan/use-cases/Scene_understanding/Code_Woodscape/deeplab_v3+/dee_env/lib/python3.9/site-packages/shap/explainers/_permutation.py", line 164, in explain_row
raise ValueError(f"max_evals={max_evals} is too low for the Permutation explainer, it must be at least 2 * num_features + 1 = {2 * len(inds) + 1}!")
ValueError: max_evals=500 is too low for the Permutation explainer, it must be at least 2 * num_features + 1 = 1854721!
From the source code, it looks like it's because I don't give enough arguments. I only have three images in my test_img/* folder; is that why?
I have the same issue. A possible solution I found, which seems to work in my case, is to replace this line
explainer = shap.Explainer(model)
With this line
explainer = shap.explainers.Permutation(model, max_evals = 1854721)
shap.Explainer by default has algorithm='auto'. From the documentation of shap.Explainer:
By default the “auto” options attempts to make the best choice given
the passed model and masker, but this choice can always be overriden
by passing the name of a specific algorithm.
Since 'permutation' has been selected you can directly use shap.explainers.Permutation and set max_evals to the value suggested in the error message above.
Given the high number of features in your use case, this might take a really long time. I would suggest using a simpler model just to test the above solution.
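As a variation on the same idea (a sketch only, assuming the model and img from the question; 1854721 is the minimum reported by the error message), the algorithm can also be forced through shap.Explainer and max_evals raised at call time instead of at construction:
# Keep shap.Explainer but select the Permutation algorithm explicitly...
explainer = shap.Explainer(model, algorithm="permutation")

# ...and pass max_evals when the explainer is applied to an input image.
shap_values = explainer(img, max_evals=1854721)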

Light GBM Value Error: ValueError: For early stopping, at least one dataset and eval metric is required for evaluation

Here is my code. It is a binary classification problem and the evaluation criterion is the AUC score. I have looked at one solution on Stack Overflow and implemented it, but it did not work and still gives me an error.
param_grid = {
    'n_estimators': [1000, 10000],
    'boosting_type': ['gbdt'],
    'num_leaves': [30, 35],
    #'learning_rate': [0.01, 0.02, 0.05],
    #'colsample_bytree': [0.8, 0.95],
    'subsample': [0.8, 0.95],
    'is_unbalance': [True, False],
    #'reg_alpha': [0.01, 0.02, 0.05],
    #'reg_lambda': [0.01, 0.02, 0.05],
    'min_split_gain': [0.01, 0.02, 0.05]
}

lgb = LGBMClassifier(random_state=42, early_stopping_rounds=10, eval_metric='auc', verbose_eval=20)

grid_search = GridSearchCV(lgb, param_grid=param_grid,
                           scoring='roc_auc', cv=5, n_jobs=-1, verbose=1)
grid_search.fit(X_train, y_train, eval_set=(X_val, y_val))

best_model = grid_search.best_estimator_

start = time()
best_model.fit(X_train, y_train)
Train_time = round(time() - start, 4)
Error happens at best_model.fit(X_train, y_train)
Answer
This error is caused by the fact that you used early stopping during grid search, but decided not to use early stopping when fitting the best model over the full dataset.
Some keyword arguments you pass into LGBMClassifier are added to the params in the model object produced by training, including early_stopping_rounds.
To disable early stopping for that final fit, remove early_stopping_rounds from the parameters and apply them with set_params().
best_model = grid_search.best_estimator_
# ---------------- my added code -----------------------#
# inspect current parameters
params = best_model.get_params()
print(params)
# remove early_stopping_rounds
params["early_stopping_rounds"] = None
best_model.set_params(**params)
# ------------------------------------------------------#
best_model.fit(X_train, y_train)
More Details
I made some assumptions to turn your question into a minimal reproducible example. In the future, I recommend doing that when you ask questions here. It will help you get better, faster help.
I installed lightgbm 3.1.0 with pip install lightgbm==3.1.0. I'm using Python 3.8.3 on Mac.
Things I changed from your example to make it an easier-to-use reproduction
removed commented code
cut the number of iterations to [10, 100] and num_leaves to [8, 10] so training would run much faster
added imports
added a specific dataset and code to produce it repeatably
reproducible example
from lightgbm import LGBMClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split

param_grid = {
    'n_estimators': [10, 100],
    'boosting_type': ['gbdt'],
    'num_leaves': [8, 10],
    'subsample': [0.8, 0.95],
    'is_unbalance': [True, False],
    'min_split_gain': [0.01, 0.02, 0.05]
}

lgb = LGBMClassifier(
    random_state=42,
    early_stopping_rounds=10,
    eval_metric='auc',
    verbose_eval=20
)

grid_search = GridSearchCV(
    lgb,
    param_grid=param_grid,
    scoring='roc_auc',
    cv=5,
    n_jobs=-1,
    verbose=1
)

X, y = load_breast_cancer(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.1,
    random_state=42
)

grid_search.fit(
    X_train,
    y_train,
    eval_set=(X_test, y_test)
)

best_model = grid_search.best_estimator_

# ---------------- my added code -----------------------#
# inspect current parameters
params = best_model.get_params()
print(params)

# remove early_stopping_rounds
params["early_stopping_rounds"] = None
best_model.set_params(**params)
# ------------------------------------------------------#

best_model.fit(X_train, y_train)
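As an aside, not part of the original answer: assuming lightgbm 3.1.0, where LGBMClassifier.fit() still accepts early_stopping_rounds and eval_metric, another way to avoid the problem is to keep early stopping out of the constructor entirely and pass it only as fit parameters during the search, so the best estimator never carries it:
# Variant sketch: early stopping lives only in the fit parameters of the search.
lgb = LGBMClassifier(random_state=42)

grid_search = GridSearchCV(lgb, param_grid=param_grid, scoring='roc_auc',
                           cv=5, n_jobs=-1, verbose=1)

# fit parameters are forwarded to LGBMClassifier.fit() for every CV fit
grid_search.fit(
    X_train,
    y_train,
    eval_set=(X_test, y_test),
    eval_metric='auc',
    early_stopping_rounds=10
)

# the final refit on the full training data no longer needs an eval_set
best_model = grid_search.best_estimator_
best_model.fit(X_train, y_train)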

dropout_rate and learning_rate do not change in RandomizedSearchCV

Does anyone have an idea why 'dropout_rate' and 'learning_rate' return only 0 and the search does not cover the range I gave, when I am doing a RandomizedSearchCV on the hyperparameters?
Here is my code for an ANN using keras/tensorflow:
# Create the model
def create_model(neurons=1, init_mode='uniform', activation='relu', inputDim=8792, dropout_rate=0.7, learn_rate=0.01, momentum=0, weight_constraint=0):
    model = Sequential()
    model.add(Dense(neurons, input_dim=inputDim, kernel_initializer=init_mode, activation=activation,
                    kernel_constraint=maxnorm(weight_constraint), kernel_regularizer=regularizers.l2(0.001)))  # one inner layer
    #model.add(Dense(neurons, input_dim=inputDim, activation=activation)) # second inner layer
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, activation='sigmoid'))
    optimizer = RMSprop(lr=learn_rate)
    # compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# model
model = KerasClassifier(build_fn=create_model, verbose=0)

# Define K-fold cross validation test harness
kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
for train, test in kfold.split(X_train, Y_train):
    print("TRAIN:", train, "VALIDATION:", test)

# Define Hyperparameters
# specify parameters and distributions to sample from
from scipy.stats import randint as sp_randint
param_dist = {'neurons': sp_randint(300, 360),  #, 175, 180, 185, 190, 195, 200],
              'learn_rate': sp_randint(0.001, 0.01),
              'batch_size': sp_randint(50, 60),
              'epochs': sp_randint(20, 30),
              'dropout_rate': sp_randint(0.2, 0.8),
              'weight_constraint': sp_randint(3, 8)
              }

# run randomized search
n_iter_search = 100
print("[INFO] Starting training digits")
print("[INFO] Tuning hyper-parameters for accuracy")
grid = RandomizedSearchCV(estimator=model, param_distributions=param_dist,
                          n_iter=n_iter_search, n_jobs=10, cv=kfold)

start = time.time()
grid_result = grid.fit(X_train, Y_train)
print("[INFO] GridSearch took {:.2f} seconds".format(time.time() - start))
The output I get:
[INFO] GridSearch took 1164.39 seconds
[INFO] GridSearch best score 1.000000 using parameters: {'batch_size': 54, 'dropout_rate': 0, 'epochs': 20, 'learn_rate': 0, 'neurons': 331, 'weight_constraint': 7}
[INFO] Grid scores on development set:
0.614679 (0.034327) with: {'batch_size': 54, 'dropout_rate': 0, 'epochs': 29, 'learn_rate': 0, 'neurons': 354, 'weight_constraint': 6}
0.883792 (0.008650) with: {'batch_size': 53, 'dropout_rate': 0, 'epochs': 27, 'learn_rate': 0, 'neurons': 339, 'weight_constraint': 7}
0.256881 (0.012974) with: {'batch_size': 59, 'dropout_rate': 0, 'epochs': 27, 'learn_rate': 0, 'neurons': 308, 'weight_constraint': 4}
...
Thanks for helping.
0.2 and 0.8 are not integers, so when you use sp_randint(0.2, 0.8), they are converted to integers and it's the same as sp_randint(0, 0). You have to use an equivalent function that generates floating-point numbers, not integers.
For example, you can use a uniform distribution (uniform from scipy.stats) to generate real numbers.
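A minimal sketch of what that could look like (the ranges mirror the ones in the question; note that scipy's uniform(loc, scale) samples from [loc, loc + scale]):
from scipy.stats import randint as sp_randint
from scipy.stats import uniform

param_dist = {
    'neurons': sp_randint(300, 360),       # integers are fine here
    'learn_rate': uniform(0.001, 0.009),   # floats in [0.001, 0.01]
    'batch_size': sp_randint(50, 60),
    'epochs': sp_randint(20, 30),
    'dropout_rate': uniform(0.2, 0.6),     # floats in [0.2, 0.8]
    'weight_constraint': sp_randint(3, 8)
}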

Incompatible shapes on tensorflow.equal() op for correct predictions evaluation

Using the TensorFlow MNIST tutorial, I am trying to make a convolutional network for face recognition with the "Database of Faces".
The image size is 112x92; I use 3 more convolutional layers to reduce it to 6 x 5, as advised here.
I'm very new to convolutional networks and most of my layer declarations are made by analogy to the TensorFlow MNIST tutorial, so they may be a bit clumsy; feel free to advise me on this.
x_image = tf.reshape(x, [-1, 112, 92, 1])

# first convolutional layer weights (assumed 5x5 kernel, 1 -> 32 channels)
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
W_conv3 = weight_variable([5, 5, 64, 128])
b_conv3 = bias_variable([128])
h_conv3 = tf.nn.relu(conv2d(h_pool2, W_conv3) + b_conv3)
h_pool3 = max_pool_2x2(h_conv3)
W_conv4 = weight_variable([5, 5, 128, 256])
b_conv4 = bias_variable([256])
h_conv4 = tf.nn.relu(conv2d(h_pool3, W_conv4) + b_conv4)
h_pool4 = max_pool_2x2(h_conv4)
W_conv5 = weight_variable([5, 5, 256, 512])
b_conv5 = bias_variable([512])
h_conv5 = tf.nn.relu(conv2d(h_pool4, W_conv5) + b_conv5)
h_pool5 = max_pool_2x2(h_conv5)
W_fc1 = weight_variable([6 * 5 * 512, 1024])
b_fc1 = bias_variable([1024])
h_pool5_flat = tf.reshape(h_pool5, [-1, 6 * 5 * 512])
h_fc1 = tf.nn.relu(tf.matmul(h_pool5_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
print orlfaces.train.num_classes # 40
W_fc2 = weight_variable([1024, orlfaces.train.num_classes])
b_fc2 = bias_variable([orlfaces.train.num_classes])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
My problem appears when the session runs the "correct_prediction" op, which is
tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
At least I think so, given the error message:
W tensorflow/core/common_runtime/executor.cc:1027] 0x19369d0 Compute status: Invalid argument: Incompatible shapes: [8] vs. [20]
[[Node: Equal = Equal[T=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](ArgMax, ArgMax_1)]]
Traceback (most recent call last):
File "./convolutional.py", line 133, in <module>
train_accuracy = accuracy.eval(feed_dict = {x: batch[0], y_: batch[1], keep_prob: 1.0})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 405, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2728, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 345, in run
results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 419, in _do_run
e.code)
tensorflow.python.framework.errors.InvalidArgumentError: Incompatible shapes: [8] vs. [20]
[[Node: Equal = Equal[T=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](ArgMax, ArgMax_1)]]
Caused by op u'Equal', defined at:
File "./convolutional.py", line 125, in <module>
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 328, in equal
return _op_def_lib.apply_op("Equal", x=x, y=y, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 633, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1710, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 988, in __init__
self._traceback = _extract_stack()
It looks like y_conv outputs a matrix of shape 8 x batch_size instead of number_of_classes x batch_size.
If I change the batch size from 20 to 10, the error message stays the same, but instead of [8] vs. [20] I get [4] vs. [10]. From that I conclude that the problem may come from the y_conv declaration (last line of the code above).
The loss function, optimizer, training, etc. declarations are the same as in the MNIST tutorial:
cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())

for i in xrange(1000):
    batch = orlfaces.train.next_batch(20)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict = {x: batch[0], y_: batch[1], keep_prob: 1.0})
        print "Step %d, training accuracy %g" % (i, train_accuracy)
    train_step.run(feed_dict = {x: batch[0], y_: batch[1], keep_prob: 0.5})

print "Test accuracy %g" % accuracy.eval(feed_dict = {x: orlfaces.test.images, y_: orlfaces.test.labels, keep_prob: 1.0})
Thanks for reading, have a good day
Well, after a lot of debugging, I found that my issue was due to a bad instantiation of the labels. Instead of creating arrays full of zeros and setting a single value to one, I created them with random values! Stupid mistake. The fix was simply to build the label arrays as proper one-hot vectors.
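A minimal sketch of that kind of one-hot label construction (assuming integer class indices and numpy; this is an illustration, not the original code):
import numpy as np

def one_hot(class_indices, num_classes):
    # Each row is all zeros except a single 1 at the sample's class index.
    labels = np.zeros((len(class_indices), num_classes), dtype=np.float32)
    labels[np.arange(len(class_indices)), class_indices] = 1.0
    return labels

# e.g. for the 40-class "Database of Faces":
# one_hot([0, 3, 39], 40) has shape (3, 40) with exactly one 1 per row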
Anyway, during all the debugging I did to find this mistake, I came across some useful information for debugging this kind of problem:
For the cross-entropy declaration, the TensorFlow MNIST tutorial uses a formula that can lead to NaN values.
This formula is
cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
Instead of this, I found two ways to declare it in a safer fashion:
cross_entropy = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y_conv, 1e-10, 1.0)))
or also:
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logit, y_))
As mrry says, printing the shapes of the tensors can help to detect shape anomalies.
To get the shape of a tensor, just call its get_shape() method like this:
print "W shape:", W.get_shape()
user1111929 in this question uses a debug print that helped me work out where the problem came from.

Keras + IndexError

I am very new to Keras. I am trying to build a binary classifier for an NLP task. (My code is adapted from the imdb example: https://github.com/fchollet/keras/blob/master/examples/imdb_cnn.py)
Below is my code snippet:
max_features = 30
maxlen = 30
batch_size = 32
embedding_dims = 30
nb_filter = 250
filter_length = 3
hidden_dims = 250
nb_epoch = 3
(Train_X, Train_Y, Test_X, Test_Y) = load_and_split_data()
model = Sequential()
model.add(Embedding(max_features, embedding_dims, input_length=maxlen))
model.add(Convolution1D(nb_filter=nb_filter, filter_length=filter_length, border_mode="valid", activation="relu", subsample_length=1))
model.add(MaxPooling1D(pool_length=2))
model.add(Flatten())
model.add(Dense(hidden_dims))
model.add(Activation('relu'))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', class_mode="binary")
fitlog = model.fit(Train_X, Train_Y, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=True, verbose=2)
When I run model.fit(), I get the following error:
/.virtualenvs/nnet/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
857 t0_fn = time.time()
858 try:
--> 859 outputs = self.fn()
860 except Exception:
861 if hasattr(self.fn, 'position_of_error'):
IndexError: One of the index value is out of bound. Error code: 65535.
Apply node that caused the error: GpuAdvancedSubtensor1(<CudaNdarrayType(float32, matrix)>, Elemwise{Cast{int64}}.0)
Toposort index: 47
Inputs types: [CudaNdarrayType(float32, matrix), TensorType(int64, vector)]
Inputs shapes: [(30, 30), (3840,)]
Inputs strides: [(30, 1), (8,)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[GpuReshape{3}(GpuAdvancedSubtensor1.0, MakeVector{dtype='int64'}.0)]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
Can you please help me resolve this?
You need to pad the sequences you are using; add these lines:
from keras.preprocessing import sequence
Train_X = sequence.pad_sequences(Train_X, maxlen=maxlen)
Test_X = sequence.pad_sequences(Test_X, maxlen=maxlen)
Before building the actual model.
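For reference, pad_sequences truncates or left-pads each sequence to maxlen, so every input row has the fixed length the Embedding layer expects. A small illustration, independent of the question's data:
from keras.preprocessing import sequence

# Sequences of different lengths become fixed-length rows, zero-padded on the left.
print(sequence.pad_sequences([[1, 2, 3], [4, 5]], maxlen=5))
# [[0 0 1 2 3]
#  [0 0 0 4 5]]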