I have been reading the Keras documentation to build my own MLP network with backpropagation. I am familiar with MLPClassifier in sklearn, but I want to learn Keras for deep learning. The following is my first attempt. The network has 3 layers: 1 input layer (64 features), 1 hidden layer, and 1 output layer, i.e. (64, 64, 1). The input X is a numpy matrix of 125K samples (64-dimensional), and y is a 1D numpy array of binary class labels (1, -1):
# Keras imports
from keras.models import Sequential
from sklearn.model_selection import train_test_split
from keras.layers import Dense, Dropout, Activation
from keras.initializers import RandomNormal, VarianceScaling, RandomUniform
from keras.optimizers import SGD, Adam, Nadam, RMSprop
# System imports
import sys
import os
import numpy as np
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
def train_model(X, y, num_streams, num_stages):
    '''
    STEP1: Initialize the Model
    '''
    tr_X, ts_X, tr_y, ts_y = train_test_split(X, y, train_size=.8)
    model = initialize_model(num_streams, num_stages)
    '''
    STEP2: Train the Model
    '''
    model.compile(loss='binary_crossentropy',
                  optimizer=Adam(lr=1e-3),
                  metrics=['accuracy'])
    model.fit(tr_X, tr_y,
              validation_data=(ts_X, ts_y),
              epochs=3,
              batch_size=200)
def initialize_model(num_streams, num_stages):
    model = Sequential()
    hidden_units = 2 ** (num_streams + 1)
    # init = VarianceScaling(scale=5.0, mode='fan_in', distribution='normal')
    init_bound1 = np.sqrt(3.5 / ((num_stages + 1) + num_stages))
    init_bound2 = np.sqrt(3.5 / ((num_stages + 1) + hidden_units))
    init_bound3 = np.sqrt(3.5 / (hidden_units + 1))
    # drop_out = np.random.uniform(0, 1, 3)
    # This is the input layer (that's why you have to state the input_dim value)
    model.add(Dense(num_stages,
                    input_dim=num_stages,
                    activation='relu',
                    kernel_initializer=RandomUniform(minval=-init_bound1, maxval=init_bound1)))
    model.add(Dense(hidden_units,
                    activation='relu',
                    kernel_initializer=RandomUniform(minval=-init_bound2, maxval=init_bound2)))
    # model.add(Dropout(drop_out[1]))
    # This is the output layer
    model.add(Dense(1,
                    activation='sigmoid',
                    kernel_initializer=RandomUniform(minval=-init_bound3, maxval=init_bound3)))
    return model
The problem is that I get 99% accuracy on the same dataset X and y when using sklearn's MLPClassifier. However, Keras gives poor accuracy, as seen below:
Train on 100000 samples, validate on 25000 samples
Epoch 1/3
100000/100000 [==============================] - 1s - loss: -0.5358 - acc: 0.0022 - val_loss: -0.7322 - val_acc: 0.0000e+00
Epoch 2/3
100000/100000 [==============================] - 1s - loss: -0.6353 - acc: 0.0000e+00 - val_loss: -0.7385 - val_acc: 0.0000e+00
Epoch 3/3
100000/100000 [==============================] - 1s - loss: -0.7720 - acc: 9.0000e-05 - val_loss: -0.9474 - val_acc: 5.2000e-04
I don't understand why. Am I missing something here? Any help is appreciated.
I think the problem is that you are using a sigmoid output layer (bounded to [0, 1]) but your classes are (1, -1); you need to remap your output values to (0, 1) or use tanh.
Also, Keras layers may have different default parameters than sklearn, so make sure you take a look at those in the documentation.
One last thing: for the kernel_initializer, try glorot_uniform; it is a good default.
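For example, a minimal sketch of the label remapping (assuming y is the -1/+1 label array from the question):
import numpy as np

# Map the -1/+1 labels to 0/1 so they match the sigmoid output and binary_crossentropy.
y = np.where(y == -1, 0, 1)

# Alternatively, keep the -1/+1 labels, use a tanh output layer
# (model.add(Dense(1, activation='tanh'))) and a loss such as 'hinge' or 'mse'.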
Try converting your labels to one-hot encoding before training the model.
For more info on why to one-hot encode, check out https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/
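A minimal sketch of that (assuming y holds the -1/+1 labels from the question); with one-hot targets the output layer needs two softmax units and categorical_crossentropy:
import numpy as np
from keras.utils import to_categorical

y01 = np.where(y == -1, 0, 1)                  # map -1 -> 0 and 1 -> 1
y_onehot = to_categorical(y01, num_classes=2)

# The output layer and loss would then be:
#   model.add(Dense(2, activation='softmax'))
#   model.compile(loss='categorical_crossentropy', ...)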
I am training a classification model to classify cells, and my model is based on this paper: https://www.nature.com/articles/s41598-019-50010-9. As my dataset consists of only 10 images, I performed image augmentation to artificially increase the size of the dataset to 3000 images, which were then split into 2400 training images and 600 validation images.
However, while the training loss and accuracy improved over successive epochs, the validation loss increased rapidly and the validation accuracy remained stuck at 0.0000e+00.
Is my model overfitting severely right from the start?
The code I used is as shown below:
import keras
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.models import Model, load_model, Sequential, model_from_json
from tensorflow.keras.layers import Input, BatchNormalization, Activation, Flatten, Dense, LeakyReLU
from tensorflow.python.keras.layers.core import Lambda, Dropout
from tensorflow.python.keras.layers.convolutional import Conv2D, Conv2DTranspose, UpSampling2D
from tensorflow.python.keras.layers.pooling import MaxPooling2D, AveragePooling2D
from tensorflow.python.keras.layers.merge import Concatenate, Add
from tensorflow.keras.callbacks import ModelCheckpoint, LearningRateScheduler, ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.optimizers import *
from tensorflow.keras.utils import to_categorical
img_channel = 1
input_size = (512, 512, 1)
inputs = Input(shape = input_size)
initial_input = Lambda(lambda x: x) (inputs) #Ensure input value is between 0 and 1 to avoid negative loss
kernel_size = (3,3)
pad = 'same'
model = Sequential()
filters = 2
c1 = Conv2D(filters, kernel_size, padding = pad, kernel_initializer = 'he_normal')(initial_input)
b1 = BatchNormalization()(c1)
a1 = Activation('elu')(b1)
p1 = AveragePooling2D()(a1)
c2 = Conv2D(filters, kernel_size, padding = pad, kernel_initializer = 'he_normal')(p1)
b2 = BatchNormalization()(c2)
a2 = Activation('elu')(b2)
p2 = AveragePooling2D()(a2)
c3 = Conv2D(filters, kernel_size, padding = pad, kernel_initializer = 'he_normal')(p2)
b3 = BatchNormalization()(c3)
a3 = Activation('elu')(b3)
p3 = AveragePooling2D()(a3)
c4 = Conv2D(filters, kernel_size, padding = pad, kernel_initializer = 'he_normal')(p3)
b4 = BatchNormalization()(c4)
a4 = Activation('elu')(b4)
p4 = AveragePooling2D()(a4)
c5 = Conv2D(filters, kernel_size, padding = pad, kernel_initializer = 'he_normal')(p4)
b5 = BatchNormalization()(c5)
a5 = Activation('elu')(b5)
p5 = AveragePooling2D()(a5)
f = Flatten()(p5)
d1 = Dense(128, activation = 'elu')(f)
d2 = Dense(no_of_img, activation = 'softmax')(d1)
model = Model(inputs = [inputs], outputs = [d2])
print(model.summary())
learning_rate = 0.001
decay_rate = 0.0001
model.compile(optimizer = SGD(lr = learning_rate, decay = decay_rate, momentum = 0.9, nesterov = False),
              loss = 'categorical_crossentropy', metrics = ['accuracy'])
perf_lr_scheduler = ReduceLROnPlateau(monitor = 'val_loss', factor = 0.9, patience = 3,
                                      verbose = 1, min_delta = 0.01, min_lr = 0.000001)
model_earlystop = EarlyStopping(monitor = 'val_loss', min_delta = 0.001, patience = 10, restore_best_weights = True)
#Convert labels to binary matrices
img_aug_label = to_categorical(img_aug_label, num_classes = no_of_img)
#Convert images to float to between 0 and 1
img_aug = np.float32(img_aug)/255
plt.imshow(img_aug[0,:,:,0])
plt.show()
#Train on augmented images
model.fit(
    img_aug,
    img_aug_label,
    batch_size = 4,
    epochs = 100,
    validation_split = 0.2,
    shuffle = True,
    callbacks = [perf_lr_scheduler],
    verbose = 2)
The output of my model is as shown below:
Train on 2400 samples, validate on 600 samples
Epoch 1/100
2400/2400 - 12s - loss: 0.6474 - accuracy: 0.8071 - val_loss: 9.8161 - val_accuracy: 0.0000e+00
Epoch 2/100
2400/2400 - 10s - loss: 0.0306 - accuracy: 0.9921 - val_loss: 10.1733 - val_accuracy: 0.0000e+00
Epoch 3/100
2400/2400 - 10s - loss: 0.0058 - accuracy: 0.9996 - val_loss: 10.9820 - val_accuracy: 0.0000e+00
Epoch 4/100
Epoch 00004: ReduceLROnPlateau reducing learning rate to 0.0009000000427477062.
2400/2400 - 10s - loss: 0.0019 - accuracy: 1.0000 - val_loss: 11.3029 - val_accuracy: 0.0000e+00
Epoch 5/100
2400/2400 - 10s - loss: 0.0042 - accuracy: 0.9992 - val_loss: 11.9037 - val_accuracy: 0.0000e+00
Epoch 6/100
2400/2400 - 10s - loss: 0.0024 - accuracy: 0.9996 - val_loss: 11.5218 - val_accuracy: 0.0000e+00
Epoch 7/100
Epoch 00007: ReduceLROnPlateau reducing learning rate to 0.0008100000384729356.
2400/2400 - 10s - loss: 9.9053e-04 - accuracy: 1.0000 - val_loss: 11.7658 - val_accuracy: 0.0000e+00
Epoch 8/100
2400/2400 - 10s - loss: 0.0011 - accuracy: 1.0000 - val_loss: 12.0437 - val_accuracy: 0.0000e+00
Epoch 9/100
I realised the error occurred because I had not shuffled the data manually before passing it to the model. I thought the validation_split and shuffle arguments would take effect at training time, but the split actually happens before training: the fit function first splits your data into training and validation sets, and only then shuffles the data within each set (but not across them).
For my augmented dataset, the split happened at a position where the validation set contained types of images not found in the training set. Consequently, the model was validated on types of data it had never seen during training, resulting in the poor validation loss and accuracy. Manually shuffling the data before fitting the model solved this problem.
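A minimal sketch of that fix (assuming img_aug and img_aug_label are the augmented arrays from the question): shuffle the images and labels with the same permutation before calling fit.
import numpy as np

perm = np.random.permutation(len(img_aug))   # one shared shuffle order for images and labels
img_aug = img_aug[perm]
img_aug_label = img_aug_label[perm]

model.fit(img_aug, img_aug_label,
          batch_size=4, epochs=100,
          validation_split=0.2, shuffle=True,
          callbacks=[perf_lr_scheduler], verbose=2)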
My model has two input branches: input1 takes 2D grayscale images and input2 takes color images. The two branches are merged using the concatenate method and classified using a softmax function. The model is working fine, but I have trouble understanding how softmax operates in a multiple-input model and how the weights are updated in both branches.
The softmax and loss functions behave the same way as in a single-input/output model, even in the multiple-input/output case. If you only pass a single loss function to the model, the same loss function is applied to every output, unless you specify a different loss function (and, if needed, a different activation) for each output.
Consider the following model, which has an image input of shape (32, 32, 3) (that's (height, width, channels)) and a timeseries input of shape (None, 10) (that's (timesteps, features)). Our model will have two outputs computed from the combination of these inputs: a "score" (of shape (1,)) and a probability distribution over five classes (of shape (5,)).
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
image_input = keras.Input(shape=(32, 32, 3), name='img_input')
timeseries_input = keras.Input(shape=(None, 10), name='ts_input')
x1 = layers.Conv2D(3, 3)(image_input)
x1 = layers.GlobalMaxPooling2D()(x1)
x2 = layers.Conv1D(3, 3)(timeseries_input)
x2 = layers.GlobalMaxPooling1D()(x2)
x = layers.concatenate([x1, x2])
score_output = layers.Dense(1, name='score_output')(x)
class_output = layers.Dense(5, name='class_output')(x)
model = keras.Model(inputs=[image_input, timeseries_input],
                    outputs=[score_output, class_output])
Let's plot this model, so you can clearly see what we're doing here (note that the shapes shown in the plot are batch shapes, rather than per-sample shapes).
keras.utils.plot_model(model, 'multi_input_and_output_model.png', show_shapes=True)
If we only passed a single loss function to the model, the same loss function would be applied to every output, which is not appropriate here.
Passing data to a multi-input or multi-output model in fit works in a similar way as specifying a loss function in compile: you can pass lists of Numpy arrays (with 1:1 mapping to the outputs that received a loss function) or dicts mapping output names to Numpy arrays of training data.
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=[keras.losses.MeanSquaredError(),
          keras.losses.CategoricalCrossentropy(from_logits=True)])
# Generate dummy Numpy data
img_data = np.random.random_sample(size=(100, 32, 32, 3))
ts_data = np.random.random_sample(size=(100, 20, 10))
score_targets = np.random.random_sample(size=(100, 1))
class_targets = np.random.random_sample(size=(100, 5))
# Fit on lists
model.fit([img_data, ts_data], [score_targets, class_targets],
          batch_size=32,
          epochs=3)
# Alternatively, fit on dicts
model.fit({'img_input': img_data, 'ts_input': ts_data},
          {'score_output': score_targets, 'class_output': class_targets},
          batch_size=32,
          epochs=3)
Output -
Train on 100 samples
Epoch 1/3
100/100 [==============================] - 2s 22ms/sample - loss: 5.2477 - score_output_loss: 0.1809 - class_output_loss: 5.3292
Epoch 2/3
100/100 [==============================] - 0s 191us/sample - loss: 4.8558 - score_output_loss: 0.1235 - class_output_loss: 4.5884
Epoch 3/3
100/100 [==============================] - 0s 202us/sample - loss: 4.7482 - score_output_loss: 0.1421 - class_output_loss: 4.5786
Train on 100 samples
Epoch 1/3
100/100 [==============================] - 0s 260us/sample - loss: 4.6704 - score_output_loss: 0.1377 - class_output_loss: 4.5686
Epoch 2/3
100/100 [==============================] - 0s 204us/sample - loss: 4.6210 - score_output_loss: 0.2038 - class_output_loss: 4.5260
Epoch 3/3
100/100 [==============================] - 0s 188us/sample - loss: 4.6014 - score_output_loss: 0.1678 - class_output_loss: 4.3346
<tensorflow.python.keras.callbacks.History at 0x7f8cc43a9ac8>
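For completeness, compile also accepts dicts keyed by output name, so each output can get its own loss (and optionally a loss weight); a minimal sketch with the same model (the weight values here are just illustrative):
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss={'score_output': keras.losses.MeanSquaredError(),
          'class_output': keras.losses.CategoricalCrossentropy(from_logits=True)},
    # Optional: weight each output's contribution to the total loss.
    loss_weights={'score_output': 2.0, 'class_output': 1.0})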
I have a question regarding feature extraction with data augmentation in Keras. I am building a dog breed classifier.
By feature extraction, I am referring to extending the model (conv_base, VGG16) by adding Dense layers on top and running the whole thing end to end on the input data. This allows me to use data augmentation, because every input image goes through the convolutional base every time it is seen by the model.
Training Set: 6680 images belonging to 133 classes
Validation Set: 835 images belonging to 133 classes
Test Set: 836 images belonging to 133 classes
I was able to successfully implement data augmentation and feature extraction independently of one another, but when I try combining the two, my accuracy comes out incredibly low for some reason. Why is this? Am I doing something majorly wrong with my approach?
from keras.applications import VGG16
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, GlobalAveragePooling2D
from keras.callbacks import ModelCheckpoint

conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(224, 224, 3))

model = Sequential()
model.add(conv_base)
conv_base.trainable = False
model.add(GlobalAveragePooling2D())
model.add(Dense(133, activation='softmax'))

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
train_datagen_aug = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen_aug = ImageDataGenerator(rescale=1./255)

train_generator_aug = train_datagen_aug.flow_from_directory(
    'myImages/train',
    target_size=(224, 224),
    batch_size=50,
    class_mode='categorical')

validation_generator_aug = test_datagen_aug.flow_from_directory(
    'myImages/valid',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical')

checkpointer_aug = ModelCheckpoint(filepath='saved_models/dogs_transfer_aug_model.h5',
                                   save_best_only=True)

history = model.fit_generator(
    train_generator_aug,
    steps_per_epoch=130,
    epochs=20,
    validation_data=validation_generator_aug,
    verbose=1,
    callbacks=[checkpointer_aug],
    validation_steps=26)
Output looks like this:
Epoch 1/20
130/130 [==============================] - 293s - loss: 15.9044 - acc: 0.0083 - val_loss: 16.0019 - val_acc: 0.0072
Epoch 2/20
130/130 [==============================] - 281s - loss: 15.9972 - acc: 0.0075 - val_loss: 15.9977 - val_acc: 0.0075
Epoch 3/20
130/130 [==============================] - 280s - loss: 16.0220 - acc: 0.0060 - val_loss: 15.9977 - val_acc: 0.0075
Epoch 4/20
130/130 [==============================] - 280s - loss: 15.9941 - acc: 0.0077 - val_loss: 16.0019 - val_acc: 0.0072
I suspect this is a model overfitting issue, as suggested by the model's loss and accuracy. We could try a smaller version of VGG16 (with fewer layers):
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, ZeroPadding2D, Flatten, Dense, Dropout

NUMBER_OF_TRAINING_SAMPLES = 6680
NUMBER_OF_VALIDATION_SAMPLES = 835
batch_size = 32
out_classes = 133
input_shape = (224, 224, 3)

def buildSmallVGG(out_classes, input_shape):
    model = Sequential()
    model.add(ZeroPadding2D((1, 1), input_shape=input_shape))
    model.add(Conv2D(16, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))
    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(out_classes, activation='softmax'))
    return model
model = buildSmallVGG(out_classes, input_shape)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

history = model.fit_generator(
    train_generator_aug,
    steps_per_epoch=NUMBER_OF_TRAINING_SAMPLES // batch_size,
    epochs=20,
    validation_data=validation_generator_aug,
    callbacks=[checkpointer_aug],
    validation_steps=NUMBER_OF_VALIDATION_SAMPLES // batch_size)
The above is not tested. It would be good if you could share the results you get for loss, accuracy, etc.
Here is a simple Keras neural network that attempts to map 1->1 and 2->0 (binary classification):
X = [[1], [2]]
Y = [[1], [0]]

from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import History
from keras import optimizers

history = History()

inputDim = len(X[0])
print('input dim', inputDim)
model = Sequential()
model.add(Dense(1, activation='sigmoid', input_dim=inputDim))
model.add(Dense(1, activation='sigmoid'))

sgd = optimizers.SGD(lr=0.009, decay=1e-10, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.fit(X, Y, validation_split=0.1, verbose=2, callbacks=[history], epochs=20, batch_size=32)
Using SGD optimizer :
optimizers.SGD(lr=0.009, decay=1e-10, momentum=0.9, nesterov=True)
Output for epoch 20 :
Epoch 20/20
0s - loss: 0.5973 - acc: 1.0000 - val_loss: 0.4559 - val_acc: 0.0000e+00
If I use the adam optimizer:
sgd = optimizers.adam(lr=0.009, decay=1e-10)
Output for epoch 20 :
Epoch 20/20
0s - loss: 1.2140 - acc: 0.0000e+00 - val_loss: 0.2930 - val_acc: 1.0000
Switching between the adam and sgd optimizers appears to swap the values of acc and val_acc. With adam, val_acc = 1, but since acc is 0, how can the validation accuracy be at its maximum while the training accuracy is at its minimum?
Using a sigmoid after a sigmoid is a really bad idea. For example, this paper explains why sigmoid suffers from a so-called saturation problem. Moreover, when you stack sigmoid after sigmoid, you push the overall saturation of your network sky-high. To understand why, notice that the output of the first layer always lies in the interval (0, 1). As binary_crossentropy tries to push this output (after a linear transformation) as close to +/- infinity as possible, your network ends up with extremely large weights. This is probably the cause of your instability.
To solve your problem, I would simply keep only one layer with a sigmoid, as your problem is linearly separable.
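A minimal sketch of that suggestion, reusing the setup from the question:
model = Sequential()
model.add(Dense(1, activation='sigmoid', input_dim=inputDim))   # single sigmoid output layer

sgd = optimizers.SGD(lr=0.009, decay=1e-10, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])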
UPDATE:
As @Daniel mentioned, when you split a dataset containing two examples, you end up with one example in the training set and the other in the validation set. This is what causes the weird behavior.
I am currently trying to get a decent score (> 40% accuracy) with Keras on CIFAR-100. However, I'm experiencing weird behaviour from a CNN model: it tends to predict some classes (2 - 5 of them) much more often than others:
The pixel at position (i, j) contains the count of how many elements of the validation set from class i were predicted to be of class j. Thus the diagonal contains the correct classifications, and everything else is an error. The two vertical bars indicate that the model often predicts those classes, even though they are not the true classes.
CIFAR 100 is perfectly balanced: All 100 classes have 500 training samples.
Why does the model tend to predict some classes MUCH more often than other classes? How can this be fixed?
The code
Running this takes a while.
#!/usr/bin/env python
from __future__ import print_function
from keras.datasets import cifar100
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from sklearn.model_selection import train_test_split
import numpy as np
batch_size = 32
nb_classes = 100
nb_epoch = 50
data_augmentation = True
# input image dimensions
img_rows, img_cols = 32, 32
# The CIFAR100 images are RGB.
img_channels = 3
# The data, shuffled and split between train and test sets:
(X, y), (X_test, y_test) = cifar100.load_data()
X_train, X_val, y_train, y_val = train_test_split(X, y,
                                                  test_size=0.20,
                                                  random_state=42)
# Shuffle training data
perm = np.arange(len(X_train))
np.random.shuffle(perm)
X_train = X_train[perm]
y_train = y_train[perm]
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_val.shape[0], 'validation samples')
print(X_test.shape[0], 'test samples')
# Convert class vectors to binary class matrices.
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
Y_val = np_utils.to_categorical(y_val, nb_classes)
model = Sequential()
model.add(Convolution2D(32, 3, 3, border_mode='same',
                        input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(1024))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
X_train = X_train.astype('float32')
X_val = X_val.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_val /= 255
X_test /= 255
if not data_augmentation:
    print('Not using data augmentation.')
    model.fit(X_train, Y_train,
              batch_size=batch_size,
              nb_epoch=nb_epoch,
              validation_data=(X_val, Y_val),
              shuffle=True)
else:
    print('Using real-time data augmentation.')
    # This will do preprocessing and realtime data augmentation:
    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images horizontally
        vertical_flip=False)  # do not flip images vertically

    # Compute quantities required for featurewise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(X_train)

    # Fit the model on the batches generated by datagen.flow().
    model.fit_generator(datagen.flow(X_train, Y_train,
                                     batch_size=batch_size),
                        samples_per_epoch=X_train.shape[0],
                        nb_epoch=nb_epoch,
                        validation_data=(X_val, Y_val))
model.save('cifar100.h5')
Visualization code
#!/usr/bin/env python
"""Analyze a cifar100 keras model."""
from keras.models import load_model
from keras.datasets import cifar100
from sklearn.model_selection import train_test_split
import numpy as np
import json
import io
import matplotlib.pyplot as plt
try:
    to_unicode = unicode
except NameError:
    to_unicode = str
n_classes = 100
def plot_cm(cm, zero_diagonal=False):
    """Plot a confusion matrix."""
    n = len(cm)
    size = int(n / 4.)
    fig = plt.figure(figsize=(size, size), dpi=80)
    plt.clf()
    ax = fig.add_subplot(111)
    ax.set_aspect(1)
    res = ax.imshow(np.array(cm), cmap=plt.cm.viridis,
                    interpolation='nearest')
    width, height = cm.shape
    fig.colorbar(res)
    plt.savefig('confusion_matrix.png', format='png')
# Load model
model = load_model('cifar100.h5')
# Load validation data
(X, y), (X_test, y_test) = cifar100.load_data()
X_train, X_val, y_train, y_val = train_test_split(X, y,
                                                  test_size=0.20,
                                                  random_state=42)
# Calculate confusion matrix
y_val_i = y_val.flatten()
y_val_pred = model.predict(X_val)
y_val_pred_i = y_val_pred.argmax(1)
cm = np.zeros((n_classes, n_classes), dtype=np.int)
for i, j in zip(y_val_i, y_val_pred_i):
    cm[i][j] += 1
acc = sum([cm[i][i] for i in range(100)]) / float(cm.sum())
print("Validation accuracy: %0.4f" % acc)
# Create plot
plot_cm(cm)
# Serialize confusion matrix
with io.open('cm.json', 'w', encoding='utf8') as outfile:
    str_ = json.dumps(cm.tolist(),
                      indent=4, sort_keys=True,
                      separators=(',', ':'), ensure_ascii=False)
    outfile.write(to_unicode(str_))
Red herrings
tanh
I've replaced tanh with relu. The history CSV looks OK, but the visualization has the same problem:
Please also note that the validation accuracy here is only 3.44%.
Dropout + tanh + border mode
Removing dropout, replacing tanh with relu, and setting border mode to same everywhere: history CSV.
The visualization code still gives a much lower accuracy (8.50% this time) than the Keras training code.
Q & A
The following is a summary of the comments:
The data is evenly distributed over the classes, so there is no "over-training" of those two classes.
Data augmentation is used, but without data augmentation the problem persists.
The visualization is not the problem.
If you get good accuracy during training and validation, but not when testing, make sure you do exactly the same preprocessing on your dataset in both cases.
Here is what you have at training time:
X_train /= 255
X_val /= 255
X_test /= 255
But there is no such code when predicting for your confusion matrix. Adding this to the testing code:
X_val /= 255.
Gives the following nice looking confusion matrix:
I don't have a good feeling about this part of the code:
model.add(Dense(1024))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
The rest of the model is full of relus, but here there is a tanh.
tanh sometimes vanishes or explodes (it saturates at -1 and 1), which might explain why those two classes are predicted so often.
The Keras CIFAR-10 example basically uses the same architecture (the dense-layer sizes might differ), but it also uses a relu there (no tanh at all). The same goes for this external Keras-based CIFAR-100 code.
One important part of the problem was that my ~/.keras/keras.json was
{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
Hence I had to change image_dim_ordering to tf. This gives an accuracy of 12.73%. Obviously, there is still a problem, as the validation history gave 45.1% accuracy.
I don't see you doing mean-centering, even in datagen. I suspect this is the main cause. To do mean centering with ImageDataGenerator, set featurewise_center=True. Another way is to subtract the ImageNet mean from each RGB pixel; the mean vector to be subtracted is [103.939, 116.779, 123.68].
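A minimal sketch of the first option (assuming X_train is the scaled float32 training array from the question); the same centering would then also need to be applied to the validation/test images, e.g. via datagen.standardize:
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    featurewise_center=True,   # subtract the dataset mean from each input
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True)
datagen.fit(X_train)           # computes the mean used for centering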
Make all activations relus, unless you have a specific reason to have a single tanh.
Remove the two dropouts of 0.25 and see what happens. If you want to apply dropout to a convolution layer, it is better to use SpatialDropout2D. It has somehow been removed from the Keras online documentation, but you can find it in the source.
You have two conv layers with border_mode='same' and two with 'valid'. There is nothing wrong with this, but it would be simpler to keep all conv layers at 'same' and control the size based only on the max-poolings.
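A rough sketch of those last two points (not tested, just illustrating the shape of the change): border_mode='same' on every conv layer, and SpatialDropout2D after the pooling instead of plain Dropout.
from keras.layers import SpatialDropout2D

model.add(Convolution2D(32, 3, 3, border_mode='same', input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(SpatialDropout2D(0.25))   # drops whole feature maps rather than individual units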