Grid search in multi-class classification problems using neural networks

I'm trying to run a grid search for a multi-class problem with a neural network.
I can't get the optimal parameters; the kernel just keeps compiling.
Is there a problem with my code? Please help.
import keras
from keras.models import Sequential
from keras.layers import Dense
# defining the baseline model:
def neural(output_dim=10, init_mode='glorot_uniform'):
    model = Sequential()
    model.add(Dense(output_dim=output_dim,
                    input_dim=2,
                    activation='relu',
                    kernel_initializer=init_mode))
    model.add(Dense(output_dim=output_dim,
                    activation='relu',
                    kernel_initializer=init_mode))
    model.add(Dense(output_dim=3, activation='softmax'))
    # Compile model
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
    return model
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.model_selection import GridSearchCV
estimator = KerasClassifier(build_fn=neural,
                            epochs=5,
                            batch_size=5,
                            verbose=0)
# define the grid search parameters
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
init_mode = ['uniform', 'lecun_uniform', 'normal', 'zero',
             'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']
output_dim = [10, 15, 20, 25, 30, 40]
param_grid = dict(batch_size=batch_size,
                  epochs=epochs,
                  output_dim=output_dim,
                  init_mode=init_mode)
grid = GridSearchCV(estimator=estimator,
                    scoring='accuracy',
                    param_grid=param_grid,
                    n_jobs=-1, cv=5)
grid_result = grid.fit(X_train, Y_train)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_,
                             grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

There's no error in your code.
Your current param grid has 864 different combinations of parameters possible.
(6 values in 'batch_size' × 3 values in 'epochs' × 8 in 'init_mode' × 6 in 'output_dim') = 864
GridSearchCV iterates over all of those combinations, and your estimator is cloned once for each of them. That is repeated 5 times because you have set cv=5.
So your model will be cloned (compiled, with its parameters set to the current combination) a total of 864 × 5 = 4320 times.
That is why the output keeps showing the model being compiled so many times.
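The arithmetic above can be checked with a few lines of plain Python. This is only a sketch: it copies the grid from the question and counts the combinations with the standard library rather than calling scikit-learn, and the names n_combinations and n_fits are made up for illustration.

```python
import math

# Parameter grid copied from the question.
param_grid = {
    "batch_size": [10, 20, 40, 60, 80, 100],
    "epochs": [10, 50, 100],
    "init_mode": ["uniform", "lecun_uniform", "normal", "zero",
                  "glorot_normal", "glorot_uniform", "he_normal", "he_uniform"],
    "output_dim": [10, 15, 20, 25, 30, 40],
}
cv = 5  # number of cross-validation folds used in the question

# Number of distinct parameter combinations: product of the list lengths.
n_combinations = math.prod(len(values) for values in param_grid.values())

# Each combination is fitted once per fold.
n_fits = n_combinations * cv

print(n_combinations, n_fits)
```

Shrinking any one of the lists reduces the total multiplicatively, so trimming the grid is usually the quickest way to make the search finish.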
To check whether GridSearchCV is actually making progress, use its verbose parameter:
grid = GridSearchCV(estimator=estimator,
                    scoring='accuracy',
                    param_grid=param_grid,
                    n_jobs=1, cv=5, verbose=3)
This prints the parameter combination currently being tried, the CV fold, the time taken to fit it, the current accuracy, and so on.

Related

NN with Keras predicts classes as dtype=float32 as opposed to the true class values of 1, 2, 3. Why?

I am implementing a simple NN on the wine data set. The NN works well and produces a prediction score. However, when I explore the actual predicted values on the test set, I get an array of dtype=float32 values rather than class labels.
The classes are labelled 1, 2, 3.
I have 13 attributes and 178 observations (a small data set).
Below is the implementation and the output I get:
df.head()
X = df.iloc[:, 1:13]  # .ix has been removed from pandas; select columns 1..12 by position
y= np.ravel(df.Type)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
scale the data:
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
define the NN
model = Sequential()
model.add(Dense(13, activation='relu', input_shape=(12,)))
model.add(Dense(4, activation='softmax'))
fit the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train1,epochs=20, batch_size=1, verbose=1)
Now this is where I store my predictions into y_pred and get the final score:
`y_pred = model.predict(X_test)`
`score = model.evaluate(X_test, y_test1,verbose=1)`
`59/59 [==============================] - 0s 2ms/step
[0.1106848283591917, 0.94915255247536356]`
When I explore y_pred I see the following:
`y_pred[:5]`
`array([[ 3.86571424e-04, 9.97601926e-01, 1.96467945e-03,
4.67598657e-05],
[ 2.67244829e-03, 9.87006545e-01, 7.04612210e-03,
3.27492505e-03],
[ 9.50196641e-04, 1.42343721e-04, 4.57215495e-02,
9.53185916e-01],
[ 9.03929677e-03, 9.63497698e-01, 2.62350030e-02,
1.22799736e-03],
[ 1.39460826e-05, 3.24015366e-03, 9.96408522e-01,
3.37353966e-04]], dtype=float32)`
I'm not sure why I don't see the actual predicted classes 1, 2, 3.
When I convert the array to int I just get an array of zeros, because all the values are below 1.
Really appreciate your help!!
You are seeing the predicted probabilities for each class. To convert them to a class, take the index of the largest probability in each row:
import numpy as np
y_pred_class = np.argmax(y_pred,axis=1)
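As a small check, the same call can be applied to the five rows printed in the question. This sketch assumes that y_train1 was one-hot encoded directly from the labels 1, 2, 3 (which would explain the four output units, with column 0 unused), so that the column index is the class label. If the labels were shifted before encoding, the same shift has to be undone here.

```python
import numpy as np

# The first five prediction rows, copied from the question.
y_pred = np.array([
    [3.86571424e-04, 9.97601926e-01, 1.96467945e-03, 4.67598657e-05],
    [2.67244829e-03, 9.87006545e-01, 7.04612210e-03, 3.27492505e-03],
    [9.50196641e-04, 1.42343721e-04, 4.57215495e-02, 9.53185916e-01],
    [9.03929677e-03, 9.63497698e-01, 2.62350030e-02, 1.22799736e-03],
    [1.39460826e-05, 3.24015366e-03, 9.96408522e-01, 3.37353966e-04],
], dtype=np.float32)

# Index of the most probable class in each row.
y_pred_class = np.argmax(y_pred, axis=1)
print(y_pred_class)  # [1 1 3 1 2]
```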

CNN binary classifier performing poorly

I'm building a neural network to classify images that have an email address written on them. The positive folder contains images with email addresses written on top of the picture, in different fonts, colors, sizes and positions.
The negative folder contains images without text on top, and also images with text on top that does not have the format of an email address (no @ sign).
The pictures are 300 x 225 x 3 (RGB).
It should be a simple classification task (the NN should be able to pick up that when there's an @, the image has an email address), but my model isn't performing well. It's stuck at 83% test accuracy after 25 epochs. It also takes 10 hours to train, which sounds excessive to me.
Can you help me to analyse the structure of my CNN and suggest improvements (or help me avoid pitfalls)?
The model I wrote is this:
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
from keras.preprocessing.image import ImageDataGenerator
input_size = (64, 48)
# Initialising the CNN
classifier = Sequential()
# Step 1 - Convolution
classifier.add(Conv2D(32, (3, 3), input_shape = (*input_size, 3), activation = 'relu'))
# Step 2 - Pooling
classifier.add(MaxPooling2D(pool_size = (2, 2)))
# Adding a second convolutional layer
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
# Step 3 - Flattening
classifier.add(Flatten())
# Step 4 - Full connection
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))
# Compiling the CNN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Part 2 - Fitting the CNN to the images
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale = 1./255)
training_set = train_datagen.flow_from_directory('./training_Set',
                                                 target_size = input_size,
                                                 batch_size = 32,
                                                 class_mode = 'binary')
test_set = test_datagen.flow_from_directory('./test_set',
                                            target_size = input_size,
                                            batch_size = 32,
                                            class_mode = 'binary')
classifier.fit_generator(training_set,
                         steps_per_epoch = 8000,
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 2000)

Python Neural Network Prediction

I am working on a project with two input numbers:
1- The first number has a maximum value of 1500 and a minimum of 200.
2- The second number has a maximum value of 200 and a minimum of 10.
3- I want to create a neural network, add samples, and train it to predict a third number, for example:
900,67 equals 87
870,99 equals 100
1000,50 equals ?
What type of neural network would work for my project?
In this example, the network takes two input values and returns one.
import numpy as np
import keras
from keras import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt
MIN = np.random.rand(100)*500
MAX = np.random.rand(100)*500 + 500
x = np.concatenate((MIN.reshape(-1,1),MAX.reshape(-1,1)),axis = 1)
y = np.sin(x[:,0])*500 + np.cos(x[:,1])*500
x_max = x.max()
y_max = y.max()
x = x/x_max
y = (y-y.min())/(y_max-y.min())
model = Sequential()
model.add(Dense(200,input_dim = 2, activation = 'relu'))
model.add(Dense(100, activation = 'sigmoid'))
model.add(Dense(100, activation = 'sigmoid'))
model.add(Dense(1,activation = 'relu'))
opt = keras.optimizers.Adam(learning_rate=0.001, beta_1=0.6, beta_2=0.97, amsgrad=False)
model.compile(loss='mean_squared_error',optimizer=opt , metrics=['mse'])
model.fit(x, y, epochs=10000, batch_size=2)
y_hat = model.predict(x)
plt.figure(figsize=(10,5))
plt.plot(y)
plt.plot(y_hat.reshape(-1))
This is the result:
You will need pre- and post-processing: normalize the inputs before they go into the network and rescale its outputs afterwards. This is the input:
Usage example:
In [10]: model.predict(np.array([0.234,0.567]).reshape(-1,2))
Out[10]: array([[0.61975896]], dtype=float32)
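The pre- and post-processing mentioned above could look roughly like the sketch below. It uses min-max scaling with the input ranges given in the question; the answer's code divides the inputs by their maximum instead, so treat this as one possible variant. The output range (Y_MIN, Y_MAX), the helper names, and the placeholder prediction of 0.5 are assumptions made up for illustration, since the question does not state the range of the target value.

```python
# Input ranges taken from the question.
X1_MIN, X1_MAX = 200.0, 1500.0
X2_MIN, X2_MAX = 10.0, 200.0
# Hypothetical output range; the question does not state it.
Y_MIN, Y_MAX = 0.0, 200.0


def normalize(value, lo, hi):
    """Map a value from [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)


def denormalize(scaled, lo, hi):
    """Map a value from [0, 1] back to [lo, hi]."""
    return scaled * (hi - lo) + lo


def prepare_input(x1, x2):
    """Scale both raw inputs to [0, 1] before feeding the network."""
    return [normalize(x1, X1_MIN, X1_MAX), normalize(x2, X2_MIN, X2_MAX)]


features = prepare_input(1000, 50)

# A trained model would be called on `features` here; 0.5 is only a
# placeholder standing in for the network's scaled output.
scaled_prediction = 0.5
prediction = denormalize(scaled_prediction, Y_MIN, Y_MAX)
print(features, prediction)
```

The same constants must be used at training time and at prediction time; otherwise the network sees inputs on a different scale from the one it was trained on.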

Why does my CIFAR 100 CNN model mainly predict two classes?

I am currently trying to get a decent score (> 40% accuracy) with Keras on CIFAR 100. However, I'm experiencing a weird behaviour of a CNN model: It tends to predict some classes (2 - 5) much more often than others:
The pixel at position (i, j) contains the count how many elements of the validation set from class i were predicted to be of class j. Thus the diagonal contains the correct classifications, everything else is an error. The two vertical bars indicate that the model often predicts those classes, although it is not the case.
CIFAR 100 is perfectly balanced: All 100 classes have 500 training samples.
Why does the model tend to predict some classes MUCH more often than other classes? How can this be fixed?
The code
Running this takes a while.
#!/usr/bin/env python
from __future__ import print_function
from keras.datasets import cifar100
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from sklearn.model_selection import train_test_split
import numpy as np
batch_size = 32
nb_classes = 100
nb_epoch = 50
data_augmentation = True
# input image dimensions
img_rows, img_cols = 32, 32
# The CIFAR-100 images are RGB.
img_channels = 3
# The data, shuffled and split between train and test sets:
(X, y), (X_test, y_test) = cifar100.load_data()
X_train, X_val, y_train, y_val = train_test_split(X, y,
test_size=0.20,
random_state=42)
# Shuffle training data
perm = np.arange(len(X_train))
np.random.shuffle(perm)
X_train = X_train[perm]
y_train = y_train[perm]
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_val.shape[0], 'validation samples')
print(X_test.shape[0], 'test samples')
# Convert class vectors to binary class matrices.
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
Y_val = np_utils.to_categorical(y_val, nb_classes)
model = Sequential()
model.add(Convolution2D(32, 3, 3, border_mode='same',
input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(1024))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
X_train = X_train.astype('float32')
X_val = X_val.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_val /= 255
X_test /= 255
if not data_augmentation:
    print('Not using data augmentation.')
    model.fit(X_train, Y_train,
              batch_size=batch_size,
              nb_epoch=nb_epoch,
              # use the one-hot labels (Y_val), matching categorical_crossentropy
              validation_data=(X_val, Y_val),
              shuffle=True)
else:
    print('Using real-time data augmentation.')
    # This will do preprocessing and realtime data augmentation:
    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images horizontally
        vertical_flip=False)  # do not flip images vertically
    # Compute quantities required for featurewise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(X_train)
    # Fit the model on the batches generated by datagen.flow().
    model.fit_generator(datagen.flow(X_train, Y_train,
                                     batch_size=batch_size),
                        samples_per_epoch=X_train.shape[0],
                        nb_epoch=nb_epoch,
                        validation_data=(X_val, Y_val))
model.save('cifar100.h5')
Visualization code
#!/usr/bin/env python
"""Analyze a cifar100 keras model."""
from keras.models import load_model
from keras.datasets import cifar100
from sklearn.model_selection import train_test_split
import numpy as np
import json
import io
import matplotlib.pyplot as plt
try:
    to_unicode = unicode
except NameError:
    to_unicode = str
n_classes = 100
def plot_cm(cm, zero_diagonal=False):
    """Plot a confusion matrix."""
    n = len(cm)
    size = int(n / 4.)
    fig = plt.figure(figsize=(size, size), dpi=80, )
    plt.clf()
    ax = fig.add_subplot(111)
    ax.set_aspect(1)
    res = ax.imshow(np.array(cm), cmap=plt.cm.viridis,
                    interpolation='nearest')
    width, height = cm.shape
    fig.colorbar(res)
    plt.savefig('confusion_matrix.png', format='png')
# Load model
model = load_model('cifar100.h5')
# Load validation data
(X, y), (X_test, y_test) = cifar100.load_data()
X_train, X_val, y_train, y_val = train_test_split(X, y,
test_size=0.20,
random_state=42)
# Calculate confusion matrix
y_val_i = y_val.flatten()
y_val_pred = model.predict(X_val)
y_val_pred_i = y_val_pred.argmax(1)
cm = np.zeros((n_classes, n_classes), dtype=int)  # np.int was removed in recent NumPy versions
for i, j in zip(y_val_i, y_val_pred_i):
    cm[i][j] += 1
acc = sum([cm[i][i] for i in range(100)]) / float(cm.sum())
print("Validation accuracy: %0.4f" % acc)
# Create plot
plot_cm(cm)
# Serialize confusion matrix
with io.open('cm.json', 'w', encoding='utf8') as outfile:
    str_ = json.dumps(cm.tolist(),
                      indent=4, sort_keys=True,
                      separators=(',', ':'), ensure_ascii=False)
    outfile.write(to_unicode(str_))
Red herrings
tanh
I've replaced tanh by relu. The history csv looks ok, but the visualization has the same problem:
Please also note that the validation accuracy here is only 3.44%.
Dropout + tanh + border mode
Removing dropout, replacing tanh by relu, setting border mode to same everywhere: history csv
The visualization code still gives a much lower accuracy (8.50% this time) than the keras training code.
Q & A
The following is a summary of the comments:
The data is evenly distributed over the classes. So there is no "over training" of those two classes.
Data augmentation is used, but without data augmentation the problem persists.
The visualization is not the problem.
If you get good accuracy during training and validation, but not when testing, make sure you do exactly the same preprocessing on your dataset in both cases.
Here, when training, you have:
X_train /= 255
X_val /= 255
X_test /= 255
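But there is no such code when predicting for your confusion matrix. Adding this to the visualization code (with a cast to float first, because load_data returns uint8 arrays and an in-place division would fail on them):
X_val = X_val.astype('float32') / 255.
gives the following nice-looking confusion matrix: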
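One way to keep training and evaluation consistent is to put the preprocessing into a single function and call it from both scripts. The following is a minimal sketch of that idea; the function name preprocess and the random stand-in data are assumptions made up for illustration, not part of the original code.

```python
import numpy as np


def preprocess(images):
    """Convert uint8 images to float32 in [0, 1], as the training script does."""
    return images.astype('float32') / 255.0


# Stand-in for a batch of CIFAR images (uint8, values 0..255).
# Random data is used here only so that the sketch is self-contained.
rng = np.random.default_rng(0)
X_val_raw = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# The same call would be used before model.fit and before model.predict.
X_val = preprocess(X_val_raw)
print(X_val.dtype, X_val.min(), X_val.max())
```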
I don't have a good feeling about this part of the code:
model.add(Dense(1024))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
The rest of the model is full of ReLUs, but here there is a tanh.
tanh saturates at -1 and 1, where its gradient vanishes, which might lead to the over-importance of those two classes.
The Keras CIFAR-10 example uses basically the same architecture (dense-layer sizes might differ), but it also uses a ReLU there (no tanh at all). The same goes for this external Keras-based CIFAR-100 code.
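The saturation argument can be made concrete by comparing the derivatives of the two activations. This is a small illustration using only the standard library; the helper names tanh_grad and relu_grad are made up here, and it shows the general behaviour of the functions, not a diagnosis of this particular model.

```python
import math


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


# For growing pre-activations the tanh gradient shrinks towards zero,
# while the ReLU gradient stays at 1 for any positive input.
for x in (0.0, 1.0, 3.0, 5.0):
    print(x, tanh_grad(x), relu_grad(x))
```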
One important part of the problem was that my ~/.keras/keras.json was
{
"image_dim_ordering": "th",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "tensorflow"
}
Hence I had to change image_dim_ordering to tf. This leads to an accuracy of 12.73%. Obviously, there is still a problem, as the validation history gave 45.1% accuracy.
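For context, the two settings differ in where the channel axis sits: "th" (Theano style) expects images as (channels, rows, cols), while "tf" (TensorFlow style) expects (rows, cols, channels). The sketch below shows the difference on a made-up array with NumPy; it does not touch Keras itself.

```python
import numpy as np

# One fake RGB image in "tf" ordering: (rows, cols, channels).
img_tf = np.arange(32 * 32 * 3).reshape(32, 32, 3)

# The same image in "th" ordering: (channels, rows, cols).
img_th = np.transpose(img_tf, (2, 0, 1))

print(img_tf.shape, img_th.shape)  # (32, 32, 3) (3, 32, 32)
```

If the configured ordering does not match the data layout, the layers interpret the axes differently from what the data actually contains, which is consistent with the mismatch between training and visualization accuracy described above.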
I don't see you doing mean-centering, even in datagen. I suspect this is the main cause. To do mean-centering with ImageDataGenerator, set featurewise_center=True. Another way is to subtract the ImageNet mean from each RGB pixel. The mean vector to subtract is [103.939, 116.779, 123.68].
Make all activations ReLUs, unless you have a specific reason to keep a single tanh.
Remove the two dropouts of 0.25 and see what happens. If you want to apply dropout to convolutional layers, it is better to use SpatialDropout2D. It has somehow been removed from the Keras online documentation, but you can find it in the source.
You have two conv layers with border mode same and two with valid. There is nothing wrong with this, but it would be simpler to keep all conv layers at same and control the size with max-pooling alone.
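The mean-centering suggestion can be sketched in plain NumPy as follows. This illustrates the idea only and is not Keras's own implementation: the per-channel mean is computed on the training set and then subtracted from both the training and the validation data. The random arrays stand in for the real images and are assumptions for illustration.

```python
import numpy as np

# Random stand-ins for already rescaled images in (samples, rows, cols, channels) layout.
rng = np.random.default_rng(0)
X_train = rng.random((8, 32, 32, 3), dtype=np.float32)
X_val = rng.random((4, 32, 32, 3), dtype=np.float32)

# Per-channel mean, computed on the training data only.
channel_mean = X_train.mean(axis=(0, 1, 2))  # shape (3,)

# Subtract the same training mean from every split.
X_train_centered = X_train - channel_mean
X_val_centered = X_val - channel_mean

print(channel_mean.shape, X_train_centered.mean(axis=(0, 1, 2)))
```

The important detail is that the validation data is centered with the training mean, so that both splits go through exactly the same transformation.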

How to build simple neural network on keras (not image recognition)

I am new to Keras and I am trying to build my own neural network.
The task:
I need to write a system that makes decisions for a character who may meet one or more enemies. The system knows:
The character's health percentage
Whether the character has a pistol
The number of enemies
The answer must be one of the following:
Attack
Run
Hide (for a surprise attack)
Do nothing
To train it, I made a table of "lessons":
https://i.stack.imgur.com/lD0WX.png
So here is my code:
# Create first network with Keras
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# split into input (X) and output (Y) variables
X = numpy.array([[0.5,1,1], [0.9,1,2], [0.8,0,1], [0.3,1,1], [0.6,1,2], [0.4,0,1], [0.9,1,7], [0.5,1,4], [0.1,0,1], [0.6,1,0], [1,0,0]])
Y = numpy.array([[1],[1],[1],[2],[2],[2],[3],[3],[3],[4],[4]])
# create model
model = Sequential()
model.add(Dense(3, input_dim=3, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
# Compile model
sgd = SGD(lr=0.001)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
# Fit the model
model.fit(X, Y, nb_epoch=150)
# calculate predictions
predictions = model.predict(X)
# round predictions
rounded = [round(x) for x in predictions]
print(rounded)
Here are the predictions I get:
[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
The accuracy at each epoch is 0.2727 and the loss is decreasing.
That's not right.
I tried dividing the learning rate by 10 and changing the activations and optimizers. I even entered the data by hand.
Can anyone tell me how to solve my simple problem? Thanks.
There are several problems in your code.
The number of data entries is very small compared to the NN model.
Y is represented as class numbers and not as class vectors. A regression model can be learnt on this, but it's a poor design choice.
The output of the sigmoid function is always between 0 and 1, so with it as the output activation your model can only produce values between 0 and 1.
Below is a somewhat improved version of the code:
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# split into input (X) and output (Y) variables
X = numpy.array([[0.5,1,1], [0.9,1,2], [0.8,0,1], [0.3,1,1], [0.6,1,2], [0.4,0,1], [0.9,1,7], [0.5,1,4], [0.1,0,1], [0.6,1,0], [1,0,0]])
y = numpy.array([[1],[1],[1],[2],[2],[2],[3],[3],[3],[0],[0]])
from keras.utils import np_utils
Y = np_utils.to_categorical(y, 4)
# print Y
# create model
model = Sequential()
model.add(Dense(3, input_dim=3, activation='relu'))
model.add(Dense(4, activation='softmax'))
# Compile model
# sgd = SGD(lr=0.1)
# model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, nb_epoch=700)
# calculate predictions
predictions = model.predict(X)
predictions_class = predictions.argmax(axis=-1)
print(predictions_class)
Note that I have used the softmax activation because the classes are mutually exclusive, and that class 4 is relabelled as 0 so that the labels run from 0 to 3.
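To make the class-vector encoding concrete, here is a NumPy-only sketch of what one-hot encoding the labels produces and how argmax maps the vectors back to class numbers. It mimics the effect of the to_categorical call above using np.eye and is meant as an illustration, not as the Keras implementation.

```python
import numpy as np

# Labels from the answer's code (class 4 relabelled as 0).
y = np.array([[1], [1], [1], [2], [2], [2], [3], [3], [3], [0], [0]])
n_classes = 4

# Row k of the identity matrix is the one-hot vector for class k.
Y = np.eye(n_classes, dtype=np.float32)[y.ravel()]
print(Y.shape)  # (11, 4)
print(Y[0])     # [0. 1. 0. 0.]

# argmax turns one-hot vectors (or softmax outputs) back into class numbers.
recovered = Y.argmax(axis=-1)
print(recovered)
```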