This is a regression problem.
The training data has shape (417, 5) and the test data has shape (105, 5). I scale both using the following code:
from sklearn import preprocessing
import sklearn
from sklearn.preprocessing import MinMaxScaler
#Scale train
scaler = preprocessing.MinMaxScaler()
train_df = scaler.fit_transform(train_df)
train_df = pd.DataFrame(train_df)
#Scale test
test_df = scaler.fit_transform(test_df)
test_df = pd.DataFrame(test_df)
The first four rows of the training data after scaling look like this:
Column '4' is the dependent variable and the rest are independent variables.
After training a deep neural network, I get predictions in scaled form. I try to unscale them using the following code:
scaler.inverse_transform(y_pred_dnn)
where the predictions are stored in y_pred_dnn.
But I get the following error:
ValueError: non-broadcastable output operand with shape (105,1) doesn't match the broadcast shape (105,5)
How do I debug the problem?
Thanks
You can solve this by separating out y before scaling. You don't need to scale y for prediction. So try:
y_train, y_test = train_df.iloc[:, 4], test_df.iloc[:, 4]
X_train, X_test = train_df.iloc[:, 0:4], test_df.iloc[:, 0:4]
After this, scale only the X part; then you won't need any inverse scaling.
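If the target does need to be scaled (some networks train more easily that way), one possible variant is to fit a second scaler on y alone, so that inverse_transform sees the same single-column shape it was fitted on. The sketch below uses random stand-in data with the shapes from the question; the names x_scaler, y_scaler and y_pred_scaled are illustrative, and the "predictions" are just the scaled test targets, not real model output.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Random stand-in data with the shapes from the question (assumption).
rng = np.random.RandomState(0)
train = rng.rand(417, 5) * 100.0
test = rng.rand(105, 5) * 100.0

# Columns 0-3 are the features, column 4 is the target (kept 2-D for the scaler).
X_train, y_train = train[:, :4], train[:, 4:5]
X_test, y_test = test[:, :4], test[:, 4:5]

# Fit each scaler on the training data only, then reuse it for the test data.
x_scaler = MinMaxScaler().fit(X_train)
y_scaler = MinMaxScaler().fit(y_train)

X_train_s = x_scaler.transform(X_train)
X_test_s = x_scaler.transform(X_test)
y_train_s = y_scaler.transform(y_train)

# Stand-in for model output in scaled space, shape (105, 1).
y_pred_scaled = y_scaler.transform(y_test)

# The y scaler was fitted on one column, so a (105, 1) array inverts cleanly.
y_pred = y_scaler.inverse_transform(y_pred_scaled)
print(y_pred.shape)
```

Fitting on the training split and only calling transform on the test split also keeps test statistics out of the scaling.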
I am implementing a simple NN on the wine data set. The NN works well and produces the prediction score. However, when I try to explore the actual predicted values on the test data set, I receive an array of float32 values, as opposed to class labels.
The classes are labelled as 1, 2, 3
I have 13 attributes and 178 observations (small data set)
Below is the code for the implementation and the outcome I get:
df.head()
X = df.iloc[:, 1:13]
y= np.ravel(df.Type)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
scale the data:
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
define the NN
model = Sequential()
model.add(Dense(13, activation='relu', input_shape=(12,)))
model.add(Dense(4, activation='softmax'))
fit the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train1,epochs=20, batch_size=1, verbose=1)
Now this is where I store my predictions into y_pred and get the final score:
`y_pred = model.predict(X_test)`
`score = model.evaluate(X_test, y_test1,verbose=1)`
`59/59 [==============================] - 0s 2ms/step
[0.1106848283591917, 0.94915255247536356]`
When I explore y_pred, I see the following:
`y_pred[:5]`
`array([[ 3.86571424e-04, 9.97601926e-01, 1.96467945e-03,
4.67598657e-05],
[ 2.67244829e-03, 9.87006545e-01, 7.04612210e-03,
3.27492505e-03],
[ 9.50196641e-04, 1.42343721e-04, 4.57215495e-02,
9.53185916e-01],
[ 9.03929677e-03, 9.63497698e-01, 2.62350030e-02,
1.22799736e-03],
[ 1.39460826e-05, 3.24015366e-03, 9.96408522e-01,
3.37353966e-04]], dtype=float32)`
Not sure why I do not see the actual predicted classes as 1,2,3?
After trying to convert into int I just get an array of zeros, as all values are so small.
Really appreciate your help!!
You are seeing the probabilities for each class. To convert the probabilities to a class label, take the index of the largest probability in each row:
import numpy as np
y_pred_class = np.argmax(y_pred,axis=1)
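As a quick check, here is the same call applied to the five rows printed in the question. Because the output layer has four units and the labels are 1, 2, 3, the column index should coincide with the class label, assuming the one-hot targets were built so that label k maps to column k (the question does not show that step).

```python
import numpy as np

# The five prediction rows shown in the question.
y_pred = np.array([
    [3.86571424e-04, 9.97601926e-01, 1.96467945e-03, 4.67598657e-05],
    [2.67244829e-03, 9.87006545e-01, 7.04612210e-03, 3.27492505e-03],
    [9.50196641e-04, 1.42343721e-04, 4.57215495e-02, 9.53185916e-01],
    [9.03929677e-03, 9.63497698e-01, 2.62350030e-02, 1.22799736e-03],
    [1.39460826e-05, 3.24015366e-03, 9.96408522e-01, 3.37353966e-04],
], dtype=np.float32)

# Index of the largest probability in each row.
y_pred_class = np.argmax(y_pred, axis=1)
print(y_pred_class)  # [1 1 3 1 2]
```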
I am currently trying to get a decent score (> 40% accuracy) with Keras on CIFAR 100. However, I'm experiencing a weird behaviour of a CNN model: It tends to predict some classes (2 - 5) much more often than others:
The pixel at position (i, j) contains the count how many elements of the validation set from class i were predicted to be of class j. Thus the diagonal contains the correct classifications, everything else is an error. The two vertical bars indicate that the model often predicts those classes, although it is not the case.
CIFAR 100 is perfectly balanced: All 100 classes have 500 training samples.
Why does the model tend to predict some classes MUCH more often than other classes? How can this be fixed?
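To make the reading of the matrix concrete, here is a small sketch with three classes and made-up labels. The column sums show how often each class is predicted, which is what the vertical bars in the plot correspond to. The helper name confusion_matrix and the toy arrays are illustrative only.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Toy labels (made up): class 2 is predicted far too often.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 2, 2, 1, 2, 2])

cm = confusion_matrix(y_true, y_pred, n_classes=3)
predicted_counts = cm.sum(axis=0)         # how often each class is predicted
accuracy = np.trace(cm) / float(cm.sum())  # diagonal = correct predictions

print(cm)
print(predicted_counts)  # [1 1 4]
print(accuracy)
```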
The code
Running this takes a while.
#!/usr/bin/env python
from __future__ import print_function
from keras.datasets import cifar100
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from sklearn.model_selection import train_test_split
import numpy as np
batch_size = 32
nb_classes = 100
nb_epoch = 50
data_augmentation = True
# input image dimensions
img_rows, img_cols = 32, 32
# The CIFAR-100 images are RGB.
img_channels = 3
# The data, shuffled and split between train and test sets:
(X, y), (X_test, y_test) = cifar100.load_data()
X_train, X_val, y_train, y_val = train_test_split(X, y,
test_size=0.20,
random_state=42)
# Shuffle training data
perm = np.arange(len(X_train))
np.random.shuffle(perm)
X_train = X_train[perm]
y_train = y_train[perm]
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_val.shape[0], 'validation samples')
print(X_test.shape[0], 'test samples')
# Convert class vectors to binary class matrices.
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
Y_val = np_utils.to_categorical(y_val, nb_classes)
model = Sequential()
model.add(Convolution2D(32, 3, 3, border_mode='same',
input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(1024))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
X_train = X_train.astype('float32')
X_val = X_val.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_val /= 255
X_test /= 255
if not data_augmentation:
    print('Not using data augmentation.')
    model.fit(X_train, Y_train,
              batch_size=batch_size,
              nb_epoch=nb_epoch,
              validation_data=(X_val, Y_val),
              shuffle=True)
else:
    print('Using real-time data augmentation.')
    # This will do preprocessing and realtime data augmentation:
    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images
        vertical_flip=False)  # randomly flip images
    # Compute quantities required for featurewise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(X_train)
    # Fit the model on the batches generated by datagen.flow().
    model.fit_generator(datagen.flow(X_train, Y_train,
                                     batch_size=batch_size),
                        samples_per_epoch=X_train.shape[0],
                        nb_epoch=nb_epoch,
                        validation_data=(X_val, Y_val))
model.save('cifar100.h5')
Visualization code
#!/usr/bin/env python
"""Analyze a cifar100 keras model."""
from keras.models import load_model
from keras.datasets import cifar100
from sklearn.model_selection import train_test_split
import numpy as np
import json
import io
import matplotlib.pyplot as plt
try:
    to_unicode = unicode
except NameError:
    to_unicode = str
n_classes = 100
def plot_cm(cm, zero_diagonal=False):
    """Plot a confusion matrix."""
    n = len(cm)
    size = int(n / 4.)
    fig = plt.figure(figsize=(size, size), dpi=80, )
    plt.clf()
    ax = fig.add_subplot(111)
    ax.set_aspect(1)
    res = ax.imshow(np.array(cm), cmap=plt.cm.viridis,
                    interpolation='nearest')
    width, height = cm.shape
    fig.colorbar(res)
    plt.savefig('confusion_matrix.png', format='png')
# Load model
model = load_model('cifar100.h5')
# Load validation data
(X, y), (X_test, y_test) = cifar100.load_data()
X_train, X_val, y_train, y_val = train_test_split(X, y,
test_size=0.20,
random_state=42)
# Calculate confusion matrix
y_val_i = y_val.flatten()
y_val_pred = model.predict(X_val)
y_val_pred_i = y_val_pred.argmax(1)
cm = np.zeros((n_classes, n_classes), dtype=int)
for i, j in zip(y_val_i, y_val_pred_i):
    cm[i][j] += 1
acc = sum([cm[i][i] for i in range(100)]) / float(cm.sum())
print("Validation accuracy: %0.4f" % acc)
# Create plot
plot_cm(cm)
# Serialize confusion matrix
with io.open('cm.json', 'w', encoding='utf8') as outfile:
    str_ = json.dumps(cm.tolist(),
                      indent=4, sort_keys=True,
                      separators=(',', ':'), ensure_ascii=False)
    outfile.write(to_unicode(str_))
Red herrings
tanh
I've replaced tanh by relu. The history csv looks ok, but the visualization has the same problem:
Please also note that the validation accuracy here is only 3.44%.
Dropout + tanh + border mode
Removing dropout, replacing tanh by relu, setting border mode to same everywhere: history csv
The visualization code still gives a much lower accuracy (8.50% this time) than the keras training code.
Q & A
The following is a summary of the comments:
The data is evenly distributed over the classes. So there is no "over training" of those two classes.
Data augmentation is used, but without data augmentation the problem persists.
The visualization is not the problem.
If you get good accuracy during training and validation, but not when testing, make sure you do exactly the same preprocessing on your dataset in both cases.
Here you have when training:
X_train /= 255
X_val /= 255
X_test /= 255
But no such code when predicting for your confusion matrix. Adding to testing:
X_val /= 255.
Gives the following nice looking confusion matrix:
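One way to avoid this kind of mismatch is to put the preprocessing in a single function and call it from both the training script and the evaluation script. The following is a minimal sketch on random stand-in images; the function name preprocess is hypothetical and not part of the original code.

```python
import numpy as np

def preprocess(images):
    """Convert uint8 images to float32 in [0, 1], as the training script does."""
    return images.astype('float32') / 255.0

# Random stand-ins for the raw CIFAR arrays (assumption: uint8, 32x32 RGB).
rng = np.random.RandomState(0)
X_train_raw = rng.randint(0, 256, size=(8, 32, 32, 3)).astype('uint8')
X_val_raw = rng.randint(0, 256, size=(4, 32, 32, 3)).astype('uint8')

# Training and evaluation both go through the same function.
X_train = preprocess(X_train_raw)
X_val = preprocess(X_val_raw)

print(X_train.dtype, X_val.min(), X_val.max())
```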
I don't have a good feeling with this part of the code:
model.add(Dense(1024))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
The remaining model is full of relus, but here there is a tanh.
tanh saturates at -1 and 1, where its gradient vanishes, which might lead to your 2-class overimportance.
keras-example cifar 10 basically uses the same architecture (dense-layer sizes might be different), but also uses a relu there (no tanh at all). The same goes for this external keras-based cifar 100 code.
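The saturation argument can be illustrated numerically. This sketch compares the derivative of tanh, which is 1 - tanh(x)^2, with the derivative of relu at a few positive inputs; it only illustrates the general effect and says nothing about whether it is the cause in this particular model.

```python
import numpy as np

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - np.tanh(x) ** 2

def relu_grad(x):
    """Derivative of relu: 1 for positive inputs, 0 otherwise."""
    return (x > 0).astype(float)

x = np.array([0.0, 1.0, 3.0, 6.0])

# The tanh gradient shrinks towards zero as |x| grows,
# while the relu gradient stays at 1 for every positive input.
print(tanh_grad(x))
print(relu_grad(x))
```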
One important part of the problem was that my ~/.keras/keras.json was
{
"image_dim_ordering": "th",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "tensorflow"
}
Hence I had to change image_dim_ordering to tf. This leads to
and an accuracy of 12.73%. Obviously, there is still a problem as the validation history gave 45.1% accuracy.
I don't see you doing mean-centering, even in datagen. I suspect this is the main cause. To do mean centering using ImageDataGenerator, set featurewise_center=True. Another way is to subtract the ImageNet mean from each pixel. The mean vector to be subtracted is [103.939, 116.779, 123.68] (these values are in BGR channel order and assume pixel values in the 0-255 range).
Make all activations relus, unless you have a specific reason to have a single tanh.
Remove the two dropouts of 0.25 and see what happens. If you want to apply dropout to convolutional layers, it is better to use SpatialDropout2D. It is somehow removed from the Keras online documentation, but you can find it in the source.
You have two conv layers with same and two with valid. There is nothing wrong in this, but it would be simpler to keep all conv layers with same and control your size just based on max-poolings.
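As a rough illustration of the mean-centering suggestion, the sketch below computes a per-channel mean on the training images and subtracts the same mean from the validation images. It assumes channels-last arrays already scaled to [0, 1] and uses random stand-in data; it is not Keras's own featurewise_center implementation.

```python
import numpy as np

# Random stand-ins for images scaled to [0, 1], channels last (assumption).
rng = np.random.RandomState(0)
X_train = rng.rand(16, 32, 32, 3).astype('float32')
X_val = rng.rand(8, 32, 32, 3).astype('float32')

# Per-channel mean over samples, rows and columns: shape (1, 1, 1, 3).
channel_mean = X_train.mean(axis=(0, 1, 2), keepdims=True)

# Statistics come from the training set only and are reused for validation.
X_train_centered = X_train - channel_mean
X_val_centered = X_val - channel_mean

print(channel_mean.ravel())
print(X_train_centered.mean(axis=(0, 1, 2)))
```

Whatever centering is used during training has to be applied again, with the same mean, when predicting.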
I have a multiclass classification problem. Say I have a feature matrix:
A B C D
1 -1 1 -6
2 0.5 0 11
7 3.7 1 1
4 -50 1 0
And labels:
LABEL
0
1
2
0
2
I want to apply convolution kernels along each single feature row with Keras. Say nb_filter=2 and batch_size=3. So I expect the input shape for the convolution layer to be (3, 4) and the output shape to be (3, 3) (as it is applied to AB, BC, CD).
Here is what I tried with Keras (v1.2.1, Theano backend):
def CreateModel(input_dim, num_hidden_layers):
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Convolution1D, Flatten
    model = Sequential()
    model.add(Convolution1D(nb_filter=10, filter_length=1, input_shape=(1, input_dim), activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    model.summary()
    return model
def OneHotTransformation(y):
    from keras.utils import np_utils
    return np_utils.to_categorical(y)
X_train = X_train.values.reshape(X_train.shape[0], 1, X_train.shape[1])
X_test = X_test.values.reshape(X_test.shape[0], 1, X_test.shape[1])
y_train = OneHotTransformation(y_train)
clf = KerasClassifier(build_fn=CreateModel, input_dim=X_train.shape[1], num_hidden_layers=1, nb_epoch=10, batch_size=500)
clf.fit(X_train, y_train)
Shapes:
print X_train.shape
print X_test.shape
print y_train.shape
Output:
(45561, 44)
(11391, 44)
(45561L,)
When I try to run this code I get an exception:
ValueError: Error when checking model target: expected dense_1 to have 3 dimensions, but got array with shape (45561L, 3L)
I tried to reshape y_train:
y_train = y_train.reshape(y_train.shape[0], 1, y_train.shape[1])
This gives me exception:
ValueError: Error when checking model target: expected dense_1 to have 3 dimensions, but got array with shape (136683L, 2L)
1. Is this approach with Convolution1D correct to achieve my goal?
2. If the answer to #1 is yes, how can I fix my code?
I've already read numerous GitHub issues and some questions (1, 2) here, but they didn't really help.
Thanks.
UPDATE1:
Following Matias Valdenegro's comment, here are the shapes after reshaping 'X' and one-hot encoding 'y':
print X_train.shape
print X_test.shape
print y_train.shape
Output:
(45561L, 1L, 44L)
(11391L, 1L, 44L)
(45561L, 3L)
UPDATE2: Thanks again to Matias Valdenegro. The X reshaping is of course done after creating the model; that was a copy-paste issue. The code should look like this:
def CreateModel(input_dim, num_hidden_layers):
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Convolution1D, Flatten
    model = Sequential()
    model.add(Convolution1D(nb_filter=10, filter_length=1, input_shape=(1, input_dim), activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    model.summary()
    return model
def OneHotTransformation(y):
    from keras.utils import np_utils
    return np_utils.to_categorical(y)
clf = KerasClassifier(build_fn=CreateModel, input_dim=X_train.shape[1], num_hidden_layers=1, nb_epoch=10, batch_size=500)
X_train = X_train.values.reshape(X_train.shape[0], 1, X_train.shape[1])
X_test = X_test.values.reshape(X_test.shape[0], 1, X_test.shape[1])
y_train = OneHotTransformation(y_train)
clf.fit(X_train, y_train)
The input to a 1D convolution should have dimensions (num_samples, channels, width). So that means you need to reshape X_train and X_test, not y_train:
X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
X_test = X_test.reshape(X_test.shape[0], 1, X_test.shape[1])
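The reshape can be checked in isolation with NumPy before any model is involved. This sketch uses a small random matrix with 44 features, as in the question, and builds one-hot labels with np.eye purely for the shape check; the sizes and variable names are illustrative. If X is a pandas DataFrame, as in the question, call .values first.

```python
import numpy as np

# Small stand-in for the (45561, 44) feature matrix and its integer labels.
n_samples, n_features, n_classes = 6, 44, 3
X = np.random.rand(n_samples, n_features)
y = np.array([0, 1, 2, 0, 2, 1])

# Add a singleton axis so each sample becomes a (1, n_features) array.
X_3d = X.reshape(X.shape[0], 1, X.shape[1])

# One-hot encode the labels; y stays 2-D and is not reshaped.
y_onehot = np.eye(n_classes)[y]

print(X_3d.shape)      # (6, 1, 44)
print(y_onehot.shape)  # (6, 3)
```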
Autoencoder networks seem to be way trickier than normal classifier MLP networks. After several attempts using Lasagne, all I get in the reconstructed output is something that at best resembles a blurry average of all the images in the MNIST database, regardless of what the input digit actually is.
The network structure I chose is the following cascade of layers:
input layer (28x28)
2D convolutional layer, filter size 7x7
Max Pooling layer, size 3x3, stride 2x2
Dense (fully connected) flattening layer, 10 units (this is the bottleneck)
Dense (fully connected) layer, 121 units
Reshaping layer to 11x11
2D convolutional layer, filter size 3x3
2D Upscaling layer factor 2
2D convolutional layer, filter size 3x3
2D Upscaling layer factor 2
2D convolutional layer, filter size 5x5
Feature max pooling (from 31x28x28 to 28x28)
All the 2D convolutional layers have the biases untied, sigmoid activations and 31 filters.
All the fully connected layers have sigmoid activations.
The loss function is squared error and the update rule is Adagrad. Each training chunk contains 100 samples, and training is repeated for 1000 epochs (one chunk per epoch).
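The spatial sizes through this stack can be checked with a few lines of arithmetic. The sketch assumes unpadded ("valid") convolutions and pooling that drops partial windows, which I believe are the Lasagne defaults used in the code below; the helper names are made up for illustration.

```python
def conv_valid(size, kernel):
    """Output width of an unpadded convolution with stride 1."""
    return size - kernel + 1

def max_pool(size, pool, stride):
    """Output width of max pooling that ignores partial windows."""
    return (size - pool) // stride + 1

def upscale(size, factor):
    """Output width of nearest-neighbour upscaling."""
    return size * factor

# Encoder: 28x28 input -> 7x7 conv -> 3x3 pool with stride 2.
size = 28
size = conv_valid(size, 7)   # 22
size = max_pool(size, 3, 2)  # 10
encoder_out = size

# Decoder: the 121-unit dense layer is reshaped to 11x11.
size = 11
size = conv_valid(size, 3)   # 9
size = upscale(size, 2)      # 18
size = conv_valid(size, 3)   # 16
size = upscale(size, 2)      # 32
size = conv_valid(size, 5)   # 28

print(encoder_out, size)  # 10 28
```

Under these assumptions the decoder ends at 28x28, matching the input, and the encoder reduces each feature map to 10x10 before the 10-unit bottleneck.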
Just for completeness, the following is the code I used:
import theano.tensor as T
import theano
import sys
sys.path.insert(0,'./Lasagne') # local checkout of Lasagne
import lasagne
from theano import pp
from theano import function
import gzip
import numpy as np
from sklearn.preprocessing import OneHotEncoder
import matplotlib.pyplot as plt
def load_mnist():
    def load_mnist_images(filename):
        with gzip.open(filename, 'rb') as f:
            data = np.frombuffer(f.read(), np.uint8, offset=16)
        # The inputs are vectors now, we reshape them to monochrome 2D images,
        # following the shape convention: (examples, channels, rows, columns)
        data = data.reshape(-1, 1, 28, 28)
        # The inputs come as bytes, we convert them to float32 in range [0,1].
        # (Actually to range [0, 255/256], for compatibility to the version
        # provided at http://deeplearning.net/data/mnist/mnist.pkl.gz.)
        return data / np.float32(256)
    def load_mnist_labels(filename):
        # Read the labels in Yann LeCun's binary format.
        with gzip.open(filename, 'rb') as f:
            data = np.frombuffer(f.read(), np.uint8, offset=8)
        # The labels are vectors of integers now, that's exactly what we want.
        return data
    X_train = load_mnist_images('train-images-idx3-ubyte.gz')
    y_train = load_mnist_labels('train-labels-idx1-ubyte.gz')
    X_test = load_mnist_images('t10k-images-idx3-ubyte.gz')
    y_test = load_mnist_labels('t10k-labels-idx1-ubyte.gz')
    return X_train, y_train, X_test, y_test
def plot_filters(conv_layer):
    W = conv_layer.get_params()[0]
    W_fn = theano.function([],W)
    params = W_fn()
    ks = np.squeeze(params)
    kstack = np.vstack(ks)
    plt.imshow(kstack,interpolation='none')
    plt.show()
def main():
    #theano.config.exception_verbosity="high"
    #theano.config.optimizer='None'
    X_train, y_train, X_test, y_test = load_mnist()
    ohe = OneHotEncoder()
    y_train = ohe.fit_transform(np.expand_dims(y_train,1)).toarray()
    chunk_len = 100
    visamount = 10
    num_epochs = 1000
    num_filters=31
    dropout_p=.0
    print "X_train.shape",X_train.shape,"y_train.shape",y_train.shape
    input_var = T.tensor4('X')
    output_var = T.tensor4('X')
    conv_nonlinearity = lasagne.nonlinearities.sigmoid
    net = lasagne.layers.InputLayer((chunk_len,1,28,28), input_var)
    conv1 = net = lasagne.layers.Conv2DLayer(net,num_filters,(7,7),nonlinearity=conv_nonlinearity,untie_biases=True)
    net = lasagne.layers.MaxPool2DLayer(net,(3,3),stride=(2,2))
    net = lasagne.layers.DropoutLayer(net,p=dropout_p)
    #conv2_layer = lasagne.layers.Conv2DLayer(dropout_layer,num_filters,(3,3),nonlinearity=conv_nonlinearity)
    #pool2_layer = lasagne.layers.MaxPool2DLayer(conv2_layer,(3,3),stride=(2,2))
    net = lasagne.layers.DenseLayer(net,10,nonlinearity=lasagne.nonlinearities.sigmoid)
    #augment_layer1 = lasagne.layers.DenseLayer(reduction_layer,33,nonlinearity=lasagne.nonlinearities.sigmoid)
    net = lasagne.layers.DenseLayer(net,121,nonlinearity=lasagne.nonlinearities.sigmoid)
    net = lasagne.layers.ReshapeLayer(net,(chunk_len,1,11,11))
    net = lasagne.layers.Conv2DLayer(net,num_filters,(3,3),nonlinearity=conv_nonlinearity,untie_biases=True)
    net = lasagne.layers.Upscale2DLayer(net,2)
    net = lasagne.layers.Conv2DLayer(net,num_filters,(3,3),nonlinearity=conv_nonlinearity,untie_biases=True)
    #pool_after0 = lasagne.layers.MaxPool2DLayer(conv_after1,(3,3),stride=(2,2))
    net = lasagne.layers.Upscale2DLayer(net,2)
    net = lasagne.layers.DropoutLayer(net,p=dropout_p)
    #conv_after2 = lasagne.layers.Conv2DLayer(upscale_layer1,num_filters,(3,3),nonlinearity=conv_nonlinearity,untie_biases=True)
    #pool_after1 = lasagne.layers.MaxPool2DLayer(conv_after2,(3,3),stride=(1,1))
    #upscale_layer2 = lasagne.layers.Upscale2DLayer(pool_after1,4)
    net = lasagne.layers.Conv2DLayer(net,num_filters,(5,5),nonlinearity=conv_nonlinearity,untie_biases=True)
    net = lasagne.layers.FeaturePoolLayer(net,num_filters,pool_function=theano.tensor.max)
    print "output_shape:",lasagne.layers.get_output_shape(net)
    params = lasagne.layers.get_all_params(net, trainable=True)
    prediction = lasagne.layers.get_output(net)
    loss = lasagne.objectives.squared_error(prediction, output_var)
    #loss = lasagne.objectives.binary_crossentropy(prediction, output_var)
    aggregated_loss = lasagne.objectives.aggregate(loss)
    updates = lasagne.updates.adagrad(aggregated_loss,params)
    train_fn = theano.function([input_var, output_var], loss, updates=updates)
    test_prediction = lasagne.layers.get_output(net, deterministic=True)
    predict_fn = theano.function([input_var], test_prediction)
    print "starting training..."
    for epoch in range(num_epochs):
        selected = list(set(np.random.random_integers(0,59999,chunk_len*4)))[:chunk_len]
        X_train_sub = X_train[selected,:]
        _loss = train_fn(X_train_sub, X_train_sub)
        print("Epoch %d: Loss %g" % (epoch + 1, np.sum(_loss) / len(X_train)))
        """
        chunk = X_train[0:chunk_len,:,:,:]
        result = predict_fn(chunk)
        vis1 = np.hstack([chunk[j,0,:,:] for j in range(visamount)])
        vis2 = np.hstack([result[j,0,:,:] for j in range(visamount)])
        plt.imshow(np.vstack([vis1,vis2]))
        plt.show()
        """
    print "done."
    chunk = X_train[0:chunk_len,:,:,:]
    result = predict_fn(chunk)
    print "chunk.shape",chunk.shape
    print "result.shape",result.shape
    plot_filters(conv1)
    for i in range(chunk_len/visamount):
        vis1 = np.hstack([chunk[i*visamount+j,0,:,:] for j in range(visamount)])
        vis2 = np.hstack([result[i*visamount+j,0,:,:] for j in range(visamount)])
        plt.imshow(np.vstack([vis1,vis2]))
        plt.show()
    import ipdb; ipdb.set_trace()
if __name__ == "__main__":
    main()
Any ideas on how to improve this network to get a reasonably functioning autoencoder?