Incompatible shapes on tensorflow.equal() op for correct predictions evaluation - neural-network

Using the MNIST tutorial of Tensorflow, I try to make a convolutional network for face recognition with the "Database of Faces".
The images size are 112x92, I use 3 more convolutional layer to reduce it to 6 x 5 as adviced here
I'm very new at convolutional network and most of my layer declaration is made by analogy to the Tensorflow MNIST tutorial, it may be a bit clumsy, so feel free to advice me on this.
x_image = tf.reshape(x, [-1, 112, 92, 1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
W_conv3 = weight_variable([5, 5, 64, 128])
b_conv3 = bias_variable([128])
h_conv3 = tf.nn.relu(conv2d(h_pool2, W_conv3) + b_conv3)
h_pool3 = max_pool_2x2(h_conv3)
W_conv4 = weight_variable([5, 5, 128, 256])
b_conv4 = bias_variable([256])
h_conv4 = tf.nn.relu(conv2d(h_pool3, W_conv4) + b_conv4)
h_pool4 = max_pool_2x2(h_conv4)
W_conv5 = weight_variable([5, 5, 256, 512])
b_conv5 = bias_variable([512])
h_conv5 = tf.nn.relu(conv2d(h_pool4, W_conv5) + b_conv5)
h_pool5 = max_pool_2x2(h_conv5)
W_fc1 = weight_variable([6 * 5 * 512, 1024])
b_fc1 = bias_variable([1024])
h_pool5_flat = tf.reshape(h_pool5, [-1, 6 * 5 * 512])
h_fc1 = tf.nn.relu(tf.matmul(h_pool5_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
print orlfaces.train.num_classes # 40
W_fc2 = weight_variable([1024, orlfaces.train.num_classes])
b_fc2 = bias_variable([orlfaces.train.num_classes])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
My problem appear when the session run the "correct_prediction" op which is
tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
At least I think given the error message:
W tensorflow/core/common_runtime/executor.cc:1027] 0x19369d0 Compute status: Invalid argument: Incompatible shapes: [8] vs. [20]
[[Node: Equal = Equal[T=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](ArgMax, ArgMax_1)]]
Traceback (most recent call last):
File "./convolutional.py", line 133, in <module>
train_accuracy = accuracy.eval(feed_dict = {x: batch[0], y_: batch[1], keep_prob: 1.0})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 405, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2728, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 345, in run
results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 419, in _do_run
e.code)
tensorflow.python.framework.errors.InvalidArgumentError: Incompatible shapes: [8] vs. [20]
[[Node: Equal = Equal[T=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](ArgMax, ArgMax_1)]]
Caused by op u'Equal', defined at:
File "./convolutional.py", line 125, in <module>
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 328, in equal
return _op_def_lib.apply_op("Equal", x=x, y=y, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 633, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1710, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 988, in __init__
self._traceback = _extract_stack()
It looks like the y_conv output a matrix of shape 8 x batch_size instead of number_of_class x batch_size
If I change the batch size from 20 to 10, the error message stay the same but instead [8] vs. [20] I get [4] vs. [10]. So from that I conclude that the problem may come from the y_conv declaration (last line of the code above).
The loss function, optimizer, training, etc declarations is the same as in the MNIST tutorial:
cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run((tf.initialize_all_variables()))
for i in xrange(1000):
batch = orlfaces.train.next_batch(20)
if i % 100 == 0:
train_accuracy = accuracy.eval(feed_dict = {x: batch[0], y_: batch[1], keep_prob: 1.0})
print "Step %d, training accuracy %g" % (i, train_accuracy)
train_step.run(feed_dict = {x: batch[0], y_: batch[1], keep_prob: 0.5})
print "Test accuracy %g" % accuracy.eval(feed_dict = {x: orlfaces.test.images, y_: orlfaces.test.labels, keep_prob: 1.0})
Thanks for reading, have a good day

Well, after a lot debugging, I found that my issue was due to a bad instantiation of the labels. Instead of creating arrays full of zeros and replace one value by one, I created them with random value! Stupid mistake. In case someone wondering what I did wrong there and how I fix it here is the change I made.
Anyway during all the debugging I made, to find this mistake, I found some useful information to debug this kind of problem:
For the cross entropy declaration, the tensorflow's MNIST tutorial use a formula that can lead to NaN value
This formula is
cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
Instead of this, I found two ways to declare it in a safer fashion:
cross_entropy = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y_conv, 1e-10, 1.0)))
or also:
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logit, y_))
As mrry says. printing the shape of the tensors can help to detect shape anomaly.
To get the shape of a tensor just call his get_shape() method like this:
print "W shape:", W.get_shape()
user1111929 in this question use a debug print that help me assert where the problem come from.

Related

"ValueError: max_evals=500 is too low for the Permutation explainer" shap answers me do I have to give more data (photos)?

I want to test the explainability of a multiclass semantic segmentation model, deeplab_v3plus with shap to know which features contribute the most to semantic classification. However I have a ValueError: max_evals=500 is too low when running my file, and I struggle to understand the reason.
import glob
from PIL import Image
import torch
from torchvision import transforms
from torchvision.utils import make_grid
import torchvision.transforms.functional as tf
from deeplab import deeplab_v3plus
import shap
def test(args):
# make a video prez
model = deeplab_v3plus('resnet101', num_classes=args.nclass, output_stride=16, pretrained_backbone=True)
model.load_state_dict(torch.load(args.seg_file,map_location=torch.device('cpu'))) # because no gpu available on sandbox environnement
model = model.to(args.device)
model.eval()
explainer = shap.Explainer(model)
with torch.no_grad():
for i, file in enumerate(args.img_folder):
img = img2tensor(file, args)
pred = model(img)
print(explainer(img))
if __name__ == '__main__':
class Arguments:
def __init__(self):
self.device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")
self.seg_file = "Model_Woodscape.pth"
self.img_folder = glob.glob("test_img/*.png")
self.mean = [0.485, 0.456, 0.406]
self.std = [0.229, 0.224, 0.225]
self.h, self.w = 483, 640
self.nclass = 10
self.cmap = {
1: [128, 64, 128], # "road",
2: [69, 76, 11], # "lanemarks",
3: [0, 255, 0], # "curb",
4: [220, 20, 60], # "person",
5: [255, 0, 0], # "rider",
6: [0, 0, 142], # "vehicles",
7: [119, 11, 32], # "bicycle",
8: [0, 0, 230], # "motorcycle",
9: [220, 220, 0], # "traffic_sign",
0: [0, 0, 0] # "void"
}
args = Arguments()
test(args)
But it returns:
(dee_env) jovyan#jupyter:~/use-cases/Scene_understanding/Code_Woodscape/deeplab_v3+$ python test_shap.py
BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
Traceback (most recent call last):
File "/home/jovyan/use-cases/Scene_understanding/Code_Woodscape/deeplab_v3+/test_shap.py", line 85, in <module>
test(args)
File "/home/jovyan/use-cases/Scene_understanding/Code_Woodscape/deeplab_v3+/test_shap.py", line 37, in test
print(explainer(img))
File "/home/jovyan/use-cases/Scene_understanding/Code_Woodscape/deeplab_v3+/dee_env/lib/python3.9/site-packages/shap/explainers/_permutation.py", line 82, in __call__
return super().__call__(
File "/home/jovyan/use-cases/Scene_understanding/Code_Woodscape/deeplab_v3+/dee_env/lib/python3.9/site-packages/shap/explainers/_explainer.py", line 266, in __call__
row_result = self.explain_row(
File "/home/jovyan/use-cases/Scene_understanding/Code_Woodscape/deeplab_v3+/dee_env/lib/python3.9/site-packages/shap/explainers/_permutation.py", line 164, in explain_row
raise ValueError(f"max_evals={max_evals} is too low for the Permutation explainer, it must be at least 2 * num_features + 1 = {2 * len(inds) + 1}!")
ValueError: max_evals=500 is too low for the Permutation explainer, it must be at least 2 * num_features + 1 = 1854721!
In the source code it looks like it's because I don't give enough arguments. I only have three images in my test_img/* folder, is that why?
I have the same issue. A possible solution I found which seems to be working for my case is to replace this line
explainer = shap.Explainer(model)
With this line
explainer = shap.explainers.Permutation(model, max_evals = 1854721)
shap.Explainer by default has algorithm='auto'. From the documentation: shape.Explainer
By default the “auto” options attempts to make the best choice given
the passed model and masker, but this choice can always be overriden
by passing the name of a specific algorithm.
Since 'permutation' has been selected you can directly use shap.explainers.Permutation and set max_evals to the value suggested in the error message above.
Given the high number of your use case, this might take a really long time. I would suggest to use an easier model just for testing the above solution.

How to fix incorrect channel size in pytorch neural network?

I'm working with the Google utterance dataset in spectrogram form. Each data point has dimension (160, 101). In my data loader, I used batch_size=128. Therefore, each batch has dimension (128, 160, 101).
I use a LeNet model with the following code as the model:
class LeNet(nn.Module):
def __init__(self):
super(LeNet, self).__init__()
self.conv1 = nn.Conv2d(1, 6, 5)
self.conv2 = nn.Conv2d(6, 16, 5)
self.fc1 = nn.Linear(16*5*5, 120)
self.fc2 = nn.Linear(120, 84)
self.fc3 = nn.Linear(84, 30)
def forward(self, x):
out = F.relu(self.conv1(x))
out = F.max_pool2d(out, 2)
out = F.relu(self.conv2(out))
out = F.max_pool2d(out, 2)
out = out.view(out.size(0), -1)
out = F.relu(self.fc1(out))
out = F.relu(self.fc2(out))
out = self.fc3(out)
return out
I tried unsqueezing the data with dim=3, but got this error:
Traceback (most recent call last):
File "train_speech.py", line 359, in <module>
train_loss, reg_loss, train_acc, cost = train(epoch)
File "train_speech.py", line 258, in train
outputs = (net(inputs))['out']
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/content/gdrive/My Drive/Colab Notebooks/mixup_erm-master/models/lenet.py", line 15, in forward
out = F.relu(self.conv1(x))
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 443, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 440, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [6, 1, 5, 5], expected input[128, 160, 101, 1] to have 1 channels, but got 160 channels instead
How do I fix this issue?
EDIT: New Error Message Below
torch.Size([128, 160, 101])
torch.Size([128, 1, 160, 101])
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Traceback (most recent call last):
File "train_speech.py", line 363, in <module>
train_loss, reg_loss, train_acc, cost = train(epoch)
File "train_speech.py", line 262, in train
outputs = (net(inputs))['out']
IndexError: too many indices for tensor of dimension 2
I'm unsqueezing the data in each batch. The relevant section of my training code is below. inputs is analogous to x.
print(inputs.shape)
inputs = inputs.unsqueeze(1)
print(inputs.shape)
outputs = (net(inputs))['out']
Edit 2: New Error
Traceback (most recent call last):
File "train_speech.py", line 361, in <module>
train_loss, reg_loss, train_acc, cost = train(epoch)
File "train_speech.py", line 270, in train
loss.backward()
File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 255, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 149, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: Function AddmmBackward returned an invalid gradient at index 1 - got [128, 400] but expected shape compatible with [128, 13024]
Edit 3: Train Loop Below
def train(epoch):
print('\nEpoch: %d' % epoch)
net.train()
train_loss = 0
reg_loss = 0
correct = 0
total = 0
for batch_idx, (inputs, targets) in enumerate(trainloader):
if use_cuda:
inputs, targets = inputs.cuda(), targets.cuda()
inputs, targets_a, targets_b, lam,layer, cost = mixup_data(inputs, targets,
args.alpha,args.mixupBatch, use_cuda)
inputs, targets_a, targets_b = map(Variable, (inputs,
targets_a, targets_b))
outputs = net(inputs)
loss = mixup_criterion(criterion, outputs, targets_a, targets_b, lam)
train_loss += loss.item()
_, predicted = torch.max(outputs.data, 1)
total += targets.size(0)
correct += (lam * predicted.eq(targets_a.data).cpu().sum().float()
+ (1 - lam) * predicted.eq(targets_b.data).cpu().sum().float())
optimizer.zero_grad()
loss.backward()
optimizer.step()
return (train_loss/batch_idx, reg_loss/batch_idx, 100.*correct/total, cost/batch_idx)
You should expand on axis=1 a.k.a. the channel axis:
>>> x = x.unsqueeze(1)
If you're inside the dataset __getitem__, then it corresponds to axis=0.

GPflow, bvh: ValueError: mean must be 1 dimensional

I am having a weird "ValueError: mean must be 1 dimensional" when I am trying to build a Hierarchical GL-LVM model. Basically I'm trying to reproduce this paper: Hierarchical Gaussian Process Latent Variable Models using GPflow.
Therefore I implemented my own new model as follow:
class myGPLVM(gpflow.models.BayesianModel):
def __init__(self, data, latent_data, x_data_mean, kernel):
super().__init__()
print("GPLVM")
self.kernel0 = kernel[0]
self.kernel1 = kernel[1]
self.mean_function = Zero()
self.likelihood0 = gpflow.likelihoods.Gaussian(1.0)
self.likelihood1 = gpflow.likelihoods.Gaussian(1.0)
# make some parameters
self.data = (gpflow.Parameter(x_data_mean), gpflow.Parameter(latent_data), data)
def hierarchy_ll(self):
x, h, y = self.data
K = self.kernel0(x)
num_data = x.shape[0]
k_diag = tf.linalg.diag_part(K)
s_diag = tf.fill([num_data], self.likelihood0.variance)
ks = tf.linalg.set_diag(K, k_diag + s_diag)
L = tf.linalg.cholesky(ks)
m = self.mean_function(x)
return multivariate_normal(h, m, L)
def log_likelihood(self):
"""
Computes the log likelihood.
.. math::
\log p(Y | \theta).
"""
x, h, y = self.data
K = self.kernel1(h)
num_data = h.shape[0]
k_diag = tf.linalg.diag_part(K)
s_diag = tf.fill([num_data], self.likelihood1.variance)
ks = tf.linalg.set_diag(K, k_diag + s_diag)
L = tf.linalg.cholesky(ks)
m = self.mean_function(h)
# [R,] log-likelihoods for each independent dimension of Y
log_prob = multivariate_normal(y, m, L). # <- trows the error!
log_prob_h = self.hierarchy_ll()
log_likelihood = tf.reduce_sum(log_prob) + tf.reduce_sum(log_prob_h)
return log_likelihood
The model seems to work with a toy example:
from sklearn.datasets.samples_generator import make_blobs
X, y = make_blobs(n_samples=40, centers=3, n_features=12, random_state=2)
Y = tf.convert_to_tensor(X, dtype=default_float())
but fails and trough me the error when I am trying with a bvh file (the one from the paper actually). I also used Lawrence's code to read my bvh from mocap which I modified to fit python3
Anyway, it's been few a days and I am out of ideas. I tried multiple way to force my mean array "m" to be of one dimensional but nothing worked. I also tried with the "three_phase_oil_flow" dataset from the first GPLVM paper which works as well.
Therefore, I would assume that my model is correct, or at least I got some optimisation going on, and would think that perhaps the bvh reader could be the cause. But the data seems all fine to me... Especially I don't understand why when forcing multivariate function like:
m = np.zeros((np.shape(m)[0], 1))
log_prob = multivariate_normal(y, m, L)
or even with the gpflow Zero function
m = Zero(h)
log_prob = multivariate_normal(y, m, L)
it still trows me the error. Any help will be highly appreciated.
edited thanks to: Artem Artemev
The rest of the code if anyone wants to try to reproduce:
https://github.com/michaelStettler/h-GPLVM
error flow:
(venv) MacBookMichael2:stackOverflow michaelstettler$ python3 HGPLVM.py
(199, 96)
shape Y (199, 3, 38)
2020-01-26 17:00:48.104029: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-01-26 17:00:48.113609: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f8dd5ff5410 executing computations on platform Host. Devices:
2020-01-26 17:00:48.113627: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
shape Y (199, 38)
Number of points: 199 and Number of dimensions: 38
shape x_mean_latent (199, 8)
shape x_mean_init (199, 2)
HGPLVM
gpr_data (199, 2) (199, 8) (199, 38)
2020-01-26 17:00:48.139003: W tensorflow/python/util/util.cc:299] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
shape m (199, 1)
Traceback (most recent call last):
File "HGPLVM.py", line 131, in <module>
_ = opt.minimize(closure, method="bfgs", variables=model.trainable_variables, options=dict(maxiter=maxiter))
File "/Users/michaelstettler/PycharmProjects/GPflow/venv/lib/python3.6/site-packages/gpflow/optimizers/scipy.py", line 60, in minimize
**scipy_kwargs)
File "/Users/michaelstettler/PycharmProjects/GPflow/venv/lib/python3.6/site-packages/scipy/optimize/_minimize.py", line 594, in minimize
return _minimize_bfgs(fun, x0, args, jac, callback, **options)
File "/Users/michaelstettler/PycharmProjects/GPflow/venv/lib/python3.6/site-packages/scipy/optimize/optimize.py", line 998, in _minimize_bfgs
gfk = myfprime(x0)
File "/Users/michaelstettler/PycharmProjects/GPflow/venv/lib/python3.6/site-packages/scipy/optimize/optimize.py", line 327, in function_wrapper
return function(*(wrapper_args + args))
File "/Users/michaelstettler/PycharmProjects/GPflow/venv/lib/python3.6/site-packages/scipy/optimize/optimize.py", line 73, in derivative
self(x, *args)
File "/Users/michaelstettler/PycharmProjects/GPflow/venv/lib/python3.6/site-packages/scipy/optimize/optimize.py", line 65, in __call__
fg = self.fun(x, *args)
File "/Users/michaelstettler/PycharmProjects/GPflow/venv/lib/python3.6/site-packages/gpflow/optimizers/scipy.py", line 72, in _eval
loss, grads = _compute_loss_and_gradients(closure, variables)
File "/Users/michaelstettler/PycharmProjects/GPflow/venv/lib/python3.6/site-packages/gpflow/optimizers/scipy.py", line 116, in _compute_loss_and_gradients
loss = loss_cb()
File "HGPLVM.py", line 127, in closure
return - model.log_marginal_likelihood()
File "/Users/michaelstettler/PycharmProjects/GPflow/venv/lib/python3.6/site-packages/gpflow/models/model.py", line 45, in log_marginal_likelihood
return self.log_likelihood(*args, **kwargs) + self.log_prior()
File "HGPLVM.py", line 62, in log_likelihood
log_prob = multivariate_normal(y, m, L)
File "mtrand.pyx", line 3729, in numpy.random.mtrand.RandomState.multivariate_normal
ValueError: mean must be 1 dimensional
I would recommend posting a working MWE code. I have tried to use your code snippets, but it gives me errors.
I don't have issues with multivariate_normal function. If you have localised the issue correctly you can debug TF2.0 more thoroughly and find the place that causes that exception. Here is the code which I'm running:
In [2]: from sklearn.datasets.samples_generator import make_blobs
...: X, y = make_blobs(n_samples=40, centers=3, n_features=12, random_state=2)
In [10]: m = np.zeros((np.shape(y)[0], 1))
In [11]: m.shape
Out[11]: (40, 1)
In [12]: y.shape
Out[12]: (40,)
In [13]: L = np.eye(m.shape[0])
In [15]: gpflow.logdensities.multivariate_normal(y, m, L)
Out[15]:
<tf.Tensor: shape=(40,), dtype=float64, numpy=
array([ -56.75754133, ...])>

Callbackfunction modelcheckpoint causes error in keras

I seem to get this error when I am using the callback function modelcheckpoint..
I read from a github issue that the solution would be make use of model.get_weight, but I am implicitly only storing that since i am only storing the one with best weight.
Keras only seem to save weights using h5, which make me question is there any other way to do store them using the eras API, if so how? If not, how do i store it?
Made an example to recreate the problem:
#!/usr/bin/python
import glob, os
import sys
from os import listdir
from os.path import isfile, join
import numpy as np
import warnings
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from keras.utils import np_utils
from keras import metrics
import keras
from keras import backend as K
from keras.models import Sequential
from keras.optimizers import SGD, Adam
from keras.layers.core import Dense, Activation, Lambda, Reshape,Flatten
from keras.layers import Conv1D,Conv2D,MaxPooling2D, MaxPooling1D, Reshape
#from keras.utils.visualize_util import plot
from keras.models import Model
from keras.layers import Input, Dense
from keras.layers.merge import Concatenate, Add
import h5py
import random
import tensorflow as tf
import math
from keras.callbacks import CSVLogger
from keras.callbacks import ModelCheckpoint
if len(sys.argv) < 5:
print "Missing Arguments!"
print "python keras_convolutional_feature_extraction.py <workspace> <totale_frames> <fbank-dim> <window-height> <batch_size>"
print "Example:"
print "python keras_convolutional_feature_extraction.py deltas 15 40 5 100"
sys.exit()
total_frames = int(sys.argv[2])
total_frames_with_deltas = total_frames*3
dim = int(sys.argv[3])
window_height = int(sys.argv[4])
inserted_batch_size = int(sys.argv[5])
stride = 1
splits = ((dim - window_height)+1)/stride
#input_train_data = "/media/carl/E2302E68302E443F/"+str(sys.argv[1])+"/fbank/org_train_total_frames_"+str(total_frames)+"_dim_"+str(dim)+"_winheig_"+str(window_height)+"_batch_"+str(inserted_batch_size)+"_fws_input"
#output_train_data ="/media/carl/E2302E68302E443F/"+str(sys.argv[1])+"/fbank/org_train_total_frames_"+str(total_frames)+"_dim_"+str(dim)+"_winheig_"+str(window_height)+"_batch_"+str(inserted_batch_size)+"_fws_output"
#input_test_data = "/media/carl/E2302E68302E443F/"+str(sys.argv[1])+"/fbank/org_test_total_frames_"+str(total_frames)+"_dim_"+str(dim)+"_winheig_"+str(window_height)+"_batch_"+str(1)+"_fws_input"
#output_test_data = "/media/carl/E2302E68302E443F/"+str(sys.argv[1])+"/fbank/org_test_total_frames_"+str(total_frames)+"_dim_"+str(dim)+"_winheig_"+str(window_height)+"_batch_"+str(1)+"_fws_output"
#train_files =[f for f in listdir(input_train_data) if isfile(join(input_train_data, f))]
#test_files =[f for f in listdir(input_test_data) if isfile(join(input_test_data, f))]
#print len(train_files)
np.random.seed(100)
print "hallo"
def train_generator():
while True:
# input = random.choice(train_files)
# h5f = h5py.File(input_train_data+'/'+input, 'r')
# train_input = h5f['train_input'][:]
# train_output = h5f['train_output'][:]
# h5f.close()
train_input = np.random.randint(100,size=((inserted_batch_size,splits*total_frames_with_deltas,window_height,3)))
train_list_list = []
train_input = train_input.reshape((inserted_batch_size,splits*total_frames_with_deltas,window_height,3))
train_input_list = np.split(train_input,splits*total_frames_with_deltas,axis=1)
for i in range(len(train_input_list)):
train_input_list[i] = train_input_list[i].reshape(inserted_batch_size,window_height,3)
#for i in range(len(train_input_list)):
# train_input_list[i] = train_input_list[i].reshape(inserted_batch_size,33,window_height,1,3)
train_output = np.random.randint(5, size = (1,total_frames,5))
middle = int(math.ceil(total_frames/2))
train_output = train_output[:,middle:middle+1,:].reshape((inserted_batch_size,1,5))
#print train_output.shape
#print len(train_input_list)
#print train_input_list[0].shape
yield (train_input_list, train_output)
print "hallo"
def test_generator():
while True:
# input = random.choice(test_files)
# h5f = h5py.File(input_test_data+'/'+input, 'r')
# test_input = h5f['test_input'][:]
# test_output = h5f['test_output'][:]
# h5f.close()
test_input = np.random.randint(100,size=((inserted_batch_size,splits*total_frames_with_deltas,window_height,3)))
test_input = test_input.reshape((inserted_batch_size,splits*total_frames_with_deltas,window_height,3))
test_input_list = np.split(test_input,splits*total_frames_with_deltas,axis=1)
#test_input_list = np.split(test_input,45,axis=3)
for i in range(len(test_input_list)):
test_input_list[i] = test_input_list[i].reshape(inserted_batch_size,window_height,3)
#for i in range(len(test_input_list)):
# test_input_list[i] = test_input_list[i].reshape(inserted_batch_size,33,window_height,1,3)
test_output = np.random.randint(5, size = (1,total_frames,5))
middle = int(math.ceil(total_frames/2))
test_output = test_output[:,middle:middle+1,:].reshape((inserted_batch_size,1,5))
yield (test_input_list, test_output)
print "hallo"
def fws():
#print "Inside"
# Params:
# batch , lr, decay , momentum, epochs
#
#Input shape: (batch_size,40,45,3)
#output shape: (1,15,50)
# number of unit in conv_feature_map = splitd
next(train_generator())
model_output = []
list_of_input = [Input(shape=(8,3)) for i in range(splits*total_frames_with_deltas)]
output = []
#Conv
skip = total_frames_with_deltas
for steps in range(total_frames_with_deltas):
conv = Conv1D(filters = 100, kernel_size = 8)
column = 0
for _ in range(splits):
#print "column " + str(column) + "steps: " + str(steps)
output.append(conv(list_of_input[(column*skip)+steps]))
column = column + 1
#print len(output)
#print splits*total_frames_with_deltas
conv = []
for section in range(splits):
column = 0
skip = splits
temp = []
for _ in range(total_frames_with_deltas):
temp.append(output[((column*skip)+section)])
column = column + 1
conv.append(Add()(temp))
#print len(conv)
output_conc = Concatenate()(conv)
#print output_conc.get_shape
output_conv = Reshape((splits, -1))(output_conc)
#print output_conv.get_shape
#Pool
pooled = MaxPooling1D(pool_size = 6, strides = 2)(output_conv)
reshape = Reshape((1,-1))(pooled)
#Fc
dense1 = Dense(units = 1024, activation = 'relu', name = "dense_1")(reshape)
#dense2 = Dense(units = 1024, activation = 'relu', name = "dense_2")(dense1)
dense3 = Dense(units = 1024, activation = 'relu', name = "dense_3")(dense1)
final = Dense(units = 5, activation = 'relu', name = "final")(dense3)
model = Model(inputs = list_of_input , outputs = final)
sgd = SGD(lr=0.1, decay=1e-1, momentum=0.9, nesterov=True)
model.compile(loss="categorical_crossentropy", optimizer=sgd , metrics = ['accuracy'])
print "compiled"
model_yaml = model.to_yaml()
with open("model.yaml", "w") as yaml_file:
yaml_file.write(model_yaml)
print "Model saved!"
log= CSVLogger('/home/carl/kaldi-trunk/dnn/experimental/yesno_cnn_50_training_total_frames_'+str(total_frames)+"_dim_"+str(dim)+"_window_height_"+str(window_height)+".csv")
filepath='yesno_cnn_50_training_total_frames_'+str(total_frames)+"_dim_"+str(dim)+"_window_height_"+str(window_height)+"weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_weights_only=True, mode='max')
print "log"
#plot_model(model, to_file='model.png')
print "Fit"
hist_current = model.fit_generator(train_generator(),
steps_per_epoch=444,#len(train_files),
epochs = 10000,
verbose = 1,
validation_data = test_generator(),
validation_steps=44,#len(test_files),
pickle_safe = True,
workers = 4,
callbacks = [log,checkpoint])
fws()
Execute the script by: python name_of_script.py yens 50 40 8 1
which give me a full traceback:
full traceback
Error:
carl#ca-ThinkPad-T420s:~/Dropbox$ python mini.py yesno 50 40 8 1
Using TensorFlow backend.
Couldn't import dot_parser, loading of dot files will not be possible.
hallo
hallo
hallo
compiled
Model saved!
log
Fit
/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py:2252: UserWarning: Expected no kwargs, you passed 1
kwargs passed to function are ignored with Tensorflow backend
warnings.warn('\n'.join(msg))
Epoch 1/10000
2017-05-26 13:01:45.851125: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-26 13:01:45.851345: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-26 13:01:45.851392: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
443/444 [============================>.] - ETA: 4s - loss: 100.1266 - acc: 0.3138Epoch 00000: saving model to yesno_cnn_50_training_total_frames_50_dim_40_window_height_8weights-improvement-00-0.48.hdf5
Traceback (most recent call last):
File "mini.py", line 205, in <module>
File "mini.py", line 203, in fws
File "/usr/local/lib/python2.7/dist-packages/keras/legacy/interfaces.py", line 88, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1933, in fit_generator
callbacks.on_epoch_end(epoch, epoch_logs)
File "/usr/local/lib/python2.7/dist-packages/keras/callbacks.py", line 77, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "/usr/local/lib/python2.7/dist-packages/keras/callbacks.py", line 411, in on_epoch_end
self.model.save_weights(filepath, overwrite=True)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2503, in save_weights
save_weights_to_hdf5_group(f, self.layers)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2746, in save_weights_to_hdf5_group
f.attrs['layer_names'] = [layer.name.encode('utf8') for layer in layers]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2684)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2642)
File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/attrs.py", line 93, in __setitem__
self.create(name, data=value, dtype=base.guess_dtype(value))
File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/attrs.py", line 183, in create
attr = h5a.create(self._id, self._e(tempname), htype, space)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2684)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2642)
File "h5py/h5a.pyx", line 47, in h5py.h5a.create (/tmp/pip-4rPeHA-build/h5py/h5a.c:1904)
RuntimeError: Unable to create attribute (Object header message is too large)
If you look at the amount of data Keras is trying to save under layer_names attribute (inside the output HDF5 file being create), you will find that it takes more than 64kB.
np.asarray([layer.name.encode('utf8') for layer in model.layers]).nbytes
>> 77100
I quote from https://support.hdfgroup.org/HDF5/faq/limits.html:
Is there an object header limit and how does that affect HDF5 ?
There is a limit (in HDF5-1.8) of the object header, which is 64 KB.
The datatype for a dataset is stored in the object header, so there is
therefore a limit on the size of the datatype that you can have. (See
HDFFV-1089)
The code above was (almost entirely) copied from the traceback:
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2746, in save_weights_to_hdf5_group
f.attrs['layer_names'] = [layer.name.encode('utf8') for layer in layers]
I am using numpy asarray method to get the figure fast but h5py gets similar figure (I guess), see https://github.com/h5py/h5py/blob/master/h5py/_hl/attrs.py#L102 if you want to find exact figure.
Anyway, either you will need to implement your own methods for saving/loading of the weights (or use existing workarounds), or you need to give a really short name to ALL the layers inside your model :), something like this:
list_of_input = [Input(shape=(8,3), name=('i%x' % i)) for i in range(splits*total_frames_with_deltas)]
conv = Conv1D(filters = 100, kernel_size = 8, name='cv%x' % steps)
conv.append(Add(name='add%x' % section)(temp))
output_conc = Concatenate(name='ct')(conv)
output_conv = Reshape((splits, -1), name='rs1')(output_conc)
pooled = MaxPooling1D(pool_size = 6, strides = 2, name='pl')(output_conv)
reshape = Reshape((1,-1), name='rs2')(pooled)
dense1 = Dense(units = 1024, activation = 'relu', name = "d1")(reshape)
dense2 = Dense(units
= 1024, activation = 'relu', name = "d2")(dense1)
dense3 = Dense(units = 1024, activation = 'relu', name = "d3")(dense1)
final = Dense(units = 5, activation = 'relu', name = "fl")(dense3)
You mustn't forget to name all the layers because the (numpy) string array into which the layer names are converted is using the size of the longest string for each individual string in it when it is saved!
After renaming the layers as proposed above (which takes almost 26kB) the model is saved successfully. Hope this elaborate answer helps someone.
Update: I have just made a PR to Keras which should fix the issue without implementing any custom loading/saving methods, see 7508
A simple solution, albeit possibly not the most elegant, could be to run a while loop with epochs = 1.
Get the weights at the end of every epoch together with the accuracy and the loss
Save the weights to file 1 with model.get_weight
if accuracy is greater than at the previous epoch (i.e. loop), store the weights to a different file (file 2)
Run the loop again loading the weights from file 1
Break the loops setting a manual early stopping so that it breaks if the loss does not improve for a certain number of loops
You can use get_weights() together with numpy.save.
It's not the best solution, because it will save several files, but it actually works.
The problem is that you won't have the "optimizer" saved with the current states. But you can perhaps work around that by using smaller learning rates after loading.
Custom callback using numpy.save:
def myCallback(epoch,logs):
global storedLoss
#do your comparisons here using the "logs" var.
print(logs)
if (logs['loss'] < storedLoss):
storedLoss = logs['loss']
for i in range(len(model.layers)):
WandB = model.layers[i].get_weights()
if len (WandB) > 0: #necessary because some layers have no weights
np.save("W" + "-" + str(i), WandB[0],False)
np.save("B" + "-" + str(i), WandB[1],False)
#remember that get and set weights use a list: [weights,biases]
#it may happen (not sure) that there is no bias, and thus you may have to check it (len(WandB)==1).
The logs var brings a dictionary with named metrics, such as "loss", and "accuracy", if you used it.
You can store the losses within the callback in a global var, and compare if each loss is better or worse than the last.
When fitting, use the lambda callback:
from keras.callbacks import LambdaCallback
model.fit(...,callbacks=[LambdaCallback(on_epoch_end=myCallback)])
In the example above, I used the LambdaCallback, which has more possibilities than just on_epoch_end.
For loading, do a similar loop:
#you have to create the model first and then set the layers
def loadModel(model):
for i in range(len(model.layers)):
WandBForCheck = model.layers[i].get_weights()
if len (WandBForCheck) > 0: #necessary because some layers have no weights
W = np.load(Wfile + str(i))
B = np.load(Bfile + str(i))
model.layers[i].set_weights([W,B])
See follow-up at https://github.com/fchollet/keras/issues/6766 and https://github.com/farizrahman4u/keras-contrib/pull/90.
I saw the YAML and the root cause is probably that you have so many Inputs. A few Inputs with many dimensions is preferred to many Inputs, especially if you can use scanning and batch operations to do everything efficiently.
Now, ignoring that entirely, here is how you can save and load your model if it has too much stuff to save as JSON efficiently:
You can pass save_weights_only=True. That won't save optimizer weights, so isn't a great solution.
Just put together a PR for saving model weights and optimizer weights but not configuration. When you want to load, first instantiate and compile the model as you did when you were going to train it, then use load_all_weights to load the model and optimizer weights into that model. I'll try to merge it soon so you can use it from the master branch.
You could use it something like this:
from keras.callbacks import LambdaCallback
from keras_contrib.utils.save_load_utils import save_all_weights, load_all_weights
# do some stuff to create and compile model
# use `save_all_weights` as a callback to checkpoint your model and optimizer weights
model.fit(..., callbacks=[LambdaCallback(on_epoch_end=lambda epoch, logs: save_all_weights(model, "checkpoint-{:05d}.h5".format(epoch))])
# use `load_all_weights` to load model and optimizer weights into an existing model
# if not compiled (no `model.optimizer`), this will just load model weights
load_all_weights(model, 'checkpoint-1337.h5')
So I don't endorse the model, but if you want to get it to save and load anyways this should probably work for you.
As a side note, if you want to save weights in a different format, something like this would work.
pickle.dump([K.get_value(w) for w in model.weights], open( "save.p", "wb" ) )
Cheers
Your model architecture must be too large to be saved.
USE get_weights AND set_weights TO SAVE AND LOAD MODEL, RESPECTIVELY.
Do not use callback model checkpoint. just once the training ends, save its weights with pickle.
Have a look at this link: Unable to save DataFrame to HDF5 ("object header message is too large")

Max pooling indices

I am trying to find the indices 2d max pooling in lasagne
network = batch_norm(Conv2DLayer(
network, num_filters=filter_size, filter_size=(kernel, kernel),pad=pad,
nonlinearity=lasagne.nonlinearities.rectify,
W=lasagne.init.GlorotUniform(),name="conv"), name="BN")
pool_in = lasagne.layers.get_output(network)
network = MaxPool2DLayer(network, pool_size=(pool_size, pool_size),stride=2,name="pool")
pool_out = lasagne.layers.get_output(network)
ind1 = T.grad(T.sum(pool_out), wrt=pool_in)
When I try to build the model it raises a error
DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: Elemwise{mul,no_inplace}.0
Backtrace when the node is created:
File "//anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2871, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "//anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2975, in run_ast_nodes
if self.run_code(code, result):
File "//anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 3035, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-28-0b136cc660e2>", line 1, in <module>
network = build_model()
File "<ipython-input-27-20acc3fe0d98>", line 8, in build_model
pool_in = lasagne.layers.get_output(network)
File "//anaconda/lib/python2.7/site-packages/lasagne/layers/helper.py", line 191, in get_output
all_outputs[layer] = layer.get_output_for(layer_inputs, **kwargs)
File "//anaconda/lib/python2.7/site-packages/lasagne/layers/special.py", line 52, in get_output_for
return self.nonlinearity(input)
File "//anaconda/lib/python2.7/site-packages/lasagne/nonlinearities.py", line 157, in rectify
return theano.tensor.nnet.relu(x)
What is the right way of coding functions on lasagne layers intermediate outputs.
I had a similar problem a while ago, check out my solution for 2d and 3d max pooling indices:
Theano max_pool_3d
(Its based on the same Google-groups post, i guess)