What is the purpose of the add_loss function in Keras? - neural-network

I recently stumbled across variational autoencoders and tried to make them work on MNIST using Keras. I found a tutorial on GitHub.
My question concerns the following lines of code:
# Build model
vae = Model(x, x_decoded_mean)
# Calculate custom loss
xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae_loss = K.mean(xent_loss + kl_loss)
# Compile
vae.add_loss(vae_loss)
vae.compile(optimizer='rmsprop')
Why is add_loss used instead of specifying the loss as a compile option? Something like vae.compile(optimizer='rmsprop', loss=vae_loss) does not seem to work, and throws the following error:
ValueError: The model cannot be compiled because it has no loss to optimize.
What is the difference between this function and a custom loss function that I can pass as an argument to Model.compile()?
Thanks in advance!
P.S.: I know there are several issues concerning this on github, but most of them were open and uncommented. If this has been resolved already, please share the link!
Edit 1
I removed the line which adds the loss to the model and used the loss argument of the compile function. It looks like this now:
# Build model
vae = Model(x, x_decoded_mean)
# Calculate custom loss
xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae_loss = K.mean(xent_loss + kl_loss)
# Compile
vae.compile(optimizer='rmsprop', loss=vae_loss)
This throws a TypeError:
TypeError: Using a 'tf.Tensor' as a Python 'bool' is not allowed. Use 'if t is not None:' instead of 'if t:' to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.
Edit 2
Thanks to @MarioZ's efforts, I was able to figure out a workaround for this.
# Build model
vae = Model(x, x_decoded_mean)
# Calculate custom loss in separate function
def vae_loss(x, x_decoded_mean):
    xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
    kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    vae_loss = K.mean(xent_loss + kl_loss)
    return vae_loss
# Compile
vae.compile(optimizer='rmsprop', loss=vae_loss)
...
vae.fit(x_train,
        x_train,  # <-- did not need this previously
        shuffle=True,
        epochs=epochs,
        batch_size=batch_size,
        validation_data=(x_test, x_test))  # <-- worked with (x_test, None) before
For some strange reason, I had to explicitly specify y and y_test while fitting the model. Originally, I didn't need to do this. The produced samples seem reasonable to me.
Although I could resolve this, I still don't know what the differences and disadvantages of these two methods are (other than needing a different syntax). Can someone give me more insight?

I'll try to answer the original question of why model.add_loss() is being used instead of specifying a custom loss function to model.compile(loss=...).
All loss functions in Keras take exactly two parameters, y_true and y_pred. If you look at the definitions of the standard loss functions that ship with Keras, they all have this signature: y_true is the target (the Y variable in many textbooks) and y_pred is the actual output of the model. Most standard losses can be written as an expression of these two tensors, but some more complex losses cannot. Your VAE is such a case, because the loss also depends on additional tensors, namely z_log_var and z_mean, which are never passed to a compile-time loss function. model.add_loss() has no such restriction and lets you write much more complex losses that depend on many other tensors, but it has the inconvenience of being tied to the model, whereas a standard loss function works with just any model.
(Note: The code proposed in other answers here is somewhat cheating in that it uses global variables to sneak in the additional required dependencies. This makes the loss function not a true function in the mathematical sense. I consider this much less clean code and I expect it to be more error-prone.)
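To make the contrast concrete, here is a minimal, self-contained sketch of the add_loss() route for a toy VAE (made-up layer sizes, not the tutorial's architecture, and assuming the standalone Keras API used in the question; newer tf.keras versions may handle input-dependent add_loss tensors differently). The loss tensor closes over z_mean and z_log_var, which a (y_true, y_pred) function would never receive, and fit() needs no targets:
import numpy as np
from keras import backend as K, metrics
from keras.layers import Input, Dense, Lambda
from keras.models import Model

original_dim, latent_dim = 20, 2

x = Input(shape=(original_dim,))
h = Dense(16, activation='relu')(x)
z_mean = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)

def sampling(args):
    mu, log_var = args
    eps = K.random_normal(shape=(K.shape(mu)[0], latent_dim))
    return mu + K.exp(log_var / 2) * eps

z = Lambda(sampling)([z_mean, z_log_var])
h_decoded = Dense(16, activation='relu')(z)
x_decoded_mean = Dense(original_dim, activation='sigmoid')(h_decoded)

vae = Model(x, x_decoded_mean)

# The loss needs z_mean and z_log_var, which a (y_true, y_pred) loss never sees,
# so it is attached to the model as a tensor instead of being passed to compile().
xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
kl_loss = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae.add_loss(K.mean(xent_loss + kl_loss))

vae.compile(optimizer='rmsprop')  # note: no loss= argument
vae.fit(np.random.rand(64, original_dim).astype('float32'),
        epochs=1, batch_size=16)  # note: no targets needed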

JIH's answer is right, of course, but maybe it is useful to add:
model.add_loss() has no restrictions, but it also removes the comfort of using, for example, targets in model.fit().
If you have a loss that depends on additional parameters of the model, on other models, or on external variables, you can still use a Keras-style encapsulated loss function by writing a wrapper function to which you pass all the additional parameters:
def loss_carrier(extra_param1, extra_param2):
    def loss(y_true, y_pred):
        # x = complicated math involving extra_param1, extra_param2, y_true, y_pred
        # remember to use tensor operations, e.g. K.sum, K.square, K.mean from keras.backend
        # also remember that if extra_param1, extra_param2 are variable tensors instead of simple floats,
        # you need to have them defined as inputs=(main, extra_param1, extra_param2) in your keras.Model instantiation,
        # and have them defined as keras.Input (or tf.placeholder) with the right shape.
        return x
    return loss
model.compile(optimizer='adam', loss=loss_carrier(extra_param1, extra_param2))
The trick is the last line: you pass the result of calling loss_carrier, which is a function with exactly the two parameters y_true and y_pred that Keras expects.
Possibly looks more complicated than the model.add_loss version, but the loss stays modular.
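As a concrete illustration of the wrapper pattern (a hypothetical sketch: weighted_mse and the tiny model below are made up for this example, not part of the question's code), assuming the extra parameter is a plain float:
import numpy as np
from keras import backend as K
from keras.layers import Input, Dense
from keras.models import Model

def weighted_mse(weight):
    # The outer function captures the extra parameter; the inner function
    # has the (y_true, y_pred) signature that Keras expects.
    def loss(y_true, y_pred):
        return weight * K.mean(K.square(y_true - y_pred), axis=-1)
    return loss

inp = Input(shape=(4,))
out = Dense(1)(inp)
model = Model(inp, out)

# The *result of calling* the wrapper is passed to compile, not the wrapper itself.
model.compile(optimizer='adam', loss=weighted_mse(0.5))
model.fit(np.random.rand(32, 4), np.random.rand(32, 1), epochs=1, batch_size=8)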

I was also wondering about the same question, and about related things such as how to add a loss function within intermediate layers. Here I'm sharing some of what I've observed; I hope it helps others. It's true that standard Keras loss functions only take two arguments, y_true and y_pred. But in some experiments we need an external parameter or coefficient in addition to these two values while computing the loss. This can be needed at the last layer, as usual, or somewhere in the middle of the model.
model.add_loss()
The accepted answer describes model.add_loss() correctly: the added loss can depend on the layer's inputs (tensors). According to the official docs, when writing the call method of a custom layer or a subclassed model, we may want to compute scalar quantities that we want to minimize during training (e.g. regularization losses). We can use the add_loss() layer method to keep track of such loss terms. For instance, activity-regularization losses depend on the inputs passed when calling a layer. Here's an example of a layer that adds a sparsity regularization loss based on the L2 norm of the inputs:
import tensorflow as tf
from tensorflow.keras.layers import Layer

class MyActivityRegularizer(Layer):
    """Layer that creates an activity sparsity regularization loss."""
    def __init__(self, rate=1e-2):
        super(MyActivityRegularizer, self).__init__()
        self.rate = rate

    def call(self, inputs):
        # We use `add_loss` to create a regularization loss
        # that depends on the inputs.
        self.add_loss(self.rate * tf.reduce_sum(tf.square(inputs)))
        return inputs
Loss values added via add_loss can be retrieved in the .losses list property of any Layer or Model (they are recursively retrieved from every underlying layer):
from tensorflow.keras import layers

class SparseMLP(Layer):
    """Stack of Linear layers with a sparsity regularization loss."""
    def __init__(self, output_dim):
        super(SparseMLP, self).__init__()
        self.dense_1 = layers.Dense(32, activation=tf.nn.relu)
        self.regularization = MyActivityRegularizer(1e-2)
        self.dense_2 = layers.Dense(output_dim)

    def call(self, inputs):
        x = self.dense_1(inputs)
        x = self.regularization(x)
        return self.dense_2(x)

mlp = SparseMLP(1)
y = mlp(tf.ones((10, 10)))
print(mlp.losses)  # List containing one float32 scalar
Also note, when using model.fit(), such loss terms are handled automatically. When writing a custom training loop, we should retrieve these terms by hand from model.losses, like this:
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

# Iterate over the batches of a dataset.
for x, y in dataset:
    with tf.GradientTape() as tape:
        # Forward pass.
        logits = model(x)
        # Loss value for this batch.
        loss_value = loss_fn(y, logits)
        # Add extra loss terms to the loss value.
        loss_value += sum(model.losses)  # <------------- HERE
    # Update the weights of the model to minimize the loss value.
    gradients = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))
Custom losses
model.add_loss() can (AFAIK) be used anywhere in the middle of the network, and we are no longer bound to only two parameters, i.e. y_true and y_pred. But what if we also want to pass an external parameter or coefficient to the loss function applied at the last layer of the network? Nric's answer shows one way. It can also be done by subclassing the tf.keras.losses.Loss class and implementing the following two methods:
__init__(self): accept parameters to pass during the call of your loss function
call(self, y_true, y_pred): use the targets (y_true) and the model predictions (y_pred) to compute the model's loss
Here is an example of a custom MSE implemented by subclassing tf.keras.losses.Loss. Here, too, we are no longer bound to only y_true and y_pred.
import tensorflow as tf
from tensorflow import keras

class CustomMSE(keras.losses.Loss):
    def __init__(self, regularization_factor=0.1, name="custom_mse"):
        super().__init__(name=name)
        self.regularization_factor = regularization_factor

    def call(self, y_true, y_pred):
        mse = tf.math.reduce_mean(tf.square(y_true - y_pred))
        reg = tf.math.reduce_mean(tf.square(0.5 - y_pred))
        return mse + reg * self.regularization_factor

model.compile(optimizer=..., loss=CustomMSE())
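A minimal usage sketch (the one-output regression model and the random data below are made up for illustration): the constructor carries the extra coefficient, while call() still receives only (y_true, y_pred).
import numpy as np
import tensorflow as tf
from tensorflow import keras

inputs = keras.Input(shape=(8,))
outputs = keras.layers.Dense(1)(inputs)
model = keras.Model(inputs, outputs)

# The extra coefficient goes into the constructor, not into call().
model.compile(optimizer=keras.optimizers.Adam(),
              loss=CustomMSE(regularization_factor=0.02))
model.fit(np.random.rand(64, 8), np.random.rand(64, 1), epochs=1, batch_size=16)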

Try this:
import pandas as pd
import numpy as np
import pickle
import matplotlib.pyplot as plt
from scipy import stats
import tensorflow as tf
import seaborn as sns
from pylab import rcParams
from sklearn.model_selection import train_test_split
from keras.models import Model, load_model, Sequential
from keras.layers import Input, Lambda, Dense, Dropout, Layer, Bidirectional, Embedding, LSTM, RepeatVector, TimeDistributed, BatchNormalization, Activation, Merge
from keras.callbacks import ModelCheckpoint, TensorBoard
from keras import regularizers
from keras import backend as K
from keras import metrics
from scipy.stats import norm
from keras.utils import to_categorical
from keras import initializers
bias = bias_initializer='zeros'
from keras import objectives
np.random.seed(22)
data1 = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0], dtype='int32')
data2 = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0], dtype='int32')
data3 = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0], dtype='int32')
#train = np.zeros(shape=(992,54))
#test = np.zeros(shape=(921,54))
train = np.zeros(shape=(300,54))
test = np.zeros(shape=(300,54))
for n, i in enumerate(train):
    if (n<=100):
        train[n] = data1
    elif (n>100 and n<=200):
        train[n] = data2
    elif(n>200):
        train[n] = data3
for n, i in enumerate(test):
    if (n<=100):
        test[n] = data1
    elif(n>100 and n<=200):
        test[n] = data2
    elif(n>200):
        test[n] = data3
batch_size = 5
original_dim = train.shape[1]
intermediate_dim45 = 45
intermediate_dim35 = 35
intermediate_dim25 = 25
intermediate_dim15 = 15
intermediate_dim10 = 10
intermediate_dim5 = 5
latent_dim = 3
epochs = 50
epsilon_std = 1.0
def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim), mean=0.,
                              stddev=epsilon_std)
    return z_mean + K.exp(z_log_var / 2) * epsilon
x = Input(shape=(original_dim,), name = 'first_input_mario')
h1 = Dense(intermediate_dim45, activation='relu', name='h1')(x)
hD = Dropout(0.5)(h1)
h2 = Dense(intermediate_dim25, activation='relu', name='h2')(hD)
h3 = Dense(intermediate_dim10, activation='relu', name='h3')(h2)
h = Dense(intermediate_dim5, activation='relu', name='h')(h3) # was relu before
h = Dropout(0.1)(h)
z_mean = Dense(latent_dim, activation='relu')(h)
z_log_var = Dense(latent_dim, activation='relu')(h)
z = Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_log_var])
decoder_h = Dense(latent_dim, activation='relu')
decoder_h1 = Dense(intermediate_dim5, activation='relu')
decoder_h2 = Dense(intermediate_dim10, activation='relu')
decoder_h3 = Dense(intermediate_dim25, activation='relu')
decoder_h4 = Dense(intermediate_dim45, activation='relu')
decoder_mean = Dense(original_dim, activation='sigmoid')
h_decoded = decoder_h(z)
h_decoded1 = decoder_h1(h_decoded)
h_decoded2 = decoder_h2(h_decoded1)
h_decoded3 = decoder_h3(h_decoded2)
h_decoded4 = decoder_h4(h_decoded3)
x_decoded_mean = decoder_mean(h_decoded4)
vae = Model(x, x_decoded_mean)
def vae_loss(x, x_decoded_mean):
    xent_loss = objectives.binary_crossentropy(x, x_decoded_mean)
    kl_loss = -0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var))
    loss = xent_loss + kl_loss
    return loss
vae.compile(optimizer='rmsprop', loss=vae_loss)
vae.fit(train, train, batch_size = batch_size, epochs=epochs, shuffle=True,
validation_data=(test, test))
vae = Model(x, x_decoded_mean)
encoder = Model(x, z_mean)
decoder_input = Input(shape=(latent_dim,))
_h_decoded = decoder_h (decoder_input)
_h_decoded1 = decoder_h1 (_h_decoded)
_h_decoded2 = decoder_h2 (_h_decoded1)
_h_decoded3 = decoder_h3 (_h_decoded2)
_h_decoded4 = decoder_h4 (_h_decoded3)
_x_decoded_mean = decoder_mean(_h_decoded4)
generator = Model(decoder_input, _x_decoded_mean)
generator.summary()

You need to change the compile line to
vae.compile(optimizer='rmsprop', loss=vae_loss)

Related

Machine Translation FFN : Dimension problem due to window size

This is my first time creating an FFN and training it to translate French to English using word prediction:
The inputs are two arrays, one of size 2 x window_size + 1 from the source language and one of size window_size from the target language, plus a label of size 1.
For example, for window_size = 2:
["je","mange", "la", "pomme","avec"]
and
["I", "eat"]
So the inputs are of size [5] and [2], which gives 7 after concatenation.
Label: "the" (referring to "la" in French)
The label is converted to one-hot encoding before comparing with yHat.
I'm using a unique index for each word (1 to len(vocab)) and training on the indices (not the words).
The output of the FFN is a probability distribution over the target-language vocabulary.
The problem is that the FFN doesn't learn and the accuracy stays at 0.
When I print the sizes of y_final (target probability) and yHat (model output), they have different dimensions:
yHat.size() = [512, 7, 10212]
where 512 is the batch size, 7 is the concatenated input size and 10212 is the size of the target vocab, while
y_final.size() = [512, 10212]
And over all the forward method I have these sizes:
torch.Size([512, 5, 32])
torch.Size([512, 5, 64])
torch.Size([512, 5, 64])
torch.Size([512, 2, 256])
torch.Size([512, 2, 32])
torch.Size([512, 2, 64])
torch.Size([512, 2, 64])
torch.Size([512, 7, 64])
torch.Size([512, 7, 128])
torch.Size([512, 7, 10212])
Since the accuracy only increases when yHat matches y_final, I suspect this never happens because they don't even have the same shape (3D vs 2D). Is this the problem?
Please refer to the code and if you need any other info please tell me.
The code is working fine, no errors.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from tqdm import tqdm

trainingData = TensorDataset(encoded_source_windows, encoded_target_windows, encoded_labels)
# print(trainingData)
batchsize = 512
trainingLoader = DataLoader(trainingData, batch_size=batchsize, drop_last=True)

def ffnModel(vocabSize1, vocabSize2, learningRate=0.01):
    class ffNetwork(nn.Module):
        def __init__(self):
            super().__init__()
            self.embeds_src = nn.Embedding(vocabSize1, 256)
            self.embeds_target = nn.Embedding(vocabSize2, 256)
            # input layer
            self.inputSource = nn.Linear(256, 32)
            self.inputTarget = nn.Linear(256, 32)
            # hidden layer 1
            self.fc1 = nn.Linear(32, 64)
            self.bnormS = nn.BatchNorm1d(5)
            self.bnormT = nn.BatchNorm1d(2)
            # Layer(s) after concatenation:
            self.fc2 = nn.Linear(64, 128)
            self.output = nn.Linear(128, vocabSize2)
            self.softmaaax = nn.Softmax(dim=0)

        # forward pass
        def forward(self, xSource, xTarget):
            xSource = self.embeds_src(xSource)
            xSource = F.relu(self.inputSource(xSource))
            xSource = F.relu(self.fc1(xSource))
            xSource = self.bnormS(xSource)
            xTarget = self.embeds_target(xTarget)
            xTarget = F.relu(self.inputTarget(xTarget))
            xTarget = F.relu(self.fc1(xTarget))
            xTarget = self.bnormT(xTarget)
            xCat = torch.cat((xSource, xTarget), dim=1)  # dim=128 or 1 ?
            xCat = F.relu(self.fc2(xCat))
            print(xCat.size())
            xCat = self.softmaaax(self.output(xCat))
            return xCat

    # creating instance of the class
    net = ffNetwork()
    # loss function
    lossfun = nn.CrossEntropyLoss()
    # lossfun = nn.NLLLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=learningRate)
    return net, lossfun, optimizer
def trainModel(vocabSize1, vocabSize2, learningRate):
    # number of epochs
    numepochs = 64
    # create a new Model instance
    net, lossfun, optimizer = ffnModel(vocabSize1, vocabSize2, learningRate)
    # initialize losses
    losses = torch.zeros(numepochs)
    trainAcc = []
    # loop over training data batches
    batchAcc = []
    batchLoss = []
    for epochi in range(numepochs):
        # Switching on training mode
        net.train()
        # loop over training data batches
        batchAcc = []
        batchLoss = []
        for A, B, y in tqdm(trainingLoader):
            # forward pass and loss
            final_y = []
            for i in range(y.size(dim=0)):
                yy = [0] * target_vocab_length
                yy[y[i]] = 1
                final_y.append(yy)
            final_y = torch.tensor(final_y)
            yHat = net(A, B)
            loss = lossfun(yHat, final_y)
            ################
            print("\n yHat.size()")
            print(yHat.size())
            print("final_y.size()")
            print(final_y.size())
            # backprop
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # loss from this batch
            batchLoss.append(loss.item())
            print(f'batchLoss: {loss.item()}')
            # Accuracy calculator:
            matches = torch.argmax(yHat) == final_y  # booleans (false/true)
            matchesNumeric = matches.float()  # convert to numbers (0/1)
            accuracyPct = 100 * torch.mean(matchesNumeric)  # average and x100
            batchAcc.append(accuracyPct)  # add to list of accuracies
            print(f'accuracyPct: {accuracyPct}')
        trainAcc.append(np.mean(batchAcc))
        losses[epochi] = np.mean(batchLoss)
    return trainAcc, losses, net
trainAcc,losses,net = trainModel(len(source_vocab),len(target_vocab), 0.01)
print(trainAcc)

How to understand the origin in vtkImageData?

I don't know how to interpret the origin of a vtkImageData. The documentation says that the origin is the coordinate of the point with index (0,0,0) in the image. However, when I use vtkImageReslice to get two resliced images, the origins are different but the images are the same. My code is:
from vtk.util.numpy_support import vtk_to_numpy, numpy_to_vtk
import vtk
import numpy as np
def vtkToNumpy(data):
    temp = vtk_to_numpy(data.GetPointData().GetScalars())
    dims = data.GetDimensions()
    numpy_data = temp.reshape(dims[2], dims[1], dims[0])
    numpy_data = numpy_data.transpose(2,1,0)
    return numpy_data

def numpyToVTK(data):
    flat_data_array = data.transpose(2,1,0).flatten()
    vtk_data_array = numpy_to_vtk(flat_data_array)
    vtk_data = numpy_to_vtk(num_array=vtk_data_array, deep=True, array_type=vtk.VTK_FLOAT)
    img = vtk.vtkImageData()
    img.GetPointData().SetScalars(vtk_data)
    img.SetDimensions(data.shape)
    return img
img = np.zeros(shape=[512,512,120])
img[0:300,0:100,:] = 1
vtkImg = numpyToVTK(img)
reslice = vtk.vtkImageReslice()
reslice.SetInputData(vtkImg)
reslice.SetAutoCropOutput(True)
reslice.SetOutputDimensionality(2)
reslice.SetInterpolationModeToCubic()
reslice.SetSlabNumberOfSlices(1)
reslice.SetOutputSpacing(1.0,1.0,1.0)
axialElement = [
1, 0, 0, 256,
0, 1, 0, 100,
0, 0, 1, 100,
0, 0, 0, 1
]
resliceAxes = vtk.vtkMatrix4x4()
resliceAxes.DeepCopy(axialElement)
reslice.SetResliceAxes(resliceAxes)
reslice.Update()
reslicedImg = reslice.GetOutput()
print('case 1', reslicedImg.GetOrigin())
reslicedNpImg = vtkToNumpy(reslicedImg)
import matplotlib.pyplot as plt
plt.figure()
plt.imshow(reslicedNpImg[:,:,0])
plt.show()
For another axialElement:
axialElement = [
1, 0, 0, 356,
0, 1, 0, 100,
0, 0, 1, 100,
0, 0, 0, 1
]
The two different axialElement would generate the same image, but the origin of image are different. So, I am confused about the origin in vtkImageData.
When you read in the image, you are using numpy-to-VTK functions that create a vtkImageData object. Do these functions set the origin to what you expect?
img.SetOrigin() needs to be called with the expected origin; otherwise the origin is left at a default value (the same for your two images). Printing out the image data and checking that the spacing, origin and direction are what you expect is important.
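For example, here is a hedged sketch of the question's conversion helper with the origin (and spacing) set explicitly; the default values are placeholders to be replaced with the metadata of your actual volume, and the array is converted to VTK only once:
import vtk
from vtk.util.numpy_support import numpy_to_vtk

def numpyToVTK(data, origin=(0.0, 0.0, 0.0), spacing=(1.0, 1.0, 1.0)):
    flat_data_array = data.transpose(2, 1, 0).flatten()
    vtk_data = numpy_to_vtk(num_array=flat_data_array, deep=True, array_type=vtk.VTK_FLOAT)
    img = vtk.vtkImageData()
    img.GetPointData().SetScalars(vtk_data)
    img.SetDimensions(data.shape)
    img.SetOrigin(origin)    # otherwise the default (0, 0, 0) is used
    img.SetSpacing(spacing)  # same story for the voxel spacing
    return img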

Accuracy from sess.run() is returned as bytes. How can I get the actual value?

I am new to CNNs and tried to train a CNN model. However, when I try to print the accuracy returned from the CNN, I get results in bytes format like b'\n\x11\n\naccuracy_1\x15\x00\x00\x80<'. When I print loss_train, obtained from the same sess.run, I get a value of 1419.06. Why is this happening?
########################################################################################################################
#IMPORT PACKAGES
import math
import shutil
import pywt
import sys
import random
import numpy as np
import h5py
import pip
import os
from os import system
import tensorflow as tf
from PIL import Image
import matplotlib
import matplotlib.pyplot as plt
import skimage.io as io
import matplotlib.image as mpimg
import time
np.random.seed(1)
slim = tf.contrib.slim
########################################################################################################################
########################################################################################################################
#The FLAGS are used to assign constant values to several paths as well as variables that will be constantly used.
flags = tf.app.flags
flags.DEFINE_string('dataset_dir','E:\\CODING\\CNN_Compressed\\Trial\\Codes\\using_numpy\\NWPU-RESISC45\\NWPU-RESISC45\\','E:\\CODING\\CNN_Compressed\\Trial\\Codes\\using_numpy\\NWPU-RESISC45\\NWPU-RESISC45\\')
flags.DEFINE_float('validation_size', 0.1, 'Float: The proportion of examples in the dataset to be used for validation')
flags.DEFINE_float('test_size', 0.1, 'Float: The proportion of examples in the dataset to be used for test')
flags.DEFINE_integer('num_shards', 1, 'Int: Number of shards to split the TFRecord files into')
flags.DEFINE_integer('random_seed', 0, 'Int: Random seed to use for repeatability.')
flags.DEFINE_string('tfrecord_filename', None, 'String: The output filename to name your TFRecord file')
tf.app.flags.DEFINE_integer('target_image_height', 256, 'train input image height')
tf.app.flags.DEFINE_integer('target_image_width', 256, 'train input image width')
tf.app.flags.DEFINE_integer('batch_size', 128, 'batch size of training.')
tf.app.flags.DEFINE_integer('num_epochs', 30, 'epochs of training.')
tf.app.flags.DEFINE_float('learning_rate', 0.001, 'learning rate of training.')
FLAGS = flags.FLAGS
img_size = 256
num_channels=3
num_classes=45
########################################################################################################################
########################################################################################################################
datapath_train = 'E:\\CODING\\CNN_Compressed\\Trial\\Codes\\using_numpy\\NWPU-RESISC45\\NWPU-RESISC45\\train\\None_train_00000-of-00001.tfrecord'
def _extract_fn(tfrecord):
    features={
        'image/encoded': tf.FixedLenFeature([], tf.string),
        'image/format': tf.FixedLenFeature([], tf.string),
        'image/class/label': tf.FixedLenFeature([], tf.int64),
        'image/height': tf.FixedLenFeature([], tf.int64),
        'image/width': tf.FixedLenFeature([], tf.int64),
        'image/channels': tf.FixedLenFeature([], tf.int64)
    }
    parsed_example = tf.parse_single_example(tfrecord, features)
    image_de = tf.io.decode_raw(parsed_example['image/encoded'], tf.uint8)
    img_height = tf.cast(parsed_example['image/height'], tf.int32)
    img_width = tf.cast(parsed_example['image/width'], tf.int32)
    img_channel = tf.cast(parsed_example['image/channels'], tf.int32)
    img_shape = tf.stack([img_height, img_width, img_channel])
    label = tf.cast(parsed_example['image/class/label'], tf.int64)
    image = tf.reshape(image_de, img_shape)
    #label = parsed_example['image/class/label']
    return image, img_shape, label
########################################################################################################################
#########################################################################################################################
"""
# Pipeline of dataset and iterator
dataset = tf.data.TFRecordDataset(datapath)
# Parse the record into tensors.
dataset = dataset.map(_extract_fn)
# Generate batches
dataset = dataset.batch(1)
# Create a one-shot iterator
iterator = dataset.make_one_shot_iterator()
image, img_shape, label = iterator.get_next()
with tf.Session() as sess:
    try:
        print(sess.run(img_shape))
        image_batch = sess.run(image)
        print(image_batch)
        img_bas = tf.cast(image_batch, tf.uint8)
        plt.imshow(image_batch[0,:,:,:]*255)
        plt.show()
    except tf.errors.OutOfRangeError:
        pass"""
########################################################################################################################
########################################################################################################################
#INITIALIZATION FOR THE CNN ARCHITECTURE
filter_size_conv1 = [5,5]
num_filters_conv1 = 32
filter_shape_pool1 = [2,2]
filter_size_conv2 = [3,3]
num_filters_conv2 = 64
filter_shape_pool2 = [2,2]
#PLACEHOLDERS
x = tf.placeholder(tf.float32, shape = [None, img_size,img_size,num_channels], name='x')
y = tf.placeholder(tf.int32, shape= [None], name = 'ytrue') #Output data placeholder
y_one_hot = tf.one_hot(y,45)
y_true_cls = tf.argmax(y_one_hot, dimension=1)
########################################################################################################################
########################################################################################################################
def new_conv_layer(input, num_input_channels, filter_size, num_filters, name):
    with tf.variable_scope(name) as scope:
        # Shape of the filter-weights for the convolution
        shape = [filter_size, filter_size, num_input_channels, num_filters]
        # Create new weights (filters) with the given shape
        weights = tf.Variable(tf.truncated_normal(shape, stddev=0.05))
        # Create new biases, one for each filter
        biases = tf.Variable(tf.constant(0.05, shape=[num_filters]))
        # TensorFlow operation for convolution
        layer = tf.nn.conv2d(input=input, filter=weights, strides=[1, 1, 1, 1], padding='SAME')
        # Add the biases to the results of the convolution.
        layer += biases
        return layer, weights

def new_pool_layer(input, name):
    with tf.variable_scope(name) as scope:
        # TensorFlow operation for max pooling
        layer = tf.nn.max_pool(value=input, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        return layer

def new_relu_layer(input, name):
    with tf.variable_scope(name) as scope:
        # TensorFlow operation for ReLU
        layer = tf.nn.relu(input)
        return layer

def new_fc_layer(input, num_inputs, num_outputs, name):
    with tf.variable_scope(name) as scope:
        # Create new weights and biases.
        weights = tf.Variable(tf.truncated_normal([num_inputs, num_outputs], stddev=0.05))
        biases = tf.Variable(tf.constant(0.05, shape=[num_outputs]))
        # Multiply the input and weights, and then add the bias-values.
        layer = tf.matmul(input, weights) + biases
        return layer
# CONVOLUTIONAL LAYER 1
layer_conv1, weights_conv1 = new_conv_layer(input=x, num_input_channels=3, filter_size=5, num_filters=32, name ="conv1")
# Pooling Layer 1
layer_pool1 = new_pool_layer(layer_conv1, name="pool1")
# RelU layer 1
layer_relu1 = new_relu_layer(layer_pool1, name="relu1")
# CONVOLUTIONAL LAYER 2
layer_conv2, weights_conv2 = new_conv_layer(input=layer_relu1, num_input_channels=32, filter_size=5, num_filters=64, name= "conv2")
# Pooling Layer 2
layer_pool2 = new_pool_layer(layer_conv2, name="pool2")
# RelU layer 2
layer_relu2 = new_relu_layer(layer_pool2, name="relu2")
# FLATTEN LAYER
num_features = layer_relu2.get_shape()[1:4].num_elements()
layer_flat = tf.reshape(layer_relu2, [-1, num_features])
# FULLY-CONNECTED LAYER 1
layer_fc1 = new_fc_layer(layer_flat, num_inputs=num_features, num_outputs=1000, name="fc1")
# RelU layer 3
layer_relu3 = new_relu_layer(layer_fc1, name="relu3")
# FULLY-CONNECTED LAYER 2
layer_fc2 = new_fc_layer(input=layer_relu3, num_inputs=1000, num_outputs=45, name="fc2")
# Use Softmax function to normalize the output
with tf.variable_scope("Softmax"):
y_pred = tf.nn.softmax(layer_fc2)
y_pred_cls = tf.argmax(y_pred, dimension=1)
# Use Cross entropy cost function
with tf.name_scope("cross_ent"):
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=layer_fc2, labels=y_one_hot)
cost = tf.reduce_mean(cross_entropy)
# Use Adam Optimizer
with tf.name_scope("optimizer"):
optimizer = tf.train.AdamOptimizer(learning_rate = 1e-4).minimize(cost)
# Accuracy
with tf.name_scope("accuracy"):
correct_prediction = tf.equal(y_pred_cls, y_true_cls)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# setup the initialisation operator
init_op = tf.global_variables_initializer()
# Pipeline of dataset and iterator
dataset_train = tf.data.TFRecordDataset(datapath_train)
# Parse the record into tensors.
dataset_train = dataset_train.map(_extract_fn)
# Generate batches
dataset_train = dataset_train.batch(FLAGS.batch_size)
iterator_train = dataset_train.make_initializable_iterator()
next_element_train = iterator_train.get_next()
print('\n Starting the CNN train')
# Initialize the FileWriter
writer_train = tf.summary.FileWriter("Training_FileWriter/")
writer_val = tf.summary.FileWriter("Validation_FileWriter/")
#summary
accuracy = tf.summary.scalar("accuracy", accuracy)
loss = tf.summary.scalar("loss", cost)
# Merge all summaries together
merged_summary = tf.summary.merge_all()
#PERFORM THE CNN OPERATIONS
with tf.Session() as sess:
    sess.run(init_op)
    sess.run(iterator_train.initializer)
    # Add the model graph to TensorBoard
    writer_train.add_graph(sess.graph)
    writer_val.add_graph(sess.graph)
    # Loop over number of epochs
    print('\nTraining')
    for epoch in range(FLAGS.num_epochs):
        sess.run(iterator_train.initializer)
        start_time = time.time()
        train_accuracy = 0
        validation_accuracy = 0
        acc_train_avg = 0
        val_acc_avg = 0
        for batch in range(0, int(25200/FLAGS.batch_size)):
            img_train, shp_train, lbl_train = sess.run(next_element_train)
            _, loss_train, acc_train, acc_summ = sess.run([optimizer, cost, accuracy, merged_summary], feed_dict = {x: img_train, y: lbl_train})
            print(loss_train)
            print(acc_train)
            train_accuracy += acc_train
        end_time = time.time()
        #acc_train_avg = (train_accuracy/(int(25200/FLAGS.batch_size)))
        #TRAINING
        print("Epoch "+str(epoch+1)+" completed : Time usage "+str(int(end_time-start_time))+" seconds")
        print("\tAccuracy:")
        print("\t- Training Loss:\t{}", loss_train)
        print("\t- Training Accuracy:\t{}", acc_train)
        writer_train.add_summary(acc_summ, epoch+1)
#######################################################################################################################
The error obtained is:
Training
1427.1069
b'\n\x11\n\naccuracy_1\x15\x00\x00\x80<'
Traceback (most recent call last):
File "train_trial.py", line 302, in <module>
train_accuracy+=acc_train
TypeError: unsupported operand type(s) for +=: 'int' and 'bytes'
You are overwriting your loss and accuracy operations here:
accuracy = tf.summary.scalar("accuracy", accuracy)
loss = tf.summary.scalar("loss", cost)
Then when you run accuracy you get the protobuf bytes of the summary, instead of just running the op. You should rename these variables to prevent overwriting/name clashes.
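For instance, a sketch of only the lines that need to change (the rest of the training script stays as it is): keep the float tensors and the serialized summary ops under different names.
# Keep the tensor ops and the summary ops under different names.
accuracy_summary = tf.summary.scalar("accuracy", accuracy)
loss_summary = tf.summary.scalar("loss", cost)
merged_summary = tf.summary.merge_all()

# Inside the training loop, `accuracy` is then still the float tensor:
# _, loss_train, acc_train, acc_summ = sess.run(
#     [optimizer, cost, accuracy, merged_summary],
#     feed_dict={x: img_train, y: lbl_train})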

One-hot vector prediction always returns the same value

My deep neural network returns the same output for every input. I tried (with no luck) different variations of:
loss
optimizer
network topology / layers types
number of epochs (1-100)
I have 3 outputs (one-hot) and for every input the outputs look like this (they change after every training run):
4.701869785785675049e-01 4.793547391891479492e-01 2.381391078233718872e-01
This problem probably happens because of the highly random nature of my training data (stock prediction).
The data set is also heavily skewed towards one of the answers (that's why I used sample_weight, calculated proportionally).
I think I can rule out overfitting (it happens even for 1 epoch and I have dropout layers).
One of the examples of my network:
xs_conv = xs.reshape(xs.shape[0], xs.shape[1], 1)
model_conv = Sequential()
model_conv.add(Conv1D(128, 15, input_shape=(input_columns,1), activation='relu'))
model_conv.add(MaxPooling1D(pool_size=3))
model_conv.add(Dropout(0.4))
model_conv.add(Conv1D(64, 15, input_shape=(input_columns,1), activation='relu'))
model_conv.add(MaxPooling1D(pool_size=3))
model_conv.add(Dropout(0.4))
model_conv.add(Flatten())
model_conv.add(Dense(128, activation='relu'))
model_conv.add(Dropout(0.4))
model_conv.add(Dense(3, activation='sigmoid'))
model_conv.compile(loss='mean_squared_error', optimizer='nadam', metrics=['accuracy'])
model_conv.fit(xs_conv, ys, epochs=10, batch_size=16, sample_weight=sample_weight, validation_split=0.3, shuffle=True)
I would understand if the outputs were random, but what happens seems very peculiar. Any ideas?
Data: computed.csv
Whole code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from keras.layers import Input, Dense, Conv1D, Dropout, MaxPooling1D, Flatten
from keras.models import Model, Sequential
from keras import backend as K
import random
DATA_DIR = '../../Data/'
INPUT_DATA_FILE = DATA_DIR + 'computed.csv'
def get_y(row):
    profit = 0.010
    hot_one = [0,0,0]
    hot_one[0] = int(row.close_future_5 >= profit)
    hot_one[1] = int(row.close_future_5 <= -profit)
    hot_one[2] = int(row.close_future_5 < profit and row.close_future_10 > -profit)
    return hot_one

def rolling_window(window, arr):
    return [np.array(arr[i:i+window]).transpose().flatten().tolist() for i in range(0, len(arr))][0:-window+1]

def prepare_data(data, window, test_split):
    xs1 = data.iloc[:,1:26].as_matrix()
    ys1 = [get_y(row) for row in data.to_records()]
    xs = np.array(rolling_window(window, xs1)).tolist()
    ys = ys1[0:-window+1]
    zipped = list(zip(xs, ys))
    random.shuffle(zipped)
    train_size = int((1.0 - test_split) * len(data))
    xs, ys = zip(*zipped[0:train_size])
    xs_test, ys_test = zip(*zipped[train_size:])
    return np.array(xs), np.array(ys), np.array(xs_test), np.array(ys_test)

def get_sample_weight(y):
    if(y[0]): return ups_w
    elif(y[1]): return downs_w
    else: return flats_w
data = pd.read_csv(INPUT_DATA_FILE)
window = 30
test_split = .9
xs, ys, xs_test, ys_test = prepare_data(data, window, test_split)
ups_cnt = sum(y[0] for y in ys)
downs_cnt = sum(y[1] for y in ys)
flats_cnt = sum(y[0] == False and y[1] == False for y in ys)
total_cnt = ups_cnt + downs_cnt + flats_cnt
ups_w = total_cnt/ups_cnt
downs_w = total_cnt/downs_cnt
flats_w = total_cnt/flats_cnt
sample_weight = np.array([get_sample_weight(y) for y in ys])
_, input_columns = xs.shape
xs_conv = xs.reshape(xs.shape[0], xs.shape[1], 1)
model_conv = Sequential()
model_conv.add(Conv1D(128, 15, input_shape=(input_columns,1), activation='relu'))
model_conv.add(MaxPooling1D(pool_size=3))
model_conv.add(Dropout(0.4))
model_conv.add(Conv1D(64, 15, input_shape=(input_columns,1), activation='relu'))
model_conv.add(MaxPooling1D(pool_size=3))
model_conv.add(Dropout(0.4))
model_conv.add(Flatten())
model_conv.add(Dense(128, activation='relu'))
model_conv.add(Dropout(0.4))
model_conv.add(Dense(3, activation='sigmoid'))
model_conv.compile(loss='mean_squared_error', optimizer='nadam', metrics=['accuracy'])
model_conv.fit(xs_conv, ys, epochs=1, batch_size=16, sample_weight=sample_weight, validation_split=0.3, shuffle=True)
xs_test_conv = xs_test.reshape(xs_test.shape[0], xs_test.shape[1], 1)
res = model_conv.predict(xs_test_conv)
plotdata = pd.concat([pd.DataFrame(res, columns=['res_up','res_down','res_flat']), pd.DataFrame(ys_test, columns=['ys_up','ys_down','y_flat'])], axis = 1)
plotdata[['res_up', 'ys_up']][3000:3500].plot(figsize=(20,4))
plotdata[['res_down', 'ys_down']][3000:3500].plot(figsize=(20,4))
I have run your model with the attached data and so far can say that the biggest problem is lack of data cleaning.
For instance, there's an inf value in the .csv at line 623. After I filtered them all out with
xs1 = xs1[np.isfinite(xs1).all(axis=1)]
... I collected some statistics over xs, namely min, max and mean. They turned out pretty remarkable:
-43.0049723138
32832.3333333 # !!!
0.213126234391
On average, the values are close to 0, but some are 6 orders of magnitude higher. These particular rows definitely hurt the neural network, so you should either filter them as well or come up with a clever way to normalize the features.
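A sketch of both steps (filtering non-finite rows and standardizing each feature column), reusing the names from the question's code; the exact normalization scheme is a judgment call:
import numpy as np

xs1 = data.iloc[:, 1:26].values
mask = np.isfinite(xs1).all(axis=1)   # drop rows containing inf / NaN
xs1 = xs1[mask]
# Remember to filter the corresponding labels (ys1) with the same mask.

# Standardize each feature so no column dominates by orders of magnitude.
mean = xs1.mean(axis=0)
std = xs1.std(axis=0)
std[std == 0] = 1.0                   # guard against constant columns
xs1 = (xs1 - mean) / std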
But even with those extreme rows left in, the model ended up with 71-79% validation accuracy. The resulting distribution is a bit skewed towards the 3rd class, but in general it is too diverse to call it peculiar: 19% for class 1, 7% for class 2, 73% for class 3. Example test output:
[[ 1.93120316e-02 4.47684433e-04 9.97518778e-01]
[ 1.40607255e-02 2.45630667e-02 9.74113524e-01]
[ 3.07740629e-01 4.80920941e-01 2.28664145e-01]
...,
[ 5.72797097e-02 9.45571139e-02 8.07634115e-01]
[ 1.05512664e-01 8.99530351e-02 6.70437515e-01]
[ 5.24505274e-03 1.46622911e-01 9.42657173e-01]]

scipy transferfunction vs state space

I have an LTI system which I am modeling using scipy.signal, but I get different results when using TransferFunction and StateSpace.
Besides the magnitudes of both the Bode plot and the step response being different, the StateSpace representation adds two more zeros to the LTI system that I know are not there.
I know for a fact that the two descriptions should be equivalent (at least I think I do), because I redid the math several times.
Could someone please help me explain what is happening?
The code for the transfer function:
from params import *
numeratorthetaact = [Kt*Jl/(L*Jl*Ja), Kt*(betal+betac)/(L*Jl*Ja), Kt*kc/(L*Jl*Ja)]
denominatorthetaact = [1.0,
(L*betac*(Ja+Jl))/(Jl*Ja) + R/L,
(Ja+Jl)*(L*kc+R*betac)/(Jl*Ja*L) + (Kt*Kb)/(Ja*L),
(R*kc*(Ja+Jl))/(L*Jl*Ja) + (betac*Kt*Kb)/(L*Jl*Ja),
(kc*Kt*Kb)/(L*Jl*Ja),
0.0]
tfact = TransferFunction(numeratorthetaact, denominatorthetaact)
sysact = lti(tfact.num, tfact.den)
print "Zeros: ", sysact.zeros
print "Poles: ", sysact.poles
t, swact = step(sysact, T = time)
freqrad = numpy.multiply(frequency, 2.0*numpy.pi)
wrad, magact, phaseact = sysact.bode(w=freqrad)
whz = numpy.multiply(wrad, 1.0/(numpy.pi*2.0))
p.subplot(3,1,1)
p.plot(t, swact, label="Step Galvo")
p.legend()
p.subplot(3,1,2)
p.semilogx(whz, magact, label="Freq Galvo")
p.legend()
p.grid()
p.subplot(3,1,3)
p.semilogx(whz, phaseact, label="Phase Galvo")
p.legend()
p.grid()
p.show()
The code for StateSpace
from params import *
matrixA = numpy.array([[-(betaa+betac)/Ja, -kc/Ja, betac/Ja, kc/Ja, Kb/Ja],
[1.0, 0, 0, 0, 0],
[betac/Jl, kc/Jl, -(betal+betac)/Jl, -kc/Jl, 0],
[0, 0, 1.0, 0, 0],
[-Kb/L, 0, 0, 0, -R/L]])
matrixB = numpy.array([[0],
[0],
[0],
[0],
[1.0/L]])
matrixC = numpy.array([[0, 1.0, 0, 0, 0]])
matrixD = 0.0
ssact = StateSpace(matrixA, matrixB, matrixC, matrixD)
sysact = lti(ssact.A, ssact.B, ssact.C, ssact.D)
print "Zeros: ", sysact.zeros
print "Poles: ", sysact.poles
t, swact = step(sysact, T = time)
freqrad = numpy.multiply(frequency, 2.0*numpy.pi)
wrad, magact, phaseact = sysact.bode(w=freqrad)
whz = numpy.multiply(wrad, 1.0/(numpy.pi*2.0))
p.subplot(3,1,1)
p.plot(t, swact, label="Step Galvo")
p.legend()
p.subplot(3,1,2)
p.semilogx(whz, magact, label="Freq Galvo")
p.legend()
p.grid()
p.subplot(3,1,3)
p.semilogx(whz, phaseact, label="Phase Galvo")
p.legend()
p.grid()
p.show()
The resulting plots:
StateSpace vs TransferFunction
It is also worth mentioning that I get a BadCoefficients: Badly conditioned filter coefficients (numerator): the results may be meaningless warning when running the StateSpace code.
Thank you