Keras infinite loop - tf.keras

The code reads my images from Colab folders, then splits them into a training set and a validation set using a generator. I used the pretrained DenseNet201 model as the base for training. However, for some reason the generator gets caught in an infinite loop and the loop that generates the validation data never executes. Does anyone know how to work around this?
import tensorflow as tf

IMAGE_SIZE = 224
BATCH_SIZE = 64
IMG_SHAPE = (IMAGE_SIZE, IMAGE_SIZE, 3)

# base_dir points to the Colab folder that contains one sub-folder per class
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2)

train_generator = datagen.flow_from_directory(
    base_dir,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE,
    subset='training')

val_generator = datagen.flow_from_directory(
    base_dir,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE,
    subset='validation')

base_model = tf.keras.applications.DenseNet201(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation='softmax')
])

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(train_generator,
                    epochs=2,
                    steps_per_epoch=100,
                    validation_data=val_generator)

In the line:
history = model.fit(train_generator,
                    epochs=2,
                    steps_per_epoch=100,
                    validation_data=val_generator)
change steps_per_epoch=100 to steps_per_epoch=(train_generator.samples // BATCH_SIZE), so that each epoch makes exactly one pass over the training data.

Downgrading TensorFlow finally made it work for me:
!pip uninstall -y tensorflow
!pip install tensorflow==2.1.0

This happens because your validation generator is stuck in an infinite loop and never exits. The training generator stops because of the steps_per_epoch=100 argument you provided, but you haven't specified how many times the validation generator must be called before the validation loss is computed. The matching argument that fixes this is called validation_steps:
history = model.fit(train_generator,
                    epochs=2,
                    steps_per_epoch=100,
                    validation_data=val_generator,
                    validation_steps=50)
This way your validation loss is computed from the data the validation generator returns over 50 calls, and it won't get stuck in an infinite loop.
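If you would rather not hard-code the two step counts, a minimal variation (assuming the generators defined above, where len() on a Keras directory iterator gives the number of batches per pass) derives both values from the generators themselves:
history = model.fit(train_generator,
                    epochs=2,
                    steps_per_epoch=len(train_generator),    # one full pass over the training data
                    validation_data=val_generator,
                    validation_steps=len(val_generator))     # one full pass over the validation data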

Related

K-means in pyspark running infinitely in Jupyter notebook, works fine in Zeppelin notebook

I am running a k-means algorithm in pyspark:
from pyspark.ml.clustering import KMeans
from pyspark.ml.clustering import KMeansModel
import numpy as np
kmeans_modeling = KMeans(k = 3, seed = 0)
model = kmeans_modeling.fit(data.select("parameters"))
The data is a pyspark sql dataframe: pyspark.sql.dataframe.DataFrame
However, the algorithm runs indefinitely; it takes much, much longer than it should for the amount of data in the dataframe.
Does anyone know what could be causing the algorithm to behave like this? I ran this exact code for a different dataframe of the same type, and everything worked fine.
The dataset I used before (which worked) had 72,020 rows and 35 columns, and the present dataset has 60,297 rows and 31 columns, so it is not a size-related problem. The data was normalized in both cases, but I assume the problem lies in the data preparation. Can anyone help me with this? If any other information is needed, let me know in the comments and I will answer or edit the question.
EDIT:
This is what I can show about creating the data:
from pyspark.sql import functions

aux1 = temp.filter("valflag = 0")
sample = spark.read.option("header", "true").option("delimiter", ",").csv("gs://LOCATION.csv").select("id")
data_pre = aux1.join(sample, sample["sample"] == aux1["id"], "leftanti").drop("sample")
data_pre.createOrReplaceTempView("data_pre")
data_pre = spark.table("data_pre")
# cast every column to double
for col in data_pre.columns:
    data_pre = data_pre.withColumn(col, functions.col(col).cast("double"))
data_pre = data_pre.na.fill(0)
data = vectorization_function(df=data_pre, inputCols=inputCols, outputCol="parameters")
EDIT 2: I cannot provide additional information about the data, but I have now realized that the algorithm runs without problems in a Zeppelin notebook but not in a Jupyter notebook; I have edited the tags and title accordingly. Does anyone know why this could be happening?
Here is the documentation on running clustering jobs with Spark ML:
https://spark.apache.org/docs/latest/ml-clustering.html
And the closely related MLlib clustering guide:
https://spark.apache.org/docs/latest/mllib-clustering.html
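Independently of the Jupyter vs. Zeppelin difference, you can at least make sure the job cannot iterate forever by bounding the number of k-means iterations explicitly. A minimal sketch, assuming a dataframe data with a "parameters" feature vector column as in the question:
from pyspark.ml.clustering import KMeans

# Cap the iterations and set a convergence tolerance so fit() always terminates.
kmeans_modeling = KMeans(k=3, seed=0, maxIter=20, tol=1e-4,
                         featuresCol="parameters")
model = kmeans_modeling.fit(data.select("parameters"))
print(model.clusterCenters())  # inspect the fitted centroids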

tensorflow/lite/core/subgraph.cc BytesRequired number of elements overflowed. Node number 1 (CONV_2D) failed to prepare. tflite

I am trying to convert a CNN model into a tflite model. The conversion succeeds, but this error occurs when I try to load and run the model.
I am building a Flutter app.
It initializes the TensorFlow Lite runtime but then raises this error:
I/tflite (27856): Initialized TensorFlow Lite runtime.
E/flutter (27856): [ERROR:flutter/lib/ui/ui_dart_state.cc(166)] Unhandled Exception: PlatformException(Failed to load model, Internal error: Unexpected failure when preparing tensor allocations: tensorflow/lite/core/subgraph.cc BytesRequired number of elements overflowed.
E/flutter (27856):
E/flutter (27856): Node number 1 (CONV_2D) failed to prepare.
I think I have figured out the problem.
After spending days trying to solve it, I found out that the model I was converting was an ImageNet-pretrained InceptionV3. The problem may be that some of its layers could not be converted.
I used the following models instead and they worked perfectly fine:
MobileNet and MobileNetV2.
NASNet Mobile.
Or, if you are new to deep learning and want to skip the training part, you can use Teachable Machine and then convert the result easily.
I hope this helps you guys! Thank you.
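For reference, here is a hedged sketch (not the poster's exact script) of converting one of those Keras models with the TF 2.x converter; the file name is just an example:
import tensorflow as tf

# Build a MobileNetV2 with a fixed input shape, then convert it to TFLite.
model = tf.keras.applications.MobileNetV2(weights='imagenet',
                                          input_shape=(224, 224, 3))
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)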
I ran into the exact same issue over the last few days while trying to load and run a tflite model on Android. I finally figured out how to solve the problem.
I was creating my model using:
model = Xception(include_top=False)
The important part here is include_top=False, together with the default argument input_shape=None.
If you look at the source code of Xception, Inception, MobileNet, or whatever (that you can find here), you will see that at some point before creating the first layer they call
input_shape = imagenet_utils.obtain_input_shape(
    input_shape,
    default_size=<default_size>,
    min_size=<min_size>,
    data_format=backend.image_data_format(),
    require_flatten=include_top,
    weights=weights)
which is implemented here, with the most important part for us being:
if input_shape:
    ...
else:
    if require_flatten:
        input_shape = default_shape
    else:
        if data_format == 'channels_first':
            input_shape = (3, None, None)
        else:
            input_shape = (None, None, 3)
Thus, if I am not mistaken, when we set include_top to False, instead of getting the default shape we end up with an undefined number of rows and columns. I am not sure how this is converted to tflite (no error is raised during conversion), but it really seems that Android cannot work with that (it is probably equivalent to setting an infinite image size). Hence this error when initializing the interpreter:
BytesRequired number of elements overflowed
When I set the proper input_shape argument in the constructor, i.e.
model = Xception(include_top=False, weights=None, input_shape=(rows, cols, channels))
then the converted model was working fine on Android.
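A quick way to see the difference (a hypothetical check, not part of the original answer) is to compare the static input shapes Keras reports with and without an explicit input_shape:
import tensorflow as tf

m1 = tf.keras.applications.Xception(include_top=False, weights=None)
print(m1.input_shape)   # (None, None, None, 3): rows and columns are undefined

m2 = tf.keras.applications.Xception(include_top=False, weights=None,
                                    input_shape=(299, 299, 3))
print(m2.input_shape)   # (None, 299, 299, 3): fully defined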
As for why it is initializing correctly with MobileNetV2 in the same situation, i.e. by creating the model like so:
model = MobileNetV2(include_top=False)
I cannot explain...
Hope this brings an answer to your original question.
In fact, this is specified in the documentation, for instance in Xception:
input_shape: optional shape tuple, only to be specified
if `include_top` is False (otherwise the input shape
has to be `(299, 299, 3)`.
It should have exactly 3 inputs channels,
and width and height should be no smaller than 71.
E.g. `(150, 150, 3)` would be one valid value.
Whilst for MobileNetV2:
input_shape: Optional shape tuple, to be specified if you would
like to use a model with an input image resolution that is not
(224, 224, 3).
It should have exactly 3 inputs channels (224, 224, 3).
You can also omit this option if you would like
to infer input_shape from an input_tensor.
If you choose to include both input_tensor and input_shape then
input_shape will be used if they match, if the shapes
do not match then we will throw an error.
E.g. `(160, 160, 3)` would be one valid value.
Although it is not crystal clear.

Customize metric visualization in MLFlow UI when using mlflow.tensorflow.autolog()

I'm trying to integrate MLflow into my project. Because I'm using tf.keras.fit_generator() for training, I take advantage of mlflow.tensorflow.autolog() (docs here) to enable automatic logging of metrics and parameters:
model = Unet()
optimizer = tf.keras.optimizers.Adam(LEARNING_RATE)
metrics = [IOUScore(threshold=0.5), FScore(threshold=0.5)]
model.compile(optimizer, customized_loss, metrics)

callbacks = [
    tf.keras.callbacks.ModelCheckpoint("model.h5", save_weights_only=True, save_best_only=True, mode='min'),
    tf.keras.callbacks.TensorBoard(log_dir='./logs', profile_batch=0, update_freq='batch'),
]

train_dataset = Dataset(src_dir=SOURCE_DIR)
train_data_loader = DataLoader(train_dataset, BATCH_SIZE, shuffle=True)

with mlflow.start_run():
    mlflow.tensorflow.autolog()
    mlflow.log_param("batch_size", BATCH_SIZE)
    model.fit_generator(
        train_data_loader,
        steps_per_epoch=len(train_data_loader),
        epochs=EPOCHS,
        callbacks=callbacks
    )
I expected something like this (just a demonstration taken from the docs):
However, after the training finished, this is what I got:
How can I configure so that the metric plot will update and display its value at each epoch instead of just showing the latest value?
After searching around, I found this issue related to my problem. All my metrics were being logged only once per training run (instead of once per epoch, as I had expected). The reason is that I didn't specify the every_n_iter parameter in mlflow.tensorflow.autolog(), which indicates how many 'iterations' must pass before MLflow logs the metrics (see the docs). So, changing my code to:
mlflow.tensorflow.autolog(every_n_iter=1)
fixed the problem.
P.S.: Remember that in TF 2.x an 'iteration' is an epoch (in TF 1.x it's a batch).
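Putting it together, the corrected setup looks roughly like this (reusing the names from the question; only the autolog() call changes):
with mlflow.start_run():
    mlflow.tensorflow.autolog(every_n_iter=1)  # log metrics after every epoch in TF 2.x
    mlflow.log_param("batch_size", BATCH_SIZE)
    model.fit_generator(
        train_data_loader,
        steps_per_epoch=len(train_data_loader),
        epochs=EPOCHS,
        callbacks=callbacks
    )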

How to monitor error on a validation set in Chainer framework?

I am kind of new to Chainer and have written code that trains a simple feed-forward neural network. I have a training set and a validation set, and I want to evaluate on the validation set roughly every 500 iterations and, if the results are better, save my network weights. Can anyone tell me how to do that?
Here is my code:
optimizer = optimizers.Adam()
optimizer.setup(model)
updater = training.StandardUpdater(train_iter, optimizer, device=0)
trainer = training.Trainer(updater, (10000, 'epoch'), out='result')
trainer.extend(extensions.Evaluator(validation_iter, model, device=0))
trainer.extend(extensions.LogReport())
trainer.extend(extensions.PrintReport(['epoch', 'main/loss', 'validation/main/loss', 'elapsed_time']))
trainer.run()
Error on validation set
It is reported by the Evaluator and printed by the PrintReport, so it should already show up with your code above. To control how often these extensions run, you can pass the trigger keyword argument to trainer.extend.
For example, the code below prints the report every 500 iterations:
trainer.extend(extensions.PrintReport(['epoch', 'main/loss', 'validation/main/loss', 'elapsed_time']), trigger=(500, 'iteration'))
You can also specify a trigger for the Evaluator so that the validation itself only runs every 500 iterations, as in the sketch below.
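A minimal sketch, reusing the validation_iter and model from the question:
trainer.extend(extensions.Evaluator(validation_iter, model, device=0),
               trigger=(500, 'iteration'))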
Save network weights
You can use the snapshot_object extension.
http://docs.chainer.org/en/stable/reference/generated/chainer.training.extensions.snapshot_object.html
It is invoked every epoch by default.
If you want to invoke it only when the loss improves, I think you can set its trigger using MinValueTrigger.
http://docs.chainer.org/en/stable/reference/generated/chainer.training.triggers.MinValueTrigger.html
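Combining the two, here is a hedged sketch that snapshots the model whenever the validation loss reaches a new minimum (the 'validation/main/loss' key matches the Evaluator/PrintReport setup above; 'best_model.npz' is just an example filename):
from chainer.training import extensions, triggers

trainer.extend(
    extensions.snapshot_object(model, 'best_model.npz'),
    trigger=triggers.MinValueTrigger('validation/main/loss',
                                     trigger=(500, 'iteration')))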

ROCR library prediction function error

I am using the ROCR library and its prediction function to create ROC curves. I am doing it like this (copied from Stack Overflow):
p_Lr <- predict(Model_Lr,newdata=Tst,type="response")
pr_Lr <- prediction(p_Lr, Tst$Survived)
prf_Lr <- performance(pr_Lr, measure = "tpr", x.measure = "fpr")
This works at first. But after writing and running various other code (I am unfortunately not able to say precisely which), the line
pr_Lr <- prediction(p_Lr, Tst$Survived)
no longer works and gives the following error message:
Error in nn$covariate : $ operator is invalid for atomic vectors
Then, if I detach and reload the ROCR library like this:
detach(package:ROCR)
library(ROCR)
it works again! Does anybody have any idea why this happens and what to do about it?
Using the sos findFn function, it appears that two other packages have a function called prediction: bootPLS and frailtypack. Loading any of these packages after ROCR would mask ROCR's prediction function and prevent performance from working.
By re-attaching ROCR you put its prediction function back in front of the search path.
An alternative solution would be to use ROCR's prediction function explicitly:
p_Lr <- predict(Model_Lr,newdata=Tst,type="response")
pr_Lr <- ROCR::prediction(p_Lr, Tst$Survived)
prf_Lr <- ROCR::performance(pr_Lr, measure = "tpr", x.measure = "fpr")