Customize metric visualization in MLflow UI when using mlflow.tensorflow.autolog() - tf.keras

I'm trying to integrate MLflow into my project. Because I'm using tf.keras.fit_generator() for training, I take advantage of mlflow.tensorflow.autolog() (docs here) to enable automatic logging of metrics and parameters:
model = Unet()
optimizer = tf.keras.optimizers.Adam(LEARNING_RATE)
metrics = [IOUScore(threshold=0.5), FScore(threshold=0.5)]
model.compile(optimizer, customized_loss, metrics)

callbacks = [
    tf.keras.callbacks.ModelCheckpoint("model.h5", save_weights_only=True, save_best_only=True, mode='min'),
    tf.keras.callbacks.TensorBoard(log_dir='./logs', profile_batch=0, update_freq='batch'),
]

train_dataset = Dataset(src_dir=SOURCE_DIR)
train_data_loader = DataLoader(train_dataset, BATCH_SIZE, shuffle=True)

with mlflow.start_run():
    mlflow.tensorflow.autolog()
    mlflow.log_param("batch_size", BATCH_SIZE)
    model.fit_generator(
        train_data_loader,
        steps_per_epoch=len(train_data_loader),
        epochs=EPOCHS,
        callbacks=callbacks
    )
I expected something like the chart shown in the docs, with the metric plotted at every epoch. However, after the training finished, each metric chart showed only a single value.
How can I configure MLflow so that the metric plot updates and displays a value at each epoch instead of just showing the latest value?

After searching around, I found this issue related to my problem. It turns out all my metrics were logged only once per training run (instead of once per epoch, as I had assumed). The reason is that I didn't specify the every_n_iter parameter in mlflow.tensorflow.autolog(), which indicates how many 'iterations' must pass before MLflow logs the metrics (see the docs). So, changing my code to:
mlflow.tensorflow.autolog(every_n_iter=1)
fixed the problem.
P.S.: Remember that in TF 2.x an 'iteration' is an epoch (in TF 1.x it is a batch).
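For context, here is a minimal sketch of how the corrected call fits into the training script from the question (same Unet/DataLoader setup as above):

with mlflow.start_run():
    mlflow.tensorflow.autolog(every_n_iter=1)  # log metrics at every epoch (TF 2.x)
    mlflow.log_param("batch_size", BATCH_SIZE)
    model.fit_generator(
        train_data_loader,
        steps_per_epoch=len(train_data_loader),
        epochs=EPOCHS,
        callbacks=callbacks
    )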

Related

Testing a trading system on bootstrap samples using Arch library in python

I am trying to test the hypothesis that a trading strategy outperforms buy and hold. I have the original data's returns, containing 1261 observations, as the sample to be used for the bootstrap.
I want to know if I have applied it correctly.
def back_test_series(x):
    df = pd.DataFrame(x, columns=['Close'])
    return df.Close

from arch.bootstrap import CircularBlockBootstrap

bs = CircularBlockBootstrap(40, sample_return)
results = bs.apply(back_test_series, 2500)
Above, sample_return is the sample containing 2761 returns on the actual data. I created 2500 bootstrapped samples containing 2761 observations each, and then computed cumulative returns to get price time series:
time_series = []
for simu in results:
    df = pd.DataFrame(simu, columns=["Close"])
    df['Close'] = (1 + df['Close']).cumprod()
    time_series.append(df)
Finally, I ran my backtest on the price series obtained from the bootstrap:
final_results = []
for simulation in time_series:
    x = Backtesting.scrip_backtest(simulation)
    final_results.append(x)
Backtesting.scrip_backtest is my trading strategy, which returns stats like the buy-and-hold CAGR, the strategy CAGR, and the standard deviation of the strategy returns.
My question is: can I use the bootstrap in this way? Should I use MovingBlockBootstrap or CircularBlockBootstrap?
Is it correct to run the trading strategy on bootstrapped time series as described above?
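For reference, the same pipeline can also be written with the bootstrap iterator instead of apply(); this is only a sketch under the question's assumptions (sample_return is the original return series and Backtesting.scrip_backtest accepts a price DataFrame):

import numpy as np
import pandas as pd
from arch.bootstrap import CircularBlockBootstrap

bs = CircularBlockBootstrap(40, sample_return)   # block size 40, as above

final_results = []
for pos_args, _ in bs.bootstrap(2500):           # yields 2500 resampled return series
    returns = pd.Series(np.asarray(pos_args[0]).ravel(), name="Close")
    prices = (1 + returns).cumprod().to_frame()  # convert returns into a price path
    final_results.append(Backtesting.scrip_backtest(prices))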

FMU 2.0 interaction - requires parallel "container" for parameter values etc?

I work with pyfmi in Jupyter notebooks to run simulations, and I like to work interactively and evaluate incremental changes in parameters etc. A long time ago I found it necessary to introduce a dictionary that works as a "container" for parameter and initial values. Now I wonder if there is a way to get rid of this "container", which after all is partly a parallel structure to "model"?
A typical workflow looks like this:

# create a diagram where results from different simulations below should be shown

model = load_fmu(fmu_model)
parDict['model.x_0'] = 1
parDict['model.a'] = 2
for key in parDict.keys(): model.set(key, parDict[key])
sim_res = model.simulate(10)
# plot results...

model = load_fmu(fmu_model)
parDict['model.x_0'] = 3
for key in parDict.keys(): model.set(key, parDict[key])
sim_res = model.simulate(10)
# plot results...
There is a function model.reset() that brings the state back to the default values at compilation without loading the FMU again, but you need to do more than the following:

model.reset()
parDict['model.x_0'] = 3
for key in parDict.keys(): model.set(key, parDict[key])
sim_res = model.simulate(10)
# plot results...

So this does NOT work on its own: all parameters and initial values still need to be brought back, so we still need parDict, although we may avoid the load command.
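One pattern that avoids keeping a persistent parameter dictionary around is to wrap the reset/set/simulate steps in a small helper that takes the values directly. This is only a sketch under my assumptions about the pyfmi calls used above (load_fmu, model.reset(), model.set(), model.simulate(final_time=...)):

from pyfmi import load_fmu

def simulate_with(model, pars, final_time=10):
    # Reset to the compiled defaults, apply the given parameter/initial values, then simulate.
    model.reset()
    for name, value in pars.items():
        model.set(name, value)
    return model.simulate(final_time=final_time)

model = load_fmu(fmu_model)
sim_res = simulate_with(model, {'model.x_0': 1, 'model.a': 2})
# plot results...
sim_res = simulate_with(model, {'model.x_0': 3})
# plot results...

Note that after reset() only the values passed in pars are applied, so each call starts from the compiled defaults; whether that is acceptable depends on the use case.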

Keras infinite loop

The code reads my images from Colab folders, then splits them into a training set and a validation set using generators. I used the pretrained DenseNet201 model to train on them. However, for some reason the validation generator remains caught in an infinite loop and the validation pass never finishes. Does anyone know how to circumvent this?
import tensorflow as tf

IMAGE_SIZE = 224
BATCH_SIZE = 64
IMG_SHAPE = (IMAGE_SIZE, IMAGE_SIZE, 3)  # assumed; not defined in the original snippet

datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2)

train_generator = datagen.flow_from_directory(
    base_dir,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE,
    subset='training')

val_generator = datagen.flow_from_directory(
    base_dir,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE,
    subset='validation')

base_model = tf.keras.applications.DenseNet201(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation='softmax')
])

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(train_generator,
                    epochs=2,
                    steps_per_epoch=100,
                    validation_data=val_generator)
In the line:

history = model.fit(train_generator,
                    epochs=2,
                    steps_per_epoch=100,
                    validation_data=val_generator)

change steps_per_epoch=100 to steps_per_epoch=(len(train_generator)//BATCH_SIZE)
It finally worked!
!pip uninstall tensorflow
!pip install tensorflow==2.1.0
This issue arises because your validation generator is stuck in an infinite loop, unable to exit. While the training data generator exits thanks to the steps_per_epoch=100 argument you provided, you haven't specified how many times the validation generator must be called before your validation loss is calculated. There is a similar argument that fixes this issue, called validation_steps:
history = model.fit(train_generator,
                    epochs=2,
                    steps_per_epoch=100,
                    validation_data=val_generator,
                    validation_steps=50)
This way your validation loss is calculated from the data your validation generator returns over 50 calls, and it won't get stuck in an infinite loop.
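If you would rather run through the whole validation set each epoch instead of a fixed 50 batches, one option (a sketch based on the generators above) is to derive both step counts from the generators themselves, since len() of a Keras directory iterator is its number of batches:

history = model.fit(train_generator,
                    epochs=2,
                    steps_per_epoch=len(train_generator),    # batches per training epoch
                    validation_data=val_generator,
                    validation_steps=len(val_generator))     # batches per validation pass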

tensorflow checkpoint missing input tensor node

( please pardon my long post, dearly appreciate your help )
I am training the squeezeDet model on Pascal VOC-style custom data, following the training code from the repository HERE:
train.py
model_definition and HERE
The saved model checkpoint performs well, as I can see acceptable performance.
Now I am trying to freeze the model for deployment using coreML to see how it performs on a mobile platform. The authors of the script only report performance in a GPU environment in their research paper.
I follow the recommended steps as per TensorFlow; my commands are below.
First, I write the graph out from the checkpoint meta file:
path_to_ckpt_meta = rootdir + "model.ckpt-355000.meta"
path_to_ckpt_data = rootdir + "model.ckpt-355000"
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
saver = tf.train.import_meta_graph(path_to_ckpt_meta)
saver.restore(sess, path_to_ckpt_data)
tf.train.write_graph(tf.get_default_graph().as_graph_def(), rootdir, "model_ckpt_355000_graph_V2.pb", False)
Now I check the graph summary and see all the tensors in the model. The output summary file is HERE.
However, when I check the checkpoint file using the inspect_checkpoint.py tool from TensorFlow, I see no image_input nodes. The output of the inspection is HERE.
Second, I freeze the graph using the TensorFlow freeze_graph.py tool:
python ./tensorflow/python/tools/freeze_graph.py \
--input_graph=path-to-dir/train/model_ckpt_355000_graph.pb \
--input_checkpoint=path-to-dir/train/model.ckpt-355000 \
--output_graph=path-to-dir/train/frozen_sqdt_ckpt_355000.pb \
--output_node_names=bbox/trimming/bbox,probability/score,probability/class_idx
the freeze_graph call completes without error and results in the frozen graph as per the command above.
Now,
when I check the frozen graph using the summarize_graph function call
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=/tmp/logs/squeezeDet_NewDataset_test01_March02/train/frozen_sqdt_ckpt_355000.pb
I get the following
No inputs spotted.
No variables spotted.
Found 3 possible outputs: (name=bbox/trimming/bbox, op=Transpose) (name=probability/score, op=Max) (name=probability/class_idx, op=ArgMax)
Found 2703452 (2.70M) const parameters, 0 (0) variable parameters, and 0 control_edges
Op types used: 130 Const, 68 Identity, 32 BiasAdd, 32 Conv2D, 31 Relu, 15 Mul, 14 Add, 10 ConcatV2, 9 Sub, 5 RealDiv, 5 Reshape, 4 Maximum, 4 Minimum, 3 StridedSlice, 3 MaxPool, 2 Exp, 2 Greater, 2 Cast, 2 Select, 1 Transpose, 1 Softmax, 1 Sigmoid, 1 Unpack, 1 RandomUniform, 1 QueueDequeueManyV2, 1 Pack, 1 Max, 1 Floor, 1 FIFOQueueV2, 1 ArgMax
To use with tensorflow/tools/benchmark:benchmark_model try these arguments:
bazel run tensorflow/tools/benchmark:benchmark_model -- --graph=/tmp/logs/squeezeDet_NewDataset_test01_March02/train/frozen_sqdt_ckpt_355000.pb --show_flops --input_layer= --input_layer_type= --input_layer_shape= --output_layer=bbox/trimming/bbox,probability/score,probability/class_idx
The output above suggests that no input is detected in the frozen graph. I checked the summary of the frozen graph and found no image_input tensor. HERE
When I check my original graph (written in step 1) with summarize_graph, it does show inputs.
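As a quick cross-check, the Placeholder nodes of both .pb files can also be listed directly from Python; this is just a sketch assuming TF 1.x and the file names used above:

import tensorflow as tf

def list_placeholders(pb_path):
    # Load a binary GraphDef and return the names of its Placeholder nodes.
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return [node.name for node in graph_def.node if node.op == "Placeholder"]

print(list_placeholders(rootdir + "model_ckpt_355000_graph_V2.pb"))  # graph written in step 1
print(list_placeholders("frozen_sqdt_ckpt_355000.pb"))               # frozen graph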
My troubleshooting suggests there is some mix-up in the original authors' code, where image_input is not provided as an input tensor. The confusing part is that I can see the input image tensor in the summary of the graph written out from the checkpoint meta file.
My questions are:
-- Why does the frozen graph lose the input nodes when the original graph has them?
-- What can I do to change this so that freeze_graph works correctly? Is there a transformation I need to perform to make this frozen model compatible with the coreML format?
All your help is much appreciated.
Best,
Aman

How to monitor error on a validation set in Chainer framework?

I am fairly new to Chainer and have written code that trains a simple feed-forward neural network. I have a validation set and a training set, and I want to evaluate on the validation set every 500 iterations or so and, if the results are better, save my network weights. Can anyone tell me how I can do that?
Here is my code:
optimizer = optimizers.Adam()
optimizer.setup(model)
updater = training.StandardUpdater(train_iter, optimizer, device=0)
trainer = training.Trainer(updater, (10000, 'epoch'), out='result')
trainer.extend(extensions.Evaluator(validation_iter, model, device=0))
trainer.extend(extensions.LogReport())
trainer.extend(extensions.PrintReport(['epoch', 'main/loss', 'validation/main/loss', 'elapsed_time']))
trainer.run()
Error on validation set
It is reported by the Evaluator and printed by PrintReport, so it should already be shown with your code above. To control how often these extensions run, you can specify the trigger keyword argument in trainer.extend().
For example, the code below prints the report every 500 iterations:
trainer.extend(extensions.PrintReport(['epoch', 'main/loss', 'validation/main/loss', 'elapsed_time']), trigger=(500, 'iteration'))
You can also specify a trigger for the Evaluator.
Save network weights
You can use the snapshot_object extension:
http://docs.chainer.org/en/stable/reference/generated/chainer.training.extensions.snapshot_object.html
It is invoked every epoch by default.
If you want to invoke it only when the loss improves, I think you can set the trigger using MinValueTrigger:
http://docs.chainer.org/en/stable/reference/generated/chainer.training.triggers.MinValueTrigger.html
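Putting the two together, a minimal sketch (assuming the model and trainer defined above, and that the reported key is 'validation/main/loss'):

from chainer.training import extensions, triggers

# Save the model's weights whenever the validation loss reaches a new minimum,
# checking every 500 iterations.
trainer.extend(
    extensions.snapshot_object(model, 'best_model.npz'),
    trigger=triggers.MinValueTrigger('validation/main/loss',
                                     trigger=(500, 'iteration')))

The snapshot is written to the trainer's out directory ('result' in the code above).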