In-place parameter update without torch.no_grad() - neural-network

I have just started learning this awesome tool called PyTorch, but sadly I am stuck on a confusing point.
Below is a code snippet from one of the tutorials
with torch.no_grad():
    weights -= weights.grad * lr
    bias -= bias.grad * lr
    weights.grad.zero_()
    bias.grad.zero_()
I am confused about what happens if I do the parameter update without using torch.no_grad() (i.e. only in-place), like this:
# with torch.no_grad()
weights -= weights.grad * lr
bias -= bias.grad * lr
weights.grad.zero_()
bias.grad.zero_()
The backward call has already been made in the code above this snippet (not included here), which means all the grad attributes are already computed and the original values are not needed again. So why is it illegal to do those operations without torch.no_grad()?
I know PyTorch will raise an error; I just want to know where my line of thought goes wrong.
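For concreteness, here is a minimal sketch (my own, not from the tutorial or the linked answer) that reproduces the error such an unguarded update triggers:
import torch

# Minimal, self-contained reproduction (assumed setup, not the tutorial's code).
lr = 0.1
weights = torch.randn(3, requires_grad=True)   # leaf tensor tracked by autograd

loss = (weights ** 2).sum()
loss.backward()                                # populates weights.grad

try:
    weights -= weights.grad * lr               # in-place update outside no_grad()
except RuntimeError as e:
    # Typically: "a leaf Variable that requires grad is being used in an in-place operation."
    print(e)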

The question has been answered by one of the PyTorch moderators on the PyTorch discussion forum.
Here's the link to it: Inplace parameter updation without torch.no_grad()

Related

Julia JuMP Gurobi MIP - query and store best objective and bound at runtime

I am using Gurobi through the JuMP package in Julia to solve a mixed-integer program.
I would like to obtain a graph like this one, where a Python-based solution is also provided (this was also addressed on the Gurobi community forum).
However, I have not found working solutions for Julia calling Gurobi through JuMP.
I understand that callback functions have to be used (such as this suggestion or even the main documentation here), but I do not fully understand how they work and what is essential to achieve my goal.
Any help is much appreciated, as well as a possible description of what the callback function is doing at each step.
If it helps, I am using Gurobi (v.9.0.0), JuMP (v0.20.1), MathOptInterface (v0.9.22) and Julia (v.1.3.0).
You need to use the C API. Here is a translation of Eli's answer on the Gurobi forum:
using JuMP, Gurobi

# Use direct mode so the solver-specific C callback can be attached.
model = direct_model(Gurobi.Optimizer())
N = 30
@variable(model, x[1:N], Bin)
@constraint(model, rand(N)' * x <= 10)
@objective(model, Max, rand(N)' * x)

data = Any[]
start_time = 0.0

function my_callback_function(cb_data, cb_where::Cint)
    @show cb_where
    # cb_where says why Gurobi invoked the callback; GRB_CB_MIP means MIP progress.
    if cb_where == GRB_CB_MIP
        # Query the current incumbent objective and the best proven bound.
        objbst = Ref{Cdouble}()
        GRBcbget(cb_data, cb_where, GRB_CB_MIP_OBJBST, objbst)
        objbnd = Ref{Cdouble}()
        GRBcbget(cb_data, cb_where, GRB_CB_MIP_OBJBND, objbnd)
        # Record elapsed time, incumbent, and bound.
        push!(data, (time() - start_time, objbst[], objbnd[]))
    end
    return
end

MOI.set(model, Gurobi.CallbackFunction(), my_callback_function)
start_time = time()
optimize!(model)

# Dump the recorded (time, best objective, best bound) triples to a CSV file.
open("data.csv", "w") do io
    for x in data
        println(io, join(x, ", "))
    end
end
P.S. Please update to Julia 1.6 and JuMP 0.22; I have not tested whether this works on the older versions.
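For comparison, here is a hedged sketch of the same idea in Python with gurobipy (not a transcript of the linked forum answer; the toy model and variable names are assumed):
import random
import time
import gurobipy as gp
from gurobipy import GRB

# Toy MIP analogous to the Julia model above.
N = 30
m = gp.Model()
x = m.addVars(N, vtype=GRB.BINARY)
m.addConstr(gp.quicksum(random.random() * x[i] for i in range(N)) <= 10)
m.setObjective(gp.quicksum(random.random() * x[i] for i in range(N)), GRB.MAXIMIZE)

data = []
start_time = time.time()

def my_callback(model, where):
    # Gurobi calls this at various points; GRB.Callback.MIP marks MIP progress events.
    if where == GRB.Callback.MIP:
        best = model.cbGet(GRB.Callback.MIP_OBJBST)   # best incumbent objective so far
        bound = model.cbGet(GRB.Callback.MIP_OBJBND)  # best proven bound so far
        data.append((time.time() - start_time, best, bound))

m.optimize(my_callback)

with open("data.csv", "w") as io:
    for row in data:
        io.write(", ".join(str(v) for v in row) + "\n")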

Visualizing an AutoDiff MultibodyPlant in PyDrake

I am trying to build a simple multibody plant system in Drake using the basic DrakeVisualizer. However, for my use case I also want to be able to automatically track derivatives through the physics simulation, so I am using the AutoDiffXd version of the system:
timestep = 1e-3
builder = DiagramBuilder_[AutoDiffXd]()
plant = MultibodyPlant(timestep)
scene_graph = SceneGraph_[AutoDiffXd]()
brick_file = FindResourceOrThrow("drake/examples/manipulation_station/models/061_foam_brick.sdf")
parser = Parser(plant)
brick = parser.AddModelFromFile(brick_file, model_name="brick")
plant.Finalize()
plant_ad = plant.ToAutoDiffXd()
plant_ad.RegisterAsSourceForSceneGraph(scene_graph)
scene_graph.AddRenderer("renderer", MakeRenderEngineVtk(RenderEngineVtkParams()))
DrakeVisualizer.AddToBuilder(builder, scene_graph)
builder.AddSystem(plant_ad)
builder.AddSystem(scene_graph)
builder.Connect(plant_ad.get_geometry_poses_output_port(), scene_graph.get_source_pose_port(plant_ad.get_source_id()))
builder.Connect(scene_graph.get_query_output_port(), plant_ad.get_geometry_query_input_port())
diagram = builder.Build()
context = diagram.CreateDefaultContext()
simulator = Simulator_[AutoDiffXd](diagram, context)
simulator.AdvanceTo(2.0)
However, when I run this, I get the following error:
File "/home/craig/Repos/drake-exps/autoDiffExperiment.py", line 102, in auto_phys
DrakeVisualizer.AddToBuilder(builder, scene_graph)
TypeError: AddToBuilder(): incompatible function arguments. The following argument types are supported:
1. (builder: pydrake.systems.framework.DiagramBuilder_[float], scene_graph: drake::geometry::SceneGraph<double>, lcm: pydrake.lcm.DrakeLcmInterface = None, params: pydrake.geometry.DrakeVisualizerParams = <pydrake.geometry.DrakeVisualizerParams object at 0x7ff6274e14b0>) -> pydrake.geometry.DrakeVisualizer
2. (builder: pydrake.systems.framework.DiagramBuilder_[float], query_object_port: pydrake.systems.framework.OutputPort_[float], lcm: pydrake.lcm.DrakeLcmInterface = None, params: pydrake.geometry.DrakeVisualizerParams = <pydrake.geometry.DrakeVisualizerParams object at 0x7ff627736730>) -> pydrake.geometry.DrakeVisualizer
Invoked with: <pydrake.systems.framework.DiagramBuilder_[AutoDiffXd] object at 0x7ff65654f8f0>, <pydrake.geometry.SceneGraph_[AutoDiffXd] object at 0x7ff656562130>
From this error, it appears the DrakeVisualizer class only accepts systems which use float scalars exclusively. So I am stuck: either I go back to floats (but lose the autodiff differentiable-simulation functionality I was after in the first place), or I continue to use AutoDiffXd systems (but am completely unable to visualize what is going on in my simulation).
Is there a way to get both that I am missing?
Sorry for the pain and inconvenience. Your description and assessment are spot on. Most of the visualization mechanisms are float-only and, in their current state, attempts to visualize an AutoDiff diagram will fail.
You have a couple of options (neither of which is appealing):
Go with one of the outcomes you've described above (no vis or no derivatives).
Put in a Drake feature request to be able to attach a visualizer to an AutoDiff diagram.
I can come up with some hacky workarounds (though it isn't immediately clear they would even work), so if you're desperate for derivatives and visualization, they could be explored. But ultimately the feature request and a formal Drake solution would be the best long-term resolution.
=====================================
Big update. As of #14569, the DrakeVisualizer class is now templated on the scalar type (item 2 in the list above). That has two implications:
You can build an AutoDiffXd-valued diagram with a visualizer in it (as in your example), or
You can create a double-valued diagram and scalar convert it (i.e., diagram.ToAutoDiffXd()) into an AutoDiffXd-valued diagram.
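For illustration, here is a rough, untested sketch of the second route (build everything with double scalars, then scalar-convert), assuming a Drake release that includes #14569; the brick model path and time step are the ones from the question:
from pydrake.autodiffutils import AutoDiffXd
from pydrake.common import FindResourceOrThrow
from pydrake.geometry import DrakeVisualizer
from pydrake.multibody.parsing import Parser
from pydrake.multibody.plant import AddMultibodyPlantSceneGraph
from pydrake.systems.analysis import Simulator_
from pydrake.systems.framework import DiagramBuilder

# Build the plant, scene graph, and visualizer with double scalars first.
builder = DiagramBuilder()
plant, scene_graph = AddMultibodyPlantSceneGraph(builder, time_step=1e-3)
brick_file = FindResourceOrThrow(
    "drake/examples/manipulation_station/models/061_foam_brick.sdf")
Parser(plant).AddModelFromFile(brick_file, model_name="brick")
plant.Finalize()
DrakeVisualizer.AddToBuilder(builder, scene_graph)
diagram = builder.Build()

# Scalar-convert the whole diagram (visualizer included) to AutoDiffXd.
diagram_ad = diagram.ToAutoDiffXd()
simulator = Simulator_[AutoDiffXd](diagram_ad)
simulator.AdvanceTo(2.0)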

Keras infinite loop

The code reads my images from Colab folders, then splits them into a training set and a validation set using a generator. I used an existing pretrained DenseNet201 model to train on them. However, I am not sure why the validation generator remains caught in an infinite loop and the step that consumes the validation data never finishes. Does anyone know how to circumvent this?
import tensorflow as tf

IMAGE_SIZE = 224
BATCH_SIZE = 64

datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2)

train_generator = datagen.flow_from_directory(
    base_dir,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE,
    subset='training')

val_generator = datagen.flow_from_directory(
    base_dir,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE,
    subset='validation')

base_model = tf.keras.applications.DenseNet201(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation='softmax')
])

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(train_generator,
                    epochs=2,
                    steps_per_epoch=100,
                    validation_data=val_generator)
In the line:
history = model.fit(train_generator,
                    epochs=2,
                    steps_per_epoch=100,
                    validation_data=val_generator)
change steps_per_epoch=100 to steps_per_epoch=(len(train_generator)//BATCH_SIZE)
It finally worked!
!pip uninstall tensorflow
!pip install tensorflow==2.1.0
This issue arises because your validation generator is stuck in an infinite loop, unable to exit. While the training generator exits thanks to the steps_per_epoch=100 argument you provided, you haven't specified how many times the validation generator must be called before your validation loss is calculated. There is a similar argument that fixes this issue, called validation_steps:
history = model.fit(train_generator,
                    epochs=2,
                    steps_per_epoch=100,
                    validation_data=val_generator,
                    validation_steps=50)
This way your validation loss will be calculated from the data your validation generator returns over 50 calls, and it won't get stuck in an infinite loop.
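As a hedged variant (not part of the original answers): a Keras directory iterator reports its number of batches per epoch via len(), so both step arguments can be taken from the generators instead of being hard-coded:
# Sketch: len(generator) is the number of batches the generator yields per epoch.
history = model.fit(train_generator,
                    epochs=2,
                    steps_per_epoch=len(train_generator),
                    validation_data=val_generator,
                    validation_steps=len(val_generator))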

How do I create arbitrary parameterized layers in a DiffEqFlux.jl neural ODE? Julia Julialang Flux.jl

I am able to create and optimize neural ODEs in Julia (1.3 and 1.2) using Flux.jl and DiffEqFlux.jl, but it fails in a crucially important general case.
What works:
I can train the neural net parameters if the network is built out of the provided Flux.jl layers, like Dense().
I can include an arbitrary function as a layer in the network chain, e.g. x -> x.*x
What fails:
However, if the arbitrary function has parameters I want to train, then Flux.train! will not adjust these parameters, so training fails.
I have tried making these added parameters tracked and including them in the list of parameters given to the training system, but it ignores them and they remain unvaried.
The documentation says, very cryptically, that one can use Flux.@functor on a layer to make sure its parameters get tracked. However, @functor was not included in Flux until version 0.10.0, and the only version of Flux compatible with neural ODEs in DiffEqFlux is 0.9.0.
So here's a toy example of a 2-layer neural net I want to use:
p = param([1.0])
dudt = Chain( x -> p[1]*x.*x, Dense(2,2) )
ps = Flux.params(dudt)
Then I use Flux's training on this. When I do this the parameter p is not varied, but the parameters in the Dense layer are.
I have tried explicitly including it like this:
ps = Flux.Params([p,dudt])
but that has the same result and the same problem.
I think what I need to do is build a struct, with an associated function, that implements
x -> p[1]*x*x
and then call @functor on this. That struct could then be used in the chain.
But as I noted, the version of Flux with @functor is not compatible with any version of DiffEqFlux.
So I need a way to make Flux pay attention to my custom parameters, not just the ones in Dense().
How?
I think I get what your question is, but please clarify if I am answering the wrong question here. The issue is that p is only grabbed from a global reference and thus not differentiated during the adjoint passes. A much better way to handle this in 2020 is to use FastChain. The FastChain interface lets you define layer functions and their parameter dependencies, so this is a nice way to make your neural network incorporate arbitrary functions with parameters. Here's what that looks like:
using DifferentialEquations
using Flux, Zygote
using DiffEqFlux

x = Float32[2.; 0.]
p = Float32[2.0]
tspan = (0.0f0, 1.0f0)

# A layer that multiplies its input by a single trainable parameter.
mylayer(x,p) = p[1]*x
DiffEqFlux.paramlength(::typeof(mylayer)) = 1
DiffEqFlux.initial_params(::typeof(mylayer)) = rand(Float32,1)

dudt = FastChain(FastDense(2,50,tanh), FastDense(50,2), mylayer)
p = DiffEqFlux.initial_params(dudt)

function f(u,p,t)
    dudt(u,p)
end

ex_neural_ode(x,p) = solve(ODEProblem(f,x,tspan,p), Tsit5())
solve(ODEProblem(f,x,tspan,p), Tsit5())
du0,dp = Zygote.gradient((x,p)->sum(ex_neural_ode(x,p)), x, p)
where the last entry of p is the single parameter of mylayer. Or you can use Flux directly:
using DifferentialEquations
using Flux, Zygote
using DiffEqFlux

x = Float32[2.; 0.]
p2 = Float32[2.0]
tspan = (0.0f0, 1.0f0)

dudt = Chain(Dense(2,50,tanh), Dense(50,2))
# Flatten the Chain's parameters into the vector p; re(p) rebuilds the network.
p,re = Flux.destructure(dudt)

function f(u,p,t)
    re(p[1:end-1])(u) |> x -> p[end]*x
end

ex_neural_ode() = solve(ODEProblem(f,x,tspan,[p;p2]), Tsit5())
grads = Zygote.gradient(()->sum(ex_neural_ode()), Flux.params(x,p,p2))
grads[x]
grads[p]
grads[p2]

ROCR library prediction function error

I am using the ROCR library and its prediction function to create ROC curves. I am doing it like this (copied from Stack Overflow):
p_Lr <- predict(Model_Lr,newdata=Tst,type="response")
pr_Lr <- prediction(p_Lr, Tst$Survived)
prf_Lr <- performance(pr_Lr, measure = "tpr", x.measure = "fpr")
This works, at first. But after writing and running various other code (I am unfortunately not able to say precisely which), the line
pr_Lr <- prediction(p_Lr, Tst$Survived)
doesn't work any more and gives the following error message:
Error in nn$covariate : $ operator is invalid for atomic vectors
Then, if I detach and re-attach the ROCR library like this:
detach(package:ROCR)
library(ROCR)
it works again! Does anybody have any idea why, and what to do about it?
Using the findFn function from the sos package, it appears that two other packages have a function called prediction: bootPLS and frailtypack. Loading either of these packages after ROCR masks ROCR's prediction function and prevents the performance step from working.
By re-attaching ROCR you put its prediction function back at the front of the search path.
An alternative solution is to call ROCR's functions explicitly with the :: operator:
p_Lr <- predict(Model_Lr,newdata=Tst,type="response")
pr_Lr <- ROCR::prediction(p_Lr, Tst$Survived)
prf_Lr <- ROCR::performance(pr_Lr, measure = "tpr", x.measure = "fpr")