updating subset of parameters in dynet - neural-network

Is there a way to update a subset of parameters in dynet? For instance in the following toy example, first update h1, then h2:
model = ParameterCollection()
h1 = model.add_parameters((hidden_units, dims))
h2 = model.add_parameters((hidden_units, dims))
...
for x in trainset:
    ...
    loss.scalar_value()
    loss.backward()
    trainer.update(h1)
    renew_cg()
for x in trainset:
    ...
    loss.scalar_value()
    loss.backward()
    trainer.update(h2)
    renew_cg()
I know that the update_subset interface exists for this and works on the given parameter indexes, but it is not documented anywhere how to get those parameter indexes in the DyNet Python API.

A solution is to use the flag update = False when creating expressions for parameters (including lookup parameters):
import dynet as dy
import numpy as np
model = dy.Model()
pW = model.add_parameters((2, 4))
pb = model.add_parameters(2)
trainer = dy.SimpleSGDTrainer(model)
def step(update_b):
    dy.renew_cg()
    x = dy.inputTensor(np.ones(4))
    W = pW.expr()
    # update b?
    b = pb.expr(update = update_b)
    loss = dy.pickneglogsoftmax(W * x + b, 0)
    loss.backward()
    trainer.update()
    # dy.renew_cg()
print(pb.as_array())
print(pW.as_array())
step(True)
print(pb.as_array()) # b updated
print(pW.as_array())
step(False)
print(pb.as_array()) # b not updated
print(pW.as_array())
For update_subset, I would guess that the indices are the integers suffixed at the end of parameter names (.name()).
In the doc, we are supposed to use a get_index function.
Another option is dy.nobackprop(), which prevents the gradient from propagating beyond a certain node in the graph.
Yet another option is to zero the gradient of the parameters that do not need to be updated (.scale_gradient(0)).
These methods are equivalent to zeroing the gradient before the update. So, the parameter will still be updated if the optimizer uses its momentum from previous training steps (MomentumSGDTrainer, AdamTrainer, ...).
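The momentum caveat can be seen in a toy scalar example. This is only a sketch of a classical momentum update written in plain Python; the function momentum_step and its default values are made up for illustration and are not DyNet's implementation, whose details may differ.

```python
def momentum_step(param, velocity, grad, lr=0.1, mu=0.9):
    """One classical momentum update: v <- mu*v - lr*g, then p <- p + v."""
    velocity = mu * velocity - lr * grad
    return param + velocity, velocity


param, velocity = 1.0, 0.0

# Step 1: an ordinary update with a non-zero gradient builds up velocity.
param, velocity = momentum_step(param, velocity, grad=2.0)
after_first = param

# Step 2: the gradient is zeroed, but the stored velocity is not,
# so the "frozen" parameter still moves.
param, velocity = momentum_step(param, velocity, grad=0.0)
after_second = param

print(after_first, after_second)
```

With plain SGD (mu = 0 in this sketch) the second step would leave the parameter unchanged, which is why the zero-gradient approaches only truly freeze a parameter when the optimizer keeps no state.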

Related

Reshape of Inducing Variables - GPflow

I have an SGPR model:
import numpy as np
import gpflow
X, Y = np.random.randn(50, 2), np.random.randn(50, 1)
Z1 = np.random.randn(13, 2)
k = gpflow.kernels.SquaredExponential()
m = gpflow.models.SGPR(data=(X, Y), kernel=k, inducing_variable=Z1)
I would like to assign new inducing variables with a different shape, like:
Z2 = np.random.randn(29, 2)
m.inducing_variable.Z.assign(Z2)
But when I do, I get:
ValueError: Shapes (13, 2) and (29, 2) are incompatible
Is there a way to reassign the inducing variables without redefining the model?
Context: Instead of optimizing the model with the inducing variables, I would like to optimize the model without optimizing the inducing variables, manually reassigning the inducing variables at each step of the optimization.
UPDATE: This issue is resolved by https://github.com/GPflow/GPflow/pull/1594, which will become part of the next GPflow patch release (2.1.4).
With that fix, you don't need a custom class. All you need to do is explicitly set the static shape with None along the first dimension:
import tensorflow as tf
inducing_variable = gpflow.inducing_variables.InducingPoints(
    tf.Variable(
        Z1,  # initial value
        trainable=False,  # True does not work - see Note below
        shape=(None, Z1.shape[1]),  # or even tf.TensorShape(None)
        dtype=gpflow.default_float(),  # required due to tf's 32bit default
    )
)
m = gpflow.models.SGPR(data=(X, Y), kernel=k, inducing_variable=inducing_variable)
Then m.inducing_variable.Z.assign(Z2) should work just fine.
Note that in this case Z cannot be trainable, as the TensorFlow optimizers need to know the shape at construction time and don't support dynamic shapes.
Right now (as of GPflow 2.1.2) there is no built-in way to change the shape of inducing variables for SGPR, though it is in principle possible. You can get what you want with your own inducing variable class though:
class VariableInducingPoints(gpflow.inducing_variables.InducingPoints):
    def __init__(self, Z, name=None):
        super().__init__(Z, name=name)
        # overwrite with Variable with None as first element in shape so
        # we can assign arrays with arbitrary length along this dimension:
        self.Z = tf.Variable(Z, dtype=gpflow.default_float(),
            shape=(None, Z.shape[1])
        )
    def __len__(self):
        return tf.shape(self.Z)[0]  # dynamic shape
        # instead of the static shape returned by the InducingPoints parent class
and then do
m = gpflow.models.SGPR(
data=(X, Y), kernel=k, inducing_variable=VariableInducingPoints(Z1)
)
instead. Then your m.inducing_variable.Z.assign() should work as you like it.
(For SVGP, the size of the inducing variable and the distribution defined by q_mu and q_sqrt has to match, as well as be known at construction time, so in this case changing the number of inducing variables is less trivial.)

Can operations on a numpy.memmap be deferred?

Consider this example:
import numpy as np
a = np.array(1)
np.save("a.npy", a)
a = np.load("a.npy", mmap_mode='r')
print(type(a))
b = a + 2
print(type(b))
which outputs
<class 'numpy.core.memmap.memmap'>
<class 'numpy.int32'>
So it seems that b is not a memmap any more, and I assume that this forces numpy to read the whole a.npy, defeating the purpose of the memmap. Hence my question, can operations on memmaps be deferred until access time?
I believe subclassing ndarray or memmap could work, but don't feel confident enough about my Python skills to try it.
Here is an extended example showing my problem:
import numpy as np
# create 8 GB file
# np.save("memmap.npy", np.empty([1000000000]))
# I want to print the first value using f and memmaps
def f(value):
    print(value[0])
# this is fast: f receives a memmap
a = np.load("memmap.npy", mmap_mode='r')
print("a = ")
f(a)
# this is slow: b has to be read completely; converted into an array
b = np.load("memmap.npy", mmap_mode='r')
print("b + 1 = ")
f(b + 1)
Here's a simple example of an ndarray subclass that defers operations on it until a specific element is requested by indexing.
I'm including this to show that it can be done, but it almost certainly will fail in novel and unexpected ways, and require substantial work to make it usable.
For a very specific case it may be easier than redesigning your code to solve the problem in a better way.
I'd recommend reading over these examples from the docs to help understand how it works.
import numpy as np
class Defered(np.ndarray):
    """
    An array class that defers calculations applied to it, only
    calculating them when an index is requested
    """
    def __new__(cls, arr):
        arr = np.asanyarray(arr).view(cls)
        arr.toApply = []
        return arr
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        ## Convert all arguments to ndarray, otherwise arguments
        # of type Defered will cause infinite recursion
        # also store self as None, to be replaced later on
        newinputs = []
        for i in inputs:
            if i is self:
                newinputs.append(None)
            elif isinstance(i, np.ndarray):
                newinputs.append(i.view(np.ndarray))
            else:
                newinputs.append(i)
        ## Store function to apply and necessary arguments
        self.toApply.append((ufunc, method, newinputs, kwargs))
        return self
    def __getitem__(self, idx):
        ## Get index and convert to regular array
        sub = self.view(np.ndarray).__getitem__(idx)
        ## Apply stored actions
        for ufunc, method, inputs, kwargs in self.toApply:
            inputs = [i if i is not None else sub for i in inputs]
            sub = super().__array_ufunc__(ufunc, method, *inputs, **kwargs)
        return sub
This will fail if modifications are made to it that don't use NumPy's universal functions. For instance, percentile and median aren't based on ufuncs and would end up loading the entire array. Likewise, if you pass it to a function that iterates over the array, or index a substantial portion of it, the entire array will be loaded.
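If subclassing ndarray feels too fragile, the same deferral idea can be sketched with a plain wrapper object instead. This is a different technique from the subclass above, shown only under the assumption that element-wise operations are all that is needed; the class name LazyElementwise and its map method are hypothetical. A range stands in for the memory-mapped array here so the example runs without a large file.

```python
class LazyElementwise:
    """Wrap an indexable object and defer element-wise functions until indexing."""

    def __init__(self, data, pending=()):
        self._data = data
        self._pending = tuple(pending)

    def map(self, func):
        # Return a new wrapper; the wrapped data is shared, never copied.
        return LazyElementwise(self._data, self._pending + (func,))

    def __add__(self, other):
        return self.map(lambda x: x + other)

    def __mul__(self, other):
        return self.map(lambda x: x * other)

    def __getitem__(self, idx):
        # Only the requested item is read; pending functions run on it in order.
        # Slices work only if the sliced type supports the operations
        # (NumPy arrays do, a range or list does not).
        value = self._data[idx]
        for func in self._pending:
            value = func(value)
        return value


huge = range(10**9)                       # stand-in for a large memmap
lazy = (LazyElementwise(huge) + 1) * 3    # nothing is computed yet
second = lazy[1]                          # computes (1 + 1) * 3 only now
print(second)
```

Because the wrapper is not an ndarray, passing it to NumPy functions will not work directly; it only helps when the consuming code indexes into it.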
This is just how Python works. By default NumPy operations return a new array, so b never exists as a memmap; it is created when + is called on a.
There are a couple of ways to work around this. The simplest is to do all operations in place,
a += 1
This requires loading the memory mapped array for reading and writing,
a = np.load("a.npy", mmap_mode='r+')
Of course this isn't any good if you don't want to overwrite your original array.
In this case you need to specify that b should be memmapped.
b = np.memmap("b.npy", mode='w+', dtype=a.dtype, shape=a.shape)
Assigning can be done by using the out keyword provided by numpy ufuncs.
np.add(a, 2, out=b)
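Putting the last two steps together, a minimal end-to-end sketch might look like the following. The temporary directory, the file names and the small array size are placeholders chosen for illustration, so that the example cleans up after itself.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "a.npy")
    # np.memmap writes a raw buffer without the .npy header,
    # so a neutral extension is used for the output file here.
    dst_path = os.path.join(tmp, "b.dat")

    np.save(src_path, np.arange(10, dtype=np.int64))
    a = np.load(src_path, mmap_mode="r")   # read-only memmap of the source

    # Output buffer backed by a file, with the same dtype and shape as a.
    b = np.memmap(dst_path, mode="w+", dtype=a.dtype, shape=a.shape)
    np.add(a, 2, out=b)                    # result is written into b
    b.flush()

    result_type = type(b).__name__
    first_values = b[:3].tolist()

    # Release the mappings before the directory is removed.
    del a, b

print(result_type, first_values)
```

Since the output file has no header, it has to be reopened with np.memmap and the same dtype and shape rather than with np.load.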

Make the basis of a function from nested loop outer components

I have a segment of code where a composition of nested loops needs to be run at various times; however, each time the operations within the nested loops are different. Is there a way to make the outer portion (loop composition) somehow a functional piece, so that the internal operations are variable. For example, below, two code blocks are shown which both use the same loop introduction, but have different purposes. According to the principle of DRY, how can I improve this, so as not to need to repeat myself each time a similar loop needs to be used?
% BLOCK 1
for a = 0:max(aVec)
    for p = find(aVec'==a)
        iDval = iDauVec{p};
        switch numel(iDval)
            case 2
                r = rEqVec(iDval);
                qVec(iDval(1)) = qVec(p) * (r(2)^0.5 / (r(1)^0.5 + r(2)^0.5));
                qVec(iDval(2)) = qVec(p) - qVec(iDval(1));
            case 1
                qVec(iDval) = qVec(p);
        end
    end
end
% BLOCK 2
for gen = 0:max(genVec)-1
    for p = find(genVec'==gen)
        iDval = iDauVec{p};
        QinitVec(iDval) = QinitVec(p)/numel(iDval);
    end
end
You can write your loop structure as a function, which takes a function handle as one of its inputs. Within the loop structure, you can call this function to carry out your operation.
It looks as if the code inside the loop needs the values of p and iDval, and needs to assign to different elements of a vector variable in the workspace. In that case a suitable function definition might be something like this:
function vec = applyFunctionInLoop(aVec, vec, iDauVec, funcToApply)
    for a = 0:max(aVec)
        for p = find(aVec'==a)
            iDval = iDauVec{p};
            vec = funcToApply(vec, iDval, p);
        end
    end
end
You would need to put the code for each different operation you want to carry out in this way into a function with suitable input and output arguments:
function qVec = myFunc1(qVec, iDval, p)
    switch numel(iDval)
        case 2
            r = rEqVec(iDval); % see note
            qVec(iDval(1)) = qVec(p) * (r(2)^0.5 / (r(1)^0.5 + r(2)^0.5));
            qVec(iDval(2)) = qVec(p) - qVec(iDval(1));
        case 1
            qVec(iDval) = qVec(p);
    end
end
function v = myFunc2(v, ix, q)
    v(ix) = v(q)/numel(ix);
end
Now you can use your loop structure to apply each function:
qVec = applyFunctionInLoop(aVec, qVec, iDauVec, @myFunc1);
QinitVec = applyFunctionInLoop(aVec, QinitVec, iDauVec, @myFunc2);
and so on.
In most of the answer I've kept to the same variable names you used in your question, but in the definition of myFunc2 I've changed the names to emphasise that these variables are local to the function definition - the function is not operating on the variables you passed in to it, but on the values of those variables, which is why we have to pass the final value of the vector out again.
Note that if you want to use the values of other variables in your functions, such as rEqVec in myFunc1, you need to think about whether those variables will be available in the function's workspace. I recommend reading these help pages on the Mathworks site:
Share Data Between Workspaces
Dynamic Function Creation with Anonymous and Nested Functions

Including time as an explicit variable in constraint in a Pyomo Model

I am using Pyomo to model a semi-batch reaction.
Consider an ODE system that describes a semi-batch reactor where one of the reactants is fed at a given volume flow for t1 units of time, the reaction goes on until t end, and obviously t1 < t end.
To specify the stop in the flow, I can either use a conditional rule (assume t1 = 3.5*60):
def _vol_flow_in_schedule(mod,t):
    if t<=3.5*60:
        return mod.vol_flow_in[t] == (12.3/1000)/(3.5*60)
    else:
        return mod.vol_flow_in[t] == 0
m1.vol_flow_in_schedule = Constraint(m1.time,rule=_vol_flow_in_schedule)
which will create a discontinuity (and then my model does not converge). What I want to do is use a sigmoidal function that will transition the flow to zero without a discontinuity.
To implement the sigmoidal though I need to refer to the time variable itself.
The below MATLAB code gives me the result I want:
t=[0:1:500];
acc=2; %Acceleration parameter, higher values yields sharper change.
time_of_step=3.5*60;
init_value = (12.3/1000)/(3.5*60);
end_value = 0;
sigmoidal=(init_value+(end_value-init_value)/2)...
+((end_value-init_value)/2)*atan((t-time_of_step)*acc)/atan(max(t));
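For a quick sanity check, the same smoothed step can also be evaluated in plain Python. This is only a sketch: the function name smooth_step and its argument names are made up for illustration, and it mirrors the MATLAB expression above term by term, including the normalisation by atan(max(t)).

```python
import math


def smooth_step(t, t_step, init_value, end_value, acc, t_max):
    """Arctangent-smoothed step from init_value to end_value around t_step."""
    mid = init_value + (end_value - init_value) / 2
    half_range = (end_value - init_value) / 2
    # Same normalisation as the MATLAB snippet: atan(max(t)), not pi/2.
    return mid + half_range * math.atan((t - t_step) * acc) / math.atan(t_max)


init_value = (12.3 / 1000) / (3.5 * 60)
t_step = 3.5 * 60

# Evaluate at the start, at the switch time, and at the end of the horizon.
samples = {
    t: smooth_step(t, t_step, init_value, 0.0, acc=2, t_max=500)
    for t in (0, 210, 500)
}
print(samples)
```

At t = t_step the arctangent term vanishes, so the flow sits exactly halfway between the two values. Because the expression divides by atan(max(t)) rather than by pi/2, the value at t = 500 ends up marginally below zero with these parameters, which may matter if the flow variable is bounded at zero.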
This implementation, however, needs the time variable explicitly in the function. How can I access the time variable inside the Pyomo rule? I tried the code below, but I get a "Cannot treat the scalar component 't_of_step' as an array" error:
m1.init_value = Param(initialize = (12.3/1000)/(3.5*60))
m1.end_value = Param(initialize = 0)
m1.t_of_step = Param(initialize = 210)
m1.acc = Param(initialize = 5)
.
.
def _vol_flow_sigmoidal (mod,t):
    return mod.vol_flow_in[t] == (mod.init_value+(mod.end_value-mod.init_value)/2)+((mod.end_value-mod.init_value)/2)*atan((t-mod.t_of_step)*mod.acc)/atan(1500)
m1.vol_flow_sigmoidal = Constraint(m1.time,rule=_vol_flow_sigmoidal)
Hopefully I've described clearly what I am after. Any hints are most welcome,
Thanks!
Sal
How are you declaring the m1.time index?
My guess is that you are using a NumPy array to initialize the m1.time index. There is a known problem in Pyomo (see Issue #31) where the NumPy operator overloading and the Pyomo operator overloading end up fighting with each other (basically, NumPy gets fooled into thinking Pyomo scalars are actually indexed and attempts to treat them like arrays).
I was able to reproduce the error with the following complete example:
# pyomo 4.4.1
from pyomo.environ import *
import numpy as np
m1 = ConcreteModel()
m1.time = Set(initialize=np.array([0,100,200,300,400,500]))
m1.vol_flow_in = Var(m1.time)
m1.init_value = Param(initialize = (12.3/1000)/(3.5*60))
m1.end_value = Param(initialize = 0)
m1.t_of_step = Param(initialize = 210)
m1.acc = Param(initialize = 5)
def _vol_flow_sigmoidal (mod,t):
    return mod.vol_flow_in[t] == (mod.init_value+(mod.end_value-mod.init_value)/2)\
        +((mod.end_value-mod.init_value)/2)*atan((t-mod.t_of_step)*mod.acc)/atan(1500)
m1.vol_flow_sigmoidal = Constraint(m1.time,rule=_vol_flow_sigmoidal)
There are two alternatives that do work, both based on avoiding using NumPy arrays to initialize Pyomo Sets. You can either completely avoid Numpy:
m1.time = Set(initialize=[0,100,200,300,400,500])
or explicitly cast the NumPy array to a list:
timeArray = np.array([0,100,200,300,400,500])
m1.time = Set(initialize=timeArray.tolist())
Finally, for completeness, two other notes:
This also applies to initializing ContinuousSet objects in pyomo.dae
You will see the same behavior even if you avoid the explicit Pyomo Set declaration. That is, the following will also generate the error:
m1.time = np.array([0,100,200,300,400,500])
# ...
m1.vol_flow_sigmoidal = Constraint(m1.time,rule=_vol_flow_sigmoidal)
This is because Pyomo will quietly create the Set object for you behind the scenes as m1.vol_flow_sigmoidal_index and then use that Set to index the Constraint.

How does rowfun know to reference variables inside a table

From the documentation, we see the following example:
g = gallery('integerdata',3,[15,1],1);
x = gallery('uniformdata',[15,1],9);
y = gallery('uniformdata',[15,1],2);
A = table(g,x,y)
func = @(x, y) (x - y);
B = rowfun(func,A,...
'GroupingVariable','g',...
'OutputVariableName','MeanDiff')
When the function func is applied to A in rowfun how does it know that there are variables in A called x and y?
EDIT: I feel that my last statement must not be true, as you do not get the same result if you did A = table(g, y, x).
I am still very confused by how rowfun can use a function that does not actually use any variables defined within the calling environment.
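Unless you specify the table variables (columns) and their order with the Name/Value argument InputVariables, MATLAB simply passes column 1 as the first input, column 2 as the second input, and so on, skipping any grouping columns. The names x and y inside func are just the anonymous function's own argument names and are matched by position, not by name.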
Consequently, for better readability and maintainability of your code, I consider it good practice to always specify InputVariables explicitly.