How to train a combination of models in Flux?

I am trying to build a deep learning model in Julia. I have two models, m1 and m2, which are neural networks. Here is my code:
using Flux

function even_mask(x)
    s1, s2 = size(x)
    weight_mask = zeros(s1, s2)
    weight_mask[2:2:s1, :] = ones(Int(s1/2), s2)
    return weight_mask
end

function odd_mask(x)
    s1, s2 = size(x)
    weight_mask = zeros(s1, s2)
    weight_mask[1:2:s1, :] = ones(Int(s1/2), s2)
    return weight_mask
end

function even_duplicate(x)
    s1, s2 = size(x)
    x_ = zeros(s1, s2)
    x_[1:2:s1, :] = x[1:2:s1, :]
    x_[2:2:s1, :] = x[1:2:s1, :]
    return x_
end

function odd_duplicate(x)
    s1, s2 = size(x)
    x_ = zeros(s1, s2)
    x_[1:2:s1, :] = x[2:2:s1, :]
    x_[2:2:s1, :] = x[2:2:s1, :]
    return x_
end

function Even(m)
    x -> x .+ even_mask(x) .* m(even_duplicate(x))
end

function InvEven(m)
    x -> x .- even_mask(x) .* m(even_duplicate(x))
end

function Odd(m)
    x -> x .+ odd_mask(x) .* m(odd_duplicate(x))
end

function InvOdd(m)
    x -> x .- odd_mask(x) .* m(odd_duplicate(x))
end

m1 = Chain(Dense(4, 6, relu), Dense(6, 5, relu), Dense(5, 4))
m2 = Chain(Dense(4, 7, relu), Dense(7, 4))

forward = Chain(Even(m1), Odd(m2))
inverse = Chain(InvOdd(m2), InvEven(m1))

function loss(x)
    z = forward(x)
    return 0.5 * sum(z .* z)
end

opt = Flux.ADAM()
x = rand(4, 100)

for i = 1:100
    Flux.train!(loss, Flux.params(forward), x, opt)
    println(loss(x))
end
The forward model is a combination of m1 and m2. I need to optimize m1 and m2 so that I can optimize both the forward and inverse models. But it seems that params(forward) is empty. How can I train my model?

I don't think plain functions can be used as layers in Flux. You need to use the @functor macro to add the extra functionality to collect parameters: https://fluxml.ai/Flux.jl/stable/models/basics/#Layer-helpers-1
In your case, rewriting Even, InvEven, Odd and InvOdd like this should help:
struct Even
    model
end

(e::Even)(x) = x .+ even_mask(x) .* e.model(even_duplicate(x))

Flux.@functor Even
After adding this definition,
Flux.params(Even(m1))
should return a non-empty parameter list.
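The same pattern extends to the other wrappers; for example, a sketch for Odd (reusing the odd_* helpers from the question):
struct Odd
    model
end

(o::Odd)(x) = x .+ odd_mask(x) .* o.model(odd_duplicate(x))

Flux.@functor Odd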
EDIT
An even simpler way to implement Even and friends is to use the built-in SkipConnection layer:
Even(m) = SkipConnection(Chain(even_duplicate, m),
                         (mx, x) -> x .+ even_mask(x) .* mx)
I suspect this is a version difference, but with Julia 1.4.1 and Flux v0.10.4 I get the error BoundsError: attempt to access () at index [1] when running your training loop. I need to replace the data with
x = [(rand(4,100), 0)]
Otherwise the loss is applied to each entry of the array x, since train! splats the loss over x.
The next error, mutating arrays is not supported, is due to the implementation of *_mask and *_duplicate. These functions construct an array of zeros and then mutate it by filling in values from the input.
You can use Zygote.Buffer to implement this code in a way that can be differentiated.
using Flux
using Zygote: Buffer

function even_mask(x)
    s1, s2 = size(x)
    weight_mask = Buffer(x)
    weight_mask[2:2:s1, :] = ones(Int(s1/2), s2)
    weight_mask[1:2:s1, :] = zeros(Int(s1/2), s2)
    return copy(weight_mask)
end

function odd_mask(x)
    s1, s2 = size(x)
    weight_mask = Buffer(x)
    weight_mask[2:2:s1, :] = zeros(Int(s1/2), s2)
    weight_mask[1:2:s1, :] = ones(Int(s1/2), s2)
    return copy(weight_mask)
end

function even_duplicate(x)
    s1, s2 = size(x)
    x_ = Buffer(x)
    x_[1:2:s1, :] = x[1:2:s1, :]
    x_[2:2:s1, :] = x[1:2:s1, :]
    return copy(x_)
end

function odd_duplicate(x)
    s1, s2 = size(x)
    x_ = Buffer(x)
    x_[1:2:s1, :] = x[2:2:s1, :]
    x_[2:2:s1, :] = x[2:2:s1, :]
    return copy(x_)
end

Even(m)    = SkipConnection(Chain(even_duplicate, m),
                            (mx, x) -> x .+ even_mask(x) .* mx)
InvEven(m) = SkipConnection(Chain(even_duplicate, m),
                            (mx, x) -> x .- even_mask(x) .* mx)
Odd(m)     = SkipConnection(Chain(odd_duplicate, m),
                            (mx, x) -> x .+ odd_mask(x) .* mx)
InvOdd(m)  = SkipConnection(Chain(odd_duplicate, m),
                            (mx, x) -> x .- odd_mask(x) .* mx)

m1 = Chain(Dense(4, 6, relu), Dense(6, 5, relu), Dense(5, 4))
m2 = Chain(Dense(4, 7, relu), Dense(7, 4))

forward = Chain(Even(m1), Odd(m2))
inverse = Chain(InvOdd(m2), InvEven(m1))

function loss(x, y)
    z = forward(x)
    return 0.5 * sum(z .* z)
end

opt = Flux.ADAM(1e-6)
x = [(rand(4, 100), 0)]

function train!()
    for i = 1:100
        Flux.train!(loss, Flux.params(forward), x, opt)
        println(loss(x[1]...))
    end
end
At this point, you get to the real fun of deep networks. After one training step, the training diverges to NaN with the default learning rate. Reducing the initial learning rate to 1e-6 helps, and the loss looks like it is decreasing.
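If a smaller learning rate alone does not stabilize training, gradient clipping is a common additional remedy. A sketch (ClipValue and Optimiser here are an assumption about newer Flux versions, roughly 0.11+, not about the v0.10.4 used above):
# Clip each gradient entry to [-1e-3, 1e-3] before the ADAM update
opt = Flux.Optimise.Optimiser(Flux.Optimise.ClipValue(1e-3), Flux.ADAM(1e-6))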

Related

Global fit of a coupled ODE system with lmfit

I'm trying to get a global fit of multiple sets of data based on a system of 4 coupled ODEs.
I have working code that solves the system of 4 coupled ODEs for a single set of data, and working code that does a global fit with an arbitrary function (but not using odeint).
My problem is that I'm not able to merge the two codes.
Code for coupled ODEs
t =
data=
def gauss(x, amp, sigma, center):
    """Gaussian lineshape."""
    return amp * np.exp(-(x-center)**2 / (2.*sigma**2))

def f(xs, t, ps):
    """Lotka-Volterra predator-prey model."""
    try:
        amp = ps['amp'].value
        center = ps['center'].value
        sigma = ps['sigma'].value
        T1 = ps['T1'].value
        Teq = ps['Teq'].value
    except Exception:
        amp, center, sigma, T1, Teq = ps
    s0, s1, s2, s3 = xs
    return [- gauss(t, amp, sigma, center) * (s0-s1),
            gauss(t, amp, sigma, center) * (s0-s1) - s1/T1,
            s1/T1 - s2/Teq,
            s2/Teq]

def g(t, x0, ps):
    x = odeint(f, x0, t, args=(ps,))
    return x

def residual(ps, ts, data):
    x0 = ps['s0'].value, ps['s1'].value, ps['s2'].value, ps['s3'].value
    b = ps['b'].value
    sol = g(ts, x0, ps)
    model = ((sol[:, 0] - sol[:, 1] + sol[:, 2] + b*sol[:, 3])**2) / sol[0, 0]**2
    return (model - data).ravel()

# set parameters including bounds
params = Parameters()
params.add('s0', value=1, vary=False)
params.add('s1', value=0, vary=False)
params.add('s2', value=0, vary=False)
params.add('s3', value=0, vary=False)
params.add('amp', value=0.02)
params.add('center', value=5)
params.add('sigma', value=0.1)
params.add('T1', value=0.3)
params.add('Teq', value=0.7)
params.add('b', value=-1)

# fit model and find predicted values
result = minimize(residual, params, args=(t, data), method='leastsq')
final = data + result.residual.reshape(data.shape)
Considering the example here: https://lmfit.github.io/lmfit-py/examples/example_fit_multi_datasets.html
I have tried to write the global-fit code for this case myself:
def gauss(x, amp, sigma, center):
    """Gaussian lineshape."""
    return amp * np.exp(-(x-center)**2 / (2.*sigma**2))

def f(xs, t, ps):
    """Lotka-Volterra predator-prey model."""
    try:
        amp = ps['amp'].value
        center = ps['center'].value
        sigma = ps['sigma'].value
        T1 = ps['T1'].value
        Teq = ps['Teq'].value
    except Exception:
        amp, center, sigma, T1, Teq = ps
    s0, s1, s2, s3 = xs
    return [- gauss(t, amp, sigma, center) * (s0-s1),
            gauss(t, amp, sigma, center) * (s0-s1) - s1/T1,
            s1/T1 - s2/Teq,
            s2/Teq]

def g(t, x0, params):
    """Solution to the ODE x'(t) = f(t, x, k) with initial condition x(0) = x0."""
    x = odeint(f, x0, t, args=(params,))
    return x

def testmodel(params, ts, data):
    x0 = params['s0'].value, params['s1'].value, params['s2'].value, params['s3'].value
    b = params['b'].value
    sol = g(ts, x0, params)
    model = ((sol[:, 0] - sol[:, 1] + sol[:, 2] + b*sol[:, 3])**2) / sol[0, 0]**2
    return model

def testmodel_dataset(params, i, x):
    """Calculate the model from the parameters for data set i."""
    x0 = params[f's0_{i+1}'], params[f's1_{i+1}'], params[f's2_{i+1}'], params[f's3_{i+1}']
    amp = params[f'amp_{i+1}']
    center = params[f'center_{i+1}']
    sigma = params[f'sigma_{i+1}']
    T1 = params[f'T1_{i+1}']
    Teq = params[f'Teq_{i+1}']
    b = params[f'b_{i+1}']
    return testmodel(params, x, data)

def objective(params, x, data):
    """Calculate the total residual for fits to several data sets."""
    ndata, _ = data.shape
    resid = 0.0*data[:]
    # make residual per data set
    for i in range(ndata):
        resid[i, :] = data[i, :] - testmodel_dataset(params, i, x)
    # now flatten this to a 1D array, as minimize() needs
    return resid.flatten()

fit_params = Parameters()
for iy, y in enumerate(data):
    fit_params.add(f's0_{iy+1}', value=1)
    fit_params.add(f's1_{iy+1}', value=0)
    fit_params.add(f's2_{iy+1}', value=0)
    fit_params.add(f's3_{iy+1}', value=0)
    fit_params.add(f'amp_{iy+1}', value=0.5)
    fit_params.add(f'center_{iy+1}', value=0.5)
    fit_params.add(f'sigma_{iy+1}', value=0.5)
    fit_params.add(f'T1_{iy+1}', value=0.5)
    fit_params.add(f'Teq_{iy+1}', value=0.4)
    fit_params.add(f'b_{iy+1}', value=0.3)

for iy in (2, 3, 4, 5, 6):
    fit_params[f'sigma_{iy}'].expr = 'sigma_1'

out = minimize(objective, fit_params, args=(x, data))
report_fit(out.params)
Result -> KeyError: 's0'
There is a problem with x0 and s0, s1, s2, s3, the populations of the four states.
I'm sorry if the question is very naive.
Thank you for your help.
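For what it's worth, the KeyError comes from testmodel(), which looks up params['s0'] while the multi-dataset Parameters object only contains suffixed keys such as 's0_1'. Below is a minimal, untested sketch of one way to rewrite testmodel_dataset so that it actually uses the per-dataset values; it relies on the plain-tuple fallback already present in f() and on the imports from the code above:
def testmodel_dataset(params, i, x):
    """Solve the ODEs with the parameters of data set i and build the model."""
    x0 = (params[f's0_{i+1}'].value, params[f's1_{i+1}'].value,
          params[f's2_{i+1}'].value, params[f's3_{i+1}'].value)
    ps = (params[f'amp_{i+1}'].value, params[f'center_{i+1}'].value,
          params[f'sigma_{i+1}'].value, params[f'T1_{i+1}'].value,
          params[f'Teq_{i+1}'].value)
    b = params[f'b_{i+1}'].value
    sol = odeint(f, x0, x, args=(ps,))  # f() unpacks a plain tuple in its except branch
    return ((sol[:, 0] - sol[:, 1] + sol[:, 2] + b*sol[:, 3])**2) / sol[0, 0]**2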

Black Scholes function with vector inputs in Matlab

I'm trying to write a function in Matlab that calculates the Call price using the Black Scholes formula with vector inputs. I have so far:
function [C] = BlackScholesCall(S,K,t,r,sigma)
%This function calculates the call price per the Black-Scholes equation
%INPUT  S ... stock price at time 0
%       K ... strike price
%       r ... interest rate
%       sigma ... volatility of the stock price measured as annual standard deviation
%       t ... duration in years
%OUTPUT C ... call price
%USAGE  BlackScholesCall(S,K,t,r,sigma)
for l = 1:length(K)
    for z = 1:length(t)
        d1 = (log(S/K(l)) + (r + 0.5*sigma^2)*t(z))/(sigma*sqrt(t(z)));
        d2 = d1 - sigma*sqrt(t(z));
        N1 = 0.5*(1+erf(d1/sqrt(2)));
        N2 = 0.5*(1+erf(d2/sqrt(2)));
        C(l) = S*N1 - K(l)*exp(-r*t(z))*N2;
    end
end
end
For example, the code to call my function would be
S = 20
K = 16:21
t = 1:1:5
r = 0.02
sigma = 0.25
C = BlackScholesCall(S, K, t, r, sigma)
But when I compare this with the results of the blsprice function in Matlab, I get different results. I suspect there might be something wrong with the way I did the loop?
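For reference, the standard Black-Scholes call price that the function is meant to implement is (with N the standard normal CDF, computed above via erf):
$$C = S\,N(d_1) - K e^{-rt}\,N(d_2), \qquad d_1 = \frac{\ln(S/K) + (r + \sigma^2/2)\,t}{\sigma\sqrt{t}}, \qquad d_2 = d_1 - \sigma\sqrt{t}.$$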
You are getting the same results as,
>> blsprice(S,K,r,t(end),sigma)
ans =
    7.1509    6.6114    6.1092    5.6427    5.2102    4.8097
This is because by using C(l) = ... you are overwriting each element of C numel(t) times, and hence only storing/returning the values calculated for the last value of z.
At a minimum you need to use,
%C(l) = S*N1-K(l)*exp(-r*t(z))*N2;
C(z,l) = S*N1-K(l)*exp(-r*t(z))*N2;
But you should also pre-allocate your output matrix. That is, before either of the loops, you should add
C = nan(numel(t), numel(K));
Finally, you should note that you don't need to use any loops at all,
[Kmat,tmat] = meshgrid(K,t);
d1 = (log(S./Kmat) + (r + 0.5*sigma^2)*tmat)./(sigma*sqrt(tmat));
d2 = d1 - sigma*sqrt(tmat);
N1 = 0.5*(1+erf(d1/sqrt(2)));
N2 = 0.5*(1+erf(d2/sqrt(2)));
C = S*N1-Kmat.*exp(-r*tmat).*N2;
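Since the question compares against blsprice, here is a quick sanity check of the vectorized result (a sketch assuming the Financial Toolbox is available; with meshgrid(K,t), C is numel(t)-by-numel(K)):
% Row z of C holds the prices for all strikes at maturity t(z)
for z = 1:numel(t)
    assert(max(abs(C(z,:) - blsprice(S, K, r, t(z), sigma))) < 1e-10)
end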
An R version could be the following.
BlackScholesCall <- function(S, K, tt, r, sigma){
  f <- function(.K, .tt){
    d1 <- (log(S/.K) + (r + 0.5*sigma^2)*.tt)/(sigma*sqrt(.tt))
    d2 <- d1 - sigma*sqrt(.tt)
    S*pnorm(d1) - .K*exp(-r*.tt)*pnorm(d2)
  }
  m <- length(K)
  n <- length(tt)
  o <- outer(K, tt, f)
  last <- if(m > n) o[n:m, n] else o[m, m:n]
  c(diag(o), last)
}
BlackScholesCall(S, K, tt, r, sigma)
#[1] 4.703480 4.783563 4.914990 5.059922 5.210161 5.210161 4.809748

Vectorization of Matrix Quadratics in MATLAB

I am trying to "vectorize" this loop in Matlab for computational efficiency
for t = 1:T
    j = 1;
    for m = 1:M
        for n = 1:N
            y(t,j) = v{m,n} + data(t,:)*b{m,n} + data(t,:)*f{m,n}*data(t,:)';
            j = j+1;
        end
    end
end
Here v is an (M x N) cell array of scalars, b is an (M x N) cell array of (K x 1) vectors, f is an (M x N) cell array of (K x K) matrices, and data is a (T x K) array.
To give an example of what I mean, the code I used to vectorize the same loop without the quadratic term is:
B = [reshape(cell2mat(v)',1,N*M); cell2mat(reshape(b',1,M*N))];
X = [ones(T,1), data];
y = X*B;
Thanks!
For those interested, here is the solution I found:
f = f';
tMat = blkdiag(f{:}) + (blkdiag(f{:}))';
y2BB = [reshape(cell2mat(v)',1,N*M); ...
        cell2mat(reshape(b',1,M*N)); ...
        reshape(diag(blkdiag(f{:})),K,N*M); ...
        reshape(tMat((tril(tMat,-1)~=0)),sum(1:K-1),M*N)];
y2YBar = [ones(T,1), data, data.^2];
jj = 1;
kk = 1;
ll = 1;
for k = 1:sum(1:K-1)
    y2YBar = [y2YBar, data(:,jj).*data(:,kk+jj)];
    if kk < (K-ll)
        kk = kk+1;
    else
        kk = 1;
        jj = jj+1;
        ll = ll+1;
    end
end
y = y2YBar*y2BB;
Here's the most vectorized form, targeted for performance:
% Extract as multi-dim arrays
vA = reshape([v{:}],M,N);
bA = reshape([b{:}],K,M,N);
fA = reshape([f{:}],K,K,M,N);

% Perform: data(t,:)*f{m,n} for all iterations
data_f_mult = reshape(data*reshape(fA,K,[]),T,K,M,N);

% Now there are three parts:
%   v{m,n}
%   data(t,:)*b{m,n}
%   data(t,:)*f{m,n}*data(t,:)'
% Compute those parts one by one
parte1 = vA(:).';
parte2 = data*reshape(bA,[],M*N);
parte3 = zeros(T,M*N);
for t = 1:T
    parte3(t,:) = data(t,:)*reshape(data_f_mult(t,:,:),K,[]);
end

% Finally sum those up and, to present in the desired format, permute dims
sums = bsxfun(@plus, parte1, parte2 + parte3);
out = reshape(permute(reshape(sums,T,M,N),[1,3,2]),[],M*N);
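To sanity-check the vectorized out against the original triple loop, you can run both on small random inputs; a sketch (names as above):
T = 5; M = 2; N = 3; K = 4;
data = rand(T, K);
v = num2cell(rand(M, N));                                          % M x N scalars
b = arrayfun(@(~) rand(K,1), zeros(M,N), 'UniformOutput', false);  % K x 1 vectors
f = arrayfun(@(~) rand(K,K), zeros(M,N), 'UniformOutput', false);  % K x K matrices
y = zeros(T, M*N);
for t = 1:T
    j = 1;
    for m = 1:M
        for n = 1:N
            y(t,j) = v{m,n} + data(t,:)*b{m,n} + data(t,:)*f{m,n}*data(t,:)';
            j = j + 1;
        end
    end
end
% ...then run the vectorized code above to produce `out`; the difference
max(abs(y(:) - out(:)))   % should be at floating-point round-off level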

How to write/code several functions as one

I am trying to write a line composed of two segments as a single equation:
y = m1*x + c1 , for x<=x1
y = m2*x + c2 , for x>=x1
My questions are:
How can I write the function of this combined line as a single equation?
How can I write multiple functions (valid in separate regions of a linear parameter space) as a single equation?
Please explain both how to express this mathematically and how to program this in general and in Matlab specifically.
You can write this equation as a single line by using the Heaviside step function, https://en.wikipedia.org/wiki/Heaviside_step_function.
Combining two functions into one:
In fact, what you are trying to do is
f(x) = a(x) (for x < x1)
f(x) = q (for x = x1), where q = a(x1) = b(x1)
f(x) = b(x) (for x > x1)
The (half-maximum) Heaviside function is defined as
H(x) = 0 (for x < 0)
H(x) = 0.5 (for x = 0)
H(x) = 1 (for x > 0)
Hence, your function will be
f(x) = H(x1-x) * a(x) + H(x-x1) * b(x)
and, therefore,
f(x) = H(x1-x) * (m1*x+c1) + H(x-x1) * (m2*x+c2)
If you want to implement this, note that many programming languages will allow you to write something like
f(x) = (x<x1)?a(x):b(x)
which means if x<x1, then return value a(x), else return b(x), or in your case:
f(x) = (x<x1)?(m1*x+c1):(m2*x+c2)
Matlab implementation:
In Matlab, you can write simple anonymous functions such as
a = @(x) m1.*x + c1;
b = @(x) m2.*x + c2;
assuming that you have previously defined m1, m2, and c1, c2.
There are several ways of implementing the Heaviside function:
If you have the Symbolic Math Toolbox for Matlab, you can directly use heaviside() as a function.
@AndrasDeak pointed out that you can write your own half-maximum Heaviside function H in Matlab by entering
iif = @(varargin) varargin{2 * find([varargin{1:2:end}], 1, 'first')}();
H = @(x) iif(x<0,0,x>0,1,true,0.5);
If you want a continuous function that approximates the Heaviside function, you can use a logistic function H defined as
H = @(x) 1./(1+exp(-100.*x));
Independently of your implementation of the Heaviside function H, you can create a one-liner in the following way (I am using x1=0 for simplicity):
a = @(x) 2.*x + 3;
b = @(x) -1.5.*x + 3;
which allows you to write your original function as a one-liner:
f = @(x) H(-x).*a(x) + H(x).*b(x);
You can then plot this function, for example from -10 to 10, by writing plot(-10:10, f(-10:10)).
Generalization:
Imagine you have
f(x) = a(x) (for x < x1)
f(x) = q (for x = x1), where q = a(x1) = b(x1)
f(x) = b(x) (for x1 < x < x2)
f(x) = r (for x = x2), where r = b(x2) = c(x2)
f(x) = c(x) (for x2 < x < x3)
f(x) = s (for x = x3), where s = c(x3) = d(x3)
f(x) = d(x) (for x3 < x)
By multiplying Heaviside functions, you can now determine zones where specific functions will be computed.
f(x) = H(x1-x)*a(x) + H(x-x1)*H(x2-x)*b(x) + H(x-x2)*H(x3-x)*c(x) + H(x-x3)*d(x)
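In Matlab, the generalization reads the same way; a sketch with hypothetical function handles a, b, c, d and breakpoints x1 < x2 < x3 (H as defined above):
f = @(x) H(x1-x).*a(x) + H(x-x1).*H(x2-x).*b(x) ...
       + H(x-x2).*H(x3-x).*c(x) + H(x-x3).*d(x);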
PS: I just realized that one of the comments above talks about the Heaviside function, too. Kudos to @AndrasDeak.

MATLAB - vectorize iteration over two matrices used in function

I have two matrices X and Y, both of order m x n. I want to create a new matrix Z of order m x 1 such that the ith entry of Z is computed by applying a function to the ith row of X and the ith row of Y. In my case m = 100000 and n = 2. I tried using a loop, but it takes forever.
for i = 1:m
    Z(i) = somefunction(X(i,:), Y(i,:), constant_parameters);
end
Is there an efficient way to vectorize it?
EDIT 1
This is the function
function [peso] = fxPesoTexturaCN(a, b, img, r, L)
    ac = num2cell(a);
    bc = num2cell(b);
    imgint1 = img(sub2ind(size(img), ac{:}));  % image intensity at pixel a
    imgint2 = img(sub2ind(size(img), bc{:}));  % image intensity at pixel b
    peso = (sum((a - b) .^ 2) + (r/L) * (imgint2 - imgint1)) / (2*r^2);
end
where img, r, and L are constants, a is X(i,:), and b is Y(i,:).
And the call of this function is
peso = bsxfun(@(a,b) fxPesoTexturaCN(a,b,img,r,L), a, b);
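Assuming X and Y hold integer pixel coordinates, one pixel per row (as the sub2ind calls in fxPesoTexturaCN imply), the whole computation can be done without a loop. A sketch:
% Linear indices of the pixels given by the rows of X and Y
idxX = sub2ind(size(img), X(:,1), X(:,2));
idxY = sub2ind(size(img), Y(:,1), Y(:,2));
% m x 1 vector of weights, equivalent to applying fxPesoTexturaCN row by row
Z = (sum((X - Y).^2, 2) + (r/L) * (img(idxY) - img(idxX))) / (2*r^2);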