Solving for a nonlinear Hamiltonian using SciPy's fsolve - nonlinear-optimization

I am in the midst of solving for a nonlinear Hamiltonian of a dimer, which consists of 2 complex wavefunctions. I am using SciPy's root solver method by iterations. Thus, the complex input for my initial guess has to be encoded into real and imaginary parts, which will then make the initial 2x1 matrix equation into a 4x1 matrix equation.
What I've tried :
from scipy.optimize import fsolve
import math
import numpy as np
import cmath
from math import pi
k = np.linspace(-pi, pi, 100)
v = 1
w = 1
H = np.zeros((2, 2), dtype=complex)
H[0, 1] = v+w*cmath.exp(-1j*k)
H[1, 0] = v+w*cmath.exp(1j*k)
E=np.zeros((2,2), )
E[0,1]=-(v**2+w**2+2*v*w*math.cos(k))**0.5
E[1,0]=(v**2+w**2+2*v*w*math.cos(k))**0.5
def f(x):
psi = np.zeros((2,1),float)
psi[0]=np.isreal(x[0])+1j*np.iscomplex(x[1])
psi[1]=np.isreal(x[2])+1j*np.iscomplex(x[3])
f=(H-E)*psi
return f
m = fsolve(f,np.array([[1],[2],[3],[4]]))
print (m.x)
The error message that I'm getting is as follows
TypeError: only length-1 arrays can be converted to Python scalars #for line 12
I need help to rectify this mistake.

Related

Why does stats.linregress return complex r-values for complex input arrays?

I'm attempting to perform linear regression on two complex arrays. That is, I'd like to find the line of best fit, w=mz+b, where m and b are both permitted to be complex and where the R^2-value, R^2=1-RSS/TSS is minimized. (Here RSS and TSS are the sum of squared residuals and the total of sum of squares.)
I know this can be done by creating a design matrix, computing m and b, etc., but out of curiosity, I tried using linregress from scipy.stats, which did return values:
import numpy as np
from scipy import stats
rng = np.random.default_rng()
x = rng.random(10)+1j*rng.random(10)
y = 1.6*x + rng.random(10)+1j*rng.random(10)
res = stats.linregress(x, y)
print(res)
LinregressResult(slope=(1.5814820568268182-0.004143389169974774j), intercept=.
(0.37141513243354485+0.4522070413718836j), rvalue=(0.8607413430092087-
0.002255091256570885j), pvalue=0.00138658952096427, stderr=.
(0.3306870298601568+0.0024769249452937106j), intercept_stderr=.
(0.16366363994151886+0.12045799398296754j))
What meaning does a non-real, complex-valued rvalue have? Is the modulus of this value the coefficient of determination?

How to correctly set the 'rtol' and 'atol' in scipy integration module 'solve_ivp' for solving a system of ODE with unknown analytic solution?

I was trying to reproduce some results of ode45 solver in Python using solve_ivp. Though all parameters, initial conditions, step size, and 'atol' and 'rtol' (which are 1e-6 and 1e-3) are same, I am getting different solutions. Both of the solutions are converging to a periodic solution but of different kind. As solve_ivp uses same rk4(5) method as ode45, this discrepancy in the final result is not quite understable. How can we know which one is the correct solution?
The code is included below
import sys
import numpy as np
from scipy.integrate import solve_ivp
#from scipy import integrate
import matplotlib.pyplot as plt
from matplotlib.patches import Circle
# Pendulum rod lengths (m), bob masses (kg).
L1, L2, mu, a1 = 1, 1, 1/5, 1
m1, m2, B = 1, 1, 0.1
# The gravitational acceleration (m.s-2).
g = 9.81
# The forcing frequency,forcing amplitude
w, a_m =10, 4.5
A=(a_m*w**2)/g
A1=a_m/g
def deriv(t, y, mu, a1, B, w, A): # beware of the order of the aruments
"""Return the first derivatives of y = theta1, z1, theta2, z2, z3."""
a, c, b, d, e = y
#c, s = np.cos(theta1-theta2), np.sin(theta1-theta2)
adot = c
cdot = (-(1-A*np.sin(e))*(((1+mu)*np.sin(a))-(mu*np.cos(a-b)*np.sin(b)))-((mu/a1)*((d**2)+(a1*np.cos(a-b)*c**2))*np.sin(a-b))-(2*B*(1+(np.sin(a-b))**2)*c)-((2*B*A/w)*(2*np.sin(a)-(np.cos(a-b)*np.sin(b)))*np.cos(e)))/(1+mu*(np.sin(a-b))**2)
bdot = d
ddot = ((-a1*(1+mu)*(1-A*np.sin(e))*(np.sin(b)-(np.cos(a-b)*np.sin(a))))+(((a1*(1+mu)*c**2)+(mu*np.cos(a-b)*d**2))*np.sin(a-b))-((2*B/mu)*(((1+mu*(np.sin(a-b))**2)*d)+(a1*(1-mu)*np.cos(a-b)*c)))-((2*B*a1*A/(w*mu))*(((1+mu)*np.sin(b))-(2*mu*np.cos(a-b)*np.sin(a)))*np.cos(e)))/(1+mu*(np.sin(a-b))**2)
edot = w
return adot, cdot, bdot, ddot, edot
# Initial conditions: theta1, dtheta1/dt, theta2, dtheta2/dt.
y0 = np.array([3.15, -0.1, 3.13, 0.1, 0])
# Do the numerical integration of the equations of motion
sol = integrate.solve_ivp(deriv,[0,40000], y0, args=(mu, a1, B, w, A), method='RK45',t_eval=np.arange(0, 40000, 0.005), dense_output=True, rtol=1e-3, atol=1e-6)
T = sol.t
Y = sol.y
I am expecting similar result from ode45 in MATLAB and solve_ivp in Python. How can I exactly reproduce the result from ode45 in python? What is the reason of discrepancy?
Even if ode45and RK45use the same underlying scheme, they do not necessarily use the same exact strategy regarding the evolution of the time step and its adaptation to match the error tolerance. Thus, it is difficult to know which one is better.
The only thing you could is simply trying lower tolerances, e.g. 1e-10. Then, both solutions should end up being virtually identical... Here, your current error tolerance might be insufficiently low, so that small discrepancies in the fine details of both algorithms create a visible difference in the solution.

How to define a loss function in pytorch with dependency to partial derivatives of the model w.r.t input?

After reading about how to solve an ODE with neural networks following the paper Neural Ordinary Differential Equations and the blog that uses the library JAX I tried to do the same thing with "plain" Pytorch but found a point rather "obscure": How to properly use the partial derivative of a function (in this case the model) w.r.t one of the input parameters.
To resume the problem at hand as shown in 2 it is intended to solve the ODE y' = -2*x*y with the condition y(x=0) = 1 in the domain -2 <= x <= 2. Instead of using finite differences the solution is replaced by a NN as y(x) = NN(x) with a single layer with 10 nodes.
I managed to (more or less) replicate the blog with the following code
import torch
import torch.nn as nn
from torch import optim
import matplotlib.pyplot as plt
import numpy as np
# Define the NN model to solve the problem
class Model(nn.Module):
def __init__(self):
super(Model, self).__init__()
self.lin1 = nn.Linear(1,10)
self.lin2 = nn.Linear(10,1)
def forward(self, x):
x = torch.sigmoid(self.lin1(x))
x = torch.sigmoid(self.lin2(x))
return x
model = Model()
# Define loss_function from the Ordinary differential equation to solve
def ODE(x,y):
dydx, = torch.autograd.grad(y, x,
grad_outputs=y.data.new(y.shape).fill_(1),
create_graph=True, retain_graph=True)
eq = dydx + 2.* x * y # y' = - 2x*y
ic = model(torch.tensor([0.])) - 1. # y(x=0) = 1
return torch.mean(eq**2) + ic**2
loss_func = ODE
# Define the optimization
# opt = optim.SGD(model.parameters(), lr=0.1, momentum=0.99,nesterov=True) # Equivalent to blog
opt = optim.Adam(model.parameters(),lr=0.1,amsgrad=True) # Got faster convergence with Adam using amsgrad
# Define reference grid
x_data = torch.linspace(-2.0,2.0,401,requires_grad=True)
x_data = x_data.view(401,1) # reshaping the tensor
# Iterative learning
epochs = 1000
for epoch in range(epochs):
opt.zero_grad()
y_trial = model(x_data)
loss = loss_func(x_data, y_trial)
loss.backward()
opt.step()
if epoch % 100 == 0:
print('epoch {}, loss {}'.format(epoch, loss.item()))
# Plot Results
plt.plot(x_data.data.numpy(), np.exp(-x_data.data.numpy()**2), label='exact')
plt.plot(x_data.data.numpy(), y_data.data.numpy(), label='approx')
plt.legend()
plt.show()
From here I manage to get the results as shown in the fig.
enter image description here
The problems is that at the definition of the ODE functional, instead of passing (x,y) I would rather prefer to pass something like (x,fun) (where fun is my model) such that the partial derivative and specific evaluations of the model can be done with a call . So, something like
def ODE(x,fun):
dydx, = "grad of fun w.r.t x as a function"
eq = dydx(x) + 2.* x * fun(x) # y' = - 2x*y
ic = fun( torch.tensor([0.]) ) - 1. # y(x=0) = 1
return torch.mean(eq**2) + ic**2
Any ideas? Thanks in advance
EDIT:
After some trials I found a way to pass the model as an input but found another strange behavior... The new problem is to solve the ODE y'' = -2 with the BC y(x=-2) = -1 and y(x=2) = 1, for which the analytical solution is y(x) = -x^2+x/2+4
Let's modify a bit the previous code as:
import torch
import torch.nn as nn
from torch import optim
import matplotlib.pyplot as plt
import numpy as np
# Define the NN model to solve the equation
class Model(nn.Module):
def __init__(self):
super(Model, self).__init__()
self.lin1 = nn.Linear(1,10)
self.lin2 = nn.Linear(10,1)
def forward(self, x):
y = torch.sigmoid(self.lin1(x))
z = torch.sigmoid(self.lin2(y))
return z
model = Model()
# Define loss_function from the Ordinary differential equation to solve
def ODE(x,fun):
y = fun(x)
dydx = torch.autograd.grad(y, x,
grad_outputs=y.data.new(y.shape).fill_(1),
create_graph=True, retain_graph=True)[0]
d2ydx2 = torch.autograd.grad(dydx, x,
grad_outputs=dydx.data.new(dydx.shape).fill_(1),
create_graph=True, retain_graph=True)[0]
eq = d2ydx2 + torch.tensor([ 2.]) # y'' = - 2
bc1 = fun(torch.tensor([-2.])) - torch.tensor([-1.]) # y(x=-2) = -1
bc2 = fun(torch.tensor([ 2.])) - torch.tensor([ 1.]) # y(x= 2) = 1
return torch.mean(eq**2) + bc1**2 + bc2**2
loss_func = ODE
So, here I passed the model as argument and managed to derive twice... so far so good. BUT, using the sigmoid function for this case is not only not necessary but also gives a result that is far from the analytical one.
If I change the NN for:
class Model(nn.Module):
def __init__(self):
super(Model, self).__init__()
self.lin1 = nn.Linear(1,1)
self.lin2 = nn.Linear(1,1)
def forward(self, x):
y = self.lin1(x)
z = self.lin2(y)
return z
In which case I would expect to optimize a double pass through two linear functions that would retrieve a 2nd order function ... I get the error:
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
Adding the option to the definition of dydx doesn't solve the problem, and adding it to d2ydx2 gives a NoneType definition.
Is there something wrong with the layers as they are?
Quick Solution:
add allow_unused=True to .grad functions. So, change
dydx = torch.autograd.grad(
y, x,
grad_outputs=y.data.new(y.shape).fill_(1),
create_graph=True, retain_graph=True)[0]
d2ydx2 = torch.autograd.grad(dydx, x, grad_outputs=dydx.data.new(
dydx.shape).fill_(1), create_graph=True, retain_graph=True)[0]
To
dydx = torch.autograd.grad(
y, x,
grad_outputs=y.data.new(y.shape).fill_(1),
create_graph=True, retain_graph=True, allow_unused=True)[0]
d2ydx2 = torch.autograd.grad(dydx, x, grad_outputs=dydx.data.new(
dydx.shape).fill_(1), create_graph=True, retain_graph=True, allow_unused=True)[0]
More explanation:
See what allow_unused do:
allow_unused (bool, optional): If ``False``, specifying inputs that were not
used when computing outputs (and therefore their grad is always zero)
is an error. Defaults to ``False``.
So, if you try to differentiate w.r.t to a variable that is not in being used to compute the value, it will give an error. Also, note that error only occurs when you use linear layers.
This is because when you use linear layers, you have y=W1*W2*x + b = Wx+b and dy/dx is not a function of x, it is simply W. So when you try to differentiate dy/dx w.r.t x it throws an error. This error goes away as soon as you use sigmoid because then dy/dx will be a function of x. To avoid the error, either make sure dy/dx is a function of x or use allow_unused=True

Merging two tensors by convolution in Keras

I'm trying to convolve two 1D tensors in Keras.
I get two inputs from other models:
x - of length 100
ker - of length 5
I would like to get the 1D convolution of x using the kernel ker.
I wrote a Lambda layer to do it:
import tensorflow as tf
def convolve1d(x):
y = tf.nn.conv1d(value=x[0], filters=x[1], padding='VALID', stride=1)
return y
x = Input(shape=(100,))
ker = Input(shape=(5,))
y = Lambda(convolve1d)([x,ker])
model = Model([x,ker], [y])
I get the following error:
ValueError: Shape must be rank 4 but is rank 3 for 'lambda_67/conv1d/Conv2D' (op: 'Conv2D') with input shapes: [?,1,100], [1,?,5].
Can anyone help me understand how to fix it?
It was much harder than I expected because Keras and Tensorflow don't expect any batch dimension in the convolution kernel so I had to write the loop over the batch dimension myself, which requires to specify batch_shape instead of just shape in the Input layer. Here it is :
import numpy as np
import tensorflow as tf
import keras
from keras import backend as K
from keras import Input, Model
from keras.layers import Lambda
def convolve1d(x):
input, kernel = x
output_list = []
if K.image_data_format() == 'channels_last':
kernel = K.expand_dims(kernel, axis=-2)
else:
kernel = K.expand_dims(kernel, axis=0)
for i in range(batch_size): # Loop over batch dimension
output_temp = tf.nn.conv1d(value=input[i:i+1, :, :],
filters=kernel[i, :, :],
padding='VALID',
stride=1)
output_list.append(output_temp)
print(K.int_shape(output_temp))
return K.concatenate(output_list, axis=0)
batch_input_shape = (1, 100, 1)
batch_kernel_shape = (1, 5, 1)
x = Input(batch_shape=batch_input_shape)
ker = Input(batch_shape=batch_kernel_shape)
y = Lambda(convolve1d)([x,ker])
model = Model([x, ker], [y])
a = np.ones(batch_input_shape)
b = np.ones(batch_kernel_shape)
c = model.predict([a, b])
In the current state :
It doesn't work for inputs (x) with multiple channels.
If you provide several filters, you get as many outputs, each being the convolution of the input with the corresponding kernel.
From given code it is difficult to point out what you mean when you say
is it possible
But if what you mean is to merge two layers and feed merged layer to convulation, yes it is possible.
x = Input(shape=(100,))
ker = Input(shape=(5,))
merged = keras.layers.concatenate([x,ker], axis=-1)
y = K.conv1d(merged, 'same')
model = Model([x,ker], y)
EDIT:
#user2179331 thanks for clarifying your intention. Now you are using Lambda Class incorrectly, that is why the error message is showing.
But what you are trying to do can be achieved using keras.backend layers.
Though be noted that when using lower level layers you will lose some higher level abstraction. E.g when using keras.backend.conv1d you need to have input shape of (BATCH_SIZE,width, channels) and kernel with shape of (kernel_size,input_channels,output_channels). So in your case let as assume the x has channels of 1(input channels ==1) and y also have the same number of channels(output channels == 1).
So your code now can be refactored as follows
from keras import backend as K
def convolve1d(x,kernel):
y = K.conv1d(x,kernel, padding='valid', strides=1,data_format="channels_last")
return y
input_channels = 1
output_channels = 1
kernel_width = 5
input_width = 100
ker = K.variable(K.random_uniform([kernel_width,input_channels,output_channels]),K.floatx())
x = Input(shape=(input_width,input_channels)
y = convolve1d(x,ker)
I guess I have understood what you mean. Given the wrong example code below:
input_signal = Input(shape=(L), name='input_signal')
input_h = Input(shape=(N), name='input_h')
faded= Lambda(lambda x: tf.nn.conv1d(input, x))(input_h)
You want to convolute each signal vector with different fading coefficients vector.
The 'conv' operation in TensorFlow, etc. tf.nn.conv1d, only support a fixed value kernel. Therefore, the code above can not run as you want.
I have no idea, too. The code you given can run normally, however, it is too complex and not efficient. In my idea, another feasible but also inefficient way is to multiply with the Toeplitz matrix whose row vector is the shifted fading coefficients vector. When the signal vector is too long, the matrix will be extremely large.

Regression not possible for same y value

I want to run a regression analysis on below data, here x1 and x2 produce y value. But in that case, y value is fixed in all time. So regression will not happen. But why? Need explanation.
Your training set shows that the coefficients are all ~0 and the constant is 5. There's no more information in that dataset, you don't need regression to show that.
You did not specify what kind of regression you are running. Depending on the type of regression you are using, you will need the matrices to be invertible and not be related linearly.
It seems to work using normal equation (with expected results):
import numpy as np
import matplotlib.pyplot as plt
input = np.array([
[2,3,5],
[1,2,5],
[4,2,5],
[1,7,5],
[1,9,5]
])
m = len(input)
X = np.array([np.ones(m), input[:, 0],input[:, 1]]).T # Add Constant to X
y = np.array(input[:, 2]).reshape(-1, 1) # Get the dependant values
betaHat = np.linalg.solve(X.T.dot(X), X.T.dot(y)) # Calculate coefficients
print(betaHat) # Show Constant and coefficients (in that order)
[[ 5.00000000e+00]
[ 5.29208238e-16]
[ 4.32685981e-17]]