Interpolation of 3D regular grid data using scipy.interpolate.Rbf - scipy

This h5 file contains the values of an analytical function on a regular 3D grid. I got very poor results when interpolating it with RegularGridInterpolator. Now I want to test the scipy.interpolate.Rbf interpolator on my data set. Can anyone help me do that? I had a look at the documentation of this interpolator but did not understand it properly.
I created the h5 file like this:
import numpy as np
from numpy import gradient
import h5py
from scipy.interpolate import Rbf
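def f(x, y, z):
    return -1 / np.sqrt(x**2 + y**2 + z**2)
#grid
# note: (0, 0, 0) is a grid point, and f evaluates to -inf there
x = np.linspace(0, 100, 32) # since the boxsize is 320 Mpc/h
y = np.linspace(0, 100, 32)
z = np.linspace(0, 100, 32)
# evaluate f (defined above) on the grid; the original called an undefined phi_an
mesh_data = f(*np.meshgrid(x, y, z, indexing='ij', sparse=True))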
#create h5 file
h5file = h5py.File('analytic.h5', 'w')
h5file.create_dataset('/x', data=x)
h5file.create_dataset('/y', data=y)
h5file.create_dataset('/z', data=z)
h5file.create_dataset('/mesh_data', data=mesh_data)
h5file.close()
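A minimal sketch of how Rbf could be applied to data like this, under some assumptions that are not in the question: the grid is shrunk to 8 points per axis and starts at 1 instead of 0 (so that f stays finite), and the data is generated in memory rather than read back from analytic.h5. Rbf takes the flattened coordinate arrays followed by the flattened values, and with the default smooth=0 it should reproduce the data at the nodes.

```python
import numpy as np
from scipy.interpolate import Rbf

def f(x, y, z):
    return -1.0 / np.sqrt(x**2 + y**2 + z**2)

# Assumption: a small grid that avoids the singularity at the origin
axis = np.linspace(1.0, 100.0, 8)
X, Y, Z = np.meshgrid(axis, axis, axis, indexing='ij')
values = f(X, Y, Z)

# Rbf expects 1-D coordinate arrays, then the 1-D data array
rbf = Rbf(X.ravel(), Y.ravel(), Z.ravel(), values.ravel(),
          function='multiquadric')

# Largest deviation from the data at the grid nodes themselves
node_err = np.max(np.abs(rbf(X.ravel(), Y.ravel(), Z.ravel()) - values.ravel()))

# Evaluate at an arbitrary off-grid point and compare with the analytic value
value_off_grid = float(rbf(50.0, 50.0, 50.0))
exact_off_grid = f(50.0, 50.0, 50.0)
print(node_err, value_off_grid, exact_off_grid)
```

Rbf solves a dense N x N linear system, so the full 32 x 32 x 32 grid (32768 points) would need a matrix of roughly 8.6 GB in double precision. For data of that size, subsampling or scipy.interpolate.RBFInterpolator with its neighbors argument (SciPy 1.7 and later) may be more practical.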

Related

Interpolation with radial basis function in julia

I have found a few radial basis function packages in Julia, such as BasisExpansionFunction, Surrogates.jl and ScatteredInterpolation.
However, I am unable to replicate the results of Python's scipy.interpolate.Rbf class.
Python Example
from scipy.interpolate import Rbf
import numpy as np
xs = np.arange(10)
ys = xs**2 + np.sin(xs) + 1
interp_func = Rbf(xs, ys) # By default, Rbf uses the multiquadric function
newarr = interp_func(np.arange(2.1, 3, 0.1))
print(newarr)
What is the correct approach to replicate the above example in Julia?
The first tutorial in the Surrogates.jl documentation shows how to construct a radial basis surrogate and evaluate it.
using Surrogates
using LinearAlgebra
f = x -> x[1]*x[2]
lb = [1.0,2.0]
ub = [10.0,8.5]
x = sample(50,lb,ub,SobolSample())
y = f.(x)
my_radial_basis = RadialBasis(x,y,lb,ub)
#I want an approximation at (1.0,1.4)
approx = my_radial_basis((1.0,1.4))
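To match SciPy's numbers in another language it helps to know what Rbf computes. The sketch below rebuilds the Python example by hand with NumPy: a multiquadric kernel sqrt((r/epsilon)^2 + 1), a dense linear solve for the weights, and a kernel-weighted sum at the new points. The helper name multiquadric and the variable names are made up for illustration, and epsilon is read from the fitted Rbf object rather than re-derived, so any port would need to use the same value.

```python
import numpy as np
from scipy.interpolate import Rbf

xs = np.arange(10, dtype=float)
ys = xs**2 + np.sin(xs) + 1

rbf = Rbf(xs, ys)      # defaults: multiquadric kernel, smooth=0
eps = rbf.epsilon      # shape parameter chosen by SciPy for this data

def multiquadric(r, eps):
    # hypothetical helper: the multiquadric radial function
    return np.sqrt((r / eps) ** 2 + 1.0)

# Kernel matrix between all pairs of nodes, then solve A @ weights = ys
A = multiquadric(np.abs(xs[:, None] - xs[None, :]), eps)
weights = np.linalg.solve(A, ys)

# Evaluate the interpolant at the new points
x_new = np.arange(2.1, 3, 0.1)
B = multiquadric(np.abs(x_new[:, None] - xs[None, :]), eps)
manual = B @ weights

print(manual)
print(rbf(x_new))
```

If a Julia package produces different values, the kernel definition and the shape parameter are the first things to compare, since packages do not necessarily share SciPy's defaults.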

Applying scipy.stats.gaussian_kde to 3D point cloud

I have a set of about 33K (x,y,z) points in a csv file and would like to convert this to a grid of density values using scipy.stats.gaussian_kde. I have not been able to find a way to convert this point cloud array into an appropriate input format for the gaussian_kde function (and then take the output of this and convert it into a density value grid). Can anyone provide sample code?
Here's an example with some comments that may be of use. gaussian_kde expects both the data and the evaluation points to be stacked row-wise, i.e. with shape (ndim, nvalues), as per the docs. In your case you would use np.row_stack([x, y, z]) (an alias of np.vstack) so that the shape is (3, 33000).
from scipy.stats import gaussian_kde
import numpy as np
import matplotlib.pyplot as plt
# simulate some data
n = 33000
x = np.random.randn(n)
y = np.random.randn(n) * 2
# data must be stacked as (ndim, nvalues) as per docs
# (np.vstack replaces np.row_stack, which is deprecated since NumPy 2.0)
data = np.vstack((x, y))
# perform KDE
kernel = gaussian_kde(data)
# create grid over which to evaluate KDE
s = np.linspace(-8, 8, 128)
grid = np.meshgrid(s, s)
# again, the KDE needs the points stacked row-wise
grid_points = np.vstack([g.ravel() for g in grid])
# evaluate KDE and reshape result correctly
Z = kernel(grid_points)
Z = Z.reshape(grid[0].shape)
# plot KDE as image and overlay some data points
fig, ax = plt.subplots()
ax.matshow(Z, extent=(s.min(), s.max(), s.min(), s.max()))
ax.plot(x[::10], y[::10], 'w.', ms=1, alpha=0.3)
ax.set_xlim(s.min(), s.max())
ax.set_ylim(s.min(), s.max())
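The example above is 2D, while the question asks about (x, y, z) points. A rough 3D sketch along the same lines follows, assuming the CSV has already been loaded into an array of shape (n, 3); random data stands in for it here, and the sample count and grid size are kept small so that it runs quickly.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Assumption: stand-in for the point cloud read from the csv, shape (n, 3)
rng = np.random.default_rng(0)
points = rng.normal(size=(2000, 3))

# gaussian_kde wants shape (ndim, n), so transpose to (3, n)
kernel = gaussian_kde(points.T)

# Regular 3D grid on which to evaluate the density
axis = np.linspace(-4, 4, 16)
gx, gy, gz = np.meshgrid(axis, axis, axis, indexing='ij')
grid_points = np.vstack([gx.ravel(), gy.ravel(), gz.ravel()])  # (3, 16**3)

# Evaluate and reshape back to the grid shape
density = kernel(grid_points).reshape(gx.shape)
print(density.shape)
```

The cost grows with the number of data points times the number of grid points, so 33K points on a fine 3D grid can take a while. Evaluating the grid in chunks is one way to keep memory use bounded.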

gaussian process regression in multiple dimensions with GPflow

I would like to perform multivariate regression using Gaussian process regression (GPR) as implemented in GPflow, version 2.
Installed with pip install gpflow==2.0.0rc1
Below is some example code that generates some 2D data, attempts to fit it using GPR, and finally computes the difference between the true input data and the GPR prediction.
Eventually I would like to extend this to higher dimensions, test against a validation set to check for over-fitting, and experiment with other kernels and "Automatic Relevance Determination", but understanding how to get this to work is the first step.
Thanks!
The following code snippet will work in a jupyter notebook.
import gpflow
import numpy as np
import matplotlib
from gpflow.utilities import print_summary
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (12, 6)
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
def gen_data(X, Y):
    """
    make some fake data.
    X, Y are np.ndarrays with shape (N,) where
    N is the number of samples.
    """
    ys = []
    for x0, x1 in zip(X, Y):
        y = x0 * np.sin(x0*10)
        y = x1 * np.sin(x0*10)  # note: overwrites the value from the previous line
        y += 1
        ys.append(y)
    return np.array(ys)
# generate some fake data
x = np.linspace(0, 1, 20)
X, Y = np.meshgrid(x, x)
X = X.ravel()
Y = Y.ravel()
z = gen_data(X, Y)
#note X.shape, Y.shape and z.shape
#are all (400,) for this case.
# if you would like to plot the data you can do the following
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # Axes3D(fig) no longer attaches itself to the figure in recent Matplotlib
ax.scatter(X, Y, z, s=100, c='k')
# had to set this
# to avoid the following error
# tensorflow.python.framework.errors_impl.InvalidArgumentError: Cholesky decomposition was not successful. The input might not be valid. [Op:Cholesky]
gpflow.config.set_default_positive_minimum(1e-7)
# setup the kernel
k = gpflow.kernels.Matern52()
# set up GPR model
# I think the shape of the independent data
# should be (400, 2) for this case
XY = np.column_stack([[X, Y]]).T
print(XY.shape) # this will be (400, 2)
m = gpflow.models.GPR(data=(XY, z), kernel=k, mean_function=None)
# optimise hyper-parameters
opt = gpflow.optimizers.Scipy()
def objective_closure():
    return - m.log_marginal_likelihood()
opt_logs = opt.minimize(objective_closure,
                        m.trainable_variables,
                        options=dict(maxiter=100)
                        )
# predict training set
mean, var = m.predict_f(XY)
print(mean.numpy().shape)
# (400, 400)
# I would expect this to be (400,)
# If it was then I could compute the difference
# between the true data and the GPR prediction
# `diff = mean - z`
# but because the shape is not as expected this of course
# won't work.
The shape of z must be (N, 1), whereas in your case it is (N,). However, this is a missing check in GPflow and not your fault.
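The effect of the missing trailing dimension can be shown with plain NumPy. The sketch below does not call GPflow; it only illustrates how combining an (N, 1) array with an (N,) array broadcasts to (N, N), which is an assumption about where the (400, 400) in the question comes from. The zero-filled mean array is a placeholder for the model prediction.

```python
import numpy as np

N = 400
z = np.arange(N, dtype=float)   # shape (N,), like z in the question
mean = np.zeros((N, 1))         # placeholder for a prediction of shape (N, 1)

# (N, 1) combined with (N,) broadcasts to (N, N)
bad_diff = mean - z
print(bad_diff.shape)

# Giving z an explicit column dimension keeps everything at (N, 1)
z_col = z.reshape(-1, 1)
good_diff = mean - z_col
print(good_diff.shape)
```

In the question's code this would mean building the model with data=(XY, z.reshape(-1, 1)) and computing the difference against that same column vector.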

Skewnorm not fitting properly

This is a follow-up to my previous question here. I'm trying to fit my data from this csv file with scipy.stats.skewnorm, but I can't get it working right:
import matplotlib.pyplot as plt
import pandas as pd
from scipy.stats import skewnorm
df = pd.read_csv('astro_data.csv')
x = df['delta z']
number_bins = 50
fig, ax = plt.subplots()
h, edges, _ = ax.hist(x, alpha = 0.5,
                      density = False,
                      bins = number_bins)
a_est, loc_est, scale_est = skewnorm.fit(x)
ax.plot(x, skewnorm.pdf(x, a_est, loc_est, scale_est), 'r-', lw=5, alpha=0.6, label='skewnorm pdf')
Can anyone see how I can fix this?
EDIT: when I change to density=True, the result is this:
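Without the CSV file or the figure, the cause can only be guessed at. Two things in the code above commonly produce a mismatched plot: a histogram of raw counts (density=False) is on a different scale from a pdf, and drawing a line through the pdf evaluated at the unsorted data values makes the curve zigzag. The sketch below uses synthetic skew-normal data as a stand-in for df['delta z'], normalizes the histogram, and evaluates the fitted pdf on a sorted grid.

```python
import numpy as np
from scipy.stats import skewnorm

# Assumption: synthetic data in place of the 'delta z' column
rng = np.random.default_rng(0)
x = skewnorm.rvs(4.0, loc=0.0, scale=1.0, size=5000, random_state=rng)

a_est, loc_est, scale_est = skewnorm.fit(x)

# Normalized histogram: bar areas sum to 1, so it is comparable to a pdf
counts, edges = np.histogram(x, bins=50, density=True)
hist_area = np.sum(counts * np.diff(edges))

# Evaluate the fitted pdf on a sorted, evenly spaced grid, not on x itself
x_grid = np.linspace(x.min(), x.max(), 200)
pdf_grid = skewnorm.pdf(x_grid, a_est, loc_est, scale_est)

print(a_est, loc_est, scale_est, hist_area)
```

For plotting, ax.hist(x, bins=50, density=True) followed by ax.plot(x_grid, pdf_grid) would put both on the same scale. If the fit itself still looks off, passing starting values to skewnorm.fit may be worth trying.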

Plotting the result of interpolation

I have an h5 file containing regular-grid data. I have code that gives me the interpolated value at any three given coordinates, using RegularGridInterpolator for the interpolation. Now I want to make a plot to check whether the interpolation is correct, but I don't understand how to do that. Can anyone help me with this? Here is my code:
import numpy as np
import h5py
from scipy.interpolate import RegularGridInterpolator
f = h5py.File('file.h5', 'r')
list(f.keys())
dset = f[u'data']
dset.shape
dset[()].shape  # dset.value was removed in h5py 3.0; dset[()] reads the whole array
dset[0:63,0:63,0:63]
x = np.linspace(-10, 320, 64)
y = np.linspace(-10, 320, 64)
z = np.linspace(-10, 320, 64)
my_interpolating_function = RegularGridInterpolator((x, y, z), dset[()])
pts = np.array([7.36970468e-09, -4.54271563e-09, 1.51802701e-09])
my_interpolating_function(pts)
The output of the interpolation is array([5.45534467e-10])
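One way to check an interpolator is to evaluate it along a line through the volume and compare against values that are known independently. The sketch below assumes the underlying function is known analytically; g is a made-up smooth test function, since the contents of file.h5 are not available, and the data is generated on the same axes as in the question.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def g(x, y, z):
    # hypothetical smooth test function standing in for the real data
    return np.sin(x / 50.0) + np.cos(y / 50.0) + z / 100.0

x = np.linspace(-10, 320, 64)
y = np.linspace(-10, 320, 64)
z = np.linspace(-10, 320, 64)
data = g(*np.meshgrid(x, y, z, indexing='ij', sparse=True))

interp = RegularGridInterpolator((x, y, z), data)

# Sample along the diagonal of the box, at points that are mostly off-grid
t = np.linspace(0, 300, 200)
line = np.column_stack([t, t, t])   # shape (200, 3)
interp_vals = interp(line)
exact_vals = g(t, t, t)

max_err = np.max(np.abs(interp_vals - exact_vals))
print(max_err)
```

Plotting exact_vals and interp_vals against t on the same axes (for example with matplotlib's plt.plot) gives the visual check. When no analytic reference exists, one option is to build the interpolator from every second grid point and compare its predictions with the points that were left out.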