Applying scipy.stats.gaussian_kde to a 3D point cloud

I have a set of about 33K (x,y,z) points in a csv file and would like to convert this to a grid of density values using scipy.stats.gaussian_kde. I have not been able to find a way to convert this point cloud array into an appropriate input format for the gaussian_kde function (and then take the output of this and convert it into a density value grid). Can anyone provide sample code?

Here's an example with some comments which may be of use. gaussian_kde wants the data and the evaluation points to be row-stacked, i.e. with shape (# dims, # points), as per the docs. In your case you would use np.row_stack([x, y, z]) so that the shape is (3, 33000).
from scipy.stats import gaussian_kde
import numpy as np
import matplotlib.pyplot as plt
# simulate some data
n = 33000
x = np.random.randn(n)
y = np.random.randn(n) * 2
# data must be stacked as (# ndim, # n values) as per docs.
data = np.row_stack((x, y))
# perform KDE
kernel = gaussian_kde(data)
# create grid over which to evaluate KDE
s = np.linspace(-8, 8, 128)
grid = np.meshgrid(s, s)
# again KDE needs points to be row_stacked
grid_points = np.row_stack([g.ravel() for g in grid])
# evaluate KDE and reshape result correctly
Z = kernel(grid_points)
Z = Z.reshape(grid[0].shape)
# plot KDE as an image and overlay some of the data points
fig, ax = plt.subplots()
# use imshow with origin='lower' so the image rows match the y coordinates
# (matshow forces origin='upper', which would flip the density vertically)
ax.imshow(Z, extent=(s.min(), s.max(), s.min(), s.max()), origin='lower')
ax.plot(x[::10], y[::10], 'w.', ms=1, alpha=0.3)
ax.set_xlim(s.min(), s.max())
ax.set_ylim(s.min(), s.max())
plt.show()
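For the 3D case in the question, the same pattern applies with a third coordinate. A minimal sketch, assuming the CSV holds three columns x, y, z (the file name points.csv is hypothetical):
pts = np.loadtxt('points.csv', delimiter=',')  # shape (33000, 3)
data = pts.T                                   # gaussian_kde wants (3, N)
kernel = gaussian_kde(data)
# evaluate on a 32x32x32 grid spanning the data
axes = [np.linspace(pts[:, i].min(), pts[:, i].max(), 32) for i in range(3)]
gx, gy, gz = np.meshgrid(*axes, indexing='ij')
grid_points = np.row_stack([gx.ravel(), gy.ravel(), gz.ravel()])
density = kernel(grid_points).reshape(gx.shape)  # (32, 32, 32) grid of density values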

Related

How to set different stride with uniform filter in scipy?

I am using the following code to run a uniform filter on my data:
from scipy.ndimage import uniform_filter
import numpy as np
a = np.arange(1000)
b = uniform_filter(a, size=10)
The filter right now seems to work as if a stride was set to size // 2.
How can I adjust the code so that the stride of the filter is not half of the size?
You seem to be misunderstanding what uniform_filter is doing.
In this case, it creates an array b that replaces every a[i] with the mean of a block of size 10 centered at a[i]. So, something like:
for i in range(len(a)):  # for the 1D case
    b[i] = mean(a[i - 10//2 : i + 10//2])
Note that this tries to access values with indices outside the valid range 0..999. By default (mode='reflect'), uniform_filter supposes that the data before position 0 is just a reflection of the data after it, and similarly at the end.
Also note that b gets the same dtype as a. In the example, where a is of integer type, the mean will also be calculated in integer arithmetic, which can cause some loss of precision.
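A quick illustration of this dtype effect:
import numpy as np
from scipy.ndimage import uniform_filter
a_int = np.arange(10)                  # integer dtype
a_flt = np.arange(10, dtype=float)
print(uniform_filter(a_int, size=4))   # means truncated to integers
print(uniform_filter(a_flt, size=4))   # exact means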
Here is some code and plot to illustrate what's happening:
import matplotlib.pyplot as plt
import numpy as np
from scipy.ndimage import uniform_filter

fig, axes = plt.subplots(ncols=2, figsize=(15, 4))
for ax in axes:
    if ax is axes[1]:
        a = np.random.uniform(-1, 1, 50).cumsum()
        ax.set_title('random curve')
    else:
        a = np.arange(50, dtype=float)
        ax.set_title('values from 0 to 49')
    b = uniform_filter(a, size=10)
    ax.plot(a, 'b-')
    ax.plot(-np.arange(0, 10) - 1, a[:10], 'b:')      # show the reflection at the start
    ax.plot(50 + np.arange(0, 10), a[:-11:-1], 'b:')  # show the reflection at the end
    ax.plot(b, 'r-')
plt.show()
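As for the stride itself: uniform_filter has no stride parameter; it produces one output value per input position. If what you want is a strided (downsampled) result, one simple approach is to filter and then slice:
from scipy.ndimage import uniform_filter
import numpy as np
a = np.arange(1000, dtype=float)  # float avoids the integer-mean truncation noted above
stride = 10                       # keep every 10th filtered value
b = uniform_filter(a, size=10)[::stride]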

gaussian process regression in multiple dimensions with GPflow

I would like to perform some multivariate regression using Gaussian process regression as implemented in GPflow version 2.
Installed with pip install gpflow==2.0.0rc1
Below is some example code that generates some 2D data, attempts to fit it using GPR, and finally computes the difference
between the true data and the GPR prediction.
Eventually I would like to extend this to higher dimensions,
test against a validation set to check for over-fitting,
and experiment with other kernels and "Automatic Relevance Determination",
but understanding how to get this working is the first step.
Thanks!
The following code snippet will work in a jupyter notebook.
import gpflow
import numpy as np
import matplotlib
from gpflow.utilities import print_summary
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (12, 6)
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
def gen_data(X, Y):
    """
    Make some fake data.
    X, Y are np.ndarrays with shape (N,) where
    N is the number of samples.
    """
    ys = []
    for x0, x1 in zip(X, Y):
        y = x0 * np.sin(x0 * 10)
        y = x1 * np.sin(x0 * 10)  # note: this overwrites the line above
        y += 1
        ys.append(y)
    return np.array(ys)
# generate some fake data
x = np.linspace(0, 1, 20)
X, Y = np.meshgrid(x, x)
X = X.ravel()
Y = Y.ravel()
z = gen_data(X, Y)
#note X.shape, Y.shape and z.shape
#are all (400,) for this case.
# if you would like to plot the data you can do the following
fig = plt.figure()
ax = Axes3D(fig)
ax.scatter(X, Y, z, s=100, c='k')
# had to set this
# to avoid the following error
# tensorflow.python.framework.errors_impl.InvalidArgumentError: Cholesky decomposition was not successful. The input might not be valid. [Op:Cholesky]
gpflow.config.set_default_positive_minimum(1e-7)
# setup the kernel
k = gpflow.kernels.Matern52()
# set up GPR model
# I think the shape of the independent data
# should be (400, 2) for this case
XY = np.column_stack([X, Y])
print(XY.shape) # this will be (400, 2)
m = gpflow.models.GPR(data=(XY, z), kernel=k, mean_function=None)
# optimise hyper-parameters
opt = gpflow.optimizers.Scipy()
def objective_closure():
    return -m.log_marginal_likelihood()

opt_logs = opt.minimize(objective_closure,
                        m.trainable_variables,
                        options=dict(maxiter=100))
# predict training set
mean, var = m.predict_f(XY)
print(mean.numpy().shape)
# (400, 400)
# I would expect this to be (400,)
# If it was then I could compute the difference
# between the true data and the GPR prediction
# `diff = mean - z`
# but because the shape is not as expected this of course
# won't work.
The shape of z must be (N, 1), whereas in your case it is (N,). However, this is a missing check in GPflow and not your fault.
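Based on that, a minimal fix is to reshape the targets to a column vector before constructing the model; everything else in your code can stay the same:
z = z.reshape(-1, 1)  # (400,) -> (400, 1)
m = gpflow.models.GPR(data=(XY, z), kernel=k, mean_function=None)
# ... optimise as before; predict_f then returns a (400, 1) mean
mean, var = m.predict_f(XY)
diff = mean.numpy().ravel() - z.ravel()  # true data minus GPR prediction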

How to use interpn?

I am trying to use interpn (in Python, using SciPy) to replicate results from Matlab's interp3. However, I am struggling to structure my arguments. I tried the following line:
f = interpn(blur_maps, fx, fy, pyr_level)
where blur_maps is a 600 x 800 x 7 array representing a grayscale image at seven levels of blur,
and fx and fy are pixel indices into the seven maps. Both fx and fy are 2D arrays. pyr_level is a 2D array containing values from 1 to 7 that select the blur map to be interpolated.
My question is: since I incorrectly arranged the arguments, how can I arrange them in a way that works? I tried to look up examples but didn't see anything similar. Here is an example of the data I am trying to interpolate:
import numpy as np
import cv2, math
from scipy.interpolate import interpn
levels = 7
img_path = '/Users/alimahdi/Desktop/i4.jpg'
img = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2GRAY)
row, col = img.shape
x_range = np.arange(0, col)
y_range = np.arange(0, row)
fx, fy = np.meshgrid(x_range, y_range)
e = np.exp(np.sqrt(fx ** 2 + fy ** 2))
pyr_level = 7 * (e - np.min(e)) / (np.max(e) - np.min(e))
blur_maps = np.zeros((row, col, levels))
blur_maps[:, :, 0] = img
for i in range(levels - 1):
    img = cv2.pyrDown(img)
    r, c = img.shape
    tmp = img
    for j in range(int(math.log(row / r, 2))):
        tmp = cv2.pyrUp(tmp)
    blur_maps[:, :, i + 1] = tmp
pixelGrid = [np.arange(x) for x in blur_maps.shape]
interpPoints = np.array([fx.flatten(), fy.flatten(), pyr_level.flatten()])
interpValues = interpn(pixelGrid, blur_maps, interpPoints.T)
finalValues = np.reshape(interpValues, fx.shape)
I am now getting the following error: ValueError: One of the requested xi is out of bounds in dimension 0. I know that the problem is in interpPoints, but I am not sure how to fix it. Any suggestions?
The documentation for scipy.interpolate.interpn states that the first argument is the grid over which the data is defined (here just the integer pixel indices along each axis), the second argument is the data itself (blur_maps), and the third argument is the interpolation points with shape (npoints, ndims). So you would have to do something like:
import numpy as np
import scipy.interpolate
pixelGrid = [np.arange(x) for x in blur_maps.shape] # create grid of pixel numbers as per the docs
interpPoints = np.array([fx.flatten(), fy.flatten(), pyr_level.flatten()])
# interpolate
interpValues = scipy.interpolate.interpn(pixelGrid, blur_maps, interpPoints.T)
# now reshape the output array to get in the original format you wanted
finalValues = np.reshape(interpValues, fx.shape)
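The remaining "out of bounds" error comes from how the points are ordered and scaled: blur_maps has shape (row, col, levels) = (600, 800, 7), so the first coordinate of each point must be the row index fy (fx runs up to 799, past the 600 rows of dimension 0), and pyr_level must stay within 0..6 rather than reaching 7. A sketch of the corrected point construction:
# scale levels into [0, levels - 1] and order coordinates as (row, col, level)
pyr_level = (levels - 1) * (e - np.min(e)) / (np.max(e) - np.min(e))
interpPoints = np.array([fy.flatten(), fx.flatten(), pyr_level.flatten()])
interpValues = scipy.interpolate.interpn(pixelGrid, blur_maps, interpPoints.T)
finalValues = np.reshape(interpValues, fx.shape)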

Merging two tensors by convolution in Keras

I'm trying to convolve two 1D tensors in Keras.
I get two inputs from other models:
x - of length 100
ker - of length 5
I would like to get the 1D convolution of x using the kernel ker.
I wrote a Lambda layer to do it:
import tensorflow as tf
from keras.layers import Input, Lambda
from keras.models import Model

def convolve1d(x):
    y = tf.nn.conv1d(value=x[0], filters=x[1], padding='VALID', stride=1)
    return y

x = Input(shape=(100,))
ker = Input(shape=(5,))
y = Lambda(convolve1d)([x, ker])
model = Model([x, ker], [y])
I get the following error:
ValueError: Shape must be rank 4 but is rank 3 for 'lambda_67/conv1d/Conv2D' (op: 'Conv2D') with input shapes: [?,1,100], [1,?,5].
Can anyone help me understand how to fix it?
It was much harder than I expected because Keras and TensorFlow don't expect any batch dimension in the convolution kernel, so I had to write the loop over the batch dimension myself, which requires specifying batch_shape instead of just shape in the Input layer. Here it is:
import numpy as np
import tensorflow as tf
import keras
from keras import backend as K
from keras import Input, Model
from keras.layers import Lambda
def convolve1d(x):
    inp, kernel = x
    # static batch size; known because the Inputs below use batch_shape
    batch_size = K.int_shape(inp)[0]
    output_list = []
    if K.image_data_format() == 'channels_last':
        kernel = K.expand_dims(kernel, axis=-2)
    else:
        kernel = K.expand_dims(kernel, axis=0)
    for i in range(batch_size):  # loop over the batch dimension
        output_temp = tf.nn.conv1d(value=inp[i:i+1, :, :],
                                   filters=kernel[i, :, :],
                                   padding='VALID',
                                   stride=1)
        output_list.append(output_temp)
    return K.concatenate(output_list, axis=0)
batch_input_shape = (1, 100, 1)
batch_kernel_shape = (1, 5, 1)
x = Input(batch_shape=batch_input_shape)
ker = Input(batch_shape=batch_kernel_shape)
y = Lambda(convolve1d)([x,ker])
model = Model([x, ker], [y])
a = np.ones(batch_input_shape)
b = np.ones(batch_kernel_shape)
c = model.predict([a, b])
In its current state:
- It doesn't work for inputs (x) with multiple channels.
- If you provide several filters, you get as many outputs, each being the convolution of the input with the corresponding kernel.
From the given code it is difficult to tell what you mean when you say
is it possible
But if what you mean is to merge two layers and feed the merged layer to a convolution, then yes, it is possible.
x = Input(shape=(100, 1))
ker = Input(shape=(5, 1))
merged = keras.layers.concatenate([x, ker], axis=1)
# K.conv1d needs an explicit kernel tensor of shape (width, in_channels, out_channels),
# not just a padding string
w = K.variable(K.random_uniform([5, 1, 1]))
y = keras.layers.Lambda(lambda m: K.conv1d(m, w, padding='same'))(merged)
model = Model([x, ker], y)
EDIT:
@user2179331 thanks for clarifying your intention. You are using the Lambda class incorrectly, which is why the error message appears.
What you are trying to do can be achieved using the keras.backend functions instead.
Note, though, that when using these lower-level functions you lose some of the higher-level abstraction. E.g. when using keras.backend.conv1d you need an input of shape (batch_size, width, channels) and a kernel of shape (kernel_size, input_channels, output_channels). So in your case let us assume x has 1 channel (input_channels == 1) and y also has 1 channel (output_channels == 1).
Your code can then be refactored as follows:
from keras import backend as K

def convolve1d(x, kernel):
    y = K.conv1d(x, kernel, padding='valid', strides=1, data_format="channels_last")
    return y

input_channels = 1
output_channels = 1
kernel_width = 5
input_width = 100
ker = K.variable(K.random_uniform([kernel_width, input_channels, output_channels]), K.floatx())
x = Input(shape=(input_width, input_channels))
y = convolve1d(x, ker)
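Note that K.conv1d returns a plain backend tensor, so to use this inside a Keras Model you would wrap the call in a Lambda layer, e.g. (reusing the Input/Lambda/Model imports from the answer above):
y = Lambda(lambda t: convolve1d(t, ker))(x)
model = Model(x, y)  # output width is input_width - kernel_width + 1 = 96 for 'valid' padding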
I think I have understood what you mean. Take the (incorrect) example code below:
input_signal = Input(shape=(L), name='input_signal')
input_h = Input(shape=(N), name='input_h')
faded= Lambda(lambda x: tf.nn.conv1d(input, x))(input_h)
You want to convolve each signal vector with a different vector of fading coefficients.
The 'conv' operations in TensorFlow, e.g. tf.nn.conv1d, only support a fixed-value kernel. Therefore, the code above cannot do what you want.
I don't have a better solution either. The code you gave runs correctly, but it is complex and inefficient. One other feasible, though also inefficient, approach is to multiply by the Toeplitz matrix whose rows are shifted copies of the fading-coefficient vector. When the signal vector is long, that matrix becomes extremely large.
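A minimal NumPy sketch of that Toeplitz idea (not a TensorFlow implementation): for a length-L signal and a length-N kernel, an (L - N + 1) x L matrix of shifted kernel copies reproduces the 'valid' sliding-window operation as a matrix product:
import numpy as np
from scipy.linalg import toeplitz
L, N = 100, 5
sig = np.random.randn(L)
ker = np.random.randn(N)
# rows of T are shifted copies of the kernel
T = toeplitz(np.r_[ker[0], np.zeros(L - N)], np.r_[ker, np.zeros(L - N)])
out = T @ sig  # shape (L - N + 1,)
assert np.allclose(out, np.correlate(sig, ker, mode='valid'))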

How to use scipy signal for MIMO systems

I am looking for a way to simulate the output of a system for various input signals. To be more precise, I have a system defined by its transfer function H that takes one input and has one output. I generated several input signals (stored in a numpy array), and I would like to get the response of the system to each input signal without using a for loop. Is there a way to proceed? Below is the code I wrote so far.
from __future__ import division
import numpy as np
from scipy import signal
nbr_inputs = 5
t_in = np.arange(0,10,0.2)
dim = (nbr_inputs, len(t_in))
x = np.cumsum(np.random.normal(0,2e-3, dim), axis=1)
H = signal.TransferFunction([1, 3, 3], [1, 2, 1])
t_out, y, _ = signal.lsim(H, x[0], t_in) # here, I would just like to simply write x
Thanks for your help.
This is not a MIMO system; it is a SISO system to which you want to apply multiple inputs.
You can, however, create a MIMO model and apply all your inputs at once, so that the channels are computed one by one but in a single call. Note that you can't use scipy.signal.lsim for MIMO systems yet. You can use other options such as python-control (if you have the slycot extension, otherwise again no MIMO) or harold if you have Python 3.6 or greater (disclaimer: I'm the author).
import numpy as np
from harold import *
import matplotlib.pyplot as plt
nbr_inputs = 5
t_in = np.arange(0,10,0.2)
dim = (nbr_inputs, len(t_in))
x = np.cumsum(np.random.normal(0,2e-3, dim), axis=1)
# Forming a 1x5 system, common denominator will be completed automatically
H = Transfer([[[1, 3, 3]]*nbr_inputs], [1, 2, 1])
The keyword per_channel=True applies the first input to the first channel, the second input to the second, and so on. Otherwise the combined response is returned. You can check the shapes by playing around with it to see what I mean.
# Notice it is x.T below -> input shape = <num samples>, <num inputs>
y, t = simulate_linear_system(H, x.T, t_in, per_channel=True)
plt.plot(t, y)
plt.show()
This gives a plot of the five output signals, one per input channel.