How to plot date range with vlines in matplotlib?

The answer to the question "Matplotlib DateFormatter for axis label not working" appears relevant to my problem; however, it applies to ax.bar() rather than ax.vlines().
The code below uses ax1.vlines(x, l, h, colors='k') and ax2.vlines(x, 0, v, colors='k') to plot vertical price and volume bars in a stock chart. However, the horizontal axis is defined by a numpy array x = 0, 1, 2, 3, and so on. I have datetime objects in array d, but if I change the calls to ax1.vlines(d, l, h, colors='k') and ax2.vlines(d, 0, v, colors='k'), an error is thrown. Thus d is defined but not used in the code below: the referenced lines work with x but not with d.
import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv(PATH+ticker+EXT, usecols=[0,2,3,4,5], header=None,
                 engine='python', skiprows=skr, skipfooter=skf)
d = pd.to_datetime(df[0]) # numpy array date
h = df[2].values # numpy array high
l = df[3].values # numpy array low
c = df[4].values # numpy array close
v = df[5].values # numpy array volume
x = np.arange(len(d))
# Draw Chart to White Background
ax1_y_label = ticker
fig1 = plt.figure()
fig1.set_size_inches(WIDE,TALL)
fig1.set_dpi(DTPI)
fig1.autofmt_xdate()
ax1 = plt.subplot2grid((5,4), (0,0), rowspan=4, colspan=4)
ax1.set_ylabel(ax1_y_label)
ax1.grid(True)
ax1.vlines(x, l, h, colors='k')
ax1.hlines(c, x, x+0.3, color='k')
ax2 = plt.subplot2grid((5,4), (4,0), sharex=ax1, rowspan=1, colspan=4)
ax2.set_ylabel(ax2_y_label)
ax2.grid(True)
ax2.vlines(x, 0, v, colors='k')
ax1.spines['top'].set_visible(False)
ax1.spines['right'].set_visible(False)
ax2.spines['right'].set_visible(False)
plt.setp(ax1.get_xticklabels(), visible=False)
plt.setp(ax1.get_yticklabels(), visible=False)
plt.setp(ax2.get_yticklabels(), visible=False)
plt.subplots_adjust(hspace=.01)
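One way to get real dates on the horizontal axis, sketched here under assumptions rather than taken from the code above, is to convert the datetimes to matplotlib's float date numbers with matplotlib.dates.date2num and pass those to vlines and hlines. The data below (dates, low, high, close, volume) is a hypothetical stand-in for the CSV columns, and the two-row layout is simplified to plt.subplots.

```python
import datetime

import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical stand-in data; in the question these come from the CSV file.
dates = [datetime.datetime(2020, 1, 1) + datetime.timedelta(days=i) for i in range(10)]
low = np.linspace(10.0, 12.0, 10)
high = low + 1.5
close = low + 1.0
volume = np.linspace(100.0, 200.0, 10)

# Convert datetimes to floats (days since matplotlib's epoch).
x = mdates.date2num(dates)

fig, (ax1, ax2) = plt.subplots(
    2, 1, sharex=True, gridspec_kw={"height_ratios": [4, 1]}
)
ax1.vlines(x, low, high, colors="k")        # price range bars
ax1.hlines(close, x, x + 0.3, colors="k")   # close tick, 0.3 days wide
ax2.vlines(x, 0, volume, colors="k")        # volume bars

# Tell the shared x axis to render the floats as dates.
ax2.xaxis_date()
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()  # called after the axes exist so it can rotate the labels
```

Because x is numeric, offsets such as x + 0.3 keep working; the unit is days. Call plt.show() or fig.savefig(...) to see the result.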

How to set different stride with uniform filter in scipy?

I am using the following code to run uniform filter on my data:
import numpy as np
from scipy.ndimage import uniform_filter
a = np.arange(1000)
b = uniform_filter(a, size=10)
The filter right now seems to work as if a stride were set to size // 2.
How can I adjust the code so that the stride of the filter is not half of the size?
You seem to be misunderstanding what uniform_filter is doing.
In this case, it creates an array b that replaces every a[i] with the mean of a block of size 10 centered at a[i]. So, something like:
for i in range(len(a)):  # for the 1D case
    b[i] = mean(a[i - 10//2 : i + 10//2])
Note that this tries to access values with indices outside the range 0..999. By default (mode='reflect'), uniform_filter assumes that the data before position 0 is a reflection of the data that follows, and similarly at the end.
Also note that b has the same type as a. In the example, where a is of integer type, the mean is also calculated as an integer, which can cause some loss of precision.
Here is some code and plot to illustrate what's happening:
import matplotlib.pyplot as plt
import numpy as np
from scipy.ndimage import uniform_filter

fig, axes = plt.subplots(ncols=2, figsize=(15, 4))
for ax in axes:
    if ax == axes[1]:
        a = np.random.uniform(-1, 1, 50).cumsum()
        ax.set_title('random curve')
    else:
        a = np.arange(50, dtype=float)
        ax.set_title('values from 0 to 49')
    b = uniform_filter(a, size=10)
    ax.plot(a, 'b-')
    ax.plot(-np.arange(0, 10) - 1, a[:10], 'b:')  # show the reflection at the start
    ax.plot(50 + np.arange(0, 10), a[:-11:-1], 'b:')  # show the reflection at the end
    ax.plot(b, 'r-')
plt.show()
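As a rough check of the description above, the following sketch rebuilds the filter output by hand from a reflected, padded copy of the data and compares it with uniform_filter. The names size, stride, padded and manual are illustrative only. If the goal is one average every stride samples, one simple option (an assumption about what the question wants) is to slice the filtered result.

```python
import numpy as np
from scipy.ndimage import uniform_filter

a = np.arange(50, dtype=float)
size = 10
b = uniform_filter(a, size=size)  # default boundary handling is mode='reflect'

# Rebuild the result by hand. NumPy's 'symmetric' padding repeats the edge
# value, which matches what SciPy calls 'reflect'.
padded = np.pad(a, size // 2, mode="symmetric")
manual = np.array([padded[i:i + size].mean() for i in range(len(a))])
print(np.allclose(b, manual))

# The filter always produces one output per input sample. To keep only every
# stride-th window, slice the output.
stride = 5
strided = b[::stride]
print(strided.shape)

# With integer input the output keeps the integer dtype, so means are truncated.
a_int = np.arange(50)
b_int = uniform_filter(a_int, size=size)
print(b_int.dtype == a_int.dtype)
```

Converting the input with a.astype(float) before filtering avoids the integer truncation.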

Applying scipy.stats.gaussian_kde to 3D point cloud

I have a set of about 33K (x,y,z) points in a csv file and would like to convert this to a grid of density values using scipy.stats.gaussian_kde. I have not been able to find a way to convert this point cloud array into an appropriate input format for the gaussian_kde function (and then take the output of this and convert it into a density value grid). Can anyone provide sample code?
Here's an example with some comments which may be of use. gaussian_kde wants the data and the evaluation points to be row stacked, i.e. with shape (# ndim, # num values), as per the docs. In your case you would use np.vstack([x, y, z]) so that the shape is (3, 33000).
from scipy.stats import gaussian_kde
import numpy as np
import matplotlib.pyplot as plt
# simulate some data
n = 33000
x = np.random.randn(n)
y = np.random.randn(n) * 2
# data must be stacked as (# ndim, # n values) as per docs.
# (np.vstack replaces np.row_stack, which is deprecated in NumPy 2.0)
data = np.vstack((x, y))
# perform KDE
kernel = gaussian_kde(data)
# create grid over which to evaluate KDE
s = np.linspace(-8, 8, 128)
grid = np.meshgrid(s, s)
# again KDE needs points to be stacked as (# ndim, # n values)
grid_points = np.vstack([g.ravel() for g in grid])
# evaluate KDE and reshape result correctly
Z = kernel(grid_points)
Z = Z.reshape(grid[0].shape)
# plot KDE as image and overlay some data points
fig, ax = plt.subplots()
# origin='lower' puts row 0 (the smallest y) at the bottom, matching the overlay
ax.matshow(Z, extent=(s.min(), s.max(), s.min(), s.max()), origin='lower')
ax.plot(x[::10], y[::10], 'w.', ms=1, alpha=0.3)
ax.set_xlim(s.min(), s.max())
ax.set_ylim(s.min(), s.max())
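The same pattern extends to three dimensions. The sketch below is an illustration on simulated data, not the asker's CSV: the point count, grid resolution and axis ranges are assumptions chosen to keep it fast. With real data, x, y and z would be the three columns read from the file.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Simulated (x, y, z) point cloud standing in for the CSV contents.
rng = np.random.default_rng(0)
n = 2000
x = rng.standard_normal(n)
y = rng.standard_normal(n) * 2
z = rng.standard_normal(n) * 0.5

# Stack as (ndim, npoints) = (3, n).
data = np.vstack([x, y, z])
kernel = gaussian_kde(data)

# Build a 3D grid; indexing='ij' keeps the axis order (x, y, z).
sx = np.linspace(-4, 4, 12)
sy = np.linspace(-8, 8, 12)
sz = np.linspace(-2, 2, 12)
gx, gy, gz = np.meshgrid(sx, sy, sz, indexing="ij")

# Evaluation points use the same (ndim, npoints) layout as the data.
grid_points = np.vstack([gx.ravel(), gy.ravel(), gz.ravel()])
density = kernel(grid_points).reshape(gx.shape)
print(density.shape)
```

Evaluation cost grows with the number of data points times the number of grid points, so a fine 3D grid over 33K points can take a while; evaluating the grid in chunks is one way to limit memory use.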

gaussian process regression in multiple dimensions with GPflow

I would like to perform some multivariate regression using Gaussian process regression as implemented in GPflow version 2.
Installed with pip install gpflow==2.0.0rc1
Below is some example code that generates some 2D data, attempts to fit it using GPR, and finally computes the difference
between the true input data and the GPR prediction.
Eventually I would like to extend this to higher dimensions,
test against a validation set to check for over-fitting,
and experiment with other kernels and "Automatic Relevance Determination",
but understanding how to get this to work is the first step.
Thanks!
The following code snippet will work in a Jupyter notebook.
import gpflow
import numpy as np
import matplotlib
from gpflow.utilities import print_summary
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (12, 6)
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
def gen_data(X, Y):
    """
    make some fake data.
    X, Y are np.ndarrays with shape (N,) where
    N is the number of samples.
    """
    ys = []
    for x0, x1 in zip(X,Y):
        y = x0 * np.sin(x0*10)
        y = x1 * np.sin(x0*10)
        y += 1
        ys.append(y)
    return np.array(ys)
# generate some fake data
x = np.linspace(0, 1, 20)
X, Y = np.meshgrid(x, x)
X = X.ravel()
Y = Y.ravel()
z = gen_data(X, Y)
#note X.shape, Y.shape and z.shape
#are all (400,) for this case.
# if you would like to plot the data you can do the following
fig = plt.figure()
ax = Axes3D(fig)
ax.scatter(X, Y, z, s=100, c='k')
# had to set this
# to avoid the following error
# tensorflow.python.framework.errors_impl.InvalidArgumentError: Cholesky decomposition was not successful. The input might not be valid. [Op:Cholesky]
gpflow.config.set_default_positive_minimum(1e-7)
# setup the kernel
k = gpflow.kernels.Matern52()
# set up GPR model
# I think the shape of the independent data
# should be (400, 2) for this case
XY = np.column_stack([[X, Y]]).T
print(XY.shape) # this will be (400, 2)
m = gpflow.models.GPR(data=(XY, z), kernel=k, mean_function=None)
# optimise hyper-parameters
opt = gpflow.optimizers.Scipy()
def objective_closure():
    return - m.log_marginal_likelihood()
opt_logs = opt.minimize(objective_closure,
                        m.trainable_variables,
                        options=dict(maxiter=100)
                        )
# predict training set
mean, var = m.predict_f(XY)
print(mean.numpy().shape)
# (400, 400)
# I would expect this to be (400,)
# If it was then I could compute the difference
# between the true data and the GPR prediction
# `diff = mean - z`
# but because the shape is not as expected this of course
# won't work.
The shape of z must be (N, 1), whereas in your case it is (N,). However, this is a missing check in GPflow and not your fault.
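In the question's code, that would mean passing z.reshape(-1, 1) as the targets when building the model. The NumPy-only sketch below shows why mixing the two shapes yields an (N, N) array. It does not call GPflow: mean_like is a made-up stand-in for a predicted mean of shape (N, 1), which is the shape the prediction is expected to have once the targets are a column.

```python
import numpy as np

N = 400
z = np.linspace(0.0, 1.0, N)   # shape (N,), as in the question
z_col = z.reshape(-1, 1)       # shape (N, 1), the layout the model expects

# Hypothetical stand-in for a predicted mean with shape (N, 1).
mean_like = z_col + 0.01

# Mixing (N, 1) with (N,) broadcasts to (N, N), which is not what is wanted.
wrong = mean_like - z
print(wrong.shape)

# Keeping both operands as columns gives an elementwise difference.
diff = mean_like - z_col
print(diff.shape)

# Alternatively, flatten the prediction to compare against the 1D targets.
diff_flat = mean_like.ravel() - z
print(diff_flat.shape)
```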

How to use interpn?

I am trying to use interpn (in Python, using SciPy) to replicate results from MATLAB's interp3. However, I am struggling to structure my arguments. I tried the following line:
f = interpn(blur_maps, fx, fy, pyr_level)
where blur_maps is a 600 x 800 x 7 array representing a grayscale image at seven levels of blur, and
fx and fy are indices into the seven maps. Both fx and fy are 2D arrays. pyr_level is a 2D array that contains values from 1 to 7 representing the blur map to be interpolated.
My question is: since I arranged the arguments incorrectly, how can I arrange them in a way that works? I tried to look up examples but didn't see anything similar. Here is an example of the data I am trying to interpolate:
import numpy as np
import cv2, math
from scipy.interpolate import interpn
levels = 7
img_path = '/Users/alimahdi/Desktop/i4.jpg'
img = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2GRAY)
row, col = img.shape
x_range = np.arange(0, col)
y_range = np.arange(0, row)
fx, fy = np.meshgrid(x_range, y_range)
e = np.exp(np.sqrt(fx ** 2 + fy ** 2))
pyr_level = 7 * (e - np.min(e)) / (np.max(e) - np.min(e))
blur_maps = np.zeros((row, col, levels))
blur_maps[:, :, 0] = img
for i in range(levels - 1):
    img = cv2.pyrDown(img)
    r, c = img.shape
    tmp = img
    for j in range(int(math.log(row / r, 2))):
        tmp = cv2.pyrUp(tmp)
    blur_maps[:, :, i + 1] = tmp
pixelGrid = [np.arange(x) for x in blur_maps.shape]
interpPoints = np.array([fx.flatten(), fy.flatten(), pyr_level.flatten()])
interpValues = interpn(pixelGrid, blur_maps, interpPoints.T)
finalValues = np.reshape(interpValues, fx.shape)
I am now getting the following error: ValueError: One of the requested xi is out of bounds in dimension 0. I know that the problem is in interpPoints, but I am not sure how to fix it. Any suggestions?
The documentation for scipy.interpolate.interpn states that the first argument is the grid the data is defined on (here just the integer pixel indices), the second argument is the data (blur_maps), and the third argument is the interpolation points with shape (npoints, ndims). So you would have to do something like:
import scipy.interpolate
pixelGrid = [np.arange(x) for x in blur_maps.shape] # create grid of pixel numbers as per the docs
interpPoints = np.array([fx.flatten(), fy.flatten(), pyr_level.flatten()])
# interpolate
interpValues = scipy.interpolate.interpn(pixelGrid, blur_maps, interpPoints.T)
# now reshape the output array to get in the original format you wanted
finalValues = np.reshape(interpValues, fx.shape)
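An out-of-bounds error in dimension 0 is consistent with two details that interpn is strict about: each point's coordinates must follow the axis order of the data (row, column, level for blur_maps), and every coordinate must lie inside its grid vector unless bounds_error=False is passed. The sketch below uses small random data in place of the image pyramid. Subtracting 1 from pyr_level assumes the levels are 1-based as described in the text, and the clip is a guard in case they are not.

```python
import numpy as np
from scipy.interpolate import interpn

rows, cols, levels = 6, 8, 7
rng = np.random.default_rng(0)
blur_maps = rng.random((rows, cols, levels))  # stand-in for the blur pyramid

# One grid vector per axis of blur_maps: rows, columns, levels.
pixel_grid = [np.arange(n) for n in blur_maps.shape]

# meshgrid returns column indices in fx and row indices in fy.
fx, fy = np.meshgrid(np.arange(cols), np.arange(rows))

# Hypothetical level map with values from 1 to 7, shifted to 0-based and
# clipped so that every value stays inside the level grid 0..levels-1.
pyr_level = np.linspace(1, levels, rows * cols).reshape(rows, cols)
level_idx = np.clip(pyr_level - 1, 0, levels - 1)

# Points have shape (npoints, 3), ordered to match the axes: (row, col, level).
points = np.stack([fy.ravel(), fx.ravel(), level_idx.ravel()], axis=-1)
values = interpn(pixel_grid, blur_maps, points).reshape(fx.shape)
print(values.shape)
```

With the original ordering (fx first), column indices up to 799 would be checked against a row grid that ends at 599, which would trigger the error.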

Skewnorm not fitting properly

This is a follow-up to my previous question here. I'm trying to fit my data from this csv file with scipy.stats.skewnorm, but I can't get it working right:
import matplotlib.pyplot as plt
import pandas as pd
from scipy.stats import skewnorm
df = pd.read_csv('astro_data.csv')
x = df['delta z']
number_bins = 50
fig, ax = plt.subplots()
h, edges, _ = ax.hist(x, alpha = 0.5,
                      density = False,
                      bins = number_bins)
a_est, loc_est, scale_est = skewnorm.fit(x)
ax.plot(x, skewnorm.pdf(x, a_est, loc_est, scale_est), 'r-', lw=5, alpha=0.6, label='skewnorm pdf')
Can anyone see how I can fix this?
EDIT: When I change to density=True, the result is this: [plot not included]
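Two things in the snippet above commonly distort this kind of plot; whether they explain this particular result is a guess, since the data and figures are not available here. First, the pdf is drawn as a connected line against the unsorted data values, so the line jumps back and forth. Second, a pdf integrates to 1 while a histogram with density=False shows counts. The sketch below uses synthetic skew-normal data as a stand-in for astro_data.csv, evaluates the fitted pdf on a sorted grid, and rescales it to counts.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import skewnorm

# Synthetic stand-in for the 'delta z' column.
x = skewnorm.rvs(a=4, loc=0.0, scale=1.0, size=2000, random_state=0)
number_bins = 50

a_est, loc_est, scale_est = skewnorm.fit(x)

# Evaluate the fitted pdf on a sorted, evenly spaced grid.
x_grid = np.linspace(x.min(), x.max(), 200)
pdf = skewnorm.pdf(x_grid, a_est, loc_est, scale_est)

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(x, alpha=0.5, density=False, bins=number_bins)

# Rescale the pdf to the count scale: pdf * number of samples * bin width.
bin_width = edges[1] - edges[0]
ax.plot(x_grid, pdf * len(x) * bin_width, "r-", lw=2, label="skewnorm pdf (scaled)")
ax.legend()
```

With density=True the rescaling is unnecessary and pdf can be plotted directly against x_grid.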