Chaco - Getting multiple data series to use the same axes and maps

I am trying to plot several collections of data on a single plot.
Each dataset can be represented as an x-series (index) and several y-series (values), and the x and y ranges may differ from one data set to the next. I want several of these data sets displayed on one plot. However, when I simply add the second plot object to the first (see below), it creates a second set of axes nested inside the plot.
I want both plots to share the same axes, with the axis bounds updated to fit all the data. What is the best way to achieve this? I am struggling to find topics on this in the documentation.
Thanks for your help. The code below highlights my problem.
# Major library imports
from numpy import linspace
from scipy.special import jn

from chaco.example_support import COLOR_PALETTE

# Enthought library imports
from enable.api import Component, ComponentEditor
from traits.api import HasTraits, Instance
from traitsui.api import Item, Group, View

# Chaco imports
from chaco.api import ArrayPlotData, Plot
from chaco.tools.api import BroadcasterTool, PanTool, ZoomTool
from chaco.api import create_line_plot, add_default_axes


def _create_plot_component():
    # Create some x-y data series to plot
    x = linspace(-2.0, 10.0, 100)
    x2 = linspace(-5.0, 10.0, 100)
    pd = ArrayPlotData(index=x)
    for i in range(5):
        pd.set_data("y" + str(i), jn(i, x))

    # slightly different plot data
    pd2 = ArrayPlotData(index=x2)
    for i in range(5):
        pd2.set_data("y" + str(i), 2 * jn(i, x2))

    # Create some line plots of some of the data
    plot1 = Plot(pd)
    plot1.plot(("index", "y0", "y1", "y2"), name="j_n, n<3", color="red")

    # Tweak some of the plot properties
    plot1.title = "My First Line Plot"
    plot1.padding = 50
    plot1.padding_top = 75
    plot1.legend.visible = True

    plot2 = Plot(pd2)
    plot2.plot(("index", "y0", "y1"), name="j_n, n<3", color="green")
    plot1.add(plot2)

    # Attach some tools to the plot
    broadcaster = BroadcasterTool()
    broadcaster.tools.append(PanTool(plot1))
    broadcaster.tools.append(PanTool(plot2))
    for c in (plot1, plot2):
        zoom = ZoomTool(component=c, tool_mode="box", always_on=False)
        broadcaster.tools.append(zoom)
    plot1.tools.append(broadcaster)
    return plot1

# Attributes to use for the plot view.
size = (900, 500)
title = "Multi-Y plot"

#===============================================================================
# Demo class that is used by the demo.py application.
#===============================================================================
class Demo(HasTraits):
    plot = Instance(Component)

    traits_view = View(
        Group(
            Item('plot', editor=ComponentEditor(size=size),
                 show_label=False),
            orientation="vertical"),
        resizable=True, title=title,
        width=size[0], height=size[1]
    )

    def _plot_default(self):
        return _create_plot_component()

demo = Demo()

if __name__ == "__main__":
    demo.configure_traits()

One of the warts in Chaco (and indeed many plotting libraries) is the overloading of terms---especially the word "plot".
You're creating two different (capital-"P") Plots, but (I believe) you really only want one. Plot is the container that holds all of your individual line ... umm ... plots. The Plot.plot method returns a list of LinePlot instances (this "plot" is also called a "renderer" sometimes). That renderer is what you want to add to your (capital-"P") Plot container. The plot method actually creates the LinePlot instance and adds it to the Plot container for you. (Yup, that's three different uses of "plot": The container, the renderer, and the method on the container that adds/returns the renderer.)
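To pin those three senses down in code, here is a minimal sketch (assuming the ArrayPlotData pd from your question, which has "index" and "y0" keys):
container = Plot(pd)                         # "plot" sense 1: the container
renderers = container.plot(("index", "y0"))  # sense 2: the method on the container
print(renderers)  # sense 3: the renderer(s) it created, e.g. [<LinePlot ...>]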
Here's a simpler version of _create_plot_component that does roughly what you want. Note that only a single (capital-"P") Plot container is created.
def _create_plot_component():
    # Create some x-y data series to plot
    x = linspace(-2.0, 10.0, 100)
    x2 = linspace(-5.0, 10.0, 100)
    pd = ArrayPlotData(x=x, x2=x2)
    for i in range(3):
        pd.set_data("y" + str(i), jn(i, x))
    # slightly different plot data
    for i in range(3, 5):
        pd.set_data("y" + str(i), 2 * jn(i, x2))

    # Create some line plots of some of the data.
    # Both renderers go into the same container, so they share its axes
    # and the shared ranges grow to fit all of the data.
    canvas = Plot(pd)
    canvas.plot(("x", "y0", "y1", "y2"), name="plot 1", color="red")
    canvas.plot(("x2", "y3", "y4"), name="plot 2", color="green")
    return canvas
Edit: An earlier response fixed the issue with a two-line modification, but it wasn't the ideal way to solve the problem.
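For the record, one two-line range-linking approach in that spirit (a sketch of Chaco's usual shared-range pattern, not necessarily the original modification) is:
plot2.index_range = plot1.index_range   # share the x bounds
plot2.value_range = plot1.value_range   # share the y bounds
This keeps the two containers panning and zooming together, but each container still draws its own axes, which is why the single-container version above is preferable.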

Related

How to set different stride with uniform filter in scipy?

I am using the following code to run uniform filter on my data:
import numpy as np
from scipy.ndimage.filters import uniform_filter

a = np.arange(1000)
b = uniform_filter(a, size=10)
The filter currently seems to behave as if a stride of size // 2 had been set.
How can I adjust the code so that the stride of the filter is not half of the size?
You seem to be misunderstanding what uniform_filter is doing.
In this case, it creates an array b that replaces every a[i] with the mean of a block of size 10 centered at a[i]. So, something like:
for i in range(len(a)):  # the 1D case
    b[i] = mean(a[i - 10//2 : i + 10//2])
Note that this tries to access values at indices outside the valid range 0..999. By default (mode='reflect'), uniform_filter assumes the data before position 0 is a reflection of the data that follows it, and similarly at the end.
Also note that b gets the same dtype as a. In the example, where a is of integer type, the mean is also computed in integer arithmetic, which can cause some loss of precision.
Here is some code and plot to illustrate what's happening:
import matplotlib.pyplot as plt
import numpy as np
from scipy.ndimage.filters import uniform_filter

fig, axes = plt.subplots(ncols=2, figsize=(15, 4))
for ax in axes:
    if ax == axes[1]:
        a = np.random.uniform(-1, 1, 50).cumsum()
        ax.set_title('random curve')
    else:
        a = np.arange(50, dtype=float)
        ax.set_title('values from 0 to 49')
    b = uniform_filter(a, size=10)
    ax.plot(a, 'b-')
    ax.plot(-np.arange(0, 10) - 1, a[:10], 'b:')      # show the reflection at the start
    ax.plot(50 + np.arange(0, 10), a[:-11:-1], 'b:')  # show the reflection at the end
    ax.plot(b, 'r-')
plt.show()
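If what you are actually after is a subsampled (strided) moving average rather than different filter behavior, note that uniform_filter itself has no stride parameter; a minimal approach (an assumption about your goal) is to evaluate the filter and then slice the result:
stride = 5  # hypothetical stride
b_strided = uniform_filter(a, size=10)[::stride]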

Modifying Python code and running PCA in Tableau

I am a beginner, and this is my first time using Tableau. I want to perform PCA from Python code in Tableau Desktop. I understand the main ideas behind the process, and TabPy is installed.
My dataset is really big, with around 1,000+ columns.
I have looked at how to modify Python code (mine is at the end) so that it can run in Tableau.
My question is: in my case, how can I specify _arg1, _arg2, _arg3, ..., given that I used dataset.drop('Class', axis=1) to define X and dataset['Class'] to define Y?
Thank you in advance.
import pandas as pd
import matplotlib.pyplot as plt

# importing or loading the dataset
dataset = pd.read_excel('NL_undivided.xlsx')

# distributing the dataset into two components X and Y
X = dataset.drop('Class', axis=1)
Y = dataset['Class']

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X)
scaled_data = scaler.transform(X)

from sklearn.decomposition import PCA
pca = PCA(n_components=2)
pca.fit(scaled_data)
x_pca = pca.transform(scaled_data)

fig, ax = plt.subplots(figsize=(20, 10))
scatter = ax.scatter(x_pca[:, 0], x_pca[:, 1], c=Y, cmap='rainbow')

# produce a legend with the unique colors from the scatter
legend1 = ax.legend(*scatter.legend_elements(),
                    loc="best", title="Cohorts")
ax.add_artist(legend1)
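For context on the _argN convention: TabPy hands each field you pass to a SCRIPT_* calculated field to the Python script, in order, as _arg1, _arg2, and so on, each one a list of values. A hypothetical sketch of how the PCA above could be wrapped as a function of per-column arguments (the field and function names here are made up for illustration):
# In Tableau, a two-column call would look like
# SCRIPT_REAL("return my_pca(_arg1, _arg2)", SUM([Col1]), SUM([Col2]))
def my_pca(*cols):  # cols = (_arg1, _arg2, ...), one list per column
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler
    X = np.column_stack(cols)                  # rebuild the feature matrix
    scaled = StandardScaler().fit_transform(X)
    return PCA(n_components=2).fit_transform(scaled)[:, 0].tolist()
Enumerating 1,000+ _argN arguments by hand is impractical, so one common pattern is to move the heavy lifting into a function deployed to TabPy and call it by name.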

Applying scipy.stats.gaussian_kde to 3D point cloud

I have a set of about 33K (x,y,z) points in a csv file and would like to convert this to a grid of density values using scipy.stats.gaussian_kde. I have not been able to find a way to convert this point cloud array into an appropriate input format for the gaussian_kde function (and then take the output of this and convert it into a density value grid). Can anyone provide sample code?
Here's an example with some comments which may be of use. gaussian_kde wants the data and the evaluation points row-stacked, i.e. with shape (ndim, num_values), as per the docs. In your case you would row_stack([x, y, z]) so that the shape is (3, 33000).
from scipy.stats import gaussian_kde
import numpy as np
import matplotlib.pyplot as plt

# simulate some data
n = 33000
x = np.random.randn(n)
y = np.random.randn(n) * 2

# data must be stacked as (# ndim, # num values), as per the docs
data = np.row_stack((x, y))

# perform KDE
kernel = gaussian_kde(data)

# create a grid over which to evaluate the KDE
s = np.linspace(-8, 8, 128)
grid = np.meshgrid(s, s)

# again, the KDE needs the points row-stacked
grid_points = np.row_stack([g.ravel() for g in grid])

# evaluate the KDE and reshape the result back onto the grid
Z = kernel(grid_points)
Z = Z.reshape(grid[0].shape)

# plot the KDE as an image and overlay some of the data points
fig, ax = plt.subplots()
ax.matshow(Z, extent=(s.min(), s.max(), s.min(), s.max()))
ax.plot(x[::10], y[::10], 'w.', ms=1, alpha=0.3)
ax.set_xlim(s.min(), s.max())
ax.set_ylim(s.min(), s.max())
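The same pattern extends directly to the 3D point cloud from the question; a sketch with a simulated third coordinate (the variable names are mine, and note that evaluating a KDE on a dense 3D grid is slow with 33K points):
z = np.random.randn(n) * 0.5                # stand-in for your third column
data3 = np.row_stack((x, y, z))             # shape (3, n)
kernel3 = gaussian_kde(data3)
s3 = np.linspace(-8, 8, 32)                 # coarse grid: 32**3 evaluation points
gx, gy, gz = np.meshgrid(s3, s3, s3)
grid3 = np.row_stack([g.ravel() for g in (gx, gy, gz)])
density = kernel3(grid3).reshape(gx.shape)  # (32, 32, 32) grid of density values
For the real data you would build data3 from the three csv columns (e.g. loaded with np.loadtxt) instead of simulating them.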

Convergence when using the scipy.odr module to find best-fit parameters when there are only horizontal error bars

I am trying to fit a piecewise (otherwise linear) function to a set of experimental data. The data have only horizontal error bars, no vertical ones. I am familiar with scipy.optimize.curve_fit, but that works when the error bars are only on the dependent variable y. Searching for my specific need, I came across a post explaining that the scipy.odr module can be used when the error bars are on the independent variable x (Correct fitting with scipy curve_fit including errors in x?).
Attached is my version of the code, which tries to find the best-fit parameters using the ODR methodology. It does draw a best-fit function, and it seems to work. However, after changing the initial (educated-guess) values and trying to extract the best-fit parameters, I get back the same guessed parameters I put in. This means the method is not converging, which you can verify by printing output.stopreason and getting
['Numerical error detected']
So my question is whether this methodology is consistent with my function being piecewise, and if not, whether there is another, correct methodology to adopt in such cases.
from numpy import array, piecewise
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
from scipy.odr import ODR, Model, Data, RealData

x_array = array([8.2, 8.6, 9., 9.4, 9.8, 10.2, 10.6, 11., 11.4, 11.8])
x_err_array = array([0.2] * 10)
y_array = array([-2.05179545, -1.64998354, -1.49136169, -0.94200805,
                 -0.60205999, 0., 0., 0., 0., 0.])
y_err_array = array([0] * 10)

# Piecewise linear fitting model: slope beta[1] up to the break at beta[0], then 0
def func(beta, x):
    return piecewise(x, [x < beta[0]],
                     [lambda x: beta[1] * x - beta[1] * beta[0],
                      lambda x: 0.0])

data = RealData(x_array, y_array, x_err_array, y_err_array)
model = Model(func)
odr = ODR(data, model, [10.1, 1.02])
odr.set_job(fit_type=0)
output = odr.run()

f, (ax1) = plt.subplots(1, sharex=True, sharey=True, figsize=(10, 10))
ax1.errorbar(x_array, y_array, xerr=x_err_array, yerr=y_err_array,
             ecolor='blue', elinewidth=3, capsize=3, linestyle='')
ax1.plot(x_array, func(output.beta, x_array), 'blue',
         linestyle='dotted', label='Best-Fit')
ax1.legend(loc='lower right', ncol=1, fontsize=12)
ax1.set_xlim([7.95, 12.05])
ax1.set_ylim([-2.1, 0.1])
ax1.yaxis.set_major_locator(MaxNLocator(prune='upper'))
ax1.set_ylabel('$y$', fontsize=12)
ax1.set_xlabel('$x$', fontsize=12)
ax1.set_xscale("linear", nonposx='clip')
ax1.set_yscale("linear", nonposy='clip')
ax1.get_xaxis().tick_bottom()
ax1.get_yaxis().tick_left()
f.subplots_adjust(top=0.98, bottom=0.14, left=0.14, right=0.98)
plt.setp([a.get_xticklabels() for a in f.axes[:-1]], visible=True)
plt.show()
An error of 0 for y is causing the problems. Make it small but not zero, e.g. 1e-16, and the fit converges. It also converges if you omit y_err_array when defining RealData, though I am not sure what happens internally in that case.
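Concretely, either of these one-line changes to the code above should do it:
y_err_array = array([1e-16] * 10)   # tiny but nonzero y errors
# or: leave the y errors out entirely and pass only sx
data = RealData(x_array, y_array, sx=x_err_array)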

Chaco bar plots

I'm really struggling to understand how to build chaco bar plots.
I've been poring over an online example and have reduced it down to the following simplified code:
import numpy as np
from traits.api import HasTraits, Instance
from traitsui.api import View, Item
from chaco.api import BarPlot, ArrayDataSource, DataRange1D, LinearMapper
from enable.api import ComponentEditor

class MyBarPlot(HasTraits):
    plot = Instance(BarPlot)

    traits_view = View(
        Item('plot', editor=ComponentEditor(), show_label=False))

    def _plot_default(self):
        idx = np.array([1, 2, 3, 4, 5])
        vals = np.array([2, 4, 7, 4, 3])

        index = ArrayDataSource(idx)
        index_range = DataRange1D(index, low=0.5, high=5.5)
        index_mapper = LinearMapper(range=index_range)

        value = ArrayDataSource(vals)
        value_range = DataRange1D(value, low=0)
        value_mapper = LinearMapper(range=value_range)

        plot = BarPlot(index=index, value=value,
                       value_mapper=value_mapper,
                       index_mapper=index_mapper)
        return plot

if __name__ == "__main__":
    myplot = MyBarPlot()
    myplot.configure_traits()
Needless to say, my attempt didn't work. When I run this code (in an IPython notebook), all I get is a blank plot window, filled with black.
I suspect my error might have something to do with the 'value_mapper' entries, as I don't really understand what these are for. I'd be grateful for any pointers on my code error.
More generally, this coding approach seems very complicated to me - is there a simpler way to make chaco bar plots?
Unfortunately, the bar_width trait of BarPlot doesn't try to be smart here. The default is a width of 10 in data space, which completely overwhelms the width of the plot. To fix this, you can adjust the width manually:
plot = BarPlot(index=index, value=value,
               value_mapper=value_mapper,
               index_mapper=index_mapper,
               bar_width=0.8)
Note that this still won't give you a proper "bar plot" because you'll be missing the surrounding axes decorations and other components. The easiest way to accomplish that would be to use chaco's Plot object.
Here's an example that adds a BarPlot instance to a Plot:
https://github.com/enthought/chaco/blob/master/examples/demo/tornado.py
And here's an example that creates a bar plot using the plot convenience method in Plot:
https://github.com/enthought/chaco/blob/master/examples/demo/basic/bar_plot_stacked.py
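A minimal sketch along the lines of that second example (assuming Plot.plot accepts type="bar" and forwards BarPlot traits such as bar_width and fill_color, as those demos suggest):
import numpy as np
from chaco.api import ArrayPlotData, Plot

def _plot_default(self):
    pd = ArrayPlotData(index=np.array([1, 2, 3, 4, 5]),
                       values=np.array([2, 4, 7, 4, 3]))
    plot = Plot(pd)  # the container supplies the axes, grids, and mappers
    plot.plot(("index", "values"), type="bar",
              bar_width=0.8, fill_color="cornflowerblue")
    return plot
If you drop this into the MyBarPlot class above, also change plot = Instance(BarPlot) to plot = Instance(Plot).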
Yes, "plot" is quite terribly overloaded (Chaco - Getting multiple data series to use the same axes and maps).