Python GGPlot Syntax to Annotate Chart with Statsmodels Variable? - ipython

Not able to located complete Yhat doc to answer this question, using the R version of
ggplot I've attempted to iteratively back into a solution.
What is the correct syntax for annotating a Python ggplot plot with text in generally, more specifically using a variable from Statsmodels (everything works except the last line of this code block below)?
from ggplot import *
ggplot(aes(x='rundiff', y='winpct'), data=mlb_df) +\
geom_point() + geom_text(aes(label='team'),hjust=0, vjust=0, size=10) +\
stat_smooth(method='lm', color='blue') +\
ggtitle('Contenders vs Pretenders') +\
ggannotate('text', x = 4, y = 7, label = 'R^2')
Thanks.

You can use geom_text as a provisional solution
from ggplot import *
import pandas as pd
dataText=pd.DataFrame.from_items([('x',[4]),('y',[7]),('text',['R^2'])])
ggplot(aes(x='rundiff', y='winpct'), data=mlb_df) +\
geom_point() + geom_text(aes(label='team'),hjust=0, vjust=0, size=10) +\
stat_smooth(method='lm', color='blue') +\
ggtitle('Contenders vs Pretenders') +\
geom_text(aes(x='x', y='y', label='text'), data=dataText)

Related

Issue with scipy.io.wavfile.read and scipy.fftpack.fft

from os.path import dirname, join as pjoin
import scipy.io as sio
from scipy.io import wavfile
from scipy.fftpack import fft
data_dir = pjoin(dirname(sio.__file__), 'tests', 'data')
wav_fname = pjoin(data_dir, 'test-44100Hz-2ch-32bit-float-be.wav')
print(wav_fname)
def create_FFT(fn,size=1000):
sample_rate, X = wavfile.read(fn)
fft_features = abs(fft(X)[:size])
return(sample_rate, X, fft_features)
for wav_fn in wav_fname :
samplerate, data, fft_features = create_FFT(wav_fn)
print(f"number of channels = {data.shape[1]}")
print("fft features are: {}".format(fft_features))
In the above code, if I don't include fft specific code in the create_FFT function, I could read the file and print the number of channels. However, as soon as I include fft specific code, I get an error "FileNotFoundError: [Errno 2] No such file or directory: 'C'"
Any help will be appreciated.
Found the answer. It was with the for loop at the bottom.

Error when using Seaborn in jupyter notebook(pyspark)

I am trying to visualize data using Seaborn. I have created a dataframe using SQLContext in pyspark. However, when I call lmplot it results in an error. I am not sure what I am missing. Given below is my code(I am using jupyter notebook):
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.read.load('file:///home/cloudera/Downloads/WA_Sales_Products_2012-14.csv',
format='com.databricks.spark.csv',
header='true',inferSchema='true')
sns.lmplot(x='Quantity', y='Year', data=df)
Error trace:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-86-2a2b43993475> in <module>()
----> 2 sns.lmplot(x='Quantity', y='Year', data=df)
/home/cloudera/anaconda3/lib/python3.5/site-packages/seaborn/regression.py in lmplot(x, y, data, hue, col, row, palette, col_wrap, size, aspect, markers, sharex, sharey, hue_order, col_order, row_order, legend, legend_out, x_estimator, x_bins, x_ci, scatter, fit_reg, ci, n_boot, units, order, logistic, lowess, robust, logx, x_partial, y_partial, truncate, x_jitter, y_jitter, scatter_kws, line_kws)
557 hue_order=hue_order, size=size, aspect=aspect,
558 col_wrap=col_wrap, sharex=sharex, sharey=sharey,
--> 559 legend_out=legend_out)
560
561 # Add the markers here as FacetGrid has figured out how many levels of the
/home/cloudera/anaconda3/lib/python3.5/site-packages/seaborn/axisgrid.py in __init__(self, data, row, col, hue, col_wrap, sharex, sharey, size, aspect, palette, row_order, col_order, hue_order, hue_kws, dropna, legend_out, despine, margin_titles, xlim, ylim, subplot_kws, gridspec_kws)
255 # Make a boolean mask that is True anywhere there is an NA
256 # value in one of the faceting variables, but only if dropna is True
--> 257 none_na = np.zeros(len(data), np.bool)
258 if dropna:
259 row_na = none_na if row is None else data[row].isnull()
TypeError: object of type 'DataFrame' has no len()
Any help or pointer is appreciated. Thank you in advance:-)
sqlContext.read.load(...) returns a Spark-DataFrame. I am not sure, whether seaborn can automatically cast a Spark-DataFrame into a Pandas-Dataframe.
Try:
sns.lmplot(x='Quantity', y='Year', data=df.toPandas())
df.toPandas() returns the the pandas-DF from the Spark-DF.

How to use the mlab iso_surface module in a Mayavi app

I'm trying to build a simple Mayavi script application which utilises the mlab iso_surface module.
However, when I run my app it throws up two windows, one showing my mayavi iso_surface plot and the other showing a blank "Edit properties" window. It seems that the mayavi scene is not being displayed in the specified view layout for the "Edit properties" window.
So my question is: Why is the mayavi iso_surface scene not appearing in the view layout, and how do I get it in there?
A simple test script which displays this behaviour is pasted below. I am using Canopy version: 2.1.1.3504 (64 bit), python 3.5.2 on a Windows 10 system.
[Note: I have modified my original question to include another question. How do I update the 's' data with the input from a Range object (mult_s)? I have had a go at doing this below, but with no success. It throws up: TraitError: Cannot set the undefined 's' attribute of a 'ArraySource' object.]
class Isoplot1(HasTraits):
scene = Instance(MlabSceneModel, ())
mult_s = Range(1, 5, 1)
#on_trait_change('scene.activated')
def _setup(self):
# Create x, y, z, s data
L = 10.
x, y, z = np.mgrid[-L:L:101j, -L:L:101j, -L:L:101j]
self.s0 = np.sqrt(4 * x ** 2 + 2 * y ** 2 + z ** 2)
# create the data pipeline
self.src1 = mlab.pipeline.scalar_field(x, y, z, self.s0)
# Create the plot
self.plot1 = self.scene.mlab.pipeline.iso_surface(
self.src1, contours=[5, ], opacity=0.5, color=(1, 1, 0)
)
#on_trait_change('mult_s')
def change_s(self):
self.src1.set(s=self.s0 * self.mult_s)
# Set the layout
view = View(Item('scene',
editor=SceneEditor(scene_class=MayaviScene),
height=400, width=600, show_label=False),
HGroup('mult_s',),
resizable=True
)
isoplot1 = Isoplot1()
isoplot1.configure_traits()
If you use self.scene.mlab.pipeline.scalar_field instead of mlab.pipeline.scalar_field this should not happen.
In general, you should avoid creating any visualization in the initializer. Instead you should always setup the scene when the scene.activated event is fired. To be safe for uses with raw mlab you should rewrite your code as follows.
from mayavi import mlab
from traits.api import HasTraits, Instance, on_trait_change
from traitsui.api import View, Item
from mayavi.core.ui.api import MayaviScene, SceneEditor, MlabSceneModel
import numpy as np
class Isoplot1(HasTraits):
scene = Instance(MlabSceneModel, ())
#on_trait_change('scene.activated')
def _setup(self):
# Create x, y, z, s data
L = 10.
x, y, z = np.mgrid[-L:L:101j, -L:L:101j, -L:L:101j]
s = np.sqrt(4 * x ** 2 + 2 * y ** 2 + z ** 2)
# create the data pipeline
self.src1 = mlab.pipeline.scalar_field(x, y, z, s)
# Create the plot
self.plot1 = self.scene.mlab.pipeline.iso_surface(
self.src1, contours=[5, ], opacity=0.5, color=(1, 1, 0)
)
# Set the layout
view = View(Item('scene',
editor=SceneEditor(scene_class=MayaviScene),
height=400, width=600, show_label=False),
resizable=True
)
isoplot1 = Isoplot1()
isoplot1.configure_traits()
You probably already know this but just in case you can also take a look at some of the other mayavi interactive examples in the documentation.

Print proper mathematical formatting

When I use sympy to get the square root of 8, the output is ugly:
2*2**(1/2)
import sympy
In [2]: sympy.sqrt(8)
Out[2]: 2*2**(1/2)
Is there any way to make sympy print output in proper mathematical notation (i.e. using the proper symbol for square root) ?
UPDATE:
when I follow the suggestions from #pqnet:
from sympy import *
x, y, z = symbols('x y z')
init_printing()
init_session()
I get following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-23-21d886bf3e54> in <module>()
2 x, y, z = symbols('x y z')
3 init_printing()
----> 4 init_session()
/usr/lib/python2.7/dist-packages/sympy/interactive/session.pyc in init_session(ipython, pretty_print, order, use_unicode, quiet, argv)
154 # and False means don't add the line to IPython's history.
155 ip.runsource = lambda src, symbol='exec': ip.run_cell(src, False)
--> 156 mainloop = ip.mainloop
157 else:
158 mainloop = ip.interact
AttributeError: 'ZMQInteractiveShell' object has no attribute 'mainloop'
In an ipython notebook you can enable Sympy's graphical math typesetting with the init_printing function:
import sympy
sympy.init_printing(use_latex='mathjax')
After that, sympy will intercept the output of each cell and format it using math fonts and symbols. Try:
sympy.sqrt(8)
See also:
Printing section in the Sympy Tutorial.
The simplest way to do it is this:
sympy.pprint(sympy.sqrt(8))
For me (using rxvt-unicode and ipython) it gives
___
2⋅╲╱ 2

"LapackError: Parameter a has non-native byte order in lapack_lite.dgesdd" when importing from Matlab files

After importing this data file from Matlab with scipy.io.loadmat, things appeared to work fine until we tried to calculate the conditioning number of one of the matrixes within.
Here's the minimum amount of code that reproduces for us:
import scipy
import numpy
stuff = scipy.io.loadmat("dati-esercizio1.mat")
numpy.linalg.cond(stuff["A"])
Here's the extended stacktrace courtesy of iPython:
In [3]: numpy.linalg.cond(A)
---------------------------------------------------------------------------
LapackError Traceback (most recent call last)
/snip/<ipython-input-3-15d9ef00a605> in <module>()
----> 1 numpy.linalg.cond(A)
/snip/python2.7/site-packages/numpy/linalg/linalg.py in cond(x, p)
1409 x = asarray(x) # in case we have a matrix
1410 if p is None:
-> 1411 s = svd(x,compute_uv=False)
1412 return s[0]/s[-1]
1413 else:
/snip/python2.7/site-packages/numpy/linalg/linalg.py in svd(a, full_matrices, compute_uv)
1313 work = zeros((lwork,), t)
1314 results = lapack_routine(option, m, n, a, m, s, u, m, vt, nvt,
-> 1315 work, -1, iwork, 0)
1316 lwork = int(work[0])
1317 work = zeros((lwork,), t)
LapackError: Parameter a has non-native byte order in lapack_lite.dgesdd
All obvious ideas (like flattening and reshaping the matrix or recreating the matrix from scratch reassigning it element by element) failed. How can I want to massage the data, then, in order to make it more agreeable with numpy?
It's a bug, fixed some time ago: https://github.com/numpy/numpy/pull/235
Workaround:
np.linalg.cond(stuff['A'].newbyteorder('='))
This works for me:
In [33]: stuff = loadmat('dati-esercizio1.mat')
In [34]: a = stuff['A']
In [35]: try: np.linalg.cond(a)
....: except: print "Fail!"
Fail!
In [36]: b = np.array(a, dtype='>d')
In [37]: np.linalg.cond(b)
Out[37]: 62493201976.673141
In [38]: np.all(a == b) # Verify they hold the same data.
Out[38]: True
Apparently it's something wrong with the byte order (endianness?) of each number in the resulting ndarray and not just with the ndarray object itself.
Something like this but more elegant should do the trick:
n, m = A.shape()
B = numpy.empty_like(A)
for i in xrange(n):
for j in xrange(m):
B[i,j] = float(A[i,j])
del A
B = A
print numpy.linalg.cond(A) # 62493210091.354507
(For some reason an in-place replacement still gives that error - so there's something wrong with the byte order of the whole object, too.)