Seaborn boxplot narrows the width of the boxes when a hue is chosen; how might I remedy this?

I am using seaborn to create a boxplot. When I specify a column by which to group/color the boxes, the width of the boxes becomes so narrow that they are hard to see. The only change I am making is specifying an argument for hue, which points to a column in the dataframe passed. I have tried using the 'width' parameter (as mentioned here), which does increase the width of the boxplots, but also the distance at which they are spread apart.
Help: How can I maintain the width of the boxes while specifying a hue parameter?
I will show my code and results below:
My dataframe:
Out[3]:
                   timestamp  room_number  floor      floor_room  temperature
0  2016-01-19 09:00:00-05:00        11a06     11        11_11a06         23.0
1  2016-01-19 09:00:00-05:00   east-inner     11   11_east-inner         22.8
2  2016-01-19 09:00:00-05:00  east-window     11  11_east-window         22.9
Use of seaborn with odd boxplot widths, using a grouping factor:
sns.boxplot(x=xunit, y=var, data=df, order=order, hue='floor')
Use of seaborn that has reasonable boxplot widths, but no grouping factor:
sns.boxplot(x=xunit, y=var, data=df)

In version 0.8 (July 2017), the dodge parameter was added to boxplot, violinplot, and barplot to allow use of hue without changing the position or width of the plot elements, as when the hue variable is not nested within the main categorical variable.
(release notes v0.8.0)
Your code would look like this:
sns.boxplot(x=xunit, y=var, data=df, order=order, hue='floor', dodge=False)
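A minimal, self-contained sketch of the call above, assuming seaborn 0.8 or later. The data frame is invented for illustration (its column names follow the question, its values do not come from the original data), and the hue column is stored as strings so that it is treated as categorical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data shaped like the question's frame (values are made up).
df = pd.DataFrame({
    "floor": ["11", "11", "11", "11", "12", "12"],
    "floor_room": ["11_11a06", "11_11a06", "11_east-inner",
                   "11_east-inner", "12_lab", "12_lab"],
    "temperature": [23.0, 22.6, 22.8, 23.1, 21.9, 22.4],
})

fig, ax = plt.subplots()
# dodge=False keeps one full-width box per x category; hue only sets the color.
sns.boxplot(x="floor_room", y="temperature", hue="floor",
            data=df, dodge=False, ax=ax)
xticks = list(ax.get_xticks())  # one tick per floor_room category
fig.savefig("boxplot_dodge_false.png")
```

With dodge left at its default, seaborn reserves a slot for every hue level at each x position, which is why the boxes shrink when the hue variable is not nested within x.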

It turns out that the 'hue' parameter causes the issue (I am not sure why). Removing this argument from the function call makes the problem go away, but you must then provide extra information so that the boxplots are color coded by the desired condition. The following line of code fixed my problem:
sns.boxplot(x=xunit, y=var, data=df, order=order, palette=df[condition_column].map(palette_dir))
Where palette_dir is a dictionary of colors for each condition, mapped to a column of data.
The boxplots look normal now, but I am struggling to add a figure legend. I am hoping the person who resolved this in this post can point me to their method.
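One possible way to add the missing legend, sketched under assumptions: it builds the legend entries by hand from the color dictionary using matplotlib's Patch. The contents of palette_dir here are hypothetical, and this is not necessarily the method used in the post referred to above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# Hypothetical condition -> color mapping, standing in for palette_dir.
palette_dir = {"11": "tab:blue", "12": "tab:orange"}

fig, ax = plt.subplots()
# ... draw the boxplot on `ax` here, e.g. sns.boxplot(..., ax=ax) ...

# One colored rectangle per condition serves as a manual legend entry.
handles = [Patch(facecolor=color, edgecolor="black", label=label)
           for label, color in palette_dir.items()]
legend = ax.legend(handles=handles, title="floor")
legend_labels = [text.get_text() for text in legend.get_texts()]
```

Because the handles come straight from the dictionary, the legend stays in sync with whatever colors the boxes were given.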

Related

whis parameter in sns.boxplot is not reflected in the plot when it changes

Assume we have a simple dataset with q1=27 and q3=51, and I want to determine the upper and lower whisker boundaries.
With IQR = 51 - 27 = 24, the boundaries should be 27 - 1.5*24 = -9 and 51 + 1.5*24 = 87, as you would expect. But when I use sns.boxplot, it gives me shorter whiskers. More interestingly, when I change whis to any other value, the chart does not react, as follows:
sns.boxplot(y=df[variable],whis=2)
plt.title('Boxplot')
plt.show()
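A likely explanation, sketched with NumPy only: in matplotlib-style boxplots the whiskers end at the most extreme data points that lie inside the fences q1 - whis*IQR and q3 + whis*IQR, not at the fences themselves. The dataset and the helper name whisker_ends are made up for illustration; the data is chosen so that q1=27 and q3=51 as in the question.

```python
import numpy as np

def whisker_ends(values, whis=1.5):
    """Return the fences and the data points where the whiskers actually end."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    # Whiskers stop at the furthest data points still inside the fences.
    inside = values[(values >= lo_fence) & (values <= hi_fence)]
    fences = (float(lo_fence), float(hi_fence))
    ends = (float(inside.min()), float(inside.max()))
    return fences, ends

data = [20, 27, 39, 51, 60]  # hypothetical data with q1=27, q3=51
fences_15, ends_15 = whisker_ends(data, whis=1.5)
fences_20, ends_20 = whisker_ends(data, whis=2)
```

Here the fences move from (-9, 87) to (-21, 99) when whis goes from 1.5 to 2, but the whisker ends stay at 20 and 60 because no data point lies between the old and new fences. If the real data behaves the same way, changing whis would have no visible effect.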

Bokeh - How to properly set & control xaxis for datetime - possible bug?

I am trying to make a basic line & point plot in Bokeh (0.12.3) using the following code. I have set the x_axis_type as 'datetime' and I am plotting a (random) variable vs. a pandas (0.19.0) datetime64 dtype that is the index of the dataframe (i.e. a timeseries).
The problem I see with the plot is that the dates are not properly aligned. In the time series, the max date is 2016-11-06, however, the last scale tick is for Nov 16, and there is a point aligned to what appears to be several days after that.
Curiously, when zooming in the plot, the alignment looks correct!
Is this a bug, or am I doing something wrong for this plot? Do I need to be more specific in how the x-axis should be rendered?
Also, I really think the scale increments should be in equal number of days. However, in this case Bokeh places the scale increments on the same day of each month (so the increment is a varying number of days). I have seen this before in other plots, and that default can hamper interpretation.
Appreciate any help on this. Here is the code and the screen shots that demonstrate the issue:
# imports & config
import pandas as pd
import numpy as np
from bokeh.plotting import figure, show, output_notebook
output_notebook()
# create a times series dataframe
rng = pd.date_range('2016-07-24', periods=16, freq='W')
df = pd.DataFrame(np.random.randn(len(rng)), index = rng, columns=['Y'])
# view the tail of the data to compare to plot
df.tail()
# make and render the plot
p1 = figure(x_axis_type='datetime',
            title='Y vs Week Ending',
            plot_width=700, plot_height=400)
p1.xaxis.axis_label = 'Week Ending'
p1.yaxis.axis_label = 'Y'
p1.line(df.index, df['Y'])
p1.circle(df.index, df['Y'])
p1.yaxis.minor_tick_line_alpha=0
show(p1)
"the last scale tick is for Nov 16, and there is a point aligned to what appears to be several days after that."
That tick is for Nov 2016. It's not very intuitive, but the year is contracted into the label.
Knowing this may change your perspective w.r.t. this comment:
...I really think the scale increments should be in equal number of days. However in this case, Bokeh plots the scale increments to be on the same day of [each] month...
What it's done is change the base unit from day to month, which is probably the more correct approach.
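To see how a month-level label can be misread as a day, here is a small standard-library illustration. It is not Bokeh's own formatter, and the format strings are assumptions chosen to reproduce the label described in the question.

```python
from datetime import date

month_tick = date(2016, 11, 1)   # a tick placed at the start of the month
last_point = date(2016, 11, 6)   # the maximum date in the time series

# Abbreviated month plus two-digit year: easy to misread as "November 16th".
month_label = month_tick.strftime("%b %y")
# An unambiguous day-level label for comparison.
day_label = last_point.strftime("%d %b %Y")

days_after_tick = (last_point - month_tick).days
```

Read this way, the last data point sits five days after the "Nov 16" tick, which matches the point that appears several days to the right of it in the plot.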

Color nodes in Networkx Graph based on specific values

I have looked a bit into the node_color keyword parameter of the nx.draw() method. Here are two different graphs colored using node_colors.
node_colors = [.5,.5,0.,1.]. Colors appear as expected
node_colors = [.9,1.,1.,1.]. Colors do not appear as expected
In the second image, I would expect the color of node 1 to be almost as dark. I assume what is happening is the colormap is getting scaled from the minimum value to the maximum value. For the first example, that's fine, but how can I set the colormap to be scaled from: 0=white, 1=blue every time?
You are correct about the cause of the problem. To fix it, you need to define vmin and vmax.
I believe
nx.draw(G, node_color=[0.9,1.,1.,1.], vmin=0, vmax=1)
will do what you're after (I would need to know what colormap you're using to be sure).
For edges, there are similar parameters: edge_vmin and edge_vmax.
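The effect of vmin and vmax can be sketched in plain Python. The helper normalize below is hypothetical (it is not a networkx or matplotlib function); it only mimics the linear scaling to the range 0 to 1 that is applied to the values before the colormap lookup.

```python
def normalize(values, vmin=None, vmax=None):
    """Linearly scale values to [0, 1]; default to the data's own min and max."""
    lo = min(values) if vmin is None else vmin
    hi = max(values) if vmax is None else vmax
    if hi == lo:
        # Guard against division by zero when all values are equal.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

node_colors = [0.9, 1.0, 1.0, 1.0]
auto_scaled = normalize(node_colors)                    # limits taken from the data
fixed_scaled = normalize(node_colors, vmin=0, vmax=1)   # limits pinned to 0 and 1
```

With automatic limits, the 0.9 node is mapped to the very bottom of the colormap, which is why it looks white. With the limits pinned to 0 and 1 it keeps its value of 0.9 and is drawn almost as dark as the others.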

Custom number format for y-axis on Chart

I have created a chart with 2 axes that acts as a panel chart (see image).
As a panel chart, I want each y-axis to show only the portion that is relevant to the chart next to it. For example, for the right-most y-axis I used a custom number format to exclude anything less than 0:
_(* #,##0_);_("";_(* 0??_);_(#_)
But for the left-most y-axis, I'm stuck. I want to show -400 to +400. I've tried 2 different options, but neither produces the desired effect.
[<0](#,##0);[>500000000]"";#,##0_)
[<0](#,##0);[<500000000]#,##0_);""
Here is the result I'm looking for:
I learned something new (and a bit weird) today regarding formats and chart axes.
After some experimenting, this is what I ended up using:
[White][>500]_(#,##0_);(#,##0);0;
The odd part: When you change the Display Units of the axis (for me, millions), then the formatting no longer recognizes the original amount (500,000,000).
Once I figured that out, I was able to work out the solution.

Sigmaplot: How to scale x-axis for correctly displaying boxplots

I want to display overlapping boxplots using Sigmaplot 12. When I choose the scale for the x-axis as linear then the boxes do indeed overlap but are much too thin. See figure below. Of course they should be much wider.
When I choose the scale of the x-axis to be "category", then the boxes have the right width, but are arranged along each single x-value.
I want the position as in figure 1 and the width as in figure 2. I tried to resize the box in figure 1, but when I choose 100% in "bar width" then it still looks like figure 1.
many thanks!
Okay, I found the answer myself. In Sigmaplot, there is often the need to prepare "style" columns; for example, if you want to color your bar charts, you need a column that holds the specific color names.
For my boxplot example I needed a column that holds the values for "width". These had to be quite large (2000) in order to have an effect. Why? I have no idea. At first I thought it was because of the latitude values, with the program interpreting the decimal point as a thousands separator (as in "1.000"), but when I changed to values without decimals, it didn't get better.
Well, here is the result in color.
Have fun !