Bokeh won't show string line breaks in DataTable - Jupyter

I'm trying to show a table of strings using Bokeh (I'm already using Bokeh plots within a vplot+tabs, and want a table of data in the vplot as well).
My strings are multiline via embedded '\n' characters, but when I display them in a Bokeh DataTable the line breaks are stripped. Is there any way to avoid this?
Example code in a Jupyter Python 3 notebook:
import bokeh
from bokeh.plotting import figure, output_notebook, show, vplot
from bokeh.io import output_file, show, vform
from bokeh.models import ColumnDataSource
from bokeh.models.widgets import DataTable, StringFormatter, TableColumn
output_notebook()
table_data = ColumnDataSource(dict(strings=[
    '[ -1.23456, \n 7.89012, \n 3.456789 \n ]',
    '[ -1.23456, \n 7.89012, \n 3.456789 \n ]',
]))
columns = [TableColumn(field="strings", title="Strings", formatter=StringFormatter(text_color='#BB0000')) ]
tableplot = DataTable(source=table_data, columns=columns)
show(vform(tableplot))
This yields the following:
(Note: this example code is also at https://github.com/seltzered/Bokeh-multiline-datatable-demo)
Environment: Python 3.5, Jupyter 1.0 / Notebook 4.1.0, bokeh 0.11.1
Small update: not solved yet, but I realized that the web layout for the table (SlickGrid) seems to set hard height values (25px per line) and offsets for later rows, so I'm thinking this is an expected limitation.

I tried using the HTMLTemplateFormatter but ran into the same issue you mention. This only displays the first line of each array:
from bokeh.models.widgets import HTMLTemplateFormatter
table_data = ColumnDataSource(dict(strings=[
    '[ -1.23456, \n 7.89012, \n 3.456789 \n ]',
    '[ -1.23456, \n 7.89012, \n 3.456789 \n ]',
]))
columns = [
    TableColumn(field="strings",
                title="Strings",
                formatter=HTMLTemplateFormatter(template='<pre><%= value %></pre>'))
]
tableplot = DataTable(source=table_data, columns=columns)
show(vform(tableplot))

The issue doesn't appear to be due to Bokeh; it's more of a limitation of SlickGrid. There are some workarounds and SlickGrid forks posted out there for those really looking for multiline grids, but I haven't seen anything officially merged in.
See this related SlickGrid question: Is variable row height a possibility in SlickGrid?
https://groups.google.com/forum/#!topic/slickgrid/jvqatSyH-hk
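If you can move past Bokeh 0.11.x, a partial workaround is to combine the <pre> template above with a taller fixed row height. This is only a sketch, and it assumes a Bokeh version that exposes DataTable.row_height (later releases do; 0.11.1 does not). Rows still won't auto-size, but a tall enough fixed height makes all of the lines visible:
from bokeh.io import output_notebook, show
from bokeh.models import ColumnDataSource, DataTable, HTMLTemplateFormatter, TableColumn
output_notebook()
table_data = ColumnDataSource(dict(strings=[
    '[ -1.23456, \n 7.89012, \n 3.456789 \n ]',
    '[ -1.23456, \n 7.89012, \n 3.456789 \n ]',
]))
columns = [TableColumn(field="strings", title="Strings",
                       formatter=HTMLTemplateFormatter(template='<pre><%= value %></pre>'))]
# row_height is a fixed per-row pixel height; SlickGrid still cannot
# auto-size rows, but 80px leaves room for the four <pre>-preserved lines.
tableplot = DataTable(source=table_data, columns=columns, row_height=80)
show(tableplot)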

Related

How to read a messy CSV correctly - pyspark

I tried to read a CSV file with pyspark that contains the following line:
2100,"Apple Mac Air A1465 11.6"" Laptop - MD/B (Apr, 2014)",Apple MacBook
My code for reading:
df = spark.read.options(header='true', inferschema='true').csv(file_path)
But the df splits the second field in the middle:
First component: 2100
Second component: "Apple Mac Air A1465 11.6"" Laptop - MD/B (Apr,
Third component: 2014)"
That is, the original second field was split into two components.
I tried several other approaches (Databricks CSV reader, SQL context, etc.), but all had the same result.
What is the reason for that? How could I fix it?
For this type of scenario, Spark provides a solution: the escape option.
Just add escape='"' to the options and you will get all three components, as shown below.
df = spark.read.options(header='true', inferschema='true', escape='"').csv("file:///home/srikarthik/av.txt")
This is happening because the file separator is a comma (',').
So you need the parser to ignore commas that appear between quotes.
Otherwise, a second solution: read the file as-is without a column header, replace the commas that fall between quotes with another punctuation character, save the file, and then read it back using comma as the separator. That will work.
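For completeness, here is a minimal, self-contained sketch of the escape fix (the file name products.csv is just a placeholder):
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
# Spark's CSV reader defaults to backslash as the escape character, so a
# doubled quote ("") inside a quoted field is misread as end-of-field.
# escape='"' switches to RFC 4180 style quoting, and the comma inside
# "Apple Mac Air A1465 11.6"" Laptop - MD/B (Apr, 2014)" no longer splits the field.
df = spark.read.options(header='true', inferSchema='true', escape='"').csv('products.csv')
df.show(truncate=False)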

How do I specify a gcloud csv format separator

I am using gcloud beta logging read to read some logs and am using the --format option to format as csv:
--format="csv(timestamp,jsonPayload.message)"
which works fine.
gcloud topic formats suggests I can specify the separator for my CSV output (I'd like to specify ", " so that the entries are spaced out a little), but I can't figure out the syntax for specifying the separator. I've tried the following, but neither is correct:
--format="csv(timestamp,jsonPayload.message),separator=', '"
--format="csv(timestamp,jsonPayload.message)" --separator=", "
Does anyone know how to do this?
Thanks!
Never mind, I figured it out.
--format="csv[separator=', '](timestamp,jsonPayload.message)"

How can I import a relationships CSV file into Neo4j when the labels include special characters?

I have an edge_d.csv file like the following:
:START_ID,:END_ID,:TYPE,reaction
CPD-12497,CPD-12498,direct,"RXN-11539"
CO-A,CPD-14010,direct,"RXN-12965"
CPD-8186,CPD-14010,direct,"RXN-12965"
Everything works fine if I do not include the last column, "reaction". However, when I add this column my graph database cannot be built anymore. I use the neo4j-import tool like this:
/neo4j-import --into graph.db --id-type string --quote "\"" --bad-tolerance 100000 --nodes nodes1.csv, node2.csv --relationships edge_t.csv,edge_s.csv,edge_p.csv,edge_d.csv
The docs about the --quote option say:
Character to treat as quotation character for values in CSV data.
Quotes can be escaped by doubling them, for example "" would be
interpreted as a literal ". You cannot escape using \. Default: "
Try removing the --quote option, like this:
/neo4j-import --into graph.db --id-type string --bad-tolerance 100000 --nodes nodes1.csv, node2.csv --relationships edge_t.csv, edge_s.csv, edge_p.csv, edge_d.csv

How to avoid markdown typesetting of $ signs in Jupyter output?

I am reading an Excel file in Jupyter which contains income data, e.g. $2,500 to $4,999. In the rendered output, the text between the dollar signs is typeset as LaTeX math:
How can I avoid this formatting?
In pandas>=0.23.0, you can prevent MathJax from rendering perceived LaTeX found in DataFrames. This is achieved using:
import pandas as pd
pd.options.display.html.use_mathjax = False
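A quick way to see the effect (the column name and values are made up):
import pandas as pd
pd.options.display.html.use_mathjax = False
# Without the option above, MathJax would typeset everything between the
# two $ signs as math; with it, the text renders literally.
df = pd.DataFrame({"Income bracket": ["$1 to $2,499", "$2,500 to $4,999"]})
df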
In Jupyter you can use a backslash ( \ ) before the dollar sign to avoid starting a LaTeX math block.
So write \$2,500 in your markdown instead of $2,500.
A markdown cell like this:
Characterisic | Total with Income| \$1 to \$2,499 | \$2,500 to \$4,999
--------------|------------------|----------------|--------------
data | data |data | data
data | data |data | data
will be rendered by Jupyter like so:
If the table is produced with typical Jupyter tools (Python, NumPy, pandas), you can alter the column names with a short code snippet.
The snippet below replaces every $ in the column names with \$ so that Jupyter renders them without LaTeX math.
import pandas as pd
data = pd.read_excel("test.xlsx")
# pandas Index objects are immutable, so assign a new list of names
# instead of mutating data.columns.values in place.
data.columns = [col.replace("$", "\\$") for col in data.columns]
data
Before and after screenshot:

sed to replace SECRET_KEY in Django settings file introduces garbage

I built a small script that creates a copy of a standard Django setup. After copying the project, I'd like to replace the SECRET_KEY. Both the original SECRET_KEY and the replacement contain numerous special characters. My shell code looks like this:
SECRET=$(python -c 'from random import choice; import sys; sys.stdout.write("".join([choice("abcdefghijklmnopqrstuvwxyz0123456789^&*(-_=+)") for i in range(50)]))')
sed --in-place "s/^SECRET_KEY = .*/SECRET_KEY = '${SECRET}'/" src/settings.py
When I run this, it works sometimes, but in most cases the result looks something like this:
SECRET_KEY = '*n(hbp+o31v*d3pSECRET_KEY = '=ih8(6hwlqiamvg88_jtatqi1w2^axl=+omrpwck*aena-c3ax'8gpv8SECRET_KEY = '=ih8(6hwlqiamvg88_jtatqi1w2^axl=+omrpwck*aena-c3ax'8bwc4ele+bk(*+)vv4tSECRET_KEY = '=ih8(6hwlqiamvg88_jtatqi1w2^axl=+omrpwck*aena-c3ax'*qscez(f'
I have no idea where all this garbage comes from, but I guess it has something to do with the special characters in either the original SECRET_KEY or the replacement being interpreted as regex special characters. Any idea how I can get rid of this?
The garbage comes from sed's replacement syntax: an unescaped & in the replacement expands to the entire matched text (and backslashes are special as well), which is why copies of the old SECRET_KEY line get spliced into the new value. Do regex escaping of your secret generator's output using re.escape, and you should be fine:
SECRET=$(python -c 'import re;from random import choice; import sys; sys.stdout.write(re.escape("".join([choice("abcdefghijklmnopqrstuvwxyz0123456789^&*(-_=+)") for i in range(50)])))')
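An alternative sketch that sidesteps the escaping problem entirely: generate the key from characters that are never special in a sed replacement. This assumes Python 3.6+ for the secrets module, and it narrows the key's alphabet to letters, digits, '-' and '_':
# token_urlsafe(38) yields roughly 51 URL-safe characters, none of which
# are special in a sed replacement string.
SECRET=$(python -c 'import secrets; print(secrets.token_urlsafe(38))')
sed --in-place "s/^SECRET_KEY = .*/SECRET_KEY = '${SECRET}'/" src/settings.py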