Use of regex on iso-2022-jp encoding with Python - encoding

I have some ISO-2022-JP encoded text.
Example:
まだ 正式 に 決まっ た わけ で は ない の で 。
According to the re library documentation, it can accept both ascii and unicode, so I tried to convert my text to unicode and to cut at the word level:
text.decode('iso-2022-jp')
print(text)
print(re.findall(r"[\w']+", text))
However, here is the kind of output I get:
まだ 正式 に 決まっ た わけ で は ない の で 。
['B', 'B', 'B', 'B', 'B', '5', '0', 'B', 'B', 'K', 'B', 'B7h', 'C', 'B', 'B', 'B', 'B', 'o', '1', 'B', 'B', 'G', 'B', 'B', 'O', 'B', 'B', 'J', 'B', 'B', 'N', 'B', 'B', 'G', 'B', 'B', 'B']
What am I doing wrong?
Thanks!

Your code works for me (Python 3.3.0).
>>> text = "まだ 正式 に 決まっ た わけ で は ない の で 。"
>>> print(text)
まだ 正式 に 決まっ た わけ で は ない の で 。
>>> import re
>>> re.findall(r"[\w']+", text)
['まだ', '正式', 'に', '決まっ', 'た', 'わけ', 'で', 'は', 'ない', 'の', 'で']
BTW, you didn't assign the decoded string to text.
text = text.decode('iso-2022-jp')
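A minimal Python 3 sketch of that point: decode() returns a new string and leaves the original object untouched, so the result has to be assigned. The names sentence, raw and decoded are illustrative, and the bytes are produced here by encoding the sample sentence rather than read from the asker's data.

```python
import re

sentence = "まだ 正式 に 決まっ た わけ で は ない の で 。"

# Stand-in for the asker's input: ISO-2022-JP encoded bytes.
raw = sentence.encode("iso-2022-jp")

# decode() does not modify raw in place; it returns a new str.
raw.decode("iso-2022-jp")            # result discarded, raw is still bytes
decoded = raw.decode("iso-2022-jp")  # result kept

# The regex now sees real characters instead of encoded bytes.
words = re.findall(r"[\w']+", decoded)
print(words)
```

Only the assigned result (decoded) is a str that the pattern can split into words.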
UPDATE
I get the following result if I decode the text as ASCII (discarding non-ASCII characters).
>>> re.findall(r"[\w']+", text.encode('iso-2022-jp').decode('ascii', 'ignore'))
['B', 'B', 'B', '5', '0', 'B', 'B', 'K', 'B', 'B7h', 'C', 'B', 'B', 'B', 'B', 'o', '1', 'B', 'B', 'G', 'B', 'B', 'O', 'B', 'B', 'J', 'B', 'B', 'N', 'B', 'B', 'G', 'B', 'B', 'B']
It seems you are decoding/encoding incorrectly.
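For context, a short Python 3 sketch of where the stray 'B' tokens likely come from. ISO-2022-JP is a 7-bit encoding that switches character sets with escape sequences: ESC $ B selects JIS X 0208 and ESC ( B returns to ASCII. If the still-encoded bytes are treated as ASCII, the escape character itself is not a word character, but the trailing B and some of the payload bytes are. The variable names are illustrative.

```python
import re

raw = "まだ 正式".encode("iso-2022-jp")
print(raw)  # 7-bit bytes containing ESC $ B ... ESC ( B sequences

# Treating the still-encoded bytes as ASCII keeps the escape-sequence letters.
as_ascii = raw.decode("ascii", "ignore")
wrong = re.findall(r"[\w']+", as_ascii)

# Decoding with the right codec first gives the actual words.
right = re.findall(r"[\w']+", raw.decode("iso-2022-jp"))
print(wrong, right)
```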
UPDATE2
If you read the text from a file, you don't need to decode individual lines. Specify the encoding in the open() call.
import re
with open('results', 'r', encoding='iso-2022-jp') as f:
    for line in f:
        matches = re.findall(r"[\w']+", line)
        if matches:
            print(matches)


Altair plots no longer displaying in VS Code

My Altair plots are no longer displaying in VS Code. Is anyone else having this issue? Matplotlib / pandas plots still show normally.
I used the simple bar chart example:
source = pd.DataFrame({
    'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})
I can plot a pandas bar chart and get it to display:
source.plot.bar()
But I get no output when using Altair:
alt.Chart(source).mark_bar().encode(
    x='a',
    y='b'
)
I think you need to display the chart explicitly: first save the chart to a variable, then put that variable on a line of its own so it is evaluated.
Chart1 = alt.Chart(source).mark_bar().encode(
    x='a',
    y='b'
)
Chart1

Consolidating multiple Python Charts into one Dashboard with Plotly-Dash

I have two different Python Dashboards, both of which visualize different types of financial data. I would like to have both figures on one single dashboard, one above the other. Would you happen to know if that is possible? If so, I'm sure one has to extend the entire app structure, including the layout and, more importantly, callback part. Has anyone any experience with merging two apps into one dashboard? Below you'll find my code I've assembled so far.
First Dashboard:
# import relevant packages
import pandas as pd
import numpy as np
import matplotlib as mpl
import plotly
import dash
import pyodbc
import plotly.express as px
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
data = [['2020-01-31', 100, 100, 100], ['2020-02-28', 101, 107, 99], ['2020-03-31', 104, 109, 93], ['2020-04-30', 112, 115, 94], ['2020-05-31', 112, 120, 89]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['DATE', 'A', 'B', 'C'])
df = df.set_index('DATE')
df
# create the Dash app
app = dash.Dash()
# Set up the app layout
app.layout = html.Div(children=[
    html.H1(children='Index Dashboard'),
    html.P('''Pick one or more stocks from the dropdown below.'''),
    dcc.Dropdown(id='index-dropdown',
                 options=[{'label': x, 'value': x}
                          for x in df.columns],
                 value='A',
                 multi=True, clearable=True),
    dcc.Graph(id='price-graph')
])
# Set up the callback function
@app.callback(
    Output(component_id='price-graph', component_property='figure'),
    [Input(component_id='index-dropdown', component_property='value')],
)
def display_time_series(selected_index):
    dff = df[selected_index]  # Only columns selected in dropdown
    fig = px.line(dff, x=df.index, y=selected_index, labels={'x': 'x axis label'})
    fig.update_layout(
        title="Price Index Development",
        xaxis_title="Month",
        yaxis_title="Price",
        font=dict(size=13))
    return fig
# Run local server
if __name__ == '__main__':
    app.run_server(debug=True, use_reloader=False)
Second Dashboard:
data2 = [['A', 'B', 0.4], ['A', 'C', 0.5], ['A', 'D', 0.1], ['X', 'Y', 0.15], ['X', 'Z', 0.85]]
df2 = pd.DataFrame(data2, columns = ['BM_NAME', 'INDEX_NAME', 'WEIGHT'])
df2
barchart = px.bar(
    data_frame=df2,
    x=df2.BM_NAME,
    y="WEIGHT",
    color="INDEX_NAME",
    opacity=0.9,
    barmode='group')
barchart
# create the Dash app
app = dash.Dash()
# set up app layout
app.layout = html.Div(children=[
    html.H1(children='BM Composition'),
    dcc.Dropdown(id='BM-dropdown',
                 options=[{'label': x, 'value': x}
                          for x in df2.BM_NAME.unique()],
                 value='A',
                 multi=False, clearable=True),
    dcc.Graph(id='bar-chart')
])
# set up the callback function
@app.callback(
    Output(component_id="bar-chart", component_property="figure"),
    [Input(component_id="BM-dropdown", component_property="value")],
)
def display_BM_composition(selected_BM):
    filtered_BM = df2[df2.BM_NAME == selected_BM]  # Only rows whose "BM_NAME" matches the dropdown selection
    barchart = px.bar(
        data_frame=filtered_BM,
        x="BM_NAME",
        y="WEIGHT",
        color="INDEX_NAME",
        opacity=0.9,
        barmode='group')
    return barchart
# Run local server
if __name__ == '__main__':
    app.run_server(debug=True, use_reloader=False)
Many thanks in advance!

Individual color for routes through a for loop

I would like to give each route created by plot_graph_routes an individual color. I'm able to do it for two routes as explained here by @gboing, but I have a lot of routes, so I'm trying to do it in a loop.
G = ox.graph_from_place('München, Oberbayern, Bayern, Deutschland', network_type='drive_service', simplify=True)
G_projected = ox.project_graph(G)
#%%
ox.config(log_console=True, use_cache=True)
colors = ['b', 'g', 'r', 'c', 'm', 'y', 'k', 'w']
rc = []
nc = []
for r in range(len(routes)-1):
    if r == 0:
        rc.extend(colors[r] * (len(routes[r]) - 1))
    else:
        rc.extend(colors[r] * len(routes[r]))
    nc.extend([colors[r], colors[r]])
fig, ax = ox.plot_graph_routes(G_projected, [routes], fig_height=40,route_color=rc, orig_dest_node_color=nc, node_size=0)
When I do this, I get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-32-956a08875e0e> in <module>
9 rc.extend(colors[r] * len(routes[0][r]))
10 nc.extend([colors[r], colors[r]])
---> 11 fig, ax = ox.plot_graph_routes(G_projected, [routes[0]], fig_height=40,route_color=rc, orig_dest_node_color=nc, node_size=0)
12
13
~\Anaconda3\lib\site-packages\osmnx\plot.py in plot_graph_routes(G, routes, bbox, fig_height, fig_width, margin, bgcolor, axis_off, show, save, close, file_format, filename, dpi, annotate, node_color, node_size, node_alpha, node_edgecolor, node_zorder, edge_color, edge_linewidth, edge_alpha, use_geom, orig_dest_points, route_color, route_linewidth, route_alpha, orig_dest_node_alpha, orig_dest_node_size, orig_dest_node_color, orig_dest_point_color)
732 origin_node = route[0]
733 destination_node = route[-1]
--> 734 orig_dest_points_lats.append(G.nodes[origin_node]['y'])
735 orig_dest_points_lats.append(G.nodes[destination_node]['y'])
736 orig_dest_points_lons.append(G.nodes[origin_node]['x'])
~\Anaconda3\lib\site-packages\networkx\classes\reportviews.py in __getitem__(self, n)
176
177 def __getitem__(self, n):
--> 178 return self._nodes[n]
179
180 # Set methods
TypeError: unhashable type: 'list'
Thanks for any advice.
Here is a full working example of multiple routes using the original example you cited:
import networkx as nx
import osmnx as ox
import numpy as np
import matplotlib.pylab as plt
ox.config(log_console=True, use_cache=True)
G = ox.graph_from_place('Piedmont, CA, USA', network_type='drive')
available_colors = ['b', 'g', 'r', 'c', 'm', 'y', 'k']
# create Nroutes random routes
Nroutes = 5
routes = []
route_colors = []
node_colors = []
for iroute in range(Nroutes):
    orig, dest = np.random.choice(G.nodes(), 2)
    route = nx.shortest_path(G, orig, dest, weight='length')
    routes.append(route)
    route_color = available_colors[iroute%len(available_colors)]
    node_colors.extend([route_color]*2) # the colors for Origin and Destination of this route
    route_colors.extend([route_color]*(len(route) - 1))
# plot the routes
fig, ax = ox.plot_graph_routes(G, routes, route_color=route_colors, orig_dest_node_color=node_colors, node_size=0)
plt.show()
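The per-route color bookkeeping above can be sketched without OSMnx. The routes here are hypothetical lists of node IDs, and build_color_lists is an illustrative helper, not part of OSMnx. The sketch also shows one likely reason the question's call fails: wrapping an existing list of routes in another pair of brackets, as in [routes], makes the first element of each "route" a list, and a list cannot be used as a node key.

```python
from itertools import cycle

def build_color_lists(routes, palette):
    """Return (edge_colors, endpoint_colors) with one color per route."""
    edge_colors, endpoint_colors = [], []
    for route, color in zip(routes, cycle(palette)):
        edge_colors.extend([color] * (len(route) - 1))  # one entry per edge
        endpoint_colors.extend([color] * 2)             # origin and destination
    return edge_colors, endpoint_colors

# Hypothetical routes, each a list of node IDs.
routes = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
edge_colors, endpoint_colors = build_color_lists(routes, ["b", "g"])

# Nodes are looked up by ID, so the first item of a route must be hashable.
nodes = {1: "a", 4: "b", 6: "c"}
try:
    nodes[[routes][0][0]]  # [routes][0][0] is the list [1, 2, 3]
    nested_ok = True
except TypeError:
    nested_ok = False
```

Cycling through the palette keeps the helper usable when there are more routes than colors, which mirrors the modulo indexing in the answer.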

Receiving key error on Networkx color_map

I'm having trouble getting the color_map to work with my networkx graph. It's fairly simple code, but it won't seem to work. I've looked through other similar threads, but the solutions there don't seem to work.
I have data that look like this:
edgelist_manual = [{'source': 'ABE', 'target': 'ATL', 'value': 851},
{'source': 'ABE', 'target': 'BHM', 'value': 1},
{'source': 'ABE', 'target': 'CLE', 'value': 805}]
edgelist = pd.DataFrame(edgelist_manual)
nodelist_manual = [{'source': 'ABE', 'value': '4807', 'group': 0},
{'source': 'ABI', 'value': '2660', 'group': 4},
{'source': 'ABQ', 'value': '41146', 'group': 2}]
nodelist = pd.DataFrame(nodelist_manual)
I run the code below, but my color_map keeps failing: I just get a KeyError on the 'group' reference.
import itertools
import copy
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
nodelist = pd.read_csv('final_nodes.csv')
edgelist = pd.read_csv('final_edges.csv')
g = nx.Graph()
for i, elrow in edgelist.iterrows():
    g.add_edge(elrow[0], elrow[1], attr_dict=elrow[2:].to_dict())
for i, nlrow in nodelist.iterrows():
    g.node[nlrow['source']].update(nlrow[1:].to_dict())
color_map = {0: 'r', 1:'b', 2:'r', 3:'b', 4:'r', 5:'b'}
colors = [color_map[g.node[node]['group']] for node in g]
nx.draw(g, node_color=colors)
ax = plt.gca()
ax.collections[0].set_edgecolor("#555555")
plt.show()
The only difference from this and my code is that rather than creating the data manually I'm loading it from .csv. I've checked for trailing whitespaces on the feature labels but nothing. I don't understand indices well so I wonder if those are messing it up. Any ideas?
Thanks!
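One possible cause, which the question does not confirm: add_edge creates a node for every edge endpoint, so an airport that appears in the edge list but not in the node list ends up with no 'group' attribute, and the color lookup then raises KeyError: 'group'. The sketch below mimics the node attribute store with plain dictionaries (no NetworkX needed). The names node_attrs and missing, and the fallback color 'gray', are assumptions for illustration.

```python
edges = [("ABE", "ATL"), ("ABE", "BHM"), ("ABE", "CLE")]
node_rows = [{"source": "ABE", "group": 0}]

# Every endpoint of an edge becomes a node with an empty attribute dict.
node_attrs = {}
for u, v in edges:
    node_attrs.setdefault(u, {})
    node_attrs.setdefault(v, {})

# Only nodes listed in the node table receive a 'group'.
for row in node_rows:
    if row["source"] in node_attrs:
        node_attrs[row["source"]]["group"] = row["group"]

# Nodes that would trigger KeyError: 'group' on direct indexing.
missing = [n for n, attrs in node_attrs.items() if "group" not in attrs]
print(missing)

color_map = {0: "r", 1: "b", 2: "r", 3: "b", 4: "r", 5: "b"}
# Look the group up defensively instead of indexing directly.
colors = [color_map.get(attrs.get("group"), "gray") for attrs in node_attrs.values()]
```

Printing the nodes without a 'group' is a quick way to check whether the two CSV files actually cover the same set of airports.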

Cryptic python error 'classobj' object has no attribute '__getitem__'. Why am I getting this?

I really wish I could be more specific here, but I have read through related questions and none of them seem to relate to the issue I am experiencing, which I do not understand at all. This is for a homework assignment, so I am hesitant to put up all my code for the program; here is a stripped-down version. Run this and you will see the issue.
import copy
class Ordering:
    def __init__(self, tuples):
        self.pairs = copy.deepcopy(tuples)
        self.sorted = []
        self.unsorted = []
        for x in self.pairs:
            self.addUnsorted(left(x))
            self.addUnsorted(right(x))
    def addUnsorted(self, item):
        isPresent = False
        for x in self.unsorted:
            if x == item:
                isPresent = True
        if isPresent == False:
            self.unsorted.append(left(item))
Here I have created a class, Ordering, that takes a list of the form [('A', 'B'), ('C', 'B'), ('D', 'A')] (where A must come before B, C must come before B, etc.) and is supposed to return it in partially ordered form. I am trying to debug my code to see if it works correctly, but I have not been able to yet because of the error message I get back.
When I input the following in my terminal:
print Ordering[('A', 'B'), ('C', 'B'), ('D', 'A')]
I get back the following error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'classobj' object has no attribute '__getitem__'
Why is this?!
To access an element of a list, use square brackets. To instantiate a class, use parens.
In other words, do not use:
print Ordering[('A', 'B'), ('C', 'B'), ('D', 'A')]
Use:
print Ordering((('A', 'B'), ('C', 'B'), ('D', 'A')))
This will generate another error from deeper in the code but, since this is a homework assignment, I will let you think about that one a bit.
How to use __getitem__:
As a minimal example, here is a class that returns squares via __getitem__:
class HasItems(object):
    def __getitem__(self, key):
        return key**2
In operation, it looks like this:
>>> a = HasItems()
>>> a[4]
16
Note the square brackets.
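The distinction between calling a class and subscripting it can be checked with a small sketch. It is written for Python 3, where the error text differs from the Python 2 message in the question but the exception is still a TypeError. NoItems is an illustrative class name, not the asker's Ordering class.

```python
class HasItems:
    def __getitem__(self, key):
        return key ** 2

class NoItems:
    def __init__(self, pairs):
        self.pairs = list(pairs)

# Parentheses call the class and create an instance.
obj = NoItems([("A", "B"), ("C", "B")])

# Square brackets ask the class itself for item access, which it does not support.
try:
    NoItems[("A", "B"), ("C", "B")]
    error = None
except TypeError as exc:
    error = exc

print(HasItems()[4], type(error).__name__)
```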
Answer to "Why is this?"
Your demo code is not complete (see the comment above). However, the .__getitem__ error clearly comes from the print statement, which sends a subscript request to an object that has no .__getitem__ method, rather than from the code inside the class itself.
>>> aList = [ ('A','B'), ('C','D'), ('E','F')] # the stated format of input
>>> aList # validated to be a list
[('A', 'B'), ('C', 'D'), ('E', 'F')]
>>> type( aList ) # cross-validated
<type 'list'>
>>> for x in aList: # iterator over members
... print x, type( x ) # show value and type
... left( x ) # request as in demo-code
...
('A', 'B') <type 'tuple'>
Traceback (most recent call last): <<< demo-code does not have it
File "<stdin>", line 3, in <module>
NameError: name 'left' is not defined
>>> dir( Ordering ) # .__getitem__ method missing
[ '__doc__', '__init__', '__module__', 'addUnsorted']
>>> dir( aList[0] ) # .__getitem__ method present
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__',
'__getslice__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__',
'__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'count', 'index']