"TypeError: 'int' object is not subscriptable" when trying to set node attributes in networkX - networkx

Below is the code that is giving the TypeError:
import pandas as pd
import networkx as nx
datamuse = pd.read_csv('NetworkDatasheet.csv', index_col=0)
print(datamuse)
G = nx.DiGraph(datamuse.values)
nx.draw_random(G, with_labels=True)
dc= nx.degree_centrality(G)
bc=nx.betweenness_centrality(G,normalized = True)
ec=nx.eigenvector_centrality(G)
nx.set_node_attributes(G,'degree centrality',dc)
nx.set_node_attributes(G,'betweenness centrality',bc)
nx.set_node_attributes(G,'eigenvector centrality',ec)
G.nodes()[1]['degree centrality']
The values in the dictionaries (e.g. dc) are floats like 0.029411764705882353.

The last line of your code should be replaced by:
G.nodes(data=True)[1][1]['degree centrality']
You need data=True to get the associated properties of your nodes; otherwise you only get the node IDs.
Then, when you do G.nodes(data=True)[1], you actually get a tuple (nodeId, data_dict), so to access the attribute values you need its second element, hence the trailing [1].
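As a minimal sketch on a toy graph (assuming networkx 1.x, which matches both the question's set_node_attributes argument order and G.nodes() returning a plain list):
import networkx as nx

G = nx.DiGraph()
G.add_edge(0, 1)

dc = nx.degree_centrality(G)                          # {0: 1.0, 1: 1.0}
nx.set_node_attributes(G, 'degree centrality', dc)    # networkx 1.x argument order

# G.nodes(data=True) is a list of (node_id, data_dict) tuples: the first
# [1] picks the second tuple, the second [1] picks its data dict.
print(G.nodes(data=True)[1][1]['degree centrality'])  # 1.0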

Related

Nearest building with OpenStreetMap

I have a CSV of points with latitude and longitude, and I am trying to find the nearest building to each point and add its data as a new column to the CSV (or pandas DataFrame) in Python. I have tried Pyrosm and various other libraries, but I can't seem to prune the data down to the nearest building and then attach its data. Thanks
This is what I have:
from pyrosm import OSM
from pyrosm import get_data
import geopandas as gpd
from sklearn.neighbors import BallTree
import numpy as np
import osmnx as ox
# get rid of weird error
import shapely
import warnings
from shapely.errors import ShapelyDeprecationWarning
import csv

def get_gig_data(csv_fname):
    with open(csv_fname, "r", encoding="latin-1") as gig_records:
        for gig_record in csv.reader(gig_records):
            yield gig_record

def main():
    warnings.filterwarnings("ignore", category=ShapelyDeprecationWarning)
    chicago_osm = OSM(get_data("chicago"))

    # get a Point of Interest GeoDataFrame
    points_of_interest = chicago_osm.get_pois()  # can use a custom filter if we want to filter the types, but I think no filter might be the best

    # get buildings, and network nodes and edges
    nodes, edges = chicago_osm.get_network(nodes=True, network_type="walking")
    buildings = chicago_osm.get_buildings()
    b_cnt = len(buildings)
    G = chicago_osm.to_graph(nodes, edges)
    #nodes = get_igraph_nodes(G)

    buildings['geometry'] = buildings.centroid

    #poi_list = np.asarray([point.coords for point in points_of_interest['geometry']])  # if point.geom_type == point])
    #print(poi_list.shape)
    #tree = BallTree(np.asarray([point.coords for point in points_of_interest['geometry'] if point.geom_type == point]), metric="manhattan")  # Note: the scipy implementation of manhattan/cityblock distance might be faster according to the internet bc it uses a C function

    # Read in the gig work data - the best way to do this is probably csv.reader
    # with open, because it goes line by line and saves a ton of memory
    '''for i in points_of_interest:
        print('Type: ', type(i), ' ', i)'''
    gig_fp = "data_sample.csv"
    #gig_data = gpd.read_file(gig_fp)
    iter_gig = iter(get_gig_data(gig_fp))
    next(iter_gig)  # skip the header row

    ids = dict()
    for building in buildings.iterrows():
        #print(type(building[1][32]), ' ', building[1][32])
        #tup = tuple(float(x) for x in [trip[17][8:-1].split()])
        ids[building[1][32]] = building

    # make the tree that determines closest POI
    # if we use the CSV reader this for loop will be done already
    for trip in iter_gig:
        # Using a generator, so this should be memory efficient.
        tup = tuple([float(x) for x in trip[17][8:-1].split()])
        print(type(tup), ' ', tup)
        # find nearest node
        src_ids, euclidean_distance = ox.distance.nearest_nodes(G, tup)
        # THEN ADD THE PICKUP AND DROPOFF IDS TO THIS TUPLE AND ADD TO A NEW NP ARRAY

if __name__ == '__main__':
    main()
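For the nearest-building lookup itself, here is a minimal sketch (not from the question) using the BallTree already imported above, assuming buildings has been reduced to centroids as in the code and that each CSV row yields a (lat, lon) pair:
import numpy as np
from sklearn.neighbors import BallTree

def nearest_buildings(buildings, latlon_points):
    # BallTree with the haversine metric expects (lat, lon) in radians.
    centroids = np.radians(
        np.column_stack([buildings.geometry.y, buildings.geometry.x]))
    tree = BallTree(centroids, metric="haversine")
    dist, idx = tree.query(np.radians(latlon_points), k=1)
    # Haversine distances are on the unit sphere; multiply by the Earth's
    # radius in metres for an approximate ground distance.
    return buildings.iloc[idx.ravel()], dist.ravel() * 6371000
Each point's nearest building row can then be joined back onto the pandas frame as new columns.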

TabPy executing Python code, SCRIPT_REAL is being called with (string)

This is my first time using TabPy, and I have the connection successfully set up. Within the "Create Calculated Field" dialog in Tableau, I have tried:
SCRIPT_REAL("
import numpy as np
import pandas as pd
")
which results in "SCRIPT_REAL is being called with (string), did you mean (string, ...)?"
Additionally, how do I refer to the dataset, the way I did in Python, to execute the following?
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("C:/Users/.../dataset.csv")
data.head()

plt.figure(figsize=(7, 7))
plt.pie(data['stroke'].value_counts(sort=True),
        explode=(0.05, 0),
        labels=data['stroke'].value_counts(sort=True).index,
        colors=["blue", "green"],
        autopct='%1.1f%%')
plt.title('Pie Chart')
plt.show()
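On the error itself: SCRIPT_REAL expects at least one field argument after the code string, and the wrapped Python code must return a value. TabPy passes each Tableau expression to Python as _arg1, _arg2, and so on, which is also how you refer to your dataset: you reference fields from the Tableau data source rather than reading a CSV inside the script. A minimal sketch, where [Age] is a hypothetical numeric field:
SCRIPT_REAL("
import numpy as np
return float(np.mean(_arg1))
", SUM([Age]))
Note that SCRIPT_REAL can only send numbers back to Tableau, so a matplotlib pie chart cannot be returned this way; the equivalent chart would be built in Tableau itself.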

I have a table 't' with two columns 'col24' and 'col18' and I want to create a data frame 'r'

Imagine a table t with two columns, col24 and col18. I want to make a data frame r such that the resulting data frame has only one column, col24, renamed to first_name.
I have tried the following code, but it doesn't work and I get incorrect results. Help me solve this:
import pyspark.sql.functions as f
r = t.select(f.explode("col24").alias("first_name")).toPandas()
If I understood your question correctly, these two options should work:
r = t.select('col24').withColumnRenamed('col24', 'first_name')
r = t.withColumnRenamed('col24', 'first_name').drop('col18')
If instead you have multiple columns to drop in a list, my_cols for example, then the second option becomes:
r = t.withColumnRenamed('col24', 'first_name').drop(*my_cols)
Then you can check your dataframe:
r.show()
or, if t is massive, just check the column names:
r.columns
Please find your expected answer below:
r = t.select(f.col("col24").alias("first_name"))
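A quick sanity check of either answer on a toy table (the sample rows here are made up):
from pyspark.sql import SparkSession
import pyspark.sql.functions as f

spark = SparkSession.builder.getOrCreate()
t = spark.createDataFrame([("Alice", 10), ("Bob", 20)], ["col24", "col18"])

r = t.select(f.col("col24").alias("first_name"))
r.show()
# +----------+
# |first_name|
# +----------+
# |     Alice|
# |       Bob|
# +----------+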

Deep decision tree in PySpark

I am using PySpark for machine learning and want to train a decision tree classifier, a random forest, and gradient-boosted trees. I want to try different maximum depth values and select the best one via grid search and cross-validation, but Spark tells me that DecisionTree currently only supports maxDepth <= 30. What is the reason for limiting it to 30? Is there a way to increase it? I am using text data, and my feature vectors are TF-IDFs, so I want to try higher values for the maximum depth. Sample code from the Spark website with some modifications:
from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.feature import IndexToString, StringIndexer, VectorIndexer
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# Load and parse the data file, converting it to a DataFrame.
data = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")

# Index labels, adding metadata to the label column.
# Fit on whole dataset to include all labels in index.
labelIndexer = StringIndexer(inputCol="label",
                             outputCol="indexedLabel").fit(data)

# Automatically identify categorical features, and index them.
# Set maxCategories so features with > 4 distinct values are treated as continuous.
featureIndexer = \
    VectorIndexer(inputCol="features", outputCol="indexedFeatures",
                  maxCategories=4).fit(data)

# Split the data into training and test sets (30% held out for testing).
(trainingData, testData) = data.randomSplit([0.7, 0.3])

# Train a RandomForest model.
rf = RandomForestClassifier(labelCol="indexedLabel",
                            featuresCol="indexedFeatures", numTrees=500)

# Convert indexed labels back to original labels.
labelConverter = IndexToString(inputCol="prediction",
                               outputCol="predictedLabel",
                               labels=labelIndexer.labels)

# Chain indexers and forest in a Pipeline.
pipeline = Pipeline(stages=[labelIndexer, featureIndexer, rf, labelConverter])

paramGrid_rf = ParamGridBuilder() \
    .addGrid(rf.maxDepth, [50, 100, 150, 250, 300]) \
    .build()

crossval_rf = CrossValidator(estimator=pipeline,
                             estimatorParamMaps=paramGrid_rf,
                             evaluator=BinaryClassificationEvaluator(),
                             numFolds=5)

cvModel_rf = crossval_rf.fit(trainingData)
The code above gives me the error message below.
Py4JJavaError: An error occurred while calling o12383.fit.
: java.lang.IllegalArgumentException: requirement failed: DecisionTree currently only supports maxDepth <= 30, but was given maxDepth = 50.
From https://forums.databricks.com/questions/12300/for-decision-trees-is-the-current-maxdepth-limited.html:
...the current implementation imposes a restriction of maxDepth <= 30:
https://github.com/apache/spark/blob/ca6955858cec868c878a2fd8528dbed0ef9edd3f/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala#L137
The cap most likely exists because the implementation packs node positions into integer indices, and a complete binary tree of depth 30 already has over a billion nodes. You could ask to have that limit increased on GitHub!
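Until the limit changes upstream, the search grid has to stay within the supported range; for example, replacing the grid above with:
paramGrid_rf = ParamGridBuilder() \
    .addGrid(rf.maxDepth, [10, 20, 30]) \
    .build()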

Adding edge attribute causes TypeError: 'AtlasView' object does not support item assignment

Using networkx 2.0, I am trying to dynamically add an additional edge attribute by looping through all the edges. The graph is a MultiDiGraph.
According to the tutorial, it seems possible to add edge attributes the way I do in the code below:
g = nx.read_gpickle("../pickles/" + gname)
yearmonth = gname[:7]
g.name = yearmonth  # works

for source, target in g.edges():
    g[source][target]['yearmonth'] = yearmonth
This code throws the following error:
TypeError: 'AtlasView' object does not support item assignment
What am I doing wrong?
That will happen if your graph is an nx.MultiGraph (or, as here, a MultiDiGraph). In that case you need an extra key index, which by default runs from 0 to n-1, where n is the number of parallel edges between the two nodes.
Try:
for source, target in g.edges():
    g[source][target][0]['yearmonth'] = yearmonth
The tutorial example is intended for a nx.Graph.
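If the keys of parallel edges are not all 0 (for example after edge deletions, or when explicit keys were used), a safer variant is to iterate with the keys included:
for source, target, key in g.edges(keys=True):
    g[source][target][key]['yearmonth'] = yearmonth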