Adding edge attribute causes TypeError: 'AtlasView' object does not support item assignment - networkx

Using networkx 2.0 I try to dynamically add an additional edge attribute by looping through all the edges. The graph is a MultiDiGraph.
According to the tutorial it seems to be possible to add edge attributes the way I do in the code below:
g = nx.read_gpickle("../pickles/" + gname)
yearmonth = gname[:7]
g.name = yearmonth # works
for source, target in g.edges():
    g[source][target]['yearmonth'] = yearmonth
This code throws the following error:
TypeError: 'AtlasView' object does not support item assignment
What am I doing wrong?

That happens when your graph is an nx.MultiGraph (or nx.MultiDiGraph). In that case you need an extra edge key to pick one of the parallel edges between the two nodes; by default the keys are integers starting at 0.
Try:
for source, target in g.edges():
    g[source][target][0]['yearmonth'] = yearmonth
The tutorial example is intended for a nx.Graph.
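Note that a fixed index of 0 only touches the first parallel edge between each pair of nodes. Here is a minimal sketch, assuming networkx 2.x, that sets the attribute on every parallel edge by iterating over the edge keys. The toy graph, node names and the "2017-01" value are made up for illustration and are not from the question:

```python
import networkx as nx

# Hypothetical toy graph standing in for the pickled MultiDiGraph.
g = nx.MultiDiGraph()
g.add_edge("a", "b")  # first parallel edge, key 0
g.add_edge("a", "b")  # second parallel edge, key 1
g.add_edge("b", "c")

yearmonth = "2017-01"  # illustrative value

# keys=True yields (source, target, key), so each parallel edge is visited once.
for source, target, key in g.edges(keys=True):
    g[source][target][key]["yearmonth"] = yearmonth

# Collect the attribute from every edge to confirm it was set everywhere.
tagged = [data["yearmonth"] for _, _, data in g.edges(data=True)]
```

Iterating with keys=True also avoids a KeyError when key 0 is absent, for example after an edge has been removed.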

How To Use kmedoids from pyclustering with set number of clusters

I am trying to use k-medoids to cluster some trajectory data I am working with (multiple points along the trajectory of an aircraft). I want to cluster these into a set number of clusters (as I know how many types of paths there should be).
I have found that k-medoids is implemented inside the pyclustering package, and am trying to use that. I am technically able to get it to cluster, but I do not know how to control the number of clusters. I originally thought it was directly tied to the number of elements inside what I called initial_medoids, but experimentation shows that it is more complicated than this. My relevant code snippet is below.
Note that D holds a list of lists. Each list corresponds to a single trajectory.
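import numpy as np
from scipy.spatial.distance import directed_hausdorff
def hausdorff(u, v):
    # Symmetric Hausdorff distance: the larger of the two directed distances
    d = max(directed_hausdorff(u, v)[0], directed_hausdorff(v, u)[0])
    return d
traj_count = len(traj_lst)
D = np.zeros((traj_count, traj_count))
# Fill the symmetric pairwise distance matrix
for i in range(traj_count):
    for j in range(i + 1, traj_count):
        distance = hausdorff(traj_lst[i], traj_lst[j])
        D[i, j] = distance
        D[j, i] = distance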
from pyclustering.cluster.kmedoids import kmedoids
initial_medoids = [104, 345, 123, 1]
kmedoids_instance = kmedoids(traj_lst, initial_medoids)
kmedoids_instance.process()
cluster_lst = kmedoids_instance.get_clusters()[0]
num_clusters = len(np.unique(cluster_lst))
print('There were %i clusters found' %num_clusters)
I have a total of 1900 trajectories, and the above code finds 1424 clusters. I had expected to control the number of clusters through the length of initial_medoids, as I did not see any option to pass the number of clusters to the program, but the two seem unrelated. Could anyone point out the mistake I am making? How do I choose the number of clusters?
To obtain the clusters, call get_clusters():
cluster_lst = kmedoids_instance.get_clusters()
Do not use get_clusters()[0], which is only the list of object indexes in the first cluster:
cluster_lst = kmedoids_instance.get_clusters()[0]
And you are correct: the number of clusters is controlled by initial_medoids.
It is true that you can control the number of clusters, which corresponds to the length of initial_medoids.
The documentation is not clear about this. The get_clusters function "Returns list of medoids of allocated clusters represented by indexes from the input data". So this function does not return the cluster labels. It returns the indexes of rows in your original (input) data.
Please check the shape of cluster_lst in your example, using .get_clusters() and not .get_clusters()[0], as annoviko suggested. In your case, this shape should be (4,). So you have a list of four elements (clusters), each containing the indexes of rows in your original data.
To get, for example, the data from the first cluster, use:
kmedoids_instance = kmedoids(traj_lst, initial_medoids)
kmedoids_instance.process()
cluster_lst = kmedoids_instance.get_clusters()
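# cluster_lst[0] is a list of indexes; a plain Python list cannot be indexed by a list
traj_lst_first_cluster = [traj_lst[i] for i in cluster_lst[0]]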
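Because get_clusters() returns one list of row indexes per cluster, per-trajectory labels have to be built from it. Here is a minimal sketch in plain Python, assuming a hand-written `clusters` list that stands in for the output of get_clusters(). The index values and the `n_items` count are made up for illustration:

```python
# Stand-in for kmedoids_instance.get_clusters(): one list of input indexes per cluster.
clusters = [[0, 3], [1, 4, 5], [2]]
n_items = 6  # hypothetical number of trajectories

# The number of clusters is the length of the outer list, not of its first element.
num_clusters = len(clusters)

# Build one label per trajectory: labels[i] is the cluster that item i belongs to.
labels = [None] * n_items
for cluster_id, members in enumerate(clusters):
    for index in members:
        labels[index] = cluster_id
```

Counting unique values in `labels` then gives the number of clusters, which is what the original np.unique call was presumably meant to compute.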

Is it possible to return a map of key values using gremlin scala

Currently I have two Gremlin queries that fetch two different values, and I populate a map from their results.
Scenario: A->B, A->C, A->D
My queries are below:
graph.V().has(ID,A).out().label().toList()
This fetches the list of outE labels of A.
Result : List(B,C,D)
graph.traversal().V().has("ID",A).outE("interference").as("x").otherV().has("ID",B).select("x").values("value").headOption()
Given A and B, this gets the edge property value (A->B).
Return : 10
Is it possible to combine both of these queries to get a result such as Map[(B,10)(C,11)(D,12)]?
I am facing a performance issue with two separate queries: together they take too long.
There is probably a better way to do this but I managed to get something with the following traversal:
gremlin> graph.traversal().V().has("ID","A").outE("interference").as("x").otherV().has("ID").label().as("y").select("x").by("value").as("z").select("y", "z").select(values);
==>[B,1]
==>[C,2]
I would wait for more answers though as I suspect there is a better traversal out there.
The following works in Scala:
val b = StepLabel[Edge]()
val y = StepLabel[Label]()
val z = StepLabel[Integer]()
graph.traversal().V().has("ID",A).outE("interference").as(b)
.otherV().label().as(y)
.select(b).values("name").as(z)
.select((y,z)).toMap[String,Integer]
This will return Map[String,Int]

OrientDB create edge between two nodes with the same day of year

I'm interested in creating an edge (u,v) between two nodes of the same class in a graph if they share the same day of year and v.year = u.year+1.
Say I have vertices.csv:
id,date
A,2014-01-02
B,2015-01-02
C,2016-01-02
D,2013-06-01
E,2014-06-01
F,2016-06-01
The edge structure I'd like to see would be this:
A --> B --> C
D --> E
F
Let's set the vertex class to be "myVertex" and the edge class to be "myEdge". Is it possible to generate these edges using the SQL interface?
Based on this question, I started trying something like this:
BEGIN
LET source = SELECT FROM myVertex
LET target = SELECT from myVertex
LET edge = CREATE EDGE myEdge
FROM $source
TO (SELECT FROM $target WHERE $source.date.format('MM-dd') = $target.date.format('MM-dd')
AND $source.date.format('yyyy').asInteger() = $target.date.format('yyyy').asInteger()-1)
COMMIT
Unfortunately, this isn't correct. So I got less ambitious and wanted to see if I can create edges just based on matching day-of-year:
BEGIN
LET source = SELECT FROM myVertex
LET target = SELECT from myVertex
LET edge = CREATE EDGE myEdge FROM $source TO (SELECT FROM $target WHERE $source.date.format('MM-dd') = $target.date.format('MM-dd'))
COMMIT
Which still has errors... I'm sure it's something pretty simple to an experienced OrientDB user.
I thought about putting together a JavaScript function like Michela suggested on this question, but I'd prefer to stick to using the SQL commands as much as possible for now.
Help is greatly appreciated.
Other Stack Overflow References
How to print or log on function javascript OrientDB
I tried with OSQL batch but I think that you can't get what you want.
With this OSQL batch
begin
let a = select #rid, $a1 as toAdd from test let $a1 = (select from test where date.format("MM") == $parent.$current.date.format("MM") and date.format("dd") == $parent.$current.date.format("dd") and #rid<>$parent.$current.#rid and date.format("yyyy") == sum($parent.$current.date.format("yyyy").asInteger(),1))
commit
return $a
I got a result, but the problem is that when you create the edge you cannot iterate over the table obtained in the previous step.
I think the best solution is to use a JS server-side function.
Hope it helps.
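To make the pairing rule concrete outside the database, here is a minimal Python sketch (not OrientDB code) that computes which edges the rule should produce from the sample vertices.csv rows in the question. The function name `year_successor_edges` is hypothetical:

```python
from datetime import date

# Sample rows from vertices.csv in the question.
vertices = {
    "A": date(2014, 1, 2),
    "B": date(2015, 1, 2),
    "C": date(2016, 1, 2),
    "D": date(2013, 6, 1),
    "E": date(2014, 6, 1),
    "F": date(2016, 6, 1),
}


def year_successor_edges(vertices):
    """Return (u, v) pairs sharing month and day where v.year == u.year + 1."""
    # Index vertex ids by (year, month, day) so the successor can be looked up directly.
    by_date = {}
    for vid, d in vertices.items():
        by_date.setdefault((d.year, d.month, d.day), []).append(vid)

    edges = []
    for u, d in vertices.items():
        for v in by_date.get((d.year + 1, d.month, d.day), []):
            edges.append((u, v))
    return sorted(edges)


edges = year_successor_edges(vertices)
```

This yields the structure the question asks for (A to B, B to C, D to E, and F left unconnected). A server-side function could follow the same lookup and create one edge per pair.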

"TypeError: 'int' object is not subscriptable" when trying to set node attributes in networkX

Below is the code that raises the TypeError:
import pandas as pd
import networkx as nx
datamuse = pd.read_csv('NetworkDatasheet.csv',index_col=0)
print(datamuse)
G = nx.DiGraph(datamuse.values)
nx.draw_random(G, with_labels=True)
dc= nx.degree_centrality(G)
bc=nx.betweenness_centrality(G,normalized = True)
ec=nx.eigenvector_centrality(G)
nx.set_node_attributes(G,'degree centrality',dc)
nx.set_node_attributes(G,'betweenness centrality',bc)
nx.set_node_attributes(G,'eigenvector centrality',ec)
G.nodes()[1]['degree centrality']
The values in the dictionaries (e.g. dc) are floats such as 0.029411764705882353.
The last line of your code should be replaced by:
G.nodes(data=True)[1][1]['degree centrality']
You need the properties associated with your nodes, hence data=True; otherwise you only get the node ids.
Then when you do G.nodes(data=True)[1] you actually get a tuple (nodeId, data_dict) so to access the data values, you need to get the second element, hence [1].
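The indexing above matches networkx 1.x, where G.nodes() returns a list. Here is a minimal sketch, assuming networkx 2.0 or later, where the positional order of set_node_attributes is (G, values, name) and G.nodes behaves like a mapping from node to attribute dict. The toy graph is made up for illustration:

```python
import networkx as nx

# Hypothetical toy graph in place of the CSV-derived adjacency matrix.
G = nx.DiGraph([(0, 1), (1, 2), (2, 0), (2, 3)])

dc = nx.degree_centrality(G)  # dict: node -> float

# Keyword arguments sidestep the argument-order change between 1.x and 2.x.
nx.set_node_attributes(G, values=dc, name="degree centrality")

# In 2.x, G.nodes[n] is the attribute dict of node n.
value = G.nodes[1]["degree centrality"]
```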

Entity Framework - TOP using a dynamic query

I'm having issues implementing the TOP or SKIP functionality when building a new object query.
I can't use eSQL because I need to use an "IN" command, which could get quite complex if I loop over the IN values and add them all as "OR" parameters.
Code is below :
Using dbcontext As New DB
    Dim r As New ObjectQuery(Of recipient)("recipients", dbcontext)
    r.Include("jobs")
    r.Include("applications")
    r = r.Where(Function(w) searchAppIds.Contains(w.job.application_id))
    If Not statuses.Count = 0 Then
        r = r.Where(Function(w) statuses.Contains(w.status))
    End If
    If Not dtFrom.DbSelectedDate Is Nothing Then
        r = r.Where(Function(w) w.job.create_time >= dtDocFrom.DbSelectedDate)
    End If
    If Not dtTo.DbSelectedDate Is Nothing Then
        r = r.Where(Function(w) w.job.create_time <= dtDocTo.DbSelectedDate)
    End If
    'a lot more IF conditions to add in additional predicates
    grdResults.DataSource = r
    grdResults.DataBind()
End Using
If I use any form of .Top or .Skip it throws an error: Query builder methods are not supported for LINQ to Entities queries
Is there any way to specify TOP or LIMIT using this method? I'd like to avoid a query returning thousands of records if possible (it's for a user search screen).
Rather than
r = new ObjectQuery<recipient>("recipients", dbContext)
try
r = dbContext.recipients.
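The LINQ .Skip() and .Take() operators return IQueryable<T>, just like .Where(), so put .Skip() and .Take() last, after all the predicates. Note that LINQ to Entities requires the query to be ordered (OrderBy) before .Skip().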
Also change grdResults.DataSource = r to grdResults.DataSource = r.ToList() to execute the query now. That'll also allow you to temporarily wrap this line in try/catch, which may expose a better message about why it's erroring.
Mark this one down to confusion. I should have been using the .Take instead of .Top or .Limit or anything.
My final part is below, and it works:
grdResults = r.Take(100)