How is it possible to get the list of all leaf nodes in a Barabasi-Albert graph?
G = nx.barabasi_albert_graph(10, 2)
Best to use a list comprehension.
g = nx.barabasi_albert_graph(10, 2)
leaf_nodes = [node for node in g if g.degree(node) == 1]
Note that in networkx, iterating over g (node in g) works just like iterating over g.nodes().
Finding leaf nodes
Leaf nodes have degree one:
g = nx.barabasi_albert_graph(10, 2)
leaf_nodes = []
for node in g.nodes():
    if nx.degree(g, node) == 1:
        leaf_nodes.append(node)
print(leaf_nodes)
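The degree-one test does not depend on networkx itself. As a minimal sketch, assuming the graph is available as a plain list of undirected edges (the helper name leaf_nodes and the toy edge list are made up for illustration), the same idea looks like this:

```python
from collections import Counter

def leaf_nodes(edges):
    """Return the nodes that have exactly one incident edge (degree one)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d == 1)

# Hypothetical toy graph: a triangle 0-1-2 with pendant nodes 3 and 4.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (0, 4)]
print(leaf_nodes(edges))
```

With a networkx graph g, list(g.edges()) could be passed as the edge list, although the list comprehension over g shown above is shorter.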
Related
I would like to evaluate the quality of a partition of the nodes in a graph using NetworkX's modularity function.
My partition V is given in the following vector format: V(i) = node_label(i), where node_label is a function assigning to each node an integer from 1 to K, and K is the number of clusters.
The modularity function takes the graph G and the communities. The documentation specifies that communities must be a list or an iterable of sets of nodes, as in the following example.
>>> import networkx as nx
>>> import networkx.algorithms.community as nx_comm
>>> G = nx.barbell_graph(3, 0)
>>> nx_comm.modularity(G, [{0, 1, 2}, {3, 4, 5}])
0.35714285714285715
>>> nx_comm.modularity(G, nx_comm.label_propagation_communities(G))
0.35714285714285715
To convert my partition into the communities format, I have written the following code (not the most efficient):
communities = []
for k in np.unique(partition):
    cluster = np.where(partition == k)
    cluster = set(np.squeeze(np.array(cluster)).tolist())
    communities.append(cluster)
However when I try to run the function I get the following error:
TypeError: 'int' object is not iterable
How do I get around this?
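One likely cause, offered as a guess since the question shows no traceback: when a cluster contains a single node, np.squeeze collapses the index array to zero dimensions, tolist() then returns a bare int, and set() cannot iterate over it. A minimal sketch that avoids the squeeze altogether is below. It assumes node i is simply position i in the label vector; the function name labels_to_communities and the sample vector are hypothetical.

```python
from collections import defaultdict

def labels_to_communities(labels):
    """Group node indices by label and return a list of sets, one per label."""
    groups = defaultdict(set)
    for node, label in enumerate(labels):
        groups[label].add(node)
    # Sort by label so the output order is deterministic.
    return [groups[label] for label in sorted(groups)]

# Made-up partition vector; label 2 has a single member on purpose.
partition = [1, 1, 2, 3, 3]
communities = labels_to_communities(partition)
print(communities)
```

The resulting list of sets is in the shape that the modularity example above passes as its second argument.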
I'm trying to split the GraphFrame connectedComponent output so that each component is divided into sub-groups that are completely connected, meaning all vertices in a sub-group are connected to each other. The following sketch will help demonstrate what I'm trying to achieve:
I'm using a NetworkX method to achieve it, as follows:
def create_subgroups(edges, components, key_name='component'):
    # joining the edges to enrich them with the component id
    sub_components = edges.join(components, [(edges.dst == components.id) | (edges.src == components.id)]).select('src', 'dst', key_name).drop_duplicates()
    # caching the table using a temp table
    sub_components = save_temp_table(sub_components, f'inner_sub_{key_name}s', zorder=[key_name])
    schema = StructType([
        StructField("index", LongType(), True),
        StructField("id", StringType(), True),
    ])
    # applying the pandas udf to enrich each vertex with the new component id
    sub_components = sub_components.groupby(key_name).applyInPandas(pd_create_subgroups, schema).where('id != "not_connected"').drop_duplicates()
    # joining the output and multiplying each vertex by the number of sub-groups found
    components = components.join(sub_components, 'id', 'left')
    components = components.withColumn(key_name, when(col('index').isNull(), col(key_name)).otherwise(concat(col(key_name), lit('_'), concat('index')))).drop('index')
    return components
import networkx as nx
import pandas as pd
from networkx.algorithms.clique import find_cliques
def pd_create_subgroups(pdf):
    # building the graph
    gnx = nx.from_pandas_edgelist(pdf, 'src', 'dst')
    # removing degree-one nodes
    outdeg = gnx.degree()
    to_remove = [n[0] for n in outdeg if n[1] == 1]
    gnx.remove_nodes_from(to_remove)
    bic = list(find_cliques(gnx))
    if len(bic) <= 2:
        return pd.DataFrame(data={"index": [-1], "id": ["not_connected"]})
    res = {
        "index": [],
        "id": []
    }
    ind = 0
    for i in bic:
        # skip cliques with fewer than three vertices
        if len(i) < 3:
            continue
        for id in i:
            res['index'] = res['index'] + [ind]
            res['id'] = res['id'] + [id]
        ind += 1
    return pd.DataFrame(res)
# creating sub-components if necessary
subgroups = create_subgroups(edges,components, key_name = 'component')
My problem is that there's a very large component containing 80% of the vertices, which makes the cluster perform very slowly. I've tried using labelPropagation to create smaller groups, but it didn't do the trick: it split the component in an unsuitable way, separating vertices that should have been in the same group.
Here's the cluster usage when it reaches the pandas_udf part
This issue was resolved by separating the vertices into N groups, pulling all edges for each vertex in the group, and calculating the sub-groups using the find_cliques method.
I'm new to AnyLogic and I apologize in advance if this is a beginner question. I have an agent that travels in a network between nodes along paths. In a function, I would like to get the name of the path between two nodes: the node that the agent is at and the node that it is going to. As an example, the variables containing the names of the two nodes are n1 and n2.
The network is set up such that there is only one path between two given nodes.
I am using the following to get the name of the current node:
Node n = (Node)agent.getNetworkNode();
String n1 = n.getName();
n2 is manually assigned. For example:
String n2 = "node2";
What is the best way to get the name of the path? Any help would be much appreciated. Thank you.
I don't think there is an easy way, so I'll show you this one. Note that there can be many paths connecting both nodes, so this solution gives you the names of all the paths that connect n1 and n2.
Node n1 = findFirst(network.nodes(), n -> n.getName().equals("n1"));
Node n2 = findFirst(network.nodes(), n -> n.getName().equals("n2"));
ArrayList<Path> conn1 = new ArrayList<>();
ArrayList<Path> conn2 = new ArrayList<>();
ArrayList<Path> paths = new ArrayList<>();
for (int i = 0; i < n1.getConnectionsCount(); i++) {
    if (n1.getConnection(i) instanceof Path)
        conn1.add(n1.getConnection(i)); // add the path connected to n1
}
for (int i = 0; i < n2.getConnectionsCount(); i++) {
    if (n2.getConnection(i) instanceof Path)
        conn2.add(n2.getConnection(i)); // add the path connected to n2
}
for (Path p1 : conn1) {
    if (conn2.contains(p1)) {
        paths.add(p1); // add the path if it is connected to both nodes
    }
}
for (int i = 0; i < paths.size(); i++) {
    traceln(paths.get(i).getName()); // print the names of the paths
}
I am trying to use k-medoids to cluster some trajectory data I am working with (multiple points along the trajectory of an aircraft). I want to cluster these into a set number of clusters (as I know how many types of paths there should be).
I have found that k-medoids is implemented inside the pyclustering package, and am trying to use that. I am technically able to get it to cluster, but I do not know how to control the number of clusters. I originally thought it was directly tied to the number of elements inside what I called initial_medoids, but experimentation shows that it is more complicated than this. My relevant code snippet is below.
Note that traj_lst holds a list of lists. Each list corresponds to a single trajectory. D is the matrix of pairwise distances between trajectories.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(u, v):
    # symmetric Hausdorff distance: the larger of the two directed distances
    d = max(directed_hausdorff(u, v)[0], directed_hausdorff(v, u)[0])
    return d
traj_count = len(traj_lst)
D = np.zeros((traj_count, traj_count))
for i in range(traj_count):
    for j in range(i + 1, traj_count):
        distance = hausdorff(traj_lst[i], traj_lst[j])
        D[i, j] = distance
        D[j, i] = distance
from pyclustering.cluster.kmedoids import kmedoids
initial_medoids = [104, 345, 123, 1]
kmedoids_instance = kmedoids(traj_lst, initial_medoids)
kmedoids_instance.process()
cluster_lst = kmedoids_instance.get_clusters()[0]
num_clusters = len(np.unique(cluster_lst))
print('There were %i clusters found' %num_clusters)
I have a total of 1900 trajectories, and the code above finds 1424 clusters. I had expected that I could control the number of clusters through the length of initial_medoids, as I did not see any option to input the number of clusters into the program, but this seems unrelated. Could anyone guide me as to the mistake I am making? How do I choose the number of clusters?
To obtain the clusters, call get_clusters():
cluster_lst = kmedoids_instance.get_clusters()
Do not use get_clusters()[0], which returns only the list of object indexes in the first cluster:
cluster_lst = kmedoids_instance.get_clusters()[0]
And yes, you can control the number of clusters through initial_medoids.
It is true that you can control the number of clusters, which corresponds to the length of initial_medoids.
The documentation is not clear about this. The get_clusters function "Returns list of medoids of allocated clusters represented by indexes from the input data". So this function does not return the cluster labels. It returns the indices of rows in your original (input) data.
Please check the shape of cluster_lst in your example, using .get_clusters() and not .get_clusters()[0], as annoviko suggested. In your case, this shape should be (4,). So you have a list of four elements (clusters), each containing the indices of rows in your original data.
To get, for example, data from the first cluster, use:
kmedoids_instance = kmedoids(traj_lst, initial_medoids)
kmedoids_instance.process()
cluster_lst = kmedoids_instance.get_clusters()
traj_lst_first_cluster = [traj_lst[i] for i in cluster_lst[0]]
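Since get_clusters() returns one list of indices per cluster rather than one label per object, counting unique values in it does not give the number of clusters. A minimal sketch of turning such a list of index lists into a flat label vector follows; the helper name clusters_to_labels and the sample clusters are made up for illustration and are not part of pyclustering.

```python
def clusters_to_labels(clusters, n_objects):
    """Turn a list of index lists into one cluster label per object."""
    labels = [-1] * n_objects  # -1 marks objects not assigned to any cluster
    for cluster_id, members in enumerate(clusters):
        for index in members:
            labels[index] = cluster_id
    return labels

# Hypothetical result in the shape described above: three clusters, six objects.
clusters = [[0, 3], [1, 4, 5], [2]]
labels = clusters_to_labels(clusters, 6)
print(len(clusters))  # the number of clusters is the length of the outer list
print(labels)
```

With the question's data, len(kmedoids_instance.get_clusters()) would be the quantity to compare against len(initial_medoids).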
I want networkx to find the absolute longest path in my directed,
acyclic graph.
I know about Bellman-Ford, so I negated my graph lengths. The problem:
networkx's bellman_ford() requires a source node. I want to find the
absolute longest path (or the shortest path after negation), not the
longest path from a given node.
Of course, I could run bellman_ford() on each node in the graph and
sort, but is there a more efficient method?
From what I've read (eg,
http://en.wikipedia.org/wiki/Longest_path_problem) I realize there
actually may not be a more efficient method, but was wondering if
anyone had any ideas (and/or had proved P=NP (grin)).
EDIT: all the edge lengths in my graph are +1 (or -1 after negation), so a method that simply visits the most nodes would also work. In general, it won't be possible to visit ALL nodes of course.
EDIT: OK, I just realized I could add an additional node that simply connects to every other node in the graph, and then run bellman_ford from that node. Any other suggestions?
There is a linear-time algorithm mentioned at http://en.wikipedia.org/wiki/Longest_path_problem
Here is a (very lightly tested) implementation
EDIT, this is clearly wrong, see below. +1 for future testing more than lightly before posting
import networkx as nx
def longest_path(G):
    dist = {} # stores [node, distance] pair
    for node in nx.topological_sort(G):
        pairs = [[dist[v][0]+1,v] for v in G.pred[node]] # incoming pairs
        if pairs:
            dist[node] = max(pairs)
        else:
            dist[node] = (0, node)
    node, max_dist = max(dist.items())
    path = [node]
    while node in dist:
        node, length = dist[node]
        path.append(node)
    return list(reversed(path))
if __name__=='__main__':
    G = nx.DiGraph()
    nx.add_path(G, [1,2,3,4])
    print(longest_path(G))
EDIT: Corrected version (use at your own risk and please report bugs)
def longest_path(G):
    dist = {} # maps node -> (distance, predecessor)
    for node in nx.topological_sort(G):
        # pairs of dist,node for all incoming edges
        pairs = [(dist[v][0]+1,v) for v in G.pred[node]]
        if pairs:
            dist[node] = max(pairs)
        else:
            dist[node] = (0, node)
    node,(length,_) = max(dist.items(), key=lambda x:x[1])
    path = []
    while length > 0:
        path.append(node)
        length,node = dist[node]
    return list(reversed(path))
if __name__=='__main__':
    G = nx.DiGraph()
    nx.add_path(G, [1,2,3,4])
    nx.add_path(G, [1,20,30,31,32,4])
    # nx.add_path(G, [20,2,200,31])
    print(longest_path(G))
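For readers who want to check the idea without networkx, here is a minimal sketch of the same dynamic programme on a plain dictionary of successor lists. It is not the code from the answer above: the topological order comes from Kahn's algorithm, every edge is assumed to have length 1 (as in the question), and the function and variable names are chosen for illustration.

```python
from collections import deque

def longest_path(successors):
    """Longest path by edge count in a DAG given as {node: [successor, ...]}."""
    # Collect every node and count its incoming edges.
    indegree = {}
    for u, vs in successors.items():
        indegree.setdefault(u, 0)
        for v in vs:
            indegree[v] = indegree.get(v, 0) + 1

    # best[v] = (edges on the longest path ending at v, predecessor or None)
    best = {v: (0, None) for v in indegree}
    queue = deque(v for v, d in indegree.items() if d == 0)
    visited = 0
    while queue:  # Kahn's algorithm pops nodes in topological order
        u = queue.popleft()
        visited += 1
        for v in successors.get(u, ()):
            if best[u][0] + 1 > best[v][0]:
                best[v] = (best[u][0] + 1, u)
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if visited != len(indegree):
        raise ValueError("graph contains a cycle")
    if not best:
        return []

    # Walk back from the farthest node; only the lengths are compared.
    node = max(best, key=lambda v: best[v][0])
    path = []
    while node is not None:
        path.append(node)
        node = best[node][1]
    return path[::-1]

# Same two chains as in the test graph above.
dag = {1: [2, 20], 2: [3], 3: [4], 20: [30], 30: [31], 31: [32], 32: [4]}
print(longest_path(dag))
```

Each node and edge is handled once, so the running time is linear in the size of the graph. Recent networkx releases also provide nx.dag_longest_path for the same task.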
Aric's revised answer is a good one, and I found it had been adopted by the networkx library.
However, I found a little flaw in this method.
if pairs:
    dist[node] = max(pairs)
else:
    dist[node] = (0, node)
This is because pairs is a list of tuples of (int, nodetype). When comparing tuples, Python compares the first elements and, if they are equal, proceeds to compare the second elements, which are of nodetype. However, in my case the nodetype is a custom class whose comparison methods are not defined. Python therefore raises an error like 'TypeError: unorderable types: xxx() > xxx()'.
As a possible improvement, the line
dist[node] = max(pairs)
can be replaced by
dist[node] = max(pairs,key=lambda x:x[0])
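The tie-breaking behaviour can be reproduced in isolation. The sketch below uses a hypothetical Task class that defines no ordering methods and two pairs with equal distances, so that Python is forced to fall back to comparing the nodes:

```python
class Task:
    """Hypothetical node type that defines no ordering methods."""
    def __init__(self, name):
        self.name = name

a, b = Task("a"), Task("b")
pairs = [(3, a), (3, b)]  # equal distances force a comparison of the nodes

try:
    max(pairs)
    tie_break_failed = False
except TypeError:
    tie_break_failed = True

# With a key, only the distances are compared and the nodes are never ordered.
best = max(pairs, key=lambda pair: pair[0])
print(tie_break_failed, best[1].name)
```

The later call max(dist.items(), key=lambda x:x[1]) in the corrected version compares the same kind of tuple, so it may need the same treatment, for example key=lambda x: x[1][0].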
Sorry about the formatting since it's my first time posting. I wish I could just post below Aric's answer as a comment but the website forbids me to do so stating I don't have enough reputation (fine...)