With networkx, how to find the first node(s) in a DiGraph

Using networkx, what is the most direct way to find the first node in a directed graph?
There might be more than one, and there are no isolated nodes.
By first nodes I mean the nodes without ancestors.
Best regards and thank you in advance,
Pablo

You can look at in_degree. A node with no edges pointing to it will have an in_degree of 0.
import numpy as np
import networkx as nx

# make dummy graph: 10 nodes and 10 random directed edges (no self-loops)
nodes = np.arange(10)
edges = [tuple(np.random.choice(nodes, 2, replace=False)) for _ in range(10)]
G = nx.DiGraph()
G.add_nodes_from(nodes)
G.add_edges_from(edges)
# find the nodes whose in_degree is 0
[node for node, in_degree in G.in_degree if in_degree == 0]
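To check the in_degree rule against the question's definition ("nodes without ancestors"), here is a small sketch on a fixed, hypothetical DAG so the output is reproducible. nx.ancestors is used only as a cross-check; reading in-degrees is the cheaper test.

```python
import networkx as nx

# A small, fixed DAG (hypothetical example data) so the result is reproducible.
G = nx.DiGraph([(0, 2), (1, 2), (2, 3), (3, 4), (1, 4)])

# Nodes with no incoming edges are the "first" nodes.
roots = sorted(n for n, d in G.in_degree() if d == 0)

# Cross-check against the question's definition: nodes without ancestors.
no_ancestors = sorted(n for n in G if not nx.ancestors(G, n))

print(roots)         # [0, 1]
print(no_ancestors)  # [0, 1]
```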


Finding all shortest paths between two nodes in NetworkX

Assume I have a BA (Barabási-Albert) network with N nodes where each node has at least 2 edges. The network is unweighted. I am trying to find all the shortest paths between every pair of nodes i and j in the network. If there is more than one shortest path between nodes i and j, I need every one of them.
So if node 2 can be reached from node 0 by using the paths [0,1,2], [0,3,4,2], [0,3,4,5,2], [0,4,5,2] and [0,3,2], I need a list that says [[0,1,2], [0,3,2]].
Is the only way to do this to enumerate every path from i to j and keep the shortest ones? Can this be done more efficiently?
Edit: Apparently there's a path finding method called all_shortest_paths. I will try this and see if it is efficient.
You can use nx.all_shortest_paths for this:
all_shortest_paths(G, source, target, weight=None, method='dijkstra')
This lets you specify the source and target nodes. Here's a simple example:
import matplotlib.pyplot as plt
import networkx as nx

plt.figure(figsize=(6,4))
G = nx.from_edgelist([[1,2],[2,3],[7,8],[3,8],[1,8], [2,9],[9,0],[0,7]])
nx.draw(G, with_labels=True, node_color='lightgreen')
list(nx.all_shortest_paths(G, 2, 8))
# [[2, 1, 8], [2, 3, 8]]
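To cover the question's actual goal (every shortest path for every pair i, j), one option is simply to call all_shortest_paths for each ordered pair. A minimal sketch, assuming a small graph from nx.barabasi_albert_graph as a stand-in for the asker's network (the size, m, and seed are arbitrary choices):

```python
import itertools
import networkx as nx

# Hypothetical stand-in for the question's BA network: 20 nodes, each new
# node attaching with m=2 edges (seeded so the run is reproducible).
G = nx.barabasi_albert_graph(20, 2, seed=42)

# For every ordered pair (i, j), collect *every* shortest path between them.
# A BA graph built this way is connected, so a path always exists.
all_paths = {
    (i, j): list(nx.all_shortest_paths(G, i, j))
    for i, j in itertools.permutations(G.nodes, 2)
}

# All paths stored for one pair have the same (minimal) length.
print(all_paths[(0, 5)])
```

For an undirected graph the paths for (j, i) are the reverses of those for (i, j), so iterating over itertools.combinations instead of permutations halves the work. The number of shortest paths per pair can grow quickly, so storing all of them may become the real cost for large N.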
The Floyd-Warshall algorithm is what you need (it computes shortest-path distances between all pairs of nodes).
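Floyd-Warshall on its own yields distances, and networkx's floyd_warshall_predecessor_and_distance records only a single predecessor per pair, so recovering *every* shortest path needs an extra step. A minimal sketch, assuming an unweighted graph and reusing the example graph from the answer above; the helper shortest_paths_from_dist is hypothetical, not a networkx function:

```python
import networkx as nx

G = nx.from_edgelist([[1, 2], [2, 3], [7, 8], [3, 8], [1, 8], [2, 9], [9, 0], [0, 7]])

# All-pairs shortest distances via Floyd-Warshall (dict of dicts of floats;
# edges without a 'weight' attribute count as 1).
dist = nx.floyd_warshall(G)

def shortest_paths_from_dist(G, dist, source, target):
    """Hypothetical helper: enumerate every shortest path from the distance table.

    In an unweighted graph, a neighbour k of `source` lies on a shortest path
    to `target` exactly when dist[k][target] == dist[source][target] - 1.
    """
    if dist[source][target] == float("inf"):
        return []  # unreachable: no paths (also prevents endless recursion)
    if source == target:
        return [[target]]
    paths = []
    for k in G.neighbors(source):
        if dist[k][target] == dist[source][target] - 1:
            for rest in shortest_paths_from_dist(G, dist, k, target):
                paths.append([source] + rest)
    return paths

print(sorted(shortest_paths_from_dist(G, dist, 2, 8)))  # [[2, 1, 8], [2, 3, 8]]
```

Floyd-Warshall runs in O(n^3) time, so for a large sparse graph, running one BFS per node (which is what all_shortest_paths does internally per pair) may be cheaper.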

Centralities in networkx weighted graph

I am not able to compute centralities for a simple NetworkX weighted graph.
Is this expected, or am I doing something wrong?
I add edges inside a for loop with a simple add_edge(c[0], c[1], weight=my_values), where
c[0] and c[1] are strings (the node names) and my_values is an integer. This is an example of the resulting edges:
('first node label', 'second node label', {'weight': 14})
(the number of nodes doesn't really matter; for now I keep it to 20)
The edge list of my graph is a list of (string_node1, string_node2, weight_dictionary) tuples. Everything looks fine: I can also draw, save, and read the graph.
Why does nx.degree_centrality give me all 1s?
Why does nx.closeness_centrality give me all 1s?
For example:
{'first node name': 1.0,
...
'last node name': 1.0}
Thanks for your help.
It was easy:
instead of nx.degree_centrality() I use
my_graph.degree(weight='weight'). Still, I think this is a basic gap in the module...
...but the issue is still open for nx.closeness_centrality.
To make closeness_centrality take weights into account, you have to add a distance attribute of 1 / weight to the graph's edges, as suggested in this issue, and pass that attribute name through the distance parameter.
Here's code to do it (graph is g):
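import networkx as nx
g_distance_dict = {(e1, e2): 1 / weight for e1, e2, weight in g.edges(data='weight')}
nx.set_edge_attributes(g, g_distance_dict, 'distance')  # strong tie -> short distance
closeness = nx.closeness_centrality(g, distance='distance')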
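A small self-contained sketch of the whole situation, assuming a hypothetical complete graph on three nodes where "a"-"c" is the strong tie: the unweighted measures are all 1.0, while the weighted degree and the distance-based closeness tell the nodes apart.

```python
import networkx as nx

# Hypothetical complete graph on three nodes; "a"-"c" is a strong tie.
G = nx.Graph()
G.add_edge("a", "b", weight=1)
G.add_edge("b", "c", weight=1)
G.add_edge("a", "c", weight=10)

# Unweighted measures: every node touches every other node, so all are 1.0.
deg = nx.degree_centrality(G)
clo = nx.closeness_centrality(G)
print(deg)  # {'a': 1.0, 'b': 1.0, 'c': 1.0}
print(clo)  # {'a': 1.0, 'b': 1.0, 'c': 1.0}

# Weighted degree ("strength"), as in my_graph.degree(weight='weight').
strength = dict(G.degree(weight="weight"))
print(strength)  # {'a': 11, 'b': 2, 'c': 11}

# Turn weights into distances (strong tie = short distance) and use them.
nx.set_edge_attributes(
    G, {(u, v): 1 / w for u, v, w in G.edges(data="weight")}, "distance"
)
weighted_clo = nx.closeness_centrality(G, distance="distance")
print(weighted_clo)  # "a" and "c" now score higher than "b"
```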
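I know this is a pretty old question, but I just wanted to point out that your degree centrality values are probably all 1 because your graph is complete (i.e., every node is connected to every other node), and degree centrality is the fraction of the other nodes in the graph to which a node is connected. The same reasoning explains the closeness values: when weights are ignored (the default), every node in a complete graph is at distance 1 from all the others, so closeness centrality is also 1 everywhere.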
Per networkx's documentation:
The degree centrality for a node v is the fraction of nodes it is connected to.
The degree centrality values are normalized by dividing by the maximum possible degree in a simple graph n-1 where n is the number of nodes in G.
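The definition quoted above is easy to check directly. A minimal sketch comparing nx.degree_centrality with degree / (n - 1) on a complete graph (all 1.0) and on a path graph (not all 1.0); manual_degree_centrality is a hypothetical helper written for the comparison:

```python
import networkx as nx

def manual_degree_centrality(G):
    # Hypothetical helper: degree divided by the maximum possible degree n - 1.
    n = G.number_of_nodes()
    return {v: d / (n - 1) for v, d in G.degree()}

K5 = nx.complete_graph(5)  # every node linked to every other node
P5 = nx.path_graph(5)      # 0-1-2-3-4

k5_dc = nx.degree_centrality(K5)
p5_dc = nx.degree_centrality(P5)
print(k5_dc)  # {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}
print(p5_dc)  # {0: 0.25, 1: 0.5, 2: 0.5, 3: 0.5, 4: 0.25}
```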

Get observations per node in cluster

After building a hierarchical cluster tree from some data (using an example of 6 observations), I want to get the observations held by each node of the tree.
For the given example:
Node5 [1,2,3,4,5,6]
Node4 [1,2,3,5,6]
Node3 [2,3,5,6]
...and so on
So far I have used this code, with n being the number of observations in linkDist, which is an agglomerative hierarchical cluster tree:
for i=1:n-1
clusterVals = cluster(linkDist,'maxClust',i);
k = find(clusterVals==i);
end
The problem is that the cluster numbering changes from one iteration to the next. For example:
cluster(linkDist,'maxClust',2) % [2,2,1,2,2,2]
cluster(linkDist,'maxClust',3) % [2,2,3,2,1,2]
For the following tree:
Is there a solution for my problem?
Thank you very much!
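One way around the renumbering is to read the memberships from the tree itself instead of re-cutting it with cluster for each i. The question uses MATLAB; purely as an illustration, here is the idea in Python with SciPy, whose linkage output has an analogous row-per-merge layout (leaves are ids 0..n-1 and the merge in row i gets id n+i). The data and the linkage method are hypothetical:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree

# Hypothetical data: 6 one-dimensional observations.
X = np.array([[0.0], [0.4], [5.0], [5.3], [9.0], [9.1]])
Z = linkage(X, method="single")

# to_tree(..., rd=True) returns the root plus a list of nodes indexed by id:
# ids 0..n-1 are the observations, ids n..2n-2 are the merges in Z's row order.
root, nodes = to_tree(Z, rd=True)
n = X.shape[0]

# pre_order() lists the leaf ids (observation indices) beneath a node.
members = {node.id: sorted(node.pre_order()) for node in nodes[n:]}

for node_id, obs in members.items():
    # 1-based labels, to match the "NodeK [1,...]" listing in the question.
    print(f"Node{node_id - n + 1}", [i + 1 for i in obs])
```

Because each node's id is fixed by its row in the linkage matrix, the labels do not change between queries; the last node printed is the root, which holds every observation.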

Find the cross node for a number of nodes in Gremlin?

I have a number of nodes connected through intermediate nodes of another type, as in the picture.
There can be multiple middle nodes.
I need to find all the middle nodes for a given set of nodes and sort them by the number of links to my initial nodes. In my example, given A, B, C, D, it should return node E (4 links) followed by node F (3 links). Is this possible? If not, can it be done using multiple requests?
Using the toy graph,
let's assume vertices 1 and 6 are given:
g = TinkerGraphFactory.createTinkerGraph()
m=[:]
g.v(1,6).both.groupCount(m)
m.sort{-it.value}
Sorted Map m contains:
==>v[3]=2
==>v[2]=1
==>v[4]=1
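The groupCount idea is not Gremlin-specific: take one step from every given vertex and count how often each neighbour is reached. A minimal Python translation for illustration (not Gremlin API), using networkx and collections.Counter on a hypothetical graph shaped like the question's example, where A-D link to the middle nodes E and F:

```python
from collections import Counter
import networkx as nx

# Hypothetical graph mirroring the question: A-D are the given nodes,
# E and F are "middle" nodes of another type.
G = nx.Graph([("A", "E"), ("B", "E"), ("C", "E"), ("D", "E"),
              ("A", "F"), ("B", "F"), ("C", "F")])

given = ["A", "B", "C", "D"]

# Same idea as g.v(...).both.groupCount(m): walk one step from every given
# node and count how often each neighbour is reached.
counts = Counter(nbr for v in given for nbr in G.neighbors(v))

# Sort by link count, highest first (like m.sort{-it.value}).
print(counts.most_common())  # [('E', 4), ('F', 3)]
```

If the given nodes can also be adjacent to each other, you would filter them out of the neighbour counts so only middle nodes are ranked.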

Clusters merge threshold

I'm working with mean shift; this procedure calculates where every point in the data set converges. I can also calculate the Euclidean distance between the coordinates where two distinct points converged, but I have to supply a threshold: if (distance < threshold), then these points belong to the same cluster and I can merge them.
How can I find the correct value to use as the threshold?
(I can use any value and the result depends on it, but I need the optimal value.)
I've implemented mean-shift clustering several times and have run into this same issue. Depending on how many iterations you're willing to shift each point for, or what your termination criterion is, there is usually some post-processing step where you have to group the shifted points into clusters. Points that in theory shift to the same mode need not, in practice, end up directly on top of each other.
I think the best and most general way to do this is to use a threshold based on the kernel bandwidth, as suggested in the comments. In the past my code to do this post-processing has usually looked something like this:
threshold = 0.5 * kernel_bandwidth
clusters = []
for p in shifted_points:
    cluster = findExistingClusterWithinThresholdOfPoint(p, clusters, threshold)
    if cluster is None:
        # create a new cluster with p as its first point
        clusters.append([p])
    else:
        # add p to the existing cluster
        cluster.append(p)
For the findExistingClusterWithinThresholdOfPoint function I usually use the minimum distance of p to each currently defined cluster.
This seems to work pretty well. Hope this helps.
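To see this post-processing in context, here is a minimal, self-contained sketch: a Gaussian-kernel mean shift (an assumption; the answer does not say which kernel it uses) followed by the threshold = 0.5 * bandwidth grouping with the minimum-distance rule described above. The data, bandwidth, and fixed iteration count are hypothetical; a real implementation would stop on a convergence tolerance instead.

```python
import numpy as np

def mean_shift(points, bandwidth, n_iter=100):
    """Gaussian-kernel mean shift: move every point toward its local mode."""
    points = np.asarray(points, dtype=float)
    shifted = points.copy()
    for _ in range(n_iter):
        # squared distances between each shifted point and every original point
        d2 = ((shifted[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))
        # kernel-weighted mean of the original points
        shifted = (w @ points) / w.sum(axis=1, keepdims=True)
    return shifted

def group_modes(shifted, threshold):
    """Assign each shifted point to the nearest cluster (minimum distance to
    any member) if that distance is below threshold, else start a new one."""
    clusters = []  # each cluster is a list of point indices
    for i, p in enumerate(shifted):
        best, best_d = None, threshold
        for c in clusters:
            d = min(np.linalg.norm(p - shifted[j]) for j in c)
            if d < best_d:
                best, best_d = c, d
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters

rng = np.random.default_rng(0)
# hypothetical data: two well-separated blobs of 20 points each
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
bandwidth = 1.0
modes = mean_shift(X, bandwidth)
clusters = group_modes(modes, threshold=0.5 * bandwidth)
print(len(clusters))  # 2
```

Tying the threshold to the bandwidth makes sense because the bandwidth sets the scale on which two modes can be told apart; two converged points much closer than that are very likely the same mode.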