get networkx Digraph control flow graph traversal - networkx

I have started using the networkx framework (Python) for graphs, and it seems to lack a basic topological traversal function (as used for control flow graphs) for directed graphs (DiGraph).
For example, I have the following graph:
and I would like to traverse it in such a way that the node with id 195 is not visited until all its predecessors (nodes 43 and 45) have been visited. BFS and DFS don't guarantee this.
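The ordering described here is usually called a topological ordering. Below is a minimal sketch, assuming the graph is acyclic. Only the node ids 43, 45 and 195 come from the question; every other node and all edges are made up, since the original picture is not available. networkx exposes this as nx.topological_sort.

```python
import networkx as nx

# Hypothetical edges: only the ids 43, 45 and 195 come from the question.
G = nx.DiGraph([(1, 43), (1, 45), (43, 195), (45, 195), (195, 200)])

# topological_sort yields a node only after all of its predecessors.
# It raises nx.NetworkXUnfeasible if the graph contains a cycle,
# which happens for control flow graphs with loops.
order = list(nx.topological_sort(G))
print(order)
```

Several valid orderings may exist (43 and 45 can come in either order), so only the relative position of a node and its predecessors is guaranteed.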

Related

simulation of creation of social network graph given present snapshot

I am using the http://networkrepository.com/socfb-B-anon.php dataset for my analysis. I would like to analyse how this present graph was formed from scratch. Is there any existing social network simulation framework for this kind of problem?
I am also open to using any other dataset, if available. I would need a timestamp for every edge (the time at which its two nodes were connected).
The Barabási–Albert (BA) model describes a preferential attachment model for generating networks, or graphs. It iteratively builds a graph by adding new nodes and connecting them to previously added nodes. The new node is attached to each existing node with a probability proportional to that node's degree, that is, its degree divided by the sum of all degrees in the graph.
This algorithm has been shown to produce graphs that are scale-free, which means the distribution of degrees follows a power law, which is a typical property of social networks.
This can be seen as a 'simulation' of a growing social network, where users are more likely to 'befriend' or 'follow' popular existing users. Of course it's not a complete simulation because it assumes a new user is done befriending or following other users right after they created an account, but it might be a good starting point for your exploration.
A timestamp for each edge or node creation can be generated by maintaining a counter during the creation process and incrementing it as you add more edges or nodes to the graph.
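The two ideas above (preferential attachment plus a running counter) can be sketched as follows. This is an illustrative generator, not a reference implementation: the function name timestamped_ba_graph, the edge attribute name "timestamp", and the choice of starting from m unconnected nodes are all assumptions. networkx also ships nx.barabasi_albert_graph(n, m), but as far as I know it does not record when each edge was added.

```python
import random
import networkx as nx

def timestamped_ba_graph(n, m, seed=None):
    """Grow a BA-style graph and stamp every edge with a creation counter."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    G = nx.Graph()
    G.add_nodes_from(range(m))   # m initial, unconnected nodes
    targets = list(range(m))     # the first new node links to all of them
    endpoints = []               # each node appears once per incident edge
    clock = 0                    # logical timestamp, one tick per edge
    for new in range(m, n):
        for t in targets:
            G.add_edge(new, t, timestamp=clock)
            clock += 1
        endpoints.extend(targets)
        endpoints.extend([new] * m)
        # Drawing uniformly from `endpoints` picks a node with probability
        # proportional to its degree; repeat until m distinct targets are found.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        targets = list(chosen)
    return G

G = timestamped_ba_graph(50, 2, seed=1)
# Replay the growth of the network in creation order.
history = sorted(G.edges(data="timestamp"), key=lambda e: e[2])
```

The counter is a logical clock rather than wall-clock time; to mimic real timestamps you would have to map it onto dates yourself.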
Hopefully this answer gives you enough terminology to further your research.

Finding "bubbles" in a graph

In a game, we have a universe described as a strongly-connected graph full of sectors and edges. Occasionally there are small pockets, players call them 'bubbles', where a small group of nodes all access the rest of the network through a single node. In the graph below, sectors 769, 195, 733, 918, and 451 are all only reachable via node 855. If you can guard 855 effectively, then those other sectors are safe. Other nodes on the chart have more edges (purple lines) and aren't 'bubbles' in the player nomenclature.
In a 1000- or 5000-node network, it's not easy to find these sorts of sub-structures. But I suspect this idea is described in graph theory somehow, and so can probably be searched for in networkx.
Could anyone suggest a graph-theory approach to systematically find structures like this? To be clear, the graph is a directed graph, but almost all edges end up being bi-directional in practice. Edges are unweighted.
Graph theory has no definition for your "bubbles", but it has a similar concept: bridges. A bridge is an edge whose removal increases the number of connected components. As you can see, it is exactly what you need. networkx has an algorithm to find bridges. Curiously enough, it is called bridges :)
Example:
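import networkx as nx

# Triangle 1-2-3 joined to the cycle 4-5-6-7 by the single edge (1, 4)
G = nx.Graph()
G.add_edges_from([
    (1, 2), (1, 3), (2, 3),
    (1, 4), (4, 5), (5, 6),
    (6, 7), (7, 4)
])
nx.draw(G)  # drawing requires matplotlib
# Removing (1, 4) disconnects the graph, so it is the only bridge
list(nx.bridges(G))
[(1, 4)]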
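To go from a bridge to the actual group of nodes behind it, one option is to remove the bridge and keep the smaller of the two sides. The sketch below assumes that "the smaller side" is what players would call the bubble. The helper name pockets_behind_bridges is made up, and a directed graph is converted with to_undirected() first because nx.bridges is only implemented for undirected graphs.

```python
import networkx as nx

def pockets_behind_bridges(G):
    """Map each bridge to the smaller group of nodes it cuts off."""
    # nx.bridges only works on undirected graphs.
    U = G.to_undirected() if G.is_directed() else G
    pockets = {}
    for u, v in nx.bridges(U):
        H = U.copy()
        H.remove_edge(u, v)
        side_u = nx.node_connected_component(H, u)
        side_v = nx.node_connected_component(H, v)
        # Assumption: the smaller side is the "bubble".
        pockets[(u, v)] = min(side_u, side_v, key=len)
    return pockets

# Same example graph as above.
G = nx.Graph([(1, 2), (1, 3), (2, 3), (1, 4), (4, 5), (5, 6), (6, 7), (7, 4)])
pockets = pockets_behind_bridges(G)
print(pockets)
```

Copying the graph once per bridge is simple but wasteful; for a few thousand nodes it should still be acceptable.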

Measuring the "remoteness" of a node in a graph

I mapped out all the edges of a graph in the ancient BBS game 'TradeWars 2002'. There are 1000 nodes. The graph is officially a directed graph, although most edges between nodes are undirected. The graph is strongly connected.
I modelled the universe in networkx. I'd like to use networkx methods to identify the "most remote" nodes in the network. I don't know how to articulate "most remote" in graph theory terminology though. But the idea I have is nodes that would be bumped into very rarely when someone is transiting between two other arbitrary nodes. And the idea that on the edge of the well-connected nodes, there might be a string of nodes that extends out along a single path that terminates.
A visualization of what I imagine is node 733. It is pretty unlikely that someone accidentally stumbles onto that one, compared to other better-connected nodes.
What could I use from networkx library to quantify some measure of 'remoteness'?
This is the entire universe:
"But the idea I have is nodes that would be bumped into very rarely when someone is transiting between two other arbitrary nodes."
As @Joel mentioned, there are many centrality measures available, and there are often strong correlations between them, such that many of them will probably give you more or less what you want.
That being said, I think the class of centrality measures that most closely reflect your intuition are based on random walks. Some of these are pretty costly to compute (although see this paper for some recent improvements on that front) but luckily there is a strong correspondence between the Eigenvector centrality and the frequency with which nodes are visited by a random walker.
The implementation in networkx is available via networkx.algorithms.centrality.eigenvector_centrality.
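A minimal sketch of how that function could be used to rank nodes by "remoteness". The lollipop test graph and the max_iter value are assumptions chosen for illustration, not part of the original universe.

```python
import networkx as nx

# Hypothetical test graph: a 5-node clique (nodes 0-4) with a 3-node tail (5, 6, 7).
G = nx.lollipop_graph(5, 3)

# Power-iteration implementation; max_iter is raised as a precaution
# in case the default does not converge on larger graphs.
centrality = nx.eigenvector_centrality(G, max_iter=1000)

# Lowest scores first: these are the candidates for "remote" nodes.
most_remote = sorted(centrality, key=centrality.get)[:3]
print(most_remote)
```

Here the three tail nodes come out as the least central, with the end of the tail (node 7) ranked most remote.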
networkx has a collection of algorithms for this kind of problem: centrality measures. For example, you can use the simplest function, closeness_centrality:
import networkx as nx
# Create a random graph
G = nx.gnp_random_graph(50, 0.1)
nx.closeness_centrality(G)
{0: 0.3888888888888889,
1: 0.45794392523364486,
2: 0.35507246376811596,
3: 0.4375,
4: 0.4083333333333333,
5: 0.3684210526315789,
...
# Draw the graph
labels = {n: n for n in G.nodes}
nx.draw(G, with_labels=True, labels=labels)
And the most remote (the least central) nodes can be listed by returning the nodes with the lowest closeness_centrality (note the node IDs and the nodes in blue circles in the picture above):
c = nx.closeness_centrality(G)
sorted(c.items(), key=lambda x: x[1])[:5]
[(48, 0.28823529411764703),
(7, 0.33793103448275863),
(11, 0.35251798561151076),
(2, 0.35507246376811596),
(46, 0.362962962962963)]

Given a OSM node id, how do I find the previous x points in all directions?

I have an OSM node id / latitude-longitude for a point on a road (say point Z). How do I find the previous x points that I need to travel through to reach Z, in all directions? I was thinking the Overpass API could help me, but it is only able to return points that have tags. I am not able to get it to return the node ids on the road/way.
Can you please suggest any API or tutorial that could help?
If I'm not wrong, what you are asking is: given an OSM node id with coordinates x and y, which points must be traversed in order to arrive there from a starting point?
If this is the question, then it is a graph-oriented question. You should create a graph and then use some algorithm to find all the routes between the starting point and the end point. You should use some graph-oriented software, something like neo4j with the spatial contrib (https://github.com/neo4j-contrib/spatial).
In the past I built a project where I read an OSM file, created a graph, and used the A* algorithm; you may want to give it a look: https://github.com/angeloimm/neo4jAstarTest
I suggest getting started by reading about OSM elements, especially nodes and ways. Afterwards, take a look at the OSM XML format. It might also help to open an OSM editor (e.g. iD) and take a look at the raw data.
Nodes don't have any order or "next node" themselves. Nodes can be part of one or multiple ways. Each way references a list of ordered nodes. So you have to look at all ways a node belongs to, then look at the way's node list to determine the previous and next nodes. If the node is at the start or end of a way then you have to look if there are one or more consecutive ways. Consecutive ways share the same node at their start/end.
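The lookup described above can be sketched as follows, assuming the ways have already been parsed into a plain dict that maps a way id to its ordered list of node ids. The function name, the dict layout and the ids are all hypothetical. The sketch does not follow consecutive ways; that step would repeat the same lookup from the start or end node of each way.

```python
def neighbours_along_ways(ways, node_id, x):
    """For every way containing node_id, return up to x nodes before and after it."""
    result = {}
    for way_id, nodes in ways.items():
        # A node can occur more than once in a way (e.g. a closed loop),
        # so every occurrence is reported.
        for i, n in enumerate(nodes):
            if n == node_id:
                before = nodes[max(0, i - x):i]
                after = nodes[i + 1:i + 1 + x]
                result.setdefault(way_id, []).append((before, after))
    return result

# Made-up data: two ways that cross at node 3.
ways = {100: [1, 2, 3, 4, 5], 200: [9, 3, 8]}
found = neighbours_along_ways(ways, 3, 2)
print(found)
```

Each entry lists the nodes in way order, so "previous" and "next" are relative to the direction in which the way was drawn.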

Graph projections with Gremlin and Titan

I would like to extract graph projections from a graph so that I can build smaller graphs from large ones for specific use cases (in most of the cases I can think of these graph projections will have a smaller size than source graph). Consider the following simple example in my original graph (all nodes have different types aka labels):
A -> B -> C -> D
Since the path from A to D exists, in my new graph I would create the nodes A and D and the edge between them. These paths could be easily discovered with a traversal (or even with the subgraph step I think):
g.V().as('a').out('aToB').out('bToC').out('cToD').as('d').select('a', 'd');
The thing is that these kinds of traversals, where one does not start from a specific set of nodes (for which an index would be used), are not very efficient, since they require a full graph scan.
16:46:27 WARN com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx - Query requires iterating over all vertices. For better performance, use indexes
What kind of actions can be done here so that graph projections can be accomplished in an efficient way?