Removing edges from NetworkX graph which are NOT in a set - networkx

I have an almost tree-like graph with one special node which serves as the root and a bunch of terminal nodes which are the leaves. The graph has some small loops which are artifacts from a previous function and I need to remove them.
To do this, I have performed a nx.dijkstra_path between the root node and every terminal node. I then convert the traversed nodes into tuples representing edges and store them in a set:
traversed_edges = set()
for terminal_node in terminal_nodes:
    dijkstra_res = nx.dijkstra_path(graph, root_node, terminal_node, weight="length")
    for n_idx in range(len(dijkstra_res)-1):
        traversed_edges.add((dijkstra_res[n_idx], dijkstra_res[n_idx+1]))
I have to use this instead of a depth-first search because I want to use a weight, which doesn't seem possible any other way.
At this point I want to remove all edges which have NOT been traversed, which would make the graph into a proper tree. I try to do this like so:
graph.remove_edges_from(graph.edges - traversed_edges)
This statement executes successfully, but it has deleted edges seemingly at random. It's worth noting that this graph currently has many "pseudonodes", i.e. nodes that just connect two edges. Really, every long line is composed of lots of little short lines end-to-end.
Before:
After:
But what I find most perplexing is that if I remove the traversed edges, then I am indeed left with the edges that were definitely not traversed, as seen below, which is the result of executing:
graph.remove_edges_from(traversed_edges)
Why is it that removing the traversed edges successfully leaves me with all the edges that I want to delete, but removing the inverse just garbles the graph?
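One likely cause, offered as a sketch rather than a confirmed diagnosis of this particular graph: in an undirected nx.Graph, graph.edges reports each edge once as a tuple (u, v), while the Dijkstra path may walk the same edge as (v, u). The set difference graph.edges - traversed_edges compares tuples literally, so a traversed edge stored in the opposite orientation still ends up in the difference and gets removed. remove_edges_from(traversed_edges) does not have this problem because removal accepts either orientation. The small graph, its weights, and the names traversed_tuples, naive_difference, and untraversed below are made up for illustration; storing each edge as a frozenset makes the comparison orientation-independent.

```python
import networkx as nx

# Toy undirected graph with one small loop (1-2-3-1); weights are illustrative.
graph = nx.Graph()
graph.add_weighted_edges_from(
    [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (1, 3, 5.0)], weight="length"
)
root_node, terminal_nodes = 3, [0]

traversed_tuples = set()  # ordered tuples, as in the question
traversed_edges = set()   # orientation-independent frozensets
for terminal_node in terminal_nodes:
    path = nx.dijkstra_path(graph, root_node, terminal_node, weight="length")
    for u, v in zip(path, path[1:]):
        traversed_tuples.add((u, v))
        traversed_edges.add(frozenset((u, v)))

# The path is walked as (3, 2), (2, 1), (1, 0), but graph.edges reports
# (0, 1), (1, 2), (1, 3), (2, 3), so the tuple-based difference matches nothing.
naive_difference = graph.edges - traversed_tuples

# Compare edges without regard to orientation, then remove the rest.
untraversed = [e for e in graph.edges if frozenset(e) not in traversed_edges]
graph.remove_edges_from(untraversed)
```

In this toy case the naive difference contains all four edges, whereas the frozenset comparison removes only the loop-closing edge (1, 3) and leaves a tree.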

Related

Creation of two networks with the same node coordinates

I create a network and add nodes and edges. I view it (it creates a dot and pdf file automatically). Later, I want to create a second network with the same nodes but different edges. I want to place the nodes at the same coordinates, so that I can easily compare both graphs. I tried to get the coordinates of the first graph and to set the coordinates of the nodes, but I couldn't find proper functions to do that. I also checked the networkx package. I also tried to make a copy of the first network and delete the edges, with no success. Can someone please show me how to create a second network with the same node coordinates?
This is the simple network creation code
import graphviz as G
network1 = G.Digraph(
    graph_attr={...},
    node_attr={...},
    edge_attr={...} )
network1.node("xxx")
network1.node("yyy")
network1.node("zzz")
network1.edge("xxx", "yyy")
network1.edge("yyy", "zzz")
network1.view(file_name)
First, calculate the node positions for the first graph using the layout of your choice (say, the spring layout):
node_positions = nx.layout.spring_layout(G1)
Now, you can draw this graph and any other graph with the same nodes in the same positions:
nx.draw(G1, with_labels=True, pos=node_positions)
nx.draw(G2, with_labels=True, pos=node_positions)
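A minimal end-to-end sketch of the approach above, assuming the two networks are rebuilt as networkx graphs (the question itself uses the graphviz package). The edge ("xxx", "zzz") in G2 and the fixed seed are assumptions added for illustration; the seed only makes the layout reproducible between runs.

```python
import networkx as nx

# First network: same nodes and edges as in the question.
G1 = nx.DiGraph([("xxx", "yyy"), ("yyy", "zzz")])

# Second network: same nodes, different (made-up) edges.
G2 = nx.DiGraph()
G2.add_nodes_from(G1.nodes)
G2.add_edge("xxx", "zzz")

# Compute the positions once; the result is a dict mapping node -> (x, y).
node_positions = nx.spring_layout(G1, seed=42)


def draw_both(path_prefix="network"):
    """Draw both graphs with identical node positions (requires matplotlib)."""
    import matplotlib.pyplot as plt

    for name, graph in (("1", G1), ("2", G2)):
        plt.figure()
        nx.draw(graph, with_labels=True, pos=node_positions)
        plt.savefig(f"{path_prefix}{name}.png")
        plt.close()
```

Because node_positions is keyed by node name, any graph containing the same nodes can reuse it, whatever its edges are.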
Graphviz's layers feature might also be interesting:
https://graphviz.org/faq/#FaqOverlays
Here is a working example of using layers - ignore the last two lines that create a video.
https://forum.graphviz.org/t/stupid-dot-tricks-2-making-a-video/109
And here is some more background:
https://forum.graphviz.org/t/getting-layers-to-work-with-svg/107

Breadth First Search controlled by edge number in networkx?

I know that networkx can provide breadth-first search (BFS) results limited by depth. I am wondering whether there is a workaround that lets me limit the result by the number of edges instead. For example, I would like to get 10 edges around a node i by BFS, but I don't know what depth that corresponds to.
The bfs controlled by the depth is something like
bfs = nx.bfs_edges(G, source=i, depth_limit=5)
I would like to use a function along the lines of
bfs = nx.bfs_edges(G, source=i, number=k)
Since I want to find all the edges around a node, nx.edge_bfs looks like a better option. However, this function currently returns all the edges. Could it be modified somehow? I would like the source node to be as central as possible, i.e., the yielded edges should be spread evenly around the source node.
After looking at the source code of nx.edge_bfs, it looks like the function returns an iterator that yields the edges surrounding the source node one by one, expanding outward from the source roughly like a circle.
It would be good if someone could confirm this. Here is the source link
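Since nx.edge_bfs returns a lazy iterator, one possible workaround is to stop consuming it after k edges with itertools.islice instead of modifying the function. The sketch below assumes a 5x5 grid graph and k = 10 purely for illustration; the helper name first_k_bfs_edges is hypothetical, not part of networkx.

```python
from itertools import islice

import networkx as nx


def first_k_bfs_edges(G, source, k):
    """Return the first k edges yielded by a breadth-first edge traversal."""
    return list(islice(nx.edge_bfs(G, source), k))


# Illustrative graph: a 5x5 grid with the source at its center.
G = nx.grid_2d_graph(5, 5)
source = (2, 2)
edges = first_k_bfs_edges(G, source, 10)
```

The edges incident to the source come out first, followed by edges around its neighbors, so truncating the iterator keeps the edges closest to the source. How evenly they surround the source depends on the order in which neighbors are visited.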

GKGraph, GKGraphNode, GKGridGraphNode: what is the relationship between them?

I've read the documentation but am still confused by them. Could anyone give me a clear explanation, e.g. with an image comparison? Thanks.
The Wikipedia article on Pathfinding might help, as might the related topics on graphs and graph search algorithms linked from there. Beyond that, here's an attempt at a quick explainer.
Nodes are places that someone can be, and their connections to other nodes define how someone can travel between places. Together, a collection of (connected) nodes forms a graph.
GKGraphNode is the most general form of node — these nodes don't know anything about where they are in space, just about their connections to other nodes. (That's enough for basic pathfinding, though... if you have a graph where A is connected to B and B is connected to C, the path from A to C goes through B regardless of where those nodes are located, like below.)
GKGraph is a collection of nodes, and provides functions that work on the graph as a whole, like the important one for finding paths.
GKGridGraphNode and GKGraphNode2D are specialized versions of GKGraphNode that add knowledge of the node's position in space — either integer grid space (like a chessboard) or open 2D space. Once you've added that kind of information, a GKGraph containing these kinds of nodes can take distance into account when pathfinding.
For example, look at this image:
If we're just using GKGraphNode, all we're talking about is which nodes are connected to which. So if we ask for the shortest path from A to D, we can get either ACD or ABD, because it's an equal number of connections either way. But if we use GKGridGraphNode or GKGraphNode2D, we're looking at the lengths of the lines between nodes, in which case ACD is the shortest path.
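The same distinction can be illustrated outside GameplayKit. The sketch below uses Python with networkx as an analogy only (it is not the GameplayKit API), and the coordinates are hypothetical because the original image is not available; they are chosen so that B lies far off the straight line from A to D.

```python
import math

import networkx as nx

# Hypothetical coordinates standing in for the missing figure.
positions = {"A": (0, 0), "B": (0, 3), "C": (1, 0), "D": (2, 0)}

G = nx.Graph([("A", "B"), ("B", "D"), ("A", "C"), ("C", "D")])
for u, v in G.edges:
    # Attach the straight-line length of each connection.
    G.edges[u, v]["distance"] = math.dist(positions[u], positions[v])

# Connections only: both routes use two hops, so both count as shortest.
hop_paths = sorted(nx.all_shortest_paths(G, "A", "D"))

# With positions taken into account, only the geometrically shorter route wins.
distance_path = nx.shortest_path(G, "A", "D", weight="distance")
```

Here hop_paths contains both A-B-D and A-C-D, while distance_path is A-C-D, mirroring the difference between plain nodes and nodes that know their position.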
Once you start locating your nodes in (some sort of coordinate) space, it helps to be able to operate on the graph as a whole in that space. That's where GKGridGraph and GKObstacleGraph come in.
GKGridGraph works with GKGridGraphNodes and lets you do things like create a graph to fill a set of dimensions (say, a 10x10 grid, with diagonal movement allowed) instead of making you create and connect a bunch of nodes yourself.
GKObstacleGraph adds more to free-2D-space graphs by letting you mark areas as impassable obstacles and automatically managing the nodes and connections to route around obstacles.
Hopefully this helps a bit. For more, besides the reference docs and guide, Apple also has a WWDC video that shows how this stuff works.

How can I find all nodes around a point that are members of a way with a certain tag?

I would like to find all highway way member nodes in a certain radius. I cannot see how to do this without using intersection, however, that is not in the API. For example I have this:
[out:json];
way(around:25, 50.61193,-4.68711)["highway"];>->.a;
(node(around:25, 50.61193,-4.68711) - .a);
out;
Result set .a contains the nodes I want but also nodes outside the radius - potentially a large number if the ways are long. I can find all the nodes inside the radius I don't need, as returned by the complete query above. Now I can always perform a second around query and do the intersection of the two result sets outside of Overpass. Or I can do another difference:
[out:json];
way(around:25, 50.61193,-4.68711)["highway"];>->.a;
(node(around:25, 50.61193,-4.68711) - .a)->.b;
(node(around:25, 50.61193,-4.68711) - .b);
out;
This gives the result I want but can it be simplified? I'm certain I'm missing something here.
Indeed, your query can be simplified to an extent that we don't need any difference operator at all. I would recommend the following approach:
We first query for all nodes around a certain lat/lon position and a given radius.
Based on this set of nodes we determine all ways, which contain some of the previously found nodes (-> Hint: that's why we don't need any kind of intersection or difference!).
Using our set of highway ways we now look again for all nodes of those ways within a certain radius of our lat/lon position.
In Overpass QL this reads like:
[out:json];
node(around:25, 50.61193,-4.68711);
way(bn)[highway];
node(w)(around:25, 50.61193,-4.68711);
out;
Try it on Overpass Turbo
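If the position and radius change often, the query above can be generated from parameters. This is a minimal sketch that only builds the query text; the function name highway_nodes_query is hypothetical, and sending the query to an Overpass server is left out.

```python
def highway_nodes_query(lat, lon, radius_m):
    """Build the three-step Overpass QL query for highway nodes near a point."""
    around = f"(around:{radius_m}, {lat},{lon})"
    return "\n".join(
        [
            "[out:json];",
            f"node{around};",      # 1. all nodes within the radius
            "way(bn)[highway];",   # 2. highway ways containing those nodes
            f"node(w){around};",   # 3. nodes of those ways within the radius
            "out;",
        ]
    )


query = highway_nodes_query(50.61193, -4.68711, 25)
```

With the values from the question, the generated text matches the query shown above line for line.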

Functions for pruning a NetworkX graph?

I am using NetworkX to generate graphs of some noisy data. I'd like to "clean up" the graph by removing branches that are spurious, and hope to avoid re-inventing the wheel.
For example, the linked picture shows a sample set of graphs, as colored nodes connected by gray lines. I'd like to prune the nodes/edges indicated by the white boxes: http://www.broadinstitute.org/~mbray/example_tree.png
Essentially, the nodes/edges to be removed are branches typically only a few nodes (< 3) in length. By removing them, I hope to have a tree with a minimum of branching but the branches that do remain are "suitably" long.
Before I start crafting code to examine subtrees for removal, are there NetworkX functions that can be used for this purpose?
You can use the betweenness_centrality score of the nodes. If a node with a low centrality score is connected to a node with a remarkably higher centrality score, and has 3 edges, then you can remove the low-centrality node. (The rest of the short branch, fewer than 3 nodes, is then no longer connected to the main graph.)
You'll need to experiment with the phrase "remarkably higher".
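As a simpler alternative to the centrality heuristic (a different technique, not the one described above), short branches can be found by walking inward from each leaf until a junction is reached. The sketch below assumes a simple undirected graph; the function name prune_short_branches, the max_len parameter, and the toy graph are all hypothetical.

```python
import networkx as nx


def prune_short_branches(G, max_len=2):
    """Return a copy of G without leaf branches of at most max_len nodes.

    A branch is the chain of nodes from a leaf (degree 1) up to, but not
    including, the first junction (degree >= 3).
    """
    H = G.copy()
    for leaf in [n for n, d in G.degree() if d == 1]:
        branch, prev, cur = [leaf], None, leaf
        while True:
            # Step to the neighbor we did not just come from.
            nxt = next(n for n in G.neighbors(cur) if n != prev)
            if G.degree(nxt) != 2:
                break  # reached a junction or another leaf
            branch.append(nxt)
            prev, cur = cur, nxt
        # Only prune chains that hang off a junction and are short enough.
        if G.degree(nxt) >= 3 and len(branch) <= max_len:
            H.remove_nodes_from(branch)
    return H


# Toy example: a 10-node trunk with a one-node spur attached to node 5.
G = nx.path_graph(10)
G.add_edge(5, "spur")
pruned = prune_short_branches(G, max_len=2)
```

In the toy example only the spur is removed, because both ends of the trunk are more than two nodes away from the junction. The threshold would need tuning for real data, and several passes may be needed if pruning exposes new short branches.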