I would like to add additional forces to networkx spring_layout.
I have a directed graph and I would like nodes to move to different sides according to the edges that they have. Nodes that have more outgoing edges should drift left, and nodes that have more incoming edges should drift right. An alternative would be that these groups of nodes drift towards each other: nodes with outgoing edges would get closer to each other, and nodes with incoming edges would also get closer to each other.
I managed to look into the source code of networkx's spring_layout (http://networkx.lanl.gov/archive/networkx-0.37/networkx.drawing.layout-pysrc.html#spring_layout), but everything there is beyond my comprehension.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([(1, 5), (2, 5), (3, 5), (5, 6), (5, 7)])
The layout should show nodes 1, 2, 3 closer to each other, and likewise nodes 6 and 7.
I imagine that I could solve this by adding invisible edges using a MultiDiGraph: count the incoming and outgoing edges of each node and add invisible edges that connect the two groups. However, I am very sure that there are better ways of solving the problem.
Adding weights into the mix would be a good way to group things (with those invisible edges), but the layouts have no way of knowing left from right. To get the exact layout you want, you could specify each point's x,y coordinates.
import networkx as nx
import matplotlib.pyplot as plt

# Pin every node to an explicit (x, y) position via a 'pos' attribute.
G = nx.Graph()
G.add_node(1, pos=(1, 1))
G.add_node(2, pos=(2, 3))
G.add_node(3, pos=(3, 4))
G.add_node(4, pos=(4, 5))
G.add_node(5, pos=(5, 6))
G.add_node(6, pos=(6, 7))
G.add_node(7, pos=(7, 9))
G.add_edges_from([(1, 5), (2, 5), (3, 5), (5, 6), (5, 7)])

pos = nx.get_node_attributes(G, 'pos')
nx.draw(G, pos)
plt.show()
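If you would rather derive those coordinates from the graph than hard-code them, here is a minimal sketch of the same idea for your DiGraph example. It simply biases the x coordinate of a standard spring layout by each node's in-degree minus out-degree; the directional_layout helper and the bias factor are made up and would need tuning:

import networkx as nx
import matplotlib.pyplot as plt

def directional_layout(g, bias=0.5):
    # Start from an ordinary spring layout, then shift each node's x
    # coordinate by its net direction: nodes with more outgoing edges
    # drift left, nodes with more incoming edges drift right.
    pos = nx.spring_layout(g)
    for n in g.nodes():
        net = g.in_degree(n) - g.out_degree(n)  # illustrative bias term
        x, y = pos[n]
        pos[n] = (x + bias * net, y)
    return pos

g = nx.DiGraph()
g.add_edges_from([(1, 5), (2, 5), (3, 5), (5, 6), (5, 7)])
nx.draw(g, pos=directional_layout(g), with_labels=True)
plt.show()

With this example, nodes 1, 2, 3 are pushed towards the left and nodes 6, 7 towards the right.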
I create a network, add nodes and edges, and view it (it creates a dot and a pdf file automatically). Later, I want to create a second network with the same nodes but different edges. I want to place the nodes at the same coordinates so that I can compare both graphs easily. I tried to get the coordinates of the first graph and to set the coordinates of the nodes, but I couldn't find proper functions to do that. I also checked the networkx package, and tried to get a copy of the first network and delete its edges, with no success. Can someone please show me how to create a second network with the same node coordinates?
This is the simple network creation code
import graphviz as G

network1 = G.Digraph(
    graph_attr={...},
    node_attr={...},
    edge_attr={...})
network1.node("xxx")
network1.node("yyy")
network1.node("zzz")
network1.edge("xxx", "yyy")
network1.edge("yyy", "zzz")
network1.view(file_name)
First, calculate the node positions for the first graph using the layout of your choice (say, the spring layout):
node_positions = nx.layout.spring_layout(G1)
Now, you can draw this graph and any other graph with the same nodes in the same positions:
nx.draw(G1, with_labels=True, pos=node_positions)
nx.draw(G2, with_labels=True, pos=node_positions)
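For completeness, a minimal end-to-end sketch of this pattern, assuming two networkx graphs built over the same node set (the node names are just the placeholders from the question):

import networkx as nx
import matplotlib.pyplot as plt

# Two graphs over the same node set; nodes and edges are placeholders.
G1 = nx.DiGraph([("xxx", "yyy"), ("yyy", "zzz")])
G2 = nx.DiGraph([("xxx", "zzz")])
G2.add_nodes_from(G1.nodes())  # make sure both graphs share every node

# Compute the positions once, from the first graph...
node_positions = nx.spring_layout(G1)

# ...and reuse them for both drawings so the layouts line up.
fig, (ax1, ax2) = plt.subplots(1, 2)
nx.draw(G1, with_labels=True, pos=node_positions, ax=ax1)
nx.draw(G2, with_labels=True, pos=node_positions, ax=ax2)
plt.show()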
Graphviz's layers feature might also be interesting:
https://graphviz.org/faq/#FaqOverlays
Here is a working example of using layers - ignore the last two lines that create a video.
https://forum.graphviz.org/t/stupid-dot-tricks-2-making-a-video/109
And here is some more background:
https://forum.graphviz.org/t/getting-layers-to-work-with-svg/107
I'm plotting networkx weighted graphs using the draw_networkx_edge_labels function. My problem is that, since edges sometimes cross each other, it is not always clear from the plot which weight belongs to which edge. For instance, in the following plot it is not immediately clear whether 2 is the weight of (1,2) or (3,7).
I'm currently using the neato layout, which does not take edge labels into account. In particular, this is how I'm drawing a weighted graph g:
import networkx as nx

# neato positions nodes and edges only; it knows nothing about the labels.
layout = nx.nx_pydot.graphviz_layout(g, prog='neato')
nx.draw(g, pos=layout)
edge_labels = nx.get_edge_attributes(g, 'weight')
nx.draw_networkx_edge_labels(g, pos=layout, edge_labels=edge_labels)
I know I can manually control the position of the label along an edge using the label_pos parameter, but my question is whether there exists a way to automatically plot the graph such that edge labels do not usually collide (either using a layout that takes labels into account or a method that "neatly" selects label positions along edges).
I'm not expecting something that always works, but since my graphs are relatively sparsely connected, I hope there's a method that at least has a tendency to work well.
I have been meaning to implement this in netgraph, my fork of the networkx drawing utilities, for a while now. Unfortunately, I have a job interview on Thursday, so I won't have time to write this anytime soon. The basic idea, however, is pretty simple, and is also already implemented in some R packages such as ggrepel and also ggnetwork.
The basic idea is that you use a force directed layout to position your labels, given a predetermined and fixed layout for your nodes and edges. So:
1. Compute a node layout using the layout of your choice.
2. Partition each edge into a chain of many, many nodes, and compute the positions of the "edge nodes" using the already known positions of the source and target nodes of the edge. This partitioning is to give each edge a "mass" in the following force-directed layout.
3. For each edge, add a "label" node and connect it to the most central "edge node".
4. Compute a force-directed layout, keeping all nodes but the label nodes fixed (e.g. using spring_layout in networkx).
You should now have sensible edge label coordinates that do not overlap any of the edges. Use plt.annotate to plot a connection between the edge and the edge label.
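As a rough sketch of those steps in networkx (every name and constant here is illustrative, and the subdivision count and spring parameters would need tuning for a real graph):

import networkx as nx
import numpy as np

def edge_label_positions(g, node_pos, n_subdivisions=10):
    # Build an auxiliary graph: the original nodes, plus a chain of fixed
    # "edge nodes" along every edge, plus one free "label node" per edge.
    aux = nx.Graph()
    fixed, init, label_nodes = [], {}, {}

    for n, p in node_pos.items():
        aux.add_node(n)
        init[n] = np.asarray(p, dtype=float)
        fixed.append(n)

    for u, v in g.edges():
        chain = [u]
        for i in range(1, n_subdivisions):
            t = i / n_subdivisions
            name = ('edge-node', u, v, i)
            init[name] = (1 - t) * init[u] + t * init[v]
            fixed.append(name)
            chain.append(name)
        chain.append(v)
        aux.add_edges_from(zip(chain[:-1], chain[1:]))

        # Free label node, attached to the most central edge node.
        label = ('label', u, v)
        mid = chain[len(chain) // 2]
        aux.add_edge(label, mid)
        init[label] = init[mid] + 1e-3 * np.random.randn(2)
        label_nodes[(u, v)] = label

    # Spring layout that only moves the label nodes.
    pos = nx.spring_layout(aux, pos=init, fixed=fixed)
    return {edge: pos[label] for edge, label in label_nodes.items()}

The returned dictionary gives one coordinate per edge, which you can pass to plt.annotate to draw each label with a small connector back to its edge.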
The problem I have is that there are rectangles within rectangles. Think of a map, with the key point being: rectangles with similar density often share similar dimensions and a similar x position with other rectangles; the distance between these rectangles is usually small, but can sometimes be big. If the x position or the dimensions are clearly way off, the rectangles are not considered similar.
Rectangles do not intersect; smaller rectangles are completely inside a larger rectangle.
Rectangles often have a similar x position and similar dimensions (similar height and width), and have smaller rectangles inside them. Such a rectangle would be considered a cluster of its own.
Sometimes the distance of one of these clusters from another cluster may be quite big (think of islands). Often these clusters share the same or similar dimensions and the same or similar density of sub-rectangles; if so, they should be considered part of the same cluster despite the distance between the two clusters.
The denser a rectangle is (the more smaller rectangles it contains), the more likely there is an equally or similarly dense rectangle with the same or similar dimensions nearby.
I've attached a diagram to describe the situation more clearly:
Red border means those groups are outliers, not part of any cluster, and are ignored.
Blue border has many clusters (black borders containing black solid rectangles). They form a group of clusters that are similar due to the criteria mentioned above (similar width, similar x position, similar density). Even the clusters towards the bottom right corner are still considered part of this group because of those criteria.
Turquoise border has many clusters (black borders containing black solid rectangles). However, these clusters differ in dimension, x position, and density from the ones in the blue border. They are considered a group of their own.
So far I have found density-based clustering such as DBSCAN, which seems perfect since it takes noise (outliers) into consideration and you do not need to know ahead of time how many clusters there will be.
However, you need to define the minimum number of points needed to form a cluster and a threshold distance. What happens if you don't know these two, and they can vary based on the problem described above?
Another seemingly plausible solution would be hierarchical (agglomerative) clustering (r-tree), but I'm concerned that I would still need to know the cutoff depth in the tree to decide what counts as a cluster.
You certainly will need to take all your constraints into account.
In general, your task looks more like constraint satisfaction to me than clustering.
Maybe some constraint clustering approaches are useful to you, but I'm not sure if they allow your kind of constraints. Usually, they only support must-link and must-not-link constraints.
But of course you should try DBSCAN (in particular also: generalized DBSCAN, since the generalization might allow you to add the constraints you have!) and R-trees (which aren't actually a clustering algorithm, but a data index).
Note that R-trees will put the "outliers" into some leaf, to ensure minimum fill.
As is, I cannot give you more detailed recommendations, because even from the above sketch, your constraints are not well defined IMHO. Try putting them into pseudocode. You probably only have a small number of rectangles (say, 100), so you can afford to run really expensive algorithms, such as linkage clustering with a customized linkage criterion. Putting your criteria into code may already be 99% of the effort!
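As a concrete illustration of that last suggestion, here is a rough sketch of linkage clustering over hand-rolled rectangle features using scipy. The feature set (x centre, width, height, density), the weights, and the distance threshold are all assumptions you would replace with your actual criteria:

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def rectangle_features(rects):
    # rects: iterable of (x, y, w, h, n_inner) tuples -- a made-up format.
    feats = []
    for x, y, w, h, n_inner in rects:
        feats.append([x + w / 2.0, w, h, n_inner / (w * h)])
    return np.asarray(feats)

def cluster_rectangles(rects, weights=(1.0, 2.0, 1.0, 3.0), threshold=1.5):
    feats = rectangle_features(rects)
    # Standardize each feature, then weight it so that "similar x,
    # similar width, similar density" count for more than raw position.
    feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-9)
    feats *= np.asarray(weights)
    # Average-linkage agglomerative clustering; the cut threshold plays
    # the role of the "cutoff point" mentioned in the question.
    z = linkage(pdist(feats), method='average')
    return fcluster(z, t=threshold, criterion='distance')

With only ~100 rectangles this is cheap enough to rerun with many different weightings until the groups match your intuition.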
I am using NetworkX to generate graphs of some noisy data. I'd like to "clean up" the graph by removing branches that are spurious, and hope to avoid re-inventing the wheel.
For example, the linked picture shows a sample set of graphs, as colored nodes connected by gray lines. I'd like to prune the nodes/edges indicated by the white boxes: http://www.broadinstitute.org/~mbray/example_tree.png
Essentially, the nodes/edges to be removed are branches that are typically only a few nodes (< 3) in length. By removing them, I hope to end up with a tree that has a minimum of branching, where the branches that do remain are "suitably" long.
Before I start crafting code to examine subtrees for removal, are there NetworkX functions that can be used for this purpose?
You can use the betweenness_centrality score of the nodes. If a node with a low centrality score is connected to a node of remarkably higher centrality score and has fewer than 3 edges, then you can remove the low-centrality node (the rest of the nodes in that short branch are no longer connected to the main graph).
You'll need to experiment with the phrase "remarkably higher".
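If it helps as a starting point, here is a simpler, purely degree-based pruning sketch (not the betweenness heuristic above): walk inward from each leaf until the path branches, and delete the branch if it is shorter than a cutoff. The function name and the cutoff are illustrative:

import networkx as nx

def prune_short_branches(g, max_branch_len=3):
    g = g.copy()
    changed = True
    while changed:
        changed = False
        for leaf in [n for n in g if g.degree(n) == 1]:
            if leaf not in g:        # already removed with another branch
                continue
            branch, current = [leaf], leaf
            # Walk inward while the path does not branch.
            while True:
                nbrs = [n for n in g.neighbors(current) if n not in branch]
                if len(nbrs) != 1 or g.degree(nbrs[0]) > 2:
                    break
                current = nbrs[0]
                branch.append(current)
            if len(branch) < max_branch_len:
                g.remove_nodes_from(branch)
                changed = True
    return g

The outer loop repeats because removing one branch can expose a new short branch underneath it.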
I have an application in which users interact with each-other. I want to visualize these interactions so that I can determine whether clusters of users exist (within which interactions are more frequent).
I've assigned a 2D point to each user (where each coordinate is between 0 and 1). My idea is that two users' points move closer together when they interact, an "attractive force", and I just repeatedly go through my interaction logs over and over again.
Of course, I need a "repulsive force" that will push users apart too, otherwise they will all just collapse into a single point.
First I tried monitoring the lowest and highest of each of the XY coordinates and normalizing the positions, but this didn't work: a few users with a small number of interactions stayed at the edges, and the rest all collapsed into the middle.
Does anyone know what equations I should use to move the points, both for the "attractive" force between users when they interact, and a "repulsive" force to stop them all collapsing into a single point?
Edit: In response to a question, I should point out that I'm dealing with about 1 million users, and about 10 million interactions between users. If anyone can recommend a tool that could do this for me, I'm all ears :-)
In the past, when I've tried this kind of thing, I've used a spring model to pull linked nodes together, something like: dx = -k*(x-l). Here dx is the change in position, x is the current separation between the two linked nodes, l is the desired separation, and k is the spring coefficient that you tweak until you get a nice balance between spring strength and stability; it'll be less than 0.1. Having l > 0 ensures that everything doesn't end up in the middle.
In addition to that, a general "repulsive" force between all nodes will spread them out, something like: dx = k / x^2. This will be larger the closer two nodes are; tweak k to get a reasonable effect.
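Put together, one iteration of those two forces might look roughly like this. It is only a sketch: the constants are placeholders to tweak, and the all-pairs repulsion loop is O(n^2), so for a million users you would need a neighbour approximation or an off-the-shelf layout tool:

import numpy as np

def spring_step(pos, edges, k_spring=0.05, k_repel=0.001, rest_len=0.1):
    # pos   : (n, 2) array of user positions
    # edges : iterable of (i, j) index pairs for interacting users
    disp = np.zeros_like(pos)

    # Attractive spring force along each interaction: -k * (d - l).
    for i, j in edges:
        delta = pos[j] - pos[i]
        dist = np.linalg.norm(delta) + 1e-9
        force = k_spring * (dist - rest_len) * delta / dist
        disp[i] += force
        disp[j] -= force

    # Inverse-square repulsion between all pairs: k / d^2.
    for i in range(len(pos)):
        delta = pos - pos[i]
        dist = np.linalg.norm(delta, axis=1) + 1e-9
        push = (k_repel / dist**2)[:, None] * (delta / dist[:, None])
        disp[i] -= push.sum(axis=0)   # the self term contributes zero

    return pos + disp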
I can recommend some possibilities: first, try log-scaling the interactions or running them through a sigmoidal function to squash the range. This will give you a smoother visual distribution of spacing.
Independent of this scaling issue: look at some of the rendering strategies in graphviz, particularly the programs "neato" and "fdp". From the man page:
neato draws undirected graphs using "spring" models (see Kamada and Kawai, Information Processing Letters 31:1, April 1989). Input files must be formatted in the dot attributed graph language. By default, the output of neato is the input graph with layout coordinates appended.

fdp draws undirected graphs using a "spring" model. It relies on a force-directed approach in the spirit of Fruchterman and Reingold (cf. Software-Practice & Experience 21(11), 1991, pp. 1129-1164).
Finally, consider one of the scaling strategies, an attractive force, and some sort of drag coefficient instead of a repulsive force. Actually moving things closer and then possibly farther later on may just get you cyclic behavior.
Consider a model in which everything will collapse eventually, but slowly. Then just run until some condition is met (a node crosses the center of the layout region or some such).
Drag or momentum can just be encoded as a basic resistance to motion and amounts to throttling the movements; it can be applied differentially (things can move more slowly based on how far they've gone, where they are in space, how many other nodes are close, etc.).
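A bare-bones sketch of that attraction-plus-drag update, with made-up constants:

import numpy as np

def damped_step(pos, vel, edges, k=0.02, drag=0.9):
    # Attraction only: nodes are pulled together along interaction edges,
    # and the drag factor bleeds off velocity so the collapse is gradual
    # rather than instantaneous.
    force = np.zeros_like(pos)
    for i, j in edges:
        delta = pos[j] - pos[i]
        force[i] += k * delta
        force[j] -= k * delta
    vel = drag * vel + force   # momentum plus resistance to motion
    return pos + vel, vel

You run this until your stopping condition is met (e.g. some node crosses the centre of the layout region, as suggested above).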
Hope this helps.
The spring model is the traditional way to do this: make an attractive force between each pair of interacting nodes, and a repulsive force between all pairs of nodes based on the inverse square of their distance. Then solve, minimizing the energy. You may need some fairly high-powered programming to get an efficient solution if you have more than a few nodes. Make sure the start positions are random, and run the program several times: a case like this almost always has several local energy minima, and you want to make sure you've found a good one.
Also, unless you have only a few nodes, I would do this in 3D. An extra dimension of freedom allows for better solutions, and you should be able to visualize clusters in 3D as well if not better than 2D.