Choosing a networkx layout that takes edge labels into account - networkx

I'm plotting networkx weighted graphs using the draw_networkx_edge_labels function. My problem is that, since edges sometimes cross each other, it is not always clear from the plot which weight belongs to which edge. For instance, in the following plot it is not immediately clear whether 2 is the weight of (1,2) or (3,7).
I'm currently using the neato layout, which does not take edge labels into account. In particular, this is how I'm drawing a weighted graph g:
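import networkx as nx

# g is the weighted graph to draw; graphviz_layout needs pydot and Graphviz installed
layout = nx.nx_pydot.graphviz_layout(g, prog='neato')
nx.draw(g, pos=layout)
edge_labels = nx.get_edge_attributes(g, 'weight')
nx.draw_networkx_edge_labels(g, pos=layout, edge_labels=edge_labels)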
I know I can manually control the position of the label along an edge using the label_pos parameter, but my question is whether there exists a way to automatically plot the graph such that edge labels do not usually collide (either using a layout that takes labels into account or a method that "neatly" selects label positions along edges).
I'm not expecting something that always works, but since my graphs are relatively sparsely connected, I hope there's a method that at least has a tendency to work well.

I have been meaning to implement this in netgraph, my fork of the networkx drawing utilities, for a while now. Unfortunately, I have a job interview on Thursday, so I won't have time to write this anytime soon. The basic idea, however, is pretty simple, and is already implemented in some R packages such as ggrepel and ggnetwork.
The idea is to use a force-directed layout to position the labels, given a predetermined and fixed layout for the nodes and edges. So:
1. Compute a node layout using the layout of your choice.
2. Partition each edge into a chain of many, many nodes, and compute the positions of the "edge nodes" using the already known positions of the source and target nodes of the edge. This partitioning is to give each edge a "mass" in the following force-directed layout.
3. For each edge, add a "label" node and connect it to the most central "edge node".
4. Compute a force-directed layout keeping all nodes but the label nodes fixed (e.g. using spring_layout in networkx).
5. You should now have sensible edge label coordinates that do not overlap any of the edges. Use plt.annotate to draw a connection between each edge and its label.
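The steps above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not netgraph's implementation: the helper name edge_label_positions, the parameters n_segments and k, and the small perpendicular starting offset given to each label are all made up for the example. It relies on spring_layout accepting pos and fixed, so that only the label nodes move.

```python
import networkx as nx


def edge_label_positions(g, node_pos, n_segments=10, k=0.1, seed=0):
    """Return {edge: (x, y)} label positions (hypothetical helper).

    Every edge is split into a chain of fixed "edge nodes"; one movable
    label node per edge is attached to the central edge node, and a
    spring layout is run in which only the label nodes may move.
    """
    aux = nx.Graph()
    pos = {}
    for n, (x, y) in node_pos.items():
        aux.add_node(("node", n))
        pos[("node", n)] = (float(x), float(y))

    labels = {}
    for u, v in g.edges():
        (x0, y0), (x1, y1) = pos[("node", u)], pos[("node", v)]
        chain = [("node", u)]
        for i in range(1, n_segments):
            t = i / n_segments
            edge_node = ("edge", u, v, i)
            pos[edge_node] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            chain.append(edge_node)
        chain.append(("node", v))
        nx.add_path(aux, chain)

        central = ("edge", u, v, n_segments // 2)
        label = ("label", u, v)
        aux.add_edge(central, label)
        # Assumption: start each label slightly off its edge, perpendicular to it.
        cx, cy = pos[central]
        pos[label] = (cx - 0.05 * (y1 - y0), cy + 0.05 * (x1 - x0))
        labels[(u, v)] = label

    # Everything except the label nodes stays where it is.
    fixed = [n for n in aux if n[0] != "label"]
    result = nx.spring_layout(aux, pos=pos, fixed=fixed, k=k, seed=seed)
    return {e: tuple(float(c) for c in result[lab]) for e, lab in labels.items()}


g = nx.cycle_graph(4)
node_pos = nx.circular_layout(g)
label_pos = edge_label_positions(g, node_pos)
```

The returned coordinates could then be handed to plt.annotate (or plt.text) together with the edge midpoints. Whether labels end up clear of every edge depends on the graph and on k, so treat the result as a starting point rather than a guarantee.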

Related

Additional forces to networkx spring_layout

I would like to add additional forces to networkx spring_layout.
I have a directed graph, and I would like nodes to move to different sides according to the edges they have. Nodes that have more outgoing edges should drift left, and nodes that have more ingoing edges should drift right. Alternatively, these groups of nodes could drift towards each other: nodes with outgoing edges would move closer together, and nodes with ingoing edges would also move closer to each other.
I looked into the source code of spring_layout in networkx (http://networkx.lanl.gov/archive/networkx-0.37/networkx.drawing.layout-pysrc.html#spring_layout),
but everything there is beyond my comprehension.
import networkx as nx
G = nx.DiGraph()
G.add_edges_from([(1,5),(2,5),(3,5),(5,6),(5,7)])
The layout should show nodes 1, 2 and 3 closer to each other, and the same for nodes 6 and 7.
I imagine that I could solve this by adding invisible edges using a MultiDiGraph. I could count the ingoing and outgoing edges of each node and add invisible edges that connect the two groups. However, I am fairly sure that there are better ways of solving the problem.
Adding weights into the mix would be a good way to group things (with those invisible nodes). But the layouts have no way of knowing left from right. To get the exact layout you want you could specify each point's x,y coordinates.
import networkx as nx
G=nx.Graph()
G.add_node(1,pos=(1,1))
G.add_node(2,pos=(2,3))
G.add_node(3,pos=(3,4))
G.add_node(4,pos=(4,5))
G.add_node(5,pos=(5,6))
G.add_node(6,pos=(6,7))
G.add_node(7,pos=(7,9))
G.add_edges_from([(1,5),(2,5),(3,5),(5,6),(5,7)])
pos=nx.get_node_attributes(G,'pos')
nx.draw(G,pos)
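Instead of typing the coordinates by hand, they could also be derived from the in- and out-degrees. The following is just one possible sketch, not a networkx layout: degree_balance_layout is a hypothetical helper, and the choice of x = (in - out) / (in + out) is an assumption made for illustration. It also assumes the node labels can be sorted.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([(1, 5), (2, 5), (3, 5), (5, 6), (5, 7)])


def degree_balance_layout(G):
    """Place pure sources at x = -1, pure sinks at x = +1, others in between."""
    columns = {}
    for n in G.nodes():
        n_in, n_out = G.in_degree(n), G.out_degree(n)
        total = n_in + n_out
        x = 0.0 if total == 0 else (n_in - n_out) / total
        columns.setdefault(round(x, 6), []).append(n)

    pos = {}
    for x, nodes in columns.items():
        for i, n in enumerate(sorted(nodes)):
            # Spread nodes that share an x value vertically around y = 0.
            pos[n] = (x, i - (len(nodes) - 1) / 2)
    return pos


pos = degree_balance_layout(G)
# nx.draw(G, pos, with_labels=True)  # uncomment to plot (requires matplotlib)
```

For the example graph this puts nodes 1, 2 and 3 in a column on the left, nodes 6 and 7 in a column on the right, and node 5 in between.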

What clustering algorithm is suitable for 2d rectangles without knowing the number of clusters ahead of time?

The problem I have is that there are rectangles within rectangles. Think of a map, with the following traits. The key point is that rectangles with similar density often share similar dimensions and a similar position on the x axis with other rectangles; the distance between these rectangles is usually small but may sometimes be big. If the position on the x axis or the dimensions are clearly way off, the rectangles would not be considered similar.
Rectangles do not intersect; smaller rectangles are completely inside a larger rectangle.
Rectangles often have a similar x position and similar dimensions (similar height and width), and have smaller rectangles inside them. Such a rectangle would itself be considered a cluster of its own.
Sometimes the distance from one of these clusters to another may be quite big (think of islands). Often these clusters share the same or a similar dimension and the same or a similar density of sub-rectangles. If so, they should be considered part of the same cluster despite the distance between them.
The more dense a rectangle is (the more smaller rectangles it has inside), the more likely it is that there is a rectangle of the same or similar density and dimensions nearby.
I've attached a diagram to describe the situation more clearly:
Red border means those groups are outliers, not part of any cluster, and are ignored.
Blue border has many clusters (black borders containing black solid rectangles). They form a group of clusters that are similar due to the criteria mentioned above (similar width, similar x position, similar density). Even the clusters towards the bottom right corner are still considered part of this group because of these criteria.
Turquoise border has many clusters (black borders containing black solid rectangles). However, these clusters differ in dimension, x position, and density from the ones in the blue border. They are considered a group of their own.
So far I found density clustering such as DBSCAN which seems to be perfect since it takes noise (outliers) into consideration, and you do not need to know ahead of time how many clusters there will be.
However, you need to define the minimum number of points needed to form a cluster and a threshold distance. What happens if you don't know these two and it can vary based on the problem described above?
Another seemingly plausible solution would be hierarchical (agglomerative) clustering (R-tree), but I'm concerned that I would still need to know the cutoff depth in the tree to decide whether something is a cluster.
You certainly will need to take all your constraints into account.
In general, your task looks more like constraint satisfaction to me than clustering.
Maybe some constraint clustering approaches are useful to you, but I'm not sure if they allow your kind of constraints. Usually, they only support must-link and must-not-link constraints.
But of course you should try DBSCAN (in particular also: generalized DBSCAN, since the generalization might allow you to add the constraints you have!) and R-trees (which aren't actually a clustering algorithm, but a data index).
Note that R-trees will put the "outliers" into some leaf, to ensure minimum fill.
As is, I cannot give you more detailed recommendations, because even from the sketch above, your constraints are not well defined IMHO. Try putting them into pseudocode. You probably only have a small number of rectangles (say, 100), so you can afford to run really expensive algorithms, such as linkage clustering with a customized linkage criterion. Putting your criteria into code may already be 99% of the effort!
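To make "putting the criteria into code" concrete, here is a rough sketch of single-linkage grouping with a hand-written dissimilarity. Everything in it is an assumption for illustration: the Rect fields, the way the differences are scaled (x_scale), and the cutoff value are not taken from the question, and the cutoff remains a parameter you would still have to choose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float
    n_children: int  # number of smaller rectangles inside ("density")


def rel_diff(a, b):
    """Relative difference in [0, 1] for non-negative values."""
    m = max(abs(a), abs(b))
    return 0.0 if m == 0 else abs(a - b) / m


def dissimilarity(r, s, x_scale=100.0):
    # Hypothetical criterion: the worst mismatch among x position,
    # width, height and density; the y distance is deliberately ignored.
    return max(
        abs(r.x - s.x) / x_scale,
        rel_diff(r.width, s.width),
        rel_diff(r.height, s.height),
        rel_diff(r.n_children, s.n_children),
    )


def single_linkage(rects, cutoff=0.25):
    """Group indices of rects whose dissimilarity chains stay below cutoff."""
    parent = list(range(len(rects)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if dissimilarity(rects[i], rects[j]) <= cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(rects)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


rects = [
    Rect(10, 0, 50, 20, 8),
    Rect(12, 40, 52, 21, 9),
    Rect(11, 400, 49, 20, 8),    # far away in y, but otherwise similar
    Rect(200, 10, 120, 60, 2),
    Rect(205, 90, 118, 62, 2),
    Rect(90, 50, 5, 5, 0),       # unlike everything else
]
groups = single_linkage(rects)
```

Groups of size one can be treated as outliers. With around 100 rectangles the quadratic pairwise loop is not a concern.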

Spatial data visualization level of detail

I have a 3D point cloud data set with different attributes that I visualize as points so far, and I want to have LOD based on distance from the set. I want a generalized view from far away with fewer and larger points, and as I zoom in I want more points, correctly spaced out, to appear automatically.
Kind of like this video below, behavior wise: http://vimeo.com/61148577
I thought one solution would be to use an adaptive octree, but I'm not sure if that is a good solution. I've been looking into hierarchical clustering with seamless transitions, but I'm not sure which solution I should go with that fits my goal.
Any ideas, tips on where to start? Or some specific method?
Thanks
The video you linked uses 2D metaballs. When metaballs clump together, they form blobs, not larger circles. Are you okay with that?
You should read an intro to metaballs before continuing. Just google 2D metaballs.
So, hopefully you've read about metaball threshold values and falloff functions. Your falloff function should have a radius--a distance at which the function falls to zero.
We can achieve an LOD effect by tuning the threshold and the radius. Basically, as you zoom out, increase radius so that points have influence over a larger area and start to clump together. Also, adjust threshold so that areas with insufficient density of points start to disappear.
I found this existing jsfiddle 2D metaballs demo and I've modified it to showcase LOD:
LOD 0: Individual points as circles. (http://jsfiddle.net/TscNZ/370/)
LOD 1: Isolated points start to shrink, but clusters of points start to form blobs. (http://jsfiddle.net/TscNZ/374/)
LOD 2: Isolated points have disappeared. Blobs are fewer and larger. (change above URL to jsfiddle revision 377)
LOD 3: Blobs are even fewer and even larger. (change above URL to jsfiddle revision 380)
As you can see in the different jsfiddle revisions, changing LOD just requires tuning a few variables:
threshold = 1,
max_alpha = 1,
point_radius = 10,
A crucial point that many metaballs articles don't touch on: you need to use a convention where only values above your threshold are considered "inside" the metaball. Then, when zoomed far out, you need to set your threshold value above the peak value of your falloff function. This will cause an isolated point to disappear completely, leaving only clumps visible.
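As a rough illustration of that convention (in Python rather than the JavaScript of the jsfiddle demo), the sketch below evaluates a metaball field and tests whether a location counts as "inside". The particular falloff function and the names falloff, field and inside are assumptions for the example; the demo may use a different falloff.

```python
import math


def falloff(d, radius):
    """Smooth falloff: 1 at d = 0, dropping to 0 at d >= radius."""
    if d >= radius:
        return 0.0
    return (1.0 - (d / radius) ** 2) ** 2


def field(p, points, radius):
    """Sum of the contributions of all points at location p."""
    return sum(falloff(math.dist(p, q), radius) for q in points)


def inside(p, points, radius, threshold):
    # Convention: only values above the threshold count as inside.
    return field(p, points, radius) > threshold


isolated = [(0.0, 0.0)]
clump = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]

# Zoomed in: a low threshold keeps single points visible.
near = inside((0.0, 0.0), isolated, radius=10, threshold=0.5)
# Zoomed out: threshold above the falloff peak (1.0) hides isolated points...
far_isolated = inside((0.0, 0.0), isolated, radius=10, threshold=1.5)
# ...while overlapping contributions keep a clump visible.
far_clump = inside((2.0, 1.0), clump, radius=10, threshold=1.5)
```

Here the peak of the falloff is 1, so any threshold above 1 can only be exceeded where several points overlap.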
Rendering metaballs is a whole topic in itself. This jsfiddle demo takes a very inefficient brute-force approach, but there's also the more efficient "marching squares".

Problem drawing a polygon on data clusters in MATLAB

I have some data points which I have divided into clusters with a clustering algorithm, as shown in the picture below (it might take some time for the image to appear):
Image: http://www.freeimagehosting.net/uploads/05a807bc42.png
Each color represents a different cluster. I have to draw a polygon around each cluster, and I use convhull for this. But as you can see, the polygon for the red cluster is very big and covers a lot of area, which is not what I am looking for. I need to draw lines (polygons) that closely follow my data sets. For example, in the picture above I want a polygon that follows the shape of the red cluster with its 3 branches, not one big polygon that covers the whole area. Can anyone help me with this?
Please note that the solution should be general, because the clusters will change in each run of the algorithm.
I am not sure this is a fully specified question. I see variants of this question come up quite often.
Why this can not really be answered here: Imagine six points, three in an equilateral triangle with another three in an equilateral triangle inside it in the same orientation.
What is the correct hull around this? Is it just the convex hull? Is it the inner triangle with three line spurs coming out from it? Does it matter what the relative sizes of the triangles are? Should you have to specify that parameter then?
If your clusters are very compact, you could try the following:
Create a grid, say with a spacing of 0.1.
Set every pixel in the grid to 1 if there's at least one data point covering it, set the pixel to 0 if there is no data point covering the pixel.
You may need to run imclose on your mask in order to fill little holes inside that have not been colored due to sheer bad luck.
Extract the border pixels using, e.g. bwperim. This is the outline of the polygon you're looking for.
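The same idea can be sketched without the Image Processing Toolbox. The Python below is only an approximation of the steps above, under stated assumptions: occupied_cells and border_cells are hypothetical helpers, the closing step (imclose) is left out, and a cell counts as border if any of its four direct neighbours is empty.

```python
import math


def occupied_cells(points, spacing=0.1):
    """Grid cells (column, row) that contain at least one data point."""
    return {(math.floor(x / spacing), math.floor(y / spacing)) for x, y in points}


def border_cells(cells):
    """Occupied cells with at least one empty 4-neighbour."""
    neighbours = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {
        (i, j)
        for (i, j) in cells
        if any((i + di, j + dj) not in cells for di, dj in neighbours)
    }


# A compact 3 x 3 block of points, one per grid cell.
points = [(0.05 + 0.1 * i, 0.05 + 0.1 * j) for i in range(3) for j in range(3)]
cells = occupied_cells(points)
border = border_cells(cells)
```

For the 3 x 3 block, the eight outer cells form the border and the centre cell does not. Turning the border cells into an ordered polygon would need an additional contour-tracing step.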

How do I visualise clusters of users?

I have an application in which users interact with each other. I want to visualize these interactions so that I can determine whether clusters of users exist (within which interactions are more frequent).
I've assigned a 2D point to each user (where each coordinate is between 0 and 1). My idea is that two users' points move closer together when they interact, an "attractive force", and I just repeatedly go through my interaction logs over and over again.
Of course, I need a "repulsive force" that will push users apart too, otherwise they will all just collapse into a single point.
First I tried monitoring the lowest and highest of each of the XY coordinates and normalizing the positions, but this didn't work: a few users with a small number of interactions stayed at the edges, and the rest all collapsed into the middle.
Does anyone know what equations I should use to move the points, both for the "attractive" force between users when they interact, and a "repulsive" force to stop them all collapsing into a single point?
Edit: In response to a question, I should point out that I'm dealing with about 1 million users, and about 10 million interactions between users. If anyone can recommend a tool that could do this for me, I'm all ears :-)
In the past, when I've tried this kind of thing, I've used a spring model to pull linked nodes together, something like: dx = -k*(x-l). Here dx is the change in position, x is the current separation between the two nodes, l is the desired separation, and k is the spring coefficient, which you tweak until you get a nice balance between spring strength and stability; it will be less than 0.1. Having l > 0 ensures that everything doesn't end up in the middle.
In addition to that, a general "repulsive" force between all nodes will spread them out, something like: dx = k / x^2. This is larger the closer two nodes are; tweak k to get a reasonable effect.
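A minimal sketch of one update step combining the two formulas might look like this. The function names, the parameter values and the decision to move both nodes of a pair by the same amount are assumptions made for the example. The all-pairs repulsion loop is quadratic, so this form is only usable for small graphs, not for a million users.

```python
import math


def separation(pos, a, b):
    """Distance between a and b, plus the unit vector pointing from b to a."""
    dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
    d = max(math.hypot(dx, dy), 1e-9)
    return d, dx / d, dy / d


def step(pos, links, k_spring=0.05, rest_len=0.1, k_repel=1e-4):
    """Return new positions after one update; pos maps node -> (x, y)."""
    move = {n: [0.0, 0.0] for n in pos}

    def apply(a, b, amount, ux, uy):
        # Positive amount pushes a and b apart, negative pulls them together.
        move[a][0] += amount * ux
        move[a][1] += amount * uy
        move[b][0] -= amount * ux
        move[b][1] -= amount * uy

    nodes = list(pos)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            d, ux, uy = separation(pos, a, b)
            apply(a, b, k_repel / d ** 2, ux, uy)        # dx = k / x^2

    for a, b in links:
        d, ux, uy = separation(pos, a, b)
        apply(a, b, -k_spring * (d - rest_len), ux, uy)  # dx = -k*(x-l)

    return {n: (pos[n][0] + move[n][0], pos[n][1] + move[n][1]) for n in pos}


pos = {"a": (0.0, 0.0), "b": (1.0, 0.0)}
links = [("a", "b")]
for _ in range(200):
    pos = step(pos, links)
final_distance = math.dist(pos["a"], pos["b"])
```

In this two-node example the pair starts far apart, is pulled together by the spring, and settles at a separation somewhat above rest_len, where the spring pull and the repulsion balance.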
I can recommend some possibilities: first, try log-scaling the interactions or running them through a sigmoidal function to squash the range. This will give you a smoother visual distribution of spacing.
Independent of this scaling issue: look at some of the rendering strategies in graphviz, particularly the programs "neato" and "fdp". From the man page:
neato draws undirected graphs using ``spring'' models (see Kamada and
Kawai, Information Processing Letters 31:1, April 1989). Input files
must be formatted in the dot attributed graph language. By default,
the output of neato is the input graph with layout coordinates
appended.
fdp draws undirected graphs using a ``spring'' model. It relies on a
force-directed approach in the spirit of Fruchterman and Reingold (cf.
Software-Practice & Experience 21(11), 1991, pp. 1129-1164).
Finally, consider one of the scaling strategies, an attractive force, and some sort of drag coefficient instead of a repulsive force. Actually moving things closer and then possibly farther later on may just get you cyclic behavior.
Consider a model in which everything will collapse eventually, but slowly. Then just run until some condition is met (a node crosses the center of the layout region or some such).
Drag or momentum can just be encoded as a basic resistance to motion and amount to throttling the movements; it can be applied differentially (things can move slower based on how far they've gone, where they are in space, how many other nodes are close, etc.).
Hope this helps.
The spring model is the traditional way to do this: make an attractive force between each node based on the interaction, and a repulsive force between all nodes based on the inverse square of their distance. Then solve, minimizing the energy. You may need some fairly high powered programming to get an efficient solution to this if you have more than a few nodes. Make sure the start positions are random, and run the program several times: a case like this almost always has several local energy minima in it, and you want to make sure you've got a good one.
Also, unless you have only a few nodes, I would do this in 3D. An extra dimension of freedom allows for better solutions, and you should be able to visualize clusters in 3D as well as, if not better than, in 2D.