How to generate a random network graph based on degrees AND network density in NetworkX - networkx

There are a number of functions in NetworkX which allow for different types of random graphs to be generated.
Are there any which allow for the specified degree of the nodes as well as the overall network density (or similar metric) to be considered?

There may be other metrics that are possible to specify in the creation of a graph, but for your example of degree and density, only one combination of node and edge counts can meet both criteria at once.
For an undirected graph, the density is calculated as 2*m/(n*(n-1)) where m is the number of edges and n is the number of nodes. The average degree is calculated as 2*m/n.
Substituting degree = 2*m/n into the density formula gives density = degree/(n-1). Rearranging, n = (degree/density) + 1 and m = (n*degree)/2.
With NetworkX, you can use nx.gnm_random_graph() to specify the number of nodes and edges to match those calculated above.
If you use nx.gnp_random_graph(), note that the p parameter corresponds to the density of the graph: density is defined as the number of edges divided by the maximum possible number of edges, so generating the graph with probability p that any given pair of nodes is connected does effectively the same thing in expectation. The expected number of edges and average degree can then be calculated from that value and the number of nodes.
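A minimal sketch (the target degree and density values below are made up) that solves for n and m and generates a matching graph both ways:
import networkx as nx

degree, density = 6, 0.2             # desired average degree and density
n = round(degree / density) + 1      # from density = degree / (n - 1)
m = round(n * degree / 2)            # from degree = 2 * m / n
G = nx.gnm_random_graph(n, m)        # exactly n nodes and m edges
H = nx.gnp_random_graph(n, density)  # n nodes, ~m edges in expectation
print(nx.density(G), 2 * G.number_of_edges() / G.number_of_nodes())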

Related

How to find neighbors within a distance for an unconnected node within a networkx python graph

I would like to simulate a wireless network with time-varying and mobile node behaviour. Every time a node wakes up or moves, I need to search for its neighbours within a distance. How can I find the nearby nodes? Do any functions exist for this? Thank you
It's a single function: ego_graph. It lets you specify a distance parameter, called the radius.
import networkx as nx
import matplotlib.pyplot as plt

# Sample data
G = nx.florentine_families_graph()
nx.draw_networkx(G, with_labels=True)
plt.show()

# Desired graph: every node within 2 hops of 'Acciaiuoli'
H = nx.ego_graph(G, 'Acciaiuoli', radius=2)
nx.draw_networkx(H, with_labels=True)
plt.show()
The entire Florentine families graph:
And just those within distance 2 of the node 'Acciaiuoli':
If you're using a distance measure other than simple topological distance (i.e. counting edges), you can provide the distance parameter to the ego_graph function to specify an edge attribute to use for distance.
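As a hedged sketch, if the edges carry a numeric length attribute (the name 'dist' below is hypothetical; the Florentine graph has none by default), the radius is then interpreted in those units:
import networkx as nx

G = nx.florentine_families_graph()
for u, v in G.edges():
    G.edges[u, v]['dist'] = 1.0  # assume every link is 1.0 units long
H = nx.ego_graph(G, 'Acciaiuoli', radius=2.0, distance='dist')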

How can a clustering algorithm in R end up with negative silhouette values?

We know that clustering methods in R assign observations to the closest medoids. Hence, each observation is supposedly assigned to its closest cluster. So I wonder how it is possible to get negative silhouette values when we supposedly assign each observation to its closest cluster and the silhouette formula then cannot be negative?
Two errors:
1. Most clustering algorithms do not use the medoid; only PAM does.
2. The silhouette does not use the distance to the medoid, but the average distance to all cluster members: s(i) = (b - a) / max(a, b), where a is the mean distance to the observation's own cluster and b the mean distance to the nearest other cluster, so s(i) is negative whenever a > b. If the assigned cluster is very wide, a can exceed b even though the observation is closest to its own medoid. Consider a cluster with one point in the center and all others on a sphere around it.
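A minimal illustration in Python with scikit-learn (the question is about R, but the formula is the same): a point assigned to a far-away cluster gets a negative silhouette because a exceeds b.
import numpy as np
from sklearn.metrics import silhouette_samples

# Two tight 1-D groups, plus one point that lies near group 1 but is labeled 0.
X = np.array([[0.0], [0.1], [0.2],        # cluster 0
              [5.0], [5.1], [5.2],        # cluster 1
              [4.9]])                     # near cluster 1 ...
labels = np.array([0, 0, 0, 1, 1, 1, 0])  # ... but assigned to cluster 0
print(silhouette_samples(X, labels)[-1])  # about -0.96: a = 4.8, b = 0.2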

Clustering algorithm with different epsilons on different axes

I am looking for a clustering algorithm such as DBSCAN to deal with 3D data, in which it is possible to set different epsilons depending on the axis. So, for instance, an epsilon of 10 m on the x-y plane, and an epsilon of 0.2 m on the z axis.
Essentially, I am looking for large but flat clusters.
Note: I am an archaeologist; the algorithm will be used to look for potential correlations between objects scattered across large surfaces but within narrow vertical layers.
Solution 1:
Scale your data set to match your desired epsilon.
In your case, scale z by 50 (= 10 m / 0.2 m) and run DBSCAN with a single epsilon of 10, as in the sketch below.
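A minimal sketch of Solution 1 with scikit-learn's DBSCAN on made-up coordinates (the question mentions no particular library):
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical x, y in a 100 m x 100 m area, z within a 2 m band.
points = np.random.rand(1000, 3) * [100.0, 100.0, 2.0]
scaled = points.copy()
scaled[:, 2] *= 50.0  # stretch z so 0.2 m of height costs 10 distance units
labels = DBSCAN(eps=10.0, min_samples=5).fit_predict(scaled)  # -1 marks noise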
Solution 2:
Use a weighted distance function.
E.g. WeightedEuclideanDistanceFunction in ELKI, and choose your weights accordingly, e.g. -distance.weights 1,1,50 will put 50x as much weight on the third axis.
This may be the most convenient option, since you are already using ELKI.
Just define a custom distance metric when computing the DBSCAN core points. Standard DBSCAN uses the Euclidean distance to find the points within epsilon of a given point, so all dimensions are treated the same.
However, you could use the Mahalanobis distance to weigh each dimension differently. You can use a diagonal covariance matrix for flat clusters. You can use a full symmetric covariance matrix for flat tilted clusters, etc.
In your case, you would use a covariance matrix like:
100    0    0
  0  100    0
  0    0 0.04
In the pseudo code provided at the Wikipedia entry for DBSCAN just use one of the distance metrics suggested above in the regionQuery function.
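A hedged sketch of the Mahalanobis variant using scikit-learn's DBSCAN (the regionQuery above refers to Wikipedia's pseudocode, not to any particular library); the metric takes the inverse covariance matrix, so with the diagonal covariance above, eps=1 reaches 10 m in x-y and 0.2 m in z:
import numpy as np
from sklearn.cluster import DBSCAN

cov = np.diag([100.0, 100.0, 0.04])  # squared per-axis epsilons on the diagonal
VI = np.linalg.inv(cov)              # inverse covariance required by the metric
points = np.random.rand(1000, 3) * [100.0, 100.0, 2.0]  # hypothetical data
labels = DBSCAN(eps=1.0, min_samples=5, metric='mahalanobis',
                metric_params={'VI': VI}, algorithm='brute').fit_predict(points)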
Update
Note: scaling the data is equivalent to using an appropriate metric.

Dividing a normal distribution into regions of equal probability in Matlab

Consider a Normal distribution with mean 0 and standard deviation 1. I would like to divide this distribution into 9 regions of equal probability and take a random sample from each region.
It sounds like you want to find the values that divide the area under the probability density function into segments of equal probability. This can be done in MATLAB by applying the norminv function.
In your particular case:
segmentBounds = norminv(linspace(0,1,10),0,1)
Any two adjacent values of segmentBounds now describe the boundaries of segments of the Normal probability distribution function such that each segment contains one ninth of the total probability.
I'm not sure exactly what you mean by taking random numbers from each region. One approach is to sample from each region by rejection sampling: for each region bounded by x0 and x1, draw a sample y = normrnd(0,1); if x0 < y < x1, keep it, else discard it and repeat.
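A minimal Python/SciPy sketch of that rejection-sampling approach (the answer itself uses MATLAB; norm.ppf plays the role of norminv):
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng()
bounds = norm.ppf(np.linspace(0, 1, 10))  # 9 equal-probability regions
samples = []
for x0, x1 in zip(bounds[:-1], bounds[1:]):
    while True:
        y = rng.standard_normal()  # MATLAB: normrnd(0,1)
        if x0 < y < x1:            # keep draws landing in this region
            samples.append(y)
            break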
It's also possible that you intend to sample from these regions uniformly. To do this you can try rand(1)*(x1-x0) + x0. This will produce problems for the extreme quantiles, however, since the regions extend to +/- infinity.

How can I weight the dimensions as I am computing KNN of given instance vector in MATLAB?

Suppose I have a bunch of instances and I want to find the closest K instances to a particular instance. Moreover, I have some weights showing the strengths of each dimension as we computing the distances. How can I incorporate these weights with the KNN finding process in MATLAB?
There are two methods that allow you to do this. Looking at the knnsearch documentation, you can use the 'seuclidean' flag, which computes the standardized Euclidean distance: each coordinate difference between two points is divided by a corresponding scale value in S, where S by default is the standard deviation of each coordinate. You can override this by specifying the Scale parameter with a vector whose components scale each dimension instead of the per-dimension standard deviation.
Because each difference is divided by its scale value, a dimension counts for more when its scale is smaller: to give a dimension weight w, set its scale to 1/sqrt(w), so its squared difference is multiplied by w in the distance. This is essentially the same thing as weighting the strengths in each dimension.
Alternatively, you can provide your own function that computes the distance between two inputs. Define the weights in your workspace beforehand, then create an anonymous function wrapper that captures them when computing whatever distance measure you want. The anonymous function only takes in the coordinate inputs being compared, so access the weights through the wrapper rather than as extra arguments, and go from there.
Check out: http://www.mathworks.com/help/stats/knnsearch.html
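A hedged Python sketch of the pre-scaling equivalent (the answer above is about MATLAB's knnsearch): multiplying column j of the data by sqrt(w_j) makes plain Euclidean KNN match the weighted distance sqrt(sum_j w_j*(x_j - y_j)^2):
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))     # hypothetical instances
query = rng.standard_normal((1, 3))   # the instance whose neighbours we want
weights = np.array([1.0, 1.0, 50.0])  # per-dimension strengths (assumed)
scale = np.sqrt(weights)
nn = NearestNeighbors(n_neighbors=5).fit(X * scale)
dist, idx = nn.kneighbors(query * scale)  # K nearest under the weighted metric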