partitioning large signed networks - networkx

I have a large signed network: a weighted graph whose edge weights are either +1 or -1. I need to partition this graph so that most positive edges fall inside clusters and most negative edges fall between clusters. The graph is very sparse.
Do you have ideas?
There is a special version of the Louvain algorithm for signed networks in Pajek.
Does anyone know about the details of this algorithm?

This paper by Vincent Traag outlines one approach.
He also has a Python package (built on top of igraph) called louvain that can do this for you.
This blog post demonstrates the package and method on an interesting use case.
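To make the objective from the question concrete, here is a minimal sketch that scores a given partition by counting its "frustrated" edges: negative edges inside a cluster plus positive edges between clusters. The function name, the toy edge list and the two partitions are illustrative assumptions. This only evaluates a partition; it is not the Louvain variant from the paper or the package.

```python
def frustration(edges, membership):
    """Count edges that violate the signed-clustering objective.

    edges: iterable of (u, v, sign) with sign +1 or -1
    membership: dict mapping node -> cluster id
    """
    negative_inside = 0
    positive_between = 0
    for u, v, sign in edges:
        same_cluster = membership[u] == membership[v]
        if sign < 0 and same_cluster:
            negative_inside += 1
        elif sign > 0 and not same_cluster:
            positive_between += 1
    return negative_inside + positive_between


# Hypothetical toy network: two all-positive triangles joined by negative edges.
edges = [
    ("a", "b", +1), ("b", "c", +1), ("a", "c", +1),
    ("d", "e", +1), ("e", "f", +1), ("d", "f", +1),
    ("c", "d", -1), ("a", "f", -1),
]
good = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
bad = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1, "f": 0}

print(frustration(edges, good))  # 0: every edge agrees with the partition
print(frustration(edges, bad))   # 6: four positive edges cut, two negative edges kept inside
```

A lower count means the partition fits the signs better, so a score like this can be used to compare the output of different methods on the same graph.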

Related

Why is the rich-club coefficient algorithm not defined for directed networks in NetworkX?

I'm working with NetworkX to compute the rich-club coefficient of a directed graph. However, I see in the documentation for the implementation of this algorithm that it is not implemented for directed networks.
I want to know whether there are any references that explain the reason for this, so that I can develop a solution for my scenario (computing the rich-club coefficient for directed graphs).
I found this reference and it seems that they have proposed a corrected equation to compute it. But I haven't found any additional references to confirm if the rich-club was initially defined for just undirected graphs (not even in the references cited by the doc page of NetworkX).

Multiple drivers with Mapbox Optimization API?

Using the Mapbox Optimization API is it possible to optimize the routes between multiple drivers?
Example: 6 locations are added, 2 drivers are added, the routes get split / optimized between the two drivers
I'm still in the planning stage, so I haven't poked around too much myself yet, but the code and all the examples I've seen are directed towards single driver optimization only... Has anybody done something like this before? Anything you can recommend to point me in the right direction?
Mapbox's Optimization API returns a duration-optimized route between the input coordinates, which is also known as solving the so-called "Travelling Salesman Problem". This is a well-known, NP-hard graph theory problem, meaning there is no general polynomial-time solution known for the problem.
The underlying data used for computing the aforementioned duration-optimized route are the cost functions of the edges connecting the coordinates input to the API request. You could retrieve the cost values (including traffic) between a set of these coordinate positions using Mapbox's Matrix API.
Adding a second driver/salesman to the problem makes the problem exponentially harder to solve, as discussed in the answer to this Stack Overflow post.
Here is a link to a scientific paper discussing a possible approach to this problem.
As evidenced by the research community, a solution for the Multiple Travelling Salesman Problem is not straightforward to implement. If you do not want to engage in this non-trivial task of implementing an algorithm that would solve it for you, you could implement a function that will make an educated guess on how to split up the destination coordinates between the two drivers. This "educated guess" could be based on values obtained from the Matrix API. You could make a one-to-many request for each driver, then take the lesser of the two durations for each coordinate and assign the coordinate to the appropriate driver. Then, you can use Mapbox's Optimization API to solve the two separate travelling salesman problems individually.
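The "educated guess" described above can be sketched as follows. The duration values are made up and simply stand in for what a one-to-many Matrix API request per driver would return; the function name and the driver labels are hypothetical. No Mapbox call is made here.

```python
def split_by_nearest_driver(durations_by_driver):
    """Assign each destination to the driver with the shortest travel time.

    durations_by_driver: dict mapping driver -> list of durations in seconds,
    where index i in every list refers to destination i.
    Ties go to the driver listed first.
    """
    drivers = list(durations_by_driver)
    n_destinations = len(durations_by_driver[drivers[0]])
    assignment = {driver: [] for driver in drivers}
    for i in range(n_destinations):
        best = min(drivers, key=lambda d: durations_by_driver[d][i])
        assignment[best].append(i)
    return assignment


# Hypothetical durations (seconds) from each driver's start to 6 destinations.
durations = {
    "driver_1": [300, 900, 450, 1200, 200, 800],
    "driver_2": [700, 400, 500, 300, 950, 800],
}
plan = split_by_nearest_driver(durations)
print(plan)  # {'driver_1': [0, 2, 4, 5], 'driver_2': [1, 3]}
```

Each driver's list of destinations can then be sent to the Optimization API as its own request. Note that this greedy split does nothing to balance the workload: one driver can end up with most of the stops.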
Even if you did implement an algorithm that solves the Multiple Travelling Salesman Problem, the problem's complexity grows exponentially with the number of drivers and the number of waypoints. Therefore, you could end up with a solution that works but does not compute in a reasonable amount of time. These performance limitations are something to keep in mind when implementing a solution.

Applying vector based clustering algorithms to social network context

I have a social network described as edges in a file. I used graph-based clustering algorithms to find dense parts of the graph. However, I also need to apply vector-based clustering to this data, and I cannot find any guidance on how to do that in this context. I also have information about the features of each node. I think using vectors containing the features of each user makes no sense here. For example, k-means would calculate the distance between user u1 with feature vector v1 = [f1,f2,f3,...] and user u2 with feature vector v2 = [f1,f2,f3,...]. However, both vectors would have binary values depending on which features the user has. Additionally, I have a matrix with the users on one axis and the features on the other, where the user is able to set permission.
My question is how I can make use of k-means, DBSCAN, etc. in this context.
Best wishes.
Many algorithms can be modified to allow being used with distances for binary features. For example k-means can be modified for binary data: k-modes.
But I don't think it will do anything useful on your data.
Your approach to this problem is bad: don't first decide on the algorithm and then try to make it run. You are then bound to solve the wrong problem. Instead, first formalize mathematically what a good clustering would be. Then choose the appropriate algorithm by its mathematical ability to find a good solution to this objective.
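For reference, the k-modes idea mentioned above replaces the Euclidean distance with the number of mismatching features (Hamming distance) and the mean with the per-feature most common value. The following is a simplified sketch under those assumptions, with hand-picked initial modes and made-up binary user vectors; it says nothing about whether the result is meaningful for a given data set.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions where two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_vector(vectors):
    """Per-feature most common value (ties go to the value seen first)."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*vectors))


def k_modes(vectors, initial_modes, max_iter=100):
    modes = list(initial_modes)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode by Hamming distance.
        new_labels = [
            min(range(len(modes)), key=lambda j: hamming(v, modes[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: recompute each mode from its members.
        for j in range(len(modes)):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                modes[j] = mode_vector(members)
    return labels, modes


# Hypothetical users with four binary features each.
users = [
    (1, 1, 0, 0), (1, 1, 1, 0), (1, 0, 0, 0),
    (0, 0, 1, 1), (0, 1, 1, 1), (0, 0, 0, 1),
]
labels, modes = k_modes(users, initial_modes=[users[0], users[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(modes)   # [(1, 1, 0, 0), (0, 0, 1, 1)]
```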

Separate objects in a point cloud

I am looking to separate a point cloud into unconnected objects. I have
tried using the k-means algorithm, but it didn't do the whole job.
I'm looking to improve the results shown in the picture I added.
Any thoughts or directions?
My cloud, separated using k-means
I suggest you look into Euclidean Segmentation, for which an algorithm is given on this page: http://www.pointclouds.org/documentation/tutorials/cluster_extraction.php. PCL also provides a built-in implementation.
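The idea behind Euclidean segmentation is region growing: start from a seed point, keep adding every point within a distance tolerance of any point already in the cluster, and start a new cluster when nothing more can be reached. Here is a minimal sketch of that idea in plain Python. It uses a brute-force neighbor search instead of the kd-tree a real implementation would use, and the function name, parameters and sample points are assumptions for illustration, not PCL's API.

```python
import math
from collections import deque


def euclidean_clusters(points, tolerance, min_size=1):
    """Group points so that each cluster is connected by hops <= tolerance."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = min(unvisited)  # deterministic choice of the next seed
        unvisited.remove(seed)
        queue = deque([seed])
        cluster = [seed]
        while queue:
            i = queue.popleft()
            # Brute-force radius search: O(n) per point, O(n^2) overall.
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= tolerance]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                cluster.append(j)
        if len(cluster) >= min_size:
            clusters.append(sorted(cluster))
    return clusters


# Hypothetical cloud: two small objects and one stray point.
cloud = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0),
    (5.0, 5.0, 5.0), (5.1, 5.0, 5.0),
    (10.0, 0.0, 0.0),
]
print(euclidean_clusters(cloud, tolerance=0.15, min_size=2))
# [[0, 1, 2], [3, 4]]
```

Unlike k-means, this does not need the number of objects in advance and does not assume roughly spherical clusters, which is why it suits clouds made of physically separated objects.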

ELKI implementation of OPTICS clustering algorithm detects only one cluster

I'm having an issue with the OPTICS implementation in ELKI. I have used the same data with the DBSCAN implementation and it worked like a charm. I'm probably missing something with the parameters, but I can't figure it out; everything seems to be right.
The data is a simple 300x2 matrix consisting of 3 clusters with 100 points each.
DBSCAN result:
Clustering result of DBSCAN
MinPts = 10, Eps = 1
OPTICS result:
Clustering result of OPTICS
MinPts = 10
You apparently already found the solution yourself, but here is the long story:
The OPTICS class in ELKI only computes the cluster order / reachability diagram.
In order to extract clusters, you have different choices, one of which (the one from the original OPTICS publication) is available in ELKI.
So in order to extract clusters in ELKI, you need to use the OPTICSXi algorithm, which will in turn use either OPTICS or the index based DeLiClu to compute the cluster order.
The reason why this is split into two parts in ELKI probably is so that you can on one hand implement another logic for extracting the clusters, and on the other hand implement different methods like DeLiClu for computing the cluster order. That would align well with the modular architecture of ELKI.
IIRC there is at least one more method (apparently not yet in ELKI) that extracts clusters by looking for local maxima, then extending them horizontally until they hit the end of the valley. And there was a different one that used "inflexion points" of the plot.
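To illustrate why the cluster order alone gives no clusters, here is a minimal sketch of the simplest extraction step: a DBSCAN-style horizontal cut through the reachability plot. This is not the Xi method that OPTICSXi implements and not ELKI code; the function name and the reachability and core-distance values are made up for illustration.

```python
import math

NOISE = -1


def extract_by_cut(reachability, core_distance, eps):
    """Label points of an OPTICS cluster order using a horizontal cut at eps.

    reachability and core_distance are given in cluster order;
    math.inf stands for "undefined".
    """
    labels = []
    current = NOISE
    next_id = 0
    for reach, core in zip(reachability, core_distance):
        if reach > eps:
            if core <= eps:
                # Not reachable from earlier points, but dense enough
                # to start a new cluster.
                current = next_id
                next_id += 1
                labels.append(current)
            else:
                labels.append(NOISE)
        else:
            # Reachable from the points before it: same cluster.
            labels.append(current)
    return labels


# Hypothetical cluster order: two valleys separated by a peak, then an outlier.
reachability = [math.inf, 0.3, 0.4, 0.35, 2.0, 0.5, 0.45, 3.0]
core_distance = [0.3, 0.3, 0.4, 0.35, 0.5, 0.45, 0.45, math.inf]
print(extract_by_cut(reachability, core_distance, eps=1.0))
# [0, 0, 0, 0, 1, 1, 1, -1]
```

Running OPTICS without any such extraction step leaves all points in a single ordering, which matches the "only one cluster" result in the question.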
@AnonyMousse pretty much put it right. I just can't upvote or comment yet.
We hope to have some students contribute the other cluster extraction methods as small student projects over time. They are not essential for our research, but they are good tasks for students that want to learn about ELKI to get started.
ELKI is a fast-moving project, and it lives on community contributions. We would be happy to see you contribute some code to it. We know that the codebase is not easy to get started with: it is fairly large, and the generality of the implementation and the support for index structures make it a bit hard to get into. We try to add tutorials to help you get started. And once you are used to it, you will actually benefit from the architecture: your algorithms get the benefits of indexing and arbitrary distance functions, whereas if you implemented them from scratch, you would likely only support Euclidean distance and no index acceleration.
Seeing that you struggled with OPTICS, I will try to write an OPTICS tutorial in the new year. In particular, OPTICS can benefit a lot from using an appropriate index structure.