Find the cross node for number of nodes in Gremlin? - orientdb

I have a number of nodes connected through an intermediate node of another type, as in the picture.
There can be multiple middle nodes.
I need to find all the middle nodes for a given set of nodes and sort them by the number of links to my initial nodes. In my example, given A, B, C, D, it should return node E (4 links) followed by node F (3 links). Is this possible? If not, maybe it can be done using multiple requests?

Using the TinkerGraph toy graph, let's assume vertices 1 and 6 are given:
g = TinkerGraphFactory.createTinkerGraph()
m=[:]
g.v(1,6).both.groupCount(m)
m.sort{-it.value}
Sorted Map m contains:
==>v[3]=2
==>v[2]=1
==>v[4]=1

Related

Get the node indexes of connected components in a graph (matlab)

I have two adjacency matrices of the same size, and I want to check how many nodes in given connected components of the two graphs are the same.
For example, suppose A has three connected components of sizes 4, 5 and 6, and B has two connected components of sizes 3 and 7. I want to compare the nodes in all connected components of size 5 or more. That is, I want three counts: the number of nodes common to the components of sizes 5 and 6 in A and the component of size 7 in B; the number of nodes in the size-7 component of B but not in the size-5 or size-6 components of A; and the number of nodes in the size-5 or size-6 components of A but not in the size-7 component of B.
So far I did this,
Abins = conncomp(graph(A));
Bbins = conncomp(graph(B));
[Ca, iaa, ica] = unique(Abins);
[Cb, iab, icb] = unique(Bbins);
Cb_counts = accumarray(icb,1);
Now I am not sure how to get the indices of the nodes for which Cb_counts > 5.
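One possible way to think about the missing step, sketched in Python rather than MATLAB: count how often each component label occurs, keep the nodes whose label is frequent enough, and compare the two node sets. The label vectors below are made-up stand-ins for Abins and Bbins, and nodes_in_large_components is a hypothetical helper name.

```python
from collections import Counter

def nodes_in_large_components(bins, min_size):
    """Return the 1-based node indices whose component has at least min_size nodes."""
    sizes = Counter(bins)  # component label -> number of nodes with that label
    return {node for node, label in enumerate(bins, start=1)
            if sizes[label] >= min_size}

# Hypothetical component labels, one per node (the role played by Abins and Bbins).
a_bins = [1, 1, 1, 2, 2, 3]
b_bins = [1, 2, 2, 2, 2, 1]

a_nodes = nodes_in_large_components(a_bins, 3)
b_nodes = nodes_in_large_components(b_bins, 3)

common = a_nodes & b_nodes   # in large components of both graphs
only_a = a_nodes - b_nodes   # in a large component of A only
only_b = b_nodes - a_nodes   # in a large component of B only
print(len(common), len(only_a), len(only_b))  # 2 1 2
```

The three set sizes correspond to the three counts asked for in the question.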

Indexing a subset of an adjacency matrix

I have a 200 x 200 adjacency matrix called M. This is the connection strength for 200 nodes (the nodes are numbered 1 to 200 and organized in ascending order in M - i.e., M(23,45) is the connection strength of node 23 and 45). Out of these 200 nodes, I'm interested in three subsets of nodes.
subset1 = [2,34,36,42,69,102,187];
subset2 = [5,11,28,89,107,199];
subset3 = [7,55,60,188];
Using M, I would like to conduct the following operations:
Average the connection strength within subset1, subset2, and subset3, separately. For instance, for subset1, that would be the mean connection strength over all possible pairs of nodes 2, 34, 36, ..., 187.
Find the connection strength between subset1, subset2, and subset3. This would be the average connection strength over all pairs of nodes spanning two different subsets (the average of the connections between subset1 & subset2, subset2 & subset3, and subset1 & subset3). Note that this between-subset connection is not the same as putting all the nodes from the three subsets into a single matrix (i.e., the connection between two subsets is the mean connection of each node in one subset with every node in the other subset).
What I've tried so far is indexing M using a for loop. It was bulky, especially with a large number of nodes in each subset. Can someone help?
M1 = M(subset1, subset1);
ind = triu(true(size(M1)), 1); % upper triangle
M1_avg = mean(M1(ind));
I will leave M2_avg and M1_M2_avg to you.
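For readers who want to check the logic, here is a small Python sketch of both averages, assuming a symmetric matrix stored as nested lists and 1-based node numbers as in the question. The 4 x 4 matrix and the function names within_mean and between_mean are hypothetical and only serve the illustration.

```python
from itertools import combinations, product

def within_mean(M, subset):
    """Mean connection strength over all unordered pairs of nodes inside one subset."""
    pairs = list(combinations(subset, 2))
    return sum(M[i - 1][j - 1] for i, j in pairs) / len(pairs)

def between_mean(M, subset_a, subset_b):
    """Mean connection strength over all pairs with one node in each subset."""
    pairs = list(product(subset_a, subset_b))
    return sum(M[i - 1][j - 1] for i, j in pairs) / len(pairs)

# Hypothetical symmetric 4 x 4 connection matrix (nodes numbered 1 to 4).
M = [[0, 1, 2, 3],
     [1, 0, 4, 5],
     [2, 4, 0, 6],
     [3, 5, 6, 0]]

print(within_mean(M, [1, 2, 4]))        # 3.0
print(between_mean(M, [1, 2], [3, 4]))  # 3.5
```

Taking unordered pairs in within_mean plays the same role as the upper-triangle mask in the MATLAB answer: each pair is counted once and the diagonal is excluded. The overall between-subset value in the question would then be the mean of between_mean over the three subset pairs.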

Understanding the phytree object in MATLAB

I asked a similar question here: what exactly is the phytree object in matlab?.
Now this is what I did to try to understand it.
clear;
d=[4,2,5,4,5,5];
z=seqneighjoin(d);
view(z)
get(z, 'Pointers')
This is the output:
ans =
1 2
3 5
4 6
The phytree figure follows. As I understand it, this matrix is the same as the tree field of the phytree object. What is the relation between this matrix and the figure?
You should interpret the array in the following way.
Firstly, you have the four nodes 1, 2, 3 and 4. In the graph you attach, node 1 is labelled Leaf 1; node 2 is labelled Leaf 3; node 3 is labelled Leaf 2; and node 4 is labelled Leaf 4.
Then take each row of the array in turn.
The first row indicates that we first join nodes 1 and 2 - we now call this node 5, as it's the smallest number greater than the four nodes we already have. On the graph, this is the node connecting Leaf 1 and Leaf 3.
Then the second row indicates that we next join nodes 3 and 5 - we now call this node 6, as again it's the smallest number after node 5. On the graph, this is the node connecting the previous join to Leaf 2.
Then the third row indicates that we lastly join nodes 4 and 6 - we don't need to call it anything as it's the final root node, but it would be node 7. On the graph, this is the node connecting the previous join to Leaf 4.
Does that make more sense?
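The numbering rule described above can be written down as a short Python sketch. It only mirrors the explanation (row k of the pointer matrix creates node number of leaves + k) and is not a reimplementation of phytree; the names label_joins and leaves_under are hypothetical.

```python
def label_joins(pointers, n_leaves):
    """Map each new internal node number to the pair of nodes it joins."""
    return {n_leaves + k: pair for k, pair in enumerate(pointers, start=1)}

def leaves_under(node, joins):
    """Collect the leaf numbers below a node by following the joins downwards."""
    if node not in joins:  # not an internal node, so it is a leaf
        return {node}
    left, right = joins[node]
    return leaves_under(left, joins) | leaves_under(right, joins)

# The pointer matrix printed by get(z, 'Pointers') in the question.
pointers = [(1, 2), (3, 5), (4, 6)]
joins = label_joins(pointers, 4)

print(joins)                   # {5: (1, 2), 6: (3, 5), 7: (4, 6)}
print(leaves_under(6, joins))  # {1, 2, 3}
```

Node 7 is the root, and leaves_under(7, joins) returns all four leaves.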

Generate data from k-means clusters

So I have an input vector A, which is a row vector with 3,000 data points. Using MATLAB, I found 3 cluster centres for A.
Now that I have the 3 cluster centres, I have another row vector B with 3,000 points. The elements of B take one of three values: 1, 2 or 3. So, for example, if the first 5 elements of B are
B(1,1:5) = [ 1 , 3, 3, 2, 1]
This means that B(1,1) belongs to cluster 1, B(1,2) belongs to cluster 3, etc. What I am trying to do is, for every data point in the row vector B, look at which cluster it belongs to by reading its value and then replace it with a data value from that cluster.
So after the above is done, the first 5 elements of B would look like:
B(1,1:5) = [ 2.7 , 78.4, 55.3, 19, 0.3]
Meaning that B(1,1) is a data value picked from the first cluster (that we got from A), B(1,2) is a data value picked from the third cluster (that we got from A) etc.
k-means only keeps the means; it does not model the data distribution.
You cannot sensibly generate artificial data from k-means clusters without additional statistics and distribution assumptions.
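One example of such an additional assumption: if the per-point cluster assignments of A are still available, each label in B could be replaced by a value drawn at random from the points of A that fell into that cluster. This is plain resampling of the observed data, not something k-means itself provides. The Python sketch below illustrates it with made-up values; resample_from_clusters and all variable names are hypothetical.

```python
import random

def resample_from_clusters(a_values, a_labels, b_labels, rng):
    """Replace each label in b_labels with a random member of that cluster in A."""
    members = {}
    for value, label in zip(a_values, a_labels):
        members.setdefault(label, []).append(value)
    return [rng.choice(members[label]) for label in b_labels]

# Hypothetical data: values of A with the cluster each one was assigned to.
a_values = [0.3, 2.7, 19.0, 21.5, 55.3, 78.4]
a_labels = [1, 1, 2, 2, 3, 3]
b_labels = [1, 3, 3, 2, 1]

rng = random.Random(0)  # seeded only to make the sketch repeatable
new_b = resample_from_clusters(a_values, a_labels, b_labels, rng)
print(new_b)
```

Every generated value is an existing point of A, so this cannot produce new values; doing that would require fitting a distribution to each cluster.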

how to perform K-medoids

I've been trying for a long time to figure out how to perform the K-medoids algorithm on paper, but I can't understand how to begin and iterate. For example:
I have the distance matrix between 6 points, k, C1 and C2.
I'd be very happy if someone could show me how to perform the K-medoids algorithm on this example. How do I start and iterate?
Thanks
A bit more detail then:
1- Set K to the desired number of clusters; let's use 2.
2- Randomly choose K entities to be the medoids m_1, m_2. Let's choose X_3 (call this cluster 1) and X_5 (cluster 2).
3- Assign each entity to the cluster represented by its closest medoid. Cluster 1 will be made of entities (X_1, X_2, X_3 - just check your table, these are closer to X_3 than to X_5), and cluster 2 will be (X_4, X_5, X_6).
4- Update the medoids. The medoid of a cluster should be the entity with the smallest sum of distances to all other entities within the same cluster. X_2 will be the new medoid for cluster 1, and X_4 for cluster 2.
Now what you have to do is repeat steps 3-4 until convergence. So,
5- Assign each entity to the cluster of the closest medoid (now these are X_2 and X_4). Cluster 1 is now made of entities (X_1, X_2, X_3 and X_6), and cluster 2 will be (X_4, X_5).
(There was a change in the entities in each cluster, so iterations must continue.)
6- The entity with the smallest sum of distances in cluster 1 is still X_2; in cluster 2 the sums are the same, so X_4 stays.
Another iteration:
7- As there was no change in the medoids, the clusters will stay the same. This means it's time to stop the iterations.
Output: 2 clusters. Cluster 1 has entities (X_1, X_2, X_3, X_6), and cluster 2 has entities (X_4 and X_5).
Now, if I had started this using different initial medoids, maybe I'd get a different clustering... you may wish to check the BUILD algorithm (the initialisation phase of PAM) for initialisation.
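The assign/update loop walked through above can be sketched in a few lines of Python. The distance table from the question is not reproduced here, so the sketch uses a made-up example (six points on a line) and will not reproduce the clusters above; k_medoids is a hypothetical helper, and it assumes all points are distinct so that no cluster ends up empty.

```python
def k_medoids(dist, medoids, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(medoids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point goes to the cluster of its closest medoid.
        clusters = [[] for _ in medoids]
        for i in range(len(dist)):
            nearest = min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            clusters[nearest].append(i)
        # Update step: the new medoid minimises the sum of distances in its cluster.
        new_medoids = [min(members, key=lambda m: sum(dist[m][j] for j in members))
                       for members in clusters]
        if new_medoids == medoids:  # no change in the medoids: converged
            break
        medoids = new_medoids
    return medoids, clusters

# Hypothetical data: six points on a line, distances are absolute differences.
coords = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in coords] for a in coords]

# Start from the third and fifth points (0-based indices 2 and 4).
medoids, clusters = k_medoids(dist, [2, 4])
print(medoids)   # [1, 4]
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
```

Indices are 0-based here, so index 1 corresponds to the second point. Starting from other initial medoids may lead to a different result, as noted above.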
You have clusters C1 and C2 given.
1. Find the most central element in each cluster.
2. Compute new C1 and C2.
3. Repeat 1. and 2. until convergence.