to understand the phytree object in matlab - matlab

I asked the similar question here: what exactly is the phytree object in matlab?.
Now this is what I did to try to get it.
clear;
d=[4,2,5,4,5,5];
z=seqneighjoin(d);
view(z)
get(z, 'Pointers')
This is the output:
ans =
1 2
3 5
4 6
And the phytree figure in the following. For my understanding, this matrix is the same as the tree field of the phytree object. What is the relation between this matrix and the figure?

You should interpret the array in the following way.
Firstly, you have the four nodes 1, 2, 3 and 4. In the graph you attach, node 1 is labelled Leaf 1; node 2 is labelled Leaf 3; node 3 is labelled Leaf 2; and node 4 is labelled Leaf 4.
Then take each row of the array in turn.
The first row indicates that we first join nodes 1 and 2 - we now call this node 5, as it's the smallest number greater than the four nodes we already have. On the graph, this is the node connecting Leaf 1 and Leaf 3.
Then the second row indicates that we next join nodes 3 and 5 - we now call this node 6, as again it's the smallest number after node 5. On the graph, this is the node connecting the previous join to Leaf 2.
Then the third row indicates that we lastly join nodes 4 and 6 - we don't need to call it anything as it's the final root node, but it would be node 7. On the graph, this is the node connecting the previous join to Leaf 4.
Does that make more sense?

Related

Find cluster of points in adjacency list [duplicate]

Imagine we have many points which some of them are connected together and we want to cluster them.
Please take a look at the following figure.
If we have the "connectivity matrix" of points, how we can cluster them in two group (groups of connected points)?
ConnectivityMatrix=
[1 2
1 3
2 4
2 3
2 1
3 1
3 2
3 4
4 3
4 2
5 8
5 7
5 6
6 5
6 7
7 6
7 5
7 8
8 7
8 5]
The final result should be nodes of 1,2,3,4 in a first group(cluster) and nodes of 5,6,7,8 in a second group (cluster).
Here is some code to get you started. I basically implemented Depth First Search... a very crude implementation of it anyway.
Depth First Search is an algorithm that is used for the traversal of trees. Graphs are essentially a special case of trees where there are leaf nodes that connect back to the root. The basic algorithm for Depth First Search is as so:
Start at the root of the tree and add this to a stack
For each node that is connected to the root, add this onto the stack and place this in a list of visited nodes
While there is still a node on the stack...
Pop off a node off the stack
Check the visited nodes list. If this is a node we have already visited, skip.
Else, visit any nodes that are connected to this node we popped off and add to the stack
If we have disconnected graphs like what you have above, we basically run Depth First Search multiple times. Each time will be for one cluster. After one Depth First Search result, we will discover nodes that belong to one cluster. We restart Depth First Search again with any node that we have not touched yet, which will be a node from another cluster that we have not visited. As you clearly have two clusters in your graph structure, we will have to run Depth First Search two times. This is commonly referred to as finding all connected components in an overall graph.
To find the Connected Components, here are the steps that I did:
Create a connectivity matrix
Initialize a Boolean list that tells us whether or not we have visited a node in your graph
Initialize an empty cluster list
Initialize an empty stack that contains which nodes we should visit.
While there is at least one node we need to visit...
Find such a node
Initialize our stack to contain this node
While our stack is not empty
Pop off a node from the stack
If we have visited this node, continue
Else mark as visited
Retrieve all nodes connected to this node
Remove those nodes that are not on the stack in (4)
Add these nodes to the stack and the cluster list
Once the stack is empty, we have a list of all of the nodes contained in a single cluster. Add this cluster to a final list.
Repeat 1 - 6 until we visit all nodes
Without further ado, this is the code. Bear in mind this is not battle tested. If you have graph structures that will generate an error, that'll be on your own to fix :)
ConnectivityMatrix = [1 2
1 3
2 4
2 3
2 1
3 1
3 2
3 4
4 3
4 2
5 8
5 7
5 6
6 5
6 7
7 6
7 5
7 8
8 7
8 5];
%// Find all possible node IDs
nodeIDs = unique(ConnectivityMatrix(:));
%// Flag that tells us if there are any nodes we should visit
nodeIDList = false(1,numel(nodeIDs));
%// Stores our list of clusters
clusterList = {};
%// Keeps track of how many clusters we have
counter = 1;
%// Stack - initialize to empty
stackNodes = [];
%// While there is at least one node we need to visit
while (~all(nodeIDList))
% Find any node
stackNodes = find(nodeIDList == false, 1);
% Initialize our stack to contain this node
nodesCluster = stackNodes;
%// While our stack is not empty
while (~isempty(stackNodes))
% Grab the node off the stack and pop off
node = stackNodes(end);
stackNodes(end) = [];
%// If we have marked this node as visited, skip
if (nodeIDList(node))
continue;
end
%// Mark as visited
nodeIDList(node) = true;
%// Retrieve all nodes connected to this node
connectedNodes = ConnectivityMatrix(ConnectivityMatrix(:,1) == node, :);
nodesToVisit = unique(connectedNodes(:,2).');
%// Remove those already visited
visitedNodes = ~nodeIDList(nodesToVisit);
finalNodesToVisit = nodesToVisit(visitedNodes);
%// Add to cluster
nodesCluster = unique([nodesCluster finalNodesToVisit]);
%// Add to stack
stackNodes = unique([stackNodes finalNodesToVisit]);
end
%// Add connected components to its own cluster
clusterList{counter} = nodesCluster;
counter = counter + 1;
end
Once we have run this code, we can display our clusters like so:
celldisp(clusterList)
clusterList{1} =
1 2 3 4
clusterList{2} =
5 6 7 8
As such, cluster #1 contains nodes 1,2,3,4 while cluster #2 contains nodes 5,6,7,8.
Bear in mind that this code will only work if you sequentially label your nodes as you did in your diagram. You can't skip any label numbers (i.e. you can't do 1,2,4,6,9, etc. This should be 1,2,3,4,5).
Good luck!
You can use "off-the-shelf" Matlab commands for this problem. For example, you can use graphconncomp.
The answer from rayryeng is pretty good. However, here are some details I would like to point out:
"numel(nodeIDs)" would bring in possible errors if your node label is not sequentially (i.e., you have 10 nodes and the maximum node label is 20). You could switch "numel(nodeIDs)" to "max(nodeIDs) or a larger value."
I also believe the following code would bring in some problems (i.e., some nodes are missing and become isolated nodes) when utilizing this function:
connectedNodes = ConnectivityMatrix(ConnectivityMatrix(:,1) == node, :);
nodesToVisit = unique(connectedNodes(:,2).');
I modified the simple two lines with the following messy code:
connectedNodes1 = ConnectivityMatrix (ConnectivityMatrix (:,1) == node, :);
connectedNodes2 = ConnectivityMatrix (ConnectivityMatrix (:,2) == node, :);
AC=connectedNodes1(:,2).';
AD=connectedNodes2(:,1).';
ACA=reshape(AC,[],1);
ADA=reshape(AD,[],1);
AE= [ACA; ADA] ;
AEA=reshape(AE,[],1);
AEA=AEA';
nodesToVisit = unique(AEA);
After modifying these two points, there are no problems with rayryeng's initial code.

How do I create a zero-one matrix out of all possible paths from source to destination node?

Suppose I have network with 4 nodes and 4 edges as shown below
All possible paths from source node 1 to destination node 4 are
{{A,C}, {A,B,D}}
From these possible paths, I want to create a zero-one matrix, where number of rows represents the number of paths and number of columns represents the number of edges in a network. The matrix takes 1 as entry if the edge is present in the path to reach destination node, and 0 as its entry if it is not. For above network, path matrix is
[1 0 1 0; 1 1 0 1]
This is because all possible paths are two and hence two rows, and there are 4 edges and hence 4 columns.
I have a large complex network with 15 nodes and 107 paths to which I need to apply this technique. Please help.

Find the cross node for number of nodes in Gremlin?

I have a number of nodes connected through intermediate node of other type. Like on picture
There are can be multiple middle nodes.
I need to find all the middle nodes for a given number of nodes and sort it by number of links between my initial nodes. In my example given A, B, C, D it should return node E (4 links) folowing node F (3 links). Is this possible? If not may be it can be done using multiple requests?
With the toy graph.
Let's assume vertex 1 and 6 are given:
g = TinkerGraphFactory.createTinkerGraph()
m=[:]
g.v(1,6).both.groupCount(m)
m.sort{-it.value}
Sorted Map m contains:
==>v[3]=2
==>v[2]=1
==>v[4]=1

How can I cluster points which are connected in MATLAB?

Imagine we have many points which some of them are connected together and we want to cluster them.
Please take a look at the following figure.
If we have the "connectivity matrix" of points, how we can cluster them in two group (groups of connected points)?
ConnectivityMatrix=
[1 2
1 3
2 4
2 3
2 1
3 1
3 2
3 4
4 3
4 2
5 8
5 7
5 6
6 5
6 7
7 6
7 5
7 8
8 7
8 5]
The final result should be nodes of 1,2,3,4 in a first group(cluster) and nodes of 5,6,7,8 in a second group (cluster).
Here is some code to get you started. I basically implemented Depth First Search... a very crude implementation of it anyway.
Depth First Search is an algorithm that is used for the traversal of trees. Graphs are essentially a special case of trees where there are leaf nodes that connect back to the root. The basic algorithm for Depth First Search is as so:
Start at the root of the tree and add this to a stack
For each node that is connected to the root, add this onto the stack and place this in a list of visited nodes
While there is still a node on the stack...
Pop off a node off the stack
Check the visited nodes list. If this is a node we have already visited, skip.
Else, visit any nodes that are connected to this node we popped off and add to the stack
If we have disconnected graphs like what you have above, we basically run Depth First Search multiple times. Each time will be for one cluster. After one Depth First Search result, we will discover nodes that belong to one cluster. We restart Depth First Search again with any node that we have not touched yet, which will be a node from another cluster that we have not visited. As you clearly have two clusters in your graph structure, we will have to run Depth First Search two times. This is commonly referred to as finding all connected components in an overall graph.
To find the Connected Components, here are the steps that I did:
Create a connectivity matrix
Initialize a Boolean list that tells us whether or not we have visited a node in your graph
Initialize an empty cluster list
Initialize an empty stack that contains which nodes we should visit.
While there is at least one node we need to visit...
Find such a node
Initialize our stack to contain this node
While our stack is not empty
Pop off a node from the stack
If we have visited this node, continue
Else mark as visited
Retrieve all nodes connected to this node
Remove those nodes that are not on the stack in (4)
Add these nodes to the stack and the cluster list
Once the stack is empty, we have a list of all of the nodes contained in a single cluster. Add this cluster to a final list.
Repeat 1 - 6 until we visit all nodes
Without further ado, this is the code. Bear in mind this is not battle tested. If you have graph structures that will generate an error, that'll be on your own to fix :)
ConnectivityMatrix = [1 2
1 3
2 4
2 3
2 1
3 1
3 2
3 4
4 3
4 2
5 8
5 7
5 6
6 5
6 7
7 6
7 5
7 8
8 7
8 5];
%// Find all possible node IDs
nodeIDs = unique(ConnectivityMatrix(:));
%// Flag that tells us if there are any nodes we should visit
nodeIDList = false(1,numel(nodeIDs));
%// Stores our list of clusters
clusterList = {};
%// Keeps track of how many clusters we have
counter = 1;
%// Stack - initialize to empty
stackNodes = [];
%// While there is at least one node we need to visit
while (~all(nodeIDList))
% Find any node
stackNodes = find(nodeIDList == false, 1);
% Initialize our stack to contain this node
nodesCluster = stackNodes;
%// While our stack is not empty
while (~isempty(stackNodes))
% Grab the node off the stack and pop off
node = stackNodes(end);
stackNodes(end) = [];
%// If we have marked this node as visited, skip
if (nodeIDList(node))
continue;
end
%// Mark as visited
nodeIDList(node) = true;
%// Retrieve all nodes connected to this node
connectedNodes = ConnectivityMatrix(ConnectivityMatrix(:,1) == node, :);
nodesToVisit = unique(connectedNodes(:,2).');
%// Remove those already visited
visitedNodes = ~nodeIDList(nodesToVisit);
finalNodesToVisit = nodesToVisit(visitedNodes);
%// Add to cluster
nodesCluster = unique([nodesCluster finalNodesToVisit]);
%// Add to stack
stackNodes = unique([stackNodes finalNodesToVisit]);
end
%// Add connected components to its own cluster
clusterList{counter} = nodesCluster;
counter = counter + 1;
end
Once we have run this code, we can display our clusters like so:
celldisp(clusterList)
clusterList{1} =
1 2 3 4
clusterList{2} =
5 6 7 8
As such, cluster #1 contains nodes 1,2,3,4 while cluster #2 contains nodes 5,6,7,8.
Bear in mind that this code will only work if you sequentially label your nodes as you did in your diagram. You can't skip any label numbers (i.e. you can't do 1,2,4,6,9, etc. This should be 1,2,3,4,5).
Good luck!
You can use "off-the-shelf" Matlab commands for this problem. For example, you can use graphconncomp.
The answer from rayryeng is pretty good. However, here are some details I would like to point out:
"numel(nodeIDs)" would bring in possible errors if your node label is not sequentially (i.e., you have 10 nodes and the maximum node label is 20). You could switch "numel(nodeIDs)" to "max(nodeIDs) or a larger value."
I also believe the following code would bring in some problems (i.e., some nodes are missing and become isolated nodes) when utilizing this function:
connectedNodes = ConnectivityMatrix(ConnectivityMatrix(:,1) == node, :);
nodesToVisit = unique(connectedNodes(:,2).');
I modified the simple two lines with the following messy code:
connectedNodes1 = ConnectivityMatrix (ConnectivityMatrix (:,1) == node, :);
connectedNodes2 = ConnectivityMatrix (ConnectivityMatrix (:,2) == node, :);
AC=connectedNodes1(:,2).';
AD=connectedNodes2(:,1).';
ACA=reshape(AC,[],1);
ADA=reshape(AD,[],1);
AE= [ACA; ADA] ;
AEA=reshape(AE,[],1);
AEA=AEA';
nodesToVisit = unique(AEA);
After modifying these two points, there are no problems with rayryeng's initial code.

Generate data from kmean's clusters

So I have an input vector, A which is a row vector with 3,000 data points. Using MATLAB, I found 3 cluster centres for A.
Now that I have the 3 cluster centres, I have another row Vector B with 3000 points. The elements of B have one of three values: 1, 2 or 3. So say for e.g if the first 5 elements of B are
B(1,1:5) = [ 1 , 3, 3, 2, 1]
This means that B(1,1) belongs to cluster 1, B(1,2) belongs to cluster 3 etc. What I am trying to do is for every data point in the row vector B, I look at what cluster it belongs to by reading its value and then replace it with a data value from that cluster.
So after the above is done, the first 5 elements of B would look like:
B(1,1:5) = [ 2.7 , 78.4, 55.3, 19, 0.3]
Meaning that B(1,1) is a data value picked from the first cluster (that we got from A), B(1,2) is a data value picked from the third cluster (that we got from A) etc.
k-means only keeps means, it does not model the data distribution.
You cannot generate artificial data sensibly from k-means clusters without additional statistics and distribution assumptions.