Union of disjoint sets - merge

I am looking at disjoint sets that support the function of the Union.
The technique of height reduction of a tree:
We always merge the smaller tree to the greater one, i.e. we make the root of the smaller tree to point to the root of the greater tree.
A tree is greater than an other if it has more nodes.
Each node is a struct with fields: some information for the element, the pointer "parent" to the parent node, and a counter "count", that is used only if the node is the root and contains the number of the nodes at the up-tree.
The following algorithm merges two up trees:
pointer UpTreeUnion(pointer S,T) {
if (S == NULL OR P == NULL) return;
if (S->count >= T->count) {
S->count = S->count + T->count;
T->parent = S;
return S;
}
else {
T->count = T->count + S->count;
S->parent = T;
return T;
}
}
Consider an implementation of disjoint sets with union, where there can be at most k disjoint sets.
The implementation uses a hash table A[0.. max-1] at which there are stored keys based on the method ordered double hashing.
Let h1 and h2 be the primary and the secondary hash function, respectively. A contains the keys of the nodes of all of the above trees and also a pointer to the corresponding node for each of them.
I want to write an algorithm that takes as parameters the keys of two nodes and merges the up-trees to which the nodes belong (the nodes can belong to any up-trees, even at the same in which case it appears an appropriate message). At merging, we should apply techniques of path compression and height reduction.
Could you give me a hint how we could do this?
Suppose that we have this array:
At the beginning the nodes will be like that:
Then if k1=100, k2=5, after applying the algorithm, will we get this?
Then if we have k1=59, k2=5, we will get the following:
Right? Then applying the path compression we start doing this:
tmp=B
while (B->parent!=B)
parent=B->parent;
tmp->parent=B;
tmp=parent;
}
So we will have parent=F, tmp->parent=B, tmp=F.
How do we continue?
Having then k1=14, k2=59 we get this:

First, when you get keys, you need to find them in the hash table.
Hash table contains entries: (key, pointer-to-node).
Let's say you want to find key k. You check:
A[h1(k) + 0*h2(k) mod size(A)] - if it contains key k, you read corresponding pointer-to-node.
If there is something other than k, you check:
A[h1(k) + 1*h2(k) mod size(A)],
A[h1(k) + 2*h2(k) mod size(A)],
A[h1(k) + i*h2(k) mod size(A)]... until you find key k.
Now that you have pointers to 2 nodes, you need to find roots of the trees those nodes belong to. To find the root, you go up the tree until you reach root node. You use parent pointer of each node for it and you can assume that root's parent pointer points to itself for example.
Now that you have 2 roots, you can merge them using upTreeUnion.
Path compression works like this:
After you have found root of a tree for node s, you follow the path from s to root one more time and set parent pointer of every node on the path to the root.
Update:
Algorithm(k1,k2){
int i=0,j=0;
int i1,i2;
while (i<max and A[i1 = h1(k1)+i*h2(k1) mod size(A)]->key!=k1){
i++;
}
while (j<max and A[i2 = h1(k2)+j*h2(k2) mod size(A)]->key!=k2){
j++;
}
if (A[i1]->key!=k1) return;
if (A[i2]->key!=k2) return;
pointer node1,node2,root1,root2;
node1=A[i1]->node;
node2=A[i2]->node;
root1=UpTreeFind(node1);
root2=UpTreeFind(node2);
if (root1==root2){
printf("The nodes belong to the same up tree");
return;
}
// path compression
pointer tmp,tmpParent;
tmp = node1;
while (tmp->parent!=root1) {
tmpParent=tmp->parent;
tmp->parent=root1;
tmp=tmpParent;
}
tmp = node2;
while (tmp->parent!=root2) {
tmpParent=tmp->parent;
tmp->parent=root2;
tmp=tmpParent;
}
UpTreeUnion(root1,root2);
}

Related

How To Use kmedoids from pyclustering with set number of clusters

I am trying to use k-medoids to cluster some trajectory data I am working with (multiple points along the trajectory of an aircraft). I want to cluster these into a set number of clusters (as I know how many types of paths there should be).
I have found that k-medoids is implemented inside the pyclustering package, and am trying to use that. I am technically able to get it to cluster, but I do not know how to control the number of clusters. I originally thought it was directly tied to the number of elements inside what I called initial_medoids, but experimentation shows that it is more complicated than this. My relevant code snippet is below.
Note that D holds a list of lists. Each list corresponds to a single trajectory.
def hausdorff( u, v):
d = max(directed_hausdorff(u, v)[0], directed_hausdorff(v, u)[0])
return d
traj_count = len(traj_lst)
D = np.zeros((traj_count, traj_count))
for i in range(traj_count):
for j in range(i + 1, traj_count):
distance = hausdorff(traj_lst[i], traj_lst[j])
D[i, j] = distance
D[j, i] = distance
from pyclustering.cluster.kmedoids import kmedoids
initial_medoids = [104, 345, 123, 1]
kmedoids_instance = kmedoids(traj_lst, initial_medoids)
kmedoids_instance.process()
cluster_lst = kmedoids_instance.get_clusters()[0]
num_clusters = len(np.unique(cluster_lst))
print('There were %i clusters found' %num_clusters)
I have a total of 1900 trajectories, and the above-code finds 1424 clusters. I had expected that I could control the number of clusters through the length of initial_medoids, as I did not see any option to input the number of clusters into the program, but this seems unrelated. Could anyone guide me as to the mistake I am making? How do I choose the number of clusters?
In case of requirement to obtain clusters you need to call get_clusters():
cluster_lst = kmedoids_instance.get_clusters()
Not get_clusters()[0] (in this case it is a list of object indexes in the first cluster):
cluster_lst = kmedoids_instance.get_clusters()[0]
And that is correct, you can control amount of clusters by initial_medoids.
It is true you can control the number of cluster, which correspond to the length of initial_medoids.
The documentation is not clear about this. The get__clusters function "Returns list of medoids of allocated clusters represented by indexes from the input data". so, this function does not return the cluster labels. It returns the index of rows in your original (input) data.
Please check the shape of cluster_lst in your example, using .get_clusters() and not .get_clusters()[0] as annoviko suggested. In your case, this shape should be (4,). So, you have a list of four elements (clusters), each containing the index or rows in your original data.
To get, for example, data from the first cluster, use:
kmedoids_instance = kmedoids(traj_lst, initial_medoids)
kmedoids_instance.process()
cluster_lst = kmedoids_instance.get_clusters()
traj_lst_first_cluster = traj_lst[cluster_lst[0]]

system verilog 2 dimensional dynamic array randomization

I am trying to use system verilog constraint solver to solve the following problem statement :
We have N balls each with unique weight and these balls need to be distributed into groups , such that weight of each group does not exceed a threshold ( MAX_WEIGHT) .
Now i want to find all such possible solutions . The code I wrote in SV is as follows :
`define NUM_BALLS 5
`define MAX_WEIGHT_BUCKET 100
class weight_distributor;
int ball_weight [`NUM_BALLS];
rand int unsigned solution_array[][];
constraint c_solve_bucket_problem
{
foreach(solution_array[i,j]) {
solution_array[i][j] inside {ball_weight};
//unique{solution_array[i][j]};
foreach(solution_array[ii,jj])
if(!((ii == i) & (j == jj))) solution_array[ii][jj] != solution_array[i][j];
}
foreach(solution_array[i,])
solution_array[i].sum() < `MAX_WEIGHT_BUCKET;
}
function new();
ball_weight = {10,20,30,40,50};
endfunction
function void post_randomize();
foreach(solution_array[i,j])
$display("solution_array[%0d][%0d] = %0d", i,j,solution_array[i][j]);
$display("solution_array size = %0d",solution_array.size);
endfunction
endclass
module top;
weight_distributor weight_distributor_o;
initial begin
weight_distributor_o = new();
void'(weight_distributor_o.randomize());
end
endmodule
The issue i am facing here is that i want the size of both the dimentions of the array to be randomly decided based on the constraint solution_array[i].sum() < `MAX_WEIGHT_BUCKET; . From my understanding of SV constraints i believe that the size of the array will be solved before value assignment to the array .
Moreover i also wanted to know if unique could be used for 2 dimentional dynamic array .
You can't use the random number generator (RNG) to enumerate all possible solutions of your problem. It's not built for this. An RNG can give you one of these solutions with each call to randomize(). It's not guaranteed, though, that it gives you a different solution with each call. Say you have 3 solutions, S0, S1, S2. The solver could give you S1, then S2, then S1 again, then S1, then S0, etc. If you know how many solutions there are, you can stop once you've seen them all. Generally, though, you don't know this beforehand.
What an RNG can do, however, is check whether a solution you provide is correct. If you loop over all possible solutions, you can filter out only the ones that are correct. In your case, you have N balls and up to N groups. You can start out by putting each ball into one group and trying if this is a correct solution. You can then put 2 balls into one group and all the other N - 2 into a groups of one. You can put two other balls into one group and all the others into groups of one. You can start putting 2 balls into one group, 2 other balls into one group and all the other N - 4 into groups of one. You can continue this until you put all N balls into the same group. I'm not really sure how you can easily enumerate all solutions. Combinatorics can help you here. At each step of the way you can check whether a certain ball arrangement satisfies the constraints:
// Array describing an arrangement of balls
// - the first dimension is the group
// - the second dimension is the index within the group
typedef unsigned int unsigned arrangement_t[][];
// Function that gives you the next arrangement to try out
function arrangement_t get_next_arrangement();
// ...
endfunction
arrangement = get_next_arrangement();
if (weight_distributor_o.randomize() with {
solution.size() == arrangement.size();
foreach (solution[i]) {
solution[i].size() == arrangement[i].size();
foreach (solution[i][j])
solution[i][j] == arrangement[i][j];
}
})
all_solutions.push_back(arrangement);
Now, let's look at weight_distributor. I'd recommend you write each requirement in an own constraint as this makes the code much more readable.
You can shorten you uniqueness constraint that you wrote as a double loop to using the unique operator:
class weight_distributor;
// ...
constraint unique_balls {
unique { solution_array };
}
endclass
You already had a constraint that each group can have at most MAX_WEIGHT in it:
class weight_distributor;
// ...
constraint max_weight_per_group {
foreach (solution_array[i])
solution_array[i].sum() < `MAX_WEIGHT_BUCKET;
}
endclass
Because of the way array sizes are solved, it's not possible to write constraints that will ensure that you can compute a valid solution using simple calls randomize(). You don't need this, though, if you want to check whether a solution is valid. This is due to the constraints on array sizes in the between arrangement and solution_array.
Try this.!
class ABC;
rand bit[3:0] md_array [][]; // Multidimansional Arrays with unknown size
constraint c_md_array {
// First assign the size of the first dimension of md_array
md_array.size() == 2;
// Then for each sub-array in the first dimension do the following:
foreach (md_array[i]) {
// Randomize size of the sub-array to a value within the range
md_array[i].size() inside {[1:5]};
// Iterate over the second dimension
foreach (md_array[i][j]) {
// Assign constraints for values to the second dimension
md_array[i][j] inside {[1:10]};
}
}
}
endclass
module tb;
initial begin
ABC abc = new;
abc.randomize();
$display ("md_array = %p", abc.md_array);
end
endmodule
https://www.chipverify.com/systemverilog/systemverilog-foreach-constraint

Anylogic referencing columns in a collection

I am using a collection to represent available trucks in a system. I am using a 1 or 0 for a given index number, using a 1 to say that indexed truck is available. I am then trying to assign that index number to a customer ID. I am trying to randomly select an available truck from those listed as available. I am getting an error saying the left-hand side of an assignment must be a variable and highlighting the portion of the code reading Available_Trucks() = 1. This is the code:
agent.ID = randomWhere(Available_Trucks, Available_Trucks() = 1);
The way you are doing it won't work... randomWhere when applied to a collection of integers, will return the element of the collection (in this case 1 or 0).
So doing
randomWhere(Available_Trucks,at->at==1); //this is the right synthax
will return 1 always since that's the value of the number chosen in the collection. So what you need is to get the index of the number of the collection that is equal to 1. But you will have to create a function to do that yourself... something like this (probably not the best way but it works: agent.ID=getRandomAvailbleTruck(Available_Trucks);
And the function getRandomAvailbleTruck will take as an argument a collection (arrayList probably).. it will return -1 if there is no availble truck
int availableTrucks=count(collection,c->c==1);
if(availableTrucks==0) return -1;
int rand=uniform_discr(1,availableTrucks);
int i=0;
int j=0;
while(i<rand){
if(collection.get(j)==1){
i++;
if(i==rand){
return j;
}
}
j++;
}
return -1;
Now another idea is to instead of using 0 and 1 for the availability, you can use correlative numbers: 1,2,3,4,5 ... etc and use a 0 if it's not available. For instance if truck 3 is not availble, the array will be 1,2,0,4,5 and if it's available it will be 1,2,3,4,5.
In that case you can use
agent.ID=randomTrue(available_trucks,at->at>0);
But you will get an error if there is no available truck, so check that.
Nevertheless, what you are doing is horrible practice... And there is a much easier way to do it if you put the availability in your truck if your truck is an agent...
Then you can just do
Truck truck=randomWhere(trucks,t->t.available==1);
if(truck!=null)
agent.ID=truck.ID;

Overriding `Comparison method violates its general contract` exception

I have a comparator like this:
lazy val seq = mapping.toSeq.sortWith { case ((_, set1), (_, set2)) =>
// Just propose all the most connected nodes first to the users
// But also allow less connected nodes to pop out sometimes
val popOutChance = random.nextDouble <= 0.1D && set2.size > 5
if (popOutChance) set1.size < set2.size else set1.size > set2.size
}
It is my intention to compare sets sizes such that smaller sets may appear higher in a sorted list with 10% chance.
But compiler does not let me do that and throws an Exception: java.lang.IllegalArgumentException: Comparison method violates its general contract! once I try to use it in runtime. How can I override it?
I think the problem here is that, every time two elements are compared, the outcome is random, thus violating the transitive property required of a comparator function in any sorting algorithm.
For example, let's say that some instance a compares as less than b, and then b compares as less than c. These results should imply that a compares as less than c. However, since your comparisons are stochastic, you can't guarantee that outcome. In fact, you can't even guarantee that a will be less than b next time they're compared.
So don't do that. No sort algorithm can handle it. (Such an approach also violates the referential transparency principle of functional programming and will make your program much harder to reason about.)
Instead, what you need to do is to decorate your map's members with a randomly assigned weighting - before attempting to sort them - so that they can be sorted consistently. However, since this happens at the start of a sort operation, the result of the sort will be different each time, which I think is what you're looking for.
It's not clear what type mapping has in your example, but it appears to be something like: Map[Any, Set[_]]. (You can replace the types as required - it's not that important to this approach. For example, say mapping actually has the type Map[String, Set[SomeClass]], then you would replace references below to Any with String and Set[_] to Set[SomeClass].)
First, we'll create a case class that we'll use to score and compare the map elements. Then we'll map the contents of mapping to a sequence of elements of this case class. Next, we sort those elements. Finally, we extract the tuple from the decorated class. The result should look something like this:
final case class Decorated(x: (Any, Set[_]), rand: Double = random.nextDouble)
extends Ordered[Decorated] {
// Calculate a rank for this element. You'll need to change this to suit your precise
// requirements. Here, if rand is less than 0.1 (a 10% chance), I'm adding 5 to the size;
// otherwise, I'll report the actual size. This allows transitive comparisons, since
// rand doesn't change once defined. Values are negated so bigger sets come to the fore
// when sorted.
private def rank: Int = {
if(rand < 0.1) -(x._2.size + 5)
else -x._2.size
}
// Compare this element with another, by their ranks.
override def compare(that: Decorated): Int = rank.compare(that.rank)
}
// Now sort your mapping elements as follows and convert back to tuples.
lazy val seq = mapping.map(x => Decorated(x)).toSeq.sorted.map(_.x)
This should put the elements with larger sets towards the front, but there's 10% chance that sets appear 5 bigger and so move up the list. The result will be different each time the last line is re-executed, since map will create new random values for each element. However, during sorting, the ranks will be fixed and will not change.
(Note that I'm setting the rank to a negative value. The Ordered[T] trait sorts elements in ascending order, so that - if we sorted purely by set size - smaller sets would come before larger sets. By negating the rank value, sorting will put larger sets before smaller sets. If you don't want this behavior, remove the negations.)

how to get a parentNode's index i using d3.js

Using d3.js, were I after (say) some value x of a parent node, I'd use:
d3.select(this.parentNode).datum().x
What I'd like, though, is the data (ie datum's) index. Suggestions?
Thanks!
The index of an element is only well-defined within a collection. When you're selecting just a single element, there's no collection and the notion of an index is not really defined. You could, for example, create a number of g elements and then apply different operations to different (overlapping) subsets. Any individual g element would have several indices, depending on the subset you consider.
In order to do what you're trying to achieve, you would have to keep a reference to the specific selection that you want to use. Having this and something that identifies the element, you can then do something like this.
var value = d3.select(this.parentNode).datum().x;
var index = -1;
selection.each(function(d, i) { if(d.x == value) index = i; });
This relies on having an attribute that uniquely identifies the element.
If you have only one selection, you could simply save the index as another data attribute and access it later.
var gs = d3.selectAll("g").data(data).append("g")
.each(function(d, i) { d.index = i; });
var something = gs.append(...);
something.each(function() {
d3.select(this.parentNode).datum().index;
});