Avg distance between points in a cluster

Sounds like I got the concept but can't seem to get the implementation correct. I have a cluster (an ArrayList) with multiple points, and I want to calculate the average distance between them. For example, for points in the cluster (A, B, C, D, E, F, ..., N): distance(A, B), distance(A, C), distance(A, D), ..., distance(A, N), distance(B, C), distance(B, D), ..., distance(B, N), and so on.
Thanks in advance.

You don't want to double count any segment, so your algorithm should be a double for loop: the outer loop runs from A to M (you can skip N, because there'd be nothing left for it to pair with), and the inner loop runs from the point after the current one to N, calculating each distance. You add up all the distances and divide by the number of pairs, n(n-1)/2. Should be pretty simple.
There aren't any standard algorithms for improving on this that I'm aware of, and this isn't a widely studied problem. I'd guess that you could get a pretty reasonable estimate (if an estimate is useful) by sampling distances from each point to a handful of others. But that's a guess.
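For what it's worth, a sampling-based estimate might look something like this; it's just a sketch of the guess above (Python, with points as 2-D tuples; the sample size per point is an arbitrary choice):

```python
import math
import random

def approx_avg_distance(points, samples_per_point=5, seed=0):
    """Estimate the average pairwise distance by sampling a few partners per point."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for i, (x1, y1) in enumerate(points):
        others = [p for j, p in enumerate(points) if j != i]
        for (x2, y2) in rng.sample(others, min(samples_per_point, len(others))):
            total += math.hypot(x1 - x2, y1 - y2)
            count += 1
    return total / count if count else 0.0
```

For small clusters this degenerates to the exact average (every partner gets sampled), which makes it easy to check.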
(After seeing your code example) Here's another try:
public double avgDistanceInCluster() {
    double totDistance = 0.0;
    for (int i = 0; i < bigCluster.length - 1; i++) {
        for (int j = i + 1; j < bigCluster.length; j++) {
            totDistance += distance(bigCluster[i], bigCluster[j]);
        }
    }
    // Divide by the number of pairs, n(n-1)/2 -- not by n(n-1) and then by 2 again.
    return totDistance / (bigCluster.length * (bigCluster.length - 1) / 2.0);
}
Notice that the limit for the first loop is different.
Distance between two points is probably sqrt((x1 - x2)^2 + (y1 - y2)^2).

Thanks for all the help. Sometimes, after explaining the question on a forum, the answer just pops into your mind. This is what I ended up doing.
I have a cluster of points, and I need to calculate the average distance between pairs of points in the cluster. So this is what I did. I am sure someone will come up with a better answer; if so, please drop a note. Thanks in advance.
/**
 * Calculate avg distance between points in cluster
 * @return the average pairwise distance
 */
public double avgDistanceInCluster() {
    Stack<Double> holder = new Stack<Double>();
    for (int i = 0; i < cluster.size(); i++) {
        System.out.println(cluster.get(i));
        for (int j = i + 1; j < cluster.size(); j++) {
            // The distance between two 1-D points is the absolute difference,
            // not the average of the two values.
            holder.push(Math.abs(cluster.get(i) - cluster.get(j)));
        }
    }
    int pairCount = holder.size();
    double avgClusterDist = 0;
    while (!holder.isEmpty()) { // popping while iterating the same stack is unsafe
        avgClusterDist += holder.pop(); // '+=', not '=+' (which just assigns)
        System.out.println(avgClusterDist);
    }
    // Average over the number of pairs, n(n-1)/2, not the number of points.
    return pairCount > 0 ? avgClusterDist / pairCount : 0.0;
}

Related

Minimum distance between two axis-aligned boxes in n-dimensions

Question: How can I efficiently compute the minimum distance between two axis-aligned boxes in n-dimensions?
Box format: The boxes, A and B, are given by their minimum and maximum points, A_min, A_max, B_min, B_max, each of which is a n-dimensional vector. That is, the boxes may be written mathematically as the following cartesian products of intervals:
A = [A_min(1), A_max(1)] x [A_min(2), A_max(2)] x ... x [A_min(n), A_max(n)]
B = [B_min(1), B_max(1)] x [B_min(2), B_max(2)] x ... x [B_min(n), B_max(n)]
Picture: here is a picture demonstrating the idea in 2D:
Note: I ask this question, and answer it myself, because this question (in its general n-dimensional form) appears to be absent from Stack Overflow even after all these years. Good answers to this question are hard to find on the internet more generally. After googling around, I eventually had to figure this out myself, and am posting here to spare future people the same trouble.
The minimum distance between the boxes is given by:
dist = sqrt(||u||^2 + ||v||^2)
where
u = max(0, A_min - B_max)
v = max(0, B_min - A_max)
The maximization is done entrywise on the vectors (i.e., max(0, w) means replace all negative entries of vector w with zero, but leave the positive entries unchanged). The notation ||w|| means the euclidean norm of the vector w (square root of the sum of the squares of the entries).
This does not require any case-by-case analysis, and works for any dimension regardless of where the boxes are with respect to each other.
Python code:
import numpy as np

def boxes_distance(A_min, A_max, B_min, B_max):
    delta1 = A_min - B_max
    delta2 = B_min - A_max
    u = np.maximum(0, delta1)  # entrywise max(0, .)
    v = np.maximum(0, delta2)
    dist = np.linalg.norm(np.concatenate([u, v]))
    return dist
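As a quick sanity check of the formula (my own example, not from the original post): take A = [0,1] x [0,1] and B = [3,5] x [0,1]. The boxes overlap in y and are separated by a gap of 2 in x, so the distance should be exactly 2:

```python
import numpy as np

A_min, A_max = np.array([0.0, 0.0]), np.array([1.0, 1.0])
B_min, B_max = np.array([3.0, 0.0]), np.array([5.0, 1.0])

u = np.maximum(0, A_min - B_max)  # entrywise max(0, .) -> (0, 0)
v = np.maximum(0, B_min - A_max)  # -> (2, 0)
dist = float(np.linalg.norm(np.concatenate([u, v])))  # -> 2.0
```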
This is the equivalent code in TypeScript without the use of any libraries, though the input parameters were slightly different in my case:
type Rect = { x: number; y: number; length: number; width: number };

export function boxesDistance(a: Rect, b: Rect) {
    const deltas = [a.x - b.x - b.width, a.y - b.y - b.length, b.x - a.x - a.width, b.y - a.y - a.length];
    const sum = deltas.reduce((total, d) => {
        return d > 0 ? total + d ** 2 : total;
    }, 0);
    return Math.sqrt(sum);
}
The distance between two axis-aligned bounding boxes (AABB) can be computed as follows:
Find the intersection box of two input boxes, which can be expressed in C++:
Box Box::intersection( const Box & b ) const
{
    Box res;
    for ( int i = 0; i < V::elements; ++i )
    {
        res.min[i] = std::max( min[i], b.min[i] );
        res.max[i] = std::min( max[i], b.max[i] );
    }
    return res;
}
where min and max are the two corner points of a box. The "intersection" will be inverted (res.min[i] > res.max[i]) in every dimension where the two input boxes do not actually overlap.
Then the squared distance between two boxes is:
T Box::getDistanceSq( const Box & b ) const
{
    auto ibox = intersection( b );
    T distSq = 0;
    for ( int i = 0; i < V::elements; ++i )
        if ( ibox.min[i] > ibox.max[i] )
            distSq += sqr( ibox.min[i] - ibox.max[i] );
    return distSq;
}
The function returns zero if input boxes touch or intersect.
The code above was taken from MeshLib and works for arbitrary n-dimensional cases (V::elements = n).

Moving an object along a path with constant speed

I have an object path composed of a polyline (a 3D point array) with points VERY unevenly distributed. I need to move an object along it at constant speed, using a timer with its interval set to 10 ms.
Unevenly distributed points produce variable speed to the human eye. So now I need to decide how to treat this long array of 3D points.
The first idea I had was to subdivide long segments into smaller parts. It works better, but where points are jam-packed the problem persists.
What's the best approach in these cases? Another idea could be to simplify the original path using the Ramer–Douglas–Peucker algorithm and then subdivide it evenly again, but I'm not sure that would fully resolve my problem.
This should be a fairly common problem in many areas of the 3D graphics, so does a proven approach exist?
I made a JavaScript pen for you https://codepen.io/dawken/pen/eYpxRmN?editors=0010 but it should be very similar in any other language. Click on the rect to add points.
You have to maintain a time-dependent distance traveled at constant speed, something like this:
const t = currentTime - startTime;
const distance = (t * speed) % totalLength;
Then you have to find the two consecutive points of the path whose distances from the start bracket the current distance; for that, you store the distance from the start of the path on each point: {x, y, distanceFromStart}. The first point points[i] such that distance < points[i].distanceFromStart is your destination; the point before it, points[i - 1], is your source. You then interpolate linearly between them.
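The distanceFromStart values themselves can be precomputed once with a running sum of segment lengths. A minimal sketch (Python rather than JavaScript; the point format mirrors the {x, y, distanceFromStart} objects described above):

```python
import math

def with_cumulative_distance(points):
    """Annotate each (x, y) point with its distance from the start of the path."""
    annotated = []
    total = 0.0
    prev = None
    for x, y in points:
        if prev is not None:
            total += math.hypot(x - prev[0], y - prev[1])  # length of this segment
        annotated.append({"x": x, "y": y, "distanceFromStart": total})
        prev = (x, y)
    return annotated
```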
Assuming that you have no duplicate points (otherwise you get a division by zero) you could do something like this.
for (let i = 0; i < points.length; i++) {
    if (distance < points[i].distanceFromStart) {
        const pFrom = points[i - 1];
        const pTo = points[i];
        const f = (distance - pFrom.distanceFromStart) / (pTo.distanceFromStart - pFrom.distanceFromStart);
        const x = pFrom.x + (pTo.x - pFrom.x) * f;
        const y = pFrom.y + (pTo.y - pFrom.y) * f;
        ctx.fillRect(x - 1, y - 1, 3, 3);
        break;
    }
}

Relation between Harris detector results in MATLAB and OpenCV

I am working on corner feature detection using the Harris detector. I wrote a program in MATLAB that detects features in an image, using the following code to detect Harris features:
corners = detectHarrisFeatures(img, 'MinQuality', 0.0001);
S = corners.selectStrongest(100);
Then I ported the whole program from MATLAB to OpenCV, and used the following code to detect Harris corner points:
int thresh = 70;
for ( int j = 0; j < dst_norm.rows && cont < 100; j++ )
{
    for ( int i = 0; i < dst_norm.cols && cont < 100; i++ )
    {
        if ( (int) dst_norm.at<float>(j, i) > thresh )
        {
            S.at<int>(cont, 0) = i;
            S.at<int>(cont, 1) = j;
            I.at<int>(cont, 0) = i;
            I.at<int>(cont, 1) = j;
            cont = cont + 1;
        }
    }
}
The extracted regions differed between the two programs, and I discovered that the corner points detected in MATLAB are not the same as those detected in OpenCV.
How can I make the detected corner points from both programs the same?
Is dst_norm an array of Harris corner metric values? In that case you are choosing the first 100 pixels with a corner metric above the threshold, which is incorrect.
In your MATLAB code, detectHarrisFeatures finds points which are local maxima of the corner metric. Then the selectStrongest method selects the 100 points with the highest metric. So, first you have to find the local maxima; then you have to sort them and take the top 100.
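To make those two steps concrete, here is a rough numpy-only sketch of "local maxima, then top 100" (my own simplification: an 8-neighbour maximum test with a threshold, borders ignored; it is not a drop-in replacement for either library's detector):

```python
import numpy as np

def top_harris_corners(metric, k=100, thresh=0.0):
    """Pick the k strongest local maxima of a corner-metric image, as (row, col)."""
    m = np.asarray(metric, dtype=float)
    inner = m[1:-1, 1:-1]
    is_max = inner > thresh  # ignore flat background below the threshold
    for dy in (-1, 0, 1):  # compare each pixel against its 8 neighbours
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            is_max &= inner >= m[1 + dy:m.shape[0] - 1 + dy, 1 + dx:m.shape[1] - 1 + dx]
    ys, xs = np.nonzero(is_max)
    ys, xs = ys + 1, xs + 1  # shift back to full-image coordinates
    order = np.argsort(m[ys, xs])[::-1]  # strongest metric first
    return [(int(y), int(x)) for y, x in zip(ys[order][:k], xs[order][:k])]
```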
Even then, the results will not be exactly the same, because detectHarrisFeatures locates the corners with sub-pixel accuracy, using interpolation.

Procedural structure generation

I have a voxel-based game in development right now, and so far I generate my world using simplex noise. Now I want to generate other structures like rivers, cities and so on, which can't easily be generated, because I split my (practically infinite) world into chunks of 64x128x64. I already generate trees (whose leaves can grow into neighbouring chunks) by generating the trees for a chunk plus the trees for the 8 chunks surrounding it, so no leaves are missing. But for larger structures that gets difficult, when computing one chunk would mean considering chunks within a radius of 16 chunks.
Is there a better way to do this?
Depending on the desired complexity of the generated structure, you may find it useful to first generate it in a separate array, perhaps even a map (a location-to-contents dictionary, useful in case of high sparseness), and then transfer the structure to the world.
As for natural land features, you may want to google how fractals are used in landscape generation.
I know this thread is old and I suck at explaining, but I'll share my approach.
So take, for example, 5x5x5 trees. What you want is for your noise function to return the same value over each 5x5 area of blocks, so that even outside of the chunk you can still check whether you should generate a tree or not.
// Here the returned value is different for every block
float value = simplexNoise(x * frequency, z * frequency) * amplitude;
// Here it will return the same value for a whole area of blocks. Use floorDiv instead of
// plain division, or negative coordinates come out wrong (-3 / 5 should be -1, not 0
// as with truncating integer division).
float value = simplexNoise(Math.floorDiv(x, 5) * frequency, Math.floorDiv(z, 5) * frequency) * amplitude;
And now we'll plant a tree. For this we need to check what x, y, z position the current block has relative to the tree's starting position, so we know which part of the tree this block is.
if (value > 0.8) { // A certain threshold (checking if a tree should be generated in this area)
    int startX = Math.floorDiv(x, 5) * 5; // floor the x value to a multiple of 5 to get the start position
    int startZ = Math.floorDiv(z, 5) * 5; // floor the z value to a multiple of 5 to get the start position
    // Getting the starting height of the trunk (middle of the tree, hence the +2 on
    // the starting x and starting z), which is 1 block above the grass surface
    int startY = height(startX + 2, startZ + 2) + 1;
    int relx = x - startX; // block pos relative to starting position
    int relz = z - startZ;
    for (int j = startY; j < startY + 5; j++) {
        int rely = j - startY;
        byte tile = tree[relx][rely][relz]; // Get the needed block for this part of the tree
        tiles[i][j][k] = tile;
    }
}
The tree 3D array here is almost like a "prefab" of the tree, which you can use to know what block to set at each position relative to the starting point. (I don't know how to explain this well, and English is my fifth language; feel free to improve my answer or create a new one.) I've implemented this in my engine, and it's working. The structures can be as big as you want, with no chunk preloading needed. The one problem with this method is that the trees or structures will be spawned almost on a grid, but that can easily be solved with multiple octaves with different offsets.
So, to recap:
for (int i = 0; i < 64; i++) {
    for (int k = 0; k < 64; k++) {
        int x = chunkPosToWorldPosX(i); // Get world position
        int z = chunkPosToWorldPosZ(k);
        // Same value for each 5x5 area (floorDiv handles negative coordinates correctly)
        float value = simplexNoise(Math.floorDiv(x, 5) * frequency, Math.floorDiv(z, 5) * frequency) * amplitude;
        if (value > 0.8) { // Threshold: should a tree be generated in this area?
            int startX = Math.floorDiv(x, 5) * 5; // floor x to a multiple of 5 to get the start position
            int startZ = Math.floorDiv(z, 5) * 5; // floor z to a multiple of 5 to get the start position
            // Trunk start height (middle of the tree, hence +2), 1 block above the grass surface
            int startY = height(startX + 2, startZ + 2) + 1;
            int relx = x - startX; // block pos relative to starting position
            int relz = z - startZ;
            for (int j = startY; j < startY + 5; j++) {
                int rely = j - startY;
                byte tile = tree[relx][rely][relz]; // Get the needed block for this part of the tree
                tiles[i][j][k] = tile;
            }
        }
    }
}
So 'i' and 'k' loop within the chunk, and 'j' loops inside the structure. This is pretty much how it should work.
And about the rivers: I personally haven't done them yet, and I'm not sure why you need to set the blocks around the chunk when generating them (you could just use Perlin worms, which would solve the problem), but it's pretty much the same idea, and the same goes for your cities.
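The per-area noise trick above can be sketched in a few lines. This is a hand-rolled stand-in hash, not real simplex noise; it only demonstrates the floor-division grid snapping (Python):

```python
import math

def cell_value(x, z, cell=5, seed=12345):
    """Pseudo-noise that is constant across each cell x cell area of blocks."""
    cx = math.floor(x / cell)  # floor division: -3 // 5 is -1, as needed for negatives
    cz = math.floor(z / cell)
    h = (cx * 73856093) ^ (cz * 19349663) ^ seed  # hypothetical hash standing in for simplex noise
    return (h % 1000) / 1000.0  # value in [0, 1)

def structure_start(x, z, cell=5):
    """Start corner of the structure cell containing block (x, z)."""
    return (math.floor(x / cell) * cell, math.floor(z / cell) * cell)
```

Every block in the same cell sees the same value, so any chunk can decide independently whether a structure overlaps it.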
I read something about this in a book, and what they did in these cases was to make a finer logical division of chunks depending on the application. I.e., if you are going to grow very big objects, it may be useful to have another, separate logical division of, for example, 128x128x128, just for this specific application.
In essence, the data resides in the same place; you just use different logical divisions.
To be honest, I've never done any voxel work, so don't take my answer too seriously; I'm just throwing out ideas. By the way, the book is Game Engine Gems 1; it has a gem on voxel engines.
About rivers: can't you just set a water level and let rivers auto-generate down mountainside steps? To avoid placing water inside mountain cavities, you could perform a raycast upward to check that the N blocks above are free.

How does the link type "adjusted complete" work for agglomerative hierachical clustering in WEKA?

The only descriptions I can find about "adjusted complete" linkage say something like: "same as complete linkage, but with largest within cluster distance"
What is meant by "within cluster distance"?
How is the distance between two clusters finally calculated using this linkage approach?
Thanks for your replies!
One of the great things about open-source software is that you can find out exactly how it works. The code below shows Weka's source for the HierarchicalClusterer algorithm; more specifically, it shows the part which implements the COMPLETE and ADJCOMPLETE functionality. The difference is as follows:
Just like the COMPLETE linkage method, compute the maximum distance between one node from cluster 1 and one node from cluster 2 and store this in fBestDist
Then, find the largest distance between nodes within cluster 1 or cluster 2 and store this in fMaxDist
Finally subtract fMaxDist from fBestDist
So the distance between two clusters calculated using ADJCOMPLETE as linkType corresponds to the COMPLETE distance minus the largest distance between 2 nodes within either cluster 1 or cluster 2.
Adjusted Complete-Link was proposed in the following paper:
Sepandar Kamvar, Dan Klein and Christopher Manning (2002). Interpreting and Extending Classical Agglomerative Clustering Algorithms Using a Model-Based Approach. In Proceedings of 19th International Conference on Machine Learning (ICML-2002)
According to it (section 4.2), Adjusted Complete-Link is a version of Complete-Link which should be used if the clusters have varying radii (see Figure 10).
case COMPLETE:
case ADJCOMLPETE:
    // find complete link distance aka maximum link, which is the largest distance between
    // any item in cluster1 and any item in cluster2
    fBestDist = 0;
    for (int i = 0; i < cluster1.size(); i++) {
        int i1 = cluster1.elementAt(i);
        for (int j = 0; j < cluster2.size(); j++) {
            int i2 = cluster2.elementAt(j);
            double fDist = fDistance[i1][i2];
            if (fBestDist < fDist) {
                fBestDist = fDist;
            }
        }
    }
    if (m_nLinkType == COMPLETE) {
        break;
    }
    // calculate adjustment, which is the largest within cluster distance
    double fMaxDist = 0;
    for (int i = 0; i < cluster1.size(); i++) {
        int i1 = cluster1.elementAt(i);
        for (int j = i + 1; j < cluster1.size(); j++) {
            int i2 = cluster1.elementAt(j);
            double fDist = fDistance[i1][i2];
            if (fMaxDist < fDist) {
                fMaxDist = fDist;
            }
        }
    }
    for (int i = 0; i < cluster2.size(); i++) {
        int i1 = cluster2.elementAt(i);
        for (int j = i + 1; j < cluster2.size(); j++) {
            int i2 = cluster2.elementAt(j);
            double fDist = fDistance[i1][i2];
            if (fMaxDist < fDist) {
                fMaxDist = fDist;
            }
        }
    }
    fBestDist -= fMaxDist;
    break;
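Outside of Weka, the three steps above boil down to very little code. A sketch (Python, assuming a precomputed pairwise distance matrix and clusters given as lists of point indices):

```python
def adjusted_complete_link(dist, cluster1, cluster2):
    """COMPLETE linkage distance minus the largest within-cluster distance."""
    # Step 1: complete link = largest between-cluster distance (Weka's fBestDist)
    best = max(dist[i][j] for i in cluster1 for j in cluster2)
    # Step 2: largest distance within either cluster (Weka's fMaxDist)
    max_within = 0.0
    for cluster in (cluster1, cluster2):
        for a in range(len(cluster)):
            for b in range(a + 1, len(cluster)):
                max_within = max(max_within, dist[cluster[a]][cluster[b]])
    # Step 3: subtract the adjustment
    return best - max_within
```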