Postgresql earth_box algorithm - postgresql

I found this tutorial how to find something in specified the radius. My question is what algorithm was used to implement it?

If you mean the earth_box, the idea is to come up with a data type that can be useful with a GIST index (inverted search tree):
http://www.postgresql.org/docs/current/static/gist-intro.html
See in particular the links at the bottom of the maintainers' page:
http://www.sai.msu.su/~megera/postgres/gist/
One leads to:
The GiST is a balanced tree structure like a B-tree, containing pairs. But keys in the GiST are not integers like the keys in a B-tree. Instead, a GiST key is a member of a user-defined class, and represents some property that is true of all data items reachable from the pointer associated with the key. For example, keys in a B+-tree-like GiST are ranges of numbers ("all data items below this pointer are between 4 and 6"); keys in an R-tree-like GiST are bounding boxes, ("all data items below this pointer are in Calfornia"); keys in an RD-tree-like GiST are sets ("all data items below this pointer are subsets of {1,6,7,9,11,12,13,72}"); etc. To make a GiST work, you just have to figure out what to represent in the keys, and then write 4 methods for the key class that help the tree do insertion, deletion, and search.
http://gist.cs.berkeley.edu/gist1.html
If you mean the earth distance itself, the meaty part of source is:
/* compute difference in longitudes - want < 180 degrees */
longdiff = fabs(long1 - long2);
if (longdiff > M_PI)
longdiff = TWO_PI - longdiff;
sino = sqrt(sin(fabs(lat1 - lat2) / 2.) * sin(fabs(lat1 - lat2) / 2.) +
                cos(lat1) * cos(lat2) * sin(longdiff / 2.) * sin(longdiff / 2.));
if (sino > 1.)
        sino = 1.;
return 2. * EARTH_RADIUS * asin(sino);
https://github.com/postgres/postgres/blob/master/contrib/earthdistance/earthdistance.c#L50
My math is too rusty to be affirmative on what the above does exactly, but my guess would be that it's computing the distance between two points on the surface of a sphere (without considering the height of the two points). In other words, nautical miles.

Related

limit random complex number to a given range

I can get the real part of a random number to stay withing a given range but the complex part of the number doesn't stay within the range I set. see matlab / octave code below.
xmin=-.5
xmax=1
n=3
x=xmin+rand(1,n)*(xmax-xmin)+(rand(1,n)-(xmax-xmin))*1i
x=x(:)
The real part works but the complex part isn't limited to -0.5 to 1
0.2419028288441536 - 0.6579427654754871i
0.2712527227134944 - 1.451964497492678i
0.3245051849394858 - 1.107556052779179i
You have two mistakes:
x=xmin+rand(1,n)*(xmax-xmin)+(xmin + rand(1,n)*(xmax-xmin))*1i
You should add xmin to the sum and change - to * in the second part.
I've added some spaces to your code so the difference more obvious:
x = xmin+rand(1,n)*(xmax-xmin) + ( rand(1,n)-(xmax-xmin) )*1i
^^^ correct ^^^ not correct: missing `xmin+`
(and as OmG noted, also a `-` instead of a `*`)
One good way to reduce the number of bugs is by avoiding code duplication. You could for example write:
rand_sequence = #(m,xmin,xmax) xmin+rand(1,n)*(xmax-xmin);
x = rand_sequence(n,xmin,xmax) + 1i*rand_sequence(n,xmin,xmax)
(This looks like more code, but the more complicated code logic is not duplicated.)
Or like this:
x = xmin + (rand(1,n)+1i*rand(1,n)) * (xmax-xmin);

Faster finding of rows in matlab

I have a very long list of x,y,z coordinates of items (200K-800K,3) that I need to search through and find the nearest items to a particular point - This final list is always has at least 1 item and usually less than 10 items.
I've tried a few simple search methods to find this list but I've hit a bit of a limit - here are my two best methods to date:
Method 1 - find Indexing
xInd = find(PositionsList(:,1) > (searchPoint(i,1) - searchRad) & PositionsList(:,1) < (searchPoint(i,1) + searchRad));
yInd = find(PositionsList(xInd,2) > (searchPoint(i,2) - searchRad) & PositionsList(xInd,2) < (searchPoint(i,2) + searchRad));
xyInd = xInd(yInd);
zInd = find(PositionsList(xyInd,3) > (searchPoint(i,3) - searchRad) & PositionsList(xyInd,3) < (searchPoint(i,3) + searchRad));
xyzInd = xyInd(zInd);
Method 2 - Brute force distance search
neighbours = sqrt(sum(bsxfun(#minus,searchPoint(i,:),PositionsList).^2,2)) <= searchRad;
xyzInd = find(neighbours == 1);
Method 3 - logial Indexing
xInd = PositionsList(:,1) > (searchPoint(i,1) - searchRad) & PositionsList(:,1) < (searchPoint(i,1) + searchRad);
newlist = PositionsList(xInd==1,:);
yzInd = newlist(:,2) > (searchPoint(i,2) - searchRad) & newlist(:,2) < (searchPoint(i,2) + searchRad)...
& newlist(:,3) > (searchPoint(i,3) - searchRad) & newlist(:,3) < (searchPoint(i,3) + searchRad);
xyzInd = newlist(yzInd==1,:);
For my data method 1 is much quicker - for a small list of 20000 particles it runs in about 25s whereas method 2 runs in about 170s, but method 2 is slightly more accurate - it has dubious neighbours (outlier on edge of search area) much less often.
My code calls this search several thousand times so I'm keen to save as much time on it as possible - it currently makes up about 85% of my run-time. I've read that mex implementation may be much quicker but I'm not familiar mex. I've also tried a 3rd method of logical indexing rather than find, but it is slower at 35s.
Can someone help with making this search faster? Maybe a mex function?
Following on from obchardon suggestion I had a look at k-d trees for searching and found the following, kd-tree for matlab, on file exchange with my data and began testing with my data.
Now I can complete a search in under 5s for 500K coordinates, compared with my previous best of 25s for 20K sets of coordinates. Huge improvement.
The only down side is the order it returns the neighbours seems to be random which mean I have some slight "caressing" of the results to do, but with the time saved, this is more than acceptable.
Thanks for the great suggestion!!

Is there a range data structure in Scala?

I'm looking for a way to handle ranges in Scala.
What I need to do is:
given a set of ranges and a range(A) return the range(B) where range(A) intersect range (B) is not empty
given a set of ranges and a range(A) remove/add range(A) from/to the set of ranges.
given range(A) and range(B) create a range(C) = [min(A,B), max(A,B)]
I saw something similar in java - http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/RangeSet.html
Though subRangeSet returns only the intersect values and not the range in the set (or list of ranges) that it intersects with.
RangeSet rangeSet = TreeRangeSet.create();
rangeSet.add(Range.closed(0, 10));
rangeSet.add(Range.closed(30, 40));
Range range = Range.closed(12, 32);
System.out.println(rangeSet.subRangeSet(range)); //[30,32] (I need [30,40])
System.out.println(range.span(Range.closed(30, 40))); //[12,40]
There is an Interval[A] type in the spire math library. This allows working with ranges of arbitrary types that define an Order. Boundaries can be inclusive, exclusive or omitted. So e.g. (-∞, 0.0] or [0.0, 1.0) would be possible intervals of doubles.
Here is a library intervalset for working with sets of non-overlapping intervals (IntervalSeq or IntervalTrie) as well as maps of intervals to arbitrary values (IntervalMap).
Here is a related question that describes how to use IntervalSeq with DateTime.
Note that if the type you want to use is 64bit or less (basically any primitive), IntervalTrie is extremely fast. See the Benchmarks.
As Tzach Zohar has mentioned in the comment, if all you need is range of Int - go for scala.collection.immutable.Range:
val rangeSet = Set(0 to 10, 30 to 40)
val r = 12 to 32
rangeSet.filter(range => range.contains(r.start) || range.contains(r.end))
If you need it for another underlying type - implement it by yourself, it's easy for your usecase.

Minimum distance to plot points on UIMapview

my query is whats the minimum distance required to plot point on a UIMAPVIEW, so that they could be shown as distinct points.
For e.g., suppose if there are users in same apartment, their would be hardly any distance between their latitude-longitude. So how do i differentiate them !
There are usually two ways - one is clustering, which means you use a marker with a number that indicates how many underlying markers there are. When tapping on that, the user is then shown the separate markers (or a recursive zoom-in that splits the markers up more and more). Superpin (www.getsuperpin.com) is one example, but there are others out there.
Another approach is to actually offset the marker from its real location. For this, you need some kind of distribution algorithm that offsets it just enough - that is, set the markers as close together as possible while still giving them enough surface area to be seen/touched. For this, we use Fibonacci's Sunflower patten. What you'd have to do is identify all the Annotations that have the same coordinate, group them, and then draw each group in a sequence while offsetting one from the other - for ex. have some code that iterates along a spiral shape and drops down markers along that spiral.
Can put up sample code etc to help if you're wanting to go with the second approach, let me know.
EDIT: I found some code we wrote for this, but it's not objective C. Can you read it?
class SunFlower
{
static $SEED_RADIUS = 2;
static $SCALE_FACTOR = 4;
static $PI2 = 6.28318531; // PI * 2
static $PHI = 1.61803399; // (sqrt(5)+1) / 2
public static function calcPos($xc, $yc, $factor, $i)
{
$theta = $i * SunFlower::$PI2 / SunFlower::$PHI;
$r = sqrt($i) * $factor;
$x = $xc + $r * cos($theta);
$y = $yc - $r * sin($theta);
if ($i == 1) {
$y += ($factor * 0.5);
}
return array($x, $y);
}
}

hash function providing unique uint from an integer coordinate pair

The problem in general:
I have a big 2d point space, sparsely populated with dots.
Think of it as a big white canvas sprinkled with black dots.
I have to iterate over and search through these dots a lot.
The Canvas (point space) can be huge, bordering on the limits
of int and its size is unknown before setting points in there.
That brought me to the idea of hashing:
Ideal:
I need a hash function taking a 2D point, returning a unique uint32.
So that no collisions can occur. You can assume that the number of
dots on the Canvas is easily countable by uint32.
IMPORTANT: It is impossible to know the size of the canvas beforehand
(it may even change),
so things like
canvaswidth * y + x
are sadly out of the question.
I also tried a very naive
abs(x) + abs(y)
but that produces too many collisions.
Compromise:
A hash function that provides keys with a very low probability of collision.
Cantor's enumeration of pairs
n = ((x + y)*(x + y + 1)/2) + y
might be interesting, as it's closest to your original canvaswidth * y + x but will work for any x or y. But for a real world int32 hash, rather than a mapping of pairs of integers to integers, you're probably better off with a bit manipulation such as Bob Jenkin's mix and calling that with x,y and a salt.
a hash function that is GUARANTEED collision-free is not a hash function :)
Instead of using a hash function, you could consider using binary space partition trees (BSPs) or XY-trees (closely related).
If you want to hash two uint32's into one uint32, do not use things like Y & 0xFFFF because that discards half of the bits. Do something like
(x * 0x1f1f1f1f) ^ y
(you need to transform one of the variables first to make sure the hash function is not commutative)
Like Emil, but handles 16-bit overflows in x in a way that produces fewer collisions, and takes fewer instructions to compute:
hash = ( y << 16 ) ^ x;
You can recursively divide your XY plane into cells, then divide these cells into sub-cells, etc.
Gustavo Niemeyer invented in 2008 his Geohash geocoding system.
Amazon's open source Geo Library computes the hash for any longitude-latitude coordinate. The resulting Geohash value is a 63 bit number. The probability of collision depends of the hash's resolution: if two objects are closer than the intrinsic resolution, the calculated hash will be identical.
Read more:
https://en.wikipedia.org/wiki/Geohash
https://aws.amazon.com/fr/blogs/mobile/geo-library-for-amazon-dynamodb-part-1-table-structure/
https://github.com/awslabs/dynamodb-geo
Your "ideal" is impossible.
You want a mapping (x, y) -> i where x, y, and i are all 32-bit quantities, which is guaranteed not to generate duplicate values of i.
Here's why: suppose there is a function hash() so that hash(x, y) gives different integer values. There are 2^32 (about 4 billion) values for x, and 2^32 values of y. So hash(x, y) has 2^64 (about 16 million trillion) possible results. But there are only 2^32 possible values in a 32-bit int, so the result of hash() won't fit in a 32-bit int.
See also http://en.wikipedia.org/wiki/Counting_argument
Generally, you should always design your data structures to deal with collisions. (Unless your hashes are very long (at least 128 bit), very good (use cryptographic hash functions), and you're feeling lucky).
Perhaps?
hash = ((y & 0xFFFF) << 16) | (x & 0xFFFF);
Works as long as x and y can be stored as 16 bit integers. No idea about how many collisions this causes for larger integers, though. One idea might be to still use this scheme but combine it with a compression scheme, such as taking the modulus of 2^16.
If you can do a = ((y & 0xffff) << 16) | (x & 0xffff) then you could afterward apply a reversible 32-bit mix to a, such as Thomas Wang's
uint32_t hash( uint32_t a)
a = (a ^ 61) ^ (a >> 16);
a = a + (a << 3);
a = a ^ (a >> 4);
a = a * 0x27d4eb2d;
a = a ^ (a >> 15);
return a;
}
That way you get a random-looking result rather than high bits from one dimension and low bits from the other.
You can do
a >= b ? a * a + a + b : a + b * b
taken from here.
That works for points in positive plane. If your coordinates can be in negative axis too, then you will have to do:
A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
A >= B ? A * A + A + B : A + B * B;
But to restrict the output to uint you will have to keep an upper bound for your inputs. and if so, then it turns out that you know the bounds. In other words in programming its impractical to write a function without having an idea on the integer type your inputs and output can be and if so there definitely will be a lower bound and upper bound for every integer type.
public uint GetHashCode(whatever a, whatever b)
{
if (a > ushort.MaxValue || b > ushort.MaxValue ||
a < ushort.MinValue || b < ushort.MinValue)
{
throw new ArgumentOutOfRangeException();
}
return (uint)(a * short.MaxValue + b); //very good space/speed efficiency
//or whatever your function is.
}
If you want output to be strictly uint for unknown range of inputs, then there will be reasonable amount of collisions depending upon that range. What I would suggest is to have a function that can overflow but unchecked. Emil's solution is great, in C#:
return unchecked((uint)((a & 0xffff) << 16 | (b & 0xffff)));
See Mapping two integers to one, in a unique and deterministic way for a plethora of options..
According to your use case, it might be possible to use a Quadtree and replace points with the string of branch names. It is actually a sparse representation for points and will need a custom Quadtree structure that extends the canvas by adding branches when you add points off the canvas but it avoids collisions and you'll have benefits like quick nearest neighbor searches.
If you're already using languages or platforms that all objects (even primitive ones like integers) has built-in hash functions implemented (Java platform Languages like Java, .NET platform languages like C#. And others like Python, Ruby, etc ).
You may use built-in hashing values as a building block and add your "hashing flavor" in to the mix. Like:
// C# code snippet
public class SomeVerySimplePoint {
public int X;
public int Y;
public override int GetHashCode() {
return ( Y.GetHashCode() << 16 ) ^ X.GetHashCode();
}
}
And also having test cases like "predefined million point set" running against each possible hash generating algorithm comparison for different aspects like, computation time, memory required, key collision count, and edge cases (too big or too small values) may be handy.
the Fibonacci hash works very well for integer pairs
multiplier 0x9E3779B9
other word sizes 1/phi = (sqrt(5)-1)/2 * 2^w round to odd
a1 + a2*multiplier
this will give very different values for close together pairs
I do not know about the result with all pairs