How do I optimize point-to-circle matching? - postgresql

I have a table that contains a bunch of Earth coordinates (latitude/longitude) and associated radii. I also have a table containing a bunch of points that I want to match with those circles, and vice versa. Both are dynamic; that is, a new circle or a new point can be added or deleted at any time. When either is added, I want to be able to match the new circle or point with all applicable points or circles, respectively.
I currently have a PostgreSQL module containing a C function to find the distance between two points on earth given their coordinates, and it seems to work. The problem is scalability. In order for it to do its thing, the function currently has to scan the whole table and do some trigonometric calculations against each row. Both tables are indexed by latitude and longitude, but the function can't use them. It has to do its thing before we know whether the two things match. New information may be posted as often as several times a second, and checking every point every time is starting to become quite unwieldy.
I've looked at PostgreSQL's geometric types, but they seem more suited to rectangular coordinates than to points on a sphere.
How can I arrange/optimize/filter/precalculate this data to make the matching faster and lighten the load?

You haven't mentioned PostGIS - why have you ruled that out as a possibility?
http://postgis.refractions.net/documentation/manual-2.0/PostGIS_Special_Functions_Index.html#PostGIS_GeographyFunctions

Thinking out loud a bit here... you have a point (lat/long) and a radius, and you want to find all existing point-radius combinations that may overlap? (Or something like that...)
It seems you could store a few more bits of information along with those numbers to help you rule out, during your query, others that are nowhere close. This might avoid a lot of trig operations.
For example, with point x,y and radius r, you could easily calculate a range of feasible lat/long values (a squarish area) that could be used to rule out needless calculations against another point.
You could then store the max and min lat and long along with that point in the database. Then, before running your trig on every row, you could filter your results to eliminate points that are obviously out of bounds.
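A minimal sketch of that idea in Python, assuming a spherical Earth with a mean radius of 6371 km and radii given in kilometres. The function names (`bounding_box`, `haversine_km`, `points_in_circle`) are made up for illustration; in the database the four box values would be stored as indexed columns and the box test would be a plain range condition in the WHERE clause.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def bounding_box(lat, lon, radius_km):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing the circle."""
    ang = radius_km / EARTH_RADIUS_KM          # angular radius in radians
    dlat = math.degrees(ang)
    min_lat, max_lat = max(lat - dlat, -90.0), min(lat + dlat, 90.0)
    if abs(lat) + dlat >= 90.0:
        # The circle touches a pole, so every longitude is possible.
        return (min_lat, max_lat, -180.0, 180.0)
    # Longitude extent of a spherical cap; wider than dlat away from the equator.
    ratio = min(1.0, math.sin(ang) / math.cos(math.radians(lat)))
    dlon = math.degrees(math.asin(ratio))
    if lon - dlon < -180.0 or lon + dlon > 180.0:
        # Box would wrap around the antimeridian: stay conservative.
        return (min_lat, max_lat, -180.0, 180.0)
    return (min_lat, max_lat, lon - dlon, lon + dlon)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def points_in_circle(circle, points):
    """circle is (lat, lon, radius_km); points is a list of (lat, lon)."""
    lat, lon, radius_km = circle
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_km)
    hits = []
    for p_lat, p_lon in points:
        # Cheap range test first; this is the part an index can answer.
        if not (min_lat <= p_lat <= max_lat and min_lon <= p_lon <= max_lon):
            continue
        # Exact trig test only for the survivors.
        if haversine_km(lat, lon, p_lat, p_lon) <= radius_km:
            hits.append((p_lat, p_lon))
    return hits
```

The box is deliberately a little too generous (it falls back to all longitudes near the poles and the antimeridian), so it never rejects a true match; it only reduces how many rows reach the exact check.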

If I understand you correctly, then my first idea would be to cache some data and eliminate most of the checking.
Imagine your circle is actually a box with 4 sides.
You could store the coordinates of those sides, much like the lines (a mesh) on a real map. So you store the east, west, north and south edge of each circle.
If you get your coordinate and it's outside of that box, you can be sure it won't be inside the circle either, since the box is bigger than the circle.
If it is inside the box, then you have to check like you do now. But I guess you can eliminate most of the work this way.

Related

Matlab: How to calculate hysteresis for experimental data set?

I got an experimental data set that looks more or less like this.
I need to determine how big the hysteresis loop is, i.e. if I look at two points with the same capacity (Y axis), what's the maximum distance between said points (X axis).
The issue is, data points aren't located on the same Y value, so I can't just find max X and min X for every Y and subtract them - that'd be too easy :^)
I figured I can use convex hull (convhull) to calculate the outer envelope of the set, but then I realised it will only work for the convex part, not the concave part. I guess I can divide my data set into smaller subsets and find a sum of them... or something.
And then, assuming I have the data set that's only the outer outline of the data set, I need to calculate distances between the left and right border (as shown here), but then again, that's just a data set of X and Y, and I'd need to find the point where the green line crosses the outer rim.
So here are the questions:
Is there a matlab procedure that calculates the outer outline of a data set and works with the concave part - kinda like convhull, but better?
Assuming I have the outline data set, is there an easy way to calculate a secant line of the data set, like shown on the second picture?
Thanks for any advice, hope I made what I have in mind clear enough - English isn't my first language
Benji
EDIT 1: Or perhaps there is an easier (?) way to determine which points form the biggest outline? Like... group points into (duh) groups, let's say those near 20%, 30%, 40%... and then pick two randomly (or brute force pick all possible pairs), one for the top boundary, the other for the bottom boundary, and then calculate the area of the polygon formed this way? Then select the set of points resulting in the polygon with the biggest area?
EDIT 2: Ooor I could group them like I thought I would before, and then work on only two groups at a time. Find the convex hull for two groups, then for the next two groups, and when I'm done with all the groups, I'd only need to find the points common to all the groups, and find a global hull :D Yeah, that might work :D
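One way to read the grouping idea from the edits, sketched in Python rather than Matlab: put the points into bins of similar Y, and take max X minus min X inside each bin as that bin's loop width. The function name `hysteresis_width` and the `bin_size` parameter are assumptions for illustration, and the result is only as good as the bin size: bins that are too tall overestimate the width wherever the curve is sloped, bins that are too short may hold points from only one branch.

```python
import math
from collections import defaultdict


def hysteresis_width(points, bin_size):
    """Return (largest_width, bin_center_y), or None if no bin has two points.

    points is an iterable of (x, y); bin_size is the height of each Y bin.
    """
    bins = defaultdict(list)
    for x, y in points:
        # Points whose Y falls in the same bin are treated as "the same Y".
        bins[math.floor(y / bin_size)].append(x)

    best = None
    for index, xs in bins.items():
        if len(xs) < 2:
            continue  # need at least one point from each side of the loop
        width = max(xs) - min(xs)
        if best is None or width > best[0]:
            best = (width, (index + 0.5) * bin_size)
    return best
```

For example, `hysteresis_width([(1.0, 10.2), (4.0, 10.8), (2.0, 20.1), (2.5, 20.9)], 10)` reports the widest bin as the one centered on Y = 15.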

Shape storage with postgres

I am dealing with storage of shapes. After a day spent implementing the routines, I now have some doubts.
The main trouble is reliably recognizing a new shape and knowing whether it is already stored or not.
I wrote something, and it works. I select all the shapes with the same area and the same number of vertices, and then I perform a heuristic comparison. But, just to be sure, I would like to ask whether you know of a direct algorithm (such as a matrix, spectrum, or G theorem) that, given two sets of points, can tell whether the two sets are the same shape?
Before somebody screams about GIS: the shapes can contain negative shapes, better known as holes. These are not to be analyzed in their own coordinate system but in the owning shape's coordinate system, that is, where the hole sits inside the other shape and how it is oriented. As far as I know, can GIS nest shapes inside other shapes and recognize them as a whole shape?
Second, what is the best way to store a shape? I was thinking of storing it as an array, but I find that awkward.

Hashing a graph to find duplicates (including rotated and reflected versions)

I am making a game that involves solving a path through graphs. Depending on the size of the graph this can take a little while so I want to cache my results.
This has me looking for an algorithm to hash a graph to find duplicates.
This is straightforward for exact copies of a graph, I simply use the node positions relative to the top corner. It becomes quite a bit more complicated for rotated or even reflected graphs. I suspect this isn't a new problem, but I'm unsure of what the terminology for it is?
My specific case is on a grid, so a node (if present) will always be connected to its four neighbors, north, south, east and west. In my current implementation each node stores an array of its adjacent nodes.
Suggestions for further reading or even complete algorithms are much appreciated.
My current hashing implementation starts at the first found node in the graph, which depends on how I iterate over the playfield, then notes the position of all nodes relative to it. The base graph will have a hash that might be something like: 0:1,0:2,1:2,1:3,-1:1,
I suggest you do this:
Make a function to generate a hash for any graph, position-independent. It sounds like you already have this.
When you first generate the pathfinding solution for a graph, cache it by the hash for that graph...
...Then also generate the 7 other unique forms of that graph (rotated 90deg; rotated 270deg; flipped x; flipped y; flipped x & y; flipped along one diagonal axis; flipped along the other diagonal axis). You can of course generate these using simple vector/matrix transformations. For each of these 7 transformed graphs, you also generate that graph's hash, and cache the same pathfinding solution (which you first apply the same transform to, so the solution maps appropriately to the new graph configuration).
You're done. Later your code will look up the pathfinding solution for a graph, and even if it's an alternate (rotated, flipped) form of the graph you found the earlier solution for, the cache already contains the correct solution.
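The steps above can be sketched roughly as follows, assuming the graph is given as a list of (x, y) grid cells (neighbouring cells are implicitly connected, as in the question) and the solution is a list of cells to walk through. The names `normalize`, `TRANSFORMS` and `cache_solution` are hypothetical, and the sorted tuple of cells stands in for whatever position-independent hash you already have.

```python
# The 8 symmetries of a square grid, as functions on (x, y).
TRANSFORMS = [
    lambda x, y: (x, y),     # original
    lambda x, y: (-y, x),    # rotated 90deg
    lambda x, y: (-x, -y),   # flipped x & y (same as rotated 180deg)
    lambda x, y: (y, -x),    # rotated 270deg
    lambda x, y: (-x, y),    # flipped x
    lambda x, y: (x, -y),    # flipped y
    lambda x, y: (y, x),     # flipped along one diagonal
    lambda x, y: (-y, -x),   # flipped along the other diagonal
]


def normalize(cells):
    """Position-independent key: shift so min x and min y are 0, then sort."""
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    return tuple(sorted((x - min_x, y - min_y) for x, y in cells))


def cache_solution(cache, cells, path):
    """Store the solution under the key of every rotated/flipped variant."""
    for transform in TRANSFORMS:
        moved = [transform(x, y) for x, y in cells]
        min_x = min(x for x, _ in moved)
        min_y = min(y for _, y in moved)
        key = tuple(sorted((x - min_x, y - min_y) for x, y in moved))
        # Apply the same transform and the same shift to the solution path.
        cache[key] = [
            (px - min_x, py - min_y)
            for px, py in (transform(x, y) for x, y in path)
        ]
```

A later lookup is then just `cache.get(normalize(cells))`, and the returned path is expressed relative to the looked-up graph's own top corner. For a symmetrical graph several variants share one key and the last write wins, which is fine as long as the solution depends only on the shape.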
I spent some time this morning thinking about this and I think this is probably the most optimal solution. But I'll share the other over-analyzed versions of the solution that I was also thinking about...
I was considering the fact that what you really needed was a function that would take a graph G, and return the "canonical version" of G (which I'll call G'), AND the transform matrix required to convert G to G'. (It seemed like you would need the transform so you could apply it to the pathfinding data and get the correct path for G, since you would have just stored the pathfinding data for G'.) You could, of course, look up pathfinding data for G', apply the transform matrix to it, and have your pathfinding solution.
The problem is that I don't think there's any unambiguous and performant way to determine a "canonical version" of G, because it means you have to recognize all 8 variants of G and always pick the same one as G' based on some criterion. I thought I could do something clever by looking at each axis of the graph, counting the number of points along each row/column in that axis, and then rotating/flipping to put the more imbalanced half of the axis always in the top-or-left... in other words, if you pass in "d", "q", "b", "p", etc. shapes, you would always get back the "p" shape (where the imbalance is towards the top-left). This would have the nice property that it should recognize when the graph was symmetrical along a given axis, and not bother to distinguish between the flipped versions on that axis, since they were the same.
So basically I just took the row-by-row/column-by-column point counts, counting the points in each half of the shape, and then rotating/flipping until the count is higher in the top-left. (Note that it doesn't matter that the count would sometimes be the same for different shapes, because all the function was concerned with was transforming the shape into a single canonical version out of all the different possible permutations.)
Where it fell down for me was deciding which axis was which in the canonical case - basically handling the case of whether to invert along the diagonal axis. Once again, for shapes that are symmetrical about a diagonal axis, the function should recognize this and not care; for any other case, it should have a criterion for saying "the axis of the shape that has the property [???] is, in the canonical version, the x axis of the shape, while the other axis will be the y axis". And without this kind of criterion, you can't distinguish two graphs that are flipped about the diagonal axis (e.g. "p" versus "σ"/sigma). The criterion I was trying to use was again "imbalance", but this turned out to be harder and harder to determine, at least the way I was approaching it. (Maybe I should have just applied the technique I was using for the x/y axes to the diagonal axes? I haven't thought through how that would work.) If you wanted to go with such a solution, you'd either need to solve this problem I failed to solve, or else give up on worrying about treating versions that are flipped about the diagonal axis as equivalent.
Although I was trying to focus on solutions that just involved calculating simple sums, I realized that even this kind of summing is going to end up being somewhat expensive to do (especially on large graphs) at runtime in pathfinding code (which needs to be as performant as possible, and which is the real point of your problem). In other words I realized that we were probably both overthinking it. You're much better off just taking a slight hit on the initial caching side and then having lightning-fast lookups based on the graph's position-independent hash, which also seems like a pretty foolproof solution as well.
Based on the twitter conversation, let me rephrase the problem (I hope I got it right):
How to compare graphs (planar, on a grid) that are treated as invariant under 90deg rotations and reflection. Bonus points if it uses hashes.
I don't have a full answer for you, but a few ideas that might be helpful:
Divide the problem into subproblems that are independently solvable. That would make:
1) How to compare the graphs given the invariance conditions
2) How to transform them into a canonical basis
3) How to hash this canonical basis subject to tradeoffs (speed, size, collisions, ...)
You could try to solve 1 and 2 in a single step. A naive geometric approach could be as follows:
For rotation invariance, you could try to count the edges in each direction and rotate the graph so that the major direction always points to the right. If there is no main direction, you could see the graph as a point cloud of its vertices and use eigenvectors and Principal Component Analysis (PCA) to obtain the main direction and rotate it accordingly.
I don't have a smart solution for the reflection problem. My brute force way would be to just create the reflected graph all the time. Say you have a graph g and the reflected graph r(g). If you want to know if some other graph h == g you have to answer h == g || h == r(g).
Now onto the hashing:
For the hashing you probably have to trade off speed, size and collisions. If you just use the string of edges, you are high on speed and size and low on collisions. If you just take this string and apply some generic string hasher to it, you get different results.
If you use a short hash, with more frequent collisions, you can achieve a rather small cost for comparing non-matching graphs. The cost for matching graphs is then a bit higher, as you have to do a full comparison to see if they actually match.
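A small sketch of that trade-off, assuming each graph has already been reduced to an edge string like the one in the question. The class name `GraphCache` and the `bits` parameter are made up; `zlib.crc32` is used only as a convenient generic string hasher, truncated to make it deliberately short. (A Python dict already works this way internally, so the class is purely illustrative.)

```python
import zlib


class GraphCache:
    """Buckets keyed by a short hash; full comparison only inside a bucket."""

    def __init__(self, bits=16):
        self._mask = (1 << bits) - 1   # fewer bits: smaller hash, more collisions
        self._buckets = {}

    def _short_hash(self, edge_string):
        return zlib.crc32(edge_string.encode("utf-8")) & self._mask

    def put(self, edge_string, value):
        bucket = self._buckets.setdefault(self._short_hash(edge_string), [])
        for i, (key, _) in enumerate(bucket):
            if key == edge_string:
                bucket[i] = (edge_string, value)  # overwrite existing entry
                return
        bucket.append((edge_string, value))

    def get(self, edge_string):
        # Non-matching graphs are usually rejected here by the hash alone.
        bucket = self._buckets.get(self._short_hash(edge_string), [])
        for key, value in bucket:
            if key == edge_string:  # the full comparison for candidates
                return value
        return None
```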
Hope this makes some kind of sense...
best, Simon
update: another thought on the rotation problem if the edges don't give a clear winner: Compute the center of mass of the vertices and see to which side of the center of the bounding box it falls. Rotate accordingly.

Calculate nearest point of KML polygon for iPhone app

I have a series of nature reserves that need to be plotted, as polygon overlays, on a map using the coordinates contained within KML data. I’ve found a tutorial on the Apple website for displaying KML overlays on map instances.
The problem is that the reserves vary in size greatly - from a small pond right up to several hundred kilometers in size. As a result I can’t use the coordinates of the center point to find the nearest reserves. Instead I need to calculate the nearest point of the reserves polygon to find the nearest one. With the data in KML - how would I go about trying to achieve this?
I've only managed to find one other person ask this and no one had replied :(
Well, there are a couple of different solutions depending on your needs. The higher the accuracy required, the more work required. I like Phil's meanRadius parameter idea. That would give you a rough idea of which polygon is closest and would be pretty easy to calculate. This idea works best if the polygons are "circlish". If the polygons are very irregular in shape, this idea loses its accuracy.
From a math standpoint, here is what you want to do. Loop through all points of all polygons. Calculate the distance from those points to your current coordinate. Then just keep track of which one is closest. There is one final wrinkle. Imagine two points making a line segment that is very long. You are located one meter away from the midpoint of the line. Well, the distance to these two points is very large, while in fact you are very close to the polygon. You will need to calculate the distance from your coordinate to every possible line segment, which you can do in a variety of ways, which are outlined here:
http://www.worsleyschool.net/science/files/linepoint/distance.html
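As a rough illustration of the segment check, here is a minimal Python sketch (the app itself would be written in something else). It assumes the coordinates have already been projected to a flat plane, which is only reasonable over short distances; the function names are hypothetical.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Distance from point P to the segment A-B in planar coordinates."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)  # A and B coincide
    # Project P onto the line through A and B, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def polygon_distance(px, py, vertices):
    """Distance from P to the nearest edge of a polygon given as (x, y) vertices."""
    n = len(vertices)
    return min(
        point_segment_distance(px, py, *vertices[i], *vertices[(i + 1) % n])
        for i in range(n)
    )
```

In the long-segment case described above, `point_segment_distance(0, 1, -1000, 0, 1000, 0)` gives 1.0 even though both end points are about a thousand units away.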
Finally, you need to ask yourself, am I in any polygons? If you're 10 meters away from a point on a polygon, but are, in fact, inside the polygon, obviously, you need to consider that. The best way to do that is to use a ray casting algorithm:
http://en.wikipedia.org/wiki/Point_in_polygon#Ray_casting_algorithm
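A minimal sketch of the even-odd ray casting test in Python, under the same planar-coordinates assumption; `point_in_polygon` is an illustrative name, and points lying exactly on an edge are not treated specially.

```python
def point_in_polygon(px, py, vertices):
    """Even-odd ray casting: cast a ray to the right of P and count crossings."""
    inside = False
    n = len(vertices)
    j = n - 1  # index of the previous vertex
    for i in range(n):
        xi, yi = vertices[i]
        xj, yj = vertices[j]
        # Does the edge (j -> i) straddle the horizontal line through P?
        if (yi > py) != (yj > py):
            # X coordinate where the edge crosses that horizontal line.
            x_cross = xi + (py - yi) * (xj - xi) / (yj - yi)
            if px < x_cross:
                inside = not inside  # each crossing flips the state
        j = i
    return inside
```

If this returns True for a reserve, the distance to that reserve can be treated as zero regardless of how far away its nearest edge is.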

Finding users close to you while the coordinates of you and others is free to change

I have a database with the current coordinates of every online user. With a push of a button the user can update his/her coordinates to update his current location (which are then sent off to server). The app will allow you to set the radius of a circle (where the user is in the center) in which you can see the other users on a map. The users outside the circle are discarded.
What is the optimal way to find the users around you?
1) The easiest solution is to find the distance between you and every user and then see if it's less than the radius. This would place the server under unnecessarily great load, as a comparison has to be made with every user in the world. In addition, how would one deal with changes in the locations?
2) An improved way would be to only calculate and compare the distance with other users who have similar latitude and longitude. Again, in order to be efficient, if the radius is decreased the app should only target users with even closer coordinates. This is not as easy as it sounds. If one were to walk around the North Pole with, say, a 10m radius, then every step around the circumference would equal a change of 9 degrees longitude. Every step along the equator would be marginal. Still, even being very rough and assuming there aren't many users visiting the Poles, I could narrow it down to some extent.
Any ideas regarding finding users close-by and how to keep them up to date would be much appreciated! :)
Andres
A very good practice is to use the GeoHash concept (http://geohash.org/) or GeoModel http://code.google.com/p/geomodel/ (better for BigTable-like databases). Those are efficient ways of doing geospatial searches. I encourage you to read up on them at the links I have provided, but in a few words:
GeoHash translates lon and lat into a unique hash string, then you can query the database through those hashes. The closer points are to each other, the longer their shared prefix will be.
GeoModel is similar to GeoHash, with the difference that the hashes are squares with a set accuracy. The smaller the square, the longer the hash.
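To make the prefix idea concrete, here is a minimal sketch of the standard geohash encoding in Python (bits alternate between longitude and latitude, and every 5 bits become one base-32 character). The function name `geohash_encode` and the default precision are assumptions for illustration.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # the first bit refines longitude, then they alternate
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:          # 5 bits per base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)
```

With the hash stored in an indexed text column, "users near me" becomes a prefix query on the first few characters. One caveat: two points on opposite sides of a cell boundary can be very close and still share only a short prefix, so in practice the neighbouring cells usually have to be queried as well.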
Hope I have helped you. But the decision about which one to pick is yours :).
Lukasz
1) You would probably need a two-step process here.
a) Assuming that all locations go into a database, you can do a compare at the SQL level (a very rough one) based on the lat & long, i.e. if you're looking for 100m distances you can safely disregard locations that differ by more than 0.01 degree in either direction. I don't think your North Pole users will mind ;)
Also, don't consider this unnecessary - better do it on the server than the iPhone.
b) you can then use, for the remaining entries, a comparison formula as outlined below.
2) you can find a way to calculate distances between two coordinates here http://snipplr.com/view/2531/calculate-the-distance-between-two-coordinates-latitude-longitude/
The best solution currently, in my opinion, is to wrap the whole earth in a matrix. Every cell will cover a small area and have a unique identifier. This information would be stored for every coordinate in the database and it allows me to quickly filter out irrelevant users (who are very far away). Then use Pythagoras to calculate the distance between all the other users and the client.
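A rough Python sketch of that matrix idea, assuming square cells of 0.01 degrees and a spherical Earth of radius 6371 km. All names here (`CELL_DEG`, `cell_id`, `nearby_users`) are hypothetical. Only the 3x3 block of cells around the client is searched, so the search radius must not exceed the size of one cell, and cells near the poles or the antimeridian would need extra care.

```python
import math

CELL_DEG = 0.01               # assumed cell size in degrees
EARTH_RADIUS_M = 6371000.0    # assumed mean Earth radius


def cell_id(lat, lon):
    """Identifier of the grid cell containing the coordinate."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def neighbor_cells(lat, lon):
    """The cell containing the coordinate plus its 8 surrounding cells."""
    row, col = cell_id(lat, lon)
    return {(row + dr, col + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)}


def flat_distance_m(lat1, lon1, lat2, lon2):
    """Pythagoras on a locally flat projection; fine for short distances."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat) * EARTH_RADIUS_M
    dy = math.radians(lat2 - lat1) * EARTH_RADIUS_M
    return math.hypot(dx, dy)


def nearby_users(me, users, radius_m):
    """me is (lat, lon); users maps a name to (lat, lon)."""
    cells = neighbor_cells(*me)
    return sorted(
        name
        for name, (lat, lon) in users.items()
        # Cheap cell filter first, Pythagoras only for the survivors.
        if cell_id(lat, lon) in cells
        and flat_distance_m(me[0], me[1], lat, lon) <= radius_m
    )
```

In a database the cell identifier would be stored and indexed per user and recomputed whenever the user pushes a new location, so the cell filter becomes a simple lookup on at most nine identifiers.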