How to do in-memory search for polygons that contain a given point? - postgresql

I have a PostgreSQL table with a geometry column that stores various simple polygons (possibly intersecting). The polygons are all areas within a city. I receive a point (latitude-longitude pair) as input and need to find the list of polygons that contain it. What I have currently:
An unclustered GiST index on the polygon column.
ST_Contains(table.Polygon, #param_Point) run against the whole table (the polygon is the first argument, since ST_Contains(A, B) tests whether A contains B).
It is quite slow, so I am looking for a more performant in-memory alternative. I have the following ideas:
Maintain a dictionary of polygons in Redis, keyed by geohash, with polygons that share a geohash stored as a list. When I receive the point, calculate its geohash and trim it to a desired level. Then look it up in the Redis map, trimming the point's geohash further until I find the first result (or enough results).
Keep a trie of geohashes loaded from the database, updated periodically or on update events. Calculate the point's geohash and search the trie until I find enough results. I prefer this because, given the nature of the polygons, the map may have long lists for a single geohash.
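A minimal sketch of the trie idea, assuming each polygon's covering geohash cells have already been computed elsewhere (the `covers` data below is hypothetical, as are all names). It collects candidates from every prefix of the point's geohash instead of stopping at the first hit, because a polygon indexed under a coarse cell and another indexed under a finer cell can both contain the point.

```python
from typing import Dict, Set

# Standard geohash alphabet (no a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Alternately bisect longitude and latitude, one bit per step,
    and emit one base-32 character per 5 bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # a geohash starts with a longitude bit
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


class _Node:
    __slots__ = ("children", "polygon_ids")

    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.polygon_ids: Set[str] = set()


class GeohashTrie:
    """Trie of geohash cells; each node lists polygons whose cover includes that cell."""

    def __init__(self) -> None:
        self._root = _Node()

    def insert(self, cell: str, polygon_id: str) -> None:
        node = self._root
        for ch in cell:
            node = node.children.setdefault(ch, _Node())
        node.polygon_ids.add(polygon_id)

    def candidates(self, point_hash: str) -> Set[str]:
        """Walk down along the point's geohash, collecting the ids stored on
        every prefix, i.e. on every indexed cell that contains the point."""
        found: Set[str] = set()
        node = self._root
        for ch in point_hash:
            child = node.children.get(ch)
            if child is None:
                break
            node = child
            found |= node.polygon_ids
        return found


# Hypothetical covers: in practice each polygon's cells would be precomputed
# and reloaded periodically or on update events.
covers = {
    "old_town": ["u4pr", "u4pq"],
    "harbour": ["u4pru"],
    "airport": ["u4x"],
}
trie = GeohashTrie()
for polygon_id, cells in covers.items():
    for cell in cells:
        trie.insert(cell, polygon_id)

point_hash = geohash_encode(57.64911, 10.40744, precision=9)
print(sorted(trie.candidates(point_hash)))  # ['harbour', 'old_town']
```

The Redis-map variant does the same lookup as one GET per prefix length rather than a trie walk. In both variants the cells overshoot the polygon boundaries, so the candidates still need an exact point-in-polygon check.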
Any other approaches?
I have read about libraries like GeoTrie and Polygon Geohasher but can't seem to integrate them with the database and the above ideas.
Any cues or starting points, please?

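Have you tried using ST_Within? I'm not sure it meets your criteria, but I believe it is meant to be faster than ST_Contains. (Note that ST_Within(#param_Point, table.Polygon) tests the same relationship as ST_Contains(table.Polygon, #param_Point), so any difference is worth measuring rather than assuming.)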
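Whichever predicate is used, the indexed query runs in two stages: the GiST index narrows the candidates by bounding box, then the exact test runs only on the survivors. Here is a minimal in-memory sketch of those two stages. It assumes simple polygons without holes, given as planar (x, y) vertex lists. The names and sample data are hypothetical, and ray casting stands in for PostGIS's exact predicate.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]   # (x, y), e.g. (lon, lat), treated as planar
Ring = List[Point]            # outer ring only; holes are not handled here


def point_in_ring(pt: Point, ring: Ring) -> bool:
    """Even-odd ray casting: cast a ray to the right and count edge crossings."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


class PolygonIndex:
    """Precomputed bounding boxes (cheap filter) plus exact tests on survivors."""

    def __init__(self, polygons: Dict[str, Ring]) -> None:
        self._polygons = polygons
        self._boxes = {
            pid: (min(x for x, _ in ring), min(y for _, y in ring),
                  max(x for x, _ in ring), max(y for _, y in ring))
            for pid, ring in polygons.items()
        }

    def containing(self, pt: Point) -> List[str]:
        x, y = pt
        hits = []
        for pid, (min_x, min_y, max_x, max_y) in self._boxes.items():
            # Stage 1: bounding-box filter; stage 2: exact point-in-polygon test.
            if min_x <= x <= max_x and min_y <= y <= max_y:
                if point_in_ring(pt, self._polygons[pid]):
                    hits.append(pid)
        return sorted(hits)


index = PolygonIndex({
    "square": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "triangle": [(2, 2), (6, 2), (2, 6)],
})
print(index.containing((3, 3)))    # ['square', 'triangle']
print(index.containing((5, 2.5)))  # ['triangle']
```

This still scans every box linearly. An R-tree, which is roughly what the PostGIS GiST index provides, avoids that scan. Points lying exactly on an edge may go either way here, whereas ST_Contains excludes the boundary.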

Related

Find closest match from 2 columns in postgresql

I have a table "ways" containing coordinate (lat/lon) values. Suppose I have a coordinate (x,y) and I want to find the closest match in the table ways. I have seen some similar questions like this one: Is there a postgres CLOSEST operator?
However, in my case I have 2 columns instead of 1. Is there a query that does something like this?
You could store the data as PostGIS 'point' type instead of coordinate values:
https://postgis.net/docs/ST_MakePoint.html
https://postgis.net/docs/ST_GeomFromText.html
This would empower you with all the PostGIS functionality such as:
https://postgis.net/docs/ST_DWithin.html
Then you could create a GiST index and use the PostGIS <-> operator to take advantage of index-assisted nearest-neighbor result sets. The 'nearest neighbor' functionality is pretty common. Here is a great explanation:
https://postgis.net/workshops/postgis-intro/knn.html
“KNN” stands for “K nearest neighbours”, where “K” is the number of neighbours you are looking for.
KNN is a pure index based nearest neighbour search. By walking up and down the index, the search can find the nearest candidate geometries without using any magical search radius numbers, so the technique is suitable and high performance even for very large tables with highly variable data densities.
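For contrast, here is a sketch of what a query over two plain lat/lon columns amounts to without an index: measure the distance to every row and keep the k smallest. The rows and names are hypothetical. The haversine formula on a sphere of radius 6371 km is an assumption standing in for whatever distance PostGIS would compute. Avoiding this full scan is exactly what the `<->` operator with a GiST index buys you.

```python
import heapq
import math
from typing import List, Tuple

Row = Tuple[str, float, float]  # (id, lat, lon), like the two-column "ways" table
EARTH_RADIUS_KM = 6371.0       # mean Earth radius; spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def k_nearest(rows: List[Row], lat: float, lon: float, k: int = 1) -> List[Row]:
    """Brute force: score every row and keep the k closest (no index involved)."""
    return heapq.nsmallest(k, rows, key=lambda r: haversine_km(lat, lon, r[1], r[2]))


ways = [("a", 0.0, 0.0), ("b", 0.0, 1.0), ("c", 10.0, 10.0)]
print([row[0] for row in k_nearest(ways, 0.0, 0.9, k=2)])  # ['b', 'a']
```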

What are the pros and cons of multiple rows of POLYGON vs one MULTIPOLYGON field?

So for the first time I'm gonna do a project that involves maps and layers on top of maps which have many points and many polygons on them.
I have the tendency to create separate tables for points and polygons and then create many-to-many relationships between them and the layers table. If I do that I end up with 5 tables: points, polygons, layers, layers_points and layers_polygons.
However, I see PostGIS also offers types called MULTIPOINT and MULTIPOLYGON. If I use those types, I could put it all in the layers table. I guess that would make queries faster, because I need fewer joins. However, I'm not sure whether I might regret it later, if it means that working with the individual points and polygons becomes impossible. I'm not even sure yet whether it will be necessary to perform calculations on the individual points and polygons, but it would be nice to know whether that's possible in both approaches.
So basically I'm asking, what the pros and cons are of these different approaches?
In general, you would consider using multipolygons to represent entities that have disjoint surfaces (for example, the geometry of Alaska) or other topologies that you can't represent as polygons. The key here is that a single entity needs to be expressed with a multipolygon.
What you wouldn't do is group unrelated polygons into a multipolygon, because you won't be able to perform queries at a child polygon level, unless you extract the rings into another geometry. If the polygons are unrelated, chances are you will need to query them individually. Even if they share a layer, you can manage that relation with business logic without merging them as they aren't representing the same entity.
Keep in mind that geometry tools in the frontend won't necessarily treat multipolygons as a valid geometry or a multi-object. Point-in-polygon algorithms, which look like your use case, won't necessarily work when checking whether a point is contained in a multipolygon.
Tools like Wicket.js (which converts from/to WKT/GeoJSON/native objects) don't support multipolygons. The Google Maps API v3 doesn't support multipolygons except in the data layer (and you can't operate on the data layer as you would on a polygon feature). Turf.js has operations that run on a FeatureCollection containing several polygons, but not on a multipolygon.
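If a frontend tool only offers a single-polygon test, a multipolygon can still be handled by testing each member polygon. A sketch under these assumptions: planar (x, y) coordinates, rings given as vertex lists without a closing duplicate vertex, and holes lying inside their outer ring. The names are illustrative and do not belong to any library's API.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Ring = List[Point]
PolygonRings = List[Ring]          # [outer, hole1, hole2, ...]
MultiPolygon = List[PolygonRings]  # one entry per member polygon


def crossings(pt: Point, ring: Ring) -> int:
    """Number of ring edges crossed by a ray going right from pt."""
    x, y = pt
    count = 0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                count += 1
    return count


def in_polygon(pt: Point, rings: PolygonRings) -> bool:
    """Even-odd rule over all rings: inside the outer ring and not inside a hole."""
    return sum(crossings(pt, ring) for ring in rings) % 2 == 1


def in_multipolygon(pt: Point, multi: MultiPolygon) -> bool:
    """A multipolygon contains the point if any member polygon does."""
    return any(in_polygon(pt, rings) for rings in multi)


multi = [
    [[(0, 0), (2, 0), (2, 2), (0, 2)]],
    [[(10, 0), (14, 0), (14, 4), (10, 4)], [(11, 1), (13, 1), (13, 3), (11, 3)]],
]
print(in_multipolygon((1, 1), multi))     # True
print(in_multipolygon((10.5, 2), multi))  # True
print(in_multipolygon((12, 2), multi))    # False (inside the hole)
```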
Without knowing your exact use case, that's the best I can tell you. TL;DR: keep your polygons as they are.

Find points near LineString in mongodb sorted by distance

I have an array of points representing a street (black line) and points representing places on a map (red points). I want to find all the points near the specified street, sorted by distance. I also need to be able to specify a max distance (blue and green areas). Here is a simple example:
I thought of using the $near operator but it only accepts Point as an input, not LineString.
How can MongoDB handle this type of query?
As you mentioned, Mongo currently doesn't support anything other than Point. Have you come across the concept of a route boxer? [1] It was very popular a few years back on Google Maps: given the line that you've drawn, find stops that are within dist(x). It was done by creating a series of bounding boxes around each point in the line and searching for points that fall within those boxes.
I stumbled upon your question just after realising that Mongo only works with points, which I assume is reasonable.
I already have a few options for how to do it (they expand on what @mnemosyn says in the comment). With the dataset that I'm working on, it's all on the client side, so I could use the RouteBoxer, but I would like to implement it server-side for performance reasons. Here are my suggestions:
1. Break the LineString down into its individual coordinate pairs, query $near with each of those, then combine the results and extract a unique set. There are algorithms for simplifying a complex line by reducing the number of points, but a simple one is easy to write.
2. Do the same as above, but as a stored procedure/function. I haven't played around with Mongo's stored functions, and I don't know how well they work with drivers, but this could be faster than the first option because you avoid round trips, and depending on the machine(s) hosting your Mongo instance(s), calculations could be faster by microseconds.
3. Implement the RouteBoxer approach server-side (it has been done in PHP), then use either of the above two to find stops that are $within the resulting bounding boxes. Heck, since the RouteBoxer method returns rectangles, it would be possible to merge all these rectangles into one polygon covering your route and just do a single $within on that (what @mnemosyn suggested).
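A simplified sketch of the box idea in option 3, assuming planar coordinates. It grows one axis-aligned box per segment by the max distance. This is not the real RouteBoxer algorithm, which is more elaborate; the names and sample data are hypothetical. In Mongo, each box (or their union) would become the $within query region.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def segment_boxes(line: List[Point], max_dist: float) -> List[Box]:
    """One axis-aligned box per segment, grown by max_dist on every side."""
    boxes = []
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        boxes.append((min(x1, x2) - max_dist, min(y1, y2) - max_dist,
                      max(x1, x2) + max_dist, max(y1, y2) + max_dist))
    return boxes


def ids_in_boxes(points: List[Tuple[str, Point]], boxes: List[Box]) -> List[str]:
    """Candidate filter: ids of points that fall inside at least one box."""
    return [pid for pid, (x, y) in points
            if any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in boxes)]


street = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
places = [("a", (5.0, 0.5)), ("b", (5.0, 3.0)), ("c", (10.8, 5.0)), ("e", (10.9, -0.9))]
print(ids_in_boxes(places, segment_boxes(street, 1.0)))  # ['a', 'c', 'e']
```

Point "e" sits in a box corner about 1.27 units from the street, beyond the 1.0 limit. That is the kind of inaccuracy the answer attributes to the box approach, so results still need a true distance check.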
EDIT: I thought of this earlier but forgot to mention it: it might be possible to achieve some of the above using the aggregation framework.
It's something I'm going to be working on soon (hopefully), and I'll open-source my result(s) based on whichever approach I end up going with.
EDIT: I must mention, though, that 1 and 2 have a flaw: if two points in your line are, say, 2km apart and you want points within 1.8km of the line, you'll miss the points near the middle of that stretch, which are within 1.8km of the line but farther than that from both vertices. The solution is to inject points into your line when simplifying it (I know, adding new points back in defeats the objective of reducing them).
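A sketch of the point-injection fix for options 1 and 2, assuming planar coordinates. After densifying so vertices are at most `max_step` apart, any place within `max_dist` of the line lies within sqrt(max_dist² + (max_step/2)²) of some vertex, so the per-vertex $near radius has to be enlarged to that value and the results filtered by true distance afterwards. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def densify(line: List[Point], max_step: float) -> List[Point]:
    """Insert evenly spaced points so consecutive vertices are at most max_step apart."""
    out = [line[0]]
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        pieces = max(1, math.ceil(math.hypot(x2 - x1, y2 - y1) / max_step))
        for i in range(1, pieces + 1):
            t = i / pieces
            out.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return out


def vertex_query_radius(max_dist: float, max_step: float) -> float:
    """Pythagoras: a place within max_dist of a segment is at most max_dist from
    its nearest point on the segment, and that point is at most max_step / 2
    from a vertex along the segment."""
    return math.hypot(max_dist, max_step / 2)


print(densify([(0.0, 0.0), (2.0, 0.0)], 0.5))
# [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0), (2.0, 0.0)]
print(vertex_query_radius(1.8, 2.0))
```

With the 2km gap and 1.8km limit from the example, densifying to 2km steps still leaves a gap. Smaller steps shrink the extra radius, at the cost of more $near queries.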
The flaw with 3 then is that it won't always be accurate as some points within your polygon are likely to have a distance greater than your limit, though the difference wouldn't be a significant percentage of your limit.
[1] Google Maps Utility Library: RouteBoxer
As you said, Mongo's $near only works with a point (not a line) as the centre. However, if you flip your premise from "find points near the line" to "find the line near each point", you can use your points as the centre and the line as the target.
This is the difference between
for each line, find points near it
and
for each point, find lines near it
If you have a large number of points to check, you can combine this with nevi_me's answer to reduce the list of points that need checking to a much smaller subset.
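A sketch of the "for each point, measure its distance to the line" step, assuming planar coordinates (for example, a metric projection). It projects each point onto each segment, clamps the projection to the segment, and keeps points within the max distance sorted nearest first. The names and data are hypothetical. This is the exact filter you would run on the candidates left after any coarse prefilter.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to segment ab: project p onto the segment, clamp, measure."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_line(p: Point, line: List[Point]) -> float:
    """Distance from p to a polyline: the minimum over its segments."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def points_near_line(points: List[Tuple[str, Point]], line: List[Point],
                     max_dist: float) -> List[Tuple[float, str]]:
    """(distance, id) for every point within max_dist of the line, nearest first."""
    scored = ((distance_to_line(xy, line), pid) for pid, xy in points)
    return sorted(item for item in scored if item[0] <= max_dist)


street = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
places = [("a", (5.0, 0.5)), ("b", (5.0, 3.0)), ("c", (10.8, 5.0)), ("e", (10.9, -0.9))]
print([pid for _, pid in points_near_line(places, street, 1.0)])  # ['a', 'c']
```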

Which PostGIS SRID is most efficient for a spatial index?

I have a PostGIS-enabled database with a table called locations that stores latitude-longitude points (SRID 4326) in a column called coordinates. However, all of my lookups on that table convert the points to a metric projection (SRID 26986) mainly to do distance comparisons.
Obviously I'd like to create a spatial index on the coordinates column. My question is, which is the best (most computationally efficient) SRID to use in the coordinates spatial index in this case?
I can either index using SRID 4326...
CREATE INDEX locations_coordinates_gist
ON locations
USING GIST (coordinates);
Or using SRID 26986...
CREATE INDEX locations_coordinates_gist
ON locations
USING GIST (ST_Transform(coordinates, 26986));
I discovered this helpful information reading the PostGIS documentation on the ST_Transform function...
If using more than one transformation, it is useful to have a functional index on the commonly used transformations to take advantage of index usage.
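So it seems the answer is: use both! I have created two indexes, one for each SRID (each needs its own name). Note that the planner only uses the functional index when a query contains the same expression, ST_Transform(coordinates, 26986).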

What SRID should I use for my application and how?

I'm using PostgreSQL with PostGIS. All my data already has decimal lat/long attached to it (e.g. -87.34554 33.12321), but to use PostGIS I need to convert it to a particular SRID.
The majority of my queries are looking for data inside a certain radius.
What SRID should I use? I created already a geometry column with SRID 4269.
In the example I linked, the author converts SRID 4269 to SRID 32661. I'm very confused about how and when to use these SRIDs. Any light on the subject would be truly appreciated.
As long as you never intend to reproject/transform the data to another coordinate system, it doesn't technically matter which SRID you use. However, assuming you don't want to throw away that important metadata and you do want to transform it, you will want to make sure your assigned SRID matches the data, so PostGIS knows what to do when the time comes.
So why would you want to reproject from epsg:4269? Because certain types of queries (such as distance) make no sense in this 'unprojected' world. Your units are decimal degrees, and a given number of decimal degrees corresponds to a different real distance depending on where on the planet you are.
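A quick way to see the problem is to measure how far one degree of longitude reaches at different latitudes. The sketch below assumes a spherical Earth of radius 6371 km and uses the haversine formula, a spherical approximation of the kind the answer goes on to mention. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical model


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def one_degree_of_longitude_km(lat: float) -> float:
    """Ground distance spanned by one degree of longitude at a given latitude."""
    return haversine_km(lat, 0.0, lat, 1.0)


for lat in (0.0, 45.0, 60.0):
    print(f"{lat:>4.0f} deg: {one_degree_of_longitude_km(lat):.1f} km")
```

The same "1 degree" is about 111 km at the equator but only about half that at 60° latitude, which is why a raw distance in degrees is not a usable radius.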
In your example above, someone is using epsg:32661 because they believe it gives better accuracy for the area they're working in. If your data covers a specific area of the globe, you can select a projection that's accurate for that area. If it spans the entire globe, you have to choose a projection that does 'OK' for your needs.
Fortunately, PostGIS has a few ways of making all this easier. For approximate distances you can use ST_DistanceSphere (ST_Distance_Sphere in older releases), which, as you might guess, assumes the earth is a sphere, or the more accurate ST_DistanceSpheroid. Using these, you don't need to reproject, and you will probably be fine for your distance queries except in edge cases. Newer versions of PostGIS also let you use geography columns.
tl;dr: use ST_DistanceSpheroid for your distance queries, store your data in geography columns, or transform it to a local projection (when storing, or on the fly, depending on your needs).
Take a look at this question: How do you know what SRID to use for a shp file?
The SRID is just a compact reference to a coordinate system definition, which PostGIS stores as WKT. (You may also have noticed that, although you store lat/long points, a geometry column displays as a long string of digits and capital letters; that is the hex-encoded binary form of the geometry, not the SRID.)
The appropriate SRID (EPSG code) can differ by country/state/..., although some are very common, especially the two you mentioned. If you need specific information on which area uses which SRID, there is a database for that: the EPSG registry.
Inside your database, you have a table spatial_ref_sys that lists the SRIDs PostGIS knows about.