Spatial gist index in postgis - performance - postgresql

I have problem with gist index.
I have table 'country' with 'geog' (geography,multipolygon) columny.
I have also gist index on this column.
This simple query with ST_CoveredBy() against table with 2 rows ( each 'geog' about 5MB) takes 13 s (the query result is correct):
select c."ID" from gis.country c where ST_CoveredBy(ST_GeogFromText('SRID=4326;POINT(8.4375 58.5791015625)'), c."geog") =true
When I droped the index, the query also took 13s.
What I've already did:
VACUUM ANALYZE gis.country ("geog")
maybe this is the problem: "Do not call with a GEOMETRYCOLLECTION as an argument" I have read (http://www.mail-archive.com/postgis-users#postgis.refractions.net/msg17780.html), that it's because overlapping polygons, but in country map doesn't exist overlapping polygons
EDIT
Query plan -
Index Scan using sindx_country_geography_col1 on country c (cost=0.00..8.52 rows=1 width=4)
Index Cond: ('0101000020E61000000000000000E0204000000000204A4D40'::geography && "geog")
Filter: _st_covers("geog", '0101000020E61000000000000000E0204000000000204A4D40'::geography)

You won't see any benefit of an index querying against a table with only two rows. The benefit of an index only shines if you have hundreds or more rows to query.
I'm going to guess that you have two very detailed country multipolygons. There are strategies to divide these into grids to improve performance. How you break up your countries into grids should be based on (1) the density of your areas of interest (where you are most likely to query), and (2) multipolygon complexity or density of vertices.

Related

"Sparse" Geospatial Queries with ST_Contains are Much Slower than dense ones

I am hitting a wall when it comes to trying to explain what is happening with a query I have. For simplicity, I have stripped it down to the minimum. Simply put, I am try to find all points within a simple envelope like so:
SELECT
lines.start_point AS point
FROM
"objects"
INNER JOIN "lines" ON "lines"."id" = "objects"."line_id"
WHERE
(ST_Contains
(
ST_MakeEnvelope (142.055256,-10.798657,142.385532,-10.485534, 4326),lines.start_point::geometry
)
)
LIMIT 50 OFFSET 0;
I have indexes setup on lines.start_point and everything is very fast in areas with lots of data. I get data returned in the sub 500ms range for areas that have lots of data.
What I did not expect is that areas with very little data would be super slow - sometimes > 90,000ms. Is there something I am totally missing here with ST_Contains that would explain this?
As with the example bounding box above and screenshot of return points my data only has 63 start points within this box but the query took 2min 54sec to find them. My only thought is that maybe ST_Contains is quite fast when it can just pick up points quickly when it can find a lot in a single pass but if it has to scan the entire area, it is really slow.
Additionally, I have tried using other ways to looks for points - like the && operator. When this is the case, the roles reverse. Dense areas take a really long time and sparse areas are lightning fast. And example of that query is here:
SELECT
lines.start_point AS point
FROM
"objects"
INNER JOIN "lines" ON "lines"."id" = "objects"."line_id"
WHERE
lines.start_point && ST_MakeEnvelope (142.055256,-10.798657,142.385532,-10.485534, 4326)
LIMIT 50 OFFSET 0;
Any information would help. Thanks
EDIT: Add && query example
Because of the limit clause, the planner may think it is faster not to use the spatial index. You can try to query all rows and then to apply the limit. Make sure to keep the offset 0 to prevent inlining.
SELECT * FROM (
SELECT
lines.start_point AS point
FROM
"objects"
INNER JOIN "lines" ON "lines"."id" = "objects"."line_id"
WHERE
(ST_Contains
(
ST_MakeEnvelope (142.055256,-10.798657,142.385532,-10.485534, 4326),lines.start_point::geometry
)
)
OFFSET 0)
LIMIT 50;
Also I see you do a cast to geometry, so make sure the index is on the geometry too!

Optimize query for intersection of ST_Buffer layer in PostGIS

I have two tables stored in PostGIS:
1. a multipolygon vector with about 590000 rows (layerA) and
2. a single multipart (1 row) vector layer (layerB)
and I want to find the area of the intersection between each polygon's buffer in layerA and layerB. My query so far is
SELECT ST_Area(ST_Intersection(a.geom, b.geom)) AS myarea, a.gid AS mygid FROM
(SELECT ST_Buffer(geom, 500) AS geom, gid FROM layerA) AS a,
layerB AS b
So far, I can see my query working but I calculate that it needs 17 hours to be completed (with my PC). Is there another way to execute this query more efficiently and faster?
What if you check intersects of overlapping area before intersection and area calculation, it might lower time.
SELECT ST_Area(ST_Intersection(a.geom, b.geom)) AS myarea, a.gid AS mygid FROM
(SELECT ST_Buffer(geom, 500) AS geom, gid FROM layerA) AS a,
layerB AS b WHERE ST_intersects(a.geom, b.geom)
You would probably get more answers to this at gis.stackexchange.com.
Therea are several things you can do.
You should make sure you get that first filtering of polygons actually intersecting with help of index.
Put a gist index on the table with many geometries and use st_dwithin(geom,500) instead of st_intersects on the buffered geometries. That is because the buffered geometries cannot use the index calculated on the unbuffered geometries.
Also, you say you have multi polygons. If there actually is more than 1 polygon in each multipolygon you might get a lot more speed if you first split the polygons to single polygons before building the index. That will make the.index doing a much bigger part of the job.
There is actually a function in postgis to split even single polygons into smaller pieces for the same reason.
ST_SubDivide
So first use ST_Dump to get single polygons:
CREATE table a_singles AS
SELECT id, (ST_Dump(geom)).geom geom FROM a;
Then create index:
CREATE INDEX idx_a_s_geom
ON a_singles
USING gist(geom);
At last the query, something like
SELECT ST_Area(ST_Intersection(ST_Buffer(a_s.geom,500), b.geom))
FROM a_singles AS a_s
INNER JOIN b
on ST_DWithin(a_s.geom,b.geom,500);
If that still is slow you can start playing with ST_SubDivide.
One more thing. If the single multipolygon in table b contains many geometries, also split them and put an index also there.
It might be slow also after all those things. That depends on how many vertex points there is in the splitted polygons that actually intersect (and for st_dwithin also on how many vertexpoints there is in polygons with overlapping bounding boxes)
But now you don't have any index helping you so this should make it quite a lot faster.

Why my postgis not use index on geometry field?

postgresql 9.5 + postgis 2.2 on windows.
I firstly create a table:
CREATE TABLE points (
id SERIAL,
ad CHAR(40),
name VARCHAR(200)
);
then, add a geometry field 'geom':
select addgeometrycolumn('points', 'geom', 4326, 'POINT', 2);
and create gist index on it:
CREATE INDEX points_index_geom ON points USING GIST (geom);
then, I insert about 1,000,000 points into the table.
I want to query all points that within given distance from given point.
this is my sql code:
SELECT st_astext(geom) as location FROM points
WHERE st_distance_sphere(
st_geomfromtext('POINT(121.33 31.55)', 4326),
geom) < 6000;
the result is what I want, but it is too slow.
when I explain analyze verbose on this code, I found it dose not use points_index_geom (explain shows seq scan and no index).
So i want to know why it dose not use index, and how should i improve?
You cannot expect ST_Distance_Sphere() to use an index on this query. You are doing a calculation on the contents of the geom field and then you are doing a comparison on the calculation result. Databases may not use an index in such a scenario unless you have a function index that does pretty much the same calculation as in your query.
The correct way to find locations with in a given distance from some point is to use ST_DWithin
ST_DWithin — Returns true if the geometries are within the specified
distance of one another. For geometry units are in those of spatial
reference and For geography units are in meters and measurement is
defaulted to use_spheroid=true (measure around spheroid), for faster
check, use_spheroid=false to measure along sphere.
and
This function call will automatically include a bounding box
comparison that will make use of any indexes that are available on the
geometries.

Postgresql: Find road endpoints within distance from coast

I have 2.2M line geometries (roads) in one table and 1500 line geometries (coast lines) in another. Both tables have a spatial index.
I need to find the endpoints of roads which are within a certain distance from the coast and store the point geometry along with the distance.
Current solution, which seems ineffecient, and takes many many hours to complete on a very fast machine;
CREATE TEMP TABLE with start and end points of the road geometries within distance, using ST_STARTPOINT, ST_ENDPOINT and ST_DWITHIN.
CREATE SPATIAL INDEXES for both geometry columns in the temp table.
Do two INSERT INTO operations, one for startpoints and one for endpoints;
SELECT geometry and distance, using ST_DISTANCE from point to coastline and a WHERE ST_DWITHIN to only consider points within the chosen distance.
Code looks something along these lines:
create temp table roadpoints_temp as select st_startpoint(road.geom) as geomstart, st_endpoint(road.geom) as geomend from
coastline_table coast, roadline_table road where st_dwithin(road.geom, coast.geom, 100);
create index on roadpoints_temp (geomstart);
create index on roadpoints_temp (geomend);
create table roadcoast_points as select roadpoints_temp.geomstart as geom, round(cast(st_distance(roadpoints_temp.geomstart,kyst.geom) as numeric),2) as dist
from roadpoints_temp, coastline_table coast where st_dwithin(roadpoints_temp.geomstart, coast.geom, 100);
insert into roadcoast_points select roadpoints_temp.geomend as geom, round(cast(st_distance(roadpoints_temp.geomend,kyst.geom) as numeric),2) as dist
from roadpoints_temp, coastline_table coast where st_dwithin(roadpoints_temp.geomend, coast.geom, 100);
drop table roadpoints_temp;
All comments and suggestions welcome :-)
You need to effectively utilize your indexes. It seems that fastest plan would be to find for each coast all the roads that are within distance of it. Doing two rechecks separately means you lose connection of closest coastline to the road and need to re-find this pair again and again.
You need to check your execution plan using EXPLAIN to have a Seq Scan on coastline table and GiST index scan on road table.
select road.*
from coastline_table coast, roadline_table road
where
ST_DWithin(coast.geom, road.geom, 100) -- indexed query
and -- non-indexed recheck
(
ST_DWithin(ST_StartPoint(road.geom), coast.geom, 100)
or ST_DWithin(ST_EndPoint(road.geom), coast.geom, 100)
);

How to count up a value until all geometry features from one table are selected

For example, I have this query to find the minimum distance between two geometries (stored in 2 tables) with a PostGIS function called ST_Distance.
Having thousands of geometries (in both tables) it takes to much time without using ST_DWithin. ST_DWithin returns true if the geometries are within the specified distance of one another (here 2000m).
SELECT DISTINCT ON
(id)
table1.id,
table2.id
min(ST_Distance(a.geom, b.geom)) AS distance
FROM table1 a, table2 b
WHERE ST_DWithin(a.geom, b.geom, 2000.0)
GROUP BY table1.id, table2.id
ORDER BY table1.id, distance
But you have to estimate the distance value to fetch all geometries (e.g. stored in table1). Therefore you have to look at your data in some way in a GIS, or you have to calculate the maximum distance for all (and that takes a lot of time).
In the moment I do it in that way that I approximate the distance value until all features are queried from table1, for example.
Would it be efficient that my query automatically increases (with a reasonable value) the distance value until the count of all geometries (e.g. for table1) is reached? How can I put this in execution?
Would it be slow down everything because the query needs maybe a lot of approaches to find the distance value?
Do I have to use a recursive query for this purpose?
See this post here: K-Nearest Neighbor Query in PostGIS
Basically, the <-> operator is a bit unusual in that it works in the order by clause, but it avoids having to make a guess as to how far you want to search in ST_DWithin. There is a major gotcha with this operator though, which is that the geometry in the order by clause must be a constant that is you CAN NOT write:
select a.id, b.id from table a, table b order by geom.a <-> geom.b limit 1;
Instead you would have to create a loop, substituting in a value above for geom.b
More information can be found here: http://boundlessgeo.com/2011/09/indexed-nearest-neighbour-search-in-postgis/