Postgresql: Find road endpoints within distance from coast - postgresql

I have 2.2M line geometries (roads) in one table and 1500 line geometries (coast lines) in another. Both tables have a spatial index.
I need to find the endpoints of roads which are within a certain distance from the coast and store the point geometry along with the distance.
Current solution, which seems ineffecient, and takes many many hours to complete on a very fast machine;
CREATE TEMP TABLE with start and end points of the road geometries within distance, using ST_STARTPOINT, ST_ENDPOINT and ST_DWITHIN.
CREATE SPATIAL INDEXES for both geometry columns in the temp table.
Do two INSERT INTO operations, one for startpoints and one for endpoints;
SELECT geometry and distance, using ST_DISTANCE from point to coastline and a WHERE ST_DWITHIN to only consider points within the chosen distance.
Code looks something along these lines:
create temp table roadpoints_temp as select st_startpoint(road.geom) as geomstart, st_endpoint(road.geom) as geomend from
coastline_table coast, roadline_table road where st_dwithin(road.geom, coast.geom, 100);
create index on roadpoints_temp (geomstart);
create index on roadpoints_temp (geomend);
create table roadcoast_points as select roadpoints_temp.geomstart as geom, round(cast(st_distance(roadpoints_temp.geomstart,kyst.geom) as numeric),2) as dist
from roadpoints_temp, coastline_table coast where st_dwithin(roadpoints_temp.geomstart, coast.geom, 100);
insert into roadcoast_points select roadpoints_temp.geomend as geom, round(cast(st_distance(roadpoints_temp.geomend,kyst.geom) as numeric),2) as dist
from roadpoints_temp, coastline_table coast where st_dwithin(roadpoints_temp.geomend, coast.geom, 100);
drop table roadpoints_temp;
All comments and suggestions welcome :-)

You need to effectively utilize your indexes. It seems that fastest plan would be to find for each coast all the roads that are within distance of it. Doing two rechecks separately means you lose connection of closest coastline to the road and need to re-find this pair again and again.
You need to check your execution plan using EXPLAIN to have a Seq Scan on coastline table and GiST index scan on road table.
select road.*
from coastline_table coast, roadline_table road
where
ST_DWithin(coast.geom, road.geom, 100) -- indexed query
and -- non-indexed recheck
(
ST_DWithin(ST_StartPoint(road.geom), coast.geom, 100)
or ST_DWithin(ST_EndPoint(road.geom), coast.geom, 100)
);

Related

Assign points to polygons effeciently

I have table of polygons (thousands), and table of points (millions). Both tables have GIST indexes on geometry columns. Important this is, polygons do not overlap, so every point is contained by exactly one polygon. I want to generate table with this relation (polygon_id + point_id).
Trivial solution of course is
SELECT a.polygon_id, p.point_id
FROM my_polygons a
JOIN my_points p ON ST_Contains(a.geom, p.geom)
This works, but I think it is unnecessary slow, since it matches every polygon with every point - it does not know that every point can belong to one polygon only.
Is there any way to speed things up?
I tried looping for every polygon, selecting points by ST_Contains, but only those not already in the result table:
CREATE TABLE polygon2point (polygon_id uuid, point_id uuid);
DO $$DECLARE r record;
BEGIN
FOR r IN SELECT polygon_id, geom
FROM my_polygon
LOOP
INSERT INTO polygon2point (polygon_id, point_id)
SELECT r.polygon_id, p.point_id
FROM my_points p
LEFT JOIN polygon2point t ON p.point_id = t.point_id
WHERE t.point_id IS NULL AND ST_Contains(r.geom, p.geom);
END LOOP;
END$$;
This even slower than trivial JOIN approach. Any ideas?
A way to increase the speed is to subdivide the polygons into smaller ones.
You would create a new table (or a materialized view should the polygon change often), index it, and then run the query. If the subdivisions have 128 vertices or less, the data will, by default, be stored uncompressed on disk, making the queries even faster.
CREATE TABLE poly_subdivided AS
SELECT ST_SUBDIVIDE(a.geom, 128) AS geom , a.polygon_id
FROM poly;
CREATE INDEX poly_subdivided_geom_idx ON poly_subdivided USING gist(geom);
ANALYZE poly_subdivided;
SELECT a.polygon_id, p.point_id
FROM poly_subdivided a
JOIN my_points p ON ST_Contains(a.geom, p.geom)
Here is a great article on the topic.

How to configure PostgreSQL with Postgis to calculate distances

I know that it might be dumb question, but I'm searching for some time and can't find proper answer.
I have PostgreSQL database with PostGIS installed. In one table I have entries with lon lat (let's assume that columns are place, lon, lat).
What should I add to this table or/and what procedure I can use, to be able to count distance between those places in meters.
I've read that it is necessary to know SRID of a place to be able to count distance. Is it possible to not know/use it and still be able to count distance in meters basing only on lon lat?
Short answer:
Just convert your x,y values on the fly using ST_MakePoint (mind the overhead!) and calculate the distance from a given point, the default SRS will be WGS84:
SELECT ST_Distance(ST_MakePoint(lon,lat)::GEOGRAPHY,
ST_MakePoint(23.73,37.99)::GEOGRAPHY) FROM places;
Using GEOGRAPHY you will get the result in meters, while using GEOMETRY will give it in degrees. Of course, knowing the SRS of coordinate pairs is imperative for calculating distances, but if you have control of the data quality and the coordinates are consistent (in this case, omitting the SRS), there is not much to worry about. It will start getting tricky if you're planing to perform operations using external data, from which you're also unaware of the SRS and it might differ from yours.
Long answer:
Well, if you're using PostGIS you shouldn't be using x,y in separated columns in the first place. You can easily add a geometry / geography column doing something like this.
This is your table ...
CREATE TABLE places (place TEXT, lon NUMERIC, lat NUMERIC);
Containing the following data ..
INSERT INTO places VALUES ('Budva',18.84,42.92),
('Ohrid',20.80,41.14);
Here is how you add a geography type column:
ALTER TABLE places ADD COLUMN geo GEOGRAPHY;
Once your column is added, this is how you convert your coordinates to geography / geometry and update your table:
UPDATE places SET geo = ST_MakePoint(lon,lat);
To compute the distance you just need to use the function ST_Distance, as follows (distance in meters):
SELECT ST_Distance(geo,ST_MakePoint(23.73,37.99)) FROM places;
st_distance
-----------------
686560.16822422
430876.07368955
(2 Zeilen)
If you have your location parameter in WKT, you can also use:
SELECT ST_Distance(geo,'POINT(23.73 37.99)') FROM places;
st_distance
-----------------
686560.16822422
430876.07368955
(2 Zeilen)

Find KNN with a geometry data type using POSTGIS in postgresql

i want to find k nearest points for a point with the best performance in postgresql using PostGIS.
The structure of my table is :
CREATE TABLE mylocations
(id integer,
name varchar,
geom geometry);
sample inserted row is:
INSERT INTO mylocations VALUES
(5, 'Alaska', 'POINT(-172.7078 52.373)');
I can find nearest points by ST_Distance() with the following query :
SELECT ST_Distance(geography(geom), 'POINT(178.1375 51.6186)'::geometry) as distance,ST_AsText(geom),name, id
FROM mylocations
ORDER BY distance limit 10;
but actually i want to find them without calculating distance of my points with all points of table.
in fact i want to find the best query with best performance, because my table would have huge data.
i appreciate for your thoughts
You are missing <-> operator which Returns the 2D distance between A and B. Make sure your geom types and SRID are the same.
SELECT ST_Distance(geom,
'POINT(178.1375 51.6186)'::geometry) as distance,
ST_AsText(geom),
name, id
FROM mylocations
ORDER BY geom <-> 'POINT(178.1375 51.6186)'::geometry limit 10
finally i could answer to my question with this query:
SELECT id, name
FROM mylocations
WHERE ST_DWithin(geom::geography,
ST_GeogFromText('POINT(-73.856077 40.848447)'),
1609, false);
actually i want to give a point as a center of a circle with radius 1609 (as meter) and get all neighbours have distance less than 1609 meter to center of the circle.

Optimize query for intersection of ST_Buffer layer in PostGIS

I have two tables stored in PostGIS:
1. a multipolygon vector with about 590000 rows (layerA) and
2. a single multipart (1 row) vector layer (layerB)
and I want to find the area of the intersection between each polygon's buffer in layerA and layerB. My query so far is
SELECT ST_Area(ST_Intersection(a.geom, b.geom)) AS myarea, a.gid AS mygid FROM
(SELECT ST_Buffer(geom, 500) AS geom, gid FROM layerA) AS a,
layerB AS b
So far, I can see my query working but I calculate that it needs 17 hours to be completed (with my PC). Is there another way to execute this query more efficiently and faster?
What if you check intersects of overlapping area before intersection and area calculation, it might lower time.
SELECT ST_Area(ST_Intersection(a.geom, b.geom)) AS myarea, a.gid AS mygid FROM
(SELECT ST_Buffer(geom, 500) AS geom, gid FROM layerA) AS a,
layerB AS b WHERE ST_intersects(a.geom, b.geom)
You would probably get more answers to this at gis.stackexchange.com.
Therea are several things you can do.
You should make sure you get that first filtering of polygons actually intersecting with help of index.
Put a gist index on the table with many geometries and use st_dwithin(geom,500) instead of st_intersects on the buffered geometries. That is because the buffered geometries cannot use the index calculated on the unbuffered geometries.
Also, you say you have multi polygons. If there actually is more than 1 polygon in each multipolygon you might get a lot more speed if you first split the polygons to single polygons before building the index. That will make the.index doing a much bigger part of the job.
There is actually a function in postgis to split even single polygons into smaller pieces for the same reason.
ST_SubDivide
So first use ST_Dump to get single polygons:
CREATE table a_singles AS
SELECT id, (ST_Dump(geom)).geom geom FROM a;
Then create index:
CREATE INDEX idx_a_s_geom
ON a_singles
USING gist(geom);
At last the query, something like
SELECT ST_Area(ST_Intersection(ST_Buffer(a_s.geom,500), b.geom))
FROM a_singles AS a_s
INNER JOIN b
on ST_DWithin(a_s.geom,b.geom,500);
If that still is slow you can start playing with ST_SubDivide.
One more thing. If the single multipolygon in table b contains many geometries, also split them and put an index also there.
It might be slow also after all those things. That depends on how many vertex points there is in the splitted polygons that actually intersect (and for st_dwithin also on how many vertexpoints there is in polygons with overlapping bounding boxes)
But now you don't have any index helping you so this should make it quite a lot faster.

Why my postgis not use index on geometry field?

postgresql 9.5 + postgis 2.2 on windows.
I firstly create a table:
CREATE TABLE points (
id SERIAL,
ad CHAR(40),
name VARCHAR(200)
);
then, add a geometry field 'geom':
select addgeometrycolumn('points', 'geom', 4326, 'POINT', 2);
and create gist index on it:
CREATE INDEX points_index_geom ON points USING GIST (geom);
then, I insert about 1,000,000 points into the table.
I want to query all points that within given distance from given point.
this is my sql code:
SELECT st_astext(geom) as location FROM points
WHERE st_distance_sphere(
st_geomfromtext('POINT(121.33 31.55)', 4326),
geom) < 6000;
the result is what I want, but it is too slow.
when I explain analyze verbose on this code, I found it dose not use points_index_geom (explain shows seq scan and no index).
So i want to know why it dose not use index, and how should i improve?
You cannot expect ST_Distance_Sphere() to use an index on this query. You are doing a calculation on the contents of the geom field and then you are doing a comparison on the calculation result. Databases may not use an index in such a scenario unless you have a function index that does pretty much the same calculation as in your query.
The correct way to find locations with in a given distance from some point is to use ST_DWithin
ST_DWithin — Returns true if the geometries are within the specified
distance of one another. For geometry units are in those of spatial
reference and For geography units are in meters and measurement is
defaulted to use_spheroid=true (measure around spheroid), for faster
check, use_spheroid=false to measure along sphere.
and
This function call will automatically include a bounding box
comparison that will make use of any indexes that are available on the
geometries.