PostGIS - Count Points in Polygons (and average their features within the boundaries) - postgresql

I have a table with some points representing buildings:
CREATE TABLE buildings(
pk serial NOT NULL,
geom geometry(Point,4326),
height double precision,
area double precision,
perimeter double precision
)
And another table with polylines (most of them closed):
CREATE TABLE regions
(
pk serial NOT NULL,
geom geometry(Polygon,4326)
)
I would like to:
1. count the number of points inside each region (buildings_n)
2. find the average value of one of the features (e.g. area) within each region's boundary (area_avg)
Adding the two new columns:
ALTER TABLE regions ADD COLUMN buildings_n integer;
ALTER TABLE regions ADD COLUMN area_avg double precision;
How can I write these two queries?
I've tried this one for point 1, but it fails:
INSERT INTO regions (buildings_n)
SELECT count(b.geom)
FROM regions a, buildings b
WHERE st_contains(a.geom,b.geom);
thank you,
Stefano

Region geometry
The first problem is that ST_Contains on 'polylines' (linestrings) only finds points that lie exactly on the line itself. It will not find points inside the area that the linestring encloses, and certainly not if the linestring is not closed. See the examples of valid ST_Contains relations here:
http://www.postgis.org/docs/ST_Contains.html
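For the spatial relation to work, the region geometries have to be polygons, converted either beforehand or on the fly in the query. Note that ST_MakePolygon expects a closed LineString; if the geom column already holds polygons, as the geometry(Polygon,4326) declaration in the question suggests, a.geom can be passed to ST_Contains directly. For example: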
ST_Contains(ST_MakePolygon(a.geom),b.geom)
See this reference for more info:
http://www.postgis.org/docs/ST_MakePolygon.html
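As a minimal sketch of the "beforehand" option, assuming the geom column really holds LineStrings as described in the question, the closed ones could be converted once into a separate table. The table name regions_poly and the index name are hypothetical; the ST_NPoints check is there because a polygon ring needs at least four points.

```sql
-- Hypothetical helper table holding polygon versions of the closed linestrings.
CREATE TABLE regions_poly AS
SELECT a.pk,
       ST_MakePolygon(a.geom) AS geom
FROM regions a
WHERE GeometryType(a.geom) = 'LINESTRING'
  AND ST_IsClosed(a.geom)
  AND ST_NPoints(a.geom) >= 4;

-- Spatial index so that ST_Contains can pre-filter by bounding box.
CREATE INDEX regions_poly_geom_idx ON regions_poly USING gist (geom);
```

Open linestrings are skipped here rather than repaired, since how to close them depends on the data.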
Calculate aggregate values
The second problem is that to use the aggregate functions count or average on subsets of the buildings table (and not the entire table) you need to associate the region id with each building...
SELECT a.pk region_pk, b.pk building_pk, b.area
FROM regions a, buildings b
WHERE ST_Contains(ST_MakePolygon(a.geom),b.geom)
.. and then group your building data by the region they belong to:
SELECT region_pk, count(*), avg(area) average
FROM joined_regions_and_buildings
GROUP BY region_pk;
Update new columns
The third problem is that you are using INSERT to add values to the newly created columns. INSERT is for adding new records to a table, UPDATE is used for changing values of existing records in a table.
Solution
So, all of the points above combined result in the following query:
WITH joined_regions_and_buildings AS (
SELECT a.pk region_pk, b.pk building_pk, b.area
FROM regions a, buildings b
WHERE ST_Contains(ST_MakePolygon(a.geom),b.geom)
)
UPDATE regions a
SET buildings_n = b.count, area_avg = b.average
FROM (
SELECT region_pk, count(*), avg(area) average
FROM joined_regions_and_buildings
GROUP BY region_pk
) b
WHERE a.pk = b.region_pk;
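One thing the query above leaves open: regions that contain no buildings do not appear in the grouped result, so their new columns stay NULL. If a count of 0 is preferred, a LEFT JOIN variant along these lines might work. This is only a sketch and assumes regions.geom already holds polygons, as declared in the question's CREATE TABLE, so ST_MakePolygon is left out.

```sql
-- Sketch: LEFT JOIN keeps regions without any building in the result.
UPDATE regions r
SET buildings_n = s.buildings_n,
    area_avg    = s.area_avg
FROM (
    SELECT r2.pk,
           count(b.pk) AS buildings_n,  -- counts matched buildings only, so empty regions get 0
           avg(b.area) AS area_avg      -- stays NULL for empty regions
    FROM regions r2
    LEFT JOIN buildings b ON ST_Contains(r2.geom, b.geom)
    GROUP BY r2.pk
) s
WHERE r.pk = s.pk;
```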

Related

Assign points to polygons efficiently

I have a table of polygons (thousands) and a table of points (millions). Both tables have GiST indexes on their geometry columns. Importantly, the polygons do not overlap, so every point is contained by exactly one polygon. I want to generate a table with this relation (polygon_id + point_id).
Trivial solution of course is
SELECT a.polygon_id, p.point_id
FROM my_polygons a
JOIN my_points p ON ST_Contains(a.geom, p.geom)
This works, but I think it is unnecessarily slow, since it matches every polygon with every point. It does not know that every point can belong to only one polygon.
Is there any way to speed things up?
I tried looping for every polygon, selecting points by ST_Contains, but only those not already in the result table:
CREATE TABLE polygon2point (polygon_id uuid, point_id uuid);
DO $$DECLARE r record;
BEGIN
FOR r IN SELECT polygon_id, geom
FROM my_polygons
LOOP
INSERT INTO polygon2point (polygon_id, point_id)
SELECT r.polygon_id, p.point_id
FROM my_points p
LEFT JOIN polygon2point t ON p.point_id = t.point_id
WHERE t.point_id IS NULL AND ST_Contains(r.geom, p.geom);
END LOOP;
END$$;
This is even slower than the trivial JOIN approach. Any ideas?
A way to increase the speed is to subdivide the polygons into smaller ones.
You would create a new table (or a materialized view, should the polygons change often), index it, and then run the query. If the subdivisions have 128 vertices or fewer, the data will, by default, be stored uncompressed on disk, making the queries even faster.
CREATE TABLE poly_subdivided AS
SELECT ST_Subdivide(a.geom, 128) AS geom, a.polygon_id
FROM my_polygons a;
CREATE INDEX poly_subdivided_geom_idx ON poly_subdivided USING gist(geom);
ANALYZE poly_subdivided;
SELECT a.polygon_id, p.point_id
FROM poly_subdivided a
JOIN my_points p ON ST_Contains(a.geom, p.geom)
Here is a great article on the topic.
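One caveat worth checking on real data: ST_Contains does not match a point that lies exactly on the boundary of a piece, and subdividing introduces new internal cut lines. A possible variant, sketched below, uses ST_Intersects instead and removes the duplicates that appear when a point sits on a cut line between two pieces of the same polygon. It fills the polygon2point table from the question; note that a point lying on an edge shared by two different polygons would be assigned to both.

```sql
-- Sketch: ST_Intersects also matches points on piece boundaries,
-- DISTINCT collapses matches against several pieces of the same polygon.
INSERT INTO polygon2point (polygon_id, point_id)
SELECT DISTINCT a.polygon_id, p.point_id
FROM poly_subdivided a
JOIN my_points p ON ST_Intersects(a.geom, p.geom);
```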

Find KNN with a geometry data type using POSTGIS in postgresql

I want to find the k nearest points to a given point, with the best possible performance, in PostgreSQL using PostGIS.
The structure of my table is:
CREATE TABLE mylocations
(id integer,
name varchar,
geom geometry);
A sample inserted row is:
INSERT INTO mylocations VALUES
(5, 'Alaska', 'POINT(-172.7078 52.373)');
I can find nearest points by ST_Distance() with the following query :
SELECT ST_Distance(geography(geom), 'POINT(178.1375 51.6186)'::geometry) as distance,ST_AsText(geom),name, id
FROM mylocations
ORDER BY distance limit 10;
But I actually want to find them without calculating the distance from my point to every point in the table.
In other words, I am looking for the query with the best performance, because my table will hold a huge amount of data.
I would appreciate your thoughts.
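You are missing the <-> operator, which returns the 2D distance between A and B. When it is used in ORDER BY against a constant geometry, PostGIS can use a GiST index to return the nearest neighbours without computing the distance to every row. Make sure your geometry types and SRIDs are the same.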
SELECT ST_Distance(geom,
'POINT(178.1375 51.6186)'::geometry) as distance,
ST_AsText(geom),
name, id
FROM mylocations
ORDER BY geom <-> 'POINT(178.1375 51.6186)'::geometry limit 10
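For the operator to help, the table needs a spatial index, which the question does not show. A minimal sketch, with a hypothetical index name, plus an EXPLAIN to check whether the planner actually picks an index scan:

```sql
-- Hypothetical index name; a GiST index is what <-> can use in ORDER BY.
CREATE INDEX mylocations_geom_idx ON mylocations USING gist (geom);
ANALYZE mylocations;

-- Look for an "Index Scan" node ordered by the <-> expression in the plan.
EXPLAIN
SELECT id, name
FROM mylocations
ORDER BY geom <-> 'POINT(178.1375 51.6186)'::geometry
LIMIT 10;
```

Keep in mind that <-> on geometry works in planar coordinate units (degrees here), so for points near the antimeridian, such as the Alaska sample row, the ordering can differ from the true distance on the globe.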
Finally, I was able to answer my own question with this query:
SELECT id, name
FROM mylocations
WHERE ST_DWithin(geom::geography,
ST_GeogFromText('POINT(-73.856077 40.848447)'),
1609, false);
What I actually want is to give a point as the center of a circle with a radius of 1609 meters and get all neighbours that are less than 1609 meters from that center.
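Since the query casts geom to geography, an index on the plain geometry column may not be usable for it. One option, sketched here with a hypothetical index name, is an expression index on the cast itself so that ST_DWithin has a matching geography index to work with:

```sql
-- Hypothetical index name; indexes the same expression the query filters on.
CREATE INDEX mylocations_geog_idx
    ON mylocations
    USING gist ((geom::geography));

SELECT id, name
FROM mylocations
WHERE ST_DWithin(geom::geography,
                 ST_GeogFromText('POINT(-73.856077 40.848447)'),
                 1609, false);
```

Storing the column as geography in the first place would be another way to get the same effect.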

Optimize query for intersection of ST_Buffer layer in PostGIS

I have two tables stored in PostGIS:
1. a multipolygon vector with about 590000 rows (layerA) and
2. a single multipart (1 row) vector layer (layerB)
and I want to find the area of the intersection between each polygon's buffer in layerA and layerB. My query so far is
SELECT ST_Area(ST_Intersection(a.geom, b.geom)) AS myarea, a.gid AS mygid FROM
(SELECT ST_Buffer(geom, 500) AS geom, gid FROM layerA) AS a,
layerB AS b
So far, I can see my query working but I calculate that it needs 17 hours to be completed (with my PC). Is there another way to execute this query more efficiently and faster?
What if you check whether the geometries intersect before calculating the intersection and its area? That might reduce the time.
SELECT ST_Area(ST_Intersection(a.geom, b.geom)) AS myarea, a.gid AS mygid FROM
(SELECT ST_Buffer(geom, 500) AS geom, gid FROM layerA) AS a,
layerB AS b WHERE ST_Intersects(a.geom, b.geom)
You would probably get more answers to this at gis.stackexchange.com.
There are several things you can do.
You should make sure that the first filtering step, finding the polygons that actually intersect, is done with the help of an index.
Put a GiST index on the table with many geometries and use ST_DWithin(a.geom, b.geom, 500) on the unbuffered geometries instead of ST_Intersects on the buffered ones. That is because the buffered geometries cannot use the index built on the unbuffered geometries.
Also, you say you have multipolygons. If there actually is more than one polygon in each multipolygon, you might get a lot more speed if you first split them into single polygons before building the index. That lets the index do a much bigger part of the job.
There is actually a function in PostGIS to split even single polygons into smaller pieces for the same reason:
ST_SubDivide
So first use ST_Dump to get single polygons:
CREATE TABLE a_singles AS
SELECT gid, (ST_Dump(geom)).geom AS geom FROM layerA;
Then create index:
CREATE INDEX idx_a_s_geom
ON a_singles
USING gist(geom);
Finally the query, something like:
SELECT ST_Area(ST_Intersection(ST_Buffer(a_s.geom,500), b.geom))
FROM a_singles AS a_s
INNER JOIN layerB AS b
ON ST_DWithin(a_s.geom, b.geom, 500);
If that is still slow, you can start playing with ST_SubDivide.
One more thing: if the single multipolygon in table b contains many geometries, split those as well and put an index on them too.
It might still be slow after all of this. That depends on how many vertices there are in the split polygons that actually intersect (and, for ST_DWithin, on how many vertices there are in polygons with overlapping bounding boxes).
But right now you don't have any index helping you, so this should make it quite a lot faster.
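A rough sketch of the ST_SubDivide idea applied to the single big geometry in layerB might look like this. The table name layerb_subdivided, the index name, and the limit of 256 vertices per piece are assumptions to experiment with, and ST_SubDivide needs PostGIS 2.2 or later. Because the pieces do not overlap, summing the intersection areas per gid should give the same total as intersecting with the whole geometry.

```sql
-- Hypothetical table: layerB cut into pieces of at most 256 vertices each.
CREATE TABLE layerb_subdivided AS
SELECT ST_Subdivide(geom, 256) AS geom
FROM layerB;

CREATE INDEX layerb_subdivided_geom_idx
    ON layerb_subdivided
    USING gist (geom);

-- Buffer each layerA geometry and intersect it only with nearby pieces.
SELECT a.gid AS mygid,
       sum(ST_Area(ST_Intersection(ST_Buffer(a.geom, 500), s.geom))) AS myarea
FROM layerA a
JOIN layerb_subdivided s ON ST_DWithin(a.geom, s.geom, 500)
GROUP BY a.gid;
```

Rows of layerA with nothing within 500 units are simply absent from the result, unlike in the original cross join.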

Why does my PostGIS query not use the index on the geometry field?

PostgreSQL 9.5 + PostGIS 2.2 on Windows.
I first create a table:
CREATE TABLE points (
id SERIAL,
ad CHAR(40),
name VARCHAR(200)
);
then, add a geometry field 'geom':
select addgeometrycolumn('points', 'geom', 4326, 'POINT', 2);
and create gist index on it:
CREATE INDEX points_index_geom ON points USING GIST (geom);
Then I insert about 1,000,000 points into the table.
I want to query all points that are within a given distance of a given point.
This is my SQL code:
SELECT st_astext(geom) as location FROM points
WHERE st_distance_sphere(
st_geomfromtext('POINT(121.33 31.55)', 4326),
geom) < 6000;
The result is what I want, but it is too slow.
When I run EXPLAIN ANALYZE VERBOSE on this code, I find that it does not use points_index_geom (the plan shows a seq scan and no index).
So I want to know why it does not use the index, and how I can improve it.
You cannot expect ST_Distance_Sphere() to use an index on this query. You are doing a calculation on the contents of the geom field and then you are doing a comparison on the calculation result. Databases may not use an index in such a scenario unless you have a function index that does pretty much the same calculation as in your query.
The correct way to find locations within a given distance of some point is to use ST_DWithin:
ST_DWithin — Returns true if the geometries are within the specified
distance of one another. For geometry units are in those of spatial
reference and For geography units are in meters and measurement is
defaulted to use_spheroid=true (measure around spheroid), for faster
check, use_spheroid=false to measure along sphere.
and
This function call will automatically include a bounding box
comparison that will make use of any indexes that are available on the
geometries.
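Applied to the query from the question, this might look like the sketch below. Since the 6000 is in meters, the example casts to geography; it also assumes an extra expression index on that cast (the name points_index_geog is hypothetical), because the existing points_index_geom covers the geometry column rather than the geography expression.

```sql
-- Hypothetical index on the geography cast used in the WHERE clause.
CREATE INDEX points_index_geog
    ON points
    USING gist ((geom::geography));

SELECT ST_AsText(geom) AS location
FROM points
WHERE ST_DWithin(
          geom::geography,
          ST_SetSRID(ST_MakePoint(121.33, 31.55), 4326)::geography,
          6000);  -- meters, because both arguments are geography
```

Running EXPLAIN ANALYZE on it again should show whether the planner now chooses an index scan.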

Calculating total area of polygons that intersects with other polygons in Postgis

I want to calculate in PostGIS the total area of the 'a' polygons that intersect with other 'b' polygons.
SELECT DISTINCT a.fk_sites,
SUM(ST_Area(a.the_geom)/100) as area
FROM parcelles a, sites b
WHERE st_intersects(a.the_geom,b.the_geom)
GROUP BY a.fk_sites
I need to do a SELECT DISTINCT because 'a' polygons may intersect with several 'b' polygons, so the returned 'a' polygons appear several times.
This works fine, except that not all areas are calculated correctly. A few seem to ignore the DISTINCT, so that the calculated area reflects the SUM of all 'a' records, including the duplicated ones that should have been eliminated.
When I do a query without the SUM function, I get the correct number of 'a' polygons and while adding their area I get the right value.
SELECT DISTINCT a.fk_sites,
ST_Area(a.the_geom)/100 as area
FROM parcelles a, sites b
WHERE st_intersects(a.the_geom,b.the_geom)
ORDER BY a.fk_sites
Is the combination of SELECT DISTINCT and the SUM / GROUP BY not correct?
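The problem is in the query itself rather than in your fk_sites column: DISTINCT is applied to the output rows after GROUP BY and SUM have been evaluated, so an 'a' polygon that intersects several 'b' polygons is still summed once per match. In your second query DISTINCT only appears to work because identical (fk_sites, area) rows collapse, and doing a DISTINCT on a double precision value is never a good thing.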
You can solve this by identifying the distinct rows from a in a sub-query, then sum() in the main query:
SELECT fk_sites, sum(ST_Area(the_geom)/100) AS area
FROM (
SELECT DISTINCT a.fk_sites, a.the_geom
FROM parcelles a
JOIN sites b ON ST_Intersects(a.the_geom, b.the_geom)
) sub
GROUP BY fk_sites
ORDER BY fk_sites;
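An alternative that avoids producing duplicates in the first place is a semi-join with EXISTS: each 'a' row is kept at most once, no matter how many 'b' polygons it intersects. This is only a sketch using the table and column names from the question:

```sql
-- Sketch: EXISTS keeps each parcel once if it intersects at least one site.
SELECT a.fk_sites,
       sum(ST_Area(a.the_geom) / 100) AS area
FROM parcelles a
WHERE EXISTS (
    SELECT 1
    FROM sites b
    WHERE ST_Intersects(a.the_geom, b.the_geom)
)
GROUP BY a.fk_sites
ORDER BY a.fk_sites;
```

It also sidesteps comparing geometries for equality, which a DISTINCT over the_geom has to do.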