Data
I have 2 tables
- 3D point geometries (obs.geom), n=10
- a single 3D point (target.geom), n=1
Problem - part 1
When I run the following code it lists all 10 of the obs geoms rather than just the closest point to the target.geom. Can anyone give me some pointers?
SELECT ST_3DClosestPoint(target.geom, obs.geom)
FROM target, obs
Part 2
I then want to add in the Distance3D
SELECT ST_3DDistance(ST_3DClosestPoint(target.geom, obs.geom) as dist_closest, obs.geom) as distance
FROM target, obs
where dist_closest > 1.5
We cannot use the KNN operator (<->) here, since it works only in 2D, so we have to work around it a bit.
For a single point in the target table it looks like this:
select * , st_distance(o.geom, t.geom), st_3ddistance(o.geom, t.geom)
from obs o, target t
order by st_3ddistance(o.geom, t.geom) limit 1
But this will not work if you want results for many targets at once. To find the closest point for each of many targets, we have to use a lateral join:
select t2.*, a.*
from target t2,
lateral (select o.*
from obs o, target t
where t2.id=t.id
order by st_3ddistance(o.geom, t.geom) limit 1) a
If you want more than one closest point, just increase the limit.
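As a hedged variation on the query above, the lateral subquery can reference the outer target row directly, so the second scan of target is not needed. This sketch assumes the same obs and target tables with a geom column; the alias nearest and the output column dist_3d are illustrative names, not part of the original schema.

```sql
-- For every target row, pick the single closest obs row in 3D.
SELECT t.*, nearest.*
FROM target t
CROSS JOIN LATERAL (
    SELECT o.*,
           ST_3DDistance(o.geom, t.geom) AS dist_3d  -- illustrative column name
    FROM obs o
    ORDER BY ST_3DDistance(o.geom, t.geom)
    LIMIT 1                                          -- raise for more neighbours
) AS nearest;
```

A distance threshold such as the 1.5 from part 2 could then be applied in an outer WHERE clause on nearest.dist_3d.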
Related
I have two polygon layers. I want to run st_intersection on them, to give the result of the areas where they overlap as a new layer. The new layer should contain the attributes from both input layers. I found this image which seems to illustrate my desired end results.
My two input layers are both polygons:
SELECT st_geometrytype(geom),
COUNT(*)
FROM a
GROUP BY st_geometrytype(geom)
-- Result is 1368 st_polygons
SELECT st_geometrytype(geom),
COUNT(*)
FROM b
GROUP BY st_geometrytype(geom)
-- Result is 539548 st_polygons
The query I run is as below:
SELECT a.*,
b.*,
st_intersection(a.geom, b.geom) as geom
FROM a,b
WHERE st_intersects(a.geom, b.geom)
However in the result I get not just polygons (which I expect), but lines, points, multipolygons and geometry collections. I guess because some of my input polygons share points but not true intersections perhaps?
Grateful for some advice please on how to deal with this, whether my query is correct, anything I can do to improve performance etc. Thanks.
ST_Intersection returns several geometry types, depending on the relative topology of the inputs.
For example, running ST_Intersection on two adjacent polygons returns the shared part of their boundary.
While it outputs a single table (as you can verify in pgAdmin, for example), in the Browser panel of QGIS it will be shown as multiple tables of different geometry types (for example: POLYGON, MULTIPOLYGON, LINE, and POINT) but (somewhat confusingly) with the same name.
Visually, you can tell them apart by the accompanying icons on the left:
You can however select which type of geometry you want, for example by adding a WHERE filter with ST_Dimension:
SELECT a.*,
b.*,
st_intersection(a.geom, b.geom) as geom
FROM a,b
WHERE st_intersects(a.geom, b.geom)
AND ST_Dimension(st_intersection(a.geom, b.geom)) = 2;
or, for performance's sake, rewrite it along these lines:
SELECT clipped.*
FROM (
SELECT a.id, b."fieldName",
(ST_Dump(ST_Intersection(a.geom, b.geom))).geom AS geom
FROM "public"."table_A_name" AS a INNER JOIN "public"."table_B_name" AS b
ON ST_Intersects(a.geom, b.geom)
) AS clipped
WHERE ST_Dimension("clipped"."geom") = 2;
The latter solution uses a subquery (a derived table), so ST_Intersection runs only once per pair of intersecting geometries.
You might have noticed that the trick is in ST_Dimension("clipped"."geom") = 2.
ST_Dimension filters the output of ST_Intersection so as to keep only polygons (which have a topological dimension of 2).
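Another possible way to keep only the polygonal part of each intersection is ST_CollectionExtract with type 3 (polygon). The sketch below reuses the placeholder table and column names from the query above. Unlike the ST_Dump version, it returns one row per intersecting pair, possibly as a MULTIPOLYGON, rather than one row per polygon part.

```sql
SELECT clipped.*
FROM (
    SELECT a.id,
           b."fieldName",
           -- type 3 = polygons; points and lines in the result are discarded
           ST_CollectionExtract(ST_Intersection(a.geom, b.geom), 3) AS geom
    FROM "public"."table_A_name" AS a
    INNER JOIN "public"."table_B_name" AS b
        ON ST_Intersects(a.geom, b.geom)
) AS clipped
-- pairs that only touch along a line or at a point yield an empty geometry
WHERE NOT ST_IsEmpty(clipped.geom);
```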
We have a database of individual trees with geolocation. In the DB we have a geom point, built from longitude and latitude, named estimated_geometric_location. We get a periodic update of these trees, let's say every month. I would like to get a list of trees that has two properties. I am looking to identify the most likely update of a specific tree, i.e. when a new set of trees from one tracking event comes in, we need to run a routine suggesting that data entry x.2 is an update of data point x.1. Ideally this routine then updates the new data point (child), adding the older mother data point, which then hopefully represents that same tree.
So far I have something like this, but the DB is not responding (or maybe I am not waiting long enough... I have waited about 10 minutes so far)
SELECT
i.id
,ST_Distance(i.estimated_geometric_location, i.b_estimated_geometric_location) AS dist
FROM(
SELECT
a.id
,b.id AS b_id
,a.estimated_geometric_location
,b.estimated_geometric_location AS b_estimated_geometric_location
,rank() OVER (PARTITION BY a.id ORDER BY ST_Distance(a.estimated_geometric_location, b.estimated_geometric_location)) AS pos
FROM trees a, trees b
WHERE a.id <> b.id) i
WHERE pos = 1
Would be great to get some ideas on this. I got this from a post here somewhere and have adapted it but so far no luck.
There are a couple of things to mention. If the data comes from a tracking event, why compare existing trees to each other? I'd expect to have something like
SELECT id
FROM trees
ORDER BY st_distance(estimated_geometric_location, st_makepoint(15, 30))
LIMIT 1
which returns the tree closest to the point with longitude 15 and latitude 30. Have a look at whether you need to do that join at all.
Supposing that you do, the problem with a query like this is complexity. If you have, say, 1000 trees in your database, then you're calculating the distance between each of the 1000 trees and all 999 of its counterparts: 999,000 distances! Since the distance between A and B is the same as between B and A, you can shave off half of them with the condition a.id < b.id.
Furthermore, think about what you're doing. You want to find the minimal distance between any two trees and the ids of the trees that correspond to that distance, right? There is no need to calculate any distances as soon as you know they're not the minimal one.
SELECT a.id, b.id, ST_Distance(a.estimated_geometric_location, b.estimated_geometric_location) distance
FROM trees a, trees b
WHERE a.id < b.id
ORDER BY distance
LIMIT 1
is a much simpler way of getting there, and for me it's a lot faster as well.
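If what is needed is the nearest neighbour of every tree rather than the single closest pair, a lateral join with the 2D KNN operator <-> is one possible sketch. It assumes a GiST index on estimated_geometric_location (the index name below is made up) and a PostGIS/PostgreSQL version recent enough for index-assisted KNN inside a lateral subquery.

```sql
-- Hypothetical index name; lets the <-> ordering use the spatial index.
CREATE INDEX IF NOT EXISTS trees_location_gist
    ON trees USING gist (estimated_geometric_location);

SELECT a.id,
       nn.id   AS nearest_id,
       nn.dist AS dist
FROM trees a
CROSS JOIN LATERAL (
    SELECT b.id,
           ST_Distance(a.estimated_geometric_location,
                       b.estimated_geometric_location) AS dist
    FROM trees b
    WHERE b.id <> a.id
    -- KNN ordering: the closest candidate comes first
    ORDER BY a.estimated_geometric_location <-> b.estimated_geometric_location
    LIMIT 1
) AS nn;
```

This avoids materialising all pairwise distances, which is what makes the rank() version slow.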
I'm not sure I can come up with a clear question title...
What I want is to calculate the distance between points and polygons (this is step 1) and then, for each point, get only the closest polygon (NB: one polygon can have many points attached, but one point must be attached to only one polygon).
What I'm currently doing is the following:
CREATE TABLE temp_table AS
SELECT
areas.*,
points.*, -- includes a points_id column
ST_DistanceSphere(areas.geometry, points.geometry) AS distance_sphere
FROM points
INNER JOIN areas
ON st_DWithin(areas.geometry, points.geometry, 25)
SELECT *
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY temp_table.points_id ORDER BY distance_sphere ASC) as rownumber, *
FROM temp_table
) X
WHERE rownumber = 1
I have a feeling it's quite inefficient (the first request has been processing all night, on a 4 000 000 rows database... It took 29mn with a limit 10 at the end) as it's calculating many useless rows.
Would putting the first request in the second one be faster ?
SELECT *
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY sub.points_id ORDER BY distance_sphere ASC) as rownumber, *
FROM (
SELECT
areas.*,
points.*, -- includes a points_id column
ST_DistanceSphere(areas.geometry, points.geometry) AS distance_sphere
FROM areas
INNER JOIN points
ON st_DWithin(areas.geometry, points.geometry, 25)
) sub
) X
WHERE rownumber = 1
If not, how could I optimize what I'm doing ?
Which EPSG/SRID do you use (degrees or meters)? For example:
- 4326 is in degrees
- 3857 is in meters
If you use a metric SRID, then you should use st_distance, not st_distancesphere. If you use a degree-based SRID, be careful with st_dwithin: it uses the units of the SRID, so 25 means 25 degrees, which is a HUGE distance (around 3000 km).
So if you use 4326 (degrees), use a much smaller value than 25 in st_dwithin.
Create gist indexes on both geometry columns.
Create index on points using gist(geometry);
Create index on areas using gist(geometry);
Then just use your query with the proposed changes (change st_distancesphere to st_distance, or use st_dwithin with a much smaller value).
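Putting the advice together, one possible single-query sketch is shown below. It assumes the data is stored in SRID 4326 and that 25 was meant as 25 meters, so both columns are cast to geography, where distances are measured in meters. The index names and the distance_m column are illustrative. The expression indexes are an assumption as well: an index on the plain geometry column would not be used for a filter on the geography cast.

```sql
-- Hypothetical expression indexes so ST_DWithin on geography can use an index.
CREATE INDEX IF NOT EXISTS points_geog_gist
    ON points USING gist ((geometry::geography));
CREATE INDEX IF NOT EXISTS areas_geog_gist
    ON areas USING gist ((geometry::geography));

-- One row per point: the closest area within 25 m.
SELECT DISTINCT ON (p.points_id)
       p.points_id,
       a.*,
       ST_Distance(a.geometry::geography, p.geometry::geography) AS distance_m
FROM points p
INNER JOIN areas a
    ON ST_DWithin(a.geometry::geography, p.geometry::geography, 25)
ORDER BY p.points_id, distance_m;
```

DISTINCT ON keeps the first row per points_id in the given order, which replaces the ROW_NUMBER() filter and the temporary table.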
I want to calculate in PostGIS the total area of the 'a' polygons that intersect with other 'b' polygons.
SELECT DISTINCT a.fk_sites,
SUM(ST_Area(a.the_geom)/100) as area
FROM parcelles a, sites b
WHERE st_intersects(a.the_geom,b.the_geom)
GROUP BY a.fk_sites
I need to do a SELECT DISTINCT because 'a' polygons may intersect with several 'b' polygons, so that the returned 'a' polygons appear a few times.
This works fine, except that not all areas are calculated correctly. A few seem to ignore the DISTINCT, so that the calculated area reflects the SUM of all records, including the duplicated 'a' records (even though they should be eliminated).
When I do a query without the SUM function, I get the correct number of 'a' polygons and while adding their area I get the right value.
SELECT DISTINCT a.fk_sites,
ST_Area(a.the_geom)/100 as area
FROM parcelles a, sites b
WHERE st_intersects(a.the_geom,b.the_geom)
ORDER BY a.fk_sites
Is the combination of SELECT DISTINCT and the SUM / GROUP BY not correct?
The DISTINCT does not help here: it is applied after GROUP BY and SUM, so the duplicate 'a' rows produced by the join (one per intersecting 'b' polygon) have already been summed. Doing a DISTINCT on a double precision value is never a good thing either.
You can solve this by identifying the distinct rows from a in a sub-query, then sum() in the main query:
SELECT fk_sites, sum(ST_Area(the_geom)/100) AS area
FROM (
SELECT DISTINCT a.fk_sites, a.the_geom
FROM parcelles a
JOIN sites b ON ST_Intersects(a.the_geom, b.the_geom)
) sub
GROUP BY fk_sites
ORDER BY fk_sites;
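An alternative sketch that never creates the duplicates in the first place is a semi-join with EXISTS: each 'a' polygon is kept at most once, however many 'b' polygons it intersects. It uses only the tables and columns from the question.

```sql
SELECT a.fk_sites,
       sum(ST_Area(a.the_geom) / 100) AS area
FROM parcelles a
WHERE EXISTS (
    -- true as soon as one intersecting site is found
    SELECT 1
    FROM sites b
    WHERE ST_Intersects(a.the_geom, b.the_geom)
)
GROUP BY a.fk_sites
ORDER BY a.fk_sites;
```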
I have the following, which gives me the number of customers within 10,000 meters of any store location:
SELECT COUNT(*) as customer_count FROM customer_table c
WHERE EXISTS(
SELECT 1 FROM locations_table s
WHERE ST_Distance_Sphere(s.the_geom, c.the_geom) < 10000
)
What I need is for this query to return not only the number of customers within 10,000 meters, but also the following. The number of customers within...
10,000 meters
more than 10,000, but less than 50,000
more than 50,000, but less than 100,000
more than 100,000
...of any location.
I'm open to this working a couple of ways. For a given customer, only count them one time (the shortest distance to any store), which would count everyone exactly once. I realize this is probably pretty complex. I'm also open to having people be counted multiple times, which is really the accurate values anyway and think should be much simpler.
Thanks for any direction.
You can do both types of queries relatively easily. But an issue here is that you do not know which customers are associated with which store locations, which seems like an interesting thing to know. If you want that, use the PK and store_name of the locations_table in the query. See both options with location id and store_name below. To emphasize the difference between the two options:
The first option indicates how many customers are in every distance class for every store location, for all customers for every store location.
The second option indicates how many customers are in every distance class for every store location, for the nearest store location for each customer only.
Both queries have O(n x m) cost (from the CROSS JOIN between customer_table and locations_table) and are likely to become rather slow as the number of rows in either table grows.
Count customers in all distance classes
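You should CROSS JOIN customers with store locations to get every customer-to-store distance, then group the results by store location id, name, and the maximum-distance classes that you define. You can create a "table" from your distance classes with the VALUES command, which you can then use in any query. Note that because the join condition is dist < class limit, the classes are cumulative: a customer within 10,000 meters is also counted in every larger class.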
SELECT loc_dist.id, loc_dist.store_name, grps.grp, count(*)
FROM (
SELECT s.id, s.store_name, ST_Distance_Sphere(s.the_geom, c.the_geom) AS dist
FROM customer_table c, locations_table s) AS loc_dist
JOIN (
VALUES(1, 10000.), (2, 50000.), (3, 100000.), (4, 1000000.)
) AS grps(grp, dist) ON loc_dist.dist < grps.dist
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
Count customers in the nearest distance class
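If you want customers listed in the nearest distance class only, make the same CROSS JOIN on customer_table and locations_table as in the previous case, but assign each distance to a single class (the lowest one it fits in) using a CASE clause, and GROUP BY store location id, name and distance class as before. Note that, as written, the CROSS JOIN still pairs every customer with every store; to count each customer only at the closest store, the inner query would first have to be reduced to one row per customer.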
SELECT
id, store_name,
CASE
WHEN dist < 10000. THEN 1
WHEN dist < 50000. THEN 2
WHEN dist < 100000. THEN 3
ELSE 4
END AS grp,
count(*)
FROM (
SELECT s.id, s.store_name, ST_Distance_Sphere(s.the_geom, c.the_geom) AS dist
FROM customer_table c, locations_table s) AS loc_dist
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
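If each customer should be counted exactly once, at the nearest store, one possible sketch reduces the cross join to one row per customer with DISTINCT ON before classifying. It assumes customer_table has a unique id column (not shown in the question) and uses ST_DistanceSphere, the name this function has had since PostGIS 2.2; the alias nearest is illustrative.

```sql
SELECT
    id, store_name,
    CASE
        WHEN dist < 10000.  THEN 1
        WHEN dist < 50000.  THEN 2
        WHEN dist < 100000. THEN 3
        ELSE 4
    END AS grp,
    count(*)
FROM (
    -- one row per customer: the store with the smallest distance
    SELECT DISTINCT ON (c.id)          -- c.id is an assumed customer key
           s.id, s.store_name,
           ST_DistanceSphere(s.the_geom, c.the_geom) AS dist
    FROM customer_table c
    CROSS JOIN locations_table s
    ORDER BY c.id, dist
) AS nearest
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
```

The counts over all rows of the result then add up to the number of customers, since every customer appears in exactly one group.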