How do I remove several duplicate geometries?

Several identical geometries appear in the database. I want an output with only one objekt_id for every distinct geometry.
I have managed to get a result with two overlapping geometries.
SELECT
a.objekt_id,
b.objekt_id,
a.geometri,
b.geometri
FROM
plandk.theme_pdk_tilslutningspligtomraade_vedtaget_v a,
plandk.theme_pdk_tilslutningspligtomraade_vedtaget_v b
WHERE
ST_EQUALS(a.geometri, b.geometri)
AND
a.objekt_id != b.objekt_id;
The result is a table with one row for each pair of overlapping geometries, though sometimes there are three rows with six overlapping geometries. I want the result to have all of these in one row.

DELETE FROM plandk.theme_pdk_tilslutningspligtomraade_vedtaget_v t
WHERE EXISTS (SELECT FROM plandk.theme_pdk_tilslutningspligtomraade_vedtaget_v t2
WHERE t.objekt_id > t2.objekt_id
AND ST_EQUALS(t.geometri, t2.geometri));
This deletes all duplicated geometries except the one with the lowest objekt_id in each group of equal geometries.
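
The effect of this kind of DELETE can be tried out without a PostGIS database. The following is only a minimal sketch using Python's built-in sqlite3 module: the table name areas and its sample rows are hypothetical, and plain text equality on a WKT string stands in for ST_Equals, which is an assumption (ST_Equals compares geometries spatially, not textually).

```python
import sqlite3

# Hypothetical stand-in table: "geometri" holds WKT text, and plain text
# equality stands in for PostGIS ST_Equals (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE areas (objekt_id INTEGER PRIMARY KEY, geometri TEXT)")
conn.executemany(
    "INSERT INTO areas VALUES (?, ?)",
    [
        (1, "POINT(0 0)"),
        (2, "POINT(0 0)"),
        (3, "POINT(1 1)"),
        (4, "POINT(0 0)"),
        (5, "POINT(1 1)"),
        (6, "POINT(2 2)"),
    ],
)

# Delete every row for which an equal geometry with a lower id exists.
conn.execute(
    """
    DELETE FROM areas
    WHERE EXISTS (
        SELECT 1
        FROM areas AS t2
        WHERE areas.objekt_id > t2.objekt_id
          AND areas.geometri = t2.geometri
    )
    """
)

remaining = conn.execute(
    "SELECT objekt_id, geometri FROM areas ORDER BY objekt_id"
).fetchall()
print(remaining)
```

Because the row with the lowest id in each group never matches the EXISTS condition, exactly one row per distinct geometry survives, no matter how many copies there were.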

Related

How to delete rows (with geometry column) from a table in PostgreSQL (PostGIS) where the row has no spatial intersection with any row in another table

I have seen strange behavior in PostgreSQL (PostGIS).
I have two tables in PostGIS with geometry columns. One table is a grid and the other one contains lines. I want to delete all grid cells that no line passes through.
In other words, I want to delete the rows from a table when that row has no spatial intersection with any row of the second table.
First, in a subquery, I find the ids of the rows that have any intersection. Then, I delete every row whose id is not in that returned list of ids.
DELETE FROM base_grid_916453354
WHERE id NOT IN
(
SELECT DISTINCT bg.id
FROM base_grid_916453354 bg,
(SELECT * FROM tracks_heatmap_1000 LIMIT 100000) tr
WHERE bg.geom && tr.buffer
);
The following subquery returns in only 12 seconds:
SELECT DISTINCT bg.id
FROM base_grid_916453354 bg,
(SELECT * FROM tracks_heatmap_1000 LIMIT 100000) tr
WHERE bg.geom && tr.buffer
The whole query, however, did not return even after 1 hour!
I ran the EXPLAIN command and this is its result, but I cannot interpret it:
How can I improve this query, and why does deleting from the returned list take so much time?
It is very strange, because the subquery is a spatial query between two tables of 9 million and 100k rows, while the delete part is just checking a list and deleting. In my mind, the delete part should be much easier.

Don't post text as images of text!
Increase work_mem until the subplan becomes a hashed subplan.
Or rewrite it to use NOT EXISTS rather than NOT IN.

I found a fast way to do this query. As @jjanes said, I used NOT EXISTS:
DELETE FROM base_grid_916453354 bg
WHERE NOT EXISTS
(
SELECT 1
FROM tracks_heatmap_1000 tr
WHERE bg.geom && tr.buffer
);
This query takes around 1 minute, which is acceptable for the size of my tables.
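
To see that the NOT EXISTS form selects the same rows as the NOT IN form, here is a small sketch using Python's built-in sqlite3 module. The tables grid and tracks, their columns, and the sample boxes are hypothetical, and an explicit bounding-box overlap test stands in for the PostGIS && operator. It only illustrates the logic, not the performance difference, which depends on the PostgreSQL planner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE grid   (id INTEGER PRIMARY KEY, xmin REAL, ymin REAL, xmax REAL, ymax REAL);
    CREATE TABLE tracks (id INTEGER PRIMARY KEY, xmin REAL, ymin REAL, xmax REAL, ymax REAL);
    """
)
# Hypothetical grid cells (well separated, so overlap is unambiguous).
conn.executemany(
    "INSERT INTO grid VALUES (?, ?, ?, ?, ?)",
    [(1, 0, 0, 1, 1), (2, 2, 0, 3, 1), (3, 4, 0, 5, 1), (4, 0, 2, 1, 3)],
)
# Hypothetical track bounding boxes: the first crosses cells 1 and 2,
# the second lies inside cell 4; nothing touches cell 3.
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?, ?, ?, ?)",
    [(1, 0.5, 0.5, 2.5, 0.6), (2, 0.2, 2.2, 0.4, 2.4)],
)

# Bounding-box overlap, standing in for "bg.geom && tr.buffer".
OVERLAP = (
    "bg.xmin <= tr.xmax AND bg.xmax >= tr.xmin AND "
    "bg.ymin <= tr.ymax AND bg.ymax >= tr.ymin"
)

# Ids selected by the NOT IN formulation.
not_in_ids = [r[0] for r in conn.execute(
    f"""
    SELECT id FROM grid
    WHERE id NOT IN (SELECT DISTINCT bg.id FROM grid bg, tracks tr WHERE {OVERLAP})
    ORDER BY id
    """
)]

# Ids selected by the NOT EXISTS formulation.
not_exists_ids = [r[0] for r in conn.execute(
    f"""
    SELECT bg.id FROM grid bg
    WHERE NOT EXISTS (SELECT 1 FROM tracks tr WHERE {OVERLAP})
    ORDER BY bg.id
    """
)]

# Delete the cells that no track overlaps.
conn.execute(
    "DELETE FROM grid WHERE id IN (%s)" % ",".join("?" * len(not_exists_ids)),
    not_exists_ids,
)
remaining = [r[0] for r in conn.execute("SELECT id FROM grid ORDER BY id")]
print(not_in_ids, not_exists_ids, remaining)
```

Both formulations pick out only cell 3 here. The two are interchangeable as long as the subquery of the NOT IN form cannot return NULL, which holds when id is a primary key.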

Best way to run ST_intersects on features inside one table?

I have a table of LineString features and I wish to identify which lines intersect.
ST_Intersects(geom1, geom2) needs two geometries from two different tables. Right now I am creating two different references back to the same table, and it just doesn't seem like the right approach.
I am currently using the following bit of code, and I am curious whether there is a better way of accomplishing this. Surely running an intersect on features within one table must be a common task.
SELECT a.link_id as a_link_id,
b.link_id as b_link_id,
st_intersects(a.geom, b.geom)
INTO results_table
FROM table_one a, table_one b
WHERE a.link_id != b.link_id;
PostGIS 2.4.0
PG 9.6.5

Your approach is OK. The only problem is that it returns duplicate records. For example, if two lines with IDs 10 and 11 intersect, the result contains two rows for that pair, (10, 11) and (11, 10), even though the lines intersect only once. You can avoid this by using the > or < operator in place of !=. The intersection condition also belongs in the WHERE clause, I guess:
SELECT a.link_id as a_link_id,
b.link_id as b_link_id
INTO results_table
FROM table_one a, table_one b
WHERE a.link_id < b.link_id AND st_intersects(a.geom, b.geom)
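
The difference between != and < in a self-join can be checked on a toy table. This is a sketch with Python's built-in sqlite3 module; the table links and its rows are hypothetical, and overlap of one-dimensional intervals stands in for st_intersects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in: each "line" is a 1-D interval [lo, hi];
# interval overlap plays the role of st_intersects(a.geom, b.geom).
conn.execute("CREATE TABLE links (link_id INTEGER PRIMARY KEY, lo REAL, hi REAL)")
conn.executemany(
    "INSERT INTO links VALUES (?, ?, ?)",
    [(10, 0, 5), (11, 3, 8), (12, 10, 12)],
)

QUERY = """
    SELECT a.link_id, b.link_id
    FROM links a, links b
    WHERE a.link_id {op} b.link_id
      AND a.lo <= b.hi AND a.hi >= b.lo
    ORDER BY a.link_id, b.link_id
"""

# "!=" reports every intersecting pair twice, once in each order.
pairs_not_equal = conn.execute(QUERY.format(op="!=")).fetchall()
# "<" keeps only the ordering with the smaller id first.
pairs_less_than = conn.execute(QUERY.format(op="<")).fetchall()

print(pairs_not_equal)
print(pairs_less_than)
```

With <, every unordered pair appears exactly once, which also halves the number of comparisons the join has to report.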

Join two tables where both joined columns have a large set of different values

I am currently trying to join two tables, where both of the tables have very many different values in the columns I am joining.
Here's the T-SQL:
SELECT AVG(Position) AS Position FROM MonitoringGsc_Keywords AS sk
JOIN GSC_RankingData ON sk.Id = GSC_RankingData.KeywordId
GROUP BY sk.Id
The execution plan shows me that the join takes a long time to perform. I think it is because a huge group from the first table has to be compared with a huge group of values in the second table.
MonitoringGsc_Keywords.Id has about 60.000 different values.
GSC_RankingData has about 100.000.000 values.
MonitoringGsc_Keywords.Id is the primary key of MonitoringGsc_Keywords. GSC_RankingData.KeywordId is indexed.
So, what can I do to increase performance?

Is the Position column from the GSC_RankingData table? If yes, then the JOIN is redundant and the query should look like this:
SELECT AVG(rd.Position) as Position
FROM GSC_RankingData rd
GROUP BY rd.KeywordId;
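
As a quick sanity check of that claim, the sketch below compares both queries on toy data with Python's built-in sqlite3 module. The sample rows are hypothetical, and the result only holds under the assumption that every KeywordId in GSC_RankingData refers to an existing row in MonitoringGsc_Keywords; otherwise the inner join would filter out groups that the single-table query keeps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE MonitoringGsc_Keywords (Id INTEGER PRIMARY KEY);
    CREATE TABLE GSC_RankingData (KeywordId INTEGER, Position INTEGER);
    """
)
# Hypothetical sample data; every KeywordId matches an existing keyword.
conn.executemany("INSERT INTO MonitoringGsc_Keywords VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO GSC_RankingData VALUES (?, ?)",
    [(1, 2), (1, 4), (2, 10), (2, 20), (2, 30), (3, 7)],
)

# Original shape: join to the keyword table, group by its key.
with_join = conn.execute(
    """
    SELECT sk.Id, AVG(rd.Position)
    FROM MonitoringGsc_Keywords AS sk
    JOIN GSC_RankingData AS rd ON sk.Id = rd.KeywordId
    GROUP BY sk.Id
    ORDER BY sk.Id
    """
).fetchall()

# Simplified shape: group the ranking table directly.
without_join = conn.execute(
    """
    SELECT rd.KeywordId, AVG(rd.Position)
    FROM GSC_RankingData AS rd
    GROUP BY rd.KeywordId
    ORDER BY rd.KeywordId
    """
).fetchall()

print(with_join)
print(without_join)
```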
If the Position column is in the GSC_RankingData table, then the index on GSC_RankingData should include this column and look like this:
CREATE INDEX IX_GSC_RankingData_KeywordId_Position ON GSC_RankingData(KeywordId) INCLUDE(Position);
You should also check index fragmentation for these tables. To do this, you could use this query:
SELECT * FROM sys.dm_db_index_physical_stats(db_id(), object_id('MonitoringGsc_Keywords'), null, null, 'DETAILED')
If avg_fragmentation_in_percent is greater than 5% and less than 30%, then:
ALTER INDEX [index name] on [table name] REORGANIZE;
If avg_fragmentation_in_percent is 30% or more, then:
ALTER INDEX [index name] on [table name] REBUILD;
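
The thresholds above amount to a simple decision rule. A minimal sketch of it in Python follows; the function name index_maintenance_action and the "NONE" label for lightly fragmented indexes are hypothetical, introduced only for illustration.

```python
def index_maintenance_action(avg_fragmentation_in_percent: float) -> str:
    """Map a fragmentation percentage to the maintenance step named above.

    Hypothetical helper: thresholds are the ones quoted in the answer
    (more than 5% and less than 30%: REORGANIZE; 30% or more: REBUILD).
    """
    if avg_fragmentation_in_percent >= 30:
        return "REBUILD"
    if avg_fragmentation_in_percent > 5:
        return "REORGANIZE"
    return "NONE"  # at or below 5%: leave the index alone


for pct in (2.0, 12.5, 30.0, 75.0):
    print(pct, index_maintenance_action(pct))
```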
It could also be a problem with statistics. You could check this with the following query:
SELECT
sp.stats_id, name, filter_definition, last_updated, rows, rows_sampled,
steps, unfiltered_rows, modification_counter
FROM sys.stats AS stat
CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp
WHERE stat.object_id = object_id('GSC_RankingData');
Check the last update date and the row count; if they are not current, update the statistics. It is also possible that the statistics do not exist, in which case you must create them.

number of points within a radius of another set of points

I have two tables. One is a list of stores (with lat/long). The other is a list of customer addresses (with lat/long). What I want is a query that will return the number of customers within a certain radius for each store in my table. The query below gives me the total number of customers within 10,000 meters of ANY store, but I'm not sure how to loop it to return one row for each store with a count.
Note that I'm running these queries using cartoDB, where the_geom is basically long/lat.
SELECT COUNT(*) as customer_count FROM customer_table
WHERE EXISTS(
SELECT 1 FROM store_table
WHERE ST_Distance_Sphere(store_table.the_geom, customer_table.the_geom) < 10000
)
This results in a single row :
customer_count
4009
Suggestions on how to make this work against my problem? I'm open to doing this other ways that might be more efficient (faster).
For reference, the column with store names is store_table.store_identifier.

I'll assume that you use the_geom to represent the coordinates (lat/lon) of the store and the customer. I will also assume that the_geom is of geography type. Your query will be something like this:
select s.id, count(*) as customer_count
from customers c
inner join stores s
on st_dwithin(c.the_geom, s.the_geom, 10000)
group by s.id
This should give you a neat table with a store id and the count of customers within 10,000 meters of the store.
If the_geom is of type geometry, your query will be very similar, but you should use st_distance_sphere() instead in order to express the distance in meters (not degrees).
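
The join-and-group pattern can be tried outside PostGIS as well. The sketch below uses Python's built-in sqlite3 module and registers a haversine function under the hypothetical name sphere_distance_m as a stand-in for st_distance_sphere(). The tables, coordinates, and Earth radius value are assumptions of this sketch, and haversine only approximates what PostGIS computes.

```python
import math
import sqlite3

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (assumed value)


def sphere_distance_m(lon1, lat1, lon2, lat2):
    """Haversine great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under a hypothetical name.
conn.create_function("sphere_distance_m", 4, sphere_distance_m)
conn.executescript(
    """
    CREATE TABLE stores    (id INTEGER PRIMARY KEY, lon REAL, lat REAL);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, lon REAL, lat REAL);
    """
)
conn.executemany("INSERT INTO stores VALUES (?, ?, ?)", [(1, 0.0, 0.0), (2, 10.0, 10.0)])
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, 0.05, 0.0),    # roughly 5.6 km from store 1
        (2, 0.0, 0.08),    # roughly 8.9 km from store 1
        (3, 0.2, 0.0),     # roughly 22 km from store 1, outside the radius
        (4, 10.01, 10.0),  # roughly 1.1 km from store 2
    ],
)

# One row per store with the number of customers within 10,000 meters.
counts = conn.execute(
    """
    SELECT s.id, COUNT(*) AS customer_count
    FROM customers c
    JOIN stores s
      ON sphere_distance_m(c.lon, c.lat, s.lon, s.lat) < 10000
    GROUP BY s.id
    ORDER BY s.id
    """
).fetchall()
print(counts)
```

Note that with an inner join, a store with no customers inside the radius does not appear in the result at all; a LEFT JOIN from stores with COUNT(c.id) would list it with a count of 0.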

Updating multiple rows in one table based on multiple rows in a second

I have two tables, table1 and table2, both of which contain columns that store PostGIS geometries. What I want to do is see where the geometry stored in any row of table2 geometrically intersects with the geometry stored in any row of table1, and update a count column in table1 with the number of intersections. For example, if the geometry in row 1 of table1 intersects with the geometries stored in 5 rows of table2, I want to store a count of 5 in a separate column in table1. The tricky part for me is that I want to do this for every row of table1 at the same time.
I have the following:
UPDATE circles SET intersectCount = intersectCount + 1 FROM rectangles
WHERE ST_INTERSECTS(circles.geom, rectangles.geom);
...which doesn't seem to be working. I'm not too familiar with postgres (or sql in general) and I'm wondering if I can do this all in one statement or if I need a few. I have some ideas for how I would do this with multiple statements (or using for loop) but I'm really looking for a concise solution. Any help would be much appreciated.
Thanks!

something like:
update t1 set ctr=helper.cnt
from (
select t1.id, count(*) as cnt
from t1, t2
where st_intersects(t1.col, t2.col)
group by t1.id
) helper
where helper.id=t1.id
?
btw: Your version does not work, because a row can get updated only once in a single update statement.
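
A variant of the same idea is sketched below with Python's built-in sqlite3 module. It uses a correlated subquery in the SET clause instead of UPDATE ... FROM, which is a different technique from the one above, chosen here because it also writes 0 for rows without any intersection. The tables, columns, and rows are hypothetical, and overlap of one-dimensional intervals stands in for st_intersects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, lo REAL, hi REAL, ctr INTEGER DEFAULT 0);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, lo REAL, hi REAL);
    """
)
# Hypothetical data: intervals [lo, hi] stand in for geometries.
conn.executemany(
    "INSERT INTO t1 (id, lo, hi) VALUES (?, ?, ?)",
    [(1, 0, 10), (2, 20, 30), (3, 50, 60)],
)
conn.executemany(
    "INSERT INTO t2 VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 5, 6), (3, 9, 25), (4, 28, 29)],
)

# Each t1 row is updated exactly once; the count is computed per row
# by the correlated subquery (interval overlap replaces st_intersects).
conn.execute(
    """
    UPDATE t1
    SET ctr = (
        SELECT COUNT(*)
        FROM t2
        WHERE t1.lo <= t2.hi AND t1.hi >= t2.lo
    )
    """
)

counts = conn.execute("SELECT id, ctr FROM t1 ORDER BY id").fetchall()
print(counts)
```

Aggregating first and assigning once per row is the key point in both forms: the statement never relies on the same row being incremented several times.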

We Keep Coding
