Deleting duplicate edges from edge table - postgresql

I have thought of the following query, but it seems to be wrong:
delete from "table"
where geom_line in (select distinct a.geom_line
                    from "table" a, "table" b
                    where a.source = b.target and b.source = a.target);
My idea is to delete one of the edges whenever there are two edges whose source and target are swapped, i.e. for nodes 1 and 2 there are two edges, 1->2 and 2->1, and I want to delete one of them.
When I run only the subquery I get 2000 rows of distinct edges with their geom_line, but when I run the whole query 4000 rows are deleted. What could be wrong here?

That should be as simple as
DELETE FROM "table" a
USING "table" b
WHERE a.source = b.target
AND a.target = b.source
AND a.source > a.target;
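If you want to preview the rows that would be removed before running the DELETE, the same condition can be written as a plain self-join (a sketch assuming the same "table", source and target columns as above):
SELECT a.*
FROM "table" a
JOIN "table" b
  ON a.source = b.target
 AND a.target = b.source
WHERE a.source > a.target;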


How to delete rows (with geometry column) from a table in PostgreSQL (PostGIS) where the row has no spatial intersection with any row in another table

I have seen a strange behavior in PostgreSQL (PostGIS).
I have two tables in PostGIS with geometry columns. One table is a grid and the other one is lines. I want to delete all grid cells that no line passes through.
In other words, I want to delete the rows from a table when that row has no spatial intersection with any row of the second table.
First, in a subquery, I find the ids of the rows that have any intersection. Then, I delete every row whose id is not in that returned list of ids.
DELETE FROM base_grid_916453354
WHERE id NOT IN
(
    SELECT DISTINCT bg.id
    FROM base_grid_916453354 bg,
         (SELECT * FROM tracks_heatmap_1000 LIMIT 100000) tr
    WHERE bg.geom && tr.buffer
);
The following subquery returns in only 12 seconds:
SELECT DISTINCT bg.id
FROM base_grid_916453354 bg,
     (SELECT * FROM tracks_heatmap_1000 LIMIT 100000) tr
WHERE bg.geom && tr.buffer
while the whole query did not return even after 1 hour!
I ran the EXPLAIN command, but I cannot interpret its output.
How can I improve this query, and why does deleting from the returned list take so much time?
It is very strange, because the subquery is a spatial query between two tables of 9 million and 100k rows, while the delete part is just checking a list and deleting! In my mind, the delete part should be much easier.
Don't post text as images of text!
Increase work_mem until the subplan becomes a hashed subplan.
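For the work_mem route, a minimal sketch (the value is only an example and should be sized to the memory actually available on the server):
-- per-session setting; a larger work_mem lets the planner hash the NOT IN subplan
SET work_mem = '256MB';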
Or rewrite it to use NOT EXISTS rather than NOT IN.
I found a fast way to do this query. As @jjanes said, I used NOT EXISTS:
DELETE FROM base_grid_916453354 bg
WHERE NOT EXISTS
(
SELECT 1
FROM tracks_heatmap_1000 tr
WHERE bg.geom && tr.buffer
);
This query takes around 1 minute and it is acceptable for the size of my tables.
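A related sketch: the && operator can only be accelerated if the geometry columns have spatial indexes, so if they are not indexed yet, GiST indexes along these lines may speed up both variants:
-- index the geometry columns used in the && predicate
CREATE INDEX ON base_grid_916453354 USING gist (geom);
CREATE INDEX ON tracks_heatmap_1000 USING gist (buffer);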

How do I remove several duplicate geometries?

Several identical geometries appear in the database. I want an output with only one objekt_id for every distinct geometry.
I have managed to get a result with two overlapping geometries:
SELECT
    a.objekt_id,
    b.objekt_id,
    a.geometri,
    b.geometri
FROM
    plandk.theme_pdk_tilslutningspligtomraade_vedtaget_v a,
    plandk.theme_pdk_tilslutningspligtomraade_vedtaget_v b
WHERE
    ST_Equals(a.geometri, b.geometri)
    AND a.objekt_id != b.objekt_id;
The result is a table showing one row for each pair of overlapping geometries, though sometimes there are three rows for six overlapping geometries. I want the result to have all of these in one row.
DELETE FROM plandk.theme_pdk_tilslutningspligtomraade_vedtaget_v t
WHERE EXISTS (
    SELECT 1
    FROM plandk.theme_pdk_tilslutningspligtomraade_vedtaget_v t2
    WHERE t.objekt_id > t2.objekt_id
      AND ST_Equals(t.geometri, t2.geometri)
);
It will delete all duplicated geometries except the one with the lowest objekt_id in each group of equal geometries.
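To verify the result, the self-join from the question can be re-run; after the delete it should report zero remaining duplicate pairs (a sketch reusing the same table and columns):
SELECT count(*)
FROM plandk.theme_pdk_tilslutningspligtomraade_vedtaget_v a,
     plandk.theme_pdk_tilslutningspligtomraade_vedtaget_v b
WHERE ST_Equals(a.geometri, b.geometri)
  AND a.objekt_id != b.objekt_id;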

I'm creating an edge table, how to prevent duplicating edges

The query is like so:
CREATE TABLE Edge_Table AS
SELECT a.gid,
       nextval('ty') AS edge_gid,
       ST_SetSRID(ST_MakeLine(a.geom, getcentroids(a.gid)), 4326) AS geom_line
FROM Points_table a;
My getcentroids function returns the 8 nearest points for each point, and an edge is created to each of them. The problem is duplicates: the same edge is created from 1->2 and from 2->1. How do I optimise this query itself, given that a large amount of data has to be processed? Could an index or a UNIQUE constraint help?
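A UNIQUE constraint can help if the edge table exposes the two endpoint ids; a hedged sketch, assuming hypothetical source and target columns (they do not appear in the query above): normalise each pair so the smaller id comes first, then enforce uniqueness on the normalised pair.
-- hypothetical source/target id columns; (1,2) and (2,1) normalise to the same key
CREATE UNIQUE INDEX edge_table_undirected_uniq
    ON Edge_Table ((LEAST(source, target)), (GREATEST(source, target)));
-- when filling the table row by row, the reverse edge can then be skipped
-- instead of raising an error (PostgreSQL 9.5+):
-- INSERT INTO Edge_Table (source, target, geom_line) VALUES (...)
-- ON CONFLICT DO NOTHING;
Alternatively, duplicates that already exist can be removed afterwards with the source > target trick shown in the first answer above.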

Join two tables where both joined columns have a large set of different values

I am currently trying to join two tables, where both of the tables have very many different values in the columns I am joining on.
Here's the T-SQL:
SELECT AVG(Position) AS Position
FROM MonitoringGsc_Keywords AS sk
JOIN GSC_RankingData ON sk.Id = GSC_RankingData.KeywordId
GROUP BY sk.Id
The execution plan shows me that most of the time is spent performing the join. I think it is because a huge set of values from the first table has to be compared with a huge set of values in the second table.
MonitoringGsc_Keywords.Id has about 60,000 distinct values.
GSC_RankingData has about 100,000,000 values.
MonitoringGsc_Keywords.Id is the primary key of MonitoringGsc_Keywords; GSC_RankingData.KeywordId is indexed.
So, what can I do to increase performance?
Is the Position column from the GSC_RankingData table? If yes, then the JOIN is redundant and the query should look like this:
SELECT AVG(rd.Position) as Position
FROM GSC_RankingData rd
GROUP BY rd.KeywordId;
If the Position column is in the GSC_RankingData table, then the index on GSC_RankingData should include this column and look like this:
CREATE INDEX IX_GSC_RankingData_KeywordId_Position ON GSC_RankingData(KeywordId) INCLUDE(Position);
You should also check index fragmentation for these tables; to do this you could use this query:
SELECT * FROM sys.dm_db_index_physical_stats(db_id(), object_id('MonitoringGsc_Keywords'), null, null, 'DETAILED')
If avg_fragmentation_in_percent is greater than 5% and less than 30%:
ALTER INDEX [index name] ON [table name] REORGANIZE;
If avg_fragmentation_in_percent is 30% or more:
ALTER INDEX [index name] ON [table name] REBUILD;
It could also be a problem with statistics; you can check them with this query:
SELECT
sp.stats_id, name, filter_definition, last_updated, rows, rows_sampled,
steps, unfiltered_rows, modification_counter
FROM sys.stats AS stat
CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp
WHERE stat.object_id = object_id('GSC_RankingData');
Check the last update date and the row counts; if they are not current, update the statistics. It is also possible that the statistics do not exist, in which case you must create them.
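For example, a hedged sketch using the table and column names from the question (the statistics name is illustrative):
-- refresh all statistics on the table with a full scan
UPDATE STATISTICS GSC_RankingData WITH FULLSCAN;
-- or create missing column statistics explicitly
CREATE STATISTICS st_GSC_RankingData_KeywordId
    ON GSC_RankingData (KeywordId);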

Updating multiple rows in one table based on multiple rows in a second

I have two tables, table1 and table2, both of which contain columns that store PostGIS geometries. What I want to do is see where the geometry stored in any row of table2 geometrically intersects with the geometry stored in any row of table1, and update a count column in table1 with the number of intersections. Therefore, if a geometry in row 1 of table1 intersects with the geometries stored in 5 rows of table2, I want to store a count of 5 in a separate column in table1. The tricky part for me is that I want to do this for every row of table1 at the same time.
I have the following:
UPDATE circles SET intersectCount = intersectCount + 1 FROM rectangles
WHERE ST_Intersects(circles.geom, rectangles.geom);
...which doesn't seem to be working. I'm not too familiar with postgres (or sql in general) and I'm wondering if I can do this all in one statement or if I need a few. I have some ideas for how I would do this with multiple statements (or using for loop) but I'm really looking for a concise solution. Any help would be much appreciated.
Thanks!
something like:
update t1 set ctr = helper.cnt
from (
select t1.id, count(*) as cnt
from t1, t2
where st_intersects(t1.col, t2.col)
group by t1.id
) helper
where helper.id=t1.id
?
btw: Your version does not work, because a row can get updated only once in a single update statement.
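One caveat with the helper-subquery approach (a hedged note reusing the circles/rectangles names from the question): rows with no intersections are not touched at all, so any old count they hold stays. A correlated-subquery variant that also writes 0 for those rows could look like this:
UPDATE circles c
SET intersectCount = (
    SELECT count(*)
    FROM rectangles r
    WHERE ST_Intersects(c.geom, r.geom)
);
For large tables the grouped join above is usually faster; this variant just trades some speed for covering the zero-intersection rows.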