GiST index creation too slow on PostgreSQL

I have a database in PostgreSQL with the following structure:
Column | Type | Collation | Nullable | Default
-------------+-----------------------+-----------+----------+------------------------------------------------
vessel_hash | integer | | not null | nextval('samplecol_vessel_hash_seq'::regclass)
status | character varying(50) | | |
station | character varying(50) | | |
speed | character varying(10) | | |
longitude | numeric(12,8) | | |
latitude | numeric(12,8) | | |
course | character varying(50) | | |
heading | character varying(50) | | |
timestamp | character varying(50) | | |
the_geom | geometry | | |
Check constraints:
"enforce_dims_the_geom" CHECK (st_ndims(the_geom) = 2)
"enforce_geotype_geom" CHECK (geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL)
"enforce_srid_the_geom" CHECK (st_srid(the_geom) = 4326)
The table contains ~146,000,000 records, and its size is:
public | samplecol | table | postgres | 31 GB |
I am trying to create a GiST index on the geometry column the_geom with this command:
create index samplecol_the_geom_gist on samplecol using gist (the_geom);
but it takes too long; it has already been running for 2 hours.
Based on the question Slow indexing of 300GB Postgis table, I executed the following in the psql console before index creation:
ALTER SYSTEM SET maintenance_work_mem = '1GB';
ALTER SYSTEM
SELECT pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)
But index creation still takes too long. Does anyone know why, and how to fix this?

I am afraid you'll have to sit it out.
Apart from a high maintenance_work_mem, there is not really a tuning option.
Increasing max_wal_size will help somewhat, since you will get fewer checkpoints.
If you can't live with an ACCESS EXCLUSIVE lock for that long, try CREATE INDEX CONCURRENTLY, which will be even slower but won't block concurrent database activity.
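For example, a minimal sketch of both options (the max_wal_size value is illustrative, not a recommendation):
ALTER SYSTEM SET max_wal_size = '10GB';
SELECT pg_reload_conf();
-- builds the index without blocking writers; slower overall, but the table stays usable
CREATE INDEX CONCURRENTLY samplecol_the_geom_gist ON samplecol USING gist (the_geom);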

Related

Any way to find and delete almost similar records with SQL?

I have a table in a Postgres DB that has a lot of almost identical rows. For example:
1. 00Zicky_-_San_Pedro_Danilo_Vigorito_Remix
2. 00Zicky_-_San_Pedro__Danilo_Vigorito_Remix__
3. 0101_-_Try_To_Say__Strictlyjaz_Unit_Future_Rmx__
4. 0101_-_Try_To_Say__Strictlyjaz_Unit_Future_Rmx_
5. 01_-_Digital_Excitation_-_Brothers_Gonna_Work_it_Out__Piano_Mix__
6. 01_-_Digital_Excitation_-_Brothers_Gonna_Work_it_Out__Piano_Mix__
I was thinking about writing a little Golang script to remove the duplicates, but maybe SQL can do it?
Table definition:
\d+ songs
Table "public.songs"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
---------------+-----------------------------+-----------+----------+----------------------------------------+----------+-------------+--------------+-------------
song_id | integer | | not null | nextval('songs_song_id_seq'::regclass) | plain | | |
song_name | character varying(250) | | not null | | extended | | |
fingerprinted | smallint | | | 0 | plain | | |
file_sha1 | bytea | | | | extended | | |
total_hashes | integer | | not null | 0 | plain | | |
date_created | timestamp without time zone | | not null | now() | plain | | |
date_modified | timestamp without time zone | | not null | now() | plain | | |
Indexes:
"pk_songs_song_id" PRIMARY KEY, btree (song_id)
Referenced by:
TABLE "fingerprints" CONSTRAINT "fk_fingerprints_song_id" FOREIGN KEY (song_id) REFERENCES songs(song_id) ON DELETE CASCADE
Access method: heap
I tried several methods to find duplicates, but those methods only search for exact matches.
There is no operator which is essentially A almost = B. (Well, there is full text search, but that seems a little excessive here.) If the only difference is the number of - and _ characters, then just get rid of them and compare the resulting strings. If they are equal, then one is a duplicate. You can use the replace() function to remove them. So, something like:
delete
from songs s2
where exists ( select null
               from songs s1
               where s1.song_id < s2.song_id
                 and replace(replace(s1.song_name, '_',''),'-','') =
                     replace(replace(s2.song_name, '_',''),'-','')
             );
If your table is large this will not be fast, but a functional index may help:
create index song_name_idx on songs
(replace(replace(song_name, '_',''),'-',''));
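To preview what the DELETE would remove before running it, the EXISTS test can be turned into a plain SELECT first (same normalization, no rows touched):
select s2.song_id, s2.song_name
from songs s2
where exists ( select null
               from songs s1
               where s1.song_id < s2.song_id
                 and replace(replace(s1.song_name, '_',''),'-','') =
                     replace(replace(s2.song_name, '_',''),'-','')
             );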

Postgres 13 permission issues with Revoke

I am a relatively new user of Postgres 13. Let me first tell you that the Postgres database is hosted on AWS Aurora. I have a user that owns a schema, and a specific table on which this user should only be able to SELECT and INSERT rows and execute TRIGGERs.
I have REVOKED ALL on this table for this user and GRANTED SELECT, INSERT, TRIGGER on the table to the user. INSERT, SELECT, and TRIGGER work as expected. However, when I execute a SQL UPDATE on that table, it still lets me update a row! I should also mention that I REVOKED ALL and performed the same GRANTs for rds_superuser on this table, since this user is a member of rds_superuser.
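In SQL terms, what I ran was essentially the following (app_user is a stand-in for the real role name):
REVOKE ALL ON tryon.rx_cryptographic_signature FROM app_user;
GRANT SELECT, INSERT, TRIGGER ON tryon.rx_cryptographic_signature TO app_user;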
Any help would be greatly appreciated!
Following are the results of \d:
Column | Type | Collation | Nullable | Default
-----------------------+--------------------------+-----------+----------+------------------------
id | uuid | | not null | uuid_generate_v4()
patient_medication_id | bigint | | not null |
raw_xml | character varying | | not null |
digital_signature | character varying | | not null |
create_date | timestamp with time zone | | not null | CURRENT_TIMESTAMP
created_by | character varying | | |
update_date | timestamp with time zone | | |
updated_by | character varying | | |
deleted_at | timestamp with time zone | | |
deleted_by | character varying | | |
status | character varying(1) | | | 'A'::character varying
message_type | character varying | | not null |
Indexes:
"rx_cryptographic_signature_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"rx_cryptographic_signature_fk" FOREIGN KEY (patient_medication_id) REFERENCES tryon.patient_medication(id)
Triggers:
audit_trigger_row AFTER INSERT ON tryon.rx_cryptographic_signature FOR EACH ROW EXECUTE FUNCTION td_audit.if_changed_fn('id', '{}')
audit_trigger_stm AFTER TRUNCATE ON tryon.rx_cryptographic_signature FOR EACH STATEMENT EXECUTE FUNCTION td_audit.if_changed_fn()
tr_log_delete_attempt BEFORE DELETE ON tryon.rx_cryptographic_signature FOR EACH STATEMENT EXECUTE FUNCTION tryon.fn_log_update_delete_attempt()
tr_log_update_attempt BEFORE UPDATE ON tryon.rx_cryptographic_signature FOR EACH STATEMENT EXECUTE FUNCTION tryon.fn_log_update_delete_attempt()
Following are the results of \z:
Schema | Name | Type | Access privileges | Column privileges | Policies
--------+----------------------------+-------+---------------------------------------+-------------------+----------
tryon | rx_cryptographic_signature | table | TD_Administrator=art/TD_Administrator+| |
| | | td_administrator=art/TD_Administrator+| |
| | | rds_pgaudit=art/TD_Administrator +| |
| | | rds_superuser=art/TD_Administrator | |
(1 row)
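For what it's worth, the effective privilege can also be checked directly with has_table_privilege (again, app_user is a stand-in for the real role name):
SELECT has_table_privilege('app_user', 'tryon.rx_cryptographic_signature', 'UPDATE');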
Thanks so much for your help!!

Postgres query taking long to execute

I am using libpq to connect to the Postgres server from C++ code. The Postgres server version is 12.10.
My table schema is defined below
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
---------------------+----------+-----------+----------+------------+----------+--------------+-------------
event_id | bigint | | not null | | plain | |
event_sec | integer | | not null | | plain | |
event_usec | integer | | not null | | plain | |
event_op | smallint | | not null | | plain | |
rd | bigint | | not null | | plain | |
addr | bigint | | not null | | plain | |
masklen | bigint | | not null | | plain | |
path_id | bigint | | | | plain | |
attribs_tbl_last_id | bigint | | not null | | plain | |
attribs_tbl_next_id | bigint | | not null | | plain | |
bgp_id | bigint | | not null | | plain | |
last_lbl_stk | bytea | | not null | | extended | |
next_lbl_stk | bytea | | not null | | extended | |
last_state | smallint | | | | plain | |
next_state | smallint | | | | plain | |
pkey | integer | | not null | 1654449420 | plain | |
Partition key: LIST (pkey)
Indexes:
"event_pkey" PRIMARY KEY, btree (event_id, pkey)
"event_event_sec_event_usec_idx" btree (event_sec, event_usec)
Partitions: event_spl_1651768781 FOR VALUES IN (1651768781),
event_spl_1652029140 FOR VALUES IN (1652029140),
event_spl_1652633760 FOR VALUES IN (1652633760),
event_spl_1653372439 FOR VALUES IN (1653372439),
event_spl_1653786420 FOR VALUES IN (1653786420),
event_spl_1654449420 FOR VALUES IN (1654449420)
When I execute the following query, it takes 1-2 milliseconds.
The time is provided as a parameter to the function executing this query; it contains epoch seconds and microseconds.
SELECT event_id FROM event WHERE (event_sec > time.seconds) OR ((event_sec = time.seconds) AND (event_usec >= time.useconds)) ORDER BY event_sec, event_usec LIMIT 1;
This query is executed every 30 seconds on the same client connection (which stays open for weeks). The process runs for weeks, but sometimes the same query starts taking more than 10 minutes.
If I restart the process, it recreates the connection to the server, and the execution time falls back to 1-2 milliseconds. The issue is intermittent: sometimes it triggers after a week of running, sometimes after 2-3 weeks.
We add a new partition to the table every Sunday and write new data into the new partition.
I don't know why the performance is inconsistent; there are many possibilities we can't distinguish with the information provided. For example, does the plan change when the performance changes, or does the same plan just perform worse?
But your query is not written to take maximal advantage of the index. In my hands, it can use the index for ordering, but it still needs to read, and individually skip over, rows that fail the WHERE clause until it finds the first one that passes. And due to partitioning, I think it is even worse than that: it has to do this read-and-skip in each partition until it finds the first row that passes.
You could rewrite it to do a tuple comparison, which can use the index to determine both the order, and where to start:
SELECT event_id FROM event
WHERE (event_sec, event_usec) >= (:seconds, :useconds)
ORDER BY event_sec, event_usec LIMIT 1;
Now this might also degrade, or might not, or maybe will degrade but still be so fast that it doesn't matter.
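If it degrades again, comparing the plan at both speeds would show whether the plan itself changed; a minimal check (the parameter values here are placeholders):
EXPLAIN (ANALYZE, BUFFERS)
SELECT event_id FROM event
WHERE (event_sec, event_usec) >= (1654449420, 0)
ORDER BY event_sec, event_usec LIMIT 1;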

Does the size of a row or column affect aggregation queries in PostgreSQL?

Consider the following table definition:
Column | Type | Collation | Nullable | Default
-----------------+--------------------------+-----------+----------+-------------
id | uuid | | not null |
reference_id | text | | |
data | jsonb | | |
tag | character varying(255) | | |
created_at | timestamp with time zone | | |
updated_at | timestamp with time zone | | |
is_active | boolean | | not null | true
status | integer | | | 0
message | text | | |
batch_id | uuid | | not null |
config | jsonb | | |
The overall table size is over 500M, and in every row the data column holds a JSON value of over 50MB.
Questions -
Does the size of the data column affect aggregations such as count?
Assume we are running the query below:
select count(*)
from table
where batch_id = '88f30539-32d7-445c-8d34-f1da5899175c';
Does the size of the data column affect aggregations such as sum?
Assume we are running the queries below:
Query 1 -
select sum((data->>'count')::int)
from table
where batch_id = '88f30539-32d7-445c-8d34-f1da5899175c';
Query 2 -
select sum(jsonb_array_length(data->'some_array'))
from table
where batch_id = '88f30539-32d7-445c-8d34-f1da5899175c';
The best way to know is to measure.
Once the data is large enough to always be TOASTed, its size will no longer affect the performance of queries that do not need to access the TOASTed data contents, like your first one. Your last two queries do need to access the contents, and their performance will depend on the size.
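To see how much of the table actually lives out-of-line (and is therefore skipped by the count(*) query), a quick size check; my_table is a placeholder for the real table name:
SELECT pg_size_pretty(pg_relation_size('my_table')) AS main_heap,
       pg_size_pretty(pg_total_relation_size('my_table') - pg_relation_size('my_table')) AS toast_and_indexes;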

Cannot store a double value for a spatial datatype in MySQL 5.6

I am trying to insert a polygon shape into a MySQL database. The vertices of a polygon consist of double values. I tried to insert the value with the following query, but I got the error below.
INSERT INTO HIBERNATE_SPATIAL
(PRD_GEO_REGION_ID,OWNER_ID,GEO_REGION_NAME,GEO_REGION_DESCRIPTION,GEO_REGION_DEFINITION)
VALUES (8,1,'POLYGON8','SHAPE8',Polygon(LineString(10.12345612341243,11.12345612341234),LineString(10.34512341246,11.4123423456),LineString(10.31423424456,11.34123423456),LineString(10.341234256,11.3412342456),LineString(10.11423423456,11.123424)));
TABLE DESCRIPTION
+------------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+--------------+------+-----+---------+----------------+
| PRD_GEO_REGION_ID | int(11) | NO | PRI | NULL | auto_increment |
| OWNER_ID | decimal(3,0) | NO | | NULL | |
| GEO_REGION_NAME | varchar(50) | NO | UNI | NULL | |
| GEO_REGION_DESCRIPTION | varchar(70) | YES | | NULL | |
| GEO_REGION_DEFINITION | geometry | YES | | NULL | |
+------------------------+--------------+------+-----+---------+----------------+
THE ERROR:
ERROR 1367 (22007): Illegal non geometric '10.12345612341243' value found during parsing
Solved the issue and stored the double values successfully. I had made two mistakes.
1) LineString can only contain the POINT datatype (not bare decimal coordinates). The corrected syntax follows:
INSERT INTO HIBERNATE_SPATIAL
(PRD_GEO_REGION_ID,OWNER_ID,GEO_REGION_NAME,GEO_REGION_DESCRIPTION,GEO_REGION_DEFINITION)
VALUES (8,1,'POLYGON8','SHAPE8',Polygon(LineString(POINT(10.12345612341243,11.12345612341234)),LineString(POINT(10.34512341246,11.4123423456)),LineString(POINT(10.31423424456,11.34123423456)),LineString(POINT(10.341234256,11.3412342456)),LineString(POINT(10.11423423456,11.123424))));
2) A polygon shape has to be closed (the starting and ending points should be the same).
That was the problem.
WORKING QUERY
INSERT INTO HIBERNATE_SPATIAL
(PRD_GEO_REGION_ID,OWNER_ID,GEO_REGION_NAME,GEO_REGION_DESCRIPTION,GEO_REGION_DEFINITION)
VALUES (8,1,'POLYGON8','SHAPE8',Polygon(LineString(POINT(10.12345612341243,11.12345612341234)),LineString(POINT(10.34512341246,11.4123423456)),LineString(POINT(10.31423424456,11.34123423456)),LineString(POINT(10.341234256,11.3412342456)),LineString(POINT(10.12345612341243,11.12345612341234))));
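As an aside, the same polygon can also be built from WKT text, which avoids nesting the constructor functions; a sketch against the same table (GeomFromText is available in MySQL 5.6, and the key values are illustrative):
INSERT INTO HIBERNATE_SPATIAL
(PRD_GEO_REGION_ID,OWNER_ID,GEO_REGION_NAME,GEO_REGION_DESCRIPTION,GEO_REGION_DEFINITION)
VALUES (9,1,'POLYGON9','SHAPE9',
GeomFromText('POLYGON((10.12345612341243 11.12345612341234, 10.34512341246 11.4123423456, 10.31423424456 11.34123423456, 10.341234256 11.3412342456, 10.12345612341243 11.12345612341234))'));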