I'm struggling with a very slow query that contains hundreds of AND .. OR conditions on text parameters, with no joins (it is part of a closed system, so the query cannot easily be changed).
The query slowed down recently, after some maintenance we did on the DB (before, it ran in a few seconds; now it takes ~30 minutes).
I tried to analyse it: it basically does a Bitmap Heap Scan, then hundreds of Bitmap Index Scans, one for each AND condition.
As this is a closed (vendor) system I can't do much, but I wonder why it slowed down so much from one day to the next...
I tried:
REINDEX on the table - still slow
VACUUM FULL on the table - slow
VACUUM FULL on the database - slow
Restore DB from SQL Dump - slow
Copy data to table B - FAST AGAIN!
Truncate data and copy back from table A - FAST!
The query plan was exactly the same in all cases.
Can someone explain to me what is happening here?
Table size: ~6'000'000 rows
PostgreSQL 12.
Related
I have a table that isn't overly large (139783 records). Performance was historically never an issue for this table. However, almost overnight, performance became extremely poor for any query that doesn't filter on an indexed column in the WHERE clause.
This includes doing a basic select * from example.my_table which now takes around 8 seconds.
As far as I can tell, nothing changed schema-wise when the table's performance degraded. It seemed to happen on its own, which led me to think it could be an issue with some sort of Postgres internal.
I've forced a vacuum on the table and have had no improvement.
I've tried dumping all of the records from this table into a brand new one and the new one has the normal performance that we had before it degraded (running select * from example.my_table_duplicate takes less than 1 sec).
Ultimately, if it comes to it, I can just put the data into a new table and then rename them but I'd like to understand what could have caused this performance issue.
We are using PostgreSQL version 9.6.15, hosted on AWS RDS.
I'm using Postgres version 9.6
Most of my tables are used for queries, updates and inserts.
Most of them have around 200K-700K rows.
Some are bigger (millions of rows) and some are smaller.
Is it a good idea to run VACUUM (and ANALYZE?) once a day? Once a week? Regardless of whether autovacuum is running?
Advantages vs disadvantages?
Autovacuum runs when needed: it removes dead row versions and also refreshes the statistics that are used when planning a query.
Basically you never need to do this manually, unless you have made vast changes to a table (filled it with data for example), and want to use it in another query within a few milliseconds. In that scenario, old statistics will result in the query planner coming up with a very bad query plan and will lead to a significantly slower query.
What you might want to do once per day / per week, or whatever, is to cluster tables, recreate degraded indexes, on tables that were modified a lot. Research these topics more to decide if / when / how to do it.
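For a rough sense of when autovacuum kicks in on tables of the sizes mentioned in the question, here is a minimal sketch of the documented trigger rule: a table is vacuumed once its dead rows exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * row count. The values used below (50 and 0.2) are the stock PostgreSQL defaults, which is an assumption about this particular server, and the function name is purely illustrative.

```python
# Sketch of the autovacuum trigger rule. The defaults are the stock
# PostgreSQL settings and may differ on a tuned server (assumption).

def autovacuum_vacuum_trigger(row_count, threshold=50, scale_factor=0.2):
    """Number of dead rows a table must exceed before autovacuum vacuums it."""
    return threshold + scale_factor * row_count

# Table sizes taken from the question, plus one "millions" example.
for rows in (200_000, 700_000, 5_000_000):
    limit = autovacuum_vacuum_trigger(rows)
    print(f"{rows:>9} rows: vacuum after more than {limit:,.0f} dead rows")
```

With these defaults a larger table tolerates proportionally more dead rows before it is vacuumed, so for heavily updated tables it may be worth lowering the scale factor for that table rather than scheduling manual runs.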
I am currently analyzing why the application installed on top of the PostgreSQL database we are using is sometimes so slow. The log files show that queries against one specific table have extremely long execution times.
I further found out that one column of the table, which contains XML documents (ranging from a few bytes to one entry with ~7MB of XML data), is the cause of the slow query.
There are 1100 Rows in the table and a
SELECT * FROM mytable
has the same query execution time of 5 Seconds as
SELECT [XML-column-only] FROM mytable
But in contrast, a
SELECT [id-only] FROM mytable
has a query execution time of only 0.2s!
I couldn't produce any noticeable differences depending on the settings (the usual ones, work_mem, shared_buffers,...), there is even almost no difference in comparison between our production server (PostgreSQL 9.3) and running it in a VM on PostgreSQL 9.4 on my workstation PC.
Disk monitoring shows almost no I/O activity for the query.
So the last thing I went to analyze was the Network I/O.
Of course, as mentioned before, there is a lot of data in this XML column. The total size for the 1100 rows (XML column only) is 36 MB. Divided by the 5 seconds of running time, this is a mere 7.2 MB/s of network transfer, which equals around 60 Mbit/s. That is a little slow, as we are all on Gbit Ethernet, aren't we? :D Windows Task Manager also shows a network utilization of 6% during the runtime of the query, which agrees with the manual calculation above.
Furthermore, the query execution time is almost linear to the amount of XML data in the table. For testing I deleted the top 10% rows with the largest amount of data on the XML column, and the execution time (now ~18 instead of 36MB to transfer) dropped to 2.5s instead of 5s.
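The figures above can be double-checked with a short sketch. It only redoes the arithmetic from the question; the 1 Gbit/s link speed is the assumption already made in the text, and the function name is illustrative.

```python
# Back-of-the-envelope check of the figures quoted in the question.

def transfer_rate(total_mb, seconds):
    """Return (MB/s, Mbit/s) for a transfer of total_mb megabytes."""
    mb_per_s = total_mb / seconds
    return mb_per_s, mb_per_s * 8  # 8 bits per byte

mb_s, mbit_s = transfer_rate(36, 5)
link_share = mbit_s / 1000  # fraction of a 1 Gbit/s link (assumed link speed)

print(f"{mb_s:.1f} MB/s = {mbit_s:.1f} Mbit/s, {link_share:.1%} of Gigabit")

# After deleting the largest rows: ~18 MB in 2.5 s gives the same rate,
# which is what "linear in the amount of XML data" means here.
mb_s_after, _ = transfer_rate(18, 2.5)
print(f"after the delete: {mb_s_after:.1f} MB/s")
```

The rate stays the same before and after the delete, which suggests a constant per-byte cost somewhere between the server and the client rather than a saturated network link.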
So, to get to the point: what are my options on the database administration side (we cannot touch or change the application itself) to make the simple SELECT for this XML data noticeably faster? Is there any bottleneck I haven't taken into account yet? Or is this normal PostgreSQL behaviour?
EDIT: I use pgAdmin III. The execution plan (explain (analyze, verbose) select * from gdb_items) shows a much shorter total runtime than both the actual query and the statement duration entry in the log:
Seq Scan on sde.gdb_items (cost=0.00..181.51 rows=1151 width=1399) (actual time=0.005..0.193 rows=1151 loops=1)
Output: objectid, uuid, type, name, physicalname, path, url, properties, defaults, datasetsubtype1, datasetsubtype2, datasetinfo1, datasetinfo2, definition, documentation, iteminfo, shape
Total runtime: 0.243 ms
I am working with a PostgreSQL 8.4.13 database.
Recently I had around 86.5 million records in a table. I deleted almost all of them - only 5000 records are left. I ran
reindex
and
vacuum analyze
after deleting the rows. But I still see that the table is occupying a lot of disk space:
jbossql=> SELECT pg_size_pretty(pg_total_relation_size('my_table'));
pg_size_pretty
----------------
7673 MB
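To put that number in perspective, here is a minimal sketch of the arithmetic. It assumes the size shown above was measured after the delete and uses only the row counts given in this question. Note that pg_total_relation_size includes indexes and TOAST data, and that pg_size_pretty reports MB in units of 1024 * 1024 bytes.

```python
# Rough arithmetic on the reported size.
# pg_size_pretty's "MB" is 1024 * 1024 bytes.

MB = 1024 * 1024

def bytes_per_row(total_mb, row_count):
    """Average on-disk bytes per row, indexes and TOAST included."""
    return total_mb * MB / row_count

per_row_now = bytes_per_row(7673, 5_000)          # rows left after the delete
per_row_before = bytes_per_row(7673, 86_500_000)  # rows before the delete

print(f"per remaining row: {per_row_now / MB:.2f} MB")
print(f"per original row:  {per_row_before:.0f} bytes")
```

The per-row figure only looks plausible against the original row count, which suggests the files still have roughly their pre-delete size: plain VACUUM marks the space as reusable inside the table but does not generally hand it back to the operating system.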
Also, the index values of the remaining rows are still pretty high, in the millions range. I thought that after vacuuming and re-indexing, the index of the remaining rows would start from 1.
I read the documentation and it's pretty clear that my understanding of re-indexing was skewed.
Nonetheless, my intention is to reduce the table size after the delete operation and to bring down the index values, so that read operations (SELECT) on the table do not take that long. Currently it takes around 40 seconds to retrieve just one record from my table.
Update
Thanks Erwin. I have corrected the pg version number.
vacuum full
worked for me. I have one follow up question here:
Restart primary key numbers of existing rows after deleting most of a big table
To actually return disk space to the OS, run VACUUM FULL.
Further reading:
VACUUM returning disk space to operating system
I have a table in PostgreSQL that I need to read into memory. It is a very small table, with only three columns and 200 rows, and I just do a select col1, col2, col3 from my_table on the whole thing.
On the development machine this is very fast (less than 1ms), even though that machine is a VirtualBox inside of a Mac OS FileVault.
But on the production server it consistently takes 600 ms. The production server may have lower specs, and the database version is older, too (7.3.x), but that alone cannot explain the huge difference, I think.
In both cases, I am running explain analyze on the db server itself, so it cannot be the network overhead. The query execution plan is in both cases a simple sequential full table scan. There was also nothing else going on on the production machine at the time, so contention is out, too.
How can I find out why this is so slow, and what can I do about it?
Sounds like perhaps you haven't been VACUUMing this database properly? 7.3 is way too old to have AutoVacuum, so this is something you must do manually (cron job is recommended). If you have had many updates (over time) to this table and not run VACUUM, it will be very slow to access.
It's clearly table bloat. Run vacuum analyze of the table in question. Also - upgrade. 7.3 is not even supported anymore.
What happens if you run the query several times?
The first run should be slow, but the following runs should be faster, because the first execution puts the data in the cache.
BTW: if you do a SELECT ... FROM without any restriction, it's 100% normal to get a seq scan. The whole table has to be read to retrieve the values, and since there are no restrictions, there is no need for an index scan.
Don't hesitate to post the result of your Explain Analyze query.
PostgreSQL 7.3 is really old. Is there no option to upgrade to a more modern version?