I set up TimescaleDB and PostgreSQL to compare their performance on time-series data. I successfully created the hypertable and tested by inserting 2M rows from my C# program, but TimescaleDB processes the inserts much more slowly than plain PostgreSQL. Worse, with TimescaleDB my program stops responding after inserting a few hundred records, and I don't know why. Can anyone give me a hint? Or am I missing something?
We will need a bit more information in order to pinpoint what issues you're running into. If TimescaleDB is not responding at all after inserting a few hundred records, it sounds like something is definitely misconfigured, either at the database or the system level. Is the client simply timing out, or is the hang accompanied by some kind of error? If the process is hanging, do you have any insight into what the system is doing during this time (e.g., is there a lot of IO, is the CPU maxed out)? Are you seeing any locks in contention when this happens (see Postgres Lock Monitoring, and the sketch below)? It would also be good to see your data model and how your TimescaleDB hypertable was created.
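To check for lock contention, a simplified sketch of the kind of query described on the Postgres Lock Monitoring wiki page (this assumes PostgreSQL 9.2 or later, where pg_stat_activity exposes pid and query):

SELECT blocked.pid        AS blocked_pid,
       blocked_act.query  AS blocked_query,
       blocking.pid       AS blocking_pid,
       blocking_act.query AS blocking_query
FROM pg_locks blocked
JOIN pg_stat_activity blocked_act ON blocked_act.pid = blocked.pid
JOIN pg_locks blocking
     ON  blocking.locktype = blocked.locktype
     AND blocking.database IS NOT DISTINCT FROM blocked.database
     AND blocking.relation IS NOT DISTINCT FROM blocked.relation
     AND blocking.pid <> blocked.pid
JOIN pg_stat_activity blocking_act ON blocking_act.pid = blocking.pid
WHERE NOT blocked.granted;   -- only show sessions still waiting for their lock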
Also, note that TimescaleDB will not necessarily outperform Postgres with a low number of rows and single row inserts. TimescaleDB shines when you hit tens of millions of rows or more and insert in batches. See the PostgreSQL vs TimescaleDB Blog Post for more information.
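As a rough illustration of batching (the conditions table and its columns are just placeholders), a single multi-row INSERT usually performs far better than the same rows inserted one statement at a time:

INSERT INTO conditions (time, device_id, temperature) VALUES
    ('2023-01-01 00:00:00+00', 1, 20.1),
    ('2023-01-01 00:00:10+00', 1, 20.2),
    ('2023-01-01 00:00:20+00', 1, 20.3);
-- batches of a few thousand rows per statement are a common starting point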
I'm facing an issue where 2 out of 10 Spanner databases are showing high CPU usage (above 40%) whereas the others are at around 1% each, with almost identical or even more data.
I noticed one of our tables has become "unresponsive": no queries work against it. We shut down all apps that connect to those dbs, and we also deleted all current sessions using gcloud sessions list and then gcloud session delete.
However, the table is still unresponsive. A simple select like select id from mytable where name = 'test' does not respond (when tested from an app, and also from the gcloud web interface). It only happens with that table, which has just a few columns with normal data and no more than 2000 records. We identified the query that could have been the source of the problem, but the table seems to be locked (only count(*) without any where clause works).
I was wondering if there is any way to "unlock" the table, kill the "transactions" that might be causing the issue, or restart those specific Spanner databases, or, in the worst case, restart the Spanner instance.
I have seen the documentation on monitoring high CPU, but even though we can see that the CPU is high, we don't really know how to restart things or bring them back to normal without first reviewing the query or queries that could have caused the issue (if that was the case).
High CPU can be caused by user-initiated queries from different types of operations. It is important to note that the instance is where you allocate the resources used by the underlying Cloud Spanner databases. This means that if all of your databases are in the same instance and some of them are hogging the CPU, all your other databases will struggle.
In terms of a locked table, I would be very surprised if a deadlock is the problem here, since Spanner mitigates those issues using the "wound-wait" algorithm. What I suspect is happening is that the query on that table simply takes a long time and times out. It would be good to investigate your problem a bit more:
How long did you wait for your query on the problematic table (before you deemed it to be stuck)? It might be that your query simply takes a long time and you are timing out before getting the results. It might also be that there are hotspots in your table and transactions are getting aborted often, preventing you from getting results.
What error did you get when performing the query on the table? The error code can tell you more about what might be happening.
Have you tried doing a stale read on the table to see if any data is returned? If lock contention is the problem, a stale read should succeed and return results faster for you. You can then further investigate the lock usage with the statistics tables (as shown below).
Inspect query statistics: you can list the queries with the highest CPU usage, find the average latency for a query, and find the queries that time out the most. There is much more you can do, as seen here. You can also view the query statistics in the Cloud Console. What I would verify is: by reducing the overall CPU, does your query come back without any issues? If so, you might need more resources, or you might need to reduce hotspots in your database.
Inspect lock statistics: you can investigate lock conflicts in your rows and tables. An interesting query for your problem would be discovering which row keys and columns had long lock wait times during the selected period. You can then see whether your query reads those row keys and columns and act accordingly.
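For reference, a rough sketch of what those two inspections could look like in SQL, assuming the SPANNER_SYS statistics tables are available on your instance (the 10-minute variants are used here):

-- Top queries by CPU in the most recent 10-minute interval
SELECT text, execution_count, avg_cpu_seconds, avg_latency_seconds
FROM spanner_sys.query_stats_top_10minute
WHERE interval_end = (SELECT MAX(interval_end) FROM spanner_sys.query_stats_top_10minute)
ORDER BY avg_cpu_seconds DESC
LIMIT 10;

-- Row keys with the longest lock wait times in the same interval
SELECT CAST(row_range_start_key AS STRING) AS row_range_start_key, lock_wait_seconds
FROM spanner_sys.lock_stats_top_10minute
WHERE interval_end = (SELECT MAX(interval_end) FROM spanner_sys.lock_stats_top_10minute)
ORDER BY lock_wait_seconds DESC
LIMIT 10;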
Summary
I am using Postgres UPSERTs in our ETLs and I'm experiencing issues with fragmentation and bloat on the tables I am writing to, which is slowing down all operations including reads.
Context
I have hourly batch ETLs upserting into tables (the tables hold tens of millions of rows; each run upserts tens of thousands), and we have autovacuum set to thresholds on AWS.
I have had to run VACUUM FULL to get the space back and prevent processes from hanging. This has been exacerbated now that the frequency of one of our ETLs has increased; it populates some core tables which are the source for a number of denormalised views.
It seems like what is happening is that tables don't have a chance to be vacuumed before the next ETL run, thus creating a spiral which eventually leads to a complete slow-down.
Question!
Does Upsert fundamentally have a negative impact on fragmentation and if so, what are other people using? I am keen to implement some materialised views and move most of our indexes to the new views while retaining only the PK index on the tables we are writing to, but I'm not confident that this will resolve the issue I'm seeing with bloat.
I've done a bit of reading on the issue but nothing conclusive, for example --> https://www.targeted.org/articles/databases/fragmentation.html
Thanks for your help
It depends. If there are no constraint violations, INSERT ... ON CONFLICT won't cause any bloat. If it performs an update, it will produce a dead row.
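For context, a minimal upsert of the kind in question might look like this (the metrics table and its columns are hypothetical):

INSERT INTO metrics (id, value)
VALUES (1, 42)
ON CONFLICT (id) DO UPDATE
    SET value = EXCLUDED.value;  -- the UPDATE path leaves behind one dead row version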
The measures you can take:
set autovacuum_vacuum_cost_delay = 0 for faster autovacuum
use a fillfactor somewhat less than 100 and have no index on the updated columns, so that you can get HOT updates, which make autovacuum unnecessary (both measures are sketched below)
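A minimal sketch of both measures, assuming a hypothetical table called my_etl_table (values are illustrative):

-- Let autovacuum work at full speed on this table
ALTER TABLE my_etl_table SET (autovacuum_vacuum_cost_delay = 0);

-- Leave free space in each page so updates can stay on the same page (HOT);
-- note this only affects newly written pages, not existing ones
ALTER TABLE my_etl_table SET (fillfactor = 90);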
It is not clear what you are actually seeing. Can you turn track_io_timing on, and then do an EXPLAIN (ANALYZE, BUFFERS) for the query that you think has been slowed down by bloat?
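Something along these lines (the query itself is just a placeholder; changing track_io_timing may require superuser privileges):

SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM my_etl_table WHERE id = 42;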
Bloat and fragmentation aren't the same thing. Fragmentation is more an issue with indexes under some conditions, not the tables themselves.
"It seems like what is happening is that tables don't have a chance to be vacuumed before the next ETL run"
This one could be very easy to fix. Run a "manual" VACUUM (not VACUUM FULL) at the end or at the beginning of each ETL run. Since you have a well-defined workflow, there is no need to try to get autovacuum to do the right thing; it should be very easy to inject manual vacuums into your workflow. Or do you think that one VACUUM per ETL is overkill?
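For example, as the last step of each ETL run, for each table the run wrote to (the table name is a placeholder):

VACUUM (ANALYZE) my_etl_table;  -- plain VACUUM, not VACUUM FULL, so it does not take an exclusive lock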
Hi there, I'm looking for advice from someone who knows IBM db2 performance well.
I have a situation in which many batch tasks are massively inserting rows in the same db2 table, at the same time.
This situation looks potentially bad. I don't think db2 is able to resolve the many requests quickly enough, causing the concurrent tasks to take longer to finish and even causing some of them to abend with a -904 or -911 sqlcode.
What do you guys think? Should situations like these be avoided? Are there some sort of techniques that could improve the performance of the batch tasks, keeping them from abending or running too slow?
Thanks.
Inserting should not be a big problem for ETL workloads; tools like DataStage do this all the time.
I suggest running:
ALTER TABLE <tabname> APPEND ON
This avoids the free space search - details can be found here
Given the errors reported, the information provided is not sufficient to determine the cause.
There are several things to consider. What indexes are on the table is one.
Append mode works well to relieve last-page contention, but there are also cases where you could see contention on the statement itself for the variation lock. You could also have issues with the transaction logs if they are not fast enough.
What are the table, indexes and statement? Then maybe we can come up with how to do it. What hardware are you using, and what I/O subsystem is being used for the transaction logs and database tablespaces?
I have a server with 64GB RAM and PostgreSQL 9.2. On it are one small database "A" of only 4GB, which is queried only once an hour or so, and one big database "B" of about 60GB, which gets queried 40-50x per second!
As expected, Linux and PostgreSQL fill the RAM with the bigger database's data as it is more often accessed.
My problem now is that the queries to the small database "A" are critical and have to run in <500ms. The logfile shows a couple of queries per day that took >3s though. If I execute them by hand they, too, take only 10ms so my indexes are fine.
So I guess that those long runners happen when PostgreSQL has to load chunks of the small databases indexes from disk.
I already have some kind of "cache warmer" script that repeats "SELECT * FROM x ORDER BY y" queries to the small database every second but it wastes a lot of CPU power and only improves the situation a little bit.
Any more ideas how to tell PostgreSQL that I really want that small database "sticky" in memory?
PostgreSQL doesn't offer a way to pin tables in memory, though the community would certainly welcome well-thought-out, tested and benchmarked proposals for allowing this from people who are willing to back those proposals with real code.
The best option you have with PostgreSQL at this time is to run a separate PostgreSQL instance for the response-time-critical database. Give this DB a big enough shared_buffers that the whole DB will reside in shared_buffers.
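As a rough illustration of that setup, a postgresql.conf fragment for the dedicated instance (the numbers and port are only placeholders):

# postgresql.conf for the instance that hosts only database "A" (~4GB)
shared_buffers = 6GB    # big enough to hold the whole database
port = 5433             # run alongside the main cluster on a different port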
Do NOT create a tablespace on a ramdisk or other non-durable storage and put the data that needs to be infrequently but rapidly accessed there. All tablespaces must be accessible or the whole system will stop; if you lose a tablespace, you effectively lose the whole database cluster.
We're using PostgreSQL 9.1.4 as our db server. I've been trying to speed up my test suite, so I've started profiling the db a bit to see exactly what's going on. We are using database_cleaner to truncate tables at the end of tests. YES, I know transactions are faster; I can't use them in certain circumstances, so I'm not concerned with that.
What I AM concerned with, is why TRUNCATION takes so long (longer than using DELETE) and why it takes EVEN LONGER on my CI server.
Right now, locally (on a Macbook Air) a full test suite takes 28 minutes. Tailing the logs, each time we truncate tables... ie:
TRUNCATE TABLE table1, table2 -- ... etc
it takes over 1 second to perform the truncation. Tailing the logs on our CI server (Ubuntu 10.04 LTS), it takes a full 8 seconds to truncate the tables and a build takes 84 minutes.
When I switched over to the :deletion strategy, my local build took 20 minutes and the CI server went down to 44 minutes. This is a significant difference and I'm really blown away as to why this might be. I've tuned the DB on the CI server, it has 16gb system ram, 4gb shared_buffers... and an SSD. All the good stuff. How is it possible:
a. that it's SO much slower than my Macbook Air with 2gb of ram
b. that TRUNCATION is so much slower than DELETE when the postgresql docs state explicitly that it should be much faster.
Any thoughts?
This has come up a few times recently, both on SO and on the PostgreSQL mailing lists.
The TL;DR for your last two points:
(a) The bigger shared_buffers may be why TRUNCATE is slower on the CI server. Different fsync configuration or the use of rotational media instead of SSDs could also be at fault.
(b) TRUNCATE has a fixed cost, but that cost is not necessarily lower than the cost of a DELETE, and TRUNCATE does more work. See the detailed explanation that follows.
UPDATE: A significant discussion on pgsql-performance arose from this post. See this thread.
UPDATE 2: Improvements have been added to 9.2beta3 that should help with this, see this post.
Detailed explanation of TRUNCATE vs DELETE FROM:
While not an expert on the topic, my understanding is that TRUNCATE has a nearly fixed cost per table, while DELETE is at least O(n) for n rows; worse if there are any foreign keys referencing the table being deleted.
I always assumed that the fixed cost of a TRUNCATE was lower than the cost of a DELETE on a near-empty table, but this isn't true at all.
TRUNCATE table; does more than DELETE FROM table;
The state of the database after a TRUNCATE table is much the same as if you'd instead run:
DELETE FROM table;
VACUUM (FULL, ANALYZE) table; (9.0+ only, see footnote)
... though of course TRUNCATE doesn't actually achieve its effects with a DELETE and a VACUUM.
The point is that DELETE and TRUNCATE do different things, so you're not just comparing two commands with identical outcomes.
A DELETE FROM table; allows dead rows and bloat to remain, allows the indexes to carry dead entries, doesn't update the table statistics used by the query planner, etc.
A TRUNCATE gives you a completely new table and indexes as if they were just CREATEed. It's like you deleted all the records, reindexed the table and did a VACUUM FULL.
If you don't care if there's crud left in the table because you're about to go and fill it up again, you may be better off using DELETE FROM table;.
Because you aren't running VACUUM you will find that dead rows and index entries accumulate as bloat that must be scanned then ignored; this slows all your queries down. If your tests don't actually create and delete all that much data you may not notice or care, and you can always do a VACUUM or two part-way through your test run if you do. Better, let aggressive autovacuum settings ensure that autovacuum does it for you in the background.
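For instance, per-table settings along these lines would make autovacuum kick in much sooner (the table name and values are only illustrative):

ALTER TABLE table1 SET (autovacuum_vacuum_scale_factor = 0.01,
                        autovacuum_vacuum_threshold = 50);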
You can still TRUNCATE all your tables after the whole test suite runs to make sure no effects build up across many runs. On 9.0 and newer, a database-wide VACUUM (FULL, ANALYZE); is at least as good if not better, and it's a whole lot easier.
IIRC Pg has a few optimisations that mean it might notice when your transaction is the only one that can see the table and immediately mark the blocks as free anyway. In testing, when I've wanted to create bloat I've had to have more than one concurrent connection to do it. I wouldn't rely on this, though.
DELETE FROM table; is very cheap for small tables with no f/k refs
To DELETE all records from a table with no foreign key references to it, all Pg has to do is a sequential table scan, setting the xmax of the tuples it encounters. This is a very cheap operation - basically a linear read and a semi-linear write. AFAIK it doesn't have to touch the indexes; they continue to point to the dead tuples until they're cleaned up by a later VACUUM, which also marks blocks in the table containing only dead tuples as free.
DELETE only gets expensive if there are lots of records, if there are lots of foreign key references that must be checked, or if you count the subsequent VACUUM (FULL, ANALYZE) table; needed to match TRUNCATE's effects as part of the cost of your DELETE.
In my tests here, a DELETE FROM table; was typically 4x faster than TRUNCATE at 0.5ms vs 2ms. That's a test DB on an SSD, running with fsync=off because I don't care if I lose all this data. Of course, DELETE FROM table; isn't doing all the same work, and if I follow up with a VACUUM (FULL, ANALYZE) table; it's a much more expensive 21ms, so the DELETE is only a win if I don't actually need the table pristine.
TRUNCATE table; does a lot more fixed-cost work and housekeeping than DELETE
By contrast, a TRUNCATE has to do a lot of work. It must allocate new files for the table, its TOAST table if any, and every index the table has. Headers must be written into those files and the system catalogs may need updating too (not sure on that point, haven't checked). It then has to replace the old files with the new ones or remove the old ones, and has to ensure the file system has caught up with the changes with a synchronization operation - fsync() or similar - that usually flushes all buffers to the disk. I'm not sure whether the sync is skipped if you're running with the (data-eating) option fsync=off.
I learned recently that TRUNCATE must also flush all PostgreSQL's buffers related to the old table. This can take a non-trivial amount of time with huge shared_buffers. I suspect this is why it's slower on your CI server.
The balance
Anyway, you can see that a TRUNCATE of a table that has an associated TOAST table (most do) and several indexes could take a few moments. Not long, but longer than a DELETE from a near-empty table.
Consequently, you might be better off doing a DELETE FROM table;.
--
Note: on DBs before 9.0, CLUSTER table_id_seq ON table; ANALYZE table; or VACUUM FULL ANALYZE table; REINDEX table; would be a closer equivalent to TRUNCATE. The VACUUM FULL impl changed to a much better one in 9.0.
Brad, just to let you know. I've looked fairly deeply into a very similar question.
Related question: 30 tables with few rows - TRUNCATE the fastest way to empty them and reset attached sequences?
Please also look at this issue and this pull request:
https://github.com/bmabey/database_cleaner/issues/126
https://github.com/bmabey/database_cleaner/pull/127
Also this thread: http://archives.postgresql.org/pgsql-performance/2012-07/msg00047.php
I am sorry for writing this as an answer, but I didn't find any comment links, maybe because there are too many comments there already.
I've encountered a similar issue lately:
The time to run a test suite which used DatabaseCleaner varied widely between different systems with comparable hardware;
Changing the DatabaseCleaner strategy to :deletion provided a ~10x improvement.
The root cause of the slowness was a filesystem with journaling (ext4) used for database storage. During the TRUNCATE operation the journaling daemon (jbd2) was using ~90% of the disk IO capacity. I am not sure if this is a bug, an edge case or actually normal behaviour in these circumstances. This explains, however, why TRUNCATE was a lot slower than DELETE - it generated a lot more disk writes. As I did not want to actually use DELETE, I resorted to setting fsync=off, and that was enough to mitigate this issue (data safety was not important in this case).
A couple of alternate approaches to consider:
Create an empty database with static "fixture" data in it, and run the tests against that. When you are done, just drop the database, which should be fast.
Create a new table called "test_ids_to_delete" that contains columns for table names and primary key ids. Update your deletion logic to insert the ids/table names into this table instead, which will be much faster than running deletes. Then write a script to run "offline" that actually deletes the data, either after an entire test run has finished, or overnight (see the sketch below).
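A minimal sketch of that second approach (all names here are made up):

-- Record what should be removed instead of deleting it during the test run
CREATE TABLE test_ids_to_delete (
    table_name text   NOT NULL,
    row_id     bigint NOT NULL
);

INSERT INTO test_ids_to_delete (table_name, row_id) VALUES ('users', 42);

-- The "offline" cleanup script then issues one DELETE per table, e.g.:
DELETE FROM users
WHERE id IN (SELECT row_id FROM test_ids_to_delete WHERE table_name = 'users');
DELETE FROM test_ids_to_delete WHERE table_name = 'users';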
The former is a "clean room" approach, while the latter means some test data will persist in the database for longer. The "dirty" approach with offline deletes is what I'm using for a test suite with about 20,000 tests. Yes, there are sometimes problems due to having "extra" test data in the dev database, but sometimes this "dirtiness" has helped us find and fix bugs, because the "messiness" better simulated a real-world situation in a way that the clean-room approach never will.