I have a DB2 database, version 11.5, running on Linux.
I am trying to measure the speed of large updates, 1,000,000 rows per update.
Example:
UPDATE test.test1 SET col1= 'X'||col2||'A';
The problem is that the timing is very different every time I execute this update.
The timing varies from 2.2 sec to 7.8 sec.
What can I do to have consistent timing every time I run the update?
Additional info:
The server that runs DB2 does nothing else and I am the only session in the database, so it must be some DB2-related behaviour.
There are no indexes, constraints, triggers or FKs on the table.
The full structure of the table is:
CREATE TABLE "TEST"."TEST1" (
"COL1" VARCHAR(128 OCTETS) NOT NULL ,
"COL2" VARCHAR(128 OCTETS) NOT NULL ,
"COL3" VARCHAR(128 OCTETS) NOT NULL )
IN "USERSPACE1" ORGANIZE BY ROW
This started off as a comment, but then seemed possibly worthy as an answer. The performance of your DB2 instance doesn't only depend on the processes running within DB2 (such as triggers, other sessions, etc.). It also depends on the other processes running on the OS which is running DB2. The slight difference in performance you are seeing could easily be explained by demands being made on the OS from other processes besides DB2. So, you might want to load your OS profiler and observe what happens during each update test in your DB2.
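One way to make the run-to-run variation visible before reaching for an OS profiler is to repeat the statement several times and look at the spread rather than a single number. The sketch below is a minimal, hypothetical harness: `time_runs` and `run_update` are illustrative names, not part of any DB2 tooling, and the callable is assumed to execute the UPDATE and end the transaction through whatever driver you use.

```python
import statistics
import time


def time_runs(fn, runs=10, warmup=2):
    """Call fn repeatedly and summarise wall-clock timings in seconds.

    fn is any zero-argument callable; here it is assumed to execute
    the UPDATE and end the transaction (hypothetical, not shown).
    """
    # Warm-up calls are executed but not timed, so that one-off costs
    # (statement compilation, cold caches) do not skew the summary.
    for _ in range(warmup):
        fn()

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)

    return {
        "runs": len(timings),
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
        # stdev needs at least two samples
        "stdev": statistics.stdev(timings) if len(timings) > 1 else 0.0,
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a callable that runs the UPDATE.
    def run_update():
        sum(range(100_000))

    print(time_runs(run_update, runs=5, warmup=1))
```

If the median stays stable while the maximum jumps around, the outliers are the runs worth correlating with OS-level activity at those moments.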
Related
I have been working on a reporting database in DB2 for a month or so, and I have it set up to a pretty decent degree of what I want. However, I am noticing small inconsistencies that I have not been able to work out.
Less important, but still annoying:
1) Users claim it takes two login attempts to connect: the first always fails, the second succeeds. (Is there a recommendation for what to check for this?)
More importantly:
2) Whenever I want to refresh the data (which will be nightly), I have a script that drops and then recreates all of the tables. There are 66 tables, each ranging from tens of records to just under 100,000 records. The data is not massive and it takes about 2 minutes to run all 66 tables.
The issue is that once it says it has completed, there are usually at least 3-4 tables that did not load any data. So the table is dropped and then created, but is empty. The log shows that the command completed successfully, and if I run the commands independently the tables populate just fine.
If it helps, 95% of the commands are just CAST functions.
While I am sure I am not doing it the recommended way, is there a reason why a number of my tables are not populating? Are the commands executing too fast? Should I delay the CREATE after the DROP?
(This is DB2 Express-C 11.1 on Windows 2012 R2. The source DB is remote.)
Example of my SQL:
DROP TABLE TEST.TIMESHEET;
CREATE TABLE TEST.TIMESHEET AS (
SELECT NAME00, CAST(TIMESHEET_ID AS INTEGER(34))TIMESHEET_ID ....
.. (for 5-50 more columns)
FROM REMOTE_DB.TIMESHEET
)WITH DATA;
It is possible to configure DB2 to tolerate certain SQL errors in nested table expressions.
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyfqetnint.html
When the federated server encounters an allowable error, the server allows the error and continues processing the remainder of the query rather than returning an error for the entire query. The result set that the federated server returns can be a partial or an empty result.
However, I assume that your REMOTE_DB.TIMESHEET is simply a nickname, and not a view with nested table expressions, and so any errors when pulling data from the source should be surfaced by DB2. Taking a look at the db2diag.log is likely the way to go - you might even be hitting a Db2 issue.
It might be useful to change your script to TRUNCATE and INSERT into your local tables and see if that helps avoid the issue.
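Whichever load strategy is used, a row-count check after the refresh makes silently empty tables visible right away instead of when a report comes back blank. The following is a minimal sketch, assuming a DB-API 2.0 connection; `find_empty_tables` is a hypothetical helper, and an in-memory SQLite database stands in for the real server purely so the example runs on its own.

```python
import sqlite3


def find_empty_tables(conn, tables):
    """Return the names in `tables` that currently hold zero rows.

    conn is any DB-API 2.0 connection. Table names are interpolated
    into the SQL text, so they must come from a trusted, fixed list.
    """
    empty = []
    cur = conn.cursor()
    for name in tables:
        cur.execute(f"SELECT COUNT(*) FROM {name}")
        (count,) = cur.fetchone()
        if count == 0:
            empty.append(name)
    cur.close()
    return empty


if __name__ == "__main__":
    # In-memory SQLite stands in for the real database (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE timesheet (id INTEGER)")
    conn.execute("CREATE TABLE project (id INTEGER)")
    conn.execute("INSERT INTO timesheet VALUES (1)")
    print(find_empty_tables(conn, ["timesheet", "project"]))
```

Run at the end of the nightly script, the returned list can drive a retry of just the affected tables or an alert, and the timestamps of the failures can be matched against entries in db2diag.log.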
As you say, you may not be doing things in the most efficient way. You could consider using cache tables to take a periodic copy of your remote data: https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyvfed_tuning_cachetbls.html
PostgreSQL 9.5.
A very small UPDATE statement uses very high CPU for a long time, as if it were hanging.
My Windows console application uses a simple UPDATE statement to update the latest time as follows.
UPDATE META_TABLE SET latest_time = current_timestamp WHERE host = 'MY_HOST'
There are just 2 console applications which issue the above SQL.
No index on META_TABLE.
Only 1 row.
When it is hanging, no lock information.
UNLOGGED Table.
IDLE status in pg_stat_activity
Commit after UPDATE.
During the hang, I can still INSERT or DELETE data in the table above.
The issue appears about 20 minutes after the application starts.
I think it is not a SQL statement or table structure issue; something is probably wrong on the database side.
Any ideas on how to resolve this issue?
Update
There are 2 db connections in the console application. 1 for Select & 1 for DML.
I tried closing the DML DB connection every 2 minutes, and since then I haven't seen the issue on that connection. However, the hang then happened on a SELECT statement (also a very simple SELECT).
It seems that there is some per-session limit.
I am now also closing the SELECT DB connection every 3 minutes and monitoring.
Context:
Using PostgreSQL (9.6), for a custom synchronisation project, we have an agent that makes a lot of INSERTs between a database_1 (DB1) and a database_2 (DB2) when syncing data.
For example: DB2 is down for 5 minutes and 40,000 new lines accumulate in DB1, so when DB2 is up again, all 40,000 lines are immediately synced from DB1 to DB2.
All this works great.
Problem/Fact:
During the synchronisation, the INSERT rate is around 1000 lines / second.
However, when we do a simple SELECT count(*) FROM table during the sync (in the middle of these thousands of INSERTs), we noticed that the INSERT rate falls to a few dozen per second (instead of about 1000 per second).
Question:
Is there any reason why a SELECT operation (made inside pgAdmin, by a process other than the syncing process) slows down the batch of INSERTs?
Any locking or internal reason that might explain this?
Or should I provide more information? How can I debug more?
Hints:
Logs are fully activated and all the INSERTs always take around 0.700ms (before slowdown and same during slowdown), it doesn't change.
INSERTs are currently performed one row at a time.
(I'll be happy to provide more information)
I am a newbie in PostgreSQL (version 9.2) database development. While looking at one of my tables I saw an option called autovacuum.
Many of my tables contain 20000+ rows. For testing purposes I've altered one of those tables as below:
ALTER TABLE theTable SET (
autovacuum_enabled = true
);
So, I wish to know the advantages and disadvantages (if any) of autovacuuming a table.
Autovacuum is enabled by default in current versions of Postgres (and has been for a while). It's generally a good thing to have enabled for performance and other reasons.
Prior to autovacuuming, you would need to explicitly vacuum tables yourself (via cronjobs which executed psql commands to vacuum them, or similar) in order to get rid of dead tuples, etc. Postgres has for a while now managed this for you via autovacuum.
I have in some cases, with tables that have immense churn (i.e. very high rates of insertions and deletions) found it necessary to still explicitly vacuum via a cron in order to keep the dead tuple count low and performance high, because the autovacuum doesn't kick in fast enough, but this is something of a niche case.
More info: http://www.postgresql.org/docs/current/static/runtime-config-autovacuum.html
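To see why autovacuum can lag behind on large, high-churn tables, it helps to work through the documented trigger rule: a table is vacuumed once its dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * number of rows`. The sketch below just does that arithmetic; `autovacuum_trigger_point` is a hypothetical helper name, and the defaults are PostgreSQL's stock settings (50 and 0.2), which your server may override.

```python
def autovacuum_trigger_point(row_count, base_threshold=50, scale_factor=0.2):
    """Approximate dead-tuple count at which autovacuum vacuums a table.

    Mirrors the documented rule:
        threshold = autovacuum_vacuum_threshold
                    + autovacuum_vacuum_scale_factor * number of rows
    The defaults are PostgreSQL's stock settings (50 and 0.2).
    """
    return base_threshold + scale_factor * row_count


if __name__ == "__main__":
    for rows in (20_000, 10_000_000):
        print(rows, "rows ->", autovacuum_trigger_point(rows), "dead tuples")
    # A lower per-table scale factor makes autovacuum react sooner.
    print(autovacuum_trigger_point(10_000_000, scale_factor=0.01))
```

With the defaults, a 20,000-row table is vacuumed after roughly 4,050 dead tuples, while a 10-million-row table waits for about two million. That gap is one reason very busy tables sometimes need a lower per-table `autovacuum_vacuum_scale_factor` or an explicit scheduled `VACUUM`.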
I have been given a Postgres 9.2 DB of around 20 GB.
I looked through the database and saw that vacuum and/or analyze has never been run manually on any of its tables.
Autovacuum is on and the transaction wraparound limit is very far (only 1% of it).
I know nothing about the data activity (number of deletes, inserts, updates), but I see that it uses a lot of indexes and sequences.
My question is:
Does the lack of vacuum and/or analyze affect data integrity (for example, a select not returning all the rows that match, whether read from a table or from an index)? The speed of queries and writes doesn't matter.
Is it possible that after the vacuum and/or analyze the same query gives a different answer than it would have given before the vacuum/analyze command?
I'm fairly new to PG, thank you for your help!!
Regards,
Figaro88
Running vacuum and/or analyze will not change the result set produced by any select operation (unless there is a bug in PostgreSQL). They may affect the order of results if you do not supply an ORDER BY clause.