File loading issues in DB2 using Load utility - db2

I have a comma-delimited .csv file (located at C:/). I am using the DB2 LOAD utility to load the data in the CSV file into a DB2 table.
LOAD CLIENT FROM C:\Users\somepath\FileName.csv of del
MODIFIED BY NOCHARDEL COLDEL, insert into SchemaName.TABLE_NAME;
The CSV file has 25 rows. After the utility completed I got an error message for NOCHARDEL, but all 25 rows were loaded into my table properly. Now, when I try to execute an insert/update/delete statement on any of the tables in that schema, I get the following error:
Lookup Error - DB2 Database Error: ERROR [55039] [IBM][DB2/AIX64] SQL0290N Table space access is not allowed.
Could you please help me understand whether I am making a mistake or missing a parameter that is causing a lock on the table?
A similar situation occurred earlier while loading the file, and the DBA confirmed that the table space in question was in the “load in progress” state.

Changes generated by the DB2 LOAD utility are not logged (one of the side-effects of its high performance). If the database crashes immediately after the load it will be impossible to recover the table that was loaded by replaying log records, because there are no such records. For this reason the tablespace containing the loaded table is automatically placed in the BACKUP PENDING mode, forcing you to take a backup of that tablespace or the entire database to ensure it is fully recoverable.
There are options that you can specify for the LOAD command that can help you avoid this situation in the future:
NONRECOVERABLE -- this option does not place the tablespace into the BACKUP PENDING mode, but, as its name implies, the table you're loading to becomes non-recoverable in case of a crash, and your only option in that situation will be to drop and re-create the table.
COPY YES -- this option saves a copy of the loaded data, which roll-forward recovery can use to rebuild the table in case of a crash.
If you are only loading 25 records, I suggest you use the IMPORT utility instead -- it does not have these restrictions because it is fully logged (at the price of lower performance, which for 25 records won't matter).
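As a minimal sketch of how the two options change the command text, the Python helper below assembles a LOAD command like the one in the question and appends either NONRECOVERABLE or COPY YES. The function name build_load_command, its parameters, and the copy directory are hypothetical and only for illustration; the helper builds a string and does not run anything against DB2.

```python
def build_load_command(csv_path, table, recovery="NONRECOVERABLE", copy_dir=None):
    """Return DB2 LOAD command text for a comma-delimited file.

    recovery is "NONRECOVERABLE" or "COPY YES". copy_dir is a hypothetical
    directory that receives the load copy when COPY YES is chosen.
    """
    if recovery == "NONRECOVERABLE":
        tail = "NONRECOVERABLE"
    elif recovery == "COPY YES":
        if not copy_dir:
            raise ValueError("COPY YES needs a target directory")
        tail = f"COPY YES TO {copy_dir}"
    else:
        raise ValueError(f"unsupported recovery option: {recovery}")
    # "COLDEL," declares the comma as the column delimiter, as in the question.
    return (
        f"LOAD CLIENT FROM {csv_path} OF DEL MODIFIED BY COLDEL, "
        f"INSERT INTO {table} {tail}"
    )
```

The resulting text can be pasted into the DB2 command line processor; whether COPY YES is usable depends on the database's logging configuration, which is not shown in the question.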

Thanks #mustaccio. I have 60 million rows to insert; I was using 25 as a sample to check the outcome.
To add another point, we later came to know that this is a known DB2 bug that keeps the table space in the load in progress state (DB2 is unable to acknowledge that the load has completed, and the session remains open indefinitely) and places it in the backup pending state.
Recovery is the only option to release the table space once it is in the pending state.
According to the DB2 team, this issue is fixed in fix pack 10 (we are yet to deploy and test it). Meanwhile, the NONRECOVERABLE keyword is working fine for us.

The reason why your table is stuck in the LOAD IN PROGRESS state is the NOCHARDEL error happening at the end of the LOAD.
Have you tried restarting the database? This should reinitialize all table spaces and remove any rogue states.
http://www-01.ibm.com/support/docview.wss?uid=swg1IC65395
http://www-01.ibm.com/support/docview.wss?uid=swg21427102

PostgreSQL: even read access changes data files on disk, leading to large incremental backups using pgbackrest

We are using pgbackrest to backup our database to Amazon S3. We do full backups once a week and an incremental backup every other day.
Size of our database is around 1TB, a full backup is around 600GB and an incremental backup is also around 400GB!
We found out that even read access (pure select statements) on the database has the effect that the underlying data files (in /usr/local/pgsql/data/base/xxxxxx) change. This results in large incremental backups and also in very large storage (costs) on Amazon S3.
Usually the files with low index names (e.g. 391089.1) change on read access.
On an update, we see changes in one or more files - the index could correlate to the age of the row in the table.
Some more facts:
Postgres version 13.1
Database is running in docker container (docker version 20.10.0)
OS is CentOS 7
We see the phenomenon on multiple servers.
Can someone explain why PostgreSQL changes data files on pure read access?
We tested on a pure database without any other resources accessing the database.
This is normal. Some cases I can think of right away are:
- a SELECT or other SQL statement setting a hint bit
  This is a shortcut for subsequent statements that access the data, so they don't have to consult the commit log any more.
- a SELECT ... FOR UPDATE writing a row lock
- autovacuum removing dead row versions
  These are leftovers from DELETE or UPDATE.
- autovacuum freezing old visible row versions
  This is necessary to prevent data corruption if the transaction ID counter wraps around.
The only way to fairly reliably prevent PostgreSQL from modifying a table in the future is:
- never perform an INSERT, UPDATE or DELETE on it
- run VACUUM (FREEZE) on the table and make sure that there are no concurrent transactions
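The hint-bit case can be illustrated with a toy model in Python. This is not PostgreSQL's actual data layout or code; the classes RowVersion, Page and ToyTable are invented for illustration. It only shows the idea: the first read of a row consults the commit log, caches the answer on the row, and thereby dirties the page, while later reads leave the page untouched.

```python
from dataclasses import dataclass


@dataclass
class RowVersion:
    xmin: int                      # ID of the transaction that inserted the row
    committed_hint: bool = False   # toy stand-in for the "inserter committed" hint bit


@dataclass
class Page:
    rows: list
    dirty: bool = False            # a dirty page is written back to its data file later


class ToyTable:
    def __init__(self, pages, commit_log):
        self.pages = pages
        self.commit_log = commit_log   # transaction ID -> True if committed
        self.clog_lookups = 0

    def select_all(self):
        """Return visible rows; the first read of a row sets its hint bit."""
        visible = []
        for page in self.pages:
            for row in page.rows:
                if not row.committed_hint:
                    self.clog_lookups += 1
                    if self.commit_log.get(row.xmin, False):
                        row.committed_hint = True
                        page.dirty = True   # a pure read has modified the page
                if row.committed_hint:
                    visible.append(row)
        return visible
```

In this model the first select_all() marks every page it touches as dirty, and a second call performs no commit log lookups and dirties nothing, which matches the "shortcut for subsequent statements" described above.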

DB2 Tables Not Loading when run in Batch

I have been working on a reporting database in DB2 for a month or so, and I have it set up to a pretty decent degree of what I want. I am however noticing small inconsistencies that I have not been able to work out.
Less important, but still annoying:
1) Users claim it takes two login attempts to connect: the first always fails, the second succeeds. (Is there a recommendation for what to check for this?)
More importantly:
2) Whenever I want to refresh the data (which will be nightly), I have a script that drops and then recreates all of the tables. There are 66 tables, each ranging from tens of records to just under 100,000 records. The data is not massive, and it takes about 2 minutes to run all 66 tables.
The issue is that once it says it completed, there are usually at least 3-4 tables that did not load any data. The table is dropped and then created, but is empty. The log shows that the command completed successfully, and if I run the commands independently the tables populate just fine.
If it helps, 95% of the commands are just CAST functions.
While I am sure I am not doing it the recommended way, is there a reason why a number of my tables are not populating? Are the commands executing too fast? Should I lag the Create after the DROP?
(This is DB2 Express-C 11.1 on Windows 2012 R2, The source DB is remote)
Example of my SQL:
DROP TABLE TEST.TIMESHEET;
CREATE TABLE TEST.TIMESHEET AS (
SELECT NAME00, CAST(TIMESHEET_ID AS INTEGER(34))TIMESHEET_ID ....
.. (for 5-50 more columns)
FROM REMOTE_DB.TIMESHEET
)WITH DATA;
It is possible to configure DB2 to tolerate certain SQL errors in nested table expressions.
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyfqetnint.html
When the federated server encounters an allowable error, the server allows the error and continues processing the remainder of the query rather than returning an error for the entire query. The result set that the federated server returns can be a partial or an empty result.
However, I assume that your REMOTE_DB.TIMESHEET is simply a nickname, and not a view with nested table expressions, and so any errors when pulling data from the source should be surfaced by DB2. Taking a look at the db2diag.log is likely the way to go - you might even be hitting a Db2 issue.
It might be useful to change your script to TRUNCATE and INSERT into your local tables and see if that helps avoid the issue.
As you say, you may not be doing things the most efficient way. You could consider using cache tables to take a periodic copy of your remote data: https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyvfed_tuning_cachetbls.html
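The TRUNCATE and INSERT suggestion could be sketched as follows, assuming the local tables already exist with the right column types. Both helpers (refresh_statements and empty_tables) are hypothetical names; the first only generates statement text for one table, and the second flags tables that ended up empty, given row counts collected by whatever means after the batch.

```python
def refresh_statements(table, select_sql):
    """Return the statements for one truncate-and-reload refresh of `table`."""
    return [
        "COMMIT",  # TRUNCATE has to be the first statement in its unit of work
        f"TRUNCATE TABLE {table} IMMEDIATE",
        f"INSERT INTO {table} {select_sql}",
        "COMMIT",
    ]


def empty_tables(row_counts):
    """Given {table: row count} gathered after the batch, list tables left empty."""
    return sorted(name for name, count in row_counts.items() if count == 0)
```

Checking the counts at the end of the nightly run makes a silently empty table visible right away, and its name gives a starting point when searching db2diag.log.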

Online backup blocking truncate table

It's documented that in DB2 the TRUNCATE statement is not compatible with online backup because it gets a Z lock on the table and prevents an online backup from running concurrently.
The lock wait happens when a truncate tries to get a shared lock on an internal online backup object.
Since this is by design in the product I will have to go for workarounds, so this thread is not about a solution, but about why they can't work together. I didn't find a reasonable explanation of why there is such a limitation in DB2.
Any insights?
Thanks,
Luciano Moreira
from http://www.ibm.com/developerworks/data/library/techarticle/dm-0501melnyk/
When a table holds a Z lock, no concurrent application can read or
update data in that table.
So now we know that a Z lock is exclusive access to a table, denying read and write access to it.
from http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.sql.ref.doc/doc/r0053474.html
Exclusive Access: No other session can have a cursor open on the table, or a lock held on the table (SQLSTATE 25001).
from https://sites.google.com/site/umeshdanderdbms/difference-between-truncate-and-delete
Delete is logging operation, where as Truncate is makes the table empty on container level.
(Logging operation – DML operation are logged into logs (redo log in oracle, transaction log in DB2 etc). It is stored in logs for commit or rollback operation.)
This is the most interesting part. Truncate just 'forgets' the content of the table, whereas delete removes it row by row, processing all triggers, bells, and whistles. Therefore, when you truncate, all reading cursors become invalid. To prevent stupid stuff like that, you can only completely empty a table when nobody is trying to access it. Online backup obviously needs to read the table. Therefore it is not possible to have both accessing the same table at the same time.

Recover DB2 table after CLI load mode on is set

I need to load a large amount of data into a table in a DB2 database. I am using CLI load mode from a program written in C, enabled with the SQLSetStmtAttr function. Select statements do not work (the table gets locked) while it is set.
When the loading of the data completes I turn load mode off. After that the table becomes accessible, so that I can perform a select from the DB2 command line tools (or Control Center).
The problem arises when my C program crashes or fails before turning load mode off: the table stays locked, and I have to drop it, losing all previous data.
My question is whether there is a way to recover the table?
DBMS Documentation is your friend. You can read the description of SQL0668N (or any other error!) to find out what reason code 3 means, as well as how to fix it.
Basically, when a LOAD operation fails, you need to perform some cleanup on the table – either restart or terminate it. This can be done using the LOAD utility from outside of your program (e.g., LOAD from /dev/null of del TERMINATE into yourtable nonrecoverable) but you can also do it programmatically.
Typically you would do this using the db2Load() API, and setting the piLongActionString member of the db2LoadStruct parameter you pass to db2Load(), with the same RESTART or TERMINATE operation.
It looks like you can set the SQL_ATTR_LOAD_INFO statement attribute to the same db2LoadStruct when using a CLI Load, too, but I am not sure if this would actually work to complete a load restart / terminate.
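For the command-line route, a small Python sketch can produce the cleanup command text for a given table. The helper name load_cleanup_command and its parameters are hypothetical; it mirrors the TERMINATE example above and does not call db2Load() or connect to DB2.

```python
def load_cleanup_command(table, action="TERMINATE", source="/dev/null"):
    """Return LOAD command text that terminates or restarts a failed load.

    With TERMINATE an empty source such as /dev/null is enough, as in the
    example above. For RESTART the caller is assumed to pass the original
    input file as `source`.
    """
    action = action.upper()
    if action not in ("TERMINATE", "RESTART"):
        raise ValueError(f"unsupported action: {action}")
    return f"LOAD FROM {source} OF DEL {action} INTO {table} NONRECOVERABLE"
```

A wrapper script could run the returned text through the DB2 command line processor whenever the loading program exits abnormally, so the table does not have to be dropped.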

How do I ALTER a set of partitioned tables in Postgres?

I created a set of partitioned tables in Postgres, and started inserting a lot of rows via the master table. When the load process blew up on me, I realized I should have declared the id column BIGSERIAL (BIGINT with a sequence, behind the scenes), but inadvertently set it as SERIAL (INTEGER). Now that I have a couple of billion rows loaded, I am trying to ALTER the column to BIGINT. The process seems to be working, but is taking a long time. So, in reality, I don't really know if it is working or if it is hung. I'd rather not restart the entire load process again.
Any suggestions?
When you update a row to alter it in PostgreSQL, that writes out a new copy of the row and then does some cleanup later to remove the original. This means that trying to fix the problem by doing updates can take longer than just loading all the data in from scratch again: it's more disk I/O than loading a new copy, and some extra processing time too. The only situation where you'd want to do an update instead of a reload is when the original load was very inefficient, for example if a slow client program is inserting the data and it's the bottleneck in the process.
To figure out if the process is still working, see if it's using CPU when you run top (UNIX-ish systems) or the Task Manager (Windows). On Linux, "top -c" will even show you what the PostgreSQL client processes are doing. You probably just expected it to take less time than the original load, which it won't, and it's still running rather than hung up.
Restart it (clarifying edit: restart the entire load process again).
Altering a column value requires a new row version, and all indexes pointing to the old version to be updated to point to the new version.
Additionally, see how much of the advice on populating databases you can follow.
Correction from #archnid:
Altering the type of the column will trigger a table rewrite, so the row versioning isn't a big problem, but it will still take lots of disk space temporarily. You can usually monitor progress by looking at which files in the database directory are being appended to...
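One rough way to watch for files that are being appended to is to take two size snapshots of the database directory a short time apart and compare them. The Python sketch below assumes you have read access to that directory; the function names snapshot_sizes and grown_files are made up for this example.

```python
import os


def snapshot_sizes(directory):
    """Map each regular file under `directory` to its current size in bytes."""
    sizes = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                sizes[path] = os.path.getsize(path)
            except OSError:
                pass  # the file vanished between listing and stat
    return sizes


def grown_files(before, after):
    """Return {path: bytes added} for files that are new or larger in `after`."""
    return {
        path: size - before.get(path, 0)
        for path, size in after.items()
        if size > before.get(path, 0)
    }
```

Calling snapshot_sizes() on the data directory twice, a minute or so apart, and passing both results to grown_files() shows whether the rewrite is still producing output; an empty result over several intervals would suggest the statement is waiting rather than working.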