I have an iPhone application that is a data load harness to pre-populate database with data that will be shipped in a separate application. When I kick the program off I am reading from an XML file and the records are inserted into the database.
Everytime I hit the 247th record in the list the database then returns an error 14 database not found. If I stop the program, remove the 247 entries from the XML file that were just inserted and I restart the program again... the same thing happens. The next 247 records will be inserted then a failure with error 14.
I have over 30,000 records to load. Loading these 247 records at a time is not really a good option.
Any ideas on what is wrong?
DB2 has functionality in its IMPORT and LOAD commands that allow a commitlevel. It may not be the best answer, but check the docs to see if you have a way to commit every 200 rows or so. This way it's hands-free.
I have no idea what's wrong there, but maybe you can automatically close and reopen the database connection every 200 records to work around it.
How are you managing transactions? You probably don't want to insert everything in a single block, but then neither do you want to add every record in a separate block.
Related
I need to process millions of records coming from MongoDb and put a ETL pipeline to insert that data into a PostgreSQL database. However, in all the methods I've tried, I keep getting the out memory heap space exception. Here's what I've already tried -
Tried connecting to MongoDB using tMongoDBInput and put a tMap to process the records and output them using a connection to PostgreSQL. tMap could not handle it.
Tried to load the data into a JSON file and then read from the file to PostgreSQL. Data got loaded into JSON file but from there on got the same memory exception.
Tried increasing the RAM for the job in the settings and tried the above two methods again, still no change.
I specifically wanted to know if there's any way to stream this data or process it in batches to counter the memory issue.
Also, I know that there are some components dealing with BulkDataLoad. Could anyone please confirm whether it would be helpful here since I want to process the records before inserting and if yes, point me to the right kind of documentation to get that set up.
Thanks in advance!
As you already tried all the possibilities the only way that I can see to do this requirement is breaking done the job into multiple sub-jobs or going with incremental load based on key columns or date columns, Considering this as a one-time activity for now.
Please let me know if it helps.
I have been working on a reporting database in DB2 for a month or so, and I have it setup to a pretty decent degree of what I want. I am however noticing small inconsistencies that I have not been able to work out.
Less important, but still annoying:
1) Users claim it takes two login attempts to connect, first always fails, second is a success. (Is there a recommendation for what to check for this?)
More importantly:
2) Whenever I want to refresh the data (which will be nightly), I have a script that drops and then recreates all of the tables. There are 66 tables, each ranging from 10's of records to just under 100,000 records. The data is not massive and takes about 2 minutes to run all 66 tables.
The issue is that once it says it completed, there is usually at least 3-4 tables that did not load any data in them. So the table is deleted and then created, but is empty. The log shows that the command completed successfully and if I run them independently they populate just fine.
If it helps, 95% of the commands are just CAST functions.
While I am sure I am not doing it the recommended way, is there a reason why a number of my tables are not populating? Are the commands executing too fast? Should I lag the Create after the DROP?
(This is DB2 Express-C 11.1 on Windows 2012 R2, The source DB is remote)
Example of my SQL:
DROP TABLE TEST.TIMESHEET;
CREATE TABLE TEST.TIMESHEET AS (
SELECT NAME00, CAST(TIMESHEET_ID AS INTEGER(34))TIMESHEET_ID ....
.. (for 5-50 more columns)
FROM REMOTE_DB.TIMESHEET
)WITH DATA;
It is possible to configure DB2 to tolerate certain SQL errors in nested table expressions.
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyfqetnint.html
When the federated server encounters an allowable error, the server allows the error and continues processing the remainder of the query rather than returning an error for the entire query. The result set that the federated server returns can be a partial or an empty result.
However, I assume that your REMOTE_DB.TIMESHEET is simply a nickname, and not a view with nested table expressions, and so any errors when pulling data from the source should be surfaced by DB2. Taking a look at the db2diag.log is likely the way to go - you might even be hitting a Db2 issue.
It might be useful to change your script to TRUNCATE and INSERT into your local tables and see if that helps avoid the issue.
As you say you are maybe not doing things the most efficient way. You could consider using cache tables to take a periodic copy of your remote data https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyvfed_tuning_cachetbls.html
We have been using BigQuery for over a year now with no issues. We load data as batch jobs every few hours and it usually is instantly available.
We just started experimenting with streaming inserts using template tables. With our first test, we saw no errors and the data showed up instantly. The test created approximately 120 tables. A simple select count (using the web ui) on the tables came up with the right total number of ~8000 rows. After a couple of hours of more streaming, the total dropped to ~1400 rows.
Unsure about what happened, we dropped the dataset, recreated the template table and re-ran the streaming. This time around, the tables showed up right away but the data did not. On our third attempt the tables themselves did not show up for more than a couple of hours. We are on the fourth attempt and this time we only streamed data belonging to one table. The table showed up right away, but it has been over an hour and the data does not show up.
The streaming service uses the latest Java library, inserts only one record at a time and logs the response. The response, without an exception is always {"kind":"bigquery#tableDataInsertAllResponse"} and no errors.
Any help trying to understand what is happening would be great. Thanks.
Looks like we've identified the issue. It appears there's a race in the template-tables path only that causes our system to think the first chunk of data was deleted by user action (table truncation -- which it obviously wasn't), and is dropped. We've identified the fix and will attempt to push out a fix shortly.
Thanks for letting us know!
I have a .csv file, comma-delimited (located at C:/). I am using the DB2 LOAD utility to load data present in the CSV file in a DB2 table.
LOAD CLIENT FROM C:\Users\somepath\FileName.csv of del
MODIFIED BY NOCHARDEL COLDEL, insert into SchemaName.TABLE_NAME;
CSV file has 25 rows. After the utility completed I got an error message for NOCHARDEL. My table has all 25 rows properly loaded. Now when I try to execute an insert/update/delete statement on any of the tables present in that schema I am getting following error.
Lookup Error - DB2 Database Error: ERROR [55039] [IBM][DB2/AIX64] SQL0290N Table space access is not allowed.
Could you please help me whether I am making any mistake or missing a parameter that is causing lock on the table.
Earlier while loading the file similar situation occurred, where DBA confirmed that Table space in question is in “load in progress” state
Changes generated by the DB2 LOAD utility are not logged (one of the side-effects of its high performance). If the database crashes immediately after the load it will be impossible to recover the table that was loaded by replaying log records, because there are no such records. For this reason the tablespace containing the loaded table is automatically placed in the BACKUP PENDING mode, forcing you to take a backup of that tablespace or the entire database to ensure it is fully recoverable.
There are options that you can specify for the LOAD command that can help you avoid this situation in the future:
NONRECOVERABLE -- this option does not place the tablespace into the BACKUP PENDING mode, but, as its name implies, the table you're loading to becomes non-recoverable in case of a crash, and your only option in that situation will be to drop and re-create the table.
COPY YES -- this option creates a copy of the table prior to loading, which can be used to recover the table to its pre-LOAD state in case of a crash.
If you are only loading 25 records, I suggest you use the IMPORT utility instead -- it does not have these restrictions because it is fully logged (at the price of lower performance, which for 25 records won't matter).
Thanks #mustaccio. I had 60 Million rows to insert. I was using 25 as sample to check the outcome.
To add another point, we later came to know that this is a known DB2 bug that keeps the load in progress state (DB2 is unable to acknowledge that the load has completed and the session remains open indefinitely) and place the table space in backup pending state.
Recovery is the only option to release the table space once it is in pending state.
This issue is fixed in fix pack 10 as per the DB2 team (we are yet to deploy and test). Mean while NONRECOVERABLE key word is working fine for us
The reason why your table is stuck in the LOAD IN PROGRESS state is the NOCHARDEL error happening at the end of the LOAD.
Have you tried restarting the database? This should reinitialize all table spaces and remove any rogue states.
http://www-01.ibm.com/support/docview.wss?uid=swg1IC65395
http://www-01.ibm.com/support/docview.wss?uid=swg21427102
I have a simple job that reads .csv file, converts data from this file through tMap, and writes data from file into DB.
If an error in .csv file is found, line containing error will be escaped and all other data will be written into DB.
If die on error is checked, writing into DB will abort when line with error has been reached.
What should I do if I want that either ALL data is written into DB if there's no error, or NONE of data is written if there's at least one error?
Thanks in advance!
As #Ryan mentioned, the usual standard is to use a transaction. If this isn't possible for some reason (I thought I'd heard/seen something about a row-lock limit per-transaction), consider dumping the results into a temporary copy of your actual table. If no errors occur, add it to the production table. If errors occur, pop an error message and drop the (temporary) table.
You should use a transaction. That way you can roll it back if there is an error.
Exactly how you go about implementing a transaction depends on the database you're using. Which is it?