I have a simple job that reads a .csv file, converts the data through tMap, and writes it into a DB.
If an error is found in the .csv file, the line containing the error is skipped and all other data is written into the DB.
If "Die on error" is checked, writing into the DB aborts when the line with the error is reached.
What should I do if I want either ALL of the data written into the DB when there is no error, or NONE of it written when there is at least one error?
Thanks in advance!
As @Ryan mentioned, the usual standard is to use a transaction. If that isn't possible for some reason (I thought I'd heard/seen something about a per-transaction row-lock limit), consider dumping the results into a temporary copy of your actual table. If no errors occur, add its contents to the production table. If errors occur, pop an error message and drop the temporary table.
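A rough sketch of that staging-table idea in plain SQL (table names are made up and the exact syntax varies by database):

-- Empty copy of the target table for the load to write into.
CREATE TABLE orders_staging AS SELECT * FROM orders WHERE 1 = 0;

-- The job writes every CSV row into orders_staging instead of orders.

-- Whole load succeeded: promote it in one statement, then clean up.
INSERT INTO orders SELECT * FROM orders_staging;
DROP TABLE orders_staging;

-- Any row failed: report the error and just drop the staging table;
-- the real table was never touched.
-- DROP TABLE orders_staging;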
You should use a transaction. That way you can roll it back if there is an error.
Exactly how you go about implementing a transaction depends on the database you're using. Which is it?
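For example, a minimal sketch assuming a database with explicit transactions (illustrative table and column names; the exact syntax differs slightly per vendor):

BEGIN;

INSERT INTO orders (id, amount) VALUES (1, 10.00);
INSERT INTO orders (id, amount) VALUES (2, 25.50);
-- ... one INSERT per CSV row ...

-- Every row loaded cleanly:
COMMIT;

-- Or, if any row raised an error:
-- ROLLBACK;  -- leaves the table exactly as it was before the load

In Talend this usually means sharing one connection component with auto-commit turned off, then wiring a commit component on success and a rollback component on error.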
I have a trigger on a table in Informix 11.7 that writes all updates and deletes of a master data file to a log file. The log file has a self-generating sequence number on it to prove that no records have been deleted from the log. But if the trigger crashes for any reason, the sequence number does not roll back, and a gap appears in the sequence, making it look like logs have been deleted. All I want is to be made aware that the trigger crashed, so that I can explain the gaps in the sequence numbers. But any file I write to gets rolled back, and I don't think I can write to an ASCII file from a trigger. Can I make it write to the online.log or any other ASCII file that doesn't get rolled back? Or email that it has crashed? Any bright ideas?
I would probably create a RAW table to hold the log records — or the messages. A raw table is not subject to transactions; entries inserted are not rolled back even if the transaction that created the entry is rolled back.
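A hedged Informix sketch of that (table and column names are illustrative):

-- RAW tables are non-logging, so rows the trigger writes here stay put
-- even if the surrounding transaction is rolled back.
CREATE RAW TABLE trigger_failures (
    logged_at  DATETIME YEAR TO SECOND,
    master_key INTEGER,
    message    VARCHAR(255)
);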
I don't think an external table will help this time, but I stand to be proved wrong.
I have a scenario where a CSV file lists a lot of files I need to do operations on. The script needs to handle being stopped or failing, and then continue from where it left off. In a database scenario this would be fairly simple: I would have an "updated" column and update it when the operation for that line has completed. I have looked at whether I could somehow update the CSV on the fly, but I don't think that is possible. I could start having multiple files, but that's not very elegant. Can anyone recommend some kind of simple file-based DB-like framework, where from PowerShell I could create a new database file (maybe JSON), read from it, and update it on the fly?
If your problem is really so complex that you actually need something like a local database solution, then consider going with SQLite, which was built for such scenarios.
In your case, since you process the CSV row by row, I assume storing the info for the current row only (line number, status, etc.) will be enough.
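As a sketch of what that could look like in a small SQLite file (say progress.db; every name here is made up):

-- One row per CSV line; 'pending' rows are the ones still left to process.
CREATE TABLE IF NOT EXISTS csv_progress (
    line_number INTEGER PRIMARY KEY,
    file_name   TEXT NOT NULL,
    status      TEXT NOT NULL DEFAULT 'pending'  -- 'pending' | 'done' | 'failed'
);

-- After the operation for a line completes:
UPDATE csv_progress SET status = 'done' WHERE line_number = 42;

-- On restart, resume from the first line that isn't done yet:
SELECT MIN(line_number) FROM csv_progress WHERE status <> 'done';

From PowerShell you could run these statements through something like the PSSQLite module or System.Data.SQLite.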
I am using a PostgreSQL DB on Ubuntu. I learned about WAL logs and pg_xlogdump, and used pg_xlogdump to print the WAL logs on screen. But I have no idea how to interpret the output or how to tell what transactions were made.
To read and understand pg_xlogdump output, you must know something about PostgreSQL's internals.
The desc field will tell you what operation took place (something like NEXTOID or INSERT_LEAF) as well as further details (e.g., what the next OID was, or at which offset in which block of which file the new index entry was added).
You'll have to read the source to understand the different entry types.
For Amazon Redshift, data is usually loaded from S3 using the COPY command. I want to know whether the command is atomic or not. E.g., is it possible in some exceptional case that only part of the data file is loaded into the Redshift table?
The COPY command with default options is atomic. If the file includes an invalid line that causes a load failure, the COPY transaction will be rolled back and no data is imported.
If you want to skip invalid lines without stopping the transaction, you can use the MAXERROR option of the COPY command to ignore them. Here is an example that ignores up to 100 invalid lines.
COPY table_name
FROM 's3://[bucket-name]/[file-path or prefix]'
CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx'
DELIMITER '\t'
MAXERROR 100;
If the number of invalid lines exceeds the MAXERROR count (100 here), the transaction will be rolled back.
See the following link for the details of the COPY command.
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
You can use the NOLOAD flag to check for errors before loading the data. This is a faster way to validate the format of your data, as it doesn't try to load any data, just parse it.
You can define how many errors you are willing to tolerate with the MAXERROR flag.
If you have more than the MAXERROR count, your load will fail and no record is added.
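For example (same placeholder values as the COPY above; with NOLOAD nothing is inserted, the file is only parsed and validated):

COPY table_name
FROM 's3://[bucket-name]/[file-path or prefix]'
CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx'
DELIMITER '\t'
MAXERROR 100
NOLOAD;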
See more information here: http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
I have an iPhone application that is a data load harness to pre-populate a database with data that will be shipped in a separate application. When I kick the program off, it reads from an XML file and inserts the records into the database.
Every time I hit the 247th record in the list, the database returns error 14, database not found. If I stop the program, remove from the XML file the 247 entries that were just inserted, and restart the program... the same thing happens: the next 247 records are inserted, then a failure with error 14.
I have over 30,000 records to load. Loading 247 records at a time is not really a good option.
Any ideas on what is wrong?
DB2 has functionality in its IMPORT and LOAD commands that allows a commit count. It may not be the best answer, but check the docs to see if you have a way to commit every 200 rows or so. This way it's hands-free.
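Something along these lines, as a sketch of the DB2 IMPORT syntax (file, format, and table names are illustrative, so check the docs for the exact form):

IMPORT FROM data.del OF DEL
    COMMITCOUNT 200
    INSERT INTO my_table;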
I have no idea what's wrong there, but maybe you can automatically close and reopen the database connection every 200 records to work around it.
How are you managing transactions? You probably don't want to insert everything in a single block, but then neither do you want to add every record in a separate block.
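For instance, a rough middle-ground sketch that commits in batches of a few hundred rows (table and column names are made up):

BEGIN TRANSACTION;
INSERT INTO records (id, name) VALUES (1, 'a');
INSERT INTO records (id, name) VALUES (2, 'b');
-- ... roughly 200 more inserts ...
COMMIT;

BEGIN TRANSACTION;
-- ... the next batch of ~200 inserts ...
COMMIT;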