Is 'copy' command in Amazon RedShift atomic or not? - amazon-redshift

For Amazon Redshift, data is usually loaded from S3 using the COPY command. I want to know whether the command is atomic or not. For example, is it possible that in some exceptional cases only part of the data file is loaded into the Redshift table?

The COPY command with default options is atomic. If the file includes an invalid line that causes a load failure, the COPY transaction will be rolled back and no data is imported.
If you want to skip invalid lines without aborting the transaction, you can use the MAXERROR option of the COPY command, which ignores invalid lines. Here is an example that ignores up to 100 invalid lines.
COPY table_name from 's3://[bucket-name]/[file-path or prefix]' CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' DELIMITER '\t' MAXERROR 100;
If the number of invalid lines exceeds the MAXERROR count (100 here), the transaction will be rolled back.
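If some lines are skipped under MAXERROR, the rejected rows can be inspected afterwards in the STL_LOAD_ERRORS system table. A minimal sketch:
-- show the most recent load errors and why each line was rejected
SELECT starttime, filename, line_number, colname, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 20;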
See the following link for the details of the COPY command.
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html

You can use the NOLOAD option to check for errors before loading the data. This is a faster way to validate the format of your data, as it doesn't try to load any data, just parses it.
You can define how many errors you are willing to tolerate with the MAXERROR option.
If you have more errors than the MAXERROR count, your load will fail and no records are added.
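A hedged sketch of a validation-only run, reusing the placeholder bucket and credentials from the example above:
COPY table_name FROM 's3://[bucket-name]/[file-path or prefix]'
CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx'
DELIMITER '\t'
NOLOAD        -- parse and validate only; no rows are written
MAXERROR 100; -- tolerate up to 100 bad lines before the check fails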
See more information here: http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html

Related

How to pass variable in Load command from IBM Object Storage file to Cloud DB2

I am using the below command to load an Object Storage file into the DB2 table NLU_TEMP_2:
CALL SYSPROC.ADMIN_CMD('load from "S3::s3.jp-tok.objectstorage.softlayer.net::
<s3-access-key-id>::<s3-secret-access-key>::nlu-test::practice_nlu.csv"
OF DEL modified by codepage=1208 coldel0x09 method P (2) WARNINGCOUNT 1000
MESSAGES ON SERVER INSERT into DASH12811.NLU_TEMP_2(nlu)');
The above command inserts the 2nd column from the Object Storage file into the nlu column of DASH12811.NLU_TEMP_2.
I want to insert a request_id from a variable as an additional column, so the target becomes DASH12811.NLU_TEMP_2(request_id, nlu).
I read in an article that statement concentrator literals can be used to dynamically pass a value. Please let us know if anyone has an idea of how to use it.
Note: I would be using this query in Db2, not Db2 Warehouse. External tables won't work in Db2.
LOAD does not have any ability to include extra values that are not part of the load file. You can try to work around this with columns that are generated by default in Db2, but it is not a good solution.
Really, you need to wait until Db2 on Cloud supports external tables.
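If back-filling the column after the load is acceptable, one possible workaround (only a sketch; it assumes request_id is nullable and the freshly loaded rows are the only ones where it is still NULL) is to run an UPDATE after the LOAD completes:
-- substitute the actual request_id value for the placeholder literal
UPDATE DASH12811.NLU_TEMP_2
SET request_id = '<request-id-value>'
WHERE request_id IS NULL;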

How to see the actual sql statements executed by POSTGRES?

I want to log the actual SQL statements executed against a Postgres instance. I am aware that I can enable logging of the SQL statements. Unfortunately, this doesn't log the actual SQL, but rather a parsed version, with certain parameters stripped out and listed separately.
Is there a tool for reliably reconstituting this output into executable SQL statements?
Or is there a way of intercepting the SQL that is sent to the Postgres instance, so that it can be logged?
We want to be able to replay these SQL statements against another database.
Thanks for your help!
Actually, PostgreSQL does log exactly the SQL that got executed. It doesn't strip parameters out; rather, it doesn't interpolate them in. It logs what the application sent, with bind parameters separate. If your app sends insert into x(a,b) values ($1, $2) with bind params 42 and 18, that's what gets logged.
There's no logging option to interpolate bind parameters into the query string.
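With statement logging enabled, the log entries for that example look roughly like this (the exact prefix depends on your log_line_prefix setting):
LOG:  execute <unnamed>: insert into x(a,b) values ($1, $2)
DETAIL:  parameters: $1 = '42', $2 = '18'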
Your last line is the key part. You don't want logging at all. You're trying to do statement based replication via the logs. This won't work well, if at all, due to volatile functions, the search_path, per-user settings, sequence allocation order/gap issues, and more. If you want replication don't try to do it by log parsing.
If you want to attempt statement-based replication look into PgPool-II. It has a limited ability to do so, with caveats aplenty.
Set log_statement to 'all' in postgresql.conf. See the documentation on runtime-config-logging.
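A minimal sketch of enabling it from a superuser session instead of editing the file by hand:
-- writes the setting to postgresql.auto.conf and reloads the configuration
ALTER SYSTEM SET log_statement = 'all';
SELECT pg_reload_conf();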

File loading issues in DB2 using Load utility

I have a .csv file, comma-delimited (located at C:/). I am using the DB2 LOAD utility to load the data present in the CSV file into a DB2 table.
LOAD CLIENT FROM C:\Users\somepath\FileName.csv of del
MODIFIED BY NOCHARDEL COLDEL, insert into SchemaName.TABLE_NAME;
The CSV file has 25 rows. After the utility completed, I got an error message for NOCHARDEL. My table has all 25 rows properly loaded. Now, when I try to execute an insert/update/delete statement on any of the tables present in that schema, I get the following error.
Lookup Error - DB2 Database Error: ERROR [55039] [IBM][DB2/AIX64] SQL0290N Table space access is not allowed.
Could you please help me figure out whether I am making a mistake or missing a parameter that is causing a lock on the table?
Earlier, while loading the file, a similar situation occurred, and the DBA confirmed that the table space in question was in the “load in progress” state.
Changes generated by the DB2 LOAD utility are not logged (one of the side-effects of its high performance). If the database crashes immediately after the load it will be impossible to recover the table that was loaded by replaying log records, because there are no such records. For this reason the tablespace containing the loaded table is automatically placed in the BACKUP PENDING mode, forcing you to take a backup of that tablespace or the entire database to ensure it is fully recoverable.
There are options that you can specify for the LOAD command to help you avoid this situation in the future:
NONRECOVERABLE -- this option does not place the tablespace into the BACKUP PENDING mode (see the sketch below), but, as its name implies, the table you're loading into becomes non-recoverable in case of a crash, and your only option in that situation will be to drop and re-create the table.
COPY YES -- this option saves a copy of the data loaded by the utility, so the table remains recoverable by roll-forward and the tablespace is not placed into BACKUP PENDING mode.
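A sketch of the first option, based on the command from the question (paths and names are the asker's):
-- NONRECOVERABLE keeps the tablespace out of BACKUP PENDING after the load
LOAD CLIENT FROM C:\Users\somepath\FileName.csv OF DEL
MODIFIED BY NOCHARDEL COLDEL,
INSERT INTO SchemaName.TABLE_NAME
NONRECOVERABLE;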
If you are only loading 25 records, I suggest you use the IMPORT utility instead -- it does not have these restrictions because it is fully logged (at the price of lower performance, which for 25 records won't matter).
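A hedged sketch of the IMPORT equivalent for the same file and table:
-- IMPORT is fully logged, so the tablespace is never placed into BACKUP PENDING
IMPORT FROM C:\Users\somepath\FileName.csv OF DEL
MODIFIED BY COLDEL,
INSERT INTO SchemaName.TABLE_NAME;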
Thanks @mustaccio. I had 60 million rows to insert; I was using 25 as a sample to check the outcome.
To add another point, we later came to know that this is a known DB2 bug that keeps the load in the "in progress" state (DB2 is unable to acknowledge that the load has completed and the session remains open indefinitely) and places the table space in the backup pending state.
Recovery is the only option to release the table space once it is in the pending state.
This issue is fixed in fix pack 10 as per the DB2 team (we are yet to deploy and test). Meanwhile, the NONRECOVERABLE keyword is working fine for us.
The reason why your table is stuck in the LOAD IN PROGRESS state is the NOCHARDEL error happening at the end of the LOAD.
Have you tried restarting the database? This should reinitialize all table spaces and remove any rogue states.
http://www-01.ibm.com/support/docview.wss?uid=swg1IC65395
http://www-01.ibm.com/support/docview.wss?uid=swg21427102

All Data Or None

I have a simple job that reads a .csv file, converts the data from this file through tMap, and writes the data into a DB.
If an error is found in the .csv file, the line containing the error will be skipped and all other data will be written into the DB.
If "die on error" is checked, writing into the DB will abort when the line with the error is reached.
What should I do if I want either ALL of the data written into the DB when there is no error, or NONE of the data written when there is at least one error?
Thanks in advance!
As @Ryan mentioned, the usual standard is to use a transaction. If this isn't possible for some reason (I thought I'd heard/seen something about a per-transaction row-lock limit), consider dumping the results into a temporary copy of your actual table. If no errors occur, add it to the production table. If errors occur, pop an error message and drop the (temporary) table.
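A minimal sketch of that staging approach, assuming a PostgreSQL-style dialect and a hypothetical target table named orders:
BEGIN;
-- stage the rows in a temporary clone of the real table
CREATE TEMP TABLE orders_staging (LIKE orders INCLUDING DEFAULTS);
-- point the job's DB output component at orders_staging and load every row here
INSERT INTO orders SELECT * FROM orders_staging;  -- promote only when the whole load succeeded
COMMIT;  -- or ROLLBACK if the job reported any error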
You should use a transaction. That way you can roll it back if there is an error.
Exactly how you go about implementing a transaction depends on the database you're using. Which is it?

How to resume on error during csv import in Postgresql

I'm using pgAdmin III to run the queries. How do I continue the import process and output the errors to a file with a COPY command?
copy my_db FROM E'D:\\my_textfile.txt' WITH CSV HEADER DELIMITER ';';
You can't, as Sam stated, but you can use an external tool, pgloader, which has this capability.
You can't. The COPY command is a single transaction so either all the data will get imported or none of it will. If you want to import data and not exit on errors, then you will need to use individual INSERT statements. That's the tradeoff with COPY. It's more efficient because it is a single transaction, but it requires that your data be error-free to succeed.