How to resume on error during CSV import in PostgreSQL

I'm using pgAdmin III to run the queries. How do I continue the import process and output the errors to a file with a COPY command?
copy my_db FROM E'D:\\my_textfile.txt' WITH CSV HEADER DELIMITER ';';

You can't, as Sam stated, but you can use an external tool, pgloader, which has this capability.

You can't. The COPY command runs as a single transaction, so either all the data will get imported or none of it will. If you want to import data and not abort on errors, you will need to use individual INSERT statements, each in its own transaction (or savepoint), so that a failing row can be rolled back on its own. That's the tradeoff with COPY: it's more efficient, partly because it is a single transaction, but it requires that your data be error-free to succeed.
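A minimal sketch of the INSERT-per-row approach in Python, assuming a hypothetical table my_table(id, name) and a semicolon-delimited file with a header row, as in the question. It uses the standard-library sqlite3 module only as a stand-in so the example runs anywhere; with PostgreSQL you would use a driver such as psycopg2 (and %s placeholders instead of ?). Each row gets its own transaction, and rows that fail are written, with their line number and error message, to a reject file:

```python
import csv
import sqlite3


def import_rows_skipping_errors(conn, csv_path, reject_path, delimiter=";"):
    """Insert CSV rows one at a time into the hypothetical table my_table(id, name).

    Failing rows are written to reject_path as: line number, error, original fields.
    Returns a tuple (inserted, rejected).
    """
    inserted = rejected = 0
    with open(csv_path, newline="") as src, open(reject_path, "w", newline="") as rej:
        reader = csv.reader(src, delimiter=delimiter)
        rejects = csv.writer(rej, delimiter=delimiter)
        next(reader, None)  # skip the header row (like CSV HEADER in COPY)
        for row in reader:
            lineno = reader.line_num  # physical line where this record ended
            try:
                # One transaction per row: commit on success, roll back on error.
                with conn:
                    conn.execute("INSERT INTO my_table (id, name) VALUES (?, ?)", row)
                inserted += 1
            except sqlite3.Error as exc:
                rejects.writerow([lineno, str(exc), *row])
                rejected += 1
    return inserted, rejected
```

Expect this to be much slower than COPY; it trades speed for the ability to keep going past bad rows.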

Related

How to update the Postgresql using CSV file multiple times

I have a CSV file whose data needs to be imported into a Postgres database. I did it using the import function in pgAdmin III, but my CSV file changes frequently. How can I import the data from the CSV file, overwriting the data that already exists in the database?
You can avoid WAL logging through an optimization that applies when TRUNCATE and COPY run in the same transaction (with wal_level = minimal). The basic idea is to empty the table with TRUNCATE and reimport the data with COPY. This doesn't need to be done manually with pgAdmin each time; it can be scripted with something like:
BEGIN;
-- The CSV file is 'mydata.csv' and the table is 'mydata'.
TRUNCATE mydata;
COPY mydata FROM 'mydata.csv' WITH (FORMAT csv);
COMMIT;
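Note that COPY from a file on the server requires superuser access (or, since PostgreSQL 11, membership in the pg_read_server_files role), and the file path is resolved by the server process, so an absolute server-side path is safest. The COPY command also takes various options, so you can adjust settings such as the NULL string and header handling.
Finally, it should be noted that you want both statements in the same transaction, as in the script above: if the COPY fails, the TRUNCATE is rolled back and the old data is kept. I'm not going to over-complicate this example further, since that level of care isn't needed in many of the real-world cases where one is copying in a CSV file. If you think your situation needs more, it's not too hard to track down.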
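The same TRUNCATE-then-COPY reload can also be scripted from a client program. Below is a sketch in Python that assumes a psycopg2 connection; the function name reload_table is hypothetical. It uses COPY ... FROM STDIN through psycopg2's cursor.copy_expert, so the file is read on the client and no superuser rights or server-side path are needed, and it keeps both statements in one transaction:

```python
import re


def reload_table(conn, table, csv_path):
    """Empty `table` and re-fill it from csv_path inside a single transaction.

    conn is assumed to be a psycopg2 connection. COPY ... FROM STDIN streams the
    file from the client, so no superuser rights or server-side path are needed.
    """
    # table is interpolated into SQL, so only accept plain (optionally schema-qualified) names.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?", table):
        raise ValueError(f"unexpected table name: {table!r}")
    copy_sql = f"COPY {table} FROM STDIN WITH (FORMAT csv)"  # add HEADER true if the file has one
    with conn:  # psycopg2: commit if the block succeeds, roll back the TRUNCATE too if COPY fails
        with conn.cursor() as cur:
            cur.execute(f"TRUNCATE {table}")
            with open(csv_path, newline="") as f:
                cur.copy_expert(copy_sql, f)
```

Usage would be something like reload_table(psycopg2.connect(...), "mydata", "/path/to/mydata.csv"), run from cron or any scheduler instead of pgAdmin.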

How to improve import speed on SQL Workbench/J

I tried the following, but the import is terribly slow, about 3 rows/sec:
WbImport -file=c:/temp/_Cco_.txt
-table=myschema.table1
-filecolumns=warehouse_id,bin_id,cluster_name
---deleteTarget
-batchSize=10000
-commitBatch
WbImport can use the COPY API of the Postgres JDBC driver.
To use it, add the -usePgCopy parameter:
WbImport -file=c:/temp/_Cco_.txt
-usePgCopy
-table=myschema.table1
-filecolumns=warehouse_id,bin_id,cluster_name
The options -batchSize and -commitBatch are ignored in that case, so you should remove them.
SQL Workbench/J will then essentially use the equivalent of a COPY ... FROM STDIN. That should be massively faster than regular INSERT statements.
This requires that the input file is formatted according to the requirements of the COPY command.
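When data is streamed with COPY ... FROM STDIN, every line has to follow COPY's input rules. As an illustration, assuming COPY's default text format (tab-delimited, \N for NULL), a row could be encoded like this in Python; the helper name is hypothetical, and which format -usePgCopy actually expects for your file is something to check in the SQL Workbench/J manual:

```python
def to_copy_text_line(values):
    """Encode one row in PostgreSQL COPY's default text format.

    Columns are tab-separated, None becomes the default NULL marker \\N, and
    backslash, tab, newline and carriage return inside values are backslash-escaped.
    """
    fields = []
    for value in values:
        if value is None:
            fields.append("\\N")
            continue
        text = str(value)
        # Escape backslashes first so the escapes added below are not doubled.
        text = text.replace("\\", "\\\\")
        text = text.replace("\t", "\\t").replace("\n", "\\n").replace("\r", "\\r")
        fields.append(text)
    return "\t".join(fields) + "\n"
```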
By default, WbImport uses INSERT statements to load data. This is the worst way to load data into Redshift.
You should be using the COPY command for this as noted in the Redshift documentation:
"We strongly recommend using the COPY command to load large amounts of data. Using individual INSERT statements to populate a table might be prohibitively slow."

Export CSV from Mainframe DB2 in batch mode

How can I export the result of a SELECT query from Mainframe DB2 to a CSV file in batch mode?
I have tried FILE MANAGER in online mode and it works, but I need to use batch mode for better performance.
I can also use ISQL, but I don't know which parameters to use to create a CSV file.
Thanks
If all else fails and you don't mind a little programming then coding your own program that runs the query and writes CSV is EXTREMELY easy.
I mention this because this might be better for you than relying on some tool.
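To give an idea of how small such a program is, here is a sketch in Python written against the standard DB-API, so in principle any DB2 driver that implements it (for example ibm_db_dbi; that choice is an assumption and is not tested here on z/OS) could supply the connection. The function name and the chunk size are illustrative:

```python
import csv


def export_query_to_csv(conn, query, csv_path, delimiter=","):
    """Run `query` on a DB-API connection and write the result, with a header row, to csv_path."""
    cur = conn.cursor()
    try:
        cur.execute(query)
        header = [col[0] for col in cur.description]  # column names reported by the driver
        with open(csv_path, "w", newline="") as f:
            writer = csv.writer(f, delimiter=delimiter)  # quotes fields containing the delimiter
            writer.writerow(header)
            while True:
                rows = cur.fetchmany(1000)  # stream in chunks instead of fetchall()
                if not rows:
                    break
                writer.writerows(rows)
    finally:
        cur.close()
```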
As you're looking for improved performance, I'd suggest you CALL the DSNUTILU stored procedure with the UNLOAD utility, using DELIMITED COLDEL ',' for CSV and SHRLEVEL CHANGE ISOLATION UR to maximise concurrency on your DB2 for z/OS table. There are many other options depending on your requirements.
For reference, see the documentation for the DSNUTILU stored procedure and the syntax and options of the UNLOAD control statement.
On IBM i (iSeries) you have the CPYTOIMPF command; it may exist on z/OS too.

Command Line Interface (CLI) for SQLDeveloper

I rely on SQLDeveloper to edit and export a schema.
It works like a charm, and I can run import with sqlplus.
I have tried using sqlplus to generate the same schema export, with no result.
I cannot use the Oracle expdp tool, because I need an ASCII file to be able to diff it.
So the only option I have is SQLDeveloper.
I would like to automate the export (data + DDL) with a cron job on a Linux box, but I can't find a way to use SQLDeveloper from a command line to generate the export.
Any clue?
Short answer: no.
For just the schema side of things, you may want to check out "show create table equivalent in oracle sql", which will get you the SQL source of the DDL.
Are you sure you want an ASCII file for the automated export of an entire DB though? I would be surprised if you really want to diff an entire export of a DB. This SO Answer may help a little though.
If you really want a full data dump plus DDL, you will have to write your own script that gets the DDL as described in the first link, then runs SELECT * on each table and turns each result row into an SQL INSERT statement.
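A minimal sketch of the "turn each row into an INSERT" step, in Python. It only formats rows you have already fetched (for example through a DB-API driver such as python-oracledb; that is an assumption), and sql_literal is a hypothetical, simplified helper that handles NULL, numbers, strings and dates; anything else raises TypeError. A real export also needs care with LOBs, time zones and quoted identifiers:

```python
import datetime
from decimal import Decimal


def sql_literal(value):
    """Render a fetched value as a SQL literal (hypothetical, simplified helper)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        raise TypeError("no portable SQL literal for booleans")
    if isinstance(value, (int, float, Decimal)):
        return str(value)
    if isinstance(value, datetime.datetime):  # check before date: datetime is a date subclass
        # Fractional seconds are dropped here.
        return "TIMESTAMP '" + value.strftime("%Y-%m-%d %H:%M:%S") + "'"
    if isinstance(value, datetime.date):
        return "DATE '" + value.isoformat() + "'"
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"  # double embedded single quotes
    raise TypeError(f"unsupported type: {type(value).__name__}")


def rows_to_inserts(table, columns, rows):
    """Yield one INSERT statement per fetched row."""
    column_list = ", ".join(columns)
    for row in rows:
        values = ", ".join(sql_literal(v) for v in row)
        yield f"INSERT INTO {table} ({column_list}) VALUES ({values});"
```

Writing the generated statements to a text file gives you something you can diff between runs, alongside the DDL.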

export/import all the information of a table

For a mandatory assignment in a DB2 class, I'm asked to write a procedure to "export information about all xxx, delete all xxx and import the information again," where xxx is my table.
This procedure has to be as efficient as possible.
I'm quite stuck here; naively, I see two options:
1) write a select * from xxx; drop ...; insert; using python or something
2) use some export/import utility of db2
But I could be totally wrong. Suggestions?
What I've noticed is that there are no integrity constraints.
You can do that via "export/load/set integrity". I think it is the best way if you execute it on the server.
If you use Python to move the data, you will need an ODBC driver or similar to fetch the data, process it, and so on.
If you use Python just to execute the commands, that is fine; in the end, it is just a call to the database.
If you execute the process on another machine, network traffic increases and performance drops.
IMPORT is essentially one INSERT per row in the file, which uses a lot of transaction log. LOAD, instead, writes the data directly into the tablespace, and integrity is checked afterwards with SET INTEGRITY, which makes it a faster process.
Finally, if you want to extract the information very fast, you can buy the IBM InfoSphere® Optim™ High Performance Unload for DB2 for Linux, UNIX and Windows
I have had a similar task before.
The solution is simple and sweet:
Do a simple export to CSV; once the data has been exported, the main thing is to empty (truncate) the table with logging disabled and then load the data back into the table.
EXPORT TO <FileName>.CSV OF DEL SELECT * FROM <TableName>;
ALTER TABLE <TableName> ACTIVATE NOT LOGGED INITIALLY WITH EMPTY TABLE;
LOAD FROM "./<FileName>.CSV" OF DEL INSERT INTO <TableName>;
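Since using Python just to issue the commands is fine, here is a sketch that generates the three statements above for a given table so they can be handed to the DB2 command line processor. The function name, the CONNECT TO / CONNECT RESET lines and the name checks are assumptions added to make the script self-contained:

```python
import re


def build_reload_script(database, table, del_file):
    """Build a DB2 CLP script: export to DEL, empty the table without logging, LOAD it back.

    The statements mirror the EXPORT / ALTER TABLE / LOAD sequence above; the
    CONNECT lines are added so the script can be run on its own with: db2 -tvf <script>
    """
    name = r"[A-Za-z_][A-Za-z0-9_]*"
    if not re.fullmatch(name, database) or not re.fullmatch(rf"{name}(\.{name})?", table):
        raise ValueError("unexpected database or table name")
    if not re.fullmatch(r"[\w./-]+", del_file):
        raise ValueError("unexpected file name")
    statements = [
        f"CONNECT TO {database};",
        f"EXPORT TO {del_file} OF DEL SELECT * FROM {table};",
        f"ALTER TABLE {table} ACTIVATE NOT LOGGED INITIALLY WITH EMPTY TABLE;",
        f"LOAD FROM {del_file} OF DEL INSERT INTO {table};",
        "CONNECT RESET;",
    ]
    return "\n".join(statements) + "\n"
```

Write the returned text to a file and run it with db2 -tvf on the database server, as suggested above; LOAD reads the file on the server unless you use the CLIENT keyword.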