How to ignore errors with psql \copy meta-command - postgresql

I am using psql with a PostgreSQL database and the following copy command:
\COPY isa (np1, np2, sentence) FROM 'c:\Downloads\isa.txt' WITH DELIMITER '|'
I get:
ERROR: extra data after last expected column
How can I skip the lines with errors?

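Up to and including Postgres 14, you cannot skip the errors without skipping the whole command; there is no more sophisticated error handling. (Postgres 17 added a COPY option, ON_ERROR ignore, but it only skips rows whose values fail data type conversion, not malformed lines like the "extra data after last expected column" error.)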
\copy is just a wrapper around SQL COPY that channels results through psql. The manual for COPY:
COPY stops operation at the first error. This should not lead to problems in the event of a COPY TO, but the target table will
already have received earlier rows in a COPY FROM. These rows will
not be visible or accessible, but they still occupy disk space. This
might amount to a considerable amount of wasted disk space if the
failure happened well into a large copy operation. You might wish to
invoke VACUUM to recover the wasted space.
And:
COPY FROM will raise an error if any line of the input file contains
more or fewer columns than are expected.
COPY is an extremely fast way to import / export data. Sophisticated checks and error handling would slow it down.
There was an attempt to add error logging to COPY in Postgres 9.0 but it was never committed.
Solution
Fix your input file instead.
If you have one or more additional columns in your input file and the file is otherwise consistent, you might add dummy columns to your table isa and drop those afterwards. Or (cleaner with production tables) import to a temporary staging table and INSERT selected columns (or expressions) to your target table isa from there.
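To fix the input file, it helps to first see which lines are broken. Below is a minimal shell sketch. It assumes the plain '|' delimiter from the question and that '|' never appears escaped or quoted inside a value, so it is not suitable for CSV with quoted fields. The function name split_by_field_count and the output file names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: split a '|'-delimited file into lines with exactly the
# expected number of fields and lines with any other field count.
# Assumes '|' is only ever a delimiter, never part of a value.
# Usage: split_by_field_count INPUT EXPECTED_FIELDS GOOD_FILE REJECTS_FILE
split_by_field_count() {
  local input=$1 expected=$2 good=$3 bad=$4
  : > "$good"
  : > "$bad"
  # Rejected lines are prefixed with their original line number (NR).
  awk -F'|' -v n="$expected" -v good="$good" -v bad="$bad" '
    NF == n { print > good; next }
            { print (NR ": " $0) > bad }
  ' "$input"
}
```

For the table in the question, something like split_by_field_count isa.txt 3 isa_good.txt isa_rejects.txt followed by \COPY isa (np1, np2, sentence) FROM 'isa_good.txt' WITH DELIMITER '|' would load the clean lines and leave the others, each with its original line number, for inspection.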
Related answers with detailed instructions:
How to update selected rows with values from a CSV file in Postgres?
COPY command: copy only specific columns from csv

It is too bad that, after 25 years, Postgres still doesn't have an ignore-errors flag or option for the COPY command. In this era of big data you get a lot of dirty records, and it can be very costly for a project to fix every outlier.
I had to work around it this way:
Create a copy of the original table's structure and call it dummy_original_table.
On the dummy table, create a trigger like this:
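CREATE OR REPLACE FUNCTION on_insert_in_original_table() RETURNS trigger AS $$
BEGIN
  -- skip rows whose key already exists, to avoid 'duplicate key' errors
  PERFORM 1 FROM original_table WHERE primary_key = NEW.primary_key;
  IF FOUND THEN
    RETURN NULL;
  END IF;
  BEGIN
    INSERT INTO original_table(datum, primary_key) VALUES (NEW.datum, NEW.primary_key)
    ON CONFLICT DO NOTHING;
  EXCEPTION
    WHEN OTHERS THEN
      NULL;  -- ignore any other error raised for this row
  END;
  RETURN NULL;  -- never store the row in the dummy table itself
END;
$$ LANGUAGE plpgsql;

-- the trigger name is arbitrary; use EXECUTE PROCEDURE on Postgres 10 and older
CREATE TRIGGER redirect_to_original_table
BEFORE INSERT ON dummy_original_table
FOR EACH ROW EXECUTE FUNCTION on_insert_in_original_table();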
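Run a copy into the dummy table. No record will be stored there, but every row that can be inserted will end up in original_table. Note that this only swallows errors raised by the INSERT inside the trigger (duplicates, constraint violations and the like); malformed input lines, such as the "extra data after last expected column" error from the question, and values that don't fit the dummy table's column types are still rejected by COPY before the trigger fires.
psql dbname -c "\copy dummy_original_table(datum,primary_key) FROM '/home/user/data.csv' DELIMITER E'\t'"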

Workaround: remove the reported errant line using sed and run \copy again
Later versions of Postgres (including Postgres 13) report the line number of the error. You can then remove that line with sed and run \copy again on the filtered file, e.g.:
#!/bin/bash
bad_line_number=5  # assuming line 5 is the bad line
sed "${bad_line_number}d" input.csv > filtered.csv
[per the comment from #Botond_Balázs ]
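The same step can be scripted so the line number does not have to be read off by hand. COPY's error context normally looks like CONTEXT:  COPY isa, line 5: "...", so the number can be parsed out of psql's error output. This is a sketch that assumes psql's default error verbosity (which prints the CONTEXT line) and no multi-line quoted CSV fields, since those would make COPY's line count differ from the physical line count. copy_error_line and drop_line are hypothetical helpers.

```shell
#!/bin/bash
# Hypothetical helper: read a COPY error message on stdin and print the
# line number from a context line such as
#   CONTEXT:  COPY isa, line 5: "..."
# Prints nothing if no such context line is found.
copy_error_line() {
  sed -n 's/^CONTEXT: *COPY [^,]*, line \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: write every line of file $1 except line $2 to file $3.
drop_line() {
  sed "${2}d" "$1" > "$3"
}
```

A possible usage, with mydb as a placeholder database name:
err=$(psql mydb -c "\copy isa (np1, np2, sentence) FROM 'isa.txt' WITH DELIMITER '|'" 2>&1 >/dev/null)
n=$(printf '%s\n' "$err" | copy_error_line)
[ -n "$n" ] && drop_line isa.txt "$n" filtered.txt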

Here's one solution: import the file one line at a time. Performance can be much slower, but it may be sufficient for your scenario:
#!/bin/bash
input_file=./my_input.csv
tmp_file=/tmp/one-line.csv
while IFS= read -r input_line; do
  printf '%s\n' "$input_line" > "$tmp_file"
  # \copy reads the file on the client, so no superuser rights are needed;
  # </dev/null keeps psql from consuming the loop's input
  psql my_database \
    -c "\copy my_table FROM '$tmp_file' DELIMITER '|' CSV" < /dev/null
done < "$input_file"
Additionally, you could modify the script to capture psql's stdout/stderr and exit status and, if the exit status is non-zero, echo $input_line and the captured output to the terminal and/or append them to a file.
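The logging variant described above could look like the following sketch. The per-line loader is passed in as a command so it can be swapped out (for example, for testing). load_one_line, load_line_by_line and the rejects file are hypothetical names, and my_database and my_table are the placeholders from the script above.

```shell
#!/bin/bash
# Hypothetical default loader: load one single-line file with \copy.
# ON_ERROR_STOP=1 makes sure psql exits non-zero when the COPY fails.
load_one_line() {
  psql my_database -v ON_ERROR_STOP=1 \
    -c "\copy my_table FROM '$1' DELIMITER '|' CSV"
}

# Usage: load_line_by_line INPUT REJECTS [LOADER]
# LOADER is called with the path of a one-line temp file and must exit
# non-zero on failure. Each failed line, followed by the loader's combined
# stdout/stderr, is appended to REJECTS.
load_line_by_line() {
  local input=$1 rejects=$2 loader=${3:-load_one_line}
  local tmp_file output line
  tmp_file=$(mktemp) || return 1
  : > "$rejects"
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "$line" > "$tmp_file"
    # </dev/null keeps the loader from reading the rest of the input file.
    if ! output=$("$loader" "$tmp_file" 2>&1 < /dev/null); then
      printf '%s\n%s\n' "$line" "$output" >> "$rejects"
    fi
  done < "$input"
  rm -f "$tmp_file"
}
```

Running load_line_by_line ./my_input.csv ./rejects.txt would leave each failed line, followed by psql's error output, in rejects.txt. Passing the loader as a parameter is only a design choice that keeps the loop independent of psql.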

Related

export table to csv on postgres

How can I export a table to .csv in Postgres, when I'm not superuser and can't use the copy command?
I can still import the data to postgres with "import" button on the right click, but no export option.
Use psql and redirect stream to file:
psql -U <USER> -d <DB_NAME> -c "COPY <YOUR_TABLE> TO stdout DELIMITER ',' CSV HEADER;" > file.csv
COPY your_table TO '/path/to/your/file.csv' DELIMITER ',' CSV HEADER;
For more details, see the COPY page in the PostgreSQL manual.
Besides what marvinorez suggests in his answer, you can do this from psql:
\copy your_table TO '/path/to/your/file.csv' DELIMITER ',' CSV HEADER
On the other hand, from pgAdmin III, you can also open the table by right-clicking on its name and then selecting View Data. Then click on the upper-left corner of the table (the gray empty square where the column name row meets the row number column) to select all rows. Finally, copy with Ctrl+C or Edit -> Copy in the menu. The data will be copied to the clipboard in CSV format, delimited by semicolons (;).
You can then paste it into LibreOffice Calc or MS Excel, for instance, to display it.
If your table is large (what counts as large depends on the amount of RAM in your machine, among other things), it might not fit in the clipboard, so in that case I would not use this method but the first one (\copy).
The easiest way would indeed be a COPY to stdout I think. If you can't do this, how about using pg_dump and then transform the output file with sed, AWK or even a text editor? This should work even with search and replace in an acceptable amount of time :)
I was having trouble with superuser rights and running psql, so I took the simple, stupid way using pgAdmin III.
1) SELECT * FROM ;
Before running it, select Query in the menu bar and then 'Query to File'.
This will save the result to a folder of your choice. You may have to play with the export settings, since it likes quoting and ;.
2) SELECT * FROM ;
Run it normally, then save the output by selecting Export in the File menu. This will save it as a .csv file.
This is not a good approach for large tables. The tables I have done this for have a few hundred thousand rows and 10-30 columns. Larger tables may cause problems.

Command to read a file and execute script with psql

I am using PostgreSQL 9.0.3 on Windows. I have an Excel spreadsheet with lots of data to load into a couple of tables.
I have written a script that gets the data from the input file and inserts it into some 15 tables. This can't be done with COPY or Import. I named the input file DATALD.
I found the psql options -d to point to the database and -f for the SQL script. But I need to know how to feed the input file along with the script so that the data gets inserted into the tables.
For example this is what I have done:
begin
for emp in (select distinct w_name from DATALD where w_name <> 'w_name')
--insert in a loop
INSERT INTO tblemployer( id_employer, employer_name,date_created, created_by)
VALUES (employer_id,emp.w_name,now(),'SYSTEM1');
Can someone please help?
For an SQL script, you must either:
have the data inlined in your script (in the same file), or
use COPY to import the data into Postgres.
I suggest using a temporary staging table, since the format doesn't seem to fit the target tables. Code example:
How to bulk insert only new rows in PostreSQL
There are other options like pg_read_file(). But:
Use of these functions is restricted to superusers.
Intended for special purposes.
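The first option (data inlined in the same file) can be generated from a CSV export of the spreadsheet: when psql runs a script file, COPY ... FROM stdin reads the data rows that follow it in that file, up to a line containing only \. . This is a sketch under several assumptions: the spreadsheet has been saved as CSV with a header row, a staging table with matching columns already exists (datald here is a hypothetical name), no data line consists of just \. , and a POSIX shell (e.g. Git Bash on Windows) is available. make_load_script is a hypothetical helper.

```shell
#!/bin/bash
# Print a file, adding a final newline if it lacks one.
cat_nl() {
  cat "$1"
  if [ -n "$(tail -c 1 "$1")" ]; then
    printf '\n'
  fi
}

# Hypothetical helper: wrap a CSV file in a psql script that loads it into an
# existing staging table with COPY ... FROM stdin, so the data and the SQL
# live in one file that can be run with psql -f.
# Usage: make_load_script STAGING_TABLE CSV_FILE [EXTRA_SQL_FILE] > load.sql
make_load_script() {
  local table=$1 csv=$2 extra_sql=${3:-}
  printf '%s\n' 'BEGIN;'
  printf '%s\n' "COPY $table FROM stdin WITH (FORMAT csv, HEADER true);"
  cat_nl "$csv"
  printf '%s\n' '\.'          # end-of-data marker, on its own line
  # Optional follow-up SQL, e.g. INSERT ... SELECT from the staging table.
  if [ -n "$extra_sql" ]; then
    cat_nl "$extra_sql"
  fi
  printf '%s\n' 'COMMIT;'
}
```

A possible usage is make_load_script datald DATALD.csv inserts.sql > load.sql followed by psql -d mydb -f load.sql, where inserts.sql stands for your own INSERT ... SELECT statements and mydb for your database; both names are placeholders.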

Printing to the screen in a .sql file in PostgreSQL

I have a .sql file I am building for an upgrade to my application that alters tables, inserts/updates, etc.
I want to write to the screen after every command finishes.
So, for instance if I have something like:
insert into X...
I want to see something like,
Starting to insert into table X
Finished inserting into table X
Is this possible in PostgreSQL?
This sounds like it should be a very easy thing to do, however, I cannot find anywhere how to do it.
If you're just feeding a big pile of SQL to psql then you have a couple of options.
You could run psql with --echo-all:
-a
--echo-all
Print all input lines to standard output as they
are read. This is more useful for script processing than interactive
mode. This is equivalent to setting the variable ECHO to all.
That and the other "echo everything of this type" options (see the manual) are probably too noisy though. If you just want to print things manually, use \echo:
\echo text [ ... ]
Prints the arguments to the standard output, separated by one space and followed by a newline. This can be useful to intersperse information in the output of scripts.
So you can say:
\echo 'Starting to insert into table X'
-- big pile of inserts go here...
\echo 'Finished inserting into table X'
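For a long upgrade script, these markers can be generated instead of typed by hand. This is a sketch assuming one complete SQL statement per input line (multi-line statements would need a real SQL splitter) and no backslashes in the statements, since psql interprets backslash escapes inside single-quoted meta-command arguments. wrap_with_echo is a hypothetical name; single quotes are doubled because that is how psql escapes a quote inside '...'.

```shell
#!/bin/bash
# Hypothetical helper: read one complete SQL statement per line on stdin and
# print a psql script that announces each statement before and after it runs.
wrap_with_echo() {
  local stmt quoted
  while IFS= read -r stmt || [ -n "$stmt" ]; do
    if [ -z "$stmt" ]; then
      continue                  # skip blank lines
    fi
    # psql escapes a single quote inside '...' by doubling it.
    quoted=$(printf '%s' "$stmt" | sed "s/'/''/g")
    printf '%s\n' "\\echo 'Starting: $quoted'" "$stmt" "\\echo 'Finished: $quoted'"
  done
}
```

For example, wrap_with_echo < statements.txt > upgrade.sql (both file names are placeholders) produces a script that can be run with psql -f upgrade.sql.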
Via an answer to How can I run an ad-hoc script in PostgreSQL?:
DO language plpgsql $$
BEGIN
RAISE NOTICE 'Hello, World!';
END
$$;
Depending on what you're doing, I'd be worried about doing a bunch of anonymous code blocks. You might consider storing the above as a function, and passing in whatever value you want logged.
There's probably a better way to do it. But if you need to use vanilla SQL, try this:
SELECT NULL AS "Starting to insert into table X";
-- big pile of inserts go here...
SELECT NULL AS "Finished inserting into table X";

PostgreSQL: How to modify the text before /copy it

Lets say I have some customer data like the following saved in a text file:
|Mr |Peter |Bradley |72 Milton Rise |Keynes |MK41 2HQ |
|Mr |Kevin |Carney |43 Glen Way |Lincoln |LI2 7RD | 786 3454
I copied the aforementioned data into my customer table using the following command:
\copy customer(title, fname, lname, addressline, town, zipcode, phone) from 'customer.txt' delimiter '|'
However, as it turns out, there are some extra space characters before and after various parts of the data. What I'd like to do is call trim() before copying the data into the table - what is the best way to achieve this?
Is there a way to call trim() on every value of every row and avoid inserting unclean data in the first place?
I think the best way to go about this is to add a BEFORE INSERT trigger to the table you're inserting into. This way, you can write a trigger function that executes before every row is inserted and trims whitespace (or does any other transformations you may need) on any columns that need it. When you're done, simply remove the trigger (or leave it, which will improve data integrity if you never want that whitespace in those columns). Explaining how to create a trigger and a trigger function in PostgreSQL is probably outside the scope of this question, but I will link to the documentation for each.
I think this is the best way because it is simpler than parsing through a text file or writing shell code to do this. This kind of sanitization is the kind of thing triggers do very well and very simply.
Creating a Trigger
Creating a Trigger Function
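If you would rather clean the data than add a trigger, the same trimming can be done in the stream before \copy sees it. This minimal sketch assumes '|' is only ever a delimiter, never part of a value; trim_fields is a hypothetical name. It strips whitespace around every delimiter and at both ends of each line, leaving spaces inside values such as "72 Milton Rise" untouched.

```shell
#!/bin/bash
# Hypothetical helper: trim whitespace around each '|' delimiter and at the
# start and end of every line read from stdin.
trim_fields() {
  sed -e 's/[[:space:]]*|[[:space:]]*/|/g' \
      -e 's/^[[:space:]]*//' \
      -e 's/[[:space:]]*$//'
}
```

The output can be piped straight into psql, whose \copy reads psql's own standard input when given pstdin, e.g. (mydb is a placeholder):
trim_fields < customer.txt | psql mydb -c "\copy customer(title, fname, lname, addressline, town, zipcode, phone) from pstdin delimiter '|'"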
I have a somewhat similar use case in one of my projects. My input files:
have the number of lines in the file as their last line;
need line numbers added to every line;
need a file_id added to every line.
I use the following piece of shell code:
FACT=$( dosql "TRUNCATE tab_raw RESTART IDENTITY;
COPY tab_raw(file_id,lnum,bnum,bname,a_day,a_month,a_year,a_time,etype,a_value)
FROM stdin WITH (DELIMITER '|', ENCODING 'latin1', NULL '');
$(sed -e '$d' -e '=' "$FILE"|sed -e 'N;s/\n/|/' -e 's/^/'$DSID'|/')
\.
VACUUM ANALYZE tab_raw;
SELECT count(*) FROM tab_raw;
" | sed -e 's/^[ ]*//' -e '/^$/d'
)
dosql is a shell function that runs psql with the proper connection info and executes everything that was given as an argument.
As a result of this operation, the $FACT variable holds the total count of inserted records (for error detection).
Later I do another dosql call:
dosql "SET work_mem TO '800MB';
SELECT tab_prepare($DSID);
VACUUM ANALYZE tab_raw;
SELECT tab_duplicates($DSID);
SELECT tab_dst($DSID);
SELECT tab_gaps($DSID);
SELECT tab($DSID);"
to analyze the data and move it from the auxiliary table into the final tables.

How to run a sequence of SQL queries and save the results?

In other statistical programs, it's possible to create a log file that shows the output issued as a result of a command. Is it possible to do something similar in SQL?
In particular, I'd like to have a single .sql file with many queries and to then output each result to a text file.
I'm using PostgreSQL and Navicat.
plpgsql function and COPY
One way would be to put the SQL script into a plpgsql function, where you can write the individual return values to files with COPY and compile a report from intermediary results just like you need it.
This has additional effects that may or may not be desirable. For example, you can grant or revoke permission to execute the whole function to arbitrary roles. Read about SECURITY DEFINER in the manual. Also, the syntax will be verified when you save the function, although only superficially (there are plans to change that in the future). More details in this answer on dba.SE.
Basic example:
CREATE OR REPLACE FUNCTION func()
RETURNS void AS
$BODY$
BEGIN
COPY (SELECT * FROM tbl WHERE foo) TO '/path/to/my/file/tbl.csv';
COPY (SELECT * FROM tbl2 WHERE NOT bar) TO '/path/to/my/file/tbl2.csv';
END;
$BODY$
LANGUAGE plpgsql;
Of course, you need to have the necessary privileges in the database and in the file system.
Call it from the shell:
psql mydb -c 'SELECT func();'
psql switching between meta commands and SQL
#!/bin/sh
OUTDIR='/var/lib/postgresql/outfiles'
echo "
\\o $OUTDIR/file1.txt \\\\\\\\ SELECT * FROM tbl1;
\\o $OUTDIR/file2.txt \\\\\\\\ SELECT * FROM tbl2;
\\o $OUTDIR/file3.txt \\\\\\\\ SELECT * FROM tbl3;" | psql event -p 5432 -t -A
That's right, 8 backslashes. The double backslash gets interpreted two times, so you have to double it two times.
I quote the manual about the meta-commands \o:
Saves future query results to the file filename or ...
and \\:
command must be either a command string that is completely parsable by
the server (i.e., it contains no psql-specific features), or a single
backslash command. Thus you cannot mix SQL and psql meta-commands with
this option. To achieve that, you could pipe the string into psql,
like this: echo '\x \\ SELECT * FROM foo;' | psql. (\\ is the
separator meta-command.)
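When the commands are piped into psql rather than passed with -c, each meta-command can simply go on its own line, which avoids the \\ separator and all the backslash doubling. A sketch that generates such input, assuming one output file per table named after the table (a change from the file1.txt naming above); build_output_script is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: build psql input that writes each table's rows to its
# own file. One meta-command per line, so no '\\' separator is needed.
# Usage: build_output_script OUTDIR TABLE... | psql event -p 5432 -t -A
build_output_script() {
  local outdir=$1 table
  shift
  for table in "$@"; do
    printf '%s\n' "\\o $outdir/$table.txt" "SELECT * FROM $table;"
  done
  # Switch query output back to stdout at the end.
  printf '%s\n' '\o'
}
```

Because printf '%s\n' prints its arguments verbatim, the only escaping left is the single "\\o" inside double quotes, which the shell turns into \o.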
I don't know about Navicat, but you can do it with psql. Check the various --echo-X command-line options, and the \o command if you just want temporary output to a file.