PostgreSQL escape character during copy

I'm trying to import a CSV file into PostgreSQL but I am having an issue with special characters.
I'm using the following command
./psql -d data -U postgres -c "copy users from 'users.csv' delimiter E'\t' quote '~' csv"
It works fine until it encounters a field containing '~', the character I'm using as the quote value so that the existing double quotes and inverted commas in the data don't break the import.
How do I escape this character in the CSV file so that a value like 'Person~name' imports as 'Person~name'?

The CSV rules are listed in RFC 4180: https://www.ietf.org/rfc/rfc4180.txt
To embed the quote character inside a string:
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
In your case, replace the double quote with a tilde, since you've chosen that as your quote character.
Example:
test=> create table copytest(t text);
CREATE TABLE
test=> \copy copytest from stdin delimiter E'\t' quote '~' csv
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> ~foo~~bar~
>> \.
test=> select * from copytest;
t
---------
foo~bar
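Applied to the question: in users.csv, write the field with the tilde doubled, i.e. ~Person~~name~. A minimal sketch continuing the same session:
test=> \copy copytest from stdin delimiter E'\t' quote '~' csv
>> ~Person~~name~
>> \.
test=> select * from copytest;
t
-------------
foo~bar
Person~name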

Related

Exporting new line characters to text in Postgres

There is a text field in a Postgres database containing new lines. I would like to export the content of that field to a text file, preserving those new lines. However, the COPY TO command explicitly transforms those characters into the \n string. For example:
$ psql -d postgres -c "COPY (SELECT CHR(10)) TO '/tmp/out.txt';"
COPY 1
$ cat /tmp/out.txt
\n
This behaviour seems to match the short description in the documentation:
Presently, COPY TO will never emit an octal or hex-digits backslash sequence, but it does use the other sequences listed above for those control characters.
Is there any workaround to get the new line in the output? E.g. so that a command like:
$ psql -d postgres -c "COPY (SELECT 'A line' || CHR(10) || 'Another line') TO '/tmp/out.txt';"
results in something like:
A line
Another line
Update: I do not wish to obtain a CSV file. The output must not have headers, column separators or column decorators such as quotes (exactly as exemplified in the output above). The answers provided in a different question with COPY AS CSV do not fulfil this requirement.
Per my comment:
psql -d postgres -U postgres -c "COPY (SELECT CHR(10)) TO '/tmp/out.txt' WITH CSV;"
Null display is "NULL".
COPY 1
cat /tmp/out.txt
"
"
psql -d postgres -U postgres -c "COPY (SELECT 'A line' || CHR(10) || 'Another line') TO '/tmp/out.txt' WITH CSV;"
Null display is "NULL".
COPY 1
cat /tmp/out.txt
"A line
Another line"
Using the CSV format will maintain the embedded line breaks in the output. This is explained in the COPY documentation under CSV Format:
The values in each record are separated by the DELIMITER character. If the value contains the delimiter character, the QUOTE character, the NULL string, a carriage return, or line feed character, then the whole value is prefixed and suffixed by the QUOTE character, and any occurrence within the value of a QUOTE character or the ESCAPE character is preceded by the escape character. You can also use FORCE_QUOTE to force quotes when outputting non-NULL values in specific columns.
...
Note
CSV format will both recognize and produce CSV files with quoted values containing embedded carriage returns and line feeds. Thus the files are not strictly one line per table row like text-format files.
UPDATE
An alternate method that does not involve quoting, using psql:
create table line_wrap(id integer, fld_1 varchar);
insert into line_wrap values (1, 'line1
line2');
insert into line_wrap values (2, 'line3
line4');
select fld_1 from line_wrap
\g (format=unaligned tuples_only=on) out.txt
cat out.txt
line1
line2
line3
line4
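The same unaligned, tuples-only output can also be produced non-interactively with psql's command-line flags (a sketch using the line_wrap table above):
psql -d postgres -A -t -c "select fld_1 from line_wrap" -o out.txt
Here -A selects unaligned output, -t suppresses headers and the row-count footer, and -o writes the result to the file.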

PostgreSQL COPY csv including Quotes

This is a very simple problem. I am using the COPY command in the psql terminal, as shown below:
COPY tbname FROM '/tmp/file.csv'
delimiter '|' csv;
However, this file.csv contains data such as
random|stuff|32"
as well as
random|other "stuff"|15
I tried doubling the quotes to escape them, as the Postgres site suggests:
random|stuff|32""
random|other ""stuff""|15
This seems to remove the quotes completely, which I don't want.
Is there a way to get the import to just treat these quotes as regular characters so that they appear in the database as they do in the csv file?
According to the documentation, the default quote symbol is ", so you need to provide a QUOTE argument with a different symbol. The quote symbol has to be a single one-byte character.
COPY tbname FROM '/tmp/file.csv'
delimiter '|' QUOTE '}' csv; -- use a symbol you know does not appear in your file.
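A minimal sketch of the result, assuming a three-column text table for the sample rows above (table and column names are placeholders):
CREATE TABLE tbname (col1 text, col2 text, col3 text);
COPY tbname FROM '/tmp/file.csv' delimiter '|' QUOTE '}' csv;
SELECT * FROM tbname;
  col1  |     col2      | col3
--------+---------------+------
 random | stuff         | 32"
 random | other "stuff" | 15
(2 rows)
Because } never appears in the data, the double quotes are loaded as regular characters.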

How to safely unload/copy a table in RedShift?

In Redshift, it is convenient to use unload/copy to move data to S3 and load it back into Redshift, but I find it hard to choose the delimiter each time. The right delimiter depends on the content of the table! I had to change the delimiter each time I ran into load errors.
For example, when I use the following command to unload/copy a table:
unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' addquotes allowoverwrite;
copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' removequotes;
I will get a load error if the table happens to have a field whose content is "||". Then I have to change the delimiter '|' to another one like ',' and try again; if I'm unlucky, it takes multiple tries to succeed.
I'm wondering if there's a way to unload/copy a Redshift table that is independent of the table's content and will always succeed, no matter what weird strings are stored in the table.
Finally I figured out the right approach: add escape to both the unload and copy commands:
unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' addquotes escape allowoverwrite;
copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' removequotes escape;
With escape in the unload command, for CHAR and VARCHAR columns in delimited unload files, an escape character (\) is placed before every occurrence of the following characters:
Linefeed: \n
Carriage return: \r
The delimiter character specified for the unloaded data.
The escape character: \
A quote character: " or ' (if both ESCAPE and ADDQUOTES are specified in the UNLOAD command).
And with escape in the copy command, the backslash character (\) in input data is treated as an escape character. The character that immediately follows the backslash is loaded into the table as part of the current column value, even if it is a character that normally serves a special purpose. For example, you can use this option to escape the delimiter character, a quote, an embedded newline, or the escape character itself when any of these characters is a legitimate part of a column value.
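For illustration, a value containing both the delimiter and a quote, such as a|b"c, would be written by unload with addquotes and escape as (an assumed example, following the rules quoted above):
"a\|b\"c"
Copy with removequotes and escape then restores the original a|b"c.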
Try unloading like below:
unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter as ',' addquotes escape
To load it back, use the following:
copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter ',' removequotes escape;
This will work even if your data has , characters in it.
Since this topic comes up in many places, we decided to package up the UNLOAD/extract process into a Docker service. All the code is on GitHub so you can use it as-is or grab the underlying Python code to create your own version: https://github.com/openbridge/ob_redshift_unload
You can set the delimiter, dates, and ad hoc SQL via run-time configuration. It will also export a header row, something that is a little more complicated to do with UNLOAD alone.
Here are a few of the runtime options:
-t: The table you wish to UNLOAD
-f: The S3 key at which the file will be placed
-s (Optional): The file you wish to read a custom valid SQL WHERE clause from. This will be sanitized then inserted into the UNLOAD command.
-r (Optional): The range column you wish to use to constrain the results. Any type supported by Redshift's BETWEEN function is accepted here (date, integer, etc.)
-r1 (Optional): The desired start range to constrain the result set
-r2 (Optional): The desired end range to constrain the result set
Note: -s and -d are mutually exclusive and cannot be used together. If neither is used, the script will default to not specifying a WHERE clause and output the entire table.
Then you can run it like this to UNLOAD:
docker run -it -v /local/path/to/my/config.json:/config.json openbridge/ob_redshift_unload python /unload.py -t mytable -f s3://dest-bucket/foo/bar/output_file.csv -r datecol -r1 2017-01-01 -r2 2017-06-01
The goal was to enhance the default UNLOAD process and wrap it into something that can help ensure consistency in generating outputs.
Here is a write-up that details the features/capabilities:
https://blog.openbridge.com/how-to-easily-extract-data-from-amazon-redshift-4e55435f7003

Postgres COPY command with literal delimiter

I was trying to import a CSV file into a PostgreSQL table using the COPY command. The delimiter of the CSV file is comma (,). However, there's also a text field with a comma in the value. For example:
COPY schema.table from '/folder/foo.csv' delimiter ',' CSV header
Here's the content of the foo.csv file:
Name,Description,Age
John,Male\,Tall,30
How to distinguish between the literal comma and the delimiter?
Thanks for your help.
To have the \ recognized as an escape character, it is necessary to use the text format:
COPY schema.table from '/folder/foo.csv' delimiter ',' TEXT
But then it is also necessary to delete the first line, as the HEADER option is only valid for the CSV format.
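If the file is on the database server and you can run server-side programs, one way to skip the header line without editing the file is COPY ... FROM PROGRAM (a sketch; it requires superuser or the pg_execute_server_program role):
COPY schema.table FROM PROGRAM 'tail -n +2 /folder/foo.csv'
    WITH (FORMAT text, DELIMITER ',');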

Error at import data to postgresql using copy with delimiter

I am importing data into PostgreSQL with this command:
COPY codigos_postales
(CPRO, CMUN, Nombre_Municipio, CP, Municipio_CP, Lugar_CP)
FROM 'path' WITH DELIMITER E'/t';
But I got this error:
ERROR: COPY delimiter must be a single-byte character
If you're trying to specify a tab as the delimiter, you want E'\t' (the escape character is a backslash, not a forward slash) or just a literal tab character.
You can see that with:
regress=> SELECT E'\t' AS backslash, E'/t' AS forwardslash;
backslash | forwardslash
-----------+--------------
| /t
(1 row)
If the delimiter is actually the string /t then you won't be able to use COPY, as it only supports single character delimiters.
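Applied to the original statement, the corrected command is:
COPY codigos_postales
(CPRO, CMUN, Nombre_Municipio, CP, Municipio_CP, Lugar_CP)
FROM 'path' WITH DELIMITER E'\t';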
Your delimiter looks a little complex, but it is not a single-byte character. Try E'\t' instead.