PostgreSQL \COPY command and quote-less file

Is there any way to disable QUOTE AS? This data distributor is giving out a file that is set up like:
col1,col2
foo,bar
Some columns are a little more complex
col1,col2
Test outside-"bar' Blah blah` Someone else,What
Now the question: Is there a better way than giving QUOTE AS a character that is assumed not to exist?
\COPY maxmind.country FROM worldcitiespop.txt CSV QUOTE AS '$' HEADER
Where '$' is a character assumed not to exist in the data?

You don't need to know that the quote character doesn't exist; CSV formats can escape quotes when necessary. You should just pick the appropriate dialect for what worldcitiespop.txt contains.
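If the file truly never quotes its fields, a stray double quote in the middle of a value (as in the sample above) can still make the default CSV quoting misparse the rest of the file, so a common workaround is to pick a quote character that cannot occur in the data at all, for example a backspace. A minimal sketch, assuming the table and file from the question and that a backspace byte never appears in the data:
\copy maxmind.country FROM 'worldcitiespop.txt' WITH CSV QUOTE E'\b' HEADER
Since E'\b' never occurs in the file, embedded " and ' characters are read as ordinary data.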

Related

Escape \n when exporting CSV using COPY command

I need to export various tables in CSV for AWS Glue Catalog and I just noticed a major showstopper:
The COPY command does not escape newlines inside column values, it only quotes them.
What confuses me even more is that I can switch to TEXT and get the format right - escape the special characters - but I cannot have HEADER in that format!
COPY (%s) TO STDOUT DELIMITER ',' NULL ''
Is there a way to get both HEADER and to escape the new line through COPY command?
I'm hoping it's an oversight on my part, as the code is obviously there.
The text format does not produce CSV; that is why you cannot get a header line with it. It is the “internal” tab-separated, backslash-escaped format of PostgreSQL.
There are no provisions to replace newlines with \n in a CSV file, and indeed that would produce invalid CSV (according to what most people think; there is no standard).
You'll have to post-process the file.
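If post-processing the file is not desirable, one workaround is to do the replacement inside the query itself, so the CSV that COPY writes contains no raw newlines. A minimal sketch (the articles table and its columns are made-up names, and the consumer must translate the backslash-n sequences back):
COPY (
    SELECT id,
           -- strip carriage returns, then turn each newline into the two characters \n
           replace(replace(title, E'\r', ''), E'\n', E'\\n') AS title
    FROM articles
) TO STDOUT WITH (FORMAT csv, HEADER);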

Copy Command to insert CSV file - Escape Special Characters

I am trying to do a bulk insert into a Postgres DB using the COPY command with a CSV file. All the columns in the DB table are of type character varying(1024). The COPY command is failing on certain values which are in double quotes.
For example:
"TODD'S JAMES RENO PHCY,INC."
My copy command looks like below:
\copy file_tmp FROM /srv/data0/transfer/data_2.csv USING DELIMITERS ','
Could you please help in how to escape these special characters and get this working?
Although you have specified a delimiter, you have not specified a format, so it is still using "text". In "text" format, things are escaped with backslashes, not quotes.
Also, 'USING DELIMITERS' is an extremely obsolete syntax.
You probably want something like:
\copy file_tmp FROM /srv/data0/transfer/data_2.csv WITH (FORMAT CSV)
You don't need to specify the delimiter, because it defaults to ',' when using CSV format.
Of course this still might fail on parts of the data you haven't shown us.
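To see the difference between the two formats, this is how the same value would have to appear in the input file (illustrative lines with a made-up second column, not the asker's actual data):
CSV format - the double quotes protect the embedded comma:
"TODD'S JAMES RENO PHCY,INC.",OTHER_VALUE
Text format - the comma delimiter has to be backslash-escaped instead:
TODD'S JAMES RENO PHCY\,INC.,OTHER_VALUE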

PostgreSQL COPY csv including Quotes

This is a very simple problem: I am using the COPY command in the psql terminal as shown below.
COPY tbname FROM '/tmp/file.csv'
delimiter '|' csv;
However this file.csv contains data such as
random|stuff|32"
as well as
random|other "stuff"|15
I tried to use the double quote to escape the quotes as the Postgres site suggested
random|stuff|32""
random|other ""stuff""|15
This seems to remove the quotes completely which I don't want.
Is there a way to get the import to just treat these quotes as regular characters so that they appear in the database as they do in the csv file?
According to the documentation, the default quote symbol is '"', so you need to provide a QUOTE argument with a different symbol. The quote symbol has to be a single one-byte character.
COPY tbname FROM '/tmp/file.csv'
delimiter '|' QUOTE '}' csv; -- use a symbol you know does not appear in your file.

How to safely unload/copy a table in RedShift?

In Redshift, it is convenient to use unload/copy to move data to S3 and load it back into Redshift, but I find it hard to choose the delimiter each time. The right delimiter depends on the content of the table! I had to change the delimiter each time I hit load errors.
For example, when I use the following command to unload/copy a table:
unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' addquotes allowoverwrite;
copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' removequotes;
I will get a load error if the table happens to have a field whose content is "||". Then I have to change the delimiter '|' to another one like ',' and try again; if I'm unlucky, it may take multiple tries to succeed.
I'm wondering if there's a way to unload/copy a Redshift table that is independent of the table's content and will always succeed no matter what weird strings are stored in it.
Finally I figured out the right approach: add ESCAPE to both the UNLOAD and COPY commands:
unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' addquotes escape allowoverwrite;
copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' removequotes escape;
With ESCAPE in the UNLOAD command, for CHAR and VARCHAR columns in delimited unload files, an escape character (\) is placed before every occurrence of the following characters:
Linefeed: \n
Carriage return: \r
The delimiter character specified for the unloaded data.
The escape character: \
A quote character: " or ' (if both ESCAPE and ADDQUOTES are specified in the UNLOAD command).
And with ESCAPE in the COPY command, the backslash character (\) in input data is treated as an escape character. The character that immediately follows the backslash character is loaded into the table as part of the current column value, even if it is a character that normally serves a special purpose. For example, you can use this option to escape the delimiter character, a quote, an embedded newline, or the escape character itself when any of these characters is a legitimate part of a column value.
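For example (an illustrative value, not taken from the question), a column containing foo|bar"baz would be written to the unload file as:
"foo\|bar\"baz"
ADDQUOTES supplies the surrounding quotes, and ESCAPE backslash-escapes the embedded delimiter and quote character, so the matching COPY ... REMOVEQUOTES ESCAPE can read the value back unambiguously.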
Try unload like below
unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter as ',' addquotes escape
To load it back use as below
copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter ',' removequotes escape;
This will work regardless of whether your data has ',' characters in it.
Since this topic comes up in many places, we decided to package up the UNLOAD/extract process into a Docker service. All the code is on GitHub, so you can use it as-is or grab the underlying Python code to create your own version: https://github.com/openbridge/ob_redshift_unload
You can set the delimiter, dates and ad hoc SQL via run-time configuration. It will also export a header row, something that is a little more complicated to do with UNLOAD alone.
Here are a few of the runtime options:
-t: The table you wish to UNLOAD
-f: The S3 key at which the file will be placed
-s (Optional): The file you wish to read a custom valid SQL WHERE clause from. This will be sanitized then inserted into the UNLOAD command.
-r (Optional): The range column you wish to use to constrain the results. Any type supported by Redshift's BETWEEN function is accepted here (date, integer, etc.)
-r1 (Optional): The desired start range to constrain the result set
-r2 (Optional): The desired end range to constrain the result set
Note: -s and -d are mutually exclusive and cannot be used together. If neither is used, the script will default to not specifying a WHERE clause and output the entire table.
Then you can run it like this to UNLOAD:
docker run -it -v /local/path/to/my/config.json:/config.json openbridge/ob_redshift_unload python /unload.py -t mytable -f s3://dest-bucket/foo/bar/output_file.csv -r datecol -r1 2017-01-01 -r2 2017-06-01
The goal was to enhance the default UNLOAD process and wrap it into something that can help ensure consistency in generating outputs.
Here is a write-up that details the features/capabilities:
https://blog.openbridge.com/how-to-easily-extract-data-from-amazon-redshift-4e55435f7003

Ignore quotation marks when importing a CSV file into PostgreSQL?

I'm trying to import a tab-delimited file into my PostgreSQL database. One of the fields in my file is a "title" field, which occasionally contains actual quotation marks. For example, my tsv might look like:
id title
5 Hello/Bleah" Foo
(Yeah, there's just that one quotation mark in the title.)
When I try importing the file into my database:
copy articles from 'articles.tsv' with delimiter E'\t' csv header;
I get this error, referencing that line:
ERROR: unterminated CSV quoted field
How do I fix this? Quotation marks are never used to surround entire fields in the file. I tried copy articles from 'articles.tsv' with delimiter E'\t' escape E'\\' csv header; but I get the same error on the same line.
Assuming the file never actually tries to quote its fields:
The option you want is "with quote", see http://www.postgresql.org/docs/8.2/static/sql-copy.html
Unfortunately, I'm not sure how to turn off quote processing altogether; one kludge would be to specify a quote character that does not appear in your file at all.
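A sketch of that kludge, reusing the asker's command and assuming a backspace character never occurs in articles.tsv:
copy articles from 'articles.tsv' with delimiter E'\t' csv header quote E'\b';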
Tab-separated is the default format for COPY statements. Treating them as CSV is just silly. (Do you take this path just to skip the header?)
copy articles from 'articles.tsv';
does exactly what you want.
I struggled with the same error and a few more. Finally, gathering knowledge from a few SO questions, I came up with the following setup for making COPY TO/FROM work even for quite sophisticated JSON columns:
COPY "your_schema_name.yor_table_name" (your, column_names, here)
FROM STDIN WITH CSV DELIMITER E'\t' QUOTE '\b' ESCAPE '\';
--here rows data
\.
the most important parts:
QUOTE E'\b' - quote with a backspace character (thanks a lot #grautur!)
DELIMITER E'\t' - a tab as the delimiter
ESCAPE E'\\' - and a backslash as the escape character
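For the export half of the round trip, a matching sketch with the same placeholder names and options:
COPY your_schema_name.your_table_name (your, column_names, here)
TO STDOUT WITH CSV DELIMITER E'\t' QUOTE E'\b' ESCAPE E'\\';
The idea is that a dump produced this way can be fed straight back into the COPY ... FROM STDIN command above, since the quote and escape characters are chosen not to clash with the data.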