There is a text field in a Postgres database containing new lines. I would like to export the content of that field to a text file, preserving those new lines. However, the COPY TO command explicitly transforms those characters into the \n string. For example:
$ psql -d postgres -c "COPY (SELECT CHR(10)) TO '/tmp/out.txt';"
COPY 1
$ cat /tmp/out.txt
\n
This behaviour seems to match the short description in the documentation:
Presently, COPY TO will never emit an octal or hex-digits backslash sequence, but it does use the other sequences listed above for those control characters.
Is there any workaround to get the new line in the output? For example, so that a command like:
$ psql -d postgres -c "COPY (SELECT 'A line' || CHR(10) || 'Another line') TO '/tmp/out.txt';"
Results in something like:
A line
Another line
Update: I do not wish to obtain a CSV file. The output must not have headers, column separators or column decorators such as quotes (exactly as exemplified in the output above). The answers provided in a different question with COPY AS CSV do not fulfil this requirement.
Per my comment:
psql -d postgres -U postgres -c "COPY (SELECT CHR(10)) TO '/tmp/out.txt' WITH CSV;"
Null display is "NULL".
COPY 1
cat /tmp/out.txt
"
"
psql -d postgres -U postgres -c "COPY (SELECT 'A line' || CHR(10) || 'Another line') TO '/tmp/out.txt' WITH CSV;"
Null display is "NULL".
COPY 1
cat /tmp/out.txt
"A line
Another line"
Using the CSV format will maintain the embedded line breaks in the output. This is explained in the COPY documentation, under CSV Format:
The values in each record are separated by the DELIMITER character. If the value contains the delimiter character, the QUOTE character, the NULL string, a carriage return, or line feed character, then the whole value is prefixed and suffixed by the QUOTE character, and any occurrence within the value of a QUOTE character or the ESCAPE character is preceded by the escape character. You can also use FORCE_QUOTE to force quotes when outputting non-NULL values in specific columns.
...
Note
CSV format will both recognize and produce CSV files with quoted values containing embedded carriage returns and line feeds. Thus the files are not strictly one line per table row like text-format files.
UPDATE
Alternate method that does not involve quoting, using psql.
create table line_wrap(id integer, fld_1 varchar);
insert into line_wrap values (1, 'line1
line2');
insert into line_wrap values (2, 'line3
line4');
select fld_1 from line_wrap
\g (format=unaligned tuples_only=on) out.txt
cat out.txt
line1
line2
line3
line4
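If a file has already been written in COPY's text format, another option is to post-process it and turn the backslash sequences back into real characters. The following is a minimal sketch, assuming a single-column export and assuming the escapes are limited to the single-letter sequences of the text format; the function name decode_copy_text is made up for illustration, and the sketch does not handle the NULL marker or column delimiters.

```python
import re

# Single-letter backslash sequences used by COPY's text format.
_COPY_ESCAPES = {
    "b": "\b",
    "f": "\f",
    "n": "\n",
    "r": "\r",
    "t": "\t",
    "v": "\v",
    "\\": "\\",
}


def decode_copy_text(field):
    """Turn escapes such as backslash-n back into the characters they stand for."""

    def replace(match):
        char = match.group(1)
        # Any other backslashed character is taken to represent itself.
        return _COPY_ESCAPES.get(char, char)

    return re.sub(r"\\(.)", replace, field, flags=re.DOTALL)


# One line as COPY would write it, printed back as two lines.
print(decode_copy_text(r"A line\nAnother line"))
```

Each line of the exported file would be passed through the function with its trailing newline stripped first.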
Related
I am trying to copy data from a csv file into my database table. But the problem is that a column named title has some values that contain a comma within.
How can I exclude it from being used as a delimiter while parsing?
Sample data:
"Hold On, I'm Coming", The Canettes Blues Band, On Tap & In the Can, 34, 100, 282
I don't want the comma after "Hold on" to be used as a delimiter.
If you are using COPY or psql's \copy, then use the options
format csv quote '"'
This makes the import treat commas inside quoted values as data rather than as delimiters (quoting is what your sample data uses).
For example, in psql:
\copy target_table from the_input_file.txt with (format csv quote '"')
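The quoting rule can also be checked outside the database. The sketch below uses Python's csv module, which is a different parser from the one in COPY but follows the same convention for quoted fields; the helper name parse_line is an assumption for illustration.

```python
import csv
import io

# The sample row from the question.
sample = '"Hold On, I\'m Coming", The Canettes Blues Band, On Tap & In the Can, 34, 100, 282\n'


def parse_line(line):
    """Parse one CSV line, using double quotes as the quote character."""
    return next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))


fields = parse_line(sample)
print(len(fields))  # 6: the comma inside the quotes did not split the title
print(fields[0])    # Hold On, I'm Coming
```

Note that this Python parser keeps the space that follows each comma as part of the next field.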
I'm trying to import a CSV file into PostgreSQL but I am having an issue with special characters.
I'm using the following command
./psql -d data -U postgres -c "copy users from 'users.csv' delimiter E'\t' quote '~' csv"
It works fine until it encounters a field containing '~', which I'm using as the quote character so as not to break the existing quotes, inverted commas, etc.
How do I escape this character in the CSV file so that 'Person~name' will import as 'Person~name'?
CSV rules are listed in https://www.ietf.org/rfc/rfc4180.txt
To embed the quote character inside a string:
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
In your case, replace the double quote with a tilde, since that is the quote character you have chosen.
Example:
test=> create table copytest(t text);
CREATE TABLE
test=> \copy copytest from stdin delimiter E'\t' quote '~' csv
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> ~foo~~bar~
>> \.
test=> select * from copytest;
t
---------
foo~bar
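If the input file is generated by a script, the doubling can be left to a CSV library. Here is a minimal sketch using Python's csv module, assuming the same tab delimiter and tilde quote character as in the command above; write_row and read_row are hypothetical helper names.

```python
import csv
import io


def write_row(values):
    """Serialize one row with a tab delimiter and '~' as the quote character."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter="\t",
        quotechar="~",
        doublequote=True,  # a '~' inside a field is written as '~~'
        lineterminator="\n",
    )
    writer.writerow(values)
    return buffer.getvalue()


def read_row(line):
    """Parse one row written with the same settings."""
    return next(csv.reader(io.StringIO(line), delimiter="\t", quotechar="~"))


line = write_row(["Person~name"])
print(line, end="")    # ~Person~~name~
print(read_row(line))  # ['Person~name']
```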
In Redshift, it is convenient to use unload/copy to move data to S3 and load it back into Redshift, but I find it hard to choose the delimiter each time. The right delimiter depends on the content of the table! I have had to change the delimiter every time I hit load errors.
For example, when I use the following command to unload/copy a table:
unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' addquotes allowoverwrite;
copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' removequotes;
I will get a load error if the table happens to have a field whose content is "||". Then I have to change the delimiter '|' to another one, like ',', and try again; if I'm unlucky, it may take multiple tries to succeed.
I'm wondering if there's a way to unload/copy a Redshift table that is independent of the content of the table, and that will always succeed no matter what weird strings are stored in the table.
Finally I figured out the right approach, which is to add escape to both the unload and the copy command:
unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' addquotes escape allowoverwrite;
copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' removequotes escape;
With escape in the unload command, for CHAR and VARCHAR columns in delimited unload files, an escape character (\) is placed before every occurrence of the following characters:
Linefeed: \n
Carriage return: \r
The delimiter character specified for the unloaded data.
The escape character: \
A quote character: " or ' (if both ESCAPE and ADDQUOTES are specified
in the UNLOAD command).
And with escape in the copy command, the backslash character (\) in input data is treated as an escape character. The character that immediately follows the backslash character is loaded into the table as part of the current column value, even if it is a character that normally serves a special purpose. For example, you can use this option to escape the delimiter character, a quote, an embedded newline, or the escape character itself when any of these characters is a legitimate part of a column value.
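To see why this makes the round trip independent of the data, here is a small Python sketch of the rule described above. It is only an illustration of the escaping scheme, not Redshift's implementation; the function names are made up, and it assumes both escape and addquotes are in effect so that quote characters are escaped too.

```python
# Characters escaped in addition to the delimiter, per the list above.
SPECIAL = {"\n", "\r", "\\", '"', "'"}


def escape_field(value, delimiter="|"):
    """Put a backslash before the delimiter and every special character."""
    out = []
    for ch in value:
        if ch == delimiter or ch in SPECIAL:
            out.append("\\")
        out.append(ch)
    return "".join(out)


def unescape_field(value):
    """Treat a backslash as 'take the next character literally'."""
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            ch = next(chars, "")
        out.append(ch)
    return "".join(out)


original = 'a||b "quoted"\nnext line'
escaped = escape_field(original)
print(escaped)
print(unescape_field(escaped) == original)  # True
```

Because every delimiter inside a value is preceded by a backslash, a loader that honours the escape character never mistakes it for a column boundary, whatever delimiter was chosen.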
Try an unload like the one below:
unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter as ',' addquotes escape
To load it back, use:
copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter ',' removequotes escape;
This will work even if your data has commas in it.
Since this topic comes up in many places, we decided to package up the UNLOAD/extract process into a Docker service. All the code is on GitHub, so you can use it as-is or grab the underlying Python code to create your own version: https://github.com/openbridge/ob_redshift_unload
You can set the delimiter, dates and ad hoc SQL via run-time configuration. It also exports a header row, something that is a little more complicated to undertake.
Here are a few of the runtime options:
-t: The table you wish to UNLOAD
-f: The S3 key at which the file will be placed
-s (Optional): The file you wish to read a custom valid SQL WHERE clause from. This will be sanitized then inserted into the UNLOAD command.
-r (Optional): The range column you wish to use to constrain the results. Any type supported by Redshift's BETWEEN function is accepted here (date, integer, etc.)
-r1 (Optional): The desired start range to constrain the result set
-r2 (Optional): The desired end range to constrain the result set
Note: -s and -d are mutually exclusive and cannot be used together. If neither is used, the script will default to not specifying a WHERE clause and output the entire table.
Then you can run it like this to UNLOAD:
docker run -it -v /local/path/to/my/config.json:/config.json openbridge/ob_redshift_unload python /unload.py -t mytable -f s3://dest-bucket/foo/bar/output_file.csv -r datecol -r1 2017-01-01 -r2 2017-06-01
The goal was to enhance the default UNLOAD process and wrap it into something that can help ensure consistency in generating outputs.
Here is a write-up that details the features/capabilities:
https://blog.openbridge.com/how-to-easily-extract-data-from-amazon-redshift-4e55435f7003
I would like to know how I can import my data into a table. I know the COPY command and its HEADER option. But the file I have to import has the following format:
Line 1: header1, header2, header3,...
Line 2: vartype, vartype, vartype,...
Line 3: data1, data2,...
As you can see, I need to skip the second line too. For example:
"phonenumber","countrycode","firstname","lastname"
INTEGER,INTEGER,VARCHAR(50),VARCHAR(50)
123456789,44,"James","Bond"
5551234567,1,"Angelina","Jolie"
912345678,34,"Antonio","Banderas"
The first line contains the exact names of the table's columns. I have tried to use the INSERT INTO command, but I have not got a good result.
I am using these two strategies for this type of problem:
1) Import all
   - import all rows into temporary table where columns have varchar type
   - delete rows you do not want
   - insert data into final table, cast varchar to desired types
2) Pre-process
   - delete rows from imported file
   - import
For your case, you can delete 2nd line using sed for example:
sed -i '2d' importfile.txt
This will remove the 2nd line from the file named importfile.txt. Note that the -i flag edits the file in place, overwriting it immediately, so use it with care.
You can use this to delete a range of lines:
sed -i '2,4d' importfile.txt
This will remove lines 2, 3, 4 from file.
If you are working in a Linux shell, you could always just stream in the records you want. For example, tail -n +3 starts output at line 3, which skips the two header lines of your file:
tail -n +3 <file> | psql <db> -c "COPY <table> FROM STDIN CSV;"
or, if your header lines are marked by, say, "#":
grep -v "^#" <file> | psql <db> -c "COPY <table> FROM STDIN CSV;"
You'll have to pre-process the file I'm afraid. There are far too many strange formats (like this one) around for COPY to understand - it just concentrates on handling the basics. You can trim the second line out with a simple bit of sed or perl.
perl -ne 'print unless ($.==2)' source_file.txt
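The same pre-processing can be sketched in Python, if that is easier to fit into an existing import script. This is a minimal illustration; the function name skip_lines is hypothetical, and the sample rows are taken from the question.

```python
def skip_lines(lines, skip=(2,)):
    """Yield every line whose 1-based line number is not listed in skip."""
    for number, line in enumerate(lines, start=1):
        if number not in skip:
            yield line


sample = [
    '"phonenumber","countrycode","firstname","lastname"\n',
    "INTEGER,INTEGER,VARCHAR(50),VARCHAR(50)\n",
    '123456789,44,"James","Bond"\n',
]

# Drops the line with the column types and keeps the header and the data.
for line in skip_lines(sample):
    print(line, end="")
```

Since the function accepts any iterable of lines, an open file object can be passed in place of the sample list, and the result written to a new file or piped to COPY ... FROM STDIN with the HEADER option.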
When I try to export the text content of a field, and that content has line break characters, those characters are output as the \N string.
For example:
create table foo ( txt text );
insert into foo ( txt ) values ( 'first line
second line
...
and other lines');
copy foo ( txt ) to '/tmp/foo.txt';
I want to return the following (a):
first line
second line
...
and other lines
But, output is (b):
first line\Nsecond line\N...\Nand other lines
Does anybody know how to get the (a) output?
The \N comes from the fact that one line must correspond to one database row.
This rule is relaxed for the CSV format where multi-line text is possible but then a quote character (by default: ") would enclose the text.
If you want multi-line output and no enclosing character around it, you shouldn't use COPY but SELECT.
Assuming a unix shell as the execution environment of the caller, you could do:
psql -A -t -d dbname -c 'select txt from foo' >/tmp/file.txt
Have you tried: \r\n?
Here's another solution that might work:
E'This is the first part \n And this is the second'
via https://stackoverflow.com/a/938/1085891
Also, rather than copy the other responses, see here: String literals and escape characters in postgresql