Error when running PostgreSQL COPY command

I would like to be able to use characters like '-' in the schema name when running the COPY command in PostgreSQL. Is there any way to get around this? Thanks!
psql -d postgres -c "\COPY (SELECT * FROM test-schema.tableName) TO data.csv DELIMITER ',' CSV"
ERROR: syntax error at or near "-"
LINE 1: COPY ( SELECT * FROM test-schema.tableName ) TO STDOUT DELIMITER ',...

Yes, though I tend to discourage it.
Identifiers:
SQL identifiers and key words must begin with a letter (a-z, but also letters with diacritical marks and non-Latin letters) or an underscore (_). Subsequent characters in an identifier or key word can be letters, underscores, digits (0-9), or dollar signs ($). Note that dollar signs are not allowed in identifiers according to the letter of the SQL standard, so their use might render applications less portable. The SQL standard will not define a key word that contains digits or starts or ends with an underscore, so identifiers of this form are safe against possible conflict with future extensions of the standard.
There is a second kind of identifier: the delimited identifier or quoted identifier. It is formed by enclosing an arbitrary sequence of characters in double-quotes (").
So:
create schema "test-schema";
CREATE SCHEMA
\dn "test-schema"
List of schemas
Name | Owner
-------------+----------
test-schema | postgres
create table "test-schema"."test-table"(id int);
select * from test-schema."test-table";
ERROR: syntax error at or near "-"
LINE 1: select * from test-schema."test-table";
select * from "test-schema"."test-table";
id
----
(0 rows)
As you can see, if you double-quote an identifier to get around the identifier naming rules, you are then bound to always quote it.
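Applied to the original \COPY, a minimal sketch: quote the schema name inside the query, escaping the double quotes for the shell. This assumes the table itself was created unquoted, so its name folded to lower case; if it was created as "tableName", quote that too:
psql -d postgres -c "\COPY (SELECT * FROM \"test-schema\".tablename) TO data.csv DELIMITER ',' CSV"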

Postgres escape double quotes

I am working with a malformed database which seems to have double quotes as part of the column names.
Example:
|"Market" |
|---------|
|Japan |
|UK |
|USA |
And I want to select like below
SELECT "\"Market\"" FROM mytable; /* Does not work */
How does one select such a thing?
The documentation says
[A] delimited identifier or quoted identifier […] is formed by enclosing an arbitrary sequence of characters in double-quotes ("). […]
Quoted identifiers can contain any character, except the character with code zero. (To include a double quote, write two double quotes.)
So you'll want to use
SELECT """Market""" AS "Market" FROM mytable;
An alternative would be
A variant of quoted identifiers allows including escaped Unicode characters identified by their code points. This variant starts with U& (upper or lower case U followed by ampersand) immediately before the opening double quote, without any spaces in between, for example U&"foo". […] Inside the quotes, Unicode characters can be specified in escaped form by writing a backslash followed by the four-digit hexadecimal code point number or alternatively a backslash followed by a plus sign followed by a six-digit hexadecimal code point number.
which in your case would mean
SELECT U&"\0022Market\0022" AS "Market" FROM mytable;
SELECT U&"\+000022Market\+000022" AS "Market" FROM mytable;
Disclaimer: your database may not actually have double quotes as part of the name itself. As mentioned in the comments, this might just be the way the tool you are using displays a column named Market (not market), since
Quoting an identifier also makes it case-sensitive
So all you might need could be
SELECT "Market" FROM mytable;

PostgreSQL COPY failing with value too long for type character varying(255)

I am trying to copy a csv file with each column enclosed in double quotes. The COPY statement that I am using is:
\COPY schema.table_name FROM 'file_name.csv' WITH (FORMAT CSV, DELIMITER ',', ESCAPE '\', QUOTE '"', FORCE_NULL(col1,col2,col3,col4))
The column 'col3' is defined as VARCHAR(255). One of the rows in the csv file has a string with the literal '\n' repeated 4 times, making the string 259 characters long. I expected each occurrence of '\n' to be converted into a newline character, making the string 255 characters long. However, the COPY statement fails with the error below:
ERROR: value too long for type character varying(255)
COPY schema.table_name, line 11859, column col3: "RN recommends 3HX3D (Mon, Wed, Fri) of PCA/ service.\nincontinence supplies\nPERS new\nConsumer stat..."
When I change the column length to 260, the COPY works fine. I cannot change the length of the column, as I won't know how many '\n' literals may come in this column. Is this a bug in PostgreSQL? If not, how can this be fixed? I am using PostgreSQL version 12.3.
Thanks in advance!
The answer provided by @wildplasser works fine. I removed the size from the VARCHAR column and the data loaded successfully.
CSV files do not use \ escaping (except possibly for the quote character itself). You escape a newline by having it appear literally in the file, inside the quotes. So even if you make the type long enough to hold the data, it will still hold a literal sequence of \ and n, not newlines.
So this is not a bug in PostgreSQL, but rather a bug in your file's data.
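To see what a correct file looks like, a minimal sketch (the table and data are invented for illustration): in CSV, an embedded newline is written as an actual line break inside a quoted field, not as \n:
test=> create table copytest2(v varchar(255));
CREATE TABLE
test=> \copy copytest2 from stdin csv
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> "line one
>> line two"
>> \.
test=> select length(v) from copytest2;
 length
--------
     17
(1 row)
The stored value is 17 characters with a real newline in the middle; a file containing the two characters \ and n would instead load those two characters literally.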

Why does my LIKE statement fail with '\\_' for matching?

I have a database table with entries that look like this:
id | name | code_set_id
I have this particular entry that I need to find:
674272310 | raphodo/qrc_resources.py | 782732
In my rails app (2.3.8), I have a statement that evaluates to this:
SELECT * from fyles WHERE code_set_id = 782732 AND name LIKE 'raphodo/qrc\\_resources.py%';
From reading up on escaping, the above query is correct: it is supposed to correctly double-escape the underscore. However, this query does not find the record in the database. These queries will:
SELECT * from fyles WHERE code_set_id = 782732 AND name LIKE 'raphodo/qrc\_resources.py%';
SELECT * from fyles WHERE code_set_id = 782732 AND name LIKE 'raphodo/qrc_resources.py%';
Am I missing something here? Why is the first SQL statement not finding the correct entry?
A single backslash in the RHS of a LIKE escapes the following character:
9.7.1. LIKE
[...]
To match a literal underscore or percent sign without matching other characters, the respective character in pattern must be preceded by the escape character. The default escape character is the backslash but a different one can be selected by using the ESCAPE clause. To match the escape character itself, write two escape characters.
So this is a literal underscore in a LIKE pattern:
\_
and this is a single backslash followed by an "any character" pattern:
\\_
You want LIKE to see this:
raphodo/qrc\_resources.py%
PostgreSQL used to interpret C-style backslash escapes in strings by default, but no longer does; now you have to use E'...' to get backslash escapes in string literals (unless you've changed the standard_conforming_strings configuration option). The String Constants with C-style Escapes section of the manual covers this, but the simple version is that these two:
name LIKE E'raphodo/qrc\\_resources.py%'
name LIKE 'raphodo/qrc\_resources.py%'
do the same thing as of PostgreSQL 9.1.
Presumably your Rails 2.3.8 app (or whatever is preparing your LIKE patterns) is assuming an older version of PostgreSQL than the one you're actually using. You'll need to adjust things to not double your backslashes (or prefix the pattern string literals with Es).
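A quick sanity check of the three variants (a sketch, assuming standard_conforming_strings is on, the default since 9.1):
SELECT 'raphodo/qrc_resources.py' LIKE 'raphodo/qrc\_resources.py%'   AS plain_string,  -- true
       'raphodo/qrc_resources.py' LIKE E'raphodo/qrc\\_resources.py%' AS e_string,      -- true
       'raphodo/qrc_resources.py' LIKE 'raphodo/qrc\\_resources.py%'  AS doubled_plain; -- false: \\ matches a literal backslash, then _ any character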

PostgreSQL escape character during copy

I'm trying to import a CSV file into PostgreSQL but I am having an issue with special characters.
I'm using the following command
./psql -d data -U postgres -c "copy users from 'users.csv' delimiter E'\t' quote '~' csv"
It works fine until it encounters a field containing '~', the character I'm using as the quote value so that the existing double quotes, inverted commas, etc. don't break the import.
How do I escape this character in the csv file, so that 'Person~name' will import as 'Person~name'?
The CSV rules are listed in RFC 4180: https://www.ietf.org/rfc/rfc4180.txt
To embed the quote character inside a string:
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
In your case, replace double-quote with tilde, since you've chosen that as your quote character.
Example:
test=> create table copytest(t text);
CREATE TABLE
test=> \copy copytest from stdin delimiter E'\t' quote '~' csv
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> ~foo~~bar~
>> \.
test=> select * from copytest;
t
---------
foo~bar
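Applied to the value from the question: with quote '~', the field Person~name is written in the csv file as
~Person~~name~
and COPY will load it as Person~name.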

How to safely unload/copy a table in Redshift?

In Redshift, it is convenient to use unload/copy to move data to S3 and load it back into Redshift, but it is hard to choose the delimiter each time: the right delimiter depends on the content of the table! I had to change the delimiter each time I hit load errors.
For example, when I use the following command to unload/copy a table:
unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' addquotes allowoverwrite;
copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' removequotes;
I get a load error if the table happens to have a field containing "||". Then I have to change the delimiter from '|' to another character like ',' and try again; if I'm unlucky, it takes multiple tries to succeed.
I'm wondering if there's a way to unload/copy a Redshift table that is independent of the table's content and will always succeed, no matter what weird strings are stored in it.
Finally I figured out the right approach: add the ESCAPE option to both the unload and copy commands:
unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' addquotes escape allowoverwrite;
copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' removequotes escape;
With escape in unload command, for CHAR and VARCHAR columns in delimited unload files, an escape character (\) is placed before every occurrence of the following characters:
Linefeed: \n
Carriage return: \r
The delimiter character specified for the unloaded data.
The escape character: \
A quote character: " or ' (if both ESCAPE and ADDQUOTES are specified in the UNLOAD command).
And with escape in the copy command, the backslash character (\) in input data is treated as an escape character. The character that immediately follows the backslash character is loaded into the table as part of the current column value, even if it is a character that normally serves a special purpose. For example, you can use this option to escape the delimiter character, a quote, an embedded newline, or the escape character itself when any of these characters is a legitimate part of a column value.
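To illustrate, a sketch of what one field looks like on disk after an UNLOAD with delimiter '|', addquotes, and escape (the column value is invented for the example):
value in the table:          5% off|limited "offer"
field in the unloaded file:  "5% off\|limited \"offer\""
COPY with removequotes and escape reverses this, restoring the original value.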
Try unload like below
unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter as ',' addquotes escape
To load it back use as below
copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter ',' removequotes escape;
This will work irrespective of whether your data has ',' in it.
Since this topic comes up in many places, we decided to package the UNLOAD/extract process into a Docker service. All the code is on GitHub, so you can use it as-is or grab the underlying Python code to create your own version: https://github.com/openbridge/ob_redshift_unload
You can set the delimiter, dates, and ad hoc SQL via run-time configuration. It will also export a header row, something that is a little more complicated to do by hand.
Here are a few of the runtime options:
-t: The table you wish to UNLOAD
-f: The S3 key at which the file will be placed
-s (Optional): The file you wish to read a custom valid SQL WHERE clause from. This will be sanitized then inserted into the UNLOAD command.
-r (Optional): The range column you wish to use to constrain the results. Any type supported by Redshift's BETWEEN function is accepted here (date, integer, etc.)
-r1 (Optional): The desired start range to constrain the result set
-r2 (Optional): The desired end range to constrain the result set
Note: -s and -d are mutually exclusive and cannot be used together. If neither is used, the script defaults to not specifying a WHERE clause and outputs the entire table.
Then you can run it like this to UNLOAD:
docker run -it -v /local/path/to/my/config.json:/config.json openbridge/ob_redshift_unload python /unload.py -t mytable -f s3://dest-bucket/foo/bar/output_file.csv -r datecol -r1 2017-01-01 -r2 2017-06-01
The goal was to enhance the default UNLOAD process and wrap it into something that can help ensure consistency in generating outputs.
Here is a write-up that details the features/capabilities:
https://blog.openbridge.com/how-to-easily-extract-data-from-amazon-redshift-4e55435f7003