Redshift COPY: pipe-delimited file with values enclosed in double quotes - amazon-redshift

I am trying to load a file from S3 into Redshift. The file is pipe-delimited, but some values contain pipes and other special characters; whenever a value contains a pipe, that value is enclosed in double quotes.
Example:
Field1|Field2
"abc|dh"|123
efh#ih|233
I have tried the command below, but I get an "invalid digit" error because the COPY command treats the pipe inside the quoted value as a delimiter.
copy table
from 's3'
iam_role 'arn'
region 'us-east-1'
MAXERROR AS 10 NULL AS '(null)'
ESCAPE
IGNOREHEADER AS 1
DELIMITER '|' timeformat 'auto' GZIP;

You are looking for the REMOVEQUOTES parameter. https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-conversion.html#copy-removequotes
ESCAPE requires your data to be prepared with the escape character embedded. For example, if your escape character were \, you would need to prepare the data so that the field content was "abc\|dh".
Example:
DROP TABLE IF EXISTS public.quote_test;
CREATE TABLE IF NOT EXISTS public.quote_test (col_a VARCHAR(10), col_b VARCHAR(10));
SELECT * FROM quote_test;
echo '"a|b"|"c|d"' > ~/simple_quotes.txt
aws s3 cp ~/simple_quotes.txt s3://my-bucket/simple_quotes.txt
--Will fail: the pipes inside the quotes are treated as delimiters
COPY quote_test FROM 's3://my-bucket/simple_quotes.txt'
CREDENTIALS 'aws_iam_role=arn:aws:iam::012345678901:role/redshift-cluster'
DELIMITER '|' REGION 'us-west-2';
--Succeeds with REMOVEQUOTES
COPY quote_test FROM 's3://my-bucket/simple_quotes.txt'
CREDENTIALS 'aws_iam_role=arn:aws:iam::012345678901:role/redshift-cluster'
REMOVEQUOTES DELIMITER '|' REGION 'us-west-2';
SELECT * FROM quote_test;
-- col_a | col_b
-- -------+-------
-- a|b | c|d
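If your quoting is consistent like this, another option (a sketch, reusing the bucket and role from the example above) is Redshift's CSV parameter, which honors double quotes directly; note that per the docs, CSV cannot be combined with ESCAPE or REMOVEQUOTES:
--Also succeeds: CSV mode treats '"' as the quote character
COPY quote_test FROM 's3://my-bucket/simple_quotes.txt'
CREDENTIALS 'aws_iam_role=arn:aws:iam::012345678901:role/redshift-cluster'
CSV DELIMITER '|' REGION 'us-west-2';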

Related

How do I use Postgresql's COPY command to import a file with consecutive single quotes?

I am trying to import a TSV file into Postgresql. I have created a table:
CREATE TABLE description (
id TEXT
, effective_time DATE
, active INT
, module_id TEXT
, concept_id TEXT
, language_code TEXT
, type_id TEXT
, term TEXT
, case_significance_id TEXT
);
I have a TSV file like so:
id effectiveTime active moduleId conceptId languageCode typeId term caseSignificanceId
12118017 20170731 1 900000000000207008 6708002 en 900000000000013009 Intrauterine cordocentesis 900000000000448009
12119013 20020131 1 900000000000207008 6709005 en 900000000000013009 Gentamicin 2''-nucleotidyltransferase 900000000000020002
12119013 20170731 1 900000000000207008 6709005 en 900000000000013009 Gentamicin 2''-nucleotidyltransferase 900000000000448009
12120019 20020131 1 900000000000207008 6710000 en 900000000000013009 Nitric oxide 900000000000020002
Note that the middle two entries have two consecutive single quotes acting as the symbol for double-prime (Gentamicin 2''-nucleotidyltransferase).
If I run
psql=# \copy description FROM /path/to/foo.txt WITH DELIMITER AS E'\t';
I get ERROR: missing data for column "effective_time". I think that's because the '' is screwing up the parsing of the column boundaries.
I have tried finding and replacing the '' instances with either \'\' or '''' and using CSV QUOTE E'\'' or CSV QUOTE '''', respectively, but I get the same error.
How do I edit the file or alter the \copy command to import the file correctly?
Haleemur Ali correctly points out that the original file—whose README purports it to comprise "UTF-8 encoded tab-delimited flat files which can be imported into any database"—is in fact not tab-separated, which may be my editor's fault. It works once I fix that.
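For anyone hitting the same symptom: consecutive single quotes are not special to COPY's text format, so they need no escaping once the file is genuinely tab-separated. A minimal sketch (table and file names are hypothetical):
CREATE TABLE quote_demo (term TEXT, code TEXT);
-- /tmp/quote_demo.tsv contains one real tab between the two fields:
-- Gentamicin 2''-nucleotidyltransferase<TAB>900000000000020002
\copy quote_demo FROM '/tmp/quote_demo.tsv' WITH (FORMAT text)
SELECT term FROM quote_demo; -- Gentamicin 2''-nucleotidyltransferase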

Copy data from a txt file into a database

I am using pgAdmin III and I want to copy data from a .txt file into my database. Let's say we have a file called Address.txt with these values:
1,1970 Napa Ct.,Bothell,98011
2,9833 Mt. Dias Blv.,Bothell,98011
3,"7484, Roundtree Drive",Bothell,98011
4,9539 Glenside Dr,Bothell,98011
If I type
COPY myTable FROM 'C:\Address.txt' (DELIMITER(','));
I will get
ERROR: extra data after last expected column
CONTEXT: COPY address, line 3: "7484, Roundtree Drive",Bothell,98011
What do I need to add to the COPY command so that a , inside the " " is not treated as the start of a new column?
You need to specify the quote character, like so:
COPY mytable FROM 'C:\Address.txt' DELIMITER ',' QUOTE '"' csv;
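For what it's worth, in CSV mode the delimiter already defaults to ',' and the quote character to '"', so the shorter form below should behave the same (a sketch against the same file):
COPY mytable FROM 'C:\Address.txt' WITH (FORMAT csv);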

Using ASCII 31 field separator character as Postgresql COPY delimiter

We are exporting data from Postgres 9.3 into a text file for ingestion by Spark.
We would like to use the ASCII 31 field separator character as a delimiter instead of \t so that we don't have to worry about escaping issues.
We can do so in a shell script like this:
#!/bin/bash
DELIMITER=$'\x1F'
echo "copy ( select * from table limit 1) to STDOUT WITH DELIMITER '${DELIMITER}'" | (psql ...) > /tmp/ascii31
But we're wondering, is it possible to specify a non-printable glyph as a delimiter in "pure" postgres?
edit: we attempted to use the postgres escaping convention per http://www.postgresql.org/docs/9.3/static/sql-syntax-lexical.html
warehouse=> copy ( select * from table limit 1) to STDOUT WITH DELIMITER '\x1f';
and received
ERROR: COPY delimiter must be a single one-byte character
Try prepending E before the sequence you're trying to use as a delimiter, for example E'\x1f' instead of '\x1f'. Without the E, PostgreSQL will read '\x1f' as four separate characters rather than as a hexadecimal escape sequence, hence the error message.
See the PostgreSQL manual on "String Constants with C-style Escapes" for more information.
From my testing, both of the following work:
echo "copy (select 1 a, 2 b) to stdout with delimiter u&'\\001f'"| psql;
echo "copy (select 1 a, 2 b) to stdout with delimiter e'\\x1f'"| psql;
I've extracted a small file from Actian Matrix (which, like Amazon Redshift, descends from the ParAccel fork of Postgres), using this notation for ASCII character code 30, the "record separator":
unload ('SELECT btrim(class_cd) as class_cd, btrim(class_desc) as class_desc
FROM transport.stg.us_fmcsa_carrier_classes')
to '/tmp/us_fmcsa_carrier_classes_mk4.txt'
delimiter as '\036' leader;
This is an example of how this file looks in VI:
C^^Private Property
D^^Private Passenger Business
E^^Private Passenger Non-Business
I then moved this file over to a machine hosting PostgreSQL 9.5 via sftp, and used the following copy command, which seems to work well:
copy fmcsa.carrier_classes
from '/tmp/us_fmcsa_carrier_classes_mk4.txt'
delimiter u&'\001E';
Each derivative of Postgres, and Postgres itself, seems to prefer a slightly different notation. Too bad we don't have a single standard!

PG COPY error: invalid input syntax for integer

Running COPY gives me the error message ERROR: invalid input syntax for integer: "". What am I missing?
My /tmp/people.csv file:
"age","first_name","last_name"
"23","Ivan","Poupkine"
"","Eugene","Pirogov"
My /tmp/sql_test.sql file:
CREATE TABLE people (
age integer,
first_name varchar(20),
last_name varchar(20)
);
COPY people
FROM '/tmp/people.csv'
WITH (
FORMAT CSV,
HEADER true,
NULL ''
);
DROP TABLE people;
Output:
$ psql postgres -f /tmp/sql_test.sql
CREATE TABLE
psql:sql_test.sql:13: ERROR: invalid input syntax for integer: ""
CONTEXT: COPY people, line 3, column age: ""
DROP TABLE
Trivia:
PostgreSQL 9.2.4
ERROR: invalid input syntax for integer: ""
"" isn't a valid integer. PostgreSQL accepts unquoted blank fields as null by default in CSV, but "" would be like writing:
SELECT ''::integer;
and fail for the same reason.
If you want to deal with CSV that has things like quoted empty strings for null integers, you'll need to feed it to PostgreSQL via a pre-processor that can neaten it up a bit. PostgreSQL's CSV input doesn't understand all the weird and wonderful possible abuses of CSV.
Options include:
Loading it in a spreadsheet and exporting sane CSV;
Using the Python csv module, Perl Text::CSV, etc to pre-process it;
Using Perl/Python/whatever to load the CSV and insert it directly into the DB
Using an ETL tool like CloverETL, Talend Studio, or Pentaho Kettle
I think it's better to change your csv file so that it looks like:
"age","first_name","last_name"
23,Ivan,Poupkine
,Eugene,Pirogov
It's also possible to define your table like this:
CREATE TABLE people (
age varchar(20),
first_name varchar(20),
last_name varchar(20)
);
and after the copy, convert the empty strings:
select nullif(age, '')::int as age, first_name, last_name
from people
Just came across this while looking for a solution and wanted to add that I was able to solve the issue by passing the null parameter to psycopg2's copy_from call:
cur.copy_from(f, tablename, sep=',', null='')
I got this error when loading a '|'-separated CSV file, although there were no '"' characters in my input file. It turned out that I had forgotten to specify FORMAT:
COPY ... FROM ... WITH (FORMAT CSV, DELIMITER '|').
Use the command below to copy data from a CSV in a single line, without casting or changing your data type.
Replace "NULL" with the string in your data that is causing the copy error:
copy table_name from 'path to csv file' (format csv, null "NULL", DELIMITER ',', HEADER);
I had this same error on a postgres .sql file with a COPY statement, but my file was tab-separated instead of comma-separated and quoted.
My mistake was that I eagerly copy/pasted the file contents from github, but in that process all the tabs were converted to spaces, hence the error. I had to download and save the raw file to get a good copy.
I defined the table with the integer column moved out of the first position:
CREATE TABLE people (
first_name varchar(20),
age integer,
last_name varchar(20)
);
with a file like:
"first_name","age","last_name"
Ivan,23,Poupkine
Eugene,,Pirogov
copy people from 'file.csv' with (format csv, header, null '');
select * from people;
The error only showed up when the empty value was in the first column.
Ended up doing this using csvfix:
csvfix map -fv '' -tv '0' /tmp/people.csv > /tmp/people_fixed.csv
In case you know for sure which columns were meant to be integer or float, you can specify just them:
csvfix map -f 1 -fv '' -tv '0' /tmp/people.csv > /tmp/people_fixed.csv
Without specifying the exact columns there is an obvious side effect: a blank string in any column will be turned into the string "0".
This ought to work without you modifying the source csv file:
alter table people alter column age type text;
copy people from '/tmp/people.csv' with (format csv, header);
There is a way to load "", the quoted empty string, as NULL into an integer column: use the FORCE_NULL option (it requires CSV format):
\copy table_name FROM 'file.csv' with (FORMAT CSV, FORCE_NULL(column_name));
See the PostgreSQL documentation: https://www.postgresql.org/docs/current/static/sql-copy.html
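Applied to the original people.csv, a sketch (FORCE_NULL first appeared in PostgreSQL 9.4, so it won't help on the asker's 9.2.4):
\copy people FROM '/tmp/people.csv' WITH (FORMAT CSV, HEADER, FORCE_NULL(age));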
All in Python (using psycopg2): create the empty table first, then use copy_expert to load the csv into it. It should handle empty values.
import psycopg2

conn = psycopg2.connect(host="hosturl", database="db_name", user="username", password="password")
cur = conn.cursor()
cur.execute("CREATE TABLE schema.destination_table ("
            "age integer, "
            "first_name varchar(20), "
            "last_name varchar(20)"
            ");")
with open(r'C:/tmp/people.csv', 'r') as f:
    next(f)  # Skip the header row. Or remove this line if the csv has no header.
    cur.copy_expert("""COPY schema.destination_table FROM STDIN WITH (FORMAT CSV)""", f)
conn.commit()  # persist both the CREATE TABLE and the load
Incredibly, my solution to the same error was simply to re-arrange the columns. For anyone else trying the solutions above and still not getting past the error: I apparently had to arrange the columns in my CSV file to match the sequence of the columns in the table definition in pgAdmin.

How do I stop the Postgres COPY command from padding strings?

My field is defined as follows:
"COLUMNNAME" character(9)
I import CSV files using the following command
copy "TABLE" from '/my/directory' DELIMITERS ',' CSV;
If I have a string such as 'ABCDEF', Postgres pads it out to 'ABCDEF   ' (nine characters). How can I stop it from doing this?
It is because you have char instead of varchar: character(9) is a fixed-width type, so values are blank-padded to nine characters. Change the type of your column to varchar and everything will be fine.
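A minimal sketch of that change, reusing the quoted identifiers from the question:
ALTER TABLE "TABLE" ALTER COLUMN "COLUMNNAME" TYPE varchar(9);
Per the PostgreSQL docs, trailing spaces are removed when a character value is converted to another string type, so existing rows come out trimmed as well.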