COPY from S3 to Redshift not recognizing newline - postgresql

I am trying to run a COPY command from an S3 bucket to a Redshift PostgreSQL table, and I am getting the following error (in stl_load_errors):
err_code: 1207
err_reason: Invalid digit, Value '2', Pos 3, Type: Short
raw_field_value:
2
2/28/15
The file has 2 lines:
2/28/15,Phone,Android,0,1,3,2,2
2/28/15,Phone,Android,0,4,1,2,2
The CREATE TABLE code is:
create table aggregate_table (
    date date,
    variable varchar(15),
    source varchar(15),
    prepaid smallint,
    direction smallint,
    total smallint,
    carrier smallint,
    carrier_group smallint
);
It seems like the newline is not being recognized, and is trying to read the end of the first line and the beginning of the second line as one value. I have tried using delimiter ',' and escape, but nothing seems to work.
Thank you for your help!
Edit: Here's the COPY command (I've also tried it with ESCAPE at the end):
COPY aggregate_table FROM 's3://path_to_file.csv' CREDENTIALS 'aws_access_key_id=XXXX;aws_secret_access_key=XXXXX' CSV delimiter ',' DATEFORMAT AS 'MM/DD/YY';

You need to add DATEFORMAT AS 'MM/DD/YY' to your COPY command. Otherwise Redshift cannot parse the date in the first column correctly, as it expects YYYY-MM-DD by default.
See http://docs.aws.amazon.com/redshift/latest/dg/r_DATEFORMAT_and_TIMEFORMAT_strings.html for more details.

@quarterdome thanks for working through this with me! After you pointed out that it worked, I tried again from beginning to end. It turns out that when I saved the file without a .csv extension, it worked!

Related

PostgreSQL how to read csv file with decimal comma?

I am trying to read a CSV file containing real numbers that use a comma as the decimal separator. I read the file with \copy in psql:
\copy table FROM 'filename.csv' DELIMITER ';' CSV HEADER;
psql does not recognize the comma as a decimal point.
psql:filename.sql:44: ERROR: invalid input syntax for type real: "9669,84"
CONTEXT: COPY filename, line 2, column col-3: "9669,84"
I did some googling but could not find any answer other than "change the decimal comma into a decimal point". I tried SET DECIMALSEPARATORCOMMA=ON; but that did not work. I also experimented with encodings, but I couldn't find any indication that the encoding governs the decimal point (I got the impression it doesn't).
Is there really no solution other than changing the input data?
COPY into a staging table where the number goes into a varchar field. Then do something like this in psql:
-- Temporarily change numeric formatting to one that uses ',' as
-- the decimal separator.
set lc_numeric = 'de_DE.UTF-8';
-- Below is just an example. In your case the select would be part of
-- an insert into the target table, and the first argument of to_number
-- would be the field from your staging table.
select to_number('9669,84', '99999D999');
9669.84
You might need to change the format string to match all of your numbers. For more information on what is available, see Table 9.28, "Template Patterns for Numeric Formatting", in the Data Formatting chapter of the PostgreSQL documentation.
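Putting those pieces together, a minimal sketch of the whole workflow could look like the one below. The table and column names (numbers_staging, measurements, col1..col3) are made up for illustration, and the format string assumes at most five integer digits:
-- Hypothetical staging table: load everything as text first.
CREATE TEMP TABLE numbers_staging (col1 text, col2 text, col3 text);
\copy numbers_staging FROM 'filename.csv' DELIMITER ';' CSV HEADER
-- Use a locale whose decimal separator is ',' so that D matches it.
SET lc_numeric = 'de_DE.UTF-8';
-- Convert while moving the rows into the typed target table.
INSERT INTO measurements (col1, col2, col3)
SELECT col1, col2, to_number(col3, '99999D999')::real
FROM numbers_staging;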

Use of column names in the Redshift COPY command which are reserved keywords

I have a table in Redshift where the column names are 'begin' and 'end'. They are Redshift keywords. I want to use them explicitly in the Redshift COPY command. Is there a workaround other than renaming the columns in the table? That would be my last option.
I tried enclosing them in single/double quotes, but it looks like the COPY command only accepts comma-separated column names.
The COPY command fails if you don't quote reserved keywords used as column names, e.g. begin or end:
copy test1(col1,begin,end,col2) from 's3://example/file/data1.csv' credentials 'aws_access_key_id=XXXXXXXXXXXXXXX;aws_secret_access_key=XXXXXXXXXXX' delimiter ',';
ERROR: syntax error at or near "end"
But it works fine if begin and end are enclosed in double quotes (") as below:
copy test1(col1,"begin","end",col2) from 's3://example/file/data1.csv' credentials 'aws_access_key_id=XXXXXXXXXXXXXXX;aws_secret_access_key=XXXXXXXXXXX' delimiter ',';
I hope this helps.
If you see a different error, please update your question.
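For completeness, here is a minimal sketch of what the table definition could look like; the test1 layout below is an assumption (the original DDL isn't shown), but the point is that the reserved words have to be double-quoted in the DDL exactly as in the COPY column list:
-- Hypothetical layout for test1; "begin" and "end" must be double-quoted
-- because they are reserved words in Redshift.
CREATE TABLE test1 (
    col1    varchar(32),
    "begin" timestamp,
    "end"   timestamp,
    col2    varchar(32)
);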

How to set the delimiter, Postgresql

I am wondering what the delimiter of this .csv file is. I am trying to import the .csv via a COPY FROM statement, but it always throws an error. When I set the delimiter to E'\t' it throws one error; when I set it to '|' it throws a different one. I have been trying to import this .csv file for 3 days and cannot get it to work. I really need your help. Here is my .csv file: Download here, please
My code on postgresql looks like this:
CREATE TABLE movie
(
    imdib     varchar NOT NULL,
    name      varchar NOT NULL,
    year      integer,
    rating    float,
    votes     integer,
    runtime   varchar,
    directors varchar,
    actors    varchar,
    genres    varchar
);
My COPY statement:
COPY movie FROM '/home/max/Schreibtisch/imdb_top100t_2015-06-18.csv' (DELIMITER E'\t', FORMAT CSV, NULL '', ENCODING 'UTF8');
When I use SHOW SERVER_ENCODING it says "UTF8". But why the hell can't Postgres read the data from the columns? I really do not get it. I am on 64-bit Ubuntu, the .csv file has all the permissions it needs, and so does PostgreSQL. Please help me.
These are my errors:
ERROR: missing data for column "name"
CONTEXT: COPY movie, line 1: "tt0468569,The Dark Knight,2008,9,1440667,152 mins.,Christopher Nolan,Christian Bale|Heath Ledger|Aar..."
********** Error **********
ERROR: missing data for column "name"
SQL state: 22P04
Context: COPY movie, line 1: "tt0468569,The Dark Knight,2008,9,1440667,152 mins.,Christopher Nolan,Christian Bale|Heath Ledger|Aar..."
Use this code instead; it works fine on Linux as well as on Windows:
\COPY movie(imdib,name,year,rating,votes,runtime,directors,actors,genres) FROM 'D:\test.csv' WITH DELIMITER '|' CSV HEADER;
And one more thing: add a header to your csv file as shown below:
imdib|name|year|rating|votes|runtime|directors|actors|genres
tt0111161|The Shawshank Redemption|1994|9.3|1468273|142 mins.|Frank Darabont|Tim Robbins|Morgan Freeman
Also use a single-byte delimiter like ',' or '|'.
Hope this works for you!
The following works for me:
COPY movie (imdib,name,year,rating,votes,runtime,directors,actors,genres)
FROM 'imdb_top100t_2015-06-18.csv'
WITH (format csv, header false, delimiter E'\t', NULL '');
Unfortunately the file is invalid: on line 12011 the column year contains the value 2015 Video, and the import fails because this can't be converted to an integer. Further down (line 64155) there is an invalid value NA for the rating, which can't be converted to a float, and then one more for the votes.
But if you create the table with all varchar columns, the above command works.
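A minimal sketch of that all-varchar route follows. The movie_staging table and the regex guards are my own assumptions about how to deal with the bad values mentioned above:
-- Hypothetical staging table: every column as varchar so nothing is rejected.
CREATE TABLE movie_staging (
    imdib varchar, name varchar, year varchar, rating varchar, votes varchar,
    runtime varchar, directors varchar, actors varchar, genres varchar
);
COPY movie_staging (imdib, name, year, rating, votes, runtime, directors, actors, genres)
FROM 'imdb_top100t_2015-06-18.csv'
WITH (format csv, header false, delimiter E'\t', NULL '');
-- Move the rows into the typed table, turning unparseable values into NULL.
INSERT INTO movie (imdib, name, year, rating, votes, runtime, directors, actors, genres)
SELECT imdib, name,
       CASE WHEN year   ~ '^\d+$'         THEN year::integer   END,
       CASE WHEN rating ~ '^\d+(\.\d+)?$' THEN rating::float   END,
       CASE WHEN votes  ~ '^\d+$'         THEN votes::integer  END,
       runtime, directors, actors, genres
FROM movie_staging;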

Postgres CSV copy statement

Hi guys, I have a question, or rather an error question.
My function in Postgres looks like this:
CREATE TEMP TABLE aljazeera
(
date character varying,
channel_name character varying,
date1 character varying,
start character varying,
program character varying,
sub character varying,
epis_title character varying
) ;
COPY aljazeera FROM '/opt/transcode/data/epg/output/CSV/ALJAZEERA/pre_g_1.csv' WITH (FORMAT CSV, DELIMITER ',', QUOTE '"');
And my file looks like this:
http://www.speedyshare.com/bNcjM/pre-g-1.csv
(WARNING: possible malware, certainly nasty bundleware on download link. Only use the top link Download: pre-g-1.csv).
When I try to load it into the table, it gives this error:
ERROR: unterminated CSV quoted field
CONTEXT: COPY aljazeera, line 175: "2013-12-24,02:00:00","KRAJ PROGRAMA",,,,
2013-12-24,07:30:00,"Sportski magazin (R)",,"Sportski doga..."
I don't know where the problem is. Any advice on this issue?
Once I eventually downloaded the file without that nasty download manager I could reproduce the error:
craig=> \copy aljazeera FROM 'pre_g_1.csv' WITH (FORMAT CSV, DELIMITER ',', QUOTE '"')
ERROR: unterminated CSV quoted field
CONTEXT: COPY aljazeera, line 175: "2013-12-24,02:00:00","KRAJ PROGRAMA",,,,
2013-12-24,07:30:00,"Sportski magazin (R)",,"Sportski doga..."
The error isn't actually on line 175. It is that some prior line has an unbalanced quote. By binary search it was easy to narrow that down to line 26:
2013-12-24,02:00:00","KRAJ PROGRAMA",,,,
I'm sure you can see what the problem is there. You've got a stray quote in the date.
BTW, if you ever need to link to chunks of text, you can use http://gist.github.com/, http://pastebin.com/, http://pastebin.ca/, etc. (For PostgreSQL query plans http://explain.depesz.com/ is best). That speedyshare thing is nasty.
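If manual binary search gets tedious, a quick way to find unbalanced quotes is to load each raw line into a one-column staging table and count the double quotes per line. This is only a sketch: the raw_lines table is hypothetical, and the E'\x01' delimiter is an arbitrary byte chosen because it should never occur in the data (text format will also interpret backslashes, which doesn't matter for counting quotes):
-- Hypothetical one-column table holding each raw line of the file.
CREATE TEMP TABLE raw_lines (line text);
-- Text format with an unused delimiter loads every whole line into "line".
\copy raw_lines FROM 'pre_g_1.csv' WITH (FORMAT text, DELIMITER E'\x01')
-- Lines with an odd number of double quotes are the suspects.
SELECT line
FROM raw_lines
WHERE (length(line) - length(replace(line, '"', ''))) % 2 = 1;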

postgresql how to have COPY interpret formatted numeric fields automatically?

I have an input CSV file containing something like:
SD-32MM-1001,"100.00",4/11/2012
SD-32MM-1001,"1,000.00",4/12/2012
I was trying to COPY that into a PostgreSQL table (varchar, float8, date) and ran into an error:
# copy foo from '/tmp/foo.csv' with header csv;
ERROR: invalid input syntax for type double precision: "1,000.00"
Time: 1.251 ms
Aside from preprocessing the input file, is there some setting in PG that will have it read a file like the one above and convert it to numeric form during COPY? Something other than COPY?
If preprocessing is required, can it be set up as part of the COPY command (not the psql \copy)?
Thanks a lot.
The alternative to preprocessing is to first copy into a temporary table as text. From there, insert into the definitive table using the to_number function:
select to_number('1,000.00', 'FM000,009.99')::double precision;
It's an odd CSV file that surrounds numeric values with double quotes, but leaves values like SD-32MM-1001 unquoted. In fact, I'm not sure I've ever seen a CSV file like that.
If I were in your shoes, I'd try copy against a file formatted like this:
"SD-32MM-1001",100.00,4/11/2012
"SD-32MM-1001",1000.00,4/12/2012
Note that the numbers have no commas. I was able to import that file successfully with
copy test from '/fullpath/test.dat' with csv
I think your best bet is to get better formatted output from your source.