Postgres CSV copy statement - postgresql

Hi guys, I have a question, or rather a question about an error.
My statements in Postgres look like this:
CREATE TEMP TABLE aljazeera
(
date character varying,
channel_name character varying,
date1 character varying,
start character varying,
program character varying,
sub character varying,
epis_title character varying
) ;
COPY aljazeera FROM '/opt/transcode/data/epg/output/CSV/ALJAZEERA/pre_g_1.csv' WITH (FORMAT CSV, DELIMITER ',', QUOTE '"');
And my file looks like this:
http://www.speedyshare.com/bNcjM/pre-g-1.csv
(WARNING: possible malware, certainly nasty bundleware on download link. Only use the top link Download: pre-g-1.csv).
When I try to load it into the table, I get this error:
ERROR: unterminated CSV quoted field
CONTEXT: COPY aljazeera, line 175: "2013-12-24,02:00:00","KRAJ PROGRAMA",,,,
2013-12-24,07:30:00,"Sportski magazin (R)",,"Sportski doga..."
I don't know where the problem is. Any advice on this issue?

Once I eventually downloaded the file without that nasty download manager I could reproduce the error:
craig=> \copy aljazeera FROM 'pre_g_1.csv' WITH (FORMAT CSV, DELIMITER ',', QUOTE '"')
ERROR: unterminated CSV quoted field
CONTEXT: COPY aljazeera, line 175: "2013-12-24,02:00:00","KRAJ PROGRAMA",,,,
2013-12-24,07:30:00,"Sportski magazin (R)",,"Sportski doga..."
The error isn't actually on line 175. It is that some prior line has an unbalanced quote. By binary search it was easy to narrow that down to line 26:
2013-12-24,02:00:00","KRAJ PROGRAMA",,,,
I'm sure you can see what the problem is there. You've got a stray quote in the date.
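Instead of a binary search, a quick scan for lines with an odd number of quote characters can point at the culprit. This is only a rough sketch in Python: the function name find_unbalanced_quote_lines and the first and third sample lines are made up for illustration, and it assumes no quoted field legitimately spans several lines (such a field would also show up as an odd count).

```python
def find_unbalanced_quote_lines(lines, quote='"'):
    """Return the 1-based numbers of lines holding an odd number of quote characters."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        # An escaped quote ("") adds two characters, so it keeps the count even.
        if line.count(quote) % 2 == 1:
            suspects.append(number)
    return suspects


# Illustrative input; only the second line is taken from the file in the question.
sample = [
    'a,"b",c',
    '2013-12-24,02:00:00","KRAJ PROGRAMA",,,,',
    'd,"e ""quoted"" f",g',
]
print(find_unbalanced_quote_lines(sample))
```

For a real file you would pass the open file object instead of the sample list; the first reported line is the one to inspect.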
BTW, if you ever need to link to chunks of text, you can use http://gist.github.com/, http://pastebin.com/, http://pastebin.ca/, etc. (For PostgreSQL query plans http://explain.depesz.com/ is best). That speedyshare thing is nasty.

Related

postgres csv date null import error

I am importing data into a Postgres database. The table I am importing to includes a couple of columns with dates.
The CSV file I am uploading, however, has empty values for some of the date fields.
The table looks like this:
dot_number bigint,
legal_name character varying,
dba_name character varying,
carrier_operation character varying,
hm_flag character varying,
pc_flag character varying,
...
mcs150_date date,
mcs150_mileage bigint,
The data looks like this:
1000045,"GLENN M HINES","","C","N","N","317 BURNT BROW RD","HAMMOND","ME","04730","US","317 BURNT BROW RD","HAMMOND","ME","04730","US","(207) 532-4141","","","19-NOV-13","20000","2012","23-JAN-02","ME","1","2"
1000050,"ROGER L BUNCH","","C","N","N","108 ST CHARLES CT","GLASGOW","KY","42141","US","108 ST CHARLES CT","GLASGOW","KY","42141","US","(270) 651-3940","","","","72000","2001","23-JAN-02","KY","1","1"
I have tried doing this:
COPY CC FROM 'C:\Users\Owner\Documents\FMCSA Data\FMCSA_CENSUS1_2016Sep.txt' DELIMITER ',' CSV HEADER NULL '';
But I get this error:
ERROR: invalid input syntax for type date: ""
CONTEXT: COPY cc, line 24, column mcs150_date: ""
********** Error **********
ERROR: invalid input syntax for type date: "" SQL state: 22007
Context: COPY cc, line 24, column mcs150_date: ""
This is probably pretty simple, but none of the solutions I've found online have worked.
You need to specify the QUOTE character so that "" would be interpreted as NULL, like so:
COPY CC FROM 'C:\Users\Owner\Documents\FMCSA Data\FMCSA_CENSUS1_2016Sep.txt' DELIMITER ',' CSV HEADER QUOTE '"' NULL '';
QUOTE '"' was the addition.
Docs: https://www.postgresql.org/docs/current/static/sql-copy.html
I ended up importing as text and then altering the tables according to the correct type.
Just for any future reference.
The docs (https://www.postgresql.org/docs/current/sql-copy.html) say:
NULL
Specifies the string that represents a null value. The default is \N
(backslash-N) in text format, and an unquoted empty string in CSV
format. You might prefer an empty string even in text format for
cases where you don't want to distinguish nulls from empty strings.
This option is not allowed when using binary format.
So remove the quotes around the empty strings in the file to get a NULL value for these empty date values.
Just for future reference, the issue here was probably the date format of the non-null date values. It's common for an MS Excel file saved as CSV to have dates like 01-JUL-16, and PostgreSQL will not know what to do with that during a COPY unless you've first converted it to one of the standard date formats[1], because the date string doesn't match one of the format masks it can handle by default.
That, AND the null handling for null date values.
[1] (and perhaps dealt with the consequences of having a 2-digit year upfront, particularly that years prior to 1969 will be interpreted as 20xx).
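If you do decide to normalize the dates before loading, a small pre-processing step is enough. The sketch below is in Python and makes several assumptions that are not in the question: the helper name to_iso_date is invented, the month abbreviations are English, and the pivot for two-digit years is a parameter you have to choose yourself (70 here is just a placeholder).

```python
MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def to_iso_date(value, pivot=70):
    """Convert 'DD-MON-YY' to 'YYYY-MM-DD'; an empty value is passed through unchanged."""
    if value == "":
        return ""
    day, month, year = value.split("-")
    yy = int(year)
    # Two-digit years at or above the pivot are read as 19xx, the rest as 20xx.
    century = 1900 if yy >= pivot else 2000
    return f"{century + yy:04d}-{MONTHS[month.upper()]:02d}-{int(day):02d}"


print(to_iso_date("19-NOV-13"))
print(to_iso_date("23-JAN-02"))
```

Empty values stay empty, so they can still be loaded as NULL once they are unquoted in the output file.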

How to set the delimiter, Postgresql

I am wondering what the delimiter of this .csv file is. I am trying to import the .csv via the COPY FROM statement, but it always throws an error. When I change the delimiter to E'\t' it throws an error. When I change the delimiter to '|' it throws a different error. I have been trying to import a silly .csv file for 3 days without success. I really need your help. Here is my .csv file: Download here, please
My code on postgresql looks like this:
CREATE TABLE movie
(
imdib varchar NOT NULL,
name varchar NOT NULL,
year integer,
rating float ,
votes integer,
runtime varchar ,
directors varchar ,
actors varchar ,
genres varchar
);
My COPY statement:
COPY movie FROM '/home/max/Schreibtisch/imdb_top100t_2015-06-18.csv' (DELIMITER E'\t', FORMAT CSV, NULL '', ENCODING 'UTF8');
When I use SHOW SERVER_ENCODING it says "UTF8". But why the hell can't Postgres read the data from the columns? I really do not get it. I use 64-bit Ubuntu, the .csv file has all the permissions it needs, and so does PostgreSQL. Please help me.
These are my errors:
ERROR: missing data for column "name"
CONTEXT: COPY movie, line 1: "tt0468569,The Dark Knight,2008,9,1440667,152 mins.,Christopher Nolan,Christian Bale|Heath Ledger|Aar..."
********** Error **********
ERROR: missing data for column "name"
SQL state: 22P04
Context: COPY movie, line 1: "tt0468569,The Dark Knight,2008,9,1440667,152 mins.,Christopher Nolan,Christian Bale|Heath Ledger|Aar..."
Use this command instead; it works fine on Linux as well as on Windows:
\COPY movie(imdib,name,year,rating,votes,runtime,directors,actors,genres) FROM 'D:\test.csv' WITH DELIMITER '|' CSV HEADER;
One more thing: add a header row to your CSV file, as shown below:
imdib|name|year|rating|votes|runtime|directors|actors|genres
tt0111161|The Shawshank Redemption|1994|9.3|1468273|142 mins.|Frank Darabont|Tim Robbins|Morgan Freeman
And use a single-byte delimiter such as ',' or '|'.
Hope this will work for you ..!
The following works for me:
COPY movie (imdib,name,year,rating,votes,runtime,directors,actors,genres)
FROM 'imdb_top100t_2015-06-18.csv'
WITH (format csv, header false, delimiter E'\t', NULL '');
Unfortunately the file is invalid because on line 12011 the column year contains the value 2015 Video and thus the import fails because this can't be converted to an integer. And then further down (line 64155) there is an invalid value NA for the rating which can't be converted to a float and then one more for the votes.
But if you create the table with all varchar columns the above command worked for me.
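To find such rows before running COPY, you can scan the file yourself. A rough sketch in Python, assuming a tab-delimited file without a header; the function name find_bad_integers is made up, and only the value "2015 Video" in the sample comes from the actual file (the rest of that row is invented).

```python
import csv
import re

# Optional sign followed by ASCII digits, with surrounding whitespace tolerated.
INTEGER = re.compile(r"\s*[+-]?[0-9]+\s*")


def find_bad_integers(lines, column, delimiter="\t"):
    """Return (row_number, value) pairs whose value in `column` is not an integer."""
    bad = []
    reader = csv.reader(lines, delimiter=delimiter)
    for number, row in enumerate(reader, start=1):
        value = row[column] if column < len(row) else ""
        # Empty values are skipped: with NULL '' they are loaded as NULL.
        if value != "" and not INTEGER.fullmatch(value):
            bad.append((number, value))
    return bad


sample = [
    "tt0468569\tThe Dark Knight\t2008",
    "tt0000000\tSome Title\t2015 Video",  # invented row carrying the bad year value
]
print(find_bad_integers(sample, column=2))
```

The row numbers count CSV records, so they only match physical line numbers as long as no quoted field contains a line break.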

PG COPY error: invalid input syntax for integer

Running COPY results in ERROR: invalid input syntax for integer: "" error message for me. What am I missing?
My /tmp/people.csv file:
"age","first_name","last_name"
"23","Ivan","Poupkine"
"","Eugene","Pirogov"
My /tmp/csv_test.sql file:
CREATE TABLE people (
age integer,
first_name varchar(20),
last_name varchar(20)
);
COPY people
FROM '/tmp/people.csv'
WITH (
FORMAT CSV,
HEADER true,
NULL ''
);
DROP TABLE people;
Output:
$ psql postgres -f /tmp/sql_test.sql
CREATE TABLE
psql:sql_test.sql:13: ERROR: invalid input syntax for integer: ""
CONTEXT: COPY people, line 3, column age: ""
DROP TABLE
Trivia:
PostgreSQL 9.2.4
ERROR: invalid input syntax for integer: ""
"" isn't a valid integer. PostgreSQL accepts unquoted blank fields as null by default in CSV, but "" would be like writing:
SELECT ''::integer;
and fail for the same reason.
If you want to deal with CSV that has things like quoted empty strings for null integers, you'll need to feed it to PostgreSQL via a pre-processor that can neaten it up a bit. PostgreSQL's CSV input doesn't understand all the weird and wonderful possible abuses of CSV.
Options include:
Loading it in a spreadsheet and exporting sane CSV;
Using the Python csv module, Perl Text::CSV, etc to pre-process it;
Using Perl/Python/whatever to load the CSV and insert it directly into the DB
Using an ETL tool like CloverETL, Talend Studio, or Pentaho Kettle
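As an example of the csv-module option, here is a minimal sketch in Python. The function name normalize_csv is made up. It relies on the fact that Python's csv reader does not distinguish "" from an empty field, and that the writer with QUOTE_MINIMAL emits empty fields unquoted in multi-column rows. Note the side effect: empty strings in text columns also come out unquoted, so they are loaded as NULL too.

```python
import csv
import io


def normalize_csv(text):
    """Re-write CSV text so that empty fields come out unquoted."""
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


raw = (
    '"age","first_name","last_name"\n'
    '"23","Ivan","Poupkine"\n'
    '"","Eugene","Pirogov"\n'
)
print(normalize_csv(raw), end="")
```

The result can be written to a new file and loaded with COPY ... WITH (FORMAT CSV, HEADER true).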
I think it's better to change your csv file like:
"age","first_name","last_name"
23,Ivan,Poupkine
,Eugene,Pirogov
It's also possible to define your table like
CREATE TABLE people (
age varchar(20),
first_name varchar(20),
last_name varchar(20)
);
and after copy, you can convert empty strings:
select nullif(age, '')::int as age, first_name, last_name
from people
Just came across this while looking for a solution and wanted to add I was able to solve the issue by adding the "null" parameter to the copy_from call:
cur.copy_from(f, tablename, sep=',', null='')
I got this error when loading '|' separated CSV file although there were no '"' characters in my input file. It turned out that I forgot to specify FORMAT:
COPY ... FROM ... WITH (FORMAT CSV, DELIMITER '|').
Use the command below to copy data from a CSV file in a single statement, without casting or changing your data types.
Replace "NULL" with whichever string is causing the error in your data:
copy table_name from 'path to csv file' (format csv, null "NULL", DELIMITER ',', HEADER);
I had this same error on a postgres .sql file with a COPY statement, but my file was tab-separated instead of comma-separated and quoted.
My mistake was that I eagerly copy/pasted the file contents from github, but in that process all the tabs were converted to spaces, hence the error. I had to download and save the raw file to get a good copy.
CREATE TABLE people (
first_name varchar(20),
age integer,
last_name varchar(20)
);
"first_name","age","last_name"
Ivan,23,Poupkine
Eugene,,Pirogov
copy people from 'file.csv' with (delimiter ';', null '');
select * from people;
Just in first column.....
Ended up doing this using csvfix:
csvfix map -fv '' -tv '0' /tmp/people.csv > /tmp/people_fixed.csv
In case you know for sure which columns were meant to be integer or float, you can specify just them:
csvfix map -f 1 -fv '' -tv '0' /tmp/people.csv > /tmp/people_fixed.csv
Without specifying the exact columns, one may experience an obvious side-effect, where a blank string will be turned into a string with a 0 character.
this ought to work without you modifying the source csv file:
alter table people alter column age type text;
copy people from '/tmp/people.csv' with csv;
There is a way to have "" (a quoted empty string) treated as NULL in an integer column:
use the FORCE_NULL option:
\copy table_name FROM 'file.csv' with (FORMAT CSV, FORCE_NULL(column_name));
See the PostgreSQL documentation: https://www.postgresql.org/docs/current/static/sql-copy.html
All in Python (using psycopg2): create the empty table first, then use copy_expert to load the CSV into it. It should handle empty values.
import psycopg2
conn = psycopg2.connect(host="hosturl", database="db_name", user="username", password="password")
cur = conn.cursor()
cur.execute("CREATE TABLE schema.destination_table ("
            "age integer, "
            "first_name varchar(20), "
            "last_name varchar(20)"
            ");")
with open(r'C:/tmp/people.csv', 'r') as f:
    next(f)  # Skip the header row. Or remove this line if csv has no header.
    # Use the cursor created above; the connection object has no copy_expert method.
    cur.copy_expert("""COPY schema.destination_table FROM STDIN WITH (FORMAT CSV)""", f)
conn.commit()  # psycopg2 does not autocommit by default
Incredibly, my solution to the same error was to just re-arrange the columns. For anyone else doing the above solutions and still not getting past the error.
I apparently had to arrange the columns in my CSV file to match the order of the columns in the table as listed in pgAdmin.

ERROR: COPY delimiter must be a single one-byte character

I want to load the data from a flat file with delimiter "~,~" into a PostgreSQL table. I have tried it as below but looks like there is a restriction for the delimiter. If COPY statement doesn't allow multiple chars for delimiter, is there any alternative to do this?
metadb=# \COPY public.CME_DATA_STAGE_TRANS FROM 'E:\Infor\Outbound_Marketing\7.2.1\EM\metadata\pgtrans.log' WITH DELIMITER AS '~,~'
ERROR: COPY delimiter must be a single one-byte character
\copy: ERROR: COPY delimiter must be a single one-byte character
If you are using Vertica, you could use E'\t' or U&'\0009':
To indicate a non-printing delimiter character (such as a tab),
specify the character in extended string syntax (E'...'). If your
database has StandardConformingStrings enabled, use a Unicode string
literal (U&'...'). For example, use either E'\t' or U&'\0009' to
specify tab as the delimiter.
Unfortunately there is no way to load a flat file with the multi-character delimiter ~,~ in Postgres, unless you want to modify the source code (and recompile, of course) yourself in some (terrific) way:
/* Only single-byte delimiter strings are supported. */
if (strlen(cstate->delim) != 1)
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("COPY delimiter must be a single one-byte character")));
What you want is to preprocess your input file with some external tool. sed might be the best companion on a GNU/Linux platform, for example:
sed s/~,~/\\t/g inputFile
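If sed is not available (for example on Windows), the same replacement can be sketched in Python. The function name replace_delimiter is made up, and the check for an existing tab is an extra precaution I added, not something sed does.

```python
def replace_delimiter(lines, old="~,~", new="\t"):
    """Yield each line with the multi-character delimiter replaced by a single character."""
    for number, line in enumerate(lines, start=1):
        # If the data already contains the new delimiter, the columns would shift.
        if new in line:
            raise ValueError(f"line {number} already contains the new delimiter")
        yield line.replace(old, new)


sample = ["1~,~foo~,~2\n", "3~,~bar~,~4\n"]
for converted in replace_delimiter(sample):
    print(converted, end="")
```

Written to a new file, the output can then be loaded with \copy using the single-character delimiter.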
The obvious thing to do is what all other answers advised. Edit import file. I would do that, too.
However, as a proof of concept, here are two ways to accomplish this without additional tools.
1) General solution
CREATE OR REPLACE FUNCTION f_import_file(OUT my_count integer)
  RETURNS integer AS
$BODY$
DECLARE
   myfile   text;  -- the file contents are read into this variable
   datafile text := '\path\to\file.txt';  -- !pg_read_file only accepts relative path in database dir!
BEGIN
   myfile := pg_read_file(datafile, 0, 100000000);  -- arbitrary 100 MB max.
   INSERT INTO public.my_tbl
   SELECT ('(' || regexp_split_to_table(replace(myfile, '~,~', ','), E'\n') || ')')::public.my_tbl;
   -- !depending on file format, you might need additional quotes to create a valid format.
   GET DIAGNOSTICS my_count = ROW_COUNT;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE;
This uses a number of pretty advanced features. If anybody is actually interested and needs an explanation, leave a comment to this post and I will elaborate.
2) Special case
If you can guarantee that '~' is only present in the delimiter '~,~', then you can go ahead with a plain COPY in this special case. Just treat the ',' in '~,~' as an additional column.
Say, your table looks like this:
CREATE TABLE foo (a int, b int, c int);
Then you can (in one transaction):
CREATE TEMP TABLE foo_tmp ON COMMIT DROP (
a int, tmp1 "char"
,b int, tmp2 "char"
,c int);
COPY foo_tmp FROM '\path\to\file.txt' WITH DELIMITER AS '~';
ALTER TABLE foo_tmp DROP COLUMN tmp1;
ALTER TABLE foo_tmp DROP COLUMN tmp2;
INSERT INTO foo SELECT * FROM foo_tmp;
Not quite sure if you're looking for a postgresql solution or just a general one.
If it were me, I would open up a copy of vim (or gvim) and run the command :%s/~,~/~/g
That replaces all "~,~" with "~".
You can use a single-character delimiter: open Notepad, press Ctrl+H and replace ~,~ with something that will not interfere with the data, like |

COPY from S3 to Redshift not recognizing newline

I am trying to run a COPY command from an S3 bucket to a Redshift PostgreSQL table, and I am getting the following error (in stl_load_errors):
err_code: 1207
err_reason: Invalid digit, Value '2', Pos 3, Type: Short
raw_field_value:
2
2/28/15
The file has 2 lines:
2/28/15,Phone,Android,0,1,3,2,2
2/28/15,Phone,Android,0,4,1,2,2
The CREATE TABLE code is:
create table aggregate_table( date date, variable varchar(15),source varchar(15), prepaid smallint, direction smallint, total smallint, carrier smallint, carrier_group smallint)
It seems like the newline is not being recognized, and is trying to read the end of the first line and the beginning of the second line as one value. I have tried using delimiter ',' and escape, but nothing seems to work.
Thank you for your help!
Edit: Here's the COPY command (I've also tried it with escape at the end):
COPY aggregate_table FROM 's3://path_to_file.csv' CREDENTIALS 'aws_access_key_id=XXXX;aws_secret_access_key=XXXXX' CSV delimiter ',' DATEFORMAT AS 'MM/DD/YY';
You need to add DATEFORMAT AS 'MM/DD/YY' to your COPY command. Otherwise Redshift cannot parse the date in the first column correctly, as it expects YYYY-MM-DD.
See http://docs.aws.amazon.com/redshift/latest/dg/r_DATEFORMAT_and_TIMEFORMAT_strings.html for more details.
@quarterdome thanks for working through this with me! After you pointed out that it worked, I tried from beginning to end again. It turns out that when I saved the file without a .csv extension, it worked!