PostgreSQL invalid input syntax with Windows CSV files

For some reason PostgreSQL won't read my CSV files in the form:
2017-10-20T21:20:00,124.502,CAM[CR][LF]
2017-10-20T21:21:00,124.765,CAM[CR][LF]
(that's an ISO-compliant timestamp, right?) into a table defined as:
CREATE TABLE ext_bsrn.spa_temp (
spadate TIMESTAMP WITHOUT TIME ZONE,
spa_azimuth NUMERIC,
station_id CHAR(3) )
WITH (oids = false);
It returns this error:
ERROR: invalid input syntax for type timestamp: "?2015-01-01T00:00:00"
CONTEXT: COPY spa_temp, line 1, column spadate: "?2015-01-01T00:00:00"
I don't understand why the '?' is shown inside the quotes in the error message; there are no characters before 2015 in my file (I checked it in Notepad++ with non-printing characters shown).
I tried both Windows (CRLF) and Unix (LF) line endings, but neither makes any difference.
I also tried separate date and time columns, but then it just throws a similar error for the date field: "invalid input syntax for type date".
Does line 1 mean the first line or the second line (if there is a Line 0)?
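The thread does not resolve this question. One common cause of an invisible leading character on the first line of a Windows-generated file, offered here as an assumption rather than a confirmed diagnosis, is a UTF-8 byte-order mark (bytes EF BB BF) at the start of the file, which some editors do not display. A minimal Python sketch to check for it and strip it follows; the function names and the sample bytes are illustrative only.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data

# Hypothetical sample: the first row of a CSV saved with a BOM and CRLF endings.
sample = codecs.BOM_UTF8 + b"2015-01-01T00:00:00,124.502,CAM\r\n"
cleaned = strip_utf8_bom(sample)
print(has_utf8_bom(sample), has_utf8_bom(cleaned))
```

For a real file, the bytes would come from reading it in binary mode; alternatively, decoding with Python's "utf-8-sig" codec drops the mark if it is there.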

Related

Getting ERROR: invalid input syntax for type double precision: "" in PostgreSQL

I am trying to copy data from a CSV file to a Postgres table using the following command.
psql -c "\COPY team_cweo.bsa_mobile_pre_retention_asset FROM 'part-00199-8372009a-439d-49e0-9efc-141aead78131-c000.csv' CSV HEADER DELIMITER ','"
The CSV file is the result of a Spark DataFrameWriter. I realized that some fields have null values, which are represented as "" in the CSV file. Because of this I am getting the following error:
ERROR: invalid input syntax for type double precision: ""
CONTEXT: COPY bsa_mobile_pre_retention_asset, line 3, column 6281410000207
How can I make PostgreSQL treat "" as a null value instead of an empty string? Or should I change something in the DataFrameWriter so that null values are represented as something else in the CSV file?
Yes, it would be good if you could choose a different representation for NULL values, ideally an unquoted empty string. At any rate, it cannot contain the escape character (by default "). You can then use the NULL option of COPY, for example NULL '(null)' (the default in CSV format is the unquoted empty string).
If you cannot do that, you could define the column as type text and later convert it with
ALTER TABLE tab
ALTER col TYPE double precision USING CAST (nullif(col, '') AS double precision);
But that requires that the table gets rewritten, which can take a while.
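If changing the Spark output is not an option, another route (a sketch, not something proposed in this thread) is to rewrite the file before loading so that empty fields are written without quotes, which COPY in CSV format reads as NULL by default. The function name below is hypothetical. The approach turns every empty field into a NULL, so it is unsuitable when empty strings and NULLs must stay distinct.

```python
import csv
import io

def unquote_empty_fields(src, dst):
    """Copy CSV rows from src to dst, writing empty fields without quotes."""
    reader = csv.reader(src)
    # QUOTE_MINIMAL only quotes fields that contain special characters,
    # so an empty field comes out as nothing between the delimiters.
    writer = csv.writer(dst, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for row in reader:
        writer.writerow(row)

# Illustrative input: the middle field of the first data row is a quoted empty string.
source = io.StringIO('id,value,name\n1,"","a,b"\n2,3.5,"c"\n')
target = io.StringIO()
unquote_empty_fields(source, target)
print(target.getvalue())
```

With real files, the two StringIO objects would be replaced by files opened with newline="" as the csv module documentation recommends.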

postgres csv date null import error

I am importing data into a Postgres database. The table I am importing to includes a couple of columns with dates.
The CSV file I am uploading, however, has empty values for some of the date fields.
The table looks like this:
dot_number bigint,
legal_name character varying,
dba_name character varying,
carrier_operation character varying,
hm_flag character varying,
pc_flag character varying,
...
mcs150_date date,
mcs150_mileage bigint,
The data looks like this:
1000045,"GLENN M HINES","","C","N","N","317 BURNT BROW RD","HAMMOND","ME","04730","US","317 BURNT BROW RD","HAMMOND","ME","04730","US","(207) 532-4141","","","19-NOV-13","20000","2012","23-JAN-02","ME","1","2"
1000050,"ROGER L BUNCH","","C","N","N","108 ST CHARLES CT","GLASGOW","KY","42141","US","108 ST CHARLES CT","GLASGOW","KY","42141","US","(270) 651-3940","","","","72000","2001","23-JAN-02","KY","1","1"
I have tried doing this:
COPY CC FROM 'C:\Users\Owner\Documents\FMCSA Data\FMCSA_CENSUS1_2016Sep.txt' DELIMITER ',' CSV HEADER NULL '';
But I get this error:
ERROR: invalid input syntax for type date: ""
CONTEXT: COPY cc, line 24, column mcs150_date: ""
********** Error **********
ERROR: invalid input syntax for type date: ""
SQL state: 22007
Context: COPY cc, line 24, column mcs150_date: ""
This is probably pretty simple, but none of the solutions I've found online worked.
You need to specify the QUOTE character so that "" would be interpreted as NULL, like so:
COPY CC FROM 'C:\Users\Owner\Documents\FMCSA Data\FMCSA_CENSUS1_2016Sep.txt' DELIMITER ',' CSV HEADER QUOTE '"' NULL '';
QUOTE '"' was the addition.
Docs: https://www.postgresql.org/docs/current/static/sql-copy.html
I ended up importing as text and then altering the tables according to the correct type.
Just for future reference, the docs (https://www.postgresql.org/docs/current/sql-copy.html) say:
NULL
Specifies the string that represents a null value. The default is \N
(backslash-N) in text format, and an unquoted empty string in CSV
format. You might prefer an empty string even in text format for
cases where you don't want to distinguish nulls from empty strings.
This option is not allowed when using binary format.
So remove the quotes around the empty strings to get NULL values for these empty date fields.
Just for future reference, the issue here was probably the date format of the non-null date values. It's common for an MS Excel file saved to CSV to have that format, 01-JUL-16, but PostgreSQL won't be able to accept it "out of the box" when doing a COPY unless you've first converted it to one of the standard date formats[1], because the date string doesn't match one of the format masks that it can handle by default.
That, AND the null handling for null date values.
[1] (and perhaps dealt with the consequences of having a 2-digit year upfront, particularly that years prior to 1969 will be interpreted as 20xx).
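As a rough illustration of the conversion that answer suggests, the following Python sketch turns DD-MON-YY strings into ISO dates before loading. The function name, the month table, and the pivot year of 70 are assumptions made for this example, not something stated in the thread; the pivot in particular should be chosen to fit the data.

```python
import datetime

MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}

def to_iso_date(value, pivot=70):
    """Convert 'DD-MON-YY' to 'YYYY-MM-DD'; return '' for an empty value.

    Two-digit years below `pivot` become 20YY, the rest become 19YY.
    The default pivot of 70 is an assumption for this sketch.
    """
    if value == "":
        return ""
    day, mon, year = value.split("-")
    full_year = (2000 if int(year) < pivot else 1900) + int(year)
    return datetime.date(full_year, MONTHS[mon.upper()], int(day)).isoformat()

# Values taken from the sample rows in the question, plus an empty field.
for raw in ["19-NOV-13", "23-JAN-02", ""]:
    print(repr(to_iso_date(raw)))
```

Making the century rule an explicit parameter avoids relying on whatever default a parser applies to two-digit years.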

How to set the delimiter, Postgresql

I am wondering what the delimiter of this .csv file is. I am trying to import the .csv via the COPY FROM statement, but it always throws an error. When I change the delimiter to E'\t' it throws an error. When I change the delimiter to '|' it throws a different error. I have been trying to import a silly .csv file for 3 days without success. I really need your help. Here is my .csv file: Download here, please
My code in PostgreSQL looks like this:
CREATE TABLE movie
(
imdib varchar NOT NULL,
name varchar NOT NULL,
year integer,
rating float ,
votes integer,
runtime varchar ,
directors varchar ,
actors varchar ,
genres varchar
);
My COPY statement:
COPY movie FROM '/home/max/Schreibtisch/imdb_top100t_2015-06-18.csv' (DELIMITER E'\t', FORMAT CSV, NULL '', ENCODING 'UTF8');
When I use SHOW SERVER_ENCODING it says "UTF8". But why the hell can't PostgreSQL read the data from the columns? I really do not get it. I use Ubuntu 64 bit; the .csv file has all the permissions it needs, and so does PostgreSQL. Please help me.
These are my errors:
ERROR: missing data for column "name"
CONTEXT: COPY movie, line 1: "tt0468569,The Dark Knight,2008,9,1440667,152 mins.,Christopher Nolan,Christian Bale|Heath Ledger|Aar..."
********** Error **********
ERROR: missing data for column "name"
SQL state: 22P04
Context: COPY movie, line 1: "tt0468569,The Dark Knight,2008,9,1440667,152 mins.,Christopher Nolan,Christian Bale|Heath Ledger|Aar..."
Use this code instead; it works fine on Linux as well as on Windows:
\COPY movie(imdib,name,year,rating,votes,runtime,directors,actors,genres) FROM 'D:\test.csv' WITH DELIMITER '|' CSV HEADER;
One more thing: insert a header line in your CSV file, as shown below:
imdib|name|year|rating|votes|runtime|directors|actors|genres
tt0111161|The Shawshank Redemption|1994|9.3|1468273|142 mins.|Frank Darabont|Tim Robbins|Morgan Freeman
Also, use a single-byte delimiter such as ',' or '|'.
Hope this will work for you ..!
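Before settling on a DELIMITER value, it can help to count how often each candidate character occurs in a line of the file. The Python sketch below does that for the truncated line shown in the error message of the question; the candidate list and function names are illustrative, and the most frequent character is only a heuristic, not proof of the real delimiter.

```python
CANDIDATES = [",", "\t", "|", ";"]

def count_delimiters(line, candidates=CANDIDATES):
    """Return a dict mapping each candidate delimiter to its count in line."""
    return {d: line.count(d) for d in candidates}

def most_frequent_delimiter(line, candidates=CANDIDATES):
    """Return the candidate that occurs most often in line (a heuristic only)."""
    counts = count_delimiters(line, candidates)
    return max(candidates, key=lambda d: counts[d])

# Truncated line copied from the error message in the question.
line = ("tt0468569,The Dark Knight,2008,9,1440667,152 mins.,"
        "Christopher Nolan,Christian Bale|Heath Ledger|Aar...")
counts = count_delimiters(line)
print(counts)
print(repr(most_frequent_delimiter(line)))
```

Running the same count over several lines of the actual file gives a more reliable picture than a single truncated line.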
The following works for me:
COPY movie (imdib,name,year,rating,votes,runtime,directors,actors,genres)
FROM 'imdb_top100t_2015-06-18.csv'
WITH (format csv, header false, delimiter E'\t', NULL '');
Unfortunately the file is invalid because on line 12011 the column year contains the value 2015 Video and thus the import fails because this can't be converted to an integer. And then further down (line 64155) there is an invalid value NA for the rating which can't be converted to a float and then one more for the votes.
But if you create the table with all varchar columns, the above command works.
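Values like the ones described in this answer can be located before running COPY. The following Python sketch reports fields in one column that fail a conversion; the function name and the sample rows are made up for illustration, and it assumes one record per physical line with the column present in every row.

```python
import csv
import io

def find_bad_values(src, column, convert, delimiter=","):
    """Yield (line_number, value) for fields that convert() rejects.

    Line numbers are 1-based and assume one record per physical line.
    Empty fields are skipped, since they would be loaded as NULL.
    """
    reader = csv.reader(src, delimiter=delimiter)
    for line_number, row in enumerate(reader, start=1):
        value = row[column]
        if value == "":
            continue
        try:
            convert(value)
        except ValueError:
            yield line_number, value

# Made-up rows imitating the kinds of problems described in the answer.
sample = io.StringIO(
    "tt1\tMovie A\t2008\t9.0\n"
    "tt2\tMovie B\t2015 Video\t8.1\n"
    "tt3\tMovie C\t1994\tNA\n"
)
print(list(find_bad_values(sample, 2, int, delimiter="\t")))
```

Passing float as the converter and the rating column index checks that column in the same way.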

DBASE Command Line to Replace a Year in Date Field

I am working in a software program called PastPerfect that has a "command window" where it says you can use dbase commands to do global updates to the program's dbf files.
THE PROBLEM: a user accidentally entered the wrong year, "1901", in a date field across multitudes of records and it needs to be replaced/fixed with the year "2001".
I have tried:
REPLACE YEAR(catdate) WITH 2001 FOR YEAR(catdate)=1901
and it keeps telling me it is an Invalid Command
Can somebody give me the correct dbase/foxpro syntax to replace all the years that are 1901 with 2001?
The syntax for the REPLACE command is:
REPLACE FieldName WITH Value FOR BooleanExpression
If CATDATE is a date field (no time), then
REPLACE catdate WITH DATE(2001, MONTH(catdate), DAY(catdate)) FOR YEAR(catdate) = 1901
If CATDATE is a date time field, then
REPLACE catdate WITH DATETIME(2001, MONTH(catdate), DAY(catdate), HOUR(catdate), MINUTE(catdate), SEC(catdate)) FOR YEAR(catdate) = 1901
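The logic of those REPLACE commands (keep month, day, and time; change only the year, and only for matching records) can be sketched outside dBase as well. The Python below is purely an illustration of that logic with made-up records and a hypothetical function name; it does not touch dbf files.

```python
import datetime

def fix_year(value, wrong=1901, right=2001):
    """Return value with its year changed from wrong to right, else unchanged."""
    if value.year == wrong:
        # replace() keeps the other fields; it raises ValueError if the result
        # is not a valid date (for example Feb 29 moved to a non-leap year).
        return value.replace(year=right)
    return value

# Made-up records: one with the wrong year, one that should stay untouched.
records = [datetime.date(1901, 3, 15), datetime.date(1998, 7, 4)]
fixed = [fix_year(d).isoformat() for d in records]
print(fixed)
```

The same function works for datetime.datetime values, mirroring the date and datetime variants of the command.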

Postgres CSV copy statement

Hi guys, I have a question, or rather a question about an error.
My function in Postgres looks like:
CREATE TEMP TABLE aljazeera
(
date character varying,
channel_name character varying,
date1 character varying,
start character varying,
program character varying,
sub character varying,
epis_title character varying
) ;
COPY aljazeera FROM '/opt/transcode/data/epg/output/CSV/ALJAZEERA/pre_g_1.csv' WITH (FORMAT CSV, DELIMITER ',', QUOTE '"');
And my file looks like:
http://www.speedyshare.com/bNcjM/pre-g-1.csv
(WARNING: possible malware, certainly nasty bundleware on download link. Only use the top link Download: pre-g-1.csv).
When I try to load it into the table, I get this error:
ERROR: unterminated CSV quoted field
CONTEXT: COPY aljazeera, line 175: "2013-12-24,02:00:00","KRAJ PROGRAMA",,,,
2013-12-24,07:30:00,"Sportski magazin (R)",,"Sportski doga..."
I don't know where the problem is. Any advice on this issue?
Once I eventually downloaded the file without that nasty download manager I could reproduce the error:
craig=> \copy aljazeera FROM 'pre_g_1.csv' WITH (FORMAT CSV, DELIMITER ',', QUOTE '"')
ERROR: unterminated CSV quoted field
CONTEXT: COPY aljazeera, line 175: "2013-12-24,02:00:00","KRAJ PROGRAMA",,,,
2013-12-24,07:30:00,"Sportski magazin (R)",,"Sportski doga..."
The error isn't actually on line 175. It is that some prior line has an unbalanced quote. By binary search it was easy to narrow that down to line 26:
2013-12-24,02:00:00","KRAJ PROGRAMA",,,,
I'm sure you can see what the problem is there. You've got a stray quote in the date.
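As an alternative to a manual binary search, lines with an odd number of quote characters can be flagged directly. This is a heuristic sketch: the function name is hypothetical, the first and third sample lines are invented, and a legitimately quoted field that spans several lines would also be flagged.

```python
def lines_with_odd_quotes(lines, quote='"'):
    """Return 1-based numbers of lines containing an odd number of quote chars."""
    return [n for n, line in enumerate(lines, start=1)
            if line.count(quote) % 2 == 1]

sample = [
    '2013-12-24,01:00:00,"Vijesti",,,,',                  # invented, balanced
    '2013-12-24,02:00:00","KRAJ PROGRAMA",,,,',           # the line from the answer
    '2013-12-24,07:30:00,"Sportski magazin (R)",,"Sportski dogadjaji",,',  # invented
]
print(lines_with_odd_quotes(sample))
```

Doubled quotes used as escapes inside a field add two to the count, so they do not trigger a false report.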
BTW, if you ever need to link to chunks of text, you can use http://gist.github.com/, http://pastebin.com/, http://pastebin.ca/, etc. (For PostgreSQL query plans http://explain.depesz.com/ is best). That speedyshare thing is nasty.