I am importing data into a Postgres database. The table I am importing to includes a couple of columns with dates.
The CSV file I am uploading, however, has empty values for some of the date fields.
The table looks like this:
dot_number bigint,
legal_name character varying,
dba_name character varying,
carrier_operation character varying,
hm_flag character varying,
pc_flag character varying,
...
mcs150_date date,
mcs150_mileage bigint,
The data looks like this:
1000045,"GLENN M HINES","","C","N","N","317 BURNT BROW RD","HAMMOND","ME","04730","US","317 BURNT BROW RD","HAMMOND","ME","04730","US","(207) 532-4141","","","19-NOV-13","20000","2012","23-JAN-02","ME","1","2"
1000050,"ROGER L BUNCH","","C","N","N","108 ST CHARLES CT","GLASGOW","KY","42141","US","108 ST CHARLES CT","GLASGOW","KY","42141","US","(270) 651-3940","","","","72000","2001","23-JAN-02","KY","1","1"
I have tried doing this:
COPY CC FROM 'C:\Users\Owner\Documents\FMCSA Data\FMCSA_CENSUS1_2016Sep.txt' DELIMITER ',' CSV HEADER NULL '';
But I get this error:
ERROR: invalid input syntax for type date: "" CONTEXT: COPY cc, line
24, column mcs150_date: ""
********** Error **********
ERROR: invalid input syntax for type date: "" SQL state: 22007
Context: COPY cc, line 24, column mcs150_date: ""
This is probably pretty simple, but none of the solutions I've found online worked.
You need to specify the QUOTE character so that "" is interpreted as NULL, like so:
COPY CC FROM 'C:\Users\Owner\Documents\FMCSA Data\FMCSA_CENSUS1_2016Sep.txt' DELIMITER ',' CSV HEADER QUOTE '"' NULL '';
QUOTE '"' was the addition.
Docs: https://www.postgresql.org/docs/current/static/sql-copy.html
I ended up importing as text and then altering the tables according to the correct type.
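A sketch of that staging approach, assuming the date columns are created as text for the import (the table and column names are taken from the definition above):

```sql
-- Import everything as-is; mcs150_date is text at this point.
COPY cc FROM 'C:\Users\Owner\Documents\FMCSA Data\FMCSA_CENSUS1_2016Sep.txt'
    DELIMITER ',' CSV HEADER;

-- Convert in place: empty strings become NULL, and the rest is parsed
-- with an explicit mask so the DD-MON-YY format is unambiguous.
ALTER TABLE cc
    ALTER COLUMN mcs150_date TYPE date
    USING to_date(nullif(mcs150_date, ''), 'DD-MON-YY');
```

Note that `ALTER COLUMN ... TYPE ... USING` rewrites the whole table, so on a large import this takes a while.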
Just for any future reference.
Docs: https://www.postgresql.org/docs/current/sql-copy.html
says,
NULL
Specifies the string that represents a null value. The default is \N
(backslash-N) in text format, and an unquoted empty string in CSV
format. You might prefer an empty string even in text format for
cases where you don't want to distinguish nulls from empty strings.
This option is not allowed when using binary format.
So remove the quotes around the empty string to obtain a NULL value for these empty date fields.
Just for future reference, the issue here was probably the format of the non-null date values. It's common for an MS Excel file saved as CSV to use a format like 01-JUL-16, but PostgreSQL won't accept that format "out of the box" when doing a COPY: the date string doesn't match any of the format masks it can handle by default, so you'd first need to convert it to one of the standard date formats[1].
That, AND the null handling for null date values.
[1] (and perhaps dealt with the consequences of the two-digit year up front, particularly that two-digit years below 70 will be interpreted as 20xx).
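To see how the cast behaves, one can experiment in psql (the results depend on the DateStyle setting; DMY is an assumption here):

```sql
-- DateStyle controls how the day and year fields are ordered.
-- Two-digit years below 70 land in the 2000s.
SET datestyle = 'ISO, DMY';
SELECT '19-NOV-13'::date;
```

Using to_date with an explicit mask such as 'DD-MON-YY' avoids depending on DateStyle entirely.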
Related
I am trying to copy data from a CSV file into a Postgres table using the following command.
psql -c "\COPY team_cweo.bsa_mobile_pre_retention_asset FROM 'part-00199-8372009a-439d-49e0-9efc-141aead78131-c000.csv' CSV HEADER DELIMITER ','"
The CSV file is the output of a Spark DataFrameWriter. I realized that some fields have null values, which are represented as "" in the CSV file. Because of this I am getting the following error:
ERROR: invalid input syntax for type double precision: ""
CONTEXT: COPY bsa_mobile_pre_retention_asset, line 3, column 6281410000207
How can I make PostgreSQL treat "" as a null value instead of an empty string? Or should I do something in the DataFrameWriter so that null values are represented as something else in the CSV file?
Yes, it would be good if you could choose a different representation for NULL values in the writer, ideally an unquoted empty string. At any rate, the representation cannot contain the quote character (by default "). You can then use the NULL option of COPY, for example NULL '(null)' (the default value is the empty string).
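For example, if the writer can be configured to emit a sentinel such as (null) for null fields (the sentinel value is an assumption; the table and file names are from the question), the load side would be:

```sql
\copy team_cweo.bsa_mobile_pre_retention_asset FROM 'part-00199-8372009a-439d-49e0-9efc-141aead78131-c000.csv' WITH (FORMAT csv, HEADER, NULL '(null)')
```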
If you cannot do that, you could define the column as type text and later convert it with
ALTER TABLE tab
ALTER col TYPE double precision USING CAST (nullif(col, '') AS double precision);
But that requires that the table gets rewritten, which can take a while.
I am trying to read a CSV file containing real numbers with a comma as the decimal separator, using \copy in psql:
\copy table FROM 'filename.csv' DELIMITER ';' CSV HEADER;
psql does not recognize the comma as a decimal point:
psql:filename.sql:44: ERROR: invalid input syntax for type real: "9669,84"
CONTEXT: COPY filename, line 2, column col-3: "9669,84"
I did some googling but could not find any answer other than "change the decimal comma into a decimal point". I tried SET DECIMALSEPARATORCOMMA=ON; but that did not work. I also experimented with some encoding but I couldn't find whether encoding governs the decimal point (I got the impression it didn't).
Is there really no solution other than changing the input data?
COPY into a staging table where the number goes into a varchar field. Then do something like this in psql:
--Temporarily change numeric formatting to one that uses ',' as
--decimal separator.
set lc_numeric = "de_DE.UTF-8";
--Below is just an example. In your case the select would be part of
--insert into the target table. Also the first part of to_number
--would be the field from your staging table.
select to_number('9669,84', '99999D999');
9669.84
You might need to change the format string to match all the numbers. For more information on what is available see Data formatting Table 9.28. Template Patterns for Numeric Formatting.
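Putting the staging step and the conversion together might look like this (the staging and target table and column names are hypothetical):

```sql
-- Read ',' as the decimal separator; the locale name must exist
-- on the server (de_DE.UTF-8 here, as above).
SET lc_numeric = 'de_DE.UTF-8';

-- Move the raw varchar values into the properly typed target column.
INSERT INTO target_table (col3)
SELECT to_number(raw_col3, '99999D999')
FROM staging_table;
```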
I'm trying to use copy to copy a large csv file into a postgres table.
A certain integer column is primarily null. In the csv file, this column just has "".
Every column is quoted, which doesn't seem to be an issue for other columns.
I get this error when I try to copy it:
ERROR: invalid input syntax for integer: ""
I tried setting a NULL clause to '' and "" in my copy statement. '' does nothing, "" generates an error:
zero-length delimited identifier at or near """"
I tried using sed to change all "" to " ", but that still doesn't work even when I set the null clause to " ". I still get
ERROR: invalid input syntax for integer: " "
For now I am able to proceed by sed'ing the column to -1. I don't really care about this column much anyway. I'd be OK with just setting it to null, or ignoring it, but when I tried to take it out of the column definition section of the copy command, postgres yelled at me.
So my question comes down to this: how can I tell postgres to treat "" as a null value?
Thank you.
The typical way to indicate a missing value (null) in a .csv file is to just put nothing into that field. For instance, if you have three columns (A, B and C) and there is no value for B, the .csv file would contain "Col A value",,"Col C value". "" is a string value, not a numeric value, so there's no way for it to be considered one.
This is what the force_null option is for:
Match the specified columns' values against the null string, even if it has been quoted, and if a match is found set the value to NULL.
So assuming the name of the int column is "y":
\copy foo from foo.csv with (format csv, force_null (y));
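The long form of COPY takes the same option and can combine it with an explicit NULL string (the column names and file path here are hypothetical, following the answer above):

```sql
COPY foo (x, y, z)
FROM '/path/to/foo.csv'
WITH (FORMAT csv, HEADER, NULL '', FORCE_NULL (y));
```

Note that FORCE_NULL is only allowed in COPY FROM with CSV format.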
For some reason PostgreSQL won't read my CSV files in the form:
2017-10-20T21:20:00,124.502,CAM[CR][LF]
2017-10-20T21:21:00,124.765,CAM[CR][LF]
(that's an ISO-compliant timestamp, right?) into a table defined as:
CREATE TABLE ext_bsrn.spa_temp (
spadate TIMESTAMP WITHOUT TIME ZONE,
spa_azimuth NUMERIC,
station_id CHAR(3) )
WITH (oids = false);
It returns this error:
ERROR: invalid input syntax for type timestamp: "?2015-01-01T00:00:00"
CONTEXT: COPY spa_temp, line 1, column spadate: "?2015-01-01T00:00:00"
I don't understand why the '?' is shown inside the quotes in the error message; there are no characters before 2015 in my file (checked in Notepad++ with non-printing characters shown).
I tried both Windows (CRLF) and Unix (LF) line endings, but neither makes any difference.
I also tried separate date & time columns, but then it just throws a similar error for the date field: "invalid input syntax for type date".
Does line 1 mean the first line or the second line (if there is a Line 0)?
I am using pgloader to import from a .csv file which has empty strings in double quotes. A sample line is
12334,0,"MAIL","CA","","Sanfransisco","TX","","",""
After a successful import, the fields that have double quotes ("") show up as two single quotes ('') in the Postgres database.
Is there a way to insert a NULL or even an empty string in place of the two single quotes ('')?
I am using the arguments -
WITH truncate,
fields optionally enclosed by '"',
fields escaped by double-quote,
fields terminated by ','
SET client_encoding to 'UTF-8',
work_mem to '12MB',
standard_conforming_strings to 'on'
I tried using 'empty-string-to-null' mentioned in the documentation like this -
CAST column enumerate.fax using empty-string-to-null
But it gives me an error saying -
pgloader nph_opr_addr.test.load An unhandled error condition has been
signalled: At LOAD CSV
^ (Line 1, Column 0, Position 0) Could not parse subexpression ";"
when parsing
Use the field option:
null if blanks
Something like this:
...
having fields foo, bar, mynullcol null if blanks, baz
From the documentation:
null if
This option takes an argument which is either the keyword blanks or a double-quoted string.
When blanks is used and the field value that is read contains only space characters, then it's automatically converted to an SQL NULL value.
When a double-quoted string is used and that string is read as the field value, then the field value is automatically converted to an SQL NULL value.