Example CSV line:
"2012","Test User","ABC","First","71.0","","","0","0","3","3","0","0","","0","","","","","0.1","","4.0","0.1","4.2","80.8","847"
All values after "First" are numeric columns. Lots of the NULLs just come through as quoted empty strings ("").
Attempt at COPY:
copy mytable from 'myfile.csv' with csv header quote '"';
NOPE: ERROR: invalid input syntax for type numeric: ""
Well, yeah. It's a null value. Attempt 2 at COPY:
copy mytable from 'myfile.csv' with csv header quote '"' null '""';
NOPE: ERROR: CSV quote character must not appear in the NULL specification
What's a fella to do? Strip out all double quotes from the file before running COPY? Can do that, but I figured there's a proper solution to what must be an incredibly common problem.
While some database products treat an empty string as a NULL value, the standard says that they are distinct, and PostgreSQL treats them as distinct.
It would be best if you could generate your CSV file with an unambiguous representation. While you could use sed or something to filter the file into good format, the other option would be to COPY the data into a table where a text column could accept the empty strings, and then populate the target table. The NULLIF function may help with that: http://www.postgresql.org/docs/9.1/interactive/functions-conditional.html#FUNCTIONS-NULLIF -- it returns NULL if both arguments match, and the first argument if they don't. So something like NULLIF(txtcol, '')::numeric might work for you.
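A minimal sketch of that staging approach, with hypothetical column names and only the first few columns shown:
-- staging table: every column is text, so the quoted empty strings load without complaint
create table mytable_staging (yr text, name text, code text, place text, v1 text /* ... one text column per CSV column ... */);
\copy mytable_staging from 'myfile.csv' with csv header
-- populate the real table, turning empty strings into NULL along the way
insert into mytable (yr, name, code, place, v1 /* ... */)
select yr, name, code, place, nullif(v1, '')::numeric /* ... */
from mytable_staging;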
As an alternative, running
sed 's/""//g' myfile.csv > myfile-formatted.csv
and then, in psql:
# copy mytable from 'myfile-formatted.csv' with csv header;
works as well.
I think all you need to do here is the following:
COPY mytable FROM '/dir/myfile.csv' WITH (FORMAT csv, HEADER, DELIMITER ',', NULL '');
An unquoted empty string is already the default NULL in CSV format, but the empty values in your file are quoted, so add FORCE_NULL (PostgreSQL 9.4+) to have quoted empty strings converted to NULL as well (FORCE QUOTE only applies to COPY TO):
COPY mytable FROM '/dir/myfile.csv'
WITH (FORMAT csv, HEADER, DELIMITER ',', NULL '', FORCE_NULL (col1, col2, col3));
On PostgreSQL 17 and later, FORCE_NULL * covers every column.
This worked for me in Python 3.8.X
import psycopg2
import csv
from io import StringIO
# connection parameters (t_host, t_port, etc.) are defined elsewhere
db_conn = psycopg2.connect(host=t_host, port=t_port,
                           dbname=t_dbname, user=t_user, password=t_pw)
cur = db_conn.cursor()
csv.register_dialect('myDialect',
                     delimiter=',',
                     skipinitialspace=True,
                     quoting=csv.QUOTE_MINIMAL)
with open('files/emp.csv') as f:
    next(f)  # skip the header row
    reader = csv.reader(f, dialect='myDialect')
    buffer = StringIO()
    writer = csv.writer(buffer, dialect='myDialect')
    writer.writerows(reader)
buffer.seek(0)
cur.copy_from(buffer, 'personnes', sep=',',
              columns=('nom', 'prenom', 'telephone', 'email'))
db_conn.commit()
I have several CSVs with varying field names that I am copying into a Postgres database from an S3 data source. There are quite a few of them that contain empty strings, "", which I would like to convert to NULLs at import. When I attempt to copy, I get an error along these lines (same issue for other data types, integer, etc.):
psycopg2.errors.InvalidDatetimeFormat: invalid input syntax for type date: ""
I have tried using FORCE_NULL (field1, field2, field3), and this works for me, except I would like to do FORCE_NULL (*) and apply it to all of the columns, as I have A LOT of fields I am bringing in that I'd like this applied to.
Is this available?
This is an example of my csv:
"ABC","tgif","123","","XyZ"
Use psycopg2's COPY functions, in this case copy_expert:
cat empty_str.csv
1, ,3,07/22/2022
2,test,4,
3,dog,,07/23/2022
create table empty_str_test(id integer, str_fld varchar, int_fld integer, date_fld date);
import psycopg2
con = psycopg2.connect("dbname=test user=postgres host=localhost port=5432")
cur = con.cursor()
with open("empty_str.csv") as csv_file:
    cur.copy_expert("COPY empty_str_test FROM STDIN WITH csv", csv_file)
con.commit()
select * from empty_str_test ;
id | str_fld | int_fld | date_fld
----+---------+---------+------------
1 | | 3 | 2022-07-22
2 | test | 4 |
3 | dog | | 2022-07-23
From the COPY docs:
NULL
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings. This option is not allowed when using binary format.
copy_expert allows you to specify the CSV format; if you use copy_from, it will use the text format.
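Since the CSVs in the question have their empty strings quoted (""), the command string could also spell out the CSV options and add FORCE_NULL (PostgreSQL 9.4+) to turn quoted empty strings back into NULL for the listed columns. A sketch against the empty_str_test table above, runnable directly in psql:
COPY empty_str_test FROM STDIN WITH (FORMAT csv, FORCE_NULL (str_fld, int_fld, date_fld));
4,"",5,""
\.
The same COPY ... FROM STDIN command string can be passed to cur.copy_expert() with the open file as the second argument.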
I have a ton of CSV files that I'm trying to import into Postgres. The CSV data is all quoted regardless of what the data type is. Here's an example:
"3971","14","34419","","","","","6/25/2010 9:07:02 PM","70.21.238.46 "
The first 4 columns are supposed to be integers. Postgres handles the cast from the string "3971" to the integer 3971 correctly, but it pukes at the empty string in the 4th column.
PG::InvalidTextRepresentation: ERROR: invalid input syntax for type integer: ""
This is the command I'm using:
copy "mytable" from '/path/to/file.csv' with delimiter ',' NULL as '' csv header
Is there a proper way to tell Postgres to treat empty strings as null?
Here's how to do this. Since I'm working in psql and using a file that the server user can't reach, I use \copy, but the principle is the same:
create table csv_test(col1 integer, col2 integer);
cat csv_test.csv
"1",""
"","2"
\copy csv_test from '/home/aklaver/csv_test.csv' with (format 'csv', force_null (col1, col2));
COPY 2
select * from csv_test ;
col1 | col2
------+------
1 | NULL
NULL | 2
How can I concatenate a string inside of a concatenated jsonb object in postgresql? In other words, I am using the JSONb concatenate operator as well as the text concatenate operator in the same query and running into trouble.
Or... if there is a totally different query I should be executing, I'd appreciate hearing suggestions. The goal is to update a row containing a jsonb column. We don't want to overwrite existing key value pairs in the jsonb column that are not provided in the query and we also want to update multiple rows at once.
My query:
update contacts as c set data = data || '{"geomatch": "MATCH","latitude":'||v.latitude||'}'
from (values (16247746,40.814140),
(16247747,20.900840),
(16247748,20.890570)) as v(contact_id,latitude) where c.contact_id = v.contact_id
The Error:
ERROR: invalid input syntax for type json
LINE 85: update contacts as c set data = data || '{"geomatch": "MATCH...
^
DETAIL: The input string ended unexpectedly.
CONTEXT: JSON data, line 1: {"geomatch": "MATCH","latitude":
SQL state: 22P02
Character: 4573
You might be looking for
SET data = data || ('{"geomatch": "MATCH","latitude":'||v.latitude||'}')::jsonb
-- the first || is jsonb concatenation; the other two are text concatenation
but that's not how one should build JSON objects - that v.latitude might not be a valid JSON literal, or even contain some injection like "", "otherKey": "oops". (Admittedly, in your example you control the values, and they're numbers so it might be fine, but it's still a bad practice). Instead, use jsonb_build_object:
SET data = data || jsonb_build_object('geomatch', 'MATCH', 'latitude', v.latitude)
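Plugged into the rest of the statement from the question, that looks like:
update contacts as c
set data = data || jsonb_build_object('geomatch', 'MATCH', 'latitude', v.latitude)
from (values (16247746,40.814140),
             (16247747,20.900840),
             (16247748,20.890570)) as v(contact_id,latitude)
where c.contact_id = v.contact_id;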
There are two problems. The first is operator precedence preventing your concatenation of a jsonb object to what is read as a text object. The second is that the concatenation of text pieces requires a cast to jsonb.
This should work:
update contacts as c
set data = data || ('{"geomatch": "MATCH","latitude":'||v.latitude||'}')::jsonb
from (values (16247746,40.814140),
(16247747,20.900840),
(16247748,20.890570)) as v(contact_id,latitude)
where c.contact_id = v.contact_id
;
I'm trying to import data into a table with a jsonb column type, using a csv. I've read the csv specs that say any column value containing double quotes needs to:
be wrapped in quotes (double quotes at beginning and end)
have its double quotes escaped with a double quote (so if you want a double quote, you must use 2 double quotes instead of just 1)
My csv column value for the jsonb type looks like this (shortened for brevity):
"[
{
""day"": 0,
""schedule"": [
{
""open"": ""07:00"",
""close"": ""12:00""
}
]
}
]"
Note: I opened this csv in Notepad++ in case the editor was doing any special escaping, and all quotes are as they appear in the editor.
Now I was curious about what the QUOTE and ESCAPE values were in that PGAdmin error message, so here they are, copied and pasted:
QUOTE '\"'
ESCAPE '''';""
To upload to PGAdmin, do I need to use \" to around each json token as (possibly?) suggested by that QUOTE value in the error message?
I'm using Go's encoding/csv package to write the csv.
I can load your file into a json or jsonb typed column using:
copy j from '/tmp/foo.csv' csv;
or
copy j from '/tmp/foo.csv' with (format csv);
or their \copy equivalents.
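Here j is just assumed to be a table with a single json or jsonb column, for example:
create table j (data jsonb);
with /tmp/foo.csv holding exactly the quoted value shown in the question.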
Based on your truncated (incomplete) text-posted-as-image, it is hard to tell what you are actually doing. But if you do it right, it will work.
The easiest workaround I've found is to copy the json data into a text column in a temporary staging table.
Then issue a query that follows the pattern:
insert into mytable (...) select ..., json_txtcol::json from staging_table
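Spelled out as a sketch, with hypothetical table and column definitions:
create table staging_table (id integer, json_txtcol text);
create table mytable (id integer, data json);
\copy staging_table from 'myfile.csv' with (format csv)
-- the CSV parser has already collapsed the doubled quotes, so the text column holds valid JSON
insert into mytable (id, data)
select id, json_txtcol::json from staging_table;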
You can process it through another command before PostgreSQL receives the data, replacing the double double-quotes with an escaped double-quote.
For example:
COPY tablename(col1, col2, col3)
FROM PROGRAM $$sed 's/""/\\"/g' myfile.csv$$
DELIMITER ',' ESCAPE '\' CSV HEADER;
Here's a working example:
/tmp/input.csv contains:
Clive Dunn, "[ { ""day"": 0, ""schedule"": [{""open"": ""07:00"", ""close"": ""12:00""}]}]", 3
In psql (but should work in PgAdmin):
postgres=# CREATE TABLE test (person text, examplejson jsonb, num int);
CREATE TABLE
postgres=# COPY test (person, examplejson, num) FROM PROGRAM $$sed 's/""/\\"/g' /tmp/input.csv$$ CSV DELIMITER ',' ESCAPE '\';
COPY 1
postgres=# SELECT * FROM test;
person | examplejson | num
------------+-----------------------------------------------------------------+-----
Clive Dunn | [{"day": 0, "schedule": [{"open": "07:00", "close": "12:00"}]}] | 3
(1 row)
Disclosure: I am an EnterpriseDB (EDB) employee.
I'm copying data from a CSV file into a PostgreSQL table using COPY.
My CSV file is simply:
0\"a string"
And my table "Test" was created by the following:
create table test (
id integer,
data jsonb
);
My copy statement was the following, and I received this error:
williazz=# \copy test from 'test/test.csv' delimiters '\' CSV
ERROR: invalid input syntax for type json
DETAIL: Token "a" is invalid.
CONTEXT: JSON data, line 1: a...
COPY test, line 1, column data: "a string"
Interestingly, when I changed the value in my CSV file to a number, it had no problem.
CSV:
0\1505
williazz=# \copy test from 'test/test.csv' delimiters '\' CSV
COPY 1
williazz=# select * from test;
id | data
----+------
0 | 1505
(1 row)
Furthermore, numbers in arrays also work:
CSV:
1\[0,1,2,3,4,5]
williazz=# select * from test;
id | data
----+---------------
0 | 1505
1 | [0,1,2,3,4,5]
(2 rows)
But as soon as I introduce a non-digit string into the JSON, the COPY stops working:
0\[1,2,"three",4,5]
ERROR: invalid input syntax for type json
DETAIL: Token "three" is invalid.
CONTEXT: JSON data, line 1: [1, 2, three...
COPY test, line 1, column data: "[1, 2, three, 4, 5]"
I cannot get Postgres to read a non-digit string in JSON format. I've also tried changing the data type of column "data" from jsonb to json, and using basically every combination of single and double quotes.
Could someone please help me identify the problem? Thank you
Your file is CSV encoded, so it does not mean what you think it does.
0\"a string"
With a delimiter of \ this is two values: the number 0 and the string a string. Note the lack of quotes; those quotes are part of the CSV string formatting. a string is not valid JSON; the quotes are required.
Instead you need to include the JSON string quotes inside the CSV string quotes. Quotes in CSV are escaped by doubling them.
0\"""a string"""
Now that is the number 0 and the string "a string" including quotes.
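As a quick check: with that corrected line saved back to test/test.csv and an empty test table, the same \copy from the question now goes through:
\copy test from 'test/test.csv' delimiters '\' CSV
select * from test;
 id |    data
----+------------
  0 | "a string"
(1 row)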
And as an observation, it would be simpler to remove the complication of embedding JSON into a CSV and use a pure JSON file.
[
[0, "a string"],
[1, "other string"]
]
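If you do go that route, one sketch of turning such a document into rows, assuming the test table from the question and that your client reads the file and hands its contents over as a single value (shown here as a literal):
insert into test (id, data)
select (elem->>0)::integer, elem->1
from jsonb_array_elements('[[0, "a string"], [1, "other string"]]'::jsonb) as elem;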