I have a ton of CSV files that I'm trying to import into Postgres. The CSV data is all quoted regardless of what the data type is. Here's an example:
"3971","14","34419","","","","","6/25/2010 9:07:02 PM","70.21.238.46 "
The first 4 columns are supposed to be integers. Postgres handles the cast from the string "3971" to the integer 3971 correctly, but it pukes at the empty string in the 4th column.
PG::InvalidTextRepresentation: ERROR: invalid input syntax for type integer: ""
This is the command I'm using:
copy "mytable" from '/path/to/file.csv' with delimiter ',' NULL as '' csv header
Is there a proper way to tell Postgres to treat empty strings as null?
Here's how to do this, using the FORCE_NULL option. Since I'm working in psql with a file the server user can't reach, I use \copy, but the principle is the same:
create table csv_test(col1 integer, col2 integer);
cat csv_test.csv
"1",""
"","2"
\copy csv_test from '/home/aklaver/csv_test.csv' with (format 'csv', force_null (col1, col2));
COPY 2
select * from csv_test ;
 col1 | col2
------+------
    1 | NULL
 NULL |    2
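If the file is readable by the server user, the server-side COPY is the same idea. A sketch applied to the original example, assuming the first four columns of "mytable" are named col1 through col4 (substitute your real column names):
copy "mytable" from '/path/to/file.csv'
    with (format csv, header, force_null (col1, col2, col3, col4));
FORCE_NULL matches the NULL string (here the default, an empty string) against quoted values too, which is exactly the ""-as-NULL case that plain NULL '' does not handle.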
I am working with world health COVID data for a project and had no issues until this specific query, which keeps throwing the invalid input syntax for double precision: "" error.
I should note that the tables were brought in from a CSV file and I am using PostgreSQL.
Query throwing error:
select covid_deaths.continent, covid_deaths.location, covid_deaths.date, covid_deaths.population, covid_vacc.new_vaccinations,
SUM(covid_vacc.new_vaccinations::int) over (partition by covid_deaths.location
order by covid_deaths.location, covid_deaths.date) as RollingPeopleVaccinated
from covid_deaths
join covid_vacc
on covid_deaths.location = covid_vacc.location and covid_deaths.date::date = covid_vacc.date::date
The line throwing the error is line 3, particularly the SUM(covid_vacc.new_vaccinations::int) portion. The new_vaccinations column in the covid_vacc table is of VARCHAR datatype, and I know casting is not a great solution, but I am very much trying to avoid reimporting all of the data from the Excel sheet. Even if I were to do this, I'm not sure how to get all the datatypes correct and the null value issues cleared up.
I have tried not casting the new_vaccinations column as well as casting it to a few different datatypes. I have also tried running queries to alter the column's datatype, but I don't believe that is actually working. I'm fairly new to SQL, so any help is appreciated.
Use nullif to convert empty strings to NULL. Use trim to reduce whitespace-only strings down to '' so that nullif catches them.
Table "public.varchar_test"
Column | Type | Collation | Nullable | Default
------------+-----------------------+-----------+----------+---------
fld_1 | character varying(50) | | |
text_array | text[] | | |
insert into varchar_test(fld_1) values ('1'), (''), ('3');
select sum(fld_1::integer) from varchar_test ;
ERROR: invalid input syntax for type integer: ""
select sum(nullif(fld_1::integer, '')) from varchar_test ;
ERROR: invalid input syntax for type integer: ""
This still fails because the ::integer cast runs before nullif ever sees the value; nullif and trim have to be applied while the value is still text:
select sum(nullif(trim(fld_1), '')::integer) from varchar_test ;
 sum
-----
   4
So in your case:
SUM(nullif(trim(covid_vacc.new_vaccinations), '')::integer)
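Applied to the full query from the question, that looks something like this (a sketch reusing the question's table and column names):
select covid_deaths.continent, covid_deaths.location, covid_deaths.date,
       covid_deaths.population, covid_vacc.new_vaccinations,
       sum(nullif(trim(covid_vacc.new_vaccinations), '')::integer)
           over (partition by covid_deaths.location
                 order by covid_deaths.location, covid_deaths.date) as RollingPeopleVaccinated
from covid_deaths
join covid_vacc
  on covid_deaths.location = covid_vacc.location
 and covid_deaths.date::date = covid_vacc.date::date;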
FYI, the above will not deal with issues like:
select '1,000'::float;
ERROR: invalid input syntax for type double precision: "1,000"
select 'a'::float;
ERROR: invalid input syntax for type double precision: "a"
That means there still may be a need for data cleanup on the column.
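If the bad values are just formatted numbers like '1,000', one possible cleanup (an assumption about your data, so verify it first) is to strip the thousands separators before the cast:
select sum(nullif(trim(replace(new_vaccinations, ',', '')), '')::numeric)
from covid_vacc;
Values that still fail after that (like 'a') are genuinely bad data and need to be fixed by hand.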
I have several CSVs with varying field names that I am copying into a Postgres database from an s3 data source. Quite a few of them contain empty strings, "", which I would like to convert to NULLs at import. When I attempt to copy, I get an error along these lines (the same issue occurs for other data types, integer, etc.):
psycopg2.errors.InvalidDatetimeFormat: invalid input syntax for type date: ""
I have tried using FORCE_NULL (field1, field2, field3), and this works for me, except I would like to do FORCE_NULL (*) and apply it to all of the columns, as I have A LOT of fields I am bringing in that I'd like this applied to.
Is this available?
This is an example of my csv:
"ABC","tgif","123","","XyZ"
Use psycopg2's COPY functions, in this case copy_expert:
cat empty_str.csv
1, ,3,07/22/2022
2,test,4,
3,dog,,07/23/2022
create table empty_str_test(id integer, str_fld varchar, int_fld integer, date_fld date);
import psycopg2

con = psycopg2.connect("dbname=test user=postgres host=localhost port=5432")
cur = con.cursor()
with open("empty_str.csv") as csv_file:
    # CSV format: unquoted empty fields are read as NULL.
    cur.copy_expert("COPY empty_str_test FROM STDIN WITH csv", csv_file)
con.commit()
select * from empty_str_test ;
 id | str_fld | int_fld |  date_fld
----+---------+---------+------------
  1 |         |       3 | 2022-07-22
  2 | test    |       4 |
  3 | dog     |         | 2022-07-23
From here COPY:
NULL
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings. This option is not allowed when using binary format.
copy_expert allows you to specify the CSV format; if you use copy_from, it will use the text format.
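If you still want an explicit FORCE_NULL list covering every column, one option is to generate the column list instead of typing it out. A sketch, assuming the table from this example (swap in your own table name and schema filter):
select string_agg(quote_ident(column_name::text), ', ' order by ordinal_position)
from information_schema.columns
where table_name = 'empty_str_test';
Paste the result into COPY ... WITH (FORMAT csv, FORCE_NULL (...)).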
I'm trying to import data into a table with a jsonb column type, using a CSV file. I've read the CSV spec, which says any column value containing double quotes needs to:
be wrapped in quotes (double quotes at beginning and end)
have its double quotes escaped with a double quote (so if you want a double quote, you must use 2 double quotes instead of just 1)
My csv column value for the jsonb type looks like this (shortened for brevity):
"[
{
""day"": 0,
""schedule"": [
{
""open"": ""07:00"",
""close"": ""12:00""
}
]
}
]"
Note: I opened this CSV in Notepad++ in case the editor was doing any special escaping, and all quotes are as they appear in the editor.
Now, I was curious about the QUOTE and ESCAPE values in that pgAdmin error message, so here they are, copied and pasted:
QUOTE '\"'
ESCAPE '''';""
To upload via pgAdmin, do I need to use \" around each JSON token, as (possibly?) suggested by that QUOTE value in the error message?
I'm using Go's encoding/csv package to write the csv.
I can load your file into a json or jsonb typed column using:
copy j from '/tmp/foo.csv' csv;
or
copy j from '/tmp/foo.csv' with (format csv);
or their \copy equivalents.
Based on your truncated (incomplete) text-posted-as-image, it is hard to tell what you are actually doing. But if you do it right, it will work.
The easiest workaround I've found is to copy the json data into a text column in a temporary staging table.
Then issue a query that follows the pattern:
insert into mytable (...) select ..., json_txtcol::json from staging_table
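Concretely, that pattern looks something like this (a sketch with made-up names; staging_table, json_txtcol, and the payload column stand in for your real objects):
create temporary table staging_table (id integer, json_txtcol text);

\copy staging_table from '/tmp/foo.csv' with (format csv)

insert into mytable (id, payload)
select id, json_txtcol::jsonb
from staging_table;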
You can process it through another command before PostgreSQL receives the data, replacing the double double-quotes with an escaped double-quote.
For example:
COPY tablename(col1, col2, col3)
FROM PROGRAM $$sed 's/""/\\"/g' myfile.csv$$
DELIMITER ',' ESCAPE '\' CSV HEADER;
Here's a working example:
/tmp/input.csv contains:
Clive Dunn, "[ { ""day"": 0, ""schedule"": [{""open"": ""07:00"", ""close"": ""12:00""}]}]", 3
In psql (but should work in PgAdmin):
postgres=# CREATE TABLE test (person text, examplejson jsonb, num int);
CREATE TABLE
postgres=# COPY test (person, examplejson, num) FROM PROGRAM $$sed 's/""/\\"/g' /tmp/input.csv$$ CSV DELIMITER ',' ESCAPE '\';
COPY 1
postgres=# SELECT * FROM test;
person | examplejson | num
------------+-----------------------------------------------------------------+-----
Clive Dunn | [{"day": 0, "schedule": [{"open": "07:00", "close": "12:00"}]}] | 3
(1 row)
Disclosure: I am an EnterpriseDB (EDB) employee.
I have the following text file aatest.txt:
09/25/2019 | 1234.5
10/01/2018 | 6789.0
which I would like to convert into zztest.txt:
2019-09-25 | 1234.5
2018-10-01 | 6789.0
My Postgres script is:
CREATE TABLE documents (tdate TEXT, val NUMERIC);
COPY documents FROM 'aatest.txt' WITH CSV DELIMITER '|';
SELECT TO_DATE(tdate, 'mm/dd/yyyy');
COPY documents TO 'zztest.txt' WITH CSV DELIMITER '|';
However I am getting the following error message:
ERROR: column "tdate" does not exist
What am I doing wrong? Thank you!
Your SELECT has no FROM clause, so you can't reference any columns. But you need to put that SELECT into the COPY statement anyway:
CREATE TABLE documents (tdate TEXT, val NUMERIC);
COPY documents FROM 'aatest.txt' WITH CSV DELIMITER '|';
COPY (select to_char(TO_DATE(tdate, 'mm/dd/yyyy'), 'yyyy-mm-dd'), val FROM documents)
TO 'zztest.txt' WITH CSV DELIMITER '|';
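With that, zztest.txt should come out as shown below; COPY ... TO with CSV does not pad around the delimiter, so the spaces from the input file are gone:
2019-09-25|1234.5
2018-10-01|6789.0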
Example CSV line:
"2012","Test User","ABC","First","71.0","","","0","0","3","3","0","0","","0","","","","","0.1","","4.0","0.1","4.2","80.8","847"
All values after "First" belong to numeric columns. Lots of NULL values, quoted as empty strings like everything else.
Attempt at COPY:
copy mytable from 'myfile.csv' with csv header quote '"';
NOPE: ERROR: invalid input syntax for type numeric: ""
Well, yeah. It's a null value. Attempt 2 at COPY:
copy mytable from 'myfile.csv' with csv header quote '"' null '""';
NOPE: ERROR: CSV quote character must not appear in the NULL specification
What's a fella to do? Strip out all double quotes from the file before running COPY? Can do that, but I figured there's a proper solution to what must be an incredibly common problem.
While some database products treat an empty string as a NULL value, the standard says that they are distinct, and PostgreSQL treats them as distinct.
It would be best if you could generate your CSV file with an unambiguous representation. While you could use sed or something to filter the file into good format, the other option would be to COPY the data into a table where a text column can accept the empty strings, and then populate the target table. The NULLIF function may help with that: http://www.postgresql.org/docs/9.1/interactive/functions-conditional.html#FUNCTIONS-NULLIF -- it returns NULL if both arguments match and the first value if they don't. So something like NULLIF(txtcol, '')::numeric might work for you.
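A sketch of that staging approach, pretending the file had only its first five columns (made-up column names; in practice the staging table needs one text column per CSV field):
create temporary table staging (yr text, usr text, code text, place text, score text);

\copy staging from 'myfile.csv' with (format csv, header)

insert into mytable (yr, usr, code, place, score)
select nullif(yr, '')::integer, usr, code, place, nullif(score, '')::numeric
from staging;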
As an alternative, using
sed 's/""//g' myfile.csv > myfile-formatted.csv
psql
# copy mytable from 'myfile-formatted.csv' with csv header;
works as well.
I think all you need to do here is use FORCE_NULL, so that quoted empty strings in the numeric columns are read as NULL:
COPY mytable FROM '/dir/myfile.csv'
WITH (FORMAT csv, HEADER, FORCE_NULL (num_col1, num_col2));
FORCE_NULL takes a column list; put your numeric columns there.
This worked for me in Python 3.8.x (t_host, t_port, t_dbname, t_user, and t_pw are connection parameters defined elsewhere):
import csv
from io import StringIO

import psycopg2

db_conn = psycopg2.connect(host=t_host, port=t_port,
                           dbname=t_dbname, user=t_user, password=t_pw)
cur = db_conn.cursor()

csv.register_dialect('myDialect',
                     delimiter=',',
                     skipinitialspace=True,
                     quoting=csv.QUOTE_MINIMAL)

with open('files/emp.csv') as f:
    next(f)  # skip the header row
    reader = csv.reader(f, dialect='myDialect')
    # Re-write the rows into an in-memory buffer, normalized
    # through the dialect above.
    buffer = StringIO()
    writer = csv.writer(buffer, dialect='myDialect')
    writer.writerows(reader)
    buffer.seek(0)

cur.copy_from(buffer, 'personnes', sep=',',
              columns=('nom', 'prenom', 'telephone', 'email'))
db_conn.commit()
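One caveat on this approach: as noted above, copy_from uses the text format, so with sep=',' any field that itself contains a comma (which csv.QUOTE_MINIMAL wraps in quotes) will not be parsed the way a CSV reader would parse it. copy_expert with an explicit COPY ... FROM STDIN WITH csv statement avoids that.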