psql copy command will not interpret NULL values properly - postgresql

I am trying to copy values from a csv (with headers) into a table. I have looked at this answer which says to specify the NULL value, but it does not seem to have an effect for me. This is what I have:
CREATE TABLE stops
(
stop_id text PRIMARY KEY,
--stop_code text NULL,
stop_name text NOT NULL,
--stop_desc text NULL,
stop_lat double precision NOT NULL,
stop_lon double precision NOT NULL,
zone_id integer NULL,
stop_url text NULL,
location_type boolean NULL,
parent_station text NULL
);
\copy stops from './stops.txt' with csv header NULL AS ''
I also tried using the \N character like so:
\copy stops from './stops.txt' with csv header NULL AS '\N'
But it does not seem to have an effect.
I have also tried experimenting with a solution found here which looks like this:
\copy agency from './agency.txt' WITH (FORMAT csv header, FORCE_NULL(zone_id))
But this seems to throw a syntax error at the csv_header part.
Version is 9.6.
This is an excerpt of the csv:
stop_id,stop_name,stop_lat,stop_lon,zone_id,stop_url,location_type,parent_station
"de:07334:1714:1:1","Wörth Alte Bahnmeisterei","49.048742345982","8.26622538039577","","","","Parent1714"
"de:07334:1714:1:2","Wörth Alte Bahnmeisterei","49.0484420719247","8.26673742010779","","","","Parent1714"
"de:07334:1721:1:1","Maximiliansau Eisenbahnstraße","49.0373071007148","8.29789997731824","","","","Parent1721"
"de:07334:1721:2:2","Maximiliansau Eisenbahnstraße","49.0371363175998","8.29896897250649","","","","Parent1721"

But this seems to throw a syntax error at the csv_header part.
Put a comma after csv:
\copy agency from './agency.txt' WITH (FORMAT csv, HEADER, FORCE_NULL(zone_id,location_type))
Apparently, FORCE_NULL is required for non-text columns when using the empty string to specify NULL.
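Applied to the stops table from the question, the corrected command would look something like this (a sketch, assuming zone_id and location_type are the non-text columns that receive empty strings):
\copy stops from './stops.txt' WITH (FORMAT csv, HEADER, FORCE_NULL(zone_id, location_type))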

Related

How to select a column to appear with two single quotes in the field

Here is my PostgreSQL query:
select 'insert into employee(ID_NUMBER,NAME,OFFICE) values ('''||ID_NUMBER||''','''||NAME||''','''||replace(DESIGNATION,'&','and')||''','''||replace(DEPT_NAME,'&','and')||''')' as col
from icare_employee_view
where id_number='201403241'
order by name;
output
insert into employee(ID_NUMBER,NAME,OFFICE) values ('201403241','ABINUMAN, JOSEPHINE CALLO','Assistant AGrS Principal for Curriculum and Instruction','AGrS Principal's Office')
but I need 'AGrS Principal's Office' to be 'AGrS Principal''s Office'
Any suggestions on how to fix my PostgreSQL query would be highly appreciated.
Hi, check this from the PostgreSQL docs:
quote_literal ( text ) → text
Returns the given string suitably quoted to be used as a string
literal in an SQL statement string. Embedded single-quotes and
backslashes are properly doubled. Note that quote_literal returns null
on null input; if the argument might be null, quote_nullable is often
more suitable. See also Example 43.1.
quote_literal(E'O\'Reilly') → 'O''Reilly'
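Applied to the query from the question, a rough sketch using quote_literal might look like this (the target column list is widened here so it matches the four generated values; adjust it to your actual employee table):
select 'insert into employee(ID_NUMBER,NAME,DESIGNATION,DEPT_NAME) values ('
    || quote_literal(ID_NUMBER) || ','
    || quote_literal(NAME) || ','
    || quote_literal(replace(DESIGNATION,'&','and')) || ','
    || quote_literal(replace(DEPT_NAME,'&','and')) || ')' as col
from icare_employee_view
where id_number='201403241'
order by name;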

postgres SQL: getting rid of NA while migrating data from csv file

I am migrating data from a CSV file into a newly created table named fortune500. The code is shown below:
CREATE TABLE "fortune500"(
"id" SERIAL,
"rank" INTEGER,
"title" VARCHAR PRIMARY KEY,
"name" VARCHAR,
"ticker" CHAR(5),
"url" VARCHAR,
"hq" VARCHAR,
"sector" VARCHAR,
"industry" VARCHAR,
"employees" INTEGER,
"revenues" INTEGER,
"revenues_change" REAL,
"profits" NUMERIC,
"profits_change" REAL,
"assets" NUMERIC,
"equity" NUMERIC
);
Then I wanted to migrate data from a csv file using the below code:
COPY "fortune500"("rank", "title", "name", "ticker", "url", "hq", "sector", "industry", "employees",
"revenues", "revenues_change", "profits", "profits_change", "assets", "equity")
FROM 'C:\Users\Yasser A.RahmAN\Desktop\SQL for Business Analytics\fortune.csv'
DELIMITER ','
CSV HEADER;
But I got the below error message due to NA values in one of the columns.
ERROR: invalid input syntax for type real: "NA"
CONTEXT: COPY fortune500, line 12, column profits_change: "NA"
SQL state: 22P02
So how can I get rid of 'NA' values while migrating the data?
Consider using a staging table without restrictive data types, and then do your transformations and insert into the final table after the data has been loaded into staging. This is known as the ELT (Extract - Load - Transform) approach. You could also use an external tool to implement an ETL process and do the transformation in that tool before the data reaches your database.
In your case, an ELT approach would be to first create a table with all text columns, load that table, and then insert into your final table, casting the text values into the appropriate types - either filtering out the values that cannot be cast, or inserting NULL (or maybe 0) where the cast can't be made, depending on your requirements. For example, you'd filter out rows where profits_change = 'NA' (or better, WHERE NOT (profits_change ~ '^-?\d+(\.\d+)?$') to check for a numeric value), or you'd insert NULL or 0:
CASE
    WHEN profits_change ~ '^-?\d+(\.\d+)?$'
    THEN profits_change::real
    ELSE NULL -- or 0, depending what you need
END
You'd perform this kind of validation for all fields.
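A minimal sketch of that staging approach for this table could look like the following (the staging table name is made up here, and NULLIF on the literal 'NA' string is used as a simpler alternative to the regex check; adapt it to whatever invalid markers your file contains):
-- Staging table: every column is plain text, so COPY never fails on 'NA'.
CREATE TABLE "fortune500_staging"(
"rank" VARCHAR, "title" VARCHAR, "name" VARCHAR, "ticker" VARCHAR, "url" VARCHAR,
"hq" VARCHAR, "sector" VARCHAR, "industry" VARCHAR, "employees" VARCHAR,
"revenues" VARCHAR, "revenues_change" VARCHAR, "profits" VARCHAR,
"profits_change" VARCHAR, "assets" VARCHAR, "equity" VARCHAR
);
COPY "fortune500_staging"
FROM 'C:\Users\Yasser A.RahmAN\Desktop\SQL for Business Analytics\fortune.csv'
DELIMITER ','
CSV HEADER;
-- Cast into the final types on the way out, turning 'NA' into NULL.
INSERT INTO "fortune500"("rank", "title", "name", "ticker", "url", "hq", "sector",
"industry", "employees", "revenues", "revenues_change", "profits",
"profits_change", "assets", "equity")
SELECT "rank"::integer, "title", "name", "ticker", "url", "hq", "sector", "industry",
NULLIF("employees", 'NA')::integer,
NULLIF("revenues", 'NA')::integer,
NULLIF("revenues_change", 'NA')::real,
NULLIF("profits", 'NA')::numeric,
NULLIF("profits_change", 'NA')::real,
NULLIF("assets", 'NA')::numeric,
NULLIF("equity", 'NA')::numeric
FROM "fortune500_staging";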
Alternatively, if it's a one-off task, just edit your CSV before importing.

Importing CSV to postgres with new columns

I have an issue importing CSV data into a Postgres database with geo data / PostGIS enabled via the following command on my database 'landmarks':
CREATE EXTENSION postgis;
So.... the story goes:
I am following this tutorial.
I am trying to import a csv with these columns
name conf capital venture latitude longitude
the first line, as an example of the data, is:
example, 1, 1, 1, 51.51923, -0.12205
I have set the table up following the tutorial except adding conf, capital and venture instead of the columns in his data (address, date_built, architect, landmark). i.e.:
CREATE TABLE landmarks
(
gid serial NOT NULL,
name character varying(50),
conf character varying(10),
capital character varying(10),
venture character varying(10),
the_geom geometry,
CONSTRAINT landmarks_pkey PRIMARY KEY (gid),
CONSTRAINT enforce_dims_the_geom CHECK (st_ndims(the_geom) = 2),
CONSTRAINT enforce_geotype_geom CHECK (geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL),
CONSTRAINT enforce_srid_the_geom CHECK (st_srid(the_geom) = 4326)
);
then
CREATE INDEX landmarks_the_geom_gist
ON landmarks
USING gist
(the_geom );
The data is otherwise essentially the same as in his example.
I've set the table up properly and enabled the postgis extension to deal with geom data fine.
However, the problem comes when I try to import my csv:
landmarks=# \copy landmarks(name,conf,capital,venture,latitude,longitude) FROM '../../../../../var/tmp/map2.csv' DELIMITERS ',' CSV HEADER;
ERROR: column "latitude" of relation "landmarks" does not exist
Now, I noticed that when he creates the table, he doesn't add latitude or longitude columns... so I wondered if that was the issue and tried to create another table that also has those columns, as integers; however, that just gives me this error:
ptmap3=# \copy landmarks(name,conf,capital,venture,latitude,longitude) FROM '../../../../../var/tmp/map2.csv' DELIMITERS ',' CSV HEADER;
ERROR: invalid input syntax for integer: "51.51923"
CONTEXT: COPY landmarks, line 2, column latitude: "51.51923"
So... it seems that if I add the latitude column then it works, but fails with the data? After checking the csv for errors using this
od -c map2.csv
...there is nothing wrong with my csv (no hidden characters or errors)... so what's the deal?
If anyone can help me import my csv to this db I'd be very grateful!
You need to proceed in two steps to import your (latitude, longitude) data into the geometry column of your database.
1. Import the data into two float columns latitude and longitude:
From your original table landmarks:
CREATE TABLE landmarks
(
gid serial NOT NULL,
name character varying(50),
conf character varying(10),
capital character varying(10),
venture character varying(10),
the_geom geometry,
CONSTRAINT landmarks_pkey PRIMARY KEY (gid),
CONSTRAINT enforce_dims_the_geom CHECK (st_ndims(the_geom) = 2),
CONSTRAINT enforce_geotype_geom CHECK (geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL),
CONSTRAINT enforce_srid_the_geom CHECK (st_srid(the_geom) = 4326)
);
CREATE INDEX landmarks_the_geom_gist
ON landmarks
USING gist
(the_geom );
Add the two temporary columns latitude and longitude:
ALTER TABLE landmarks ADD COLUMN latitude double precision;
ALTER TABLE landmarks ADD COLUMN longitude double precision;
Then import your data:
\copy landmarks(name,conf,capital,venture,latitude,longitude) FROM '../../../../../var/tmp/map2.csv' DELIMITERS ',' CSV HEADER;
2. Populate the geometry column by creating POINT geometries from latitude and longitude:
UPDATE landmarks SET the_geom = ST_SetSRID(ST_MakePoint(longitude,latitude),4326);
Finally, delete the temporary columns:
ALTER TABLE landmarks DROP COLUMN latitude;
ALTER TABLE landmarks DROP COLUMN longitude;
There is an alternate method.
Google pgfutter and download the appropriate version of the executable file (there is a Windows build, but other releases are available as well).
Make sure the pgfutter executable and your yourdata.csv file are in the same directory.
Set the directory in your terminal and run the following command.
This will create a new table as well, so just enter a fresh table name.
pgfutter.exe --host "localhost" --port "5432" --db "YourDatabaseName" --schema "public" --table "TableName" --user "postgres" --pw "YourPassword" csv YourData.csv

T-SQL Replace part of a not null column with NULL generates error

I have a table variable and all its columns cannot be null (there is a NOT NULL definition for each):
DECLARE @SampleTable TABLE
(
SampleColumnID nvarchar(400) NOT NULL PRIMARY KEY
,SampleColumnText nvarchar(max) NOT NULL
)
I have done some operations with this variable and initialized SampleColumnText with some text.
Then I try to replace part of it with text returned from another function. The function returns NULL in some cases, so this code generates an error:
REPLACE(SampleColumnText, '{*}', @InitByFunctionText)
where @InitByFunctionText is NULL this time.
So, is it normal for an error to be generated when I am replacing only part of the text with NULL, not the whole text?
This is expected behaviour. REPLACE:
Returns NULL if any one of the arguments is NULL.
If you want to replace it with an empty string (which is not the same as NULL), you can use COALESCE:
REPLACE(SampleColumnText, '{*}', COALESCE(@InitByFunctionText, ''))
I had a similar thing recently and the following got around this issue:
REPLACE(SampleColumnText, '{*}', ISNULL(@InitByFunctionText, ''))
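A quick way to see both behaviours side by side (a small sketch; the variable name follows the question):
DECLARE @InitByFunctionText nvarchar(max) = NULL;
-- REPLACE propagates the NULL third argument:
SELECT REPLACE('prefix {*} suffix', '{*}', @InitByFunctionText);                -- returns NULL
-- COALESCE (or ISNULL) substitutes an empty string before REPLACE sees it:
SELECT REPLACE('prefix {*} suffix', '{*}', COALESCE(@InitByFunctionText, ''));  -- returns 'prefix  suffix'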

postgresql COPY and CSV data w/ double quotes

Example CSV line:
"2012","Test User","ABC","First","71.0","","","0","0","3","3","0","0","","0","","","","","0.1","","4.0","0.1","4.2","80.8","847"
All values after "First" are numeric columns. Lots of NULL values just quoted as such, right.
Attempt at COPY:
copy mytable from 'myfile.csv' with csv header quote '"';
NOPE: ERROR: invalid input syntax for type numeric: ""
Well, yeah. It's a null value. Attempt 2 at COPY:
copy mytable from 'myfile.csv' with csv header quote '"' null '""';
NOPE: ERROR: CSV quote character must not appear in the NULL specification
What's a fella to do? Strip out all double quotes from the file before running COPY? Can do that, but I figured there's a proper solution to what must be an incredibly common problem.
While some database products treat an empty string as a NULL value, the standard says that they are distinct, and PostgreSQL treats them as distinct.
It would be best if you could generate your CSV file with an unambiguous representation. While you could use sed or something similar to filter the file into a good format, the other option would be to COPY the data into a table where a text column could accept the empty strings, and then populate the target table. The NULLIF function may help with that: http://www.postgresql.org/docs/9.1/interactive/functions-conditional.html#FUNCTIONS-NULLIF -- it will return NULL if both arguments match, and the first value if they don't. So, something like NULLIF(txtcol, '')::numeric might work for you.
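A minimal sketch of that staging route (the staging table and its columns are illustrative and trimmed down, since the real column names are not shown in the question):
-- Load everything as text, so quoted empty strings are accepted as-is.
CREATE TABLE mytable_staging (yr text, person text, code text, grade text, score text);
copy mytable_staging from 'myfile.csv' with csv header quote '"';
-- Cast on the way into the real table, turning '' into NULL.
INSERT INTO mytable (yr, person, code, grade, score)
SELECT yr, person, code, grade, NULLIF(score, '')::numeric
FROM mytable_staging;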
As an alternative, using
sed 's/""//g' myfile.csv > myfile-formatted.csv
psql
# copy mytable from 'myfile-formatted.csv' with csv header;
works as well.
I think all you need to do here is the following:
COPY mytable from '/dir/myfile.csv' DELIMITER ',' NULL '' WITH CSV HEADER QUOTE ;
COPY mytable from '/dir/myfile.csv' DELIMITER ',' NULL ''
WITH CSV HEADER FORCE QUOTE *;
This worked for me in Python 3.8.X
import psycopg2
import csv
from io import StringIO

# t_host, t_port, t_dbname, t_user and t_pw hold the usual connection settings
db_conn = psycopg2.connect(host=t_host, port=t_port,
                           dbname=t_dbname, user=t_user, password=t_pw)
cur = db_conn.cursor()

csv.register_dialect('myDialect',
                     delimiter=',',
                     skipinitialspace=True,
                     quoting=csv.QUOTE_MINIMAL)

buffer = StringIO()
with open('files/emp.csv') as f:
    next(f)  # skip the header row
    reader = csv.reader(f, dialect='myDialect')
    # re-write the rows into an in-memory buffer with normalized quoting
    writer = csv.writer(buffer, dialect='myDialect')
    writer.writerows(reader)
buffer.seek(0)

cur.copy_from(buffer, 'personnes', sep=',', columns=('nom', 'prenom', 'telephone', 'email'))
db_conn.commit()