How to insert NULL values into a PostgreSQL table

I have data in a CSV file that I am trying to insert into a PostgreSQL table using pgloader. The input file is an MS SQL Server export, and NULL values are already written out explicitly as NULL.
My pgloader script seems to fail on the keyword NULL, noticeably for integer and timestamp fields.
I really don't know what I am missing. Your help will be much appreciated.
I can successfully insert into the table from psql console:
insert into raw.a2
(NUM , F_FILENO , F_FOLIONO , F_DOC_TYPE , F_DOCDATE , F_BATCH , F_BOX , F_BLUCPY , F_ROUTOPOST , F_ROUTOUSR , F_WFCREATE , LINKEDFILE , DATECREATE , USERCREATE , DATEUPDATE , USERUPDATE , MEDIA , PGCOUNT , GROUPNUM , SUBJECT , PRI , F_FILECAT)
values
(
16,'18',3,'Nomination Details',NULL,NULL,NULL,1,NULL,NULL,1,'00000016.TIF','2011-02-08 13:02:11.000','isaac','2012-01-12 08:52:31.000','henrey','Multi',4,1.0,0,'-',NULL
);
INSERT 0 1
file-sample
1,'6',1,'Details',2011-02-22 00:00:00.000,NULL,NULL,1,NULL,NULL,2,'00000001.TIF',2011-02-08 09:42:24.000,'kevin',2011-10-27 09:08:42.000,'james','Multi',1,1.0,0,'-',NULL
2,'6',2,'Bio data',NULL,NULL,NULL,1,NULL,NULL,2,'00000002.TIF',2011-02-08 10:25:11.000,'kevin',2012-11-19 16:20:49.000,'pattie','Multi',4,1.0,0,'-',NULL
4,'10',1,'Details',2011-02-22 00:00:00.000,NULL,NULL,1,NULL,NULL,2,'00000004.TIF',2011-02-08 10:43:38.000,'kevin',2014-07-18 10:46:06.000,'brian','Multi',1,1.0,0,'-',NULL
pgloader command
pgloader --type csv --with truncate --with "fields optionally enclosed by '''" --with "fields terminated by ','" --set "search_path to 'raw'" - "postgresql://postgres:postgres@localhost/doc_db?a2" < null_test
Table
Table "raw.a2"
Column | Type | Collation | Nullable | Default
-------------+-----------------------------+-----------+----------+---------
num | integer | | not null |
f_fileno | character varying(15) | | |
f_foliono | integer | | |
f_doc_type | character varying(50) | | |
f_docdate | timestamp without time zone | | |
f_batch | integer | | |
f_box | integer | | |
f_blucpy | integer | | |
f_routopost | integer | | |
f_routousr | character varying(49) | | |
f_wfcreate | integer | | |
linkedfile | character varying(255) | | |
datecreate | timestamp without time zone | | |
usercreate | character varying(50) | | |
dateupdate | timestamp without time zone | | |
userupdate | character varying(50) | | |
media | character varying(5) | | |
pgcount | smallint | | |
groupnum | double precision | | |
subject | smallint | | |
pri | character varying(1) | | |
f_filecat | character varying(50) | | |
Indexes:
"a2_pkey" PRIMARY KEY, btree (num)
Output/Error
2019-07-24T05:55:24.231000Z WARNING Target table "\"raw\".\"a2\"" has 1 indexes defined against it.
2019-07-24T05:55:24.237000Z WARNING That could impact loading performance badly.
2019-07-24T05:55:24.237000Z WARNING Consider the option 'drop indexes'.
2019-07-24T05:55:24.460000Z ERROR PostgreSQL ["\"raw\".\"a2\""] Database error 22P02: invalid input syntax for integer: "NULL"
CONTEXT: COPY a2, line 1, column f_batch: "NULL"
2019-07-24T05:55:24.461000Z ERROR PostgreSQL ["\"raw\".\"a2\""] Database error 22007: invalid input syntax for type timestamp: "NULL"
CONTEXT: COPY a2, line 1, column f_docdate: "NULL"

From the docs of pgloader:
null if
This option takes an argument which is either the keyword blanks or a
double-quoted string.
When blanks is used and the field value that is read contains only
space characters, then it’s automatically converted to an SQL NULL
value.
When a double-quoted string is used and that string is read as the
field value, then the field value is automatically converted to an SQL
NULL value.
Looks like you're missing --with 'null if "NULL"' in your command.
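For example, this is the command from the question with only that option added (connection string otherwise unchanged):
pgloader --type csv --with truncate --with "fields optionally enclosed by '''" --with "fields terminated by ','" --with 'null if "NULL"' --set "search_path to 'raw'" - "postgresql://postgres:postgres@localhost/doc_db?a2" < null_test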
Otherwise, you should be able to load the CSV data directly from psql:
\copy raw.a2 (NUM, F_FILENO, F_FOLIONO, F_DOC_TYPE, F_DOCDATE, F_BATCH, F_BOX, F_BLUCPY, F_ROUTOPOST, F_ROUTOUSR, F_WFCREATE, LINKEDFILE, DATECREATE, USERCREATE, DATEUPDATE, USERUPDATE, MEDIA, PGCOUNT, GROUPNUM, SUBJECT, PRI, F_FILECAT) FROM 'file.csv' WITH (FORMAT csv, NULL 'NULL')

Related

Why can't I use a plsql argument in this where clause?

I have a function below (is_organizer) that works and lets me use it as a computed field in Hasura. The function below it (is_chapter_member), which is almost identical, doesn't work.
WORKS
CREATE OR REPLACE FUNCTION is_organizer(event_row events, hasura_session json)
RETURNS boolean AS $$
  SELECT EXISTS (
    SELECT 1
    FROM event_organizers o
    WHERE
      o.user_id::text = hasura_session->>'x-hasura-user-id'
      AND
      (event_row.id = o.event_id OR event_row.event_template_id = o.event_template_id)
  );
$$ LANGUAGE SQL STRICT IMMUTABLE;
BROKEN
CREATE OR REPLACE FUNCTION is_chapter_member(c chapters, hasura_session json)
RETURNS boolean AS $$
  SELECT EXISTS (
    SELECT 1
    FROM chapter_members m
    WHERE
      m.user_id::text = hasura_session->>'x-hasura-user-id'
      AND
      c.chapter_id = m.chapter_id
  );
$$ LANGUAGE SQL STRICT IMMUTABLE;
When attempting to add this function (not call it, just create it), Postgres gives me the following error:
ERROR: missing FROM-clause entry for table "c"
LINE 9: c.chapter_id = m.chapter_id
Why would a function param need a FROM-clause entry? Table dumps below...
Table "public.chapters"
Column | Type | Collation | Nullable | Default
-----------------+--------------------------+-----------+----------+--------------------------------------
id | integer | | not null | nextval('chapters_id_seq'::regclass)
title | text | | not null |
slug | text | | not null |
description | jsonb | | |
avatar_url | text | | |
photo_url | text | | |
region | text | | |
maps_api_result | jsonb | | |
lat | numeric(11,8) | | |
lng | numeric(11,8) | | |
created_at | timestamp with time zone | | not null | now()
updated_at | timestamp with time zone | | not null | now()
deleted_at | timestamp with time zone | | |
Table "public.chapter_members"
Column | Type | Collation | Nullable | Default
------------+--------------------------+-----------+----------+---------
user_id | integer | | not null |
chapter_id | integer | | not null |
created_at | timestamp with time zone | | not null | now()
updated_at | timestamp with time zone | | not null | now()
Table "public.events"
Column | Type | Collation | Nullable | Default
-------------------+-----------------------------+-----------+----------+---------------------------------------------------
id | integer | | not null | nextval('events_id_seq'::regclass)
event_template_id | integer | | not null |
venue_id | integer | | |
starts_at | timestamp without time zone | | not null |
duration | interval | | not null |
title | text | | |
slug | text | | |
description | text | | |
photo_url | text | | |
created_at | timestamp without time zone | | not null | now()
updated_at | timestamp without time zone | | not null | now()
deleted_at | timestamp without time zone | | |
ends_at | timestamp without time zone | | | generated always as (starts_at + duration) stored
Table "public.event_organizers"
Column | Type | Collation | Nullable | Default
-------------------+---------+-----------+----------+----------------------------------------------
id | integer | | not null | nextval('event_organizers_id_seq'::regclass)
user_id | integer | | not null |
event_id | integer | | |
event_template_id | integer | | |
This turned out to be an incorrect column name in the broken function: chapter_id should have just been id on the c argument. Because chapter_id doesn't exist on the chapters type, Postgres fell back to treating c as a table reference, hence the misleading error. I took Richard's prompt and tried putting parentheses around the arg, like (c).chapter_id. That then correctly told me that chapter_id doesn't exist, and allowed me to fix the issue.
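For reference, a sketch of the corrected function, using the id column that the chapters table actually has:
CREATE OR REPLACE FUNCTION is_chapter_member(c chapters, hasura_session json)
RETURNS boolean AS $$
  SELECT EXISTS (
    SELECT 1
    FROM chapter_members m
    WHERE
      m.user_id::text = hasura_session->>'x-hasura-user-id'
      AND
      c.id = m.chapter_id
  );
$$ LANGUAGE SQL STRICT IMMUTABLE;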

How to import multiple csv files and create tables from their headers automatically in postgresql?

I am new to PostgreSQL and SQL in general. From what I understand, one would manually create the table for each file and then use COPY to import the data from the CSV file into the table. If we have a bunch of CSV files, is it possible to import and create the tables for all of them, using the header names in each CSV file as column names?
Thanks
As an example using the program I mentioned in the comment:
--Shows table schema creation using Postgres dialect
csvsql -i postgresql --tables cp ~/cell_per.csv
CREATE TABLE cp (
    line_id DECIMAL NOT NULL,
    category VARCHAR NOT NULL,
    cell_per DECIMAL NOT NULL,
    ts_insert TIMESTAMP WITHOUT TIME ZONE,
    ts_update TIMESTAMP WITHOUT TIME ZONE,
    user_insert VARCHAR,
    user_update VARCHAR,
    plant_type VARCHAR,
    season VARCHAR,
    short_category VARCHAR
);
--Create table and import from file in one step
csvsql --db postgresql://aklaver:@localhost:5432/test --tables cp --insert ~/cell_per.csv
\d cp
Table "public.cp"
Column | Type | Collation | Nullable | Default
----------------+-----------------------------+-----------+----------+---------
line_id | numeric | | not null |
category | character varying | | not null |
cell_per | numeric | | not null |
ts_insert | timestamp without time zone | | |
ts_update | timestamp without time zone | | |
user_insert | character varying | | |
user_update | character varying | | |
plant_type | character varying | | |
season | character varying | | |
short_category | character varying | | |
select count(*) from cp;
count
-------
68
(1 row)
This is just one tool. You could also use the I/O tools from Pandas.

How to migrate tables with defaults, constraints and sequences with AWS DMS for postgres to postgres migration?

I recently did a migration from RDS PostgreSQL to Aurora PostgreSQL. The tables were migrated successfully, but they are missing their defaults, constraints and references. The migration also did not bring over any sequences.
Table in source database:
Table "public.addons_snack"
Column | Type | Collation | Nullable | Default
---------------+--------------------------+-----------+----------+------------------------------------------
id | integer | | not null | nextval('addons_snack_id_seq'::regclass)
name | character varying(100) | | not null |
snack_type | character varying(2) | | not null |
price | integer | | not null |
created | timestamp with time zone | | not null |
modified | timestamp with time zone | | not null |
date | date | | |
Indexes:
"addons_snack_pkey" PRIMARY KEY, btree (id)
Check constraints:
"addons_snack_price_check" CHECK (price >= 0)
Referenced by:
TABLE "addons_snackreservation" CONSTRAINT "addons_snackreservation_snack_id_373507cf_fk_addons_snack_id" FOREIGN KEY (snack_id) REFERENCES addons_snack(id) DEFERRABLE INITIALLY DEFERRED
Table in target database:
Table "public.addons_snack"
Column | Type | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+---------
id | integer | | not null |
name | character varying(100) | | not null |
snack_type | character varying(2) | | not null |
price | integer | | not null |
created | timestamp(6) with time zone | | not null |
modified | timestamp(6) with time zone | | not null |
date | date | | |
Indexes:
"addons_snack_pkey" PRIMARY KEY, btree (id)
Did I do something wrong, or is DMS not capable of doing this?
This SQL snippet will give you a clear answer.
You can restore indexes and constraints by using pg_dump and pg_restore; the snippet consists of executing them.
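A minimal sketch of that approach, with placeholder host and database names: the post-data section of a schema-only dump contains the indexes and constraints, so dumping it from the source and replaying it on the target restores them (column defaults and sequence definitions live in the pre-data section, which can be handled the same way with --section=pre-data):
pg_dump --schema-only --section=post-data -h source-host -U postgres -d mydb -f post_data.sql
psql -h target-host -U postgres -d mydb -f post_data.sql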

Postgres varchar column giving error "invalid input syntax for integer"

I'm trying to use an INSERT INTO ... SELECT statement to copy columns from one table into another, but I am getting an error message:
gis=> INSERT INTO places (SELECT 0 AS osm_id, 0 AS code, 'country' AS fclass, pop_est::numeric(10,0) AS population, name, geom FROM countries);
ERROR: invalid input syntax for integer: "country"
LINE 1: ...NSERT INTO places (SELECT 0 AS osm_id, 0 AS code, 'country' ...
The SELECT statement by itself is giving a result like I expect:
gis=> SELECT 0 AS osm_id, 0 AS code, 'country' AS fclass, pop_est::numeric(10,0) AS population, name, geom FROM countries LIMIT 1;
osm_id | code | fclass | population | name | geom
--------+------+---------+------------+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0 | 0 | country | 103065 | Aruba | 0106000000010000000103000000010000000A000000333333338B7951C0C8CCCCCC6CE7284033333333537951C03033333393D82840CCCCCCCC4C7C51C06066666686E0284000000000448051C00000000040002940333333333B8451C0C8CCCCCC0C18294099999999418351C030333333B3312940333333333F8251C0C8CCCCCC6C3A294000000000487E51C000000000A0222940333333335B7A51C00000000000F62840333333338B7951C0C8CCCCCC6CE72840
(1 row)
But somehow it looks like it's getting confused into thinking that the fclass column should be an integer when, in fact, it is character varying(20):
gis=> \d+ places
Unlogged table "public.places"
Column | Type | Modifiers | Storage | Stats target | Description
------------+------------------------+------------------------------------------------------+----------+--------------+-------------
gid | integer | not null default nextval('places_gid_seq'::regclass) | plain | |
osm_id | bigint | | plain | |
code | smallint | | plain | |
fclass | character varying(20) | | extended | |
population | numeric(10,0) | | main | |
name | character varying(100) | | extended | |
geom | geometry | | main | |
Indexes:
"places_pkey" PRIMARY KEY, btree (gid)
"places_geom" gist (geom)
I've tried casting all of the columns to the exact types they need to be for the destination table, but that doesn't seem to have any effect.
All of the other instances of this error message I can find online appear to be people trying to use empty strings as integers, which isn't relevant here because I'm selecting a constant string as fclass.
You need to specify the column names you are inserting into:
INSERT INTO places (osm_id, code, fclass, population, name, geom) SELECT ...
Without specifying them individually, it is assumed that all columns are to be inserted into, in table order, including gid, which you want to have auto-populate. So 'country' is actually being inserted into code by your current INSERT statement.
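With an explicit column list, the SELECT output lands in the intended columns and gid is left to its default:
INSERT INTO places (osm_id, code, fclass, population, name, geom)
SELECT 0, 0, 'country', pop_est::numeric(10,0), name, geom
FROM countries;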

Can not store a double value for spatial datatype in Mysql5.6

I am trying to insert a polygon shape into a MySQL database. The vertices of polygon shapes consist of double values. I tried to insert the value with the following query, but I got the error below.
INSERT INTO HIBERNATE_SPATIAL
(PRD_GEO_REGION_ID,OWNER_ID,GEO_REGION_NAME,GEO_REGION_DESCRIPTION,GEO_REGION_DEFINITION)
VALUES (8,1,'POLYGON8','SHAPE8',Polygon(LineString(10.12345612341243,11.12345612341234),LineString(10.34512341246,11.4123423456),LineString(10.31423424456,11.34123423456),LineString(10.341234256,11.3412342456),LineString(10.11423423456,11.123424)));
TABLE DESCRIPTION
+------------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+--------------+------+-----+---------+----------------+
| PRD_GEO_REGION_ID | int(11) | NO | PRI | NULL | auto_increment |
| OWNER_ID | decimal(3,0) | NO | | NULL | |
| GEO_REGION_NAME | varchar(50) | NO | UNI | NULL | |
| GEO_REGION_DESCRIPTION | varchar(70) | YES | | NULL | |
| GEO_REGION_DEFINITION | geometry | YES | | NULL | |
+------------------------+--------------+------+-----+---------+----------------+
THE ERROR:
ERROR 1367 (22007): Illegal non geometric '10.12345612341243' value found during parsing
Solved the issue and stored the decimal values successfully. I had made two mistakes.
1) LineString can only store the Point datatype (not bare decimal values or coordinates). The corrected syntax follows.
INSERT INTO HIBERNATE_SPATIAL
(PRD_GEO_REGION_ID,OWNER_ID,GEO_REGION_NAME,GEO_REGION_DESCRIPTION,GEO_REGION_DEFINITION)
VALUES (8,1,'POLYGON8','SHAPE8',Polygon(LineString(POINT(10.12345612341243,11.12345612341234)),LineString(POINT(10.34512341246,11.4123423456)),LineString(POINT(10.31423424456,11.34123423456)),LineString(POINT(10.341234256,11.3412342456)),LineString(POINT(10.11423423456,11.123424))));
2) If it is a polygon shape, the shape has to be closed (the starting and ending points should be the same).
That was the problem.
WORKING QUERY
INSERT INTO HIBERNATE_SPATIAL
(PRD_GEO_REGION_ID,OWNER_ID,GEO_REGION_NAME,GEO_REGION_DESCRIPTION,GEO_REGION_DEFINITION)
VALUES (8,1,'POLYGON8','SHAPE8',Polygon(LineString(POINT(10.12345612341243,11.12345612341234)),LineString(POINT(10.34512341246,11.4123423456)),LineString(POINT(10.31423424456,11.34123423456)),LineString(POINT(10.341234256,11.3412342456)),LineString(POINT(10.12345612341243,11.12345612341234))));