COPY with FORCE_NULL applied to all fields - PostgreSQL

I have several CSVs with varying field names that I am copying into a Postgres database from an S3 data source. Quite a few of them contain empty strings, "", which I would like to convert to NULLs at import. When I attempt to copy, I get an error along these lines (the same issue occurs for other data types, integer, etc.):
psycopg2.errors.InvalidDatetimeFormat: invalid input syntax for type date: ""
I have tried using FORCE_NULL (field1, field2, field3), and this works for me, except that I would like to do FORCE_NULL (*) and apply it to all of the columns, as I have a lot of fields I am bringing in that I'd like this applied to.
Is this available?
This is an example of my csv:
"ABC","tgif","123","","XyZ"

Using the psycopg2 COPY functions, in this case copy_expert:
cat empty_str.csv
1, ,3,07/22/2022
2,test,4,
3,dog,,07/23/2022
create table empty_str_test(id integer, str_fld varchar, int_fld integer, date_fld date);
import psycopg2

con = psycopg2.connect("dbname=test user=postgres host=localhost port=5432")
cur = con.cursor()
with open("empty_str.csv") as csv_file:
    # In CSV format, unquoted empty fields are read as NULL by default.
    cur.copy_expert("COPY empty_str_test FROM STDIN WITH csv", csv_file)
con.commit()
select * from empty_str_test;
 id | str_fld | int_fld |  date_fld
----+---------+---------+------------
  1 |         |       3 | 2022-07-22
  2 | test    |       4 |
  3 | dog     |         | 2022-07-23
From the PostgreSQL docs for COPY:
NULL
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings. This option is not allowed when using binary format.
copy_expert allows you to specify the CSV format. If you use copy_from, it will use the text format.
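As to applying FORCE_NULL to every column at once: PostgreSQL 17 and later accept FORCE_NULL * to mean all columns, e.g. COPY empty_str_test FROM STDIN WITH (FORMAT csv, FORCE_NULL *). On older versions, one workaround is to build the column list from information_schema.columns and splice it into the COPY statement. A minimal sketch (it reuses the empty_str_test table and connection string from above; psycopg2.sql is psycopg2's safe SQL-composition module):

import psycopg2
from psycopg2 import sql

con = psycopg2.connect("dbname=test user=postgres host=localhost port=5432")
cur = con.cursor()

# Fetch every column name of the target table.
cur.execute(
    "SELECT column_name FROM information_schema.columns "
    "WHERE table_name = %s ORDER BY ordinal_position",
    ("empty_str_test",),
)
cols = [row[0] for row in cur.fetchall()]

# Compose COPY ... WITH (FORMAT csv, FORCE_NULL (col1, col2, ...)).
stmt = sql.SQL(
    "COPY empty_str_test FROM STDIN WITH (FORMAT csv, FORCE_NULL ({}))"
).format(sql.SQL(", ").join(sql.Identifier(c) for c in cols))

with open("empty_str.csv") as csv_file:
    cur.copy_expert(stmt.as_string(con), csv_file)
con.commit()

Note that FORCE_NULL only applies to quoted values, which is exactly the asker's "" case; unquoted empty fields come in as NULL in CSV format anyway.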

Related

Postgres CSV import - handle empty strings as integers

I have a ton of CSV files that I'm trying to import into Postgres. The CSV data is all quoted regardless of what the data type is. Here's an example:
"3971","14","34419","","","","","6/25/2010 9:07:02 PM","70.21.238.46 "
The first 4 columns are supposed to be integers. Postgres handles the cast from the string "3971" to the integer 3971 correctly, but it pukes at the empty string in the 4th column.
PG::InvalidTextRepresentation: ERROR: invalid input syntax for type integer: ""
This is the command I'm using:
copy "mytable" from '/path/to/file.csv' with delimiter ',' NULL as '' csv header
Is there a proper way to tell Postgres to treat empty strings as null?
Use FORCE_NULL for this. Since I'm working in psql and using a file that the server user can't reach, I use \copy, but the principle is the same:
create table csv_test(col1 integer, col2 integer);
cat csv_test.csv
"1",""
"","2"
\copy csv_test from '/home/aklaver/csv_test.csv' with (format 'csv', force_null (col1, col2));
COPY 2
select * from csv_test;
 col1 | col2
------+------
    1 | NULL
 NULL |    2
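The same COPY options work from psycopg2 with copy_expert when the file lives on the client; a minimal sketch reusing the csv_test table above (the connection string is a placeholder):

import psycopg2

con = psycopg2.connect("dbname=test user=postgres host=localhost port=5432")
cur = con.cursor()
with open("/home/aklaver/csv_test.csv") as f:
    # FORCE_NULL converts quoted empty strings ("") to NULL in the listed columns.
    cur.copy_expert(
        "COPY csv_test FROM STDIN WITH (FORMAT csv, FORCE_NULL (col1, col2))",
        f,
    )
con.commit()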

Ingest utility doesn't insert NULL value in a column of integer type

I am reading a CSV file through a named pipe. In the CSV file the field2 column is blank, and it needs to be inserted into a table column as NULL. The table column is of type integer, but when I try to run the INGEST command I get an error that says 'field2 cannot be converted to the value type: integer'.
Here is my code:
mkfifo mypipe
tail -n +2 myfile.csv > mypipe &
db2 "INGEST FROM FILE mypipe
FORMAT DELIMITED
(
$field1 CHAR(9),
$field2 INTEGER EXTERNAL,
$field3 CHAR(32)
)
INSERT INTO my_table
VALUES($field1, $field2, $field3)"
In the above code, $field2 will be blank. In my_table, the $field2 value doesn't get inserted as NULL when the field is blank in the CSV. Sample input CSV data is shown below:
Subject_Name,Student_ID,STATUS
Maths,,COMPLETED
Physics,,PENDING
Computers,,PENDING
I want the data to be ingested into the table like this:
Subject_Name | Student_id | STATUS
-------------|------------|----------
Maths        | NULL       | COMPLETED
Physics      | NULL       | PENDING
Computers    | NULL       | PENDING
Can anyone suggest a way to resolve this issue?
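One possible approach, offered as an untested sketch: read the second field as character data and convert blanks to NULL in the INSERT itself with NULLIF. This assumes the INGEST utility accepts SQL expressions in the VALUES list; the CHAR(10) width for $field2 is an assumption and should be sized to your data:

mkfifo mypipe
tail -n +2 myfile.csv > mypipe &
# CHAR(10) width is an assumption; NULLIF($field2, '') yields NULL for blank
# fields, and CAST passes NULL through to the integer column.
db2 "INGEST FROM FILE mypipe
FORMAT DELIMITED
(
$field1 CHAR(9),
$field2 CHAR(10),
$field3 CHAR(32)
)
INSERT INTO my_table
VALUES($field1, CAST(NULLIF($field2, '') AS INTEGER), $field3)"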

PostgreSQL text field loads as strL

I have data stored in PostgreSQL with the data type text.
When I load this data into Stata it has type strL, even if every string in a column is only one character long. This takes up too much memory. I would like to continue using the text type in PostgreSQL.
Is there a way to specify that text data from PostgreSQL is loaded into Stata with type str8?
I also want numeric data to be loaded as numeric values, so the allstring option is not a good solution. I would also like to avoid specifying the data type on a column-by-column basis.
The command I use to load data into Stata is this:
odbc load, exec("SELECT * FROM mytable") <connect_options>
The file profile.do contains the following:
set odbcmgr unixodbc, permanently
set odbcdriver ansi, permanently
The file odbc.ini contains the following:
[database_name]
Debug = 0
CommLog = 0
ReadOnly = no
Driver = /usr/local/lib/psqlodbcw.so
Servername = <server>
FetchBufferSize = 99
Port = 5432
Database = postgres
In PostgreSQL mytable looks like this:
postgres=# \d+ mytable
Table "public.mytable"
Column | Type | Modifiers | Storage | Stats target | Description
--------+------+-----------+----------+--------------+-------------
c1 | text | | extended | |
c2 | text | | extended | |
postgres=# select * from mytable;
 c1 |  c2
----+-------
 a  | one
 b  | two
 c  | three
(3 rows)
In Stata mytable looks like this:
. describe
Contains data
obs: 3
vars: 2
size: 500
---------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
---------------------------------------------------------------------------
c1              strL    %9s
c2              strL    %9s
---------------------------------------------------------------------------
Sorted by:
Note: Dataset has changed since last saved.
I am using PostgreSQL v9.6.5 and Stata v14.2.
You can do this in Stata by compress-ing your data after you load the variables:
clear
input strL string
"My name is Pearly Spencer"
"I am a contributor on Stack Overflow"
"This is an example variable"
end
describe
Contains data
obs: 3
vars: 1
size: 355
------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------------------------------------------------
string          strL    %9s
------------------------------------------------------------------------------------------------------------------------
Sorted by:
Note: Dataset has changed since last saved.
compress, nocoalesce
describe
Contains data
obs: 3
vars: 1
size: 108
------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------------------------------------------------
string          str36   %36s
------------------------------------------------------------------------------------------------------------------------
Sorted by:
Note: Dataset has changed since last saved.
The option nocoalesce forces Stata to choose the appropriate length for the loaded string variables.
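Applied to the ODBC load from the question, the sequence would look something like this (a sketch; <connect_options> stands for whatever options you already pass):

. odbc load, exec("SELECT * FROM mytable") <connect_options>
. compress, nocoalesce

Each text column is then stored as the shortest str# type that fits its longest value, and numeric columns keep their values (compress only changes storage types when no information is lost).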

PostgreSQL convert varchar to numeric and get average

I have a column that I want to get an average of; the column is varchar(200). I keep getting the error below. How do I convert the column to numeric and get an average of it?
Values in the column look like
16,000.00
15,000.00
16,000.00 etc
When I execute
select CAST((COALESCE( bonus,'0')) AS numeric)
from tableone
... I get
ERROR: invalid input syntax for type numeric:
The standard way to represent (as text) a numeric in SQL is something like:
16000.00
15000.00
16000.00
So, your commas in the text are hurting you.
The most sensible way to solve this problem would be to store the data just as a numeric instead of using a string (text, varchar, character) type, as already suggested by a_horse_with_no_name.
However, assuming this is done for a good reason, such as having inherited a design you cannot change, one possibility is to get rid of all the characters which are not a minus sign, digit, or period before casting to numeric.
Let's assume this is your input data:
CREATE TABLE tableone
(
bonus text
) ;
INSERT INTO tableone(bonus)
VALUES
('16,000.00'),
('15,000.00'),
('16,000.00'),
('something strange 25'),
('why do you actually use a "text" column if you could just define it as numeric(15,0)?'),
(NULL) ;
You can remove all the extraneous characters with regexp_replace and the proper regular expression ([^-0-9.]), applied globally:
SELECT
CAST(
COALESCE(
NULLIF(
regexp_replace(bonus, '[^-0-9.]+', '', 'g'),
''),
'0')
AS numeric)
FROM
tableone ;
| coalesce |
| -------: |
| 16000.00 |
| 15000.00 |
| 16000.00 |
| 25 |
| 150 |
| 0 |
See what happens to the numeric(15,0) inside the last long string: it comes out as 150 (this may NOT be what you want).
Check everything at dbfiddle here
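With that expression in place, the average the question asked for is just AVG over it, for example:

SELECT AVG(
           CAST(
               COALESCE(
                   NULLIF(regexp_replace(bonus, '[^-0-9.]+', '', 'g'), ''),
                   '0')
               AS numeric)
       ) AS avg_bonus
FROM tableone;

Whether blank bonuses should count as 0 (as the COALESCE does here) or be skipped entirely is a design choice; AVG ignores NULLs, so dropping the COALESCE excludes them from the average instead.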
I'm going to go out on a limb and say that it might be because you have empty strings rather than NULLs in your column; this would result in the error you are seeing. Try wrapping the column name in a NULLIF:
SELECT CAST(COALESCE(NULLIF(bonus, ''), '0') AS numeric) AS new_field
But I would really question a schema that stores numeric values in a varchar column...

Strange behaviour in PostgreSQL

I'm new to Postgresql and I'm trying to migrate my application from MySQL.
I have a table with the following structure:
Table "public.tbl_point"
Column | Type | Modifiers | Storage | Description
------------------------+-----------------------+-----------+----------+-------------
Tag_Id | integer | not null | plain |
Tag_Name | character varying(30) | not null | extended |
Quality | integer | not null | plain |
Execute | integer | not null | plain |
Output_Index | integer | not null | plain |
Last_Update | abstime | | plain |
Indexes:
"tbl_point_pkey" PRIMARY KEY, btree ("Tag_Id")
Triggers:
add_current_date_to_tbl_point BEFORE UPDATE ON tbl_point FOR EACH ROW EXECUTE PROCEDURE update_tbl_point()
Has OIDs: no
When I run this query through a C program using libpq:
UPDATE tbl_point SET "Execute"=0 WHERE "Tag_Id"=0
I got the following output:
ERROR: record "new" has no field "last_update"
CONTEXT: PL/pgSQL function "update_tbl_point" line 3 at assignment
I get exactly the same error when I try to change the value of "Execute" or any other column using pgAdminIII.
Everything works fine if I change the column name from "Last_Update" to "last_update".
I found the same problem with other tables I have in my database and the column always appears with abstime or timestamp columns.
Your update_tbl_point function is probably doing something like this:
new.last_update = current_timestamp;
but it should be using new."Last_Update" instead, so fix your trigger function.
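A corrected version might look like this (a sketch: the question doesn't show the original function body, so everything other than the quoted column name is an assumption):

CREATE OR REPLACE FUNCTION update_tbl_point() RETURNS trigger AS $$
BEGIN
    -- A quoted, mixed-case column must be quoted in the trigger body too.
    NEW."Last_Update" := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

now() returns timestamptz, which PostgreSQL casts to the column's abstime type on assignment (abstime is a deprecated type, removed in PostgreSQL 12).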
Column names are normalized to lower case in PostgreSQL (the opposite of what the SQL standard says mind you) but identifiers that are double quoted maintain their case:
Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to lower case. For example, the identifiers FOO, foo, and "foo" are considered the same by PostgreSQL, but "Foo" and "FOO" are different from these three and each other. (The folding of unquoted names to lower case in PostgreSQL is incompatible with the SQL standard, which says that unquoted names should be folded to upper case. Thus, foo should be equivalent to "FOO" not "foo" according to the standard. If you want to write portable applications you are advised to always quote a particular name or never quote it.)
So, if you do this:
create table pancakes (
Eggs integer not null
)
then you can do any of these:
update pancakes set eggs = 11;
update pancakes set Eggs = 11;
update pancakes set EGGS = 11;
and it will work because all three forms are normalized to eggs. However, if you do this:
create table pancakes (
"Eggs" integer not null
)
then you can do this:
update pancakes set "Eggs" = 11;
but not this:
update pancakes set eggs = 11;
The usual practice with PostgreSQL is to use lower case identifiers everywhere so that you don't have to worry about it. I'd recommend the same naming scheme in other databases as well; having to quote everything just leaves you with a mess of double quotes (standard), backticks (MySQL), and brackets (SQL Server) in your SQL, and that won't make you any friends.