COPY NULL values in PostgreSQL using psycopg2 copy_from()

This seems like a rather popular question, but all the answers around here did not help me solve the issue... I have a Postgresql 9.5 table on my OS X machine:
CREATE TABLE test (col1 TEXT, col2 INT)
The following function uses the psycopg2 copy_from() command:
from io import BytesIO

def test_copy(conn, curs, data):
    cpy = BytesIO()
    for row in data:
        # str(None) writes the literal string 'None' into the buffer
        cpy.write('\t'.join([str(x) for x in row]) + '\n')
    print cpy
    cpy.seek(0)
    curs.copy_from(cpy, 'test')

test_copy(connection, cursor, [('a', None), ('b', None)])
Running it results in this error:
ERROR: invalid input syntax for integer: "None"
CONTEXT: COPY test, line 1, column col2: "None"
STATEMENT: COPY test FROM stdin WITH DELIMITER AS ' ' NULL AS '\N'
I also tried curs.copy_from(cpy, 'test', null='') and curs.copy_from(cpy, 'test', null='NULL'). Any suggestions are greatly appreciated.

OK, after more trial & error I found the solution:
copy_from(cpy, 'test', null='None')
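An alternative, if you would rather keep COPY's default NULL marker instead of renaming it, is to write \N for None while building the buffer. A minimal sketch (assuming Python 3, where copy_from accepts a text buffer such as io.StringIO; copy_with_nulls is a hypothetical helper name):

from io import StringIO

def copy_with_nulls(curs, data, table):
    buf = StringIO()
    for row in data:
        # emit COPY's default NULL marker, \N, instead of the string 'None'
        buf.write('\t'.join(r'\N' if v is None else str(v) for v in row) + '\n')
    buf.seek(0)
    curs.copy_from(buf, table)

copy_with_nulls(cursor, [('a', None), ('b', None)], 'test')

Unlike null='None', this does not silently turn a genuine 'None' text value into NULL.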

Related

How to properly insert from stdin in postgresql?

A normal insert works:
insert into tfreeze(id,s) values(1,'foo');
I tried the following two ways; neither works:
copy tfreeze(id,s ) from stdin;
1 foo
\.
copy tfreeze(id,s ) from stdin;
1 'foo'
\.
Only a few questions on Stack Overflow relate to COPY FROM stdin: https://stackoverflow.com/search?q=Postgres+Insert+statements+from+stdin
--
error code:
ERROR: 22P02: invalid input syntax for type integer: "1 foo"
CONTEXT: COPY tfreeze, line 1, column id: "1 foo"
LOCATION: pg_strtoint32, numutils.c:320
I got the code from this book: https://postgrespro.ru/education/books/internals
code source: https://prnt.sc/eEsRZ5AK-tjQ
So far I tried:
1, foo, 1\t'foo', 1\tfoo
First, you have to use psql for that (you are already doing that).
You get that error because you are using the default text format, which requires values to be separated by tab characters (ASCII 9).
I recommend that you use the CSV format and separate the values with commas:
COPY tfreeze (id, s) FROM STDIN (FORMAT 'csv', FREEZE);
1,foo
\.
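The same load can also be driven from psycopg2 instead of psql. A minimal sketch (the DSN is a placeholder; FREEZE is left out because it only works when the table was created or truncated in the same transaction):

import io
import psycopg2

conn = psycopg2.connect("dbname=test")  # hypothetical DSN
with conn, conn.cursor() as curs:
    buf = io.StringIO("1,foo\n2,bar\n")
    # CSV format: comma-separated values, no tab characters required
    curs.copy_expert("COPY tfreeze (id, s) FROM STDIN (FORMAT csv)", buf)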

PSQL - encrypt_iv return multiple line encoded text

I am facing an issue loading a CSV that results from a query that encrypts a text column.
This is the query I run:
select
id,
encode(encrypt_iv(raw_address::bytea, '<aes_key>', '<iv>', 'aes-cbc/pad:pkcs'), 'base64') raw_address
from some_table;
But I get a multiline text result for the raw_address column. So I tried:
select
id,
encode(encrypt_iv(replace(raw_address, chr(92), chr(47))::bytea, '<aes_key>', '<iv>', 'aes-cbc/pad:pkcs'), 'base64') raw_address
from some_table;
I did this because I wanted to turn \ into / (to avoid \n).
But I got the same result (the original post shows a screenshot of the output). Then I found an answer pointing out that a + character was present, so I tried:
select
id,
replace(encode(encrypt_iv(replace(raw_address, chr(92), chr(47))::bytea, '<aes_key>', '<iv>', 'aes-cbc/pad:pkcs'), 'base64'), chr(10), '')
from some_table;
Then I got one line. But I don't know if I am modifying the original value, and I cannot decrypt it. I tried:
select encode(decrypt_iv('55WHZ7tyGAlQxTIM0fPfY5tOKpbYzwdXCsemIgYV5TRG+h45IW1nU/zCqZbkIeiXQ3OXZSlHo0RPgq5wcgJ0xQ==', '<aes_key>', '<iv>', 'aes-cbc/pad:pkcs'), 'base64') ;
But I got:
ERROR: decrypt_iv error: Data not a multiple of block size
Any suggestions will be appreciated.
Thanks in advance!
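Two things seem to be going on here. PostgreSQL's encode(..., 'base64') wraps its output with a newline every 76 characters, which is where the extra lines come from, and the decryption attempt feeds the base64 text straight into decrypt_iv instead of decoding it back to bytea first (it also uses encode where decode is needed). A sketch of the corrected decryption, run through psycopg2 (the DSN and the <aes_key>/<iv> placeholders are assumptions carried over from the question):

import psycopg2

b64 = ('55WHZ7tyGAlQxTIM0fPfY5tOKpbYzwdXCsemIgYV5TRG'
       '+h45IW1nU/zCqZbkIeiXQ3OXZSlHo0RPgq5wcgJ0xQ==')
conn = psycopg2.connect("dbname=test")  # hypothetical DSN
with conn, conn.cursor() as curs:
    # decode() turns the base64 text back into bytea before decrypting;
    # the base64 decoder tolerates the embedded newlines
    curs.execute(
        "SELECT convert_from(decrypt_iv(decode(%s, 'base64'), "
        "'<aes_key>', '<iv>', 'aes-cbc/pad:pkcs'), 'utf8')",
        (b64,))
    print(curs.fetchone()[0])

Stripping chr(10) from the encoded value, as in the last query above, is harmless: base64 decoders ignore the line breaks, so the decoded ciphertext is unchanged.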

Snowflake null values quoted in CSV breaks PostgreSQL unload

I am trying to move data from Snowflake to PostgreSQL, and to do so I first unload it to S3 in CSV format. Commas can appear in the text fields, so I use Snowflake's FIELD_OPTIONALLY_ENCLOSED_BY unload option to quote the contents of the problematic cells. However, when that quoting combines with NULL values, I cannot manage to produce a CSV that is valid for PostgreSQL.
I created a simple table for you to understand the issue. Here it is:
CREATE OR REPLACE TABLE PUBLIC.TEST(
TEXT_FIELD VARCHAR,
NUMERIC_FIELD INT
);
INSERT INTO PUBLIC.TEST VALUES
('A', 1),
(NULL, 2),
('B', NULL),
(NULL, NULL),
('Hello, world', NULL)
;
COPY INTO @STAGE/test
FROM PUBLIC.TEST
FILE_FORMAT = (
COMPRESSION = NONE,
TYPE = CSV,
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
NULL_IF = ''
)
OVERWRITE = TRUE;
From that, Snowflake will create the following CSV:
"A",1
"",2
"B",""
"",""
"Hello, world",""
But after that, it is impossible for me to copy this CSV into a PostgreSQL table as it is.
Even though the PostgreSQL documentation says, about the NULL option:
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format.
Not setting the NULL option in the PostgreSQL COPY results in a failed load. Indeed, it won't work, as we also have to specify the quote character using QUOTE; here it will be QUOTE '"'.
Therefore, during the PostgreSQL load:
FORMAT csv, HEADER false, QUOTE '"' gives:
DataError: invalid input syntax for integer: "" CONTEXT: COPY test, line 3, column numeric_field: ""
FORMAT csv, HEADER false, NULL '""', QUOTE '"' gives:
NotSupportedError: CSV quote character must not appear in the NULL specification
FYI, to test the load from S3 I use this command in PostgreSQL:
CREATE TABLE IF NOT EXISTS PUBLIC.TEST(
TEXT_FIELD VARCHAR,
NUMERIC_FIELD INT
);
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;
SELECT aws_s3.table_import_from_s3(
'PUBLIC.TEST',
'',
'(FORMAT csv, HEADER false, NULL ''""'', QUOTE ''"'')',
'bucket',
'test_0_0_0.csv',
'aws_region'
)
Thanks a lot for any ideas on how to make this work. I would love to find a solution that doesn't require modifying the CSV between Snowflake and PostgreSQL. I think the issue is more on the Snowflake side, as it doesn't really make sense to quote null values, but PostgreSQL is not helping either.
When you set the NULL_IF value to '', you are actually telling Snowflake to convert NULLs to a blank string, which then gets quoted. When you are copying out of Snowflake, the copy options are "backwards" in a sense, and NULL_IF acts more like an IFNULL.
This is the code that I'd use on the Snowflake side, which will result in an unquoted empty string in your CSV file:
FILE_FORMAT = (
COMPRESSION = NONE,
TYPE = CSV,
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
NULL_IF = ()
)
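If the unload then behaves as described, the NULLs come out as unquoted empty fields, which PostgreSQL's CSV mode already reads as NULL by default. A quick local check with psycopg2 (the DSN is a placeholder; this bypasses aws_s3 purely to validate the format):

import io
import psycopg2

conn = psycopg2.connect("dbname=test")  # hypothetical DSN
csv_data = '"A",1\n,2\n"B",\n,\n"Hello, world",\n'  # expected unload shape
with conn, conn.cursor() as curs:
    curs.execute("CREATE TABLE IF NOT EXISTS public.test "
                 "(text_field varchar, numeric_field int)")
    # no NULL option needed: unquoted empty strings are NULL in CSV mode
    curs.copy_expert("COPY public.test FROM STDIN (FORMAT csv)",
                     io.StringIO(csv_data))
    curs.execute("SELECT count(*) FROM public.test WHERE numeric_field IS NULL")
    print(curs.fetchone()[0])  # expect 3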

Psycopg2 execute_values sending all values as text

I have this table in postgres
CREATE TABLE target (
a json,
b integer,
c text[],
id integer,
CONSTRAINT id_fkey FOREIGN KEY (id)
REFERENCES public.other_table(id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
)
into which I would like to insert data from psycopg2 using:
import psycopg2
import psycopg2.extras as extras

# data is of the form dict, integer, list(string), string <- used to get fkey id
data = [[extras.Json([{'a': 1, 'b': 2}, {'d': 3, 'e': 2}]), 1, ['hello', 'world'], 'ident1'],
        [extras.Json([{'a': 4, 'b': 3}, {'d': 1, 'e': 9}]), 5, ['hello2', 'world2'], 'ident2']]
# convert data to a list of tuples containing objects
x = [tuple(u) for u in data]
# insert data into the database
query = ('WITH ins (a, b, c, ident) AS '
         '(VALUES %s) '
         'INSERT INTO target (a, b, c, id) '
         'SELECT '
         'ins.a, '
         'ins.b, '
         'ins.c, '
         'other_table.id '
         'FROM '
         'ins '
         'LEFT JOIN other_table ON ins.ident = other_table.ident;')
cursor = conn.cursor()
extras.execute_values(cursor, query, x)
When I run this I get the error: column "a" is of type json but expression is of type text. I tried to solve this by adding a type cast in the SELECT statement but then I got the same error for c and then for b.
Originally I thought the problem lay in the WITH statement, but based on the answers to my previous question, Postgres `WITH ins AS ...` casting everything as text, this seems not to be the case.
It seems that execute_values is sending all the values as quoted text literals.
Main Question: How can I get execute_values to send the values based on their python data type rather than just as text?
Sub questions:
How can I confirm that execute_values is in fact sending the values as text with quotation marks?
What is the purpose of the template argument of execute_values (https://www.psycopg.org/docs/extras.html), and could that be of help?
The issue, as Adrian Klaver points out in their comment, and also seen in this answer, is that the typing is lost in the CTE.
We can show this with an example in the psql shell:
CREATE TABLE test (col json);
WITH cte (c) AS (VALUES ('{"a": 1}'))
INSERT INTO test (col) SELECT c FROM cte;
resulting in
ERROR: column "col" is of type json but expression is of type text
whereas this version, with the type specified, succeeds:
WITH cte(c) AS (VALUES ('{"a": 1}'::json))
INSERT INTO test (col) SELECT c FROM cte;
We can mimic this in execute_values by providing the typing information in the template argument:
extras.execute_values(cursor, query, data, template='(%s::json, %s, %s, %s)')
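As a self-contained check against the single-column test table from the psql example above (connection details assumed), a sketch:

import psycopg2
import psycopg2.extras as extras

conn = psycopg2.connect("dbname=test")  # hypothetical DSN
rows = [(extras.Json({'a': 1}),), (extras.Json({'a': 2}),)]
query = ('WITH cte (c) AS (VALUES %s) '
         'INSERT INTO test (col) SELECT c FROM cte')
with conn, conn.cursor() as curs:
    # the template restores the typing that is otherwise lost inside the CTE
    extras.execute_values(curs, query, rows, template='(%s::json)')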

How to deal with missing values when importing CSV to Postgres?

I would like to import a CSV file which has multiple occurrences of missing values. I recoded them to NULL and tried to import the file. I suppose that my attributes which include the NULLs are treated as character values; however, transforming them to numeric afterwards is a bit complicated. Therefore I would like to import the whole table with:
\copy player_allstar FROM '/Users/Desktop/Rdaten/Data/player_allstar.csv' DELIMITER ';' CSV WITH NULL AS 'NULL' ';' HEADER
There must be a syntax error. But I tried different combinations and always get:
ERROR: syntax error at or near "WITH NULL"
LINE 1: COPY player_allstar FROM STDIN DELIMITER ';' CSV WITH NULL ...
I also tried:
\copy player_allstar FROM '/Users/Desktop/Rdaten/Data/player_allstar.csv' WITH(FORMAT CSV, DELIMITER ';', NULL 'NULL', HEADER);
and get:
ERROR: invalid input syntax for integer: "NULL"
CONTEXT: COPY player_allstar, line 2, column dreb: "NULL"
I suppose it is caused by preprocessing with R. The table came with NAs, so I changed them to:
data[data==NA] <- "NULL"
I'm not aware of a different way of changing them to NULL. I think this causes strings. Is there a different way to preprocess and keep the NAs (as NULLs in Postgres, of course)?
Sample:
pts dreb oreb reb asts stl
11 NULL NULL 8 3 NULL
4 5 3 8 2 1
3 NULL NULL 1 1 NULL
The data type is integer.
Given /tmp/sample.csv:
pts;dreb;oreb;reb;asts;stl
11;NULL;NULL;8;3;NULL
4;5;3;8;2;1
3;NULL;NULL;1;1;NULL
then with a table like:
CREATE TABLE player_allstar (pts integer, dreb integer, oreb integer, reb integer, asts integer, stl integer);
it works for me:
\copy player_allstar FROM '/tmp/sample.csv' WITH (FORMAT CSV, DELIMITER ';', NULL 'NULL', HEADER);
Your syntax is fine; the problem seems to be in the formatting of your data. Using your syntax, I was able to load data with NULLs successfully:
mydb=# create table test(a int, b text);
CREATE TABLE
mydb=# \copy test from stdin WITH(FORMAT CSV, DELIMITER ';', NULL 'NULL', HEADER);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> col a header;col b header
>> 1;one
>> NULL;NULL
>> 3;NULL
>> NULL;four
>> \.
mydb=# select * from test;
a | b
---+------
1 | one
|
3 |
| four
(4 rows)
mydb=# select * from test where a is null;
a | b
---+------
|
| four
(2 rows)
In your case you can substitute NULL 'NA' in the copy command, if the original value is 'NA'.
You should make sure that there are no spaces around your data values. For example, if your NULL is represented as NA in your data and fields are delimited with semicolons:
1;NA <-- good
1 ; NA <-- bad
1<tab>NA <-- bad
etc.
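The same load can also be scripted with psycopg2 if that is more convenient. A sketch assuming the /tmp/sample.csv file and player_allstar table from the first answer (the DSN is a placeholder):

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical DSN
with conn, conn.cursor() as curs, open('/tmp/sample.csv') as f:
    # NULL 'NULL' maps the file's NULL markers straight to SQL NULL
    curs.copy_expert(
        "COPY player_allstar FROM STDIN "
        "(FORMAT csv, DELIMITER ';', NULL 'NULL', HEADER)",
        f)

Swap NULL 'NULL' for NULL 'NA' if the file still contains the original NA markers.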