Is it possible to use pandas/SQLAlchemy to insert arrays into a SQL database (Postgres)? - postgresql

With the following:
engine = sqlalchemy.create_engine(url)
df = pd.DataFrame({
    "eid": [1, 2],
    "f_i": [123, 1231],
    "f_i_arr": [[123], [0]],
    "f_53": ["2013/12/1", "2013/12/1"],
    "f_53a": [["2013/12/1"], ["2013/12/1"]],
})
with engine.connect() as con:
    con.execute("""
        DROP TABLE IF EXISTS public.test;
        CREATE TABLE public.test
        (
            eid integer NOT NULL,
            f_i INTEGER NULL,
            f_i_arr INTEGER NULL,
            f_53 DATE NULL,
            f_53a DATE[] NULL,
            PRIMARY KEY(eid)
        );
    """)
    df.to_sql("test", con, if_exists='append')
If I try to insert only column "f_53" (a date), it succeeds.
If I try to add column "f_53a" (a date[]), it fails with:
^
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) column "f_53a" is of type date[] but expression is of type text[]
LINE 1: ..._53, f_53a, f_i, f_i_arr) VALUES (1, '2013/12/1', ARRAY['201...
^
HINT: You will need to rewrite or cast the expression.
[SQL: 'INSERT INTO test (eid, f_53, f_53a, f_i, f_i_arr) VALUES (%(eid)s, %(f_53)s, %(f_53a)s, %(f_i)s, %(f_i_arr)s)'] [parameters: ({'f_53': '2013/12/1', 'f_53a': ['2013/12/1', '2013/12/1'], 'f_i_arr': [123], 'eid': 1, 'f_i': 123}, {'f_53': '2013/12/1', 'f_53a': ['2013/12/1', '2013/12/1'], 'f_i_arr': [0], 'eid': 2, 'f_i': 1231})]

I specified the dtypes explicitly, and that worked for me with Postgres.
# sample code (pgConn is assumed to be an existing SQLAlchemy engine or connection)
import sqlalchemy
from sqlalchemy import create_engine
from sqlalchemy.dialects import postgresql
df.to_sql('mytable', pgConn, if_exists='append', index=False,
          dtype={
              'datetime': sqlalchemy.TIMESTAMP(),
              'cur_c': postgresql.ARRAY(sqlalchemy.types.REAL),
              'volt_c': postgresql.ARRAY(sqlalchemy.types.REAL),
          })
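Applied to the question's columns, the same idea might look like the sketch below. The helper name `array_dtypes` is hypothetical, and the element types are assumptions read off the question's table (treating `f_i_arr` as if it were meant to be `INTEGER[]`). This only declares the column types; string dates inside the lists may still need to be converted to date objects before inserting, depending on the driver.

```python
import sqlalchemy
from sqlalchemy.dialects import postgresql


def array_dtypes():
    """Hypothetical helper: dtype mapping for the question's date and array columns."""
    return {
        "f_53": sqlalchemy.types.DATE(),
        "f_53a": postgresql.ARRAY(sqlalchemy.types.DATE),
        "f_i_arr": postgresql.ARRAY(sqlalchemy.types.INTEGER),
    }


# Show which PostgreSQL type each entry stands for (no database needed).
for name, sa_type in array_dtypes().items():
    print(name, sa_type.compile(dialect=postgresql.dialect()))

# With a real engine the mapping would be passed along as:
# df.to_sql("test", engine, if_exists="append", index=False, dtype=array_dtypes())
```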

Yes, it is possible to insert [] and [][] types from a dataframe into Postgres.
Unlike flat DATE values, which may be parsed correctly from strings by SQL, DATE[] and DATE[][] values need to be converted to datetime objects first, like so:
with engine.connect() as con:
    con.execute("""
        DROP TABLE IF EXISTS public.test;
        CREATE TABLE public.test
        (
            eid integer NOT NULL,
            f_i INTEGER NULL,
            f_ia INTEGER[] NULL,
            f_iaa INTEGER[][] NULL,
            f_d DATE NULL,
            f_da DATE[] NULL,
            f_daa DATE[][] NULL,
            PRIMARY KEY(eid)
        );
    """)
    d = pd.to_datetime("2013/12/1")
    i = 99
    df = pd.DataFrame({
        "eid": [1, 2],
        "f_i": [i, i],
        "f_ia": [None, [i, i]],
        "f_iaa": [[[i, i], [i, i]], None],
        "f_d": [d, d],
        "f_da": [[d, d], None],
        "f_daa": [[[d, d], [d, d]], None],
    })
    # index=False keeps the dataframe index out of the INSERT
    df.to_sql("test", con, if_exists='append', index=False)
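When the data starts out as strings, as in the question, the conversion can be done before calling to_sql. A minimal sketch using only the standard library; the helper name `to_dates` and the `%Y/%m/%d` format are assumptions based on the question's sample values:

```python
from datetime import datetime


def to_dates(value, fmt="%Y/%m/%d"):
    """Recursively turn date strings in (nested) lists into datetime.date objects.

    None is passed through unchanged so that NULL cells stay NULL.
    """
    if value is None:
        return None
    if isinstance(value, (list, tuple)):
        return [to_dates(item, fmt) for item in value]
    return datetime.strptime(value, fmt).date()


print(to_dates(["2013/12/1", "2013/12/1"]))
print(to_dates([["2013/12/1"], None]))
```

On a dataframe this could be applied per column, for example `df["f_53a"] = df["f_53a"].apply(to_dates)`.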

Related

R2DBC: using a Collection as a repository method parameter to compare with uuid[]

Database: R2DBC Postgres
I have a table column model_id of type uuid[].
create table t_job
(
    id uuid default gen_random_uuid() not null
        primary key,
    model_id uuid[] not null,
    -- ANOTHER COLUMN --
);
I need to compare the values in column model_id with a Set<UUID>:
@Query("""
    SELECT case
        WHEN COUNT(j) >= 1
        THEN true
        ELSE false
    END
    FROM t_job AS j
    WHERE j.model_id IN :modelIdSet
    AND j.state = 'done'
    AND j.output_format = 'COLLISION'
    """)
Mono<Boolean> isCollisionJobDoneBySeveralModelsId(String modelIdSet);
OUTPUT:
"debugMessage": "executeMany; bad SQL grammar [ SELECT case\n WHEN COUNT(j) >= 1\n THEN true\n ELSE false\n END\n FROM runner_processing_service.t_job AS j\n WHERE j.model_id IN :modelIdSet\n AND j.state = 'done'\n AND j.output_format = 'COLLISION'\n]; nested exception is io.r2dbc.postgresql.ExceptionFactory$PostgresqlBadGrammarException: [42601] syntax error at or near "$1""
How do I correctly compare the values in the uuid[] column with a Set<UUID>?
I tried converting the Set to a String and passing that string to the repository method, but that does not work either.
The query works correctly when run from the console.

getting started testing db functions pytest Process finished with exit code 5

I need to test loads of functions that use a sqlite db. To get started I want to use pytest for fixtures.
conftest.py:
import pytest
import sqlite3

@pytest.fixture
def session():
    connection = sqlite3.connect(":memory:")
    cursor = connection.cursor()
    cursor.execute("CREATE TABLE Investment (ID INTEGER PRIMARY KEY, name VARCHAR(120), ticker VARCHAR(10), exchange VARCHAR(10), type INTEGER, relativeAddress VARCHAR(50), sharesiesFundID VARCHAR(36))")
    cursor.execute("CREATE TABLE Orders (investmentID INTEGER NOT NULL, logTimestamp TIMESTAMP NOT NULL, amount INTEGER, PRIMARY KEY(investmentID, logTimestamp), FOREIGN KEY(investmentID) REFERENCES Investment(ID))")
    cursor.execute("CREATE TABLE InvestmentType (typeID INTEGER NOT NULL PRIMARY KEY , entryName VARCHAR(20))")
    cursor.execute("INSERT INTO InvestmentType (typeID, entryName) VALUES (0, 'Company'), (1, 'ETF'), (2, 'Managed Fund')")
    cursor.execute("INSERT INTO Investment (name, ticker, exchange, type, relativeAddress, sharesiesFundID) VALUES ('3M Co.', 'MMM', 'NYSE', 0, 'nyse-mmm', '94de52ef-324f-4d24-8a80-a5d2f00656bf'), ('a2 Milk Company', 'ATM', 'NZX', 0, 'atm', 'deff31bd-625b-4a82-bbc2-064c7b70b97c'), ('Abbott Laboratories', 'ABT', 'NYSE', 0, 'nyse-abt', 'a367613c-a9bd-4562-a8fd-459e7bd4f5ae')")
    connection.commit()
    yield cursor
    connection.close()
test_db.py:
def get_entry(session):
    result = session.execute("SELECT name FROM Investment WHERE ID = 3").fetchone()
    assert result[0][1] == 'Abbott Laboratories'
This keeps resulting in "Process finished with exit code 5".
I've tried putting everything in the same file, and some other configurations for pytest.

SQLAlchemy seems to have no support for INSERT CTEs

Given the table definition and query below, I need to get the old values before an update:
CREATE TABLE IF NOT EXISTS products(
    id INT GENERATED BY DEFAULT AS IDENTITY NOT NULL PRIMARY KEY,
    product_id INT UNIQUE,
    image_link CHARACTER VARYING NOT NULL,
    additional_image_links CHARACTER VARYING[] NOT NULL
);
WITH temp AS (
    INSERT INTO products(product_id, image_link, additional_image_links)
    VALUES(1, 'http://www.e1xazm1ple1k113.com',ARRAY['http://www.examkple1113.com','http://www.example2.com'])
    ON CONFLICT (product_id) DO UPDATE SET image_link = EXCLUDED.image_link, additional_image_links = EXCLUDED.additional_image_links
    WHERE products.image_link != EXCLUDED.image_link OR products.additional_image_links != EXCLUDED.additional_image_links OR products.image_link != EXCLUDED.image_link
    RETURNING id, image_link, additional_image_links
)
SELECT image_link, additional_image_links FROM products WHERE id IN (SELECT id FROM temp);
If a conflict happens and the new values meet the criteria, a result is returned. However, I need to do this with SQLAlchemy's machinery. An approximate, non-working example:
def upsert(table, rows, constraint, update_cols):
    query = insert(table).values(rows)
    return query.on_conflict_do_update(
        constraint=constraint,
        set_={c: getattr(query.excluded, c) for c in update_cols},
        where=getattr(table.c, "additional_image_link") != getattr(query.excluded, "additional_image_link"),
    ).cte("upsert")
Calling it raises this exception:
sesh = session(autocommit=False, autoflush=False, engine=DEFAULT)
sesh.execute(upsert(*args))
sqlalchemy.exc.ArgumentError: Executable SQL or text() construct expected, got <sqlalchemy.sql.selectable.CTE at 0x1042c3f10; upsert>.
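The error says that a CTE on its own is not executable; it has to be referenced from an enclosing statement, just as the raw SQL wraps the INSERT in WITH ... SELECT. A hedged sketch, assuming SQLAlchemy 1.4 or later: the `products` table object mirrors the DDL above, the function name `upsert_returning_old` and the sample row are hypothetical, and the statement is only compiled here, not run against a database.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, or_, select
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import insert

metadata = MetaData()
products = Table(
    "products", metadata,
    Column("id", Integer, primary_key=True),
    Column("product_id", Integer, unique=True),
    Column("image_link", String, nullable=False),
    Column("additional_image_links", postgresql.ARRAY(String), nullable=False),
)


def upsert_returning_old(table, row, update_cols):
    """Hypothetical helper: SELECT wrapped around an INSERT ... ON CONFLICT CTE."""
    ins = insert(table).values(row)
    upsert = (
        ins.on_conflict_do_update(
            index_elements=[table.c.product_id],
            set_={c: getattr(ins.excluded, c) for c in update_cols},
            # Only update when at least one of the listed columns differs.
            where=or_(*[table.c[c] != getattr(ins.excluded, c) for c in update_cols]),
        )
        .returning(table.c.id)
        .cte("upsert")
    )
    # The outer SELECT is the executable statement; it references the CTE.
    return select(*[table.c[c] for c in update_cols]).where(
        table.c.id.in_(select(upsert.c.id))
    )


stmt = upsert_returning_old(
    products,
    {"product_id": 1, "image_link": "http://www.example2.com",
     "additional_image_links": ["http://www.example2.com"]},
    ["image_link", "additional_image_links"],
)
print(stmt.compile(dialect=postgresql.dialect()))
```

With a session, `sesh.execute(stmt)` would then run it. As in the raw SQL, the outer SELECT does not see the changes made inside the CTE, so it returns the values from before the update.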

Use result from one operator inside another

I would like to use the result of get_all_pets in get_birth_date. How can I access it inside get_birth_date? I would also like to print the result of get_all_pets; where could I define such a print()?
Where in my code could I do this?
import datetime
from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator

# create_pet_table, populate_pet_table, get_all_pets, and get_birth_date are examples of tasks created by
# instantiating the Postgres Operator
with DAG(
    dag_id="postgres_operator_dag",
    start_date=datetime.datetime(2020, 2, 2),
    schedule_interval="@once",
    catchup=False,
) as dag:
    create_pet_table = PostgresOperator(
        task_id="create_pet_table",
        postgres_conn_id="postgres_robert",
        sql="""
            CREATE TABLE IF NOT EXISTS pet (
            pet_id SERIAL PRIMARY KEY,
            name VARCHAR NOT NULL,
            pet_type VARCHAR NOT NULL,
            birth_date DATE NOT NULL,
            OWNER VARCHAR NOT NULL);
            """,
    )
    populate_pet_table = PostgresOperator(
        task_id="populate_pet_table",
        postgres_conn_id="postgres_robert",
        sql="""
            INSERT INTO pet (name, pet_type, birth_date, OWNER)
            VALUES ( 'Max', 'Dog', '2018-07-05', 'Jane');
            INSERT INTO pet (name, pet_type, birth_date, OWNER)
            VALUES ( 'Susie', 'Cat', '2019-05-01', 'Phil');
            INSERT INTO pet (name, pet_type, birth_date, OWNER)
            VALUES ( 'Lester', 'Hamster', '2020-06-23', 'Lily');
            INSERT INTO pet (name, pet_type, birth_date, OWNER)
            VALUES ( 'Quincy', 'Parrot', '2013-08-11', 'Anne');
            """,
    )
    get_all_pets = PostgresOperator(task_id="get_all_pets", postgres_conn_id="postgres_robert", sql="SELECT * FROM pet;")
    get_birth_date = PostgresOperator(
        task_id="get_birth_date",
        postgres_conn_id="postgres_robert",
        sql="SELECT * FROM pet WHERE birth_date BETWEEN SYMMETRIC %(begin_date)s AND %(end_date)s",
        parameters={"begin_date": "2020-01-01", "end_date": "2020-12-31"},
        runtime_parameters={'statement_timeout': '3000ms'},
    )
    create_pet_table >> populate_pet_table >> get_all_pets >> get_birth_date
PostgresOperator is not suitable for running SELECT statements.
SELECT statements are more suitable for transfer operators or using hooks directly.
In your case you should use the PostgresHook:
from airflow.decorators import task
from airflow.providers.postgres.hooks.postgres import PostgresHook

@task()
def get_all_pets(**kwargs):
    hook = PostgresHook(postgres_conn_id="postgres_robert")
    df = hook.get_pandas_df(sql="SELECT * FROM pet;")
    print(df)

Unable to insert nested record in postgres

I managed to create the tables in Postgres but ran into issues when trying to insert values.
commands = (
    """
    CREATE TYPE student AS (
        name TEXT,
        id INTEGER
    )
    """,
    """
    CREATE TABLE studentclass(
        date DATE NOT NULL,
        time TIMESTAMPTZ NOT NULL,
        PRIMARY KEY (date, time),
        class student
    )
    """,
)
And in psycopg2:
command = (
    "INSERT INTO studentclass (date, time, student) VALUES (%s,%s, ROW(%s,%s)::student)"
)
student_rec = ("John", 1)
record_to_insert = ("2020-05-21", "2020-05-21 08:10:00", student_rec)
cursor.execute(command, record_to_insert)
When executed, it fails with an incorrect-argument error, and if I hard-code the student value inside the INSERT statement, it reports that the column student is not recognized.
Please advise.
One issue is that the column is named class, not student. The second is that psycopg2 adapts a Python tuple to a composite type, so the tuple can be passed as a single parameter.
So you can do:
insert_sql = "INSERT INTO studentclass (date, time, class) VALUES (%s,%s,%s)"
student_rec = ("John", 1)
record_to_insert = ("2020-05-21", "2020-05-21 08:10:00", student_rec)
cur.execute(insert_sql, record_to_insert)
con.commit()
select * from studentclass ;
date | time | class
------------+-------------------------+----------
05/21/2020 | 05/21/2020 08:10:00 PDT | (John,1)