Use result from one operator inside another - postgresql

I would like to use the result from get_all_pets inside get_birth_date. How can I access it there? Moreover, I would like to print the result from get_all_pets. Where could I define such a print()?
Where in my code could I do this?
import datetime

from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator

# create_pet_table, populate_pet_table, get_all_pets, and get_birth_date are
# examples of tasks created by instantiating the Postgres Operator

with DAG(
    dag_id="postgres_operator_dag",
    start_date=datetime.datetime(2020, 2, 2),
    schedule_interval="@once",
    catchup=False,
) as dag:
    create_pet_table = PostgresOperator(
        task_id="create_pet_table",
        postgres_conn_id="postgres_robert",
        sql="""
            CREATE TABLE IF NOT EXISTS pet (
                pet_id SERIAL PRIMARY KEY,
                name VARCHAR NOT NULL,
                pet_type VARCHAR NOT NULL,
                birth_date DATE NOT NULL,
                OWNER VARCHAR NOT NULL);
            """,
    )
    populate_pet_table = PostgresOperator(
        task_id="populate_pet_table",
        postgres_conn_id="postgres_robert",
        sql="""
            INSERT INTO pet (name, pet_type, birth_date, OWNER)
            VALUES ( 'Max', 'Dog', '2018-07-05', 'Jane');
            INSERT INTO pet (name, pet_type, birth_date, OWNER)
            VALUES ( 'Susie', 'Cat', '2019-05-01', 'Phil');
            INSERT INTO pet (name, pet_type, birth_date, OWNER)
            VALUES ( 'Lester', 'Hamster', '2020-06-23', 'Lily');
            INSERT INTO pet (name, pet_type, birth_date, OWNER)
            VALUES ( 'Quincy', 'Parrot', '2013-08-11', 'Anne');
            """,
    )
    get_all_pets = PostgresOperator(
        task_id="get_all_pets",
        postgres_conn_id="postgres_robert",
        sql="SELECT * FROM pet;",
    )
    get_birth_date = PostgresOperator(
        task_id="get_birth_date",
        postgres_conn_id="postgres_robert",
        sql="SELECT * FROM pet WHERE birth_date BETWEEN SYMMETRIC %(begin_date)s AND %(end_date)s",
        parameters={"begin_date": "2020-01-01", "end_date": "2020-12-31"},
        runtime_parameters={'statement_timeout': '3000ms'},
    )

    create_pet_table >> populate_pet_table >> get_all_pets >> get_birth_date

PostgresOperator is not suitable for running SELECT statements.
SELECT statements are better suited to transfer operators, or to using hooks directly.
In your case you should use the PostgresHook:
from airflow.decorators import task
from airflow.providers.postgres.hooks.postgres import PostgresHook

@task()
def get_all_pets(**kwargs):
    hook = PostgresHook(postgres_conn_id="postgres_robert")
    df = hook.get_pandas_df(sql="SELECT * FROM pet;")
    print(df)
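One way to hand the result of get_all_pets to get_birth_date is the TaskFlow API: whatever a @task function returns is pushed to XCom, and a downstream task receives it as an argument. This also answers where the print() can live: inside the task, where its output appears in that task's log. Below is a minimal sketch, assuming Airflow 2 and the connection id from the question; the dag_id, the text cast (to keep the rows JSON-serializable for XCom), and the Python-side date filter are illustrative, not from the original.

import datetime

from airflow import DAG
from airflow.decorators import task
from airflow.providers.postgres.hooks.postgres import PostgresHook

with DAG(
    dag_id="postgres_hook_dag",
    start_date=datetime.datetime(2020, 2, 2),
    schedule_interval="@once",
    catchup=False,
) as dag:

    @task()
    def get_all_pets():
        hook = PostgresHook(postgres_conn_id="postgres_robert")
        # Cast birth_date to text so the rows stay JSON-serializable for XCom.
        rows = hook.get_records("SELECT name, birth_date::text FROM pet;")
        print(rows)  # shows up in the get_all_pets task log
        return rows  # the return value is pushed to XCom automatically

    @task()
    def get_birth_date(rows):
        # `rows` is pulled from XCom; filter in Python as an illustration.
        for name, birth_date in rows:
            if "2020-01-01" <= birth_date <= "2020-12-31":
                print(name, birth_date)

    # Passing the output sets the dependency get_all_pets >> get_birth_date.
    get_birth_date(get_all_pets())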

How to record time in Postgresql via sqlalchemy

I absolutely do not understand how to write the time to the database.
I create a table via sqlalchemy using the Time object. Am I doing everything right?
windows = Table(
    "windows", meta,
    Column("courier_id", Integer, ForeignKey("couriers.courier_id"), nullable=False),
    Column("start_time", Time),
    Column("end_time", Time),
)
And how do I upload the data now? I did it like this:
query = datab.windows.insert().values([1, 09:00, 18:00])
await conn.execute(query)
And there is one more question: how do I specify which columns to fill in insert()?
Use datetime.time() objects to specify the time values, and pass a dict to .values() with the column names as the keys:
import datetime

from sqlalchemy import (
    create_engine,
    Table,
    Column,
    Integer,
    Time,
    ForeignKey,
    MetaData,
)

engine = create_engine("sqlite:///:memory:", echo=True)

windows = Table(
    "windows",
    MetaData(),
    Column(
        "courier_id",
        Integer,
        # ForeignKey("couriers.courier_id"),  # omit for this example
        nullable=False,
    ),
    Column("start_time", Time),
    Column("end_time", Time),
)
windows.create(engine)

stmt = windows.insert().values(
    {
        "courier_id": 1,
        "start_time": datetime.time(9),
        "end_time": datetime.time(18),
    }
)

with engine.begin() as conn:
    conn.execute(stmt)

"""SQL generated:
2021-03-24 16:57:50,811 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2021-03-24 16:57:50,813 INFO sqlalchemy.engine.Engine INSERT INTO windows (courier_id, start_time, end_time) VALUES (?, ?, ?)
2021-03-24 16:57:50,813 INFO sqlalchemy.engine.Engine [generated in 0.00036s] (1, '09:00:00.000000', '18:00:00.000000')
2021-03-24 16:57:50,813 INFO sqlalchemy.engine.Engine COMMIT
"""

Unable to insert nested record in postgres

I managed to create the tables in Postgres but encountered issues when trying to insert values.
commands = """
    CREATE TYPE student AS (
        name TEXT,
        id INTEGER
    );
    CREATE TABLE studentclass (
        date DATE NOT NULL,
        time TIMESTAMPTZ NOT NULL,
        PRIMARY KEY (date, time),
        class student
    );
"""
And in psycopg2:
command = """
    INSERT INTO studentclass (date, time, student)
    VALUES (%s, %s, ROW(%s, %s)::student)
"""
student_rec = ("John", 1)
record_to_insert = ("2020-05-21", "2020-05-21 08:10:00", student_rec)
cursor.execute(command, record_to_insert)
When executed, I get errors about incorrect arguments, and if I hard-code the student value inside the INSERT statement, it tells me about an unrecognized column named student.
Please advise.
One issue is that the column is named class, not student. Second, psycopg2 adapts a Python tuple to a composite type automatically.
So you can do:
insert_sql = "INSERT INTO studentclass (date, time, class) VALUES (%s,%s,%s)"
student_rec = ("John", 1)
record_to_insert = ("2020-05-21", "2020-05-21 08:10:00", student_rec)
cur.execute(insert_sql, record_to_insert)
con.commit()
select * from studentclass;
    date    |          time           |  class
------------+-------------------------+----------
 05/21/2020 | 05/21/2020 08:10:00 PDT | (John,1)
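For the reverse direction, psycopg2 can also hand the composite back as a named tuple via psycopg2.extras.register_composite. A small sketch, reusing the cursor from above:

from psycopg2.extras import register_composite

# Map the server-side "student" type onto a Python named tuple.
register_composite("student", cur)

cur.execute("SELECT class FROM studentclass;")
row = cur.fetchone()
print(row[0].name, row[0].id)  # -> John 1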

Interconnecting tables on PostgreSQL

I am a newbie here.
I am using PostgreSQL to manipulate lots of data in my specific field of research. Unfortunately, I am encountering a problem that is not allowing me to continue my analysis. I tried to simplify my problem to clearly illustrate it.
Let's suppose I have a table called "Buyers" with those data:
table_buyers
The buyers can make ONLY ONE purchase in each store, or none. There are three stores and there is a table for each one, just like below:
table_store1
table_store2
table_store3
To create the tables, I am using the following code:
CREATE TABLE public.buyer
(
    ID integer NOT NULL PRIMARY KEY,
    name text NOT NULL,
    phone text NOT NULL
)
WITH (
    OIDS = FALSE
);

CREATE TABLE public.Store1
(
    ID_buyer integer NOT NULL PRIMARY KEY,
    total_order numeric NOT NULL,
    total_itens integer NOT NULL
)
WITH (
    OIDS = FALSE
);

CREATE TABLE public.Store2
(
    ID_buyer integer NOT NULL PRIMARY KEY,
    total_order numeric NOT NULL,
    total_itens integer NOT NULL
)
WITH (
    OIDS = FALSE
);

CREATE TABLE public.Store3
(
    ID_buyer integer NOT NULL PRIMARY KEY,
    total_order numeric NOT NULL,
    total_itens integer NOT NULL
)
WITH (
    OIDS = FALSE
);
To add the information to the tables, I am using the following code:
INSERT INTO buyer (ID, name, phone) VALUES
(1, 'Alex', 88888888),
(2, 'Igor', 77777777),
(3, 'Mike', 66666666);
INSERT INTO Store1 (ID_buyer, total_order, total_itens) VALUES
(1, 87.45, 8),
(2, 14.00, 3),
(3, 12.40, 4);
INSERT INTO Store2 (ID_buyer, total_order, total_itens) VALUES
(1, 785.12, 7),
(2, 9874.21, 25);
INSERT INTO Store3 (ID_buyer, total_order, total_itens) VALUES
(2, 45.87, 1);
As all the tables are interconnected by buyer's ID, I wish I could have a query that generates an output just like this:
desired output table.
Please, note that if the buyer did not buy anything in a store, I must print '0'.
I know this is an easy task, but unfortunately, I have been failing to accomplish it.
Using the 'AND' logical operator, I tried the following code to accomplish this task:
SELECT
    buyer.id,
    buyer.name,
    store1.total_order,
    store2.total_order,
    store3.total_order
FROM
    public.buyer,
    public.store1,
    public.store2,
    public.store3
WHERE
    buyer.id = store1.id_buyer AND
    buyer.id = store2.id_buyer AND
    buyer.id = store3.id_buyer;
But, obviously, it just returned 'Igor', as this was the only buyer that has bought items in all three stores (print screen).
Then, I tried the 'OR' logical operator, just like the following code:
SELECT
    buyer.id,
    buyer.name,
    store1.total_order,
    store2.total_order,
    store3.total_order
FROM
    public.buyer,
    public.store1,
    public.store2,
    public.store3
WHERE
    buyer.id = store1.id_buyer OR
    buyer.id = store2.id_buyer OR
    buyer.id = store3.id_buyer;
But then, it returns 12 lines with wrong values (print screen).
Clearly, my mistake is not considering that buyers don't have to buy in all three stores. I just can't correct it on my own; can you please help me?
I would greatly appreciate an answer that can light up my way. Thanks a lot!
Tips on how I can search for this issue are very welcome as well!
Ok. I doubt that this is the final answer for you, but it's a start:
SELECT
    buyer.id,
    buyer.name,
    COALESCE( gb_store1.total_order, 0 ) as store1_total,
    COALESCE( gb_store2.total_order, 0 ) as store2_total,
    COALESCE( gb_store3.total_order, 0 ) as store3_total
FROM
    public.buyer
    LEFT OUTER JOIN ( SELECT ID_buyer,
                             SUM( total_order ) as total_order,
                             SUM( total_itens ) as total_itens
                      FROM public.store1
                      GROUP BY ID_buyer ) gb_store1 ON gb_store1.id_buyer = buyer.id
    LEFT OUTER JOIN ( SELECT ID_buyer,
                             SUM( total_order ) as total_order,
                             SUM( total_itens ) as total_itens
                      FROM public.store2
                      GROUP BY ID_buyer ) gb_store2 ON gb_store2.id_buyer = buyer.id
    LEFT OUTER JOIN ( SELECT ID_buyer,
                             SUM( total_order ) as total_order,
                             SUM( total_itens ) as total_itens
                      FROM public.store3
                      GROUP BY ID_buyer ) gb_store3 ON gb_store3.id_buyer = buyer.id;
So, this query has a couple of elements you should focus on. The subselects with GROUP BY total within your sub-tables by ID_buyer. The LEFT OUTER JOINs make it so the query can still return a result even if a subselect finds no matching record. Finally, COALESCE lets you return 0 when one of the totals is NULL (because the subselect found no match).
Hope this helps.
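Since the question states that each buyer makes at most one purchase per store, the aggregation is not strictly needed here; a plain LEFT JOIN per store with COALESCE produces the desired output directly. A sketch against the tables above:

SELECT
    b.id,
    b.name,
    COALESCE( s1.total_order, 0 ) as store1_total,
    COALESCE( s2.total_order, 0 ) as store2_total,
    COALESCE( s3.total_order, 0 ) as store3_total
FROM public.buyer b
LEFT JOIN public.store1 s1 ON s1.id_buyer = b.id
LEFT JOIN public.store2 s2 ON s2.id_buyer = b.id
LEFT JOIN public.store3 s3 ON s3.id_buyer = b.id
ORDER BY b.id;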

Possible to use pandas/sqlalchemy to insert arrays into sql database? (postgres)

With the following:
engine = sqlalchemy.create_engine(url)

df = pd.DataFrame({
    "eid": [1, 2],
    "f_i": [123, 1231],
    "f_i_arr": [[123], [0]],
    "f_53": ["2013/12/1", "2013/12/1"],
    "f_53a": [["2013/12/1"], ["2013/12/1"]],
})

with engine.connect() as con:
    con.execute("""
        DROP TABLE IF EXISTS public.test;
        CREATE TABLE public.test
        (
            eid integer NOT NULL,
            f_i INTEGER NULL,
            f_i_arr INTEGER[] NULL,
            f_53 DATE NULL,
            f_53a DATE[] NULL,
            PRIMARY KEY(eid)
        );
    """)
    df.to_sql("test", con, if_exists='append')
If I try to insert only column "f_53" (a date), it succeeds.
If I try to add column "f_53a" (a date[]), it fails with:
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) column "f_53a" is of type date[] but expression is of type text[]
LINE 1: ..._53, f_53a, f_i, f_i_arr) VALUES (1, '2013/12/1', ARRAY['201...
                                                             ^
HINT: You will need to rewrite or cast the expression.
[SQL: 'INSERT INTO test (eid, f_53, f_53a, f_i, f_i_arr) VALUES (%(eid)s, %(f_53)s, %(f_53a)s, %(f_i)s, %(f_i_arr)s)'] [parameters: ({'f_53': '2013/12/1', 'f_53a': ['2013/12/1', '2013/12/1'], 'f_i_arr': [123], 'eid': 1, 'f_i': 123}, {'f_53': '2013/12/1', 'f_53a': ['2013/12/1', '2013/12/1'], 'f_i_arr': [0], 'eid': 2, 'f_i': 1231})]
I specified the dtypes explicitly and it worked for me on Postgres.
# sample code
import sqlalchemy
from sqlalchemy import create_engine
from sqlalchemy.dialects import postgresql

df.to_sql('mytable', pgConn, if_exists='append', index=False,
          dtype={'datetime': sqlalchemy.TIMESTAMP(),
                 'cur_c': postgresql.ARRAY(sqlalchemy.types.REAL),
                 'volt_c': postgresql.ARRAY(sqlalchemy.types.REAL)})
Yes, it is possible to insert [] and [][] types from a dataframe into Postgres.
Unlike flat DATE values, which may be parsed correctly by the database, DATE[] and DATE[][] values need to be converted to datetime objects first. Like so:
with engine.connect() as con:
    con.execute("""
        DROP TABLE IF EXISTS public.test;
        CREATE TABLE public.test
        (
            eid integer NOT NULL,
            f_i INTEGER NULL,
            f_ia INTEGER[] NULL,
            f_iaa INTEGER[][] NULL,
            f_d DATE NULL,
            f_da DATE[] NULL,
            f_daa DATE[][] NULL,
            PRIMARY KEY(eid)
        );
    """)

    d = pd.to_datetime("2013/12/1")
    i = 99

    df = pd.DataFrame({
        "eid": [1, 2],
        "f_i": [i, i],
        "f_ia": [None, [i, i]],
        "f_iaa": [[[i, i], [i, i]], None],
        "f_d": [d, d],
        "f_da": [[d, d], None],
        "f_daa": [[[d, d], [d, d]], None],
    })

    df.to_sql("test", con, if_exists='append', index=None)

Using IndexOf and/Or Substring to parse data into separate columns

I am working on migrating data from one database to another for a hospital. In the old database, the doctor's specialty IDs are all in one column (swvar_specialties), separated by commas. In the new database, each specialty ID will have its own column (example: Specialty1_PrimaryID, Specialty2_PrimaryID, Specialty3_PrimaryID, etc.). I am trying to export the data out of the old database and separate these values into those separate columns. I know I can use indexof and substring to do this; I just need help with the syntax.
So this query:
Select swvar_specialties as Specialty1_PrimaryID
From PhysDirectory
might return results similar to 39,52,16. I need this query to display Specialty1_PrimaryID = 39, Specialty2_PrimaryID = 52, and Specialty3_PrimaryID = 16 in the results. Below is my query so far. I will eventually have a join to pull the specialty names from the specialties table. I just need to get this worked out first.
Select pd.ref as PrimaryID, pd.swvar_name_first as FirstName, pd.swvar_name_middle as MiddleName,
pd.swvar_name_last as LastName, pd.swvar_name_suffix + ' ' + pd.swvar_name_degree as NameSuffix,
pd.swvar_birthdate as DateOfBirth,pd.swvar_notes as AdditionalInformation, 'images/' + '' + pd.swvar_photo as ImageURL,
pd.swvar_philosophy as PhilosophyOfCare, pd.swvar_gender as Gender, pd.swvar_specialties as Specialty1_PrimaryID, pd.swvar_languages as Language1_Name
From PhysDirectory as pd
The article Split function equivalent in T-SQL? provides some details on how to use a split function to split a comma-delimited string.
By modifying the table-valued function presented in that article to provide an identity column, we can target a specific row such as Specialty1_PrimaryID:
/*
Splits a string into parts delimited by the specified character.
*/
CREATE FUNCTION [dbo].[SDF_SplitString]
(
    @sString nvarchar(2048),
    @cDelimiter nchar(1)
)
RETURNS @tParts TABLE ( id bigint IDENTITY, part nvarchar(2048) )
AS
BEGIN
    if @sString is null return

    declare @iStart int,
            @iPos int

    if substring( @sString, 1, 1 ) = @cDelimiter
    begin
        set @iStart = 2
        insert into @tParts
        values( null )
    end
    else
        set @iStart = 1

    while 1 = 1
    begin
        set @iPos = charindex( @cDelimiter, @sString, @iStart )
        if @iPos = 0
            set @iPos = len( @sString ) + 1
        if @iPos - @iStart > 0
            insert into @tParts
            values( substring( @sString, @iStart, @iPos - @iStart ) )
        else
            insert into @tParts
            values( null )
        set @iStart = @iPos + 1
        if @iStart > len( @sString )
            break
    end

    RETURN
END
Your query can then utilise this split function as follows:
Select
    pd.ref as PrimaryID,
    pd.swvar_name_first as FirstName,
    pd.swvar_name_middle as MiddleName,
    pd.swvar_name_last as LastName,
    pd.swvar_name_suffix + ' ' + pd.swvar_name_degree as NameSuffix,
    pd.swvar_birthdate as DateOfBirth,
    pd.swvar_notes as AdditionalInformation,
    'images/' + '' + pd.swvar_photo as ImageURL,
    pd.swvar_philosophy as PhilosophyOfCare,
    pd.swvar_gender as Gender,
    (Select part from SDF_SplitString(pd.swvar_specialties, ',') where id = 1) as Specialty1_PrimaryID,
    (Select part from SDF_SplitString(pd.swvar_specialties, ',') where id = 2) as Specialty2_PrimaryID,
    pd.swvar_languages as Language1_Name
From PhysDirectory as pd
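As an aside, newer SQL Server versions ship a built-in STRING_SPLIT. Note that its output order is only guaranteed when the enable_ordinal argument (SQL Server 2022 / Azure SQL) is available, so on older versions the identity-based function above remains the safer choice for positional extraction. A sketch using the ordinal column, under that version assumption:

-- Requires the enable_ordinal argument (SQL Server 2022 / Azure SQL);
-- without it, STRING_SPLIT returns rows in no guaranteed order.
Select
    pd.ref as PrimaryID,
    (Select value From STRING_SPLIT(pd.swvar_specialties, ',', 1) Where ordinal = 1) as Specialty1_PrimaryID,
    (Select value From STRING_SPLIT(pd.swvar_specialties, ',', 1) Where ordinal = 2) as Specialty2_PrimaryID,
    (Select value From STRING_SPLIT(pd.swvar_specialties, ',', 1) Where ordinal = 3) as Specialty3_PrimaryID
From PhysDirectory as pd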