I'm using SQLAlchemy 1.3.4 and PostgreSQL 11.3.
I have the following (simplified) table definition:
class MyModel(Base):
    __tablename__ = 'mymodel'

    id = Column(Integer, primary_key=True)
    col1 = Column(Unicode, nullable=False)
    col2 = Column(Unicode, nullable=False)
    col3 = Column(Unicode, nullable=False)
    col4 = Column(Boolean)
    created_at = Column(DateTime(timezone=True), nullable=False)
    updated_at = Column(DateTime(timezone=True), nullable=False)

    __table_args__ = (
        Index('uq_mymodel_col1_col2_col3_col4',
              col1, col2, col3, col4,
              unique=True, postgresql_where=col4.isnot(None)),
        Index('uq_mymodel_col1_col2_col3',
              col1, col2, col3,
              unique=True, postgresql_where=col4.is_(None)),
    )
(I had to create two unique indexes rather than a UniqueConstraint, because a UniqueConstraint would allow multiple rows with the same (col1, col2, col3) if col4 is null, which I do not want.)
I'm trying to do the following query:
INSERT INTO mymodel (col1, col2, col3, col4, created_at, updated_at)
VALUES (%(col1)s, %(col2)s, %(col3)s, %(col4)s, %(created_at)s, %(updated_at)s)
ON CONFLICT DO UPDATE SET updated_at = %(param_1)s
RETURNING mymodel.id
I can't figure out how to properly use SQLAlchemy's on_conflict_do_update() though. :-/
Here is what I tried:
values = {…}
stmt = insert(MyModel.__table__).values(**values)
stmt = stmt.returning(MyModel.__table__.c.id)
stmt = stmt.on_conflict_do_update(set_={'updated_at': values['updated_at']})
result = dbsession.connection().execute(stmt)
However SQLAlchemy complains: Either constraint or index_elements, but not both, must be specified unless DO NOTHING
I find it very unclear how to use constraint or index_elements.
I tried a few things, to no avail. For example:
values = {…}
stmt = insert(MyModel.__table__).values(**values)
stmt = stmt.returning(MyModel.__table__.c.id)
stmt = stmt.on_conflict_do_update(constraint='uq_mymodel_col1_col2_col3_col4',
                                  set_={'updated_at': values['updated_at']})
result = dbsession.connection().execute(stmt)
But then this doesn't work either: constraint "uq_mymodel_col1_col2_col3_col4" for table "mymodel" does not exist. But it does exist. (I even copy-pasted its name from psql to make sure I hadn't made a typo.)
In any case, I have two unique constraints which can raise a conflict, but on_conflict_do_update() seems to only take one. So I also tried specifying both like this:
values = {…}
stmt = insert(MyModel.__table__).values(**values)
stmt = stmt.returning(MyModel.__table__.c.id)
stmt = stmt.on_conflict_do_update(constraint='uq_mymodel_col1_col2_col3_col4',
                                  set_={'updated_at': values['updated_at']})
stmt = stmt.on_conflict_do_update(constraint='uq_mymodel_col1_col2_col3',
                                  set_={'updated_at': values['updated_at']})
result = dbsession.connection().execute(stmt)
But I get the same error, that uq_mymodel_col1_col2_col3_col4 does not exist.
At this point I just can't figure out how to do the above query, and would really appreciate some help.
Ok, I think I figured it out. So the problem didn't come from SQLAlchemy after all, I was actually misusing PostgreSQL.
First, the SQL query I pasted above didn't work because, like SQLAlchemy, PostgreSQL requires ON CONFLICT DO UPDATE to specify either the index columns or a constraint name.
And when I specified one of my constraints, PostgreSQL gave me the same error as SQLAlchemy. That's because my "constraints" weren't actually constraints but unique indexes, and ON CONFLICT ... ON CONSTRAINT really requires a unique constraint, not a unique index (even though that index would have the same effect as a unique constraint).
So I rewrote the model as follows:
# Feel free to use the following code under the MIT license
from sqlalchemy import Enum
from sqlalchemy.types import TypeDecorator

class NullableBoolean(TypeDecorator):
    """A three-state boolean, which allows working with UNIQUE constraints.

    In PostgreSQL, when making a composite UNIQUE constraint where one of the
    columns is a nullable boolean, null values for that column are counted
    as always different. So if you have:

        class MyModel(Base):
            __tablename__ = 'mymodel'

            id = Column(Integer, primary_key=True)
            col1 = Column(Unicode, nullable=False)
            col2 = Column(Unicode, nullable=False)
            col3 = Column(Boolean)

            __table_args__ = (
                UniqueConstraint(col1, col2, col3,
                                 name='uq_mymodel_col1_col2_col3'),
            )

    then you could INSERT multiple records which have the same (col1, col2)
    when col3 is None.

    If you want None to be considered a "proper" value that triggers the
    uniqueness constraint, then use this type instead of a nullable Boolean.
    """
    impl = Enum

    def __init__(self, **kwargs):
        kwargs['name'] = 'nullable_boolean_enum'
        super().__init__('true', 'false', 'unknown', **kwargs)

    def process_bind_param(self, value, dialect):
        """Convert the Python values into the SQL ones"""
        return {
            True: 'true',
            False: 'false',
            None: 'unknown',
        }[value]

    def process_result_value(self, value, dialect):
        """Convert the SQL values into the Python ones"""
        return {
            'true': True,
            'false': False,
            'unknown': None,
        }[value]
class MyModel(Base):
    __tablename__ = 'mymodel'

    id = Column(Integer, primary_key=True)
    col1 = Column(Unicode, nullable=False)
    col2 = Column(Unicode, nullable=False)
    col3 = Column(Unicode, nullable=False)
    col4 = Column(NullableBoolean)
    created_at = Column(DateTime(timezone=True), nullable=False)
    updated_at = Column(DateTime(timezone=True), nullable=False)

    __table_args__ = (
        UniqueConstraint(col1, col2, col3, col4,
                         name='uq_mymodel_col1_col2_col3_col4'),
    )
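The upsert can then target the named constraint. A sketch along the lines of the earlier attempts, now that the constraint actually exists:

from sqlalchemy.dialects.postgresql import insert

values = {…}
stmt = insert(MyModel.__table__).values(**values)
stmt = stmt.on_conflict_do_update(
    # The named UNIQUE constraint now really exists, so it can be targeted.
    constraint='uq_mymodel_col1_col2_col3_col4',
    set_={'updated_at': values['updated_at']},
)
stmt = stmt.returning(MyModel.__table__.c.id)
result = dbsession.connection().execute(stmt)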
And now it seems to be working as expected.
Hope that helps someone in the future. If anybody has a better idea though, I'm interested. :)
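Side note for future readers: a partial unique index can apparently also be targeted without a named constraint, by spelling out its columns and predicate through the index_elements and index_where parameters of on_conflict_do_update(). An untested sketch against the original two-partial-index model; note that a single INSERT ... ON CONFLICT can only target one arbiter, so this handles only the "col4 IS NULL" index:

from sqlalchemy.dialects.postgresql import insert

stmt = insert(MyModel.__table__).values(**values)
stmt = stmt.on_conflict_do_update(
    # Infer the partial index from its columns and its WHERE predicate
    # instead of naming a constraint.
    index_elements=[MyModel.__table__.c.col1,
                    MyModel.__table__.c.col2,
                    MyModel.__table__.c.col3],
    index_where=MyModel.__table__.c.col4.is_(None),
    set_={'updated_at': values['updated_at']},
)
stmt = stmt.returning(MyModel.__table__.c.id)
result = dbsession.connection().execute(stmt)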
Related
How could the following table be represented as an SQLA model?
CREATE TABLE x
(
    x_id integer GENERATED ALWAYS AS IDENTITY,
    x_name text unique
);
I think that if I create a column using:
id = Column(Integer, nullable=False, primary_key=True)
The generated SQL won't use GENERATED ALWAYS AS IDENTITY, but will instead use SERIAL.
What I'm unsure of is how to complete the following in order to get the GENERATED ALWAYS AS IDENTITY syntax:
class X(Base):
    __tablename__ = "x"
    x_id = Column( <something here> )
    x_name = Column(Text, unique=True)
You can use an Identity (available from SQLAlchemy 1.4) in your column definition.
class X(Base):
    __tablename__ = "x"
    x_id = Column(Integer, Identity(always=True), primary_key=True)
    x_name = Column(Text, unique=True)
PS: there is no need to set nullable=False if you set primary_key=True.
When a conflict occurs on insert into a table with an auto-incrementing id column, the sequence gets bumped, causing a gap in the id range, which for me is undesirable.
Here is a simplified version of my situation:
create table tab1 (
    id serial,
    col1 int,
    col2 int,
    col3 int,
    constraint const1 unique (col1, col2)
);
In a stored proc:
insert into tab1 (col1, col2, col3)
values (1, 2, 3)
on conflict on constraint const1
do update set col3 = excluded.col3
If there's a collision, the insert ... on conflict ... update works fine, except the next value from the sequence is burned.
Without doing an exists() check first, is there a way to not burn the next value from the sequence using just a single statement?
Note: There is no chance of a race condition of concurrent updates for the same conflict key.
There's no way to avoid the increment of the sequence: the id default (a nextval() call) is evaluated before the conflict is detected.
Here's the work around I used:
insert into tab1 (col1, col2, col3)
select x.*
from (select 1 a, 2 b, 3 c) x
left join tab1 o on o.col1 = x.a and o.col2 = x.b
where o.col1 is null
returning tab1.id into _id;

if _id is null then
    update tab1 set
        col3 = 3
    where col1 = 1
      and col2 = 2;
end if;
I have a subquery which is used in multiple WHERE conditions in my main query. Because of this, the subquery executes multiple times to compute the same result. Is there a way to store and reuse the subquery result so that it executes only once?
Sample code:
from sqlalchemy.sql.schema import ForeignKey
from sqlalchemy import Column, Integer, Text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql.expression import select, union

Base = declarative_base()

class Table1(Base):
    __tablename__ = 'table1'
    id = Column(Integer, primary_key=True)
    uuid = Column(Text, unique=True, nullable=False)

class Table2(Base):
    __tablename__ = 'table2'
    id = Column(Integer, primary_key=True)
    uuid = Column(Text, unique=True, nullable=False)

class Table3(Base):
    __tablename__ = 'table3'
    id = Column(Integer, primary_key=True)
    uuid = Column(Text, unique=True, nullable=False)

class Table4(Base):
    __tablename__ = 'table4'
    id = Column(Integer, primary_key=True)
    type = Column(Text, nullable=False)

class Table5(Base):
    __tablename__ = 'table5'
    id = Column(Integer, primary_key=True)
    res_id = Column(Integer, ForeignKey('table4.id'), nullable=False)
    value = Column(Text, nullable=False)

class Table1Map(Base):
    __tablename__ = 'table1_map'
    id = Column(Integer, ForeignKey('table4.id'), primary_key=True, nullable=False)
    map_id = Column(Integer, ForeignKey('table1.id'), primary_key=True, unique=True, nullable=False)

class Table2Map(Base):
    __tablename__ = 'table2_map'
    id = Column(Integer, ForeignKey('table4.id'), primary_key=True, nullable=False)
    map_id = Column(Integer, ForeignKey('table2.id'), primary_key=True, unique=True, nullable=False)

class Table3Map(Base):
    __tablename__ = 'table3_map'
    id = Column(Integer, ForeignKey('table4.id'), primary_key=True, nullable=False)
    map_id = Column(Integer, ForeignKey('table3.id'), primary_key=True, unique=True, nullable=False)

sub_query = select([Table5.__table__.c.id]).where(Table5.__table__.c.value == 'somevalue')

subquery_1 = (
    select([Table1.__table__.c.uuid.label("map_id"),
            Table1Map.__table__.c.id.label("id")])
    .select_from(Table1.__table__.join(
        Table1Map.__table__,
        Table1Map.__table__.c.map_id == Table1.__table__.c.id))
    .where(Table1Map.__table__.c.id.in_(sub_query))
)
subquery_2 = (
    select([Table2.__table__.c.uuid.label("map_id"),
            Table2Map.__table__.c.id.label("id")])
    .select_from(Table2.__table__.join(
        Table2Map.__table__,
        Table2Map.__table__.c.map_id == Table2.__table__.c.id))
    .where(Table2Map.__table__.c.id.in_(sub_query))
)
subquery_3 = (
    select([Table3.__table__.c.uuid.label("map_id"),
            Table3Map.__table__.c.id.label("id")])
    .select_from(Table3.__table__.join(
        Table3Map.__table__,
        Table3Map.__table__.c.map_id == Table3.__table__.c.id))
    .where(Table3Map.__table__.c.id.in_(sub_query))
)

main_query = union(subquery_1, subquery_2, subquery_3)
print(main_query)
This produces the query below. I want to avoid the subquery being executed repeatedly.
SELECT TABLE1.UUID AS MAP_ID,
TABLE1_MAP.ID AS ID
FROM TABLE1
JOIN TABLE1_MAP ON TABLE1_MAP.MAP_ID = TABLE1.ID
WHERE TABLE1_MAP.ID IN
(SELECT TABLE5.ID
FROM TABLE5
WHERE TABLE5.VALUE = 'some_value')
UNION
SELECT TABLE2.UUID AS MAP_ID,
TABLE2_MAP.ID AS ID
FROM TABLE2
JOIN TABLE2_MAP ON TABLE2_MAP.MAP_ID = TABLE2.ID
WHERE TABLE2_MAP.ID IN
(SELECT TABLE5.ID
FROM TABLE5
WHERE TABLE5.VALUE = 'some_value')
UNION
SELECT TABLE3.UUID AS MAP_ID,
TABLE3_MAP.ID AS ID
FROM TABLE3
JOIN TABLE3_MAP ON TABLE3_MAP.MAP_ID = TABLE3.ID
WHERE TABLE3_MAP.ID IN
(SELECT TABLE5.ID
FROM TABLE5
WHERE TABLE5.VALUE = 'some_value')
Why? Have you run explain (analyze, buffers) to show that the repeated subquery is actually causing performance issues? It is quite possible the repeated executions find the necessary values already in memory and so require no additional I/O. That said, the way to accomplish this in Postgres would be selecting the value from table5 in a CTE (sorry, I do not know your obfuscation manager, Python/SQLAlchemy):
with cte (id) as
(
    select id
    from table5 t5
    where t5.value = 'some_value'
)
select t1.uuid as map_id
     , t1m.id as id
from table1 t1
join table1_map t1m on t1m.map_id = t1.id
where t1m.id in
      (select id
       from cte
      )
union
select t2.uuid as map_id
     , t2m.id as id
from table2 t2
join table2_map t2m on t2m.map_id = t2.id
where t2m.id in
      (select id
       from cte
      )
union
select t3.uuid as map_id
     , t3m.id as id
from table3 t3
join table3_map t3m on t3m.map_id = t3.id
where t3m.id in
      (select id
       from cte
      );
Notice that you still have to repeat the sub-select (just referencing the CTE). If you insist on removing any repetition, you can of course perform the union in a sub-select and then filter the ids:
select uuid, id
from (select t1.uuid
           , t1m.id
      from table1 t1
      join table1_map t1m on t1m.map_id = t1.id
      union
      select t2.uuid
           , t2m.id
      from table2 t2
      join table2_map t2m on t2m.map_id = t2.id
      union
      select t3.uuid
           , t3m.id
      from table3 t3
      join table3_map t3m on t3m.map_id = t3.id
     ) tall
where tall.id in
      (select t5.id
       from table5 t5
       where t5.value = 'some_value'
      );
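In SQLAlchemy (keeping the question's 1.x select([...]) style), the CTE variant can be expressed via .cte(); a rough, untested sketch reusing the question's models:

# Build the table5 lookup once as a named CTE...
cte = (
    select([Table5.__table__.c.id])
    .where(Table5.__table__.c.value == 'somevalue')
    .cte('cte')
)

# ...then reference the CTE from each branch of the union.
subquery_1 = (
    select([Table1.__table__.c.uuid.label("map_id"),
            Table1Map.__table__.c.id.label("id")])
    .select_from(Table1.__table__.join(
        Table1Map.__table__,
        Table1Map.__table__.c.map_id == Table1.__table__.c.id))
    .where(Table1Map.__table__.c.id.in_(select([cte.c.id])))
)
# subquery_2 and subquery_3 are built the same way.
main_query = union(subquery_1, subquery_2, subquery_3)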
Either way, run explain to see what actually performs best in your environment. (Make sure it has production volume, i.e. do not run tests on tables with hundreds of rows if your production tables have 100K rows.)
NOT TESTED.
NOTE: It is a bad idea to name a column uuid. Postgres supports the native data type uuid. A poor choice of names leads to confusion (for developers, not Postgres), and confusion leads to bugs, often undetected until one becomes a critical production problem.
Currently, I have a table (2 columns ColumnName, Value) with data like this:
ColumnName     Value
CustomerName   Facebook
CompanyName    Google
How can I write a query with And / Or condition to satisfy the request:
With And:
CustomerName = 'YAHOO' And CompanyName = 'Google' will return 0 records
With Or:
CustomerName = 'Facebook' Or CompanyName = 'Google' will return 2 records
I have no idea where to begin.
Please advise.
Thanks.
You can research the EAV data model for reasons why this model may not scale well.
You can query like so:
declare @YourTable table (ColumnName varchar(100), Value varchar(100), primary key (ColumnName, Value));

insert into @YourTable
select 'CustomerName', 'Facebook' union all
select 'CompanyName', 'Google';
--with And...
select *
from @YourTable
where (ColumnName = 'CustomerName' and Value = 'Yahoo') and
      (ColumnName = 'CompanyName' and Value = 'Google')
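-- note: no single row can carry both ColumnNames at once, so this "And" form
-- always returns 0 records (which is what the example above requires); a
-- general AND across rows would need e.g. a GROUP BY/HAVING or EXISTS per condition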
--with Or...
select *
from @YourTable
where (ColumnName = 'CustomerName' and Value = 'Facebook') or
      (ColumnName = 'CompanyName' and Value = 'Google')
I have a stored procedure with 2 CTEs. The second CTE has a parameter:
WITH path_sequences
AS
(
),
WITH categories
AS
(
    ... WHERE CategoryId = @CategoryId
    -- I don't know how to get this initial parameter inside the CTE
)
SELECT * FROM path_sequences p
JOIN categories c
    ON p.CategoryId = c.CategoryId
The initial parameter that I need to get inside the second CTE is p.CategoryId. How do I do that without having to create another stored procedure to contain the second CTE?
Thanks for helping
You can create a table-valued function:
create function ftCategories
(
    @CategoryID int
)
returns table
as return
    with categories as (
        ... WHERE CategoryId = @CategoryID
    )
    select Col1, Col2 ...
    from categories
and use it as
SELECT *
FROM path_sequences p
cross apply ftCategories(p.CategoryId) c
I have created a simple query using your code. You can use it like this:
DECLARE @CategoryId INT
SET @CategoryId = 1

;WITH path_sequences
AS
(
    SELECT 1 CategoryId
),
categories
AS
(
    SELECT 1 CategoryId WHERE 1 = @CategoryId
)
SELECT * FROM path_sequences p
JOIN categories c
    ON p.CategoryId = c.CategoryId
This syntax is for External Aliases:
-- CTEs With External Aliases:
WITH Sales_CTE (SalesPersonID, SalesOrderID, SalesYear)
AS
-- Define the CTE query.
(
    SELECT SalesPersonID, SalesOrderID, YEAR(OrderDate) AS SalesYear
    FROM Sales.SalesOrderHeader
    WHERE SalesPersonID IS NOT NULL
)
The only way to add parameters is to use scope variables like so:
--Declare a variable:
DECLARE @CategoryId INT

WITH
MyCTE1 (exName1, exName2)
AS
(
    SELECT <SELECT LIST>
    FROM <TABLE LIST>
    --Use the variable as 'a parameter'
    WHERE CategoryId = @CategoryId
)
First, remove the second WITH; separate the CTEs with just a comma. Next, you can add parameters like this:
DECLARE @CategoryId INT; -- <~~ Parameter outside of CTEs

WITH
MyCTE1 (col1, col2) -- <~~ were poorly named param1 and param2 previously
AS
(
    SELECT blah blah
    FROM blah
    WHERE CategoryId = @CategoryId
),
MyCTE2 (col1, col2) -- <~~ were poorly named param1 and param2 previously
AS
(
)
SELECT *
FROM MyCTE2
INNER JOIN MyCTE1 ON ...etc....
EDIT (and CLARIFICATION):
I have renamed the columns from param1 and param2 to col1 and col2 (which is what I meant originally).
My example assumes that each SELECT has exactly two columns. The column list is optional if you want to return all of the columns from the underlying query and those names are unique. If you have more or fewer columns than what is being SELECTed, you will need to specify names.
Here is another example:
Table:
CREATE TABLE Employee
(
    Id INT NOT NULL IDENTITY PRIMARY KEY CLUSTERED,
    FirstName VARCHAR(50) NOT NULL,
    LastName VARCHAR(50) NOT NULL,
    ManagerId INT NULL
)
Fill table with some rows:
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Donald', 'Duck', 5)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Micky', 'Mouse', 5)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Daisy', 'Duck', 5)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Fred', 'Flintstone', 5)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Darth', 'Vader', null)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Bugs', 'Bunny', null)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Daffy', 'Duck', null)
CTEs:
DECLARE @ManagerId INT = 5;

WITH
MyCTE1 (col1, col2, col3, col4)
AS
(
    SELECT *
    FROM Employee e
    WHERE 1=1
      AND e.Id = @ManagerId
),
MyCTE2 (colx, coly, colz, cola)
AS
(
    SELECT e.*
    FROM Employee e
    INNER JOIN MyCTE1 mgr ON mgr.col1 = e.ManagerId
    WHERE 1=1
)
SELECT
    empsWithMgrs.colx,
    empsWithMgrs.coly,
    empsWithMgrs.colz,
    empsWithMgrs.cola
FROM MyCTE2 empsWithMgrs
Notice that in the CTEs the columns are aliased. MyCTE1 exposes its columns as col1, col2, col3, col4, and MyCTE2 references MyCTE1.col1 when it joins to it. Notice the final SELECT uses MyCTE2's column names.
For anyone still struggling with this, the only thing you need to do is terminate your declaration of variables with a semicolon before the CTE. Nothing else is required.
DECLARE #test AS INT = 42;
WITH x
AS (SELECT #test AS 'Column')
SELECT *
FROM x
Results:
Column
-----------
42
(1 row affected)