How to make SQLAlchemy issue additional SQL after flushing the current session? - postgresql

I have some SQL that I'd like SQLAlchemy to issue after it flushes the current session.
So I'm trying to write a Python function that, for this specific SQLAlchemy session only, will 1) flush the session, 2) then send this extra SQL to the database, and 3) finally commit the session.
I don't want this behavior on all sessions globally: if I didn't call the function within this session, the extra SQL shouldn't be executed.
I know SQLAlchemy has a built-in events system, and I played around with it, but I can't figure out how to register an event listener for only the current session, and not all sessions globally. I read the docs, but I'm still not getting it.
I am aware of database triggers, but they won't work for this particular scenario.
I'm using Flask-SQLAlchemy, which uses scoped sessions.

Not sure why it does not work for you. Sample code below runs as expected:
from sqlalchemy import create_engine, event, text, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

# engine/session setup implied by the original sample; in-memory SQLite used for illustration
engine = create_engine('sqlite://')
Base = declarative_base()
Session = sessionmaker(bind=engine)

class Stuff(Base):
    __tablename__ = 'stuff'
    id = Column(Integer, primary_key=True)
    name = Column(String)

Base.metadata.create_all(engine)
session = Session()

# listening on this Session instance registers the hook for this session only
@event.listens_for(session, 'after_flush')
def _handle_event(session, context):
    print('>> --- after_flush started ---')
    rows = session.execute(text("SELECT 1 AS XXX")).fetchall()
    print(rows)
    print('>> --- after_flush finished ---')

# create test data
s1 = Stuff(name='uno')
session.add(s1)
print('--- before calling commit ---')
session.commit()
print('--- after calling commit ---')
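Note that event.listens_for(session, 'after_flush') is registered on that one Session instance, not on the Session class, which is why it fires for this session only. With Flask-SQLAlchemy's scoped sessions you can do the same against the current session. A minimal sketch, assuming the usual db = SQLAlchemy(app) setup (the handler name and SQL are illustrative):

from sqlalchemy import event, text

def run_extra_sql(session, flush_context):
    # runs after this session's flush, before the commit
    session.execute(text("SELECT 1"))  # your extra SQL here

# db.session is a scoped_session; calling it returns the current
# Session instance, so the listener applies to this session only
event.listen(db.session(), 'after_flush', run_extra_sql)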

Related

How to lock a row while updating it in SqlAlchemy Core?

It is incredibly difficult to unit test race conditions like this. I was hoping to verify this with experts here.
I have the following scenario where I would like to obtain the first VPN Profile that is not assigned to any device.
In the meantime, any other process trying to obtain the same profile (since it's the next in line) should wait until my transaction has completed.
I would then update this VPN profile, assign the given device id to it, and finish the transaction.
At that point, any process that was waiting on the select().first() statement shouldn't obtain this particular VPN profile, because it has already been assigned a device_id. Instead it should obtain the next available one.
After some digging, this is the code I have come up with. I'm using with_for_update() in the select statement and keep everything within the same engine.begin() transaction. I'm using SQLAlchemy Core without ORM.
async with engine.begin() as conn:
    query = (
        VpnProfileTable.select()
        .where(VpnProfileTable.c.device_id == None)
        .with_for_update(nowait=False)
    )
    record = (await conn.execute(query)).first()
    if record:
        query = (
            VpnProfileTable.update()
            .where(VpnProfileTable.c.id == record.id)
            .values(device_id=device_id)
        )
        await conn.execute(query)
        await conn.commit()
Does my code reflect what I'm trying to achieve?
I'm not sure if I need to commit(), since there is already a with engine.begin() statement; it should happen automatically at the end.
Many thanks
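For reference, engine.begin() commits automatically when the block exits without an exception, so the explicit conn.commit() should be redundant. A minimal sketch of the same flow relying on the context manager and (assuming PostgreSQL) using skip_locked=True, so that concurrent transactions skip the locked row and claim the next free profile instead of blocking:

async with engine.begin() as conn:
    # FOR UPDATE SKIP LOCKED: rows locked by a concurrent transaction
    # are skipped, so each worker claims a different free profile
    query = (
        VpnProfileTable.select()
        .where(VpnProfileTable.c.device_id == None)
        .with_for_update(skip_locked=True)
    )
    record = (await conn.execute(query)).first()
    if record:
        await conn.execute(
            VpnProfileTable.update()
            .where(VpnProfileTable.c.id == record.id)
            .values(device_id=device_id)
        )
    # no explicit commit: engine.begin() commits on successful exit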

Slick insert into H2, but no data inserted

I'm sure I am missing something really stupidly obvious here - I have a unit test for a very simple Slick 3.2 setup. The DAO has basic retrieve and insert methods as follows:
override def questions: Future[Seq[Tables.QuestionRow]] =
  db.run(Question.result)

override def createQuestion(title: String, body: String, authorUuid: UUID): Future[Long] =
  db.run(Question returning Question.map(_.id) += QuestionRow(0l, UUID.randomUUID().toString, title, body, authorUuid.toString))
And I have some unit tests; for those I'm using in-memory H2 with a setup script (passed via the JDBC URL) that initialises two basic rows in the table.
The unit test for retrieving works fine, fetching the two rows inserted by the init script. I have just added a simple test that creates a row and then retrieves them all, assuming it will fetch three rows, but no matter what I do it only ever retrieves the initial two:
it should "create a new question" in {
whenReady(questionDao.createQuestion("Question three", "some body", UUID.randomUUID)) { s =>
whenReady(questionDao.questions(s)) { q =>
println(s)
println(q.map(_.title))
assert(true)
}
}
}
The output shows that the original s (the ID returned by the autoinc) is 3, as I would expect. (I have also tried the insert without the returning step, just letting it return the number of rows inserted, which returns 1, as expected.) But looking at the values returned in q, it's only ever the first two rows inserted by the init script.
What am I missing?
My assumptions are that your JDBC url is something like jdbc:h2:mem:test;INIT=RUNSCRIPT FROM 'init.sql' and no connection pooling is used.
There are two scenarios:
(1) the connection is opened with keepAliveConnection = true (or with DB_CLOSE_DELAY=-1 appended to the JDBC URL) and init.sql is something like:
DROP TABLE IF EXISTS QUESTION;
CREATE TABLE QUESTION(...);
INSERT INTO QUESTION VALUES(null, ...);
INSERT INTO QUESTION VALUES(null, ...);
(2) the connection is opened with keepAliveConnection = false (the default, i.e. without DB_CLOSE_DELAY=-1 appended to the JDBC URL) and init.sql is something like:
CREATE TABLE QUESTION(...);
INSERT INTO QUESTION VALUES(null, ...);
INSERT INTO QUESTION VALUES(null, ...);
The call to questionDao.createQuestion will open a new connection to your H2 database and will trigger the initialization script (init.sql).
In both scenarios, right after this call, the database contains a QUESTION table with 2 rows.
In scenario (2) after this call the connection is closed and according to H2 documentation:
By default, closing the last connection to a database closes the database. For an in-memory database, this means the content is lost. To keep the database open, add ;DB_CLOSE_DELAY=-1 to the database URL. To keep the content of an in-memory database as long as the virtual machine is alive, use jdbc:h2:mem:test;DB_CLOSE_DELAY=-1.
The call to questionDao.questions will then open a new connection to your H2 database and trigger the initialization script (init.sql) again.
In scenario (1) the first connection is kept alive (and so is the database content), but the new connection re-executes the initialization script (init.sql), erasing the database content.
So in both scenarios questionDao.createQuestion returns 3, as expected, but the content is then lost, and the subsequent call to questionDao.questions runs against a freshly initialized database.
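One way out, then, is to keep the in-memory database alive across connections and make the init script idempotent, so that re-running it neither erases nor duplicates the seed data. A sketch (column lists elided as above; MERGE ... KEY is H2's upsert syntax):

jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;INIT=RUNSCRIPT FROM 'init.sql'

with init.sql along the lines of:

CREATE TABLE IF NOT EXISTS QUESTION(...);
MERGE INTO QUESTION KEY(ID) VALUES(1, ...);
MERGE INTO QUESTION KEY(ID) VALUES(2, ...);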

How to make SQLAlchemy custom DDL be emitted after object inserted?

I have a PostgreSQL Materialized View that calculates some data about Manufacturers.
I created a SQLAlchemy custom DDL command to refresh the view:
from sqlalchemy.schema import DDLElement
from sqlalchemy.ext import compiler

class RefreshMaterializedView(DDLElement):
    '''Target expected to be a view name string'''
    def __init__(self, concurrently):
        self.concurrently = concurrently

@compiler.compiles(RefreshMaterializedView)
def compile(element, compiler, **kw):
    if element.concurrently:
        return "REFRESH MATERIALIZED VIEW CONCURRENTLY %s" % (element.target)
    return "REFRESH MATERIALIZED VIEW %s" % (element.target)

class ManufacturerMaterializedView(db.Model):
    @classmethod
    def refresh(cls, concurrently=True, bind=db.session):
        RefreshMaterializedView(concurrently).execute(
            target=cls.__table__.fullname, bind=bind)
Here's how I currently use it in my code:
db.session.add(new_manufacturer_instance)
ManufacturerMaterializedViewClass.refresh() # bound to the same session
db.session.flush()
# some other stuff
db.session.add(another_manufacturer_instance) # still in the same PostgreSQL transaction
ManufacturerMaterializedViewClass.refresh()
db.session.commit()
Desired behavior:
The materialized view is refreshed after the new_manufacturer_instance is created.
I can repeatedly insert new manufacturers and call ManufacturerMaterializedViewClass.refresh() multiple times within the same session, but the refresh will only be emitted once, at the end of the session after all the INSERT/UPDATE/DELETE statements for all objects have been emitted. Other object types affect the output of this materialized view, so this refresh statement needs to be emitted after those objects are modified.
Here's what's currently happening when I view the Flask-Sqlalchemy query log using SQLALCHEMY_ECHO = True:
$ python manage.py shell
>>> ManufacturerFactory() # creates a new manufacturer instance and adds it to the session
<Manufacturer #None:'est0'>
>>> ManufacturerMV.refresh()
2015-11-29 13:33:44,811 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2015-11-29 13:33:44,812 INFO sqlalchemy.engine.base.Engine REFRESH MATERIALIZED VIEW CONCURRENTLY manufacturer_mv
2015-11-29 13:33:44,812 INFO sqlalchemy.engine.base.Engine {}
>>> db.session.flush()
2015-11-29 13:34:13,745 INFO sqlalchemy.engine.base.Engine INSERT INTO manufacturer (name, website, logo, time_updated) VALUES (%(name)s, %(website)s, %(logo)s, %(time_updated)s) RETURNING manufacturer.id
2015-11-29 13:34:13,745 INFO sqlalchemy.engine.base.Engine {'logo': '/static/images/16-rc_gear_essential.jpg', 'website': 'http://hermann.com/', 'time_updated': None, 'name': 'est0'}
>>> db.session.commit()
2015-11-29 13:42:58,160 INFO sqlalchemy.engine.base.Engine COMMIT
As you can see, calling refresh() immediately issues SQL to the DB, even before a session.flush(), pre-empting any additional insert/update statements. However, the DDL isn't actually executed by PostgreSQL until session.commit() closes the transaction.
How should I modify my DDL/classmethod to achieve my desired behavior?
I looked at the ORM events but wasn't quite sure how to leverage them for my use case. I don't want a refresh called on every session.commit() emitted by my application. Refreshing this particular view is a fairly expensive operation, so it should only happen when I actually called refresh() within the current transaction/session.
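One pattern that would fit (a sketch under assumptions, not tested against this codebase): have refresh() only record the request in session.info, and emit the DDL once from a before_commit listener on that session, after forcing a flush so it lands after all pending INSERT/UPDATE/DELETE statements but inside the same transaction:

from sqlalchemy import event

# inside ManufacturerMaterializedView:
@classmethod
def refresh(cls, concurrently=True):
    # defer the DDL: just mark this view as needing a refresh
    db.session.info.setdefault('mvs_to_refresh', set()).add(
        (cls.__table__.fullname, concurrently))

@event.listens_for(db.session(), 'before_commit')
def _refresh_marked_views(session):
    session.flush()  # emit all pending DML first
    for name, concurrently in session.info.pop('mvs_to_refresh', ()):
        view = RefreshMaterializedView(concurrently)
        view.target = name
        session.execute(view)  # REFRESH ... runs once per marked view

Since session.info starts empty, sessions that never called refresh() are unaffected, and repeated refresh() calls collapse into a single statement per view.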

Running cleaning/validation code before committing in sqlalchemy

I'm totally new to PostgreSQL and SQLAlchemy, and I'm trying to figure out how to run validation/cleaning code on an SQLAlchemy model before it is committed to the database. The idea is to ensure data consistency beyond the standard type enforcement that comes built into SQL databases. For example, if I have a User model built on SQLAlchemy's models,
class User(db.Model):
    ...
    email = db.Column(db.String())
    zipCode = db.Column(db.String())
    lat = db.Column(db.Float())
    lng = db.Column(db.Float())
    ...
Before committing this model, I want to:
trim any leading & trailing spaces off the email field
ensure that the zip code is a 5-digit string of numbers (I'll define a regex for this)
automatically look up the corresponding latitude/longitude of the zip code and save those in the lat & lng fields.
other things that a database schema can't enforce
Does SQLAlchemy provide an easy way to provide Python code that is guaranteed to run before committing to do arbitrary tasks like this?
I found the easiest way is to hook into the update and insert events: http://docs.sqlalchemy.org/en/latest/orm/events.html
from sqlalchemy import event

def my_before_insert_listener(mapper, connection, target):
    target.email = target.email.strip()  # note: Python strings use strip(), not trim()
    # All the other stuff

# associate the listener function with User,
# to execute during the "before_insert" hook
event.listen(User, 'before_insert', my_before_insert_listener)
You can also create custom SQLAlchemy types that do this sort of thing.
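For instance, a TypeDecorator can normalize a value every time it is bound to a statement; a minimal sketch (the class name is illustrative):

from sqlalchemy.types import TypeDecorator, String

class StrippedString(TypeDecorator):
    '''A String that strips leading/trailing whitespace on the way in.'''
    impl = String

    def process_bind_param(self, value, dialect):
        # runs for INSERT/UPDATE parameters, not for values already in the DB
        return value.strip() if value is not None else None

# usage: email = db.Column(StrippedString())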

Row-Level Update Lock using System.Transactions

I have an MSSQL procedure with the following code in it:
SELECT Id, Role, JurisdictionType, JurisdictionKey
FROM dbo.SecurityAssignment WITH(UPDLOCK, ROWLOCK)
WHERE Id = @UserIdentity
I'm trying to move that same behavior into a component that uses OleDb connections, commands, and transactions to achieve the same result. (It's a security component that uses the SecurityAssignment table shown above. I want it to work whether that table is in MSSQL, Oracle, or Db2)
Given the above SQL, if I run a test using the following code
Thread backgroundThread = new Thread(
    delegate()
    {
        using (var transactionScope = new TransactionScope())
        {
            Subject.GetAssignmentsHavingUser(userIdentity);
            Thread.Sleep(5000);
            backgroundWork();
            transactionScope.Complete();
        }
    });
backgroundThread.Start();
Thread.Sleep(3000);
var foregroundResults = Subject.GetAssignmentsHavingUser(userIdentity);
where Subject.GetAssignmentsHavingUser runs the SQL above and returns a collection of results, and backgroundWork is an Action that updates rows in the table, like this:
delegate
{
    Subject.UpdateAssignment(newAssignment(user1, role1));
}
Then the foregroundResults returned by the test should reflect the changes made in the backgroundWork action.
That is, I retrieve a list of SecurityAssignment table rows that have UPDLOCK, ROWLOCK applied by the SQL, and subsequent queries against those rows don't return until that update lock is released - thus the foregroundResult in the test includes the updates made in the backgroundThread.
This all works fine.
Now, I want to do the same with database-agnostic SQL, using OleDb transactions and isolation levels to achieve the same result. And I can't, for the life of me, figure out how to do it. Is it even possible, or does this row-level locking only apply at the db level?