I'm totally new to PostgreSQL and SQLAlchemy, and I'm trying to figure out how to run validation/cleaning code on an SQLAlchemy model before it is committed to the database. The idea is to ensure data consistency beyond the standard type enforcement that comes built into SQL databases. For example, if I have a User model built on SQLAlchemy's models,
class User(db.Model):
    ...
    email = db.Column(db.String())
    zipCode = db.Column(db.String())
    lat = db.Column(db.Float())
    lng = db.Column(db.Float())
    ...
Before committing this record, I want to:
trim any leading & trailing spaces off the email field
ensure that the zip code is a 5-digit string of numbers (I'll define a regex for this)
automatically look up the corresponding latitude/longitude of the zip code and save those in the lat & lng fields.
other things that a database schema can't enforce
Does SQLAlchemy provide an easy way to supply Python code that is guaranteed to run before a commit, to do arbitrary tasks like this?
I found the easiest approach is to hook into the update and insert events: http://docs.sqlalchemy.org/en/latest/orm/events.html
from sqlalchemy import event

def my_before_insert_listener(mapper, connection, target):
    target.email = target.email.strip()  # Python strings use strip(), not trim()
    # all the other stuff

# associate the listener function with User,
# to execute during the "before_insert" hook
event.listen(User, 'before_insert', my_before_insert_listener)
You can also create custom SQLAlchemy types that do this sort of thing.
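For the custom-type idea, here is a minimal, hedged sketch (the StrippedString name and its use on the email column are illustrative assumptions, not part of the answer above): a TypeDecorator can normalise a value every time it is bound to an INSERT or UPDATE.

from sqlalchemy import String
from sqlalchemy.types import TypeDecorator

class StrippedString(TypeDecorator):
    """Illustrative string type that trims whitespace before storing."""
    impl = String

    def process_bind_param(self, value, dialect):
        # called whenever the value is bound to an INSERT/UPDATE statement
        return value.strip() if value is not None else None

# hypothetical usage in the model above:
# email = db.Column(StrippedString())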
I'm using Firebird 2.5.8 to store information for a piece of software I designed.
A customer contacted me today to report multiple errors that I couldn't understand, and I used the "IBExpert" tool to inspect their database.
To my surprise, all the calculated fields had been transformed into "standard" fields. This is clearly visible in the "DDL" tab of the database tool, which displays table definitions as SQL code.
For instance, the following table definition:
CREATE TABLE TVERSIONS (
    ...
    PARENTPATH COMPUTED BY (((SELECT TFILES.FILEPATH FROM TFILES WHERE ID = TVERSIONS.FILEID))),
    ....
    ISCOMPLETE COMPUTED BY ((((SELECT TBACKUPVERSIONS.ISCOMPLETE FROM TBACKUPVERSIONS WHERE ID = TVERSIONS.CVERSION)))),
    CDATE COMPUTED BY (((SELECT TBACKUPVERSIONS.SERVERSTARTDATE FROM TBACKUPVERSIONS WHERE ID = TVERSIONS.CVERSION))),
    DDATE COMPUTED BY (((SELECT TBACKUPVERSIONS.SERVERSTARTDATE FROM TBACKUPVERSIONS WHERE ID = TVERSIONS.DVERSION))),
    ...
);
has been "changed" in the client database into this:
CREATE TABLE TVERSIONS (
    ...
    PARENTPATH VARCHAR(512) CHARACTER SET UTF8 COLLATE UNICODE,
    ...
    ISCOMPLETE SMALLINT,
    CDATE TIMESTAMP,
    DDATE TIMESTAMP,
    ...
);
How can such a thing be possible?
I've been using Firebird for more than 10 years, and I've never seen such behavior until now. Is it possible that this is a corruption of the RDB$FIELDS.RDB$COMPUTED_SOURCE fields?
What would you advise?
To summarize the discussion on firebird-support (and comments above):
The likely cause of this happening is that the database was backed up and restored using gbak, and the restore did not complete successfully. If this happens, gbak will have ended in an error, and the database is in single shutdown state (which means only SYSDBA or the database owner is allowed to create one connection). If the database is not currently in single shutdown mode, someone used gfix to bring the database online again in normal state.
When a database is restored using gbak, calculated fields are initially created as normal fields (though their values are not part of the backup). After data is restored successfully, those fields are altered to be calculated fields. If there are any errors before or during redefinition of the calculated fields, the restore will fail, and the database will be in single shutdown state, and the calculated fields will still be "normal" fields.
I recommend doing a structural comparison of the database to check if calculated fields are the only problem, or if other things (e.g. constraints) are missing. A simple way to do this is to export the DDL of the database and a "known-good" database, for example using ISQL (command line option -extract), and comparing them with a diff tool.
Then either fix the existing database by executing the necessary DDL to restore calculated fields (and other things), or create a new empty database, and move the data from the old to the new (using a datapump tool).
Also check if any data is missing. By default, gbak restores the data in a single transaction, so in that case either all data is present or all data is missing. However, gbak also has a "transaction-per-table" mode (-ONE_AT_A_TIME or -O), which could mean some tables have data, and others have no data.
I'm currently using jOOQ to build my SQL (with code generation via the mvn plugin).
Executing the created query is not done by jOOQ though (I'm using the Vert.x SqlClient for that).
Let's say I want to select all columns of two tables which share some identical column names, e.g. UserAccount(id, name, ...) and Product(id, name, ...). When executing the following code
val userTable = USER_ACCOUNT.`as`("u")
val productTable = PRODUCT.`as`("p")
create().select().from(userTable).join(productTable).on(userTable.ID.eq(productTable.AUTHOR_ID))
query.getSQL(ParamType.NAMED) returns a query like
SELECT "u"."id", "u"."name", ..., "p"."id", "p"."name", ... FROM ...
The problem here is that the result set will contain the columns id and name twice, without the prefix "u." or "p.", so I can't map/parse it correctly.
Is there a way to tell jOOQ to alias these columns like the following, without any further manual effort?
SELECT "u"."id" AS "u.id", "u"."name" AS "u.name", ..., "p"."id" AS "p.id", "p"."name" AS "p.name" ...
I'm using the holy Postgres database :)
EDIT: My current approach would be something like
val productFields = productTable.fields().map { it.`as`(name("p.${it.name}")) }
val userFields = userTable.fields().map { it.`as`(name("u.${it.name}")) }
create().select(productFields,userFields,...)...
This feels really hacky though
How to correctly dereference tables from records
You should always use the column references that you passed to the query to dereference values from records in your result. If you didn't pass column references explicitly, then the ones from your generated table via Table.fields() are used.
In your code, that would correspond to:
userTable.NAME
productTable.NAME
So, in a resulting record, do this:
val rec = ...
rec[userTable.NAME]
rec[productTable.NAME]
Using Record.into(Table)
Since you seem to be projecting all the columns (do you really need all of them?) to the generated POJO classes, you can still do this intermediary step if you want:
val rec = ...
val userAccount: UserAccount = rec.into(userTable).into(UserAccount::class.java)
val product: Product = rec.into(productTable).into(Product::class.java)
Because the generated table has all the necessary meta data, it can decide which columns belong to it, and which ones don't. The POJO doesn't have this meta information, which is why it can't disambiguate the duplicate column names.
Using nested records
You can always use nested records directly in SQL as well in order to produce one of these 2 types:
Record2<Record[N], Record[N]> (e.g. using DSL.row(table.fields()))
Record2<UserAccountRecord, ProductRecord> (e.g. using DSL.row(table.fields()).mapping(...), or starting from jOOQ 3.17, directly using a Table<R> as a SelectField<R>)
The second jOOQ 3.17 solution would look like this:
// Using an implicit join here, for convenience
create().select(productTable.userAccount(), productTable)
.from(productTable)
.fetch();
The above is using implicit joins, for additional convenience
Auto aliasing all columns
There are a ton of flavours that users might like when "auto-aliasing" columns in SQL. Any solution offered by jOOQ would be no better than the one you've already found, so if you still want to auto-alias all columns, just do what you did.
But usually, the desire to auto-alias is a feature request derived from a misunderstanding of what the best approach is for doing something in jOOQ (see the options above), so ideally you don't go down the auto-aliasing road.
In an existing codebase some tables are now horizontally sharded within the same database but in different namespaces.
E.g. let's say there previously was a large users table which is now sharded by the country field, so there are now the following tables: us.users, ca.users, es.users, etc.
Since every single query to the tables already contains the country filter I was thinking of the following minimalistic adjustment so that there's no need to subclass the original model for every country manually or dynamically:
class SessionWithShardedTableSupport(Session):
    """ Use sharded tables on the fly """

    def connection(self, mapper=None, clause=None, bind=None, close_with_result=None, **kw):
        if mapper and mapper.local_table.name == 'users':
            mapper.local_table.schema = '???'  # `us` or `ca` or `es`
        return super().connection(mapper, clause, bind, close_with_result, **kw)
The solution would work fine if there were a way to get the country filter from the query from within the session, but there doesn't seem to be one (at least, inspecting both the mapper & clause parameters didn't reveal it).
1) Is there a way to get the where clause / the filters from within the session?
2) Is there maybe a better way to adjust the table name / table namespace on the fly with minimalistic changes to the existing code?
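For what it's worth, one alternative to patching the session (a hedged sketch only, not from the original question; the engine URL and the session_for_country helper are hypothetical) is SQLAlchemy's schema_translate_map execution option, which rewrites the schema of schema-less tables per connection, so the country can be chosen at the point where it is already known:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine("postgresql://user:pass@localhost/mydb")  # hypothetical URL
Session = sessionmaker()

def session_for_country(country):
    # map tables that have no explicit schema onto the per-country namespace
    shard_engine = engine.execution_options(schema_translate_map={None: country})
    return Session(bind=shard_engine)

session = session_for_country("us")  # queries against users now hit us.users

This also sidesteps mutating mapper.local_table.schema, which is shared metadata-level state and could leak between concurrent sessions.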
I have a question and I'm trying to find a code example to implement in my project. In PowerBuilder, I want to create a datastore from a simple SQL SELECT and then fetch the stored values one by one. I want this because at the moment I'm using a CURSOR, which is very slow and has transaction size problems; I then tried ROW_NUMBER, which is also very slow. My application uses both Oracle and SQL databases (with a lot of data). If you can provide a PB example it would be very helpful. Thank you, guys.
Here is an example:
datastore lds_data
lds_data = CREATE datastore
lds_data.DataObject = "your datawindow"
lds_data.SetTransObject (SQLCA)
lds_data.Retrieve() // Put your parameters in the parentheses
...
DESTROY lds_data // Optional
And if you want to dynamically build the datastore from the SQL statement, replace the third line with the following (ls_err being defined as a string variable that will contain a possible returned error):
lds_data.create(sqlca.SyntaxFromSQL('select col, you, want from your_table', 'Style(Type=Form)', ls_err))
I'm using the SQLAlchemy (1.0) ORM with a PostgreSQL database. Let's say I have a line in my class
serialid = Column(Integer, Sequence('journal_seq'), unique=True)
I realize that there is then a special "table" in my database that holds the current value of the last used (or next available) integer for serialid. Can I, from the ORM, get the value of that integer without incrementing it (otherwise I could just call next_value)? And is there a guarantee that the next serialid will have exactly that value?
I'd like to make a journal item with a serialid, and also make another item referring to that same serialid, but I'd like the two of them to be committed (or rolled back) together - and until I commit the journal item, I don't know what its serialid is. Maybe there is a cleaner way of doing it without knowing the current sequence value. A relationship would be great, but I don't know how to set it up.
(I know there is a question that asks the same thing when you control the SQL directly. I'd like to do the same from SQL Alchemy ORM.)
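A common pattern for this kind of problem (a hedged sketch, not part of the original question; the Journal and Entry models and the session variable are hypothetical) is to avoid reading the sequence at all and instead flush the session: the flush emits the INSERT, which pulls the next sequence value and populates serialid, while everything stays inside the same uncommitted transaction.

# assuming an existing Session bound to the engine
journal = Journal()            # hypothetical model with the serialid column
session.add(journal)
session.flush()                # emits the INSERT and fetches the sequence value,
                               # but does not commit the transaction

entry = Entry(journal_serialid=journal.serialid)  # serialid is now known and stable
session.add(entry)
session.commit()               # both rows commit together; a rollback undoes both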