How to achieve consistent transactions in ORMs? - entity-framework

My question is of a conceptual nature.
Let's assume we use an ORM (like Entity Framework, Hibernate, ...) and model our domain objects with it. Since the ORM abstracts away most SQL and handles transactions for us, how can we make sure our transactions are consistent?
Example:
We have an online shop that sells books. If we sell a book, we would basically use something similar to this:
var book = books.Single(b => b.Id == 42);
book.Quantity -= 1;
books.SaveChanges();
But if I understood ORMs correctly, they will only wrap the change of book in an UPDATE query. Meaning you could get basic concurrency issues like this:
[image: concurrency issue with ORMs]
If you were using plain SQL, you would just wrap the Quantity -= 1 in a transaction and be safe.
How would you usually deal with this in a proper way in ORMs, or is this handled somehow automatically?

You have to do a select for update before you change the quantity. This will lock the record.
It depends on the ORM you are using. With Hibernate you can set the LockModeType when doing the query; for this case it would be LockModeType.PESSIMISTIC_WRITE.
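A minimal sketch with the plain JPA API (assuming a hypothetical Book entity; Hibernate accepts the same LockModeType):

import javax.persistence.EntityManager;
import javax.persistence.LockModeType;

// Must run inside an active transaction: PESSIMISTIC_WRITE makes the
// ORM issue SELECT ... FOR UPDATE, so the row stays locked until commit.
Book book = entityManager.find(Book.class, 42L, LockModeType.PESSIMISTIC_WRITE);
book.setQuantity(book.getQuantity() - 1);
// Another transaction requesting the same lock now blocks until this
// one commits, so the read-modify-write cannot interleave.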
Read more about locking in the Hibernate documentation:
https://docs.jboss.org/hibernate/orm/5.4/userguide/html_single/Hibernate_User_Guide.html#locking

Entity Framework uses optimistic concurrency for this, and most RDBMSs won't prevent this update anomaly with a transaction alone.
UPDATE T SET Quantity = Quantity - 1 WHERE ID = :ID
This single statement runs atomically, and while a single statement prevents lost updates, it won't guarantee that Quantity never drops below zero.
A pessimistic lock requires not just a transaction, but either locking hints or an elevated transaction isolation level.
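If you want to keep the guard in a single statement but stay inside the ORM, the same idea can be expressed as a bulk update; here is a sketch in JPA (hypothetical Book entity; EF can send the equivalent SQL directly, e.g. via ExecuteStoreCommand), where the affected row count tells you whether any stock was left:

int updated = entityManager.createQuery(
        "UPDATE Book b SET b.quantity = b.quantity - 1 "
      + "WHERE b.id = :id AND b.quantity > 0")
    .setParameter("id", 42L)
    .executeUpdate();

if (updated == 0) {
    // No row matched: the book is unknown or already out of stock.
    throw new IllegalStateException("book 42 is not available");
}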

Related

Optimistic Locking in Spring Data JDBC

I noticed that Spring Data JDBC doesn't seem to implement optimistic locking (something like JPA's @Version annotation).
I was thinking of creating a @Modifying query which takes the version field into account and returns a boolean, so I can check manually whether the update succeeded. But I'm afraid this approach is limited to simple entities, not aggregates spanning multiple tables.
What's the best way to implement optimistic locking for aggregates?
It depends on your situation. If you just have 7 aggregates, of which 5 are single-entity aggregates, go for the @Modifying solution for the single-entity ones (a sketch follows below) and write custom methods for the other 2.
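A sketch of that @Modifying approach for a single-entity aggregate (Widget and its columns are hypothetical; Spring Data JDBC allows a boolean return here, true when at least one row was updated):

import org.springframework.data.jdbc.repository.query.Modifying;
import org.springframework.data.jdbc.repository.query.Query;
import org.springframework.data.repository.CrudRepository;
import org.springframework.data.repository.query.Param;

interface WidgetRepository extends CrudRepository<Widget, Long> {

    // Succeeds only if the caller still holds the current version;
    // bumps the version as part of the same statement.
    @Modifying
    @Query("UPDATE widget SET name = :name, version = version + 1 "
         + "WHERE id = :id AND version = :expectedVersion")
    boolean updateIfUnchanged(@Param("id") Long id,
                              @Param("name") String name,
                              @Param("expectedVersion") Long expectedVersion);
}

A false return then signals an optimistic-lock conflict, and the caller decides whether to reload and retry or report it.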
If you have more aggregates consisting of more than one class, consider properly implementing it and submitting a PR. The issue is already there: https://jira.spring.io/projects/DATAJDBC/issues/DATAJDBC-219
The main code changes will be in SqlGenerator which would need to add a where clause for aggregate roots if they have a version attribute.
If you are interested in doing a PR and need more assistance, please leave comment on the issue.

Correct approach to using JPA 2

I saw this link at Nabble where someone (James Sutherland) told another poster that they were "executing a delete all JPQL query. This is basically similar to executing your own SQL, you are responsible for executing the query correctly to maintain your constraints. This is not the normal way to delete objects in JPA. In JPA you normally read the object, then call remove() on it."
I was wondering whether this is true or not; based on how difficult it's been to remove anything more than simple tables, I'm starting to think it is.
My thoughts thus far are to do it like this:
1. Perform select statements, however particular they may be (e.g. select all students where student courses > 4 and marks >= 60 and student registration between 2011 and 2012).
2. Display/edit/delete objects (so EntityManager merge/persist/remove).
3. Rinse, lather, and so on.
Does this sound reasonable as an approach to how one is supposed to use JPA, or am I off base?
The cascading of remove was discussed here: Google App Engine - DELETE JPQL Query and Cascading. Also, for instance, bulk updates won't touch the version column when optimistic locking is used. Thus batch updates/deletes are somewhat crippled in JPA.
But I wouldn't say "This is not the normal way to delete objects in JPA." When I need to delete 2, 20 or 200 objects based on some condition, selecting and fetching them first just to call remove() on each is a bad idea most of the time.
After all, batch updates/deletes are in the specification for a reason.
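For illustration, a JPQL bulk delete (hypothetical Student entity); the caveat from above applies, since the statement bypasses the persistence context and any version columns:

int deleted = entityManager.createQuery(
        "DELETE FROM Student s WHERE s.marks < :minMarks")
    .setParameter("minMarks", 60)
    .executeUpdate();

// The bulk statement went straight to the database, so detach any
// stale managed instances the context may still hold.
entityManager.clear();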

Are there drawbacks to going around Entity Framework?

I have some design requirements that are not supported by Entity Framework, but are easily met by a simple SQL Query.
Essentially I need to do an insert that sets an Identity value.
Are there drawbacks to making a sproc that does my insert and then having EF call that sproc?
Are there caching concerns I need to be worried about? (Because I will be updating data "behind EF's back".)
Are there concurrency issues?
Anything else I need to be worried about?
If you don't keep your db context around, i.e. you dispose it after every unit of work (which covers most web scenarios), this should work just fine, unless you operate concurrently on the same table. In that case you might want to use a lock to synchronize the SQL and the EF queries, or catch the OptimisticConcurrencyException thrown by EF.
If you do keep a context around, on the other hand, make sure that you refresh it with RefreshMode.StoreWins.
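For comparison, the JPA side of the same concern looks roughly like this (Widget is a hypothetical entity; entityManager.refresh plays the role that RefreshMode.StoreWins plays in EF here):

// Change a row directly, behind the persistence context's back.
entityManager.createNativeQuery(
        "UPDATE widget SET name = ? WHERE id = ?")
    .setParameter(1, "renamed directly")
    .setParameter(2, widget.getId())
    .executeUpdate();

// The managed instance is now stale; re-read it from the database
// so the store's version wins over the cached one.
entityManager.refresh(widget);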
Also see "Saving Changes and Managing Concurrency"

Create new or update existing entity at one go with JPA

I have a JPA entity that has a timestamp field and is distinguished by a complex identifier field. What I need is to update the timestamp on an entity that has already been stored, and otherwise create and store a new entity with the current timestamp.
As it turns out, the task is not as simple as it seems at first sight. The problem is that in a concurrent environment I get a nasty "Unique index or primary key violation" exception. Here's my code:
// Load the existing entity, if any.
Entity e = entityManager.find(Entity.class, id);
if (e == null) {
    // Could not find an entity with the specified id in the database, so create a new one.
    e = entityManager.merge(new Entity(id));
}
// Set the current time...
e.setTimestamp(new Date());
// ...and finally save the entity.
entityManager.flush();
Please note that in this example entity identifier is not generated on insert, it is known in advance.
When two or more threads run this block of code in parallel, they may simultaneously get null from the entityManager.find(Entity.class, id) call, so they will attempt to save two or more entities at the same time with the same identifier, resulting in the error.
I can think of a few solutions to the problem:
1. Sure, I could synchronize this code block with a global lock to prevent concurrent access to the database, but would that be the most efficient way?
2. Some databases support a very handy MERGE statement that updates an existing row or creates a new one if none exists. But I doubt that OpenJPA (the JPA implementation of my choice) supports it.
3. Even if JPA does not support SQL MERGE, I can always fall back to plain old JDBC and do whatever I want with the database. But I don't want to leave a comfortable API and mess with a hairy JDBC + SQL combination.
4. There is a magic trick to fix it using the standard JPA API only, but I don't know it yet.
Please help.
You are referring to the transaction isolation of JPA transactions, i.e. how transactions behave when they access resources touched by other transactions.
According to this article:
READ_COMMITTED is the expected default Transaction Isolation level for using [..] EJB3 JPA
This means that - yes, you will have problems with the above code.
But JPA doesn't support custom isolation levels.
This thread discusses the topic more extensively. Depending on whether you use Spring or EJB, I think you can make use of the proper transaction strategy.
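If Spring happens to manage your transactions, one way to act on that is to raise the isolation level for just this operation. A sketch (names are illustrative), with the caveat that two racing transactions can still both see the row as missing, so you need to catch the failure and retry either way:

import java.util.Date;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Transactional(isolation = Isolation.SERIALIZABLE)
public void createOrUpdateTimestamp(Object id) {
    Entity e = entityManager.find(Entity.class, id);
    if (e == null) {
        e = new Entity(id);
        entityManager.persist(e);  // may still collide with a racer,
    }                              // surfacing as an exception to retry
    e.setTimestamp(new Date());
}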

Challenge: Getting Linq-to-Entities to generate decent SQL without unnecessary joins

I recently came across a question in the Entity Framework forum on msdn:
http://social.msdn.microsoft.com/Forums/en-US/adodotnetentityframework/thread/bb72fae4-0709-48f2-8f85-31d0b6a85f68
The person who asked the question tried to do a relatively simple query, involving two tables, a grouping, order by, and an aggregation using Linq-to-Entities. A pretty straightforward Linq query, and straightforward to do in SQL as well - the kind of stuff people try to do every day.
However, when using Linq-to-Entities the outcome is a complex query with lots of unnecessary joins etc. I tried it and wasn't able to get Linq-to-Entities to generate a decent SQL query from it if using just pure Linq against the EF entities.
Having seen a fair share of monster queries from EF I thought maybe the OP (and me, and others) are doing something wrong. Maybe there is a better way to do this?
So here's my challenge: using the example from the EF forum and using just Linq-to-Entities against the two entities, is it possible to get EF to generate a SQL query without unnecessary joins and other complexities?
I'd like to see EF generate something a little bit closer to what Linq-to-SQL does for the same kind of queries, while still using Linq against an EF model.
Restrictions: use EFv1 .net 3.5 SP1 or EFv4 (beta 1 is part of the VS2010/.net4 beta available for download from Microsoft). No CSDL->SSDL mapping tricks, model 'definingqueries', stored procs, db-side functions, or views allowed. Just plain 1:1 mapping between the model and the db and a pure L2E query that does what the original thread on MSDN asked. An association must exist between the two entities (i.e. my "workaround #1" answer to the original thread is not a valid workaround)
Update: 500pt bounty added. Have fun.
Update: As mentioned above, a solution that uses EFv4 / .net 4 (β1 or later) is of course eligible for the bounty. If you're using .net 4 post β1, please include build number (e.g. 4.0.20605), the L2E query you used, and the SQL it generated and sent to the DB.
Update: This issue has been fixed in VS2010 / .net 4 beta 2. Although the generated SQL still has a couple of [relatively harmless] extra levels of nesting, it doesn't do any of the nutty stuff it used to. The final execution plan after SQL Server's optimizer has had a go at it is now as good as it can be. +++ for the dudes and dudettes responsible for the SQL generating part of EFv4...
If I was that worried about the crazy SQL, I just wouldn't do any of the grouping in the database. I would first query all of the data I needed by finishing it off with a ToList() while using the Include function to load all the data in a single select.
Here's my final result:
var list = from o in _entities.orderT
                              .Include("personT")
                              .Where(p => p.personT.person_id == person_id &&
                                          p.personT.created >= fromTime &&
                                          p.personT.created <= toTime)
                              .ToList()
           group o by new { o.name, o.personT.created.Year,
                            o.personT.created.Month, o.personT.created.Day } into g
           orderby g.Key.name
           select new { g.Key, count = g.Sum(x => x.price) };
This results in a much simpler select:
SELECT
1 AS [C1],
[Extent1].[order_id] AS [order_id],
[Extent1].[name] AS [name],
[Extent1].[created] AS [created],
[Extent1].[price] AS [price],
[Extent4].[person_id] AS [person_id],
[Extent4].[first_name] AS [first_name],
[Extent4].[last_name] AS [last_name],
[Extent4].[created] AS [created1]
FROM [dbo].[orderT] AS [Extent1]
LEFT OUTER JOIN [dbo].[personT] AS [Extent2] ON [Extent1].[person_id] = [Extent2].[person_id]
INNER JOIN [dbo].[personT] AS [Extent3] ON [Extent1].[person_id] = [Extent3].[person_id]
LEFT OUTER JOIN [dbo].[personT] AS [Extent4] ON [Extent1].[person_id] = [Extent4].[person_id]
WHERE ([Extent1].[person_id] = @p__linq__1) AND ([Extent2].[created] >= @p__linq__2) AND ([Extent3].[created] <= @p__linq__3)
Additionally, with the example data provided, SQL Profiler only notices a 3 ms increase in duration of the SQL call.
Personally, I think that anyone that whines about not liking the output SQL of an ORM layer should go back to using Stored Procedures and Datasets. They simply aren't ready to evolve yet, and need to spend a few more years in the proverbial oven. :)
Interesting discussion. I have used two ORMs so far (NHibernate and LINQ-to-Entities). In my experience, there is always a line where you have to give up on ORM-generated SQL and fall back to stored procedures or views to get the most scalable queries. Having said that, I personally think that LINQ works better on more normalized databases, and all the nested queries/joins are not a major issue. There are some cases where, in order to increase performance or scalability, you have to use DB server features (indexed views, for example, work only with query hints on SQL Server 2008 Standard Edition) and you simply cannot use an ORM (except iBatis, perhaps?).
Granted, you won't get the best performance or scalability from the nested joins/queries LINQ generates, but please don't forget the advantages and development benefits LINQ (or NHibernate) brings to any project. Surely there must be some merit to it.
Finally, although I risk comparing apples and oranges, isn't this more like asking: do you want rapid website development (ASP.NET WebForms, Swing) or more control over your HTML (ASP.NET MVC, RoR)? Pick the thing that best suits your requirements.
My 2 cents!
The SQL that LINQ generates is very efficient. It may look bulky, but it takes into account relations on tables, constraints, etc. In my opinion you should just blindly use the LINQ commands and not worry about scale. There are benefits to the large queries since they are automatically generated: they avoid slip-ups in relational constraints and add their own wrappers for faults/exceptions.
If, however, you want to write the SQL yourself and still work within the confines of an ORM, then try iBatis (http://ibatis.apache.org/). You have to write the SQL and joins yourself, so it gives you complete control over the backend model.
Personally, just use SQLMetal and LINQ. Don't worry about performance and scale unless you need to.