EclipseLink disable change tracking per query - jpa

We currently experience severe slowdowns in our application due to change tracking in EclipseLink. The problem is home-made, we don’t use JPA as it was meant to be.
I would like to know, how I can get cache hits (1st level), but don’t include these entities in change tracking except some condition is met.
// not real code, but close if you decompose the layers and inline methods
public void foo(Long customerId, boolean changeName, String newName) {
/*** check customer valid ***/
// #1 HOW TO discard from change tracking? – Customer won’t get modified!
Customer customer = entityManager.find(Customer.class, customerId);
// some Business Rules
// #2 They should be auto discarded from change tracking because #1 is also discarded)
checkSomething(customer.getAddresses());
checkSomething(customer.getPhoneNumbers ());
…
/*** manipulate customer ***/
// somewhere else in different classes/methods …
if(changeName) {
// #3 HOW TO get Cache hit (1st level) - it was read in #1
// newName should be persisted
customer = entityManager.find(Customer.class, customerId);
customer.setName(newName);
}
}
It would be ok to use EclipseLink API for #1 and #2
I would prefer hints.
EclipseLink 2.4.2
2nd level cache: disabled
ChangeTrackingType: DEFERRED

Try using the read-only query hint, which can be passed as a property to find or to queries, and see this for more on hints. The read-only hint should return the instance from the shared 2nd level cache, which should not be modified. As it is not added to the 1st level EntityManager cache, any other reads without the hint will build/return the managed instance.
The documentation states this works for nontransactional read operations, so I'm not sure how it will work if the EntityManager is using a transactional connection for reads, as it will not use the shared cache for reads through a transaction.

Related

Drools - Store Multi Stateful Sessions

We have implemented drools engine in our platform in order to be able to evaluate rules from streams.
In our use case we have a change detection stream which contains the changes of multiple entities.
Rules need to be evaluated for each entity from the stream over a period of time and evolve it's state apart from others entities(Sessions). Those rules produces alerts based on the state of each entity. And for this reason entities should be into boundaries, so the state of one entity does not interfere on the others.
To achieve this, we create a session as a Spring Bean for each entity id and store it in a inMemory HashMap. So every time an entity arrives, we try to find it`s session on the inMemory Map by using it's Id. If we get a null return we create it.
It does`t seems the right way to accomplish it. Because it does not offer a disaster recover strategy neither offers a great memory management.
We could use some kind of inMemory database such as Redis or Memchached. But I don`t think it would be able to recover a stateful session precisely.
Does someone know how to achieve disaster recover and a good memory management with a embedded Drools with multi sessions in the right way? Does the platform offers some solution?
Thanks very much for your attention and support
The answer is not to try to persist and reuse sessions, but rather to persist an object that models the current state of the entity.
Your current workflow is this:
Entity arrives at your application (from change detection stream or elsewhere)
You do a lookup on a hashmap to get a Session which has the entity's state stored
You fire the rules, which updates the session (and possibly the entity)
You persist the session in-memory.
What your workflow should be is this:
(same) Entity arrives at your application
You do a look-up on an external data source for the entity's state -- for example from a database or data store
You fire the rules, passing in the entity state. Instead of updating the session, you update the state instance.
You persist the state to your external data source.
If you add appropriate write-through caches you can guarantee both performance and consistency. This will also allow you to scale your application sideways if you implement appropriate locking / transaction handling for your data source.
Here's a toy example.
Let's say we have an application modelling a Library where a user is allowed to check out books. A user is only allowed to check out a total of 3 books at a time.
The 'event' we receive models a book check-in or check-out event:
class BookBorrowEvent {
int userId;
int bookId;
EventType eventType; // EventType.CHECK_IN or EventType.CHECK_OUT
}
In an external data source we maintain a UserState record -- maybe as a distinct record in a traditional RDBMS or an aggregate; how we store it isn't really relevant to the example. But let's say our UserState record as returned from the data source looks something like this:
class UserState {
int userId;
int[] borrowedBookIds;
}
When we receive the event, we'll first retrieve the user state from the external data store (or an internally-managed write-through cache), then add the UserState to the rule inputs. We should be appropriately handling our sessions (disposing of them after use, using session pools as needed), of course.
public void handleBookBorrow(BookBorrowEvent event) {
UserState state = getUserStateFromStore(event.getUserId());
KieSession kieSession = ...;
kieSession.insert( event );
kieSession.insert( state );
kieSession.fireAllRules();
persistUserStateToStore(state);
}
Your rules would then do their work against the UserState instance, instead of storing values in local variables.
Some example rules:
rule "User borrows a book"
when
BookBorrowEvent( eventType == EventType.CHECK_OUT,
$bookId: bookId != null )
$state: UserState( $checkedOutBooks: borrowedBookIds not contains $bookId )
Integer( this < 3 ) from $checkedOutBooks.length
then
modify( $state ) { ... }
end
rule "User returns a book"
when
BookBorrowEvent( eventType == EventType.CHECK_IN,
$bookId: bookId != null )
$state: UserState( $checkedOutBooks: borrowedBookIds contains $bookId )
then
modify( $state ) { ... }
end
Obviously a toy example, but you could easily add additional rules for cases like user attempts to check out a duplicate copy of a book, user tries to return a book that they hadn't checked out, return an error if the user exceeds the 3 max book borrowing limit, add time-based logic for length of checkout allowed, etc.
Even if you were using stream-based processing so you can take advantage of the temporal operators, this workflow still works because you would be passing the state instance into the evaluation stream as you receive it. Of course in this case it would be more important to properly implement a write-through cache for performance reasons (unless your temporal operators are permissive enough to allow for some data source transaction latency). The only changes you need to make is to refocus your rules to target their data persistence to the state object instead of the session itself -- which isn't generally recommended anyway since sessions are designed to be disposed of.

Locked Caching or Simple Cache with locking

Our e-commerce application built on ATG, has provision whereby multiple users can update the same Order. Since the cache mode for Order is Simple - this has resulted in large number of ConcurrentUpdateException and InvalidVersionException. We were considering locked cache mode, however are skeptical about using locked caching as the Orders are being updated very frequently and locking might result in deadlocks and have its own performance implications.
Is there a way we can continue using simple cache mode and minimize the occurances of ConcurrentUpdateException and InvalidVersionException?
My experience has been that you have to use locked caching with orders on any medium to high volume ATG websites.. Also, remember that the end-user experience is bad when this happens as they either get an error message (if the error handling is good) or they get something like an "internal server error" error.
The reason I believe you need to use locked caching for order is:
You can't guarantee that a user has not got multiple sessions open at the same time which are updating the shopping cart (which is just an incomplete Order). I have also seen examples where customers share their logins with family members etc and then wonder why all these items keep magically appearing in their shopping cart.
There are a number of processes which update the order including things like scenarios and customer service agents using the CSC module.
You could have code which updates orders in a non-safe way.
Some things which might help include:
Always use the OrderManager to load/update an order. Sounds obvious but I have seen a lot of updating orders via the repository.
Make sure that any updates are inside a transaction block.
Try to consolidate any background processes which might update orders to run on a small subset of your ATG instances (this will help reduce concurrency)
The ATG help has this to say about it:
A multi-server application might require locked caching, where only one Oracle ATG Web Commerce instance at a time has write access to the cached data of a given item type. You can use locked caching to prevent multiple servers from trying to update the same item simultaneously—for example, Commerce order items, which can be updated by customers on an external-facing server and by customer service agents on an internal-facing server. By restricting write access, locked caching ensures a consistent view of cached data among all Oracle ATG Web Commerce instances.
That said converting to locked caching will most certainly require performance testing and tuning of the order repository caches. It can and does result in deadlocks (seen that many times) but if configured correctly the deadlocks are infrequent.
Not sure what version of ATG you are using but for 10.2 there is a good explanation here of how you can get everything "in sync".
There is actually a Best Practices approach that was recommended in Legacy ATG Community long time ago. Just pasting it here.
When you are using the Order object with synchronization and transactions, there is a specific usage pattern that is critical to follow. Not following the expected pattern can lead to unnecessary ConcurrentUpdateExceptions, InvalidVersionExceptions, and deadlocks. The following sequence must be strictly adhered to in your code:
Obtain local-lock on profile ID.
Begin Transaction
Synchronize on Order
Perform ALL modifications to the order object.
Call OrderManager.updateOrder.
End Synchronization
End Transaction.
Release local-lock on profile ID.
Steps 1, 2, 7, 8 are done for you in the beforeSet() and afterSet() methods for ATG form handlers where order updates are expected. These include form handlers that extend PurchaseProcessFormHandler and OrderModifierFormHandler (deprecated). If your code accesses/modifies the order outside of a PurchaseProcessFormHandler, it will likely need to obtain the local-lock manually. The lock fetching can be done using the TransactionLockService.
So, if you have extended an ATG form handler based on PurchaseProcessFormHandler, and have written custom code in a handleXXX() method that updates an order, your code should look like:
synchronized( order )
{
// Do order updates
orderManager.updateOrder( order );
}
If you have written custom code updating an order outside of a PurchaseProcessFormHandler (e.g. CouponFormHandler, droplet, pipeline servlet, fulfillment-related), your code should look like:
ClientLockManager lockManager = getLocalLockManager(); // Should be configured as /atg/commerce/order/LocalLockManager
boolean acquireLock = false;
try
{
acquireLock = !lockManager.hasWriteLock( profileId, Thread.currentThread() );
if ( acquireLock )
lockManager.acquireWriteLock( profileId, Thread.currentThread() );
TransactionDemarcation td = new TransactionDemarcation();
td.begin( transactionManager );
boolean shouldRollback = false;
try
{
synchronized( order )
{
// do order updates
orderManager.updateOrder( order );
}
}
catch ( ... e )
{
shouldRollback = true;
throw e;
}
finally
{
try
{
td.end( shouldRollback );
}
catch ( Throwable th )
{
logError( th );
}
}
}
finally
{
try
{
if ( acquireLock )
lockManager.releaseWriteLock( profileId, Thread.currentThread(), true );
}
catch( Throwable th )
{
logError( th );
}
}
This pattern is only useful to prevent ConcurrentUpdateExceptions, InvalidVersionExceptions, and deadlocks when multiple threads attempt to update the same order on the same ATG instance. This should be adequate for most situations on a commerce site since session stickiness will confine updates to the same order to the same ATG instance.

EF database concurrency

I have two apps: one app is asp.net and another is a windows service running in background.
The windows service running in background is performing some tasks (read and update) on database while user can perform other operations on database through asp.net app. So I am worried about it as for example, in windows service I collect some record that satisfy a condition and then I iterate over them, something like:
IQueryable<EntityA> collection = context.EntitiesA.where(<condition>)
foreach (EntityA entity in collection)
{
// do some stuff
}
so, if user modify a record that is used later in the loop iteration, what value for that record is EF taken into account? the original retrieved when performed:
context.EntitiesA.where(<condition>)
or the new one modified by the user and located in database?
As far as I know, during iteration, EF is taken each record at demand, I mean, one by one, so when reading the next record for the next iteration, this record corresponds to that collected from :
context.EntitiesA.where(<condition>)
or that located in database (the one the user has just modified)?
Thanks!
There's a couple of process that will come into play here in terms of how this will work in EF.
Queries are only performed on enumeration (this is sometimes referred to as query materialisation) at this point the whole query will be performed
Lazy loading only effects navigation properties in your above example. The result set of the where statement will be pulled down in one go.
So what does this mean in your case:
//nothing happens here you are just describing what will happen later to make the
// query execute here do a .ToArray or similar, to prevent people adding to the sql
// resulting from this use .AsEnumerable
IQueryable<EntityA> collection = context.EntitiesA.where(<condition>);
//when it first hits this foreach a
//SELECT {cols} FROM [YourTable] WHERE [YourCondition] will be performed
foreach (EntityA entity in collection)
{
//data here will be from the point in time the foreach started (eg if you have updated during the enumeration in the database you will have out of date data)
// do some stuff
}
If you're truly concerned that this can happen then get a list of id's up front and process them individually with a new DbContext for each (or say after each batch of 10). Something like:
IList<int> collection = context.EntitiesA.Where(...).Select(k => k.id).ToList();
foreach (int entityId in collection)
{
using (Context context = new Context())
{
TEntity entity = context.EntitiesA.Find(entityId);
// do some stuff
context.Submit();
}
}
I think the answer to your question is 'it depends'. The problem you are describing is called 'non repeatable reads' an can be prevented from happening by setting a proper transaction isolation level. But it comes with a cost in performance and potential deadlocks.
For more details you can read this

Multiple / Rapid ajax requests and concurrency issues with Entity Framework

I have an asp.net MVC4 application that I am using Unity as my IoC. The constructor for my controller takes in a Repository and that repository takes in a UnitOfWork (DBContext). Everything seems to work fine until multiple ajax requests from the same session happen too fast. I get the Store update, insert, or delete statement affected an unexpected number of rows (0) error due to a concurrency issue. This is what the method looks like called from the ajax request:
public void CaptureData(string apiKey, Guid sessionKey, FormElement formElement)
{
var trackingData = _trackingService.FindById(sessionKey);
if(trackingData != null)
{
formItem = trackingData.FormElements
.Where(f => f.Name == formElement.Name)
.FirstOrDefault();
if(formItem != null)
{
formItem.Value = formElement.Value;
_formElementRepository.Update(formItem);
}
}
}
This only happens when the ajax requests happens rapidly, meaning fast. When the requests happen at a normal speed everything seems fine. It is like the app needs time to catch up. Not sure how I need to handle the concurrency check in my repository so I don't miss an update. Also, I have tried setting the "MultipleActiveResultSets" to true and that didn't help.
As you mentioned in the comment you are using a row version column. The point of this column is to prevent concurrent overwrites of the same row. You have two operations:
Read record - reads record and current row version
Update record - update record with specified key and row version. The row version is updated automatically
Now if those operations are executed by concurrent request you may receive this:
Request A: Read record
Request B: Read record
Request A: Write record - changes row version!
Request B: Write record - fires exception because record with row version retrieved during Read record doesn't exist
The exception is fired to tell you that you are trying to update obsolete data because there is already a new version of the updated record. Normally you need to refresh data (by reloading current record from the database) and try to save them again. In highly concurrent scenario this handling may repeat many times because simply your database is designed to prevent this. Your options are:
Remove row version and let requests overwrite the value as they wish. If you really need concurrent request processing and you are happy to have "some" value, this may be the way to go.
Not allow concurrent requests. If you need to process all updates you most probably also need their real order. In such case your application should not allow concurrent requests.
Use SQL / stored procedure instead. By using table hints you will be able to lock record during Read operation and no other request will be able to read that record before the first one save changes and commits or rollbacks transaction.

Does EF caching work differently for SQL Server CE 3.5?

I have been developing some single-user desktop apps using Entity Framework and SQL Server 3.5. I thought I had read somewhere that once records are in an EF cache for one context, if they are deleted using a different context, they are not removed from the cache for the first context even when a new query is executed. Hence, I've been writing really inefficient and obfuscatory code so I can dispose the context and instantiate a new one whenever another method modifies the database using its own context.
I recently discovered some code where I had not re-instantiated the first context under these conditions, but it worked anyway. I wrote a simple test method to see what was going on:
using (UnitsDefinitionEntities context1 = new UnitsDefinitionEntities())
{
List<RealmDef> rdl1 = (from RealmDef rd in context1.RealmDefs
select rd).ToList();
RealmDef rd1 = RealmDef.CreateRealmDef(100, "TestRealm1", MeasurementSystem.Unknown, 0);
context1.RealmDefs.AddObject(rd1);
context1.SaveChanges();
int rd1ID = rd1.RealmID;
using (UnitsDefinitionEntities context2
= new UnitsDefinitionEntities())
{
RealmDef rd2 = (from RealmDef r in context2.RealmDefs
where r.RealmID == rd1ID select r).Single();
context2.RealmDefs.DeleteObject(rd2);
context2.SaveChanges();
rd2 = null;
}
rdl1 = (from RealmDef rd in context1.RealmDefs select rd).ToList();
Setting a breakpoint at the last line I was amazed to find that the added and deleted entity was in fact not returned by the second query on the first context!
I several possible explanations:
I am totally mistaken in my understanding that the cached records
are not removed upon requerying.
EF is capricious in its caching and it's a matter of luck.
Caching has changed in EF 4.1.
The issue does not arise when the two contexts are
instantiated in the same process.
Caching works differently for SQL CE 3.5 than other versions of SQL
server.
I suspect the answer may be one of the last two options. I would really rather not have to deal with all the hassles in constantly re-instantiating contexts for single-user desktop apps if I don't have to do so.
Can I rely on this discovered behavior for single-user desktop apps using SQL CE (3.5 and 4)?
When you run the 2nd query on an the ObjectSet it's requerying the database, which is why it's reflecting the change exposed by your 2nd context. Before we go too far into this, are you sure you want to have 2 contexts like you're explaining? Contexts should be short lived, so it might be better if you're caching your list in memory or doing something else of that nature.
That being said, you can access the local store by calling ObjectStateManager.GetObjectStateEntries and viewing what is in the store there. However, what you're probably looking for is the .Local storage that's provided by DbSets in EF 4.2 and beyond. See this blog post for more information about that.
Judging by your class names, it looks like you're using an edmx so you'll need to make some changes to your file to have your context inherit from a DbSet to an objectset. This post can show you how
Apparently Explanation #1 was closer to the fact. Inserting the following statement at the end of the example:
var cached = context1.ObjectStateManager.GetObjectStateEntries(System.Data.EntityState.Unchanged);
revealed that the record was in fact still in the cache. Mark Oreta was essentially correct in that the database is actually re-queried in the above example.
However, navigational properties apparently behave differently, e.g.:
RealmDef distance = (from RealmDef rd in context1.RealmDefs
where rd.Name == "Distance"
select rd).Single();
SystemDef metric = (from SystemDef sd in context1.SystemDefs
where sd.Name == "Metric"
select sd).Single();
RealmSystem rs1 = (from RealmSystem rs in distance.RealmSystems
where rs.SystemID == metric.SystemID
select rs).Single();
UnitDef ud1 = UnitDef.CreateUnitDef(distance.RealmID, metric.SystemID, 100, "testunit");
rs1.UnitDefs.Add(ud1);
context1.SaveChanges();
using (UnitsDefinitionEntities context2 = new UnitsDefinitionEntities())
{
UnitDef ud2 = (from UnitDef ud in context2.UnitDefs
where ud.Name == "testunit"
select ud).Single();
context2.UnitDefs.DeleteObject(ud2);
context2.SaveChanges();
}
udList = (from UnitDef ud in rs1.UnitDefs select ud).ToList();
In this case, breaking after the last statement reveals that the last query returns the deleted entry from the cache. This was my source of confusion.
I think I now have a better understanding of what Julia Lerman meant by "Query the model, not the database." As I understand it, in the previous example I was querying the database. In this case I am querying the model. Querying the database in the previous situation happened to do what I wanted, whereas in the latter situation querying the model would not have the desired effect. (This is clearly a problem with my understanding, not with Julia's advice.)