I have found this question, Keeping JPA EntityManager open?, but I still have some concerns.
Is it a good idea to keep an EntityManager open for the application's whole lifetime? Does it consume resources such as a database connection? Does it keep entities alive, or will it release them because it uses weak references? I use EclipseLink 2.x.
Thanks
Zlaja
EntityManager was designed to be rather short-lived. It is technically possible to keep it open for a long time, but sooner or later you will face the following issues:
As you wrote, EntityManager keeps loaded entities, and indeed it keeps them using weak references (at least with Hibernate, though I'm not sure whether this is required by the JPA spec), so they should be freed before the JVM runs out of memory. Unfortunately, I've seen that holding a large number of entities noticeably degrades EM performance as the number grows.
An open EM may consume a database connection, e.g. when there are lazily loadable objects in memory.
EM by definition is not thread-safe, so in web applications (for instance) reusing/sharing one instance is totally unacceptable.
Probably the biggest problem is that when any error occurs in the EM (e.g. on transaction commit due to a violation of DB constraints), JPA requires that the EM be closed ASAP and discarded. This puts all your in-memory entities into the detached state, meaning that touching any lazily loaded collection or reference will fail. A workaround is to reload all entities, but that is difficult in bigger applications where they are scattered across the application layers. The alternative is to work with detached entities from the start and use EntityManager.merge(), but this usually requires changing the programming model and, in particular, is somewhat contradictory to the "always-open" entity manager approach. You should pick one approach and stick to it.
So generally it's better to keep the EntityManager short-lived; it really simplifies a lot of things.
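The "open, use, close" scoping recommended here can be sketched in plain Java with try-with-resources. ToyEntityManager below is an invented stand-in, not a real JPA class; it only illustrates keeping the manager's lifetime bounded to one business operation:

```java
// Invented stand-in for an EntityManager, to illustrate the
// short-lived pattern: open per operation, close promptly.
final class ToyEntityManager implements AutoCloseable {
    private boolean open = true;

    boolean isOpen() { return open; }

    void persist(Object entity) {
        if (!open) throw new IllegalStateException("entity manager is closed");
        // a real EntityManager would start tracking the entity here
    }

    @Override
    public void close() { open = false; }
}

class ShortLivedExample {
    // Scope the manager to a single business operation, mirroring
    // "create an EntityManager, use it, close it".
    static boolean runOneOperation() {
        try (ToyEntityManager em = new ToyEntityManager()) {
            em.persist(new Object());
            return em.isOpen(); // true only while inside the block
        }
    }
}
```

With a real JPA provider the shape is the same, except the instance comes from an EntityManagerFactory and a transaction is begun and committed inside the block.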
I’m building a calendar/entry/statistics application using quite complex models with a large number of relationships between them.
In general I’m concerned about performance, and I’m considering different strategies and looking for input before implementing the application.
I’m completely new to DbContext pooling, so please excuse any stupid questions. Does DbContext pooling have anything to do with using multiple DbContext classes, or is it about improved performance regardless of whether there is a single DbContext or several?
I might end up implementing a large number of DbSets; should I avoid that? I’m considering creating multiple DbContext classes for simplicity, but will this reduce memory use and improve performance? Would it be better/smarter to split the application into smaller projects?
Is there any performance difference in using IEnumerable vs ICollection? I’m avoiding the use of lists as much as possible. Or would it be even better to use IAsyncEnumerable?
Most of your performance pain points will come from a complex architecture that you have not simplified. Managing a monolithic application results in lots of unnecessary 'compensating' logic when one use case treads on the toes of another and your relationships are deeply intertwined.
Optimisations such as whether to use Context Pooling or IEnumerable vs ICollection can come later. They should not affect the architecture of your solution.
If your project is as complex as you suggest, then I'd recommend you read up on Domain Driven Design and Microservices and break your application up into several projects (or groups of projects).
https://learn.microsoft.com/en-us/dotnet/architecture/microservices/microservice-ddd-cqrs-patterns/
Each project (or group of projects) will have its own DbContext to administer the entities within that project.
Further, each DbContext should start off by exposing only Aggregate Roots through its DbSets. This can mean more database activity than is strictly necessary for a particular use case, but it's best to start with a clean architecture and squeeze out that last ounce of performance (sometimes at the cost of architectural clarity) only if and when needed.
For example, if you want to add an attendee to an appointment, it can be appealing to attack the Attendee table directly. But to keep things clean, and considering that an attendee cannot exist without an appointment, you should make Appointment the aggregate root and expose only appointments as the entry point for the outside world. An appointment can be retrieved from the database with its attendees; then ask the appointment to add the attendees, and save the appointment graph by calling SaveChanges on the DbContext.
In summary, your Appointment is responsible for the functionality within its graph. You should ask Appointment to add an Attendee to the list instead of adding an Attendee to the Appointment's list of attendees. A subtle shift in thinking that can reduce complexity of your solution an awful lot.
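The appointment/attendee idea above can be sketched in plain Java (the original discussion is about EF entities, but the aggregate-root shift is language-agnostic; all class and method names here are invented for illustration):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Attendee is only reachable through Appointment, so invariants
// about attendees live in exactly one place: the aggregate root.
final class Attendee {
    final String name;
    Attendee(String name) { this.name = name; }
}

final class Appointment {
    private final List<Attendee> attendees = new ArrayList<>();

    // The outside world asks the aggregate to change itself...
    void addAttendee(String name) {
        // ...so a rule like "no duplicate attendees" is enforced here once.
        boolean alreadyThere = attendees.stream().anyMatch(a -> a.name.equals(name));
        if (!alreadyThere) attendees.add(new Attendee(name));
    }

    List<Attendee> attendees() { return Collections.unmodifiableList(attendees); }
}
```

Callers never manipulate the attendee list directly; they load the appointment, call addAttendee, and save the whole graph.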
The art is deciding where the boundaries between microservices/contexts should lie. There can be pros and cons to two different architectures, with no clear winner.
To your other questions:
DbContext Pooling is about maintaining a pool of ready-to-go instantiated DbContexts. It saves the overhead of repeated DbContext instantiation. Probably not worth it, unless you have an awful lot of separate requests coming in and your profiling shows that this is a pain point.
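What pooling buys can be sketched with a toy generic pool in plain Java. ContextPool is invented for illustration and ignores concerns a real DbContext pool handles, such as resetting instance state and capping the pool size:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Minimal sketch of the pooling idea: instead of constructing a fresh
// context per request, rent a pre-built instance and hand it back when done.
final class ContextPool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    int created = 0; // exposed only so the example can show reuse

    ContextPool(Supplier<T> factory) { this.factory = factory; }

    synchronized T rent() {
        T context = idle.poll();
        if (context == null) {
            created++;               // only construct when the pool is empty
            context = factory.get();
        }
        return context;
    }

    synchronized void giveBack(T context) {
        idle.push(context); // a real pool would reset the context's state here
    }
}
```

The saving is exactly the skipped constructions, which is why it only matters under a high request rate.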
The number of DbSets required is alluded to above.
As for IEnumerable or ICollection or IList, it depends on what functionality you require. Here's a nice simple summary ... https://medium.com/developers-arena/ienumerable-vs-icollection-vs-ilist-vs-iqueryable-in-c-2101351453db
Would it be better/smarter to split the application into smaller projects?
Yes, absolutely! Start with architectural clarity and then tweak for performance benefits where and when required. Don't start with performance as the goal (unless you're building a millisecond sensitive solution).
DbContext in EF Code First implements the Unit of Work and Repository patterns, as the MSDN site says:
A DbContext instance represents a combination of the Unit Of Work and Repository patterns such that it can be used to query from a database and group together changes that will then be written back to the store as a unit. DbContext is conceptually similar to ObjectContext.
Does this mean that layering further UoW and Repository abstractions (such as IRepository and IUnitOfWork) over DbContext is wrong?
In other words, does another abstraction layer over DbContext add any value to our code?
Value such as a technology-independent DAL (our domain would depend on IRepository and IUnitOfWork instead of DbContext).
Consider this: you currently have two strong ORMs, each with its pros and cons over the other:
Entity Framework
NHibernate
Additionally there are several more micro ORMs, such as:
Dapper
Massive
PetaPoco
...
And to make things even more complicated, there are clients / drivers for non-SQL databases such as:
C# driver for MongoDb
StackExchange Driver for Redis
...
And of course, one more thing that always has to be taken into consideration is whether there will be testing that includes mocking the data access layer.
The decision whether to use the UoW/Repository pattern should come from the project itself.
If your project is short-term, with a limited budget, and you are unlikely to use anything other than Entity Framework and SQL, then introducing a UoW/Repository layer of abstraction will just cost you additional, pointless development time that you could have spent on something else, or used to complete the project earlier and earn some extra cash.
However, if the project is long-running and involves a more complex development lifecycle that includes continuous testing, then the UoW/Repository pattern is a must. With the number of databases now in use and the NoSQL movement arriving heavily in the .NET ecosystem, nailing down the choice of ORM and database early might cause severe refactoring once you decide to scale out (e.g. scaling out with MongoDB is much cheaper than with SQL, so your client might suddenly ask you to move everything to MongoDB). As things are shifting constantly right now and new ideas are being implemented (such as combined graph+document databases), no one can make a good prediction about which database will be the best choice for your project a year from now.
There is no bool answer to this question.
This is just my point of view, and it comes from a developer who works on both short-term and long-running projects.
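The abstraction layer under discussion can be sketched in plain Java for illustration (the question is about C#, but the shape of the pattern is the same; all names below are invented, mirroring the IRepository/IUnitOfWork idea):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// The domain depends on these interfaces; an EF, NHibernate or
// MongoDB adapter would each implement them behind the scenes.
interface Repository<T> {
    void add(T entity);
    Optional<T> findFirst(Predicate<T> predicate);
}

interface UnitOfWork {
    void commit(); // push all pending changes to the store as one unit
}

// In-memory implementation that doubles as a test fake: nothing is
// visible to queries until the unit of work commits.
final class InMemoryRepository<T> implements Repository<T>, UnitOfWork {
    private final List<T> pending = new ArrayList<>();
    private final List<T> stored = new ArrayList<>();

    @Override public void add(T entity) { pending.add(entity); }

    @Override public Optional<T> findFirst(Predicate<T> predicate) {
        return stored.stream().filter(predicate).findFirst();
    }

    @Override public void commit() {
        stored.addAll(pending);
        pending.clear();
    }
}
```

This is the testing payoff the answer mentions: the in-memory fake replaces the ORM in unit tests without mocking frameworks.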
I'm refactoring my ASP.NET MVC 4 WebAPI project for performance optimization reasons.
Within my controller code, I'm searching for entities in a context (DbContext, EF6). There are a few thousand such entities; new ones are added on an hourly basis (i.e. "slowly"), they are rarely deleted (and I don't care if deleted entities are still found in the context's cache!), and they are never modified.
After reading the answers to this question, to this one and a few more discussions, I'm still not sure it's a bad idea to use a single static DbContext for the purpose described above - a DbContext which never updates the database.
Performance-wise, I'm not worried about the instantiation cost, but rather about losing the benefit of cached entities if the DbContext is created for each request. I'm also using second-level caching, which makes keeping the context alive even more relevant.
My questions are:
1. Regardless of the specific implementation, is a "static" DbContext a valid solution in my case?
2. If so, what would be the most appropriate way of implementing such a DbContext?
3. Should I periodically "flush" the context to clear its cache, to prevent it from growing too big?
DbContext caches entity instances when you get/query the data. It ensures different queries that return the same data map to the same entity (based on type and id). Otherwise, if you modify the same entity in different object instances, the context would not know which one has the correct data. Therefore a static DbContext would blow up over time until the process crashes.
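The caching behavior described above is an identity map, and a toy version in plain Java shows both halves of the trade-off (ToyContext is invented; a real DbContext or EntityManager does far more, but the growth characteristic is the same):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy identity map: repeated lookups for the same id return the SAME
// object instance, which is how a context keeps concurrent edits to
// one row consistent. The flip side: the map only ever grows until
// the context is cleared or discarded.
final class ToyContext<K, V> {
    private final Map<K, V> identityMap = new HashMap<>();

    V find(K id, Function<K, V> loadFromDb) {
        // First lookup loads; later lookups reuse the cached instance.
        return identityMap.computeIfAbsent(id, loadFromDb);
    }

    int trackedCount() { return identityMap.size(); }
}
```

A static, never-discarded context is this map growing for the life of the process, which is the blow-up the answer warns about.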
DbContexts should be short lived. Request.Properties is a good place to save it in Web API (maps to HttpContext.Items in IIS).
I have a requirement in which we create a single stateless bean which obtains a container-managed EntityManager instance (using @PersistenceContext) in an EJB3 environment.
Inside this single stateless bean, we create threads which are executed at specific time intervals. These threads would be running for months.
I have a doubt as to whether the single EntityManager instance obtained from the container (via container-managed persistence) can be used for the entire lifetime (> 1 year).
Regarding the lifetime of the EntityManager: I think it is more a question of the DB connection lifetime. When the JPA provider detects a connection timeout, if you configured your JDBC connection string with autoReconnect=true, you would expect another connection to be established. You should also look into configuring a large timeout.
On the other hand, you may be overlooking the fact that in EJB you are not allowed to spawn your own threads. In your case, you would run into problems with managed entities being changed in different threads, and with transactions. I would use the Timer Service instead.
An EntityManager is meant to represent a transactional space. To me, it doesn't make sense to use a single transactional space for the entire life of a long-lived thread, but how feasible this is depends on your design and provider implementation. If you are going to use a single EM, make sure it is not shared between threads, and monitor its resource usage: JPA requires EMs to cache every entity read through them as managed instances, so you might want to occasionally call em.clear() at logical points to detach managed instances so they can be garbage collected.
I don't think injection will work, as the container should tie the EntityManager to the life of the bean it is injected into, not the life of your thread. You will want to obtain the EntityManagerFactory and obtain/manage your own EntityManager lifecycles for your threads.
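The "one context per thread, never shared" advice can be sketched in plain Java with a ThreadLocal. The names are invented and StringBuilder merely stands in for a context instance you would really obtain from an EntityManagerFactory:

```java
// Each worker thread gets its own context instance; no instance is
// ever visible to two threads at once.
final class PerThreadContexts {
    private static final ThreadLocal<StringBuilder> CONTEXT =
            ThreadLocal.withInitial(StringBuilder::new); // stand-in for createEntityManager()

    static StringBuilder current() { return CONTEXT.get(); }

    static void release() { CONTEXT.remove(); } // analogous to em.close()
}
```

Calling release() at the end of each task keeps the per-thread instance from living for months.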
Considering this class
public class XQueries
{
    public IQueryable Query1()
    {
        using (XEntities context = new XEntities())
        {
            return something;
        }
    }

    public IQueryable Query2()
    {
        using (XEntities context = new XEntities())
        {
            return somethingElse;
        }
    }
}
Is a connection to the database created for every using (XEntities context = new XEntities()) { ... } block? If so, what is the correct way to create a static UnitOfWork class so that only one connection exists?
You can't create a static unit of work, because by definition a unit of work is a short lived object. Because the EF ObjectContext is designed around the unit of work pattern it is a bad idea to have a single ObjectContext instance during the life time of the application. There are several reasons for this.
First of all, the ObjectContext class is not thread-safe. This means that during the unit of work of one user (in a web app, for instance), another user can commit his unit of work. When they share the same ObjectContext, in that situation only half of the changes are persisted and the changes are not transactional. When you are lucky, the ObjectContext fails and throws an exception. When you are unlucky, you corrupt the ObjectContext, save and load garbage to and from your database, and only find out when your application is running in production (of course, during testing and staging everything always seems to work).
Second, the ObjectContext has a caching mechanism that is designed around it being short-lived. When an entity is retrieved from the database, it stays in the ObjectContext’s cache until that instance is garbage collected. When you keep that instance alive for a long period of time, entities go stale, especially if that particular ObjectContext instance is not the only one writing to that database.
The Entity Framework opens connections only when required, for example to execute a query or to call SaveChanges, and then closes the connection when the operation is complete.
From Martin Fowler’s book Patterns of Enterprise Application Architecture, with respect to Unit of Work:
When you're pulling data in and out of a database, it's important to keep track of what you've changed; otherwise, that data won't be written back into the database. Similarly you have to insert new objects you create and remove any objects you delete.

You can change the database with each change to your object model, but this can lead to lots of very small database calls, which ends up being very slow. Furthermore it requires you to have a transaction open for the whole interaction, which is impractical if you have a business transaction that spans multiple requests. The situation is even worse if you need to keep track of the objects you've read so you can avoid inconsistent reads.

A Unit of Work keeps track of everything you do during a business transaction that can affect the database. When you're done, it figures out everything that needs to be done to alter the database as a result of your work.
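The bookkeeping Fowler describes can be illustrated with a toy unit of work in plain Java (all names are invented): it records creations and deletions during the business transaction, then produces one batch of work at commit time instead of a database call per change.

```java
import java.util.ArrayList;
import java.util.List;

// Toy change tracker: register what happened, replay it as one batch.
final class TrackingUnitOfWork {
    private final List<String> newObjects = new ArrayList<>();
    private final List<String> removedObjects = new ArrayList<>();

    void registerNew(String id) { newObjects.add(id); }
    void registerRemoved(String id) { removedObjects.add(id); }

    // "When you're done, it figures out everything that needs to be done."
    List<String> commitPlan() {
        List<String> plan = new ArrayList<>();
        for (String id : newObjects) plan.add("INSERT " + id);
        for (String id : removedObjects) plan.add("DELETE " + id);
        return plan;
    }
}
```

An ORM context does the same thing with real change detection and SQL generation, but the "track now, write once" idea is identical.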
Whenever I use Entity Framework for a client (which I'll admit is rare), the ObjectContext object is the Unit of Work implementation for the system; that is, the ObjectContext will more or less satisfy the three statements above. Rather than concentrating too much on the absolutely correct definition, using the ObjectContext makes things a little easier for you.
Do some research on DI/IoC and the Repository pattern; this will give you more flexibility in handling your problem.