Do Entity Framework 6 async parallel queries use Multiple Active Result Sets for connection pooling?


Some of the queries EF generates for me are very large, which results in slow response times and high CPU use. As an optimization, I thought I'd try MARS and async parallel queries to pull back multiple, simpler result sets in parallel and combine them in memory.
That is, I'd like to do this:
// Generic helper: one short-lived context per call (ToListAsync needs using System.Data.Entity;)
public async Task<IEnumerable<TResult>> GetResult<TResult>() where TResult : class
{
    using (var context = new Context())
    {
        return await context.Set<TResult>().ToListAsync().ConfigureAwait(false);
    }
}

// Caller: start both queries, then wait for both to finish
var result1Task = GetResult<TResult1>();
var result2Task = GetResult<TResult2>();
await Task.WhenAll(result1Task, result2Task).ConfigureAwait(false);
IEnumerable<TResult1> result1 = result1Task.Result;
IEnumerable<TResult2> result2 = result2Task.Result;
But I'm not sure whether this takes advantage of connection pooling, since it creates a new DbContext for each task.
I found this article, but it doesn't use Entity Framework.
I found this one, which uses EF Core, and the strategy wasn't recommended.
And this one uses Entity Framework for .NET Framework, but its example calls a stored procedure; I just want to issue, say, three read queries in parallel, not call an SP.
Ideally, I'm looking for a way to get multiple result sets with LINQ generating the SQL (rather than strings like select Id, VendorName From Vendors....) and with the results mapped to a class automatically, without string-based column access (vendorID = (int)vendorReader["BusinessEntityID"];).
Is this possible or a pipe dream?

In ORMs, the need to run several queries at once is usually not met through parallelism in the application. It is not safe to access a single DbContext from multiple threads. Instead, a pattern known as future queries is used. For EF6, this is available in the third-party library https://www.nuget.org/packages/Z.EntityFramework.Plus.EF6/
The API is very simple: an extension method adds each query to an internal list instead of running it, until one of the queries is materialized (e.g. by calling ToList). At that point all the queued queries are sent to the server in a single batch, and their results are returned together.
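The batching behaviour described above can be modeled without a database. The C# sketch below is a toy: FutureBatch and Future<T> are hypothetical types, not the library's implementation, and a single pass over the queued delegates stands in for the single round trip. In Z.EntityFramework.Plus itself, each query is deferred by calling its Future() extension method on an IQueryable (for example ctx.Vendors.Where(v => v.IsActive).Future(), where ctx and Vendors are assumed names), and the batch is sent when the first of those futures is materialized.

```csharp
using System;
using System.Collections.Generic;

// Toy model of the "future queries" pattern (hypothetical types, no database).
// Add() only queues a query; the first ToList() on any future runs every
// queued query in one pass, which stands in for one batched round trip.
public sealed class FutureBatch
{
    private readonly List<Action> _pending = new List<Action>();

    // Number of simulated round trips to the server.
    public int RoundTrips { get; private set; }

    public Future<T> Add<T>(Func<IReadOnlyList<T>> query)
    {
        var future = new Future<T>(this);
        _pending.Add(() => future.SetResult(query()));
        return future;
    }

    internal void Flush()
    {
        if (_pending.Count == 0) return;
        RoundTrips++; // every pending query shares this one round trip
        foreach (var run in _pending) run();
        _pending.Clear();
    }
}

public sealed class Future<T>
{
    private readonly FutureBatch _batch;
    private IReadOnlyList<T> _result;
    private bool _hasResult;

    internal Future(FutureBatch batch) => _batch = batch;

    internal void SetResult(IReadOnlyList<T> result)
    {
        _result = result;
        _hasResult = true;
    }

    // Materializing any future flushes the whole batch, then returns this result.
    public IReadOnlyList<T> ToList()
    {
        if (!_hasResult) _batch.Flush();
        return _result;
    }
}
```

Nothing executes when the queries are added; the first ToList() call runs both, and the second ToList() is answered from the result already stored by that same flush.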


Problem with the concept of scope in Dependency injection when using EF [duplicate]

I have a problem with the concept of scope in dependency injection. I have registered my DbContext as scoped, and I save user activity to a table using an asynchronous method that I call without "await".
// In Startup:
services.AddScoped<IDbContext, StorageSystemDbContext>();
services.AddScoped<IUserActivityService, UserActivityService>();

// In UserActivityService:
public async void LogUserActivityAsync(string controllerName, string actionName, ActionType actionType = ActionType.View, string data = "", string description = "")
{
    await InsertAsync(new UserActivity
    {
        ControllerName = controllerName,
        ActionName = actionName,
        ActionType = actionType,
        CreatedDateTime = DateTime.Now,
        Description = description,
        UserId = (await _workContext.CurrentUserAsync())?.Id
    });
}

// In Controller (named argument, since the third positional parameter is actionType):
_userActivityService.LogUserActivityAsync(CurrentControllerName, CurrentActionName, data: data);
I get the following error when I call the same action twice in quick succession:
InvalidOperationException: A second operation was started on this context before a previous operation completed. This is usually caused by different threads concurrently using the same instance of DbContext. For more information on how to avoid threading issues with DbContext, see https://go.microsoft.com/fwlink/?linkid=2097913.
Given the scoped registration, I expected a new DbContext to be created for the second request, but according to this error, no new context was created and the second request reused the previous one.
What is the reason for this?
I'm using ASP.NET Core MVC and EF Core on .NET 5.

A DbContext that is constructor-injected into a service is a single reference for that service instance, regardless of the registration scope; every method you call on that service uses the same instance. With ASP.NET, AddScoped scopes the services (and the DbContext) to the web request. This is the recommended lifetime for a DbContext: every entity loaded during a request is tracked by the same DbContext instance, and that instance stays alive for the whole request (e.g. to provide lazy-loading support if needed). A transient registration would mean the DbContext passed to two different services would be distinct instances. That leads to problems when Service A calls another service to retrieve entities that it wants to associate with an entity it loaded and is trying to update: those entities are tracked by a different DbContext, resulting in errors or in issues like duplicate data being created.
Even with a transient DbContext you would still have exactly the same problem when running two calls from the same service in parallel, and there are many good reasons, referenced in the comments, not to use un-awaited async calls to do so. Even if your intention is to await multiple calls together, the only way to enable something like that is to scope the DbContext internally within the method call itself. This typically involves injecting a DbContextFactory-type class rather than a DbContext into the service, where the DbContextFactory is a dependency that can initialize and provide a new DbContext. Then:
using (var context = _contextFactory.Create())
{
    // operations with the DbContext (context)
}
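The failure in the question and the factory approach above can be reproduced without a database. The C# sketch below uses hypothetical FakeContext and FakeContextFactory types: FakeContext mimics the check behind the error in the question by throwing when a second operation starts before the first has finished. It contrasts an un-awaited call on a shared context with awaiting each call in turn and with one context per parallel operation. In a real EF Core 5 application, the factory role can be played by IDbContextFactory<TContext>, registered with services.AddDbContextFactory<TContext>(...) and used through CreateDbContext().

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a DbContext (no EF involved). Starting a second
// operation while one is still in flight throws InvalidOperationException.
public sealed class FakeContext : IDisposable
{
    private int _busy; // 0 = idle, 1 = an operation is in flight

    public async Task<int> SaveAsync(int value)
    {
        if (Interlocked.CompareExchange(ref _busy, 1, 0) != 0)
            throw new InvalidOperationException(
                "A second operation was started on this context before a previous operation completed.");
        try
        {
            await Task.Delay(50); // simulated database round trip
            return value;
        }
        finally
        {
            Volatile.Write(ref _busy, 0); // operation done; context is idle again
        }
    }

    public void Dispose() { } // nothing to release in this toy
}

// Hypothetical factory: hands out a fresh context for each unit of work.
public sealed class FakeContextFactory
{
    public FakeContext Create() => new FakeContext();
}

public static class ConcurrencyDemo
{
    // Fire-and-forget style: a second call starts while the first is still
    // running on the same shared context, so the second call fails.
    public static async Task<bool> SharedContextOverlapFails()
    {
        var shared = new FakeContext();
        Task<int> first = shared.SaveAsync(1); // not awaited yet, still in flight
        try
        {
            await shared.SaveAsync(2);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
        finally
        {
            await first; // let the first operation complete cleanly
        }
    }

    // Awaiting each call before starting the next avoids the overlap.
    public static async Task<int> SequentialAwaitSucceeds()
    {
        var shared = new FakeContext();
        int a = await shared.SaveAsync(1);
        int b = await shared.SaveAsync(2);
        return a + b;
    }

    // Running the calls in parallel with one context per operation also works.
    public static async Task<int> ContextPerOperationSucceeds(FakeContextFactory factory)
    {
        async Task<int> RunAsync(int value)
        {
            using (var context = factory.Create()) // scoped to this single operation
            {
                return await context.SaveAsync(value);
            }
        }

        int[] results = await Task.WhenAll(RunAsync(1), RunAsync(2));
        return results[0] + results[1];
    }
}
```

The toy check is deterministic, whereas EF Core's own detection is best-effort, so a real shared context may fail in less predictable ways under load.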
Even then, you need to consider database-level synchronization issues such as row and table locks and deadlocks, which could rear their heads if you have a significant number of operations happening in parallel. Keep in mind that a web server can be responding to a significant number of requests in parallel, each of which could kick off these processes at any time. (It works fine during development with one client, then crawls or dies in the real world.)

I found the answer here:
https://stackoverflow.com/a/44121808/4604557
If for some reason you want to run parallel database operations (and think you can avoid deadlocks, concurrency conflicts etc.), make sure each one has its own DbContext instance. Note however, that parallelization is mainly useful for CPU-bound processes, not IO-bound processes like database interaction. Maybe you can benefit from parallel independent read operations but I would certainly never execute parallel write processes. Apart from deadlocks etc. it also makes it much harder to run all operations in one transaction.

Is it possible to use a single transaction (in EF) with two different contexts pointing to different schemas?

I'm currently designing an application where I need to use two different database schemas (on the same instance): one as the application base, and the other to customize the application and its fields for each customer.
I've read about the Repository pattern, and as I understand it, it's possible to use two different contexts without losing efficiency. So I'm asking whether I can use a single database transaction across two schemas with Entity Framework, as I currently do directly on the database (SQL Server 2008-2012).
Sorry for my English, and thanks in advance!

If your connection strings are the same (which they will be in your case, since only the schema differs between the contexts), this approach works.
Basically, you will have two different contexts that connect to the database via the same connection string, each representing a different schema.
// requires using System.Transactions;
using (var scope = new TransactionScope()) {
    using (var contextSO = new ContextSchemaOne()) {
        // Add, remove, change entities from context schema one
        contextSO.SaveChanges();
    }
    using (var contextST = new ContextSchemaTwo()) {
        // Add, remove, change entities from context schema two
        contextST.SaveChanges();
    }
    // Commit both sets of changes; disposing the scope without Complete() rolls back.
    scope.Complete();
}
I wasn't very successful in the past with this approach, and we switched to one context per database.
Further reading: Entity Framework: One Database, Multiple DbContexts. Is this a bad idea?
It may be worth reading about the Unit of Work pattern before making a decision.
You will have to do something like this: Preparing for multiple EF contexts on a unit of work - TransactionScope

Are navigation properties read each time they are accessed? (EF4.1)

I am using EF 4.1, with POCOs which are lazy loaded.
Some sample queries that I run:
var discountsCount = product.Discounts.Count();
var currentDiscountsCount = product.Discounts.Count(d=>d.IsCurrent);
var expiredDiscountsCount = product.Discounts.Count(d=>d.IsExpired);
What I'd like to know is whether my queries make sense or perform poorly:
Am I hitting the database each time, or will the results come from cached data in the DbContext?
Is it okay to access the navigation properties "from scratch" each time, as above, or should I be caching them and then performing more queries on them, for example:
var discounts = product.Discounts;
var current = discounts.Count(d=>d.IsCurrent);
var expired = discounts.Count(d=>d.IsExpired);
What about a complicated case like the one below? Does it pull the whole collection and then perform local operations on it, or does it construct a specialised SQL query, which would mean I cannot reuse the results to avoid hitting the database again?
var chained = discounts.OrderBy(d=>d.CreationDate).Where(d=>d.CreationDate < DateTime.Now).Count();
Thanks for the advice!
EDIT based on comments below
So once I access a navigation property (which is a collection), it will load the entire object graph. But what if I filter that collection using .Count(d=>d...), Select(d=>d...), Min(d=>d...), etc.? Does it load the entire graph as well, or only the final data?

product.Discounts (or any other navigation collection) isn't an IQueryable but only an IEnumerable. LINQ operations you perform on product.Discounts will never issue a query to the database, with one exception: with lazy loading, product.Discounts is loaded once from the database into memory. It is loaded completely, no matter which LINQ operation or filter you apply.
If you want to perform filters or any queries on navigation collections without loading the collection completely into memory you must not access the navigation collection but create a query through the context, for instance in your example:
var chained = context.Entry(product).Collection(p => p.Discounts).Query()
.Where(d => d.CreationDate < DateTime.Now).Count();
This would not load the Discounts collection of the product into memory but perform a query in the database and then return a single number as result. The next query of this kind would go to the database again.
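The IEnumerable versus IQueryable distinction comes down to which LINQ methods the compiler binds to, and that can be shown with plain System.Linq, no EF involved. In the hypothetical Product/Discount model below, Count on the ICollection<Discount> navigation property runs in memory through Enumerable, while the same filter on an IQueryable<Discount> is captured as an expression tree that a provider such as EF can translate to SQL. In EF itself, that IQueryable comes from a call like context.Entry(product).Collection(p => p.Discounts).Query() shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entities shaped like the question's model (no EF involved).
public sealed class Discount
{
    public bool IsCurrent { get; set; }
}

public sealed class Product
{
    // Navigation collections are typically declared as ICollection<T>,
    // so LINQ calls on them bind to System.Linq.Enumerable.
    public ICollection<Discount> Discounts { get; } = new List<Discount>();
}

public static class LinqBindingDemo
{
    // Enumerable.Count: a compiled delegate runs in memory over the loaded items.
    public static int CountCurrentInMemory(Product product) =>
        product.Discounts.Count(d => d.IsCurrent);

    // Queryable.Where: the lambda is captured as an expression tree that a
    // provider (EF's SQL translation, in a real app) can inspect and translate.
    public static string DescribeQuery(IQueryable<Discount> source) =>
        source.Where(d => d.IsCurrent).Expression.ToString();
}
```

Printing DescribeQuery shows the Where call and the d.IsCurrent lambda as data, which is exactly what a query provider receives instead of a compiled delegate.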

In your examples above, EF should populate the Discounts collection the first time you access it. The subsequent LINQ queries on the Discounts collection should then be performed in memory. This even includes the last, more complex expression.
You can also use the Include method to make sure the associated collection is loaded by the first query, for example .Include("Discounts").
If you're worried about performance, I would recommend using SQL Profiler to look at the SQL that is being executed.

Entity Framework: Calling 'Read' when DataReader is closed

I get this error intermittently when I pound my service with parallel asynchronous calls.
I understand that the reader is accessed when .ToList() is called on my EF query.
I would like to find out what is the best practice in constructing EF queries to avoid this, and similar problems.
My architecture is as follows:
My Entity Data Layer is a static class, with a static constructor, which instantiates my Entities (_myEntities). It also sets properties on my entities such as MergeOption.
This static class exposes public static methods which simply access the Entities.
public static List<SomeEntity> GetSomeEntity(Criteria c) { // return type assumed; the original omitted it
    ...
    var q = _myEntities.SomeEntity.Where(predicate);
    return q.ToList();
}
This has been working in production for some time, but the error above and the one here happen intermittently, especially under heavy load from clients.
I am also currently setting MultipleActiveResultSets=True in my connection string.

And that is the source of all your problems. Don't use a shared context, and don't use a shared context as a data cache or central data access object; that should be one of the main rules of EF. It is also the reason you need MARS (which settles our discussion from your previous question). When multiple clients execute queries on your shared context at the same time, it opens multiple DataReaders on the same DB connection.
I'm not sure why you get your current exception, but I'm sure you should redesign your data access approach. If you also modify data on the shared context, you must.

The issue may come from a timeout when fetching a huge amount of data from your database, so try setting the command timeout (in seconds) in your code as below:
EF 5 (through the underlying ObjectContext):
((IObjectContextAdapter)this.context).ObjectContext.CommandTimeout = 1800;
EF 6:
this.context.Database.CommandTimeout = 1800;

Is there an in-memory provider for Entity Framework?

I am unit testing code written against the ADO.NET Entity Framework. I would like to populate an in-memory database with rows and make sure that my code retrieves them properly.
I can mock the Entity Framework using Rhino Mocks, but that would not be sufficient. I would be telling the query what entities to return to me. This would neither test the where clause nor the .Include() statements. I want to be sure that my where clause matches only the rows I intend, and no others. I want to be sure that I have asked for the entities that I need, and none that I don't.
For example:
class CustomerService
{
    ObjectQuery<Customer> _customerSource;

    public CustomerService(ObjectQuery<Customer> customerSource)
    {
        _customerSource = customerSource;
    }

    public Customer GetCustomerById(int customerId)
    {
        var customers = from c in _customerSource.Include("Order")
                        where c.CustomerID == customerId
                        select c;
        return customers.FirstOrDefault();
    }
}
If I mock the ObjectQuery to return a known customer populated with orders, how do I know that CustomerService has the right where clause and Include? I would rather insert some customer rows and some order rows, then assert that the right customer was selected and the orders are populated.

An InMemory provider is included in EF7 (pre-release).
You can use either the NuGet package, or read about it in the EF repo on GitHub.

The article http://www.codeproject.com/Articles/460175/Two-strategies-for-testing-Entity-Framework-Effort describes Effort, an Entity Framework provider that runs in memory.
You can still use your DbContext or ObjectContext classes within unit tests, without having to have an actual database.

A better approach here might be to use the Repository pattern to encapsulate your EF code. When testing your services you can use mocks or fakes. When testing your repositories you will want to hit the real DB to ensure that you are getting the results you expect.

There is currently no in-memory provider for EF, but if you take a look at Highway.Data, it has a base abstraction interface and an InMemoryDataContext.
Testing Data Access and EF with Highway.Data

Yes, there is at least one such provider: SQLite. I have used it a bit and it works. You can also try SQL Server Compact; it's an embedded database and has EF providers too.
Edit:
SQLite supports in-memory databases. All you need is a connection string like "Data Source=:memory:;Version=3;New=True;". If you need an example, you can look at SharpArchitecture.

I am not familiar with Entity Framework and the ObjectQuery class but if the Include method is virtual you can mock it like this:
// Arrange
var customerSourceStub = MockRepository.GenerateStub<ObjectQuery<Customer>>();
var customers = new Customer[]
{
    // Populate your customers as if they were coming from DB
};
customerSourceStub
    .Stub(x => x.Include("Order"))
    .Return(customers);
var sut = new CustomerService(customerSourceStub);

// Act
var actual = sut.GetCustomerById(5);

// Assert
Assert.IsNotNull(actual);
Assert.AreEqual(5, actual.CustomerID);

You could try SQL Server Compact but it has some quite wild limitations:
SQL Server Compact does not support SKIP expressions in paging queries when it is used with the Entity Framework
SQL Server Compact does not support entities with server-generated keys or values when it is used with the Entity Framework
No outer joins, collate, modulo on floats, aggregates

In EF Core there are two main options for doing this:
SQLite in-memory mode allows you to write efficient tests against a provider that behaves like a relational database.
The InMemory provider is a lightweight provider that has minimal dependencies, but does not always behave like a relational database.
I am using SQLite, and it supports all the queries I need to run against our Azure SQL production database.
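The SQLite in-memory option can be sketched as a test of the CustomerService scenario from the question. This assumes EF Core 3.0 or later with the Microsoft.EntityFrameworkCore.Sqlite package; ShopContext, Customer, Order and GetCustomerById are hypothetical names. The key points are that the in-memory database lives only while the SqliteConnection stays open, and that querying through a fresh context forces the Where and Include to run as SQL against real rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Data.Sqlite;
using Microsoft.EntityFrameworkCore;

// Hypothetical model loosely based on the CustomerService question.
public class Customer
{
    public int CustomerId { get; set; }
    public string Name { get; set; } = "";
    public List<Order> Orders { get; set; } = new List<Order>();
}

public class Order
{
    public int OrderId { get; set; }
    public int CustomerId { get; set; } // foreign key by convention
}

public class ShopContext : DbContext
{
    public ShopContext(DbContextOptions<ShopContext> options) : base(options) { }
    public DbSet<Customer> Customers => Set<Customer>();
}

public static class SqliteInMemoryDemo
{
    public static (string Name, int OrderCount)? GetCustomerById(int customerId)
    {
        // The in-memory database exists only while this connection stays open.
        using var connection = new SqliteConnection("DataSource=:memory:");
        connection.Open();
        var options = new DbContextOptionsBuilder<ShopContext>()
            .UseSqlite(connection)
            .Options;

        // Seed known rows through one context...
        using (var seed = new ShopContext(options))
        {
            seed.Database.EnsureCreated();
            seed.Customers.Add(new Customer { CustomerId = 5, Name = "Contoso", Orders = { new Order(), new Order() } });
            seed.Customers.Add(new Customer { CustomerId = 6, Name = "Fabrikam" });
            seed.SaveChanges();
        }

        // ...then query through a fresh one, so Where and Include are executed
        // by SQLite instead of being satisfied by already-tracked entities.
        using var context = new ShopContext(options);
        var customer = context.Customers
            .Include(c => c.Orders)
            .FirstOrDefault(c => c.CustomerId == customerId);
        if (customer == null) return null;
        return (customer.Name, customer.Orders.Count);
    }
}
```

Seeding through one context and querying through another is deliberate: in a single context, relationship fix-up on tracked entities could populate Orders even if the Include were missing, which would hide exactly the kind of bug the question wants to catch.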
