Optimize EF Core query with Include() - entity-framework

I have the following query in my project and it is taking a lot of time to execute. I am trying to optimize it, but have not been able to. Any suggestions would be highly appreciated.
_context.MainTable
    .Include(mt => mt.ChildTable1)
    .Include(mt => mt.ChildTable1.ChildTable2)
    .Include(mt => mt.ChildTable3)
    .Include(mt => mt.ChildTable3.ChildTable4)
    .SingleOrDefault(mt =>
        mt.ChildTable3.ChildTable4.Id == id
        && mt.ChildTable1.Operation == operation
        && mt.ChildTable1.Method == method
        && mt.StatusId == statusId);

Include() gets translated to a join, and you are using many joins in this query. You can optimize the indexes with the help of the database engine's execution plan.
I suggest you not apply all the Includes in one go. Instead, break the query up and apply the Includes one at a time: apply an Include, materialize the result, then apply the next, and so on. Having more than two Includes in a single query can hurt performance.
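In EF Core 5 and later, this kind of splitting can also be done for you with AsSplitQuery(), which issues one SQL statement per collection Include instead of a single large join. A sketch against the query from the question (entity and property names are taken from it, so treat them as placeholders):

```csharp
var result = _context.MainTable
    .Include(mt => mt.ChildTable1)
        .ThenInclude(ct1 => ct1.ChildTable2)
    .Include(mt => mt.ChildTable3)
        .ThenInclude(ct3 => ct3.ChildTable4)
    .AsSplitQuery() // EF Core 5+: one SELECT per Include instead of one big join
    .SingleOrDefault(mt =>
        mt.ChildTable3.ChildTable4.Id == id
        && mt.ChildTable1.Operation == operation
        && mt.ChildTable1.Method == method
        && mt.StatusId == statusId);
```

Split queries trade one large result set for multiple round trips, so measure both variants before committing to one.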

I don't see any major performance issues with your query itself.
Since you have a SingleOrDefault, I would look at optimizing the database call. If you have the analytics tools available, open SQL Server Management Studio and choose Tools > SQL Server Profiler. Paste the query into SQL Server Management Studio, select it, and choose "Analyze Query in Database Engine Tuning Advisor".

Related

Entity Framework Arithabort ON, but still query is slow

I have a simple query
var count = await _context.ExchangeRate.AsNoTracking().CountAsync(u => u.Currency == "GBP");
The table has only 3 columns and 10 rows of data.
When I try to execute the query from a .NET 5 project, it takes around 2.3 seconds the first time and 500ms (+/- 100) for subsequent requests. When I run the same request in SSMS it returns almost instantly (45ms as seen in SQL Profiler).
I have implemented ARITHABORT ON in EF from here
In SQL Profiler I can see that it is setting ARITHABORT ON, but the query still takes the same time for the first and subsequent requests.
How do I achieve the same speed as the SSMS query? I need the query to run really fast, as my project has a requirement to return the response within 1 second (I need to make at least 5 simple DB calls; if 1 call takes 500ms then it exceeds the 1-second requirement).
Edit
I even tried with ADO.NET. The execution time as seen in SQL Profiler is 40ms, whereas by the time it reaches the code it is almost 400ms. That is a big difference.
using (var conn = new SqlConnection(connectionString))
using (var cmd = conn.CreateCommand())
{
    var sql = "select count(ExchangeRate) as cnt from ExchangeRate where Currency = 'GBP'";
    cmd.CommandText = "SET ARITHABORT ON; " + sql;
    cmd.CommandType = CommandType.Text;
    conn.Open();

    var sw = Stopwatch.StartNew(); // Stopwatch is more accurate than DateTime.Now for timing
    using (var rd = cmd.ExecuteReader())
    {
        sw.Stop();
        Console.WriteLine((int)sw.ElapsedMilliseconds);
        while (rd.Read())
        {
            Console.WriteLine(rd["cnt"].ToString());
        }
    }
}
Your "first run" scenario is generally the one-off static initialization of the DbContext. This is where the DbContext works out its mappings for the first time, and it occurs when the first query is executed. The typical approach to avoid a user incurring this cost is a simple "warm up" query that runs when the service starts up. For instance, after your service initializes, simply run something like the following:
// Warm up the DbContext
using (var context = new AppDbContext())
{
var hasUser = context.Users.Any();
}
This also serves as a quick start-up check that the database is reachable and responding. The query itself will do a very quick operation, but the DbContext will resolve its mappings at this time so any newly generated DbContext instances will respond without incurring that cost during a request.
As for raw performance, if it isn't a query that is expected to take a while and tie up a request, don't make it async. Asynchronous requests are not faster, they are actually a bit slower. Using async requests against the DbContext is about ensuring your web server / application thread is responsive while potentially expensive database operations are processing. If you want a response as quickly as possible, use a synchronous call.
Next, ensure that any fields you are filtering against, in this case Currency, are indexed. Having a field called Currency in your entity as a String rather than a CurrencyId FK (int) pointing to a Currency record is already an extra indexing expense as indexes on integers are smaller/faster than those on strings.
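For example, with EF Core's fluent API an index on Currency can be declared in the model (the context and entity names here are assumptions based on the question):

```csharp
public class AppDbContext : DbContext
{
    public DbSet<ExchangeRate> ExchangeRate { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Index the column used in the WHERE clause so the COUNT can seek rather than scan
        modelBuilder.Entity<ExchangeRate>()
            .HasIndex(e => e.Currency);
    }
}
```

After adding this, generate a migration so the index is actually created in the database.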
You also don't need to bother with AsNoTracking when using a Count query. AsNoTracking applies only when you are returning entities (ToList/ToArray/Single/First, etc.), to avoid having the DbContext hold a reference to the returned entity. When you use Count/Any, or a projection that returns properties from entities using Select, there is no entity returned to track.
Also consider network latency between where your application code is running and the database server. Are they on the same machine, or is there a network connection in play? How does this compare to when you are performing the SSMS query? Using a profiler you can see what SQL EF is actually sending to the database. Everything else in terms of time is the cost of getting the request to the DB, getting the resulting data back to the requester, and parsing that response (and, in the case where you are returning entities, allocating, populating, and checking against existing tracked references).
Lastly, to ensure you are getting peak performance, keep your DbContexts' lifetimes short. If a DbContext is kept open and has had a number of tracking queries run against it (selecting entities without AsNoTracking), those tracked entity references accumulate and can have a negative performance impact on future queries, even if you use AsNoTracking, as EF checks through its tracked references for entities that might be applicable/related to your new queries. Many times I see developers assume DbContexts are "expensive", so they opt to instantiate them as little as possible to avoid those costs, only to end up making operations more expensive over time.
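A minimal sketch of keeping the context scoped to a single unit of work (AppDbContext is a placeholder name; the query mirrors the one in the question):

```csharp
// Create the context, do one unit of work, and dispose it so
// tracked entity references never accumulate across operations.
using (var context = new AppDbContext())
{
    var count = context.ExchangeRate.Count(u => u.Currency == "GBP");
    Console.WriteLine(count);
} // context disposed here; its change tracker goes with it
```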
With all that considered, EF will never be as fast as raw SQL. It is an ORM designed to provide convenience to .NET applications when it comes to working with data. That convenience of working with entity classes, rather than sanitizing and writing your own raw SQL every time, comes at a cost.

ADF Mapping Data Flow Source - can query hints (OPTION) be used?

I have a view using a CTE that exceeds the maximum recursion, so I need to select from it using the hint
OPTION (MAXRECURSION 3650)
Is that possible? I do not seem to be able to find any information on it, other than the fact that it is not working in the Source Query. Any documentation on what you can do as far as SQL queries would be greatly appreciated.
Error message:
at Source 'Calendar': shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near the keyword 'option'.
Source Query:
select * from dbo.ReportingCalendar option (maxrecursion 3650)
The above query is correct and runs on the SQL Server.
I referred to this documentation, but there is no information about the keyword 'option'. I also tested it with a data flow and got the same error as you, so it seems this keyword is not supported.
As an alternative, you can use a copy activity, which does support 'option'. You can copy the data from your SQL database to Azure Data Lake Gen2 (or somewhere else that data flow supports as a source), then use that as the source in your data flow and do your transformations.

Entity Framework Query execution timeout issue

I'm hitting the following issue when working with Entity Framework v6.1.3.
The query is:
var query = DataContext.ExternalPosts.Include(e => e.ExternalUser)
.Where(e => e.EventId == eventId)
.OrderByDescending(e => e.PublishedAt)
.Take(35);
When I do
query.ToList()
I get "The wait operation timed out" exception. But when I use query from
query.ToString()
and execute it directly on the server (via Management Studio), it takes about 150ms.
I have updated the CommandTimeout period to 180 and managed to get the result after 50sec via Entity Framework.
When I remove '.OrderByDescending' or '.Include' it works correctly for me; I didn't measure the time, but it runs quite fast.
Here are the statistics: http://grab.by/KsQ2
I use AzureDb.
UPDATE:
New day, new situation: today it works quite normally, with the same query and on the same set of data. Could this be an Azure services issue?
Any ideas?
This might help: msdn forum link

Entity Framework Related Entity too much load time

I'm using MVC5 along with EF6 to develop an application. I'm using SQL Server Express for database. I have two tables/entities in the database.
Vehicle - Contains Information about the vehicles
GsmDeviceLog - Contains data received from GPS Tracker Fit in the vehicle.
My GsmDeviceLogs table currently has around 20K records, and the code below takes around 90 seconds to execute just to fetch one record (i.e. the last record).
Here is the code:
var dlog = db.Vehicles.Find(2).GsmDeviceLogs.LastOrDefault();
When I try to open the table using Server Explorer it shows ALL the data within 5-10 seconds. Can anyone help me get the details loaded quickly on the page as well?
Thanks for reading and paying attention to the question.
Can anyone suggest any means to reduce the time.
Your query should look like this:
var dlog = db.Vehicles
.Where(v => v.Id == 2)
.SelectMany(v => v.GsmDeviceLogs)
.OrderByDescending(gdl => gdl.Id) // or order by some datetime
.FirstOrDefault();
In your original query you are loading the Vehicle with Find. Then accessing the GsmDeviceLogs collection loads all logs of that vehicle into memory by lazy loading and then you pick the last one from the loaded collection in memory. It's probably the loading of all logs that consumes too much time.
The query above is executed completely in the database and returns only one GsmDeviceLog record. Side note: You must use OrderByDescending(...).FirstOrDefault here because LastOrDefault is not supported with LINQ to Entities.
Your LINQ query is inefficient. Try doing db.Vehicles.Single(x => x.Id == 2).GsmDeviceLogs.OrderByDescending(x => x.Id).FirstOrDefault(), or whatever your primary keys are.

Does EF caching work differently for SQL Server CE 3.5?

I have been developing some single-user desktop apps using Entity Framework and SQL Server CE 3.5. I thought I had read somewhere that once records are in an EF cache for one context, if they are deleted using a different context they are not removed from the cache for the first context, even when a new query is executed. Hence, I've been writing really inefficient and obfuscatory code so I can dispose of the context and instantiate a new one whenever another method modifies the database using its own context.
I recently discovered some code where I had not re-instantiated the first context under these conditions, but it worked anyway. I wrote a simple test method to see what was going on:
using (UnitsDefinitionEntities context1 = new UnitsDefinitionEntities())
{
List<RealmDef> rdl1 = (from RealmDef rd in context1.RealmDefs
select rd).ToList();
RealmDef rd1 = RealmDef.CreateRealmDef(100, "TestRealm1", MeasurementSystem.Unknown, 0);
context1.RealmDefs.AddObject(rd1);
context1.SaveChanges();
int rd1ID = rd1.RealmID;
using (UnitsDefinitionEntities context2
= new UnitsDefinitionEntities())
{
RealmDef rd2 = (from RealmDef r in context2.RealmDefs
where r.RealmID == rd1ID select r).Single();
context2.RealmDefs.DeleteObject(rd2);
context2.SaveChanges();
rd2 = null;
}
rdl1 = (from RealmDef rd in context1.RealmDefs select rd).ToList();
Setting a breakpoint at the last line I was amazed to find that the added and deleted entity was in fact not returned by the second query on the first context!
I see several possible explanations:

1. I am totally mistaken in my understanding that the cached records are not removed upon requerying.
2. EF is capricious in its caching and it's a matter of luck.
3. Caching has changed in EF 4.1.
4. The issue does not arise when the two contexts are instantiated in the same process.
5. Caching works differently for SQL CE 3.5 than for other versions of SQL Server.
I suspect the answer may be one of the last two options. I would really rather not have to deal with all the hassle of constantly re-instantiating contexts for single-user desktop apps if I don't have to.
Can I rely on this discovered behavior for single-user desktop apps using SQL CE (3.5 and 4)?
When you run the 2nd query on the ObjectSet, it requeries the database, which is why it reflects the change made by your 2nd context. Before we go too far into this, are you sure you want to have two contexts like you're explaining? Contexts should be short-lived, so it might be better if you cache your list in memory or do something else of that nature.
That being said, you can access the local store by calling ObjectStateManager.GetObjectStateEntries and viewing what is in the store there. However, what you're probably looking for is the .Local storage that's provided by DbSets in EF 4.2 and beyond. See this blog post for more information about that.
Judging by your class names, it looks like you're using an edmx, so you'll need to make some changes to your file to have your context expose DbSets rather than ObjectSets. This post can show you how.
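To illustrate the DbSet .Local collection mentioned above, here is a sketch using the RealmDefs set from the question (this assumes the context has been switched to a DbContext exposing DbSet<RealmDef>, since .Local is not available on ObjectSet):

```csharp
using (var context = new UnitsDefinitionEntities())
{
    // Running a query populates the context's change tracker
    var realms = context.RealmDefs.ToList();

    // .Local exposes the tracked entities without querying the database again
    foreach (var realm in context.RealmDefs.Local)
    {
        Console.WriteLine(realm.Name);
    }
}
```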
Apparently Explanation #1 was closer to the fact. Inserting the following statement at the end of the example:
var cached = context1.ObjectStateManager.GetObjectStateEntries(System.Data.EntityState.Unchanged);
revealed that the record was in fact still in the cache. Mark Oreta was essentially correct in that the database is actually re-queried in the above example.
However, navigational properties apparently behave differently, e.g.:
RealmDef distance = (from RealmDef rd in context1.RealmDefs
where rd.Name == "Distance"
select rd).Single();
SystemDef metric = (from SystemDef sd in context1.SystemDefs
where sd.Name == "Metric"
select sd).Single();
RealmSystem rs1 = (from RealmSystem rs in distance.RealmSystems
where rs.SystemID == metric.SystemID
select rs).Single();
UnitDef ud1 = UnitDef.CreateUnitDef(distance.RealmID, metric.SystemID, 100, "testunit");
rs1.UnitDefs.Add(ud1);
context1.SaveChanges();
using (UnitsDefinitionEntities context2 = new UnitsDefinitionEntities())
{
UnitDef ud2 = (from UnitDef ud in context2.UnitDefs
where ud.Name == "testunit"
select ud).Single();
context2.UnitDefs.DeleteObject(ud2);
context2.SaveChanges();
}
var udList = (from UnitDef ud in rs1.UnitDefs select ud).ToList();
In this case, breaking after the last statement reveals that the last query returns the deleted entry from the cache. This was my source of confusion.
I think I now have a better understanding of what Julia Lerman meant by "Query the model, not the database." As I understand it, in the previous example I was querying the database. In this case I am querying the model. Querying the database in the previous situation happened to do what I wanted, whereas in the latter situation querying the model would not have the desired effect. (This is clearly a problem with my understanding, not with Julia's advice.)