I'm running this linq query which is a little big.
var events = _context.Event.OrderByDescending(e => e.StartDate).Where(e => e.IsPresentation == true).Where(e => e.IsCanceled == false).Where(e => e.StartDate > new DateTime());
And the page outputing the data from this query is taking too much time to load. Maybe because I'm using too many wheres.
I had the same issue using includes, and then includes, in a different query, but I divided the query, to improve the performance. But I'm trying to figure it out how to do the same thing in that situation, because I'm not using any include.
Overall, the performance of the query will largely depend on the table size, and availability of suitable indices.
A couple things I can note from that query:
This statement doesn't make a lot of sense: .Where(e => e.StartDate > new DateTime()). new DateTime() will initialize a DateTime from 01/01/0001. Any dates stored in an SQL Server DateTime column for example will be from 01/01/1753, so this seems rather moot. If the database/entity DateTime value is null-able, then .Where(e => e.StartDate.HasValue) would be more applicable. If the DateTime value is not null-able then this condition can be left off entirely.
So if the field is null-able, the Linq expression would look more like:
var events = _context.Event
.Where(e => e.IsPresentation && !e.IsCanceled && e.StartDate.HasValue)
.OrderByDescending(e => e.StartDate)
.ToList();
If it's not null-able:
var events = _context.Event
.Where(e => e.IsPresentation && !e.IsCanceled)
.OrderByDescending(e => e.StartDate)
.ToList();
The next culprit to eliminate: Lazy Load proxy hits. Does the Event property have navigation properties to any other entities? If this is something like a web application and you are serializing entities to send back to the client, navigation property EF proxies can absolutely kill performance. Any code after this call that touches a navigation property will result in extra individual DB calls to lazy load these navigation properties. For methods that return lists of entities this can be critical. If an Event has a reference to something like a User and you load 1000 events referencing roughly 1000 users, when a serializer goes to serialize those 1000 events, it will "touch" each of the user references. This leads to ~1000 extra SQL statements effectively doing SELECT * FROM tblUser WHERE UserId = 1, SELECT * FROM tblUser WHERE UserId = 2... etc. etc. for each User ID in each Event. If you need these related entities you can Eager load them with Include(e => e.User) which will be faster than loading them individually, but this does mean loading a lot of extra data into memory that your client/code may not need. You can avoid the lazy load hits by turning off lazy loading & proxies, but this will leave these entities with #null references which means any code expecting an Event entity with any related details may get one of these partially loaded entities. (not good, the entity should always be in a complete or complete-able state) The final option is to use Select to populate a view model for your results. This can speed up queries considerably because you can have EF compose a query to pull back just the data you need from whatever entities rather than everything or triggering lazy loads.
For example if you just need an EventId, EventNumber, Name, StartDate, and a UserName to display for the events:
var events = _context.Event
.Where(e => e.IsPresentation && !e.IsCanceled && e.StartDate.HasValue)
.Select(e => new EventViewModel
{
EventId = e.EventId,
EventNumber = e.EventNumber,
Name = e.Name,
StartDate = e.StartDate,
UserName = e.User.Name
})
.OrderByDescending(e => e.StartDate)
.ToList();
This avoids any lazy load hits and reduces the query run to just the columns needed which can speed up queries significantly.
Next would be to look at the queries EF is running and their relevant execution plan. This can highlight missing/poor indexes, and any unexpected lazy load hits. The method for doing this would depend on your database, but involves running a Profiler against the DB to capture the SQL statements being run while you debug. From here you can capture the SQL statements that EF generates then run those manually against your database with any DB-side analysis tools. (such as SSMS with SQL Server to get an execution plan which can identify missing indexes) Serializer lazy load hits in a web application are detectable as a lot of extra SQL statements executing after your method appears to have completed, but before the data gets back to the client. This is the serializer "touching" proxies resulting in lots of additional queries that the server has to wait to complete before the data is returned to the client.
Lastly would be data volume. Any system that is expected to grow over time should consider the amount of data that can eventually be returned. Anything that returns lists of data over time should incorporate server-side pagination where the client sends a page size and page # where the server can translate this into a .Skip(pageNumber * pageSize).Take(pageSize) operation. (/w page # starting at 0) Most data grid and like components should support server-side pagination to send these arguments to their data load methods. These controls will need to know the total row count to set up pagination so you would need a method to return that count:
var rowCount = _context.Event
.Where(e => e.IsPresentation && !e.IsCanceled && e.StartDate.HasValue)
.Count();
Same conditions, no order by, and a .Count() with no ToList() etc. will compose an efficient Count query.
This should give you a few things to check and tweak to eliminate your performance pitfalls.
You should store the value to one variable like var now = new DateTime()
Combine all of the conditions into one Where clause
Just OrderByDescending after Where clause, It means We just order based on actual data instead of all.
var now = new DateTime();
var events = _context.Event
.Where(e => e.IsPresentation && !e.IsCanceled && e.StartDate > now)
.OrderByDescending(e => e.StartDate);
Tips
You should re-arrange your condition based on actual data. For example:
.Where(e => 1 == 2 && 2 == 2 && 3 == 3)
As you can see, We no need to manipulate the rest of conditions from && 2 == 2 && 3 == 3 because of and condition.
one thing can be sorting at the end because there would be less items and then less time to sort,
BUT it really depends on your data distribution. Look if MOST of your data has e.IsPresentation == true, then the first "Where" does not reduce the data size for you, SO then you are again checking e.IsCanceled == false on e.g. 95 % your data. But assume only 10 % of your whole data is e.IsCanceled == false. So now if you apply e.IsPresentation == true , on that 10% in the second order, it takes much less time than before. So in big databases DB managers usually use different query plans! however the final result is the same. the process time is NOT same. hope it helps you.
Related
I'm using EF Core 3.0 code first with MSSQL database. I have big table that has ~5 million records. I have indexes on ProfileId, EventId and UnitId. This query takes ~25-30 seconds to execute. Is it normal or there is a way to optimize it?
await (from x in _dbContext.EventTable
where x.EventId == request.EventId
group x by new { x.ProfileId, x.UnitId } into grouped
select new
{
ProfileId = grouped.Key.ProfileId,
UnitId = grouped.Key.UnitId,
Sum = grouped.Sum(a => a.Count * a.Price)
}).AsNoTracking().ToListAsync();
I tried to loos through profileIds, adding another WHERE clause and removing ProfileId from grouping parameter, but it worked slower.
Capture the SQL being executed with a profiling tool (SSMS has one, or Express Profiler) then run that within SSMS /w execution plan enabled. This may highlight an indexing improvement. If the execution time in SSMS roughly correlates to what you're seeing in EF then the only real avenue of improvement will be hardware on the SQL box. You are running a query that will touch 5m rows any way you look at it.
Operations like this are not that uncommon, just not something that a user would expect to sit and wait for. This is more of a reporting-type request so when faced with requirements like this I would look at options to have users queue up a request where they can receive a notification when the operation completes to fetch the results. This would be set up to prevent users from repeatedly requesting updates ("not sure if I clicked" type spams) or also considerations to ensure too many requests from multiple users aren't kicked off simultaneously. Ideally this would be a candidate to run off a read-only reporting replica rather than the read-write production DB to avoid locks slowing/interfering with regular operations.
Try to remove ToListAsync(). Or replace it with AsQueryableAsync(). Add ToList slow performance down.
await (from x in _dbContext.EventTable
where x.EventId == request.EventId
group x by new { x.ProfileId, x.UnitId } into grouped
select new
{
ProfileId = grouped.Key.ProfileId,
UnitId = grouped.Key.UnitId,
Sum = grouped.Sum(a => a.Count * a.Price)
});
This is a very weird problem
In short
var q = (some query).Count();
Gives my a number and
var q = (some query).ToList().Count();
Gives me entirely different number...
with mentioning that (some query) has two includes (joins)
is there a sane explanation for that???
EDIT: here is my query
var q = db.membership_renewals.Include(i => i.member).Include(i => i.sport).Where(w => w.isDeleted == false).Count();
this gives me a wrong number
and this:
var q = db.membership_renewals.Include(i => i.member).Include(i => i.sport).Where(w => w.isDeleted == false).ToList().Count();
Gives me accurate number..
EDIT 2
Wher I wrote my query as linq query it worked perfectly...
var q1 = (from d in db.membership_renewals where d.isDeleted == false join m in db.members on d.mr_memberId equals m.m_id join s in db.sports on d.mr_sportId equals s.s_id select d.mr_id).Count();
I think the problem that entity framework doesn't execute the joins in the original query but forced to execute them in (ToList())...
I Finally figured out what's going on...
The database tables are not linked together in the database (there are no relationship or constraints defined in the database itself) so the code doesn't execute the (inner join) part.
However my classes on the other hand are well written so when I perform (ToList()) it automatically ignores the unbound rows...
And when I wrote the linq query defining the relation ship keys (primary and foreign) it worked alright because now the database understands my relation between tables...
Thanks everyone you've been great....
My guess is IQueryable gives a smaller number cause not all the objects are loaded, kind of like a stream in Java, but IQueryable.toList().count() forces the Iqueryable to load all the data and it is traversed by the list constructor and stored in the list so IQueryable.toList().Count() is the accurate answer. This is based on 5 minutes of search on MSDN.
The idea is the underlying datastore of the IQueryable is a database iterator so it executes differently every time because it executes the query again on the database, so if you call it twice against the same table, and the data has changed you get different results. This is called delayed execution. But when you say IQueryable.ToList() you force the iterator to do the whole iteration once and dump the results in a list which is constant
I have two almost identical queries. The only difference is the Where clause.
Following a solution rebuild, the first response time for both queries is 20 seconds. For all following requests:
.Where(x => x.EnquiryId == id); returns in < 1 second
.Where(x => ids.Contains(x.EnquiryId)); always takes 20 seconds, even with a single id in the collection
What am I doing wrong? How can I select on multiple ids without such an immense performance hit?
Bizarrely the following where clause also takes 20 seconds: .Where(x => x.EnquiryId == ids.FirstOrDefault());
edit: AzureSQL (live) and SQLExpress2017 (dev) on the backend. Query is slow on both live and my dev machine.
edit: In SQL Server Profiler I'm not sure what to look at, but for the two queries each has an RPC:Completed EventClass. One of these (I'm guessing the fast one) is 22 lines long. The other is nearly 7000 lines long. So I guess my next question is how can I advise EF to not create such awful SQL?
update: for anyone else who has this issue, I found I can bypass the bad SQL generation through the use of a union rather than a contains, ie instead of .Where(x => ids.Contains(x.EnquiryId)); use a loop foreach (var id in ids) { q = q.Union(query.Where(x => x.EnquiryId == id)); }
You could try to use Any (I did not profile this so I don't know if this is faster).
.Where(x => ids.Any(id => id == x.EnquiryId))
Let's say I have an Entity Framwork query
var query = db.Entities
.FancyQueryStuff()
.Where(GetFilter()) // *
.OrderBy(GetSort()) // *
.Take(GetNumberOfRows()) // *
;
and figure that this query is very slow. Testing reveals that the following rewrite is much faster:
var ids = db.Entities
.FancyQueryStuff()
.Where(GetFilter()) // *
.OrderBy(GetSort()) // *
.Take(GetNumberOfRows()) // *
.Select(x => x.Id)
.ToArray()
;
var query = db.Entries
.FancyQueryStuff()
.OrderBy(GetSort()) // *
.Where(x => ids.Contains(x.Id));
Whether that is quicker depends on a lot of things, including the sql database used, but I have a scenario in which this is the case with SQL Server and a particular query doing heavy joining.
Now the problem I have is that I want to use libraries that take IQueryables and apply Where, OrderBy, Take and Skip internally according to UI information the get from somewhere else (DevExpress/Telerik grids with paging, where the user clicks on captions to sort, etc.).
That means I have to write the query in a form where all the rows marked with an asterisk can be applied by a third-party framework.
With Devextreme, for example, you have a method that takes the query plus a data structure representing the filter/sorting/paging in a custom format and returns the query results you are supposed to pass to a client in an html application:
var result = DataSourceLoader.Load(query, loadOptions);
DataSourceLoader.Load applies everything of the kind I marked with an asterisk to the end of the query, executes it and returns the result.
I guess it's possible to do what I want with some heavy guns of linq magic (dynamic linq?), but before I try myself I thought maybe someone already has a snippet ready for this probably not too uncommon use case.
I have a single DbContext.. First I do:
var all = context.MySet.Where(c=>c.X == 1).ToList();
later (with the same context instance)
var special = context.MySet.Where(c=>(c.X == 1) && (c.Y===1).ToList();
The database is hit AGAIN! Since the first query is guaranteed
to return all of the elements that will exist in the second, why is the DB being hit again?
If you wish to avoid hitting the database again then you could try this;
var special = all.Where(c=>(c.X == 1) && (c.Y===1).ToList();
Since the list of all objects already contains everything you want you can just query that list and the database won't get hit again.
Your link expression is just a query, it only retrieves data when you enumerate it (for example calling .ToList()). You can keep changing the query and hold off actually getting the data until you need it. The entity framework will convert your query into an SQL query in the background and then fetch data.
Avoid writing "ToList()" at the end of every query as this forces the EF to hit the database.
If you only ever what to hit the database once then get the data you need by calling "ToList(), To.Array etc and then work with that collection (in your case the "all" collection) since this is the object holding all the data.