I have been trying to figure out how to optimize the following query for the past few days and just not having much luck. Right now my test db is returning about 300 records with very little nested data, but it's taking 4-5 seconds to run and the SQL being generated by LINQ is awfully long (too long to include here). Any suggestions would be very much appreciated.
To sum up this query, I'm trying to return a somewhat flattened "snapshot" of a client list with current status. A Party contains one or more Clients who have Roles (ASPNET Role Provider), Journal is returning the last 1 journal entry of all the clients in a Party, same goes for Task, and LastLoginDate, hence the OrderBy and FirstOrDefault functions.
Guid userID = 'some user ID'
var parties = Parties.Where(p => p.BrokerID == userID).Select(p => new
{
ID = p.ID,
Title = p.Title,
Goal = p.Goal,
Groups = p.Groups,
IsBuyer = p.Clients.Any(c => c.RolesInUser.Any(r => r.Role.LoweredName == "buyer")),
IsSeller = p.Clients.Any(c => c.RolesInUser.Any(r => r.Role.LoweredName == "seller")),
Journal = p.Clients.SelectMany(c => c.Journals).OrderByDescending(j => j.OccuredOn).Select(j=> new
{
ID = j.ID,
Title = j.Title,
OccurredOn = j.OccuredOn,
SubCatTitle = j.JournalSubcategory.Title
}).FirstOrDefault(),
LastLoginDate = p.Clients.OrderByDescending(c=>c.LastLoginDate).Select(c=>c.LastLoginDate).FirstOrDefault(),
MarketingPlanCount = p.Clients.SelectMany(c => c.MarketingPlans).Count(),
Task = p.Tasks.Where(t=>t.DueDate != null && t.DueDate > DateTime.Now).OrderBy(t=>t.DueDate).Select(t=> new
{
ID = t.TaskID,
DueDate = t.DueDate,
Title = t.Title
}).FirstOrDefault(),
Clients = p.Clients.Select(c => new
{
ID = c.ID,
FirstName = c.FirstName,
MiddleName = c.MiddleName,
LastName = c.LastName,
Email = c.Email,
LastLogin = c.LastLoginDate
})
}).OrderBy(p => p.Title).ToList()
I think posting the SQL could give us some clues, as small things like the order of OrderBy coming before or after the projection could make a big difference.
But regardless, try extracting the Clients in a seperate query, this will simplify your query probably. And then include other tables like Journal and Tasks before projecting and see how this affects your query:
//am not sure what the exact query would be, and project it using ToList()
var clients = GetClientsForParty();
var parties = Parties.Include("Journal").Include("Tasks")
.Where(p=>p.BrokerID == userID).Select( p => {
....
//then use the in-memory clients
IsBuyer = clients.Any(c => c.RolesInUser.Any(r => r.Role.LoweredName == "buyer")),
...
}
)
In all cases, install EF profiler and have a look at how your query is affected. EF can be quiet surprising. Something like putting OrderBy before the projection, the same for all these FirstOrDefault or SingleOrDefault, they can all have a big effect.
And go back to the basics, if you are searching on LoweredRoleName, then make sure it is indexed so that the query is fast (even though that could be useless since EF could end up not making use of the covering index since it is querying so many other columns).
Also, since this is query is to view data (you will not alter data), don't forget to turn off Entity tracking, that will give you some performance boost as well.
And last, don't forget that you could always write your SQL query directly and project to your a ViewModel rather than anonymous type (which I see as a good practice anyhow) so create a class called PartyViewModel that includes the flatten view you are after, and use it with your hand-crafted SQL
//use your optimized SQL query that you write or even call a stored procedure
db.Database.SQLQuery("select * from .... join .... on");
I am writing a blog post about these issues around EF. The post is still not finished, but all in all, just be patient, use some of these tricks and observe their effect (and measure it) and you will reach what you want.
Related
List list = new List();
I have a list of Guid. What is the best to check all guid exits or not using ef core table?
I am currently using the below code but the performance is very bad. assume user table as 1 million records.
for Example
public async Task<bool> IsIdListValid(IEnumerable<int> idList)
{
var validIds = await _context.User.Select(x => x.Id).ToListAync();
return idList.All(x => validIds.Contains(x));
}
The performance is bad because you are reading each row of the table into memory, and then iterating through it (ToList materializes the query.) Try using the Any() method to take advantage of the strength of the database. Use something like the following: bool exists = _context.User.Any(u => idList.Contains(u));. This should translate to an SQL IN clause.
Provided you assert that the # of IDs being sent in is kept reasonable, you could do the following:
var idCount = _context.User.Where(x => idList.Contains(x.Id)).Count();
return idCount == idList.Count;
This assumes that you are comparing on a unique constraint like the PK. We get a count of how many rows have a matching ID from the list, then compare that to the count of IDs sent.
If you're passing a large # of IDs, you would need to break the list up into reasonable sets as there are limits to what you can do with an IN clause and potential performance costs as well.
My sample code lines are,
var question = context.EXTests
.Include(i => i.EXTestSections.Where(t => t.Status != (int)Status.InActive))
.Include(i => i.EXTestQuestions)
.FirstOrDefault(p => p.Id == testId);
Here Include was not supporting Where Clause. How can I modify above code?
You have a sequence of ExTests. Every ExText has zero or more ExTestSections, Every Extest also has a property ExtestQuestions, which is probably also a sequence. Finally every ExTest is identified by an Id.
You want a query where you get the first ExTest that has Id equal to testId, inclusive all its ExTestQuestions and some ExTestSections. You want only those ExTestSections whith an InActive status.
Use Select instead of Using
One of the slower parts of database queries is the transfer of the data from the DBMS to your process. Hence it is wise to limit it to only the data you actually plan to use.
It seems that you have designed a one-to-many relation between ExTests and its ExTestSections: every ExTest has zero or more ExTestSections and every ExTestSection belongs to exactly one ExTest. In databases this is done by giving the ExTestSection a foreign key to the ExTest that it belongs to. It might be that you've designed a many-to-many relation. The principle remains the same.
If you ask an ExTest with its hundred ExTestSections, you get the Id of the the ExTest and hundred times the value of the foreign key of the ExTestSection, thus sending the same value 101 times. What a waste.
So if you query data from the database, only query for the data you actually plan to use.
Use Include if you plan to update the queried data, otherwise use Select
Back to your question
var result = myDbContext.EXTests
.Where(exTest => exTest.Id == testId)
.Select( exTest => new
{
// only select the properties you plan to use
Id = exTest.Id;
Name = exTest.Name,
Result = exText.Result,
... // other properties
ExTestSections = exTest.Sections
.Where(exTestSection => exTestSection.Status != (int)Status.InActive)
.Select(exTestSection => new
{
// again: select only those properties you actually plan to use
Id = exTestSection.Id,
// foreign key not needed, you know it equals ExTest primary key
// ExTestId = exTestSection.ExtTestId
... // other ExtestSection properties you plan to use
})
.ToList(),
ExTestQuestions = exTest.ExTestQuestions
.Select( ...) // only the properties you'll use
})
.FirstOrDefault();
I've transferred the test on equal TestId to a Where. This would allow you to omit the Id of the requested item: you know it will equal testId, so not meaningful to transfer it.
This is my real world example.
I have 4 tables:
Person
Plan
Coverage
CoveredMembers
Each person can have many plans, each of those plans can have many coverages. Each of those coverages can have many CoveredMembers.
I need a query that will apply a filter on Plan.PlanType == 1 and CoveredMembers.TermDate == null.
This query should bring back any person who has a medical type plan that is not terminated.
This SQL statement would do just that:
SELECT Person.*, Plans.*, Coverages.*, CoveredMembers.*
FROM Person P
INNER JOIN Plan PL ON P.PersonID = PL.PersonID
INNER JOIN Coverage C on PL.PlanID = C.PlanID
INNER JOIN CoveredMember CM on C.CoverageID = CM.CoverageID
WHERE CM.TermDate = NULL AND PL.PlanType = 1
I have figured out how to do this using anonymous types, but I sometimes need to update the data and save back to the database - and anonymous types are read only.
I was given a solution that did work using JOIN but it only brought back the persons (albeit filtered the way I needed). I can then loop through each person:
foreach (var person in persons) {
foreach (var plan in person.Plans{
//do stuff
}
}
But wouldn't that make a db call for each iteration of the loop? I have 500 persons with 3 unterminated medical plans each, so it would call the db 1500 times?
This is why I want to bring the whole data tree from Persons to CoveredMembers back in one shot. Is this not possible?
I believe this is accomplished in two parts:
Your query to determine the people you wish to have returned based on your criteria as discussed in this question previously: Entity framework. Need help filtering results
Properly setting the navigation properties for entities you want brought together to be eagerly loaded: http://msdn.microsoft.com/en-us/data/jj574232.aspx
For example if your Person entity looks like:
public class Person {
public List<Plan> Plans {get; set;}
...
}
When returning data from the dbcontext you can also use explicit eager loading with the include option:
var people = context.People
.Include(p => p.Plans)
.ToList();
....
If these are nested - coverage is part of plan, etc (which it looks like, it goes something like):
var people = context.People
.Include(p => p.Plans.Select(pl=>pl.Coverage).Select(c=>c.CoveredMembers)))
.ToList();
....
I am making some assumptions about your data model here, and my code above probably needs a little tweaking.
EDIT:
I might need someone else to weigh in here, but I don't think you can add the where clause into an include like that (my example above leads you that way a bit by putting the include on the context object, instead return an IQueryable with your conditions set as solved in your first post (without a ToList() called on it) and then use the code you wrote above without the Where clauses:
From first post (you supplied different criteria in this one, but same concept)
var q = from q1 in dbContext.Parent
join q2 in dbContext.Children
on q1.key equals q2.fkey
join q3 in ........
where q4.col1 == 3000
select q1;
Then:
List<Person> people = q.Include(p => p.Plans
.Select(pl => pl.Coverages)
.Select(c => c.CoveredMembers).ToList();
Again, doing this without being able to troubleshoot - I am sure it would take me a few attempts to iron this one out too.
I am working on a new project and we are using Entity Framework and the dev lead would like to use lambda queries whenever possible. One thing we are having a hard time figuring out is how to select two columns specifically. Also how to select distinct. We have a table that has multiple entries for a vendor but we want to just get a list of vendors and load to a dictionary object. It fails because as written it is trying to add a key value that has already been added. Take the following query.
Dictionary<int, string> dict = new Dictionary<int, string>();
dict = GetWamVendorInfo().AsEnumerable()
.Where(x => x.vendor_name != null && x.vendor_id != null)
//.Select(x => x.vendor_id).Distinct()
.Take(2)
.ToDictionary(o => int.Parse(o.vendor_id.ToString()), o => o.vendor_name);
What I would like to do is select just vendor_id and vendor_name so we can get just the distinct records.
Any help would be greatly appreciated.
Thanks,
Rhonda
Use an anonymous type:
// earlier bit of query
.Select(x => new { VendorId = x.vendor_id, VendorName = x.vendor_name } )
.Distinct()
.ToDictionary(o => o.VendorId, o => o.VendorName);
I've removed the call to Take(2) as it wasn't clear why you'd want it - and also removed the parsing of VendorId, which I would have expected to already be an integer type.
Note that you should almost certainly remove the AsEnumerable call from your query - currently you'll be fetching all the vendors and filtering with LINQ to Objects. There's also no point creating an empty dictionary and then ignoring it entirely. I suspect your complete query should be:
var vendors = GetWamVendorInfo()
.Select(x => new { VendorId = x.vendor_id,
VendorName = x.vendor_name } )
.Distinct()
.ToDictionary(o => o.VendorId,
o => o.VendorName);
As an aside, you should ask your dev lead why he wants to use lambda expressions (presumably as opposed to query expressions) everywhere. Different situations end up with more readable code using different syntax options - it's worth being flexible on this front.
Just use an anonymous object:
var vendors = GetWamVendorInfo().AsEnumerable()
.Where(x => x.vendor_name != null && x.vendor_id != null)
.Select(new {x.vendor_id, x.vendor_name})
.Take(2)
That's it. You can now work with vendors[0].vendor_id, vendors[0].vendor_name, and so on.
Say I have a table in a database called TestDB with a table called Table1 which contains two columns
ID
Name
In this table ,there is two row,
ID 1 and Name 'Row1'
ID 2 and Name 'Row2'
If this is my code
var currObj = Table1.Where(o => o.Name.Contains("Row1")).FirstOrDefault();
currObj.Name = "Row1a";
var a = Table1.Where(o => o.Name.Contains("Row1a")).FirstOrDefault();
var b = Table1.Local.Where(o => o.Name.Contains("Row1a")).FirstOrDefault();
a is going to return null, whereas b is going to return a value.
if I do this though
var c = Table1.Where(o => o.Name.Contains("Row2")).FirstOrDefault();
var d = Table1.Local.Where(o => o.Name.Contains("Row2")).FirstOrDefault();
then c will return something, but d will not.
For me, this seems unintuitive because there is two different places the data could be. For every query I do, I have to look in the database, and the Local object and merge them together. Like if it's changed in the local object, I have to take that into account. Does Entity Framework have any mechanism for this, which would consider BOTH the data in the database AND the local at the same time?
So if I went
var e = Table1.AMagicSolution.Where(o => o.Name.Contains("Row1a")).FirstOrDefault();
var f = Table1.AMagicSolution.Where(o => o.Name.Contains("Row2")).FirstOrDefault();
then e and f would both return something (f from the database and e from the Local object)
Your expectation is little bit strange. EF context works like a unit of work = it is used to process single logical transaction and because of that you should try to design your application in the way that it knows if the record is in the local storage or not.
If for any reason your application doesn't know if the record is in the local storage you should always separate queries to the local storage and to the database! Reasons are performance and unnecessary roundtrips to the database. Use helper like in following example to query the local storage first and only if record is not present in the local storage query the database:
public static Table1 AMagicSolution(this IQueryable<Table1> query, string name)
{
var item = Table1.Local.Where(t => t.Name.Contains(name)).FirstOrDefault();
if (item != null)
{
item = Table1.Where(t => t.Name.Contains(name)).FirstOrDefault();
}
return item;
}
Not much different from your other question.
Table1.ObjectStateManager.GetObjectStateEntries(EntityState.Added | EntityState.Unchanged)
.Where(o => o.GetType() == typeof(Table1Type)
&& o => o.Name.Contains("Row1a"))
.FirstOrDefault();
From the docs: "The EntityState is a bit field, so state entries for multiple states can be retrieved in one call by doing a bitwise OR of more than one EntityState values."