I am trying to run a GroupBy() query against the Northwind database. This is my code:
    using (var ctx = new TempContext())
    {
        var customer = (from s in ctx.Customers
                        group s by s.LastName into custByLN
                        select custByLN);
        foreach (var val in customer)
        {
            Console.WriteLine(val.Key);
            {
                foreach (var element in val)
                {
                    Console.WriteLine(element.LastName);
                }
            }
        }
    }
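It throws System.InvalidOperationException: 'Client side GroupBy is not supported'.
Apparently you are trying to make groups of Customers that have the same value for LastName. Grouping is a very common database action, so it is very rare for a database management system not to support it. Note that this exception is thrown by EF Core itself (version 3.0 and later) when it cannot translate a GroupBy into SQL, which typically happens when the query returns the groups themselves instead of an aggregate per group.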
To see if your database management system supports grouping, try the GroupBy using method syntax. End with ToList, to execute the GroupBy:
    var customerGroupsWithSameLastName = dbContext.Customers.GroupBy(
            // Parameter keySelector: make groups of Customers with same value for LastName:
            customer => customer.LastName)
        .ToList();
If this works, the DBMS that your DbContext communicates with accepts GroupBy.
The result is a List of groups. Every Group object implements IGrouping<string, Customer>, which means that every Group has a Key: the common LastName of all Customers in this group. The group IS (not HAS) a sequence of all Customers that have this LastName.
By the way: a more useful overload of GroupBy has an extra parameter: resultSelector. With the resultSelector you can influence the output: it is not a sequence of IGrouping objects, but a sequence of objects that you specify with a function.
This function has two input parameters: the common LastName, and all Customers with this LastName value. The return value of this function is one of the elements of your output sequence:
    var result = dbContext.Customers.GroupBy(
            customer => customer.LastName,
            // parameter resultSelector: take the lastName and all Customers with this LastName
            // to make one new object:
            (lastName, customersWithThisLastName) => new
            {
                LastName = lastName,
                Count = customersWithThisLastName.Count(),
                FirstNames = customersWithThisLastName.Select(customer => customer.FirstName)
                    .ToList(),
                ... // etc
            })
        .ToList();
Back to your question
If the above code showed you that the function is not supported by your DBMS, you can let your local process do the grouping:
    var result = dbContext.Customers
        // if possible: limit the number of customers that you fetch
        .Where(customer => ...)
        // if possible: limit the customer properties that you fetch
        .Select(customer => new {...})
        // Transfer the remaining data to your local process:
        .AsEnumerable()
        // Now your local process can do the GroupBy:
        .GroupBy(customer => customer.LastName)
        .ToList();
Since you selected the complete Customer, all Customer data would have been transferred anyway, so it is not a big loss to let your local process do the GroupBy, except that the DBMS is probably better optimized for grouping than your local process.
Warning: Database management systems are extremely optimized for selecting data. One of the slower parts of a database query is the transfer of the selected data from the DBMS to your local process. So if you have to use AsEnumerable(), you should realize that you will transfer all data that has been selected up to that point. Make sure that you don't transfer anything that you won't use after the AsEnumerable(). If you are only interested in the FirstName and LastName, don't transfer primary keys, foreign keys, addresses, etc. Let your DBMS do the Where and Select.
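The advice above (project first, then switch to local processing, then group) can be tried without a database. The following is a minimal sketch, assuming a hypothetical Customer class and using an in-memory IQueryable as a stand-in for dbContext.Customers; only FirstName and LastName come from the question, everything else is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; only FirstName and LastName appear in the question.
public sealed class Customer
{
    public int Id { get; set; }
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
    public string Address { get; set; } = "";
}

public static class CustomerGrouping
{
    // 'customers' stands in for dbContext.Customers. With a real DbContext,
    // everything before AsEnumerable() would be translated to SQL.
    public static Dictionary<string, List<string>> FirstNamesByLastName(
        IQueryable<Customer> customers)
    {
        return customers
            // Fetch only the two properties that are used afterwards.
            .Select(c => new { c.LastName, c.FirstName })
            // From here on, the work is done by the local process.
            .AsEnumerable()
            // Key: the common LastName; elements: the FirstNames in that group.
            .GroupBy(c => c.LastName, c => c.FirstName)
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

Calling CustomerGrouping.FirstNamesByLastName(list.AsQueryable()) on a small list is enough to see the shape of the result: one entry per LastName, each holding the first names of that group.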
List<Guid> list = new List<Guid>();
I have a list of Guids. What is the best way to check whether all of them exist in an EF Core table?
I am currently using the code below, but the performance is very bad. Assume the user table has 1 million records.
For example:
    public async Task<bool> IsIdListValid(IEnumerable<int> idList)
    {
        var validIds = await _context.User.Select(x => x.Id).ToListAsync();
        return idList.All(x => validIds.Contains(x));
    }
The performance is bad because you are reading every row of the table into memory and then iterating through it (ToList materializes the query). Try using the Any() method to take advantage of the strength of the database. Use something like the following: bool exists = _context.User.Any(u => idList.Contains(u.Id));. This should translate to an SQL IN clause. Note that Any only tells you whether at least one of the IDs exists, not whether all of them do.
Provided you ensure that the number of IDs being sent in is kept reasonable, you could do the following:
    var idCount = _context.User.Where(x => idList.Contains(x.Id)).Count();
    // idList is an IEnumerable<int>, so use the Count() extension method
    return idCount == idList.Count();
This assumes that you are comparing on a unique constraint like the PK. We get a count of how many rows have a matching ID from the list, then compare that to the count of IDs sent.
If you're passing a large number of IDs, you would need to break the list up into reasonable sets, as there are limits to what you can do with an IN clause and potential performance costs as well.
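Breaking the list up could look roughly like the sketch below. It is not tied to EF Core: the countMatching delegate is a hypothetical stand-in for a database call such as ids => _context.User.Count(u => ids.Contains(u.Id)), and the default batchSize of 1000 is an assumed, tunable value rather than a documented limit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IdListValidator
{
    // countMatching: returns how many of the given IDs exist in the table
    // (hypothetical stand-in for the real database query).
    public static bool AllIdsExist(
        IEnumerable<int> idList,
        Func<IReadOnlyList<int>, int> countMatching,
        int batchSize = 1000)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        // Duplicates would make the count comparison fail, so remove them first.
        List<int> distinctIds = idList.Distinct().ToList();

        for (int start = 0; start < distinctIds.Count; start += batchSize)
        {
            int size = Math.Min(batchSize, distinctIds.Count - start);
            List<int> batch = distinctIds.GetRange(start, size);

            // Stop at the first batch that contains a missing ID.
            if (countMatching(batch) != batch.Count)
                return false;
        }
        return true;
    }
}
```

As in the count comparison above, this only works when the compared column is unique, such as the primary key. An empty list is treated as valid, which matches the behaviour of All() on an empty sequence.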
My sample code is:
    var question = context.EXTests
        .Include(i => i.EXTestSections.Where(t => t.Status != (int)Status.InActive))
        .Include(i => i.EXTestQuestions)
        .FirstOrDefault(p => p.Id == testId);
Here Include does not support the Where clause. How can I modify the above code?
You have a sequence of ExTests. Every ExTest has zero or more ExTestSections. Every ExTest also has a property ExTestQuestions, which is probably also a sequence. Finally, every ExTest is identified by an Id.
You want a query where you get the first ExTest that has an Id equal to testId, including all its ExTestQuestions and some of its ExTestSections. You want only those ExTestSections whose status is not InActive.
Use Select instead of Include
One of the slower parts of database queries is the transfer of the data from the DBMS to your process. Hence it is wise to limit it to only the data you actually plan to use.
It seems that you have designed a one-to-many relation between ExTests and its ExTestSections: every ExTest has zero or more ExTestSections and every ExTestSection belongs to exactly one ExTest. In databases this is done by giving the ExTestSection a foreign key to the ExTest that it belongs to. It might be that you've designed a many-to-many relation. The principle remains the same.
If you ask for an ExTest with its hundred ExTestSections, you get the Id of the ExTest and a hundred times the value of the foreign key of the ExTestSection, thus sending the same value 101 times. What a waste.
So if you query data from the database, only query for the data you actually plan to use.
Use Include if you plan to update the queried data, otherwise use Select
Back to your question
    var result = myDbContext.EXTests
        .Where(exTest => exTest.Id == testId)
        .Select(exTest => new
        {
            // only select the properties you plan to use
            Id = exTest.Id,
            Name = exTest.Name,
            Result = exTest.Result,
            ... // other properties
            ExTestSections = exTest.EXTestSections
                .Where(exTestSection => exTestSection.Status != (int)Status.InActive)
                .Select(exTestSection => new
                {
                    // again: select only those properties you actually plan to use
                    Id = exTestSection.Id,
                    // foreign key not needed, you know it equals ExTest primary key
                    // ExTestId = exTestSection.ExTestId
                    ... // other ExTestSection properties you plan to use
                })
                .ToList(),
            ExTestQuestions = exTest.EXTestQuestions
                .Select( ...) // only the properties you'll use
        })
        .FirstOrDefault();
I've moved the test for equality with testId to a Where. This allows you to omit the Id of the requested item: you know it will equal testId, so there is no point in transferring it.
I'm developing a client app that uses breezejs and Entity Framework 6 on the back end. I've got a statement like this:
    var country = 'Mexico';
    var customers = EntityQuery.from('customers')
        .where('country', '==', country)
        .expand('order');
There may be hundreds of orders that each customer has made. For performance reasons, I only want to retrieve the latest order for each customer, based on the order's created date. In SQL, I could write something like this:
    SELECT c.customerId, companyName, ContactName, City, Country, max(o.OrderDate) as LatestOrder
    FROM Customers c
    inner join Orders o on c.CustomerID = o.CustomerID
    group by c.customerId, companyName, ContactName, City, Country
If this was run against the northwind database, only the most recent order row is returned for each customer.
How can I write a similar query in breeze, so that it runs on the server side and therefore returns less data to the client? I know I could handle this all on the client by writing some JavaScript in a querySucceeded method, but that's not the goal here.
thanks
For a case like this, you should create a special endpoint method that will perform your query.
Then you can use an Entity Framework query to do what you want, using the LINQ syntax.
Here are two Web API examples:
    [HttpGet]
    public IQueryable<Object> CustomersLatestOrderEntities()
    {
        // IQueryable<Object> containing Customer and Order entity
        var entities = ContextProvider.Context.Customers.Select(c => new
        {
            Customer = c,
            LatestOrder = c.Orders.OrderByDescending(o => o.OrderDate).FirstOrDefault()
        });
        return entities;
    }

    [HttpGet]
    public IQueryable<Object> CustomersLatestOrderProjections()
    {
        // IQueryable<Object> containing Customer and Order entity
        var entities = ContextProvider.Context.Customers.Select(c => new
        {
            Customer = c,
            LatestOrder = c.Orders.OrderByDescending(o => o.OrderDate).FirstOrDefault()
        });
        // IQueryable<Object> containing just data fields, no entities
        var projections = entities.Select(e => new
        {
            e.Customer.CustomerID,
            e.Customer.ContactName,
            e.LatestOrder.OrderDate
        });
        return projections;
    }
Note that you have a choice here. You can return actual entities, or you can return just some data fields. Which is right for you depends upon how you are going to use them on the client. If they are just for display in a non-editable list, you can just return the plain data (CustomersLatestOrderProjections above). If they can potentially be edited, then return the object containing the entities (CustomersLatestOrderEntities). Breeze will merge the entities into its cache, even though they are contained inside this anonymous object.
Either way, because it returns IQueryable, you can use the Breeze filtering syntax from the client to further qualify the query.
    var projectionQuery = breeze.EntityQuery.from("CustomersLatestOrderProjections")
        .skip(20)
        .take(10);
    var entityQuery = breeze.EntityQuery.from("CustomersLatestOrderEntities")
        .where('customer.countryName', 'startsWith', 'C')
        .take(10);
I have been trying to figure out how to optimize the following query for the past few days and just not having much luck. Right now my test db is returning about 300 records with very little nested data, but it's taking 4-5 seconds to run and the SQL being generated by LINQ is awfully long (too long to include here). Any suggestions would be very much appreciated.
To sum up this query, I'm trying to return a somewhat flattened "snapshot" of a client list with current status. A Party contains one or more Clients who have Roles (ASPNET Role Provider), Journal is returning the last 1 journal entry of all the clients in a Party, same goes for Task, and LastLoginDate, hence the OrderBy and FirstOrDefault functions.
    Guid userID = ...; // some user ID
    var parties = Parties.Where(p => p.BrokerID == userID).Select(p => new
    {
        ID = p.ID,
        Title = p.Title,
        Goal = p.Goal,
        Groups = p.Groups,
        IsBuyer = p.Clients.Any(c => c.RolesInUser.Any(r => r.Role.LoweredName == "buyer")),
        IsSeller = p.Clients.Any(c => c.RolesInUser.Any(r => r.Role.LoweredName == "seller")),
        Journal = p.Clients.SelectMany(c => c.Journals).OrderByDescending(j => j.OccuredOn).Select(j => new
        {
            ID = j.ID,
            Title = j.Title,
            OccurredOn = j.OccuredOn,
            SubCatTitle = j.JournalSubcategory.Title
        }).FirstOrDefault(),
        LastLoginDate = p.Clients.OrderByDescending(c => c.LastLoginDate).Select(c => c.LastLoginDate).FirstOrDefault(),
        MarketingPlanCount = p.Clients.SelectMany(c => c.MarketingPlans).Count(),
        Task = p.Tasks.Where(t => t.DueDate != null && t.DueDate > DateTime.Now).OrderBy(t => t.DueDate).Select(t => new
        {
            ID = t.TaskID,
            DueDate = t.DueDate,
            Title = t.Title
        }).FirstOrDefault(),
        Clients = p.Clients.Select(c => new
        {
            ID = c.ID,
            FirstName = c.FirstName,
            MiddleName = c.MiddleName,
            LastName = c.LastName,
            Email = c.Email,
            LastLogin = c.LastLoginDate
        })
    }).OrderBy(p => p.Title).ToList();
I think posting the SQL could give us some clues, as small things like the order of OrderBy coming before or after the projection could make a big difference.
But regardless, try extracting the Clients in a separate query, which will probably simplify your query. Then include other tables like Journal and Tasks before projecting and see how this affects your query:
    // I am not sure what the exact query would be; materialize it using ToList()
    var clients = GetClientsForParty();
    var parties = Parties.Include("Journal").Include("Tasks")
        .Where(p => p.BrokerID == userID)
        .Select(p => new
        {
            ....
            // then use the in-memory clients
            IsBuyer = clients.Any(c => c.RolesInUser.Any(r => r.Role.LoweredName == "buyer")),
            ...
        });
In all cases, install EF Profiler and have a look at how your query is affected. EF can be quite surprising. Things like putting OrderBy before the projection, or where you place all these FirstOrDefault or SingleOrDefault calls, can have a big effect.
And go back to the basics: if you are searching on LoweredRoleName, then make sure it is indexed so that the query is fast (even though that could be useless, since EF could end up not making use of the covering index because it is querying so many other columns).
Also, since this query is only used to view data (you will not alter data), don't forget to turn off entity tracking. That will give you some performance boost as well.
And last, don't forget that you could always write your SQL query directly and project to a ViewModel rather than an anonymous type (which I see as good practice anyhow). So create a class called PartyViewModel that contains the flattened view you are after, and use it with your hand-crafted SQL:
    // use your optimized SQL query that you write, or even call a stored procedure
    db.Database.SqlQuery<PartyViewModel>("select * from .... join .... on");
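A PartyViewModel for this could be sketched as follows. The property names mirror the scalar members of the anonymous type in the question; the CLR types are guesses, since the question does not show the entity definitions. The nested members (Journal, Task, Clients) are left out because a single flat result set has no obvious place for them.

```csharp
using System;

// Flattened, read-only view of a Party.
// Names taken from the question's projection; types are assumptions.
public sealed class PartyViewModel
{
    public int ID { get; set; }
    public string Title { get; set; } = "";
    public string Goal { get; set; } = "";
    public bool IsBuyer { get; set; }
    public bool IsSeller { get; set; }

    // Nullable because a party's clients may never have logged in.
    public DateTime? LastLoginDate { get; set; }

    public int MarketingPlanCount { get; set; }
}
```

When a class like this is used with a raw SQL query, the column names or aliases in the SELECT list need to match the property names so that each column can be mapped to its property.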
I am writing a blog post about these issues around EF. The post is still not finished, but all in all, just be patient, use some of these tricks and observe their effect (and measure it) and you will reach what you want.
My model contains an Order (parent object) and Shipments (child objects). The database tables for these already have a surrogate key as an auto-increment primary key.
I have a business rule that for each shipment in the order we need an auto-generated "counter" field, e.g. Shipment 1, Shipment 2, Shipment 3, etc. The Shipment model has the properties "ShipmentId", "OrderId" and "ShipmentNumber". My attempted implementation is to make ShipmentNumber an int and, in code (as opposed to in the database), query the Shipment collection and do max() + 1.
Here's a code snippet of what I'm doing.
    Shipment newShipmentObj = // blah;
    int? currentMaxId = myOrderObj.Shipments
        .Select(x => (int?) x.ShipmentNumber)
        .Max();
    if (currentMaxId.HasValue)
        newShipmentObj.ShipmentNumber = currentMaxId.Value + 1;
    else
        newShipmentObj.ShipmentNumber = 1; // 1st one
    myOrderObj.Shipments.Add(newShipmentObj);
    // etc.. rest of EF4 code
Is there a better way?
I don't really like this because of potential transaction/concurrency issues, which lead to the following problems.
My Order object also has a autoincrement "counter" -- e.g. Order 1, Order 2, Order 3, ... My Order model has properties: "OrderId", "CustomerId", "OrderNumber".
My design is that I have an OrderRepository but not a ShipmentRepository. Shipments can be queried off the Order.Shipments collection, but for Orders I have to query directly off the DbContext, e.g.
    int? currentMaxId = _myDbContext.Orders
        .Where(x => x.CustomerId == 123456)
        .Select(x => (int?)x.OrderNumber)
        .Max();
However, the above part doesn't work well if I attempt to add multiple objects to the DbContext without committing/saving changes to the database. (i.e. the .Where() returns null... and only works if I use DbContext ".Local", which is not what I want.)
Help! Not sure what the best solution would be. Thanks!
You seem to already have a ShipmentId that is incremental. You can use it for your shipment number, maybe combined with the current date as described here: How to implement gapless, user-friendly IDs in NHibernate? What you are trying to do with Max() is evil. Stay away from it, as under high load it can result in the same shipment number being assigned to multiple shipments.
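One possible way to build on the existing incremental ShipmentId, which is not spelled out in the answer and is only an assumption here, is to avoid storing the counter at all and derive "Shipment 1, 2, 3" at read time from the order of the database-generated ids. A minimal in-memory sketch, with a hypothetical Shipment class reduced to the two properties it needs:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Shipment
{
    public int ShipmentId { get; set; }   // database-generated, auto-increment
    public int OrderId { get; set; }
}

public static class ShipmentNumbering
{
    // Returns a map from ShipmentId to its 1-based position within its order,
    // where the position follows ascending ShipmentId.
    public static Dictionary<int, int> NumberWithinOrder(IEnumerable<Shipment> shipments)
    {
        return shipments
            .GroupBy(s => s.OrderId)
            .SelectMany(g => g
                .OrderBy(s => s.ShipmentId)
                .Select((s, index) => new { s.ShipmentId, Number = index + 1 }))
            .ToDictionary(x => x.ShipmentId, x => x.Number);
    }
}
```

Because the ids are assigned by the database, two concurrent inserts cannot end up with the same number. The trade-off is that the derived numbers shift when a shipment is deleted, so this only fits if the number is a display aid rather than a permanent reference.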