Given the following LINQ to Entities (EF4) query...
var documents =
from doc in context.Documents
.Include(d => d.Batch.FinancialPeriod)
.Include(d => d.Batch.Contractor)
.Include(d => d.GroupTypeHistory.Select(gth => gth.GroupType))
.Include(d => d.Items.Select(i => i.Versions))
.Include(d => d.Items.Select(i => i.Versions.Select(v => v.ProductPackPeriodic.ProductPack.Product.HomeDelivery)))
.Include(d => d.Items.Select(i => i.Versions.Select(v => v.ProductPackPeriodic.ProductPack.Product.Manufacturer)))
.Include(d => d.Items.Select(i => i.Versions.Select(v => v.ProductPackPeriodic.ProductPeriodic)))
.Include(d => d.Items.Select(i => i.Versions.Select(v => v.ProductPackPeriodic.SpecialContainerIndicator)))
.Include(d => d.Items.Select(i => i.Versions.Select(v => v.Endorsements.Select(e => e.Periodic))))
where doc.ID == this.documentID
select doc;
Document document = documents.FirstOrDefault();
Where ...
a Document will always have a Batch which in turn will always have a
financial period and a contractor
a Document will always have one or
more GroupTypeHistory, each of which will always have a GroupType
a Document will have zero or more Item's, which in turn will have one
or more Version's
a Version will have zero or one ProductPackPeriodic
a ProductPackPeriodic will always have one ProductPack and one
ProductPeriodic, along with zero or one SpecialContainerIndicator
a ProductPack will always have one Product
a Product will have zero or one Manufacturer, and zero or one HomeDelivery
a Version will have zero or more Endorsements, each of which will have one Periodic
The above LINQ query generates some of the worst TSQL that I have ever seen, with some of the related tables included multiple times (probably because they are referenced within the query multiple times) and takes significantly longer than I would like to run (the tables concerned can contain millions of rows, but this is not the cause).
I know that there has to be a better way to write it (taking into account all of the different reference types that I describe above) which will result in better TSQL, but every version that I try fails to return the data correctly.
Can anybody assist in pointing me to a better solution?
If it makes it any easier to understand, were I writing TSQL directly I would be looking at something like the following...
select *
from Document d
inner join Batch b
inner join FinancialPeriod fp on b.FinancialPeriodID = fp.FinancialPeriodID
inner join Contractor c on b.ContractorID = c.ContractorID
on d.BatchID = b.BatchID
inner join DocumentGroupType dgt on d.DocumentID = dgt.DocumentID
left join Item i
left join ItemVersion iv
left join ProductPackPeriodic ppp
inner join ProductPack pack
inner join Product p
left join Manufacturer m on p.ManufacturerID = m.ManufacturerID
left join HomeDeliveryProduct hdp on p.ProductID = hdp.ProductID
on pack.ProductID = p.ProductID
on ppp.ProductPackID = pack.ProductPackID
inner join ProductPeriodic pp on ppp.ProductPeriodicID = pp.ProductPeriodicID
left join SpecialContainerIndicator sci on ppp.SpecialContainerIndicatorCode = sci.SpecialContainerIndicatorCode
on iv.ProductPackPeriodicID = ppp.ProductPackPeriodicID
left join ItemVersionEndorsement ive
inner join EndorsementPeriodic ep on ive.EndorsementPeriodicID = ep.EndorsementPeriodicID
on iv.ItemVersionID = ive.ItemVersionID
on i.ItemID = iv.ItemID
on d.DocumentID = i.DocumentID
where d.DocumentID = 33 -- example value
This may also make the relationship requirements clearer.
Thanks in advance.
For scenarios like this i write specific stored procedures and then use EFExtensions with a custom materializer to not only get excellent performance but also correctly materialized entities.
I don't have any good answer for EF, but it might suit you to use a micro ORM for certain complex queries. Micro ORMs are essentially low-level wrappers over SQL, that allow you to get strongly typed objects, along with other convenience features. You can take a look at Dapper, for example, which is used by this very site for some of the bottleneck queries. It should perform very close to native SQL performance.
Related
I have an INCLUDE table that I want to check a couple of values, in the same row, using an IN clause. The below doesn't return the correct result set because it produces two EXISTS clauses with subqueries. This results in the 2 values being checked independently and not strictly in the same child row. (forgive any typos as I'm typing this in from printed code)
var db = new dbEntities();
IQueryable<dr> query = db.drs;
// filter the parent table
query = query.Where(p => DropDown1.KeyValue.ToString().Contains(p.system_id.ToString()));
// include the child table
query = query.Include(p => p.drs_versions);
// filter the child table using the other two dropdowns
query = query.Where(p => p.drs_versions.Any(c => DropDown2.KeyValue.ToString().Contains(c.version_id.ToString())) && c => DropDown3.KeyValue.ToString().Contains(c.status_id.ToString()));
// I tried removing the second c=> but received an error "'c' is inaccessible due to its protection level" error and couldn't find an clear answer to how this related to Entity Framework
// query = query.Where(p => p.drs_versions.Any(c => DropDown2.KeyValue.ToString().Contains(c.version_id.ToString())) && DropDown3.KeyValue.ToString().Contains(c.status_id.ToString()));
This is an example of the query the code above produces...
SELECT *
FROM drs d
LEFT OUTER JOIN drs_versions v ON d.dr_id = v.dr_id
WHERE d.system_id IN (9,8,3)
AND EXISTS (SELECT 1 AS C1
FROM drs_versions sub1
WHERE d.tr_id = sub1.tr_id
AND sub1.version_id IN (9, 4, 1))
AND EXISTS (SELECT 1 AS C1
FROM drs_versions sub2
WHERE d.tr_id = sub2.tr_id
AND sub2.status_id IN (12, 7))
This is the query I actually want:
SELECT *
FROM drs d
LEFT OUTER JOIN drs_versions v ON d.dr_id = v.dr_id
WHERE d.system_id IN (9, 8, 3)
AND v.version_id IN (9, 4, 1)
AND v.status_id IN (12, 7)
How do I get Entity Framework to create a query that will give me the desired result set?
Thank you for your help
I'd drop all of the .ToString() everywhere and format your values ahead of the query to make it a lot easier to follow.. If EF is generating SQL anything like what you transcribed, you are casting to String just to have EF revert it back to the appropriate type.
From that it just looks like your parenthesis are a bit out of place:
I'm also not sure how something like DropDown2.KeyBalue.ToString() resolves back to what I'd expect to be a collection of numbers based on your SQL examples... I've just substituted this with a method called getSelectedIds().
IEnumerable<int> versions = getSelectedIds(DropDown2);
IEnumerable<int> statuses = getSelectedIds(DropDown3);
query = query
.Where(p => p.drs_versions
.Any(c => versions.Contains(c.version_id)
&& statuses.Contains(c.status_id));
As a general bit of advice I suggest always looking to simplify the variables you want to use in a linq expression as much as possible ahead of time to keep the text inside the expression as simple to read as possible. (avoiding parenthesis as much as possible) Make liberal use of line breaks and indentation to organize what falls under what, and use the code highlighting to double-check your closing parenthesis that they are closing the opening you expect.
I don't think your first example actually was input correctly as it would result in a compile error as you cannot && c => ... within an Any() block. My guess would be that you have:
query = query.Where(p => p.drs_versions.Any(c => DropDown2.KeyValue.ToString().Contains(c.version_id.ToString())) && p.drs_versions.Any(c => DropDown3.KeyValue.ToString().Contains(c.status_id.ToString()));
Your issue is closing off the inner .Any()
query.Where(p => p.drs_versions.Any(c => DropDown2.KeyValue.Contains(c.version_id))
&& DropDown3.KeyValue.Contains(c.status_id)); //<-- "c" is still outside the single .Any() condition so invalid.
Even then I'm not sure this will fully explain the difference in queries or results. It sounds like you've tried typing across code rather than pasting the actual statements and captured EF queries. It may help to copy the exact statements from the code because it's pretty easy to mistype something when trying to simplify an example only to find out you've accidentally excluded the smoking gun for your issue.
I have this table structure:
Classic many to many relationship. I want to get all the orders for products belonging to the category for a small number of products I provide. It may be easier to show the SQL that does exactly what I want:
select o.*
from [Order] o join Product p2 on o.FKCatalogNumber=p2.CatalogNumber
where p2.FKCategoryId IN
(select c.Id
from Category c join Product p1 on p1.FKCategoryId=c.Id
where p1.CatalogNumber in ('0001', '0002')
This example gives me all the orders belonging to the categories that catalog #'s 0001 and 0002 are in.
But I am unable to wrap my head around the equivalent EF syntax for this query. I'm embarrassed to say I spent half the day on this. I bet it's easy for someone out there.
I came up with this but it's not working (and probably not even close):
string[] catNumbers = {"0001", "0002"};
var orders = ctx.Categories
.SelectMany(c => c.Products, (c, p) => new {c, p})
.Where(#t => catNumbers.Contains(#t.p.CatalogNumber))
.Select(#t => #t.p.Orders)
.ToList();
You can use query syntax (which looks very similar to SQL) in LINQ, so if you're more comfortable with SQL then you may prefer to write your query like this:
string[] catNumbers = {"0001", "0002"};
var orders = from o in ctx.Orders
join p2 in ctx.Products on o.FKCatalogNumber equals p2.CatalogNumber
where
(
from c in ctx.Categories
join p1 in ctx.Products on c.ID equals p1.FKCategoryId
where catNumbers.Contains(p1.CatalogNumber)
select c.ID
).Contains(p2.FKCategoryId)
select o;
As you can see, it's actually just your SQL query rearranged slightly, but it compiles as C#.
Note that:
the [Order] o syntax for referencing tables is replaced by o in ctx.Orders
LINQ enforces which way round you do the join condition so I had to flip your on o.FKCatalogNumber=p2.CatalogNumber to be on o.FKCatalogNumber equals p2.CatalogNumber
instead of your where p2.FKCategoryId IN (...), the equivalent c# is (...).Contains(p2.FKCategoryId)
the select comes last, not first
but those are the only major changes. Otherwise, it's written just like SQL.
I'd also draw your attention to a distinction regarding this comment:
the equivalent EF syntax for this query
The syntax here isn't specific to EF, but is just LINQ - Language Integrated Querying. It has two flavours: query syntax (sometimes called declarative) and method syntax (sometimes called fluent). LINQ works on just about any collection that implements IEnumerable or IQueryable, including EF's DbSet.
For more info on the different ways of querying, this MSDN page is a decent place to start. There's also this handy reference table showing the equivalent query syntax for each method-syntax operator, where applicable.
You can still nest queries in EF. The following looks like it works for me:
string[] catNumbers = {"0001", "0002"};
var orders = ctx.Orders
.Where(o => ctx.Products
.Where(p => catNumbers.Contains(p.CatalogNumber))
.Select(p => p.CategoryId)
.Contains(o.Product.CategoryId)
);
This produces the following SQL:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[CatalogNumber] AS [CatalogNumber]
FROM [dbo].[Orders] AS [Extent1]
WHERE EXISTS (SELECT
1 AS [C1]
FROM [dbo].[Products] AS [Extent2]
INNER JOIN [dbo].[Products] AS [Extent3] ON [Extent2].[CategoryId] = [Extent3].[CategoryId]
WHERE ([Extent1].[CatalogNumber] = [Extent3].[CatalogNumber]) AND ([Extent2].[CatalogNumber] IN (N'0001', N'0002'))
)
I have a few Tables I want to join together:
Users
UserRoles
WorkflowRoles
Role
Workflow
The equivalent sql I want to generate is something like
select * from Users u
inner join UserRoles ur on u.UserId = ur.UserId
inner join WorkflowRoles wr on wr.RoleId = ur.RoleId
inner join Workflow w on wr.WorkflowId = w.Id
where u.Id = x
I want to get all the workflows a user is part of based on their roles in one query. I've found that you can get the results like this:
user.Roles.SelectMany(r => r.Workflows)
but this generates a query for each role which is obviously less than ideal.
Is there a proper way to do this without having to resort to hacks like generating a view or writing straight sql?
You could try the following two queries:
This one is better readable, I think:
var workflows = context.Users
.Where(u => u.UserId == givenUserId)
.SelectMany(u => u.Roles.SelectMany(r => r.Workflows))
.Distinct()
.ToList();
(Distinct because a user could have two roles and these roles may contain the same workflow. Without Distinct duplicate workflows would be returned.)
But this one performs better, I believe:
var workflows = context.WorkFlows
.Where(w => w.Roles.Any(r => r.Users.Any(u => u.UserId == givenUserId)))
.ToList();
So it turns out the order which you select makes the difference:
user.Select(x => x.Roles.SelectMany(y => y.Workflows)).FirstOrDefault()
Have not had a chance to test this, but it should work:
Users.Include(user => user.UserRoles).Include(user => user.UserRole.WorkflowRoles.Workflow)
If the above is not correct then is it possible that you post your class structure?
Let's imagine I have an table called Foo with a primary key FooID and an integer non-unique column Bar. For some reason in a SQL query I have to join table Foo with itself multiple times, like this:
SELECT * FROM Foo f1 INNER JOIN Foo f2 ON f2.Bar = f1.Bar INNER JOIN Foo f3 ON f3.Bar = f1.Bar...
I have to achieve this via LINQ to Entities.
Doing
ObjectContext.Foos.Join(ObjectContext.Foos, a => a.Bar, b => b.Bar, (a, b) => new {a, b})
gives me LEFT OUTER JOIN in the resulting query and I need inner joins, this is very critical.
Of course, I might succeed if in edmx I added as many associations of Foo with itself as necessary and then used them in my code, Entity Framework would substitute correct inner join for each of the associations. The problem is that at design time I don't know how many joins I will need. OK, one workaround is to add as many of them as reasonable...
But, if nothing else, from theoretical point of view, is it at all possible to create inner joins via EF without explicitly defining the associations?
In LINQ to SQL there was a (somewhat bizarre) way to do this via GroupJoin, like this:
ObjectContext.Foos.GroupJoin(ObjectContext.Foos, a => a.Bar, b => b.Bar, (a, b) => new {a, b}).SelectMany(o = > o.b.DefaultIfEmpty(), (o, b) => new {o.a, b)
I've just tried it in EF, the trick does not work there. It still generates outer joins for me.
Any ideas?
In Linq to Entities, below is one way to do an inner join on mutiple instances of the same table:
using (ObjectContext ctx = new ObjectContext())
{
var result = from f1 in ctx.Foo
join f2 in ctx.Foo on f1.bar equals f2.bar
join f3 in ctx.Foo on f1.bar equals f3.bar
select ....;
}
I have this Sql statement
SELECT * FROM Game
INNER JOIN Series ON Series.Id = Game.SeriesId
INNER JOIN SeriesTeams ON SeriesTeams.SeriesId = Series.Id
INNER JOIN Team ON Team.Id = SeriesTeams.TeamId
INNER JOIN TeamPlayers ON TeamPlayers.TeamId = Team.Id
INNER JOIN Player ON Player.Id = TeamPlayers.PlayerId
WHERE AND Game.StartTime >= GETDATE()
AND Player.Id = 1
That I want to be converted into a lambda expression.
This is how it works.
A game can only be joined to 1 series, but a serie can of course have many games. A serie can have many teams and a team can join many series.
A player can play in many teams and a team has many players.
SeriesTeams and TeamPlayers are only the many-to-many tables created by EF to hold the references between series/teams and Teams/Players
Thanks in advance...
Edit: I use the EF 4 CTP5 and would like to have the answer as lambda functions, or in linq if that is easier...
Ok, first of all, if you want to make sure that everything is eager-loaded when you run your query, you should add an explicit Include:
context.
Games.
Include(g => g.Series.Teams.Select(t => t.Players)).
Where(g =>
g.StartTime >= DateTime.Now &&
g.Series.Teams.Any(t => t.Players.Any(p => p.Id == 1))).
ToList();
However, as I mentioned in my comment, this won't produce the same results as your SQL query, since you don't filter out the players from the child collection.
EF 4.1 has some nifty Applying filters when explicitly loading related entities features, but I couldn't get it to work for sub-sub-collections, so I think the closest you can get to your original query would be by projecting the results onto an anonymous object (or you can create a class for that if you need to pass this object around later on):
var query = context.
Games.
Where(g =>
g.StartTime >= DateTime.Now &&
g.Series.Teams.Any(t => t.Players.Any(p => p.Id == 1))).
Select(g => new
{
Game = g,
Players = g.
Series.
Teams.
SelectMany(t => t.
Players.
Where(p => p.Id == user.Id))
});
Then you can enumerate and inspect the results:
var gamesAndPlayersList = query.ToList();
I did found the solution.
IList<Domain.Model.Games> commingGames = this.Games
.Where(a => a.StartTime >= DateTime.Now && a.Series.Teams.Any(t => t.Players.Any(p => p.Id == user.Id))).ToList();
If somebody has a better solution then I am all ears..