Way to make EF include method generate inner joins? - entity-framework

Using EF4. I have a situation where a LINQ to Entities query with explicit joins won't work because of a many-to-many table. I'm not going to break the database design to suit EF, so I want to use the Include method. However, that always seems to generate left outer joins and I need inner joins (a simple context.Table1s.Include("Table2"), where the tables are 1-to-1 or 1-to-many, will demonstrate the problem).
Any way to force inner joins?

Unfortunately, there is no way to force EF to generate its queries in a particular way.
However, when you add the 3 entities you should only see 2 of them in the designer; the 3rd one becomes a "relation". So if you have, for example, the entities people, books and loans, you should only see people and books, and people should have a navigation property called Books, which returns all books related to that person (through loans).
And if you wish, you can do an .Include("Books") on that navigation property as well.

You can control it by writing better LINQ
I asked a very similar question and got a usable answer: it let me rewrite the query so that it translated to an inner join, which sped up query execution considerably. I ended up using the second solution in the accepted answer, because I didn't want to use eSQL. I don't like magic strings, and it would set me back 10 years.
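To make the difference the question is about concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than EF itself. The Table1/Table2 names follow the question; the columns and rows are made up for illustration. It only shows what the two join types return, not how to make EF emit either one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Table2 (Id INTEGER PRIMARY KEY,
                         Table1Id INTEGER REFERENCES Table1(Id),
                         Label TEXT);
    -- Hypothetical data: row 1 has two children, row 2 has none.
    INSERT INTO Table1 VALUES (1, 'has children'), (2, 'no children');
    INSERT INTO Table2 VALUES (10, 1, 'child a'), (11, 1, 'child b');
""")

# LEFT OUTER JOIN keeps every Table1 row, padding missing children with NULL.
left_rows = conn.execute("""
    SELECT t1.Id, t2.Id
    FROM Table1 AS t1
    LEFT OUTER JOIN Table2 AS t2 ON t2.Table1Id = t1.Id
    ORDER BY t1.Id, t2.Id
""").fetchall()

# INNER JOIN drops Table1 rows that have no matching Table2 row.
inner_rows = conn.execute("""
    SELECT t1.Id, t2.Id
    FROM Table1 AS t1
    INNER JOIN Table2 AS t2 ON t2.Table1Id = t1.Id
    ORDER BY t1.Id, t2.Id
""").fetchall()

print(left_rows)
print(inner_rows)
```

The only difference in the results is the childless parent: it appears (with a NULL child) in the left outer join and is absent from the inner join.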

Related

EF Behavior of Last and LastOrDefault

Recently I was improving my knowledge of EF, and to stay up to date I went through the whole path on Pluralsight. With that in mind I checked out a few courses by Julie Lerman, who is by the way a great teacher.
During the course there was a short recap of some of the built-in functionality of LINQ and how EF translates it into the query. I wonder whether this will be updated in a future release, and why it behaves like that.
From the Entity Framework Core: Getting Started course, quote: "Last methods require query to have an order by method otherwise will return full set then pick last in memory."
Thanks
First/Last methods both should always have an Order By clause to try and ensure the results are repeatable. There is really no good reason to use a Last method in an EF linq expression, just reverse the ordering and use a First method. The only possible reason would be situations where you want the First and Last item from a query expression.
You can learn a lot about what EF is doing by running a profiler and inspecting the generated SQL. Two reasons came to mind why EF would require an Order By clause to do the Last operation. The first was that SQL would typically approach a LAST-type scenario using something like a MAX(...) expression, which would require a column or columns to work on. The second was that it could reverse the existing Order By conditions. I thought it would be the first option, but looking at the generated SQL it is actually the second.
var test = context.Parents.OrderBy(x => x.Name).First();
SELECT TOP(1) [p].[ParentId], [p].[MasterParentId], [p].[Name]
FROM [Parents] AS [p]
ORDER BY [p].[Name]
go
var test = context.Parents.OrderBy(x => x.Name).Last();
SELECT TOP(1) [p].[ParentId], [p].[MasterParentId], [p].[Name]
FROM [Parents] AS [p]
ORDER BY [p].[Name] DESC
go
The MAX approach that you might use with SQL Server may be provider dependent and it might be problematic when working with multiple Order By expressions.
It is worth noting that EF's ability to support Linq methods is provider implementation specific so not all operations are supported by all providers or all versions of EF. For example, the Last/LastOrDefault methods are not supported in EF6, they expect you to reverse the Order By conditions and use First*.
The reason Last methods would need an OrderBy clause to avoid performing the operation in memory would likely be that EF would generate a query that compared a value against a MAX(...) operation. Without that, it cannot generate an SQL statement to get a last row and would have to load everything to enumerate over.
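The "reverse the ordering and take the first row" idea can be checked outside EF. The following is a rough sketch using Python's sqlite3 module, with a hypothetical Parents table and sample names. SQLite uses LIMIT 1 where SQL Server uses TOP(1), and the function names are mine, not anything EF exposes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parents (ParentId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    INSERT INTO Parents (Name) VALUES ('Charlie'), ('Alice'), ('Bob');
""")

def first_by_name(conn):
    # Equivalent of OrderBy(x => x.Name).First(): ascending order, one row.
    return conn.execute(
        "SELECT Name FROM Parents ORDER BY Name LIMIT 1").fetchone()[0]

def last_by_name(conn):
    # Equivalent of OrderBy(x => x.Name).Last(): same query, order reversed.
    return conn.execute(
        "SELECT Name FROM Parents ORDER BY Name DESC LIMIT 1").fetchone()[0]

def last_in_memory(conn):
    # What happens without a server-side translation: fetch every row,
    # then pick the last one on the client.
    rows = conn.execute("SELECT Name FROM Parents ORDER BY Name").fetchall()
    return rows[-1][0]

print(first_by_name(conn), last_by_name(conn), last_in_memory(conn))
```

Both ways of finding the last row agree, but the reversed query transfers a single row while the in-memory version transfers the whole table.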

What Methodology To Implement For Insert Statements In Entity Framework

When inserting a lot of interrelated nested objects, is it best practice to first create and persist the inner entities, so as to respect the foreign key relationships, and then move up the hierarchy? Or is it better to create all the objects with their inner objects and persist only the outer object? So far, I have found that when doing the latter, Entity Framework seems to figure out intelligently what to insert first with regard to the relationships. But is there a caveat that one should be aware of? The first method seems to me to be classical SQL logic, whereas the latter seems to suit the idea of Entity Framework more.
It's all about managing the risk of data corruption and ensuring that the database remains valuable.
When you persist the inner records first, you open yourself up to the possibility of bad data if the outer save fails. Depending on your application needs, orphaned inner records could be anywhere from a major problem to a minor annoyance.
When you save everything together, if a single record fails the entire save fails.
Of course, if the inner records are meaningful without the outer records (again, determined by your business needs) then you might be preventing the application from progressing.
In short: if the inner record is dependent on the outer, save together. If the inner is meaningful on its own, make the decision based on performance / readability.
Examples:
A House has meaning without a Homeowner. It would be acceptable to create a House on its own.
An HOA (Homeowners' Association) does not make sense without Homeowners. These should be created together.
Obviously, these examples are contrived and become trivial when using existing data. For the purpose of this question, we're assuming that both are being created at the same time.
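The trade-off between saving together and saving piece by piece can be sketched with plain transactions. This example uses Python's sqlite3 module instead of Entity Framework. The Hoa/Homeowner schema, the helper names, and the deliberately invalid NULL name are all assumptions chosen for illustration.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Hoa (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
        CREATE TABLE Homeowner (Id INTEGER PRIMARY KEY,
                                Name TEXT NOT NULL,
                                HoaId INTEGER NOT NULL REFERENCES Hoa(Id));
    """)
    return conn

def insert_hoa(conn, name):
    return conn.execute("INSERT INTO Hoa (Name) VALUES (?)", (name,)).lastrowid

def insert_owner(conn, name, hoa_id):
    conn.execute("INSERT INTO Homeowner (Name, HoaId) VALUES (?, ?)",
                 (name, hoa_id))

def save_together(conn, hoa_name, owner_names):
    # One transaction: commits if every insert succeeds, rolls back otherwise.
    with conn:
        hoa_id = insert_hoa(conn, hoa_name)
        for name in owner_names:
            insert_owner(conn, name, hoa_id)

def save_piecewise(conn, hoa_name, owner_names):
    # Each record is committed on its own, so a later failure leaves
    # the earlier records behind.
    hoa_id = insert_hoa(conn, hoa_name)
    conn.commit()
    for name in owner_names:
        insert_owner(conn, name, hoa_id)
        conn.commit()

def counts(conn):
    return tuple(conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
                 for t in ("Hoa", "Homeowner"))

owners = ["Ann", None]  # the second owner violates NOT NULL and fails

atomic = make_db()
try:
    save_together(atomic, "Oak Street HOA", owners)
except sqlite3.IntegrityError:
    pass
atomic_counts = counts(atomic)

piecewise = make_db()
try:
    save_piecewise(piecewise, "Oak Street HOA", owners)
except sqlite3.IntegrityError:
    pass
piecewise_counts = counts(piecewise)

print(atomic_counts, piecewise_counts)
```

After the failed save, the single-transaction database is empty, while the piecewise one is left with an HOA and one homeowner that were never meant to exist without the rest.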

find(1234) including relationships

I have a model "Events" (Zend_Db_Table_Abstract) that has various relationships to other models. Usually I would do something like this to find it and its relationships:
$events = new Events();
$event = $events->find($id)->current();
$eventsRelationship1 = $event->findDependentRowset('Relationship1');
As the relationship is already set up I am wondering if there's any sort of automatic join available or something. Every time I fetch my event I need to have all the relationships, too. Currently I see only two ways to achieve that:
Build the query myself, hard coded. Don't like this, because it's working around the already set up relationship and "model method convenience".
Fetch every related object with a single query. This one's ugly, too, as I have to trigger too many queries.
This becomes an even bigger issue when fetching a set of multiple rows. For a single event I may query the database multiple times, but when fetching 100 rows, joins are simply essential.
So, does anyone know a way to create joins by using those relationships or is there no other way than hardcoding the query?
Thanks in advance
Arne
The way to solve this challenge is to 'upgrade' your database access to use the dataMapper pattern.
You are essentially adding an extra layer between the models in your application and their representation in the db. This mapper layer allows you to read/write data from different tables, rather than having a direct link between one model and one table.
Here is a good tutorial to follow. (There are some bits you can skip - I left out all the getters and setters as it's just me using the code).
It takes a little while to get your head round the way it works, when you've just been using Zend_Db_Table_Abstract, but it is worth it.
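For readers who have not seen the pattern, here is a minimal sketch of a data mapper. It is written in Python with sqlite3 rather than PHP/Zend, and the events/attendees schema, the Event class and the EventMapper methods are all hypothetical. The point is only that the mapper, not the model, owns the SQL, so it can load an event and its related rows with one joined query.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class Event:
    # Plain model object: knows nothing about tables or SQL.
    id: int
    title: str
    attendees: list = field(default_factory=list)

class EventMapper:
    """Translates between Event objects and the underlying tables."""

    def __init__(self, conn):
        self.conn = conn

    def find_many(self, ids):
        ids = list(ids)
        if not ids:
            return []
        placeholders = ",".join("?" for _ in ids)
        # One joined query for all requested events and their attendees.
        rows = self.conn.execute(f"""
            SELECT e.id, e.title, a.name
            FROM events AS e
            LEFT JOIN attendees AS a ON a.event_id = e.id
            WHERE e.id IN ({placeholders})
            ORDER BY e.id, a.id
        """, ids).fetchall()
        events = {}
        for event_id, title, attendee in rows:
            event = events.setdefault(event_id, Event(event_id, title))
            if attendee is not None:
                event.attendees.append(attendee)
        return list(events.values())

    def find(self, event_id):
        found = self.find_many([event_id])
        return found[0] if found else None

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE attendees (id INTEGER PRIMARY KEY,
                            event_id INTEGER REFERENCES events(id),
                            name TEXT);
    INSERT INTO events VALUES (1, 'Launch'), (2, 'Retro');
    INSERT INTO attendees VALUES (1, 1, 'Arne'), (2, 1, 'Bea');
""")

mapper = EventMapper(conn)
launch = mapper.find(1)
print(launch)
```

Fetching 100 events through find_many still issues a single query, which is the case the question was most worried about.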

Need to understand IN, SELECT, VALUE, FROM, in Entity Framework

Working through this EF tutorial, I'm having difficulty understanding the meaning of the following markup:
<asp:EntityDataSource ID="CoursesEntityDataSource" runat="server"
...
Where="@PersonID IN (SELECT VALUE instructor.PersonID FROM it.People AS instructor)">
...
</asp:EntityDataSource>
So what is this, pure SQL or LINQ to Entities?
From what I understand,
it.People is the People object that comes out of the query
FROM and AS make sense from a pure SQL point of view
Taken together, they don't really make sense to me, and the tutorial didn't give much information.
Thanks for helping
This is Entity SQL (eSQL), which belongs to another part of Entity Framework called Object Services. It lets you perform the same object queries without using LINQ to Entities. Generally it's used when you want to stream query results through DataReaders, or in the very limited scenarios where LINQ to Entities can't give you what you need.
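As a rough analogy only, the filter reads like an ordinary SQL correlated subquery: "keep the current row (it) if the parameter value appears among that row's related people". The sketch below expresses that idea in plain SQL through Python's sqlite3 module. It is not Entity SQL, and the Course/CourseInstructor schema, the data and the function name are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Course (CourseID INTEGER PRIMARY KEY, Title TEXT);
    -- Hypothetical link table standing in for the it.People collection.
    CREATE TABLE CourseInstructor (CourseID INTEGER, PersonID INTEGER);
    INSERT INTO Course VALUES (1, 'Chemistry'), (2, 'Poetry'), (3, 'Calculus');
    INSERT INTO CourseInstructor VALUES (1, 7), (2, 8), (3, 7);
""")

def courses_taught_by(conn, person_id):
    # The outer alias c plays the role of "it"; the subquery plays the role
    # of SELECT VALUE instructor.PersonID FROM it.People AS instructor.
    rows = conn.execute("""
        SELECT c.Title
        FROM Course AS c
        WHERE :person_id IN (SELECT ci.PersonID
                             FROM CourseInstructor AS ci
                             WHERE ci.CourseID = c.CourseID)
        ORDER BY c.CourseID
    """, {"person_id": person_id}).fetchall()
    return [title for (title,) in rows]

print(courses_taught_by(conn, 7))
```

In the markup, the parameter value would come from the data source control's parameters rather than from a function argument.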

Challenge: Getting Linq-to-Entities to generate decent SQL without unnecessary joins

I recently came across a question in the Entity Framework forum on msdn:
http://social.msdn.microsoft.com/Forums/en-US/adodotnetentityframework/thread/bb72fae4-0709-48f2-8f85-31d0b6a85f68
The person who asked the question tried to do a relatively simple query, involving two tables, a grouping, order by, and an aggregation using Linq-to-Entities. A pretty straightforward Linq query, and straightforward to do in SQL as well - the kind of stuff people try to do every day.
However, when using Linq-to-Entities the outcome is a complex query with lots of unnecessary joins etc. I tried it and wasn't able to get Linq-to-Entities to generate a decent SQL query from it if using just pure Linq against the EF entities.
Having seen a fair share of monster queries from EF I thought maybe the OP (and me, and others) are doing something wrong. Maybe there is a better way to do this?
So here's my challenge: using the example from the EF forum and using just Linq-to-Entities against the two entities, is it possible to get EF to generate a SQL query without unnecessary joins and other complexities?
I'd like to see EF generate something a little bit closer to what Linq-to-SQL does for the same kind of queries, while still using Linq against an EF model.
Restrictions: use EFv1 .net 3.5 SP1 or EFv4 (beta 1 is part of the VS2010/.net4 beta available for download from Microsoft). No CSDL->SSDL mapping tricks, model 'definingqueries', stored procs, db-side functions, or views allowed. Just plain 1:1 mapping between the model and the db and a pure L2E query that does what the original thread on MSDN asked. An association must exist between the two entities (i.e. my "workaround #1" answer to the original thread is not a valid workaround)
Update: 500pt bounty added. Have fun.
Update: As mentioned above, a solution that uses EFv4 / .net 4 (β1 or later) is of course eligible for the bounty. If you're using .net 4 post β1, please include build number (e.g. 4.0.20605), the L2E query you used, and the SQL it generated and sent to the DB.
Update: This issue has been fixed in VS2010 / .net 4 beta 2. Although the generated SQL still has a couple of [relatively harmless] extra levels of nesting, it doesn't do any of the nutty stuff it used to. The final execution plan after SQL Server's optimizer has had a go at it is now as good as it can be. +++ for the dudes and dudettes responsible for the SQL generating part of EFv4...
If I was that worried about the crazy SQL, I just wouldn't do any of the grouping in the database. I would first query all of the data I needed by finishing it off with a ToList() while using the Include function to load all the data in a single select.
Here's my final result:
var list = from o in _entities.orderT.Include("personT")
                       .Where(p => p.personT.person_id == person_id &&
                                   p.personT.created >= fromTime &&
                                   p.personT.created <= toTime).ToList()
           group o by new { o.name, o.personT.created.Year, o.personT.created.Month, o.personT.created.Day } into g
           orderby g.Key.name
           select new { g.Key, count = g.Sum(x => x.price) };
This results in a much simpler select:
SELECT
1 AS [C1],
[Extent1].[order_id] AS [order_id],
[Extent1].[name] AS [name],
[Extent1].[created] AS [created],
[Extent1].[price] AS [price],
[Extent4].[person_id] AS [person_id],
[Extent4].[first_name] AS [first_name],
[Extent4].[last_name] AS [last_name],
[Extent4].[created] AS [created1]
FROM [dbo].[orderT] AS [Extent1]
LEFT OUTER JOIN [dbo].[personT] AS [Extent2] ON [Extent1].[person_id] = [Extent2].[person_id]
INNER JOIN [dbo].[personT] AS [Extent3] ON [Extent1].[person_id] = [Extent3].[person_id]
LEFT OUTER JOIN [dbo].[personT] AS [Extent4] ON [Extent1].[person_id] = [Extent4].[person_id]
WHERE ([Extent1].[person_id] = @p__linq__1) AND ([Extent2].[created] >= @p__linq__2) AND ([Extent3].[created] <= @p__linq__3)
Additionally, with the example data provided, SQL Profiler only notices a 3 ms increase in duration of the SQL call.
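The approach above boils down to "fetch the filtered rows with a simple query, then group and sum on the client". Here is a small sketch of the client-side half in Python. The rows are made-up stand-ins for what the simple SELECT would return, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Stand-in for the rows returned by the simple SELECT: (name, created, price).
rows = [
    ("book", datetime(2009, 6, 1, 9, 0), 10.0),
    ("book", datetime(2009, 6, 1, 17, 30), 5.0),
    ("pen", datetime(2009, 6, 2, 8, 0), 2.5),
    ("book", datetime(2009, 6, 2, 12, 0), 7.0),
]

def total_price_by_name_and_day(rows):
    # Group by (name, calendar day) and sum the prices, entirely in memory.
    totals = defaultdict(float)
    for name, created, price in rows:
        totals[(name, created.date())] += price
    # Sort by name, then day, so the output order is deterministic.
    return sorted(totals.items())

summary = total_price_by_name_and_day(rows)
for (name, day), total in summary:
    print(name, day, total)
```

The cost of this trade is that every matching row crosses the wire before being aggregated, which is fine for small result sets and increasingly expensive for large ones.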
Personally, I think that anyone that whines about not liking the output SQL of an ORM layer should go back to using Stored Procedures and Datasets. They simply aren't ready to evolve yet, and need to spend a few more years in the proverbial oven. :)
Interesting discussion. I have used 2 ORMs so far (NHibernate and LINQ-to-Entities). In my experience, there is always a point where you have to give up on ORM-generated SQL and fall back on stored procedures or views to get the most scalable queries. Having said that, I personally think that LINQ works better on more normalized databases, and all the nested queries/joins are not a major issue. There are some cases where, in order to increase performance or scalability, you have to use DB server features (indexed views on SQL Server 2008 Standard Edition, for example, work only with query hints) and you simply cannot use an ORM (except iBatis?).
Granted, you won't get the best performance or scalability from the nested joins/queries generated by LINQ, but please don't forget the advantages and development benefits that LINQ (or NHibernate) brings to any project. Surely there must be some merit to it.
Finally, although I risk comparing apples and oranges, isn't this more like asking: do you want rapid website development (asp.net webforms, swing) or more control over your HTML (asp.net mvc, RoR)? Pick the thing that best suits your requirements.
My 2 cents!
The SQL that LINQ generates is very efficient. It may look bulky, but it takes into account the relations between tables, constraints, etc. In my opinion you should just blindly use the LINQ commands and not worry about scale. There are benefits to the large queries, since they are generated automatically: this avoids slip-ups in relational constraints and adds its own wrappers for faults/exceptions.
If, however, you want to write the SQL yourself and still work within the confines of an ORM, then try iBatis (http://ibatis.apache.org/). You have to write the SQL and the joins yourself, so it gives you complete control over the backend model.
Personally, I'd just use SQLMetal and LINQ. Don't worry about performance and scale unless you need to.