I am trying to integrate an Entity Framework application with a legacy database that is about ten years old. One of its many problems (alongside having no relations or constraints whatsoever) is that almost every column is nullable, even though in almost all cases this makes no sense.
Invariably, I will encounter an exception along these lines:
The 'SortOrder' property on 'MyRecord' could not be set to a 'null' value. You must set this property to a non-null value of type 'Int32'.
I have seen many questions that refer to the exception above, but these all seem to be genuine mistakes where the developer did not write classes that properly represent the data in the database. I would like to deliberately write a class that does not properly represent the data in the database. I am fully aware that this is against the rules of Entity Framework, and that is most likely why I am having so much difficulty doing it.
It is not possible to change the schema at this point as it will break existing applications. It is also not possible to fix the data, because new data will be inserted by old applications. I would like to map the database with Entity Framework as it should be, slowly move all the applications over the next couple of years or so to rely on it for data access before finally being able to move on to the database redesign phase.
One method I have used to get around this is to transparently proxy the variable:
internal int? SortOrderInternal { get; set; }
public int SortOrder
{
    get { return this.SortOrderInternal ?? 0; }
    set { this.SortOrderInternal = value; }
}
I can then map the field in CodeFirst:
entity.Ignore(model => model.SortOrder);
entity.Property(model => model.SortOrderInternal).HasColumnName("SortOrder");
Using the internal keyword in this method does allow me to nicely encapsulate this nastiness so I can at the very least keep it from leaking outside my data access assembly.
But unfortunately I am now unable to use the proxy field in a query as a NotSupportedException will be thrown:
The specified type member 'SortOrder' is not supported in LINQ to Entities. Only initializers, entity members, and entity navigation properties are supported.
Perhaps it might be possible to transparently rewrite the expression once it is received by the DbSet? I would be interested to hear if this would even work; I'm not skilled enough with expression trees to say. I have so far been unsuccessful in finding a method in DbSet that I could override to manipulate the expression, but I'm not above making a new class that implements IDbSet and passes through to DbSet, horrible though that would be.
Whilst investigating the stack trace, I found a reference to an internal Entity Framework concept called a Shaper, which appears to be the thing that takes the data and feeds it into the entity classes. A quick bit of Googling on this concept doesn't yield anything, but investigating System.Data.Entity.dll with dotPeek indicates that this would certainly be something that would help me... assuming Shaper<T> wasn't internal and sealed. I'm almost certainly barking up the wrong tree here, but I'd be interested to hear if anyone has encountered this before.
That's a fairly tough nut to crack, but you might be able to do it via Microsoft.Linq.Translations.
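The expression-rewriting idea raised in the question can also be sketched with a plain ExpressionVisitor from System.Linq.Expressions, without any extra library. This is only a minimal sketch, not the Microsoft.Linq.Translations API: MyRecord is reduced to the two properties from the question (with SortOrderInternal made public so the sample is self-contained), and SortOrderRewriter and WhereRewritten are hypothetical names. It assumes the provider can translate the ?? operator on the mapped nullable property; the sample itself only exercises LINQ to Objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class MyRecord
{
    // Mapped to the nullable "SortOrder" column (public here only for the sketch).
    public int? SortOrderInternal { get; set; }

    // Ignored by the mapping; on its own it is only usable in memory.
    public int SortOrder
    {
        get { return SortOrderInternal ?? 0; }
        set { SortOrderInternal = value; }
    }
}

// Hypothetical visitor: replaces every read of MyRecord.SortOrder
// with the expression (SortOrderInternal ?? 0).
public sealed class SortOrderRewriter : ExpressionVisitor
{
    protected override Expression VisitMember(MemberExpression node)
    {
        if (node.Member.DeclaringType == typeof(MyRecord)
            && node.Member.Name == nameof(MyRecord.SortOrder))
        {
            Expression instance = Visit(node.Expression);
            return Expression.Coalesce(
                Expression.Property(instance, nameof(MyRecord.SortOrderInternal)),
                Expression.Constant(0));
        }
        return base.VisitMember(node);
    }
}

public static class RewriteExtensions
{
    // Hypothetical helper: rewrites the predicate before handing it to the provider.
    public static IQueryable<MyRecord> WhereRewritten(
        this IQueryable<MyRecord> source,
        Expression<Func<MyRecord, bool>> predicate)
    {
        var rewritten = (Expression<Func<MyRecord, bool>>)new SortOrderRewriter().Visit(predicate);
        return source.Where(rewritten);
    }
}
```

A call such as records.WhereRewritten(r => r.SortOrder == 0) then reaches the query provider with only SortOrderInternal in the tree. Covering Select, OrderBy and the other operators would need either one wrapper per operator or a custom IQueryProvider, which is the part a library like the one mentioned above takes care of.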
I am using EF6
I have a generic table which holds data for different types of class objects using the "Table Per Hierarchy" Approach. In addition these class objects use complex types for defining types for their properties.
So using a made up example,
Table = Person
"Mike the Teacher" is a "Teacher" instance of Person with a personType of "Teacher"
The "Teacher" instance has 2 properties, complextypePersonalDetails and complextypeAddress.
complextypePersonalDetails contains
First Name, Surname and Age.
complextypeAddress contains
HouseName, Street, Town, City, County.
I admit that this design may be over the top, and the problem may be of my own making, but that aside I wanted to check whether I could do any more with EF6 before I rewrite it.
I am performance profiling the code with JetBrains DotTrace.
On first call, say on
personTeacher = db.person.OfType<Teacher>().First()
I get a massive delay of around 150,000 ms, spent in:
SerializedGeneratedViewOfType (150,000ms)
TryGenerateQueryViewOfType
GenerateTypeSpecificQueryView
GenerateQueryViewForSingleExtent
GenerateQueryViewForExtentAndType
GenerateViewsForExtentAndType
GenerateViewComponents
EnsureExtentIsFullyMapped (90,000ms)
GenerateCaseStatements (60,000ms)
I have created a pregenerated View using the "InteractivePreGeneratedViews" nuget package which creates the SQL. However even with this I still need to incur my first hit. Also this hit seems to happen every time the Webserver/Website/AppPool is restarted.
I am not totally sure of the EF process, but I guess there is some further form of runtime compilation or caching that happens when the web app starts. Where could this be happening, and is there a proactive method that I could use to pregenerate/precompile/precache this problem away?
In the medium term, we will rewrite this code in Dapper or EF.Core. So for now, any thoughts on what can be done?
Thanks.
I had commented on this before and then retracted it, since I was only agreeing with "this design may be over the top, and the problem may be of my making". I thought I'd first see if anyone else jumped in.
The initial spin-up cost is due to EF needing to resolve the mapping for your schema. This happens once, the first time a DbSet on the context is accessed. You can mitigate this by executing a query on application start, e.g.:
void Application_Start(object sender, EventArgs e)
{
    // Initialization stuff...
    using (var context = new MyContext())
    {
        var result = context.MyTable.Any(); // Spin up will happen here, not when the first user attempts to access a query.
    }
}
You actually need to run a query for the DbContext to resolve the mapping; just new-ing one up won't do it.
For larger, or more complex schemas you can also look to utilize bounded contexts where each context maps a particular set of relationships for a specific area of the application. The less complex/comprehensive a context is, the faster it initializes.
As far as the design goes, TPH is for representing inheritance, which is where you need to establish an "is-a" relation between like entities. Relational models, and ORMs by definition can support this, but they're geared more towards "has-a" relationships. Rather than having a model where you go "is-a person with an address", the relation is best mapped out that a person may "have-an" address. I've worked on a system that was designed by a team of engineers where an entire reporting system with dynamic rules was represented by 6 tables. Honestly, those designs are a nightmare to maintain.
I don't know why OfType() is so slow, but I found a fast and easy workaround by replacing it with a cast; EntityFramework seems to support that just fine and without the performance penalty.
var stopwatch = Stopwatch.StartNew();
using (var db = new MyDbContext())
{
    // warm up
    Console.WriteLine(db.People.Count());
    Console.WriteLine($"{stopwatch.ElapsedMilliseconds} ms");
    // fast
    Console.WriteLine(db.People.Select(p => p as Teacher).Where(p => p != null).Count());
    Console.WriteLine($"{stopwatch.ElapsedMilliseconds} ms");
    // slow
    Console.WriteLine(db.People.OfType<Teacher>().Count());
    Console.WriteLine($"{stopwatch.ElapsedMilliseconds} ms");
}
20
3308 ms
2
3796 ms
2
10026 ms
This article provides some evidence that turning off AutoDetectChanges on your Entity Framework data context can provide a significant performance improvement when inserting large numbers of entities.
context.Configuration.AutoDetectChangesEnabled = false;
However, the DataContext provided by the SqlEntityConnection type provider doesn't seem to provide any way to control this setting.
There's no context.Configuration property, or context.DataContext.Configuration property. There is a context.DataContext.ContextOptions but it has nothing even resembling AutoDetectChangesEnabled.
The DataContext property on the type provider context is of type System.Data.Objects.ObjectContext. Does anyone know of a way to influence this particular setting from there?
I wrote a pretty similar article last year on detect changes performance, which you can find here: http://blog.staticvoid.co.nz/2012/5/7/entityframework_performance_and_autodetectchanges My experience is mostly with DbContext (which wraps ObjectContext), but I did a bit of a search and found the following:
Why is inserting entities in EF 4.1 so slow compared to ObjectContext?
What this says is that ObjectContext doesn't actually do automatic change detection, so this isn't something you should need to worry about. However, you still need to be aware that large object graphs will slow things down, because in all snapshot tracking scenarios detect changes is required at some point, and this involves a full enumeration of the object graph.
I have a very deep relational tree in my model design; that is, the root entity contains a collection of entities that contain collections of other entities that contain more collections, and so on. I develop a business layer that other developers have to use to perform operations, including getting and saving data.
So I am thinking about the best strategy to cope with this situation. I cannot let EF resolve the whole dependency tree when retrieving an entity, since it will end in a lot of useless JOINs (useless because I may not need the data in the next level).
If I disable lazy loading and enforce eager loading for what is needed, it works as expected. But if another developer calls child.Parent.Id instead of child.ParentId while doing something new (like a new requirement or feature not considered at the beginning), they will get a NullReferenceException if that dependency was not included, which is bad... but it will be a "fast error", and it could be fixed straight away.
If I enable lazy loading, accessing child.Parent.Id instead of child.ParentId will end in a standalone query to the DB each time it is accessed. It won't fail, but it is worse because there is no error, only a decrement in the performance, and all the code should be reviewed.
I am not happy with any of these two solutions.
I am not happy having entities that contains null or empty collections, when in reality, it is not true.
I am not happy with letting EF perform arbitrary queries to the DB at any moment. I would like to get all the information in one shot if possible.
So, I come up with several possible solutions that involve disabling lazy loading and enforcing eager loading, but not sure which is better:
I can create an EntityBase class that contains the data in the table without the collections, so they cannot be accessed, plus concrete implementations that contain the relationships. The problem is that you do not have much flexibility, since C# does not allow multiple inheritance.
I can create interfaces that "mask" the objects, hiding the properties that are not available at that method call. For example, if I have a User.Roles property, in order to show a grid with all users I do not need to resolve the .Roles property, so I could create an interface 'IUserData' that does not contain such a property.
But I do not know if this additional work is worth it; maybe a fast NullReferenceException indicating "This property has not been loaded" would be enough.
Would it be possible to throw a specific exception type if the property is virtual and it has not been overridden/set ?
What method do you use?
Thanks.
In my opinion you are trying to protect the developers from the need to understand what they are doing when they access data and what performance implications it can have - which might result in an unnecessary convoluted API with a lot of helper classes, base classes, interfaces, etc.
If a developer uses user.MiddleName.Trim() and MiddleName is null he gets a NullReferenceException and did something wrong, either didn't check for null or didn't make sure that the MiddleName is set to a value. The same when he accesses user.Roles and gets a NullReferenceException: He didn't check for null or didn't call the appropriate method of your API that loads the Roles of the user.
I would say: Explain how navigation properties work and that they have to be requested explicitly and let the application crash if a developer doesn't follow the rules. He needs to understand the mistake and fix it.
As a help you could make loading related data explicit somehow in the API, for example with methods like:
public User GetUser(int userId);
public User GetUserWithRoles(int userId);
Or:
public User GetUser(int userId, params Expression<Func<User,object>>[] includes);
which could be called with:
var userWithoutRoles = layer.GetUser(1);
var userWithRoles = layer.GetUser(2, u => u.Roles);
You could also leverage explicit loading instead of lazy loading to force the developers to call a method when they want to load a navigation property and not just access the property.
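If a plain NullReferenceException is considered too uninformative, the "specific exception" the question asks about could look roughly like the following. This is only a sketch under assumptions: UserDto, Role and NavigationNotLoadedException are hypothetical names, and the guard sits on a business-layer object built from the EF entity rather than on the mapped entity itself, because a throwing getter on a mapped navigation property may interfere with EF's own access to it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical exception type that names the property that was not requested.
public sealed class NavigationNotLoadedException : InvalidOperationException
{
    public NavigationNotLoadedException(string propertyName)
        : base("Navigation property '" + propertyName + "' has not been loaded.")
    {
    }
}

public class Role
{
    public string Name { get; set; }
}

// Hypothetical business-layer object returned by methods such as GetUser / GetUserWithRoles.
public class UserDto
{
    // null means "the caller did not ask for roles", not "the user has no roles".
    private readonly List<Role> _roles;

    public UserDto(int id, List<Role> roles = null)
    {
        Id = id;
        _roles = roles;
    }

    public int Id { get; }

    public IReadOnlyList<Role> Roles
    {
        get
        {
            if (_roles == null)
            {
                throw new NavigationNotLoadedException(nameof(Roles));
            }
            return _roles;
        }
    }
}
```

With this shape, new UserDto(1).Roles fails immediately with a message naming the property, while a user that was loaded with an empty role list still returns an empty collection, so "not loaded" and "empty" stay distinguishable.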
Two additional remarks:
...lazy loading ... will end in a standalone query to the DB each time it is accessed.
"...and not yet loaded" to complete this. If the navigation property has already been loaded within the same context, accessing the property again won't trigger a query to the database.
I would like to get all the information in one shot if possible.
Multiple queries do not necessarily result in worse performance than one query with a lot of Includes. In fact, complex eager loading can lead to data multiplication on the wire and make entity materialization very time consuming, slower than multiple lazy or explicit loading queries. (Here is an example where a query's performance has been improved by a factor of 50 by changing it from a single query with Includes to more than 1000 queries without Include.) The bottom line is: you cannot reliably predict the best loading strategy in a specific situation without measuring the performance (if the performance matters in that situation).
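To get a feel for the "data multiplication" effect, here is a rough back-of-the-envelope sketch. It rests on simplifying assumptions that are mine, not measurements: a three-level chain (root, children, grandchildren), one flat joined result set in which every row repeats the root and child columns next to the grandchild columns, and the same number of columns per entity. PayloadEstimate and its methods are hypothetical names, and the actual SQL EF generates may be shaped differently.

```csharp
public static class PayloadEstimate
{
    // Values sent over the wire when one joined result set carries
    // root + child + grandchild columns on every row.
    public static long JoinedCells(long roots, long childrenPerRoot, long grandchildrenPerChild, int columnsPerEntity)
    {
        long rows = roots * childrenPerRoot * grandchildrenPerChild;
        return rows * 3 * columnsPerEntity;
    }

    // Values sent when each level is fetched separately and every row
    // only carries the columns of its own entity.
    public static long SeparateCells(long roots, long childrenPerRoot, long grandchildrenPerChild, int columnsPerEntity)
    {
        long childRows = roots * childrenPerRoot;
        long grandchildRows = childRows * grandchildrenPerChild;
        return (roots + childRows + grandchildRows) * columnsPerEntity;
    }
}
```

For 10 roots with 10 children each, 10 grandchildren per child and 20 columns per entity, the joined shape carries 1,000 rows of 60 values (60,000 values), while the separate shape carries 1,110 rows of 20 values (22,200 values). The gap widens with wider parent entities and deeper trees, which is why measuring remains the only reliable guide.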
This question concerns using JPA to manage some data where some scenarios benefit from the full object model and others seem to be better implemented by a much flatter model. I'm therefore inclined to create two models. I get the feeling that this is not a good idea but I'm hard-pressed to see exactly why, or what the alternatives may be.
The basic scenario is that there is an entity, let's call it A, which is the many side of a relationship with entity B. So in the database A has a foreign key field, and in the full object model we see (simplified, getters/setters removed):
public class A {
    public int aKey;
    public B b;
    // more attributes
}
public class B {
    public int bKey;
    public List<A> collectionOfA;
    // and more
}
One particular scenario is handling the arrival into the system of new As. They come from some external source in the form of, say, text files. The insertion code needs to:
for each CSV record
    get the bKey from the record
    find the B, or manage any error
    create the A, setting the B
    persist
Now in fact my scenario is more complex, there are several such relationships, so that find/set pairing is repeated several times.
Alternatively I could (and in fact have) created a second mapping for the A table
public class Ainserter {
    public int aKey;
    public int bKey;
    // more attributes
}
Now I just set the two values and persist. This does assume that the DB will have the referential integrity constraints, but with the tooling I'm using that is the case. In this, and in many legacy systems the DB pre-exists and may be accessed from both the new JPA code and other even non-Java code. I therefore don't see a reason to put the referential integrity checking in the JPA code in such simple cases.
I can see that potentially there are opportunities for aspects of the full model to become stale with respect to my insertions, but in a legacy environment there could be insertions happening in the DB itself at any time. So I don't see a new problem here.
I can also see potential for confusion if the same Entity Context were used for both models, but that can be avoided by suitable encapsulation.
Any other thoughts?
Edit:
There is a suggestion from axtavt to use EntityManager.getReference(B.class, bkey) to get the B instance. My understanding is that if I do this then to be properly conforming with the JPA programming model I am supposed to set both sides of the relationship, hence I would need to visit the "referenced" B object and add my A into his collection.
Edited again:
I was concerned that visiting B would cause a database lookup, so in performance terms I would not get the win. I have it on very good authority that at least OpenJPA will in fact not need to "inflate" B if we only access B's key and the collection of As, and so getReference() is a good suggestion. It seems reasonable that a well-designed JPA implementation would have such optimisations.
JPA has an EntityManager.getReference() method, which basically combines the approaches you describe.
It gets primary key and returns a proxy object with that primary key without hitting the database. So, you can use that object to initialize the relationship field, exactly as you want to do in your second approach.
I'm getting
System.NotSupportedException: All objects in the EntitySet 'Entities.Message' must have unique primary keys. However, an instance of type 'Model.Message' and an instance of type 'Model.Comment' both have the same primary key value
but I have no idea what this means.
Using EF4, I have a bunch of entities of type Message. Some of these messages are actually a subtype, Comment, inheritance by table-per-type. Just
DB.Message.First();
will produce the exception. I have other instances of subtyping where I don't experience problems, but I can't see any discrepancies. Sometimes, though, the problem goes away if I restart the development server, but not always.
Edit:
I've worked out (I should have earlier) that the problem lies in the stored procedure fetching my Messages. The way this is currently set up is that all the fields pertaining to Message are fetched, while the Comment table is ignored by the sproc. The context then proceeds to muck this up, probably by fetching those Messages that are also Comments again, as you suggested. How to do this properly is the central issue at hand. I've found some indications of a solution at http://social.msdn.microsoft.com/Forums/en-US/adodotnetentityframework/thread/bb0bb421-ba8e-4b35-b7a7-950901adb602.
As you infer, it looks like the Context is fetching a Comment as a Message (not knowing that it is a comment). Later, you ask for the actual Comment, so the context fetches the Comment. Now you have two object instances in the Context with the same ID - one is a Message and one is a Comment.
It seems that the exception is not being thrown until after both objects have been loaded (ie when you try to access the Message the second time). If you can find a way to remove the Message from Context when the Comment is loaded, this may solve your problem.
Another option might be to use the Table-per-hierarchy model. This results in a bad database design but at the end of the day you have to use what works.
You might be able to avoid the problem by ensuring that the objects are loaded as Comments first. This way, when you ask for the Message, the Context already knows about it.
Also consider using Composition over Inheritance, such that a Message has 0..1 CommentDetails.
The final suggestion is to remove the dependency on the Entity Framework from your Control code, and create a Data Access Layer which references the EF and retrieves your objects. The DAL can turn Entity Framework objects into a different set of Entity objects which are easier to use in code. This approach will produce a lot of code overhead, but may be suitable if you cannot use the Entity Framework to produce an Entity model which represents your Entities in the way you want to work with them.
To summarize, unless MS fix this issue, there is no solution to your problem which does not involve a rethink of your approach. Unfortunately the Entity Framework is not ideal, especially for complex Entity models - you might be better off creating your own DAL and bypassing the EF altogether.
It sounds like you are pulling two records into memory one into message and one into comment.
Possible problems:
There are two physical messages with the same id
The same message is being pulled up as a message and a comment
The same message is being pulled up twice into the same context
That the problem sometimes goes away when you restart points to a problem with cleaning up the context. Are you using "using" statements?
Do you have functionality for changing from a message to a comment?
I am not an EF kind of guy (busy working with NHibernate, haven't had time to get up to date with EF yet) so I may be totally wrong here, but could the problem be that the two tables (since you are using inheritance by table-per-type) have primary keys that collide?
If you check the data in both tables, do primary key values collide?