I want to save all strings to my DB in uppercase. I think the best place to do this is by overriding SaveChanges() on my DbContext. I know I need to call ToUpper() on something, but I am unsure what to call it on.
public override int SaveChanges()
{
    foreach (var entry in ChangeTracker.Entries().Where(e => e.State == EntityState.Added || e.State == EntityState.Modified))
    {
        // do something
    }
    return base.SaveChanges();
}
I'm not sure if it is wise to pollute your Database layer with this constraint. This would limit reusability of your database.
Quite often the definition of structure of the (tables of the) database is separated from the handling of the data in the database, which is done via a separate Database Abstraction Layer.
This separation makes it possible to reuse your database structure for databases with other constraints (for instance, one that allows lower case strings, or a special database for unit tests).
This separation of concerns is quite often implemented using the repository pattern. It makes it possible to change the functionality of the database without having to change the structure of the database.
MSDN Description entity framework and repository pattern
Stack Overflow: Repository Pattern Step by Step Explanation
You could also use an existing database that uses lower case strings, as your repository layer could convert everything to upper case before returning queried strings.
So by separating the database from functionality on the data you make it easier to reuse the database for other purposes, and to change requirements without having to change the data in the database: improved reusability and improved maintenance.
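If you do decide to convert on save, the thing to call ToUpper() on is each string property of the tracked entity. Here is a minimal sketch of that idea, written as plain reflection with no EF dependency. StringNormalizer and Contact are hypothetical names used only for illustration, and the helper only handles top-level string properties, not strings nested inside complex types.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical helper, not part of EF.
public static class StringNormalizer
{
    // Upper-cases every public, readable and writable string property on the object.
    public static void UpperCaseStringProperties(object entity)
    {
        if (entity == null) throw new ArgumentNullException(nameof(entity));

        var stringProperties = entity.GetType()
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.PropertyType == typeof(string)
                        && p.CanRead
                        && p.CanWrite
                        && p.GetIndexParameters().Length == 0);

        foreach (var property in stringProperties)
        {
            var value = (string)property.GetValue(entity);
            if (value != null)
            {
                // Null values are left alone; everything else is upper-cased.
                property.SetValue(entity, value.ToUpperInvariant());
            }
        }
    }
}

// Hypothetical entity used only for illustration.
public class Contact
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string City { get; set; }
}
```

Inside the loop from the question this would be called as StringNormalizer.UpperCaseStringProperties(entry.Entity). The same helper could just as well sit in a repository layer, which keeps the constraint out of the DbContext as suggested above.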
Related
I am using EF6
I have a generic table which holds data for different types of class objects using the "Table Per Hierarchy" Approach. In addition these class objects use complex types for defining types for their properties.
So using a made up example,
Table = Person
"Mike the Teacher" is a "Teacher" instance of Person with a personType of "Teacher"
The "Teacher" instance has 2 properties, complextypePersonalDetails and complextypeAddress.
complextypePersonalDetails contains
First Name, Surname and Age.
complextypeAddress contains
HouseName, Street, Town, City, County.
I admit that this design may be over the top, and the problem may be of my making, but that aside I wanted to check to see whether I could do anymore with EF6 before I rewrite it.
I am performance profiling the code with JetBrains DotTrace.
On first call, say on
personTeacher = db.person.OfType<Teacher>().First()
I get a massive delay of around 150,000ms
around:
SerializedGeneratedViewOfType (150,000ms)
TryGenerateQueryViewOfType
GenerateTypeSpecificQueryView
GenerateQueryViewForSingleExtent
GenerateQueryViewForExtentAndType
GenerateViewsForExtentAndType
GenerateViewComponents
EnsureExtentIsFullyMapped (90,000ms)
GenerateCaseStatements (60,000ms)
I have created a pregenerated View using the "InteractivePreGeneratedViews" nuget package which creates the SQL. However even with this I still need to incur my first hit. Also this hit seems to happen every time the Webserver/Website/AppPool is restarted.
I am not totally sure of the EF process, but I guess there is some further form of runtime compilation or caching which happens when the web app starts. Where could this be happening, and is there a proactive method that I could use to pregenerate/precompile/precache this problem away?
In the medium term, we will rewrite this code in Dapper or EF.Core. So for now, any thoughts on what can be done?
Thanks.
I had commented on this before but retracted it, since I was only agreeing with "this design may be over the top, and the problem may be of my making", and I thought I'd see if anyone else jumped in.
The initial spin-up cost is due to EF needing to resolve the mapping for your schema. This happens once, the first time a DbSet on the context is accessed. You can mitigate this by executing a query on your application start, i.e.
void Application_Start(object sender, EventArgs e)
{
    // Initialization stuff...
    using (var context = new MyContext())
    {
        // Spin up will happen here, not when the first user attempts to access a query.
        var result = context.MyTable.Any();
    }
}
You actually need to run a query for the DbContext to resolve the mapping; just new-ing one up won't do it.
For larger, or more complex schemas you can also look to utilize bounded contexts where each context maps a particular set of relationships for a specific area of the application. The less complex/comprehensive a context is, the faster it initializes.
As far as the design goes, TPH is for representing inheritance, which is where you need to establish an "is-a" relation between like entities. Relational models, and ORMs by definition can support this, but they're geared more towards "has-a" relationships. Rather than having a model where you go "is-a person with an address", the relation is best mapped out that a person may "have-an" address. I've worked on a system that was designed by a team of engineers where an entire reporting system with dynamic rules was represented by 6 tables. Honestly, those designs are a nightmare to maintain.
I don't know why OfType() is so slow, but I found a fast and easy workaround by replacing it with a cast; EntityFramework seems to support that just fine and without the performance penalty.
var stopwatch = Stopwatch.StartNew();
using (var db = new MyDbContext())
{
    // warm up
    Console.WriteLine(db.People.Count());
    Console.WriteLine($"{stopwatch.ElapsedMilliseconds} ms");
    // fast
    Console.WriteLine(db.People.Select(p => p as Teacher).Where(p => p != null).Count());
    Console.WriteLine($"{stopwatch.ElapsedMilliseconds} ms");
    // slow
    Console.WriteLine(db.People.OfType<Teacher>().Count());
    Console.WriteLine($"{stopwatch.ElapsedMilliseconds} ms");
}
20
3308 ms
2
3796 ms
2
10026 ms
I'm currently designing an application where I need to use two different database schemas (on the same instance): one as the application base, the other to customize the application and the fields for every customer.
Since I read something about the Repository pattern and, as I understand it, it is possible to use two different contexts without a loss of efficiency, I'm now asking whether I can use a single database transaction across the two schemas with Entity Framework, as I'm currently doing directly on the database (SQL Server 2008-2012).
Sorry for my English, and thanks in advance!
If your connection strings are the same (which in your case will be as you have different schemas only for different contexts) then you are ok with this approach.
Basically you will have two different contexts that will be connected via the same connection string to the database and which will represent two different schemas.
using (var scope = new TransactionScope()) {
    using (var contextSO = new ContextSchemaOne()) {
        // Add, remove, change entities from context schema one
        contextSO.SaveChanges();
    }
    using (var contextST = new ContextSchemaTwo()) {
        // Add, remove, change entities from context schema two
        contextST.SaveChanges();
    }
    // Both SaveChanges() calls commit together, or not at all
    scope.Complete();
}
I wasn't very successful in the past with this approach, and we switched to one context per database.
Further reading: Entity Framework: One Database, Multiple DbContexts. Is this a bad idea?
Maybe it's better to read something about unit of work before making a decision about this.
You will have to do something like this: Preparing for multiple EF contexts on a unit of work - TransactionScope
This question concerns using JPA to manage some data where some scenarios benefit from the full object model and others seem to be better implemented by a much flatter model. I'm therefore inclined to create two models. I get the feeling that this is not a good idea but I'm hard-pressed to see exactly why, or what the alternatives may be.
The basic scenario is that there is an entity, let's call it A, which is the many side of a relationship with entity B. So in the database A has a foreign key field, and in the full object model we see (simplified, getters/setters removed)
public class A {
    public int aKey;
    public B b;
    // more attributes
}
public class B {
    public int bKey;
    public List<A> collectionOfA;
    // and more
}
One particular scenario is handling the arrival into the system of new As. They come from some external source in the form of, say, text files. The insertion code needs to
for each CSV record
    get the bKey from the record
    find the B, or manage any error
    create the A, setting the B
    persist
Now in fact my scenario is more complex, there are several such relationships, so that find/set pairing is repeated several times.
Alternatively I could (and in fact have) created a second mapping for the A table
public class Ainserter {
    public int aKey;
    public int bKey;
    // more attributes
}
Now I just set the two values and persist. This does assume that the DB will have the referential integrity constraints, but with the tooling I'm using that is the case. In this, and in many legacy systems the DB pre-exists and may be accessed from both the new JPA code and other even non-Java code. I therefore don't see a reason to put the referential integrity checking in the JPA code in such simple cases.
I can see that potentially there are opportunities for aspects of the full model to become stale with respect to my insertions, but in a legacy environment there could be insertions happening in the DB itself at any time. So I don't see a new problem here.
I can also see potential for confusion if the same Entity Context were used for both models, but that can be avoided by suitable encapsulation.
Any other thoughts?
Edit:
There is a suggestion from axtavt to use EntityManager.getReference(B.class, bkey) to get the B instance. My understanding is that if I do this then to be properly conforming with the JPA programming model I am supposed to set both sides of the relationship, hence I would need to visit the "referenced" B object and add my A into his collection.
Edited again:
I was concerned that visiting B would cause a database lookup, so in performance terms I would not get the win. I have it on very good authority that at least OpenJPA will in fact not need to "inflate" B if we only access B's key and the collection of As, and so getReference() is a good suggestion. It seems reasonable that a well-designed JPA implementation would have such optimisations.
JPA has an EntityManager.getReference() method, which basically combines the approaches you describe.
It takes a primary key and returns a proxy object with that primary key without hitting the database. So, you can use that object to initialize the relationship field, exactly as you want to do in your second approach.
So it turns out that I am the last person to discover the fundamental flaw that exists in Microsoft's Entity Framework when implementing TPT (Table Per Type) inheritance.
Having built a prototype with 3 sub classes, the base table/class consisting of 20+ columns and the child tables consisting of ~10 columns, everything worked beautifully and I continued to work on the rest of the application having proved the concept. Now the time has come to add the other 20 sub types and OMG, I've just started looking at the SQL being generated on a simple select, even though I'm only interested in accessing the fields on the base class.
This page has a wonderful description of the problem.
Has anyone gone into production using TPT and EF, are there any workarounds that will mean that I won't have to:
a) Convert the schema to TPH (which goes against everything I try to achieve with my DB design - urrrgghh!)?
b) rewrite with another ORM?
The way I see it, I should be able to add a reference to a Stored Procedure from within EF (probably using EFExtensions) that has the TSQL that selects only the fields I need. Even using the code generated by EF for the monster UNION/JOIN inside the SP would prevent the SQL being generated every time a call is made. Not something I would intend to do, but you get the idea.
The killer I've found, is that when I'm selecting a list of entities linked to the base table (but the entity I'm selecting is not a subclass table), and I want to filter by the pk of the Base table, and I do .Include("BaseClassTableName") to allow me to filter using x=>x.BaseClass.PK == 1 and access other properties, it performs the mother SQL generation here too.
I can't use EF4 as I'm limited to the .net 2.0 runtimes with 3.5 SP1 installed.
Has anyone got any experience of getting out of this mess?
This seems a bit confused. You're talking about TPH, but when you say:
The way I see it, I should be able to add a reference to a Stored Procedure from within EF (probably using EFExtensions) that has the TSQL that selects only the fields I need. Even using the code generated by EF for the monster UNION/JOIN inside the SP would prevent the SQL being generated every time a call is made. Not something I would intend to do, but you get the idea.
Well, that's Table per Concrete Class mapping (using a proc rather than a table, but still, the mapping is TPC...). The EF supports TPC, but the designer doesn't. You can do it in code-first if you get the CTP.
Your preferred solution of using a proc will cause performance problems if you restrict queries, like this:
var q = from c in Context.SomeChild
        where c.SomeAssociation.Foo == foo
        select c;
The DB optimizer can't see through the proc implementation, so you get a full scan of the results.
So before you tell yourself that this will fix your results, double-check that assumption.
Note that you can always specify custom SQL for any mapping strategy with ObjectContext.ExecuteStoreQuery.
However, before you do any of this, consider that, as RPM1984 points out, your design seems to overuse inheritance. I like this quote from NHibernate in Action:
[A]sk yourself whether it might be better to remodel inheritance as delegation in the object model. Complex inheritance is often best avoided for all sorts of reasons unrelated to persistence or ORM. [Your ORM] acts as a buffer between the object and relational models, but that doesn't mean you can completely ignore persistence concerns when designing your object model.
We've hit this same problem and are considering porting our DAL from EF4 to LLBLGen because of this.
In the meantime, we've used compiled queries to alleviate some of the pain:
Compiled Queries (LINQ to Entities)
This strategy doesn't prevent the mammoth queries, but the query generation (which can take a very long time) is only done once.
You can use compiled queries with Include() as such:
static readonly Func<AdventureWorksEntities, int, Subcomponent> subcomponentWithDetailsCompiledQuery =
    CompiledQuery.Compile<AdventureWorksEntities, int, Subcomponent>(
        (ctx, id) => ctx.Subcomponents
            .Include("SubcomponentType")
            .Include("A.B.C.D")
            .FirstOrDefault(s => s.Id == id));

public Subcomponent GetSubcomponentWithDetails(int id)
{
    return subcomponentWithDetailsCompiledQuery.Invoke(ObjectContext, id);
}
I'm starting a new project and have decided to try to incorporate DDD patterns and also include Linq to Entities. When I look at the EF's ObjectContext it seems to be performing the functions of both Repository and Unit of Work patterns:
Repository in the sense that the underlying data level interface is abstracted from the entity representation and I can request and save data through the ObjectContext.
Unit Of Work in the sense that I can write all my inserts/updates to the objectContext and execute them all in one shot when I do a SaveChanges().
It seems redundant to put another layer of these patterns on top of the EF ObjectContext? It also seems that the Model classes can be incorporated directly on top of the EF generated entities using 'partial class'.
I'm new at DDD so please let me know if I'm missing something here.
I don't think that the Entity Framework is a good implementation of Repository, because:
The object context is insufficiently abstract to do good unit testing of things which reference it, since it is bound to the DB access. Having an IRepository reference instead works much better for creating unit tests.
When a client has access to the ObjectContext, the client can do pretty much anything it cares to. The only real control you have over this at all is to make certain types or properties private. It is hard to implement good data security this way.
On a non-trivial model, the ObjectContext is insufficiently abstract. You may, for example, have both tables and stored procedures mapped to the same entity type. You don't really want the client to have to distinguish between the two mappings.
On a related note, it is difficult to write comprehensive and well-enforced business rules and entity code. Indeed, whether or not this is even a good idea is debatable.
On the other hand, once you have an ObjectContext, implementing the Repository pattern is trivial. Indeed, for cases that are not particularly complex, the Repository is something of a wrapper around the ObjectContext and the Entity types.
I would say that you should look at the ObjectContext as your UnitOfWork, and not as a repository.
An ObjectContext cannot be a repository, in my opinion, since it is too generic.
You should create your own Repositories, which have specialized methods (like GetCustomersWithGoldStatus for instance) next to the regular CRUD methods.
So, what I would do, is create repositories (one for each aggregate-root), and let those repositories use the ObjectContext.
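To make that concrete, here is a minimal sketch of one such repository. Customer, ICustomerRepository and InMemoryCustomerRepository are hypothetical names, and the "Gold" status value is an assumption for illustration. The in-memory class is the kind of stand-in a unit test would use; a production implementation would hold the ObjectContext and run the same filter against its entity set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical aggregate root used only for illustration.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Status { get; set; }
}

// What the business layer depends on; no EF types leak through this interface.
public interface ICustomerRepository
{
    Customer Find(int id);
    void Add(Customer customer);
    IList<Customer> GetCustomersWithGoldStatus();
}

// In-memory stand-in for unit tests. A real implementation would wrap
// the ObjectContext instead of a List<Customer>.
public class InMemoryCustomerRepository : ICustomerRepository
{
    private readonly List<Customer> _customers = new List<Customer>();

    public Customer Find(int id)
    {
        // Returns null when no customer has the given id.
        return _customers.FirstOrDefault(c => c.Id == id);
    }

    public void Add(Customer customer)
    {
        if (customer == null) throw new ArgumentNullException(nameof(customer));
        _customers.Add(customer);
    }

    public IList<Customer> GetCustomersWithGoldStatus()
    {
        // The specialized query lives here, next to the regular CRUD methods.
        return _customers.Where(c => c.Status == "Gold").ToList();
    }
}
```

Code that takes an ICustomerRepository can then be tested without a database, which is the main thing a bare ObjectContext makes hard.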
I like to have a repository layer for the following reasons:
EF gotchas
When you look at some of the current tutorials on EF (Code First version), it is apparent there are a number of gotchas to be handled, particularly around object graphs (entities containing entities) and disconnected scenarios. I think a repository layer is great for wrapping these up in one place.
A clear picture of data access mechanisms
A repository gives a specific picture as to how the BL is accessing and updating the data store. It exposes methods that have a clear single purpose, and can be tested independently of the BL. Standard example from the textbooks, Find() to find a single entity. A more application specific example, Clear() to clear down a db table.
A place for optimizations
Inevitably you come up against performance hits when using vanilla EF. I use the repository to hide the optimization mechanisms from the BL.
Examples,
GetKeys() to project cached keys from the tables (for Insert/Update decisions). Reading only the keys is faster and uses less memory than reading the full entities.
Bulk load via SqlBulkCopy. EF will insert by individual SQL statements. If you want a single statement to insert multiple rows, SqlBulkCopy is a good mechanism. The repository encapsulates this and provides metadata for SqlBulkCopy. As well as the Insert method, you need a StartBatch() and EndBatch() method, which is also an argument for a UnitOfWork layer.
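The StartBatch()/EndBatch() shape can be sketched roughly as follows. BatchingRepository is a hypothetical name, and the bulk writer is passed in as a delegate so the sketch stays self-contained; in real code that delegate would be the place to build a DataTable and hand it to SqlBulkCopy.WriteToServer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical repository that buffers inserts between StartBatch() and EndBatch().
public class BatchingRepository<T>
{
    // Assumption: in real code this delegate wraps the SqlBulkCopy call.
    private readonly Action<IReadOnlyList<T>> _bulkWriter;
    private List<T> _pending; // null while no batch is open

    public BatchingRepository(Action<IReadOnlyList<T>> bulkWriter)
    {
        if (bulkWriter == null) throw new ArgumentNullException(nameof(bulkWriter));
        _bulkWriter = bulkWriter;
    }

    public void StartBatch()
    {
        if (_pending != null) throw new InvalidOperationException("A batch is already open.");
        _pending = new List<T>();
    }

    public void Insert(T item)
    {
        if (_pending == null) throw new InvalidOperationException("Call StartBatch() first.");
        _pending.Add(item); // nothing is written yet
    }

    // Writes all buffered rows in one call and returns how many there were.
    public int EndBatch()
    {
        if (_pending == null) throw new InvalidOperationException("No batch is open.");
        var rows = _pending;
        _pending = null;
        if (rows.Count > 0) _bulkWriter(rows);
        return rows.Count;
    }
}
```

The business layer only sees StartBatch(), Insert() and EndBatch(); whether the rows go out through SqlBulkCopy or some other mechanism stays hidden behind the repository.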