Programmatic data transformation in EF5 Code First migration - entity-framework

Is it possible to do any type of programmatic data transformation in Entity Framework 5 Code First migrations?
There is an Sql() method to execute queries, but it has return type void and I don't see any way to get the results of the queries I perform.
Example
I have table Recipe with one-to-many relationship to Ingredient. For various reasons I want to convert this to a Ingredients JSON string property instead. The only approach I can think of is something like this:
Create new column IngredientsJson
For each recipe, query its ingredients, construct a JSON string programmatically and insert into the new column.
Drop the old table Ingredient.

You should use db 'initializer' for what you want - and/ore 'Seed' of a sort (as to where to inject into the EF flow).
You can > take a look at this post with a customized < initializer - that performas both Db Create... and Migrate. It's not cut and paste solution, but mostly works (it was just a fast go at the problem, you'd need to adjust a bit, it has couple fixes below).
MigrateDatabaseToLatestVersion dose only the migration part - and you need seed-ing exposed - or manually wrap that part (the main point is in 'checks' done for different situations - i.e. when to 'engage' into migration - or seeding).
Migration should go first, and db 'creation' kind of doesn't make much sense, except for seeding.
You override Seed (you created) to put any db handling there - you have the DbContext exposed - and you can also call SqlQuery if needed.
How to create initializer to create and migrate mysql database?

Related

See Create database queries by modelBuilder

currently I am working at a project which requires to be backwards compatible with (non-EF) databases, but also want to create a new database from model.
For this task I save the current schema somewhere (in XML form) and update the databases with raw sql update steps, until they match the schema, which is working fine.
Also, the modelBuilder matches the schema (as in, my algorithm finds no difference between the newly created database by context.Database.Create() and my saved schema) currently.
Since the schema will most likely change in later stages of development, I do have to support two ways to create an Up-to-date database and was wondering if I could combine these two - since now I have to update the saved target schema, create the update steps AND update my modelBuilder so that is creates exactly the database I need - which would be quite a tedious task.
So since there is probably no way to "translate" my schema to modelBuilder entries and because there is more not mapped information in my POCO classes (which prohibits the approach of updating a correct database and update my classes database first) the only (visible to me) way would be to somehow gather the CREATE TABLE statements a context would create when I call Database.Create() which I can use to update my schema and the update steps accordingly.
I know quite sure I can do the same by logging the context while calling the Create() method, however - this will take quite some time, will issue some queries I do not need and will create a dump database I have to get rid of afterwards each time I update my model.
So I was wondering if there was a way to inspect the modelBuilder (or the context, of course) and somehow see what the tables would look like it maps to.

Development process for Code First Entity Framework and SQL Server Data Tools Database Projects

I have been using Database First Entity Framework (EDMX) and SQL Server Data Tools Database Projects in combination very successfully - change the schema in the database and 'Update Model from Database' to get them into the EDMX. I see though that Entity Framework 7 will be dropping the EDMX format and I am looking for a new process that will allow me to use Code First in Combination with Database Projects.
Lots of my existing development and deployment processes rely on having a database project that contains the schema. This goes in source control is deployed along with the code and is used to update the production database complete with data migration using pre and post deployment scripts. I would be reluctant to drop it.
I would be keen to split one big EDMX into many smaller models as part of this work. This will mean multiple Code First models referencing the same database.
Assuming that I have an existing database and a database project to go with it - I am thinking that I would start by using the following wizard to create an initial set of entity and context classes - I would do this for each of the models.
Add | New Item... | Visual C# Items | Data | ADO.NET Entity Data Model | Code first from database
My problem is - where do I go from there? How do I handle schema changes? As long as I can get the database schema updated, I can use a schema compare operation to get the changes into the project.
These are the options that I am considering.
Make changes in the database and use the wizard from above to regenerate. I guess that I would need to keep any modifications to the entity and/or context classes in partial classes so that they do not get overwritten. Automating this with a list of tables etc to include would be handy. Powershell or T4 Templates maybe? SqlSharpener (suggested by Keith in comments) looks like it might help here. I would also look at disabling all but the checks for database existence and schema compatibility here, as suggested by Steve Green in the comments.
Make changes in code and use migrations to get these changes applied to the database. From what I understand, not having models map cleanly to database schemas (mine don't) might pose problems. I also see some complaints on the net that migrations do not cover all database object types - this was also my experience when I played around with Code First a while back - unique constraints I think were not covered. Has this improved in Entity Framework 7?
Make changes in the database and then use migrations as a kind of comparison between code and the database. See what the differences are and adjust the code to suit. Keep going until there are no differences.
Make changes manually in both code and the database. Obviously, this is not very appealing.
Which of these would be best? Is there anything that I would need to know before trying to implement it? Are there any other, better options?
So the path that we ended up taking was to create some T4 templates that generate both a DbContext and our entities. We provide the entity T4 a list of tables from which to generate entities and have a syntax to indicate that the entity based on one table should inherit from the entity based on another. Custom code goes in partial classes. So our solution looks most like my option 1 from above.
Also, we started out generating fluent configuration in OnModelCreating in the DbContext but have swapped to using attributes on the Entities (where attributes exist - HasPrecision was one that we had to use fluent configuration for). We found that it is more concise and easier to locate the configuration for a property when it is right there decorating that property.

EF subsequent migration - fill new table with data

I've created a new class, and EF will create a migration code when I'll do add-migration in Package Manager Console. Since this table is a classifier, I want to populate it with data and include this data in migration. I cannot use Seed method, since i'll be using my generated migration on production database later on.
Where should I hardcode the values for this table? I can edit the generated migration cs file, but this seems an inelegant solution. Could you recommend more proper place to define the data?
I think if you're goal is to populate your table for development/testing, there's no reason you shouldn't do you data seeding via the Seed method. You could always wrap the seed code for this table with an if block that checks your connection string values.
Edit:
If you plan to populate your table in your production database with the same data, it would certainly make sense to do so in the Up method of that specific migration that creates the table.

Get Model schema to programmatically create database using a provider that doesn't support CreateDatabase

I'm using the SQLite provider for Entity Framework 5 but it doesn't support CreateDatabase and thus cannot auto create the database. (Code First)
Is there a way I can obtain the Model schema at runtime so that I can create the SQL "CREATE TABLE" command myself?
If not at runtime, some other way to obtain the schema so I know how to create the table properly?
Thanks!
A) As for obtaining the model schema at runtime part
(all are earlier posts of mine)
See this one How I can read EF DbContext metadata programmatically?
And How check by unit test that properties mark as computed in ORM model?
Also this one for a custom initializer Programmatic data transformation in EF5 Code First migration
Having said that...
The problem I see is where and at what point you actually have the data available.
Actually I'm quite sure you won't be able to do that at any time.
Because to be able to extract that info you need to have a DbContext running - so db has to be constructed etc. etc.
In the initializer maybe - but using different ways to get that info - the above is not available.
B)
The other way would be to go the way of implementing a provider, generator etc. (e.g. this post).
That way you should get all that info just at the right time from the EF/CF itself.
However I haven't played with that much.
For more info you can check the EF source code
This is more of a 'gathered info' so far - in case it helps you get anywhere with it. Not really a solution. I'll add some more tomorrow.
EDIT:
To get the real database metadata, look into the other DataSpace, this should get you to the right place...
(note: things tend to get less exact from here - as obviously there isn't the right official support)
var ssSpaceSet = objectContext.MetadataWorkspace.GetItems<EntityContainer>(DataSpace.SSpace).First()
.BaseEntitySets
.First(meta => meta.ElementType.Name == "YourTableName");
If you look up in debugger, Table property should have the real table name.
However, reflection might be required.
How I can read EF DbContext metadata programmatically?
How check by unit test that properties mark as computed in ORM model?
Programmatic data transformation in EF5 Code First migration
Entity Framework MigrationSqlGenerator for SQLite
http://entityframework.codeplex.com/
Entity Framework - Get Table name from the Entity
ef code first: get entity table name without dataannotations
Get Database Table Name from Entity Framework MetaData
http://www.codeproject.com/Articles/350135/Entity-Framework-Get-mapped-table-name-from-an-ent

Entity Framework Table Per Type Performance

So it turns out that I am the last person to discover the fundamental floor that exists in Microsoft's Entity Framework when implementing TPT (Table Per Type) inheritance.
Having built a prototype with 3 sub classes, the base table/class consisting of 20+ columns and the child tables consisting of ~10 columns, everything worked beautifully and I continued to work on the rest of the application having proved the concept. Now the time has come to add the other 20 sub types and OMG, I've just started looking the SQL being generated on a simple select, even though I'm only interested in accessing the fields on the base class.
This page has a wonderful description of the problem.
Has anyone gone into production using TPT and EF, are there any workarounds that will mean that I won't have to:
a) Convert the schema to TPH (which goes against everything I try to achieve with my DB design - urrrgghh!)?
b) rewrite with another ORM?
The way I see it, I should be able to add a reference to a Stored Procedure from within EF (probably using EFExtensions) that has the the TSQL that selects only the fields I need, even using the code generated by EF for the monster UNION/JOIN inside the SP would prevent the SQL being generated every time a call is made - not something I would intend to do, but you get the idea.
The killer I've found, is that when I'm selecting a list of entities linked to the base table (but the entity I'm selecting is not a subclass table), and I want to filter by the pk of the Base table, and I do .Include("BaseClassTableName") to allow me to filter using x=>x.BaseClass.PK == 1 and access other properties, it performs the mother SQL generation here too.
I can't use EF4 as I'm limited to the .net 2.0 runtimes with 3.5 SP1 installed.
Has anyone got any experience of getting out of this mess?
This seems a bit confused. You're talking about TPH, but when you say:
The way I see it, I should be able to add a reference to a Stored Procedure from within EF (probably using EFExtensions) that has the the TSQL that selects only the fields I need, even using the code generated by EF for the monster UNION/JOIN inside the SP would prevent the SQL being generated every time a call is made - not something I would intend to do, but you get the idea.
Well, that's Table per Concrete Class mapping (using a proc rather than a table, but still, the mapping is TPC...). The EF supports TPC, but the designer doesn't. You can do it in code-first if you get the CTP.
Your preferred solution of using a proc will cause performance problems if you restrict queries, like this:
var q = from c in Context.SomeChild
where c.SomeAssociation.Foo == foo
select c;
The DB optimizer can't see through the proc implementation, so you get a full scan of the results.
So before you tell yourself that this will fix your results, double-check that assumption.
Note that you can always specify custom SQL for any mapping strategy with ObjectContext.ExecuteStoreQuery.
However, before you do any of this, consider that, as RPM1984 points out, your design seems to overuse inheritance. I like this quote from NHibernate in Action
[A]sk yourself whether it might be better to remodel inheritance as delegation in the object model. Complex inheritance is often best avoided for all sorts of reasons unrelated to persistence or ORM. [Your ORM] acts as a buffer between the object and relational models, but that doesn't mean you can completely ignore persistence concerns when designing your object model.
We've hit this same problem and are considering porting our DAL from EF4 to LLBLGen because of this.
In the meantime, we've used compiled queries to alleviate some of the pain:
Compiled Queries (LINQ to Entities)
This strategy doesn't prevent the mammoth queries, but the time it takes to generate the query (which can be huge) is only done once.
You'll can use compiled queries with Includes() as such:
static readonly Func<AdventureWorksEntities, int, Subcomponent> subcomponentWithDetailsCompiledQuery = CompiledQuery.Compile<AdventureWorksEntities, int, Subcomponent>(
(ctx, id) => ctx.Subcomponents
.Include("SubcomponentType")
.Include("A.B.C.D")
.FirstOrDefault(s => s.Id == id));
public Subcomponent GetSubcomponentWithDetails(int id)
{
return subcomponentWithDetailsCompiledQuery.Invoke(ObjectContext, id);
}