How to avoid data loss with EF Model First database schema upgrade?

This is a long question, but I would be very thankful for some good advice. In short, I’m looking for a good approach to upgrading an MS SQL database schema between versions when the upgrade also requires moving data from deleted tables into new tables.
I think Stack Overflow is the most appropriate place for this question (rather than dba.stackexchange.com) because at its core this is an issue for .NET developers using Entity Framework, and the database part consists mostly of auto-generated SQL scripts.
Background
A .NET application and a SQL database are running in Azure (the application in worker roles and the database in Azure SQL). Until now, version upgrades have worked fine, because all database schema changes have been simple (like adding a new column). From now on, however, I also need to move data from one table to another during upgrades. (I can work around this temporarily by creating a new database, generating a script with the data from the old database, and manually editing the script to fit the new schema, but I hope there is a better approach.)
I use Entity Framework and I use Model First. Entities and associations are defined in Visual Studio Data Model Designer, and this approach is very appropriate for my application.
I use a dacpac to upgrade the Azure SQL database, and this approach has worked well until now (but from now on it would cause data loss, so I must find a way to move data to the new tables).
I hope I can continue to use Entity Framework and define entities/associations in the designer, but it’s fine to switch away from dacpac upgrades to another technology if needed.
Upgrade approach until now
I add new entities (tables), associations (relations) and properties (columns) in the designer.
I right-click, pick “Generate Database from Model…” and this results in a .sql script that drops old database objects and creates the new database objects.
I create an empty database and run the script to create the tables/keys etc.
In SQL Server Management Studio, I right-click the database and pick “Tasks -> Extract Data-tier Application…”. When the wizard completes I get the dacpac I need. (At this point I can delete the database; I only created it to obtain the dacpac file, because I don’t think I can generate one from the Visual Studio Data Model Designer.)
I right-click the Azure SQL database and pick “Tasks -> Upgrade Data-tier Application…” and follow the wizard.
Until now I have never had data loss, so this has worked fine!
Current situation
This is a simplified example to illustrate the issue, but it seems I will run into almost identical situations quite often from now on. Look at the old and the new version of the schema in the figure below. Assume there is already data in the database. I need the data in ImageFile to end up in ImageFileOriginal or ImageFileProcessed depending on the IsOriginal boolean/bit value. Using “Upgrade Data-tier Application” I will get alerted of data loss. What approach would you recommend to deal with this? As I said earlier, it’s fine to switch away from dacpac upgrades to another technology if needed.
I have read about Visual Studio Database Projects, Fluent Migrator, Red Gate and the Entity Designer Database Generation Power Pack (which doesn't support Visual Studio 2012), but I didn’t find a good way to do this. I admit I haven’t spent a whole day digging into each technology, but I certainly spent some time trying to find a good approach.
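To make the required data move concrete, here is a minimal sketch in C# of the copy step, not a definitive implementation. It assumes the new tables already exist and the old ImageFile table has not been dropped yet. The column names (Id, FileName, Content) are hypothetical, since the figure with the real schema is not reproduced here; only the table names and the IsOriginal column come from the question. If Id is an identity column in the target tables, the inserts would also need IDENTITY_INSERT handling.

```csharp
using System.Collections.Generic;
using System.Data;

public static class ImageFileSplit
{
    // Hypothetical column list; replace with the real columns shared by all three tables.
    private const string Columns = "[Id], [FileName], [Content]";

    // Builds one INSERT ... SELECT per target table, filtered on the IsOriginal bit.
    public static IList<string> BuildStatements()
    {
        return new List<string>
        {
            "INSERT INTO [ImageFileOriginal] (" + Columns + ") SELECT " + Columns +
                " FROM [ImageFile] WHERE [IsOriginal] = 1;",
            "INSERT INTO [ImageFileProcessed] (" + Columns + ") SELECT " + Columns +
                " FROM [ImageFile] WHERE [IsOriginal] = 0;"
        };
    }

    // Runs both copies in one transaction on an already open connection.
    // If a statement throws, the transaction is disposed without a commit,
    // which rolls the copy back (this is how SqlTransaction behaves).
    public static void Run(IDbConnection openConnection)
    {
        using (IDbTransaction tx = openConnection.BeginTransaction())
        {
            foreach (string sql in BuildStatements())
            {
                using (IDbCommand cmd = openConnection.CreateCommand())
                {
                    cmd.Transaction = tx;
                    cmd.CommandText = sql;
                    cmd.ExecuteNonQuery();
                }
            }
            tx.Commit();
        }
    }
}
```

Only after this step has succeeded would it be safe to let the upgrade drop ImageFile.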

The best way to migrate a database schema (creating/deleting tables and columns) together with its data is to use SQL Server Data Tools (SSDT), available for Visual Studio 2010 and Visual Studio 2012.
Here are some very useful links:
http://msdn.microsoft.com/data/tools
http://blogs.msdn.com/b/ssdt
http://msdn.microsoft.com/en-us/data/hh297027

In the Configuration class, set up the constructor as below:
    public Configuration()
    {
        AutomaticMigrationsEnabled = true;
        AutomaticMigrationDataLossAllowed = false;
    }
Setting the AutomaticMigrationsEnabled property to true means we are using automatic Code First migrations. Setting the other property, AutomaticMigrationDataLossAllowed, to false means that an automatic migration which would lose existing data in the database's tables is not applied.
The entire Configuration class is as follows.
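The full class is not reproduced in this text. As a rough guide only, a minimal sketch of what such a class typically looks like with EF 6 Code First Migrations follows. The context name MyDbContext is a hypothetical placeholder, and the EntityFramework NuGet package is assumed to be referenced.

```csharp
using System.Data.Entity;
using System.Data.Entity.Migrations;

// Hypothetical context, present only so the sketch compiles.
public class MyDbContext : DbContext
{
}

internal sealed class Configuration : DbMigrationsConfiguration<MyDbContext>
{
    public Configuration()
    {
        // Let EF work out schema changes from the model automatically.
        AutomaticMigrationsEnabled = true;
        // Refuse automatic migrations that would drop columns or tables holding data.
        AutomaticMigrationDataLossAllowed = false;
    }

    protected override void Seed(MyDbContext context)
    {
        // Optional: seed data, applied after migrating to the latest version.
    }
}
```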

Related

EF Core Migrations manual edits possible?

I am using EF Core 2.0 in my sample project with some value object configurations. I modify the code and generate migrations via the CLI. In the last migration, rather than adding a new database table as it should, it tries to rename existing tables to each other's names and create an extra table for an existing one. I could not figure out the reason for it.
The issue is that with EF Core the snapshot is a separate auto-generated file from the migration itself, and I don't want to modify the snapshot.
I only want to modify the migration script so that it will not rename multiple tables, and then generate the snapshot from the migrations I created.
I did not see any command for this in the CLI. Is it such a bad practice to modify the scaffolded migration and regenerate, or am I missing some obvious new link that explains how to modify migration scripts manually?
Thanks a bunch.
Update 1: After comments, added info about the snapshot from this link.
Because the current database schema is represented in code, EF Core doesn't have to interact with the database to create migrations. When you add a migration, EF determines what changed by comparing the data model to the snapshot file. EF interacts with the database only when it has to update the database.

I examined my generated snapshot code from source control. It has added exactly the one extra table that I needed.
The migration script generated for this is messy at best: it renames multiple tables to each other's names and then warns that this could break things and cause multiple issues.
Since this is a sample project for me, with only mock data at least for now, I decided to go for it and not break the automated scripts. I am willing to lose some mock data at this stage rather than waste time on it.
If this were a production database, I would be extremely careful to reach the same result by hand, modifying both the scaffold and the migration file.
I am accepting this one as an answer (basically saying current EF Core does not support it to the best of my current knowledge) since there is no other candidate now - I will be more than glad to accept if any better answer shows up.

Development process for Code First Entity Framework and SQL Server Data Tools Database Projects

I have been using Database First Entity Framework (EDMX) and SQL Server Data Tools Database Projects in combination very successfully: I change the schema in the database and use 'Update Model from Database' to get the changes into the EDMX. I see, though, that Entity Framework 7 will be dropping the EDMX format, and I am looking for a new process that will allow me to use Code First in combination with Database Projects.
Lots of my existing development and deployment processes rely on having a database project that contains the schema. This goes in source control, is deployed along with the code, and is used to update the production database, complete with data migration using pre- and post-deployment scripts. I would be reluctant to drop it.
I would be keen to split one big EDMX into many smaller models as part of this work. This will mean multiple Code First models referencing the same database.
Assuming that I have an existing database and a database project to go with it - I am thinking that I would start by using the following wizard to create an initial set of entity and context classes - I would do this for each of the models.
Add | New Item... | Visual C# Items | Data | ADO.NET Entity Data Model | Code first from database
My problem is - where do I go from there? How do I handle schema changes? As long as I can get the database schema updated, I can use a schema compare operation to get the changes into the project.
These are the options that I am considering.
Make changes in the database and use the wizard from above to regenerate. I guess that I would need to keep any modifications to the entity and/or context classes in partial classes so that they do not get overwritten. Automating this with a list of tables etc. to include would be handy. PowerShell or T4 templates maybe? SqlSharpener (suggested by Keith in comments) looks like it might help here. I would also look at disabling all but the checks for database existence and schema compatibility here, as suggested by Steve Green in the comments.
Make changes in code and use migrations to get these changes applied to the database. From what I understand, not having models map cleanly to database schemas (mine don't) might pose problems. I also see some complaints on the net that migrations do not cover all database object types - this was also my experience when I played around with Code First a while back - unique constraints I think were not covered. Has this improved in Entity Framework 7?
Make changes in the database and then use migrations as a kind of comparison between code and the database. See what the differences are and adjust the code to suit. Keep going until there are no differences.
Make changes manually in both code and the database. Obviously, this is not very appealing.
Which of these would be best? Is there anything that I would need to know before trying to implement it? Are there any other, better options?

So the path that we ended up taking was to create some T4 templates that generate both a DbContext and our entities. We provide the entity T4 a list of tables from which to generate entities and have a syntax to indicate that the entity based on one table should inherit from the entity based on another. Custom code goes in partial classes. So our solution looks most like my option 1 from above.
Also, we started out generating fluent configuration in OnModelCreating in the DbContext but have swapped to using attributes on the Entities (where attributes exist - HasPrecision was one that we had to use fluent configuration for). We found that it is more concise and easier to locate the configuration for a property when it is right there decorating that property.
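The split between attributes and fluent configuration described above might look roughly like the following sketch. It assumes EF 6 (where precision can only be set fluently) with the EntityFramework NuGet package referenced; the Product entity and ShopContext are hypothetical names, not the generated classes from this answer.

```csharp
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;
using System.Data.Entity;

// Hypothetical entity; in the setup above this part would come from the T4 template,
// with custom code living in another partial class file.
[Table("Product")]
public partial class Product
{
    [Key]
    public int Id { get; set; }

    // Configuration sits right on the property it applies to.
    [Required]
    [StringLength(100)]
    public string Name { get; set; }

    // EF 6 has no attribute for precision/scale, so this one is configured fluently.
    public decimal Price { get; set; }
}

public partial class ShopContext : DbContext
{
    public DbSet<Product> Products { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Fluent fallback for what attributes cannot express.
        modelBuilder.Entity<Product>()
            .Property(p => p.Price)
            .HasPrecision(18, 4);
    }
}
```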

Entity Framework 6 Model First Migration

Desired outcome:
Use model first approach with Entity Framework and allow changes to deployed database/ model to be done automatically based on the changes in the model. Automatic schema difference script generation to allow smooth migrations.
Is there a way to perform migrations in model first EF6? I can see code first migrations topics all over, but nothing much on Model First.
Options I saw so far:
Database generation power pack (seems outdated)
somehow convert to code first, then use migrations (not desirable, as I like to have a visual designer)
somehow piggy back on code first migrations (http://blog.amusedia.com/2012/08/entity-framework-migration-with-model.html : this is for EF5, got error that can't run migrations on Model First)
some third party tools?

As far as I know, there is still no automatic migration for Entity Framework Model First.
Our approach is:
Create a fresh database from the model.
Create a diff script to migrate the old database to the new one.
Verify that this diff script is indeed correct. Always double check what your automation tool creates.
We first used Open DBDiff for our Model First migrations. After that we switched to Redgate's SQL Compare because it produced more reliable migrations.
In our experience Open DBDiff produced a lot of unnecessary SQL because it cares about column order, and it has some other issues, such as foreign keys constantly being dropped and re-added. Aside from that it still did the job fine, but we had to do a lot of double checking on its generated SQL.

DevExpress XPO vs NHibernate vs Entity Framework: database upgrading issue

What is the best practice for upgrading the database when using an ORM (DevExpress XPO, NHibernate or MS Entity Framework)?
I'm starting a new project and have to pick an ORM. The development process requires releasing intermediate test builds quite often, and it is likely that each build will change the database structure. Each new version has to upgrade the DB gently so that current data is kept.
For old solutions I would provide a set of SQL scripts for upgrading the database from v1 to v2, from v2 to v3, etc. and execute them sequentially.
But how is it going to work with an ORM? Should I still write SQL scripts to upgrade the DB?
I understand that simply adding new fields wouldn't cause a problem (e.g. see the UpdateSchema() method for XPO), but what if I have to split a table and reallocate current records into 2 new tables?

I can't comment on the other ORMs, but I have used DevExpress XPO for a corporate treasury application since 2007. The schema changes a little with every release, but there have also been some big schema changes over the years. A somewhat extended version of the default XPO upgrade mechanism has comfortably catered for all the changes.
There is good basic information here about upgrading XPO applications.
DevExpress provide a DBUpdater tool to assist you with the task of upgrading production environments. You can extend this tool to cater for additional requirements. In my application, we have added some options for logging, preview with rollback, etc.
Each module has virtual UpdateDatabaseBeforeUpdateSchema() and UpdateDatabaseAfterUpdateSchema() methods. These give you a great deal of control over the upgrade process.
As you mention, some of the upgrade will be handled automatically by XPO (e.g., adding a new column), but some things need additional control, such as initialising the new column with a default value for existing records.
For instance, let's say MyNewField has been added to the MyEntity XPO class in version 2.0 of your application. Let's say it should default to a value of 3 for existing records. XPO will handle the creation of the new column, but existing records will be NULL. (If you specify a default value in the XPO class, it would only pertain to new records.) In order to correct the value for existing records, you would add something like the following to the module's overridden UpdateDatabaseAfterUpdateSchema():
    public override void UpdateDatabaseAfterUpdateSchema()
    {
        base.UpdateDatabaseAfterUpdateSchema();
        // Only databases older than 2.0 need the back-fill.
        if (CurrentDBVersion < new Version(2, 0, 0, 0))
            ObjectSpace.GetSession().ExecuteNonQuery(
                "UPDATE [MyEntity] SET [MyNewField] = 3 WHERE [MyNewField] IS NULL");
    }
(You could also use ObjectSpace.GetObjects<MyEntity>() and a foreach if you prefer to avoid the direct SQL.)
In your more extreme example of splitting a table in two, you can use the same method, but you would override UpdateDatabaseBeforeUpdateSchema() instead, run the SQL to split the table, let XPO perform any other schema updates and, if necessary, populate any default values in the UpdateDatabaseAfterUpdateSchema().
You will find that you bump into constraint problems (e.g., foreign key violations), so you might need to write some general routines such as DropAllForeignKeyConstraints() as part of UpdateDatabaseBeforeUpdateSchema(). Sometimes you find that XPO already provides something, sometimes not. Missing constraints and indexes will get regenerated in the schema update. (In my experience, switching a master data table's primary key turned out to be the hardest update routine to get right.)
By default the calls all happen in an SQL transaction so if anything fails it should all roll back.
The developers need to be aware of when a change to the domain model is likely to cause a problem with the underlying schema.
For testing, we keep a few old customer databases and run a bunch of before-and-after tests as part of the build process to make sure that existing customers are able to upgrade properly whatever version they are upgrading from. In production whenever we run into a problem upgrading, the problem data is added into this test library to prevent similar problems in the future.
We are dealing with major international companies and banks. The customers are quite happy with the result. In situations where a corporate's DBA needs to sign off on the changes, they don't seem to mind having a command line tool to do the upgrade rather than a script.

Most migration solutions can handle easy tasks, like adding a new column or relationship or removing one, but fail when you rename a column (is that an add, or a remove followed by an add, which amounts to a rename? What should you do with the data in that case?).
All three solutions have basic migration support; XPO even lets you run your own scripts as part of the process (to insert static/test/constant data, etc.).
There's also the MigratorDotNet project, which you can use without relying on any ORM-specific migration feature.
Personally, I would use automatic migration only in dev/test environments and would keep a full set of upgrade scripts for running against a client's specific database, say to upgrade from v1 to v2.
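The idea of a full set of versioned upgrade scripts can be sketched as follows. This is an illustration under assumptions, not part of any of the tools mentioned: the UpgradeStep and UpgradePlanner names and the script file names are hypothetical, and actually executing the scripts is left out.

```csharp
using System.Collections.Generic;
using System.Linq;

// One upgrade script that takes the database from FromVersion to FromVersion + 1.
public sealed class UpgradeStep
{
    public UpgradeStep(int fromVersion, string scriptName)
    {
        FromVersion = fromVersion;
        ScriptName = scriptName;
    }

    public int FromVersion { get; private set; }
    public string ScriptName { get; private set; }
}

public static class UpgradePlanner
{
    // Returns the scripts still to be run for a database at currentVersion,
    // ordered so that v1->v2 always runs before v2->v3, and so on.
    public static IList<string> Plan(int currentVersion, IEnumerable<UpgradeStep> steps)
    {
        return steps
            .Where(step => step.FromVersion >= currentVersion)
            .OrderBy(step => step.FromVersion)
            .Select(step => step.ScriptName)
            .ToList();
    }
}
```

A client database at version 2 would skip the v1 to v2 script and get the v2 to v3 script and every later one, in order, regardless of how the steps were registered.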

How is it going to work for ORM? Should I still write SQL scripts to upgrade the DB?
A clear answer to this question can be found in the Programmers Stack Exchange thread "What are the criteria for evaluating an ORM for .NET?". There I found a simple answer to the question you asked, and it matches my experience with ORMs while developing projects with Entity Framework and CodeSmith ORM templates.
How does the ORM manage changes in the data model? What if I have to split a table and reallocate current records into 2 new tables?
Some can update the DB automatically to a certain extent, others don't do anything and you'll have to do the dirty work yourself; others provide a framework for handling change that lets you control database updates. That means every couple of days someone needs to spend an hour updating the model to add a table or change data types that are changing.
Ref:
https://softwareengineering.stackexchange.com/questions/6543/what-are-the-benefits-of-using-database-abstraction-by-orm
https://softwareengineering.stackexchange.com/questions/41739/best-arguments-for-against-introducing-orm-technology-into-a-companies-dev-proce/41833#41833

If you ask what the best practice is for upgrading the DB using an ORM, my answer is: don't use it if your application is more than a hobbyist app.
There are a lot of scenarios where many ORMs are unable to support your specific database needs, e.g. creating stored procedures, indexes and views, or even indexed views/materialized tables, without writing SQL scripts. Problems like adding a new non-nullable column to an existing table are much harder to solve in ORM migration code than by writing SQL scripts.
Current tools like Visual Studio Data Tools handle these kinds of problems much better.

EF4, self tracking, repository pattern, SQL Server 2008 AND SQL Server Compact

I am creating a project using Entity Framework 4 and self-tracking entities. I want to be able to get the data either from a SQL Server 2008 database or from a SQL Server Compact database (with the switch being in the config file). I am using the repository pattern, and I will have the self-tracking entities sitting in a separate assembly.
Do I need two edmx files? If so, how do I generate only one set of STEs in the separate assembly? Also, do I need to generate two context classes as well? I am unsure of the plumbing for all this. Can anyone help?
Darren
I forgot to add that the two databases will be identical and that the compact version is for offline usage.

Just to follow up on this. In the end I had to maintain two separate edmx files, one for SQL Server and one for Compact. The main reason is that Compact 3.5 does not support auto identities (as mentioned above by Zeeshan). This in turn led to two context classes. In the context class for SQL Server Compact I had to put code to check for insertions, query the database for the latest id and increment it manually before saving.
Thankfully, with the release of Compact 4.0 this no longer applies, as it supports auto ids, and you can indeed use just one edmx file.
Darren

You need only one edmx file as long as the schema is exactly the same. Just change the connection string and everything should work seamlessly. Though I am not sure how you can say that the schema is the same when Compact edition does not support the identity concept and full-blown SQL Server does. So if you are using features specific to SQL Server that are not available in Compact, you will get runtime errors.
