How to migrate existing data managed with Squeryl? - persistence

There is a small project of mine approaching its release, based on Squeryl, a type-safe relational database framework for Scala (a JVM-based language).
I foresee multiple updates after the initial deployment, and the data entered into the database should persist across them. That is impossible without some kind of migration procedure that upgrades existing data to the newer DB schema.
Using old data for testing new code also requires compatibility patches.
Right now I use the framework's automatic schema generation. It seems to be able to create the schema only from scratch, so no data survives.
Are there methods that allow easy, formalized migration of data to a changed schema without completely dropping automatic schema generation?
So far I can see an easy path only for adding columns: dump the old data, provide default values for the new columns, reset the schema, and restore the old data.
How do I delete or rename columns, or change their types or semantics?
If schema generation is not useful for migrating a production database, what standard procedures should I follow for conventional manual/scripted redeployment?

There have been several discussions about this on the Squeryl list. The consensus tends to be that there is no real best practice that works for everyone. Having an automated process update your schema based on your model is brittle (it can't handle situations like column renames) and can be dangerous in production. Personally, I like the idea of "migrations", where all of your schema changes are written as SQL. There are a few frameworks that help with this and you can find some of them here. I just use a light wrapper around the psql command line utility to do schema migrations and data loading, as it's a lot faster for the latter than feeding the data in over JDBC.
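For illustration, one such migration could be a plain SQL file that the wrapper simply feeds to psql; the file name, table, and columns below are made up, and the syntax shown is PostgreSQL's:
-- V002__rename_and_retype.sql (hypothetical file applied by the psql wrapper)
-- Rename a column; generated schemas can't express this, but a hand-written script can.
ALTER TABLE users RENAME COLUMN mail TO email;
-- Change a column's type, converting the existing data in place.
ALTER TABLE users ALTER COLUMN age TYPE integer USING age::integer;
-- Drop a column the new model no longer has.
ALTER TABLE users DROP COLUMN legacy_flag;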

Related

Using SQLAlchemy ORM, Pydantic and Alembic: every model change needs to be reflected in THREE separate places, violates DRY?

So I am learning FastAPI and want to get more experience with relational databases. I am using SQLAlchemy ORM, Pydantic and Alembic; the database is Postgres. One thing I am running into, however, is that when I want to add a single column to a table, I need to change a model, a schema and an Alembic migration to reflect the change. Isn't this a huge violation of DRY, error prone and very hard to maintain in the long run?
Check out SQLModel. It attempts to tackle this exact issue. Instead of defining a database model (SQLAlchemy) and a corresponding Pydantic model, you only define one SQLModel that combines both.
It doesn't change the fact that you need to verify that your Alembic migrations work as intended, though.
The project is still in its early stages, but I find it very promising.
In general, I don't see any benefit in defining the schema twice when you are developing an API. There is, however, the obvious downside of making the entire application much more error prone when you have to repeat yourself for every change in the schema.
With SQLModel you will probably see a substantial reduction in the lines of code, unless you have very special requirements (such as exotic types, multiple layers of validation/conversion, highly complex/nested relationships).

What SQL Server 2017 features are not supported in Entity Framework Core code first?

Our team is thinking of utilizing Entity Framework Core code-first to help model the database. We can have both DB projects and EF models, as per the article Database Projects vs. Entity Framework Database Migrations, utilizing schema compares; we are just trying to figure out what will be the source of truth.
Does Entity Framework support all features in SQL Server SSDT Database Projects?
What features does EF Core 2 not support? (E.g., does it support any of the following: triggers, views, functions, stored procedures, encryption keys, certificates, DB properties (ANSI NULLS, QUOTED IDENTIFIER), partitions?)
I am trying to locate the relevant Microsoft resource.
tl;dr Database Projects are feature-rich, but database-first. Migrations is code-first, but has a very limited built-in set of database features.
For many people it won't be relevant to compare Database Projects and Migrations. They represent two different modes of working with Entity Framework. Migrations is code-first, DP is database-first. Sure, you can use migrations to control the database schema and besides that keep a DP in sync with the generated database to satisfy DBAs (as the link suggests). But both lead their own separate lives and there's no Single Source Of Truth.
So comparing them is useful if you're not yet sure which working mode you're going to choose.
For me the most important difference is that DP will cover all database objects and detect all changes between them when comparing databases. Migrations only detect changes between a database and the mapped model. And the built-in set of options for generating database objects is very limited. For everything beyond that you have to inject SQL statements into the migration code. These statements are your own responsibility: you have to figure out yourself whether a migration needs an ALTER PROCEDURE statement or not (for example). EF won't complain if the script and the database differ in this respect.
This is the main reason why I've never been a great fan of migrations. It's virtually impossible to maintain a mature database schema including storage, file groups, privileges, collations, and what have you.
Another advantage of DP is that they're great in combination with source control. Each database object has its own file and it's very easy to check the change history of each individual object. That's not possible with generated migrations. Indeed, many intermediate changes may never make it to a generated migration.
Of course the obvious advantage of migrations is the possibility to do a runtime check (albeit incomplete) whether the code and the database match. In database-first projects you need to create your own mechanism for that.
EF Core is only an ORM.
1) You should be ready to create all DB objects except tables manually. What I create manually: constraints (defaults as well as check conditions). Since this is code first, there is no need for stored procedures, functions and so on; if you use an ORM, the DB is only storage. Of course, practice matters: for me, default constraints add comfort on tables where I create test data manually, and check constraints are also useful in situations where you do not trust your (team's) code.
2) You will put the creation (and dropping) of views, triggers, stored procedures and so on into the "migration" code (there is such a concept in EF) as plain SQL:
migrationBuilder.Sql("CREATE VIEW ...");
As a result you can have a separate "migration" program (e.g. a command-line tool) that installs or removes both the EF Core tables and your manually created objects, and that applies and reverts the data migrations.
"EF Core migrations" is quite a complex API (reserve a week for learning it). Interesting topics: managing several DbContexts in one database, creating DB objects during a migration from model annotations, and uninstalling. Or find a freelancer for it (this part of the project is well suited to outsourcing).

What are ways to include sizable Postgres table imports in Flyway migrations?

We have a series of modifications to a Postgres database, which can generally be written all in SQL. So it seems Flyway would be a great fit to automate these.
However, they also include imports from files to tables, such as
COPY mytable FROM '${PWD}/mydata.sql';
And secondarily, we'd prefer not to rely on Postgres file paths like this, since the files apparently must reside on the server. It should be possible to run any migration from a remote client -- as in Amazon's RDS documentation (last section).
Are there good approaches to handling this kind of scenario already in Flyway? Or alternate approaches to avoid this issue altogether?
Currently, it looks like it'd work to implement the whole migration in Java and use the Postgres driver's CopyManager to import the data. However, that means most of our migrations have to be done in Java, which seems much clumsier. (As far as I can tell, hybrid Java+SQL migrations are not expected?)
I'm new to Flyway, so I thought I'd ask what other alternatives might exist, since I'd expect importing a table during a migration to be pretty common.
Starting with Flyway 3.1, you can use COPY FROM STDIN statements within your migration files to accomplish this. The SQL execution engine will automatically use PostgreSQL's CopyManager to transfer the data.
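For example, a versioned Flyway migration file could carry the data inline in COPY's text format; the table and rows here are made up:
-- V3__load_country_codes.sql (hypothetical migration file)
-- The rows following COPY ... FROM STDIN are tab-separated inline data in
-- COPY's default text format, terminated by a line containing only \.
CREATE TABLE IF NOT EXISTS country (code char(2) PRIMARY KEY, name text NOT NULL);
COPY country (code, name) FROM STDIN;
US	United States
DE	Germany
\.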

Entity Framework without a DB?

Is it possible to use Entity Framework 4.3 without linking the model to an actual DB in the back-end?
I need to build a conceptual model of a database in the VS designer and then I'd like to manually handle fetches, inserts and updates to various back-end databases (horrible legacy systems). I need to be able to do this without EF moaning about not having tables mapped, etc. I realise that this is a very odd thing to want to do...
The reason for this is that we would like to move from these legacy systems into a well designed data model and .NET environment, but we need to still maintain functionality and backward compatibility with the old systems during development. We will then reach a stage where we can import the old data (coming from about 6 different databases) into a single DB that matches the EF model I'm building. In theory, we should then be able to switch over from the hacked up EF model to a proper EF model matching the new data structure.
Is this viable? Is it possible to use the EF interface, with LINQ without actually pointing it to a database?
I have managed to query the legacy systems by overriding the generated DbContext and exposing IQueryable properties which query the old systems. My big fight now is with actually updating the data.
If I am able to have EF track changes to entities but not actually save those changes, I should be able to override the SaveChanges() method on the context to manually insert into the various legacy tables.
I'm sort of at wits end with this issue at the moment.
UPDATE 4 Sept 2012: I've opted to use the EDMX file designer to build the data model and I generate the code using T4. This enables me to then manually write mapping code to suit my needs. It also allows me to later perform a legacy data migration with relative ease.
If I were in your situation, I'd set up the new DB server and link the legacy servers to it. Then create stored procedures to interface with EF for the INSERTs/UPDATEs/DELETEs. This way your EF code remains separate from the legacy-support messiness. As you decommission the legacy DB servers, you can update your stored procedures accordingly. Once you have no more legacy DB servers, you can either continue using your sprocs or refresh your EF data connection to use the table schema directly.
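As a sketch of that idea, such a procedure might simply forward the write to a table on a linked legacy server; the linked server, database, and column names below are invented:
-- Hypothetical wrapper procedure mapped to EF's insert function; LEGACY01 is an assumed linked server.
CREATE PROCEDURE dbo.Customer_Insert
    @Name nvarchar(100),
    @Email nvarchar(255)
AS
BEGIN
    SET NOCOUNT ON;
    -- For now the write goes to the legacy system; later the body can target the new schema instead.
    INSERT INTO LEGACY01.LegacyCrm.dbo.Customers (Name, Email)
    VALUES (@Name, @Email);
END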
Entity Framework exists to link entities to a data store without manual population.
Otherwise you're just using classes with LINQ.
If you mean you don't want a separate data store like SQL Server, Mongo, etc., then just let your application create the database as an .mdb file that gets bundled in your App_Data folder. That means you don't need a database server, so to speak, and the database is part of your app.
If on the other hand you want a different way to save to the database, you can create your own data adapters to behave however you like. The mongo .net entity framework component is an example of this.
Alternatively, using code only, you can just use stored procedures to persist to the database, which can be a bit verbose and annoying with EF, but it could bridge the gap for you and allow you to build a good architecture with the model you want that gets translated into the crappy one in your repositories.
Then, when the new database is ready, you can just rework your repos to use SaveChanges and you're done.
This will of course only work with the code only approach.

Relation between ER Modelling and Database normalization

How is database normalization related to ER modelling?
What comes first?
Or should both be implemented at the same time?
I feel modeling should come first in a highly normalized database design.
Creating the model allows you to think through how the tables will relate to one another and also allows you to envision what tables you'll need to use when writing your join queries.
A tool such as MySQL Workbench or Toad Data Modeler, depending on your target database vendor, can even generate the SQL commands to build the tables, constraints, and indexes directly from the model. This is useful because it ensures the tables are created exactly as you designed them.
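For instance, forward-engineering a simple two-table model might emit DDL along these lines (table and column names are illustrative):
-- Illustrative MySQL DDL of the kind a modeling tool can forward-engineer
CREATE TABLE customer (
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL,
    PRIMARY KEY (id)
);
CREATE TABLE customer_order (
    id INT NOT NULL AUTO_INCREMENT,
    customer_id INT NOT NULL,
    placed_at DATETIME NOT NULL,
    PRIMARY KEY (id),
    INDEX idx_customer_order_customer (customer_id),
    CONSTRAINT fk_customer_order_customer
        FOREIGN KEY (customer_id) REFERENCES customer (id)
);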
Also, when making changes to the model, some tools like those mentioned above will even allow you to "update" your schema by issuing the statements required to do so.
So in short, for a project with more than one table, I'd always model it first. It also makes it easier for developers to understand how the tables function and relate at a glance rather than having to read through DDL to understand it.
Modeling can even be fun!
(The answer originally included a screenshot of a model created with MySQL Workbench.)
Hope this helps!