How to implement schema migration for a PostgreSQL database

I need to implement a schema migration mechanism for PostgreSQL.
Just to remove ambiguity: by schema migration I mean that I need to upgrade my database structures to the latest version regardless of their current state on a particular server instance.
For example, in version one I created some tables, then in version two I renamed some columns, and in version three I removed one table and created another. I have multiple servers, and some of them are on version one, some on version three, etc.
My idea:
Generate a hash of the output produced by
pg_dump --schema-only
every time before I change my database schema. This will be a reliable way to identify the database version to which a patch should apply.
Keep a list of patches along with the hashes they apply to.
When I need to upgrade a database, I will run an application that searches for the hash corresponding to the current database structure (by calculating the hash of the local database and comparing it with the hash set that I have) and applies the associated patch.
Repeat until no matching hash is found.
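As a rough illustration, that loop could be sketched like this. The patch registry, file names, and hash keys are all placeholders invented for the example:

```python
import hashlib
import subprocess

# Hypothetical patch registry: maps the schema hash a patch applies to
# onto the SQL file implementing that patch (entries are placeholders).
PATCHES = {
    "hash-of-v1-schema": "v1_to_v2.sql",
    "hash-of-v2-schema": "v2_to_v3.sql",
}

def schema_hash(dump_text: str) -> str:
    """Hash a `pg_dump --schema-only` dump after light normalization.

    Comments and blank lines are stripped first: pg_dump output can vary
    between pg_dump versions even when the schema itself is unchanged,
    which is one weak side of hashing the raw dump.
    """
    lines = [l for l in dump_text.splitlines()
             if l.strip() and not l.lstrip().startswith("--")]
    return hashlib.sha256("\n".join(lines).encode()).hexdigest()

def dump_schema(dbname: str) -> str:
    result = subprocess.run(["pg_dump", "--schema-only", dbname],
                            capture_output=True, text=True, check=True)
    return result.stdout

def upgrade(dbname: str) -> None:
    """Apply patches until the current hash has no registered successor."""
    while True:
        current = schema_hash(dump_schema(dbname))
        sql_file = PATCHES.get(current)
        if sql_file is None:
            break  # either fully upgraded or an unrecognized schema state
        # -1 runs the patch file in a single transaction.
        subprocess.run(["psql", "-d", dbname, "-1", "-f", sql_file],
                       check=True)
```

Note that the normalization step is the fragile part: any drift in how pg_dump renders an identical schema changes the hash and strands the database in an "unrecognized" state.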
Could you please point out any weak sides of this approach?

Have you ever heard of https://pgmodeler.io? At the company where I work we decided to go with it, since it can perform a schema diff even between a local and a remote database. We are very satisfied with it.
Otherwise, if you would prefer a free solution, you could develop a migration tool that applies migrations you store in a single repo. This tool could rely on a migration table you keep in a separate schema, so that your DB(s) will always know which migrations have or have not been applied.
The beauty of this approach is that migrations can cover both schema changes and data changes.
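A minimal sketch of that idea, assuming migrations are numbered .sql files in a repo and a psycopg2-style connection; the `meta` schema and table names are made up for the example:

```python
import pathlib

# Assumed tracking table, kept in a separate schema as suggested above.
TRACKING_DDL = """
CREATE SCHEMA IF NOT EXISTS meta;
CREATE TABLE IF NOT EXISTS meta.applied_migration (
    filename   text PRIMARY KEY,
    applied_at timestamptz NOT NULL DEFAULT now()
);
"""

def pending(migration_dir: str, applied: set) -> list:
    """Return the migration filenames not yet applied, in filename order."""
    files = sorted(p.name for p in pathlib.Path(migration_dir).glob("*.sql"))
    return [f for f in files if f not in applied]

def apply_pending(conn, migration_dir: str) -> None:
    """Apply missing migrations; `conn` is e.g. a psycopg2 connection."""
    with conn.cursor() as cur:
        cur.execute(TRACKING_DDL)
        cur.execute("SELECT filename FROM meta.applied_migration")
        applied = {row[0] for row in cur.fetchall()}
        for name in pending(migration_dir, applied):
            sql = pathlib.Path(migration_dir, name).read_text()
            cur.execute(sql)  # a schema change or a data change: same path
            cur.execute(
                "INSERT INTO meta.applied_migration (filename) VALUES (%s)",
                (name,),
            )
    conn.commit()
```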
I hope this can give you some ideas.

Related

Best way to backup and restore data in PostgreSQL for testing

I'm trying to migrate our database engine from MsSql to PostgreSQL. In our automated tests, we restore the database to a "clean" state at the start of every test. We do this by computing the "diff" between the working copy of the database and the clean copy (table by table), then copying over any records that have changed and deleting any records that have been added. So far this strategy seems the best fit for us, because not a lot of data changes per test and the database is not very big.
Now I'm looking for a way to do essentially the same thing with PostgreSQL, and I'm considering replicating the approach exactly. But before doing so, I was wondering whether anyone else has done something similar, and what method you used to restore data in your automated tests.
On a side note: I considered using MsSql's snapshot or backup/restore strategy. The main problem with these methods is that I would have to re-establish the DB connection from the app after every test, which is not possible at the moment.
If you're okay with some extra storage, and if you (like me) would rather not reinvent the wheel by writing your own diff-checking code, you should try creating a new DB per run via the template feature of the createdb command (or the CREATE DATABASE statement) in PostgreSQL.
So, for example:
(from bash) createdb todayDB -T snapshotDB
or
(from psql) CREATE DATABASE todayDB TEMPLATE snapshotDB;
Pros:
In theory, always the exact same DB by design (no custom logic)
Replication is a file copy, not a DB restore, so it takes far less time (it doesn't re-run SQL, recreate indexes, restore tables, etc.)
Cons:
Takes 2x the disk space (although the template could live on a low-performance NFS mount, etc.)
For my specific situation, I decided to go back to the original solution, which is to compare the "working" copy of the database with the "clean" copy.
There are three types of changes to handle:
For INSERTed records: find max(id) in the clean table and delete any record in the working table with a higher id.
For UPDATEd or DELETEd records: select all records in the clean table EXCEPT the records found in the working table, then UPSERT those records into the working table.
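Those steps can be expressed as plain SQL. Here is a sketch that generates the statements for one table, assuming an integer primary key `id` and a pristine copy of each table kept in a `clean` schema; both naming choices are assumptions of this example:

```python
def reset_table_sql(table: str, columns: list) -> str:
    """Generate the two cleanup statements for one table.

    Assumes `id` is an integer primary key and that a pristine copy of
    each table lives in a `clean` schema (e.g. clean.users mirrors
    public.users) -- both naming assumptions of this sketch.
    """
    cols = ", ".join(columns)
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c != "id")
    return f"""
-- 1. INSERTed rows: delete anything with an id above the clean maximum.
DELETE FROM public.{table}
 WHERE id > (SELECT COALESCE(MAX(id), 0) FROM clean.{table});

-- 2. UPDATEd or DELETEd rows: every clean row that no longer appears
--    verbatim in the working table is upserted back.
WITH missing AS (
    SELECT {cols} FROM clean.{table}
    EXCEPT
    SELECT {cols} FROM public.{table}
)
INSERT INTO public.{table} ({cols})
SELECT {cols} FROM missing
ON CONFLICT (id) DO UPDATE SET {updates};
"""
```

The ON CONFLICT branch handles updated rows; rows that were deleted outright simply insert. This requires PostgreSQL 9.5+ for ON CONFLICT.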

Using Data compare to copy one database over another

I've used the Data Compare tool to update the schema between the same DBs on different servers, but what if so many things have changed (including data) that I simply want to REPLACE the target database?
In the past I've just used T-SQL: take a backup, then restore onto the target with the REPLACE option (and/or MOVE if the data and log files are on different drives). I'd rather have an easier way to do this.
You can use Schema Compare (also by Red Gate) to compare the schema of your source database to a blank target database (and update), then use Data Compare to compare the data (and update). This should leave the target the same as the source. However, it may well be easier to use the backup/restore method in that case.

Best strategy for db update when updating application

I have a function that initializes my database, creates tables, etc.
Now I am preparing version two of the application, and at the end of this function I added a check for column existence; if the column does not exist, I ALTER the table.
My question is:
To avoid running this check all the time, is it a good idea to put a flag in UserDefaults indicating that the current app is version two, and skip this code if the flag is set?
This seems logical to me, but other opinions are always welcome ;)
You could have a version number table/column in your database that stores the schema version number. Every time you change the schema, increment the number in your application and run the relevant migration code to get from one schema version to the next, updating the stored version in the database as you go.
This answer has a handy way of tracking db schema version numbers without creating a separate table in SQLite
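For SQLite specifically, one table-free place to keep that number is the `user_version` pragma. A rough sketch of the version-number loop, with an invented table and columns:

```python
import sqlite3

# Illustrative migration steps: key N upgrades a version-N database to
# version N+1. The table and columns are invented for the example.
MIGRATIONS = {
    0: "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
    1: "ALTER TABLE users ADD COLUMN email TEXT",
}
LATEST = 2

def migrate(conn: sqlite3.Connection) -> int:
    """Upgrade the database to LATEST, one step at a time.

    SQLite's `user_version` pragma stores an integer in the file header,
    so no separate version table is needed.
    """
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    while version < LATEST:
        conn.execute(MIGRATIONS[version])
        version += 1
        conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()
    return version
```

Running `migrate` on an already-current database is a no-op, so there is no need for an external "already version two" flag: the database itself carries its version.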
Yes, you can use NSUserDefaults to check this. I don't see anything wrong with that.

How to migrate existing data managed with Squeryl?

There is a small project of mine approaching its release, based on Squeryl, a type-safe relational database framework for Scala (a JVM-based language).
I foresee multiple updates after the initial deployment, and the data entered into the database should persist across them. This is impossible without some kind of data migration procedure that upgrades the data for the newer DB schema.
Using old data to test new code also requires compatibility patches.
Currently I use automatic schema generation by the framework, which seems only able to create the schema from scratch; no data persists.
Are there methods that allow easy, formalized migration of data to a changed schema without completely dropping automatic schema generation?
So far I only see an easy way to add columns: dump the old data, provide default values for the new columns, reset the schema, and restore the old data.
How do I delete or rename columns, or change column types or semantics?
If schema generation is not useful for production database migration, what are the standard procedures for conventional manual/scripted redeployment?
There have been several discussions about this on the Squeryl list. The consensus tends to be that there is no real best practice that works for everyone. Having an automated process that updates your schema based on your model is brittle (it can't handle situations like column renames) and can be dangerous in production. Personally, I like the idea of "migrations", where all of your schema changes are written as SQL. There are a few frameworks that help with this, and you can find some of them here. Personally, I just use a light wrapper around the psql command-line utility to do schema migrations and data loading, as it's a lot faster for the latter than feeding the data in over JDBC.

How do you implement schema changes in a NoSQL storage system?

How do you manage a major schema change when you are using a NoSQL store like SimpleDB?
I know that I am still thinking in SQL terms, but after working with SimpleDB for a few weeks I need to make a change to a running database. I would like to change one of the object classes to have a unique id rather than a business name, and since it is referenced by another object, I will also need to update the reference value in those objects.
With a SQL database you would run a set of SQL statements as part of the client software deployment process. Obviously this will not work with something like SimpleDB, because:
There is no equivalent of a SQL UPDATE statement.
Due to the distributed nature of SimpleDB, there is no way of knowing when the changes you have made to the database have "filtered" out to all the nodes running your client software.
Some solutions I have thought of:
Each domain has a version number. The client software knows which version of the domain it should use. Write some code that copies the data from one domain version to another, making any required changes as you go. You can then install new client software that then accesses the new domain version. This approach will not work unless you can 'freeze' all write access during the update process.
Each item has a version attribute that indicates the format used when it was stored. The client uses this attribute when loading the object into memory. Object can then be converted to the latest format when it is written back to SimpleDB. The problem with this is that the new software needs to be deployed to all servers before any writes in the new format occur, or clients running the old software will not know how to read the new format.
It is all rather complex, and I am wondering if I am missing something?
Thanks
Richard
I use something similar to your second option, but without the version attribute.
First, try to keep your changes to things that are easy to make backward compatible; changing the primary key is the worst-case scenario here.
Removing a field is easy - just stop writing to that field once all servers are running a version that doesn't require it.
Adding a field requires that you never write that object using code that won't save that field. If you can't deploy the new version everywhere at once, use an intermediate version that supports saving the field before you deploy a version that requires it.
Changing a field is just a combination of these two operations.
With this approach changes are applied as needed - write using the new version, but allow reading of the old version with default or derived values for the new field.
You can use the same code to update all records at once, though this may not be appropriate on a large dataset.
Changing the primary key can be handled the same way, but could get really complex depending on which NoSQL system you are using. You are probably stuck writing custom migration code in that case.
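The read-with-defaults, write-newest pattern described above might look roughly like this; the field names (display_name, business_name, legacy_flag) are invented for the example, and the store-specific I/O is omitted:

```python
def load_user(item: dict) -> dict:
    """Normalize an item read from the store to the newest shape."""
    user = dict(item)
    # Field added in the newest version: derive it from the old business
    # name (or default it) when reading an old-format record.
    user.setdefault("display_name", user.get("business_name", ""))
    return user

def save_user(user: dict) -> dict:
    """Always write the newest format; upgrade lazily on write."""
    record = load_user(user)
    # Retired field: simply stop writing it once every server tolerates
    # its absence.
    record.pop("legacy_flag", None)
    return record
```

The same two functions can back a one-off batch job that reads and rewrites every record, which is the "update all records at once" variant mentioned above.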
RavenDB, another NoSQL database, uses migrations to achieve this:
http://ayende.com/blog/66563/ravendb-migrations-rolling-updates
http://ayende.com/blog/66562/ravendb-migrations-when-to-execute
Normally these types of changes are handled by your application, which, upon loading a version-X object, converts it to version Y and persists it.