This is a question for those of you developing on a team of devs where all of you have separate databases. You're versioning your database using source control and other tools which will automatically bring dev databases up to date to the latest version of the database (schema, data, SP's, functions, etc.).
OK Great! But wait! What if you are developing on version 4.0 of your software, but now you need to switch branches to the 3.2 branch to fix a bug? The schema could be (almost assuredly is) very different by now...
I suppose if you went through the extra effort to write rollback scripts along with your change scripts, this could work. But that seems like a lot of work - is it really worth it?
Much easier would be to create a new 3.2-branch database and work with that while working on the 3.2-branch code. It doesn't seem reasonable to me to require that each developer has exactly one database to work with.
I'm going out on a limb and assuming that you are versioning the database as a binary? If all your database assets were in the form of constructive code (e.g. SQL scripts and/or text data dumps), the solution would be simple, as suggested by Mark: store these assets as part of the development branch. To work on version 3.2, switch the branch, re-run the create scripts and presto, 3.2 database. Merging would be just as easy as with regular code (or just as painful, depending on your version control system of choice).
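To make the "re-run the create scripts" step concrete, here is a minimal sketch. It assumes the branch keeps its schema and data as `.sql` files in one directory, that running them in file-name order is correct, and it uses SQLite only so the example is self-contained; the function name `build_database` is made up for illustration.

```python
import sqlite3
from pathlib import Path


def build_database(script_dir: str, db_path: str = ":memory:") -> sqlite3.Connection:
    """Create a fresh database by running every .sql file in name order.

    Assumption: file names sort into a valid execution order,
    e.g. 001_create_users.sql before 002_data_users.sql.
    """
    conn = sqlite3.connect(db_path)
    for script in sorted(Path(script_dir).glob("*.sql")):
        # executescript runs all statements contained in the file.
        conn.executescript(script.read_text(encoding="utf-8"))
    conn.commit()
    return conn
```

After switching to the 3.2 branch you would point this at that branch's script directory and a new database file, leaving the 4.0 database untouched.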
Here are some suggestions to work in this mode:
If creating the database instances from text is too slow, make a cache on a shared disk volume, keyed by the contents of all the schema / data files (or the MD5 sum thereof).
Write a pre-commit hook to ensure that the schema and data dumps in the developer's instance are the same as the ones under version control. This prevents people from making changes to their dev database with an interactive tool, and then forgetting to commit them.
You mention change scripts; treat them as a liability. While they may be required by your deployment scenario (e.g. for customers who want to upgrade in place), they duplicate information from the version history of the database, and per Murphy's law duplication means desynchronization sooner or later. Try to auto-generate the change scripts from the versioned database assets using "diff"; or, if this cannot be achieved, dedicate some serious unit tests to database upgrades.
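The cache suggestion above needs a key that changes whenever any schema or data file changes. A rough sketch of one way to compute it, assuming all assets are `.sql` files under a single directory (the name `schema_cache_key` is hypothetical):

```python
import hashlib
from pathlib import Path


def schema_cache_key(asset_dir: str) -> str:
    """Return an MD5 hex digest over the paths and contents of all .sql files."""
    digest = hashlib.md5()
    root = Path(asset_dir)
    for path in sorted(root.rglob("*.sql")):
        # Include the relative path so that renaming a file changes the key too.
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

A prebuilt database instance could then be stored on the shared volume under that key and reused by anyone whose checkout produces the same digest.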
I've been using MongoDB for about a year now, however not nearly up to its potential.
I've been developing new software, out of anyone's eyes except my own, and I've enjoyed the flexibility of the database to its fullest and I've made major structural changes to data on the fly.
Now I'm at a point where I have production server(s) and 3 development servers, I'm having a real problem with changing data structures and syncing them up.
Theoretically the development servers should always have the most current data from production. In a structured database, if I rename something, I can just run a compare tool and do the corresponding change in production after a pull. In MongoDB, this can become incredibly difficult: there could be hundreds of changes from document to document, much less from database to database.
I've been reviewing my ~/.dbshell file to get a feel for the changes I've made, but what about changes made within the program itself? Configuration database changes?
Are there tools or procedures that are around to make this easier?
I've spent hours on Google researching how others do it. I came across Mongeez, but it's more manual and tedious than I need. In the past, I just did a mongodump and mongorestore inside of a git directory to transport data, but these snapshots are too rigid. I read a few blog posts regarding moving new data from production to development, but nothing about updating development documents in production. I could write a comparison script, but I feel like this is reinventing the wheel. There has to be a better way.
TL;DR: What are some ways to version NoSQL data, new entries and changed data, between environments?
I had a similar problem/experience while managing a few production Mongo machines for about a year.
Two quick pieces of advice:
WiredPrairie is right. Version your documents and that will allow you to migrate in a casual/relaxed manner. I wish we had done that up front. One of my biggest regrets.
We used Groovy to connect and do our schema/data changes and I loved it. The language is easy to learn and it works great with JSON. My practice was to back up the collections I'd be operating on, write the scripts in dev, run them and if I messed up, restore the backed up collections. Iterate until I got the scripts perfect and then repeat in production.
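For what "version your documents" can look like, here is a small sketch in Python using plain dicts in place of Mongo documents. The field name `schema_version` and the two example migrations are assumptions made up for illustration, not something from the setup described above.

```python
CURRENT_VERSION = 3


def _v1_to_v2(doc: dict) -> dict:
    # Hypothetical change: the field "fullname" was renamed to "name".
    doc["name"] = doc.pop("fullname", "")
    return doc


def _v2_to_v3(doc: dict) -> dict:
    # Hypothetical change: every document gains a "tags" list.
    doc.setdefault("tags", [])
    return doc


# Maps a document version to the function that upgrades it by one step.
MIGRATIONS = {1: _v1_to_v2, 2: _v2_to_v3}


def upgrade(doc: dict) -> dict:
    """Return a copy of doc upgraded step by step to CURRENT_VERSION."""
    doc = dict(doc)
    # Documents written before versioning existed are treated as version 1.
    version = doc.get("schema_version", 1)
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version += 1
        doc["schema_version"] = version
    return doc
```

Calling `upgrade` whenever a document is read lets old and new documents coexist, so the migration can happen gradually instead of in one big batch.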
Hi, I have configured the basics of CruiseControl to make releases and run automated NUnit tests using just MSBuild. Now I'm wondering if it is possible to deploy/version databases with this?
I'm a beginner at CCNet, so any suggestions or tutorials (if there are any) would be welcome. Also, if someone knows a free tool for database deployment/versioning, let me know. I will be grateful.
Thanks in advance
Hugh
It isn't free, but SQL Source Control from RedGate can do what you're looking for, assuming it's a SQL Server database. It has a command-line interface that you can use in CCNet tasks. The easy approach of just migrating up is... easy: the changes are applied to your database schema and data. There was an issue with v2.x of the tool that they've overcome in v3: if you renamed a table column, it would delete the column and create a new one with the right name. Obviously that's quite a big problem if you've got data you want to keep, so v3 adds the concept of migrations, which lets you specify alter scripts so that instead of dropping the column you can script the change non-destructively.
As far as I know, at this time, they don't have anything that allows you to roll back your version.
Otherwise you could take a look at database migration tools; there seemed to be some promise for these in .NET at least. There is also this post that has some other tools (again for .NET), and then there's this search, https://stackoverflow.com/search?q=database+migration+tool, which is not restricted to any language but covers database migrations in general.
If you're still looking for ways to version and migrate databases, one such tool is dbdeploy.net . I've hosted it on github after forking it and doing some work. Latest version is fully up to date and has some interesting features (done by someone who also uses it and sent a pull request).
For a long time now, we've held our data within the project's repository. We just held everything under data/sql, and each table had its own create_tablename.sql and data_tablename.sql files.
We have now just deployed our 2nd project onto Scalr and we've realised it's a bit messy.
The way we deploy:
We have a "packageup" collection of scripts which tear apart the project into 3 archives (data, code, static files) which we then store in 3 separate buckets on S3.
Whenever a role starts up, it downloads one of the files (depending on the role: data, nfs or web) and then a "unpackage" script sets up everything for each role, loads the data into mysql, sets up the nfs, etc.
We do it like this because we don't want to save server images, we always start from vanilla instances onto which we install everything from scratch using various in-house built scripts. Startup time isn't an issue (we have a ready to use farm in 9 minutes).
The issue is that it's a pain trying to find the right version of the database whenever we try to setup a new development build (at any point in time, we've got about 4 dev builds for a project). Also, git is starting to choke once we go into production, as the sql files end up totalling around 500mb.
The question is:
How is everyone else managing databases? I've been looking for something that makes it easy to take data out of production into dev, and also migrating data from dev into production, but haven't stumbled upon anything.
You should seriously take a look at dbdeploy (dbdeploy.com). It is ported to many languages, the major ones being Java and PHP. It is integrated in build-tools like Ant and Phing, and allows easy sharing of so called delta files.
A delta file always consists of a deploy section, but can also contain an undo section. When you commit your delta file and another developer checks it out, he can just run dbdeploy and all new changes are automatically applied to his database.
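To illustrate the idea (this is not dbdeploy's own code), here is a minimal Python sketch of applying numbered delta files and recording them in a changelog table. It assumes files are named `<number>_<description>.sql`, that the undo section starts at a `--//@UNDO` marker line, and it uses SQLite purely to keep the example self-contained; the function and table names are made up.

```python
import sqlite3
from pathlib import Path

# Assumed marker separating the deploy section from the optional undo section.
UNDO_MARKER = "--//@UNDO"


def split_delta(text: str) -> tuple[str, str]:
    """Split a delta file into (deploy_sql, undo_sql); undo may be empty."""
    deploy, _, undo = text.partition(UNDO_MARKER)
    return deploy.strip(), undo.strip()


def delta_number(path: Path) -> int:
    # Assumption: names look like "3_add_email_column.sql".
    return int(path.name.split("_", 1)[0])


def apply_deltas(conn: sqlite3.Connection, delta_dir: str) -> list[int]:
    """Apply every delta not yet recorded in the changelog; return their numbers."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS changelog "
        "(change_number INTEGER PRIMARY KEY, description TEXT)"
    )
    applied = {row[0] for row in conn.execute("SELECT change_number FROM changelog")}
    newly_applied = []
    for path in sorted(Path(delta_dir).glob("*.sql"), key=delta_number):
        number = delta_number(path)
        if number in applied:
            continue
        deploy_sql, _undo_sql = split_delta(path.read_text(encoding="utf-8"))
        conn.executescript(deploy_sql)
        conn.execute("INSERT INTO changelog VALUES (?, ?)", (number, path.name))
        conn.commit()
        newly_applied.append(number)
    return newly_applied
```

Running `apply_deltas` a second time is a no-op, which is what makes it safe for every developer to run it after each checkout.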
I'm using dbdeploy for my open source blog, so you can take a look on how delta files are organized: http://site.svn.dasprids.de/trunk/sql/deltas/
As I understand it, your main question is about other people's experience in migrating SQL data from dev into production.
I use Microsoft SQL Server instead of MySQL, so I am not sure that you can use my experience directly. Nevertheless, this way works very well.
I use Visual Studio 2010 Ultimate edition to compare data in two databases. The same feature also exists in Visual Studio Team Edition 2008 (or Database edition). You can read http://msdn.microsoft.com/en-us/library/dd193261.aspx to understand how it works. You can compare two databases (dev and prod) and generate a SQL script for modifying the data. You can easily exclude some tables or some columns from the comparison. You can also examine the results and exclude some entries from script generation. So you can easily and flexibly generate scripts which can be used for deploying the changes to the database. You can compare the data of two databases separately from the structure (schema comparison). So you can refresh data in dev with the data from prod, or generate scripts which bring the prod database up to the latest version of the dev database. I recommend you look at these features and at some products from http://www.red-gate.com/ (like http://www.red-gate.com/products/SQL_Compare/index.htm).
Check out Capistrano. It's a tool the Ruby community uses for deployment to different environments, and I find it really useful.
Also, if your deployment is starting to choke, try a tool Twitter built called Murder.
Personally I'd look at Toad
http://www.toadworld.com/
Less than 10k ;) ... will analyse database structures, produce scripts to modify them and also will migrate data.
One part of the solution is to capture the version of each of your code modules and their corresponding data resources in a single location, and compare them to ensure consistency. For example, an increment in the version number of your, say, customer_comments module will require a corresponding SQL delta file to upgrade the relevant DB tables to the equal version number for the data.
For an example, have a look at Magento's core_resource approach as documented by Alan Storm.
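A very small sketch of the comparison step, assuming you can read each module's code version and the version recorded for its data into two dictionaries (the function name and the sample module names are illustrative only):

```python
def pending_upgrades(code_versions: dict, data_versions: dict) -> dict:
    """Map each out-of-sync module to (recorded data version, code version).

    A module with no recorded data version is reported with None.
    """
    return {
        module: (data_versions.get(module), code_version)
        for module, code_version in code_versions.items()
        if data_versions.get(module) != code_version
    }


code = {"customer_comments": "1.3.0", "catalog": "2.0.1"}
data = {"customer_comments": "1.2.0", "catalog": "2.0.1"}
print(pending_upgrades(code, data))
```

Every module that shows up in the result needs a delta file that takes its tables from the recorded data version to the code version.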
Cheers,
JD
I am curious if there are any solutions out there, preferably free, that can have a central database to publish data to in a versioned manner.
For example,
Client 1 decides to edit a person's profile, so it gets a local copy on its machine to make changes to. When they are happy with their edit, they publish the results to the central database, just like you would do a submit in Perforce.
Client 2 tries to edit the same local copy but when they go to submit they have to resolve conflicts.
The central database must store compressed differences between versions of the data.
At any point someone can look at all versions of the data submitted.
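The workflow in this example boils down to optimistic concurrency: each submit names the revision it was based on, and a stale base is rejected. A minimal in-memory Python sketch follows; the class and method names are invented for illustration, and it stores full copies of each version rather than the compressed differences asked for above.

```python
import copy


class ConflictError(Exception):
    """Raised when a submit is based on an out-of-date revision."""


class VersionedStore:
    def __init__(self):
        self._history = {}  # key -> list of every submitted value, oldest first

    def checkout(self, key):
        """Return (revision, private copy of the latest value or None)."""
        versions = self._history.get(key, [])
        latest = copy.deepcopy(versions[-1]) if versions else None
        return len(versions), latest

    def submit(self, key, base_revision, value):
        """Store value as a new revision if base_revision is still current."""
        versions = self._history.setdefault(key, [])
        if base_revision != len(versions):
            raise ConflictError(
                f"{key}: based on revision {base_revision}, "
                f"but the latest is {len(versions)}"
            )
        versions.append(copy.deepcopy(value))
        return len(versions)

    def history(self, key):
        """Return copies of all versions ever submitted for key."""
        return copy.deepcopy(self._history.get(key, []))
```

A client that gets a `ConflictError` would check out again, merge its changes into the newer copy, and resubmit.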
Check out OffScale DataGrove.
This product tracks changes to the entire DB - schema and data. You can tag versions in any point in time, and return to older states of the DB with a simple command. It also allows you to create virtual, separate, copies of the same database so each team member can have his own separate DB. All the virtual copies are tracked into the same repository so it's super-easy to revert your DB to someone else's version (you simply check-out their version, just like you do with your source control). This means all your DBs can always be synchronized.
Disclaimer - I work at OffScale :-)
"Version control of databases" is a bit ambiguous for a title, because you are actually asking for a VCS using a database as repository "data store".
Subversion has such a model (either Berkeley DB or filesystem-based).
It also has a Copy-Modify-Merge model which is similar to the kind of locking mechanism you are describing.
(source: red-bean.com)
The SQL tools from Redgate offer some of this functionality, but not implemented in the way you describe. For example, SQL Data Compare can compare the data in two databases, and SQL Source Control can be used as well.
However, getting a copy of the database on a local machine, making changes and resubmitting would be more of a manual process.
What database server are you using? If you are using MySQL and PHP, Doctrine has 'Versionable' behavior which can be applied to a model.
The documentation on this behavior is here:
http://www.doctrine-project.org/projects/orm/1.2/docs/manual/behaviors/en#core-behaviors:versionable
This is exactly what my product (yes I'm biased :)) DBmaestro Teamwork does.
It enforces and keeps track of changes to structure and content
It prevents two parallel changes to an object's structure or content by two users (as long as they work on the same object, meaning same database, same schema, ...)
It uses a baseline-aware analysis which understands the nature of the change and knows whether the change should be promoted, should be ignored (as it was made from another environment), or is a conflict
And much more…
I would encourage you to read a comprehensive, unbiased review on Database Enforced Management Solution by veteran Database expert Ben Taylor which he posted on LinkedIn https://www.linkedin.com/pulse/article/20140907002729-287832-solve-database-change-mangement-with-dbmaestro
I was overseeing branching and merging throughout the last release at my company, and a number of times had to modify our Subversion pre-commit hooks to enforce different requirements on check-in comments and such. I was a bit nervous every time I was editing those files, because (a) they're part of a live production system, albeit only used internally (and we're not a huge organization), and (b) they're not under version control themselves.
I'm curious what sort of fail-safes people have in place on their version control infrastructure. Daily backups? "Meta" version control? I suppose the former is in place here as part of the backup of the whole repository. And the latter would be useful as the complexity of check-in requirements grows...
Natch - the version-control and any other infrastructure code is also under version-control but I would use a separate project from any development project.
I prefer a searchable wiki or similar knowledge-base repository to clogging up your bug-tracking system with things like VCS config.
Most importantly, make sure that the documentation is kept up to date. In my experience, people are vastly better at keeping code docs up to date than admin docs, although this may have been down to the individuals concerned. One thing that is often overlooked is that, if systems are configured according to standard Unix practices or a similar philosophy, that implies a body of knowledge about locations that may not be familiar to an OS X or Windows programmer who is suddenly faced with fixing a broken script. Without being condescending, make sure basic assumptions about location and interdependency are documented.
You should document all "setup" configuration for all your tools, and these documents should be checked into version control. For tools with text-file configurations which allow comments, you could just check in the config file. But for tools that require using the interface, you should have a full document with images of the dialog boxes showing which options are chosen.
Most importantly though, these documents should say WHY you have set the values chosen (when not taking the default).
Second, as backup, the same documents should be included in your bug tracking software under a "How do I setup the version control software?" bug. (The bug tracking database is located on a different physical server, right?)
Third, all of this should be backed up off-site. I'm sure there are questions on SO about backup strategies.
What's wrong with using the same version control repository for the commit hooks and other configuration files? That's how I've handled it in the past when I've been responsible for a project's configuration management.
You should also back up your svn repository. That way if the repository itself becomes corrupted or the server catches fire or something, you can recover both your project and the svn control files.
If you have build scripts that are doing this (such as NAnt), then you could check those in as well.