How to deploy database, source and binary changes in one patch?

I'm part of a development team that works on many CMS-based projects, using systems like Joomla and Drupal.
In our development process, all of our code changes are managed in Git. At the end of a sprint, we create a diff that we can apply via patch to the live site.
The problem is that most of the time, the changes include:
* Database schema changes
* Database data changes
* Source code changes
* Binary file changes (like images)
Git diff handles source code changes beautifully. Binary files are not included in the diff, except for a note that the files have changed.
Database schema changes and database data changes are a mess.
I was wondering whether a unified patch system exists that could be used to deploy all of these changes in one patch.
So the question is: "Is there a system that can be used to deploy all of these changes in one shot?"
Ideally, this system would support a dry run, as patch does, but for all four types of change.
Edit:
Thank you everyone for the feedback that you provided, it was a starting point for my research in this area.
Here is what I found so far:
* It's difficult to deploy PHP-based applications using the Linux packaging system because changes to the project happen iteratively rather than as releases.
* It would be possible to use dbconfig to deploy changes to a project, but the problem is generating MySQL database diffs (schema and data).
* What is really missing for deploying PHP-based applications is a deployment manager that would be installed on the server and act as the interface for deploying the patches.
I started a Google Wave on this topic and produced a lot of information as a result.
If anyone is interested in reading this wave, please let me know and I will add you.

To handle installation and upgrades of our application, we use the Debian packaging system (.deb packages).
Context:
We are building a J2EE + Flex application, shipped and administered through a VPN.
So not so far from your situation.
Fresh installs and upgrades from one version to another are done through Puppet (a system for automating system administration tasks; it installs our .deb).
In the .deb we have:
* our compiled source code
* the schema of the database (handled by [db-config][1])
* binary files
* instructions for installing through apt all the other applications needed (MySQL, Tomcat ...)
= everything needed for a fresh install
We also add what is needed to go from one version to another:
* the script for upgrading the database (one for each version)
* new binaries
* new things to launch at machine start (e.g. some weeks ago we added an ActiveMQ server)
=> Once the .deb is built correctly, we can install or upgrade seamlessly in one operation (it runs automatically, without any prompt).
There is one .deb per release; each .deb has a version number and a signature.
You can pick any of our .debs and do a fresh install, or upgrade from the current version to the version it holds.
The .deb is built by our continuous integration system (we build a .deb every hour, as if we were about to release a new version).
What are the benefits?
* Install / upgrade automatically, with confidence.
* Roll back a version.
* Dry runs are natively supported.
In your precise case
* Database Schema Changes
* Database Data Changes
* Source Code changes
* Binary file changes (like images)
Database => you will have to write migration scripts, one for each version (e.g. 1.2-update.sql, 1.3-update.sql).
Source code and binaries => add them, and state in which version they have to be copied/used.
Edit: I'm not sure about source code. We do this with compiled code...
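As a rough illustration of the "one migration script per version" idea, here is a minimal Python sketch that picks which update scripts to run when upgrading from an installed version to a target version. The function names are hypothetical, and the only thing taken from the answer is the `<version>-update.sql` naming scheme; how the package's upgrade hook would actually call this is not covered.

```python
import re

# File names are assumed to follow the "<version>-update.sql" scheme,
# e.g. 1.2-update.sql and 1.3-update.sql.
SCRIPT_RE = re.compile(r"^(\d+(?:\.\d+)*)-update\.sql$")


def parse_version(text):
    """Turn '1.10' into (1, 10) so versions compare numerically, not as text."""
    return tuple(int(part) for part in text.split("."))


def scripts_to_run(filenames, installed, target):
    """Return update scripts for versions in (installed, target], oldest first."""
    low, high = parse_version(installed), parse_version(target)
    selected = []
    for name in filenames:
        match = SCRIPT_RE.match(name)
        if not match:
            continue  # ignore files that do not follow the naming scheme
        version = parse_version(match.group(1))
        if low < version <= high:
            selected.append((version, name))
    return [name for _, name in sorted(selected)]


if __name__ == "__main__":
    available = ["1.2-update.sql", "1.10-update.sql", "1.3-update.sql", "README"]
    print(scripts_to_run(available, installed="1.2", target="1.10"))
```

Comparing versions as tuples of integers avoids the usual trap where "1.10" sorts before "1.3" as a string.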
Some links to start :
https://wiki.ubuntu.com/PackagingGuide/Complete
http://www.debian.org/doc/manuals/maint-guide/index.fr.html#contents ( in french )
[1]: http://pwet.fr/man/linux/formats/dbconfig dbconfig
[1]: http://www.debian.org/doc/FAQ/ch-pkg_basics.en.html debian

I don't think you'll find a fail-safe mechanism.
I recommend that, when possible, you keep schema/data changes compatible with the currently published source.
This way you can make a very simple tool that runs database scripts committed to a particular svn location (you don't want a diff of database changes, because if you need further modifications you need different statements).
With the above done, you can have a simple command that runs the database changes, then the binary and source code changes.
For the database there is also the option of schema and data comparison tools. These could be used to compare environments and make sure nothing unexpected is missing from the change scripts. They could also generate the change scripts but, as I said, you really want to make sure they won't break the current source.
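A minimal sketch of such a "very simple tool", under stated assumptions: scripts are passed in as (name, sql) pairs rather than read from a checked-out directory, SQLite stands in for the real database so the example is self-contained, and the `applied_scripts` tracking table is a hypothetical name. It also shows one way a dry run could work for the database part.

```python
import sqlite3


def apply_changes(conn, scripts, dry_run=False):
    """Apply (name, sql) pairs not yet recorded in applied_scripts, in name order.

    Returns the names that were applied, or that would be applied in a dry run.
    Creating the empty tracking table is the only write a dry run performs.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS applied_scripts (name TEXT PRIMARY KEY)")
    done = {row[0] for row in conn.execute("SELECT name FROM applied_scripts")}
    pending = [(name, sql) for name, sql in sorted(scripts) if name not in done]
    if dry_run:
        return [name for name, _ in pending]
    for name, sql in pending:
        conn.executescript(sql)  # run the change script itself
        conn.execute("INSERT INTO applied_scripts (name) VALUES (?)", (name,))
        conn.commit()  # record it so a rerun skips this script
    return [name for name, _ in pending]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    scripts = [
        ("002_add_email.sql", "ALTER TABLE users ADD COLUMN email TEXT;"),
        ("001_create_users.sql", "CREATE TABLE users (id INTEGER PRIMARY KEY);"),
    ]
    print(apply_changes(conn, scripts, dry_run=True))  # lists both, changes nothing
    print(apply_changes(conn, scripts))                # applies both in order
    print(apply_changes(conn, scripts))                # nothing left to apply
```

Because each script is recorded once it has run, the same command can be repeated safely on every environment.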

You can create a tool to do the migrations painlessly, something similar to PeopleSoft's Patch Upgrade Assistant.
It is basically a standalone executable that reads an "Upgrade Template" and carries out tasks. The upgrade template declaratively describes the upgrade tasks, or "steps". The steps could be: copy (for backing up or moving precompiled objects like classes and other binaries), database (for altering schema elements), and SQL scripts (for loading or transforming current data). The steps can carry some predicate logic: if this condition holds, do this; otherwise skip it and go to the next step.
The template is usually an XML file. It also provides for manual steps, with instructions for manual actions. Each step also specifies whether it is recoverable, and the tool validates whether the step has succeeded.
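To make the idea of declarative steps concrete, here is a small Python sketch of a step runner. It is not the PeopleSoft tool or its template format: the `Step` fields and `run_template` are hypothetical names, the template is a plain Python list instead of XML, and a recoverable failure is simply logged before continuing, where a real tool would more likely retry or prompt.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    name: str
    action: Callable[[Dict], None]   # does the work, mutating the shared state
    verify: Callable[[Dict], bool]   # confirms that the step succeeded
    when: Callable[[Dict], bool] = lambda state: True  # predicate: run or skip
    recoverable: bool = True


def run_template(steps: List[Step], state: Dict) -> List[str]:
    """Run steps in order and return a log of what happened to each one."""
    log = []
    for step in steps:
        if not step.when(state):
            log.append(f"skip {step.name}")
            continue
        step.action(state)
        if step.verify(state):
            log.append(f"ok {step.name}")
            continue
        log.append(f"failed {step.name}")
        if not step.recoverable:
            break  # stop here: later steps must not run after this failure
    return log


if __name__ == "__main__":
    state = {"schema_version": 1}
    template = [
        Step("backup files", lambda s: s.update(backup=True), lambda s: s.get("backup") is True),
        Step(
            "alter schema",
            lambda s: s.update(schema_version=2),
            lambda s: s["schema_version"] == 2,
            when=lambda s: s["schema_version"] < 2,
        ),
        Step(
            "load sample data",
            lambda s: s.update(sample=True),
            lambda s: s.get("sample") is True,
            when=lambda s: s.get("load_sample_data", False),
        ),
    ]
    for line in run_template(template, state):
        print(line)
```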
It may be possible to start an open source project around this requirement, which is quite common.

You need to save the git commit objects in a local file and then import them into the other repo/branch.

Related

An early versioned migration is no longer valid SQL in an upgraded version of Postgres

In testing an upgrade to our Postgres database, we've discovered that one of our oldest versioned migration files is no longer valid SQL. This isn't an issue for the production database which (of course) has those migrations already in the schema_history_table, but standing up any new sandboxes is now made impossible by this broken V file.
What's the best way to bring an old V file into the modern world without forever orphaning our production database?
Off the top of my head I can think of a few possible options:
* Configure Postgres to enable compatibility with the previous version. I'm no expert at this, but I think there are some options here.
* Just modify the historic migration scripts so they now work with the new version. This will mean that you can't stand up old versions any longer, but does this matter to you? I think that you'll need to run flyway repair after you do this, as Flyway will detect that the files have been tampered with.
* Create a parallel set of scripts, one for each version, putting them in different folders. Then use the flyway.locations option to specify different folders depending on the version of the target.
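For the third option, the folder choice could be made by a small wrapper before Flyway is invoked. The sketch below only computes a value for flyway.locations; the folder names, the version numbers and the function itself are assumptions for illustration, and detecting the server's major version is left out.

```python
def flyway_locations(server_major, folders_by_min_major):
    """Pick the script folder for a Postgres major version.

    folders_by_min_major maps the lowest major version a folder supports to
    that folder; the folder with the highest minimum not above the server's
    version wins.
    """
    eligible = [major for major in folders_by_min_major if major <= server_major]
    if not eligible:
        raise ValueError(f"no migration folder supports Postgres {server_major}")
    return "filesystem:" + folders_by_min_major[max(eligible)]


if __name__ == "__main__":
    # Hypothetical layout: one full set of scripts per supported server range.
    folders = {9: "sql/pg9", 12: "sql/pg12"}
    print(flyway_locations(11, folders))
    print(flyway_locations(14, folders))
```

The returned string can then be passed to Flyway as the value of flyway.locations for that target.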

What's the best way to upgrade from umbraco 7.6 to 7.15.1 (including db upgrade)

I am trying to upgrade the site from v7.6 to v7.15.1.
I have done the upgrade on localhost, which included updating the db.
Now I have transferred my files from localhost to the test site, and there I am getting an error in the log:
ERROR Umbraco.Core.UmbracoApplicationBase - An unhandled exception occurred
System.Data.SqlClient.SqlException (0x80131904): Invalid object name 'umbracoUserLogin'.
and I can't log in to the backoffice.
It seems to be looking for umbracoUserLogin on test, which doesn't exist yet because the db on test has not been updated.
How do I update the db on test in this case, when the files have already been updated on localhost and transferred to the test site?
I have done 2 Umbraco upgrades recently: one from 7.5.7 to 7.13.1 and the most recent one from 7.13.1 to 7.15.1.
During my upgrade I saw this problem, and the fix in this issue may help you. (I didn't see the problem again after redoing the upgrade, this time checking all the automatically changed files and accepting them one at a time; see the details below.) But coming back to your question, "What's the best way to upgrade from umbraco 7.6 to 7.15.1 (including db upgrade)", here are the steps that you should follow:
1) Create a backup of your project and your Umbraco db before you start. If you are using Git, this will be super easy.
2) Open the NuGet Package Manager for your Umbraco project and do the package upgrade using the NuGet Package Manager window or the console. In your case, search for UmbracoCms version 7.15.1.
3) Once you start the upgrade, you will see some popup windows asking you to approve automatic file changes (including some config file changes). As you don't want to lose your pre-upgrade settings, don't accept them all or discard them all; check them one by one. As a general rule, if you don't have any custom changes in a file, simply approve the change; otherwise, check your changes, make sure you don't lose anything, and discard the file change where necessary.
4) Once you're done with the UmbracoCms upgrade (which will automatically upgrade some dependency packages), build your project and make sure all is looking good, then go to your local project's Umbraco back-office URL. This will trigger the rest of the Umbraco upgrade process; complete the upgrade steps by following the screens. At this point your Umbraco db changes will be made automatically. You might have issues with old, corrupt cached files; if this happens, simply delete the App_Data/TEMP files and the umbraco.config file in App_Data and try again. If you see other problems during the installation, check the logs (browser developer tools can be handy for understanding the problems here) and fix them one at a time. It is possible that you don't need some of your old web.config settings and that they cause issues; comment out those lines and see whether this fixes them.
5) Once you are done with your local upgrade, deploy your code to your testing environment, go to the Umbraco URL of your test environment and follow the screens to complete the installation there. If you see any problems, please check my notes for step 4 above.
6) Do the Umbraco upgrade for your other testing environments (QA, UAT, Training etc.) and complete your upgrade tests. Once the tests are done, you are ready to go live. After the live deployment, you will have to complete the Umbraco upgrade one last time, this time for the live system.
7) Always take backups of each environment before you do the upgrade, so you will be ready to roll back your changes if things go wrong (which might happen, as you're doing a big Umbraco upgrade).
Final note: there are some good articles on this; please take a look to understand the process better. Good luck!

How to create VCS-like conflict-merging file?

I'm trying to generate "unresolved-conflict"-like files with no luck.
I checked the diff manpage and googled about diff, merging, etc., but I only found information about how to handle these files, not about how to actually generate them.
To be clear, what I am trying to do is, given two similar files, generate a single automatically merged one, similar to what most VCSs like Git or Subversion generate for files in "conflict" status.
The main goal is to be able to rapidly edit it to manually resolve all differences, just as I do in Git or Subversion, but without having the files in any VCS.
I "almost" succeeded by generating full diffs with the diff -C 1000000 command (I won't have very large files, so that context limit is acceptable)...
...but the resulting file comes with ALL rows modified. That is, each row is prefixed with "-" or "+" (depending on whether it comes from the first or second file) or " " (space) for common rows.
What I want to obtain is an "almost unchanged" file with sections like the following example emphasizing the differences:
<<<<<<<< File1
Section from File1
Foo
========
Section from File2
Bar
>>>>>>>> File2
EDIT:
Answering #s.m.'s comment, I explain my exact goal here (because it is too long to explain in a comment):
I'm working on a server to host multiple PostgreSQL clusters acting as hot standbys of distinct masters.
I have already implemented binary full/incremental backups (Bacula) on the production servers and also have a helper script to configure hot-standby servers.
But currently we have to set up (and maintain, and ideally periodically check) all of them one by one.
To make this simpler, we are planning to create a single (or possibly multiple) "super" hot-standby server(s) containing multiple clusters replicating different master servers.
My goal is to have a single script that creates a new standby cluster easily, without complicated tuning and without having to bother about backup setup (because all clusters will be backed up at once).
I have almost finished that script: it creates a new cluster on a free port, adjusts the needed configuration parameters and puts it in sync with the master.
These adjustments are made on top of "default" configuration files, but some masters may have special configuration parameters (especially memory adjustments) that must be replicated on the standby (because otherwise it could be unable to replicate some operations of the master). There is also pg_hba.conf, which defines which users/servers are allowed to connect, and which we also want to replicate on the standby (for an eventual failover).
So, to make it easier (and less error prone) to merge both configuration files, I implemented a bash function to retrieve the configuration files from the masters and, now, my goal is to merge them with the aforementioned "default-tuned" one.
This way, adding a new standby would be as easy as executing our script with the master's network name and reviewing the automatically merged files to manually resolve the few differences encountered in the merge.
EDIT 2:
To be clear, what I was trying to do, in order of preference, is:
1) Approach it using just GNU diff (as #s.m. pointed out in his comment), even with complex arguments or by piping to external tools usually available on most Unix-like systems, so I can wrap it in a bash function and use it in my script without any dependencies.
2) Use some existing tool (and not reinvent the wheel).
3) Implement my own tool and use it.
Without a better solution, I finally implemented my own tool (which I called 'humandiff') to approach it.
I published it on GitHub and uploaded it as an npm package, so I can now install it from npm on production servers.
Even so, it needs a little setup to be installed. That is:
1) Install NodeJS and NPM (sudo apt-get install nodejs-legacy npm on Debian-like systems).
2) Install humandiff itself (sudo npm install -g humandiff).
Usage and output examples can be found in the README file, so I won't elaborate further here.
I post this answer just in case someone happens to have the same problem but, anyway, better solutions would be welcome too.
Edit: I forgot to say, even though it's pretty obvious, that in fact I didn't implement any diff algorithm at all. I just noticed that, given the position and offset metadata provided by GNU diff and one of the original files, it is possible to construct the other file, or the merged file I was searching for, so I simply implemented a wrapper to do so. But, instead of calling the GNU diff binary, I found a module also named "diff" in the NPM repository that served me for the same task.
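The same wrapper idea can be sketched in a few lines of Python. This is not humandiff's code: it uses the standard library's difflib in place of GNU diff or the npm "diff" module, and the function name and labels are made up for the example. It walks the diff's position metadata, copies common lines unchanged and wraps each differing region in the markers from the example in the question.

```python
import difflib


def merge_with_markers(lines_a, lines_b, label_a="File1", label_b="File2"):
    """Return one merged list of lines; differing regions get conflict markers."""
    merged = []
    matcher = difflib.SequenceMatcher(a=lines_a, b=lines_b, autojunk=False)
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag == "equal":
            merged.extend(lines_a[a1:a2])  # common lines are copied once
            continue
        # 'replace', 'delete' and 'insert' all become a two-sided conflict;
        # one side is simply empty for pure insertions or deletions.
        merged.append(f"<<<<<<<< {label_a}")
        merged.extend(lines_a[a1:a2])
        merged.append("========")
        merged.extend(lines_b[b1:b2])
        merged.append(f">>>>>>>> {label_b}")
    return merged


if __name__ == "__main__":
    first = ["common header", "Foo", "common footer"]
    second = ["common header", "Bar", "common footer"]
    print("\n".join(merge_with_markers(first, second)))
```

To use it on real files, read each one with `splitlines()` and write the joined result to the file you want to edit by hand.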

Puppet - recognize new build versions and deploy

I have a puppet master that sources my application builds into a master folder, e.g. xxxxx_v1.0.0.zip and yyyyy_v1.0.8.zip [xxxxx gets deployed to one set of servers and yyyyy to another set of servers].
What is the best way to handle sourcing new versions of my application builds on the puppet master without editing the .pp files on the master to reference the new build number in the filename? Preferably, it should be automatic.
Thanks
A good way is to build a suitable package for your operating system instead. Puppet can use those with
package { 'application-x': ensure => latest }
Failing that, you can solve this
* on the agent side, by fetching your application metadata from somewhere, e.g. with an exec of wget, then having it run a script to perform the deployment if necessary
* on the master side, using an ENC like the Puppet Dashboard or, better yet, Hiera to hold your latest version information
If you really want to do this through Puppet's fileserver without touching any metadata and just dropping the files in your modules, you can try with the generate function.
$latest_zip_application_x = generate("/usr/local/bin/find_latest application_x")
file { 'application_x.zip':
  ...
  source => "puppet:///modules/application_x/path/to/$latest_zip_application_x",
}
where /usr/local/bin/find_latest is a script that will find the most recent version of your package and write it to stdout.
This is pretty horrible practice though - you are really not catering to Puppet's strengths with constructs like these.
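The find_latest script is not shown in the answer. A hypothetical version in Python might look like the sketch below, assuming file names follow the `<application>_v<version>.zip` pattern from the question and that the directory to scan is passed as a second argument (an addition for this example). It writes the name without a trailing newline, on the assumption that the caller uses the output verbatim in a path.

```python
import re
import sys
from pathlib import Path


def find_latest(names, application):
    """Return the file name with the highest version for application, or None."""
    pattern = re.compile(re.escape(application) + r"_v(\d+(?:\.\d+)*)\.zip")
    candidates = []
    for name in names:
        match = pattern.fullmatch(name)
        if match:
            # Compare versions numerically so 1.0.10 beats 1.0.8.
            version = tuple(int(part) for part in match.group(1).split("."))
            candidates.append((version, name))
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Hypothetical usage: find_latest <application> <directory>
        application, directory = sys.argv[1], sys.argv[2]
        latest = find_latest([p.name for p in Path(directory).iterdir()], application)
        if latest is None:
            sys.exit(f"no build found for {application}")
        sys.stdout.write(latest)
    else:
        sample = ["xxxxx_v1.0.0.zip", "xxxxx_v1.0.10.zip", "yyyyy_v1.0.8.zip"]
        print(find_latest(sample, "xxxxx"))
```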

Storing third-party framework/middleware into source control that needs to alter your compiler/IDE

I know there are posts that ask how one stores third-party libraries into source control (such as this and this). While those are great answers, I still can't find the answer to this:
How do you store third-party middleware/framework binaries that need to alter your compiler / IDE for the library to work properly? Note: for my needs, I don't need to store the middleware source; I only store header files / lib / JAR, so that it's ready to be linked.
Typically, you simply link libraries to your app, and you are good. But what about middleware / frameworks that need more?
Specific examples:
Qt moc pre-processor.
ZeroC Ice Slice (ice) compiler (similar to CORBA IDL preprocessor).
Basically these frameworks/middleware need to generate their own code before your application can link to it.
From the developer's point of view, ideally they want to just check out, and everything should be ready to go. But then the IDE/compiler will not be set up properly yet, so the compilation will fail.
What do you think?
Back up everything, including the setup of the IDE, operating system, etc. This is what I do:
1) Store all 3rd party libraries in source control. I have a branch for all the libraries.
2) Back up the entire tool chain which was used to build. This includes every tool. Each tool is installed into the same directory on each developer's computer, which makes it simple to set up a developer's machine remotely.
3) This is the most hardcore, but prepare 1 perfect developer IDE setup which is clean, then make a VMware / VirtualPC image out of it. This will be useful when you can't get the installers to work in future.
I learned this lesson the painful way because I often have to wade through Visual Studio 6 code which doesn't build properly.
I think that a better solution is to make sure that the build is self-contained and downloads all necessary software for itself unless you tell it otherwise. This is the way Maven works, and it is really handy. The downside is that it sometimes needs to download an application server or similar, which is highly impractical, but at least the build succeeds and it becomes the new developer's responsibility to improve the build if needed.
This does of course not work well if your software needs attended installs, but I would try to avoid any such dependencies in any case. You can add alternative routes (e.g. the Ant script compiles the code if Eclipse hasn't done it yet). If this is not feasible, an alternative is to fail with a clear indication of what went wrong (e.g. "'CORBA_COMPILER_HOME' not set, please set and try again").
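The "fail with a clear indication" route can be as small as a pre-build check. The Python sketch below is one possible shape, not part of any particular build tool: the function and exception names are made up, and CORBA_COMPILER_HOME is simply the variable from the example above.

```python
import os


class MissingToolError(RuntimeError):
    """Raised when a tool the build depends on cannot be located."""


def require_tool_home(variable, environ=None):
    """Return the directory named by an environment variable or fail clearly."""
    environ = os.environ if environ is None else environ
    value = environ.get(variable, "").strip()
    if not value:
        raise MissingToolError(f"'{variable}' not set, please set it and try again")
    if not os.path.isdir(value):
        raise MissingToolError(f"'{variable}' points to '{value}', which is not a directory")
    return value


if __name__ == "__main__":
    try:
        home = require_tool_home("CORBA_COMPILER_HOME")
    except MissingToolError as error:
        print(f"build cannot start: {error}")
    else:
        print(f"using compiler from {home}")
```

Running the check before any compilation starts means a new developer sees one actionable message instead of a wall of compiler errors.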
All that said, the most complete solution is of course to ship everything with your app (i.e. OS, IDE, the works), but I doubt that is applicable in the general case. How would you feel about that type of requirement just to build a software product? It also limits people who want to adapt your software to new platforms.
What about adding 1 step?
A NAnt script which is started with a .bat file. The developer would only have to execute one .bat file; the .bat file could start NAnt, and the NAnt script could be made to do anything you need.
This is actually a pretty subtle question. You're talking about how to manage features of the environment which are necessary in order to allow your build to proceed. In this case it's the top level of your code toolchain, but the problem can be generalised to include the entire toolchain, and even key aspects of the operating system.
In my place of work, we have various requirements of the underlying operating system before our code will successfully run. This includes machine-specific configurations as well as ensuring correct versions of system libraries and language runtimes are present. We've dealt with this by maintaining a standard generic build machine image which contains the toolchain requirements we need. We can push this out to a virgin machine and get a basic environment that contains the complete toolchain and any auxiliary programs.
We then use fsvs to version control any additional configuration, which can be layered on to specific groups of machines as needed.
Finally, we use custom scripts hooked in to our CI server (we use Hudson) to perform any pre-processing steps required for specific projects.
The main advantages of this approach for us are:
* We can build and deploy developer and production machines very easily (and have IT handle this side of the problem).
* We can easily replace failed machines.
* We have a known environment for testing (we install everything to a simulated 'production server' before going live).
* We (the software team) version control critical configuration details and any explicit pre-processing steps.
I would outsource the task of building the middleware to a specialized build server and only include the binary output as regular 3rd party dependencies under source control.
Whether this strategy can be applied successfully depends on whether all developers need to be able to change middleware code and recompile it frequently. But this issue could also be solved via a continuous integration server like TeamCity that allows private builds.
Your build process would look like the following:
1) Middleware repo containing middleware code
2) Build server, building middleware
3) Push middleware build output to project repository as 3rd party references
Update: This doesn't really answer how to modify the IDE. It's just a sort-of Maven replacement thingy for C++/Python/Java. You shouldn't need to modify the IDE to build stuff; if you do, you need a different IDE or a system that generates/modifies IDE files for you. (See CMake for a cross-platform C/C++ project file generator.)
I've written a system (first in Ant/Beanshell at two different places, then rewritten in Python at my current job) where third-party libraries are compiled separately (by someone), stored and shared via HTTP.
Somewhat hurried description follows:
On start, the build system looks through all modules in the repo and executes each module's setup target, which downloads the specific version of a third-party lib or app that the current code revision uses. These are then unzipped, added to PATH/INCLUDE etc. (or, for small libs, copied to a single directory for the current repo), and then Visual Studio is launched with /useenv.
Each module's file checks for the stuff that it needs, including anything that needs installing and licensing, such as Visual Studio, Matlab or Maya, which must be on the local computer. If that's not there, the cmd file will fail with a nice error message. This way, you can also check that the correct version is installed.
So there are a number of directories on the local disk involved. %work% needs to be set using a global environment variable, preferably on a different disk than system or source checkout, at least if doing heavy C++.
%work% <- local store for all temp files, unzip, and for each working copy's temp files
%work%/_cache <- downloaded zips (2 GB)
%work%/_local <- local zips (for development or retrieved in other manners while travelling)
%work%/_unzip <- unzips of files in _cache (10 GB)
%work%/_content <- textures/3d models and other big files (synchronized manually, this is 5 GB today, not suitable for VC either)
%work%/D_trunk/ <- store for working copy checked out to d:/trunk
%work%/E_branches/v2 <- store for working copy checked out to e:/branches/v2
So, if trunk uses Boost 1.37 and branches/v2 uses 1.39, both boost-1.39 and boost-1.37 reside in /_cache/ (as zips) and /_unzip/ (as raw files).
When starting Visual Studio using the bat file d:/trunk/BuildSystem/Visual Studio.cmd, INCLUDE points to /_unzip/boost-1.37, while when running e:/branches/v2/BuildSystem/Visual Studio.cmd, INCLUDE points to /_unzip/boost-1.39.
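The directory scheme above can be sketched as two small helpers. This is an illustration of the layout as described, not the actual build system: the function names are hypothetical, the list of pinned libraries would in practice come from each module's setup target, and the downloading and unzipping steps are omitted.

```python
from pathlib import PureWindowsPath


def store_dir(work, checkout):
    """Map a checkout such as d:/trunk to its store under work, e.g. work/D_trunk."""
    path = PureWindowsPath(checkout)
    drive = path.drive.rstrip(":").upper()
    parts = path.parts[1:]  # drop the drive anchor, keep the directories
    if not drive or not parts:
        raise ValueError(f"expected a path like d:/trunk, got {checkout!r}")
    return PureWindowsPath(work, f"{drive}_{parts[0]}", *parts[1:])


def build_env(base_env, work, pinned):
    """Return a copy of base_env whose INCLUDE points at the pinned unzipped libs."""
    env = dict(base_env)
    includes = [PureWindowsPath(work, "_unzip", name) for name in pinned]
    env["INCLUDE"] = ";".join(str(path) for path in includes)
    return env


if __name__ == "__main__":
    work = "w:/work"  # stands in for the %work% environment variable
    print(store_dir(work, "d:/trunk"))
    print(store_dir(work, "e:/branches/v2"))
    print(build_env({}, work, ["boost-1.37"])["INCLUDE"])
```

Because the unzipped libraries live in one shared place keyed by name and version, two working copies that pin different Boost versions only differ in the environment they are launched with.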
In the repo, only a small set of bootstrap binaries need to be stored (i.e. wget and 7z).
We currently download about 2 GB of packed data, which is unzipped to 10 GB (pdb files are huge!), so keeping this out of source control is essential. Having this system allows us to keep the repo size small enough to use a DVCS such as Mercurial (or Git) instead of SVN, which is very nice. (I'm thinking of using Mercurial's bigfiles extension or file sharing instead of a separately HTTP-served directory.)
It works flawlessly. Developers need only to check out, set an environment variable for their local cache, then run Visual Studio via a specific batch file in the repo. No unzipping or compiling or stuff. A new developer can set up their computer in no time. (Installing Visual Studio takes an order of magnitude more time.)
The first run on a new computer takes some time, but after that it's fast, only a few seconds. Downloads/unzips are shared on the local computer, so checking out additional branches/versions does not occupy more space. Working offline is also possible; you just need to get the zip files manually if new ones have been uploaded. (This mechanism is essential for testing new versions/compilations of third-party libraries.)
The basics are in a repo on bitbucket but it needs more work before it's ready for the public. Apart from doc and polish, I plan to:
* extend it to use cmake instead of raw vcproj-files, to make it more cross-platform.
* script the entire process from checkout/download of third-party packages to building and zipping them (including storing the download in a local repo) ... currently that's on my dev computer. Not good. Will fix. :)
As for moc, we use Qt's Visual Studio add-in, which stores this in the .vcproj files. Works well. I do think that CMake is one of the best answers for this though