Database migrations: manage with build script or automatic on app startup? - deployment

I'm in the process of developing a deployment system for a new web app and I'm wondering where the best point in the process to manage database migrations is (the question of how to do the migrations is another problem entirely).
It seems there are two ways to go:
1. Use a migration script that can either be run manually from the command line or as part of the automatic deployment/build process.
2. Run the migrations when the app starts up (I'm using ASP.NET, so this can be done easily enough without causing a long-running user request).
Does anyone have any suggestions/insight/experience with these approaches? Any other suggestions?
I can see why #1 might be more attractive - it gives me complete control over when the DB is updated. However, I quite like #2 as it allows me to quickly iterate between deployments and reduces the manual process. #2 could also be used on my development machine to allow even quicker iterations. Hmm, starting to think having both might be a good thing...

We have a sales-force system with ~100 clients, and we update the database at application startup (true, ours is a desktop application). I like this approach: it's safe and iterative when the starting point is not known in advance (is the client database new, or only updated to version x.y.z?).
On the server side, though, I prefer your option #1: we create a SQL script file on our virtual machine (based on a copy of the original database) and run this script against the real server.
So IMHO:
Disconnected clients: startup, iterative scripts
Server: query created on VM based on the actual and real database
I'm interested in this problem too, and have found some (half-)frameworks such as RikMigrations. After some googling, here is a good starting point on DB versioning/migration frameworks: .NET Database Migration Tool Roundup. Not necessarily the documentation, but the team blogs can be interesting.
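The startup approach described in this answer (bring a database from whatever version it is at up to the latest) can be sketched roughly as follows. This is only an illustration in Python with SQLite, not how RikMigrations or any other framework works; the `schema_version` table, the `MIGRATIONS` list and the function names are made up for the example.

```python
import sqlite3

# Hypothetical ordered migrations: (version, SQL). All names are illustrative only.
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE customer ADD COLUMN email TEXT"),
    (3, "CREATE INDEX idx_customer_email ON customer (email)"),
]


def current_version(conn):
    """Return the recorded schema version, or 0 for a brand-new database."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate(conn):
    """Apply every migration newer than the recorded version, in order."""
    applied = []
    start = current_version(conn)
    for version, sql in MIGRATIONS:
        if version <= start:
            continue  # this step was already applied on an earlier start
        conn.execute(sql)
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        conn.commit()  # record the step so a later start resumes after it
        applied.append(version)
    return applied
```

Calling `migrate` on a new database applies all three steps; calling it again applies nothing, which is what makes it safe to run on every startup.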

I like option #1 better as it seems much more flexible. In lieu of actually performing migrations on each app start, I think I would verify that the database schema (version number?) matches the code, and if not, throw a warning or error about a mismatched database schema.
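A startup check of this kind might look like the sketch below. It assumes SQLite and stores the version in `PRAGMA user_version`; `EXPECTED_SCHEMA_VERSION`, `SchemaMismatchError` and `check_schema_version` are hypothetical names for illustration, and another database would need its own place to keep the version number.

```python
import sqlite3

EXPECTED_SCHEMA_VERSION = 3  # hypothetical: the version this build of the code expects


class SchemaMismatchError(RuntimeError):
    """Raised when the database schema version differs from what the code expects."""


def check_schema_version(conn, expected=EXPECTED_SCHEMA_VERSION):
    """Compare the stored schema version with the expected one; never modify the database."""
    # SQLite keeps a free-form integer in PRAGMA user_version (0 if never set).
    actual = conn.execute("PRAGMA user_version").fetchone()[0]
    if actual != expected:
        raise SchemaMismatchError(
            f"database schema is at version {actual}, code expects {expected}"
        )
    return actual
```

The migration script (option #1) would be the only thing that writes the version; the application only reads it and refuses to continue on a mismatch.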

I'd prefer option #1 for a number of reasons. First, integration tests usually require your DB schema to be up to date, and launching a website just to upgrade the schema would be a huge waste of time. Second, with startup migrations you cannot change the database schema while your site is running (say, to add a couple of indexes to speed things up).
As for the production side of things, upgrading your database in a transactional, MSI-style installation is much better than attempting to upgrade at each app startup, since with the latter you can end up with desynchronized database and application versions.
And if you're looking for the migration framework, take a look at Wizardby.

If the application ever has to run on a customer's machine, then migrating at startup can prevent a lot of support calls, assuming you can do a seamless migration without user intervention (I hope you aren't normally running your web app with permission to modify the database).
If the application always runs under your control, automatic migration is less of an issue, but it can still be a good feature, especially if you want to minimize downtime and manual deployment steps.


Wildfly won't deploy when datasource is unavailable

I am using wildfly-8.2.0.Final.
There are several databases that I have to connect to. However, some of them are only used for certain functionality in the web application, and they do not need to be online all the time. So when WildFly starts, some of the datasources may not be online. However, a failed connection to any datasource causes WildFly not to deploy the .war deployment, and I cannot find any way to solve this problem. Is there a way?
UPDATE:
I have a single table on a remote database server. The user will be able to query the table via my web application. The thing is, I have almost no control over that database. When the web application starts, the database could be offline, and this would cause my web application to fail to start. What I want is to be able to run queries on the remote database if it is online. If it is offline, the web page could fail or the query could be canceled. The one thing I don’t want is for my web application to be limited by a remote database that I may have no control over.
My previous solution was a workaround. I would run queries on the remote database via a local database which has a foreign table pointing to the remote one. However, on PostgreSQL 9.5 the local database reads all data from the remote table before applying any constraints. As the remote table has a large number of rows and I am using lazy loading, a single query takes very long, which defeats the whole purpose of lazy loading.
I found a similar question, but there is no answer.
On WildFly, you can configure the datasource so that it tries to reconnect periodically when it is disconnected. In my case, however, the initial deployment has to succeed for this to be helpful.
The deployment will fail if it references those datasources.
Alternatively, you could define those datasources but leave them disabled.

Sitecore MongoDB not creating all database/collections

We are working on Sitecore deployment in Azure.
Sitecore Experience Platform 8.0 rev. 160115
MongoDB - 3.0.4
We installed MongoDB, and we can connect to localhost using Robomongo. We can only see “Analytics” database/collections.
Our connection strings are set up as follows:
Connectionstring.config
But the other 3 databases and collections are not created.
Tracking.live
Tracking.history
Tracking.contact
In Sitecore.Analytics.config file – the setting “Analytics.Enabled” is set to true.
Sitecore.Analytics.config
In the log we found some references to xDB Cloud initialization failures, so we disabled it.
Are we missing any configurations? Any help or suggestions are appreciated.
Thank you
Keep in mind that MongoDB is schemaless. Of course, in a production environment you would probably have to create these databases manually - to ensure that access rights are assigned correctly. But in a development environment, any database can be created on the fly.
The only reason the analytics database was created for you is because Sitecore creates indexes for the Interactions collection. Otherwise, you wouldn't see this database until xDB wrote some data into it. Same goes for any MongoDB collection - those won't appear until there's either data being written or an index created.
The other three databases will be created once the aggregation/processing logic is executed. I.e. when your instance starts to actually collect and process visit data.
As a conclusion, don't worry about these databases missing (for now). Just verify that xDB functionality is working properly.

mongodb i/o timeout when using clustered mongo instances

I have an application that is using the upper.io/db package for communication with a Mongo database server (which is a fairly simple wrapper around gopkg.in/mgo.v2). The way the application works is that it creates a session in the main thread on start-up, and then each individual goroutine that needs to make requests to the Mongo server calls Clone on the session and does a deferred Close on the resulting value. As far as I can tell, this is all standard operating procedure.
This setup works without any errors in our development environments, where we are either using a locally run MongoDB or a sandbox instance on MongoLab. Recently we promoted the application up to our staging environment, where the application talks to a Shared Cluster instance of MongoDB on MongoLab (the cheapest $15 option). This is where the weirdness starts happening. The /first/ request that goes through (from the first goroutine that gets invoked) comes back with the expected response, but the subsequent ones all return
read tcp <ip address>:47112: i/o timeout
This happens both from our local development machines pointed at the cluster and from the AWS host for the staging environment. Since the Mongo cluster is from MongoLab, I am going to assume that they've configured everything correctly on their end.
The code is somewhat boring TBH: It literally just opens the session in the main function and maintains a reference to it, and then there are multiple goroutines with this basic structure:
sess := session.Clone()
defer sess.Close()
// make requests to Mongo
During testing, I even restricted it to run only one thing at once (i.e. only one goroutine is active at any given time), and it still fails in the same fashion.
Has anybody run into this before? Do I need to configure upper.io/db in a specific fashion? Maybe use mgo directly? I am at my wits' end with this :(
In a rather long and grueling process, we finally tracked down where this issue and similar ones like it came from in our program. It ended up being a session leak in the v1 version of the upper.io/db library. The bug and fix are outlined here, but the v1 version of this library is horribly outdated at this point and the later versions do not exhibit this issue.
I doubt this answer will be useful for anybody so late in the game (especially since we ourselves solved it like.. 3 years ago at this point), but just wanted to leave the answer here for completeness.

chef mongodb user_management (create admin and other users)

I'm relatively new to Chef and am in the process of using the edelight mongodb cookbook. I've got the process of actually creating a standalone MongoDB instance working fine. What I'm struggling with is how to use the subsequent user_management recipe to create the initial admin user and regular users.
When I add "default['mongodb']['config']['auth'] = true" to the attributes/default.rb file, and run the mongodb::default recipe, the db is created and authentication is on.
However when I run the mongodb::user_management recipe I get this error every time. Clearly I'm doing something wrong, but being new to editing chef/ruby files I can't determine what's failing. Looks like I might need to work within the users.rb attribute file?
===================================================
Error executing action add on resource 'mongodb_user[admin]'
NameError
uninitialized constant Mongo::MongoClient
The edelight cookbook has been unmaintained for quite some time now. Chef-Brigade is attempting to take over maintenance on the cookbook until a new owner can be found.
https://github.com/chef-brigade/mongodb-cookbook
Work is under way to fix some of the user_management issues. I am not 100% sure of the current state of those fixes, but you would likely be better off starting with that cookbook and reporting any issues to the team there so they can work to resolve them. There is active development taking place.
I would be glad to help you debug the issue if it persists on the chef-brigade flavor of the cookbook as we can actively make changes to resolve any issues.

How to deploy changes to a Cassandra CQL schema

We have an application which uses Cassandra for its database. How should we deploy schema changes in a live production environment?
In development we are just blowing the database away and recreating it with a 'database.cql' script kept in version control. This clearly isn't a solution in production.
In the relational world I would either use a sequence of upgrade scripts and apply them in order, or use a tool to interactively compare the staging and production databases and make the appropriate schema changes.
How do I solve the same problem in Cassandra?
Here's one I've started and have been using for a while.
https://github.com/heartysoft/aedes
It supports multiple environments and versioning. Since we're Windows based, it's mainly PowerShell, but there's no reason a bash script couldn't be written to do the equivalent. The PowerShell script itself is extremely simple. It requires PowerShell v3+. Usage is pretty easy:
aedes.ps1 192.168.40.4 [-u username -p password -env dev]
will look for schema files in the ..\schema folder. Schema files are expected to have a n_ prefix. Environment specific files have a .env.cql postfix. So, if the files are:
1_people.dev.cql
1_people.prod.cql
2_people_some_indexes.cql
3_jobs.dev.cql
3_jobs.prod.cql
4_jobs_something_changed.cql
and you run it for prod, then the files ending in .prod.cql and the files with no environment in their name will be applied in order. You can also specify a $start version to say where to start applying from (e.g. if start is specified as 3, then anything with 1_ and 2_ will be skipped).
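The selection rule just described can be sketched in a few lines. This is a Python re-statement of the rule for illustration only, not the actual aedes PowerShell code; the regular expression and the `select_schema_files` function are assumptions based on the naming convention above.

```python
import re

# Matches "<n>_<name>.cql" or "<n>_<name>.<env>.cql".
# The pattern is an assumption derived from the file names listed above.
_SCHEMA_FILE = re.compile(r"^(\d+)_([^.]+)(?:\.([^.]+))?\.cql$")


def select_schema_files(filenames, env, start=1):
    """Return the files to apply for `env`, ordered by numeric prefix."""
    selected = []
    for name in filenames:
        match = _SCHEMA_FILE.match(name)
        if match is None:
            continue  # not a schema file
        version, file_env = int(match.group(1)), match.group(3)
        if version < start:
            continue  # below the requested start version
        if file_env is not None and file_env != env:
            continue  # belongs to a different environment
        selected.append((version, name))
    return [name for _, name in sorted(selected)]
```

With the six files listed above, `select_schema_files(files, "prod")` returns `1_people.prod.cql`, `2_people_some_indexes.cql`, `3_jobs.prod.cql` and `4_jobs_something_changed.cql`, in that order; passing `start=3` leaves only the last two.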
It's pretty basic but seems to work quite well. We just have Cassandra downloaded (not installed) on the "applier machine" (which could be your machine, i.e. not part of a cluster) and have cqlsh on the PATH for easier application. Did (and do) have plans for more features, but working nicely as is for the time being.
Since there wasn't an existing tool, I ended up writing one.
It is called cql-migrate, and provides incremental updates to a deployed Cassandra schema.
[update] Since writing this, I have found a couple more options: one for Rails and another for Go