We have an application which is using Cassandra for its database. How should we deploy schema changes in a live production environment.
In development we are just blowing the database away and recreating it with a 'database.cql' script kept in version control. This clearly isn't a solution in production.
In the relational world I would either use a sequence of upgrade scripts and apply them in order, or use a tool to interactively compare the staging and production databases and make the appropriate schema changes.
How do I solve the same problem in the Cassandra?
Here's one I've started and have been using for a while.
https://github.com/heartysoft/aedes
It supports multiple environments, and versioning. Since we're Windows based, it's mainly powershell, but there's no reason a bash script couldn't be written to do the equivalent. The powershell script itself is extremely simple. It requires Powershell v3+. Usage is pretty easy:
aedes.ps1 192.168.40.4 [-u username -p password -env dev]
will look for schema files in the ..\schema folder. Schema files are expected to have a n_ prefix. Environment specific files have a .env.cql postfix. So, if the files are:
1_people.dev.cql
1_people.prod.cql
2_people_some_indexes.cql
3_jobs.dev.cql
3_jobs.prod.cql
4_jobs_something_changed.cql
And run it for prod, then the ones with .prod.cql and no "env" .cql will be applied in order. You can also specify a $start version that can be used to specify where to start applying from (e.g. if start is specified as 3, then anything with 1_ and 2_ will be skipped).
It's pretty basic but seems to work quite well. We just have Cassandra downloaded (not installed) on the "applier machine" (which could be your machine, i.e. not part of a cluster) and have cqlsh on the PATH for easier application. Did (and do) have plans for more features, but working nicely as is for the time being.
Since there wasn't an existing tool, I ended up writing one.
It is called cql-migrate, and provides incremental updates to a deployed Cassandra schema.
[update] Since writing this, I have found a couple more options: one for for rails and another for go
Related
We have a database project that uses a dacpac to deploy schema changes and also allows a pre-deployment and post-deployment script.
However, we frequently have to run one-off scripts and security would prefer that developers not have write access in prod (we do not have DBA role at this time). I'm trying to find a solution that would work with azure devops to store one-time run scripts in git, run the script if it has not been run before, and not run the script the next time the pipeline runs. We'd like this done through devops so the SP has access to run the queries and not the dev, and anything flowing through the pipe has been through our peer review process, plus we have record of what was executed.
I'm looking for suggestions from anyone who has done this or is aware of any product which can do this.
Use liquibase. Though I would have it as part of my code base you can also use it from the CLI and run your scripts using that tool.
Liquibase keeps track of what SQL files you have published across deployments so you can have multiple stages say DIT, UAT, STAGING, PROD and it can apply the remaining one off SQL changes over time.
Generally unless you really need support, I doubt you'd need the commercial version. The opensource version is more than sufficient for my system needs and I have a relatively complex system already.
The main reason I like liquibase over other technologies is it allows for SQL based change sets. So the learning curve is a lot lower.
Two tips:
don't rely on the automatic computation of the logicalFilePath, explicitly set it even if it is repeating yourself. This allows you to refactor your scripts so instead of lumping everything into a single folder you may group them later on.
Name your scripts with the date first. That way you can leverage the natural sorting order.
I've faced a similar problem in the past:
Option 1
If you can afford to have an additional table in your database to keep track of what was executed or not, your problem can be easily solved, there is a tool which helps you: https://github.com/DbUp/DbUp
Then you would have a new repository let's call it OneOffSqlScriptsRepository and your pipeline would consume this repository:
resources:
repositories:
- repository: OneOffSqlScriptsRepository
endpoint: OneOffSqlScriptsEndpoint
type: git
Thus you'd create a pipeline to run this DbUp application consuming the scripts from the OneOffSqlScripts repository, the DB would take care of executing the scripts only once (it's configurable).
The username/password for the database can be stored safely in the library combined with azure keyvaults, so only people with the right access rights could access them (apart from the pipeline).
Option 2
This option assumes that you wanna do everything by using only the native resources that azure pipelines can provide.
Create a OneOffSqlScripts as in option1
Create a ScriptsRunner repository
In the ScriptRunner repository, you'd create a folder containing a .json file with the name of the scripts and the amount of times (or a boolean) you've had run them.
eg.:
[{
"id": 1
"scriptName" : "myscript1.sql"
"runs": 0 //or hasRun : false
}]
Then write a python script that reads and writes a json file by updating the amount of runs, thus you'd need to update your repository after each pipeline run. It would mean that your pipeline will perform a git commit / push operation after each run in case there new scripts to be run.
The algorithm is like these, the implementation can be tuned.
MySQL workbench is a fantastic tool. However, I'm having a hard time figuring out how to create multiple windows. For example, in Sequel Pro (or TablePlus or really any other SQL client) I can have multiple windows:
Yes I know there are 'tabs' but those aren't quite the same thing. Is there a way to have multiple windows using MySQL Workbench?
It seems like from a few other threads this would need to be done manually via:
$ open -n -a MySQLWorkbench.app
MySQL Workbench is originally designed to be a single instance app only. On Windows this has been extended to allow multiple instances (there's a setting in the preferences) and you found a way to do this on macOS. However this bears some risks, because all instances share the same config and cache files and can write simultaneously to them, which is prone to file corruption. Also, any changes done to the configuration or connections end up in the same file, so the last change may override previously made changes in another instance.
I'm going to build a extremly small script for dumping a Sybase database in perl. The problem is that Perl doesn't come with preinstalled Sybase-support. I don't have access to the servers root so I can't install any packages and I can't reach the perl-folder. The server is not configured for internet access so I have to deliver the packages "manually" thorugh FTP.
So, my question is if there are any easy ways of doing this. The only library I need is DBI::Sybase or Sybase standalone (maybe I haven't done my research enough and doesn't even need this much?) which means I would love to just be able to put the .pm file there, loading it through
use localModule
and then run my small script.
The solution has to work on both Red hat and Solaris if I understood my supervisor correctly.
Best regards
Since you are primarily concerned with dumping the database, and not data retrieval and manipulation, you could probably get by without having to use DBI::Sybase or other perl module that is not preinstalled.
Without more details, it's hard to be very specific, but here's the overview. Your perl script can execute some SQL scripts which can dump the databases.
You can either put the list of databases you wish to dump in a config file (or env file), or you can generate it dynamically by calling isql using the -b option to suppress headers, and nocount to suppress footers, and store the output in an array.
Once you have the list of databases, just loop them, running another isql command to dump each database.
I'm running a pylons app using fastcgi and apache2. There are two versions (different revisions from my svn repo), one for staging and one for production. I'd like them to use different paste config files.
Right now, my dispatch.fcgi inside htdocs in the pylons app just uses one config file (so both stage and live use the same configuration). I'd like to be able to have debugging enabled on the stage server but not on the live server, for example. Any suggestions?
One approach would be to have more than one dispatch.fcgi prepared (referencing different INI files), then run a script on deployment to copy the correct one into the active position.
Another approach would be to have two .fcgi files, then use an IfDefine directive to select the proper rules in your main httpd.conf.
In other words, on the staging server, you start httpd with httpd -D staging, then put the staging config inside <IfDefine staging></IfDefine> and the other config inside <IfDefine !staging></IfDefine>
The limitation of this approach is that since IfDefine is binary, going past two options while still having a "default" option requires a bunch of extra lines. It's not the end of the world, and if you require a parameter to be given on all deployments it stays clean.
Still, I would use option #1.
I'm in the process of developing a deployment system for a new web app and I'm wondering where the best point in the process to manage database migrations is (the question of how to do the migrations is another problem entirely).
It seems there are two ways to go:
Use a migration script that can
either be run manually from command
line or as part of the automatic
deployment/build process
Run the migrations when the app
starts up (I'm using ASP.NET so this
can be done easily enough without
causing a long-running user request)
Does anyone have any suggestions/insight/experience with these approaches? Any other suggestions?
I can see why #1 might be more attractive - it gives me complete control over when the DB is updated. However, I quite like #2 as it allows me to quickly iterate between deployments and reduces the manual process. #2 could also be used on my development machine to allow even quicker iterations. Hmm, starting to think having both might be a good thing...
We have a sales-force system with ~100 client and we are updating database at application startup (True, our is a desktop application.) I like this approach, it's safe and iterative if we have indeterministic startpoint (is the client database new or only updated to verison x.y.z?).
But at serverside I'm preferr your #1 option: we create a SQL query file on our virtual machine (based on the copy of the original database) and runs this query against the real server.
So IMHO:
Disconnected clients: startup, iterative scripts
Server: query created on VM based on the actual and real database
So I'm interrested in this problem too, and find some (half)frameworks as RikMigrations. After some googling there is a good startplace about DB versioning/migration frameworks: .NET Database Migration Tool Roundup. Not neccessarely the documentation but the team blogs can be interresting.
I like option #1 better as it seems much more flexible. In lieu of actually performing migrations on each app start, I think I would verify that the database schema (version number?) matches the code, and if not, throw a warning or error about a mismatched database schema.
I'd prefer option #1 for a number of reasons. First, integration tests usually require your DB schema to be up-to-date, and launching a web-site to upgrade the schema will be a huge timewaster. Second, you cannot change database schema while your site is running (say, add a couple of indexes to speed things up).
As for production side of things, upgrading your database in transaction MSI-style installation is much better than attempting to upgrade at each app startup since you can potentially end up with desynchronized database-application versions.
And if you're looking for the migration framework, take a look at Wizardby.
If the application ever has to run on a customer's machine than migrating at startup can prevent a lot of support calls - assuming you can do seamless migration without user intervention (I hope you aren't normally running your web app with permission to modify the database).
If the application always runs under your control automatic migration is less of an issue - but still can be a good feature, especially if you want to minimize downtime and manual deployment steps.