Development to production updates using NoSQL - MongoDB

I've been using MongoDB for about a year now, though not nearly to its full potential.
I've been developing new software, out of anyone's eyes except my own, and I've enjoyed the database's flexibility to the fullest, making major structural changes to data on the fly.
Now that I'm at a point where I have production server(s) and 3 development servers, I'm having a real problem with changing data structures and keeping them in sync.
Theoretically, the development servers should always have the most current data from production. In a structured database, if I rename something, I can just run a compare tool and make the corresponding change in production after a pull. In MongoDB, this can become incredibly difficult: there could be hundreds of changes from document to document, much less from database to database.
I've been reviewing my ~/.dbshell file to get a feel for the changes I've made, but what about changes made within the program itself? Configuration database changes?
Are there tools or procedures that are around to make this easier?
I've spent hours on Google researching how others do it. I came across Mongeez, but it's more manual and tedious than I need. In the past, I've just done a mongodump and mongorestore inside of a git directory to transport data, but these snapshots are too rigid. I read a few blog posts about moving new data from production to development, but nothing about updating development documents in production. I could write a comparison script, but I feel like this is reinventing the wheel. There has to be a better way.
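For reference, that snapshot workflow is roughly the following (the paths and repo layout here are illustrative, not my exact setup):

    import subprocess
    from datetime import datetime, timezone

    REPO_DIR = "/srv/db-snapshots"   # illustrative: a git working tree
    DUMP_DIR = REPO_DIR + "/dump"    # mongodump's output directory

    def snapshot():
        """Dump the local database into the git repo and commit it."""
        subprocess.run(["mongodump", "--out", DUMP_DIR], check=True)
        stamp = datetime.now(timezone.utc).isoformat()
        subprocess.run(["git", "-C", REPO_DIR, "add", "-A"], check=True)
        subprocess.run(["git", "-C", REPO_DIR, "commit", "-m", "snapshot " + stamp], check=True)

    def restore():
        """Drop and restore from whatever dump is checked out in the repo."""
        subprocess.run(["mongorestore", "--drop", DUMP_DIR], check=True)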
TL;DR: What are some ways to version NoSQL data, new entries and changed data, between environments?

I had a similar problem/experience while managing a few production Mongo machines for about a year.
Two quick pieces of advice:
WiredPrairie is right. Version your documents and that will allow you to migrate in a casual/relaxed manner. I wish we had done that up front. One of my biggest regrets.
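What that looks like in practice: stamp every document with a schema version and upgrade documents lazily as they're read. A minimal pymongo sketch; the collection, field, and migration names are hypothetical:

    from pymongo import MongoClient

    CURRENT_VERSION = 3

    def migrate_v1_to_v2(doc):
        doc["name"] = doc.pop("username", None)  # example rename
        doc["schemaVersion"] = 2
        return doc

    def migrate_v2_to_v3(doc):
        doc.setdefault("tags", [])               # example added field
        doc["schemaVersion"] = 3
        return doc

    MIGRATIONS = {1: migrate_v1_to_v2, 2: migrate_v2_to_v3}

    def load_user(users, user_id):
        """Read a document, upgrading it lazily if it predates the current schema."""
        doc = users.find_one({"_id": user_id})
        while doc and doc.get("schemaVersion", 1) < CURRENT_VERSION:
            doc = MIGRATIONS[doc.get("schemaVersion", 1)](doc)
            users.replace_one({"_id": doc["_id"]}, doc)
        return doc

    users = MongoClient()["app"]["users"]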
We used Groovy to connect and do our schema/data changes and I loved it. The language is easy to learn and it works great with JSON. My practice was to back up the collections I'd be operating on, write the scripts in dev, run them, and if I messed up, restore the backed-up collections. Iterate until I got the scripts perfect, then repeat in production.
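That back-up/iterate/restore loop doesn't need Groovy specifically; here's the shape of it in Python with pymongo, using a server-side $out copy as the backup (the collection names and the example change are illustrative):

    from pymongo import MongoClient

    db = MongoClient()["app"]

    def backup(name):
        """Copy a collection server-side before operating on it ($out overwrites)."""
        db[name].aggregate([{"$match": {}}, {"$out": name + "_backup"}])

    def restore(name):
        """Throw away the botched collection and copy the backup back."""
        db[name].drop()
        db[name + "_backup"].aggregate([{"$match": {}}, {"$out": name}])

    backup("users")
    try:
        # ... the schema/data change script goes here ...
        db["users"].update_many({}, {"$rename": {"username": "name"}})
    except Exception:
        restore("users")
        raise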

Related

Branch and Merge abilities in a Document Database?

When I think of a document database, I think of a bunch of JSON files. (I imagine it is more complex than that, but that is how I envision it.)
In an upcoming project, we need the ability to deal with multiple different versions of the data. As I got to looking at the needs, they are very similar to the needs that drive branching and merging of code (versions of the data moving through a process, emergency updates to existing data in prod even though there are active versions being worked on, etc.).
This has me wondering, do any of the popular document databases have features that are similar to branching and merging of documents? (I tried searching around, but I could not get any relevant results.)
RavenDB has great Revisions and Patching features.
With Revisions you can keep track of your documents' history:
https://ravendb.net/docs/article-page/4.2/Csharp/server/extensions/revisions
https://ravendb.net/learn/inside-ravendb-book/reader/4.0/4-deep-dive-into-the-ravendb-client-api#document-revisions
With Patching you can update existing data in production
https://ravendb.net/docs/article-page/4.2/Csharp/client-api/operations/patching/single-document
https://ravendb.net/learn/inside-ravendb-book/reader/4.0/2-zero-to-ravendb#patching-documents

How to synchronize deployments (especially of database object changes) on multiple environments

I have this challenge. I am a DevOps engineer and a software engineer on a team where, months back, the developers moved from a central Oracle DB to having the DB on a CentOS VM on each of their laptops. The move away from a central DB was meant to reduce dependency on the DBAs and to eliminate issues that stemmed from inconsistent data.
The plan for sharing and ensuring synchronization of the database with everyone on the team was that each person would share change scripts with everyone. The problem is that we use Skype for communication (we just set up Slack but have yet to start using it fully), and although people sometimes post the text of DB change scripts, some may miss them. The other problem is that some developers forget to post their changes at all. Further, new releases are deployed to Production without being deployed to the Test and Demo environments.
This has posed a serious challenge for us, especially for me, as I recently became responsible for ensuring that our Demo deployments are in sync with the Production deployments.
Most of the synchronization issues come down to the database being out of sync due to missing change scripts or missing DB objects. Oracle is our DB of preference.
A typical deployment to the Demo environment is a very painful process: we test the application, and as issues crop up due to missing DB table columns, functions, or stored procs, we have to hunt down the missing DB objects, apply them to the DB, and continue until all issues are resolved.
How can I solve this problem to ensure smooth, painless and less time-consuming deployments? Can migrating our applications to Docker help with the DB synchronization issues and the associated lack of discipline of the developers? What process can we put into place to improve in this area?
Thank you very much in advance for your help.
Have a look at http://www.dbmaestro.com
I strongly recommend joining the live demo session.
DBmaestro TeamWork can help you merge the changes from multiple DBs into a single shared DB and safely move the changes from one environment to the other.
Danny
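Commercial tools aside, the common low-tech fix for missed change scripts is to keep them in version control and have every deployment apply whatever scripts a target database hasn't seen yet, tracked in a bookkeeping table. A rough sketch with python-oracledb; the table name and file layout are hypothetical, and it assumes one statement per script file:

    import pathlib
    import oracledb  # python-oracledb; any DB-API driver works the same way

    MIGRATIONS = pathlib.Path("migrations")  # e.g. 001_add_status_col.sql, 002_...

    def migrate(conn):
        """Apply every change script this database hasn't seen yet, in order."""
        cur = conn.cursor()
        # Assumes a one-column bookkeeping table: schema_version(version).
        cur.execute("SELECT version FROM schema_version")
        applied = {row[0] for row in cur.fetchall()}
        for script in sorted(MIGRATIONS.glob("*.sql")):
            version = script.stem.split("_")[0]
            if version in applied:
                continue
            cur.execute(script.read_text())  # assumes one statement per file
            cur.execute("INSERT INTO schema_version (version) VALUES (:v)", v=version)
            conn.commit()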

Do any source-control systems use a document database for storage?

One of those questions that's difficult to google.
We were running into issues the other day with the speed of our svn repository. The standard solution to this seems to be "more RAM! more CPU!" etc., which got me wondering: are there any source-control systems that use a document/NoSQL database (MongoDB, CouchDB, etc.) for storage? It seems like it might be a natural fit, but I'm no expert on source-control storage design. Perhaps there's a way to configure a more recent source-control system to use a document DB as storage?
None that I know of do, and they wouldn't want to. Given the difference in degrees of testing, it would likely hurt robustness (a really bad thing for a source code repository). It would probably also end up hurting performance, because of the inability to do delta storage.
Note that Subversion has two very different storage mechanisms, one backed by the embedded Berkeley DB, and the other backed by simple files. One or the other of these might be better suited to your usage.
Also, since you posed your question pretty broadly, I'll comment on Git and TFS.
Git uses very efficiently packed files in the filesystem to store the repository. Frequently, the entire history is smaller than a checkout. For one very old project that my lab has, the entire history is 57MiB, and a working tree (not counting history) is 56MiB.
TFS stores a lot (possibly all) of its data in a SQL database.
Git uses memory-mapped files just like MongoDB :)
Though Git doesn't actually use MongoDB and I don't think it would want to. If you look at Git, it doesn't really need a NoSQL DB, it basically is a DB.
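To make that last point concrete: Git's object store really is a content-addressed key-value store. The key a blob is filed under is just the SHA-1 of a short header plus the file's contents, which you can reproduce in a few lines:

    import hashlib

    def git_blob_key(content: bytes) -> str:
        """Compute the key Git stores a blob under: sha1(b"blob <size>\\0" + content)."""
        header = b"blob %d\x00" % len(content)
        return hashlib.sha1(header + content).hexdigest()

    # Matches `echo 'hello world' | git hash-object --stdin`
    print(git_blob_key(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad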
As far as I know, none of the VCSs use NoSQL/document-based databases. The idea of using CouchDB etc. is not new, but no one has implemented such a thing so far.

Database Versioning - How does branch switching work?

This is a question for those of you developing on a team of devs where all of you have separate databases. You're versioning your database using source control and other tools which will automatically bring dev databases up to date to the latest version of the database (schema, data, SP's, functions, etc.).
OK Great! But wait! What if you are developing on version 4.0 of your software, but now you need to switch branches to the 3.2 branch to fix a bug? The schema could be (almost assuredly is) very different by now...
I suppose if you went through the extra effort to write rollback scripts along with your change scripts, this could work. But that seems like a lot of work - is it really worth it?
Much easier would be to create a new 3.2-branch database and work with that while working on the 3.2-branch code. It doesn't seem reasonable to me to require that each developer has exactly one database to work with.
I'm going out on a limb and assuming that you are versioning the database as a binary. If all your database assets were in the form of constructive code (e.g. SQL scripts and/or text data dumps), the solution would be simple, as suggested by Mark: store these assets as part of the development branch. To work on version 3.2, switch the branch, re-run the create scripts and presto, 3.2 database. Merging would be just as easy as with regular code (or just as painful, depending on your version control system of choice).
Here are some suggestions to work in this mode:
If creating the database instances from text is too slow, make a cache on a shared disk volume, keyed by the contents of all the schema/data files (or the MD5 sum thereof); see the sketch after these suggestions.
Write a pre-commit hook to ensure that the schema and data dumps in the developer's instance are the same as the ones under version control. This prevents people from making changes to their dev database with an interactive tool, and then forgetting to commit them.
You mention change scripts; treat them as a liability. While they may be required by your deployment scenario (e.g. for customers who want to upgrade in-place), they duplicate information from the version history of the database, and per Murphy's law, duplication means desynchronization sooner or later. Try to auto-generate the change scripts from the versioned database assets using "diff"; if that cannot be achieved, dedicate some serious unit tests to database upgrades.
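Here's a sketch of the cache idea from the first suggestion, keyed by an MD5 over the versioned schema/data files. The cache location and the create_db.sh script are hypothetical stand-ins for your own setup:

    import hashlib
    import pathlib
    import subprocess

    SCHEMA_DIR = pathlib.Path("db")                   # the versioned schema/data files
    CACHE_DIR = pathlib.Path("/mnt/shared/db-cache")  # hypothetical shared volume

    def schema_key() -> str:
        """One MD5 over every versioned schema/data file, in a stable order."""
        md5 = hashlib.md5()
        for path in sorted(SCHEMA_DIR.rglob("*.sql")):
            md5.update(path.read_bytes())
        return md5.hexdigest()

    def get_database() -> pathlib.Path:
        """Reuse a cached instance for this exact schema version, or build one."""
        cached = CACHE_DIR / schema_key()
        if not cached.exists():
            build_dir = cached.with_suffix(".tmp")
            # create_db.sh stands in for your own create scripts.
            subprocess.run(["./create_db.sh", str(build_dir)], check=True)
            build_dir.rename(cached)
        return cached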

What are the best practices for versioning with a data driven web app and many devs? [closed]

Closed. This question is opinion-based. It is not currently accepting answers. Closed 8 years ago.
I've read many answers to similar questions, but still didn't get to the answer that I was looking for.
We've got a group of about 12 devs and business analysts working on one app. It's an enormous application, I'd guess about 1000+ pages in a mix of ASP and ASP.NET.
What I'm wondering is how the pros manage versioning of a large app like this one? Especially how to manage deployments, database changes and source control. Do they build the source control procedures in such a way that the app can be rolled back to a stable point at any time? How does the database fit in to that? Are all database schema changes and procs etc stored in source control?
I think my ideal solution here is to be able to re-hydrate the entire app from scratch to a specific version, including data. Is that overkill?
Update: my first guess at the size of the app (about 120 pages) was way off. I did an actual count and came up with the much bigger number above. 90% of those pages are frozen for development or unused.
While you are certainly asking a question that covers a lot of ground, there are 3 major parts you need to have in place:
Use a version control system. Something like Subversion or Git is going to automatically let you roll the code of your application back to any point in time, or to any point at which you "tagged" your source code. As long as everyone only commits code that builds and runs successfully, every commit will be a valid point to roll changes back to. Your version control system will also manage multiple lines of code for you with branches. There are many strategies for handling branches; find one that works best for your process.
Use a build server. A build server like CruiseControl, Hudson, or TeamCity is going to automatically ensure that every commit is a "good" commit that compiles and can run successfully (assuming you have tests in place).
Include your database schema and static database data with your code in version control. There are tools like LiquiBase that allow you to manage your database structure and are designed to work with multiple developers working concurrently, even across multiple branches. Your code is no good without the corresponding database structure, so you do need to make sure you keep both in sync. Storing your database changes in your source code repository is the easiest way to do that. That being said, you cannot store your full database in your repository. It is too big and it changes too much for your source control's diff/merge support to handle. Depending on your industry, it may also not be legal due to government regulations. If you find you need to roll back your production database due to a bad release, you will need to determine what SQL statements would best undo the changes applied by looking at your stored update scripts.
Perforce has a great white paper addressing some of the issues you raise. Their white paper talks about web pages; you can extend the concept to stored procedures and scripts for generating/modifying tables.
As for being able to rehydrate from scratch, that really depends on your disaster recovery plans, as well as how you set up and configure your development and integration-testing environments, and how much downtime you are willing to accept. I don't deal with these issues day to day, so perhaps people who have a lot more real-world experience, especially people on the Operations side of things, can impart their wisdom.
120 pages is "enormous"? Wow...
Versioning an app like this is quite straightforward. You have a production rollout process that includes tagging every release in the revision control system before it goes out, and only installing tagged releases. You back up the database at the time of the upgrade, before changing the schema, and store it somewhere with a descriptive name that allows you to link the code and DB backup. If you need to roll back, you just install the old code and restore the backup.
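A sketch of that backup-naming convention, assuming Git tags and using pg_dump as a stand-in for whatever backup tool your DBMS ships with (the paths are illustrative):

    import subprocess
    from datetime import datetime, timezone

    def release_backup(db_name="appdb"):
        """Back up the DB under a name that ties it to the deployed code tag."""
        tag = subprocess.run(["git", "describe", "--tags", "--abbrev=0"],
                             capture_output=True, text=True, check=True).stdout.strip()
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        outfile = f"/backups/{db_name}-{tag}-{stamp}.dump"
        # pg_dump as a stand-in; swap in your DBMS's backup tool.
        subprocess.run(["pg_dump", "-Fc", "-f", outfile, db_name], check=True)
        return outfile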
If you need to roll back, but with the current data, well, that's a harder problem; it's more about how your database is structured and how its structure evolves, but Rails migrations are an interesting study in how it can be done.
Beyond that, your question is pretty huge, and touches on a lot of areas. It might be best to contract someone who has dealt with all these things before and put you on the right path.
This is really quite a normal, moderately-sized application. You just need to follow good source control processes, use proper unit testing, continuous integration, etc. Many, many development organizations have solved this problem.
With all due respect, the fact that you consider 120 pages to be enormous, and the fact that you feel this is a big deal, both indicate to me that you should consider stopping development Right Now, and improving your development process.
I'm concerned that if you don't do that now, you'll learn later why you should have done so.
For something this small, rather than worrying about being able to roll way back in time, what you are really after is the ability to roll the most current release back to the last one. That way, if you break the build, you can roll back to the prior build. How?
1) Your web server serves from /www/current
2) Push your changes into /www/new
3) Move /www/current to /www/old
4) Move /www/new to /www/current
If /www/current blows up, just move /www/old back in its place.
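That swap is small enough to script; here's a minimal sketch under the layout above:

    import os
    import shutil

    WWW = "/www"

    def deploy(new_build):
        """Swap the new build in, keeping the previous release for rollback."""
        shutil.rmtree(WWW + "/old", ignore_errors=True)  # discard the release before last
        os.rename(WWW + "/current", WWW + "/old")
        os.rename(new_build, WWW + "/current")

    def rollback():
        """If /www/current blows up, put the previous release back."""
        os.rename(WWW + "/current", WWW + "/broken")
        os.rename(WWW + "/old", WWW + "/current")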
The only catch is that rolling back the database schema isn't part of this. Rolling back schema changes is difficult, if not impossible. All you can really do is restore the old version from backup and lose any activity between the live database and what was on backup. This makes sense if you think about it--if you add three new tables and somehow refactor another table into a better schema, how can you roll back the schema change after that? You can't--you just have to lose all the new data. Moral? Take your database changes seriously--there is a reason DBAs are sometimes considered assholes. They have to be--they can't just roll back mistakes in the database like you can roll back your buggy code.