I'm considering using Heroku as a platform for a project I'm working on. This project will have many independent databases (postgres). Each database will spin up when someone is using it, then save the data to a dump file and spin down when no one is logged on (if all these databases are always active it will be colossally expensive).
Unfortunately I have no experience with Heroku and their documentation has an annoying marketing slant to it--I can't figure out if this is possible. How do I pay for the storage of backups? Is it possible to store backups without an associated running database?
My alternative is to build this on Amazon, but I'd rather not do all this engineering myself.
Many thanks in advance.
The Postgres schemas approach might fit your kind of multi-tenancy well.
This Blog Post and RailsCast might help you further.
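If you do go the schema-per-tenant route, a minimal sketch of the idea with psycopg2 could look like the following; the connection string, schema names, and example table are placeholders, not anything from your project:

```python
# Minimal sketch of schema-per-tenant multi-tenancy on a single Postgres database,
# using psycopg2. The DSN, schema names, and the example table are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=myapp user=myapp")  # hypothetical connection string

def create_tenant(schema):
    """Give a new tenant its own schema (namespace) inside the one shared database."""
    with conn, conn.cursor() as cur:
        cur.execute(f'CREATE SCHEMA IF NOT EXISTS "{schema}"')
        cur.execute(f'CREATE TABLE IF NOT EXISTS "{schema}".documents '
                    f'(id serial PRIMARY KEY, body text)')

def use_tenant(schema):
    """Point this session at one tenant; unqualified table names now resolve there."""
    with conn, conn.cursor() as cur:
        cur.execute(f'SET search_path TO "{schema}"')

create_tenant("customer_42")
use_tenant("customer_42")
```

Each tenant stays logically isolated while you pay for only one running database, and pg_dump can still produce a per-tenant dump with --schema if you ever need one.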
Spinning up and tearing down whole databases sounds like fighting the platform's defaults; what concerns are driving that design?
I have this challenge. I am the DevOps engineer and a software engineer on a team where, months back, the developers moved from having a central Oracle DB to having the DB on a CentOS VM on each of their individual laptops. The move away from a central DB was meant to reduce dependency on the DBAs and to eliminate issues that stemmed from inconsistent data.
The plan for sharing the database and keeping everyone synchronized was that each person would share change scripts with everyone else. The problem is that we use Skype for communication (we just set up Slack but have yet to start using it fully), and although people sometimes post the text of DB change scripts, some of the team can miss them. The other problem is that some developers forget to post their changes at all. Further, new releases are deployed to Production without being deployed to the Test and Demo environments.
This has posed a serious challenge for us, especially for me, since I recently became responsible for ensuring that our Demo deployments stay in sync with the Production deployments.
Most of the synchronization issues come down to the database being out of sync because of missing change scripts or missing DB objects. Oracle is our DB of preference.
A typical deployment to the Demo environment is a very painful process: we test the application, and as issues surface due to missing table columns, functions, or stored procedures, we have to track down the missing DB objects, apply them to the DB, and continue until all issues are resolved.
How can I solve this problem to ensure smooth, painless and less time-consuming deployments? Can migrating our applications to Docker help with the DB synchronization issues and the associated lack of discipline of the developers? What process can we put into place to improve in this area?
Thank you very much in advance for your help.
Have a look at http://www.dbmaestro.com
I strongly recommend joining the live demo session.
DBmaestro TeamWork can help you merge the changes from multiple DBs into a single shared DB and safely move changes from one environment to the other.
Danny
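Whatever tool you pick, the underlying idea is to make each database record which change scripts have already been applied, so every environment can apply only what it is missing, in order. A minimal sketch of that pattern, using Python's built-in sqlite3 purely so the example runs anywhere (with Oracle the same pattern works through cx_Oracle); the table name and scripts directory are assumptions:

```python
# Sketch of recording applied change scripts inside the database itself, so every
# environment can apply only what it is missing, in order. sqlite3 is used only so
# the example runs anywhere; against Oracle the same pattern works via cx_Oracle.
import os
import sqlite3

conn = sqlite3.connect("app.db")  # placeholder database
conn.execute("CREATE TABLE IF NOT EXISTS schema_changes ("
             "script TEXT PRIMARY KEY, applied_at TEXT DEFAULT CURRENT_TIMESTAMP)")

def apply_pending(scripts_dir="db/changes"):   # assumed directory of numbered .sql files
    """Run, in filename order, every change script not yet recorded in this database."""
    applied = {row[0] for row in conn.execute("SELECT script FROM schema_changes")}
    for name in sorted(os.listdir(scripts_dir)):
        if name in applied:
            continue
        with open(os.path.join(scripts_dir, name)) as f:
            conn.executescript(f.read())
        conn.execute("INSERT INTO schema_changes (script) VALUES (?)", (name,))
        conn.commit()
```

Run something like this as part of every deployment to Test, Demo, and Production and the hunt for missing change scripts largely disappears; tools such as Flyway, Liquibase, or DBmaestro build the same idea out with more safety around it.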
I am planning a large project that is going to use Neo4j as the main storage for modelling the data, but I was wondering if I should additionally use something like MySQL as a parallel store to back the user administration subsystem. The advantage would be that I could use something with which I have much more experience (18 years) to do the very tedious job of login and account administration. The disadvantage would be the overhead of keeping two databases in sync, since I do plan to have the users also represented as nodes in Neo4j with a small subset of the user information. I know it depends a lot on the details of the project, but if anybody has experience with a similar setup, it would be much appreciated.
Rather than building replication, stick with your chosen database and build the administration tools and UI on top of it. Replication is complex and difficult; admin tools hopefully less so.
If you can't get the data out at least as CSV for reporting, maybe reconsider your choice of DB.
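For what it's worth, keeping the accounts in Neo4j itself is not much code. A rough sketch with the official Python driver, where the bolt URI, credentials, and property names are all made up for illustration:

```python
# Rough sketch of keeping user accounts as nodes in Neo4j, using the official Python
# driver. The bolt URI, credentials, and property names are assumptions.
import hashlib
import os
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def register_user(email, password):
    """Store a User node with a salted PBKDF2 password hash."""
    salt = os.urandom(16)
    pw_hash = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    with driver.session() as session:
        session.run(
            "MERGE (u:User {email: $email}) "
            "SET u.salt = $salt, u.password_hash = $hash",
            email=email, salt=salt.hex(), hash=pw_hash.hex(),
        )

register_user("alice@example.com", "correct horse battery staple")
```

That keeps a single source of truth for users, and the same nodes can carry whatever relationships the rest of the graph needs.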
I've been using MongoDB for about a year now, however not nearly up to its potential.
I've been developing new software, out of anyone's eyes except my own, and I've enjoyed the flexibility of the database to its fullest and I've made major structural changes to data on the fly.
Now that I'm at a point where I have production server(s) and three development servers, I'm having a real problem with changing data structures and keeping everything in sync.
Theoretically the development servers should always have the most current data from production. In a structured database, if I rename something, I can just run a compare tool and make the corresponding change in production after a pull. In MongoDB, this can become incredibly difficult: there could be hundreds of changes from document to document, let alone from database to database.
I've been reviewing my ~/.dbshell file to get a feel for the changes I've made, but what about changes made within the program itself? Or configuration database changes?
Are there tools or procedures that are around to make this easier?
I've spent hours on Google researching how others do it. I came across Mongeez, but it's more manual and tedious than I need. In the past, I've just done a mongodump and mongorestore inside a git directory to transport data, but those snapshots are too rigid. I read a few blog posts about moving new data from production to development, but nothing about pushing documents updated in development out to production. I could write a comparison script, but I feel like this is reinventing the wheel. There has to be a better way.
TL;DR: What are some ways to version NoSQL data, new entries and changed data, between environments?
I had a similar problem/experience while managing a few production Mongo machines for about a year.
Two quick pieces of advice:
WiredPrairie is right. Version your documents, and that will allow you to migrate in a casual, relaxed manner (see the sketch below). I wish we had done that up front. One of my biggest regrets.
We used Groovy to connect and do our schema/data changes, and I loved it. The language is easy to learn and it works great with JSON. My practice was to back up the collections I'd be operating on, write the scripts in dev, run them, and if I messed up, restore the backed-up collections. I'd iterate until the scripts were perfect and then repeat the process in production.
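Putting both pieces of advice together, a minimal pymongo sketch might look like this; the database, collection, and the renamed field are placeholders, not anything from the question:

```python
# Minimal pymongo sketch of both points: back up the collection you're about to touch,
# then migrate only documents still on an old schemaVersion. The database, collection,
# and the renamed field are placeholders.
from pymongo import MongoClient

db = MongoClient()["myapp"]

# 1. Cheap safety net: copy the collection before running the change script.
db["users_backup"].drop()
for doc in db["users"].find():
    db["users_backup"].insert_one(doc)

# 2. The change script itself: bring version-1 documents up to version 2.
#    (Documents written before you started versioning won't match $lt and would
#    need a separate $exists: false pass as well.)
for doc in db["users"].find({"schemaVersion": {"$lt": 2}}):
    db["users"].update_one(
        {"_id": doc["_id"]},
        {"$rename": {"userName": "username"}, "$set": {"schemaVersion": 2}},
    )
```

Because the migration only touches documents below the target version, it's safe to re-run, and you can apply the same script unchanged in dev and production.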
I keep a content revision history for a certain content type. It's stored in MongoDB. But since the data is not frequently accessed I don't really need it there, taking up memory. I'd put it in a slower hard disk database.
Which database should I put it in? I'm looking for something that's really cheap and has cloud hosting available. I don't need speed. I'm looking at SimpleDB, but it doesn't seem very popular. An RDBMS doesn't seem like an easy fit, since my data is structured as documents. What are my options?
Thanks
Depends on how often you want to look at this old data:
Why don't you mongodump it to your local disk and mongorestore it when you want it back?
Documentation here
OR
Set up a local mongo instance and clone the database using the information here
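For concreteness, a rough sketch of the dump-and-restore route, driving the standard mongodump/mongorestore tools from Python; the database name, collection name, and backup path are placeholders:

```python
# Rough sketch of dumping only the revision-history collection to local disk and
# restoring it later. Database name, collection name, and paths are placeholders.
import subprocess

# Dump just the revision-history collection...
subprocess.run(["mongodump", "--db", "myapp", "--collection", "revisions",
                "--out", "/backups"], check=True)

# ...and restore it later when you want to look at the old data again.
subprocess.run(["mongorestore", "--db", "myapp", "--collection", "revisions",
                "/backups/myapp/revisions.bson"], check=True)
```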
Based on your questions and comments, you might not find the perfect solution. You want free or dirt cheap storage, and you want to have your data available online.
There is only one solution I can see feasible:
Stick with MongoDB. SimpleDB does not allow you to store documents, only key-value pairs.
You could create a separate collection for your history. Use a cloud service that gives you a free tier; for example, http://MongoLab.com gives you a 240 MB free tier.
If you exceed the free tier, you can look at discarding the oldest data, moving it to offline storage, or start paying for what you are using.
If your data grows a lot, you will have to decide whether to pay for it, keep it available online or offline, or discard it.
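As a sketch of the "move the oldest data somewhere cheaper" option with pymongo, where the connection strings, names, and the 90-day cutoff are all assumptions:

```python
# Sketch of moving revision documents older than a cutoff from the hot database to a
# cheaper archive instance with pymongo. Connection strings, database/collection names,
# and the 90-day cutoff are assumptions.
from datetime import datetime, timedelta
from pymongo import MongoClient

hot = MongoClient("mongodb://localhost:27017")["app"]["revisions"]
archive = MongoClient("mongodb://archive-host:27017")["app_history"]["revisions"]

cutoff = datetime.utcnow() - timedelta(days=90)
old_docs = list(hot.find({"created_at": {"$lt": cutoff}}))
if old_docs:
    archive.insert_many(old_docs)                                   # copy to the archive
    hot.delete_many({"_id": {"$in": [d["_id"] for d in old_docs]}})  # free the hot database
```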
If you are dealing with a lot of large objects (BLOBs or CLOBs), you can also store the 'non-indexed' data separately from the database. This keeps the database both cheap and fast. The large objects can be retrieved from any cheap storage when needed.
Cloudant.com is pretty cool for hosting your DB in the cloud, and it uses BigCouch, which is a NoSQL store. I'm using it for a social site I have in the works, since CouchDB (BigCouch) has a similarly open-ended structure and you talk to it via JSON. It's pretty awesome stuff; it's weird to move from SQL to map-reduce, but once you do, it's worth it. I did some research because I was a .NET guy for a long time but am moving to Linux and Node.js, partly out of boredom and the love of JavaScript. These things just fit together, because Node.js is all JavaScript on the backend and talks seamlessly to CouchDB, and the whole thing scales like crazy.
I have a standalone network device. It needs to be reworked to function as part of a geographically distributed group of these devices. Synchronization between devices in the group need not occur frequently; not more than hourly. The application is Rails with SQLite.
Mainly, we want to keep certain pieces of information collected on these devices in sync. Because of the deployment, it isn't feasible to add a large database cluster.
I have been considering CouchDB, since replication and handling the conflicts that result from replication is one of its strong suits.
What do you think of CouchDB as a mechanism to keep distributed network devices synchronized? Any thoughts or suggestions for an alternative approach?
What is the particular question?
CouchDB implements master-master replication, which is exactly what you are asking for.
Or?
CouchDB would be a great fit for this because, as you say, it has master-master replication. Since you're replicating over the WAN, another huge plus is that CouchDB was designed to handle going on and off the network gracefully, which gives you a nice piece of fault tolerance.
A lot of people have used CouchDB for this type of situation. Take a look at some case studies (http://www.couchbase.com/customers/case-studies) and a recent blog post I wrote about using CouchDB to keep front end servers' session data synchronized (weblog.bocoup.com/storing-php-sessions-in-couchdb).
Also, it would help if you posted more information about your case so that we can tailor our answers.
Cheers.
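For illustration, kicking off continuous replication in both directions between two devices is just a couple of HTTP calls to CouchDB's _replicate endpoint; the hostnames and database name below are placeholders:

```python
# Sketch of starting continuous replication in both directions between two devices'
# CouchDB instances via the _replicate endpoint. Hostnames and database name are
# placeholders.
import requests

LOCAL = "http://localhost:5984"
REMOTE = "http://device-b.example.com:5984"
DB = "device_data"

for source, target in [(f"{LOCAL}/{DB}", f"{REMOTE}/{DB}"),
                       (f"{REMOTE}/{DB}", f"{LOCAL}/{DB}")]:
    requests.post(f"{LOCAL}/_replicate",
                  json={"source": source, "target": target, "continuous": True},
                  ).raise_for_status()
```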
CouchDB is fine. You might have some alternatives with Unix tools.
The simplest key/value database is files in a filesystem. They work great. If you only need key/value storage with basic replication, then rsync can do that. If your conflict resolution policy is, for example, always take the latest timestamped data, then you might get away with rsync.
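To make that concrete, a quick sketch of the files-plus-rsync approach; the peer hostname and data directory are made up, and --update is what gives you "newest timestamp wins":

```python
# Quick sketch of the files-as-key/value-store idea with rsync. Each key is a file;
# --update never overwrites a file that is newer on the receiving side, which is the
# "latest timestamp wins" conflict policy. Hostname and path are made up.
import subprocess

PEER = "device-b.example.com"
DATA = "/var/lib/devicedata/"   # trailing slash: sync the directory's contents

subprocess.run(["rsync", "-az", "--update", DATA, f"{PEER}:{DATA}"], check=True)  # push ours
subprocess.run(["rsync", "-az", "--update", f"{PEER}:{DATA}", DATA], check=True)  # pull theirs
```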
First of all, you're probably running Unix/Linux. SSH and rsync will be included, unlike CouchDB.
Another advantage of rsync (actually of its SSH tunnel) is of course identification, authentication, and authorization. Your device is presumably Unix/Linux, and there are a million ways to wire up Unix authentication. It's not a guarantee, but nearly anything is doable: password files, NIS, LDAP, Kerberos, Samba/Active Directory. The list goes on.
With Couch you will have to figure out some kind of user management system.
Will you use oauth?
Will you have to write an authentication plugin?
Will you also replicate the _users database around? What about conflicts in the _users database?
Do you instead have a central _users database? How can you have a central users database if you can't have a central data database?
Couch, like MySQL, is a full-blown server. It will carry maintenance load that rsync won't:
Remember to compact your databases, compact your views, and run view cleanup (sketched after this list)
Remember to rotate the log files
Possibly back up your .couch files and your .ini config
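A minimal sketch of those compaction and cleanup chores, driven over HTTP with requests; the server URL, database, and design-document name are placeholders, and recent CouchDB versions will also want admin credentials on these calls:

```python
# Sketch of routine CouchDB compaction and view cleanup over HTTP. Server URL,
# database, and design-document name are placeholders; newer CouchDB versions
# require admin credentials for these endpoints.
import requests

COUCH = "http://localhost:5984"
DB = "device_data"
HEADERS = {"Content-Type": "application/json"}

requests.post(f"{COUCH}/{DB}/_compact", headers=HEADERS).raise_for_status()            # compact the database
requests.post(f"{COUCH}/{DB}/_compact/my_design", headers=HEADERS).raise_for_status()  # compact one design doc's views
requests.post(f"{COUCH}/{DB}/_view_cleanup", headers=HEADERS).raise_for_status()       # drop stale view index files
```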
In other words, can you do a quick and dirty rsync hack, or do you need the full Couch package?
CouchDB is a uniform, consistent platform regardless of OS. That can be good or bad. Not knowing your specifics, I would guess that rsync over SSH is the best short-term option, but Couch is the best long-term one. (But as with so many software projects, the long term never seems to arrive.)