Incremental backups from server to local machine - mongodb

My live site is using mongodb to store user activities on the site.
I am having a single server running monogdb. I cant afford a second server for master slave replication.
my problem is i want to take the dump of server's mongodb database everyday and restore it to my local machine so that i can query on my local machine.I know how to dump and restore but the issue is every day i have to dump the entire database from server and restore it from the scratch in my local machine ..it takes a lot of time.
so my question is ..is there any way to have incremental backup in mongodb so that i have to dump and restore only single day data as it will take less time.

i do not know much about mongodb, but i have an idea.
i think you can introduce your local mongodb instance as a slave of master production db, and make slave only writable if possible, for preventing live system making selects from your local.
this way can work because slaves keeps track of master writes and deletes and try to make themselves as a copy of master.
And there is a good reason to do that is a slave doesn't have to be online always, when it becomes online, slave will check masters list (this list lenght like 1hour or 1 day is configurable at master) and copy datas from master as quick as possible.
Once you dump master to your local, then you can backup your data twice a day with this method i think.

Related

SQL Live Backup Over Intermittent Connection

I have a few PCs that have local PostgreSQL databases running, just logging data. Data is only ever inserted, never removed or updated. The remote PCs are connected to the internet by cellular modem and depending on their location, often do not have internet access. When they do have an internet connection I would like them to push a copy of their databases to a central location and keep the remote database up to date with any new data. Essentially, I need an 'rsync' for databases.
At first it seemed like what I need is to set up PostgreSQL Hot-Standby but I'm unsure if this is actually what I need because my situation seems to differ from the examples I've seen.
Each remote PC has a Postgres server with a single database that has a unique name, the tables within the DBs have generic names. I would like to synchronize these databases to a single remote Postgres server. I think this should be okay due to the unique DB names.
My connectivity is very intermittent, days to weeks without a connection. I've seen PgAdmin be very reliable despite a terrible (cellular) internet connection, if Postges Hot-Standby is the same I may be alright.
As far as I can see my options are either to set up PostgreSQL Hot-Standby, or roll my own solution. I don't want to roll my own solution. However it is simple enough if I can't find anything better; a Python daemon run by systemd to find the diff between the local and remote DB, then push the new rows from the local to the remote DB. But I'm sure someone has solved this problem, I just haven't found the solution yet.
You don't need hot standby (which is the PostgreSQL term for being able to query the replicated database), but streaming replication. You need a central standby server for each intermittently connected remote database server. If you use replication slots, you can be sure that replication will never fall behind.

Is it possible to dump the Mongodb database (the only one of many on the host) while the database is running?

Is it possible to dump the database (the only one of many on the host) while the database and related services are running?
At this time, data will continue to be written to the database. It is necessary to transfer the database to another host without using replication (different versions of the database) and at the same time reduce the downtime of services.
This scenario is assumed:
Start backing up data from the working database.
Wait until most of the backup is complete.
Disable services and write data to the database (downtime).
Finish data backup (downtime).
It is assumed that we have a complete data dump here.
Recover data on a new host (downtime).

MongoDB 2.2: why didn't replication catch up a collection following a dump/restore?

We have a three-server replicaset running MongoDB 2.2 on Ubuntu 10.04, and recently had to upgrade the hard drive for each server where one particular database resides. This database contains log information for web service requests, where they write to collections in hourly buckets using the current timestamp to determine the name, e.g. log_yyyymmddhh.
I performed this process:
backup the database on the primary server with mongodump --db log_db
take a secondary server offline, replace the disk
bring the secondary server up in standalone mode (i.e. comment out the replSet entry
in /etc/mongodb.conf before starting the service)
restore the database on the secondary server with mongorestore --drop --db log_db
add the secondary server back into the replicaset and bring it online,
letting replication catch up the hourly buckets that were updated/created
while it had been offline
Everything seemed to go as expected, except that the collection which was the current bucket at the time of the backup was not brought up to date by replication. I had to manually copy that collection over by hand to get it up to date. Note that collections which were created after the backup were synched just fine.
What did I miss in this process that caused MongoDB not to get things back in synch for that one collection? I assume something got out of whack with regard to the oplog?
Edit 1:
The oplog on the primary showed that its earliest timestamp went back a couple of days, so there should have been plenty of space to maintain transactions for a few hours (which was the time the secondary was offline).
Edit 2:
Our MongoDB installation uses two disk partitions: /dev/sda1 and /dev/sdb1. The primary MongoDB directory /var/lib/mongodb/ is on /dev/sda1, and holds several databases, while the log database resides by itself on /dev/sdb1. There's a sym link /var/lib/mongodb/log_db which points to a directory on /dev/sdb1. Since the log db was getting full, we needed to upgrade the disk for /dev/sdb1.
You should be using mongodump with the --oplog option. Running a full database backup with mongodump on a replicaset that is updating collections at the same time may not leave you with a consistent backup. This becomes worse with larger databases, more collections and more frequent updates/inserts/deletes.
From the documentation for your version (2.2) of MongoDB (it's the same for 2.6 but just to be as accurate as possible):
--oplog
Use this option to ensure that mongodump creates a dump of the
database that includes an oplog, to create a point-in-time snapshot of
the state of a mongod instance. To restore to a specific point-in-time
backup, use the output created with this option in conjunction with
mongorestore --oplogReplay.
Without --oplog, if there are write operations during the dump
operation, the dump will not reflect a single moment in time. Changes
made to the database during the update process can affect the output
of the backup.
http://docs.mongodb.org/v2.2/reference/mongodump/
This is not covered well in most MongoDB tutorials around backups and restores. Generally you are better off if you can perform a live snapshot of the storage volume your database resides on (assuming your storage solution has a live snapshot ability compatible with MongoDB). Failing that, your next best bet is taking a secondary offline and then performing a snapshot or backup of the database files. Mongodump on a live database is increasingly a less optimal solution for larger databases due to performance issues.
I'd definitely take a look at the MongoDB overview of backup options: http://docs.mongodb.org/manual/core/backups/
I would guess this has to do with the oplog not being long enough, although it seems like you checked that and it looked reasonably big.
Still, when adding new members to a replica set you shouldn't be snapshotting and restoring them. It's better to simply add a new member and let replication happen by itself. This is described in the Mongo docs and is the process I've always followed.

MongoDB Master and Slave at same time

I have a server that I want to use for testing new app verson (say staging server), but at same time I want to use it as replication slave for MongoDB. So, there is two roles:
always replicate an database to this server (only one database, original, with real data)
after deployment, make a copy of original db, to a new one (*-staging db), and test my deployment against this database
I see from docs how to replicate only specified database from one server to another, seems that it's working fine. But the problem that when i've tried to make a copy of existing database, on slave server, it fails with error not master. I don't want to make this database copy on master server, because it means that all staging tests will be executed against master server, that doesn't work for me.
Does it mean that I can't have MongoDB master for one database, and slave for another?
Slaves by default are read only but you can achieve what you are trying to do by making it master and slave at the same time by passing both --master and --slave when starting your server:
mongod --slave --source master:1234 --master

PostgreSQL - using log shipping to incrementally update a remote read-only slave

My company's website uses a PostgreSQL database. In our data center we have a master DB and a few read-only slave DB's, and we use Londiste for continuous replication between them.
I would like to setup another read-only slave DB for reporting purposes, and I'd like this slave to be in a remote location (outside the data center). This slave doesn't need to be 100% up-to-date. If it's up to 24 hours old, that's fine. Also, I'd like to minimize the load I'm putting on the master DB. Since our master DB is busy during the day and idle at night, I figure a good idea (if possible) is to get the reporting slave caught up once each night.
I'm thinking about using log shipping for this, as described on
http://www.postgresql.org/docs/8.4/static/continuous-archiving.html
My plan is:
Setup WAL archiving on the master DB
Produce a full DB snapshot and copy it to the remote location
Restore the DB and get it caught up
Go into steady state where:
DAYTIME -- the DB falls behind but people can query it
NIGHT -- I copy over the day's worth of WAL files and get the DB caught up
Note: the key here is that I only need to copy a full DB snapshot one time. Thereafter I should only have to copy a day's worth of WAL files in order to get the remote slave caught up again.
Since I haven't done log-shipping before I'd like some feedback / advice.
Will this work? Does PostgreSQL support this kind of repeated recovery?
Do you have other suggestions for how to set up a remote semi-fresh read-only slave?
thanks!
--S
Your plan should work.
As Charles says, warm standby is another possible solution. It's supported since 8.2 and has relatively low performance impact on the primary server.
Warm Standby is documented in the Manual: PostgreSQL 8.4 Warm Standby
The short procedure for configuring a
standby server is as follows. For full
details of each step, refer to
previous sections as noted.
Set up primary and standby systems as near identically as possible,
including two identical copies of
PostgreSQL at the same release level.
Set up continuous archiving from the primary to a WAL archive located
in a directory on the standby server.
Ensure that archive_mode,
archive_command and archive_timeout
are set appropriately on the primary
(see Section 24.3.1).
Make a base backup of the primary server (see Section 24.3.2), and load
this data onto the standby.
Begin recovery on the standby server from the local WAL archive,
using a recovery.conf that specifies a
restore_command that waits as
described previously (see Section
24.3.3).
To achieve only nightly syncs, your archive_command should exit with a non-zero exit status during daytime.
Additional Informations:
Postgres Wiki about Warm Standby
Blog Post Warm Standby Setup
9.0's built-in WAL streaming replication is designed to accomplish something that should meet your goals -- a warm or hot standby that can accept read-only queries. Have you considered using it, or are you stuck on 8.4 for now?
(Also, the upcoming 9.1 release is expected to include an updated/rewritten version of pg_basebackup, a tool for creating the initial backup point for a fresh slave.)
Update: PostgreSQL 9.1 will include the ability to pause and resume streaming replication with a simple function call on the slave.