How does one synchronise an on-premises MongoDB with an Azure DocumentDB? - mongodb

I'm looking for the best way to synchronise an on-premises MongoDB with an Azure DocumentDB. The idea is that the synchronisation runs at a predetermined interval, for example every 2 hours.
I'm using .NET and C#.
I was thinking of creating a Windows Service that retrieves the documents from the Azure DocumentDB collections and inserts them into my on-premises MongoDB.
But I'm wondering if there is any better way.

As I understand it, you could use the Azure Cosmos DB Data Migration Tool to export the documents from your collections to a JSON file, then pick up the exported file(s) and insert or update them in your on-premises MongoDB. There is also a tutorial on using the Windows Task Scheduler to back up DocumentDB, which you could follow for the scheduling part.
When running the export, you can write to a local file or to Azure Blob Storage. For a local file, you could use the FileTrigger from the Azure WebJobs SDK Extensions to watch a particular directory for added or changed files, then pick up each newly exported file and insert it into your MongoDB. For Blob Storage, you could likewise use the WebJobs SDK with a BlobTrigger that fires on each new blob and performs the insertion. For the blob approach, see "How to use Azure blob storage with the WebJobs SDK".
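For the import half of this approach, a minimal sketch might look like the following. It is written in Python for brevity (the same shape carries over to C#), and it rests on a few assumptions: the export file is a single JSON array of documents, every document carries a Cosmos DB `id`, and `collection` is a pymongo collection. The function names are illustrative and not part of any tool.

```python
import json

# Properties that Cosmos DB / DocumentDB adds to every stored document.
SYSTEM_PROPERTIES = ("_rid", "_self", "_etag", "_attachments", "_ts")


def to_mongo_document(doc):
    """Drop the system properties and reuse the Cosmos DB id as MongoDB's _id."""
    mongo_doc = {
        key: value
        for key, value in doc.items()
        if key not in SYSTEM_PROPERTIES and key != "id"
    }
    mongo_doc["_id"] = doc["id"]
    return mongo_doc


def load_export(path):
    """Read an exported file (assumed here to be one JSON array of documents)."""
    with open(path, encoding="utf-8") as handle:
        return [to_mongo_document(doc) for doc in json.load(handle)]


def upsert_all(collection, documents):
    """Insert new documents and overwrite existing ones with the same _id."""
    for doc in documents:
        # replace_one(..., upsert=True) inserts the document when no match exists.
        collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
    return len(documents)
```

Keying on `_id` means a run that processes the same export twice overwrites the documents instead of duplicating them, which matters if the job fires every 2 hours on overlapping data.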

Related

How to take a backup of on-premises MongoDB from Azure

Is it possible to back up a MongoDB instance hosted on an Azure VM (Ubuntu 18.04) using Azure services?
I already have a script that takes a backup of MongoDB and sends it to Azure Blob Storage.
But I don't want to take the backup on the Azure VM itself and then send it to Blob Storage.
Instead, is there a way to take the backup from an Azure service?
I looked into it and found that we could migrate the database to Azure Cosmos DB and take backups there, but I can't afford the cost of that. So I want to know whether there is any other way to take the backup directly from Azure.

Is it a good idea to backup an Azure VM running MongoDB using Azure Backup?

We have a VM that runs MongoDB and a web API in front of it.
Currently there is no backup solution enabled.
My goal is to protect data in MongoDB.
Is it a good idea to snapshot the whole Azure VM instance daily?
Or should I go for a database backup solution?

Loading data from S3 to PostgreSQL RDS

We are planning to use PostgreSQL on RDS in our AWS environment. There are some files in S3 that we will need to load every week. I can't find any option in the AWS documentation for loading data from S3 into PostgreSQL RDS. I see it is possible for Aurora but cannot find anything for PostgreSQL.
Any help will be appreciated.
One option is to use AWS Data Pipeline. It's essentially a JSON script that allows you to orchestrate the flow of data between sources on AWS.
There's a template offered by AWS that's set up to move data between S3 and MySQL. You can find it here. You can follow it and swap out the MySQL parameters for those of your Postgres instance. Data Pipeline simply looks for RDS as the type and does not distinguish between MySQL and Postgres instances.
Scheduling is also supported by Data Pipeline, so you can automate your weekly file transfers.
To get started:
1. Go to the Data Pipeline service in your AWS console
2. Select "Build from template" under source
3. Select the "Load S3 to MySQL table" template
4. Fill in the rest of the fields and create the pipeline
From there, you can monitor the progress of the pipeline in the console!

How can we automatically back up a Compute Engine disk every day in Google Cloud?

I have created a Compute Engine instance running Windows Server 2012. I can't see any option to automatically back up the instance's disk (which holds a database) every day. There is the snapshot option, but we have to run it manually. Please suggest a way to take backups automatically that can be restored with a single click. If there is another possibility using Cloud SQL, Cloud Storage or any other storage, please recommend it.
thanks
There's an API to take snapshots, see API section here:
https://cloud.google.com/compute/docs/disks/create-snapshots#create_your_snapshot
You can write a simple app, triggered from cron or a similar scheduler, that takes a snapshot periodically.
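A rough sketch of such an app is below. It only builds a timestamped snapshot name and hands it to the `gcloud compute disks snapshot` command, so it assumes the Cloud SDK is installed and authenticated on the machine that runs it. The disk name, zone and naming scheme are placeholders, not anything from the question.

```python
import subprocess
from datetime import datetime, timezone


def snapshot_name(disk, now):
    """Build a unique, sortable name such as my-disk-20240102-030405."""
    return f"{disk}-{now.strftime('%Y%m%d-%H%M%S')}"


def snapshot_command(disk, zone, now):
    """Return the gcloud invocation that snapshots one disk."""
    return [
        "gcloud", "compute", "disks", "snapshot", disk,
        f"--zone={zone}",
        f"--snapshot-names={snapshot_name(disk, now)}",
    ]


def take_snapshot(disk, zone):
    """Run the command; raises CalledProcessError if gcloud reports a failure."""
    command = snapshot_command(disk, zone, datetime.now(timezone.utc))
    subprocess.run(command, check=True)
```

Scheduled once a day (cron on Linux, Task Scheduler on Windows), a call like `take_snapshot("my-disk", "us-central1-a")` would produce one snapshot per run. Deleting old snapshots is left out of this sketch.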
There is no built-in provision for automatic backups of a Compute Engine disk, but you can back up a disk manually by creating a snapshot.
The best alternative is to create a bucket and move your files there. Google Cloud Storage buckets have an automated backup facility available.
Cloud Storage and Cloud SQL are your options for automated backups in Google Cloud.

How to continuously write mongodb data into a running hdinsight cluster

I want to keep a windows azure hdinsight cluster always running so that I can periodically write updates from my master data store (which is mongodb) and have it process map-reduce jobs on demand.
How can I periodically sync data from mongodb with the hdinsight service? I'm trying to avoid uploading all the data whenever a new query is submitted, which could be at any time, and instead have it somehow pre-warmed.
Is that possible on hdinsight? Is it even possible with hadoop?
Thanks,
It is certainly possible to have that data pushed from Mongo into Hadoop.
Unfortunately HDInsight does not support HBase (yet), otherwise you could use something like ZeroWing, which is a solution from Stripe that reads the MongoDB Op log used by Mongo for replication and then writes that out to HBase.
Another solution might be to write out documents from your Mongo to Azure Blob storage. This means you wouldn't have to keep the cluster up all the time, but would still be able to use it to do periodic map reduce analytics against the files in the storage vault.
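To illustrate the Blob storage option, here is a small sketch of the serialisation step only. It assumes you write one JSON document per line, a layout that line-oriented map reduce jobs can split easily, and that non-JSON values such as ObjectIds or dates are acceptable as plain strings. The function name is made up for the example, and the upload itself is left to whichever Azure Storage client you use.

```python
import json


def to_json_lines(documents):
    """Serialise documents as newline-delimited JSON, one document per line.

    default=str turns values json cannot encode (ObjectId, datetime, ...)
    into their string form; that is a simplification, not a lossless format.
    """
    return "".join(json.dumps(doc, default=str) + "\n" for doc in documents)
```

For example, `to_json_lines(collection.find())` with a pymongo collection would give you text ready to upload as a single blob. Writing only the documents changed since the last run would keep each periodic upload small.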
Your best method is undoubtedly to use the Mongo Hadoop connector. This can be installed in HDInsight, but it's a bit fiddly. I've blogged a method here.