Two-way replication between MongoDB and DynamoDB - mongodb

We certainly want to use separate databases since the front-end team finds it robust to work with MongoDB Atlas and AWS cloud architects find it easy to work with DynamoDB.
Our architecture:
Web application uses MongoDB to insert, update and retrieve data.
The MongoDB is synced in real-time with DynamoDB.
Background AWS services use DynamoDB for inserting, updating and retrieving data.
The changes in either DynamoDB or MongoDB are replicated to each other.
Tried so far:
We currently do have a sync in place with DyanmoDB streams and MongoDB atlas trigger to listen to changes on each database and forward them to the other. We use lambas for this, but our replication logic is not robust yet.
AWS Database Migration Service with ongoing replication has been suggested but haven't been able to get it to work in our use case. Perhaps, this is one option.
3rd party services like: https://www.cdata.com/sync/
Ideal Fit
The most ideal solution would be an AWS-based solution if not a reliable 3rd party service.
Greatly appreciate any resources or thoughts on this! :)

Related

AWS platform. Picking the right technologies

I am building an app that allows people to share items with other people in the community. I wanted to use AWS as my platform.
My idea was to use react Native for the app. AWS Cognito for the authentication. AWS lambda for the server calls. Relational database for storing data about the items and user data such as geolocation. Dynamodb for real-time chat, requests for borrowing and transaction data between users. My primary focus is low cost and I was thinking of using PostgresSQL for relational database.
What do you guys think of my database choices. Of course the PostgresSQL database on rds. Is there a flaw in database plan so far? Any help would be greatly appreciated.
I would probably just use DynamoDB for everything in your application. I don't see a real need to storing some of your data in an RDS database here. However if you definitely need a relational database, I would suggest AWS Aurora Serverless so that your entire application would be using serverless AWS services. Also, normal relational database connection pools don't work that well in AWS Lambda, so I would suggest using the new Data API.

How do I populate a google big table instance with data using an external url?

I've a google big table instance that need to be populate with data that are in a Postgres Database. My product team give a URL's that allow me to replicate the database. So using simple words I need to duplicate the Postgres database into the google instance and the way that my product team give me is using this url, how can I do this? any tutorial that can help me?
If you are already running PostgreSQL and would like to have a mirror of it on Google Cloud Platform, the best and simplest approach may be to run your own PostgreSQL instance on a Google Compute Engine virtual machine which can be done via several approaches, e.g.,
tutorial for launching PostgreSQL, or
click-to-deploy solution for PostgreSQL by Bitnami
Then, you would want to continuously mirror data from your local instance to the PostgreSQL instance running in Google Cloud to be able to query it. Another SO answer suggests that there are two major approaches to this:
Master/Master replication (Bucardo)
Master/Slave replication (Slony)
Based on your use case where you want to keep your local PostgreSQL instance as the canonical one, and just replicate to Google Cloud for the purpose of querying it, you want a Master/Slave replication, and have the PostgreSQL instance be the read-only replica, so you probably want to use the Slony approach.
For a more in-depth look at PostgreSQL solutions for high availability, load balancing, and replication, see the comparison in the manual.

OrientDB in Azure

We would like to use OrientDB Graph in an Azure environment. Does anybody has experience using it? We also would like to know if high availability from OrientDB is required under Azure cloud? Azure already offers high availability for Azure storage, Azure Drive and SQL. I understand that they have replications and load balancing built in.
This is super important because we prefer not to get into the business of replications and infrastructure management.
Thanks
So you can spin up 2 or more machines and install OrientDB on them, then configure them together as a distributed cluster. However I haven't been able to find any way that is simpler, easier to do. I am interested in this topic too.
Azure does have features such as geo-replication, which is protects your data against a major data-center incident but doesn't provide any performance benefit and will not make it highly available.
Although pretty reliable, occasionally Microsoft will reboot servers for updates, so to protect against downtime you can use affinity groups so that, of your 2 or more servers, one will always be online. This however does need to be used in conjunction with database replication and ideally load balancing.
It's also worth noting that OrientDB recommends clusters have an odd number of servers as this can prevent conflicts when synchronising data after a communication issue between the servers.
I am using it in amazon and I had to create a java project to monitor http requests inserts and queries. The queries are very fast but takes longer inserting data .
I recommend this type of graph database mode to decrease the time of the queries. Also if you have empty fields OrientDB manages very well compared to other databases .
If you need help with the java project can response to this post and I´ll help u.
I hope it helps. Good luck.

RDBMS persistence for couchbase

Folks,
We are evaluating distributed caching solutions for our application. We started with looking at Memcache, then expanded to look at Couchbase. One of our key requirements is the ability to back up the (in-memory) cache reliably to RDBMS and to restore from it in case of nod/cluster failure.
Our preferred option would be to have a configuration switch in couchbase that would cause it to back up new entries to RDBMS.
What we would like to avoid is writing application code that sends cache entries/refreshes explicitly to RDBMS.
Can anyone tell me if couchbase (cluster) can be configured to do so?
Thanks.
-Raj
Couchbase cannot be configured to write through to an RDBMS for backup. What you should take a look at is the Couchbase bucket, not the memcached bucket. The Couchbase bucket uses the memcached layer as a cache and provides replication and persistence out of the box. With this setup you do not need a separate RDBMS because Couchbase will take care of all of the persistence for you and it will replicate your data so that if you have server failures you can just failover any failed nodes and promote other replica nodes to active ones. Take a look at this page http://www.couchbase.com/couchbase-server/features and if you have any other architecture questions here then I would recommend posting them on the Couchbase forums http://www.couchbase.com/forums where some of the developers can give you some more in depth answers.

Azure Table Vs MongoDB on Azure

I want to use a NoSQL database on Windows Azure and the data volume will be very large. Whether a Azure Table storage or a MongoDB database running using a Worker role can offer better performance and scalability? Has anyone used MongoDB on Azure using a Worker role? Please share your thoughts on using MongoDB on Azure over the Azure table storage.
Table Storage is a core Windows Azure storage feature, designed to be scalable (100TB 200TB 500TB per account), durable (triple-replicated in the data center, optionally georeplicated to another data center), and schemaless (each row may contain any properties you want). A row is located by partition key + row key, providing very fast lookup. All Table Storage access is via a well-defined REST API usable through any language (with SDKs, built on top of the REST APIs, already in place for .NET, PHP, Java, Python & Ruby).
MongoDB is a document-oriented database. To run it in Azure, you need to install MongoDB onto a web/worker roles or Virtual Machine, point it to a cloud drive (thereby providing a drive letter) or attached disk (for Windows/Linux Virtual Machines), optionally turn on journaling (which I'd recommend), and optionally define an external endpoint for your use (or access it via virtual network). The Cloud Drive / attached disk, by the way, is actually stored in an Azure Blob, giving you the same durability and georeplication as Azure Tables.
When comparing the two, remember that Table Storage is Storage-as-a-Service: you simply access a well-known REST endpoint. With MongoDB, you're responsible for maintaining the database (e.g. whenever MongoDB Inc (formerly 10gen) pushes out a new version of MongoDB, you'll need to update your server accordingly).
Regarding MongoDB Inc's alpha version pointed to by jtoberon: If you take a close look at it, you'll see a few key things:
The setup is for a Standalone mongodb instance, without replica-sets or shards. Regarding replica-sets, you still get several benefits using the Standalone version, due to the way Blob storage works.
To provide high-availability, you can run with multiple instances. In this case, only one instance serves the database, and one is a 'warm-standby' that launches the mongod process as soon as the other instance fails (for maintenance reboot, hardware failure, etc.).
While 10gen's Windows Azure wrapper is still considered 'alpha,' mongod.exe is not. You can launch the mongod exe just like you'd launch any other Windows exe. It's just the management code around the launching, and that's what the alpa implementation is demonstrating.
EDIT 2011-12-8: This is no longer in an alpha state. You can download the latest MongoDB+Windows Azure project here, which provides replica-set support.
For performance, I think you'll need to do some benchmarking. Having said that, consider the following:
When accessing either Table Storage or MongoDB from, say, a Web Role, you're still reaching out to the Windows Azure Storage system.
MongoDB uses lots of memory for its own cache. For this reason, lots of high-scale MongoDB systems are deployed to larger instance sizes. For Table Storage access, you won't have the same memory-size consideration.
EDIT April 7, 2015
If you want to use a document-based database as-a-service, Azure now offers DocumentDB.
I have used both.
Azure Tables : dead simple, fast, really hard to write even simple queries.
Mongo : runs nicely, lots of querying capabilities, requires several instances to be reliable.
In a nutshell,
if your queries are really simple (key->value), you must run a cost comparison (mainly number of transactions against the storage versus cost of hosting Mongo on Azure). I would rather go to table storage for that one.
If you need more elaborate queries and don't want to go to SQL Azure, Mongo is likely your best bet.
I realize that this question is dated. I'd like to add the following info for those who may come upon this question in their searches.
Note that now, MongoDB is offered as a fully managed service on Azure. (officially in Beta as of Apr '15)
See:
http://www.mongodb.com/partners/cloud/microsoft
or
https://azure.microsoft.com/en-us/blog/announcing-new-mongodb-instances-on-microsoft-azure/
See (including pricing):
https://azure.microsoft.com/en-us/marketplace/partners/mongolab/mongolab/
My first choice is AzureTables because SAAS model and low cost and SLA 99.99% http://alexandrebrisebois.wordpress.com/2013/07/09/what-if-20000-windows-azure-storage-transactions-per-second-isnt-enough/
some limits..
http://msdn.microsoft.com/en-us/library/windowsazure/jj553018.aspx
http://www.windowsazure.com/en-us/pricing/calculator/?scenario=data-management
or AzureSQL for small business
DocumentDB
http://azure.microsoft.com/en-us/documentation/services/documentdb/
http://azure.microsoft.com/en-us/documentation/articles/documentdb-limits/
second choice is many cloud providers including Amazon offer S3
or Google tables https://developers.google.com/bigquery/pricing
nTH choice manage the SHOW all by myself have no sleep MongoDB well I will look again the first two SAAS
My choice if I am running "CLOUD" I will go for SAAS model as much as possible "RENT-IT"...
The question is what my app needs is it AzureTables or DocumentDB or AzureSQL
DocumentDB documentation
http://azure.microsoft.com/en-us/documentation/services/documentdb/
How Azure pricing works
http://azure.microsoft.com/en-us/pricing/details/documentdb/
this is fun
http://www.documentdb.com/sql/demo
At Build 2016 it was announced that DocumentDB would support all MongoDB drivers. This solves some of the lack of tooling issues with DocDB and also makes it easier to migrate Mongo apps.
Above answers are all good - but the real answer depends on what your requirements are. You need to understand what size of data you are processing, what types of operations you want to perform on the data and then select the solution that meets your needs.
One thing to remember is Azure Table Storage doesn't support complex data types.It supports every property in entity to be a String or number or boolean or date etc.
One can't store an object against a key,which i feel is must for NoSql DB.
https://learn.microsoft.com/en-us/rest/api/storageservices/fileservices/understanding-the-table-service-data-model scroll to Property Types