AWS Glue with MongoDB Atlas

I've tried multiple things to connect AWS Glue to MongoDB Atlas. Has anyone been successful in doing so? If so, could you please help me with the steps?
The AWS documentation claims that it should work with any compatible MongoDB connection string, but it doesn't.

I am facing a similar issue. I checked with the AWS support team, and it seems they have a large backlog of similar issues where customers have requested the ability to connect to MongoDB Atlas. Unfortunately, they don't have an ETA for this.
You can either migrate to Amazon DocumentDB and then use Glue to crawl your data store, or find some other way to get your data from Atlas into a layer that Glue properly supports, such as S3 (see the sketch below).
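For the second option, here is a minimal sketch (not an official integration; the URI, bucket, and collection names are placeholders): dump the Atlas collection to S3 as newline-delimited JSON, which a Glue crawler can then catalog.

```python
# Sketch: export a MongoDB Atlas collection to S3 as JSON lines so that
# a Glue crawler can catalog it. URI, bucket, and names are placeholders.
import boto3
from bson import json_util  # ships with pymongo; handles ObjectId, dates
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:pass@mycluster.mongodb.net")  # placeholder
docs = client["mydb"]["mycollection"].find()

# Newline-delimited JSON is a format Glue crawlers classify out of the box.
body = "\n".join(json_util.dumps(doc) for doc in docs)

s3 = boto3.client("s3")
s3.put_object(Bucket="my-glue-staging-bucket",
              Key="mycollection/export.json",
              Body=body.encode("utf-8"))
```

From there, point a Glue crawler at s3://my-glue-staging-bucket/mycollection/ and it should infer the schema.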

Related

Is ToroDB available for AWS DocumentDB?

Hi, I'd like to move some data from AWS DocumentDB to PostgreSQL. I found ToroDB on the web, and it looks like the best tool for my task. It's implemented for MongoDB; I'd like to know if I can also use it with AWS DocumentDB (with MongoDB compatibility).
Thank you in advance!
From what I can see in the ToroDB documentation, it uses the oplog to tail changes in MongoDB, not change streams. Amazon DocumentDB doesn't have the oplog collection, but it does support change streams. Unless ToroDB gets updated to support change streams (which I doubt it will, since the documentation says only MongoDB 3.2 and 3.4 are supported and the last GitHub update was five years ago), it is not going to be able to live-replicate from DocumentDB.
However, if the destination is AWS RDS PostgreSQL or AWS Aurora PostgreSQL, then AWS DMS can be used to replicate from DocumentDB.
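If you do end up building something yourself, consuming DocumentDB change streams directly is straightforward. A minimal sketch with pymongo (assuming change streams have been enabled on the collection and the usual DocumentDB TLS setup; the endpoint and names are placeholders):

```python
# Sketch: consume a DocumentDB change stream with pymongo.
# Assumes change streams are enabled on the target collection.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://user:pass@mycluster.cluster-xxxx.us-east-1.docdb.amazonaws.com:27017",
    tls=True,
    tlsCAFile="global-bundle.pem",   # the Amazon DocumentDB CA bundle
    replicaSet="rs0",
    retryWrites=False,               # DocumentDB does not support retryable writes
)

collection = client["mydb"]["mycollection"]

# watch() yields one event per insert/update/delete on the collection.
with collection.watch() as stream:
    for change in stream:
        print(change["operationType"], change.get("documentKey"))
```

Each event carries the operation type and document key (and optionally the full document), which is enough information to apply the change to a PostgreSQL target.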

Migrate data from Citus to RDS

Since Citus is not going to be available as a managed service in AWS, I am trying to move the database to RDS (not the whole history, only the transactional portion, as an OLTP workload). The migration from Citus is not straightforward because the data does not reside on a single node. I want to check what options we might have to move data from Citus to RDS.
Amazon DMS: This option is good for the supported databases (PostgreSQL), but we do not know how it will behave with Citus, given the distributed nature of the engine. Has anyone migrated the data to S3, to another DB, or something along these lines?
I saw this AWS whitepaper https://d1.awsstatic.com/whitepapers/aws-cloud-data-ingestion-patterns-practices.pdf?did=wp_card&trk=wp_card on how to ingest data from different sources, and DMS seems like a good option, but I do not know the internals of Citus well enough to tell whether we would get all the data and capture the CDC correctly.
A custom migration: Via a support ticket, we can access the S3 buckets that Citus uses for disaster recovery, where the WAL logs are available, and we could use something like WAL-G to take those logs and replay them on a Postgres instance. The issue here is that this is a very custom migration and the development time might be too high.
Is there any other option to move data from Citus to RDS or Aurora in AWS? What looks like a good path for this database migration? All the documents I found describe moving data the other way around, from Aurora or RDS to Citus. The simplest fallback I can think of is a plain snapshot copy through the coordinator, sketched below.
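A sketch of that fallback (hypothetical connection strings and table name; this copies a point-in-time snapshot only and captures no CDC). It relies on the fact that querying a distributed table through the Citus coordinator returns rows from all shards:

```python
# Sketch: snapshot-copy a Citus distributed table into RDS PostgreSQL
# via COPY. Connection strings and table name are placeholders; the
# target schema must already exist on RDS.
import psycopg2

SRC_DSN = "host=citus-coordinator.example.com dbname=app user=app password=..."
DST_DSN = "host=myinstance.xxxx.us-east-1.rds.amazonaws.com dbname=app user=app password=..."
TABLE = "events"  # hypothetical table

with psycopg2.connect(SRC_DSN) as src, psycopg2.connect(DST_DSN) as dst:
    with src.cursor() as out_cur, dst.cursor() as in_cur, \
            open("snapshot.csv", "w+b") as buf:
        # Reading through the coordinator gathers rows from every shard.
        out_cur.copy_expert(
            f"COPY (SELECT * FROM {TABLE}) TO STDOUT WITH CSV", buf)
        buf.seek(0)
        in_cur.copy_expert(f"COPY {TABLE} FROM STDIN WITH CSV", buf)
```

For anything beyond a one-off snapshot (i.e., ongoing CDC), DMS or the WAL-G route above would still be needed.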
Sumedh from Citus Cloud here. Please go ahead and open a support ticket with us to further investigate solutions. We can evaluate if using DMS is a viable approach for your use-case.

How to store files larger than 16MB in AWS DocumentDB?

I am switching from MongoDB to AWS DocumentDB. However, in MongoDB I used GridFS to store and retrieve files larger than 16MB, and this is not supported in AWS DocumentDB. Is there any way to store or process large files (>16MB) in AWS DocumentDB?
Any help or leads would be appreciated. Thanks!
GridFS is a client-side construct and should just work with DocumentDB. However, we do not test it and thus don't officially support it. Did you encounter any issues when using GridFS with DocumentDB?
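For anyone who wants to verify this, here is a minimal GridFS round trip with pymongo (the endpoint and file name are placeholders; the same client code targets MongoDB or DocumentDB):

```python
# Sketch: GridFS round trip with pymongo. GridFS chunks the file on the
# client into ~255 KB pieces, so the 16 MB document limit applies to each
# chunk rather than to the file as a whole.
import gridfs
from pymongo import MongoClient

client = MongoClient("mongodb://user:pass@cluster-endpoint:27017")  # placeholder
fs = gridfs.GridFS(client["files_db"])

# Store a large file; it is split across the fs.chunks collection.
with open("big_video.mp4", "rb") as f:  # hypothetical file
    file_id = fs.put(f, filename="big_video.mp4")

# Read it back.
data = fs.get(file_id).read()
print(len(data), "bytes retrieved")
```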
If you switched to MongoDB Atlas, which runs very nicely on top of AWS, you would still be able to use GridFS to store your files.

MongoDB Atlas + AWS Lambda: cannot set up a security whitelist?

I have been reading a few posts on this, but they are a bit outdated, so I just want to confirm whether my understanding is still correct today.
I am currently playing around with MongoDB Atlas, using the free tier only.
I am thinking of using AWS Lambda to periodically access MongoDB and make a backup to S3.
To my understanding, I would have to whitelist 0.0.0.0, which I do not want to do at all, because it would be a bad idea for a production DB.
From the readings, as of today: if I don't want to add 0.0.0.0 to the MongoDB Atlas whitelist, I have to create a VPC in AWS just for the Lambda function, right?
Also, is it correct that the MongoDB Atlas free tier does not support that kind of connection?
Thanks in advance for any help and suggestions to my understanding.
These are the posts I read:
https://www.mongodb.com/blog/post/introducing-vpc-peering-for-mongodb-atlas
https://www.mongodb.com/blog/post/serverless-development-with-nodejs-aws-lambda-mongodb-atlas
AWS Lambda To Atlas
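For reference, the kind of backup function I have in mind is roughly this (a sketch only; the URI, bucket, and collection names are placeholders, and pymongo and boto3 would need to be packaged with the function):

```python
# Sketch: Lambda handler that dumps a collection to S3 as JSON lines.
# MONGODB_URI, the bucket, and the collection names are placeholders.
import os
import time

import boto3
from bson import json_util  # ships with pymongo; serializes ObjectId, dates
from pymongo import MongoClient

# Created outside the handler so warm invocations reuse the connection.
client = MongoClient(os.environ["MONGODB_URI"])
s3 = boto3.client("s3")

def handler(event, context):
    docs = client["mydb"]["mycollection"].find()
    body = "\n".join(json_util.dumps(doc) for doc in docs)
    key = f"backups/mycollection-{int(time.time())}.jsonl"
    s3.put_object(Bucket="my-backup-bucket", Key=key, Body=body.encode("utf-8"))
    return {"status": "ok", "key": key}
```

The whitelisting question is orthogonal to the code: this handler works the same whether the function reaches Atlas over the public internet or through VPC peering.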

MongoLab and Elasticsearch

My Mongo database is hosted at MongoLab. I'd like to use Elasticsearch as a full-text search engine on top of my DB.
As I understand it, MongoDB needs to run as a replica set for this, but I don't have any control over how the database runs. I'm currently using the 500MB free plan.
On top of that, I'm using the Scala Play framework.
Has anyone been successful with this combination of technologies and services?
Update:
In the end I'm not using MongoDB anymore and went straight for an Elasticsearch solution.
I found a nice cloud host providing a 500MB free plan: http://facetflow.com/
It was very useful for my development.
I didn't find any satisfying Scala library for ES, so I'm using Dispatch to make direct HTTP requests to the ES instance (see the sketch after this post).
I hope that someone will find this useful.
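To illustrate the direct-HTTP approach (shown here in Python for brevity; the host and index names are placeholders, and the same two requests translate directly to Dispatch):

```python
# Sketch: talking to Elasticsearch over plain HTTP, no client library.
# Host and index names are placeholders.
import requests

ES = "http://localhost:9200"  # hypothetical ES endpoint

# Index a document.
requests.post(f"{ES}/articles/_doc",
              json={"title": "Hello", "body": "Full text goes here"})

# Run a full-text query against the indexed field.
resp = requests.get(f"{ES}/articles/_search",
                    json={"query": {"match": {"body": "full text"}}})
print(resp.json()["hits"])
```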
Just a quick note ... MongoHQ has oplog support with their MongoDB Elastic Deployments ... those could help you use Elasticsearch with the River.
http://blog.mongohq.com/elastic-deployments-now-with-oplog-access/
I haven't looked into this too deeply, but you might want to check out Searchly (http://www.searchly.com/features/). The features mention:
"Built-in crawler for crawling web pages and databases. (Currently MongoDB)"
If you try this out, please let me know how it goes. I will do the same.
Update:
I haven't tried Searchly, but I was able to start a MongoDB instance in replica mode on OpenShift.
I also have an Elasticsearch server running on the same OpenShift "gear".
Now I need time to try connecting those two together, and then the fun will start :-)