Using Amazon S3 as a File System for MongoDB

I have decided to use MongoDB as the document-management DB in my application. Initially I was thinking of using S3 as the data store, but it seems MongoDB uses the local file system to store its data. Can I use S3 as the data store for MongoDB?
Thanks!

Not directly: MongoDB stores its data files on a local file system, and S3 is an object store rather than a mountable file system, so it can't serve as MongoDB's data directory. On AWS, EBS volumes with Provisioned IOPS are ideal for MongoDB.
This link has notes about running MongoDB on AWS and is rather useful.
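If you're provisioning that storage yourself, here is a minimal boto3 sketch for creating a Provisioned IOPS volume; the region, availability zone, size, and IOPS figures below are placeholder assumptions, not recommendations:

    import boto3

    # Create a Provisioned IOPS (io1) EBS volume to back MongoDB's data directory.
    # Region, AZ, size, and IOPS are hypothetical values; tune them for your workload.
    ec2 = boto3.client("ec2", region_name="us-east-1")

    volume = ec2.create_volume(
        AvailabilityZone="us-east-1a",
        Size=200,          # GiB
        VolumeType="io1",  # Provisioned IOPS SSD
        Iops=4000,         # provisioned IOPS for consistent MongoDB throughput
    )
    print(volume["VolumeId"])

You would then attach the volume to the instance, format it, and point MongoDB's dbPath at the mount.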

Related

Restore a MongoDB database stored in an S3 bucket without loading it to the local machine

I have a MongoDB backup in an S3 bucket (a tar file) and I want to restore/import it to a MongoDB Cloud Atlas instance without having the data pass through my local machine, since it is a very large file. Is this possible?
You need some compute resource to do it. Try a Lambda function. Another possibility would be an EC2 instance or a container-based (Docker) Lambda, if the limits of a standard Lambda get in your way.
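If you take the EC2 (or container) route, a minimal Python sketch can stream the dump from S3 straight into mongorestore without it ever touching local disk. This assumes the backup was created with mongodump --archive --gzip (a plain tar of a dump directory would have to be extracted to an attached volume first); the bucket, key, and connection string are hypothetical:

    import subprocess
    import boto3

    # Hypothetical names; replace with your bucket, key, and Atlas connection string.
    BUCKET = "my-backup-bucket"
    KEY = "backups/dump.archive.gz"
    ATLAS_URI = "mongodb+srv://user:password@cluster0.example.mongodb.net"

    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket=BUCKET, Key=KEY)

    # --archive with no filename makes mongorestore read the dump from stdin,
    # so we can pipe the S3 object's streaming body directly into it.
    proc = subprocess.Popen(
        ["mongorestore", "--uri", ATLAS_URI, "--archive", "--gzip"],
        stdin=subprocess.PIPE,
    )
    for chunk in obj["Body"].iter_chunks(chunk_size=1024 * 1024):
        proc.stdin.write(chunk)
    proc.stdin.close()
    proc.wait()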

How to dump file data into AWS Elasticsearch

I have a large JSON text file (newline-delimited JSON documents generated by querying DynamoDB based on a filter criterion). What is the best way to index these documents into the AWS managed Elasticsearch service?
If your data is in the form of a local file or on EC2, you can consider using Logstash.
The open source version of Logstash (Logstash OSS) provides a convenient way to use the bulk API to upload data into your Amazon Elasticsearch Service (Amazon ES) domain.
https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-managedomains-logstash.html
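If Logstash feels heavyweight, the same _bulk API that Logstash uses can be driven by a short script. A rough sketch, assuming a hypothetical domain endpoint and index name and an access policy that does not require SigV4-signed requests:

    import json
    import requests

    # Hypothetical endpoint and index; a locked-down domain would need signed requests.
    ES_ENDPOINT = "https://search-mydomain.us-east-1.es.amazonaws.com"
    INDEX = "my-index"
    BATCH_SIZE = 500  # documents per bulk request

    def flush(lines):
        # The bulk API expects newline-delimited action/document pairs.
        resp = requests.post(
            f"{ES_ENDPOINT}/_bulk",
            data="".join(lines),
            headers={"Content-Type": "application/x-ndjson"},
        )
        resp.raise_for_status()

    lines = []
    with open("documents.jsonl") as f:  # one JSON document per line
        for line in f:
            lines.append(json.dumps({"index": {"_index": INDEX}}) + "\n")
            lines.append(line.rstrip("\n") + "\n")
            if len(lines) >= BATCH_SIZE * 2:  # two lines per document
                flush(lines)
                lines = []
    if lines:
        flush(lines)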

How to store files larger than 16MB in AWS DocumentDB?

I am switching from MongoDB to AWS DocumentDB. However, in MongoDB I used GridFS to store and retrieve files larger than 16MB, and this is not supported in AWS DocumentDB. Is there any way to store or process large files (>16MB) in AWS DocumentDB?
Any help or leads would be appreciated. Thanks!
GridFS is a client construct and should just work with DocumentDB. However, we do not test it and thus don't officially support it. Did you encounter any issues when using GridFS with DocumentDB?
If you switched to MongoDB Atlas, which runs very nicely on top of AWS, you would still be able to use GridFS to store your files.
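Either way, the GridFS client API is the same; here is a minimal PyMongo sketch (the connection string and file names are placeholders):

    import gridfs
    from pymongo import MongoClient

    # Placeholder connection string; point it at DocumentDB or Atlas as appropriate.
    client = MongoClient("mongodb://user:password@host:27017")
    db = client["files_db"]
    fs = gridfs.GridFS(db)

    # GridFS splits the file into chunks (255 KB each by default), which is how
    # it sidesteps the 16 MB document limit.
    with open("large_video.mp4", "rb") as f:
        file_id = fs.put(f, filename="large_video.mp4")

    # Read the file back.
    data = fs.get(file_id).read()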

How can I test the MariaDB CONNECT Storage Engine with MongoDB - Does a connector exist?

According to a few articles I've read (e.g. here and here), MariaDB supports connecting to, sending commands to, and querying external data sources using the CONNECT storage engine.
Specifically, I'd like to test with MongoDB. Is there a connector I can download, and documentation specific to MongoDB? My Google searches so far have come up short.
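For what it's worth, the CONNECT engine's documentation describes a MONGO table type (MariaDB 10.3 and later, backed by the Mongo C or Java driver). Below is a rough sketch of a test driven from Python with placeholder credentials; the exact DDL options may differ by MariaDB version and build, so treat this as an assumption to verify against your version's CONNECT docs:

    import mysql.connector  # pip install mysql-connector-python

    # Placeholder credentials; assumes MariaDB >= 10.3 with the CONNECT engine
    # installed (INSTALL SONAME 'ha_connect') and built with MongoDB support.
    conn = mysql.connector.connect(
        host="localhost", user="root", password="secret", database="test"
    )
    cur = conn.cursor()

    # The CONNECT engine's MONGO table type maps a MongoDB collection to a SQL table.
    cur.execute("""
        CREATE TABLE mongo_people
        ENGINE=CONNECT TABLE_TYPE=MONGO
        TABNAME='people' DBNAME='test'
        CONNECTION='mongodb://localhost:27017'
    """)

    cur.execute("SELECT * FROM mongo_people LIMIT 5")
    for row in cur.fetchall():
        print(row)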

How to continuously write mongodb data into a running hdinsight cluster

I want to keep a Windows Azure HDInsight cluster running at all times so that I can periodically write updates from my master data store (which is MongoDB) and have it process map-reduce jobs on demand.
How can I periodically sync data from MongoDB to the HDInsight service? I'm trying to avoid uploading all the data each time a new query is submitted (which can happen at any time), and instead have it somehow pre-warmed.
Is that possible with HDInsight? Is it even possible with Hadoop?
Thanks,
It is certainly possible to have that data pushed from Mongo into Hadoop.
Unfortunately, HDInsight does not support HBase (yet); otherwise you could use something like ZeroWing, a solution from Stripe that reads the MongoDB oplog (used by Mongo for replication) and then writes it out to HBase.
Another solution might be to write out documents from your Mongo instance to Azure Blob storage. This means you wouldn't have to keep the cluster up all the time, but you would still be able to use it for periodic map-reduce analytics against the files in the storage vault.
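As a rough sketch of that Blob-storage approach (the connection strings, container, and collection names are all placeholders), a scheduled job could export documents as newline-delimited JSON; a real sync would also track a timestamp or oplog position so that only new documents are exported each run:

    import json
    from pymongo import MongoClient
    from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

    # Placeholder connection details.
    mongo = MongoClient("mongodb://localhost:27017")
    collection = mongo["master_db"]["events"]

    blob_service = BlobServiceClient.from_connection_string(
        "<azure-storage-connection-string>"
    )
    container = blob_service.get_container_client("hdinsight-input")

    # Export documents as newline-delimited JSON, one blob per batch, so the
    # HDInsight cluster can run map-reduce over the files on demand.
    lines = "\n".join(
        json.dumps(doc, default=str) for doc in collection.find({}, {"_id": 0})
    )
    container.upload_blob(name="events/batch-0001.json", data=lines, overwrite=True)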
Your best method is undoubtedly to use the Mongo Hadoop connector. This can be installed in HDInsight, but it's a bit fiddly. I've blogged a method here.