Restore MongoDB database stored in S3 bucket without loading it to local machine

I have a MongoDB backup (a tar file) in an S3 bucket, and I want to restore/import it into a MongoDB Atlas instance without the data passing through my local machine, since it is a very large file. Is this possible?

You need some compute resource to do it. Try a Lambda function. Another possibility would be an EC2 instance or a container-based Lambda, if the limits of a standard Lambda are too restrictive for you.
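For example, a minimal sketch of doing the restore from an EC2 instance (so the data never touches your machine), assuming the MongoDB database tools are installed there; the bucket, key and connection string below are placeholders:

```python
import subprocess
import tarfile

import boto3

BUCKET = "my-backup-bucket"   # placeholder
KEY = "backups/mydb.tar"      # placeholder
ATLAS_URI = "mongodb+srv://user:password@cluster0.example.mongodb.net"  # placeholder

# 1. Pull the tarball from S3 onto the EC2 instance's disk.
s3 = boto3.client("s3")
s3.download_file(BUCKET, KEY, "/tmp/mydb.tar")

# 2. Unpack it; a typical mongodump tarball contains a "dump/" directory.
with tarfile.open("/tmp/mydb.tar") as tar:
    tar.extractall("/tmp/restore")

# 3. Restore the extracted dump straight into Atlas.
subprocess.run(
    ["mongorestore", "--uri", ATLAS_URI, "--dir", "/tmp/restore/dump"],
    check=True,
)
```

With Lambda, the main limits to watch are the 15-minute timeout and the /tmp size (512 MB by default, configurable up to 10 GB), which is why an EC2 instance is often simpler for very large dumps.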

Related

Google Cloud SQL PostgreSQL replication?

I want to make sure that there's not a better (easier, more elegant) way of emulating what I think is typically referred to as "logical replication" ("logical decoding"?) within the PostgreSQL community.
I've got a Cloud SQL instance (PostgreSQL v9.6) that contains two databases, A and B. I want B to mirror A, as closely as possible, but don't need to do so in real time or anything near that. Cloud SQL does not offer the capability of logical replication where write-ahead logs are used to mirror a database (or subset thereof) to another database. So I've cobbled together the following:
A Cloud Scheduler job publishes a message to a topic in Google Cloud Platform (GCP) Pub/Sub.
A Cloud Function kicks off an export. The exported file is in the form of a pg_dump file.
The dump file is written to a named bucket in Google Cloud Storage (GCS).
Another Cloud Function (the import function) is triggered by the writing of this export file to GCS.
The import function makes an API call to delete database B (the pg_dump file created by the export API call does not contain initial DROP statements and there is no documented facility for adding them via the API).
It creates database B anew.
It makes an API call to import the pg_dump file.
It deletes the old pg_dump file.
That's five different objects across four GCP services, just to obtain already existing, native functionality in PostgreSQL.
Is there a better way to do this within Google Cloud SQL?
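For concreteness, the import function described above might look roughly like the sketch below, using the Cloud SQL Admin API through google-api-python-client; the project, instance and bucket names are placeholders, and error handling plus long-running-operation polling (each call returns an operation that must finish before the next can start) are omitted:

```python
from googleapiclient import discovery

PROJECT = "my-project"                         # placeholder
INSTANCE = "my-cloudsql-instance"              # placeholder
DATABASE = "B"
DUMP_URI = "gs://my-export-bucket/a_dump.sql"  # placeholder, taken from the GCS event

def import_dump(event, context):
    """Cloud Function triggered by the export file landing in GCS."""
    sqladmin = discovery.build("sqladmin", "v1beta4")

    # Drop and recreate database B (the export contains no DROP statements).
    sqladmin.databases().delete(
        project=PROJECT, instance=INSTANCE, database=DATABASE
    ).execute()
    sqladmin.databases().insert(
        project=PROJECT, instance=INSTANCE, body={"name": DATABASE}
    ).execute()

    # Import the pg_dump file into the freshly created database.
    body = {
        "importContext": {
            "fileType": "SQL",
            "uri": DUMP_URI,
            "database": DATABASE,
        }
    }
    sqladmin.instances().import_(
        project=PROJECT, instance=INSTANCE, body=body
    ).execute()
```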

Best way to set up a Jupyter notebook project in AWS

My current project has the following structure:
It starts with a script in a Jupyter notebook which downloads data from a CRM API into a local PostgreSQL database that I run with pgAdmin. It then runs a cluster analysis, returns some scoring values, creates a table in the database with the results, and pushes those values back to the CRM with another API call. This process takes between 10 and 20 hours (the API only allows 400 requests per minute).
The second notebook reads the database, detects the last update, makes API calls to refresh the database since that last run, runs a k-means analysis to cluster the data, compares the results with the previous run, and updates both the new records and the CRM via the API. This second process takes less than 2 hours by my estimate, and I want it to run every 24 hours.
After testing, this works fine. Now I'm evaluating how to put it into production on AWS. I understand that for the notebooks I need SageMaker, and from what I have seen that is not too complicated; my only doubt is whether I can call the API without writing additional code or whether some configuration is needed. My second problem is the database. I don't understand the difference between RDS (which I think is the one I have to use for this) and Aurora or S3. My goal is to write as little code as possible. I have tried some RDS tutorials like this one: https://www.youtube.com/watch?v=6fDTre5gikg&t=10s, and I understand it connects my local Postgres to AWS, but I can't find the data in the AWS console (it only creates an instance?), and I don't know how to connect to it from SageMaker to analyse the data. My final goal is to run the notebooks in the cloud and connect to my Postgres in the cloud. Any orientation on how to use these tools would be appreciated.
I don't understand the difference between RDS (which I think is the one I have to use for this) and Aurora or S3
RDS and Aurora are relational database services fully managed by AWS. "Regular" RDS lets you launch the existing popular database engines, such as MySQL, PostgreSQL and others, which you could also run at home or at work.
Aurora is AWS's in-house, cloud-native database engine, compatible with MySQL and PostgreSQL. It can store the same data as RDS MySQL or PostgreSQL, but provides a number of features not available in RDS, such as more read replicas, distributed storage, global databases and more.
S3 is not a database but an object store, where you can keep files such as images, CSVs and Excel spreadsheets, much as you would store them on your own computer.
I understand it connects my local Postgres to AWS, but I can't find the data in the AWS console (it only creates an instance?)
You can migrate your data from your local Postgres to RDS or Aurora if you wish. But neither RDS nor Aurora will connect to your existing local database; they are databases themselves.
My final goal is to run the notebooks in the cloud and connect to my postgres in the cloud.
I don't see a reason why you wouldn't be able to connect to the database. You can try to make it work, and if you run into difficulties you can ask a new question on SO with the details of your RDS/Aurora setup.
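As a rough sketch of that last step: connecting to an RDS/Aurora PostgreSQL instance from a SageMaker notebook is just an ordinary PostgreSQL connection (the endpoint, credentials and table name below are placeholders, and the notebook must be allowed to reach the database, e.g. via security group rules):

```python
import pandas as pd
import psycopg2

# Placeholder connection details: use your own RDS/Aurora endpoint and credentials.
conn = psycopg2.connect(
    host="mydb.abc123xyz.eu-west-1.rds.amazonaws.com",
    port=5432,
    dbname="crm",
    user="analyst",
    password="secret",
)

# Read the table the first notebook populated, ready for the k-means step.
df = pd.read_sql("SELECT * FROM crm_scores", conn)
conn.close()
print(df.head())
```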

How to store MongoDB data on a data disk in an Azure VM

I want to do the following:
Create a VM
Install MongoDB
Store the MongoDB data on a data disk
Delete the VM while keeping the data disk
Then create a new VM and re-use the existing data disk
My goal is to create and delete the Azure VM but re-use the single data disk.
How can I achieve it using ARM template?
If my understanding is right, for your scenario you should use the Azure Custom Script Extension to do this (install MongoDB and change the MongoDB data path).
You could check this question: How to change the location that MongoDB uses to store its data?
You need to write a script, test it on your VM, and then use the template to execute it.
I think you can add an extra data disk and attach it to the VM; when removing the VM, detach the disk and keep it, and you can later attach it to whichever VM you want. Please let me know if that is what you wanted.
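If you end up scripting the re-attach step outside the template, a rough sketch with the Azure Python SDK (a recent azure-mgmt-compute) might look like the following; the subscription, resource group, VM and disk names are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient
from azure.mgmt.compute.models import DataDisk, ManagedDiskParameters

SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder
RESOURCE_GROUP = "my-rg"                                   # placeholder
VM_NAME = "mongo-vm"                                       # placeholder
DISK_NAME = "mongo-data-disk"                              # placeholder

compute = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Look up the surviving managed data disk and the newly created VM.
disk = compute.disks.get(RESOURCE_GROUP, DISK_NAME)
vm = compute.virtual_machines.get(RESOURCE_GROUP, VM_NAME)

# Attach the existing disk instead of creating a new one.
vm.storage_profile.data_disks.append(
    DataDisk(
        lun=0,
        name=DISK_NAME,
        create_option="Attach",
        managed_disk=ManagedDiskParameters(id=disk.id),
    )
)

compute.virtual_machines.begin_create_or_update(
    RESOURCE_GROUP, VM_NAME, vm
).result()
```

In an ARM template the equivalent idea is a dataDisks entry with "createOption": "Attach" that references the existing disk's managedDisk.id, so the disk is never recreated when the VM is redeployed.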

Using Amazon S3 as a File System for MongoDB

I am deciding to use MongoDB as a document-management DB in my application. Initially I was thinking of using S3 as the data store, but it seems MongoDB uses the local file system to store its data. Can I use S3 as the data store in MongoDB?
Thanks
S3 is object storage rather than a file system, so MongoDB cannot use it directly as its data store. On AWS, EBS volumes with Provisioned IOPS are ideal for MongoDB.
This link has notes about running MongoDB on AWS and is rather useful.

Transfer MongoDB to another server?

If I populate a MongoDB instance on my local machine, can I wholesale transfer that database to a server and have it work without too much effort?
The reason I ask is that my server is currently an Amazon EC2 Micro instance and I need to put LOTS of data into a MongoDB and don't think I can spare the transactions and bandwidth on the EC2 instance.
There is a copy database command which I guess should be a good fit for your needs.
Alternatively, you can just stop MongoDB, copy the database files to the other server, and run a MongoDB instance there.
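A third option along the same lines is mongodump/mongorestore; a minimal sketch (with placeholder host and database names, assuming the MongoDB database tools are installed and the EC2 instance's MongoDB port is reachable and secured) might look like:

```python
import subprocess

# Dump the local database into a single gzipped archive file.
subprocess.run(
    ["mongodump", "--host", "localhost", "--port", "27017",
     "--db", "mydb", "--archive=mydb.archive.gz", "--gzip"],
    check=True,
)

# Restore it straight into the MongoDB running on the EC2 instance
# (the hostname below is a placeholder).
subprocess.run(
    ["mongorestore", "--host", "ec2-host.example.com", "--port", "27017",
     "--archive=mydb.archive.gz", "--gzip"],
    check=True,
)
```

However you do it, the data itself still has to be uploaded once, so the bandwidth cost of the initial load is unavoidable.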