How to prevent accidental deletion in Mongodb Atlas - mongodb-atlas

Is there a way to prevent accidental deletion of projects and/or clusters in Mongodb Atlas?
I'm coming from AWS where there is such a concept for S3 bucket and RDS databases where you have to manually disable such protection before you can delete the resource.
I've not been able to find an equivalent for Atlas and it makes me a little nervous my team or I could accidentally delete something, especially as we are managing our Atlas infrastructure through Terraform and we could miss something in a terraform apply.
Thanks

I've just found that Atlas have added this as a new feature when creating a new database of any type (serverless, dedicated, shared):
Hopefully will be supported by the Terraform provider soon.

Related

Migrate data from Citus to RDS

Since Citus is not going to be available as a Managed Service in AWS, I am trying move the database to RDS (not the whole history but only the transactional portion as an OLTP). The migration from Citus is not clear because the data does not reside in a single node. I want to check the options we might have to move data from Citus to RDS.
Amazon DMS: This option is good for the supported databases (PostgreSQL) but we do not know what behavior this will have in Citus from the distributed nature of the engine. Has someone migrated the data to S3, to another DB or something in these lines?
I saw this paper from AWS https://d1.awsstatic.com/whitepapers/aws-cloud-data-ingestion-patterns-practices.pdf?did=wp_card&trk=wp_card on how to ingest data from different sources and DMS seems like a good option but I do not know the internals of Citus that well to tell if we will get all the data and gather the CDC correctly.
A Custom migration: Via a support ticket, we can access the S3 buckets that Citus uses for Disaster recovery where the WAL logs are available and we could use something like WAL-G to take those logs and replicate them in a Postgres instance. The issue here is that this is a very custom migration and the development time might be too high.
Is there any other option to move data from Citus to RDS or Aurora in AWS, what looks like a good path to make the database migration? All the documents refer to move data the other way around, from Aurora or RDS to Citus.
Sumedh from Citus Cloud here. Please go ahead and open a support ticket with us to further investigate solutions. We can evaluate if using DMS is a viable approach for your use-case.

Can you use AWS DMS to move Aurora DB from one account to another?

I am trying to migrate an Aurora cluster from one of our accounts to another. We actually do not have a lot write requests and the database itself is quite small, but somehow we decided to minimize the downtime.
I have looked into several options
Use snapshot: cut off the mutation in source DB, take snapshot, share and restore in another account. This would definitely introduce some downtime
Use Aurora cloning: cut off the mutation in source DB, clone the cluster in target account and switch to the target DB. According to AWS, the cloning is much faster than taking and restoring a snapshot, so the downtime should be shorter.
I am not sure if I can use DMS to do this as I did not find useful doc/tutorials about moving Aurora across accounts. Also, I am not sure whether DMS will sync any write requests to target DB during migration.
If DMS can not live sync, then I probably should use Bucardo to live migrate.
Looking at the docs, AWS Aurora with PostgreSQL compatibility is allowed as source & target endpoints. So, answering your question, yes it's possible.
Obviously, your source Aurora DB should be accessible from the target account. Check that the DB endpoint is public and the traffic is not restricted by ACLs rules or SGs rules.
Also, if you want to enable ongoing replication, you need to grant rds_replication (or rds_superuser) role to the source database user. Link to the docs.
We actually ended up using DMS for this migration. What we did was:
Take a snapshot of the target DB in the original account.
Share the snapshot to the target account and restore it over there. (You have to use snapshot for migrating things like triggers, custom types, sequence, etc)
Setup connections (like VPC peering or security groups) between two accounts.
Setup DMS in source account (endpoints, replication instance, task)
Write SQL to temporarily disable/delete constraints, triggers, etc which may cause error when load source data.
Using DMS to load source data and enable ongoing replication.
Enable/add constraints, triggers, etc back.
Post migration test

Do I gain anything by using "proper" replicas for a read-only MongoDB database?

I have a web-app that depends on a read-only MongoDB database. Through trial and error, I discovered that by far the fastest way to run the ETL pipeline that populates the database is to run a local copy of MongoDB, populate the database, stop the database, and tarball the state directory.
To deploy a high-availability "cluster," I create multiple instances (or containers) running the app, each with access to a copy of the state in locally mounted storage. Putting these behind a load balancer with regular health checks and autoscaling (or in a Kubernetes cluster as a ReplicaSet), I get isolation, redundancy, easy rollbacks (using versioned storage), and easy setup in virtually any environment.
The key idea here is that because the database is read-only, it is in a sense a "stateless" application. Thus, I can treat it like any other static provider of information
There are many apparent advantages to this setup. Nevertheless, I have always had a nagging feeling that I was missing something. Given a read-only context, is there still some reason why it might be better to run a "proper" MongoDB cluster?
If you don't mind outages when the single node goes down and you don't mind taking the system down during upgrades then this is probably an ok deployment. You might get a safer dump and restore using mongodump and mongorestore rather than tar but apart from that this setup should work for a read-only deployment.

Will Serverless support AWS DocumentDB?

I work in a company that's using Serverless to build cloud-native applications and services. Today we use DynamoDB and SQL Databases with AWS Aurora.
We want to go with DocumentDB for our next application, but we could not find anything about Serverless and AWS DocumentDB. Does Serverless support AWS DocumentDB? If not, is there any plans to support it in the future?
Serverless supports any AWS resources that you can define using CloudFormation. As per the Serverless docs here:
Define your AWS resources in a property titled resources. What goes in
this property is raw CloudFormation template syntax, in YAML...
The YAML for creating a DocumentDB cluster is, going to look something like:
resources:
Resources:
DBCluster:
Type: "AWS::DocDB::DBCluster"
DeletionPolicy: Delete
Properties:
DBClusterIdentifier: "MyCluster"
MasterUsername: "MasterUser"
MasterUserPassword: "Password1234!"
DBInstance:
Type: "AWS::DocDB::DBInstance"
Properties:
DBClusterIdentifier: "MyCluster"
DBInstanceIdentifier: "MyInstance"
DBInstanceClass: "db.r4.large"
DependsOn: DBCluster
You can find the other CloudFormation resources that you can define in the resources parameter of your Serverless.yaml here.
DocumentDB is not a serverless service. You need to manage the backend server to use it.
Please refer to this blog: https://blogs.itemis.com/en/serverless-services-on-aws, you can see it is not in the list of "SERVERLESS SERVICES ON AWS".
No, this won't support serverless, if you really want this you can go with DynamoDB. Also, can see differences if you want.
DocumentDB
MongoDB is supported in this database, which provide ease to learn
Stored procedures are needed in this, where data retrieval and data accumulation is done with help
Document size is limited to 16MB and storage is maximized up to 64TB of data.
Daily backups are managed by the database itself, and can be recovered whenever required
This is costly as we require paying around $200/month even if the user uses only some instances of database or only used few hours.
AWS is not involved in the user credentials stored area as that will be stored in DB directly
Available in specific regions
Can be easily migrated out of AWS into any MongoDB
In case of primary node failure, service promotes read-replica to primary. Multi A-Z has to be configured by users. Backup can be copied across regions
DynamoDB
MongoDB is not directly supported i this and even not easy to migrate from MongoDB to DynamoDB
Stored procedures are not needed in this, which makes the process easier for users
There is no limit in the document size as it can be scaled up to the size of user requirements
Daily backups are not available which makes the user too backup the data which triggered explicitly by users, and can be recovered whenever needed
There is initial cost associated with this, but overall cost is less. Also, on-demand pricing is available where user manage with the lesser amount of $1/month. 25GB data is provided for free in first stage.
AWS controls the user access to the database through identity and access management where authentication and authorization is needed for low level as well
Available in all regions
Can not be easily migrated out of AWS into any MongoDB, you need to write a code to transform
Support global tables, which protect users against regional failure. Data is automatically replicated across multiple AZs in a single region.

Sandbox version for AWS RedShift

I have been using RedShift for a few months and I like it. But I need to add some tests around it and I am not sure what the most cost effective way of doing it is. I can only think of using one server RedShift cluster as Sandbox but that seems to be too costly even if I only use it during testing
Databases in Redshift cannot 'see' each other and cross-database queries are not supported. Therefore we simply have 'development', 'test' and 'production' databases on the same cluster.
When we're ready to push to production we:
take a snapshot
drop production
rename test to production
This generally works fine for use because we find Redshift to be over-provisioned on storage, i.e., filling our nodes to their max storage capacity does not provide acceptable performance.
NOTE: You cannot drop your "master" database defined when the cluster was created. If you are using that as your primary database you will have to unload your cluster and recreate it for this approach to be viable.
I got the answer from AWS RedShift forum: "There is no way of creating a sandbox version of Redshift. We'll add this to our backlog of feature requests"