Deploying MongoDB on Google Cloud Platform? - mongodb

Hello all, for my startup I am using Google Cloud Platform. I am using App Engine with Node.js and that part is working fine, but now I need a database. As I am using MongoDB, I found this: https://console.cloud.google.com/launcher/details/click-to-deploy-images/mongodb?q=mongo. When I launched it, it created three instances in my Compute Engine, but now I don't know which is the primary instance and which is secondary. One more thing: I read that the primary instance should be used for writing data and the secondaries for reading. When I query my database, should I provide the secondary instance URL, and for updating/inserting data in my MongoDB database should I provide the primary instance URL? Otherwise, which URL should I use for CRUD operations on my MongoDB database? Also, after launching this, do I have to make any changes in any conf file or any other file manually, or has that already been done for me? And do I have to make instance groups of all three instances or not?
Please, if any of you think I have not done any research on this or that it's not a valid Stack Overflow question, I am so sorry. Google Cloud Platform is very new, so there is not much documentation on it, and this is also my first time deploying code on servers, so I am a complete noob in this field. Thanks anyway, please help me out of here, guys.

but now I don't know which is the primary instance and which is secondary,
By default, Cloud Launcher usually names the primary with the suffix -1 (dash one). For example, it would create the mongodb-1-server-1 instance as the primary.
You can also find out which instance is the primary by running rs.status() on any of the instances via the mongo shell. For example, connect with:
mongo --host <External instance IP> --port <Port Number>
You can get the list of external IPs of the instances using gcloud. For example:
gcloud compute instances list
By default you won't be able to connect straight away. You need to create a firewall rule that opens the port(s) on the Compute Engine instances. For example:
gcloud compute firewall-rules create default-allow-mongo --allow tcp:<PORT NUMBER> --source-ranges 0.0.0.0/0 --target-tags mongodb --description "Allow mongodb access to all IPs"
Insert a sensible port number and avoid the default value (27017). You may also want to limit the source IP ranges, e.g. to your office IP. See also Cloud Platform: Networking.
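Once you can connect, rs.status() returns a document whose members array lists each member's name and its stateStr (PRIMARY, SECONDARY, ARBITER and so on). The sketch below picks out the primary. It assumes you have copied a trimmed version of that output into Python as a dict. The host names, including the arbiter, are hypothetical and not necessarily what Cloud Launcher creates.

```python
from typing import Optional


def find_primary(status: dict) -> Optional[str]:
    """Return the host:port of the PRIMARY member in an rs.status()-style document."""
    for member in status.get("members", []):
        # stateStr is a human-readable state such as "PRIMARY" or "SECONDARY".
        if member.get("stateStr") == "PRIMARY":
            return member["name"]
    return None  # e.g. no primary while an election is in progress


# Trimmed, hypothetical rs.status() output for a replica set named rs0.
status = {
    "set": "rs0",
    "members": [
        {"name": "mongodb-1-server-1:27017", "stateStr": "PRIMARY"},
        {"name": "mongodb-1-server-2:27017", "stateStr": "SECONDARY"},
        {"name": "mongodb-1-arbiter-1:27017", "stateStr": "ARBITER"},
    ],
}
print(find_primary(status))  # mongodb-1-server-1:27017
```

The primary can change after an election. Treat the -1 naming convention as a starting point and check rs.status() when in doubt.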
I read that the primary instance should be used for writing data and the secondaries for reading,
Generally, replication exists to provide redundancy and high availability. The primary instance is used for both reads and writes, while the secondaries act as replicas that provide a level of fault tolerance, e.g. against the loss of the primary server.
See also:
MongoDB Replication.
Replication Read Preference.
MongoDB Sharding.
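By default, drivers send reads to the primary. A read preference lets you route reads to secondaries instead. The sketch below is a deliberately simplified model of four of the modes. It ignores tag sets, staleness limits and latency, and leaves out nearest, which depends on measured round-trip times. It is meant to show which members each mode allows, not how a driver implements selection. The member names are hypothetical.

```python
from typing import List, Optional


def choose_member(mode: str, primary: Optional[str], secondaries: List[str]) -> Optional[str]:
    """Toy model of which member a read may go to under a read preference mode.

    Returns None when the mode allows no available member (a real driver would
    raise a server selection error instead).
    """
    first_secondary = secondaries[0] if secondaries else None
    if mode == "primary":
        return primary
    if mode == "primaryPreferred":
        return primary if primary else first_secondary
    if mode == "secondary":
        return first_secondary
    if mode == "secondaryPreferred":
        return first_secondary if first_secondary else primary
    raise ValueError(f"mode not modelled here: {mode}")


# Hypothetical members; the primary has just failed.
print(choose_member("primary", None, ["db-2:27017"]))           # None
print(choose_member("primaryPreferred", None, ["db-2:27017"]))  # db-2:27017
```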
Now when I query my database, should I provide the secondary instance URL, and for updating/inserting data in my MongoDB database should I provide the primary instance URL? Otherwise, which URL should I use for CRUD operations on my MongoDB database?
You can list all the instances in the MongoDB URI, and the driver will figure out where to read and write. For example, in your Node.js app you could have:
mongodb://<instance 1>:<port 1>,<instance 2>:<port 2>/<database name>?replicaSet=<replica set name>
The default replica set name set by Cloud Launcher is rs0. Also see:
Node Driver: URI.
Node Driver: Read Preference.
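The sketch below assembles such a URI from a list of members. It is written in Python only to show the structure of the string; the Node.js driver accepts the same string. The IP addresses and database name are hypothetical. rs0 is the Cloud Launcher default mentioned above, and readPreference is an optional standard URI option.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, database, replica_set, **options):
    """Build a mongodb:// URI that lists every replica set member as a seed."""
    seed_list = ",".join(f"{host}:{port}" for host, port in hosts)
    # replicaSet tells the driver to treat the listed hosts as one replica set.
    query = urlencode({"replicaSet": replica_set, **options})
    return f"mongodb://{seed_list}/{database}?{query}"


uri = replica_set_uri(
    [("10.128.0.2", 27017), ("10.128.0.3", 27017)],  # hypothetical internal IPs
    "myapp",                                         # hypothetical database name
    "rs0",                                           # Cloud Launcher's default set name
    readPreference="secondaryPreferred",             # optional: allow reads from secondaries
)
print(uri)
# mongodb://10.128.0.2:27017,10.128.0.3:27017/myapp?replicaSet=rs0&readPreference=secondaryPreferred
```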
Also, after launching this, do I have to make any changes in any conf file or any other file manually, or has that already been done for me? Also, do I have to make instance groups of all three instances or not?
This depends on your application's use case, but if you are launching through Click to Deploy, the MongoDB configuration should all be taken care of.
For a complete guide, please follow the tutorial Deploy MongoDB with Node.js. I would also recommend checking out the MongoDB Security Checklist.
Hope that helps.

Related

Can multiple servers access the same MongoDB?

I am going to create a load balancer in Azure. I have a VM that is already running. I am going to take a backup of it and create another VM from that backup, so the two servers will have the same configuration and use the same credentials.
The existing server has MongoDB configured, and the new VM will have the same configuration as the old one. What I want to know is: can the same MongoDB be accessed by two servers that have the same configuration?
Will it create any mess or give any error?
Can I use it as mentioned above?
Do I need to configure another MongoDB for the second server?
Can anyone please clarify my questions? It would be great to have a clear explanation. Thank you.
MongoDB has built-in support for horizontal scalability and high availability, so you don't need a third-party load balancer. The mongos service, which is part of a MongoDB sharded cluster, acts as the load balancer (query router) itself. Check the official documentation for MongoDB replication and sharding.
On your questions:
Will it create any mess or give any error?
If you just copy the data to another VM, it will be fine. As long as you don't write to either VM, you can load-balance reads between these independent VMs. But this is a strange approach when MongoDB has a built-in replication mechanism and you can simply add the second VM as a SECONDARY member of the replica set.
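In the mongo shell, rs.add("<host>:<port>") does this for you. The sketch below shows what changes underneath, assuming the document shape returned by rs.conf(); the VM host names are hypothetical. Adding the second VM means appending a member with a fresh _id and raising the config version:

```python
import copy


def add_member(config: dict, host: str) -> dict:
    """Return a copy of a replica set config with `host` appended as a new member."""
    new_config = copy.deepcopy(config)
    members = new_config["members"]
    if any(m["host"] == host for m in members):
        raise ValueError(f"{host} is already a member")
    # Member _id values must be unique within the set.
    next_id = max((m["_id"] for m in members), default=-1) + 1
    members.append({"_id": next_id, "host": host})
    # A new configuration must carry a higher version than the current one.
    new_config["version"] += 1
    return new_config


# Hypothetical one-member set running on the existing VM.
current = {"_id": "rs0", "version": 1, "members": [{"_id": 0, "host": "vm1:27017"}]}
updated = add_member(current, "vm2:27017")
print(updated["version"], updated["members"])
```

After the reconfiguration, the new member performs an initial sync from the existing data and then shows up as SECONDARY in rs.status().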
Can I use it as mentioned above?
Sure, you can also use this approach, but why would you need to?
Do I need to configure another MongoDB for the second server?
It depends on the use case, but in general you would prefer to create a three-member replica set. If your database is large and write performance is a strong requirement, you may need to distribute the database across multiple servers (shards), in which case you will need more than just three servers.
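Three members are preferred over two because of election arithmetic. A primary can only be elected by a strict majority of the voting members. A short worked sketch:

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: a strict majority of voting members."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """Voting members that can be lost while a primary can still be elected."""
    return voting_members - majority(voting_members)


for n in range(1, 6):
    print(f"{n} members: majority {majority(n)}, can lose {fault_tolerance(n)}")
```

A two-member set, such as the original VM plus its copy, cannot lose either member and still elect a primary. A three-member set tolerates one failure.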

What is the difference between a Mongo URL and a Mongo localhost connection?

Sorry, I'm new to MongoDB, so I'm confused about the difference between
mongodb+srv://username:<password>@cluster0.accdl.mongodb.net/website?retryWrites=true&w=majority
and
mongodb://[host]:27017/[database_name]
What's the difference, and how does it impact our code?
Well, as mongodb.com explains at https://www.mongodb.com/developer/article/srv-connection-strings/:
What is this mongodb+srv syntax?
Well, in MongoDB 3.6 we introduced the concept of a seed list that is specified using DNS records, specifically SRV and TXT records. You will recall from using replica sets with MongoDB that the client must specify at least one replica set member (and may specify several of them) when connecting. This allows a client to connect to a replica set even if one of the nodes that the client specifies is unavailable.
and:
Note that without the SRV record configuration we must list several nodes (in the case of Atlas we always include all the cluster members, though this is not required). We also have to specify the ssl and replicaSet options.
So, in short: the mongodb+srv syntax is a way to connect to a MongoDB database, introduced in MongoDB 3.6. It lets you connect to the whole replica set, including all its nodes, instead of listing specific nodes yourself as in the traditional connection string.
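In code, the visible difference is in the host part. An SRV string names exactly one DNS name with no port. The driver looks up that name's SRV records to find the members, and a TXT record can supply options such as replicaSet. The +srv form also turns TLS on by default. The simplified parser below illustrates this. It is not a full connection-string parser (it ignores IPv6 literals and Unix sockets), and the host names are hypothetical.

```python
from urllib.parse import parse_qs


def describe_uri(uri: str) -> dict:
    """Split a MongoDB connection string into scheme, hosts, database and options.

    Simplified: no IPv6 literals or Unix sockets, and credentials are dropped.
    """
    scheme, sep, rest = uri.partition("://")
    if not sep or scheme not in ("mongodb", "mongodb+srv"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    rest, _, query = rest.partition("?")           # options come after "?"
    authority, _, database = rest.partition("/")    # "[user:pass@]hosts" before "/"
    hosts = authority.rpartition("@")[2].split(",")
    options = {key: values[-1] for key, values in parse_qs(query).items()}
    if scheme == "mongodb+srv":
        # One DNS name, no port: the driver resolves SRV/TXT records to find the members.
        if len(hosts) != 1 or ":" in hosts[0]:
            raise ValueError("mongodb+srv takes exactly one host name and no port")
        options.setdefault("tls", "true")  # +srv turns TLS on unless told otherwise
    return {"scheme": scheme, "hosts": hosts, "database": database or None, "options": options}


print(describe_uri("mongodb+srv://user:secret@cluster0.example.net/website?retryWrites=true&w=majority"))
print(describe_uri("mongodb://localhost:27017/website"))
```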
I think mongodb+srv is mainly used when you are connecting to a cluster, though it can also point at a single database instance.
Both forms work for a single instance. The plain mongodb:// form can also reach a replica set, but only if you list its members yourself.

Setting up mongo replication in production

How do you set up MongoDB replication in production environments? I started using CloudFormation with this template, but it crashes halfway. I want to set up Mongo so that it has one primary and two replicas.
I haven't found a good tutorial on how to set up Mongo replication.
Some other questions I have are:
How does failover work? Say I have three EC2 instances, each running Mongo, and the primary fails. Another instance becomes the primary, but how do my clients (PyMongo and the Scala Mongo driver) know the IP address of the new primary?
Let's say the primary goes down for 1 hour and there are 2,000 writes. When it comes back up, how does it get updated? Do I need a script for this?
I am trying to do this with Flask and PyMongo.
I ended up testing this on my local machine, and here is what I found.
Finding the new primary is done by the client. In the Mongo URI you specify all your replica set members. When PyMongo connects, it checks which one is the primary and writes to that one. The members themselves elect the new primary; the driver just finds it.
When the failed database comes back up, it syncs with the others so that all the databases hold the same records.
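Below is a toy simulation of that discovery step. The probe function is a hypothetical stand-in for sending the hello command to a host. Recent MongoDB versions reply with an isWritablePrimary field; older ones answer isMaster with an ismaster field instead. Real drivers also use the hosts and primary fields of the reply and monitor members continuously. The host names are hypothetical.

```python
from typing import Callable, Dict, List, Optional


def discover_primary(seeds: List[str], probe: Callable[[str], Optional[dict]]) -> Optional[str]:
    """Return the first seed host that reports itself as the writable primary.

    `probe` returns the host's reply as a dict, or None if it is unreachable.
    """
    for host in seeds:
        reply = probe(host)
        if reply and reply.get("isWritablePrimary"):
            return host
    return None


# Hypothetical cluster state before the primary fails.
state: Dict[str, Optional[dict]] = {
    "ec2-a:27017": {"isWritablePrimary": True},
    "ec2-b:27017": {"isWritablePrimary": False},
    "ec2-c:27017": {"isWritablePrimary": False},
}
seeds = list(state)  # the hosts listed in the Mongo URI
print(discover_primary(seeds, state.get))  # ec2-a:27017

state["ec2-a:27017"] = None                          # primary goes down...
state["ec2-b:27017"] = {"isWritablePrimary": True}   # ...and the others elect ec2-b
print(discover_primary(seeds, state.get))  # ec2-b:27017
```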
Read the Docs has a step-by-step manual on setting up a MongoDB cluster on different platforms, including AWS EC2:
https://mongodb-documentation.readthedocs.io/en/latest/ecosystem/tutorial/install-mongodb-on-amazon-ec2.html#deploy-a-multi-node-replica-set
To provide your clients with a working Mongo instance, you can employ several different strategies. For example:
Set up Route53 failover. Route53 will monitor the health of the primary node and change the DNS record to point to a secondary in case of failure.
Use service discovery. Consul, etcd, ZooKeeper and doozerd are worth exploring.
If a MongoDB node fails and then comes back, it receives the latest data from the other nodes. That's exactly what a replica set does.

MongoDB cluster with AWS CloudFormation and auto scaling

I've been investigating creating my own MongoDB cluster in AWS. The AWS MongoDB template provides some good starting points. However, it doesn't cover auto scaling or what happens when a node goes down. For example, say I have 1 primary and 2 secondary nodes, the primary goes down, and auto scaling kicks in. How would I add the newly launched MongoDB instance to the replica set?
If you look at the template, it uses an init.sh script to check whether the node being launched is the primary node. If it is, the script waits for all the other nodes to exist and then creates a replica set with their IP addresses on the primary. When the replica set is configured initially, all the nodes already exist.
Not only that, but my Node app uses Mongoose. Part of the database connection lets you specify multiple nodes. How would I keep track of what's currently up and running? (I guess I could use DynamoDB, but I'm not sure.)
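Below is a rough Python sketch of that launch-time logic. It is not the template's actual init.sh. It polls until the expected number of node IPs is known, then builds the document to pass to rs.initiate(). The list_ips callable is a hypothetical stand-in for however the script discovers its peers, and the IPs are made up.

```python
import time
from typing import Callable, List


def wait_for_nodes(list_ips: Callable[[], List[str]], expected: int,
                   timeout: float = 600.0, interval: float = 10.0) -> List[str]:
    """Poll `list_ips` until `expected` node IPs are known, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        ips = list_ips()
        if len(ips) >= expected:
            return ips[:expected]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"only {len(ips)} of {expected} nodes appeared")
        time.sleep(interval)


def initiate_config(set_name: str, ips: List[str], port: int = 27017) -> dict:
    """Build the document to pass to rs.initiate() on the node that creates the set."""
    return {
        "_id": set_name,
        "members": [{"_id": i, "host": f"{ip}:{port}"} for i, ip in enumerate(ips)],
    }


# Hypothetical peer discovery: one more node becomes visible on each poll.
polls = iter([["10.0.1.10"], ["10.0.1.10", "10.0.1.11"],
              ["10.0.1.10", "10.0.1.11", "10.0.1.12"]])
ips = wait_for_nodes(lambda: next(polls), expected=3, interval=0.01)
print(initiate_config("rs0", ips))
```

This logic runs only once, at launch, when every node exists. That is why it does not help with a replacement node launched later.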
What's the usual flow if an instance goes down? Do people generally manually re-configure clusters if this happens?
Any thoughts? Thanks.
This is a very good question and I went through this very painful journey myself recently. I am writing a fairly extensive answer here in the hope that some of these thoughts of running a MongoDB cluster via CloudFormation are useful to others.
I'm assuming that you're creating a MongoDB production cluster as follows: -
3 config servers (micros/smalls instances can work here)
At least 1 shard consisting of e.g. 2 (primary & secondary) shard instances (medium or large) with large disks configured for data / log / journal disks.
An arbiter machine for voting (micro probably OK).
i.e. https://docs.mongodb.org/manual/core/sharded-cluster-architectures-production/
Like yourself, I initially tried the AWS MongoDB CloudFormation template that you posted in the link (https://s3.amazonaws.com/quickstart-reference/mongodb/latest/templates/MongoDB-VPC.template), but to be honest it was far, far too complex, i.e. it's 9,300 lines long and sets up multiple servers (i.e. replica shards, configs, arbiters, etc.). Running the CloudFormation template took ages and it kept failing (e.g. after 15 minutes), which meant the servers all terminated again and I had to try again, which was really frustrating / time consuming.
The solution I went for in the end (which I'm super happy with) was to create separate templates for each type of MongoDB server in the cluster e.g.
MongoDbConfigServer.template (template to create config servers - run this 3 times)
MongoDbShardedReplicaServer.template (template to create replica - run 2 times for each shard)
MongoDbArbiterServer.template (template to create arbiter - run once for each shard)
NOTE: templates available at https://github.com/adoreboard/aws-cloudformation-templates
The idea then is to bring up each server in the cluster individually, i.e. 3 config servers, 2 sharded replica servers (for 1 shard) and an arbiter. You can then add custom parameters into each of the templates, e.g. the parameters for the replica server could include: -
InstanceType e.g. t2.micro
ReplicaSetName e.g. s1r (shard 1 replica)
ReplicaSetNumber e.g. 2 (used with ReplicaSetName to create name e.g. name becomes s1r2)
VpcId e.g. vpc-e4ad2b25 (not a real VPC obviously!)
SubnetId e.g. subnet-2d39a157 (not a real subnet obviously!)
GroupId (name of existing MongoDB group Id)
Route53 (boolean to add a record to an internal DNS - best practices)
Route53HostedZone (if boolean is true then ID of internal DNS using Route53)
The really cool thing about CloudFormation is that these custom parameters can have (a) a useful description for people running it, (b) special types (e.g. when running creates a prefiltered combobox so mistakes are harder to make) and (c) default values. Here's an example: -
"Route53HostedZone": {
    "Description": "Route 53 hosted zone for updating internal DNS (Only applicable if the parameter [ UpdateRoute53 ] = \"true\")",
    "Type": "AWS::Route53::HostedZone::Id",
    "Default": "YA3VWJWIX3FDC"
},
This makes running the CloudFormation template an absolute breeze as a lot of the time we can rely on the default values and only tweak a couple of things depending on the server instance we're creating (or replacing).
As well as parameters, each of the 3 templates mentioned earlier has a "Resources" section which creates the instance. We can also do cool things via the "AWS::CloudFormation::Init" section, e.g.
"Resources": {
    "MongoDbConfigServer": {
        "Type": "AWS::EC2::Instance",
        "Metadata": {
            "AWS::CloudFormation::Init": {
                "configSets" : {
                    "Install" : [ "Metric-Uploading-Config", "Install-MongoDB", "Update-Route53" ]
                },
The "configSets" in the previous example shows that creating a MongoDB server isn't simply a matter of creating an AWS instance and installing MongoDB on it but also we can (a) install CloudWatch disk / memory metrics (b) Update Route53 DNS etc. The idea is you want to automate things like DNS / Monitoring etc as much as possible.
IMO, creating a template, and therefore a stack for each server has the very nice advantage of being able to replace a server extremely quickly via the CloudFormation web console. Also, because we have a server-per-template it's easy to build the MongoDB cluster up bit by bit.
My final bit of advice on creating the templates would be to copy what works for you from other GitHub MongoDB CloudFormation templates e.g. I used the following to create the replica servers to use RAID10 (instead of the massively more expensive AWS provisioned IOPS disks).
https://github.com/CaptainCodeman/mongo-aws-vpc/blob/master/src/templates/mongo-master.template
In your question you mentioned auto-scaling. My preference would be to add a shard / replace a broken instance manually (auto-scaling makes sense with web containers, e.g. Tomcat / Apache, but a MongoDB cluster should really grow slowly over time). However, monitoring is very important, especially of disk sizes on the shard servers, so that you are alerted when disks are filling up and can either add a new shard or delete data. Monitoring can be achieved fairly easily using AWS CloudWatch metrics / alarms or using the MongoDB MMS service.
If a node goes down, e.g. one of the replicas in a shard, then you can simply kill the server, recreate it using your CloudFormation template, and the disks will sync across automatically. This is my normal flow if an instance goes down, and generally no re-configuration is necessary. I've wasted far too many hours in the past trying to fix servers - sometimes lucky / sometimes not. My backup strategy now is to run a mongodump of the important collections of the database once a day via a crontab, zip it up and upload it to AWS S3. This means if the nuclear option happens (complete database corruption) we can recreate the entire database and mongorestore in an hour or 2.
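Below is a sketch of that nightly job. It only builds the commands; run each one with subprocess.run(cmd, check=True) on a host that has mongodump and the AWS CLI installed. It swaps the separate zip step for mongodump's own --archive/--gzip output. The URI, database, collection and bucket names are hypothetical. A crontab entry such as 0 3 * * * python3 /opt/backup.py (path hypothetical) would run it daily.

```python
import shlex
from datetime import date
from typing import List


def backup_commands(uri: str, db: str, collections: List[str],
                    bucket: str, day: date) -> List[List[str]]:
    """Build (but do not run) mongodump + aws s3 cp commands for one day's backup."""
    stamp = day.isoformat()
    commands = []
    for coll in collections:
        archive = f"/tmp/{db}.{coll}.{stamp}.archive.gz"
        # --archive with --gzip writes one compressed file per collection.
        commands.append(["mongodump", f"--uri={uri}", f"--db={db}",
                         f"--collection={coll}", f"--archive={archive}", "--gzip"])
        commands.append(["aws", "s3", "cp", archive,
                         f"s3://{bucket}/mongodb/{stamp}/{coll}.archive.gz"])
    return commands


for cmd in backup_commands("mongodb://mongo-s1r1.internal.mycompany.com:27017",
                           "app", ["users", "orders"], "my-backup-bucket", date.today()):
    print(shlex.join(cmd))
```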
However, if you create a new shard (because you're running out of space), configuration is necessary. For example, if you are adding a new Shard 3 you would create 2 replica nodes (e.g. primary with name => mongo-s3r1 / secondary with name => mongo-s3r2) and 1 arbiter (e.g. with name mongo-s3r-arb). Then you'd connect via a MongoDB shell to a mongos (MongoDB router) and run this command: -
sh.addShard("s3r/mongo-s3r1.internal.mycompany.com:27017,mongo-s3r2.internal.mycompany.com:27017")
NOTE: - This command assumes you are using private DNS via Route53 (best practice). You can simply use the private IPs of the 2 replicas in the addShard command, but I have been very badly burned by this in the past (e.g. several months back all the AWS instances were restarted and new private IPs were generated for all of them. Fixing the MongoDB cluster took me 2 days as I had to reconfigure everything manually, whereas changing the IPs in Route53 takes a few seconds ... ;-)
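The addShard argument has the form <replica set name>/<host:port>,<host:port>. Below is a small sketch that builds it from the naming scheme used in this answer, using DNS names rather than IPs. The helper name and the default port are assumptions for illustration.

```python
def shard_seed(shard: int, replicas: int, domain: str, port: int = 27017) -> str:
    """Build the "<replSetName>/<host:port>,..." string that sh.addShard() expects,
    using the mongo-s<N>r<M> naming scheme (replica set name s<N>r)."""
    set_name = f"s{shard}r"
    hosts = [f"mongo-s{shard}r{i}.{domain}:{port}" for i in range(1, replicas + 1)]
    return f"{set_name}/{','.join(hosts)}"


print(f'sh.addShard("{shard_seed(3, 2, "internal.mycompany.com")}")')
```

For shard 3 with two replicas, this prints the same command shown above.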
You could argue we should also add the addShard command to another CloudFormation template but IMO this adds unnecessary complexity because it has to know about a server which has a MongoDB router (mongos) and connect to that to run the addShard command. Therefore I simply run this after the instances in a new MongoDB shard have been created.
Anyways, that's my rather rambling thoughts on the matter. The main thing is that once you have the templates in place your life becomes much easier and defo worth the effort! Best of luck! :-)

Local MongoDB instance with index in remote server

One of our clients has a server running a MongoDB instance, and we have to build an analytical application using the data stored in their MongoDB database, which changes frequently.
Clients requirements are:
That we do not connect to their MongoDB instance directly or run another instance of MongoDB on their server but just somehow run our own MongoDB instance on our machine in our office using their MongoDB database directory with read only access remotely.
We've suggested deploying a REST application or getting a copy of their database dump, but they did not want that. They just want us to run our own MongoDB instance hooked up to their MongoDB instance's data directory. Is this even possible?
I've been searching for a solution for the past two days and we have to submit a solution by Monday. I really need some help.
I think this is a normal request, because analytical queries could put too much load on the production server. It is pretty normal to separate production and analytical databases.
The easiest option is to use MongoDB replication. Set up MongoDB replica set with production database instance as primary and analytical database instance as secondary, also configure the analytical instance to never become primary.
If it is not possible to use replication (for example, the client doesn't want it, or the servers cannot connect directly to each other), there is another option. You can read the oplog from the remote database and apply the operations to your database instance. This is exactly the low-level mechanism by which a replica set works, but you can do it manually too. For example, MMS (Mongo Monitoring Service) Backup reads the oplog to make online backups of MongoDB.
Update: mongooplog could be the right tool for real-time application of replication oplog pulled from remote server on local server.
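Here is a toy model of replaying oplog entries. The real oplog is the local.oplog.rs collection. Each entry has an op code (i, u, d), a namespace ns and the document o, plus o2 for updates. This sketch simplifies heavily. ts is a plain integer rather than a BSON Timestamp, an update entry carries a full replacement document instead of an update specification, and the data is hypothetical. What it shows is that remembering the last applied ts makes re-polling safe.

```python
from typing import Dict, List


def apply_oplog(local: Dict[str, dict], entries: List[dict], last_ts: int) -> int:
    """Apply oplog-style entries newer than `last_ts` to `local`; return the new last_ts."""
    for entry in sorted(entries, key=lambda e: e["ts"]):
        if entry["ts"] <= last_ts:
            continue  # already applied on an earlier poll
        coll = local.setdefault(entry["ns"], {})  # ns is "database.collection"
        if entry["op"] == "i":      # insert
            coll[entry["o"]["_id"]] = dict(entry["o"])
        elif entry["op"] == "u":    # update (full replacement in this toy model)
            doc_id = entry["o2"]["_id"]
            coll[doc_id] = {**entry["o"], "_id": doc_id}
        elif entry["op"] == "d":    # delete
            coll.pop(entry["o"]["_id"], None)
        # other ops (e.g. "n" no-ops) are ignored here
        last_ts = entry["ts"]
    return last_ts


local: Dict[str, dict] = {}
remote_oplog = [
    {"ts": 1, "op": "i", "ns": "shop.orders", "o": {"_id": 1, "total": 10}},
    {"ts": 2, "op": "i", "ns": "shop.orders", "o": {"_id": 2, "total": 25}},
    {"ts": 3, "op": "u", "ns": "shop.orders", "o2": {"_id": 1}, "o": {"total": 12}},
    {"ts": 4, "op": "d", "ns": "shop.orders", "o": {"_id": 2}},
]
last = apply_oplog(local, remote_oplog[:2], last_ts=0)
last = apply_oplog(local, remote_oplog, last_ts=last)  # re-reading old entries is harmless
print(local, last)
```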
I don't think that running two databases that points to the same database files is possible or even recommended.
You could use mongorestore to restore from their data files directly, but this will only work if their mongod instance is not running (because mongorestore will need to lock the directory).
Another solution will be to do file system snapshots and then restore to your local database.
The downside to these backup/restore solutions is that your data will not be kept in sync all the time.
Probably the best solution will be to use replica sets with hidden members.
You can create a replica set with just two members:
Primary - this will be the client server.
Secondary - hidden, with votes and priority set to 0. This will be your local instance.
Their server will always be primary (because hidden members cannot become primaries). Clients cannot see hidden members so for all intents and purposes your server will be read only.
Another upside to this is that the MongoDB replication will do all the "heavy" work of syncing the data between servers and your instance will always have the latest data.
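Below is a sketch of the two-member configuration described above, as the document you would pass to rs.initiate() on the client's server. It uses the real member fields priority, hidden and votes. The host names are hypothetical.

```python
import json


def analytics_replica_set(set_name: str, primary_host: str, hidden_host: str) -> dict:
    """Two-member config: the client's server as primary, our copy as a hidden member."""
    return {
        "_id": set_name,
        "members": [
            {"_id": 0, "host": primary_host, "priority": 1},
            # Hidden members must have priority 0, so this one can never become primary;
            # votes 0 keeps it out of elections entirely.
            {"_id": 1, "host": hidden_host, "priority": 0, "hidden": True, "votes": 0},
        ],
    }


config = analytics_replica_set("rs0", "client-db.example.com:27017",
                               "office-analytics.example.com:27017")
print(json.dumps(config, indent=2))  # JSON is also a valid mongo shell object literal
```

With votes set to 0, the set has a single voting member, so the client's server holds a majority on its own. It stays primary even if the link to your office drops. If the hidden member kept its vote, losing that link would leave the primary without a majority, and it would step down.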