I'm a high school student. I implemented a Spring Boot application and deployed it on AWS EC2. I'm using Nginx as my load balancer. My application is running on two different EC2 instances behind the load balancer, each using its own MongoDB. How do I sync the data between the two DBs? Please pardon any mistakes, I'm doing this for the first time, and please feel free to point out flaws in my design.
CONTEXT: I have been learning Kubernetes and trying to get some hands-on experience. I have been using AKS to abstract away the complexity of dealing with the control plane (and because I have a free student Azure account). I am deploying a NodeJS app that connects to a MongoDB database. So far the deployment has been successful, but I am using MongoDB Atlas and connecting to it.
Based on the little I have learned about StatefulSets, MongoDB Atlas seems a lot easier and more convenient, but my question is: when would it be a better idea to deploy MongoDB as a StatefulSet (running in pods)? Which is more cost-effective? Which scales more easily?
I realize the questions might be a little vague, but I am just getting started with Kubernetes.
disclaimer: This is not a production application, just something simple I am using to learn K8S
The official docs use a StatefulSet, and that makes sense. Database-type applications are generally deployed as StatefulSets, because nodes can end up out of sync with each other, and that would create data inconsistencies between the nodes (MongoDB nodes, not Kubernetes nodes).
You can deploy MongoDB as a Deployment; I have seen it done. But most clients connect with a connection string (a string listing multiple node addresses), and since Kubernetes exposes StatefulSets through headless services, which give each pod a stable DNS name, you should be okay.
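For example, a client running inside the cluster can reach the StatefulSet pods through their stable headless-service DNS names. Here is a minimal sketch with the MongoDB Java driver, assuming a hypothetical headless service named mongo in the default namespace and a replica set named rs0 (none of these names come from the question):

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class MongoStatefulSetClient {
    public static void main(String[] args) {
        // Hypothetical seed list: each StatefulSet pod gets a stable DNS name of the form
        // <pod-name>.<headless-service-name>.<namespace>.svc.cluster.local
        String uri = "mongodb://mongo-0.mongo.default.svc.cluster.local:27017,"
                   + "mongo-1.mongo.default.svc.cluster.local:27017,"
                   + "mongo-2.mongo.default.svc.cluster.local:27017/?replicaSet=rs0";

        try (MongoClient client = MongoClients.create(uri)) {
            // The driver discovers the current primary from the seed list
            // and routes writes to it automatically.
            MongoDatabase db = client.getDatabase("test");
            db.getCollection("ping").insertOne(new Document("ok", 1));
        }
    }
}
```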
For learning purposes, I advise you to deploy your MongoDB in a StatefulSet. That way you can learn how it works and what problems you can run into with this Kubernetes object.
For a production application, I advise against deploying a database in a StatefulSet unless you really need to. A StatefulSet comes with a lot of operational problems that you might not want to manage yourself.
Sometimes, however, company rules forbid hosting data on another company's storage.
To decide whether you need to put your database in a StatefulSet, these are the questions I try to answer:
Should my DB be hosted on premise (for privacy)?
Should my DB be scalable?
Should my DB be updated frequently?
You can find a list of pros and cons in the documentation.
I've been working on a web app with a simple database model that only needs CRUD operations, and figured MongoDB would be perfect for it. The most important constraint of the project is that it must be able to scale from a small number of users to a large one. I've been looking at Cloud Launcher, and I've noticed that the most popular MongoDB solution advertises a cost of ~$350/mo. This is a surprisingly large amount, and it makes me consider using Cloud SQL for my database instead. Is there a better way to deploy MongoDB to GCP that's more suited to my use case? I've been reading about automatic scaling with Kubernetes, but I can't find anything about price. Any and all advice is greatly appreciated.
I haven't used MongoDB with Kubernetes, but we do use the Cloud Launcher solution at work. We use two nodes (n1-standard-1) and an arbiter (micro), plus 100 GB of storage on each node, which comes to around $100 a month. You would need a replica set in a production environment, so this seems to be a reasonable base cost.
For MongoDB, Kubernetes does not provide as many advantages over a classic GCE deployment as it does for a web server. Setting up a replica set on Kubernetes is a bit more work than the GCE setup. https://medium.com/google-cloud/mongodb-replica-sets-with-kubernetes-d96606bd9474 and http://blog.kubernetes.io/2017/01/running-mongodb-on-kubernetes-with-statefulsets.html should serve as decent references, but they won't lower your costs. Scaling nodes would be slightly easier, though that does not directly translate into scaling MongoDB.
I have lately been working on a similar solution.
GCP announced that they don't charge for Kubernetes cluster management but only for resources used by it (instances, network ...):
https://cloud.google.com/kubernetes-engine/pricing
In general, databases are high maintenance (data mounts, backups, migrations...), so I would not start running Mongo on Kubernetes right away. You could get there but it will be more complicated than deploying your web app on Kubernetes.
It's better to use MongoDB as a service that supports GCP (e.g. MongoDB Atlas); I have done so myself and have seen a few other companies do the same.
If you scale gradually you should be able to control your costs.
The web app itself should be easy to deploy and maintain on Kubernetes.
Is it good practice to set up Elasticsearch, Logstash, and Kibana on three different servers, each with 8 GB of RAM?
Or
Set up ELK on a single machine with more memory (16 GB)?
The machine needs to be highly available.
Can anyone suggest an approach or share input?
It depends on your task and situation. Normally it is good practice to set up Elasticsearch, Logstash, and Kibana on three different servers. If your data volume is higher, you will have to build an Elasticsearch cluster, and you may need more than one Logstash server.
Filebeat will run on all of the data (log) servers.
Here is an example of handling 25,000 logs per second:
https://engineering.viki.com/blog/2015/log-processing-at-scale-elk-cluster-at-25k-events-per-second/
It's slightly more complicated than explained here.
Any distributed component tries to offer its features in a sharded or partitioned way. Elasticsearch, the "E" in ELK, follows a master/data-node model and keeps the data on its data nodes. This means you need to set up a cluster of nodes for Elasticsearch itself, covering its various roles: master, data, and client (coordinating) nodes.
The next level is a production-grade system, which requires a multi-master setup with a minimum of three master-eligible nodes.
And that is only the beginning of ELK.
If you need to run such a complex system on limited resources, then containerizing the ELK components and running them on a container orchestration framework is the recommended option. Kubernetes and Docker Swarm are the usual choices for running an ELK cluster from Dockerized instances. These orchestration frameworks also require a multi-master setup themselves, but that is a fair trade-off: you will have many more components in a cloud environment, and all of them can be managed under the orchestration framework.
To deploy a server on Amazon EC2, I want to keep the MongoDB master database on an EC2 instance itself. On average I will have around 5-6 EC2 instances running in parallel, scaled by an Amazon Auto Scaling group.
Since the database is updated frequently and all instances sit behind an Elastic Load Balancer, it is hard to predict which user's data ends up in which EC2 instance's database. With this approach, am I assured of data consistency in MongoDB across the instances while scaling up and down? If it is not a good approach, please suggest alternative ways of doing it.
When using Amazon autoscaling, new EC2 instances will be created from a root AMI image (for example, with an empty database).
As data is added to your database, that data is not synced back to the AMI image. So when a second EC2 instance is launched due to a scaling event, that new EC2 instance will have its own blank database, because it is based on the same root AMI image (with the blank database).
The two databases will not know about each other and no syncing will occur. Also, at any time, any of the EC2 instances may be deleted due to a scale-down event. So any data on that instance may be lost.
Separate your web layer from the database layer: use autoscaling to scale your web layer, but don't use autoscaling for your data layer.
MongoDB has its own form of clustering (replica sets and sharding) for load balancing and high availability. Use it rather than rolling your own with autoscaling.
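Concretely, the autoscaled web tier would simply point at a fixed replica set that lives outside the Auto Scaling group. A minimal Spring-style sketch, assuming hypothetical hostnames mongo-a/b/c.internal and a replica set named rs0 (placeholders, not from the question):

```java
import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.WriteConcern;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MongoConfig {

    @Bean
    public MongoClient mongoClient() {
        // Hypothetical seed list of dedicated DB instances that are NOT part of
        // the web tier's Auto Scaling group.
        ConnectionString uri = new ConnectionString(
                "mongodb://mongo-a.internal:27017,mongo-b.internal:27017,"
              + "mongo-c.internal:27017/?replicaSet=rs0");

        return MongoClients.create(
                MongoClientSettings.builder()
                        .applyConnectionString(uri)
                        // Acknowledge writes only after a majority of members have
                        // them, so a failover does not silently lose data.
                        .writeConcern(WriteConcern.MAJORITY)
                        .build());
    }
}
```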
It is not standard practice to couple your web server with the database server. Here is what I would suggest.
Implement load balancing on your web servers as well as on your MongoDB instances, so, for the sake of argument, you would have four web servers and four MongoDB servers.
For load balancing your MongoDB servers, it is up to you whether you go with a replica set (the primary/secondary, historically master/slave, model that keeps the data synced across all members) or whether you look into sharding.
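As a rough sketch of how such a replica set is formed, the example below sends the replSetInitiate command from the Java driver (in practice this is usually done once from the mongo shell with rs.initiate()); the hostnames and the set name rs0 are placeholders:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import org.bson.Document;

import java.util.Arrays;

public class ReplicaSetInit {
    public static void main(String[] args) {
        // Connect with a single-host URI to one not-yet-initiated member (placeholder hostname).
        try (MongoClient client = MongoClients.create("mongodb://mongo-a.internal:27017")) {
            Document config = new Document("_id", "rs0")
                    .append("members", Arrays.asList(
                            new Document("_id", 0).append("host", "mongo-a.internal:27017"),
                            new Document("_id", 1).append("host", "mongo-b.internal:27017"),
                            new Document("_id", 2).append("host", "mongo-c.internal:27017")));

            // replSetInitiate is an admin command; afterwards the members elect a
            // primary and replicate data to the secondaries.
            client.getDatabase("admin")
                  .runCommand(new Document("replSetInitiate", config));
        }
    }
}
```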
I was wondering if anyone out there has experience deploying a Zend community app to the cloud (e.g. AWS or similar)?
I'm new to cloud hosting, having always been fortunate enough in the past to work for folks who have dedicated servers. My main concern (not Zend-specific) is how you manage resilience at the database level. For example, in a traditional setup I would have two boxes running the DB (MySQL) in master/slave mode, with the master replicating to the slave. If the master suffered a disk failure, I could swap the DB connection over from the master to the slave and rebuild the master later. Is this done differently in the cloud?
Any help/pointers greatly appreciated.
It depends on the type of cloud service you use. If you're using AWS to get your own virtual machines (Amazon EC2), then it's basically the same as having dedicated servers: you can keep a master/slave setup and work with it in much the same way.
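For the "swap the connection over to the slave" part of the question, the idea can be sketched with a multi-host failover URL. The app in the question is PHP/Zend, but the sketch below uses Java and MySQL Connector/J just to show the shape of it (the same idea applies to PHP's mysqli/PDO); hostnames, database name, and credentials are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class FailoverConnection {
    public static void main(String[] args) throws SQLException {
        // Connector/J multi-host failover URL: the driver tries the first (master)
        // host and falls back to the second (slave) if it is unreachable.
        String url = "jdbc:mysql://db-master.internal:3306,db-slave.internal:3306/mydb"
                   + "?failOverReadOnly=false&connectTimeout=3000";

        try (Connection conn = DriverManager.getConnection(url, "appuser", "secret")) {
            System.out.println("Connected, read-only: " + conn.isReadOnly());
        }
    }
}
```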
However, if you plan on using Amazon's hosted database service (Amazon SimpleDB), then you don't have to worry about masters and slaves, since Amazon handles this for you and makes sure you always have access to your data. The only caveat is that it's in beta.
One of the points of the cloud is to take your mind off the hardware. Amazon worries about that.
You might still want to have two virtual machines in case Amazon is doing maintenance that makes your VM unavailable; however, Amazon stresses that it should be highly available and essentially never go down, as long as you pay.