I'm used to building web apps the 'traditional' way but am trying to wrap my head around using docker. If I run postgres in a container along with my python web app, is this the same as spinning up a digital ocean server and installing postgres from scratch? How do I handle backups, fault tolerance, etc with a postgres database that is in docker?
As an alternative, I normally use hosted postgres on Heroku or AWS. Doesn't that solve a lot of the issues I would run into when hosting postgres myself in docker? Do developers really run postgres in docker or do they typically prefer to use an external hosted service?
It's wise for the moment to only keep stateless services or one-off jobs in Docker, and not put any stateful service, like a database.
This recent article from mesosphere has more details about why this isn't yet the case.
One issue would be that orchestration technologies aren't yet up to snuff for the high requirements of stateful services. To quote:
The first challenge is resource isolation. Many container orchestration solutions in the market provide a best effort approach to resource allocation, including memory, CPU and Storage. While this may be ok for stateless apps, it may be catastrophic for stateful services, where loss of performance may result in loss of customer transactions or data.
Another is that stateful databases have been built with different assumptions than those employed by containers, and are heavily optimized for them. Again, quoting:
Most of today’s stateful database technologies were originally designed for a non-containerized world. The operational instructions are very specific to the technology and can sometimes be version specific. Trying to map generic primitives of a container orchestration platform to stateful services is usually a time consuming and error prone operation.
You could totally run your postgres instance inside Docker. But it will require some work to handle backups, fault tolerance and such.
At my company we've made the choice to not put databases inside Docker, for now at least.
Related
CONTEXT: I have been learning Kubernetes and trying to get some hands-on experience. I have been using AKS to abstract the complexity of having to deal with the control plane (and because I have a free student azure account). I am deploying a NodeJS app that connects to the MongoDB database. So far the deployment has been successful but I am using MongoDB Atlas and connecting to it.
Based on the little I have learned about Stateful sets, the MongoDB Atlas service seems a lot easier and more convenient but my question is, when would it be a better idea to consider deploying a stateful set with MongoDB database? (running on the pod) What's more cost-effective? More easily scalable?
I realize the questions might be a little bit vague but I am just getting started with Kubernetes..
disclaimer: This is not a production application, just something simple I am using to learn K8S
Official docs docs uses statefullset and that would make sense. Generally all DB kind of applications deployed as statefullset. Because there can be states that nodes are not sync with each other and that would create data inconsistencies between nodes(mongodb nodes not kubernetes).
You can deploy MongoDB as deployment. I have seen it deployed. But most clients use a connection string to connect(a string of multiple node addresses). And since kubernetes exposes statefullsets with headless services you should be okay.
For learning purpose, I advice you to deploy your MongoDB in a StatefulSet. Then you can learn how it works and what problem you could encounter with this Kubernetes object.
For production application, I advice to never deploy a database in a StatefulSet if you don't need it. In fact, StatefulSet will come with a lot of problematics that you might not need to manage.
Sometimes, companies rules restrict to host their data on external company storage.
To know if you need to put your database in a StatefulSet, the question I try to answer is:
Should my DB be hosted on premise (for privacy)?
Should my DB be scalable?
Should my DB be updated frequently?
You can find a list of pros/cons on the documentation.
I deploying a python + tensorflow + flask application using a fully managed Google Cloud Run Service (1 vCPUs and 4 GB Ram).
System works fine but it is really slow, so I am evaluating ways of making it fast (it needs to run 20-30 times faster than what is doing now)
What would be the best approach?
To use a Kubernetes Cluster with one or two powerful machines
To use a Kubernetes Cluster with 3-5 weaker machines
To forget about Kubernets/Docker and run everything on single powerfull VM
Something else maybe?
For now I don't expect to have more than 10 users at a time but I want to be able to scale it up eventually.
You might want to evaluate according to your use case
Per this article, Fully managed Cloud Run is an ideal serverless platform for stateless containerized microservices that don’t require Kubernetes features like namespaces, co-location of containers in pods (sidecars) or node allocation and management.
GKE is a great choice if you are looking for a container orchestration platform that offers advanced scalability and configuration flexibility.
You mentioned you are looking the cheaper/easier method to develop, but this will probably not be as scalable, efficient or manageable, you might want to take a closer look at all cloud compute options in GCP to see what could benefit your use case the most.
You mentioned your use case is CPU intensive, so you might want to leverage the high CPU machine types, these might be used directly by creating a VM, creating an instance group or using them in other services like GKE or App Engine
I am new to Docker. I want some opinion from some expert about container design. I have set up a database in the MongoDB cloud (Atlas). I have Windows app in Docker container which include Windows OS and application based components. I want to use RavenDB and this database is very new to me. A component of my Windows container will communicate to both MongoDB and RavenDB.
My question is
should I create different docker container for RavenDB or will I install RavenDB in my existing windows container.
it is design decision problem. I am new to RavenDB and Docker so the pros and cons are not clear to me yet. Kindly help me.
I had a similar application, where I had a postgresql db and Nodejs webapp.
The web application and the database were running on separate docker containers.
This way the two containers were independent of each other.
This replicates the actual production scenario, where you'll have your service and database running separately.
It is recommended to run single process on each container.
Better modularity of the services. Separation of Concerns.
Scaling containers horizontally is much easier if the container is isolated to a single function.
This way the two containers were independent of each other. The postgresql db container had a volume mounted to persist the data.
A more detailed explanation can be found here
Been working on a web app with a simple database model that only needs CRUD operations, figured MongoDB would be perfect for it. The most important constraints of the project is that it be able to scale from a small amount of users to a large amount. I’ve been looking at the cloud launcher and I’ve noticed that the most popular MongoDB solution advertises a cost of ~$350/mo. This is a surprisingly large amount that makes me consider using cloud sql for my database instead. Is there a better way to deploy MongoDB to GCP that’s more fitted to my use case? I’ve been reading about automatic scaling with kubernetes but I can’t find anything about price. Any and all advice is greatly appreciated
I haven't used mongodb with kubernetes but we do use the cloud launcher solution at work. We use 2 nodes(n1-standard-1) and an arbiter(micro) + 100GB storage on each node which comes up around $100 a month. You would need a replicaset in a production environment so this seems to be a reasonable base cost.
Kubernetes does not provide a lot of advantages over the classic GCE deployment for mongodb compared to a webserver. Setting up a replicaset on kubernetes is a bit more work compared to GCE setup. https://medium.com/google-cloud/mongodb-replica-sets-with-kubernetes-d96606bd9474 and http://blog.kubernetes.io/2017/01/running-mongodb-on-kubernetes-with-statefulsets.html should serve as decent references but wouldn't lower your costs. Scaling nodes would be slightly easier though but does not strictly translate to scaling mongodb.
I have lately been working on a similar solution.
GCP announced that they don't charge for Kubernetes cluster management but only for resources used by it (instances, network ...):
https://cloud.google.com/kubernetes-engine/pricing
In general, databases are high maintenance (data mounts, backups, migrations...), so I would not start running Mongo on Kubernetes right away. You could get there but it will be more complicated than deploying your web app on Kubernetes.
Better to use MongoDB as a service that supports GCP (e.g. MongoDB Atlas), I have done so myself and see a few other companies do that.
If you scale gradually you should be able to control your costs.
The web app itself should be easy to deploy and maintain on Kubernetes.
I'm trying to implement an architecture that's similar to the coreos's production architecture (shown below)
Should I run the database as a central service or one or more of the workers?
I figured the database needs some kind of replication, which makes me think that putting it in the worker cluster makes more sense, but I'm just not sure.
This should be run as a worker. The central services are the basic things that come with CoreOS (mainly etcd). The workers host your applications, the database being one of them. You do have a persistence issue because your database will have state to remember between restarts. So, there is a bigger issue of how do you make that persistence? One was to do it is use a host file and give the database an affinity to that host and mount the host file. Another thing you might consider is running more than one database (if your db technology supports that) and replicate that database so you have two (or more) copies in different workers. (non-affinity). If your database creates transaction logs that can be applied to a backup, you can manage those transaction logs in a worker.
Another thing to consider is not using a container for your database. The database is a weird animal, its care and feeding is not like the rest of the applications. So it is reasonable (in my opinion) to have your database managed and maintained outside the scope of your cluster (but still reachable by the cluster).