Apache Marathon app and container relation

I would like to understand the relation between a Marathon App and a container. Is it really so that a Marathon App definition can contain only a single container definition (1:1)? As far as I understand the Marathon REST API (link attached), the answer is yes.
https://mesosphere.github.io/marathon/docs/rest-api.html#post-/v2/apps
But then are we supposed to use App Groups in order to define complex applications that are built from more than a single container? I have checked Kubernetes, and the idea of a "pod" there seems very convenient for building applications composed of multiple containers, where the containers in the same pod share a single network stack and scaling happens at the pod level.
Can we say that a Kubernetes pod corresponds to a Marathon App Group? Or should I not try to find such similarities, and rather try to better understand the Marathon philosophy?
Thank you!
Regards,
Laszlo

An app in Marathon specifies how to spawn tasks of that application. While you can specify how many tasks you want to spawn, every single one of these tasks corresponds to only one command or container.
To help you, I would need to understand more about your use case.
Groups can be used to organize related apps including dependencies. The tasks of the apps will not necessarily be co-located on the same host.
If you need co-location, you either need to create a container with multiple processes or use constraints to specify directly on which host you want to run the tasks.
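For illustration, here is a rough sketch of a group definition (the kind of JSON you would POST to /v2/groups), with made-up app IDs and image names. Each app still holds exactly one container, scaling ("instances") happens per app, and "dependencies" only controls deployment order, not co-location; the "constraints" line shows the constraint mechanism mentioned above (here spreading tasks across hosts, whereas a CLUSTER constraint would pin them to one host):

```json
{
  "id": "/shop",
  "apps": [
    {
      "id": "db",
      "instances": 1,
      "cpus": 0.5,
      "mem": 512,
      "container": {
        "type": "DOCKER",
        "docker": { "image": "postgres:9.4" }
      }
    },
    {
      "id": "web",
      "instances": 3,
      "cpus": 0.25,
      "mem": 256,
      "dependencies": ["/shop/db"],
      "constraints": [["hostname", "UNIQUE"]],
      "container": {
        "type": "DOCKER",
        "docker": { "image": "example/shop-web:1.0" }
      }
    }
  ]
}
```

So a Marathon group is closer to "a set of related apps deployed in a defined order" than to a Kubernetes pod, whose containers are always placed on the same host and share a network namespace.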

How to (properly) Create Jobs On Demand

What I would like to do
I would like to create a Kubernetes workflow where users could POST jobs whenever they want, at any time, without necessarily scheduling anything (CronJobs) or specifying parallelism or completion requirements, i.e., users could create Jobs on demand.
How I would do it right now
The way I'm thinking about accomplishing this is by simply applying the Jobs to the Kubernetes cluster (I also have to make sure each job doesn't have the same name as an existing one, because otherwise Kubernetes will think it's a mistake and won't create another one). However, this feels improper because the Jobs will be kind of scattered across the cluster and I would lose control over them (though Kubernetes would supposedly manage them optimally).
Is there a better, more proper way?
I imagine a more proper way of configuring all this is to create some sort of Deployment and Service on top of the Jobs, but is that an existing feature on Kubernetes? Huge companies probably have had this problem in the past so I wonder: what are the best practices for this Kubernetes Jobs On Demand use case?
Not a full answer but you might be interested in this project: https://github.com/ivoscc/kubernetes-task-runner.
It provides an API to launch one-time tasks as Jobs on a Kubernetes cluster, handles input/output files via GCS and periodically cleans up finished Jobs.
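As a rough sketch of the "create a Job per request" approach (the image and arguments below are made up): metadata.generateName makes the API server append a random suffix, so you don't have to invent unique names yourself, and ttlSecondsAfterFinished lets the control plane garbage-collect finished Jobs, assuming a cluster version where the TTL controller is available:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: user-task-        # API server appends a random suffix -> no name collisions
  labels:
    app: on-demand-tasks          # common label so these Jobs can be listed and cleaned up together
spec:
  backoffLimit: 2                 # retry a failed pod at most twice
  ttlSecondsAfterFinished: 3600   # delete the Job an hour after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: task
        image: example/task-runner:1.0              # hypothetical worker image
        args: ["--input", "s3://bucket/input.csv"]  # hypothetical per-request arguments
```

Note that generateName only works with create (e.g. kubectl create -f job.yaml), not apply, and kubectl get jobs -l app=on-demand-tasks then lists everything submitted this way.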

Kubernetes dynamic pod provisioning

I have an app I'm building on Kubernetes which needs to dynamically add and remove worker pods (which can't be known at initial deployment time). These pods are not interchangeable (so increasing the replica count wouldn't make sense). My question is: what is the right way to do this?
One possible solution would be to call the Kubernetes API to dynamically start and stop these worker pods as needed. However, I've heard that this might be a bad way to go since, if those dynamically-created pods are not in a replica set or deployment, then if they die, nothing is around to restart them (I have not yet verified for certain if this is true or not).
Alternatively, I could use the Kubernetes API to dynamically spin up a higher-level abstraction (like a replica set or deployment). Is this a better solution? Or is there some other more preferable alternative?
If I understand you correctly, you need ConfigMaps.
From the official documentation:
The ConfigMap API resource stores configuration data as key-value pairs. The data can be consumed in pods or provide the configurations for system components such as controllers. ConfigMap is similar to Secrets, but provides a means of working with strings that don’t contain sensitive information. Users and system components alike can store configuration data in ConfigMap.
Here you can find some examples of how to set it up.
Please try it and let me know if that helped.
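For reference, a minimal sketch of a ConfigMap and a pod consuming it as environment variables; all names, keys and the image are made up:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: worker-config             # hypothetical name
data:
  LOG_LEVEL: "debug"
  QUEUE_NAME: "jobs-high-priority"
---
apiVersion: v1
kind: Pod
metadata:
  name: worker
spec:
  containers:
  - name: worker
    image: example/worker:1.0     # hypothetical image
    envFrom:
    - configMapRef:
        name: worker-config       # every key above becomes an environment variable in the container
```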

Is it good to put a complete application in one Kubernetes pod?

I have an application consisting of a frontend, a backend and a database.
At the moment the application is running on a Kubernetes cluster.
The frontend, backend and database each run in their own Pod, communicating via Services.
My idea is to put all of these application parts (frontend, backend and DB) into one Pod, so I can make a Helm chart of it and, for every new customer, only have to change the values.
The question is whether this is a good solution or not to be recommended.
No, it is a bad idea, this is why:
First, the DB is a stateful container. When you update any of the components, you have to take down all containers in the Pod; even a small frontend update takes down everything and makes the application unavailable.
Let's say you run multiple replicas of this pod to avoid the issue above. That makes it extremely hard to scale the application, because every container gets scaled as a copy when you likely only need the FE or BE to scale, and running multiple replicas of a database, depending on how it replicates the data, will make it slower. You also have to consider backup and restore of the data in case of failures.
In the same example, multiple replicas will make the Pods consume far more resources than you actually need.
If you just want to deploy the resources without much customization, you could deploy them into separate namespaces, add network policies to prevent the namespaces from talking to each other, and deploy the raw YAML there, only taking care to use ConfigMaps to load the different configuration for each customer.
If you want just a simple templating and deployment solution, you can use kustomize.
If you want the more complex setup and management provided by Helm, you could define all the pods in one chart; the Prometheus chart is an example.
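As a rough sketch of the kustomize option (all file names and values below are made up): keep the shared manifests in a base and give every customer a small overlay that only changes the namespace and configuration:

```yaml
# base/kustomization.yaml -- shared frontend/backend/database manifests
resources:
  - frontend-deployment.yaml
  - backend-deployment.yaml
  - db-statefulset.yaml
  - services.yaml
---
# overlays/customer-a/kustomization.yaml -- only the per-customer differences
namespace: customer-a              # everything for this customer lands in its own namespace
resources:
  - ../../base
configMapGenerator:
  - name: app-config
    literals:
      - DB_NAME=customer_a         # hypothetical per-customer settings
      - THEME=dark
```

kubectl apply -k overlays/customer-a then renders and applies the whole stack for that customer.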
You can create a Helm chart consisting of multiple pods or deployments, so you do not need to put them into one pod just for that purpose. I would also not recommend that, since, for example, the database would most likely fit better into a StatefulSet.

if an application needs several containers running on the same host, why not just make a single container with everything you need?

I started working with Kubernetes. I have already worked with single-container pods. Now I want to work with multi-container pods. I read a statement like:
if an application needs several containers running on the same host, why not just make a single container with everything you need?
which means two containers with a single IP address. My doubt is: in which cases do two or more containers use the same host?
Could anybody please explain the above scenario to me with an example?
This is called "multiple processes per container".
https://docs.docker.com/config/containers/multi-service_container/
It's discussed on the internet many times and it has many gotchas. Basically there's not a lot of benefit to doing it.
Ideally you want a container to host one process and its threads/subprocesses.
So if your database process is in a crash loop, let it crash and let docker restart it. This should not impact your web container.
Also putting processes in separate containers lets you set separate memory/CPU limits so that you can set different limits for your web container and database container.
That's why Kubernetes exposes the Pod concept, which lets you run multiple containers in the same namespace. Read this page fully: https://kubernetes.io/docs/concepts/workloads/pods/pod/
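To make that concrete, here is a sketch of a two-container pod (names and images are made up): both containers share the pod's network namespace, so they reach each other on localhost, and they share a volume, but each keeps its own image, restart behaviour and resource limits:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-log-shipper        # hypothetical example
spec:
  volumes:
  - name: logs
    emptyDir: {}                    # scratch space shared by both containers
  containers:
  - name: web
    image: example/web:1.0          # hypothetical web server image
    ports:
    - containerPort: 8080
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
    resources:
      limits: { cpu: "500m", memory: "256Mi" }   # per-container limits, as mentioned above
  - name: log-shipper
    image: example/log-shipper:1.0  # hypothetical sidecar that tails /var/log/app
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
    resources:
      limits: { cpu: "100m", memory: "64Mi" }
```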
Typically it is recommended to run one process in a Docker container. Although it is very much possible to run multiple processes in a container, this is discouraged more on the basis of application architecture and DevOps practice than for technical reasons.
The following discussion gives a better insight into this debate: https://devops.stackexchange.com/questions/447/why-it-is-recommended-to-run-only-one-process-in-a-container
When you run multiple processes from different packages/tools in the same container, you can run into the dependency and upgradability issues that Docker was meant to solve in the first place. Nevertheless, for many applications it makes sense to scale and manage the application in blocks rather than as individual components; sometimes that makes life a bit easier. So Pods are basically a balance between the two: they give each process its own container for better service manageability, yet group those containers together so that the application as a whole can be better managed.
I would say segregating responsibilities can be a boon. Maintenance is a whole lot easier this way. A container has a single entrypoint, and a pod's health is checked via that main process in the entrypoint. If your frontend (say Java on Tomcat) application is working fine but the database is not (although one should never run a production database in a container), you will not get the feedback that you would get if they were in different containers.
Also, one of the best parts of Docker images is that they are available as different modules, and maintenance is super easy that way.

How to organize pods with non-replicatable containers in kubernetes?

I'm trying to get my head around kubernetes.
I understand that pods are a great way to organize containers that are related. I understand that replication controllers are a great way to ensure they are up and running.
However, I do not get how to do it in real life.
Given a webapp with, say, a rails app on unicorn, behind nginx, with a postgres database.
The nginx and rails app can autoscale horizontally (if they are shared nothing), but postgres can't out of the box.
Does that mean I can't organize the postgres database within the same pod as nginx and rails when I want to have two servers behind a load balancer? Does postgres need its own replication controller, and is it simply a service within the cluster?
The general question is: in common web scenarios, what kinds of containers go into one pod? I know this can't be answered in general, so the ideas behind it are what interest me.
The answer to this really depends on how far down the rabbit hole you want to go. You are really describing 3 independent parts of your app - nginx, rails, postgres. Each of those parts has different needs when it comes to monitoring, scaling, and updating.
You CAN put all 3 into a single pod. That replicates the experience of a VM, but with some advantages for manageability, deployment etc. But you're already calling out one of the major disadvantages - you want to scale (for example) the rails app but not the postgres instance. It's time to decompose.
What if you instead made 2 pods - one for rails+nginx and one for postgres. Now you can scale your frontend without messing up your database deployment. You might go even further and split your rails app and nginx into distinct pods, if that makes sense. Or split your rails app into 5 smaller apps.
This is the whole idea behind microservices (which is a very hyped word, I know). Decompose the problem into smaller and more manageable chunks. This is WHY kubernetes exists - to help you manage the resulting ocean of microservices.
To answer your last question - there is no single recipe. It's all about how your application is structured and what makes sense FOR YOU. Maybe you want to decompose along team boundaries, or along departments in your company, or along admin roles. The questions to ask yourself are things like "if I update or scale this pod, is there any part of it that I don't want updated/scaled at the same time?"
In some sense it is a data normalization problem. A pod should be a model of one concept or owner or scope. I hope that helps a little.
You should put containers into the same pod when you want to deploy and update them at the same time or if they need to share local state (disk, network, etc). In some edge cases, you may also want to co-locate them for performance reasons.
In your scenario, since nginx and the rails app can scale horizontally, they should be in their own pods so that you can provision the right number of replicas for each tier of your application (unless you would always scale them at the same time). The postgres database would be in a separate pod, accessed via a service.
This allows you to update to a newer version of nginx without changing anything else about your service. Same for the rails app. And they could each scale independently.
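A rough sketch of that layout (names and image are made up, and a Deployment is used here where the original discussion talks about replication controllers): postgres sits behind its own Service, the rails pods only know the Service's DNS name, and each tier scales and updates independently. The postgres pod itself (labeled app: postgres, e.g. from its own Deployment or StatefulSet) is omitted for brevity:

```yaml
# Service in front of the postgres pod(s); other pods reach it simply as "postgres"
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  selector:
    app: postgres
  ports:
  - port: 5432
---
# The rails tier: its own Deployment, scaled without touching the database
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rails
spec:
  replicas: 3                      # scale the app tier independently
  selector:
    matchLabels: { app: rails }
  template:
    metadata:
      labels: { app: rails }
    spec:
      containers:
      - name: rails
        image: example/rails-app:1.0   # hypothetical image
        env:
        - name: DATABASE_HOST
          value: postgres          # resolves to the Service above via cluster DNS
```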