Can Kubernetes work like a compute farm and route one request per pod - kubernetes

I've dockerized a legacy desktop app. This app does resource-intensive graphical rendering from a command line interface.
I'd like to offer this rendering as a service in a "compute farm", and I wondered if Kubernetes could be used for this purpose.
If so, how in Kubernetes would I ensure that each pod only serves one request at a time (this app is resource-intensive and likely not thread-safe)? Should I write a single-threaded wrapper/invoker app in the container and thus serialize requests? Would K8s then be smart enough to route subsequent requests to idle pods rather than letting them pile up on an overloaded pod?

Interesting question.
The inbuilt default Service object along with kube-proxy does route the requests to different pods, but only does so in a round-robin fashion which does not fit our use case.
Your use-case would require changes to be made to the kube-proxy setup during the cluster setup. This approach is tedious and will require you to have your own cluster setup (not supported by cloud services). As described here.
Best bet would be to setup a service-mesh like Istio which provides the features with little configuration along with a lot of other useful functionalities.
See if this helps.

Related

k8s: Is it possible to have two identical deployments but route different traffic to them?

Here is my use case:
I have a microservice which gets sent traffic via an ingress gateway in real time and via a batch process. What I'd like to be able to do is be able to conceptually define a deployment and have it create two sets of pods:
One set for real time request
Another for batch.
When a new version of the microservice gets deployed, the k8s deployment is updated and both real time and batch use the new version.
Is this possible in k8s or will I need to create two deployments and manage them separately?
This is a community wiki answer posted for better visibility. Feel free to expand it.
Since we don't know the complete information about the architecture used, the following suggestions from comments can be used in the future to solve the problem.
1. With Deployments, Services, Selectors
You can have two identical deployments and route different traffic to them.
It may be implemented by Services:
In Kubernetes, a Service is an abstraction which defines a logical set
of Pods and a policy by which to access them (sometimes this pattern
is called a micro-service). The set of Pods targeted by a Service is
usually determined by a selector.
Such approach has some advantages.
By default, traffic will be routed to endpoints in random way if you are using iptables proxy mode. When you try to send traffic to specific pods covered by the same deployment - it may happen large differences in CPU and Memory usage leading to the resource exhaustion or wasting resources.
It will be easier to manage service versioning, CPU and Memory assignment and rollouts.
2. With Istio
From David M. Karr
If a service is defined as a VirtualService, you can route to
different DestinationRule objects depending on header values (or other
qualifications).
Additionally
If you need to deploy a new version of the microservice, you can choose between different strategies, which is more suitable for your needs.
Kubernetes deployment strategies:
recreate: terminate the old version and release the new one
ramped: release a new version on a rolling update fashion, one after the other
blue/green: release a new version alongside the old version then switch traffic
canary: release a new version to a subset of users, then proceed to a full rollout
a/b testing: release a new version to a subset of users in a precise way (HTTP headers, cookie, weight, etc.). A/B testing is
really a technique for making business decisions based on statistics
but we will briefly describe the process. This doesn’t come out of the
box with Kubernetes, it implies extra work to setup a more advanced
infrastructure (Istio, Linkerd, Traefik, custom nginx/haproxy, etc).

How many side-car proxy is too much in a pod

I am currently studying distributed systems and have seen that many businesses relies on side-car proxy pattern for their services. For example, I know a company that uses an nginx proxy for authentication of their services and roles and permissions instead of including this business logic within their services.
Another one makes use of a cloud-sql-proxy on GKE to use the Cloud SQL offering that comes on google cloud. So on top of deploying their services in a container which runs in a pod, they is a proxy just for communicating with the database.
There is also istio which is a service mesh solution which can be deployed as a side-car proxy in a pod.
I am pretty sure there are other commonly know use-cases where this pattern is used but at some point how much is too much side-car proxy? How heavy is it on the pod running it and what are the complexity that comes with using 2, 3, or even 4 side car proxys on top of your service container?
I recommend you to define what really you need and continue your research based on this, since this topic is too broad and doesn't have one correct answer.
Due to this, I decided to post a community wiki answer. Feel free to expand it.
There can be various reasons for running some containers in one pod. According to Kubernetes documentation:
A Pod can encapsulate an application composed of multiple co-located
containers that are tightly coupled and need to share resources. These
co-located containers form a single cohesive unit of service—for
example, one container serving data stored in a shared volume to the
public, while a separate sidecar container refreshes or updates
those files. The Pod wraps these containers, storage resources, and an
ephemeral network identity together as a single unit.
In its simplest form, a sidecar container can be used to add functionality to a primary application that might otherwise be difficult to improve.
Advantages of using sidecar containers
sidecar container is independent from its primary application in terms of runtime environment and programming language;
no significant latency during communication between primary application and sidecar container;
the sidecar pattern entails designing modular containers. The modular container can be plugged in more than one place with minimal modifications, since you don't need to write configuration code inside each application.
Notes regarding usage of sidecar containers
consider making a small sidecar container that doesn't consume much resources. The strong point of a sidecar containers lies in their ability to be small and pluggable. If sidecar container logic is getting more complex and/or becoming more tightly coupled with the main application container, it may better be integrated with the main application’s code instead.
to ensure that any number of sidecar containers can works successfully with main application its necessary to sum up all the resources/request limits while defining resource limits for the pod, because all the containers will run in parallel. Whole functionality works only if both types of containers are running successfully and most of the time these sidecar containers are simple and small that consume fewer resources than the main container.

Triggering a Kubernetes-based application from AppEngine

I'm currently looking into triggering some 3D rendering from an AppEngine-based service.
The idea is that input data is submitted by an API client to this web service, which then invokes an internal Kubernetes GPU enabled application ("rendering backend") to do the hard work.
GPU-enabled clusters are relatively expensive ($$$), so I really want the cluster to be up and running on demand. I am trying to achieve that by setting the autoscaling minimum to 0 for the rendering backend.
The only pretty way of "triggering" a rendering task on such a cluster I could think of is via Pub/Sub Push. Basically, I need something like Cloud Tasks, but those seem to be aimed at long running tasks executed in AppEngine, not Kubernetes. Plus I like the way Pub/Sub decouples the web service from the rendering backend.
Google's Pub/Sub only allows pushing via HTTPS and only to a validated domain. It appears that Google is forcing me to completely "expose" my internal rendering backend by assigning a domain name to it, which feels ridiculous. I cannot just tell Pub/Sub to invoke http://loadbalancer.IP.address/handle_push.
This is making me doubt my architecture.
How would you go about building something like this on GCP?
From the GKE perspective:
You can have a cluster with a dedicated GPU-based nodepool and schedule your pods there using Taints and tolerations. Additionally, you can control the number of nodes in your nodepool using Autoscaling so that, you can use them only when your pods are to be scheduled/run.
Consider that this requires an additional default-non-GPU-based nodepool, where system pods are being run.
For triggering, as long as your default pool is running, you'd be able to deploy your application and the autoscaling should start automatically. For deploying from an App Engine application, you might want to consider talking to the Kubernetes API directly through a library.
Finally and considering the nature of your current goal (3D rendering), it might be best to use Kubernetes Jobs. With these, you can complete an sporadic computational load, allowing the nodepool to downsize once is finished.
Wrapping up, you can have a minimum cluster with a zero-sized GPU-based nodepool that will autoscale when a tainted job is requested to be run there, and once the workload is finished, it should automatically downscale. These actions can be triggered from GAE, using one of the client libraries.

Kubernetes best practices in pods

As I have been using kubernetes more I keep on seeing the reference that a pod can contain 1 container or more and I have even looked at examples.
My question is whether there is a case where this would be best practice and more efficient to create multi container pods since you can scale and replicate your pods coupling it with a service.
Thanks in advance
A Pod can contain multiple containers, but for the most portion of the situations, it makes perfect sense for the Pod to be simply an abstraction over a single running container.
In what situations does it make sense to have a multi-container deployed Pod?
What comes to my mind are the scenarios where you have a primary Pod running, but you need to tightly couple helper processes, such as a log watcher. In those situations, it makes perfect sense to actually have multiple containers running inside a single pod.
Another big example that comes to my mind is from the Istio project, which is a platform made to connect, manage and secure microservices and is generally referred as a Service Mesh.
A huge part of what it does and is able to accomplish to provide a greater control and customization over the deployed microservices network, is due to the fact that it deploys a sidecar proxy, denominated Envoy, throughout the environment intercepting all network communication between microservices.
Here, you can check an example of load balancing in a Istio service mesh. As you can see the Proxy is deployed inside the Pod, intercepting all communication that goes through it.

Kubernetes architecture with on premise api gateway

When using on premise (running on my own) api gateway like Kong, should it be run in a node as 1 withing the main kubernetes cluster or should it be ran as separate kubernetes cluster?
Unless you have an amazing reason to do otherwise: run Kong within the cluster. Pretty much the last thing you'd want is for all API requests to bomb because of a severed connection between cluster-A and cluster-B, not to mention the horrible latency as requests hop from one layer of abstraction to another.
Taking a page from the nginx Ingress controller, you also have the opportunity to use the Endpoint API to bypass the iptables-based Service machinery, saving even more latency and system resources -- a trick that would be almost impossible with a multi-cluster configuration.
It is my recollection there are even Kong-based Ingress controllers, which could save you even more heartache if their featureset and your needs align