I need to deploy a GPU-intensive task on GCP. I want to use a Node.js Docker image and, within that container, run a Node.js server that listens for HTTP requests and runs a Python image processing script on demand (every time a new HTTP request containing the images to be processed is received).

My understanding is that I need to deploy a load balancer with a static public IP address in front of the K8s cluster, which then builds/launches a container every time a new HTTP request comes in, and destroys the container once processing is completed. Or is container re-use not a concern? I have never worked with K8s before; after reading the GKE documentation, this is how I imagine the architecture, and I want to understand how it works. What am I missing here?
"runs a Python image processing script on-demand (every time that a new HTTP request is received containing the images to be processed)"
This can be solved on Kubernetes, but it is not a very common kind of workload.
The project that supports your use case best is Knative, with its per-request autoscaler. Google Cloud Run is the easiest way to use it, but if you want to run this within your own GKE cluster, you can enable it there as well.
That said, you can also design your Node.js service to talk to the Kubernetes API server to create Jobs, but it is not good design to have an ordinary workload talk to the API server. It is better to use Knative or Google Cloud Run.
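For illustration, the on-demand pattern the question describes (a Node.js handler that invokes a Python script for each incoming request) could look roughly like the sketch below; the script name, the /process route, and the port are assumptions, not details from the question:

```typescript
// Minimal sketch: an HTTP server that runs a Python script per request.
// "process.py", the /process route, and port 8080 are hypothetical placeholders.
import { createServer } from "http";
import { spawn } from "child_process";

const server = createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/process") {
    res.writeHead(404).end();
    return;
  }

  // Buffer the uploaded image payload, then hand it to the Python script via stdin.
  const chunks: Buffer[] = [];
  req.on("data", (chunk) => chunks.push(chunk));
  req.on("end", () => {
    const python = spawn("python3", ["process.py"]);
    const output: Buffer[] = [];

    python.stdout.on("data", (d) => output.push(d));
    python.on("close", (code) => {
      if (code === 0) {
        res.writeHead(200, { "Content-Type": "application/octet-stream" });
        res.end(Buffer.concat(output));
      } else {
        res.writeHead(500).end(`processing failed with exit code ${code}`);
      }
    });

    python.stdin.write(Buffer.concat(chunks));
    python.stdin.end();
  });
});

server.listen(8080, () => console.log("listening on :8080"));
```

Under Knative or Cloud Run, a container running a server like this is what gets scaled (down to zero and back up) per request; you do not create or destroy containers yourself.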
Related
I've dockerized a legacy desktop app. This app does resource-intensive graphical rendering from a command line interface.
I'd like to offer this rendering as a service in a "compute farm", and I wondered if Kubernetes could be used for this purpose.
If so, how in Kubernetes would I ensure that each pod only serves one request at a time (this app is resource-intensive and likely not thread-safe)? Should I write a single-threaded wrapper/invoker app in the container and thus serialize requests? Would K8s then be smart enough to route subsequent requests to idle pods rather than letting them pile up on an overloaded pod?
Interesting question.
The built-in default Service object, along with kube-proxy, does route requests to different pods, but only does so in a round-robin fashion, which does not fit your use case.
Your use case would require changes to the kube-proxy setup during cluster setup. This approach is tedious and requires you to run your own cluster (it is not supported by managed cloud services), as described here.
Your best bet would be to set up a service mesh like Istio, which provides these features with little configuration, along with a lot of other useful functionality.
See if this helps.
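As a complement, the single-threaded wrapper/invoker idea raised in the question can be sketched as a tiny server that accepts only one render at a time and answers 503 while busy, so whatever sits in front of it (a mesh like Istio, or any retrying client) can send the request to an idle pod instead. The CLI name, its arguments, and the port are placeholders:

```typescript
// Sketch of a one-request-at-a-time invoker around a resource-intensive CLI.
// "render-cli", its arguments, and port 8080 are hypothetical placeholders.
import { createServer } from "http";
import { execFile } from "child_process";

let busy = false;

const server = createServer((req, res) => {
  if (busy) {
    // Signal "try another pod" rather than queueing work on this one.
    res.writeHead(503, { "Retry-After": "1" }).end("busy");
    return;
  }
  busy = true;

  execFile("render-cli", ["--input", "/tmp/job"], (err, stdout) => {
    busy = false;
    if (err) {
      res.writeHead(500).end(err.message);
    } else {
      res.writeHead(200).end(stdout);
    }
  });
});

server.listen(8080);
```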
I'm fairly new to Kubernetes and I think I understand the basics of the system, but most of what I have read was about how to use kubectl to start a Service, a Deployment, and so on.
In my use case, however, I have a web API (built in ASP.NET Core) that takes a request, does some processing, and, depending on the input data, has to start a secondary process.
A Kubernetes Job with restart policy OnFailure seemed to be the way to implement those secondary processes, but I can't find any resources on how the web server can start this Job.
You can use the Kubernetes API to create a Job (or any Kubernetes resource) from your application running inside the cluster. You can either install kubectl inside your application's container and call it from your application code, or use a Kubernetes client library (https://github.com/kubernetes-client/csharp) to talk to the Kubernetes API server.
See the following answer for more details:
Kubernetes - Finding out how many replicas there are in a service?
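For reference, all of the official client libraries (including the C# one linked above) follow the same pattern: load the in-cluster configuration provided by the pod's service account, then create a Job object through the batch API. A rough sketch with the Node.js client, @kubernetes/client-node (pre-1.0 call signature; the namespace, image, and names are placeholders):

```typescript
// Sketch: creating a Job from inside the cluster with @kubernetes/client-node.
// The namespace, Job name prefix, image, and args are hypothetical placeholders.
import * as k8s from "@kubernetes/client-node";

const kc = new k8s.KubeConfig();
kc.loadFromCluster(); // uses the pod's service account token

const batch = kc.makeApiClient(k8s.BatchV1Api);

async function startSecondaryProcess(inputId: string): Promise<void> {
  const job: k8s.V1Job = {
    apiVersion: "batch/v1",
    kind: "Job",
    metadata: { generateName: "secondary-process-" },
    spec: {
      backoffLimit: 3,
      template: {
        spec: {
          restartPolicy: "OnFailure",
          containers: [
            {
              name: "worker",
              image: "registry.example.com/worker:latest",
              args: ["--input", inputId],
            },
          ],
        },
      },
    },
  };

  // Pre-1.0 signature: (namespace, body); newer releases take a single object argument.
  await batch.createNamespacedJob("default", job);
}
```

Whichever library you use, the pod's service account needs RBAC permission to create Jobs in the target namespace.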
I'm currently looking into triggering some 3D rendering from an AppEngine-based service.
The idea is that input data is submitted by an API client to this web service, which then invokes an internal Kubernetes GPU enabled application ("rendering backend") to do the hard work.
GPU-enabled clusters are relatively expensive ($$$), so I really want the cluster to be up and running on demand. I am trying to achieve that by setting the autoscaling minimum to 0 for the rendering backend.
The only pretty way of "triggering" a rendering task on such a cluster I could think of is via Pub/Sub Push. Basically, I need something like Cloud Tasks, but those seem to be aimed at long running tasks executed in AppEngine, not Kubernetes. Plus I like the way Pub/Sub decouples the web service from the rendering backend.
Google's Pub/Sub only allows pushing via HTTPS and only to a validated domain. It appears that Google is forcing me to completely "expose" my internal rendering backend by assigning a domain name to it, which feels ridiculous. I cannot just tell Pub/Sub to invoke http://loadbalancer.IP.address/handle_push.
This is making me doubt my architecture.
How would you go about building something like this on GCP?
From the GKE perspective:
You can have a cluster with a dedicated GPU-based node pool and schedule your pods there using taints and tolerations. Additionally, you can control the number of nodes in that node pool with autoscaling, so that they are used only when your pods need to be scheduled and run.
Note that this requires an additional default, non-GPU node pool, where the system pods run.
For triggering, as long as your default pool is running, you'll be able to deploy your application and the autoscaling should kick in automatically. For deploying from an App Engine application, you might want to consider talking to the Kubernetes API directly through a client library.
Finally, considering the nature of your current goal (3D rendering), it might be best to use Kubernetes Jobs. With these, you can run a sporadic computational load and let the node pool scale down once it is finished.
Wrapping up, you can have a minimal cluster with a zero-sized GPU node pool that scales up when a Job tolerating its taint is scheduled there, and scales back down automatically once the workload is finished. These actions can be triggered from GAE, using one of the client libraries.
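To make that concrete, here is a hedged sketch of the kind of Job manifest that would be submitted (through a client library, as above): it tolerates the GPU node pool's taint, pins itself to that pool, and requests a GPU, so the autoscaler only brings a node up for it. The taint key/value, node pool name, and image are assumptions:

```typescript
// Sketch of a GPU rendering Job manifest, as it would be passed to createNamespacedJob.
// The taint key/value, node pool name, and image are hypothetical placeholders.
const renderJob = {
  apiVersion: "batch/v1",
  kind: "Job",
  metadata: { generateName: "render-" },
  spec: {
    backoffLimit: 2,
    template: {
      spec: {
        restartPolicy: "OnFailure",
        // Land only on the GPU node pool...
        nodeSelector: { "cloud.google.com/gke-nodepool": "gpu-pool" },
        // ...and tolerate the taint that keeps everything else off it.
        tolerations: [
          { key: "dedicated", operator: "Equal", value: "gpu", effect: "NoSchedule" },
        ],
        containers: [
          {
            name: "renderer",
            image: "registry.example.com/renderer:latest",
            resources: { limits: { "nvidia.com/gpu": "1" } },
          },
        ],
      },
    },
  },
};
```

Because the GPU node pool's minimum size is zero, the cluster autoscaler adds a node while a Job like this is pending and removes it again some time after the Job completes.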
Requirement: New Relic monitoring for an application running in pods as part of a Kubernetes cluster.
I have installed kube-state-metrics on my cluster and am able to see the Kubernetes dashboard in New Relic Insights.
I also need to configure application monitoring for it, and am following https://blog.newrelic.com/2017/11/27/monitoring-application-performance-in-kubernetes/ for this.
I have some questions:
Can this be achieved using kube-state-metrics?
Do I need to have separate yaml file for each pod containing license key?
Do I need to make changes in my application also or adding the information in spec will work?
Do I need to install Java agent in every pod? If yes, will it eat resources?
Somehow, installation of application monitoring is becoming complex. Please explain exactly what needs to be installed.
You didn't mention your stack; you should follow the instructions on New Relic's site for your language. Typically you just pull in their agent library and configure credentials to get started. You should have no reason to tell your pods apart, so the agent credentials should be the same for all pods.
Installing the infrastructure agent gives you infrastructure data, so you'll get alerts if you're running out of memory/disk/CPU and the like. The infrastructure agent cannot possibly know about application data. If you want application performance data (APM), you need to install the agent at the application level too; then you'll get data such as HTTP request rates, error rates, and response times if it's a web server. You can also annotate the current transaction with application-specific data. They have a bunch of language agents; see if there's one for your stack. For example, all you need for a Node.js service is require('newrelic') at the top of your app, plus configuration.
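To make the Node.js case concrete, here is a minimal sketch, assuming the license key is supplied through environment variables (backed by a Secret) rather than a per-pod YAML file; the app name and Secret name are placeholders:

```typescript
// Sketch: New Relic APM for a Node.js service. The agent reads its license key and
// app name from environment variables, so the same image works for every pod, e.g.:
//
//   env:
//     - name: NEW_RELIC_APP_NAME
//       value: "my-service"                                # placeholder
//     - name: NEW_RELIC_LICENSE_KEY
//       valueFrom:
//         secretKeyRef: { name: newrelic, key: license }   # placeholder Secret

// Must be loaded first so the agent can instrument modules required after it.
import "newrelic";
import { createServer } from "http";

createServer((req, res) => {
  res.end("ok"); // this transaction is reported to New Relic automatically
}).listen(8080);
```

Since the key comes from the same Secret-backed environment variables in every replica, you do not need a separate YAML file per pod.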
We are now on our journey to break our monolith (an on-prem package (rpm/ova)) into services (Docker containers).
In the process we are evaluating Envoy/Istio as our communication and security layer. It looks great when running as a sidecar in K8s, or with each service on a separate machine.
As we are going to deliver several services on one machine, and can't deliver them within K8s, I'm not sure whether we can use Envoy. I didn't find any reference to using Envoy in other ways. Are there other deployment methods I can use to benefit from it?
You can run part of your services on Kubernetes and part on VMs.