Flink session cluster and job submission in Kubernetes

Our team set up a Flink Session Cluster in our K8S cluster. We chose a Flink Session Cluster rather than a Job Cluster because we have a number of different Flink jobs, and we want to decouple the development and deployment of Flink from those of our jobs. Our Flink setup contains:
A single JobManager as a K8S pod, with no High Availability (HA) setup
A number of TaskManagers, each as a K8S pod
We develop our jobs in a separate repository and deploy them to the Flink cluster whenever code is merged.
Now, we have noticed that Kubernetes can redeploy the JobManager pod at any time, and when that happens it loses all jobs. To work around this, we wrote a script that keeps monitoring the jobs in Flink and resubmits any job that is not running. Because it can take the script some time to discover and resubmit the jobs, we see short service interruptions quite often, and we are wondering whether this can be improved.
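A minimal sketch of the detection half of such a monitoring script, assuming the JobManager REST endpoint is reachable at the hypothetical address http://flink-jobmanager:8081 and that jobs are identified by their job names. It reads the response of Flink's GET /jobs/overview endpoint and reports which expected jobs have no job in a live state; the resubmission step is left out.

```python
import json
import urllib.request

# Hypothetical address of the JobManager REST endpoint (8081 is Flink's default REST port).
FLINK_REST = "http://flink-jobmanager:8081"

# Job states in which we consider a job "alive" and leave it alone.
LIVE_STATES = {"CREATED", "INITIALIZING", "RUNNING", "RESTARTING", "RECONCILING"}


def fetch_overview(base_url: str = FLINK_REST) -> dict:
    """Return the parsed body of GET /jobs/overview from the Flink REST API."""
    with urllib.request.urlopen(f"{base_url}/jobs/overview", timeout=10) as resp:
        return json.load(resp)


def missing_jobs(overview: dict, expected_names: set) -> set:
    """Return the expected job names that have no job in a live state."""
    alive = {
        job["name"]
        for job in overview.get("jobs", [])
        if job.get("state") in LIVE_STATES
    }
    return set(expected_names) - alive
```

Calling missing_jobs(fetch_overview(), {...}) on a timer yields the set of jobs to resubmit. The polling interval is effectively the length of the service gap, which is why this approach can only shrink the interruption, not remove it.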
So far, we have some ideas or questions:
One possible solution: when the JobManager is (re)deployed, it fetches the latest jobs jar and runs the jobs. This looks good overall. However, since our jobs are developed in a separate repo, we need a way for the cluster to notice when the jobs change: either the JobManager keeps polling for the latest jobs jar, or the jobs repo deploys the latest jobs jar to the cluster.
I see that the Flink HA feature can store checkpoints/savepoints, but I am not sure whether Flink HA already handles this redeployment issue.
Does anyone have any comment or suggestion on this? Thanks!
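For the "jobs repo deploys the latest jar" idea, a redeploy hook could pick the most recent upload and start it through the Flink REST API (GET /jars lists uploaded jars, POST /jars/<jarid>/run starts one). This is a sketch under the assumption that each build is uploaded under the same file name; the jar name, entry class, and base URL you pass in are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def latest_jar_id(jars_listing: dict, jar_name: str) -> Optional[str]:
    """Pick the most recently uploaded jar with the given file name.

    jars_listing is the parsed body of Flink's GET /jars, whose "files"
    entries carry an "id", the original "name" and an "uploaded" timestamp.
    """
    candidates = [f for f in jars_listing.get("files", []) if f.get("name") == jar_name]
    if not candidates:
        return None
    return max(candidates, key=lambda f: f.get("uploaded", 0))["id"]


def run_jar(base_url: str, jar_id: str, entry_class: str, parallelism: int = 1) -> str:
    """Start a job from an uploaded jar via POST /jars/<jar_id>/run; return its job id."""
    body = json.dumps({"entryClass": entry_class, "parallelism": parallelism}).encode()
    req = urllib.request.Request(
        f"{base_url}/jars/{urllib.parse.quote(jar_id)}/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["jobid"]
```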

Yes, Flink HA will solve the JobManager failover problems you're concerned about. The new JobManager will pick up information about which jobs are (supposed to be) running, their jars, checkpoint status, etc., from the HA storage.
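As an illustration, the flink-conf.yaml entries for ZooKeeper-based HA can be generated as below. The quorum hosts, bucket, and cluster id are hypothetical; the key point is that the storage directory must be durable storage reachable by every JobManager, not a pod-local path, or a redeployed JobManager has nothing to recover from.

```python
def zookeeper_ha_config(quorum: list, storage_dir: str, cluster_id: str) -> dict:
    """Build the flink-conf.yaml entries for ZooKeeper-based JobManager HA."""
    return {
        "high-availability": "zookeeper",
        "high-availability.zookeeper.quorum": ",".join(quorum),
        # Durable storage shared by all JobManagers; Flink persists the metadata
        # needed for recovery here, while ZooKeeper only holds pointers to it.
        "high-availability.storageDir": storage_dir,
        # Keeps this cluster's ZooKeeper entries separate from other Flink clusters.
        "high-availability.cluster-id": cluster_id,
    }


def render_flink_conf(entries: dict) -> str:
    """Render entries as flat 'key: value' lines, the format flink-conf.yaml uses."""
    return "\n".join(f"{key}: {value}" for key, value in entries.items())
```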
Note also that Flink 1.10 includes a beta release of native support for Kubernetes session clusters. See the docs.

Related

How to keep Flink taskmanager pod running on K8s

I've created a Flink cluster using Session mode on native K8s using the command:
$ ./bin/kubernetes-session.sh -Dkubernetes.cluster-id=my-first-flink-cluster
based on these instructions.
I am able to submit jobs to the cluster and view the Flink UI. However, I noticed that Flink creates a TaskManager pod only when a job is submitted and deletes it right after the job finishes. Previously I tried the same with a YARN-based deployment on Google Dataproc, and with that method the cluster always had a TaskManager running, which reduced job start time.
So, is there a way to keep a TaskManager pod always running in a Kubernetes Flink deployment so that job start time is reduced?
The intention of Flink's native k8s support is to provide active resource allocation (i.e., task slots through new TaskManager instances) whenever it is needed. It also shuts down TaskManager pods once they are no longer used. That's the behavior you're observing.
What you're looking for is the standalone k8s support. Here, Flink does not try to start new TaskManager pods. The ResourceManager is passive, i.e. it only considers the TaskManagers that are registered. Some outside process (or a user) has to manage TaskManager pods instead. This might lead to jobs failing if there are not enough task slots available.
Best,
Matthias
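With the standalone setup, sizing the TaskManager pool is up to you. A rough sketch of that arithmetic, assuming default slot sharing (a job needs as many slots as its highest operator parallelism) and that slots_per_tm matches taskmanager.numberOfTaskSlots:

```python
def required_taskmanagers(job_parallelisms: list, slots_per_tm: int) -> int:
    """Minimum number of TaskManager pods so every job in the session gets its slots."""
    if slots_per_tm < 1:
        raise ValueError("slots_per_tm must be at least 1")
    total_slots = sum(job_parallelisms)
    # Ceiling division without floating point.
    return -(-total_slots // slots_per_tm)
```

The result would become the replica count of the TaskManager Deployment, for example via kubectl scale deployment/flink-taskmanager --replicas=3 (the deployment name is hypothetical). If the pool is smaller than this, jobs fail for lack of slots, as noted above.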

Best practice to clean up Flink application cluster on Kubernetes when the application is completed

We are running Flink jobs on Kubernetes in Application mode. The problem is that when the job completes or is stopped, the JobManager container exits, but the following resources remain until we run kubectl delete to clean them up: 1. the TaskManager deployment, 2. the JobManager service, 3. the ConfigMap.
This is not a big deal if we stop the job manually, but if our Flink job is a batch job that completes some time later, we need an external service to keep monitoring the JobManager container and clean up the remaining resources when it's done, which is not very practical.
I wonder what the best practice is here. Is running Flink batch jobs on Kubernetes supported? If so, there should be a way for the Flink job itself to clean everything up when it completes, right?
I assume that you are running a standalone Flink application on Kubernetes. In this mode, Flink is not aware of the Kubernetes cluster, so users have to rely on external tools (e.g., kubectl, a k8s operator) to manage the lifecycle of Flink clusters. This means that you need to delete the TaskManager deployment, ConfigMaps, and services manually.
I think this situation could be improved in the following two ways:
Set the owner reference of the TaskManager deployment, ConfigMaps, and services to the JobManager Kubernetes Job. However, you still need to delete the Kubernetes Job manually after the application finishes.
Try the native Kubernetes integration. Flink has an embedded Kubernetes client and can delete the resources automatically when the application finishes.
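For the owner-reference option, a sketch of the patch that would make a resource a dependent of the JobManager Job. It assumes the JobManager runs as a batch/v1 Job in the same namespace as its dependents (cross-namespace owners are not allowed); the Job name and uid are placeholders you would look up, e.g. with kubectl get job <name> -o jsonpath='{.metadata.uid}'.

```python
import json


def owner_reference_patch(job_name: str, job_uid: str) -> dict:
    """Merge patch that makes a resource a dependent of a Kubernetes Job.

    When the owning Job is deleted, the Kubernetes garbage collector also
    deletes every object that carries this owner reference.
    """
    return {
        "metadata": {
            "ownerReferences": [
                {
                    "apiVersion": "batch/v1",
                    "kind": "Job",
                    "name": job_name,
                    "uid": job_uid,
                }
            ]
        }
    }


def patch_argument(job_name: str, job_uid: str) -> str:
    """JSON string suitable for: kubectl patch <kind> <name> --type=merge -p '<this>'."""
    return json.dumps(owner_reference_patch(job_name, job_uid))
```

Note that a JSON merge patch replaces the whole ownerReferences list, so this assumes the resource had no other owners.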

How different is the Flink deployment on Kubernetes and Native Kubernetes

What are the major differences between Native Kubernetes and Kubernetes deployments?
I'm new to Kubernetes and trying to understand how Flink deployments differ between them.
Any insight into the internals would be a great help.
In a Kubernetes session or per-job deployment, Flink has no idea it's running on Kubernetes. In this mode, Flink behaves as it does in any standalone deployment (where there is no cluster framework available to do resource management). Kubernetes just happens to be how the infrastructure was created, but as far as Flink is concerned, it could have been bare metal. You have to arrange for Kubernetes to create the infrastructure that you have configured Flink to expect.
In a Native Kubernetes session deployment, Flink uses its KubernetesResourceManager, which submits a description of the cluster it wants to the Kubernetes ApiServer, which creates it. As jobs come and go, and the requirements for TaskManagers (and slots) go up and down, Flink is able to obtain and release resources from Kubernetes as appropriate.
In Application Mode (blog post) (details), you end up with Flink running as a Kubernetes application, which automatically creates and destroys cluster components as needed for the job(s) in one Flink application.

Flink Statefun HA kubernetes cluster

I'm trying to deploy a highly available Flink cluster on Kubernetes. In the examples below, the worker nodes are replicated, but we have only one master pod.
https://github.com/apache/flink-statefun
As far as I understand, there are two approaches to making the JobManager HA:
https://ci.apache.org/projects/flink/flink-docs-stable/ops/jobmanager_high_availability.html
https://medium.com/hepsiburadatech/high-available-flink-cluster-on-kubernetes-setup-73b2baf9200e
In the first example, we deploy another JobManager so we can switch to it in case of failure.
In the second example, Kubernetes redeploys the JobManager pod in case of failure.
So I have a few questions:
For both examples, what happens to the running jobs when the active JobManager fails?
Can the first scenario be applied on Kubernetes?
In the second scenario, if the JobManager fails, the Flink UI will be unavailable until the pod recovers, but in the first scenario it will remain available, am I right?
What are the pros and cons of each scenario?
There is only one approach to making the JobManager HA: both of your links use JobManager HA backed by a ZooKeeper cluster, which gives the JM an active/standby architecture.
When the JobManager fails, a failover occurs as described in the Apache Flink documentation (first link): the standby JM becomes active.
Of course. Kubernetes is just how the whole Flink cluster is deployed, so you can still use the HA cluster mode with ZooKeeper.
No, both will perform the failover, and a standby JM will become active.
Keep in mind that Kubernetes is only the platform Flink is deployed on. Just as you can deploy Flink on physical or virtual servers, you can deploy it on Kubernetes, and things like high availability stay the same.
EDIT:
You can run two or more JobManager pods in Kubernetes, and then it will be equivalent to the first solution.
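To make the active/standby idea concrete, here is a toy in-memory model of the classic ZooKeeper leader-election recipe: each candidate creates an ephemeral sequential znode, the lowest sequence number is the leader, and a znode disappears when its owner's session ends. It is an illustration of the recipe only, not Flink's actual code; the pod names are hypothetical.

```python
import itertools
from typing import Dict, Optional


class LeaderElectionModel:
    """Toy model of ZooKeeper's ephemeral-sequential leader election recipe."""

    def __init__(self) -> None:
        self._next_seq = itertools.count()
        # sequence number -> candidate (e.g. a JobManager pod name)
        self._znodes: Dict[int, str] = {}

    def join(self, candidate: str) -> int:
        """Candidate creates its ephemeral sequential znode."""
        seq = next(self._next_seq)
        self._znodes[seq] = candidate
        return seq

    def session_lost(self, candidate: str) -> None:
        """Pod dies: ZooKeeper removes the ephemeral znodes of its session."""
        self._znodes = {s: c for s, c in self._znodes.items() if c != candidate}

    def leader(self) -> Optional[str]:
        """The candidate holding the lowest sequence number is the leader."""
        if not self._znodes:
            return None
        return self._znodes[min(self._znodes)]
```

With two JobManager pods, jm-0 and jm-1, jm-0 leads; if jm-0's session is lost, jm-1 takes over at once, and when Kubernetes restarts jm-0 it rejoins as the new standby.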

How to best run Apache Airflow tasks on a Kubernetes cluster?

What we want to achieve:
We would like to use Airflow to manage our machine learning and data pipelines while using Kubernetes to manage resources and schedule jobs. What we would like is for Airflow to orchestrate the workflow (e.g., dependencies between tasks, re-running jobs upon failure) and for Kubernetes to orchestrate the infrastructure (e.g., cluster autoscaling and assigning individual jobs to nodes). In other words, Airflow tells the Kubernetes cluster what to do, and Kubernetes decides how to distribute the work. At the same time, we want Airflow to be able to monitor the status of individual tasks. For example, if we have 10 tasks spread across a cluster of 5 nodes, Airflow should be able to communicate with the cluster and report something like: 3 “small tasks” are done, 1 “small task” has failed and will be scheduled to re-run, and the remaining 6 “big tasks” are still running.
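The status report described above boils down to counting tasks by kind and by the phase Kubernetes reports for each task's pod (status.phase is one of Pending, Running, Succeeded, Failed, Unknown). A small sketch of that bookkeeping, assuming one pod per task; the "small"/"big" labels are hypothetical, and Airflow's own Kubernetes integration does its own state tracking, so this is illustrative only.

```python
from collections import Counter
from typing import Dict, Tuple

# The phases Kubernetes reports in a Pod's status.phase field.
POD_PHASES = ("Pending", "Running", "Succeeded", "Failed", "Unknown")


def summarize(tasks: Dict[str, Tuple[str, str]]) -> Counter:
    """Count tasks per (task kind, pod phase).

    tasks maps a task id to (kind, phase), e.g. ("small", "Succeeded").
    """
    counts: Counter = Counter()
    for kind, phase in tasks.values():
        if phase not in POD_PHASES:
            raise ValueError(f"unknown pod phase: {phase}")
        counts[(kind, phase)] += 1
    return counts
```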
Questions:
Our understanding is that Airflow has no Kubernetes operator; see the open issue at https://issues.apache.org/jira/browse/AIRFLOW-1314. That said, we don’t want Airflow to manage resources (service accounts, env variables, creating clusters, etc.), but simply to send tasks to an existing Kubernetes cluster and be notified when a job is done. An alternative would be Apache Mesos, but it looks less flexible and less straightforward than Kubernetes.
I guess we could use Airflow’s bash_operator to run kubectl, but that does not seem like the most elegant solution.
Any thoughts? How do you deal with that?
Airflow has both a Kubernetes Executor as well as a Kubernetes Operator.
You can use the Kubernetes Operator to send tasks (in the form of Docker images) from Airflow to Kubernetes via whichever AirflowExecutor you prefer.
Based on your description though, I believe you are looking for the KubernetesExecutor to schedule all your tasks against your Kubernetes cluster. As you can see from the source code it has a much tighter integration with Kubernetes.
This also means you don't have to build Docker images ahead of time, as is required with the Kubernetes Operator.