Airflow with KubernetesExecutor workers (EKS) and webserver+scheduler on EC2

I wanted to know if it's possible to set up the KubernetesExecutor in Airflow while running the webserver and scheduler on an EC2 instance.
Meaning that tasks would run in Kubernetes pods (EKS in my case), but the base services would run on a regular EC2 instance.
I tried to find information about this but came up short...
The following quote from Airflow's docs is the reason I'm asking this question:
KubernetesExecutor runs as a process in the Airflow Scheduler. The scheduler itself does not necessarily need to be running on Kubernetes, but does need access to a Kubernetes cluster.
Thanks in advance!

Yes, this is entirely possible.
You just need to run airflow scheduler and airflow webserver on EC2 and give the EC2 instance all the access it needs (likely via a service account, but this is your decision and depends on your deployment configuration) to spin up pods on your EKS cluster.
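To make "all the access it needs" a bit more concrete: on EKS, the IAM identity the EC2 host uses (typically its instance role) is mapped to a Kubernetes user or group through the aws-auth ConfigMap, and that user or group then needs RBAC rights to manage pods. Below is a minimal sketch of such a Role as a Python dict. The namespace, role name and exact verb list are assumptions (roughly what an executor needs to create, watch and clean up one pod per task), not an official list.

```python
"""Sketch: an RBAC Role letting an external scheduler manage worker pods.

Hypothetical values: namespace "airflow", role name "airflow-pod-launcher",
and the verb list itself; adjust them to your cluster and Airflow version.
"""
import json
from typing import Any, Dict


def pod_launcher_role(namespace: str = "airflow",
                      name: str = "airflow-pod-launcher") -> Dict[str, Any]:
    """Return a Kubernetes Role manifest as a plain dict."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            # Create a pod per task, follow its status, and remove it afterwards.
            {
                "apiGroups": [""],
                "resources": ["pods"],
                "verbs": ["create", "get", "list", "watch", "patch", "delete"],
            },
            # Read task logs straight from the pods.
            {"apiGroups": [""], "resources": ["pods/log"], "verbs": ["get"]},
        ],
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(pod_launcher_role(), indent=2))
```

Something like `python pod_launcher_role.py | kubectl apply -f -` would create it; you would still need a RoleBinding that grants it to the user or group you mapped in aws-auth.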
Nothing special about it, except that you will have to learn how to run and configure the components so they can talk to each other. There are no ready-to-use recipes; you will simply have to follow Airflow's configuration parameters and set up whatever authentication schemes you need.
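As a starting point for those configuration parameters, here is a hedged sketch of the environment variables an EC2-hosted scheduler might use. It assumes Airflow 2.5 or later, where these options live in the `[kubernetes_executor]` section (earlier 2.x releases call it `[kubernetes]`), and a kubeconfig for the EKS cluster already present on the instance, for example one written by `aws eks update-kubeconfig`. The kubeconfig path, namespace and context values are hypothetical.

```python
"""Sketch: Airflow settings for a scheduler on EC2 that launches pods on EKS.

Assumes Airflow 2.5+ ([kubernetes_executor]; older 2.x uses [kubernetes]).
Path, namespace and context defaults are hypothetical placeholders.
"""
from typing import Dict, Optional


def env_key(section: str, option: str) -> str:
    # Airflow reads AIRFLOW__{SECTION}__{OPTION} as an override for airflow.cfg.
    return f"AIRFLOW__{section.upper()}__{option.upper()}"


def executor_env(kubeconfig: str = "/home/airflow/.kube/config",
                 namespace: str = "airflow",
                 context: Optional[str] = None,
                 section: str = "kubernetes_executor") -> Dict[str, str]:
    env = {
        env_key("core", "executor"): "KubernetesExecutor",
        # The scheduler runs outside the cluster, so in-cluster auth is off
        # and the kubeconfig file is used instead.
        env_key(section, "in_cluster"): "False",
        env_key(section, "config_file"): kubeconfig,
        # Namespace in which worker pods are created.
        env_key(section, "namespace"): namespace,
    }
    if context is not None:
        # Pick a specific context if the kubeconfig holds several clusters.
        env[env_key(section, "cluster_context")] = context
    return env


if __name__ == "__main__":
    # Emit shell `export` lines for the scheduler's environment.
    for key, value in executor_env().items():
        print(f"export {key}={value}")
```

Keep in mind that, in Airflow 2.x, the worker pods also need network access back to the metadata database the scheduler uses, since each task pod runs an Airflow task process that records its state there.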

Related

Good solutions to automate infrastructure deployment locally?

I have recently been reading more about infrastructure as a service (IaaS) and platform as a service (PaaS) and had some questions. I see when we opt for a PaaS solution, it is generally very easy to create the infrastructure as the cloud providers handle that for us and we can even automate the deployment using an infrastructure as code solution like Terraform.
But if we use an IaaS solution, or even a local on-premise cluster, it seems we lose a lot of the automation that PaaS allows. So I was curious: are there any good tools for automating infrastructure deployment on a local cluster that is not in the cloud?
The best thing I could think of was to run a local Kubernetes cluster and then Dockerize each of the infrastructure components, but this seems difficult as each node in the cluster will need its own specific configuration files.
From my basic Googling, it seems like there is not a good solution to this.
Edit:
I was not clear enough with my original intentions. I have two problems I am trying to solve.
How do I automate infrastructure deployment locally? For example, suppose I wanted to create a Hadoop HDFS cluster. I would need to configure one node to be the namenode with an accessible IP, and the other nodes to be datanodes that are aware of the namenode's IP. At the moment, I have to do this manually by logging into each node, checking its IP, and then configuring each one. How would I automate this? If I were to use a Kubernetes approach, how do I specify that one of the running pods needs to be the namenode and the others datanodes? How do I find the pods' IPs and make them aware of the namenode IP?
The next problem I have is very similar to the first, with a slight modification. How would I deploy specific configuration files to each node? For instance, in Kafka, the configuration file for one node requires the IPs of the ZooKeeper nodes, as well as the IP it should listen on. This may be different for every node in the cluster. Is there a good way to make these config files pod-specific, so that I do not have to do bash text processing to insert the correct contents into each pod's config files?
You can use Terraform for all of your on-premise infrastructure automation, and Ansible for configuration management.
Let's say you have three HPE servers: install K8s or VMware on them using Ansible, and then you can treat them as three availability zones in one region, same as AWS. From there, you can start deploying Dockerized apps or Helm charts using Terraform.
Summary:
Ansible for installing and configuring K8s.
Terraform for provisioning K8s.
Helm for installing apps on K8s.
After this, you will have a baseline automated on-premise infrastructure.
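To illustrate the configuration-management half for the cases in the question (one namenode address shared by every HDFS node, plus per-broker Kafka settings), here is a minimal Python sketch of what Ansible's template module does with Jinja2: one inventory, one template, one rendered file per node. Hosts, IPs and ports (8020 for the NameNode, 2181 for ZooKeeper, 9092 for Kafka, the common defaults) are assumptions for the example.

```python
"""Sketch: render per-node config files from a single inventory.

Host names and IPs are hypothetical; in Ansible these would come from the
inventory file and the templates would be Jinja2 instead of string.Template.
"""
from string import Template
from typing import Dict, List

Inventory = Dict[str, List[Dict[str, str]]]

# Hypothetical inventory: node1 is the NameNode; all three run ZooKeeper and Kafka.
INVENTORY: Inventory = {
    "namenode": [{"name": "node1", "ip": "10.0.0.11"}],
    "zookeeper": [
        {"name": "node1", "ip": "10.0.0.11"},
        {"name": "node2", "ip": "10.0.0.12"},
        {"name": "node3", "ip": "10.0.0.13"},
    ],
    "kafka": [
        {"name": "node1", "ip": "10.0.0.11"},
        {"name": "node2", "ip": "10.0.0.12"},
        {"name": "node3", "ip": "10.0.0.13"},
    ],
}

CORE_SITE = Template("""<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://$namenode_ip:8020</value>
  </property>
</configuration>
""")

SERVER_PROPERTIES = Template("""broker.id=$broker_id
listeners=PLAINTEXT://$ip:9092
zookeeper.connect=$zk_connect
""")


def render_core_site(inventory: Inventory) -> str:
    # Every HDFS node (namenode and datanodes) gets the same NameNode address.
    return CORE_SITE.substitute(namenode_ip=inventory["namenode"][0]["ip"])


def render_kafka_configs(inventory: Inventory) -> Dict[str, str]:
    # Shared value: the full ZooKeeper connection string.
    zk_connect = ",".join(f"{h['ip']}:2181" for h in inventory["zookeeper"])
    configs = {}
    for broker_id, host in enumerate(inventory["kafka"]):
        # Node-specific values: broker id and the IP this broker listens on.
        configs[host["name"]] = SERVER_PROPERTIES.substitute(
            broker_id=broker_id, ip=host["ip"], zk_connect=zk_connect)
    return configs


if __name__ == "__main__":
    print(render_core_site(INVENTORY))
    for name, text in render_kafka_configs(INVENTORY).items():
        print(f"--- {name}: server.properties ---\n{text}")
```

The point of the design is that node-specific facts live only in the inventory, so adding a broker means adding one inventory entry rather than editing files by hand.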

Running kiam server securely

Can anyone explain, with an example, how to use kiam on Kubernetes to manage service-level access control to AWS resources?
According to the docs:
The server is the only process that needs to call sts:AssumeRole and can be placed on an isolated set of EC2 instances that don't run other user workloads.
I would like to know how to run the server part of it away from the nodes that host your services.
Answer: KIAM architecture is well explained here:
https://www.bluematador.com/blog/iam-access-in-kubernetes-kube2iam-vs-kiam
Basically, you want to install the server portion of kiam on the master nodes of your cluster, which have IAM permissions to call sts:AssumeRole, and then let your worker nodes connect to the master nodes to retrieve credentials.
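One way to keep the server off the nodes that host your services is to give the kiam server DaemonSet a nodeSelector and a toleration for the master nodes. This sketch assumes those nodes carry the `node-role.kubernetes.io/master` label and NoSchedule taint, as kubeadm and kops clusters commonly did; substitute whatever label and taint mark your isolated nodes. The kiam agent DaemonSet would stay on the worker nodes.

```python
"""Sketch: pod-spec fields that pin the kiam server to master/dedicated nodes.

Assumption: the isolated nodes are labelled and tainted with
node-role.kubernetes.io/master; change label_key to match your cluster.
"""
import json
from typing import Any, Dict

MASTER_KEY = "node-role.kubernetes.io/master"


def kiam_server_placement(label_key: str = MASTER_KEY) -> Dict[str, Any]:
    """Fields to merge into the kiam server DaemonSet's pod template spec."""
    return {
        # Only schedule onto nodes carrying this label (value "" as on kubeadm masters).
        "nodeSelector": {label_key: ""},
        # Tolerate the taint that keeps ordinary workloads off those nodes.
        "tolerations": [
            {"key": label_key, "operator": "Exists", "effect": "NoSchedule"}
        ],
    }


if __name__ == "__main__":
    print(json.dumps(kiam_server_placement(), indent=2))
```

Only the nodes selected here would then need an IAM role allowed to call sts:AssumeRole on the roles your pods request.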
DISCLAIMER: I did some digging on kube2iam and kiam without taking them all the way to a test bench, and I wasn't happy with what I found. It turns out we don't need them anymore starting with K8s 1.13 on EKS: as of September 4th, AWS has added native support for pods to access IAM via STS.
https://docs.aws.amazon.com/en_pv/eks/latest/userguide/iam-roles-for-service-accounts.html
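For the native route, the linked page describes annotating a Kubernetes service account with an IAM role ARN; pods that use that service account then receive web-identity credentials for that role. A minimal sketch of such a manifest as a Python dict, with a hypothetical name, namespace and ARN. The cluster also needs an IAM OIDC provider and a role whose trust policy allows that service account, both covered in the AWS docs.

```python
"""Sketch: an EKS ServiceAccount bound to an IAM role (IAM roles for service accounts).

The annotation key is the one EKS uses; name, namespace and ARN are hypothetical.
"""
import json
from typing import Any, Dict


def irsa_service_account(name: str, namespace: str, role_arn: str) -> Dict[str, Any]:
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # EKS injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE into
            # pods that run under this service account.
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


if __name__ == "__main__":
    sa = irsa_service_account(
        "my-app", "default", "arn:aws:iam::111122223333:role/my-app-role")
    print(json.dumps(sa, indent=2))
```

Pods then opt in by setting `serviceAccountName: my-app`; no kiam or kube2iam agent sits in the credential path.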

How to best run Apache Airflow tasks on a Kubernetes cluster?

What we want to achieve:
We would like to use Airflow to manage our machine learning and data pipeline while using Kubernetes to manage the resources and schedule the jobs. What we would like to achieve is for Airflow to orchestrate the workflow (e.g. various task dependencies, re-running jobs upon failure) and Kubernetes to orchestrate the infrastructure (e.g. cluster autoscaling and assignment of individual jobs to nodes). In other words, Airflow will tell the Kubernetes cluster what to do and Kubernetes decides how to distribute the work. At the same time, we would also want Airflow to be able to monitor the status of individual tasks. For example, if we have 10 tasks spread across a cluster of 5 nodes, Airflow should be able to communicate with the cluster and report something like: 3 “small tasks” are done, 1 “small task” has failed and will be scheduled to re-run, and the remaining 6 “big tasks” are still running.
Questions:
Our understanding is that Airflow has no Kubernetes Operator; see the open issue at https://issues.apache.org/jira/browse/AIRFLOW-1314. That being said, we don't want Airflow to manage resources (service accounts, env variables, creating clusters, etc.) but simply to send tasks to an existing Kubernetes cluster and let Airflow know when a job is done. An alternative would be to use Apache Mesos, but it looks less flexible and less straightforward than Kubernetes.
I guess we could use Airflow's bash_operator to run kubectl, but this doesn't seem like the most elegant solution.
Any thoughts? How do you deal with that?
Airflow has both a Kubernetes Executor as well as a Kubernetes Operator.
You can use the Kubernetes Operator to send tasks (in the form of Docker images) from Airflow to Kubernetes via whichever Airflow executor you prefer.
Based on your description though, I believe you are looking for the KubernetesExecutor to schedule all your tasks against your Kubernetes cluster. As you can see from the source code it has a much tighter integration with Kubernetes.
This also means you don't have to worry about creating Docker images ahead of time, as is required with the Kubernetes Operator.
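With the KubernetesExecutor, each task runs in its own pod and the executor watches those pods to update the task state, which should give you the per-task status (done, failed and retried, still running) described in the question. Worker pods can be customised through a pod template file set via the `pod_template_file` option. Below is a minimal sketch, assuming Airflow 2, whose docs require the first container to be named `base` with an image set and `metadata.name` to be present (Airflow replaces it per task). The image tag is a hypothetical choice.

```python
"""Sketch: a minimal pod template for KubernetesExecutor worker pods (Airflow 2).

Requirements per the Airflow docs: containers[0] named "base" with an image,
and metadata.name set. The image tag below is a hypothetical example.
"""
import json
from typing import Any, Dict


def worker_pod_template(image: str = "apache/airflow:2.5.0") -> Dict[str, Any]:
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        # Must be present; Airflow overwrites it with a unique name per task.
        "metadata": {"name": "placeholder-name"},
        "spec": {
            # The executor looks for the container named "base" to run the task.
            "containers": [{"name": "base", "image": image}],
            # Retries are handled by Airflow, not by Kubernetes restarts.
            "restartPolicy": "Never",
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be saved as the template file.
    print(json.dumps(worker_pod_template(), indent=2))
```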

Should we run a Consul container in every Pod?

We run our stack on the Google Cloud Platform (hosted Kubernetes, GKE) and have a Consul cluster running outside of K8s (regular GCE instances).
Several services running in K8s use Consul, mostly for its CP K/V store and advanced locking, not so much for service discovery so far.
We recently ran into some issues with using the Consul service discovery from within K8s. Right now our apps talk directly to the Consul Servers to register and unregister services they provide.
This is not the recommended best practice; usually Consul clients (i.e. apps using Consul) should talk to the local Consul agent. In our setup there are no local Consul agents.
My Question: Should we run local Consul agents as sidekick containers in each pod?
IMHO this would be a huge waste of resources, but it would match Consul best practices better.
I tried searching on Google, but all posts about Consul and Kubernetes talk about running Consul in K8s, which is not what I want to do.
As the official Consul Helm chart and documentation suggest, the standard approach is to run a DaemonSet of Consul clients and then use the Connect sidecar injector to inject sidecars into your pods simply by providing an annotation in the pod spec. This should handle all of the boilerplate and will be in line with best practices.
Consul: Connect Sidecar; https://www.consul.io/docs/platform/k8s/connect.html
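A sketch of the DaemonSet pattern from the application side: instead of one agent per pod, each pod finds the agent on its own node through the Downward API and sets `CONSUL_HTTP_ADDR`, the pattern the Consul Kubernetes docs describe for reaching the HTTP API. It assumes the client DaemonSet exposes HTTP on host port 8500; the pod name and image are hypothetical, and the connect-inject annotation only matters if you use Connect sidecars.

```python
"""Sketch: a pod that talks to the node-local Consul client agent.

Assumes Consul clients run as a DaemonSet with HTTP on host port 8500.
Pod name and image are hypothetical placeholders.
"""
import json
from typing import Any, Dict


def app_pod(name: str = "my-app", image: str = "example/my-app:latest",
            connect_inject: bool = False) -> Dict[str, Any]:
    annotations: Dict[str, str] = {}
    if connect_inject:
        # Ask the Connect injector to add a sidecar (service mesh use only).
        annotations["consul.hashicorp.com/connect-inject"] = "true"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "annotations": annotations},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "env": [
                    # Downward API: IP of the node this pod was scheduled on.
                    {"name": "HOST_IP",
                     "valueFrom": {"fieldRef": {"fieldPath": "status.hostIP"}}},
                    # $(HOST_IP) is expanded because HOST_IP is defined earlier.
                    {"name": "CONSUL_HTTP_ADDR", "value": "http://$(HOST_IP):8500"},
                ],
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(app_pod(), indent=2))
```

This keeps the agent count at one per node rather than one per pod, which addresses the resource concern while still letting apps talk to a local agent.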

Are there any scripts to monitor k8s' status?

If used on a production system, k8s-related services might go down at some point. Are there any scripts provided that can monitor and restart the services, or do I need to develop my own scripts and add them to crontab?
I'm guessing you mean things like the scheduler, apiserver, etc. If so, they're already monitored by the kubelet running on that node. The kubelet itself is monitored by a babysitter (your init system, e.g. upstart, systemd, etc.). Depending on how you provisioned your cluster, the manifest files for those kube daemons might be under /etc/kubernetes/manifests, and those will have health checks.
Yes. What about the dashboard (web UI) and kube-dns? We recently deployed a new cluster and kube-dns was not working; we didn't realize it until a user reported it. I'm looking for an automated test/utility that can validate that all the required Kubernetes services are running properly after a new cluster deployment. I looked into Prometheus, which helps with continuous monitoring but may not help with validating a new cluster setup.
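For a post-deployment check, a small script may be enough before Prometheus is in place. This is a hedged sketch, assuming kubectl is on PATH and pointed at the new cluster: it lists pods in kube-system, flags any that are not Running with all containers ready, and checks that required apps exist by their `k8s-app` label (kube-dns pods usually carry `k8s-app=kube-dns`, a label CoreDNS deployments also keep). Which components to require is your call; the dashboard, for instance, often lives in its own namespace.

```python
"""Sketch: smoke-test kube-system pods right after a cluster is deployed.

Assumes kubectl is installed and its current context is the new cluster.
"""
import json
import subprocess
import sys
from typing import Any, Dict, Iterable, List


def unready_pods(pod_list: Dict[str, Any]) -> List[str]:
    """Names of pods that are not Running with every container ready."""
    bad = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready", False) for c in containers)
        if status.get("phase") != "Running" or not all_ready:
            bad.append(pod["metadata"]["name"])
    return bad


def missing_apps(pod_list: Dict[str, Any], required: Iterable[str]) -> List[str]:
    """Required k8s-app label values for which no pod exists at all."""
    present = {
        pod["metadata"].get("labels", {}).get("k8s-app")
        for pod in pod_list.get("items", [])
    }
    return [app for app in required if app not in present]


def main(required: Iterable[str] = ("kube-dns",)) -> int:
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", "kube-system", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    pods = json.loads(out)
    problems = [f"not ready: {name}" for name in unready_pods(pods)]
    problems += [f"missing: {app}" for app in missing_apps(pods, required)]
    for problem in problems:
        print(problem)
    # A non-zero exit code lets a deploy pipeline fail on any problem.
    return 1 if problems else 0


if __name__ == "__main__":
    sys.exit(main())
```

Note that finished Job pods (phase Succeeded) would also be reported as not ready; filter them out if your kube-system namespace runs Jobs.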