Airflow on k8s and Google Operators: creds verification

I have Apache Airflow on k8s.
Earlier, when Airflow was running on my local server (not k8s), I had no trouble with OAuth2 credential verification: when a Google operator (based on GoogleCloudHook) started, my browser opened and redirected me to the Google auth page. It was a one-time procedure.
With Airflow on k8s, my tasks run in separate pods, and this OAuth2 credential verification becomes a problem: I can't "open a browser" inside a pod, and I don't want to do it every time a task runs.
Can I somehow disable this procedure or automate it?
Is there any solution?

To authenticate, you should first make sure you are using the correct operator and executor in Airflow. In your case this would be the Kubernetes Executor. When using this executor, you need to set up one or more secrets for use with k8s.
Refer to the Kubernetes Executor documentation for an overview.
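One way the "set up secrets" step could look is sketched below. This is a minimal illustration, not the documented Airflow procedure: it assumes you have a Google service account key file and that the Google hook falls back to Application Default Credentials, which read the key path from the GOOGLE_APPLICATION_CREDENTIALS environment variable. The mount path, file name, volume name, and function names are hypothetical.

```python
import base64

# Hypothetical locations, chosen only for this illustration.
KEY_MOUNT_DIR = "/var/secrets/google"
KEY_FILE_NAME = "key.json"


def build_key_secret(name: str, namespace: str, key_json: str) -> dict:
    """Return a Kubernetes Secret manifest that stores a service account key."""
    encoded = base64.b64encode(key_json.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {KEY_FILE_NAME: encoded},
    }


def build_pod_fragment(secret_name: str) -> dict:
    """Return the volume, mount, and env entries a worker pod would need."""
    return {
        "volumes": [{"name": "gcp-key", "secret": {"secretName": secret_name}}],
        "volumeMounts": [
            {"name": "gcp-key", "mountPath": KEY_MOUNT_DIR, "readOnly": True}
        ],
        # Application Default Credentials look at this variable,
        # so no browser-based OAuth2 flow is involved.
        "env": [
            {
                "name": "GOOGLE_APPLICATION_CREDENTIALS",
                "value": f"{KEY_MOUNT_DIR}/{KEY_FILE_NAME}",
            }
        ],
    }
```

The two dictionaries can be dumped to YAML or JSON and merged into whatever pod template your deployment uses for worker pods.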

Related

How to authenticate to a GKE cluster without using the gcloud CLI

I've got a container inside a GKE cluster and I want it to be able to talk to the Kubernetes API of another GKE cluster to list some resources there.
This works well if I run the following command in a separate container to proxy the connection for me:
gcloud container clusters get-credentials MY_CLUSTER --region MY_REGION --project MY_PROJECT; kubectl --context MY_CONTEXT proxy --port=8001 --v=10
But this requires me to run a separate container that, due to the size of the gcloud CLI, is more than 1 GB.
Ideally I would like to talk directly from my primary container to the other GKE cluster. But I can't figure out how to find the IP address and set up the authentication required for the connection.
I've seen a few questions:
How to Authenticate GKE Cluster on Kubernetes API Server using its Java client library
Is there a golang sdk equivalent of "gcloud container clusters get-credentials"
But it's still not really clear to me if/how this would work with the Java libraries, if at all possible.
Ideally I would write something like this.
var info = gkeClient.GetClusterInformation(...);
var auth = gkeClient.getAuthentication(info);
...
// using the io.fabric8.kubernetes.client.ConfigBuilder / DefaultKubernetesClient
var config = new ConfigBuilder().withMasterUrl(info.url())
.withNamespace(null)
// certificate or other authentication mechanism
.build();
return new DefaultKubernetesClient(config);
Does that make sense, is something like that possible?
There are multiple ways to connect to your cluster without using the gcloud CLI. Since you are trying to access the cluster from another cluster within the cloud, you can use the Workload Identity authentication mechanism. Workload Identity is the recommended way for workloads running on Google Kubernetes Engine (GKE) to access Google Cloud services in a secure and manageable way. For more information, refer to the official documentation, which gives a step-by-step procedure for configuring Workload Identity and provides reference links for code libraries.
This answer is based on information provided in the official Google documentation.
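To make the idea more concrete, here is a rough sketch in Python (standard library only) of the two HTTP requests that would replace `gcloud container clusters get-credentials`. It assumes the pod can reach the metadata server (as it can with Workload Identity) and that the bound Google service account is allowed to read the target cluster. The function names are hypothetical, and the requests are only built here, not sent.

```python
import base64
import urllib.request

METADATA_TOKEN_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)


def token_request() -> urllib.request.Request:
    """Build the request that asks the metadata server for an access token."""
    return urllib.request.Request(
        METADATA_TOKEN_URL, headers={"Metadata-Flavor": "Google"}
    )


def cluster_request(
    project: str, location: str, cluster: str, access_token: str
) -> urllib.request.Request:
    """Build the GKE API request that returns the cluster description."""
    url = (
        "https://container.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/clusters/{cluster}"
    )
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}
    )


def connection_settings(cluster_info: dict, access_token: str) -> dict:
    """Extract what a Kubernetes client needs from the cluster description."""
    ca_b64 = cluster_info["masterAuth"]["clusterCaCertificate"]
    return {
        "master_url": f"https://{cluster_info['endpoint']}",
        "ca_cert_pem": base64.b64decode(ca_b64).decode("ascii"),
        "bearer_token": access_token,
    }
```

The three values returned by `connection_settings` (API server URL, CA certificate, and bearer token) are the pieces the client configuration in the question is missing. Access tokens expire, so a long-running client would need to refresh the token periodically.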

CloudSQL Proxy on GKE : Service vs Sidecar

Does anyone know the pros and cons for installing the CloudSQL-Proxy (that allows us to connect securely to CloudSQL) on a Kubernetes cluster as a service as opposed to making it a sidecar against the application container?
I know that it is mostly used as a sidecar. I have used it as both (in non-production environments), but I never understood why a sidecar is preferable to a service. Can someone enlighten me please?
The sidecar pattern is preferred because it is the easiest and most secure option. Traffic between the application and the Cloud SQL Auth proxy is not encrypted or authenticated, so it is up to the user to restrict access to the proxy (typically by running it on localhost).
When you run the Cloud SQL proxy, you are essentially saying "I am user X and I'm authorized to connect to the database". When you run it as a service, anyone that connects to that service is connecting to the database authorized as "user X".
You can see this warning in the Cloud SQL proxy example running as a service in k8s, or watch this video on Connecting to Cloud SQL from Kubernetes which explains the reason as well.
The Cloud SQL Auth proxy is the recommended way to connect to Cloud SQL, even when using private IP. This is because the Cloud SQL Auth proxy provides strong encryption and authentication using IAM, which can help keep your database secure.
When you connect using the Cloud SQL Auth proxy, the Cloud SQL Auth proxy is added to your pod using the sidecar container pattern. The Cloud SQL Auth proxy container is in the same pod as your application, which enables the application to connect to the Cloud SQL Auth proxy using localhost, increasing security and performance.
Because a sidecar is a container that runs in the same Pod as the application container and shares the same volumes and network as the main container, it can “help” or enhance how the application operates. In Kubernetes, a pod is a group of one or more containers with shared storage and network. A sidecar is a utility container in a pod that’s loosely coupled to the main application container.
Sidecar Pros: Scales indefinitely as you increase the number of pods. Can be injected automatically. Already used by service meshes.
Sidecar Cons: A bit difficult to adopt, as developers can't just deploy their app, but must deploy a whole stack in a Deployment. It consumes many more resources and is harder to secure, because every Pod must deploy the log aggregator to push the logs to the database or queue.
Refer to the documentation for more information.
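As a rough illustration of the sidecar layout described above, the sketch below builds a pod spec with two containers: the application and the proxy. It assumes the v2 Cloud SQL Auth proxy argument style (a `--port` flag followed by the instance connection name). The container names, the `DB_HOST`/`DB_PORT` variables, and the function name are hypothetical, and the proxy image is left for the caller to supply.

```python
def build_pod_spec(
    app_image: str,
    proxy_image: str,
    instance_connection_name: str,
    db_port: int = 5432,
) -> dict:
    """Return a pod spec where the app reaches the proxy over localhost."""
    app = {
        "name": "app",
        "image": app_image,
        # Containers in one pod share a network namespace, so the
        # application talks to the proxy on 127.0.0.1.
        "env": [
            {"name": "DB_HOST", "value": "127.0.0.1"},
            {"name": "DB_PORT", "value": str(db_port)},
        ],
    }
    proxy = {
        "name": "cloud-sql-proxy",
        "image": proxy_image,
        # No Service and no exposed containerPort: the proxy is meant to be
        # reachable only from inside this pod.
        "args": [f"--port={db_port}", instance_connection_name],
    }
    return {"containers": [app, proxy]}
```

Because the proxy is never published through a Service, only the application container in the same pod can use the identity the proxy runs as, which is the security argument made above.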

RabbitMQ Topology Operator can't create Exchange on Kubernetes

I am trying to use the RabbitMQ Topology Operator to manage a RabbitMQ cluster running on Kubernetes.
As a setup, I have deployed the rabbitmq-cluster-operator to create the cluster and enabled the necessary plugins, such as the management plugin.
Next, I deployed the rabbitmq-topology operator in the same namespace.
After defining some infrastructure for the topology operator, e.g. an Exchange, the topology operator just logs ERRORs when trying to create the exchange:
"Error: API responded with a 401 Unauthorized"
It seems like the topology operator cannot authorize against the management API.
I followed the instructions to install the operator here:
https://www.rabbitmq.com/kubernetes/operator/using-topology-operator.html
I am wondering if I have to configure a user for the topology operator to authorize against the management API?
The Topology Operator uses the "{RabbitClusterName}-default-user" secret, and the RabbitMQ Cluster Operator generates a random default username/password pair when it creates the cluster.
I had the same problem because I overwrote the default user and password in additionalConfig, and the one created by the operator didn't work anymore.
Make sure that the user from the {RabbitClusterName}-default-user secret works with the management API. The secret should be in the same namespace as the cluster.
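A quick way to run that check is sketched below, using only the Python standard library. It assumes the secret's `data` map holds base64-encoded `username` and `password` keys and that the management API is reachable at a URL you provide. The function names are hypothetical, and the request is only built, not sent.

```python
import base64
import urllib.request


def decode_default_user(secret_data: dict) -> tuple:
    """Decode the base64 username and password from the secret's data map."""
    username = base64.b64decode(secret_data["username"]).decode("utf-8")
    password = base64.b64decode(secret_data["password"]).decode("utf-8")
    return username, password


def whoami_request(
    management_url: str, username: str, password: str
) -> urllib.request.Request:
    """Build a basic-auth request against the management API's whoami endpoint."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        f"{management_url.rstrip('/')}/api/whoami",
        headers={"Authorization": f"Basic {token}"},
    )
```

Sending the request with `urllib.request.urlopen` raises an `HTTPError` with code 401 if the credentials in the secret no longer match a user in RabbitMQ, which would be the same failure the Topology Operator reports.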

Airflow with KubernetesExecutor workers (EKS) and webserver+scheduler on EC2

I wanted to know if it's possible to set up a KubernetesExecutor on Airflow while having the webserver and scheduler running on EC2.
Meaning that tasks would run on Kubernetes pods (EKS in my case) but the base services would run on a regular EC2 instance.
I tried to find information about the issue but fell short...
The following quote is from Airflow's docs, and it's the reason I'm asking this question:
KubernetesExecutor runs as a process in the Airflow Scheduler. The scheduler itself does not necessarily need to be running on Kubernetes, but does need access to a Kubernetes cluster.
Thanks in advance!
Yes, this is entirely possible.
You just need to run your Airflow scheduler and Airflow webserver on EC2 and configure the EC2 instance to have all the necessary access (likely via a service account, but this is your decision and deployment configuration) to be able to spin up pods on your EKS cluster.
There is nothing special about it, except that you will have to learn how to run and configure the components to talk to each other. There are no ready-to-use recipes; you will simply have to follow the configuration parameters of Airflow and the authentication schemes that you need.
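As a starting point, the sketch below builds the environment variables that would point a scheduler running outside the cluster at a kubeconfig file. It relies on Airflow's `AIRFLOW__{SECTION}__{KEY}` convention for environment overrides. The section name is an assumption you should check against your Airflow version (`kubernetes` in older releases, `kubernetes_executor` in newer ones), and the function name and example values are hypothetical.

```python
from typing import Optional


def airflow_env(
    kubeconfig_path: str,
    namespace: str,
    cluster_context: Optional[str] = None,
    section: str = "kubernetes",
) -> dict:
    """Return env vars for a scheduler that runs outside the cluster."""

    def key(sec: str, option: str) -> str:
        # Airflow reads config overrides from AIRFLOW__{SECTION}__{KEY}.
        return f"AIRFLOW__{sec.upper()}__{option.upper()}"

    env = {
        key("core", "executor"): "KubernetesExecutor",
        # The scheduler is not a pod, so in-cluster credentials are unavailable.
        key(section, "in_cluster"): "False",
        key(section, "config_file"): kubeconfig_path,
        key(section, "namespace"): namespace,
    }
    if cluster_context is not None:
        env[key(section, "cluster_context")] = cluster_context
    return env
```

The kubeconfig itself still has to authenticate against EKS, for example through an IAM role attached to the EC2 instance; that part depends on your deployment and is not covered by this sketch.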

Running kiam server securely

Can anyone give an example of using kiam on Kubernetes to manage service-level access control to AWS resources?
According to the docs:
The server is the only process that needs to call sts:AssumeRole and
can be placed on an isolated set of EC2 instances that don't run other
user workloads.
I would like to know how to run the server part of it away from the nodes that host my services.
Answer: KIAM architecture is well explained here:
https://www.bluematador.com/blog/iam-access-in-kubernetes-kube2iam-vs-kiam
Basically, you want to use the master nodes in your cluster, with IAM STS permissions on them, to run the server portion of kiam, and then let your worker nodes connect to the master nodes to retrieve credentials.
DISCLAIMER: I did some digging on kube2iam and kiam without going all the way to taking them to a test bench, and I wasn't happy with what I found. It turns out we don't need them anymore starting with K8s 1.13 on EKS: as of September 4th, AWS has added native support for pods to access IAM STS.
https://docs.aws.amazon.com/en_pv/eks/latest/userguide/iam-roles-for-service-accounts.html
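For the native approach linked above, the core of the setup is a ServiceAccount annotated with the IAM role it should assume. The sketch below builds such a manifest in Python. It assumes the cluster already has an IAM OIDC provider configured and that the role's trust policy allows this service account; the function names and the example ARN are hypothetical.

```python
def build_service_account(name: str, namespace: str, role_arn: str) -> dict:
    """Return a ServiceAccount manifest annotated with the IAM role to assume."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


def pod_spec_fragment(service_account_name: str) -> dict:
    """Return the pod spec field that opts a pod into that service account."""
    return {"serviceAccountName": service_account_name}
```

For example, `build_service_account("s3-reader", "default", "arn:aws:iam::111122223333:role/s3-reader")` yields a manifest you can serialize and apply. Access control then happens per service account rather than per node, so no separate credential server has to be isolated.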