After I create a Kubernetes cluster in Pulumi, I get the following error when trying to delete it:
error: configured Kubernetes cluster is unreachable: unable to load schema information from the API server: the server has asked for the client to provide credentials
If the cluster has been deleted, you can edit the pulumi state to remove this resource
I can refresh the credentials by running a targeted pulumi refresh followed by a targeted pulumi up on the Kubernetes provider. Is there an easier way to keep the cluster credentials up to date without having to run these targeted commands all the time?
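A common cause of that "cluster is unreachable ... provide credentials" error is a short-lived authentication token baked into the Kubernetes provider's kubeconfig, which then expires in the stack's state or config. One way to avoid the targeted refresh/up cycle is to feed the provider a kubeconfig rendered from the cluster resource's outputs that authenticates through an exec credential plugin (for example gke-gcloud-auth-plugin for GKE or aws eks get-token for EKS), so a fresh token is obtained on every pulumi operation. Below is a minimal sketch of the wiring, assuming the Pulumi Java SDK; generateKubeconfig is a hypothetical helper that renders such a kubeconfig from your cluster's endpoint and CA outputs, and the exec plugin must be installed wherever pulumi runs.

import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.kubernetes.Provider;
import com.pulumi.kubernetes.ProviderArgs;

public class App {
    public static void main(String[] args) {
        Pulumi.run(ctx -> {
            // Hypothetical helper: render a kubeconfig from the cluster resource's
            // endpoint and CA outputs, with a "user" entry that uses an exec credential
            // plugin so a fresh token is fetched every time the provider connects.
            Output<String> kubeconfig = generateKubeconfig();

            // The provider always gets up-to-date connection details, so no targeted
            // refresh/up is needed to repair stale credentials.
            var k8sProvider = new Provider("k8s", ProviderArgs.builder()
                    .kubeconfig(kubeconfig)
                    .build());

            // Kubernetes resources then opt into this provider via resource options,
            // e.g. CustomResourceOptions.builder().provider(k8sProvider).build()
        });
    }

    // Placeholder implementation; in practice, build the kubeconfig YAML from the
    // cluster resource's outputs.
    static Output<String> generateKubeconfig() {
        return Output.of("...");
    }
}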
I've got a container inside a GKE cluster and I want it to be able to talk to the Kubernetes API of another GKE cluster to list some resources there.
This works well if I run the following command in a separate container to proxy the connection for me:
gcloud container clusters get-credentials MY_CLUSTER --region MY_REGION --project MY_PROJECT; kubectl --context MY_CONTEXT proxy --port=8001 --v=10
But this requires me to run a separate container that, because of the size of the gcloud CLI, is more than 1 GB in size.
Ideally I would like to talk directly from my primary container to the other GKE cluster, but I can't figure out how to determine the API server's IP address and set up the authentication required for the connection.
I've seen a few questions:
How to Authenticate GKE Cluster on Kubernetes API Server using its Java client library
Is there a golang sdk equivalent of "gcloud container clusters get-credentials"
But it's still not really clear to me whether or how this would work with the Java libraries, if it is possible at all.
Ideally I would write something like this:
var info = gkeClient.getClusterInformation(...);
var auth = gkeClient.getAuthentication(info);
...
// using the io.fabric8.kubernetes.client.ConfigBuilder / DefaultKubernetesClient
var config = new ConfigBuilder().withMasterUrl(info.url())
        .withNamespace(null)
        // certificate or other authentication mechanism
        .build();
return new DefaultKubernetesClient(config);
Does that make sense? Is something like that possible?
There are multiple ways to connect to your cluster without using the gcloud CLI. Since you are trying to access the cluster from another cluster within Google Cloud, you can use the Workload Identity authentication mechanism. Workload Identity is the recommended way for workloads running on Google Kubernetes Engine (GKE) to access Google Cloud services in a secure and manageable way. For more information, refer to this official document; it details a step-by-step procedure for configuring Workload Identity and provides reference links for the client libraries.
This answer is based on information provided in the official Google documentation.
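To tie this back to the fabric8 sketch in the question: below is a minimal, unverified sketch in Java of what that flow can look like. It looks up the target cluster's endpoint and CA certificate through the google-cloud-container client library, takes an OAuth access token from Application Default Credentials (with Workload Identity these are the credentials of the Google service account bound to your pod, which needs a suitable IAM role such as Kubernetes Engine Viewer on the target project), and builds the fabric8 client against the other cluster's API server. MY_PROJECT, MY_REGION, and MY_CLUSTER are placeholders.

import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.container.v1.ClusterManagerClient;
import com.google.container.v1.Cluster;
import com.google.container.v1.GetClusterRequest;
import io.fabric8.kubernetes.client.Config;
import io.fabric8.kubernetes.client.ConfigBuilder;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;
import java.util.Collections;

public class RemoteGkeClient {

    public static KubernetesClient buildClient(String project, String location, String clusterName)
            throws Exception {
        // Look up the target cluster's endpoint and CA certificate via the GKE API.
        Cluster cluster;
        try (ClusterManagerClient gke = ClusterManagerClient.create()) {
            GetClusterRequest request = GetClusterRequest.newBuilder()
                    .setName(String.format("projects/%s/locations/%s/clusters/%s",
                            project, location, clusterName))
                    .build();
            cluster = gke.getCluster(request);
        }

        // Application Default Credentials; with Workload Identity these come from the
        // Google service account bound to the pod's Kubernetes service account.
        GoogleCredentials credentials = GoogleCredentials.getApplicationDefault()
                .createScoped(Collections.singletonList("https://www.googleapis.com/auth/cloud-platform"));
        credentials.refreshIfExpired();
        String accessToken = credentials.getAccessToken().getTokenValue();

        // Point fabric8 at the other cluster's API server, trusting its CA and
        // authenticating with the Google OAuth access token.
        Config config = new ConfigBuilder()
                .withMasterUrl("https://" + cluster.getEndpoint())
                .withCaCertData(cluster.getMasterAuth().getClusterCaCertificate())
                .withOauthToken(accessToken)
                .build();
        return new DefaultKubernetesClient(config);
    }

    public static void main(String[] args) throws Exception {
        try (KubernetesClient client = buildClient("MY_PROJECT", "MY_REGION", "MY_CLUSTER")) {
            client.pods().inAnyNamespace().list().getItems()
                    .forEach(pod -> System.out.println(pod.getMetadata().getName()));
        }
    }
}

Note that the access token is short-lived, so a long-running service should call refreshIfExpired() and rebuild the client (or update its token) periodically.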
I am a beginner with Vault and I am trying to set up an external Vault for a Kubernetes application using https://learn.hashicorp.com/tutorials/vault/kubernetes-external-vault. It works completely fine when everything runs on a local machine as in the tutorial, but in my case I have set up Vault HA on AWS EC2, and I have a separate Kubernetes cluster in which I set up the sidecar injector using the Helm chart as described in the tutorial. I have already set up the Kubernetes auth configuration, roles, and policy on the Vault server, but when my application starts with the annotations to get a secret from Vault, I get a permission denied error from the vault-agent init container.
Can anyone please help with this? Thank you in advance.
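One way to debug this is to reproduce by hand the login call that the vault-agent init container makes against the external Vault. The sketch below uses only the JDK HTTP client and assumes the Kubernetes auth method is mounted at the default auth/kubernetes path with a role named my-app-role (both placeholders). A 403 permission denied on this call usually points at the auth method's kubernetes_host, kubernetes_ca_cert, or token_reviewer_jwt configuration on the Vault server not matching your cluster, or at a role whose bound service account name or namespace doesn't match the pod.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class VaultKubernetesLogin {
    public static void main(String[] args) throws Exception {
        // Address of the external Vault cluster on EC2 (placeholder).
        String vaultAddr = "https://vault.example.com:8200";
        // Role configured on the Vault server for this application (placeholder).
        String role = "my-app-role";

        // The service account JWT that the init container presents to Vault.
        String jwt = Files.readString(
                Path.of("/var/run/secrets/kubernetes.io/serviceaccount/token"));

        String body = String.format("{\"role\":\"%s\",\"jwt\":\"%s\"}", role, jwt);

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(vaultAddr + "/v1/auth/kubernetes/login"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // 200 with a client_token means auth works and the problem is elsewhere;
        // 403 permission denied means Vault rejected this role/JWT combination.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}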
Deployment overview
We are using the Azure Gateway Ingress Controller (AGIC) to automatically create listeners and back-ends on an app gateway for ingresses in our AKS cluster
ArgoCD is deployed to the K8s cluster to create applications.
When ArgoCD creates an app, it pulls a Helm chart from a Git repo created for that instance of our app and deploys it.
The app is created with a Persistent Volume Claim to an Azure Storage File folder to store user data. It also gets an Ingress for the app that is labelled so that AGIC creates it in the App Gateway.
When everything works, all is well: I can access ArgoCD on one hostname and each of my deployed apps on their own hostnames, all through the App Gateway that is maintained by AGIC.
Problem description
When one of my pods fails to start (because the storage key used by the PVC is incorrect), AGIC updates the App Gateway and removes my ArgoCD back-end, even though ArgoCD itself is still working correctly.
In other words, AGIC deletes my working ArgoCD back-end.
If I delete the failed pod, AGIC deploys my HTTP back-end for ArgoCD again on the app gateway.
Questions:
How can I troubleshoot why AGIC removes the ArgoCD back-end? Is there a log I can enable that will tell me in detail how it is making deployment decisions?
Is there anything I can do on AKS to separate ArgoCD from the app pods so that AGIC doesn't remove the back-end for ArgoCD when a pod is broken? (They are already deployed in different namespaces.)
There appears to be a bug in AGIC: when some back-ends resolve and some do not, as soon as the first unresolved back-end in the list is encountered, the remaining back-ends are not created.
I have logged the following issue in Github to get it fixed: https://github.com/Azure/application-gateway-kubernetes-ingress/issues/1054
I found this by setting the logging verbosity for AGIC to level 5, reviewing the logs, and matching the log messages to the AGIC source code in that repo.
I'm using RKE version 1.0.4 and I'm successfully deploying Rancher-managed Kubernetes clusters on custom nodes in AWS. Previously I was not using cloud_provider: aws. Now I am trying to deploy the clusters with cloud_provider: aws enabled so that I can use EBS volumes for persistent volumes.
The EC2 instances are deployed separately using Terraform. When I use those custom nodes to deploy a cluster with RKE and cloud_provider set to aws, kubelet fails due to a certificate error. In our environment a MITM proxy intercepts all internet traffic, so a specific CA must be trusted in order to validate the certificate presented by ec2.us-east-1.amazonaws.com. The Rancher-provided kubelet image does not trust that certificate. Is there an option to add a custom CA bundle?
Note that the IAM roles are fine: it all works if I build a manual (non-Rancher) cluster using the official RPMs. I believe kubelet looks in /etc/pki by default for a CA bundle and therefore finds the one that includes my MITM proxy certificate.
Error from docker logs kubelet:
I0609 20:43:31.841214 8058 aws.go:1180] Zone not specified in configuration file; querying AWS metadata service
F0609 20:43:33.365708 8058 server.go:273] failed to run Kubelet: could not init cloud provider "aws": error finding instance i-4324dfsdfdfd432a: "error listing AWS instances: \"RequestError: send request failed\\ncaused by: Post https://ec2.us-east-1.amazonaws.com/: x509: certificate signed by unknown authority\""
Has anyone come across this issue in the past?
Thanks,
I'm using the Resource Manager REST API to deploy an AKS cluster. To create the app and service principal it needs, I'm using Microsoft Graph (not Azure AD Graph).
The problem I'm running into is that there seems to be a lag between when I create the app and SP, and when they become visible in ARM. If I try creating the cluster straight after the app, I get the following error:
Bad Request (HTTP 400).
Service principal clientID: <client-id> not found in Active Directory tenant
72f988bf-86f1-41af-91ab-2d7cd011db47, Please see https://aka.ms/acs-sp-help for more details.
I can verify in the portal that the app with the specified client ID does exist, as does the service principal. If I wait a couple of minutes, then the AKS cluster creation succeeds.
Is it possible to force Graph to make the app/SP visible to ARM immediately? Alternatively, is there a way in ARM to check if the app is visible, before I try creating my cluster?
No, you just have to wait and retry. For example, I assign the AKS service principal permissions on the ACR deployed alongside AKS (or on a pre-existing ACR); that assignment keeps failing until the service principal is recognized by ARM.
After that, I start the AKS deployment.
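For completeness, here is a minimal sketch of that wait-and-retry approach in Java. createAksCluster is a hypothetical placeholder for whatever call issues the Resource Manager request; the idea is simply to treat the "service principal ... not found in Active Directory tenant" 400 as transient propagation lag and back off before retrying.

import java.time.Duration;

public class AksCreateWithRetry {

    // Hypothetical placeholder for the Resource Manager REST call that creates the AKS
    // cluster with the freshly created service principal; it should throw when the
    // request is rejected (e.g. the 400 "not found in Active Directory tenant" error).
    static void createAksCluster(String clientId, String clientSecret) throws Exception {
        // ... PUT .../managedClusters/<name>?api-version=... ...
    }

    public static void main(String[] args) throws Exception {
        int maxAttempts = 10;
        Duration delay = Duration.ofSeconds(15);

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                createAksCluster("<client-id>", "<client-secret>");
                System.out.println("AKS cluster creation accepted on attempt " + attempt);
                return;
            } catch (Exception e) {
                // Treat the failure as propagation lag and back off, rethrowing only
                // once the retry budget is exhausted.
                if (attempt == maxAttempts) {
                    throw e;
                }
                System.out.println("Attempt " + attempt + " failed (" + e.getMessage()
                        + "); waiting " + delay.getSeconds() + "s for the SP to propagate");
                Thread.sleep(delay.toMillis());
            }
        }
    }
}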