kubectl x509: certificate signed by unknown authority - kubernetes

We have a running instance on GKE. As of this afternoon we started to receive an "x509: certificate signed by unknown authority" error from newly created Kubernetes clusters. kubectl works with the old clusters but not with the new ones.
What we have tried (see the command sketch below):
updating gcloud
updating kubectl
re-authenticating gcloud
a clean gcloud install & auth
removing the certs from .kube/config and rerunning gcloud container clusters get-credentials
removing and re-adding the clusters
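For reference, a rough sketch of the commands behind the last two steps (the cluster, zone and project names below are placeholders, not our real ones):
kubectl config delete-context gke_my-project_europe-west1-b_my-cluster
kubectl config delete-cluster gke_my-project_europe-west1-b_my-cluster
kubectl config unset users.gke_my-project_europe-west1-b_my-cluster
gcloud container clusters get-credentials my-cluster --zone europe-west1-b --project my-project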
Thanks.

Related

What causes x509 cert unknown authority in some Kubernetes clusters when using the Hashicorp Vault CLI?

I'm trying to deploy an instance of HashiCorp Vault with TLS and integrated storage using the official Helm chart. I've run through the official tutorial using minikube without any issues, and I also tested it on a cluster created with kind. The tutorial went as expected on both minikube and kind; however, when I tried on a production cluster created with TKGI (Tanzu Kubernetes Grid Integrated), I ran into x509 errors running vault commands in the server pods. I can get past some of them by using -tls-skip-verify, but what may be different between these two clusters to cause the error? It also seems to be causing additional problems when I try to join the replicas to the raft pool.
Here's an example showing the x509 error:
bash-3.2$ kubectl exec -n vault vault-0 -- vault operator init \
> -key-shares=1 \
> -key-threshold=1 \
> -format=json > /tmp/vault/cluster-keys.json
Get "https://127.0.0.1:8200/v1/sys/seal-status": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "ca")
Is there something that could be updated on the TKGI clusters so that these x509 errors could be avoided?
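One hedged thing worth checking is whether the CLI inside the pod is pointed at the CA that signed the listener certificate. Assuming the TLS material was created as in the official tutorial (a secret named vault-ha-tls mounted at /vault/userconfig/vault-ha-tls), something like this should succeed without -tls-skip-verify:
kubectl exec -n vault vault-0 -- sh -c 'VAULT_CACERT=/vault/userconfig/vault-ha-tls/vault.ca vault status'
If that still fails, compare the CA in the mounted secret with the certificate actually served on port 8200; a mismatch there would explain why the same chart behaves differently on the TKGI cluster.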

Unable to connect with kubectl to a GKE cluster

I am currently trying to connect with kubectl to a GKE cluster.
I followed the steps in the documentation and executed successfully the following:
gcloud container clusters get-credentials <cluster_name> --zone <zone>
A few days ago it worked perfectly fine; I was able to set up a connection with kubectl.
The configuration was not changed in any way, and I am still trying to access the cluster through the same network. The cluster itself is running stable. Whatever I try, I run into a timeout.
I have already had a look into the kubectl configuration:
kubectl config view
It seems that the access token has expired.
...
expiry: "2022-08-01T12:12:35Z"
expiry-key: '{.credential.token_expiry}'
token-key: '{.credential.access_token}'
...
Is there any way to refresh the token? I am not able to update the token with the get-credentials command. I already deleted the configuration completely and ran the command afterwards, but the token is still the same.
I am very thankful for any hints or ideas on this.
Have you tried rerunning your credentials command to refresh your local kubeconfig?
gcloud container clusters get-credentials <cluster_name> --zone <zone>
Alternatively, try the beta variant:
gcloud beta container clusters get-credentials <cluster_name> --zone <zone>
(You may need to install the beta package using gcloud components install beta)
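If rerunning get-credentials does not change the expiry, the cached gcloud access token itself may be stale. A rough sequence to try, assuming the legacy gcp auth provider shown in your kubectl config view output:
gcloud auth login
gcloud container clusters get-credentials <cluster_name> --zone <zone>
kubectl config view | grep expiry
The expiry printed at the end should now be a timestamp in the future.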

Kubernetes (Azure's AKS) suddenly gives error "kubectl x509 certificate has expired or is not yet valid"

Suddenly an entire Kubernetes cluster (Azure's AKS-solution) became unresponsive.
When running kubectl commands, the result is kubectl x509 certificate has expired or is not yet valid.
Nothing in Azure Portal indicates an unhealthy state.
The quick solution:
az aks rotate-certs -g $RESOURCE_GROUP_NAME -n $CLUSTER_NAME
When certificates have been rotated, you can use kubectl again.
Be ready to wait 30 minutes before the cluster fully recovers.
Full explanation can be found in this article:
https://learn.microsoft.com/en-us/azure/aks/certificate-rotation
AKS clusters created prior to May 2019 have certificates that expire after two years. Any cluster created after May 2019, or any cluster that has had its certificates rotated, has a Cluster CA certificate that expires after 30 years. All other AKS certificates, which use the Cluster CA for signing, expire after two years and are automatically rotated during AKS version upgrades performed after 8/1/2021. To verify when your cluster was created, use kubectl get nodes to see the Age of your node pools.
Here are the commands to resolve the issue by rotating the certificates:
az account set --subscription
az aks get-credentials -g $RESOURCE_GROUP_NAME -n $CLUSTER_NAME
az aks rotate-certs -g $RESOURCE_GROUP_NAME -n $CLUSTER_NAME
Note: running get-credentials is mandatory before you can rotate the certificates.
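If you want to confirm the certificate really has expired before rotating, a quick check from your workstation (a sketch; assumes openssl is installed and the API server has a public FQDN):
APISERVER=$(az aks show -g $RESOURCE_GROUP_NAME -n $CLUSTER_NAME --query fqdn -o tsv)
echo | openssl s_client -connect "$APISERVER:443" 2>/dev/null | openssl x509 -noout -dates
The notAfter date in the output will be in the past if the cluster is hitting the expired-certificate error.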

kubectl unable to recognize "dashboard/deployment.yml"

I am getting the following error when I try to deploy a Kubernetes service to my cluster from my Bitbucket pipeline. I am deploying the service the same way I do on my local machine, where it works fine, so I am not able to reproduce the issue there.
Is it a certificate issue or a configuration issue?
How can I resolve this?
+ kubectl apply -f dashboard/
unable to recognize "dashboard/deployment.yml": Get https://kube1.mywebsitedomain.com:6443/api?timeout=32s: x509: certificate is valid for kube1, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, not kube1.mywebsitedomain.com
unable to recognize "dashboard/ingress.yml": Get https://kube1.mywebsitedomain.com:6443/api?timeout=32s: x509: certificate is valid for kube1, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, not kube1.mywebsitedomain.com
unable to recognize "dashboard/secret.yml": Get https://kube1.mywebsitedomain.com:6443/api?timeout=32s: x509: certificate is valid for kube1, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, not kube1.mywebsitedomain.com
unable to recognize "dashboard/service.yml": Get https://kube1.mywebsitedomain.com:6443/api?timeout=32s: x509: certificate is valid for kube1, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, not kube1.mywebsitedomain.com
Before running the apply command I set the cluster using kubectl config, and I get the following on the console:
+ kubectl config set-cluster kubernetes --server=https://kube1.mywebsitedomain.com:6443
Cluster "kubernetes" set.
It was a certificate issue. Using the right certificate would definitely solve this problem, but in my case certificate verification wasn't necessary, as a secure connection is not required for this spike.
So here is my workaround:
I used the --insecure-skip-tls-verify flag with kubectl and it worked fine.
+ kubectl --insecure-skip-tls-verify apply -f dashboard/
deployment.extensions/kubernetes-dashboard unchanged
ingress.extensions/kubernetes-dashboard unchanged
secret/kubernetes-dashboard-auth unchanged
service/kubernetes-dashboard unchanged
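If you later need TLS verification back on, the longer-term fix is to reissue the API server certificate with the external hostname included as a SAN. A sketch, assuming the cluster was built with kubeadm (the paths below are the kubeadm defaults and may differ on your setup):
# on the control-plane node
sudo mv /etc/kubernetes/pki/apiserver.crt /etc/kubernetes/pki/apiserver.crt.bak
sudo mv /etc/kubernetes/pki/apiserver.key /etc/kubernetes/pki/apiserver.key.bak
sudo kubeadm init phase certs apiserver --apiserver-cert-extra-sans=kube1.mywebsitedomain.com
# restart the kube-apiserver static pod so it picks up the new certificate
After that, kubectl can talk to https://kube1.mywebsitedomain.com:6443 without --insecure-skip-tls-verify.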

Deploying an app to gke from CI

I use GitLab for my CI; they host it and I have my own runners.
I have a k8s cluster running in GKE.
I want to use kubectl apply to deploy new versions of my containers.
This all works from my local machine because it uses my Google account.
I tried setting this all up as suggested by the Kubernetes and GitLab docs:
1. copy over the ca.crt
2. copy over the token
- echo "$KUBE_CA_PEM" > kube_ca.pem
- kubectl config set-cluster default-cluster --server=$KUBE_URL --certificate-authority="$(pwd)/kube_ca.pem"
- kubectl config set-credentials default-admin --token=$KUBE_TOKEN
- kubectl config set-context default-system --cluster=default-cluster --user=default-admin
- kubectl config use-context default-system
When I do this it fails with x509: certificate signed by unknown authority.
I tried going to the Google Cloud console > cluster > show credentials and, instead of the token, specifying the username and password shown there; this fails with the same error.
Finally I tried using --insecure-skip-tls-verify=true, but then it complains: error: You must be logged in to the server (the server has asked for the client to provide credentials).
Any help would be appreciated.
The cause of this problem was an incorrect server URL. The server needs to be the one defined on the cluster information page in the Google Cloud console, where you will find an Endpoint IP address.
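To avoid hard-coding the wrong value, you can pull both the endpoint and the CA straight from gcloud and feed them into the same set-cluster commands (a sketch; <cluster_name> and <zone> are placeholders):
- KUBE_URL="https://$(gcloud container clusters describe <cluster_name> --zone <zone> --format='value(endpoint)')"
- gcloud container clusters describe <cluster_name> --zone <zone> --format='value(masterAuth.clusterCaCertificate)' | base64 -d > kube_ca.pem
- kubectl config set-cluster default-cluster --server=$KUBE_URL --certificate-authority="$(pwd)/kube_ca.pem"
With those values coming directly from the cluster description, the certificate presented by the API server matches the CA in kube_ca.pem.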