[AWS][EKS][Fargate] x509: certificate signed by unknown authority

I'm building an AWS EKS cluster with Fargate managed nodes, and everything works until I try to pull a Docker image from a remote on-premises Docker registry hosted on Harbor. The CA is fully private (on-premises), and I suspected that could be the issue.
As a workaround, I created an Apache proxy with an SSL key and certificate issued by AWS PCA (from another account!), then customized the docker pull endpoint so that pulls go through this proxy.
I tested this setup from the proxy instance as well as from another bastion host instance, and images pull correctly with Harbor authentication (just not from EKS).
I checked, and the CA created in AWS PCA is not expired (it expires in 2022).
From inside AWS EKS, the pull does not work. The error messages are:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning LoggingDisabled 78s fargate-scheduler Disabled logging because aws-logging configmap was not found. configmap "aws-logging" not found
Normal Scheduled 5s fargate-scheduler Successfully assigned <name-of-deployment-here> to fargate-ip-10-155-250-49.eu-central-1.compute.internal
Normal Pulling 2s kubelet Pulling image "<image_name_here>"
Warning Failed 2s kubelet Failed to pull image "<apache-proxy-address>/<docker-repository-address>": rpc error: code = Unknown desc = failed to pull and unpack image "<apache-proxy-address>/<docker-repository-address>": failed to resolve reference "<apache-proxy-address>/<docker-repository-address>: failed to do request: Head https://<apache-proxy-address>/<****>/<docker-repository-address>: x509: certificate signed by unknown authority
Warning Failed 2s kubelet Error: ErrImagePull
Normal BackOff 1s kubelet Back-off pulling image "<apache-proxy-address>/<docker-repository-address>"
Warning Failed 1s kubelet Error: ImagePullBackOff
Error is caused by:
x509: certificate signed by unknown authority
Do you guys have any ideas?
Thanks in advance!

There is no solution to this problem at the moment; we have to wait for AWS to implement support in EKS for private certificates from ACM Private CA. Currently, certificates accepted by the EKS service have to be signed by a public CA.
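As a side note, you can confirm from any host what the Fargate kubelet sees by dumping the chain the proxy serves. A diagnostic sketch (<apache-proxy-address> is the placeholder from the question):
# Print issuer/subject/dates of the certificate the proxy presents; the
# issuer line will show that it chains to the private ACM PCA root, which
# the Fargate nodes do not trust:
openssl s_client -connect <apache-proxy-address>:443 -showcerts </dev/null 2>/dev/null \
  | openssl x509 -noout -issuer -subject -dates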

Related

Not able to pull registry.k8s.io in Microk8s start

I'm trying to kickstart a MicroK8s cluster, but the Calico pod stays in Pending status because of a 403 error when pulling registry.k8s.io/pause:3.7.
This is the error:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "registry.k8s.io/pause:3.7": failed to pull image "registry.k8s.io/pause:3.7": failed to pull and unpack image "registry.k8s.io/pause:3.7": failed to resolve reference "registry.k8s.io/pause:3.7": pulling from host registry.k8s.io failed with status code [manifests 3.7]: 403 Forbidden
We're talking about a new server, which might be missing some configuration.
The insecure registry, according to the MicroK8s documentation, is enabled at localhost:32000.
I've enabled DNS on MicroK8s, but nothing has changed.
If I try to pull the image with Docker directly, I also get a forbidden error.
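Not an answer, but a diagnostic sketch that may narrow this down, assuming the 403 comes from the network path rather than from MicroK8s itself:
# From the node, hit the registry the same way containerd would; a 403
# here points at a corporate proxy or egress filter, not at MicroK8s:
curl -sI https://registry.k8s.io/v2/
# Reproduce the exact pull with MicroK8s' bundled containerd client:
microk8s ctr image pull registry.k8s.io/pause:3.7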

K8S PODS from workers can not access to private repository deployed on Master

I have created a K8S cluster with 3 nodes (1 master + 2 workers, flannel CNI).
I have deployed a private registry at localhost:31320 on the master using Docker Registry (the secret has been correctly defined in the registry deployment file).
My issue is that I cannot pull images from the private registry (it works correctly on the master).
I get this error:
"Warning Failed 4m15s (x6 over 22m) kubelet Failed to pull image "localhost:31320/automation-platform/base:11.0.15.1.centos.7.9.2009.2": rpc error: code = Unknown desc = Error while pulling image: Get http://localhost:31320/v1/repositories/automation-platform/base/images: dial tcp [::1]:31320: connect: connection refused
"
I do not know how to investigate this, or why my workers cannot access the private repo (connection refused?).
I don't see any errors in the kubelet logs on the worker node.
Note that my deployment files were already used on a previous cluster and everything worked correctly. The difference is that I was using K8S 1.20 then; now I use K8S 1.23.
Could someone tell me how I can investigate the reason for this problem?
Regards,
Try allowing your registry as an insecure one. If your nodes use Docker as the container runtime, add your registry ip:port to /etc/docker/daemon.json (note: the snippet below is Docker daemon syntax, not containerd's TOML):
{
"insecure-registries":
["172.16.4.93:5000"]
}
If your nodes run containerd instead (common on K8S 1.23), the equivalent setting belongs in /etc/containerd/config.toml; see the sketch below.
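A minimal containerd sketch, assuming the same registry as above (172.16.4.93:5000, plain HTTP) and the pre-2.0 CRI config layout; restart containerd afterwards with sudo systemctl restart containerd:
# /etc/containerd/config.toml -- let containerd pull from an HTTP registry
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."172.16.4.93:5000"]
  endpoint = ["http://172.16.4.93:5000"]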

Failed to pull image "velero/velero-plugin-for-gcp:v1.1.0" while installing Velero in GKE Cluster

I'm trying to install and configure Velero for Kubernetes backups. I have followed the link to configure it in my GKE cluster. The installation went fine, but Velero is not working.
I am using Google Cloud Shell to run all my commands (I have installed and configured the Velero client in Cloud Shell).
On further inspection of the Velero deployment and pods, I found that it is not able to pull the image from the Docker repository.
kubectl get pods -n velero
NAME READY STATUS RESTARTS AGE
velero-5489b955f6-kqb7z 0/1 Init:ErrImagePull 0 20s
Error from velero pod (kubectl describe pod) (output redacted for readability - only relevant info shown below)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 38s default-scheduler Successfully assigned velero/velero-5489b955f6-kqb7z to gke-gke-cluster1-default-pool-a354fba3-8674
Warning Failed 22s kubelet, gke-gke-cluster1-default-pool-a354fba3-8674 Failed to pull image "velero/velero-plugin-for-gcp:v1.1.0": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning Failed 22s kubelet, gke-gke-cluster1-default-pool-a354fba3-8674 Error: ErrImagePull
Normal BackOff 21s kubelet, gke-gke-cluster1-default-pool-a354fba3-8674 Back-off pulling image "velero/velero-plugin-for-gcp:v1.1.0"
Warning Failed 21s kubelet, gke-gke-cluster1-default-pool-a354fba3-8674 Error: ImagePullBackOff
Normal Pulling 8s (x2 over 37s) kubelet, gke-gke-cluster1-default-pool-a354fba3-8674 Pulling image "velero/velero-plugin-for-gcp:v1.1.0"
Command used to install velero: (some of the values are given as variables)
velero install \
--provider gcp \
--plugins velero/velero-plugin-for-gcp:v1.1.0 \
--bucket $storagebucket \
--secret-file ~/velero-backup-storage-sa-key.json
Velero Version
velero version
Client:
Version: v1.4.2
Git commit: 56a08a4d695d893f0863f697c2f926e27d70c0c5
<error getting server version: timed out waiting for server status request to be processed>
GKE version
v1.15.12-gke.2
Isn't this a Private Cluster? – mario
@mario this is a private cluster, but I can deploy other services without any issues (e.g. I have deployed nginx successfully). – Sreesan
Well, this is a known limitation of GKE Private Clusters. As you can read in the documentation:
Can't pull image from public Docker Hub
Symptoms
A Pod running in your cluster displays a warning in kubectl describe such as Failed to pull image: rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Potential causes
Nodes in a private cluster do not have outbound access to the public
internet. They have limited access to Google APIs and services,
including Container Registry.
Resolution
You cannot fetch images directly from Docker Hub. Instead, use images hosted on Container Registry. Note that while Container Registry's Docker Hub mirror is accessible from a private cluster, it should not be exclusively relied upon. The mirror is only a cache, so images are periodically removed, and a private cluster is not able to fall back to Docker Hub.
You can also compare it with this answer.
It can easily be verified on your own with a simple experiment: try to run two different nginx deployments, the first based on the image nginx (equivalent to nginx:latest) and the second based on nginx:1.14.2.
While the first scenario is perfectly feasible, because the nginx:latest image can be pulled from Container Registry's Docker Hub mirror, which is accessible from a private cluster, any attempt to pull nginx:1.14.2 will fail, as you'll see in the Pod events. This happens because the kubelet cannot find that version of the image in GCR, so it tries to pull it from the public Docker registry (https://registry-1.docker.io/v2/), which is not possible in Private Clusters. "The mirror is only a cache, so images are periodically removed, and a private cluster is not able to fall back to Docker Hub." - as you can read in the docs.
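A sketch of that experiment (deployment names are arbitrary, not anything from the thread):
# Pulled through the GCR Docker Hub mirror -- should start normally:
kubectl create deployment nginx-latest --image=nginx
# Not cached in the mirror -- should end up in ErrImagePull / ImagePullBackOff:
kubectl create deployment nginx-pinned --image=nginx:1.14.2
# Compare the two, then inspect events on the failing pod:
kubectl get pods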
If you still have doubts, just ssh into your node and try to run the following commands:
curl https://cloud.google.com/container-registry/
curl https://registry-1.docker.io/v2/
While the first one works perfectly, the second one will eventually fail:
curl: (7) Failed to connect to registry-1.docker.io port 443: Connection timed out
The reason? "Nodes in a private cluster do not have outbound access to the public internet."
The solution?
You can search what is currently available in GCR here.
In many cases you should be able to get the required image if you don't specify its exact version (by default the latest tag is used). While that can help with nginx, unfortunately no version of velero/velero-plugin-for-gcp is currently available in Google Container Registry's Docker Hub mirror.
Granting your private nodes outbound internet access by using Cloud NAT seems to be the only reasonable solution that can be applied in your case.
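For completeness, a minimal Cloud NAT sketch (router and NAT names, network and region are placeholders, not values from the question):
gcloud compute routers create nat-router \
    --network=my-vpc --region=europe-west1
gcloud compute routers nats create nat-config \
    --router=nat-router --region=europe-west1 \
    --auto-allocate-nat-external-ips \
    --nat-all-subnet-ip-ranges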
I solved this problem by realizing that the version of velero/velero-plugin-for-gcp does not follow the version of velero/velero.
For example, the latest versions at the moment are velero/velero:v1.9.1 and velero/velero-plugin-for-gcp:v1.5.0.
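So, sticking with the install command from the question, a corrected sketch would pin the plugin to its own release line (tag current as of this answer):
velero install \
  --provider gcp \
  --plugins velero/velero-plugin-for-gcp:v1.5.0 \
  --bucket $storagebucket \
  --secret-file ~/velero-backup-storage-sa-key.json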

Unauthorized when trying to allow nodes to join a Kubernetes cluster

I had a two-node cluster in which one node was the master and the other a worker. It had been running for the last 26 days. Today I tried to remove the node using kubeadm reset and add it again, and the kubelet was not able to start:
cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
The binary conntrack is not installed, this can cause failures in network connection cleanup.
server.go:376] Version: v1.10.2
feature_gate.go:226] feature gates: &{{} map[]}
plugins.go:89] No cloud provider specified.
server.go:233] failed to run Kubelet: cannot create certificate signing request: Unauthorized
even though the join command is successful:
[preflight] Running pre-flight checks.
[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
[preflight] Starting the kubelet service
[discovery] Trying to connect to API Server "aaaaa:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://aaaaa:6443"
[discovery] Requesting info from "https:/aaaaaa:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server
[discovery] Successfully established connection with API Server "aaaa:6443"
This node has joined the cluster:
Certificate signing request was sent to master and a response
was received.
The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
IMO the log line failed to run Kubelet: cannot create certificate signing request: Unauthorized is the source of the problem, but I do not know why it occurs or how to fix it.
TIA. I can give more details, but I am not sure what else would be useful.
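Not a confirmed diagnosis, but one assumption worth checking: kubeadm bootstrap tokens expire after 24 hours by default, so a token left over from the original 26-day-old setup would be rejected as Unauthorized. A sketch, run on the master:
# See whether any bootstrap token is still valid:
kubeadm token list
# Create a fresh token and print the full join command to run on the node:
kubeadm token create --print-join-command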

Getting "x509: certificate signed by unknown authority" even with "--insecure-skip-tls-verify" option in Kubernetes

I have a private Docker image registry running on a Linux VM (10.78.0.228:5000) and a Kubernetes master on a different VM running CentOS Linux 7.
I used the below command to create a POD:
kubectl create --insecure-skip-tls-verify -f monitorms-rc.yml
I get this:
sample monitorms-mmqhm 0/1 ImagePullBackOff 0 8m
and upon running:
kubectl describe pod monitorms-mmqhm --namespace=sample
Warning Failed Failed to pull image "10.78.0.228:5000/monitorms":
Error response from daemon: {"message":"Get
https://10.78.0.228:5000/v1/_ping: x509: certificate signed by unknown
authority"}
Isn't Kubernetes supposed to ignore the server certificate for all operations during pod creation when --insecure-skip-tls-verify is passed?
If not, how do I make it ignore TLS verification while pulling the Docker image?
PS:
Kubernetes version :
Client Version: v1.5.2
Server Version: v1.5.2
I have raised this issue here: https://github.com/kubernetes/kubernetes/issues/43924
The issue you're seeing is actually a Docker issue. Using --insecure-skip-tls-verify is a valid argument to kubectl, but it only affects the connection between kubectl and the Kubernetes API server. The error you're seeing happens because the Docker daemon on the node cannot log in to the private registry: the cert the registry is using is not signed by a CA the daemon trusts.
Have a look at the Docker insecure registry docs and this should solve your problem.
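In short, a sketch of that setting, assuming Docker as the runtime on every node (restart the daemon after editing, and merge with any settings already in the file):
# /etc/docker/daemon.json on each node that pulls from the registry:
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "insecure-registries": ["10.78.0.228:5000"]
}
EOF
sudo systemctl restart docker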