AKS cluster to Nexus repo via VPN - kubernetes

i need to be able to connect to a nexus repository from my AKS cluster. The pods get deployed, but ends up with imagepullback error, with an rpc error code. The repo can be accessed only via a VPN and i can manually run a docker pull command to pull the image. But the pods on the AKS cluster are not able to connect. What am i doing wrong? Please help!
Here is the error message I get when I describe the pod. I have even applied a secret with the credentials for the remote nexus repo. Still no luck.
Events:
Type Reason Age From Message
Normal Scheduled 51s default-scheduler Successfully assigned my-namespace/export-example-5c649db546-trhm6 to aks-nodepool1-30782560-vmss00006z
Warning Failed 20s kubelet Failed to pull image "dockerhub.myrepoexample.com/omnius-vnext/export-example:4.0.1": rpc error: code = Unknown desc = failed to pull and unpack image "dockerhub.myrepoexample.com/omnius-vnext/export-example:4.0.1": failed to resolve reference "dockerhub.myrepoexample.com/omnius-vnext/export-example:4.0.1": failed to do request: Head "https://dockerhub.myrepoexample.com/v2/omnius-vnext/export-example/manifests/4.0.1": dial tcp 35.154.211.153:443: i/o timeout
Warning Failed 20s kubelet Error: ErrImagePull
Normal BackOff 20s kubelet Back-off pulling image "dockerhub.myrepoexample.com/omnius-vnext/export-example:4.0.1"
Warning Failed 20s kubelet Error: ImagePullBackOff
Normal Pulling 9s (x2 over 51s) kubelet Pulling image "dockerhub.myrepoexample.com/omnius-vnext/export-example:4.0.1"

Related

istio bookinfo sample application deployment failed for "ImagePullBackOff"

I am trying to deploy istio's sample bookinfo application using the below command:
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
from here
but each time I am getting ImagePullBackoff error like this:
NAME READY STATUS RESTARTS AGE
details-v1-c74755ddf-m878f 2/2 Running 0 6m32s
productpage-v1-778ddd95c6-pdqsk 2/2 Running 0 6m32s
ratings-v1-5564969465-956bq 2/2 Running 0 6m32s
reviews-v1-56f6655686-j7lb6 1/2 ImagePullBackOff 0 6m32s
reviews-v2-6b977f8ff5-55tgm 1/2 ImagePullBackOff 0 6m32s
reviews-v3-776b979464-9v7x5 1/2 ImagePullBackOff 0 6m32s
For error details, I have run :
kubectl describe pod reviews-v1-56f6655686-j7lb6
Which returns these:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m41s default-scheduler Successfully assigned default/reviews-v1-56f6655686-j7lb6 to minikube
Normal Pulled 7m39s kubelet Container image "docker.io/istio/proxyv2:1.15.3" already present on machine
Normal Created 7m39s kubelet Created container istio-init
Normal Started 7m39s kubelet Started container istio-init
Warning Failed 5m39s kubelet Failed to pull image "docker.io/istio/examples-bookinfo-reviews-v1:1.17.0": rpc error: code = Unknown desc = context deadline exceeded
Warning Failed 5m39s kubelet Error: ErrImagePull
Normal Pulled 5m39s kubelet Container image "docker.io/istio/proxyv2:1.15.3" already present on machine
Normal Created 5m39s kubelet Created container istio-proxy
Normal Started 5m39s kubelet Started container istio-proxy
Normal BackOff 5m36s (x3 over 5m38s) kubelet Back-off pulling image "docker.io/istio/examples-bookinfo-reviews-v1:1.17.0"
Warning Failed 5m36s (x3 over 5m38s) kubelet Error: ImagePullBackOff
Normal Pulling 5m25s (x2 over 7m38s) kubelet Pulling image "docker.io/istio/examples-bookinfo-reviews-v1:1.17.0"
Do I need to build dockerfile first and push it to the local repository? There are no clear instructions there or I failed to find any.
Can anybody help?
If you check in dockerhub the image is there:
https://hub.docker.com/r/istio/examples-bookinfo-reviews-v1/tags
So the error that you need to deal with is context deadline exceeded while trying to pull it from dockerhub. This is likely a networking error (a generic Go error saying it took too long), depending on where your cluster is running you can do manually a docker pull from the nodes and that should work.
EDIT: for minikube do a minikube ssh and then a docker pull
Solved the problem by below command :
minikube ssh docker pull istio/examples-bookinfo-reviews-v1:1.17.0
from this git issues here
Also How to use local docker images with Minikube?
Hope this may help somebody

K8s pod ImagePullBackoff

created a very simple nginx pod and run into status ImagePullBackoff
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 32m default-scheduler Successfully assigned reloader/nginx to aks-appnodepool1-22779252-vmss000000
Warning Failed 29m kubelet Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to resolve reference "docker.io/library/nginx:latest": failed to do request: Head "https://registry-1.docker.io/v2/library/nginx/manifests/latest": dial tcp 52.200.78.26:443: i/o timeout
Warning Failed 27m kubelet Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to resolve reference "docker.io/library/nginx:latest": failed to do request: Head "https://registry-1.docker.io/v2/library/nginx/manifests/latest": dial tcp 52.21.28.242:443: i/o timeout
Warning Failed 23m kubelet Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to resolve reference "docker.io/library/nginx:latest": failed to do request: Head "https://registry-1.docker.io/v2/library/nginx/manifests/latest": dial tcp 3.223.210.206:443: i/o timeout
Normal Pulling 22m (x4 over 32m) kubelet Pulling image "nginx"
Warning Failed 20m (x4 over 29m) kubelet Error: ErrImagePull
Warning Failed 20m kubelet Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to resolve reference "docker.io/library/nginx:latest": failed to do request: Head "https://registry-1.docker.io/v2/library/nginx/manifests/latest": dial tcp 3.228.155.36:443: i/o timeout
Warning Failed 20m (x7 over 29m) kubelet Error: ImagePullBackOff
Warning Failed 6m41s kubelet Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to resolve reference "docker.io/library/nginx:latest": failed to do request: Head "https://registry-1.docker.io/v2/library/nginx/manifests/latest": dial tcp 52.5.157.114:443: i/o timeout
Normal BackOff 2m17s (x65 over 29m) kubelet Back-off pulling image "nginx"
Checked network status:
A VM in the same subnet can access "https://registry-1.docker.io/v2/library/nginx/manifests/latest" and telnet 52.5.157.114 443 successful.
docker pull nginx successfully on the VM in the same subnet.
kubectl exec into a running pod in the same cluster can wget https://registry-1.docker.io/v2/library/nginx/manifests/latest successfully.
.
What is the possible problem?
When I wget/curl or anything you want to access
https://registry-1.docker.io/v2/library/nginx/manifests/latest
It says
{"errors":[{"code":"UNAUTHORIZED","message":"authentication required","detail":[{"Type":"repository","Class":"","Name":"library/nginx","Action":"pull"}]}]}
However this is because you need to be logged in to pull this image from this repository.
2 solutions:
The first is simple, in the image field just replace this url by nginx:latest and it should work
The second: create a regcred
in your pod yaml , change image : docker.io/library/nginx:latest to docker.io/nginx:latest
Turned out to be firewall dropped the package.

helm3 installation fails with: failed post-install: timed out waiting for the condition

I'm new to Kubernetes and Helm. I have installed k3d and helm:
k3d version v1.7.0
k3s version v1.17.3-k3s1
helm version
version.BuildInfo{Version:"v3.2.4", GitCommit:"0ad800ef43d3b826f31a5ad8dfbb4fe05d143688", GitTreeState:"clean", GoVersion:"go1.13.12"}
I do have a cluster created with 10 worker nodes. When I try to install stackstorm-ha on the cluster I see the following issues:
helm install stackstorm/stackstorm-ha --generate-name --debug
client.go:534: [debug] stackstorm-ha-1592860860-job-st2-apikey-load: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: failed post-install: timed out waiting for the condition
helm.go:84: [debug] failed post-install: timed out waiting for the condition
njbbmacl2813:~ gangsh9$ kubectl get pods
Unable to connect to the server: net/http: TLS handshake timeout
kubectl describe pods either shows :
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/stackstorm-ha-1592857897-st2api-7f6c877b9c-dtcp5 to k3d-st2hatest-worker-5
Warning Failed 23m kubelet, k3d-st2hatest-worker-5 Error: context deadline exceeded
Normal Pulling 17m (x5 over 37m) kubelet, k3d-st2hatest-worker-5 Pulling image "stackstorm/st2api:3.3dev"
Normal Pulled 17m (x5 over 28m) kubelet, k3d-st2hatest-worker-5 Successfully pulled image "stackstorm/st2api:3.3dev"
Normal Created 17m (x5 over 28m) kubelet, k3d-st2hatest-worker-5 Created container st2api
Normal Started 17m (x4 over 28m) kubelet, k3d-st2hatest-worker-5 Started container st2api
Warning BackOff 53s (x78 over 20m) kubelet, k3d-st2hatest-worker-5 Back-off restarting failed container
or
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/stackstorm-ha-1592857897-st2timersengine-c847985d6-74h5k to k3d-st2hatest-worker-2
Warning Failed 6m23s kubelet, k3d-st2hatest-worker-2 Failed to pull image "stackstorm/st2timersengine:3.3dev": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/stackstorm/st2timersengine:3.3dev": failed to resolve reference "docker.io/stackstorm/st2timersengine:3.3dev": failed to authorize: failed to fetch anonymous token: Get https://auth.docker.io/token?scope=repository%3Astackstorm%2Fst2timersengine%3Apull&service=registry.docker.io: net/http: TLS handshake timeout
Warning Failed 6m23s kubelet, k3d-st2hatest-worker-2 Error: ErrImagePull
Normal BackOff 6m22s kubelet, k3d-st2hatest-worker-2 Back-off pulling image "stackstorm/st2timersengine:3.3dev"
Warning Failed 6m22s kubelet, k3d-st2hatest-worker-2 Error: ImagePullBackOff
Normal Pulling 6m10s (x2 over 6m37s) kubelet, k3d-st2hatest-worker-2 Pulling image "stackstorm/st2timersengine:3.3dev"
Kind of stuck here.
Any help would be greatly appreciated.
The TLS handshake timeout error is very common when the machine that you are running your deployment on is running out of resources. Alternative issue is caused by slow internet connection or some proxy settings but we ruled out that since you can pull and run docker images locally and deploy small nginx webserver in your cluster.
As you may notice in the stackstorm helm chart it installs a big amount of services/pods inside your cluster which can take up a lot of resources.
It will install 2 replicas for each component of StackStorm
microservices for redundancy, as well as backends like RabbitMQ HA,
MongoDB HA Replicaset and etcd cluster that st2 replies on for MQ, DB
and distributed coordination respectively.
I deployed stackstorm on both k3d and GKE but I had to use fast machines in order to deploy this quickly and successfully.
NAME: stackstorm
LAST DEPLOYED: Mon Jun 29 15:25:52 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
Congratulations! You have just deployed StackStorm HA!

My old windows pods are dead and don't respond to http requests / exec fails

I have an AKS cluster with a mix of Windows and Linux nodes and an nginx-ingress.
This all worked great, but a few days ago all my windows pods have become unresponsive.
Everything is still green on the K8s dashboard, but they don't respond to HTTP requests and kubectl exec fails.
All the linux pods still work.
I created a new deployment with the exact same image and other properties, and this new pod works, responds to HTTP and kubectl exec works.
Q: How can I find out why my old pods died? How can I prevent this from occuring again in the future?
Note that this is a test cluster, so I have the luxury of being able to investigate, if this was prod I would have burned and recreated the cluster already.
Details:
https://aks-test.progress-cloud.com/eboswebApi/ is one of the old pods, https://aks-test.progress-cloud.com/eboswebApi2/ is the new pod.
When I look at the nginx log, I see a lot of connect() failed (111: Connection refused) while connecting to upstream.
When I try kubectl exec -it <podname> --namespace <namespace> -- cmd I get one of two behaviors:
Either the command immediatly returns without printing anything, or I get an error:
container 1dfffa08d834953c29acb8839ea2d4c6b78b7a530371d98c16b15132d49f5c52 encountered an error during CreateProcess: failure in a Windows system call: The remote procedure call failed and did not execute. (0x6bf) extra info: {"CommandLine":"cmd","WorkingDirectory":"C:\\inetpub\\wwwroot","Environment":{...},"EmulateConsole":true,"CreateStdInPipe":true,"CreateStdOutPipe":true,"ConsoleSize":[0,0]}
command terminated with exit code 126
kubectl describe pod works on both.
The only difference I could find was that on the old pod, I don't get any events:
Events: <none>
whereas on the new pod I get a bunch of them for pulling the image etc:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 39m default-scheduler Successfully assigned ingress-basic/ebos-webapi-test-2-78786968f4-xmvfw to aksnpwin000000
Warning Failed 38m kubelet, aksnpwin000000 Error: failed to start container "ebos-webapi-test-2": Error response from daemon: hcsshim::CreateComputeSystem ebos-webapi-test-2: The binding handle is invalid.
(extra info: {"SystemType":"Container","Name":"ebos-webapi-test-2","Owner":"docker","VolumePath":"\\\\?\\Volume{dac026db-26ab-11ea-bb33-e3730ff9432d}","IgnoreFlushesDuringBoot":true,"LayerFolderPath":"C:\\ProgramData\\docker\\windowsfilter\\ebos-webapi-test-2","Layers":[{"ID":"8c160b6e-685a-58fc-8c4b-beb407ad09b4","Path":"C:\\ProgramData\\docker\\windowsfilter\\12061f29088664dc41c0836c911ed7ced1f6d7ed38b1c932c25cd8ca85a3a88e"},{"ID":"6a230a46-a97c-5e30-ac4a-636e62cd9253","Path":"C:\\ProgramData\\docker\\windowsfilter\\8c0ce5a9990bc433c4d937aa148a4251ef55c1aa7caccf1b2025fd64b4feee97"},{"ID":"240d5705-d8fe-555b-a966-1fc304552b64","Path":"C:\\ProgramData\\docker\\windowsfilter\\2b334b769fe19d0edbe1ad8d1ae464c8d0103a7225b0c9e30fdad52e4b454b35"},{"ID":"5f5d8837-5f62-5a76-a706-9afb789e45e4","Path":"C:\\ProgramData\\docker\\windowsfilter\\3d1767755b0897aaae21e3fb7b71e2d880de22473f0071b0dca6301bb6110077"},{"ID":"978503cb-b816-5f66-ba41-ed154db333d5","Path":"C:\\ProgramData\\docker\\windowsfilter\\53d2e85a90d2b8743b0502013355df5c5e75448858f0c1f5b435281750653520"},{"ID":"d7d0d14e-b097-5104-a492-da3f9396bb06","Path":"C:\\ProgramData\\docker\\windowsfilter\\38830351b46e7a0598daf62d914eb2bf01e6eefde7ac560e8213f118d2bd648c"},{"ID":"90b1c608-be4c-55a1-a787-db3a97670149","Path":"C:\\ProgramData\\docker\\windowsfilter\\84b71fda82ea0eacae7b9382eae2a26f3c71bf118f5c80e7556496f21e754126"},{"ID":"700711b2-d578-5d7c-a17f-14165a5b3507","Path":"C:\\ProgramData\\docker\\windowsfilter\\08dd6f93c96c1ac6acd3d2e8b60697340c90efe651f805809dbe87b6bd26a853"},{"ID":"270de12a-461c-5b0c-8976-a48ae0de2063","Path":"C:\\ProgramData\\docker\\windowsfilter\\115de87074fadbc3c44fc33813257c566753843f8f4dd7656faa111620f71f11"},{"ID":"521250bb-4f30-5ac4-8fcd-b4cf45866627","Path":"C:\\ProgramData\\docker\\windowsfilter\\291e51f5f030d2a895740fae3f61e1333b7fae50a060788040c8d926d46dbe1c"},{"ID":"6dded7bf-8c1e-53bb-920e-631e78728316","Path":"C:\\ProgramData\\docker\\windowsfilter\\938e721c29d2f2d23a00bf83e5bc60d92f9534da409d0417f479bd5f06faa080"},{"ID":"90dec4e9-89fe-56ce-a3c2-2770e6ec362c","Path":"C:\\ProgramData\\docker\\windowsfilter\\d723ebeafd1791f80949f62cfc91a532cc5ed40acfec8e0f236afdbcd00bbff2"},{"ID":"94ac6066-b6f3-5038-9e1b-d5982fcefa00","Path":"C:\\ProgramData\\docker\\windowsfilter\\00d1bb6fc8abb630f921d3651b1222352510d5821779d8a53d994173a4ba1126"},{"ID":"037c6d16-5785-5bea-bab4-bc3f69362e0c","Path":"C:\\ProgramData\\docker\\windowsfilter\\c107cf79e8805e9ce6d81ec2a798bf4f1e3b9c60836a40025272374f719f2270"}],"ProcessorWeight":5000,"HostName":"ebos-webapi-test-2-78786968f4-xmvfw","MappedDirectories":[{"HostPath":"c:\\var\\lib\\kubelet\\pods\\c44f445c-272b-11ea-b9bc-ae0ece5532e1\\volumes\\kubernetes.io~secret\\default-token-n5tnc","ContainerPath":"c:\\var\\run\\secrets\\kubernetes.io\\serviceaccount","ReadOnly":true,"BandwidthMaximum":0,"IOPSMaximum":0,"CreateInUtilityVM":false}],"HvPartition":false,"NetworkSharedContainerName":"4c9bede623553673fde0da6e8dc92f9a55de1ff823a168a35623ad8128f83ecb"})
Normal Pulling 38m (x2 over 38m) kubelet, aksnpwin000000 Pulling image "progress.azurecr.io/eboswebapi:release-2019-11-11_16-41"
Normal Pulled 38m (x2 over 38m) kubelet, aksnpwin000000 Successfully pulled image "progress.azurecr.io/eboswebapi:release-2019-11-11_16-41"
Normal Created 38m (x2 over 38m) kubelet, aksnpwin000000 Created container ebos-webapi-test-2
Normal Started 38m kubelet, aksnpwin000000 Started container ebos-webapi-test-2

Unable to mount PVC created by OpenEBS on pods on Kubernetes bare-metal deployment

I am facing issues while mounting pvc on pods with openebs installed on bare-metal kubernetes cluster created with RKE.
Expected Behavior
PVC's should be mounted on pods without issues.
Current Behavior
Pods unable to mount PVC's:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m9s (x23 over 2m45s) default-scheduler pod has unbound PersistentVolumeClaims (repeated 4 times)
Normal Scheduled 2m8s default-scheduler Successfully assigned default/minio-deployment-64d7c79464-966jr to 192.168.1.21
Normal SuccessfulAttachVolume 2m8s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-63cf6c92-ec99-11e8-85c9-b06ebfd124ff"
Warning FailedMount 84s (x4 over 102s) kubelet, 192.168.1.21 MountVolume.WaitForAttach failed for volume "pvc-63cf6c92-ec99-11e8-85c9-b06ebfd124ff" : failed to get any path for iscsi disk, last err seen:
iscsi: failed to sendtargets to portal 10.43.227.122:3260 output: iscsiadm: cannot make connection to 10.43.227.122: Connection refused
iscsiadm: cannot make connection to 10.43.227.122: Connection refused
iscsiadm: cannot make connection to 10.43.227.122: Connection refused
iscsiadm: cannot make connection to 10.43.227.122: Connection refused
iscsiadm: cannot make connection to 10.43.227.122: Connection refused
iscsiadm: cannot make connection to 10.43.227.122: Connection refused
iscsiadm: connection login retries (reopen_max) 5 exceeded
iscsiadm: No portals found
, err exit status 21
Warning FailedMount 24s (x4 over 80s) kubelet, 192.168.1.21 MountVolume.WaitForAttach failed for volume "pvc-63cf6c92-ec99-11e8-85c9-b06ebfd124ff" : failed to get any path for iscsi disk, last err seen:
iscsi: failed to attach disk: Error: iscsiadm: Could not login to [iface: default, target: iqn.2016-09.com.openebs.jiva:pvc-63cf6c92-ec99-11e8-85c9-b06ebfd124ff, portal: 10.43.227.122,3260].
iscsiadm: initiator reported error (12 - iSCSI driver not found. Please make sure it is loaded, and retry the operation)
iscsiadm: Could not log into all portals
Logging in to [iface: default, target: iqn.2016-09.com.openebs.jiva:pvc-63cf6c92-ec99-11e8-85c9-b06ebfd124ff, portal: 10.43.227.122,3260] (multiple)
(exit status 12)
Warning FailedMount 2s kubelet, 192.168.1.21 Unable to mount volumes for pod "minio-deployment-64d7c79464-966jr_default(640263d0-ec99-11e8-85c9-b06ebfd124ff)": timeout expired waiting for volumes to attach or mount for pod "default"/"minio-deployment-64d7c79464-966jr". list of unmounted volumes=[storage]. list of unattached volumes=[storage default-token-9n8pn]
Steps to Reproduce
Install openebs with helm.
Create a pvc with storage class as openebs-standalone
Create pod and try to mount the PVC.
kubectl get pvc:
root#an4:/home/rke-k8s# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
docker-private-registry-docker-registry Bound pvc-58cf63c1-ec95-11e8-9b5d-2cfda16d3cfd 10Gi RWO openebs-standalone 22m
Update
When I tried the sample minio deployment, here's what I have observed:
PVC creation took around 1-2 minutes.
Mounting PVC to the pod took around 1 hour.
Storage class used for this was openebs-standard.
Any reason for this? It is on-prem cluster deployment.
Well, this issue was documented in the troubleshooting guide - https://docs.openebs.io/docs/next/tsgiscsi.html
This is the issue with openebs and already been opened with team. Fix is still pending, you can track the issue here:
https://github.com/openebs/openebs/issues/1688
There is step by step instruction how to debug the issue. Hope this helps.