fail to run istio-ingressgateway, got Readiness probe failed: connection refused - kubernetes

I failed to deploy Istio. When I ran istioctl install --set profile=default -y, the output was:
➜ istio-1.11.4 istioctl install --set profile=default -y
✔ Istio core installed
✔ Istiod installed
✘ Ingress gateways encountered an error: failed to wait for resource: resources not ready after 5m0s: timed out waiting for the condition
Deployment/istio-system/istio-ingressgateway (containers with unready status: [istio-proxy])
- Pruning removed resources
Error: failed to install manifests: errors occurred during operation
After running kubectl get pods -n=istio-system, I found that the istio-ingressgateway pod had been created. Describing it shows:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m36s default-scheduler Successfully assigned istio-system/istio-ingressgateway-8dbb57f65-vc85p to k8s-slave
Normal Pulled 4m35s kubelet Container image "docker.io/istio/proxyv2:1.11.4" already present on machine
Normal Created 4m35s kubelet Created container istio-proxy
Normal Started 4m35s kubelet Started container istio-proxy
Warning Unhealthy 3m56s (x22 over 4m34s) kubelet Readiness probe failed: Get "http://10.244.1.4:15021/healthz/ready": dial tcp 10.244.1.4:15021: connect: connection refused
And I can't get the log of this pod:
➜ ~ kubectl logs pods/istio-ingressgateway-8dbb57f65-vc85p -n=istio-system
Error from server: Get "https://192.168.0.154:10250/containerLogs/istio-system/istio-ingressgateway-8dbb57f65-vc85p/istio-proxy": dial tcp 192.168.0.154:10250: i/o timeout
I ran all these commands on two VMs in Huawei Cloud, a 2C8G master and a 2C4G slave, both running Ubuntu 18.04. I have reinstalled the environment and the Kubernetes cluster, but that didn't help.
Without ingressgateway
I also tried istioctl install --set profile=minimal -y, which only runs istiod. But when I try to run httpbin (kubectl apply -f samples/httpbin/httpbin.yaml) with auto injection on, the deployment can't create a pod.
➜ istio-1.11.4 kubectl get deployment
NAME READY UP-TO-DATE AVAILABLE AGE
httpbin 0/1 0 0 5m24s
➜ istio-1.11.4 kubectl describe deployment/httpbin
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 6m6s deployment-controller Scaled up replica set httpbin-74fb669cc6 to 1
When I unlabel the default namespace (kubectl label namespace default istio-injection-), everything works fine.
I hope to deploy the Istio ingress gateway and run a demo like istio-ingressgateway, but I have no idea how to solve this. Thanks for any help.

I made a silly mistake Orz.
After communicating with my cloud provider, I was informed that a network security policy was applied to my cloud servers. Strangely, one server had full access while the other had only partial access (allowing only ports such as 80 and 443). After I changed the policy, everything worked fine.
For anyone who runs into a similar problem: after hours of searching on Google, I found that questions like this seem to come down to network problems such as DNS configuration, Kubernetes configuration, or server network restrictions. As howardjohn said in this issue, this is not an Istio problem.
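A quick way to confirm this kind of firewall problem is to test, from one node, whether the ports named in the error messages are reachable on the other node. The following is only a minimal sketch: port_open and report are hypothetical helper names, and it assumes bash (for /dev/tcp) and the coreutils timeout command are installed on the node.

```shell
# port_open HOST PORT: succeed if a TCP connection opens within 3 seconds.
# Hypothetical helper; relies on bash's /dev/tcp and coreutils' timeout.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# report HOST PORT: print a one-line verdict for the given endpoint.
report() {
  if port_open "$1" "$2"; then
    echo "$1:$2 reachable"
  else
    echo "$1:$2 blocked or closed"
  fi
}
```

For example, running report 192.168.0.154 10250 on the master tests the kubelet port that kubectl logs timed out on in the question.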

Related

gcloud - BrokerCell cloud-run-events/default is not ready

I am trying to use google cloud for my pubsub event driven application.
Currently, I am setting up Cloud Run for Anthos following the below tutorials
https://codelabs.developers.google.com/codelabs/cloud-run-events-anthos#7
https://cloud.google.com/anthos/run/archive/docs/events/cluster-configuration
I have created the GKE clusters. It is successful and is up and running.
However, I am getting the below error when I try to create event broker.
$ gcloud beta events brokers create default --namespace default
X Creating Broker... BrokerCell cloud-run-events/default is not ready
- Creating Broker...
Failed.
ERROR: gcloud crashed (TransportError): HTTPSConnectionPool(host='oauth2.googleapis.com', port=443): Max retries exceeded with url: /token (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))
gcloud's default CA certificates failed to verify your connection, which can happen if you are behind a proxy or firewall.
To use a custom CA certificates file, please run the following command:
gcloud config set core/custom_ca_certs_file /path/to/ca_certs
However, when I rerun the command, it reports that the broker already exists:
$ gcloud beta events brokers create default --namespace default
ERROR: (gcloud.beta.events.brokers.create) Broker [default] already exists.
Checking the status of broker, it shows BrokerCellNotReady
$ kubectl get broker -n default
NAME URL AGE READY REASON
default http://default-brokercell-ingress.cloud-run-events.svc.cluster.local/default/default 39m Unknown BrokerCellNotReady
The default-brokercell-fanout pod is also stuck in Pending status:
$ kubectl get pods -n cloud-run-events
NAME READY STATUS RESTARTS AGE
controller-648c495796-b5ccb 1/1 Running 0 105m
default-brokercell-fanout-855494bb9b-2c7zv 0/1 Pending 0 100m
default-brokercell-ingress-5f8cdc6467-wwq42 1/1 Running 0 100m
default-brokercell-retry-6f4f9696d6-tg898 1/1 Running 0 100m
webhook-85f7bc69b4-qrpck 1/1 Running 0 109m
I couldn't find any discussion related to this error.
Please give me some ideas to resolve this issue.
I encountered the same issue. The reason might be the given cluster setup does not have enough CPU resources.
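You can check it by describing the Pending pod:
kubectl describe pod/default-brokercell-fanout-855494bb9b-2c7zv -n cloud-run-events
If the events in the output show that the pod cannot be scheduled because of insufficient CPU, then that's the reason.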
After knowing the root cause, you can fix it in various ways, e.g., enable auto-scaling in your node pool.
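A minimal sketch of that check follows. The exact describe output is not shown in this answer, so the pattern below is an assumption based on the scheduler's usual "Insufficient cpu" wording in FailedScheduling events; cpu_starved is a hypothetical helper name.

```shell
# cpu_starved: read `kubectl describe pod` output on stdin and print the
# event lines that mention a CPU shortage. Exits non-zero when none match.
# Assumption: the scheduler reports the shortage as "Insufficient cpu".
cpu_starved() {
  grep -i 'insufficient cpu'
}
```

It could be used as kubectl describe pod/default-brokercell-fanout-855494bb9b-2c7zv -n cloud-run-events | cpu_starved, where a match suggests the node pool needs more CPU.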

Kubernetes metrics-server not working with Linkerd

I have a metrics-server and a horizontal pod autoscaler using this server, running on my cluster.
This works perfectly fine until I inject linkerd-proxies into the deployments of the namespace where my application is running. Running kubectl top pod in that namespace then fails with error: Metrics not available for pod <name>. However, nothing appears in the metrics-server pod's logs.
The metrics-server clearly works fine in other namespaces, because top works in every namespace but the meshed one.
At first I thought it could be because the proxies' resource requests/limits weren't set, but after running the injection with them (kubectl get -n <namespace> deploy -o yaml | linkerd inject - --proxy-cpu-request "10m" --proxy-cpu-limit "1" --proxy-memory-request "64Mi" --proxy-memory-limit "256Mi" | kubectl apply -f -), the issue stays the same.
Is this a known problem, and are there any possible solutions?
PS: I have a kube-prometheus-stack running in a different namespace, and this seems to be able to scrape the pod metrics from the meshed pods just fine
The problem was apparently a bug in the cAdvisor stats provider with the CRI runtime. The linkerd-init containers keep producing metrics after they've terminated, which shouldn't happen. The metrics-server ignores stats from pods that contain containers that report zero values (to avoid reporting invalid metrics, for example when a container is restarting or its metrics haven't been collected yet). You can follow up on the issue here. Solutions seem to be switching to another runtime or enabling the PodAndContainerStatsFromCRI feature gate on the kubelet, which makes the internal CRI stats provider responsible instead of the cAdvisor one.
I'm able to use kubectl top on pods that have linkerd injected:
:; kubectl top pod -n linkerd --containers
POD NAME CPU(cores) MEMORY(bytes)
linkerd-destination-5cfbd7468-7l22t destination 2m 41Mi
linkerd-destination-5cfbd7468-7l22t linkerd-proxy 1m 13Mi
linkerd-destination-5cfbd7468-7l22t policy 1m 81Mi
linkerd-destination-5cfbd7468-7l22t sp-validator 1m 34Mi
linkerd-identity-fc9bb697-s6dxw identity 1m 33Mi
linkerd-identity-fc9bb697-s6dxw linkerd-proxy 1m 12Mi
linkerd-proxy-injector-668455b959-rlvkj linkerd-proxy 1m 13Mi
linkerd-proxy-injector-668455b959-rlvkj proxy-injector 1m 40Mi
So I don't think there's anything fundamentally incompatible between Linkerd and the Kubernetes metrics server.
I have noticed that I will sometimes see the errors for the first ~1m after a pod starts, before the metrics server has gotten its initial state for a pod; but these error messages seem a little different than what you reference:
:; kubectl rollout restart -n linkerd deployment linkerd-destination
deployment.apps/linkerd-destination restarted
:; while ! kubectl top pod -n linkerd --containers linkerd-destination-6d974dd4c7-vw7nw ; do sleep 10 ; done
Error from server (NotFound): podmetrics.metrics.k8s.io "linkerd/linkerd-destination-6d974dd4c7-vw7nw" not found
Error from server (NotFound): podmetrics.metrics.k8s.io "linkerd/linkerd-destination-6d974dd4c7-vw7nw" not found
Error from server (NotFound): podmetrics.metrics.k8s.io "linkerd/linkerd-destination-6d974dd4c7-vw7nw" not found
Error from server (NotFound): podmetrics.metrics.k8s.io "linkerd/linkerd-destination-6d974dd4c7-vw7nw" not found
POD NAME CPU(cores) MEMORY(bytes)
linkerd-destination-6d974dd4c7-vw7nw destination 1m 25Mi
linkerd-destination-6d974dd4c7-vw7nw linkerd-proxy 1m 13Mi
linkerd-destination-6d974dd4c7-vw7nw policy 1m 18Mi
linkerd-destination-6d974dd4c7-vw7nw sp-validator 1m 19Mi
:; kubectl version --short
Client Version: v1.23.3
Server Version: v1.21.7+k3s1

CockroachDB distributed workload on all nodes

I've deployed a CockroachDB cluster on Kubernetes using this guide:
https://github.com/cockroachlabs-field/kubernetes-examples/blob/master/SECURE.md
I deployed it with
$ helm install k8crdb --set Secure.Enabled=true cockroachdb/cockroachdb --namespace=thesis-crdb
Here is how it looks when I list it with $ helm list --namespace=thesis-crdb
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
k8crdb thesis-crdb 1 2021-01-29 20:18:25.5710691 +0100 CET deployed cockroachdb-5.0.4 20.2.4
Here is how it looks when I list it with $ kubectl get all --namespace=thesis-crdb
NAME READY STATUS RESTARTS AGE
pod/k8crdb-cockroachdb-0 1/1 Running 0 3h1m
pod/k8crdb-cockroachdb-1 1/1 Running 0 3h1m
pod/k8crdb-cockroachdb-2 1/1 Running 0 3h1m
pod/k8crdb-cockroachdb-init-j2h7t 0/1 Completed 0 3h1m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/k8crdb-cockroachdb ClusterIP None <none> 26257/TCP,8080/TCP 3h1m
service/k8crdb-cockroachdb-public ClusterIP 10.99.163.201 <none> 26257/TCP,8080/TCP 3h1m
NAME READY AGE
statefulset.apps/k8crdb-cockroachdb 3/3 3h1m
NAME COMPLETIONS DURATION AGE
job.batch/k8crdb-cockroachdb-init 1/1 33s 3h1m
Now I want to simulate traffic to this cluster. First I access the pod with: $ kubectl exec -i -t -n thesis-crdb k8crdb-cockroachdb-0 -c db "--" sh -c "clear; (bash || ash || sh)"
Which gets me inside the first pod/node.
From here I initiate the workload
[root@k8crdb-cockroachdb-0 cockroach]# cockroach workload init movr 'postgresql://root@localhost:26257?sslmode=disable'
And then I run the workload for 5 minutes
[root@k8crdb-cockroachdb-0 cockroach]# cockroach workload run movr --duration=5m 'postgresql://root@localhost:26257?sslmode=disable'
I am aware that I'm running the workload on one node, but I was under the impression that the workload would be distributed among all nodes. When I monitor the performance with the CockroachDB console, I see that only the first node is doing all the work, and the other nodes are idle.
As you can see, the second and third nodes haven't had any workload at all. Is this just a visual glitch in the console? If not, how can I run the workload so that it gets distributed evenly among all nodes in the cluster?
-UPDATE-
Yes, glad you brought up the cockroachdb-client-secure pod, because that's where I no longer could follow the guide. I tried as they did in the guide by doing: $ curl https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/client-secure.yaml | sed -e 's/serviceAccountName\: cockroachdb/serviceAccountName\: k8crdb-cockroachdb/g' | kubectl create -f -
But it throws this error:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1638 100 1638 0 0 4136 0 --:--:-- --:--:-- --:--:-- 4146
Error from server (Forbidden): error when creating "STDIN": pods "cockroachdb-client-secure" is forbidden: error looking up service account default/k8crdb-cockroachdb: serviceaccount "k8crdb-cockroachdb" not found
I also don't know if my certificates have been approved, because when I try this:
$ kubectl get csr k8crdb-cockroachdb-0 --namespace=thesis-crdb
It throws this:
Error from server (NotFound): certificatesigningrequests.certificates.k8s.io "k8crdb-cockroachdb-0" not found
And when I try to approve certificate: $ kubectl certificate approve k8crdb-cockroachdb-0 --namespace=thesis-crdb
It throws:
Error from server (NotFound): certificatesigningrequests.certificates.k8s.io "k8crdb-cockroachdb-0" not found
Any idea how to proceed from here?
This is not a glitch. Nodes will only receive SQL traffic if clients connect to them and issue SQL statements. It seems like you're running the workload by logging in to one of the cockroach pods and directing it to connect to that pod on its local port. That means only that pod is going to receive queries. The cockroach workload subcommand takes an arbitrary number of pgurl strings and will balance load over all of them. Note also that k8crdb-cockroachdb-public represents a load-balancer over all o
If you look at the guide you posted, it continues to describe how to deploy the cockroachdb-client-secure pod. Th If you were to run the workload there pointed at the load balancer, with something like:
'postgres://root@k8crdb-cockroachdb-public?sslcert=cockroach-certs%2Fclient.root.crt&sslkey=cockroach-certs%2Fclient.root.key&sslrootcert=cockroach-certs%2Fca.crt&sslmode=verify-full'
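The other option mentioned above, passing several pgurl strings so that the workload balances over all of them, could be sketched roughly like this. crdb_pgurls is a hypothetical helper; it assumes the headless service shares the StatefulSet's name (as the kubectl get all output in the question suggests) and that each pod is resolvable as <pod>.<service>.<namespace>. With sslmode=verify-full, the node certificates would also have to cover those hostnames, which is not confirmed here.

```shell
# crdb_pgurls STATEFULSET REPLICAS NAMESPACE QUERY
# Print one pgurl per CockroachDB pod, one per line.
# Assumption: the headless service has the same name as the StatefulSet,
# so pod N is reachable as STATEFULSET-N.STATEFULSET.NAMESPACE on port 26257.
crdb_pgurls() {
  sts=$1
  replicas=$2
  ns=$3
  query=$4
  i=0
  while [ "$i" -lt "$replicas" ]; do
    printf 'postgresql://root@%s-%s.%s.%s:26257?%s\n' "$sts" "$i" "$sts" "$ns" "$query"
    i=$((i + 1))
  done
}
```

From the client pod, something like crdb_pgurls k8crdb-cockroachdb 3 thesis-crdb '<ssl options>' | xargs cockroach workload run movr --duration=5m would then hand all three URLs to the workload, with <ssl options> standing for the same query string used in the URL above.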
UPDATE
I'm not a Kubernetes expert, but I think your issue creating the client pod relates to the namespace. The command currently assumes that everything is in the default namespace, but it appears that you're working in the thesis-crdb namespace. Consider adding a namespace flag to the kubectl create -f - command. Alternatively, consider setting the namespace for the session:
kubectl config set-context --current --namespace=thesis-crdb
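Combining the namespace flag with the client pod command from the question might look like the sketch below. set_service_account is a hypothetical helper, and it assumes the Helm release created a service account named k8crdb-cockroachdb in the thesis-crdb namespace, which the question does not confirm.

```shell
# set_service_account NAME: read a manifest on stdin and rewrite every
# serviceAccountName field to NAME, leaving indentation untouched.
# Assumption: NAME contains no "/" or "&" characters (true for k8s names).
set_service_account() {
  sed -e "s/serviceAccountName: .*/serviceAccountName: $1/"
}
```

The manifest would then be piped through it and created in the right namespace: curl -s https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/client-secure.yaml | set_service_account k8crdb-cockroachdb | kubectl create --namespace=thesis-crdb -f -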

I cannot load the node information on kubernetes

When I ran the command below, I got the below messages
bistel@BISTelResearchDev-DN03:~$ kubectl get nodes
The connection to the server localhost:8080 was refused - did you specify the right host or port?
While in the master node, I get the information as below:
bistel@BISTelResearchDev-NN:/etc/kubernetes$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
bistelresearchdev-dn03 NotReady <none> 62s v1.19.3
bistelresearchdev-nn Ready master 57m v1.19.3
bistel@BISTelResearchDev-NN:/etc/kubernetes$
bistelresearchdev-dn03 is the worker node, and whenever I run any kubectl command on it, I get this message: The connection to the server localhost:8080 was refused - did you specify the right host or port?
I searched for it a lot, but nothing I tried worked.
Thanks,
In a typical setup, kubectl is configured only on the master node, so getting this error on a worker node does not by itself indicate a problem.
The actual issue here is that the node is in NotReady status. For that, you can check the following:
Check kubelet is running on node bistelresearchdev-dn03 with systemctl status kubelet
Check network plugin is installed on your cluster.
The first machine you ran the command on is missing the kubeconfig file.
Normally kubectl expects to find it at
~/.kube/config
If you copy the one from the master node to that path on your machine, kubectl will find it and be able to use it.
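To see which file kubectl would look for on a given machine, a small check like the following may help. This is only a sketch: kubeconfig_path and check_kubeconfig are hypothetical helper names, and it treats KUBECONFIG as a single path rather than a colon-separated list.

```shell
# kubeconfig_path: print the file kubectl would read in the simple case:
# $KUBECONFIG if set and non-empty, otherwise ~/.kube/config.
# Assumption: KUBECONFIG holds a single path, not a colon-separated list.
kubeconfig_path() {
  printf '%s\n' "${KUBECONFIG:-$HOME/.kube/config}"
}

# check_kubeconfig: report whether that file exists. When it is missing,
# kubectl falls back to localhost:8080, which produces the error above.
check_kubeconfig() {
  path=$(kubeconfig_path)
  if [ -f "$path" ]; then
    echo "found $path"
  else
    echo "missing $path"
    return 1
  fi
}
```

If check_kubeconfig prints "missing" on the worker, copying the master's kubeconfig to that path should fix it. On a kubeadm-built cluster that file is usually /etc/kubernetes/admin.conf, though whether this cluster was built with kubeadm is an assumption.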

Minikube got stuck when creating container

I recently started learning Kubernetes by using Minikube locally on my Mac. Previously, I was able to start a local Kubernetes cluster with Minikube 0.10.0, create a deployment, and view the Kubernetes dashboard.
Yesterday I tried to delete the cluster and re-did everything from scratch. However, I found I cannot get the assets deployed and cannot view the dashboard. From what I saw, everything seemed to get stuck during container creation.
After I ran minikube start, it reported
Starting local Kubernetes cluster...
Kubectl is now configured to use the cluster.
When I ran kubectl get pods --all-namespaces, it reported (pay attention to the STATUS column):
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system kube-addon-manager-minikube 0/1 ContainerCreating 0 51s
docker ps showed nothing:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
minikube status tells me the VM and cluster are running:
minikubeVM: Running
localkube: Running
If I tried to create a deployment and an autoscaler, I was told they were created successfully:
kubectl create -f configs
deployment "hello-minikube" created
horizontalpodautoscaler "hello-minikube-autoscaler" created
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default hello-minikube-661011369-1pgey 0/1 ContainerCreating 0 1m
default hello-minikube-661011369-91iyw 0/1 ContainerCreating 0 1m
kube-system kube-addon-manager-minikube 0/1 ContainerCreating 0 21m
When exposing the service, it said:
$ kubectl expose deployment hello-minikube --type=NodePort
service "hello-minikube" exposed
$ kubectl get service
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hello-minikube 10.0.0.32 <nodes> 8080/TCP 6s
kubernetes 10.0.0.1 <none> 443/TCP 22m
When I tried to access the service, I was told:
curl $(minikube service hello-minikube --url)
Waiting, endpoint for service is not ready yet...
docker ps still showed nothing. It looked to me everything got stuck when creating a container. I tried some other ways to work around this issue:
Upgraded to minikube 0.11.0
Used the xhyve driver instead of the Virtualbox driver
Deleted everything cached, like ~/.minikube, ~/.kube, and the cluster, and retried
None of them worked for me.
Kubernetes is still new to me and I would like to know:
How can I troubleshoot this kind of issue?
What could be the cause of this issue?
Any help is appreciated. Thanks.
It turned out to be a network problem in my case.
The pod status is "ContainerCreating", and I found that during container creation the docker image is pulled from gcr.io, which is inaccessible in China (blocked by the GFW). It worked for me previously because I happened to be connected through a VPN.
I haven't tried minikube, but I do use Kubernetes. With the information provided, it is difficult to say what is causing the issue. Your minikube has no problem creating resources, but ContainerCreating points to a problem with the docker daemon, improper communication between kube-api and the docker daemon, or a problem with the kubelet.
You can try the following command:
kubectl describe po POD_NAME
This will give you the POD's events. Maybe this will provide a path to the root cause of issue.
You may also check the logs of kubelet to get the events.
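When several pods are stuck, the describe step can be applied to all of them at once. A minimal sketch, assuming the default column layout of kubectl get pods --all-namespaces shown in the question (STATUS in the fourth column); stuck_pods is a hypothetical helper name.

```shell
# stuck_pods: read `kubectl get pods --all-namespaces` output on stdin and
# print namespace/name for every pod whose STATUS column (4th field) is
# ContainerCreating. The header row is skipped because its 4th field is "STATUS".
stuck_pods() {
  awk '$4 == "ContainerCreating" { print $1 "/" $2 }'
}
```

For example: kubectl get pods --all-namespaces | stuck_pods | while IFS=/ read -r ns name; do kubectl describe pod -n "$ns" "$name"; done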
I had this problem on Windows, but it was related to an NTLM proxy. I deleted the minikube VM then recreated it with the correct proxy settings for my CNTLM installation:
minikube start \
--docker-env http_proxy=http://10.0.2.2:3128 \
--docker-env https_proxy=http://10.0.2.2:3128 \
--docker-env no_proxy=localhost,127.0.0.1,::1,192.168.99.100
See https://blog.alexellis.io/minikube-behind-proxy/
The horizontalpodautoscaler (hpa) requires heapster in order to work, so you'll need to run heapster in minikube. You can always debug these kinds of issues with minikube logs, or interactively through the dashboard opened by minikube dashboard.
You can find the steps to run heapster and grafana at https://github.com/kubernetes/heapster
For me, it takes several minutes before I see the ContainerCreating problem. After executing the following command:
systemctl status kube-controller-manager.service
I get this error:
Sync "default/redis-master-2229813293" failed with unable to create pods: No API token found for service account "default", retry after the token is automatically created and added to the service account.
There are two ways to solve this:
Set the service account with token
Remove the ServiceAccount setting of KUBE_ADMISSION_CONTROL in api-server