GMP managed Prometheus example not working on a brand-new vanilla stable GKE Autopilot cluster - kubernetes

Google Managed Prometheus seems like a great service; however, at the moment even the documented example does not work: https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed
Setup:
Create a new Autopilot cluster (1.21.12-gke.2200)
Enable managed Prometheus via the gcloud CLI:
gcloud beta container clusters update <mycluster> --enable-managed-prometheus --region us-central1
Add the port 8443 firewall rule for the webhook
Install ingress-nginx
Try to use the PodMonitoring manifest to scrape metrics from ingress-nginx
Error from server (InternalError): error when creating "ingress-nginx/metrics.yaml": Internal error occurred: failed calling webhook "default.podmonitorings.gmp-operator.gke-gmp-system.monitoring.googleapis.com": Post "https://gmp-operator.gke-gmp-system.svc:443/default/monitoring.googleapis.com/v1/podmonitorings?timeout=10s": x509: certificate is valid for gmp-operator, gmp-operator.gmp-system, gmp-operator.gmp-system.svc, not gmp-operator.gke-gmp-system.svc
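For reference, the metrics.yaml PodMonitoring manifest was roughly the following (the label selector and port name are assumptions based on ingress-nginx defaults; the exact spec doesn't matter here, since the webhook rejects the create call before validation):

```yaml
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  endpoints:
  - port: metrics
    interval: 30s
```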
There is a thread suggesting this will all be fixed this week (8/11/2022), https://github.com/GoogleCloudPlatform/prometheus-engine/issues/300, but it seems like this should work regardless.
If I try to port-forward:
kubectl -n gke-gmp-system port-forward svc/gmp-operator 8443
error: Pod 'gmp-operator-67d5fff8b9-p4n7t' does not have a named port 'webhook'

Related

Issue in Istio Integration with Ambassador API gateway

I have installed the Ambassador API Gateway on an AWS EKS cluster, and it's working as expected.
Now I'd like to integrate the Istio service mesh.
I'm following the steps in Ambassador's official documentation:
https://www.getambassador.io/docs/edge-stack/latest/howtos/istio/#istio-integration.
But after the Istio integration, some Ambassador pods keep crashing.
Only 1 of the 3 pods is healthy at any given time.
Note: the Istio sidecars are injected successfully into all Ambassador pods. I have tried Ambassador 2.1.1 and 2.1.2; both have the same issue, and I'm unable to keep all Ambassador pods healthy.
My EKS version is v1.19.13-eks
Below is the error:
time="2022-03-02 12:30:17.0687" level=error msg="Post \"http://localhost:8500/_internal/v0/watt?url=http%3A%2F%2Flocalhost%3A9696%2Fsnapshot\": dial tcp 127.0.0.1:8500: connect: connection refused" func=github.com/datawire/ambassador/v2/cmd/entrypoint.notifyWebhookUrl file="/go/cmd/entrypoint/notify.go:124" CMD=entrypoint PID=1 THREAD=/watcher
Please let me know if the documentation above is insufficient for Istio integration with Ambassador on AWS EKS.
Edit 1: On further investigation, I found the issue occurs when I integrate Istio with PeerAuthentication in STRICT mode. There is no such issue in the default (PERMISSIVE) mode.
But another issue appears once STRICT mode is enabled: Ambassador now fails to connect to the Redis service.
After some investigation and testing, I found a way to integrate Istio with Ambassador in PeerAuthentication STRICT mode.
The fix: update the REDIS_URL env variable to use https,
from:
REDIS_URL: ambassador-redis:6379
to:
REDIS_URL: https://ambassador-redis:6379
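For context, STRICT mode is enabled with a PeerAuthentication resource along these lines (the mesh-wide placement in istio-system is an assumption; yours may be scoped to a single namespace instead):

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    # STRICT requires mTLS for all workload-to-workload traffic;
    # the default PERMISSIVE mode also accepts plaintext.
    mode: STRICT
```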

RabbitMQ host and port on Kubernetes cluster

I've installed RabbitMQ on a Kubernetes Cluster using Helm as follows:
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-release bitnami/rabbitmq-cluster-operator
Then I set up a Go client, running as a service on the same Kubernetes cluster, something like this:
import amqp "github.com/rabbitmq/amqp091-go"
conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
But the client fails to connect. How do I figure out what the host and port should be set to for RabbitMQ on a Kubernetes Cluster?
If your Go client runs as a microservice on the same cluster, you need to use the appropriate DNS name to reach RabbitMQ; localhost just points back at the Go client's own pod.
In the namespace where RabbitMQ is installed, you can run kubectl get svc and there should be a ClusterIP service running with port 5672, likely called my-release.
You can then connect to it from any other service in the cluster with my-release.NAMESPACE.svc.DOMAIN.
The Helm release notes also show how to connect to the service, along with other helpful details such as the authentication username and password and external access options:
helm get notes my-release
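Putting that together, the client's connection string would be built roughly like this (the release name my-release, the default namespace, and the guest credentials are assumptions taken from the question; substitute the values from helm get notes):

```go
package main

import "fmt"

// amqpURL builds the in-cluster connection string for a RabbitMQ
// service, following the SERVICE.NAMESPACE.svc.cluster.local pattern.
func amqpURL(user, pass, service, namespace string) string {
	return fmt.Sprintf("amqp://%s:%s@%s.%s.svc.cluster.local:5672/",
		user, pass, service, namespace)
}

func main() {
	// e.g. conn, err := amqp.Dial(amqpURL("guest", "guest", "my-release", "default"))
	fmt.Println(amqpURL("guest", "guest", "my-release", "default"))
}
```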

Failed calling webhook "namespace.sidecar-injector.istio.io"

I had my deployment working with the Istio ingress gateway before, and I am not aware of any changes made on the Istio or k8s side.
When I try to deploy, I see an error on the ReplicaSet side, which is why it cannot create a new pod.
Error creating: Internal error occurred: failed calling webhook
"namespace.sidecar-injector.istio.io": Post
"https://istiod.istio-system.svc:443/inject?timeout=10s": dial tcp
10.104.136.116:443: connect: no route to host
When I try to go inside api-server and ping 10.104.136.116 (istiod service IP) it just hangs.
What I have tried so far:
Deleted all coredns pods
Deleted all istiod pods
Deleted all weave pods
Reinstalling istio via istioctl x uninstall --purge
Turning off the VMs' firewalls:
sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT
sudo iptables -F
restarted all of the nodes
Manual Istio sidecar injection
Setup
k8s version: 1.21.2
istio: 1.10.3
HA setup
CNI: weave
CRI: containerd
In my case this was related to firewall. More info can be found here.
The gist of it is that, on GKE at least, you need to open another port, 15017, in addition to 10250 and 443. This allows communication from your master node(s) to your VPC.
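On GKE that typically means adding a firewall rule along these lines (the network name, master CIDR, and target tags below are placeholders, not values from the answer; substitute your own):

```shell
gcloud compute firewall-rules create allow-master-to-istiod \
  --network my-network \
  --direction INGRESS \
  --action ALLOW \
  --rules tcp:15017 \
  --source-ranges 172.16.0.0/28 \
  --target-tags my-node-tag
```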
I don't have a definite answer as to why this happens, but kube-apiserver could not access istiod via the service IP, whereas it could connect when I used the istiod pod IP directly.
Since I don't have control over the VMs and the lower networking layers, and I'm not sure whether something changed there (it was working before),
I made this work by changing my CNI from Weave to Flannel.
In my case it was due to the firewall. Following this Istio debug guide, I identified that the kubectl get --raw /api/v1/namespaces/istio-system/services/https:istiod:https-webhook/proxy/inject -v4 command was timing out while all other cluster-internal calls were OK.
The best way to diagnose this is to temporarily open the involved AWS Security Groups to 0.0.0.0/0 for port 15017 and then try again.
If the error no longer shows up, you know that's the part that needs fixing.
I am using EKS with Amazon VPC CNI v1.12.2-eksbuild.1
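For the temporary test above, the AWS CLI equivalent is something like the following (the security group ID is a placeholder; once confirmed, tighten the rule to the control-plane security group instead of 0.0.0.0/0):

```shell
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 15017 \
  --cidr 0.0.0.0/0
```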

Kubernetes, Unable to connect to the server: EOF

Environment of kubectl: Windows 10.
Kubectl version: https://storage.googleapis.com/kubernetes-release/release/v1.15.0/bin/windows/amd64/kubectl.exe
Hello. I've just created a Kubernetes cluster on Google Cloud Platform and then ran the following command:
gcloud container clusters get-credentials my-cluster --zone europe-west1-b --project my-project
It successfully added the credentials at %UserProfile%\.kube\config
But when I try kubectl get pods it returns Unable to connect to the server: EOF. My computer accesses the internet through a corporate proxy. How and where can I provide a cert file for kubectl so it uses the cert with all requests? Thanks.
You would get EOF if there is no response to kubectl's API calls within a certain time (the idle timeout is 300 seconds by default).
Try increasing the cluster idle timeout, or you may need a VPN to reach those pods.
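If the corporate proxy is the culprit, note that kubectl honors the standard proxy environment variables; on Windows something along these lines is worth trying (the proxy host and port are placeholders):

```
set HTTPS_PROXY=http://proxy.corp.example:3128
set NO_PROXY=localhost,127.0.0.1
kubectl get pods
```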

Why tiller connect to localhost 8080 for kubernetes api?

When using Helm for Kubernetes package management, after installing the Helm client and running
helm init
I can see Tiller pods running on the Kubernetes cluster. But then when I run helm ls, it gives an error:
Error: Get http://localhost:8080/api/v1/namespaces/kube-system/configmaps?labelSelector=OWNER%3DTILLER: dial tcp 127.0.0.1:8080: getsockopt: connection refused
and with kubectl logs I can see a similar message:
[storage/driver] 2017/08/28 08:08:48 list: failed to list: Get http://localhost:8080/api/v1/namespaces/kube-system/configmaps?labelSelector=OWNER%3DTILLER: dial tcp 127.0.0.1:8080: getsockopt: connection refused
I can see the Tiller pod is running on one of the worker nodes instead of the master, and there is no API server running on that node. Why does it connect to 127.0.0.1 instead of my master IP?
Run this before doing helm init. It worked for me.
kubectl config view --raw > ~/.kube/config
First, delete the Tiller deployment and the Tiller service by running the commands below:
kubectl delete deployment tiller-deploy --namespace=kube-system
kubectl delete service tiller-deploy --namespace=kube-system
rm -rf $HOME/.helm/
By default, helm init installs the Tiller pod into the kube-system namespace, with Tiller configured to use the default service account.
Configure Tiller with cluster-admin access with the following command:
kubectl create clusterrolebinding tiller-cluster-admin \
--clusterrole=cluster-admin \
--serviceaccount=kube-system:default
Then install helm server (Tiller) with the following command:
helm init
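A common variant of the above (an alternative to binding cluster-admin to the default service account) is to create a dedicated Tiller service account and pass it to helm init:

```
kubectl -n kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller \
  --clusterrole=cluster-admin \
  --serviceaccount=kube-system:tiller
helm init --service-account tiller
```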
So I had been having this problem for a couple of weeks on my workstation, and none of the answers provided (here or on GitHub) worked for me.
What worked is this:
sudo kubectl proxy --kubeconfig ~/.kube/config --port 80
Notice that I am using port 80, so I needed sudo to be able to bind the proxy there; if you use 8080 you won't need that.
Be careful with this, because the kubeconfig file the command above points to is /root/.kube/config rather than the one in your usual $HOME. You can either use an absolute path to point to the config you want, create one in root's home, or use the sudo flag --preserve-env=HOME to preserve your original HOME env var.
If you are using Helm by itself, I guess that's it. In my case I use Helm through the Terraform provider on GKE, so this was a pain in the ass to debug, as the message I was getting doesn't even mention Helm and is returned by Terraform when planning. For anybody who may be in a similar situation:
The errors when doing a plan/apply operation in Terraform in any cluster with Helm releases in the state:
Error: error installing: Post "http://localhost/apis/apps/v1/namespaces/kube-system/deployments": dial tcp [::1]:80: connect: connection refused
Error: Get "http://localhost/api/v1/namespaces/system/secrets/apigee-secrets": dial tcp [::1]:80: connect: connection refused
One of these errors appears for every Helm release in the cluster, or something like that. In this case, for a GKE cluster, I had to ensure that the env var GOOGLE_APPLICATION_CREDENTIALS pointed to a key file with valid credentials (the application-default credentials, unless you use a non-default setup for application auth):
gcloud auth application-default login
export GOOGLE_APPLICATION_CREDENTIALS=/home/$USER/.config/gcloud/application_default_credentials.json
With the kube proxy in place and the correct credentials I am able again to use Terraform (and Helm) as usual. I hope this is helpful for anybody experiencing this.
kubectl config view --raw > ~/.kube/config
export KUBECONFIG=~/.kube/config
worked for me