Redirection Stateful pod in Kubernetes - kubernetes

I am using stateful in kubernetes.
I write an application which will have leader and follower (using Go)
Leader is for writing and reading.
Follower is just for reading.
In the application code, I used "http.Redirect(w, r, url, 307)" function to redirect the writing from follower to leader.
if I use a jump pod to test the application (try to access the app to read and write), my application can work well, the follower can redirect to the leader
kubectl run -i -t --rm jumpod --restart=Never --image=quay.io/mhausenblas/jump:0.2 -- sh
But when I deploy a service (to access from outside).
apiVersion: v1
kind: Service
metadata:
name: service-name
spec:
selector:
app: app-name
ports:
- port: 80
targetPort: 9876
And access to application by this link:
curl -L -XPUT -T /tmp/put-name localhost:8001/api/v1/namespaces/default/services/service-name/proxy/
Because service will randomly select a pod each time we access to the application. When it accesses to leader, it work well (no problem happen), But if it accesses to the follower, the follower will need to redirect to leader, I met this error:
curl: (6) Could not resolve host: pod-name.svc-name.default.svc.cluster.local
What I tested:
I can use this link to access when I used jump pod
I accessed to per pod, look up the DNS. I can find the DNS name by "nslookup" command
I tried to fix the IP of leader in my code. In my code, the follower will redirect to a IP (not a domain like above). But It still met this error:
curl: (7) Failed to connect to 10.244.1.71 port 9876: No route to host
Anybody know this problem. Thank you!

In order for pod DNS to work you must create headless service:
apiVersion: v1
kind: Service
metadata:
name: svc-name-headless
spec:
clusterIP: None
selector:
app: app-name
ports:
- port: 80
targetPort: 9876
And then in StatefulSet spec you must refer to this service:
spec:
serviceName: svc-name-headless
Read more here: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#stable-network-id
Alternatively you may specify what particular pod will be selected by Service like that:
apiVersion: v1
kind: Service
metadata:
name: svc-name
spec:
selector:
statefulset.kubernetes.io/pod-name: pod-name-0
ports:
- port: 80
targetPort: 9876

When you are accessing your cluster services from outside, a DNS names reserved normally for inner cluster use (e.g. pod-name.svc-name.default.svc.cluster.local) are not recognized by clients (e.g. Web browser).
Context:
You are trying to expose access to PODs, controlled by StatefullSet with service of ClusterIP type
Solution:
Change ClusterIP (assumed by default when not specified) to NodePort or LoadBalancer

Related

Kubernetes Inter-Node traffic is not working properly

lately I am configuring a k8s cluster composed of 3 nodes(master, worker1 and worker2) that will host an UDP application(8 replicas of it). Everything is done and the cluster is working very well but there is only one problem.
Basically there is a Deployment which describes the Pod and it looks like:
apiVersion: apps/v1
kind: Deployment
metadata:
name: <name>
labels:
app: <app_name>
spec:
replicas: 8
selector:
matchLabels:
app: <app_name>
template:
metadata:
labels:
app: <app_name>
spec:
containers:
- name: <name>
image: <image>
ports:
- containerPort: 6000
protocol: UDP
There is also a Service which is used to access to the UDP application:
apiVersion: v1
kind: Service
metadata:
name: <service_name>
labels:
app: <app_name>
spec:
type: NodePort
ports:
- port: 6000
protocol: UDP
nodePort: 30080
selector:
app: <app_name>
When i try to access to the service 2 different scenarios may occur:
The request is assigned to a POD that is in the same node that received the request
The request is assigned to a POD that is in the other node
In the second case the request arrives correctly to the POD but with a source IP which ends by 0 (for example 10.244.1.0) so the response will never be delivered correctly.
I can't figure it out, I really tried everything but this problem still remains. In this moment to make the cluster working properly i added externalTrafficPolicy: Local and internalTrafficPolicy: Local to the Service in this way the requests will remain locally so when a request is sent to worker1 it will be assigned to a Pod which is running on worker1, the same for the worker2.
Do you have any ideas about the problem?
Thanks to everyone.
Have you confirmed that the response is not delivered correctly for your second scenario? The source IP address in that case should be the one of the node where the request first arrived.
I am under the impression that you are assuming that since the IP address ends in 0 this is necessarily a network address, and that could be a wrong assumption, as it depends on the Netmask configured for the Subnetwork where the nodes are allocated; for example, if the nodes are in the Subnet 10.244.0.0/23, then the network address is 10.244.0.0, and 10.244.1.0 is just another usable address that can be assigned to a node.
Now, if your application needs to preserve the client's IP address, then that could be an issue since, by default, the source IP seen in the target container is not the original source IP of the client. In this case, additionally to configuring the externalTrafficPolicy as Local, you would need to configure a healthCheckNodePort as specified in the Preserving the client source IP documentation.

HAProxy Ingress Controller Service Changed IP on GCP

I am using HAProxy as the ingress-controller in my GKE clusters. And exposing HAProxy service as LoadBalancer service(Internal).
Recently, I experienced an issue, where the HA-Proxy service changed its EXTERNAL-IP, and traffic stopped routing to HAProxy. This issue occurred multiple times on different days(now it has stopped). I had to manually add that new External-IP to the frontend of that Loadbalancer to allow traffic to HAProxy.
There were two pods running for HAProxy, and both had been running for days, and there was nothing in their logs. I assume it was something related to Service or GCP LB and not HAProxy itself.
I am afraid that I don't have any logs related to that.
I still don't know, what caused the service IP to change. As there were no recent changes, and the cluster and all services were running for many days properly, and suddenly this occurred.
Has anyone faced a similar issue earlier? Or what can I do to avoid such issue in future?
What could have caused the IP to change?
This is how my service is configured:
---
apiVersion: v1
kind: Service
metadata:
labels:
run: haproxy-ingress
name: haproxy-ingress
namespace: haproxy-controller
annotations:
cloud.google.com/load-balancer-type: "Internal"
networking.gke.io/internal-load-balancer-allow-global-access: "true"
cloud.google.com/network-tier: "Premium"
spec:
selector:
run: haproxy-ingress
type: LoadBalancer
ports:
- name: http
port: 80
protocol: TCP
targetPort: 80
- name: https
port: 443
protocol: TCP
targetPort: 443
- name: stat
port: 1024
protocol: TCP
targetPort: 1024
Found some logs:
Warning SyncLoadBalancerFailed 30m (x3570 over 13d) service-controller Error syncing load balancer: failed to ensure load balancer: googleapi: Error 409: IP_IN_USE_BY_ANOTHER_RESOURCE - IP '10.17.129.17' is already being used by another resource.
Normal EnsuringLoadBalancer 3m33s (x3576 over 13d) service-controller Ensuring load balancer
The Short answer is: External IP for the service are ephemeral.
Because HA-Proxy controller pods are recreated the HA-Proxy service is created with an ephemeral IP.
To avoid this issue, I would recommend using a static IP that you can reference in the loadBalancerIP field.
This can be done by following steps:
Reserve a static IP. (link)
Use this IP, to create a service (link)
Example YAML:
apiVersion: v1
kind: Service
metadata:
name: helloweb
labels:
app: hello
spec:
selector:
app: hello
tier: web
ports:
- port: 80
targetPort: 8080
type: LoadBalancer
loadBalancerIP: "YOUR.IP.ADDRESS.HERE"
Unfortunately without logs it's hard to say anything for sure. You should check the audit logs that GKE ships to Cloud Logging as that might give you some idea of what happened. One option is the GCP "oops"'d the GLB and GKE recreated it, thus giving it a new IP. I've never heard of that happening with LBs though (it happens pretty often with nodes, but not LBs). A more common case would be you ran some kubectl command that inadvertently removed the Service object and then it was recreated by some management layer you have set up (Argo, Flux, Helm Operator, whatever) but delete+recreate again means it's a new LB with a new IP. The latter case should be visible in the audit logs so check those out for sure.

Google Kubernetes Ingress health check always failing

I have configured a web application pod exposed via apache on port 80. I'm unable to configure a service + ingress for accessing from the internet. The issue is that the backend services always report as UNHEALTHY.
Pod Config:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
labels:
name: webapp
name: webapp
namespace: my-app
spec:
replicas: 1
selector:
matchLabels:
name: webapp
template:
metadata:
labels:
name: webapp
spec:
containers:
- image: asia.gcr.io/my-app/my-app:latest
name: webapp
ports:
- containerPort: 80
name: http-server
Service Config:
apiVersion: v1
kind: Service
metadata:
name: webapp-service
spec:
type: NodePort
selector:
name: webapp
ports:
- protocol: TCP
port: 50000
targetPort: 80
Ingress Config:
kind: Ingress
metadata:
name: webapp-ingress
spec:
backend:
serviceName: webapp-service
servicePort: 50000
This results in backend services reporting as UNHEALTHY.
The health check settings:
Path: /
Protocol: HTTP
Port: 32463
Proxy protocol: NONE
Additional information: I've tried a different approach of exposing the deployment as a load balancer with external IP and that works perfectly. When trying to use a NodePort + Ingress, this issue persists.
With GKE, the health check on the Load balancer is created automatically when you create the ingress. Since the HC is created automatically, so are the firewall rules.
Since you have no readinessProbe configured, the LB has a default HC created (the one you listed). To debug this properly, you need to isolate where the point of failure is.
First, make sure your pod is serving traffic properly;
kubectl exec [pod_name] -- wget localhost:80
If the application has curl built in, you can use that instead of wget.
If the application has neither wget or curl, skip to the next step.
get the following output and keep track of the output:
kubectl get po -l name=webapp -o wide
kubectl get svc webapp-service
You need to keep the service and pod clusterIPs
SSH to a node in your cluster and run sudo toolbox bash
Install curl:
apt-get install curl`
Test the pods to make sure they are serving traffic within the cluster:
curl -I [pod_clusterIP]:80
This needs to return a 200 response
Test the service:
curl -I [service_clusterIP]:80
If the pod is not returning a 200 response, the container is either not working correctly or the port is not open on the pod.
if the pod is working but the service is not, there is an issue with the routes in your iptables which is managed by kube-proxy and would be an issue with the cluster.
Finally, if both the pod and the service are working, there is an issue with the Load balancer health checks and also an issue that Google needs to investigate.
As Patrick mentioned, the checks will be created automatically by GCP.
By default, GKE will use readinessProbe.httpGet.path for the health check.
But if there is no readinessProbe configured, then it will just use the root path /, which must return an HTTP 200 (OK) response (and that's not always the case, for example, if the app redirects to another path, then the GCP health check will fail).

Kubernetes - Pass Public IP of Load Balance as Environment Variable into Pod

Gist
I have a ConfigMap which provides necessary environment variables to my pods:
apiVersion: v1
kind: ConfigMap
metadata:
name: global-config
data:
NODE_ENV: prod
LEVEL: info
# I need to set API_URL to the public IP address of the Load Balancer
API_URL: http://<SOME IP>:3000
DATABASE_URL: mongodb://database:27017
SOME_SERVICE_HOST: some-service:3000
I am running my Kubernetes Cluster on Google Cloud, so it will automatically create a public endpoint for my service:
apiVersion: v1
kind: Service
metadata:
name: gateway
spec:
selector:
app: gateway
ports:
- name: http
port: 3000
targetPort: 3000
nodePort: 30000
type: LoadBalancer
Issue
I have an web application that needs to make HTTP requests from the client's browser to the gateway service. But in order to make a request to the external service, the web app needs to know it's ip address.
So I've set up the pod, which serves the web application in a way, that it picks up an environment variable "API_URL" and as a result makes all HTTP requests to this url.
So I just need a way to set the API_URL environment variable to the public IP address of the gateway service to pass it into a pod when it starts.
I know this isn't the exact approach you were going for, but I've found that creating a static IP address and explicitly passing it in tends to be easier to work with.
First, create a static IP address:
gcloud compute addresses create gke-ip --region <region>
where region is the GCP region your GKE cluster is located in.
Then you can get your new IP address with:
gcloud compute addresses describe gke-ip --region <region>
Now you can add your static IP address to your service by specifying an explicit loadBalancerIP.1
apiVersion: v1
kind: Service
metadata:
name: gateway
spec:
selector:
app: gateway
ports:
- name: http
port: 3000
targetPort: 3000
nodePort: 30000
type: LoadBalancer
loadBalancerIP: "1.2.3.4"
At this point, you can also hard-code it into your ConfigMap and not worry about grabbing the value from the cluster itself.
1If you've already created a LoadBalancer with an auto-assigned IP address, setting an IP address won't change the IP of the underlying GCP load balancer. Instead, you should delete the LoadBalancer service in your cluster, wait ~15 minutes for the underlying GCP resources to get cleaned up, and then recreate the LoadBalancer with the explicit IP address.
You are trying to access gateway service from client's browser.
I would like to suggest you another solution that is slightly different from what you are currently trying to achieve
but it can solve your problem.
From your question I was able to deduce that your web app and gateway app are on the same cluster.
In my solution you dont need a service of type LoadBalancer and basic Ingress is enough to make it work.
You only need to create a Service object (notice that option type: LoadBalancer is now gone)
apiVersion: v1
kind: Service
metadata:
name: gateway
spec:
selector:
app: gateway
ports:
- name: http
port: 3000
targetPort: 3000
nodePort: 30000
and you alse need an ingress object (remember that na Ingress Controller needs to be deployed to cluster in order to make it work) like one below:
More on how to deploy Nginx Ingress controller you can finde here
and if you are already using one (maybe different one) then you can skip this step.
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
name: gateway-ingress
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
spec:
rules:
- host: gateway.foo.bar.com
http:
paths:
- path: /
backend:
serviceName: gateway
servicePort: 3000
Notice the host field.
The same you need to repeat for your web application. Remember to use appropriate host name (DNS name)
e.g. for web app: foo.bar.com and for gateway: gateway.foo.bar.com
and then just use the gateway.foo.bar.com dns name to connect to the gateway app from clients web browser.
You also need to create a dns entry that points *.foo.bar.com to Ingress's public ip address
as Ingress controller will create its own load balancer.
The flow of traffic would be like below:
+-------------+ +---------+ +-----------------+ +---------------------+
| Web Browser |-->| Ingress |-->| gateway Service |-->| gateway application |
+-------------+ +---------+ +-----------------+ +---------------------+
This approach is better becaues it won't cause issues with Cross-Origin Resource Sharing (CORS) in clients browser.
Examples of Ingress and Service manifests I took from official kubernetes documentation and modified slightly.
More on Ingress you can find here
and on Services here
The following deployment reads the external IP of a given service using kubectl every 10 seconds and patches a given configmap with it:
apiVersion: apps/v1
kind: Deployment
metadata:
name: configmap-updater
labels:
app: configmap-updater
spec:
selector:
matchLabels:
app: configmap-updater
template:
metadata:
labels:
app: configmap-updater
spec:
containers:
- name: configmap-updater
image: alpine:3.10
command: ['sh', '-c' ]
args:
- | #!/bin/sh
set -x
apk --update add curl
curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.16.0/bin/linux/amd64/kubectl
chmod +x kubectl
export CONFIGMAP="configmap/global-config"
export SERVICE="service/gateway"
while true
do
IP=`./kubectl get services $CONFIGMAP -o go-template --template='{{ (index .status.loadBalancer.ingress 0).ip }}'`
PATCH=`printf '{"data":{"API_URL": "https://%s:3000"}}' $IP`
echo ${PATCH}
./kubectl patch --type=merge -p "${PATCH}" $SERVICE
sleep 10
done
You probably have RBAC enabled in your GKE cluster and would still need to create the appropriate Role and RoleBinding for this to work correctly.
You've got a few possibilities:
If you really need this to be hacked into your setup, you could use a similar approach with a sidecar container in your pod or a global service like above. Keep in mind that you would need to recreate your pods if the configmap actually changed for the changes to be picked up by the environment variables of your containers.
Watch and query the Kubernetes-API for the external IP directly in your application, eliminating the need for an environment variable.
Adopt your applications to not directly depend on the external IP.

readiness probe fails with connection refused

I am trying to setup K8S to work with two Windows Nodes (2019). Everything seems to be working well and the containers are working and accessible using k8s service. But, once I introduce configuration for readiness (or liveness) probes - all fails. The exact error is:
Readiness probe failed: Get http://10.244.1.28:80/test.txt: dial tcp 10.244.1.28:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
When I try the url from k8s master, it works well and I get 200. However I read that the kubelet is the one executing the probe and indeed when trying from the Windows Node - it cannot be reached (which seems weird because the container is running on that same node). Therefore I assume that the problem is related to some network configuration.
I have a HyperV with External network Virtual Switch configured. K8S is configured to use flannel overlay (vxlan) as instructed here: https://learn.microsoft.com/en-us/virtualization/windowscontainers/kubernetes/network-topologies.
Any idea how to troubleshoot and fix this?
UPDATE: providing the yaml:
apiVersion: v1
kind: Service
metadata:
name: dummywebapplication
labels:
app: dummywebapplication
spec:
ports:
# the port that this service should serve on
- port: 80
targetPort: 80
selector:
app: dummywebapplication
type: NodePort
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
labels:
app: dummywebapplication
name: dummywebapplication
spec:
replicas: 2
template:
metadata:
labels:
app: dummywebapplication
name: dummywebapplication
spec:
containers:
- name: dummywebapplication
image: <my image>
readinessProbe:
httpGet:
path: /test.txt
port: 80
initialDelaySeconds: 15
periodSeconds: 30
timeoutSeconds: 60
nodeSelector:
beta.kubernetes.io/os: windows
And one more update. In this doc (https://kubernetes.io/docs/setup/windows/intro-windows-in-kubernetes/) it is written:
My Windows node cannot access NodePort service
Local NodePort access from the node itself fails. This is a known
limitation. NodePort access works from other nodes or external
clients.
I don't know if this is related or not as I could not connect to the container from a different node as stated above. I also tried a service of LoadBalancer type but it didn't provide a different result.
The network configuration assumption was correct. It seems that for 'overlay', by default, the kubelet on the node cannot reach the IP of the container. So it keeps returning timeouts and connection refused messages.
Possible workarounds:
Insert an 'exception' into the ExceptionList 'OutBoundNAT' of C:\k\cni\config on the nodes. This is somewhat tricky if you start the node with start.ps1 because it overwrites this file everytime. I had to tweak 'Update-CNIConfig' function in c:\k\helper.psm1 to re-insert the exception similar to the 'l2bridge' in that file.
Use 'l2bridge' configuration. Seems like 'overlay' is running in a more secured isolation, but l2bridge is not.