I am using HAProxy as the ingress controller in my GKE clusters, and I expose the HAProxy service as an internal LoadBalancer service.
Recently I experienced an issue where the HAProxy service changed its EXTERNAL-IP and traffic stopped routing to HAProxy. This happened multiple times on different days (it has since stopped). I had to manually add the new external IP to the frontend of that load balancer to allow traffic to reach HAProxy.
There were two HAProxy pods running, both had been up for days, and there was nothing in their logs. I assume it was something related to the Service or the GCP load balancer and not HAProxy itself.
I am afraid that I don't have any logs related to that.
I still don't know what caused the service IP to change. There were no recent changes; the cluster and all services had been running properly for many days when this suddenly occurred.
Has anyone faced a similar issue before? What can I do to avoid such an issue in the future?
What could have caused the IP to change?
This is how my service is configured:
---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: haproxy-ingress
  name: haproxy-ingress
  namespace: haproxy-controller
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
    networking.gke.io/internal-load-balancer-allow-global-access: "true"
    cloud.google.com/network-tier: "Premium"
spec:
  selector:
    run: haproxy-ingress
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  - name: stat
    port: 1024
    protocol: TCP
    targetPort: 1024
Found some logs:
Warning SyncLoadBalancerFailed 30m (x3570 over 13d) service-controller Error syncing load balancer: failed to ensure load balancer: googleapi: Error 409: IP_IN_USE_BY_ANOTHER_RESOURCE - IP '10.17.129.17' is already being used by another resource.
Normal EnsuringLoadBalancer 3m33s (x3576 over 13d) service-controller Ensuring load balancer
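The 409 means the service controller tried to use an address that was still attached to another resource. A hedged way to see which resource is holding it (standard gcloud filter syntax as I understand it; swap in your own IP and project):

# list any reserved address or forwarding rule that still uses that IP
gcloud compute addresses list --filter="address=10.17.129.17"
gcloud compute forwarding-rules list --filter="IPAddress=10.17.129.17"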
The short answer is: the external IP of the service is ephemeral.
Because the HAProxy controller pods were recreated, the HAProxy service was created with an ephemeral IP.
To avoid this issue, I would recommend using a static IP that you can reference in the loadBalancerIP field.
This can be done with the following steps:
Reserve a static IP. (link)
Use this IP to create the service. (link)
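A minimal sketch of step 1 with gcloud (the address name, region and subnet are placeholders; for the internal load balancer in the question the address has to be reserved in the same region and subnet as the cluster):

# reserve a regional internal address that loadBalancerIP can reference
gcloud compute addresses create haproxy-ingress-ip \
    --region us-central1 \
    --subnet my-subnet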
Example YAML:
apiVersion: v1
kind: Service
metadata:
  name: helloweb
  labels:
    app: hello
spec:
  selector:
    app: hello
    tier: web
  ports:
  - port: 80
    targetPort: 8080
  type: LoadBalancer
  loadBalancerIP: "YOUR.IP.ADDRESS.HERE"
Unfortunately, without logs it's hard to say anything for sure. You should check the audit logs that GKE ships to Cloud Logging, as that might give you some idea of what happened. One possibility is that GCP "oops"'d the load balancer and GKE recreated it, giving it a new IP. I've never heard of that happening with LBs, though (it happens pretty often with nodes, but not LBs). A more common case is that you ran some kubectl command that inadvertently removed the Service object, and then it was recreated by some management layer you have set up (Argo, Flux, Helm Operator, whatever); delete+recreate again means a new LB with a new IP. The latter case should be visible in the audit logs, so check those out for sure.
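If it helps, here is a hedged example of a Cloud Logging filter for Service delete/create entries in the GKE admin activity audit log. The method names follow the io.k8s.* naming GKE uses for Kubernetes audit entries, but treat them as an assumption and adjust the resource name to your own service and namespace:

resource.type="k8s_cluster"
protoPayload.resourceName:"services/haproxy-ingress"
protoPayload.methodName=("io.k8s.core.v1.services.delete" OR "io.k8s.core.v1.services.create")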
Related
I have the following service:
apiVersion: v1
kind: Service
metadata:
  name: hedgehog
  labels:
    run: hedgehog
spec:
  ports:
  - port: 3000
    protocol: TCP
    name: restful
  - port: 8982
    protocol: TCP
    name: websocket
  selector:
    run: hedgehog
  externalIPs:
  - 1.2.4.120
In which I have specified an externalIP.
I'm also seeing this IP under EXTERNAL-IP when running kubectl get services.
However, when I do curl http://1.2.4.120:3000 I get a timeout, even though the app is supposed to respond: the jar running inside the container in the deployment does answer requests on localhost:3000 when run locally.
If you look at your service, its type is probably ClusterIP; try changing the type to LoadBalancer,
apiVersion: v1
kind: Service
metadata:
  name: http-service
spec:
  clusterIP: 172.30.163.110
  externalIPs:
  - 192.168.132.253
  externalTrafficPolicy: Cluster
  ports:
  - name: highport
    nodePort: 31903
    port: 30102
    protocol: TCP
    targetPort: 30102
  selector:
    app: web
  sessionAffinity: None
  type: LoadBalancer
something like this, with type: LoadBalancer.
First of all, you have to understand that you cannot place just any random address in the externalIPs field. Those addresses are not managed by Kubernetes; they are the responsibility of the cluster administrator, i.e. you. External IP addresses specified with externalIPs are different from the external IP address assigned to a Service of type LoadBalancer by a cloud provider.
I checked the address that you mentioned in the question and it does not look like it belongs to you, which is why I suspect you placed a random one there.
The same address appears in this article about externalIPs. As you can see there, the addresses in that case are the IP addresses of the nodes that Kubernetes runs on.
This is a potential issue in your case.
Another thing to verify is whether your application is listening on localhost or 0.0.0.0. If it really is localhost, then this is another potential problem for you. You can change where the server process listens by binding to 0.0.0.0, which means "listen on all interfaces".
Lastly, please verify that the selector and ports of the Service are correct and that you have at least one Endpoint backing the Service (a couple of quick checks are sketched below).
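A hedged sketch of those last checks (the Service name comes from the question; the deployment name and the tools available inside the image are assumptions):

# does the Service have endpoints, i.e. does its selector match any ready pods?
kubectl get endpoints hedgehog

# what address is the process actually bound to inside the pod?
# (use ss -tlnp if netstat is not available in the image)
kubectl exec deploy/hedgehog -- netstat -tlnp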
I'm trying to use a load balancer to expose a service I have running on an EKS pod. My service is defined in a yaml like this:
kind: Service
apiVersion: v1
metadata:
  name: mlflow-server
  namespace: default
  labels:
    app: mlflow-server
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  externalTrafficPolicy: Local
  type: LoadBalancer
  selector:
    app: mlflow-server
  ports:
  - name: http
    port: 88
    targetPort: http
  - name: https
    port: 443
    targetPort: https
This defines a service for a pod that I have the mlflow server running on. When I apply this and access the external IP generated for the service, I get a "This site can't be reached" error in the browser. Is there something I'm missing in exposing my service as a load-balanced service to access the mlflow UI?
For a basic LoadBalancer type service you do not need the annotation service.beta.kubernetes.io/aws-load-balancer-type: nlb; that annotation is what creates a Network Load Balancer. If you do need it to be an NLB, the following problems are the likely causes:
The NLB takes a few minutes to come up after you apply the manifest. If you check it right after deploying, it will not yet be able to accept traffic. Check whether the intended Network Load Balancer is up in your AWS EC2 console, under the Load Balancers tab.
The second, more likely, problem is that the NLB can only be attached to certain instance types. To check that, go through the following link.
https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-register-targets.html#register-deregister-targets
So if you do not actually need a Network Load Balancer, remove the annotation (the NLB also costs more). If it is a hard requirement, check the second point: whether the instances you are using on AWS are compatible with a Network Load Balancer.
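Either way, a quick hedged check of what was actually provisioned for the Service (names as in the question):

# EXTERNAL-IP shows the load balancer's DNS name once it is ready
kubectl get svc mlflow-server -n default

# recent events usually explain why a cloud load balancer could not be created
kubectl describe svc mlflow-server -n default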
I am working with the AKS service. I have a TensorFlow Serving image in Azure Container Registry. When I deploy my service, the public service endpoint is not accessible, nor is it pingable.
My image exposes port 8501, so I am using that as the target port in my YAML.
Here is the YAML file I am using for this deployment.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-model-gpu
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: my-model-gpu
    spec:
      containers:
      - name: my-model-gpu
        image: dsdemocr.azurecr.io/work-place-safety-gpu
        ports:
        - containerPort: 8501
        resources:
          limits:
            nvidia.com/gpu: 1
      imagePullSecrets:
      - name: registrykey
---
apiVersion: v1
kind: Service
metadata:
  name: my-model-gpu
spec:
  type: LoadBalancer
  ports:
  - port: 8501
    protocol: "TCP"
    targetPort: 8501
  selector:
    app: my-model-gpu
Below is my svc description: kubectl describe svc my-model-gpu
Name: my-model-gpu
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"my-model-gpu","namespace":"default"},"spec":{"ports":[{"port":850...
Selector: app=my-model-gpu
Type: LoadBalancer
IP: 10.0.244.106
LoadBalancer Ingress: 52.183.17.101
Port: <unset> 8501/TCP
TargetPort: 8501/TCP
NodePort: <unset> 31546/TCP
Endpoints: 10.244.0.22:8501
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EnsuringLoadBalancer 10m service-controller Ensuring load balancer
Normal EnsuredLoadBalancer 9m8s service-controller Ensured load balancer
Looks like I am making some mistake with port mapping. Any help is much appreciated.
From the information you provided, there is no problem with the service having the LoadBalancer type. As I see it, the possible reasons all relate to your application, and I list them below:
The port which you expose is not right, so you need to make sure what the right port to expose is.
I see you want to use a GPU in AKS, so you need to choose the right VM size for GPU at creation time. This may also cause your application not to be in the running state.
There may be other problems with your application that keep it from running well, so you also need to check the application's state (a few kubectl checks are sketched below).
As for the imagePullSecret, maybe you did not assign enough permission to the service principal to pull the image. This reason is less likely, but I list it here as well.
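A hedged sketch of those application checks (the label comes from the manifest in the question; the output will differ per cluster):

# is the pod Running and Ready, or stuck in Pending / ImagePullBackOff?
kubectl get pods -l app=my-model-gpu

# events and logs usually reveal GPU scheduling or image pull problems
kubectl describe pods -l app=my-model-gpu
kubectl logs -l app=my-model-gpu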
Hope this helps you.
Please follow the advice provided by the community:
Verify that your application is working properly.
Does the Service have any Endpoints?
Researching this topic, I would suggest taking a look at this Azure-specific information:
Use a static public IP address with the Azure Kubernetes Service (AKS) load balancer
Preview - Use a Standard SKU load balancer in Azure Kubernetes Service (AKS)
If the static IP address defined in the loadBalancerIP property of the Kubernetes service manifest does not exist, or has not been created in the node resource group and no additional delegations configured, the load balancer service creation fails.
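If you go the static IP route, a minimal sketch of reserving one with the Azure CLI (the resource group and IP name are placeholders; with the Basic SKU load balancer the address generally has to be created in the cluster's node resource group, usually named MC_<resource-group>_<cluster>_<region>):

az network public-ip create \
    --resource-group MC_myResourceGroup_myAKSCluster_eastus \
    --name myAKSPublicIP \
    --allocation-method Static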
There is a very similar case on GitHub.
If you are using advanced networking, it creates the vNet in the same resource group as the AKS service by default.
Note:
Currently only the Basic IP SKU is supported. Work is in progress to support the Standard IP resource SKU. For more information, see IP address types and allocation methods in Azure.
Additional resources:
Azure load balancer related:
Azure - Service Type LoadBalancer
Azure - Internal load balancer
Hope this helps.
The container I was trying to access had no port open on 8501; once I fixed that, it worked well.
I think you need to access the application at 52.183.17.101:8501, because you did not define any routing of traffic to port 80 on the load balancer.
By default the load balancer will be created listening on 8501.
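If you would rather reach it on port 80, here is a hedged sketch of the same Service with the load balancer port remapped (everything else as in the original manifest):

apiVersion: v1
kind: Service
metadata:
  name: my-model-gpu
spec:
  type: LoadBalancer
  ports:
  - port: 80          # port the Azure load balancer listens on
    protocol: TCP
    targetPort: 8501  # port the TensorFlow Serving container listens on
  selector:
    app: my-model-gpu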
I have set up MetalLB as the LB, with the NGINX Ingress controller installed on a K8s cluster.
I have read about session affinity and its significance, but so far I do not have a clear picture of it.
How can I create a single service exposing multiple pods of the same application?
After creating the single service entry point, how to map the specific client IP to Pod abstracted by the service?
Is there any blog explaining this concept in terms of how the mapping between Client IP and POD is done in kubernetes?
But I do not see the client's IP anywhere in the YAML. So how is this service going to map traffic from particular clients to its endpoints? That is the question I have.
kind: Service
apiVersion: v1
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10000
The main idea of session affinity is to redirect traffic from a given client always to the same pod. Please keep in mind that session affinity is a best-effort method, and there are scenarios where it will fail due to pod restarts or network errors.
There are two main types of Session Affinity:
1) Based on Client IP
This option works well for the scenario where there is only one client per IP. In this method you don't need an Ingress/proxy between the K8s service and the client.
The client IP should be static, because each time the client changes its IP it will be redirected to another pod.
To enable session affinity in Kubernetes, we can add the following to the service definition:
service.spec.sessionAffinity: ClientIP
Because the community has already provided a proper manifest for this method, I will not duplicate it.
2) Based on Cookies
This works when there are multiple clients behind the same IP, because the cookie is stored at the web browser level. This method requires an Ingress object. Steps to apply this method, with more detailed information, can be found here under the "Session affinity based on Cookie" section; a minimal Ingress sketch follows the steps below.
Create NGINX controller deployment
Create NGINX service
Create Ingress
Redirect your public DNS name to the NGINX service public/external IP.
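A minimal sketch of the Ingress for step 3, assuming the ingress-nginx controller, the networking.k8s.io/v1 Ingress API and hypothetical host/service names; the two affinity annotations are the ones documented by ingress-nginx:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "route"
spec:
  ingressClassName: nginx
  rules:
  - host: my-app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app
            port:
              number: 80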
About mapping a client IP to a pod, according to the documentation:
kube-proxy is responsible for SessionAffinity. One of kube-proxy's jobs is writing to iptables (more details here), so that is how the mapping is done.
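A hedged way to see that mapping on a node, assuming kube-proxy is running in iptables mode (the chain names are generated per cluster, so yours will differ):

# ClientIP affinity is implemented with the iptables "recent" module,
# attached to the per-endpoint KUBE-SEP-* chains
sudo iptables-save | grep -E 'KUBE-SEP|recent'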
Articles which might help with understanding Session Affinity:
https://sookocheff.com/post/kubernetes/building-stateful-services/
https://medium.com/@diegomrtnzg/redirect-your-users-to-the-same-pod-by-using-session-affinity-on-kubernetes-baebf6a1733b
Follow the service reference for session affinity:
kind: Service
apiVersion: v1
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10000
Kubernetes version:
v1.10.3
Docker version:
17.03.2-ce
Operating system and kernel:
Centos 7
Steps to Reproduce:
https://kubernetes.io/docs/tasks/access-application-cluster/service-access-application-cluster/
Results:
[root@rd07 rd]# kubectl describe services example-service
Name: example-service
Namespace: default
Labels: run=load-balancer-example
Annotations:
Selector: run=load-balancer-example
Type: NodePort
IP: 10.108.214.162
Port: 9090/TCP
TargetPort: 9090/TCP
NodePort: 31105/TCP
Endpoints: 192.168.1.23:9090,192.168.1.24:9090
Session Affinity: None
External Traffic Policy: Cluster
Events:
Expected:
I expect to be able to curl the cluster IP defined in the Kubernetes service.
I'm not exactly sure what the so-called "public-node-ip" is, so I tried every related IP address; only when using the master IP as the "public-node-ip" did it show "No route to host".
I used netstat to check whether the endpoint is being listened on.
I tried https://github.com/rancher/rancher/issues/6139 to flush my iptables, and it did not work at all.
I tried https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/; "nslookup hostnames.default" is not working.
The services seem to be working perfectly fine, but they still cannot be accessed.
I'm using Calico, and Flannel was also tried.
I have tried so many tutorials for exposing services, and none of them can be accessed.
I'm new to Kubernetes, so please help me if anyone can.
If you are on any public cloud, you are not supposed to see a public IP address in the output of the ip a command, but the port will still be exposed on 0.0.0.0:31105.
Here is a sample file you can use to verify your configuration:
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: app-name
  name: bss
  namespace: default
spec:
  externalIPs:
  - 172.16.2.2
  - 172.16.2.3
  - 172.16.2.4
  externalTrafficPolicy: Cluster
  ports:
  - port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    k8s-app: bss
  sessionAffinity: ClientIP
  type: LoadBalancer
status:
  loadBalancer: {}
Just replace <private-ip> under externalIPs: and curl your public IP on your node port, as sketched below.
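A hedged example of that check (replace <node-ip> with one of the addresses shown by the first command; 31105 is the NodePort from the describe output above):

kubectl get nodes -o wide
curl http://<node-ip>:31105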
If you are using any cloud to deploy the application, also verify in the cloud security groups/firewall configuration that the port is open.
Hope this may help.
Thank you!
My k8s cluster has 1 master and 1 node.
The service pod is running on the node.
So I used http://nodeip:31105, and it shows "Hello Kubernetes!".
But http://masterip:31105 is still not working; is that expected?
I checked the listening ports, and 31105 is being listened on on the master.