I'm looking to migrate to AWS Fargate to host a number of containers load balanced via HAProxy. An elegant approach seems to be combining AWS Cloud Map for service discovery with HAProxy's DNS server-template syntax to auto-populate the backend servers.
However, it's come to my attention that Route 53, the system underlying Cloud Map, returns at most 8 A or SRV records. From the HAProxy documentation it sounds like HAProxy will continuously mark the nodes not returned in the latest DNS response as unhealthy, which would lead to backends being constantly dropped and re-added to the HAProxy pool even when they're all healthy.
I can only assume this is something others have encountered before. Is there a trick to get HAProxy to accommodate the maximum of 8 returned records?
HAProxy supports DNS service discovery with the server-template directive. Make sure you configure a resolvers section and reference it with the resolvers keyword on the server-template line. There's a blog post here. If you find that you need to accommodate more records, you can adjust the accepted_payload_size setting in the resolvers section.
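A minimal sketch of what that could look like, assuming a Cloud Map service registered as myservice.example.local, backends listening on port 8080, and a resolver pointing at the VPC DNS endpoint (all names and addresses here are placeholders):

resolvers awsdns
    nameserver dns1 10.0.0.2:53         # placeholder VPC resolver address
    accepted_payload_size 8192          # allow larger DNS responses (EDNS0)
    hold valid 10s

backend be_myservice
    balance roundrobin
    # Create up to 10 server slots and fill them from the A/SRV records
    # returned for myservice.example.local via the awsdns resolvers section
    server-template srv 10 myservice.example.local:8080 check resolvers awsdns init-addr none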
Related
For StatefulSets, I am wondering if it's possible to put each replica behind a virtual IP, perhaps using a Service, so that we have the same connection and DNS behavior for per-replica hostnames as we do for the ClusterIP hostname we get with a non-headless Service.
When we use replica hostnames, we seem to lose the load balancing and connection management provided by the virtual IP, and that causes problems for our app.
You can improve your DNS requests by deploying NodeLocal DNSCache, which helps reduce the average DNS lookup time. The local DNS cache can be used with the kube-dns ConfigMap to automatically pick up stub domains and upstream nameservers.
You can enable this feature in an existing cluster by adding --update-addons with the argument NodeLocalDNS=ENABLED, as shown in the following example:
gcloud container clusters update CLUSTER_NAME \
--update-addons=NodeLocalDNS=ENABLED
You can find more information regarding this feature in the NodeLocal DNSCache documentation.
Also, to set up a Service when you are using a StatefulSet, you can use the statefulset.kubernetes.io/pod-name label; this label allows you to attach a Service to a specific pod.
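For example, a per-replica Service could look roughly like this, assuming a StatefulSet named web whose pods expose port 80 (the names are illustrative); Kubernetes automatically adds the statefulset.kubernetes.io/pod-name label to each StatefulSet pod, and that is what the selector matches on:

apiVersion: v1
kind: Service
metadata:
  name: web-0                      # one Service per replica you want a virtual IP for
spec:
  selector:
    statefulset.kubernetes.io/pod-name: web-0   # label added automatically by the StatefulSet controller
  ports:
    - port: 80
      targetPort: 80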
In addition, you can configure a health check to verify that each backend responds to traffic. If a backend fails to respond, it will be marked as unhealthy and traffic will be served by the healthy backends.
I have a statefulset that I need to run using the host network, purely for performance reasons. But I also want to be able to reference service-name endpoints. Is it possible to do this? ClusterFirstWithHostNet does not work because it doesn't prioritize using the host's network. The dnsConfig configuration might be promising, but I don't know how I would configure it to do what I'm asking about.
This is a community wiki answer. Feel free to expand it.
It might be possible if the app can select a random port to listen on during startup and switch if the port is busy. However, Kubernetes is not involved in selecting the port for the application.
A StatefulSet requires a headless Service, so it doesn't have a ClusterIP and instead works as a set of DNS records in CoreDNS. With hostNetwork, an A record would probably contain the same IP for replicas running on the same node, but an SRV record may actually provide a proper endpoint.
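As a rough way to check what the headless Service actually publishes, you can query the cluster DNS from inside a pod, assuming a headless Service named my-svc in the default namespace with a port named tcp-port (all names are placeholders):

# one A record per pod; with hostNetwork these may collide on the node IP
dig +short my-svc.default.svc.cluster.local A

# SRV records also carry the port and the per-pod target name
dig +short _tcp-port._tcp.my-svc.default.svc.cluster.local SRV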
For further reference, please take a look at the below sources:
How do I get individual pod hostnames in a Deployment registered and looked up in Kubernetes?
SRV records
I'm hosting a Kubernetes cluster on VMs/VPS from a cloud provider that doesn't offer any Kubernetes integration at all. Each node has a dedicated public IP address, and to allow traffic to reach the worker nodes I'm defining my Service with spec.externalIPs set to the fixed list of those IP addresses.
I'm looking for a way to get that list updated automatically when a node is drained or goes down.
I had a look at the existing operators from https://operatorhub.io/ but I haven't found any that seem to cover my use case.
The idea would be that when a node transitions to NotReady, the Service is updated so that it only lists the nodes that are Ready.
Is there any operator that could allow doing that?
After some time working on this, I finally figured out that this is not possible; at least today, there's no known operator or anything of the sort that could update that field with the IP addresses.
And even if there were, there would be delays in updating the DNS records.
What I've done instead is buy another VPS and install HAProxy on it to proxy the Kubernetes API traffic to the master nodes, and the web traffic (both 80 and 443) to the Kubernetes worker nodes.
HAProxy monitors the nodes and adds/removes them automagically and very quickly.
With this, you just need one DNS record pointing to the load balancer (or to a VIP shared by several load balancers to avoid a SPOF), and HAProxy will do the rest!
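A rough sketch of such an HAProxy configuration (the node addresses and ports below are placeholders for your own masters and workers):

# Kubernetes API traffic to the master nodes
frontend k8s_api
    bind *:6443
    mode tcp
    default_backend k8s_masters

backend k8s_masters
    mode tcp
    balance roundrobin
    server master1 10.0.0.11:6443 check
    server master2 10.0.0.12:6443 check

# Web traffic to the worker nodes (443 shown; port 80 is analogous)
frontend web_https
    bind *:443
    mode tcp
    default_backend k8s_workers

backend k8s_workers
    mode tcp
    balance roundrobin
    server worker1 10.0.0.21:443 check
    server worker2 10.0.0.22:443 check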
I have an HPC cluster application where I am looking to replace MPI and our internal cluster management software with a combination of Kubernetes and some middleware, most likely ZMQ or RabbitMQ.
I'm trying to design how best to do peer discovery on this system using Kubernetes' service discovery.
I know Kubernetes can provide a DNS name for a given service, and that's great, but is there a way to also dynamically discover ports?
For example, assuming I replaced the MPI middleware with ZeroMQ, I would need a way for ranks (processes on the cluster) to find each other. I know I could simply have the ranks issue service creation messages to the Kubernetes discovery mechanism and get a hostname like myapp_mypid_rank_42 fairly easily, but how would I handle the port?
If possible, it would be great if I could just do:
zmqSocket.connect("tcp://myapp_mypid_rank_42");
but I don't think that would work since I have no port number information from DNS.
How can I have Kubernetes service discovery also provide a port in as simple a manner as possible to allow ranks in the cluster to discover each other?
Note: The registering process knows its port and can register it with the K8s service discovery daemon. The problem is finding a quick and easy way to get that port number back for the processes that want it. The question I'm asking is whether there is a mechanism as simple as a DNS hostname, or whether I'll need to explicitly query both hostname and port number from the k8s daemon rather than simply building a hostname based on some agreed-upon rule (like building a string from myapp_mypid_myrank).
Turns out the best way to do this is with a DNS SRV record:
https://kubernetes.io/docs/concepts/services-networking/service/#discovering-services
https://en.wikipedia.org/wiki/SRV_record
A DNS SRV record provides both a hostname/IP and a port for a given request.
Luckily, Kubernetes service discovery supports SRV records and provides them on the cluster's DNS.
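For named ports, the SRV name follows the pattern _port-name._port-protocol.service.namespace.svc.cluster.local. As a quick illustration, assuming a headless Service called ranks in the default namespace with a port named zmq (both names are placeholders), a lookup from inside a pod returns one answer per backend:

# each answer has the form: priority weight port target (the target is the pod's DNS name)
dig +short _zmq._tcp.ranks.default.svc.cluster.local SRV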
I think in the most common case you should already know the port number used to access your services.
But if it is useful, Kubernetes adds some environment variables to every pod to ease autodiscovery of all services, for example {SVCNAME}_SERVICE_HOST and {SVCNAME}_SERVICE_PORT. The docs cover this.
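As a small illustration, for a hypothetical Service named rank-registry (dashes become underscores and the name is upper-cased), a container could read:

# only Services that existed when the pod was created are reflected in its environment
echo "$RANK_REGISTRY_SERVICE_HOST:$RANK_REGISTRY_SERVICE_PORT"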
In a Kubernetes cluster where everything is highly available, DNS is a key piece of the system; everything relies on it.
The kube2sky pod has a parameter "-kube_master_url" where, as far as I know, you can only specify one API server node.
You might have multiple API servers behind a service for redundancy, but if the one that kube2sky is using goes down, the whole DNS system goes down too, and hence the high availability of the cluster is gone.
For other pods, you can use the internal DNS name of the API server service, but in this case you can't, since this is the DNS service itself.
Any idea how to solve this issue?
In its standard configuration, kube2sky doesn't actually rely on having a single apiserver IP address to use. Instead, it uses the virtual IP of the kubernetes service that gets auto-created in every cluster, and which the kube-proxy sets up iptables rules for. It's briefly explained in the docs on github.
Also, it's recommended that replicated masters are put behind a load balancer in such high-availability configurations to avoid problems like this with client tools.