I have many services connected to ZooKeeper, and I want service A to be able to get service B's IP once service B has connected to ZooKeeper. Is there an API that can do that, or do I have to write all the services' IPs down in a separate config file?
Take a look and see if this solves your problem:
http://curator.apache.org/curator-x-discovery/
ZooKeeper doesn't provide service discovery out of the box, but it is easy to implement it yourself.
You won't be able to get the IP addresses of other connected clients (services, in your case) straight from the ZooKeeper API. To discover the other services connected to the cluster, each service has to create an ephemeral znode under a specific path, e.g. /services, and set the necessary addressing and naming info (IP, port, etc.) as the znode's data. You can then list that path to discover the active services, or watch the /services path for any changes in your service configuration.
Since the services create ephemeral nodes, the nodes will automatically be removed once a service disconnects and its session expires. Of course, once you start doing something like this, you will find there are many small details and design decisions to make, hence the already mentioned Curator recipe.
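To make that concrete, here is a minimal sketch using the plain ZooKeeper Java API; the ensemble address, the pre-created /services parent znode and the serviceB name and address are just placeholders for this example, and the Curator discovery recipe wraps this same pattern up for you:

    import org.apache.zookeeper.*;
    import org.apache.zookeeper.ZooDefs.Ids;
    import java.nio.charset.StandardCharsets;
    import java.util.List;
    import java.util.concurrent.CountDownLatch;

    public class ServiceRegistration {
        public static void main(String[] args) throws Exception {
            // Assumption: a ZooKeeper ensemble at localhost:2181 and an existing /services parent znode.
            CountDownLatch connected = new CountDownLatch(1);
            ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {
                if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            });
            connected.await();

            // Each service registers itself as an ephemeral child of /services,
            // storing its address (IP:port) as the znode's data.
            zk.create("/services/serviceB", "10.0.0.12:8080".getBytes(StandardCharsets.UTF_8),
                      Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

            // Any other client can list /services to see what is currently alive,
            // and pass watch=true to be notified when the set of children changes.
            List<String> children = zk.getChildren("/services", true);
            for (String child : children) {
                byte[] address = zk.getData("/services/" + child, false, null);
                System.out.println(child + " -> " + new String(address, StandardCharsets.UTF_8));
            }
        }
    }

Once the registering service's session expires, its znode disappears and the next listing (or the watch) reflects that, which is the failure-detection behaviour described above.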
We have created a StatefulSet and a headless service. There are two ways we can define the peer IPs in the application:
Use 'cassandra-headless-service-name' in contactPoints (see the sketch after this list)
Fetch the peer IPs from the headless service, externalize them, and read these IPs when initializing the connection.
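A minimal sketch of option 1 with the DataStax Java driver 4.x could look like this; the service name, namespace, port and datacenter below are placeholders, not our actual values:

    import com.datastax.oss.driver.api.core.CqlSession;
    import java.net.InetSocketAddress;

    public class CassandraClient {
        public static void main(String[] args) {
            // Point the driver at the headless service DNS name. Kubernetes resolves it
            // to current pod IPs, and the driver discovers the rest of the ring from
            // whichever node it reaches first.
            try (CqlSession session = CqlSession.builder()
                    .addContactPoint(new InetSocketAddress(
                            "cassandra-headless.default.svc.cluster.local", 9042))
                    .withLocalDatacenter("datacenter1")
                    .build()) {
                System.out.println(session.execute("SELECT release_version FROM system.local")
                                          .one().getString("release_version"));
            }
        }
    }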
So far so good.
The above will work if one or some pods are restarted, but not all; in that case the driver will pick up the new IPs automatically.
But how will this work in the case of a complete outage? If all pods go down and come back with changed IPs (IPs can change in Kubernetes), how will the application connect to Cassandra?
In a complete outage, you're right, the application will not have any valid endpoints for the cluster. Those will need to be refreshed (and the app restarted) before the app will connect to Cassandra.
We actually wrote a RESTful API that we can use to query the current, valid endpoints by cluster. That way, the app teams can find the current IPs for their cluster at any time. I recommend doing something similar.
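Another option, if you don't want a separate API, is to re-resolve the headless service's DNS name when the app (re)starts, since a headless service returns one A record per ready pod. A rough sketch, with the service name as a placeholder:

    import java.net.InetAddress;

    public class EndpointRefresh {
        public static void main(String[] args) throws Exception {
            // A headless service resolves to one A record per ready pod, so resolving it
            // at (re)start yields the cluster's current IPs. The name below is a placeholder.
            InetAddress[] peers = InetAddress.getAllByName(
                    "cassandra-headless.default.svc.cluster.local");
            for (InetAddress peer : peers) {
                System.out.println("current Cassandra endpoint: " + peer.getHostAddress());
            }
        }
    }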
I am new to distributed systems, and I ran into this problem when I needed to deploy a gRPC service to Kubernetes (GKE). As far as I know, when a client initiates an RPC, it creates a long-lived HTTP/2 connection and further calls are multiplexed on it. I would like to send/push notifications or similar messages to the client through this connection. If I deploy multiple pods, the connections are spread across them, and I am not sure of the best way to locate the instance whose channel is registered to a given client. A possible solution could be: as soon as a user initiates a connection, keep a reference of clientId and pod IP (or some other identification) in a centralized service, and have the other pods look up that pod and forward the message to it. Is something like this advisable, or is there an existing solution for it? I am unfamiliar with this space and any suggestion is highly appreciated.
Edit: (response to #mebius99)
While looking at deployment options I stumbled upon GKE, and other cloud deployment options were limited because of my use of gRPC/HTTP2. Thanks for mentioning service discovery; that or a service mesh might be an option. With gRPC, the client maintains a long-lived connection to a single pod. So I want every pod to be able to query, based on a unique clientId (clients can make an initial register RPC call), which pod the client is connected to, so it can make use of that connection, and also a way for pods to forward the message between them. So, something like this: when I get a registration call from a client, I update the central registry with the client and pod IP, then any pod can look it up and forward the package to it, so it can forward it further to the client through the existing streaming connection (a rough sketch of this idea is below). You are guiding me in the right direction; please let me know whether the above is possible in a container environment.
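Here is a rough sketch of the registry idea I describe; the ClientRegistry interface, the POD_IP environment variable and the forwarding step are hypothetical placeholders for this sketch, not an existing library:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical registry mapping clientId -> the pod that holds the client's stream.
    // In practice this would be backed by something shared (Redis, etcd, a database),
    // not an in-memory map, which only works inside a single pod.
    interface ClientRegistry {
        void register(String clientId, String podIp);
        String lookup(String clientId);
    }

    class InMemoryClientRegistry implements ClientRegistry {
        private final Map<String, String> entries = new ConcurrentHashMap<>();
        public void register(String clientId, String podIp) { entries.put(clientId, podIp); }
        public String lookup(String clientId) { return entries.get(clientId); }
    }

    class NotificationRouter {
        private final ClientRegistry registry;
        private final String myPodIp = System.getenv("POD_IP"); // assumed to be injected, e.g. via the downward API

        NotificationRouter(ClientRegistry registry) { this.registry = registry; }

        // Called when a client makes its initial "register" RPC against this pod.
        void onClientRegistered(String clientId) {
            registry.register(clientId, myPodIp);
        }

        // Called by any pod that wants to push a message to a client.
        void push(String clientId, String message) {
            String targetPod = registry.lookup(clientId);
            if (targetPod != null && targetPod.equals(myPodIp)) {
                // This pod holds the client's HTTP/2 stream: write the message to it directly.
            } else if (targetPod != null) {
                // Forward to the owning pod, e.g. over an internal gRPC call to targetPod.
            }
        }
    }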
thank you.
Another idea: you can use the Envoy proxy.
If you are using GKE, these posts are helpful.
https://cloud.google.com/solutions/exposing-grpc-services-on-gke-using-envoy-proxy
https://github.com/GoogleCloudPlatform/grpc-gke-nlb-tutorial
I'd suggest starting from the Kubernetes Service concept and service discovery. External HTTP(S) Load Balancing should fit your needs.
In case you need something more sophisticated, Envoy proxy + Network Load Balancing could be a solution, as mentioned here.
It sounds like you want to implement some kind of Pub-Sub system.
First, do some back-of-the-envelope calculation of the scale, such as how many clients and how many messages per second.
Then you can choose whether to implement it yourself or pick an off-the-shelf system, such as https://doc.akka.io/docs/alpakka/current/google-cloud-pub-sub-grpc.html
I just want to add more explanations to the existing answers here.
Since requests in HTTP/2 are multiplexed (multiple requests can be active on the same connection at any point in time), all of a client's requests end up pinned to a single Kubernetes pod. Hence, we need to configure a service mesh to shift from connection-based balancing to request-based balancing. The Envoy proxy mentioned here is one example.
I'd recommend everyone read this good article from the Kubernetes blog: https://kubernetes.io/blog/2018/11/07/grpc-load-balancing-on-kubernetes-without-tears.
I have an HPC cluster application where I am looking to replace MPI and our internal cluster management software with a combination of Kubernetes and some middleware, most likely ZMQ or RabbitMQ.
I'm trying to design how best to do peer discovery on this system using Kubernetes' service discovery.
I know Kubernetes can provide a DNS name for a given service, and that's great, but is there a way to also dynamically discover ports?
For example, assuming I replaced the MPI middleware with ZeroMQ, I would need a way for ranks (processes on the cluster) to find each other. I know I could simply have the ranks issue service creation messages to the Kubernetes discovery mechanism and get a hostname like myapp_mypid_rank_42 fairly easily, but how would I handle the port?
If possible, it would be great if I could just do:
zmqSocket.connect("tcp://myapp_mypid_rank_42");
but I don't think that would work since I have no port number information from DNS.
How can I have Kubernetes service discovery also provide a port in as simple a manner as possible to allow ranks in the cluster to discover each other?
Note: The registering process knows its port and can register it with the K8s service discovery daemon. The problem is a quick and easy way to get that port number back for the processes that want it. The question I'm asking is whether there is a mechanism as simple as a DNS host name, or whether I will need to explicitly query both the hostname and port number from the k8s daemon rather than simply building a hostname based on some agreed-upon rule (like building a string from myapp_mypid_myrank).
Turns out the best way to do this is with a DNS SRV record:
https://kubernetes.io/docs/concepts/services-networking/service/#discovering-services
https://en.wikipedia.org/wiki/SRV_record
A DNS SRV record provides both a hostname/IP and a port for a given request.
Luckily, Kubernetes service discovery supports SRV records and provides them on the cluster's DNS.
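For example, plain Java can read such a record through JNDI's DNS provider. A rough sketch, where the port name, service name and namespace are placeholders for whatever you register:

    import javax.naming.directory.Attribute;
    import javax.naming.directory.InitialDirContext;
    import java.util.Hashtable;

    public class SrvLookup {
        public static void main(String[] args) throws Exception {
            // Kubernetes publishes SRV records for *named* ports of a Service as
            // _<port-name>._<protocol>.<service>.<namespace>.svc.cluster.local.
            // The names below are placeholders for this sketch.
            String query = "_zmq._tcp.myapp-rank-42.hpc.svc.cluster.local";

            Hashtable<String, String> env = new Hashtable<>();
            env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
            InitialDirContext dns = new InitialDirContext(env);

            Attribute srv = dns.getAttributes(query, new String[] {"SRV"}).get("SRV");
            for (int i = 0; i < srv.size(); i++) {
                // SRV record data: "<priority> <weight> <port> <target>"
                String[] parts = srv.get(i).toString().split(" ");
                String port = parts[2];
                String host = parts[3];
                System.out.println("connect to tcp://" + host + ":" + port);
            }
        }
    }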
I think in the most common case you should already know the port number used to access your services.
But if it is useful, Kubernetes adds some environment variables to every pod to ease autodiscovery of all services, for example {SVCNAME}_SERVICE_HOST and {SVCNAME}_SERVICE_PORT. Docs here
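For example, assuming a Service named my-service (the service name is upper-cased and dashes become underscores in the variable names):

    public class ServiceEnv {
        public static void main(String[] args) {
            // These variables are injected by the kubelet into pods started after the
            // Service exists; "my-service" is just a placeholder name for this sketch.
            String host = System.getenv("MY_SERVICE_SERVICE_HOST");
            String port = System.getenv("MY_SERVICE_SERVICE_PORT");
            System.out.println("my-service is at " + host + ":" + port);
        }
    }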
I have a scenario where one of our services exposes WCF hosts that receive callbacks from an external service.
These hosts are dynamically created and there may be hundreds of them. I need to ensure that they are all up and running on the node before the node starts receiving requests, so they don't receive failures; this is critical.
Is there a way to ensure that the service doesn't receive requests until I say it's ready? In cloud services I could do this by containing all this code within the OnStart method.
My initial thought is that I might be able to bootstrap this before I open the communication listener - in the hope that the fabric manager only sends requests once this has been done, but I can't find any information on how this lifetime is handled.
There's no "fabric manager" that controls network traffic between your services within the cluster. If your service is up, clients or other services inside the cluster can choose to try to connect to it if they know its address. With that in mind, there are two things you have control over here:
The first is whether or not your service's endpoint is discoverable by other services or clients. This is the point at which your service endpoint is registered with Service Fabric's Naming Service, which occurs when your ICommunicationListener.OpenAsync method returns. At that point, the service endpoint is registered and others can discover it and attempt to connect to it. Of course you don't have to use the Naming Service or the ICommunicationListener pattern if you don't want to; your service can open up an endpoint whenever it feels like it, but if you don't register it with the Naming Service, you'll have to come up with your own service discovery mechanism.
The second is whether or not the node on which your service is running is receiving traffic from the Azure Load Balancer (or any load balancer if you're not hosting in Azure). This has less to do with Service Fabric and more to do with the load balancer itself. In Azure, you can use a load balancer probe to determine whether or not traffic should be sent to nodes.
EDIT:
I added some info about the Azure Load Balancer to our documentation, hope this helps: https://azure.microsoft.com/en-us/documentation/articles/service-fabric-connect-and-communicate-with-services/
I'm looking at consensus-type tools like ZooKeeper, Consul and Eureka and they all seem to market the same set of solutions:
Service discovery
Dynamic, centralized configuration management
Synchronization primitives
Consensus algorithms
However, the more I read about these things, the more I struggle to see how service discovery is really any different from a dynamic, centralized configuration management (KV pair) system.
My understanding (so far) of service discovery is that it allows nodes to dynamically search for, find and connect to remote services. So if an application uses an AuthService for authentication and authorization, it would use service discovery to find an AuthService node at, say, http://auth103.example.org:9103 and use it.
My understanding of dynamic config systems is that they provide a centralized infrastructure for nodes to dynamically receive updates from, as well as publish updates to, config servers. So if an app instance decides it needs to update a configuration for all its other instances, it would contact the config service and update, say, the numPurgerThreads config. The config service would then update all other app instances so that they updated their respective configs properly.
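To illustrate why these look like the same operations to me, here is a sketch against a made-up key-value interface; the paths, names and methods are invented purely for this example:

    // Hypothetical KV interface, just to show that both use cases
    // reduce to the same get/put/watch operations.
    interface KeyValueStore {
        String get(String key);
        void put(String key, String value);
        void watch(String key, Runnable onChange);
    }

    class Example {
        static void discoverAuthService(KeyValueStore kv) {
            // "Service discovery": read where AuthService currently lives.
            String endpoint = kv.get("/services/auth"); // e.g. "http://auth103.example.org:9103"
            kv.watch("/services/auth", () -> System.out.println("AuthService moved"));
            System.out.println("AuthService at " + endpoint);
        }

        static void readDynamicConfig(KeyValueStore kv) {
            // "Dynamic configuration": read a config value and react to changes.
            String threads = kv.get("/config/numPurgerThreads");
            kv.watch("/config/numPurgerThreads", () -> System.out.println("config changed"));
            System.out.println("numPurgerThreads = " + threads);
        }

        static void updateDynamicConfig(KeyValueStore kv) {
            // Publishing a config change that other instances' watches will pick up.
            kv.put("/config/numPurgerThreads", "8");
        }
    }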
But aren't these exactly the same problem?
In both cases, you:
Connect to a lookup service of some sort
Query it for data; or
Publish data to it, which then ripples out to other nodes
Service discovery is dynamic configuration, right?!?!
What I'm really driving at is: Couldn't I just implement one config service with one of these tools, which coincidentally also solves service discovery? Or is there a reason why I would need to have, say, 1 Consul cluster for config/KV management, and another, say, Consul cluster for service discovery?
Well, if you look at it like that, these are all just flavors of databases, or data stores if you will, as you describe it:
Connect to a lookup service of some sort
Query it for data; or
Publish data to it, which then ripples out to other nodes
All of the use cases you mentioned require some sort of data store that multiple clients connect to. What makes some of them more optimal than others for different use cases is their interface, data model, consistency guarantees, etc.
Specifically, in the case of service discovery, a nice feature can be failure detection - e.g. a k/v store that removes service data when the service is no longer connected. This way, you can register a service with your service discovery tool and know that when the service goes down or loses connectivity, it will no longer be present in the stored data.
Just to clarify a few terms: (IMO) dynamic config systems can update their state at runtime, in contrast to static ones where everything is defined beforehand (i.e. config files). Centralized CM means there's a single place to store all config data. So centralized CM can be either static or dynamic, right?
IMO, service discovery has a component that CM doesn't, and that's a protocol for automatic detection of services.
I guess you'd need a dynamic CM (KV store) to implement service discovery, but you couldn't implement a CM store just using a service discovery protocol. Take DHCP for example (I hope we agree it is a service discovery protocol?): Dynamic Host Configuration Protocol. It has a configuration aspect, but also a protocol, which is a bit more than a simple KV store. By the way, it's decentralized, just to illustrate the previous point that CM doesn't have to be centralized.
So service discovery has a configuration aspect, but CM doesn't (necessarily) have a service discovery one. Does this mean SD is a subset of CM? I'd say no.