How sockets or communication channels are maintained in a distributed system - Kubernetes

I am new to distributed systems and ran into this problem when I needed to deploy a gRPC service to Kubernetes (GKE). As far as I know, when a client initiates an RPC, it creates a long-lived HTTP/2 connection, and further calls are multiplexed on it. I would like to push notifications or similar messages to the client through this connection. If I deploy to multiple pods, the connections are spread across them, and I am not sure what the best way is to locate the pod that holds a given client's channel. A possible solution: as soon as a user initiates a connection, store a mapping of clientId to pod IP (or some other identifier) in a centralized service, so that other pods can look up the right pod and forward the message to it. Is something like this advisable, or is there an existing solution for it? I am unfamiliar with this space, and any suggestion is highly appreciated.
While looking at deployment options I stumbled upon GKE; other cloud deployment options were limited because of my use of gRPC/HTTP/2. Thanks for mentioning service discovery; that or a service mesh might be an option. With gRPC, the client maintains a long-lived connection to a single pod. So I want every pod to be able to query, based on a unique clientId (clients can make an initial register RPC call), which pod the client is connected to, so that it can make use of this connection, and I also need a way for pods to forward messages between them. Something like this: when I get a registration call from a client, I update the central registry with the client and pod IP; any pod can then look it up and forward the message to that pod, which in turn forwards it to the client through the existing streaming connection. You are guiding me in the right direction; please let me know whether the above is possible in a container environment.
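To make the registry idea concrete, here is a minimal Python sketch of it. Everything in it is hypothetical: `ClientRegistry`, `route_message`, and the two callbacks are illustrative names, and a plain dict stands in for whatever shared store (a database or key-value service reachable from every pod) would hold the mapping in a real cluster.

```python
from typing import Callable, Dict, Optional


class ClientRegistry:
    """Maps a client ID to the pod that holds its streaming connection.

    A dict stands in for a shared store; this is an assumption made so
    that the sketch runs on its own.
    """

    def __init__(self) -> None:
        self._pod_by_client: Dict[str, str] = {}

    def register(self, client_id: str, pod_ip: str) -> None:
        # Called by the pod that receives the client's initial register RPC.
        self._pod_by_client[client_id] = pod_ip

    def unregister(self, client_id: str) -> None:
        # Called when the stream closes, so stale entries do not linger.
        self._pod_by_client.pop(client_id, None)

    def lookup(self, client_id: str) -> Optional[str]:
        return self._pod_by_client.get(client_id)


def route_message(
    registry: ClientRegistry,
    my_pod_ip: str,
    client_id: str,
    message: str,
    deliver_locally: Callable[[str, str], None],
    forward_to_pod: Callable[[str, str, str], None],
) -> str:
    """Deliver on the local stream, or hand the message to the owning pod."""
    pod_ip = registry.lookup(client_id)
    if pod_ip is None:
        return "unknown-client"
    if pod_ip == my_pod_ip:
        deliver_locally(client_id, message)
        return "delivered"
    forward_to_pod(pod_ip, client_id, message)
    return "forwarded"
```

In practice `deliver_locally` would write to the open gRPC stream and `forward_to_pod` would be a pod-to-pod call; both are left as callbacks here because the question does not specify them.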
Another idea: you can use Envoy proxy.
If you are using GKE, these posts are helpful.

I'd suggest starting from the Kubernetes Service concept and service discovery. External HTTP(S) Load Balancing should fit your needs.
In case you need something more sophisticated, Envoy proxy + Network Load Balancing could be a solution, as mentioned here.

It sounds like you want to implement some kind of Pub-Sub system.
First you should do a back-of-the-envelope calculation of the scale, such as how many clients and how many messages per second you expect.
Then you can choose whether to implement it yourself or pick an off-the-shelf system, such as

I just want to add more explanations to the existing answers here.
Since requests in HTTP/2 are multiplexed (multiple requests can be active on the same connection at any point in time) and a Kubernetes Service balances connections rather than individual requests, all requests on a connection end up pinned to a single Kubernetes pod. Hence, we need something like a service mesh to shift from connection-based balancing to request-based balancing. The Envoy proxy mentioned here is one example.
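The difference between the two balancing modes can be shown with a small simulation. This is only a sketch: the pod names and both function names are made up, and round-robin is used as the selection policy purely to keep the output deterministic.

```python
from collections import Counter
from itertools import cycle
from typing import List


def connection_level(pods: List[str], connections: int, requests_per_connection: int) -> Counter:
    """A pod is picked once per connection; every request on it hits that pod."""
    counts: Counter = Counter()
    next_pod = cycle(pods)
    for _ in range(connections):
        pod = next(next_pod)
        counts[pod] += requests_per_connection
    return counts


def request_level(pods: List[str], connections: int, requests_per_connection: int) -> Counter:
    """A pod is picked for every single request, as an L7 proxy would do."""
    counts: Counter = Counter()
    next_pod = cycle(pods)
    for _ in range(connections * requests_per_connection):
        counts[next(next_pod)] += 1
    return counts
```

With one long-lived gRPC connection carrying 300 requests to three pods, `connection_level` puts all 300 on one pod, while `request_level` gives each pod 100.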
I'd recommend reading this good article from the Kubernetes blog


Kubernetes-services load balancing

I have read this question which is very similar to what I am asking, but still wanted to write a new question since the accepted answer there seems very incomplete and also potentially wrong.
Basically, it seems like there is some missing or contradictory information regarding built-in load balancing for regular Kubernetes Services (I am not talking about LoadBalancer services). For example, the official Cilium documentation states that "Kubernetes doesn't come with an implementation of Load Balancing". In addition, I couldn't find any information in the official Kubernetes documentation about load balancing for internal services (there was only a section discussing this under ingresses).
So my question is - how does load balancing or distribution of requests work when we make a request from within a Kubernetes cluster to the internal address of a Kubernetes service?
I know there's a Kubernetes proxy on each node that creates the DNS records for such services, but what about services that span multiple pods and nodes? There's got to be some form of request distribution or load-balancing, or else this just wouldn't work at all, no?
A standard Kubernetes Service provides basic load-balancing. Even for a ClusterIP-type Service, the Service has its own cluster-internal IP address and DNS name, and forwards requests to the collection of Pods specified by its selector.
In normal use, it is enough to create a multiple-replica Deployment, set a Service to point at its Pods, and send requests only to the Service. All of the replicas will receive requests.
The documentation discusses the implementation of internal load balancing in more detail than an application developer normally needs. Unless your cluster administrator has done extra setup, you'll probably get round-robin or random request routing (the default iptables proxy mode picks a backend at random), so over time the requests are spread roughly evenly across the Pods.
... the official Cilium documentation states ...
This is almost certainly a statement about external load balancing. From a cluster administrator's (not a programmer's) point of view, a "plain" Kubernetes installation doesn't include an external load-balancer implementation, and a LoadBalancer-type Service behaves identically to a NodePort-type Service.
There are obvious deficiencies to this kind of even request distribution, most notably if you wind up having individual network requests that take a long time and a lot of resources to service. As an application developer, the best way to address this is to make these very-long-running requests run asynchronously: return something like an HTTP 201 Created status with a unique per-job URL, and do the actual work in a separate queue-backed worker.
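A minimal sketch of that asynchronous pattern in Python follows. It uses an in-process `queue.Queue` and a thread as stand-ins for a real message queue and a separate worker deployment; `submit_job`, `get_job`, `worker`, and the `/jobs/<id>` URL shape are all hypothetical.

```python
import queue
import threading
import uuid
from typing import Any, Callable, Dict, Tuple

jobs: Dict[str, Dict[str, Any]] = {}         # job ID -> status record
work_queue: "queue.Queue" = queue.Queue()     # stand-in for a real queue


def submit_job(payload: Any) -> Tuple[int, str]:
    """Request handler: enqueue the work and return immediately."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "pending", "result": None}
    work_queue.put((job_id, payload))
    return 201, f"/jobs/{job_id}"


def get_job(url: str) -> Dict[str, Any]:
    """Status handler for the per-job URL returned by submit_job."""
    return jobs[url.rsplit("/", 1)[-1]]


def worker(handler: Callable[[Any], Any]) -> None:
    """Worker loop: runs the slow work outside the request path."""
    while True:
        job_id, payload = work_queue.get()
        try:
            jobs[job_id]["result"] = handler(payload)
            jobs[job_id]["status"] = "done"
        except Exception:
            jobs[job_id]["status"] = "failed"
        finally:
            work_queue.task_done()
```

The client polls the returned URL until the status is no longer `pending`, so no single request stays open long enough to distort the load distribution.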

Launch a specific Pod via API and connect from outside

I am currently designing a system where users should be able to start a simulation through a Web Portal and then connect to it with a gRPC client (amongst other things). After the user is finished the simulation then terminates. I want to run the whole system in a kind of microservice architecture in a kubernetes cluster if possible. This is however my first time working with kubernetes and I am unsure if it is possible to achieve this.
As far as I could gather from reading the documentation and googling around, it seems like I should be able to launch a pod by calling POST /api/v1/namespaces/{namespace}/pods and make it available under the host IP by setting hostPort. However, what I don't know is how I would determine a free port on the Node to deploy to, or how to let Kubernetes decide that (if hostPort is even the correct choice for this). After that it should be pretty straightforward: send the user the IP:Port to connect to, and they just plug that into their gRPC client.
Any suggestions on how to best achieve this?
Using hostPort is generally not recommended, so you'd be better off defining a Service and accessing your Pod through it. In your case you can define a NodePort Service and let Kubernetes decide on the port. Then, fetch the allocated port using the Kubernetes API.
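As a rough illustration in Python, the Service object could be built like this. The helper names, the Service name, and the label selector are made up; the point is only that `nodePort` is left out of the request so that Kubernetes allocates one, and that the allocated value appears under `spec.ports[].nodePort` in the object the API returns.

```python
from typing import Any, Dict


def nodeport_service_manifest(name: str, selector: Dict[str, str], port: int) -> Dict[str, Any]:
    """Body for POST /api/v1/namespaces/{namespace}/services."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            "selector": selector,
            # nodePort is omitted on purpose so that Kubernetes picks a free one.
            "ports": [{"port": port, "targetPort": port, "protocol": "TCP"}],
        },
    }


def allocated_node_port(service: Dict[str, Any]) -> int:
    """Read the port Kubernetes assigned from the returned Service object."""
    return service["spec"]["ports"][0]["nodePort"]
```

The portal would then hand the user a node address together with the value from `allocated_node_port`.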

What are the best practices for a health check API and probes in micro-services Kubernetes environment?

We are developing tons of micro-services. They all run in Kubernetes. As ops, I need to define probes for each micro-service. So we will create a health check API for each micro-service. What are the best practices for this API? What are the best practices for probes? Do we need to check the service's health only or the database connection too (and more)? Is it redundant? The databases are in Kubernetes too, and have their own probes too. Can we just use the /version API as the probe?
I'm looking for feedback and documentation. Thank you.
An argument for including databases and other downstream dependencies in the health check is the following:
Assume you have a load balancer exposing some number of micro-services to the outside world. If the database of one of these micro-services goes down under heavy load, and this is not reflected in the health check of the micro-service, the load balancer will still try to direct traffic to the micro-service, further aggravating the problem the database is experiencing.
If instead the health check included downstream dependencies, the load balancer would stop directing traffic to the micro-service (and hopefully show a nice error message to the user). This would give the database time to recover from the increase in load (and give ops time to react).
So I would argue that using a basic /version is not a good idea.
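A health endpoint along these lines could look roughly like the following Python sketch. The `health` function and the check names are hypothetical; each check is just a callable that returns True when the dependency is reachable.

```python
from typing import Callable, Dict, Tuple


def health(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, str]]:
    """Run every dependency check; report 503 if any of them is down."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            # A check that raises is treated the same as one that fails.
            ok = False
        results[name] = "up" if ok else "down"
    status = 200 if all(state == "up" for state in results.values()) else 503
    return status, results
```

In Kubernetes a check like this is usually wired to a readiness probe rather than a liveness probe, since restarting the container does not fix a database outage.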
A microservice generally calls other microservices/services to retrieve data, and there is a chance that a downstream service may be down. You can use the "Circuit Breaker Pattern". This pattern prevents an application from trying to invoke a remote service or access a shared resource when the operation is highly likely to fail.
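For illustration, a very small circuit breaker might be sketched in Python as below. This is not any particular library's implementation; the class name, the thresholds, and the injectable `clock` parameter are assumptions chosen to keep the example short and testable.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Open: fail fast without touching the remote service.
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

After `max_failures` consecutive errors, further calls fail immediately until `reset_after` seconds have passed, which gives the downstream service room to recover.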
You will find a matching pattern (Health Check) among the observability patterns for microservices. Each service needs to have an endpoint that can be used to check the health of the application, such as /health.

Use service discovery to dispatch jobs with same ID to same worker node

(Apology in advance for the noob question; I have zero experience with DevOps.)
In my recent project I stumbled upon this problem that I don’t know if service discovery tools (such as Consul/Istio/etc.) can address.
Our use case is this: we have a VoIP app similar in idea to Discord. Users can join a voice channel and start talking. However, to forward the voice packets between users in the same voice channel, their WebRTC voice connections need to be handled by the same server process, so that we can process and forward all the voice packets in a voice channel in memory.
In order to do this, we have a separate service (call it service X) in front of our voice service (service V) that receives a user request to join channel N and, based on N, assigns a server in service V to the user. We need to guarantee that for the same channel N, X always picks the same server in V.
We implemented this in a non-scalable way just for quick prototyping. Now that we want to implement this properly, I’m wondering if tools like Consul/Istio/etc. can help us in this scenario. Is there a common approach to this kind of problem?
Istio won't necessarily help you since it's more about controlling traffic, for example doing canary deployments or applying security to your service. Quoted from the docs:
Istio doesn’t provide DNS resolution. Applications can try to resolve the FQDN by using the DNS service present in their platform of choice, for example kube-dns.
You can use the standard Kubernetes service discovery using DNS for Services and Pods. Or, like you mentioned, you can use Consul as a service discovery tool. The added benefit of using something like Consul is that, since it's not Kubernetes-specific, you could potentially also use it for services outside your Kubernetes cluster or in other Kubernetes clusters.
Since it sounds like your connections come and go, in order to track who joins what channel and what channel talks to what backend you will need to keep state somewhere, such as a database or key-value store.
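A sketch of what that state could look like, in Python: the first join for a channel picks a server and records the choice, and every later join for the same channel reads it back. `ChannelAssigner` and the least-loaded selection rule are hypothetical, and a dict stands in for the database or key-value store.

```python
from typing import Dict, List


class ChannelAssigner:
    """Sticky channel-to-server assignment backed by a key-value mapping."""

    def __init__(self, servers: List[str]) -> None:
        self._servers = list(servers)
        self._server_by_channel: Dict[str, str] = {}  # stand-in for a KV store

    def server_for(self, channel_id: str) -> str:
        server = self._server_by_channel.get(channel_id)
        if server is None:
            # First join for this channel: pick the server with the fewest channels.
            load = {s: 0 for s in self._servers}
            for assigned in self._server_by_channel.values():
                load[assigned] += 1
            server = min(self._servers, key=lambda s: load[s])
            self._server_by_channel[channel_id] = server
        return server

    def release(self, channel_id: str) -> None:
        # Called when the last user leaves, so the channel can be re-placed later.
        self._server_by_channel.pop(channel_id, None)
```

With a real shared store, the read-then-write in `server_for` would have to be atomic (for example a set-if-absent operation), otherwise two instances of service X could assign the same channel to different servers.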

Notify containers of updated pods in Kubernetes

I have some servers I want to deploy in Kubernetes. The clients of those servers will also be in Kubernetes. Clients and servers can independently be deployed or scaled.
The clients must know the list of the servers (IPs). I have an HTTP endpoint on the clients to update the list of the servers while the clients are running (hot config reload).
All this is currently running outside of Kubernetes. I want to migrate to GCP.
What's the industry standard regarding pod updates and notifications? I want to get notified when servers are updated so that I can call the endpoints on the clients to update the list of the servers.
I can't use a LoadBalancer since the clients really need to call a specific server (the business logic is in the clients).
The standard way to call a group of pods that offer some functionality is a Service. If you don't want the automated load-balancing or the single IP address that regular Services provide, you should look into headless Services. A DNS lookup for a headless Service returns a list of A records that point to the pods behind the Service. This list is automatically updated as pods become available or unavailable.
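From a client pod, reading that list can be as simple as a DNS lookup. The Python sketch below uses the standard `socket.getaddrinfo`; the function name `resolve_pod_ips` and the injectable `resolver` parameter are illustrative, and the hostname in the usage note is a made-up Service name.

```python
import socket
from typing import Any, Callable, List


def resolve_pod_ips(hostname: str, port: int,
                    resolver: Callable[..., List[Any]] = socket.getaddrinfo) -> List[str]:
    """Return the distinct IPv4 addresses behind a DNS name, sorted."""
    infos = resolver(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})
```

Inside the cluster this would be called with something like `resolve_pod_ips("my-servers.default.svc.cluster.local", 8080)`, assuming a headless Service named `my-servers` in the `default` namespace.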
While I think modifying an existing script to just pull the list from a headless Service is much simpler, it might be worth mentioning CRDs (Custom Resource Definitions) as well.
You could build a custom controller that listens to service events and then posts the data from that event to an HTTP endpoint of another Service or Ingress. The custom resource would define which service to watch and where to post the results.
Though, this is probably a much heavier-weight solution than just having a sidecar / separate container in a pod polling the Service for changes (which sounds closer to your existing model).
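The polling variant could be sketched in Python like this. The names are hypothetical: `fetch_servers` would be whatever returns the current server list (for instance a DNS lookup of the headless Service), and `notify` would call the clients' existing hot-reload HTTP endpoint.

```python
import time
from typing import Callable, Iterable, List, Optional, Set


def watch_once(previous: Set[str], current: Iterable[str],
               notify: Callable[[List[str]], None]) -> Set[str]:
    """Compare two snapshots and notify only when the server set changed."""
    current_set = set(current)
    if current_set != previous:
        notify(sorted(current_set))
    return current_set


def poll(fetch_servers: Callable[[], Iterable[str]],
         notify: Callable[[List[str]], None],
         interval_s: float = 5.0,
         iterations: Optional[int] = None,
         sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll the server list; iterations=None means run forever."""
    known: Set[str] = set()
    done = 0
    while iterations is None or done < iterations:
        known = watch_once(known, fetch_servers(), notify)
        done += 1
        sleep(interval_s)
```

The `iterations` and `sleep` parameters exist only so the loop can be exercised in a test; a sidecar would normally run it unbounded.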
I upvoted Alassane's answer, as I think it is the right first approach to try before building a CRD.