GKE metadata server pod issues with new nodes during autoscaling - kubernetes

We have an issue with gke workload identity.We enabled workload identity for our prod clusters.when enable workload identity it creates daemon set.whenever you application required service account key to access the gcp resource this make a api call to metadata server for accessing service accounts
Issue:-
In morning we have some nodes autoscaling so when a new node came then first our application pods got scheduled and then only these daemon set pods got scheduled.These time then application pods got restarted because in our application will try to access an api call to metadata server pods for initiating some beans for bigquery client
I read some docs also they are suggesting some priority class.
But this gke daemon set pods already have a priority class but it won't work in this usecase

Related

Architecture Question - User Driven Resource Allocation

I am working on a SaaS application built on Azure AKS. Users will connect to a web frontend, and depending on their selection, will deploy containers on demand for their respective Organization. There will be a backend API layer that will connect to the Kubernetes API for the deployment of different YAML configurations.
Users will select a predefined container (NodeJs container app), and behind the scenes that container will be created from a template and a URL provided to the user to consume that REST API resource via common HTTP verbs.
I read the following blurb on the Kubernetes docs:
You'll rarely create individual Pods directly in Kubernetes—even singleton Pods. This is because Pods are designed as relatively ephemeral, disposable entities. When a Pod gets created (directly by you, or indirectly by a controller), the new Pod is scheduled to run on a Node in your cluster. The Pod remains on that node until the Pod finishes execution, the Pod object is deleted, the Pod is evicted for lack of resources, or the node fails.
I am thinking that that each "organization account" in my application should deploy containers that are allocated a shared context constrained to a Pod, with multiple containers spun up for each "resource" request. This is because, arguably, an Organization would prefer that their "services" were unique to their Organization and not shared with the scope of others. Assume that namespace, service, or pod name is not a concern as each will be named on the backend with a GUID or similar unique identifier.
Questions:
Is this an appropriate use of Pods and Services in Kubernetes?
Will scaling out mean that I add nodes to the cluster to support the
maximum constraint of 110 Pods / node?
Should I isolate these data services / pods from the front-end to its own dedicated cluster, then add a cluster when (if) maximum Node count of 15,000 is reached?
I guess you should have a look at Deployments
A container is in a pod.
A pod is in a deployment
A service exposes a deployment.

Connect to Cassandra on Kubernetes using java-driver

We are bringing up a Cassandra cluster, using k8ssandra helm chart, it exposes several services, our client applications are using the datastax Java-Driver and running at the same k8s cluster as the Cassandra cluster (this is testing phase)
CqlSessionBuilder builder = CqlSession.builder();
What is the recommended way to connect the application (via the Driver) to Cassandra?
Adding all nodes?
for (String node :nodes) {
builder.addContactPoint(new InetSocketAddress(node, 9042));
}
Adding just the service address?
builder.addContactPoint(new InetSocketAddress(service-dns-name , 9042))
Adding the service address as unresolved? (would that even work?)
builder.addContactPoint(InetSocketAddress.createUnresolved(service-dns-name , 9042))
The k8ssandra Helm chart deploys a CassandraDatacenter object and cass-operator in addition to a number of other resources. cass-operator is responsible for managing the CassandraDatacenter. It creates the StatefulSet(s) and creates several headless services including:
datacenter service
seeds service
all pods service
The seeds service only resolves to pods that are seeds. Its name is of the form <cluster-name>-seed-service. Because of the ephemeral nature of pods cass-operator may designate different C* nodes as seed nodes. Do not use the seed service for connecting client applications.
The all pods service resolves to all Cassandra pods, regardless of whether they are readiness. Its name is of the form <cluster-name>-<dc-name>-all-pods-service. This service is intended to facilitate with monitoring. Do not use the all pods service for connecting client applications.
The datacenter service resolves to ready pods. Its name is of the form <cluster-name>-<dc-name>-service This is the service that you should use for connecting client applications. Do not directly use pod IPs as they will change over time.
Adding all nodes?
You definitely do not need to add all of the nodes as contact points. Even in vanilla Cassandra, only adding a few is fine as the driver will gossip and find the rest.
Adding just the service address?
Your second option of binding on the service address is all you should need to do. The nice thing about the service address, is that it will account for changing/removing of IPs in the cluster.

Is there a way of obtaining activity logs from Service objects in Kubernetes?

I have the following situation (this is my motivation to ask the question, not the question itself):
I've got a web application that accepts uploads from users.
The users access the application through an Ingress,
then a Service,
Then a Deployment with two Pods.
Application contained in each Pod.
Sometimes the upload fails:
I can see in the logs from the Pod that the upload went all right.
I can even see the data uploaded by the user.
There are nothing but normal logs in the Pod.
But the ingress reports a HTTP 500 error.
And the users sees a HTTP 500 error - connection reset by peer.
If the Pod seems all right, but ingress complains, then I should check the middle man, the Service. Then I realized that there is no easy way to obtain logs from the service.
So this is my question:
How can I read logs from the Service object ? I mean activity logs, not the deployment events.
Do they exist?
The only resources in K8s that produce logs are Pods! Pods lead to the creation of containers, which for their part lead to the creation of Linux processes on the K8s nodes. Those processes write logs that are "reaped" by the container runtime and made available to K8s, e.g. when you run kubectl logs.
Consequently, only K8s resources that are backed by Pods produce logs, e.g. Deployment, Daemonsets, StatefulSets and Jobs.
Services are merely logical resources that configures how network traffic can be routed to Pods. So, in a way they have underlying Pods, but do not produce any additional log output. The only tangible outcome of a Service resource are iptables rules on the K8s nodes, that define how traffic has to be routed from the Service IP to the IPs of the underlying Pods.
To resolve Ingress related problems, you might get further insights from the logs of your ingress controller which is typically deployed as a deployment and therefore backed by Pods.

Are Headless Services updated "atomic" in Kubernetes?

I am working on a Kubernetes integration of the database Apache IoTDB which supports a Cluster mode. Currently, to start a cluster each node needs to know the IP adresses of all other nodes in its "ensemble" upfront, before starting.
I think the default approach to this in Kubernetes would generally be to use a StatefulSet and a headless Service. And the startup loop I scratched would be something like this
start each pod with a container which contains a "pre-start" script
In the pre-start script wait until all other pods of the set are started and get all their ip adresses from the headless service)
update the configs / env variables with all cluster ip adresses
start the iotdb instance(s)
So the only question I have is for step 2: When do I know that all pods are started?
Is the update of the DNS / A records of the headless Service atomic in the sense that I see all pods or no pod?
OR do I have to query the API Server separately to see the number of replicas and then wait until I got all their records from the headless service?
Or is there a more kubernetes-like way to achieve that?
Thanks already!
When a DNS address of a headless service is resolved, it returns a list of Pods (ie. IPs) from an underlying endpoint object. The endpoint object always holds the list of Ready pods.
So, you will get the list of Ready pods of that moment on resolving headless service DNS.

How do managed Kubernetes providers hide the master nodes?

If I run kubectl get nodes on GKE, EKS, or DigitalOcean Kubernetes, I only see the worker nodes. How are these systems architected at the network or application level to create this separation between workers and masters?
You can run the Kubernetes control plane outside Kubernetes as long as the worker nodes have network access to the control plane. This approach is used on most managed Kubernetes solutions.
A Container Engine cluster is a group of Compute Engine instances running Kubernetes. It consists of one or more node instances, and a managed Kubernetes master endpoint.
Every container cluster has a single master endpoint, which is managed by Container Engine. The master provides a unified view into the cluster and, through its publicly-accessible endpoint, is the doorway for interacting with the cluster.
The managed master also runs the Kubernetes API server, which services REST requests, schedules pod creation and deletion on worker nodes, and synchronizes pod information (such as open ports and location) with service information.
More info can be found here