Distribute public key to all pods in all namespaces automatically - kubernetes

I have a public key that all my pods need to have.
My initial thought was to create a ConfigMap or Secret to hold it, but as far as I can tell neither of those can be used across namespaces. Apart from that, it's a lot of boilerplate to paste the same volume into all my Deployments.
So now I'm left with what are, in my opinion, only bad alternatives, such as creating the same ConfigMap/Secret in every namespace and copy-pasting the volume into every Deployment.
Any other alternatives?
Extra information, added in response to questions:
The key doesn't need to be kept secret (it's a public key), but it needs to be distributed in a trusted way.
It won't rotate often, but when it does, rebuilding all the images is not an option.
Almost all images/pods need this key, and there will be hundreds of images/pods.

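You can use Kubernetes initializers (an alpha feature that was later removed in favor of mutating admission webhooks) to intercept object creation and mutate objects as you want. This removes the copy-paste from all your Deployments and lets you manage the volume from a central location.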
https://medium.com/google-cloud/how-kubernetes-initializers-work-22f6586e1589
You will still need to create configmaps/secrets per namespace though.

While I don't really like the idea, one of the ways to solve it could be an init container that populates a volume with key(s) you need and then these volumes can be mounted in your containers as you see fit. That way it becomes independent of how you namespace stuff and relies only on how pods are defined/created.
That said, the Kubed mentioned by Ryan above sounds like a more reasonable approach for a case like this one. Last but not least, something creates your namespaces after all, so having that same process create the elements every namespace requires sounds legitimate as well.
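The "create it in the same process that creates the namespace" idea can be sketched as a small generator that emits one ConfigMap manifest per namespace. This is only an illustration: the ConfigMap name `trusted-public-key`, the data key `public-key.pem`, and the namespace names are assumptions, not anything from the question.

```python
import json


def public_key_configmap(namespace, public_key_pem, name="trusted-public-key"):
    """Build a ConfigMap manifest (as a dict) holding the key for one namespace.

    The ConfigMap name and the data key are hypothetical placeholders.
    """
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": {"public-key.pem": public_key_pem},
    }


def configmaps_for_namespaces(namespaces, public_key_pem):
    """Return one ConfigMap manifest per namespace, all carrying the same key."""
    return [public_key_configmap(ns, public_key_pem) for ns in namespaces]


if __name__ == "__main__":
    # Hypothetical namespaces and a placeholder key body.
    items = configmaps_for_namespaces(["team-a", "team-b"], "<PEM-encoded key>")
    # A v1 List wraps all manifests into a single JSON document.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`, so a key rotation becomes a single re-run. Pods still need the volume mount, which is the part the mutation approach above addresses.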

Related

Filtering which items are stored in the Kubernetes shared informer cache

I have a Kubernetes controller written using client-go informers package. It maintains a watch on all Pods in the cluster, there are about 15k of these objects, and their YAML representation takes around 600 MB to print (I assume their in-memory representation is not that different.)
As a result, this (otherwise really small) controller watching Pods ends up with a huge memory footprint (north of 1 GiB). Even methods that you'd think offer a way of filtering, such as NewFilteredSharedInformerFactory, don't really give you a way to specify a predicate function that chooses which objects are stored in the in-memory cache.
Instead, that method in client-go offers a TweakListOptionsFunc. It helps you control ListOptions, but my predicate unfortunately cannot be expressed as a labelSelector or fieldSelector. I need to drop objects through a predicate function as they arrive at the controller.
Note: the predicate I have is something like "Pods that have an ownerReference to a DaemonSet" (which is not possible with fieldSelectors, the subject of another question of mine), and there's no labelSelector that can work in my scenario.
How would I go about configuring an informer on Pods that only have DaemonSet owner references to reduce the memory footprint of my controller?
Here's an idea: get a list of all the DaemonSets in your cluster, then read each one's spec.selector.matchLabels field to retrieve the labels that the DaemonSet's pods are bound to have. Use those labels as part of your TweakListOptionsFunc with logic like:
Pods with label1 OR label2 OR label3 ...
I know it's a bit of toil, but seems to be a working approach. I believe there isn't a way to specify fields in client-go.
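One caveat with the "label1 OR label2" idea: a Kubernetes label selector ANDs its requirements, so an OR can only be expressed with a set-based `in` requirement on a single key. A minimal sketch of building such a selector string follows. It assumes every DaemonSet's matchLabels contains the same key, and the function name and the sample label values are hypothetical.

```python
def shared_key_selector(match_labels_list, key):
    """Build a set-based selector such as 'app in (a,b)' from DaemonSet matchLabels.

    Returns None when some DaemonSet lacks `key`, because an OR across
    different label keys cannot be written as a single label selector.
    """
    values = set()
    for match_labels in match_labels_list:
        if key not in match_labels:
            return None
        values.add(match_labels[key])
    if not values:
        return None
    return "{} in ({})".format(key, ",".join(sorted(values)))


if __name__ == "__main__":
    daemonset_labels = [{"app": "fluentd"}, {"app": "node-exporter"}]
    print(shared_key_selector(daemonset_labels, "app"))
```

The resulting string is what would go into ListOptions.LabelSelector. It also matches any non-DaemonSet pod that happens to carry the same label, so it narrows the cache rather than implementing the exact predicate.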
It appears that today, if you use SharedInformers, there's no way to filter which objects to keep in the shared cache and which ones to discard.
I have found an interesting code snippet in the kube-state-metrics project that drops down to a lower layer of abstraction and initiates Watch calls directly (which would normally be considered an anti-pattern). Using watch.Filter, it decides whether or not an object from a Watch() call is passed on to the cache/reflector.
That said, many controller authors might choose to not go down this path as it requires you to specify your own cache/reflector/indexer around the Watch() call. Furthermore, projects like controller-runtime don't even let you get access to this low-level machinery, as far as I know.
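To make the filtering idea concrete, here is a language-neutral sketch of the predicate and of dropping watch events before they reach a cache. It is not the client-go or kube-state-metrics code: the function names are hypothetical, and it operates on plain dicts shaped like the JSON watch events the API server returns.

```python
def owned_by_daemonset(pod):
    """Return True if the pod has an ownerReference whose kind is DaemonSet."""
    refs = pod.get("metadata", {}).get("ownerReferences") or []
    return any(ref.get("kind") == "DaemonSet" for ref in refs)


def filter_events(events, predicate):
    """Yield only the watch events whose object satisfies the predicate.

    Events that fail the predicate never reach the consumer, so a cache
    fed from this generator stores only the matching objects.
    """
    for event in events:
        if predicate(event["object"]):
            yield event


if __name__ == "__main__":
    events = [
        {"type": "ADDED", "object": {"metadata": {
            "name": "fluentd-abc",
            "ownerReferences": [{"kind": "DaemonSet", "name": "fluentd"}]}}},
        {"type": "ADDED", "object": {"metadata": {
            "name": "web-123",
            "ownerReferences": [{"kind": "ReplicaSet", "name": "web"}]}}},
    ]
    for event in filter_events(events, owned_by_daemonset):
        print(event["type"], event["object"]["metadata"]["name"])
```

Filtering on the client like this saves cache memory only. Every Pod event is still sent over the wire and decoded, which is one reason a server-side selector is preferable when one can express the predicate.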
Another way to reduce a controller's memory footprint is field/data erasure on structs (instead of discarding objects altogether). This is possible in newer versions of client-go through cache.TransformFunc, which lets you strip fields from objects you don't care about before they are stored (though these objects will still consume some memory). This one is more of a band-aid that can make your situation better.
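The field erasure described above amounts to a function that maps each incoming object to a slimmer copy before it is cached. The sketch below shows that shape on a plain dict. It illustrates the idea only and is not the client-go API; which fields to keep (here a handful of metadata fields plus spec.nodeName) is an assumption about what a controller might need.

```python
# Metadata fields this hypothetical controller still needs after slimming.
KEPT_METADATA = ("name", "namespace", "uid", "labels", "ownerReferences")


def slim_pod(pod):
    """Return a reduced copy of a pod dict that keeps only the needed fields.

    Everything else (managedFields, the container specs, status, ...) is
    dropped, so the cached copy is much smaller than the original object.
    """
    metadata = pod.get("metadata", {})
    return {
        "metadata": {k: metadata[k] for k in KEPT_METADATA if k in metadata},
        "spec": {"nodeName": pod.get("spec", {}).get("nodeName")},
    }


if __name__ == "__main__":
    pod = {
        "metadata": {"name": "p", "namespace": "default", "managedFields": [{}]},
        "spec": {"nodeName": "node-1", "containers": [{"name": "c"}]},
        "status": {"phase": "Running"},
    }
    print(slim_pod(pod))
```

Any code that later reads a dropped field from the cache will see it as missing, so the kept list has to cover everything the controller's handlers touch.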
In my case, I mostly needed to watch for DaemonSet Pods in certain namespaces, so I refactored the code from using 1 informer (watching all namespaces) to N namespace-scoped informers running concurrently.

Is there a good way to set dynamic labels for k8s resources?

I'm attempting to add some recommended labels to several k8s resources, and I can't see a good way to add labels for things that would change frequently, in this case "app.kubernetes.io/instance" and "app.kubernetes.io/version". Instance seems like a label that should change every time a resource is deployed, and version seems like it should change when a new version is released, by git release or similar. I know that I could write a script to generate these values and interpolate them, but that's a lot of overhead for what seems like a common task. I'm stuck using Kustomize, so I can't just use Helm and have whatever variables I want. Is there a more straightforward way to apply labels like these?
Kustomize's commonLabels transformer is a common way to handle this, sometimes via a component. It really depends on your overall layout.

Cosmos DB Change Feeds in a Kubernetes Cluster with arbitrary number of pods

I have a collection in my Cosmos database that I would like to observe for changes. I have found many documents (official and unofficial) explaining how to do this. There is one thing, though, that I cannot get to work reliably: how do I deliver the same changes to multiple instances when I don't have any common reference for instance names?
What do I mean by this? Well, I'm running my workloads in a Kubernetes cluster (AKS). I have a variable number of instances within the cluster that should observe my collection. In order for change feeds to work properly, I have to have a unique instance name for each instance. The only candidate I have is the pod name. It's usually of the form <deployment-name>-<random string>. E.g. pod-5f597c9c56-lxw5b.
If I use the pod name as the instance name, the instances do not all receive the same changes (which is my requirement); only one instance receives each change (see https://learn.microsoft.com/en-us/azure/cosmos-db/change-feed-processor#dynamic-scaling). What I can do is use the pod name as the feed name instead; then all instances get the same changes. This is what I fear will bite me in the butt at some point: when I peek into the lease container, I can see a set of documents per feed name. As pod names come and go (the random string part of the name), I fear the container will grow over time, generating a heap of garbage. I know Cosmos can handle huge workloads, but you know, I like to keep things tidy.
How can I keep this thing clean and tidy? I really don't want to invent (or reuse for that matter!) some protocol between my instances to vote for which instance gets which name out of a finite set of names.
One "simple" solution would be to build my own instance names, if AKS or Kubernetes held some "index" of some sort for my pods. I know stateful sets give me that, but I don't want to use stateful sets, as the pods themselves aren't really stateful (except for this particular aspect!).
There is a new Change Feed pull model (which is in preview at this time).
The differences are:
In your case, it looks like you don't need parallelization (you want all instances to receive everything). The important part would be to design a state storing model that can maintain the continuation tokens (or not, maybe you don't care to continue if a pod goes down and then restarts).
I would suggest that you proceed to use the pod name as unique ID. If you are concerned about sprawl of the data, you could monitor the container and devise a clean-up mechanism for the metadata.
In order to have at-least-once delivery, there is going to need to be metadata persisted somewhere to track items ACK-ed / position in a partition, etc. I suspect there could be a bit of work to get change feed processor to give you at-least-once delivery once you consider pod interruption/re-scheduling during data flow.
As another option Azure offers an implementation of checkpoint based message sharing from partitioned event hubs via EventProcessorClient. In EventProcessorClient, there is also a bit of metadata added to a storage account.

kubectl imperative object configuration use case

According to the Kubernetes docs, the kubectl tool supports "three kinds of object management":
imperative commands
imperative object configuration
declarative object configuration
While the use cases for the first and the last options are more or less clear, the second one really confuses me.
Moreover in the concepts section there is a clear distinction of use cases:
use imperative commands for quick creation of (simple) single-container resources
use declarative commands for managing (more complex) set of resources
Also imperative style is recommended for CKA certification so it seems to be preferred for day-to-day cluster management activities.
But once again what is a best use case / practice for "Imperative object configuration" option and what is the root idea behind it?
There are two basic ways to deploy to Kubernetes: imperatively, with kubectl commands, or declaratively, by writing manifests and using kubectl apply. A given Kubernetes object should be managed using only one technique; mixing techniques for the same object results in undefined behavior.
Imperative commands operate on live objects
Imperative object configuration operates on individual files
Declarative object configuration operates on directories of files
Imperative object configuration creates, updates and deletes objects using configuration files that contain fully defined object definitions. You can store object configuration files in source control systems and audit changes more easily than with imperative commands.
You can run kubectl apply, delete, and replace operations with configuration files or directories containing configuration files.
Please refer to official documentation, where everything is fully described with examples. I hope it is helpful.

Do all cluster schedulers take array jobs, and if they do, do they set SGE_TASK_ID array id?

When using qsub to submit array jobs to a cluster, the environment variable SGE_TASK_ID gets set to the array task ID. I use this in a shell script that I run on a cluster, where each array task needs to do something different based on SGE_TASK_ID. Is this a common way for cluster schedulers to do this, or do they all have a different approach?
Most schedulers have a way to do this, although it can be slightly different in different setups. In TORQUE the variable is called $PBS_ARRAYID but it works the same.
Do all cluster schedulers take array jobs
No. Many do, but not all.
and if they do, do they set SGE_TASK_ID array id?
Only Grid Engine will set SGE_TASK_ID because this is simply what the variable is called in Grid Engine. Other cluster middlewares have a different name for it, with different semantics.
It's a bit unclear what you are aiming for with your question, but if you want to write a program/system that runs on many different cluster middlewares / load balancers / schedulers, you should look into DRMAA. This will abstract away variables like SGE_TASK_ID.
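If pulling in DRMAA is more than a job script needs, a lighter option is to probe the per-scheduler environment variables directly. The sketch below is not DRMAA. It checks the variable names used by Grid Engine, TORQUE, PBS Professional and Slurm; the function name and the order of the checks are arbitrary choices, and other schedulers would need their own entries.

```python
import os

# Environment variables that carry the array task index, by scheduler.
ARRAY_ID_VARIABLES = (
    "SGE_TASK_ID",          # Grid Engine
    "PBS_ARRAYID",          # TORQUE
    "PBS_ARRAY_INDEX",      # PBS Professional
    "SLURM_ARRAY_TASK_ID",  # Slurm
)


def array_task_id(environ=None):
    """Return the array task index as an int, or None outside an array job.

    Non-numeric values are skipped; Grid Engine, for example, sets
    SGE_TASK_ID to the string "undefined" for non-array jobs.
    """
    environ = os.environ if environ is None else environ
    for name in ARRAY_ID_VARIABLES:
        value = environ.get(name)
        if value and value.isdigit():
            return int(value)
    return None


if __name__ == "__main__":
    task_id = array_task_id()
    if task_id is None:
        print("not running as an array task")
    else:
        print("array task", task_id)
```

This only normalizes the variable name. The semantics still differ between schedulers (index ranges, step sizes, how array jobs are submitted), which is the part DRMAA is meant to abstract.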