Manually spawn stateful pod instances - kubernetes

I'm working on a project where I need to spawn 1 instance per user (customer).
I figured it makes sense to create some sort of manager to handle that and host it somewhere. Kubernetes seems like a good choice since it can be hosted virtually anywhere and it will automate a lot of things (e.g. ensuring instances keep running on failure).
All entities are in Python and have a corresponding Flask API.
InstanceManager Instance (user1)
.-----------. .--------.
POST /instances/user3 --> | | ---------- | |---vol1
| | '--------'
| | -----.
'...........' \ Instance (user2)
\ .--------.
'- | |---vol2
'--------'
Now I can't seem to figure out how to translate this into Kubernetes
My thinking:
Instance is a StatefulSet since I want the data to be maintained through restarts.
InstanceManager is a Service with a database attached to track user to instance IP (for health checks, etc).
I'm pretty lost on how to make InstanceManager spawn a new instance on an incoming POST request. I did a lot of digging (Operators, namespaces, etc.) but nothing seems straightforward. Namely I don't seem to even be able to do that via kubectl. Am I thinking totally wrong on how Kubernetes works?

I've done some progress and thought to share.
Essentially you need to interact with Kubernetes REST API directly instead of applying a static yaml or using kubectl, ideally with one of the numerous clients out there.
In our case there's two options:
Create a namespace per user and then a service in that namespace
Create a new service with a unique name for each user
The first approach seems more sensible since using namespaces gives a lot of other benefits (network control, resource allocation, etc.).
The service itself can be pointing to a statefulset or a pod depending on the situation.
There's another gotcha (and possibly more). Namespaces, pod names, etc, they all need to conform to RFC 1123. So for namespaces, you can't simply use email addresses or even base64. You'll need to use something like user-100 and have a mapping table to map back to an actual user.

Related

How the dead nodes are handled in AWS OpenSearch?

Trying to understand what is the right approach to connect to AWS OpenSearch (single cluster, multiple data nodes).
To my understanding, as long as data nodes are behind the load balancer (according to this and other AWS docs: https://aws.amazon.com/blogs/database/set-access-control-for-amazon-elasticsearch-service/), we can not use:
var pool = new StaticConnectionPool(nodes);
and we probably should not use CloudConnectionPool - as originally it was dedicated to elastic search cloud and was left in open search client by mistake?
Hence we use SingleNodeConnectionPool and it works, but I've noticed several exceptions, which indicated that node had DeadUntil set to date one hour in advance - so I was wondering if that is expected behavior, as from client's perspective that is the only node it knows about?
What is correct way to connect to AWS OpenSearch that has multiple nodes and should I be concerned about DeadUntil property?

Understanding pod labels vs annotations

I am trying to understand the difference between pods and annotations.
Standard documentation says that annotations captures "non-identifying information".
While on labels, selectors can be applied. Labels are used to organise objects in kubernetes cluster.
If this is the case then why istio use pod annotations instead of labels for various different settings : https://istio.io/latest/docs/reference/config/annotations/
Isn't label is good approach ?
Just trying to understand what advantages does annotations provide, if istio developers chose to use annotations.
As Extending the Burak answer,
Kubernetes labels and annotations are both ways of adding metadata to
Kubernetes objects. The similarities end there, however. Kubernetes
labels allow you to identify, select and operate on Kubernetes
objects. Annotations are non-identifying metadata and do none of these
things.
Labels are mostly used to attach with the resources like POD, Replica set, etc. it also get used to route the traffic and routing deployment to service and other.
Labels are getting stored in the ETCD database so you can search using it.
Annotation is mostly to store metadata and config-if any.
Metadata like : owner details, last helm release if using helm, side car injection
You can store owner details in labels, K8s use labels for traffic routing from service to deployment and labels should be the same on both resources (deployment & service) to route traffic.
What will you do in that case to match labels for resources? Use service owner name same inside all deployment & service? when you are running multiple distributed services managed by diff team and service owners.
If you notice some of annotation of istio is just for storing metadata like : install.operator.istio.io/chart-owner, install.operator.istio.io/owner-generation
Read more at : https://istio.io/latest/docs/reference/config/annotations/
You should also check once syntax of both label and annotation.

block a rundeck node from arbitrary cloud and non-cloud resource discovery?

Is there a way to block arbitrary nodes being reported/discovered/red-status in rundeck? With all the sources feeding in (GCP plugin, resources.xml, etc.) I have often found a job status which applies to "all" is red since the individual instance isn't yet configured, giving a red status to the job.
Would be great if there were a way to do an easy block from the GUI and CLI for all resources for the given node.
You can use custom node-filters rules based on nodes status using health check status (also you can filter by name, tags, ip address, regex, etc). Take a look at this (at "Saving filters" section you've a good example).
Do a .hostnamepattern. in the exclude filter in the job and hit Save.
Simplify-simplify-simplify.

Kubernetes: How do we List all objects modified in N days in a specific namespace?

We have multiple people ( with admin access ) doing the deployments in kubernetes cluster. We are finding it difficult to manage who has modified which object.
We can control the access and privileges using RBAC with roles and role bindings. We are planning to implement well defined roles and rolebindings for different groups.
We would also want to list all objects modified in N days in a specific namespace. Is there a way to display the objects using kubectl? please let me know
This probably can't be done easily just with kubectl.
But you might look into Kubernetes auditing. It causes the API server to record all requests to the API, and you can query them in different ways. For example, it should be possible to query the audit logs for all the objects that have been specified in the last N days.

Kubernetes resource versioning

Using kubectl get with -o yaml on a resouce , I see that every resource is versioned:
kind: ConfigMap
metadata:
creationTimestamp: 2018-10-16T21:44:10Z
name: my-config
namespace: default
resourceVersion: "163"
I wonder what is the significance of these versioning and for what purpose these are used? ( use cases )
A more detailed explanation, that helped me to understand exactly how this works:
All the objects you’ve created throughout this book—Pods,
ReplicationControllers, Services, Secrets and so on—need to be
stored somewhere in a persistent manner so their manifests survive API
server restarts and failures. For this, Kubernetes uses etcd, which
is a fast, distributed, and consistent key-value store. The only
component that talks to etcd directly is the Kubernetes API server.
All other components read and write data to etcd indirectly through
the API server.
This brings a few benefits, among them a more robust optimistic
locking system as well as validation; and, by abstracting away the
actual storage mechanism from all the other components, it’s much
simpler to replace it in the future. It’s worth emphasizing that etcd
is the only place Kubernetes stores cluster state and metadata.
Optimistic concurrency control (sometimes referred to as optimistic
locking) is a method where instead of locking a piece of data and
preventing it from being read or updated while the lock is in place,
the piece of data includes a version number. Every time the data is
updated, the version number increases. When updating the data, the
version number is checked to see if it has increased between the time
the client read the data and the time it submits the update. If this
happens, the update is rejected and the client must re-read the new
data and try to update it again. The result is that when two clients
try to update the same data entry, only the first one succeeds.
The result is that when two clients try to update the same data entry,
only the first one succeeds
Marko Luksa, "Kubernetes in Action"
So, all the Kubernetes resources include a metadata.resourceVersion field, which clients need to pass back to the API server when updating an object. If the version doesn’t match the one stored in etcd, the API server rejects the update
The main purpose for the resourceVersion on individual resources is optimistic locking. You can fetch a resource, make a change, and submit it as an update, and the server will reject the update with a conflict error if another client has updated it in the meantime (their update would have bumped the resourceVersion, and the value you submit tells the server what version you think you are updating)