How do Finalizers work for a CustomResource object? - kubernetes

In Kubernetes and Operator SDK, we can define a CRD (Custom Resource Definition) and a CR (Custom Resource). In my operator controller, when a CR is created, the controller's reconciliation creates a new Deployment and Service.
When we delete the CR object, the associated resources (such as the Deployment or Service) are deleted as well. I understand this should be done by a CR Finalizer, but in Operator SDK and my controller code I never see any code that registers or adds a Finalizer for the CR. Is there some default behavior in Operator SDK?
Could anybody explain how this works for the case "while deleting the CR, the associated Deployment and Service are deleted as well"? Which part of the controller is responsible for that?

Deletion of associated resources is not done by your controller. It's done by Kubernetes's garbage collector.
Basically, the garbage collector uses OwnerReference entries to find orphaned resources and delete them. Most likely, you set the OwnerReference by calling the controllerutil.SetControllerReference method somewhere in your code.
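Here is a minimal sketch of where that call usually sits in a controller-runtime / operator-sdk reconciler. The MyApp type, the examplev1 package and the reconciler name are placeholders, not taken from the question; the controllerutil.SetControllerReference call is the point of interest.

package controllers

import (
	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	examplev1 "example.com/myapp-operator/api/v1" // hypothetical API package for the CR
)

type MyAppReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}

// deploymentForApp builds the Deployment that the CR owns.
func (r *MyAppReconciler) deploymentForApp(app *examplev1.MyApp) (*appsv1.Deployment, error) {
	dep := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: app.Name, Namespace: app.Namespace},
		// Spec omitted for brevity.
	}
	// This adds an ownerReferences entry on the Deployment pointing at the CR,
	// with controller: true and blockOwnerDeletion: true. When the CR is deleted,
	// the garbage collector deletes the Deployment; no Finalizer is needed for that.
	if err := controllerutil.SetControllerReference(app, dep, r.Scheme); err != nil {
		return nil, err
	}
	return dep, nil
}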

Related

Kubernetes: validating update requests to custom resource

I created a custom resource definition (CRD) and its controller in my cluster. Now I can create custom resources, but how do I validate update requests to the CR? For example, only certain fields should be allowed to be updated.
The Kubernetes docs on Custom Resources have a section on Advanced features and flexibility (never mind that validating requests should be considered a pretty basic feature 😉). For validation of CRDs, it says:
Most validation can be specified in the CRD using OpenAPI v3.0 validation. Any other validations supported by addition of a Validating Webhook.
The OpenAPI v3.0 validation won't help you accomplish what you're looking for, namely ensuring immutability of certain fields on your custom resource. It's only helpful for stateless validations, where you look at one instance of an object and determine whether it's valid on its own; you can't compare it to a previous version of the resource and verify that nothing has changed.
You could use Validating Webhooks. It feels like a heavyweight solution, as you will need to implement a server that conforms to the Validating Webhook contract (responding to specific kinds of requests with specific kinds of responses), but at least you will have the data required to make the desired determination, e.g. knowing that it's an UPDATE request and knowing what the old object looked like. For more details, see here. I have not actually tried Validating Webhooks, but it feels like it could work.
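To make the idea concrete, here is a rough, untested sketch of such a webhook server, assuming the immutable field is spec.director (just an example name); it compares the old and new objects carried in the AdmissionReview and denies the update if the field changed.

package main

import (
	"encoding/json"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// teamLike decodes only the field we care about from the custom resource.
type teamLike struct {
	Spec struct {
		Director string `json:"director"`
	} `json:"spec"`
}

func validate(w http.ResponseWriter, r *http.Request) {
	var review admissionv1.AdmissionReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil || review.Request == nil {
		http.Error(w, "could not decode AdmissionReview", http.StatusBadRequest)
		return
	}

	resp := &admissionv1.AdmissionResponse{UID: review.Request.UID, Allowed: true}
	if review.Request.Operation == admissionv1.Update {
		var oldObj, newObj teamLike
		_ = json.Unmarshal(review.Request.OldObject.Raw, &oldObj)
		_ = json.Unmarshal(review.Request.Object.Raw, &newObj)
		if oldObj.Spec.Director != newObj.Spec.Director {
			resp.Allowed = false
			resp.Result = &metav1.Status{Message: "spec.director is immutable"}
		}
	}

	review.Response = resp
	_ = json.NewEncoder(w).Encode(review)
}

func main() {
	http.HandleFunc("/validate", validate)
	// The API server only calls webhooks over TLS; certificate paths are placeholders.
	_ = http.ListenAndServeTLS(":8443", "/certs/tls.crt", "/certs/tls.key", nil)
}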
An alternative approach I've used is to store the user-provided data within the Status subresource of the custom resource the first time it's created, and then always look at the data there. Any changes to the Spec are ignored, though your controller can notice discrepancies between what's in the Spec and what's in the Status, and embed a warning in the Status telling the user that they've mutated the object in an invalid way and their specified values are being ignored. You can see an example of that approach here and here. As per the relevant README section of that linked repo, this results in the following behaviour:
The AVAILABLE column will show false if the UAA client for the team has not been successfully created. The WARNING column will display a warning if you have mutated the Team spec after initial creation. The DIRECTOR column displays the originally provided value for spec.director and this is the value that this team will continue to use. If you do attempt to mutate the Team resource, you can see your (ignored) user-provided value with the -o wide flag:
$ kubectl get team --all-namespaces -owide
NAMESPACE   NAME   DIRECTOR     AVAILABLE   WARNING   USER-PROVIDED DIRECTOR
test        test   vbox-admin   true                  vbox-admin
If we attempt to mutate the spec.director property, here's what we will see:
$ kubectl get team --all-namespaces -owide
NAMESPACE   NAME   DIRECTOR     AVAILABLE   WARNING                                               USER-PROVIDED DIRECTOR
test        test   vbox-admin   true        API resource has been mutated; all changes ignored   bad-new-director-name
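A compressed Go sketch of that Status-based pattern (Team, TeamSpec and TeamStatus are stand-ins for the linked repo's real types, and writing the Status back via the status subresource client is omitted):

type Team struct {
	Spec   TeamSpec
	Status TeamStatus
}

type TeamSpec struct{ Director string }

type TeamStatus struct {
	Director string
	Warning  string
}

// reconcileDirector keeps using the originally provided director and only
// records a warning if the user later mutates the Spec.
func reconcileDirector(team *Team) {
	if team.Status.Director == "" {
		// First reconcile: remember the user-provided value in the Status.
		team.Status.Director = team.Spec.Director
		return
	}
	if team.Spec.Director != team.Status.Director {
		// Spec was mutated after creation: ignore the change and warn.
		team.Status.Warning = "API resource has been mutated; all changes ignored"
	}
}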

Kubernetes resource versioning

Using kubectl get with -o yaml on a resource, I see that every resource is versioned:
kind: ConfigMap
metadata:
  creationTimestamp: 2018-10-16T21:44:10Z
  name: my-config
  namespace: default
  resourceVersion: "163"
I wonder what the significance of this versioning is and what purpose it serves (use cases).
A more detailed explanation that helped me understand exactly how this works:
All the objects you’ve created throughout this book—Pods,
ReplicationControllers, Services, Secrets and so on—need to be
stored somewhere in a persistent manner so their manifests survive API
server restarts and failures. For this, Kubernetes uses etcd, which
is a fast, distributed, and consistent key-value store. The only
component that talks to etcd directly is the Kubernetes API server.
All other components read and write data to etcd indirectly through
the API server.
This brings a few benefits, among them a more robust optimistic
locking system as well as validation; and, by abstracting away the
actual storage mechanism from all the other components, it’s much
simpler to replace it in the future. It’s worth emphasizing that etcd
is the only place Kubernetes stores cluster state and metadata.
Optimistic concurrency control (sometimes referred to as optimistic
locking) is a method where instead of locking a piece of data and
preventing it from being read or updated while the lock is in place,
the piece of data includes a version number. Every time the data is
updated, the version number increases. When updating the data, the
version number is checked to see if it has increased between the time
the client read the data and the time it submits the update. If this
happens, the update is rejected and the client must re-read the new
data and try to update it again. The result is that when two clients
try to update the same data entry, only the first one succeeds.
Marko Luksa, "Kubernetes in Action"
So, all Kubernetes resources include a metadata.resourceVersion field, which clients need to pass back to the API server when updating an object. If the version doesn't match the one stored in etcd, the API server rejects the update.
The main purpose of the resourceVersion on individual resources is optimistic locking. You can fetch a resource, make a change, and submit it as an update, and the server will reject the update with a conflict error if another client has updated it in the meantime (their update would have bumped the resourceVersion, and the value you submit tells the server which version you think you are updating).
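A small client-go sketch of that flow (the namespace and ConfigMap name are placeholders): the Update request carries the resourceVersion that was read, a stale version comes back as a Conflict error, and retry.RetryOnConflict re-reads and retries.

package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/retry"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cms := kubernetes.NewForConfigOrDie(cfg).CoreV1().ConfigMaps("default")

	// RetryOnConflict re-runs the closure whenever the Update is rejected with
	// a Conflict error, i.e. another client bumped resourceVersion in between.
	err = retry.RetryOnConflict(retry.DefaultRetry, func() error {
		cm, err := cms.Get(context.TODO(), "my-config", metav1.GetOptions{})
		if err != nil {
			return err
		}
		if cm.Data == nil {
			cm.Data = map[string]string{}
		}
		cm.Data["updated-by"] = "example"
		// The update sends cm.ResourceVersion back to the API server; if it no
		// longer matches what is stored in etcd, the write is rejected.
		_, err = cms.Update(context.TODO(), cm, metav1.UpdateOptions{})
		return err
	})
	if err != nil {
		panic(err)
	}
}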

Large payload for Custom Objects

While I can create custom objects just fine, I am wondering how one is supposed to handle large payloads (Gigabytes) for an object.
CRs are mostly used in order to interface with garbage collection/reference counting in Kubernetes.
Adding the payload via YAML does not work, though (out of memory for large payloads):
apiVersion: "data.foo.bar/v1"
kind: Dump
metadata:
name: my-data
ownerReferences:
- apiVersion: apps/v1
kind: Deploy
name: my-deploy
uid: d9607a69-f88f-11e7-a518-42010a800195
spec:
payload: dfewfawfjr345434hdg4rh4ut34gfgr_and_so_on_...
One could perhaps add the payload to a PV and just reference that path in the CR.
Then I have the problem that I apparently cannot clean up the payload file when the CR gets finalized (I could not find any info about custom Finalizers).
I have no clear idea how to integrate such a concept into the Kubernetes lifecycle.
In general the size limit for any Kubernetes API object is ~1MB due to etcd restrictions, but putting more than 20-30KB in an object is a bad idea and will be expensive to access (and garbage collection will be expensive as well).
I would recommend storing the data in an object storage bucket and using an RBAC proxy like https://github.com/brancz/kube-rbac-proxy to gate access to the bucket contents (use a URL to the proxy as the reference from your object). That gives you all the benefits of tracking the data in the API, but keeps the object size small. If you want a more complex integration you could implement an aggregated API and reuse the core Kubernetes libraries to handle your API, storing the data in the object store.
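A sketch of the spec shape that suggestion leads to (field names are made up): the CR carries only a reference to the data instead of the data itself.

// DumpSpec references the payload instead of embedding it in etcd.
type DumpSpec struct {
	// URL of the payload, served through the RBAC proxy in front of the bucket,
	// e.g. "https://dump-proxy.example.svc/buckets/dumps/my-data" (placeholder).
	PayloadURL string `json:"payloadURL"`

	// Optional checksum so consumers can verify what they downloaded.
	PayloadSHA256 string `json:"payloadSHA256,omitempty"`
}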
We still went with using the custom object. Alongside it, we created a Kubernetes controller which handles the lifecycle of the payload in the PV. For us this works fine, since the controller can be the single writer to the PV, while the actual services only need read access to the PV.
Combined with ownerReference, this makes for a good integration into the Kubernetes lifecycle.
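Since the question also touched on custom Finalizers: a minimal controller-runtime sketch of how such a controller can clean up the payload before the CR goes away (the DumpReconciler, the datav1.Dump type and the deletePayloadFromPV helper are placeholders).

package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	datav1 "example.com/dump-operator/api/v1" // hypothetical API package
)

const dumpFinalizer = "data.foo.bar/cleanup-payload"

type DumpReconciler struct {
	client.Client
}

func (r *DumpReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var dump datav1.Dump
	if err := r.Get(ctx, req.NamespacedName, &dump); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	if dump.DeletionTimestamp.IsZero() {
		// Object is live: make sure our finalizer is registered.
		if !controllerutil.ContainsFinalizer(&dump, dumpFinalizer) {
			controllerutil.AddFinalizer(&dump, dumpFinalizer)
			if err := r.Update(ctx, &dump); err != nil {
				return ctrl.Result{}, err
			}
		}
		// ... normal reconcile logic ...
		return ctrl.Result{}, nil
	}

	// Object is being deleted: clean up the payload, then release the finalizer
	// so the API server can actually remove the CR.
	if controllerutil.ContainsFinalizer(&dump, dumpFinalizer) {
		if err := r.deletePayloadFromPV(ctx, &dump); err != nil { // hypothetical cleanup helper
			return ctrl.Result{}, err
		}
		controllerutil.RemoveFinalizer(&dump, dumpFinalizer)
		if err := r.Update(ctx, &dump); err != nil {
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, nil
}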

When exactly do I set an ownerReference's controller field to true?

I am writing a Kubernetes controller.
Someone creates a custom resource via kubectl apply -f custom-resource.yaml. My controller notices the creation, and then creates a Deployment that pertains to the custom resource in some way.
I am looking for the proper way to set up the Deployment's ownerReferences field such that a deletion of the custom resource will result in a deletion of the Deployment. I understand I can do this like so:
ownerReferences:
- kind: <kind from custom resource>
  apiVersion: <apiVersion from custom resource>
  uid: <uid from custom resource>
  controller: <???>
I'm unclear on whether this is case where I would set controller to true.
The Kubernetes reference documentation says (in its entirety):
If true, this reference points to the managing controller.
Given that a controller is running code, and an owner reference actually references another Kubernetes resource via matching uid, name, kind and apiVersion fields, this statement is nonsensical: a Kubernetes object reference can't "point to" code.
I have a sense that the documentation author is trying to indicate that—using my example—because the user didn't directly create the Deployment herself, it should be marked with some kind of flag indicating that a controller created it instead.
Is that correct?
The follow-on question here is of course: OK, what behavior changes if controller is set to false here, but the other ownerReference fields are set as above?
ownerReferences has two purposes:
Garbage collection: Refer to the answer of ymmt2005. Essentially, all owners are considered for GC. Contrary to the accepted answer, the controller field has no impact on GC.
Adoption: The controller field prevents fights over resources that are to be adopted. Consider a replica set. Usually, the replica set controller creates the pods. However, if there is a pod that matches the label selector, it will be adopted by the replica set. To prevent two replica sets from fighting over the same pod, the pod is given a unique controller by setting controller to true in the owner reference. If a resource already has a controller, it will not be adopted by another controller. Details are in the design proposal.
TLDR: The field controller is only used for adoption and not GC.
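As a tiny illustration of that adoption rule (sketch only, using a Pod as the candidate): a would-be owner skips any object that already has a controller reference.

package controllers

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// canAdopt reports whether a replica-set-like controller may adopt the pod.
func canAdopt(pod *corev1.Pod) bool {
	// GetControllerOf returns the ownerReferences entry that has controller: true,
	// or nil if there is none. An object that already has a managing controller
	// is never adopted by another one.
	return metav1.GetControllerOf(pod) == nil
}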
According to the Kubernetes source code, an object will be garbage collected only after all objects in its ownerReferences field have been deleted.
https://github.com/kubernetes/apimachinery/blob/15d95c0b2af3f4fcf46dce24105e5fbb9379af5a/pkg/apis/meta/v1/types.go#L240-L247
// List of objects depended by this object. If ALL objects in the list have
// been deleted, this object will be garbage collected. If this object is managed by a controller,
// then an entry in this list will point to this controller, with the controller field set to true.
// There cannot be more than one managing controller.
According to the documentation:
Sometimes, Kubernetes sets the value of ownerReference automatically. For example, when you create a ReplicaSet, Kubernetes automatically sets the ownerReference field of each Pod in the ReplicaSet. In 1.6, Kubernetes automatically sets the value of ownerReference for objects created or adopted by ReplicationController, ReplicaSet, StatefulSet, DaemonSet, and Deployment.
You can also specify relationships between owners and dependents by manually setting the ownerReference field.
Basically, a Deployment is at the top of its ownership hierarchy, and ownerReference is not set on it automatically. Therefore, you can manually add an ownerReference to your Deployment to create the reference to your Foo resource.
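A sketch of adding that entry by hand (the group/version string is a placeholder, and the owner's name and UID are passed in as plain values); controllerutil.SetControllerReference achieves the same thing with extra safety checks.

package controllers

import (
	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/utils/ptr"
)

// setFooOwner makes the Deployment a dependent of the Foo custom resource.
func setFooOwner(dep *appsv1.Deployment, fooName string, fooUID types.UID) {
	dep.OwnerReferences = append(dep.OwnerReferences, metav1.OwnerReference{
		APIVersion:         "samplecontroller.example.com/v1alpha1", // placeholder group/version
		Kind:               "Foo",
		Name:               fooName,
		UID:                fooUID,
		Controller:         ptr.To(true), // marks this entry as the managing controller
		BlockOwnerDeletion: ptr.To(true), // relevant for foreground cascading deletion
	})
}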
You asked:
The follow on question here is of course: OK, what behavior changes if controller is set to false here, but the other ownerReference fields are set as above?
OwnerReference is used by the garbage collector. The role of the Kubernetes garbage collector is to delete certain objects that once had an owner but no longer have one.
Here is the link to the description of the OwnerReference structure on GitHub. As you mentioned, if controller: true, the reference points to the managing controller, in other words, the owner. It also serves as an instruction for the garbage collector's behavior with respect to the object and its owner. If controller: false, the garbage collector treats the object as one without a managing owner and, for example, allows it to be deleted freely.
For more information, you can visit the following links:
- Garbage Collection
- Deletion and Garbage Collection of Kubernetes Objects

In kubernetes apiserver, what's the function of 'SetSelfLink' when we get or list resources

In kube-apiserver, when we get or list a resource, resthandler.go sets the 'SelfLink', which results in a small performance loss. I wonder what the point of this is; ObjectMeta.SelfLink has already been set in etcd, so why set it again?
Thanks.
You're right, it's redundant and unnecessary. It shouldn't be set, especially when persisting items to disk (etcd). SelfLink is only useful for kube clients, not for the apiserver.
Indeed, this is an issue being tracked in the kubernetes repo here:
https://github.com/kubernetes/kubernetes/issues/2797