Security: Yaml Bomb: user can restart kube-api by sending configmap - kubernetes

Create yaml-bomb.yaml file:
apiVersion: v1
data:
  a: &a ["web","web","web","web","web","web","web","web","web"]
  b: &b [*a,*a,*a,*a,*a,*a,*a,*a,*a]
  c: &c [*b,*b,*b,*b,*b,*b,*b,*b,*b]
  d: &d [*c,*c,*c,*c,*c,*c,*c,*c,*c]
  e: &e [*d,*d,*d,*d,*d,*d,*d,*d,*d]
  f: &f [*e,*e,*e,*e,*e,*e,*e,*e,*e]
  g: &g [*f,*f,*f,*f,*f,*f,*f,*f,*f]
  h: &h [*g,*g,*g,*g,*g,*g,*g,*g,*g]
  i: &i [*h,*h,*h,*h,*h,*h,*h,*h,*h]
kind: ConfigMap
metadata:
  name: yaml-bomb
  namespace: default
Send a ConfigMap creation request to the Kubernetes API with kubectl apply -f yaml-bomb.yaml.
The kube-api CPU and memory usage becomes very high, and the process is eventually restarted.
How do we prevent such a YAML bomb?

This is a billion laughs attack and can only be fixed in the YAML processor.
Note that Wikipedia is wrong here when it says:
A "Billion laughs" attack should exist for any file format that can contain references, for example this YAML bomb:
The problem is not that the file format contains references; it is the processor expanding them. This is against the spirit of the YAML spec which says that anchors are used for nodes that are actually referred to from multiple places. In the loaded data, anchors & aliases should become multiple references to the same object instead of the alias being expanded to a copy of the anchored node.
As an example, compare the behavior of the online PyYAML parser and the online NimYAML parser (full disclosure: my work) when you paste your code snippet. PyYAML won't respond because of the memory load from expanding aliases, while NimYAML doesn't expand the aliases and therefore responds quickly.
It's astonishing that Kubernetes suffers from this problem; I would have assumed since it's written in Go that they are able to properly handle references. You have to file a bug with them to get this fixed.
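To make the difference between sharing and expanding concrete, here is a minimal sketch in plain Python. It does not use any YAML library; build_shared, expanded_leaf_count and distinct_list_count are hypothetical helper names made up for this illustration. It builds the a..i structure from the question with aliases kept as references to the same list, then counts how many strings a loader would need if it copied every alias instead.

```python
def build_shared(levels=9, width=9):
    """Build the a..i structure with aliases kept as shared references."""
    node = ["web"] * width            # level "a": nine plain strings
    for _ in range(levels - 1):       # levels "b" through "i"
        node = [node] * width         # nine references to the SAME list
    return node


def expanded_leaf_count(node, cache=None):
    """Count the strings a copy-expanding loader would have to materialize."""
    if cache is None:
        cache = {}
    if not isinstance(node, list):
        return 1
    if id(node) not in cache:
        cache[id(node)] = sum(expanded_leaf_count(child, cache) for child in node)
    return cache[id(node)]


def distinct_list_count(node, seen=None):
    """Count the list objects that actually exist when references are shared."""
    if seen is None:
        seen = set()
    if not isinstance(node, list) or id(node) in seen:
        return len(seen)
    seen.add(id(node))
    for child in node:
        distinct_list_count(child, seen)
    return len(seen)


root = build_shared()
print(distinct_list_count(root))   # lists held in memory with shared references
print(expanded_leaf_count(root))   # strings needed if every alias were copied
```

With shared references only nine lists exist, while full expansion would need 9^9 = 387,420,489 strings, which is where the memory and CPU load comes from.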

There are a couple of possible mitigations I can think of, although, as @flyx says, the real fix here would be in the YAML parsing library used by Kubernetes.
Interestingly, running this on a Kubernetes cluster on my local machine showed the CPU spike to be client-side (it's the kubectl process churning CPU) rather than server-side.
If the issue was server side, then possible mitigations would be to use RBAC to minimize access to ConfigMap creation, and potentially to use an admission controller like OPA to review manifests before they are applied to the cluster.
This should probably be raised with the Kubernetes security vulnerability response team so that a proper fix can be implemented.
EDIT - I think where the problem manifests might depend on the cluster version used. Server-side apply graduated to beta (and should be enabled by default) in 1.16, so on a 1.16 cluster perhaps this would hit the server side instead of the client side.
EDIT - Just set up a 1.16 cluster, still showing the CPU usage as client-side in kubectl...
EDIT - I've filed an issue for this here, and also confirmed that the DoS can be achieved server-side by using curl instead of kubectl.
Final EDIT - This got assigned a CVE (CVE-2019-11253) and is being fixed in Kubernetes 1.13+. The fix has also been applied to the underlying YAML parsing lib here, so any other Go programs should be OK as long as they're using an up-to-date version.

A TrustCom19 paper studied vulnerabilities in YAML parsers for different languages. It found that most parsers have some issues, so this is common, and there are several recent CVEs in this space. Details are in the paper: Laughter in the Wild: A Study into DoS Vulnerabilities in YAML Libraries, TrustCom19.
Preprint: https://www.researchgate.net/publication/333505459_Laughter_in_the_Wild_A_Study_into_DoS_Vulnerabilities_in_YAML_Libraries

Related

Why prefix kubernetes manifest files with numbers?

I'm trying to deploy Node.js code to a Kubernetes cluster, and I'm seeing in my reference (provided by the maintainer of the cluster) that the YAML files are all prefixed by numbers:
00-service.yaml
10-deployment.yaml
etc.
I don't think that this file format is specified by kubectl, but I found another example of it online: https://imti.co/kibana-kubernetes/ (but the numbering scheme isn't the same).
Is this a Kubernetes thing? A file naming convention? Is it to keep files ordered in a folder?
This is to handle the resource creation order. There's an open issue in Kubernetes:
https://github.com/kubernetes/kubernetes/issues/16448#issue-113878195
tl;dr kubectl apply -f k8s/* should handle the order but it does not.
However, except for the namespace, I cannot imagine where the order would matter. Every relation except the namespace is handled by label selectors, so it fixes itself once all resources are deployed. You can just use 00-namespace.yaml and leave everything else without prefixes. Or skip prefixes altogether unless you really hit the issue (I have never faced it).
When you execute kubectl apply * the files are executed alphabetically. Prefixing files with a rising number allows you to control the order of the executed files. But in nearly all cases the order shouldn't matter.
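As a small illustration of that ordering, assuming the files really are processed in plain lexicographic order, the sketch below sorts a few file names. The first two names are from the question; the unpadded names and the apply_order helper are made up for the example.

```python
def apply_order(filenames):
    """Return file names in plain lexicographic order."""
    return sorted(filenames)


# Zero-padded prefixes sort the way the numbers suggest.
print(apply_order(["10-deployment.yaml", "00-service.yaml"]))

# Without zero padding, "2-" sorts after "10-" because the
# comparison is done character by character, not numerically.
print(apply_order(["2-service.yaml", "10-deployment.yaml"]))
```

This is why such prefixes are usually written with a fixed width (00, 10, 20) rather than as bare numbers.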
Sequencing helps readability, user-friendliness and, not least, maintainability. Looking at the resources, one can tell in which order the deployment needs to be performed. For example, a Deployment that uses a ConfigMap object would fail if it is deployed before the ConfigMap is created.

Kubernetes etcd HighNumberOfFailedHTTPRequests QGET

I run a Kubernetes cluster in AWS, CoreOS-stable-1745.6.0-hvm (ami-401f5e38), all deployed by kops 1.9.1 / terraform.
etcd_version = "3.2.17"
k8s_version = "1.10.2"
This Prometheus alert method=QGET alertname=HighNumberOfFailedHTTPRequests is coming from coreos kube-prometheus monitoring bundle. The alert started to fire from the very beginning of the cluster lifetime and now exists for ~3 weeks without visible impact.
QGET fails for 33% of requests.
NOTE: I have a second cluster in another region, built from scratch on the same versions, and it shows exactly the same behavior. So it's reproducible.
Does anyone know what the root cause might be, and what the impact is if it is ignored further?
EDIT:
Later I found this GH issue which describes my case precisely: https://github.com/coreos/etcd/issues/9596
From CoreOS documentation:
For alerts to not appear on arbitrary events it is typically better not to alert directly on a raw value that was sampled, but rather by aggregating and defining a relative threshold rather than a hardcoded value. For example: send a warning if 1% of the HTTP requests fail, instead of sending a warning if 300 requests failed within the last five minutes. A static value would also require a change whenever your traffic volume changes.
Here you can find detailed information on how to Develop Prometheus alerts for etcd.
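The quoted advice can be sketched in a few lines of Python. This is only an illustration of a relative threshold, not a Prometheus rule; should_warn and its parameters are hypothetical names, and the 1% default is taken from the example in the quote.

```python
def should_warn(failed, total, max_failure_ratio=0.01):
    """Warn when the failed share of requests exceeds a relative threshold."""
    if total == 0:
        return False  # no traffic, so there is nothing to judge
    return failed / total > max_failure_ratio


# The same 300 failures mean very different things at different traffic volumes.
print(should_warn(300, 1_000_000))  # 0.03% of requests failed
print(should_warn(300, 900))        # about 33% of requests failed
```

A static "300 failures" rule would treat both cases the same, while the ratio only fires for the second one.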
I got the explanation in the GitHub issue thread.
HTTP metrics/alerts should be replaced with GRPC.

Passing long configuration file to Kubernetes

I like the working methodology of Kubernetes: use a self-contained image and pass the configuration in a ConfigMap, as a volume.
This worked great until I tried to do the same with a Liquibase container. The SQL is very long (~1.5K lines), and Kubernetes rejects it as too long.
Error from Kubernetes:
The ConfigMap "liquibase-test-content" is invalid: metadata.annotations: Too long: must have at most 262144 characters
I thought of passing the .sql files as a hostPath, but as I understand it, the hostPath's content is probably not going to be there.
Is there any other way to pass configuration from the K8s directory to pods? Thanks.
The error you are seeing is not about the size of the actual ConfigMap contents, but about the size of the last-applied-configuration annotation that kubectl apply automatically creates on each apply. If you use kubectl create -f foo.yaml instead of kubectl apply -f foo.yaml, it should work.
Please note that in doing this you will lose the ability to use kubectl diff and do incremental updates (without replacing the whole object) with kubectl apply.
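To get a rough feeling for when this limit is hit, here is a sketch that serializes a manifest to JSON and compares its size with the 262144 figure from the error message. This is an approximation under assumptions: the exact serialization kubectl stores in the annotation may differ, the limit applies to the annotations as a whole, and the helper names and the changelog.sql key are hypothetical.

```python
import json

ANNOTATION_LIMIT = 262144  # figure quoted in the error message


def make_configmap(sql_text):
    """Build a ConfigMap manifest as a dict (the data key name is made up)."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "liquibase-test-content"},
        "data": {"changelog.sql": sql_text},
    }


def fits_last_applied_annotation(manifest, limit=ANNOTATION_LIMIT):
    """Roughly estimate whether a JSON copy of the manifest fits in the annotation."""
    serialized = json.dumps(manifest, separators=(",", ":"))
    return len(serialized.encode("utf-8")) <= limit


small = make_configmap("SELECT 1;\n" * 100)
large = make_configmap("x" * 300_000)
print(fits_last_applied_annotation(small))
print(fits_last_applied_annotation(large))
```

If the estimate is close to or over the limit, client-side kubectl apply is likely to fail in the way shown above, while kubectl create does not write that annotation.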
Since 1.18 you can use server-side apply to circumvent the problem.
kubectl apply --server-side=true -f foo.yml
where server-side=true runs the apply command on the server instead of the client.
This will properly show conflicts with other actors, including client-side apply and thus fail:
Apply failed with 4 conflicts: conflicts with "kubectl-client-side-apply" using apiextensions.k8s.io/v1:
- .status.conditions
- .status.storedVersions
- .status.acceptedNames.kind
- .status.acceptedNames.plural
Please review the fields above--they currently have other managers. Here
are the ways you can resolve this warning:
* If you intend to manage all of these fields, please re-run the apply
command with the `--force-conflicts` flag.
* If you do not intend to manage all of the fields, please edit your
manifest to remove references to the fields that should keep their
current managers.
* You may co-own fields by updating your manifest to match the existing
value; in this case, you'll become the manager if the other manager(s)
stop managing the field (remove it from their configuration).
See http://k8s.io/docs/reference/using-api/api-concepts/#conflicts
If the changes are intended, you can simply use the first option:
kubectl apply --server-side=true --force-conflicts -f foo.yml
You can use an init container for this. Essentially, put the .sql files on GitHub or S3 or really any location you can read from and populate a directory with it. The semantics of the init container guarantee that the Liquibase container will only be launched after the config files have been downloaded.

Monitoring Kubernetes with Grafana: lots of missing data with latest Prometheus version

I have a working Kubernetes cluster that I want to monitor with Grafana.
I have been trying out many dashboards from https://grafana.com/dashboards but they all seem to have some problems: it looks like there's a mismatch between the Prometheus metric names and what the dashboard expects.
Eg if I look at this recently released, quite popular dashboard: https://grafana.com/dashboards/5309/revisions
I end up with many "holes" when running it:
Looking into the panel configuration, I see that the issues come from small changes in metric names, e.g. the dashboard uses node_memory_Buffers where Prometheus provides node_memory_Buffers_bytes.
Similarly the dashboard expects node_disk_bytes_written when Prometheus provides node_disk_written_bytes_total.
I have tried out a lot of Kubernetes-specific dashboards and I have the same problem with almost all of them.
Am I doing something wrong?
The Prometheus node exporter changed a lot of the metric names in the 0.16.0 version to conform to new naming conventions.
From https://github.com/prometheus/node_exporter/releases/tag/v0.16.0:
Breaking changes
This release contains major breaking changes to metric names. Many
metrics have new names, labels, and label values in order to conform
to current naming conventions.
Linux node_cpu metrics now break out guest values into separate
metrics.
Many counter metrics have been renamed to include _total.
Many metrics have been renamed/modified to include
base units, for example node_cpu is now node_cpu_seconds_total.
See also the upgrade guide. One of its suggestions is to use compatibility rules that will create duplicate metrics with the old names.
Otherwise use version 0.15.x until the dashboards are updated, or fix them!
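If you go the "fix them" route, a sketch like the following can help rewrite dashboard queries in bulk. The three renames are the ones mentioned in this thread; the RENAMES table and rewrite_expr are hypothetical names, and the table is not a complete list. Some metrics also changed labels or values, so check each rewritten query against the upgrade guide.

```python
import re

# Old name -> new name, limited to the renames mentioned in this thread.
RENAMES = {
    "node_memory_Buffers": "node_memory_Buffers_bytes",
    "node_disk_bytes_written": "node_disk_written_bytes_total",
    "node_cpu": "node_cpu_seconds_total",
}

# \b treats "_" as a word character, so "node_cpu" does not match
# inside "node_cpu_seconds_total" and already-new names stay untouched.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in RENAMES) + r")\b"
)


def rewrite_expr(expr):
    """Replace old node exporter metric names in a query string."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], expr)


print(rewrite_expr('rate(node_cpu{mode="idle"}[5m])'))
print(rewrite_expr("node_memory_Buffers + node_memory_Buffers_bytes"))
```

Running the function twice on the same query gives the same result, so it is safe to apply to dashboards that were already partly updated.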

Edit the REST API in Kubernetes source

I have modified kubectl's edit command (/pkg/kubectl/cmd/edit.go) to restart all active pods so that the new changes are reflected immediately after the edit is done. (Downtime is acceptable in my use case.) Now I want to add this feature to the REST API, so that when I call
PATCH /api/v1/namespaces/{namespace}/replicationcontrollers/{name}
the patch is applied to the replication controller and all the pods maintained by that replication controller are restarted. However, I can't find the file that I should edit in order to alter the REST API. Where can I find these files, and is there a better way to achieve what I am currently doing? (Edits to the RC should be reflected immediately in the pods.)
We're actually implementing this feature in the Deployment API. You want the Recreate update strategy. It will be in Kubernetes 1.2, but you can try it now, in v1.2.0-alpha.6 or by building from HEAD.
https://github.com/kubernetes/kubernetes/blob/master/docs/user-guide/deployments.md
That documentation is a little out of date, since Deployment is under active development. For the current API, please see
https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/extensions/v1beta1/types.go