Is Kubernetes' ETCD exposed for us to use? - kubernetes

We are working on provisioning our service using Kubernetes and the service needs to register/unregister some data for scaling purposes. Let's say the service handles long-held transactions so when it starts/scales out, it needs to store the starting and ending transaction ids somewhere. When it scales out further, it will need to find the next transaction id and save it with the ending transaction id that is covered. When it scales in, it needs to delete the transaction ids, etc. ETCD seems to make the cut as it is used (by Kubernetes) to store deployment data and not only that it is close to Kubernetes, it is actually inside and maintained by Kubernetes; thus we'd like to find out if that is open for our use. I'd like to ask the question for both EKS, AKS, and self-installed. Any advice welcome. Thanks.

Do not use the kubernetes etcd directly for an application.
Access to read/write data to the kubernetes etcd store is root access to every node in your cluster. Even if you are well versed in etcd v3's role based security model avoid sharing that specific etcd instance so you don't increase your clusters attack surface.
For EKS and GKE, the etcd cluster is hidden in the provided cluster service so you can't break things. I would assume AKS takes a similar approach unless they expose the instances to you that run the management nodes.
If the data is small and not heavily updated, you might be able to reuse the kubernetes etcd store via the kubernetes API. Create a ConfigMap or a custom resource definition for your data and edit it via the easily securable and namespaced functionality in the kubernetes API.
For most application uses run your own etcd cluster (or whatever service) to keep Kubernetes free to do it's workload scheduling. The coreos etcd operator will let you define and create new etcd clusters easily.

Related

Can two kubernetes clusters share the same external etcd and work like master slave

We have a requirement to setup a geo redundant cluster. I am looking at sharing an external etcd cluster to run two kubernetes clusters. It may sound absurd at first, but the requirements have come down to it..I am seeking some direction to whether it is possible, and if not, what are the challenges.
Yes it is possible, you can have a single etcd cluster and multiple k8s clusters attached to it. The key to achieve it, is to use -etcd-prefix string flag from kubernetes apiserver. This way each cluster will use different root path for storing its resources and avoid possible conflict with second cluster in the etcd. In addition to it, you should also setup the appropriate rbac rules and certificates for each k8s cluster. You can find more detailed information about it in the following article: Multi-tenant external etcd for Kubernetes clusters.
EDIT: Ooh wait, just noticed that you want to have those two clusters to behave as master-slave. In that case you could achieve it by assign to the slave cluster a read-only role in the etcd and change it to read-write when it has to become master. Theoretically it should work, but I have never tried it and I think the best option is to use builtin k8s mechanism for high-availability like leader-election.

Will the master know the data on workers/nodes in k8s

I try to deploy a set of k8s on the cloud, there are two options:the masters are in trust to the cloud provider or maintained by myself.
so i wonder about that if the masters in trust will leak the data on workers?
Shortly, will the master know the data on workers/nodes?
The abstractions in Kubernetes are very well defined with clear boundaries. You have to understand the concept of Volumes first. As defined here,
A Kubernetes volume is essentially a directory accessible to all
containers running in a pod. In contrast to the container-local
filesystem, the data in volumes is preserved across container
restarts.
Volumes are attached to the containers in a pod and There are several types of volumes
You can see the layers of abstraction source
Master to Cluster communication
There are two primary communication paths from the master (apiserver) to the cluster. The first is from the apiserver to the kubelet process which runs on each node in the cluster. The second is from the apiserver to any node, pod, or service through the apiserver’s proxy functionality.
Also, you should check the CCM - The cloud controller manager (CCM) concept (not to be confused with the binary) was originally created to allow cloud specific vendor code and the Kubernetes core to evolve independent of one another. The cloud controller manager runs alongside other master components such as the Kubernetes controller manager, the API server, and scheduler. It can also be started as a Kubernetes addon, in which case it runs on top of Kubernetes.
Hope this answers all your questions related to Master accessing the data on Workers.
If you are still looking for more secure ways, check 11 Ways (Not) to Get Hacked
Short answer: yes the control plane can access all of your data.
Longer and more realistic answer: probably don't worry about it. It is far more likely that any successful attack against the control plane would be just as successful as if you were running it yourself. The exact internal details of GKE/AKS/EKS are a bit fuzzy, but all three providers have a lot of experience running multi-tenant systems and it wouldn't be negligent to trust that they have enough protections in place against lateral escalations between tenants on the control plane.

Can I store my application data in kubernetes configuration resources?

I am trying to find a DB (object storage) for my application. The application is really a wrapper over ISTIO Network Routing API. Basically simplifying the ISTIO configuratin for my network. Kubernetes (k8s) Custom Resource Definition (CRD) seems to fit my requirements. Also love the watch and REST API capability provided by CRD.
DB requirement
Few 100 MBs data - worst case
Ability to watch objects
REST API support for object
Persistence
Around 2k writes/sec and similar/more reads/sec. Although I do have my application acting as a proxy for CRD where things can be cached.
Why using CRD will be a good or a bad idea? Are there any performance implication using CRD. This 2016 stackflow answer suggest that etcd data is not in RAM. Whereas etcd link suggest that etcd can do 10k writes/sec (so even things are not in RAM and purely in disk, who cares).
I am seeing multiple application using k8s CRDs.
Helm uses CRDs to store releases
Istio uses CRDs to store their networking routing API objects
Considering that (CRD page)
A resource is an endpoint in the Kubernetes API that stores a collection of API objects of a certain kind. For example, the built-in pods resource contains a collection of Pod objects.
A custom resource is an extension of the Kubernetes API that is not necessarily available on every Kubernetes cluster. In other words, it represents a customization of a particular Kubernetes installation.
A CRD is for extending Kubernetes itself, it is not for application data.
The helm-crd/examples/mariadb.yaml is about lightweight metadata which will enable Kubernetes to download the right release and through Helm install it.
It is not to store data for a random application which could exist without Kubernetes (as opposed to Helm releases, which make sense only in a Kubernetes deployment scenario)
Similarly, Istio CRD makes sense only in a Kubernetes context:
Kubernetes currently implements the Istio configuration on Custom Resource Definitions (CRDs). These CRDs correspond to namespace-scope and cluster-scope CRDs and automatically inherit access protection via the Kubernetes RBAC.
That approach (using etcd to store any application data) would not scale.

What is the purpose of cluster-name parameter from kubernetes controller_manager?

So the question is in the title.
I'm wondering what is the purpose of cluster-name parameter from kubernetes controller_manager?
Reading the code of the cluster manager you'll find that the cluster-name is passed down to the service controller and the persistent volume controller which then pass them down to their related objects (Load balancers, persistent volumes, ...).
In both cases these pass down the cluster name to the related cloud provider (see interface) which could use it within the naming of their provider specific objects. This makes sense in case you run more than one Kubernetes cluster side by side on the same provider.
The GCE and AWS cloud providers do this for example, some others don't.
So having two clusters with the same cluster name configuration for the controller manager could therefore cause issues due to name collisions within the objects created by the cloud provider.
lets you create more than one cluster and helps you distinguish between them. Most folks just use kubernetes (which is the default) When you setup your kubectl you provide it as well.
this is from the k8s site # https://kubernetes.io/docs/getting-started-guides/scratch/
You should pick a name for your cluster. Pick a short name for each cluster which is unique from future cluster names. This will be used in several ways:
by kubectl to distinguish between various clusters you have access to. You will probably want a second one sometime later, such as for testing new Kubernetes releases, running in a different region of the world, etc.
Kubernetes clusters can create cloud provider resources (e.g. AWS ELBs) and different clusters need to distinguish which resources each created. Call this CLUSTER_NAME

Does Kubernetes provision new VMs for pods on my cloud platform?

I'm currently learning about Kubernetes and still trying to figure it out. I get the general use of it but I think that there still plenty of things I'm missing, here's one of them. If I want to run Kubernetes on my public cloud, like GCE or AWS, will Kubernetes spin up new VMs by itself in order to make more compute for new pods that might be needed? Or will it only use a certain amount of VMs that were pre-configured as the compute pool. I heard Brendan say, in his talk in CoreOS fest, that Kubernetes sees the VMs as a "sea of compute" and the user doesn't have to worry about which VM is running which pod - I'm interested to know where that pool of compute comes from, is it configured when setting up Kubernetes? Or will it scale by itself and create new machines as needed?
I hope I managed to be coherent.
Thanks!
Kubernetes supports scaling, but not auto-scaling. The addition and removal of new pods (VMs) in a Kubernetes cluster is performed by replication controllers. The size of a replication controller can be changed by updating the replicas field. This can be performed in a couple ways:
Using kubectl, you can use the scale command.
Using the Kubernetes API, you can update your config with a new value in the replicas field.
Kubernetes has been designed for auto-scaling to be handled by an external auto-scaler. This is discussed in responsibilities of the replication controller in the Kubernetes docs.