what are the Kubernetes modules directly communicating with etcd - kubernetes

I was trying to understand how exactly the kubernetes modules interacts with etcd. I understand kubernetes modules by themselves are stateless and they keep the states in etcd. But I am confused when it comes to how modules are interacting with etcd. I see conflicting texts on this, some saying all etcd interactions are happening through apiserver and some others say all the modules interacts with etcd.
I am looking for the possibility of changing etcd endpoint and restarting integration points so that they can work with new etcd instance.
I do not have time to go look in to the code to understand this part so hoping the someone here can help me on this.

If a kubernete component want to communicate with etcd, it must know the endpoint of etcd.
If you check the spec config of these components, you will find the correct answer: only api-server directly talk to etcd.

All kubernetes components, such as, kubelet, kubeproxy, scheduler, controllers etc. interact with etcd through API server. They dont directly talk to etcd.
if you change etcd endpoint, then same should be updated in api server configuration.

Related

Will the master know the data on workers/nodes in k8s

I try to deploy a set of k8s on the cloud, there are two options:the masters are in trust to the cloud provider or maintained by myself.
so i wonder about that if the masters in trust will leak the data on workers?
Shortly, will the master know the data on workers/nodes?
The abstractions in Kubernetes are very well defined with clear boundaries. You have to understand the concept of Volumes first. As defined here,
A Kubernetes volume is essentially a directory accessible to all
containers running in a pod. In contrast to the container-local
filesystem, the data in volumes is preserved across container
restarts.
Volumes are attached to the containers in a pod and There are several types of volumes
You can see the layers of abstraction source
Master to Cluster communication
There are two primary communication paths from the master (apiserver) to the cluster. The first is from the apiserver to the kubelet process which runs on each node in the cluster. The second is from the apiserver to any node, pod, or service through the apiserver’s proxy functionality.
Also, you should check the CCM - The cloud controller manager (CCM) concept (not to be confused with the binary) was originally created to allow cloud specific vendor code and the Kubernetes core to evolve independent of one another. The cloud controller manager runs alongside other master components such as the Kubernetes controller manager, the API server, and scheduler. It can also be started as a Kubernetes addon, in which case it runs on top of Kubernetes.
Hope this answers all your questions related to Master accessing the data on Workers.
If you are still looking for more secure ways, check 11 Ways (Not) to Get Hacked
Short answer: yes the control plane can access all of your data.
Longer and more realistic answer: probably don't worry about it. It is far more likely that any successful attack against the control plane would be just as successful as if you were running it yourself. The exact internal details of GKE/AKS/EKS are a bit fuzzy, but all three providers have a lot of experience running multi-tenant systems and it wouldn't be negligent to trust that they have enough protections in place against lateral escalations between tenants on the control plane.

Is an overlay network still necessary?

When using Istio with Kubernetes, is an overlay network still required for each node?
I have read the FAQ's and the documentation, but cannot see anything that directly references this.
Istio is built upon the Kubernetes. It creates sidecar containers in the Kubernetes Pods for routing requests, gathering metrics and so on. But still, it requires a way for Pods to communicate with each other. Therefore, Kubernetes network overlay is required.
For additional information, you can start from the following link.

usecase for etcd inside Kubernetes

I was just wondering, why it is useful to run etcd cluster inside Kubernetes, when Kubernetes itself depends on etcd.
It just does not make sense to me, as if I have HA Kube, I am also forced to have HA etcd outside. Hence to reason to install it again inside...
I have an external ETCD that manages my k8s HA cluster and im not letting any developer apps near it. I would be too concerned about something going wrong and breaking the k8s cluster. It is also a fixed size at 3 which works well for the cluster size with its requirements. If the developers need a key/value store for their db and want etcd, this would be a great way to make one in the cluster for the applications. With it being statefulsets, its scalable.
If you're using Kubernetes via GKE, the underlying Etcd cluster is not exposed in any way.

how to install kubernetes manually?

While getting familiar with kubernetes I do see tons of tools that should helps me to install kubernetes anywhere, but I don't understand exactly what it does inside, and as a result don't understand how to trouble shoot issues.
Can someone provide me a link with tutorial how to install kubernetes without any tools.
There are two good guides on setting up Kubernetes manually:
Kelsey Hightower's Kubernetes the hard way
Kubernetes guide on getting started from scratch
Kelsey's guide assumes you are using GCP or AWS as the infrstructure, while the Kubernetes guide is a bit more agnostic.
I wouldn't recommend running either of these in production unless you really know what you're doing. However, they are great for learning what is going on under the hood. Even if you just read the guides and don't use them to setup any infrastructure you should gain a better understanding of the pieces that make up a Kubernetes cluster. You can then use one of the helpful setup tools to create your cluster, but now you will understand what it is actually doing and can debug when things go wrong.
For simplicity, you can view k8s as three components
etcd
k8s master, which includes kube-apiserver, controller, scheduler
node, which contains kubelet
You can install etcd and k8s master together in one machine. The procedures are
Install etcd. Download etcd package and run it, which is quite
simple. Remember the port of etcd service, e.g. 2379,4001, or any you
set.
Git clone the kubernetes project from github. Find the executable binary file, e.g. for k8s version 1.3, you can find kube-apiserver, kube-controller-manager and kube-scheduler in src/k8s.io/kubernetes/_output/local/bin/linux/amd64 folder
Then run kube-apiserver, specify the etcd ip and port (e.g. --etcd_servers=http://127.0.0.1:4001)
Run scheduler and controller, specifying the apiserver ip and port(e.g. --master=127.0.0.1:8080). There is no oreder between scheduler and controller
Master is running so far. Make sure these processes run without errors. If etcd exits, apiserver would exit. If apiserver exits, scheduler and controller would exit.
On another machine(virtual preferred, network connected), run kubelet. Kubelet could also be found in previous folder(src/k8s.io/kubernetes/_output/local/bin/linux/amd64), specify apiserver ip and port(e.g. --api-servers=http://10.10.10.19:8080). You may install docker or something else on node, which to prove that you could create a container.

kubelet only choose the first api-server given, caused all services unavailable

one of The kubelet's start parameter is
-api-servers=[]: List of Kubernetes API servers for publishing events, and reading pods and services. (ip:port), comma separated.
It appears that it's designed for api-server's HA, only if one of the api server is alive, that everything will work well.
But I found that the kubelet would only choose the first api server, even if I gave it 3 api -servers. If the first api-server was stopped, all the services were unavailable.
The version I used is:
Kubernetes v1.2.1
So are there any ways to avoid this issue, Hopefully I just use it in a wrong way. Or I may fix it in the kubelet..
Any comments are appreciated.
This is expected.
In short, current model for HA expects load balancing (e.g., gcplb/elb/nginx/haproxy) in front of the apiserver, so that node components don't have to be aware of multiple apiservers. However, it's recognized that there is a need to pass multiple apiserver endpoints to kubernetes components, and is slotted for to be fixed for kubernetes v1.4.
See the detailed discussions in https://github.com/kubernetes/kubernetes/issues/18174