Pod <-> compute machine with CPU and RAM? - kubernetes

I am working with containers that contain HTTP-based REST APIs. Each container contains one service.
This documentation page says that each pod has a unique IP address, from which I conclude that the service inside each container can be accessed via this IP address plus a specific port. It is also stated that a pod is associated with a storage volume such that all containers have access to it. My first question comes here: do the containers inside the same pod also draw from the same physical memory when they run?
Reading the documentation about pods reminded me of how I used to make multiple services interact with each other on my laptop. I ran them all on my laptop, so they had the same IP address (localhost), and I could call one of the services by specifying the port it was listening on. So in my head, I identified a pod with a laptop/a computer.
My second question would be: is this identification correct or misleading?
When I read further the documentation, I find that:
A node is a worker machine ... and may be ... a physical machine
and that pods run on nodes. Here I am confused. If a node is a worker machine, i.e. a compute machine just like my laptop, then it has a unique IP address. And this is incompatible with the fact that nodes run pods and that each pod has its own unique IP address. I'm assuming that if a node is a machine, then it has an IP address X, and all "things" inside this machine will also have the same IP address X, including the pods.
So I again ask my second question : is a pod a compute machine ?
I think I need to sit down with someone who knows the Kubernetes API, but not having such a person, I'm trying here while continuing to read the docs!

All valid questions demonstrating an inquisitive mind (not guaranteed in IT these days). To answer your questions:
1. Do the containers inside the same Pod also draw from the same physical memory when they run?
Containers all run on the same kernel and therefore share the same resources. Unlike VMs, there is no virtualization layer isolating them. However, container processes benefit from the same memory isolation as any other process running on a Linux kernel. On top of that, container processes can be limited to no more than a certain amount of memory and CPU.
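To make that last point concrete, here is a minimal, illustrative Pod manifest that caps one container's memory and CPU; the pod name, image and numbers are arbitrary placeholders, not something from your setup:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: limited-pod
spec:
  containers:
  - name: api
    image: nginx:1.25
    resources:
      requests:            # what the scheduler reserves on the node
        memory: "128Mi"
        cpu: "250m"
      limits:              # hard caps enforced via cgroups by the kernel
        memory: "256Mi"
        cpu: "500m"
EOF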
2. Is this identification [Pods are like services running on a Laptop] correct or misleading?
This identification only holds regarding networking: just like on your laptop, all of a Pod's containers run in the same network and can therefore address each other via localhost. However, unlike on your laptop, containers within Pods are still isolated from each other. They have their own file systems and run in separate process namespaces.
3. Is a Pod a compute machine?
A compute machine would be hardware (virtual or real) plus an operating system (kernel + applications) in my books. That means a Pod is not a compute machine. A Pod is merely a bunch of contained processes sharing a network namespace. In order to run a Pod, you need to provide hardware, a kernel and a container runtime, in other words a K8s cluster.
Regarding networking: K8s worker machines have IP addresses assigned to them that make them addressable within their network or externally. Those IPs don't conflict with IPs assigned to Pods, since Pod IPs are internal IPs. They are only addressable from within the K8s virtual network, i.e. from other Pods within the same K8s cluster.
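If you have kubectl access to a running cluster, you can see the two address spaces side by side (illustrative commands, any cluster will do):

kubectl get nodes -o wide   # node (machine) IPs, reachable on your real network
kubectl get pods -o wide    # Pod IPs, allocated from the cluster-internal Pod range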

Following are some simplified definitions:
Container: an application which runs in an isolated user space (note: not kernel space) with all the dependencies it requires. So from the application's perspective we can consider it a 'lightweight VM'. We say 'lightweight' because it shares the host OS kernel; we say 'VM' because it has its own isolated process namespace (pid), network namespace (net), mount namespace (mnt) and hostname namespace (uts).
How can each container have its own IP, different from the host IP?
This is possible due to the implementation of network namespaces.
Then what is a Pod?
A Pod is the Kubernetes object which enables users to run containers. Kubernetes implemented the Pod as an abstraction layer to break the dependency on any particular container runtime (CRI).
Can a Pod have multiple containers in it?
Yes, a Pod can have multiple containers. The IP is assigned at the Pod level, so in the case of multi-container Pods the containers can communicate with each other over the localhost interface.
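As a hedged illustration of that localhost communication (images and names here are placeholders, not anything specific to your setup):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: localhost-demo
spec:
  containers:
  - name: web
    image: nginx:1.25          # listens on port 80
  - name: probe
    image: busybox:1.36
    command: ["sh", "-c", "while true; do wget -qO- http://localhost:80 >/dev/null && echo reached web over localhost; sleep 5; done"]
EOF

kubectl logs localhost-demo -c probe   # should keep printing "reached web over localhost"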
Is the Pod that runs containers a compute machine with RAM and CPU? Or is that rather the node?
Yes, while creating a container as part of a Pod you can assign RAM and CPU to it via resource requests and limits.
It's more about how a node can have an IP address while the pods inside it have other IP addresses?
This is possible with a combination of two Linux features: veth interfaces (virtual Ethernet) and network namespaces. Even in a world without containers we can create multiple veth interfaces on a Linux system and assign different IPs to them. Container runtimes took the veth feature of Linux and combined it with network namespaces to provide an isolated network environment for each container. See this video.
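If it helps, here is a bare-bones sketch of that veth + network namespace mechanism, outside of any container runtime (run as root on a Linux box; the namespace name and the 10.200.0.0/24 range are arbitrary choices for the example):

ip netns add ns1                                    # a new, isolated network namespace
ip link add veth-host type veth peer name veth-ns   # a veth pair: two connected virtual NICs
ip link set veth-ns netns ns1                       # move one end into the namespace
ip addr add 10.200.0.1/24 dev veth-host             # host-side address
ip link set veth-host up
ip netns exec ns1 ip addr add 10.200.0.2/24 dev veth-ns
ip netns exec ns1 ip link set veth-ns up
ip netns exec ns1 ip link set lo up
ping -c1 10.200.0.2                                 # the host can reach the namespace's IP
ip netns exec ns1 ping -c1 10.200.0.1               # and the namespace can reach the host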
Please refer to the following articles:
https://medium.com/faun/understanding-nodes-pods-containers-and-clusters-778dbd56ade8
https://www.mirantis.com/blog/multi-container-pods-and-container-communication-in-kubernetes/

Related

Kubernetes: How does CNI take advantage of BGP?

When learning about Kubernetes CNI, I heard that some plugins use BGP or VXLAN under the hood.
On the Internet, the Border Gateway Protocol (BGP) manages how packets are routed between edge routers.
Autonomous systems (AS) are groups of network routers managed by a single enterprise or service provider, for example Facebook or Google.
Autonomous systems (AS) communicate with peers and form a mesh.
But I still can't figure out how the CNI plugin takes advantage of BGP.
Imagine there is a Kubernetes cluster, which is composed of 10 nodes. Calico is the chosen CNI plugin.
Who plays the Autonomous System(AS) role? Is each node an AS?
How are packets forwarded from one node to another node? Are iptables still required?
The CNI plugin is responsible for allocating IP addresses (IPAM) and ensuring that packets get where they need to get.
For Calico specifically, you can get a lot of information from the architecture page as well as the Calico network design memoirs.
Whenever a new Pod is created, the IPAM plugin allocates an IP address from the global pool and the Kubernetes scheduler assigns the Pod to a Node. The Calico CNI plugin (like any other) configures the networking stack to accept connections to the Pod IP and routes them to the processes inside. This happens with iptables and uses a helper process called Felix.
Each Node also runs a BIRD (BGP) daemon that watches for these configuration events: "IP 10.x.y.z is hosted on node A". These configuration events are turned into BGP updates and sent to other nodes using the open BGP sessions.
When the other nodes receive these BGP updates, they program the node route table (with simple ip route commands) to ensure the node knows how to reach the Pod. In this model, yes, every node is an AS.
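To make this concrete, a route table programmed this way tends to look roughly like the following; the addresses, CIDR blocks and interface names below are invented for illustration only:

ip route show
# 10.244.1.0/26 via 192.168.0.12 dev eth0 proto bird   <- Pod block hosted on another node, learned via BGP
# 10.244.2.0/26 via 192.168.0.13 dev eth0 proto bird   <- Pod block on yet another node
# 10.244.0.7 dev cali1a2b3c4d5e6 scope link            <- a local Pod, reached through its veth interface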
What I just described is the "AS per compute server" model: it is suitable for small deployments in environments where nodes are not necessarily on the same L2 network. The problem is that each node needs to maintain a BGP session with every other node, which scales as O(N^2).
For larger deployments therefore, a compromise is to run one AS per rack of compute servers ("AS per rack"). Each top of rack switch then runs BGP to communicate routes to other racks, while the switch internally knows how to route packets.

kubernetes: How multiple containers inside a pod use localhost

I see that Kubernetes uses pods, and in each pod there can be multiple containers.
For example, I create a pod with:
Container 1: Django server - running at port 8000
Container 2: Reactjs server - running at port 3000
Since the containers inside a pod can't have conflicting ports anyway, it seems better to put all of them in one container, because I thought the advantage of using containers was not having to worry about port conflicts:
Container 1: BOTH the Django server, running at port 8000, and the ReactJS server, running at port 3000
No need for container 2.
and also
When I run different docker containers on my PC I can't access them via localhost,
but then how is this possible inside a Pod with multiple containers?
What's the difference between docker containers run on a PC and those inside a Pod?
The typical way to think about this delineation is "which parts of my app scale together?"
So for your example, you probably wouldn't even choose a common pod for them. You should have a Django pod and separately, a ReactJS server pod. Thus you can scale these independently.
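For example, if Django and ReactJS live in separate Deployments, each one can be scaled on its own; the deployment names below are hypothetical:

kubectl scale deployment django --replicas=5
kubectl scale deployment react --replicas=2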
The typical case for deploying pods with multiple containers is a pattern called "sidecar", where the added container enhances some aspect of the deployed workload, and always scales right along with that workload container. Examples are:
Shipping logs to a central log server
Security auditing
Purpose-built Proxies - e.g. handles DB connection details
Service Mesh (intercepts all network traffic and handles routing, circuit breaking, load balancing, etc.)
As for deploying the software into the same container, this would only be appropriate if the two pieces being considered for co-deployment into the same container are developed by the same team and address the same concerns (that is - they really are only one piece when you think about it). If you can imagine them being owned/maintained by distinct teams, let those teams ship a clean container image with a contract to use networking ports for interaction.
(Some of) the details are these:
Containers in a Pod share a networking and IPC namespace. Thus one container in a pod can modify iptables and the modification applies to all other containers in that pod. This may help guide your choice: which containers should have that intimate a relationship to each other?
Specifically I am referring to Linux Namespaces, a feature of the kernel that allows different processes to share a resource but not "see" each other. Containers are normal Linux processes, but with a few other Linux features in place to stop them from seeing each other. This video is a great intro to these concepts. (timestamp in link lands on a succinct slide/moment)
Edit: I noticed the question was edited to be more succinctly about networking. The answer is in the namespace feature of the Linux kernel that I mentioned. Every process belongs to a network namespace. Without doing anything special, it would be the default network namespace. Containers usually launch into their own network namespace, depending on the tool you use to launch them. Linux then includes a feature where you can virtually connect two namespaces: this is called a veth pair (a pair of virtual Ethernet devices, connected). After a veth pair is set up between the default namespace and the container's namespace, both get a new eth device and can talk to each other. Not all tools will set up that veth pair by default (example: Kubernetes will not do this by default). You can, however, tell Kubernetes to launch your pod in "host" networking mode, which just uses the system's default network namespace, so the veth pair is not even required.

What's the difference between pod and container from container runtime's perspective?

The Kubernetes documentation describes a pod as a wrapper around one or more containers. Containers running inside a pod share a set of namespaces (e.g. network), which makes me think namespaces are nested (I kind of doubt that). What is the wrapper here from the container runtime's perspective?
Since containers are just processes constrained by namespaces, cgroups, etc., perhaps a pod is just the first container launched by the kubelet, and the rest of the containers are started and grouped by namespaces?
The main difference is networking, the network namespace is shared by all containers in the same Pod. Optionally, the process (pid) namespace can also be shared. That means containers in the same Pod all see the same localhost network (which is otherwise hidden from everything else, like normal for localhost) and optionally can send signals to processes in other containers.
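As a hedged sketch of that optional process-namespace sharing (shareProcessNamespace is a standard Pod spec field; the names and images are placeholders):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: shared-pid-demo
spec:
  shareProcessNamespace: true   # containers can see and signal each other's processes
  containers:
  - name: app
    image: nginx:1.25
  - name: debug
    image: busybox:1.36
    command: ["sleep", "3600"]
EOF

kubectl exec shared-pid-demo -c debug -- ps   # the nginx processes from the other container show up here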
The idea is that Pods are groups of related containers, not really a wrapper per se but a set of containers that should always deploy together for whatever reason. Usually that's a primary container and then some sidecars providing support services (mesh routing, log collection, etc).
A Pod is just a co-located group of containers and a Kubernetes object.
Instead of deploying the containers separately, you can deploy them as a pod.
Best practice is that you should not actually run multiple processes in a single container, and this is where the pod idea comes in: by running pods you group containers together and orchestrate them as a single object.
Containers in a pod share the same network namespace (IP address and port space), so you have to be careful not to have the same port used by two processes.
This differs, for example, when it comes to the filesystem, since each container's filesystem comes from its image. The file systems are isolated unless the containers share a Volume.
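A hedged sketch of what such a shared Volume looks like in practice (an emptyDir here; names and images are arbitrary), with everything outside the shared mount staying isolated per container:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-demo
spec:
  volumes:
  - name: shared-data
    emptyDir: {}                 # scratch volume that lives as long as the Pod
  containers:
  - name: writer
    image: busybox:1.36
    command: ["sh", "-c", "while true; do date >> /data/out.log; sleep 5; done"]
    volumeMounts:
    - name: shared-data
      mountPath: /data
  - name: reader
    image: busybox:1.36
    command: ["sh", "-c", "tail -F /data/out.log"]
    volumeMounts:
    - name: shared-data
      mountPath: /data
EOF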

CIDR Address and advertise-address defining in Kubernetes Installation

I am trying to install Kubernetes on my on-premises Ubuntu 16.04 server, and I am referring to the following documentation:
https://medium.com/#Grigorkh/install-kubernetes-on-ubuntu-1ac2ef522a36
After installing kubelet, kubeadm and kubernetes-cni, I found that I have to initialize kubeadm with the following command:
kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=10.133.15.28 --kubernetes-version stable-1.8
Here I am totally confused about why we are setting the CIDR and the API server advertise address. I am listing my points of confusion here:
Why are we specifying the CIDR and --apiserver-advertise-address here?
How can I find these two addresses for my server?
And why is Flannel used in the Kubernetes installation?
I am new to this containerization and Kubernetes world.
Why are we specifying the CIDR and --apiserver-advertise-address here?
And why is Flannel used in the Kubernetes installation?
Kubernetes uses the Container Network Interface (CNI) to create a special virtual network inside your cluster for communication between pods.
Here is some explanation of the "why" from the documentation:
Kubernetes imposes the following fundamental requirements on any networking implementation (barring any intentional network segmentation policies):
all containers can communicate with all other containers without NAT
all nodes can communicate with all containers (and vice-versa) without NAT
the IP that a container sees itself as is the same IP that others see it as
Kubernetes applies IP addresses at the Pod scope - containers within a Pod share their network namespaces - including their IP address. This means that containers within a Pod can all reach each other’s ports on localhost. This does imply that containers within a Pod must coordinate port usage, but this is no different than processes in a VM. This is called the “IP-per-pod” model.
So, Flannel is one of the CNI plugins which can be used to create the network that connects all your pods, and the CIDR option defines a subnet for that network. There are many alternative CNI plugins with similar functions.
If you want to get more details about how networking works in Kubernetes, you can read the link above or, for example, here.
How can I find these two addresses for my server?
The API server advertise address has to be a single, static address. That address is used by all components to communicate with the API server. Unfortunately, Kubernetes has no support for multiple API server addresses per master.
You can still have as many addresses on your server as you want, but only one of them can be defined as --apiserver-advertise-address. The only requirement is that it has to be accessible from all the nodes in your cluster.
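As an illustrative way of finding candidates for the advertise address (any address the other nodes can reach will do):

ip -4 addr show scope global    # list the server's routable IPv4 addresses
hostname -I                     # or simply print them all

The --pod-network-cidr value, on the other hand, is not discovered from the server: it is a private range you choose for Pod IPs, and it just must not overlap with your real network. 10.244.0.0/16 appears in the command above because it is Flannel's default Pod network.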

kubernetes network performance issue: moving a service from a physical machine to kubernetes causes a 50% rps drop

I set up a kubernetes cluster with 2 powerful physical servers (32 cores + 64GB memory). Everything runs very smoothly except for the bad network performance I observed.
As a comparison: I ran my service on such a physical machine directly (one instance), with a client machine in the same network subnet calling the service. The rps easily goes up to 10k. But when I put the exact same service in kubernetes version 1.1.7, one pod (instance) of the service is launched and the service is exposed by ExternalIP in the service yaml file. With the same client, the rps drops to 4k. Even after I switched kube-proxy to iptables mode, it doesn't seem to help a lot.
When I search around, I saw this document https://www.percona.com/blog/2016/02/05/measuring-docker-cpu-network-overhead/
It seems the docker port-forwarding is the network bottleneck, while other docker network modes, like --net=host, bridge networking, or containers sharing a network, don't show such a performance drop. Is the Kubernetes team already aware of such a network performance drop, since the docker containers are launched and managed by Kubernetes? Is there any way to tune Kubernetes to use another docker network mode?
You can configure Kubernetes networking in a number of different ways when configuring the cluster, and a few different ways on a per-pod basis. If you want to try verifying whether the docker networking arrangement is the problem, set hostNetwork to true in your pod specification and give it another try (example here). This is the equivalent of the docker --net=host setting.
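A minimal sketch of that hostNetwork setting (pod name and image are placeholders, not your actual service); the Pod then uses the node's network namespace directly, just like docker run --net=host:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: hostnet-demo
spec:
  hostNetwork: true              # reuse the node's network namespace, no veth/port-forwarding path
  containers:
  - name: service
    image: my-service:latest     # hypothetical image
    ports:
    - containerPort: 8080
EOF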