In preparation for HIPAA compliance, we are transitioning our Kubernetes cluster to use secure endpoints across the fleet (between all pods). Since the cluster is composed of about 8-10 services currently using HTTP connections, it would be super useful to have this taken care of by Kubernetes.
The specific attack vector we'd like to address with this is packet sniffing between nodes (physical servers).
This question breaks down into two parts:
Does Kubernetes encrypt the traffic between pods & nodes by default?
If not, is there a way to configure it to do so?
Many thanks!
Actually the correct answer is "it depends". I would split the cluster into 2 separate networks.
Control Plane Network
This is the physical network, in other words the underlay network.
k8s control-plane elements - kube-apiserver, kube-controller-manager, kube-scheduler, kube-proxy, kubelet - talk to each other in various ways. Except for a few endpoints (e.g. metrics), it is possible to configure encryption on all endpoints.
If you're also pentesting, then kubelet authn/authz should be switched on too. Otherwise, the encryption doesn't prevent unauthorized access to the kubelet. This endpoint (at port 10250) can be hijacked with ease.
Cluster Network
The cluster network is the one used by the Pods, also referred to as the overlay network. Encryption is left to the third-party overlay plugin to implement; failing that, the application has to implement it itself.
The Weave overlay supports encryption. The service mesh linkerd that #lukas-eichler suggested can also achieve this, but on a different networking layer.
The replies here seem to be outdated. As of 2021-04-28 at least the following components seem to be able to provide an encrypted networking layer to Kubernetes:
Istio
Weave
linkerd
cilium
Calico (via Wireguard)
(the list above was compiled from the respective projects' home pages)
Does Kubernetes encrypt the traffic between pods & nodes by default?
Kubernetes does not encrypt any traffic.
There are service meshes like linkerd that allow you to easily introduce HTTPS communication between your HTTP services.
You would run an instance of the service mesh on each node and all services would talk to the service mesh. The communication inside the service mesh would be encrypted.
Example:
your service -http-> localhost to servicemesh node - https-> remoteNode -http-> localhost to remote service.
When you run the service mesh node in the same pod as your service the localhost communication would run on a private virtual network device that no other pod can access.
No, Kubernetes does not encrypt traffic by default.
I haven't personally tried it, but the description of the Calico software-defined network seems oriented toward what you are describing, with the additional benefit of already being Kubernetes-friendly.
I thought that Calico did native encryption, but based on this GitHub issue it seems they recommend using a solution like IPsec to encrypt traffic, just as you would on a traditional host.
Related
Why do we need point-to-point connection between pods while we have workloads abstraction and networking mechanism (Service/kube-proxy/Ingress etc.) over it?
What is the default CNI?
REDACTED: I was confused about this question because I felt like I hadn't installed any of the popular CNI plugins when I was installing Kubernetes. It turns out Kubernetes defaults to kubenet.
Btw, I see a lot of overlapping features between Istio and container networks. IMO they could achieve identical objectives. The only difference is that Istio is high-level and CNI is low-level and more efficient; is that correct?
REDACTED: Interestingly, Istio has its own CNI.
Kubernetes networking has some requirements:
pods on a node can communicate with all pods on all nodes without NAT
agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node
pods in the host network of a node can communicate with all pods on all nodes without NAT
CNI (Container Network Interface) defines a standard interface that all implementations (Calico, Flannel) need to follow.
So it aims to solve Kubernetes networking.
A Service (SVC) is different: it provides a virtual address that proxies to the pods. Pods are ephemeral and their IPs change, but the address of the Service stays the same.
Istio is another thing again: it turns the connections between microservices into infrastructure and pulls that part out of the business code (think of Spring Cloud).
why do we need point-to-point connection between pods while we have workloads abstraction and networking mechanism(Service/kube-proxy/Ingress etc.) over it?
In general, you will find everything about networking in a cluster in this documentation. You can find more information about pod networking:
Every Pod gets its own IP address. This means you do not need to explicitly create links between Pods and you almost never need to deal with mapping container ports to host ports. This creates a clean, backwards-compatible model where Pods can be treated much like VMs or physical hosts from the perspectives of port allocation, naming, service discovery, load balancing, application configuration, and migration.
Kubernetes imposes the following fundamental requirements on any networking implementation (barring any intentional network segmentation policies):
pods on a node can communicate with all pods on all nodes without NAT
agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node
Note: For those platforms that support Pods running in the host network (e.g. Linux):
pods in the host network of a node can communicate with all pods on all nodes without NAT
Then you are asking:
what is the default cni?
There is no single default CNI in a Kubernetes cluster. It depends on the type of cluster, and on where and how you set it up. As you can see from this doc about implementing the networking model, there are many CNIs available in Kubernetes.
Istio is a completely different tool for something else. You can't compare them like that. Istio is a service mesh tool.
Istio extends Kubernetes to establish a programmable, application-aware network using the powerful Envoy service proxy. Working with both Kubernetes and traditional workloads, Istio brings standard, universal traffic management, telemetry, and security to complex deployments.
How do I deploy a Kubernetes Service (type LoadBalancer) on on-prem VMs? When I use type=LoadBalancer, the external IP shows as "pending", but everything works fine with the same YAML if I deploy on GKE. My questions are:
Do we need a load balancer if I use type=LoadBalancer on on-prem VMs?
Can I assign the LoadBalancer IP manually in YAML?
You need to set up MetalLB.
MetalLB hooks into your Kubernetes cluster, and provides a network load-balancer implementation. In short, it allows you to create Kubernetes services of type LoadBalancer in clusters that don’t run on a cloud provider, and thus cannot simply hook into paid products to provide load-balancers.
To install, run:
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/metallb.yaml
For more details, see the MetalLB installation guide.
It might be helpful to check the Banzai Cloud Pipeline Kubernetes Engine (PKE) that is "a simple, secure and powerful CNCF-certified Kubernetes distribution" platform. It was designed to work on any cloud, VM or on bare metal nodes to provide a scalable and secure foundation for private clouds. PKE is cloud-aware and includes an ever-increasing number of cloud and platform integrations.
When I use type=LoadBalancer, the external IP shows as "pending", but everything works fine with the same YAML if I deploy on GKE.
If you create a LoadBalancer service — for example try to expose your own TCP based service, or install an ingress controller — the cloud provider integration will take care of creating the needed cloud resources, and writing back the endpoint where your service will be available. If you don't have a cloud provider integration or a controller for this purpose, your Service resource will remain in Pending state.
In case of Kubernetes, LoadBalancer services are the easiest and most common way to expose a service (redundant or not) for the world outside of the cluster or the mesh — to other services, to internal users, or to the internet.
Load balancing as a concept can happen on different levels of the OSI network model, mainly on L4 (transport layer, for example TCP) and L7 (application layer, for example HTTP). In Kubernetes, Services are an abstraction for L4, while Ingresses are a generic solution for L7 routing.
You need to set up MetalLB.
MetalLB is one of the most popular on-prem replacements for LoadBalancer cloud integrations. The whole solution runs inside the Kubernetes cluster.
The main component is an in-cluster Kubernetes controller which watches LB service resources, and based on the configuration supplied in a ConfigMap, allocates and writes back IP addresses from a dedicated pool for new services. It maintains a leader node for each service, and depending on the working mode, advertises it via BGP or ARP (sending out unsolicited ARP packets in case of failovers).
MetalLB can operate in two ways: either all requests are forwarded to pods on the leader node, or they are distributed to all nodes with kube-proxy.
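To make the "allocates IP addresses from a dedicated pool" part more concrete, here is a toy model in Python. It is only a sketch of the idea, not MetalLB's actual code: the class name, the address range and the service names are all made up for illustration, and the `requested` argument only loosely mirrors a Service asking for a specific address.

```python
import ipaddress


class AddressPool:
    """Toy model of handing out LoadBalancer IPs from a fixed range (hypothetical, not MetalLB code)."""

    def __init__(self, first, last):
        start = int(ipaddress.ip_address(first))
        end = int(ipaddress.ip_address(last))
        self._free = [ipaddress.ip_address(i) for i in range(start, end + 1)]
        self._assigned = {}

    def allocate(self, service, requested=None):
        """Return the IP for a service, or None if nothing suitable is free (the Service stays pending)."""
        if service in self._assigned:
            return self._assigned[service]
        if requested is not None:
            ip = ipaddress.ip_address(requested)
            if ip not in self._free:
                return None
        elif self._free:
            ip = self._free[0]
        else:
            return None
        self._free.remove(ip)
        self._assigned[service] = ip
        return ip

    def release(self, service):
        """Give a deleted service's IP back to the pool."""
        ip = self._assigned.pop(service, None)
        if ip is not None:
            self._free.append(ip)
            self._free.sort()


# Made-up range and service names, for illustration only.
pool = AddressPool("192.168.1.240", "192.168.1.241")
print(pool.allocate("default/web"))                            # 192.168.1.240
print(pool.allocate("default/api", requested="192.168.1.241"))  # 192.168.1.241
print(pool.allocate("default/extra"))                          # None, pool exhausted
```

Once the pool is exhausted, a new service gets no address, which is the same "pending" symptom as having no LoadBalancer implementation at all.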
Layer 7 (usually HTTP/HTTPS) load balancer appliances like F5 BIG-IP, or HAProxy and Nginx based solutions may be integrated with an applicable ingress-controller. If you have such, you won't need a LoadBalancer implementation in most cases.
Hope that sheds some light on a "LoadBalancer on bare metal hosts" question.
I've been looking into Kubernetes networking, more specifically, how to serve HTTPS users most efficiently.
I was watching this talk: https://www.youtube.com/watch?v=0Omvgd7Hg1I and from 22:18 he explains what the problem is with a load balancer that is not pod aware. Now, how they solve this in kubernetes is by letting the nodes also act as a 'router' and letting the node pass the request on to another node. (explained at 22:46). This does not seem very efficient, but when looking around SoundCloud (https://developers.soundcloud.com/blog/how-soundcloud-uses-haproxy-with-kubernetes-for-user-facing-traffic) actually seems to do something similar to this but with NodePorts. They say that the overhead costs less than creating a better load balancer.
From what I have read an option might be using an ingress controller. Making sure that there is not more than one ingress controller per node, and routing the traffic to the specific nodes that have an ingress controller. That way there will not be any traffic re-routing needed. However, this does add another layer of routing.
This information is all from 2017, so my question is: is there any pod aware load balancer out there, or is there some other method that does not involve sending the http request and response over the network twice?
Thank you in advance,
Hendrik
EDIT:
A bit more information about my use case:
There is a bare-metal setup with Kubernetes. The firewall load balances the incoming data between two HAProxy instances. These HAProxy instances do SSL termination and forward the traffic to a few sites. This includes an Exchange setup, a few internal IIS sites and an nginx server for a static web app. The idea is to move the app servers into Kubernetes.
Now my main problem is how to get the requests from HAProxy into kubernetes. I see a few options:
Use the SoundCloud setup. The infrastructure could stay almost the same, and the HAProxy servers can still operate the way they do now.
I could use an ingress controller on EACH node in the Kubernetes cluster and have the firewall load balance between the nodes. I believe it is possible to forward traffic from the ingress controller to servers outside the cluster, e.g. Exchange.
Some magic load balancer that I do not know about that is pod aware and able to operate outside of the kubernetes cluster.
Options 1 and 2 are relatively simple and quite close in how they work, but they do come with a performance penalty. This is the case when the node that the request gets forwarded to by the firewall does not have the required pod running, or if another pod is doing less work. The request will get forwarded to another node, thus using the network twice.
Is this just the price you pay when using Kubernetes, or is there something that I am missing?
How traffic reaches pods depends on whether a managed cluster is used.
Almost all cloud providers can forward traffic in a cloud-native way in their managed K8s clusters. First, you create a managed cluster with some special network settings (e.g. a VPC-native cluster on GKE). Then, the only thing you need to do is create a Service of type LoadBalancer to expose your workload. You can also create Ingresses for your L7 workloads; they will be handled by the provided IngressControllers (e.g. ALB on AWS).
In an on-premise cluster without any cloud provider integration (such as OpenStack or vSphere), the only way to expose workloads is a Service of type NodePort. That doesn't mean you can't improve on it.
If your cluster is behind reverse proxies (the SoundCloud case), setting externalTrafficPolicy: Local on Services stops traffic from being forwarded between worker nodes. Traffic received through a NodePort is forwarded to local Pods, or dropped if the Pods reside on other nodes. The reverse proxy will mark those NodePorts as unhealthy in its backend health check and refuse to forward traffic to them. Another choice is topology-aware service routing. In this case, local Pods take priority, and traffic is still forwarded between nodes when no local Pod matches.
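The difference between the two policies can be sketched as a small Python model. This is a simplified illustration of the behaviour described above, not how kube-proxy is implemented; the function names and the node names are invented for the example.

```python
import random


def route(node, pods_by_node, policy="Cluster"):
    """Pick the node whose Pod serves a request arriving on `node`'s NodePort.

    Returns None when the packet is dropped.
    """
    local = pods_by_node.get(node, 0)
    if policy == "Local":
        # Only local Pods are eligible; no local Pod means the packet is dropped.
        return node if local > 0 else None
    # "Cluster": any Pod may be chosen, so the request may take an extra hop.
    candidates = [n for n, count in pods_by_node.items() for _ in range(count)]
    return random.choice(candidates) if candidates else None


def healthy_backends(pods_by_node, policy):
    """NodePorts a reverse proxy health check would keep in rotation (simplified)."""
    if policy == "Local":
        return sorted(n for n, c in pods_by_node.items() if c > 0)
    has_pods = any(c > 0 for c in pods_by_node.values())
    return sorted(pods_by_node) if has_pods else []


# Hypothetical cluster: number of matching Pods per node.
pods = {"node-a": 2, "node-b": 0, "node-c": 1}

print(route("node-b", pods, "Local"))     # None: dropped, no local Pod
print(healthy_backends(pods, "Local"))    # ['node-a', 'node-c']
print(healthy_backends(pods, "Cluster"))  # ['node-a', 'node-b', 'node-c']
```

With Local, the proxy only ever sends traffic to nodes that can serve it directly, at the price of less even load distribution when Pods are spread unevenly.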
For an IngressController in on-prem clusters, it is a little different. You may have some worker nodes that have an EIP or public IP. To expose HTTP(S) services, an IngressController is usually deployed on those worker nodes through a DaemonSet with HostNetwork, so that clients access the IngressController via the well-known ports and the EIP of the nodes. These worker nodes usually don't accept other workloads (e.g. infra nodes in OpenShift), so one more forward on the Pod network is needed. You can also deploy the IngressController on all worker nodes alongside other workloads, so traffic can be forwarded to a closer Pod if the IngressController supports topology-aware service routing although it can now.
Hope it helps!
The scenario:
I have two K8s clusters. One is on-prem, the other is hosted in AWS. I could use Istio to make communication painless and do things like balloon capacity in AWS, but I'm getting hung up on trying to connect them. Reading the documentation, it looks like I need a VPN deployed inside of K8s if I want to have encrypted tunnels so that each internal network can talk to the other side. They're both non-overlapping 10-dots so I have that part done.
Is that correct or am I missing something on how to connect the two K8s clusters?
Having Istio in your cluster is independent of setting up basic communication in between your two clusters. There are a few options that I can think of here:
VPN between some nodes in both clusters like you mentioned.
BGP peering with Calico and your existing infrastructure.
A router in between your two clusters that understands the internal cluster IPs (this could be with BGP or static routes)
Kubernetes Federation. V1 is in alpha and V2 is in the implementation phase as of this writing. Not prod ready yet IMO.
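Whichever of these options you pick, routing between the clusters only works if their pod and service ranges don't overlap, which the question says is already the case. If you want to double-check, a minimal sketch with Python's standard ipaddress module could look like this. The CIDR values and names are made up for illustration and are not taken from either cluster.

```python
import ipaddress
from itertools import combinations


def find_overlaps(named_cidrs):
    """Return pairs of names whose CIDR ranges overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in named_cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    ]


# Hypothetical ranges, for illustration only.
ranges = {
    "onprem-pods": "10.10.0.0/16",
    "onprem-services": "10.11.0.0/16",
    "aws-pods": "10.20.0.0/16",
    "aws-services": "10.21.0.0/16",
}

overlaps = find_overlaps(ranges)
print(overlaps or "no overlapping ranges")
```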
OK I figured out I'm basically doing it wrong. Since istio uses TLS - I don't need the VPN for crypto, just connectivity, which is overkill since it's encrypting encrypted traffic. I just need some sort of connectivity between the clusters which we can facilitate on the existing link and I can use EIPs if I don't have that.
I have a Kubernetes cluster (1.3.2) in GKE and I'd like to connect VMs and services from my Google project, which shares the same network as the cluster.
Is there a way for a VM that's internal to the subnet but not internal to the cluster itself to connect to the service without hitting the external IP?
I know there's a ton of things you can do to unambiguously determine the IP and port of services, such as the ENVs and DNS...but the clusterIP is not reachable outside of the cluster (obviously).
Is there something I'm missing? An important component to this is that this is meant to be a service "public" to the project, such that I don't know which VMs on the project will want to connect to the service (this could rule out loadBalancerSourceRanges). I understand the endpoint which the service actually wraps is the internal IP I can hit, but the only good way to get to that IP is through the Kube API or kubectl, both of which are not prod-ideal ways of hitting my service.
Check out my more thorough answer here, but the most common solution to this is to create bastion routes in your GCP project.
In the simplest form, you can create a single GCE Route to direct all traffic w/ dest_ip in your cluster's service IP range to land on one of your GKE nodes. If that SPOF scares you, you can create several routes pointing to different nodes, and traffic will round-robin between them.
If that management overhead isn't something you want to do going forward, you could write a simple controller in your GKE cluster to watch the Nodes API endpoint, and make sure that you have a live bastion route to at least N nodes at any given time.
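The decision logic of such a controller could be sketched roughly like this in Python. It only covers the "which routes to add or remove" step; watching the Nodes API and creating or deleting the GCE routes are left out, and the function name, arguments and node names are all hypothetical.

```python
def reconcile_routes(ready_nodes, current_routes, desired_count):
    """Return (routes_to_delete, routes_to_add) so that `desired_count` routes point at live nodes.

    ready_nodes:    names of nodes currently Ready
    current_routes: names of nodes that currently have a bastion route
    """
    ready = set(ready_nodes)
    # Routes pointing at nodes that are gone or not Ready should be removed.
    stale = sorted(set(current_routes) - ready)
    # Routes pointing at Ready nodes can stay.
    live = sorted(set(current_routes) & ready)
    missing = max(0, desired_count - len(live))
    # Ready nodes without a route are candidates for new routes.
    spare = sorted(ready - set(live))
    return stale, spare[:missing]


# Made-up node names, for illustration only.
to_delete, to_add = reconcile_routes(
    ready_nodes=["gke-node-1", "gke-node-3", "gke-node-4"],
    current_routes=["gke-node-1", "gke-node-2"],
    desired_count=2,
)
print(to_delete)  # ['gke-node-2']
print(to_add)     # ['gke-node-3']
```

Running this on every node change (or on a timer) would keep N bastion routes alive as nodes come and go.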
GCP internal load balancing was just released as alpha, so in the future, kube-proxy on GCP could be implemented using that, which would eliminate the need for bastion routes to handle internal services.