IPVLAN CNI based pods across hosts using VLAN headers - kubernetes

I have 2 worker nodes in a Kubernetes cluster. The worker nodes are on the same L2 domain.
$ cat ipvlanconf1.yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ipvlanconf1
  namespace: cncf
spec:
  config: '{
    "cniVersion": "0.3.0",
    "type": "ipvlan",
    "master": "enp1s0.10",
    "mode": "l3",
    "vlan": 10,
    "ipam": {
      "type": "whereabouts",
      "range": "10.1.1.1/24",
      "gateway": "10.1.1.254"
    }
  }'
Pod00 on Worker-node0 is using IPVLAN. So, net1 gets 10.1.1.1
Pod01 on Worker-node1 is using IPVLAN. So, net1 gets 10.1.1.2
I want to be able to ping 10.1.1.1 <---> 10.1.1.2, and the traffic should carry the VLAN header, but I don't see any VLAN tags in the tcpdump output.
Questions:
I assumed that the VLAN header is inserted by the Pod itself. However, in the IPVLAN CNI I don't see any code where the VLAN information is taken from the config. Is my understanding correct?
Should the interfaces in the pod be explicitly configured as VLAN subinterfaces (net1.10), or should I do it on the worker node (enp1s0.10)?
What should I use as the 'master' interface: enp1s0 or enp1s0.10?
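For context, this is roughly how I would create the VLAN 10 subinterface on each worker node, if the host side turns out to be the right place for it (a sketch only; enp1s0 is assumed to be the physical uplink):
# Tag VLAN 10 on the physical NIC; the resulting enp1s0.10 would then be
# referenced as the ipvlan 'master' interface.
sudo ip link add link enp1s0 name enp1s0.10 type vlan id 10
sudo ip link set enp1s0.10 up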
Thanks

Related

kubernetes how to call full service name instead of AWS node ip

I'm running my application in an EKS cluster. A few days back we ran into an issue. Say we have an application pod running with a replica count of one on an AWS node (EC2 VM), named as below:
ams-99547bd55-9fp6r 1/1 Running 0 7m31s 10.255.114.81 ip-10-255-12-11.eu-central-1.compute.internal
mongodb-58746b7584-z82nd 1/1 Running 0 21h 10.255.113.10 ip-10-255-12-11.eu-central-1.compute.internal
Here are my running services:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ams-service NodePort 172.20.81.165 <none> 3030:30010/TCP 18m
mongodb-service NodePort 172.20.158.81 <none> 27017:30003/TCP 15d
I have a setting.conf.yaml file deployed as a ConfigMap, where I keep the application-related configuration:
apiVersion: v1
kind: ConfigMap
metadata:
  name: ama-settings
  labels:
    conf: ams-settings
data:
  config: |
    "git": {
      "prefixUrl": "ssh://git@10.255.12.11:30001/app-server/repos/ams/",
      "author": {
        "name": "app poc",
        "mail": "app@domain.com"
      }
    },
    "mongodb": {
      "host": "10.255.12.11",
      "port": "30003",
      "database": "ams",
      "ssl": false
    }
This works as we expected, but when I delete my running pod and redeploy it, the pod may get scheduled onto a different AWS node (an EC2 VM).
When that happens my application stops working, and I need to edit my setting.conf.yaml file to update it with the new AWS node IP where my pod is running.
The question is how to use the service name instead of the AWS node IP, because we don't want to change the IP address every time an existing VM goes down.
Ideally, instead of using the AWS IP you should be using 0.0.0.0 (reference doc).
Example in Node:
const express = require("express");
const cors = require("cors");
const app = express();
app.use(cors());
const port = process.env.PORT || 8000;
// Listen on 0.0.0.0 so the server accepts connections on every interface
app.listen(port, "0.0.0.0", () => {
  console.log(`Server is running on port ${port}`);
});
However, if you want to use the service name:
you can use the fully qualified service name, but I am not sure it will work as well; binding to host 0.0.0.0 would be the better option:
<service.name>.<namespace name>.svc.cluster.local
example
ams-service.default.svc.cluster.local
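For example, a minimal sketch of what the mongodb block in the ConfigMap above could look like when it points at the Service instead of the node IP (this assumes both workloads run in the default namespace, and it uses the Service port 27017 rather than the NodePort):
  config: |
    "mongodb": {
      "host": "mongodb-service.default.svc.cluster.local",
      "port": "27017",
      "database": "ams",
      "ssl": false
    }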

Second NIC with Macvlan on GKE

I need to add a second interface to some specific K8s pods on GKE that need to be accessible directly by public users on the Internet. So I used Multus and defined a Macvlan CNI config like this:
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
name: macvlan-conf
spec:
config: '{
"cniVersion": "0.3.1",
"type": "macvlan",
"master": "eth0",
"mode": "bridge",
"ipam": {
"type": "host-local",
"subnet": "10.162.0.0/20",
"rangeStart": "10.162.0.100",
"rangeEnd": "10.162.0.150",
"routes": [
{ "dst": "0.0.0.0/0" }
],
"gateway": "10.162.0.1"
}
}'
10.162.0.1 is the default gateway of my K8s nodes in GCP, so I imagined that in this case the pods would have access to the outside. But inside the pods there is only one default gateway, which routes the internal pod traffic, and I can't add any routes because of privilege issues.
Question:
Is my expectation wrong? How should I use Macvlan to create a public interface for those pods?

Kubernetes gives an internal source IP although externalTrafficPolicy is set to Local

Our Kubernetes cluster includes an nginx load balancer that forwards the requests to other pods.
However, the nginx sees local source IPs and therefore cannot set the correct X-Real-IP header. I tried setting the externalTrafficPolicy value of nginx to "Local" but the IP does not change.
Section of the nginx service config:
"selector": {
"app": "nginx-ingress",
"component": "controller",
"release": "loping-lambkin"
},
"clusterIP": "10.106.1.182",
"type": "LoadBalancer",
"sessionAffinity": "None",
"externalTrafficPolicy": "Local",
"healthCheckNodePort": 32718
Result:
GET / HTTP/1.1
Host: example.com:444
X-Request-ID: dd3310a96bf154d2ac38c8877dec312c
X-Real-IP: 10.39.0.0
X-Forwarded-For: 10.39.0.0
We use a bare-metal cluster with MetalLB.
I found out that Weave needs to be configured with NO_MASQ_LOCAL=1 to respect the externalTrafficPolicy property.
This appears to be a bug in the IPVS implementation for services of type LoadBalancer: https://github.com/google/metallb/issues/290
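A minimal sketch of applying that setting, assuming Weave Net runs as the usual weave-net DaemonSet in kube-system with a container named weave:
# Add NO_MASQ_LOCAL=1 to the weave container so local client source IPs are preserved
kubectl set env daemonset/weave-net -n kube-system -c weave NO_MASQ_LOCAL=1
Once the weave pods have restarted with the new environment variable, the X-Real-IP header should show the real client address.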

How can I setup a Kubernetes cluster where deployed containers appear as directly connected machines on the network?

I have been trying to set up a Kubernetes cluster where deployed containers should get their IP address from the DHCP server on the network.
When deploying these containers manually, I run the following commands:
INTERFACE="eno2" # Host interface
CONTAINER_ID=$(docker run -d --rm --name="$CONTAINER_NAME" <snip>)
PID=$(docker inspect --format='{{.State.Pid}}' "$CONTAINER_ID")
ip link add link "$INTERFACE" eth1 netns "$PID" type macvlan mode bridge
Now if I run:
dhclient eth1
in the container, I can reach the container by going to the IP address on eth1. Essentially, the container behaves as if it were a physical machine (or a virtual machine) connected directly to the network.
I am trying to manage this container using Kubernetes, and I am trying to use CNI plugins. Here is my /etc/cni/net.d/10-multus.conf:
{
  "cniVersion": "0.2.0",
  "name": "macvlan-dhcp",
  "type": "macvlan",
  "master": "eno2",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "dhcp"
  }
}
But the containers are not starting up. I did a test deployment of the hello-node application and tried:
kubectl get deployment hello-node -o yaml
and I am getting:
apiVersion: extensions/v1beta1
kind: Deployment
<snip>
status:
  conditions:
  - lastTransitionTime: 2018-07-23T04:04:25Z
    lastUpdateTime: 2018-07-23T04:04:25Z
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: 2018-07-23T04:14:25Z
    lastUpdateTime: 2018-07-23T04:14:25Z
    message: ReplicaSet "hello-node-7b788668d8" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 1
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1
What am I missing here?
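One detail worth double-checking with the dhcp IPAM type: the reference CNI dhcp plugin only proxies lease requests to a separate client daemon, which has to be running on every node. A minimal check, assuming the standard plugin binaries live under /opt/cni/bin:
# Start the CNI DHCP client daemon; without it, ADD calls from the dhcp
# IPAM plugin time out and pods never receive an address.
sudo /opt/cni/bin/dhcp daemon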

Is there a way to add arbitrary records to kube-dns?

I will explain my problem in a very specific way, but I think it is better to be specific than to explain it abstractly...
Say there is a MongoDB replica set outside of a Kubernetes cluster but on the same network. The IP addresses of all members of the replica set are resolved via /etc/hosts on the app servers and DB servers.
In an experiment/transition phase, I need to access those MongoDB servers from Kubernetes pods.
However, Kubernetes doesn't seem to allow adding custom entries to /etc/hosts in pods/containers.
The MongoDB replica sets are already working with a large data set, so creating a new replica set inside the cluster is not an option.
Because I use GKE, changing any of the kube-dns resources should be avoided, I suppose. Configuring or replacing kube-dns to suit my needs would be the last thing to try.
Is there a way to resolve custom hostnames to IP addresses in a Kubernetes cluster?
It is just an idea, but if kube2sky could read some entries from a ConfigMap and use them as DNS records, it would be great.
e.g. repl1.mongo.local: 192.168.10.100.
EDIT: I referenced this question from https://github.com/kubernetes/kubernetes/issues/12337
There are 2 possible solutions for this problem now:
Pod-wise (Adding the changes to every pod needed to resolve these domains)
Cluster-wise (Adding the changes to a central place which all pods have access to, which in our case is the DNS)
Let's begin with the pod-wise solution:
As of Kubernetes 1.7, it's now possible to add entries to a Pod's /etc/hosts directly using .spec.hostAliases
For example: to resolve foo.local, bar.local to 127.0.0.1 and foo.remote,
bar.remote to 10.1.2.3, you can configure HostAliases for a Pod under
.spec.hostAliases:
apiVersion: v1
kind: Pod
metadata:
  name: hostaliases-pod
spec:
  restartPolicy: Never
  hostAliases:
  - ip: "127.0.0.1"
    hostnames:
    - "foo.local"
    - "bar.local"
  - ip: "10.1.2.3"
    hostnames:
    - "foo.remote"
    - "bar.remote"
  containers:
  - name: cat-hosts
    image: busybox
    command:
    - cat
    args:
    - "/etc/hosts"
The Cluster-wise solution:
As of Kubernetes v1.12, CoreDNS is the recommended DNS Server, replacing kube-dns. If your cluster originally used kube-dns, you may still have kube-dns deployed rather than CoreDNS. I'm going to assume that you're using CoreDNS as your K8S DNS.
In CoreDNS it's possible to add arbitrary entries inside the cluster domain, and that way all pods will resolve these entries directly from the DNS, without the need to change each and every /etc/hosts file in every pod.
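A quick way to check which DNS server a cluster is actually running (a sketch that assumes the conventional k8s-app=kube-dns label, which most kube-dns and CoreDNS installs carry):
# The deployment name will be either coredns or kube-dns
kubectl -n kube-system get deployment -l k8s-app=kube-dns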
First:
Let's change the coredns ConfigMap and add the required changes:
kubectl edit cm coredns -n kube-system
apiVersion: v1
kind: ConfigMap
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        hosts /etc/coredns/customdomains.db example.org {
            fallthrough
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . "/etc/resolv.conf"
        cache 30
        loop
        reload
        loadbalance
    }
  customdomains.db: |
    10.10.1.1 mongo-en-1.example.org
    10.10.1.2 mongo-en-2.example.org
    10.10.1.3 mongo-en-3.example.org
    10.10.1.4 mongo-en-4.example.org
Basically we added two things:
The hosts plugin placed before the kubernetes plugin, where we used the fallthrough option of the hosts plugin to satisfy our case.
To shed some more light on the fallthrough option: any given backend is usually the final word for its zone; it either returns a result, or it returns NXDOMAIN for the query. However, occasionally this is not the desired behavior, so some of the plugins support a fallthrough option.
When fallthrough is enabled, instead of returning NXDOMAIN when a record is not found, the plugin passes the request down the chain. A backend further down the chain then has the opportunity to handle the request, and that backend in our case is kubernetes.
We added a new file to the ConfigMap (customdomains.db) and added our custom domains (mongo-en-*.example.org) in there.
The last thing is to remember to add the customdomains.db file to the config-volume for the CoreDNS pod template:
kubectl edit -n kube-system deployment coredns
volumes:
- name: config-volume
  configMap:
    name: coredns
    items:
    - key: Corefile
      path: Corefile
    - key: customdomains.db
      path: customdomains.db
and finally, to make Kubernetes reload CoreDNS (every running pod):
$ kubectl rollout restart -n kube-system deployment/coredns
@OxMH's answer is fantastic, and can be simplified for brevity. CoreDNS allows you to specify hosts directly in the hosts plugin (https://coredns.io/plugins/hosts/#examples).
The ConfigMap can therefore be edited like so:
$ kubectl edit cm coredns -n kube-system
apiVersion: v1
kind: ConfigMap
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        hosts {
            10.10.1.1 mongo-en-1.example.org
            10.10.1.2 mongo-en-2.example.org
            10.10.1.3 mongo-en-3.example.org
            10.10.1.4 mongo-en-4.example.org
            fallthrough
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . "/etc/resolv.conf"
        cache 30
        loop
        reload
        loadbalance
    }
You will still need to restart coredns so it rereads the config:
$ kubectl rollout restart -n kube-system deployment/coredns
Inlining the contents of the hosts file removes the need to map the hosts file from the ConfigMap. Both approaches achieve the same outcome; it is up to personal preference as to where you want to define the hosts.
A Service of type ExternalName is required to access hosts or IPs outside of Kubernetes.
The following worked for me.
{
  "kind": "Service",
  "apiVersion": "v1",
  "metadata": {
    "name": "tiny-server-5",
    "namespace": "default"
  },
  "spec": {
    "type": "ExternalName",
    "externalName": "192.168.1.15",
    "ports": [{ "port": 80 }]
  }
}
For the record, an alternate solution for those not checking the referenced GitHub issue.
You can define an "external" Service in Kubernetes by not specifying any selector or ClusterIP. You also have to define a corresponding Endpoints object pointing to your external IP.
From the Kubernetes documentation:
{
  "kind": "Service",
  "apiVersion": "v1",
  "metadata": {
    "name": "my-service"
  },
  "spec": {
    "ports": [
      {
        "protocol": "TCP",
        "port": 80,
        "targetPort": 9376
      }
    ]
  }
}
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "my-service"
  },
  "subsets": [
    {
      "addresses": [
        { "ip": "1.2.3.4" }
      ],
      "ports": [
        { "port": 9376 }
      ]
    }
  ]
}
With this, you can point your app inside the containers to my-service:9376 and the traffic should be forwarded to 1.2.3.4:9376
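For example, from any pod in the same namespace (a hypothetical check, assuming curl is available in the container):
# The cluster DNS resolves my-service to its ClusterIP, and kube-proxy
# forwards the connection to the 1.2.3.4:9376 endpoint defined above.
curl http://my-service:9376/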
Limitations:
The DNS name used needs to be only letters, numbers or dashes. You can't use multi-level names (something.like.this). This means you probably have to modify your app to point just to your-service, and not yourservice.domain.tld.
You can only point to a specific IP, not a DNS name. For that, you can define a kind of DNS alias with an ExternalName-type Service, as sketched below.
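A minimal sketch of such an alias (my-db-alias and db.example.com are placeholder names):
{
  "kind": "Service",
  "apiVersion": "v1",
  "metadata": {
    "name": "my-db-alias"
  },
  "spec": {
    "type": "ExternalName",
    "externalName": "db.example.com"
  }
}
Pods can then reach the external host as my-db-alias (or my-db-alias.<namespace>.svc.cluster.local), and the cluster DNS answers with a CNAME to db.example.com.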
UPDATE 2017-07-03: Kubernetes 1.7 now supports adding entries to a Pod's /etc/hosts with HostAliases.
The solution is not about kube-dns, but /etc/hosts.
Anyway, the following trick seems to work so far...
EDIT: Changing /etc/hosts may have a race condition with the Kubernetes system, so let it retry.
1) Create a ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: db-hosts
data:
  hosts: |
    10.0.0.1 db1
    10.0.0.2 db2
2) Add a script named ensure_hosts.sh.
#!/bin/sh
while true
do
grep db1 /etc/hosts > /dev/null || cat /mnt/hosts.append/hosts >> /etc/hosts
sleep 5
done
Don't forget chmod a+x ensure_hosts.sh.
3) Add a wrapper script start.sh to your image.
#!/bin/sh
$(dirname "$(realpath "$0")")/ensure_hosts.sh &
exec your-app args...
Don't forget chmod a+x start.sh
4) Use the configmap as a volume and run start.sh
apiVersion: extensions/v1beta1
kind: Deployment
...
spec:
  template:
    ...
    spec:
      volumes:
      - name: hosts-volume
        configMap:
          name: db-hosts
      ...
      containers:
      - command:
        - ./start.sh
        ...
        volumeMounts:
        - name: hosts-volume
          mountPath: /mnt/hosts.append
        ...
Using a ConfigMap seems a better way to set DNS records, but it's a little bit heavy when you just want to add a few records (in my opinion). So I add records to /etc/hosts with a shell script executed by the Docker CMD.
for example:
Dockerfile
...(ignore)
COPY run.sh /tmp/run.sh
CMD bash /tmp/run.sh
run.sh
#!/bin/bash
# /etc/hosts expects the IP address first, then the hostname
echo "192.168.10.100 repl1.mongo.local" >> /etc/hosts
# some other commands...
Note: if you run MORE THAN ONE container in a pod, you have to add the script to each container, because Kubernetes starts the containers in an arbitrary order, and /etc/hosts may be overridden by another container (one that starts later).