How to assign a namespace to certain nodes? - kubernetes

Is there any way to configure nodeSelector at the namespace level?
I want to run a workload only on certain nodes for this namespace.

To achieve this you can use the PodNodeSelector admission controller.
First, you need to enable it in your kube-apiserver:
Edit /etc/kubernetes/manifests/kube-apiserver.yaml:
find the --enable-admission-plugins= flag
add PodNodeSelector to its list
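For illustration, the relevant part of the static pod manifest might end up looking like this (an excerpt only; your existing plugin list may differ):
# /etc/kubernetes/manifests/kube-apiserver.yaml (excerpt)
spec:
  containers:
  - command:
    - kube-apiserver
    - --enable-admission-plugins=NodeRestriction,PodNodeSelector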
Now you can specify the scheduler.alpha.kubernetes.io/node-selector option in the annotations of your namespace, for example:
apiVersion: v1
kind: Namespace
metadata:
  name: your-namespace
  annotations:
    scheduler.alpha.kubernetes.io/node-selector: env=test
spec: {}
status: {}
After these steps, all the pods created in this namespace will have this section automatically added:
nodeSelector:
  env: test
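A quick way to verify (a sketch; the pod name nginx-test is arbitrary):
kubectl run nginx-test --image=nginx -n your-namespace
kubectl get pod nginx-test -n your-namespace -o jsonpath='{.spec.nodeSelector}'
# should show the injected env=test selector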
You can find more information about the PodNodeSelector admission controller in the official Kubernetes documentation:
https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#podnodeselector
kubeadm users
If you deployed your cluster using kubeadm and if you want to make this configuration persistent, you have to update your kubeadm config file:
kubectl edit cm -n kube-system kubeadm-config
and specify extraArgs with custom values under the apiServer section:
apiServer:
  extraArgs:
    enable-admission-plugins: NodeRestriction,PodNodeSelector
then update your kube-apiserver static manifest on all control-plane nodes:
# Kubernetes 1.22 and later:
kubectl get configmap -n kube-system kubeadm-config -o=jsonpath="{.data.ClusterConfiguration}" > kubeadm-config.yaml
# Before Kubernetes 1.22:
# "kubeadm config view" was deprecated in 1.19 and removed in 1.22
# Reference: https://github.com/kubernetes/kubeadm/issues/2203
kubeadm config view > kubeadm-config.yaml
# Update the manifest with the file generated by either of the above commands
kubeadm init phase control-plane apiserver --config kubeadm-config.yaml
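To confirm the flag actually landed in the regenerated manifest, a quick check on each control-plane node (a sketch):
grep enable-admission-plugins /etc/kubernetes/manifests/kube-apiserver.yaml
# expected to contain: --enable-admission-plugins=NodeRestriction,PodNodeSelector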
kubespray users
You can simply add PodNodeSelector to the kube_apiserver_enable_admission_plugins variable among your api-server configuration variables:
kube_apiserver_enable_admission_plugins:
  - PodNodeSelector
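For example, in your inventory group vars (the exact file path below is an assumption; adjust it to your inventory layout), then re-run the kubespray playbook:
# e.g. inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
kube_apiserver_enable_admission_plugins:
  - PodNodeSelector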

I totally agree with @kvaps' answer, but something is missing: it is necessary to add a label to your node:
kubectl label node <yournode> env=test
That way, pods created in the namespace annotated with scheduler.alpha.kubernetes.io/node-selector: env=test will be schedulable only on nodes that carry the env=test label.
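You can verify the label is in place with a quick selector query:
kubectl get nodes -l env=test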

To dedicate nodes to only host resources belonging to a namespace, you also have to prevent the scheduling of other resources onto those nodes.
It can be achieved by a combination of a node selector and a taint, injected via the admission controller when you create resources in the namespace. This way, you don't have to manually label and add tolerations to each resource; it is sufficient to create them in the namespace.
What each property achieves:
the node selector forces scheduling of resources only onto the selected nodes
the taint prevents any resource outside the namespace from being scheduled on the selected nodes
Configuration of nodes/node pool
Add a taint to the nodes you want to dedicate to the namespace:
kubectl taint nodes project.example.com/GPUsNodePool=true:NoSchedule -l=nodesWithGPU=true
This example adds the taint to the nodes that already have the label nodesWithGPU=true. You can also taint nodes individually by name: kubectl taint node my-node-name project.example.com/GPUsNodePool=true:NoSchedule
Add a label:
kubectl label nodes project.example.com/GPUsNodePool=true -l=nodesWithGPU=true
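To double-check the taint and the label on the selected nodes (a sketch; <node-name> is a placeholder):
kubectl get nodes -l project.example.com/GPUsNodePool=true
kubectl describe node <node-name> | grep Taints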
The same can be done if, for example, you use Terraform and AKS. The node pool configuration:
resource "azurerm_kubernetes_cluster_node_pool" "GPUs_node_pool" {
  name                  = "gpusnp"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.clustern_name.id
  vm_size               = "Standard_NC12" # https://azureprice.net/vm/Standard_NC12
  node_taints = [
    "project.example.com/GPUsNodePool=true:NoSchedule"
  ]
  node_labels = {
    "project.example.com/GPUsNodePool" = "true"
  }
  node_count = 2
}
Namespace creation
Then create the namespace with instructions for the admission controller:
apiVersion: v1
kind: Namespace
metadata:
  name: gpu-namespace
  annotations:
    scheduler.alpha.kubernetes.io/node-selector: "project.example.com/GPUsNodePool=true" # poorly documented: the format has to be "selector-label=label-val"
    scheduler.alpha.kubernetes.io/defaultTolerations: '[{"operator": "Equal", "value": "true", "effect": "NoSchedule", "key": "project.example.com/GPUsNodePool"}]'
    project.example.com/description: 'This namespace is dedicated only to resources that need a GPU.'
Done! Create resources in the namespace and the admission controller together with the scheduler will do the rest.
Testing
Create a sample pod with no label or toleration, but in the namespace:
kubectl run test-dedicated-ns --image=nginx --namespace=gpu-namespace
# list the pods in the namespace
kubectl get po -n gpu-namespace
# get node name
kubectl get po test-dedicated-ns -n gpu-namespace -o jsonpath='{.spec.nodeName}'
# check running pods on a node
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node>

Related

Install calico GlobalNetworkPolicy via helm chart

I am trying to install a Calico GlobalNetworkPolicy that will be applicable to all the pods in the cluster regardless of namespace, and to apply a GlobalNetworkPolicy as per the docs here:
Calico network policies and Calico global network policies are applied
using calicoctl
i.e. the calicoctl command (assuming the calicoctl binary is installed on the host):
calicoctl apply -f global-policy.yaml
or, if we have a calicoctl pod running:
kubectl exec -ti -n kube-system calicoctl -- /calicoctl apply -f global-deny.yaml -o wide
global-policy.yaml:
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: default-deny
spec:
  selector: projectcalico.org/namespace == "kube-system"
  types:
    - Ingress
    - Egress
Question: How do I install such a policy via a Helm chart? Helm implicitly applies it via kubectl, and that causes an error on install.
Error using kubectl or helm:
Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: resource mapping not found for name: "default-deny" namespace: "" from "": no matches for kind "GlobalNetworkPolicy" in version "projectcalico.org/v3"
As per the doc you linked, a Calico global network policy is a non-namespaced resource and can be applied to any kind of endpoint (pods, VMs, host interfaces) independent of namespace.
But you are using a namespace in the YAML; that might be the reason for the error. Kindly remove the namespace and try again.
Because global network policies use kind: GlobalNetworkPolicy, they are grouped separately from kind: NetworkPolicy. For example, global network policies will not be returned from calicoctl get networkpolicy, and are rather returned from calicoctl get globalnetworkpolicy.
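For example, to list them with the calicoctl pod from the question (a sketch):
kubectl exec -ti -n kube-system calicoctl -- /calicoctl get globalnetworkpolicy -o wide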
Below is the reference YAML from the doc:
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: allow-tcp-port-6379
For more information, refer to the docs on Global Network Policy, Calico install via Helm, and the Calico command line tools.

Disable kubernetes enableServiceLinks globally?

Is there a way to disable service links globally? There's a field in podSpec:
enableServiceLinks: false
but it's true by default. I couldn't find anything in kubelet to kill it. Or is there some cool admission webhook toolchain I could use?
You can use the Kubernetes-native policy engine called Kyverno. Kyverno policies can validate, mutate (see: Mutate Resources), and generate Kubernetes resources.
A Kyverno policy is a collection of rules that can be applied to the entire cluster (ClusterPolicy) or to a specific namespace (Policy).
I will create an example to illustrate how it may work.
First we need to install Kyverno. You have the option of installing it directly from the latest release manifest or using Helm (see the Quick Start guide):
$ kubectl create -f https://raw.githubusercontent.com/kyverno/kyverno/main/definitions/release/install.yaml
After successful installation, we can create a simple ClusterPolicy:
$ cat strategic-merge-patch.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: strategic-merge-patch
spec:
  rules:
    - name: enableServiceLinks_false_globally
      match:
        resources:
          kinds:
            - Pod
      mutate:
        patchStrategicMerge:
          spec:
            enableServiceLinks: false
$ kubectl apply -f strategic-merge-patch.yaml
clusterpolicy.kyverno.io/strategic-merge-patch created
$ kubectl get clusterpolicy
NAME                    BACKGROUND   ACTION   READY
strategic-merge-patch   true         audit    true
This policy adds enableServiceLinks: false to the newly created Pod.
Let's create a Pod and check if it works as expected:
$ kubectl run app-1 --image=nginx
pod/app-1 created
$ kubectl get pod app-1 -oyaml | grep "enableServiceLinks:"
enableServiceLinks: false
It also works with Deployments, StatefulSets, DaemonSets etc.:
$ kubectl create deployment deploy-1 --image=nginx
deployment.apps/deploy-1 created
$ kubectl get pod deploy-1-7cfc5d6879-kfdlh -oyaml | grep "enableServiceLinks:"
enableServiceLinks: false
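If you want to leave system workloads untouched, a rule can also carry an exclude block; a minimal sketch, assuming you want to skip kube-system (this is not part of the policy above):
    - name: enableServiceLinks_false_globally
      match:
        resources:
          kinds:
            - Pod
      exclude:
        resources:
          namespaces:
            - kube-system
      mutate:
        patchStrategicMerge:
          spec:
            enableServiceLinks: false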
More examples with detailed explanations can be found in the Kyverno Writing Policies documentation.

Subnetting within Kubernetes Cluster

I have a couple of deployments, say Deployment A and Deployment B. The K8s subnet is 10.0.0.0/20.
My requirement: is it possible to have all pods in Deployment A get an IP from 10.0.1.0/24 and pods in Deployment B from 10.0.2.0/24?
This keeps the networking clean, and a particular deployment can be identified from the IP itself.
A Deployment in Kubernetes is a high-level abstraction that relies on controllers to build basic objects. That is different from an object itself, such as a Pod or Service.
If you take a look at the Deployment spec in the Kubernetes API overview, you will notice that there is no such thing as defining subnets, nor IP addresses that would be specific to a Deployment, so you cannot specify subnets for Deployments.
The Kubernetes idea is that a Pod is ephemeral. You should not try to identify resources by IP address, as IPs are randomly assigned; if a Pod dies, its replacement will get another IP address. You could look at something like StatefulSets if you are after unique, stable network identifiers.
While Kubernetes does not support this feature, I found a workaround using Calico IP pools (see the Migrate pools feature).
First you need to have calicoctl installed. There are several ways to do that mentioned in the install calicoctl docs.
I chose to install calicoctl as a Kubernetes pod:
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
To make working with it faster, you can set up an alias:
alias calicoctl="kubectl exec -i -n kube-system calicoctl /calicoctl -- "
I created two YAML files to set up the IP pools:
# ippool1.yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: pool1
spec:
  cidr: 10.0.0.0/24
  ipipMode: Always
  natOutgoing: true

# ippool2.yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: pool2
spec:
  cidr: 10.0.1.0/24
  ipipMode: Always
  natOutgoing: true
Then you have to apply the configuration. Since my YAML files were on my host filesystem and not in the calicoctl pod itself, I piped the YAML as input to the command:
➜ cat ippool1.yaml | calicoctl apply -f-
Successfully applied 1 'IPPool' resource(s)
➜ cat ippool2.yaml | calicoctl apply -f-
Successfully applied 1 'IPPool' resource(s)
Listing the IP pools, you will notice the newly added ones:
➜ calicoctl get ippool -o wide
NAME                  CIDR             NAT    IPIPMODE   VXLANMODE   DISABLED   SELECTOR
default-ipv4-ippool   192.168.0.0/16   true   Always     Never       false      all()
pool1                 10.0.0.0/24      true   Always     Never       false      all()
pool2                 10.0.1.0/24      true   Always     Never       false      all()
Then you can specify which pool you want to use for your deployment:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: deployment1-pool1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        cni.projectcalico.org/ipv4pools: "[\"pool1\"]"
    spec:
      containers:
      - name: nginx
        image: nginx
---
I created a similar one, deployment2-pool2, that used pool2, with the results below:
deployment1-pool1-6d9ddcb64f-7tkzs   1/1   Running   0   71m   10.0.0.198   acid-fuji
deployment1-pool1-6d9ddcb64f-vkmht   1/1   Running   0   71m   10.0.0.199   acid-fuji
deployment2-pool2-79566c4566-ck8lb   1/1   Running   0   69m   10.0.1.195   acid-fuji
deployment2-pool2-79566c4566-jjbsd   1/1   Running   0   69m   10.0.1.196   acid-fuji
It is also worth mentioning that, while testing this, I found out that if your default deployment has many replicas and runs out of IPs, Calico will then use a different pool.

Taint a node in kubernetes live cluster

How can I achieve the same as the command below with a YAML file, so that I can do kubectl apply -f? The command works and it taints the node, but I can't figure out how to do it via a manifest file.
$ kubectl taint nodes \
172.4.5.2-3a1d4eeb \
kops.k8s.io/instancegroup=loadbalancer:NoSchedule
Use the -o yaml option and save the resulting YAML file, making sure to remove the status and some extra fields. This will apply the taint, and it also gives you the YAML that you can later use with kubectl apply -f and keep in version control (even if you create the resource from the command line and later get the YAML and apply it, it will not re-create the resource, so this is perfectly fine).
Note: most commands support --dry-run, which just generates the YAML without creating the resource, but in this case I could not make it work with --dry-run; maybe this command does not support that flag.
C02W84XMHTD5:~ iahmad$ kubectl taint node minikube dedicated=foo:PreferNoSchedule -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: 2018-10-16T21:44:03Z
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/hostname: minikube
    node-role.kubernetes.io/master: ""
  name: minikube
  resourceVersion: "291136"
  selfLink: /api/v1/nodes/minikube
  uid: 99a1a304-d18c-11e8-9334-f2cf3c1f0864
spec:
  externalID: minikube
  taints:
  - effect: PreferNoSchedule
    key: dedicated
    value: foo
Then use the YAML with kubectl apply:
apiVersion: v1
kind: Node
metadata:
  name: minikube
spec:
  taints:
  - effect: PreferNoSchedule
    key: dedicated
    value: foo
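For example (the file name minikube-taint.yaml is arbitrary):
kubectl apply -f minikube-taint.yaml
kubectl describe node minikube | grep Taints
# should list dedicated=foo:PreferNoSchedule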
I have two nodes in my cluster; please look at the labels:
kubectl get nodes --show-labels
NAME          STATUS   ROLES   AGE    VERSION   LABELS
172.16.2.53   Ready    node    7d4h   v1.19.7   type=primary
172.16.2.89   Ready    node    33m    v1.19.7   type=secondary
Let's say I want to taint the node named 172.16.2.89:
kubectl taint node 172.16.2.89 type=secondary:NoSchedule
node/172.16.2.89 tainted
Example -
kubectl taint node <node-name> <label-key>=<value>:NoSchedule
NoExecute means the pod will be evicted from the node.
NoSchedule means the scheduler will not place the pod onto the node.
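A pod that should still be scheduled onto such a tainted node needs a matching toleration; a minimal sketch using the type=secondary:NoSchedule taint from above (pod name and image are arbitrary):
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo
spec:
  containers:
  - name: app
    image: nginx
  tolerations:
  - key: type
    operator: Equal
    value: secondary
    effect: NoSchedule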

spinnaker /halyard : Unable to communicate with the Kubernetes cluster

I am trying to deploy Spinnaker on multiple nodes. I have 2 VMs: the first with Halyard and kubectl, the second containing the Kubernetes master API.
My kubectl is well configured and able to communicate with the remote Kubernetes API;
"kubectl get namespaces" works:
kubectl get namespaces
NAME          STATUS   AGE
default       Active   16d
kube-public   Active   16d
kube-system   Active   16d
but when I run this command:
hal config provider -d kubernetes account add spin-kubernetes --docker-registries myregistry
I get this error
Add the spin-kubernetes account
Failure
Problems in default.provider.kubernetes.spin-kubernetes:
- WARNING You have not specified a Kubernetes context in your
halconfig, Spinnaker will use "default-system" instead.
? We recommend explicitly setting a context in your halconfig, to
ensure changes to your kubeconfig won't break your deployment.
? Options include:
- default-system
! ERROR Unable to communicate with your Kubernetes cluster:
Operation: [list] for kind: [Namespace] with name: [null] in namespace:
[null] failed..
? Unable to authenticate with your Kubernetes cluster. Try using
kubectl to verify your credentials.
- Failed to add account spin-kubernetes for provider
kubernetes.
From the error message there seem to be two approaches: set your halconfig to talk to the default-system context so it can communicate with your cluster, or the other way around, that is, configure your kubeconfig context.
Try this:
kubectl config view
I suppose you'll see the context and current-context there to be default-system; try changing those.
For more help do
kubectl config --help
I guess you're looking for the set-context option.
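For example, something along these lines (a sketch; default-system is the context name reported in the error above):
# point kubectl's current context at the cluster
kubectl config use-context default-system
# or tell halyard explicitly which context to use when adding the account
hal config provider kubernetes account add spin-kubernetes \
  --context default-system \
  --docker-registries myregistry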
Hope that helps.
You can set this in your halconfig as mentioned by @Naim Salameh.
Another way is to try setting your K8s cluster info in your default Kubernetes config ~/.kube/config.
I'm not certain this will work since you are running Halyard and kubectl on different VMs.
# ~/.kube/config
apiVersion: v1
clusters:
- cluster:
    server: http://my-kubernetes-url
  name: my-k8s-cluster
contexts:
- context:
    cluster: my-k8s-cluster
    namespace: default
  name: my-context
current-context: my-context
kind: Config
preferences: {}
users: []