I learned recently that Kubernetes has a feature called Init Containers. Awesome, because I can use this feature to wait for my postgres service and create/migrate the database before my web application service runs.
However, it appears that Init Containers can only be configured in a Pod yaml file. Is there a way I can do this via a Deployment yaml file? Or do I have to choose?
To avoid confusion, I'll answer your specific question, though I agree with oswin that you may want to consider another method.
Yes, you can use init containers with a Deployment. This example uses the old annotation style (pre-1.6), but it should still work:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: 'nginx'
spec:
replicas: 1
selector:
matchLabels:
app: 'nginx'
template:
metadata:
labels:
app: 'nginx'
annotations:
pod.beta.kubernetes.io/init-containers: '[
{
"name": "install",
"image": "busybox",
"imagePullPolicy": "IfNotPresent",
"command": ["wget", "-O", "/application/index.html", "http://kubernetes.io/index.html"],
"volumeMounts": [
{
"name": "application",
"mountPath": "/application"
}
]
}
]'
spec:
volumes:
- name: 'application'
emptyDir: {}
containers:
- name: webserver
image: 'nginx'
ports:
- name: http
containerPort: 80
volumeMounts:
- name: 'application'
mountPath: '/application'
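For reference, on Kubernetes 1.6 and newer the same init container can be declared as a first-class initContainers field in the pod template instead of the annotation; a minimal sketch of just that part, reusing the names from the example above:

spec:
  template:
    spec:
      initContainers:
      - name: install
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ["wget", "-O", "/application/index.html", "http://kubernetes.io/index.html"]
        volumeMounts:
        - name: application
          mountPath: /application

The containers, volumes, and ports sections stay the same as above; on current clusters you would also use apiVersion: apps/v1 instead of extensions/v1beta1.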
You probably want to use readiness probes instead of init containers for this use case; see the Kubernetes documentation on readiness probes. Also note that a Service will not send traffic to a pod that does not report ready, if that was your worry.
This is a well-known pattern: a readiness probe in the web server simply checks the DB endpoint / data availability before reporting ready. This is a simple solution compared to the complexity of an extra init container, and it has the advantage of also detecting DB outages correctly.
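As a rough illustration of that pattern, here is a minimal sketch of a readiness probe on the web container, assuming a hypothetical /healthz endpoint in your app that verifies the Postgres connection and migrations (names and ports are placeholders):

containers:
- name: web
  image: my-web-app:latest          # hypothetical image name
  ports:
  - containerPort: 8080
  readinessProbe:
    httpGet:
      path: /healthz                # assumed endpoint that checks the DB before reporting ready
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 10

Until that probe succeeds, the pod is kept out of the Service endpoints, so no traffic reaches it.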
I have a CLI app written in NodeJS [not by me].
I want to deploy this on a k8s cluster like I have done many times with web servers.
I have not deployed something like this before, so I am in a kind of a loss.
I have worked with dockerized CLI apps [like Terraform] before, and I know how to use them in a CI/CD pipeline.
But how should I deploy them in a pod so they are always available for usage from another app in the cluster?
Or is there a completely different approach that I need to consider?
EDIT:
I am using this at the end of my Dockerfile:
# the main executable
ENTRYPOINT ["sleep", "infinity"]
# a default command
CMD ["mycli help"]
That way the pod does not restart, and the CLI inside is waiting for commands like mycli do this.
Is it a hacky way that is frowned upon or a legit solution?
Your edit is one solution. Another, if you do not want to or cannot change the Docker image, is to define a command for the container that loops forever; this achieves the same as the Dockerfile ENTRYPOINT but without having to rebuild the image.
Here's an example of such implementation:
apiVersion: v1
kind: Pod
metadata:
name: command-demo
labels:
purpose: demonstrate-command
spec:
containers:
- name: command-demo-container
image: debian
command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]
restartPolicy: OnFailure
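As a usage sketch, assuming your own image (where mycli is on the PATH) rather than the plain debian image above, you can then run commands in the long-lived pod with kubectl exec:

kubectl exec -it command-demo -- mycli do this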
As for your question about whether this is a legit solution, that is hard to answer; I would say it depends on what your application is designed to do. Kubernetes Pods are designed to be ephemeral, so a good solution is one that runs until the job is completed; for a web server, for example, the job is never completed because it should constantly listen for requests.
If your pods are in the same cluster, they are already reachable from other pods through CoreDNS, an internal DNS service that lets you access them by their internal DNS name, something like my-cli-app.my-namespace.svc.cluster.local (see the DNS for Services and Pods documentation).
You would then create a Deployment file for your apps. Note that this does not need ports to work and also does not involve communication over the internet.
#deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
labels:
app: nginx
spec:
replicas: 3
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:1.14.2
ports:
- containerPort: 80
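To actually get a DNS name like my-cli-app.my-namespace.svc.cluster.local, you also need a Service selecting those pods; a minimal sketch, reusing the app: nginx labels from the Deployment above (swap in your own names), and since no port needs to be exposed it can be headless:

#service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-cli-app            # the name other pods will use in DNS
spec:
  clusterIP: None             # headless: DNS resolves straight to the pod IPs
  selector:
    app: nginx                # must match the pod labels of the Deployment above

Other pods in the cluster can then resolve my-cli-app.<namespace>.svc.cluster.local to reach the pods directly.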
First of all: for various reasons I'm using an unsupported and obsolete version of Kubernetes (1.12), and I can't upgrade.
I'm trying to configure the scheduler to avoid running pods on some nodes by changing the node score when the scheduler tries to find the best available node. I would like to do that at the scheduler level and not by using nodeAffinity at the deployment, replicaset, pod, etc. level (so that all pods are affected by this change).
After reading the k8s docs here: https://kubernetes.io/docs/reference/scheduling/config/#scheduling-plugins and checking that some options were already present in 1.12, I'm trying to use the NodePreferAvoidPods plugin.
In the documentation the plugin specifies:
Scores nodes according to the node annotation scheduler.alpha.kubernetes.io/preferAvoidPods
Which, if I understand correctly, should do the job.
So I've updated the static manifest kube-scheduler.yaml to use the following config:
apiVersion: kubescheduler.config.k8s.io/v1alpha1
kind: KubeSchedulerConfiguration
profiles:
- plugins:
score:
enabled:
- name: NodePreferAvoidPods
weight: 100
clientConnection:
kubeconfig: /etc/kubernetes/scheduler.conf
But adding the annotation scheduler.alpha.kubernetes.io/preferAvoidPods to the node doesn't seem to work.
For testing, I made a basic nginx deployment with a replica count equal to the number of worker nodes (4).
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
labels:
app: nginx
spec:
replicas: 4
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:1.21
ports:
- containerPort: 80
Then I checked where the pods were created with kubectl get pods -o wide.
So I believe some specific value is required for this annotation to work.
I've tried setting the annotation to "true" and "1", but k8s refuses my change; I can't figure out what the valid values for this annotation are, and I can't find any documentation about it.
I've checked the git release for 1.12: this plugin was already present (at least there are some lines of code), and I don't think the behavior or settings have changed much since.
Thanks.
So, from the Kubernetes source code, here is a valid value for this annotation:
{
"preferAvoidPods": [
{
"podSignature": {
"podController": {
"apiVersion": "v1",
"kind": "ReplicationController",
"name": "foo",
"uid": "abcdef123456",
"controller": true
}
},
"reason": "some reason",
"message": "some message"
}
]
}
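If you want to try it, the annotation can be attached to a node with kubectl annotate; for example (NODE_NAME and the JSON file name are placeholders):

kubectl annotate node NODE_NAME \
  scheduler.alpha.kubernetes.io/preferAvoidPods="$(cat prefer-avoid-pods.json)"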
But there are no details on how to predict the uid, and no answer was given when someone else asked about it on GitHub years ago: https://github.com/kubernetes/kubernetes/issues/41630
For my initial question, which was how to avoid scheduling pods on a node, I found another method: using the well-known taint node.kubernetes.io/unschedulable with the value PreferNoSchedule.
Tainting a node with the command below does the job, and this taint seems to persist across cordon/uncordon (a cordon will set it to NoSchedule and an uncordon will set it back to PreferNoSchedule):
kubectl taint node NODE_NAME node.kubernetes.io/unschedulable=:PreferNoSchedule
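To verify the taint, and to remove it again later, something like this should do:

# show the node's current taints
kubectl describe node NODE_NAME | grep -A2 Taints
# remove the taint again (note the trailing "-")
kubectl taint node NODE_NAME node.kubernetes.io/unschedulable:PreferNoSchedule-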
I have a Kubernetes cluster deployed with RKE, composed of 3 nodes on 3 different servers, and on those servers there is 1 pod each running yatsukino/healthereum, which is a personal modification of ethereum/client-go:stable.
The problem is that I don't understand how to add an external IP to send requests to the pods.
My pods could be in 3 states:
they are syncing the ethereum blockchain
they restarted because of a sync problem
they are synced and everything is fine
I don't want my load balancer to transfer requests to pods in the first 2 states; only pods in the third state should be considered up to date.
I've been searching in the Kubernetes docs, but (maybe because of a misunderstanding) I only find load balancing for pods inside a single node.
Here is my deployment file:
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: goerli
name: goerli-deploy
spec:
replicas: 3
selector:
matchLabels:
app: goerli
template:
metadata:
labels:
app: goerli
spec:
containers:
- image: yatsukino/healthereum
name: goerli-geth
args: ["--goerli", "--datadir", "/app", "--ipcpath", "/root/.ethereum/geth.ipc"]
env:
- name: LASTBLOCK
value: "0"
- name: FAILCOUNTER
value: "0"
ports:
- containerPort: 30303
name: geth
- containerPort: 8545
name: console
livenessProbe:
exec:
command:
- /bin/sh
- /app/health.sh
initialDelaySeconds: 20
periodSeconds: 60
volumeMounts:
- name: app
mountPath: /app
initContainers:
- name: healthcheck
image: ethereum/client-go:stable
command: ["/bin/sh", "-c", "wget -O /app/health.sh http://my-bash-script && chmod 544 /app/health.sh"]
volumeMounts:
- name: app
mountPath: "/app"
restartPolicy: Always
volumes:
- name: app
hostPath:
path: /app/
The answers above explain the concepts, but regarding your questions about services and external IPs: you must declare the Service, for example:
apiVersion: v1
kind: Service
metadata:
name: goerli
spec:
selector:
app: goerli
ports:
- port: 8545
type: LoadBalancer
type: LoadBalancer will assign an external address if you are in a public cloud or if you use something like MetalLB. Check your address with kubectl get svc goerli. If the external address stays "pending", you have a problem...
If this is your own setup, you can use externalIPs to assign your own external IP:
apiVersion: v1
kind: Service
metadata:
name: goerli
spec:
selector:
app: goerli
ports:
- port: 8545
externalIPs:
- 222.0.0.30
The externalIPs can be used from outside the cluster, but you must route traffic to any node yourself; for example:
ip route add 222.0.0.30/32 \
nexthop via 192.168.0.1 \
nexthop via 192.168.0.2 \
nexthop via 192.168.0.3
Assuming your k8s nodes have IPs 192.168.0.x, this will set up ECMP routes to your nodes. When you make a request from outside the cluster to 222.0.0.30:8545, k8s will load-balance between your ready Pods.
For load balancing and exposing your pods, you can use https://kubernetes.io/docs/concepts/services-networking/service/,
and for checking when a pod is ready you can tweak your liveness and readiness probes, as explained at https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/.
For probes you might want to consider exec actions, like executing a script that checks what is required and returns 0 or 1 depending on the status.
When a container is started, Kubernetes can be configured to wait for a configurable
amount of time to pass before performing the first readiness check. After that, it
invokes the probe periodically and acts based on the result of the readiness probe. If a
pod reports that it’s not ready, it’s removed from the service. If the pod then becomes
ready again, it’s re-added.
Unlike liveness probes, if a container fails the readiness check, it won’t be killed or
restarted. This is an important distinction between liveness and readiness probes.
Liveness probes keep pods healthy by killing off unhealthy containers and replacing
them with new, healthy ones, whereas readiness probes make sure that only pods that
are ready to serve requests receive them. This is mostly necessary during container
start up, but it’s also useful after the container has been running for a while.
I think you can use probes for your goal.
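For the deployment in the question, that could mean adding a readinessProbe next to the existing livenessProbe, reusing the same script; a sketch, assuming /app/health.sh exits 0 only when the node is fully synced:

readinessProbe:
  exec:
    command:
    - /bin/sh
    - /app/health.sh
  initialDelaySeconds: 30
  periodSeconds: 30

A pod failing this check keeps running but is removed from the Service's endpoints, so the load balancer stops sending it traffic until it reports ready again.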
I'm trying to run private Stellar blockchain infrastructure on Kubernetes (not to join an existing public or test Stellar network), but my question can be generalized to the scenario of running any peer-to-peer service on Kubernetes. Therefore, I will try to explain my problem in a generalized way (hoping that it can yield answers applicable to any similar topology running on Kubernetes).
Here is the scenario:
I want to run 3 peers (in kube terms: pods) which are able to communicate with each other in a decentralized way, but the problem is that each of these peers has a slightly different configuration. In general, the configuration looks like this (this is an example for pod0):
NETWORK_PASSPHRASE="my private network"
NODE_SEED=<pod0_private_key>
KNOWN_PEERS=[
"stellar-0",
"stellar-1",
"stellar-2"]
[QUORUM_SET]
VALIDATORS=[ <pod1_pub_key>, <pod2_pub_key> ]
The problem lies in the fact that each pod would have different:
NODE_SEED
VALIDATORS list
My first idea (before realizing this problem) was to:
Create config map for this configuration
Create statefulset (3 replicas) with headless service to enable stable reachability between pods (stellar-0, stellar-1, stellar-2...etc.)
Another idea (after realizing this problem) would be to:
Create separate config maps for each peer
Create statefulset (1 replica) with service
I'm wondering if there is a better solution/pattern that could be used for this purpose, rather than running essentially the same service with slightly different configuration as separate entities (statefulset, deployment, ...), each with its own service through which the peers would be available (which somewhat defeats the purpose of using Kubernetes' high-level resources that enable replication)?
Thanks
So, you can have a single ConfigMap with multiple keys, each one uniquely meant for one of your replicas. You can then deploy your pods using a StatefulSet with an initContainer to set up the configs. This is just an example (you'll have to tweak it to your needs):
ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
name: stellar
labels:
app: stellar
data:
stellar0.cnf: |
NETWORK_PASSPHRASE="my private network"
NODE_SEED=<stellar0_private_key>
KNOWN_PEERS=[
"stellar-0",
"stellar-1",
"stellar-2"]
[QUORUM_SET]
VALIDATORS=[ <stellar1_pub_key>, <stellar2_pub_key> ]
stellar1.cnf: |
NETWORK_PASSPHRASE="my private network"
NODE_SEED=<stellar1_private_key>
KNOWN_PEERS=[
"stellar-0",
"stellar-1",
"stellar-2"]
[QUORUM_SET]
VALIDATORS=[ <stellar0_pub_key>, <stellar2_pub_key> ]
stellar2.cnf: |
NETWORK_PASSPHRASE="my private network"
NODE_SEED=<stellar2_private_key>
KNOWN_PEERS=[
"stellar-0",
"stellar-1",
"stellar-2"]
[QUORUM_SET]
VALIDATORS=[ <stellar0_pub_key>, <stellar1_pub_key> ]
StatefulSet:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: stellarblockchain
spec:
selector:
matchLabels:
app: stellar
serviceName: stellar
replicas: 3
template:
metadata:
labels:
app: stellar
spec:
initContainers:
- name: init-stellar
image: stellar-image:version
command:
- bash
- "-c"
- |
set -ex
# Generate config from pod ordinal index.
[[ `hostname` =~ -([0-9]+)$ ]] || exit 1
ordinal=${BASH_REMATCH[1]}
# Copy appropriate conf.d files from config-map to emptyDir.
if [[ $ordinal -eq 0 ]]; then
cp /mnt/config-map/stellar0.cnf /mnt/conf.d/
elif [[ $ordinal -eq 1 ]]; then
cp /mnt/config-map/stellar1.cnf /mnt/conf.d/
else
cp /mnt/config-map/stellar2.cnf /mnt/conf.d/
fi
volumeMounts:
- name: conf
mountPath: /mnt/conf.d
- name: config-map
mountPath: /mnt/config-map
containers:
- name: stellar
image: stellar-image:version
ports:
- name: stellar
containerPort: <whatever port you need here>
volumeMounts:
- name: conf
          mountPath: /etc/stellar/conf.d   # or wherever your config for stellar needs to be
volumes:
- name: conf
emptyDir: {}
- name: config-map
configMap:
name: stellar
Service (if you need to expose it)
apiVersion: v1
kind: Service
metadata:
name: stellar
labels:
app: stellar
spec:
ports:
- name: stellar
port: <stellar-port>
clusterIP: None
selector:
app: stellar
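A nice side effect of the headless Service (clusterIP: None) plus serviceName: stellar in the StatefulSet is that each replica gets a stable, predictable DNS name of the form:

stellar-0.stellar.<namespace>.svc.cluster.local
stellar-1.stellar.<namespace>.svc.cluster.local
stellar-2.stellar.<namespace>.svc.cluster.local

so the KNOWN_PEERS entries can point at those names (depending on the resolver's search path you may need the stellar-0.stellar form or the full name rather than the bare pod name).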
Hope it helps!
It is worth stating: Kube's main strength is managing scalable workloads of identical Pods. That's why the ReplicaSet exists in the Kube API.
Blockchain validator nodes are not identical Pods. They are not anonymous; they are identified by their public addresses, which require unique private keys.
Blockchain nodes which serve as RPC nodes are simpler in this sense; they can be replicated and RPC requests can be round robined between the nodes.
There is value to using Kube for blockchain networks; but if deploying validators (and boot nodes) feels like it goes against the grain, that's because it doesn't neatly fit into the ReplicaSet model.
I'm currently using the GCE standard container cluster with a lot of success and pleasure. But I have a question about the provisioning of GCE persistent disks.
As described in this document from Kubernetes, I created two YAML files:
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
annotations:
storageclass.beta.kubernetes.io/is-default-class: "true"
name: slow
provisioner: kubernetes.io/gce-pd
parameters:
type: pd-standard
and
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
name: fast
provisioner: kubernetes.io/gce-pd
parameters:
type: pd-ssd
If I now create the following volume claim:
{
"kind": "PersistentVolumeClaim",
"apiVersion": "v1",
"metadata": {
"name": "claim-test",
"annotations": {
"volume.beta.kubernetes.io/storage-class": "hdd"
}
},
"spec": {
"accessModes": [
"ReadWriteOnce"
],
"resources": {
"requests": {
"storage": "3Gi"
}
}
}
}
The disk gets created perfectly!
And if I now start the following unit:
apiVersion: v1
kind: ReplicationController
metadata:
name: nfs-server
spec:
replicas: 1
selector:
role: nfs-server
template:
metadata:
labels:
role: nfs-server
spec:
containers:
- name: nfs-server
image: gcr.io/google_containers/volume-nfs
ports:
- name: nfs
containerPort: 2049
- name: mountd
containerPort: 20048
- name: rpcbind
containerPort: 111
securityContext:
privileged: true
volumeMounts:
- mountPath: /exports
name: mypvc
volumes:
- name: mypvc
persistentVolumeClaim:
claimName: claim-test
The disk gets mounted perfectly, but many times I stumble upon the following errors (nothing more can be found in the kubelet.log file):
Failed to attach volume "claim-test" on node "...." with: GCE persistent disk not found: diskName="....." zone="europe-west1-b"
Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "....". list of unattached/unmounted volumes=[....]
Sometimes the pod boots perfectly, but sometimes it crashes. The only thing I could find is that there needs to be enough time between creating the PVC and creating the RC itself. I tried this many times, but with the same inconsistent results.
I hope someone can give me some kind of suggestion or help.
Thanks in advance!
Best regards,
Hacor
Thanks in advance for your comments! After a few days of searching I was finally able to determine what the problem was; I'm posting it because it may be useful to other users.
I was using the NFS example for Kubernetes as a replication controller to provide my apps with NFS storage. But it seems that when the NFS server and the PV/PVC get deleted, sometimes the NFS share gets stuck on the node itself. I think it has to do with the fact that I didn't delete these elements in a particular order, so the node got stuck with the share and became incapable of mounting new shares to itself or to pods.
I noticed that the problem always occurred after I deleted some app (NFS, PV, PVC and other components) from the cluster. On a freshly created GCE cluster, creating apps works perfectly, until I delete one and things go wrong...
I don't know the correct deletion order for sure, but I think it is:
Pods using the NFS share
PV, PVC of the NFS share
NFS server
If the pod takes longer to delete and isn't completely gone before the PV is deleted, the node hangs on to a mount it can't delete because it's in use, and that's where the problems occur.
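A sketch of that teardown order with kubectl, using hypothetical resource names based on the NFS example (adjust to your own):

# 1. delete whatever consumes the NFS share and wait until its pods are really gone
kubectl delete rc my-nfs-consumer
kubectl get pods -w
# 2. then the claim and the volume backed by the share
kubectl delete pvc nfs-claim
kubectl delete pv nfs-volume
# 3. and only then the NFS server itself
kubectl delete rc nfs-server
kubectl delete svc nfs-server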
I must honestly say that now I'm moving to an externally provisioned GlusterFS cluster.
Hope it helps someone!
Regards,
Hacor