Access Kubernetes Job pods using hostname

I am trying to run a k8s Job with 2 pods in which one pod will try to connect to the other pod.
I cannot connect to the other pod using its hostname as suggested in the docs: https://kubernetes.io/docs/concepts/workloads/controllers/job/#completion-mode.
I have created a service and am trying to reach the pod at k8s-train-0.default.svc.cluster.local as mentioned in the document.
apiVersion: batch/v1
kind: Job
metadata:
  name: k8s-train
spec:
  parallelism: 2
  completions: 2
  completionMode: Indexed
  manualSelector: true
  selector:
    matchLabels:
      app.kubernetes.io/name: proxy
  template:
    metadata:
      labels:
        app.kubernetes.io/name: proxy
    spec:
      containers:
      - name: k8s-train
        image: pytorch/pytorch:1.11.0-cuda11.3-cudnn8-runtime
        command: ["/bin/sh","-c"]
        args:
        - echo starting;
          export MASTER_PORT=54321;
          export MASTER_ADDR=k8s-train-0.trainsvc.default.svc.cluster.local;
          export WORLD_SIZE=8;
          pip install -r /data/requirements.txt;
          export NCCL_DEBUG=INFO;
          python /data/bert.py --strategy=ddp --num_nodes=2 --gpus=4 --max_epochs=3;
          echo done;
        env:
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        ports:
        - containerPort: 54321
          name: master-port
        resources:
          requests:
            nvidia.com/gpu: 4
          limits:
            nvidia.com/gpu: 4
        volumeMounts:
        - mountPath: /data
          name: data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: efs-claim
      restartPolicy: Never
  backoffLimit: 0
---
apiVersion: v1
kind: Service
metadata:
  name: trainsvc
spec:
  selector:
    app.kubernetes.io/name: proxy
  ports:
  - name: master-svc-port
    protocol: TCP
    port: 54321
    targetPort: master-port
  clusterIP: None
https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
I am looking to establish communication between the pods either by using the hostname or by assigning the svc to only one pod selected by job index.
Please let me know if I'm missing something here.
Thanks.
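As I read the docs, the indexed Job's pod hostnames (k8s-train-0, k8s-train-1, ...) only become resolvable through a headless Service if the pod template also sets spec.subdomain to that Service's name. A minimal sketch of the two pieces, assuming the names used in the question (busybox stands in for the training image):

apiVersion: v1
kind: Service
metadata:
  name: trainsvc
spec:
  clusterIP: None                    # headless: DNS records point directly at the pods
  selector:
    app.kubernetes.io/name: proxy
---
apiVersion: batch/v1
kind: Job
metadata:
  name: k8s-train
spec:
  completionMode: Indexed
  completions: 2
  parallelism: 2
  template:
    metadata:
      labels:
        app.kubernetes.io/name: proxy
    spec:
      subdomain: trainsvc            # must match the headless Service name
      restartPolicy: Never
      containers:
      - name: k8s-train
        image: busybox:1.36
        # each pod's hostname is <job-name>-<index>, so peers are reachable at
        # k8s-train-0.trainsvc.default.svc.cluster.local
        command: ["sh", "-c", "nslookup k8s-train-0.trainsvc && echo ok"]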

Related

kubernetes Deployment PodName setting

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment
  labels:
    app: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      name: test
      labels:
        app: test
    spec:
      containers:
      - name: server
        image: test_ml_server:2.3
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: hostpath-vol-testserver
          mountPath: /app/test/api
        # env:
        # - name: POD_NAME
        #   valueFrom:
        #     fieldRef:
        #       fieldPath: template.metadata.name
      - name: testdb
        image: test_db:1.4
        ports:
        - name: testdb
          containerPort: 1433
        volumeMounts:
        - name: hostpath-vol-testdb
          mountPath: /var/opt/mssql/data
        # env:
        # - name: POD_NAME
        #   valueFrom:
        #     fieldRef:
        #       fieldPath: template.metadata.name
      volumes:
      - name: hostpath-vol-testserver
        hostPath:
          path: /usr/testhostpath/testserver
      - name: hostpath-vol-testdb
        hostPath:
          path: /usr/testhostpath/testdb
I want to set the pod's name because the containers communicate internally based on it,
but when a Deployment creates a pod, a random suffix is appended to the name, so the fixed name cannot be used.
How can I set the pod name?
It's better to use a StatefulSet instead of a Deployment. A StatefulSet's pod names look like <statefulsetName>-0, <statefulsetName>-1, and so on. You will also need a headless Service (clusterIP: None) to which the pods are bound. See the doc for more details. Ref
apiVersion: v1
kind: Service
metadata:
  name: test-svc
  labels:
    app: test
spec:
  ports:
  - port: 8080
    name: web
  clusterIP: None
  selector:
    app: test
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test-statefulset
  labels:
    app: test
spec:
  replicas: 1
  serviceName: test-svc
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      name: test
      labels:
        app: test
    spec:
      containers:
      - name: server
        image: test_ml_server:2.3
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: hostpath-vol-testserver
          mountPath: /app/test/api
      - name: testdb
        image: test_db:1.4
        ports:
        - name: testdb
          containerPort: 1433
        volumeMounts:
        - name: hostpath-vol-testdb
          mountPath: /var/opt/mssql/data
      volumes:
      - name: hostpath-vol-testserver
        hostPath:
          path: /usr/testhostpath/testserver
      - name: hostpath-vol-testdb
        hostPath:
          path: /usr/testhostpath/testdb
Here, the pod name will be test-statefulset-0.
If you are using kind: Deployment this won't be possible; ideally, in this scenario you should use kind: StatefulSet.
Instead of pod-to-pod communication, you can use a Kubernetes Service for communication.
Still, a StatefulSet manages the pod names in sequence:
statefulsetname-0
statefulsetname-1
statefulsetname-2
You can't.
It is a property of the pods of a Deployment that they do not have a stable identity associated with them.
You could have a look at a StatefulSet instead of a Deployment if you want the pods to have a sticky identity.
From the docs:
Like a Deployment, a StatefulSet manages Pods that are based on an
identical container spec. Unlike a Deployment, a StatefulSet maintains
a sticky identity for each of their Pods. These pods are created from
the same spec, but are not interchangeable: each has a persistent
identifier that it maintains across any rescheduling.
So, if you have a StatefulSet object named myapp with two replicas, the pods will be named myapp-0 and myapp-1.
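If the goal is only to read the pod's own name inside the container, the commented-out env block in the question can also work via the downward API; the field path is metadata.name, not template.metadata.name. A minimal sketch:

env:
- name: POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name      # resolves to the running pod's name, e.g. test-statefulset-0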

kubectl - Error response from daemon: error while creating mount source path

I'm trying to install the SAP HANA Express Docker image on a Kubernetes node in Google Cloud Platform as per the guide https://developers.sap.com/tutorials/hxe-k8s-advanced-analytics.html#7f5c99da-d511-479b-8745-caebfe996164; however, during execution of step 7 "Deploy your containers and connect to them" I'm not getting the expected result.
I'm executing the command kubectl create -f hxe.yaml and here is the YAML file I'm using:
kind: ConfigMap
apiVersion: v1
metadata:
  creationTimestamp: 2018-01-18T19:14:38Z
  name: hxe-pass
data:
  password.json: |+
    {"master_password" : "HXEHana1"}
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: persistent-vol-hxe
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 150Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/data/hxe_pv"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: hxe-pvc
spec:
  storageClassName: manual
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hxe
  labels:
    name: hxe
spec:
  selector:
    matchLabels:
      run: hxe
      app: hxe
      role: master
      tier: backend
  replicas: 1
  template:
    metadata:
      labels:
        run: hxe
        app: hxe
        role: master
        tier: backend
    spec:
      initContainers:
      - name: install
        image: busybox
        command: [ 'sh', '-c', 'chown 12000:79 /hana/mounts' ]
        volumeMounts:
        - name: hxe-data
          mountPath: /hana/mounts
      volumes:
      - name: hxe-data
        persistentVolumeClaim:
          claimName: hxe-pvc
      - name: hxe-config
        configMap:
          name: hxe-pass
      imagePullSecrets:
      - name: docker-secret
      containers:
      - name: hxe-container
        image: "store/saplabs/hanaexpress:2.00.045.00.20200121.1"
        ports:
        - containerPort: 39013
          name: port1
        - containerPort: 39015
          name: port2
        - containerPort: 39017
          name: port3
        - containerPort: 8090
          name: port4
        - containerPort: 39041
          name: port5
        - containerPort: 59013
          name: port6
        args: [ "--agree-to-sap-license", "--dont-check-system", "--passwords-url", "file:///hana/hxeconfig/password.json" ]
        volumeMounts:
        - name: hxe-data
          mountPath: /hana/mounts
        - name: hxe-config
          mountPath: /hana/hxeconfig
      - name: sqlpad-container
        image: "sqlpad/sqlpad"
        ports:
        - containerPort: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: hxe-connect
  labels:
    app: hxe
spec:
  type: LoadBalancer
  ports:
  - port: 39013
    targetPort: 39013
    name: port1
  - port: 39015
    targetPort: 39015
    name: port2
  - port: 39017
    targetPort: 39017
    name: port3
  - port: 39041
    targetPort: 39041
    name: port5
  selector:
    app: hxe
---
apiVersion: v1
kind: Service
metadata:
  name: sqlpad
  labels:
    app: hxe
spec:
  type: LoadBalancer
  ports:
  - port: 3000
    targetPort: 3000
    protocol: TCP
    name: sqlpad
  selector:
    app: hxe
I'm also using the latest version of the HANA Express Edition Docker image, store/saplabs/hanaexpress:2.00.045.00.20200121.1, which you can see available here: https://hub.docker.com/_/sap-hana-express-edition/plans/f2dc436a-d851-4c22-a2ba-9de07db7a9ac?tab=instructions
The error I'm getting is the one in the question title: Error response from daemon: error while creating mount source path.
Any thoughts on what could be wrong?
Best regards and happy new year to everybody.
Thanks to Mahboob's suggestion I can now start the pods (partially) and the issue no longer pops up at the "busybox" init container stage. The problem was that I was using a Container-Optimized OS image for the node pool, while the required one is Ubuntu. If you are facing a similar issue, double-check the image flavor you choose when creating the node pool.
However, I now have a different issue: the pods are starting (both the HANA container and the sqlpad one), but the sqlpad container crashes at some point after starting and the pod gets stuck in the CrashLoopBackOff state: only 1/2 containers are ready, and then suddenly both are running.
I haven't been able to pin down the problem since I'm a newcomer to the Kubernetes and Docker world. I hope some of you can shed some light on it.
Best regards.
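For anyone hitting the same mount-source-path error on GKE, the node image type is chosen when the node pool is created. A hedged sketch (the cluster and pool names are placeholders, and the exact --image-type value depends on the GKE version, e.g. UBUNTU_CONTAINERD on recent versions or UBUNTU on older ones):

gcloud container node-pools create ubuntu-pool \
  --cluster my-cluster \
  --image-type UBUNTU_CONTAINERD \
  --machine-type n1-standard-8 \
  --num-nodes 1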

kubernetes StorageClass does not retain existing data

My Kubernetes StorageClass volume doesn't retain existing data when the pod is deleted and redeployed with my PostgreSQL database. When I delete the pod, the new pod is created but the database is empty.
I have followed variations of the different versions of the tutorials (https://kubernetes.io/docs/concepts/storage/persistent-volumes/) but nothing seems to work.
I'm pasting all the YAML files because the problem might be in the combination.
storage-google.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spingular-pvc
spec:
  storageClassName: standard
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 7Gi
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  zone: us-east4-a
jhipsterpress-postgresql.yml
apiVersion: v1
kind: Secret
metadata:
  name: jhipsterpress-postgresql
  namespace: default
  labels:
    app: jhipsterpress-postgresql
type: Opaque
data:
  postgres-password: NjY0NXJxd24=
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: jhipsterpress-postgresql
  namespace: default
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: jhipsterpress-postgresql
    spec:
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: spingular-pvc
      containers:
      - name: postgres
        image: postgres:10.4
        env:
        - name: POSTGRES_USER
          value: jhipsterpress
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: jhipsterpress-postgresql
              key: postgres-password
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/
---
apiVersion: v1
kind: Service
metadata:
  name: jhipsterpress-postgresql
  namespace: default
spec:
  selector:
    app: jhipsterpress-postgresql
  ports:
  - name: postgresqlport
    port: 5432
  type: LoadBalancer
jhipsterpress-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jhipsterpress
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jhipsterpress
      version: "v1"
  template:
    metadata:
      labels:
        app: jhipsterpress
        version: "v1"
    spec:
      initContainers:
      - name: init-ds
        image: busybox:latest
        command:
        - '/bin/sh'
        - '-c'
        - |
          while true
          do
            rt=$(nc -z -w 1 jhipsterpress-postgresql 5432)
            if [ $? -eq 0 ]; then
              echo "DB is UP"
              break
            fi
            echo "DB is not yet reachable;sleep for 10s before retry"
            sleep 10
          done
      containers:
      - name: jhipsterpress-app
        image: galore/jhipsterpress
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: prod
        - name: SPRING_DATASOURCE_URL
          value: jdbc:postgresql://jhipsterpress-postgresql.default.svc.cluster.local:5432/jhipsterpress
        - name: SPRING_DATASOURCE_USERNAME
          value: jhipsterpress
        - name: SPRING_DATASOURCE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: jhipsterpress-postgresql
              key: postgres-password
        - name: JAVA_OPTS
          value: " -Xmx256m -Xms256m"
        resources:
          requests:
            memory: "256Mi"
            cpu: "500m"
          limits:
            memory: "512Mi"
            cpu: "1"
        ports:
        - name: http
          containerPort: 8080
        readinessProbe:
          httpGet:
            path: /management/health
            port: http
          initialDelaySeconds: 20
          periodSeconds: 15
          failureThreshold: 6
        livenessProbe:
          httpGet:
            path: /management/health
            port: http
          initialDelaySeconds: 120
jhipsterpress-service.yml
apiVersion: v1
kind: Service
metadata:
  name: jhipsterpress
  namespace: default
  labels:
    app: jhipsterpress
spec:
  selector:
    app: jhipsterpress
  type: LoadBalancer
  ports:
  - name: http
    port: 8080
When I included a Retain Policy I was getting this error:
#cloudshell:~ (academic-veld-230622)$ kubectl apply -f storage-google.yaml
error: error validating "storage-google.yaml": error validating data:
ValidationError(PersistentVolumeClaim.spec): unknown field "persistentVolumeReclaimPolicy" in io.k8s.api.core.v1.PersistentVolumeClaimSpec; if you choose to ignore these errors, turn validation off with --validate=false
Please, if you know of a complete example using a public image that works (with PostgreSQL; I can make it work with Mongo), I would really appreciate it.
Thanks to all.
Note that for this to work you need your PVC to dynamically provision a PV to satisfy its requirements; then there will be a permanent binding between the PVC and the PV, and every time your workload uses the PVC it will use the same PV. This is specifically indicated by this excerpt:
If a PV was dynamically provisioned for a new PVC, the loop will always bind that PV to the PVC
If in your case the Google Persistent Disk is being provisioned by the PVC, and you can verify that on GCP it's the same PV used every time, then it's probably an issue with the pod startup process where it's removing all the data. (Is there any reason why you are using /var/lib/postgresql/ vs /var/lib/postgresql?)
Also, persistentVolumeReclaimPolicy: Retain applies to a PV, not a PVC. For dynamically provisioned PVs the value is Delete. In your case, it wouldn't apply because your dynamically provisioned volume should be bound to your PVC. In other words, you are not reclaiming the volume.
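If the goal is to keep the underlying disk even after the claim is released, the reclaim policy can be set on the StorageClass for dynamically provisioned volumes. A minimal sketch based on the storage-google.yaml above (assuming the same provisioner and parameters):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/gce-pd
reclaimPolicy: Retain        # PVs created from this class keep the disk when the PVC is deleted
parameters:
  type: pd-standard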
Having said all that, the recommended way to deploy a DB is using a StatefulSet, similar to this MySQL example, using a volumeClaimTemplate.
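A rough sketch of what that could look like for the PostgreSQL part of this question, reusing the names from the manifests above; this is an illustration under those assumptions, not a drop-in replacement:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jhipsterpress-postgresql
  namespace: default
spec:
  serviceName: jhipsterpress-postgresql
  replicas: 1
  selector:
    matchLabels:
      app: jhipsterpress-postgresql
  template:
    metadata:
      labels:
        app: jhipsterpress-postgresql
    spec:
      containers:
      - name: postgres
        image: postgres:10.4
        env:
        - name: POSTGRES_USER
          value: jhipsterpress
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: jhipsterpress-postgresql
              key: postgres-password
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data   # postgres' default PGDATA location
  volumeClaimTemplates:                         # one PVC per replica, reused across pod restarts
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard
      resources:
        requests:
          storage: 7Gi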

unable to deploy debezium on minikube

I am new to Kubernetes. I am trying to integrate Kafka with Debezium and MySQL.
I successfully deployed Kafka and MySQL on minikube, but once I deploy the Debezium YAML on minikube, it hangs and doesn't respond at all. I then restarted minikube; after all the pods were running, minikube hung again.
Below is my code:
zookeeper service
apiVersion: v1
kind: Service
metadata:
  name: zoo1
  labels:
    app: zookeeper-1
spec:
  ports:
  - name: client
    port: 2181
    protocol: TCP
  - name: follower
    port: 2888
    protocol: TCP
  - name: leader
    port: 3888
    protocol: TCP
  selector:
    app: zookeeper-1
zookeeper deployment:
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: zookeeper-deployment-1
spec:
  template:
    metadata:
      labels:
        app: zookeeper-1
    spec:
      containers:
      - name: zoo1
        image: debezium/zookeeper
        ports:
        - containerPort: 2181
        env:
        - name: ZOOKEEPER_ID
          value: "1"
        - name: ZOOKEEPER_SERVER_1
          value: zoo1
kafka service:
apiVersion: v1
kind: Service
metadata:
  name: kafka-service
  labels:
    name: kafka
spec:
  ports:
  - port: 9092
    name: kafka-port
    protocol: TCP
  selector:
    app: kafka
    id: "1"
  type: NodePort
kafka deployment:
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: kafka-broker1
spec:
  template:
    metadata:
      labels:
        selector: kafka
        app: kafka
        id: "1"
    spec:
      containers:
      - name: kafka
        image: debezium/kafka
        ports:
        - containerPort: 9092
        env:
        - name: KAFKA_ADVERTISED_PORT
          value: "9092"
        - name: KAFKA_ADVERTISED_HOST_NAME
          value: 192.168.39.47
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: zoo1:2181
        - name: KAFKA_BROKER_ID
          value: "1"
        - name: KAFKA_CREATE_TOPICS
          value: hello-topic:3:3
MySQL persistent volume:
# application/mysql/mysql-pv.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: mysql-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
spec:
  storageClassName: manual
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
mysql deployment:
# application/mysql/mysql-deployment.yaml
# this command is for the mysql client: kubectl run -it --rm --image=debezium/example-mysql --restart=Never mysql-client -- mysql -h mysql -pdebezium
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  ports:
  - port: 3306
  selector:
    app: mysql
  clusterIP: None
---
apiVersion: extensions/v1beta1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: mysql
spec:
  selector:
    matchLabels:
      app: mysql
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - image: debezium/example-mysql
        name: mysql
        env:
        # Use secret in real usage
        - name: MYSQL_ROOT_PASSWORD
          value: debezium
        - name: MYSQL_USER
          value: mysqluser
        - name: MYSQL_PASSWORD
          value: mysqlpw
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: mysql-persistent-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-persistent-storage
        persistentVolumeClaim:
          claimName: mysql-pv-claim
Debezium deployment:
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: debezium-connect-source
spec:
  selector:
    matchLabels:
      app: debezium-connect-source
  replicas: 1
  template:
    metadata:
      labels:
        app: debezium-connect-source
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: debezium-connect-source
        image: debezium/connect
        env:
        - name: BOOTSTRAP_SERVERS
          value: kafka-service:9092
        - name: GROUP_ID
          value: "1"
        - name: CONFIG_STORAGE_TOPIC
          value: debezium-connect-source_config
        - name: OFFSET_STORAGE_TOPIC
          value: debezium-connect-source_offset
        ports:
        - containerPort: 8083
          name: dm-c-source
When I deploy Debezium, the problem starts and minikube responds like this:
$ kubectl get pods
Unable to connect to the server: net/http: TLS handshake timeout
OS: CentOS
minikube version: v0.30.0
I believe this is happening because of the resource crunch on the VM started by minikube.
By default, when you run minikube start, it takes only 2 CPUs and 2 GB of RAM from your system, and looking at your deployments (kafka + mysql + debezium) that might not be enough.
You can increase the CPU and memory allocated to the VM by passing the --cpus and --memory parameters to minikube start (the memory value is in MB); see the example below.
For more info, run minikube start -h.
If you want to set up heavy deployments, I strongly suggest using machines with more resources.
Hope this helps.
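For example, a hedged sketch of recreating the minikube VM with more resources (the numbers are just an illustration; pick values your host can spare):

minikube delete                        # the VM must be recreated for new resource settings to apply
minikube start --cpus 4 --memory 8192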
You should also set memory limits for the Java-based pods. Older versions of Java see the whole guest memory as their own and will happily consume it completely, and there are at least three JVMs started here.
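A minimal sketch of such a limit on, for example, the Debezium connect container from the deployment above (the numbers are illustrative, not tuned values):

containers:
- name: debezium-connect-source
  image: debezium/connect
  resources:
    requests:
      memory: "512Mi"
    limits:
      memory: "1Gi"        # caps the container so the JVMs cannot exhaust the minikube VM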

Kubernetes get nodeport mappings in a pod

I'm migrating an application to Docker/Kubernetes. This application has 20+ well-known ports it needs to be accessed on. It needs to be accessed from outside the kubernetes cluster. For this, the application writes its public accessible IP to a database so the outside service knows how to access it. The IP is taken from the downward API (status.hostIP).
One solution is defining the well-known ports as (static) nodePorts in the Service, but I don't want this, because it limits the usability of the node: if another service happens to have taken one of the known ports, the application will not be able to start. Also, because Kubernetes opens the ports on all nodes in the cluster, I can only run one instance of the application per cluster.
Now I want to make the application aware of the port mappings done by the NodePort service. How can this be done, given that I don't see a hard link between the Service and the StatefulSet object in Kubernetes?
Here is my (simplified) Kubernetes config:
apiVersion: v1
kind: Service
metadata:
  name: my-app-svc
  labels:
    app: my-app
spec:
  ports:
  - port: 6000
    targetPort: 6000
    protocol: TCP
    name: debug-port
  - port: 6789
    targetPort: 6789
    protocol: TCP
    name: traffic-port-1
  selector:
    app: my-app
  type: NodePort
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app-sf
spec:
  serviceName: my-app-svc
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-repo/myapp/my-app:latest
        imagePullPolicy: Always
        env:
        - name: K8S_ServiceAccountName
          valueFrom:
            fieldRef:
              fieldPath: spec.serviceAccountName
        - name: K8S_ServerIP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        - name: serverName
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        ports:
        - name: debug
          containerPort: 6000
        - name: traffic1
          containerPort: 6789
This can be done with an initContainer.
You can define an initContainer that fetches the nodePort and saves it into a directory shared with the main container; the container can then read the nodePort from that directory later. A simple demo looks like this:
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: my-app
    image: busybox
    command: ["sh", "-c", "cat /data/port; while true; do sleep 3600; done"]
    volumeMounts:
    - name: config-data
      mountPath: /data
  initContainers:
  - name: config-data
    image: tutum/curl
    command: ["sh", "-c", "TOKEN=`cat /var/run/secrets/kubernetes.io/serviceaccount/token`; curl -kD - -H \"Authorization: Bearer $TOKEN\" https://kubernetes.default:443/api/v1/namespaces/test/services/app 2>/dev/null | grep nodePort | awk '{print $2}' > /data/port"]
    volumeMounts:
    - name: config-data
      mountPath: /data
  volumes:
  - name: config-data
    emptyDir: {}
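Note that the pod's service account needs permission to read Services for the initContainer's API call to succeed. On a cluster with RBAC enabled, a sketch of a suitable Role and RoleBinding (the test namespace and default service account are assumptions matching the demo above):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-services
  namespace: test
rules:
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-services
  namespace: test
subjects:
- kind: ServiceAccount
  name: default
  namespace: test
roleRef:
  kind: Role
  name: read-services
  apiGroup: rbac.authorization.k8s.io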