Deploy a scalable application on Kubernetes which requires each replica Pod to have different args - kubernetes

I am trying to understand how to deploy an application on Kubernetes that requires each Pod of the same Deployment to use different args with its start command.
I have an application that runs Spark on Kubernetes and needs to spawn executor Pods on start. The problem is that each Pod of the application needs to spawn its own executors using its own port and Spark app name.
I've read about StatefulSets and searched the documentation, but I didn't find a solution to my problem. Since every Pod needs to use a different port, that port has to be declared in a Service (if I understood correctly) and also passed directly to the Pod's command in the args.
Is there a way to achieve this without using multiple Deployments, one for each Pod I need to create? That is the only solution I can think of, but it can't be scaled after being deployed.
I'm using Helm to deploy the application, so I can easily create as many Deployments and/or Services as needed, but I would like to find a solution that can scale at runtime, if possible.

I don't think you can have a Deployment that creates Pods from different specs. Kubernetes doesn't support it, and Helm won't help here (since Helm is just a template manager over Kubernetes configurations).
What you can do is specify each Pod as a separate configuration (for a single Pod you don't necessarily need a Deployment) and let Helm manage them, as sketched below.
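For illustration only, a Helm template could stamp out one Pod per index, each with a different port and app name. Everything below (the values, names, image, and flags) is hypothetical, a sketch of the approach rather than the asker's actual chart:
# templates/spark-pods.yaml -- hypothetical sketch
{{- range $i := until (int .Values.replicaCount) }}
---
apiVersion: v1
kind: Pod
metadata:
  name: spark-app-{{ $i }}
  labels:
    app: spark-app
spec:
  containers:
    - name: driver
      image: {{ $.Values.image }}          # assumed value
      args:
        - "--port={{ add 7000 $i }}"       # a different port per Pod
        - "--app-name=spark-app-{{ $i }}"  # a different Spark app name per Pod
{{- end }}
Scaling then means changing replicaCount and running helm upgrade, which matches the limitation the asker mentions: it works, but it does not scale at runtime the way a single controller would.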

Posting the solution I used, since it could be useful for other people searching around.
In the end I found a configuration that solves my problem nicely. I used a StatefulSet to declare the deployment of the Spark application, and, associated with the StatefulSet, a headless Service that exposes each Pod on a specific port.
A StatefulSet can declare a spec.serviceName property set to the name of a headless Service, which gives each Pod a unique network name of the form <pod_name>.<service_name>.
Additionally, each Pod gets a unique and stable name built from the application name plus an ordinal starting from 0 for each replica Pod.
Using a startup script in the Docker image and injecting the Pod name from the metadata into each Pod's environment, I was able to use a different configuration for each Pod: even within the same StatefulSet, every Pod has its own unique metadata name, and the StatefulSet's headless Service gives me the per-Pod addressing I needed.
This way the StatefulSet is scalable at runtime and works as expected. A sketch of the setup follows.
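Roughly, the setup looks like the sketch below. The names, image, and script path are placeholders rather than the actual chart; the key pieces are the headless Service, spec.serviceName, and the Downward API env var carrying the Pod name:
apiVersion: v1
kind: Service
metadata:
  name: spark-app              # headless Service referenced by spec.serviceName
spec:
  clusterIP: None
  selector:
    app: spark-app
  ports:
    - name: driver
      port: 7078
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: spark-app
spec:
  serviceName: spark-app       # gives each Pod a DNS name <pod_name>.spark-app
  replicas: 3
  selector:
    matchLabels:
      app: spark-app
  template:
    metadata:
      labels:
        app: spark-app
    spec:
      containers:
        - name: driver
          image: my-spark-image            # placeholder
          env:
            - name: POD_NAME               # Downward API: stable name like spark-app-0
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          command: ["/opt/start.sh"]       # startup script derives port and app name from POD_NAME
The startup script can then derive the Spark app name and driver port from $POD_NAME (for example, from its ordinal suffix).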

Hey, I am not sure this will exactly match your scenario, but here is something you can try: use a sidecar container to run the extra instances. A sidecar is a container that runs alongside the main container in the same Pod, shares the same namespaces, and can share volumes with the other containers.
Now, to pass different arguments to each container or sidecar, you will have to tweak the Dockerfile, or rather the way your container starts.
Create a start.sh script that accepts the arguments and starts the container's process with them. The trick is to take the arguments from environment variables, which allows you to configure them later from ConfigMaps or the Pod's env.
Here is an example of a PHP/Laravel application running the same code but starting with different arguments. The start.sh file looks like this:
#!/bin/sh
# Decide what to run based on the CONTAINER_ROLE environment variable,
# which can be set per container from the Pod spec or a ConfigMap.
if [ "${CONTAINER_ROLE}" = "queue" ]; then
    echo "Running the queue..."
    php artisan queue:work --queue="${QUEUENAME}"
    echo "Queue Started"
else
    echo "Running Iceberg."
    exec apache2-foreground
fi
So a sample Dockerfile looks like this:
FROM php:7.1.24-apache
COPY . /srv/app
...
...
RUN chown -R www-data:www-data /srv/app \
 && a2enmod remoteip && a2enmod rewrite
WORKDIR /srv/app
RUN chmod +x .docker/start.sh
CMD ["sh", ".docker/start.sh"]
Let me know how it goes.

Related

Expose container in Kubernetes

I want to create a specific version of Redis to be used as a cache. Task:
Pod must run in web namespace
Pod name should be cache
Image name is lfccncf/redis with the 4.0-alpine tag
Expose port 6379
The pod needs to be running after completion
These are my steps:
k create ns web
k -n web run cache --image=lfccncf/redis:4.0-alpine --port=6379 --dry-run=client -o yaml > pod1.yaml
vi pod1.yaml
The generated pod1.yaml looks roughly like the sketch shown after this question.
k create -f pod1.yaml
When the Service name to expose is not defined, is this the right command to fully complete the task?
k expose pod cache --port=6379 --target-port=6379
Is using a command like command: ["/bin/sh", "-ec", "sleep 1000"] the best way to keep the pod running?
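For reference, the pod1.yaml produced by that dry-run would look roughly like this; it is a sketch of typical kubectl run output, not the asker's exact file:
apiVersion: v1
kind: Pod
metadata:
  name: cache
  namespace: web
  labels:
    run: cache
spec:
  containers:
    - name: cache
      image: lfccncf/redis:4.0-alpine
      ports:
        - containerPort: 6379
  restartPolicy: Always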
You should not use sleep to keep a Redis pod running. As long as the Redis process runs in the container, the pod will be running.
The best way to go about it is to take the stable Helm chart from https://hub.helm.sh/charts/stable/redis-ha, do a helm pull, and modify the values as you need.
Redis should be defined as a StatefulSet for various reasons. You could also do:
mkdir my-redis
helm fetch --untar --untardir . 'stable/redis' #makes a directory called redis
helm template --output-dir './my-redis' './redis' #redis dir (local helm chart), export to my-redis dir
then use Kustomize if you like.
You will notice that a Redis deployment definition is not so trivial when you see how much code there is in the stable chart.
You can then expose it in various ways, but normally you only need access from within the cluster.
If you need a fast way to test from outside the cluster, or want to use it as a development environment, check the official ways to do that, for example kubectl port-forward.
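For instance, assuming the chart created a Service named my-redis in the web namespace (the actual Service name depends on your release):
kubectl -n web port-forward svc/my-redis 6379:6379
redis-cli -h 127.0.0.1 -p 6379 ping    # should answer PONG if Redis is reachable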

Helm chart: copy shell script from local machine to remote pod, change permission and execute

Is there a way I can copy a shell script from my local machine to a pod using charts and Helm, change the script's permissions, and execute the script inside the pod?
No, Helm cannot do this. In effect, the only Kubernetes commands it can run are kubectl apply and kubectl delete, though it can apply templating before sending YAML off to the Kubernetes server. The sorts of imperative commands you're describing (kubectl cp and kubectl exec) aren't things Helm can do.
(The sorts of imperative commands you're describing aren't generally good form in Kubernetes in any case. Generally you'd need to package your script up in a Docker image to be able to run it in the cluster, as sketched below, and you want your containers to be able to set themselves up as much as possible. Also remember that pods get deleted routinely, sometimes even outside of your control, and anything you've manually copied into a pod will be lost when that happens.)
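A minimal sketch of that packaging approach, where the base image, script name, and path are all placeholders:
FROM alpine:3.18
COPY myscript.sh /usr/local/bin/myscript.sh   # hypothetical script name
RUN chmod +x /usr/local/bin/myscript.sh       # permissions are baked into the image
CMD ["/usr/local/bin/myscript.sh"]
You can then run that image as a container command or a Job from your Helm chart instead of copying the script into a running pod.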

Difficulty with different Kubernetes pods, run using kubectl apply from the same container image, apparently sharing directories

I am attempting to run two separate pods from the same container image on a cluster by applying a config file. Despite there being no shared or persistent volume, when both pods are active the same directory on both pods is updated with files created by the other pod, and write access changes suddenly. The container being used is the jupyter-docker-stacks jupyter/minimal-notebook image, pulled directly from Docker Hub. The pods running this container are created by applying a manifest. The two pods have different labels and names, and a Service with a unique name is created for each pod for access.
Do resources for containers persist over time on a cluster like they do in Docker containers? I cannot find something equivalent to a --rm flag to be used alongside kubectl apply.
Thanks
If you want to delete the pod after the work is completed, you might want to apply a Job instead of a Pod. The idea of a Job in k8s is to launch a Pod, do the work, and then stop the Pod. For more info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
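A minimal sketch of such a Job, where the name and the command are placeholders:
apiVersion: batch/v1
kind: Job
metadata:
  name: notebook-task              # hypothetical name
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never         # Jobs require Never or OnFailure
      containers:
        - name: task
          image: jupyter/minimal-notebook
          command: ["python", "-c", "print('work done')"]   # placeholder workload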
$ kubectl apply -f <fileName> will create the pod or apply changes to it; there is no --rm equivalent. If you want to delete a pod that was created with apply, you must use $ kubectl delete -f <fileName>.
About sharing: if you have two separate manifests, you can specify volumeMounts for each container, as in the sketch below. For more information, please read the documentation relevant to your needs.
Also, as @Kaizhe Huang advised, you can use a Job if you want to execute something once, or try initContainers if you want to install something in the Pod before the main container runs. More about initContainers here.
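A sketch of giving each pod its own, unshared volume mount; the volume name and mount path are placeholders:
spec:
  containers:
    - name: notebook
      image: jupyter/minimal-notebook
      volumeMounts:
        - name: work
          mountPath: /home/jovyan/work    # each pod mounts its own emptyDir, nothing is shared
  volumes:
    - name: work
      emptyDir: {}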
You could check the Dockerfile of your image and see if any 'VOLUME' is declared. If there is, maybe the pods share the same volume on the host. Not sure, but you could check.

Does Kubernetes provide a colocated Job container?

I wonder how one would implement a colocated auxiliary container in a Pod within a Deployment, where the auxiliary container does not provide a service but rather runs a job/batch workload.
The background of my question is that I want to deploy a scalable service in which each instance needs configuration after its start. This configuration is done via an HTTP POST to its local colocated service instance. I've implemented an auxiliary container for this in order to benefit from colocation, so the auxiliary container always knows which instance needs to be configured.
The problem is that restartPolicy needs to be defined at the Pod level. I am looking for something like restartPolicy: Always for the service and a different restartPolicy: OnFailure for the configuration job.
I know that k8s provides the Job resource for such workloads. But is there an option to colocate those Jobs with Pods?
Furthermore, I've stumbled across the so-called init containers, which can be defined via annotations. But these have the drawback that k8s ensures the actual Pod's containers are only started after the init container has run, so for my scenario they seem unsuitable.
As I understand it, you need your service running in order to configure it.
Your solution is workable, and you can keep restartPolicy: Always; you just need a way to tell your one-off configuration container that it has already run. You could create and attach an emptyDir volume to your configuration container, create a file on it to mark the configuration as successful, and check for this file from your process. After the initialization, the container just sleeps in a loop. The downside is that this container keeps taking up some resources too.
Or you can just add an extra process in the same container and do the configuration there (maybe with the file mentioned above as a guard to avoid configuring twice). So write a simple shell script like this and run it instead of your main process:
#!/bin/sh
# Run the one-off configuration in the background, guarded by a stamp file
# so it only happens once across container restarts.
(
    [ -f /mnt/guard-vol/stamp ] && exit 0
    /opt/my-config-process parameters && touch /mnt/guard-vol/stamp
) &
# Hand over PID 1 to the main process, forwarding all arguments.
exec /opt/my-main-process "$@"
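For reference, the volume wiring for that guard file could look roughly like this in the Pod template (a sketch; the container name, image, and script path are placeholders):
spec:
  containers:
    - name: service
      image: my-service-image          # placeholder
      command: ["/opt/start.sh"]       # the wrapper script above
      volumeMounts:
        - name: guard-vol
          mountPath: /mnt/guard-vol
  volumes:
    - name: guard-vol
      emptyDir: {}                     # survives container restarts, but not Pod deletion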
Alternatively, you could implement a separate pod that queries the Kubernetes API for pods of your service with the label configured=false, configures them, and then flips the label via the API. You should also modify your Service to select only configured=true pods.
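The same label flow can be exercised manually with kubectl (the label key is just an example):
kubectl get pods -l configured=false                          # pods still waiting for configuration
kubectl label pod <pod-name> configured=true --overwrite      # mark a pod as configured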

Reload Kubernetes ReplicationController to get newly created Service

Is there a way to reload currently running pods created by a ReplicationController so that they pick up newly created services?
Example:
I have running pods created from a ReplicationController config file. I deleted a service called mongo-svc and recreated it using a different port. Is there a way for the pods' environment to be updated with the new IP and port from the new mongo-svc?
You can restart pods by simply deleting them: if they are linked to a ReplicationController, the RC will take care of restarting them.
kubectl delete pod <your-pod-name>
if you have a couple of pods, it's easy enough to copy/paste the pod names, but if you have many pods it can become cumbersome.
So another way to delete pods and restart them is to scale the RC down to 0 instances and back up to the number you need.
kubectl scale --replicas=0 rc <your-rc>
kubectl scale --replicas=<n> rc <your-rc>
By the way, you may also want to look at rolling updates to do this in a more production-friendly manner, but that implies updating the RC config.
If you want the same pod to pick up the new service, the clean answer is no. You could (though I strongly suggest not doing this) run kubectl exec <pod-name> -c <container> -- export <service env var name>=<service env var value>. But your best bet is to run kubectl delete pod <pod-name> and let your replication controller handle the work.
I've run into a similar issue for services running outside of Kubernetes, say a DB for instance. To address this I've been building https://github.com/cpg1111/kubongo, which updates the service's endpoint without deleting the pods. The same idea can also be applied to other pods in Kubernetes to automate the service update. Basically it watches a specific service, and when its IP changes for whatever reason it updates all the pods without deleting them. It does use the same mechanism as kubectl exec, however it is automated, sanitizes input, and ensures the export is executed on all pods.
What do you mean by 'reapply'?
The pods to which a service points are generally selected based on labels. In other words, you can add/remove labels on the pods to include/exclude them from a service, as in the sketch after the links below.
Read here for more information about defining services: http://kubernetes.io/v1.1/docs/user-guide/services.html#defining-a-service
And here for more information about labels: http://kubernetes.io/v1.1/docs/user-guide/labels.html
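For example, a Service selecting pods by label looks roughly like this (the selector label and port are placeholders; the name is taken from the question):
apiVersion: v1
kind: Service
metadata:
  name: mongo-svc
spec:
  selector:
    app: mongo          # only pods carrying this label become endpoints of the service
  ports:
    - port: 27017
      targetPort: 27017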
Hope it helps!