Right now I use docker-compose for development. It's a great tool and works well for simple projects with at most 3-6 active services, but once I get to 6-8 or more it becomes hard to manage.
So I've started to learn Kubernetes on minikube, and now I have a few questions:
How to make "two-way" binding for volumes? For example, if I have a folder named "my-frontend" and I want to sync it with a specific folder in a deployment, how do I "link" them using a PV and a PVC?
Very often it comes in handy to define a service with a specific environment, like node:12.0.0, and then use it as a command executor, like this: docker-compose run workspace npm install
How can I achieve something like this with k8s?
How to make "two-way" binding for volumes? For example, if I have a folder named "my-frontend" and I want to sync it with a specific folder in a deployment, how do I "link" them using a PV and a PVC?
You need to create a PersistentVolume, which in your case will use a specific directory on the host; the official Kubernetes documentation has an example with this same use case.
Then create a PersistentVolumeClaim to request some space from that volume (also shown in the same documentation) and mount the PVC on the pod/deployment where you need it:
volumes:
  - name: my-volume
    persistentVolumeClaim:
      claimName: my-pvc
containers:
  - ...
    volumeMounts:
      - mountPath: "/mount/path/in/pod"
        name: my-volume
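For completeness, a minimal sketch of a hostPath PersistentVolume with a matching PersistentVolumeClaim could look like the following (the names, size and path are only illustrative):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv                         # illustrative name
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  hostPath:
    path: /path/on/node/my-frontend   # directory on the node (the minikube VM)
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  resources:
    requests:
      storage: 1Gi
Note that on minikube the hostPath refers to the minikube VM's filesystem, so to sync a folder from your own machine you would typically combine this with minikube mount.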
Very often it comes in handy to define a service with a specific environment, like node:12.0.0, and then use it as a command executor, like this: docker-compose run workspace npm install
How can I achieve something like this with k8s?
You need to use kubectl. It has functionality very similar to the docker CLI; in particular it supports a run command with very similar parameters. Alternatively, you can start your pod once and then run commands in the same instance as many times as you like using kubectl exec.
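As a rough sketch of both approaches (the pod name workspace is just a placeholder, and for npm install to be useful the project directory would also need to be mounted, e.g. via the PVC above):
# one-off command in a throwaway pod, deleted when it exits
kubectl run workspace --image=node:12.0.0 --restart=Never --rm -it -- npm install

# or keep a long-lived pod around and exec into it repeatedly
kubectl run workspace --image=node:12.0.0 --restart=Never -- sleep infinity
kubectl exec -it workspace -- npm install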
I set up a local Kubernetes cluster using Kind, and I run Apache Airflow on it using Helm.
To actually create the pods and run Airflow, I use the command:
helm upgrade -f k8s/values.yaml airflow bitnami/airflow
which uses the airflow chart from the bitnami repo and "feeds" it the configuration in values.yaml.
The file values.yaml looks something like:
web:
  extraVolumeMounts:
    - name: functions
      mountPath: /dir/functions/
  extraVolumes:
    - name: functions
      hostPath:
        path: /dir/functions/
        type: Directory
where web is one component of Airflow (and one of the pods in my setup), and the directory /dir/functions/ is successfully mapped from the cluster into the pod. However, I fail to do the same for a single, specific file instead of a whole directory.
Does anyone know the syntax for that? Or have an idea for an alternative way to map the file into the pod (its whole directory is successfully mapped into the cluster)?
There is a File type for hostPath which should behave like you want; as the docs state:
File: A file must exist at the given path
which you can then use with the precise file path in mountPath. Example:
web:
  extraVolumeMounts:
    - name: singlefile
      mountPath: /path/to/mount/the/file.txt
  extraVolumes:
    - name: singlefile
      hostPath:
        path: /path/on/the/host/to/the/file.txt
        type: File
Or if it's not a problem, you could mount the whole directory containing it at the expected path.
With that said, I want to point out that using hostPath is almost never a good idea.
If you have a cluster with more than one node, mounting a hostPath in your Pod does not restrict it to run on a specific host (even though you can enforce that with nodeSelectors and so on), which means that if the Pod starts on a different node, it may behave differently, not finding the directory and/or file it was expecting.
But even if you restrict the application to run on a specific node, you need to be okay with the idea that, if that node becomes unavailable, the Pod will not be rescheduled somewhere else on its own, meaning you will need manual intervention to recover from a single node failure (unless the application is multi-instance and can tolerate one instance going down).
To conclude:
if you want to mount a path on a particular host, for whatever reason, I would go for local volumes (see the sketch after this list), or at least use hostPath and restrict the Pod to run on the specific node it needs to run on.
if you want to mount small, textual files, you could consider mounting them from ConfigMaps.
if you want to configure an application by providing a set of files at a certain path when the app starts, you could go for an init container which prepares the files for the main container in an emptyDir volume.
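As a minimal sketch of the local-volume option (the volume name, size, path and node name below are placeholders), a local PersistentVolume ties the data to a node explicitly through nodeAffinity, so the scheduler knows where Pods using it must run:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv                      # placeholder name
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/data             # directory that must already exist on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-node-1       # placeholder node name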
I have a simple web app that uses a volume / persistent volume claim to serve static data. The pods get scheduled only on the first worker node, where the volume resides.
How do I deal with sharing the volume between nodes if I want to scale the pods and have them allocated across all the available nodes?
containers:
  - name: app-nginx
    image: nginx:1.20.0
    command: ["/usr/sbin/nginx", "-g", "daemon off;"]
    volumeMounts:
      - name: static-volume
        mountPath: "/opt/static"
volumes:
  - name: static-volume
    persistentVolumeClaim:
      claimName: pvc-static
One option is to use NFS and not physically allocate the volume on an EC2 instance.
Another way is to duplicate the static files for each pod (populating them into the proper directory with an init container), but that requires extra time and is not feasible for a large amount of static data.
What's the proper way to deal with such a case in Kubernetes? Is there a way to declare a deployment that uses a logically identical volume backed by physically different instances located on different nodes?
What you are looking for is a volume provider that supports the ReadOnlyMany or ReadWriteMany Access Mode.
Follow the documentation link to get a list of the officially supported ones.
If you are on AWS, then using EFS through the NFS plugin will probably be the easiest solution, but take into account that it is an NFS-based solution and you might hit a performance penalty.
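As a rough sketch, a claim that pods on different nodes can share would request one of those access modes; the storage class name depends on the provisioner you install (for example the EFS CSI driver), so treat it as a placeholder:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-static
spec:
  accessModes:
    - ReadWriteMany                   # or ReadOnlyMany if pods only read the data
  storageClassName: efs-sc            # placeholder, depends on your provisioner
  resources:
    requests:
      storage: 1Gi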
As a side note, what you are trying to do smells like an anti-pattern to me.
Docker images for an application should be self-contained. In your case, a container serving a static website should contain all the static files it needs in order to be fully portable. That would remove the need for an external volume with the data entirely.
Yes, you are right, one option is to use NFS. You have to use a volume that supports ReadWriteMany or ReadOnlyMany: https://stackoverflow.com/a/57798369/5525824
If you have a ReadOnlyMany scenario, you can create such a PVC on GCP with GKE: https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/readonlymany-disks
However, if you also need to perform write operations, the available options are a shared file system or NFS.
You can also look at running MinIO if you don't want to use any managed service and want to stay cloud-agnostic: https://min.io/
NAS: https://github.com/minio/charts#nas-gateway
Just FYI, there is also https://github.com/ctrox/csi-s3, but its performance might not be good.
I have a python pod running.
This Python pod uses several shared libraries. To make the shared libraries easier to debug, I would like to have the libraries directory on my host too.
The Python dependencies are located in /usr/local/lib/python3.8/site-packages/ and I would like to access this directory on my host to modify some files.
Is that possible, and if so, how? I have tried emptyDir and a PV, but they always override what already exists in the directory.
Thank you!
This is by design. The kubelet is responsible for preparing the mounts for your container; at the time of mounting they are empty, and the kubelet has no reason to put any content in them.
That said, there are ways to achieve what you seem to expect by using an init container. In your pod, define an init container using your Docker image, mount your volume in it at some path (i.e. /target), but instead of running the regular content of your container, run something like
cp -r /my/dir/* /target/
which will initialize your directory with the expected content and exit, allowing the pod to continue starting up.
Please take a look: overriding-directory.
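A minimal sketch of that idea applied to the site-packages case (the image name and host path are placeholders; with a hostPath volume the copied files also become visible and editable on the node):
apiVersion: v1
kind: Pod
metadata:
  name: python-dev
spec:
  initContainers:
    - name: seed-site-packages
      image: my-python-image                    # placeholder: your application image
      command: ["sh", "-c", "cp -r /usr/local/lib/python3.8/site-packages/* /target/"]
      volumeMounts:
        - name: site-packages
          mountPath: /target
  containers:
    - name: app
      image: my-python-image                    # placeholder
      volumeMounts:
        - name: site-packages
          mountPath: /usr/local/lib/python3.8/site-packages
  volumes:
    - name: site-packages
      hostPath:
        path: /data/site-packages               # placeholder host directory
        type: DirectoryOrCreate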
Another option is to use subPath. A subPath references files or directories that are controlled by the user, not the system. Take a look at this example of how to mount single files into an existing directory:
---
volumeMounts:
  - name: "config"
    mountPath: "/<existing folder>/<file1>"
    subPath: "<file1>"
  - name: "config"
    mountPath: "/<existing folder>/<file2>"
    subPath: "<file2>"
restartPolicy: Always
volumes:
  - name: "config"
    configMap:
      name: "config"
---
Check full example here. See: mountpath, files-in-folder-overriding.
You can also, as @DavidMaze said, debug your setup in a non-container Python virtual environment if you can, or, as a second choice, debug the image in Docker without Kubernetes.
You can also take into consideration the third-party tools below, which were created especially for Kubernetes app developers with exactly this functionality in mind (keeping source and remote files in sync).
Skaffold's continuous development workflow - it takes care of keeping source and remote files (the Pod-mounted directory) in sync.
Telepresence's volume access feature.
I am new to OpenShift. I have gone through the OpenShift website for more details, but I wanted to know if anyone has deployed an init container.
I want to use that to take a dump from the database and restore it to a new version of it with the help of an init container.
We are using a PostgreSQL database.
Any help would be appreciated.
Thanks!
I want to use that to take a dump from the database and restore it to a new version of it with the help of an init container
I would say you should rather use an operator instead of an init container. Take a look at the init container design considerations below.
There are some considerations that you should take into account when you create init containers:
They always get executed before other containers in the Pod, so they shouldn't contain complex logic that takes a long time to complete. Startup scripts are typically small and concise. If you find that you're adding too much logic to init containers, you should consider moving part of it to the application container itself.
Init containers are started and executed in sequence. An init container is not invoked unless its predecessor has completed successfully. Hence, if the startup task is very long, you may consider breaking it into a number of steps, each handled by an init container, so that you know which steps fail.
If any of the init containers fail, the whole Pod is restarted (unless you set restartPolicy to Never). Restarting the Pod means re-executing all the containers again, including any init containers. So, you may need to ensure that the startup logic tolerates being executed multiple times without causing duplication. For example, if a DB migration is already done, executing the migration command again should just be ignored.
An init container is a good candidate for delaying the application initialization until one or more dependencies are available. For example, if your application depends on an API that imposes an API request-rate limit, you may need to wait for a certain time period to be able to receive responses from that API. Implementing this logic in the application container may be complex, as it needs to be combined with health and readiness probes. A much simpler way would be creating an init container that waits until the API is ready before it exits successfully. The application container would start only after the init container has done its job successfully.
Init containers cannot use health and readiness probes as application containers do. The reason is that they are meant to start and exit successfully, much like how Jobs and CronJobs behave.
All containers on the same Pod share the same Volumes and network. You can make use of this feature to share data between the application and its init containers.
The only thing I found about using an init container to load a dump is this example that does it with MySQL; maybe it can guide you on how to do it with PostgreSQL.
In this scenario, we are serving a MySQL database. This database is used for testing an application. It doesn't have to contain real data, but it must be seeded with enough data so that we can test the application's query speed. We use an init container to handle downloading the SQL dump file and restore it to the database, which is hosted in another container.
The definition file may look like this:
apiVersion: v1
kind: Pod
metadata:
  name: mydb
  labels:
    app: db
spec:
  initContainers:
    - name: fetch
      image: mwendler/wget
      command: ["wget","--no-check-certificate","https://sample-videos.com/sql/Sample-SQL-File-1000rows.sql","-O","/docker-entrypoint-initdb.d/dump.sql"]
      volumeMounts:
        - mountPath: /docker-entrypoint-initdb.d
          name: dump
  containers:
    - name: mysql
      image: mysql
      env:
        - name: MYSQL_ROOT_PASSWORD
          value: "example"
      volumeMounts:
        - mountPath: /docker-entrypoint-initdb.d
          name: dump
  volumes:
    - emptyDir: {}
      name: dump
The above definition creates a Pod that hosts two containers: the init container and the application one. Let’s have a look at the interesting aspects of this definition:
The init container is responsible for downloading the SQL file that contains the database dump. We use the mwendler/wget image because we only need the wget command.
The destination directory for the downloaded SQL is the directory used by the MySQL image to execute SQL files (/docker-entrypoint-initdb.d). This behavior is built into the MySQL image that we use in the application container.
The init container mounts /docker-entrypoint-initdb.d to an emptyDir volume. Because both containers are hosted on the same Pod, they share the same volume. So, the database container has access to the SQL file placed on the emptyDir volume.
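The same pattern should carry over to PostgreSQL, since the official postgres image also executes *.sql files placed in /docker-entrypoint-initdb.d on first initialization. A rough sketch (the dump URL and password are placeholders; use a Secret for real credentials):
apiVersion: v1
kind: Pod
metadata:
  name: mydb-pg
spec:
  initContainers:
    - name: fetch
      image: mwendler/wget
      command: ["wget", "--no-check-certificate", "https://example.com/dump.sql", "-O", "/docker-entrypoint-initdb.d/dump.sql"]   # placeholder URL
      volumeMounts:
        - mountPath: /docker-entrypoint-initdb.d
          name: dump
  containers:
    - name: postgres
      image: postgres
      env:
        - name: POSTGRES_PASSWORD
          value: "example"                      # placeholder password
      volumeMounts:
        - mountPath: /docker-entrypoint-initdb.d
          name: dump
  volumes:
    - name: dump
      emptyDir: {}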
Additionally, as a best practice, I would suggest taking a look at Kubernetes operators; as far as I know that's the best-practice way to manage databases in Kubernetes.
If you're not familiar with operators, I would suggest starting with the Kubernetes documentation and this short video on YouTube.
Operators are a method of packaging Kubernetes applications that makes it easier to manage and monitor stateful applications. There are many operators already available, such as the
Crunchy Data PostgreSQL Operator
Postgres Operator
which automate and simplify deploying and managing open source PostgreSQL clusters on Kubernetes by providing the essential features you need to keep your PostgreSQL clusters up and running.
I want to create a specific version of Redis to be used as a cache. Task:
Pod must run in web namespace
Pod name should be cache
Image name is lfccncf/redis with the 4.0-alpine tag
Expose port 6379
The pod needs to be running after completion
These are my steps:
k create ns web
k -n web run cache --image=lfccncf/redis:4.0-alpine --port=6379 --dry-run=client -o yaml > pod1.yaml
vi pod1.yaml
pod looks like this
k create -f pod1.yaml
When the service name to expose is not defined, is this the right command to fully complete the task?
k expose pod cache --port=6379 --target-port=6379
Is using a command like command: ["/bin/sh", "-ec", "sleep 1000"] the best way to keep the pod running?
You should not use sleep to keep a Redis pod running. As long as the redis-server process runs in the container, the pod will stay running.
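For reference, a minimal manifest for the task could look like the sketch below; no sleep command is needed because the image's default Redis entrypoint keeps the container alive:
apiVersion: v1
kind: Pod
metadata:
  name: cache
  namespace: web
spec:
  containers:
    - name: cache
      image: lfccncf/redis:4.0-alpine
      ports:
        - containerPort: 6379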
The best way to go about it is to take a stable Helm chart from https://hub.helm.sh/charts/stable/redis-ha, do a helm pull, and modify the values as you need.
Redis should be defined as a StatefulSet for various reasons. You could also do a
mkdir my-redis
helm fetch --untar --untardir . 'stable/redis' #makes a directory called redis
helm template --output-dir './my-redis' './redis' #redis dir (local helm chart), export to my-redis dir
then use Kustomize if you like.
You will notice that a Redis deployment definition is not so trivial once you see how much code there is in the stable chart.
You can then expose it in various ways, but normally you only need access to it from within the cluster.
If you need a fast way to test it from outside the cluster, or want to use it as a development environment, check the official ways to do that, for example with kubectl port-forward as sketched below.
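As a quick sketch (assuming redis-cli is installed on your machine; the pod name cache comes from the question above, so adjust it to whatever the chart actually creates):
# forward local port 6379 to the Redis pod inside the cluster
kubectl -n web port-forward pod/cache 6379:6379

# then, from another terminal
redis-cli -h 127.0.0.1 -p 6379 ping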