The Apache Commons IO library, used to delete files on a Kubernetes persistent volume, is not working as expected. FileUtils.forceDelete is called to delete a file on the Kubernetes PV, but the file is not deleted. The user running this app has full r/w/x permissions.
Related
I want to make a container that is able to transfer files between itself and other containers on the cluster. I have multiple containers that are responsible for executing a task, and they are waiting to get an input file to do so. I want a separate container to be responsible for handling files before and after the task is executed by the other containers. As an example:
have all files on the file manager container.
let the file manager container automatically copy a file to a task executing container.
let task executing container run the task.
transfer the output of the task executing container to the file manager container.
I want to do this automatically, so that, for example, 400 input files can be processed into output files in this way. What would be the best way to realise such a process with Kubernetes? Where should I start?
A simple approach would be to set up NFS or use a managed file system such as AWS EFS.
You can mount the file system or NFS share directly into the pods using the ReadWriteMany access mode.
ReadWriteMany - the volume can be mounted read-write by many nodes, so multiple pods can access the single file system.
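As a rough illustration of the ReadWriteMany request described above, here is a minimal sketch that builds a PersistentVolumeClaim manifest in Python. The claim name, size and storage class ("shared-files", "10Gi", "nfs-client") are placeholders invented for this example, and the storage class has to be one in your cluster that actually supports ReadWriteMany.

```python
import json


def build_rwx_pvc(name: str, storage: str, storage_class: str) -> dict:
    """Return a PersistentVolumeClaim manifest that requests ReadWriteMany access."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteMany lets pods on several nodes mount the volume read-write.
            "accessModes": ["ReadWriteMany"],
            # Hypothetical storage class; it must be backed by NFS, EFS or similar.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": storage}},
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, e.g.: python pvc.py | kubectl apply -f -
    print(json.dumps(build_rwx_pvc("shared-files", "10Gi", "nfs-client"), indent=2))
```

Every pod that needs the files would then reference this claim in its volumes section and mount it at the same path.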
If you don't want to use a managed service such as EFS, you can also host the storage yourself on K8s; check out MinIO: https://min.io/
All files will be saved in the file system, and each pod can simply read whatever it needs from there.
You can create different directories to separate the outputs.
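One way the per-task directories could look is sketched below. It assumes the shared volume is mounted at the same path in the file manager pod and in the task pods. The tasks/<name>/input and tasks/<name>/output layout and both function names are made up for this example, not a Kubernetes convention.

```python
import shutil
from pathlib import Path


def stage_input(shared_root: Path, input_file: Path) -> Path:
    """Copy one input file into its own task directory on the shared volume."""
    # Hypothetical layout: <shared_root>/tasks/<file stem>/{input,output}
    task_dir = shared_root / "tasks" / input_file.stem
    (task_dir / "input").mkdir(parents=True, exist_ok=True)
    (task_dir / "output").mkdir(exist_ok=True)
    shutil.copy2(input_file, task_dir / "input" / input_file.name)
    return task_dir


def collect_outputs(shared_root: Path) -> dict:
    """Map each task name to the sorted file names found in its output directory."""
    results = {}
    for task_dir in sorted((shared_root / "tasks").iterdir()):
        results[task_dir.name] = sorted(p.name for p in (task_dir / "output").iterdir())
    return results
```

The file manager would call stage_input once per input file, each task pod would write into its own output directory, and collect_outputs would report what has been produced so far. Two input files with the same stem would share a directory here, so real code would need unique task names.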
If you only need read operations, meaning all pods only read the files, you can use the ReadOnlyMany access mode instead.
If you are on GCP, you can check out this document: https://cloud.google.com/filestore/docs/accessing-fileshares
I have used Kubernetes to deploy, for example, WordPress or nginx, installing from a YAML file. Where are they installed, and how can I find the directory that holds the pages (for example the WordPress pages)? The same question applies on Google Cloud: when I use Kubernetes there, what is the path of the installed files (e.g. index.php)?
If you are running the Docker image directly, without attaching anything such as NFS, S3 or a disk, those files (index.php and the rest) live in the container's file system by default.
On any K8s cluster, whether on Google Cloud or elsewhere, you can inspect the files inside the container:
kubectl get pods
kubectl exec -it <Wordpress pod name> -- /bin/bash
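If you would rather search for the file than browse in an interactive shell, the same kubectl exec call can be scripted. This is only a sketch: it assumes kubectl is on your PATH and already configured for the cluster, and that the container image ships a find binary. The function names and the pod name in the usage note are placeholders.

```python
import subprocess


def kubectl_find_cmd(pod: str, filename: str, namespace: str = "default", root: str = "/") -> list:
    """Build the argv for running `find <root> -name <filename>` inside a pod."""
    return ["kubectl", "exec", "-n", namespace, pod, "--", "find", root, "-name", filename]


def find_in_pod(pod: str, filename: str, namespace: str = "default") -> list:
    """Run the search and return the matching paths, one entry per line of output."""
    result = subprocess.run(
        kubectl_find_cmd(pod, filename, namespace),
        capture_output=True,
        text=True,
        check=False,  # find may exit non-zero on unreadable directories
    )
    return result.stdout.splitlines()
```

For example, find_in_pod("<Wordpress pod name>", "index.php") would list every index.php that find can see inside that container.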
If you attach a file system such as NFS or EFS, or object storage such as S3, you will find those files there instead, provided the volume is mounted through the configuration in your YAML file.
Regarding the setup file (YAML):
Kubernetes uses etcd as its data store. The flow is like this: kubectl connects to the API server and sends it the YAML file. The API server parses the file and stores the information in etcd, so you won't find the YAML file itself stored anywhere on the cluster.
I have a Google Cloud Composer 1 environment (Airflow 2.1.2) where I want to run an Airflow DAG that utilizes the KubernetesPodOperator.
Cloud Composer makes available to all DAGs a shared file directory for storing application data. The files in the directory reside in a Google Cloud Storage bucket managed by Composer. Composer uses FUSE to map the directory to the path /home/airflow/gcs/data on all of its Airflow worker pods.
In my DAG I run several Kubernetes pods like so:
from airflow.contrib.operators import kubernetes_pod_operator
# ...
splitter = kubernetes_pod_operator.KubernetesPodOperator(
    task_id='splitter',
    name='splitter',
    namespace='default',
    image='europe-west1-docker.pkg.dev/redacted/splitter:2.3',
    cmds=["dotnet", "splitter.dll"],
)
The application code in all the pods that I run needs to read from and write to the /home/airflow/gcs/data directory. But when I run the DAG my application code is unable to access the directory. Likely this is because Composer has mapped the directory into the worker pods but does not extend this courtesy to my pods.
What do I need to do to give my pods r/w access to the /home/airflow/gcs/data directory?
Cloud Composer uses FUSE to mount certain directories from Cloud Storage into Airflow worker pods running in Kubernetes. It mounts these with default permissions that cannot be overwritten, because that metadata is not tracked by Google Cloud Storage. A possible solution is to use a bash operator that runs at the beginning of your DAG to copy files to a new directory. Another possible solution can be to use a non-Google Cloud Storage path like a /pod path.
I have a Kubernetes cluster with 1 master and 1 worker. I deployed 1 nodejs application and mounted the directory /home/user_name/data into the pod using the hostPath volume type. I set the type of the volume to Directory.
The nodejs application is working perfectly and writing data to and returning the saved data without any error.
Initially, I also tested by deleting the data folder and re-applying the deployment, and I received an error because the volume type is Directory. So it looks like the mount is correctly pointing to the directory on the worker node. But when I try to look at the file that the nodejs app should have created, I do not see any file in the host directory.
I also checked if the pod is running on worker node and it's running on worker node only.
My understanding of the hostPath volume type is that it is similar to a bind mount in Docker.
I have no idea why I cannot see the file that the nodejs app creates, even though the app works perfectly. Any help will be appreciated.
I am facing an issue while reading a file stored on my system from a Spark program running in cluster mode. It gives me a "File not found" error, but the file is present at the defined location. Please suggest how I can read a local file in a Spark cluster running on Kubernetes.
You cannot refer to local files on your machine when you submit Spark on Kubernetes.
The available solutions for your case might be:
Use the Resource Staging Server. It is not available in the main branch of the Apache Spark codebase, so the whole integration is on your side.
Put your file in an HTTP- or HDFS-accessible location: refer to the docs.
Put your file inside the Spark Docker image and refer to it as local:///path/to/your-file.jar
If you are running a local Kubernetes cluster such as Minikube, you can also create a Kubernetes Volume containing the files you are interested in and mount it into the Spark pods: refer to the docs. Be sure to mount that volume into both the driver and the executors.
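For the volume option, the sketch below generates the `--conf` entries for both roles so that the driver and the executors cannot drift apart. The property pattern spark.kubernetes.{driver,executor}.volumes.[VolumeType].[VolumeName] is taken from the Spark on Kubernetes documentation as I recall it, so check it against the docs for your Spark version. The volume name "data" and both paths are placeholders.

```python
def volume_confs(volume_type: str, volume_name: str, mount_path: str, options: dict) -> list:
    """Build matching Spark volume settings for the driver and the executors."""
    confs = []
    for role in ("driver", "executor"):
        prefix = f"spark.kubernetes.{role}.volumes.{volume_type}.{volume_name}"
        # Where the volume appears inside the pod.
        confs.append(f"{prefix}.mount.path={mount_path}")
        # Type-specific options, e.g. "path" for hostPath.
        for key, value in options.items():
            confs.append(f"{prefix}.options.{key}={value}")
    return confs


def as_submit_args(confs: list) -> list:
    """Turn each setting into a `--conf key=value` pair for spark-submit."""
    args = []
    for conf in confs:
        args += ["--conf", conf]
    return args


if __name__ == "__main__":
    # Hypothetical hostPath volume: /mnt/data on the node, /data in the pods.
    print(" ".join(as_submit_args(volume_confs("hostPath", "data", "/data", {"path": "/mnt/data"}))))
```

The printed arguments would be appended to the spark-submit command line, and the job would then read its input from /data rather than from a path on the submitting machine.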