Unable to read local files in spark kubernetes cluster mode - scala

I am facing an issue reading a file stored on my local machine from a Spark program running in cluster mode. It gives a "File not found" error even though the file is present at the given location. Please suggest how I can read a local file in a Spark cluster running on Kubernetes.

You cannot refer to local files on your machine when you submit Spark on Kubernetes.
The available solutions for your case might be:
Use a Resource Staging Server. It is not available in the main branch of the Apache Spark codebase, so the whole integration is on your side.
Put your file in an HTTP- or HDFS-accessible location: refer to the docs.
Put your file inside the Spark Docker image and refer to it as local:///path/to/your-file.jar (see the spark-submit sketch after this list).
If you are running a local Kubernetes cluster like Minikube, you can also create a Kubernetes Volume with the files you are interested in and mount it to the Spark pods: refer to the docs. Be sure to mount that volume to both the Driver and the Executors.
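As a rough illustration of the local:// option, a cluster-mode submission against Kubernetes might look like the following (this assumes Spark 2.3+ with built-in Kubernetes support; the API server address, image name, class, and jar path are placeholders):
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode cluster \
  --name my-app \
  --class com.example.Main \
  --conf spark.kubernetes.container.image=<registry>/my-spark-app:latest \
  local:///opt/spark/jars/my-app.jar
Here local:/// tells Spark that the application jar is already present inside the image on every pod, so nothing needs to be uploaded from the machine running spark-submit.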

Related

Where are the setup files or installed files on Kubernetes? Where are these installed on Linux or on Google Cloud?

I have used Kubernetes to deploy, for example, WordPress or nginx, installing from a YAML file. Where is it installed, and how can I find the directory of pages (for example, the WordPress pages)? The same applies to Google Cloud: when I use Kubernetes on Google Cloud, what is the path of the installed files (e.g. index.php)?
If you are running the Docker image directly, without attaching anything like NFS, S3, or a disk, then those files (index.php and the rest) are in the container file system by default.
With any K8s cluster, on Google Cloud or anywhere else, you can check the files inside the container:
kubectl get pods
kubectl exec -it <Wordpress pod name> -- /bin/bash
If you attach a file system such as NFS, or object storage such as S3 or EFS, you will find those files there once you mount it and apply the configuration using the YAML file.
Regarding the setup file (YAML):
Kubernetes uses the etcd database as its data store. The flow is like this: the kubectl command connects to the API server and sends it the YAML file; the API server parses it and stores the information in etcd. So you won't get the file back exactly as you wrote it in YAML.
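You can, however, ask the API server to render what it stored back as YAML; for example (the deployment name is a placeholder):
kubectl get deployment <wordpress-deployment-name> -o yaml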

Apache Commons IO with Kubernetes

The Apache Commons IO library, used for file deletion on a Kubernetes persistent volume, is not working as expected. FileUtils.forceDelete is used to delete a file on a Kubernetes PV, but it does not delete the file. The user running this app has full r/w/x permissions.
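One quick sanity check from inside the pod is to look at how the volume is actually mounted and whether a delete works outside the application (the pod name and the /data mount path are placeholders, and a shell is assumed to be present in the image):
kubectl exec -it <pod-name> -- sh -c 'mount | grep /data'   # an "ro" flag here means the PV is mounted read-only
kubectl exec -it <pod-name> -- rm /data/<file-to-delete>    # try the same delete outside the app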

How to access Google Cloud Composer's data folder from a pod launched using KubernetesPodOperator?

I have a Google Cloud Composer 1 environment (Airflow 2.1.2) where I want to run an Airflow DAG that utilizes the KubernetesPodOperator.
Cloud Composer makes available to all DAGs a shared file directory for storing application data. The files in the directory reside in a Google Cloud Storage bucket managed by Composer. Composer uses FUSE to map the directory to the path /home/airflow/gcs/data on all of its Airflow worker pods.
In my DAG I run several Kubernetes pods like so:
from airflow.contrib.operators import kubernetes_pod_operator
# ...
splitter = kubernetes_pod_operator.KubernetesPodOperator(
    task_id='splitter',
    name='splitter',
    namespace='default',
    image='europe-west1-docker.pkg.dev/redacted/splitter:2.3',
    cmds=["dotnet", "splitter.dll"],
)
The application code in all the pods that I run needs to read from and write to the /home/airflow/gcs/data directory. But when I run the DAG my application code is unable to access the directory. Likely this is because Composer has mapped the directory into the worker pods but does not extend this courtesy to my pods.
What do I need to do to give my pods r/w access to the /home/airflow/gcs/data directory?
Cloud Composer uses FUSE to mount certain directories from Cloud Storage into the Airflow worker pods running in Kubernetes. It mounts these with default permissions that cannot be overwritten, because that metadata is not tracked by Google Cloud Storage. One possible solution is to use a bash operator at the beginning of your DAG to copy the files to a new directory. Another possible solution is to use a non-Google Cloud Storage path, such as a /pod path.
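As an alternative sketch, not part of the answer above: since /home/airflow/gcs/data is just a FUSE view of the data/ folder in the Composer bucket, a pod can bypass the mount and talk to the bucket directly, assuming its image contains gsutil and its service account can access that bucket (bucket and file names are placeholders):
# inside the container started by KubernetesPodOperator:
gsutil cp gs://<composer-bucket>/data/input.csv /tmp/input.csv
# ... process /tmp/input.csv ...
gsutil cp /tmp/output.csv gs://<composer-bucket>/data/output.csv
Anything written back under the data/ prefix shows up for the Airflow workers at /home/airflow/gcs/data.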

Do not see created file in the host directory mounted by hostPath

I have a Kubernetes cluster with 1 master and 1 worker. I deployed a Node.js application and mounted the directory /home/user_name/data into the pod using the hostPath volume type, with the volume's type set to Directory.
The Node.js application is working perfectly, writing data and returning the saved data without any error.
Initially, I also tested by deleting the data folder and applying the deployment again, and I received an error because the volume type is Directory. So it looks like the mount is correctly pointing to the directory on the worker node. But when I try to see the file that should have been created by the Node.js app, I do not see any file in the host directory.
I also checked whether the pod is running on the worker node, and it is.
My understanding of the hostPath volume type is that it is similar to Docker's bind mount.
I have no idea why I cannot see the file that the Node.js app is creating and serving without problems. Any help will be appreciated.
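A way to double-check the situation described above (the pod name and the in-container mount path are placeholders; adjust them to your deployment):
kubectl get pod <pod-name> -o wide                               # shows the node the pod is actually scheduled on
kubectl exec -it <pod-name> -- ls -l <mount-path-in-container>   # lists what the app sees at the mount
ls -l /home/user_name/data                                       # run on that same worker node, not on the master
If the file is visible inside the container but not on the host, the volume is usually not pointing at the directory being inspected (wrong node, wrong path, or a kubelet running inside a VM, as with Minikube).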

How to use binary files to create a cluster on bare metal

I have already downloaded all the Kubernetes binaries.
The directory:
~/vagrant/kubernetes/server/kubernetes/server/bin$ ls
federated-apiserver kubelet
hyperkube kubemark
kube-apiserver kube-proxy
kube-apiserver.docker_tag kube-proxy.docker_tag
kube-apiserver.tar kube-proxy.tar
kube-controller-manager kube-scheduler
kube-controller-manager.docker_tag kube-scheduler.docker_tag
kube-controller-manager.tar kube-scheduler.tar
kubectl
Can I use these binaries directly to create a cluster?
Yes, but unfortunately it is a non-trivial task to start with plain binaries and end up with a fully functional cluster.
To create a cluster, I'd recommend following one of the many supported solutions. If you want to create a cluster without using one of the existing scripts, you can follow the Creating a Custom Cluster from Scratch guide.
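For orientation only, a manual bootstrap with those binaries roughly follows the order below. The flags are a minimal, illustrative subset appropriate to that release vintage (the listing still contains hyperkube and federated-apiserver), not a complete working configuration:
# on the master:
etcd &
kube-apiserver --etcd-servers=http://127.0.0.1:2379 --service-cluster-ip-range=10.0.0.0/16 &
kube-controller-manager --master=http://127.0.0.1:8080 &
kube-scheduler --master=http://127.0.0.1:8080 &
# on each worker node:
kubelet --api-servers=http://<master-ip>:8080 &
kube-proxy --master=http://<master-ip>:8080 &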
I downloaded the tar.gz files (flannel, etcd, Kubernetes) and modified download-release.sh, changing the curl calls so it untars the local files directly. Then I ran kube-up.sh and created a cluster with the downloaded files.