Using kfp.dsl.ContainerOp() to run multiple scripts on Kubeflow Pipelines - kubernetes

I have been using the Kubeflow dsl ContainerOp command to run a Python script in a custom container for my Kubeflow pipeline. My configuration looks something like this:
def test_container_op():
    input_path = '/home/jovyan/'
    return dsl.ContainerOp(
        name='test container',
        image="<image name>",
        command=[
            'python', '/home/jovyan/test.py'
        ],
        file_outputs={
            'module-logs': input_path + 'output.log'
        }
    )
Now, I also want to run a bash script called deploy.sh within the same container. I haven't seen examples of that. Is there something like
command = [
    '/bin/bash', '/home/jovyan/deploy.sh',
    'python', '/home/jovyan/test.py'
]
Not sure if it's possible. Would appreciate the help.

A Kubeflow job is just a Kubernetes job, so you are limited to a single command as the Kubernetes job's entrypoint.
However, you can still chain multiple commands into a single sh command:
sh -c "echo 'my first job' && echo 'my second job'"
So your Kubeflow command can be:
command = [
    '/bin/sh', '-c', '/home/jovyan/deploy.sh && python /home/jovyan/test.py'
]
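For reference, here is a minimal sketch of how the chained command could fit into the original ContainerOp (the image name and paths are the placeholders from the question):

from kfp import dsl

def test_container_op():
    input_path = '/home/jovyan/'
    return dsl.ContainerOp(
        name='test container',
        image="<image name>",
        # Run the deploy script first; the Python script only runs if deploy.sh succeeds.
        command=[
            '/bin/sh', '-c',
            '/home/jovyan/deploy.sh && python /home/jovyan/test.py'
        ],
        file_outputs={
            'module-logs': input_path + 'output.log'
        }
    )

Note that with && the Python script only runs if deploy.sh exits successfully; use ; instead if test.py should run regardless.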

Related

DevSpace hook for running tests in container after an update to the container

My ultimate goal is to have tests run automatically any time a container is updated. For example, if I update /api, it should sync the changes between local and the container, and after that it should automatically run the tests.
I'm starting out with Hello World! though per the example:
# DevSpace --version = 5.16.0
version: v1beta11
...
hooks:
  - command: |
      echo Hello World!
    container:
      imageSelector: ${APP-NAME}/${API-DEV}
    events: ["after:initialSync:${API}"]
...
I've tried all of the following and don't get the desired behavior:
stop:sync:${API}
restart:sync:${name}
after:initialSync:${API}
devCommand:after:sync
At best I can just get Hello World! to print on the initial run of devspace dev -b, but nothing happens after I make changes to the files for /api, which causes files to sync.
Suggestions?
You will need a post-sync hook for this, which is separate from the DevSpace lifecycle hooks. You can define it directly under dev.sync, and it looks like this:
dev:
  sync:
    - imageSelector: john/devbackend
      onUpload:
        execRemote:
          onBatch:
            command: bash
            args:
              - -c
              - "echo 'Hello World!' && other commands..."
More information in the docs: https://devspace.sh/cli/docs/configuration/development/file-synchronization#onupload

Is it possible to submit a job to a cluster using an initialization script on Google Dataproc?

I am using Dataproc with 1 job on 1 cluster.
I would like to start my job as soon as the cluster is created. I found that the best way to achieve this is to submit a job using an initialization script like below.
function submit_job() {
  echo "Submitting job..."
  gcloud dataproc jobs submit pyspark ...
}
export -f submit_job

function check_running() {
  echo "checking..."
  gcloud dataproc clusters list --region='asia-northeast1' --filter='clusterName = {{ cluster_name }}' |
    tail -n 1 |
    while read name platform worker_count preemptive_worker_count status others
    do
      if [ "$status" = "RUNNING" ]; then
        return 0
      fi
    done
}
export -f check_running

function after_initialization() {
  local role
  role=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
  if [[ "${role}" == 'Master' ]]; then
    echo "monitoring the cluster..."
    while true; do
      if check_running; then
        submit_job
        break
      fi
      sleep 5
    done
  fi
}
export -f after_initialization

echo "start monitoring..."
bash -c after_initialization & disown -h
Is this possible? When I ran this on Dataproc, the job was not submitted...
Thank you!
Consider using Dataproc Workflow Templates; they are designed for multi-step workflows: creating a cluster, submitting jobs, and deleting the cluster. This is better than init actions because it is a first-class Dataproc feature: there will be a Dataproc job resource, and you can view the job history.
Please consider using Cloud Composer - then you can write a single script that creates the cluster, runs the job, and terminates the cluster.
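To illustrate the Composer suggestion, here is a minimal sketch of such a DAG, assuming the Dataproc operators from apache-airflow-providers-google; the project, region, cluster, and bucket names below are placeholders:

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocCreateClusterOperator,
    DataprocDeleteClusterOperator,
    DataprocSubmitJobOperator,
)
from airflow.utils.dates import days_ago

PROJECT_ID = "my-project"      # placeholder
REGION = "asia-northeast1"
CLUSTER_NAME = "my-cluster"    # placeholder

with DAG("dataproc_create_run_delete", start_date=days_ago(1), schedule_interval=None) as dag:
    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        cluster_config={"worker_config": {"num_instances": 2}},
    )
    submit_job = DataprocSubmitJobOperator(
        task_id="submit_pyspark_job",
        project_id=PROJECT_ID,
        region=REGION,
        job={
            "placement": {"cluster_name": CLUSTER_NAME},
            "pyspark_job": {"main_python_file_uri": "gs://my-bucket/job.py"},  # placeholder
        },
    )
    delete_cluster = DataprocDeleteClusterOperator(
        task_id="delete_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        trigger_rule="all_done",  # delete the cluster even if the job fails
    )
    create_cluster >> submit_job >> delete_cluster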
I found a way.
Put a shell script named await_cluster_and_run_command.sh on GCS. Then, add the following code to the initialization script.
gsutil cp gs://...../await_cluster_and_run_command.sh /usr/local/bin/
chmod 750 /usr/local/bin/await_cluster_and_run_command.sh
nohup /usr/local/bin/await_cluster_and_run_command.sh &>>/var/log/master-post-init.log &
reference: https://github.com/GoogleCloudDataproc/initialization-actions/blob/master/post-init/master-post-init.sh

ECS Task Definition - When overriding ENTRYPOINT, Docker image's CMD is dropped

I have a Docker image built with the following CMD:
# Dockerfile
...
CMD ["nginx", "-g", "daemon off;"]
When my task definition does not include entryPoint or command the task successfully enters a running state.
{
  "containerDefinitions": [
    {
      "image": "<myregistry>/<image>",
      ...
    }
  ]
}
I need to run an agent in some instances of this container, so I am using an entrypoint for this task to run my agent. The problem is when I add an entryPoint parameter to the task definition, the container starts and immediately stops.
This is what I'm doing to add the entryPoint:
{
  "containerDefinitions": [
    {
      "image": "<myregistry>/<image>",
      ...
      "entryPoint": [
        "custom-entry-point.sh"
      ]
    }
  ]
}
And here is the contents of custom-entry-point.sh:
#!/bin/bash
# Start the agent in the background, then hand off to the container's CMD.
/myagent &
echo "CMD is: $@"
exec "$@"
To confirm my suspicion that CMD is dropped, the logs just show:
CMD is:
If I add the CMD array from the Dockerfile to the task definition with the command parameter, it works fine and the task starts:
{
  "containerDefinitions": [
    {
      "image": "<myregistry>/<image>",
      ...
      "entryPoint": [
        "custom-entry-point.sh"
      ],
      "command": [
        "nginx",
        "-g",
        "daemon off;"
      ]
    }
  ]
}
And the logs show the expected:
CMD is: nginx -g daemon off;
I have many Docker images with various iterations of CMD, and I do not want to have to copy these into my task definitions. It seems that adding only an entryPoint to a task definition should not override a Docker image's CMD with an empty value.
Hoping some ECS / fargate experts can help shed some light on a path forward.
Some tips:
Check that your entrypoint script is executable
Use an absolute path to your entrypoint script
Check the logs to see the error - hopefully you have configured the awslogs driver?
Have you successfully run the entrypoint version locally?
Also have a read of this for some useful background:
https://aws.amazon.com/blogs/opensource/demystifying-entrypoint-cmd-docker/
I don't think this has anything to do with ECS. This is how Docker behaves, and there's no way to change it as far as I know.
See https://docs.docker.com/engine/reference/builder/
If CMD is defined from the base image, setting ENTRYPOINT will reset CMD to an empty value. In this scenario, CMD must be defined in the current image to have a value.
This particular snippet only refers to defining a new ENTRYPOINT in the image, but this GitHub discussion confirms the same behavior holds when overriding ENTRYPOINT at runtime.
I had the same problem, with my entrypoint and command attributes being something like sh -c .... I needed to remove sh -c, put the commands in directly, and add #!/bin/sh at the top of my scripts.

How to run a Python function or script somescript.py with the KubernetesPodOperator in Airflow?

I am running the Celery Executor and I'm trying to run a Python script with the KubernetesPodOperator. Below are examples of what I have tried that didn't work. What am I doing wrong?
Running the script
org_node = KubernetesPodOperator(
    namespace='default',
    image="python",
    cmds=["python", "somescript.py" "-c"],
    arguments=["print('HELLO')"],
    labels={"foo": "bar"},
    image_pull_policy="Always",
    name=task,
    task_id=task,
    is_delete_operator_pod=False,
    get_logs=True,
    dag=dag
)
Running function load_users_into_table()
def load_users_into_table(postgres_hook, schema, path):
    gdf = read_csv(path)
    gdf.to_sql('users', con=postgres_hook.get_sqlalchemy_engine(), schema=schema)

org_node = KubernetesPodOperator(
    namespace='default',
    image="python",
    cmds=["python", "somescript.py" "-c"],
    arguments=[load_users_into_table],
    labels={"foo": "bar"},
    image_pull_policy="Always",
    name=task,
    task_id=task,
    is_delete_operator_pod=False,
    get_logs=True,
    dag=dag
)
The script somescript.py must be in the Docker image.
Step-1: Let's create an image (https://docs.docker.com/develop/develop-images/dockerfile_best-practices/).
FROM python:3.8
# copy requirement.txt from local to container
COPY requirements.txt requirements.txt
# install dependencies into container (geopandas, sqlalchemy)
RUN pip install -r requirements.txt
# copy the python script from local to container
COPY somescript.py somescript.py
ENTRYPOINT [ "python", "somescript.py"]
Step-2: Build and push the image to a public Docker repository (https://hub.docker.com).
NB: kubernetes_pod_operator looks for the image in a public Docker repo
# build image
docker build -t my-python-img:latest .
# test if your image works perfectly
docker run my-python-img:latest
# push image.
docker tag my-python-img username/my-python-img
docker push username/my-python-img
docker pull username/my-python-img
Step-3: Let's create the k8s task.
p = KubernetesPodOperator(
    namespace='default',
    image='username/my-python-img:latest',
    labels={'dag-id': dag.dag_id},
    name='airflow-my-image-pod',
    task_id='load-users',
    in_cluster=False,  # False: local, True: cluster
    cluster_context='microk8s',
    config_file='/usr/local/airflow/include/.kube/config',
    is_delete_operator_pod=True,
    get_logs=True,
    dag=dag
)
If you don't understand where the configuration file comes from, look here: https://www.astronomer.io/docs/cloud/stable/develop/kubepodoperator-local.
Finally, I want to mention something important when working with databases (credentials): Kubernetes offers Secrets to protect sensitive information. https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html
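As a rough sketch of that last point (the secret name airflow-secrets and the key postgres_conn below are hypothetical, and the exact import paths depend on your Airflow/provider version):

from airflow.kubernetes.secret import Secret
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

# Expose the key 'postgres_conn' from the (hypothetical) k8s Secret 'airflow-secrets'
# as the environment variable DB_CONN inside the pod.
db_conn = Secret(
    deploy_type='env',
    deploy_target='DB_CONN',
    secret='airflow-secrets',
    key='postgres_conn',
)

p = KubernetesPodOperator(
    namespace='default',
    image='username/my-python-img:latest',
    name='airflow-my-image-pod',
    task_id='load-users',
    secrets=[db_conn],
    get_logs=True,
    dag=dag,
)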
KubernetesPodOperator launches a Kubernetes pod that runs a container as specified in the operator's arguments.
First Example
In the first example, the following happens:
KubernetesPodOperator instructs K8s to launch a pod and prepare to run a container in it using the python image (the image parameter) from hub.docker.com (the default image registry)
ENTRYPOINT of the python image is replaced by ["python", "somescript.py" "-c"] (the cmds parameter)
CMD of the python image is replaced by ["print('HELLO')"] (the arguments parameter)
...
The container is run
So, the complete command that is run in the container is
python somescript.py -c print('HELLO')
Obviously, the official Python image from Docker Hub does not have somescript.py in its working directory. Even if it did, it probably would not have been the one that you wrote. That is why the command fails with something like:
python: can't open file 'somescript.py': [Errno 2] No such file or directory
Second Example
In the second example, pretty much the same happens as in the first example, but the command that is run in the container (again based on the cmd and arguments parameters) is
python somescript.py -c None
(None is the string representation of load_users_into_table()'s return value)
This command fails for the same reasons as in the first example.
How It Could be Done (a Sketch)
You could build a Docker image with somescript.py and all its dependencies. Push the image to an image registry. Specify the image, ENTRYPOINT, and CMD in the corresponding parameters of KubernetesPodOperator.
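Alternatively, if the goal is only to run a short inline snippet (like the print('HELLO') in the first example), the stock python image is enough; a sketch, with task used as a placeholder name as in the question:

org_node = KubernetesPodOperator(
    namespace='default',
    image="python",
    # 'python -c' runs the code passed in arguments, so no script file is needed in the image.
    cmds=["python", "-c"],
    arguments=["print('HELLO')"],
    labels={"foo": "bar"},
    image_pull_policy="Always",
    name=task,
    task_id=task,
    is_delete_operator_pod=False,
    get_logs=True,
    dag=dag
)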

Node - Run last bash command

What is happening:
user starts local react server via any variation of npm [run] start[:mod]
my prestart script runs and kills the local webserver if found
Once pkill node is run, it kills the npm start script as well, so I want to run the starting command again.
My current solution is to do
history 1 | awk '/some-regex/
to get the name of the last command which I can run with
exec('bash -c 'sleep 1 ; pkill node && ${previousCommand}' &')
This is starting to get pretty hacky so I'm thinking there has to be a better way to do this.
My node script so far:
const execSync = require("child_process").execSync;
const exec = require("child_process").exec;
const netcat = execSync('netcat -z 127.0.0.1 3000; echo $?') == 1 ? true : false; // true when :3000 is available #jkr
if (netcat == false) {
  exec(`bash -c 'sleep 1 ; pkill node' &`);
  console.warn('\x1b[32m%s\x1b[0m', `\nKilling all local webservers, please run 'npm start' again.\n`);
}
There seems to be an npm package which does this: kill-port
const kill = require('kill-port')
kill(port, 'tcp').then(console.log).catch(console.log)
Source: https://www.npmjs.com/package/kill-port
I understand this might not answer the question of running the last command, but it should solve OP's problem.