Use local script as source for Argo workflow - argo-workflows

I have a python script that I'd like to execute on cloud using an Argo workflow.
Currently, I'm alternating between two approaches. The first is copying the source code into the workflow itself (by copy and paste), which is inconvenient and causes issues.
The second is uploading my project directory to an S3 bucket, downloading the source code onto the Argo pod, and then running the commands.
Both methods require some actions to sync the source code after I modify the script.
Is there a way to specify in the Argo workflow where it should take the source code from?
Say, instead of creating a script template that takes the source from a string specified in the .yml file, take it from a local file by specifying a local path?
I'd prefer not to use Git for that.
Also, if possible, I'd prefer a solution that supports attaching additional dependency source files.

If you have something more sophisticated than a simple script in the .yml file, it might be worthwhile to pre-build a Docker image for your workflow and use it in a container template.
In the example below the image is named my-script and the entrypoint my-entrypoint.
Assuming the script is a Python file called script.py, you can have the following files:
workflow.yml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: workflow-name-
spec:
  entrypoint: my-entrypoint
  templates:
  - name: my-entrypoint
    container:
      image: my-script
      command:
      - python3
      args:
      - script.py
script.py
import requests
response = requests.get('https://www.google.com')
print(response.status_code)
requirements.txt
requests
Dockerfile
FROM python:3.11.1-slim
COPY . .
RUN pip3 install -r requirements.txt
CMD python3 script.py
Assuming you can build your image directly into the cluster's Docker daemon (as is possible with minikube), you'd run:
docker build -t my-script .
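For minikube specifically, a common way (a sketch, not the only option) to make the image visible to the cluster without pushing it to a registry is to point your shell at minikube's Docker daemon before building:
eval $(minikube docker-env)
docker build -t my-script .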
This approach also makes your code testable, should you decide to add tests. None of this requires Git, although I'd encourage you to use it for collaboration and versioning. Also, the COPY command in the Dockerfile copies all files in your directory into the image, so any other files you need would be readily available. I would discourage copying actual data into the image this way; use Argo parameters and artifacts for that instead.
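For example, a minimal sketch of pulling a data file in as an input artifact instead of baking it into the image (the bucket, key, secret names and file paths are placeholders):
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: workflow-with-artifact-
spec:
  entrypoint: my-entrypoint
  templates:
  - name: my-entrypoint
    inputs:
      artifacts:
      - name: input-data        # downloaded before the container starts
        path: /tmp/data.csv     # where the file appears inside the container
        s3:
          endpoint: s3.amazonaws.com
          bucket: my-bucket
          key: path/to/data.csv
          accessKeySecret:
            name: my-s3-credentials
            key: accessKey
          secretKeySecret:
            name: my-s3-credentials
            key: secretKey
    container:
      image: my-script
      command:
      - python3
      args:
      - script.py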
Check out https://argoproj.github.io/argo-workflows/workflow-concepts/ for more info


There does not seem to be a good substitute for core.exportVariable in github-script right now

Every time we use core.exportVariable, which, as far as I know, is the canonical way to export a variable in @actions/core and, consequently, in github-script, we get an error like this one:
Warning: The set-env command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/
That link leads to an explanation of environment files, which, well, are files. The problem is that files don't seem to have great support in github-script. There's the @actions/io package, but there's no way to create a file with it.
So is there something I'm missing, or is there effectively no way to create an environment file from inside a github-script step?
You no longer need actions/github-script or any other special API to export an environment variable. According to the Environment Files documentation, you can simply write to the $GITHUB_ENV file directly from the workflow step like this:
steps:
  - name: Set environment variable
    run: echo "{name}={value}" >> $GITHUB_ENV
The step will expose the given environment variable to subsequent steps in the currently executing workflow job.
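If you do still need to set the variable from inside a github-script step, a minimal sketch (assuming the runner's standard GITHUB_ENV environment file; the variable name and value are placeholders) is to append to that file with Node's fs module:
steps:
  - name: Set environment variable from github-script
    uses: actions/github-script@v7
    with:
      script: |
        const fs = require('fs');
        // GITHUB_ENV points at the runner's environment file; each appended
        // "name=value" line becomes an environment variable for later steps.
        fs.appendFileSync(process.env.GITHUB_ENV, 'MY_VAR=some-value\n');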

How to include a PowerShell script file in a GitLab CI YAML file

Currently I have a large Bash script in my GitLab CI YAML file. The example below shows how I group my functions so that I can use them during my CI process.
.test-functions: &test-functions |
  function write_one() { echo "Function 1"; }
  function write_two() { echo "Function 2"; }
  function write_three() { echo "Function 3"; }
.plugin-nuget:
  tags:
    - test-es
    - kubernetes
  image: mygitlab-url.com:4567/containers/dotnet-sdk-2.2:latest
  script:
    - *test-functions
    - write_one
    - write_two
    - write_three
The example below shows how we can include a YAML file inside another one:
include:
  - project: MyGroup/GitlabCiPlugins/Dotnet
    file: plugin-test.yml
    ref: JayCiTest
I would like to do the same thing with my script. Instead of having the script in the same file as my YAML, I would like to include the file, so that my YAML has access to my script's functions. I would also like to use PowerShell instead of Bash if possible.
How can I do this?
Split shell scripts and GitLab CI syntax
GitLab CI has no "include file content into a script block" feature.
The GitLab CI include feature doesn't import YAML anchors.
GitLab CI can't concatenate arrays, so you can't define before_script in one .gitlab-ci.yml file and then extend it in another; you can only overwrite it.
Because of all these limitations you can't easily manage your scripts: split them up, organize them, and do the other nice decomposition work developers like.
There are possible workarounds. You can store your scripts somewhere a GitLab runner can reach them, and then pull them into your current job's environment via source /ci-scripts/my-script.sh in the before_script block (see the sketch after the list below).
Possible locations for storing CI scripts:
A special Docker image with all your build/test/deploy utilities and CI scripts
The same, but on a dedicated build server
A simple web page serving your scripts, which you download and source in before_script. Just in case, make sure nobody except the GitLab runner can access it.
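A minimal sketch of the first option, assuming the functions are baked into a custom build image at /ci-scripts/my-script.sh (the image name, path and job name are placeholders):
test-job:
  image: registry.example.com/my-group/build-tools:latest
  before_script:
    - source /ci-scripts/my-script.sh
  script:
    - write_one
    - write_two
    - write_three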
Using PowerShell
You can use PowerShell only if you have installed your GitLab Runner on Windows; you can't use anything else in that case.

How to run a pytest-bdd test?

I don't understand how to properly run a simple test (a feature file and a Python file)
with the pytest-bdd library.
From the official documentation, I can't understand what command to issue to run a test.
I tried the pytest command, but it reported that no tests ran.
Do I need to use another library, behave, to run a feature file?
After trying for two days, I figured out that
running a pytest-bdd test has certain requirements, at least in my view:
put both the feature file and the Python file in the same directory (maybe this can be changed with configuration files)
the Python file name needs to start with test_
the Python file needs to contain a method whose name starts with test_
the method starting with test_ needs to be bound to the scenario with the @scenario decorator (see the sketch below)
to run the test, issue the pytest command in the same directory (maybe this is also configurable)
After issuing the command you will only see that the method whose name starts with test_ passed, but all the steps actually ran. To check this, you can assert False in any @when- or @then-annotated method and it will throw errors.
My system had pytest-bdd==3.0.2 (copied from pip freeze output).
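A minimal sketch of that layout, with both files side by side (the feature text, file names and step wording are hypothetical):
publish.feature
Feature: Publishing
  Scenario: Publish an article
    Given I have an article
    When I publish the article
    Then the article is published
test_publish.py
from pytest_bdd import scenario, given, when, then

# The test_ method is bound to the scenario; the step functions below
# implement the Given/When/Then lines of the feature file.
@scenario('publish.feature', 'Publish an article')
def test_publish():
    pass

@given('I have an article')
def have_an_article():
    pass

@when('I publish the article')
def publish_the_article():
    pass

@then('the article is published')
def article_is_published():
    assert True  # the real check goes here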
Feature files and Python files can be placed in different folders using the bdd_features_base_dir option provided by pytest-bdd; I think it is better to keep the feature files in a different folder anyway.
Here you can see a working example (a simple hello world BDD test):
https://github.com/davidemoro/pytest-play-docker/tree/master/tests
https://github.com/davidemoro/pytest-play-docker/blob/master/tests/pytest.ini (see bdd_features_base_dir in [pytest] section)
https://github.com/davidemoro/pytest-play-docker/tree/master/tests/bdd
If you want to try out pytest-bdd without installing it you can use Docker. Create a folder with your pytest-bdd files inside (and, if you want, a separate features folder targeted by bdd_features_base_dir) and run:
docker run --rm -it -v $(pwd):/src davidemoro/pytest-play:latest
I've found out that in the Python file you don't have to bind
the method starting with test_ to the scenario with the @scenario decorator.
You can just add scenarios("") to generate the tests for the scenarios that use the steps defined in this specific Python file.
Remember to import scenarios: from pytest_bdd import scenarios
Example:
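A minimal sketch, assuming the feature files live in a features/ folder next to the test module (the folder name is a placeholder):
from pytest_bdd import scenarios

# One test is generated per scenario found under this path; the step
# definitions (@given/@when/@then) must be defined in or imported into this module.
scenarios('features')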
Command:
pytest -v path_to_test_file.py
Things to note here:
Check the format of the feature file: filename.feature
Always add __init__.py to your modules, otherwise the test runner will not find the test files
Glue the right step definitions to the test function
Add the feature file to the features module
If you are using Python 3, execute the tests with python3:
python3 -m pytest -v path_to_test_file.py
Documentation
https://pytest-bdd.readthedocs.io/en/stable/#

How to run jupyter notebook in airflow

My code is written in Jupyter and saved in .ipynb format.
We want to use Airflow to schedule the execution and define the dependencies.
How can the notebooks be executed in Airflow?
I know I can convert them to Python files first, but the graphs generated on the fly will be difficult to handle.
Is there any easier solution? Thanks
You can also use a combination of Airflow + Papermill.
Papermill
Papermill is a tool for running Jupyter notebooks with parameters: https://github.com/nteract/papermill
Running a Jupyter notebook is very easy; you can do it from a Python script:
import papermill as pm

pm.execute_notebook(
    'path/to/input.ipynb',
    'path/to/output.ipynb',
    parameters=dict(alpha=0.6, ratio=0.1)
)
or from CLI:
$ papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1
and it will run a notebook from the input path, create a copy in the output path and update this copy after each cell run.
Airflow Integration
To integrate it with Airflow, there is a dedicated papermill operator for running parametrized notebooks: https://airflow.readthedocs.io/en/latest/howto/operator/papermill.html
You can set up the same input/output/parameters arguments directly in the DAG definition and use templating for the Airflow variables:
run_this = PapermillOperator(
    task_id="run_example_notebook",
    dag=dag,
    input_nb="/tmp/hello_world.ipynb",
    output_nb="/tmp/out-{{ execution_date }}.ipynb",
    parameters={"msgs": "Ran from Airflow at {{ execution_date }}!"}
)
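For context, a minimal sketch of the surrounding DAG definition (the import path assumes the Airflow 2 papermill provider package; older versions ship the operator elsewhere, and the DAG id and schedule here are placeholders):
from datetime import datetime

from airflow import DAG
from airflow.providers.papermill.operators.papermill import PapermillOperator

with DAG(
    dag_id="run_notebook_example",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_this = PapermillOperator(
        task_id="run_example_notebook",
        input_nb="/tmp/hello_world.ipynb",
        output_nb="/tmp/out-{{ execution_date }}.ipynb",
        parameters={"msgs": "Ran from Airflow at {{ execution_date }}!"},
    )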
We encountered this problem before and spent quite a few days solving it.
We packaged it as a Docker image and published it on GitHub: https://github.com/michaelchanwahyan/datalab.
It is done by modifying the open source package nbparameterize and integrating the passing of arguments such as execution_date. Graphs generated on the fly can also be updated and saved inside the notebook.
When it is executed:
the notebook is read and the parameters are injected
the notebook is executed and the output overwrites the original path
Besides that, it also installs and configures common tools such as Spark, Keras, TensorFlow, etc.
Another alternative is to use Ploomber (disclaimer: I'm the author). It uses papermill under the hood to build multi-stage pipelines. Tasks can be notebooks, scripts, functions, or any combination of them. You can run locally, in Airflow, or in Kubernetes (using Argo Workflows).
This is what a pipeline declaration looks like:
tasks:
  - source: notebook.ipynb
    product:
      nb: output.html
      data: output.csv

  - source: another.ipynb
    product:
      nb: another.html
      data: another.csv
Repository
Exporting to Airflow
Exporting to Kubernetes
Sample pipelines

Building Artifactory fails for Build Stage in Delivery Pipeline

I have created a toolchain, which downloads the code from the Bitbucket repository and builds the Docker image in IBM Cloud.
After the image is built, the build stage fails while preparing the build artifacts.
Error:
Preparing the build artifacts...
Customer script does not exist for the job, exitting
I have specified the Build archive directory as the folder name. Do I need to write any scripts for archiving?
That particular error occurs when one of our checks -- the existence of /home/pipeline/$TASK_ID/_customer_script.sh -- fails.
Archiving happens automatically, but that file needs to be present, as we use it as part of the traceability around how the artifact was created. Is it possible that the file is getting removed? (I will also look into removing the check or making it non-fatal, but that will take time.)
This issue appears to be caused by setting a working directory for the job. _customer_script.sh gets dropped into the working directory, but the script Simon is referring to (/opt/IBM/pipeline/bin/ids-buildables-notify.sh) only checks the top-level directory the code input is at (/home/pipeline/$TASK_ID/).
Three options to fix this, assuming you're doing a container registry job:
Run cp _customer_script.sh /home/pipeline/$TASK_ID in your script. The ids-buildables-notify.sh script does some grepping for your bx cr build call, so make sure that's still in there.
touch /home/pipeline/$TASK_ID/_customer_script.sh and export PIPELINE_IMAGE_URL=<your image url>. If PIPELINE_IMAGE_URL is set, the notify script doesn't bother with being clever, which I prefer.
Don't change the working directory.
A script which works for me:
#!/bin/bash
echo -e "Build environment variables:"
echo "REGISTRY_URL=${REGISTRY_URL}"
echo "REGISTRY_NAMESPACE=${REGISTRY_NAMESPACE}"
echo "IMAGE_NAME=${IMAGE_NAME}"
echo "BUILD_NUMBER=${BUILD_NUMBER}"
echo -e "Building container image"
set -x
export PIPELINE_IMAGE_URL=$REGISTRY_URL/$REGISTRY_NAMESPACE/$IMAGE_NAME:$BUILD_NUMBER
bx cr build -t $PIPELINE_IMAGE_URL .
set +x
touch /home/pipeline/$TASK_ID/_customer_script.sh