How to get the id of the run from within a component? - kubernetes

I'm doing some experimentation with Kubeflow Pipelines and I'm interested in retrieving the run id to save along with some metadata about the pipeline execution. Is there any way I can do so from a component like a ContainerOp?

You can use kfp.dsl.EXECUTION_ID_PLACEHOLDER and kfp.dsl.RUN_ID_PLACEHOLDER as arguments for your component. At runtime they will be replaced with the actual values.

I tried to do this using the Python's DSL but seems that isn't possible right now.
The only option that I found is to use the method that they used in this sample code. You basically declare a string containing {{workflow.uid}}. It will be replaced with the actual value during execution time.
You can also do this in order to get the pod name, it would be {{pod.name}}.

Since kubeflow pipeline relies on argo, you can use argo variable to get what you want.
For example,
#func_to_container_op
def dummy(run_id, run_name) -> str:
return run_id, run_name
#dsl.pipeline(
name='test_pipeline',
)
def test_pipeline():
dummy('{{workflow.labels.pipeline/runid}}', '{{workflow.annotations.pipelines.kubeflow.org/run_name}}')
You will find that the placeholders will be replaced with the correct run_id and run_name.
For more argo variables: https://github.com/argoproj/argo-workflows/blob/master/docs/variables.md
To Know what are recorded in the labels and annotation in the kubeflow pipeline run, just get the corresponding workflow from k8s.
kubectl get workflow/XXX -oyaml

create_run_from_pipeline_func which returns RunPipelineResult, and has run_id attribute
client = kfp.Client(host)
result = client.create_run_from_pipeline_func(…)
result.run_id

Your component's container should have an environment variable called HOSTNAME that is set to its unique pod name, from which you derive all necessary metadata.

Related

How to select object attribute in ADF using variable

I'm trying to parametrize a pipeline in Azure Data Factory in order to enable a certain functionality to mulptiple environments. The idea is that the current environment is always available through a global parameter. I'd like to use this parameter to look up an array of environments to process data to. Example:
targetEnvs = [{ "dev": ["dev"], "test": ["dev", "test"], "acc": [], "prod": ["acc", "prod"] }]
Then one should be able to select the targetEnv array with something like targetEnvs[environment] or targetEnvs.environment. Subsequently a ForEach is used to execute some logic on these target environments.
I tried setting this up with targetEnvs as a pipeline parameter (with default value mapping each env directly to targetEnv, as follows: {"dev": ["dev"], "test": ["test"]}) Then I have a Set variable step to take value from the targetEnvs parameter, as follows:.
I'm now looking for a way to use the current environment (stored in a global parameter) instead of hardcoding "dev" in the Set Variable expression, but I'm not sure how to do this.
.
Using this expression won't even start the pipeline.
.
Question: how do I select this attribute of the object? Any other suggestions on how to do tackle this problem are welcome as well!
(Python analogy would be to have a dictionary target_envs and taking a value from it by using the key "current_env": target_envs[current_env].)
When I tried to access the object same as you, the same error occurred. I have taken the parameter targetEnv (given array) and global parameter environment with value as dev.
You can use the following dynamic content to access the key value.
#pipeline().parameters.targetEnv[0][pipeline().globalParameters.environment]

Using regex to only return some of the Loki Label values

I am setting a Variable in Grafana.
I want to create a Query, that only returns a subset of the labels with value app the ones I want to return are those ending in dev
My Query so far, returns all of the labels with value app successfully. However, I have been unable to successfully filter the values so that only a-dev b-dev and c-dev are returned.
How do I successfully apply regex (or alternative) to this query so that I can see the desired values?
Any help on this would be greatly appreciated!
I eventually figured out what I needed to do. Originally I was trying to use | to run regex on the results from label_values.
However, this format worked:
label_values({app=~".*-dev$"}, app)
and returned only a-dev b-dev c-dev as expected.

KubernetesPodOperator specify dictionary resources

Somehow if I specify resource in my KubernetesPodOperator, the DAG will fail. Looks like the pod is created at least it attempts to create it. The log says Event: XXXXX-e59a4be6 had an event of type Pending.
resource_config = {'limit_memory': 1, 'limit_cpu': 1, 'request_memory': 1, 'request_cpu': 1}
dagA = KubernetesPodOperator(
name="podA", namespace='my-app', task_id="task1", resources=resource_config,
...
If I don't specify the resource, it will run. The resource param is of type dictionary looking at code.
Did anyone have this issue?
I found a solution. Specifying a whole integer values does not work based on the examples here. This resource spec works:
resource_config = {'limit_memory': '1024Mi', 'limit_cpu': '500m'}
So I think that is the correct way for specifying the values.

How to reference a DAG's execution date inside of a `KubernetesPodOperator`?

I am writing an Airflow DAG to pull data from an API and store it in a database I own. Following best practices outlined in We're All Using Airflow Wrong, I'm writing the DAG as a sequence of KubernetesPodOperators that run pretty simple Python functions as the entry point to the Docker image.
The problem I'm trying to solve is that this DAG should only pull data for the execution_date.
If I was using a PythonOperator (doc), I could use the provide_context argument to make the execution date available to the function. But judging from the KubernetesPodOperator's documentation, it seems that the Kubernetes operator has no argument that does what provide_context does.
My best guess is that you could use the arguments command to pass in a date range, and since it's templated, you can reference it like this:
my_pod_operator = KubernetesPodOperator(
# ... other args here
arguments=['python', 'my_script.py', '{{ ds }}'],
# arguments continue
)
And then you'd get the start date like you'd get any other argument provided to a Python file run as a script, by using sys.argv.
Is this the right way of doing it?
Thanks for the help.
Yes, that is the correct way of doing it.
Each Operator would have template_fields. All the parameters listed in template_fields can render Jinja2 templates and Airflow Macros.
For KubernetesPodOperator, if you check docs, you would find:
template_fields = ['cmds', 'arguments', 'env_vars', 'config_file']
which means you can pass '{{ ds }}'to any of the four params listed above.

How to get an option's name in a Rundeck JOB?

I have a JOB rundeck called "TEST"
I have an option called country
this option retreives a list of key, value from a remote URL as :
[
{"name":"FRANCE", "value":"FR"},
{"name":"ITALY", "value":"IT"},
{"name":"ALGERIA", "value":"DZ"}
]
I would like to use both of the name and the value in a job step.
echo ${option.country.name}
echo ${option.country.value}
But this doesn't work and I'm not able to get the name of the parameter
getting the value can be done using ${option.country}
Is there any trick to get the parameter name ???
Just for the record answer: Maybe the best approach is to create some script-step that reads the JSON file and extracts the name, also, you can use the same value name like this example (of course is not applicable for all cases).