How to pass a variable between Chef resource blocks of the same recipe

We have a Chef recipe with a couple of resource blocks. The first resource block runs bash, gets the UUID of a logical volume, and stores it in the variable $uuid.
# Get UUID value
bash 'get uuid' do
  cwd "/"
  code <<-EOH
    uuid=$(blkid -o value -s UUID /dev/vg_volgroup/lv_logicalvolume)
  EOH
end
We need to pass the variable $uuid to our second resource block:
# Mount directory, format, update fstab
mount node['mount_dir'] do
  dump 1
  pass 2
  device #{uuid}
  device_type :uuid
  fstype node['fstype']
  options node['options']
  action [:mount, :enable]
end
Unfortunately, this is not working. The value of $uuid is not getting passed into the second resource block.
Is there a more proper way to reference $uuid from within the second resource block? Is what I'm asking even possible?

The UUID is already part of the filesystem2 Ohai data:
filesystem2:
  by_device:
    /dev/md1:
      ...
      uuid: f49a3dc8-a0b6-4e1c-8cd3-926fa7d8ee29
There is no need to run blkid for this; the value can be read straight from the node attributes (for example under node['filesystem2']['by_device'], picking the entry for your device).
However, if you really need to compute something in one resource and use it later, you could declare a uuid variable before the block and populate it from a ruby_block instead (you can also read node attributes inside a ruby_block). Either way you will run into Chef's two-pass model, and that requires further workarounds (such as lazy attributes).
There is also the option of using a helper method, but since the UUID is already part of the Ohai data, I do not see any reason to even try that here.

Related

Pulumi DigitalOcean: use outputs

I want to create some servers on DigitalOcean using Pulumi. I have the following code:
for i in range(0, amount):
    name = f"droplet-{i+1}"
    droplet = digitalocean.Droplet(
        name,
        image=_image,
        region=_region,
        size=_size,
    )
    pulumi.export(f"droplet-ip-{i+1}", droplet.ipv4_address)
This is correctly outputting the IP address of the servers on the console.
However, I would like to use the IP addresses elsewhere in my Python script. Therefore I added the droplets to a list as follows:
droplets = []
for i in range(0, amount):
    name = f"droplet-{i+1}"
    droplet = digitalocean.Droplet(
        name,
        image=_image,
        region=_region,
        size=_size,
    )
    pulumi.export(f"droplet-ip-{i+1}", droplet.ipv4_address)
    droplets.append(droplet)
to then loop over the droplets as follows:
for droplet in droplets:
    print(droplet.ipv4_address)
In the Pulumi output, I see the following:
Diagnostics:
  pulumi:pulumi:Stack (Pulumi_DigitalOcean-dev):
    <pulumi.output.Output object at 0x105086b50>
    <pulumi.output.Output object at 0x1050a5ac0>
I realize that while the droplets are still being created the IP addresses are unknown, but I'm adding the droplets to the list after their creation.
Is there a way to get the IP addresses at some point so they can be used elsewhere in the Python script?
The short answer is that because these values are Outputs, if you want the strings, you'll need to use .apply:
https://www.pulumi.com/docs/intro/concepts/inputs-outputs/#apply
To access the raw value of an output and transform that value into a new value, use apply. This method accepts a callback that will be invoked with the raw value, once that value is available.
You can print these IPs by iterating over the list and calling the apply method on the ipv4_address output value:
...
    pulumi.export(f"droplet-ip-{i+1}", droplet.ipv4_address)
    droplets.append(droplet)
...
for droplet in droplets:
    droplet.ipv4_address.apply(lambda addr: print(addr))
$ pulumi up
...
Diagnostics:
  pulumi:pulumi:Stack (so-71888481-dev):
    143.110.157.64
    137.184.92.205
Outputs:
    droplet-ip-1: "137.184.92.205"
    droplet-ip-2: "143.110.157.64"
Depending on how you plan to use these strings in your program, this particular approach may or may not be ideal, but in general, if you want the unwrapped value of a pulumi.Output, you'll need to use .apply().
The pulumi.Output.all() also comes in handy if you want to wait for several output values to resolve before using them:
https://www.pulumi.com/docs/intro/concepts/inputs-outputs/#all
If you have multiple outputs and need to join them, the all function acts like an apply over many resources. This function joins over an entire list of outputs. It waits for all of them to become available and then provides them to the supplied callback.
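For example, a minimal sketch building on the droplets list from the question (the variable names are assumed from the code above):
import pulumi

# Wait for every droplet's IP to resolve, then use them all together.
all_ips = pulumi.Output.all(*[d.ipv4_address for d in droplets])

# The callback receives a plain list of strings once every IP is known.
all_ips.apply(lambda ips: print(", ".join(ips)))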
Hope that helps!

Azure Data Factory: copy data if a certain file exists

I have many files in a blob container. However, I want to run a stored procedure only IF a certain file (e.g. SRManifest.csv) exists in the blob container. I used Get Metadata and an If Condition in Data Factory. Can you please help me with the dynamic expression for this? I tried this:
@bool(startswith(activity('Get Metadata1').output.childitems.ItemName, 'SRManifest.csv'))
It doesn't work. Then I thought, what if I used
@greaterOrEquals(activity('Get Metadata1').output.LastModified, adddays(utcnow(), -2))
But this checks whether the blob was modified within the last 2 days, not whether the file exists. Thank you.
Please see my diagram below.
I think I have understood your requirement differently.
I wanted to run a Stored procedure only IF a certain file (e.g. SRManifest.csv) exists on the blob Container
1. Change your Get Metadata activity to check for the existence of the sentinel file (SRManifest.csv).
2. Follow it with an If Condition activity whose expression references the Get Metadata output (with the Exists field selected, that is typically @activity('Get Metadata1').output.exists).
3. Put your stored procedure in the True branch of the If Condition activity.
If you also need the file list passed to the stored procedure, you will need a Get Metadata activity with the childItems option inside the True branch.
Based on your diagram, since you are already looping over all the blob names, you can add a Boolean variable to the pipeline and set its default value to false.
Inside the ForEach activity, you only want to attempt to set the variable if its value is still false, and if the blob name is found, set it to true. Since Set Variable cannot be self-referential, do this inside the False branch of an If Condition activity.
This will only attempt to process if the value is false (so the file name has not been found yet), and will do nothing if the value is true. Now set the variable based on your file name (for example, an expression along the lines of @equals(item().name, 'SRManifest.csv'), assuming the ForEach iterates over the Get Metadata childItems).
[NOTE: This value can be hard coded, parameterized, or based on a variable]
When you execute the pipeline, you'll see that the Set Variable activity stops attempting once the value has been set to true.
In the main pipeline, after the ForEach activity has completed, you can use the variable to set the condition of your final If activity. If the blob is never found, it will still be false, so put the Stored Procedure activity inside the True branch.

How to get the id of the run from within a component?

I'm doing some experimentation with Kubeflow Pipelines and I'm interested in retrieving the run id to save along with some metadata about the pipeline execution. Is there any way I can do so from a component like a ContainerOp?
You can use kfp.dsl.EXECUTION_ID_PLACEHOLDER and kfp.dsl.RUN_ID_PLACEHOLDER as arguments for your component. At runtime they will be replaced with the actual values.
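For example, a minimal sketch with the kfp v1 SDK (the component and pipeline names here are made up):
from kfp import dsl
from kfp.components import func_to_container_op

# Hypothetical component that just logs whatever run id it receives.
@func_to_container_op
def log_run_id(run_id: str):
    print(f"Kubeflow run id: {run_id}")

@dsl.pipeline(name='run-id-demo')
def pipeline():
    # The placeholder string is substituted with the real run id at runtime.
    log_run_id(dsl.RUN_ID_PLACEHOLDER)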
I tried to do this using the Python DSL but it seems that isn't possible right now.
The only option that I found is to use the method from this sample code: you basically declare a string containing {{workflow.uid}}, and it is replaced with the actual value at execution time.
You can also do this to get the pod name; it would be {{pod.name}}.
Since Kubeflow Pipelines relies on Argo, you can use Argo variables to get what you want.
For example:
@func_to_container_op
def dummy(run_id, run_name) -> str:
    return run_id, run_name

@dsl.pipeline(
    name='test_pipeline',
)
def test_pipeline():
    dummy('{{workflow.labels.pipeline/runid}}', '{{workflow.annotations.pipelines.kubeflow.org/run_name}}')
You will find that the placeholders will be replaced with the correct run_id and run_name.
For more argo variables: https://github.com/argoproj/argo-workflows/blob/master/docs/variables.md
To know what is recorded in the labels and annotations of a Kubeflow pipeline run, just get the corresponding Workflow from Kubernetes:
kubectl get workflow/XXX -oyaml
Another option: create_run_from_pipeline_func returns a RunPipelineResult, which has a run_id attribute:
client = kfp.Client(host)
result = client.create_run_from_pipeline_func(…)
result.run_id
Your component's container should have an environment variable called HOSTNAME that is set to its unique pod name, from which you derive all necessary metadata.
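For example, from inside the component's code (a sketch; it simply assumes the usual Kubernetes behaviour of setting HOSTNAME to the pod name):
import os

# The pod name uniquely identifies this step's execution.
pod_name = os.environ.get("HOSTNAME")
print(f"Running in pod: {pod_name}")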

How to reference a DAG's execution date inside of a `KubernetesPodOperator`?

I am writing an Airflow DAG to pull data from an API and store it in a database I own. Following best practices outlined in We're All Using Airflow Wrong, I'm writing the DAG as a sequence of KubernetesPodOperators that run pretty simple Python functions as the entry point to the Docker image.
The problem I'm trying to solve is that this DAG should only pull data for the execution_date.
If I was using a PythonOperator (doc), I could use the provide_context argument to make the execution date available to the function. But judging from the KubernetesPodOperator's documentation, it seems that the Kubernetes operator has no argument that does what provide_context does.
My best guess is that you could use the arguments parameter to pass in a date range, and since it's templated, you can reference it like this:
my_pod_operator = KubernetesPodOperator(
    # ... other args here
    arguments=['python', 'my_script.py', '{{ ds }}'],
    # arguments continue
)
And then you'd get the start date like you'd get any other argument provided to a Python file run as a script, by using sys.argv.
Is this the right way of doing it?
Thanks for the help.
Yes, that is the correct way of doing it.
Each Operator has template_fields. All the parameters listed in template_fields can render Jinja2 templates and Airflow macros.
For the KubernetesPodOperator, if you check the docs, you will find:
template_fields = ['cmds', 'arguments', 'env_vars', 'config_file']
which means you can pass '{{ ds }}' to any of the four params listed above.
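Inside the container, the templated value then arrives as an ordinary command-line argument; here is a minimal sketch of what the hypothetical my_script.py could do with it:
import sys

# '{{ ds }}' has already been rendered by Airflow into a date string, e.g. "2020-04-01".
execution_date = sys.argv[1]
print(f"Pulling data for {execution_date}")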

How to get an option's name in a Rundeck JOB?

I have a Rundeck job called "TEST".
I have an option called country.
This option retrieves a list of key/value pairs from a remote URL, such as:
[
  {"name":"FRANCE", "value":"FR"},
  {"name":"ITALY", "value":"IT"},
  {"name":"ALGERIA", "value":"DZ"}
]
I would like to use both the name and the value in a job step.
echo ${option.country.name}
echo ${option.country.value}
But this doesn't work and I'm not able to get the name of the parameter.
Getting the value can be done using ${option.country}.
Is there any trick to get the parameter name?
Just for the record: maybe the best approach is to create a script step that reads the JSON and extracts the name; alternatively, you can use the same string for both name and value, as in this example (of course that is not applicable in all cases).
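A minimal sketch of such a script step in Python (the URL is hypothetical, and it assumes Rundeck's usual behaviour of exposing job options to scripts as RD_OPTION_<NAME> environment variables):
import json
import os
import urllib.request

OPTIONS_URL = "https://example.com/countries.json"  # hypothetical remote URL
selected_value = os.environ.get("RD_OPTION_COUNTRY", "")

# Re-fetch the same JSON the option was populated from and look up the display name.
with urllib.request.urlopen(OPTIONS_URL) as resp:
    countries = json.load(resp)

name = next((c["name"] for c in countries if c["value"] == selected_value), None)
print(f"Selected country: {name} ({selected_value})")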