I am building a processing pipeline for genomic data for my master's thesis, and I am using Argo.
I have a fully functioning processing workflow implemented in Argo Workflows, and now I am trying to create an EventSource that detects when the sequencer writes a new run folder (the folder name should then be passed to the workflow through a Sensor).
The first problem is that the sequencer takes some time to write all the data, so I cannot start the workflow as soon as the base directory is created. The idea is therefore to wait for a specific file inside the new run folder to be created, and then start the workflow.
To simulate this, I am copying an old run folder into the watched directory.
I have implemented the following EventSource, which does not yet listen for the specific file mentioned above but only for the run folder. It works: the event is detected.
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: directory-event-source
  namespace: tesi-fabrici
spec:
  template:
    container:
      volumeMounts:
        - mountPath: /test_dir
          name: test-dir
    volumes:
      - name: test-dir
        nfs:
          server: 10.128.2.231
          path: /tesi_fabrici
  file:
    directoryCreated:
      watchPathConfig:
        directory: "/test_dir/watched_dir/"
        path: "210818_M70903_0027_000000000-JVRB4"
        # pathRegexp: TODO with regex
      eventType: CREATE
Next, I simulated the scenario described above by copying all the data except that one file, and then copying that file last. This is the script I use:
#!/bin/bash
inputDirName=$1
inputDirPath=$2
sampleSheet=$3
outputPath=$4
# Copy the whole run folder except the sample sheet.
rsync -hr --progress "$inputDirPath$inputDirName" "$outputPath" --exclude "$sampleSheet"
# Copy the sample sheet last, so that its creation marks the end of the transfer.
rsync -hr --progress "$inputDirPath${inputDirName}/$sampleSheet" "$outputPath$inputDirName"
And I run it from a pod in the cluster (with the same nfs folder mounted) as below:
./copy_script.sh 210818_M70903_0027_000000000-JVRB4 /external_prod_dir/AREA/MiSeqDx/ SampleSheet.csv /external_test_dir/watched_dir/
The file in question is SampleSheet.csv. I then modified the EventSource as follows, in order to listen for the creation of the sample sheet:
...
...
  file:
    directoryCreated:
      watchPathConfig:
        directory: "/test_dir/watched_dir/"
        path: "210818_M70903_0027_000000000-JVRB4/SampleSheet.csv"
        # pathRegexp: TODO with regex
      eventType: CREATE
The data gets copied correctly, but in this case the EventSource does not detect the creation of SampleSheet.csv.
After some testing, I noticed that the path: field expects a single file or folder name; the EventSource does not work when I give it a nested path, as in my case.
Solving this particular case is easy. I change the EventSource as follows:
...
...
  file:
    directoryCreated:
      watchPathConfig:
        directory: "/test_dir/watched_dir/210818_M70903_0027_000000000-JVRB4/"
        path: "SampleSheet.csv"
        # pathRegexp: TODO with regex
      eventType: CREATE
With this change the creation of the sample sheet is caught, but the event only carries what is written in path:, and I also need the run folder name.
The real problem is that, in production, the run folder names change, although they follow the same pattern as the folder I am using here (210818_M70903_0027_000000000-JVRB4). My plan was therefore to use a regex to capture [path_of_new_run_folder]/SampleSheet.csv, but I don't think I can use a regex in directory:, only in pathRegexp:.
I hope my problem is clear. Please let me know how I can solve this.
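For the regex part of the plan, here is a minimal Python sketch of a pattern for "[run folder]/SampleSheet.csv" that also captures the run folder name. The pattern is an assumption inferred from the single example folder name in this post, and the function name is hypothetical. Whether pathRegexp is ever matched against a nested path like this is exactly the open question, so this only checks the regex itself.

```python
import re

# Assumed pattern, inferred from the one example run folder name in this post
# (210818_M70903_0027_000000000-JVRB4): groups of digits and capital letters
# separated by underscores and a hyphen. Adjust it to the real naming scheme.
RUN_FOLDER = r"\d{6}_[A-Z]\d+_\d{4}_\d{9}-[A-Z0-9]+"
SAMPLE_SHEET = re.compile(rf"^({RUN_FOLDER})/SampleSheet\.csv$")


def run_folder_from_path(relative_path):
    """Return the run folder name if the path is <run folder>/SampleSheet.csv, else None."""
    match = SAMPLE_SHEET.match(relative_path)
    return match.group(1) if match else None
```

Calling run_folder_from_path on a path relative to the watched directory returns the run folder name for a sample sheet and None for any other file, which is the value the Sensor would need to pass on.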
I supply the cloud-init script below through the Azure portal when creating a VM, and the script never runs. I would appreciate it if anyone could suggest what is wrong with my #cloud-config upload.
Observations:
ubuntuVMexscript.sh is written
test.sh is NOT written in the home directory
/etc/cloud/cloud.cfg doesn't show the [scripts-user, always] change in the final modules.
#cloud-config
package_upgrade: true
write_files:
  - owner: afshan:afshan
    path: /var/lib/cloud/scripts/per-boot/ubuntuVMexscript.sh
    permissions: '0755'
    content: |
      #!/bin/sh
      cat > testCat < /var/lib/cloud/scripts/per-boot/ubuntuVMexscript.sh
  - owner: afshan:afshan
    path: /home/afshan/test.sh
    permissions: '0755'
    content: |
      #!/bin/sh
      echo "test"
cloud_final_modules:
  - rightscale_userdata
  - scripts-vendor
  - scripts-per-once
  - scripts-per-boot
  - scripts-per-instance
  - [scripts-user, always]
  - ssh-authkey-fingerprints
  - keys-to-console
  - phone-home
  - final-message
  - power-state-change
write_files runs before any user/group creation. Does the afshan user exist when write_files is being run? If not, attempting to set the owner on the first file will throw an exception, and the write_files module will exit before attempting to create the second file. You can see if this is happening by checking /var/log/cloud-init.log on your instance.
/etc/cloud/cloud.cfg won't get updated by user data. It will stay as-is on disk, but your user data changes will get merged on top of it.
scripts-user refers to scripts written to /var/lib/cloud/instance/scripts. You haven't written anything there, so I'm not sure of the purpose of your [scripts-user, always] change. If you're just looking to run a script on every boot, the scripts-per-boot module (without any changes) should be fine. On every boot, it will run whatever is written to /var/lib/cloud/scripts/per-boot.
Currently I do this:
configMapGenerator:
  - name: sql-config-map
    files:
      - "someDirectory/one.sql"
      - "someDirectory/two.sql"
      - "someDirectory/three.sql"
and I would like to do something like this:
configMapGenerator:
  - name: sql-config-map
    files:
      - "someDirectory/*.sql"
Is this somehow possible?
Nope.
See the discussion around that feature in the comment on "configMapGenerator should allow directories as input".
The main reason:
To move towards explicit dependency declaration, we're moving away from allowing globs in the kustomization file
This command works fine and will edit your kustomization.yaml:
kustomize edit add configmap my-configmap --from-file="$PWD/my-files/*"
The my-files directory has to be in the same folder as the kustomization.yaml file.
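Since a glob is not accepted in the files: list, another option is to generate the explicit entries with a small script and paste them into kustomization.yaml. The following Python sketch is only an illustration: the function name is hypothetical, and it assumes the files sit directly inside one directory next to kustomization.yaml.

```python
from pathlib import Path


def config_map_file_entries(directory, pattern="*.sql"):
    """Return YAML list lines naming every file in `directory` that matches `pattern`."""
    base = Path(directory)
    # Sort the names so the generated list is stable between runs.
    names = sorted(p.name for p in base.glob(pattern) if p.is_file())
    # Entries are written relative to the folder that contains `directory`.
    return [f'- "{base.name}/{name}"' for name in names]
```

Printing the result of config_map_file_entries("someDirectory") from the folder that holds kustomization.yaml gives one quoted entry per SQL file, ready to be placed under files:.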
I have the environment.yml shown below. I would like to read the value of its name key (core-force) and set it as the value of a global variable in my azure-pipeline.yml file. How can I do it?
name: core-force
channels:
- conda-forge
dependencies:
- click
- Sphinx
- sphinx_rtd_theme
- numpy
- pylint
- azure-cosmos
- python=3
- flask
- pytest
- shapely
In my azure-pipeline.yml file I would like to have something like:
variables:
  tag: get the value of the name from the environment.yml aka 'core-force'
Please check this example:
File: vars.yml
variables:
  favoriteVeggie: 'brussels sprouts'
File: azure-pipelines.yml
variables:
- template: vars.yml # Template reference
steps:
- script: echo My favorite vegetable is ${{ variables.favoriteVeggie }}.
Please note that variables are simple strings. If you want to use a list, you may need a workaround in PowerShell at the place where you use a value from that list.
If you don't want to use the template functionality shown above, you need to do the following:
create a separate job/stage
define a step there to read the environment.yml file and set variables using the REST API or Azure CLI
create another job/stage and move your current build definition into it
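As a rough illustration of the "read the file and set a variable" step, here is a Python sketch that uses a different mechanism from the REST API or Azure CLI mentioned above: it prints the Azure Pipelines task.setvariable logging command. The function names are hypothetical, and the parsing is a deliberate simplification that assumes name: is a plain top-level scalar on its own line. A variable set this way is only available to later steps at run time, not inside the variables: block itself.

```python
import re


def read_env_name(text):
    """Return the value of the top-level `name:` key in environment.yml text, or None."""
    # Simplified parsing: only a plain scalar on the same line is supported.
    match = re.search(r"^name:\s*(\S+)\s*$", text, flags=re.MULTILINE)
    return match.group(1).strip("'\"") if match else None


def set_variable_command(variable, value):
    """Build the logging command that sets a pipeline variable for later steps."""
    return f"##vso[task.setvariable variable={variable}]{value}"
```

A script step could read environment.yml, pass its text to read_env_name, and print set_variable_command("tag", name); subsequent steps in the same job would then see the value as $(tag).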
I found this topic on developer community where you can read:
Yaml variables have always been string: string mappings. The doc appears to be currently correct, though we may have had a bug when last you visited.
We are preparing to release a feature in the near future to allow you to pass more complex structures. Stay tuned!
But I don't have more information about this.
Global variables should be stored in a separate template file. This file ideally would be in a separate repo where other repos can refer to this.
Here is another answer for this
I have an Ansible role, for example:
---
- name: Deploy app1
  include: deploy-app1.yml
  when: 'deploy_project == "{{app1}}"'
- name: Deploy app2
  include: deploy-app2.yml
  when: 'deploy_project == "{{app2}}"'
I deploy only one app per role call, so when I deploy several apps I call the role several times. Every time, there is a lot of output for skipped tasks (the tasks whose condition is not met), which I do not want to see. How can I avoid it?
I'm assuming you don't want to see the skipped tasks in the output while running Ansible.
Set this to false in the ansible.cfg file.
display_skipped_hosts = false
Note: it will still output the name of the task, although it will not display "skipped" anymore.
UPDATE: by the way, you need to make sure ansible.cfg is in the current working directory.
Taken from the ansible.cfg file.
ansible will read ANSIBLE_CONFIG,
ansible.cfg in the current working directory, .ansible.cfg in
the home directory or /etc/ansible/ansible.cfg, whichever it
finds first.
So ensure you are setting display_skipped_hosts = false in the right ansible.cfg file.
Let me know how you go
Since Ansible 2.4, there is a callback plugin named full_skip that suppresses both the names of skipped tasks and the "skipping" keyword in the Ansible output. You can try the Ansible configuration below:
[defaults]
stdout_callback = full_skip
Ansible allows you to control its output by using custom callbacks.
In this case you can simply use the skippy callback which will not output anything on a skipped task.
That said, skippy is now deprecated and will be removed in ansible v2.11.
If you don't mind losing colours you can elide the skipped tasks by piping the output through sed:
ansible-playbook whatever.yml | sed -nr '/^TASK/{h;n;/^skipping:/{n;b};H;x};p'
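For readers who find the sed one-liner hard to follow, here is a rough Python equivalent of the same filtering idea. It is a sketch, not a drop-in replacement: the function name is hypothetical, and it assumes the default output layout in which a TASK header is followed directly by its result line and then a blank line.

```python
def drop_skipped_tasks(lines):
    """Drop each TASK header that is immediately followed by a 'skipping:' line,
    together with that line and the line after it; keep everything else."""
    kept = []
    iterator = iter(lines)
    for line in iterator:
        if not line.startswith("TASK"):
            kept.append(line)
            continue
        following = next(iterator, None)
        if following is not None and following.startswith("skipping:"):
            # Also consume the line after 'skipping:' (usually blank), as the sed script does.
            next(iterator, None)
            continue
        kept.append(line)
        if following is not None:
            kept.append(following)
    return kept
```

Like the sed version, it only looks at the first line after each TASK header, so a task that is skipped on one host but runs on another is kept or dropped depending on which host is reported first.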
If you are using roles, you can use when to cancel the include in main.yml
# roles/myrole/tasks/main.yml
- include: somefile.yml
  when: somevar is defined

# roles/myrole/tasks/somefile.yml
- name: this task will only run (and be seen in the output) if somevar is defined
  debug:
    msg: "Hello World"
I'm working on an Ansible playbook that should help generate build agents for a continuous delivery pipeline. Among other things, I need to install an Oracle client on such an agent. I want to do something like:
- name: "Provide response file"
  copy: src=/custom.rsp dest=/opt/oracle
Within the custom.rsp file I've got some variables to be substituted. Normally, one could do it with a separate shell command like this:
- name: "Substitute Vars"
  shell: "sed 's|<PARAMETER>|<VALUE>|g' -i /opt/oracle/custom.rsp"
I don't like it, though. There should be a more convenient way to do this. Can anybody give me a hint?
You want to be using a template rather than copying a static file.
Also, when using the copy or template modules, the dest parameter is a full path AND filename, not just a path. So if you want to end up with a copy of custom.rsp in the directory /opt/oracle then you need to do this:
- name: "Provide response file"
  template: src=/custom.rsp dest=/opt/oracle/custom.rsp
I'm going to extend Bruce's answer with an example:
This is part of my inventory.yaml:
kafka_stage:
  children:
    kafka_with_zookeeper_stage:
    kafka_only_stage:
  vars:
    zookeeper_hosts: "kafka-stage01:2181,kafka-stage02:2181,kafka-stage03:2181"
kafka_with_zookeeper_stage:
  hosts:
    kafka-stage01:
      broker_id: 0
    kafka-stage02:
      broker_id: 1
  vars:
    services:
      kafka:
      zookeeper:
This is part of a configuration file:
# The id of the broker. This must be set to a unique integer for each broker.
broker.id={{ broker_id }}
# {{ zookeeper_hosts }}
advertised.listeners=PLAINTEXT://{{ ansible_host }}:9092
# {{ services }}
This command in a playbook:
- name: Copy to Host
  ansible.builtin.template:
    src: my_configfile.properties
    dest: /tmp/hejsan.properties
Gave me this on the remote host kafka-stage02:
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1
# kafka-stage01:2181,kafka-stage02:2181,kafka-stage03:2181
advertised.listeners=PLAINTEXT://kafka-stage02:9092
# {'kafka': None, 'zookeeper': None}