Find nested key-value pair in yaml - yq

I'm trying to use yq to find out whether a key-value pair exists in a YAML file.
Here's an example YAML:
houseThings:
  - houseThing:
      thingType: chair
  - houseThing:
      thingType: table
  - houseThing:
      thingType: door
I just want an expression that evaluates to true (or to any value, or exits with status zero) if the key-value pair thingType: door exists in the YAML above.
The best I can do so far is check whether the value exists, by recursively walking all nodes and comparing their values:
yq eval '.. | select(. == "door")' my_file.yaml
This returns door, but I also want to make sure that thingType is its key.

You could apply select under houseThing:
yq e '.houseThings[].houseThing | select(.thingType == "door")' yaml
or search for it recursively:
yq e '.. | select(has("thingType")) | select(.thingType == "door")' yaml
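For comparison, the same recursive search can be sketched in Python on already-parsed data. This is a minimal sketch assuming the YAML has already been loaded into plain dicts and lists (for example with a YAML library); the function name has_pair is hypothetical. Like .. it visits every node, and like select(.thingType == "door") it only accepts a mapping whose thingType key has exactly that value.

```python
from typing import Any


def has_pair(node: Any, key: str, value: Any) -> bool:
    """Return True if any mapping nested anywhere in node has node[key] == value."""
    if isinstance(node, dict):
        # Check this mapping first, then descend into its values.
        if key in node and node[key] == value:
            return True
        return any(has_pair(v, key, value) for v in node.values())
    if isinstance(node, list):
        return any(has_pair(item, key, value) for item in node)
    # Scalars cannot contain a key-value pair.
    return False


# The example document, already parsed into Python objects.
doc = {
    "houseThings": [
        {"houseThing": {"thingType": "chair"}},
        {"houseThing": {"thingType": "table"}},
        {"houseThing": {"thingType": "door"}},
    ]
}

print(has_pair(doc, "thingType", "door"))    # True
print(has_pair(doc, "thingType", "window"))  # False
```

Turning the boolean into an exit status, e.g. sys.exit(0 if has_pair(...) else 1), would cover the "exits with zero status" variant of the requirement.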

Related

Use yq to query the full path in dotted notation

Motivation
I have a simple docker-compose.yaml file which is of this structure
services:
  foo:
    image: docker.registry.url:version
  foo2:
    image: docker.registry.url2:version
  foo3:
    image: docker.registry.url3:version
And I can easily do:
GET
yq '.services.foo.image' docker-compose.yaml
docker.registry.url:version
SET
yq -i '.services.foo.image = "foo"' docker-compose.yaml
Wish
I don't know how many services there will be, but I want to loop over all of them and fix the registry URL whenever an image comes from my registry and needs updating.
Basically, I would like to extract all keys in a form that can be used in a query again, like the SET example in the Motivation section.
yq <<magic command>> docker-compose.yaml
.services.foo.image .services.foo2.image .services.foo3.image
And using these keys I can then loop over it using:
for key in .asdf.asdf. .asdf.; do echo "Some query using $key"; done
What I tried
yq '.services.*.image | path' docker-compose.yaml
- services
- foo
- image
- services
- foo2
- image
- services
- foo3
- image
yq '.services.*.image | path | .[-2]' docker-compose.yaml
foo
foo2
foo3
Maybe some query that merges these components could print each path in a form that can be used in a later query.
You were almost there with the path filter; you just have to convert it to a dotpath:
yq eval '.services.*.image | path | "." + join(".")' docker-compose.yaml
.services.foo.image
.services.foo2.image
.services.foo3.image
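The path | "." + join(".") step simply turns the list of path components into a string. A minimal Python sketch of the same idea follows, assuming the compose file is already parsed into a dict; image_paths and to_dotpath are hypothetical helper names. Plain dot-joining only round-trips for simple keys: a service name containing a dot or a space would need quoting, which this sketch does not attempt.

```python
from typing import Any, Iterator, List


def image_paths(doc: dict) -> Iterator[List[str]]:
    """Yield the path components of every services.<name>.image entry."""
    for name, service in doc.get("services", {}).items():
        if isinstance(service, dict) and "image" in service:
            yield ["services", name, "image"]


def to_dotpath(path: List[Any]) -> str:
    """Join path components the way "." + join(".") does: ["a", "b"] -> ".a.b"."""
    return "." + ".".join(str(p) for p in path)


compose = {
    "services": {
        "foo": {"image": "docker.registry.url:version"},
        "foo2": {"image": "docker.registry.url2:version"},
        "foo3": {"image": "docker.registry.url3:version"},
    }
}

for path in image_paths(compose):
    print(to_dotpath(path))
# .services.foo.image
# .services.foo2.image
# .services.foo3.image
```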
Is the output as expected?
yq '.services | to_entries | .[] | .key + " - " + .value.image' docker-compose.yaml
Output
foo - docker.registry.url:version
foo2 - docker.registry.url2:version
foo3 - docker.registry.url3:version
I'm actually quite happy with:
for key in $(yq '.services.*.image | path | .[-2]' docker-compose.yaml); do echo "$key - $(yq ".services.$key.image" docker-compose.yaml)"; done
But is there a better approach?
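If the end goal is the one stated in the question (rewriting the registry part of every image), one alternative to a shell loop that calls yq once per key is a single pass over the parsed data. A minimal Python sketch follows; the helper name rewrite_registry and both registry prefixes are hypothetical placeholders, and writing the result back to disk (ideally with a round-trip library such as ruamel.yaml, so comments survive) is left out.

```python
def rewrite_registry(doc: dict, old_prefix: str, new_prefix: str) -> list:
    """Replace old_prefix at the start of each services.*.image; return changed service names."""
    changed = []
    for name, service in doc.get("services", {}).items():
        image = service.get("image") if isinstance(service, dict) else None
        # Only touch images that come from the registry being migrated.
        if isinstance(image, str) and image.startswith(old_prefix):
            service["image"] = new_prefix + image[len(old_prefix):]
            changed.append(name)
    return changed


compose = {
    "services": {
        "foo": {"image": "old.registry.example:5000/app:1.0"},
        "foo2": {"image": "docker.io/library/nginx:latest"},
    }
}

print(rewrite_registry(compose, "old.registry.example:5000", "new.registry.example"))
# ['foo']
print(compose["services"]["foo"]["image"])
# new.registry.example/app:1.0
```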

Adding multiple lines to yaml file based on key

I have a sample.yaml file that looks like this:
a:
  b:
    - "x"
    - "y"
    - "z"
and I have another file called toadd.yaml that contains the following
- "first to add"
- "second to add"
I want to modify sample.yaml so that it looks like this:
a:
  b:
    - "x"
    - "y"
    - "z"
    - "first to add"
    - "second to add"
Also, I don't want duplicate entries: if "x" is already in toadd.yaml, I don't want it to be added twice to a.b in sample.yaml.
Please note that I have tried the following:
while read line; do
    yq '.a.b += ['$line']' sample.yaml
done <toadd.yaml
and ran into:
Error: Bad expression, could not find matching `]`
If the files are relatively small, you could directly load the second file into the first. See Merging two files together
yq '.a.b += load("toadd.yaml")' sample.yaml
Tested on mikefarah/yq version 4.25.1
To solve the redundancy requirement, do a unique operation before forming the array again.
yq 'load("toadd.yaml") as $data | .a.b |= ( . + $data | unique )' sample.yaml
which can be further simplified to just
yq '.a.b |= ( . + load("toadd.yaml") | unique )' sample.yaml
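The . + load("toadd.yaml") | unique expression concatenates the two sequences and then drops repeats. A minimal Python sketch of that merge step on already-loaded lists follows; merge_unique is a hypothetical name, and this version keeps the first occurrence of each item in its original position.

```python
from typing import Hashable, List


def merge_unique(existing: List[Hashable], to_add: List[Hashable]) -> List[Hashable]:
    """Append to_add to existing, skipping any item already present."""
    seen = set()
    result = []
    for item in existing + to_add:
        if item not in seen:  # keep only the first occurrence
            seen.add(item)
            result.append(item)
    return result


sample = {"a": {"b": ["x", "y", "z"]}}
toadd = ["first to add", "x", "second to add"]

sample["a"]["b"] = merge_unique(sample["a"]["b"], toadd)
print(sample["a"]["b"])
# ['x', 'y', 'z', 'first to add', 'second to add']
```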

how to explicitly write two references in ruamel.yaml

If I have multiple references and write them to a YAML file using ruamel.yaml from Python, I get:
<<: [*name-name, *help-name]
but instead I would prefer to have
<<: *name-name
<<: *help-name
Is there an option to achieve this while writing to the file?
UPDATE
descriptions:
  - &description-one-ref
    description: >
helptexts:
  - &help-one
    help_text: |
questions:
  - &question-one
    title: "title test"
    reference: "question-one-ref"
    field: "ChoiceField"
    choices:
      - "Yes"
      - "No"
    required: true
    <<: *description-one-ref
    <<: *help-one
    riskvalue_max: 10
    calculations:
      - conditions:
          - comparator: "equal"
            value: "Yes"
        actions:
          - riskvalue: 0
      - conditions:
          - comparator: "equal"
            value: "No"
        actions:
          - riskvalue: 10
Currently I'm reading such a file, modifying specific values in Python and then writing it back. When writing, the merge references come out as a list rather than as shown above.
The workflow is as follows; I'm reading the document via:
import ruamel.yaml

yaml = ruamel.yaml.YAML()
with open('test.yaml') as f:
    data = yaml.load(f)

# change the title of every question
for k in data.keys():
    if k == 'questions':
        q = data.get(k)
        for i in range(0, len(q)):
            q[i]['title'] = "my new title"

# the with statement closes the file; dump() takes the data and the stream
with open('new_file.yaml', 'w') as g:
    yaml.dump(data, g)
No, there is no such option, as it would lead to an invalid YAML file.
The << is a mapping key whose value is interpreted
specially, assuming the parser implements the language-independent
merge key specification. And a mapping key must be unique
according to the YAML specification:
The content of a mapping node is an unordered set of key: value node
pairs, with the restriction that each of the keys is unique.
That ruamel.yaml (< 0.15.75) doesn't throw an error on such a
duplicate key is a bug. On duplicate normal keys, ruamel.yaml
does throw an error. The bug is inherited from PyYAML (which is not
specification conformant, and does not throw an error even on
duplicate normal keys).
However with a little pre- and post-processing what you want to do can
be easily achieved. The trick is to make the YAML valid before parsing
by making the offending duplicate << keys unique (but recognisable)
and then, when writing the YAML back to file, substituting these
unique keys by <<: * again. In the following the first occurrence of
<<: * is replaced by [<<, 0]:, the second by [<<, 1]: etc.
The * needs to be part of the substitution, as there are no anchors in
the document for those aliases.
import sys
import subprocess
import ruamel.yaml

yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
yaml.indent(sequence=4, offset=2)


class DoubleMergeKeyEnabler(object):
    def __init__(self):
        self.pat = '<<: '  # could be at the root level mapping, so no leading space
        self.r_pat = '[<<, {}]: '  # probably not using sequences as keys
        self.pat_nr = -1

    def convert(self, doc):
        while self.pat in doc:
            self.pat_nr += 1
            doc = doc.replace(self.pat, self.r_pat.format(self.pat_nr), 1)
        return doc

    def revert(self, doc):
        while self.pat_nr >= 0:
            doc = doc.replace(self.r_pat.format(self.pat_nr), self.pat, 1)
            self.pat_nr -= 1
        return doc


dmke = DoubleMergeKeyEnabler()

with open('test.yaml') as fp:
    # we don't do this line by line, that would not work well on flow style mappings
    orgdoc = fp.read()
    doc = dmke.convert(orgdoc)

data = yaml.load(doc)
data['questions'][0].anchor.always_dump = True
#######################################
# >>>> do your thing on data here <<< #
#######################################

with open('output.yaml', 'w') as fp:
    yaml.dump(data, fp, transform=dmke.revert)

res = subprocess.check_output(['diff', '-u', 'test.yaml', 'output.yaml']).decode('utf-8')
print('diff says:', res)
which gives:
diff says:
which means the files are the same on round-trip (as long as you don't
change anything before dumping).
Setting preserve_quotes and calling indent() on the YAML instance are necessary to
preserve your superfluous quotes and your indentation, respectively.
Since the anchor question-one has no alias, you need to enable dumping explicitly by
setting always_dump on that attribute to True. If necessary you can recursively
walk over data and set anchor.always_dump = True wherever .anchor.value is not None.
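The recursive walk mentioned above could look roughly like this. It is a sketch that has not been checked against a specific ruamel.yaml release: it assumes, as the answer's data['questions'][0].anchor.always_dump line does, that round-trip loaded nodes expose an anchor object with value and always_dump attributes, and that ruamel.yaml's CommentedMap and CommentedSeq are dict and list subclasses. The function name enable_anchor_dump is hypothetical. Because it uses getattr, objects without an anchor (plain strings, numbers) are simply skipped.

```python
def enable_anchor_dump(node, _seen=None):
    """Set anchor.always_dump = True on every node that carries an anchor name."""
    if _seen is None:
        _seen = set()
    # Aliased nodes are shared objects; visiting each once also guards against cycles.
    if id(node) in _seen:
        return
    _seen.add(id(node))
    anchor = getattr(node, "anchor", None)
    if anchor is not None and getattr(anchor, "value", None) is not None:
        anchor.always_dump = True
    if isinstance(node, dict):
        for value in node.values():
            enable_anchor_dump(value, _seen)
    elif isinstance(node, list):
        for item in node:
            enable_anchor_dump(item, _seen)
```

In the script above it would be called once as enable_anchor_dump(data) right after yaml.load(doc), instead of setting always_dump on a single node by hand.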

Extracting values from a single file

I have a file with multiple lines, but one specific line contains a lot of information with several repeated expressions. I'm trying to extract some specific values. I first tried some commands with sed, for instance, but without success, so I was wondering if you could give me some insights.
Here is a fragment of that single line of the document:
[...]6[&length_range={0.19
[... a lot of more information here in between ...]
0.01},habitat.set.prob={0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01},DLOOP.rate_median=0.04131395026396427,length=
[...]
10[&length_range={0.19
[... a lot of more information here in between ...]
0.01},habitat.set.prob={0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61},DLOOP.rate_median=0.04131395026396427,length=
[...]
My first aim is to extract all the values between the braces after "habitat.set.prob={" and put them on a single line in a text file.
It is also important to extract the number that appears just before the expression "[&length_range=", which in this case is "6" and "10". These numbers are the labels of the sets of numbers after "prob={".
So the set of numbers I want to extract always appears between "habitat.set.prob={" and "},DLOOP.rate_median", while the other number (the label) is always right before "[&length_range="; but what comes before the label is not always the same expression; it is actually a random number.
The goal is to end up with a file with the following structure:
6 0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01
10 0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61
and so on …
What do you think? Is this possible?
I started with this very basic command at least to try to extract the set of numbers, but it didn't work
sed -n "/habitat.set.prob={/,/},DLOOP.rate_median=/ p"
Well... I made some progress.
I was able to get the values at least:
awk '{gsub("habitat.set.prob={","\n");printf"%s",$0}' filename | awk -F'},' '{print $1"}"}' | grep -iv "TREE" > stats.txt
Many thanks in advance.
Cheers,
Luiz
Something like that:
sed -rn '/.*[0-9]+\[&length_range=\{/,/habitat.set.prob=\{/{s/.*\b([0-9]+)\[&length_range.*/\1/p; s/.*habitat.set.prob=\{([^D]+)\},DLOOP.rate.*/\1/p}' habitat
6
0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01
10
0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61
The first part, '/.a./,/.b./', selects the range of lines from pattern a to pattern b, which may span multiple lines. The -n option tells sed not to print lines by default.
In '/.a./,/.b./{s/.c./.d./p; s/.e./.f./p}'
there are two substitution commands inside the curly braces, each with the p (print) flag.
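The same extraction can also be done in one regular expression that pairs each label with its value set. This is a minimal Python sketch, not the answer's sed command: it assumes the label is the full run of digits directly before "[&length_range=" and that every label block contains its own "habitat.set.prob={...},DLOOP.rate_median" before the next label (otherwise the lazy .*? would pair a label with the next block's values). The names PATTERN and extract_pairs are hypothetical, and the sample string is shortened test data, not the real file.

```python
import re
from typing import List, Tuple

# Group 1: digits right before "[&length_range=".
# Group 2: everything between "habitat.set.prob={" and the next "},DLOOP.rate_median".
PATTERN = re.compile(
    r"(\d+)\[&length_range=.*?habitat\.set\.prob=\{([^}]*)\},DLOOP\.rate_median",
    re.DOTALL,
)


def extract_pairs(text: str) -> List[Tuple[str, str]]:
    """Return (label, comma-separated values) pairs in the order they appear."""
    return PATTERN.findall(text)


sample = (
    "(6[&length_range={0.19,0.25},habitat.set.prob={0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01},"
    "DLOOP.rate_median=0.04131395026396427,length=0.5,"
    "10[&length_range={0.19,0.2},habitat.set.prob={0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61},"
    "DLOOP.rate_median=0.04131395026396427,length=0.7"
)

for label, values in extract_pairs(sample):
    print(label, values)
# 6 0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01
# 10 0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61
```

For the real file, the text would come from open(path).read() and each pair would be written as one line of the form label, space, values.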
I'm not sure how much you have dug into this yourself, so I'm not providing the complete answer, but I hope this helps:
For the first part, getting the number (which you call the label): you didn't mention whether it follows any specific pattern, so try this (data is the file containing the actual input). You will need to work on how to capture the whole number and tweak the RE a bit:
sed -n 's/.*\([0-9][0-9]*\).*length_range.*/\1/p' data
For the other part which gives the numericals between habitat and DLOOP:
sed -n 's/.*habitat.set.prob=\(.*\),DLOOP.*/\1/pg' data | tr '{' ' ' | tr '}' ' '
Now, try to take this as a starter and work on your output to get your desired result!
To explain a bit:
In the first section, I am trying to capture the digits between anything (.*) and (.*)length_range. [You can escape the characters [ and & by putting \ in front of them.]
In the second section, I am capturing the text between habitat.set.prob and DLOOP and then using tr to remove the braces.
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string p = "1:2:3:4";  // input string; assumes single-digit values separated by ':'
    int arr[4] = {};       // integer array that receives the parsed digits
    size_t j = 0;
    for (size_t i = 0; i < p.length() && j < 4; i++) {  // walk the string character by character
        if (p[i] == ':') { continue; }  // skip the separators
        arr[j] = p[i] - '0';            // convert the digit character to its integer value
        j++;
    }
    cout << "String={" << arr[0] << " " << arr[1] << " " << arr[2] << " " << arr[3] << "}";  // print the array
    return 0;
}

Convert Ansible variable from Unicode to ASCII

I'm getting the output of a command on the remote system and storing it in a variable. It is then used to fill in a file template which gets placed on the system.
- name: Retrieve Initiator Name
  command: /usr/sbin/iscsi-iname
  register: iscsiname

- name: Setup InitiatorName File
  template: src=initiatorname.iscsi.template dest=/etc/iscsi/initiatorname.iscsi
The initiatorname.iscsi.template file contains:
InitiatorName={{ iscsiname.stdout_lines }}
When I run it however, I get a file with the following:
InitiatorName=[u'iqn.2005-03.org.open-iscsi:2bb08ec8f94']
What I want:
InitiatorName=iqn.2005-03.org.open-iscsi:2bb08ec8f94
What am I doing wrong?
I realize I could write this to the file with an "echo "InitiatorName=$(/usr/sbin/iscsi-iname)" > /etc/iscsi/initiatorname.iscsi" but that seems like an un-Ansible way of doing it.
Thanks in advance.
FWIW, if you really do have an array:
[u'string1', u'string2', u'string3']
And you want your template/whatever result to be NOT:
ABC=[u'string1', u'string2', u'string3']
But you prefer:
ABC=["string1", "string2", "string3"]
Then, this will do the trick:
ABC=["{{ iscsiname.stdout_lines | list | join("\", \"") }}"]
(extra backslashes due to my code being in a string originally.)
Use a filter to avoid unicode strings:
InitiatorName = {{ iscsiname.stdout_lines | to_yaml }}
Ansible Playbook Filters
To avoid PyYAML's 80-character line-width limit, just use the to_json filter instead:
InitiatorName = {{ iscsiname.stdout_lines | to_json }}
In my case, I'd like to create a Python array from a comma-separated list, so a,b,c should become ["a", "b", "c"], but without the u prefix, because I need plain string comparisons (without special characters) against values from WebSphere. Since they don't seem to have the same encoding, the comparison fails. For this reason, I can't simply use var.split(',').
Since the strings contain no special characters, I just use to_json in combination with map('trim'). This also fixes the problem that a, b would become "a", " b".
restartApps = {{ apps.split(',') | map('trim') | list | to_json }}
Since JSON also has arrays, I get the same result as Python would generate, but without the u prefix.
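The difference between a raw list and the JSON-encoded output can be reproduced outside Ansible. This is a minimal Python sketch that mirrors what the pipeline apps.split(',') | map('trim') | list | to_json and the earlier join-based template compute, using only the standard json module; it does not call Ansible itself, and the helper names and input strings are made up. Under Python 3 a list's repr no longer shows the u prefix, but it still uses single quotes, which is one reason a JSON encoder is the safer choice in a template.

```python
import json


def trimmed_list_json(apps: str) -> str:
    """Python equivalent of: apps.split(',') | map('trim') | list | to_json."""
    return json.dumps([item.strip() for item in apps.split(",")])


def quoted_join(lines: list) -> str:
    """Equivalent of the join-based template; unlike json.dumps it does not escape quotes."""
    return '["' + '", "'.join(lines) + '"]'


print(repr(["a", " b"]))                    # ['a', ' b']  (Python repr, single quotes)
print(trimmed_list_json("a, b,c"))          # ["a", "b", "c"]
print(quoted_join(["string1", "string2"]))  # ["string1", "string2"]
```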