How can I print an Ansible vaulted variable that includes a Kubernetes secret from the CLI?

I have an Ansible group_vars directory with the following file in it:
$ cat inventory/group_vars/env1
...
...
ldap_config: !vault |
$ANSIBLE_VAULT;1.1;AES256
31636161623166323039356163363432336566356165633232643932623133643764343134613064
6563346430393264643432636434356334313065653537300a353431376264333463333238383833
31633664303532356635303336383361386165613431346565373239643431303235323132633331
3561343765383538340a373436653232326632316133623935333739323165303532353830386532
39616232633436333238396139323631633966333635393431373565643339313031393031313836
61306163333539616264353163353535366537356662333833653634393963663838303230386362
31396431636630393439306663313762313531633130326633383164393938363165333866626438
...
...
This Ansible encrypted string has a Kubernetes secret encapsulated within it. A base64 blob that looks something like this:
IyMKIyBIb3N0IERhdGFiYXNlCiMKIyBsb2NhbGhvc3QgaXMgdXNlZCB0byBjb25maWd1cmUgdGhlIGxvb3BiYWNrIGludGVyZmFjZQojIHdoZW4gdGhlIHN5c3RlbSBpcyBib290aW5nLiAgRG8gbm90IGNoYW5nZSB0aGlzIGVudHJ5LgojIwoxMjcuMC4wLjEJbG9jYWxob3N0CjI1NS4yNTUuMjU1LjI1NQlicm9hZGNhc3Rob3N0Cjo6MSAgICAgICAgICAgICBsb2NhbGhvc3QKIyBBZGRlZCBieSBEb2NrZXIgRGVza3RvcAojIFRvIGFsbG93IHRoZSBzYW1lIGt1YmUgY29udGV4dCB0byB3b3JrIG9uIHRoZSBob3N0IGFuZCB0aGUgY29udGFpbmVyOgoxMjcuMC4wLjEga3ViZXJuZXRlcy5kb2NrZXIuaW50ZXJuYWwKIyBFbmQgb2Ygc2VjdGlvbgo=
How can I decrypt this with a single CLI command?

We can use an Ansible ad hoc command to retrieve the variable of interest, ldap_config. To start, we'll use an ad hoc command to print the decrypted contents of the vaulted string:
$ ansible -i "localhost," all \
-m debug \
-a 'msg="{{ ldap_config }}"' \
--vault-password-file=~/.vault_pass.txt \
-e@inventory/group_vars/env1
localhost | SUCCESS => {
"msg": "ABCD......."
Note that we're:
using the debug module and having it print the variable, msg={{ ldap_config }}
giving Ansible the path to the vault password file so it can decrypt encrypted strings
using the notation -e@<...path to file...> to pass the file with the encrypted vault variables
Now we can use Jinja2 filters to do the rest of the parsing:
$ ansible -i "localhost," all \
-m debug \
-a 'msg="{{ ldap_config | b64decode | from_yaml }}"' \
--vault-password-file=~/.vault_pass.txt \
-e@inventory/group_vars/env1
localhost | SUCCESS => {
"msg": {
"apiVersion": "v1",
"bindDN": "uid=readonly,cn=users,cn=accounts,dc=mydom,dc=com",
"bindPassword": "my secret password to ldap",
"ca": "",
"insecure": true,
"kind": "LDAPSyncConfig",
"rfc2307": {
"groupMembershipAttributes": [
"member"
],
"groupNameAttributes": [
"cn"
],
"groupUIDAttribute": "dn",
"groupsQuery": {
"baseDN": "cn=groups,cn=accounts,dc=mydom,dc=com",
"derefAliases": "never",
"filter": "(objectclass=groupOfNames)",
"scope": "sub"
},
"tolerateMemberNotFoundErrors": false,
"tolerateMemberOutOfScopeErrors": false,
"userNameAttributes": [
"uid"
],
"userUIDAttribute": "dn",
"usersQuery": {
"baseDN": "cn=users,cn=accounts,dc=mydom,dc=com",
"derefAliases": "never",
"scope": "sub"
}
},
"url": "ldap://192.168.1.10:389"
}
}
NOTE: The section -a 'msg="{{ ldap_config | b64decode | from_yaml }}"' above does the heavy lifting: b64decode converts the Base64 blob to text, and from_yaml parses that text as YAML.
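To see what the b64decode step does in isolation, here is a minimal shell sketch. It assumes the base64 utility is available (GNU coreutils or macOS); decode_blob and the sample value are hypothetical stand-ins, not the real secret.

```shell
#!/usr/bin/env bash
# decode_blob is a hypothetical helper name, not part of Ansible.
# It decodes the base64 string given as its first argument.
decode_blob() {
  printf '%s' "$1" | base64 --decode
}

# Build a small stand-in blob instead of using a real secret.
sample_blob="$(printf 'kind: LDAPSyncConfig' | base64)"

# Print the decoded text, followed by a newline.
decode_blob "$sample_blob"
echo
```

The decoded text is what from_yaml would then parse into a data structure.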

If you need a one-liner that works with any YAML file (not only inventory files) containing inline vault variables, and you are willing to install a pip package, there is a solution using yq, a YAML processor built on top of jq.
Prerequisite: install yq
pip install yq
Usage
You can get your result with the following command:
yq -r .ldap_config inventory/group_vars/env1 | ansible-vault decrypt
If you need to type your vault password interactively, don't forget to add the relevant option:
yq -r .ldap_config inventory/group_vars/env1 | ansible-vault decrypt --ask-vault-pass
Note: the -r option to yq is required to get a raw result without quotation marks around the value.

Related

Getting just a string from a list

How do I just get 1 output from "labels"?
tried doing -o=jsonpath='{.metadata.labels[0]}' in hopes of getting the first string but that threw an error.
"metadata": {
"labels": {
"beta.kubernetes.io/arch": "amd64",
"beta.kubernetes.io/os": "linux",
"kubernetes.io/arch": "amd64",
"kubernetes.io/hostname": "143.110.156.190",
"kubernetes.io/os": "linux",
"node-role.kubernetes.io/controlplane": "true",
"node-role.kubernetes.io/etcd": "true",
"node-role.kubernetes.io/worker": "true"
},
metadata.labels is a map, not an array, so '{.metadata.labels[0]}' will not work.
One option is to use the shell to fetch the first entry, as follows:
kubectl get ingress -o json | jq '.items[0].metadata.labels' | head -2 |tr -d , |cat - <(echo "}") | jq
Kubectl uses JSONPath expressions to filter on specific fields in the JSON object and format the output:
kubectl get ingress -o=jsonpath='{.items[0].metadata.labels}'
For reference use the following link:
https://kubernetes.io/docs/reference/kubectl/jsonpath/

Partition JSON data using jq and then send Rest query

I have a json file like below
[
{
"field": {
"empID": "sapid",
"location": "India",
}
},
{
"field": {
"empID": "sapid",
"location": "India",
}
},
{
"field": {
"empID": "sapid",
"location": "India",
}
}
{
"field": {
"empID": "sapid",
"location": "India",
}
},
{
"field": {
"empID": "sapid",
"location": "India",
}
}
{
"field": {
"empID": "sapid",
"location": "India",
}
}
.... upto 1 million
]
I have to use this JSON as the input for a REST request. For example:
curl <REST Server URL with temp.json as input> "Content-Type: application/json" -d @temp.json
My server will not accept 1 million JSON objects at a time.
I am looking for an approach that extracts the first 500 objects from the main JSON and sends them in one REST query, then the next 500 objects in a second REST query, and so on.
Can you please suggest how I can achieve this with jq?
There's an intrinsic tradeoff here between space and time efficiency. In the following, the focus is on the latter.
Assuming that each call to curl must send a JSON array, a time-efficient solution can be constructed along the following lines:
< array.json jq -c '
def batch($n): length as $l | range(0; $l; $n) as $i | .[$i: $i+$n];
batch(500)
' | while read -r json
do
echo "$json" | curl -X POST -H "Content-Type: application/json" -d @- ....
done
Here .... signifies additional appropriate curl arguments.
GNU parallel
You might also want to consider using GNU parallel, e.g.:
< array.json jq -c '
def batch($n):
length as $l
| range(0; $l; $n) as $i
| .[$i: $i+$n];
batch(500)
' | parallel --pipe -N1 curl -X POST -H "Content-Type: application/json" -d @- ....
You have not shared any HW information about the system you are running this on. At a minimum, you need some form of multiprocessing to make this faster, instead of running (1000000/500) curl requests one after another.
One way would be to use GNU xargs, which can run several instances of a given process in parallel with the -P flag and can set the number of input lines consumed per invocation with the -L flag.
To start with, you can do something like the following to have each curl invocation take 500 lines and to run 20 such invocations in parallel. So at any given moment, approximately (500 * 20) lines of input are being processed. You can tune the numbers depending on your HW capability on both the host and the server side.
xargs -L 500 -P 20 curl -X POST -H "Content-Type: application/json" http://sample-url -d @- < <(jq -c 'range(0;length;500) as $i | .[$i: $i+500]' json)
Modified jq filter to pack the JSON payload as an array of objects (credit peak's answer). The earlier version jq -c '.[]' json might not work as the individual chunk of lines passed at a time doesn't represent a valid JSON.
Note: Not tested due to performance constraints.
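To get a feel for what -L and -P do before pointing them at a real server, the sketch below uses echo as a harmless stand-in for curl. The function name group_lines is hypothetical. Note that xargs appends the input lines to the command as arguments; it does not feed them to the command's standard input.

```shell
#!/usr/bin/env bash
# group_lines is a hypothetical helper: it runs `echo` once per batch of
# $1 input lines, which shows how xargs -L groups its input.
group_lines() {
  xargs -L "$1" echo
}

# Five input lines, two per invocation: echo runs three times.
printf '%s\n' a b c d e | group_lines 2

# With -P 2, up to two invocations run at once, so the order of the
# output lines may vary; sort makes the result stable for display.
printf '%s\n' a b c d e | xargs -L 2 -P 2 echo | sort
```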
Assuming you have this formatting, splitting can be done by unpacking the array and saving the desired number of objects to separate files, e.g.:
<input.json jq -c '.[]' | split -l500
Which creates xaa with the first 500 objects, xab with the next 500 objects, etc. If you want to repackage the objects in an array, use the -s option to jq, e.g.: jq -s . xaa.
If you want to do this from the shell, you could use jq to split your JSON and pass it to xargs to call curl for each object returned.
jq -c '.[]' temp.json | xargs -I {} curl <REST Server URL with temp.json as input> -H "Content-Type: application/json" -d '{}'
This will send one curl request for each object. However, if you e.g. want to send the first 500 objects in a single curl request, you can specify a subarray in the jq filter. To send all of your JSON objects you will then somehow need to repeat the command, as afaik jq has no built-in way to split the input into chunks of objects.
jq -c '.[0:500]' temp.json | xargs -I {} curl <REST Server URL with temp.json as input> -H "Content-Type: application/json" -d '{}'
jq -c '.[500:1000]' temp.json | xargs -I {} curl <REST Server URL with temp.json as input> -H "Content-Type: application/json" -d '{}'
jq -c '.[1000:1500]' temp.json | xargs -I {} curl <REST Server URL with temp.json as input> -H "Content-Type: application/json" -d '{}'
[...]
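One way to avoid typing each slice by hand is to generate the slice bounds in a loop. This is a minimal sketch, assuming bash; batch_ranges is a hypothetical helper, and the curl line in the comment reuses the placeholder URL from the commands above.

```shell
#!/usr/bin/env bash
# batch_ranges TOTAL SIZE prints one "start end" pair per batch, matching
# jq's half-open slice syntax .[start:end].
batch_ranges() {
  local total=$1 size=$2 start end
  for ((start = 0; start < total; start += size)); do
    end=$((start + size))
    if ((end > total)); then
      end=$total
    fi
    printf '%d %d\n' "$start" "$end"
  done
}

# Example: slice bounds for 1200 objects in batches of 500.
batch_ranges 1200 500

# Possible use with jq and curl (not run here):
#   batch_ranges "$(jq length temp.json)" 500 |
#   while read -r start end; do
#     jq -c ".[$start:$end]" temp.json |
#       curl -X POST -H "Content-Type: application/json" -d @- <REST Server URL>
#   done
```

The commented loop re-reads temp.json for every batch, so it trades speed for simplicity.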

How can I view the config details of the current context in kubectl?

I'd like to see the 'config' details as shown by the command:
kubectl config view
However, this shows the config details of all contexts. How can I filter it (or is there another command) to view the config details of the CURRENT context?
kubectl config view --minify displays only the current context.
Use the command below to get the full config, including certificates:
kubectl config view --minify --flatten
The cloud-native way to do this is to use the JSON output of the command, then filter it with jq:
kubectl config view -o json | jq '. as $o
| ."current-context" as $current_context_name
| $o.contexts[] | select(.name == $current_context_name) as $context
| $o.clusters[] | select(.name == $context.context.cluster) as $cluster
| $o.users[] | select(.name == $context.context.user) as $user
| {"current-context-name": $current_context_name, context: $context, cluster: $cluster, user: $user}'
{
"current-context-name": "docker-for-desktop",
"context": {
"name": "docker-for-desktop",
"context": {
"cluster": "docker-for-desktop-cluster",
"user": "docker-for-desktop"
}
},
"cluster": {
"name": "docker-for-desktop-cluster",
"cluster": {
"server": "https://localhost:6443",
"insecure-skip-tls-verify": true
}
},
"user": {
"name": "docker-for-desktop",
"user": {
"client-certificate-data": "REDACTED",
"client-key-data": "REDACTED"
}
}
}
This answer helped me figure out some of the jq bits.
The equivalent for any context, using bash and kubectl with a little jq:
exec >/tmp/output &&
CONTEXT_NAME=kubernetes-admin@kubernetes \
CONTEXT_CLUSTER=$(kubectl config view -o=jsonpath="{.contexts[?(@.name==\"${CONTEXT_NAME}\")].context.cluster}") \
CONTEXT_USER=$(kubectl config view -o=jsonpath="{.contexts[?(@.name==\"${CONTEXT_NAME}\")].context.user}") && \
echo "[" && \
kubectl config view -o=json | jq -j --arg CONTEXT_NAME "$CONTEXT_NAME" '.contexts[] | select(.name==$CONTEXT_NAME)' && \
echo "," && \
kubectl config view -o=json | jq -j --arg CONTEXT_CLUSTER "$CONTEXT_CLUSTER" '.clusters[] | select(.name==$CONTEXT_CLUSTER)' && \
echo "," && \
kubectl config view -o=json | jq -j --arg CONTEXT_USER "$CONTEXT_USER" '.users[] | select(.name==$CONTEXT_USER)' && \
echo -e "\n]\n" && \
exec >/dev/tty && \
cat /tmp/output | jq && \
rm -rf /tmp/output
You can use the command kubectl config view --minify to get only the current context.
It is handy to use --help to see the options available for kubectl operations.
kubectl config view --help

Getting value from json parameter doesn't work (jq: 1 compile error)

I am using awscli and trying to get the value of IpAddress from the output of my query.
I tried to use jq but I get a compile error.
This is the case:
output="$(aws efs describe-mount-targets --file-system-id fs-089b5e31)"
echo $output
{ "MountTargets": [ { "MountTargetId": "fsmt-bb29e666", "IpAddress": "172.20.33.255", "OwnerId": "668225551666", "SubnetId": "subnet-0b61377039d31e666", "NetworkInterfaceId": "eni-045f6ea1376662bdf", "FileSystemId": "fs-089b5e66", "LifeCycleState": "available" } ] }
And this is the command I am using to get the IpAddress:
echo array | jq '.[]MountTarget[]s.IpAddress'
The error I get is this:
parse error: Invalid numeric literal at line 2, column 0
ubuntu@ip-10-10-16-245:~/infra-devops/kops/vector$ echo array | jq '.[]MountTarget[]s.IpAddress'
jq: error: syntax error, unexpected IDENT, expecting $end (Unix shell quoting issues?) at <top-level>, line 1:
.[]MountTarget[]s.IpAddress
jq: 1 compile error
Is my query the problem, or should I use sed instead?
Your syntax for accessing the array is wrong. To get the IP address, use this:
aws efs describe-mount-targets --file-system-id fs-089b5e31 |
jq '.MountTargets[0].IpAddress'
MountTargets is an array from which you want the first object.
If you need raw data (without double quotes), use the -r option of jq.

MongoDB to BigQuery

What is the best way to export data from MongoDB hosted in mlab to google bigquery?
Initially, I am trying to do one time load from MongoDB to BigQuery and later on I am thinking of using Pub/Sub for real time data flow to bigquery.
I need help with first one time load from mongodb to bigquery.
In my opinion, the best practice is building your own extractor. That can be done with the language of your choice and you can extract to CSV or JSON.
But if you are looking for a fast way, and your data is not huge and can fit on one server, then I recommend using mongoexport. Let's assume you have a simple document structure such as the one below:
{
"_id" : "tdfMXH0En5of2rZXSQ2wpzVhZ",
"statuses" : [
{
"status" : "dc9e5511-466c-4146-888a-574918cc2534",
"score" : 53.24388894
}
],
"stored_at" : ISODate("2017-04-12T07:04:23.545Z")
}
Then you need to define your BigQuery Schema (mongodb_schema.json) such as:
$ cat > mongodb_schema.json <<EOF
[
{ "name":"_id", "type": "STRING" },
{ "name":"stored_at", "type": "record", "fields": [
{ "name":"date", "type": "STRING" }
]},
{ "name":"statuses", "type": "record", "mode": "repeated", "fields": [
{ "name":"status", "type": "STRING" },
{ "name":"score", "type": "FLOAT" }
]}
]
EOF
Now, the fun part starts :-) Extracting data as JSON from your MongoDB. Let's assume you have a cluster with replica set name statuses, your db is sample, and your collection is status.
mongoexport \
--host statuses/db-01:27017,db-02:27017,db-03:27017 \
-vv \
--db "sample" \
--collection "status" \
--type "json" \
--limit 100000 \
--out ~/sample.json
As you can see above, I limit the output to 100k records, because I recommend exporting a sample and loading it into BigQuery before doing the same for all your data. After running the above command you should have your sample data in sample.json, BUT it contains a field named $date, which will cause an error when loading into BigQuery. To fix that, we can use sed to replace it with a simple field name:
# Fix Date field to make it compatible with BQ
sed -i 's/"\$date"/"date"/g' sample.json
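To check the substitution before running it on the full export, it can be tried on a single line. This is a minimal sketch; the sample document is illustrative only and assumes mongoexport wrote the date as a nested {"$date": ...} object. fix_date_field is a hypothetical name.

```shell
#!/usr/bin/env bash
# fix_date_field is a hypothetical wrapper around the same sed expression,
# reading from stdin instead of editing a file in place.
fix_date_field() {
  sed 's/"\$date"/"date"/g'
}

# Illustrative line in the shape mongoexport is assumed to produce.
sample='{"_id":"abc","stored_at":{"$date":"2017-04-12T07:04:23.545Z"}}'

# Prints the line with the "$date" key renamed to "date".
printf '%s\n' "$sample" | fix_date_field
```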
Now you can compress, upload to Google Cloud Storage (GCS) and then load to BigQuery using following commands:
# Compress for faster load
gzip sample.json
# Move to GCloud
gsutil mv ./sample.json.gz gs://your-bucket/sample/sample.json.gz
# Load to BQ
bq load \
--source_format=NEWLINE_DELIMITED_JSON \
--max_bad_records=999999 \
--ignore_unknown_values=true \
--encoding=UTF-8 \
--replace \
"YOUR_DATASET.mongodb_sample" \
"gs://your-bucket/sample/*.json.gz" \
"mongodb_schema.json"
If everything was okay, then go back and remove --limit 100000 from mongoexport command and re-run above commands again to load everything instead of 100k sample.
ALTERNATIVE SOLUTION:
If you want more flexibility and performance is not your concern, then you can use the mongo CLI tool as well. This way you can write your extract logic in JavaScript, execute it against your data, and then send the output to BigQuery. Here is what I did for the same process, but using JavaScript to output CSV so that it loads into BigQuery more easily:
# Export Logic in JavaScript
cat > export-csv.js <<EOF
var size = 100000;
var maxCount = 1;
for (x = 0; x < maxCount; x = x + 1) {
var recToSkip = x * size;
db.entities.find().skip(recToSkip).limit(size).forEach(function(record) {
var row = record._id + "," + record.stored_at.toISOString();
record.statuses.forEach(function (l) {
print(row + "," + l.status + "," + l.score)
});
});
}
EOF
# Execute on Mongo CLI
_MONGO_HOSTS="db-01:27017,db-02:27017,db-03:27017/sample?replicaSet=statuses"
mongo --quiet \
"${_MONGO_HOSTS}" \
export-csv.js \
| split -l 500000 --filter='gzip > $FILE.csv.gz' - sample_
# Load all split files to Google Cloud Storage
gsutil -m mv ./sample_* gs://your-bucket/sample/
# Load files to BigQuery
bq load \
--source_format=CSV \
--max_bad_records=999999 \
--ignore_unknown_values=true \
--encoding=UTF-8 \
--replace \
"YOUR_DATASET.mongodb_sample" \
"gs://your-bucket/sample/sample_*.csv.gz" \
"ID,StoredDate:DATETIME,Status,Score:FLOAT"
TIP: In the above script I used a small trick: piping the output through split writes it to multiple files with the sample_ prefix. split also gzips each file as it is written, so you can upload them to GCS more easily.
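The split trick can be tried on a few generated lines first. This sketch assumes GNU split (the --filter option is a GNU extension) and gzip; demo_split and the temporary directory are for illustration only.

```shell
#!/usr/bin/env bash
# demo_split is a hypothetical helper: it generates $1 numbered rows, splits
# them into gzip-compressed chunks of $2 rows each, and lists the chunk files.
demo_split() {
  local lines=$1 chunk=$2 workdir
  workdir="$(mktemp -d)" || return 1
  (
    cd "$workdir" || exit 1
    # split sets $FILE to each output name (sample_aa, sample_ab, ...).
    seq 1 "$lines" | split -l "$chunk" --filter='gzip > $FILE.csv.gz' - sample_
    ls
  )
  rm -rf "$workdir"
}

# Five rows, two per chunk: three compressed files are listed.
demo_split 5 2
```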
From a basic reading of MongoDB's documentation, it sounds like you can use mongoexport to dump your database as JSON. Once you've done that, refer to the BigQuery loading data topic for a description of how to create a table from JSON files after copying them to GCS.
You can read data from MongoDB and stream it to BigQuery. You can find an example in NodeJS here.
This is an extension of the linked example that prevents duplicated records (as long as they are still in the streaming buffer):
const { BigQuery } = require('@google-cloud/bigquery');
const bigqueryClient = new BigQuery();
...
const jsonData = [/* array of documents from MongoDB */];
const inputRows = jsonData.map(row => ({
insertId: row._id,
json: row
}));
const insertOptions = {
raw: true
};
await bigqueryClient
.dataset(datasetId)
.table(tableId)
.insert(inputRows, insertOptions);