How do I pass a "blob" to AWS sns publish on the command line?

I am trying to publish a message that includes binary data to an SNS topic from the command line.
How do I pass this binary data in the message attributes, as described in the documentation for --message-attributes?
This is my command:
awslocal sns publish \
--topic-arn ${AWS_TOPIC_ARN} \
--subject="Test Subject" \
--message "Data for today..." \
and my input:
{
    "ProtoData": {
        "DataType": "Binary",
        "BinaryValue": **BLOB**
    }
}
My "BLOB" data is in file://sample_proto_data.db
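One possible direction, offered as a sketch and not as a confirmed answer: generate the message-attributes JSON from the file and base64-encode the binary content. That the CLI accepts base64 text for BinaryValue is an assumption here (it depends on the CLI version and its binary-format setting), and the helper name build_message_attributes is hypothetical.

```python
import base64
import json


def build_message_attributes(blob: bytes, name: str = "ProtoData") -> str:
    """Return JSON text holding one Binary message attribute."""
    attributes = {
        name: {
            "DataType": "Binary",
            # Assumption: the CLI expects the blob as base64 text.
            "BinaryValue": base64.b64encode(blob).decode("ascii"),
        }
    }
    return json.dumps(attributes)


# In-memory stand-in for the bytes read from sample_proto_data.db.
example_blob = b"\x00\x01\xff"
attributes_json = build_message_attributes(example_blob)
print(attributes_json)
```

The printed JSON could be saved to a file and referenced from --message-attributes, if the CLI in use accepts that form.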


How can we filter Insights data for multiple specific campaigns in Facebook Marketing?

I'm trying to get Insights for an ad account, filtered by multiple specific campaigns. So far I have only been able to filter by a single campaign.
Here is the code I tried for a single campaign:
,reach,impressions,clicks,cpc,spend&filtering=[{field: "",operator:"CONTAIN", value: '123456789'}]
You have 2 options:
Build a list of the campaign IDs that match your condition, then use the IN operator instead of the CONTAIN operator, like this:
,reach,impressions,clicks,cpc,spend&filtering=[{field: "",operator:"IN", value: ['id1', 'id2', 'id3']}]
Use a batch request, like this:
curl \
-F 'access_token=<ACCESS_TOKEN>' \
-F 'batch=[ \
{ \
"method": "GET", \
"relative_url": "v12.0/act_YOUR_ACCOUNT_ID/insights?fields=impressions,spend,ad_id,adset_id&filtering=[{field: "",operator:"CONTAIN", value: '123456789'}]" \
}, \
{ \
"method": "GET", \
"relative_url": "v12.0/act_YOUR_ACCOUNT_ID/insights?fields=impressions,spend,ad_id,adset_id&filtering=[{field: "",operator:"CONTAIN", value: '222222222'}]" \
}, \
]' \
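As a rough illustration of the first option, the filtering parameter can be built as JSON and URL-encoded before it is appended to the insights URL. This is only a sketch: the helper name build_insights_query is made up, and the default field name "campaign.id" is an assumption, since the field is blank in the snippets above.

```python
import json
from urllib.parse import urlencode


def build_insights_query(campaign_ids, fields, field_name="campaign.id"):
    """Build a query string that filters insights by several campaign IDs."""
    # Assumption: "campaign.id" is the field to filter on; the original
    # snippets leave the field name empty.
    filtering = [
        {"field": field_name, "operator": "IN", "value": list(campaign_ids)}
    ]
    return urlencode({
        "fields": ",".join(fields),
        "filtering": json.dumps(filtering, separators=(",", ":")),
    })


query = build_insights_query(
    ["id1", "id2", "id3"],
    ["reach", "impressions", "clicks", "cpc", "spend"],
)
print(query)
```

Encoding the filter this way also sidesteps the quoting problems that arise when the JSON is pasted directly into a shell command.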

Get run id after triggering a github workflow dispatch event

I am triggering a workflow run via GitHub's REST API, but GitHub doesn't send any data in the response body (204).
How do I get the run ID of the run that the trigger request created?
I know about the getRunsList API, which returns the runs for a workflow ID so that I can pick the latest one, but this can cause issues when two requests are submitted at almost the same time.
It is currently not possible to get the run_id associated with the dispatch API call from the dispatch response itself, but there is a way to find it if you can edit your workflow file a little.
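You need to dispatch the workflow with an input, like this (the request body also needs a ref; "main" below is a placeholder for your own branch or tag):
curl "https://api.github.com/repos/$OWNER/$REPO/actions/workflows/$WORKFLOW/dispatches" -s \
-H "Authorization: Token $TOKEN" \
-d '{"ref": "main", "inputs": {"id": "12345678"}}'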
Also edit your workflow YAML file to add an optional input (named id here). In addition, make the first job a job with a single step whose name is the value of the id input (this is how we will get the id back through the API):
name: ID Example
on:
  workflow_dispatch:
    inputs:
      id:
        description: 'run identifier'
        required: false
jobs:
  id:
    name: Workflow ID Provider
    runs-on: ubuntu-latest
    steps:
      - name: ${{ github.event.inputs.id }}
        run: echo run identifier ${{ github.event.inputs.id }}
The trick here is to use name: ${{ github.event.inputs.id }}
Then the flow is the following:
run the dispatch API call with the input (named id in this case) set to a random value
in a loop, get the runs created since now minus 5 minutes (the margin avoids timing issues)
the runs API response contains a jobs_url for each run, which you then call
the jobs API call returns the list of jobs; because you declared the id job first, it will be in first position. The response also lists the steps of each job with their names, something like this:
"id": 3840520726,
"run_id": 1321007088,
"run_url": "$OWNER/$REPO/actions/runs/1321007088",
"run_attempt": 1,
"node_id": "CR_kwDOEi1ZxM7k6bIW",
"head_sha": "4687a9bb5090b0aadddb69cc335b7d9e80a1601d",
"url": "$OWNER/$REPO/actions/jobs/3840520726",
"html_url": "$OWNER/$REPO/runs/3840520726",
"status": "completed",
"conclusion": "success",
"started_at": "2021-10-08T15:54:40Z",
"completed_at": "2021-10-08T15:54:43Z",
"name": "Hello world",
"steps": [
"name": "Set up job",
"status": "completed",
"conclusion": "success",
"number": 1,
"started_at": "2021-10-08T17:54:40.000+02:00",
"completed_at": "2021-10-08T17:54:42.000+02:00"
"name": "12345678", <=============== HERE
"status": "completed",
"conclusion": "success",
"number": 2,
"started_at": "2021-10-08T17:54:42.000+02:00",
"completed_at": "2021-10-08T17:54:43.000+02:00"
"name": "Complete job",
"status": "completed",
"conclusion": "success",
"number": 3,
"started_at": "2021-10-08T17:54:43.000+02:00",
"completed_at": "2021-10-08T17:54:43.000+02:00"
"check_run_url": "$OWNER/$REPO/check-runs/3840520726",
"labels": [
"runner_id": 1,
"runner_name": "Hosted Agent",
"runner_group_id": 2,
"runner_group_name": "GitHub Actions"
The name of the id step matches your input value, so you can safely conclude that this is the run that was triggered by your dispatch call.
Here is an implementation of this flow in Python; it returns the workflow run ID:
import datetime
import random
import string
import time

import requests

# edit the following variables
owner = "YOUR_ORG"
repo = "YOUR_REPO"
workflow = "dispatch.yaml"
token = "YOUR_TOKEN"
ref = "main"  # assumption: branch or tag whose workflow file should run

api_url = f"https://api.github.com/repos/{owner}/{repo}"
authHeader = {"Authorization": f"Token {token}"}

# generate a random id
run_identifier = ''.join(random.choices(string.ascii_uppercase + string.digits, k=15))
# filter runs that were created after this date minus 5 minutes
delta_time = datetime.timedelta(minutes=5)
run_date_filter = (datetime.datetime.utcnow() - delta_time).strftime("%Y-%m-%dT%H:%M")

r = requests.post(f"{api_url}/actions/workflows/{workflow}/dispatches",
    headers=authHeader,
    json={
        "ref": ref,
        "inputs": {"id": run_identifier}
    })
print(f"dispatch workflow status: {r.status_code} | workflow identifier: {run_identifier}")

workflow_id = ""
while workflow_id == "":
    r = requests.get(f"{api_url}/actions/runs?created=%3E{run_date_filter}",
        headers=authHeader)
    runs = r.json()["workflow_runs"]
    if len(runs) > 0:
        for run in runs:
            jobs_url = run["jobs_url"]
            print(f"get jobs_url {jobs_url}")
            r = requests.get(jobs_url, headers=authHeader)
            jobs = r.json()["jobs"]
            if len(jobs) > 0:
                # we only take the first job, edit this if you need multiple jobs
                job = jobs[0]
                steps = job["steps"]
                if len(steps) >= 2:
                    # steps[0] is "Set up job"; the identifier step comes right after it
                    second_step = steps[1]
                    if second_step["name"] == run_identifier:
                        workflow_id = job["run_id"]
                        break
                else:
                    print("waiting for steps to be executed...")
            else:
                print("waiting for jobs to popup...")
    else:
        print("waiting for workflows to popup...")
    if workflow_id == "":
        time.sleep(3)  # pause between polls
print(f"workflow_id: {workflow_id}")
Sample output
$ python3
dispatch workflow status: 204 | workflow identifier: Z7YPF6DD1YP2PTM
get jobs_url
get jobs_url
get jobs_url
get jobs_url
waiting for steps to be executed...
get jobs_url
get jobs_url
waiting for steps to be executed...
get jobs_url
get jobs_url
waiting for steps to be executed...
get jobs_url
get jobs_url
get jobs_url
workflow_id: 1321475221
It would have been easier if there were a way to retrieve the workflow inputs via the API, but there is currently no way to do this.
Note that in the workflow file, I use ${{ github.event.inputs.id }} because ${{ inputs.id }} doesn't work. It seems inputs is not evaluated when it is used as the step name.
List the workflows to find the workflow ID (WORKFLOWID):
gh workflow list --repo <repo-name>
Trigger a workflow of type workflow_dispatch:
gh workflow run $WORKFLOWID --repo <repo-name>
This does not return the run ID, which is required to get the status of the execution.
Get the latest run ID (WORKFLOW_RUNID):
gh run list -w $WORKFLOWID --repo <repo> -L 1 --json databaseId | jq '.[]| .databaseId'
Get the workflow run details:
gh run view --repo <repo> $WORKFLOW_RUNID
This is the workaround we use. It is not perfect, but it should work.
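Inspired by the answer above, I wrote a Bash script that gets your $run_id. The workflow file is the same:
name: ID Example
on:
  workflow_dispatch:
    inputs:
      id:
        description: 'run identifier'
        required: false
jobs:
  id:
    name: Workflow ID Provider
    runs-on: ubuntu-latest
    steps:
      - name: ${{ github.event.inputs.id }}
        run: echo run identifier ${{ github.event.inputs.id }}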
The script uses:
workflow_id: a random 8-digit number
date filter: now minus 5 minutes, used to limit the runs that are listed
It then:
generates a random ID
POSTs the dispatch request to trigger the workflow
GETs actions/runs and reads .workflow_runs[].id
keeps looping until a run contains a step named after the random ID from step 1
echoes the run_id
TOKEN=""      # GitHub token
GH_USER=""    # repository owner
REPO=""       # repository name
REF="main"    # assumption: branch or tag whose workflow file should run
API="https://api.github.com/repos/$GH_USER/$REPO"
WORKFLOW_ID=$(tr -dc '0-9' </dev/urandom | head -c 8)
# only look at runs created in the last 5 minutes (UTC, GNU date syntax)
LATER=$(date -u -d "-5 minutes" +"%Y-%m-%dT%H:%M")
JSON=$(printf '{"ref": "%s", "inputs": {"id": "%s"}}' "$REF" "$WORKFLOW_ID")
curl -s \
-X POST \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer $TOKEN" \
"$API/actions/workflows/main.yml/dispatches" \
-d "$JSON"
RUN_ID=""
COUNT=0
ATTEMPTS=10
until [[ -n $RUN_ID ]] || [[ $COUNT -eq $ATTEMPTS ]]; do
    echo "$(( ++COUNT ))..."
    # list the IDs of the recent runs
    RUNS=$(curl -s \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $TOKEN" \
    "$API/actions/runs?created=%3E$LATER" | jq -r '.workflow_runs[].id')
    for ID in $RUNS; do
        # a run matches when one of its step names is exactly the random ID
        if curl -s \
        -H "Accept: application/vnd.github+json" \
        -H "Authorization: Bearer $TOKEN" \
        "$API/actions/runs/$ID/jobs" | jq -r '.jobs[].steps[].name' | grep -qx "$WORKFLOW_ID"; then
            RUN_ID=$ID
            break
        fi
    done
    [[ -n $RUN_ID ]] || sleep 5
done
echo "Your run_id is $RUN_ID"
Your run_id is 67530050
I recommend using the convictional/trigger-workflow-and-wait action:
- name: Example
  uses: convictional/trigger-workflow-and-wait@v1.6.5
  with:
    owner: my-org
    repo: other-repo
    workflow_file_name: other-workflow.yaml
    github_token: ${{ secrets.GH_TOKEN }}
    client_payload: '{"key1": "value1", "key2": "value2"}'
This takes care of waiting for the other job and returning a success or failure according to whether the other workflow succeeded or failed. It does so in a robust way that handles multiple runs being triggered at almost the same time.
The whole point is to know which run was dispatched. When passing an id on dispatch was suggested, the expectation was that this id would appear in the response of the GET call to "actions/runs", so that the right run could be identified and monitored. But the injected id is not part of that response, and having to follow another URL to find it is not helpful, because the id is needed at exactly that point to identify the run.

How can I print an Ansible vaulted variable that includes a Kubernetes secret from the CLI?

I have an Ansible group_vars directory with the following file in it:
$ cat inventory/group_vars/env1
ldap_config: !vault |
This Ansible encrypted string has a Kubernetes secret encapsulated within it, a base64 blob that looks something like this:
How can I decrypt this with a single CLI command?
We can use an Ansible ad hoc command to retrieve the variable of interest, ldap_config. To start, we use this ad hoc command to retrieve the contents of the Ansible-encrypted vault string:
$ ansible -i "localhost," all \
-m debug \
-a 'msg="{{ ldap_config }}"' \
--vault-password-file=~/.vault_pass.txt \
-e@inventory/group_vars/env1
localhost | SUCCESS => {
"msg": "ABCD......."
}
Note that we're:
using the debug module and having it print the variable, msg={{ ldap_config }}
giving ansible the path to the password file used to decrypt encrypted strings
using the notation -e@< ...path to file...> to pass the file with the encrypted vault variables
Now we can use Jinja2 filters to do the rest of the parsing:
$ ansible -i "localhost," all \
-m debug \
-a 'msg="{{ ldap_config | b64decode | from_yaml }}"' \
--vault-password-file=~/.vault_pass.txt \
-e@inventory/group_vars/env1
localhost | SUCCESS => {
"msg": {
"apiVersion": "v1",
"bindDN": "uid=readonly,cn=users,cn=accounts,dc=mydom,dc=com",
"bindPassword": "my secret password to ldap",
"ca": "",
"insecure": true,
"kind": "LDAPSyncConfig",
"rfc2307": {
"groupMembershipAttributes": [
"groupNameAttributes": [
"groupUIDAttribute": "dn",
"groupsQuery": {
"baseDN": "cn=groups,cn=accounts,dc=mydom,dc=com",
"derefAliases": "never",
"filter": "(objectclass=groupOfNames)",
"scope": "sub"
"tolerateMemberNotFoundErrors": false,
"tolerateMemberOutOfScopeErrors": false,
"userNameAttributes": [
"userUIDAttribute": "dn",
"usersQuery": {
"baseDN": "cn=users,cn=accounts,dc=mydom,dc=com",
"derefAliases": "never",
"scope": "sub"
"url": "ldap://"
NOTE: The above section -a 'msg="{{ ldap_config | b64decode | from_yaml }}"' is what's doing the heavy lifting in terms of converting from Base64 to YAML.
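For intuition only, the b64decode half of that filter chain corresponds roughly to the Python below. The sample text is made up for illustration, and the from_yaml half (parsing the decoded text) would need a YAML library, which is left out to keep the sketch dependency-free.

```python
import base64


def decode_secret(encoded: str) -> str:
    """Decode a base64 string back into the text it wraps."""
    return base64.b64decode(encoded).decode("utf-8")


# Hypothetical stand-in for the decrypted value of ldap_config.
sample_yaml = "kind: LDAPSyncConfig\napiVersion: v1\n"
encoded = base64.b64encode(sample_yaml.encode("utf-8")).decode("ascii")

decoded = decode_secret(encoded)
print(decoded)
```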
If you need a one-liner that works with any YAML file (not only in the inventory) containing inlined vault variables, and you are willing to install a pip package for it, there is a solution using yq, a YAML processor built on top of jq.
Prerequisite: install yq
pip install yq
You can get your result with the following command:
yq -r .ldap_config inventory/group_vars/env1 | ansible-vault decrypt
If you need to type your vault password interactively, don't forget to add the relevant option:
yq -r .ldap_config inventory/group_vars/env1 | ansible-vault decrypt --ask-vault-pass
Note: the -r option to yq is mandatory to get a raw result without the quotation marks around the value.

What is the format of the JSON for a Jenkins REST buildWithParameters to override the default parameters values

I am able to build a Jenkins job with its parameters' default values by sending a POST call to
and I can override the default parameters "product", "suites" and "markers" by sending to this URL:
But I saw examples where the parameters can be overridden by sending a JSON body with the new values. I am trying to do that by sending the following JSON bodies. Neither of them works for me.
'product': 'ALL',
'suites': 'ALL',
'markers': 'ALL'
"parameter": [
"name": "product",
"value": "ALL"
"name": "suites",
"value": "ALL"
"name": "markers",
"value": "ALL"
What JSON to send if I want to override the values of parameters "product", "suites" & "markers"?
I'll leave the original question as is and elaborate here on the various API calls that trigger parameterized builds. These are the call options I used.
Additional documentation:
The job contains 3 parameters named: product, suites, markers
Send the parameters as URL query parameters to /buildWithParameters:
Send the parameters as a JSON data payload to /build:
The JSON data payload is not sent as the call's json_body (which is what confused me), but rather in the data payload, as the value of a form field named json:
{
    "parameter": [
        {"name":"product", "value":"123"},
        {"name":"suites", "value":"high"},
        {"name":"markers", "value":"Hello"}
    ]
}
And here are the CURL commands for each of the above calls:
curl -X POST -H "Jenkins-Crumb:2e11fc9...0ed4883a14a" http://jenkins:8080/view/Orion_phase_2/job/test_remote_api_triggerring/build --user "raameeil:228366f31...f655eb82058ad12d" --form json='{"parameter": [{"name":"product", "value":"123"}, {"name":"suites", "value":"high"}, {"name":"markers", "value":"Hello"}]}'
curl -X POST \
'http://jenkins:8080/view/Orion_phase_2/job/test_remote_api_triggerring/buildWithParameters?product=234&suites=333&markers=555' \
-H 'authorization: Basic c2hsb21pb...ODRlNjU1ZWI4MjAyOGFkMTJk' \
-H 'cache-control: no-cache' \
-H 'jenkins-crumb: 0bed4c7...9031c735a' \
-H 'postman-token: 0fb2ef51-...-...-...-6430e9263c3b'
What to send to Python's requests
In order to send the above calls in Python you will need to pass:
headers = jenkins-crumb
auth = tuple of your (user_name, user_auth_token)
data = dictionary type { 'json' : json string of {"parameter":[....]} }
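The two call styles above can be sketched as plain data construction. The helper names build_form_data and build_query_string are hypothetical; the parameter names and values are the ones used in this answer.

```python
import json
from urllib.parse import urlencode


def build_form_data(params: dict) -> dict:
    """Form data for POST .../build: one 'json' field with the parameter list."""
    parameter_list = [{"name": key, "value": value} for key, value in params.items()]
    return {"json": json.dumps({"parameter": parameter_list})}


def build_query_string(params: dict) -> str:
    """Query string for POST .../buildWithParameters."""
    return urlencode(params)


params = {"product": "123", "suites": "high", "markers": "Hello"}
form_data = build_form_data(params)
query = build_query_string(params)
print(form_data["json"])
print(query)
```

With requests, form_data would go in the data argument (not json), alongside the headers and auth values listed above.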
curl -v -X POST http://user:token@host:port/job/my-job/build --data-urlencode json='{"parameter": [{"name":"xx", "value":"xxx"}]}'
or use Python requests:
import json
import requests

url = "http://user:token@host:port/job/my-job/build"
payload = {"parameter": [
    {"name": "xx", "value": "xxx"},
]}
# Jenkins expects the parameters in a form field named "json"
data = {"json": json.dumps(payload)}
rep = requests.post(url, data=data)

MongoDB to BigQuery

What is the best way to export data from MongoDB hosted on mLab to Google BigQuery?
Initially, I am trying to do a one-time load from MongoDB to BigQuery, and later on I am thinking of using Pub/Sub for a real-time data flow to BigQuery.
I need help with the initial one-time load from MongoDB to BigQuery.
In my opinion, the best practice is to build your own extractor. That can be done in the language of your choice, and you can extract to CSV or JSON.
But if you are looking for a fast way, and your data is not huge and fits on one server, then I recommend using mongoexport. Let's assume you have a simple document structure such as the one below:
{
    "_id" : "tdfMXH0En5of2rZXSQ2wpzVhZ",
    "statuses" : [
        {
            "status" : "dc9e5511-466c-4146-888a-574918cc2534",
            "score" : 53.24388894
        }
    ],
    "stored_at" : ISODate("2017-04-12T07:04:23.545Z")
}
Then you need to define your BigQuery schema (mongodb_schema.json), such as:
$ cat > mongodb_schema.json <<EOF
[
    { "name":"_id", "type": "STRING" },
    { "name":"stored_at", "type": "record", "fields": [
        { "name":"date", "type": "STRING" }
    ]},
    { "name":"statuses", "type": "record", "mode": "repeated", "fields": [
        { "name":"status", "type": "STRING" },
        { "name":"score", "type": "FLOAT" }
    ]}
]
EOF
Now, the fun part starts :-) Extracting data as JSON from your MongoDB. Let's assume you have a cluster with replica set name statuses, your db is sample, and your collection is status.
mongoexport \
--host statuses/db-01:27017,db-02:27017,db-03:27017 \
-vv \
--db "sample" \
--collection "status" \
--type "json" \
--limit 100000 \
--out ~/sample.json
As you can see above, I limit the output to 100k records, because I recommend running a sample and loading it into BigQuery before doing the same for all your data. After running the above command you should have your sample data in sample.json, but it contains a field named $date that will cause an error when loading into BigQuery. To fix that, we can use sed to replace it with a plain field name:
# Fix Date field to make it compatible with BQ
sed -i 's/"\$date"/"date"/g' sample.json
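If sed is not available, or a text replacement feels too blunt, the same rename can be sketched in Python. This is an illustrative alternative, not part of the original procedure; the function names are made up, and the sample line only mimics the shape of the mongoexport output.

```python
import json


def rename_date_keys(value):
    """Recursively rename "$date" keys to "date", leaving values untouched."""
    if isinstance(value, dict):
        return {
            ("date" if key == "$date" else key): rename_date_keys(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [rename_date_keys(item) for item in value]
    return value


def fix_line(line: str) -> str:
    """Fix one line of newline-delimited JSON."""
    return json.dumps(rename_date_keys(json.loads(line)))


sample = (
    '{"_id": "abc", "stored_at": {"$date": "2017-04-12T07:04:23.545Z"}, '
    '"statuses": [{"status": "x", "score": 53.2}]}'
)
fixed = fix_line(sample)
print(fixed)
```

Unlike the sed expression, this only touches keys, so a string value that happens to contain "$date" would be left alone.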
Now you can compress, upload to Google Cloud Storage (GCS) and then load to BigQuery using following commands:
# Compress for faster load
gzip sample.json
# Move to GCloud
gsutil mv ./sample.json.gz gs://your-bucket/sample/sample.json.gz
# Load to BQ
bq load \
--source_format=NEWLINE_DELIMITED_JSON \
--max_bad_records=999999 \
--ignore_unknown_values=true \
--encoding=UTF-8 \
--replace \
"YOUR_DATASET.mongodb_sample" \
"gs://your-bucket/sample/*.json.gz" \
"mongodb_schema.json"
If everything went well, go back, remove --limit 100000 from the mongoexport command, and re-run the above commands to load everything instead of the 100k sample.
If you want more flexibility and performance is not a concern, you can use the mongo CLI tool as well. This way you can write your extraction logic in JavaScript, execute it against your data, and send the output to BigQuery. Here is what I did for the same process, using JavaScript to output CSV so that it loads into BigQuery much more easily:
# Export Logic in JavaScript
cat > export-csv.js <<EOF
var size = 100000;
var maxCount = 1;
for (x = 0; x < maxCount; x = x + 1) {
    var recToSkip = x * size;
    db.entities.find().skip(recToSkip).limit(size).forEach(function(record) {
        var row = record._id + "," + record.stored_at.toISOString();
        record.statuses.forEach(function (l) {
            print(row + "," + l.status + "," + l.score)
        });
    });
}
EOF
# Execute on Mongo CLI
mongo --quiet \
export-csv.js \
| split -l 500000 --filter='gzip > $FILE.csv.gz' - sample_
# Load all Splitted Files to Google Cloud Storage
gsutil -m mv ./sample_* gs://your-bucket/sample/
# Load files to BigQuery
bq load \
--source_format=CSV \
--max_bad_records=999999 \
--ignore_unknown_values=true \
--encoding=UTF-8 \
--replace \
"YOUR_DATASET.mongodb_sample" \
"gs://your-bucket/sample/sample_*.csv.gz" \
TIP: In the above script I used a small trick: piping the output to split, so that it is divided into multiple files with the sample_ prefix. During the split the output is also gzipped, so it is easier to load into GCS.
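The flattening that the JavaScript export performs (one CSV row per entry in statuses) can be sketched in Python as follows. This is an illustration with made-up sample data, not a replacement for the script above; the flatten helper is hypothetical.

```python
import csv
import io


def flatten(record: dict):
    """Yield one CSV row per status entry of a document."""
    for entry in record["statuses"]:
        yield [record["_id"], record["stored_at"], entry["status"], entry["score"]]


# Made-up documents with the same shape as the example document.
records = [
    {
        "_id": "abc",
        "stored_at": "2017-04-12T07:04:23.545Z",
        "statuses": [
            {"status": "s1", "score": 53.2},
            {"status": "s2", "score": 10.0},
        ],
    }
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
for record in records:
    writer.writerows(flatten(record))
output = buffer.getvalue()
print(output, end="")
```

One reason to prefer a CSV writer over string concatenation is that it quotes fields containing commas, which would otherwise break the column layout.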
From a basic reading of MongoDB's documentation, it sounds like you can use mongoexport to dump your database as JSON. Once you've done that, refer to the BigQuery loading data topic for a description of how to create a table from JSON files after copying them to GCS.
You can read data from MongoDB and stream it to BigQuery. You can find an example in NodeJS here.
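This is an extension of the linked example that prevents duplicated records (as long as they are still in the streaming buffer):
const { BigQuery } = require('@google-cloud/bigquery');
const bigqueryClient = new BigQuery();
const jsonData = []; // Array of documents from MongoDB
// insertId lets BigQuery de-duplicate rows on a best-effort basis
const inputRows = jsonData.map(row => ({
    insertId: row._id,
    json: row
}));
const insertOptions = {
    raw: true
};
// datasetId and tableId are placeholders for your own dataset and table
await bigqueryClient
    .dataset(datasetId)
    .table(tableId)
    .insert(inputRows, insertOptions);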