Limit retries by time instead of attempts - rundeck

I have a job that processes items from a queue. When the queue is empty, it finishes in seconds. However, if there are items in the queue, it will sometimes take minutes (or hours) to finish.
I want it to automatically retry after a failure (5 minute wait) UNLESS the last successful job was more than 1 hour ago. If it fails and the last success was more than 1 hour ago, I want it to notify us (via notification plugin)
How can I do that?

You can create a "monitor job" that verifies the status of your "main job" using the Rundeck API "embedded" inside an inline-script step.
This job has an inline-script step that checks the latest status of the main job and calculates the minutes between the latest successful execution and the current time, all via the API; the script needs jq and dateutils. If the condition is met, it disables the schedule on the main job and sends a notification email using the "On Failure" notification.
HelloWorld job (the "Main Job" to be monitored, as you said, with a 5-minute retry delay; I assume it is a scheduled job):
<joblist>
<job>
<defaultTab>nodes</defaultTab>
<description></description>
<executionEnabled>true</executionEnabled>
<id>3fc55b67-b2a1-4e92-b949-7ed464041993</id>
<loglevel>INFO</loglevel>
<name>HelloWorld</name>
<nodeFilterEditable>false</nodeFilterEditable>
<plugins />
<retry delay='5m'>1</retry>
<schedule>
<month month='*' />
<time hour='17' minute='15' seconds='0' />
<weekday day='*' />
<year year='*' />
</schedule>
<scheduleEnabled>true</scheduleEnabled>
<sequence keepgoing='false' strategy='node-first'>
<command>
<exec>echo "hello world"</exec>
</command>
</sequence>
<uuid>3fc55b67-b2a1-4e92-b949-7ed464041993</uuid>
</job>
</joblist>
And the "Monitor" job, this job verifies all conditions on an inline-script using Rundeck API:
<joblist>
<job>
<defaultTab>nodes</defaultTab>
<description></description>
<executionEnabled>true</executionEnabled>
<id>7a4b117c-75cc-4b49-a504-703e27941170</id>
<loglevel>INFO</loglevel>
<name>Monitor</name>
<nodeFilterEditable>false</nodeFilterEditable>
<notification>
<onfailure>
<email attachLog='true' attachLogInFile='true' recipients='devops@example.net' subject='job failure' />
</onfailure>
</notification>
<notifyAvgDurationThreshold />
<plugins />
<schedule>
<month month='*' />
<time hour='*' minute='0/15' seconds='0' />
<weekday day='*' />
<year year='*' />
</schedule>
<scheduleEnabled>true</scheduleEnabled>
<sequence keepgoing='false' strategy='node-first'>
<command>
<description>Monitor the main job</description>
<fileExtension>.sh</fileExtension>
<script><![CDATA[# get the latest execution
latest_execution_status=$(curl -s --location --request GET 'http://localhost:4440/api/38/job/3fc55b67-b2a1-4e92-b949-7ed464041993/executions?max=1' \
--header 'Accept: application/json' \
--header 'X-Rundeck-Auth-Token: 6pSjAdcnNInfJ3Y8PHoC6EN7KGzjecEe' \
--header 'Content-Type: application/json' | jq -r '.executions | .[].status')
# now get the latest successful execution date
latest_succeeded_execution_date=$(curl -s --location --request GET 'http://localhost:4440/api/38/job/3fc55b67-b2a1-4e92-b949-7ed464041993/executions?status=succeeded&max=1' \
--header 'Accept: application/json' \
--header 'X-Rundeck-Auth-Token: 6pSjAdcnNInfJ3Y8PHoC6EN7KGzjecEe' \
--header 'Content-Type: application/json' | jq -r '.executions | .[]."date-ended".date' | sed 's/.$//')
# the current date (to compare with the latest execution date)
current_date=$(date --iso-8601=seconds)
# comparing the two dates using dateutils
time_between_latest_succeeded_and_current_time=$(dateutils.ddiff "$latest_succeeded_execution_date" "$current_date" -f "%M")
# just for debug
echo "LATEST EXECUTION STATE: $latest_execution_status"
echo "LATEST SUCCEEDED EXECUTION DATE: $latest_succeeded_execution_date"
echo "CURRENT DATE: $current_date"
echo "MINUTES SINCE LATEST SUCCEEDED EXECUTION: $time_between_latest_succeeded_and_current_time minutes"
# If it fails and the last success was more than 1 hour ago, we want it to notify us (via notification plugin)
if [ "$latest_execution_status" = "failed" ] && [ "$time_between_latest_succeeded_and_current_time" -gt 60 ]; then
# disable main job schedule
curl -s --location --request POST 'http://localhost:4440/api/38/job/3fc55b67-b2a1-4e92-b949-7ed464041993/schedule/disable' --header 'Accept: application/json' --header 'X-Rundeck-Auth-Token: 6pSjAdcnNInfJ3Y8PHoC6EN7KGzjecEe' --header 'Content-Type: application/json'
# with exit 1 this job fails and the notification "on failure" is triggered
exit 1
else
echo "all ok!"
fi
# all done.]]></script>
<scriptargs />
<scriptinterpreter>/bin/bash</scriptinterpreter>
</command>
</sequence>
<uuid>7a4b117c-75cc-4b49-a504-703e27941170</uuid>
</job>
</joblist>
Here is the script, if you need it separately:
# get the latest execution
latest_execution_status=$(curl -s --location --request GET 'http://localhost:4440/api/38/job/3fc55b67-b2a1-4e92-b949-7ed464041993/executions?max=1' \
--header 'Accept: application/json' \
--header 'X-Rundeck-Auth-Token: 6pSjAdcnNInfJ3Y8PHoC6EN7KGzjecEe' \
--header 'Content-Type: application/json' | jq -r '.executions | .[].status')
# now get the latest successful execution date
latest_succeeded_execution_date=$(curl -s --location --request GET 'http://localhost:4440/api/38/job/3fc55b67-b2a1-4e92-b949-7ed464041993/executions?status=succeeded&max=1' \
--header 'Accept: application/json' \
--header 'X-Rundeck-Auth-Token: 6pSjAdcnNInfJ3Y8PHoC6EN7KGzjecEe' \
--header 'Content-Type: application/json' | jq -r '.executions | .[]."date-ended".date' | sed 's/.$//')
# the current date (to compare with the latest execution date)
current_date=$(date --iso-8601=seconds)
# comparing the two dates using dateutils
time_between_latest_succeeded_and_current_time=$(dateutils.ddiff "$latest_succeeded_execution_date" "$current_date" -f "%M")
# just for debug
echo "LATEST EXECUTION STATE: $latest_execution_status"
echo "LATEST SUCCEEDED EXECUTION DATE: $latest_succeeded_execution_date"
echo "CURRENT DATE: $current_date"
echo "MINUTES SINCE LATEST SUCCEEDED EXECUTION: $time_between_latest_succeeded_and_current_time minutes"
# If it fails and the last success was more than 1 hour ago, we want it to notify us (via notification plugin)
if [ "$latest_execution_status" = "failed" ] && [ "$time_between_latest_succeeded_and_current_time" -gt 60 ]; then
# disable main job schedule
curl -s --location --request POST 'http://localhost:4440/api/38/job/3fc55b67-b2a1-4e92-b949-7ed464041993/schedule/disable' --header 'Accept: application/json' --header 'X-Rundeck-Auth-Token: 6pSjAdcnNInfJ3Y8PHoC6EN7KGzjecEe' --header 'Content-Type: application/json'
# with exit 1 this job fails and the notification "on failure" is triggered
exit 1
else
echo "all ok!"
fi
# all done.
Of course, parameters such as the job ID, host, etc., can be passed in using job options.
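For instance, here is a minimal sketch of how the hard-coded values could come from job options. Rundeck exposes each job option to script steps as an RD_OPTION_<NAME> environment variable; the option names below are assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical option names: RDECK_HOST, RDECK_PORT, MAIN_JOB_ID.
# Rundeck injects job options into script steps as RD_OPTION_<NAME>
# environment variables; fallbacks below match the hard-coded script.
build_exec_url() {
  host="${RD_OPTION_RDECK_HOST:-localhost}"
  port="${RD_OPTION_RDECK_PORT:-4440}"
  job_id="${RD_OPTION_MAIN_JOB_ID:-3fc55b67-b2a1-4e92-b949-7ed464041993}"
  echo "http://${host}:${port}/api/38/job/${job_id}/executions?max=1"
}
# With no options set, this prints the default URL used in the script above
build_exec_url
```

The same approach also keeps the API token out of the script body if you store it in a secure job option.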

Related

How to restore the last on demand snapshot taken via api?

I took a backup of the database using the command below:
CODE=`curl --user "${{ secrets.PUBLIC_KEY }}:${{ secrets.PRIVATE_KEY }}" \
--digest --include \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--request POST "https://cloud.mongodb.com/api/atlas/v1.0/groups/${{ secrets.PROJECT_ID }}/clusters/cluster-1/backup/snapshots?pretty=true" \
--data '{ "description" : "On Demand Snapshot", "retentionInDays" : 3 }'`
Now I am looking for a way to use the above snapshot to restore the database using a curl command.
First, you need to get the ID of the last snapshot, like this:
echo "Get the snapshots list from ${ATLAS_SOURCE_CLUSTER}"
curl --user "${ATLAS_USER}:${ATLAS_KEY}" \
--output atlas-list-snaps.json \
--digest \
--silent \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--request GET "https://cloud.mongodb.com/api/atlas/v1.0/groups/${ATLAS_SOURCE_GROUPID}/clusters/${ATLAS_SOURCE_CLUSTER}/backup/snapshots/shardedClusters?pretty=true"
#cat atlas-list-snaps.json
echo "Get the last snapshot from the list"
LAST_SNAPSHOT_ID=$(cat atlas-list-snaps.json | jq -r '.results[0].id')
echo "> ${LAST_SNAPSHOT_ID}"
Then, create a JSON payload describing your restore job and trigger it:
echo "Restore job payload generation"
RESTORE_JOB_PAYLOAD=$(cat <<RESTORE_JOB_JSON
{
"deliveryType": "automated",
"snapshotId": "${LAST_SNAPSHOT_ID}",
"targetClusterName": "${ATLAS_TARGET_CLUSTER}",
"targetGroupId": "${ATLAS_TARGET_GROUPID}"
}
RESTORE_JOB_JSON
)
echo "${RESTORE_JOB_PAYLOAD}"
echo "Submitting restore job request to cluster ${ATLAS_TARGET_CLUSTER}"
curl --user "${ATLAS_USER}:${ATLAS_KEY}" \
--digest \
--header "Accept: application/json" \
--silent \
--header "Content-Type: application/json" \
--output atlas-restore_job.json \
--request POST "https://cloud.mongodb.com/api/atlas/v1.0/groups/${ATLAS_SOURCE_GROUPID}/clusters/${ATLAS_SOURCE_CLUSTER}/backup/restoreJobs?pretty=true" \
--data "${RESTORE_JOB_PAYLOAD}"
echo "Submitted restore job"
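If you then want to wait for the restore to finish, the submitted job's ID is in the response written to atlas-restore_job.json; here is a sketch, assuming the v1.0 restoreJobs/{id} endpoint, with a made-up sample response standing in for the real one:

```shell
#!/bin/sh
# Hypothetical sample of atlas-restore_job.json; a real response has more fields
cat > atlas-restore_job.json <<'JSON'
{ "id": "5f1a2b3c4d5e6f7a8b9c0d1e", "deliveryType": "automated" }
JSON
# Extract the restore job id from the response
RESTORE_JOB_ID=$(jq -r '.id' atlas-restore_job.json)
echo "Restore job id: ${RESTORE_JOB_ID}"
# Poll it until it completes (commented out; needs real credentials and cluster):
# curl --user "${ATLAS_USER}:${ATLAS_KEY}" --digest --silent \
#   --header "Accept: application/json" \
#   --request GET "https://cloud.mongodb.com/api/atlas/v1.0/groups/${ATLAS_SOURCE_GROUPID}/clusters/${ATLAS_SOURCE_CLUSTER}/backup/restoreJobs/${RESTORE_JOB_ID}?pretty=true"
```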

Getting most recent execution by job via RunDeck API

Is there a more efficient way to get the most recent execution of every job in a RunDeck project than by (1) querying for the job list and then (2) querying for the max: 1 execution list of each job in serial?
The most efficient way to get every job's last execution is to get the job IDs, put them in a list, and then call the executions endpoint for each one, based on this answer.
I made a working bash example using jq; take a look:
#!/bin/sh
# protocol
protocol="http"
# basic rundeck info
rdeck_host="localhost"
rdeck_port="4440"
rdeck_api="40"
rdeck_token="RQvvsGODsP8YhGUw6JARriXOAn6s6OQR"
# project name
project="ProjectEXAMPLE"
# first get all jobs (of ProjectEXAMPLE project)
jobs=$(curl -s --location "$protocol://$rdeck_host:$rdeck_port/api/$rdeck_api/project/$project/jobs" --header "Accept: application/json" --header "X-Rundeck-Auth-Token: $rdeck_token" --header "Content-Type: application/json" | jq -r '.[].id')
# just for debug, print all jobs ids
echo $jobs
# then iterate over the job IDs, extracting the last succeeded execution of each
for z in ${jobs}; do
curl -s --location "$protocol://$rdeck_host:$rdeck_port/api/$rdeck_api/job/$z/executions?status=succeeded&max=1" --header "Accept: application/json" --header "X-Rundeck-Auth-Token: $rdeck_token" --header "Content-Type: application/json" | jq
done
Sadly, there is no direct way to do that in a single call.
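If you want a single JSON document anyway, the loop's output can be merged client-side with jq's slurp mode; here is a sketch using hypothetical sample responses standing in for the per-job curl calls:

```shell
#!/bin/sh
# Two hypothetical per-job API responses, standing in for the curl output
printf '{ "executions": [ { "id": 1, "status": "succeeded" } ] }\n' > job_a.json
printf '{ "executions": [ { "id": 7, "status": "succeeded" } ] }\n' > job_b.json
# jq -s (slurp) reads all input documents and lets us build one array
# containing the first (most recent) execution of each job
cat job_a.json job_b.json | jq -s '[ .[].executions[0] ]'
```

In the real loop you would pipe every curl response through the same `jq -s` step instead of the sample files.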

Rundeck Job List View plugin installation issue

I'm trying out and exploring the plugins available in Rundeck. I'm trying to install the Job List View plugin because I want to see statistics for my jobs, but after installing it I still can't see the job list view. Also, whenever I restart the Rundeck service and go to Plugin Repositories, the plugin needs to be installed again, even though I've clearly installed it before. I can't see any errors in service.log.
How can I fix this issue? Thanks!
My Rundeck version is 3.3.5.
That's a bug reported here (by the question author). Anyway, you can get the job info via the API; I leave some examples using jq to "beautify" the output.
To get all jobs from a project:
#!/bin/sh
# protocol
protocol="http"
# basic rundeck info
rdeck_host="your_rundeck_node"
rdeck_port="4440"
rdeck_api="36"
rdeck_token="cqgfZlrSF84oUoC2ZzRwiltiyefjZx9R"
# specific api call info
rdeck_project="YourProject"
# get the job list from a project
curl -s --location --request GET "$protocol://$rdeck_host:$rdeck_port/api/$rdeck_api/project/$rdeck_project/jobs" \
--header "Accept: application/json" \
--header "X-Rundeck-Auth-Token: $rdeck_token" | jq
Get all job metadata:
#!/bin/sh
# protocol
protocol="http"
# basic rundeck info
rdeck_host="your_rundeck_node"
rdeck_port="4440"
rdeck_api="36"
rdeck_token="cqgfZlrSF84oUoC2ZzRwiltiyefjZx9R"
# specific api call info
rdeck_job="5dc08e08-0e28-4a74-9ef0-4ec0c8e3f55e"
# get the job metadata
curl -s --location --request GET "$protocol://$rdeck_host:$rdeck_port/api/$rdeck_api/job/$rdeck_job/info" \
--header "Accept: application/json" \
--header "X-Rundeck-Auth-Token: $rdeck_token" | jq
Get job forecast information:
#!/bin/sh
# protocol
protocol="http"
# basic rundeck info
rdeck_host="your_rundeck_node"
rdeck_port="4440"
rdeck_api="36"
rdeck_token="cqgfZlrSF84oUoC2ZzRwiltiyefjZx9R"
# specific api call info
rdeck_job="5dc08e08-0e28-4a74-9ef0-4ec0c8e3f55e"
# get the job forecast
curl -s --location --request GET "$protocol://$rdeck_host:$rdeck_port/api/$rdeck_api/job/$rdeck_job/forecast" \
--header "Accept: application/json" \
--header "X-Rundeck-Auth-Token: $rdeck_token" | jq
More info about the Rundeck API here, and here are a lot of useful examples.

How to trigger TeamCity build from Command Line using REST API?

I am trying to trigger a TeamCity build from the command line.
Firstly, I tried:
curl http://<user name>:<user password>@<server address>/httpAuth/action.html?add2Queue=<build configuration Id>
But in the latest versions of TeamCity this approach has been removed, and the response is the following:
405 Only POST method is allowed for this request.
So, based on the information from https://www.jetbrains.com/help/teamcity/rest-api.html#RESTAPI-BuildRequests, it should work via the REST API in this way:
curl -v -u user:password http://teamcity.server.url:8111/app/rest/buildQueue --request POST --header "Content-Type:application/xml" --data-binary @build.xml
build.xml example:
<build>
<buildType id="buildConfID"/>
</build>
It is not clear to me where I should place my configured build.xml.
curl -u user:password -X POST \
https://teamcity.host.io/app/rest/buildQueue \
-H 'Accept: application/json' \
-H 'Content-Type: application/xml' \
-H 'Host: teamcity.host.io' \
-d '<build branchName="refs/heads/master">
<triggeringOptions cleanSources="true" rebuildAllDependencies="false" queueAtTop="false"/>
<buildType id="Test_Configuration_ID_for_trigger"/>
<lastChanges>
<change locator="version:e2418b4d7ae55ac4610dbff51bffe60d1f32e019"/>
</lastChanges>
<properties>
<property name="env.startedBy" value="build was triggering from %teamcity.serverUrl%/viewLog.html?buildId=%teamcity.build.id%"/>
</properties>
</build>'
You can skip lastChanges to run the build on the latest changes.
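Regarding where to place build.xml: it is just a local file read by curl, so it can live anywhere, as long as the path after @ points to it (a relative path is resolved against the directory curl runs in). A minimal sketch:

```shell
#!/bin/sh
# Write the payload to the current directory (any path works, e.g. /tmp/build.xml)
cat > build.xml <<'XML'
<build>
  <buildType id="buildConfID"/>
</build>
XML
# Then reference it from curl with @ (commented out; needs a real server):
# curl -u user:password http://teamcity.server.url:8111/app/rest/buildQueue \
#   --request POST --header "Content-Type:application/xml" --data-binary @build.xml
cat build.xml
```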

Slow importing into Google Cloud SQL

Google Cloud SQL is my first real evaluation at MySQL as a service. I created a D32 instance, set replication to async, and disabled binary logging. Importing 5.5 GB from dump files from a GCE n1-standard-1 instance in the same zone took 97 minutes.
Following the documentation, the connection was done using the public IP address, but is in the same region and zone. I'm fully open to the fact that I did something incorrectly. Is there anything immediately obvious that I should be doing differently?
We have been importing ~30 GB via Cloud Storage from zip files containing SQL statements, and this was taking over 24 hours.
A big factor is the number of indexes that you have on the given table.
To keep it manageable, we split the file into chunks of 200K SQL statements each, which are inserted in one transaction. This enables us to retry individual chunks in case of errors.
We also tried to do it via Compute Engine (mysql command line), and in our experience this was even slower.
Here is how to import one chunk and wait for it to complete. You cannot do this in parallel, as Cloud SQL only allows one import operation at a time.
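The splitting step itself can be done with the standard split tool; here is a sketch with a tiny stand-in dump (the file names and the per-chunk transaction wrapping are our own convention; use -l 200000 for real dumps):

```shell
#!/bin/sh
# Tiny stand-in for a real dump: one SQL statement per line
printf 'INSERT INTO t VALUES (1);\nINSERT INTO t VALUES (2);\nINSERT INTO t VALUES (3);\n' > dump.sql
# Cut it into chunks of 2 statements each (use -l 200000 for real dumps)
split -l 2 dump.sql chunk_
# Wrap each chunk in a transaction so a failed chunk can be retried alone
for f in chunk_*; do
  { echo "START TRANSACTION;"; cat "$f"; echo "COMMIT;"; } > "$f.sql"
  rm "$f"
done
ls chunk_*.sql
```

Each resulting chunk_*.sql file is then uploaded to Cloud Storage and imported one at a time with the script below.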
#!/bin/bash
function refreshAccessToken() {
echo "getting access token..."
ACCESSTOKEN=`curl -s "http://metadata/computeMetadata/v1/instance/service-accounts/default/token" -H "X-Google-Metadata-Request: True" | jq ".access_token" | sed 's/"//g'`
echo "retrieved access token $ACCESSTOKEN"
}
START=`date +%s%N`
DB_INSTANCE=$1
GCS_FILE=$2
SLEEP_SECONDS=$3
refreshAccessToken
CURL_URL="https://www.googleapis.com/sql/v1beta1/projects/myproject/instances/$DB_INSTANCE/import"
CURL_OPTIONS="-s --header 'Content-Type: application/json' --header 'Authorization: OAuth $ACCESSTOKEN' --header 'x-goog-project-id:myprojectId' --header 'x-goog-api-version:1'"
CURL_PAYLOAD="--data '{ \"importContext\": { \"database\": \"mydbname\", \"kind\": \"sql#importContext\", \"uri\": [ \"$GCS_FILE\" ]}}'"
CURL_COMMAND="curl --request POST $CURL_URL $CURL_OPTIONS $CURL_PAYLOAD"
echo "executing $CURL_COMMAND"
CURL_RESPONSE=`eval $CURL_COMMAND`
echo "$CURL_RESPONSE"
OPERATION=`echo $CURL_RESPONSE | jq ".operation" | sed 's/"//g'`
echo "Import operation $OPERATION started..."
CURL_URL="https://www.googleapis.com/sql/v1beta1/projects/myproject/instances/$DB_INSTANCE/operations/$OPERATION"
STATE="RUNNING"
while [[ $STATE == "RUNNING" ]]
do
echo "waiting for $SLEEP_SECONDS seconds for the import to finish..."
sleep $SLEEP_SECONDS
refreshAccessToken
CURL_OPTIONS="-s --header 'Content-Type: application/json' --header 'Authorization: OAuth $ACCESSTOKEN' --header 'x-goog-project-id:myprojectId' --header 'x-goog-api-version:1'"
CURL_COMMAND="curl --request GET $CURL_URL $CURL_OPTIONS"
CURL_RESPONSE=`eval $CURL_COMMAND`
STATE=`echo $CURL_RESPONSE | jq ".state" | sed 's/"//g'`
END=`date +%s%N`
ELAPSED=`echo "scale=8; ($END - $START) / 1000000000" | bc`
echo "Import process $OPERATION for $GCS_FILE : $STATE, elapsed time $ELAPSED"
done