How to allow a Rundeck job to pick up the latest node inventory list?

There are two Rundeck jobs in our infrastructure:
Job 01: (for local execution)
Allows the user to upload a file containing a list of servers (Nodes: execute locally on Rundeck)
The uploaded node list is written to the inventory file (resources.xml), which Rundeck uses from then on
Triggers an API call to run Job 02 (sketched below)
Job 02: (for remote execution)
Runs the job against the updated inventory list.
Result: the runs were successful, and the new nodes are reflected in the latest inventory.
Problem: after about 5 such executions, Rundeck starts using a cached inventory. For example, job execution #5 uses the inventory list from execution #4. Is there any way to avoid this? It could turn into a bigger issue when deployed at scale.
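The "API call to run Job 02" mentioned above would look roughly like this. A minimal sketch only: the server URL, API version, job UUID, and token are all placeholders.

```python
# Sketch of the "trigger Job 02" call made at the end of Job 01.
# Server URL, API version, job UUID, and API token are placeholders.
import urllib.request

RUNDECK_URL = "https://rundeck.example.com"
JOB_02_UUID = "00000000-0000-0000-0000-000000000000"
API_TOKEN = "***"

req = urllib.request.Request(
    f"{RUNDECK_URL}/api/40/job/{JOB_02_UUID}/run",
    data=b"{}",  # optional run options would go here as JSON
    method="POST",
    headers={
        "X-Rundeck-Auth-Token": API_TOKEN,
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```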

You can decrease the resource model source cache delay: go to Project Settings > Edit Nodes, open the Configuration tab, and set the Cache Delay field in seconds (the default is 30 seconds).

Related

In Azure DevOps, is it possible to run a YAML pipeline stage when a third-party service returns a certain payload (not just a status code)?

I want to run a Veracode security scan on our big application. The problem is that the scan takes about 5 days and I do not want to keep the pipeline agent pinned down during all this time.
My current solution is this:
Have build 1 upload the binaries and initiate the pre-scan.
Have build 2 run automatically when build 1 finishes and poll the status of the upload: if the pre-scan is over, start the scan; if the scan is over, mark the build as succeeded and clear the schedule; if the pre-scan or the scan is still running, schedule build 2 to run 30 minutes from the current time (by modifying the schedule in the build definition) and mark the build as partially succeeded.
This scheme works, but I wonder if it can be simplified.
What if build 2 were a Release tied to build 1?
But is it possible to repeat this logic without scheduling a new Release every time, without pinning down the build agent for the entire duration of the scan (5 days), and using YAML (i.e. a unified pipeline instead of Build + Release)?
After all, a single Release is able to deploy the same stage multiple times - these are the different deployment attempts. The problem is that I do not know how to request an attempt in 30 minutes without simply blocking the agent for the next 30 minutes, which is unacceptable.
So this brings up the question: can a YAML pipeline start a build and then poll a third-party service until it returns what we need (the payload must be parsed and examined; an HTTP status code is not enough) so that the next stage can run?
You're looking for the concept of "gates" (classic release pipelines) or "approvals and checks" (YAML pipelines).
You can add an "Invoke REST API" check that periodically queries a REST endpoint and only proceeds when the response satisfies the necessary conditions. This is a "serverless" operation -- it does not tie up an agent.
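If the third-party payload is too complex to evaluate directly in the check's success criteria, one option is a thin wrapper endpoint that does the parsing and exposes a single field the check can test. A minimal sketch, assuming a hypothetical upstream status URL and response fields (everything here is illustrative, not the scan vendor's actual API):

```python
# Minimal wrapper endpoint sketch (all URLs and field names are hypothetical).
# It fetches the third-party scan status, parses the payload, and exposes a
# simple JSON answer that an "Invoke REST API" check can evaluate.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

UPSTREAM_STATUS_URL = "https://scan.example.com/api/status/12345"  # placeholder

class ScanStatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        with urlopen(UPSTREAM_STATUS_URL) as resp:
            payload = json.load(resp)
        # Whatever parsing/examination the raw payload needs happens here.
        ready = payload.get("scan", {}).get("state") == "COMPLETED"
        body = json.dumps({"ready": ready}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), ScanStatusHandler).serve_forever()
```

The check would then poll this endpoint and succeed once the returned ready flag is true, without holding an agent in the meantime.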

GitHub Actions Concurrency Queue

Currently we are using GitHub Actions for infrastructure CI.
The infrastructure uses Terraform, and a code change in a module triggers plan and deploy for the changed module only (hence only the related modules are updated, e.g. one pod container).
Since auto-updates can be triggered by a push to another GitHub repository, they can arrive within roughly the same time frame, e.g. Pod A's image is updated and Pod B's image is updated.
Without any concurrency control in place, since Terraform holds a state lock, one of the runs will fail due to a lock timeout.
After implementing concurrency, two pushes at the same time deploy fine, as the second one can wait for the first one to finish.
Yet if more pushes arrive, GitHub's concurrency only keeps the last push in the queue and cancels the other waiting runs (the in-progress run can still continue). This is logical from a single-application perspective, but since our infra code deploys based on diff checks, cancelling a queued run actually skips a deployment!
Is there a mechanism to queue workflows (or even set a queue wait timeout) in GitHub Actions?
Eventually we wrote our own script in the workflow to wait for previous runs (see the sketch after the link below):
Get information on the current run
Collect previous runs that are not yet completed
Wait in a loop until they are completed
Once the waiting loop exits, continue with the workflow
Tutorial on checking status of workflow jobs
https://www.softwaretester.blog/detecting-github-workflow-job-run-status-changes
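A minimal sketch of that wait loop using the GitHub REST API, run as a step in the workflow. It assumes GITHUB_TOKEN is exported to the step's environment; the 30-second poll interval is arbitrary.

```python
# Sketch of a "wait for previous runs" step (run inside the workflow).
# Assumes GITHUB_TOKEN is exported to the step's environment; repository and
# run id come from the default GitHub Actions environment variables.
import json
import os
import time
import urllib.request

API = "https://api.github.com"
REPO = os.environ["GITHUB_REPOSITORY"]        # e.g. "org/infra"
RUN_ID = os.environ["GITHUB_RUN_ID"]
TOKEN = os.environ["GITHUB_TOKEN"]

def api_get(path):
    req = urllib.request.Request(
        f"{API}{path}",
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# 1. Get information on the current run.
current = api_get(f"/repos/{REPO}/actions/runs/{RUN_ID}")
workflow_id = current["workflow_id"]
run_number = current["run_number"]

# 2./3. Collect earlier runs of the same workflow that are not completed,
#       and wait in a loop until they are.
while True:
    pending = []
    for status in ("queued", "in_progress"):
        runs = api_get(
            f"/repos/{REPO}/actions/workflows/{workflow_id}/runs"
            f"?status={status}&per_page=100"
        )["workflow_runs"]
        pending += [r for r in runs if r["run_number"] < run_number]
    if not pending:
        break
    print(f"waiting for {len(pending)} earlier run(s)...")
    time.sleep(30)

# 4. Waiting loop exited: continue with the rest of the workflow.
print("no earlier runs pending, continuing")
```

Earlier runs are identified here by a lower run_number of the same workflow; any equivalent ordering (e.g. created_at) would work just as well.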

GitLab coordinator waits 5 minutes after a job is finished by the runner - how to diagnose

We have a current (13.5.1-ee) Kubernetes-deployed GitLab which has recently developed an unusual and obstructive behaviour:
After a job finishes (either successfully or with a failure) and this is reported by the runner (as confirmed by the local log), the coordinator waits 5 minutes before reporting the status in the UI and starting the next job.
This behaviour does not depend on the:
type of executor. It occurs with the docker and kubernetes executor.
size of artifacts (for test cases there are none)
size of logs (for test cases they are 5 lines long)
image being used (for test cases it is busybox)
script (for tests it is empty)
network quality (for tests I have activated feature flags that this)
It feels as though the coordinator is attempting a call to another system and timing out.
Has anyone seen this before? Does anyone have a way to diagnose it?
This was a bug affecting Azure.

AWS Fargate vs Batch vs ECS for a once a day batch process

I have a batch process, written in PHP and packaged in a Docker container. Basically, it loads data from several web services, does some computation on the data (for about an hour), and posts the computed data to another web service; then the container exits (with a return code of 0 if OK, 1 if something failed during the process). During the process, some logs are written to STDOUT or STDERR. The batch must be triggered once a day.
I was wondering which AWS service is best for scheduling, executing, and monitoring my batch process:
at the very beginning, I used an EC2 machine with a crontab: no high availability there, so I decided to switch to a more PaaS-like approach.
then, I used Elastic Beanstalk for Docker, with a do-nothing web server (only there to answer the health check) and a crontab inside the container to wake up my batch command once a day. With an autoscaling rule of min=1, max=1, I have HA (if the container or the VM crashes, AWS restarts it).
but now, to be more efficient, I decided to move to ECS, with an approach where I do not need EC2 instances awake 23 hours a day for nothing. So I tried Fargate.
with Fargate I defined my task (Fargate launch type, not EC2) and configured everything on it.
I created a cluster to run my task: I can run the task "by hand, one time", so I know all the settings are correct.
Now, going deeper into Fargate, I want my task executed once a day.
It seems to work fine when I use the Scheduled Task feature of ECS: the container starts on time, the process runs, then the container stops. But CloudWatch is missing some metrics: CPUReservation and CPUUtilization are not reported. Also, there is no way to know whether the batch exited with code 0 or 1 (every execution stops with status "STOPPED"), so I cannot send a CloudWatch alarm when the container execution fails.
I tried the "Services" feature of Fargate, but it cannot handle a batch process, because the container is restarted every time it stops. This is normal, since the container does not run any daemon. There is no way to schedule a service, and I want my container to be active only when it needs to work (once a day, for at most 1 hour). On the other hand, the metrics missing above are correctly reported in CloudWatch.
Here is my question: which AWS managed services are best suited to trigger a container once a day, let it run its task, and provide reporting to track the execution (CPU usage, batch duration), including an alarm (SNS) when the task fails?
We had the same issue with identifying failed jobs. I suggest you take a look at AWS Batch, where logs for FAILED jobs are available in CloudWatch Logs.
One more thing you should consider is the total cost of ownership of whatever solution you eventually choose. Fargate, in this regard, is quite expensive.
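For reference, submitting the container as a Batch job and checking its final status might look roughly like this with boto3 (the job name, queue, and job definition are placeholders and must already exist):

```python
# Sketch: submit the daily batch as an AWS Batch job and inspect the result.
# Queue and job definition names are placeholders; assumes they already exist.
import time
import boto3

batch = boto3.client("batch")

resp = batch.submit_job(
    jobName="daily-php-batch",
    jobQueue="fargate-batch-queue",          # placeholder queue
    jobDefinition="daily-php-batch:1",       # placeholder job definition
)
job_id = resp["jobId"]

# Poll until the job reaches a terminal state.
while True:
    job = batch.describe_jobs(jobs=[job_id])["jobs"][0]
    if job["status"] in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(60)

# For FAILED jobs the status reason and the container exit code are available,
# and the container logs end up in CloudWatch Logs.
print(job["status"], job.get("statusReason"),
      job.get("container", {}).get("exitCode"))
```

In practice the daily trigger would come from a scheduled rule rather than a polling script; the loop here is only to show where the exit code and status end up.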
This may be too late for your project, but I thought it could still benefit others.
Have you had a look at AWS Step Functions? It is possible to define a workflow that starts tasks on ECS/Fargate (or jobs on EKS, for that matter), waits for the results, and raises alarms / sends emails...
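As a sketch of that idea (every ARN, subnet, and name below is a placeholder): a minimal state machine can run the Fargate task synchronously via the ECS service integration and publish to an SNS topic when the run state reports an error.

```python
# Sketch: a minimal Step Functions state machine that runs the Fargate task
# once, waits for it to finish, and notifies SNS if the task state fails.
# All ARNs, subnet ids, and names are placeholders.
import json
import boto3

definition = {
    "StartAt": "RunBatch",
    "States": {
        "RunBatch": {
            "Type": "Task",
            "Resource": "arn:aws:states:::ecs:runTask.sync",
            "Parameters": {
                "LaunchType": "FARGATE",
                "Cluster": "arn:aws:ecs:eu-west-1:123456789012:cluster/batch",
                "TaskDefinition": "arn:aws:ecs:eu-west-1:123456789012:task-definition/php-batch",
                "NetworkConfiguration": {
                    "AwsvpcConfiguration": {
                        "Subnets": ["subnet-00000000"],
                        "AssignPublicIp": "ENABLED"
                    }
                }
            },
            "Catch": [{
                "ErrorEquals": ["States.ALL"],
                "Next": "NotifyFailure"
            }],
            "End": True
        },
        "NotifyFailure": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
                "TopicArn": "arn:aws:sns:eu-west-1:123456789012:batch-alerts",
                "Message": "Daily batch task failed"
            },
            "End": True
        }
    }
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="daily-php-batch",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/daily-batch-sfn-role",  # placeholder
)
```

Depending on how the container reports failure, you may also need to inspect the exit code in the task's output (e.g. with a Choice state) rather than rely on the Catch alone.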

Jenkins: trigger a job by another one running on an offline node

Is there any way to do the following:
I have 2 jobs. One job on an offline node has to trigger the second one. Are there any plugins in Jenkins that can do this? I know that TeamCity has a way of achieving this, but I think that Jenkins is more restrictive.
When you configure your node, you can set Availability to "Take this slave on-line when in demand and off-line when idle".
Set Usage to "Leave this machine for tied jobs only".
Finally, configure the job to be executed only on that node.
This way, when the job enters the queue and cannot execute (because the node is offline), Jenkins will try to bring the node online. After the job finishes, the node goes back offline.
This of course relies on the fact that Jenkins is configured to be able to start this node.
One instance will always be turned on, on which the main job can run. I have created a job which looks in the DB and, if there are no running instances in the DB, prepares a node. And a third job cleans up my environment after the tests have run.