GitHub Actions Concurrency Queue

Currently we are using GitHub Actions for CI on our infrastructure.
The infrastructure is managed with Terraform, and a code change to a module triggers a plan and deploy for that changed module only (hence only related resources are updated, e.g. one pod container).
Since auto-updates can be triggered by pushes from other GitHub repositories, they can arrive in roughly the same time frame, e.g. Pod A's image is updated and Pod B's image is updated.
Without any concurrency control in place, one of the actions will fail with a lock timeout, since Terraform holds a state lock.
After implementing concurrency, two pushes at the same time deploy fine, as the second one can wait for the first to finish.
Yet if more pushes arrive, GitHub's concurrency only keeps the last push in the queue and cancels the waiting ones (the in-progress run can still continue). This is logical from a single-app perspective, but since our infra code relies on difference checks, skipping a deployment by cancelling its job actually bypasses that deployment entirely!
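For reference, a minimal sketch of the concurrency setup we mean, assuming a workflow-level group (the group name is a placeholder):

```yaml
concurrency:
  group: terraform-infra      # one queue per infrastructure pipeline
  cancel-in-progress: false   # never cancel the run that is already deploying
```

Even with cancel-in-progress: false, GitHub keeps at most one pending run per group; any older pending run is cancelled when a newer one arrives.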
Is there a mechanism where we can queue workflows (or maybe even set a queue wait timeout) on GitHub Actions?

Eventually we wrote our own script in the workflow to wait for previous runs:
Get information on the current run.
Collect the previous, not-yet-completed runs.
Wait (in a loop) until they are completed.
Once out of the waiting loop, continue with the workflow.
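A minimal sketch of such a wait step, assuming the gh CLI available on GitHub-hosted runners and the default GITHUB_TOKEN (the polling interval and run limit are arbitrary choices):

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Wait for earlier runs of this workflow
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          while true; do
            # Count runs of this workflow that started before the current one
            # and have not completed yet.
            active=$(gh run list \
              --repo "$GITHUB_REPOSITORY" \
              --workflow "$GITHUB_WORKFLOW" \
              --limit 100 \
              --json databaseId,status \
              --jq "[.[] | select(.databaseId < ${GITHUB_RUN_ID} and .status != \"completed\")] | length")
            if [ "$active" -eq 0 ]; then
              break
            fi
            echo "Waiting on $active earlier run(s)..."
            sleep 30
          done
      # ... terraform plan/apply steps follow here
```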
A tutorial on checking the status of workflow jobs:
https://www.softwaretester.blog/detecting-github-workflow-job-run-status-changes

Related

In Azure DevOps, is it possible to run a YAML pipeline stage when a third-party service returns a certain payload (not a status code)?

I want to run a Veracode security scan on our big application. The problem is that the scan takes about 5 days, and I do not want to keep the pipeline agent pinned down during all this time.
My current solution is this:
Have build 1 to upload the binaries and initiate the pre-scan.
Have build 2 run automatically when build 1 is over and poll the status of the upload. If the pre-scan is over, start the scan. If the scan is over, mark the build as successful and clear the schedule. If the pre-scan or scan is still running, schedule build 2 to run 30 minutes from the current point in time by modifying the schedule in the build definition, and mark the build as partially succeeded.
This scheme works, but I wonder if it can be simplified.
What if the build 2 was a Release tied to the build 1?
But is it possible to repeat the logic without scheduling a new Release every time, without pinning down the build agent for the entire duration of the scan (5 days), and by using YAML (i.e. a unified pipeline instead of Build + Release)?
After all, a single Release is able to deploy the same stage multiple times; these are the different deployment attempts. The problem is that I do not know how to request an attempt in 30 minutes without just blocking the agent for the next 30 minutes, which is unacceptable.
So this brings up the question: can we use a YAML pipeline to start a build and then poll a third-party service until it returns what we need (the payload must be parsed and examined; an HTTP status code is not enough) so that the next stage can run?
You're looking for the concept of "gates" (classic pipelines) or "approvals and checks" (YAML pipelines).
You can add an "Invoke REST API" check that periodically queries a REST endpoint and only proceeds when the response satisfies the necessary conditions. This is a "serverless" operation; it does not tie up an agent.
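On the YAML side, a sketch of how such a check attaches to a pipeline; the environment name is hypothetical, and the "Invoke REST API" check itself (endpoint, polling interval, and the success criteria evaluated against the response body) is configured on the environment in the portal rather than in YAML:

```yaml
stages:
- stage: StartScan
  jobs:
  - job: Upload
    steps:
    - script: echo "upload binaries and start the Veracode pre-scan"

- stage: AfterScan
  dependsOn: StartScan
  jobs:
  - deployment: ContinueWhenScanDone
    # Hypothetical environment with an "Invoke REST API" check attached.
    # The check polls the scan API and parses the payload, holding this
    # stage back without occupying an agent until the criteria pass.
    environment: veracode-scan
    strategy:
      runOnce:
        deploy:
          steps:
          - script: echo "scan finished - run the remaining steps"
```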

Azure DevOps cancels some pipelines when an exclusive lock with sequential runs is enabled

According to the documentation, I can now use sequential deployments with lockBehavior: sequential and exclusive locks on an environment.
It works fine: stages that want to deploy to a certain environment are queued and wait for the lock to be acquired, but sometimes they are simply cancelled with the message Did not obtain the lock, as in the screenshot below. It happens randomly. When I rerun the pipeline, sometimes it succeeds and sometimes it fails with the same message. What could be the reason these pipelines are cancelled?
Information about sequential deployments:
https://learn.microsoft.com/en-us/azure/devops/release-notes/2021/sprint-190-update
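For reference, a minimal sketch of the setup being described; the environment name is a placeholder and needs an Exclusive Lock check added to it:

```yaml
# 'sequential' queues runs waiting on the exclusive lock in order;
# the default, 'runLatest', cancels all but the most recent waiting run.
lockBehavior: sequential

stages:
- stage: Deploy
  jobs:
  - deployment: DeployApp
    environment: production   # environment carrying an Exclusive Lock check
    strategy:
      runOnce:
        deploy:
          steps:
          - script: echo "deploying"
```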

DataStage: How to keep a continuous-mode job running after an unexpected termination

I have a job that uses the Kafka Connector stage to read a Kafka queue and then load into the database. The job runs in continuous mode, so it never gets to finish, since it keeps monitoring the Kafka queue in real time.
For unexpected reasons (say, server issues, job issues, etc.) the job may terminate with a failure. In general, that happens after about 300 hours of running. So, in order to keep the job alive, I have to manually check the job status and then do a Reset and Run to get the job running again.
The problem is that several hours can pass between the job termination and my manual Reset and Run, which is critical. So I'm looking for a way to eliminate the manual interaction and reduce that gap by automating the job invocation.
I tried using Control-M to run the job daily, but with no success: the first day Control-M called the job, it ran fine. But the next day, when Control-M attempted to instantiate the job again, it failed (since the job was already running). Besides, DataStage will never tell Control-M that the job concluded successfully, since the job's nature won't allow that.
That said, I would like to hear any ideas that can light the way.
The first thing that came to mind is to create an intermediate sequence and schedule it in Control-M. This new sequence would then call the continuous job asynchronously by using a command-line stage.
For the case where just this one job terminates unexpectedly and you want it restarted as soon as possible, have you considered calling this job from a sequence? The sequence could be set up to run this job in a loop.
Thus the sequence starts the job and waits for it to finish. When the job finishes, the sequence loops and starts the job again. You could also add conditions on job exit (for example, if the job aborted, then based on that end status you could reset the job before re-running it).
This would not handle the condition where the DataStage engine itself was shut down (such as for maintenance or possibly an error), in which case all jobs end, including your new sequence. The same applies to a server reboot or other situations where someone may have inadvertently stopped your sequence. For those cases (such as a DataStage engine stop), your team would need a process in place for jobs/sequences that need to be started up following a DataStage or system outage.
For the outage scenario, you could create a monitor script (regardless of whether the job runs solo or from a sequence) that sleeps/loops on 5-10 minute intervals, checks the status of your job using the dsjob command, and if it is not running starts the job/sequence (also via the dsjob command). You can decide whether that script starts at DataStage startup, at machine startup, or runs from Control-M or another scheduler.
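A minimal sketch of such a monitor, with placeholder project/job names; dsjob comes from $DSHOME/bin, and the exact -jobinfo output may vary by DataStage version:

```sh
#!/bin/sh
# Poll the job every 5 minutes; reset and restart it when it is not running.
PROJECT=MyProject        # placeholder project name
JOB=KafkaToDatabase      # placeholder job name

while true; do
  # dsjob -jobinfo prints a line such as "Job Status : RUNNING (0)"
  STATUS=$(dsjob -jobinfo "$PROJECT" "$JOB" 2>/dev/null | grep 'Job Status')
  case "$STATUS" in
    *RUNNING*)
      ;;                                               # job is alive, nothing to do
    *)
      echo "$(date): job not running ($STATUS), resetting and restarting"
      dsjob -run -mode RESET -wait "$PROJECT" "$JOB"   # clear the aborted state
      dsjob -run "$PROJECT" "$JOB"                     # start again without waiting
      ;;
  esac
  sleep 300
done
```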

PowerShell session/environment isolation - Jobs sharing same context?

I'm testing a workflow runbook that uses Add-Type to add some custom C# code.
All of a sudden I started getting 'type already exists' errors on subsequent test jobs, as if a new PSSession were not being created.
In other words, it looks like new jobs are sharing the same execution context. Locally, I only get this if I run the same command twice in one PowerShell instance.
The type in question is a static class with some extension methods. Since it also happens to be the first type declared in the source block, I don't doubt that other, non-static types would throw errors as well.
I've executed this a handful of times already, so I fully expect that 'eventually' this will stop happening, but I can't seem to force it, and I have no idea what I could have done to trip it into this situation, either.
Seeing evidence of shared execution contexts across jobs like this, even (especially?) if only temporary, makes me wonder whether some or all of the general execution inconsistencies we've seen in the past, when making and deploying changes and performing subsequent tests soon after, are related to this.
I'm tempted to think that this is simply part of the difference between a test job and a 'real' one, but that raises questions about the validity of test jobs with respect to mimicking published jobs.
Are all Azure Automation jobs supposed to execute in isolation? Can this be controlled/exploited by a developer?
Each Automation account has its own isolated sandboxes where its jobs run, and those sandboxes are distributed among a number of worker machines. For test jobs, to improve job start time (since [make code change, retest] over and over is very common), Automation reuses the sandbox from previous test jobs of the same runbook if that sandbox has not been cleaned up yet, so that a new sandbox does not have to be spun up for each test job (sandbox creation is one reason for a longer job start time than desired). Due to this behavior, if you execute test jobs of the same runbook within a short amount of time, you will get the behavior you're seeing above.
However, even for production jobs, jobs of the same Automation account (across runbooks) can share the same sandboxes. We randomly distribute jobs across our worker machines, so it's possible that job A is queued for execution and placed on worker W, and then 5 minutes later job B is queued for execution and placed on worker W as well. If job A and job B are of the same Automation account and have the same "demands" in terms of modules / module versions, they will be placed in the same sandbox, if job A's sandbox is still around. "Module / module version demands" does not mean the modules used by the runbook, but the modules / latest module versions that existed in the Automation account at the time the job was started, the runbook was scheduled (for jobs started via schedule), or the runbook was assigned to a webhook (for jobs started via webhook).
In terms of resolving your specific problem, you could surround Add-Type with a try/catch statement, or maybe use Add-Type -IgnoreWarnings.
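A minimal sketch of the try/catch approach, made idempotent with a type-exists check; MyExtensions and the source block are placeholders for whatever the runbook actually defines:

```powershell
# Placeholder C# source; the real runbook compiles its own code here.
$source = @"
public static class MyExtensions
{
    public static string Shout(this string s) { return s + "!"; }
}
"@

# Only compile if the type is not already loaded in this (possibly reused) sandbox.
if (-not ('MyExtensions' -as [type])) {
    try {
        Add-Type -TypeDefinition $source -ErrorAction Stop
    }
    catch {
        # If another job in the same sandbox added the type between the check
        # and this call, the 'type already exists' error is safe to ignore.
        Write-Verbose "Add-Type skipped: $($_.Exception.Message)"
    }
}
```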

Workflow scheduling in Informatica

My requirement is:
The workflow should run daily at 2 PM, and it has been scheduled to do so.
We have lookups on master tables. Records with IDs that are not present in the master tables get rejected.
These new IDs have to be loaded into the master tables manually, and then the workflow has to be re-run.
The same thing happens every day.
My question is:
Is it possible to schedule a workflow to run twice every day (once for the first run, and once after the master table is updated)?
If not, can I manually start a scheduled workflow? Will that make the workflow unscheduled?
Can anyone help me with this, please?
Informatica's scheduler is a weak spot. I guess using two copies of the same workflow with different schedules would be the easiest solution.
I got a solution to my problem.
Once a workflow is scheduled, even if a particular session has to be re-run manually, the whole workflow has to be run from the Workflow Manager.
If that particular session is run on its own, the scheduling will be gone.
So always run the workflow instead of a single session, so that the schedule remains in place.
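If that manual whole-workflow run needs scripting, a sketch with pmcmd; the service, domain, credentials, folder, and workflow names are all placeholders:

```sh
# Start the entire workflow (not an individual session) from the command line,
# in line with the advice above to always run the workflow as a whole.
pmcmd startworkflow \
  -sv IntSvc_Dev \
  -d Domain_Dev \
  -u Administrator \
  -p '********' \
  -f DailyLoads \
  wf_daily_load
```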