How can I create a queue for jobs of a workflow in GitHub Actions triggered by different PRs?

We are using GitHub Actions for our CI jobs, but when new commits are pushed to different PRs, each of them triggers the CI workflow and we get many parallel executions of the workflow. The expected behavior would be to have a maximum of 2 or 3 parallel executions, plus a queue that starts the runs triggered by other PRs as soon as one of the running executions finishes.
Using the concurrency setting doesn't work in this case, because it cancels other pending executions instead of letting them all wait in a queue.

Related

How to disable GitHub Actions Concurrency

GitHub Actions concurrency broke my process: we push commits and want to prove that each of them builds.
But now, with concurrency, GitHub cancels builds on previous commits as soon as a new commit is pushed.
https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#concurrency
Is there a way to completely disable it?
Or even better, have it configurable per branch?
As documented, "Any previously pending [job or workflow] in the concurrency group will be canceled." In other words, the queue is limited to one, which is somewhat useless here. To permit parallel execution, simply don't use the concurrency key in the workflow.
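To make the documented rule concrete, here is a toy Python model (not GitHub's implementation) of a single concurrency group with cancel-in-progress left at its default of false: at most one run is in progress and at most one is pending, so a newly queued run replaces, and cancels, the previously pending one. The class name and the run names are hypothetical.

```python
from typing import List, Optional


class ConcurrencyGroupModel:
    """Toy model of one concurrency group with cancel-in-progress off.

    Assumption: it mirrors the documented rule of at most one running and
    at most one pending run per group; it is not GitHub's actual code.
    """

    def __init__(self) -> None:
        self.running: Optional[str] = None
        self.pending: Optional[str] = None
        self.canceled: List[str] = []
        self.completed: List[str] = []

    def enqueue(self, run_id: str) -> None:
        if self.running is None:
            # Nothing in progress: the new run starts immediately.
            self.running = run_id
            return
        if self.pending is not None:
            # The single pending slot is taken: the older pending run is canceled.
            self.canceled.append(self.pending)
        self.pending = run_id

    def finish_running(self) -> None:
        # The in-progress run completes and the pending run (if any) starts.
        if self.running is not None:
            self.completed.append(self.running)
        self.running = self.pending
        self.pending = None


group = ConcurrencyGroupModel()
for run in ["pr-1", "pr-2", "pr-3"]:
    group.enqueue(run)
print(group.running, group.pending, group.canceled)  # pr-1 pr-3 ['pr-2']
```

With three PRs pushing at once, only the first and the last run survive in this model, which is why a concurrency group can't act as a queue with 2 or 3 parallel slots.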

How to run actions without dependencies in sequence with Dagger

I am looking into Dagger, the CI/CD toolkit.
I understand that in a Dagger pipeline, actions that depend on each other run in succession: you express the dependency by making one action's inputs refer to another action's outputs.
This is clear from the samples and explanations on the official website.
https://docs.dagger.io/1221/action/#composite-actions
So, if I want to run an action in sequence after another action that it does not depend on, how can I set that up?
Thanks.

Reuse Jobs in GitHub Actions Workflow

I'm migrating a pipeline from CircleCI to GitHub Actions and find it a bit odd that each job can only run once. In CircleCI I can define a job and then call it from the workflows section, which makes it possible to run the same job multiple times without duplicating its commands/scripts.
My pipeline pushes code to three environments, then runs a Lighthouse scan for each of them. In CircleCI I have one job that pushes the code to my environments and one job that runs Lighthouse. Then, from my workflows section, I just call the jobs three times, passing the environment as a parameter. Am I missing something, or is there no way to do this in GitHub Actions? Do I just have to write out my commands three times, once in each job?
There are 3 main approaches to code reuse in GitHub Actions:
Reusing workflows
The obvious option is the "Reusable workflows" feature, which allows you to extract some steps into a separate "reusable" workflow and call this workflow as a job in other workflows.
Takeaways:
Reusable workflows can't call other reusable workflows.
The strategy property is not supported in any job that calls a reusable workflow.
Env variables and secrets are not inherited.
It's not convenient if you need to extract and reuse several steps inside one job.
Since it runs as a separate job, you have to use build artifacts to share files between a reusable workflow and your main workflow.
You can call a reusable workflow in a synchronous or asynchronous manner (controlled by job ordering with needs keys).
A reusable workflow can define outputs that extract outputs/outcomes from executed steps. They can be easily used to pass data to the "main" workflow.
Dispatched workflows
Another possibility that GitHub gives us is the workflow_dispatch event, which can trigger a workflow run. Simply put, you can trigger a workflow manually or through the GitHub API and provide its inputs.
There are actions available on the Marketplace which allow you to trigger a "dispatched" workflow as a step of "main" workflow.
Some of them also allow doing it in a synchronous manner (waiting until the dispatched workflow has finished). It is worth noting that this feature is implemented by polling the statuses of the repository's workflow runs, which is not very reliable, especially in a concurrent environment. It is also bounded by GitHub API usage limits, so there is a delay before the status of the dispatched workflow is known.
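The synchronous waiting described above boils down to a polling loop. The following is a minimal Python sketch under stated assumptions: get_status is a hypothetical callable that would wrap one GitHub API request for the dispatched run and return its status string (the REST API reports run statuses such as queued, in_progress and completed); the function name, defaults and the injectable sleep/clock hooks are illustrative, not taken from any Marketplace action.

```python
import time
from typing import Callable


def wait_for_completion(
    get_status: Callable[[], str],
    interval: float = 15.0,
    timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll get_status() until it returns "completed"; return the number of polls.

    get_status is hypothetical: in practice it would issue one API request,
    so every poll counts against the API rate limit.
    """
    deadline = clock() + timeout
    polls = 0
    while True:
        polls += 1
        if get_status() == "completed":
            return polls
        if clock() + interval > deadline:
            raise TimeoutError(f"run not completed after {polls} polls")
        # The run may finish right after this check; we only find out on
        # the next poll, up to `interval` seconds later.
        sleep(interval)


statuses = iter(["queued", "in_progress", "completed"])
print(wait_for_completion(lambda: next(statuses), sleep=lambda s: None))  # 3
```

The interval is the trade-off the answer hints at: a shorter one notices completion sooner but spends more of the API budget, while a longer one saves requests at the cost of up to one extra interval of delay.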
Takeaways
You can have multiple nested calls, triggering a workflow from another triggered workflow. If done carelessly, this can lead to an infinite loop.
You need a special token with "workflows" permission; your usual secrets.GITHUB_TOKEN doesn't allow you to dispatch a workflow.
You can trigger multiple dispatched workflows inside one job.
There is no easy way to get some data back from dispatched workflows to the main one.
Works better in "fire and forget" scenarios. Waiting for a dispatched workflow to finish has some limitations.
You can observe dispatched workflow runs and cancel them manually.
Composite Actions
In this approach, we extract steps into a distinct composite action, which can be located in the same or a separate repository.
From your "main" workflow it looks like a usual action (a single step), but internally it consists of multiple steps, each of which can call its own actions.
Takeaways:
Supports nesting: each step of a composite action can use another composite action.
Poor visualisation of internal steps: in the "main" workflow the composite action is displayed as a single ordinary step. In the raw logs you can find details of the internal steps' execution, but they don't look very friendly.
Shares environment variables with a parent job, but doesn't share secrets, which should be passed explicitly via inputs.
Supports inputs and outputs. Outputs are prepared from outputs/outcomes of internal steps and can be easily used to pass data from composite action to the "main" workflow.
A composite action runs inside the job of the "main" workflow. Since they share a common file system, there is no need to use build artifacts to transfer files from the composite action to the "main" workflow.
You can't use the continue-on-error option inside a composite action.
Source: my "DRY: reusing code in GitHub Actions" article
I'm currently in the exact same boat and just found an answer. You're looking for a Composite Action, as suggested in this answer.
Reusable workflows can't call other reusable workflows.
Actually, they can, since Aug. 2022:
GitHub Actions: Improvements to reusable workflows
Reusable workflows can now be called from a matrix and other reusable workflows.
You can now nest up to 4 levels of reusable workflows giving you greater flexibility and better code reuse.
Calling a reusable workflow from a matrix allows you to create richer parameterized builds and deployments.
Learn more about nesting reusable workflows.
Learn more about using reusable workflows with the matrix strategy.

Celery: Make sure workers are not running only jobs from one user

I have 4 Celery workers, each with a concurrency of 6.
I have users submitting a varying number of jobs (from 1 to 20).
How do I ensure that each user's jobs get equal processing time, and that one user's jobs do not fill up the queue and force other users' jobs to wait?
I am afraid that if the workers end up working through all the jobs submitted by the first user, the other users' queued jobs must wait for the first user's jobs to finish, which is inconvenient.
Is there a way to make the Celery workers aware that one user's jobs are holding up other users' queued jobs? Alternatively, can I run at most one job from each user at any given time?
I have one queue to which I submit all users' jobs. Would I need to make a queue for each user and somehow use a round-robin strategy to pull one job from each user's queue?
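The per-user round-robin idea from the question can be sketched in memory in plain Python. RoundRobinJobs and its methods are hypothetical names; in a real setup a dispatcher process would take jobs from it and hand each one to Celery (for example through a task's delay() method), keeping at most as many jobs in flight as there are worker slots (4 × 6 = 24 here). That dispatcher is an assumption and is not shown.

```python
from collections import OrderedDict, deque
from typing import Any, Deque, Optional, Tuple


class RoundRobinJobs:
    """Per-user FIFO queues served in round-robin order (in-memory sketch)."""

    def __init__(self) -> None:
        # The insertion order of the keys is the rotation order.
        self._queues: "OrderedDict[str, Deque[Any]]" = OrderedDict()

    def submit(self, user: str, job: Any) -> None:
        # A user seen for the first time joins the end of the rotation.
        self._queues.setdefault(user, deque()).append(job)

    def next_job(self) -> Optional[Tuple[str, Any]]:
        """Take one job from the user at the front of the rotation."""
        if not self._queues:
            return None
        user = next(iter(self._queues))
        jobs = self._queues[user]
        job = jobs.popleft()
        if jobs:
            self._queues.move_to_end(user)  # this user goes to the back of the line
        else:
            del self._queues[user]  # this user has nothing left queued
        return user, job


rr = RoundRobinJobs()
for j in ["a1", "a2", "a3"]:
    rr.submit("alice", j)
rr.submit("bob", "b1")
order = []
while (item := rr.next_job()) is not None:
    order.append(item[1])
print(order)  # ['a1', 'b1', 'a2', 'a3']
```

Even though Alice queued three jobs before Bob queued one, Bob's job runs second instead of fourth.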
At the moment Celery doesn't support priority queues.
Making a queue for each user and scheduling them based on round-robin algorithm seems to be a lot of work.
One simple way to solve your problem is to create a temporary table and store the incoming task details. Send the first received task to Celery. By the time it has completed, you might have received many tasks from various users. Now, based on the user ID and on each user's completed and uncompleted tasks, you can send the most appropriate task to Celery for execution.
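The temporary-table approach could look like the following sketch, which uses an in-memory SQLite database as a stand-in for the table. The schema, the status values and the function names are assumptions; the selection rule (prefer the user with the fewest running tasks, then the oldest task) is one reasonable reading of "most appropriate", not the only one.

```python
import sqlite3
from typing import Optional, Tuple

# Pick a pending task from the user with the fewest running tasks;
# ties go to the oldest task (lowest id).
NEXT_TASK_SQL = """
SELECT t.id, t.user_id, t.payload
FROM tasks AS t
WHERE t.status = 'pending'
ORDER BY (SELECT COUNT(*) FROM tasks AS r
          WHERE r.user_id = t.user_id AND r.status = 'running'),
         t.id
LIMIT 1
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE tasks (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               user_id TEXT NOT NULL,
               payload TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending'
           )"""
    )
    return conn


def add_task(conn: sqlite3.Connection, user_id: str, payload: str) -> None:
    conn.execute("INSERT INTO tasks (user_id, payload) VALUES (?, ?)", (user_id, payload))


def pick_next(conn: sqlite3.Connection) -> Optional[Tuple[int, str, str]]:
    row = conn.execute(NEXT_TASK_SQL).fetchone()
    if row is None:
        return None
    # Mark the task as running; this is the point where it would be sent to Celery.
    conn.execute("UPDATE tasks SET status = 'running' WHERE id = ?", (row[0],))
    return row


def mark_done(conn: sqlite3.Connection, task_id: int) -> None:
    conn.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))


conn = make_db()
for p in ["a1", "a2", "a3"]:
    add_task(conn, "alice", p)
add_task(conn, "bob", "b1")
picked = [pick_next(conn)[2] for _ in range(3)]
print(picked)  # ['a1', 'b1', 'a2']
```

Adding AND NOT EXISTS (SELECT 1 FROM tasks AS r WHERE r.user_id = t.user_id AND r.status = 'running') to the WHERE clause would instead enforce the "at most one job per user" rule from the question. With several dispatcher processes, the select-and-update step would also need to be made atomic.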

Can Quartz Scheduler Run jobs serially?

I'm looking into using Quartz Scheduler, and I was wondering if it was possible to schedule jobs not by time, but when another job finishes. So, when Job A is done, it starts Job B. When that's done, it starts Job C, etc.
Job A -> Job B -> Job C -> Job A... continuously.
Is this the right tool for the job? Or should I be looking into something else?
Check out JobChainingJobListener, which is built into Quartz:
Keeps a collection of mappings of which Job to trigger after the completion of a given job. If this listener is notified of a job completing that has a mapping, then it will then attempt to trigger the follow-up job. This achieves "job chaining", or a "poor man's workflow".
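The chaining idea in that description can be modelled in a few lines. This is a language-neutral toy written in Python, not Quartz's API (Quartz is Java and works with JobKey objects); JobChain and its methods are hypothetical names that only loosely echo addJobChainLink and jobWasExecuted. It runs the A -> B -> C -> A loop from the question, with max_runs added so the demo terminates.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class JobChain:
    """Toy model of job chaining: when a job completes, trigger its mapped follow-up.

    Not Quartz's API; the method names only loosely echo
    JobChainingJobListener.addJobChainLink and JobListener.jobWasExecuted.
    """

    def __init__(self, jobs: Dict[str, Callable[[], None]]) -> None:
        self.jobs = jobs
        self.links: Dict[str, str] = {}  # job name -> follow-up job name
        self.ready: Deque[str] = deque()
        self.history: List[str] = []

    def add_chain_link(self, first: str, second: str) -> None:
        self.links[first] = second

    def trigger(self, name: str) -> None:
        self.ready.append(name)

    def run(self, max_runs: int) -> None:
        # max_runs only stops the demo; an A -> B -> C -> A chain never ends by itself.
        while self.ready and len(self.history) < max_runs:
            name = self.ready.popleft()
            self.jobs[name]()
            self.history.append(name)
            self.job_was_executed(name)

    def job_was_executed(self, name: str) -> None:
        # Completion, not time, is what triggers the next job.
        follow_up = self.links.get(name)
        if follow_up is not None:
            self.trigger(follow_up)


calls: List[str] = []
chain = JobChain({n: (lambda n=n: calls.append(n)) for n in "ABC"})
chain.add_chain_link("A", "B")
chain.add_chain_link("B", "C")
chain.add_chain_link("C", "A")
chain.trigger("A")
chain.run(max_runs=7)
print(chain.history)  # ['A', 'B', 'C', 'A', 'B', 'C', 'A']
```

In this model an exception raised by a job propagates out of run() and stops the loop; how a real scheduler handles a failed link is a separate question.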
That's right, you are looking for a process or workflow engine. Have a look at Activiti or jBPM.
You may want to check out the QuartzDesk project, which I have been involved in. QuartzDesk is a management and monitoring platform for Quartz-based apps, and in version 2.0 we added a new job chaining engine to the platform.
The engine allows you to orchestrate the execution of your jobs and there is no need to modify your application code in any way. Job chains can be dynamically updated through the QuartzDesk GUI without any disruption to your application.