I have a CI workflow for a monorepo in which two projects end up being built. The jobs run fine; however, I'm wondering whether there is a way to remove the duplication in this workflow.yml file around setting up the runner for each job. I keep the jobs split so they run in parallel, since they do not rely on one another, and so the CI finishes faster: it's a big time difference, roughly 5 minutes versus 10+ when waiting for the CI to finish.
jobs:
  job1:
    name: PT.W Build
    runs-on: macos-latest
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v1
      - name: Setup SSH-Agent
        uses: webfactory/ssh-agent@v0.2.0
        with:
          ssh-private-key: |
            ${{ secrets.SSH_PRIVATE_KEY }}
      - name: Setup JDK 1.8
        uses: actions/setup-java@v1
        with:
          java-version: 1.8
      - name: Setup Permobil-Client
        run: |
          echo no | npm i -g nativescript
          tns usage-reporting disable
          tns error-reporting disable
          npm run setup.all
      - name: Build PT.W Android
        run: |
          cd apps/wear/pushtracker
          tns build android --env.uglify
  job2:
    name: SD.W Build
    runs-on: macos-latest
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v1
      - name: Setup SSH-Agent
        uses: webfactory/ssh-agent@v0.2.0
        with:
          ssh-private-key: |
            ${{ secrets.SSH_PRIVATE_KEY }}
      - name: Setup JDK 1.8
        uses: actions/setup-java@v1
        with:
          java-version: 1.8
      - name: Setup Permobil-Client
        run: |
          echo no | npm i -g nativescript
          tns usage-reporting disable
          tns error-reporting disable
          npm run setup.all
      - name: Build SD.W Android
        run: |
          cd apps/wear/smartdrive
          tns build android --env.uglify
You can see here that the jobs follow an almost identical process; the only difference is which app gets built. I'm wondering if there is a way to take the duplicated blocks in the jobs, write them only once, and reuse them in both jobs.
There are 3 main approaches to code reuse in GitHub Actions:
Reusable Workflows
Dispatched workflows
Composite Actions <-- the best fit in your case
The following details are from my article describing their pros and cons:
🔸 Reusing workflows
The obvious option is using the "Reusable workflows" feature that allows you to extract some steps into a separate "reusable" workflow and call this workflow as a job in other workflows.
🥡 Takeaways:
Nested reusable workflow calls are allowed (up to 4 levels deep), while call loops are not permitted.
Env variables are not inherited. Secrets can be inherited by using the special secrets: inherit job param.
It's not convenient if you need to extract and reuse several steps inside one job.
Since it runs as a separate job, you have to use build artifacts to share files between a reusable workflow and your main workflow.
You can call a reusable workflow in a synchronous or asynchronous manner (managing the ordering of jobs with the needs key).
A reusable workflow can define outputs that extract outputs/outcomes from executed steps. They can be easily used to pass data to the "main" workflow.
🔸 Dispatched workflows
Another possibility that GitHub gives us is the workflow_dispatch event, which can trigger a workflow run. Simply put, you can trigger a workflow manually or through the GitHub API and provide its inputs.
There are actions available on the Marketplace which allow you to trigger a "dispatched" workflow as a step of "main" workflow.
Some of them also allow doing it in a synchronous manner (waiting until the dispatched workflow is finished). It is worth saying that this feature is implemented by polling the statuses of the repo's workflows, which is not very reliable, especially in a concurrent environment. It is also bounded by GitHub API usage limits and therefore has a delay in finding out the status of the dispatched workflow.
🥡 Takeaways
You can have multiple nested calls, triggering a workflow from another triggered workflow. If done carelessly, this can lead to an infinite loop.
You need a special token with "workflows" permission; your usual secrets.GITHUB_TOKEN doesn't allow you to dispatch a workflow.
You can trigger multiple dispatched workflows inside one job.
There is no easy way to get some data back from dispatched workflows to the main one.
Works better in "fire and forget" scenarios. Waiting for a dispatched workflow to finish has some limitations.
You can observe dispatched workflows runs and cancel them manually.
🔸 Composite Actions
In this approach we extract steps to a distinct composite action, that can be located in the same or separate repository.
From your "main" workflow it looks like a usual action (a single step), but internally it consists of multiple steps each of which can call own actions.
🥡 Takeaways:
Supports nesting: each step of a composite action can use another composite action.
Bad visualisation of internal step runs: in the "main" workflow it's displayed as a single step run. In the raw logs you can find details of the internal steps' execution, but it doesn't look very friendly.
Shares environment variables with a parent job, but doesn't share secrets, which should be passed explicitly via inputs.
Supports inputs and outputs. Outputs are prepared from outputs/outcomes of internal steps and can be easily used to pass data from composite action to the "main" workflow.
A composite action runs inside the job of the "main" workflow. Since they share a common file system, there is no need to use build artifacts to transfer files from the composite action to the "main" workflow.
You can't use continue-on-error option inside a composite action.
As far as I know, there is currently no way to reuse steps, but in this case you can use a matrix strategy to run the same build in parallel with different variations:
jobs:
  build:
    name: Build
    runs-on: macos-latest
    strategy:
      matrix:
        build-dir: ['apps/wear/pushtracker', 'apps/wear/smartdrive']
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v1
      - name: Setup SSH-Agent
        uses: webfactory/ssh-agent@v0.2.0
        with:
          ssh-private-key: |
            ${{ secrets.SSH_PRIVATE_KEY }}
      - name: Setup JDK 1.8
        uses: actions/setup-java@v1
        with:
          java-version: 1.8
      - name: Setup Permobil-Client
        run: |
          echo no | npm i -g nativescript
          tns usage-reporting disable
          tns error-reporting disable
          npm run setup.all
      - name: Build Android
        run: |
          cd ${{ matrix.build-dir }}
          tns build android --env.uglify
For more information please visit https://help.github.com/en/actions/reference/workflow-syntax-for-github-actions#jobsjob_idstrategy
Since Nov. 2021, "Reusable workflows are generally available":
Reusable workflows are now generally available.
Reusable workflows help you reduce duplication by enabling you to reuse an entire workflow as if it were an action. A number of improvements have been made since the beta was released in October:
You can utilize outputs to pass data from reusable workflows to other jobs in the caller workflow
You can pass environment secrets to reusable workflows
The audit log includes information about which reusable workflows are used
See "Reusing workflows" for more.
A workflow that uses another workflow is referred to as a "caller" workflow.
The reusable workflow is a "called" workflow.
One caller workflow can use multiple called workflows.
Each called workflow is referenced in a single line.
The result is that the caller workflow file may contain just a few lines of YAML, but may perform a large number of tasks when it's run. When you reuse a workflow, the entire called workflow is used, just as if it was part of the caller workflow.
Example:
In the reusable workflow, use the inputs and secrets keywords to define inputs or secrets that will be passed from a caller workflow.
# .github/workflows/my-reusable-workflow.yml
# (reusable workflows live under .github/workflows/)
# Note the special trigger 'on: workflow_call:'
on:
  workflow_call:
    inputs:
      username:
        required: true
        type: string
    secrets:
      envPAT:
        required: true
Reference the input or secret in the reusable workflow.
jobs:
  reusable_workflow_job:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: ./.github/actions/my-action
        with:
          username: ${{ inputs.username }}
          token: ${{ secrets.envPAT }}
Here ./.github/actions/my-action points to a local composite action in your own repository (a directory holding that action's metadata file).
A reusable workflow does not have to be in the same repository; it can live in another, public one.
Davide Benvegnù aka CoderDave illustrates that in "Avoid Duplication! GitHub Actions Reusable Workflows" where:
n3wt0n/ActionsTest/.github/workflows/reusableWorkflowsUser.yml references
n3wt0n/ReusableWorkflow/.github/workflows/buildAndPublishDockerImage.yml@main
Related
I have a bunch of GitHub actions for services that each build a container if something changes in their corresponding directories. Then I have a final GitHub action that sets up a cloud environment with Terraform and uses the containers created by the other actions to deploy the microservices and API for my project.
Currently I'm running that final action manually after all the other actions complete. However, I was wondering if there is any way to automate this. I realize I can chain actions, but I'm unsure how to handle this since any number of actions might be running and might finish in any order.
You can turn each GitHub action you have into a composite action for more reusability and readability.
But you can also just combine all the existing actions into 1 workflow, each in a separate job if they need different runners.
Then you can set dependencies between those jobs using the needs syntax.
At the end, when they all complete, you can have 1 job that has needs for each of those combined, checks whether they succeeded, and based on that executes what you need.
As an example:
jobs:
  job1:
    # ...
  job2:
    # set this dependency to queue the jobs; remove it to run job1 and job2 in parallel
    needs: job1
    # ...
  final_action:
    needs: [job1, job2]
    if: ${{ needs.job1.result == 'success' && needs.job2.result == 'success' }}
    # ... steps that publish your stuff
  notify_error:
    needs: [job1, job2]
    if: ${{ always() && !cancelled() && (needs.job1.result != 'success' || needs.job2.result != 'success') }}
    # ... steps that notify about the failure
In one workflow? You can run all actions in parallel, each in its own job. Then use needs: so the final job depends on all the other jobs, and deploy the application in that job.
What's the difference between GITHUB_REPOSITORY and github.repository, both in value and in usage, in GitHub Actions?
Found the answer in the official GitHub docs.
Determining when to use default environment variables or contexts
GitHub Actions includes a collection of variables called contexts and a similar collection of variables called default environment variables. These variables are intended for use at different points in the workflow:
Default environment variables: These variables exist only on the runner that is executing your job. For more information, see "Default environment variables."
Contexts: You can use most contexts at any point in your workflow, including when default environment variables would be unavailable. For example, you can use contexts with expressions to perform initial processing before the job is routed to a runner for execution; this allows you to use a context with the conditional if keyword to determine whether a step should run. Once the job is running, you can also retrieve context variables from the runner that is executing the job, such as runner.os. For details of where you can use various contexts within a workflow, see "Context availability."
The following example demonstrates how these different types of environment variables can be used together in a job:
name: CI
on: push
jobs:
  prod-check:
    if: ${{ github.ref == 'refs/heads/main' }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "Deploying to production server on branch $GITHUB_REF"
In this example, the if statement checks the github.ref context to determine the current branch name; if the name is refs/heads/main, then the subsequent steps are executed. The if check is processed by GitHub Actions, and the job is only sent to the runner if the result is true. Once the job is sent to the runner, the step is executed and refers to the $GITHUB_REF environment variable from the runner.
Suppose I have three workflows: build_backend, build_frontend and deploy. The first two should trigger in parallel, but the third should only trigger when both of those workflows are finished.
Currently the deploy workflow triggers twice; I suspect that's once for each of the two completed workflows.
# .github/workflows/build-xxx.yml
name: Build and Test - Backend
on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      # ...
# .github/workflows/deploy.yml
name: Deploy
on:
  workflow_run:
    workflows:
      - "Build and Test - Backend"
      - "Build and Test - Frontend"
    types:
      - completed
    branch: master
jobs:
  deploy:
    # ...
I haven't found the solution in the docs:
https://docs.github.com/en/actions/reference/events-that-trigger-workflows#workflow_run
https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions
GitHub Actions does not support trigger definitions like this (waiting for several workflows) for a whole workflow.
However, you can use the needs keyword on the job level, so you could consolidate all of these workflows into one workflow file. It seems like this could work for you, since these workflows all have the same (branch) trigger and each build workflow is only a single job.
There is also a GitHub Roadmap item describing that they are working on adding workflow partials. That would enable you to separate these parts out again in the future if you want to, but it does not seem to be available yet.
name: master builder
on:
  push:
    branches:
      - master
~~~
I have a workflow like this, so whenever I push to the master branch, the actions run.
But I want the build to run only on the last push.
For example:
master branch - feature1 (person1)
master branch - feature2 (person2)
master branch - feature3 (person3)
In this structure, if feature1, feature2 and feature3 are merged at almost the same time, the build will run 3 times.
But I want the master branch to be built only on the last merge, just once.
Is there any way to do this? Like... run the build only once after waiting for about 1 minute when pushing.
Here is sample code where I proceeded the way you answered, but I get the error "The key 'concurrency' is not allowed". What's wrong?
name: test
on:
  push:
    branches:
      - feature/**
concurrency:
  group: ${{ github.ref }}
  cancel-in-progress: true
jobs:
~~~
You may try to achieve this with concurrency and cancel-in-progress: true:
Concurrency ensures that only a single job or workflow using the same concurrency group will run at a time. A concurrency group can be any string or expression. The expression can only use the github context. For more information about expressions, see "Context and expression syntax for GitHub Actions."
You can also specify concurrency at the job level. For more information, see jobs.<job_id>.concurrency.
When a concurrent job or workflow is queued, if another job or workflow using the same concurrency group in the repository is in progress, the queued job or workflow will be pending. Any previously pending job or workflow in the concurrency group will be canceled. To also cancel any currently running job or workflow in the same concurrency group, specify cancel-in-progress: true.
However
Note: Concurrency is currently in beta and subject to change.
Here is an example workflow:
name: Deploy
on:
  push:
    branches:
      - main
      - production
    paths-ignore:
      - '**.md'
# Ensures that only one deploy task per branch/environment will run at a time.
concurrency:
  group: environment-${{ github.ref }}
  cancel-in-progress: true
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Extract commit
        run: |
          echo "Sending commit $GITHUB_SHA for $GITHUB_REPOSITORY"
It's not clear to me from the documentation whether it's even possible to pass one job's output to another job (not from task to task, but from job to job).
I don't know if I'm conceptually doing the right thing; maybe it should be modeled differently in Concourse. What I'm trying to achieve is having the pipeline for a Java project split into several granular jobs, which can be executed in parallel and triggered independently if I need to re-run some job.
How I see the pipeline:
First job:
pulls the code from github repo
builds the project with maven
deploys artifacts to the maven repository (mvn deploy)
updates SNAPSHOT versions of the Maven project submodules
copies artifacts (jar files) to the output directory (output of the task)
Second job:
picks up jars from the output
builds docker containers for all of them (in parallel)
Pipeline goes on
I was unable to pass the output from job 1 to job 2.
Also, I am curious whether any changes I introduce to the original git repo resource will be present in the next job (from job 1 to job 2).
So the questions are:
What is a proper way to pass build state from job to job (I know, jobs might get scheduled on different nodes, and definitely in different containers)?
Is it necessary to store the state in a resource (say, S3/git)?
Is the Concourse stateless by design (in this context)?
Where's the best place to get more info? I've tried the manual; it's just not that detailed.
What I've found so far:
outputs are not passed from job to job
Any changes to the resource (put to the github repo) are fetched in the next job, but changes in the working copy are not
Minimal example (it fails with the error missing inputs: gist-upd, gist-out if the commented lines are uncommented):
---
resources:
- name: gist
  type: git
  source:
    uri: "git@bitbucket.org:snippets/foo/bar.git"
    branch: master
    private_key: {{private_git_key}}

jobs:
- name: update
  plan:
  - get: gist
    trigger: true
  - task: update-gist
    config:
      platform: linux
      image_resource:
        type: docker-image
        source: {repository: concourse/bosh-cli}
      inputs:
      - name: gist
      outputs:
      - name: gist-upd
      - name: gist-out
      run:
        path: sh
        args:
        - -exc
        - |
          git config --global user.email "nobody@concourse.ci"
          git config --global user.name "Concourse"
          git clone gist gist-upd
          cd gist-upd
          echo `date` > test
          git commit -am "upd"
          cd ../gist
          echo "foo" > test
          cd ../gist-out
          echo "out" > test
  - put: gist
    params: {repository: gist-upd}

- name: fetch-updated
  plan:
  - get: gist
    passed: [update]
    trigger: true
  - task: check-gist
    config:
      platform: linux
      image_resource:
        type: docker-image
        source: {repository: alpine}
      inputs:
      - name: gist
      #- name: gist-upd
      #- name: gist-out
      run:
        path: sh
        args:
        - -exc
        - |
          ls -l gist
          cat gist/test
          #ls -l gist-upd
          #cat gist-upd/test
          #ls -l gist-out
          #cat gist-out/test
To answer your questions one by one.
All build state needs to be passed from job to job in the form of a resource which must be stored on some sort of external store.
It is necessary to store it on some sort of external store. Each resource type handles this upload and download itself, so for your specific case I would check out this maven custom resource type, which seems to do what you want.
Yes, this statelessness is the defining trait behind Concourse. The only stateful element in Concourse is a resource, which must be strictly versioned and stored on an external data store. When you combine the containerization of tasks with the external storage of resources, you get the guaranteed reproducibility that Concourse provides. Each version of a resource is backed up on some sort of data store, so even if the data center that your CI runs on were to completely fall down, you would still have strict reproducibility of each of your CI builds.
To get more info, I would recommend doing a tutorial of some kind to get your hands dirty and build a pipeline yourself. Stark & Wayne have a tutorial that could be useful. To help with understanding resources, there is also a resources tutorial, which might be helpful for you specifically.
Also, to get to your specific error: the reason you are seeing missing inputs is that Concourse looks for directories (made by resource gets) named after each of those inputs. So you would need to get resource instances named gist-upd and gist-out prior to starting the task.