How do I cache steps in GitHub Actions?

Say I have a GitHub Actions workflow with two steps:
Download and compile my application's dependencies.
Compile and test my application.
My dependencies rarely change and the compiled dependencies can be safely cached until I next change the lock-file that specifies their versions.
Is there a way to save the result of the first step so that a future workflow run can skip over that step?

Most use-cases are covered by existing actions, for example:
actions/setup-node for JS
docker/build-push-action for Docker
Custom caching is supported via the cache action. It works across both jobs and workflows within a repository. See also: GitHub docs and Examples.
Consider the following example:
name: GitHub Actions Workflow with NPM cache
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Cache NPM dependencies
        uses: actions/cache@v3
        with:
          path: ~/.npm
          key: ${{ runner.OS }}-npm-cache-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.OS }}-npm-cache-
      - name: Install NPM dependencies
        run: npm install
How caching works step-by-step:
At the Cache NPM dependencies step, the action checks whether there is an existing cache for the current key.
If no exact match is found, it checks for partial matches using restore-keys. In this case, if package-lock.json changes, it will fall back to a previous cache. It is useful to prefix keys and restore keys with the OS and the name of the cache, so that files from a different type of cache or OS are not loaded.
If any cache is found, it restores the files to path.
The CI continues to the next step and can use the files loaded from the cache. In this case, npm install will use the files in ~/.npm to save downloading them over the network (note that for NPM, caching node_modules directly is not recommended).
At the end of the CI run, a post-action is executed to save the updated cache in case the key changed. This is not explicitly defined in the workflow; rather, it is built into the cache action to take care of both loading and saving the cache.
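If you ever need explicit control over when the save happens instead of relying on that implicit post-action, newer releases of the cache action also expose restore and save as separate sub-actions. A minimal sketch, assuming the same ~/.npm path as above:
      - name: Restore NPM cache
        uses: actions/cache/restore@v3
        with:
          path: ~/.npm
          key: ${{ runner.OS }}-npm-cache-${{ hashFiles('**/package-lock.json') }}
      - name: Install NPM dependencies
        run: npm install
      - name: Save NPM cache
        uses: actions/cache/save@v3
        with:
          path: ~/.npm
          key: ${{ runner.OS }}-npm-cache-${{ hashFiles('**/package-lock.json') }}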
You can also build your own reusable caching logic with @actions/cache such as:
1-liner NPM cache
1-liner Yarn cache
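For common toolchains, the setup actions now wrap this pattern themselves; for example, actions/setup-node has a built-in cache input, so an NPM cache really is a one-liner (the node-version shown here is just an example):
      - uses: actions/setup-node@v3
        with:
          node-version: 18
          cache: 'npm'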
Old answer:
Native caching is not currently possible; it is expected to be implemented by mid-November 2019.
You can use artifacts (1, 2) to move directories between jobs (within one workflow) as proposed on the GH Community board. This, however, doesn't work across workflows.

The cache action can only cache the contents of a folder. So if there is such a folder, you may win some time by caching it.
For instance, if you use some imaginary package-installer (like Python's pip or virtualenv, or NodeJS' npm, or anything else that puts its files into a folder), you can win some time by doing it like this:
- uses: actions/cache@v2
  id: cache-packages  # give it a name for checking the cache hit-or-not
  with:
    path: ./packages/  # what we cache: the folder
    key: ${{ runner.os }}-packages-${{ hashFiles('**/packages*.txt') }}
    restore-keys: |
      ${{ runner.os }}-packages-
- run: package-installer packages.txt
  if: steps.cache-packages.outputs.cache-hit != 'true'
So what's important here:
We give this step a name, cache-packages
Later, we use this name for conditional execution: if: steps.cache-packages.outputs.cache-hit != 'true'
Give the cache action a path to the folder you want to cache: ./packages/
Cache key: something that depends on the hash of your input files. That is, if any packages.txt file changes, the cache will be rebuilt.
The second step, the package installer, will only run if there was no cache hit
For users of virtualenv: if you need to activate some shell environment, you have to do it in every step. Like this:
- run: . ./environment/activate && command
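For example, a rough sketch with a cached virtualenv, where each later step re-activates it before running a command (the environment directory, requirements file, and pytest command are illustrative, not from the original answer; a standard venv puts its activate script under bin/):
- uses: actions/cache@v2
  id: cache-venv
  with:
    path: ./environment/
    key: ${{ runner.os }}-venv-${{ hashFiles('**/requirements*.txt') }}
- run: python -m venv ./environment && . ./environment/bin/activate && pip install -r requirements.txt
  if: steps.cache-venv.outputs.cache-hit != 'true'
- run: . ./environment/bin/activate && pytest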

My dependencies rarely change and the compiled dependencies can be safely cached until I next change the lock-file that specifies their versions. Is there a way to save the result of the first step so that a future workflow run can skip over that step?
The first step being:
Download and compile my application's dependencies.
GitHub Actions themselves will not do this for you. The only advice I can give you is to adhere to Docker best practices in order to ensure that, if Actions do make use of Docker caching, your image could be re-used instead of rebuilt. See: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-build-cache
When building an image, Docker steps through the instructions in your Dockerfile, executing each in the order specified. As each instruction is examined, Docker looks for an existing image in its cache that it can reuse, rather than creating a new (duplicate) image.
This also implies that the underlying system of GitHub Actions can/will leverage the Docker caching.
However, for things like compilation, Docker won't be able to use its cache mechanism, so I suggest you think carefully about whether this is something you desperately need. The alternative is to download the compiled/processed files from an artifact store (Nexus, NPM, Maven Central) to skip that step. You do have to weigh the benefits against the complexity you are adding to your build.
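If you do build images inside the workflow, one way to lean on Docker layer caching is docker/build-push-action with the GitHub Actions cache backend; a minimal sketch (the tag is a placeholder, and whether layers are actually reused still depends on how the Dockerfile is ordered):
- uses: docker/setup-buildx-action@v2
- uses: docker/build-push-action@v4
  with:
    context: .
    push: false
    tags: my-app:ci  # placeholder tag
    cache-from: type=gha
    cache-to: type=gha,mode=max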

This is now natively supported using: https://help.github.com/en/actions/automating-your-workflow-with-github-actions/caching-dependencies-to-speed-up-workflows.
This is achieved by using the new cache action: https://github.com/actions/cache

If you are using Docker in your workflows, as @peterevans answered, GitHub now supports caching through the cache action, but it has its limitations.
For that reason, you might find this action useful for bypassing the limitations of GitHub's cache action.
Disclaimer: I created the action to support cache before GitHub did it officially, and I still use it because of its simplicity and flexibility.

I'll summarize the two options:
Caching
Docker
Caching
You can add a command in your workflow to cache directories. When that step is reached, it'll check if the directory that you specified was previously saved. If so, it'll grab it. If not, it won't. Then in further steps you write checks to see if the cached data is present. For example, say you are compiling some dependency that is large and doesn't change much. You could add a cache step at the beginning of your workflow, then a step to build the contents of the directory if they aren't there. The first time that you run it won't find the files but subsequently it will and your workflow will run faster.
Behind the scenes, GitHub is uploading a zip of your directory to GitHub's own AWS storage. They purge anything older than a week or if you hit a 2 GB limit.
One drawback with this technique is that it saves just directories. So if you installed into /usr/bin, you'd have to cache that, which would be awkward. You should instead install into $HOME/.local and use echo set-env to add that to your path.
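A rough sketch of that idea on current runners (the ::set-env / ::add-path workflow commands have since been disabled, so the PATH is extended via $GITHUB_PATH instead; the build script and the lock file in the key are placeholders):
- uses: actions/cache@v2
  id: cache-local
  with:
    path: ~/.local
    key: ${{ runner.os }}-local-${{ hashFiles('**/lockfile') }}
- run: ./build-and-install-into-home-local.sh  # placeholder for your long compile/install step
  if: steps.cache-local.outputs.cache-hit != 'true'
- run: echo "$HOME/.local/bin" >> $GITHUB_PATH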
Docker
Docker is a little more complex and it means that you have to have a Docker Hub account and manage two things now. But it's way more powerful. Instead of saving just a directory, you'll save an entire computer! What you'll do is make a Dockerfile that will have in it all your dependencies, like apt-get and python pip lines or even long compilation. Then you'll build that Docker image and publish it on Docker Hub. Finally, you'll have your tests set to run on that new Docker image, instead of on, e.g., ubuntu-latest. And from now on, instead of installing dependencies, it'll just download the image.
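Running jobs on that pre-built image then looks roughly like this (the image name is a placeholder, and make test stands in for your real test command):
jobs:
  test:
    runs-on: ubuntu-latest
    container: myuser/my-ci-image:latest  # placeholder image on Docker Hub
    steps:
      - uses: actions/checkout@v2
      - run: make test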
You can automate this further by storing that Dockerfile in the same GitHub repo as the project and then write a job with steps that will download the latest docker image, rebuild if necessary just the changed steps, and then upload to dockerhub. And then a job which "needs" that one and uses the image. That way your workflow will both update the docker image if needed and also use it.
The downside is that your deps will be in one file, the Dockerfile, and the tests in the workflow, so it's not all together. Also, if the time to download the image is more than the time to build the dependencies, this is a poor choice.
I think that each one has upsides and downsides. Caching is only good for really simple stuff, like compiling into .local. If you need something more extensive, Docker is the most powerful.

GitHub Actions not discovering artifacts uploaded between runs

I have a set of static binaries that I am currently re-downloading every CI run. I need these binaries to test against. I would like to cache these OS-specific binaries on GitHub Actions so I don't need to re-download them every time.
A key consideration here is that the binaries do not change between jobs; they are 3rd-party binaries that I do not want to re-download from the 3rd-party site every time a PR is submitted to GitHub. These binaries are used to test against, and the 3rd party publishes a release once every 6 months.
I have attempted to do this with the upload-artifact and download-artifact flow with GitHub Actions.
I first created an action to upload the artifacts. These are static binaries I would like to cache repository-wide and re-use every time a PR is opened.
Here is the commit that did that:
https://github.com/bitcoin-s/bitcoin-s/runs/2841148806
I pushed a subsequent commit and added logic to download-artifact on the same CI job. When it runs, it claims that there is no artifact with that name, despite the prior commit on the same job having uploaded it:
https://github.com/bitcoin-s/bitcoin-s/pull/3281/checks?check_run_id=2841381241#step:4:11
What am I doing wrong?
Artifacts and cache achieve the same thing, but should be used for different use cases. From the GitHub docs:
Artifacts and caching are similar because they provide the ability to store files on GitHub, but each feature offers different use cases and cannot be used interchangeably.
Use caching when you want to reuse files that don't change often between jobs or workflow runs.
Use artifacts when you want to save files produced by a job to view after a workflow has ended.
In your case you could use caching and set up a cache action. You will need a key and a path, and it will look something like this:
- name: Cache dmg
  uses: actions/cache@v2
  with:
    key: "bitcoin-s-dmg-${{ steps.previoustag.outputs.tag }}-${{ github.sha }}"
    path: ${{ env.pkg-name }}-${{ steps.previoustag.outputs.tag }}.dmg
When there's a cache hit (your key is found), the action restores the cached files to your specified path.
When there's a cache miss (your key is not found), a new cache is created.
By using contexts you can update your key and observe changes in files or directories. E.g. to update the cache whenever your package-lock.json file changes you can use ${{ hashFiles('**/package-lock.json') }}.
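For your case, a rough sketch that only re-downloads the binaries when the cache misses (the path, download URL, and the version file used in the key are placeholders for whatever you actually use):
- name: Cache third-party binaries
  id: bin-cache
  uses: actions/cache@v2
  with:
    path: ~/test-binaries
    key: ${{ runner.os }}-test-binaries-${{ hashFiles('**/binary-version.txt') }}
- name: Download third-party binaries
  if: steps.bin-cache.outputs.cache-hit != 'true'
  run: |
    mkdir -p ~/test-binaries
    curl -L https://example.com/binaries.tar.gz | tar xz -C ~/test-binaries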

How to install an old version of the DirectX API in GitHub Actions

I'm working on an implementation of continuous integration in this project, which requires an old version of the DirectX SDK from June 2010. Is it possible to install this as a part of a GitHub Actions workflow at all? It may build with any version of the SDK as long as it's compatible with Windows 7.
Here's the workflow I've written so far, and here's the general building for Windows guide I'm following...
I have a working setup for a project using DX2010; however, I am not running the installer (which always failed for me during the beta, maybe it's fixed nowadays) but extracting only the parts required for the build. Looking at the link you provided, this is exactly what the guide recommends :)
First, the DXSDK_DIR variable is set using the ::set-env "command". The variable most likely should point to a directory outside the default location, which could be overwritten if the repository is checked out after preparing the DX files.
- name: Config
  run: echo ::set-env name=DXSDK_DIR::$HOME/cache/
  shell: bash
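Note that the ::set-env workflow command has since been disabled on current runners; the equivalent today is writing to $GITHUB_ENV, roughly:
- name: Config
  run: echo "DXSDK_DIR=$HOME/cache/" >> $GITHUB_ENV
  shell: bash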
I didn't want to include the DX files in the repository, so they have to be downloaded when the workflow is running. To avoid doing that over and over again, the cache action is used to keep the files between builds.
- name: Cache
  id: cache
  uses: actions/cache@v1
  with:
    path: ~/cache
    key: cache
And finally, downloading and extracting DX2010. This step will run only if the cache wasn't created previously or the current workflow cannot create/restore caches (like on: schedule or on: repository_dispatch).
- name: Cache create
  if: steps.cache.outputs.cache-hit != 'true'
  run: |
    curl -L https://download.microsoft.com/download/a/e/7/ae743f1f-632b-4809-87a9-aa1bb3458e31/DXSDK_Jun10.exe -o _DX2010_.exe
    7z x _DX2010_.exe DXSDK/Include -o_DX2010_
    7z x _DX2010_.exe DXSDK/Lib/x86 -o_DX2010_
    mv _DX2010_/DXSDK $HOME/cache
    rm -fR _DX*_ _DX*_.exe
  shell: bash
Aaand that's it, the project is ready for compilation.

Deploy individual services from a monorepo using GitHub Actions

I have around 10 individual micro-services, which are mostly cloud functions for various data processing jobs and which all live in a single GitHub repository.
The goal is to trigger the selective deployment of these services to Google Cloud Functions on push to a branch, when an individual function has been updated.
I must avoid the situation in which update of a single service causes the deployment of all the cloud functions.
My current repository structure:
/repo
--/service_A
----/function
----/notebook
--/service_B
----/function
----/notebook
On a side note, what are the pros/cons of using GitHub Actions vs. Google Cloud Build for such automation?
GitHub Actions supports monorepos with path filtering for workflows. You can create a workflow to selectively trigger when files on a specific path change.
https://help.github.com/en/articles/workflow-syntax-for-github-actions#onpushpull_requestpaths
For example, this workflow will trigger on a push when any files under the path service_A/ have changed (note the ** glob to match files in nested directories).
on:
  push:
    paths:
      - 'service_A/**'
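A complete per-service workflow might then look roughly like this; the auth/setup actions, secret name, function name, and runtime are assumptions for the sketch and not part of the original answer:
name: Deploy service_A
on:
  push:
    branches: [main]
    paths:
      - 'service_A/**'
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: google-github-actions/auth@v1
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}  # assumed secret name
      - uses: google-github-actions/setup-gcloud@v1
      - run: |
          gcloud functions deploy service_A \
            --source=service_A/function \
            --runtime=python39 \
            --trigger-http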
You could also run a script to discover which services were changed based on git diff and trigger the corresponding job via the GitHub REST API.
There could be two workflows main.yml and services.yml.
The main workflow will be configured to always start on push, and it will only run a script to find out which services were changed. For each changed service, a repository dispatch event will be triggered with the service name in the payload.
The services workflow will be configured to start on repository_dispatch and will contain one job for each service. Jobs would have an additional condition based on the event payload.
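A rough sketch of both halves of that idea (the event type, secret name, and service detection are illustrative; the dispatch must be sent with a personal access token, because events created with the default GITHUB_TOKEN do not trigger new workflow runs):
# main.yml: after detecting that service_A changed, fire a repository dispatch
- run: |
    curl -X POST \
      -H "Authorization: token ${{ secrets.REPO_DISPATCH_TOKEN }}" \
      -H "Accept: application/vnd.github+json" \
      https://api.github.com/repos/${{ github.repository }}/dispatches \
      -d '{"event_type":"deploy-service","client_payload":{"service":"service_A"}}'
# services.yml: one conditional job per service
on:
  repository_dispatch:
    types: [deploy-service]
jobs:
  deploy_service_A:
    if: github.event.client_payload.service == 'service_A'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: ./deploy-service_A.sh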
See showcase with similar setup:
https://github.com/zladovan/monorepo
It's not a Monorepo
If you only have apps, then I'm sorry... but all you have is a repo of many apps.
A monorepo is a collection of packages that you can map a graph of dependencies between.
Aha, I have a monorepo
But if you have a collection of packages which depend on each other, then read on.
apps/
  one/
    depends:
      pkg/foo
  two/
    depends:
      pkg/bar
      pkg/foo
pkg/
  foo/
  bar/
  baz/
The answer is that you switch to a tool that can describe which packages have changed between the current git ref and some other git ref.
The following two examples run the release npm script on each package that changed under apps/* and all the packages they depend on.
I'm unsure if the pnpm method silently skips packages that don't have a release target/command/script.
Use NX Dev
Using NX.dev, it will work it out for you with its nx affected command.
You need an nx.json in the root of your monorepo.
This assumes you're using the package.json approach with nx.dev; if you have a project.json in each package, then the target would reside there.
Your CI would then look like:
pnpx nx affected --target=release
Pnpm Filtering
Your other option is to switch to pnpm and use its filtering syntax:
pnpm --filter "...{apps/**}[origin/master]" release
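Either approach needs enough git history to diff against the base ref, so the checkout should not be shallow. A rough sketch for the pnpm variant (the pnpm version is illustrative):
jobs:
  release-affected:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0  # full history, so origin/master is available to diff against
      - uses: pnpm/action-setup@v2
        with:
          version: 8
      - run: pnpm install --frozen-lockfile
      - run: pnpm --filter "...{apps/**}[origin/master]" release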
Naive Path Filtering
If you just try to rely on "which paths" changed in this git commit, then you miss out on transitive changes that affect the packages you actually want to deploy.
If you have a GitHub action like:
on:
  push:
    paths:
      - 'apps/**'
Then you won't ever get any builds when you only push commits that change something in pkg/**.
Other interesting GitHub actions
https://github.com/marketplace/actions/nx-check-changes
https://github.com/marketplace/actions/nx-affected-dependencies-action
https://github.com/marketplace/actions/nx-affected-list (a non-nx alternative here is dorny/paths-filter)
https://github.com/marketplace/actions/nx-affected-matrix
Has Changed Path Action might be worth a try:
name: Conditional Deploy
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 100
      - uses: marceloprado/has-changed-path@v1
        id: service_A_deployment
        with:
          paths: service_A
      - name: Deploy service_A
        if: steps.service_A_deployment.outputs.changed == 'true'
        run: /deploy-service_A.sh

Ignore Files In Repo When Deploying To Netlify

I have a GitHub repo that includes some large graphical assets. These assets result in failed deploys to Netlify due to Netlify's size restrictions. Is there any way I can keep these files within the GitHub repo but exclude them from Netlify deploys, in the same way I could use a .slugignore file when deploying to Heroku?
Netlify doesn't really have explicit size restrictions, though uploads of files >20MB may fail. Are your files bigger than that? If so, hosting them on Netlify's CDN also doesn't make sense as the CDN edge cache will ignore them and they'll load slowly for browsers anyway.
To not deploy them, the most straightforward way is to remove them after your build, something like this:
npm run build && rm dist/hugefile1.jpg dist/subdir/hugefile2.pdf
You can get fancier and use a file to list them or just look for everything huge. Warning - something huge that it DOES make sense to host is your sourcemap (if you use one), so watch out for what this might catch!
npm run build && find dist -type f -size +20M -delete
Effectively - you can do anything you could do in a shell script. NB: You need to make sure that your build pipeline fails if any necessary step fails - this is why the examples show && to chain commands rather than ; (build could fail, find succeed, and we publish an empty site!).
More details here: https://www.netlify.com/blog/2016/10/18/how-our-build-bots-build-sites/ and you can test out your scripts using the methodology described here: https://github.com/netlify/build-image#testing-locally

Rustdoc on gh-pages with Travis

I have generated documentation for my project with cargo doc, and it is made in the target/doc directory. I want to allow users to view this documentation without a local copy, but I cannot figure out how to push this documentation to the gh-pages branch of the repository. Travis CI would help me automatically do this, but I cannot get it to work either. I followed this guide, and set up a .travis.yml file and a deploy.sh script. According to the build logs, everything goes fine but the gh-pages branch never gets updated. My operating system is Windows 7.
It is better to use travis-cargo, which is intended to simplify deploying docs and which also has other features. Its readme provides an example of a .travis.yml file, although in the simplest form it could look like this:
language: rust
sudo: false
rust:
  - nightly
  - beta
  - stable
before_script:
  - pip install 'travis-cargo<0.2' --user && export PATH=$HOME/.local/bin:$PATH
script:
  - |
    travis-cargo build &&
    travis-cargo test &&
    travis-cargo --only beta doc
after_success:
  - travis-cargo --only beta doc-upload
# needed to forbid travis-cargo to pass `--feature nightly` when building with nightly compiler
env:
  global:
    - TRAVIS_CARGO_NIGHTLY_FEATURE=""
It is very self-descriptive, so it is obvious, for example, what to do if you want to use another Rust release train for building docs.
In order for the above .travis.yml to work, you need to set your GH_TOKEN somehow. There are basically two ways to do it: inside .travis.yml via an encrypted string, or by configuring it in Travis itself, in the project options. I prefer the latter way, so I don't need to install the travis command-line tool or pollute my .travis.yml (and so the above config file does not contain the secure option), but you may choose otherwise.
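For reference, the encrypted-string alternative would look roughly like this in .travis.yml, with the value produced by the travis CLI (travis encrypt GH_TOKEN=<your token>); the string below is only a placeholder:
env:
  global:
    - TRAVIS_CARGO_NIGHTLY_FEATURE=""
    - secure: "base64-output-of-travis-encrypt-goes-here"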