GitHub Actions not discovering artifacts uploaded between runs

I have a set of static binaries that I am currently re-downloading every CI run. I need these binaries to test against. I would like to cache these OS-specific binaries on GitHub Actions so I don't need to re-download them every time.
A key consideration here is that the binaries do not change between jobs; they are 3rd-party binaries that I do not want to re-download from the 3rd-party site every time a PR is submitted to GitHub. These binaries are used to test against, and the 3rd party publishes a release once every 6 months.
I have attempted to do this with the upload-artifact and download-artifact flow in GitHub Actions.
I first created an action to upload the artifacts. These are static binaries I would like to cache repository-wide and re-use every time a PR is opened.
Here is the commit that did that:
https://github.com/bitcoin-s/bitcoin-s/runs/2841148806
I pushed a subsequent commit and added logic to download-artifact in the same CI job. When it runs, it claims that there is no artifact with that name, despite the prior commit on the same job having uploaded it:
https://github.com/bitcoin-s/bitcoin-s/pull/3281/checks?check_run_id=2841381241#step:4:11
What am I doing wrong?

Artifacts and cache achieve the same thing, but should be used for different use cases. From the GitHub docs:
Artifacts and caching are similar because they provide the ability to store files on GitHub, but each feature offers different use cases and cannot be used interchangeably.
Use caching when you want to reuse files that don't change often between jobs or workflow runs.
Use artifacts when you want to save files produced by a job to view after a workflow has ended.
In your case you could use caching and set up a cache action. You will need a key and a path, and it will look something like this:
- name: Cache dmg
  uses: actions/cache@v2
  with:
    key: "bitcoin-s-dmg-${{steps.previoustag.outputs.tag}}-${{github.sha}}"
    path: ${{ env.pkg-name }}-${{steps.previoustag.outputs.tag}}.dmg
When there's a cache hit (your key is found), the action restores the cached files to your specified path.
When there's a cache miss (your key is not found), a new cache is created.
By using contexts you can update your key when files or directories change. E.g., to invalidate the cache whenever your package-lock.json file changes, you can include ${{ hashFiles('**/package-lock.json') }} in the key.
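Applied to the question above, a minimal sketch might cache the downloaded 3rd-party binaries and key the cache on whatever file pins their version (the directory, version file, and download script names here are placeholders, not taken from the repository):

- name: Cache third-party binaries
  id: cache-binaries
  uses: actions/cache@v2
  with:
    path: binaries/
    key: ${{ runner.os }}-binaries-${{ hashFiles('**/binary-versions.txt') }}
- name: Download binaries on cache miss
  if: steps.cache-binaries.outputs.cache-hit != 'true'
  run: ./scripts/download-binaries.sh

On a cache hit the download step is skipped entirely, so the 3rd-party site is only contacted when the pinned version changes.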

Related

How to read from github actions cache without writing to it

I'm using github actions cache for persisting remotely downloaded dependencies from tests across CI executions. https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows
The issue I'm having is that I only want the action to write to the cache when it's running on the push event on the master branch. If the workflow was triggered by a pull_request, I'd like it to read from the cache, but not write to it.
The reason for this is that caches originating from master are mostly reusable for any PR, but caches generated from a PR may not be very useful for other CI invocations, because the code is yet to be reviewed and the developer may be trying out things that would just mess up the cache for other invocations.
Right now I'm doing something like this
- name: Cache packages
  uses: actions/cache@v3
  with:
    key: 'cache-${{ github.event_name }}'
    restore-keys: |
      cache-push
    path: |
      /path/to/cache
This way I have 2 cache keys, one for PRs and one for master. Master will always use the cache from the previous master invocation because it will only match cache-push, but PRs will use a different key, cache-pull_request, and fall back to cache-push if it doesn't exist. This way master pushes never use a cache that was generated from a PR, only caches that were generated from the previous master push.
Ideally I'd like the cache-pull_request key to not even exist and just have PRs use cache-push but not write to it at the end of the execution. Is this possible?
EDIT: GitHub now officially supports this as of version 3.2.0 of the cache action!
Original comment:
I've been looking for the same thing and unfortunately it does not seem to be possible. There are open PRs and issues about it in the actions/cache repo on GitHub:
https://github.com/actions/cache/pull/489
So until it gets merged or implemented in some other way, it is not possible with the official GitHub cache action.
I also noticed that this PR had been closed
https://github.com/actions/cache/pull/474
The author closed it himself due to inactivity, but forked it to another repo and implemented it there. See https://github.com/MartijnHols/actions-cache
I have not used this repo myself but it might be worth checking out
Check actions/cache/restore@v3 and actions/cache/save@v3.
You can restore or save cache separately.
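A minimal sketch of that split for the setup described above (keying on github.sha is an assumption so that each saved entry gets a unique key; restore-keys provides the fallback):

- name: Restore cache
  uses: actions/cache/restore@v3
  with:
    path: /path/to/cache
    key: cache-${{ github.sha }}
    restore-keys: |
      cache-

- name: Save cache
  if: github.event_name == 'push' && github.ref == 'refs/heads/master'
  uses: actions/cache/save@v3
  with:
    path: /path/to/cache
    key: cache-${{ github.sha }}

Because the save step is guarded by the if condition, pull requests only ever read the cache, while pushes to master write a fresh entry.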

Referencing an artifact built by Github Actions

The upload/download artifact documentation implies that one should be able to build into the dist folder. My interpretation of this is that we can then reference this content in, for example, a static site, so that a site auto-builds itself for github pages on master pushes. However, it seems that artifacts are only uploaded to a specific location (i.e. GET /repos/{owner}/{repo}/actions/artifacts ) and can be downloaded only in zipped format, which defeats the purpose.
Is there a way to populate the dist folder of the repo, so that the file that was built becomes publicly and permanently accessible as part of the repo, and I can reference it without having to deploy it elsewhere like S3 etc?
Example
Here's a use case:
I have a dashboard which parses some data from several remote locations and shows it in charts. The page is deployed from /docs because it's a Github Pages hosted page.
the web page only reads static, cached data from /docs/cache/dump.json.
the dump.json file is generated via a scheduled Github Action which invokes a script that goes to the data sources and generates the dump.
This is how the web page can function quickly, without on-page lockups due to lengthy data processing, while the dump generation happens in the background. The web page periodically re-reads the /docs/cache/dump.json file to get new data, which should override old data on every scheduled trigger.
The idea is to have the action run and replace the dump.json file periodically, but all I can do is produce an artifact which I then have to manually fetch and unzip. Ideally, it would just replace the current dump.json file in place.
To persist changes made by a build process, it is necessary to add and commit them like after any change to a repo. Several actions exist for this, like this one.
So you would add the following to the workflow:
- name: Commit changes
  uses: EndBug/add-and-commit@v7
  with:
    author_name: Commitobot
    author_email: my@mail.com
    message: "Updating build result!"
    add: "docs/cache/dump.json"

How do I cache steps in GitHub actions?

Say I have a GitHub actions workflow with 2 steps.
Download and compile my application's dependencies.
Compile and test my application
My dependencies rarely change and the compiled dependencies can be safely cached until I next change the lock-file that specifies their versions.
Is there a way to save the result of the first step so that future workflow runs can skip over that step?
Most use-cases are covered by existing actions, for example:
actions/setup-node for JS
docker/build-push-action for Docker
Custom caching is supported via the cache action. It works across both jobs and workflows within a repository. See also: GitHub docs and Examples.
Consider the following example:
name: GitHub Actions Workflow with NPM cache
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Cache NPM dependencies
        uses: actions/cache@v3
        with:
          path: ~/.npm
          key: ${{ runner.OS }}-npm-cache-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.OS }}-npm-cache-
      - name: Install NPM dependencies
        run: npm install
How caching works step-by-step:
At the Cache NPM dependencies step, the action will check if there's an existing cache for the current key
If no cache is found, it will check for partial matches using restore-keys. In this case, if package-lock.json changes, it will fall back to a previous cache. It is useful to prefix keys and restore-keys with the OS and the name of the cache, so it doesn't load files for a different type of cache or OS.
If any cache is found, it will load the files to path
The CI continues to the next step and can use the files loaded from the cache. In this case, npm install will use the files in ~/.npm to save downloading them over the network (note that for NPM, caching node_modules directly is not recommended).
At the end of the CI run a post-action is executed to save the updated cache in case the key changes. This is not explicitly defined in the workflow, rather it is built into the cache action to take care of both loading and saving the cache.
You can also build your own reusable caching logic with @actions/cache, such as:
1-liner NPM cache
1-liner Yarn cache
Old answer:
Native caching is not currently possible, expected to be implemented by mid-November 2019.
You can use artifacts (1, 2) to move directories between jobs (within 1 workflow) as proposed on the GH Community board. This, however, doesn't work across workflows.
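For reference, a minimal sketch of moving a directory between two jobs of one workflow with artifacts (the artifact name, paths, and build script are placeholders):

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: ./build.sh  # assumed to produce the binaries/ directory
      - uses: actions/upload-artifact@v2
        with:
          name: binaries
          path: binaries/
  test:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v2
        with:
          name: binaries
          path: binaries/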
The cache action can only cache the contents of a folder. So if there is such a folder, you may save some time by caching it.
For instance, if you use some imaginary package-installer (like Python's pip or virtualenv, or NodeJS' npm, or anything else that puts its files into a folder), you can save some time by doing it like this:
- uses: actions/cache@v2
  id: cache-packages  # give it a name for checking the cache hit-or-not
  with:
    path: ./packages/  # what we cache: the folder
    key: ${{ runner.os }}-packages-${{ hashFiles('**/packages*.txt') }}
    restore-keys: |
      ${{ runner.os }}-packages-
- run: package-installer packages.txt
  if: steps.cache-packages.outputs.cache-hit != 'true'
So what's important here:
We give this step a name, cache-packages
Later, we use this name for conditional execution: if: steps.cache-packages.outputs.cache-hit != 'true'
Give the cache action a path to the folder you want to cache: ./packages/
Cache key: something that depends on the hash of your input files. That is, if any packages.txt file changes, the cache will be rebuilt.
The second step, the package installer, will only run if there was no cache hit
For users of virtualenv: if you need to activate some shell environment, you have to do it in every step. Like this:
- run: . ./environment/activate && command
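For example, a pip/virtualenv variant of the pattern above might look like this (the environment path and requirements file are placeholders):

- uses: actions/cache@v2
  id: cache-venv
  with:
    path: ./environment/
    key: ${{ runner.os }}-venv-${{ hashFiles('**/requirements.txt') }}
- run: python -m venv ./environment && . ./environment/bin/activate && pip install -r requirements.txt
  if: steps.cache-venv.outputs.cache-hit != 'true'
- run: . ./environment/bin/activate && pytest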
My dependencies rarely change and the compiled dependencies can be safely cached until I next change the lock-file that specifies their versions. Is there a way to save the result of the first step so that future workflow runs can skip over that step?
The first step being:
Download and compile my application's dependencies.
GitHub Actions themselves will not do this for you. The only advice I can give you is that you adhere to Docker best practices in order to ensure that if Actions do make use of docker caching, your image could be re-used instead of rebuilt. See: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-build-cache
When building an image, Docker steps through the instructions in your Dockerfile, executing each in the order specified. As each instruction is examined, Docker looks for an existing image in its cache that it can reuse, rather than creating a new (duplicate) image.
This also implies that the underlying system of GitHub Actions can/will leverage the Docker caching.
However, for things like compilation, Docker won't be able to use its cache mechanism, so I suggest you think very carefully about whether this is something you desperately need. The alternative is to download the compiled/processed files from an artifact store (Nexus, NPM, MavenCentral) to skip that step. You do have to weigh the benefits against the complexity you are adding to your build here.
This is now natively supported using: https://help.github.com/en/actions/automating-your-workflow-with-github-actions/caching-dependencies-to-speed-up-workflows.
This is achieved by using the new cache action: https://github.com/actions/cache
If you are using Docker in your workflows, as @peterevans answered, GitHub now supports caching through the cache action, but it has its limitations.
For that reason, you might find this action useful to bypass GitHub's cache action limitations.
Disclaimer: I created the action to support caching before GitHub did it officially, and I still use it because of its simplicity and flexibility.
I'll summarize the two options:
Caching
Docker
Caching
You can add a command in your workflow to cache directories. When that step is reached, it'll check if the directory that you specified was previously saved. If so, it'll grab it. If not, it won't. Then in further steps you write checks to see if the cached data is present. For example, say you are compiling some dependency that is large and doesn't change much. You could add a cache step at the beginning of your workflow, then a step to build the contents of the directory if they aren't there. The first time that you run it won't find the files but subsequently it will and your workflow will run faster.
Behind the scenes, GitHub is uploading a zip of your directory to github's own AWS storage. They purge anything older than a week or if you hit a 2GB limit.
One drawback of this technique is that it saves just directories. So if you installed into /usr/bin, you'd have to cache that, which would be awkward. You should instead install into $HOME/.local and add that directory to your PATH by appending it to the $GITHUB_PATH file (the older echo ::set-env workflow command has been removed).
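A sketch of that layout (the lock file and install script names are placeholders):

- uses: actions/cache@v2
  id: cache-local
  with:
    path: ~/.local
    key: ${{ runner.os }}-local-${{ hashFiles('**/deps.lock') }}
- run: ./install-deps.sh --prefix "$HOME/.local"
  if: steps.cache-local.outputs.cache-hit != 'true'
- run: echo "$HOME/.local/bin" >> "$GITHUB_PATH"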
Docker
Docker is a little more complex, and it means you have to have a Docker Hub account and manage two things now. But it's way more powerful. Instead of saving just a directory, you'll save an entire computer! You'll make a Dockerfile that has in it all your dependencies, like apt-get and python pip lines, or even long compilations. Then you'll build that Docker image and publish it on Docker Hub. Finally, you'll have your tests run on that new Docker image instead of on, e.g., ubuntu-latest. And from now on, instead of installing dependencies, it'll just download the image.
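A sketch of pointing a job at such a prebuilt image (the image name is a placeholder):

jobs:
  test:
    runs-on: ubuntu-latest
    container:
      image: myuser/my-build-deps:latest
    steps:
      - uses: actions/checkout@v2
      - run: make test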
You can automate this further by storing that Dockerfile in the same GitHub repo as the project and then write a job with steps that will download the latest docker image, rebuild if necessary just the changed steps, and then upload to dockerhub. And then a job which "needs" that one and uses the image. That way your workflow will both update the docker image if needed and also use it.
The downside is that your deps will be in one file, the Dockerfile, while the tests are in the workflow, so it's not all together. Also, if the time to download the image is more than the time to build the dependencies, this is a poor choice.
I think that each one has upsides and downsides. Caching is only good for really simple stuff, like compiling into .local. If you need something more extensive, Docker is the most powerful.

Azure Pipelines: Store git submodules as artifacts and only build as needed

We have a project written in C that depends on several libraries as git submodules. We built an Azure Pipeline to build it, using multiple containers targeting multiple environments.
The challenge is that the build takes more time than we'd like, partly because of the fact that the submodules are being recompiled every time, even though they do not change.
What I'm looking for is a way to build the submodules only when needed, store them as artifacts, and have the main build consume them.
As far as I understand, I can set up a build for the submodule's repos which will poll for changes, but I want my product to depend on specific commits of the submodules - i.e. I'm not always taking the latest submodule version.
So I'm looking to trigger a submodule build whenever we switch to a new commit. Can this be achieved in Azure Pipelines? What would be the best way to manage the artifacts (e.g. store the commit ID as part of the artifact name)?

How you increment the version number using Travis CI?

The project that I am working on is a jQuery plugin. I have managed to get Travis CI to build a test project using Gulp/NodeJS successfully. Now I am trying to work out what workflow to use to bump the version number.
In TeamCity and MyGet there is a setting in the CI server to form a version number pattern that auto increments on each build, which can be used by the build script to update versions in the deployment files and to label the Git repo. However, in the free version of Travis CI, there doesn't seem to be an option for versioning at all.
I have read several articles on continuous deployment with Travis CI, here, here, and here, but none of them even broach the topic of versioning. Obviously, the version needs to be changed for the release. So what am I missing here?
Another problem I noted when going through the documentation is that it mentioned that Travis CI is not able to update the GitHub repository. Doesn't that basically mean it won't be able to create a Git tag?
If there is no way to version from Travis CI, then what is the typical workflow for the release process for such a plugin? Is the versioning always done manually? If so, how could there be "continuous deployment"?
Before it starts running the instructions in your .travis.yml file, Travis will set a bunch of environment variables (in the VM that is building your project) with various bits of information about your build, such as what branch is being built and so on.
You probably want one of these:
TRAVIS_BUILD_NUMBER: The number of the current build (for example, “4”).
TRAVIS_JOB_NUMBER: The number of the current job (for example, “4.1”).
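For example, a .travis.yml could derive a build-specific version string from one of these variables and hand it to the build (the base version and script name are assumptions):

env:
  global:
    - PLUGIN_VERSION=1.2.$TRAVIS_BUILD_NUMBER
script:
  - ./build.sh --version "$PLUGIN_VERSION"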
But it's going to be very difficult to do anything sensible if you don't have control of the repository, because you'll need to commit a .travis.yml file to the root of your source code folder; otherwise Travis won't know what to do.
Use bumped for release versioning. When you're satisfied with the changes in master, run:
bumped release <major|minor|patch>
After you push the changes, either directly or through a release PR, you can check for the presence of new tags in Travis CI and publish the package to the registry automatically.
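On the Travis side, a sketch of publishing only when a tag is pushed might look like this (the npm provider and the $NPM_TOKEN variable are assumptions based on the jQuery-plugin context):

deploy:
  provider: npm
  email: you@example.com
  api_key: $NPM_TOKEN
  on:
    tags: true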
If every PR ends up shipped to your end user without any thought about the impact of those changes, then your version numbers have no meaning.
You don't give your users a way to know whether it is a major change that breaks compatibility or a bug fix, and you don't let them update without worrying about backward compatibility.
Currently, the commit id is your version number.
If you want to give meaning to your version numbers, then you have to think about the impact of your pull requests on the end user (http://semver.org/). You have to choose a version number for a specific PR or a group of PRs.
So basically, since you have to 'think' of a certain version number for a specific version that you want to deliver, you can't automate this process.
Release/tag creation is the way to go : )
You can accomplish this by setting up a script that creates a ~/.netrc file to access the repository. In this file you can specify something like:
machine github.com
login <your-username>
password <token>
Instead of putting in your real password, you can pass a GitHub access token. You can use travis encrypt to register it in the .travis.yml file and export the variable for your script's use. From there, in your script, you can issue regular git commands such as:
git add <some file>
git commit -m "This is $TRAVIS_BUILD_NUMBER"
git push origin <branch>