Run a Concourse build step only when an output of a prior task changes

Given an input, I have a cheap function and an expensive function; each of these is modeled as a Concourse task.
If two invocations of the cheap function have the same output, I know that two invocations of the expensive function will likewise have the same output.
How can I set up a pipeline that only runs the expensive function when the result of the cheap function changes?
For the sake of an example, let's say that the cheap function strips comments and whitespace from a codebase and then calculates a checksum; whereas the expensive function actually runs the code contained. My goal, in this scenario, is to not bother building any revision that differs from the prior one only in comments or whitespace.
I've considered using a git resource and (in our example) storing the hash of the preprocessor output for each compilation target in a separate file, so the task doing the actual compilation (and applicable unit tests) can trigger on changes to the file containing the hash of the inputs that went into building that target. Having a separate git resource that maintains historical hashes indefinitely seems like overkill, though. Is there a better approach?
This is similar to Have Concourse only build new docker containers on file diff not on commit, but I'm trying to test whether the result of running a function against a file changes, to trigger only on changes that could modify build results rather than all possible changes. (The proposal described above, creating an intermediary repo with outputs from the cheap function, would effectively be using the answers to that question as one of its components; but I'm hoping there's an option with fewer moving parts).

Consider using a put nested inside try: in the check task's on_success: hook:
The cheap job takes two inputs:
the git repo with the code
the hash of the last cheap computation
On every commit to code-repo, the cheap job reads the last-hash input (mapped from hash) and compares it to the computation result (in the silly example below, the contents of hash.txt checked into the root of code-repo).
If it determines that the hash value from the incoming commit differs from the previously recorded hash value, it populates the put param hash/hash.txt with the new hash value, which results in a new put to the resource, which in turn triggers the expensive job.
If no change is detected, the put attempt will fail because the file named in the put param will not exist, but the overall cheap job will still succeed.
resources:
- name: code-repo
  type: git
  source:
    branch: master
    private_key: ((key))
    uri: git@github.com:myorg/code-repo.git
- name: hash
  type: s3
  source:
    access_key_id: ((aws_access))
    secret_access_key: ((aws_secret))
    region_name: ((aws_region))
    bucket: my-versioned-aws-bucket
    versioned_file: hash/hash.txt
jobs:
- name: cheap
  plan:
  - get: code-repo
    trigger: true
  - get: hash
  - task: check
    input_mapping:
      last-hash: hash
    config:
      platform: linux
      image_resource:
        type: docker-image
        source: { repository: alpine }
      inputs:
      - name: code-repo
      - name: last-hash
      outputs:
      - name: hash
      run:
        path: /bin/sh
        args:
        - -c
        - |
          LAST="$(cat last-hash/hash.txt)"
          NEW="$(cat code-repo/hash.txt)"
          if [ "$LAST" != "$NEW" ]; then
            cp code-repo/hash.txt hash/hash.txt
          fi
    on_success:
      try:
        put: hash
        params:
          file: hash/hash.txt
- name: expensive
  plan:
  - get: hash
    trigger: true
    passed: [ cheap ]
Note: you must populate the initial state file in S3 with some value, or the cheap job will never run.
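For example, you could seed that file once by hand with the AWS CLI (bucket and key here simply match the values assumed in the pipeline above):

echo "seed" > hash.txt
aws s3 cp hash.txt s3://my-versioned-aws-bucket/hash/hash.txt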

Related

GitHub Actions Environment Variables

I'm sure this must be really simple, but I just can't see it and would appreciate any assistance.
Let's say I have a really simple pipeline like this:
name: Deploy to App Engine
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
env:
  PROJECT_ID: stankardian
  REGION: europe-west2
  APP_NAME: tootler
jobs:
  deploy:
    name: Test Deployments To App Engine
    runs-on: ubuntu-latest
    steps:
      # Checkout the repo code:
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          ref: main
What I am trying to do is re-use the same pipeline for multiple deployment scenarios where the deployment steps stay the same, but I need to be able to use different values in those steps.
For example, the APP_NAME above is 'tootler'. Let's say I need to deploy this to dev, test and preprod. For dev the app name would be 'dev-tootler', in test it would be 'test-tootler', but in preprod it might need to be 'preprod-tootler-v4' or some such.
Ideally I would like to set a single variable somewhere to control the environment I'm deploying into then depending on the value of that variable then load a range of other environment variables with the specific values pertaining to that environment. The example is grossly simplified, but I might need to load 40 variables for each environment and each of those might be a different value (but the same env variable name).
In an ideal world I would like to package the env variables and values in the app directory and load the correct file based on the evaluation of the control variable. E.g.
|
|-- dev.envs
|-- test.envs
|-- preprod.envs
along with :
$env_to_load_for = $env:control_variable
load_env_variables_file($env_to_load_for)
In that pseudocode the value of $env_to_load_for evaluates to the correct filename for the environment I need to work with, then the correct environment variables get loaded.
I have tried running a bash shell script which exports the variables I need, but I'm finding that those variables only exist for the specific step in the pipeline. By the time I list out the environment variables in the next step, they are gone.
Does that make sense? This kind of scenario must be very common, but I can't seem to locate any patterns that explain how to accomplish this. I don't want to go down the route of managing different YAML files per environment when the actions are identical.
Would be very grateful for any assistance.
After a lot of experimentation I think I came up with a good way to achieve this. Posting as an answer in case this information helps someone else in the future.
What I did was:
Add a bash step immediately after the checkout
Use that step to run a shell script, I called the script 'target_selector.sh'
In 'target_selector.sh' I evaluate an existing environment variable which I already set at either the job or workflow scope. This is set to 'dev', 'test' or 'preprod' and will be used to set the context for everything in one single, easy-to-manage value.
I used a case block that, depending on the value of that variable, dot-sources either 'dev.sh', 'test.sh' or 'preprod.sh'. These files I put in a /targets folder.
This is where the magic happens, in those .sh files (in the /targets folder), I added the environment variables I need, for that context, using this syntax:
echo "DEPLOY_TARGET=dev" >> $GITHUB_ENV
echo "APP_NAME=dev_appname" >> $GITHUB_ENV
Turns out that this syntax will write the output of the expression up to the workflow scope. This means that subsequent steps in the workflow can use those variables.
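As a rough sketch of the pieces involved (the file names and the DEPLOY_ENV control variable are placeholders for whatever you actually use), the workflow sets DEPLOY_ENV at the job or workflow scope and adds a step right after checkout:

- name: Select deployment target
  run: bash ./target_selector.sh

and target_selector.sh would be something like:

#!/usr/bin/env bash
# Dot-source the env file matching the control variable so that its
# writes to $GITHUB_ENV apply to every subsequent step in the workflow.
case "$DEPLOY_ENV" in
  dev)     . ./targets/dev.sh ;;
  test)    . ./targets/test.sh ;;
  preprod) . ./targets/preprod.sh ;;
  *)       echo "Unknown target: $DEPLOY_ENV"; exit 1 ;;
esac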
It's a little bit of a faff, but it works and is clearly extremely powerful.
Hope it helps someone, someday.

Concourse Slack alert

I have written a bash script that processes some data and puts it in one file. My intention is to send a Slack alert if there is content in that file; if not, it should not send the alert. Is there a way to do this in Concourse?
You should take advantage of the Concourse community's open source resource types. There's a list here. There is a Slack resource listed on that page, but I use this one (not included in the list above because it has not been added by the authors): https://github.com/cloudfoundry-community/slack-notification-resource.
That will give you the ability to add a put step in your job plan to send a Slack notification. As for the logic of your original ask, you can use try and on_success. Your task might look something like this:
- try:
    task: do-a-thing
    config:
      platform: linux
      image_resource:
        type: registry-image
        source:
          repository: YOUR_TASK_IMAGE
          tag: latest
      inputs:
      - name: some-input
      params:
        FILE: some-file
      run:
        path: /bin/sh
        args:
        - -ec
        - |
          [ -n "$(cat some-input/${FILE})" ]
    on_success:
      put: slack
      params:
        <your slack resource put params go here>
The on_success part will run if the code defined in the task's run section returns 0. The script listed there just checks to see if there are more than zero bytes in the file. Because the task is wrapped in a try step, regardless of whether or not the task succeeds (and hence, sends you a message), the step will succeed and move to the next step in the plan.
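For completeness, if you go with the cloudfoundry-community resource mentioned above, you also need to declare the resource type and the slack resource somewhere in the pipeline, roughly like this (the webhook variable name here is just an example):

resource_types:
- name: slack-notification
  type: docker-image
  source:
    repository: cfcommunity/slack-notification-resource
    tag: latest

resources:
- name: slack
  type: slack-notification
  source:
    url: ((slack-webhook-url))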

How to version build artifacts using GitHub Actions?

My use case is that I want a unique version number for artifacts for each build/run. With current tools like CircleCI, Travis, etc. there is a build number available, which is basically a counter that always goes up, so I can create version strings like 0.1.0-27. This counter is increased each time, even for the same commit.
How can I do something similar with GitHub Actions? GitHub Actions only offers GITHUB_SHA and GITHUB_REF.
GitHub Actions now has a unique number and ID for a run/build in the github context.
github.run_id : A unique number for each workflow run within a repository. This number does not change if you re-run the workflow run.
github.run_number : A unique number for each run of a particular workflow in a repository. This number begins at 1 for the workflow's first run, and increments with each new run. This number does not change if you re-run the workflow run.
github.run_attempt : A unique number for each attempt of a particular workflow run in a repository. This number begins at 1 for the workflow run's first attempt, and increments with each re-run.
ref: https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#github-context
You can reference them in workflows like this:
- name: Output Run ID
  run: echo ${{ github.run_id }}
- name: Output Run Number
  run: echo ${{ github.run_number }}
- name: Output Run Attempt
  run: echo ${{ github.run_attempt }}
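So a version string in the style the question asks about can be built straight from the context (the 0.1.0 base version is just a placeholder):

- name: Build versioned artifact
  run: |
    VERSION="0.1.0-${{ github.run_number }}"
    echo "Building version $VERSION"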
I had the same problem and have just created an action to generate sequential build numbers. Use it like this:
- uses: einaregilsson/build-number@v1
  with:
    token: ${{ secrets.github_token }}
In steps after that you'll have a BUILD_NUMBER environment variable. See more info about using the same build number for different jobs and more at https://github.com/einaregilsson/build-number/
UPDATE: There is now a $GITHUB_RUN_NUMBER variable built into GitHub Actions, so this approach is not needed anymore.
If you want a constant integer increment (1, 2, 3, 4, 5), I haven't found anything in the docs that you could use as such an increment that is aware of how many times that particular workflow has run. There are two solutions I can think of:
Maintaining state in the repo: for example with a count.build file keyed by the workflow ID that you increment on every build. This is my least favourite of the two because it adds other complexities, such as itself triggering a push event. You could store this file somewhere else, like S3 or in a Gist.
Using the date: if you're not worried about the increment being strictly sequential, you could just use the current date and time, for example 0.1.0-201903031310 for today at 13:10 (sketched below).
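A minimal sketch of the date-based option:

- name: Set date-based version
  run: echo "VERSION=0.1.0-$(date +%Y%m%d%H%M)" >> $GITHUB_ENV
- name: Show version
  run: echo "$VERSION"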
Regardless of whether you have Actions beta access, I would definitely feed this back to GitHub.
Hope it helps.
You can use GitVersion to generate incrementing versions from tags in Git. The PR at https://github.com/GitTools/GitVersion/pull/1787 has some details, but basically you can define this job:
- uses: actions/checkout@v1
- name: Get Git Version
  uses: docker://gittools/gitversion:5.0.2-beta1-34-linux-debian-9-netcoreapp2.1
  with:
    args: /github/workspace /nofetch /exec /bin/sh /execargs "-c \"echo $GitVersion_MajorMinorPatch > /github/workspace/version.txt\""

Concourse call job from another job with parameters

I have a job with many tasks like this:
- name: main-job
  serial: true
  plan:
  - aggregate:
    - get: <git-resource>
      passed: [previous-job]
      trigger: true
    - get: <git-resource-3>
  - task: <task-1>
    file: <git-resource>/<path>/<task-1-no-db>.yml
  - task: <task-2>
    tags: ['<specific-tag>']
    file: <git-resource>/<path>/<task-1>.yml
    params:
      DATABASE_HOST: <file>
      DATABASE: <my-db-1>
  - task: <task-2>
    tags: ['<specific-tag>']
    file: <git-resource>/<path>/<task-1>.yml
    params:
      DATABASE_HOST: <file>
      DATABASE: <my-db-1>
The problem for me is that I have to call literally the same job, but with the DATABASE param being my-db-2 instead of my-db-1.
The only way I can see to do this is to create a new job and pass the params, literally copying the entire set of lines. My job is already too fat, as in it has too many tasks, so although copying it is the obvious solution, I am wondering if there's a way to re-use it: either have multiple pipelines and one main pipeline that calls these pipelines with the DATABASE param passed in, or have two small jobs that call this main job with different params, something like this:
- name: <call-main-job-with-db-1>
  serial: true
  plan:
  - aggregate:
    - get: <git-resource>
      passed: [previous-job]
      trigger: true
  - task: <call-main-job-task>
    params:
      DATABASE_HOST: <file>
      DATABASE: <my-db-1>
- name: <call-main-job-with-db-2>
  serial: true
  plan:
  - aggregate:
    - get: <git-resource>
      passed: [previous-job]
      trigger: true
  - task: <call-main-job-task>
    params:
      DATABASE: <my-db-2>
I am not sure if this is even possible since I didn't find any example of this.
Remember you are using YAML, so you can use YAML features like "Anchors"
You will find some additional information about "Anchors" in this link. Look for "EXTRA YAML FEATURES"
YAML also has a handy feature called 'anchors', which let you easily duplicate content across your document. Both of these keys will have the same value:

anchored_content: &anchor_name This string will appear as the value of two keys.
other_anchor: *anchor_name
# Anchors can be used to duplicate/inherit properties
base: &base
  name: Everyone has same name

foo: &foo
  <<: *base
  age: 10

bar: &bar
  <<: *base
  age: 20
Try this for your Concourse Pipeline:
common:
  db_common: &db_common
    serial: true
    plan:
    - aggregate:
      - get: <git-resource>
        passed: [previous-job]
        trigger: true
    - task: <call-main-job-task>
      params:

jobs:
- name: <call-main-job-with-db-1>
  <<: *db_common
  DATABASE_HOST: <file>
  DATABASE: <my-db-1>
- name: <call-main-job-with-db-2>
  <<: *db_common
  DATABASE: <my-db-2>
NOTE: Remember that you can have as many anchors as you want; you can define two or more anchors for the same job/task/resource, etc.
You need to just copy and paste the task as you do in the question description. Concourse expects explicit YAML; there is no branching or logic allowed. If you don't want to copy and paste so much YAML, you can do some YAML generation magic to simplify what you look at and work with, but Concourse will want the full YAML with each job defined separately.
Concourse has this fan-in/fan-out paradigm, where you want to keep the jobs simple and short. Use a scripting language, e.g. Python or Ruby, to make your pipeline creation more flexible.
Personally I use one pipeline.yml.erb file where I render different job templates inside. I try to keep my job.yml.erb files as generic as possible so I can reuse them for different pipelines.
To bring it to the next level you could specify a meta config.yml and use this config inside your templates to generate your pipeline depending on what you specified in the config.
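As a rough illustration of that approach (all names here are made up), a pipeline.yml.erb could loop over a list of databases and emit one job per entry, then be rendered with erb pipeline.yml.erb > pipeline.yml before running fly set-pipeline:

<% databases = ["my-db-1", "my-db-2"] %>
jobs:
<% databases.each do |db| %>
- name: main-job-<%= db %>
  serial: true
  plan:
  - get: code-repo
    trigger: true
  - task: run-against-db
    file: code-repo/ci/task.yml
    params:
      DATABASE: <%= db %>
<% end %>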

Passing parameters between concourse jobs / tasks

What's the best way to pass parameters between concourse tasks and jobs? For example; if my first task generates a unique ID, what would be the best way to pass that ID to the next job or task?
If you are just passing between tasks within the same job, you can use artifacts (https://concourse-ci.org/running-tasks.html#outputs) and if you are passing between jobs, you can use resources (like putting it in git or s3). For example, if you are passing between tasks, you can have a task file
---
platform: linux
image_resource: # ...
inputs:
- name: project-src   # contains the script referenced below
outputs:
- name: unique-id
run:
  path: project-src/ci/fill-in-output.sh
And the script fill-in-output.sh will put the file that contains the unique ID into the unique-id/ path. With that, you can have another task that takes the unique-id output as an input (https://concourse-ci.org/running-tasks.html#inputs) and uses that unique ID file.
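A minimal sketch of what that could look like (the script contents and the consuming task here are hypothetical):

#!/bin/sh
# fill-in-output.sh: write a generated ID into the declared output directory
cat /proc/sys/kernel/random/uuid > unique-id/id.txt

and a later task in the same job plan that consumes it:

---
platform: linux
image_resource: # ...
inputs:
- name: unique-id
run:
  path: /bin/sh
  args: [-c, 'echo "Using ID: $(cat unique-id/id.txt)"']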
In addition to task outputs, resources will automatically place files in their working directory for you.
For example I have a pipeline job as follows
jobs:
- name: build
  plan:
  - get: git-some-repo
  - put: push-some-image
    params:
      build: git-some-repo/the-image
  - task: Use-the-image-details
    config:
      platform: linux
      image_resource:
        type: docker-image
        source:
          repository: alpine
      inputs:
      - name: push-some-image
      run:
        path: sh
        args:
        - -exc
        - |
          ls -lrt push-some-image
          cat push-some-image/repository
          cat push-some-image/digest
We'll see the details of the image push from push-some-image:
+ cat push-some-image/repository
xxxxxxxxx.dkr.ecr.eu-west-1.amazonaws.com/path/image
+ cat push-some-image/digest
sha256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Passing data within a job's tasks can easily be done with input/output artifacts (files), as Clara Fu noted.
For the case between jobs, when simple string data has to be passed and using a git repository is overkill, the 'keyval' resource [1] seems to be a good solution.
The readme describes that the data is stored and managed as a standard properties file.
https://github.com/SWCE/keyval-resource
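Based on the pattern described in that README, usage looks roughly like this: one job writes a properties file into a task output and puts it, and a downstream job gets it back. Double-check the exact file and directory names against the resource's docs; the ones below follow its documented defaults.

resource_types:
- name: keyval
  type: docker-image
  source:
    repository: swce/keyval-resource

resources:
- name: keyval
  type: keyval

jobs:
- name: producer
  plan:
  - task: generate-id
    config:
      platform: linux
      image_resource:
        type: docker-image
        source: { repository: alpine }
      outputs:
      - name: keyvalout
      run:
        path: /bin/sh
        args: [-c, 'echo "UNIQUE_ID=$(date +%s)" > keyvalout/keyval.properties']
  - put: keyval
    params:
      file: keyvalout/keyval.properties
- name: consumer
  plan:
  - get: keyval
    trigger: true
    passed: [producer]
  - task: use-id
    config:
      platform: linux
      image_resource:
        type: docker-image
        source: { repository: alpine }
      inputs:
      - name: keyval
      run:
        path: /bin/sh
        args: [-c, 'cat keyval/keyval.properties']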