How to include a PowerShell script file in a GitLab CI YAML file - powershell

Currently I have a large Bash script in my GitLab CI YAML file. The example below is how I am grouping my functions so that I can use them during my CI process.
.test-functions: &test-functions |
  function write_one() { echo "Function 1"; }
  function write_two() { echo "Function 2"; }
  function write_three() { echo "Function 3"; }

.plugin-nuget:
  tags:
    - test-es
    - kubernetes
  image: mygitlab-url.com:4567/containers/dotnet-sdk-2.2:latest
  script:
    - *test-functions
    - write_one
    - write_two
    - write_three
The example below shows how we can include a YAML file inside another one:
include:
  - project: MyGroup/GitlabCiPlugins/Dotnet
    file: plugin-test.yml
    ref: JayCiTest
I would like to do the same thing with my script. Instead of having the script in the same file as my YAML, I would like to include the file, so that my YAML has access to my script's functions. I would also like to use PowerShell instead of Bash if possible.
How can I do this?

Split shell scripts and GitLab CI syntax
GitLab CI has no feature for including a file's content into a script block.
The GitLab CI include feature doesn't import YAML anchors.
GitLab CI can't concatenate arrays, so you can't define before_script in one .gitlab-ci.yml file and then extend it in another; you can only overwrite it, as the example below shows.
Because of all of these problems you can't easily manage your scripts: split them up, organize them, and do the other nice decomposition things developers expect.
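For example, a minimal sketch reusing the include example from the question (the job content is hypothetical): if an included template defines before_script and the including file defines it again, the list from the template is replaced, not extended:

# plugin-test.yml (included template)
before_script:
  - echo "from the included template"

# .gitlab-ci.yml
include:
  - project: MyGroup/GitlabCiPlugins/Dotnet
    file: plugin-test.yml
before_script:
  - echo "this completely replaces the template's before_script"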
There are possible workarounds. You can store your scripts somewhere a GitLab runner can reach them, and then load them into your current job environment via source /ci-scripts/my-script.sh in the before_script block (a sketch follows the list below).
Possible locations for storing CI scripts:
A special Docker image with all your build/test/deploy utilities and CI scripts
The same, but a dedicated build server
A simple web page hosting your scripts, which you download and import in before_script. Just in case, make sure nobody except the GitLab runner can access it.
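A minimal sketch of the first option, assuming a shared image that ships the functions at /ci-scripts/my-script.sh (the image name and script path are hypothetical):

.plugin-nuget:
  image: mygitlab-url.com:4567/containers/ci-tools:latest
  before_script:
    - source /ci-scripts/my-script.sh   # makes write_one, write_two, write_three available
  script:
    - write_one
    - write_two
    - write_three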
Using PowerShell
You can use PowerShell only if you have installed your GitLab Runner on Windows, and in that case you can't use anything else.
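A minimal sketch of what that could look like, assuming a runner registered on Windows with the PowerShell shell and a repository that ships shared functions in ci/functions.ps1 (the tag, path and function names are hypothetical):

.plugin-nuget:
  tags:
    - windows
  script:
    - . .\ci\functions.ps1   # dot-source the shared functions into the job session
    - Write-One
    - Write-Two
    - Write-Three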

Related

Use local script as source for Argo workflow

I have a python script that I'd like to execute on cloud using an Argo workflow.
Currently, I'm alternating between copying the source code to the workflow itself (using copy and paste), which is inconvenient and causes issues.
The second option is uploading my project directory to an S3 bucket, then downloading the source code to the Argo pod, and then running the commands.
Both methods require some actions to sync the source code after I modify the script.
Is there a way to specify in the Argo workflow where it should take the source code from?
Say, instead of creating a script template that takes the source from a string specified in the .yml file, take it from a local file by specifying a local path?
I'd prefer not to use Git for that.
Also, if possible, I'd prefer solutions that support attaching additional source code files as dependencies.
If you have something more sophisticated than a simple script that you use in the .yml file, it might be worthwhile to pre-build a Docker image for your workflow and use it with a container template.
The image will be named my-script and the entrypoint my-entrypoint.
Assuming the script is a Python file called script.py, you can have the following files:
workflow.yml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: workflow-name-
spec:
  entrypoint: my-entrypoint
  templates:
    - name: my-entrypoint
      container:
        image: my-script
        command: [python3]
        args:
          - script.py
script.py
import requests

response = requests.get('https://www.google.com')  # requests needs the scheme in the URL
print(response.status_code)
requirements.txt
requests
Dockerfile
FROM python:3.11.1-slim
COPY . .
RUN pip3 install -r requirements.txt
CMD python3 script.py
Assuming you can build your image where the cluster can use it (as in the case of minikube), you'd run:
docker build -t my-script .
This approach also makes your code testable, should you decide to add tests. For this it is not necessary to use Git, although I'd encourage you to use it for collaboration and versioning. Also, the COPY command in the Dockerfile copies all files in your directory into the image, so other project files are readily available. I would discourage you from copying actual data in this way; rather, use Argo parameters and artifacts.
Check out https://argoproj.github.io/argo-workflows/workflow-concepts/ for more info
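For run-specific values, here is a minimal sketch of passing an Argo workflow parameter to the script instead of baking the value into the image (the parameter name url is hypothetical, and script.py would have to read it, e.g. from sys.argv):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: workflow-name-
spec:
  entrypoint: my-entrypoint
  arguments:
    parameters:
      - name: url                      # hypothetical run-time input
        value: https://www.google.com
  templates:
    - name: my-entrypoint
      container:
        image: my-script
        command: [python3]
        args: ["script.py", "{{workflow.parameters.url}}"]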

There does not seem to be a good substitute for core.exportVariable in github-script right now

Every time we use core.exportVariable, which, as far as I know, is the canonical way to export a variable in @actions/core and, consequently, in github-script, you get a warning such as this one:
Warning: The set-env command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/
That link leads to an explanation of environment files, which, well, are files. The problem is that files do not seem to have great support in github-script. There's the @actions/io package, but there's no way to create a file with that.
So is there something I'm missing, or is there effectively no way to create an environment file from inside a github-script step?
You no longer need actions/github-script, nor any other special API, to export an environment variable. According to the Environment Files documentation, you can simply write to the $GITHUB_ENV file directly from a workflow step like this:
steps:
  - name: Set environment variable
    run: echo "{name}={value}" >> $GITHUB_ENV
The step will expose the given environment variable to subsequent steps in the currently executing workflow job.
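If you do want to set the variable from inside a github-script step itself, appending to the file that $GITHUB_ENV points to works there too. A minimal sketch, assuming a recent actions/github-script version (the step name and MY_VAR are hypothetical):

steps:
  - name: Export variable from github-script
    uses: actions/github-script@v6
    with:
      script: |
        // Append "NAME=value" to the environment file; later steps in this job will see MY_VAR.
        const fs = require('fs');
        fs.appendFileSync(process.env.GITHUB_ENV, 'MY_VAR=some-value\n');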

Azure DevOps ssh script file task

I'm just wondering if it's possible to use a script file in the SSH task in Releases that gets populated with the environment variables from Azure.
Script:
TEST=$(test)
I saved this script as an artifact, successfully downloaded it, and selected it in the SSH task as a file, but the problem is that the environment variables are not expanded. Does someone have an approach for this?
If I put the same script as an inline script, it works; but if I choose a script file, it does not.
I want to have this script in the Git repo so I can easily edit it.
Does someone have this working?
I'm just wondering if it's possible to use a script file in the SSH task in the Releases that will be populated with the environment variables from azure
I'm afraid there is no out-of-the-box way to use a script file in the SSH task that gets populated with environment variables from Azure.
As a workaround, we can use the Replace Tokens task to update the values in the script file before the SSH task runs it (a sketch follows below).
The variable format in the .sh file is #{TestVar}#.
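A minimal sketch of that workaround in YAML form (the task comes from the qetza "Replace Tokens" marketplace extension; the version and file path here are illustrative, and the classic Release editor exposes the same inputs):

# deploy.sh in the repo contains a line such as:  TEST=#{TestVar}#
- task: replacetokens@3
  inputs:
    targetFiles: '$(System.DefaultWorkingDirectory)/scripts/deploy.sh'
    tokenPrefix: '#{'
    tokenSuffix: '}#'
# The SSH task then runs the updated deploy.sh with the value of TestVar substituted in.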
Hope this helps.

Is there a tool to validate an Azure DevOps Pipeline locally? [closed]

When making changes to YAML-defined Azure DevOps Pipelines, it can be quite tedious to push changes to a branch just to see the build fail with a parsing error (valid YAML, but an invalid pipeline definition) and then fix the problem by trial and error.
It would be nice if the feedback loop could be made shorter by analyzing and validating the pipeline definition locally; basically a linter with knowledge of the various resources etc. that can be defined in an Azure pipeline. However, I haven't been able to find any tool that does this.
Is there such a tool somewhere?
UPDATE: This functionality was removed in Issue #2479 in Oct, 2019
You can run the Azure DevOps agent locally with its YAML testing feature.
Follow the microsoft/azure-pipelines-agent project instructions to install an agent on your local machine.
Then use the docs page Run local (internal only) to access the feature that is available within the agent.
This should get you very close to the type of feedback you would expect.
FYI: this feature has since been removed; see Issue #2479 - remove references to "local run" feature.
Hopefully they'll bring it back later, considering GitHub Actions has the ability to run actions locally.
Azure DevOps has provided a run preview API endpoint that takes a YAML override and returns the expanded YAML. I added this support to the AzurePipelinesPS PowerShell module. The command below will queue a dry run of the pipeline with the id of 01, but with my YAML override, and return the expanded YAML pipeline.
From the API documentation:
Preview - Preview
Service: Pipelines
API Version: 6.1-preview.1
Queues a dry run of the pipeline and returns an object containing the final yaml.
# AzurePipelinesPS session
$session = 'myAPSessionName'
# Path to my local yaml
$path = ".\extension.yml"
# The id of an existing pipeline in my project
$id = 01
# The master branch of my repository
$resources = @{
    repositories = @{
        self = @{
            refName = 'refs/heads/master'
        }
    }
}
Test-APPipelineYaml -Session $session -FullName $path -PipelineId $id -Resources $resources
A pipeline is described with YAML, and YAML can be validated if you have a schema with rules on how that YAML file should be composed. This works as short feedback for the case you described, especially for syntax parsing errors. YAML schema validation is available in almost any IDE. So, we need:
A YAML schema - against which we will validate our pipelines
An IDE (VS Code as a popular example) - which will perform validation on the fly
To configure the two of the above to work together for the greater good
The schema can be found in many places; for this case, I suggest using https://www.schemastore.org/json/
It has an Azure Pipelines schema (this schema contains some issues, like different value types compared to the Microsoft documentation, but it still covers the case of invalid syntax).
VS Code will require an additional plugin to perform YAML text validation. There are quite a few that can validate against a schema; I suggest trying YAML from Red Hat (I know, the plugin's rating is not the best, but it works for validation and is also configurable).
In the settings of that VS Code plugin, you will see a section about validation.
Now you can add the required schema to the settings, even without downloading it to your machine:
"yaml.schemas": {
    "https://raw.githubusercontent.com/microsoft/azure-pipelines-vscode/v1.174.2/service-schema.json": "/*"
}
Simply save the settings and restart VS Code.
You will then notice warnings about issues in your Azure DevOps pipeline YAML files (if there are any).
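If you prefer not to touch the global settings, the same Red Hat YAML extension also supports pinning the schema per file with a modeline comment at the top of the pipeline definition (a sketch; verify against the plugin's documentation for your version):

# yaml-language-server: $schema=https://raw.githubusercontent.com/microsoft/azure-pipelines-vscode/v1.174.2/service-schema.json
trigger:
  - main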
See more details with examples here as well
I can tell you how we manage this disconnect.
We use only pipeline-as-code, i.e. YAML.
We use ZERO YAML templates and strictly enforce one file per pipeline.
We use the Azure YAML extension for VS Code to get linter-like behaviour in the editor.
Most of the actual things we do in the pipelines we do by invoking PowerShell, which via sensible defaulting can also be invoked from the CLI, meaning we can in essence execute anything relevant locally.
Exceptions are configuration of the agent and actual pipeline-only stuff, such as download-artifact and publish tasks.
Let me give some examples:
Here we have the step that builds our front-end components [screenshot omitted].
Here we have that step running in the CLI [screenshot omitted].
I won't post a screenshot of the actual pipeline run, because it would take too long to sanitize it, but it is basically the same, plus some more trace information provided by the run.ps1 call wrapper.
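Since the screenshots don't reproduce here, a hypothetical sketch of the kind of step described above, where the pipeline only wraps a script that can equally be run from a local shell (the task inputs and -Task argument are illustrative):

- task: PowerShell@2
  displayName: Build FrontEnd
  inputs:
    targetType: filePath
    filePath: ./run.ps1
    arguments: -Task BuildFrontEnd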
Such a tool does not exist at the moment - there are a couple of existing issues in their feedback channels:
Github Issues - How to test YAML locally before commit
Developer Community - How to test YAML locally before commit
As a workaround, you can install the Azure DevOps build agent on your own machine, register it in its own build pool, and use it for building and validating the YAML file's correctness. See Jamie's answer in this thread.
Of course this would mean that you have to constantly switch between the official build agents and your own build pool, which is not good. Also, if someone accidentally pushes some change via your own machine, you can suffer from all kinds of problems which can occur on a normal build machine (like UI prompts, running hostile code on your own machine, and so on - hostile code could even be an unintended virus infection caused by executing a 3rd-party executable).
There are two approaches which you can take:
Use Cake (Frosting) to perform the build locally as well as on Azure DevOps.
Use PowerShell to perform the build locally as well as on Azure DevOps.
Generally, comparing 1 versus 2: 1 has more mechanics built in, like publishing on Azure DevOps (it also supports other build system providers, like GitHub Actions, and so on...).
(I myself would propose using the 1st alternative.)
As for 1:
Read for example the following links to get a slightly better understanding:
https://blog.infernored.com/cake-frosting-even-better-c-devops/
https://cakebuild.net/
Search for existing projects using "Cake.Frosting" on GitHub to get some understanding of how those projects work.
As for 2: it's possible to use PowerShell syntax to maximize the functionality done on the build side and minimize the functionality done on the Azure DevOps side.
parameters:
  - name: publish
    type: boolean
    default: true
  - name: noincremental
    type: boolean
    default: false
...
- task: PowerShell@2
  displayName: invoke build
  inputs:
    targetType: 'inline'
    script: |
      # Mimic build machine
      #$env:USERNAME = 'builder'
      # Back up this script in case you need to troubleshoot it later on
      $scriptDir = "$(Split-Path -parent $MyInvocation.MyCommand.Definition)"
      $scriptPath = [System.IO.Path]::Combine($scriptDir, $MyInvocation.MyCommand.Name)
      $tempFile = [System.IO.Path]::Combine([System.Environment]::CurrentDirectory, 'lastRun.ps1')
      if($scriptPath -ne $tempFile)
      {
          Copy-Item $scriptPath -Destination $tempFile
      }

      ./build.ps1 'build;pack' -nuget_servers @{
          'servername' = @{
              'url' = "https://..."
              'pat' = '$(System.AccessToken)'
          }
          'servername2' = @{
              'url' = 'https://...'
              'publish_key' = '$(ServerSecretPublishKey)'
          }
      } `
      -b $(Build.SourceBranchName) `
      -addoperations publish=${{parameters.publish}};noincremental=${{parameters.noincremental}}
In build.ps1, you then handle all the parameters as necessary.
param (
    # Operations can be added using a simple command line like this:
    #   build a -addoperations c=true,d=true,e=false -v
    #   =>
    #   a c d
    #
    [string] $addoperations = ''
)
...
foreach ($operationToAdd in $addoperations.Split(";,"))
{
    if($operationToAdd.Length -eq 0)
    {
        continue
    }

    $keyValue = $operationToAdd.Split("=")
    if($keyValue.Length -ne 2)
    {
        "Ignoring command line parameter '$operationToAdd'"
        continue
    }

    if([System.Convert]::ToBoolean($keyValue[1]))
    {
        $operationsToPerform = $operationsToPerform + $keyValue[0];
    }
}
This allows you to run all the same operations on your own machine locally and minimizes the amount of YAML content.
Please notice that I have also added copying of the last executed .ps1 script to a lastRun.ps1 file.
You can use it after a build if you see some non-reproducible problem but want to run the same command on your own machine to test it.
You can use the ` character to continue ps1 execution on the next line, or in case it's already a complex structure (e.g. @{), it can be continued as it is.
But even though the YAML syntax is minimized, it still needs to be tested if you want different build phases and multiple build machines in use. One approach is to have a special kind of argument, -noop, which does not perform any operation but only prints what was intended to be executed. This way you can run your pipeline in no time and check that everything that was planned to be executed will get executed.

How to run jupyter notebook in airflow

My code is written in Jupyter and saved in .ipynb format.
We want to use Airflow to schedule the execution and define the dependencies.
How can the notebooks be executed in Airflow?
I know I can convert them to Python files first, but the graphs generated on the fly will be difficult to handle.
Is there any easier solution? Thanks
You can also use a combination of Airflow + Papermill.
Papermill
Papermill is a tool for running jupyter notebooks with parameters: https://github.com/nteract/papermill
Running a Jupyter notebook is very easy; you can do it from a Python script:
import papermill as pm

pm.execute_notebook(
    'path/to/input.ipynb',
    'path/to/output.ipynb',
    parameters=dict(alpha=0.6, ratio=0.1)
)
or from CLI:
$ papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1
and it will run the notebook from the input path, create a copy in the output path, and update this copy after each cell run.
Airflow Integration
To integrate it with Airflow, there is a dedicated papermill operator for running parametrized notebooks: https://airflow.readthedocs.io/en/latest/howto/operator/papermill.html
You can set up the same input/output/parameters arguments directly in the DAG definition and use templating for the Airflow variables:
# Airflow 2.x import path, assuming the apache-airflow-providers-papermill package is installed
from airflow.providers.papermill.operators.papermill import PapermillOperator

run_this = PapermillOperator(
    task_id="run_example_notebook",
    dag=dag,
    input_nb="/tmp/hello_world.ipynb",
    output_nb="/tmp/out-{{ execution_date }}.ipynb",
    parameters={"msgs": "Ran from Airflow at {{ execution_date }}!"},
)
We encountered this problem before and spent quite a few days solving it.
We packaged it as a docker file and published on github https://github.com/michaelchanwahyan/datalab.
It is done by modifying the open source package nbparameterize and integrating the passing of arguments such as execution_date. Graphs generated on the fly can also be updated and saved inside the notebook.
When it is executed:
the notebook is read and the parameters are injected
the notebook is executed and the output overwrites the original path
Besides, it also has common tools such as Spark, Keras, TensorFlow, etc. installed and configured.
Another alternative is to use Ploomber (disclaimer: I'm the author). It uses papermill under the hood to build multi-stage pipelines. Tasks can be notebooks, scripts, functions, or any combination of them. You can run locally, on Airflow, or on Kubernetes (using Argo Workflows).
This is what a pipeline declaration looks like:
tasks:
  - source: notebook.ipynb
    product:
      nb: output.html
      data: output.csv
  - source: another.ipynb
    product:
      nb: another.html
      data: another.csv
Repository
Exporting to Airflow
Exporting to Kubernetes
Sample pipelines