Not able to update experiment metrics from iterative.ai studio - mlops

I have DVC and GitLab CI integrated using CML, and with Studio as well. Whenever I run an experiment from the Studio dashboard, the new experiment appears on the dashboard, but the metrics don't get updated in Git, and therefore not in the Studio dashboard either. The updated metrics are, however, shown in the merge request comment. Is there a way to update the metrics in the Git repo and in Studio as well? My .gitlab-ci.yml is as follows.
train-and-report:
  image: iterativeai/cml:0-dvc2-base1  # Python, DVC, & CML pre-installed
  script:
    - pip install -r requirements.txt  # Install dependencies
    - dvc repro  # Reproduce pipeline
    # Create CML report
    - echo "## Metrics workflow vs. main" >> report.md
    - git fetch --depth=1 origin main:main
    - dvc metrics diff --show-md main >> report.md
    - echo "## Plots" >> report.md
    - echo "### Training loss function diff" >> report.md
    - dvc plots diff --target dvclive/plots/metrics/number.tsv --show-vega main > vega.json
    - vl2png vega.json > plot.png
    - echo '![](./plot.png "Training Loss")' >> report.md
    - cml comment create report.md
Is there something I am missing to update the metrics within Git itself?

You need to save (commit and push) your results, otherwise neither you, GitLab, nor Iterative Studio will be able to retrieve any results.
...
# add a comment to GitLab
- cml comment create report.md
# upload code/metadata
- cml pr create . # this will git commit, push, and open a merge request
# upload data/artefacts
- dvc push
Note: for dvc push to work, you will need to set up storage credentials if you haven't done so already.
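For example, with an S3 remote already defined in .dvc/config, the job only needs the usual AWS credentials exposed as environment variables (a sketch; the variable names are the standard AWS ones, set as masked CI/CD variables in the GitLab project settings):
train-and-report:
  image: iterativeai/cml:0-dvc2-base1
  script:
    # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY come from masked project
    # CI/CD variables and are exported into the job automatically,
    # so `dvc push` can authenticate against the S3 remote.
    - pip install -r requirements.txt
    - dvc repro
    - cml comment create report.md
    - cml pr create .
    - dvc push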

Related

Is it possible to split up a GitHub workflow such that each step has a separate badge?

I am relatively new to GitHub workflows and testing. I am working in a private GitHub repository with a dozen colleagues. We want to avoid using services like CircleCI for the time being and see how much we can do with just the integrated GitHub Actions, since we are unsure about the kind of access a third-party service would be getting to the repo.
Currently, we have two workflows (each one tests the same code for a separate Python environment) that get triggered on push or pull request in the master branch.
The steps of the workflow are as follows (the full workflow yml file is given at the bottom):
Install Anaconda
Create the conda environment (installing dependencies)
Patch libraries
Build a 3rd party library
Run python unit tests
It would be amazing to know immediately which part of the code failed given a new pull request. Right now, every aspect of the codebase gets tested by a single Python file, run_tests.py. I was thinking of splitting up this file and creating a workflow per aspect I want to test separately, but then I would have to create a whole new environment, patch the libraries and build the 3rd party library every time I want to conduct a single test. These tests already take quite some time.
My question is now: is there any way to avoid doing that? Is there a way to build everything on the Linux server and re-use that, so that it doesn't need to be rebuilt for every test? Is there a way to display a badge per Python test that fails/succeeds, so that we can give more information than just "everything passed" or "everything failed"? Is such a thing better suited for a service like CircleCI (other recommendations are also welcome)?
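To illustrate the kind of reuse I have in mind: caching the built conda environment so other jobs or later runs can restore it instead of rebuilding (an untested sketch; actions/cache and its inputs are real, but the path and cache key are guesses for my setup):
- name: Cache conda environment
  id: conda-cache
  uses: actions/cache@v2
  with:
    path: ~/conda3-env-py3                              # directory created by the Anaconda install step
    key: conda-py3-${{ hashFiles('etc/env-py3.yml') }}  # invalidate when the environment file changes
- name: Install Anaconda3
  if: steps.conda-cache.outputs.cache-hit != 'true'     # skip the rebuild on a cache hit
  run: |
    wget https://repo.anaconda.com/archive/Anaconda3-2020.11-Linux-x86_64.sh --quiet
    bash Anaconda3-2020.11-Linux-x86_64.sh -b -p ~/conda3-env-py3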
Here is the full .yml file for the workflow for the Python 3 environment. The Python 2 one is identical except for the Anaconda environment steps.
name: (Python 3) install and test

# Controls when the workflow will run
on:
  # Triggers the workflow on push or pull request events but only for the master branch
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "build"
  build:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -l {0}
    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v2
      # Install Anaconda3 and update conda package manager
      - name: Install Anaconda3
        run: |
          wget https://repo.anaconda.com/archive/Anaconda3-2020.11-Linux-x86_64.sh --quiet
          bash Anaconda3-2020.11-Linux-x86_64.sh -b -p ~/conda3-env-py3
          source ~/conda3-env-py3/bin/activate
          conda info
      # Updating the root environment. Install dependencies (YAML)
      # NOTE: The environment file (yaml) is in the 'etc' folder
      - name: Install ISF dependencies
        run: |
          source ~/conda3-env-py3/bin/activate
          conda-env create --name isf-py3 --file etc/env-py3.yml --quiet
          source activate env-py3
          conda list
      # Patch Dask library
      - name: Patch dask library
        run: |
          echo "Patching dask library."
          source ~/conda3-env-py3/bin/activate
          source activate env-py3
          cd installer
          python patch_dask_linux64.py
          conda list
      # Install pandas-msgpack
      - name: Install pandas-msgpack
        run: |
          echo "Installing pandas-msgpack"
          git clone https://github.com/abast/pandas-msgpack.git
          # Applying patch to pandas-msgpack (generating files using newer Cython)
          git -C pandas-msgpack apply ../installer/pandas_msgpack.patch
          source ~/conda3-env-py3/bin/activate
          source activate env-py3
          cd pandas-msgpack; python setup.py install
          pip list --format=freeze | grep pandas
      # Compile neuron mechanisms
      - name: Compile neuron mechanisms
        run: |
          echo "Compiling neuron mechanisms"
          source ~/conda3-env-py3/bin/activate
          source activate env-py3
          pushd .
          cd mechanisms/channels_py3; nrnivmodl
          popd
          cd mechanisms/netcon_py3; nrnivmodl
      # Run tests
      - name: Testing
        run: |
          source ~/conda3-env-py3/bin/activate
          source activate env-py3
          export PYTHONPATH="$(pwd)"
          dask-scheduler --port=38786 --dashboard-address=38787 &
          dask-worker localhost:38786 --nthreads 1 --nprocs 4 --memory-limit=100e15 &
          python run_tests.py
Many thanks in advance
Tried:
Building everything in a single github workflow, testing everything in the same workflow.
Expected:
Gaining information on specific steps that failed or worked. Displaying this information as a badge on the readme page.
Actual result:
Only the overall success status can be displayed as a badge; only the success status of "running all tests" is available.

Populate version of s3 file dynamically on Github Actions

I am trying to migrate from CircleCI to GitHub Actions and I am stuck at a step where I am trying to populate the version of an S3 file dynamically.
This is how it is done on CircleCI, and it works fine:
echo "export FILE_LOCATION='s3://xxx-xxx/'${PROJECT_NAME}_$(cat VERSION)'.zip'" >> $BASH_ENV
This is how I tried doing it in the GitHub Actions config:
env:
  NAME: '${{ github.repository }}_$(cat VERSION).zip'
However, I get the following error when I run it on GitHub Actions:
cat: VERSION: No such file or directory
Any idea how to handle such dynamic values on GitHub Actions? TIA
If you want to create an environment variable, add it to the file that $GITHUB_ENV points to, like so:
- run: echo "NAME=${{ github.repository }}_$(cat VERSION).zip" >> $GITHUB_ENV
- run: echo ${{ env. NAME }}
For more information, see the docs on Workflow commands for GitHub Actions / Setting an environment variable
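Applied to the original FILE_LOCATION example, that would look roughly like this (a sketch; the bucket name is the placeholder from the question, and github.event.repository.name is used because ${{ github.repository }} expands to owner/repo):
- run: echo "FILE_LOCATION=s3://xxx-xxx/${{ github.event.repository.name }}_$(cat VERSION).zip" >> $GITHUB_ENV
- run: echo "${{ env.FILE_LOCATION }}"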

Azure DevOps Pipeline - Checkout only folder [duplicate]

My repository in my organisation's DevOps project contains a lot of .NET solutions and some Unity projects as well. When I run my build pipeline, it fails with several of these:
Error MSB3491: Could not write lines to file "obj\Release\path\to\file". There is not enough space on the disk.
I would like the pipeline to check out and fetch only the parts of the repository that are required for a successful build. This might also help with the execution time of the pipeline, since it currently also fetches the whole of my Unity projects with gigabytes of resources, which takes forever.
I would like to spread my projects across multiple repositories, but the admin won't give me more than the one I already have. It got a lot better when I configured git fetch as shallow (--depth=1), but I still get the error every now and then.
This is how I configured the checkout:
steps:
- checkout: self
  clean: true
  # shallow fetch
  fetchDepth: 1
  lfs: false
  submodules: false
The build is done using the VSBuild@1 task.
I can't find a valid solution to my problem except for using multiple repositories, which is not an option right now.
Edit: Shayki Abramczyk's solution #1 works perfectly. Here is my full implementation.
GitSparseCheckout.yml:
parameters:
  access: ''
  repository: ''
  sourcePath: ''

steps:
- checkout: none
- task: CmdLine@2
  inputs:
    script: |
      ECHO ##[command] git init
      git init
      ECHO ##[command] git sparse-checkout: ${{ parameters.sourcePath }}
      git config core.sparsecheckout true
      echo ${{ parameters.sourcePath }} >> .git/info/sparse-checkout
      ECHO ##[command] git remote add origin https://${{ parameters.repository }}
      git remote add origin https://${{ parameters.access }}@${{ parameters.repository }}
      ECHO ##[command] git fetch --progress --verbose --depth=1 origin master
      git fetch --progress --verbose --depth=1 origin master
      ECHO ##[command] git pull --progress --verbose origin master
      git pull --progress --verbose origin master
The checkout is called like this (where the template path has to be adjusted):
- template: ../steps/GitSparseCheckout.yml
  parameters:
    access: anything:<YOUR_PERSONAL_ACCESS_TOKEN>
    repository: dev.azure.com/organisation/project/_git/repository
    sourcePath: path/to/files/
In Azure DevOps you don't have an option to get only part of the repository, but there is a workaround:
Disable the "Get sources" step and get only the sources you want by manually executing the appropriate git commands in a script.
To disable the default "Get sources" step, just specify none in the checkout statement:
- checkout: none
In the pipeline add a CMD/PowerShell task to get the sources manually with one of the following 2 options:
1. Get only part of the repo with git sparse-checkout.
For example, get only the directories src_1 and src_2 within the test folder (lines starting with REM ### are just the usual batch comments):
- script: |
    REM ### this will create a 'root' directory for your repo and cd into it
    mkdir myRepo
    cd myRepo
    REM ### initialize Git in the current directory
    git init
    REM ### set Git sparsecheckout to TRUE
    git config core.sparsecheckout true
    REM ### write the directories that you want to pull to the .git/info/sparse-checkout file (without the root directory)
    REM ### you can add multiple directories with multiple lines
    echo test/src_1/ >> .git/info/sparse-checkout
    echo test/src_2/ >> .git/info/sparse-checkout
    REM ### fetch the remote repo using your access token
    git remote add -f origin https://your.access.token@path.to.your/repo
    REM ### pull the files from the source branch of this build, using the built-in Azure DevOps variable for the branch name
    git pull origin $(Build.SourceBranch)
  displayName: 'Get only test/src_1 & test/src_2 directories instead of entire repository'
Now, in the build tasks, make myRepo the working directory.
Fetching the remote repo using an access token is necessary, since using checkout: none will prevent your login credentials from being used.
At the end of the pipeline you may want to add a step to clean up the myRepo directory (a sketch follows after option 2 below).
2. Get parts of the repo with the Azure DevOps REST API (Git - Items - Get Items Batch).
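Regarding the cleanup step mentioned at the end of option 1, it could be as simple as the following (a sketch using the same cmd-style script; condition: always() makes it run even if the build fails):
- script: |
    REM ### remove the sparse checkout directory created earlier
    rmdir /s /q myRepo
  displayName: 'Clean up myRepo'
  condition: always()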
The other answers work well, but I found a different way using potentially newer features of Git.
This will fetch to a depth of 1 and check out all the files in the root folder plus folder1, folder2 and folder3:
- task: CmdLine@2
  inputs:
    script: |
      git init
      git sparse-checkout init --cone
      git sparse-checkout set folder1 folder2 folder3
      git remote add origin https://<github-username>:%GITHUB_TOKEN%@<your-git-repo>
      git fetch --progress --verbose --depth=1 origin
      git switch develop
  env:
    GITHUB_TOKEN: $(GITHUB_TOKEN)
Maybe it is helpful for you to check out only a specific branch. This works as follows:
resources:
  repositories:
  - repository: MyGitHubRepo
    type: github
    endpoint: MyGitHubServiceConnection
    name: MyGitHubOrgOrUser/MyGitHubRepo
    ref: features/tools

steps:
- checkout: MyGitHubRepo
Or by using the inline syntax, like so:
- checkout: git://MyProject/MyRepo@features/tools # checks out the features/tools branch
- checkout: git://MyProject/MyRepo@refs/heads/features/tools # also checks out the features/tools branch
- checkout: git://MyProject/MyRepo@refs/tags/MyTag # checks out the commit referenced by MyTag.
More information can be found here
A Solution For Pull Request and Master Support
I realized after posting this solution that it is similar to the updated one in the post. However, this solution is a bit richer and more optimized. Most importantly, it uses the pull request merge branch in DevOps for deployments, like the native checkouts do. It also fetches only the needed commits.
Supports multiple folder/path patterns as parameters
Minimal checkout with the bare minimum needed via sparse checkout
Shallow depth, multithreaded fetch, with a sparse index.
It takes into account using the PR merge branch against main rather than the raw PR branch itself if needed.
Uses native System Token already in pipeline
Handles detection and alternative ref flows for master where a merge branch does not exist.
Example Use in your Script:
- job: JobNameHere
  displayName: JobDisplayName Here
  steps:
  - template: templates/sparse-checkout.yml
    parameters:
      checkoutFolders:
      - /Scripts
      - /example-file.ps1
  # other steps
templates/sparse-checkout.yml:
parameters:
- name: checkoutFolders
  default: '*'
  type: object

steps:
- checkout: none
- task: PowerShell@2
  inputs:
    targetType: inline
    script: |
      $useMasterMergeIfAvailable = $true
      $checkoutFolders = ($env:CheckoutFolders | ConvertFrom-Json)
      Write-Host $checkoutFolders
      $sw = [Diagnostics.Stopwatch]::StartNew() # For timing the run.
      $checkoutLocation = $env:Repository_Path
      ################ Setup Variables ###############
      $accessToken = "$env:System_AccessToken";
      $repoUriSegments = $env:Build_Repository_Uri.Split("@");
      $repository = "$($repoUriSegments[0]):$accessToken@$($repoUriSegments[1])"
      $checkoutBranchName = $env:Build_SourceBranch;
      $prId = $env:System_PullRequest_PullRequestId;
      $repositoryPathForDisplay = $repository.Replace("$accessToken", "****");
      $isPullRequest = $env:Build_Reason -eq "PullRequest";
      ################ Configure Refs ##############
      if ($isPullRequest)
      {
          Write-Host "Detected Pull Request"
          $pullRequestRefMap = "refs/heads/$($checkoutBranchName):refs/remotes/origin/pull/$prId"
          $mergeRefMap = "refs/pull/$prId/merge:refs/remotes/origin/pull/$prId";
          $mergeRefRemote = $mergeRefMap.Split(":")[0];
          $remoteMergeBranch = git ls-remote $repository "$mergeRefRemote" # See if a remote merge ref exists for the PR.
          if ($useMasterMergeIfAvailable -and $remoteMergeBranch)
          {
              Write-Host "Remote Merge Branch Found: $remoteMergeBranch" -ForegroundColor Green
              $refMapForCheckout = $mergeRefMap
              $remoteRefForCheckout = "pull/$prId/merge"
          }else{
              Write-Host "No merge from master found (or merge flag is off in script), using pull request branch." -ForegroundColor Yellow
              $refMapForCheckout = $pullRequestRefMap
              $remoteRefForCheckout = "heads/$checkoutBranchName"
          }
          $localRef = "origin/pull/$prId"
      }else{
          Write-Host "This is not a pull request. Assuming master branch as source."
          $localRef = "origin/master"
          $remoteRefForCheckout = "master"
      }
      ######## Sparse Checkout ###########
      Write-Host "Beginning Sparse Checkout..." -ForegroundColor Green;
      Write-Host " | Repository: $repositoryPathForDisplay" -ForegroundColor Cyan
      if (-not (Test-Path $checkoutLocation) ) {
          $out = mkdir -Force $checkoutLocation
      }
      $out = Set-Location $checkoutLocation
      git init -q
      git config core.sparsecheckout true
      git config advice.detachedHead false
      git config index.sparse true
      git remote add origin $repository
      git config remote.origin.fetch $refMapForCheckout
      git sparse-checkout set --sparse-index $checkoutFolders
      Write-Host " | Remote origin configured. Fetching..."
      git fetch -j 4 --depth 1 --no-tags -q origin $remoteRefForCheckout
      Write-Host " | Checking out..."
      git checkout $localRef -q
      Get-ChildItem -Name
      # tree . # Shows a graphical structure - can be large with lots of files.
      ############ Clean up ##################
      if (Test-Path -Path ..\$checkoutLocation)
      {
          Write-Host "`nChecked Out`n#############"
          Set-Location ../
      }
      $sw.Stop()
      Write-Host "`nCheckout Complete in $($sw.Elapsed.TotalSeconds) seconds." -ForegroundColor Green
  displayName: 'Sparse Checkout'
  env:
    Build_Repository_Uri: $(Build.Repository.Uri)
    Build_Reason: $(Build.Reason)
    System_PullRequest_SourceBranch: $(System.PullRequest.SourceBranch)
    System_PullRequest_PullRequestId: $(System.PullRequest.PullRequestId)
    System_PullRequest_SourceRepositoryURI: $(System.PullRequest.SourceRepositoryURI)
    Build_BuildId: $(Build.BuildId)
    Build_SourceBranch: $(Build.SourceBranch)
    CheckoutFolders: ${{ convertToJson(parameters.checkoutFolders) }}
    System_AccessToken: $(System.AccessToken)
    Repository_Path: $(Build.Repository.LocalPath)
With LFS support on Ubuntu and Windows agents
parameters:
  folders: '*'

steps:
- bash: |
    set -ex
    export ORIGIN=$(Build.Repository.Uri)
    export REF=$(Build.SourceVersion)
    export FOLDERS='${{ parameters.folders }}'
    git version
    git lfs version
    git init
    git sparse-checkout init --cone
    git sparse-checkout add $FOLDERS
    git remote add origin $ORIGIN
    git config core.sparsecheckout true
    git config gc.auto 0
    git config advice.detachedHead false
    git config http.version HTTP/1.1
    git lfs install --local
    git config uploadpack.allowReachableSHA1InWant true
    git config http.extraheader "AUTHORIZATION: bearer $(System.AccessToken)"
    git fetch --force --no-tags --progress --depth 1 origin develop $REF
    git checkout $REF --progress --force
  displayName: Fast sparse Checkout
Then use it as a step:
steps:
- checkout: none
- template: fastCheckout.yaml
  parameters:
    folders: 'Folder1 src/Folder2'
You can pass folders as parameters.
The exports are there to make it easier to test the script locally.
This improved checkouts from 10 minutes to 2 minutes.

Is there a way to log error responses from Github Actions?

I am trying to create a bug tracker that allows me to record the error messages of the Python script I run. Here is my YAML file at the moment:
name: Bug Tracker

# Controls when the workflow will run
on:
  # Triggers the workflow on push request events
  push:
    branches: [ main ]
  # Allows you to run this workflow manually from the Actions tab (for testing)
  workflow_dispatch:

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  build:
    # Self Hosted Runner
    runs-on: windows-latest
    # Steps for tracker to get activated
    steps:
      # Checks-out your repository under BugTracker so the job can find it
      - uses: actions/checkout@v2
      - name: setup python
        uses: actions/setup-python@v2
        with:
          python-version: 3.8
      # Runs main script to look for
      - name: Run File and collect bug
        id: response
        run: |
          echo Running File...
          python script.py
          echo "${{steps.response.outputs.result}}"
Every time I run the workflow, I can't save the error code. By "save the error code" I mean, for example, that if the Python script produces "Process completed with exit code 1.", I can save that to a txt file. I've seen cases where the output could be saved if the run succeeds. I've thought about catching the error inside the Python script, but I don't want to have to add the same code to every file if I don't have to. Any thoughts? I greatly appreciate any help or suggestions.
Update: I have been able to successfully save to the txt file from within Python. However, I'm still looking for a way to do this in GitHub Actions, if anyone has any suggestions.
You could:
redirect the output to a log file while capturing the exit code
set an output with the exit code value, like:
echo ::set-output name=status::$status
in another step, commit the log file
in a final step, check that the exit code is success (0), otherwise exit the script with this exit code
Using ubuntu-latest, it would be like this:
name: Bug Tracker
on: [push, workflow_dispatch]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: setup python
        uses: actions/setup-python@v2
        with:
          python-version: 3.8
      - name: Run File and collect logs
        id: run
        run: |
          echo Running File...
          status=$(python script.py > log.txt 2>&1; echo $?)
          cat log.txt
          echo ::set-output name=status::$status
      - name: Commit log
        run: |
          git config --global user.name 'GitHub Action'
          git config --global user.email 'action@github.com'
          git add -A
          git checkout master
          git diff-index --quiet HEAD || git commit -am "deploy workflow logs"
          git push
      - name: Check run status
        if: steps.run.outputs.status != '0'
        run: exit "${{ steps.run.outputs.status }}"
On Windows, I think you would need to update this part:
status=$(python script.py > log.txt 2>&1; echo $?)
cat log.txt
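An untested PowerShell sketch of a Windows equivalent ($LASTEXITCODE holds the exit code of the last native command; shell: pwsh is set explicitly so the syntax below applies):
- name: Run File and collect logs
  id: run
  shell: pwsh
  run: |
    python script.py *> log.txt                 # redirect all output streams to the log
    $status = $LASTEXITCODE                     # exit code of the python call
    Get-Content log.txt
    echo "::set-output name=status::$status"
    exit 0                                      # keep this step green; the status is re-checked in the final step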

Publish releases to gh-pages with travis without deleting previous releases

I want to publish releases to gh-pages. The deploy provider can be configured to keep history using keep-history: true. However, I would like previous versions to remain present in the repository, not just be available in the git history.
I've configured yarn and webpack to create a separate directory for each tag and to put the distribution both in a "latest" directory and in this tag-specific directory. I would like to see a tag directory for all previous versions, not just for the latest version.
Here are the results of my current configuration: https://github.com/retog/rdfgraphnode-rdfext/tree/gh-pages
I found the following solution.
In .travis.yml, replace:
- provider: pages
  skip-cleanup: true
  github-token: $GITHUB_TOKEN
  keep-history: true
  local-dir: distribution
  on:
    tags: true
With:
- provider: script
  script: bash .travis_publish
  on:
    tags: true
And add the script file .travis_publish with the following content:
#!/bin/bash
PUBLICATION_BRANCH=gh-pages
# Checkout the branch
REPO_PATH=$PWD
pushd $HOME
git clone --branch=$PUBLICATION_BRANCH https://${GITHUB_TOKEN}@github.com/$TRAVIS_REPO_SLUG publish 2>&1 > /dev/null
cd publish
# Update pages
cp -r $REPO_PATH/distribution .
# Commit and push latest version
git add .
git config user.name "Travis"
git config user.email "travis@travis-ci.org"
git commit -m "Updated distribution."
git push -fq origin $PUBLICATION_BRANCH 2>&1 > /dev/null
popd
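Note that the script expects GITHUB_TOKEN to be set (TRAVIS_REPO_SLUG is provided by Travis itself). One common way is an encrypted variable in .travis.yml, roughly like this (a sketch; the secure value is the output of travis encrypt GITHUB_TOKEN=<token>):
env:
  global:
    - secure: "<encrypted GITHUB_TOKEN=... value>"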