Avoid git clean with Azure DevOps self-hosted Build Agent - azure-devops

I have a YAML build script in an Azure hosted git repository which gets triggered across 7 build agents running on a local VM. Every time this runs, the build performs a git clean which takes a significant amount of time due to a large node_modules folder which takes a long time to clean up.
The MSDN page here seems to suggest this is configurable but shows no detail of how to configure it. I can't tell whether this is a setting that should be specified on the agent, the YAML script, within DevOps on the pipeline, or where.
Is there any other documentation I'm missing or is this not possible?
Update:
The start of the YAML file is here:
variables:
  BUILD_VERSION: 1.0.0.$(Build.BuildId)
  buildConfiguration: 'Release'
  process.clean: false
jobs:
#############################################################
###### 1 - Build and publish .NET
#############################################################
- job: net_build_publish
  displayName: .NET build and publish
  pool:
    name: default
  steps:
  - script: echo $(BUILD_VERSION)
  - task: DotNetCoreCLI@2
    displayName: dotnet build $(buildConfiguration)
    inputs:
      command: 'build'
      projects: |
        myrepo/**/API/*.csproj
      arguments: '-c $(buildConfiguration) /p:Version=$(BUILD_VERSION)'
The complete YAML is a lot longer, but the output from the first job includes this in its Checkout task:
Starting: Checkout myrepo#master to s
==============================================================================
Task : Get sources
Description : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.
Version : 1.0.0
Author : Microsoft
Help : [More Information](https://go.microsoft.com/fwlink/?LinkId=798199)
==============================================================================
Syncing repository: myrepo (Git)
Prepending Path environment variable with directory containing 'git.exe'.
git version
git version 2.26.2.windows.1
git lfs version
git-lfs/2.11.0 (GitHub; windows amd64; go 1.14.2; git 48b28d97)
git config --get remote.origin.url
git clean -ffdx
Removing myrepo/Data/Core/API/bin/
Removing myrepo/Data/Core/API/customersettings.json
Removing myrepo/Data/Core/API/obj/
Removing myrepo/Data/Core/Shared/bin/
Removing myrepo/Data/Core/Shared/obj/
....
We have another job further down which runs npm install and npm build for an Angular project, and every build in the pipeline is taking 5 minutes to perform the npm install step, possibly because of this git clean when retrieving the repository?

Click on your pipeline to show the run history
Click Edit
Click the 3 dot kebab menu
Click Triggers
Click YAML
Click Get Sources
Set Clean to False and Save
To say this is obfuscated is an understatement!
I can't say what effect this will have though. I think the agent reuses the same folder each time a pipeline runs, and I'm not a Node.js developer, so I don't know what leaving old node_modules hanging around will do!
P.S. What people were saying about pipeline caching isn't quite what you were asking. Pipeline caching zips up the cached folder, uploads it to your artifacts storage, and downloads it again on each run, so if you only have one build agent then not doing a git clean might actually be more efficient. I'm not 100% sure.
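For reference, the YAML schema also has a job-level workspace setting that controls what the agent wipes between runs on self-hosted agents, which is the kind of knob the page linked in the question is describing. A minimal sketch only; check the workspace documentation for exactly what each value removes before relying on it:
jobs:
- job: net_build_publish
  pool:
    name: default
  workspace:
    clean: outputs   # documented values: outputs | resources | all
  steps:
  - script: echo build steps here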

As I mentioned, you need to calculate a hash before you run npm install. If the hash is the same as the one kept alongside node_modules, you can skip installing dependencies. This may help you achieve it:
steps:
- task: PowerShell@2
  displayName: 'Calculate and save package-lock.json hash'
  inputs:
    targetType: 'inline'
    pwsh: true
    script: |
      # generates a hash of package-lock.json
      $newHash = (Get-FileHash -Algorithm MD5 -Path (Get-ChildItem package-lock.json)).Hash
      $hashPath = "$(System.DefaultWorkingDirectory)/cache-npm/hash.txt"
      if(Test-Path -Path $hashPath) {
        if(Compare-Object -ReferenceObject $(Get-Content $hashPath) -DifferenceObject $newHash) {
          # hashes differ, so dependencies must be (re)installed; store the new hash
          $newHash > $hashPath
          Write-Host ("Hash file saved to " + $hashPath)
        } else {
          # hashes are the same, so the install step can be skipped
          Write-Host "no need to install node_modules"
          Write-Host "##vso[task.setvariable variable=NodeModulesAreUpToDate;]true"
        }
      } else {
        $newHash > $hashPath
        Write-Host ("Hash file saved to " + $hashPath)
      }
      $storedHash = Get-Content $hashPath
      Write-Host $storedHash
    workingDirectory: '$(System.DefaultWorkingDirectory)/cache-npm'
- script: npm install
  workingDirectory: '$(Build.SourcesDirectory)/cache-npm'
  condition: ne(variables['NodeModulesAreUpToDate'], true)

git clean -ffdx will remove everything untracked by source control from the sources directory. You may try Pipeline caching, which can help reduce build time by allowing the outputs or downloaded dependencies from one run to be reused in later runs, reducing or avoiding the cost of recreating or redownloading the same files. Check the following link:
https://learn.microsoft.com/en-us/azure/devops/pipelines/release/caching?view=azure-devops#nodejsnpm
variables:
  npm_config_cache: $(Pipeline.Workspace)/.npm
steps:
- task: Cache@2
  inputs:
    key: 'npm | "$(Agent.OS)" | package-lock.json'
    restoreKeys: |
      npm | "$(Agent.OS)"
    path: $(npm_config_cache)
  displayName: Cache npm
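The cache task is then paired with the install step, which picks up the shared cache through the npm_config_cache variable defined above; a minimal follow-on sketch:
- script: npm ci
  displayName: Install dependencies (reuses $(npm_config_cache) on a cache hit)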

The checkout step allows us to set the boolean option clean to true or false. The default is true, so it runs git clean by default.
Below is a minimal example with clean set to false.
jobs:
- job: Build_Job
  timeoutInMinutes: 0
  pool: 'PoolOne'
  steps:
  - checkout: self
    clean: false
    submodules: recursive
  - task: PowerShell@2
    displayName: Make build
    inputs:
      targetType: 'inline'
      script: |
        bash -c 'make'
More documentation and related options can be found here

Related

Checkov scan particular folder or PR custom branch files

Trying to run Checkov (for IaC validation) via Azure DevOps YAML pipelines, for ARM template files stored in Azure DevOps version control. The code below:
trigger: none
pool:
  vmImage: ubuntu-latest
stages:
- stage: 'runCheckov'
  displayName: 'Checkov - Scan ARM files'
  jobs:
  - job: 'RunCheckov'
    displayName: 'Checkov solution'
    steps:
    - bash: |
        docker pull bridgecrew/checkov
      workingDirectory: $(System.DefaultWorkingDirectory)
      displayName: 'Pull bridgecrew/checkov image'
    - bash: |
        docker run \
          --volume $(pwd):/scripts bridgecrew/checkov \
          --directory /scripts \
          --output junitxml \
          --soft-fail > $(pwd)/CheckovReport.xml
      workingDirectory: $(System.DefaultWorkingDirectory)
      displayName: 'Run checkov'
    - task: PublishTestResults@2
      inputs:
        testRunTitle: 'Checkov run results'
        failTaskOnFailedTests: false
        testResultsFormat: 'JUnit'
        testResultsFiles: 'CheckovReport.xml'
        searchFolder: '$(System.DefaultWorkingDirectory)'
        mergeTestResults: false
        publishRunAttachments: true
      displayName: 'Publish Test results'
The problem: how do I change the path/folder of ARM templates to scan? At the moment it scans all ARM templates found under my whole repo1, regardless of what directory value I set.
Also, how do I scan only the files committed to a custom branch during a PR review, so that the build is triggered but scans just the files in that branch? I know how to trigger the build via the DevOps repository settings, but again, how do I make sure the build pipeline uses/scans the files of that particular PR commit, not the whole repo1 (and master branch)?
I recommend you use the Docker image bridgecrew/checkov to set up a container job to run the Checkov scan. The container job will run all the tasks of the job inside a Docker container started from this image.
In the container job, you can check out the source repository into the container, then use a script task (such as the Bash task) to run the Checkov CLI to scan the files. On the script task, you can use the workingDirectory option to specify the path/folder the command lines run in. Normally, the command lines will only act on files in the specified directory and its subdirectories.
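As an illustration of limiting the scan, the docker run step from the question can mount only the folder you want scanned; the arm-templates path below is just a placeholder for your template folder:
- bash: |
    # mount only the folder to be scanned so Checkov never sees the rest of the repo
    docker run \
      --volume $(System.DefaultWorkingDirectory)/arm-templates:/scripts bridgecrew/checkov \
      --directory /scripts \
      --output junitxml \
      --soft-fail > $(System.DefaultWorkingDirectory)/CheckovReport.xml
  workingDirectory: $(System.DefaultWorkingDirectory)
  displayName: 'Run checkov on a single folder'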
If you want to scan only the files of a specific branch in the job, you can clone/check out that branch into the working directory of the job in the container, then, as above, use the Checkov CLI to scan the files under the specified directory.
[UPDATE]
In the pipeline job, you can try to call the Azure DevOps REST API "Commits - Get Changes" to get all the changed files and folders for the particular commit.
Then use the Checkov CLI with the parameter --directory (-d) or --file (-f) to scan the specified file or folder.
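A rough sketch of that idea as a Bash step, assuming the repository is called repo1, the organization/project placeholders match your setup, and jq is available on the agent; the parsing may need adjusting to the actual 'Commits - Get Changes' payload:
- bash: |
    # ask Azure DevOps which files were changed by the commit that triggered this run
    CHANGES=$(curl -s -u :$SYSTEM_ACCESSTOKEN \
      "https://dev.azure.com/MyOrg/MyProj/_apis/git/repositories/repo1/commits/$(Build.SourceVersion)/changes?api-version=6.0")
    # keep only the changed .json files (the ARM templates)
    FILES=$(echo "$CHANGES" | jq -r '.changes[].item.path' | grep '\.json$' || true)
    # build a --file argument per changed template and run Checkov once
    ARGS=""
    for f in $FILES; do
      ARGS="$ARGS --file /scripts$f"
    done
    if [ -n "$ARGS" ]; then
      docker run --volume $(System.DefaultWorkingDirectory):/scripts bridgecrew/checkov \
        $ARGS --output junitxml --soft-fail > $(System.DefaultWorkingDirectory)/CheckovReport.xml
    fi
  env:
    SYSTEM_ACCESSTOKEN: $(System.AccessToken)
  workingDirectory: $(System.DefaultWorkingDirectory)
  displayName: 'Scan only the files changed by the triggering commit'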

How to rename multiple PowerBi reports in Dev, QA, PROD during CICD using Azure DevOps Yaml Pipeline?

In my repository, I have 2 PowerBi reports (.pbix files) in a folder:
Report1.pbix
Report2.pbix
In my YAML pipeline, I am creating an artifact named "reports" and copying all the reports into it in the CI step.
Then in the CD steps (Dev, QA, and Prod), I am deploying them using the PowerBIActions@5 task. Everything is working as expected, but I want to rename them in each environment.
For Example in DEV:
DEV - Report1.pbix
DEV - Report2.pbix
For Example in QA:
QA - Report1.pbix
QA - Report2.pbix
For Example in PROD:
PROD - Report1.pbix
PROD - Report2.pbix
This should be generic; in the future I may have more reports. I have used the Copy Files task, but it does not have an option to provide the destination filename. There are other options like CmdLine ren, PowerShell Rename-Item, and PowerShell Copy-Item, but they copy or rename a single file at a time. In my case there are 2 reports now and the number will increase in the future, so I do not want to add a separate Rename-Item task for each report. I think a loop or something similar is required. Guidance will be appreciated.
Thanks.
The code below is a sample for the 'Dev-' prefix. You can capture the current environment and rename all of the '.pbix' files accordingly.
trigger:
- none
pool:
  vmImage: ubuntu-latest
steps:
- script: |
    dir
  displayName: 'Run a multi-line script'
- task: PythonScript@0
  inputs:
    scriptSource: 'inline'
    script: |
      # walk the folder and rename .pbix files with a specific prefix
      import os

      def rename_files(path, env):
          for file in os.listdir(path):
              if file.endswith(".pbix"):
                  os.rename(os.path.join(path, file), os.path.join(path, env + file))
                  print(file)
                  print(os.path.join(path, env + file))
              else:
                  pass

      rename_files(".", "Dev-")
- script: |
    dir
  displayName: 'Run a multi-line script'
This runs successfully on my side. After renaming the files, copy them to the target location.
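If you prefer to stay in PowerShell, a single pipelined Rename-Item handles any number of reports. A minimal sketch, where envPrefix is a hypothetical variable set per stage (for example 'QA - ') and the artifact path is an assumption about where the reports were downloaded:
- task: PowerShell@2
  displayName: 'Prefix all .pbix reports with the environment name'
  inputs:
    targetType: 'inline'
    script: |
      # e.g. Report1.pbix -> QA - Report1.pbix for every report in the folder
      Get-ChildItem -Path '$(Pipeline.Workspace)/reports' -Filter '*.pbix' |
        Rename-Item -NewName { '$(envPrefix)' + $_.Name }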

How do you copy azure repo folders to a folder on a VM in an Environment in a pipeline?

I have an Environment called 'Dev' that has a resource, which is a VM. As part of the 'Dev' pipeline I want to copy files from a specific folder on the develop branch of a specific repo to a specific folder on the VM that's on the Environment.
I've not worked with Environments before or yaml pipelines much but I gather I need to use the CopyFiles#2 task.
So I've got an azure pipeline yaml file something like this:
variables:
  isDev: $[eq(variables['Build.SourceBranch'], 'refs/heads/develop')]
stages:
- stage: Build
  jobs:
  - job: Build
    pool:
      vmImage: 'windows-latest'
    steps:
    - task: CopyFiles@2
      displayName: 'Copy Files'
      inputs:
        contents: 'myFolder\**'
        Overwrite: true
        targetFolder: $(Build.ArtifactStagingDirectory)
    - task: PublishBuildArtifacts@1
      inputs:
        pathToPublish: $(Build.ArtifactStagingDirectory)
        artifactName: myArtifact
- stage: Deployment
  dependsOn: Build
  condition: and(succeeded(), eq(variables.isDev, true))
  jobs:
  - deployment: Deploy
    displayName: Deploy to Dev
    pool:
      vmImage: 'windows-latest'
    environment: Dev
    strategy:
      runOnce:
        deploy:
          steps:
          - script: echo Foo Bar
The first question is: how do I get this to copy the files to a specific path on the Dev environment?
Is the PublishBuildArtifacts step really needed? I ask because I want this to copy files every time the pipeline runs and not error if the artifact already exists.
It also feels a bit dirty to have to check that the branch is the correct one this way. Is there a better way to do it?
The deployment strategy you're using relies on specifying an agent pool, which means it doesn't run on the machines in the environment. If you use a strategy such as rolling, it will run the specified steps on those machines automatically, including any download steps to download artifacts.
Ref: https://learn.microsoft.com/en-us/azure/devops/pipelines/process/deployment-jobs?view=azure-devops#deployment-strategies
You need to publish artifacts as part of the pipeline if you want them to be automatically available to down-stream jobs. Each run will get a different set of artifacts, even if the actual artifact contents are the same.
That said, based on the YAML you posted, you probably don't need to. In fact, you don't need the "build" stage at all. You could just add a checkout step during your rolling deployment, and the repo would be cloned on each of the target machines.
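A minimal sketch of that suggestion, assuming the VMs are registered as virtual machine resources in the Dev environment; the target folder below is a placeholder:
- stage: Deployment
  dependsOn: Build
  condition: and(succeeded(), eq(variables.isDev, true))
  jobs:
  - deployment: Deploy
    displayName: Deploy to Dev
    environment:
      name: Dev
      resourceType: VirtualMachine
    strategy:
      rolling:
        deploy:
          steps:
          - checkout: self                       # clones the repo on each target VM
          - task: CopyFiles@2
            inputs:
              sourceFolder: '$(Build.SourcesDirectory)/myFolder'
              contents: '**'
              targetFolder: 'C:\target\folder'   # placeholder path on the VM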
Ok, worked this out with help from this article: https://dev.to/kenakamu/azure-devops-yaml-release-pipeline-trigger-when-build-pipeline-completed-54d5.
I've taken the advice from Daniel Mann regarding the strategy being 'rolling'. I then split my pipeline into 2 pipelines; 1 for building the artifacts and 1 for releasing (copying them).
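The split follows the pipeline-completion trigger pattern from that article; the release pipeline declares the build pipeline as a resource roughly like this (the source name is a placeholder):
resources:
  pipelines:
  - pipeline: buildPipeline        # alias used to download this pipeline's artifacts
    source: 'My-Build-Pipeline'    # placeholder: name of the build pipeline
    trigger: true                  # run this pipeline whenever the build pipeline completes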
If you just want to download particular folders instead of all the source files from the repository, you can try using the REST API "Items - Get" to download each folder individually.
GET https://dev.azure.com/{organization}/{project}/_apis/git/repositories/{repositoryId}/items?path={path}&download=true&$format=zip&versionDescriptor.version={versionDescriptor.version}&resolveLfs=true&api-version=6.0
For example:
Suppose the repository contains a folder 'res/TestFolder01'.
Now, in the YAML pipeline, I just want to download the 'TestFolder01' folder from the main branch.
jobs:
- job: build
  . . .
  steps:
  - checkout: none  # Do not check out all the source files.
  - task: Bash@3
    displayName: 'Download particular folder'
    env:
      SYSTEM_ACCESSTOKEN: $(System.AccessToken)
    inputs:
      targetType: inline
      script: |
        curl -X GET \
          -o TestFolder01.zip \
          -u :$SYSTEM_ACCESSTOKEN 'https://dev.azure.com/MyOrg/MyProj/_apis/git/repositories/ShellScripts/items?path=/res/TestFolder01&download=true&$format=zip&versionDescriptor.version=main&resolveLfs=true&api-version=6.0'
This will download the 'TestFolder01' folder as a ZIP file (TestFolder01.zip) into the current working directory. You can use the unzip command to decompress it.
[UPDATE]
If you download the particular folders in a deploy job that targets your VM environment, then yes, the folders will be downloaded into the pipeline working directory on the VM.
Actually, you can think of a VM-type environment resource as a self-hosted agent installed on the VM. So, when your deploy job targets the VM environment resource, it runs on the self-hosted agent on that VM.
The pipeline working directory is under the directory where you installed the VM environment resource (self-hosted agent). Normally, you can use the variable $(Pipeline.Workspace) to get the value of this path (see here).
stages:
- stage: Deployment
  jobs:
  - deployment: Deploy
    displayName: 'Deploy to Dev'
    environment: 'Dev.VM-01'
    strategy:
      runOnce:
        deploy:
          steps:
          - task: Bash@3
            displayName: 'Download particular folder'
            env:
              SYSTEM_ACCESSTOKEN: $(System.AccessToken)
            inputs:
              targetType: inline
              script: |
                echo "Current working directory: $PWD"
                curl -X GET \
                  -o TestFolder01.zip \
                  -u :$SYSTEM_ACCESSTOKEN 'https://dev.azure.com/MyOrg/MyProj/_apis/git/repositories/ShellScripts/items?path=/res/TestFolder01&download=true&$format=zip&versionDescriptor.version=main&resolveLfs=true&api-version=6.0'
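To land the files in a specific folder on the VM, a follow-up step can unzip the download into that location (assuming the unzip command is available on the VM; the destination path is a placeholder):
          - task: Bash@3
            displayName: 'Unzip into the target folder'
            inputs:
              targetType: inline
              script: |
                # extract the downloaded folder into the destination directory on the VM
                unzip -o TestFolder01.zip -d /opt/myapp/config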

There is a cache miss

I'm trying to cache the cypress installation in my build pipeline.
I have this task setup:
- task: Cache@2
  inputs:
    key: 'cypress | $(Agent.OS) | package-lock.json'
    path: 'C:\npm\prefix\node_modules\cypress'
I've run the build pipeline multiple times but I'm always getting the same error:
There is a cache miss
Comparing with the previous build, it's the same fingerprint, so why is the caching not working?
One more thing: for the cache task to work as expected, all tasks must succeed.
Set the variable system.debug to true to get more information.
Also check whether the path is correct on the agent machine (you are using a self-hosted agent, correct?).
Usually, on the first run after the task is added, the cache step reports a "cache miss" since the cache identified by this key does not exist yet. As you always get a "cache miss", I suspect the cache is not created or uploaded correctly. You may try modifying package-lock.json so that a new key is generated and the cache is rebuilt, to see the result.
Check the "cache post-job task" results and see if the keys are different.
In my case I had to use npm install --no-save so that the package-lock.json file wasn't regenerated during the pipeline. This ensured the "cache post-job task" was using the same cache key when caching node_modules.
edit
Here is an exported example of what we currently use in our pipeline to cache npm modules.
(You must make sure that your package-lock.json is checked in to your code repository)
steps:
- task: Cache@2
  displayName: 'Npm Install Cache'
  inputs:
    key: '"npm" | "$(Agent.OS)" | my-project/package-lock.json'
    path: 'my-project/node_modules'
    cacheHitVar: NpmInstallCached
- task: Npm@1
  displayName: 'Npm Install'
  inputs:
    command: custom
    workingDir: my-project/
    verbose: false
    customCommand: 'install --no-save'
  condition: ne(variables['NpmInstallCached'], 'true')

When my Azure DevOps build pipeline runs, I ONLY want to create a new nuget package version - if a solution's project's code has changed. How?

Problem:
When my Azure DevOps build pipeline runs, I ONLY want to create a nuget package if code in one of the solution's projects has changed. If no changes occurred, I would love it if the pipeline could stop (before the nuget push).
Meaning: I do not want to release a new version of a nuget package if there were no code changes.
This is easy to do if I have a dedicated git repository for the project, because each git commit/push represents a code change, but I'm not sure how to do this when I have a .NET solution with many projects inside (the nuget project being one of them).
I would like to share another method to achieve this goal.
In addition to checking which files changed, you also need to take the pipeline trigger reason into account.
You can use Git commands in a PowerShell script to check the commit information.
Then you need to add a condition to the NuGet push task (e.g. in(variables['Build.Reason'], 'IndividualCI', 'BatchedCI')). This is required.
You need to make sure that the NuGet push task runs only on CI-triggered builds. Otherwise, when you trigger the pipeline manually, the task would still create a new nuget version.
Here is an Example:
trigger:
- master
pool:
  vmImage: 'ubuntu-latest'
steps:
- task: PowerShell@2
  inputs:
    targetType: 'inline'
    script: |
      $files=$(git diff HEAD HEAD~ --name-only)
      $temp=$files -split ' '
      echo $temp
      $count=$temp.Length
      echo "Total changed $count files"
      For ($i=0; $i -lt $temp.Length; $i++)
      {
        $name=$temp[$i]
        echo "this is $name file"
        if ($name -like "filepath/test/*")
        {
          # a matching file was found, so flag it and stop looking
          # (otherwise a later non-matching file would reset the variable)
          Write-Host "##vso[task.setvariable variable=IsContainFile]True"
          break
        }
      }
- task: NuGetCommand@2
  inputs:
    command: 'push'
    packagesToPush: '$(Build.ArtifactStagingDirectory)/**/*.nupkg;!$(Build.ArtifactStagingDirectory)/**/*.symbols.nupkg'
    nuGetFeedType: 'internal'
    ......
  condition: and(eq(variables['IsContainFile'], 'true'), in(variables['Build.Reason'], 'IndividualCI', 'BatchedCI'))
Workflow:
The PowerShell task checks the changed files. If the commit contains the target file or folder, it sets the variable value to true.
The NuGet push task runs only when both conditions (CI trigger and variable value is true) are met at the same time.
If you have a separate pipeline for creating the nuget package, you may consider a path filter so that it triggers only when there is a need to create a new nuget version:
# specific path build
trigger:
  branches:
    include:
    - master
    - releases/*
  paths:
    include:
    - docs/*
    exclude:
    - docs/README.md