Azure Data Factory pipelines are failing when no files are available in the source

Currently we do our data loads from an on-premises Hadoop server to SQL DW [via ADF staged copy and the DMG on an on-premises server]. We noticed that the ADF pipelines fail when there are no files in the on-premises Hadoop source location [we do not expect our upstreams to send files every day, so it is a valid scenario to have ZERO files in that location].
Do you have a solution for this kind of scenario?
The error message is given below:
Failed execution: Copy activity encountered a user error:
ErrorCode=UserErrorFileNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Cannot find the 'HDFS' file.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.WebException,Message=The remote server returned an error: (404) Not Found.,Source=System,'.
Thanks,
Aravind

This requirement can be solved by using the ADF v2 Get Metadata activity to check for file existence and then skipping the Copy activity if the file or folder does not exist:
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity

You can change the File Path Type to Wildcard, add the name of the file, and add a "*" at the end of the name (or wherever else suits you).
This is a simple way to stop the pipeline from failing when there is no file.

Do you have an Input DataSet for your pipeline? See if you can skip your Input DataSet dependency.

Mmmm, this is a tricky one. I'll upvote the question, I think.
Couple of options that I can think of here...
1) I would suggest the best way would be to create a custom activity ahead of the copy to check the source directory first. This could handle the behaviour when there isn't a file present, rather than just throwing an error. You could then code it to be a little more graceful when it returns and not block the downstream ADF activities.
2) Use some PowerShell to inspect the ADF activity for the missing file error. Then simply set the dataset slice to either skipped or ready using the cmdlet to override the status.
For example:
Set-AzureRmDataFactorySliceStatus `
    -ResourceGroupName $ResourceGroup `
    -DataFactoryName $ADFName.DataFactoryName `
    -DatasetName $Dataset.OutputDatasets `
    -StartDateTime $Dataset.WindowStart `
    -EndDateTime $Dataset.WindowEnd `
    -Status "Ready" `
    -UpdateType "Individual"
This of course isn't ideal, but running the PowerShell via Azure Automation would be quicker to develop than a custom activity.
Hope this helps.

I know I'm late to the party, but if you're like me and running into this issue, it looks like they made an update a while back to allow for no files being found.


AzCopy ignore if source file is older

Is there an option to handle the following situation?
I have a pipeline with a Copy Files task implemented in it; it is used to upload some static html file from git to blob storage. Everything works perfectly. But sometimes I need this file to be changed in the blob storage (using hosted application tools). So, the question is: can I "detect" whether my git file is older than the target blob file and have the copy task ignore it, leaving it untouched? My initial idea was to use Azure file copy and its "Optional Arguments" textbox. However, I couldn't find the required option in the documentation. Does it allow such things? Or should this case be handled some other way?
I think you're looking for the ifSourceNewer value for the --overwrite option.
--overwrite string Overwrite the conflicting files and blobs at the destination if this flag is set to true. (default true) Possible values include true, false, prompt, and ifSourceNewer.
More info: azcopy copy - Options
Agree with ickvdbosch. The ifSourceNewer value for the --overwrite option could meet your requirements.
error: couldn't parse "ifSourceNewer" into a "OverwriteOption"
Based on my test, I could reproduce this issue in the Azure file copy task.
It seems that the ifSourceNewer value can't be set for the Overwrite option in the Azure file copy task.
Workaround: you could use a PowerShell task to run an azcopy script that uploads the files with --overwrite=ifSourceNewer.
For example:
azcopy copy "filepath" "BlobURLwithSASToken" --overwrite=ifSourceNewer --recursive
For more detailed info, you could refer to this doc.
For the issue about the Azure File copy task, I suggest that you could submit a feedback ticket in the following link: Report task issues.

Azure DevOps Release Pipelines - Using env params with a period (.) in the name

I am finding using AzDO release pipeline variables in PowerShell steps maddening.
I am running an Azure PowerShell step to return a primary key value. It is two lines…
$primarykey = (Get-AzRelayKey -ResourceGroupName ${env:az-resourcegroupname} -Namespace ${env:az-relaynamespace} -HybridConnection ${env:serviceBus.primaryRelay.ConnectionName} -Name ${env:serviceBus.primaryRelay.KeyName} | Select-Object -ExpandProperty PrimaryKey)
Write-Host "##vso[task.setvariable variable=serviceBus.primaryRelay.Key]$primarykey"
In my pipeline I have a mix of variable names: some I have complete control over (the az- prefixed ones) and others I don't (the ones starting with serviceBus.).
The reason I have no control over the latter is that they are used by a later File Transform step that navigates an appsettings.json file to find/replace values, and they can't be changed (for example, serviceBus.primaryRelay.ConnectionName is a value that gets replaced in the JSON, and because the File Transform step navigates the JSON structure, the name has to be separated with periods).
When this script runs it always complains that the -HybridConnection value is empty. This is because the variable has a period in it.
I’ve tried everything I can think of to retrieve that value in the code.
Are they suggesting here that a variable with a period isn’t workable in Powershell in AZDO release pipelines? I’m completely lost.
I have found the answer by looking under the release pipeline's "Initialize Job" log. It appears to substitute the period (.) with an underscore (_) and to uppercase the variable name.
The log revealed this...
[SERVICEBUS_PRIMARYRELAY_CONNECTIONNAME] --> [dev-sbrelay]
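So in the script above, referencing the transformed names should work; the KeyName variable presumably follows the same pattern (a sketch, with only the variable references changed):
# Periods become underscores (and names are uppercased) when the variables are exposed to the script
$primarykey = (Get-AzRelayKey -ResourceGroupName ${env:az-resourcegroupname} -Namespace ${env:az-relaynamespace} -HybridConnection ${env:SERVICEBUS_PRIMARYRELAY_CONNECTIONNAME} -Name ${env:SERVICEBUS_PRIMARYRELAY_KEYNAME} | Select-Object -ExpandProperty PrimaryKey)
Write-Host "##vso[task.setvariable variable=serviceBus.primaryRelay.Key]$primarykey"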

What are my PowerShell options for determining a path that changes with every build (TeamCity)?

I'm on a project that uses TeamCity for builds.
I have a VM, and have written a PowerShell script that backs up a few files, opens a ZIP artifact that I manually download from TeamCity, and then copies it to my VM.
I'd like to enhance my script by having it retrieve the ZIP artifact (which always has the same name).
The problem is that the download path contains the build number which is always changing. Aside from requesting the download path for the ZIP artifact, I don't really care what it is.
An example artifact path might be:
http://{server}/repository/download/{project}/{build_number}:id/{project}.zip
There is a "Last Successful Build" page in TeamCity that I might be able to obtain the build number from.
What do you think the best way to approach this issue is?
I'm new to TeamCity, but it could also be that the answer is "TeamCity does this - you don't need a PowerShell script." So direction in that regard would be helpful.
At the moment, my PowerShell script does the trick and only takes about 30 seconds to run (which is much faster than my peers that do all of the file copying manually). I'd be happy with just automating the ZIP download so I can "fire and forget" my script and end up with an updated VM.
Seems like the smallest knowledge gap to fill and retrieving changing path info at run-time with PowerShell seems like a pretty decent skill to have.
I might just use C# within PS to collect this info, but I was hoping for a more PS way to do it.
Thanks in advance for your thoughts and advice!
Update: It turns out some other teams had been using Octopus Deploy (https://octopus.com/) for this sort of thing so I'm using that for now - though it actually seems more cumbersome than the PS solution overall since it involves logging into the Octopus server and going through a few steps to kick off a new build manually at this point.
I'm also waiting for the TC administrator to provide a Webhook or something to notify Octopus when a new build is available. Once I have that, the Octopus admin says we should be able to get the deployments to happen automagically.
On the bright side, I do have the build process integrated with Microsoft Teams via a webhook plugin that was available for Octopus. Also, the Developer of Octopus is looking at making a Microsoft Teams connector to simplify this. It's nice to get a notification that the new build is available right in my team chat.
You can try to get your artefact from this URL:
http://<ServerUrl>/repository/downloadAll/<BuildId>/.lastSuccessful
Where BuildId is the unique identifier of the build configuration.
My implementation of this, in PowerShell:
#
# GetArtefact.ps1
#
Param(
    [Parameter(Mandatory=$false)][string]$TeamcityServer="",
    [Parameter(Mandatory=$false)][string]$BuildConfigurationId="",
    [Parameter(Mandatory=$false)][string]$LocalPathToSave=""
)
Begin
{
    $username = "guest";
    $password = "guest";
    function Execute-HTTPGetCommand() {
        param(
            [string] $target = $null,
            [string] $outputFile = $null
        )
        $request = [System.Net.WebRequest]::Create($target)
        $request.PreAuthenticate = $true
        $request.Method = "GET"
        $request.Headers.Add("AUTHORIZATION", "Basic");
        $request.Accept = "*"
        $request.Credentials = New-Object System.Net.NetworkCredential($username, $password)
        $response = $request.GetResponse()
        # Stream the response straight to disk so a binary ZIP artifact is not corrupted
        $responseStream = $response.GetResponseStream()
        $fileStream = [System.IO.File]::Create($outputFile)
        $responseStream.CopyTo($fileStream)
        $fileStream.Close()
        $responseStream.Close()
        $response.Close()
    }
    Execute-HTTPGetCommand "http://$TeamcityServer/repository/downloadAll/$BuildConfigurationId/.lastSuccessful" $LocalPathToSave
}
And call this with the appropriate parameters.
EDIT: Note that the current credential I used here was the guest account. You should check if the guest account has the permissions to do this, or specify the appropriate account.
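On PowerShell 3.0 or later, a shorter sketch using Invoke-WebRequest should do the same download (same guest/guest assumption as above; swap in real credentials if guest access is disabled):
# Same assumptions as the script above: guest/guest credentials and the downloadAll/.lastSuccessful URL
$securePassword = ConvertTo-SecureString "guest" -AsPlainText -Force
$cred = New-Object System.Management.Automation.PSCredential("guest", $securePassword)
Invoke-WebRequest -Uri "http://$TeamcityServer/repository/downloadAll/$BuildConfigurationId/.lastSuccessful" -Credential $cred -OutFile $LocalPathToSave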
Try constructing the URL to download the build artifact using the TeamCity REST API.
You can get a permanent link using a wide range of criteria, like the last successful build or the last build tagged with a specific tag, etc.
e.g. to get the last successful build you can use something like:
http://{server}/app/rest/builds/buildType:(id:{build.conf.id}),status:SUCCESS/artifacts/content/{file.name}
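For instance, downloading through such a permalink from PowerShell could look like this (the server, build configuration id, artifact name, and account below are all placeholders):
# Placeholders throughout: swap in your own server, build configuration id, artifact name, and credentials
$url  = "http://teamcity.example.com/httpAuth/app/rest/builds/buildType:(id:MyProject_Build),status:SUCCESS/artifacts/content/MyProject.zip"
$cred = Get-Credential   # a TeamCity account allowed to view the build's artifacts
Invoke-WebRequest -Uri $url -Credential $cred -OutFile "C:\temp\MyProject.zip"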
TeamCity has the capability to publish its artifacts to a built-in NuGet feed. You can then use NuGet to install the created package without caring about where the artifacts are. Once you do that, you can install with nuget.exe by pointing your source to the NuGet feed URL. Read about how to configure the feed at https://confluence.jetbrains.com/display/TCD10/NuGet.
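Once the feed is enabled, installing the package might look like this (the feed URL and package id are illustrative; copy the real feed URL from your server's NuGet settings page):
# Illustrative feed URL and package id - take the actual feed URL from TeamCity's NuGet settings
nuget.exe install MyProject.Artifacts -Source "http://teamcity.example.com/httpAuth/app/nuget/v1/FeedService.svc/" -OutputDirectory .\packages -ExcludeVersion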
Read the file content of the path in the TEAMCITY_BUILD_PROPERTIES_FILE environment variable.
Locate the teamcity.configuration.properties.file row in that file; IIRC the value is backslash-encoded.
Read THAT file, and locate the teamcity.serverUrl value; decode it.
Construct the url like this:
{serverurl}/httpAuth/repository/download/{buildtypeid}/.lastSuccessful/file.txt
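Roughly, in PowerShell, those steps might look like this (a sketch only: MyBuildTypeId and file.txt are placeholders, and the properties-file unescaping is simplified):
# Sketch only - MyBuildTypeId and file.txt are placeholders; Java-properties unescaping is simplified
$buildProps  = Get-Content $env:TEAMCITY_BUILD_PROPERTIES_FILE
$configPath  = ($buildProps | Where-Object { $_ -match '^teamcity\.configuration\.properties\.file=' }) -replace '^[^=]+=', '' -replace '\\(.)', '$1'
$configProps = Get-Content $configPath
$serverUrl   = ($configProps | Where-Object { $_ -match '^teamcity\.serverUrl=' }) -replace '^[^=]+=', '' -replace '\\(.)', '$1'
$artifactUrl = "$serverUrl/httpAuth/repository/download/MyBuildTypeId/.lastSuccessful/file.txt"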
Here's an example (C#):
https://github.com/WideOrbit/buildtools/blob/master/RunTests.csx#L272

Jenkins PowerShell Output

I would like to capture the output of some variables to be used elsewhere in the job, using the Jenkins PowerShell plugin.
Is this possible?
My goal is to build the latest tag somehow, and the PowerShell script was meant to achieve that. Outputting to a text file would not help, and environment variables can't be used because the process is seemingly forked, unfortunately.
Besides EnvInject, another common approach for sharing data between build steps is to store results in files located in the job workspace.
The idea is to skip using environment variables altogether and just write/read files.
It seems that the only solution is to combine this with the EnvInject plugin. You can create a text file with key-value pairs from PowerShell and then export them into the build using the EnvInject plugin.
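For instance, a PowerShell build step like this could write the tag for EnvInject to pick up (LATEST_TAG and build.properties are just example names, and this assumes git is available on the agent):
# Example names only (LATEST_TAG, build.properties); point EnvInject at this file in a later step
$latestTag = git describe --tags --abbrev=0
"LATEST_TAG=$latestTag" | Out-File -FilePath "$env:WORKSPACE\build.properties" -Encoding ASCII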
You should make the workspace persistent for this job; then you can save the data you need to a file. Other jobs can then access this persistent workspace, or use it as their own, as long as they are on the same node.
Another option would be to use Jenkins' built-in artifact retention: at the end of the job's configure page there will be an option to retain files specified by a match (e.g. *.xml or last_build_number). These are then given a specific address that can be used by other jobs regardless of which node they are on; the address can be on the master or the node, IIRC.
For the simple case of wanting to read a single object from PowerShell, you can convert it to a JSON string in PowerShell and then convert it back in Groovy. Here's an example:
def pathsJSON = powershell(returnStdout: true, script: "ConvertTo-Json ((Get-ChildItem -Path *.txt) | select -Property Name)");
def paths = [];
if (pathsJSON != '') {
    paths = readJSON text: pathsJSON
}

Azure Storage: use AzCopy.exe to copy a folder from blob storage to another storage account

Using AzCopy.exe, I am able to copy over an entire container successfully. However, I cannot figure out how to copy over a blob where the name includes a folder structure. I have tried the following:
.\AzCopy.exe /Source:https://sourceaccount.blob.core.windows.net/container /Dest:https://destaccount.blob.core.windows.net/container /SourceKey:sourceKey== /DestKey:destKey== /S /Pattern:CorruptZips/2013/6
While also changing the /Pattern: to things like:
/Pattern:CorruptZips/2013/6/*
/Pattern:CorruptZips/2013/6/.
/Pattern:CorruptZips/2013/6/
And everything just says that there are zero records copied. Can this be done or is it just for container/file copying? Thank you.
@naspinski, there is another tool, Azure Data Factory, which can help with copying a folder from one blob storage account to another. Please refer to the article Move data to and from Azure Blob storage using Azure Data Factory to learn about it, and follow the steps below.
Create a Data Factory in the Azure portal.
Click the Copy Data button to open the Copy Data tool, and follow the prompts to copy the folder step by step.
Took me a few tries to get this. Here is the key:
If the specified source is a blob container or virtual directory, then wildcards are not applied.
In other words, you can't wildcard-copy files nested in a folder structure within a container. You have two options:
Use /S WITHOUT a pattern to recursively copy everything
Use /S and specify the full file path in your pattern without a wildcard
Example:
C:\Users\myuser>azcopy /Source:https://source.blob.core.windows.net/system /Dest:https://dest.blob.core.windows.net/system /SourceKey:abc /DestKey:xyz /S /V /Pattern:"Microsoft.Compute/Images/vmimage/myimage.vhd"
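If the goal is just that one folder, one reading of option 1 is to point /Source at the virtual directory itself and skip the pattern entirely, e.g. (keys shortened as in the question):
AzCopy /Source:https://sourceaccount.blob.core.windows.net/container/CorruptZips/2013/6 /Dest:https://destaccount.blob.core.windows.net/container/CorruptZips/2013/6 /SourceKey:sourceKey== /DestKey:destKey== /S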
EDIT: Oops, my answer was worded incorrectly!
Please specify the command without /S:
AzCopy /Source:https://myaccount.blob.core.windows.net/mycontainer1 /Dest:https://myaccount.blob.core.windows.net/mycontainer2 /SourceKey:key /DestKey:key /Pattern:abc.txt
You can find this information under "Copy single blob within Storage account" at http://aka.ms/azcopy.