Azure Storage: use AzCopy.exe to copy a folder from blob storage to another storage account - powershell

Using AzCopy.exe, I am able to copy over an entire container successfully. However, I cannot figure out how to copy over a blob where the name includes a folder structure. I have tried the following:
.\AzCopy.exe /Source:https://sourceaccount.blob.core.windows.net/container /Dest:https://destaccount.blob.core.windows.net/container /SourceKey:sourceKey== /DestKey:destKey== /S /Pattern:CorruptZips/2013/6
While also changing the /Pattern: to things like:
/Pattern:CorruptZips/2013/6/*
/Pattern:CorruptZips/2013/6/.
/Pattern:CorruptZips/2013/6/
And everything just says that there are zero records copied. Can this be done or is it just for container/file copying? Thank you.

@naspinski, there is another tool, Azure Data Factory, which can help copy a folder from one blob storage account to another. Please refer to the article Move data to and from Azure Blob using Azure Data Factory to learn about it, then follow the steps below.
Create a Data Factory on Azure portal.
Click the Copy Data button as below to open the Copy Data tool, and follow the prompts to copy the folder step by step.

Took me a few tries to get this. Here is the key:
If the specified source is a blob container or virtual directory, then
wildcards are not applied.
In other words, you can't wildcard copy files nested in a folder structure in a container. You have two options:
Use /S WITHOUT a pattern to recursively copy everything
Use /S and specify the full file path in your pattern without a wildcard
Example:
C:\Users\myuser>azcopy /Source:https://source.blob.core.windows.net/system /Dest:https://dest.blob.core.windows.net/system /SourceKey:abc /DestKey:xyz /S /V /Pattern:"Microsoft.Compute/Images/vmimage/myimage.vhd"
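Applied to the original question: since /S is present, the pattern is treated as a blob prefix, so something along these lines (a sketch reusing the asker's placeholder accounts and keys) should copy the whole virtual directory:
.\AzCopy.exe /Source:https://sourceaccount.blob.core.windows.net/container /Dest:https://destaccount.blob.core.windows.net/container /SourceKey:sourceKey== /DestKey:destKey== /S /Pattern:"CorruptZips/2013/6/"
The trailing slash in the prefix keeps it from also matching sibling folders whose names merely start with 6 (e.g. 60 or 61).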
EDIT: Oops, my answer was worded incorrectly!

Please specify the command without /S:
AzCopy /Source:https://myaccount.blob.core.windows.net/mycontainer1 /Dest:https://myaccount.blob.core.windows.net/mycontainer2 /SourceKey:key /DestKey:key /Pattern:abc.txt
You can find more information under "Copy single blob within Storage account" at http://aka.ms/azcopy.

Related

Azure Data Factory - What is the fastest way to copy a lot of files from OnPrem to blob storage when they are deeply nested

I need to get two different Excel files that are nested within 360 parent directories XXXX, each with a \ME (Month End) directory, then a year directory, and finally a yyyymm directory.
Example: Z500\ME\2022\202205\Z500_contributions_202205.xls.
I tried with the copy data activity and killed it after it was still spinning on the listing source step. I thought about the lookup and metadata activities and those have limits of 5000 rows. Any thoughts on what would be the fastest way to do this?
Code for creating the file list (I'll clean the results up in Excel):
dir L:\*.xls /s /b > "C:\Foo.txt"
Right now I am creating a list of files with the DOS dir command, hoping that if I give the copy activity a file list it will run faster because it won't have to go through the "list sources" step and interrogate the filesystem.
Thoughts on an ADF option?
If you are facing issues with the copy activity, you can instead try azcopy, which can also be used for copying from OnPrem to Blob storage.
You can try the below code:
azcopy copy "local path/*" "https://<storage account
name>.blob.core.windows.net/<container name><path to blob" --recursive=true --include-pattern "*.xlsx"
Please go through this Microsoft documentation to know how to use azcopy.
The above command copies all the Excel files from nested folders recursively, but it also reproduces the folder structure in blob storage.
After copying to blob storage, you can use Start-AzureStorageBlobCopy in PowerShell to bring all the Excel files from the nested folders into a single folder.
Start-AzureStorageBlobCopy -SrcFile $sourcefile -DestCloudBlob "Destination path"
Please refer to this SO thread for listing the files in the blob recursively.
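For example, a minimal PowerShell sketch of that flattening step, assuming the classic Azure.Storage cmdlets used above (account, key, and container names are placeholders):
$ctx = New-AzureStorageContext -StorageAccountName "<account>" -StorageAccountKey "<key>"
# List every blob in the source container and keep only the Excel files.
$blobs = Get-AzureStorageBlob -Container "<source container>" -Context $ctx |
    Where-Object { $_.Name -like "*.xls*" }
foreach ($b in $blobs) {
    # Drop the virtual folder path so every file lands in one flat location.
    $flatName = $b.Name.Split('/')[-1]
    Start-AzureStorageBlobCopy -SrcContainer "<source container>" -SrcBlob $b.Name `
        -DestContainer "<dest container>" -DestBlob $flatName -Context $ctx
}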
If you are creating the list of files OnPrem, then you can use either azcopy or the copy activity, as you wish.
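If you do go the file-list route, azcopy itself can consume such a list through its --list-of-files flag, which may let you skip the "list sources" step entirely. A sketch, assuming the drive and output file from the question; note that --list-of-files expects paths relative to the source, so the absolute paths that dir /s /b produces would need the L:\ prefix trimmed first:
azcopy copy "L:\" "https://<storage account name>.blob.core.windows.net/<container name>?<SAS token>" --list-of-files "C:\Foo.txt"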
Please check these screenshots of azcopy for your reference:
Here I am using azcopy with a SAS token. You can use it both ways, with SAS and with Azure Active Directory, as mentioned in the documentation above.
Excel file in Blob storage:

Copy activity with simultaneous renaming of a file. From blob to blob

I have a "copy data" activity in Azure Data Factory. I want to copy .csv files from blob container X to Blob container Y. I don't need to change the content of the files in any way, but I want to add a timestamp to the name, e.g. rename it. However, I get the following error "Binary copy does not support copying from folder to file". Both the source and the sink are set up as binary.
If you want to copy the files and rename them, your pipeline should look like this:
Create a Get Metadata activity to get the file list (dataset Binary1):
Create a ForEach activity to copy each file, with Items set to @activity('Get Metadata1').output.childItems:
Inside the ForEach activity, create a Copy activity whose source dataset Binary2 (same as Binary1) has a dataset parameter to specify the source file:
In the Copy activity's sink settings, create the sink dataset Binary3, also with a parameter, to rename the files:
@concat(split(item().name,'.')[0],utcnow(),'.',split(item().name,'.')[1])
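For example, if item().name is data.csv, the expression yields something like data2022-05-31T08:15:00.0000000Z.csv, i.e. the utcnow() timestamp is inserted between the base name and the extension.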
Run the pipeline and check the output:
Note: the example I made just copies the files to the same container, but with a new name.

AzCopy ignore if source file is older

Is there an option to handle the following situation?
I have a pipeline with a Copy Files task in it, used to upload a static html file from git to blob storage. Everything works perfectly. But sometimes I need this file to be changed in the blob storage (using hosted application tools). So, the question is: can I "detect" whether my git file is older than the target blob file and have the copy task skip that file, leaving it untouched? My initial idea was to use Azure file copy and its "Optional Arguments" textbox; however, I couldn't find the required option in the documentation. Does it allow such things? Or should this case be handled some other way?
I think you're looking for the ifSourceNewer value for the --overwrite option.
--overwrite string Overwrite the conflicting files and blobs at the destination if this flag is set to true. (default true) Possible values include true, false, prompt, and ifSourceNewer.
More info: azcopy copy - Options
Agree with ickvdbosch. The ifSourceNewer value for the --overwrite option could meet your requirements.
error: couldn't parse "ifSourceNewer" into a "OverwriteOption"
Based on my test, I could reproduce this issue in the Azure File Copy task.
It seems that the ifSourceNewer value can't be passed to the Overwrite option in the Azure File Copy task.
Workaround: you could use a PowerShell task to run an azcopy script that uploads the files with --overwrite=ifSourceNewer
For example:
azcopy copy "filepath" "BlobURLwithSASToken" --overwrite=ifSourceNewer --recursive
For more detailed info, you could refer to this doc.
For the issue with the Azure File Copy task, I suggest that you submit a feedback ticket via the following link: Report task issues.

Copy files from single source to multiple destinations

I have a script to copy files from a local machine to Azure blob storage, but my new requirement is to copy half of the source files into one blob container and the other half into another blob container. Let me know if I can do so in parallel or one after the other. I am using azcopy for now to move these files without splitting, and from only one source to one destination.
.\AzCopy.exe /Source:$localfilepath /Dest:$Destinationpath /DestKey:$key1 /S
As far as I know, if there is a pattern for filtering these file names, you can use the Pattern parameter of the AzCopy tool to upload them separately in two runs, such as with the command below from the section Upload blobs matching a specific pattern of the official tutorial, if they are named with the a prefix.
AzCopy /Source:C:\myfolder /Dest:https://myaccount.blob.core.windows.net/mycontainer /DestKey:key /Pattern:a* /S
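For the other half you would run AzCopy a second time with a complementary pattern against the second container, for example (a sketch; the b* pattern and container name stand in for whatever distinguishes the second half of your files):
AzCopy /Source:C:\myfolder /Dest:https://myaccount.blob.core.windows.net/mycontainer2 /DestKey:key /Pattern:b* /S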
Here is the description of AzCopy's Pattern parameter:
/Pattern:"file-pattern"
Specifies a file pattern that indicates which files to copy. The behavior of the /Pattern parameter is determined by the location of the source data, and the presence of the recursive mode option. Recursive mode is specified via option /S.
If the specified source is a directory in the file system, then standard wildcards are in effect, and the file pattern provided is matched against files within the directory. If option /S is specified, then AzCopy also matches the specified pattern against all files in any subfolders beneath the directory.
If the specified source is a blob container or virtual directory, then wildcards are not applied. If option /S is specified, then AzCopy interprets the specified file pattern as a blob prefix. If option /S is not specified, then AzCopy matches the file pattern against exact blob names.
If the specified source is an Azure file share, then you must either specify the exact file name, (e.g. abc.txt) to copy a single file, or specify option /S to copy all files in the share recursively. Attempting to specify both a file pattern and option /S together results in an error.
AzCopy uses case-sensitive matching when the /Source is a blob container or blob virtual directory, and uses case-insensitive matching in all the other cases.
The default file pattern used when no file pattern is specified is *.* for a file system location, or an empty prefix for an Azure Storage location. Specifying multiple file patterns is not supported.
Applicable to: Blobs, Files
If there is no simple pattern for the files, you have to manually move them into directories of their own category, or write a simple script to filter them and generate the command strings for uploading. Then you can use Foreach-Parallel in a PowerShell workflow to realize the parallel upload and satisfy your needs.
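A minimal sketch of that split-and-upload idea, assuming the AzCopy v8 /Source:/Dest: syntax from the question and using Start-Job background jobs for the parallelism instead of a workflow (paths, containers, and the key are placeholders):
$key = "<storage key>"
# Split the source files into two halves, staged in separate folders.
$files = Get-ChildItem "C:\myfolder" -File
$half = [math]::Ceiling($files.Count / 2)
New-Item -ItemType Directory -Force -Path "C:\staging\half1", "C:\staging\half2" | Out-Null
$files[0..($half - 1)] | Copy-Item -Destination "C:\staging\half1"
$files[$half..($files.Count - 1)] | Copy-Item -Destination "C:\staging\half2"
# Upload the two halves to their containers in parallel as background jobs
# (assumes AzCopy.exe is on the PATH).
$jobs = @(
    Start-Job { AzCopy.exe /Source:"C:\staging\half1" /Dest:"https://myaccount.blob.core.windows.net/container1" /DestKey:$using:key /S /Y }
    Start-Job { AzCopy.exe /Source:"C:\staging\half2" /Dest:"https://myaccount.blob.core.windows.net/container2" /DestKey:$using:key /S /Y }
)
$jobs | Wait-Job | Receive-Job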

How do I clean a blob before copying to it (Azure release pipeline)

My release pipeline uploads to blob storage using
Azure File Copy
I want to delete the existing files in the blob before copying over the new files.
The help shows that
cleanTargetBeforeCopy: false
only applies to a VM
(since it is a release pipeline I can't edit the YAML anyway)
The tool tip for Optional Arguments shows
Optional AzCopy.exe arguments that will be applied when uploading to
blob like, /NC:10. If no optional arguments are specified here, the
following optional arguments will be added by default. /Y,
/SetContentType, /Z, /V, /S (only if container name is not $root),
/BlobType:page (only if specified storage account is a premium
account). If source path is a file, /Pattern will always be added
irrespective of whether or not you have specified optional arguments.
I want to delete the existing files in the blob before copying over the new files.
If you only want to overwrite the blobs when running the copy file task, we need not add any optional arguments at all.
As you mentioned, if we don't add optional arguments, the /Y parameter is added by default.
The blobs will be replaced by the new files by default when the Azure Copy Files task runs.
If you want to clean the container, you could use Azure PowerShell commands to delete the container and recreate a new one before running the Azure copy file task.
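A minimal sketch of that delete-and-recreate step, assuming the classic Azure PowerShell storage cmdlets (account, key, and container names are placeholders):
$ctx = New-AzureStorageContext -StorageAccountName "<account>" -StorageAccountKey "<key>"
# Delete the container together with all the blobs in it.
Remove-AzureStorageContainer -Name "<container>" -Context $ctx -Force
# Deletion is asynchronous: recreating immediately can fail while the old
# container is still being removed, so retry until the create succeeds.
$created = $false
while (-not $created) {
    Start-Sleep -Seconds 5
    try {
        New-AzureStorageContainer -Name "<container>" -Context $ctx -ErrorAction Stop | Out-Null
        $created = $true
    } catch {
        Write-Host "Container is still being deleted, retrying..."
    }
}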