How to delete a subfolder using the Azure Data Factory Delete activity? - azure-data-factory

I am able to delete files, or delete entire folders by selecting the file path, but I am unable to delete a sub-folder/directory.
dataset image
pipeline image
Deleting files works for me, but I am unable to delete the empty folders (test1, test2).
Could anyone help me?

You need to check Delete file recursively in the Delete activity source settings.
Delete file recursively - Delete all files in the input folder and its subfolders recursively, or just the ones in the selected folder. This setting is disabled when a single file is selected.
For more information, refer to this link.
Edit -
Known limitations
The Delete activity does not support deleting a list of folders described by a wildcard.
When using the file attribute filters modifiedDatetimeStart and modifiedDatetimeEnd in the Delete activity to select files to be deleted, make sure to also set "wildcardFileName": "*" in the Delete activity.
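For reference, a minimal sketch of the Delete activity's typeProperties in pipeline JSON, assuming a Blob Storage dataset pointing at the parent folder (the dataset name EmptyFolderDataset and the AzureBlobStorageReadSettings store type are placeholders; use whatever matches your connector):

{
    "name": "Delete subfolder contents",
    "type": "Delete",
    "typeProperties": {
        "dataset": {
            "referenceName": "EmptyFolderDataset",
            "type": "DatasetReference"
        },
        "storeSettings": {
            "type": "AzureBlobStorageReadSettings",
            "recursive": true,
            "wildcardFileName": "*"
        }
    }
}

With recursive set to true, the activity works through the subfolders under the dataset's folder path; as the limitation above notes, the folders themselves cannot be targeted by a wildcard.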

Related

Azure Data Factory - What is the fastest way to copy a lot of files from OnPrem to blob storage when they are deeply nested

I need to get two different Excel files that are nested within 360 parent directories XXXX, each with a \ME (Month End) directory, then a year directory, and finally a yyyymm directory.
Example: Z500\ME\2022\202205\Z500_contributions_202205.xls.
I tried with the Copy Data activity and killed it while it was still spinning on the listing source step. I thought about the Lookup and Get Metadata activities, but those have limits of 5000 rows. Any thoughts on what would be the fastest way to do this?
Code for creating the filelist. I'll clean the results up in Excel
dir L:*.xls /s /b > "C:\Foo.txt"
Right now I am creating a list of files with the DOS dir command, hoping that if I give the copy activity a file list, it will run faster because it doesn't have to go through the "list sources" step and interrogate the filesystem.
Thoughts on an ADF option?
If you are facing issues with the copy activity, you can instead try azcopy, which can also be used for copying from on-prem to Blob storage.
You can try the below code:
azcopy copy "<local path>/*" "https://<storage account name>.blob.core.windows.net/<container name>/<path to blob>" --recursive=true --include-pattern "*.xlsx"
Please go through this Microsoft documentation to know how to use azcopy.
The above command copies all the Excel files from the nested folders recursively, but it also copies the folder structure to Blob storage.
After copying to Blob storage, you can use Start-AzureStorageBlobCopy in PowerShell to consolidate all the Excel files from the nested folders into a single folder.
Start-AzureStorageBlobCopy -SrcFile $sourcefile -DestCloudBlob "Destination path"
Please refer to this SO thread on how to list the files in the blob container recursively.
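As a rough sketch of that consolidation step, using the newer Az.Storage cmdlet names (Get-AzStorageBlob / Start-AzStorageBlobCopy); the account, the container names "raw" and "flat", and the *.xls filter are all assumptions:

# Context for the storage account (placeholder name; key- or SAS-based auth also works).
$ctx = New-AzStorageContext -StorageAccountName "<storage account name>" -UseConnectedAccount

# List every .xls blob in the nested folder structure of the "raw" container.
$blobs = Get-AzStorageBlob -Container "raw" -Context $ctx | Where-Object { $_.Name -like "*.xls" }

foreach ($blob in $blobs) {
    # Keep only the file name so every copy lands in a single flat folder in the "flat" container.
    $flatName = $blob.Name.Split('/')[-1]
    Start-AzStorageBlobCopy -SrcContainer "raw" -SrcBlob $blob.Name `
        -DestContainer "flat" -DestBlob $flatName -Context $ctx
}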
If you are creating the list of files on-prem, you can use either azcopy or the copy activity, as you wish.
Please check these screenshots of azcopy for your reference:
Here I am using azcopy with a SAS token. You can use it both ways, with SAS or with Azure Active Directory, as mentioned in the documentation above.
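For illustration, the command with a SAS token appended to the destination URL would look roughly like this (the local path, account, container, and token are all placeholders):

azcopy copy "C:\source\*" "https://<storage account name>.blob.core.windows.net/<container name>?<SAS token>" --recursive=true --include-pattern "*.xlsx"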
Excel file in Blob storage:

How to delete all files in the root of a File System with the Delete activity of Data Factory?

I have an on-prem File System dataset with path "Z:\ProjectX".
It contains files like Z:\ProjectX\myfile1.json.
I would like to delete all JSON files in "Z:\ProjectX".
I wonder how to do this. What value should be set for Folder?
In the source settings, set the File path type to Wildcard file path and provide the Wildcard file name as "*.json" to delete all the files of type JSON.
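In pipeline JSON, the equivalent Delete activity source settings would look roughly like this (the dataset name is a placeholder, and FileServerReadSettings assumes the on-prem File System connector):

{
    "name": "Delete json files",
    "type": "Delete",
    "typeProperties": {
        "dataset": {
            "referenceName": "OnPremProjectXDataset",
            "type": "DatasetReference"
        },
        "storeSettings": {
            "type": "FileServerReadSettings",
            "recursive": false,
            "wildcardFileName": "*.json"
        }
    }
}

With recursive set to false, only the JSON files directly under the dataset's folder path (the root Z:\ProjectX) are deleted.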

Copy activity with simultaneous renaming of a file. From blob to blob

I have a "copy data" activity in Azure Data Factory. I want to copy .csv files from blob container X to Blob container Y. I don't need to change the content of the files in any way, but I want to add a timestamp to the name, e.g. rename it. However, I get the following error "Binary copy does not support copying from folder to file". Both the source and the sink are set up as binary.
If you want to copy the files and rename them, your pipeline should look like this:
Create a Get Metadata activity to get the file list (dataset Binary1):
Create a ForEach activity to iterate over each file: @activity('Get Metadata1').output.childItems:
Inside the ForEach, create a Copy activity with source dataset Binary2 (same as Binary1) with a dataset parameter to specify the source file:
In the Copy activity sink settings, create the sink dataset Binary3, also with a parameter, to rename the files:
@concat(split(item().name,'.')[0],utcnow(),'.',split(item().name,'.')[1])
Run the pipeline and check the output:
Note: The example I made just copies the files to the same container but with a new name.
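A rough sketch of the ForEach and Copy activities in pipeline JSON, to show where those expressions go (the dataset parameter names SourceFileName and SinkFileName are hypothetical and must be defined on the Binary2/Binary3 datasets; store settings are omitted for brevity):

{
    "name": "ForEach1",
    "type": "ForEach",
    "typeProperties": {
        "items": {
            "value": "@activity('Get Metadata1').output.childItems",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "Copy data1",
                "type": "Copy",
                "typeProperties": {
                    "source": { "type": "BinarySource" },
                    "sink": { "type": "BinarySink" }
                },
                "inputs": [
                    {
                        "referenceName": "Binary2",
                        "type": "DatasetReference",
                        "parameters": { "SourceFileName": "@item().name" }
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "Binary3",
                        "type": "DatasetReference",
                        "parameters": { "SinkFileName": "@concat(split(item().name,'.')[0],utcnow(),'.',split(item().name,'.')[1])" }
                    }
                ]
            }
        ]
    }
}

For a file named data.csv, the sink expression produces something like data<UTC timestamp>.csv.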

How to delete specific files from the source folder using the delete task in Azure DevOps

I am trying to add a task to delete files of a specific type from the source folder and all its subfolders using the delete task in an Azure DevOps pipeline.
Delete task:
https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/utility/delete-files?view=azure-devops doesn't seem to provide any information on the patterns.
I have tried the following combinations but none of them worked.
(.xyz)
*.xyz
*.xyz\
My expectation is to delete files of type .xyz from all the subfolders.
Try setting:
**/*.xyz
as the Contents variable value.
The full range of pattern filters is described in the documentation here.
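For example, in YAML the Delete Files task would look roughly like this (the SourceFolder value below is just an assumption; adjust it to your layout):

# Deletes every .xyz file in the source folder and all of its subfolders.
- task: DeleteFiles@1
  displayName: 'Delete .xyz files'
  inputs:
    SourceFolder: '$(Build.SourcesDirectory)'
    Contents: '**/*.xyz'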

gsutil - is it possible to list only folders?

Is it possible to list only the folders in a bucket using the gsutil tool?
I can't see anything listed here.
For example, I'd like to list only the folders in this bucket:
Folders don't actually exist. gsutil and the Storage Browser do some magic under the covers to give the impression that folders exist.
You could filter your gsutil results to only show results that end with a forward slash, but this may not show all the "folders". It will only show "folders" that were manually created (i.e., not those that exist only implicitly because an object name contains slashes):
gsutil ls gs://bucket/ | grep -e "/$"
Just to add here: if you drag a folder tree directly into the Google Cloud Storage web GUI, you don't actually get an object for the parent folder; instead each file name is a fully qualified path, e.g. "/blah/foo/bar.txt", rather than a folder hierarchy blah > foo > bar.txt.
The trick here is to first use the GUI to create a folder called blah, then create another folder called foo inside it (using the button in the GUI), and finally drag the files into it.
When you now list the files you will get separate entries for
blah/
foo/
bar.txt
rather than only one
blah/foo/bar.txt
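A quick sketch of how that difference shows up at the command line (the bucket and object names are made up, and exact output may vary):

# Folders created through the GUI become zero-byte placeholder objects whose names end in "/",
# so a flat listing of every object shows each level separately:
gsutil ls "gs://my-bucket/**"
# gs://my-bucket/blah/
# gs://my-bucket/blah/foo/
# gs://my-bucket/blah/foo/bar.txt

# If blah/foo/bar.txt had been uploaded directly (no GUI-created folders),
# only the object itself would be listed:
# gs://my-bucket/blah/foo/bar.txt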