Filename pattern validation in adf v2 - azure-data-factory

I would like to validate whether my input filename follows a specified naming pattern.
For example, my filename should be <><><>_<>.csv
Yes, I am using an event-based trigger, so I will get the filename from the trigger.
Expected format: company_country_yearmonth_timestamp.CSV

There is no explicit regex-based way to validate whether an incoming file name matches a pattern. However, if you are using an activity such as Lookup or Copy, you can specify a wildcard file name or file path in the source dataset settings to fetch only the files that match the pattern.
- wildcardFileName
The file name with wildcard characters under the given container and
folder path (or wildcard folder path) to filter source files. Allowed
wildcards are: * (matches zero or more characters) and ? (matches zero
or single character). Use ^ to escape if your file name has a wildcard
or this escape character inside. See more examples in Folder and file
filter examples.
Example:
You can use an If Condition activity with an expression like the one below, which uses contains().
Here a storage event trigger passes the name of the triggering file into a pipeline parameter. The contains() function then checks whether the file name contains a specified string:
@contains(pipeline().parameters.filenameTriggered,'pattern')
If true, a Wait activity is executed.
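Because contains() only tests for a substring, checking the full company_country_yearmonth_timestamp.csv layout would need a real regular expression, which would have to run somewhere that supports one (for example a notebook or function invoked from the pipeline). The following is only a minimal Python sketch of such a check. The field formats (letters for company and country, YYYYMM for yearmonth, a 14-digit YYYYMMDDhhmmss timestamp) and the name is_valid_filename are assumptions, not something stated in the question.

```python
import re

# Assumed layout: company_country_yearmonth_timestamp.csv
#   company, country : letters only (assumption)
#   yearmonth        : YYYYMM with month 01-12 (assumption)
#   timestamp        : 14 digits, YYYYMMDDhhmmss (assumption)
FILENAME_RE = re.compile(
    r"[A-Za-z]+_[A-Za-z]+_\d{4}(0[1-9]|1[0-2])_\d{14}\.csv",
    re.IGNORECASE,  # accept both .csv and .CSV
)

def is_valid_filename(name: str) -> bool:
    """Return True only if the whole file name follows the assumed layout."""
    return FILENAME_RE.fullmatch(name) is not None

print(is_valid_filename("acme_us_202301_20230115093000.CSV"))
print(is_valid_filename("acme_202301.csv"))
```

Adjust the character classes and digit counts to whatever your real naming convention is.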

Related

How to fetch file path dynamically using pyspark

I have multiple files in my folder. I want to check by pattern matching whether a particular file is present and, if it is, store its whole file path in a variable.
How can I achieve this in PySpark?
Since you want to store the whole path in a variable, you can achieve this with a combination of dbutils and Regular expression pattern matching.
We can use dbutils.fs.ls(path) to return the list of files present in a folder (storage account or DBFS). Assign its return value to a variable called files.
#my sample path- mounted storage account folder.
files = dbutils.fs.ls("/mnt/repro")
Loop through this list. Now using Python's re.match() you can check if the current item's file name matches your pattern. If it matches, append its path to your result variable (list).
from re import match
matched_files=[]
for file in files:
    #print(file)
    if(match("sample.*csv", file.name)): #"sample.*csv" is pattern to be matched
        matched_files.append(file.path)
#print("Matched files: ",matched_files)
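One detail worth keeping in mind with the loop above: re.match() only anchors at the start of the name, so "sample.*csv" also accepts names such as sample_1.csv.bak. The sketch below shows a stricter variant using re.fullmatch() with an escaped dot. It is an illustration only: FileEntry is a hypothetical stand-in for the objects returned by dbutils.fs.ls() (reduced to the path and name attributes used above) so the example runs outside Databricks, and find_matching_paths is a made-up helper name.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the entries returned by dbutils.fs.ls(),
# reduced to the two attributes used in the loop above.
FileEntry = namedtuple("FileEntry", ["path", "name"])

def find_matching_paths(files, pattern):
    """Return the paths of entries whose whole name matches the regex pattern."""
    regex = re.compile(pattern)
    # fullmatch() requires the entire name to match, unlike match(),
    # which only anchors at the start of the string.
    return [f.path for f in files if regex.fullmatch(f.name)]

files = [
    FileEntry("dbfs:/mnt/repro/sample_1.csv", "sample_1.csv"),
    FileEntry("dbfs:/mnt/repro/sample_1.csv.bak", "sample_1.csv.bak"),
    FileEntry("dbfs:/mnt/repro/other.csv", "other.csv"),
]

matched_files = find_matching_paths(files, r"sample.*\.csv")
print(matched_files)
```

In a notebook you would pass the result of dbutils.fs.ls(...) instead of the hand-built list.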

Output filename to VSCode replace string

Say I have a search I want to do across my entire project: callToMyFunction()
And I would like to replace it with:
callToMyFunction();
console.log("calling function from ${filename}");
Is there a way (or an extension) that could do the ${filename} (or whatever) where the filename of the file where the replacement string is being applied gets written in?
If a search match was found in the file index.js the replacement would be:
callToMyFunction();
console.log("calling function from index.js");
[BONUS if there is a way to also get line number, but I'd be more than satisfied with just the filename 😁]

Databricks PySpark environment, find Azure storage account file path of files having same filename pattern

Use case: In a Databricks PySpark environment, I want to check whether multiple files with the same file name pattern exist in an Azure storage account. If they exist, I expect to get the list of file paths for each matching file.
I tried using dbutils.fs.ls, but it does not support wildcard patterns.
Workaround: Get the paths of all files in the folder, then loop over each file to do filename pattern matching and build a list of the required file paths.
Is there any other way to get the file paths without looping?
In Databricks, dbutils.fs.ls() doesn't support wildcard paths. The official documentation lists all the Databricks utilities, and there is no dbfs utility function that supports wildcard paths for matching file names.
You cannot avoid using loops. The following operations were run against a storage account containing a few arbitrary files for the demo. They show one way to get the files that match your pattern.
Using the os.listdir() function, you can get the list of all files in your container/directory.
path_dbfs="dbfs:/mnt/omega/" #absolute dbfs path to your storage
import os
#using os.listdir() to get all files in container.
path = "/dbfs/mnt/omega"
file_names = os.listdir(path)
print(file_names)
['country_data.csv', 'json_input.json', 'json_input.txt', 'person.csv', 'sample_1.csv', 'sample_2.csv', 'sample_3.csv', 'sample_new_date_4.csv', 'store.txt']
Once you have the list of all files, you can use regular expressions with re.search() to check whether each file name matches the pattern. re.search() returns a match object on success and None otherwise.
import re
#use regex with loops to get absolute paths of pattern matching files.
file_to_find_pattern = "sample.*csv" #match pattern in this case.
# .* indicates 0 or more occurrences of any character, you can build it according to your requirement.
matched_files = []
for file in file_names:
    val = re.search(file_to_find_pattern,file)
    if(val is not None):
        #append the full file name; val.group() would return only the matched portion of the name
        matched_files.append(path_dbfs+file)
print(matched_files)
['dbfs:/mnt/omega/sample_1.csv', 'dbfs:/mnt/omega/sample_2.csv', 'dbfs:/mnt/omega/sample_3.csv', 'dbfs:/mnt/omega/sample_new_date_4.csv']
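If a shell-style wildcard is enough for your pattern, the explicit loop can be hidden behind fnmatch.filter() from the Python standard library. It still iterates over the names internally, so this is a convenience rather than a way around looping. The sketch below reuses the file list printed above as literal data so that it runs on its own; in a notebook, file_names would come from os.listdir() as shown earlier.

```python
import fnmatch

path_dbfs = "dbfs:/mnt/omega/"  # same prefix as in the answer above

# Literal copy of the os.listdir() output shown above, so the sketch is self-contained.
file_names = [
    "country_data.csv", "json_input.json", "json_input.txt", "person.csv",
    "sample_1.csv", "sample_2.csv", "sample_3.csv", "sample_new_date_4.csv",
    "store.txt",
]

# fnmatch.filter() keeps the names that match a shell-style wildcard pattern.
matched_names = fnmatch.filter(file_names, "sample*.csv")
matched_files = [path_dbfs + name for name in matched_names]
print(matched_files)
```

Wildcards are less expressive than regular expressions, so the re-based loop remains the better fit for more complex patterns.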

Why is #regex used in task.json in Azure DevOps extension? What does it check for?

I came across this and was wondering what this means and how it works?
What's the significance of using #regex here and how does it expand?
https://github.com/microsoft/azure-pipelines-tasks/blob/master/Tasks/DownloadPackageV0/task.json
"endpointUrl": "{{endpoint.url}}/{{ #regex ([a-fA-F0-9\\-]+/)[a-fA-F0-9\\-]+ feed }}_apis/Packaging/Feeds/{{ #regex [a-fA-F0-9\\-]*/([a-fA-F0-9\\-]+) feed }}{{#if view}}#{{{view}}}{{/if}}/Packages?includeUrls=false"
Also I would like to know how many packages will it return and display in the Task input UI dropdown if there are thousands of packages in the feed. Is there a known limit like first 100 or something?
#regex doesn't appear to actually be documented anywhere, but it takes two space-delimited arguments. The first is a regular expression and the second is a "path expression" identifying what value to match against, in this case the value of the feed input parameter. If the regex matches the value, it returns the first capturing subexpression, otherwise it returns the empty string.
In this particular context, the feed parameter is formatted as 'projectId/feedId', where projectId and feedId are GUIDs, and projectId and the / are eliminated for organization-scoped feeds (i.e. feeds that are not inside a project). The first regex therefore extracts the project ID and inserts it into the URL, and the second regex extracts the feed ID and inserts it into the URL.
As of this writing, the default limit on the API it's calling is 1000.
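To make the behaviour described above concrete, here is a rough Python emulation of the two #regex helpers. It is a sketch of the described behaviour (return the first capture group, or an empty string if there is no match), not the actual Azure DevOps implementation, which is undocumented. The function name regex_helper, the use of re.search(), the placeholder base URL and the GUID-like sample values are all assumptions made for illustration.

```python
import re

def regex_helper(pattern: str, value: str) -> str:
    """Emulate the described behaviour: first capture group, or '' if no match."""
    m = re.search(pattern, value)  # assumption: unanchored search
    return m.group(1) if m else ""

# The two patterns from task.json after JSON unescaping ("\\-" becomes "\-").
PROJECT_RE = r"([a-fA-F0-9\-]+/)[a-fA-F0-9\-]+"
FEED_RE = r"[a-fA-F0-9\-]*/([a-fA-F0-9\-]+)"

# Made-up 'projectId/feedId' value in GUID form.
feed = "11111111-2222-3333-4444-555555555555/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"

project_part = regex_helper(PROJECT_RE, feed)  # project ID including the trailing "/"
feed_part = regex_helper(FEED_RE, feed)        # feed ID only

# Placeholder base URL standing in for {{endpoint.url}}.
url = "https://example.invalid/" + project_part + "_apis/Packaging/Feeds/" + feed_part
print(url)
```

The first pattern deliberately captures the trailing slash along with the project ID, so that when it matches nothing the URL does not end up with a stray "//".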
Regex stands for regular expression, which allows you to match any pattern rather than an exact string. You can find more info on how to use it in Azure Devops here
This regex is very specific. In task.json the backslash is doubled only because of JSON string escaping, so the actual pattern is ([a-fA-F0-9\-]+/)[a-fA-F0-9\-]+. It matches one or more characters that are each 1) a letter a-f (lowercase or uppercase), 2) a digit 0-9, or 3) a hyphen, followed by / and then again one or more of those characters.
You can copy the regex ([a-fA-F0-9\-]+/)[a-fA-F0-9\-]+ into https://regexr.com/ to play around with it and see what does and doesn't match the pattern.
Examples:
it matches: a/a, a/b, abcdef-/dcba
but doesn't match: /a, abcdef, this-doesn't-match
Note that the full endpoint is a concatenation of regular expression results and hardcoded strings!

Check file existence in Progress 4GL

How can I check for the existence of a particular file in code?
E.g.
def var a as character.
a = "abc.p".
run value(a).
---> Here I first want to check whether abc.p exists in the workspace or not.
You can use the SEARCH function. Directly from the online manual:
SEARCH function
Searches the directories and libraries defined in the PROPATH environment variable for a file. The SEARCH function returns the full pathname of the file unless it is found in your current working directory. If SEARCH does not find the file, it returns the Unknown value (?).
Syntax
SEARCH ( opsys-file )
opsys-file
A character expression whose value is the name of the file you want to find. The name can include a complete or partial directory path. If opsys-file is a constant string, you must enclose it in quotation marks (" "). The value of opsys-file must be no more than 255 characters long.
Example:
DEFINE VARIABLE cPgm AS CHARACTER NO-UNDO.
cPgm = "test.p".
IF SEARCH(cPgm) <> ? THEN
RUN VALUE(cPgm).
If you provide a fully qualified pathname, SEARCH checks if the file exists. In this case, SEARCH does not search directories on the PROPATH.
If you do not want to use the PROPATH, you can use the FILE-INFO system handle.
After setting FILE-NAME, you can check FILE-TYPE to see whether the file exists. See also the Progress Help for FILE-INFO.
FILE-INFO:FILE-NAME = a.
IF FILE-INFO:FILE-TYPE MATCHES "*F*"
THEN RUN VALUE(FILE-INFO:FULL-PATHNAME).