What is the equivalent to Kusto's countof() function in Azure Data Factory?

My requirement is to extract a string from file names using an ADF variable. I need to extract the string up to the final underscore '_', and the number of underscores varies from file name to file name, as in the examples below.
abc_xyz_20221221.txt --> abc_xyz
abc_xyz_a1_20221221.txt --> abc_xyz_a1
abc_c_ab_a1_20221221.txt --> abc_c_ab_a1
abc_c_ab_a1_a11_20221221.txt --> abc_c_ab_a1_a11
I tried to use indexof() to get the position of the final underscore, but it does not accept negative values, so I came up with the logic below, which works in KQL (Azure Data Explorer) but fails in ADF because this tool has no countof(). Is there an equivalent function in ADF, or can you suggest how to achieve the same thing?
substring("abc_xyz_20221221.txt", 0,
indexof("abc_xyz_20221221.txt", "_", 0,
strlen("abc_xyz_20221221.txt"),
countof("abc_xyz_20221221.txt", '_')))

You can also try it like this, using split() and join() inside a ForEach activity.
Array for ForEach activity:
["abc_xyz_20221221.txt","abc_xyz_a1_20221221.txt","abc_c_ab_a1_20221221.txt","abc_c_ab_a1_a11_20221221.txt"]
Append variable inside ForEach:
@join(take(split(item(), '_'), add(length(split(item(), '_')), -1)), '_')
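To see what this expression does, here is a minimal Python sketch of the same split/take/join logic, run against the sample file names from the question (Python is used only to illustrate the ADF functions; it is not part of the pipeline):
filenames = [
    "abc_xyz_20221221.txt",
    "abc_xyz_a1_20221221.txt",
    "abc_c_ab_a1_20221221.txt",
    "abc_c_ab_a1_a11_20221221.txt",
]
for name in filenames:
    parts = name.split('_')        # split(item(), '_')
    kept = parts[:len(parts) - 1]  # take(..., add(length(...), -1))
    print('_'.join(kept))          # join(..., '_')
# Prints: abc_xyz, abc_xyz_a1, abc_c_ab_a1, abc_c_ab_a1_a11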
As mentioned by @Joel Cochran, you can instead use lastIndexOf() in the expression of the Append variable activity inside the ForEach:
@substring(item(), 0, lastIndexOf(item(), '_'))

This is just a simpler form of what @Rakesh called out above; the only difference is that his implementation iterates. In my case the file name is stored in a variable named foo:
@substring(variables('foo'), 0, lastIndexOf(variables('foo'), '_'))
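For illustration, the same substring/lastIndexOf logic as a minimal Python sketch (rfind plays the role of lastIndexOf; this is only a sanity check, not pipeline code):
name = "abc_c_ab_a1_20221221.txt"
cut = name.rfind('_')  # lastIndexOf(variables('foo'), '_')
print(name[:cut])      # substring(..., 0, cut) -> abc_c_ab_a1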

Related

How to pass the outputs from the Get Metadata stage and use them for file name comparison in a Databricks notebook

I have 2 Get Metadata stages in ADF, each fetching file names from a different folder. I need to use these outputs for a file name comparison in a Databricks notebook and return true if all the files are present.
How do I pass the output from the Get Metadata stages to Databricks, perform the string comparison, and return true if all files are present and false if even one file is missing?
Please find the answer below, which I explain with one Get Metadata stage; the same can be replicated for more than one.
Create an ADF pipeline with the below activities.
Now, in the Get Metadata activity, add childItems to the Field list as an argument, to pass the output of Get Metadata to the notebook.
In the Databricks Notebook activity, add the below parameter as a Base parameter; it captures the output of Get Metadata and passes it as an input parameter to the notebook. Normally this parameter would be of object datatype, but I converted it to string datatype so the file names can be accessed in the notebook:
@string(activity('Get Metadata1').output.childItems)
Now we are able to access the Get Metadata output as a string in the notebook.
import ast

required_filenames = ['File1.csv', 'File2.csv', 'File3.csv']  ## Expected files, to compare against the Get Metadata output.
metadata_value = dbutils.widgets.get('metadata_output')  ## Read the Get Metadata output passed in through the notebook widget.
metadata_list = ast.literal_eval(metadata_value)  ## Convert the string back into a list of dicts.
blob_output_list = []  ## Will hold the names of the files actually found in blob storage.
for i in metadata_list:
    blob_output_list.append(i['name'])  ## Collect each file name from the childItems entries.
validateif = all(item in blob_output_list for item in required_filenames)  ## True only if every required file is present.
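If you also need to hand the result back to the pipeline, one option (not shown in the original answer, so treat it as a sketch) is to exit the notebook with the value; the Databricks Notebook activity then exposes it as runOutput:
dbutils.notebook.exit(str(validateif))  ## Returns 'True' or 'False' to ADF as the activity's runOutput.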
I tried it in the above way and was able to solve the provided requirement. Hope this helps.

Data Factory activity to convert to proper JSON

I am running my ADF pipeline with a Data Flow, and I am getting JSON output something like this:
{"key1":"value1","key2":"[vaq:233,popo:basic5542]"}
However, my actual requirement is to have something like this:
{"key1":"value1","key2":["vaq:233","popo:basic5542"]}
Note the placement of the double quotes for the key "key2". In my Data Factory pipeline I am using a Derived Column transformation in the Data Flow, and for key2 I am doing concat("[", Data1, ",popo:basic5542]"), where Data1 has the value vaq:233.
How can I fix the double quotes here?
You could use the below expression, instead of the concat function, and check whether it meets your requirement:
array(Data1,"popo:basic5542")
Output:
["vaq:233","popo:basic5542"]
Considering popo:basic5542 is a static value, you can try the below expression:
concat("[","\"",Data1,"\"",",","\"","popo:basic5542","\"","]")
Or, if you are getting popo and basic5542 dynamically, you can try the below:
concat("[","\"",Data1,"\"",",","\"",popo,":",basic5542,"\"","]")

How to extract the value from a JSON object in Azure Data Factory

In my ADF pipeline, the final output from a Set variable activity is something like this: {name:test, value:1234}.
The input coming into this variable is:
{
"variableName": "test",
"value": "test:1234"
}
The expression provided in the Set variable Item column is @item().ColumnName, and the ColumnName in my JSON file is something like this: "ColumnName":"test:1234"
How can I change it so that I get only 1234? I am only interested in the value here.
It looks like you need to split the value on the colon, which you can do using Azure Data Factory (ADF) expression functions: the split function, which splits a string into an array, and the last function, which returns the last item of the array. This works quite neatly in this case:
@last(split(variables('varWorking'), ':'))
Change the variable name to suit your case. You can also use string methods like lastIndexOf to locate the colon, and grab the rest of the string from there. A sample expression would be something like this:
@substring(variables('varWorking'), add(indexOf(variables('varWorking'), ':'), 1), 4)
It's a bit more complicated but may work for you, depending on the requirement.
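Both approaches are easy to verify outside ADF; here is a minimal Python sketch of the two extractions (the hard-coded length 4 only works because the sample value ends in exactly four characters):
value = "test:1234"
print(value.split(':')[-1])    # split + last -> 1234
start = value.index(':') + 1   # indexOf(':') + 1
print(value[start:start + 4])  # substring with hard-coded length 4 -> 1234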
It seems like you are using it inside an iterator, since you reference item(); however, I tried it with a simple JSON Lookup value:
@last(split(activity('Lookup').output.value[0].ColumnName, ':'))

ForEach activity does not loop over an array of numbers in Azure Data Factory

I'm using an Azure Function to retrieve an array. The function works very well, but I can't pass that array into a ForEach activity; the ForEach does not iterate over it.
The Set variable expression applied to the function's response:
@uriComponentToString(replace(uriComponent(activity('Azure Function1').output.Response), '%0D%0A', ''))
When I execute the pipeline, the Append variable activity inside the ForEach executes just one time: the ForEach treats the whole array as a single value and passes it as-is to the Append variable. How can I resolve this problem, please?
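The Set variable expression above works by URL-encoding the response, deleting the encoded CRLF sequence (%0D%0A), and decoding again; here is a minimal Python sketch of the same trick (the sample response string is made up):
from urllib.parse import quote, unquote
response = '[1,\r\n2,\r\n3]'  # hypothetical function response with CRLF noise
cleaned = unquote(quote(response).replace('%0D%0A', ''))
print(cleaned)                # [1,2,3]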
The result variable is already an array, so I think you need not convert it to an array like this when passing it to the ForEach loop:
@array(variables('result'))
Instead, just pass the variable value directly:
@variables('result')
Thank you GregGalloway. Posting your suggestion as an answer to help other community members:
Use @json(variables('result')) instead of @array(variables('result')) in the Items field of the ForEach activity.
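The json() suggestion implies the variable actually holds a JSON-formatted string rather than a real array; a minimal Python sketch of why parsing it first makes the ForEach iterate items instead of characters:
import json
result = '[1,2,3]'         # a string that merely looks like an array
print(list(result))        # iterating the string yields characters: ['[', '1', ',', ...]
print(json.loads(result))  # parsing it first yields the actual array: [1, 2, 3]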

Use CSV values in JMeter as request path

I have a JMeter User Defined Variable holding a comma-separated value: ${countries} = IN,US,CA,ALL.
(I first tried to get it as a list/array: [IN,US,CA,ALL].)
I want to use the variable to test a web service: GET /${country}/info. Is it possible using a ForEach Controller or a Loop Controller?
The only thing is that I want to save or read it as IN,US,..,ALL and use it in the request path.
Thanks
The CSV should be as per the format mentioned in the attached image.
Refer to the link on how to use CSV in Jmeter: http://ivetetecedor.com/how-to-use-a-csv-file-with-jmeter/
Thread Group Settings
No. of threads: 1
Ramp-up period: 1
Loop Count: 4
Hope this will help.
The CSV Data Set Config is a red herring; you don't need it.
You can use a Regular Expression Extractor to split the variable into another variable (e.g. MyVar), using something like:
(.+?)[,\n]
This is trying to match each item before a , or newline. It will place the values in variables like MyVar_1, MyVar_2, etc. This is as close to an array as JMeter understands natively.
You can then loop over the matches using MyVar_matchNr, and MyVar_1 to MyVar_n (you will need the __V() function to access the 'array' contents).
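As a minimal Python sketch of what that regular expression does (note it only matches items followed by a comma or newline, hence the trailing '\n' appended here):
import re
countries = "IN,US,CA,ALL\n"  # trailing newline so the last item also matches
matches = re.findall(r'(.+?)[,\n]', countries)
print(matches)                # ['IN', 'US', 'CA', 'ALL']
# JMeter exposes these as MyVar_1 .. MyVar_4, with MyVar_matchNr = 4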