Azure data factory get object properties from array - azure-data-factory

I have a Get Metadata activity which get all child items under a blob container. There are both files and folders but i just need files. So in a filter activity which filter for only items of type = file. This is what I got from the filter activity:
Output
{
"ItemsCount": 4,
"FilteredItemsCount": 3,
"Value": [
{
"name": "BRAND MAPPING.csv",
"type": "File"
},
{
"name": "ChinaBIHRA.csv",
"type": "File"
},
{
"name": "ChinaBIHRA.csv_20201021121500",
"type": "File"
}
]
}
So there is an array of 3 objects being returned. Each object has a name and type properties. I want just the names to be fed to a store procedure activity as a parameter. I have used this expression to try to get a comma separated list as the parameter.
#join(activity('Filter1').output.Value.name, ',')
and got this error:
The expression 'join(activity('Filter1').output.Value.name, ',')' cannot be evaluated because property 'name' cannot be selected. Array elements can only be selected using an integer index.
So how can I achieve this?

You can create For Each activity after Filter activity. Within For Each activity, append file name.
Step:
1.create two variable.
2.Setting of For Each activity
3.Setting of Append Variable activity within For Each activity
4.Setting of Set variable

Use the below codeblock instead
#concat('''',join(json(replace(replace(replace(replace(string(
activity('Filter1').output.Value)
,',"type":"File"','')
,'"name":','')
,'{','')
,'}','')),''','''),'''')
This would forego the use of multiple activities, and you can use the existing framework that you have.

Related

Check If the Array contains value in Azure Data Factory

I need to process files in the container using Azure Datafactory and keep a track of processed files in the next execution.
so I am keeping a table in DB which stores the processed file information,
In ADF I am getting the FileNames of the processed files and I want to check whether the current file has been processed or not.
I am Using Lookup activity: Get All Files Processed
to get the processed files from DB by using below query:
select FileName from meta.Processed_Files;
Then I am traversing over the directory, and getting File Details for current File in the directory by using Get Metadata Activity: "Get Detail of Current File in Iteration"
and in the If Condition activity, I am using following Expression:
#not(contains(activity('Get All Files Processed').output.value,activity('Get Detail of current file in iteration').output.itemName))
This is always returning True even if the file has been processed
How do we compare the FileName from the returned value
Output of activity('Get All Files Processed').output.value
{
"count": 37,
"value": [
{
"FileName": "20210804074153AlteryxRunStats.xlsx"
},
{
"FileName": "20210805074129AlteryxRunStats.xlsx"
},
{
"FileName": "20210806074152AlteryxRunStats.xlsx"
},
{
"FileName": "20210809074143AlteryxRunStats.xlsx"
},
{
"FileName": "20210809074316AlteryxRunStats.xlsx"
},
{
"FileName": "20210810074135AlteryxRunStats.xlsx"
},
{
"FileName": "20210811074306AlteryxRunStats.xlsx"
},
Output of activity('Get Detail of current file in iteration').output.itemName
"20210804074153AlteryxRunStats.xlsx"
I often pass this type of thing off to SQL in Azure Data Factory (ADF) too, especially if I've got one in the architecture. However bearing in mind that any hand-offs in ADF take time, it is possible to check if an item exists in an array using contains, eg a set of files returned from a Lookup.
Background
Ordinary arrays normally look like this: [1,2,3] or ["a","b","c"], but if you think about values that get returned in ADF, eg from Lookups, they they look more like this:
{
"count": 3,
"value": [
{
"Filename": "file1.txt"
},
{
"Filename": "file2.txt"
},
{
"Filename": "file3.txt"
}
],
"effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime (North Europe)",
"billingReference": {
"activityType": "PipelineActivity",
"billableDuration": [
{
...
So what you've got is a complex piece of JSON representing an object (the return value of the Lookup activity plus some additional useful info about the execution), and the array we are interested in is within the value object. However it has additional curly brackets, ie it is itself an object.
Solution
So the thing to do is to pass to contains something that will look like your object which has the single attribute Filename. Use concat to create the string and json to make it authentic:
#contains(activity('Lookup').output.value, json(concat('{"Filename":"',pipeline().parameters.pFileToCheck,'"}')))
Here I'm using a parameter which holds the filename to check but this could also be a variable or output from another Lookup activity.
Sample output from Lookup:
The Set Variable expression using contains:
The result assigned to a variable of boolean type:
I tried something like this.
from SQL table, brought all the processed files as comma-separated values using select STRING_AGG(processedfile, ',') as files in lookup activity
Assign the comma separated value to an array variable (test) using split function
#split(activity('Lookup1').output.value[0]['files'],',')
meta data activity to get current files in directory
filter activity to filter the files in current directory against the processed files
items:
#activity('Get Metadata1').output.childitems
condition:
#not(contains(variables('test'),item().name))

How can I get the count of JSON array in ADF?

I'm using Azure data factory to retrieve data and copy into a database... the Source looks like this:
{
"GroupIds": [
"4ee1a-0856-4618-4c3c77302b",
"21259-0ce1-4a30-2a499965d9",
"b2209-4dda-4e2f-029384e4ad",
"63ac6-fcbc-8f7e-36fdc5e4f9",
"821c9-aa73-4a94-3fc0bd2338"
],
"Id": "w5a19-a493-bfd4-0a0c8djc05",
"Name": "Test Item",
"Description": "test item description",
"Notes": null,
"ExternalId": null,
"ExpiryDate": null,
"ActiveStatus": 0,
"TagIds": [
"784083-4c77-b8fb-0135046c",
"86de96-44c1-a497-0a308607",
"7565aa-437f-af36-8f9306c9",
"d5d841-1762-8c14-d8420da2",
"bac054-2b6e-a19b-ef5b0b0c"
],
"ResourceIds": []
}
In my ADF pipeline, I am trying to get the count of GroupIds and store that in a database column (along with the associated Id from the JSON above).
Is there some kind of syntax I can use to tell ADF that I just want the count of GroupIds or is this going to require some kind of recursive loop activity?
You can use the length function in Azure Data Factory (ADF) to check the length of json arrays:
length(json(variables('varSource')).GroupIds)
If you are loading the data to a SQL database then you could use OPENJSON, a simple example:
DECLARE #json NVARCHAR(MAX) = '{
"GroupIds": [
"4ee1a-0856-4618-4c3c77302b",
"21259-0ce1-4a30-2a499965d9",
"b2209-4dda-4e2f-029384e4ad",
"63ac6-fcbc-8f7e-36fdc5e4f9",
"821c9-aa73-4a94-3fc0bd2338"
],
"Id": "w5a19-a493-bfd4-0a0c8djc05",
"Name": "Test Item",
"Description": "test item description",
"Notes": null,
"ExternalId": null,
"ExpiryDate": null,
"ActiveStatus": 0,
"TagIds": [
"784083-4c77-b8fb-0135046c",
"86de96-44c1-a497-0a308607",
"7565aa-437f-af36-8f9306c9",
"d5d841-1762-8c14-d8420da2",
"bac054-2b6e-a19b-ef5b0b0c"
],
"ResourceIds": []
}'
SELECT *
FROM OPENJSON( #json, '$.GroupIds' )
SELECT COUNT(*) countOfGroupIds
FROM OPENJSON( #json, '$.GroupIds' );
My results:
If your data is stored in a table the code is similar. Make sense?
Another funky way to approach it if you really need the count in-line, is to convert the JSON to XML using the built-in functions and then run some XPath on it. It's not as complicated as it sounds and would allow you to get the result inside the pipeline.
The Data Factory XML function converts JSON to XML, but that JSON must have a single root property. We can fix up the json with concat and a single line of code. In this example I'm using a Set Variable activity, where varSource is your original JSON:
#concat('{"root":', variables('varSource'), '}')
Next, we can just apply the XPath with another simple expression:
#string(xpath(xml(json(variables('varIntermed1'))), 'count(/root/GroupIds)'))
My results:
Easy huh. It's a shame there isn't more built-in support for JPath unless I'm missing something, although you can use limited JPath in the Copy activity.
You can use Data flow activity in the Azure data factory pipeline to get the count.
Step1:
Connect the Source to JSON dataset, and in Source options under JSON settings, select single document.
In the source preview, you can see there are 5 GroupIDs per ID.
Step2:
Use flatten transformation to deformalize the values into rows for GroupIDs.
Select GroupIDs array in Unroll by and Unroll root.
Step3:
Use Aggregate transformation, to get the count of GroupIDs group by ID.
Under Group by: Select a column from the drop-down for your aggregation.
Under Aggregate: You can build the expression to get the count of the column (GroupIDs).
Aggregate Data preview:
Step4: Connect the output to Sink transformation to load the final output to database.

Filter Lookup Results from values in Second Lookup in Azure Data Factory

I have two lookups within an Until activity in ADF. The first lookup (BookList) is a list of books that look like the JSON listed below.
[
{
"BookID": 1,
"BookName": "Book A"
},
{
"BookID": 2,
"BookName": "Book B"
}
]
The second lookup is a list of books that I want to exclude from the first list (ExcludedBooks) which is listed below.
[
{
"BookID": 2,
"BookName": "Book B"
}
]
After these two lookups, I have a Filter activity whose items are the values from BookList lookup. I would like the filter condition to be based on the BookID value not being listed in the ExcludedBooks values, but I'm not sure how to write this condition based on the collection functions in ADF. What I have is listed below which does not work.
#not(contains(activity('ExcludedBooks').output.value, item().BookID))
I realize one way to solve this is to loop through each record of the ExcludedBooks and use a SetVariable
activity to build an array of BookIDs which WOULD work with the collection function Contains(), but ADF does not allow nested activity groups for some reason (ForEach within an Until).
I also cannot set the list of excluded books outside of the Until activtity as it will change with each iteration of the Until activity. I also realize the workaround to the nested group activity restriction is to create a completely different pipeline, but that is not ideal and creates unnecessary complexity when trying to return the results.
Does anyone have any suggestions for how to filter the results of a lookup based on the results of another lookup?
Below expression doesn't work because item of activity('ExcludedBooks').output.value is object,item().BookID is number.
#not(contains(activity('ExcludedBooks').output.value, item().BookID))
If your each item in ExcludedBooks is the same as your item in BookList(like your provide sample),you can use this expression:#not(contains(activity('ExcludedBooks').output.value, item())).
My test result:
For another hand,if your item in ExcludedBooks like this json(BookList is the same as your provided):
[
{
"BookID": 2,
"BookName": "Book B",
"num": 22
}
]
you can only compare their BookID by using this expression:
#not(contains(join(activity('ExcludedBooks').output.value,','),concat('"BookID":',item().BookID,',')))
(cast activity('ExcludedBooks').output.value to string,concat item() in 'BookList' as "BookID":2, and check whether 'ExcludedBooks' string contains 'BookList' item string)
My test result:
Hope this can help you.

Convert Row Count to INT in Azure Data Factory

I am trying to use a Lookup Activity to return a row count. I am able to do this, but once I do, I would like to run an If Statement against it and if the count returns more than 20MIL in rows, I want to execute an additional pipeline for further table manipulation. The issue, however, is that I can not compare the returned value to a static integer. Below is the current Dynamic Expression I have for this If Statement:
#greater(int(activity('COUNT_RL_WK_GRBY_LOOKUP').output),20000000)
and when fired, the following error is returned:
{
"errorCode": "InvalidTemplate",
"message": "The function 'int' was invoked with a parameter that is not valid. The value cannot be converted to the target type",
"failureType": "UserError",
"target": "If Condition1",
"details": ""
}
Is it possible to convert this returned value to an integer in order to make the comparison? If not, is there a possible work around in order to achieve my desired result?
Looks like the issue is with your dynamic expression. Please correct your dynamic expression similar to below and retry.
If firstRowOnly is set to true : #greater(int(activity('COUNT_RL_WK_GRBY_LOOKUP').output.firstRow.propertyname),20000000)
If firstRowOnly is set to false : #greater(int(activity('COUNT_RL_WK_GRBY_LOOKUP').output.value[zero based index].propertyname),20000000)
The lookup result is returned in the output section of the activity run result.
When firstRowOnly is set to true (default), the output format is as shown in the following code. The lookup result is under a fixed firstRow key. To use the result in subsequent activity, use the pattern of #{activity('MyLookupActivity').output.firstRow.TableName}.
Sample Output JSON code is as follows:
{
"firstRow":
{
"Id": "1",
"TableName" : "Table1"
}
}
When firstRowOnly is set to false, the output format is as shown in the following code. A count field indicates how many records are returned. Detailed values are displayed under a fixed value array. In such a case, the Lookup activity is followed by a Foreach activity. You pass the value array to the ForEach activity items field by using the pattern of #activity('MyLookupActivity').output.value. To access elements in the value array, use the following syntax: #{activity('lookupActivity').output.value[zero based index].propertyname}. An example is #{activity('lookupActivity').output.value[0].tablename}.
Sample Output JSON Code is as follows:
{
"count": "2",
"value": [
{
"Id": "1",
"TableName" : "Table1"
},
{
"Id": "2",
"TableName" : "Table2"
}
]
}
Hope this helps.
Do this - when you run the debugger look at the output from your lookup. It will give a json string including the alias for the result of your query. If it's not firstrow set then you get a table. But for first you'll get output then firstRow and then your alias. So that's what you specify.
For example...if you put alias of your count as Row_Cnt then...
#greater(activity('COUNT_RL_WK_GRBY_LOOKUP').output.firstRow.Row_Cnt,20000000)
You don't need the int function. You were trying to do that (just like I was!) because it was complaining about datatype. That's because you were returning a bunch of json text as the output instead of the value you were after. Totally makes sense after you realize how it works. But it is NOT intuitively obvious because it's coming back with data but its string stuff from json, not the value you're after. And functions like equals are just happy with that. It's not until you try to do something like greater where it looks for numeric value that it chokes.

Is it possible to run a query using Orion where the searching criteria is given by attributes value?

As I create entities in an Orion server, I can search by ID, as flat or using regular expressions:
http://<localhost>:1026/v1/queryContext
Content:
{
"entities": [
{
"type": "Sensor",
"isPattern": "true",
"id": "sensor_1.*"
}
],
"attributes": ["temperature","humidity"]
}
In the above example I'd get all objects of type "Sensor" whose ID starts with "sensor_1", and their attributes "temperature" and "humidity". I wonder if there is any way that would allow me to search by specific attribute value, for example to get those sensors whose humidity is over "60.2", or this selection must be done over the retrieved data queried by ID.
Not in the current Orion version (0.19.0) but it will be possible in the future. Have a look to the attribute::<name> filter with the = operator in this document.