Azure Data Factory - Copy Activity - REST API collection reference

Hello everyone,
I am fairly new to Data Factory and I need to copy information from Dynamics Business Central's REST API. I am struggling with the "Details" type entities such as "invoiceSalesHeader".
The API for that entity forces me to provide a header ID as a filter, so I would have to loop a few thousand times and call the REST API to retrieve the lines of each sales invoice. I find that completely ridiculous and am trying to find another way to get the information.
To avoid doing that, I am trying to get the information by calling the "salesInvoice" entity and using "$expand=salesInvoiceLines".
That gets me the information I need, but inside Data Factory's Copy Activity I am struggling with what I should put as the "collection reference" so that I end up with one row per salesInvoiceLine.
The data returned is an array of sales invoices, each with a sub-array of invoice lines.
If I select "salesInvoiceLines" as the collection reference, I end up with "$['value'][0]['salesInvoiceLines']", and that only gives me the lines of the first invoice (since there is an index of zero).
What should I put in Collection Reference so that I get one row per salesInvoiceLine?

ForEach over a nested JSON array is not supported in ADF.
As an alternative, we can use the Flatten transformation in a Data Flow to flatten the nested JSON array.
Here is my example JSON data; the structure is like yours:
[
    {
        "id": 1,
        "Value": "January",
        "orders": [ { "orderid": 1, "orderno": "qaz" }, { "orderid": 2, "orderno": "edc" } ]
    },
    {
        "id": 2,
        "Value": "February",
        "orders": [ { "orderid": 3, "orderno": "wsx" }, { "orderid": 4, "orderno": "rfv" } ]
    },
    {
        "id": 3,
        "Value": "March",
        "orders": [ { "orderid": 5, "orderno": "rfv" }, { "orderid": 6, "orderno": "tgb" } ]
    },
    {
        "id": 11,
        "Value": "November",
        "orders": [ { "orderid": 7, "orderno": "yhn" }, { "orderid": 8, "orderno": "ujm" } ]
    }
]
In the Data Flow's Flatten transformation, we select the nested JSON array to unroll by; here it is orders.
Then we can see the result: the orders arrays, each holding two objects (orderid, orderno), have been transposed into 8 flattened rows.
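With the sample data above, the flattened output would look roughly like this (one row per order, with the parent columns repeated):
[
    { "id": 1, "Value": "January", "orderid": 1, "orderno": "qaz" },
    { "id": 1, "Value": "January", "orderid": 2, "orderno": "edc" },
    { "id": 2, "Value": "February", "orderid": 3, "orderno": "wsx" },
    { "id": 2, "Value": "February", "orderid": 4, "orderno": "rfv" },
    { "id": 3, "Value": "March", "orderid": 5, "orderno": "rfv" },
    { "id": 3, "Value": "March", "orderid": 6, "orderno": "tgb" },
    { "id": 11, "Value": "November", "orderid": 7, "orderno": "yhn" },
    { "id": 11, "Value": "November", "orderid": 8, "orderno": "ujm" }
]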

How can I get the count of a JSON array in ADF?

I'm using Azure Data Factory to retrieve data and copy it into a database. The source looks like this:
{
    "GroupIds": [
        "4ee1a-0856-4618-4c3c77302b",
        "21259-0ce1-4a30-2a499965d9",
        "b2209-4dda-4e2f-029384e4ad",
        "63ac6-fcbc-8f7e-36fdc5e4f9",
        "821c9-aa73-4a94-3fc0bd2338"
    ],
    "Id": "w5a19-a493-bfd4-0a0c8djc05",
    "Name": "Test Item",
    "Description": "test item description",
    "Notes": null,
    "ExternalId": null,
    "ExpiryDate": null,
    "ActiveStatus": 0,
    "TagIds": [
        "784083-4c77-b8fb-0135046c",
        "86de96-44c1-a497-0a308607",
        "7565aa-437f-af36-8f9306c9",
        "d5d841-1762-8c14-d8420da2",
        "bac054-2b6e-a19b-ef5b0b0c"
    ],
    "ResourceIds": []
}
In my ADF pipeline, I am trying to get the count of GroupIds and store that in a database column (along with the associated Id from the JSON above).
Is there some kind of syntax I can use to tell ADF that I just want the count of GroupIds or is this going to require some kind of recursive loop activity?
You can use the length function in Azure Data Factory (ADF) to check the length of JSON arrays:
length(json(variables('varSource')).GroupIds)
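For example, a Set Variable activity could store that count in a pipeline variable. This is only a sketch, assuming a String variable named varGroupIdCount and that the source JSON sits in a variable named varSource:
{
    "name": "Set GroupIds count",
    "type": "SetVariable",
    "typeProperties": {
        "variableName": "varGroupIdCount",
        "value": {
            "value": "@string(length(json(variables('varSource')).GroupIds))",
            "type": "Expression"
        }
    }
}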
If you are loading the data into a SQL database, then you could use OPENJSON. A simple example:
DECLARE @json NVARCHAR(MAX) = '{
    "GroupIds": [
        "4ee1a-0856-4618-4c3c77302b",
        "21259-0ce1-4a30-2a499965d9",
        "b2209-4dda-4e2f-029384e4ad",
        "63ac6-fcbc-8f7e-36fdc5e4f9",
        "821c9-aa73-4a94-3fc0bd2338"
    ],
    "Id": "w5a19-a493-bfd4-0a0c8djc05",
    "Name": "Test Item",
    "Description": "test item description",
    "Notes": null,
    "ExternalId": null,
    "ExpiryDate": null,
    "ActiveStatus": 0,
    "TagIds": [
        "784083-4c77-b8fb-0135046c",
        "86de96-44c1-a497-0a308607",
        "7565aa-437f-af36-8f9306c9",
        "d5d841-1762-8c14-d8420da2",
        "bac054-2b6e-a19b-ef5b0b0c"
    ],
    "ResourceIds": []
}';

SELECT *
FROM OPENJSON( @json, '$.GroupIds' );

SELECT COUNT(*) countOfGroupIds
FROM OPENJSON( @json, '$.GroupIds' );
My results: the first query returns the five GroupIds as rows; the second returns countOfGroupIds = 5.
If your data is stored in a table, the code is similar. Make sense?
Another funky way to approach it, if you really need the count in-line, is to convert the JSON to XML using the built-in functions and then run some XPath on it. It's not as complicated as it sounds and would allow you to get the result inside the pipeline.
The Data Factory xml() function converts JSON to XML, but that JSON must have a single root property. We can fix up the JSON with concat and a single line of code. In this example I'm using a Set Variable activity, where varSource is your original JSON:
@concat('{"root":', variables('varSource'), '}')
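The result of that concat (stored here in a second variable, varIntermed1) is just the original object wrapped under a single root property, roughly:
{
    "root": {
        "GroupIds": [
            "4ee1a-0856-4618-4c3c77302b",
            ...
        ],
        "Id": "w5a19-a493-bfd4-0a0c8djc05",
        ...
    }
}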
Next, we can just apply the XPath with another simple expression:
@string(xpath(xml(json(variables('varIntermed1'))), 'count(/root/GroupIds)'))
My result: 5, as expected.
Easy, huh? It's a shame there isn't more built-in support for JSONPath (unless I'm missing something), although you can use limited JSONPath in the Copy activity.
You can use a Data Flow activity in the Azure Data Factory pipeline to get the count.
Step 1:
Connect the source to the JSON dataset and, in the source options under JSON settings, select Single document.
In the source preview, you can see there are 5 GroupIDs per ID.
Step 2:
Use the Flatten transformation to denormalize the GroupIds values into rows.
Select the GroupIds array in Unroll by and Unroll root.
Step 3:
Use the Aggregate transformation to get the count of GroupIds, grouped by Id.
Under Group by, select the column to group on (Id) from the drop-down.
Under Aggregates, build the expression to get the count of the GroupIds column.
In the aggregate data preview, the single Id shows a GroupIds count of 5.
Step 4: Connect the output to a Sink transformation to load the final output into the database.
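The sink then receives one row per Id with its count. With the sample document above, that would be roughly the following (the name of the count column depends on what you call it in the Aggregate settings):
[
    { "Id": "w5a19-a493-bfd4-0a0c8djc05", "countGroupIds": 5 }
]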

Return Only Most Recent Record From Related Entity in OData Query

I am trying to create an OData query to return bugs from Azure DevOps for a Power BI report, but I am not getting the results I am looking for, as one of the related entities that I am trying to expand returns multiple results.
My base query looks like this (simplified, with custom fields removed):
https://analytics.dev.azure.com/[organization]/[project]/_odata/v3.0-preview/WorkItems?$select=WorkItemId,WorkItemType,Title,State,LeadTimeDays&$filter=WorkItemType eq 'bug'&$expand=Teams($select=TeamName,AnalyticsUpdatedDate)
Some records return multiple team names in the JSON response:
"value": [
    {
        "WorkItemId": 16547,
        "LeadTimeDays": 173.0639004,
        "Title": "test",
        "WorkItemType": "Bug",
        "State": "Closed",
        "Severity": "3 - Medium",
        "Teams": [
            {
                "TeamName": "Team1",
                "AnalyticsUpdatedDate": "2019-09-17T01:48:46.5433333Z"
            },
            {
                "TeamName": "Team2",
                "AnalyticsUpdatedDate": "2019-12-03T16:52:39.9466667Z"
            }
        ]
    }
]
I can't tell why these records have multiple values for this entity, but I only need the most recent one (Team2 in the example above). Is it possible to return only the most recent record for the related Teams entity? I've tried using orderby and top on the expand clause and in other places in the query, to no effect. If I can't do it in the OData query, then I can accomplish it in Power BI after expanding the table.
I found out how to solve this: I needed semicolons between the query options inside the $expand clause.
https://analytics.dev.azure.com/[organization]/[project]/_odata/v3.0-preview/WorkItems?$select=WorkItemId,WorkItemType,Title,State,LeadTimeDays&$filter=WorkItemType eq 'bug'&$expand=Teams($select=TeamName,AnalyticsUpdatedDate;$orderby=AnalyticsUpdatedDate desc;$top=1)
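With that query, the example work item above would come back with only the most recent team expanded, roughly like this:
{
    "WorkItemId": 16547,
    "Title": "test",
    "WorkItemType": "Bug",
    "State": "Closed",
    "Teams": [
        {
            "TeamName": "Team2",
            "AnalyticsUpdatedDate": "2019-12-03T16:52:39.9466667Z"
        }
    ]
}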

Filter Lookup Results Using Values from a Second Lookup in Azure Data Factory

I have two lookups within an Until activity in ADF. The first lookup (BookList) is a list of books that looks like the JSON listed below.
[
    {
        "BookID": 1,
        "BookName": "Book A"
    },
    {
        "BookID": 2,
        "BookName": "Book B"
    }
]
The second lookup (ExcludedBooks) is a list of books that I want to exclude from the first list, listed below.
[
    {
        "BookID": 2,
        "BookName": "Book B"
    }
]
After these two lookups, I have a Filter activity whose items are the values from the BookList lookup. I would like the filter condition to be based on the BookID value not being listed in the ExcludedBooks values, but I'm not sure how to write this condition with the collection functions in ADF. What I have is listed below, and it does not work.
@not(contains(activity('ExcludedBooks').output.value, item().BookID))
I realize one way to solve this is to loop through each record of ExcludedBooks and use a Set Variable activity to build an array of BookIDs, which WOULD work with the collection function contains(), but ADF does not allow nested iteration activities for some reason (a ForEach within an Until).
I also cannot set the list of excluded books outside of the Until activity, as it will change with each iteration of the Until activity. I also realize the workaround to the nested activity restriction is to create a completely separate pipeline, but that is not ideal and creates unnecessary complexity when trying to return the results.
Does anyone have any suggestions for how to filter the results of a lookup based on the results of another lookup?
The expression below doesn't work because each item of activity('ExcludedBooks').output.value is an object, while item().BookID is a number.
@not(contains(activity('ExcludedBooks').output.value, item().BookID))
If each item in ExcludedBooks has the same shape as the items in BookList (like your provided sample), you can use this expression: @not(contains(activity('ExcludedBooks').output.value, item())).
In my test, Book B is filtered out and only Book A remains.
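Wired into the pipeline, the Filter activity would look roughly like this (the activity name is just a placeholder; BookList and ExcludedBooks are the two Lookup activities):
{
    "name": "FilterOutExcludedBooks",
    "type": "Filter",
    "typeProperties": {
        "items": {
            "value": "@activity('BookList').output.value",
            "type": "Expression"
        },
        "condition": {
            "value": "@not(contains(activity('ExcludedBooks').output.value, item()))",
            "type": "Expression"
        }
    }
}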
On the other hand, if each item in ExcludedBooks looks like this JSON (BookList is the same as your sample):
[
    {
        "BookID": 2,
        "BookName": "Book B",
        "num": 22
    }
]
then you can compare only their BookID values by using this expression:
@not(contains(join(activity('ExcludedBooks').output.value,','),concat('"BookID":',item().BookID,',')))
(This casts activity('ExcludedBooks').output.value to a string, builds a fragment like "BookID":2, from the current BookList item, and checks whether the ExcludedBooks string contains that fragment.)
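To make that concrete with the sample data, the two sides of the contains() check would evaluate to roughly these strings (exact whitespace may differ):
join(activity('ExcludedBooks').output.value, ',')  ->  {"BookID":2,"BookName":"Book B","num":22}
concat('"BookID":', item().BookID, ',')  ->  "BookID":2, for Book B, and "BookID":1, for Book A
so Book B is matched and filtered out, while Book A passes through.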
My test shows the same result: Book B is excluded and Book A remains.
Hope this can help you.

Partial Representations in REST APIs for Collections vs. Items

I'm putting together a REST-based API, but I'm not sure how I should deliver the response for collections vs. individual resources.
Does it make sense to have a slimmed-down representation for a collection compared to a single item in the world of REST?
Say I have something along the lines of this for a collection of albums:
{
    "items": [
        {
            "id": 1,
            "title": "Thriller"
        },
        ...
    ]
}
But then for the actual individual item I had:
{
    "id": 1,
    "title": "Thriller",
    "artist": "Michael Jackson",
    "released": "1982",
    "imageLinks": {
        "smallThumbnail": "...",
        "largeThumbnail": "..."
    },
    ...
}
A resource representation should be the same irrespective of whether it is returned in a collection or as a single item. However, you can introduce a parameter like fields which clients can use to request only the fields they need, thereby optimising bandwidth.
/albums - This should give the list of objects, each having the same structure as what you would return from the individual item API.
/albums?fields=id,title - This can give the list of objects with just the id and title.
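The same parameter works for a single item too. For example, with the sample album above, GET /albums/1?fields=id,title,artist might return something like this (a sketch; the exact envelope depends on your API conventions):
{
    "id": 1,
    "title": "Thriller",
    "artist": "Michael Jackson"
}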

ExtJS 4 Ext Direct form loading with array-named fields

I have fields like this (only one shown as an example):
Ext.create("Ext.form.Number", {
    name: "field[]",
    allowDecimals: true
});
...and I can post values fine. But when I try to load values (form.load({params: {id: 1}})), it returns failure and doesn't load the values into the fields.
The returned Ajax response looks like this:
{
    "type": "rpc",
    "tid": 2,
    "action": "MyAction",
    "method": "getFormData",
    "result": {
        "field": ["5"]
    }
}
Can you help me? What should I do? Can't the form load array values into array-named fields?
Array is not a valid type for fields. How would you expect this to work? ExtJS stores are like tables in a database, and models are like the rows.
Just as you cannot save an array into a column in MySQL, you cannot save one into a field of an ExtJS model either.
You have to model your data differently, in two tables instead of one (a main table and a details table). Do it the same way as you would in a database.
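For example, instead of returning the values as an array on a single record, the detail values could come back as their own records that a separate store/model loads. This is only a hypothetical shape; the method name and field names are made up for illustration:
{
    "type": "rpc",
    "tid": 2,
    "action": "MyAction",
    "method": "getFieldValues",
    "result": [
        { "mainId": 1, "fieldIndex": 0, "value": "5" },
        { "mainId": 1, "fieldIndex": 1, "value": "7" }
    ]
}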