Azure Data Factory Copy Data activity - Use variables/expressions in mapping to dynamically select correct incoming column - azure-data-factory

I have the below mappings for a Copy activity in ADF:
"translator": {
"type": "TabularTranslator",
"mappings": [
{
"source": {
"path": "$['id']"
},
"sink": {
"name": "TicketID"
}
},
{
"source": {
"path": "$['summary']"
},
"sink": {
"name": "TicketSummary"
}
},
{
"source": {
"path": "$['status']['name']"
},
"sink": {
"name": "TicketStatus"
}
},
{
"source": {
"path": "$['company']['identifier']"
},
"sink": {
"name": "CustomerAccountNumber"
}
},
{
"source": {
"path": "$['company']['name']"
},
"sink": {
"name": "CustomerName"
}
},
{
"source": {
"path": "$['customFields'][74]['value']"
},
"sink": {
"name": "Landlord"
}
},
{
"source": {
"path": "$['customFields'][75]['value']"
},
"sink": {
"name": "Building"
}
}
],
"collectionReference": "",
"mapComplexValuesToString": false
}
The challenge I need to overcome is that the array indexes of the custom fields in the last two sources might change. So I've created an Azure Function which calculates the correct array index. However, I can't work out how to use the Azure Function's output value in the source path string. I have tried to refer to it with an expression like @activity('Get Building Field Index').output, but since the property expects a literal JSON path, this doesn't work and produces an error:
JSON path $['customFields'][@activity('Get Building Field Index').output]['value'] is invalid.
Is there a different way to achieve what I am trying to do?
Thanks in advance

I have a slightly similar scenario that you might be able to work with.
First, I have a JSON file that is emitted, which I then access with Synapse/ADF via a Lookup activity.
Next, I have a ForEach activity that runs a Copy data activity.
The ForEach activity receives my Lookup output and makes my JSON usable by setting the following in the ForEach's Settings (Items):
@activity('Lookup').output.firstRow.childItems
My JSON roughly looks as follows:
{"childItems": [
{"subpath": "path/to/folder",
"filename": "filename.parquet",
"subfolder": "subfolder",
"outfolder": "subfolder",
"origin": "A"}]}
This means that in my Copy data activity inside the ForEach, I can access the properties of my JSON like so:
@item()['subpath']
@item()['filename']
@item()['subfolder']
... etc.
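To make the wiring concrete, here is a rough sketch of how the Copy activity inside the ForEach can hand those item values to a parameterized source dataset (the dataset and parameter names here are hypothetical, not from the original setup):
{
    "name": "ForEach1",
    "type": "ForEach",
    "typeProperties": {
        "items": {
            "value": "@activity('Lookup').output.firstRow.childItems",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "CopyData1",
                "type": "Copy",
                "inputs": [
                    {
                        "referenceName": "SourceParquetDataset",
                        "type": "DatasetReference",
                        "parameters": {
                            "folderPath": "@item()['subpath']",
                            "fileName": "@item()['filename']"
                        }
                    }
                ]
            }
        ]
    }
}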
Edit:
Adding some screen caps of the parameterization:
https://i.stack.imgur.com/aHpWk.png
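Coming back to the original question's moving array index: since the translator property itself accepts dynamic content (as the dynamic schema mapping question further down this page shows), one option is to build the whole mapping JSON as a string and convert it with json(). A minimal sketch, assuming the Azure Function returns the index in output.Response (the exact property depends on your function; the other, static mappings would be concatenated in the same way):
"translator": {
    "value": "@json(concat('{\"type\":\"TabularTranslator\",\"mappings\":[{\"source\":{\"path\":\"$[''customFields''][', string(activity('Get Building Field Index').output.Response), '][''value'']\"},\"sink\":{\"name\":\"Building\"}}]}'))",
    "type": "Expression"
}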

Azure Data Factory Dynamic Content

I have the following output from a web activity.
{
    "value": [
        {
            "id": "/subscriptions/xy_csv",
            "name": "xy_csv",
            "type": "Microsoft.code",
            "etag": "6200",
            "properties": {
                "folder": {
                    "name": "samplecodes"
                },
                "content": {
                    "query": "select * from table 1",
                    "metadata": {
                        "language": "sql"
                    },
                    "currentConnection": {
                        "databaseName": "demo",
                        "poolName": "Built-in"
                    },
                    "resultLimit": 5000
                },
                "type": "SqlQuery"
            }
        },
        {
            "id": "/subscriptions/ab_csv",
            "name": "ab_csv",
            "type": "Microsoft.code",
            "etag": "6200",
            "properties": {
                "folder": {
                    "name": "livecode"
                },
                "content": {
                    "query": "select * from table 2",
                    "metadata": {
                        "language": "sql"
                    },
                    "currentConnection": {
                        "databaseName": "demo",
                        "poolName": "Built-in"
                    },
                    "resultLimit": 5000
                },
                "type": "SqlQuery"
            }
        }
    ]
}
I would like to create filter activity after the web activity just to filter out items that are saved under the folder name "livecode".
On the Filter activity's Items field I have: @activity('Web1').output.value
On the Condition field I have: @startswith(item().properties.folder.name,'livecode')
The web activity succeeds, but the Filter activity fails with this error:
{
    "errorCode": "InvalidTemplate",
    "message": "The execution of template action 'FilterFilter1' failed: The evaluation of 'query' action 'where' expression '@startswith(item().properties.folder.name,'sql')' failed: 'The expression 'startswith(item().properties.folder.name,'sql')' cannot be evaluated because property 'folder' doesn't exist, available properties are 'content, type'.",
    "failureType": "UserError",
    "target": "Filter1",
    "details": ""
}
It feels like I've gone wrong in how I've written the Condition dynamic content to navigate to properties.folder.name, but I am not sure what is missing in my condition. Can anyone help? Thanks, much appreciated.
The error occurs because the web activity output's properties object does not always contain the folder key.
I took the following JSON and got the same error:
{
    "value": [
        {
            "id": "/subscriptions/xy_csv",
            "name": "xy_csv",
            "type": "Microsoft.code",
            "etag": "6200",
            "properties": {
                "content": {
                    "query": "select * from table 1",
                    "metadata": {
                        "language": "sql"
                    },
                    "currentConnection": {
                        "databaseName": "demo",
                        "poolName": "Built-in"
                    },
                    "resultLimit": 5000
                },
                "type": "SqlQuery"
            }
        },
        {
            "id": "/subscriptions/ab_csv",
            "name": "ab_csv",
            "type": "Microsoft.code",
            "etag": "6200",
            "properties": {
                "folder": {
                    "name": "livecode"
                },
                "content": {
                    "query": "select * from table 2",
                    "metadata": {
                        "language": "sql"
                    },
                    "currentConnection": {
                        "databaseName": "demo",
                        "poolName": "Built-in"
                    },
                    "resultLimit": 5000
                },
                "type": "SqlQuery"
            }
        }
    ]
}
So you have to modify the filter condition to check whether each item's properties object contains a folder key before reading it, using the following dynamic content (for my repro, I supplied your web activity output as a parameter value):
@startswith(if(contains(item().properties,'folder'),item().properties.folder.name,''),'livecode')
When I debug the pipeline, I get the desired result.
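For reference, the resulting Filter activity definition looks roughly like this (a sketch using the activity names from the question):
{
    "name": "Filter1",
    "type": "Filter",
    "typeProperties": {
        "items": {
            "value": "@activity('Web1').output.value",
            "type": "Expression"
        },
        "condition": {
            "value": "@startswith(if(contains(item().properties,'folder'),item().properties.folder.name,''),'livecode')",
            "type": "Expression"
        }
    }
}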

JSON Schema - can array / list validation be combined with anyOf?

I have a json document I'm trying to validate with this form:
...
"products": [{
"prop1": "foo",
"prop2": "bar"
}, {
"prop3": "hello",
"prop4": "world"
},
...
There are multiple different forms an object may take. My schema looks like this:
...
"definitions": {
"products": {
"type": "array",
"items": { "$ref": "#/definitions/Product" },
"Product": {
"type": "object",
"oneOf": [
{ "$ref": "#/definitions/Product_Type1" },
{ "$ref": "#/definitions/Product_Type2" },
...
]
},
"Product_Type1": {
"type": "object",
"properties": {
"prop1": { "type": "string" },
"prop2": { "type": "string" }
},
"Product_Type2": {
"type": "object",
"properties": {
"prop3": { "type": "string" },
"prop4": { "type": "string" }
}
...
On top of this, certain properties of the individual product array objects may be indirected via further usage of anyOf or oneOf.
I'm running into issues in VSCode using the built-in schema validation, where it throws errors for every item in the products array that doesn't match Product_Type1.
So it seems the validator latches onto the first entry in oneOf and won't validate against any of the other types.
I didn't find any limitations of the oneOf mechanism on jsonschema.org, and there is no mention of any on the page specifically dealing with arrays: https://json-schema.org/understanding-json-schema/reference/array.html
Is what I'm attempting possible?
Your general approach is fine. Let's take a slightly simpler example to illustrate what's going wrong.
Given this schema
{
"oneOf": [
{ "properties": { "foo": { "type": "integer" } } },
{ "properties": { "bar": { "type": "integer" } } }
]
}
And this instance
{ "foo": 42 }
At first glance, this looks like it matches /oneOf/0 and not /oneOf/1. It actually matches both schemas, which violates the one-and-only-one constraint imposed by oneOf, so the oneOf fails.
Remember that every keyword in JSON Schema is a constraint. Anything that is not explicitly excluded by the schema is allowed. There is nothing in the /oneOf/1 schema that says a "foo" property is not allowed, nor does it say that "foo" is required. It only says that if the instance has a property "foo", then it must be an integer.
To fix this, you will need required and maybe additionalProperties, depending on the situation. I show here how you would use additionalProperties, but I recommend you don't use it unless you need to, because it does have some problematic behaviors.
{
    "oneOf": [
        {
            "properties": { "foo": { "type": "integer" } },
            "required": ["foo"],
            "additionalProperties": false
        },
        {
            "properties": { "bar": { "type": "integer" } },
            "required": ["bar"],
            "additionalProperties": false
        }
    ]
}
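Applied back to the original schema, each product type would declare its distinguishing properties as required, for example (a sketch; add additionalProperties only if you accept the caveat above):
"Product_Type1": {
    "type": "object",
    "properties": {
        "prop1": { "type": "string" },
        "prop2": { "type": "string" }
    },
    "required": ["prop1", "prop2"]
},
"Product_Type2": {
    "type": "object",
    "properties": {
        "prop3": { "type": "string" },
        "prop4": { "type": "string" }
    },
    "required": ["prop3", "prop4"]
}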

Using ADF Copy Activity with dynamic schema mapping

I'm trying to drive the columnMapping property from a database configuration table. My first activity in the pipeline pulls in the rows from the config table. My copy activity source is a Json file in Azure blob storage and my sink is an Azure SQL database.
In the copy activity, I'm setting the mapping using the dynamic content window. The code looks like this:
"translator": {
"value": "#json(activity('Lookup1').output.value[0].ColumnMapping)",
"type": "Expression"
}
My question is, what should the value of activity('Lookup1').output.value[0].ColumnMapping look like?
I've tried several different JSON formats, but the copy activity always seems to ignore them.
For example, I've tried:
{
"type": "TabularTranslator",
"columnMappings": {
"view.url": "url"
}
}
and:
"columnMappings": {
"view.url": "url"
}
and:
{
"view.url": "url"
}
In this example, view.url is the name of the column in the JSON source, and url is the name of the column in my destination table in Azure SQL database.
The issue is due to the dot (.) in your column name.
To use column mapping, you should also specify the structure in your source and sink datasets.
For your source dataset, you need to specify the format correctly, and since your column name contains a dot, you need to specify the JSON path accordingly.
You could use the ADF UI to set up a copy for a single file first to get the related format, structure, and column mapping format, then change it to the lookup.
As I understand it, your first format should be the right one. If the value is already in JSON format, you may not need the json() function in your expression.
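In other words, the value stored in the ColumnMapping column could look something like this (a sketch, assuming view.url refers to a url property nested under view in the source JSON):
{
    "type": "TabularTranslator",
    "mappings": [
        {
            "source": { "path": "$['view']['url']" },
            "sink": { "name": "url" }
        }
    ]
}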
There seems to be a disconnect between the question and the answer, so I'll hopefully provide a more straightforward answer.
When setting this up, you should have a source dataset with a dynamic structure; the sink dataset doesn't require one, as we're going to specify it in the mapping.
Within the copy activity, format the dynamic json like the following:
{
    "structure": [
        { "name": "Address Number" },
        { "name": "Payment ID" },
        { "name": "Document Number" },
        ...
    ]
}
You would then specify your dynamic mapping like this:
{
    "translator": {
        "type": "TabularTranslator",
        "mappings": [
            {
                "source": { "name": "Address Number", "type": "Int32" },
                "sink": { "name": "address_number" }
            },
            {
                "source": { "name": "Payment ID", "type": "Int64" },
                "sink": { "name": "payment_id" }
            },
            {
                "source": { "name": "Document Number", "type": "Int32" },
                "sink": { "name": "document_number" }
            },
            ...
        ]
    }
}
Assuming these were set in separate variables, you would want to send the structure as a string and the mapping as JSON:
source: @string(json(variables('str_dyn_structure')).structure)
mapping: @json(variables('str_dyn_translator')).translator
VladDrak - You could skip the dynamic source definition by building the dynamic mapping like this:
{
    "translator": {
        "type": "TabularTranslator",
        "mappings": [
            {
                "source": { "type": "String", "ordinal": "1" },
                "sink": { "name": "dateOfActivity", "type": "String" }
            },
            {
                "source": { "type": "String", "ordinal": "2" },
                "sink": { "name": "CampaignID", "type": "String" }
            }
        ]
    }
}

CloudFormation - Access Output of Parent Stack in Child Nested stack

I have a master CloudFormation template which invokes two child templates. The first child template runs and captures its values in the Outputs section of the resource. I have tried many ways to use the ChildStack01 output values in the second, nested template, and I am not sure why I get "Template format error: Unresolved resource dependencies [XYZ] in the Resources block of the template". Here is my master template.
{
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "LambdaStack": {
            "Type": "AWS::CloudFormation::Stack",
            "Properties": {
                "TemplateURL": "https://s3.amazonaws.com/bucket1/cloudformation/Test1.json",
                "TimeoutInMinutes": "60"
            }
        },
        "PermissionsStack": {
            "Type": "AWS::CloudFormation::Stack",
            "Properties": {
                "TemplateURL": "https://s3.amazonaws.com/bucket1/cloudformation/Test2.json",
                "Parameters": {
                    "LambdaTest": {
                        "Fn::GetAtt": ["LambdaStack", "Outputs.LambdaTest"]
                    }
                },
                "TimeoutInMinutes": "60"
            }
        }
    }
}
Here is my Test1.json Template
{
    "Resources": {
        "LambdaTestRes": {
            "Type": "AWS::Lambda::Function",
            "Properties": {
                "Description": "Testing AWS cloud formation",
                "FunctionName": "LambdaTest",
                "Handler": "lambda_handler.lambda_handler",
                "MemorySize": 128,
                "Role": "arn:aws:iam::3423435234235:role/lambda_role",
                "Runtime": "python2.7",
                "Timeout": 300,
                "Code": {
                    "S3Bucket": "bucket1",
                    "S3Key": "cloudformation/XYZ.zip"
                }
            }
        }
    },
    "Outputs": {
        "LambdaTest": {
            "Value": {
                "Fn::GetAtt": ["LambdaTestRes", "Arn"]
            }
        }
    }
}
Here is my Test2.json, which has to use the output of Test1.json.
{
    "Resources": {
        "LambdaPermissionLambdaTest": {
            "Type": "AWS::Lambda::Permission",
            "Properties": {
                "Action": "lambda:invokeFunction",
                "FunctionName": {
                    "Ref": "LambdaTest"
                },
                "Principal": "apigateway.amazonaws.com",
                "SourceArn": {
                    "Fn::Join": ["", ["arn:aws:execute-api:", {
                        "Ref": "AWS::Region"
                    }, ":", {
                        "Ref": "AWS::AccountId"
                    }, ":", {
                        "Ref": "TestAPI"
                    }, "/*"]]
                }
            }
        }
    },
    "Parameters": {
        "LambdaTest": {
            "Type": "String"
        }
    }
}
It is not enough to just declare an output; you need to export that output.
Look here: http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-stack-exports.html
So you need something like:
"Outputs": {
"LambdaTest": {
"Value": {
"Fn::GetAtt": ["LambdaTestRes", "Arn"]
}
"Export": {
"Name": "LambdaTest"
}
}
}
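If you go the export route, the consuming template then reads the value with Fn::ImportValue instead of a Ref to a parameter (a sketch; note that export names must be unique per account and region, and this bypasses the parent stack's Parameters wiring entirely):
"FunctionName": {
    "Fn::ImportValue": "LambdaTest"
}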
You have two unresolved Ref resource dependencies in Test2.json, one to LambdaTest and one to TestAPI.
For LambdaTest, it looks like you're trying to pass this as a parameter from the parent stack, but you haven't specified it as an input Parameter in the child Test2.json template. Add an entry in Test2.json's Parameters section, like this:
"Parameters": {
"LambdaTest": {
"Type": "String"
}
},
Regarding TestAPI, this reference doesn't seem to appear anywhere else in your templates, so you should either specify this as a fixed string directly, or add another input Parameter in your Test2.json stack (see above) and then provide it from the parent stack.
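For the second option, the parent stack would pass both values down along these lines (TestAPIRes here is a hypothetical REST API resource or parameter defined in the parent template):
"PermissionsStack": {
    "Type": "AWS::CloudFormation::Stack",
    "Properties": {
        "TemplateURL": "https://s3.amazonaws.com/bucket1/cloudformation/Test2.json",
        "Parameters": {
            "LambdaTest": {
                "Fn::GetAtt": ["LambdaStack", "Outputs.LambdaTest"]
            },
            "TestAPI": {
                "Ref": "TestAPIRes"
            }
        },
        "TimeoutInMinutes": "60"
    }
}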
The error is coming from Test1.json (LambdaStack).
Logical ID
An identifier for the current output. The logical ID must be alphanumeric (a-z, A-Z, 0-9) and unique within the template.
It seems you have two logical IDs with the same name, "LambdaTest": one in the Resources section and the other in the Outputs section.

Filtering nested results in an OData query

I have an OData query returning a bunch of items. The results come back looking like this:
{
    "d": {
        "__metadata": {
            "id": "http://dev.sp.swampland.local/_api/SP.UserProfiles.PeopleManager/GetPropertiesFor(accountName=@v)",
            "uri": "http://dev.sp.swampland.local/_api/SP.UserProfiles.PeopleManager/GetPropertiesFor(accountName=@v)",
            "type": "SP.UserProfiles.PersonProperties"
        },
        "UserProfileProperties": {
            "results": [
                {
                    "__metadata": { "type": "SP.KeyValue" },
                    "Key": "UserProfile_GUID",
                    "Value": "66a0c6c2-cbec-4abb-9e25-cc9e924ad390",
                    "ValueType": "Edm.String"
                },
                {
                    "__metadata": { "type": "SP.KeyValue" },
                    "Key": "ADGuid",
                    "Value": "System.Byte[]",
                    "ValueType": "Edm.String"
                },
                {
                    "__metadata": { "type": "SP.KeyValue" },
                    "Key": "SID",
                    "Value": "S-1-5-21-2355771569-1952171574-2825027748-500",
                    "ValueType": "Edm.String"
                }
            ]
        }
    }
}
In reality, there are a lot of items (100+) coming back in the UserProfileProperties collection, but I'm only looking for the few whose Key matches certain values, and I can't figure out exactly what my filter needs to be. I've tried $filter=UserProfileProperties/Key eq 'SID' but that still gives me everything. I'm also trying to figure out how to pull back multiple items.
Ideas?
I believe you overlooked that each of the results has a Key, not the UserProfileProperties object itself, so UserProfileProperties/Key doesn't actually exist. Instead, because results is an array, you must either check a certain position (e.g. results(1)) or use the OData lambda operators any or all.
Try $filter=UserProfileProperties/results/any(r: r/Key eq 'SID') if you want all the profiles where at least one of the keys is SID, or use
$filter=UserProfileProperties/results/all(r: r/Key eq 'SID') if you want the profiles where every result has a key equaling SID.
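And since you want to match several keys, the lambda can combine tests with or (a sketch; the keys are examples taken from your payload, and support depends on the service's OData version):
$filter=UserProfileProperties/results/any(r: r/Key eq 'SID' or r/Key eq 'UserProfile_GUID')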