Using ADF Copy Activity with dynamic schema mapping - azure-data-factory

I'm trying to drive the columnMapping property from a database configuration table. The first activity in the pipeline pulls in the rows from the config table. My copy activity source is a JSON file in Azure Blob Storage and my sink is an Azure SQL database.
In the copy activity I'm setting the mapping using the dynamic content window. The code looks like this:
"translator": {
"value": "#json(activity('Lookup1').output.value[0].ColumnMapping)",
"type": "Expression"
}
My question is, what should the value of activity('Lookup1').output.value[0].ColumnMapping look like?
I've tried several different JSON formats, but the copy activity always seems to ignore them.
For example, I've tried:
{
"type": "TabularTranslator",
"columnMappings": {
"view.url": "url"
}
}
and:
"columnMappings": {
"view.url": "url"
}
and:
{
"view.url": "url"
}
In this example, view.url is the name of the column in the JSON source, and url is the name of the column in my destination table in Azure SQL database.

The issue is due to the dot (.) in your column name.
To use column mapping, you should also specify the structure in your source and sink datasets.
For your source dataset, you need to specify the format correctly, and since your column name contains a dot, you need to specify the JSON path, as in the sketch below.
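For illustration, with the legacy JsonFormat dataset this might look roughly like the following (a sketch only; the dataset, linked service, container and file names are placeholders, and jsonPathDefinition is what lets the dotted column name point at the right JSON node):
{
  "name": "JsonSourceDataset",
  "properties": {
    "type": "AzureBlob",
    "linkedServiceName": {
      "referenceName": "AzureBlobStorageLS",
      "type": "LinkedServiceReference"
    },
    "structure": [
      { "name": "view.url", "type": "String" }
    ],
    "typeProperties": {
      "folderPath": "mycontainer/myfolder",
      "fileName": "myfile.json",
      "format": {
        "type": "JsonFormat",
        "filePattern": "setOfObjects",
        "jsonPathDefinition": { "view.url": "$.view.url" }
      }
    }
  }
}
With the structure in place, the first format from the question ({ "type": "TabularTranslator", "columnMappings": { "view.url": "url" } }) is the shape the lookup value would be expected to return.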
You could use the ADF UI to set up a copy for a single file first, to see what the related format, structure and column mapping should look like, and then switch over to driving it from the lookup.
As I understand it, your first format should be the right one. If the value is already JSON rather than a string, you may not need the json() function in your expression.

There seems to be a disconnect between the question and the answer, so I'll hopefully provide a more straightforward answer.
When setting this up, you should have a source dataset with dynamic mapping. The sink doesn't require one, as we're going to specify it in the mapping.
Within the copy activity, format the dynamic json like the following:
{
"structure": [
{
"name": "Address Number"
},
{
"name": "Payment ID"
},
{
"name": "Document Number"
},
...
...
]
}
You would then specify your dynamic mapping like this:
{
"translator": {
"type": "TabularTranslator",
"mappings": [
{
"source": {
"name": "Address Number",
"type": "Int32"
},
"sink": {
"name": "address_number"
}
},
{
"source": {
"name": "Payment ID",
"type": "Int64"
},
"sink": {
"name": "payment_id"
}
},
{
"source": {
"name": "Document Number",
"type": "Int32"
},
"sink": {
"name": "document_number"
}
},
...
...
]
}
}
Assuming these were set in separate variables, you would want to send the source as a string, and the mapping as json:
source: @string(json(variables('str_dyn_structure')).structure)
mapping: @json(variables('str_dyn_translator')).translator
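If you are editing the pipeline JSON directly rather than using the mapping UI, the mapping expression above ends up wired into the copy activity the same way as in the original question, roughly like this (a sketch only):
"translator": {
  "value": "@json(variables('str_dyn_translator')).translator",
  "type": "Expression"
}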

VladDrak - you could skip the dynamic source definition by building the dynamic mapping like this:
{
"translator": {
"type": "TabularTranslator",
"mappings": [
{
"source": {
"type": "String",
"ordinal": "1"
},
"sink": {
"name": "dateOfActivity",
"type": "String"
}
},
{
"source": {
"type": "String",
"ordinal": "2"
},
"sink": {
"name": "CampaignID",
"type": "String"
}
}
]
}
}
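To tie this back to the original question: this whole JSON document could be stored as a string in the config table's ColumnMapping column, in which case the copy activity's translator would reference the Lookup output along these lines (a sketch, assuming the stored string includes the top-level "translator" key shown above):
"translator": {
  "value": "@json(activity('Lookup1').output.value[0].ColumnMapping).translator",
  "type": "Expression"
}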

Related

JDBC sink topic with multiple structs to postgres

I am trying to sink a few topics to a Postgres database. However, the topic schema defines an array at the top level and, within it, multiple structs. Automapping does not work and I cannot find any reference on how to handle this. I need all of the structs because they are dependent types; the second struct references the first struct as a field.
Currently it breaks when hitting the second struct, stating that statusChangeEvent (struct) has no mapping to a SQL column type. This is because it is using auto.create to make a table (probably called ProcessStatus), and when it hits the second entry there is of course no matching column.
[
{
"type": "record",
"name": "processStatus",
"namespace": "company.some.process",
"fields": [
{
"name": "code",
"doc": "The code of the processStatus",
"type": "string"
},
{
"name": "name",
"doc": "The name of the processStatus",
"type": "string"
},
{
"name": "description",
"type": "string"
},
{
"name": "isCompleted",
"type": "boolean"
},
{
"name": "isSuccessfullyCompleted",
"type": "boolean"
}
]
},
{
"type": "record",
"name": "StatusChangeEvent",
"namespace": "company.some.process",
"fields": [
{
"name": "contNumber",
"type": "string"
},
{
"name": "processId",
"type": "string"
},
{
"name": "processVersion",
"type": "int"
},
{
"name": "extProcessId",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "fromStatus",
"type": "process.status"
},
{
"name": "toStatus",
"doc": "The new status of the process",
"type": "company.some.process.processStatus"
},
{
"name": "changeDateTime",
"type": "long",
"logicalType": "timestamp-millis"
},
{
"name": "isPublic",
"type": "boolean"
}
]
}
]
I am not using ksql at the moment. Which connector settings are suited for this task? If there is a ksql alternative it would be nice to know, but the current requirement is to use the JDBC connector.
I tried using the Flatten transform, but it does not support struct fields that have a schema, which seems kind of weird. Aren't schemas the whole selling point of Connect with Kafka? Or is it more of a constraint you have to work around?
Aren't schemas the whole selling point of Connect with Kafka?
Yes, but Postgres (or the JDBC sink in general) doesn't really support nested objects within columns. For that, you're better off with a document database, for example using the Mongo sink connector.
Which connector settings are suited for this task?
None, really, other than transforms; you could write your own transform if the built-in Flatten doesn't work. A sketch of a Flatten-based configuration follows.
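For reference, the built-in Flatten transform would be configured on the sink connector roughly as below (a sketch; the connector name, topic and connection URL are placeholders, and whether Flatten copes with this particular nested schema is exactly the open question above):
{
  "name": "process-status-jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics": "process-status",
    "connection.url": "jdbc:postgresql://localhost:5432/demo",
    "auto.create": "false",
    "insert.mode": "insert",
    "transforms": "flattenValue",
    "transforms.flattenValue.type": "org.apache.kafka.connect.transforms.Flatten$Value",
    "transforms.flattenValue.delimiter": "_"
  }
}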
You could also try pre-defining your table to use JSONB for the two status columns; however, that's more of a workaround.

Azure data factory Dynamic Content

I have the following output from a web activity:
{
"value": [
{
"id": "/subscriptions/xy_csv",
"name": "xy_csv",
"type": "Microsoft.code",
"etag": "6200",
"properties": {
"folder": {
"name": "samplecodes"
},
"content": {
"query": "select * from table 1",
"metadata": {
"language": "sql"
},
"currentConnection": {
"databaseName": "demo",
"poolName": "Built-in"
},
"resultLimit": 5000
},
"type": "SqlQuery"
}
},
{
"id": "/subscriptions/ab_csv",
"name": "ab_csv",
"type": "Microsoft.code",
"etag": "6200",
"properties": {
"folder": {
"name": "livecode"
},
"content": {
"query": "select * from table 2",
"metadata": {
"language": "sql"
},
"currentConnection": {
"databaseName": "demo",
"poolName": "Built-in"
},
"resultLimit": 5000
},
"type": "SqlQuery"
}
}
]
}
I would like to create a filter activity after the web activity, just to filter out the items that are saved under the folder name "livecode".
On the filter activity's Items field I have: @activity('Web1').output.value
On the Condition field I have: @startswith(item().properties.folder.name,'livecode')
The web activity succeeds, but the filter activity fails with this error:
{
"errorCode": "InvalidTemplate",
"message": "The execution of template action 'FilterFilter1' failed: The evaluation of 'query' action 'where' expression '#startswith(item().properties.folder.name,'sql')' failed: 'The expression 'startswith(item().properties.folder.name,'sql')' cannot be evaluated because property 'folder' doesn't exist, available properties are 'content, type'.",
"failureType": "UserError",
"target": "Filter1",
"details": ""
}
It feels like I am going wrong in how I have written the condition's dynamic content to navigate to properties.folder.name. I am not sure what is missing in my condition. Can anyone help? Thanks, much appreciated.
The error occurs because the properties object in the web activity's output might not contain the folder key for every item.
I took the following JSON and got the same error:
{
"value":[
{
"id":"/subscriptions/xy_csv",
"name":"xy_csv",
"type":"Microsoft.code",
"etag":"6200",
"properties":{
"content":{
"query":"select * from table 1",
"metadata":{
"language":"sql"
},
"currentConnection":{
"databaseName":"demo",
"poolName":"Built-in"
},
"resultLimit":5000
},
"type":"SqlQuery"
}
},
{
"id":"/subscriptions/ab_csv",
"name":"ab_csv",
"type":"Microsoft.code",
"etag":"6200",
"properties":{
"folder":{
"name":"livecode"
},
"content":{
"query":"select * from table 2",
"metadata":{
"language":"sql"
},
"currentConnection":{
"databaseName":"demo",
"poolName":"Built-in"
},
"resultLimit":5000
},
"type":"SqlQuery"
}
}
]
}
So, you have to modify the filter condition to check whether each item contains a folder key, using the following dynamic content. (I took your web activity output as a parameter value and extracted the folder key from the properties object.)
@startswith(if(contains(item().properties,'folder'),item().properties.folder.name,''),'livecode')
When I debug the pipeline, we get the desired result.
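For reference, the Filter activity's debug output then looks roughly like this (an abridged sketch; only the item whose properties.folder.name is "livecode" remains, and the wrapper property names may vary slightly between ADF versions):
{
  "ItemsCount": 2,
  "FilteredItemsCount": 1,
  "Value": [
    {
      "id": "/subscriptions/ab_csv",
      "name": "ab_csv",
      "type": "Microsoft.code",
      "etag": "6200",
      "properties": {
        "folder": { "name": "livecode" },
        "content": { "query": "select * from table 2" },
        "type": "SqlQuery"
      }
    }
  ]
}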

Azure Data Factory Copy Data activity - Use variables/expressions in mapping to dynamically select correct incoming column

I have the below mappings for a Copy activity in ADF:
"translator": {
"type": "TabularTranslator",
"mappings": [
{
"source": {
"path": "$['id']"
},
"sink": {
"name": "TicketID"
}
},
{
"source": {
"path": "$['summary']"
},
"sink": {
"name": "TicketSummary"
}
},
{
"source": {
"path": "$['status']['name']"
},
"sink": {
"name": "TicketStatus"
}
},
{
"source": {
"path": "$['company']['identifier']"
},
"sink": {
"name": "CustomerAccountNumber"
}
},
{
"source": {
"path": "$['company']['name']"
},
"sink": {
"name": "CustomerName"
}
},
{
"source": {
"path": "$['customFields'][74]['value']"
},
"sink": {
"name": "Landlord"
}
},
{
"source": {
"path": "$['customFields'][75]['value']"
},
"sink": {
"name": "Building"
}
}
],
"collectionReference": "",
"mapComplexValuesToString": false
}
The challenge I need to overcome is that the array indexes of the custom fields in the last two sources might change, so I've created an Azure Function which calculates the correct array index. However, I can't work out how to use the Azure Function's output value in the source path string. I have tried to refer to it using an expression like @activity('Get Building Field Index').output, but as the property expects a JSON path, this doesn't work and produces an error:
JSON path $['customFields'][@activity('Get Building Field Index').outputS]['value'] is invalid.
Is there a different way to achieve what I am trying to do?
Thanks in advance
I have a slightly similar scenario that you might be able to work with.
First, I have a JSON file that is emitted, which I then access in Synapse/ADF with a Lookup activity.
Next, I have a ForEach activity that runs a Copy Data activity.
The ForEach activity receives my Lookup output and makes my JSON usable, by setting the following in the ForEach's Items setting:
@activity('Lookup').output.firstRow.childItems
My JSON roughly looks as follows:
{"childItems": [
{"subpath": "path/to/folder",
"filename": "filename.parquet",
"subfolder": "subfolder",
"outfolder": "subfolder",
"origin": "A"}]}
So this means that in my Copy Data activity within the ForEach activity, I can access the properties of my JSON like so:
@item()['subpath']
@item()['filename']
@item()['folder']
.. etc
Edit:
Adding some screen caps of the parameterization:
https://i.stack.imgur.com/aHpWk.png
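In case the screenshots are unavailable: the parameterization amounts to passing each @item() value into a parameter on the source dataset. In the copy activity JSON that might look roughly like this (a sketch; the dataset name and parameter names here are hypothetical):
"inputs": [
  {
    "referenceName": "SourceParquetDataset",
    "type": "DatasetReference",
    "parameters": {
      "subpath": {
        "value": "@item()['subpath']",
        "type": "Expression"
      },
      "filename": {
        "value": "@item()['filename']",
        "type": "Expression"
      }
    }
  }
]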

How do I use an extended definition and not allow additional properties in a way that is compatible with multiple validators (JSON schema draft 7)?

I am creating a strict validator for a complex JSON file and want to re-use various definitions in order to keep the schema manageable and easier to update.
According to the documentation, it is necessary to use allOf to extend a definition when adding more properties. This is exactly what I've done, but I find that, without additionalProperties set to false, validation doesn't prevent arbitrary other properties from being added.
The following massively cut-down schema demonstrates what I'm doing:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://example.com/schema/2021/02/example.json",
"description": "This schema demonstrates how VSCode's JSON schema mechanism fails with allOf used to extend a definition",
"definitions": {
"valueProvider": {
"type": "object",
"properties": {
"example": {
"type": "string"
},
"alternative": {
"type": "string"
}
},
"oneOf": [
{
"required": [
"example"
]
},
{
"required": [
"alternative"
]
}
]
},
"selector": {
"type": "object",
"allOf": [
{
"$ref": "#/definitions/valueProvider"
},
{
"required": [
"operator",
"value"
],
"properties": {
"operator": {
"type": "string",
"enum": [
"IsNull",
"Equals",
"NotEquals",
"Greater",
"GreaterOrEquals",
"Less",
"LessOrEquals"
]
},
"value": {
"type": "string"
}
}
}
],
"additionalProperties": false
}
},
"properties": {
"show": {
"properties": {
"name": {
"type": "string"
},
"selector": {
"description": "This property does not function correctly in VSCode",
"allOf": [
{
"$ref": "#/definitions/selector"
},
{
"additionalProperties": false
}
]
}
},
"additionalProperties": false
}
}
}
This works a treat in IntelliJ IDEA's JSON editor (2020.3.2 Ultimate edition) when editing JSON against this schema (using a schema mapping). For example, the following content of the file ex-fail.json:
{
"show": {
"name": "a",
"selector": {
"example": "a",
"operator": "IsNull",
"value": "false",
"d": "a"
}
}
}
is correctly validated, with only "d" highlighted as not allowed.
However, when I use the very same schema and JSON file with VSCode (1.53.2) in a vanilla configuration (except for a schema mapping), VSCode erroneously marks "example", "operator", "value" and "d" as not allowed.
If I remove the additionalProperties definition from the show.selector property, both IDEA and VSCode indicate that all is well, including allowing the "d" property - in doing this I can simplify that property definition to:
"selector": {
"description": "This property does not function correctly in VSCode",
"$ref": "#/definitions/selector"
}
What can I do to the schema to support both IDEA and VSCode whilst disallowing additional properties where they should not appear?
PS: The schema mapping in VSCode is simply along the lines of:
{
"json.schemas": [
{
"fileMatch": [
"*/config/ex-*.json"
],
"url": "file:///C:/my/path/to/example-schema.json"
}
]
}
You cannot do what you ask with JSON Schema draft-07 or prior.
The reason is, when $ref is used in a schema object, all other properties MUST be ignored.
An object schema with a "$ref" property MUST be interpreted as a "$ref" reference. The value of the "$ref" property MUST be a URI Reference. Resolved against the current URI base, it identifies the URI of a schema to use. All other properties in a "$ref" object MUST be ignored.
https://datatracker.ietf.org/doc/html/draft-handrews-json-schema-01#section-8.3
We changed this to not be the case for draft 2019-09.
It sounds like VSCode is merging the properties in applicators upwards to the nearest schema object (which is wrong), and IntelliJ IDEA is doing something similar but in a different way (which is also wrong).
The correct validation result for your schema and instance is VALID. See the live demo here: https://jsonschema.dev/s/C6ent
additionalProperties relies on the values of properties and patternProperties within the SAME schema object. It cannot "see through" applicators such as $ref and allOf.
For draft 2019-09, we added unevaluatedProperties, which CAN "see through" applicator keywords (although it's a little more complex than that).
Update:
After reviewing your update, sadly the same is still true.
One approach makes it sort of possible but involves some duplication, and only works when you control the schemas you are referencing.
You would need to redefine your selector property like this...
"selector": {
"description": "This property did not function correctly in VSCode",
"allOf": [
{
"$ref": "#/definitions/selector"
},
{
"properties": {
"operator": true,
"value": true,
"example": true,
"alternative": true
},
"additionalProperties": false
}
]
}
The values of a properties object are themselves schemas, and booleans are valid schemas. You don't need (or want) to deal with their validation here; you only say that these are the allowed properties, followed by no additionalProperties.
You'll also need to remove the additionalProperties: false from your definition of selector, as that is preventing ALL properties (which I now guess is why you saw that issue in one of the editors).
It involves some duplication, but it is the only way I'm aware of that you can do this for draft-07 or previous. As I said, this is not a problem for draft 2019-09 or above due to the new keywords.
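For comparison, under draft 2019-09 the duplication disappears, because unevaluatedProperties can see through $ref and allOf. A minimal sketch of the selector property under that draft (shown only to illustrate the newer keyword, not as a draft-07 solution):
"selector": {
  "description": "Draft 2019-09 version using unevaluatedProperties",
  "allOf": [
    { "$ref": "#/definitions/selector" }
  ],
  "unevaluatedProperties": false
}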
additionalProperties is problematic because it depends on the properties and patternProperties within the same schema object. The result is that "additionalProperties": false effectively blocks schema composition. @Relequestual showed one alternative approach; here is another approach that is a little less verbose, but still requires duplication of property names.
draft-06 and up
{
"allOf": [{ "$ref": "#/definitions/base" }],
"properties": {
"bar": { "type": "number" }
},
"propertyNames": { "enum": ["foo", "bar"] },
"definitions": {
"base": {
"properties": {
"foo": { "type": "string" }
}
}
}
}

Swagger UI doesn't show embedded json properties model

I am using the Swagger tool for documenting my Jersey-based REST API (the Swagger UI I am using was downloaded in June 2014; I don't know whether this issue has been fixed in later versions, but I made a lot of customizations to its code, so I don't have the option of downloading the latest version without investing a lot of time customizing it again).
Until now, all my transfer objects have had properties one level deep (no embedded POJOs). But now that I have added some REST paths that return more complex objects (two levels deep), I have found that Swagger UI does not expand the JSON model schema when there are embedded objects.
Here is the important part of the swagger doc:
...
{
"path": "/user/combo",
"operations": [{
"method": "POST",
"summary": "Inserts a combo (user, address)",
"notes": "Will insert a new user and a address definition in a single step",
"type": "UserAndAddressWithIdSwaggerDto",
"nickname": "insertCombo",
"consumes": ["application/json"],
"parameters": [{
"name": "body",
"description": "New user and address combo",
"required": true,
"type": "UserAndAddressWithIdSwaggerDto",
"paramType": "body",
"allowMultiple": false
}],
"responseMessages": [{
"code": 200,
"message": "OK",
"responseModel": "UserAndAddressWithIdSwaggerDto"
}]
}]
}
...
"models": {
"UserAndAddressWithIdSwaggerDto": {
"id": "UserAndAddressWithIdSwaggerDto",
"description": "",
"required": ["user",
"address"],
"properties": {
"user": {
"$ref": "UserDto",
"description": "User"
},
"address": {
"$ref": "AddressDto",
"description": "Address"
}
}
},
"UserDto": {
"id": "UserDto",
"properties": {
"userId": {
"type": "integer",
"format": "int64"
},
"name": {
"type": "string"
},...
},
"AddressDto": {
"id": "AddressDto",
"properties": {
"addressId": {
"type": "integer",
"format": "int64"
},
"street": {
"type": "string"
},...
}
}
...
The embedded objects are User and Address; their models are being created correctly, as shown in the JSON response.
But when opening Swagger UI I can only see:
{
"user": "UserDto",
"address": "AddressDto"
}
But I should see something like:
{
"user": {
"userId": "integer",
"name": "string",...
},
"address": {
"addressId": "integer",
"street": "string",...
}
}
Something may be wrong in the code that expands the internal properties; the JavaScript console doesn't show any errors, so I assume this is a bug.
I found the solution: there is a line of code that needs to be modified to make it work properly.
In the swagger.js file there is a getSampleValue function with a conditional checking for undefined:
SwaggerModelProperty.prototype.getSampleValue = function(modelsToIgnore) {
var result;
if ((this.refModel != null) && (modelsToIgnore[this.refModel.name] === 'undefined'))
...
I updated the equality check to (removing quotes):
modelsToIgnore[this.refModel.name] === undefined
After that, SwaggerUI is able to show the embedded models.