I am creating a Data Fusion pipeline that ingests a CSV file from an S3 bucket, applies Wrangler directives, and stores the result in a GCS bucket. The input CSV file has 18 columns, but the output CSV file has only 8. I suspect this could be due to the CSV encoding format, but I am not sure. What could be the reason?
Pipeline JSON
{
"name": "aws_fusion_v1",
"description": "Data Pipeline Application",
"artifact": {
"name": "cdap-data-pipeline",
"version": "6.1.2",
"scope": "SYSTEM"
},
"config": {
"resources": {
"memoryMB": 2048,
"virtualCores": 1
},
"driverResources": {
"memoryMB": 2048,
"virtualCores": 1
},
"connections": [
{
"from": "Amazon S3",
"to": "Wrangler"
},
{
"from": "Wrangler",
"to": "GCS2"
},
{
"from": "Argument Setter",
"to": "Amazon S3"
}
],
"comments": [],
"postActions": [],
"properties": {},
"processTimingEnabled": true,
"stageLoggingEnabled": true,
"stages": [
{
"name": "Amazon S3",
"plugin": {
"name": "S3",
"type": "batchsource",
"label": "Amazon S3",
"artifact": {
"name": "amazon-s3-plugins",
"version": "1.11.0",
"scope": "SYSTEM"
},
"properties": {
"format": "text",
"authenticationMethod": "Access Credentials",
"filenameOnly": "false",
"recursive": "false",
"ignoreNonExistingFolders": "false",
"schema": "{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"body\",\"type\":\"string\"}]}",
"referenceName": "aws_source",
"path": "${input.bucket}",
"accessID": "${input.access_id}",
"accessKey": "${input.access_key}"
}
},
"outputSchema": [
{
"name": "etlSchemaBody",
"schema": "{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"body\",\"type\":\"string\"}]}"
}
],
"type": "batchsource",
"label": "Amazon S3",
"icon": "icon-s3"
},
{
"name": "Wrangler",
"plugin": {
"name": "Wrangler",
"type": "transform",
"label": "Wrangler",
"artifact": {
"name": "wrangler-transform",
"version": "4.1.5",
"scope": "SYSTEM"
},
"properties": {
"field": "*",
"precondition": "false",
"threshold": "1",
"workspaceId": "804a2995-7c06-4ab2-b342-a9a01aa03a3d",
"schema": "${output.schema}",
"directives": "${directive}"
}
},
"outputSchema": [
{
"name": "etlSchemaBody",
"schema": "${output.schema}"
}
],
"inputSchema": [
{
"name": "Amazon S3",
"schema": "{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"body\",\"type\":\"string\"}]}"
}
],
"type": "transform",
"label": "Wrangler",
"icon": "icon-DataPreparation"
},
{
"name": "GCS2",
"plugin": {
"name": "GCS",
"type": "batchsink",
"label": "GCS2",
"artifact": {
"name": "google-cloud",
"version": "0.14.2",
"scope": "SYSTEM"
},
"properties": {
"project": "auto-detect",
"suffix": "yyyy-MM-dd-HH-mm",
"format": "csv",
"serviceFilePath": "auto-detect",
"location": "us",
"referenceName": "gcs_sink",
"path": "${output.path}",
"schema": "${output.schema}"
}
},
"outputSchema": [
{
"name": "etlSchemaBody",
"schema": "${output.schema}"
}
],
"inputSchema": [
{
"name": "Wrangler",
"schema": ""
}
],
"type": "batchsink",
"label": "GCS2",
"icon": "fa-plug"
},
{
"name": "Argument Setter",
"plugin": {
"name": "ArgumentSetter",
"type": "action",
"label": "Argument Setter",
"artifact": {
"name": "argument-setter-plugins",
"version": "1.1.1",
"scope": "USER"
},
"properties": {
"method": "GET",
"connectTimeout": "60000",
"readTimeout": "60000",
"numRetries": "0",
"followRedirects": "true",
"url": "${argfile}"
}
},
"outputSchema": [
{
"name": "etlSchemaBody",
"schema": ""
}
],
"type": "action",
"label": "Argument Setter",
"icon": "fa-plug"
}
],
"schedule": "0 * * * *",
"engine": "spark",
"numOfRecordsPreview": 100,
"description": "Data Pipeline Application",
"maxConcurrentRuns": 1
}
}
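For context, the Argument Setter stage resolves the ${...} macros above by fetching the document at ${argfile}. A minimal sketch of what that file could contain, assuming the standard payload shape of the CDAP HTTP Argument Setter plugin; all values here are placeholders, and the schema string just reuses the one from the S3 stage for illustration:
{
  "arguments": [
    { "name": "input.bucket", "value": "s3a://example-bucket/input/" },
    { "name": "input.access_id", "value": "EXAMPLE_ACCESS_ID" },
    { "name": "input.access_key", "value": "EXAMPLE_SECRET_KEY" },
    { "name": "directive", "value": "parse-as-csv :body ',' true" },
    { "name": "output.path", "value": "gs://example-bucket/output/" },
    { "name": "output.schema", "value": "{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"body\",\"type\":\"string\"}]}" }
  ]
}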
Edit:
The missing columns in the output file were due to spaces in the column names. But I am now facing another issue. In Wrangler, when I pass the directive
"parse-as-csv :body ',' false", the output file is empty. But when I pass "parse-as-csv :body ',' true", the output file has all the data, without a header, as expected.
Related
We are using ARM templates to deploy function apps, but the slotSetting: true property is not respected, and I cannot find any current documentation on how to make app settings slot specific.
This is my app settings snippet in my ARM template:
{
"name": "AzureWebJobs.HandleFiscalFrResponse.Disabled",
"value": "1",
"slotSetting": true
}
The setting and the value work, but the slotSetting attribute is silently ignored; no error is shown.
What is the correct way to make a function app setting slot specific?
I have reproduced the issue and was able to resolve it; please follow the steps below.
Open VS Code, create a file with a .json extension, and use the code below.
Thanks #patelchandni for the ARM Template code.
My Filename.json
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"functionAppName": {
"type": "string",
"defaultValue": "[format('tar-{0}', uniqueString(resourceGroup().id))]"
},
"storageAccountType": {
"type": "string",
"defaultValue": "Standard_LRS",
"allowedValues": [
"Standard_LRS",
"Standard_GRS",
"Standard_RAGRS"
]
},
"location": {
"type": "string",
"defaultValue": "[resourceGroup().location]"
},
"appInsightsLocation": {
"type": "string",
"defaultValue": "[resourceGroup().location]"
},
"functionWorkerRuntime": {
"type": "string",
"defaultValue": "node",
"allowedValues": [
"dotnet",
"node",
"python",
"java"
]
},
"functionPlanOS": {
"type": "string",
"defaultValue": "Windows",
"allowedValues": [
"Windows",
"Linux"
]
},
"functionAppPlanSku": {
"type": "string",
"defaultValue": "EP1",
"allowedValues": [
"EP1",
"EP2",
"EP3"
]
},
"linuxFxVersion": {
"type": "string",
"defaultValue": "",
"metadata": {
"description": "Only required for Linux app to represent runtime stack in the format of 'runtime|runtimeVersion'. For example: 'python|3.9'"
}
}
},
"variables": {
"hostingPlanName": "[parameters('functionAppName')]",
"applicationInsightsName": "[parameters('functionAppName')]",
"storageAccountName": "[concat(uniquestring(resourceGroup().id), 'azfunctions')]",
"isReserved": "[if(equals(parameters('functionPlanOS'), 'Linux'), true(), false())]",
"slotContentShareName": "[concat(parameters('functionAppName'), '-deployment')]"
},
"resources": [
{
"type": "Microsoft.Storage/storageAccounts",
"apiVersion": "2021-02-01",
"name": "[variables('storageAccountName')]",
"location": "[parameters('location')]",
"sku": {
"name": "[parameters('storageAccountType')]"
},
"kind": "Storage"
},
{
"type": "Microsoft.Web/serverfarms",
"apiVersion": "2021-02-01",
"name": "[variables('hostingPlanName')]",
"location": "[parameters('location')]",
"sku": {
"tier": "ElasticPremium",
"name": "[parameters('functionAppPlanSku')]",
"family": "EP"
},
"properties": {
"maximumElasticWorkerCount": 20,
"reserved": "[variables('isReserved')]"
},
"kind": "elastic"
},
{
"type": "microsoft.insights/components",
"apiVersion": "2020-02-02",
"name": "[variables('applicationInsightsName')]",
"location": "[parameters('appInsightsLocation')]",
"tags": {
"[concat('hidden-link:', resourceId('Microsoft.Web/sites', variables('applicationInsightsName')))]": "Resource"
},
"properties": {
"Application_Type": "web"
},
"kind": "web"
},
{
"type": "Microsoft.Web/sites",
"apiVersion": "2021-02-01",
"name": "[parameters('functionAppName')]",
"location": "[parameters('location')]",
"kind": "[if(variables('isReserved'), 'functionapp,linux', 'functionapp')]",
"dependsOn": [
"[resourceId('Microsoft.Web/serverfarms', variables('hostingPlanName'))]",
"[resourceId('Microsoft.Storage/storageAccounts', variables('storageAccountName'))]",
"[resourceId('Microsoft.Insights/components', variables('applicationInsightsName'))]"
],
"properties": {
"reserved": "[variables('isReserved')]",
"serverFarmId": "[resourceId('Microsoft.Web/serverfarms', variables('hostingPlanName'))]",
"siteConfig": {
"linuxFxVersion": "[if(variables('isReserved'), parameters('linuxFxVersion'), json('null'))]",
"appSettings": [
{
"name": "APPINSIGHTS_INSTRUMENTATIONKEY",
"value": "[reference(resourceId('microsoft.insights/components', variables('applicationInsightsName')), '2015-05-01').InstrumentationKey]"
},
{
"name": "AzureWebJobsStorage",
"value": "[concat('DefaultEndpointsProtocol=https;AccountName=', variables('storageAccountName'), ';EndpointSuffix=', environment().suffixes.storage, ';AccountKey=',listKeys(resourceId('Microsoft.Storage/storageAccounts', variables('storageAccountName')), '2019-06-01').keys[0].value)]"
},
{
"name": "WEBSITE_CONTENTAZUREFILECONNECTIONSTRING",
"value": "[concat('DefaultEndpointsProtocol=https;AccountName=', variables('storageAccountName'), ';EndpointSuffix=', environment().suffixes.storage, ';AccountKey=',listKeys(resourceId('Microsoft.Storage/storageAccounts', variables('storageAccountName')), '2019-06-01').keys[0].value)]"
},
{
"name": "WEBSITE_CONTENTSHARE",
"value": "[toLower(parameters('functionAppName'))]"
},
{
"name": "FUNCTIONS_EXTENSION_VERSION",
"value": "~4"
},
{
"name": "FUNCTIONS_WORKER_RUNTIME",
"value": "[parameters('functionWorkerRuntime')]"
},
{
"name": "WEBSITE_NODE_DEFAULT_VERSION",
"value": "~14"
}
]
}
}
},
{
"type": "Microsoft.Web/sites/slots",
"apiVersion": "2021-02-01",
"name": "[concat(parameters('functionAppName'), '/deployment')]",
"kind": "[if(variables('isReserved'), 'functionapp,linux', 'functionapp')]",
"location": "[parameters('location')]",
"dependsOn": [
"[resourceId('Microsoft.Web/sites', parameters('functionAppName'))]"
],
"properties": {
"reserved": "[variables('isReserved')]",
"serverFarmId": "[resourceId('Microsoft.Web/serverfarms', variables('hostingPlanName'))]",
"siteConfig": {
"linuxFxVersion": "[if(variables('isReserved'), parameters('linuxFxVersion'), json('null'))]",
"appSettings": [
{
"name": "APPINSIGHTS_INSTRUMENTATIONKEY",
"value": "[reference(resourceId('microsoft.insights/components', variables('applicationInsightsName')), '2015-05-01').InstrumentationKey]"
},
{
"name": "AzureWebJobsStorage",
"value": "[concat('DefaultEndpointsProtocol=https;AccountName=', variables('storageAccountName'), ';EndpointSuffix=', environment().suffixes.storage, ';AccountKey=',listKeys(resourceId('Microsoft.Storage/storageAccounts', variables('storageAccountName')), '2019-06-01').keys[0].value)]"
},
{
"name": "WEBSITE_CONTENTAZUREFILECONNECTIONSTRING",
"value": "[concat('DefaultEndpointsProtocol=https;AccountName=', variables('storageAccountName'), ';EndpointSuffix=', environment().suffixes.storage, ';AccountKey=',listKeys(resourceId('Microsoft.Storage/storageAccounts', variables('storageAccountName')), '2019-06-01').keys[0].value)]"
},
{
"name": "WEBSITE_CONTENTSHARE",
"value": "[variables('slotContentShareName')]"
},
{
"name": "FUNCTIONS_EXTENSION_VERSION",
"value": "~4"
},
{
"name": "FUNCTIONS_WORKER_RUNTIME",
"value": "[parameters('functionWorkerRuntime')]"
},
{
"name": "WEBSITE_NODE_DEFAULT_VERSION",
"value": "~14"
}
]
}
}
}
]
}
Next, create a parameter file for the template and use the code below in FileName.parameters.json:
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"functionAppName": {
"value": "zapper01"
},
"storageAccountType": {
"value": "Standard_LRS"
},
"location": {
"value": "EastUS"
},
"appInsightsLocation": {
"value": "EastUS"
},
"functionWorkerRuntime": {
"value": "node"
},
"functionPlanOS": {
"value": "Windows"
},
"functionAppPlanSku": {
"value": "EP1"
},
"linuxFxVersion": {
"value": "3.9"
}
}
}
Log in to Azure with the Azure CLI command
az login
Set the subscription with
az account set --subscription "Subscription ID xxxxxx-xxxxxxx-xxxxxxx-xxxxx"
Then deploy with the Azure PowerShell command
New-AzResourceGroupDeployment -ResourceGroupName "ResourceGroupName" -TemplateFile "FileName.json" -TemplateParameterFile "Filename.parameters.json"
After execution, the deployment result is shown in the PowerShell output.
Once the deployment has finished, open the function app in the Azure portal and select the deployment slot.
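If you prefer to stay in the Azure CLI rather than switching to Azure PowerShell for the deployment, an equivalent command (same resource group and file names as above) would be:
az deployment group create --resource-group "ResourceGroupName" --template-file "FileName.json" --parameters "Filename.parameters.json"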
Update
I have also deployed the code below using Custom deployment in the Azure portal.
Thanks, #seligj95 for the ARM Template code.
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"baseResourceName": {
"type": "string",
"metadata": {
"description": "Name of the resource"
},
"maxLength": 15
},
"appSettingName": {
"type": "string",
"metadata": {
"description": "Name of the app setting"
},
"maxLength": 24
},
"environments": {
"defaultValue": [
"Dev",
"QA",
"UAT",
"Preview"
],
"type": "array",
"metadata": {
"description": "Array with the names for the environment slots"
},
"maxLength": 19
},
"location": {
"type": "string",
"defaultValue": "[resourceGroup().location]",
"metadata": {
"description": "Location for all resources."
}
}
},
"variables": {
"standardPlanMaxAdditionalSlots": 4,
"webAppPortalName": "[concat(parameters('baseResourceName'), 'Portal')]",
"appServicePlanName": "[concat('AppServicePlan-', parameters('baseResourceName'))]",
"stickyAppSettingName": "[concat(parameters('appSettingName'), '-sticky')]"
},
"resources": [
{
"apiVersion": "2020-06-01",
"type": "Microsoft.Web/serverfarms",
"kind": "app",
"name": "[variables('appServicePlanName')]",
"location": "[parameters('location')]",
"comments": "This app service plan is used for the web app and slots.",
"tags": {
"displayName": "AppServicePlan"
},
"properties": { },
"sku": {
"name": "[if(lessOrEquals(length(parameters('environments')), variables('standardPlanMaxAdditionalSlots')), 'S1', 'P1')]"
}
},
{
"apiVersion": "2020-06-01",
"type": "Microsoft.Web/sites",
"kind": "app",
"name": "[variables('webAppPortalName')]",
"location": "[parameters('location')]",
"comments": "This is the web app, also the default 'nameless' slot.",
"tags": {
"displayName": "WebApp"
},
"properties": {
"serverFarmId": "[resourceId('Microsoft.Web/serverfarms', variables('appServicePlanName'))]",
"siteConfig": {
"appSettings": [
{
"name": "[parameters('appSettingName')]",
"value": "value"
},
{
"name": "[variables('stickyAppSettingName')]",
"value": "value"
}
]
}
},
"dependsOn": [
"[resourceId('Microsoft.Web/serverfarms', variables('appServicePlanName'))]"
]
},
{
"apiVersion": "2020-06-01",
"type": "Microsoft.Web/sites/slots",
"name": "[concat(variables('webAppPortalName'), '/', parameters('environments')[copyIndex()])]",
"kind": "app",
"location": "[parameters('location')]",
"comments": "This specifies the web app slots.",
"tags": {
"displayName": "WebAppSlots"
},
"properties": {
"serverFarmId": "[resourceId('Microsoft.Web/serverfarms', variables('appServicePlanName'))]"
},
"dependsOn": [
"[resourceId('Microsoft.Web/Sites', variables('webAppPortalName'))]"
],
"copy": {
"name": "webPortalSlot",
"count": "[length(parameters('environments'))]"
}
},
{
"apiVersion": "2020-06-01",
"name": "[concat(variables('webAppPortalName'), '/slotconfignames')]",
"type": "Microsoft.Web/sites/config",
"comments": "This specifies the sticky (slot setting) application settings.",
"dependsOn": [
"[resourceId('Microsoft.Web/Sites', variables('webAppPortalName'))]"
],
"properties": {
"appSettingNames": [
"[variables('stickyAppSettingName')]"
]
}
}
]
}
In the ARM template I declared the second app setting as sticky via the slotconfignames resource, so after deployment it appears as a slot-specific setting.
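As a quick post-deployment check, the same sticky marking can also be applied or corrected from the Azure CLI; a sketch, with the app and resource group names as placeholders:
az functionapp config appsettings set --name <function-app-name> --resource-group <resource-group> --slot-settings "AzureWebJobs.HandleFiscalFrResponse.Disabled=1"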
I'm trying to deploy an Azure Data Factory service using Azure Resource Manager templates, but I am getting the error The request content was invalid and could not be deserialized:
'Could not find member 'name' on object of type 'Template'. Path 'properties.template.name', line 1, position 34.'.
You can use the ARM template below to create an Azure storage account and an Azure Data Factory, linking the two via a linked service:
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"metadata": {
"_generator": {
"name": "bicep",
"version": "0.4.1.14562",
"templateHash": "8367564219536411224"
}
},
"parameters": {
"dataFactoryName": {
"type": "string",
"defaultValue": "[format('datafactory{0}', uniqueString(resourceGroup().id))]",
"metadata": {
"description": "Data Factory Name"
}
},
"location": {
"type": "string",
"defaultValue": "[resourceGroup().location]",
"metadata": {
"description": "Location of the data factory."
}
},
"storageAccountName": {
"type": "string",
"defaultValue": "[format('storage{0}', uniqueString(resourceGroup().id))]",
"metadata": {
"description": "Name of the Azure storage account that contains the input/output data."
}
},
"blobContainerName": {
"type": "string",
"defaultValue": "[format('blob{0}', uniqueString(resourceGroup().id))]",
"metadata": {
"description": "Name of the blob container in the Azure Storage account."
}
}
},
"functions": [],
"variables": {
"dataFactoryLinkedServiceName": "ArmtemplateStorageLinkedService",
"dataFactoryDataSetInName": "ArmtemplateTestDatasetIn",
"dataFactoryDataSetOutName": "ArmtemplateTestDatasetOut",
"pipelineName": "ArmtemplateSampleCopyPipeline"
},
"resources": [
{
"type": "Microsoft.Storage/storageAccounts",
"apiVersion": "2021-04-01",
"name": "[parameters('storageAccountName')]",
"location": "[parameters('location')]",
"sku": {
"name": "Standard_LRS"
},
"kind": "StorageV2"
},
{
"type": "Microsoft.Storage/storageAccounts/blobServices/containers",
"apiVersion": "2021-04-01",
"name": "[format('{0}/default/{1}', parameters('storageAccountName'), parameters('blobContainerName'))]",
"dependsOn": [
"[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccountName'))]"
]
},
{
"type": "Microsoft.DataFactory/factories",
"apiVersion": "2018-06-01",
"name": "[parameters('dataFactoryName')]",
"location": "[parameters('location')]",
"identity": {
"type": "SystemAssigned"
}
},
{
"type": "Microsoft.DataFactory/factories/linkedservices",
"apiVersion": "2018-06-01",
"name": "[format('{0}/{1}', parameters('dataFactoryName'), variables('dataFactoryLinkedServiceName'))]",
"properties": {
"type": "AzureBlobStorage",
"typeProperties": {
"connectionString": "[format('DefaultEndpointsProtocol=https;AccountName={0};AccountKey={1}', parameters('storageAccountName'), listKeys(resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccountName')), '2021-04-01').keys[0].value)]"
}
},
"dependsOn": [
"[resourceId('Microsoft.DataFactory/factories', parameters('dataFactoryName'))]",
"[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccountName'))]"
]
},
{
"type": "Microsoft.DataFactory/factories/datasets",
"apiVersion": "2018-06-01",
"name": "[format('{0}/{1}', parameters('dataFactoryName'), variables('dataFactoryDataSetInName'))]",
"properties": {
"linkedServiceName": {
"referenceName": "[variables('dataFactoryLinkedServiceName')]",
"type": "LinkedServiceReference"
},
"type": "Binary",
"typeProperties": {
"location": {
"type": "AzureBlobStorageLocation",
"container": "[format('{0}/default/{1}', parameters('storageAccountName'), parameters('blobContainerName'))]",
"folderPath": "input",
"fileName": "emp.txt"
}
}
},
"dependsOn": [
"[resourceId('Microsoft.Storage/storageAccounts/blobServices/containers', split(format('{0}/default/{1}', parameters('storageAccountName'), parameters('blobContainerName')), '/')[0], split(format('{0}/default/{1}', parameters('storageAccountName'), parameters('blobContainerName')), '/')[1], split(format('{0}/default/{1}', parameters('storageAccountName'), parameters('blobContainerName')), '/')[2])]",
"[resourceId('Microsoft.DataFactory/factories', parameters('dataFactoryName'))]",
"[resourceId('Microsoft.DataFactory/factories/linkedservices', parameters('dataFactoryName'), variables('dataFactoryLinkedServiceName'))]"
]
},
{
"type": "Microsoft.DataFactory/factories/datasets",
"apiVersion": "2018-06-01",
"name": "[format('{0}/{1}', parameters('dataFactoryName'), variables('dataFactoryDataSetOutName'))]",
"properties": {
"linkedServiceName": {
"referenceName": "[variables('dataFactoryLinkedServiceName')]",
"type": "LinkedServiceReference"
},
"type": "Binary",
"typeProperties": {
"location": {
"type": "AzureBlobStorageLocation",
"container": "[format('{0}/default/{1}', parameters('storageAccountName'), parameters('blobContainerName'))]",
"folderPath": "output"
}
}
},
"dependsOn": [
"[resourceId('Microsoft.Storage/storageAccounts/blobServices/containers', split(format('{0}/default/{1}', parameters('storageAccountName'), parameters('blobContainerName')), '/')[0], split(format('{0}/default/{1}', parameters('storageAccountName'), parameters('blobContainerName')), '/')[1], split(format('{0}/default/{1}', parameters('storageAccountName'), parameters('blobContainerName')), '/')[2])]",
"[resourceId('Microsoft.DataFactory/factories', parameters('dataFactoryName'))]",
"[resourceId('Microsoft.DataFactory/factories/linkedservices', parameters('dataFactoryName'), variables('dataFactoryLinkedServiceName'))]"
]
},
{
"type": "Microsoft.DataFactory/factories/pipelines",
"apiVersion": "2018-06-01",
"name": "[format('{0}/{1}', parameters('dataFactoryName'), variables('pipelineName'))]",
"properties": {
"activities": [
{
"name": "MyCopyActivity",
"type": "Copy",
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"typeProperties": {
"source": {
"type": "BinarySource",
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"recursive": true
}
},
"sink": {
"type": "BinarySink",
"storeSettings": {
"type": "AzureBlobStorageWriterSettings"
}
},
"enableStaging": false
},
"inputs": [
{
"referenceName": "[variables('dataFactoryDataSetInName')]",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "[variables('dataFactoryDataSetOutName')]",
"type": "DatasetReference"
}
]
}
]
},
"dependsOn": [
"[resourceId('Microsoft.DataFactory/factories', parameters('dataFactoryName'))]",
"[resourceId('Microsoft.DataFactory/factories/datasets', parameters('dataFactoryName'), variables('dataFactoryDataSetInName'))]",
"[resourceId('Microsoft.DataFactory/factories/datasets', parameters('dataFactoryName'), variables('dataFactoryDataSetOutName'))]"
]
}
]
}
Reference:
Create an Azure Data Factory using an Azure Resource Manager template (ARM template) - Azure Data Factory | Microsoft Docs
I am new to ADF and ARM. I have a blank Data Factory v2 (TestDataFactory-123Test) that I want to populate from an existing ADF (TestDataFactory-123). I followed, step by step, the official documentation Create a Resource Manager template for each environment. The deployment shows succeeded, but I can't see anything in the factory. I used the 'Build your own template in the editor' option in the portal to import the existing ARM template. Am I missing anything?
Below is the ARM template I got by exporting the one for TestDataFactory-123:
{
"$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"factoryName": {
"type": "string",
"metadata": "Data Factory name",
"defaultValue": "TestDataFactory-123"
},
"AzureBlobStorageLinkedService_connectionString": {
"type": "secureString",
"metadata": "Secure string for 'connectionString' of 'AzureBlobStorageLinkedService'",
"defaultValue": "TestDataFactory-123"
}
},
"variables": {
"factoryId": "[concat('Microsoft.DataFactory/factories/', parameters('factoryName'))]"
},
"resources": [
{
"name": "[concat(parameters('factoryName'), '/AzureBlobStorageLinkedService')]",
"type": "Microsoft.DataFactory/factories/linkedServices",
"apiVersion": "2018-06-01",
"properties": {
"annotations": [],
"type": "AzureBlobStorage",
"typeProperties": {
"connectionString": "[parameters('AzureBlobStorageLinkedService_connectionString')]"
}
},
"dependsOn": []
},
{
"name": "[concat(parameters('factoryName'), '/InputDataset')]",
"type": "Microsoft.DataFactory/factories/datasets",
"apiVersion": "2018-06-01",
"properties": {
"linkedServiceName": {
"referenceName": "AzureBlobStorageLinkedService",
"type": "LinkedServiceReference"
},
"annotations": [],
"type": "Binary",
"typeProperties": {
"location": {
"type": "AzureBlobStorageLocation",
"fileName": "emp.txt",
"folderPath": "input",
"container": "adftutorial"
}
}
},
"dependsOn": [
"[concat(variables('factoryId'), '/linkedServices/AzureBlobStorageLinkedService')]"
]
},
{
"name": "[concat(parameters('factoryName'), '/OutputDataset')]",
"type": "Microsoft.DataFactory/factories/datasets",
"apiVersion": "2018-06-01",
"properties": {
"linkedServiceName": {
"referenceName": "AzureBlobStorageLinkedService",
"type": "LinkedServiceReference"
},
"annotations": [],
"type": "Binary",
"typeProperties": {
"location": {
"type": "AzureBlobStorageLocation",
"folderPath": "output",
"container": "adftutorial"
}
}
},
"dependsOn": [
"[concat(variables('factoryId'), '/linkedServices/AzureBlobStorageLinkedService')]"
]
},
{
"name": "[concat(parameters('factoryName'), '/CopyPipeline')]",
"type": "Microsoft.DataFactory/factories/pipelines",
"apiVersion": "2018-06-01",
"properties": {
"activities": [
{
"name": "CopyFromBlobToBlob",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "BinarySource",
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"recursive": true
}
},
"sink": {
"type": "BinarySink",
"storeSettings": {
"type": "AzureBlobStorageWriteSettings"
}
},
"enableStaging": false
},
"inputs": [
{
"referenceName": "InputDataset",
"type": "DatasetReference",
"parameters": {}
}
],
"outputs": [
{
"referenceName": "OutputDataset",
"type": "DatasetReference",
"parameters": {}
}
]
}
],
"annotations": []
},
"dependsOn": [
"[concat(variables('factoryId'), '/datasets/InputDataset')]",
"[concat(variables('factoryId'), '/datasets/OutputDataset')]"
]
}
]
}
The fix was as simple as replacing the defaultValue of the factoryName parameter with the name of the empty data factory, 'TestDataFactory-123Test', instead of the existing one, 'TestDataFactory-123'. I also replaced the defaultValue of the AzureBlobStorageLinkedService_connectionString parameter with the actual connection string.
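In other words, the relevant parameter in the exported template ends up looking like this (only the defaultValue changes):
"factoryName": {
  "type": "string",
  "metadata": "Data Factory name",
  "defaultValue": "TestDataFactory-123Test"
}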
I am going crazy over this issue. I am running Azure Data Factory V1 and need to schedule a copy job every week from 01/03/2009 through 01/31/2009, so I defined this schedule on the pipeline:
"start": "2009-01-03T00:00:00Z",
"end": "2009-01-31T00:00:00Z",
"isPaused": false,
Monitoring the pipeline, the data factory scheduled these dates:
12/29/2008
01/05/2009
01/12/2009
01/19/2009
01/26/2009
instead of this wanted schedule:
01/03/2009
01/10/2009
01/17/2009
01/24/2009
01/31/2009
Why doesn't the start date defined on the pipeline correspond to the scheduled dates in the monitor?
Many thanks!
Here is the JSON Pipeline:
{
"name": "CopyPipeline-blob2datalake",
"properties": {
"description": "copy from blob storage to datalake directory structure",
"activities": [
{
"type": "DataLakeAnalyticsU-SQL",
"typeProperties": {
"scriptPath": "script/dat230.usql",
"scriptLinkedService": "AzureStorageLinkedService",
"degreeOfParallelism": 5,
"priority": 100,
"parameters": {
"salesfile": "$$Text.Format('/DAT230/{0:yyyy}/{0:MM}/{0:dd}.txt', Date.StartOfDay (SliceStart))",
"lineitemsfile": "$$Text.Format('/dat230/dataloads/{0:yyyy}/{0:MM}/{0:dd}/factinventory/fact.csv', Date.StartOfDay (SliceStart))"
}
},
"inputs": [
{
"name": "InputDataset-dat230"
}
],
"outputs": [
{
"name": "OutputDataset-dat230"
}
],
"policy": {
"timeout": "01:00:00",
"concurrency": 1,
"retry": 1
},
"scheduler": {
"frequency": "Day",
"interval": 7
},
"name": "DataLakeAnalyticsUSqlActivityTemplate",
"linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
}
],
"start": "2009-01-03T00:00:00Z",
"end": "2009-01-11T00:00:00Z",
"isPaused": false,
"hubName": "edxlearningdf_hub",
"pipelineMode": "Scheduled"
}
}
and here the datasets:
{
"name": "InputDataset-dat230",
"properties": {
"structure": [
{
"name": "Date",
"type": "Datetime"
},
{
"name": "StoreID",
"type": "Int64"
},
{
"name": "StoreName",
"type": "String"
},
{
"name": "ProductID",
"type": "Int64"
},
{
"name": "ProductName",
"type": "String"
},
{
"name": "Color",
"type": "String"
},
{
"name": "Size",
"type": "String"
},
{
"name": "Manufacturer",
"type": "String"
},
{
"name": "OnHandQuantity",
"type": "Int64"
},
{
"name": "OnOrderQuantity",
"type": "Int64"
},
{
"name": "SafetyStockQuantity",
"type": "Int64"
},
{
"name": "UnitCost",
"type": "Double"
},
{
"name": "DaysInStock",
"type": "Int64"
},
{
"name": "MinDayInStock",
"type": "Int64"
},
{
"name": "MaxDayInStock",
"type": "Int64"
}
],
"published": false,
"type": "AzureBlob",
"linkedServiceName": "Source-BlobStorage-dat230",
"typeProperties": {
"fileName": "*.txt.gz",
"folderPath": "dat230/{year}/{month}/{day}/",
"format": {
"type": "TextFormat",
"columnDelimiter": "\t",
"firstRowAsHeader": true
},
"partitionedBy": [
{
"name": "year",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "yyyy"
}
},
{
"name": "month",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "MM"
}
},
{
"name": "day",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "dd"
}
}
],
"compression": {
"type": "GZip"
}
},
"availability": {
"frequency": "Day",
"interval": 7
},
"external": true,
"policy": {}
}
}
{
"name": "OutputDataset-dat230",
"properties": {
"structure": [
{
"name": "Date",
"type": "Datetime"
},
{
"name": "StoreID",
"type": "Int64"
},
{
"name": "StoreName",
"type": "String"
},
{
"name": "ProductID",
"type": "Int64"
},
{
"name": "ProductName",
"type": "String"
},
{
"name": "Color",
"type": "String"
},
{
"name": "Size",
"type": "String"
},
{
"name": "Manufacturer",
"type": "String"
},
{
"name": "OnHandQuantity",
"type": "Int64"
},
{
"name": "OnOrderQuantity",
"type": "Int64"
},
{
"name": "SafetyStockQuantity",
"type": "Int64"
},
{
"name": "UnitCost",
"type": "Double"
},
{
"name": "DaysInStock",
"type": "Int64"
},
{
"name": "MinDayInStock",
"type": "Int64"
},
{
"name": "MaxDayInStock",
"type": "Int64"
}
],
"published": false,
"type": "AzureDataLakeStore",
"linkedServiceName": "Destination-DataLakeStore-dat230",
"typeProperties": {
"fileName": "txt.gz",
"folderPath": "dat230/dataloads/{year}/{month}/{day}/factinventory/",
"format": {
"type": "TextFormat",
"columnDelimiter": "\t"
},
"partitionedBy": [
{
"name": "year",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "yyyy"
}
},
{
"name": "month",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "MM"
}
},
{
"name": "day",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "dd"
}
}
]
},
"availability": {
"frequency": "Day",
"interval": 7
},
"external": false,
"policy": {}
}
}
You need to look at the time slices for the datasets and their activity.
The pipeline schedule (badly named) only defines the start and end of the period within which activities can provision and run their time slices.
ADF v1 doesn't use a recurring schedule like the SQL Server Agent. Each execution has to be provisioned at an interval on the timeline (the schedule) you create.
For example, if your pipeline start and end span one year, but your dataset and activity have a frequency of Month and an interval of 1, you will only get 12 executions of whatever is happening.
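To make that concrete, a dataset availability block like the sketch below would provision one slice per month, i.e. 12 executions over a one-year pipeline window:
"availability": {
  "frequency": "Month",
  "interval": 1
}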
Apologies, but the concept of time slices is a little difficult to explain if you aren't already familiar with it. Maybe read this post: https://blogs.msdn.microsoft.com/ukdataplatform/2016/05/03/demystifying-activity-scheduling-with-azure-data-factory/
Hope this helps.
Would you share the JSON for the datasets and the pipeline with us? It would be easier to help you with that in hand.
In the meanwhile, check whether you are using "style": "StartOfInterval" in the scheduler property of the activity, and also whether you are using an offset.
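As a sketch of that suggestion, and with the offset value being an assumption on my part: if your weekly slices are currently anchored to Monday 12/29/2008, shifting them by five days would land on the Saturday dates you expected (e.g. 01/03/2009):
"scheduler": {
  "frequency": "Day",
  "interval": 7,
  "style": "StartOfInterval",
  "offset": "5.00:00:00"
}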
Cheers!
I am using the Jira REST APIs in my application.
I have found the API for getting the metadata for creating a Jira issue, but that API doesn't return the default values of the fields. For example:
This is the request:
http://kelpie9:8081/rest/api/latest/issue/createmeta?projectKeys=QA&issuetypeNames=Bug&expand=project.issuetypes.fields
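For reference, the same request can be issued with curl (the credentials are placeholders):
curl -u username:password "http://kelpie9:8081/rest/api/latest/issue/createmeta?projectKeys=QA&issuetypeNames=Bug&expand=project.issuetypes.fields"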
The default value of the priority field is set to "Major" and the description of priority is also customized, but the response from the API is:
{
"expand": "projects",
"projects": [
{
"expand": "issuetypes",
"self": "http://kelpie9:8081/rest/api/2/project/QA",
"id": "10010",
"key": "QA",
"name": "QA",
"avatarUrls": {
"16x16": "http://kelpie9:8081/secure/projectavatar?size=small&pid=10010&avatarId=10011",
"48x48": "http://kelpie9:8081/secure/projectavatar?pid=10010&avatarId=10011"
},
"issuetypes": [
{
"expand": "fields",
"self": "http://kelpie9:8081/rest/api/2/issuetype/1",
"id": 1,
"name": "Bug",
"iconUrl": "http://kelpie9:8081/images/icons/bug.gif",
"fields": {
"summary": {
"required": true,
"schema": {
"type": "string",
"system": "summary"
},
"operations": [
"set"
]
},
"timetracking": {
"required": false,
"operations": [ ]
},
"issuetype": {
"required": true,
"schema": {
"type": "issuetype",
"system": "issuetype"
},
"operations": [ ],
"allowedValues": [
{
"id": "1",
"name": "Bug",
"description": "A problem which impairs or prevents the functions of the product.",
"iconUrl": "http://kelpie9:8081/images/icons/bug.gif"
}
]
},
"priority": {
"required": false,
"schema": {
"type": "priority",
"system": "priority"
},
"name": "Priority",
"operations": [
"set"
],
"allowedValues": [
{
"self": "http://172.19.30.101:18080/rest/api/2/priority/1",
"iconUrl": "http://172.19.30.101:18080/images/icons/priority_blocker.gif",
"name": "Blocker",
"id": "1"
},
{
"self": "http://172.19.30.101:18080/rest/api/2/priority/2",
"iconUrl": "http://172.19.30.101:18080/images/icons/priority_critical.gif",
"name": "Critical",
"id": "2"
},
{
"self": "http://172.19.30.101:18080/rest/api/2/priority/3",
"iconUrl": "http://172.19.30.101:18080/images/icons/priority_major.gif",
"name": "Major",
"id": "3"
},
{
"self": "http://172.19.30.101:18080/rest/api/2/priority/4",
"iconUrl": "http://172.19.30.101:18080/images/icons/priority_minor.gif",
"name": "Minor",
"id": "4"
},
{
"self": "http://172.19.30.101:18080/rest/api/2/priority/5",
"iconUrl": "http://172.19.30.101:18080/images/icons/priority_trivial.gif",
"name": "Trivial",
"id": "5"
}
]
},
"customfield_10080": {
"required": false,
"schema": {
"type": "array",
"items": "string",
"custom": "com.atlassian.jira.plugin.system.customfieldtypes:labels",
"customId": 10080
},
"operations": [ ]
},
"customfield_10010": {
"required": false,
"schema": {
"type": "array",
"items": "string",
"custom": "com.atlassian.jira.plugin.system.customfieldtypes:labels",
"customId": 10010
},
"operations": [ ]
},
"customfield_10071": {
"required": false,
"schema": {
"type": "array",
"items": "string",
"custom": "com.atlassian.jira.plugin.system.customfieldtypes:textfield",
"customId": 10071
},
"operations": [ ]
}
}
}
]
}
]
}
There is nothing like a default value or a description in the priority field. How can I get those values?