Using Azure Data Factory to update Azure Machine Learning models - azure-data-factory

When I use Data Factory to update Azure ML models as described in the documentation (https://learn.microsoft.com/en-us/azure/data-factory/v1/data-factory-azure-ml-update-resource-activity),
I run into the following error:
The blob reference: test/model.ilearner has an invalid or missing file extension. Supported file extensions for this output type are: ".csv, .tsv, .arff".'.
I searched for the problem and found this suggested solution:
https://disqus.com/home/discussion/thewindowsazureproductsite/data_factory_create_predictive_pipelines_using_data_factory_and_machine_learning_microsoft_azure/ .
But the linked services for the outputs of my training pipeline and my update pipeline are already different.
How can I solve this problem?

Related

ADF: Copy data from Azure Databricks Delta Lake to Azure SQL Server

I'm trying to use the Copy Data activity to extract information from an Azure Databricks Delta Lake table. I've noticed that it doesn't pass the information directly from the Delta Lake to the SQL Server I need; it must first stage it in Azure Blob Storage. When I run it, it throws the following error:
ErrorCode=AzureDatabricksCommandError,Hit an error when running the command in Azure Databricks. Error details: Failure to initialize configurationInvalid configuration value detected for fs.azure.account.key Caused by: Invalid configuration value detected for fs.azure.account.key
While searching, I found a possible solution, but it didn't work:
Invalid configuration value detected for fs.azure.account.key copy activity fails
Does anyone have any idea how the hell to pass information from an Azure Databricks Delta Lake table to a table in SQL Server?
These are some images of the structure that I have in ADF:
In the image I get a message that tells me that I must have a Storage Account to continue
[Screenshots: pipeline configuration and failed execution]
Thank you very much
The solution to this problem was the following:
Correct the way the storage access key configuration was defined. Instead of:
spark.hadoop.fs.azure.account.key.<storageaccountname>.blob.core.windows.net
the key must be changed to:
spark.hadoop.fs.azure.account.key.<storageaccountname>.dfs.core.windows.net
Does anyone have any idea how the hell to pass information from an Azure Databricks Delta Lake table to a table in SQL Server?
To achieve the above scenario, follow the steps below:
First, go to your Databricks cluster, edit it, and under Advanced options >> Spark >> Spark config add the lines below if you are using Blob storage.
spark.hadoop.fs.azure.account.key.<storageaccountname>.blob.core.windows.net <Accesskey>
spark.databricks.delta.optimizeWrite.enabled true
spark.databricks.delta.autoCompact.enabled true
After that, since you are using SQL Database as the sink, enable staging, select the same Blob storage account linked service as the staging account linked service, and give a storage path in your Blob storage.
Then debug the pipeline. Make sure you complete the prerequisites from the official documentation.
[Screenshots: sample input and the resulting output in SQL]
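The two Spark config keys discussed in this thread differ only in the endpoint suffix (blob versus dfs). A minimal sketch in Python, assuming a hypothetical helper `spark_account_key_line` (not part of any Databricks or ADF API) that renders the line to paste into the cluster's Spark config. The account name and access key are placeholders:

```python
# Hypothetical helper, for illustration only: renders the Spark config line
# that carries a storage account access key.
ENDPOINT_SUFFIX = {
    "blob": "blob.core.windows.net",  # Blob storage endpoint
    "dfs": "dfs.core.windows.net",    # ADLS Gen2 (dfs) endpoint
}


def spark_account_key_line(account_name: str, access_key: str, endpoint: str = "blob") -> str:
    """Return '<config key> <access key>' for the given storage account."""
    if not account_name:
        # An empty name would yield a malformed key with two consecutive dots.
        raise ValueError("storage account name must not be empty")
    if endpoint not in ENDPOINT_SUFFIX:
        raise ValueError(f"unknown endpoint kind: {endpoint!r}")
    key = f"spark.hadoop.fs.azure.account.key.{account_name}.{ENDPOINT_SUFFIX[endpoint]}"
    return f"{key} {access_key}"


if __name__ == "__main__":
    print(spark_account_key_line("storageaccountname", "<Accesskey>", "dfs"))
```

Which suffix applies depends on how the staging storage is addressed, so treat the choice of `endpoint` as something to verify against your own linked service.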

Azure Data Factory CICD error: The document creation or update failed because of invalid reference

All, when running a build pipeline in Azure DevOps with an ARM template, the process consistently fails when trying to deploy a dataset or a reference to a dataset, with this error:
ARM Template deployment: Resource Group scope (AzureResourceManagerTemplateDeployment)
BadRequest: The document creation or update failed because of invalid reference 'dataset_1'.
I've tried renaming the dataset and also recreating it to see if that would help.
I then deleted the dataset_1.json file from the repo and still get the same message, so I think the problem is some reference to this dataset and not the dataset itself. I've looked through all the other files for references to it, but they all look fine.
Any ideas on how to troubleshoot this?
thanks
try this
It looks like you have created the 'myTestLinkedService' linked service and tested the connection but haven't published it yet, and you are trying to reference that linked service in the new dataset that you are creating with PowerShell.
In order to reference any Data Factory entity from PowerShell, please make sure those entities are published first. Try publishing the linked service from the portal first, and then run your PowerShell script to create the new dataset/activity.
I think I found the issue. When I went into the detailed logs, I found that in addition to this error there was an error message about an invalid SQL connection string, so I thought it might be related, since the dataset in question uses an Azure SQL Database linked service.
I adjusted the connection string and this seems to have solved the issue.
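Checking every file in the repo for references to a dataset by hand is error-prone. A minimal sketch in Python of how that search could be automated, assuming the Data Factory JSON files express references as objects with a `referenceName` property. The function names `find_references` and `scan_repo` are hypothetical, not part of any Azure tooling:

```python
import json
from pathlib import Path
from typing import Any, Iterator


def find_references(node: Any, name: str, path: str = "$") -> Iterator[str]:
    """Yield the JSON path of every object whose "referenceName" equals `name`."""
    if isinstance(node, dict):
        if node.get("referenceName") == name:
            yield path
        for key, value in node.items():
            yield from find_references(value, name, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_references(item, name, f"{path}[{index}]")


def scan_repo(root: str, name: str) -> dict[str, list[str]]:
    """Map each JSON file under `root` to the paths that reference `name`."""
    hits: dict[str, list[str]] = {}
    for file in sorted(Path(root).rglob("*.json")):
        try:
            # utf-8-sig also accepts files that start with a byte order mark.
            doc = json.loads(file.read_text(encoding="utf-8-sig"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip files that are not valid JSON
        found = list(find_references(doc, name))
        if found:
            hits[str(file)] = found
    return hits


if __name__ == "__main__":
    for file, paths in scan_repo(".", "dataset_1").items():
        print(file, paths)
```

This only finds literal references. As the accepted fix in this thread shows, the reported reference error can also be a side effect of a different problem (here, an invalid connection string), so the detailed deployment logs are still worth reading.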

Azure Data Factory not Using Data Flow Runtime

I have an Azure Data Factory with a pipeline that I'm using to pick up data from an on-premise database and copy to CosmosDB in the cloud. I'm using a data flow step at the end to delete documents that don't exist in the source from the sink.
I have 3 integration runtimes set up:
AutoResolveIntegrationRuntime (default set up by Azure)
Self hosted integration runtime (I set this up to connect to the on-premise database so it's used by the source dataset)
Data flow integration runtime (I set this up to be used by the data flow step with a TTL setting)
The issue I'm seeing is that when I trigger the pipeline, the AutoResolveIntegrationRuntime is the one being used, so I'm not getting the optimisation that I need from the data flow integration runtime with the TTL.
Any thoughts on what might be going wrong here?
In my experience, only the AutoResolveIntegrationRuntime (the default set up by Azure) supports the optimization:
When we choose to run the data flow on a non-default integration runtime, the optimization isn't available:
And once the integration runtime has been created, we also can't change the settings:
The Data Factory documentation doesn't say more about this. When I ran the pipeline, I found that the dataflowruntime didn't take effect:
That means that no matter which integration runtime you use to connect to the dataset, the data flow will always use the Azure default integration runtime.
SHIR doesn't support data flow execution.
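One way to check which runtime a data flow step is configured to use is to inspect the pipeline's JSON definition. A minimal sketch in Python, assuming the exported pipeline JSON has the shape where an activity of type `ExecuteDataFlow` carries an optional `typeProperties.integrationRuntime` reference. The helper name `data_flow_runtimes` is hypothetical, and only top-level activities are inspected:

```python
from typing import Any, Optional


def data_flow_runtimes(pipeline: dict[str, Any]) -> dict[str, Optional[str]]:
    """Map each top-level ExecuteDataFlow activity to its integration runtime
    reference name, or None when the activity does not set one."""
    result: dict[str, Optional[str]] = {}
    activities = pipeline.get("properties", {}).get("activities", [])
    for activity in activities:
        if activity.get("type") != "ExecuteDataFlow":
            continue  # Copy and other activity types are ignored
        runtime = activity.get("typeProperties", {}).get("integrationRuntime") or {}
        result[activity.get("name", "<unnamed>")] = runtime.get("referenceName")
    return result


if __name__ == "__main__":
    # Assumed example document; the activity and runtime names are made up.
    example = {
        "properties": {
            "activities": [
                {"name": "DeleteMissingDocs", "type": "ExecuteDataFlow", "typeProperties": {}}
            ]
        }
    }
    print(data_flow_runtimes(example))
```

A value of `None` for an activity would be consistent with that activity falling back to the default runtime, which matches the behaviour described in the question.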

Release Pipeline error when using Azure Dacpac Task

I'm new to using Azure release pipelines and have been fighting issues trying to deploy a database project to a new Azure SQL database. Currently the pipeline is giving me the following error...
TargetConnectionString argument cannot be used in conjunction with any other Target database arguments
I've tried deploying with and without the TargetConnectionString included in my publish profile. Any suggestions or something else to try? I'm out of ideas.
TargetConnectionString
Specifies a valid SQL Server/Azure connection string to the target database. If this parameter is specified it shall be used exclusively of all other target parameters. (short form /tcs)
So please remove all other TargetXXX arguments.
(If you don't have any, can you show which arguments you pass inline and which are in the publish profile, without any sensitive data of course?)
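The rule quoted above can be checked mechanically before a release runs. A minimal sketch in Python, assuming the additional arguments are available as a list of `/Name:value` strings. The helper `conflicting_target_args` is hypothetical, and the short forms it knows about are limited to /tsn, /tdn, /tu and /tp:

```python
# Short forms of target arguments this sketch recognises (not an exhaustive list).
TARGET_SHORT_FORMS = {"tsn", "tdn", "tu", "tp"}
CONNECTION_STRING_NAMES = {"targetconnectionstring", "tcs"}


def arg_name(arg: str) -> str:
    """Return the lower-cased parameter name of a '/Name:value' argument."""
    return arg.lstrip("/").split(":", 1)[0].lower()


def conflicting_target_args(args: list[str]) -> list[str]:
    """Return the target arguments that clash with a target connection string.

    The result is empty when no connection string argument is present.
    """
    names = [arg_name(a) for a in args]
    if not any(n in CONNECTION_STRING_NAMES for n in names):
        return []
    return [
        arg
        for arg, name in zip(args, names)
        if name not in CONNECTION_STRING_NAMES
        and (name.startswith("target") or name in TARGET_SHORT_FORMS)
    ]


if __name__ == "__main__":
    example = [
        "/Action:Publish",
        "/TargetConnectionString:Server=example",
        "/TargetDatabaseName:mydb",
    ]
    print(conflicting_target_args(example))
```

Target settings can also come from the publish profile or from the task's own inputs, which this check does not see, so an empty result here does not rule out a conflict.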

Data Factory job failing to submit to Azure Batch?

I'm trying to submit a Data Factory pipeline to Azure Batch compute, using a linked service that I have used before and that worked fine.
However, the pipeline is failing with the following message:
Azure Batch entity not found. Code: 'JobNotFound' Message: 'The specified job does not exist. RequestId:d8f2b8d6-b34b-4823-9a06-9037ff549185 Time:2016-05-26T10:21:43.1480686Z'
The two parts of the message seem inconsistent: one states that the Batch entity wasn't found, though the code says JobNotFound, which refers to an Azure Batch job.
Would appreciate help.
I fixed the problem by deleting the batch account, and creating a new one with a different name.
Could not figure out what was causing the issue.