Azure Data Factory not Using Data Flow Runtime - azure-data-factory

I have an Azure Data Factory with a pipeline that I'm using to pick up data from an on-premise database and copy to CosmosDB in the cloud. I'm using a data flow step at the end to delete documents that don't exist in the source from the sink.
I have 3 integration runtimes set up:
AutoResolveIntegrationRuntime (default set up by Azure)
Self hosted integration runtime (I set this up to connect to the on-premise database so it's used by the source dataset)
Data flow integration runtime (I set this up to be used by the data flow step with a TTL setting)
The issue I'm seeing is when I trigger the pipeline the AutoResolveIntegrationRuntime is the one being used so I'm not getting the optimisation that I need from the Data flow integration runtime with the TTL.
Any thoughts on what might be going wrong here?

Per my experience, only the AutoResolveIntegrationRuntime (default set up by Azure) supports the optimization:
When we choose the data flow run on non-default integration, there isn't the optimization:
And once the integration runtime created, we also couldn't change the settings:
Data Factory documents didn't talk more about this. When I run the pipeline, I found that the dataflowruntime won't work:
That means that no matter which integration runtime you used to connect to the dataset, data low will always use the Azure Default integration runtime.

SHIR doesnt support dataflow execution.

Related

ADF Copy data from Azure Data Bricks Delta Lake to Azure Sql Server

I'm trying to use the data copy activity to extract information from azure databricks delta lake, but I've noticed that it doesn't pass the information directly from the delta lake to the SQL server I need, but must pass it to an azure blob storage, when running it, it throws the following error
ErrorCode=AzureDatabricksCommandError,Hit an error when running the command in Azure Databricks. Error details: Failure to initialize configurationInvalid configuration value detected for fs.azure.account.key Caused by: Invalid configuration value detected for fs.azure.account.key
Looking for information I found a possible solution but it didn't work.
Invalid configuration value detected for fs.azure.account.key copy activity fails
Does anyone have any idea how the hell to pass information from an azure databricks delta lake table to a table in Sql Server??
These are some images of the structure that I have in ADF:
In the image I get a message that tells me that I must have a Storage Account to continue
These are the configuration images, and execution failed:
Conf:
Fail:
Thank you very much
The solution for this problem was the following:
Correct the way the Storage Access Key configuration was being defined:
in the instruction: spark.hadoop.fs.azure.account.key..blob.core.windows.net
The following change must be made:
spark.hadoop.fs.azure.account.key.
storageaccountname.dfs.core.windows.net
Does anyone have any idea how the hell to pass information from an azure databricks delta lake table to a table in Sql Server??
To achieve Above scenario, follow below steps:
First go to your Databricks cluster Edit it and under Advance options >> spark >> spark config Add below code if you are using blob storage.
spark.hadoop.fs.azure.account.key.<storageaccountname>.blob.core.windows.net <Accesskey>
spark.databricks.delta.optimizeWrite.enabled true
spark.databricks.delta.autoCompact.enabled true
After that as you are using SQL Database as a sink.
Enable staging and give same blob storage account linked service as Staging account linked service give storage path from your blob storage.
And then debug it. make sure you complete Prerequisites from official document.
My sample Input:
Output in SQL:

boltdb scramble for MOCK / DEV Purposes

I have SOAR that uses boltDB to host it's incidents.
I want to take that boltDB copy over to DEV environment and leverage its data without compromising PROD data.
New to BoltDB; are there tools available for me to review / query bolt DB database. Ultimately looking to see if I can script a solution to scramble certain values within the boltDB?

Release Pipeline error when using Azure Dacpac Task

I'm new to using Azure release pipelines and have been fighting issues trying to deploy a database project to a new Azure SQL database. Currently the pipeline is giving me the following error...
TargetConnectionString argument cannot be used in conjunction with any other Target database arguments
I've tried deploying with and without the TargetConnectionString included in my publish profile. Any suggestions or something else to try? I'm out of ideas.
TargetConnectionString
Specifies a valid SQL Server/Azure connection string to the target database. If this parameter is specified it shall be used exclusively of all other target parameters. (short form /tcs)
So please remove all other TargetXXX arguments.
(if you don't have them can you show what arguments you have inline and in publish profile - of course without data)

Use AWS Amplify and App Sync with existing Node Server using Mongodb

Currently, I'm developing a native application using React-Native. I've decided to go with AWS Amplify because of it's real time updates as well as its authentication.
I also have a Web Application that runs on a Node.js with Epxress server. This web application connects to a Mongo database.
My big problem is that I would like to have all of my aws amplify queries run to my existing MongoDb instead of a new dynamoDb database which is provided with AWS AppSync, but unfortunately I dont know where to start. This is especially helpful in adding authentication easily in my existing web application as well.
My first idea was to just create all my API endpoints in a new node js server and have app sync call to these API end points, but I'm not sure how to implement calling end points on an existing server (and this seems kind of counter intuitive to the 'serverless' idea)
My other idea came from this: Can AWS App-Sync be used without dynamoDB
This states to use AWS Lambda to 'pipeline' my data to the existing mongodb, but I'm not really sure what that entails.
TL;DR - I would like to be able to query an existing Mongodb instead of using DynamoDb when using AWS Amplify with AppSync.
I hope this is clear enough and doesn't sound like I'm rambling. Thanks in advance!
I would suggest using either an HTTP datasource to connect to your MongoDB backend or a Lambda function. Here are a couple getting started tutorials for both:
https://docs.aws.amazon.com/appsync/latest/devguide/tutorial-http-resolvers.html
https://docs.aws.amazon.com/appsync/latest/devguide/tutorial-lambda-resolvers.html
If you go the Lambda route, then you can leverage the new #function feature of the GraphQL Transformer in the Amplify CLI: https://aws-amplify.github.io/docs/cli/graphql#function

Not Able to Publish ADF Incremental Package

As Earlier Posted a thread for syncing Data from Premises Mysql to Azure SQL over here referring this article, and found that lookup component for watermark detection is only available for SQL Server Only.
So tried a work Around, that while using "Copy" Data Flow task ,will pick data greater than last watermark stored from Mysql.
Issue:
Able to validate package successfully but not able to publish same.
Question :
In Copy Data Flow Task i'm using below query to get data from MySql greater than watermark available.
Can't we use Query like below on other relational sources like Mysql
select * from #{item().TABLE_NAME} where #{item().WaterMark_Column} > '#{activity('LookupOldWaterMark').output.firstRow.WatermarkValue}'
CopyTask SQL Query Preview
Validate Successfully
Error With no Details
Debug Successfully
Error After following steps mentioned by Franky
Azure SQL Linked Service Error (Resolved by re configuring connection /edit credentials in connection tab)
Source Query got blank (resolved by re-selection source type and rewriting query)
Could you verify if you have access to create a template deployment in the azure portal?
1) Export the ARM Template: int he top-right of the ADFv2 portal, click on ARM Template -> Export ARM Template, extract the zip file and copy the content of the "arm_template.json" file.
2) Create ARM Template deployment: Go to https://portal.azure.com/#create/Microsoft.Template and log in with the same credentials you use in the ADFv2 portal (you can also get to this page going in the Azure portal, click on "Create a resource" and search for "Template deployment"). Now click on "Build your own template in editor" and paste the ARM template from the previous step in the editor and Save.
3) Deploy template: Click on existing resource group and select the same resource group as the one where your Data Factory is. Fill out the parameters that are missing (for this testing it doesn't really matter if the values are valid); Factory name should already be there. Agree the terms and click purchase.
4) Verify the deployment succeeded. If not let me know the error, it might be an access issue which would explain why your publish fails. (ADF team is working on giving a better error for this issue).
Did any of the objects publish into your Data Factory?