Do Databricks jobs support file-based triggers? - azure-data-factory

We would like to know, if we use Databricks jobs instead of ADF for orchestration, whether Databricks jobs support file-based triggers. Kindly advise.
Our ultimate goal spans different ADF environments and subscriptions; we know that the subscription and environment differences are not an issue that would block that goal.
Kindly help.

There is an upcoming feature to trigger jobs based on file events. It was mentioned in the latest Databricks quarterly roadmap webinar, which you can watch.

I doubt that is available today. But ADF does support file-based (storage event) triggers, and it also has a Databricks notebook activity; you can stitch these together (a rough sketch follows the link below).
https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-databricks-notebook
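To make the "stitching" concrete, here is a minimal sketch using the azure-mgmt-datafactory Python SDK: a storage (blob) event trigger wired to a pipeline that contains a single Databricks notebook activity. The subscription, resource group, factory, storage account, linked service name "AzureDatabricksLS", and notebook path are all placeholders, and exact model/method names can vary between SDK versions, so treat this as a starting point rather than a drop-in solution.

# Sketch: ADF blob event trigger -> pipeline with a Databricks notebook activity.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobEventsTrigger, DatabricksNotebookActivity, LinkedServiceReference,
    PipelineReference, PipelineResource, TriggerPipelineReference, TriggerResource,
)

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, factory = "<resource-group>", "<factory-name>"

# Pipeline with one activity that runs a Databricks notebook via an existing
# Azure Databricks linked service (placeholder name "AzureDatabricksLS").
run_notebook = DatabricksNotebookActivity(
    name="RunNotebook",
    notebook_path="/Shared/ingest",
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="AzureDatabricksLS"),
)
adf.pipelines.create_or_update(rg, factory, "FileEventPipeline",
                               PipelineResource(activities=[run_notebook]))

# Storage event trigger: fire the pipeline whenever a new blob lands under /landing.
trigger = BlobEventsTrigger(
    scope=("/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
           "/providers/Microsoft.Storage/storageAccounts/<storage-account>"),
    events=["Microsoft.Storage.BlobCreated"],
    blob_path_begins_with="/landing/blobs/",
    pipelines=[TriggerPipelineReference(
        pipeline_reference=PipelineReference(reference_name="FileEventPipeline"))],
)
adf.triggers.create_or_update(rg, factory, "OnFileArrival",
                              TriggerResource(properties=trigger))
adf.triggers.begin_start(rg, factory, "OnFileArrival").result()  # activate the trigger

The same wiring can of course be done entirely in the ADF Studio UI or via ARM templates; the SDK version is just convenient when the trigger needs to be created across several environments and subscriptions.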

Related

How to convert a Logic App plan from Consumption to Standard using Azure DevOps?

I am trying to convert Logic App plans using Azure DevOps pipelines in our organization, but I didn't find any option to run such a task in ADO. Any suggestions, please?
As of now, per this Microsoft document, Azure DevOps does not support the Logic App Standard plan, so we can't convert the Consumption plan to the Standard plan this way.
You can raise a feature request, which will also help other community members who want this capability.

Is it possible to use Databricks-Connect along with GitHub to make changes to my Azure Databricks notebooks from an IDE?

My aim is to make changes to my Azure Databricks notebooks using an IDE rather than in Databricks, while at the same time implementing some sort of version control.
Reading the Databricks-Connect documentation, it doesn't look like it supports this kind of functionality. Has anyone else tried to do this and had any success?

Is there a way to programmatically generate the adf_publish content in Azure Data Factory?

I am new to Azure Data Factory, and reading through the docs I found that to generate an artifact to deploy to other DF environments, you need to publish in the dev DF, which generates an adf_publish branch with the JSONs to deploy. My question is whether I can run this publish programmatically and thus generate the JSONs from any branch.
Not sure about programmatically publishing to adf_publish.
But it's entirely possible to skip the adf_publish branch and deploy straight from the source JSON using Azure DevOps or PowerShell instead (a rough sketch follows).
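As an illustration of that "deploy straight from the source JSON" idea, here is a minimal Python sketch that pushes each pipeline definition from the collaboration branch to a target factory through the ARM REST API. The subscription, resource group, factory name, and the pipeline/ folder layout are assumptions about your repository; in a real setup you would do the same for linked services, datasets, and triggers, in dependency order.

# Sketch: deploy ADF pipeline JSON from a git checkout, bypassing adf_publish.
import json
from pathlib import Path

import requests
from azure.identity import DefaultAzureCredential

sub, rg, factory = "<subscription-id>", "<resource-group>", "<factory-name>"
token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

base = (f"https://management.azure.com/subscriptions/{sub}/resourceGroups/{rg}"
        f"/providers/Microsoft.DataFactory/factories/{factory}")

# The collaboration branch stores one JSON file per pipeline under pipeline/.
for path in Path("pipeline").glob("*.json"):
    definition = json.loads(path.read_text())
    name = definition["name"]
    resp = requests.put(
        f"{base}/pipelines/{name}?api-version=2018-06-01",
        headers={"Authorization": f"Bearer {token}"},
        json={"properties": definition["properties"]},
    )
    resp.raise_for_status()
    print(f"Deployed pipeline {name}")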
Currently the only way to update the 'adf_publish' branch is by manually clicking the publish button in the UX.
The product group is currently designing a solution to be able to do this programmatically via a DevOps build task. No exact ETA unfortunately.
Thanks,
Daniel

Trigger Jupyter Notebook in Azure ML workspace from ADF

How do I trigger a notebook in my Azure Machine Learning workspace from Azure Data Factory?
I want to run a notebook in my Azure ML workspace when there are changes to my Azure storage account.
My understanding is that your use case is valid and currently possible with the azureml-sdk. It requires the following:
Create an Azure ML Pipeline. Here's a great introduction.
Add a NotebookRunnerStep to your pipeline (a rough sketch is shown after this list). Here is a notebook demoing the feature. I'm not confident that this feature is still being maintained/supported, but IMHO it's a valid and valuable feature; I've opened this issue to learn more.
Create a trigger using Logic Apps to run your pipeline any time a change in the datastore is detected.
There's certainly a learning curve to Azure ML Pipelines, but I'd argue the payoff is in the flexibility you get in composing steps together and easily scheduling and orchestrating the result.
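As promised above, here is a rough sketch of steps 1 and 2, using the azureml-sdk together with the azureml-contrib-notebook package (which provides NotebookRunnerStep). The workspace config, the compute target name "cpu-cluster", and the notebook path are placeholders, and since the contrib package may no longer be maintained, its exact constructor arguments could differ from what is shown here.

# Sketch: Azure ML pipeline that runs a notebook, then publish it so a Logic App
# (fired by a blob-created event on the storage account) can call its REST endpoint.
from azureml.core import Workspace, Experiment
from azureml.pipeline.core import Pipeline
from azureml.contrib.notebook import NotebookRunnerStep, AzureMLNotebookHandler

ws = Workspace.from_config()                 # reads config.json for your workspace
compute = ws.compute_targets["cpu-cluster"]  # existing AmlCompute cluster (placeholder)

# Step that executes a Jupyter notebook as part of the pipeline.
notebook_step = NotebookRunnerStep(
    name="run-notebook",
    source_directory=".",
    notebook="notebooks/process_new_files.ipynb",
    compute_target=compute,
    handler=AzureMLNotebookHandler(),
)

pipeline = Pipeline(workspace=ws, steps=[notebook_step])

# Publishing exposes a REST endpoint that the Logic App trigger can POST to.
published = pipeline.publish(name="notebook-on-file-arrival",
                             description="Run notebook when new data lands")
print(published.endpoint)

# One-off test run:
Experiment(ws, "notebook-pipeline-test").submit(pipeline).wait_for_completion()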
This feature is currently supported by Azure ML Notebooks. You can also use Logic Apps to trigger a run of your Machine Learning pipeline when there are changes to your Azure storage account.

How to schedule a Google Data Fusion pipeline?

I have deployed a simple Data Fusion pipeline that reads from GCS and writes to a BigQuery table.
I am looking for a way to schedule the pipeline but could not find relevant documentation.
Can anyone point me to documentation/pages that describe scheduling Data Fusion pipelines?
You can schedule a pipeline after deployment by clicking the Schedule button on the pipeline detail page. Once you click it, you can configure the pipeline to run periodically.
I was using "Data Fusion Basic Edition" which doesn't support scheduling and hence I was not able to find an option to schedule.
In Enterprise edition, I see an option "Schedule" after deploying the pipeline.
Feature comparisons here - Comparison between Basic and Enterprise edition