Is it possible to use Databricks-Connect along with GitHub to make changes to my Azure Databricks notebooks from an IDE?

My aim is to make changes to my Azure Databricks notebooks from an IDE rather than in Databricks, while at the same time implementing some sort of version control.
Reading the Databricks-Connect documentation, it doesn't look like this kind of functionality is supported. I was wondering if anyone else has tried to do this and had any success?

Related

AWS Glue - version control and setting up for continuous integration

We are in the process of setting up the CI/CD process for our AWS Glue ETL process. The existing ETL process contains the following AWS Glue components: crawlers, registered tables in the catalog, jobs, triggers, and workflows.
Obviously the first step is to set up a code repository and link the existing artifacts from the different components mentioned above to the repository, which should ideally let developers perform check-ins and pull requests from the tool (something similar to ADF and Databricks). However, as far as we have explored, AWS Glue does not integrate with any source code repository that directly provides this feature, unless we are missing something.
Hence, what is the method to set up the environment for CI (I'm not yet talking about CD)? The link below gives a reference for CI/CD:
https://aws.amazon.com/blogs/big-data/implement-continuous-integration-and-delivery-of-serverless-aws-glue-etl-applications-using-aws-developer-tools/
However, it mentions at the beginning that the ETL code and the AWS CloudFormation template file for deploying the ETL jobs are both committed to version control, so it's not clear how this is done for the ongoing, regular commits from the developers.
However, as far as we have explored, AWS Glue does not integrate with any source code repository that directly provides this feature, unless we are missing something.
Correct, Glue does not have version control integration.
I develop (Python and CloudFormation) locally in VS Code and use its Git integration. I use a container if I want to test something locally, but Glue also has a Dev Endpoint for similar tasks.
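For reference, here's a minimal sketch of what a Glue job script tracked in the repo might look like; the database, table, and S3 path are placeholders, and the awsglue imports only resolve inside a Glue runtime, one of the AWS-provided Glue containers, or a Dev Endpoint.

```python
# Minimal sketch of a Glue ETL job script kept under version control.
# The database/table names and the S3 path are placeholders; the awsglue
# modules are only available in a Glue runtime, the AWS-provided Glue
# Docker images, or a Dev Endpoint.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])

glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from a catalog table and write it back out as Parquet (illustrative only).
source = glue_context.create_dynamic_frame.from_catalog(
    database="my_database", table_name="my_table"
)
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/output/"},
    format="parquet",
)

job.commit()
```

Because the job is just a Python file, it can be committed, reviewed, and diffed like any other code, which is the CI half of the workflow described above.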

Azure DevOps: Use pipelines to deploy BAR files to IBM Integration Bus

The goal I'm trying to accomplish is migrating from CVS to Azure Repos. Currently, BAR deployments are sent through CVS repos and then deployed to IBM Integration Bus. I would like to accomplish this same process through Azure DevOps.
I know this extension exists:
https://marketplace.visualstudio.com/items?itemName=ms-vsts.ibm-integration-bus&ssr=false#overview
However, there is limited documentation available. I'm curious to know whether anyone has had success using the above extension.
If not through the extension, is there another solution available?
I have never used that extension myself.
The tasks involved in setting up a build and deploy pipeline are not difficult, so you should be able to roll your own.
See mqsicreatebar and mqsipackagebar for how to build the BAR files.
See mqsideploy for details of how to deploy the BAR files.
The rest will depend very much on your chosen source code control system and build orchestration technology.
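If you do roll your own, the deploy step mostly amounts to shelling out to those commands from your pipeline agent. A rough sketch follows; the host, port, integration server name, and BAR path are placeholders, and the mqsideploy flags shown here should be verified against the IIB documentation for your version.

```python
# Illustrative sketch of a deploy step a pipeline agent could run after the
# BAR file has been built with mqsicreatebar / mqsipackagebar. Host, port,
# integration server name, and BAR path are placeholders; verify the
# mqsideploy flags against the IIB documentation for your version.
import subprocess

def deploy_bar(host: str, port: str, server: str, bar_file: str) -> None:
    subprocess.run(
        ["mqsideploy", "-i", host, "-p", port, "-e", server, "-a", bar_file],
        check=True,  # fail the pipeline step if the deploy fails
    )

deploy_bar("iib-host.example.com", "4414", "default", "artifacts/MyApp.bar")
```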

Version control ad hoc Google BigQuery scripts in GitHub

Is there any way to open SQL scripts that are version-controlled in GitHub in the BigQuery console?
Right now the only mechanism I am aware of is to copy and paste from the GitHub repo into the BigQuery console, but I'm hoping there is a more direct way to link the two. I've been unable to find any material saying whether this is or is not possible.
There is a third-party IDE for BigQuery that supports GitHub: Goliath, part of the Potens.io suite, available on the Marketplace.
Note: another tool in this suite is Magnus, a workflow automator. It supports all of BigQuery, Cloud Storage, and most Google APIs, as well as many simple utility-type tasks such as the BigQuery Task, Export to Storage Task, Loop Task, and more, along with advanced scheduling, triggering, etc. It supports GitHub as a source control as well.
Disclosure: I am a GDE for Google Cloud, the creator of those tools, and the lead of the Potens team.
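If a scripted route is acceptable instead of an IDE, here is a minimal sketch that reads a .sql file from a local clone of the GitHub repo and runs it with the official google-cloud-bigquery client; the file path and project ID are placeholders.

```python
# Minimal sketch: run a version-controlled SQL file against BigQuery.
# Assumes `google-cloud-bigquery` is installed and application-default
# credentials are configured; the repo path and project ID are placeholders.
from pathlib import Path
from google.cloud import bigquery

sql_text = Path("my-repo/queries/daily_report.sql").read_text()  # hypothetical path

client = bigquery.Client(project="my-project")  # placeholder project ID
job = client.query(sql_text)   # submit the query
for row in job.result():       # wait for completion and iterate over the rows
    print(dict(row))
```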

Trigger Jupyter Notebook in Azure ML workspace from ADF

How do I trigger a notebook in my Azure Machine Learning workspace from Azure Data Factory?
I want to run a notebook in my Azure ML workspace when there are changes to my Azure storage account.
My understanding is that your use case is 100% valid and it is currently possible with the azureml-sdk. It requires that you create the following:
Create an Azure ML Pipeline. Here's a great introduction.
Add a NotebookRunnerStep to your pipeline (a rough sketch follows after this answer). Here is a notebook demoing the feature. I'm not confident that this feature is still being maintained/supported, but IMHO it's a valid and valuable feature. I've opened this issue to learn more.
Create a trigger using Logic Apps to run your pipeline anytime a change in the datastore is detected.
There's certainly a learning curve to Azure ML Pipelines, but I'd argue the payoff is in the flexibility you get in composing steps together and easily scheduling and orchestrating the result.
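To make the shape of that pipeline concrete, here's a rough sketch with the azureml-sdk. The NotebookRunnerStep and NotebookRunConfig usage follows the demo notebook mentioned above as best I can tell; since the feature may not be maintained, treat those names and arguments as assumptions to verify. The workspace config, compute target, and notebook path are placeholders.

```python
# Rough sketch of an Azure ML pipeline with a notebook step, based on the
# azureml-contrib-notebook demo referenced above. Workspace config, compute
# name, and notebook paths are placeholders; the NotebookRunConfig /
# NotebookRunnerStep names and arguments are assumptions to verify.
from azureml.core import Workspace, Experiment
from azureml.pipeline.core import Pipeline
from azureml.contrib.notebook import NotebookRunConfig, NotebookRunnerStep

ws = Workspace.from_config()  # assumes a local config.json for the workspace

notebook_cfg = NotebookRunConfig(
    source_directory="./notebooks",        # placeholder folder in the repo
    notebook="process_new_data.ipynb",     # hypothetical notebook name
)

notebook_step = NotebookRunnerStep(
    name="run-notebook",
    notebook_run_config=notebook_cfg,
    compute_target="cpu-cluster",          # placeholder compute target
)

pipeline = Pipeline(workspace=ws, steps=[notebook_step])
pipeline.validate()

# Publish so a Logic App (watching the storage account) can call the
# pipeline's REST endpoint whenever new data lands.
published = pipeline.publish(name="notebook-pipeline")

# Optional: run it once manually to check everything works.
run = Experiment(ws, "notebook-pipeline-test").submit(pipeline)
```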
This feature is currently supported by Azure ML Notebooks. You can also use Logic Apps to trigger a run of your Machine Learning pipeline when there are changes to your Azure storage account.

Snowflake connectivity with Git

Is there any way we can connect Snowflake with Git for version control? With that, we could maintain versions of our merge statements and any other SQL scripts in Git.
DBeaver has git integration and is the best solution my team has found for version control with Snowflake. It's not perfect but it allows you to run your scripts against Snowflake and then push your SQL code to a git repository through the app UI or command line.
Yes! One way to do this is to store your Snowflake SQL code in a file or files with the .sql extension (e.g. filename.sql). You can add those files to a Git repo and track them there accordingly.
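As a small illustration of that approach, here's a sketch that executes one of those tracked .sql files against Snowflake with the official Python connector (snowflake-connector-python); the account, credentials, and file path are placeholders.

```python
# Sketch: execute a version-controlled .sql file against Snowflake using the
# official snowflake-connector-python package. Account, credentials, and the
# file path are placeholders.
from pathlib import Path
import snowflake.connector

sql_text = Path("sql/merge_orders.sql").read_text()  # hypothetical tracked file

conn = snowflake.connector.connect(
    account="my_account",   # placeholder account identifier
    user="my_user",         # placeholder user
    password="***",         # use a secret store in practice
    warehouse="my_wh",
    database="my_db",
    schema="public",
)
try:
    # execute_string runs each statement in the file in sequence.
    for cursor in conn.execute_string(sql_text):
        print(cursor.rowcount)
finally:
    conn.close()
```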
This is an age-old question when dealing with databases and how one should go about versioning them. Unfortunately, no database really integrates directly with any VCS that I'm aware of.
My team has settled on using dbt. This essentially turns the database into a series of text files that are easily integrated with Git. The short of it is that you edit your models as local text files and then use dbt run to put those models into Snowflake itself. This is nice because you can configure separate environments such as dev and prod.
Other answers help with using an IDE as a go-between for git and Snowflake. These projects could be useful also:
https://medium.com/snowflake/snowflake-vs-code-sql-tools-and-github-7eab915e10cb
Use VS Code as the IDE with a useful Snowflake extension.
https://github.com/Snowflake-Labs/schemachange
Manage schema changes as scripts in Git and deploy them with CI/CD.
https://github.com/Snowflake-Labs/sfsnowsightextensions#get-sfworksheets
The missing feature of Snowsight: export worksheets.
There is now a VS Code extension for Snowflake. I'm able to connect VS Code to our repo (Azure DevOps in my case) and to Snowflake. It has some nice features too, like being able to easily cycle through past queries (including query results), and it gives the same level of detail as (or more than) the Snowflake UI.