Connect to Azure SQL Database from Databricks Notebook - pyspark

I want to load data from Azure Blob Storage into Azure SQL Database using a Databricks notebook. Could anyone help me with this?

I'm new to this, so I cannot comment, but why use Databricks for this? It would be much easier and cheaper to use Azure Data Factory.
https://learn.microsoft.com/en-us/azure/data-factory/tutorial-copy-data-dot-net
If you really need to use Databricks, you would need to either mount your Blob Storage account or access it directly from your Databricks notebook or JAR, as described in the documentation (https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-storage.html).
You can then read the files into DataFrames, whatever format they are in, and use the SQL JDBC connector to create a connection for writing the data to SQL (https://docs.azuredatabricks.net/spark/latest/data-sources/sql-databases.html).
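For illustration, a minimal sketch of that flow in a Databricks notebook (PySpark), assuming a CSV source and access-key authentication; every storage account, container, key, server, user, and table name below is a placeholder:

```python
# Sketch only: replace all <...> placeholders with your own values.
# `spark` is predefined in a Databricks notebook.

# Let Spark read directly from Blob Storage using the account access key
spark.conf.set(
    "fs.azure.account.key.<storage-account>.blob.core.windows.net",
    "<storage-account-access-key>",
)

# Read the source files into a DataFrame (CSV here; use the reader for your format)
df = (
    spark.read
    .option("header", "true")
    .csv("wasbs://<container>@<storage-account>.blob.core.windows.net/<path>/")
)

# Write the DataFrame to Azure SQL Database over JDBC
jdbc_url = (
    "jdbc:sqlserver://<server-name>.database.windows.net:1433;"
    "database=<database-name>;encrypt=true;loginTimeout=30;"
)

(
    df.write
    .format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.<target-table>")
    .option("user", "<sql-user>")
    .option("password", "<sql-password>")
    .mode("append")
    .save()
)
```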

Related

Reading data from QVD using python and databricks

I am new to Python. Can you help me with the details of how a QVD file can be read into an Azure Databricks DataFrame using Python?
I need the detailed syntax (authentication with an access key) for accessing the QVD file from the data lake and then reading it with qvd_reader.
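One possible approach, sketched below, assumes the open-source qvd package (qvd_reader), which reads local files, so the QVD is first copied from the data lake to the driver node; the storage account, container, key, and path values are all placeholders:

```python
# Sketch only: assumes the `qvd` package is installed on the cluster.
# `spark` and `dbutils` are predefined in a Databricks notebook.

# Authenticate to the data lake with the storage account access key
spark.conf.set(
    "fs.azure.account.key.<storage-account>.dfs.core.windows.net",
    "<storage-account-access-key>",
)

# Copy the QVD file from the data lake to local disk on the driver,
# since qvd_reader works with local file paths
dbutils.fs.cp(
    "abfss://<container>@<storage-account>.dfs.core.windows.net/<path>/file.qvd",
    "file:/tmp/file.qvd",
)

# Read the local copy into a pandas DataFrame, then convert to a Spark DataFrame
from qvd import qvd_reader

pdf = qvd_reader.read("/tmp/file.qvd")
df = spark.createDataFrame(pdf)
df.show()
```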

How do I get file metadata using Databricks Connect?

I am using Azure Databricks, which I have hooked up to a data lake, and I want to get metadata such as the modified date for the files in the lake. I am able to do this within Databricks itself using os.stat(), as detailed in this answer, but I am developing locally using Databricks Connect and can't figure out how to test it locally, given that os.stat() only has the context of my local file system.
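One hedged sketch for Databricks Connect: dbutils obtained from pyspark.dbutils runs against the remote workspace, so fs.ls() lists the files in the lake rather than on the local machine. The path below is a placeholder, and the modificationTime field may not be present on older Databricks Runtime versions:

```python
# Sketch only: run locally via Databricks Connect.
from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils

spark = SparkSession.builder.getOrCreate()
dbutils = DBUtils(spark)

# List files in the lake through the remote cluster, not the local file system
for f in dbutils.fs.ls(
    "abfss://<container>@<storage-account>.dfs.core.windows.net/<path>/"
):
    # FileInfo exposes path, name, and size; modificationTime (epoch ms)
    # is only available on newer runtimes, hence the getattr fallback
    print(f.path, f.size, getattr(f, "modificationTime", None))
```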

How to connect VectorWise Database to Azure Data Factory or Azure Synapse?

I have a Vectorwise database that contains multiple large tables, and I need to copy those tables into Azure Storage, Azure SQL DB, or Synapse. There is no direct connector for Vectorwise in Azure Data Factory.
Is there any way to connect to Vectorwise from ADF, by creating an API or something similar?
You will need to set up a virtual machine on a network that can reach your database, install the self-hosted integration runtime on it, install the proper ODBC driver from your database vendor, and then create an ODBC linked service.
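As an optional sanity check before creating the linked service, a small pyodbc test run on the self-hosted IR machine can confirm that the vendor's ODBC driver and DSN work; the DSN name and credentials below are placeholders:

```python
# Sketch only: verify the ODBC driver/DSN on the self-hosted IR machine.
import pyodbc

conn = pyodbc.connect("DSN=<vectorwise-dsn>;UID=<user>;PWD=<password>")
cursor = conn.cursor()
# Trivial test query; adapt it to your database's SQL dialect if needed
cursor.execute("SELECT 1")
print(cursor.fetchone())
conn.close()
```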

Post load scripts for Postgresql database

We successfully migrated data with Azure Data Factory to an Azure Virtual Machine that hosts a PostgreSQL database. We now need to run some post-load scripts on this database, such as creating views, creating indexes, and so on.
For a normal SQL database I would put the scripts into a stored procedure and trigger it from Azure Data Factory.
What is the best way to trigger these scripts for PostgreSQL, also from Azure Data Factory?
The ADF Stored Procedure activity does not support PostgreSQL; I think you should use an Azure Function as a workaround. You can always invoke the Azure Function from ADF.
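A minimal sketch of that workaround, assuming an HTTP-triggered Python Azure Function using psycopg2; the host, database, credentials, and SQL statements are placeholders you would replace with your own post-load scripts:

```python
# Sketch only: HTTP-triggered Azure Function that runs post-load SQL on PostgreSQL,
# invoked from an ADF Azure Function (or Web) activity.
import azure.functions as func
import psycopg2

POST_LOAD_STATEMENTS = [
    "CREATE INDEX IF NOT EXISTS idx_orders_customer ON orders (customer_id);",
    "CREATE OR REPLACE VIEW v_orders AS SELECT * FROM orders WHERE NOT deleted;",
]

def main(req: func.HttpRequest) -> func.HttpResponse:
    conn = psycopg2.connect(
        host="<vm-host>",
        dbname="<database>",
        user="<user>",
        password="<password>",
    )
    try:
        # `with conn` commits the transaction if all statements succeed
        with conn, conn.cursor() as cur:
            for stmt in POST_LOAD_STATEMENTS:
                cur.execute(stmt)
    finally:
        conn.close()
    return func.HttpResponse("post-load scripts executed", status_code=200)
```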

Is it possible to copy GeoJson data into PostGIS with Azure Data Factory?

I am looking into whether it is possible to transition from Airflow to Azure Data Factory.
I have a REST API from which I extract GeoJSON and would like to export this to a Postgres Database with PostGIS. I tried to do this with the Copy Data activity, but this only provides a simple mapping between the GeoJSON fields and similar fields in my table.
Normally I would use ogr2ogr to do this, but am not sure how to approach this with Azure Data Factory.
Does anyone know if my use case would be possible? If yes, how would you suggest to do it?
I solved my own question. I created an Azure Function that runs Python in a custom Docker container (one of the options in Azure Functions). I installed GDAL in the standard Azure Functions Python Docker image and use subprocess.run() to execute ogr2ogr with the parameters I pass to it via the body of the Azure Function's POST request. I can trigger this Azure Function from Azure Data Factory.
Hope this can help anyone else searching for a similar approach.
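For reference, a rough sketch of what such an Azure Function might look like, assuming an HTTP trigger and a custom container with GDAL installed; the PostGIS connection details and request fields are placeholders, not the author's exact code:

```python
# Sketch only: HTTP-triggered Azure Function that shells out to ogr2ogr
# to load GeoJSON into PostGIS, triggered from Azure Data Factory.
import subprocess

import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    body = req.get_json()
    geojson_source = body["geojson_source"]          # URL or local path to the GeoJSON
    target_table = body.get("table", "my_geo_table")

    result = subprocess.run(
        [
            "ogr2ogr",
            "-f", "PostgreSQL",
            "PG:host=<pg-host> dbname=<database> user=<user> password=<password>",
            geojson_source,
            "-nln", target_table,   # target layer/table name
            "-overwrite",
        ],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return func.HttpResponse(result.stderr, status_code=500)
    return func.HttpResponse("ogr2ogr finished", status_code=200)
```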