How to execute SQL scripts using Azure Databricks - Scala

I have a SQL script file that contains several SQL queries.
I want to upload it to DBFS, read it from Azure Databricks, and execute the queries from the script there.

Databricks does not directly support the execution of .sql files. However, you can read the file into a string and execute that string with spark.sql.
# DBFS is mounted under /dbfs on the driver, so plain Python file I/O works.
with open("/dbfs/path/to/query.sql") as queryFile:
    queryText = queryFile.read()
results = spark.sql(queryText)
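As far as I know, spark.sql accepts only one statement per call, so a script that holds several queries has to be split before it is executed. Below is a minimal sketch of that idea. It assumes statements are separated by semicolons and that no semicolon appears inside a string literal or a comment. The helper names split_sql_statements and run_sql_script are hypothetical, not part of any Databricks API.

```python
def split_sql_statements(script_text):
    """Split a SQL script into statements on ';' (naive: ignores quoting)."""
    statements = []
    for chunk in script_text.split(";"):
        # Drop full-line '--' comments, then trim surrounding whitespace.
        lines = [ln for ln in chunk.splitlines() if not ln.strip().startswith("--")]
        statement = "\n".join(lines).strip()
        if statement:
            statements.append(statement)
    return statements


def run_sql_script(spark, path):
    """Run every statement in the file at `path`; return the last result."""
    with open(path) as query_file:
        statements = split_sql_statements(query_file.read())
    result = None
    for statement in statements:
        # Each call runs exactly one statement.
        result = spark.sql(statement)
    return result
```

In a notebook this could be called as results = run_sql_script(spark, "/dbfs/path/to/query.sql"). A script with semicolons inside string literals would need a real SQL parser instead of this naive split.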

Related

Load sql script in PySpark notebook

In Azure Synapse Analytics, I want to keep my SQL queries separately from my PySpark notebook.
So I have created some SQL scripts. And I would like to use them in my PySpark notebook.
Is it possible?
And what is the Python code to load a SQL script into a variable?
As I understand it, the question is whether the SQL scripts you have already created can be read from a PySpark notebook. I looked at the storage account that is mapped to my Synapse Analytics studio (ASA), and I do not see the notebooks or SQL scripts stored there. So I don't think you can convert the existing SQL scripts to PySpark code within ASA. It does work, however, if you export the SQL scripts, upload them to storage, and then read them from the notebook.
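The export-and-upload route could look roughly like this in a Synapse PySpark notebook. This is a sketch under assumptions: the helpers abfss_uri and load_sql_script are hypothetical, the container, account, and file names in the example are placeholders, and the Spark pool is assumed to have read access to the storage account. It relies on spark.read.text with wholetext=True, which returns the entire file as a single row.

```python
def abfss_uri(container, account, relative_path):
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return (
        f"abfss://{container}@{account}.dfs.core.windows.net/"
        f"{relative_path.lstrip('/')}"
    )


def load_sql_script(spark, uri):
    """Return the full text of the file at `uri` as one Python string."""
    # wholetext=True yields a single row holding the entire file content.
    return spark.read.text(uri, wholetext=True).first()[0]


# Example with placeholder names (run inside a notebook where `spark` exists):
# query_text = load_sql_script(spark, abfss_uri("scripts", "mystorageaccount", "queries/report.sql"))
# df = spark.sql(query_text)
```

The script text ends up in an ordinary variable, so it can be passed to spark.sql or formatted with parameters first.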

Read .sql files with Azure Automation runbook

I am trying to connect my powershell runbook to a storage account (blob) to read .sql files and execute them in my Azure SQL Database.
Connect to a blob container
Read the script on .sql file
Execute the script on the db
When I try Invoke-Sqlcmd, it requires a dedicated storage to store the file. However, runbooks work serverless and there is no storage I can use for the files, as far as I know.
Is there a way to only read the files (without moving them around) via powershell runbook or can I store the files to read with it?

How to read tables from synapse database tables using pyspark

I am new to Azure Synapse and have to work in an Azure Spark notebook. One of my colleagues connected the on-premises database using an Azure linked service. I have now written a test framework for comparing the on-premises data with the data lake (curated) data, but I don't understand how to read those tables using PySpark.
Here is my linked service data structure.
Here are my linked service names and database name.
You can read any file stored in a Synapse linked location as a table by using the Azure Synapse Dedicated SQL Pool Connector for Apache Spark.
First, read the file that you want to expose as a table in Synapse. Use the code below to read the file.
%%pyspark
df = spark.read.load('abfss://sampleadls2@sampleadls1.dfs.core.windows.net/business.csv', format='csv', header=True)
Then save the resulting DataFrame as a table using the code below:
%%pyspark
spark.sql("CREATE DATABASE IF NOT EXISTS business")
df.write.mode("overwrite").saveAsTable("business.data")
Refer below image.
Now you can run any Spark SQL command on this table as shown below:
%%pyspark
data = spark.sql("SELECT * FROM business.data")
display(data)
See the output in below image.

How do you run many SQL commands against an Azure SQL database using an Azure Automation powershell runbook

I'm using Azure Automation to move an Azure SQL database from one resource to another (from Prod to Dev, for example). After the database is copied, I would like to run a SQL script that adds some users and permissions. This means I need to run a handful of commands like "Create user..." and "alter role....". Most examples I've found use PowerShell to execute a single SQL command, but using that code to run many commands seems like it would result in an excessively long PowerShell script. In the on-prem world, I would probably have a .sql file that gets executed. Any suggestions on how to achieve this easily using PowerShell in Azure Automation? Thanks!

How to run a shell script from Azure Data Factory

How to run a shell script from Azure Data Factory. Inside the shell script I am trying to execute an hql file like below:
/usr/bin/hive -f "wasbs://hivetest@batstorpdnepetradev01.blob.core.windows.net/hivetest/extracttemp.hql" > wasbs://hivetest@batstorpdnepetradev01.blob.core.windows.net/hivetest/extracttemp.csv
My hql file is stored in Blob Storage, and I want to execute it, collect the result into a CSV file, and store that file back in Blob Storage. The entire command is stored in a shell script, which is also in Blob Storage. Now I want to execute it in Azure Data Factory in a Hive activity. Any help will be appreciated.
You could use the Hadoop Hive activity in ADF. Please take a look at this doc. You can also build your pipeline with the ADF V2 UI.