How to run a shell script from Azure Data Factory - azure-data-factory

How can I run a shell script from Azure Data Factory? Inside the shell script I am trying to execute an .hql file like below:
/usr/bin/hive -f "wasbs://hivetest@batstorpdnepetradev01.blob.core.windows.net/hivetest/extracttemp.hql" >
wasbs://hivetest@batstorpdnepetradev01.blob.core.windows.net/hivetest/extracttemp.csv
My .hql file is stored in Blob Storage, and I want to execute it, collect the result into a CSV file, and store that back in Blob Storage. This command lives in a shell script, which is also in Blob Storage. Now I want to execute it from Azure Data Factory using a Hive activity. Help will be appreciated.

You could use the Hadoop Hive activity in ADF. Please take a look at this doc. You can build your pipeline with the ADF V2 UI.
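For reference, here is a minimal sketch of what that Hive activity could look like when authored with the azure-mgmt-datafactory Python SDK rather than the V2 UI. The subscription, resource group, factory name, and both linked service names are hypothetical placeholders, and the exact model signatures vary slightly between SDK versions:

# Hedged sketch: HDInsight Hive activity whose .hql script lives in Blob Storage.
# All names in angle brackets and both linked service names are assumptions.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    HDInsightHiveActivity, LinkedServiceReference, PipelineResource,
)

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

hive_activity = HDInsightHiveActivity(
    name="RunExtractTemp",
    # HDInsight (or on-demand HDInsight) cluster linked service
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="HDInsightLinkedService"),
    # <container>/<path> of the .hql file from the question
    script_path="hivetest/hivetest/extracttemp.hql",
    # Blob Storage linked service that holds the script
    script_linked_service=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="AzureBlobStorageLinkedService"),
)

adf.pipelines.create_or_update(
    "<resource-group>", "<data-factory-name>", "RunHiveScriptPipeline",
    PipelineResource(activities=[hive_activity]),
)

The Hive script itself can write its result set with an INSERT OVERWRITE DIRECTORY targeting the wasbs:// location, which removes the need for the shell redirect entirely.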

Related

Load SQL script in PySpark notebook

In Azure Synapse Analytics, I want to keep my SQL queries separately from my PySpark notebook.
So I have created some SQL scripts. And I would like to use them in my PySpark notebook.
Is it possible?
And what is the Python code to load a SQL script into a variable?
As I understand it, the ask here is whether we can read SQL scripts that have already been created from a PySpark notebook. I looked at the storage account mapped to my Azure Synapse Analytics studio (ASA) and I do not see the notebooks or SQL scripts stored there, so I don't think you can reference the existing SQL scripts directly from within ASA. However, if you export the SQL scripts and upload them to storage, you can then read the scripts from the notebook.
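As a minimal sketch of that last step, assuming the exported .sql files were uploaded to the workspace's linked ADLS Gen2 account (the path below is hypothetical), a Synapse PySpark notebook cell could look like this:

# Read the whole .sql file into one string (one row per file with wholetext=True)
sql_path = "abfss://scripts@<storageaccount>.dfs.core.windows.net/sql/my_query.sql"
query_text = spark.read.text(sql_path, wholetext=True).first().value

# Execute it; this assumes the file contains a single SQL statement
df = spark.sql(query_text)
df.show()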

Read .sql files with Azure Automation runbook

I am trying to connect my PowerShell runbook to a storage account (blob) to read .sql files and execute them in my Azure SQL Database. I need to:
Connect to a blob container
Read the script from the .sql file
Execute the script on the db
When I try Invoke-Sqlcmd, it requires dedicated storage to hold the file. However, runbooks run serverless, and as far as I know there is no storage I can use for the files.
Is there a way to just read the files (without moving them around) from a PowerShell runbook, or is there somewhere I can store the files so it can read them?
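One possible route, sketched here only as an illustration: Azure Automation also supports Python runbooks, where the blob content can be downloaded straight into memory and executed without any local copy. The storage connection string, container, blob, and SQL connection string below are all hypothetical:

import pyodbc
from azure.storage.blob import BlobClient

# Read the .sql text directly into memory - no temp file needed
blob = BlobClient.from_connection_string(
    conn_str="<storage-connection-string>",
    container_name="scripts",
    blob_name="deploy.sql",
)
sql_text = blob.download_blob().readall().decode("utf-8")

# Execute it against the Azure SQL database (assumes a single batch, no GO separators)
conn = pyodbc.connect("<azure-sql-odbc-connection-string>", autocommit=True)
conn.cursor().execute(sql_text)

The equivalent PowerShell pattern is the same idea: read the blob content into a string and pass it to Invoke-Sqlcmd via -Query instead of -InputFile, so nothing ever has to be written to disk.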

Best practice for importing bulk data to AWS RDS PostgreSQL database

I have a big AWS RDS database that needs to be updated with data on a periodic basis. The data is in JSON files stored in S3 buckets.
This is my current flow:
Download all the JSON files locally
Run a ruby script to parse the JSON files to generate a CSV file matching the table in the database
Connect to RDS using psql
Use \copy command to append the data to the table
I would like to switch this to an automated approach (maybe using an AWS Lambda). What would be the best practices?
Approach 1:
Run a script (Ruby/JS) that parses all folders from the past period (e.g., a week) and, while parsing each file, connects to the RDS database and executes INSERT commands. I feel this would be a very slow process with constant writes to the database and wouldn't be optimal.
Approach 2:
I already have a Ruby script that parses local files to generate a single CSV. I can modify it to parse the S3 folders directly and create a temporary CSV file in S3. The question is: how do I then use this temporary file to do a bulk import? (See the sketch below.)
Are there any other approaches that I have missed and might be better suited for my requirement?
Thanks.
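For what it's worth, a hedged sketch of the bulk-import step in Approach 2 (for example inside a Lambda or a small worker): stream the generated CSV straight from S3 into PostgreSQL COPY, so nothing touches local disk. Bucket, key, table, and connection details are hypothetical:

import boto3
import psycopg2

# The generated CSV sitting in S3 (names are placeholders)
s3 = boto3.client("s3")
csv_body = s3.get_object(Bucket="my-bucket", Key="exports/latest.csv")["Body"]

conn = psycopg2.connect(host="<rds-endpoint>", dbname="mydb",
                        user="myuser", password="<password>")
with conn, conn.cursor() as cur:
    # COPY ... FROM STDIN reads the streaming S3 body directly, like \copy does locally
    cur.copy_expert(
        "COPY my_table FROM STDIN WITH (FORMAT csv, HEADER true)", csv_body)

RDS for PostgreSQL also ships the aws_s3 extension (aws_s3.table_import_from_s3), which lets the database pull the file from S3 server-side, so the Lambda only has to issue a single SQL call.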

How do you run many SQL commands against an Azure SQL database using an Azure Automation PowerShell runbook

I'm using Azure Automation to move an Azure SQL database from one resource to another (from Prod to Dev, for example). After the database is copied, I would then like to run a SQL script that adds some users and permissions. This means I need to run a handful of commands like "CREATE USER ..." and "ALTER ROLE ...". Most examples I've found use PowerShell to execute a single SQL command, but using that code to run many commands seems like it would result in an excessively long PowerShell script. In the on-prem world, I probably would have a .sql file that gets executed. Any suggestions on how to achieve this easily using PowerShell in Azure Automation? Thanks!
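No canonical answer here, but one illustrative sketch, assuming the user/permission commands are kept in a single .sql file: split the script on GO batch separators and run each batch in turn. Azure Automation also supports Python runbooks, so the sketch uses pyodbc; the file name and connection string are hypothetical, and the script text could just as easily come from a blob download, as in the previous question:

import re
import pyodbc

with open("post_copy_permissions.sql") as f:
    script = f.read()

# T-SQL batches are separated by GO on its own line
batches = [b.strip() for b in
           re.split(r"^\s*GO\s*$", script, flags=re.MULTILINE | re.IGNORECASE)
           if b.strip()]

conn = pyodbc.connect("<azure-sql-odbc-connection-string>", autocommit=True)
cursor = conn.cursor()
for batch in batches:
    cursor.execute(batch)   # e.g. CREATE USER ..., ALTER ROLE ...

The same split-and-loop idea works in PowerShell: read the file, split on GO, and call Invoke-Sqlcmd once per batch, which keeps the runbook short even when the script contains many commands.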

How to execute SQL scripts using Azure Databricks

I have one SQL script file.
In that file there are some SQL queries.
I want to upload it to DBFS, read it from Azure Databricks, and execute the queries from the script on Azure Databricks.
Databricks does not directly support the execution of .sql files. However, you can just read them into a string and execute them:
with open("/dbfs/path/to/query.sql") as queryFile:
    queryText = queryFile.read()
results = spark.sql(queryText)
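One caveat, since the question mentions a file with several queries: spark.sql runs a single statement at a time, so a multi-statement script needs to be split first. A naive sketch (it assumes there are no semicolons inside string literals):

with open("/dbfs/path/to/query.sql") as query_file:
    statements = [s.strip() for s in query_file.read().split(";") if s.strip()]

# One DataFrame per statement; typically only the last result is of interest
results = [spark.sql(s) for s in statements]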