Check for the existence of an Azure Blob using T-SQL

I currently have a T-SQL function called "FileExists" that checks for the existence of a file on disk. However, we are moving the database to Azure SQL Database and the files to Azure Blob Storage, so this function needs to be rewritten (if possible). How can I check the Blob Storage container for a particular SubBlob and FileName combination using T-SQL?

Of course, you cannot execute a T-SQL query directly against Azure Blob Storage. A possible workaround is to use xp_cmdshell to run a PowerShell script that calls Get-AzureStorageBlob to access the blob and get its data... but it is much easier to do the whole task in .NET code, not in SQL.
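A rough sketch of that workaround, assuming xp_cmdshell is available and the classic Azure PowerShell storage cmdlets (New-AzureStorageContext, Get-AzureStorageBlob) are installed on the SQL Server host; the storage account, key, container, and blob path are all placeholders:

-- Sketch only: xp_cmdshell must be enabled and the Azure PowerShell storage
-- cmdlets installed on the host. All names and the key below are placeholders.
DECLARE @cmd NVARCHAR(4000) = N'powershell.exe -NoProfile -Command "'
    + N'$ctx = New-AzureStorageContext -StorageAccountName ''mystorageacct'' -StorageAccountKey ''<account-key>''; '
    + N'if (Get-AzureStorageBlob -Container ''mycontainer'' -Blob ''SubBlob/FileName.txt'' -Context $ctx -ErrorAction SilentlyContinue) '
    + N'{ Write-Output 1 } else { Write-Output 0 }"';

CREATE TABLE #blob_check (output_line NVARCHAR(255) NULL);
INSERT INTO #blob_check EXEC xp_cmdshell @cmd;

-- Returns 1 if the blob exists, 0 otherwise.
SELECT CASE WHEN EXISTS (SELECT 1 FROM #blob_check WHERE output_line = N'1')
            THEN 1 ELSE 0 END AS FileExists;

DROP TABLE #blob_check;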

Related

ADF Mapping Dataflow Temp Table issue inside SP call

I have a mapping dataflow inside a ForEach activity which I'm using to copy several tables into ADLS; in the dataflow's source activity, I call a stored procedure from my Synapse environment. In the SP, I create a small temp table to store some values which I later use when processing a query.
When I run the pipeline, I get an error on the mapping dataflow: "SQLServerException: 111212: Operation cannot be performed within a transaction." If I remove the temp table and just do a simple select * from a small table, it returns the data fine; it's only after I bring back the temp table that I get the issue.
Have you guys ever seen this before, and is there a way around this?
If you go through the official MS docs, this error is very well documented.
Failed with an error: "SQLServerException: 111212; Operation cannot be performed within a transaction."
Symptoms
When you use Azure SQL Database as a sink in the data flow to preview data, debug/trigger a run and do other activities, you may find your job fails with the following error message:
{"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: at Sink 'sink': shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException: 111212;Operation cannot be performed within a transaction.","Details":"at Sink 'sink': shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException: 111212;Operation cannot be performed within a transaction."}
Cause
The error "111212;Operation cannot be performed within a transaction." only occurs in the Synapse dedicated SQL pool, but you mistakenly use Azure SQL Database as the connector instead.
Recommendation
Confirm whether your SQL database is a Synapse dedicated SQL pool. If so, use Azure Synapse Analytics as the connector instead.
So after running some tests around this issue, it seems like Mapping Dataflows do not like temp tables when I call my stored procedure.
The way I ended up fixing this was to use a CTE instead of a temp table, which, believe it or not, runs a bit faster than the temp table version did.
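For anyone who hits the same thing, a minimal, made-up sketch of the kind of change I mean is below; the procedure, tables, and columns are invented, but the idea is simply that the staging values move from a temp table into a CTE so the source query runs as a single SELECT:

-- Hypothetical illustration only; object names are invented.
CREATE PROCEDURE dbo.GetExportRows
    @RunDate DATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Before (fails when called from the dataflow source):
    --   CREATE TABLE #params (CutoffDate DATE);
    --   INSERT INTO #params SELECT CutoffDate FROM dbo.ExportConfig WHERE RunDate = @RunDate;
    --   SELECT s.* FROM dbo.SourceTable s JOIN #params p ON s.LoadDate >= p.CutoffDate;

    -- After: the staging values live in a CTE instead of a temp table.
    WITH params AS
    (
        SELECT CutoffDate
        FROM dbo.ExportConfig      -- hypothetical config table
        WHERE RunDate = @RunDate
    )
    SELECT s.*
    FROM dbo.SourceTable AS s
    JOIN params AS p
        ON s.LoadDate >= p.CutoffDate;
END;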
@KarthikBhyresh I looked at that article before, but it wasn't an issue with the sink: I was using a Synapse linked service as my source and Data Lake Storage as my sink, so I knew from the beginning that this did not apply to my issue, even though it was the same error number.

Infer schema from a .csv file appearing in Azure Blob and generate a DDL script automatically using Azure SQL

Every time a .csv file appears in blob storage, I have to create the DDL for it manually in Azure SQL. The data types are based on the values in each field.
The file has 400 columns, and doing this manually takes a lot of time.
Could someone please suggest how to automate this using an SP or script, so that when we execute the script it creates the TABLE or the DDL script based on the file in blob storage?
I am not sure if this is possible, or whether there is a better way to handle such a scenario.
I appreciate your valuable suggestions.
Many thanks
This can be achieved in multiple ways. Since you mentioned automating it, you can use an Azure Function as well.
First, create a function that reads the csv file from blob storage:
Read a CSV Blob file in Azure
Then add the code to generate the DDL statement:
Uploading and Importing CSV File to SQL Server
The Azure Function can be scheduled, or run whenever new files are added to blob storage.
If this is a once-a-day kind of requirement and can also be done manually, you can download the file from blob storage and use the 'Import Flat File' functionality available within SSMS, where you just specify the csv file and it creates the schema based on the existing column values.
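If you would rather keep everything in T-SQL, a rough sketch of the scripted route is below. It assumes an external data source (here called csv_blob_source) already points at the container, treats the first line of the file as the header, and generates every column as NVARCHAR(MAX); the file and table names are placeholders and the types would still need tightening afterwards:

DECLARE @raw NVARCHAR(MAX), @header NVARCHAR(MAX), @ddl NVARCHAR(MAX);

-- Read the whole blob as one value; SINGLE_CLOB assumes an ANSI-encoded file.
SELECT @raw = BulkColumn
FROM OPENROWSET(BULK 'incoming/wide_file.csv',
                DATA_SOURCE = 'csv_blob_source',   -- assumed external data source
                SINGLE_CLOB) AS src;

-- The first line of the file holds the column names.
SET @header = REPLACE(LEFT(@raw, CHARINDEX(CHAR(10), @raw + CHAR(10)) - 1), CHAR(13), '');

-- Build a CREATE TABLE statement with one NVARCHAR(MAX) column per header field.
SELECT @ddl = 'CREATE TABLE dbo.wide_file (' + CHAR(10)
            + STRING_AGG(CAST('    ' + QUOTENAME(LTRIM(RTRIM(value))) + ' NVARCHAR(MAX)' AS NVARCHAR(MAX)),
                         ',' + CHAR(10)) WITHIN GROUP (ORDER BY ordinal)
            + CHAR(10) + ');'
FROM STRING_SPLIT(@header, ',', 1);   -- ordinal argument (Azure SQL) keeps the column order

PRINT @ddl;                       -- review the generated script first
-- EXEC sys.sp_executesql @ddl;   -- then create the table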

Azure Data Factory V2 - Calling a stored procedure that returns multiple result sets

I want to create an ADF v2 pipeline that calls a stored procedure in Azure SQL Database. The stored procedure has input parameters and returns multiple result sets (around 3). We need to extract them and load them into Blob storage as 4 different files, or load them into tables.
Is there a way to perform this in a pipeline?
In SSIS there is an option to use a Script Component to extract them: https://www.timmitchell.net/post/2015/04/27/the-ssis-object-variable-and-multiple-result-sets/
Looking for suggestions in Data factory.
You cannot easily accomplish that in Azure Data Factory (ADF), as the Stored Procedure activity does not support resultsets at all and the Copy activity does not support multiple resultsets. However, with a few small changes you could get the same outcome. You have a couple of options:
1. If the code and SSIS package already exist and you want to minimise your refactoring, you could host it in ADF via the SSIS-IR.
2. You could possibly accomplish this with an Azure Function, which is roughly equivalent to an SSIS Script Task, but that seems like a bit of a waste of time to me: it's an unproven pattern and you have simpler options, such as:
3. Break the stored proc into parts: have it process its data and not return any resultsets, altering the proc to place the three resultsets in tables instead. Then have multiple Copy activities, running in parallel, copy the data to blob store after the main Stored Proc activity has finished, something like the sketch below:
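The following is only a hypothetical illustration of option 3, not your actual proc; the procedure, staging tables, and columns are invented. Each former resultset now lands in its own staging table, and the pipeline runs one Stored Procedure activity followed by three parallel Copy activities that read the stage tables into blob storage:

-- Hypothetical example: each of the three former resultsets is written to a
-- staging table that a Copy activity can pick up afterwards.
CREATE PROCEDURE dbo.ProcessOrders
    @RunId INT
AS
BEGIN
    SET NOCOUNT ON;

    TRUNCATE TABLE stage.OrderHeaders;
    INSERT INTO stage.OrderHeaders (OrderId, CustomerId, OrderDate)
    SELECT OrderId, CustomerId, OrderDate
    FROM dbo.Orders
    WHERE RunId = @RunId;

    TRUNCATE TABLE stage.OrderLines;
    INSERT INTO stage.OrderLines (OrderId, LineNumber, Amount)
    SELECT OrderId, LineNumber, Amount
    FROM dbo.OrderDetails
    WHERE RunId = @RunId;

    TRUNCATE TABLE stage.OrderAudit;
    INSERT INTO stage.OrderAudit (OrderId, ProcessedAtUtc)
    SELECT OrderId, SYSUTCDATETIME()
    FROM dbo.Orders
    WHERE RunId = @RunId;

    -- No SELECT statements at the end: the proc returns no resultsets to ADF.
END;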
It's also possible to trick the Lookup activity to run stored procedures for you but the outputs are limited to 5,000 rows and it's not like you can pipe it into a Copy activity afterwards. I would recommend option 3 which will get the same outcome with only a few changes to your proc.

Ways to import data into AzureSQL PaaS from Azure Blob Storage

All,
I have to BULK INSERT data into Azure SQL from an Azure Blob Storage account. I know one way is to use SAS keys, but are there more secure ways to load data from T-SQL?
For example, is there a way to use the user's AAD account to connect to the storage? Would Managed Identity work? I have not come across an example on the Internet that uses anything other than SAS keys.
Gopi
Azure Data Factory generally serves this purpose. You can build a pipeline that grabs data from blob storage and massages it / loads it into SQL, which is kind of what it's designed for. However, if you do not wish to use that, the recommended way is SAS, because it can be temporary and revoked at any time. Why do you think SAS is less secure?
As per the documentation (https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-ver15#credential--credential_name), if you create an external data source of type BLOB_STORAGE, the identity/credential MUST be a SAS, as it doesn't support any other authentication type. As such, you cannot use any other auth method to reach blob storage using T-SQL.
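For reference, a minimal sketch of that SAS-based pattern; the storage account, container, file, and target table are placeholders, and the SAS token itself is elided:

-- A database master key must exist before the credential can be created:
-- CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

CREATE DATABASE SCOPED CREDENTIAL BlobSasCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<sas-token-without-the-leading-question-mark>';

CREATE EXTERNAL DATA SOURCE BlobImportSource
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://mystorageacct.blob.core.windows.net/imports',
    CREDENTIAL = BlobSasCredential
);

BULK INSERT dbo.TargetTable
FROM 'daily/extract.csv'
WITH (
    DATA_SOURCE = 'BlobImportSource',
    FORMAT = 'CSV',
    FIRSTROW = 2
);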

Invoking remotely a script in Google Cloud

Initially I thought of this approach: creating a full web app that would allow uploading the files into the Data Store, then Data Flow would do the ETL by reading the files from the Data Store and putting the transformed data in Cloud SQL, and then have a different section that would allow passing the query to output the results in a CSV file into the Data Store.
However, I want to keep it as simple as possible, so my idea is to create a Perl script in the cloud that does the ETL, and another script to which you can pass a SQL query as an argument and which outputs the results as a CSV file into the Data Store. This script would be invoked remotely. The idea is to execute the script without having to install the whole stack on each client (Google SQL proxy, etc.), just by executing a local script with the arguments that would be passed to the remote script.
Can this be done? If so, how? And in addition to that, does this approach make sense?
Thanks!!