Infer schema from a .csv file that appears in Azure Blob Storage and generate a DDL script automatically using Azure SQL - tsql

Every time a .csv file appears in blob storage, I have to create the DDL for it manually in Azure SQL. The data type of each column is based on the values present in that field.
The file has 400 columns, and doing this manually takes a lot of time.
Could someone please suggest how to automate this with a stored procedure or script, so that when we execute it, it creates the TABLE (or the DDL script) based on the file in blob storage?
I am not sure whether this is possible, or whether there is a better way to handle such a scenario.
I would appreciate your valuable suggestions.
Many thanks

This can be achieved in multiple ways. Since you mentioned automating it, you can use an Azure Function.
First, create a function that reads the csv file from blob storage:
Read a CSV Blob file in Azure
Then add the code to generate the DDL statement:
Uploading and Importing CSV File to SQL Server
The Azure Function can be scheduled, or triggered whenever new files are added to blob storage.
If this is a once-a-day kind of requirement that can also be done manually, you can download the file from blob storage and use the 'Import Flat File' functionality in SSMS, where you just point it at the csv file and it creates the schema based on the existing column values.
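If you would rather stay entirely in T-SQL, the following is a minimal sketch that reads the header row of the file straight from blob storage and scripts a CREATE TABLE with every column typed as NVARCHAR(255), to be tightened afterwards. The external data source name MyBlobStorage, the file name sample.csv and the table name dbo.StagingTable are placeholders; an external data source of TYPE = BLOB_STORAGE with a database scoped credential must already exist.

-- Sketch only: MyBlobStorage, sample.csv and dbo.StagingTable are placeholders.
DECLARE @raw NVARCHAR(MAX), @header NVARCHAR(MAX), @ddl NVARCHAR(MAX);

-- Read the whole file from blob storage as a single value
SELECT @raw = BulkColumn
FROM OPENROWSET(BULK 'sample.csv', DATA_SOURCE = 'MyBlobStorage', SINGLE_CLOB) AS f;

-- Take the first line (the header row) and strip any trailing carriage return
SET @header = REPLACE(LEFT(@raw, CHARINDEX(CHAR(10), @raw + CHAR(10)) - 1), CHAR(13), '');

-- Script one NVARCHAR(255) column per header field, preserving column order
-- (STRING_SPLIT with enable_ordinal requires Azure SQL Database or SQL Server 2022+)
SELECT @ddl = 'CREATE TABLE dbo.StagingTable (' + CHAR(10)
            + STRING_AGG(CAST('    ' + QUOTENAME(TRIM(value)) + ' NVARCHAR(255)' AS NVARCHAR(MAX)),
                         ',' + CHAR(10)) WITHIN GROUP (ORDER BY ordinal)
            + CHAR(10) + ');'
FROM STRING_SPLIT(@header, ',', 1);

SELECT @ddl AS GeneratedDdl;   -- review the generated script (SELECT avoids PRINT truncation)
-- EXEC sp_executesql @ddl;    -- or create the table directly

The same script can be wrapped in a stored procedure and called from the Azure Function once a new file lands.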

Related

Move Entire Azure Data Lake Folders using Data Factory?

I'm currently using Azure Data Factory to load flat file data from our Gen 2 data lake into Synapse database tables. Unfortunately, we receive (many) thousands of files into timestamped folders for each feed. I'm currently using Synapse external tables to copy this data into standard heap tables.
Since each folder contains so many files, I'd like to move (or Copy/Delete) the entire folder (after processing) somewhere else in the lake. Is there some practical way to do that with Azure Data Factory?
Yes, you can use a Copy activity with a wildcard path. I reproduced this in my environment and got the results below:
First, add the source dataset and select a wildcard path built from the folder name. In my scenario, the folder is named pool.
Then select the sink dataset with the target file path.
The pipeline run succeeded and transferred the files from one location to the other with the required names.

Issue while updating copy activity in ADF

I want to update a source Excel column with a particular string.
My source contains n columns. I need to check whether the string apple exists in any of the columns. If the value exists in any column, I need to replace apple with orange and output the Excel file. How can I do this in ADF?
Note: I cannot use data flows since we are using a self-hosted VM.
Excel files have a lot of limitations in ADF; for example, Excel is not supported as a sink in the Copy activity or in Data flows.
You can raise a feature request for that in ADF.
So, do the operation above with a csv and copy the result to a csv in blob storage, which you can later convert to Excel on your local machine.
For operations like this, a Data flow is a better option than ordinary activities, since Data flows are designed for transformations.
However, Data flows do not support a self-hosted linked service.
So, as a workaround, first copy the Excel file to blob storage as a csv using a Copy activity, and create a Blob linked service for it to use in the Data flow.
Then follow the process below in the Data flow.
Source - csv from Blob:
Derived column transformation:
Give the condition for each column, e.g. case(col1=="apple", "orange", col1)
Sink:
In the Sink settings, specify Output to single file.
After the pipeline runs, a csv is generated in blob storage, which you can convert to Excel on your local machine.

Working with CSVs in SSIS without SQL Database connection

So I am tasked with performing some calculations in SSIS on a CSV file using T-SQL. Once complete, I need to save the project as an ISPAC file so other users outside my business can run it too; however, their SQL connection managers will obviously differ from mine. To that end, is it possible to use T-SQL to modify a CSV inside SSIS without actually using a connection manager to a SQL database? If so, what tasks would I use to achieve this? Many thanks.
EDIT: To clarify what I need to achieve: basically I am tasked with adding additional data to a CSV file, for example concatenating forename and surname, creating new columns such as month numbers and month names from a date string, and then outputting the result to a new CSV location. I am only allowed to use T-SQL and SSIS to achieve this. Once complete, I need to send the SSIS project file to another organisation to use, so I cannot hardcode any of my connections or assume they hold any of the databases I will use.
My initial question was whether all of this can be done inside SSIS using T-SQL, but I think I have the answer to that now. I plan to parameterise the connection string, source file location and output file, and to use the master and temp databases to run the SQL that adds the additional data, so they will need SQL Server on their end, but the package won't make any assumptions about what they are using.
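For illustration only, a minimal sketch of the kind of tempdb-resident T-SQL that plan describes (for example inside an Execute SQL Task) might look like the following; the #staging table, its columns and the dd/mm/yyyy date format are all assumptions.

-- Sketch only: #staging, its columns and the dd/mm/yyyy date format are assumptions.
CREATE TABLE #staging (
    Forename  NVARCHAR(50),
    Surname   NVARCHAR(50),
    OrderDate NVARCHAR(10)   -- the date arrives from the CSV as a string
);

-- (rows would be loaded into #staging by the SSIS data flow)

SELECT
    Forename,
    Surname,
    CONCAT(Forename, ' ', Surname)                      AS FullName,
    MONTH(TRY_CONVERT(date, OrderDate, 103))            AS MonthNumber,  -- 103 = dd/mm/yyyy
    DATENAME(MONTH, TRY_CONVERT(date, OrderDate, 103))  AS MonthName
FROM #staging;

DROP TABLE #staging;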

Handling errors in Azure Pipeline from csv file in blob to Azure SQL

I am making a pipeline in Data Factory that takes a csv file in blob storage and loads it into Azure SQL.
Some lines in the csv file will have a delimiter missing or an extra one, so for those particular rows the number of columns in the csv file and in the Azure SQL table will not correspond.
I would like the process to keep running, store the faulty rows in an error table, and let the rest of the load finish.
Please help.
Cheers.
This is being discussed at https://social.msdn.microsoft.com/Forums/azure/en-US/bd77151f-09e9-4ce6-9b7c-583fa5f6b583/store-non-corresponding-rows-rather-than-throwing-errors?forum=AzureDataFactory - please refer to that thread.

Check for the existence of an Azure Blob using TSQL

I currently have a TSQL function called "FileExists" that checks for the existence of a file on disk. However, we are moving the database to Azure Db and the files to Azure Blob storage, so this function needs to be rewritten (if possible). How can I check the Blob Storage container for a particular SubBlob and FileName combination using TSQL?
Of course, you cannot run a T-SQL query directly against Azure Blob Storage. A possible workaround is to use xp_cmdshell to run a PowerShell script that calls Get-AzureStorageBlob to access the blob and get its data... but it is much easier to do the whole task in .NET code rather than in SQL.
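For reference, a minimal sketch of that xp_cmdshell + PowerShell workaround is shown below. It only works where xp_cmdshell can be enabled (a full SQL Server instance, not Azure SQL Database itself), it assumes the classic Azure storage cmdlets are installed on the host (the newer Az module equivalents are New-AzStorageContext / Get-AzStorageBlob), and the account name, key, container and blob path are placeholders.

-- Sketch only: requires xp_cmdshell and the Azure PowerShell storage cmdlets on the
-- SQL Server host; replace the account name, key, container and blob placeholders.
DECLARE @cmd NVARCHAR(4000) = N'powershell.exe -NoProfile -Command "'
    + N'$ctx = New-AzureStorageContext -StorageAccountName ''mystorageacct'' -StorageAccountKey ''YourStorageKey''; '
    + N'$b = Get-AzureStorageBlob -Container ''mycontainer'' -Blob ''SubBlob/MyFile.csv'' -Context $ctx -ErrorAction SilentlyContinue; '
    + N'if ($b) { Write-Output 1 } else { Write-Output 0 }"';

CREATE TABLE #out (line NVARCHAR(255));
INSERT INTO #out EXEC master..xp_cmdshell @cmd;

-- 1 = blob exists, 0 = not found
SELECT CASE WHEN EXISTS (SELECT 1 FROM #out WHERE line = '1') THEN 1 ELSE 0 END AS BlobExists;
DROP TABLE #out;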