How to dynamically get all JSON files' data into a table (SQL Server data warehouse) using Azure Data Factory (load from ADF to DWH)

I have to get data from all JSON files into a table in a SQL Server data warehouse using Azure Data Factory. I am able to load the data into a table with static values (by giving the column names in the dataset), but I am unable to do this dynamically in Azure Data Factory. Can someone help with a solution for loading this dynamically?
Many thanks in advance.
The JSON file data is as follows:
{
    "TABLE": "TEST_M1",
    "DATA": [{
        "DFG": "123456",
        "ADF": "SFSDF"
    }, {
        "DFG": "ABADHDD",
        "ADF": "GHB"
    }]
}
The same format is used for different TABLE names (TEST_M2, ...).

You could invoke a stored procedure in the SQL Server sink when doing the copy.
The stored procedure defines the logic for generating the dynamic values based on the source JSON data. See an example here: https://learn.microsoft.com/en-us/azure/data-factory/connector-sql-server#invoking-stored-procedure-for-sql-sink
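For example, a minimal sketch of such a stored procedure, assuming the copy activity maps the JSON's TABLE field plus the DATA fields (DFG, ADF) onto a user-defined table type; all object names here are illustrative, and this table-type pattern applies to a SQL Server / Azure SQL Database sink as in the linked doc:

CREATE TYPE dbo.JsonRows AS TABLE
(
    TargetTable NVARCHAR(128),
    DFG         NVARCHAR(200),
    ADF         NVARCHAR(200)
);
GO

CREATE PROCEDURE dbo.usp_LoadJsonRows
    @rows dbo.JsonRows READONLY  -- rows delivered by the copy activity sink
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @tbl SYSNAME, @sql NVARCHAR(MAX);

    -- Route each group of rows to the table named in the JSON payload
    DECLARE tbl_cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT DISTINCT TargetTable FROM @rows;
    OPEN tbl_cur;
    FETCH NEXT FROM tbl_cur INTO @tbl;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @sql = N'INSERT INTO ' + QUOTENAME(@tbl) + N' (DFG, ADF) '
                 + N'SELECT DFG, ADF FROM @r WHERE TargetTable = @t;';
        EXEC sp_executesql @sql,
             N'@r dbo.JsonRows READONLY, @t SYSNAME',
             @r = @rows, @t = @tbl;
        FETCH NEXT FROM tbl_cur INTO @tbl;
    END
    CLOSE tbl_cur;
    DEALLOCATE tbl_cur;
END
GO

The copy activity sink would then reference dbo.usp_LoadJsonRows as the stored procedure name and dbo.JsonRows as the table type, with the JSON fields mapped to the type's columns.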

Related

How to transform data type in Azure Data Factory

I would like to copy the data from a local CSV file to SQL Server in Azure Data Factory. The table in SQL Server is already created. The local CSV file is exported from MySQL.
When I use Copy Data in Azure Data Factory, there is an error: "Exception occurred when converting value 'NULL' for column name 'deleted' from type 'String' to type 'DateTime'. The string was not recognized as a valid DateTime."
What I have done:
I checked that the original value in the column 'deleted' is NULL, without quotes (i.e. not the string 'NULL').
I cannot change the data type in the file format settings. The data type for all columns is preset to string by default.
I tried to create a data flow instead of using Copy Data. I can change the data type in the source projection, but I cannot select SQL Server as the sink dataset.
What can I do to copy data from the CSV file to SQL Server via Azure Data Factory?
Data Flow doesn't support on-premises SQL Server, so we can't create the source and sink there.
You can use the Copy activity or the Copy Data tool to do this. I made some example data in which 'deleted' is NULL:
As you said, the 'deleted' column is NULL or contains NULL, and everything in the CSV will be treated as a string. The key is whether your sink SQL Server table schema allows NULL.
I tested many times and it all works well.
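In other words, what matters is that the sink table declares the 'deleted' column as a nullable DATETIME; a sketch of such a schema (the table and other column names are just placeholders):

CREATE TABLE dbo.my_table
(
    id      INT,
    name    NVARCHAR(100),
    deleted DATETIME NULL   -- must allow NULL so the NULL values from the CSV can be loaded
);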

Can't use Data Explorer as a sink in Data Flow

I'm trying to do a Data Flow using ADL1 as the source and Data Explorer as the sink; I can create the source but when I select Dataset for Sink Type the only available options in the Dataset pulldown are my ADL1 Datasets. If I use Data Copy instead I can choose Data Explorer as a sink but this won't work as Data Copy won't allow null values into Data Explorer number data types. Any insight on how to fix this?
I figured out a workaround. First I Data Copy the csv file into a staging table where all columns are strings. Then I Data Copy from staging table to production table using a KQL query that converts strings to their destination data types.

How to load files from blob to sql dw by using azure data factory?

I have tried many ways to load data from Azure Blob storage to Azure SQL Synapse.
My requirement is:
(Input) Blob storage ---> Azure SQL Synapse (Output)
emp_dummy.csv----> emp_dummy table
dept_dummy.csv -----> dept_dummy table
sales_dummy.csv-----> sales_dummy table and so on
...
We have files starting with different names but the format is .csv only.
I have been trying this in various ways using the Get Metadata activity or the Lookup activity.
When I tried with the below pipeline, I got this error:
ADF pipeline screenshot: https://i.stack.imgur.com/RynIb.png
Error:
{
"errorCode": "2200",
"message": "ErrorCode=UserErrorMissingPropertyInPayload,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Required property 'fileName' is missing in payload.,Source=Microsoft.DataTransfer.ClientLibrary,'",
"failureType": "UserError",
"target": "Copy data1",
"details": []
}
I hope I have mentioned all the details; if you need more, let me know.
I figured it out.
Here are my example steps: load two CSV files to ADW, and auto-create tables with the same names as the CSV filenames.
Csv files in blob storage:
Get all the filenames in the blob container 'backup':
Foreach item settings:
@activity('Get Metadata2').output.childItems
Copy activity in the Foreach:
In the Copy activity, use another blob source dataset and add a parameter to choose the file:
Source settings:
Sink dataset(ADW):
Sink settings:
table name expression: @split(item().name, '.')[0]
Note: Get Metadata returns the full file name, like 'test.csv'; when we set the table name, we need to split it and use 'test' as the table name.
Execute pipeline:
Check data in ADW:
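For example, with the table names from the question (emp_dummy, dept_dummy), a quick check in ADW could be:

SELECT TOP 10 * FROM dbo.emp_dummy;
SELECT TOP 10 * FROM dbo.dept_dummy;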
Hope this helps.
I did a Google search for you. I found several really bad tutorials out there. The two links below look pretty darn on point.
https://intellipaat.com/blog/azure-data-factory-tutorial/
https://medium.com/@adilsonbna/using-azure-data-lake-to-copy-data-from-csv-file-to-a-sql-database-712c243db658
Remember, when you're copying data from file stores by using Azure Data Factory, you can now configure wildcard file filters to let Copy Activity pick up only files that have the defined naming pattern—for example, "*.csv" or "???20180504.json".
For reference, look at the image below.
If you wanted to iterate through all the files, in different folders, in a Blob environment, instead of setting the File or Folder to this:
adfv2/SalesJan2009.csv
You can set the File or Folder to this:
adfv2/Sales*2009.csv
That will merge all of the Sales data from 2009 into a single dataset, which you can then load into SQL Server (Data Warehouse, Synapse, etc.).

How to model nested JSON data on Redshift to query a specific nested property

I have the following JSON file structure on S3:
{
    "userId": "1234",
    "levelA": {
        "LevelB": [
            {
                "bssid": "University",
                "timestamp": "153301355685"
            },
            {
                "bssid": "Mall",
                "timestamp": "153301355688"
            }
        ]
    }
}
Now one of our future queries would be:
Return the total of users who saw bssid=University
So in my case it will return 1 (because userId=1234 contains that bssid's value)
Is Redshift the right solution for me for this type of query? In the case that it is, how can I model it?
The easiest way to model this would be to create a table with one row for each combination of userId and bssid:
userId, bssid, timestamp
1234,University,153301355685
1234,Mall,153301355688
The difficult part would be converting your JSON (contained in multiple files) into a suitable format for Redshift.
While Amazon Redshift can import data in JSON format, it would not handle the one-to-many relationship within your nested data.
Amazon Redshift also has a JSON_EXTRACT_PATH_TEXT Function that can extract data from a JSON string, but again it wouldn't handle the one-to-many relationship in your data.
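For instance, assuming the raw documents were loaded into a column raw_json of a table raw_events (hypothetical names), the function only returns the nested array as a single string rather than one row per bssid:

-- Returns the whole LevelB array as one JSON string, not separate rows
SELECT JSON_EXTRACT_PATH_TEXT(raw_json, 'levelA', 'LevelB')
FROM raw_events;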
I would recommend transforming your data into the above format prior to loading into Redshift. This would need to be done with an external script or ETL tool.
If you are frequently generating such files, a suitable method would be to trigger an AWS Lambda function whenever one of these files is stored in the Amazon S3 bucket. The Lambda function would then parse the file and output the CSV format, ready for loading into Redshift.
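A sketch of the Redshift side, assuming the Lambda (or other ETL step) writes the flattened CSV files to an S3 prefix; the bucket, prefix, and IAM role below are placeholders:

CREATE TABLE user_bssid
(
    userid VARCHAR(32),
    bssid  VARCHAR(128),
    ts     BIGINT          -- the original "timestamp" value
);

-- Load the flattened CSV output produced by the Lambda/ETL step
COPY user_bssid
FROM 's3://my-bucket/flattened/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
CSV;

-- The question's query: total users who saw bssid = 'University'
SELECT COUNT(DISTINCT userid)
FROM user_bssid
WHERE bssid = 'University';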

ADF - C# Custom Activity

I have a csv file as input which I have stored in Azure Blob Storage. I want to read data from csv file, perform some transformations on it and then store data in Azure SQL Database. I am trying to use a C# custom activity in Azure Data Factory having blob as input and sql table as output dataset. I am following this tutorial (https://azure.microsoft.com/en-us/documentation/articles/data-factory-use-custom-activities/#see-also) but it has both input and output as blobs. Can I get some sample code for sql database as output as I am unable to figure out how to do it. Thanks
You just need to fetch the connection string of your Azure SQL database from the linked service, and then you can talk to the database. Try this sample code:
// 'datasets' and 'linkedServices' are the collections passed into the custom activity's Execute method
AzureSqlDatabaseLinkedService sqlInputLinkedService;
AzureSqlTableDataset sqlInputLocation;

// Look up the Azure SQL dataset and the linked service it points to
Dataset sqlInputDataset = datasets.Where(dataset => dataset.Name == "<Dataset Name>").First();
sqlInputLocation = sqlInputDataset.Properties.TypeProperties as AzureSqlTableDataset;
sqlInputLinkedService = linkedServices
    .Where(linkedService => linkedService.Name == sqlInputDataset.Properties.LinkedServiceName)
    .First()
    .Properties.TypeProperties as AzureSqlDatabaseLinkedService;

// Open a connection with the linked service's connection string and write your transformed rows
SqlConnection connection = new SqlConnection(sqlInputLinkedService.ConnectionString);
connection.Open();