ADF - C# Custom Activity - azure-data-factory

I have a CSV file as input which I have stored in Azure Blob Storage. I want to read the data from the CSV file, perform some transformations on it, and then store the data in an Azure SQL database. I am trying to use a C# custom activity in Azure Data Factory with the blob as the input dataset and the SQL table as the output dataset. I am following this tutorial (https://azure.microsoft.com/en-us/documentation/articles/data-factory-use-custom-activities/#see-also), but it has both input and output as blobs. Can I get some sample code with a SQL database as output, as I am unable to figure out how to do it? Thanks

You just need to fetch the connection string of your Azure SQL database from the linked service, and then you can talk to the database. Try this sample code:
// Locate the SQL output dataset by name and read its Azure SQL table settings
// (e.g. the target table name).
Dataset sqlOutputDataset = datasets.Where(dataset => dataset.Name == "<Dataset Name>").First();
AzureSqlTableDataset sqlOutputLocation =
    sqlOutputDataset.Properties.TypeProperties as AzureSqlTableDataset;

// Resolve the linked service referenced by that dataset to get its connection string.
AzureSqlDatabaseLinkedService sqlOutputLinkedService = linkedServices
    .Where(linkedService => linkedService.Name == sqlOutputDataset.Properties.LinkedServiceName)
    .First()
    .Properties.TypeProperties as AzureSqlDatabaseLinkedService;

// Open a connection to the Azure SQL database.
SqlConnection connection = new SqlConnection(sqlOutputLinkedService.ConnectionString);
connection.Open();
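From there you can write the rows you read from the input blob into the target table. Below is a minimal sketch of the insert step, assuming you have already parsed the input blob's CSV lines into a csvLines collection; the table and column names are placeholders for your own schema (requires using System.Data.SqlClient):
foreach (string line in csvLines)
{
    string[] fields = line.Split(',');

    // Parameterised insert into the target table (placeholder table/column names).
    using (SqlCommand command = new SqlCommand(
        "INSERT INTO MyTable (Col1, Col2) VALUES (@col1, @col2)", connection))
    {
        command.Parameters.AddWithValue("@col1", fields[0]);
        command.Parameters.AddWithValue("@col2", fields[1]);
        command.ExecuteNonQuery();
    }
}
connection.Close();
For larger files, SqlBulkCopy over the same connection avoids one round trip per row, but the loop above keeps the example minimal.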

Related

Azure Data Factory schema mapping not working with SQL sink

I have a simple pipeline that loads data from a CSV file to an Azure SQL DB.
I have added a data flow where I have ensured the whole schema matches the SQL table. I have a specific field which contains numbers with leading zeros. The data type in the source projection is set to string. The field is mapped to the SQL sink, showing as a string data type. The field in SQL has the nvarchar(50) data type.
Once the pipeline is run, all the leading zeros are lost and the field appears to be treated as decimal:
Original data: 0012345
Inserted data: 12345.0
The CSV data shown in the data preview is correct; however, for some reason it loses its formatting during the insert.
Any ideas how I can get it to insert correctly?
I reproduced this in my lab and was able to load the data as expected. Please see the repro details below.
Source file (CSV file):
Sink table (SQL table):
ADF:
Connect the data flow source to the CSV source file. As my file is in text format, all the source columns in the projection are strings.
Source data preview:
Connect the sink to the Azure SQL database to load the data into the destination table.
Data in Azure SQL database table.
Note: You can also add a derived column before the sink to convert the value to a string, since the sink data type is a string.
Thank you very much for your response.
As per your post, the ADF data flow appears to be working correctly. I have finally discovered an issue with the transformation: I have an Azure Batch service which runs a Python script that does a basic transformation and saves the output to a CSV file.
Interestingly, when I preview the data in the data flow, it looks as expected. However, the values stored in SQL are not.
For the sake of others having a similar issue: my existing Python script converted a float column directly to string type. Upon conversion it retained one decimal place, and since all of my numbers are integers, they ended up with a trailing .0.
The solution was to convert the values to integer first and then to string:
df['col_name'] = df['col_name'].astype('Int64').astype('str')
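For anyone who wants to see the behaviour in isolation, here is a minimal pandas sketch (the column name and values are just for illustration, not the original script):
import pandas as pd

# Column that pandas has read as float
df = pd.DataFrame({'col_name': [12345.0, 678.0]})

print(df['col_name'].astype('str').tolist())                  # ['12345.0', '678.0'] - the unwanted .0
print(df['col_name'].astype('Int64').astype('str').tolist())  # ['12345', '678']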

How to transform data type in Azure Data Factory

I would like to copy data from a local CSV file to SQL Server in Azure Data Factory. The table in SQL Server is already created. The local CSV file is exported from MySQL.
When I use Copy Data in Azure Data Factory, there is an error: "Exception occurred when converting value 'NULL' for column name 'deleted' from type 'String' to type 'DateTime'. The string was not recognized as a valid DateTime."
What I have done:
I checked that the original value in the column 'deleted' is NULL, without quotes (i.e. not 'NULL').
I cannot change the data type in the file format settings. The data type for all columns is preset to string by default.
I tried to create a data flow instead of using Copy Data. I can change the data type in the source projection, but the sink dataset cannot select SQL Server.
What can I do to copy data from the CSV file to SQL Server via Azure Data Factory?
Data Flow doesn't support on-premises SQL Server, so we can't create the source and sink there.
You can use the Copy activity or the Copy Data tool to do that. I made some example data in which 'deleted' is NULL:
As you said, the 'deleted' column data is NULL, and all CSV values will be read as strings. The key is whether your sink SQL Server table schema allows NULL.
I tested this many times and it all works well.
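For reference, the sink table only needs the date column to be nullable, along these lines (a sketch with hypothetical table and column names besides 'deleted'; adjust to your own schema):
-- 'deleted' must allow NULL so the empty/NULL source values can be written
CREATE TABLE dbo.MyTable (
    id INT NOT NULL,
    name NVARCHAR(100) NULL,
    deleted DATETIME NULL
);
If the file literally contains the text NULL rather than an empty field, setting that text as the null value in the source dataset's format settings should also make the Copy activity write a real database NULL; I'm assuming the delimited-text dataset's null value option here.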

How to dynamically get all JSON files' data into a table (SQL Server data warehouse) using Azure Data Factory (load from ADF to DWH)

I have to get the data from all JSON files into a table, going from Azure Data Factory to a SQL Server data warehouse. I'm able to load the data into a table with static values (by giving the column names in the dataset), but I'm unable to get that working dynamically using Azure Data Factory. Can someone help with a solution to do this dynamically?
Many thanks in advance.
The JSON file data is as follows:
{
    "TABLE": "TEST_M1",
    "DATA": [{
        "DFG": "123456",
        "ADF": "SFSDF"
    }, {
        "DFG": "ABADHDD",
        "ADF": "GHB"
    }]
}
The same structure follows for different TABLE names (TEST_M2, ...).
You could invoke a stored procedure in the SQL Server sink when doing the copy.
The stored procedure defines the logic for generating dynamic values based on the source JSON data. See an example: https://learn.microsoft.com/en-us/azure/data-factory/connector-sql-server#invoking-stored-procedure-for-sql-sink
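In rough outline, the pattern described in that doc looks like the sketch below (the table type, procedure, and parameter names are made up here, and the columns come from the sample JSON above):
-- Table type matching the rows mapped from the JSON file
CREATE TYPE [dbo].[TestM1Type] AS TABLE (
    DFG NVARCHAR(50),
    ADF NVARCHAR(50)
);
GO

-- Stored procedure invoked by the copy activity's SQL sink;
-- any dynamic logic lives in here.
CREATE PROCEDURE [dbo].[spInsertTestM1]
    @TestM1 [dbo].[TestM1Type] READONLY
AS
BEGIN
    INSERT INTO dbo.TEST_M1 (DFG, ADF)
    SELECT DFG, ADF FROM @TestM1;
END
In the copy activity's SQL sink you would then set sqlWriterStoredProcedureName to the procedure name, sqlWriterTableType to the table type name, and storedProcedureTableTypeParameterName to the parameter name. To route rows to different tables based on the TABLE value, you would also map that field into the table type and branch on it inside the procedure body.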

Process multiple files on Azure Data Lake

Let's assume there are two file sets, A and B, on Azure Data Lake Store:
/A/Year/
/A/Year/Month/Day/
/A/Year/Month/Day/A_Year_Month_Day_Hour
/B/Year/
/B/Year/Month/Day/
/B/Year/Month/Day/B_Year_Month_Day_Hour
I want to get some values (let's say DateCreated of the A entity) and use these values to generate file paths for the B set.
How can I achieve that?
Some thoughts, but I'm not sure about this:
1. Select values from A.
2. Store them in some storage (Azure Data Lake or Azure SQL Database).
3. Build one comma-separated string pStr.
4. Pass pStr via Data Factory to a stored procedure which generates the file paths from a pattern.
EDIT
According to @mabasile_MSFT's answer, here is what I have right now.
First, a U-SQL script that generates a JSON file, which looks like this:
{
    "FileSet": ["/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__12",
                "/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__13",
                "/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__14",
                "/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__15"]
}
Then an ADF pipeline which contains a Lookup activity and the second U-SQL script.
The Lookup reads this JSON file's FileSet property, and as I understand it I need to somehow pass this JSON array to the second script, right?
But the U-SQL compiler generates a string variable like
DECLARE #fileSet string = "["/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__12",
"/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__13",
"/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__14",
"/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__15"]"
and the script doesn't even compile after that.
You will still need two U-SQL jobs, but you can use an ADF Lookup activity (rather than a stored procedure) to read the fileset names.
Your first ADLA job should extract data from A, build the filesets, and output them to a JSON file in Azure Storage.
Then use a Lookup activity in ADF to read the fileset names from your JSON file in Azure Storage.
Then define your second U-SQL activity in ADF. Set the fileset as a parameter (under Script > Advanced if you're using the online UI) in the U-SQL activity; the value will look something like @{activity('MyLookupActivity').output.firstRow.FileSet} (see the Lookup activity docs).
ADF will write the U-SQL parameter in as a DECLARE statement at the top of your U-SQL script. If you want a default value encoded into your script as well, use DECLARE EXTERNAL; it will be overridden by the DECLARE statement ADF writes in, so it won't cause errors.
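For example, the second U-SQL script could start with something like this (just a sketch, using one of the paths above as the stand-alone default value):
// Default used only when the script runs outside ADF; the DECLARE statement
// that ADF injects from the Lookup output overrides it at execution time.
DECLARE EXTERNAL @fileSet string = "/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__12";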
I hope this helps, and let me know if you have additional questions!
Try this root link, which can help you get started with all things U-SQL:
http://usql.io
A useful link for your question:
https://saveenr.gitbooks.io/usql-tutorial/content/filesets/filesets-with-dates.html

Importing data to SQL from Excel using Talend

I am trying to import data to SQL from Excel. I have created a successful connection with the database, but while trying to retrieve the schema I am not getting my table; instead I am getting the schema of the database (type CATALOG).
How do I get the schema of the table to which I will export the Excel data?
I have referred to this video to do the import:
http://www.youtube.com/watch?v=JDBYU9f1p-I
What you can use is a tFileInputExcel to read the Excel sheet, map what you need with tMap, and send the rows to a t[DB]Output component.
http://www.talendbyexample.com/talend-tdbinput-reference.html