Azure Data Factory: Flattening/normalizing a column from a CSV file using an Azure Data Factory activity - azure-data-factory

I have pulled a CSV file from one of our sources using ADF, and there is one column called "attributes" which contains multiple fields in the form of key-value pairs. Now I want to expand that column into separate fields (columns). Below is a sample:
leadId activityDate activityTypeId campaignId primaryAttributeValue attributes
1234 2020-06-22T00:00:44Z 46 33686 Mail {"Description":"Clicked: https://stepuptostepout.com/","Source":"Lead action","Date":"2020-06-21 19:00:44"}
5678 2020-06-22T00:01:54Z 13 33128 SMS {"Reason":"Changed","New Value":110,"Old Value":null,"Source":"Marketo Flow Action"}
Here the attributes column holds different key-value pairs, and I want them in separate columns so that I can store them in an Azure SQL Database:
attributes
{"Reason":"Changed","New Value":110,"Old Value":null,"Source":"Marketo"}
I want them as:
Reason | New Value | Old Value | Source
Changed | 110 | null | Marketo
I am using Azure Data Factory. Please help!
Updating this:
One more thing I have noticed in my data is that the keys are not uniform; a key (say 'Source') that is present for one leadId might be missing for another, which makes this more complicated. Hence having a separate column for each attribute key might not be a good idea.
Thus, we can have a separate table for the 'attributes' field, with LeadID, AttributeKey and AttributeValue as columns (we can join it with our main table on LeadID). The attribute table will look like:
LeadID AttributeKey AttributeValue
5678 Reason Changed
5678 New Value 110
5678 Old Value null
5678 Source Marketo
Can you help me with how I can achieve this using ADF?
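For reference, a rough sketch of that attribute table in Azure SQL Database (the table and column names are only illustrative):

CREATE TABLE LeadAttribute (
    LeadID         int           NOT NULL,
    AttributeKey   nvarchar(128) NOT NULL,
    AttributeValue nvarchar(max) NULL   -- NULL when the source value is null
);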

You can use a data flow to do this. Below is my test sample.
Setting of source1
Setting of Filter1
instr(attributes,'Reason') != 0
Setting of DerivedColumn1
Here is my expression; it is complex because it splits the raw string on commas and colons and then strips the quote and brace characters:
#(Reason=translate(split(split(attributes,',')[1],':')[2],'"',''),
NewValue=translate(split(split(attributes,',')[2],':')[2],'"',''),
OldValue=translate(split(split(attributes,',')[3],':')[2],'"',''),
Source=translate(translate(split(split(attributes,',')[4],':')[2],'"',''),'}',''))
Setting of Select1
Here is the result:
By the way, if your file is JSON, it may be simpler to do this than with CSV.
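As a side note on the CSV case, mapping data flows also have a Parse transformation that can read the JSON held in a string column, and it copes with keys that are missing on some rows (they simply come through as NULL). A rough data flow script sketch, using the column names from the sample (the exact script syntax may vary):

source1 parse(attr = attributes ? (Reason as string,
        {New Value} as string,
        {Old Value} as string,
        Source as string),
    format: 'json',
    documentForm: 'singleDocument') ~> ParseAttributes

An Unpivot transformation after the Parse could then produce the LeadID / AttributeKey / AttributeValue rows described in the update to the question.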
Hope this can help you :).

Related

How to map the iterator value to sink in ADF

A question concerning Azure Data Factory.
I need to persist the iterator value from a Lookup activity (an Id column from a SQL table) to my sink together with other values.
How to do that?
I thought that I could just reference the iterator value as #{item().id} as the source and a destination column name from my SQL table sink. That doesn't seem to work. The resulting value in the destination column is NULL.
I have used two Lookup activities, one for the id values and the other for the remaining values. To combine these values and insert them into the sink table, I have used the following:
The output of the ids Lookup activity, and the output of the second Lookup holding the remaining column, are both shaped as sketched below.
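Both outputs follow the usual Lookup shape (the values below are hypothetical, just to show the structure):

ids lookup:             { "count": 2, "value": [ { "id": 1 }, { "id": 2 } ] }
remaining rows lookup:  { "count": 2, "value": [ { "gname": "A" }, { "gname": "B" } ] }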
I have given the following dynamic content as the items value of the ForEach:
#range(0,length(activity('ids').output.value))
Inside the ForEach activity, I have used the following Script activity query to insert the data into the sink table:
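-- item() is the current index from range(); both Lookup arrays are read at the same position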
insert into t1 values(#{activity('ids').output.value[item()].id},'#{activity('remaining rows').output.value[item()].gname}')
The data is inserted successfully.

How to map Data Flow parameters to Sink SQL Table

I need to store/map one or more data flow parameters to my Sink (Azure SQL Table).
I can fetch other data from a REST API and am able to map these to my Sink columns (see below). I also need to generate some UUIDs as key fields and add these to the same table.
I would like my EmployeeId column to contain my data flow input parameter, e.g. named param_test. In addition to this I need to insert UUIDs into other columns which are not part of my REST input fields.
How do I accomplish that?
You need to use a derived column transformation, and edit the expression there to include the parameters.
Adding to @Chen Hirsh's answer, use the same derived column to add UUID values to the columns after the REST API source.
They will then show up in the sink mapping.
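A rough sketch of the derived column expressions (the column name RowKeyId is an assumption; param_test is the parameter named in the question):

EmployeeId : $param_test     (data flow parameters are referenced with a $ prefix)
RowKeyId   : uuid()          (uuid() generates a new UUID on every row)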

Mapping Data Flows Dynamic Column Updates

I have a text input source. This has over 100 columns so I won't show all of them here - a cut-down view of the data would be:
CustomerNo  DOB         DOD   Status
01418495    01/02/1940  NULL  1
01418496    01/01/1930  NULL  1
The users want to be able to update/override any of these columns during processing by providing another input text file containing the PK (CustomerNo) and the key/value pairs of the columns to be updated, e.g.:
CustomerNo  Variable  New Value
01418495    DOB       01/12/1941
01418496    DOD       01/01/2021
01418496    Status    0
Can this data somehow be used to create dynamic columns that update the customer records regardless of which columns they want to update? In the example above this would result in:
CustomerNo  DOB         DOD         Status
01418495    01/12/1941  NULL        1
01418496    01/01/1930  01/01/2021  0
I have looked at the documentation but don't see any examples of how something like this could be achieved. Thanks in advance for any advice.
You would use a technique similar to what I describe in this video: https://www.youtube.com/watch?v=q7W6J-DUuJY. What I've done is created a file with rules that have expressions and then apply those rules dynamically inside of my data flow.
The key to make this work is using the expr() function to dynamically evaluate the expression from the external file.
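As a loose sketch of that pattern (the column names below are assumptions, not taken from the video): join the override file to the customer stream on CustomerNo, keep each rule as an expression string in a column such as ruleText, for example iif(Variable == 'DOB', toString(byName('New Value')), toString(byName('DOB'))), and then let a derived column evaluate it:

DOB : toString(expr(ruleText))

The same idea can be repeated, or driven by a column pattern, for DOD, Status and the remaining columns.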

Azure Data Factory - Expression to Source Dataset Column

I have a simple Azure Data Factory project aiming to copy data from an external service (Service Now) to an Azure Table storage.
To keep things simple, consider the source dataset as only an ID and a creation date:
ID, CreationDate
1 , 2020-05-02T10:00:00
2 , 2020-05-02T11:00:00
I want to copy it to the Azure Table with the same structure/columns, but I want to extract the date from the datetime column to use as the Partition Key, and use the ID as the Row Key (if possible, still maintaining the original ID column).
I think I need to use some expression to get the column values mapped to PartitionKey/RowKey, but I didn't find any expression that helps me.
#formatDateTime(????source.CreationDate????, 'yyyy-MM-dd')
Thanks in advance for any help with the correct expression for this scenario.
Regards,
Based on my tests, a source column can't be referenced in the dynamic content of a Copy activity.
You could try to add a column in the source dataset which extracts the date from the CreationDate column, like this:
ID, CreationDate,ShortDate
1 , 2020-05-02T10:00:00,2020-05-02
2 , 2020-05-02T11:00:00,2020-05-02
then use ShortDate as the Partition Key.
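For example, if your source allowed a query, the extra column could be produced there. A sketch in SQL terms (SourceTable is a placeholder; what is actually possible depends on the Service Now connector/dataset):

SELECT ID,
       CreationDate,
       CONVERT(varchar(10), CreationDate, 23) AS ShortDate   -- style 23 = yyyy-mm-dd
FROM SourceTable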

How to assign foreign keys in Access within imported table from Excel

I will use an Access database instead of Excel. But I need to import data from one huge Excel sheet into several pre-prepared, normalized tables in Access. In the core Access table I mainly have the foreign keys from other tables (plus, of course, some other fields that are texts or dates).
How should I perform the import in the easiest way? I cannot import directly because, for example, the Access field 'Country' must not contain the string "United States"; it must contain foreign key no. 84 from the table tblCountries. I am thinking about using the DLOOKUP function in Excel to replace the strings with FKs... Do you know a simpler method?
Thank you, Martin
You don’t mention how you will get the Excel data into several Access tables, so I will assume you will import the entire Excel file into ONE large table and then break out the data from there. I assume the imported data may NOT match your existing Access keys (i.e. misspellings, new values, etc.), so you will need to locate those rows so you can make corrections. This will involve creating a number of ‘unmatched’ queries, then a number of Update queries; finally you can use Append queries to pull data from your import table into its final resting place. Using your example: you have imported ‘Country = United States’, but you need to relate that value to key 84.
Let’s set some examples:
Assume you imported your Excel data into one large Access table. Also assume your import has three fields you need to get keys for.
You already have several control tables in Access similar to the following:
a. tblRegion: contains RegionCode, RegionName (i.e. 1=Pacific, 2=North America, 3=Asia, …)
b. tblCountry: contains CountryNbr, CountryName, Region (i.e. 84 | United States | 2)
c. tblProductType: contains ProdCode, ProductType (i.e. VEH | vehicles; ELE | electrical; etc.)
d. Assume your imported data has matching text fields for each of these (i.e. Country, Region, ProductType) plus your data fields (i.e. SomeData).
Here are the steps I would take:
If your Excel file does not already have columns to hold the key values (i.e. 84), add them before the import. Or after the import, modify the table to add the columns.
Create an ‘Unmatched query’ for each key field you need to relate (use ‘Query Wizard’ > ‘Find Unmatched Query Wizard’). This will show you all imported data that does not have a match in your key table, and you will need to correct those values, i.e.:
SELECT tblFromExcel.Country, tblFromExcel.Region, tblFromExcel.ProductType, tblFromExcel.SomeData
FROM tblFromExcel LEFT JOIN tblCountry ON tblFromExcel.[Country] = tblCountry.[CountryName]
WHERE (((tblCountry.CountryName) Is Null));
Update the FK with matching values:
UPDATE tblCountry
INNER JOIN tblFromExcel ON tblCountry.CountryName = tblFromExcel.Country
SET tblFromExcel.CountryFK = [CountryNbr];
Repeat the above Unmatched / Matched for all other key fields.
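Once the FK columns are populated, an Append query moves the rows into their final table. A sketch, assuming a core table named tblCore with matching FK columns (the names here are illustrative):

INSERT INTO tblCore ( CountryFK, RegionFK, ProductTypeFK, SomeData )
SELECT tblFromExcel.CountryFK, tblFromExcel.RegionFK, tblFromExcel.ProductTypeFK, tblFromExcel.SomeData
FROM tblFromExcel;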