Pivot data in mapping dataflows

I want to pivot the Value column into new columns named after the Description values using mapping dataflows.
In my example (screenshots omitted), each value should end up in the new column created for its corresponding description.
I understand that the grouping will be on the IDName, ID and DateTime columns, and I have removed the columns I don't need.
I'd like to know what goes where in the pivot settings.
Thanks

Step 1: Create a dataflow.
Step 2: Add the original dataset as the source.
Step 3: In the Pivot settings, group by the ID, IDName and DateTime columns, use the Description column as the pivot key, and aggregate the Value column under Pivoted columns (a script sketch follows these steps).
You will get the data preview as expected.
Step 4: Configure the sink with a CSV output dataset and store the data in the target file.
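For reference, here is a minimal data flow script sketch of that pivot. It assumes the source columns are named ID, IDName, DateTime, Description and Value, that the stream name PivotSource is a placeholder, and that max() is an acceptable aggregate for the single value per group; all of these are assumptions, so adjust them to your data:

PivotSource pivot(groupBy(ID, IDName, DateTime),
	pivotBy(Description),
	{} = max(Value),
	columnNaming: '$N$V',
	lateral: true) ~> PivotValues

With an empty column prefix ({}) and the '$N$V' naming pattern, each generated column should be named after the Description value it was pivoted from.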

Related

ADF Unpivot Dynamically With New Column

There is an Excel worksheet in which I want to unpivot all the columns after "Currency Code" into rows; the number of columns that need to be unpivoted might vary, and new columns might be added after "NetIUSD". Is there a way to dynamically unpivot this worksheet with unknown columns?
It worked when I projected all the fields, defined the data type of all the numerical fields as "double", and set the unpivot column data type to "double" as well. However, additional columns might be added to the source file, and I won't be able to define their data types ahead of time; in that case, if a new column has a data type other than "double", it throws an error that the new column is not of the same unpivot data type.
I tried to repro this in Dataflow with sample input details.
Add the unpivot transformation and configure the unpivot settings as follows.
Ungroup by: Code, Currency_code
Unpivot column: Currency
Unpivoted columns: Column arrangement: Normal, Column name: Amount, Type: string
In the data preview, all columns other than those listed in Ungroup by are unpivoted dynamically, even if you add additional fields later; a script sketch of these settings follows.
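Here is a minimal data flow script sketch of that unpivot, with the stream name UnpivotSource as a placeholder and the column names taken from the settings above; verify it against the script your own data flow generates:

UnpivotSource unpivot(output(
		Currency as string,
		Amount as string
	),
	ungroupBy(Code, Currency_code),
	lateral: true) ~> UnpivotCurrency

Because Amount is declared as string, newly added source columns can be unpivoted into it regardless of their original type, which avoids the data type mismatch error.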
I can confirm Aswin's answer. I had the same issue: the dataflow failed when new columns appeared dynamically. The cause was the data type of the unpivoted columns; after changing it to string, everything ran smoothly.
The imported projection does not affect this case. I've tried both an imported and a manually coded projection, and both work with the "string" data type.

Is it possible to generate a space-separated header row using the Data Factory copy activity?

I am using Azure SQL as the source dataset and a delimited file as the sink dataset in the copy activity.
I tried the copy activity, but 'First row as header' gives comma-separated headers.
Is there a way to change the header output style?
Please note that the spacing is unequal (h3...h4).
In this repro, I tried to give
1 space between the 1st and 2nd columns,
2 spaces between the 2nd and 3rd columns,
3 spaces between the 3rd and 4th columns.
I also tried giving the same column name to column2 and column3. The approach is as follows.
Data is copied from the Azure SQL database to the data lake in comma-delimited format as a staging file.
This staging file is taken as the source in a Dataflow activity.
In the source dataset, 'First row as header' is not checked, so in the data preview of the source transformation the header row appears as the first data row.
A derived column transformation is added to change the values of Column_2 and Column_3 on the header row. Here, 'date_col' in Column_1 identifies the header row, so when Column_1 is 'date_col', the Column_2 and Column_3 values are replaced with the same column name:
Column_2 = iif(Column_1=='date_col','ECIX',Column_2)
Column_3 = iif(Column_1=='date_col','ECIX',Column_3)
Another derived column transformation is added to concatenate all the columns with the required spacing (1, 2 and 3 spaces). The column name is given as concat, and its value is
concat(Column_1,' ',Column_2,'  ',Column_3,'   ',Column_4)
A Select transformation is added, and only the concat column is selected.
In the sink, a new delimited file dataset is added; in the sink dataset, 'First row as header' is also not checked.
After the pipeline is run, the target file contains the space-separated header row followed by the data rows; the full flow is sketched below.
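Putting the transformations above together, a data flow script sketch of this approach could look like the following, where StagingCsv is a placeholder stream name and Column_1 through Column_4 are the auto-assigned column names assumed in the steps above:

StagingCsv derive(Column_2 = iif(Column_1 == 'date_col', 'ECIX', Column_2),
	Column_3 = iif(Column_1 == 'date_col', 'ECIX', Column_3)) ~> FixHeaderRow
FixHeaderRow derive(concat = concat(Column_1, ' ', Column_2, '  ', Column_3, '   ', Column_4)) ~> BuildLine
BuildLine select(mapColumn(concat),
	skipDuplicateMapInputs: true,
	skipDuplicateMapOutputs: true) ~> KeepConcatOnly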
Alternatively, keeping Azure SQL itself as the source in the dataflow, I created a single derived column 'OUTDC' and added all the columns from the source like this:
(h1)+' '+(h2)+' '+(h3)
Then I fed OUTDC to a delimited sink and set the sink's Headers option to a single string like this:
['h1 h2 h3']
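As a data flow script sketch, that derived column is simply the following, where AzureSqlSource is a placeholder stream name and h1, h2 and h3 stand in for the actual source column names:

AzureSqlSource derive(OUTDC = h1 + ' ' + h2 + ' ' + h3) ~> BuildOUTDC

This variant avoids the staging file entirely, since the header comes from the sink's Headers option rather than from a data row.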

ADF map source columns startswith to sink columns in SQL table

I have an ADF data flow with many CSV files as the source and a SQL database as the sink. The data in the CSV files is similar, each file being 170-plus columns wide, but not all of the files have the same columns. Additionally, some column names differ from file to file, but each column name starts with the same corresponding 3 digits. Example: 203-student name, 644-student GPA.
Is it possible to map source columns using the first 3 characters?
Go back to the data flow designer and edit the data flow.
Click on the Parameters tab.
Create a new parameter and choose the string array data type.
For the default value, enter the names you need, e.g. ['203-student name','203-student grade','203-student-marks'].
Add a Select transformation. The Select transformation will be used to map incoming columns to new column names for output.
We're going to change the first 3 column names to the new names defined in the parameter.
To do this, add 3 rule-based mapping entries in the bottom pane.
For the first column, the matching rule will be position==1 and the name will be $parameter1[1].
Follow the same pattern for columns 2 and 3.
Click on the Inspect and Data preview tabs of the Select transformation to view the new column names.
Reference - https://learn.microsoft.com/en-us/azure/data-factory/tutorial-data-flow-dynamic-columns#parameterized-column-mapping
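Since the question asks about matching on the first 3 characters, the rule-based mapping can also match on the column name instead of its position. A hedged sketch using the left() string function of the data flow expression language, where the '203' prefix and the output name student_name are illustrative assumptions:

Matching condition: left(name, 3) == '203'
Name expression: 'student_name'

Any incoming column whose name starts with 203 is then mapped to the student_name sink column, regardless of the rest of its name.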

ADF Add Header to CSV Sink

Does anyone know how to add a header to a CSV sink? I have a data flow whose source is a database table. I then used a derived column that concatenates the columns into a single column, with the values separated by commas (done in the source via a query), and selected that concatenated column to be exported to CSV.
Data example:
Matt,Smith,10
Therefore I technically only have one column; however, I want to add a header for each section of the data.
Desired output:
FirstName,LastName,Age
Matt,Smith,10
You can add headers to the CSV file.
Select the Data Flow activity.
After the source, add a Select transformation.
In it, set the column names to the desired header values (FirstName, LastName, Age in this example).
Finally, add the sink and run the pipeline.
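A minimal data flow script sketch of that Select rename, assuming the three incoming columns arrive with the placeholder names Column_1 through Column_3 and TableSource is a placeholder stream name:

TableSource select(mapColumn(
		FirstName = Column_1,
		LastName = Column_2,
		Age = Column_3
	),
	skipDuplicateMapInputs: true,
	skipDuplicateMapOutputs: true) ~> RenameForHeader

With 'First row as header' checked on the sink dataset, the output file then starts with the line FirstName,LastName,Age.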

Is it possible to create a filter that filters out distinct values in a dataset?

I'm trying to create a report that will contain two pie charts. I get the data for the report from SQL.
Currently, I have created a dataset for the first chart which holds records with the following fields: Import ID, Date, Status. This dataset contains duplicate records.
For the second chart, I need the same data I have in the first dataset, only without duplicates and aggregated differently.
I realize that I can create another dataset that gets the distinct values from the SQL database, but I was wondering whether there is a way to use the built-in filtering functionality to filter the dataset I already have so that it returns only distinct values (based on the ID field).
Looking at the options in the filtering dialog, I see no obvious way to do this.
If ID is not unique and your first query looks like this:
select ImportID, Date, Status
from YourTableSource
WHERE YourConditions
then your second query should probably take a form like this:
select DISTINCT ImportID, Status
from YourTableSource
WHERE YourConditions
If changing the query is not an option, then you may create a group in SSRS with a hidden detail row, and place the ID and Status fields in the group pane.