How to replace null values in a dynamic table with the mean or 'Unknown', depending on the column data type, in Azure Data Factory?

I have data from two data sources, SQL and PostgreSQL. For every table, I want to replace null values with the column MEAN if the column type is integer, and with 'Unknown' if the column type is string.
I have tried using a derived column, but I am not sure how to pass in dynamic column values.
I created a pipeline with a 'LookUp' activity and a 'ForEach' activity that calls a data flow.
The migration goes from SQL to Postgres, so I need to validate the tables as well as the null values.

There are two cases here: replacing null values in a string column with 'Unknown', and replacing null values in an integer column with the mean of the values in that column.
Main idea:
Add a Derived Column transformation and replace the null values in the string column with 'Unknown'.
In the same step, replace the null values in the integer column with zeros; a Window transformation then replaces these zeros with the mean value once it has been calculated.
Here is a quick demo that I built in ADF.
First, I created a dataset with 3 columns (name, height, address), where height is an integer and address is a string, like so:
ADF:
Derived Column transformation:
I modified the address and height columns as described above.
Window transformation:
In the Window transformation, the idea is to replace the zeros with the mean value. I added a new column named 'newHeight' so that the difference is visible, but you can overwrite the original height column instead.
In Window settings -> Window columns:
I added a new column newHeight with the value:
case(height == 0, divide(sum(height), count(height)), toLong(height))
Output:
You can read more about the Window transformation here:
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-window
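To make the intended logic concrete outside ADF, here is a minimal Python sketch of the per-type replacement. It is not ADF code: the function name fill_nulls, the sample rows, and the type map are all assumptions made for illustration. It also shows one caveat of the zero-fill approach: once nulls become zeros, sum/count averages over every row, so the result is lower than the mean of the values that were actually present, and genuine zeros are replaced as well.

```python
from statistics import mean


def fill_nulls(rows, column_types):
    """Return copies of rows with None replaced according to column type.

    Hypothetical helper: integer columns get the rounded mean of their
    non-null values, string columns get 'Unknown'.
    """
    fills = {}
    for col, col_type in column_types.items():
        if col_type == "integer":
            present = [r[col] for r in rows if r[col] is not None]
            # Mean over non-null values only; stays None if nothing is present.
            fills[col] = round(mean(present)) if present else None
        elif col_type == "string":
            fills[col] = "Unknown"
    return [
        {col: (fills.get(col) if val is None else val) for col, val in row.items()}
        for row in rows
    ]


# Hypothetical sample data using the demo's column names.
rows = [
    {"name": "a", "height": 150, "address": "x"},
    {"name": "b", "height": None, "address": None},
    {"name": "c", "height": 170, "address": "z"},
]
types = {"name": "string", "height": "integer", "address": "string"}

filled = fill_nulls(rows, types)

# For contrast: the zero-fill variant divides by all rows, former nulls included.
zero_filled_mean = sum(r["height"] or 0 for r in rows) / len(rows)
```

With this sample, the non-null mean is 160, while the zero-filled average is 320 / 3, so the two approaches only agree when the column has no nulls.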

Related

ADF Unpivot Dynamically With New Column

I have an Excel worksheet in which I want to unpivot all the columns after "Currency Code" into rows. The number of columns to unpivot may vary, and new columns may be added after "NetIUSD". Is there a way to dynamically unpivot this worksheet when the columns are not known in advance?
It worked when I projected all the fields, defined the data type of all the numerical fields as "double", and set the unpivot column data type to "double" as well. However, additional columns may be added to the source file, and I cannot define their data types ahead of time. If a new column has a data type other than "double", the data flow throws an error saying that the new column does not match the unpivot data type.
I tried to reproduce this in a data flow with sample input data.
Add an Unpivot transformation and configure the unpivot settings as follows.
Ungroup by: Code, Currency_code
Unpivot column: Currency
Unpivoted Columns: Column arrangement: Normal
Column name: Amount
Type: string
Data Preview
All columns other than those listed under Ungroup by are unpivoted dynamically, even if additional fields are added later.
I can confirm Aswin's answer. I had the same issue: the data flow failed when new columns appeared dynamically. The cause was the data type of the unpivoted columns. After changing it to string, everything ran smoothly.
Imported projection does not affect this case. I tried both an imported and a manually coded projection, and both work with the "string" data type.
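The reason a string value type tolerates new columns can be sketched in plain Python. This is only an illustration of the idea, not how ADF implements Unpivot; the function name unpivot, the sample row, and the "Notes" column are assumptions.

```python
def unpivot(rows, ungroup_by, name_col="Currency", value_col="Amount"):
    """Turn every column not listed in ungroup_by into a (name, value) row.

    Values are converted to strings so that columns of any type can share
    one output column, which mirrors setting the unpivot type to string.
    """
    out = []
    for row in rows:
        keys = {k: row[k] for k in ungroup_by}
        for col, val in row.items():
            if col in ungroup_by:
                continue
            out.append(
                {**keys, name_col: col, value_col: None if val is None else str(val)}
            )
    return out


# Hypothetical input: "Notes" stands in for a column added later with a
# non-numeric type.
rows = [{"Code": "A1", "Currency_code": "USD", "NetIUSD": 10.5, "Notes": "late"}]
result = unpivot(rows, ungroup_by=["Code", "Currency_code"])
```

Downstream steps can cast the Amount values back to numbers where needed, since the string form keeps every value regardless of its original type.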

reduce function not working in derived column in adf mapping data flow

I am trying to create a derived column that dynamically sums the values of multiple columns matching a condition. I am using the reduce function in a derived column in an ADF mapping data flow, but the column is not created even though the transformation is correct.
Columns from source
Derived column logic
Derived column data preview without the new columns as per logic
I can see only the fields from the source, not the derived column fields. If I use only array($$), I can see the fields being created.
Derived column data preview with logic only array($$)
How can I get a derived column with the sum of all the fields matching the condition?
We receive 48 weeks of forecast data, and the data has to be prepared on a monthly basis.
E.g. input data:
Output data:
JAN
----
506 -- This is for first record i.e. (94 + 105 + 109 + 103 + 95)
The problem is that array($$) in the reduce function contains only one element, so the reduce function cannot accumulate the contents of the matching columns.
You can solve this with two derived columns and a data flow parameter, as follows:
Create derived columns with pattern matching for each month-week as you did before, but put the reference $$ into the value field instead of the reduce(...) function.
This creates derived columns such as 0jan, 1jan, etc., each containing a copy of the original value. For example, Week 0 (1 Jan - 7 Jan) => 0jan with value 95.
This step gives you a predefined set of column names for each week, which you can then refer to by name when summing the values.
Define a data flow parameter for each month that contains the month-week column names as a string array, like this:
ColNamesJan=['0jan', '1jan', etc.] ColNamesFeb=['0feb', '1feb', etc.] and so on.
You will use these column names in a reduce function in the next step to sum the month-week columns into a monthly column.
Create a derived column for each month to hold the monthly total, and use the following reduce function to sum the weekly values:
reduce(array(byNames($ColNamesJan)), 0, #acc + toInteger(toString(#item)), #result)
Replace the parameter name accordingly for each month.
I was able to sum the columns dynamically with the above solution.
Please let me know if you need more information (e.g. screenshots) to reproduce the solution.
Update -- Here are the screenshots from my test environment.
Data source (data preview):
Derived columns with pattern matching (settings)
Derived columns with pattern matching (data preview)
Data flow parameter:
Derived column for monthly sum (settings):
Derived column for monthly sum (data preview):
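The accumulation performed by the reduce expression can be mirrored in Python with functools.reduce. This is a sketch of the logic only, not ADF code. The column names follow the 0jan, 1jan pattern from the answer, but which week holds which value (apart from 0jan = 95) and the 0feb value are assumptions.

```python
from functools import reduce

# One input record after the pattern-matching step. Values are kept as
# strings to mirror the toInteger(toString(#item)) conversion.
row = {
    "0jan": "95",
    "1jan": "94",
    "2jan": "105",
    "3jan": "109",
    "4jan": "103",
    "0feb": "88",  # hypothetical value, not part of the January total
}

# Counterpart of the ColNamesJan data flow parameter.
col_names_jan = ["0jan", "1jan", "2jan", "3jan", "4jan"]


def monthly_total(record, col_names):
    """Sum the named columns of one record, starting from 0."""
    return reduce(lambda acc, name: acc + int(str(record[name])), col_names, 0)


jan_total = monthly_total(row, col_names_jan)
```

For this record, jan_total is 506, matching the expected JAN output in the question. The key point is that the list of column names has several elements, so the accumulator has something to iterate over.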

Can I create a parameter in a Local?

I created a Derived Column with an expression
(dummy sample)
iif(columnX=='true',1,0)
This expression will be useful in other Derived Columns, so I'd like to create a Local with this expression, but with a parameter in place of columnX so that I can pass in another column.
Is it possible? How?
I tried creating a data flow parameter (param2) and added your expression to it, using a different parameter (param1) in place of columnX.
However, I was not able to change the parameter value in the expression later; it only took the default value. I also did not find any documentation on assigning a column value to a parameter in a data flow.
The only way I can think of is to use the expression multiple times in different derived columns, each with a different column in place of columnX.
Derived column1: Added expression to the new column (col1)
Preview of Derived column1: Evaluating expression against Sample1 column.
Derived column2: Reusing the same column name (col1) to evaluate the expression against the Sample2 column. If the previous value is needed, you can assign the previous value of col1 to a new column (previous_col1) in derived column2, as shown in the snip below.
Preview of derived column2

Is there a way of creating a Serial Number based on other inputs on an MS Access form?

I have some samples I need to take.
In order to create a good identifier/serial number for the samples, I want it to be a product of its characteristics.
For example, if the sample was taken from India and the temperature was 40 degrees then I would click dropdowns in the form to create those two entries and then a serial number would be spat out in the form "Ind40".
Assuming that your form is bound to a table, you can create a calculated column in the table that concatenates the values from other columns into a single value.
For instance, create a new column and give it a name (for example, SerialNbr). Then for Data Type select "Calculated". An expression builder window will appear:
Enter the columns you'd like to concatenate and separate them with &. Here is an example of how the expression could look:
Left([Country],3) & [Temperature]
This expression takes the first 3 chars from the Country column and combines it with the value from Temperature column to create the value in column SerialNbr. The calculated column will automatically update when values are entered into the other fields. I'd also suggest adding another value to the calculated expression to help avoid duplicates, such as date/time of submission.
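For readers more familiar with general-purpose code, the same concatenation can be sketched in Python. This is only an illustration of what the Access expression computes; the function name serial_number and the optional suffix argument (standing in for the suggested date/time value) are assumptions.

```python
def serial_number(country, temperature, suffix=""):
    """Mirror Left([Country],3) & [Temperature], with an optional suffix.

    The suffix is a hypothetical extra component, for example a
    submission timestamp, to reduce the chance of duplicates.
    """
    return f"{country[:3]}{temperature}{suffix}"


basic = serial_number("India", 40)
with_suffix = serial_number("India", 40, "-20240101T1200")
```

Here basic is "Ind40", the form described in the question. Without a suffix, two samples with the same country and temperature get the same serial number.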

SSRS null parameter to return values

I have two linked reports. When I choose a field on the first report, I am led to a second report that shows only the data from the row I selected in the first report. The second report is used for updating, so it takes parameters. It has three text boxes that allow a null value, and a dropdown list.
At first, when I created the dropdown list with manually specified values and added a null value, the report returned the row I selected in the first report with all its data. Then I tried to populate the parameter values from a database, but now each time I open this report it first asks me to select a value from the dropdown before it displays the data.
How can I add a null value to the items retrieved from the DB so that, when null is the default selection, all values are returned without any problems and without any selection needed?
You would need to add a condition to your Dataset query to handle the NULL parameter.
For example:
WHERE @Parameter IS NULL OR ColumnValue = @Parameter
When working with NULL-valued parameters, I usually use this syntax:
WHERE ColumnValue = COALESCE(@Parameter, ColumnValue)
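The two filter patterns behave slightly differently when the column itself contains NULLs. The following Python sketch uses an in-memory SQLite database as a stand-in for the report's data source. The table, its rows, and the :p parameter name are assumptions, and SQLite's named-parameter syntax differs from the report query's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "open"), (2, "closed"), (3, None)],  # row 3 has a NULL status
)

# Pattern 1: explicit NULL check on the parameter.
OR_FORM = "SELECT id FROM items WHERE :p IS NULL OR status = :p ORDER BY id"
# Pattern 2: fall back to the column's own value when the parameter is NULL.
COALESCE_FORM = (
    "SELECT id FROM items WHERE status = COALESCE(:p, status) ORDER BY id"
)


def ids(query, param):
    """Run the query with one named parameter and return the matching ids."""
    return [row[0] for row in conn.execute(query, {"p": param})]


all_rows_or = ids(OR_FORM, None)
all_rows_coalesce = ids(COALESCE_FORM, None)
only_open = ids(OR_FORM, "open")
```

With a NULL parameter, the OR form returns all three rows, while the COALESCE form skips row 3, because comparing a NULL status with itself does not evaluate to true. If the filtered column can contain NULLs, the OR form is the safer choice for "return everything".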