Turn 1 row into multiple rows in Azure Data Flows - azure-data-factory

I have a dataset along the lines of:
Account No
P01_Ind
P02_Ind
P03_Ind
1
Y
Y
N
2
Y
N
Y
3
N
Y
N
Is there a way of adding a transformation in Azure Data Flows so that each row would turn into 1 or more rows depending on these indicator columns? In this example, my dataset would become:
Account No
Indicator
1
P01
1
P02
2
P01
2
P03
3
P02
I looked at Unpivot but I couldn't see how this would work with this data. Note that this transformed dataset would undergo further transforms and wouldn't be sinked after this step. Any tips gratefully received. Thanks.

You can use ConditionalSplit transformation then add Indicator column to each condition by 'DerivedColumn' transformation. Finally, use Union and Select transformation to meet your need.(You can sort output of Select transformation if you need.)
Steps:
create a dataset and its data like your provided.
use ConditionalSplit transformation to split data to different stream.
add Indicator column to each stream.
union three stream
use 'Select' transformation to delete P01_Ind,P02_Ind,P03_Ind column.
sort output of Select transformation.
Data preview of 'Sort':

This is as simple as doing a unpivot which will get you the rows with 'Y'/'N' as another column and then filtering the rest out for values of 'N'.
You will get
1 P01_Ind Y
2 P02_Ind Y
3 P03_Ind N
.....
It is scalable next time you have 10 instead of 3 columns.

Related

KDB/Q : What is Vector operation?

I am learning KDB+ and Q programming and read about the following statement -
"select performs vector operations on column lists". What does Vector operation mean here? Could somebody please explain with an example? Also, How its faster than standard SQL?
A vector operation is an operation that takes one or more vectors and produces another vector. For example + in q is a vector operation:
q)a:1 2 3
q)b:10 20 30
q)a + b
11 22 33
If a and b are columns in a table, you can perform vector operations on them in a select statement. Continuing with the previous example, let's put a and b vectors in a table as columns:
q)([]a;b)
a b
----
1 10
2 20
3 30
Now,
q)select c:a + b from ([]a;b)
c
--
11
22
33
The select statement performed the same a+b vector addition, but took input and returned output as table columns.
How its faster than standard SQL?
"Standard" SQL implementations typically store data row by row. In a table with many columns the first element of a column and its second element can be separated in memory by the data from other columns. Modern computers operate most efficiently when the data is stored contiguously. In kdb+, this is achieved by storing tables column by column.
A vector is a list of atoms of the same type. Some examples:
2 3 4 5 / int
"A fine, clear day" / char
`ibm`goog`aapl`ibm`msft / symbol
2017.01 2017.02 2017.03m / month
Kdb+ stores and handles vectors very efficiently. Q operators – not just +-*% but e.g. mcount, ratios, prds – are optimised for vectors.
These operators can be even more efficient when vectors have attributes, such as u (no repeated items) and s (items are in ascending order).
When table columns are vectors, those same efficiencies are available. These efficiencies are not available to standard SQL, which views tables as unordered sets of rows.
Being column-oriented, kdb+ can splay large tables, storing each column as a separate file, which reduces file I/O when selecting from large tables.
The sentence means when you refer to a specific column of a table with a column label, it is resolved into the whole column list, rather than each element of it, and any operations on it shall be understood as list operations.
q)show t: flip `a`b!(til 3;10*til 3)
a b
----
0 0
1 10
2 20
q)select x: count a, y: type b from t
x y
---
3 7
q)type t[`b]
7h
q)type first t[`b]
-7h
count a in the above q-sql is equivalent to count t[`a] which is count 0 1 2 = 3. The same goes to type b; the positive return value 7 means b is a list rather than an atom: http://code.kx.com/q/ref/datatypes/#primitive-datatypes

DAX: Averaging multiple % Columns

I'm new to Power BI and Dax, having some difficulty with the below scenario.
test a b c d AVERAGE
aa 51.97% 46.61% 49%
I have 4 columns, a-d, and I simply want the average of the 4 columns in the AVERAGE column. Dependent on the row different columns may be blank. Each of the columns are measures pulling through a % value into the table.
I'm sure there must be a simple solution to this but any help would be much appreciated.
Try creating a column like this:
AVERAGE = ([a]+[b]+[c]+[d])/4
UPDATE: BLANK measures don't affect average result.
AVERAGE = DIVIDE(([a]+[b]+[c]+[d]),
(IF(ISBLANK([a]),0,1) + IF(ISBLANK([b]),0,1) +
IF(ISBLANK([c]),0,1) + IF(ISBLANK([d]),0,1)))

Aggregating from multiple columns in Tableau

I have a table that looks like:
id aff1 aff2 aff3 value
1 a x b 5
2 b c x 4
3 a b g 1
I would like to aggregate the aff columns to calculate the sum of "value" for each aff. For example, the above gives:
aff sum
a 6
b 10
c 4
g 1
x 9
Ideally, I'd like to do this directly in tableau without remaking the table by unfolding it along all the aff columns.
You can use Tableau’s inbuilt pivot method as below, without reshaping in source .
CTRL Select all 3 dimensions you want to merge , and click on pivot .
You will get your new reshaped data as below, delete other columns :
Finally build your view.
I hope this answers . Rest other options for the above results include JOIN at DB level, or creating multiple calculated fields for each attribute value which are not scalable.

Comparing, matching and combining columns of data

I need some help matching data and combining it. I currently have four columns of data in an Excel sheet, similar to the following:
Column: 1 2 3 4
U 3 A 0
W 6 B 0
R 1 C 0
T 9 D 0
... ... ... ...
Column two is a data value that corresponds to the letter in column one. What I need to do is compare column 3 with column 1 and whenever it matches copy the corresponding value from column 2 to column 4.
You might ask why don't I do this manually ? I have a spreadsheet with around 100,000 rows so this really isn't an option!
I do have access to MATLAB and have the information imported, if this would be more easily completed within that environment, please let me know.
As mentioned by #bla:
a formula similar to =IF(A1=C1,B1,0)
should serve (Excel).

Preserving matrix columns using Matlab brush/select data tool

I'm working with matrices in Matlab which have five columns and several million rows. I'm interested in picking particular groups of this data. Currently I'm doing this using plot3() and the brush/select data tool.
I plot the first three columns of the matrix as X,Y, Z and highlight the matrix region I'm interested in. I then use the brush/select tool's "Create variable" tool to export that region as a new matrix.
The problem is that when I do that, the remaining two columns of the original, bigger matrix are dropped. I understand why- they weren't plotted and hence the figure tool doesn't know about them. I need all five columns of that subregion though in order to continue the processing pipeline.
I'm adding the appropriate 4th and 5th column values to the exported matrix using a horrible nested if loop approach- if columns 1, 2 and 3 match in both the original and exported matrix, attach columns 4/5 of the original matrix to the exported one. It's bad design and agonizingly slow. I know there has to be a Matlab function/trick for this- can anyone help?
Thanks!
This might help:
1. I start with matrix 1 with columns X,Y,Z,A,B
2. Using the brush/select tool, I create a new (subregion) matrix 2 with columns X,Y,Z
3. I then loop through all members of matrix 2 against all members of matrix 1. If X,Y,Z match for a pair of rows, I append A and B
from that row in matrix 1 to the appropriate row in matrix 2.
4. I become very sad as this takes forever and shows my ignorance of Matlab.
If I understand your situation correctly here is a simple way to do it:
Assuming you have a matrix like so: M = [A B C D E] where each letter is a Nx1 vector.
You select a range, this part is not really clear to me, but suppose you can create the following:
idxA,idxB and idxC, that are 1 if they are in the region and 0 otherwise.
Then you can simply use:
M(idxA&idxB&idxC,:)
and you will get the additional two columns as well.