Extract table name when schema length varies in Data Factory Expression - azure-data-factory

I need to extract the table name from schema.table_name, where I have more than one schema and the schema length is unknown, e.g. Finance.Reporting or Setup.System. I want to extract Reporting and System from these strings using an expression in Data Factory.

You can use the split() function to split the string on the '.' delimiter, which returns an array, and then take the second element.
Note: Array index starts from 0.
@split('Finance.Reporting','.')[1]
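As a minimal sketch, assuming the schema-qualified name arrives in a pipeline variable (hypothetically named TableFullName), the same pattern can be used as dynamic content:
@split(variables('TableFullName'), '.')[1]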

Related

MD5/SHA Field Dataset in Data Fusion

I need to concatenate a few string values in order to obtain the SHA256 encrypted string. I've seen that Data Fusion has a plugin to do the job:
The documentation, however, is very poor and nothing I've tried seems to work. I created a table in BQ with the string fields I need to concatenate, but the output is the same as the input. Can anyone provide an example of how to use this plugin?
EDIT
Below I present the example.
This is what the workflow looks like:
For testing purposes, I added one column with the following string:
2022-01-01T00:00:00+01:00
And here's the output:
You can use Wrangler to concatenate the string values.
I tried your scenario by adding a Wrangler stage to the pipeline:
Joining 2 Columns:
I named the column new_col, using , as delimiter:
Output:
What you described can be achieved with two Wranglers:
The first Wrangler will do what #angela-b described: use the merge directive to create a new column with the concatenation of two columns. Example directive that joins columns a and b using , as the delimiter and stores the result in column a_b:
merge a b a_b ,
The second Wrangler will use the hash directive which will hash the column in place using a specified algorithm. Example of a directive that hashes column a_b using MD5:
hash :a_b 'MD5' true
Remember to set the last parameter encode to true so that you get a string output instead of a byte array.
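Since the original question asked for SHA-256 rather than MD5, and the hash directive accepts standard digest algorithm names, swapping the algorithm name should also work (worth verifying in your environment):
hash :a_b 'SHA-256' true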

Creating an array of columns from an array of column names in data flow

How can I create an array of columns from an array of column names in dataflow?
The following creates an array of sorted column names, with the exception of the last column:
sort(slice(columnNames(), 1, size(columnNames()) - 1), compare(#item1, #item2))
I want to get an array of the columns for this array of column names. I tried this:
toString(byNames(sort(slice(columnNames(), 1, size(columnNames()) - 1), compare(#item1, #item2))))
But I keep getting the error:
Column name function 'byNames' does not accept column or argument parameters
Please can anyone help me with a workaround for this?
Update:
It seems that using columnNames() in any way (directly or assigned to a parameter) leads to an error, because at runtime on Spark the result is fed into the byNames() function. Since there is no way to re-introduce it as a parameter or assign it to a variable inside the data flow directly, the following workaround works for me (see the sketch after these steps).
Add an empty string-array parameter to the data flow.
Use the sha2 function as usual in a derived column with the parameter: sha2(256,byNames($cols))
Create a pipeline and use a Get Metadata activity there to retrieve the Structure, from which you can read the column names.
Inside a ForEach activity, append each column name to a variable.
Next, connect the Data Flow activity and pass in the variable containing the column names.
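A minimal sketch of the pipeline side, with hypothetical names: the Get Metadata activity is called GetTableMetadata and the array variable is called cols.
ForEach items: @activity('GetTableMetadata').output.structure
Append Variable (cols) value: @item().name
The cols variable can then be passed to the data flow's $cols parameter in the Execute Data Flow activity.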
The documentation for the byNames function states 'Computed inputs are not supported but you can use parameter substitutions'. This explains why you should use a parameter as input to create the array used in the byNames function:
Example, where the $cols parameter holds the list of columns:
sha2(256,byNames(split($cols,',')))
You can use computed column names as input by creating the array before using it in the function. Instead of building the expression in-line in the function call, set the column values in a parameter first and then use that parameter in your function directly afterwards.
For a parameter $cols of type array:
$cols = sort(slice(columnNames(), 1, size(columnNames()) - 1), compare(#item1, #item2))
toString(byNames($cols))
Refer: byNames

How to pass an array for column pattern matching in mapping dataflow derived column from CSV file through pipeline?

I have a mapping data flow with a derived column, where I want to use a column pattern for matching against an array of columns using in()
The data flow is executed in a pipeline, where I set the parameter $owColList_md5 based on a variable that I've populated from a single-line CSV file containing a comma-separated string
If I have a single column name in the CSV file/variable, encapsulated in single quotes, and the "Expression" checkbox ticked, it works.
The problem is getting it to work with multiple columns. There seem to be parsing problems when the variable holds multiple items each encapsulated in single quotes, or potentially with the comma separating them. This often causes errors executing the data flow, with messages like "store is not defined" etc.
I've tried having ''col1'',''col2'' and "col1","col2" (2x single quotes and double quotes) in the CSV file. I've also tried having the file without quotes, trying to replace the comma with escaped quotes (using ) in the derived column pattern expression, with no luck.
How do you populate this array in the derived column, based on the data flow parameter that is fed from the comma-separated string of column names in the CSV file / pipeline variable, in a working way?
While array types are not supported as data flow parameters, passing in a comma-separated string can work if you use the instr() function to match.
Say you have two columns, col1 and col2. Pass in a parameter with value '"col1","col2"'.
Then use instr($<yourparamname>, '"' + name + '"') > 0 to see if the column name exists within the string you pass in. Note: You do not need double quotes, but they can be useful if you have column names that are subsets of other column names, such as id1 and id11.
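As a sketch of the derived column pattern itself, using the $owColList_md5 parameter name from the question (the md5($$) output is just an illustrative transformation):
Matching condition: instr($owColList_md5, '"' + name + '"') > 0
Column name expression: $$ + '_md5'
Value expression: md5($$)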
Hope this helps!

How to use a dynamic comma separated String value as input for a List()?

I'm building a Spark Scala application that dynamically lists all tables in a SQL Server database and then loads them to Apache Kudu.
I'm building a dynamic string variable that tracks the primary key columns for each table. The primary keys are comma separated within the variable. The following is an example of my variable value:
PrimaryKeys=storeId,storeNum,custId
The following is a required function into which I must pass a List[String] (passing the PrimaryKeys string directly is definitely not correct):
setRangePartitionColumns(List("storeId","storeNum","custId").asJava)
If I just use the PrimaryKeys variable for the List input (like the following), it only works for a single column (and would fail in this example with 3 comma separated values):
setRangePartitionColumns(List(PrimaryKeys).asJava)
The following is another example, but using a Seq(). I'm supposed to put the same primary key column names in the same format below. Manually typing the column names works fine; however, I cannot figure out how to dynamically input the variable values:
kuduContext.createTable(tableName, df.schema, Seq(PrimaryKey), kuduTableOptions)
Any idea how I can parse the variable PrimaryKeys dynamically and feed it into either function, regardless of the number of comma-separated values included?
Any assistance is greatly appreciated.
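One way to do this, as a minimal sketch (the splitting logic below is an assumption, not from the original thread): split the comma-separated string into a List[String] before handing it to either API.
import scala.collection.JavaConverters._

// The dynamically built comma-separated string of primary key columns
val PrimaryKeys = "storeId,storeNum,custId"

// Split on the comma and trim stray whitespace to get a proper List[String]
val primaryKeyList: List[String] = PrimaryKeys.split(",").map(_.trim).toList

// Kudu's CreateTableOptions expects a java.util.List, so convert with asJava
// setRangePartitionColumns(primaryKeyList.asJava)

// createTable accepts a Scala Seq[String] directly
// kuduContext.createTable(tableName, df.schema, primaryKeyList, kuduTableOptions)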

Convert varchar parameter with CSV into column values postgres

I have a Postgres query with one input parameter of type varchar.
The value of that parameter is used in the where clause.
Until now only a single value was sent to the query, but now we need to send multiple values so that they can be used with an IN clause.
Earlier:
value = 'abc'
where data = value   -- current usage
Now:
value = 'abc,def,ghk'
where data in (value)   -- intended usage
I tried many ways, i.e. providing the value as
value='abc','def','ghk'
or
value="abc","def","ghk" etc.
But none of these works, and the query returns no results even though there is matching data available. If I provide the values directly in the IN clause, I do see the data.
I think I should somehow split the parameter, which is a comma-separated string, into multiple values, but I am not sure how I can do that.
Please note it's a Postgres DB.
You can try splitting the input string into an array. Something like this:
where data = ANY(string_to_array('abc,def,ghk',','))
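As a fuller sketch, assuming a hypothetical table my_table with a text column data and the comma-separated value bound as parameter $1:
SELECT *
FROM my_table
WHERE data = ANY(string_to_array($1, ','));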