How to pass a special character as a parameter in Azure Data Factory?

I am trying to parametrize the Column Delimiter field of a CSV dataset in Azure Data Factory.
(https://i.stack.imgur.com/JGkD5.png)
Unfortunately, this doesn't work when I pass a special character as a parameter. When I hardcode the special character in the Column Delimiter field, everything works as expected.
However, when I have \u0006 as a parameter in a SQL DB (varchar(10) type)
(https://i.stack.imgur.com/GyTdr.png)
and I pass it in the pipeline
(https://i.stack.imgur.com/CUeRf.png)
the Copy Data activity doesn't detect this special character as a delimiter.
My guess is that when I use a parameter it passes \u0006 as a literal string, but I can't find anywhere how to get around that.

I tried passing \u0006 to the column delimiter as dynamic content. It was not treated as a column delimiter; all the data showed up as a single column.
I then passed the equivalent character of \u0006, ACK, as the dynamic value for the column delimiter, and it worked. So the \u0006 string needs to be converted into the actual special character, which can be done with a SQL script. Below are the steps.
File delimiters are stored in a table.
To convert this column into the equivalent character, \u is removed from the value and the resulting hexadecimal digits are converted into an integer. The nchar() function is then applied to that integer.
select nchar(cast(right(file_delimiter,4) as int)) as file_delimiter from t5
The above SQL query is used in a Lookup activity in ADF.
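One caveat: cast(right(file_delimiter, 4) as int) reads the four digits as a decimal number, which happens to give the right answer for \u0006 but would break for code points containing hex letters (e.g. \u001F). A more general sketch, assuming SQL Server and the same t5 table, converts the digits as hexadecimal first:
select nchar(convert(int, convert(varbinary(4), '0x' + right(file_delimiter, 4), 1))) as file_delimiter from t5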
When this value is passed as dynamic content to the column delimiter of the dataset, the values are properly delimited.
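For example, the dataset's column delimiter can be set to an expression along these lines (the activity name LookupDelimiter is illustrative):
@activity('LookupDelimiter').output.firstRow.file_delimiter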
Once the pipeline is run, the data is copied successfully.

Related

Extract table name from schemas of unknown length in a Data Factory expression

I need to extract the table name from schema.table_name, and I have more than one schema whose length is unknown, e.g. Finance.Reporting or Setup.System. I want to extract Reporting and System from these strings using an expression in Data Factory.
You can use the split() function to split the string on the delimiter; it returns an array, from which you take the second value.
Note: array indexes start from 0.
@split('Finance.Reporting','.')[1]
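The same pattern works against dynamic input. For instance, with a hypothetical pipeline parameter named tableName holding the schema.table_name value:
@split(pipeline().parameters.tableName, '.')[1]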

Azure ADF Copy Activity with Trailing Column Delimiter

I have a strange source CSV file where it contains a trailing column delimiter at the end of each record just before the carriage return/new line.
When ADF previews this data, it displays only 2 columns without issue, along with all the data rows. However, when using the Copy activity, it fails with the following exception.
ErrorCode=DelimitedTextColumnNameNotAllowNull,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The name of column index 3 is empty. Make sure column name is properly specified in the header
Now I understand why it's complaining, due to the trailing delimiter, but my question is whether there is a way to deal with this condition. I've tried including the trailing comma in the record delimiter (,\r\n), but then it just pivots the data so that all the columns become rows.
Is there a way to address this condition in copy activity?
When previewing the data in the dataset, it seems correct:
But in the Copy activity the data will actually be split into 3 columns by the column delimiter ",", with the third column empty or NULL. This causes the error.
If you use the Data Flow import projection from the source, you can see the third column:
For now, the Copy activity doesn't support modifying the data schema. You must use a Data Flow Derived Column to create a new schema for the source. For example:
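A sketch of what the Derived Column expressions could look like, assuming the file really contains two data columns (byPosition() picks a source column by its 1-based ordinal, so the empty trailing column is simply never referenced):
col1 = toString(byPosition(1))
col2 = toString(byPosition(2))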
Then mapping the new columns/schema to the sink will solve the problem.
HTH.
Use a different encoding for your CSV. CSV UTF-8 will do the trick.

ADF copy task field type boolean as lowercase

In ADF I have a Copy task that copies data from JSON to delimited text, and I get the result as
A | B | C
"name"|False|"description"
The JSON record is like
{"A":"name","B":"false","C":"description"}
The expected result is as below
A | B | C
"name"|false|"description"
The boolean value has to be in lowercase in the resulting delimited text file. What am I missing?
I can reproduce this. The reason is that you are converting the string to the ADF data type "Boolean", which for some reason renders the values in proper case.
Do you really have a receiving process which is case-sensitive? If you need to maintain the case of the source value, simply remove the mapping, i.e.
If you do need some kind of custom mapping, then simply change the mapping data type to String and not Boolean.
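For reference, a minimal sketch of what that explicit String mapping could look like in the Copy activity's translator (the structure follows ADF's TabularTranslator; the column names are taken from the sample above):
"translator": {
    "type": "TabularTranslator",
    "mappings": [
        { "source": { "path": "$['A']" }, "sink": { "name": "A", "type": "String" } },
        { "source": { "path": "$['B']" }, "sink": { "name": "B", "type": "String" } },
        { "source": { "path": "$['C']" }, "sink": { "name": "C", "type": "String" } }
    ]
}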
UPDATE after new JSON provided
OK, so your first JSON sample has the "false" value in quotes, so it is treated as a string. In your second example, "true" is not in quotes, so it is a genuine JSON boolean value. ADF auto-detects this at run time, and as far as I can tell it cannot be overridden. Happy to be corrected. As an alternative, consider altering the original JSON to a string, as per your original example, OR copy the file to Blob Storage or Azure Data Lake, run some transform on it (e.g. Databricks), and then output the file. Alternatively, consider Mapping Data Flows.
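If you do go the Mapping Data Flows route, a Derived Column along these lines (a sketch; the column name B comes from the sample above) would force the lowercase rendering:
B = toLower(toString(B))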

How to pass an array for column pattern matching in mapping dataflow derived column from CSV file through pipeline?

I have a mapping data flow with a derived column, where I want to use a column pattern for matching against an array of columns using in()
The data flow is executed in a pipeline, where I set the parameter $owColList_md5 based on a variable that I've populated from a single-line CSV file containing a comma-separated string
If I have a single column name in the CSV file/variable, encapsulated in single quotes, and have the "Expression" checkbox ticked, it works.
The problem is getting it to work with multiple columns. There seem to be parsing problems when the variable holds multiple items, each encapsulated in single quotes, or potentially with the comma separating them. This often causes errors executing the data flow, with messages like "store is not defined" etc.
I've tried having ''col1'',''col2'' and "col1","col2" (2x single quotes, and double quotes) in the CSV file. I've also tried having the file without quotes and replacing the comma with escaped quotes in the derived column pattern expression, with no luck.
How do you populate this array in the derived column from the data flow parameter, which in turn comes from the comma-separated string of column names in the CSV file / pipeline variable?
While array types are not supported as data flow parameters, passing in a comma-separated string can work if you use the instr() function to match.
Say you have two columns, col1 and col2. Pass in a parameter with value '"col1","col2"'.
Then use instr($<yourparamname>, '"' + name + '"') > 0 to see if the column name exists within the string you pass in. Note: you do not need the double quotes, but they can be useful if you have column names that are subsets of other column names, such as id1 and id11.
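For example, with the $owColList_md5 parameter from the question, the column pattern's matching condition would be the first line below; the second line is a plausible per-column expression if the intent behind the parameter name is to md5-hash every matched column ($$ refers to the matched column's value):
instr($owColList_md5, '"' + name + '"') > 0
md5(toString($$))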
Hope this helps!

How to use a dynamic comma separated String value as input for a List()?

I'm building a Spark Scala application that dynamically lists all tables in a SQL Server database and then loads them to Apache Kudu.
I'm building a dynamic string variable that tracks the primary key columns for each table. The primary keys are comma separated within the variable. The following is an example of my variable value:
PrimaryKeys=storeId,storeNum,custId
The following is a required function to which I must pass a List[String] as input (so the raw PrimaryKeys string is definitely not correct):
setRangePartitionColumns(List("storeId", "storeNum", "custId").asJava)
If I just use the PrimaryKeys variable for the List input (like the following), it only works for a single column (and would fail in this example with 3 comma-separated values):
setRangePartitionColumns(List(PrimaryKeys).asJava)
The following is another example, but using a Seq(). I'm supposed to put the same primary key column names in the same format below. Manually typing the column names works fine; however, I cannot figure out how to dynamically input the variable values:
kuduContext.createTable(tableName, df.schema, Seq(PrimaryKey), kuduTableOptions)
Any idea how I can parse the PrimaryKeys variable dynamically and feed it into either function, regardless of the number of comma-separated values included?
Any assistance is greatly appreciated.
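The usual approach is to split the string into a Scala collection first; both calls then accept it. A minimal sketch, using the identifiers from the question (setRangePartitionColumns, kuduContext, tableName, df, kuduTableOptions):

import scala.collection.JavaConverters._

// e.g. PrimaryKeys = "storeId,storeNum,custId", built dynamically elsewhere
val keyList: List[String] = PrimaryKeys.split(",").map(_.trim).toList

// List[String] -> java.util.List[String] for the Java-based Kudu API
setRangePartitionColumns(keyList.asJava)

// A Scala List is also a Seq, so the Seq-based overload takes it directly
kuduContext.createTable(tableName, df.schema, keyList, kuduTableOptions)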