Azure Data Factory DataFlow Error: Key partitioning does not allow computed columns

Azure Data Factory DataFlow Error: Key partitioning does not allow computed columns - azure-data-factory

We have a generic dataflow that works for many tables, the schema is detected at runtime.
We are trying to add a Partition Column for the Ingestion or Sink portion of the delta.
We are getting error:
Azure Data Factory DataFlow Error: Key partitioning does not allow computed columns
Job failed due to reason: at Source 'Ingestion'(Line 7/Col 0): Key partitioning does not allow computed columns
Can we pass the partition column as a parameter to a generic dataflow?

Can we pass the partition column as a parameter to a generic dataflow?
I tried your scenario and got similar error.
There is a limitation of key partition method is we cannot apply any calculation to the partition column while declaring it. Instead, this must be created in advanced, either using derived column or read in from source.
To resolve this, you can try following steps -
First, I created a pipeline parameter with datatype string and gave column name as value.
Click on Dataflow >> Go to Parameter >> In value of parameter select Pipeline expression >> and pass the above created parameter.
OUTPUT:
It is taking it as partition key column and partitioning data accordingly.
Reference : How To Use Data Flow Partitions To Optimize Spark Performance In Data Factor

Related

How to map the iterator value to sink in adf

A question concerning Azure Data Factory.
I need to persist the iterator value from a lookup activity (an Id column from a sql table) to my sink together with other values.
How to do that?
I thought that I could just reference the iterator value as #{item().id} as source and a destination column name from from my sql table sink. That doesn’t seems to work. The resulting value in the destination column is NULL.

I have used 2 look up activities, one for id values and the other for remaining values. Now, to combine and insert these values to sink table, I have used the following:
The ids look up activity output is as following:
I have one more column to combine with above id values. The following is the look up output for that:
I have given the following dynamic content as the items value in for each as following:
#range(0,length(activity('ids').output.value))
Inside for each activity, I have given the following script activity query to insert data as required into sink table:
insert into t1 values(#{activity('ids').output.value[item()].id},'#{activity('remaining rows').output.value[item()].gname}')
The data would be inserted successfully and the following is the reference image of the same:

How to map Data Flow parameters to Sink SQL Table

I need to store/map one or more data flow parameters to my Sink (Azure SQL Table).
I can fetch other data from a REST Api and is able to map these to my Sink columns (see below). I also need to generate some UUID's as key fields and add these to the same table.
I would like my EmployeeId column to contain my Data Flow Input parameter, e.g. named param_test. In addition to this I need to insert UUID's to other columns which are not part of my REST input fields.
How to I acccomplish that?

You need to use a derived column transformation, and there edit the expression to include the parameters.
derived column transformation
expression builder

Adding to #Chen Hirsh, use the same derived column to get uuid values to the columns after REST API Source.
They will come into sink mapping:
Output:

How to format the negative values in dataflow?

I have below column in my table
I need an output as below
I am using Dataflow in the Azure data factory and unable to get the above output. I used derived column but no success. I used replace function, but it's not coming correct. Can anyone advise how to format this in dataflow?

Source is taken in data flow with data as in below image.
Derived column transformation is added next to source.
New column is added and the expression is given as
iif(left(id,1)=='-', replace(replace(id,"USD",""),"-","-$"), concat("$", replace(id,"USD","")))
Output of Derived Column activity

How do I query Postgresql with IDs from a parquet file in an Data Factory pipeline

I have an azure pipeline that moves data from one point to another in parquet files. I need to join some data from a Postgresql database that is in an AWS tenancy by a unique ID. I am using a dataflow to create the unique ID I need from two separate columns using a concatenate. I am trying to create where clause e.g.
select * from tablename where unique_id in ('id1','id2',id3'...)
I can do a lookup query to the database, but I can't figure out how to create the list of IDs in a parameter that I can use in the select statement out of the dataflow output. I tried using a set variable and was going to put that into a for-each, but the set variable doesn't like the output of the dataflow (object instead of array). "The variable 'xxx' of type 'Array' cannot be initialized or updated with value of type 'Object'. The variable 'xxx' only supports values of types 'Array'." I've used a flatten to try to transform to array, but I think the sync operation is putting it back into JSON?
What a workable approach to getting the IDs into a string that I can put into a lookup query?
Some notes:
The parquet file has a small number of unique IDs compared to the total unique IDs in the database.
If this were an azure postgresql I could just use a join in the dataflow to do the join, but the generic postgresql driver isn't available in dataflows. I can't copy the entire database over to Azure just to do the join and I need the dataflow in Azure for non-technical reasons.
Edit:
For clarity sake, I am trying to replace local python code that does the following:
query = "select * from mytable where id_number in "
df = pd.read_parquet("input_file.parquet")
df['id_number'] = df.country_code + df.id
df_other_data = pd.read_sql(conn, query + str(tuple(df.id_number))
I'd like to replace this locally executing code with ADF. In the ADF process, I have to replace the transformation of the IDs which seems easy enough if a couple of different ways. Once I have the IDs in the proper format in a column in a dataset, I can't figure out how to query a database that isn't supported by Data Flow and restrict it to only the IDs I need so I don't bring down the entire database.

Due to variables of ADF only can store simple type. So we can define an Array type paramter in ADF and set default value. Paramters of ADF support any type of elements including complex JSON structure.
For example:
Define a json array:
[{"name": "Steve","id": "001","tt_1": 0,"tt_2": 4,"tt3_": 1},{"name": "Tom","id": "002","tt_1": 10,"tt_2": 8,"tt3_": 1}]
Define an Array type paramter and set its default value:
So we will not get any error.

Copying data from a single spreadsheet into multiple tables in Azure Data Factory

The Copy Data activity in Azure Data Factory appears to be limited to copying to only a single destination table. I have a spreadsheet containing rows that should be expanded out to multiple tables which reference each other - what would be the most appropriate way to achieve that in Data Factory?
Would multiple copy tasks running sequentially be able to perform this task, or does it require calling a custom stored procedure that would perform the inserts? Are there other options in Data Factory for transforming the data as described above?

If the columnMappings in your source and sink dataset don't against the error conditions mentioned in this link,
1.Source data store query result does not have a column name that is specified in the input dataset "structure" section.
2.Sink data store (if with pre-defined schema) does not have a column name that is specified in the output dataset "structure" section.
3.Either fewer columns or more columns in the "structure" of sink dataset than specified in the mapping.
4.Duplicate mapping.
you could connect the copy activities in series and execute them sequentially.
Another solution is Stored Procedure which could meet your custom requirements.About configuration,please refer to my previous detailed case:Azure Data Factory mapping 2 columns in one column

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse