ADF - replace switch statement for dataflows with a parameterized solution

I have a switch statement that looks at a variable value and based on that determines which data flow to execute. The problem with this is, I need to update the switch statement every time I add a new ID/dataflow.
Is there an alternative design to this? What if my dataflows had the same name as the variable value - would it be possible to parameterize the name of the dataflow to execute?
e.g. if the variable value is "1", execute the data flow named "1_dataflow"; if it's "2", execute "2_dataflow", and so on. How would I accomplish this?

Yes, you can parameterize the values in any activity in Azure Data Factory and make the pipeline dynamic instead of giving hard-coded values.
You can use parameters to pass external values into pipelines, datasets, linked services, and data flows. Once the parameter has been passed into the resource, it cannot be changed. By parameterizing resources, you can reuse them with different values each time.
You can also use the Set Variable activity to define a variable and assign it a value, which you can then use in the Switch activity and change later.
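For illustration, here is a minimal sketch of that pattern, assuming a pipeline variable named flowId and data flows named 1_dataflow, 2_dataflow, etc. (all names are hypothetical, and the Execute Data Flow entries are abridged). A Set Variable activity assigns the value, and the Switch activity's on property reads it through an expression:

    {
        "name": "SetFlowId",
        "type": "SetVariable",
        "typeProperties": {
            "variableName": "flowId",
            "value": {
                "value": "@pipeline().parameters.flowId",
                "type": "Expression"
            }
        }
    },
    {
        "name": "RouteToDataFlow",
        "type": "Switch",
        "dependsOn": [ { "activity": "SetFlowId", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
            "on": { "value": "@variables('flowId')", "type": "Expression" },
            "cases": [
                { "value": "1", "activities": [ { "name": "Run_1_dataflow", "type": "ExecuteDataFlow" } ] },
                { "value": "2", "activities": [ { "name": "Run_2_dataflow", "type": "ExecuteDataFlow" } ] }
            ],
            "defaultActivities": []
        }
    }

Note that this still enumerates one case per data flow inside the Switch; what is parameterized is the on expression (and any values you pass down to the individual Execute Data Flow activities), so adding a new ID/data flow still requires adding a case.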
Refer: How to use parameters, expressions and functions in Azure Data Factory, Set Variable Activity in Azure Data Factory and Azure Synapse Analytics

Related

Persistable key value pair storage in Synapse or ADF

I am using Synapse and have a lot of scenarios where I need to read a value at the beginning of a pipeline and then save a value at the end of the pipeline as a key-value pair (kvp). For example, when the pipeline begins I read a value from a kvp store to get the max date from the last time the pipeline ran, and I use that value to get all rows from a table that are greater than or equal to that datetime. When the pipeline finishes doing what it has to do, I save the max modified date from this run. Wash, rinse, dry. I have a few ideas, like a parquet file or Redis (which seems a bit much). Just trying to see if anyone has come up with a more elegant/simple approach.
You can use Global Parameters, which can be used across different pipelines and whose values can be modified at runtime.
Go to Manage in Azure Data Factory and click on Global Parameters in the left panel options. Then click on + New.
Create a new Global Parameter.
You can then use this global parameter in any pipeline and change its value at runtime.
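As a minimal sketch (the parameter name LastMaxDate and the activity name are hypothetical), a global parameter is referenced in any pipeline through the @pipeline().globalParameters expression, for example in a Set Variable activity:

    {
        "name": "ReadWatermark",
        "type": "SetVariable",
        "typeProperties": {
            "variableName": "watermark",
            "value": {
                "value": "@pipeline().globalParameters.LastMaxDate",
                "type": "Expression"
            }
        }
    }

The same expression can be used anywhere dynamic content is accepted, such as a source query or a data flow parameter.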

ADF - what's the best way to execute one from a list of Data Flow activities based on a condition

I have 20 file formats and 1 Data Flow activity that maps to each one of them. Based on the file name, I know which data flow activity to execute. Is the only way to handle this through a "Switch" activity? Is there another way? E.g. can I parameterize the data flow to execute by a variable name?
Unfortunately, there is no option to run one out of a list of data flows based on an input condition.
To perform data migration and transformation for multiple tables, you can use the same data flow and parameterize it by providing the table names either at runtime or from a control table that holds all the table names, calling the Data Flow activity inside a ForEach. In the sink settings, use the merge schema option.
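A rough sketch of the control-table approach (the activity, data flow, and column names are hypothetical, and the exact JSON shape may differ slightly from what the authoring UI generates), assuming a Lookup activity named GetTableList has already read the control table with firstRowOnly set to false:

    {
        "name": "ForEachTable",
        "type": "ForEach",
        "dependsOn": [ { "activity": "GetTableList", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
            "items": {
                "value": "@activity('GetTableList').output.value",
                "type": "Expression"
            },
            "activities": [
                {
                    "name": "RunSharedDataFlow",
                    "type": "ExecuteDataFlow",
                    "typeProperties": {
                        "dataflow": {
                            "referenceName": "shared_dataflow",
                            "type": "DataFlowReference",
                            "parameters": {
                                "tableName": {
                                    "value": "'@{item().TableName}'",
                                    "type": "Expression"
                                }
                            }
                        }
                    }
                }
            ]
        }
    }

Each iteration calls the same shared_dataflow, and the data flow uses its tableName parameter in the source and sink instead of a hard-coded table.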

Azure DevOps classic pipeline difference between linked parameters and variables?

What is the difference between linked task parameters (process parameters) and variables in classic Azure DevOps build pipeline? Don't they all allow having a single place where to change values?
What I mean by "linked" task parameters is what you get by clicking the link icon when configuring a task, which adds a textbox for the linked value on the settings page of the pipeline.
Regarding parameters in the classic pipeline, we generally use process parameters. You can link all important arguments for tasks used across the build definition as process parameters, which are then shown in one place: the Pipeline view. This means you can quickly edit these arguments without needing to click through all the tasks. Templates come with a set of predefined process parameters.
Variables give you a convenient way to get key bits of data into various parts of the pipeline. The most common use of variables is to define a value that you can then use in your pipeline. All variables are stored as strings and are mutable. The value of a variable can change from run to run or job to job of your pipeline.
The difference between them is:
Variables can be a convenient way to collect information from the user up front. You can also use variables to pass data from step to step within a pipeline. Unlike variables, pipeline parameters can't be changed by a pipeline while it's running.
Parameters have data types such as number and string, and they can be restricted to a subset of values. Restricting the parameters is useful when a user-configurable part of the pipeline should take a value only from a constrained list. The setup ensures that the pipeline won't take arbitrary data.
Process parameters differ from variables in the kind of input they support. Variables only take string inputs, while process parameters also support additional input types such as check boxes and drop-down list boxes.
For detailed information, please refer to the following documents:
Define variables
Process parameters
Variables and parameters

Do pipeline variables persist between runs?

I'm doing a simple data flow pipeline between 2 cosmos dbs. The pipeline starts with the dataflow, which grabs the pipeline variable "LastPipelineStartTime" and passes that parameter to the dataflow for the query to use in order to get all new data where c._ts >= "LastPipelineStartTime". Then, on data flow success, updates the variable via Set Variable to the pipeline.TriggerTime(). Essentially so I'm always grabbing new data between pipeline runs.
My question is: it looks like the variable during each debug run reverts back to its Default Value of 0, and instead grabs everything each time. Am I misunderstanding or using pipeline variables wrong? Thanks!
As far as I know, a variable set in the Set Variable activity has its own life cycle: the current execution of the pipeline. Changes to a variable do not persist into the next execution.
To implement your needs, please refer to the workarounds below:
1. If you execute the ADF pipeline on a schedule, you could just pass the schedule time into it as a parameter to make sure you grab only new data.
2. If the frequency is random, persist the trigger time somewhere else (e.g. a simple file in blob storage) and, before the data flow activity, use a Lookup activity to grab that time from the blob storage file.
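A minimal sketch of workaround 2 (the activity name GetWatermarkFromBlob, the dataset behind it, and the column name LastRunTime are all hypothetical): a Lookup reads the stored watermark, a Set Variable activity copies it into the pipeline variable, and at the end of the run you write @pipeline().TriggerTime back to the same file for the next run to pick up:

    {
        "name": "SetLastPipelineStartTime",
        "type": "SetVariable",
        "dependsOn": [ { "activity": "GetWatermarkFromBlob", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
            "variableName": "LastPipelineStartTime",
            "value": {
                "value": "@activity('GetWatermarkFromBlob').output.firstRow.LastRunTime",
                "type": "Expression"
            }
        }
    }

Writing the new watermark back out can be done with, for example, a Copy activity whose source adds @pipeline().TriggerTime as an additional column, or a stored procedure if you keep the watermark in a database table instead of a blob.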

pass multiple dynamic parameters to U-SQL script via data factory

Currently I can pass one parameter to a U-SQL script in a Data Factory workflow, and with that parameter I can apply some pattern to generate file paths. Is there any way to pass a collection of datetime parameters to U-SQL and apply a pattern to generate file paths?
You can pass multiple parameters. U-SQL also allows parameters of type SqlArray<>. I am not sure though if ADF supports passing in such typed values. I think the PowerShell APIs do allow it.
I assume that passing the values as a file will not work, since you will not get compile time partition elimination with it.
Pass a JSON parameter, then handle it with U-SQL.
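As a sketch of that suggestion (the linked service names, script path, and parameter name are hypothetical, and the exact typeProperties may vary), the U-SQL activity in ADF has a parameters block, so the collection of datetimes can be passed as one JSON-encoded string parameter and split or deserialized inside the script:

    {
        "name": "RunUSqlScript",
        "type": "DataLakeAnalyticsU-SQL",
        "linkedServiceName": {
            "referenceName": "AzureDataLakeAnalyticsLS",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "scriptPath": "scripts/generate_paths.usql",
            "scriptLinkedService": {
                "referenceName": "StorageLS",
                "type": "LinkedServiceReference"
            },
            "parameters": {
                "dateList": "[\"2019-01-01T00:00:00\",\"2019-01-02T00:00:00\"]"
            }
        }
    }

Inside the script, @dateList arrives as a single string, which the U-SQL code can split or deserialize to drive the file-path pattern.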