I have used a Lookup activity to pass values to a ForEach activity. The Lookup output is generated from a SQL table. Once iteration starts, if one of the activities inside the ForEach fails, the ForEach still runs once for every value in the Lookup output. How do I break out of the loop? I have removed the records from the SQL table to break out of the loop, but the loop continues to run. How can I clear the ForEach items set when an inner activity fails?
Regards,
Sandeep
How can I clear the For Each Items set when an inner activity fails?
No, you can't. The ForEach activity does not currently support break, even if an inner activity fails.
Many users have posted the same question on Stack Overflow and in Data Factory feedback:
It has been voted up 31 times, but there is still no response from the Data Factory product team.
Ref: https://feedback.azure.com/forums/270578-data-factory/suggestions/39673909-foreach-activity-allow-break
Update:
Congratulations on finding a solution for your scenario:
"Now used an until activity by comparing the variable values and count of files out put from a lookup activity to resolve the issue."
I am posting it as an answer so that it can benefit other community members.
Hope this helps.
I replaced the ForEach loop with an Until activity. The inputs for the Until activity were a SQL query (in a Lookup) that returns the count of records in the table where the copied file names are stored, and a variable. The Until condition used the @greater expression to compare the variable value with the Lookup value. Inside the loop, I added logic to increment the variable using a temp variable and the add expression. If an activity in the loop fails, I set the variable to a value greater than the Lookup output value.
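The control flow described above can be sketched outside ADF. This is only a Python emulation of the pattern, not pipeline code: `run_until_loop` and `process` are hypothetical names, the counter starting at 1 is an assumption, and the `while` condition stands in for the Until activity's `greater` comparison.

```python
def run_until_loop(file_names, process):
    """Emulate the Until pattern: counter-driven loop with an early exit on failure."""
    total = len(file_names)   # stands in for the record count returned by the Lookup
    counter = 1               # stands in for the pipeline variable (assumed to start at 1)
    processed = []

    # The Until activity stops once greater(counter, total) is true.
    while not counter > total:
        name = file_names[counter - 1]
        try:
            process(name)     # stands in for the inner activities
            processed.append(name)
            counter += 1      # in the post this is done with a temp variable and add()
        except Exception:
            # On failure, push the counter past the total so the loop ends.
            counter = total + 1
    return processed


def copy_file(name):
    """Hypothetical inner activity that fails for one file."""
    if name == "b.csv":
        raise RuntimeError("copy failed")


print(run_until_loop(["a.csv", "b.csv", "c.csv"], copy_file))
```

With the sample input, the loop stops at the failing file and never attempts the third one, which is the behaviour ForEach cannot provide.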
Related
I'm creating a pipeline in Azure Data Factory and I'm trying to use a lookup activity to query an Azure SQL dataset as so:
The lookup activity is inside a for each loop. My hope is to get a value from the source dataset being queried in the lookup for every item in the for each loop. However, when I preview the data to test this, it does not work and I get this message:
Does anyone have any ideas as to how to call a for each loop item in a query on a dataset in a lookup in ADF?
Thanks,
Carolina
EDIT:
I've changed tack and tried to use a stored procedure, but I'm still having the exact same issue. It seems like I can't reference the ForEach item in a query or a stored procedure. Does anyone know a way around this, or how to reference the item properly?
Data preview in ADF is used to check whether the data being read is correct. When a preview involves dynamic expressions, it does not take their values from a pipeline run; instead, it asks you to enter the values manually.
That is why it asks for a sample item() value before showing the data preview in the Lookup.
When I gave my item() value in that I got correct preview as below:
Preview of lookup that I got for that particular item():
Your approach is fine: when you debug the pipeline with the required query in the Lookup, you get the desired result.
Please go through my sample demonstration:
Array variable for ForEach:
Lookup inside ForEach with query:
Result after debug pipeline:
You can see that the first iteration of the ForEach returned the same result as the Lookup preview above.
So, for preview, we have to give a sample value to check our result.
Is there a way to query a DB table as a one-time activity, so that the values can be used to drive a repeating pipeline activity?
Let's say I have a set of values that varies based on the environment (DEV/TEST/PROD). Instead of passing the values corresponding to the environment as parameters, can I configure these values in a DB table and read them the first time the Data Factory runs, so that a repeating Orchestrator task that runs every five minutes can fetch the value obtained from the table?
You can use a Lookup activity for your case.
Specify your query in the Lookup activity to get the row that holds your environment value. You may also want to check the "First row only" option for your case.
To access the value returned from DB, you can get the value from the output of the Lookup. It would be in the "firstRow" object of the output.
For the conditional/switch handling for your use case, put @activity('Lookup config table').output.firstRow.VALUE in the Switch's dynamic content as the expression.
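To make the shape of that output concrete, here is a small Python model of it. With "First row only" checked, the Lookup output carries a single row under `firstRow`, and the path `activity('Lookup config table').output.firstRow.VALUE` reads one column from it. The column names, values, and branch labels below are made up for illustration.

```python
# Hypothetical Lookup output with "First row only" checked: one row under "firstRow".
lookup_output = {"firstRow": {"ENVIRONMENT": "DEV", "VALUE": "dev"}}


def first_row_value(output: dict, column: str):
    """Mirror of output.firstRow.<column> in the pipeline expression."""
    return output["firstRow"][column]


def switch(value: str, cases: dict, default: str = "default") -> str:
    """Pick the branch whose case label equals the value, else the default branch."""
    return cases.get(value, default)


branch = switch(
    first_row_value(lookup_output, "VALUE"),
    {"dev": "run dev activities", "prod": "run prod activities"},
)
print(branch)
```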
You can use the Lookup activity in Azure Data Factory to query values from the DB and then store them in parameters for use in subsequent activities; please check this
I have a simple ADF pipeline that contains one Lookup (which loads the names of the tables to be migrated) and a ForEach activity (which contains a Copy activity and a Function App that loads data into BQ). I want to get the iteration ID and send it to the Azure Function App.
Let's say the Lookup returns JSON with three tables in it (A, B, C). I want to get the iteration ID inside the ForEach loop, for example 1 for A, 2 for B, and 3 for C.
Any help on this will be highly appreciated.
I agree this is a common requirement, but there seems to be no direct way to get the array index inside the ForEach activity. However, you could try this little trick with the Azure Function activity.
Step 1: Create a text file (named index.txt) in some blob storage path and store the value 1 in it (to use as the array index).
Step 2: Inside the ForEach activity, use a Lookup activity to read the value of index.txt. The first time, it is 1.
Step 3: After that, execute an Azure Function activity to increase the value by 1, so that next time it is 2.
Step 4: When the ForEach activity finishes, you can reset the value to 0 with the Azure Function activity.
There is no need to create two Azure Functions; one is enough. You can pass a boolean parameter to distinguish whether the call is a reset or an increment.
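The single function with a boolean switch could look roughly like the sketch below. It is a plain Python stand-in, not Azure Functions code: `update_index` is a hypothetical name, and a local file takes the place of the index.txt blob. The reset value of 0 follows Step 4 above.

```python
from pathlib import Path


def update_index(index_file: Path, reset: bool, reset_value: int = 0) -> int:
    """Increment the stored index by 1, or reset it when `reset` is true."""
    if reset:
        new_value = reset_value
    else:
        # Read the current index and add 1, as in Step 3.
        new_value = int(index_file.read_text().strip()) + 1
    index_file.write_text(str(new_value))
    return new_value
```

Note that this only behaves as an iteration counter if the ForEach runs its iterations sequentially; with parallel iterations, several reads and writes of the same file would race.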
In the lookup table from which I pick the source and destination tables/databases, I added another column holding an iterator number (1, 2, 3, 4, ...) for each row that the Lookup activity retrieves.
Then, inside Azure Data Factory, I read that column inside the ForEach loop. Each source and destination table has its own self-made iterator, which I used for my purpose. It worked perfectly fine for me.
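The effect of that extra column can be illustrated in Python. This is only a model of the data, not ADF code; the row shape and the `IterationId` column name are assumptions for the example.

```python
def add_iteration_ids(rows, start=1):
    """Return copies of the rows with a 1-based IterationId column added."""
    return [{**row, "IterationId": i} for i, row in enumerate(rows, start=start)]


# Hypothetical Lookup rows for tables A, B and C.
tables = add_iteration_ids([{"TableName": "A"}, {"TableName": "B"}, {"TableName": "C"}])

for item in tables:
    # Inside the ForEach, each item now carries its own iteration number.
    print(item["TableName"], item["IterationId"])
```

Because the number travels with each row, it stays correct even if the iterations run in parallel.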
I have an SSRS 2008 R2 report that uses this expression in a table:
=Lookup(Fields!DataSet1Date.Value, Fields!DataSet2Date.Value, Fields!DataSet2Price.Value, "DataSet2")
I have 2 data sets and am using the Lookup function to get data from one dataset based on the date in another dataset.
My problem is that this works on machines that I have tried it on, but others are getting errors like this:
Error 1 [rsFieldReference] The Value expression for the text box ‘Col_D2Price’ refers to the field ‘DataSet2Date’. Report item expressions can only refer to fields within the current dataset scope or, if inside an aggregate, the specified dataset scope.
Error 2 [rsFieldReference] The Value expression for the text box ‘Col_D2Price’ refers to the field ‘DataSet2Price’. Report item expressions can only refer to fields within the current dataset scope or, if inside an aggregate, the specified dataset scope.
What other things can we do to troubleshoot this issue? We are all using the same 2008R2 version.
I often get this "phantom" error when using the Lookup function. I call it phantom because I can find no reason for it anywhere, yet the error still pops up.
The only way around it in my cases is to use the secondary function LookupSet.
Hope I've helped.
Edit:
Furthermore, you've intrigued me, so I've done some research:
The Lookup function is only for a 1-to-1 relationship.
The LookupSet function is for a 1-to-many relationship.
The MultiLookup function is for many 1-to-1 relationships, i.e. an array of single values where there is only one value in the second dataset for each. Not relevant, but quite interesting.
I also came across a potential fix: on the new machines, try opening the datasets in the report and refreshing all fields in the dialog box. For some reason this may relink the fields to this expression. Go figure...
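The difference between the three lookup functions can be modelled in a few lines of Python. These are illustrative re-implementations of the semantics only, not SSRS code; the sample rows reuse the field names from the question with made-up values.

```python
def lookup(source, rows, key, result):
    """1-to-1: return the result field of the first matching row, or None."""
    for row in rows:
        if row[key] == source:
            return row[result]
    return None


def lookup_set(source, rows, key, result):
    """1-to-many: return the result field of every matching row."""
    return [row[result] for row in rows if row[key] == source]


def multi_lookup(sources, rows, key, result):
    """Many 1-to-1: one first-match lookup per source value."""
    return [lookup(s, rows, key, result) for s in sources]


# Hypothetical contents of DataSet2.
dataset2 = [
    {"DataSet2Date": "2012-01-01", "DataSet2Price": 10},
    {"DataSet2Date": "2012-01-01", "DataSet2Price": 12},
    {"DataSet2Date": "2012-01-02", "DataSet2Price": 15},
]

first = lookup("2012-01-01", dataset2, "DataSet2Date", "DataSet2Price")
every = lookup_set("2012-01-01", dataset2, "DataSet2Date", "DataSet2Price")
many = multi_lookup(["2012-01-01", "2012-01-02"], dataset2, "DataSet2Date", "DataSet2Price")
```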
Ok, I have a question relating to an issue I've previously had. I know how to fix it, but we are having problems trying to reproduce the error.
We have a series of procedures that create records based on other records. The records are linked to the primary record by way of a link_id. In a procedure that grabs this link_id, the query is
select @p_link_id = id --of the parent
from table
where thingy_id = (blah)
Now, there are multiple rows in the table for the activity. Some can be cancelled. My code doesn't exclude cancelled rows in the select statement, so if there are previously cancelled rows, those ids will appear in the select. There is always going to be exactly one 'open' record selected if I exclude cancelled rows (by appending where status != 'C').
This solves this issue. However, I need to be able to reproduce the issue in our development environment.
I've gone through a process where I've entered a whole heap of data, opening, cancelling, etc., to try to get this select statement to return an invalid id. However, whenever I run the select, the ids come back in order (sequence generated), whereas in the case where this error occurred, the select statement put what seems to be the first value into the variable.
For example.
ID Status
1 Cancelled
2 Cancelled
3 Cancelled
4 Open
Given the above, if I do a select for the ID I want, I want to get '4'. In the error, the result is 1. However, even if I enter in 10 cancelled records, I still get the last one in the select.
In Oracle, I know that if you select into a variable and more than one record is returned, you get an error (I think). Sybase apparently can assign multiple values to a variable without erroring.
I'm thinking that either it has something to do with how the data is selected from the table, where ids without a sort order don't come back in ascending order, or there's a dboption that decides whether a select into a variable keeps the first or the last value queried.
Edit: it looks like we can reproduce this error by rolling back stored procedure changes. However, the procs don't go anywhere near this link_id column. Is it possible that changes to the database architecture could break an index or something?
If more than one row is returned, the value that is stored will be the last value in the list, according to this.
If you haven't specified an order for retrieval via ORDER BY, then the order returned will be at the convenience of the database engine. It may very well vary by the database instance. It may be in the order created, or even appear "random" because of where the data is placed within the database block structure.
The moral of the story:
1. Always make singleton SELECTs return a single row.
2. When #1 can't be done, use an ORDER BY to make sure the one you care about comes last.
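The "last row wins" behaviour is easy to see in an emulation. The sketch below uses Python's sqlite3 module and a loop that reassigns a variable for every returned row; it is not Sybase, and the table name `links`, the status code 'O', and the sample data are assumptions made for the example.

```python
import sqlite3


def select_into_variable(conn, query):
    """Emulate 'select @var = col': assign once per row, so the last row wins."""
    value = None
    for (row_id,) in conn.execute(query):
        value = row_id
    return value


conn = sqlite3.connect(":memory:")
conn.execute("create table links (id integer, thingy_id integer, status text)")
conn.executemany(
    "insert into links values (?, ?, ?)",
    [(1, 7, "C"), (2, 7, "C"), (3, 7, "C"), (4, 7, "O")],
)

# The same rows in a different retrieval order leave a different value in the variable.
ascending = select_into_variable(conn, "select id from links where thingy_id = 7 order by id")
descending = select_into_variable(conn, "select id from links where thingy_id = 7 order by id desc")

# Excluding cancelled rows makes the select a true singleton, so order no longer matters.
fixed = select_into_variable(conn, "select id from links where thingy_id = 7 and status != 'C'")

print(ascending, descending, fixed)
```

This is consistent with the error report: if the engine happens to return the rows in a different order in production than in development, the variable ends up holding a cancelled id.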