Is there any method available in Azure Data Factory to exit a ForEach loop? I am using a ForEach loop to iterate through files and process them, but when the Copy activity placed inside the loop fails, the loop executes multiple times, reprocessing the failed file. I think it has something to do with the number of files in the Get Metadata array. Can anyone suggest a way to resolve this issue?
Regards,
Sandeep
As @Jay Gong said, Data Factory doesn't support breaking the loop when the inner Copy activity fails.
Others have created a UserVoice request for this in Data Factory; it has been voted up 14 times, but there is still no response.
Hope this helps.
You can't break the ForEach loop, but you can cancel the entire pipeline run that contains the ForEach via the REST API when an error occurs. I will add example code in a blog post next week.
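For reference, a minimal sketch of that approach (not the promised blog post code), assuming an ADF v2 factory and an Azure AD bearer token obtained separately; the subscription, resource group, factory name and run ID below are placeholders. It calls the Pipeline Runs - Cancel endpoint of the Data Factory REST API:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CancelPipelineRun {

    public static void main(String[] args) throws Exception {
        // Placeholder values -- replace with your own subscription, resource group,
        // factory name, the run ID of the failing pipeline, and a valid AAD token.
        String subscriptionId = "<subscription-id>";
        String resourceGroup  = "<resource-group>";
        String factoryName    = "<factory-name>";
        String runId          = "<pipeline-run-id>";
        String bearerToken    = "<aad-access-token>";

        // Pipeline Runs - Cancel endpoint of the Data Factory REST API.
        String url = String.format(
            "https://management.azure.com/subscriptions/%s/resourceGroups/%s"
            + "/providers/Microsoft.DataFactory/factories/%s/pipelineruns/%s/cancel"
            + "?api-version=2018-06-01",
            subscriptionId, resourceGroup, factoryName, runId);

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(url))
            .header("Authorization", "Bearer " + bearerToken)
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());

        // 200 means the run was cancelled; anything else needs investigation.
        System.out.println("Cancel returned HTTP " + response.statusCode());
    }
}
```

Inside the pipeline itself, the same endpoint could be called from a Web activity wired to the Copy activity's failure output (with managed identity authentication and @{pipeline().RunId} supplying the run ID), which is what makes this usable as an "exit the loop" workaround.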
In Azure Data Factory I am using a Lookup activity to get a list of files to download, then pass it to a ForEach where a data flow processes each file.
I do not have 'Sequential' mode turned on, so I would assume the data flows run in parallel. However, their runtimes are not the same; there is an almost constant gap between them (the first data flow ran 4 minutes, the second 6, the third 8, and so on). It seems as if the second data flow waits for the first one to finish and then uses its cluster to process the file.
Is that intended behavior? I have a TTL set on the cluster, but that did not help much. If it is intended, what is a workaround? I am currently working on creating a list of files first and using that instead of a ForEach, but I am not sure whether I will see an increase in efficiency.
I have not been able to solve the issue of the parallel data flows not executing in parallel; however, I have changed the solution in a way that increases performance.
What I had before: a Lookup activity that got a list of files to process, passed on to a ForEach loop containing a data flow activity.
What I am testing now: a data flow activity that gets the list of files and saves it to a text file in ADLS, followed by the data flow activity that was previously in the ForEach loop, with its source changed to use "List of files" and point to that text file.
The result was an increase in efficiency (using the same cluster, 40 files took around 80 minutes with the ForEach and only 2-3 minutes with the list of files); however, debugging is not easy now that everything is in one data flow.
You can overwrite the list-of-files file on each run, or use a dynamic expression to name it after the pipeline run, for example @concat(pipeline().RunId, '.txt'), or something similar.
I have a simple pipeline with a Copy activity that populates a table. That activity is based on a query which will only ever return one row.
The problem I am having is that I want to reuse the value from one of the columns (batch number) to set a variable, so that at the end of the pipeline I can use a stored procedure to log that the batch was processed. I would rather avoid running the query a second time in a Lookup activity, so can I make use of the data already being returned?
I have tried duplicating the column in the Copy activity and then mapping it to something like @BatchNo, but that fails. I have even tried adding a Set Variable activity, but I can't figure out how to take a single column; @{activity('Populate Aleprstw').output} does not error, but I am not sure what that will actually return in this case.
Thanks, and sorry if it's a silly question.
Cheers
Mark
I always do it like this:
Generate a batch number (usually with a proc)
Use a lookup to grab it into a variable
Use the batch number in all activities (might be multiple copies, procs, etc.)
Write the batch completion
From your description it seems you have the batch number embedded in the data copy from the start, which is not typical.
If you must do it this way, is there really an issue with running a lookup again?
Copy activity doesn't return data like that, so you won't be able to capture the results that way. With this design, running the query again in a Lookup is the best option.
Is the query in the Source running on the same Server as the Sink? If so, you could collapse the entire operation into a Stored Procedure that returns the data point you are trying to capture.
I have a Logic App that starts on a recurrence and runs an ADF pipeline which outputs a folder of files. Then I use a List Blobs action to pull one specific file from the newly created folder and place its path on a queue. Once a message is placed on that queue, it triggers the run of another ADF pipeline.
The issue is that I have not found a way to get the output of the first ADF pipeline to put on the queue. I have tried to cheat within the List Blobs action that runs after the first ADF pipeline by explicitly searching for the name of the output folder, because it will be the same every time.
However, even after the first ADF pipeline has run and produced the folder, the List Blobs action can't find the folder during the first run of the Logic App and says the file path is not found.
Only after I run the Logic App a second time is the folder finally found, which is not at all optimal. How can I fix this? I would prefer to keep everything in one Logic App. Are there other Azure tools that could help in addition?
I don't have the details of the implementation, but I am wondering whether the message written by the first pipeline is only used as a signal to the second pipeline. If that's the case, why can't you call the second pipeline on completion of the first one? Or are these pipelines in different data factories?
I also suggest you read up on event-based triggers and see if you can use them.
I have a requirement where one job prepares a file, and another job, which runs once a day, sends the file to an external system and then deletes or moves it from the location. When the second job tries to delete or move the file, it can't access it.
I tried setting writable to true when the file is created, running the jobs at separate times (one job at a time), and adding "delete" as a step to the same job. Nothing worked.
I am using file.delete(). Also tried Files.deleteIfExists().
I suspect the first job is not assigning proper permissions, but I don't know a way around it. How do I set permissions in Spring Batch?
Are these jobs run by the same user, i.e. with the same user and permissions?
Also, what is the actual error message? Does it say permission denied? If so, it is likely an OS restriction, not a Spring Batch/Java limitation.
An easier solution would be to add a step to the first job that sends the files as part of that job, and drop the job that just transfers the files.
Answering my own question 😀. Hope it helps someone.
The issue was that the last ItemWriter was holding the file resources because I was using a composite writer. When delegates are wrapped in a CompositeItemWriter, their beforeStep and afterStep methods are "hidden"; you have to call them explicitly. I took the approach of writing a custom writer which explicitly calls writer.close() (a sketch follows).
Adding an afterStep method and calling super.close() should also work, though I have not tried that out.
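For reference, a minimal sketch of that kind of custom writer, assuming Spring Batch 4 (List-based ItemWriter signature) and hypothetical class and field names. The point is that stream-based delegates such as FlatFileItemWriter are opened and, crucially, closed explicitly around the step, so no file handles are left open for the next job:

```java
import java.util.List;

import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;
import org.springframework.batch.item.ItemStream;
import org.springframework.batch.item.ItemWriter;

// Hypothetical name: a composite-style writer that explicitly opens and closes
// its stream-based delegates (e.g. FlatFileItemWriter) around the step, so the
// output files are released before another job tries to delete or move them.
public class ClosingCompositeItemWriter<T> implements ItemWriter<T>, StepExecutionListener {

    private final List<ItemWriter<? super T>> delegates;

    public ClosingCompositeItemWriter(List<ItemWriter<? super T>> delegates) {
        this.delegates = delegates;
    }

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // Open stream-based delegates ourselves, since they are hidden behind
        // this writer and not registered as streams on the step.
        for (ItemWriter<? super T> writer : delegates) {
            if (writer instanceof ItemStream) {
                ((ItemStream) writer).open(stepExecution.getExecutionContext());
            }
        }
    }

    @Override
    public void write(List<? extends T> items) throws Exception {
        for (ItemWriter<? super T> writer : delegates) {
            writer.write(items);
        }
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // Explicitly close the delegates so the file handles are released.
        for (ItemWriter<? super T> writer : delegates) {
            if (writer instanceof ItemStream) {
                ((ItemStream) writer).close();
            }
        }
        return stepExecution.getExitStatus();
    }
}
```

Depending on the Spring Batch version and how the step is built, the writer may also need to be registered explicitly as a StepExecutionListener on the step for beforeStep/afterStep to be invoked.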
I have configured a Stream Analytics job so that input data goes to an Azure Data Lake store every hour.
Sometimes there is no event to track, so there is no output, but then my Data Factory pipeline fails because the file doesn't exist.
I wonder if there is a way to force an empty output file from Stream Analytics?
Many thanks!
You can look at our common query patterns here. In particular I think you can use the one named "fill missing values" to generate some events regularly, even when there is no input.
Let me know if it works for you.
Thanks!
JS
Are you using ADF v2?
I didn't find anything built into ADF to handle this.
But I can see a few workarounds, starting from the simplest one:
In your ASA query, you can use a WITH statement and union your input with a fake empty message; then there will always be output.
As a second output of the ASA job, store in some database whether a file was produced; then in ADF you can check whether there are files and run the copy conditionally.
From ADF, run a Web activity, e.g. a Logic App or Function App, to find out whether files exist in the container (a sketch of such a check follows this list).
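A minimal sketch of that last check, assuming the Azure Storage Blob SDK for Java (azure-storage-blob v12) and placeholder connection string, container and folder names; it could sit behind the Function App or Logic App that the ADF Web activity calls:

```java
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobServiceClientBuilder;
import com.azure.storage.blob.models.ListBlobsOptions;

public class OutputFileCheck {

    public static void main(String[] args) {
        // Placeholder values -- replace with your storage connection string,
        // container name and the folder (prefix) the ASA job writes into.
        String connectionString = "<storage-connection-string>";
        String containerName    = "<container-name>";
        String folderPrefix     = "<output-folder>/";

        BlobContainerClient container = new BlobServiceClientBuilder()
                .connectionString(connectionString)
                .buildClient()
                .getBlobContainerClient(containerName);

        // True if at least one blob exists under the prefix, i.e. the ASA job
        // produced output for this window.
        boolean hasFiles = container
                .listBlobs(new ListBlobsOptions().setPrefix(folderPrefix), null)
                .iterator()
                .hasNext();

        System.out.println(hasFiles ? "files exist" : "no files");
    }
}
```

ADF could then branch on the returned value with an If Condition activity and only run the copy when files exist.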
Found a way to do it...
I had an activity using Data Lake Analytics: I run a U-SQL script that reads the data with no transformation and writes it to the output with headers.
That way the activity always writes an output file!
Very easy!