I have a Logic App, triggered on a recurrence, that first runs an ADF
pipeline which outputs a folder of files.
Then I use a List Blobs action to pull one specific file
from the newly created folder and place its path on a queue.
Once a message is placed on that queue, it triggers a run of
another ADF pipeline.
The issue is that I have not found a way to get the output of the first ADF pipeline onto the queue. I have tried to cheat in the List Blobs action that follows the first ADF pipeline by explicitly searching for the name of the output folder, because it will be the same every time.
However, even after the first ADF pipeline has run and produced the folder, on the first run of this Logic App the List Blobs action can't find the folder and reports that the file path is not found.
Only when I run the Logic App a second time is the folder finally found, which is far from optimal. How can I fix this? I would prefer to keep everything in one Logic App. Are there other Azure tools that could help as well?
I don't have the details of your implementation, but I am wondering whether the message written by the first pipeline is only used as a signal to the second pipeline. If that's the case, why can't you call the second pipeline on completion of the first one? Maybe these pipelines are in different ADFs?
I suggest you read up on event triggers and see whether you can use them.
I have a pipeline, Pipeline A, which needs to be triggered when a file is received in a Blob container.
But the complex part is that there are 2 files, A.JSON and B.JSON, which will be generated in 2 different locations.
So when A.JSON is generated in location 1, Pipeline A should trigger, and when B.JSON is generated in location 2, Pipeline A should also trigger. I have set up the blob trigger for 1 file in 1 location, but I am not sure how to do it when 2 different files arrive in 2 different locations.
There are three ways you could do this.
Use ADF directly, with conditions that evaluate whether the triggering file comes from a specific path.
Set up a Logic App for each path you want to monitor for created blobs.
Add two different triggers configured for different paths (best option).
First method (this has the overhead of running every time a file is created in the container):
Edit the trigger to look through the whole storage account or all containers. Select the file type: JSON in your case.
Parameterize the source dataset for a dynamic container and file name.
Create parameters in the pipeline: one for each folder path you want to monitor and one to hold the triggered file name.
where receive_trigger_files will be assigned the triggered file name dynamically.
I am showing an example here where a lookup activity would evaluate the path and execute the respective activities forward if triggered file path and our monitoring paths match.
another for the path2
For example a Get MetaData activity or any in your scenario
Let's manually debug and check for a file exercise01.json that is stored in path2.
You can also use an If Condition activity similarly, but it would require multiple steps, and monitoring via activity statuses would be less clear.
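The path check in the first method can be sketched as plain logic outside ADF. This is a minimal Python illustration, not ADF expression code: the function name `route_triggered_file` and the sample folder paths are hypothetical, and it assumes the trigger hands the pipeline the folder path of the file that fired it.

```python
from typing import Dict, Optional


def route_triggered_file(folder_path: str, monitored_paths: Dict[str, str]) -> Optional[str]:
    """Return the branch whose monitored path matches the trigger's folder path."""
    # Strip slashes so "container/location1/" and "container/location1" compare equal.
    normalised = folder_path.strip("/")
    for branch, path in monitored_paths.items():
        if normalised == path.strip("/"):
            return branch
    # No match: the run should end without executing any downstream activity.
    return None


# Hypothetical monitored locations standing in for path1 and path2.
monitored = {"path1": "container/location1", "path2": "container/location2"}

print(route_triggered_file("container/location2/", monitored))
print(route_triggered_file("container/other", monitored))
```

In this sketch a file landing in the second location selects the `path2` branch, and a file from any other folder selects nothing, which corresponds to the pipeline run finishing without doing work.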
Second method: Set up a blob-triggered Logic App.
Run the ADF pipeline using the Create a pipeline run action, and set or pass the appropriate parameters as explained previously.
Third method: Add 2 triggers, one for each path where you wish to monitor blob creation.
I would like to use an event based trigger to run a data factory pipeline.
The trigger will check a folder in a data lake for any new file and start a pipeline once a new CSV file is copied.
The pipeline will then copy the data to an intermediate table to check its consistency (multiple checks using different data flow activities) and if everything's correct, copies it into a stage table.
It is thus very important that the intermediate table will contain the data from only one single CSV file before it is checked.
I have read though that the event based trigger will start in parallel as many pipelines as the (simultaneously) downloaded CSV files.
Is this right? If so, how can I force each pipeline to wait until the previous one is done?
Thank you for your help.
There is a setting in the pipeline properties (accessible in the top-right of the editor pane) called concurrency. Set this to 1 and only one run will execute at a time; any other invocations will be queued until that one finishes.
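In the pipeline's JSON definition this setting appears as the `concurrency` property. A minimal sketch follows, built as a Python dict purely for illustration; the pipeline name is hypothetical and the activities list is left empty.

```python
import json

# Hypothetical pipeline definition; only the "concurrency" property matters here.
pipeline_definition = {
    "name": "LoadCsvToStage",  # assumed name, not taken from the question
    "properties": {
        "concurrency": 1,  # at most one run at a time; further runs are queued
        "activities": [],  # activities omitted in this sketch
    },
}

print(json.dumps(pipeline_definition, indent=2))
```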
I have 2 build pipelines in my Azure DevOps project, one for building the source code and the other for
creating the setup.
I want the build number generated by the first pipeline, which compiles the code, to be passed to the next pipeline, which creates the setup file, because I want the setup file to take the same version. So I added a variable group with a variable called sharedBuildCounter.
But when I set sharedBuildCounter to the build number in the first pipeline using a logging command like this (inside a PowerShell task):
Write-Host "##vso[task.setvariable variable=variable_name;]new_value"
The variable indeed takes the new value and I am able to output the new value using another PowerShell task with one line:
Write-Host $(SharedBuildCounter)
But when I run the next pipeline, which builds the setup, I find that sharedBuildCounter has been reset to its default empty value.
Note: I found threads that suggest using REST API calls to change variable values, but they don't seem to include a specific pipeline name when using pipeline variables (not variable groups).
Variable groups help share static values across build and release pipelines.
What you need is a way to pass variables from one pipeline to another. I'm afraid there is no official way to do this.
As a workaround, you could update the value of the variables inside your variable group. There are multiple ways to handle this: the REST API, PowerShell, or a 3rd-party extension. For details, please refer to the answers to this question: How to Increase/Update Variable Group value using Azure Devops Build Definition?
If you want to get the value of the variable in the pipeline: since you used a logging command to update that variable, you need to use the REST API to get that particular build's log and fetch the related info.
You can use Azure Artifacts to pass information between pipelines. In one pipeline, you write the values to a file and publish the file to an artifact. In the other pipeline, you download the artifact and read the file.
There may be other ways to do it. Azure DevOps allows for free and infinite use of Azure Artifacts in this fashion.
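The file handoff described above can be sketched in Python as follows. This is only an illustration of the write-then-read idea: the file name `build-info.json`, the `buildNumber` key, and the sample version are assumptions, and a temporary directory stands in for the published and downloaded artifact.

```python
import json
import tempfile
from pathlib import Path


def write_build_info(directory: Path, build_number: str) -> Path:
    """First pipeline: write the value to share into a file that gets published."""
    target = directory / "build-info.json"  # hypothetical file name
    target.write_text(json.dumps({"buildNumber": build_number}), encoding="utf-8")
    return target


def read_build_info(directory: Path) -> str:
    """Second pipeline: read the value back after downloading the artifact."""
    data = json.loads((directory / "build-info.json").read_text(encoding="utf-8"))
    return data["buildNumber"]


# Simulate both pipelines; the temporary directory stands in for the artifact.
with tempfile.TemporaryDirectory() as tmp:
    artifact_dir = Path(tmp)
    write_build_info(artifact_dir, "1.0.42")
    restored = read_build_info(artifact_dir)

print(restored)
```

The setup pipeline would then use the restored value as its version instead of relying on a shared variable.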
See How to get variable values from pipeline resources in azure pipelines.
I have an Azure blob container where some JSON files with data get put every 6 hours, and I want to use Azure Data Factory to copy them to an Azure SQL DB. The file name pattern is: "customer_year_month_day_hour_min_sec.json.data.json"
The blob container also has other JSON data files, so I have to filter for the files in the dataset.
First question: how can I set the file path on the blob dataset to only look for the JSON files that I want? I tried the wildcard *.data.json but that doesn't work. The only filename wildcard I have gotten to work is *.json
Second question: how can I copy data only from the new files (with the specific file pattern) that land in the blob storage to Azure SQL? I have no control over the process that puts the data in the blob container and cannot move the files to another location, which makes it harder.
Please help.
You could use an ADF event trigger to achieve this.
Define your event trigger as 'blob created' and specify the blobPathBeginsWith and blobPathEndsWith properties based on your filename pattern.
For the first question, when an event trigger fires for a specific blob, the event captures the folder path and file name of the blob into the properties @triggerBody().folderPath and @triggerBody().fileName. You need to map these properties to pipeline parameters and pass the @pipeline().parameters.parameterName expression to the fileName in your copy activity.
This also answers the second question: each time the trigger fires, you get the folder path and file name of the newly created file in @triggerBody().folderPath and @triggerBody().fileName.
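To see how a prefix and suffix pair narrows down the blobs, here is a small Python sketch of the matching idea. It only mimics the begins-with/ends-with check and is not how the trigger is implemented; the container name, folder, and sample file names are assumptions, and only the `.data.json` suffix comes from the question.

```python
def matches_trigger(blob_path: str, begins_with: str, ends_with: str) -> bool:
    """Mimic a prefix/suffix filter on a blob path (illustration only)."""
    return blob_path.startswith(begins_with) and blob_path.endswith(ends_with)


# Assumed values; in the trigger these would go into blobPathBeginsWith
# and blobPathEndsWith.
begins_with = "/mycontainer/blobs/customer_"
ends_with = ".data.json"

wanted = "/mycontainer/blobs/customer_2020_01_15_06_00_00.json.data.json"
other = "/mycontainer/blobs/orders_2020_01_15.json"

print(matches_trigger(wanted, begins_with, ends_with))
print(matches_trigger(other, begins_with, ends_with))
```

Because the suffix check is a plain ends-with comparison rather than a wildcard, a value such as `.data.json` excludes the other `.json` files in the container.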
Thanks.
I understand your situation. Seems they've used a new platform to recreate a decades old problem. :)
The pattern I would set up first looks something like:
Create a Storage Account Trigger that will fire on every new file in the source container.
In the triggered Pipeline, examine the blob name to see if it fits your parameters. If not, just end, taking no action. If so, binary copy the blob to an account/container your app owns, leaving the original in place.
Create another Trigger on your container that runs the import Pipeline.
Run your import process.
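The name check in the second step could look roughly like this in Python. It is a sketch under an assumption about the file names: a customer token followed by a four-digit year and five two-digit parts, ending in `.json.data.json`. The regular expression and the function name `should_copy` are illustrative, so adjust them to the real names.

```python
import re

# Assumed shape: <customer>_<yyyy>_<mm>_<dd>_<hh>_<mi>_<ss>.json.data.json
NAME_PATTERN = re.compile(r"[^_]+_\d{4}(_\d{2}){5}\.json\.data\.json")


def should_copy(blob_name: str) -> bool:
    """Return True when the blob name fits the expected pattern, else False."""
    return NAME_PATTERN.fullmatch(blob_name) is not None


for name in ["acme_2020_01_15_06_00_00.json.data.json", "report.json"]:
    action = "copy to own container" if should_copy(name) else "end, no action"
    print(f"{name}: {action}")
```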
A couple of caveats your management has to understand. You can be very, very reliable, but you cannot guarantee compliance because there is no transaction/contract between you and the source container. Also, there may be a sequence gap, since a small file can often finish processing while a larger file is still processing.
If for any reason you do miss a file, all you need to do is copy it to your container where your process will pick it up. You can load all previous blobs in the same way.
I have configured a Stream Analytics job so that input data goes to an Azure Data Lake repository every hour.
Sometimes there is no event to track, so there is no output. But my Data Factory pipeline then fails because the file doesn't exist.
Is there a way to force Stream Analytics to write an empty file?
Many thanks!
You can look at our common query patterns here. In particular I think you can use the one named "fill missing values" to generate some events regularly, even when there is no input.
Let me know if it works for you.
Thanks!
JS
Are you using ADF v2?
I didn't find anything built into ADF to handle it.
But I can see a few workarounds, starting from the simplest one:
In your ASA query, you can use a WITH statement and union your input with a fake empty message. Then there will always be output.
As a second output of the ASA job, you can record in some DB whenever a file was produced. Then in ADF you can check whether there are files and run the copy conditionally.
In ADF, run a Web activity (e.g. calling a Logic App or Function App) to find out whether files exist in the container.
Found a way to do it...
I had an activity using Data Lake Analytics; what I do is run a U-SQL script that reads the data with no transformation and writes it to the output with headers.
That way the activity always writes an output file!
Very easy!