How to retrieve the continuationToken from a REST API while using Azure Data Factory?

I'm trying to use Azure Data Factory's Copy activity to copy data from an API to Azure Blob Storage. I have set up the source and sink correctly, so when I trigger the pipeline it pulls and loads the first batch of data, but I am struggling with pagination.
When I trigger the pipeline, it loads the first page correctly, but afterward it doesn't return the next continuation token for fetching more data from the API. If I use an Until or ForEach activity, the pipeline copies the data for the same continuation token endlessly until it times out.
When I run the REST API call in Postman, it returns the data along with the next continuation token.
The continuation token looks like 0000xxxx-00000-xxx00-000000xx000000 and the next continuation token looks like 0000xxyy-00000-xxx00-000000yy000000.
My goal is to retrieve data from the REST API using a continuation token, read the next continuation token from each response, and keep fetching pages until the continuation token is null, storing the results in Azure Blob Storage with an Azure Data Factory pipeline.
I am able to retrieve the access token from the REST API, but only the first page of data.
Is there any way to solve this issue? Please let me know.

Assuming that your Web activity returns an array of elements to be processed, you can do the following:
Chain a ForEach activity to iterate through the array.
In the ForEach activity's Items setting, use something like the following (assuming the array is named outputArray): @activity('webActivity').output.outputArray
Inside the ForEach activity, you can have a parameterized Web activity that handles each item in the array. Each item can be accessed using the expression:
@item()
Please check this link: https://imgur.com/a/LE1CHvp
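For reference, the loop the question describes (fetch a page, read the next continuation token, repeat until the token is null) can be sketched outside ADF in Python. This is only a minimal sketch of the control flow, not an ADF feature: the field names "data" and "continuationToken", the fetch_page callback, and the fake_api stand-in are all assumptions made for illustration. The guard against a repeated token mirrors the "same token endlessly" symptom described above.

```python
from typing import Callable, Iterator, Optional


def fetch_all_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[list]:
    """Yield each page's records, following continuation tokens until none is returned."""
    token: Optional[str] = None   # the first request carries no token
    seen = set()                  # tokens already used, to detect a stuck loop
    while True:
        page = fetch_page(token)
        yield page.get("data", [])              # "data" is an assumed field name
        token = page.get("continuationToken")   # assumed field name
        if not token:
            break                               # null or missing token: last page
        if token in seen:
            raise RuntimeError(f"continuation token repeated: {token}")
        seen.add(token)


# Hypothetical stand-in for the real API: two pages, then a null token.
def fake_api(token: Optional[str]) -> dict:
    pages = {
        None: {"data": [1, 2], "continuationToken": "t1"},
        "t1": {"data": [3], "continuationToken": None},
    }
    return pages[token]


all_records = [record for page in fetch_all_pages(fake_api) for record in page]
print(all_records)
```

The key point is that each request must use the token from the previous response; if the loop variable is never updated, every iteration fetches the same page.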

Related

Copy activity not able to copy any response from REST API in Azure Data Factory

I am using a REST API URL as the input, and I am trying to save the response to a SQL table. When I run the pipeline, it runs successfully, but it shows zero rows copied.
I tested the API in Postman and I am able to see the response data (9 MB).
Has anybody else had this same issue? Please help me.
I tried to reproduce this and faced a similar problem: no records were inserted.
The problem occurs because the API returns its response as JSON, and the pipeline doesn't know which object value should be stored in which column.
To resolve this, use Mapping: import the schema and map the particular columns.
I think the intent here is to copy the raw response JSON to SQL, and if that's the case, we cannot do that with the Copy activity.
One way is to use a Web activity to call the API, then call a Stored Procedure activity and pass the response as an input parameter to the stored procedure, which inserts the record into the table. But a 9 MB response is quite large; I doubt the Web activity can handle that.
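To illustrate why a mapping matters, here is a minimal Python sketch of the idea behind the first answer: a JSON response only becomes table rows once each column is tied to a specific path in the JSON. It is not how ADF implements mapping; the response shape, the column names, and the helpers get_path and map_rows are all hypothetical.

```python
from typing import Any, Dict, List


def get_path(obj: Any, path: str) -> Any:
    """Follow a dotted path such as 'owner.name' through nested dicts; None if absent."""
    for key in path.split("."):
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def map_rows(records: List[dict], mapping: Dict[str, str]) -> List[Dict[str, Any]]:
    """Build one flat row per record, using an explicit column -> JSON path mapping."""
    return [{column: get_path(record, path) for column, path in mapping.items()}
            for record in records]


# Hypothetical API response and column mapping.
response = {"value": [{"id": 1, "owner": {"name": "Ann"}},
                      {"id": 2, "owner": {"name": "Bob"}}]}
mapping = {"Id": "id", "OwnerName": "owner.name"}

rows = map_rows(response["value"], mapping)
print(rows)
```

Without the mapping there is no way to tell which nested value belongs in which column, which is consistent with a run that succeeds but copies zero rows.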

Azure Data Factory: Pagination in Data Flow with Rest API Source

I have an API source in an ADF Data Flow task. The API source gives me the current page and the total number of pages in the body of the response. I want to use that information to paginate through my API source. I'm able to paginate through it just fine outside of a Data Flow activity using the range function. The issue is that the REST transformation in a Data Flow activity does not support the range function. I've been trying to use the AbsoluteUrl function plus an expression to add one to the current page returned in the body, but either pagination does not accept expressions or I cannot figure out the syntax.
I have a url like this:
BaseURL/fabricationcodes?facets=relatedArticles:Not%20Empty&page={PageNumber}.
In this example, my REST linked service URL has everything I need except the &page=pageNumber part, so I'm trying to add that part with the key/value pair function of AbsoluteUrl. The key is &page= and the value should be currentPage + 1. I want it to get the first page (page 0) and then add 1 to that to formulate the next page's URL, with the end condition being body.totalPages == body.currentPage.
I've tried a bunch of different expression formulations, but none seem to work, and debugging in a Data Flow is tough because the logging and error messaging are poor.
What I have right now.
Data flows don't support the range option, and you cannot use a dynamic expression to get the page from the API response.
To work around the issue, you can use a Data Flow activity within a ForEach loop, using the range function in a dynamic expression.
First, add a Web activity and pass it the URL of the REST API to get the total number of pages from the API response.
Then add a ForEach activity to iterate over the API pages, with the dynamic expression @range(1,activity('Web1').output.total_pages) as its items.
It will iterate over the API across that range sequentially.
Create a parameter of type string in the source dataset.
Use that parameter as the dynamic value in the relative URL.
Then set the parameter value to ?page=@{item()} to pass each number coming from range as the page.
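The steps above amount to generating one request per page number. A rough Python equivalent follows, assuming a hypothetical base URL with no existing query string and a total page count already read from the first response. One thing worth noting: ADF's range(1, n) takes a start index and a count, while Python's range takes a start and an exclusive stop, hence the + 1.

```python
from typing import List


def page_urls(base_url: str, total_pages: int) -> List[str]:
    """Return one URL per page, numbered 1..total_pages inclusive."""
    # ADF: range(1, total_pages) -> 1, 2, ..., total_pages (start, count)
    # Python: range(1, total_pages + 1) gives the same sequence (start, stop)
    return [f"{base_url}?page={page}" for page in range(1, total_pages + 1)]


# Hypothetical endpoint; total_pages would come from the Web activity's output.
urls = page_urls("https://example.invalid/fabricationcodes", 3)
for url in urls:
    print(url)
```

If the API numbers pages from 0, as in the question, the start value would need to be 0 in both the ADF expression and this sketch.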

REST API Pagination in Azure Data Factory

I have a project scenario to get all values from an endpoint URL. I am using ADF Pipeline but I'm having some issues with pagination.
To get the next set of values, I need to send the PaginationCursor value from the current response body in the header of the next request.
I have read that ADF supports the following case, which would be mine.
Next request’s header = property value in current response body (ADF - Pagination support)
I don't know how to use the following attributes in order to use the paginationCursor value from the current response body in the header of the next request.
Attributes for pagination in ADF
I tried to reproduce the above but was not successful. Instead, if you want to do it without pagination, you can try this approach.
First, create a Web activity with any page URL of your API to get the total number of pages.
In the ForEach, create an array of page numbers using the count from the Web activity:
@range(1,activity('Web1').output.total_pages)
Inside the ForEach, use the Copy activity and set the source REST dataset parameter for the page number, like ?page=@{item()}.
In the sink, also create a dataset for each page, with a dataset parameter value like APIdataset@{item()}.csv. This generates sink dataset names like APIdataset1.csv, APIdataset2.csv,...
Now you can copy from your REST API without pagination.
My repro for your reference:
Copy activity:
I was able to solve this problem with the following attributes.
Solution
In Headers, I had to enter the name of the header for the next call. In my case the header name is PaginationCursor, and its value comes from the paginationCursor property in the current response body.
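The behavior this rule asks ADF to perform can be sketched in Python as follows. This is only an illustration of the request/response cycle, not the ADF implementation: the send callback and the fake_send stand-in are hypothetical, and the exact rule from the original screenshot is not reproduced here. In ADF's documented pagination rule syntax, such a rule would presumably be keyed as Headers.PaginationCursor with the JSONPath value $.paginationCursor, but treat that as an assumption.

```python
from typing import Callable, Dict, List


def collect_with_header_cursor(send: Callable[[Dict[str, str]], dict],
                               header_name: str = "PaginationCursor",
                               body_field: str = "paginationCursor") -> List[dict]:
    """Call send() repeatedly, copying the cursor from each response body
    into the named header of the next request, until no cursor is returned."""
    bodies: List[dict] = []
    headers: Dict[str, str] = {}        # the first request has no cursor header
    while True:
        body = send(headers)
        bodies.append(body)
        cursor = body.get(body_field)
        if not cursor:
            break                       # no cursor in the body: last page
        headers = {header_name: cursor}
    return bodies


# Hypothetical stand-in for the real endpoint: two pages.
def fake_send(headers: Dict[str, str]) -> dict:
    cursor = headers.get("PaginationCursor")
    if cursor is None:
        return {"items": ["a"], "paginationCursor": "c1"}
    if cursor == "c1":
        return {"items": ["b"], "paginationCursor": None}
    raise KeyError(cursor)


pages = collect_with_header_cursor(fake_send)
items = [item for page in pages for item in page["items"]]
print(items)
```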

Rest API call from copy activity

Hi, I am processing a set of ~50K records from a pipe-delimited flat file in Azure Data Factory and need to invoke a REST API call for each input record. So I am using a ForEach loop to access each record, and inside the loop I am using a Copy activity to invoke the REST API call.
My question is: can I invoke the REST API call in bulk for all the records at once, since the ForEach loop is slowing down the pipeline execution? I want to remove the ForEach loop and also process the API JSON response and store it in Azure SQL Database.
Thanks
You will have to check the pagination properties so that you can decide how much payload the source API should return per request:
https://learn.microsoft.com/en-us/azure/data-factory/connector-rest?tabs=data-factory#pagination-support
Also, if you need to store the API JSON response in Azure SQL, you can do so with built-in functions such as OPENJSON and JSON_VALUE.
More details can be found at this link:
https://learn.microsoft.com/en-us/azure/azure-sql/database/json-features

Page size in List By Factory method in REST API

I'm trying to list all the pipelines stored in the Azure Data Factory instance. I want to use Azure Data Factory REST API v2, Pipelines - List By Factory method.
I noticed the "nextLink" field in the PipelineListResponse, which contains the link to the next page of results, if any remaining results exist.
My question is, how many PipelineResources are sent in a single page of the response?
I didn't find any documentation regarding this question.
How many PipelineResources are sent in a single page of the response?
Normally, the list operation response includes the nextLink property when the list operation returns more than 1,000 items. For more details, please refer to here.