Timeout issue for http connector and web activity on adf - azure-data-factory

Web activity and http connector on adf
We tried loading data through a Copy Activity using a REST API that returns JSON, but columns that have no data in the first row are skipped. We also tried the REST API with CSV data, but it throws an error. We tried the Web Activity, but its payload limit is 4 MB, so it fails with a timeout. We tried the HTTP endpoint, but its payload limit is 0.5 MB, so it also fails with a timeout.

In the Mapping settings, toggle on the advanced editor and set the collection reference so that the nested JSON data is cross-applied. The approach is as follows.
A REST connector is used in the source dataset, with the JSON API as the source.
A sink dataset is then created for Azure SQL Database. When the pipeline is run, a few columns are not copied to the database.
Therefore, in the Mapping settings of the Copy Activity:
1. The schema is imported.
2. The advanced editor is turned on.
3. The collection reference is given.
When the pipeline is run after these changes, all columns are copied to the SQL database.
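To make the effect of a collection reference more concrete, here is a minimal plain-Python sketch of what "cross apply" means for nested JSON: one flat row per element of the nested array, with the parent fields repeated on every row. This only illustrates the idea and is not how ADF implements it; the sample document and the names cross_apply, orderId, customer, items, sku and qty are all made up for the example.

```python
import json


def cross_apply(document, collection_key, parent_keys, child_keys):
    """Return one flat row per element of document[collection_key].

    Parent fields are repeated on every row. Child fields that are
    missing on an element become None instead of being dropped.
    """
    rows = []
    for item in document.get(collection_key, []):
        row = {key: document.get(key) for key in parent_keys}
        row.update({key: item.get(key) for key in child_keys})
        rows.append(row)
    return rows


# Hypothetical API response with a nested array.
sample = {
    "orderId": 1,
    "customer": "Contoso",
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7"},  # this element has no qty value
    ],
}

rows = cross_apply(sample, "items", ["orderId", "customer"], ["sku", "qty"])
for row in rows:
    print(json.dumps(row))
```

Because the column list is stated explicitly, a column that happens to be empty in the first element still appears in every row, which is the behaviour an explicit mapping gives you.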

Related

Copy Activity Not able to copy any response from Rest api in Azure Data factory

I am using a REST API URL as input, and I am trying to save the response to a SQL table. When I run the pipeline, it runs successfully, but it shows zero rows copied.
I tested the API in Postman and I am able to see the response data (9 MB).
Has anybody else had this issue? Please help.
I tried to reproduce this and faced a similar problem: it does not insert any records.
The problem occurs because the API returns its response as JSON and the pipeline does not know which object value should be stored in which column.
To resolve this, use Mapping: import the schema and map the particular columns.
I think the intent here is to copy the response JSON itself to SQL, and if that is the case, we cannot do that with the Copy Activity.
One way is to use a Web Activity to call the API and then call a Stored Procedure activity, passing the response as an input parameter to the stored procedure, which inserts the record into the table. But a 9 MB response is too big; I doubt the Web Activity can handle that.

Azure Data Factory - REST Pagination rules

I'm trying to pull data from HubSpot into my SQL Server database through an Azure Data Factory pipeline using a REST dataset. I have problems setting up the right pagination rules. I've already spent a day on Google and the MS guides, but I find it hard to get it working properly.
This is the source API. I am able to connect and pull the first set of 20 rows. The response body returns an offset (vid-offset) that can be passed back through the vidOffset= query parameter.
I need to pass the value of vid-offset from the response body into the next HTTP request. The process also needs to stop when has-more is 'false'.
I tried to reproduce the same in my environment and got the results below:
First, I created a linked service with this URL: https://api.hubapi.com/contacts/v1/lists/all/contacts/all?hapikey=demo&vidOffset
Then I created the pagination rules: an end condition on $.has-more, together with an absolute URL rule.
For demo purposes, I used a storage account as the sink.
The pipeline ran successfully.
For more information, refer to the Microsoft documentation.
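The loop that the pagination rules have to express can be sketched in plain Python. This is only an illustration of the logic, not ADF configuration: the page fetcher is a stand-in that returns canned pages instead of calling HubSpot, and the "contacts" key and the function names are assumptions made for the example. Only has-more and vid-offset come from the question.

```python
def fetch_all(get_page, start_offset=0):
    """Collect records page by page until the API reports no more data.

    get_page(offset) must return a dict with the keys "contacts",
    "has-more" and "vid-offset".
    """
    contacts = []
    offset = start_offset
    while True:
        page = get_page(offset)
        contacts.extend(page["contacts"])
        if not page["has-more"]:
            break  # end condition: has-more is false
        offset = page["vid-offset"]  # feed the offset into the next request
    return contacts


# Canned pages keyed by the offset that requests them (hypothetical data).
FAKE_PAGES = {
    0: {"contacts": [{"vid": 1}, {"vid": 2}], "has-more": True, "vid-offset": 2},
    2: {"contacts": [{"vid": 3}], "has-more": False, "vid-offset": 3},
}


def fake_get_page(offset):
    return FAKE_PAGES[offset]


all_contacts = fetch_all(fake_get_page)
print([contact["vid"] for contact in all_contacts])
```

In a real client, fake_get_page would be replaced by an HTTP call that sends the offset as the vidOffset query parameter.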

How to force to set Pipelines' status to failed

I'm using Copy Data.
When there is a data error, I export the bad rows to a blob.
But in this case, the pipeline's status is still Succeeded. I want to set it to Failed. Is it possible?
"When there is some data error."
It depends on which error you mean here.
1. If you mean a common incompatibility or mismatch error, ADF has a built-in Copy Activity feature named fault tolerance, which supports the following three scenarios:
Incompatibility between the source data type and the sink native type.
Mismatch in the number of columns between the source and the sink.
Primary key violation when writing to SQL Server/Azure SQL Database/Azure Cosmos DB.
If you configure the activity to log the incompatible rows, you can find the log file at this path: https://[your-blob-account].blob.core.windows.net/[path-if-configured]/[copy-activity-run-id]/[auto-generated-GUID].csv.
If you want to abort the job as soon as any error occurs, you can set that in the Copy Activity's fault tolerance settings.
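As a rough sketch of the two configurations, the snippet below builds the fault tolerance part of a Copy Activity's typeProperties as Python dictionaries and prints them as JSON. The property names enableSkipIncompatibleRow and redirectIncompatibleRowSettings are written from my recollection of the Copy Activity fault tolerance documentation, so please verify them against the current docs. The helper name, the linked service name and the log path are placeholders.

```python
import json


def fault_tolerance_settings(skip_incompatible_rows,
                             log_linked_service=None,
                             log_path=None):
    """Build only the fault tolerance properties of a Copy Activity.

    With skip_incompatible_rows=False the activity aborts on the first
    incompatible row. With True it skips such rows and, if a linked
    service is given, redirects them to a log location.
    """
    settings = {"enableSkipIncompatibleRow": skip_incompatible_rows}
    if skip_incompatible_rows and log_linked_service is not None:
        redirect = {
            "linkedServiceName": {
                "referenceName": log_linked_service,
                "type": "LinkedServiceReference",
            }
        }
        if log_path is not None:
            redirect["path"] = log_path
        settings["redirectIncompatibleRowSettings"] = redirect
    return settings


# Placeholder names, not taken from the question.
abort_on_error = fault_tolerance_settings(False)
skip_and_log = fault_tolerance_settings(
    True, "MyBlobLinkedService", "copy-logs/incompatible"
)

print(json.dumps(abort_on_error, indent=2))
print(json.dumps(skip_and_log, indent=2))
```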
See this case: Fault tolerance and log the incompatible rows in Azure Blob storage
2. If you are talking about your own logic for data errors, such as some business logic, I'm afraid ADF can't detect that for you, although I think it is a common requirement. However, you could follow this case (How to control data failures in Azure Data Factory Pipelines?) as a workaround. The main idea is to use a custom activity to divert the bad rows before the Copy Activity executes. In the custom activity, you can upload the bad rows to Azure Blob Storage with the .NET SDK.
Update:
Since you want to log all incompatible rows and force the job to fail at the same time, I'm afraid this cannot be implemented in the Copy Activity directly.
However, you could use an If Condition activity after the Copy Activity to check whether the output contains rowsSkipped. If it does, output False; you will then know that some rows were skipped and can check them in blob storage.
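The check itself is simple. Here is a minimal Python sketch of the decision the If Condition would make, assuming the Copy Activity output is available as a dictionary and that rowsSkipped is absent or zero when nothing was skipped. In ADF the same comparison would be written as an expression against the Copy Activity output. The function name and the sample outputs are made up.

```python
def pipeline_should_fail(copy_output):
    """Return True when the copy run reported any skipped rows."""
    # Treat a missing rowsSkipped field as zero skipped rows.
    return int(copy_output.get("rowsSkipped", 0)) > 0


# Hypothetical Copy Activity outputs.
clean_run = {"rowsRead": 100, "rowsCopied": 100}
dirty_run = {"rowsRead": 100, "rowsCopied": 97, "rowsSkipped": 3}

print(pipeline_should_fail(clean_run))
print(pipeline_should_fail(dirty_run))
```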

Error - Azure Data Factory transfer from SQL Database to Azure Search

I've set up an Azure Data Factory pipeline to transfer the data from one table in our SQL Server Database to our new Azure Search service. The transfer job continuously fails giving the following error:
Copy activity encountered a user error at Sink side:
GatewayNodeName=SQLMAIN01,ErrorCode=UserErrorAzuerSearchOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error
happened when writing data to Azure Search Index
'001'.,Source=Microsoft.DataTransfer.ClientLibrary.AzureSearch,''Type=Microsoft.Rest.Azure.CloudException,Message=Operation
returned an invalid status code
'RequestEntityTooLarge',Source=Microsoft.Azure.Search,'.
From what I've read thus far, Request Entity Too Large error is a standard HTTP error 413 found inside REST API. Of all the research I've done though, nothing helps me understand how I can truly diagnose and resolve this error.
Has anyone dealt with this with specific context to Azure? I would like to find out how to get all of our database data into our Azure Search service. If there are adjustments that can be made on the Azure side to increase the allowed request size, the process for doing so certainly is not readily-available anywhere I've seen on the internet nor in the Azure documentation.
This error means that the batch written by the Azure Search sink into Azure Search is too large. The default batch size is 1000 documents (rows). You can decrease it to a value that balances size and performance by using the writeBatchSize property of the Azure Search sink. See the Copy Activity properties section of the article "Push data to an Azure Search index by using Azure Data Factory".
For example, writeBatchSize can be configured on the sink as follows:
"sink": { "type": "AzureSearchIndexSink", "writeBatchSize": 200 }

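If you want a starting point for writeBatchSize rather than guessing, a back-of-the-envelope calculation like the one below can help. It is only a sketch: the request size limit is passed in as a parameter because the exact value should be checked against the Azure Search service limits (16 MB is used here as an assumption), the average document size has to be measured on your own data, and the function name and safety factor are made up.

```python
def suggest_write_batch_size(avg_document_bytes,
                             max_request_bytes,
                             max_documents=1000,
                             safety_factor=0.8):
    """Suggest a batch size that keeps one request under the size limit.

    max_documents defaults to 1000, the default batch size mentioned
    above. safety_factor leaves headroom for request overhead.
    """
    if avg_document_bytes <= 0:
        raise ValueError("avg_document_bytes must be positive")
    by_size = int(max_request_bytes * safety_factor) // avg_document_bytes
    # Never return less than 1 or more than the document cap.
    return max(1, min(max_documents, by_size))


ASSUMED_LIMIT = 16 * 1024 * 1024  # assumption, verify against the service limits

# Example: documents of about 64 KiB each.
print(suggest_write_batch_size(64 * 1024, ASSUMED_LIMIT))
```

With these inputs the suggestion is 204, which is in the same range as the 200 used in the example above.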
Is there a REST api of DashDB to export the content of a database in json format?

I need to re-create a database from one dashDB Bluemix service in another, and I need to automate this procedure in bash scripts.
The best I can think of is a dashDB REST API that lets me export the content of the entire database in JSON format (or any other format you can think of), and a corresponding API that lets me re-import the content into a different database on the same service or on a different service, possibly in a different Bluemix space. Thanks.
I assume you want to do a one-time move and that this is not about continuous replication. In that case, simply sign up on http://datascience.ibm.com, navigate to DataWorks, select "Load Data" from the navigation panel (open it by clicking top left), and then select Cloud Database as the source type.
DataWorks load data from dashDB to dashDB
If you would still prefer to write your own app or script for the data movement and you want a REST API to export JSON data, then I recommend writing a simple R script that reads the data from a table (using ibmdbR) and writes it to stdout, deploying the script into dashDB (POST /home), and running it from your app or script by calling the /rscript endpoint: https://developer.ibm.com/clouddataservices/wp-content/themes/projectnext-clouddata/dashDB/#/
For Db2 on Cloud and Db2 Warehouse on Cloud, there is a REST API that lets you export data from a table in CSV format (up to 100,000 rows) and then load the data back. It requires a few requests:
POST /auth/tokens
GET /sql_query_export
POST /home_content/{path}
POST /load_jobs
GET /load_jobs/{id}
I've implemented a client npm module for this API, db2-rest-client, and you can export a statement result to a JSON file as follows:
export DB_USERID='<USERID>';export DB_PASSWORD='<PASSWORD>';export DB_URI='https://<SOURCE_HOSTNAME>/dbapi/v3'
db2-rest-client query --query="SELECT * FROM SRC_SCHEMA.SRC_TABLE" > test.json
Then you can transform that data into a .csv file and use the load job:
export DB_USERID='<USERID>';export DB_PASSWORD='<PASSWORD>';export DB_URI='https://<TARGET_HOSTNAME>/dbapi/v3'
db2-rest-client load --file=transformed.csv --table='DEST_TABLE' --schema='DEST_SCHEMA'
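The transformation step between the two commands is not shown above, so here is a minimal Python sketch of one way to do it. It assumes that the exported JSON is an array of row objects keyed by column name, which may not match what the query command actually writes, so check test.json first. It also writes a header row, which the load job may or may not expect. The function name and the sample data are made up.

```python
import csv
import io
import json


def rows_to_csv(rows, columns=None):
    """Convert a list of row dictionaries to CSV text.

    If columns is not given, the column order is taken from the keys
    in the order they first appear across all rows.
    """
    if columns is None:
        columns = []
        for row in rows:
            for key in row:
                if key not in columns:
                    columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)  # missing keys are written as empty fields
    return buffer.getvalue()


# Hypothetical export content; the real file would be read from test.json.
sample_json = '[{"ID": 1, "NAME": "alpha"}, {"ID": 2, "NAME": "beta, gamma"}]'
csv_text = rows_to_csv(json.loads(sample_json))
print(csv_text)
```

To use it on the real export, read test.json with json.load and write the returned text to transformed.csv.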