Can't read REST API with an XML response using Synapse Pipeline's Copy Activity

I'm trying to read REST API endpoints through a Synapse pipeline and sink the data in JSON format. The API response is XML and the run ends up erroring out.
--------Error---------
{
  "errorCode": "2200",
  "message": "Failure happened on 'Source' side. ErrorCode=JsonInvalidDataFormat,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error occurred when deserializing source JSON file ''. Check if the data is in valid JSON object format.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=Newtonsoft.Json.JsonReaderException,Message=Unexpected character encountered while parsing value: <. Path '', line 0, position 0.,Source=Newtonsoft.Json,'",
  "failureType": "UserError",
  "target": "Copy REST API Data",
  "details": []
}
--------Error---------
I don't want to go back to the existing C# script-based code, which currently runs through SSIS packages.
Any assistance will be appreciated.

I tried to reproduce this using the REST connector in ADF and got the same error. The REST connector supports only JSON; refer to the Microsoft documentation on the supported capabilities of the REST connector.
Instead, use the HTTP connector with an XML dataset. Below is the approach:
Select HTTP as the linked service type.
Enter the base URL and authentication type, then click Create.
Create a new dataset for the HTTP linked service: select HTTP and then Continue.
Select the XML format and then Continue.
Provide the linked service and the relative URL, then click OK.
Use this dataset as the source dataset in the copy activity. Once the pipeline runs, the data will be copied to the sink.
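For reference, the conversion the copy activity performs here (read XML over HTTP, serialize it as JSON at the sink) can be sketched outside ADF in Python with the requests and xmltodict libraries; the base URL, relative URL and output file below are placeholders, not values from the original pipeline.

import json
import requests
import xmltodict  # third-party: pip install xmltodict

BASE_URL = "https://example.com/api"   # hypothetical base URL of the HTTP linked service
RELATIVE_URL = "orders"                # hypothetical relative URL of the XML dataset

response = requests.get(f"{BASE_URL}/{RELATIVE_URL}", timeout=30)
response.raise_for_status()

# Parse the XML payload into a dict, then write it out as JSON (the sink format).
data = xmltodict.parse(response.text)
with open("orders.json", "w", encoding="utf-8") as sink:
    json.dump(data, sink, indent=2)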

Related

Azure Data Factory grab file from folder based on size

I ran a copy activity that used an HTTP linked service to pull a zip file from an online source and then extract the zip to a folder with multiple files within an Azure Blob Storage container. What I want to do now is dynamically pull the largest file from that newly created folder and run it through a data flow transformation, while also deleting the folder through ADF. I am trying a Get Metadata activity that outputs the child items of the folder. The output is then connected to a ForEach activity, with @activity('Get Metadata1').output.childItems passed in the Items setting of the ForEach and an inner Get Metadata activity to get the file sizes. But it errors when retrieving the file size, giving me this:
{
  "errorCode": "3500",
  "message": "Field 'size' failed with error: 'Type=Microsoft.WindowsAzure.Storage.StorageException,Message=The remote server returned an error: (404) Not Found.,Source=Microsoft.WindowsAzure.Storage,''Type=System.Net.WebException,Message=The remote server returned an error: (404) Not Found.,Source=System,'.",
  "failureType": "UserError",
  "target": "Get Metadata2",
  "details": []
}
Is it not possible to get the file sizes of a folder's child items? I was following this documentation:
https://social.msdn.microsoft.com/Forums/azure/en-US/a83712ef-9a1a-4741-80b5-0e2ee8288ef5/get-child-items-size?forum=AzureDataFactory&prof=required
Create a data factory
Set up a scheduled trigger, or trigger it a different way if you know exactly when all the files are done extracting/loading.
Create a metadata activity that will return metadata on a specific folder.
Grab the largest file from blob based on the metadata.
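Outside ADF, the selection logic in the last step (list the extracted folder's children with their sizes and keep the largest) can be sketched with the azure-storage-blob SDK; the connection string, container and folder prefix below are placeholders.

from azure.storage.blob import ContainerClient

# Placeholders: substitute your own storage connection string, container and folder prefix.
CONNECTION_STRING = "<storage-connection-string>"
CONTAINER = "mycontainer"
FOLDER_PREFIX = "extracted/"   # the folder the zip was extracted into

container = ContainerClient.from_connection_string(CONNECTION_STRING, CONTAINER)

# list_blobs returns BlobProperties objects that already carry the size in bytes,
# so no extra per-file metadata call is needed to compare sizes.
blobs = list(container.list_blobs(name_starts_with=FOLDER_PREFIX))
largest = max(blobs, key=lambda blob: blob.size)
print(f"Largest file: {largest.name} ({largest.size} bytes)")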

GET a Salesforce Batch request using workbench

I am using the following syntax to get a batch request for a bulk data load job performed in our dev org:
https://instance_name—api.salesforce.com/services/async/APIversion/job/jobid/batch/batchId/request
In Workbench, I went to the REST Explorer, clicked GET, and used the following query:
/services/async/v29.0/job/7501j000000Lb31/batch/7501g000000l0Lf
When clicking on execute, I get the following error message:
{"exceptionCode":"InvalidSessionId","exceptionMessage":"Unable to find session id"}
My end goal is to be able to pull all of the request CSVs from a bulk data load job instead of having to download each one of them manually.
Thanks
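For what it's worth, the same GET can be issued outside Workbench; below is a minimal Python sketch with requests, assuming you already have a valid session id. The classic async Bulk API reads the session from the X-SFDC-Session header and takes the bare API version (e.g. 29.0) in the path; the instance host and session id are placeholders.

import requests

# Placeholders: your instance host and a session id obtained via login (SOAP login or OAuth).
INSTANCE = "https://yourInstance.salesforce.com"
SESSION_ID = "<session id>"
JOB_ID = "7501j000000Lb31"     # job and batch ids taken from the question
BATCH_ID = "7501g000000l0Lf"

url = f"{INSTANCE}/services/async/29.0/job/{JOB_ID}/batch/{BATCH_ID}/request"

# The async Bulk API expects the session id in the X-SFDC-Session header.
response = requests.get(url, headers={"X-SFDC-Session": SESSION_ID}, timeout=30)
response.raise_for_status()
print(response.text)   # the CSV request payload originally submitted for this batch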

How to create a hash index in ArangoDB via its HTTP API

I'm trying to create a hash index in ArangoDB via its HTTP API, using curl.
Within my ArangoDB I have several databases like:
production
staging
test
As mentioned in the docs at https://docs.arangodb.com/3.4/HTTP/Indexes/Hash.html, one should call the index API with a URL scheme as follows:
http://localhost:8529/_api/index?collection=products
Applied to my use case I have the following URL:
http://localhost:8529/_api/index?collection=NodesElectric
Executing the curl command always returns an error like:
{
  "error": true,
  "errorMessage": "collection or view not found",
  "code": 404,
  "errorNum": 1203
}
I suppose that the problem is caused by having the collection "NodesElectric" in all databases "production", "staging",...
My question is how do I specify the according database for the mentioned collection?
I have not found a hint about this in the docs.
Thanks for any help!
Any operation triggered via ArangoDB's HTTP REST API is executed in the context of exactly one database. To explicitly specify the database in a request, the request URI must contain the database name in front of the actual path:
http://localhost:8529/_db/mydb/...
where ... is the actual path to the accessed resource. In the example, the resource will be accessed in the context of the database mydb. Actual URLs in the context of mydb could look like this:
http://localhost:8529/_db/mydb/_api/version
This information can be found in the documentation:
https://docs.arangodb.com/3.4/HTTP/Database/
If no database is specified in the request URL, the _system database is used by default.
To create a hash index on collection NodesElectric in your database production the following URL has to be used:
http://localhost:8529/_db/production/_api/index?collection=NodesElectric
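As a quick sketch in Python with requests, the same request looks like the following; the credentials and the indexed fields are placeholders, so adjust them to your setup.

import requests

ARANGO_URL = "http://localhost:8529"
AUTH = ("root", "<password>")          # placeholder credentials

# Note the _db/production prefix selecting the database, with the collection as a query parameter.
url = f"{ARANGO_URL}/_db/production/_api/index"
body = {"type": "hash", "unique": False, "fields": ["nodeId"]}   # hypothetical field list

response = requests.post(url, params={"collection": "NodesElectric"}, json=body, auth=AUTH)
print(response.status_code, response.json())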

Not Able to Publish ADF Incremental Package

As posted in an earlier thread about syncing data from on-premises MySQL to Azure SQL over here, referring to this article, I found that the lookup component for watermark detection is available for SQL Server only.
So I tried a workaround: while using the "Copy" data flow task, pick the data from MySQL greater than the last stored watermark.
Issue:
I am able to validate the package successfully but not able to publish it.
Question:
In the copy data flow task I'm using the below query to get data from MySQL greater than the available watermark.
Can't we use a query like the one below on other relational sources like MySQL?
select * from @{item().TABLE_NAME} where @{item().WaterMark_Column} > '@{activity('LookupOldWaterMark').output.firstRow.WatermarkValue}'
Screenshots: copy task SQL query preview; validated successfully; error with no details; debugged successfully.
Errors after following the steps mentioned by Franky:
Azure SQL linked service error (resolved by reconfiguring the connection / editing the credentials in the connection tab)
Source query went blank (resolved by re-selecting the source type and rewriting the query)
Could you verify whether you have access to create a template deployment in the Azure portal?
1) Export the ARM template: in the top-right of the ADFv2 portal, click ARM Template -> Export ARM Template, extract the zip file and copy the content of the "arm_template.json" file.
2) Create an ARM template deployment: go to https://portal.azure.com/#create/Microsoft.Template and log in with the same credentials you use in the ADFv2 portal (you can also reach this page in the Azure portal by clicking "Create a resource" and searching for "Template deployment"). Now click "Build your own template in the editor", paste the ARM template from the previous step into the editor and save.
3) Deploy the template: click on existing resource group and select the same resource group as the one your Data Factory is in. Fill out the parameters that are missing (for this test it doesn't really matter if the values are valid); the factory name should already be there. Agree to the terms and click Purchase.
4) Verify that the deployment succeeded. If not, let me know the error; it might be an access issue, which would explain why your publish fails. (The ADF team is working on giving a better error for this issue.)
Did any of the objects publish into your Data Factory?
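If the portal route is inconvenient, the same check can be scripted; below is a rough sketch with the azure-identity and azure-mgmt-resource packages, assuming the exported arm_template.json sits next to the script and that the subscription id, resource group, factory name and any other required template parameters are filled in.

import json

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

SUBSCRIPTION_ID = "<subscription-id>"             # placeholder
RESOURCE_GROUP = "<data-factory-resource-group>"  # placeholder

with open("arm_template.json", encoding="utf-8") as f:
    template = json.load(f)

client = ResourceManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Incremental mode mirrors what the portal's "Template deployment" flow does.
poller = client.deployments.begin_create_or_update(
    RESOURCE_GROUP,
    "adf-publish-test",
    {
        "properties": {
            "mode": "Incremental",
            "template": template,
            # Supply every parameter the template requires; factoryName is shown as an example.
            "parameters": {"factoryName": {"value": "<your-factory-name>"}},
        }
    },
)
print(poller.result().properties.provisioning_state)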

when using the spring cloud data flow sftp source starter app file_name header is not found

The Spring Cloud Data Flow SFTP source starter app states that the file name should be in the headers (mode=contents). However, when I connect this source to a log sink, I see a few headers (like Content-Type) but not the file_name header. I want to use this header to upload the file to S3 with the same name.
spring server: Spring Cloud Data Flow Local Server (v1.2.3.RELEASE)
my apps are all imported from here
stream definition:
stream create --definition "sftp --remote-dir=/incoming --username=myuser --password=mypwd --host=myftp.company.io --mode=contents --filename-pattern=preloaded_file_2017_ --allow-unknown-keys=true | log" --name test_sftp_log
Configuring the log application with --expression=#root --level=debug doesn't make any difference. Also, when writing my own sink that tries to access the file_name header, I get an error message that no such header exists.
logs snippets from the source and sink are in this gist
Please follow the link below. You need to code your own Source and populate such a header manually downstream, after the FileReadingMessageSource, and only after that send the message with the content and the appropriate header to the target destination.
https://github.com/spring-cloud-stream-app-starters/file/issues/9