Data Factory v2 - connecting using REST

The aim is to connect to a public REST API using ADF; it's my first stab at sending requests to a REST API in ADF. The API in question is the Companies House ('CH') government website's API in England.
I have created an account and obtained a key. Apparently it uses basic authentication, where the user name is the API key and the password is ignored (CH note on authentication).
I want to explore the contents of the 'Search all' API (CH note on Search All) and copy the results to Blob Storage.
I therefore set the linked service to use REST as below. The obfuscated User Name is the key I obtained from CH; the password is just the key repeated, as their documentation states the password is ignored:
I then have added a REST dataset referencing this linked service:
And the testing of the connection works fine.
Problems then arise in the copy data task: I get an 'Invalid Authorization Header' error both when previewing and when I attempt a copy to blob:
I'd be grateful for pointers on where I'm going wrong.

I can't reproduce your auth error, but I notice that you want to send some parameters with your GET request in the Request Body.
I think you need to add the parameters in the relativeUrl property:
A relative URL to the resource that contains the data. When this property isn't specified, only the URL that's specified in the linked service definition is used. The HTTP connector copies data from the combined URL: [URL specified in linked service]/[relative URL specified in dataset].
Also, I suggest checking the correct REST API format of the Search API you are using. There are no other special features in the ADF REST connector: just make sure the GET request works locally and duplicate it.
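If it helps with that last step, here is roughly what the request looks like outside ADF (a minimal sketch using Python's requests library; the host and endpoint follow the Companies House docs as I understand them, and YOUR_API_KEY is a placeholder):

```python
import requests

# Companies House uses HTTP basic auth: the API key is the username
# and the password is ignored (an empty string works).
API_KEY = "YOUR_API_KEY"  # placeholder

resp = requests.get(
    "https://api.company-information.service.gov.uk/search",
    params={"q": "example"},  # search parameters belong in the query string
    auth=(API_KEY, ""),
)
resp.raise_for_status()
print(resp.json())
```

If this works locally but the ADF copy still fails, the problem is likely in how the linked service builds the Authorization header rather than in the key itself.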

Related

Data Factory can't download CSV file from web API with Basic Auth

I'm trying to download a CSV file from a website in Data Factory using the HTTP connector as my source linked service in a copy activity. It's basically a web call to a URL that looks like https://www.mywebsite.org/api/entityname.csv?fields=:all&paging=false.
The website uses basic authentication. I have manually tested by using the url in a browser and entering the credentials, and everything works fine. I have used the REST connector in a copy activity to download the data as a JSON file (same url, just without the ".csv" in there), and that works fine. But there is something about the authentication in the HTTP connector that is different and causing issues. When I try to execute my copy activity, it downloads a csv file that contains the HTML for the login page on the source website.
While searching, I did come across this Github issue on the docs that suggests that the basic auth header is not initially sent and that may be causing an issue.
As I have it now, the authentication is defined in the linked service. I'm hoping that maybe I can add something to the Additional Headers or Request Body properties of the source in my copy activity to make this work, but I haven't found the right thing yet.
Suggestions of things to try or code samples of a working copy activity using the HTTP connector and basic auth would be much appreciated.
The HTTP connector expects the API to return a 401 Unauthorized response after the initial request. It then responds with the basic auth credentials. If the API doesn't do this, it won't use the credentials provided in the HTTP linked service.
If that is the case, go to the copy activity source, and in the additional headers property add Authorization: Basic followed by the base64 encoded string of username:password. It should look something like this (where the string at the end is the encoded username:password):
Authorization: Basic ZxN0b2njFasdfkVEH1fU2GM=
It's best if that isn't hard coded into the copy activity but is retrieved from Key Vault and passed as secure input to the copy activity.
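For reference, the encoded value can be produced like this (a minimal sketch; the credentials are placeholders):

```python
import base64

username = "my-user"      # placeholder credentials
password = "my-password"

token = base64.b64encode(f"{username}:{password}".encode("ascii")).decode("ascii")
print(f"Authorization: Basic {token}")  # paste into Additional Headers
```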
I suggest you try the REST connector instead of the HTTP one. It supports Basic as the authentication type, and I have verified it using a test endpoint on httpbin.org.
Above is the configuration for the REST linked service. Once you have created a dataset connected to this linked service, you can include it in your copy activity.
Once the pipeline executes, the content of the REST response will be saved in the specified file.
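If you want to reproduce that verification outside ADF first, httpbin.org exposes a basic-auth test endpoint (a minimal sketch; 'user' and 'passwd' are arbitrary values the endpoint checks the supplied credentials against):

```python
import requests

# https://httpbin.org/basic-auth/<user>/<passwd> returns 200 only when
# matching credentials are supplied, and 401 otherwise.
resp = requests.get(
    "https://httpbin.org/basic-auth/user/passwd",
    auth=("user", "passwd"),
)
print(resp.status_code)  # 200
print(resp.json())       # {"authenticated": true, "user": "user"}
```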

How can I use REST API authentication in Mendix?

I have designed an API REST service (with Bonita) to which I can perfectly connect with Postman, with the following parameters:
By the way, the x-www-form-urlencoded option that is selected comes from the Content-Type application/x-www-form-urlencoded header that is not displayed in my screenshot. The official Bonita specification states that this header is needed, and I always get a 200 OK status code in response.
How can I specify an equivalent request with the body part in a Mendix Call REST service in a microflow? Here is what I have so far:
I guess the body part should be specified in the Request tab, but I just don't know how to do it properly. I always get the following error message for my connector, which means that, whatever I specify, the username is not taken into account:
An error has occurred while handling the request. [User 'Anonymous_69a378ed-bb56-4183-ae71-c9ead783db1f' with session id '5fefb6ad-XXXX-XXXX-XXXX-XXXXXXXXb34f' and roles 'Administrator']
I finally found that the proxy setting was the actual problem. It was set at the project scope and simply clicking on No proxy in the General tab did the trick! (both services are hosted on my local machine so far)
Then I just had to fill in the dedicated Authentication field in the HTTP Headers tab with the correct credentials to log in to my Bonita service.
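For comparison, the equivalent request outside Mendix would look roughly like this (a sketch; the loginservice path and demo credentials follow the Bonita docs as I understand them, and the host is a placeholder for a local instance):

```python
import requests

# Passing a dict as `data` makes requests send the body form-encoded and
# set the Content-Type: application/x-www-form-urlencoded header itself.
resp = requests.post(
    "http://localhost:8080/bonita/loginservice",  # placeholder local host
    data={"username": "walter.bates", "password": "bpm", "redirect": "false"},
)
print(resp.status_code)  # the question above reports 200 OK from Postman
```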

Google Cloud Bucket custom metadata set but not returned in the HTTP request

I've managed to add custom metadata to my public file stored in Google Cloud Bucket, but that custom header is not returned in the HTTP response.
The image below shows that my custom metadata (X-Content-Type-Options) was added to my object. When I request that file from my browser, this custom header is not part of the response.
It is possible to add custom headers, but they will be prefixed with x-goog-meta-. AWS S3 suffers from the same limitation. It seems that this is due to security reasons. The leanest solution I've found to overcome this limitation is to use an edge such as AWS Lambda Edge or Cloudflare Edge Workers. The idea is to rewrite the headers on the fly. In my case, that would mean catching all headers that start with x-goog-meta-, and removing that prefix.
Here is an article by somebody who did that with AWS Lambda Edge: https://medium.com/@tom.cook/edge-lambda-cloudfront-custom-headers-3d134a2c18a2
You can use the x-goog-meta- prefix for setting metadata on the object (some examples here for adding a single metadata entry or for adding it in a cp operation).
You can get the custom metadata with the gsutil command and the -L param. You can also recover the custom metadata with the HTTP request API (try it out here).
But the custom metadata isn't provided to your browser when you access the object via the URL https://storage.cloud.google.com/.... You have to build a proxy which requests the object with the Storage API (to get the content and the custom metadata) and which serves the object with the expected headers.
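A minimal version of such a proxy could look like this (a sketch assuming the Flask and google-cloud-storage packages and a placeholder bucket name):

```python
from flask import Flask, Response
from google.cloud import storage

app = Flask(__name__)
client = storage.Client()
bucket = client.bucket("my-bucket")  # placeholder bucket name

@app.route("/<path:object_name>")
def serve(object_name):
    blob = bucket.get_blob(object_name)  # loads content type and metadata
    if blob is None:
        return "Not found", 404
    # The Python client exposes custom metadata without the x-goog-meta-
    # prefix, so it can be re-emitted directly as response headers.
    return Response(
        blob.download_as_bytes(),
        content_type=blob.content_type,
        headers=dict(blob.metadata or {}),
    )
```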

How can I make a Google Cloud Storage object publicly visible while uploading it?

I have an application which is uploading objects to Google Cloud Storage using signed URLs and I'd like to know if it's possible to make the object public during the sign/upload step.
I know it's possible to make the object publicly visible by setting the policy on its bucket or by using the client library/making a REST request after it's been uploaded, but in order to minimize the impact on my workflow, I'd like to do it all in one go. Is this possible? If it can be done, I'm assuming it's by setting a header when signing the URL or when making the REST request using the signed URL but I haven't been able to find documentation which covers this.
UPDATE:
I've just found the Extension/Custom Headers section of the XML API docs which claims that this can be achieved using the x-goog-acl header (e.g. x-goog-acl: public-read). Unfortunately, this does not work. The object is not publicly visible after setting the header when signing the URL and when uploading the file.
Quoting the Cloud Storage documentation regarding Signed URLs:
When specifying the name:value pairs for headers, keep in mind the following:
Remove any whitespace around the colon that appears after the header name.
For example, using the custom header x-goog-acl: private without removing the space after the colon returns a 403 Forbidden error, because the request signature you calculate does not match the signature Google calculates.
So the solution could be setting the header value as x-goog-acl:public-read instead of x-goog-acl: public-read.
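Put together, the signing and upload steps would look roughly like this (a sketch using the google-cloud-storage client with placeholder bucket and object names; the client library canonicalizes headers when computing the V4 signature, which should sidestep the whitespace issue quoted above):

```python
from datetime import timedelta

import requests
from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-bucket").blob("my-object.txt")  # placeholders

# The header must be included in the signature AND sent verbatim on the
# upload request for the ACL to take effect.
headers = {"x-goog-acl": "public-read"}

url = blob.generate_signed_url(
    version="v4",
    expiration=timedelta(minutes=15),
    method="PUT",
    headers=headers,
)

resp = requests.put(url, data=b"hello", headers=headers)
print(resp.status_code)  # 200 on success
```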

Google Storage Json Api - access "folders" from api fails?

I am having problems accessing objects that use slashes through the API. For example, I have objects with names like "folder1/folder2/name". When I use this with the API I get a 400 Bad Request. Is this not supported yet by the API, or is a special character needed? This also fails for me in the API Explorer.
This is a URL encoding issue. The object name is a single URL path part, and thus all slashes in the name need to be %-encoded. (i.e., folder1%2ffolder2%2fname)
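In Python, for instance, the encoding could be done like this (a sketch with a placeholder bucket name):

```python
from urllib.parse import quote

object_name = "folder1/folder2/name"
encoded = quote(object_name, safe="")  # safe="" percent-encodes the slashes too

# JSON API object URL: the object name is a single path segment
url = f"https://storage.googleapis.com/storage/v1/b/my-bucket/o/{encoded}"
print(url)  # .../o/folder1%2Ffolder2%2Fname
```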
The API Explorer, unfortunately, has a known issue (reported internally) with storage.objects.get: the method returns actual file data, while the API Explorer expects JSON metadata, and things go poorly from there.