I need to extract data from an external REST API inside an Airflow DAG. The API is protected, so I first need to authenticate (log in) to the API as a user and then extract data by passing an access_token in the API call. I need some help implementing this. If anyone has done something similar, an example would really help. Thanks
It is better to put the REST API call in a standalone Python script. The script can take an argument, such as a filename with a timestamp, and dump the data to that file. That way it can be tested independently, outside of Airflow, like any other Python code.
Then call this script with a BashOperator. In the DAG it will look something like this. Note that the filename is hard-coded here, but it can be replaced with a Jinja template that returns a filename with a timestamp.
task1 = BashOperator(
    task_id='get_data',
    bash_command="python ~/airflow/dags/src/rest_api_call.py data_20220701_010101.txt",
    dag=dag)
task2 = PythonOperator(
    task_id='load_data',
    # needed on Airflow 1.x; Airflow 2 passes the context automatically
    provide_context=True,
    python_callable=load_data_fn,
    op_kwargs={'filename': 'data_20220701_010101.txt'},
    dag=dag)
...
def load_data_fn(**kwargs):
    # op_kwargs are delivered as keyword arguments
    print(kwargs.get("filename"))
...
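For reference, here is a minimal sketch of what a standalone rest_api_call.py could look like. It is only an illustration under assumptions: the /login and /data endpoints, the username/password fields, the access_token field and the API_BASE_URL, API_USER and API_PASSWORD environment variables are all hypothetical and need to be replaced with whatever the real API uses. The HTTP helper is passed in as a parameter so the logic can be tested without a network.

```python
import json
import os
import sys
import urllib.request


def http_json(url, data=None, headers=None):
    """Send a JSON POST when `data` is given, else a GET; return the decoded JSON reply."""
    body = json.dumps(data).encode("utf-8") if data is not None else None
    request = urllib.request.Request(url, data=body, headers=dict(headers or {}))
    if body is not None:
        request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def extract(base_url, user, password, filename, call=http_json):
    """Log in, fetch the data with the access token, and dump it to `filename`."""
    # Hypothetical endpoints and field names; replace them with the real API's.
    login = call(base_url + "/login", data={"username": user, "password": password})
    token = login["access_token"]
    rows = call(base_url + "/data", headers={"Authorization": "Bearer " + token})
    with open(filename, "w") as handle:
        json.dump(rows, handle)
    return len(rows)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python rest_api_call.py data_20220701_010101.txt
    # Credentials come from (hypothetical) environment variables.
    extract(os.environ["API_BASE_URL"], os.environ["API_USER"],
            os.environ["API_PASSWORD"], sys.argv[1])
```

Keeping the credentials out of the command line means the bash_command in the DAG only needs to pass the output filename.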
Create an HTTP connection (in the UI under Admin -> Connections).
In your DAG, create two SimpleHttpOperator tasks. The first task logs in and reads the access token; the second sends the request for the data, with the access token from the login task in its header.
Here is an example of the login and data tasks, assuming the login uses basic auth and the access token is sent as a bearer token:
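from base64 import b64encode
from airflow.providers.http.operators.http import SimpleHttpOperator  # Airflow 2 provider path

userAndPass = b64encode(b"user:password").decode("ascii")
headers = {'Authorization': 'Basic %s' % userAndPass}
login = SimpleHttpOperator(
    task_id="login",
    http_conn_id="conn_id",
    endpoint="/login",
    method="POST",
    headers=headers,
    # response_filter's return value is pushed to XCom;
    # response_check is only meant to return True/False
    response_filter=lambda response: response.json()["access_token"])
get_data = SimpleHttpOperator(
    task_id="get_data",
    http_conn_id="conn_id",
    endpoint="/data",
    method="POST",
    # headers is a templated field, so the token is pulled from XCom at run time
    headers={"Authorization": "Bearer {{ ti.xcom_pull(task_ids='login', key='return_value') }}",
             "Accept": "application/json"},
    response_filter=lambda response: response.json())
(login >> get_data)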
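To make the header handling in the two tasks easier to check outside Airflow, here is a small plain-Python sketch of the same flow. The helper names are made up for illustration, and access_token is assumed to be the field the login endpoint returns.

```python
from base64 import b64encode


def basic_auth_header(user, password):
    """Build the HTTP Basic header sent by the login task."""
    raw = "{}:{}".format(user, password).encode("utf-8")
    return {"Authorization": "Basic " + b64encode(raw).decode("ascii")}


def token_from_login(payload):
    """Pick the token out of the login reply; 'access_token' is the assumed field name."""
    return payload["access_token"]


def bearer_headers(token):
    """Build the headers sent by the data task."""
    return {"Authorization": "Bearer " + token, "Accept": "application/json"}


login_headers = basic_auth_header("user", "password")
data_headers = bearer_headers(token_from_login({"access_token": "abc123"}))
print(login_headers)
print(data_headers)
```

The value returned by token_from_login plays the role of the XCom value that the second task pulls from the first.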
I am trying to explore different options for connecting to a REST endpoint using Azure Data Factory. I have the Python code below, which does what I am looking for, but I am not sure whether Azure Data Factory offers something out of the box to connect to the API, or a way to call custom code.
Code:
import sys
import requests
from requests_oauthlib import OAuth2Session
from oauthlib.oauth2 import BackendApplicationClient
import json
import logging
import time
logging.captureWarnings(True)
api_url = "https://webapi.com/api/v1/data"
client_id = 'client'
client_secret = 'secret'
client = BackendApplicationClient(client_id=client_id)
oauth = OAuth2Session(client=client)
token = oauth.fetch_token(token_url='https://webapi.com/connect/accesstoken', client_id=client_id, client_secret=client_secret)
client = OAuth2Session(client_id, token=token)
response = client.get(api_url)
data = response.json()
When I look at the REST linked service, I don't see many authentication options.
Could you please point me to the activities to use to make OAuth2 work in Azure Data Factory?
You would have to use a Web Activity to call the token endpoint with the POST method and get the authentication token before getting data from the API.
Here is an example.
First, create a Web Activity.
Select the URL that performs the authentication and returns the token.
Set Method to POST.
Create a header > Name: Content-Type, Value: application/x-www-form-urlencoded
Configure the request body for the HTTP request.
Format: grant_type=refresh_token&client_id={client_id}&client_secret={client_secret}&refresh_token={refresh_token}
Example: grant_type=refresh_token&client_id=HsdO3t5xxxxxxxxx0VBsbGYb&client_secret=t0_0CqU8oA5snIOKyT8gWxxxxxxxxxYhsQ-S1XfAIYaEYrpB&refresh_token={refresh_token}
The values above are only an example; replace them with your own id and secret when you try it.
The output of this Web Activity is a JSON string, from which you can extract the access_token and use it in the request header of later activities (REST linked service) in the pipeline, depending on your need.
You can get the access_token as shown below. I have assigned it to a variable for simplicity.
@activity('GetOauth2 token').output.access_token
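If it helps to check the request outside Data Factory first, the same token request can be sketched in Python. This is only an illustration of the header and body that the Web Activity sends; the client id, secret and refresh token values are placeholders, and the function names are made up.

```python
import json
from urllib.parse import urlencode


def build_token_request(client_id, client_secret, refresh_token):
    """Return (headers, body) for the token POST described above."""
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    body = urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    })
    return headers, body


def access_token_from(response_text):
    """Mirror of output.access_token: read the token from the JSON reply."""
    return json.loads(response_text)["access_token"]


headers, body = build_token_request("my-client", "my-secret", "my-refresh-token")
print(body)
print(access_token_from('{"access_token": "abc", "token_type": "Bearer"}'))
```

Using urlencode also takes care of escaping any special characters in the secret, which is easy to get wrong when the body is typed by hand.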
Here is an example from the official Microsoft documentation of an OAuth authentication implementation for copying data.
I need to call a REST API from Databricks, preferably using Scala, to get data and persist it in Databricks. This is the first time I am doing this and I need help. Can anyone walk me through, step by step, how to achieve this? The API team has already created a service principal and given it access to the API, so authentication needs to be done through the SPN.
Thanks!
The REST API is not the recommended approach for ingesting data into Databricks.
Reason: the amount of data uploaded by a single API call cannot exceed 1MB.
To upload a file that is larger than 1MB to DBFS, use the streaming API, which is a combination of create, addBlock, and close.
Here is an example of how to perform this action using Python.
import json
import base64
import requests

DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'
BASE_URL = 'https://%s/api/2.0/dbfs/' % (DOMAIN)

def dbfs_rpc(action, body):
    """ A helper function to make the DBFS API request, request/response is encoded/decoded as JSON """
    response = requests.post(
        BASE_URL + action,
        headers={"Authorization": "Bearer %s" % TOKEN},
        json=body
    )
    return response.json()

# Create a handle that will be used to add blocks
handle = dbfs_rpc("create", {"path": "/temp/upload_large_file", "overwrite": "true"})['handle']
# Open in binary mode: base64 encoding operates on bytes
with open('/a/local/file', 'rb') as f:
    while True:
        # A block can be at most 1MB
        block = f.read(1 << 20)
        if not block:
            break
        data = base64.standard_b64encode(block)
        # Note: on Python 3 `data` is bytes and must be decoded to str before JSON encoding
        dbfs_rpc("add-block", {"handle": handle, "data": data})
# close the handle to finish uploading
dbfs_rpc("close", {"handle": handle})
For more details, refer to the "DBFS API" documentation.
Hope this helps.
The above code will work. If you want to upload a JAR file or another non-ASCII file, then instead of
dbfs_rpc("add-block", {"handle": handle, "data": data})
use
dbfs_rpc("add-block", {"handle": handle, "data": data.decode('UTF8')})
The rest of the details are the same.
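The chunk-and-encode part of the upload can be tried on its own, without a Databricks workspace. The sketch below is an assumption-labelled illustration (the function name is made up): it reads a binary stream in blocks of at most 1MB and yields each block as base64 text, which is the form that can be placed in a JSON body.

```python
import base64
import io

BLOCK_SIZE = 1 << 20  # 1MB, the per-call limit mentioned above


def iter_b64_blocks(stream, block_size=BLOCK_SIZE):
    """Yield each chunk of a binary stream as a base64 text string."""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        # Decode to str so the value can be serialised as JSON
        yield base64.standard_b64encode(block).decode("ascii")


# Tiny block size just to show the splitting
blocks = list(iter_b64_blocks(io.BytesIO(b"abcdef"), block_size=4))
print(blocks)
```

Each yielded string would be the "data" value of one add-block call.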
I am trying to delete a student from a OneNote class notebook using the Microsoft OneNote API, but I am getting the following error.
{'error': {'code': '19999', 'message': 'Something failed, the API cannot share any more information at the time of the request.', '@api.url': 'https://aka.ms/onenote-errors#C19999'}}
I am using the REST API command documented here https://learn.microsoft.com/en-us/previous-versions/office/office-365-api/how-to/onenote-classnotebook#remove-students-and-teachers
There isn't a Graph API REST call for this; Microsoft hasn't provided any class notebook calls in Graph, and the current Graph documentation points to the above documentation for dealing with class notebooks.
Here is my Python code
onenote_url = 'https://www.onenote.com/api/v1.0/me/notes'

def remove_student_from_notebook(token, studentid, notebookid):
    client = OAuth2Session(token=token)
    headers = {'Authorization': 'Bearer, {}'.format(token), 'Accept': 'application/json'}
    url = '{0}/classNotebooks/{1}/students/{2}'.format(onenote_url, notebookid, studentid)
    events = client.delete(url, headers=headers)
    print(events.json())
I know the ids are correct because I can use the exact same ones to add a student to the class notebook without any problems.
Has anyone got this API to work?
Does it work?
What am I doing wrong?
I have found the problem. I was using the principleUserName because that is what a previous API call returns when listing the students in a class notebook.
In my case, for some reason, the principleUserName is something like
'i:0#.f|membership|name@org.co.uk'
which contains special characters, so it can't form part of the URL of the REST API call.
The solution is to strip off the first part of the principleUserName and use only the email address, i.e. name@org.co.uk; then it works.
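A minimal sketch of that stripping step, assuming the email address is always the part after the last '|' (the function name is made up for illustration):

```python
def email_from_claim(principal_user_name):
    """Return the part after the last '|' of a claims-style login name."""
    return principal_user_name.rsplit("|", 1)[-1]


print(email_from_claim("i:0#.f|membership|name@org.co.uk"))
```

A value that has no '|' prefix is returned unchanged, so the helper can be applied to every student id before building the URL.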
New to TCL and having an issue with using the ::rest::simple url query ?config? ?body? command - specifically getting basic authentication to work. The example given here (https://core.tcl-lang.org/tcllib/doc/tcllib-1-18/embedded/www/tcllib/files/modules/rest/rest.html#section4) is as follows:
set url http://twitter.com/statuses/update.json
set query [list status $text]
set res [rest::simple $url $query {
    method post
    auth {basic user password}
    format json
}]
So my attempt is:
package require rest
package require json
set url http://0.0.0.0:5000/api/id
set response [rest::simple $url {
    method get
    auth {basic user password}
    format json
}]
puts $response
However, I keep getting a 401 error when I try and run the above against a mock API endpoint for GET:
"GET /api/id?auth=basic%20user%20password&method=get&format=json HTTP/1.1" 401 -
I can make a curl request against that same endpoint using basic auth (with Python as well), and if I disable basic auth on the endpoint this works just fine in TCL:
set url http://0.0.0.0:5000/api/id
set response [rest::simple $url {
    method get
    format json
}]
puts $response
So it's something to do with the basic auth credentials in the TCL rest module.
Thanks to Shawn's comment, which pointed out that I was misreading the meaning of ? in the Tcl docs. Parameters surrounded by question marks are optional, not parameters followed by a question mark. I was reading ::rest::simple url query ?config? ?body? as meaning the query parameter was optional. If there is no query, pass an empty query as the required parameter. This ended up working:
set response [rest::simple $url {} {
    method get
    auth {basic user password}
    format json
}]
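To see why the first attempt produced that request line, here is a small Python illustration (not Tcl, and not how the rest module is implemented): when the config block lands in the query position, its key/value pairs are URL-encoded into the query string, whereas basic auth is expected as an Authorization header.

```python
from base64 import b64encode
from urllib.parse import quote, urlencode

# The config block, read as if it were the query argument
config_as_query = {"auth": "basic user password", "method": "get", "format": "json"}
query_string = urlencode(config_as_query, quote_via=quote)
print("/api/id?" + query_string)

# What a basic-auth request carries instead: a header, not a query parameter
auth_header = "Basic " + b64encode(b"user:password").decode("ascii")
print("Authorization: " + auth_header)
```

The first printed line matches the path in the 401 log entry, which is a quick way to recognise this kind of argument mix-up.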
I'm trying to use a DocuSign API service in an ABAP project. I want to send a document to a specific email address so it can be signed, but I'm getting the following error:
"errorCode": "INVALID_REQUEST_PARAMETER",
"message": "The request contained at least one invalid parameter. Query parameter 'from_date' must be set to a valid DateTime, or 'envelope_ids' or 'transaction_ids' must be specified."
I tried the following:
CALL METHOD cl_http_client=>create_by_url
  EXPORTING
    url           = l_url " 'https://demo.docusign.net/restapi/v2/accounts/XXXXXX'
    proxy_host    = co_proxy_host
    proxy_service = co_proxy_service
  IMPORTING
    client        = lo_http_client.
lo_http_client->request->set_method( method = 'POST' ).
CALL METHOD lo_http_client->request->set_header_field
  EXPORTING
    name  = 'Accept'
    value = 'application/json'.
CALL METHOD lo_http_client->request->set_header_field
  EXPORTING
    name  = 'X-DocuSign-Authentication'
    value = get_auth_header( ). " JSON auth header
CALL METHOD lo_http_client->request->set_cdata
  EXPORTING
    data = create_body( ).
This is my body:
CONCATENATE
  `{`
  `"emailSubject": "DocuSign REST API Quickstart Sample",`
  `"emailBlurb": "Shows how to create and send an envelope from a document.",`
  `"recipients": {`
  `"signers": [{`
  `"email": "test@email",`
  `"name": "test",`
  `"recipientId": "1",`
  `"routingOrder": "1"`
  `}]`
  `},`
  `"documents": [{`
  `"documentId": "1",`
  `"name": "test.pdf",`
  `"documentBase64":` `"` l_encoded_doc `"`
  `}],`
  `"status": "sent"`
  `}` INTO re_data.
The API request to get the base URL is working fine. (I know the error is quite specific about what the problem is, but I can't find anything in the DocuSign API documentation saying that one of the mentioned parameters should be added to the request.)
Thanks in advance
The error message seems to indicate that you're Posting to an endpoint that requires certain query string parameters -- but you're not specifying them as expected in the query string. I'd suggest you check the DocuSign API documentation for the operation you are using, to determine what query string parameters it requires, and then ensure that you're including those parameters in your request URL.
If you can't figure this out using the documentation, then I'd suggest that you update your post to clarify exactly what URL (endpoint) you are using for the request, including any querystring parameters you're specifying in the URL. You can put fake values for things like Account ID, of course -- we just need to see the endpoint you are calling, and what qs params you're sending.
To create an envelope, use
https://demo.docusign.net/restapi/v2/accounts/XXXXXX/envelopes
instead of
https://demo.docusign.net/restapi/v2/accounts/XXXXXX
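As a language-neutral cross-check, the same request can be sketched in Python. This only builds the URL and JSON body; the field names are copied from the body in the question, while the function names and sample values are made up for illustration.

```python
import base64
import json


def envelopes_url(base_url, account_id):
    """Envelope creation is a POST to the .../envelopes resource of the account."""
    return "{}/accounts/{}/envelopes".format(base_url.rstrip("/"), account_id)


def envelope_body(document_bytes, email, name):
    """Build the JSON body with the same fields as the ABAP CONCATENATE above."""
    return json.dumps({
        "emailSubject": "DocuSign REST API Quickstart Sample",
        "emailBlurb": "Shows how to create and send an envelope from a document.",
        "recipients": {"signers": [
            {"email": email, "name": name, "recipientId": "1", "routingOrder": "1"}
        ]},
        "documents": [{
            "documentId": "1",
            "name": "test.pdf",
            "documentBase64": base64.b64encode(document_bytes).decode("ascii"),
        }],
        "status": "sent",
    })


url = envelopes_url("https://demo.docusign.net/restapi/v2", "XXXXXX")
body = envelope_body(b"%PDF-1.4 sample", "test@email", "test")
print(url)
```

Building the body with a JSON library rather than string concatenation avoids quoting mistakes around the base64 document.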
Thank you for all the answers; I found the mistake. Creating the request wasn't the problem. I was using the wrong "sending" method -_-.
Now it's working :)
lo_rest_client->post( EXPORTING io_entity = lo_request_entity ).