Pushing MailGun webhook onto AWS SQS - aws-api-gateway

I need to process Mailgun webhooks. I did implement a solution directly on our web servers to process the webhooks, but Mailgun generates so many calls during a large campaign that it effectively becomes a DoS attack.
One solution I've been looking at is using AWS API Gateway in front of a Lambda function that then pushes onto an SQS queue; we can then poll the queue at a rate we can manage. Unfortunately, we can't get this to work because AWS API Gateway does not support multipart/form-data content types (which some of the webhooks use), so our SQS messages are not well formatted/structured. The best we can do is use the $util.escapeJavaScript($input.body) function in the mapping template to create an SQS message containing the raw webhook body as a string (with escaped JavaScript characters), which is effectively unparseable, i.e. we can't get the data out of it.
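For reference, the mapping template in question is essentially just this (a sketch; the surrounding JSON wrapper is our own choice):

```
{
  "body": "$util.escapeJavaScript($input.body)"
}
```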
I've had a go at using Zapier to process the webhook and push directly onto the SQS queue. It can parse the various content types effectively and create a nicely structured message for us, but the cost of the service is not viable.
Has anybody managed this problem in another way? Are there solutions to API Gateway not parsing the content properly? I've deliberately stayed away from Mailgun's event polling API as it involves significant delays before the polled data can be 'trusted' (according to Mailgun).
Basically, is there another way of getting a nicely parsed message from content types multipart/form-data and application/x-www-form-urlencoded onto the queue?
Any ideas would be much appreciated!
To add, this link highlights issues with API Gateway and multipart/form-data content:
API Gateway - Post multipart/form-data

As you've mentioned, you can Base64-encode the body in API Gateway and Base64-decode it in the Lambda function to retrieve the original payload (there are standard libraries for this in every language).
Also, note that you can use multipart/form-data for non-file bodies:
Get non file body from multipart/form-data using AWS API Gateway and Lambda
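To illustrate the decode step, here's a minimal sketch of a Node.js Lambda handler (assuming API Gateway passes the body through Base64-encoded):

```javascript
exports.handler = async (event) => {
  // event.body holds the Base64-encoded webhook payload from API Gateway
  const raw = Buffer.from(event.body, 'base64').toString('utf8');

  // raw is now the original multipart/form-data (or x-www-form-urlencoded)
  // body, ready to be parsed and pushed onto SQS
  console.log(raw);
  return { statusCode: 200 };
};
```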

I had the same challenge when building Suet. I ended up switching to Google Cloud Functions, which I really recommend. Don't waste time on Amazon API Gateway. Use Google Cloud Functions with a middleware like multer. (You can see the source of Suet's webhook handler here.)
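As an illustration of the multer approach (a sketch in plain Express; the route and the event/recipient field names are just examples of what a Mailgun webhook posts):

```javascript
const express = require('express');
const multer = require('multer');

const app = express();
const upload = multer(); // in-memory storage; Mailgun webhooks post text fields, not files

// upload.none() parses multipart/form-data text fields onto req.body
app.post('/webhooks/mailgun', upload.none(), (req, res) => {
  console.log(req.body.event, req.body.recipient); // example Mailgun fields
  res.sendStatus(200);
});

app.listen(3000);
```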

Not sure if you ever came to a solution, but I have this working with the following settings.
Set up your API Gateway method to use "Use Lambda Proxy integration".
In your Lambda (I use Node.js), use busboy to work through the multipart submission from the Mailgun webhook (see this post for help with busboy: Busboy help).
Make sure that any code you want to run after parsing is complete is executed in the 'finish' handler of the busboy code, as in the sketch below.
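Putting those three points together, a minimal sketch of the Lambda (Node.js with busboy 0.x; the proxy event fields are standard, everything else is illustrative):

```javascript
const Busboy = require('busboy');

exports.handler = (event, context, callback) => {
  const contentType = event.headers['content-type'] || event.headers['Content-Type'];
  const busboy = new Busboy({ headers: { 'content-type': contentType } });
  const fields = {};

  // Collect each form field from the Mailgun webhook
  busboy.on('field', (name, value) => { fields[name] = value; });

  // Only run follow-up work (e.g. pushing to SQS) once parsing has finished
  busboy.on('finish', () => {
    console.log('Parsed webhook:', fields);
    callback(null, { statusCode: 200, body: 'OK' });
  });

  // Feed the raw proxy-integration body into busboy
  busboy.write(event.body, event.isBase64Encoded ? 'base64' : 'utf8');
  busboy.end();
};
```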

Related

Static Website, File Upload and recaptcha

I'm wondering what the best approach is to implement a simple form with file upload on a static website without any backend.
Scenario:
I have a static website (NuxtJS) where a form can be filled in and files can be uploaded.
To protect this form I wanted to use reCAPTCHA by Google, but reading a little further in their documentation it seems that I need a backend, which is overkill for a static website.
Furthermore, I want to support file uploads, which is quite complicated without a backend.
What I thought of:
Is there an existing product which does exactly what I am looking for? Or should I build an AWS Lambda pipeline (of course with an S3 bucket) to function as my "backend" for reCAPTCHA and file upload?
Is there any approach that makes this scenario simpler, or am I overcomplicating things at the moment?
Use Case / Flow Chart:
User enters the website.
Fills out the form.
(optional) Uploads files.
Checks the reCAPTCHA.
Clicks Send, which sends a "message" to our company's Slack channel or email.
I solved this "common" task with a custom "backend" hosted on AWS Lambda, which makes the whole thing "serverless".
For those who are interested in how to set up a serverless backend, here's the flow chart I used.
As you can see, after the reCAPTCHA is validated on the client side and a token is generated, the token is sent to AWS API Gateway, which triggers a Lambda function (a Node.js backend implementation) where the token is validated and, for file uploads, pre-signed URLs are generated.
Note: the API Gateway and the S3 bucket need a valid CORS configuration to communicate with each other and the world.
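A rough sketch of what that Lambda can look like (Node.js with the aws-sdk; RECAPTCHA_SECRET, BUCKET, and the request shape are assumptions for illustration):

```javascript
const https = require('https');
const querystring = require('querystring');
const AWS = require('aws-sdk');

const s3 = new AWS.S3();

exports.handler = async (event) => {
  const { token, fileName } = JSON.parse(event.body); // assumed request shape

  // Verify the reCAPTCHA token server-side with Google
  const form = querystring.stringify({
    secret: process.env.RECAPTCHA_SECRET, // assumed env var
    response: token,
  });
  const success = await new Promise((resolve, reject) => {
    const req = https.request(
      'https://www.google.com/recaptcha/api/siteverify',
      { method: 'POST', headers: { 'Content-Type': 'application/x-www-form-urlencoded' } },
      (res) => {
        let data = '';
        res.on('data', (chunk) => (data += chunk));
        res.on('end', () => resolve(JSON.parse(data).success));
      }
    );
    req.on('error', reject);
    req.end(form);
  });
  if (!success) return { statusCode: 403, body: 'reCAPTCHA validation failed' };

  // Generate a pre-signed PUT URL so the browser can upload directly to S3
  const uploadUrl = s3.getSignedUrl('putObject', {
    Bucket: process.env.BUCKET, // assumed env var
    Key: fileName,
    Expires: 300, // URL valid for 5 minutes
  });

  return {
    statusCode: 200,
    headers: { 'Access-Control-Allow-Origin': '*' }, // CORS, as noted above
    body: JSON.stringify({ uploadUrl }),
  };
};
```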

Webhook and API (Definition & Differences)

I want to know what a webhook is and what a real-world application of webhooks looks like. Also, what are the differences between a webhook and an API?
An API is a standardised way of communicating with a service. You've tagged REST in your question so I'll focus on RESTful APIs using HTTP but it is important to know that API is a very generic term.
In the REST world everything is a resource, and you use the HTTP methods to define what action you want to take on that resource. For example, to list all the users on GitHub you would send a GET request to https://api.github.com/users. The URL (specifically the /users part) defines what resource you are interested in; here the resource is the collection of all users. There are other methods you can use, such as PUT to create or update a resource. To learn more about the different methods you can read the HTTP specification.
Webhooks are often used in conjunction with APIs but they are focused on events. They allow a service to send out 'notifications' when an event happens or some condition is met.
GitHub is again a good example of what webhooks are used for. Say I'm building a service which sends out an email every time someone leaves a comment on an issue in GitHub. I could use the GitHub API (as above) to list all of the comments on an issue and then check whether there have been any new comments since the last time I looked, repeating this request every few seconds. This is known as polling. The issue here is that most of the time I check, the result will not have changed, which is a waste of resources.
Webhooks allow for event-driven programming. Instead of repeatedly checking, I can instruct GitHub to send my service an HTTP request every time a comment is added: aka a webhook. In this architecture I only have to send a request to GitHub's API when I know for sure that a new comment has been left.
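To make that concrete, a webhook handler is just an ordinary HTTP endpoint that the service calls; a minimal sketch (Express, with an illustrative payload shape):

```javascript
const express = require('express');
const app = express();
app.use(express.json());

// GitHub calls this endpoint whenever the subscribed event (a new comment) occurs
app.post('/webhooks/github', (req, res) => {
  const comment = req.body.comment; // payload shape depends on the event type
  console.log('New comment:', comment && comment.body);
  res.sendStatus(200); // acknowledge receipt quickly
});

app.listen(3000);
```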
Overall, you cannot really compare APIs and webhooks. The link between them is simply that webhooks send requests to APIs.

How to convert a Postman request into a NiFi request?

I don't mind if you use an example from another API that is not Adobe Analytics'. I just need to know the pattern I have to follow in order to successfully convert a Postman request into a NiFi request.
After successfully creating requests to pull reports from Adobe Analytics via Postman, I'm having difficulty migrating these Postman requests to NiFi. I haven't been able to find concrete use cases that explicitly explain how to do this kind of task step by step.
I'm trying to build a backend on top of NiFi to handle multiple data extracts from Adobe Analytics in an efficient and robust way, instead of having to create all the required scripts myself. Yet there is more documentation about REST APIs and Postman than there is about REST APIs and NiFi.
In the screenshot below we can see what the Postman request looks like. It takes 3 headers and 1 temporary header that includes the authorization value (Bearer token). This temporary header is generated automatically after filling in the OAuth 2.0 authorization form in the Authorization tab, as shown here.
Then we have the body of the request. This JSON text is generated automatically by debugging Adobe Analytics' workspaces, as shown here.
I'd like to know the following in a step-by-step manner with screenshots if possible:
Which processor(s) should I use in NiFi to obtain a response similar to the one I got in Postman?
Which properties should I add/remove from the processor to make this work?
How should I name these properties?
Is there a default property whose value/name I should modify?
As you can see, the question mainly refers to property setup in NiFi, as well as processor selection. I already tried to configure some processors, but I don't seem to get the properties right, or maybe I'm selecting the wrong processors.
I'm using NiFi v1.6.0 and Postman v7.8.0
This is most likely an easy task for users already familiar with NiFi and API requests, but it has proven challenging to me. Hopefully this will help other users looking to build more robust pipelines by using NiFi instead of doing it manually.
Thanks.
It only takes 3 NiFi processors to replicate a REST API request that works in Postman. In this solution we use a request that contains a nested JSON body. The advantage of this simple approach is that it reduces the amount of configuration required to obtain a successful response from the API, even if you are using a complex JSON request: the body is passed through the GenerateFlowFile processor, without the need for any other processor to parse or format the request.
Step #1. Create a GenerateFlowFile processor. The only property you have to modify is Custom Text: paste in your whole JSON request just as it was in Postman. In this case I'm using the very same JSON shown in the question above. It's a good idea to set the Yield Duration to 10 seconds or more.
Step #2. Create an InvokeHTTP processor, then modify the 6 properties shown in the screenshots below. Use the same authorization details you used in Postman; make sure to copy the Bearer token from Postman after it has been tested. Don't forget to set the HTTP Method, Remote URL and Content-Type as well.
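As a rough guide (the values are placeholders standing in for whatever worked in Postman), the InvokeHTTP configuration ends up along these lines; each extra Postman header becomes a dynamic (user-added) property on InvokeHTTP, which NiFi sends as a request header:

```
HTTP Method   : POST
Remote URL    : <the same endpoint URL used in Postman>
Content-Type  : application/json

# dynamic properties, one per Postman header, e.g.:
Authorization : Bearer <token copied from Postman>
```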
Step #3. Finally, add a couple of LogAttribute processors to store the output of InvokeHTTP. One of these LogAttribute processors should store successful responses; the other can be shared by Failure, Original, Retry and No-Retry, or you can create a LogAttribute processor for each of these outputs.
Step #4. Now connect the processors and start your data flow! You should start seeing data populate the success LogAttribute. You can then use the Data Provenance option to review the incoming data and confirm that it is exactly the same result you previously obtained from Postman.
Note: This is a simple, straightforward, "for starters" solution to replicate a Postman API request using a nested static JSON. There are more solutions in StackOverflow that tackle more complex cases, like dynamic JSON. Here's a list of some other posts:
nifi invokehttp post complex json
In NiFi processor 'InvokeHTTP' where do you write body of POST request?
Configuring HTTP POST request from Nifi

DialogFlow Fulfilment connecting to REST APIs

I want to use Dialogflow fulfillment to connect to an external webservice / API. One way of doing that is to use the custom webhook feature (not the inline webhook). However, when using the custom webhook it seems that you are limited to creating just one, even though you may have many intents and may want to call many endpoints. Is there a way to link to more custom webhooks (API endpoints)?
If you can only set up one webhook, then your webservice will always receive a POST request from Dialogflow and will then need to interpret the body of the request, i.e. based on the intent parameter. I'm just wondering whether there is a better way to work with REST webservices in Dialogflow.
The other potential option is to use the inline webhook and put logic in there to call specific endpoints, but that might get a bit messy.
You can only set up one fulfillment that will handle the processing for all the Intents you've enabled. This can be either the built-in one through the fulfillment editor or at a webhook URL you specify.
That webhook is expected to delegate the actual processing to an Intent Handler of some sort. The Dialogflow node.js fulfillment library has a way to register what handler you want for each Intent name, or you can switch on the Intent name, the Action name, or any other field provided to you in your code.
In the library, you'll typically make the REST calls from the appropriate Intent handler, which will take the parameters provided and craft the call. If you are using JavaScript, make sure you handle the call asynchronously and return a Promise.
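A minimal sketch using the dialogflow-fulfillment node.js library (the intent name, parameter, and REST endpoint are placeholders):

```javascript
const { WebhookClient } = require('dialogflow-fulfillment');
const fetch = require('node-fetch'); // any HTTP client will do

exports.dialogflowFulfillment = (request, response) => {
  const agent = new WebhookClient({ request, response });

  // One handler per Intent; returning the Promise makes Dialogflow
  // wait for the REST call to finish before responding
  function weatherHandler(agent) {
    const city = agent.parameters.city; // parameter defined on the Intent
    return fetch(`https://api.example.com/weather?city=${encodeURIComponent(city)}`)
      .then((res) => res.json())
      .then((data) => agent.add(`It is ${data.temp} degrees in ${city}.`));
  }

  // Register a handler for each Intent name
  const intentMap = new Map();
  intentMap.set('Get Weather', weatherHandler);
  agent.handleRequest(intentMap);
};
```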
I recommend a webhook because it gives you more control than the inline editor does. The inline editor is really just a webhook under the covers, using Firebase Cloud Functions. Even deploying it yourself as a Cloud Function gives you better control over it.
There may be costs depending on where you host it; however, Firebase has a free tier that is sufficient for testing and light operation. Once your Action is published, you are also eligible for a monthly cloud credit from Google.

Send batched data to Google Calendar with Scala/Spray

We are successfully sending data for new, changed, and removed events to Google Calendar from a Scala app using Spray HTTP. However, we are currently sending one event per request, and this becomes very inefficient when there are multiple events for the current user. In these cases we would like to send batched data, as described here:
https://developers.google.com/google-apps/calendar/batch
The documentation begins with:
A batch request is a single standard HTTP request containing multiple Google Calendar API calls, using the multipart/mixed content type. Within that main HTTP request, each of the parts contains a nested HTTP request.
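For reference, the nested structure from those docs looks roughly like this (simplified; the boundary, paths and bodies are illustrative):

```
POST /batch HTTP/1.1
Content-Type: multipart/mixed; boundary=batch_boundary

--batch_boundary
Content-Type: application/http

POST /calendar/v3/calendars/primary/events HTTP/1.1
Content-Type: application/json

{"summary": "Event 1", ...}

--batch_boundary
Content-Type: application/http

POST /calendar/v3/calendars/primary/events HTTP/1.1
Content-Type: application/json

{"summary": "Event 2", ...}

--batch_boundary--
```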
Since we are already using spray-http, we would like to use its support for multipart/mixed requests (spray.http.MultipartContent), but it isn't clear that this is possible, since the parts must consist of one or more spray.http.BodyPart instances and there doesn't seem to be a way to turn a spray.http.HttpRequest into a BodyPart.
Has anyone successfully done this? We are also taking a look at the Google API Client for Java but would rather not go down that path if there is a more Scala-friendly way to do it.