Efficient architecture/tools for implementing an async web API (REST)

Consider an event-driven, microservice-based web application that needs some async web APIs. AFAIK, the suggested way to achieve an async HTTP request/response is to respond to each API call with, say, a 202 Accepted status code and a Location header so the caller can retrieve the result later.
This way, we have to generate a unique ID (like a UUID/GUID) for each request and store that ID, along with all related events as they occur, in persistent storage so the API caller can track the progress of its request.
My question is how this API layer should be implemented, considering we may have tens or hundreds of thousands of requests and responses per second. What are the most efficient architecture and tools for building such an API under this load?
One way could be storing all the requests and related events in both a database and a cache like Redis (the cache only for a limited time, say 30 minutes).
Is there a better pattern/architecture or better tools? How have big companies and websites solved this issue?
Which database would suit this scenario better (MongoDB, MySQL, …)?
I'd really appreciate any useful answer, especially if you have production experience.
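For concreteness, a minimal sketch of the 202 Accepted + Location flow described above (ASP.NET Core is used purely for illustration, and a ConcurrentDictionary stands in for the Redis/database store; route names are made up):

using System.Collections.Concurrent;

var app = WebApplication.Create(args);   // assumes a default ASP.NET Core minimal-API project

// Stand-in for Redis plus a durable store: request ID -> current status.
var jobs = new ConcurrentDictionary<Guid, string>();

app.MapPost("/tasks", () =>
{
    var id = Guid.NewGuid();
    jobs[id] = "Pending";
    // ...publish the request to the event bus / background workers here...
    // 202 Accepted with a Location header pointing at the status resource.
    return Results.Accepted($"/tasks/{id}/status", new { id });
});

app.MapGet("/tasks/{id:guid}/status", (Guid id) =>
    jobs.TryGetValue(id, out var status)
        ? Results.Ok(new { id, status })
        : Results.NotFound());

app.Run();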

Very valid question! From an architecture and tools point of view, you should check out Zipkin, an open-source distributed tracing system tried and tested by Twitter. Especially if you have a microservice architecture, it is really useful for tracking down all your requests/responses. Its storage options include in-memory, JDBC (MySQL), Cassandra, and Elasticsearch.
If you are using Spring Boot for your microservices, it is easily pluggable.
Even if you are not totally convinced by Zipkin, its architecture is worth looking into. From production experience: I have used it and it was really useful.

Related

How can event-driven architecture be applied to this example?

I am unsure how to make use of event-driven architecture in real-world scenarios. Let's say there is a route planning platform consisting of the following back-end services:
user-service (manages user data and roles)
map-data-service (roads & addresses, only modified by admins)
planning-tasks-service (accepts new route planning tasks, keeps track of background tasks, stores results)
The public website will usually request data from all 3 of those services. map-data-service needs information about user-roles on a data change request. planning-tasks-service needs information about users, as well as about map-data to validate new tasks.
Right now those services would just make sync requests to each other to get the needed data. What would be the best way to translate this basic structure into an event-driven architecture? Can dependencies be reduced by making use of events? How will the public website get the data it needs?
Cosmin is 100% correct in that you need something to do some orchestration.
One approach to take, if you have a client that needs data from multiple services, is the Experience API approach.
Clients call the experience API, which performs the orchestration - pulling data from different sources and providing it back to the client. The design of the experience API is heavily, and deliberately, biased towards what the client needs.
Based on the details you've given so far, I can't see anything that cries out for an event-based architecture. The communication between the client and the ExpAPI can be a mix of sync and async, as can the ExpAPI-to-services communication.
And for what it's worth, putting all of that on an API gateway is not a bad idea, in that gateways are designed to host APIs and therefore provide the desirable controls and observability for managing them.
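As a rough sketch of what such an experience/aggregation endpoint might look like (ASP.NET Core chosen only for illustration; the service URLs and route are placeholders for the three services in the question):

var app = WebApplication.Create(args);   // assumes a default ASP.NET Core minimal-API project
var http = new HttpClient();

app.MapGet("/experience/planner-home/{userId}", async (string userId) =>
{
    // Fan out to the backing services in parallel...
    var user  = http.GetStringAsync($"http://user-service/users/{userId}");
    var map   = http.GetStringAsync("http://map-data-service/summary");
    var tasks = http.GetStringAsync($"http://planning-tasks-service/tasks?user={userId}");
    await Task.WhenAll(user, map, tasks);

    // ...and shape the response around what this particular client screen needs.
    return Results.Ok(new { user = user.Result, map = map.Result, tasks = tasks.Result });
});

app.Run();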
Update based on OP Comment
"I was really interested in how an event-driven architecture could reduce dependencies between my microservices, as it is often stated."
Having components (or systems) talk via events is sort of the asynchronous equivalent of Inversion of Control, in that the event consumers are not tightly coupled to the thing that emits the events. That's how the dependencies are reduced.
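As a tiny illustration of that decoupling (the Confluent.Kafka client is assumed here; the broker address and topic name are made up): the consumer depends only on the event contract, not on whichever service emitted it.

using Confluent.Kafka;

var config = new ConsumerConfig
{
    BootstrapServers = "localhost:9092",   // assumed broker address
    GroupId = "planning-tasks-service"
};

using var consumer = new ConsumerBuilder<string, string>(config).Build();
consumer.Subscribe("user-events");         // hypothetical topic name

while (true)
{
    // React to 'userCreated' / 'userModified' events without knowing who published them.
    var result = consumer.Consume();
    Console.WriteLine($"Handling {result.Message.Key}: {result.Message.Value}");
}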
One thing you could do, as a learning exercise, is a little side project: take a snapshot of your code and do a rough-and-ready conversion to an event-based design, just to see how it goes - not so much an attempt to event-ify your solution as a way to see what putting events into a real-world solution looks like. If you have the time, of course.
The missing piece in your architecture is the API Gateway, which should be the only entry-point in your system, used by the public website directly.
The API Gateway would play the role of an orchestrator, deciding which services to route the request to, and also assembling the final response needed by the frontend.
For scalability purposes, the communication between the API Gateway and individual microservices should be done asynchronously through an event-bus (or message queue).
However, the most important step in creating a scalable event-driven architecture which leverages microservices, is to properly define the bounded contexts of your system and understand the boundaries of each functionality.
More details about this architecture can be found here
Event storming is the first thing you need to do to identify domain events (a change of state in your system), for example 'userCreated', 'userModified', 'locationCreated', 'routeCreated', 'routeCompleted', etc. Then you can define topics that carry these events. Interested parties can consume these events by subscribing to them (via topics/channels) and then act accordingly. An implementation of an event-driven architecture is often composed of loosely coupled microservices that communicate asynchronously through a message broker like Apache Kafka. The free EDA book is an excellent resource for learning most of these EDA concepts.
Tutorial: Event-driven architecture pattern
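A hedged sketch of publishing such a domain event (Confluent.Kafka client assumed; the broker address, topic name, and payload are made up):

using Confluent.Kafka;

var config = new ProducerConfig { BootstrapServers = "localhost:9092" };   // assumed broker
using var producer = new ProducerBuilder<string, string>(config).Build();

// Publish a domain event to its topic; any interested service can subscribe to it.
await producer.ProduceAsync("route-events", new Message<string, string>
{
    Key = "routeCompleted",
    Value = "{ \"routeId\": \"42\", \"completedAt\": \"2024-01-01T12:00:00Z\" }"
});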

Sending >100 MB files to an API

We have a REST service where clients send us batches of files. These files can be quite large (>100 MB). We had assumed that we could do this via a REST API, but I was wondering what is considered best practice here. We are not tied to using REST.
Is there a limitation on IIS and REST that would make receiving many large files untenable? Where can things go wrong?
Is there a best practice for REST for doing this sort of thing?
Are there alternatives outside of REST that are considered better?
Take a look at this document.
https://learn.microsoft.com/en-us/dotnet/framework/wcf/feature-details/large-data-and-streaming
The maxAllowedContentLength property in the IIS settings and MaxReceivedMessageSize in the binding settings should be taken into consideration when transferring large data.
In short, for intranet transfer of large files, NetTcpBinding in WCF gives a good performance experience, while for internet transfer of large files, async streaming may be a good idea. In addition, I don't think RESTful services perform better than SOAP web services when transferring large data.
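As a rough illustration of the async-streaming idea on the receiving side (ASP.NET Core rather than WCF, purely as a sketch; the route and the temp-file destination are made up), the key point is to copy the request body to its destination as a stream instead of buffering the whole file in memory:

var app = WebApplication.Create(args);   // assumes a default ASP.NET Core minimal-API project

app.MapPost("/upload", async (HttpRequest request) =>
{
    // Stream the body straight to disk; the full >100 MB payload is never held in memory.
    // (For files this large you would also raise the Kestrel/IIS request size limits.)
    var path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
    await using var file = File.Create(path);
    await request.Body.CopyToAsync(file);
    return Results.Ok(new { storedAs = Path.GetFileName(path) });
});

app.Run();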

Is REST (HTTP) relevant for broadcasting data to many client applications?

I am currently working on a project made of many microservices that will asynchronously broadcast data to many possible client applications.
Additionally, client applications will be able to communicate with the system (i.e. the set of microservices) via a REST Open API.
For broadcasting the data, my first consideration was to use a MOM (Message Oriented Middleware) such as AMQ.
However, I have been asked to reconsider this solution and to prefer a REST endpoint (over HTTP) in order to provide a more "Open API oriented" interface.
I am not a big specialist in HTTP, but it seems to me that the main technologies for sending asynchronous data from server to client are:
WebSocket
SSE
I am opening this discussion in order to get advice/feedback from other developers to help me weigh the pros & cons of this new solution. Among them:
is an HTTP technology such as SSE/WebSockets relevant for my needs?
For additional information, here are a few metrics regarding the data to broadcast:
a considerable number of messages per second
responsiveness
more than 100 clients listening for data
Thank you for your help and contribution
There are many different definitions of what people consider REST and not REST, but most people tend to agree that, in practical terms and in popular best practice, REST services expose a data model via HTTP and limit operations on this data model to requesting the state of resources (GET) or updating the state of resources (PUT). Things are then stacked on top of that foundation.
What you describe is a pub-sub model. While it might be possible in academic terms to use REST concepts in a pub-sub architecture, I don't think that's really what you're looking for here.
WebSockets and SSE, in most real-world situations, do not fall under the REST umbrella, but they can augment an existing REST service.
If your goal is simply to create a pub-sub system using a technology stack people are familiar with, WebSockets are a really good choice. They are widely available and work in browsers.
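For what it's worth, a minimal server-side WebSocket sketch (ASP.NET Core used only for illustration; the /stream route and payload are made up) that pushes a message to a connected client about once a second:

using System.Net.WebSockets;
using System.Text;

var app = WebApplication.Create(args);   // assumes a default ASP.NET Core project
app.UseWebSockets();

app.Map("/stream", async (HttpContext ctx) =>
{
    if (!ctx.WebSockets.IsWebSocketRequest)
    {
        ctx.Response.StatusCode = StatusCodes.Status400BadRequest;
        return;
    }

    using var socket = await ctx.WebSockets.AcceptWebSocketAsync();
    try
    {
        while (socket.State == WebSocketState.Open)
        {
            // Push the latest data to this subscriber.
            var payload = Encoding.UTF8.GetBytes($"{{ \"ts\": \"{DateTime.UtcNow:O}\" }}");
            await socket.SendAsync(new ArraySegment<byte>(payload),
                                   WebSocketMessageType.Text, true, ctx.RequestAborted);
            await Task.Delay(TimeSpan.FromSeconds(1), ctx.RequestAborted);
        }
    }
    catch (OperationCanceledException)
    {
        // Client disconnected; nothing else to do.
    }
});

app.Run();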

Adapter Proxy for Restful APIs

This is a general 'what technologies are available' question.
My company provides a web application with a RESTful API. However, it is too slow for my needs and some of the results are in an awkward format.
I want to wrap their RESTful server with a proxy/adapter server, so that when you connect to the proxy you get the RESTful API I wish the real one provided.
So it needs to do a few things:
pass most requests straight through
cache some requests
do some extra requests on the original server to detect if a request is cacheable
for instance: there is a request for a field in a record, GET /records/id/field, which might be slow, but there is a fingerprint request, GET /records/id/fingerprint, which is always fast. If there is a cached copy of GET /records/1/field2 for the fingerprint feedbeef, then I need to check that the original server still reports the fingerprint feedbeef before serving the cached version (see the sketch after this list).
fix headers for some responses - e.g. content-type, based upon the path
do stream processing on some large content, for instance
GET /records/id/attachments/1234
returns a 100Mb log file in text format
remove null characters from files
optionally recode the log to filter out irrelevant lines, reducing the load on the client
cache the filtered version for later requests.
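A minimal sketch of the fingerprint-based cache check from the list above (ASP.NET Core only for illustration; the upstream base URL, routes, and the in-memory dictionary are placeholders for whatever you end up using):

using System.Collections.Concurrent;

var app = WebApplication.Create(args);   // assumes a default ASP.NET Core minimal-API project
var upstream = "https://original-api.example.com";   // placeholder for the real server
var http = new HttpClient();

// Cached field value per record/field, together with the fingerprint it was fetched under.
var cache = new ConcurrentDictionary<string, (string Fingerprint, string Body)>();

app.MapGet("/records/{id}/{field}", async (string id, string field) =>
{
    // The fingerprint request is always fast, so use it to validate the cache.
    var fingerprint = await http.GetStringAsync($"{upstream}/records/{id}/fingerprint");

    var key = $"{id}/{field}";
    if (cache.TryGetValue(key, out var entry) && entry.Fingerprint == fingerprint)
        return Results.Text(entry.Body);   // still valid: serve the cached copy

    // Otherwise fall back to the slow request and refresh the cache.
    var body = await http.GetStringAsync($"{upstream}/records/{id}/{field}");
    cache[key] = (fingerprint, body);
    return Results.Text(body);
});

app.Run();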
While I could modify the client to achieve this functionality, such code would not be re-usable for other clients (different languages), and complicates the client logic.
I had a look at whether Clojure/Ring could do it, and while there is a nice little proxy middleware for it, it doesn't handle streaming content as far as I can tell - the whole 100 MB would have to be downloaded. Also, it doesn't include any cache logic yet.
I took a look at whether Squid could do it, but I'm not familiar with the technology, and it seems mostly concerned with passing requests through rather than modifying them on the fly.
I'm looking for hints where I might find the correct technology to implement this. I'm mostly language agnostic if learning a new language gets me access to a really simple way to do it.
I believe you should choose a platform that makes it easy for you to implement your custom business logic. The following web application frameworks provide easy connectivity with REST APIs and allow you to create a web application that can work as a REST proxy:
Play framework (Java + Scala)
express + Node.js (JavaScript)
Sinatra (Ruby)
I'm more familiar with Play, which I know provides caching utilities you could find useful, and it is also extensible via a number of plugins.
If you are familiar with Scala, you could also have a look at Finagle. It is a framework built by Twitter's infrastructure team to provide protocol-agnostic connectivity. It might be overkill for a REST-to-REST proxy, but it provides abstractions you might find useful.
You could also look at some third-party services like Apitools, which lets you create a proxy programmatically (in Lua). Apirise is a similar service (of which I'm a co-founder) that intends to provide similar functionality with a user-friendly UI.
Beeceptor does exactly what you want. It plugs in between your web app and the original API to route requests.
For your use case of caching a few responses, you can create a rule; that way the request won't hit the original endpoint.
Requests to the original APIs can be mocked, and you can inspect the responses.
You can simulate delays.
(Note: it is a shameless plug, I am the author of Beeceptor and thought it should help you and other developers.)
https://github.com/nodejitsu/node-http-proxy is looking useful - although I don't yet know whether it can do stream processing for transcoding.

Using SOAP to expose CRUD operations

Is exposing CRUD operations through SOAP web services a bad idea? My instinct tells me that it is, not least because the overhead of the database calls could be huge. I'm struggling to find documentation for/against this (anti)pattern, so I was wondering if anyone could point me to some documentation or has an opinion on the matter.
Also, if anyone knows of best practices (and/or documentation to that effect) for designing SOAP services, that would be great.
Here's an example of how the web service would look:
Create
Delete
Execute
Fetch
Update
And here's what the implementation would look like:
[WebMethod]
public byte[] Fetch(byte[] requestData)
{
    // Deserialize the incoming payload into a typed request object.
    SelectRequest request = (SelectRequest)Deserialize(requestData);

    // Run the query through the generic CRUD/data-access layer.
    DbManager crudManager = new DbManager();
    object result = crudManager.Select(request.ObjectType, request.Criteria);

    // Serialize the result back into a byte array for the SOAP response.
    return Serialize(result);
}
If you want to use SOAP in a RESTful manner, then there is an interesting standard for this, WS-Transfer, which provides loosely coupled CRUD endpoints; you inspect the message and act on your entities accordingly.
Then you can layer whatever else you want on top: WS-Security, WS-ReliableMessaging, and so on.
I think publishing a SOAP service that exposes CRUD operations to anonymous, public "users" would be a particularly bad idea. If, however, you can restrict one or both of these caveats, then I see nothing wrong with it (moreover I've implemented such services many times).
You can require, in addition to whatever method parameters you need to perform the operation, username and password parameters that in effect authenticate the caller before the request is processed; an authentication failure can be signalled by returning a SOAP exception. If you were especially paranoid, you could also run the service over SSL.
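For example, a hedged sketch only (the Membership call is just a placeholder for whatever credential check you already have; the System.Web.Services.Protocols and System.Web.Security namespaces are assumed):

[WebMethod]
public byte[] Fetch(string username, string password, byte[] requestData)
{
    // Reject the call before doing any work if the caller cannot be authenticated.
    if (!Membership.ValidateUser(username, password))
        throw new SoapException("Authentication failed", SoapException.ClientFaultCode);

    // ...continue exactly as in the Fetch implementation shown in the question...
    SelectRequest request = (SelectRequest)Deserialize(requestData);
    DbManager crudManager = new DbManager();
    return Serialize(crudManager.Select(request.ObjectType, request.Criteria));
}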
You can have the server solution that handles sending and receiving the requests filter by IP, only allowing requests from a list of approved addresses.
Yes, there are overheads to running requests over SOAP (as opposed to exposing direct database access) - namely the processing time to wrap a request into an HTTP request, open a socket and send it (and the reverse at the receiving end, and again for the response) - but it does have advantages.
Java (through the NetBeans IDE) and .NET (through Visual Studio) both support consuming web services into projects/solutions - the biggest benefit of this is that objects/structures on the remote service are automatically translated into native objects in the consuming application, which is exceptionally handy.
If all you want to do is CRUD over the web, I'd look at some different technologies for doing REST instead of using WS-*. ADO.NET Data Services (formerly Project Astoria) might actually be a good alternative.
There is nothing wrong with exposing the CRUD operations via SOAP web-services per se.
You will obviously find quite a lot of examples for such services.
However, depending on your particular requirements, you might find that SOAP is too much overhead for you, or that you would be better off using JSON/AJAX, etc.
So I believe that, unless you provide additional details about your particular situation, there is no good answer to your question.