Erlang/OTP architecture: RESTful protocol for SOAish services - rest

Let us imagine we have an orders processing system for a pizza shop to design and build.
The requirements are:
R1. The system should be client- and use-case-agnostic, which means that the system can be accessed by a client which was not taken into account during the initial design. For example, if the pizza shop decides that many of its customers use the Samsung Bada smartphones later, writing a client for Bada OS will not require rewriting the system's API and the system itself; or for instance, if it turns out that using iPads instead of Android devices is somehow better for delivery drivers, then it would be easy to create an iPad client and will not affect the system's API in any way;
R2. Reusability, which means that the system can be easily reconfigured without rewriting much code if the business process changes. For example, if later the pizza shop will start accepting payments online along with accepting cash by delivery drivers (accepting a payment before taking an order VS accepting a payment on delivery), then it would be easy to adapt the system to the new business process;
R3. High-availability and fault-tolerance, which means that the system should be online and should accept orders 24/7.
So, in order to meet R3 we could use Erlang/OTP and have the following architecture:
The problem here is that this kind of architecture has a lot of "hard-coded" functionality in it. If, for example, the pizza shop will move from accepting cash payments on delivery to accepting online payments before the order is placed, then it will take a lot of time and effort to rewrite the whole system and modify the system's API.
Moreover, if the pizza shop will need some enhancements to its CRM client, then again we would have to rewrite the API, the clients and the system itself.
So, the following architecture is aimed to solve those problems and thus to help meeting R1, R2 and R3:
Each 'service' in the system is a Webmachine webserver with a RESTful API. Such an approach has the following benefits:
all goodness of Erlang/OTP, since each Webmachine is an Erlang application, which can be supervised and can be put into an Erlang release;
service oriented architecture with all the benefits of SOA;
easy adaptable to changes in the business process;
easy to add new clients and new functions to clients (e.g. to the CRM client), because a client can use RESTful APIs of all the services in the system instead of one 'central' API (Service composability in terms of SOA).
So, essentially, the system architecture proposed in the second picture is a Service Oriented Architecture where each service has a RESTful API instead of a WSDL contract and where each service is an Erlang/OTP application.
And here are my questions:
Picture 2: Am I trying to reinvent the wheel here? Should I just stick with the pure Erlang/OTP architecture instead? ("Pure Erlang" means Erlang applications packed into a release, talking to each other via
gen_server:call and gen_server:cast function calls);
Can you name any disadvantages in suggested approach? (Picture 2)
Do you think it would be easier to maintain and grow (R1 and R2) a system like this (Picture 2) than a truly Erlang/OTP one?
The security of such a system (Picture 2) could be an issue, since there are many entry points open to the web (RESTful APIs of all services) instead of just one entry point (Picture 1), isn't it so?
Is it ok to have several 'orchestrating modules' in such a system or maybe some better practice exists? ("Accept orders", "CRM" and "Dispatch orders" services on Picture 2);
Does pure Erlang/OTP (Picture 1) have any advantages over this approach (Picture 2) in terms of message passing and the limitations of the protocol? (partly discussed in my previous similar question, gen_server:call VS HTTP RESTful calls)

The thing to keep in mind regarding SOA is that the architecture is not about the technology (REST, WS* ). So you can get a good SOA in place with endpoints of several types if/when needed (what I call an Edge component - separating business logic from other concerns like communications and protocols)
Also it is important to note that service boundary is a trust boundary so when you cross it you may need to authenticate and authorize, cross network etc. Additionally, separation into layers (like data and logic) shouldn't drive the way you partition your services.
So from what I am reading in your questions, I'd probably partition the services into more coarse grained services (see below). communications within the boundary of the service can be whatever, where-as communications across services uses a public API (REST or Erlang native is up to you, but the point is that it is managed, versioned, secured etc.) - again, a service may have endpoints in multiple technologies to facilitate different users (sometimes you'd use an ESB to mediate between services and protocols but the need for that depends on size and complexity of your system)
Regarding your specific questions
1 As noted above, I think theres a place to expose more public APIs
than just a single entry point, I am not sure that exposing each and
every capability as a service with public-api is the right way to go
see.
2&3 The disadvantages of exposing every little thing is
management overhead, decreased performance (e.g. you'd have to
authenticate on these calls). You get nano-services services
whose overhead is more than their utility.
One thing to add about the security is that the fact that some service has a REST API does not have to translate to having that API available to the general public. Deployment-wise you can keep it behind a firewall and restrict the access to it for known addresses
etc.
5 It is ok to have several orchestrating modules, though if you get beyond a few you should probably consider some orchestration module (and ESB or an Orchestration engine) alternatively you can use event based integration and get choreography based integration which is more flexible (but somewhat less manageable )
6 The first option has the advantage of easy development and probably better performance (if that's an issue). The hard-coded integration layer can prove harder to maintain over time. The erlang services, if you wrote them write should be able to evolve independently if you keep API integration and message passing between them (luckily Erland makes it relatively easy to get this right by its inherent features (e.g. immutability))

I'd introduce the third way that is rather more cost effective and change-reactive. The architecture definitely should be service oriented because you have services explicitly. But there's no requirement to expose each service as Restful or WSDL-defined one. I'm not an Erlang developer but I believe there's a way to invoke local and remote processes by messaging and thus avoid unnecessary serialisation/serialisation activities for internal calls. But one day you will be faced with new integration issue. For example you will be to integrate accounting or logistic system. Then if you designed architecture well regarding SOA principles the most efforts will be related to exposing existing service with RESTful front-end wrapper with no effort to refactor existing connections to other services. But the issue is to keep domain of responsibilities clean. I mean each service should be responsible to the activity it was originally designed.
The security issue you mentioned is known one. You should have authentication/authorization in all exposed services using tokens for example.

Related

Appropriate event architecture for producing an event graph of a company's microservices automation?

Hello we are trying to determine the best or appropriate architecture for tracking events as they occur between microservices. We know loose coupling begins this process for good microservice design. The use case is to literally query how a company's automation is taking place in real time or historically. There are many products such as Kafka, Solace, MassTransit (mediator, riders, message queues, sagas, routing slips).
Just basic advise at this point. We have to implement saga and routing slip patterns to satisfy our business model.
I would recommend starting by taking a look at the Open Telemetry (OTel). It's a CNCF project, so not tied to specific product, and their goal is to provide a level of observability across your architecture, including the ability of tracing across distributed apps (whether they are sync or async).
I will warn that there is currently a SIG focusing on defining the messaging semantics so this isn't a fully baked solution at this point. You can find more info on that SIG here. They are working to replace the existing experimental messaging semantic conventions with a stable document as things go GA.
That said, you'd probably want to start with instrumenting your apps/microservices and OTel has a number of auto-instrumentation libraries for different APIs & languages in various OTel repos. For example, the repo for the Java agent with a number of auto-instrumentation implementations (including JMS) can be found here: https://github.com/open-telemetry/opentelemetry-java-instrumentation. The idea of the auto-instrumentation is that it doesn't require app code changes so as things evolve it should be easy for you to evolve with it, which is obviously ideal since the messaging semantics are still in work.
Agree with the Open Telemetry comment above. As far as what is currently used most widely (by any tech or framework involved) in microservices is its predecessor Open Tracing (Open Telemetry is backward compatible to Open Tracing and/as OpenTracing is also CNCF). If interested we have created a microservices workshop that uses tracing across both Rest and messaging (specifically JMS) so you can see working examples. You can view these traces in Jaeger UI and such. In the same workshop we show "unified observability" where we show metrics, logging, and tracing all within the same Grafana console. This is really handy as you can, eg, see metrics and/or get a metrics alarm/alert, drill down to its corresponding log entries, and then click/drill down directly from a log entry to its trace (and vice-versa). Full disclosure, I am from Oracle, and so we also show the ability to trace from the OpenTracing/Kubernetes layer, down into the database thus showing the "3 pillars of observability" across both app and data tiers ...but you don't need the database to do any of the other tracing, etc. shown. The workshop is here:https://apexapps.oracle.com/pls/apex/dbpm/r/livelabs/view-workshop?p180_id=637 and the src is here: https://github.com/oracle/microservices-datadriven . I'm definitely interested if you'd like to see add anything added or have any questions, etc. We'll be adding more OpenTelemetry to it as its adoption grows.

Service Oriented Architecture approach

Is SOA solution by making a centralized service which works as a central backbone service for all other services is a good approach?
Its doable, but you need to consider service granularity.
Incorrect granularity could mean that a service covers too much functionality or too little functionality.
Incorrect granularity of services in your SOA can lead to bad performance, low reuse possibilities, leaky abstractions and services without added business value.
Other problems:
You're not able to explain to business people (e.g. sales/marketing) what the service does. It simply does not fit in their understanding of the business the company is in. The services are at a level of detail that is irrelevant from a business perspective.
Many different services are needed to achieve something of value to the business.
There is no clear ownership of the service, multiple department feel they should own the service (or parts of it).

Difference between Hub, Spoke and ESB

I know theres already a good question on this, but it doesn't really answer what I'm looking for.
From what I understand:
1.both are used as a central focal point between applications
2.both can use routing/mediation/transformation etc. between services/apps
But the only difference i can really see is that hub and spoke typically have many different formats entering the hub(SOAP/REST/XML/JSON...) while ESB typically has a standard format(Usually just SOAP.)
Also I keep reading that hub and spoke introduces a single point of failure compared to an ESB. So is the physical deployment the difference here? Where a hub has every possible endpoint and as ESB has endpoints deployed across multiple hubs? So an ESB is just multiple hubs(for want of better words)?
Can anyone help clear this up for me?
There is no exact answer here, since you can talk about ESB as a specific design pattern, or as the discourse about the evolution of software integration tools and SOA.
ESB as a design pattern means that you manage communication between different services using a bus where clients can easily plug in and out. This is usually done by forcing them to use standard data formats and protocols, whereas with Hub and Spoke you might use custom connectors and data transformations for each client. This limits the number of problems you may have with running multiple integrations, but you may still have a single point of failure in ESB.
ESB as a discourse (or marketing term) is a more complex issue, where people argue over what is "True ESB". Some people say you need to have a modular architecture where you can select which components you deploy, or you need to be able to distribute the components across different machines to allow scaling and fault tolerance. In the extreme definition you would need to deploy even your data transformers as distributed services.
From Here
The ESB is the next generation of enterprise integration technology, taking over where EAI(hub-spoke) leaves off.
Smarter Endpoints : The ESB enables architectures in which more intelligence is placed at the point
where the application interfaces with the outside world. The ESB allows each endpoint to present
itself as a service using standards such as WSDL and obviates the need for a unique interface written
for each application. Integration intelligence can be deployed natively on the end-points (clients and
servers) themselves. Canonical formats are bypassed in favor of directly formatting the payload to
the targeted format. This approach effectively removes much of the complexity inherent in EAI
products.
Distributed Architecture : Where EAI is a purely hub and spoke approach, ESB is a lightweight
distributed architecture. A centralized hub made sense when each interaction among programs had
to be converted to a canonical format. An ESB, distributes much more of the
processing logic to the end points.
No integration stacks : As customers used EAI products to solve more problems, each vendor added
stacks of proprietary features wedded to the EAI product. Over time these integration stacks got
monolithic and require deep expertise to use. ESBs, in contrast, are a relatively thin layer of software
to which other processing layers can be applied using open standards. For example, if an ESB user
wants to deploy a particular business process management tool, it can be easily integrated with the
ESB using industry standard interfaces such as BPEL for coordinating business processes.
The immediate short-term advantage of the ESB approach is that it achieves the same overall effect
as the EAI(hub-spoke) approach, but at a much lower total-cost-of-ownership. These savings are realized not
only through reduced hardware and software expenses, but also via labor savings that are realized by
using a framework that is distributed and flexible.
I don't know if you mean this when you say is physical deployment the difference here? but actually the main difference between Hubs and ESP is that, its communication system is in different Layer.
When we talking about an ESP we are reffering to a software architecture model where as a hub is reffering to a strict hardware connecting topology.
Profiously this hardware topology, (a collection of hubs) implements an ESP, but there is a distinct line in communication layers between the two.

stock exchange software

Does anybody knows how several tens of display screens are refreshed each second in stock exchange buildings?
Of course the server pushes the data to each screen, bud is this custom technology or some well known technology like example MSMQ ?
Are there any study papers, books or something for the architecture of this kind of software ?
Regards
I believe this is generally referred to as Messaging. From RabbitMQ:
What is messaging? Messaging describes the sending and receiving of
data (in the form of messages) between
systems. Messages are exchanged
between programs or applications,
similar to the way people communicate
by email but with guarantees on
delivery, speed, security and the
absence of spam.
A messaging infrastructure (a.k.a.
message-oriented middleware, a.k.a.
enterprise service bus) makes it
easier for developers to create
complex applications by decoupling the
individual program components. Rather
than communicating directly, the
messaging infrastructure facilitates
the exchange of data between
components. The components need know
nothing about each other’s status,
availability or implementation, which
allows them to be distributed over
heterogeneous platforms and turned off
and on as required.
In adopting this architecture, the
developer is insulated from the
details of the various operating
systems and network interfaces
involved and the interoperability,
scalability and flexibility of the
application are improved.
Please see this presentation on Why
you might need messaging for a general
introduction or this page on Wikipedia
for more information.
One popular paradigm for implementing messaging is publish/subscribe. Some implementations are implemented using point-to-point communication on TCP, some are using multicast on TCP/UDP.
For stock exchange displays, including other financial software that provides real-time prices, the prices are pushed to the clients rather than let them periodically request for the information (poll.) This is done to provide as near real-time prices as possible.
It tends to be proprietory software, with all the information providers (Reuters, Bloomberg etc.) supplying their own client libraries and/or applications. Most big banks (or at least the ones I've worked for) use Sun enterprise class servers, and Windows trading desk workstations.
I believe they use custom protocol through TCP/IP. Each display is connected to internal LAN network and requests for information as needed.

Old concepts with new names (namely REST and Cloud computing)

It seems that SaaS and Cloud computing are old concepts with new names, and I am curious if I am wrong.
For cloud computing you can look at: Difference between cloud computing and distributed computing?
Basically, it seems that when we have been hosting that that is cloud computing, it is just that now some companies have put in much great resources to ensure better uptime than my local ISP. But, it seems that there is nothing really new here.
For REST, it seems that it is what we have been doing with cgis for 15 years.
Here is a question on REST: What am I not understanding about REST?
It appears that REST is an old concept, and I am curious how it is different than has been done since the early days of the web, and, to a large extent, the early days of using telnet (which http is on top of).
Am I mistaken in my simplification of these? I try to see how what is new is like what I know so I can see what more has to be learned in that topic, but for cloud computing and REST it seems that very little needs to be learned.
You are both right and wrong. You are right in the sense that new ideas are normally similar to old ideas, and indeed cloud computing is based significantly on distributed computing.
What is new in cloud computing is
virtualization
self-service
With virtualization, you can run multiple operating systems on a single hardware. While that, in itself, isn't new, either, it was never considered in distributed systems as a relevant piece of the architecture. Using virtualization allows self-service: users can create their own clusters of nodes without the administrator of the hardware taking any action. This allows a significant acceleration of deployment, and a significant reduction of cost.
For ReST, what you are missing is the client API. It is true that on the server side, a ReST service can be implemented with CGI. What is new here is that it is not an end user which retrieves the URL, but a program.
Saying that HTTP is on top of telnet ignores realities; this is like saying that we made no progress since the introduction of copper wires for communication. Strictly speaking, HTTP is not in top of telnet, but on top of TCP (which telnet is also on top of, these days).
Considering Roy's dissertation coined the term REST back in 2000, you can definitely argue that there is nothing new about REST. Additionally, the REST architectural style was synthesized from successful existing practices, so REST implementations pre-date the definition. Having said that, there is nothing simple about designing REST interfaces. Ever since Netscape first abused cookies to allow servers to maintain session state people have been swimming upstream against the web.
REST's recent resurrection has come mainly from people becoming disillusioned with SOAP based Web Services. SOAP tried to hide HTTP instead of embracing it and I think people are starting to realize how effective HTTP can be as an distributed application protocol that can do more than just deliver HTML to web browsers.
RESTful web applications don't use session state, so one could argue that by that virtue alone it is different than most web applications in existence at the moment.
As for Cloud Computing, I find myself agreeing with Larry Ellison for once in my life.
I'm in agreement on what you've posted. You might consider making this community wiki since it's likely to garner many answers based on opinion. Cloud computing seems to have taken off as a buzzword, and this is largely due to a decrease in cost for mass quantities of hardware. And then there is REST which is really just a formal name and definition for something that has been in place for a long time. Some people like to encapsulate ideas with buzzwords and acronyms. Sometimes it's useful to put a name to an idea though.
Not only this, the concept of things being old concepts with new names is old. It's hard to be original these days :P
You are right about REST -- its mostly old concepts with a lot of added pedantry and not much added substance.
Cloud computing has a small but fundamental difference from distributed computing. In distributed computing you had servers dedicated to particular functions, and usually some sort of directory service to locate the correct server. In cloud computing any server is capable of any task and usually the servers queue up for work which is distributed from a central point.