Appropriate event architecture for producing an event graph of a company's microservices automation? - apache-kafka

Hello we are trying to determine the best or appropriate architecture for tracking events as they occur between microservices. We know loose coupling begins this process for good microservice design. The use case is to literally query how a company's automation is taking place in real time or historically. There are many products such as Kafka, Solace, MassTransit (mediator, riders, message queues, sagas, routing slips).
Just basic advise at this point. We have to implement saga and routing slip patterns to satisfy our business model.

I would recommend starting by taking a look at the Open Telemetry (OTel). It's a CNCF project, so not tied to specific product, and their goal is to provide a level of observability across your architecture, including the ability of tracing across distributed apps (whether they are sync or async).
I will warn that there is currently a SIG focusing on defining the messaging semantics so this isn't a fully baked solution at this point. You can find more info on that SIG here. They are working to replace the existing experimental messaging semantic conventions with a stable document as things go GA.
That said, you'd probably want to start with instrumenting your apps/microservices and OTel has a number of auto-instrumentation libraries for different APIs & languages in various OTel repos. For example, the repo for the Java agent with a number of auto-instrumentation implementations (including JMS) can be found here: https://github.com/open-telemetry/opentelemetry-java-instrumentation. The idea of the auto-instrumentation is that it doesn't require app code changes so as things evolve it should be easy for you to evolve with it, which is obviously ideal since the messaging semantics are still in work.

Agree with the Open Telemetry comment above. As far as what is currently used most widely (by any tech or framework involved) in microservices is its predecessor Open Tracing (Open Telemetry is backward compatible to Open Tracing and/as OpenTracing is also CNCF). If interested we have created a microservices workshop that uses tracing across both Rest and messaging (specifically JMS) so you can see working examples. You can view these traces in Jaeger UI and such. In the same workshop we show "unified observability" where we show metrics, logging, and tracing all within the same Grafana console. This is really handy as you can, eg, see metrics and/or get a metrics alarm/alert, drill down to its corresponding log entries, and then click/drill down directly from a log entry to its trace (and vice-versa). Full disclosure, I am from Oracle, and so we also show the ability to trace from the OpenTracing/Kubernetes layer, down into the database thus showing the "3 pillars of observability" across both app and data tiers ...but you don't need the database to do any of the other tracing, etc. shown. The workshop is here:https://apexapps.oracle.com/pls/apex/dbpm/r/livelabs/view-workshop?p180_id=637 and the src is here: https://github.com/oracle/microservices-datadriven . I'm definitely interested if you'd like to see add anything added or have any questions, etc. We'll be adding more OpenTelemetry to it as its adoption grows.

Related

Is it possible to scale Axon Framework without Axon Server Enterprise

Is it possible to scale Axon Framework without Axon Server Enterprise? I'm interested in creating a prototype CQRS app with Axon, but the final, deployable system has to be be free from licensing fees. If Axon Framework can't be scaled to half a dozen nodes using free software, then I should probably look elsewhere.
If Axon Framework turn out not to be a good choice for the system, what would you recommend? Would building something around Apache Pulsar be a sensible alternative?
I think I have good news for you then.
You can utilize Axon Framework perfectly fine without Axon Server Enterprise.
Firstly, you can use the Axon Server Standard edition, which is completely free and you can check out the code too if you want.
If you prefer to get infrastructure back in your own hands, you can also select different approaches to distributing the CommandBus and the EventBus/EventStore.
For the CommandBus the framework provides the DistributedCommandBus for which two implementation are in place, being:
JGroups
Spring Cloud
I'd argue option 2 is the most ideal for distributing your commands, as it gives you the freedom to choose whichever Spring Cloud Discovery Service implementation you desire. That should give you the handles to work "free of licenses" in that area.
For distributing your Events, you can broadly look at two approaches:
Share the database, aka your EventStore, among all instances
Use a event message bus to distribute your event messages
If you want instances of your nodes to be able to Event Source the Command Model, you are inclined to use option 1. This is required as Axon Framework requires a dedicated EventStore to be able to source the Command Models from history.
When you just want to connect other applications to your Event Stream, option 2 would be a good fit. Again, the framework has two options in this area:
AMQP
Kafka
The only thing I'd like to point out on this part additionally is that the Kafka extension is still in a release candidate state. It is being worked on actively at the moment though.
All these extensions and their options should be clearly stated in the Reference Guide, so I'd definitely check this documentation out if you are gonna start an application.
There is a sole thing you cannot distribute though, which is the QueryBus.
There is an outstanding issue to resolve this, for which someone put the effort in to provide a PR.
After seeing how Axon Server Standard edition does it though, he intentionally closed the PR (with the following comment) as it didn't seem feasible to him to maintain such a tool at this stage.
So, can you use Axon Framework without Axon Server Enterprise?
Well, I think you can. :-)
Mind you though, although you'd be winning on not having a license fee if you don't use Axon Server Enterprise, it doesn't mean your production is gonna be free.
You'd be introducing back quite some infrastructure set up and production time to get this going.
Hope this gives you plenty of feedback #ahoffer!

Why workflow/BPM is needed?

I need to work on a customer on boarding application. The workflow between various users can be implemented using JSF framework itself, with the help of faces confiq.xml I can specify the flow between various users. But here BPM is used with the help of webmethods tool. Does BPM is required always for implementation of workflow? What is its importance over normal implementation using other technologies?
Sassi,
in JSF you control only the pageflow between different UIs which can be part of a single activity which is performed by one user or or part of many activities.
A business process typically involves multiple people (participants / roles) and systems. The WfMS / BPMS for instance:
manages the task lists of the process participants
orchestrates the control flow between the different manual and system tasks
manages the process context information throughout the process (data. documents, persistence, versioning - ideally all ootb without coding)
provides rollback, error compensation features
creates an audit trail which is important for compliance / processes that need to be auditable (QA, regulators)
provides dashboards for operational monitoring
and reports for analysis and reporting of KPIs like averages process execution times or volumes grouped by different business data
allows you to model your business process in a graphical way, preferably in a standard notation (BPMN), which is much more user-friendly and a good basis for the communication between business and IT. The business will find it much harder to read the faces-config.xml.
supports the evaluation of simple or complex business rules to determine process flow and work assignment with user-friendly means
allows the
allows versioning of the process definitions (as if you had multiple faces-config versions in the classpath)
...
Find more BPMPS features and examples e.g. here http://www.eclipse.org/stardust/.
Eclipse Stardust is a mature and comprehensive open source BPMS which covers the aspects listed above and more.
There are lots of workflow solutions that are not a BPM system. However, a BPM system should always include a workflow solution. Presumably implemented by using a BPM notation standard and including kpi monitoring, business rules, simulation, user management, organization modeling and reporting. Although you could implement all those parts yourself in Java EE (with JSF) it would presumably take much more time.

Difference between Hub, Spoke and ESB

I know theres already a good question on this, but it doesn't really answer what I'm looking for.
From what I understand:
1.both are used as a central focal point between applications
2.both can use routing/mediation/transformation etc. between services/apps
But the only difference i can really see is that hub and spoke typically have many different formats entering the hub(SOAP/REST/XML/JSON...) while ESB typically has a standard format(Usually just SOAP.)
Also I keep reading that hub and spoke introduces a single point of failure compared to an ESB. So is the physical deployment the difference here? Where a hub has every possible endpoint and as ESB has endpoints deployed across multiple hubs? So an ESB is just multiple hubs(for want of better words)?
Can anyone help clear this up for me?
There is no exact answer here, since you can talk about ESB as a specific design pattern, or as the discourse about the evolution of software integration tools and SOA.
ESB as a design pattern means that you manage communication between different services using a bus where clients can easily plug in and out. This is usually done by forcing them to use standard data formats and protocols, whereas with Hub and Spoke you might use custom connectors and data transformations for each client. This limits the number of problems you may have with running multiple integrations, but you may still have a single point of failure in ESB.
ESB as a discourse (or marketing term) is a more complex issue, where people argue over what is "True ESB". Some people say you need to have a modular architecture where you can select which components you deploy, or you need to be able to distribute the components across different machines to allow scaling and fault tolerance. In the extreme definition you would need to deploy even your data transformers as distributed services.
From Here
The ESB is the next generation of enterprise integration technology, taking over where EAI(hub-spoke) leaves off.
Smarter Endpoints : The ESB enables architectures in which more intelligence is placed at the point
where the application interfaces with the outside world. The ESB allows each endpoint to present
itself as a service using standards such as WSDL and obviates the need for a unique interface written
for each application. Integration intelligence can be deployed natively on the end-points (clients and
servers) themselves. Canonical formats are bypassed in favor of directly formatting the payload to
the targeted format. This approach effectively removes much of the complexity inherent in EAI
products.
Distributed Architecture : Where EAI is a purely hub and spoke approach, ESB is a lightweight
distributed architecture. A centralized hub made sense when each interaction among programs had
to be converted to a canonical format. An ESB, distributes much more of the
processing logic to the end points.
No integration stacks : As customers used EAI products to solve more problems, each vendor added
stacks of proprietary features wedded to the EAI product. Over time these integration stacks got
monolithic and require deep expertise to use. ESBs, in contrast, are a relatively thin layer of software
to which other processing layers can be applied using open standards. For example, if an ESB user
wants to deploy a particular business process management tool, it can be easily integrated with the
ESB using industry standard interfaces such as BPEL for coordinating business processes.
The immediate short-term advantage of the ESB approach is that it achieves the same overall effect
as the EAI(hub-spoke) approach, but at a much lower total-cost-of-ownership. These savings are realized not
only through reduced hardware and software expenses, but also via labor savings that are realized by
using a framework that is distributed and flexible.
I don't know if you mean this when you say is physical deployment the difference here? but actually the main difference between Hubs and ESP is that, its communication system is in different Layer.
When we talking about an ESP we are reffering to a software architecture model where as a hub is reffering to a strict hardware connecting topology.
Profiously this hardware topology, (a collection of hubs) implements an ESP, but there is a distinct line in communication layers between the two.

Erlang/OTP architecture: RESTful protocol for SOAish services

Let us imagine we have an orders processing system for a pizza shop to design and build.
The requirements are:
R1. The system should be client- and use-case-agnostic, which means that the system can be accessed by a client which was not taken into account during the initial design. For example, if the pizza shop decides that many of its customers use the Samsung Bada smartphones later, writing a client for Bada OS will not require rewriting the system's API and the system itself; or for instance, if it turns out that using iPads instead of Android devices is somehow better for delivery drivers, then it would be easy to create an iPad client and will not affect the system's API in any way;
R2. Reusability, which means that the system can be easily reconfigured without rewriting much code if the business process changes. For example, if later the pizza shop will start accepting payments online along with accepting cash by delivery drivers (accepting a payment before taking an order VS accepting a payment on delivery), then it would be easy to adapt the system to the new business process;
R3. High-availability and fault-tolerance, which means that the system should be online and should accept orders 24/7.
So, in order to meet R3 we could use Erlang/OTP and have the following architecture:
The problem here is that this kind of architecture has a lot of "hard-coded" functionality in it. If, for example, the pizza shop will move from accepting cash payments on delivery to accepting online payments before the order is placed, then it will take a lot of time and effort to rewrite the whole system and modify the system's API.
Moreover, if the pizza shop will need some enhancements to its CRM client, then again we would have to rewrite the API, the clients and the system itself.
So, the following architecture is aimed to solve those problems and thus to help meeting R1, R2 and R3:
Each 'service' in the system is a Webmachine webserver with a RESTful API. Such an approach has the following benefits:
all goodness of Erlang/OTP, since each Webmachine is an Erlang application, which can be supervised and can be put into an Erlang release;
service oriented architecture with all the benefits of SOA;
easy adaptable to changes in the business process;
easy to add new clients and new functions to clients (e.g. to the CRM client), because a client can use RESTful APIs of all the services in the system instead of one 'central' API (Service composability in terms of SOA).
So, essentially, the system architecture proposed in the second picture is a Service Oriented Architecture where each service has a RESTful API instead of a WSDL contract and where each service is an Erlang/OTP application.
And here are my questions:
Picture 2: Am I trying to reinvent the wheel here? Should I just stick with the pure Erlang/OTP architecture instead? ("Pure Erlang" means Erlang applications packed into a release, talking to each other via
gen_server:call and gen_server:cast function calls);
Can you name any disadvantages in suggested approach? (Picture 2)
Do you think it would be easier to maintain and grow (R1 and R2) a system like this (Picture 2) than a truly Erlang/OTP one?
The security of such a system (Picture 2) could be an issue, since there are many entry points open to the web (RESTful APIs of all services) instead of just one entry point (Picture 1), isn't it so?
Is it ok to have several 'orchestrating modules' in such a system or maybe some better practice exists? ("Accept orders", "CRM" and "Dispatch orders" services on Picture 2);
Does pure Erlang/OTP (Picture 1) have any advantages over this approach (Picture 2) in terms of message passing and the limitations of the protocol? (partly discussed in my previous similar question, gen_server:call VS HTTP RESTful calls)
The thing to keep in mind regarding SOA is that the architecture is not about the technology (REST, WS* ). So you can get a good SOA in place with endpoints of several types if/when needed (what I call an Edge component - separating business logic from other concerns like communications and protocols)
Also it is important to note that service boundary is a trust boundary so when you cross it you may need to authenticate and authorize, cross network etc. Additionally, separation into layers (like data and logic) shouldn't drive the way you partition your services.
So from what I am reading in your questions, I'd probably partition the services into more coarse grained services (see below). communications within the boundary of the service can be whatever, where-as communications across services uses a public API (REST or Erlang native is up to you, but the point is that it is managed, versioned, secured etc.) - again, a service may have endpoints in multiple technologies to facilitate different users (sometimes you'd use an ESB to mediate between services and protocols but the need for that depends on size and complexity of your system)
Regarding your specific questions
1 As noted above, I think theres a place to expose more public APIs
than just a single entry point, I am not sure that exposing each and
every capability as a service with public-api is the right way to go
see.
2&3 The disadvantages of exposing every little thing is
management overhead, decreased performance (e.g. you'd have to
authenticate on these calls). You get nano-services services
whose overhead is more than their utility.
One thing to add about the security is that the fact that some service has a REST API does not have to translate to having that API available to the general public. Deployment-wise you can keep it behind a firewall and restrict the access to it for known addresses
etc.
5 It is ok to have several orchestrating modules, though if you get beyond a few you should probably consider some orchestration module (and ESB or an Orchestration engine) alternatively you can use event based integration and get choreography based integration which is more flexible (but somewhat less manageable )
6 The first option has the advantage of easy development and probably better performance (if that's an issue). The hard-coded integration layer can prove harder to maintain over time. The erlang services, if you wrote them write should be able to evolve independently if you keep API integration and message passing between them (luckily Erland makes it relatively easy to get this right by its inherent features (e.g. immutability))
I'd introduce the third way that is rather more cost effective and change-reactive. The architecture definitely should be service oriented because you have services explicitly. But there's no requirement to expose each service as Restful or WSDL-defined one. I'm not an Erlang developer but I believe there's a way to invoke local and remote processes by messaging and thus avoid unnecessary serialisation/serialisation activities for internal calls. But one day you will be faced with new integration issue. For example you will be to integrate accounting or logistic system. Then if you designed architecture well regarding SOA principles the most efforts will be related to exposing existing service with RESTful front-end wrapper with no effort to refactor existing connections to other services. But the issue is to keep domain of responsibilities clean. I mean each service should be responsible to the activity it was originally designed.
The security issue you mentioned is known one. You should have authentication/authorization in all exposed services using tokens for example.

How to manage multiple clients with slightly different business rules? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
We have written a software package for a particular niche industry. This package has been pretty successful, to the extent that we have signed up several different clients in the industry, who use us as a hosted solution provider, and many others are knocking on our doors. If we achieve the kind of success that we're aiming for, we will have literally hundreds of clients, each with their own web site hosted on our servers.
Trouble is, each client comes in with their own little customizations and tweaks that they need for their own local circumstances and conditions, often (but not always) based on local state or even county legislation or bureaucracy. So while probably 90-95% of the system is the same across all clients, we're going to have to build and support these little customizations.
Moreover, the system is still very much a work in progress. There are enhancements and bug fixes happening continually on the core system that need to be applied across all clients.
We are writing code in .NET (ASP, C#), MS-SQL 2005 is our DB server, and we're using SourceGear Vault as our source control system. I have worked with branching in Vault before, and it's great if you only need to keep 2 or 3 branches synchronized - but we're looking at maintaining hundreds of branches, which is just unthinkable.
My question is: How do you recommend we manage all this?
I expect answers will be addressing things like object architecture, web server architecture, source control management, developer teams etc. I have a few ideas of my own, but I have no real experience in managing something like this, and I'd really appreciate hearing from people who have done this sort of thing before.
Thanks!
I would recommend against maintaining separate code branches per customer. This is a nightmare to maintain working code against your Core.
I do recommend you do implement the Strategy Pattern and cover your "customer customizations" with automated tests (e.g. Unit & Functional) whenever you are changing your Core.
UPDATE:
I recommend that before you get too many customers, you need to establish a system of creating and updating each of their websites. How involved you get is going to be balanced by your current revenue stream of course, but you should have an end in mind.
For example, when you just signed up Customer X (hopefully all via the web), their website will be created in XX minutes and send the customer an email stating it's ready.
You definitely want to setup a Continuous Integration (CI) environment. TeamCity is a great tool, and free.
With this in place, you'll be able to check your updates in a staging environment and can then apply those patches across your production instances.
Bottom Line: Once you get over a handful of customers, you need to start thinking about automating your operations and your deployment as yet another application to itself.
UPDATE: This post highlights the negative effects of branching per customer.
Our software has very similar requirements and I've picked up a few things over the years.
First of all, such customizations will cost you both in the short and long-term. If you have control over it, place some checks and balances such that sales & marketing do not over-zealously sell customizations.
I agree with the other posters that say NOT to use source control to manage this. It should be built into the project architecture wherever possible. When I first began working for my current employer, source control was being used for this and it quickly became a nightmare.
We use a separate database for each client, mainly because for many of our clients, the law or the client themselves require it due to privacy concerns, etc...
I would say that the business logic differences have probably been the least difficult part of the experience for us (your mileage may vary depending on the nature of the customizations required). For us, most variations in business logic can be broken down into a set of configuration values which we store in an xml file that is modified upon deployment (if machine specific) or stored in a client-specific folder and kept in source control (explained below). The business logic obtains these values at runtime and adjusts its execution appropriately. You can use this in concert with various strategy and factory patterns as well -- config fields can contain names of strategies etc... . Also, unit testing can be used to verify that you haven't broken things for other clients when you make changes. Currently, adding most new clients to the system involves simply mixing/matching the appropriate config values (as far as business logic is concerned).
More of a problem for us is managing the content of the site itself including the pages/style sheets/text strings/images, all of which our clients often want customized. The current approach that I've taken for this is to create a folder tree for each client that mirrors the main site - this tree is rooted at a folder named "custom" that is located in the main site folder and deployed with the site. Content placed in the client-specific set of folders either overrides or merges with the default content (depending on file type). At runtime the correct file is chosen based on the current context (user, language, etc...). The site can be made to serve multiple clients this way. Efficiency may also be a concern - you can use caching, etc... to make it faster (I use a custom VirtualPathProvider). The largest problem we run into is the burden of visually testing all of these pages when we need to make changes. Basically, to be 100% sure you haven't broken something in a client's custom setup when you have changed a shared stylesheet, image, etc... you would have to visually inspect every single page after any significant design change. I've developed some "feel" over time as to what changes can be comfortably made without breaking things, but it's still not a foolproof system by any means.
In my case I also have no control other than offering my opinion over which visual/code customizations are sold so MANY more of them than I would like have been sold and implemented.
This is not something that you want to solve with source control management, but within the architecture of your application.
I would come up with some sort of plugin like architecture. Which plugins to use for which website would then become a configuration issue and not a source control issue.
This allows you to use branches, etc. for the stuff that they are intended for: parallel development of code between (or maybe even over) releases. Each plugin becomes a seperate project (or subproject) within your source code system. This also allows you to combine all plugins and your main application into one visual studio solution to help with dependency analisys etc.
Loosely coupling the various components in your application is the best way to go.
As mention before, source control does not sound like a good solution for your problem. To me it sounds that is better yo have a single code base using a multi-tenant architecture. This way you get a lot of benefits in terms of managing your application, load on the service, scalability, etc.
Our product using this approach and what we have is some (a lot) of core functionality that is the same for all clients, custom modules that are used by one or more clients and at the core a the "customization" is a simple workflow engine that uses different workflows for different clients, so each clients gets the core functionality, its own workflow(s) and some extended set of modules that are either client specific or generalized for more that one client.
Here's something to get you started on multi-tenancy architecture:
Multi-Tenant Data Architecture
SaaS database tenancy patterns
Without more info, such as types of client specific customization, one can only guess how deep or superficial the changes are. Some simple/standard approaches to consider:
If you can keep a central config specifying the uniqueness from client to client
If you can centralize the business rules to one class or group of classes
If you can store the business rules in the database and pull out based on client
If the business rules can all be DB/SQL based (each client having their own DB
Overall hard coding differences based on client name/id is very problematic, keeping different code bases per client is costly (think of the complete testing/retesting time required for the 90% that doesn't change)...I think more info is required to properly answer (give some specifics)
Layer the application. One of those layers contains customizations and should be able to be pulled out at any time without affect on the rest of the system. Application- and DB-level "triggers" (quoted because they may or many not employ actual DB triggers) that call customer-specific code or are parametrized with customer keys) are very helpful.
Core should never be customized, but you must layer it in somewhere, even if it is simplistic web filtering.
What we have is a a core datbase that has the functionality that all clients get. Then each client has a separate database that contains the customizations for that client. This is expensive in terms of maintenance. The other problem is that when two clients ask for a simliar functionality, it is often done differnetly by the two separate teams. There is currently little done to share custiomizations between clients and make common ones become part of the core application. Each client has their own application portal, so we don't have the worry about a change to one client affecting some other client.
Right now we are looking at changing to a process using a rules engine, but there is some concern that the perfomance won't be there for the number of records we need to be able to process. However, in your circumstances, this might be a viable alternative.
I've used some applications that offered the following customizations:
Web pages were configurable - we could drag fields out of view, position them where we wanted with our own name for the field label.
Add our own views or stored procedures and use them in: data grids (along with an update proc) and reports. Each client would need their own database.
Custom mapping of Excel files to import data into system.
Add our own calculated fields.
Ability to run custom scripts on forms during various events.
Identify our own custom fields.
If you clients are larger companies, you're almost going to need your own SDK, API's, etc.