Enterprise-Wide Cluster Streaming System - apache-kafka

I'm interested in deploying an enterprise service bus on a fault-tolerant system for a variety of use cases, including tracking network traffic and analyzing social media data. I'd prefer to use a streaming application, but I'm open to the idea of micro-batching. The solution will need to be able to take imports and exports from a variety of sources (hence the bus).
I have been researching various types of stream-processing software platforms here:
https://thenewstack.io/apache-streaming-projects-exploratory-guide/
But I've been struggling with the fact that many (if not all) of these projects are open source, and I don't like the large implementation risk.
I have found Apache Kafka attractive because of the Confluent Platform built on top, but I can't seem to find anything similar to Confluent out there, and I want to know if there are any direct competitors built on top of another Apache project, or an entirely private solution.
Any help would be appreciated! Thanks in advance!

Related

Appropriate event architecture for producing an event graph of a company's microservices automation?

Hello, we are trying to determine the best or most appropriate architecture for tracking events as they occur between microservices. We know loose coupling begins this process for good microservice design. The use case is to be able to query how a company's automation is taking place, in real time or historically. There are many products, such as Kafka, Solace, and MassTransit (mediator, riders, message queues, sagas, routing slips).
Just basic advice at this point. We have to implement saga and routing slip patterns to satisfy our business model.
I would recommend starting by taking a look at OpenTelemetry (OTel). It's a CNCF project, so it is not tied to a specific product, and its goal is to provide a level of observability across your architecture, including the ability to trace across distributed apps (whether they are sync or async).
I will warn that there is currently a SIG focusing on defining the messaging semantics, so this isn't a fully baked solution at this point. You can find more info on that SIG here. They are working to replace the existing experimental messaging semantic conventions with a stable document as things go GA.
That said, you'd probably want to start with instrumenting your apps/microservices, and OTel has a number of auto-instrumentation libraries for different APIs & languages in various OTel repos. For example, the repo for the Java agent with a number of auto-instrumentation implementations (including JMS) can be found here: https://github.com/open-telemetry/opentelemetry-java-instrumentation. The idea of the auto-instrumentation is that it doesn't require app code changes, so as things evolve it should be easy for you to evolve with it, which is obviously ideal since the messaging semantics are still being worked on.
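For concreteness, here is a minimal sketch of what "no app code changes" looks like in practice. It assumes ActiveMQ as the JMS provider, a broker at tcp://localhost:61616, and a queue named "orders" (none of which come from the question itself); the only integration step is launching the JVM with the agent attached, e.g. java -javaagent:opentelemetry-javaagent.jar -jar app.jar, after which the JMS send below is traced automatically.

    import javax.jms.Connection;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import org.apache.activemq.ActiveMQConnectionFactory;

    // Plain JMS producer with no OpenTelemetry code in it at all.
    // Attaching the OTel Java agent at launch is what creates the producer span.
    public class OrderPublisher {
        public static void main(String[] args) throws Exception {
            ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616"); // hypothetical broker URL
            Connection connection = factory.createConnection();
            try {
                connection.start();
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Queue queue = session.createQueue("orders"); // hypothetical queue name
                MessageProducer producer = session.createProducer(queue);
                TextMessage message = session.createTextMessage("{\"orderId\": 42}");
                producer.send(message); // instrumented by the agent, not by this code
            } finally {
                connection.close();
            }
        }
    }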
Agree with the OpenTelemetry comment above. As far as what is currently most widely used (by any tech or framework involved) in microservices, it is OTel's predecessor OpenTracing (OpenTelemetry is backward compatible with OpenTracing, and OpenTracing is also a CNCF project).
If you are interested, we have created a microservices workshop that uses tracing across both REST and messaging (specifically JMS) so you can see working examples. You can view these traces in the Jaeger UI and such. In the same workshop we show "unified observability", where we show metrics, logging, and tracing all within the same Grafana console. This is really handy as you can, e.g., see metrics and/or get a metrics alarm/alert, drill down to its corresponding log entries, and then click/drill down directly from a log entry to its trace (and vice versa).
Full disclosure: I am from Oracle, and so we also show the ability to trace from the OpenTracing/Kubernetes layer down into the database, thus showing the "three pillars of observability" across both the app and data tiers, but you don't need the database to do any of the other tracing shown. The workshop is here: https://apexapps.oracle.com/pls/apex/dbpm/r/livelabs/view-workshop?p180_id=637 and the source is here: https://github.com/oracle/microservices-datadriven . I'm definitely interested if you'd like to see anything added or have any questions. We'll be adding more OpenTelemetry to it as its adoption grows.

Select a cataloging / metadata system?

We are setting up a GIS server based on QGIS / PostgreSQL-PostGIS and GeoServer.
We are missing an important tool, the cataloging and metadata system.
PostgreSQL and GeoServer are on a Windows Server 2019 virtual server.
We are GIS & geomatics people but not computer scientists. We are therefore looking for an open-source solution that is relatively easy to install and configure, and that does not require extensive computer skills.
What solutions do you think would be suitable? We have identified:
GeoNetwork,
geOrchestra https://www.georchestra.org/software.html
GeoNode
Are there others?
Among these 3 solutions, would one be easier to set up and use, and be functional on both Linux and Windows?
Are there other criteria to take into account in our selection of technology?
Thank you very much for your help, recommendation and / or feedback.
GeoNetwork might be the silver bullet (though it tends to do more than the job, since it also features an integrated geodata viewer).
geOrchestra provides both GeoServer and GeoNetwork, and a Single Sign On feature. It also provides additional modules like a user management console, a data upload tool ("datafeeder"), analytics, mapstore and so on. It's very modular and leaves plenty of room for integration.
GeoNode provides a fully integrated environment. It's like a social network dedicated to data. It's also based on GeoServer and has SSO.
None of the above are easy to set up & maintain if you do not have basic computer skills. With a docker composition, you may have one of them running pretty quickly, though.
You may try Cartoview, which is an extension of GeoNode for visualizing layers and maps in geospatial apps. It can be used in different environments (Linux, Windows, macOS).
You can download the Windows installer from the link above and give it a try!

Best approach to construct a real-time rule engine for our streaming events

We are at the beginning of building an IoT cloud platform project. There are certain well-known components needed to achieve a complete IoT platform solution. One of them is a real-time rule processing/engine system, which is needed to determine whether streaming events match any rules defined dynamically by end users in a readable format (SQL, Drools if/when/then, etc.).
I am confused because there are lots of products and projects (Storm, Spark, Flink, Drools, EsperTech, etc.) on the internet, so, considering we have a 3-person development team (a junior, a mid-senior, a senior), what would be the best choice?
Choosing one of the streaming projects such as Apache Flink and learning it well?
Choosing one of the complete solutions (AWS, Azure, etc.)?
A BRMS (Business Rule Management System) like Drools is mainly built for quickly adapting to changes in business logic, and such systems are more mature and stable compared to stream processing engines like Apache Storm, Spark Streaming, and Flink. Stream processing engines are built for high throughput and low latency. A BRMS may not be suitable for serving hundreds of millions of events in IoT scenarios and may struggle with event-time-based window calculations.
All these solutions can be run on IaaS providers. In AWS you may also want to take a look at AWS EMR and Kinesis/Kinesis Analytics.
Some use cases I've seen.
Stream data directly to FlinkCEP (a sketch of this option follows after this list).
Use rule engines to do fast response with low latency, at the same time stream data to Spark for analysis and machine learning.
You can also run Drools in Spark and Flink to hot-deploy user-defined rules.
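To make the FlinkCEP option above concrete, here is a minimal sketch of a single hard-coded rule. The SensorReading event type, the 80-degree threshold, and the "two consecutive readings" rule are illustrative assumptions, not anything from the question; in a real rule engine the pattern would be generated from the user-defined rule instead of being written by hand.

    import org.apache.flink.cep.CEP;
    import org.apache.flink.cep.PatternSelectFunction;
    import org.apache.flink.cep.pattern.Pattern;
    import org.apache.flink.cep.pattern.conditions.SimpleCondition;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    import java.util.List;
    import java.util.Map;

    public class OverheatAlertJob {

        // Hypothetical IoT event type for the sketch (public fields and a
        // no-arg constructor so Flink treats it as a POJO).
        public static class SensorReading {
            public String deviceId;
            public double temperature;
            public SensorReading() {}
            public SensorReading(String deviceId, double temperature) {
                this.deviceId = deviceId;
                this.temperature = temperature;
            }
        }

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Stand-in for the real event source (Kafka, MQTT, etc.).
            DataStream<SensorReading> readings = env.fromElements(
                new SensorReading("dev-1", 71.0),
                new SensorReading("dev-1", 85.0),
                new SensorReading("dev-1", 92.0));

            // Rule: two consecutive readings above 80 degrees.
            Pattern<SensorReading, ?> overheat = Pattern.<SensorReading>begin("first")
                .where(new SimpleCondition<SensorReading>() {
                    @Override
                    public boolean filter(SensorReading r) {
                        return r.temperature > 80.0;
                    }
                })
                .next("second")
                .where(new SimpleCondition<SensorReading>() {
                    @Override
                    public boolean filter(SensorReading r) {
                        return r.temperature > 80.0;
                    }
                });

            CEP.pattern(readings, overheat)
                .select(new PatternSelectFunction<SensorReading, String>() {
                    @Override
                    public String select(Map<String, List<SensorReading>> match) {
                        return "ALERT overheating device: " + match.get("second").get(0).deviceId;
                    }
                })
                .print();

            env.execute("overheat-alert");
        }
    }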
Disclaimer: I work for them. But you should check out Losant. It's developer-friendly and it's super easy to get started. We also have a workflow engine where you can build custom logic/rules for your application.
Check out the Waylay rules engine, built specifically for real-time IoT data streams.
In the beginning phase, go for a cloud-based IoT platform like Predix, AWS, SAP, or Watson for rapid product development and initial learning.

AEM 6.1/CQ5 -- Technical approach for QA to troubleshoot back-end servers, logs, etc.

I am GK, currently working as an AEM 6.1 QA for a big telecom customer. I am really enjoying the way AEM works. As QA, I am involved mostly in testing of Author (back end), such as drag-and-drop of components, editing, etc., and Publish (front end).
Familiar with siteadmin, crx/de, etc.
My ask: right now my responsibilities have been expanded, and I am being asked for
"a technical approach to QA to troubleshoot back-end servers, logs, etc. Java skills to write test cases. SSH."
I need some help on where I can learn to troubleshoot back-end servers.
Appreciate your quick reply.
Go-Getter
GK
You need to understand the foundations of AEM:
JCR/Java Content Repository, a "No SQL" hierarchical data store (think of nodes and their paths as Resources)
Apache Felix, an OSGi implementation to permit loading multiple libraries even when they have version conflicts in their dependencies
REST/Representational State Transfer, a software architecture for the web
Apache Sling, an implementation marrying REST principles with the JCR
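To make the "nodes and their paths" idea tangible, here is a minimal sketch that reads a few repository nodes over the remote JCR API. The local author URL, the admin credentials, and the /content path are assumptions for illustration, and it requires the Jackrabbit JCR remoting client (jackrabbit-jcr2dav) on the classpath.

    import javax.jcr.Node;
    import javax.jcr.NodeIterator;
    import javax.jcr.Repository;
    import javax.jcr.Session;
    import javax.jcr.SimpleCredentials;
    import org.apache.jackrabbit.commons.JcrUtils;

    public class ListContentNodes {
        public static void main(String[] args) throws Exception {
            // Hypothetical local author instance and credentials; adjust for your environment.
            Repository repository = JcrUtils.getRepository("http://localhost:4502/crx/server");
            Session session = repository.login(
                new SimpleCredentials("admin", "admin".toCharArray()));
            try {
                // Every piece of content is a node addressed by a path.
                Node content = session.getNode("/content");
                for (NodeIterator it = content.getNodes(); it.hasNext(); ) {
                    Node child = it.nextNode();
                    System.out.println(child.getPath() + " [" + child.getPrimaryNodeType().getName() + "]");
                }
            } finally {
                session.logout();
            }
        }
    }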
Then you need to know where people make mistakes, because every QA engineer recognizes you "can't test everything" - so you prioritize your testing efforts where issues are more likely to happen OR where the risk is too high if issues aren't known.
AEM also provides several log files for tracking or troubleshooting different things. They have different formats, and your open question is not specific enough to begin to dive into troubleshooting or monitoring.
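Since the question also mentions "Java skills to write test cases" and SSH access, here is a minimal sketch of the kind of utility a QA engineer might start with: it scans an instance's main log for *ERROR* / *WARN* entries and counts them. The default error.log path under crx-quickstart/logs and the one-argument usage are assumptions to adjust for your environment.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.stream.Stream;

    public class ErrorLogScan {
        public static void main(String[] args) throws IOException {
            // Default AEM log location; pass a different path as the first argument if needed.
            Path log = Paths.get(args.length > 0 ? args[0] : "crx-quickstart/logs/error.log");

            Map<String, Integer> counts = new LinkedHashMap<>();
            try (Stream<String> lines = Files.lines(log)) {
                lines.forEach(line -> {
                    for (String marker : new String[] {"*ERROR*", "*WARN*"}) {
                        if (line.contains(marker)) {
                            counts.merge(marker, 1, Integer::sum);
                            System.out.println(line); // echo the entry for triage
                        }
                    }
                });
            }
            counts.forEach((marker, count) -> System.out.println(marker + " entries: " + count));
        }
    }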

Difference between Hub, Spoke and ESB

I know there's already a good question on this, but it doesn't really answer what I'm looking for.
From what I understand:
1. Both are used as a central focal point between applications
2. Both can use routing/mediation/transformation, etc., between services/apps
But the only difference I can really see is that hub and spoke typically has many different formats entering the hub (SOAP/REST/XML/JSON...), while an ESB typically has a standard format (usually just SOAP).
Also, I keep reading that hub and spoke introduces a single point of failure compared to an ESB. So is the physical deployment the difference here? Where a hub has every possible endpoint, and an ESB has endpoints deployed across multiple hubs? So an ESB is just multiple hubs (for want of a better term)?
Can anyone help clear this up for me?
There is no exact answer here, since you can talk about ESB as a specific design pattern, or as the discourse about the evolution of software integration tools and SOA.
ESB as a design pattern means that you manage communication between different services using a bus where clients can easily plug in and out. This is usually done by forcing them to use standard data formats and protocols, whereas with hub and spoke you might use custom connectors and data transformations for each client. This limits the number of problems you may have when running multiple integrations, but you may still have a single point of failure in an ESB.
ESB as a discourse (or marketing term) is a more complex issue, where people argue over what is "True ESB". Some people say you need to have a modular architecture where you can select which components you deploy, or you need to be able to distribute the components across different machines to allow scaling and fault tolerance. In the extreme definition you would need to deploy even your data transformers as distributed services.
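As a toy illustration of the plug-in-and-out idea (nothing here comes from a real ESB product; the bus, the canonical envelope, and the two example endpoints are all made up for the sketch), the contrast with hub and spoke is that the endpoints below only agree on one canonical message shape and never on each other:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Consumer;

    public class MiniBus {

        // Canonical envelope every endpoint must publish and consume; in hub and
        // spoke you would instead write a dedicated adapter for each pair of apps.
        public static class CanonicalMessage {
            public final String type;
            public final String payload;
            public CanonicalMessage(String type, String payload) {
                this.type = type;
                this.payload = payload;
            }
        }

        private final List<Consumer<CanonicalMessage>> endpoints = new ArrayList<>();

        // Endpoints plug in (and could unplug) without knowing who else is attached.
        public void subscribe(Consumer<CanonicalMessage> endpoint) {
            endpoints.add(endpoint);
        }

        public void publish(CanonicalMessage message) {
            for (Consumer<CanonicalMessage> endpoint : endpoints) {
                endpoint.accept(message);
            }
        }

        public static void main(String[] args) {
            MiniBus bus = new MiniBus();
            bus.subscribe(m -> System.out.println("billing received " + m.type + ": " + m.payload));
            bus.subscribe(m -> System.out.println("crm received " + m.type + ": " + m.payload));
            bus.publish(new CanonicalMessage("order.created", "{\"id\": 42}"));
        }
    }

Note that the bus object in this sketch is still a single point of contact; distributing that responsibility across endpoints is exactly what the quoted comparison below describes ESB products adding on top of the pattern.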
From Here
The ESB is the next generation of enterprise integration technology, taking over where EAI (hub-and-spoke) leaves off.
Smarter Endpoints: The ESB enables architectures in which more intelligence is placed at the point where the application interfaces with the outside world. The ESB allows each endpoint to present itself as a service using standards such as WSDL and obviates the need for a unique interface written for each application. Integration intelligence can be deployed natively on the endpoints (clients and servers) themselves. Canonical formats are bypassed in favor of directly formatting the payload to the targeted format. This approach effectively removes much of the complexity inherent in EAI products.
Distributed Architecture: Where EAI is a purely hub-and-spoke approach, ESB is a lightweight distributed architecture. A centralized hub made sense when each interaction among programs had to be converted to a canonical format. An ESB distributes much more of the processing logic to the endpoints.
No integration stacks: As customers used EAI products to solve more problems, each vendor added stacks of proprietary features wedded to the EAI product. Over time these integration stacks became monolithic and required deep expertise to use. ESBs, in contrast, are a relatively thin layer of software to which other processing layers can be applied using open standards. For example, if an ESB user wants to deploy a particular business process management tool, it can be easily integrated with the ESB using industry-standard interfaces such as BPEL for coordinating business processes.
The immediate short-term advantage of the ESB approach is that it achieves the same overall effect as the EAI (hub-and-spoke) approach, but at a much lower total cost of ownership. These savings are realized not only through reduced hardware and software expenses, but also via labor savings that are realized by using a framework that is distributed and flexible.
I don't know if you mean this when you ask whether the physical deployment is the difference here, but actually the main difference between hubs and an ESB is that their communication happens at a different layer.
When we talk about an ESB we are referring to a software architecture model, whereas a hub refers to a strict hardware connection topology.
This hardware topology (a collection of hubs) may well implement an ESB, but there is a distinct line between the communication layers of the two.