I am a newbie with Storm and have been exploring its features to match our CEP requirements. The various examples I have stumbled upon implement spouts as services that poll a message broker or a database. How can I implement a push-based spout, i.e. a Thrift server running inside a spout? And how do I make my clients aware of where my spouts are running, so that they can push data to them?
Spouts are designed and intended to poll, so you can't push to them. However, what many people do is use things like Redis, Thrift, or Kafka as services that you can push messages to and then your spout can poll them.
The control you have over where and when a spout runs is limited, so it's a bit of a hassle to have external processes communicate directly with spouts. It certainly is possible, but it's not the simplest solution.
The standard solution is to push messages to some external message queue and let your spouts poll this message queue.
There are implementations of spouts that do exactly this for commonly used message queue services, such as Kafka, Kestrel and JMS, in storm-contrib.
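For illustration, here is a minimal sketch of such a polling spout (written against the Storm 2.x packages; older releases use backtype.storm and a slightly different open signature). The MessageQueueClient class and its methods are placeholders standing in for whatever broker client you actually use (Kafka, Kestrel, Redis, ...); only the Storm callbacks are real:

```java
import java.util.Map;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class QueuePollingSpout extends BaseRichSpout {

    private SpoutOutputCollector collector;
    private MessageQueueClient queue;   // hypothetical client for your broker

    @Override
    public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.queue = new MessageQueueClient("broker-host:9092", "events");   // assumed constructor
    }

    @Override
    public void nextTuple() {
        // Storm calls this repeatedly; poll (non-blocking) and emit whatever is available.
        String message = queue.pollNextMessage();   // assumed method, returns null when the queue is empty
        if (message != null) {
            collector.emit(new Values(message));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("message"));
    }
}
```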
I don't have a whole lot of experience with Storm, Kafka/Kestrel, or CEP in general, but I am looking for a similar solution: pushing to a Storm spout. How about using a load balancer between the event source and the Storm cluster? For my use case of pushing syslog messages from rsyslog to Storm, a load balancer can keep track of which Storm nodes are running a listening spout and which ones are down, and can also distribute the incoming load based on different parameters. I am less inclined to introduce another layer like a message bus between the source and the spout.
Edit: I read your blog, and to summarize: if the only problem with a listening spout is how a source would find it, then a message bus might be the wrong answer. There are simpler/better solutions for directing network traffic at a receiver based on simple network status or higher app-level logic. But yes, if you want to use all the additional message bus features, then Kafka/Kestrel would obviously be good options.
It's not a typical use of Storm, and obviously you can't bind multiple instances of the spout on the same machine to the same port. In a distributed setup it would be a good idea to store each API's current IP address and port in, e.g., ZooKeeper, and then have a balancer that forwards requests to your API.
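If you go the ZooKeeper route, a rough sketch of registering a spout's listening endpoint with Apache Curator might look like this (the znode path and naming scheme are my assumptions; your balancer would watch the children of /push-spouts to learn which endpoints are alive):

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class SpoutEndpointRegistrar {

    public static void register(String zkConnect, String host, int port) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                zkConnect, new ExponentialBackoffRetry(1000, 3));
        client.start();

        // An ephemeral node disappears automatically if this spout's worker dies,
        // so the balancer only ever sees live endpoints.
        String path = "/push-spouts/" + host + ":" + port;   // assumed naming scheme
        client.create()
              .creatingParentsIfNeeded()
              .withMode(CreateMode.EPHEMERAL)
              .forPath(path, (host + ":" + port).getBytes());
    }
}
```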
Here's a project with simple REST API on Storm:
https://github.com/timjstewart/restexpress-storm
Related
I have a microservices architecture whose logs have to be sent to a remote Kafka topic.
Next to it, the consumer of this topic will send the logs to an ELK stack (owned by another team).
I want to have a dedicated microservice (fwk-proxy-elasticsearch) whose responsibility is to collect the logs from the other ones and send them to the remote Kafka topic.
What's the best protocol to dispatch all the logs aggregated from my microservices to the fwk-proxy-elasticsearch microservice?
I want this pattern not to duplicate the security configuration of the remote Kafka topic. I want to centralize it in a single place.
Can I use the Vert.x event bus for that? Or is Kafka better? Or some other tool?
Can I use Vert.x to send messages from JVM to JVM?
Moreover, in a microservice architecture, is it a good pattern to centralize a use case in a dedicated microservice? (remote http connection for example)
From my point of view, it allows business microservices to focus on business concerns and not worry about the protocol over which the result has to be sent.
Thanks!
I believe you can use either the Vert.x event bus or Kafka to propagate the logs; there are pros and cons to each approach.
While I understand the reasoning behind this decision, I would still consider a dedicated solution built for this purpose, like Fluentd, which is able to aggregate the logs and push them to multiple destinations (including Kafka, via the dedicated plugin). I'm sure there are other similar solutions.
There are a couple of important benefits that I see if you use a dedicated solution, instead of building it yourself:
The level of configurability, which is definitely useful in the future (with a home-grown solution, you need to write code each time you want to build something new)
The number of destinations where you can export the logs
Support for a hybrid architecture - with a few config updates, you will be able to grab logs from non-JVM microservices
I haven't used Kafka before and wanted to know: if messages are published through Kafka, what are the possible ways to capture that info?
Is Kafka only way to receive that info via "Consumers" or can Rest APIs be also used here?
Haven't used Kafka before, and while reading up I did find that Kafka needs ZooKeeper running too.
I don't need to publish info, just process data received from a Kafka publisher.
Any pointers will help.
Kafka is a distributed streaming platform that allows you to process streams of records in near real-time.
Producers publish records/messages to Topics in the cluster.
Consumers subscribe to Topics and process those messages as they are available.
The Kafka docs are an excellent place to get up to speed on the core concepts: https://kafka.apache.org/intro
Is Kafka only way to receive that info via "Consumers" or can Rest APIs be also used here?
Kafka has its own TCP-based protocol; it is not natively accessible over HTTP (assuming that's what you actually mean by REST).
Consumers are the only way to get and subsequently process data. However, plenty of external tooling exists so that you don't really have to write any code, if you don't want to, in order to work with that data.
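If you do end up writing code, a minimal Java consumer sketch looks like this (broker address, group id and topic name are placeholders):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "my-processing-group");        // placeholder consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));   // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Process each message as it arrives
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```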
I am working on a project which will actually be a better version of an old project. We want it to be scalable so it can deal with high load, so we decided to go with microservices instead of a monolith. Then I started to research microservices: how they communicate, common design patterns, and other things. Since I want my services to be scalable, event-based communication made sense to me. So I decided to use Kafka for this purpose.
We have many more services in the system, but to simplify my question let's say I have 2 types of services: work-node and master-node. I want both of them to be scalable. For now they communicate over Kafka.
My question: in one case I want to publish an event (produce a message on a topic) from the master-node and get that event (consume from the topic) on all work-nodes. But in another case I need to send a message to one specific work-node. To cover the first case, all my work-nodes have different group ids in Kafka, so when a message is published on a topic they all get it. I know that I am not able to send a message to a specific consumer with Kafka. Since my nodes are scalable and their number can increase or decrease depending on the load, creating a topic for each node does not seem like a good idea. My first solution was adding the work-node id to the message, so the other work-nodes can ignore it. Well, it works, but I don't think it is a good solution. My second solution is to send an HTTP request when I need to reach a specific node, but I don't know whether mixing 2 communication methods is a good idea.
What do you guys think about this problem. Is there a better solution that I am missing ? Or my whole design is going wrong ?
Kafka is not an appropriate technology for the use case you describe. I would recommend using Cadence Workflow which natively supports routing tasks to specific nodes as well as dozens of other features that messaging systems lack.
Feel free to join the Cadence Workflow Slack channel if you have specific questions.
I think you should be able to. Consider the regular Kafka flow: you have some consumer groups subscribed to the topic, and the producer doesn't send a message to a specific partition unless you tell it to.
Now think about a scenario where you produce messages to specific partitions based on your own algorithm:
A message is received from A
Some algorithm (e.g. a hash code) always generates 0 for A
The message is sent to partition 0
Consumer 1 is assigned to partition 0
Only consumer 1 gets the messages coming from A
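A rough sketch of that idea: use the target work-node id as the record key so the default partitioner always hashes it to the same partition, and have each work-node read only its own partition. The topic name, node ids and the one-partition-per-node mapping are my assumptions, not part of the question:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class DirectedMessages {

    // Master side: key the record with the target work-node id, so the default
    // partitioner always hashes messages for that node to the same partition.
    static void sendToNode(KafkaProducer<String, String> producer, String workNodeId, String payload) {
        producer.send(new ProducerRecord<>("work-commands", workNodeId, payload));   // "work-commands" is a placeholder topic
    }

    // Work-node side: read only the partition this node is responsible for,
    // instead of joining a consumer group (the partition number is assumed known).
    static KafkaConsumer<String, String> consumerForPartition(Properties consumerProps, int partition) {
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
        consumer.assign(Collections.singletonList(new TopicPartition("work-commands", partition)));
        return consumer;
    }
}
```

The broadcast case can keep working the way it already does, with each work-node using its own group id on a separate topic.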
I am evaluating the Vert.x framework to see if I can reduce the Kafka-based communication between my microservices developed using Spring Boot.
The question is:
Can I replace
Kafka with the Vert.x event bus, and
Spring Boot microservices with Vert.x-based verticles?
To answer quickly, I would say it depends on your needs.
Yes, the event bus can be a good way to natively handle communication between microservice verticles using an asynchronous and non-blocking paradigm.
But in some cases you could need:
to handle some common enterprise patterns like replay mechanisms, persistence of messages, transactional reading
to be able to process some kind of messages in a chronological order
to handle communication between multiples kind of microservices that aren't all written with the same framework/toolkit or even programming language
to handle reliability, resilience and failure recovery when all your consumers/microservices/verticles have died
to handle dynamic horizontal scalability and monitoring of your consumers/microservices/verticles
to be able to work with a single cluster deployed in multi-datacenters and multi-regions
In those cases I'd prefer to choose Apache Kafka over the native event bus or an old-fashioned JMS-compliant system.
It's not forbidden to use both the event bus and Kafka in the same microservices architecture, according to your real needs. For example, you could have one Kafka consumer group reading a Kafka topic to handle scaling, monitoring, failure recovery and the replay mechanism, and then handle communication between your sub-verticles through the event bus.
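As a hedged sketch of that combination with the vertx-kafka-client: one consumer group reads the topic and a bridge verticle re-publishes every record on the event bus for the local sub-verticles (topic, event bus address and config values are placeholders):

```java
import java.util.HashMap;
import java.util.Map;
import io.vertx.core.AbstractVerticle;
import io.vertx.kafka.client.consumer.KafkaConsumer;

public class KafkaBridgeVerticle extends AbstractVerticle {

    @Override
    public void start() {
        Map<String, String> config = new HashMap<>();
        config.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        config.put("group.id", "orders-processors");          // placeholder consumer group
        config.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        config.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        config.put("auto.offset.reset", "earliest");

        KafkaConsumer<String, String> consumer = KafkaConsumer.create(vertx, config);

        // Every record read from Kafka is re-published on the event bus,
        // where local sub-verticles can consume it.
        consumer.handler(record ->
                vertx.eventBus().publish("orders.incoming", record.value()));   // placeholder address

        consumer.subscribe("orders");                          // placeholder topic
    }
}
```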
I'll clarify the scalability and monitoring part a little and explain why I think it's simpler to handle with Kafka than with the native event bus and Vert.x cluster mode. Kafka lets us know in real time (through JMX metrics and the describe command):
the "lag" of a topic, which corresponds to the number of unread messages
the number of consumers in each group that are listening to a topic
the number of partitions of a topic assigned to each consumer
I/O metrics
So it's possible to use an Elastic Stack or Prometheus+Grafana solution to monitor those metrics and use them to handle dynamic scalability (for example, when the lag metric, the number of partitions, and the CPU/RAM/swap metrics of your hosts tell you that you temporarily need to increase the number of consumers).
To answer the second question (Vert.x or Spring Boot), my answer won't be very objective, but I'd vote for Vert.x for its performance on the JVM and especially for its simplicity. I'm a little tired of the Spring factory and its big layers of abstraction that hide a lot of issues under a mountain of annotations triggering a mountain of AOP.
Moreover, in the Java microservices world, there are other alternatives to Spring Boot, like the different implementations of MicroProfile (the Thorntail project, for example).
The event bus is not persistent. You should use it for fast verticle-to-verticle communications, and more generally to dispatch events where you know that you can afford to lose them if there is a crash.
Kafka streams are persistent, and you should send events there because you want other (possibly non-Vert.x) applications to consume them, and/or because you want to ensure that these events are not lost in case of failure.
A reactive (read "scalable and fault-tolerant") Vert.x application typically uses a combination of both the event-bus and some replicable messaging systems like AMQP / Kafka / etc.
On the question:
Can I replace spring boot microservices with vert.x based verticles?
Yes, definitely, although the 2 have different programming models.
If you want a more progressive approach and use Spring for structuring your application while using Vert.x for resource efficiency over your I/O and event processing then you can mix them, see https://github.com/vert-x3/vertx-examples/tree/master/spring-examples for examples.
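As a small hedged illustration of mixing the two, you can simply deploy verticles from a Spring-managed bean. MyEventVerticle is a hypothetical verticle here, and on Spring Boot 3.x the annotation lives in jakarta.annotation instead:

```java
import javax.annotation.PostConstruct;   // jakarta.annotation.PostConstruct on Spring Boot 3.x
import org.springframework.stereotype.Component;
import io.vertx.core.Vertx;

@Component
public class VertxBootstrap {

    private final Vertx vertx = Vertx.vertx();

    @PostConstruct
    public void deployVerticles() {
        // Let Spring manage configuration and wiring, and let Vert.x handle the
        // event-driven I/O by deploying verticles when the context starts.
        vertx.deployVerticle(new MyEventVerticle());   // hypothetical verticle
    }
}
```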
Take a look at the Quarkus framework: in the workshop section you'll find Vert.x and Apache Kafka combined!
I have a problem that I want to solve using Kafka queues.
I need to process some result, then return it to the user.
As you can see in the picture, the REST service requests something from the Calculator service.
Both services have a kafka consumer, and a kafka producer.
The REST service receives a request, produces a message on the toAdd queue, and then keeps consuming the fromAdd queue until it receives a value.
The Calculator service keeps consuming the toAdd queue; when a message arrives, it sums the two values and produces a message on the fromAdd queue.
Sometimes the REST service receives old messages from the queue, or more than one message.
I found something about idempotent configuration, but I don't know how to implement it correctly.
Is that diagram the right way for communication between two or more services using Kafka?
Can someone give a example?
Thanks.
Is that diagram the right way for communication between two or more services using Kafka?
If you mean "Does it make sense to have two or more services communicate indirectly through Kafka?", then yes, it does.
Can someone give a example?
Here are some good pointers including examples:
Build Services on a Backbone of Events, Confluent blog, May 2017
Commander: Better Distributed Applications through CQRS, Event Sourcing, and Immutable Logs, by Bobby Calderwood, StrangeLoop, Sep 2016
Recorded talk
Reference implementation on GitHub
To answer your question: There is no problem with such communication.
Now referring back to other parts...
Keep in mind that this is asynchronous communication, so you should not keep the HTTP connection open and keep the user of that service waiting for the response. That's just not the way to go. You can solve this in many ways. For instance, you can use WebSockets, or you can send an email/SMS/Slack message to the user with the reply, and so on.
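On the old/duplicate replies problem from the question, one common approach is to tag each request with a correlation id and only complete the matching pending request, ignoring everything else. Below is a hedged sketch under my own assumptions: a "correlation-id" record header and an in-memory map of pending requests; the Calculator service would have to copy that header from the toAdd record onto its fromAdd reply. Topic names come from the question:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;

public class AddRequestCoordinator {

    private final KafkaProducer<String, String> producer;
    // Pending requests keyed by correlation id (assumed in-memory bookkeeping).
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    public AddRequestCoordinator(KafkaProducer<String, String> producer) {
        this.producer = producer;
    }

    // REST side: publish the request on toAdd with a correlation-id header.
    public CompletableFuture<String> submit(String payload) {
        String correlationId = UUID.randomUUID().toString();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(correlationId, reply);

        ProducerRecord<String, String> record = new ProducerRecord<>("toAdd", payload);
        record.headers().add("correlation-id", correlationId.getBytes(StandardCharsets.UTF_8));
        producer.send(record);
        return reply;
    }

    // Called for every record consumed from fromAdd: old or unrelated replies
    // simply find no pending entry and are ignored.
    public void onReply(ConsumerRecord<String, String> record) {
        Header header = record.headers().lastHeader("correlation-id");
        if (header == null) {
            return;
        }
        String correlationId = new String(header.value(), StandardCharsets.UTF_8);
        CompletableFuture<String> reply = pending.remove(correlationId);
        if (reply != null) {
            reply.complete(record.value());
        }
    }
}
```

The returned CompletableFuture is what you would wire up to your asynchronous response mechanism (WebSocket push, email, etc.) instead of blocking the HTTP request.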