Why message brokers don't supply total data/messages sent metrics? - apache-kafka

My team was recently considering different message brokers to use for our project, we ended up picking Apache Pulsar, but it applies to others (Kafka). Our requirement is to track total number of messages sent and bytes sent to each subscriber for billing purposes.
I was reading documentation for metrics and was surprised to see that Pulsar doesn't track this, I've checked Kafka and the result was the same.
My understanding on this subject is minimal so is this some kind of anti-pattern?
I understand that counter values like this never go down and for our use case - should not be reset, leading to potential (certain) overflows. But to me this could be solved by using something like a histogram in Prometheus (metrics format used in Pulsar). I am actually thinking about implementing such functionality, but am I wrong and is there a better solution for our purpose?

Related

Avoid Data Loss While Processing Messages from Kafka

Looking out for best approach for designing my Kafka Consumer. Basically I would like to see what is the best way to avoid data loss in case there are any
exception/errors during processing the messages.
My use case is as below.
a) The reason why I am using a SERVICE to process the message is - in future I am planning to write an ERROR PROCESSOR application which would run at the end of the day, which will try to process the failed messages (not all messages, but messages which fails because of any dependencies like parent missing) again.
b) I want to make sure there is zero message loss and so I will save the message to a file in case there are any issues while saving the message to DB.
c) In production environment there can be multiple instances of consumer and services running and so there is high chance that multiple applications try to write to the
same file.
Q-1) Is writing to file the only option to avoid data loss ?
Q-2) If it is the only option, how to make sure multiple applications write to the same file and read at the same time ? Please consider in future once the error processor
is build, it might be reading the messages from the same file while another application is trying to write to the file.
ERROR PROCESSOR - Our source is following a event driven mechanics and there is high chance that some times the dependent event (for example, the parent entity for something) might get delayed by a couple of days. So in that case, I want my ERROR PROCESSOR to process the same messages multiple times.
I've run into something similar before. So, diving straight into your questions:
Not necessarily, you could perhaps send those messages back to Kafka in a new topic (let's say - error-topic). So, when your error processor is ready, it could just listen in to the this error-topic and consume those messages as they come in.
I think this question has been addressed in response to the first one. So, instead of using a file to write to and read from and open multiple file handles to do this concurrently, Kafka might be a better choice as it is designed for such problems.
Note: The following point is just some food for thought based on my limited understanding of your problem domain. So, you may just choose to ignore this safely.
One more point worth considering on your design for the service component - You might as well consider merging points 4 and 5 by sending all the error messages back to Kafka. That will enable you to process all error messages in a consistent way as opposed to putting some messages in the error DB and some in Kafka.
EDIT: Based on the additional information on the ERROR PROCESSOR requirement, here's a diagrammatic representation of the solution design.
I've deliberately kept the output of the ERROR PROCESSOR abstract for now just to keep it generic.
I hope this helps!
If you don't commit the consumed message before writing to the database, then nothing would be lost while Kafka retains the message. The tradeoff of that would be that if the consumer did commit to the database, but a Kafka offset commit fails or times out, you'd end up consuming records again and potentially have duplicates being processed in your service.
Even if you did write to a file, you wouldn't be guaranteed ordering unless you opened a file per partition, and ensured all consumers only ran on a single machine (because you're preserving state there, which isn't fault-tolerant). Deduplication would still need handled as well.
Also, rather than write your own consumer to a database, you could look into Kafka Connect framework. For validating a message, you can similarly deploy a Kafka Streams application to filter out bad messages from an input topic out into a topic to send to the DB

What's the max of topics I can have on ZeroMQ?

I'm new to ZeroMQ ( I've been using SQS so far ).
I would like to build a system where every time a user logs in, they subscribe to a queue. The all the users subscribed to this queue are interested only in messages directed to them.
I read about topic matching. It seems that I could create a pattern like this:
development.player.234345345
development.player.453423423
integration.player.345354664
And, each worker ( user ) can subscribe to the queue and listen only to the topic they match. i.e. a player 234345345 on the development environment will only subscribe to messages with the topic development.player.234345345
Is this true?
And if so, what are the consequences in ZeroMQ?
Is there a limit on how many topic matching I can have?
ZeroMQ has a very detailed page on how the internals of topic matching works. It looks like you can have as many topics as you want, but topic matching incurrs a runtime cost. It's supposed to be extremely fast:
We believe that application of the above algorithms can give a system
that will be able to match or filter a single message in the range of
nanoseconds or couple of microseconds even it the case of large amount
of different topics and subscriptions.
However, there are some caveats you need to be aware of:
The inverted bitmap technique thus works by pre-indexing a set of
searchable items so that a search request can be resolved with a
minimal number of operations.
It is efficient if and only if the set of searchable items is
relatively stable with respect to the number of search requests.
Otherwise the cost of re-indexing is excessive.
In short, as long as you don't change your subscriptions too often, you should be able to do on the order of thousands of topics at least.
A: Yes, you can
The Max. Number? A harder part...
May would like to read Martin SUSTRIK's post on this:
While ZeroMQ evolves on it's own, Martin, ZeroMQ co-father, has posted on this subject a few interesting facts here, with some further details and design view discussion derrogated here
Efficient Subscription Matching
In ZeroMQ, simple tries are used to store and match PUB/SUB subscriptions. The subscription mechanism was intended for up to 10,000 subscriptions where simple trie works well. However, there are users who use as much as 150,000,000 subscriptions. In such cases there's a need for a more efficient data structure.
Worth reading to have some estimate of where safe-zones are.
Also worth to know, that not all ZeroMQ versions behave the same way.
Recent API uses PUB-side topic filtering, which is not automatic for all previous versions, where SUB-side filtering was used. Translate that into all the network transport, if all messages, irrespective of their's final destiny are broadcast to all SUB-s, just to realise that only one ( user in your use-case ) will match and all the rest will discard the messages, due to topic-filter mismatches.
Thus all your use-cases ought take into account what different ZeroMQ versions ( incl. different non-native language bindings and wrappers ) may
meet and cooperate on the same playground.
Anyway, ZeroMQ is a great tool, nanomsg being in recent years also worth to monitor and challenge.

Implementing sagas with Kafka

I am using Kafka for Event Sourcing and I am interested in implementing sagas using Kafka.
Any best practices on how to do this? The Commander pattern mentioned here seems close to the architecture I am trying to build but sagas are not mentioned anywhere in the presentation.
This talk from this year's DDD eXchange is the best resource I came across wrt Process Manager/Saga pattern in event-driven/CQRS systems:
https://skillsmatter.com/skillscasts/9853-long-running-processes-in-ddd
(requires registering for a free account to view)
The demo shown there lives on github: https://github.com/flowing/flowing-retail
I've given it a spin and I quite like it. I do recommend watching the video first to set the stage.
Although the approach shown is message-bus agnostic, the demo uses Kafka for the Process Manager to send commands to and listen to events from other bounded contexts. It does not use Kafka Streams but I don't see why it couldn't be plugged into a Kafka Streams topology and become part of the broader architecture like the one depicted in the Commander presentation you referenced.
I hope to investigate this further for our own needs, so please feel free to start a thread on the Kafka users mailing list, that's a good place to collaborate on such patterns.
Hope that helps :-)
I would like to add something here about sagas and Kafka.
In general
In general Kafka is a tad different than a normal queue. It's especially good in scaling. And this actually can cause some complications.
One of the means to accomplish scaling, Kafka uses partitioning of the data stream. Data is placed in partitions, which can be consumed at its own rate, independent of the other partitions of the same topic. Here is some info on it: how-choose-number-topics-partitions-kafka-cluster. I'll come back on why this is important.
The most common ways to ensure the order within Kafka are:
Use 1 partition for the topic
Use a partition message key to "assign" the message to a topic
In both scenarios your chronologically dependent messages need to stream through the same topic.
Also, as #pranjal thakur points out, make sure the delivery method is set to "exactly once", which has a performance impact but ensures you will not receive the messages multiple times.
The caveat
Now, here's the caveat: When changing the amount of partitions the message distribution over the partitions (when using a key) will be changed as well.
In normal conditions this can be handled easily. But if you have a high traffic situation, the migration toward a different number of partitions can result in a moment in time in which a saga-"flow" is handled over multiple partitions and the order is not guaranteed at that point.
It's up to you whether this will be an issue in your scenario.
Here are some questions you can ask to determine if this applies to your system:
What will happen if you somehow need to migrate/copy data to a new system, using Kafka?(high traffic scenario)
Can you send your data to 1 topic?
What will happen after a temporary outage of your saga service? (low availability scenario/high traffic scenario)
What will happen when you need to replay a bunch of messages?(high traffic scenario)
What will happen if we need to increase the partitions?(high traffic scenario/outage & recovery scenario)
The alternative
If you're thinking of setting up a saga, based on steps, like a state machine, I would challenge you to rethink your design a bit.
I'll give an example:
Lets consider a booking-a-hotel-room process:
Simplified, it might consist of the following steps:
Handle room reserved (incoming event)
Handle room payed (incoming event)
Send acknowledgement of the booking (after payed and some processing)
Now, if your saga is not able to handle the payment if the reservation hasn't come in yet, then you are relying on the order of events.
In this case you should ask yourself: when will this break?
If you conclude you want to avoid the chronological dependency; consider a system without a saga, or a saga which does not depend on the order of events - i.e.: accepting all messages, even when it's not their turn yet in the process.
Some examples:
aggregators
Modeled as business process: parallel gateways (parallel process flows)
Do note in such a setup it is even more crucial that every action has got an implemented compensating action (rollback action).
I know this is often hard to accomplish; but, if you start small, you might start to like it :-)

How to discard some number of messages in rabbitmQ

I have a use case where I need to get data from a queue on an exchange that I dont have control on.
the usecase is that from this queue I get messages constantly. Just wonder if in rabbitmq or by using/writing a plugin I can discard 90% of the messages at a time before saving them to my local datastore. The reason for this is that I'm not capable of storing all the messages but 10% of it.
Obviously one way is in my application to do so. but I wonder if there is a way to do it on rabbitmq level.
Just wonder if you have any thoughts/solutions on this.
If you don't have control of the exchange, you're pretty much limited to doing it in your app.
You can bulk-reject messages using a nack - here's the help page:
http://www.rabbitmq.com/nack.html
Due to the AMQP specs, a rabbitmq queue passes its messages to the connected consumers in a round robin algorithm. So if your code is the sole consumer of the rabbitmq queue & you want your application to neglect about 90% of recieved messages and process only remaining 10% then,....
connect to the same queue using 10 different consumers simultaneously (all may be written in same language or diff. dose not matter) and write your message processing logic in any one or two of them....abandon the rest 8/9 consumers(these will be used by rabbitmq [and conceptually by us] to drain off about 90% of messages)
You can simply consume the messages and do nothing about it. Using rabbitmqadmin is the easiest way to do this as below:
rabbitmqadmin get queue=queuename requeue=false count=1

Which Solution Handles Publisher/Subscriber Scenario Better?

The scenario is publisher/subscriber, and I am looking for a solution which can give the feasibility of sending one message generated by ONE producer across MULTIPLE consumers in real-time. the light weight this scenario can be handled by one solution, the better!
In case of AMQP servers I've only checked out Rabbitmq and using rabbitmq server for pub/sub pattern each consumer should declare an anonymous, private queue and bind it to an fanout exchange, so in case of thousand users consuming one message in real-time there will be thousands or so anonymous queue handling by rabbitmq.
But I really do not like the approach by the rabbitmq, It would be ideal if rabbitmq could handle this pub/sub scenario with one queue, one message , many consumers listening on one queue!
what I want to ask is which AMQP server or other type of solutions (anyone similar including XMPP servers or Apache Kafka or ...) handles the pub/sub pattern/scenario better and much more efficient than RabbitMQ with consuming (of course) less server resource?
preferences in order of interest:
in case of AMQP enabled server handling the pub/sub scenario with only ONE or LESS number of queues (as explained)
handling thousands of consumers in a light-weight manner, consuming less server resource comparing to other solutions in pub/sub pattern
clustering, tolerating failing of nodes
Many Language Bindings ( Python and Java at least)
easy to use and administer
I know my question may be VERY general but I like to hear the ideas and suggestions for the pub/sub case.
thanks.
In general, for RabbitMQ, if you put the user in the routing key, you should be able to use a single exchange and then a small number of queues (even a single one if you wanted, but you could divide them up by server or similar if that makes sense given your setup).
If you don't need guaranteed order (as one would for, say, guaranteeing that FK constraints wouldn't get hit for a sequence of changes to various SQL database tables), then there's no reason you can't have a bunch of consumers drawing from a single queue.
If you want a broadcast-message type of scenario, then that could perhaps be handled a bit differently. Instead of the single user in the routing key, which you could use for non-broadcast-type messages, have a special user type, say, __broadcast__, that no user could actually have, and have the users to broadcast to stored in the payload of the message along with the message itself.
Your message processing code could then take care of depositing that message in the database (or whatever the end destination is) across all of those users.
Edit in response to comment from OP:
So the routing key might look something like this message.[user] where [user] could be the actual user if it were a point-to-point message, and a special __broadcast__ user (or similar user name that an actual user would not be allowed to register) which would indicate a broadcast style message.
You could then place the users to which the message should be delivered in the payload of the message, and then that message content (which would also be in the payload) could be delivered to each user. The mechanism for doing that would depend on what your end destination is. i.e. do the messages end up getting stored in Postgres, or Mongo DB or similar?