How to replace Kafka with Aeron - apache-kafka

At present, the trading system in our production environment uses Kafka. Because Kafka's latency is too high for our needs, we would like to replace it with Aeron. How can I use Aeron correctly?

Aeron isn't an out-of-the-box replacement for Kafka, although it does provide primitives that would allow you to replicate much of its functionality.
Kafka latencies are on the order of milliseconds, whereas Aeron latencies are typically measured in microseconds.
What exactly you would need to build in Aeron very much depends on your use case.
One of the primary uses of Kafka is as a persistent queue.
To build a simple persistent queue for a single-publisher use case, you would need:
Publisher
ArchivingMediaDriver - this component runs an Aeron MediaDriver, which handles sending/receiving messages over the network, and an Archive, which allows you to record and replay streams.
A Publication to send messages to be recorded by the Archive. See AeronArchive.addRecordedPublication.
Subscriber(s)
MediaDriver - this component handles sending/receiving messages over the network.
A Subscription that replays data from a specific position in the recorded stream of messages. See AeronArchive.replay.
There are examples of this in the aeron-samples.
RecordedBasicPublisher.java
ReplayedBasicSubscriber.java
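To make that outline concrete, here is a minimal sketch of the publisher side, assuming an embedded ArchivingMediaDriver with default settings and an illustrative channel aeron:udp?endpoint=localhost:20121 with stream id 1001; treat it as a sketch rather than a drop-in implementation, since context configuration details vary by Aeron version. The subscriber side would connect its own AeronArchive client, look up the recording id, and use AeronArchive.replay to consume from the chosen position, as shown in ReplayedBasicSubscriber.java.

import java.nio.ByteBuffer;

import io.aeron.Publication;
import io.aeron.archive.Archive;
import io.aeron.archive.ArchivingMediaDriver;
import io.aeron.archive.client.AeronArchive;
import io.aeron.driver.MediaDriver;
import org.agrona.concurrent.UnsafeBuffer;

public class RecordedPublisherSketch
{
    public static void main(final String[] args)
    {
        // Launch an embedded MediaDriver + Archive, connect an archive client,
        // and add a publication whose stream is recorded by the Archive.
        try (ArchivingMediaDriver driver = ArchivingMediaDriver.launch(
                 new MediaDriver.Context(), new Archive.Context());
             AeronArchive archive = AeronArchive.connect();
             Publication publication = archive.addRecordedPublication(
                 "aeron:udp?endpoint=localhost:20121", 1001))
        {
            final UnsafeBuffer buffer = new UnsafeBuffer(ByteBuffer.allocateDirect(256));
            final int length = buffer.putStringWithoutLengthAscii(0, "order-1");

            // offer() returns a negative value while back-pressured or not yet connected.
            while (publication.offer(buffer, 0, length) < 0)
            {
                Thread.yield();
            }
        }
    }
}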
Latency could be reduced further by having the publisher send messages over multicast/MDC and having the subscriber use ReplayMerge to seamlessly transition from the recorded stream to the live stream.
It is worth noting that Real Logic also provide commercial support.

Related

Process Pub/Sub messages at a constant rate using streaming and serverless

The scenario:
I have thousands of requests I need to issue each day.
I know the number at the beginning of the day, and ideally I would like to send all the data about the requests to Pub/Sub, one message per request.
I want to make the requests at a constant rate. For example, if I have 172,800 requests, I want to process 2 per second.
Ideally the solution would involve Pub/Sub push and Cloud Run.
Using pull with long-running instances is also an option.
Any other options are also welcome.
I want to avoid running in a loop and fetching records from a database with a limit.
This is how I am doing it today.
You can use batch and flow control settings to fine-tune Pub/Sub performance, which will help in processing messages at a constant rate.
Batching
A batch, within the context of Cloud Pub/Sub, refers to a group of one or more messages published to a topic by a publisher in a single publish request. Batching is done by default in the client library or explicitly by the user. The purpose for this feature is to allow for a higher throughput of messages while also providing a more efficient way for messages to travel through the various layers of the service(s). Adjusting the batch size (i.e. how many messages or bytes are sent in a publish request) can be used to achieve the desired level of throughput.
Features specific to batching on the publisher side include setElementCountThreshold(), setRequestByteThreshold(), and setDelayThreshold() as part of setBatchSettings() on a publisher client (the naming varies slightly in the different client libraries). These features can be used to finely tune the behavior of batching to find a better balance among cost, latency, and throughput.
Note: The maximum number of messages that can be published in a single batch is 1000 messages or 10 MB.
An example of these batching properties can be found in the Publish with batching settings documentation.
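As a rough illustration, this is how those thresholds are applied to a publisher client with the Java client library; the project and topic names are placeholders, the threshold values are arbitrary examples rather than recommendations, and Duration is org.threeten.bp.Duration in older client versions (newer versions use java.time.Duration).

import com.google.api.gax.batching.BatchingSettings;
import com.google.cloud.pubsub.v1.Publisher;
import com.google.pubsub.v1.TopicName;
import org.threeten.bp.Duration;

// Placeholder project/topic names; thresholds are examples only.
BatchingSettings batchingSettings =
    BatchingSettings.newBuilder()
        .setElementCountThreshold(100L)            // publish once 100 messages are buffered...
        .setRequestByteThreshold(1_000L)           // ...or once 1 KB of data is buffered...
        .setDelayThreshold(Duration.ofMillis(10))  // ...or after 10 ms, whichever comes first
        .build();

Publisher publisher =
    Publisher.newBuilder(TopicName.of("my-project", "my-topic"))
        .setBatchingSettings(batchingSettings)
        .build();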
Flow Control
Flow control features on the subscriber side can help control the unhealthy behavior of tasks on the pipeline by allowing the subscriber to regulate the rate at which messages are ingested. These features provide the added functionality to adjust how sensitive the service is to sudden spikes or drops of published throughput.
Some features that are helpful for adjusting flow control and other settings on the subscriber are setMaxOutstandingElementCount(), setMaxOutstandingRequestBytes(), and setMaxAckExtensionPeriod().
Examples of these settings being used can be found in the Subscribe with flow control documentation.
For more information refer to this link.
If you have long-running instances as subscribers, you will need to set the relevant flow control settings, for example .setMaxOutstandingElementCount(1000L).
Once you have set it to the desired number (for example 1000), it controls the maximum number of messages the subscriber receives before pausing the message stream, as explained in the code below from this documentation:
// The subscriber will pause the message stream and stop receiving more messages from the
// server if any one of the conditions is met.
FlowControlSettings flowControlSettings =
    FlowControlSettings.newBuilder()
        // 1,000 outstanding messages. Must be >0. It controls the maximum number of messages
        // the subscriber receives before pausing the message stream.
        .setMaxOutstandingElementCount(1000L)
        // 100 MiB. Must be >0. It controls the maximum size of messages the subscriber
        // receives before pausing the message stream.
        .setMaxOutstandingRequestBytes(100L * 1024L * 1024L)
        .build();
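These settings only take effect once attached to a subscriber client. A minimal sketch, assuming placeholder project and subscription names:

import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;

// Placeholder project/subscription names.
ProjectSubscriptionName subscriptionName =
    ProjectSubscriptionName.of("my-project", "my-subscription");

// Process each message at your own pace, then ack it so it is not redelivered.
MessageReceiver receiver = (message, consumer) -> {
    System.out.println("Received: " + message.getData().toStringUtf8());
    consumer.ack();
};

Subscriber subscriber =
    Subscriber.newBuilder(subscriptionName, receiver)
        .setFlowControlSettings(flowControlSettings)
        .build();
subscriber.startAsync().awaitRunning();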

Synchronize Data From Multiple Data Sources

Our team is trying to build a predictive maintenance system whose task is to look at a set of events and predict whether these events depict a set of known anomalies or not.
We are at the design phase and the current system design is as follows:
The events may occur on multiple sources of an IoT system (such as cloud platform, edge devices or any intermediate platforms)
The events are pushed by the data sources into a message queueing system (currently we have chosen Apache Kafka).
Each data source has its own queue (Kafka Topic).
From the queues, the data is consumed by multiple inference engines (which are actually neural networks).
Depending upon the feature set, an inference engine will subscribe to multiple Kafka topics and stream data from those topics to continuously output the inference.
The overall architecture follows the single-responsibility principle meaning that every component will be separate from each other and run inside a separate Docker container.
Problem:
In order to classify a set of events as an anomaly, the events have to occur in the same time window. For example, say there are three data sources pushing their respective events into Kafka topics, but, for some reason, the data is not synchronized.
So one of the inference engines pulls the latest entries from each of the Kafka topics, but the corresponding events in the pulled data do not belong to the same time window (say 1 hour). That will result in invalid predictions due to out-of-sync data.
Question
We need to figure out how we can make sure that the data from all three sources is pushed in order, so that when an inference engine requests entries (say the last 100 entries) from multiple Kafka topics, the corresponding entries in each topic belong to the same time window.
I would suggest KSQL, which is a streaming SQL engine that enables real-time data processing against Apache Kafka. It also provides nice functionality for Windowed Aggregation etc.
There are 3 ways to define windows in KSQL: hopping windows, tumbling windows, and session windows. Hopping and tumbling windows are time windows, because they're defined by fixed durations that you specify. Session windows are dynamically sized based on incoming data and defined by periods of activity separated by gaps of inactivity.
In your context, you can use KSQL to query and aggregate the topics of interest using Windowed Joins. For example,
SELECT t1.id, ...
FROM topic_1 t1
INNER JOIN topic_2 t2
WITHIN 1 HOURS
ON t1.id = t2.id;
Some suggestions -
Handle delay at the producer end -
Ensure all three producers always send data to the Kafka topics in sync by tuning batch.size and linger.ms.
e.g. if linger.ms is set to 1000, all buffered messages would be sent to Kafka within 1 second.
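As a sketch, these are the corresponding producer settings in the Java client; the broker address is a placeholder, and the batch.size/linger.ms values are examples to tune for your own workload.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

// Placeholder broker address; batch.size and linger.ms values are examples only.
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
    "org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
    "org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024); // accumulate up to 32 KB per batch
props.put(ProducerConfig.LINGER_MS_CONFIG, 1000);       // wait at most 1 s before sending a batch

KafkaProducer<String, String> producer = new KafkaProducer<>(props);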
Handle delay at the consumer end -
Any streaming engine on the consumer side (be it Kafka Streams, Spark Streaming, or Flink) provides windowing functionality to join/aggregate stream data based on keys while accounting for delayed records.
See the Flink documentation on windows for guidance on how to choose the right window type.
To handle this scenario, the data sources must provide some mechanism for the consumer to realize that all relevant data has arrived. The simplest solution is to publish each batch from a data source with a batch id (a GUID of some form). Consumers can then wait until the next batch id shows up, marking the end of the previous batch. This approach assumes sources will not skip a batch, otherwise they will become permanently misaligned. There is no algorithm to detect this, but you might have some fields in the data that show the discontinuity and allow you to realign the data.
A weaker version of this approach is to either just wait x seconds and assume all sources succeed within this much time, or to look at some form of timestamps (logical or wall clock) to detect that a source has moved on to the next time window, implicitly showing completion of the last window.
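A hypothetical sketch of the batch-id idea (the class and method names here are made up for illustration): per source, events are buffered under the current batch id, and a batch is only released to the inference engine once the next batch id shows up for that source.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical helper: one instance per data source / Kafka topic.
final class SourceBuffer
{
    private final Queue<List<String>> completedBatches = new ArrayDeque<>();
    private List<String> currentBatch = new ArrayList<>();
    private String currentBatchId = null;

    // Called for every event consumed from this source's topic.
    void onEvent(String batchId, String payload)
    {
        if (currentBatchId != null && !currentBatchId.equals(batchId))
        {
            // Seeing a new batch id marks the previous batch as complete.
            completedBatches.add(currentBatch);
            currentBatch = new ArrayList<>();
        }
        currentBatchId = batchId;
        currentBatch.add(payload);
    }

    // The inference engine only pulls batches that are known to be complete,
    // so it can align same-window batches across all sources before inferring.
    List<String> pollCompletedBatch()
    {
        return completedBatches.poll();
    }
}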
The following recommendations should maximize the chance of successfully synchronizing events for the anomaly detection problem using time-series data.
Use a network time synchronizer on all producer/consumer nodes.
Use a heartbeat message from producers every x units of time with a fixed start time. For example: the messages are sent every two minutes at the start of the minute.
Build predictors for producer message delay; use the heartbeat messages to compute this.
With these primitives, we should be able to align the time-series events, accounting for time drifts due to network delays.
At the inference engine side, expand your windows at a per-producer level to sync up events across producers.

How to discard some number of messages in RabbitMQ

I have a use case where I need to get data from a queue on an exchange that I don't have control over.
The use case is that I get messages from this queue constantly. I am wondering whether, in RabbitMQ or by using/writing a plugin, I can discard 90% of the messages before saving them to my local datastore. The reason for this is that I'm not capable of storing all the messages, only 10% of them.
Obviously one way is to do this in my application, but I wonder if there is a way to do it at the RabbitMQ level.
I just wonder if you have any thoughts/solutions on this.
If you don't have control of the exchange, you're pretty much limited to doing it in your app.
You can bulk-reject messages using a nack - here's the help page:
http://www.rabbitmq.com/nack.html
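As a sketch of the application-side approach, the consumer below (Java client; the broker host and queue name are placeholders) acks roughly 1 in 10 messages and rejects the rest with basicNack and requeue=false, so the broker drops them.

import java.nio.charset.StandardCharsets;

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

public class SamplingConsumer
{
    public static void main(String[] args) throws Exception
    {
        // Placeholder broker host and queue name.
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection connection = factory.newConnection(); // kept open for the consumer's lifetime
        Channel channel = connection.createChannel();

        final int[] count = {0};
        DeliverCallback onDeliver = (consumerTag, delivery) -> {
            long tag = delivery.getEnvelope().getDeliveryTag();
            if (count[0]++ % 10 == 0)
            {
                // Keep roughly 10%: store/process the message, then acknowledge it.
                System.out.println(new String(delivery.getBody(), StandardCharsets.UTF_8));
                channel.basicAck(tag, false);
            }
            else
            {
                // Reject the other ~90% without requeueing so the broker discards them.
                channel.basicNack(tag, false, false);
            }
        };

        // autoAck = false so every message must be explicitly acked or nacked.
        channel.basicConsume("sample.queue", false, onDeliver, consumerTag -> { });
    }
}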
Per the AMQP specs, a RabbitMQ queue passes its messages to the connected consumers using a round-robin algorithm. So if your code is the sole consumer of the RabbitMQ queue and you want your application to ignore about 90% of the received messages and process only the remaining 10%, then...
connect to the same queue with 10 different consumers simultaneously (they may all be written in the same language or different ones, it does not matter) and write your message-processing logic in any one or two of them. Abandon the remaining 8 or 9 consumers (these will be used by RabbitMQ [and conceptually by us] to drain off about 90% of the messages).
You can also simply consume the messages and do nothing with them. Using rabbitmqadmin is the easiest way to do this, as below:
rabbitmqadmin get queue=queuename requeue=false count=1

How to get Acknowledgement from Kafka

How do I get an acknowledgement from Kafka once a message is consumed or processed? This might sound stupid, but is there any way to know the start and end offset of the message for which the ack has been received?
What I have found so far is that in 0.8 they introduced the following way to choose the offset to start reading from:
kafka.api.OffsetRequest.EarliestTime() finds the beginning of the data in the logs and starts streaming from there, kafka.api.OffsetRequest.LatestTime() will only stream new messages.
example code
https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
Still not sure about the acknowledgement part
Kafka isn't really structured to do this. To understand why, review the design documentation here.
In order to provide an exactly-once acknowledgement, you would need to create some external tracking system for your application, where you explicitly write acknowledgements and implement locks over the transaction ids in order to ensure things are only ever processed once. The computational cost of implementing such a system is extraordinarily high, and it is one of the main reasons that large transactional systems require comparatively exotic hardware and have arguably lower scalability than systems such as Kafka.
If you do not require strong durability semantics, you can use the groups API to keep rough track of when the last message was read. This ensures that every message is read at least once. Note that since the groups API does not give you the ability to explicitly track your application's own processing logic, your actual processing guarantees are fairly weak in this scenario. Schemes that rely on idempotent processing are common in this environment.
Alternatively, you may use the poorly-named SimpleConsumer API (it is quite complex to use), which enables you to explicitly track offsets within your application. This is the highest level of processing guarantee that can be achieved through the native Kafka APIs, since it enables you to track your application's own processing of the data that is read from the queue.
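To make the groups-API option concrete, here is a minimal sketch using the modern KafkaConsumer (rather than the 0.8-era APIs the question references); the broker, group, and topic names are placeholders, and process() stands in for your own handling logic.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Placeholder broker/group/topic names; process() is a stand-in for your own logic.
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // commit only after processing
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
    "org.apache.kafka.common.serialization.StringDeserializer");
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
    "org.apache.kafka.common.serialization.StringDeserializer");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props))
{
    consumer.subscribe(Collections.singletonList("my-topic"));
    while (true)
    {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records)
        {
            process(record); // hypothetical, idempotent processing step
        }
        // Committing after processing gives at-least-once semantics; the committed
        // offset is the closest thing Kafka provides to an "ack" for consumed messages.
        consumer.commitSync();
    }
}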

use of Mongo/Redis on a full firehose stream

I have been reading up on how DataSift uses different technologies to consume the Twitter firehose, and since I need to follow the same concept, I wanted to get some understanding of the differences between Mongo/Redis and their use in the storage of real-time data. My understanding is this:
The stream volume is way too high to simply consume and place the data (tweets etc.) into, for example, a bunch of RabbitMQ queues. My concern is the issue of data loss. My current architecture involves connecting to an open stream, consuming the data, and pushing each post or message into a couple of queues in RabbitMQ. The queues hold a copy of each message, one being the processing queue and one being the storage queue. I then consume each queue: the work on the processing queue is time-intensive, but my workers keep up fine, and I write all the storage-queue contents to files, which works fine.
If my volume were to increase 100x, I am told this current setup would not be able to handle the volume and that using the Mongo/Redis approach would be better. I am not sure how this would be implemented: would I then consume the stream into Mongo and from there into queues, and why would this be a better approach?