I have an Apache Kafka 2.6 Producer which writes to topic-A (TA).
I also have a Kafka streams application which consumes from TA and writes to topic-B (TB).
In the streams application, I have a custom timestamp extractor which extracts the timestamp from the message payload.
For one of my failure handling test cases, I shutdown the Kafka cluster while my applications are running.
When the producer application tries to write messages to TA, it cannot because the cluster is down and hence (I assume) buffers the messages.
Let's say it receives 4 messages m1,m2,m3,m4 in increasing time order. (i.e. m1 is first and m4 is last).
When I bring the Kafka cluster back online, the producer sends the buffered messages to the topic, but they are not in order. I receive for example, m2 then m3 then m1 and then m4.
Why is that? Is it because the buffering in the producer is multi-threaded, with each thread producing to the topic at the same time?
I assumed that the custom timestamp extractor would help in ordering messages when consuming them. But it does not. Or maybe my understanding of the timestamp extractor is wrong.
I got one solution from SO here: just stream all events from TA to an intermediate topic (say TA'), applying the timestamp extractor when writing to that topic. But I am not sure whether this will cause the events to be reordered based on the extracted timestamp.
My code for the Producer is as shown below (I am using Spring Cloud for creating the Producer):
Producer.java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.SendResult;
import org.springframework.stereotype.Service;
import org.springframework.util.concurrent.ListenableFuture;

@Service
public class Producer {

    private String topicName = "input-topic";
    private ApplicationProperties appProps; // application-specific config class

    @Autowired
    private KafkaTemplate<String, MyEvent> kafkaTemplate;

    public Producer() {
        super();
    }

    @Autowired
    public void setAppProps(ApplicationProperties appProps) {
        this.appProps = appProps;
        this.topicName = appProps.getInput().getTopicName();
    }

    public void sendMessage(String key, MyEvent ce) {
        ListenableFuture<SendResult<String, MyEvent>> future = this.kafkaTemplate.send(this.topicName, key, ce);
    }
}
Why is that? Is it because the buffering in the producer is multi-threaded, with each thread producing to the topic at the same time?
By default, the producer allows up to 5 parallel in-flight requests to a broker, so if some requests fail and are retried, the request order might change.
To avoid this reordering issue, you can either set max.in.flight.requests.per.connection = 1 (which may have a performance hit) or set enable.idempotence = true.
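As a minimal sketch of either setting on the producer (property keys via ProducerConfig; the broker address is a placeholder):

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Option 1: strict ordering, at the cost of request pipelining
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
// Option 2 (usually preferred): idempotence preserves ordering across retries
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);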
Btw: you did not say if your topic has a single partition or multiple partitions, and if your messages have a key. If your topic has more than one partition and your messages are sent to different partitions, there is no ordering guarantee on read anyway, because offset ordering is only guaranteed within a partition.
I assumed that the custom timestamp extractor would help in ordering messages when consuming them. But it does not. Or maybe my understanding of the timestamp extractor is wrong.
The timestamp extractor only extracts a timestamp. Kafka Streams does not re-order any messages, but processes messages always in offset-order.
If not, then what are the specific uses of the timestamp extractor ? Just to associate a timestamp with an event ?
Correct.
I got one solution from SO here: just stream all events from TA to an intermediate topic (say TA'), applying the timestamp extractor when writing to that topic. But I am not sure whether this will cause the events to be reordered based on the extracted timestamp.
No, it won't do any reordering. The other SO question is just about changing the timestamp; if you read messages in order a, b, c, the result would be written in order a, b, c (just with different timestamps; offset order is preserved).
This talk explains some more details: https://www.confluent.io/kafka-summit-san-francisco-2019/whats-the-time-and-why/
I am new to Kafka, so I have some issues related to basic Kafka concepts. I want to distribute all messages equally over all partitions.
As I know, the producer chooses the partition based on key hashing (if a key is available) using the default partitioner's hash algorithm (Random, Consistent, Murmur2, Sticky, etc.), which is great. But I want to distribute the messages to all partitions. Like:
Topic: "Test"
Partitions: 3
Now, if I produce messages (key available), I want to distribute those messages equally, like:
Partition 1: 1,4,7,10
Partition 2: 2,5,8
Partition 3: 3,6,9
So, how can I distribute messages equally to all partitions?
The default partitioner chooses the partition based on the hash of the key if a key is available and no partition is specified in the record itself. Otherwise (i.e. no key is present and no partition is specified) it chooses the partition in a round-robin fashion (Kafka < 2.4; read below).
public int partition(String key, int partitionNum) {
    byte[] keyBytes = key.getBytes();
    return toPositive(murmur2(keyBytes)) % partitionNum;
}
For a small number of keys, using the default partitioner may not give you even data distribution, as toPositive(murmur2(keyBytes)) % numberOfPartitions will have collisions. The best way is for the producer to take responsibility and decide which partition to send the message to, using a custom Partitioner based on your business use case.
Kafka guarantees that any consumer of a given topic-partition will always read that partition's events in exactly the same order as they were written.
https://kafka.apache.org/documentation.html#introduction
One thing to note here: although eliminating data skew is important, the order of messages going into different partitions of a topic may or may not be preserved across partitions, and this may have consequences depending on your use case. Within a partition, however, messages are stored in order, so keep related messages in the same partition.
For example, in an e-commerce delivery environment, messages related to an order ID should come in order (you don't want "Out-For-Delivery" to come after "Delivered"), thus messages for a specific order_id should go into the same partition.
Update:
As mentioned in the comment, Kafka ≥ v2.4 uses Sticky Partitioner as the default partitioner.
The sticky partitioner addresses the problem of spreading out records without keys into smaller batches by picking a single partition to send all non-keyed records. Once the batch at that partition is filled or otherwise completed, the sticky partitioner randomly chooses and “sticks” to a new partition. That way, over a larger period of time, records are about evenly distributed among all the partitions while getting the added benefit of larger batch sizes.
https://www.confluent.io/blog/apache-kafka-producer-improvements-sticky-partitioner/
This means Kafka producers don't immediately send records; they keep a batch of records for a specific topic with no keys and no assigned partition, and keep sending to the same partition until the batch is ready to be sent. When a new batch is created, a new partition is chosen.
Effectively, the partitioner assigns records to the same partition until the batch is sent, based on batch.size and linger.ms; once that batch is sent, a new partition is used. Thus messages may not necessarily be evenly distributed.
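For reference, a small sketch of the two producer settings that decide when such a batch is "ready" (the values here are arbitrary examples, not recommendations):

props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384); // max batch size in bytes
props.put(ProducerConfig.LINGER_MS_CONFIG, 100);    // max wait before sending a partial batch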
Further Reading:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner#KIP794:StrictlyUniformStickyPartitioner-UniformStickyBatchSize
https://www.confluent.io/blog/5-things-every-kafka-developer-should-know/#tip-2-new-sticky-partitioner
https://aiven.io/blog/balance-data-across-kafka-partitions#challenge-of-uneven-record-distribution
I think this answers your question best:
https://rajatjain-ix.medium.com/whats-wrong-with-kafka-b53d0549677a
So, there are two solutions available:
1. You don't specify any partition key. In this case, the DefaultPartitioner will automatically round-robin the messages across the partitions.
2. You use an (incremental uuid) % (count of partitions) as the partition number in the Producer API. This way you are manually telling it to round-robin the messages to partitions, as sketched below.
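A minimal sketch of option 2 (topic name "Test" is from the question; the counter is an assumed field):

AtomicLong counter = new AtomicLong();
int numPartitions = producer.partitionsFor("Test").size();
int partition = (int) (counter.getAndIncrement() % numPartitions);
// Passing an explicit partition bypasses the partitioner entirely
producer.send(new ProducerRecord<>("Test", partition, key, value));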
Ronak explained very precisely.
You can achieve even distribution of messages over partitions, regardless of the key, by implementing the Partitioner interface.
New sticky version
import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.clients.producer.internals.StickyPartitionCache;
import org.apache.kafka.common.Cluster;

public class SimplePartitioner implements Partitioner {

    private final StickyPartitionCache stickyPartitionCache = new StickyPartitionCache();

    @Override
    public void configure(Map<String, ?> configs) {
    }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        return partition(topic, key, keyBytes, value, valueBytes, cluster, cluster.partitionsForTopic(topic).size());
    }

    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster,
                         int numPartitions) {
        // Ignore the key and let the sticky cache choose (and stick to) a partition
        return stickyPartitionCache.partition(topic, cluster);
    }

    @Override
    public void close() {
    }
}
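To use it, register the class in your producer properties (standard config key):

props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, SimplePartitioner.class.getName());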
Old version - see this link: https://github.com/sharefeel/kafka-simple-partitioner/blob/0.8.2/SimplePartitioner.java
Don't forget this: the target partitions chosen by SimplePartitioner and DefaultPartitioner are not the same, though they normally coincide.
If a key is given, DefaultPartitioner will return one number from 0 to numPartitions-1, but SimplePartitioner always returns the value from stickyPartitionCache.partition().
If there's an unavailable partition (all replicas of that partition down), producing will fail with DefaultPartitioner, but SimplePartitioner can make producing succeed.
I tested this with the old version of SimplePartitioner but not with the newer one.
My kafka sink connector reads from multiple topics (configured with 10 tasks) and processes upwards of 300 records from all topics. Based on the information held in each record, the connector may perform certain operations.
Here is an example of the key:value pair in a trigger record:
"REPROCESS":"my-topic-1"
Upon reading this record, I would then need to reset the offsets of the topic 'my-topic-1' to 0 in each of its partitions.
I have read in many places that the recommended way is to create a new KafkaConsumer, subscribe to the topic, and reset the offsets in a rebalance listener. For example,
import java.util.Arrays;
import java.util.Collection;

import com.fasterxml.jackson.databind.JsonNode;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class MyTask extends SinkTask {
    // start(), stop(), version() etc. omitted for brevity

    @Override
    public void put(Collection<SinkRecord> records) {
        records.forEach(record -> {
            if (record.key().toString().equals("REPROCESS")) {
                reprocessTopicRecords(record);
            } else {
                // do something else
            }
        });
    }

    private void reprocessTopicRecords(SinkRecord record) {
        // reprocessorProps and deserializer are assumed to be defined elsewhere in the task
        KafkaConsumer<JsonNode, JsonNode> reprocessorConsumer =
                new KafkaConsumer<>(reprocessorProps, deserializer, deserializer);
        // record.value() holds the topic to reset, e.g. "my-topic-1"
        reprocessorConsumer.subscribe(Arrays.asList(record.value().toString()),
                new ConsumerRebalanceListener() {
                    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {}

                    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                        // do offset reset here
                    }
                }
        );
    }
}
However, the above strategy does not work for my case because:
1. It depends on a group rebalance taking place (does not always happen)
2. The 'partitions' passed to the onPartitionsAssigned method are dynamically assigned partitions, meaning these are only a subset of the full set of partitions that need their offsets reset. For example, this SinkTask will be assigned only 2 of the 8 partitions that hold the records for 'my-topic-1'.
I've also looked into using assign() but this is not compatible with the distributed consumer model (consumer groups) in the SinkConnector/SinkTask implementation.
I am aware that the kafka command line tool kafka-consumer-groups can do exactly what I want (I think):
https://gist.github.com/marwei/cd40657c481f94ebe273ecc16601674b
To summarize, I want to reset the offsets of all partitions for a given topic using Java APIs and let the Sink Connector pick up the offset changes and continue to do what it has been doing (processing records).
Thanks in advance.
I was able to achieve resetting offsets for a kafka connect consumer group by using a series of Confluent's kafka-rest-proxy APIs: https://docs.confluent.io/current/kafka-rest/api.html
This implementation no longer requires the 'trigger record' approach first described in the original post and is purely REST API based.
1. Temporarily delete the kafka connector (this deletes the connector's consumers and )
2. Create a consumer instance for the same consumer group ("connect-")
3. Have the instance subscribe to the requested topic you want to reset
4. Do a dummy poll ('subscribe' is evaluated lazily)
5. Reset the consumer group topic offsets for the specified topic
6. Do a dummy poll ('seek' is evaluated lazily), then commit the current offset state (in the proxy) for the consumer
7. Re-create the kafka connector (with the same connector name) - after re-balancing, consumers will join the group and read the last committed offset (starting from 0)
8. Delete the temporary consumer instance
If you are able to use the CLI, Steps 2-6 can be replaced with:
kafka-consumer-groups --bootstrap-server <kafkahost:port> --group <group_id> --topic <topic_name> --reset-offsets --to-earliest --execute
As for those of you trying to do this in the kafka connector code through native Java APIs, you're out of luck :-(
You're looking for the seek method. Either to an offset
consumer.seek(new TopicPartition("topic-name", partition), offset);
Or seekToBeginning
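A minimal sketch of both calls, assuming consumer, partition and offset already exist (seekToBeginning takes a collection of partitions):

TopicPartition tp = new TopicPartition("topic-name", partition);
consumer.seek(tp, offset);                            // jump to a specific offset
consumer.seekToBeginning(Collections.singleton(tp)); // or rewind to the start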
However, I feel like you'd be competing with the Connect Sink API's consumer group. In other words, assuming you set up the consumer with a separate group id, you'd essentially be consuming records twice from the source topic: once by Connect, and once by your own consumer instance.
Unless you explicitly seek Connect's own consumer instance as well (which is not exposed), you'd get into a weird state. For example, your task would only execute on new records in the topic even though your own consumer is looking at an old offset, or you'd keep receiving newer events while still processing old ones.
Also, you might eventually get a reprocess event at the very beginning of the topic (due to retention policies expiring old records, for example), causing your consumer not to progress at all and to constantly rebalance its group by seeking to the beginning.
We had to do a very similar offset resetting exercise.
KafkaConsumer.seek() combined with KafkaConsumer.commitSync() worked well.
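A minimal sketch of that combination (topic, partition and the consumer instance are placeholder assumptions):

TopicPartition tp = new TopicPartition("my-topic-1", 0);
consumer.seek(tp, 0L);
// Commit the rewound position so the group durably restarts from offset 0
consumer.commitSync(Map.of(tp, new OffsetAndMetadata(0L)));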
There is another option that is worth mentioning, if you are dealing with lots of topics and partitions (javadoc):
AdminClient.alterConsumerGroupOffsets(
String groupId,
Map<TopicPartition,OffsetAndMetadata> offsets
)
We were lucky because we had the luxury to stop the Kafka Connect instance for a while, so there's no consumer group competing.
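A minimal sketch of that call (group id, topic and partition count are placeholder assumptions; checked exceptions omitted):

try (Admin admin = Admin.create(Map.of(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
    Map<TopicPartition, OffsetAndMetadata> offsets = Map.of(
            new TopicPartition("my-topic-1", 0), new OffsetAndMetadata(0L),
            new TopicPartition("my-topic-1", 1), new OffsetAndMetadata(0L));
    admin.alterConsumerGroupOffsets("connect-my-sink", offsets).all().get();
}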
I am building a pretty straightforward KafkaStreams demo application, to test a use case.
I am not able to upgrade the Kafka broker I am using (which is currently on version 0.10.0), and there are several messages written by a pre-0.10.0 Producer, so I am using a custom TimestampExtractor, which I add as a default to the config in the beginning of my main class:
config.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, GenericRecordTimestampExtractor.class);
When consuming from my source topic, this works perfectly fine. But when using an aggregation operator, I run into an exception because the FailOnInvalidTimestamp implementation of TimestampExtractor is used instead of the custom implementation when consuming from the internal aggregation topic.
The code of the Streams app looks something like this:
...
KStream<String, MyValueClass> clickStream = streamsBuilder
        .stream("mytopic", Consumed.with(Serdes.String(), valueClassSerde));

KTable<Windowed<Long>, Long> clicksByCustomerId = clickStream
        .map((key, value) -> new KeyValue<>(value.getId(), value))
        .groupByKey(Serialized.with(Serdes.Long(), valueClassSerde))
        .windowedBy(TimeWindows.of(TimeUnit.MINUTES.toMillis(1)))
        .count();
...
The Exception I'm encountering is the following:
Exception in thread "click-aggregator-b9d77f2e-0263-4fa3-bec4-e48d4d6602ab-StreamThread-1" org.apache.kafka.streams.errors.StreamsException:
Input record ConsumerRecord(topic = click-aggregator-KSTREAM-AGGREGATE-STATE-STORE-0000000002-repartition, partition = 9, offset = 0, CreateTime = -1, serialized key size = 8, serialized value size = 652, headers = RecordHeaders(headers = [], isReadOnly = false), key = 11230, value = org.example.MyValueClass@2a3f2ea2) has invalid (negative) timestamp.
Possibly because a pre-0.10 producer client was used to write this record to Kafka without embedding a timestamp, or because the input topic was created before upgrading the Kafka cluster to 0.10+. Use a different TimestampExtractor to process this data.
Now the question is: Is there any way I can make Kafka Streams use the custom TimestampExtractor when reading from the internal aggregation topic (optimally while still using the Streams DSL)?
You cannot change the timestamp extractor (as of v1.0.0). This is not allowed for correctness reasons.
But I am really wondering how a record with timestamp -1 was written into this topic in the first place. Kafka Streams uses the timestamp provided by your custom extractor when writing the record. Also note that KafkaProducer does not allow writing records with a negative timestamp.
Thus, the only explanation I can think of is that some other producer wrote into the repartition topic, and this is not allowed: only Kafka Streams should write into the repartition topic.
I guess you will need to delete this topic and let Kafka Streams recreate it to get back into a clean state.
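If it helps, Kafka ships an application reset tool that deletes a Streams app's internal (repartition and changelog) topics for you; a sketch with a placeholder application id (note it also resets the input topic offsets, so check the flags for your version):

kafka-streams-application-reset --bootstrap-servers localhost:9092 --application-id click-aggregator --input-topics mytopic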
From the discussion/comment of the other answer:
You need 0.10+ format to work with Kafka Streams. If you upgrade your brokers and keep 0.9 format or older, Kafka Streams might not work as expected.
It is a well-known issue :-). I have the same problem with old clients in projects which are still using older Kafka clients like 0.9, and also when communicating with some "not certified" .NET clients.
Therefore I wrote a dedicated class:
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class MyTimestampExtractor implements TimestampExtractor {

    private static final Logger LOG = LogManager.getLogger(MyTimestampExtractor.class);

    @Override
    public long extract(ConsumerRecord<Object, Object> consumerRecord, long previousTimestamp) {
        final long timestamp = consumerRecord.timestamp();
        if (timestamp < 0) {
            // No valid embedded timestamp (e.g. written by a pre-0.10 producer):
            // patch with wall-clock time so processing can continue
            final String msg = consumerRecord.toString().trim();
            LOG.warn("Record has wrong Kafka timestamp: {}. It will be patched with local timestamp. Details: {}", timestamp, msg);
            return System.currentTimeMillis();
        }
        return timestamp;
    }
}
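To wire it in, register it the same way the question's config does for its extractor:

config.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, MyTimestampExtractor.class);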
When there are many messages you may want to skip the logging, as it can flood the log.
After reading Matthias' answer I double checked everything and the cause of the issue were incompatible versions between the Kafka Broker and the Kafka Streams app. I was stupid enough to use Kafka Streams 1.0.0 with a 0.10.1.1 Broker, which is clearly stated as incompatible in the Kafka Wiki here.
Edit (thx to Matthias): The actual cause of the problem was the fact that the log format used by our 0.10.1.x broker was still 0.9.0.x, which is incompatible with Kafka Streams.
We are using Kafka to store messages that are produced by a node in our cluster and to be distributed to all nodes in the cluster, and I have it mostly working with akka-streams, but there are a couple of questions I have to tie this up.
First of all, the message has to be consumed by every node in the cluster but produced by only one node. I understand I can assign each node a group id that is probably its node ID, which means each node will get the message. That's sorted. But here are the questions.
The data is extremely transient and fairly large (just under a meg) and cannot be compressed further or broken up. If there is a new message on the topic the old one is pretty much trash. How can I limit the topic to basically just one message currently maximum?
Given that the data is necessary for the node to start, I need to consume the latest message on the topic no matter whether I have consumed it before, and hopefully without creating a unique group id every time I start the server. Is this possible, and if so, how can it be done?
Finally, the data is usually on the topic but on occasion it is not there and I, ideally, need to be able to check if there is a message there and if not ask the producer to create the message. Is this possible?
This is the code I am currently using to start the consumer:
private Control startMatrixConsumer() {
    final ConsumerSettings<Long, byte[]> matrixConsumerSettings = ConsumerSettings
            .create(services.actorSystem(), new LongDeserializer(), new ByteArrayDeserializer())
            .withBootstrapServers(services.config().getString("kafka.bootstrapServers"))
            .withGroupId("group1") // todo put in the conf ??
            .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
    final String topicName = Matrix.class.getSimpleName() + '-' + eventId;
    final AutoSubscription subscription = Subscriptions.topics(topicName);
    return Consumer.plainSource(matrixConsumerSettings, subscription)
            .named(Matrix.class.getSimpleName() + "-Kafka-Consumer-" + eventId)
            .map(data -> {
                final Matrix matrix = services.kryoDeserialize(data.value(), Matrix.class);
                log.debug(format("Received %s for event %d from Kafka", Matrix.class.getSimpleName(), matrix.getEventId()));
                return matrix;
            })
            .filter(Objects::nonNull)
            .to(Sink.actorRef(getSelf(), NotUsed.getInstance()))
            .run(ActorMaterializer.create(getContext()));
}
Thanks a bunch.
All the messages have to be consumed by every node in the cluster but produced by only one.
You are correct, you can achieve this by having a unique group id per node.
How can I limit the topic to basically just one message currently maximum?
Kafka provides compacted topics.
A compacted topic maintains only the most recent message for a given key. For instance, Kafka consumers store their offsets in a compacted topic.
In your case, produce every message with the same key, and the Kafka Log Cleaner will delete old messages. Please be aware that compaction is performed periodically, so you can end up with two (or more) messages with the same key for a short period of time (depends on your Log Cleaner configuration).
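A minimal sketch of creating such a compacted topic with the AdminClient (topic name, partition and replica counts are placeholder assumptions; admin is an existing Admin instance, checked exceptions omitted):

NewTopic matrixTopic = new NewTopic("matrix-topic", 1, (short) 3)
        .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT));
admin.createTopics(List.of(matrixTopic)).all().get();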
I need to consume the latest message on the topic no matter whether I have consumed it before.
You can achieve this by not committing the consumer offset (enable.auto.commit set to false) and setting auto.offset.reset to earliest. With one message in your compacted topic and a consumer that starts from the beginning of the topic, that message is always consumed after the node starts.
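In plain Java client terms, the two properties look like this (the akka ConsumerSettings in the question would carry the same values via withProperty):

props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");   // never commit, so...
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // ...every start reads from the beginning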
I need to be able to check if there is a message there and if not ask the producer to create the message.
Unfortunately, I am not aware of any Kafka functionality that could help you with that. Most of the time Kafka is used to decouple producers and consumers.
I am new to Kafka, so apologies if I sound stupid, but what I understood so far is: a stream of messages can be defined as a topic, like a category. And every topic is divided into one or more partitions (each partition can have multiple replicas), so they act in parallel.
From the Kafka main site they say
The producer is able to choose which message to assign to which partition within the topic.
This can be done in a round-robin fashion simply to balance load or it can be done according to some semantic partition function (say based on some key in the message).
Does this mean that while consuming I will be able to choose the message offset from a particular partition?
While running multiple partitions, is it possible to choose from one specific partition, i.e. partition 0?
In Kafka 0.7 quick start they say
Send a message with a partition key. Messages with the same key are sent to the same partition.
And the key can be provided while creating the producer as below
ProducerData<String, String> data = new ProducerData<String, String>("test-topic", "test-key", "test-message");
producer.send(data);
Now how do I consume messages based on this key? What is the actual impact of using this key while producing in Kafka?
While creating a producer in 0.8beta we can provide the partitioner class attribute through the config file.
A custom partitioner class can be created by implementing the Kafka Partitioner interface.
But I'm a little confused about how exactly it works. The 0.8 doc also does not explain much. Any advice, or am I missing something?
This is what I've found so far:
Define your own custom partitioner class by implementing the Kafka Partitioner interface. The implemented method has two arguments: first, the key that we provide from the producer, and second, the number of partitions available. So we can define our own logic to decide which key of a message goes to which partition.
Now while creating the producer we can specify our own partitioner class using the "partitioner.class" attribute
props.put("partitioner.class", "path.to.custom.partitioner.class");
If we don't mention it, Kafka will use its default class and try to distribute messages evenly among the available partitions.
Also inform Kafka how to serialize the key
props.put("key.serializer.class", "kafka.serializer.StringEncoder");
Now if we send a message using a key in the producer, the message will be delivered to a specific partition (based on the logic written in the custom partitioner class), and at the consumer (SimpleConsumer) level we can specify the partition to retrieve those specific messages.
In case we need to pass a String as a key, the same should be handled in the custom partitioner class (e.g. take the hash value of the key and then take the first two digits), as in the sketch below.
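As a rough sketch of such a class, matching the two-argument partition method described above (class name and hashing scheme are just an illustration):

public class StringKeyPartitioner implements Partitioner {
    public int partition(Object key, int numPartitions) {
        // Mask the sign bit so the result is non-negative, then map onto partitions
        return (key.toString().hashCode() & 0x7fffffff) % numPartitions;
    }
}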
Each topic in Kafka is split into many partitions. Partitions allow for parallel consumption, increasing throughput.
The producer publishes a message to a topic using the Kafka producer client library, which balances the messages across the available partitions using a Partitioner. The broker the producer connects to takes care of sending the message to the broker that is the leader of that partition, using the partition owner information in ZooKeeper. Consumers use Kafka's high-level consumer library (which handles broker leader changes, managing offset info in ZooKeeper, and figuring out partition owner info etc. implicitly) to consume messages from partitions in streams; each stream may be mapped to a few partitions depending on how the consumer chooses to create the message streams.
For example, if there are 10 partitions for a topic and 3 consumer instances (C1,C2,C3 started in that order) all belonging to the same Consumer Group, we can have different consumption models that allow read parallelism as below
Each consumer uses a single stream.
In this model, when C1 starts, all 10 partitions of the topic are mapped to the same stream and C1 starts consuming from that stream. When C2 starts, Kafka rebalances the partitions between the two streams. So, each stream will be assigned to 5 partitions (depending on the rebalance algorithm it might also be 4 vs 6) and each consumer consumes from its stream. Similarly, when C3 starts, the partitions are again rebalanced between the 3 streams. Note that in this model, when consuming from a stream assigned to more than one partition, the order of messages will be jumbled between partitions.
Each consumer uses more than one stream (say C1 uses 3, C2 uses 3 and C3 uses 4).
In this model, when C1 starts, all the 10 partitions are assigned to the 3 streams and C1 can consume from the 3 streams concurrently using multiple threads. When C2 starts, the partitions are rebalanced between the 6 streams and similarly when C3 starts, the partitions are rebalanced between the 10 streams. Each consumer can consume concurrently from multiple streams. Note that the number of streams and partitions here are equal. In case the number of streams exceed the partitions, some streams will not get any messages as they will not be assigned any partitions.
Does this mean while consuming I will be able to choose the message offset from particular partition? While running multiple partitions is it possible to choose from one specific partition i.e partition 0?
Yes, you can choose messages from one specific partition in your consumer, but if you want the partition to be identified dynamically, it depends on the logic of how you have implemented the Partitioner class in your producer.
Now how do I consume message based on this key? what is the actual impact of using this key while producing in Kafka ?
There are two ways of consuming messages. One is using a Zookeeper host and the other is a static host. A Zookeeper host consumes messages from all partitions. However, if you are using a static host, you can provide the broker with the partition number that needs to be consumed.
Please check below example of Kafka 0.8
Producer
KeyedMessage<String, String> data = new KeyedMessage<String, String>(<<topicName>>, <<KeyForPartition>>, <<Message>>);
Partition Class
public int partition(Object arg0, int arg1) {
    // arg0 is the key given while producing, arg1 is the number of
    // partitions the broker has
    long organizationId = Long.parseLong((String) arg0);
    // If the given key exceeds the number of partitions available, send the
    // message to the last partition; otherwise use the key as the partition
    if (arg1 < organizationId) {
        return (arg1 - 1);
    }
    // return (int) (organizationId % arg1);
    return Integer.parseInt((String) arg0);
}
So the partitioner class decides where to send a message based on your logic.
Consumer (Note: I have used the Storm Kafka 0.8 integration)
HostPort hosts = new HostPort("10.**.**.***",9092);
GlobalPartitionInformation gpi = new GlobalPartitionInformation();
gpi.addPartition(0, hosts);
gpi.addPartition(2, hosts);
StaticHosts statHost = new StaticHosts(gpi);
SpoutConfig spoutConf = new SpoutConfig(statHost, <<topicName>>, "/kafkastorm", <<spoutConfigId>>);