Can we update a consumer offset in Kafka 0.10? - apache-kafka

I am using an older version of Kafka, 0.10. Is there any way I can update the consumer offset for a topic to an arbitrary number?

In Kafka 0.10, I don't think there was a tool to easily update a consumer offset.
You basically have 2 options:
1. Use the tools from a more recent Kafka version. Nowadays, consumer offsets can be updated using either the kafka-consumer-groups.sh tool or the AdminClient (only in trunk at the moment; it will be in Kafka 2.5).
2. Write a small application that starts a consumer and calls commitSync() to update its consumer offsets, like in ConsumerGroupCommand.resetOffsets(). A sketch of this follows.
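For illustration, here is a minimal sketch of option 2; the broker address, topic, partition, group name, and target offset are all placeholders:

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class OffsetResetter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "my-group");                // the group whose offset you want to move
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            // Manually assign the partition; no subscribe/rebalance needed for this.
            TopicPartition tp = new TopicPartition("my-topic", 0); // placeholder topic/partition
            consumer.assign(Collections.singletonList(tp));
            // Commit an arbitrary offset (here 42) for this group and partition.
            consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(42L)));
        }
    }
}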

Related

Can the new Flink Kafka consumer (KafkaSource) start from the old FlinkKafkaConsumer's savepoint/checkpoint?

I have a job which is running with the old Flink Kafka consumer (FlinkKafkaConsumer). Now I want to migrate it to KafkaSource, but I am not sure what the impact of this migration will be. I want my job to start from the latest successful checkpoint taken by the old FlinkKafkaConsumer. Is that possible? If it is not possible, what would be the right way for me to migrate the Kafka consumer?
Assuming the same configuration, the two should be usable interchangeably as long as the group id you configure matches the one used by your earlier implementation. You can use this in conjunction with OffsetsInitializer.committedOffsets() to ensure that you continue reading from the offsets that were previously committed:
KafkaSource.<YourExampleClass>builder()
    ...
    .setGroupId("your-previous-group-id")
    .setStartingOffsets(OffsetsInitializer.committedOffsets())
    .build();
While the two should just work, it's worth noting that your specific pipeline, and how it uses parallelism, could surface some of the differences between FlinkKafkaConsumer and the newer KafkaSource:
the KafkaSource behaves differently than FlinkKafkaConsumer in the case where the number of Kafka partitions is smaller than the parallelism of Flink's Kafka Source operator.
How to upgrade from FlinkKafkaConsumer to KafkaSource is covered in the release notes for Flink 1.14, where the FlinkKafkaConsumer was deprecated. You can find it at https://nightlies.apache.org/flink/flink-docs-release-1.14/release-notes/flink-1.14/#deprecate-flinkkafkaconsumer

Apache Kafka why producer is connected by broker, but consumer is connected to zookeeper?

Does the old version have the consumer connect to ZooKeeper, while the new version connects to the broker? Someone in a community replied that in the old version a topic's offsets are stored in ZooKeeper, while in the new version they are stored in Kafka itself. Is that the explanation?
Older versions of Kafka (before 0.9) store offsets in ZooKeeper.
Newer versions of Kafka store offsets in an internal Kafka topic called __consumer_offsets.
Newer versions still offer the option to store offsets in ZooKeeper.
With this, consumers only need to talk to the brokers and no longer rely on ZooKeeper.
If many consumers read from Kafka simultaneously, the read/write load on ZooKeeper may exceed its capacity, making ZooKeeper a bottleneck.
Check this for more information: https://github.com/SOHU-Co/kafka-node/issues/502
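To make the connection difference concrete, here is a sketch of the two configuration styles; the addresses and group name are placeholders:

import java.util.Properties;

public class ConsumerConfigs {
    public static void main(String[] args) {
        // Old high-level consumer (pre-0.9): bootstrapped via ZooKeeper,
        // which also receives the committed offsets.
        Properties oldStyle = new Properties();
        oldStyle.put("zookeeper.connect", "localhost:2181");
        oldStyle.put("group.id", "my-group");

        // New consumer (0.9+): talks only to the brokers;
        // offsets go to the internal __consumer_offsets topic.
        Properties newStyle = new Properties();
        newStyle.put("bootstrap.servers", "localhost:9092");
        newStyle.put("group.id", "my-group");
    }
}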

Will there be any data loss while upgrading the Kafka client from 0.8.0 to 0.10.0.1?

We are planning to upgrade our Kafka client from 0.8.0 to 0.10.0.1. Since consumer offsets are stored in ZooKeeper in 0.8.0 but in the broker in 0.10.0.1, if we start a consumer in 0.10.0.1 with the same group and client id as in 0.8.0, will the new consumer fetch messages from where the old consumer stopped consuming? If data loss is going to happen, can we try migrating the offsets from ZooKeeper to the broker and then start our new consumer?
You can continue storing offsets in ZooKeeper on 0.10. In fact, if you just upgrade the client binaries, you won't see any change in the offset commit behavior. Where you will have to start thinking about migrating data and offsets is when you move to the new consumer API in your application. At that point you will need to stop your old application instance based on the old API, check the offsets stored in ZooKeeper, and then start the new consumer API implementation from those offsets to avoid data loss or duplication (sketched below).
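A rough sketch of that hand-off, assuming the 0.8 high-level consumer's ZooKeeper path layout (/consumers/<group>/offsets/<topic>/<partition>) and placeholder host, group, and topic names:

import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.zookeeper.ZooKeeper;

public class ZkOffsetMigrator {
    public static void main(String[] args) throws Exception {
        // 1) Read the old consumer's committed offset from ZooKeeper.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, event -> { });
        byte[] data = zk.getData("/consumers/my-group/offsets/my-topic/0", false, null);
        long zkOffset = Long.parseLong(new String(data, StandardCharsets.UTF_8));
        zk.close();

        // 2) Commit that offset for the same group with the new consumer,
        //    so it resumes where the old consumer stopped.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);
            consumer.assign(Collections.singletonList(tp));
            consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(zkOffset)));
        }
    }
}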

Increase number of partitions in a Kafka topic from a Kafka client

I'm a new user of Apache Kafka and I'm still getting to know the internals.
In my use case, I need to increase the number of partitions of a topic dynamically from the Kafka Producer client.
I found other similar questions regarding increasing the number of partitions, but they utilize the ZooKeeper configuration. My kafkaProducer, however, has only the Kafka broker config, not the zookeeper config.
Is there any way I can increase the number of partitions of a topic from the Producer side? I'm running Kafka version 0.10.0.0.
As of Kafka 0.10.0.1 (the latest release at the time of writing): as Manav said, it is not possible to increase the number of partitions from the Producer client.
Looking ahead (next releases): in an upcoming version of Kafka, clients will be able to perform some topic management actions, as outlined in KIP-4. A lot of the KIP-4 functionality is already completed and available in Kafka's trunk; the code in trunk as of today allows clients to create and delete topics. But unfortunately, for your use case, increasing the number of partitions is not possible yet -- it is in scope for KIP-4 (see Alter Topics Request) but is not completed yet.
TL;DR: The next versions of Kafka will allow you to increase the number of partitions of a Kafka topic, but this functionality is not yet available.
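For readers on later versions: once KIP-4's alter-topics work landed, increasing partitions from a client looked roughly like the following AdminClient sketch (this API arrived in later releases, not 0.10; the broker address and topic name are placeholders):

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

public class PartitionIncreaser {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Grow "my-topic" to 4 partitions; the partition count can only increase.
            admin.createPartitions(
                    Collections.singletonMap("my-topic", NewPartitions.increaseTo(4))
            ).all().get();
        }
    }
}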
It is not possible to increase the number of partitions from the Producer client.
Is there a specific reason why you cannot use the broker to achieve this?
But my kafkaProducer has only the Kafka broker config, but not the zookeeper config.
I don't think any client will let you change the broker config. At most, you can read the server-side config.
Your producer can provide different keys for its ProducerRecords, and the producer's partitioner will spread records with different keys across the partitions the topic already has. For example, with two partitions, keys "abc" and "xyz" will typically land in separate partitions.
This can be done in version 0.9 as well.
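As a sketch of the keyed-producer approach (broker, topic, keys, and payloads are placeholders; the default partitioner hashes the key to pick among the topic's existing partitions):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The same key always maps to the same partition; different keys
            // are typically spread across the partitions the topic already has.
            producer.send(new ProducerRecord<>("my-topic", "abc", "payload-1"));
            producer.send(new ProducerRecord<>("my-topic", "xyz", "payload-2"));
        }
    }
}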

In Storm, how to migrate offsets to store in Kafka?

I've been having all sorts of instabilities related to Kafka and offsets: things like workers crashing on startup with exceptions related to invalid offsets, and other things I don't understand.
I read that it is recommended to migrate offsets to be stored in Kafka instead of ZooKeeper. I found the following in the Kafka documentation:
Migrating offsets from ZooKeeper to Kafka: Kafka consumers in earlier releases store their offsets by default in ZooKeeper. It is possible to migrate these consumers to commit offsets into Kafka by following these steps:
1. Set offsets.storage=kafka and dual.commit.enabled=true in your consumer config.
2. Do a rolling bounce of your consumers and then verify that your consumers are healthy.
3. Set dual.commit.enabled=false in your consumer config.
4. Do a rolling bounce of your consumers and then verify that your consumers are healthy.
A roll-back (i.e., migrating from Kafka back to ZooKeeper) can also be performed using the above steps if you set offsets.storage=zookeeper.
http://kafka.apache.org/documentation.html#offsetmigration
But, again, I don't understand what this is instructing me to do. I don't see anywhere in my topology config where I configure where offsets are stored. Is it buried in the cluster yaml?
Any advice on whether storing offsets in Kafka, rather than ZooKeeper, is a good idea? And how can I perform this change?
At the time of this writing, Storm's Kafka spout (see the documentation/README at https://github.com/apache/storm/tree/master/external/storm-kafka) only supports managing consumer offsets in ZooKeeper. That is, all current Storm versions (up to 0.9.x and including 0.10.0 Beta) still rely on ZooKeeper for storing such offsets. Hence you should not perform the ZK-to-Kafka offset migration referenced above, because Storm isn't compatible with it yet.
You will need to wait until the Storm project -- specifically, its Kafka spout -- supports managing consumer offsets via Kafka (instead of ZooKeeper). And yes, in general it is better to store consumer offsets in Kafka rather than ZooKeeper, but alas Storm isn't there yet.
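For plain old-API Kafka consumers (as opposed to Storm's spout), step 1 of the quoted migration corresponds to consumer properties roughly like the following sketch; the address and group name are placeholders:

import java.util.Properties;

public class MigrationStep1Config {
    public static Properties step1() {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "my-group");
        props.put("offsets.storage", "kafka");     // start committing offsets to Kafka...
        props.put("dual.commit.enabled", "true");  // ...while still dual-committing to ZooKeeper
        return props;
    }
}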
Update November 2016:
The situation in Storm has improved in the meantime. There's now a new, second Kafka spout that is based on Kafka's new 0.10 consumer client, which stores consumer offsets in Kafka (and not in ZooKeeper): https://github.com/apache/storm/tree/master/external/storm-kafka-client.
However, at the time I am writing this, there are still several issues being reported by users on the storm-user mailing list (such as "Urgent help! kafka-spout stops fetching data after running for a while"), so I'd use this new Kafka spout with care, and only after thorough testing.
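If you do experiment with the new spout, wiring it into a topology looks roughly like the following sketch (based on the storm-kafka-client README; broker, topic, and component names are placeholders, and the builder API may differ between Storm versions):

import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.topology.TopologyBuilder;

public class KafkaSpoutTopology {
    public static void main(String[] args) {
        // Backed by Kafka's new 0.10 consumer: offsets are committed to Kafka, not ZooKeeper.
        KafkaSpoutConfig<String, String> spoutConfig =
                KafkaSpoutConfig.builder("localhost:9092", "my-topic").build();

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout<>(spoutConfig), 1);
        // ... attach bolts and submit the topology as usual
    }
}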