How to move a topic from one broker to another broker in kafka? - apache-kafka

I first tried to see if I can create a topic on a particular broker, but it looks like this is not possible, even if I list that broker's host in bootstrap.servers:
admin_client = AdminClient({
"bootstrap.servers": "xxx1.com:9092,xxx2.com:9092"
})
futmap=admin_client.create_topics(topic_list)
The client seems to pick one of my 5 brokers arbitrarily as the leader for the topic, and I am trying to understand why.
I am also trying to find out whether I can reassign the topic leader to another broker. I know this may be possible with the kafka-reassign-partitions command-line script, but I want to do it programmatically from Python using the confluent-kafka package. Is that possible? I did not find a partition-reassignment method in the AdminClient class of confluent-kafka.
Thanks
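A quick way to see where Kafka actually put each partition is to read the topic metadata back. When only a replication factor is given, Kafka spreads replicas across brokers round-robin from a randomly chosen starting broker, and the first replica of each partition is its preferred leader, which is why the leader looks arbitrary. The sketch below assumes the `ClusterMetadata` object returned by confluent-kafka's `AdminClient.list_topics()`, whose partition entries expose `leader`, `replicas` and `isrs`; `partition_leaders` is a hypothetical helper name.

```python
from typing import Any, Dict


def partition_leaders(cluster_metadata: Any, topic: str) -> Dict[int, Dict[str, Any]]:
    """Summarise leader, replicas and in-sync replicas for each partition.

    Assumes confluent-kafka's ClusterMetadata shape: .topics maps a topic name
    to TopicMetadata, whose .partitions maps partition id to PartitionMetadata.
    """
    topic_md = cluster_metadata.topics[topic]
    summary = {}
    for pid, pmd in sorted(topic_md.partitions.items()):
        replicas = list(pmd.replicas)
        summary[pid] = {
            "leader": pmd.leader,  # broker currently leading this partition
            "replicas": replicas,
            "isrs": list(pmd.isrs),
            # The first replica in the assignment is the preferred leader.
            "preferred_leader": replicas[0] if replicas else None,
        }
    return summary
```

Usage, against a live cluster: `md = admin_client.list_topics(topic="rajib1_test_xxx_topic", timeout=10)` followed by `print(partition_leaders(md, "rajib1_test_xxx_topic"))`.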

I have finally found a solution. The documentation of the confluent-kafka Python package is not adequate here, but one good thing about open source is that you can read the code and figure it out. To create the topic on particular brokers, I wrote the create-topic code as below. Note that I used replica_assignment instead of replication_factor; the two are mutually exclusive. If you pass replication_factor, Kafka decides where the partitions go; with replica_assignment you control the placement yourself (one list of broker IDs per partition, where the first broker in each list is the preferred leader). Keep in mind that this placement can still change later if the partitions are explicitly reassigned, for example with kafka-reassign-partitions. Consumer-group rebalancing, and the on_revoke callback that goes with it, only changes which consumer reads a partition, not which broker hosts it. For now, this works for me.
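from confluent_kafka.admin import AdminClient, NewTopic

def createTopic(admin_client, topics):
    # topics: list of topic names, e.g. ['rajib1_test_xxx_topic']
    # One inner list per partition with the broker IDs holding its replicas;
    # the first ID (262) is the preferred leader. Mutually exclusive with replication_factor.
    replica_assignment = [[262, 261]]
    topic_list = [NewTopic(topic, num_partitions=1, replica_assignment=replica_assignment)
                  for topic in topics]
    futmap = admin_client.create_topics(topic_list)
    # Wait for each operation to finish.
    for topic, f in futmap.items():
        try:
            f.result()  # The result itself is None
            print("Topic {} created".format(topic))
        except Exception as e:
            print("Failed to create topic {}: {}".format(topic, e))
    return futmap

# Example call:
# createTopic(AdminClient({"bootstrap.servers": "xxx1.com:9092"}), ['rajib1_test_xxx_topic'])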
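For more than one partition, replica_assignment needs one inner list per partition. A hypothetical helper like the one below builds such a list by rotating over the chosen broker IDs, so leadership (the first ID in each inner list) is spread across them. This is a sketch of one possible placement policy, not Kafka's own assignment algorithm.

```python
from typing import List


def make_replica_assignment(broker_ids: List[int], num_partitions: int,
                            replication_factor: int) -> List[List[int]]:
    """Build a replica_assignment: one inner list of broker IDs per partition.

    Partition p gets brokers starting at broker_ids[p % len(broker_ids)] and
    continuing round-robin; the first ID of each inner list is the preferred leader.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if replication_factor > len(broker_ids):
        raise ValueError("replication_factor cannot exceed the number of brokers")
    n = len(broker_ids)
    return [
        [broker_ids[(p + r) % n] for r in range(replication_factor)]
        for p in range(num_partitions)
    ]
```

For example, `NewTopic("my_topic", num_partitions=3, replica_assignment=make_replica_assignment([261, 262, 263], 3, 2))` would place partition 0 on 261 and 262, partition 1 on 262 and 263, and partition 2 on 263 and 261.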

You could also use the kafka-reassign-partitions.sh tool that comes with Kafka to move a topic's replicas to another broker.
For example, if you want your topic "test" (in this example single-partition and single-replica) to be located on broker "1", you can first define a plan (named replicachange.json):
{
  "partitions": [
    {
      "topic": "test",
      "partition": 0,
      "replicas": [1]
    }
  ],
  "version": 1
}
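and then execute it using:
kafka-reassign-partitions.sh --zookeeper localhost:2181 --execute \
  --reassignment-json-file replicachange.json
Running the same command with --verify instead of --execute reports whether the reassignment has completed. Newer Kafka releases take --bootstrap-server <broker:port> instead of --zookeeper.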
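Since the plan is plain JSON, it can also be generated from Python and handed to the script, which gets close to the programmatic approach asked about. A minimal sketch using only the standard library; the function names are hypothetical, and the command line mirrors the ZooKeeper-based invocation above.

```python
import json
from typing import Dict, List


def build_reassignment_plan(topic: str,
                            replicas_by_partition: Dict[int, List[int]]) -> dict:
    """Build a plan in the JSON shape read by --reassignment-json-file."""
    return {
        "partitions": [
            {"topic": topic, "partition": p, "replicas": list(replicas)}
            for p, replicas in sorted(replicas_by_partition.items())
        ],
        "version": 1,
    }


def write_plan(plan: dict, path: str) -> None:
    """Write the plan to disk so the CLI tool can read it."""
    with open(path, "w") as f:
        json.dump(plan, f, indent=2)


def reassign_command(plan_path: str, zookeeper: str = "localhost:2181",
                     action: str = "--execute") -> List[str]:
    """Argument list for kafka-reassign-partitions.sh (use --verify to check progress)."""
    return ["kafka-reassign-partitions.sh", "--zookeeper", zookeeper, action,
            "--reassignment-json-file", plan_path]
```

Usage: `write_plan(build_reassignment_plan("test", {0: [1]}), "replicachange.json")` and then `subprocess.run(reassign_command("replicachange.json"), check=True)`.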

Related

How does a consumer know it is no longer listed in the Kafka cluster?

We have an issue: when Kafka brokers have to be taken offline, none of our consumer services notices, and they keep running.
We tried listing consumers in the new Kafka instance and saw none of the existing consumers there; all consumers listed were newly created ones.
We had to terminate all existing consumer services manually, which is inconvenient every time we hit this issue.
Question - How does a consumer know it is no longer listed in the Kafka cluster so it should terminate itself?
P.S. We use Spring Kafka.
1 -- To check cluster and replica status
Check the status of all brokers in the Kafka cluster
$ zookeeper-shell.sh localhost:9001 ls /brokers/ids
Check the status of a specific broker
$ zookeeper-shell.sh localhost:9001 get /brokers/ids/<id>
Check whether any partition replicas are unavailable (kafka-check is part of Yelp's kafka-utils)
$ kafka-check --cluster-type=sample_type replica_unavailability
Limit the check to the first broker
$ kafka-check --cluster-type=sample_type --broker-id 3 replica_unavailability --first-broker-only
Check for offline partitions
$ kafka-check --cluster-type=sample_type offline
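To script the first check, one option is to parse the child list that `zookeeper-shell.sh ... ls /brokers/ids` prints and compare it with the broker IDs you expect. A sketch, assuming the IDs appear on their own line as a bracketed list such as `[1, 2, 3]` (the surrounding log output varies by version); both function names are hypothetical.

```python
import ast
from typing import Iterable, Set


def registered_broker_ids(zk_shell_output: str) -> Set[int]:
    """Extract broker IDs from `zookeeper-shell.sh <host:port> ls /brokers/ids` output.

    Scans from the end because the shell prints connection/watcher lines first.
    """
    for line in reversed(zk_shell_output.strip().splitlines()):
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            return {int(x) for x in ast.literal_eval(line)}
    raise ValueError("no broker ID list found in zookeeper-shell output")


def missing_brokers(expected: Iterable[int], zk_shell_output: str) -> Set[int]:
    """Broker IDs we expect to be alive but which are not registered in ZooKeeper."""
    return set(expected) - registered_broker_ids(zk_shell_output)
```

The output can be captured with `subprocess.run(["zookeeper-shell.sh", "localhost:9001", "ls", "/brokers/ids"], capture_output=True, text=True).stdout`, and a non-empty result from `missing_brokers` could be the trigger for sending the kill message described next.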
2 -- Code sample to send a kill message and shut down automatically
There are two custom options for handling the shutdown gracefully, both based on sending a kill message before taking down brokers or topics.
Option 1: Send an in-band signal, i.e. a "kill" message on the topics the consumer is already listening to. It arrives in offset order on each topic-partition.
Option 2: Make the consumer listen to two topics, e.g. "topic" and "topic_kill".
The difference between the two options is that in Option 1 the kill message arrives in the order it was sent, so, depending on your implementation, messages produced before it may still be waiting to be consumed and will delay it.
Option 2 lets the kill signal arrive independently, out of band, without being blocked. This is a nicer, reusable architectural pattern, with a clear separation between the data topic and signaling.
Code sample: a) a producer sending the kill message and b) a consumer receiving it and handling the shutdown
# Producer -- modify and adapt as needed
import json
from typing import List
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=['0.0.0.0:<my port number>'],
                         key_serializer=lambda m: m.encode('utf8'),
                         value_serializer=lambda m: json.dumps(m).encode('utf8'))

def send_kill(topic: str, partitions: List[int]):
    # One kill message per partition, so every consumer in the group sees it
    # regardless of which partitions it owns.
    for p in partitions:
        producer.send(topic, key='kill', partition=p)
    producer.flush()
# Consumer to accept a kill message -- please modify and adapt as needed
import json
import sys
from kafka import KafkaConsumer
from kafka.structs import OffsetAndMetadata, TopicPartition

consumer = KafkaConsumer(bootstrap_servers=['0.0.0.0:<my port number>'],
                         # Messages without a key/value arrive as None, so guard the decoding.
                         key_deserializer=lambda m: m.decode('utf8') if m is not None else None,
                         value_deserializer=lambda m: json.loads(m.decode('utf8')) if m is not None else None,
                         auto_offset_reset="earliest",
                         enable_auto_commit=False,  # offsets are committed manually below
                         group_id='1')
consumer.subscribe(['topic'])
for msg in consumer:
    tp = TopicPartition(msg.topic, msg.partition)
    # Commit the offset of the next message to read, hence msg.offset + 1.
    offsets = {tp: OffsetAndMetadata(msg.offset + 1, None)}
    if msg.key == "kill":
        consumer.commit(offsets=offsets)
        consumer.unsubscribe()
        consumer.close()
        sys.exit(0)
    # do your work...
    consumer.commit(offsets=offsets)
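Option 2 can be sketched as a loop that stops as soon as any record arrives on the control topic. The function below is hypothetical; it only assumes that each record has a `.topic` attribute, as kafka-python's `ConsumerRecord` does, so a `KafkaConsumer` subscribed to both topics can be passed in directly.

```python
from typing import Any, Callable, Iterable


def consume_until_kill(records: Iterable[Any], handle: Callable[[Any], None],
                       kill_topic: str = "topic_kill") -> int:
    """Hand data records to `handle` until a record arrives on `kill_topic`.

    Returns the number of data records handled before the shutdown signal.
    """
    handled = 0
    for rec in records:
        if rec.topic == kill_topic:
            break  # out-of-band shutdown signal: stop without draining the data topic
        handle(rec)
        handled += 1
    return handled
```

Usage: `consumer.subscribe(['topic', 'topic_kill'])`, then `consume_until_kill(consumer, process)` followed by `consumer.close()`. Because records from the two topics are not ordered relative to each other, the kill signal is not stuck behind a backlog on the data topic, which is the point of Option 2.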

Identify oldest Kafka offset that has data for consumption

I ran into a data issue today, and to solve it I have to recalculate everything from the last 3 months. But when I run this Kafka command:
./kafka-console-consumer.sh --bootstrap-server 10.8.95.21:9092 --topic backoffice --from-beginning
it fails with the error: The requested offset is not within the range of offsets maintained by the server
--from-beginning tries to read from offsets whose data has already been purged by Kafka.
Can I list offsets along with the time they were created, so that I can estimate where to start consuming? Otherwise, if I can identify the oldest Kafka offset that still has data, I can start reading from that offset.
Have you tried kt (fgeller/kt)? It is an excellent alternative to the Kafka console tools, and since it is written in Go it is also very fast. Another advantage is that it shows the offset of each message by default.
So you can simply write something like:
kt consume -brokers <broker-name> -topic <topic-name> oldest
and the output will look something like this:
{
"partition": 0,
"offset": <oldest-offset>,
"key": "<your-key>",
"value": "<value of the message>"
}
Edit: If you want a UI for this, Kafdrop is what you are looking for. It is easy to set up, shows all offset-related information, and even lets you view the message at a given offset.
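The following command worked for me; --time -2 returns the earliest available offset of each partition (--time -1 returns the latest):
./bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list <broker-name> --topic <topic-name> --time -2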
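With several partitions, the output of that command (one `topic:partition:offset` line per partition) can be turned into a starting point per partition. A small parsing sketch; `parse_offset_shell` is a hypothetical name, and the line format is assumed from GetOffsetShell's usual output.

```python
from typing import Dict


def parse_offset_shell(output: str) -> Dict[int, int]:
    """Parse GetOffsetShell lines of the form topic:partition:offset into {partition: offset}."""
    offsets = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        _topic, partition, offset = line.rsplit(":", 2)
        offsets[int(partition)] = int(offset)
    return offsets
```

Each earliest offset can then be passed to the console consumer for its partition, e.g. `kafka-console-consumer.sh --bootstrap-server <broker> --topic backoffice --partition 0 --offset <offset>`.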

Kafka console producer skipping messages

I'm trying to send a file to a topic using:
cat myfile | kafka-console-producer.sh --broker-list $BROKER_URL --topic mytopic
When I check the count of messages on the topic, I see a few hundred fewer messages than the file contains.
During the write I see this message:
[2017-11-15 14:05:26,864] WARN Error while fetching metadata with correlation id 0 : {abc123=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
I have correctly set the advertised hostname and listeners.
What confuses me is that if leader is not available how does it manage to put any messages into the topic? Furthermore, the message appears randomly, sometimes it doesn't.
How can I debug this?
As vahid pointed out in the comments, this is a known issue.
The workaround is to pass --request-required-acks 1 to the console producer.
The random occurrence of LEADER_NOT_AVAILABLE happens when I write to a new topic without explicitly creating it first: the topic is auto-created on the first write and briefly has no leader (thanks to amethystic).
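When you control the producer, one way to debug lost messages is to count delivery reports: confluent-kafka invokes a callback for every produced message, with an error when it could not be delivered. The sketch below assumes the confluent-kafka package (imported lazily inside `produce_file`) and uses acks='all' as a stricter alternative to the console producer's --request-required-acks 1; `DeliveryTracker` and `produce_file` are hypothetical names.

```python
class DeliveryTracker:
    """Counts delivery reports from a confluent-kafka Producer."""

    def __init__(self):
        self.delivered = 0
        self.failed = []  # (error, message) pairs for messages that were not delivered

    def on_delivery(self, err, msg):
        if err is not None:
            self.failed.append((err, msg))
        else:
            self.delivered += 1


def produce_file(path, topic, bootstrap_servers):
    """Send each line of `path` as one message and return the tracker."""
    from confluent_kafka import Producer  # lazy import: the tracker works without the package

    producer = Producer({"bootstrap.servers": bootstrap_servers, "acks": "all"})
    tracker = DeliveryTracker()
    with open(path, "rb") as f:
        for line in f:
            while True:
                try:
                    producer.produce(topic, value=line.rstrip(b"\n"),
                                     on_delivery=tracker.on_delivery)
                    break
                except BufferError:
                    producer.poll(1)  # local queue full: serve callbacks, then retry
            producer.poll(0)  # serve delivery callbacks as we go
    producer.flush()  # wait until every message has a delivery report
    return tracker
```

After `produce_file(...)` returns, `tracker.delivered + len(tracker.failed)` should equal the number of lines in the file, and `tracker.failed` shows which messages were lost and why.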

flink kafka consumer groupId not working

I am using kafka with flink.
In a simple program, I used Flink's FlinkKafkaConsumer09 and assigned a group id to it.
According to Kafka's behavior, when I run 2 consumers on the same topic with the same group.id, they should work like a message queue. I think it is supposed to work like this:
If 2 messages are sent to Kafka, the two Flink programs together should process them only once (say, 2 lines of output in total).
But the actual result is that each program receives both messages.
I have tried the consumer client that comes with the Kafka server download. It worked in the documented way (2 messages processed).
I tried using 2 Kafka consumers in the same main function of a Flink program: 4 messages processed in total.
I also tried running 2 instances of Flink, each running the same Kafka consumer program: 4 messages.
Any ideas?
This is the output I expect:
1> Kafka and Flink2 says: element-65
2> Kafka and Flink1 says: element-66
Here's the wrong output I always get:
1> Kafka and Flink2 says: element-65
1> Kafka and Flink1 says: element-65
2> Kafka and Flink2 says: element-66
2> Kafka and Flink1 says: element-66
And here is the segment of code:
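import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
// Flink 1.4+ moved this class to org.apache.flink.api.common.serialization.
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Properties such as bootstrap.servers and group.id come from the arguments, e.g. --group.id myGroup
    ParameterTool parameterTool = ParameterTool.fromArgs(args);
    DataStream<String> messageStream = env.addSource(new FlinkKafkaConsumer09<>(
            parameterTool.getRequired("topic"), new SimpleStringSchema(), parameterTool.getProperties()));
    messageStream.rebalance().map(new MapFunction<String, String>() {
        private static final long serialVersionUID = -6867736771747690202L;

        @Override
        public String map(String value) throws Exception {
            return "Kafka and Flink1 says: " + value;
        }
    }).print();
    env.execute();
}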
I have tried running it twice and also the other way:
creating 2 DataStreams and calling env.execute() for each one in the main function.
There was a very similar question on the Flink user mailing list today, but I can't find the link to post here. Here is part of the answer:
"Internally, the Flink Kafka connectors don’t use the consumer group
management functionality because they are using lower-level APIs
(SimpleConsumer in 0.8, and KafkaConsumer#assign(…) in 0.9) on each
parallel instance for more control on individual partition
consumption. So, essentially, the “group.id” setting in the Flink
Kafka connector is only used for committing offsets back to ZK / Kafka
brokers."
Maybe that clarifies things for you.
Also, there is a blog post about working with Flink and Kafka that may help you (https://data-artisans.com/blog/kafka-flink-a-practical-how-to).
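The reasoning in that answer can be illustrated with a toy model: each Flink job spreads every partition over its own subtasks without coordinating through the consumer group, so two jobs with the same group.id both read every partition. The sketch below uses a simplified `partition % parallelism` rule, which is not Flink's exact assigner; the function names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], parallelism: int) -> Dict[int, List[int]]:
    """Simplified model of one job spreading all partitions over its subtasks."""
    assignment = {i: [] for i in range(parallelism)}
    for p in partitions:
        assignment[p % parallelism].append(p)
    return assignment


def reads_per_partition(partitions: List[int], jobs: int, parallelism: int) -> Dict[int, int]:
    """How many subtasks, across independent jobs, read each partition.

    Every job assigns partitions on its own (like KafkaConsumer#assign),
    so group.id does not split the partitions between jobs.
    """
    counts = {p: 0 for p in partitions}
    for _ in range(jobs):
        for parts in assign_partitions(partitions, parallelism).values():
            for p in parts:
                counts[p] += 1
    return counts
```

With two jobs, `reads_per_partition([0, 1], jobs=2, parallelism=2)` gives each partition a count of 2, which matches the doubled output in the question; within a single job each partition is read exactly once.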
Since the group.id of the Flink Kafka consumer is not used for much other than committing offsets to ZooKeeper, is there any way to monitor offsets for a Flink Kafka consumer? I can see a way to do it for console consumers (with consumer-groups / consumer-offset-checker), but not for Flink Kafka consumers.
We want to see how far our Flink Kafka consumer is behind (lagging) the Kafka topic size, i.e. the total number of messages in the topic at a given point in time; partition-level numbers are fine.
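Lag per partition is the log-end offset minus the offset committed for the group. Because the Flink connector commits offsets under its group.id, those committed offsets can be compared with the end offsets. A minimal sketch of the arithmetic, assuming both have already been fetched as dictionaries keyed by partition (for example with kafka-python's `KafkaConsumer.end_offsets()` and `KafkaConsumer.committed()` on a separate consumer configured with the same group_id that never subscribes or commits); `consumer_lag` is a hypothetical helper.

```python
from typing import Dict, Optional


def consumer_lag(end_offsets: Dict[int, int],
                 committed: Dict[int, Optional[int]]) -> Dict[int, Optional[int]]:
    """Return {partition: lag}; lag is None where the group has no committed offset."""
    lag = {}
    for partition, end in sorted(end_offsets.items()):
        done = committed.get(partition)
        lag[partition] = None if done is None else max(end - done, 0)
    return lag
```

Summing the non-None values gives the total lag for the topic at that moment.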

Kafka 0.8, is it possible to create topic with partition and replication using java code?

In Kafka 0.8beta a topic can be created using a command like below as mentioned here
bin/kafka-create-topic.sh --zookeeper localhost:2181 --replica 2 --partition 3 --topic test
The above command will create a topic named "test" with 3 partitions and 2 replicas per partition.
Can I do the same thing using Java ?
So far what I found is using Java we can create a producer as seen below
Producer<String, String> producer = new Producer<String, String>(config);
producer.send(new KeyedMessage<String, String>("mytopic", msg));
This will create a topic named "mytopic" with the number of partitions specified by the broker's "num.partitions" property (provided automatic topic creation, auto.create.topics.enable, is on) and start producing.
But is there a way to define the partitions and replication as well? I couldn't find any such example. If not, does that mean we always need to create the topic with the required partitions and replication first, and then use the producer to produce messages to it? For example, is it possible to create "mytopic" the same way but with a different number of partitions (overriding num.partitions)?
Note: My answer covers Kafka 0.8.1+, i.e. the latest stable version available as of April 2014.
Yes, you can create a topic programmatically via the Kafka API. And yes, you can specify the desired number of partitions as well as the replication factor for the topic.
Note that the recently released Kafka 0.8.1+ provides a slightly different API than Kafka 0.8.0 (which was used by Biks in his linked reply). I added a code example to create a topic in Kafka 0.8.1+ to my reply to the question How Can we create a topic in Kafka from the IDE using API that Biks was referring to above.
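import java.util.Properties;
import kafka.admin.AdminUtils;
import kafka.utils.ZKStringSerializer$;
import kafka.utils.ZkUtils;
import org.I0Itec.zkclient.ZkClient;
import org.I0Itec.zkclient.ZkConnection;

String zkConnect = "localhost:2181";
// Session timeout 10 s, connection timeout 8 s; ZKStringSerializer writes data in the format Kafka expects.
ZkClient zkClient = new ZkClient(zkConnect, 10 * 1000, 8 * 1000, ZKStringSerializer$.MODULE$);
// Kafka 0.9-style API; on 0.8.1, AdminUtils.createTopic takes the zkClient directly instead of ZkUtils.
ZkUtils zkUtils = new ZkUtils(zkClient, new ZkConnection(zkConnect), false);
Properties topicConfig = new Properties(); // per-topic config overrides; empty means broker defaults
// Arguments: topic name, number of partitions, replication factor, topic config.
AdminUtils.createTopic(zkUtils, "mytopic", 3, 2, topicConfig);
zkClient.close();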