I have built a very simple Akka stream based on the Alpakka project, but it doesn't read anything from Kafka even though it connects and creates a consumer group. I have created an implicit ActorSystem and Materializer for the stream.
val done = Consumer.committableSource(consumerSettings, Subscriptions.topics(kafkaTopic))
  .map(msg => msg.committableOffset)
  .mapAsync(1) { offset =>
    offset.commitScaladsl()
  }
  .runWith(Sink.ignore)
The [stream.actor.dispatcher] log shows this message being sent to the KafkaConsumerActor: "Requesting messages, requestId: 1, partitions: Set(kafka-topic-0)"
The KafkaConsumerActor doesn't seem to receive the message, but when the supervisor asks the actor to shut down, it does receive that message and shuts down.
Any lead on why it fails to read from Kafka without any error or exception?
I couldn't figure out why my Akka stream wasn't consuming messages from the Kafka broker, but when I implemented the same stream as a RunnableGraph, it worked.
Examples that I used: https://www.programcreek.com/scala/akka.stream.scaladsl.RunnableGraph
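For reference, here is a rough sketch of that stream expressed as a RunnableGraph, assuming the pre-2.0 akka-stream-kafka API that the commitScaladsl() call implies; the consumerSettings and kafkaTopic values below are placeholders, not the original configuration:

import akka.actor.ActorSystem
import akka.kafka.scaladsl.Consumer
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Keep, RunnableGraph, Sink}
import org.apache.kafka.common.serialization.StringDeserializer

implicit val system: ActorSystem = ActorSystem("kafka-reader")
implicit val materializer: ActorMaterializer = ActorMaterializer()

// placeholder settings -- adjust servers, group id and topic to your environment
val consumerSettings = ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
  .withBootstrapServers("localhost:9092")
  .withGroupId("group1")
val kafkaTopic = "kafka-topic"

// describe the stream as a RunnableGraph, keeping the consumer Control as the materialized value
val graph: RunnableGraph[Consumer.Control] =
  Consumer.committableSource(consumerSettings, Subscriptions.topics(kafkaTopic))
    .mapAsync(1)(msg => msg.committableOffset.commitScaladsl())
    .toMat(Sink.ignore)(Keep.left)

// nothing runs until run() is called
val control: Consumer.Control = graph.run()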
Need some help with the below error. I am trying to connect to Kafka to read data from a Kafka topic, and I am using AdminClient as well to describe topics. Why am I seeing this error?
java.util.concurrent.ExecutionException: org.apache.kafka.common.KafkaException: No stream name specified in the topic path or in the default stream configuration options
    at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
    at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
    at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
    at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:258)
Is there any event callback in Apache Kafka for producer-broker connection failure, like the one we have for consumer-broker failure (for the consumer we have KafkaRebalanceCallback)? Does something similar exist for the producer, so that my producer reconnects if my broker goes down and comes up again, or at least shows some kind of log when this happens?
We have this issue that when Kafka brokers must be taken offline, no consumer service has any idea about it and keeps running.
We tried listing consumers in the new Kafka instance and saw no existing consumers listed there. All the consumers listed are newly created ones.
We had to manually terminate all existing consumer services, which is not convenient every time we hit this issue.
Question - How does a consumer know it is no longer listed in the Kafka cluster so it should terminate itself?
P.S. We use Spring Kafka.
1 -- To check cluster & replica status
Check the status of all brokers in the Kafka cluster
$ zookeeper-shell.sh localhost:9001 ls /brokers/ids
Check the status of a specific broker in the Kafka cluster
$ zookeeper-shell.sh localhost:9001 get /brokers/ids/<id>
Specific to the replica_unavailability check
$ kafka-check --cluster-type=sample_type replica_unavailability
For a first-broker-only check
$ kafka-check --cluster-type=sample_type --broker-id 3 replica_unavailability --first-broker-only
Check for any partition replicas that are not available
$ kafka-check --cluster-type=sample_type replica_unavailability
Check for offline partitions
$ kafka-check --cluster-type=sample_type offline
2 -- Code sample to send a kill-message and auto-shutdown
There are 2 custom options to handle the shutdown gracefully using a kill-message, sent before taking down brokers or topics.
Option 1: Use an in-band message/signal, i.e. send a "kill" message on the topics/brokers the consumer is listening to; it is consumed in offset order on the topic-partition.
Option 2: Make the consumer listen to 2 topics, e.g. "topic" and "topic_kill".
The difference between the 2 options above is that in the first version the kill-message arrives in the order it was sent, so depending on your implementation there may be messages still waiting to be consumed before that "kill" message, which can delay the shutdown.
The second version lets the kill-signal arrive out of band, independently and without being blocked, which is a nicer and more reusable architectural pattern with a clear separation between the data topic and signaling.
Code samples: a) producer sending the kill-message and b) consumer to receive and handle the shutdown
# Producer -- modify and adapt as needed
import json
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=['0.0.0.0:<my port number>'],
                         key_serializer=lambda m: m.encode('utf8'),
                         value_serializer=lambda m: json.dumps(m).encode('utf8'))

# send a "kill" message to every listed partition of the topic
def send_kill(topic: str, partitions: [int]):
    for p in partitions:
        producer.send(topic, key='kill', partition=p)
    producer.flush()
# Consumer to accept a kill-message -- please modify and adapt as needed
import json
from kafka import KafkaConsumer
from kafka.structs import OffsetAndMetadata, TopicPartition

consumer = KafkaConsumer(bootstrap_servers=['0.0.0.0:<my port number>'],
                         key_deserializer=lambda m: m.decode('utf8'),
                         value_deserializer=lambda m: json.loads(m.decode('utf8')),
                         auto_offset_reset="earliest",
                         group_id='1')
consumer.subscribe(['topic'])

for msg in consumer:
    tp = TopicPartition(msg.topic, msg.partition)
    # commit the offset after this message so it is not re-read on restart
    offsets = {tp: OffsetAndMetadata(msg.offset + 1, None)}
    if msg.key == "kill":
        consumer.commit(offsets=offsets)
        consumer.unsubscribe()
        exit(0)
    # do your work...
    consumer.commit(offsets=offsets)
I sent a single message to Kafka using the following code:
def getHealthSink(kafkaHosts: String, zkHosts: String) = {
  val kafkaHealth: Subscriber[String] = kafka.publish(ProducerProperties(
    brokerList = kafkaHosts,
    topic = "health_check",
    encoder = new StringEncoder()
  ))
  Sink.fromSubscriber(kafkaHealth).runWith(Source.single("test"))
}
val kafkaHealth = getHealthSink(kafkaHosts, zkHosts)
and I got the following error message:
ERROR kafka.utils.Utils$ fetching topic metadata for topics [Set(health_check)] from broker [ArrayBuffer(id:0,host:****,port:9092)] failed
kafka.common.KafkaException: fetching topic metadata for topics [Set(health_check)] from broker [ArrayBuffer(id:0,host:****,port:9092)] failed
Do you have any idea what can be the problem?
The error message is incredibly unclear, but basically "Fetching topic metadata" is the first thing the producer does, which means this is where it is first establishing a connection to Kafka.
There's a good chance that either the broker you are trying to connect to is down, or there is another connectivity issue (ports, firewalls, dns, etc).
In unrelated news: you seem to be using the old and deprecated Scala producer. We recommend moving to the new Java producer (org.apache.kafka.clients.producer.KafkaProducer).
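For illustration, a minimal sketch of sending the same health-check message with the Java producer, called from Scala here; kafkaHosts is the broker list from the question, while the serializer setup and the synchronous get() are assumptions for the example:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

val props = new Properties()
props.put("bootstrap.servers", kafkaHosts) // e.g. "broker1:9092,broker2:9092"
props.put("key.serializer", classOf[StringSerializer].getName)
props.put("value.serializer", classOf[StringSerializer].getName)

val producer = new KafkaProducer[String, String](props)
// send() is asynchronous; calling get() on the returned future blocks until the broker
// acknowledges the write, so connectivity problems surface as an exception right here
producer.send(new ProducerRecord[String, String]("health_check", "test")).get()
producer.close()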
I am using Kafka with Flink.
In a simple program, I used Flink's FlinkKafkaConsumer09 and assigned a group id to it.
According to Kafka's behavior, when I run 2 consumers on the same topic with the same group.id, it should work like a message queue. I think it's supposed to work like this:
If 2 messages are sent to Kafka, the two Flink programs between them would process the 2 messages a total of two times (let's say 2 lines of output in total).
But the actual result is that each program receives both messages.
I have tried using the consumer client that came with the Kafka server download. It worked in the documented way (2 messages processed in total).
I tried using 2 Kafka consumers in the same main function of a Flink program: 4 messages processed in total.
I also tried running 2 instances of Flink and assigning each of them the same Kafka consumer program: 4 messages.
Any ideas?
This is the output I expect:
1> Kafka and Flink2 says: element-65
2> Kafka and Flink1 says: element-66
Here's the wrong output I always get:
1> Kafka and Flink2 says: element-65
1> Kafka and Flink1 says: element-65
2> Kafka and Flink2 says: element-66
2> Kafka and Flink1 says: element-66
And here is the segment of code:
public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    ParameterTool parameterTool = ParameterTool.fromArgs(args);
    DataStream<String> messageStream = env.addSource(new FlinkKafkaConsumer09<>(parameterTool.getRequired("topic"), new SimpleStringSchema(), parameterTool.getProperties()));
    messageStream.rebalance().map(new MapFunction<String, String>() {
        private static final long serialVersionUID = -6867736771747690202L;

        @Override
        public String map(String value) throws Exception {
            return "Kafka and Flink1 says: " + value;
        }
    }).print();
    env.execute();
}
I have tried running it twice, and also another way:
creating 2 data streams and calling env.execute() for each one in the main function.
There was a quite similar question on the Flink user mailing list today, but I can't find the link to post it here. So here is part of the answer:
"Internally, the Flink Kafka connectors don’t use the consumer group
management functionality because they are using lower-level APIs
(SimpleConsumer in 0.8, and KafkaConsumer#assign(…) in 0.9) on each
parallel instance for more control on individual partition
consumption. So, essentially, the “group.id” setting in the Flink
Kafka connector is only used for committing offsets back to ZK / Kafka
brokers."
Maybe that clarifies things for you.
Also, there is a blog post about working with Flink and Kafka that may help you (https://data-artisans.com/blog/kafka-flink-a-practical-how-to).
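To make the distinction in that quote concrete, here is a rough sketch using the plain Kafka 0.9 consumer API from Scala (broker address, topic, partition numbers and group id are placeholders): subscribe() hands partition balancing to the group coordinator, while assign(), which the Flink connector effectively uses on each parallel instance, reads exactly the partitions it is given regardless of group.id, which is why every Flink instance saw every message.

import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")  // placeholder broker
props.put("group.id", "flink-group")              // with assign(), only used when committing offsets
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

val consumer = new KafkaConsumer[String, String](props)

// Group management: the broker splits the topic's partitions among all consumers sharing group.id
// consumer.subscribe(List("topic").asJava)

// Manual assignment, as the Flink 0.9 connector does per parallel instance: this consumer reads
// the partitions it assigns itself, no matter how many other "group members" exist
consumer.assign(List(new TopicPartition("topic", 0)).asJava)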
Since there is not much use for the group.id of the Flink Kafka consumer other than committing offsets to ZooKeeper, is there any way of monitoring offsets as far as the Flink Kafka consumer is concerned? I can see there is a way [with the help of consumer-groups/consumer-offset-checker] for console consumers, but not for Flink Kafka consumers.
We want to see how far our Flink Kafka consumer is behind/lagging relative to the Kafka topic size [total number of messages in the topic at a given point in time]; it is fine to have this at the partition level.