Spring-XD Error while processing: KafkaMessage, MessageDispatchingException: Dispatcher has no subscribers - apache-kafka

I am using Spring-XD to read a topic from Kafka, filter the data using a spark-streaming-processor, and sink the data into Spark.
The command I used to deploy the stream is:
stream create spark-streaming-word-count --definition "kafka --zkconnect=localhost:2181 --topic=log-stream | java-word-count | log" --deploy
The error I got is:
2015-05-23 11:36:16,190 1.1.1.RELEASE ERROR dispatcher-1 listener.LoggingErrorHandler - Error while processing: KafkaMessage [Message(magic = 0, attributes = 0, crc = 3699841462, key = java.nio.HeapByteBuffer[pos=0 lim=6 cap=437], payload = java.nio.HeapByteBuffer[pos=0 lim=427 cap=427]), KafkaMessageMetadata [offset=26353, nextOffset=26354, Partition[topic='log-stream', id=0]]
org.springframework.messaging.MessageDeliveryException: Dispatcher has no subscribers for channel 'admin:default,admin,singlenode,hsqldbServer:9393.spark-streaming-word-count.0'.; nested exception is org.springframework.integration.MessageDispatchingException: Dispatcher has no subscribers
at org.springframework.integration.channel.AbstractSubscribableChannel.doSend(AbstractSubscribableChannel.java:81)
at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:277)
at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:239)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:115)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:45)
at org.springframework.messaging.core.AbstractMessageSendingTemplate.send(AbstractMessageSendingTemplate.java:95)
at org.springframework.integration.handler.AbstractMessageProducingHandler.sendOutput(AbstractMessageProducingHandler.java:248)
at org.springframework.integration.handler.AbstractMessageProducingHandler.produceOutput(AbstractMessageProducingHandler.java:171)
at org.springframework.integration.handler.AbstractMessageProducingHandler.sendOutputs(AbstractMessageProducingHandler.java:119)
at org.springframework.integration.handler.AbstractReplyProducingMessageHandler.handleMessageInternal(AbstractReplyProducingMessageHandler.java:105)
Please help me to resolve this issue
Thanks

What is the stream deployment status? The shell command "stream list" shows the deployment status of each stream. Also, try "runtime modules" to see which modules are running. It looks like the downstream module java-word-count has not been deployed yet.

Does Kafka Connect restart a failed task?

We have a source connector that reads from an RDBMS and writes to Kafka. It uses Schema Registry with an Avro schema.
I am finding the following exceptions in the Kafka Connect log and the Schema Registry log, respectively.
1.
Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:426)
WorkerSourceTask{id=A-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:443)
Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:186)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
.
.
Caused by: org.apache.kafka.connect.errors.DataException: Failed to serialize Avro data from topic A :
at io.confluent.connect.avro.AvroConverter.fromConnectData(AvroConverter.java:91)
at org.apache.kafka.connect.storage.Converter.fromConnectData(Converter.java:63)
.
.
Caused by: org.apache.kafka.common.errors.SerializationException: Error registering Avro schema:
.
.
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Register operation timed out; error code: 50002
.
.
Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:187)
Stopping JDBC source task (io.confluent.connect.jdbc.source.JdbcSourceTask:314)
Closing the Kafka producer with timeoutMillis = 30000 ms.
(org.apache.kafka.clients.producer.KafkaProducer:1182)
2.
Wait to catch up until the offset at 1 (io.confluent.kafka.schemaregistry.storage.KafkaStore:304)
Request Failed with exception (io.confluent.rest.exceptions.DebuggableExceptionMapper:62)
io.confluent.kafka.schemaregistry.rest.exceptions.RestSchemaRegistryTimeoutException: Register operation timed out
at io.confluent.kafka.schemaregistry.rest.exceptions.Errors.operationTimeoutException(Errors.java:132)
.
.
Caused by: io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryTimeoutException: Write to the Kafka store timed out while
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.register(KafkaSchemaRegistry.java:508)
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.registerOrForward(KafkaSchemaRegistry.java:553)
.
.
Caused by: io.confluent.kafka.schemaregistry.storage.exceptions.StoreTimeoutException: KafkaStoreReaderThread failed to reach target offset within the timeout interval. targetOffset: 3, offsetReached: 1, timeout(ms): 500
So basically, before registering a schema, the schema registry moves its offset to the latest, and that is where it times out after 500 ms.
My questions are:
How can I find out why it is not able to read from Kafka?
Does the source connector restart the failed task or poll data again for it? I ask because a later section of the log shows this:
Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:426)
WorkerSourceTask{id=A-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:443)
Earlier the task failed right after these lines, but now the exception is no longer printed, which suggests that it passed.
The key thing to note is that when the read failed earlier, only the task for connector A failed and the others passed. Later I could not find the exception for connector A.
If the task is not restarting or the connector is not polling again, I need to restart the task using the REST API.
Any help will be greatly appreciated.
Thanks in advance.

Regarding the question in your title, read the error:
task will not recover until manually restarted
If you have more than one task, you would still expect to see logs from other tasks.
As for offset commits, source task offsets are not committed until the task succeeds, and none of the logs shown indicate anything "moving to latest".
The error has nothing to do with reading from Kafka. It is a timeout in the Schema Registry client used by the AvroConverter, a converter that Kafka Connect does not require.
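The manual restart that the log message asks for goes through the Kafka Connect REST API. Below is a minimal Java sketch of checking the connector status and restarting the failed task. It assumes the Connect worker's REST endpoint is at http://localhost:8083 (an assumption, adjust it for your deployment) and takes the connector name A and task id 0 from the log above (WorkerSourceTask{id=A-0}). The class and method names are illustrative only.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class RestartConnectTask {
    // Assumption: the Connect worker's REST API is reachable at this address.
    static final String CONNECT_URL = "http://localhost:8083";

    // GET /connectors/{name}/status reports the state of the connector and each task.
    static HttpRequest statusRequest(String baseUrl, String connector) {
        return HttpRequest.newBuilder(
                URI.create(baseUrl + "/connectors/" + connector + "/status"))
                .GET()
                .build();
    }

    // POST /connectors/{name}/tasks/{id}/restart restarts a single task.
    static HttpRequest restartRequest(String baseUrl, String connector, int taskId) {
        return HttpRequest.newBuilder(
                URI.create(baseUrl + "/connectors/" + connector + "/tasks/" + taskId + "/restart"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        HttpResponse<String> status = client.send(
                statusRequest(CONNECT_URL, "A"), HttpResponse.BodyHandlers.ofString());
        System.out.println(status.body());

        // Crude check for illustration: a real client would parse the JSON
        // and inspect the state of each entry under "tasks".
        if (status.body().contains("FAILED")) {
            HttpResponse<String> restart = client.send(
                    restartRequest(CONNECT_URL, "A", 0), HttpResponse.BodyHandlers.ofString());
            System.out.println("Restart returned HTTP " + restart.statusCode());
        }
    }
}
```

Polling the status endpoint this way also answers whether the task recovered on its own, since a failed task stays in the FAILED state until it is restarted.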

Getting an error while publishing a message to a Kafka topic

I am new to Kafka. I have written a simple Java program that generates a message using an Avro schema. I generated a specific record, and it is created successfully. My schema is not yet registered in my local environment; it is currently registered in another environment.
I am using the Apache Kafka producer library to publish the message to a Kafka topic in my local environment. Can I publish the message to the local topic, or does the schema need to be registered with the local schema registry as well?
Below are the producer properties:
properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
properties.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, "https://schema-registry.xxxx.service.dev:443");
The error I get while publishing the message:
org.apache.kafka.common.errors.SerializationException: Error registering Avro schema:
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: User is denied operation Write on Subject: xxx.avro-value; error code: 40301

The issue was that the producer's Avro serializer tries to register the schema for the topic by default. So I added the line below:
properties.put(KafkaAvroSerializerConfig.AUTO_REGISTER_SCHEMAS, false);
and it resolved the issue.
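For reference, the full set of producer properties with auto-registration turned off might look like the sketch below. It uses the plain string keys rather than the config constants so that it compiles without the Kafka and Confluent libraries. The bootstrap address localhost:9092 and the class name are assumptions, and the registry URL is the placeholder from the question.

```java
import java.util.Properties;

class AvroProducerProps {
    // Builds producer properties with schema auto-registration disabled.
    static Properties build(String bootstrapServers, String schemaRegistryUrl) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", schemaRegistryUrl);
        // With this set to false the serializer only looks the schema up;
        // it never attempts a write on the subject.
        props.put("auto.register.schemas", "false");
        return props;
    }

    public static void main(String[] args) {
        // Assumption: a local broker on the default port.
        Properties props = build("localhost:9092", "https://schema-registry.xxxx.service.dev:443");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Note that with auto-registration disabled, the serializer still has to find the schema in whichever registry the URL points to, so the schema must already be registered there under the topic's subject.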

Apache Kafka: Fetching topic metadata with correlation id 0

I sent a single message to Kafka using the following code:
def getHealthSink(kafkaHosts: String, zkHosts: String) = {
  val kafkaHealth: Subscriber[String] = kafka.publish(ProducerProperties(
    brokerList = kafkaHosts,
    topic = "health_check",
    encoder = new StringEncoder()
  ))
  Sink.fromSubscriber(kafkaHealth).runWith(Source.single("test"))
}
val kafkaHealth = getHealthSink(kafkaHosts, zkHosts)
and I got the following error message:
ERROR kafka.utils.Utils$ fetching topic metadata for topics [Set(health_check)] from broker [ArrayBuffer(id:0,host:****,port:9092)] failed
kafka.common.KafkaException: fetching topic metadata for topics [Set(health_check)] from broker [ArrayBuffer(id:0,host:****,port:9092)] failed
Do you have any idea what can be the problem?

The error message is incredibly unclear, but "fetching topic metadata" is the first thing the producer does, so this is the point at which it first establishes a connection to Kafka.
There's a good chance that either the broker you are trying to connect to is down, or there is another connectivity issue (ports, firewalls, DNS, etc.).
In unrelated news: you seem to be using the old and deprecated Scala producer. We recommend moving to the new Java producer (org.apache.kafka.clients.producer.KafkaProducer).
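To rule out the connectivity causes mentioned above, a plain TCP check against the broker address is often enough as a first step. The following Java sketch uses only the standard library. The class name, the default host localhost, and the 3-second timeout are illustrative assumptions; 9092 is the port from the error message.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class BrokerReachability {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    // DNS failures, refused connections and timeouts all yield false.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: pass the broker host and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9092;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

A successful check only shows that the port accepts connections from this machine. It does not prove that the host name the broker advertises to clients is resolvable as well.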

Apache Flume: kafka.consumer.ConsumerTimeoutException

I'm trying to build a pipeline with Apache Flume:
spooldir -> kafka channel -> hdfs sink
Events reach the Kafka topic without problems, and I can see them with a kafkacat request. But the Kafka channel can't write files to HDFS via the sink. The error is:
Timed out while waiting for data to come from Kafka
Full log:
2016-02-26 18:25:17,125 (SinkRunner-PollingRunner-DefaultSinkProcessor-SendThread(zoo02:2181)) [DEBUG - org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:717)] Got ping response for sessionid: 0x2524a81676d02aa after 0ms
2016-02-26 18:25:19,127 (SinkRunner-PollingRunner-DefaultSinkProcessor-SendThread(zoo02:2181)) [DEBUG - org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:717)] Got ping response for sessionid: 0x2524a81676d02aa after 1ms
2016-02-26 18:25:21,129 (SinkRunner-PollingRunner-DefaultSinkProcessor-SendThread(zoo02:2181)) [DEBUG - org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:717)] Got ping response for sessionid: 0x2524a81676d02aa after 0ms
2016-02-26 18:25:21,775 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.channel.kafka.KafkaChannel$KafkaTransaction.doTake(KafkaChannel.java:327)] Timed out while waiting for data to come from Kafka
kafka.consumer.ConsumerTimeoutException
at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:69)
at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:33)
at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
at org.apache.flume.channel.kafka.KafkaChannel$KafkaTransaction.doTake(KafkaChannel.java:306)
at org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
at org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:95)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:374)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:745)
My Flume config is:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c2
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/alex/spoolFlume
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://10.12.0.1:54310/logs/flumetest/
a1.sinks.k1.hdfs.filePrefix = flume-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.channels.c2.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c2.capacity = 10000
a1.channels.c2.transactionCapacity = 1000
a1.channels.c2.brokerList=kafka10:9092,kafka11:9092,kafka12:9092
a1.channels.c2.topic=flume_test_001
a1.channels.c2.zookeeperConnect=zoo00:2181,zoo01:2181,zoo02:2181
# Bind the source and sink to the channel
a1.sources.r1.channels = c2
a1.sinks.k1.channel = c2
With a memory channel instead of the Kafka channel, everything works fine.
Thanks for any ideas in advance!

ConsumerTimeoutException means that no new message arrived within the configured interval; it does not mean that the connection to Kafka timed out.
http://kafka.apache.org/documentation.html
consumer.timeout.ms (default -1): Throw a timeout exception to the consumer if no message is available for consumption after the specified interval

Kafka's ConsumerConfig class has the "consumer.timeout.ms" configuration property, which defaults to -1. Any new Kafka consumer is expected to override the property with a suitable value.
Below is the relevant entry from the Kafka documentation:
consumer.timeout.ms -1
By default, this value is -1 and a consumer blocks indefinitely if no new message is available for consumption. By setting the value to a positive integer, a timeout exception is thrown to the consumer if no message is available for consumption after the specified timeout value.
When Flume creates a Kafka channel, it sets consumer.timeout.ms to 100, as the Flume logs show at the INFO level. That explains why we see a ton of these ConsumerTimeoutExceptions.
level: INFO Post-validation flume configuration contains configuration for agents: [agent]
level: INFO Creating channels
level: DEBUG Channel type org.apache.flume.channel.kafka.KafkaChannel is a custom type
level: INFO Creating instance of channel c1 type org.apache.flume.channel.kafka.KafkaChannel
level: DEBUG Channel type org.apache.flume.channel.kafka.KafkaChannel is a custom type
level: INFO Group ID was not specified. Using flume as the group id.
level: INFO {metadata.broker.list=kafka:9092, request.required.acks=-1, group.id=flume,
zookeeper.connect=zookeeper:2181, **consumer.timeout.ms=100**, auto.commit.enable=false}
level: INFO Created channel c1
Going by the Flume user guide on Kafka channel settings, I tried to override this value with the setting below, but that doesn't seem to work:
agent.channels.c1.kafka.consumer.timeout.ms=5000
Also, we ran a load test that pushed data through the channels constantly, and this exception did not occur during the test.

I read Flume's source code and found that Flume reads the value of the key "timeout" and uses it for "consumer.timeout.ms".
So you can configure the value for "consumer.timeout.ms" like this:
agent1.channels.kafka_channel.timeout=-1

Flume Kafka sink not able to write complete messages to Kafka Broker

I have written a process in which I generate messages through a custom Flume source and use the Flume Kafka sink provided by Hortonworks to write them to Kafka brokers.
During this process I noticed that if the Kafka broker is already running when I start my Flume agent, every message is delivered to the broker properly. But when I start the Kafka broker while the Flume agent is already running, the broker does not receive all the messages.
When I run the Kafka console consumer to check the count of messages received, I see that a few records are dropped from the beginning and a few from the end.
I have tried multiple combinations of settings in Flume.conf, but it is still not working as expected.
Below are the configuration parameters I have provided in Flume.conf:
agent.channels = firehose-channel
agent.sources = stress-source
agent.sinks = kafkasink
#################################
# Benchmark Souce Configuration #
#################################
agent.sources.stress-source.type=com.kohls.flume.source.stress.BenchMarkTestScenriao
agent.sources.stress-source.size=5000
agent.sources.stress-source.maxTotalEvents=30000
agent.sources.stress-source.batchSize=200
agent.sources.stress-source.throughputThreshold=4000
agent.sources.stress-source.throughputControlSeconds=1
agent.sources.stress-source.channels=firehose-channel
#################################
# Firehose Channel Configuration #
#################################
agent.channels.firehose-channel.type = file
agent.channels.firehose-channel.checkpointDir = /data/flume/checkpoint
agent.channels.firehose-channel.dataDirs = /data/flume/data
agent.channels.firehose-channel.capacity = 10000
agent.channels.firehose-channel.transactionCapacity = 10000
agent.channels.firehose-channel.useDualCheckpoints=1
agent.channels.firehose-channel.backupCheckpointDir=/data/flume/backup
############################################
# Firehose Sink Configuration - Kafka Sink #
############################################
agent.sinks.kafkasink.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkasink.topic = backoff_test_17
agent.sinks.kafkasink.channel=firehose-channel
agent.sinks.kafkasink.brokerList = sandbox.hortonworks.com:6667
agent.sinks.kafkasink.batchsize = 200
agent.sinks.kafkasink.requiredAcks = 1
agent.sinks.kafkasink.kafka.producer.type = async
agent.sinks.kafkasink.kafka.batch.num.messages = 200
I have also analysed the Flume log and noticed that the Flume metrics show the PUT and TAKE counts correctly.
Please let me know if anyone has any pointers for solving this issue. I appreciate your help in advance.