Producing from localhost to Kafka in HDP Sandbox 2.6.5 not working - apache-kafka

I am writing a Kafka client producer as follows:
import java.util.Properties;
import java.util.Random;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class BasicProducerExample {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, 0);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
        props.put("batch.size", "16384"); // batch size in bytes

        Producer<String, String> producer = new KafkaProducer<String, String>(props);
        TestCallback callback = new TestCallback();
        Random rnd = new Random();
        for (long i = 0; i < 2; i++) {
            // Topic and message
            //ProducerRecord<String, String> data = new ProducerRecord<String, String>("dke", "key-" + i, "message-" + i);
            ProducerRecord<String, String> data = new ProducerRecord<String, String>("dke", "" + i);
            producer.send(data, callback);
        }
        producer.close();
    }

    private static class TestCallback implements Callback {
        @Override
        public void onCompletion(RecordMetadata recordMetadata, Exception e) {
            if (e != null) {
                System.out.println("Error while producing message to topic :" + recordMetadata);
                e.printStackTrace();
            } else {
                String message = String.format("sent message to topic:%s partition:%s offset:%s",
                        recordMetadata.topic(), recordMetadata.partition(), recordMetadata.offset());
                System.out.println(message);
            }
        }
    }
}
OUTPUT:
Error while producing message to topic :null
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
NOTE:
Broker port: localhost:6667 is working.

In your property for BOOTSTRAP_SERVERS_CONFIG, try changing the port number to 6667.
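That is, something like the following (a sketch, given the note above that the broker listens on 6667):
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:6667");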
Thanks.
--
Hiren

I use Apache Kafka on a Hortonworks (HDP 2.x) installation. The error message means that the Kafka producer was not able to push the data to the segment log file. From a command-line console, that would usually mean one of two things:
You are using an incorrect port for the brokers
Your listener config in server.properties is not correct
If you encounter the error message while writing via the Scala API, additionally check the connection to the Kafka cluster using telnet <cluster-host> <broker-port>
NOTE: If you are using the Scala API to create the topic, it takes some time for the brokers to learn about the newly created topic. So, immediately after topic creation, producers might fail with the error Failed to update metadata after 60000 ms.
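As a quick sanity check before producing (a sketch; substitute your own ZooKeeper host and topic name), confirm the topic exists and has a leader:
./kafka-topics.sh --zookeeper <zookeeper-host>:2181 --describe --topic <topic-name>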
I did the following checks in order to resolve this issue:
The first difference, as checked via Ambari, is that Kafka brokers listen on port 6667 on HDP 2.x (vanilla Apache Kafka defaults to 9092):
listeners=PLAINTEXT://localhost:6667
Next, use the IP instead of localhost.
I executed netstat -na | grep 6667
tcp 0 0 192.30.1.5:6667 0.0.0.0:* LISTEN
tcp 1 0 192.30.1.5:52242 192.30.1.5:6667 CLOSE_WAIT
tcp 0 0 192.30.1.5:54454 192.30.1.5:6667 TIME_WAIT
So, I modified the producer call to use the IP and not localhost:
./kafka-console-producer.sh --broker-list 192.30.1.5:6667 --topic rdl_test_2
To check whether new records are being written, monitor the /kafka-logs folder:
cd /kafka-logs/<topic name>/
ls -lart
-rw-r--r--. 1 kafka hadoop 0 Feb 10 07:24 00000000000000000000.log
-rw-r--r--. 1 kafka hadoop 10485756 Feb 10 07:24 00000000000000000000.timeindex
-rw-r--r--. 1 kafka hadoop 10485760 Feb 10 07:24 00000000000000000000.index
Once the producer writes successfully, the segment log file 00000000000000000000.log will grow in size.
See the size below:
-rw-r--r--. 1 kafka hadoop 10485760 Feb 10 07:24 00000000000000000000.index
-rw-r--r--. 1 kafka hadoop 45 Feb 10 09:16 00000000000000000000.log
-rw-r--r--. 1 kafka hadoop 10485756 Feb 10 07:24 00000000000000000000.timeindex
At this point, you can run kafka-console-consumer.sh:
./kafka-console-consumer.sh --bootstrap-server 192.30.1.5:6667 --topic rdl_test_2 --from-beginning
The response is: hello world
After this step, if you want to produce messages via the Scala API, change the listeners value (from localhost to a public IP) and restart the Kafka brokers via Ambari:
listeners=PLAINTEXT://192.30.1.5:6667
A Sample producer will be as follows:
package com.scalakafka.sample

import java.util.Properties
import java.util.concurrent.TimeUnit

import org.apache.kafka.clients.producer.{ProducerRecord, KafkaProducer}
import org.apache.kafka.common.serialization.{StringSerializer, StringDeserializer}

class SampleKafkaProducer {

  case class KafkaProducerConfigs(brokerList: String = "192.30.1.5:6667") {
    val properties = new Properties()
    val batchsize: java.lang.Integer = 1
    properties.put("bootstrap.servers", brokerList)
    properties.put("key.serializer", classOf[StringSerializer])
    properties.put("value.serializer", classOf[StringSerializer])
    // properties.put("serializer.class", classOf[StringDeserializer])
    properties.put("batch.size", batchsize)
    // properties.put("linger.ms", 1)
    // properties.put("buffer.memory", 33554432)
  }

  val producer = new KafkaProducer[String, String](KafkaProducerConfigs().properties)

  def produce(topic: String, messages: Iterable[String]): Unit = {
    messages.foreach { m =>
      println(s"Sending $topic and message is $m")
      val result = producer.send(new ProducerRecord(topic, m)).get()
      println(s"the write status is ${result}")
    }
    producer.flush()
    producer.close(10L, TimeUnit.MILLISECONDS)
  }
}
Hope this helps someone.

Related

partition count reduced at sink during kafka stream record forward

I am using Kafka Streams to process a few Kafka records. I have two nodes: one does some transformation and the other is the final sink.
My topics INTER_TOPIC and FINAL_TOPIC have 20 partitions each, and my producer, which writes to INTER_TOPIC, writes key/value records and its partitioner is round robin.
Below is the code at my intermediate transform node.
public void streamHandler() {
    Properties props = getKafkaProperties();
    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> processStream = builder.stream("INTER_TOPIC",
            Consumed.with(Serdes.String(), Serdes.String()));
    //processStream.peek((key, value) -> System.out.println("key :" + key + " value :" + value));
    processStream.map((key, value) -> getTransformer().transform(key, value))
            .filter((key, value) -> filteroutFailedRequest(key, value))
            .to("FINAL_TOPIC", Produced.with(Serdes.String(), Serdes.String()));
    KafkaStreams IStreams = new KafkaStreams(builder.build(), props);
    IStreams.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
        @Override
        public void uncaughtException(Thread t, Throwable e) {
            logger.error("Thread Name :" + t.getName() + " Error while processing:", e);
        }
    });
    IStreams.cleanUp();
    IStreams.start();
    try {
        System.in.read();
    } catch (IOException e) {
        logger.error("Failed streaming ", e);
    }
}
But my sink is getting data in only 2 partitions, even though I have 20 stream threads configured and I verified my producer is writing to all 20 partitions. How can I tell whether my transform node is forwarding to all 20 partitions of FINAL_TOPIC?
30 Sep 2019 10:39:41,416 INFO c.j.m.s.StreamHandler [289] [streams-user-61a77203-9afc-4c66-843d-94c20a509793-StreamThread-3] Received
30 Sep 2019 10:39:41,416 INFO c.j.m.s.StreamHandler [289] [streams-user-61a77203-9afc-4c66-843d-94c20a509793-StreamThread-4] Received
30 Sep 2019 10:39:41,416 INFO c.j.m.s.StreamHandler [289] [streams-user-61a77203-9afc-4c66-843d-94c20a509793-StreamThread-3] Received
30 Sep 2019 10:39:41,416 INFO c.j.m.s.StreamHandler [289] [streams-user-61a77203-9afc-4c66-843d-94c20a509793-StreamThread-4] Received
30 Sep 2019 10:40:57,427 INFO c.j.m.s.StreamHandler [289] [streams-user-61a77203-9afc-4c66-843d-94c20a509793-StreamThread-3] Received
30 Sep 2019 10:40:57,427 INFO c.j.m.s.StreamHandler [289] [streams-user-61a77203-9afc-4c66-843d-94c20a509793-StreamThread-4] Received
30 Sep 2019 10:40:57,427 INFO c.j.m.s.StreamHandler [289] [streams-user-61a77203-9afc-4c66-843d-94c20a509793-StreamThread-3] Received
30 Sep 2019 10:40:57,427 INFO c.j.m.s.StreamHandler [289] [streams-user-61a77203-9afc-4c66-843d-94c20a509793-StreamThread-4] Received
and partition-er is round robin
Why do you think that the partitioner is round-robin? By default, Kafka Streams applies a hash-based partitioning based on the key.
If you want to change the default partitioner, you can implement interface StreamPartitioner and pass it via:
Produced.with(Serdes.String(), Serdes.String())
.withStreamPartitioner(...)
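If a round-robin distribution over FINAL_TOPIC is really what you want, a minimal sketch of such a partitioner (hypothetical class name; the exact partition(...) signature depends on your Kafka Streams version) could look like:
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.streams.processor.StreamPartitioner;

public class RoundRobinStreamPartitioner implements StreamPartitioner<String, String> {
    private final AtomicInteger counter = new AtomicInteger(0);

    @Override
    public Integer partition(String topic, String key, String value, int numPartitions) {
        // ignore the key and simply cycle through the partitions;
        // mask the sign bit so the index stays non-negative even after overflow
        return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
    }
}
and wire it in via:
.to("FINAL_TOPIC", Produced.with(Serdes.String(), Serdes.String())
        .withStreamPartitioner(new RoundRobinStreamPartitioner()));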

How to send messages from Kafka producer on one PC to kafka broker on another PC?

I'm trying to send messages from a Kafka producer on one PC to a Kafka broker on another PC over Wi-Fi, but the messages don't appear in the specified topic on the Kafka broker.
I connected the two PCs using an ASUS wireless router and disabled all the firewalls on the PCs and the router. Both PCs ping each other successfully. When I switch to a wired connection, it works and messages are ingested into the specified topic on the Kafka broker PC.
Kafka Producer:
public class CarDataProducer {

    public static void main(String[] args) {
        CarDataProducer fProducer = new CarDataProducer();
        Producer<String, CarData> producer = fProducer.initializeKafkaProducer();
        String topicName = "IN-DATA";
        CSVReaderCarData csvReader = new CSVReaderCarData();
        List<CarData> carDataList = csvReader.readCarDataFromCSV("data/mllib/TrainTest_101.csv");
        // read from CSV file and send
        for (CarData val : carDataList) {
            producer.send(new ProducerRecord<String, CarData>(topicName, val));
        }
    }

    public KafkaProducer<String, CarData> initializeKafkaProducer() {
        // Set the producer configuration properties.
        Properties props = ProducerProperties.getInstance();
        // Instantiate a producer
        KafkaProducer<String, CarData> producer = new KafkaProducer<String, CarData>(props);
        return producer;
    }
}

public class ProducerProperties {

    private ProducerProperties() {
    }

    public static final Properties props = new Properties();

    static {
        props.put("bootstrap.servers", "192.168.1.124:9092");
        props.put("acks", "0");
        props.put("retries", 0);
        props.put("batch.size", 500);
        props.put("linger.ms", 500);
        props.put("buffer.memory", 500);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "com.iov.safety.vehicleproducer.CarDataSerializer");
    }

    public static Properties getInstance() {
        return props;
    }
}
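Note that value.serializer above points at a custom class; a serializer like com.iov.safety.vehicleproducer.CarDataSerializer has to implement Kafka's Serializer interface. The following is only an illustrative sketch (the getters and the JSON layout are assumptions based on the console output shown further down):
import java.nio.charset.StandardCharsets;
import java.util.Map;
import org.apache.kafka.common.serialization.Serializer;

public class CarDataSerializer implements Serializer<CarData> {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // no configuration needed
    }

    @Override
    public byte[] serialize(String topic, CarData data) {
        if (data == null) {
            return null;
        }
        // Assumed getters on CarData; adjust to the real class
        String json = String.format("{\"instSpeed\":%s,\"time\":%s,\"label\":%s}",
                data.getInstSpeed(), data.getTime(), data.getLabel());
        return json.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public void close() {
        // nothing to close
    }
}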
Check message reception via Kafka Consumer using console on the server-side:
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic IN-DATA
The reception of messages should look like this:
kafka-console-consumer.sh --bootstrap-server 192.168.1.124:9092 --topic IN-DATA
{"instSpeed":19.0,"time":15.0,"label":0.0}
{"instSpeed":64.0,"time":15.0,"label":1.0}
{"instSpeed":10.0,"time":16.0,"label":0.0}
(ifconfig output on the server side and ipconfig output on the Kafka producer side not shown.)
server.properties:
listeners=PLAINTEXT://:9092
netstat -ano|grep '9092'
tcp6 0 0 :::9092          :::*              LISTEN       off (0.00/0/0)
tcp6 0 0 127.0.0.1:53880  127.0.1.1:9092    ESTABLISHED  keepalive (6659.53/0/0)
tcp6 0 0 127.0.1.1:9092   127.0.0.1:53880   ESTABLISHED  keepalive (6659.53/0/0)
tcp6 0 0 127.0.1.1:9092   127.0.0.1:53878   ESTABLISHED  keepalive (6659.15/0/0)
tcp6 0 0 127.0.0.1:53878  127.0.1.1:9092    ESTABLISHED  keepalive (6659.15/0/0)
By adding a callback to the Kafka producer's send, I get a timeout error:
org.apache.kafka.common.errors.TimeoutException: Expiring 8 record(s) for IN-DATA-0: 30045 ms has passed since last append
I solved it. Each Kafka broker has to advertise its hostname/IP so that it is reachable from a Kafka producer on another PC:
kafka-server-start.sh config/server.properties --override advertised.listeners=PLAINTEXT://192.168.1.124:9092
Alternatively, we can update config/server.properties as follows:
advertised.listeners=PLAINTEXT://your.host.name:9092
or
advertised.listeners=PLAINTEXT://192.168.1.124:9092
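A quick way to verify the fix from the producer PC is to run the console producer against the advertised address and watch the console consumer on the broker PC:
kafka-console-producer.sh --broker-list 192.168.1.124:9092 --topic IN-DATA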

log.debug("Coordinator discovery failed for group {}, refreshing metadata", groupId) with kafka 0.11.0.x

I'm using the Kafka (version 0.11.0.2) server API to start a Kafka broker on localhost. It runs without any problem and the producer can send messages successfully, but the consumer can't get any messages and there is no error logged to the console. So I debugged the code and found it looping on "refreshing metadata".
Here is the source code
while (coordinatorUnknown()) {
    RequestFuture<Void> future = lookupCoordinator();
    client.poll(future, remainingMs);

    if (future.failed()) {
        if (future.isRetriable()) {
            remainingMs = timeoutMs - (time.milliseconds() - startTimeMs);
            if (remainingMs <= 0)
                break;

            log.debug("Coordinator discovery failed for group {}, refreshing metadata", groupId);
            client.awaitMetadataUpdate(remainingMs);
        } else
            throw future.exception();
    } else if (coordinator != null && client.connectionFailed(coordinator)) {
        // we found the coordinator, but the connection has failed, so mark
        // it dead and backoff before retrying discovery
        coordinatorDead();
        time.sleep(retryBackoffMs);
    }

    remainingMs = timeoutMs - (time.milliseconds() - startTimeMs);
    if (remainingMs <= 0)
        break;
}
Addition: when I changed the Kafka version to 0.10.x, it ran OK.
Here is my Kafka server code.
private static void startKafkaLocal() throws Exception {
    final File kafkaTmpLogsDir = File.createTempFile("zk_kafka", "2");
    if (kafkaTmpLogsDir.delete() && kafkaTmpLogsDir.mkdir()) {
        Properties props = new Properties();
        props.setProperty("host.name", KafkaProperties.HOSTNAME);
        props.setProperty("port", String.valueOf(KafkaProperties.KAFKA_SERVER_PORT));
        props.setProperty("broker.id", String.valueOf(KafkaProperties.BROKER_ID));
        props.setProperty("zookeeper.connect", KafkaProperties.ZOOKEEPER_CONNECT);
        props.setProperty("log.dirs", kafkaTmpLogsDir.getAbsolutePath());
        //advertised.listeners=PLAINTEXT://xxx.xx.xx.xx:por
        // flush every message.
        // flush every 1ms
        props.setProperty("log.default.flush.scheduler.interval.ms", "1");
        props.setProperty("log.flush.interval", "1");
        props.setProperty("log.flush.interval.messages", "1");
        props.setProperty("replica.socket.timeout.ms", "1500");
        props.setProperty("auto.create.topics.enable", "true");
        props.setProperty("num.partitions", "1");

        KafkaConfig kafkaConfig = new KafkaConfig(props);
        KafkaServerStartable kafka = new KafkaServerStartable(kafkaConfig);
        kafka.startup();
        System.out.println("start kafka ok " + kafka.serverConfig().numPartitions());
    }
}
Thanks.
With Kafka 0.11, if you set num.partitions to 1 you also need to set the following three settings:
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
That should be obvious from your server logs when running 0.11.
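In the embedded-broker snippet above, that amounts to adding something like the following before building the KafkaConfig:
// Single-broker setup: the internal offsets/transaction topics cannot have more than one replica
props.setProperty("offsets.topic.replication.factor", "1");
props.setProperty("transaction.state.log.replication.factor", "1");
props.setProperty("transaction.state.log.min.isr", "1");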

Kafka 0.9 can't get Producer and Consumer API working

I am trying the new 0.9 Consumer API and Producer API, but I just can't seem to get them working. I have a producer that produces 100 messages per partition of a topic with two partitions. My code reads like:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "1");
props.put("retries", 0);
props.put("batch.size", 2);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

Producer<String, String> producer = new KafkaProducer<>(props);
System.out.println("Created producer");

for (int i = 0; i < 100; i++) {
    for (int j = 0; j < 2; j++) {
        producer.send(new ProducerRecord<>("burrow_test_2", j, "M_" + i + "_" + j + "_msg",
                "M_" + i + "_" + j + "_msg"));
        System.out.println("Sending msg " + i + " into partition " + j);
    }
    System.out.println("Sent 200 msgs");
}

System.out.println("Closing producer");
producer.close();
System.out.println("Closed producer");
Now producer.close takes a really long time to close, after which I assume any buffers are flushed and the messages written to the tail of the log.
Now for my consumer: I would like to read a specified number of messages before quitting. I chose to manually commit the offset as I read. The code reads like:
int noOfMessagesToRead = Integer.parseInt(args[0]);
String groupName = args[1];
System.out.println("Reading " + noOfMessagesToRead + " for group " + groupName);

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", groupName);
//props.put("enable.auto.commit", "true");
//props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("burrow_test_2"));
//consumer.seekToEnd(new TopicPartition("burrow_test_2", 0), new TopicPartition("burrow_test_2", 1));

int noOfMessagesRead = 0;
while (noOfMessagesRead != noOfMessagesToRead) {
    System.out.println("Polling....");
    ConsumerRecords<String, String> records = consumer.poll(0);
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset = %d, key = %s, value = %s", record.offset(), record.key(),
                record.value());
        noOfMessagesRead++;
    }
    consumer.commitSync();
}
}
But my consumer is always stuck on the poll call (never returns even though I have specified timeout as 0).
Now just to confirm, I tried consuming from command line console consumer provided by Kafka bin/kafka-console-consumer.sh --from-beginning --new-consumer --topic burrow_test_2 --bootstrap-server localhost:9092.
This seems to only consume messages produced before I started the Java producer.
So the questions are:
What's the problem with the Java producer above?
Why is the consumer not polling anything (the older messages are being read just fine by the console consumer)?
UPDATE: My bad. I had port forwarding enabled from localhost to a remote machine running Kafka and ZK. When this was the case there seemed to be some issue. Running directly against localhost produced the messages as expected.
My only remaining question is that with the consumer API, I am unable to seek to an offset. I tried both the seekToEnd and seekToBeginning methods; both threw an exception saying no record found. So what is the way for a consumer to seek to an offset? Is the auto.offset.reset consumer property the only option?
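Regarding seeking: one thing to note is that with the new consumer, seek only works on partitions that have actually been assigned to the consumer instance. A sketch of the usual pattern (using the 0.9 API; poll once so the group assignment happens, then seek) would be:
consumer.subscribe(Arrays.asList("burrow_test_2"));
consumer.poll(0); // triggers the group's partition assignment
for (TopicPartition tp : consumer.assignment()) {
    consumer.seekToBeginning(tp); // or consumer.seekToEnd(tp)
}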

Not able to see Kafka messages produced from the Java API in the consumer console

I was trying to produce some messages, put them into a topic, and then fetch the same from the console consumer.
Code used :
import java.util.Date;
import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class SimpleProducer {

    private static Producer<String, String> producer;

    public SimpleProducer() {
        Properties props = new Properties();
        // Set the broker list for requesting metadata to find the lead broker
        props.put("metadata.broker.list", "172.22.96.56:9092,172.22.96.56:9093,172.22.96.56:9094");
        // This specifies the serializer class for keys
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // 1 means the producer receives an acknowledgment once the lead replica
        // has received the data. This option provides better durability as the
        // client waits until the server acknowledges the request as successful.
        props.put("request.required.acks", "1");
        ProducerConfig config = new ProducerConfig(props);
        producer = new Producer<String, String>(config);
    }

    public static void main(String[] args) {
        int argsCount = args.length;
        if (argsCount == 0 || argsCount == 1)
            throw new IllegalArgumentException(
                    "Please provide topic name and Message count as arguments");
        String topic = (String) args[0];
        String count = (String) args[1];
        int messageCount = Integer.parseInt(count);
        System.out.println("Topic Name - " + topic);
        System.out.println("Message Count - " + messageCount);
        SimpleProducer simpleProducer = new SimpleProducer();
        simpleProducer.publishMessage(topic, messageCount);
    }

    private void publishMessage(String topic, int messageCount) {
        for (int mCount = 0; mCount < messageCount; mCount++) {
            String runtime = new Date().toString();
            String msg = "Message Publishing Time - " + runtime;
            System.out.println(msg);
            // Creates a KeyedMessage instance
            KeyedMessage<String, String> data =
                    new KeyedMessage<String, String>(topic, msg);
            // Publish the message
            producer.send(data);
        }
        // Close producer connection with broker.
        producer.close();
    }
}
OUTPUT:
Topic Name - test
Message Count - 10
log4j:WARN No appenders could be found for logger
(kafka.utils.VerifiableProperties).
log4j:WARN Please initialize the log4j system properly.
Message Publishing Time - Tue Feb 16 02:00:56 IST 2016
Message Publishing Time - Tue Feb 16 02:00:56 IST 2016
Message Publishing Time - Tue Feb 16 02:00:56 IST 2016
Message Publishing Time - Tue Feb 16 02:00:56 IST 2016
Message Publishing Time - Tue Feb 16 02:00:56 IST 2016
Message Publishing Time - Tue Feb 16 02:00:56 IST 2016
Message Publishing Time - Tue Feb 16 02:00:56 IST 2016
Message Publishing Time - Tue Feb 16 02:00:56 IST 2016
Message Publishing Time - Tue Feb 16 02:00:56 IST 2016
Message Publishing Time - Tue Feb 16 02:00:56 IST 2016
From the command line I supply the name of the topic as "kafkatopic" and the count of messages as "10". The program runs fine without any exception, but when I try to see the messages from the console, they do not appear. The topic is created.
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic kafkatopic --from-beginning
Can you please help with what went wrong?
Two things I would like to point out:
1) You shouldn't specify --zookeeper here; use the --bootstrap-server argument instead.
2) You should check what the server.properties file says about listeners and advertised.listeners; they should point correctly to the brokers.
I hope this helps.
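For example, a consumer invocation with --bootstrap-server (assuming one of the brokers listed in the producer config above, 172.22.96.56:9092, is reachable) would be:
bin/kafka-console-consumer.sh --bootstrap-server 172.22.96.56:9092 --topic kafkatopic --from-beginning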