Trying the new 0.9 Consumer API and Producer API, but I just can't seem to get it working. I have a producer that produces 100 messages per partition of a topic with two partitions. My code reads like:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "1");
props.put("retries", 0);
props.put("batch.size", 2);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

Producer<String, String> producer = new KafkaProducer<>(props);
System.out.println("Created producer");
// 100 messages into each of the two partitions
for (int i = 0; i < 100; i++) {
    for (int j = 0; j < 2; j++) {
        producer.send(new ProducerRecord<>("burrow_test_2", j, "M_" + i + "_" + j + "_msg",
                "M_" + i + "_" + j + "_msg"));
        System.out.println("Sending msg " + i + " into partition " + j);
    }
}
System.out.println("Sent 200 msgs");
System.out.println("Closing producer");
producer.close();
System.out.println("Closed producer");
Now producer.close() takes a really long time to return; once it does, I assume any buffered records have been flushed and written to the tail of the log.
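An explicit way to do that with the 0.9 API (a sketch, not what the code above currently does) is to flush before closing, and to bound how long close() may block:

producer.flush();                      // blocks until all buffered records have been sent
producer.close(30, TimeUnit.SECONDS);  // requires java.util.concurrent.TimeUnit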
Now for my consumer: I would like to read a specified number of messages before quitting. I chose to commit offsets manually as I read. The code reads like:
int noOfMessagesToRead = Integer.parseInt(args[0]);
String groupName = args[1];
System.out.println("Reading " + noOfMessagesToRead + " for group " + groupName);

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", groupName);
//props.put("enable.auto.commit", "true");
//props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("burrow_test_2"));
//consumer.seekToEnd(new TopicPartition("burrow_test_2", 0), new TopicPartition("burrow_test_2", 1));

int noOfMessagesRead = 0;
while (noOfMessagesRead != noOfMessagesToRead) {
    System.out.println("Polling....");
    ConsumerRecords<String, String> records = consumer.poll(0);
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset = %d, key = %s, value = %s", record.offset(), record.key(),
                record.value());
        noOfMessagesRead++;
    }
    consumer.commitSync();
}
But my consumer is always stuck on the poll call (it never returns, even though I have specified a timeout of 0).
Now, just to confirm, I tried consuming with the console consumer that ships with Kafka:
bin/kafka-console-consumer.sh --from-beginning --new-consumer --topic burrow_test_2 --bootstrap-server localhost:9092
This seems to consume only the messages produced before I started the Java producer.
So my questions are:
What is the problem with the Java producer above?
Why is the consumer not polling anything? (I have older messages that are being read just fine by the console consumer.)
UPDATE: My bad. I had port forwarding enabled from localhost to a remote machine running Kafka and ZooKeeper. With that in place there seems to be some issue; running directly against localhost seemed to produce the messages.
My only remaining question is that with the consumer API, I am unable to seek to an offset. I tried both the seekToEnd and seekToBeginning methods; both threw an exception saying no record found. So what is the way for the consumer to seek to an offset? Is the auto.offset.reset consumer property the only option?
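For reference (an assumption on my part, not code from the post): seekToBeginning/seekToEnd only work on partitions that are currently assigned to the consumer, so the usual pattern is to either assign the partitions manually or poll once before seeking:

TopicPartition p0 = new TopicPartition("burrow_test_2", 0);
TopicPartition p1 = new TopicPartition("burrow_test_2", 1);
consumer.assign(Arrays.asList(p0, p1)); // manual assignment instead of subscribe()
consumer.seekToBeginning(p0, p1);       // the 0.9 API takes varargs; newer clients take a Collection
// With subscribe(), call poll() once so the rebalance assigns partitions, then seek.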
Related
I'm trying to send messages from a Kafka producer on one PC to a Kafka broker on another PC over Wi-Fi, but the messages don't appear in the specified topic on the Kafka broker.
I connected the two PCs using an ASUS wireless router and disabled all the firewalls on the PCs and the router. Both PCs can ping each other successfully. When I switch to a wired connection, it works and messages are ingested into the specified topic on the Kafka broker PC.
Kafka Producer:
public class CarDataProducer {

    public static void main(String[] args) {
        CarDataProducer fProducer = new CarDataProducer();
        Producer<String, CarData> producer = fProducer.initializeKafkaProducer();
        String topicName = "IN-DATA";
        CSVReaderCarData csvReader = new CSVReaderCarData();
        List<CarData> carDataList = csvReader.readCarDataFromCSV("data/mllib/TrainTest_101.csv");
        // read from the CSV file and send
        for (CarData val : carDataList) {
            producer.send(new ProducerRecord<String, CarData>(topicName, val));
        }
    }

    public KafkaProducer<String, CarData> initializeKafkaProducer() {
        // Set the producer configuration properties.
        Properties props = ProducerProperties.getInstance();
        // Instantiate a producer.
        KafkaProducer<String, CarData> producer = new KafkaProducer<String, CarData>(props);
        return producer;
    }
}

public class ProducerProperties {

    private ProducerProperties() {
    }

    public static final Properties props = new Properties();

    static {
        props.put("bootstrap.servers", "192.168.1.124:9092");
        props.put("acks", "0");
        props.put("retries", 0);
        props.put("batch.size", 500);
        props.put("linger.ms", 500);
        props.put("buffer.memory", 500);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "com.iov.safety.vehicleproducer.CarDataSerializer");
    }

    public static Properties getInstance() {
        return props;
    }
}
I check message reception with the Kafka console consumer on the server side:
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic IN-DATA
The reception of messages should look like this:
kafka-console-consumer.sh --bootstrap-server 192.168.1.124:9092 --topic IN-DATA
{"instSpeed":19.0,"time":15.0,"label":0.0}
{"instSpeed":64.0,"time":15.0,"label":1.0}
{"instSpeed":10.0,"time":16.0,"label":0.0}
ifconfig on server side
ipconfig on kafka producer side
server.properties:
listeners=PLAINTEXT://:9092
netstat -ano|grep '9092'
tcp6  0  0  :::9092          :::*             LISTEN       off (0.00/0/0)
tcp6  0  0  127.0.0.1:53880  127.0.1.1:9092   ESTABLISHED  keepalive (6659.53/0/0)
tcp6  0  0  127.0.1.1:9092   127.0.0.1:53880  ESTABLISHED  keepalive (6659.53/0/0)
tcp6  0  0  127.0.1.1:9092   127.0.0.1:53878  ESTABLISHED  keepalive (6659.15/0/0)
tcp6  0  0  127.0.0.1:53878  127.0.1.1:9092   ESTABLISHED  keepalive (6659.15/0/0)
By adding a callback to the producer's send(), I get a timeout error:
org.apache.kafka.common.errors.TimeoutException: Expiring 8 record(s) for IN-DATA-0: 30045 ms has passed since last append
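For reference, the callback mentioned above was along these lines (a sketch; the handler body here is illustrative, not the original code):

// requires org.apache.kafka.clients.producer.Callback and RecordMetadata
producer.send(new ProducerRecord<String, CarData>(topicName, val), new Callback() {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        if (exception != null) {
            exception.printStackTrace(); // e.g. the TimeoutException shown above
        } else {
            System.out.println("Wrote to " + metadata.topic() + "-" + metadata.partition()
                    + " at offset " + metadata.offset());
        }
    }
});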
I solved it. Each Kafka broker has to advertise its hostname/IP so that it is reachable from a Kafka producer on another PC.
kafka-server-start.sh config/server.properties --override advertised.listeners=PLAINTEXT://192.168.1.124:9092
Instead, we can update config/server.properties as follows:
advertised.listeners=PLAINTEXT://your.host.name:9092
or
advertised.listeners=PLAINTEXT://192.168.1.124:9092
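A quick way to verify the advertised address from a different machine on the network is to point the console producer at it and type a test message (assuming the 0.x/1.x tooling, which uses --broker-list):

kafka-console-producer.sh --broker-list 192.168.1.124:9092 --topic IN-DATA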
I am writing a Kafka client producer as follows:
public class BasicProducerExample {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, 0);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
        props.put("batch.size", "16384"); // batch size in bytes (not a per-message size limit)

        Producer<String, String> producer = new KafkaProducer<String, String>(props);
        TestCallback callback = new TestCallback();
        Random rnd = new Random();
        for (long i = 0; i < 2; i++) {
            //ProducerRecord<String, String> data = new ProducerRecord<String, String>("dke", "key-" + i, "message-" + i);
            // topic and message
            ProducerRecord<String, String> data = new ProducerRecord<String, String>("dke", "" + i);
            producer.send(data, callback);
        }
        producer.close();
    }

    private static class TestCallback implements Callback {
        @Override
        public void onCompletion(RecordMetadata recordMetadata, Exception e) {
            if (e != null) {
                System.out.println("Error while producing message to topic :" + recordMetadata);
                e.printStackTrace();
            } else {
                String message = String.format("sent message to topic:%s partition:%s offset:%s",
                        recordMetadata.topic(), recordMetadata.partition(), recordMetadata.offset());
                System.out.println(message);
            }
        }
    }
}
OUTPUT:
Error while producing message to topic :null
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
NOTE: the broker port localhost:6667 is working.
In your property for BOOTSTRAP_SERVERS_CONFIG, try changing the port number to 6667.
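For example (assuming the broker really is reachable on 127.0.0.1:6667):

props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:6667");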
I use Apache Kafka on a Hortonworks (HDP 2.x) installation. The error message encountered means that the Kafka producer was not able to push the data to the segment log file. From a command-line console, that would mean one of two things:
You are using an incorrect port for the brokers
Your listener config in server.properties is not working
If you encounter the error message while writing via the Scala API, additionally check the connection to the Kafka cluster using telnet <cluster-host> <broker-port>.
NOTE: If you are using the Scala API to create the topic, it takes some time for the brokers to learn about the newly created topic. So, immediately after topic creation, the producer might fail with the error Failed to update metadata after 60000 ms.
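If the failure only happens right after topic creation, one workaround (my suggestion, not part of the original checks below) is to give the producer more time to fetch metadata; max.block.ms defaults to 60000 ms, which is exactly the timeout in the error above:

props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "120000"); // illustrative value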
I did the following checks in order to resolve this issue:
The first difference, checked via Ambari, is that Kafka brokers listen on port 6667 on HDP 2.x (stock Apache Kafka uses 9092):
listeners=PLAINTEXT://localhost:6667
Next, use the IP instead of localhost.
I executed netstat -na | grep 6667
tcp 0 0 192.30.1.5:6667 0.0.0.0:* LISTEN
tcp 1 0 192.30.1.5:52242 192.30.1.5:6667 CLOSE_WAIT
tcp 0 0 192.30.1.5:54454 192.30.1.5:6667 TIME_WAIT
So, I modified the producer call to use the IP and not localhost:
./kafka-console-producer.sh --broker-list 192.30.1.5:6667 --topic rdl_test_2
To check whether new records are being written, monitor the /kafka-logs folder:
cd /kafka-logs/<topic name>/
ls -lart
-rw-r--r--. 1 kafka hadoop 0 Feb 10 07:24 00000000000000000000.log
-rw-r--r--. 1 kafka hadoop 10485756 Feb 10 07:24 00000000000000000000.timeindex
-rw-r--r--. 1 kafka hadoop 10485760 Feb 10 07:24 00000000000000000000.index
Once the producer writes successfully, the segment log file 00000000000000000000.log will grow in size.
See the size below:
-rw-r--r--. 1 kafka hadoop 10485760 Feb 10 07:24 00000000000000000000.index
-rw-r--r--. 1 kafka hadoop       45 Feb 10 09:16 00000000000000000000.log
-rw-r--r--. 1 kafka hadoop 10485756 Feb 10 07:24 00000000000000000000.timeindex
At this point, you can run kafka-console-consumer.sh:
./kafka-console-consumer.sh --bootstrap-server 192.30.1.5:6667 --topic rdl_test_2 --from-beginning
The response is hello world.
After this step, if you want to produce messages via the Scala API, change the listeners value (from localhost to a public IP) and restart the Kafka brokers via Ambari:
listeners=PLAINTEXT://192.30.1.5:6667
A sample producer is as follows:
package com.scalakafka.sample

import java.util.Properties
import java.util.concurrent.TimeUnit

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}

class SampleKafkaProducer {

  case class KafkaProducerConfigs(brokerList: String = "192.30.1.5:6667") {
    val properties = new Properties()
    val batchsize: java.lang.Integer = 1
    properties.put("bootstrap.servers", brokerList)
    properties.put("key.serializer", classOf[StringSerializer])
    properties.put("value.serializer", classOf[StringSerializer])
    // properties.put("serializer.class", classOf[StringDeserializer])
    properties.put("batch.size", batchsize)
    // properties.put("linger.ms", 1)
    // properties.put("buffer.memory", 33554432)
  }

  val producer = new KafkaProducer[String, String](KafkaProducerConfigs().properties)

  def produce(topic: String, messages: Iterable[String]): Unit = {
    messages.foreach { m =>
      println(s"Sending $topic and message is $m")
      val result = producer.send(new ProducerRecord(topic, m)).get()
      println(s"the write status is ${result}")
    }
    producer.flush()
    producer.close(10L, TimeUnit.MILLISECONDS)
  }
}
Hope this helps someone.
How to produce messages with headers in Kafka 0.11 using console producer?
I didn't find any description of this in the Kafka documentation.
Update: Since Kafka 3.2, you can produce records with headers using the kafka-console-producer.sh tool. For details, see https://cwiki.apache.org/confluence/display/KAFKA/KIP-798%3A+Add+possibility+to+write+kafka+headers+in+Kafka+Console+Producer
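With that version, the console producer can be told to parse headers out of each input line; a sketch based on KIP-798 (the property name and the default tab delimiter are taken from the KIP, so double-check against your version):

kafka-console-producer.sh --bootstrap-server localhost:9092 --topic my_topic --property parse.headers=true

Each input line is then expected as header_key1:header_value1,header_key2:header_value2 followed by a tab and the record value.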
In Kafka 0.11, you cannot produce messages with headers using the kafka-console-producer.sh tool (ConsoleProducer.scala).
You need to write your own small application. Headers are passed in when creating a ProducerRecord. For example:
// Header comes from org.apache.kafka.common.header.Header and
// RecordHeader from org.apache.kafka.common.header.internals.RecordHeader.
public static void main(String[] args) throws Exception {
    Properties producerConfig = new Properties();
    producerConfig.load(new FileInputStream("producer.properties"));
    KafkaProducer<String, String> producer = new KafkaProducer<>(producerConfig);

    List<Header> headers = Arrays.asList(new RecordHeader("header_key", "header_value".getBytes()));
    ProducerRecord<String, String> record = new ProducerRecord<>("topic", 0, "key", "value", headers);
    Future<RecordMetadata> future = producer.send(record);
    future.get();

    producer.close();
}
You can also use kcat to produce messages with headers.
kcat -P -b localhost:9092 -H "facilityCountryCode=US" -H "facilityNum=32619" \
-t test.topic.to.mq -f testkafkaproducerfile.json
For more information, check the GitHub page: https://github.com/edenhill/kcat
I'm using the Kafka (version 0.11.0.2) server API to start a Kafka broker on localhost. It runs without any problem and the producer can send messages successfully, but the consumer can't get messages and there is no error logged to the console. When I debugged the code, it was looping on "refreshing metadata".
Here is the relevant source code:
while (coordinatorUnknown()) {
    RequestFuture<Void> future = lookupCoordinator();
    client.poll(future, remainingMs);

    if (future.failed()) {
        if (future.isRetriable()) {
            remainingMs = timeoutMs - (time.milliseconds() - startTimeMs);
            if (remainingMs <= 0)
                break;

            log.debug("Coordinator discovery failed for group {}, refreshing metadata", groupId);
            client.awaitMetadataUpdate(remainingMs);
        } else
            throw future.exception();
    } else if (coordinator != null && client.connectionFailed(coordinator)) {
        // we found the coordinator, but the connection has failed, so mark
        // it dead and backoff before retrying discovery
        coordinatorDead();
        time.sleep(retryBackoffMs);
    }

    remainingMs = timeoutMs - (time.milliseconds() - startTimeMs);
    if (remainingMs <= 0)
        break;
}
Addition: when I change the Kafka version to 0.10.x, it runs OK.
Here is my Kafka server code:
private static void startKafkaLocal() throws Exception {
    final File kafkaTmpLogsDir = File.createTempFile("zk_kafka", "2");
    if (kafkaTmpLogsDir.delete() && kafkaTmpLogsDir.mkdir()) {
        Properties props = new Properties();
        props.setProperty("host.name", KafkaProperties.HOSTNAME);
        props.setProperty("port", String.valueOf(KafkaProperties.KAFKA_SERVER_PORT));
        props.setProperty("broker.id", String.valueOf(KafkaProperties.BROKER_ID));
        props.setProperty("zookeeper.connect", KafkaProperties.ZOOKEEPER_CONNECT);
        props.setProperty("log.dirs", kafkaTmpLogsDir.getAbsolutePath());
        //advertised.listeners=PLAINTEXT://xxx.xx.xx.xx:por
        // flush every message.
        // flush every 1ms
        props.setProperty("log.default.flush.scheduler.interval.ms", "1");
        props.setProperty("log.flush.interval", "1");
        props.setProperty("log.flush.interval.messages", "1");
        props.setProperty("replica.socket.timeout.ms", "1500");
        props.setProperty("auto.create.topics.enable", "true");
        props.setProperty("num.partitions", "1");

        KafkaConfig kafkaConfig = new KafkaConfig(props);
        KafkaServerStartable kafka = new KafkaServerStartable(kafkaConfig);
        kafka.startup();
        System.out.println("start kafka ok " + kafka.serverConfig().numPartitions());
    }
}
Thanks.
With Kafka 0.11, if you set num.partitions to 1 you also need to set the following three settings:
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
That should be obvious from your server logs when running 0.11.
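In the embedded-broker code above, that means adding these to the Properties before building the KafkaConfig (a sketch using the broker setting names listed above):

props.setProperty("offsets.topic.replication.factor", "1");
props.setProperty("transaction.state.log.replication.factor", "1");
props.setProperty("transaction.state.log.min.isr", "1");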
I am new to Storm.
This is my Storm configuration:
val builder = new TopologyBuilder()
builder.setSpout("kafka-spout", new AlgoKafkaSpout().buildKafkaSpout(), 4)
builder.setBolt("wlref-bolt", new WlrefBolt, 4).shuffleGrouping("kafka-spout")
builder.setBolt("params-bolt", new ParamsBolt, 4).shuffleGrouping("wlref-bolt")
builder.setBolt("gender-bolt", new GenderBolt, 4).shuffleGrouping("params-bolt")
builder.setBolt("age-bolt", new AgeBolt, 4).shuffleGrouping("gender-bolt")
builder.setBolt("preference-bolt", new PreferenceBolt, 4).shuffleGrouping("age-bolt")
builder.setBolt("geo-bolt", new GeoBolt, 4).shuffleGrouping("preference-bolt")
builder.setBolt("device-bolt", new DeviceBolt, 4).shuffleGrouping("geo-bolt")
builder.setBolt("druid-bolt", new AlgoBeamBolt[java.util.Map[String, AnyRef]](new MyBeamFactory), 4)
.shuffleGrouping("device-bolt")
builder.setBolt("redis-bolt",new RedisBolt,4).shuffleGrouping("druid-bolt")
val conf = new Config()
conf.setDebug(false)
conf.setMessageTimeoutSecs(120)
conf.setNumWorkers(1)
StormSubmitter.submitTopology(args(0), conf, builder.createTopology)
And this is my KafkaSpout:
val hosts = new ZkHosts(s"${Config.zkHost}:${Config.zkPort}")
val topic = Config.kafkaTopic
val zkRoot = s"/$topic"+"7"
val groupId = Config.kafkaGroup
val kafkaConfig = new SpoutConfig(hosts, topic, zkRoot, UUID.randomUUID().toString())
kafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme())
kafkaConfig.startOffsetTime = kafka.api.OffsetRequest.LatestTime
new KafkaSpout(kafkaConfig)
We are producing approximately 400 to 1000 messages per second, and the Kafka topic has 4 partitions.
For a few hours the messages are consumed properly, but after some time the KafkaSpout stops consuming messages from some partitions.
This is the message consumption report:
Please let us know if any other information is required.
EDIT
Kafka Version: kafka_2.11-0.10.1.1
Storm Version: apache-storm-0.10.2
SBT dependencies:
"org.apache.storm" % "storm-core" % "0.10.2" % "provided",
"org.apache.storm" % "storm-kafka" % "0.10.2" /*% "provided"*/,
"org.apache.kafka" %% "kafka" % "0.10.0.1"