kafka spout can't connect to kafka topic - apache-kafka

I'm trying to connect a KafkaSpout that belongs to a Storm topology running on a LocalCluster object. I wrote this code according to the documentation I found at https://github.com/apache/storm/tree/master/external/storm-kafka.
private static final String brokerZkStr = "localhost:2181";
private static final String topic = "/test-topic-multi";

public void startTopology()
{
    BrokerHosts hosts = new ZkHosts(brokerZkStr);
    SpoutConfig conf = new SpoutConfig(hosts, topic, "localhost:2181", UUID.randomUUID().toString());
    KafkaSpout kafkaSpout = new KafkaSpout(conf);

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("kafka-spout", kafkaSpout);

    Config topConfig = new Config();
    topConfig.setDebug(true);

    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("HelloStorm", topConfig, builder.createTopology());
}
I want to use a ZooKeeper instance running at localhost:2181, but when I try to run the code I get the following error:
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Invalid path string "/brokers/topics//test-topic-multi/partitions" caused by empty node name specified @16
at storm.kafka.DynamicBrokersReader.getBrokerInfo(DynamicBrokersReader.java:81)
at storm.kafka.trident.ZkBrokerReader.<init>(ZkBrokerReader.java:42)
at storm.kafka.KafkaUtils.makeBrokerReader(KafkaUtils.java:57)
at storm.kafka.KafkaSpout.open(KafkaSpout.java:87)
It seems to be just a problem of wrong settings, but I can't solve it.
PS: the Kafka configuration is the following: 1 instance of ZooKeeper and 2 brokers running on localhost:9092 and localhost:9093.

I think I solved it. I just messed up the configuration code. The correct version is:
private static final String topic = "test-topic-multi";
....
SpoutConfig conf = new SpoutConfig(hosts, topic, "/" + topic, UUID
.randomUUID().toString());
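The reason this works: the third SpoutConfig argument is the ZooKeeper root path under which the spout stores its consumer offsets, not a host:port string, and a Kafka topic name must not contain a slash. A minimal annotated sketch of the corrected configuration (same values as above):
BrokerHosts hosts = new ZkHosts("localhost:2181");   // ZooKeeper used by the Kafka brokers
String topic = "test-topic-multi";                   // plain topic name, no leading '/'
String zkRoot = "/" + topic;                         // ZooKeeper root where the spout stores offsets
String consumerId = UUID.randomUUID().toString();    // id under which this spout stores its offsets
SpoutConfig conf = new SpoutConfig(hosts, topic, zkRoot, consumerId);
KafkaSpout kafkaSpout = new KafkaSpout(conf);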

Your Kafka topic name is not valid. Why are you attempting to connect to a topic that does not exist?
sql@injection:~$ kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic /test-topic
Error while executing topic command topic name /test-topic is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'
kafka.common.InvalidTopicException: topic name /test-topic is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'
at kafka.common.Topic$.validate(Topic.scala:42)
at kafka.admin.AdminUtils$.createOrUpdateTopicPartitionAssignmentPathInZK(AdminUtils.scala:181)
at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:172)
at kafka.admin.TopicCommand$.createTopic(TopicCommand.scala:93)
at kafka.admin.TopicCommand$.main(TopicCommand.scala:55)
at kafka.admin.TopicCommand.main(TopicCommand.scala)
Are you really sure the forward slash belongs in the topic name?
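For comparison, the same create command should succeed once the leading slash is dropped (ZooKeeper on localhost:2181, as above):
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test-topic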

Related

Creating partition for topic in kafka-node

I have created a HighLevelProducer to publish messages to a topic stream that will be consumed by a ConsumerGroupStream using kafka-node. When I create multiple consumers from the same ConsumerGroup to consume from that same topic, only one partition is created and only one consumer is consuming. I have also tried to define the number of partitions for that topic, although I'm not sure whether it is required to define it when creating the topic and, if so, how many partitions I will need in advance. In addition, is it possible to push an object to the Transform stream and not a string? (I currently use JSON.stringify because otherwise I got [Object object] in the consumer.)
const myProducerStream = ({ kafkaHost, highWaterMark, topic }) => {
    const kafkaClient = new KafkaClient({ kafkaHost });
    const producer = new HighLevelProducer(kafkaClient);
    const options = {
        highWaterMark,
        kafkaClient,
        producer
    };

    kafkaClient.refreshMetadata([topic], err => {
        if (err) throw err;
    });

    return new ProducerStream(options);
};
const transform = topic => new Transform({
    objectMode: true,
    decodeStrings: true,
    transform(obj, encoding, cb) {
        console.log(`pushing message ${JSON.stringify(obj)} to topic "${topic}"`);
        cb(null, {
            topic,
            messages: JSON.stringify(obj)
        });
    }
});
const publisher = (topic, kafkaHost, highWaterMark) => {
    const myTransform = transform(topic);
    const producer = myProducerStream({ kafkaHost, highWaterMark, topic });
    myTransform.pipe(producer);
    return myTransform;
};
The consumer:
const createConsumerStream = (sourceTopic, kafkaHost, groupId) => {
    const consumerOptions = {
        kafkaHost,
        groupId,
        protocol: ['roundrobin'],
        encoding: 'utf8',
        id: uuidv4(),
        fromOffset: 'latest',
        outOfRangeOffset: 'earliest',
    };

    const consumerGroupStream = new ConsumerGroupStream(consumerOptions, sourceTopic);

    consumerGroupStream.on('connect', () => {
        console.log(`Consumer id: "${consumerOptions.id}" is connected!`);
    });

    consumerGroupStream.on('error', (err) => {
        console.error(`Consumer id: "${consumerOptions.id}" encountered an error: ${err}`);
    });

    return consumerGroupStream;
};
const publisher = (func, destTopic, consumerGroupStream, kafkaHost, highWaterMark) => {
    const messageTransform = new AsyncMessageTransform(func, destTopic);
    const resultProducerStream = myProducerStream({ kafkaHost, highWaterMark, topic: destTopic });
    consumerGroupStream.pipe(messageTransform).pipe(resultProducerStream);
};
For the first question:
The maximum number of working consumers in a group equals the number of partitions.
So if you have TopicA with 1 partition and you have 5 consumers in your consumer group, 4 of them will be idle.
If you have TopicA with 5 partitions and you have 5 consumers in your consumer group, all of them will be active and consuming messages from your topic.
To specify the number of partitions, you should create the topic from the CLI instead of expecting Kafka to create it when you first publish messages.
To create a topic with a specific number of partitions:
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 3 --topic test
To alter the number of partitions in an already existing topic:
bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic test --partitions 40
Please note that you can only increase the number of partitions; you cannot decrease them.
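If you prefer to do this from code rather than the CLI, here is a minimal sketch using Kafka's Java AdminClient (broker address, topic name, and partition counts are placeholders):
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;
import org.apache.kafka.clients.admin.NewTopic;

public class TopicAdmin {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // create a topic with 3 partitions and replication factor 3
            admin.createTopics(Collections.singleton(new NewTopic("test", 3, (short) 3))).all().get();
            // later: increase the partition count (it can never be decreased)
            admin.createPartitions(Collections.singletonMap("test", NewPartitions.increaseTo(40))).all().get();
        }
    }
}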
Please refer to the Kafka docs: https://kafka.apache.org/documentation.html
Also, if you'd like to understand more about Kafka, check out the free book https://www.confluent.io/resources/kafka-the-definitive-guide/

How to send messages from Kafka producer on one PC to kafka broker on another PC?

I'm trying to send messages from a Kafka producer on one PC to a Kafka broker on another PC over a Wi-Fi interface, but the messages don't appear in the specified topic at the Kafka broker.
I connected the two PCs using an ASUS wireless router and disabled all the firewalls on the PCs and the router. Both PCs ping each other successfully. When I switch to a wired connection, it works and messages are ingested into the specified topic on the Kafka broker PC.
Kafka Producer:
public class CarDataProducer {
    public static void main(String[] args) {
        CarDataProducer fProducer = new CarDataProducer();
        Producer<String, CarData> producer = fProducer.initializeKafkaProducer();
        String topicName = "IN-DATA";
        CSVReaderCarData csvReader = new CSVReaderCarData();
        List<CarData> carDataList = csvReader.readCarDataFromCSV("data/mllib/TrainTest_101.csv");
        // read from CSV file and send
        for (CarData val : carDataList) {
            producer.send(new ProducerRecord<String, CarData>(topicName, val));
        }
    }

    public KafkaProducer<String, CarData> initializeKafkaProducer() {
        // Set the producer configuration properties.
        Properties props = ProducerProperties.getInstance();
        // Instantiate a producer
        KafkaProducer<String, CarData> producer = new KafkaProducer<String, CarData>(props);
        return producer;
    }
}

public class ProducerProperties {
    private ProducerProperties() {
    }

    public static final Properties props = new Properties();

    static {
        props.put("bootstrap.servers", "192.168.1.124:9092");
        props.put("acks", "0");
        props.put("retries", 0);
        props.put("batch.size", 500);
        props.put("linger.ms", 500);
        props.put("buffer.memory", 500);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "com.iov.safety.vehicleproducer.CarDataSerializer");
    }

    public static Properties getInstance() {
        return props;
    }
}
Check message reception with the Kafka console consumer on the server side:
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic IN-DATA
The reception of messages should look like this:
kafka-console-consumer.sh --bootstrap-server 192.168.1.124:9092 --topic IN-DATA
{"instSpeed":19.0,"time":15.0,"label":0.0}
{"instSpeed":64.0,"time":15.0,"label":1.0}
{"instSpeed":10.0,"time":16.0,"label":0.0}
ifconfig on server side
ipconfig on kafka producer side
server.properties:
listeners=PLAINTEXT://:9092
netstat -ano | grep '9092'
tcp6  0  0  :::9092          :::*             LISTEN       off (0.00/0/0)
tcp6  0  0  127.0.0.1:53880  127.0.1.1:9092   ESTABLISHED  keepalive (6659.53/0/0)
tcp6  0  0  127.0.1.1:9092   127.0.0.1:53880  ESTABLISHED  keepalive (6659.53/0/0)
tcp6  0  0  127.0.1.1:9092   127.0.0.1:53878  ESTABLISHED  keepalive (6659.15/0/0)
tcp6  0  0  127.0.0.1:53878  127.0.1.1:9092   ESTABLISHED  keepalive (6659.15/0/0)
By adding a callback to the Kafka producer's send, I get a timeout error:
org.apache.kafka.common.errors.TimeoutException: Expiring 8 record(s) for IN-DATA-0: 30045 ms has passed since last append
I solved it. Each Kafka broker has to advertise its hostname/IP so that it is reachable from a Kafka producer on another PC.
kafka-server-start.sh config/server.properties --override advertised.listeners=PLAINTEXT://192.168.1.124:9092
Alternatively, we can update config/server.properties as follows:
advertised.listeners=PLAINTEXT://your.host.name:9092
or
advertised.listeners=PLAINTEXT://192.168.1.124:9092
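As a side note, attaching a callback to producer.send() (as the question's edit does) is the easiest way to surface such connectivity errors. A minimal sketch, reusing the producer, topicName, and val variables from the question's code:
producer.send(new ProducerRecord<String, CarData>(topicName, val), (metadata, exception) -> {
    if (exception != null) {
        // e.g. org.apache.kafka.common.errors.TimeoutException when the broker is unreachable
        exception.printStackTrace();
    } else {
        System.out.println("sent to " + metadata.topic() + "-" + metadata.partition() + "@" + metadata.offset());
    }
});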

Configure log4j.properties for Kafka appender, error when parsing property bootstrap.servers

I want to add a Kafka appender to the audit-hdfs log in a Cloudera cluster.
I have successfully configured a log4j2.xml file with a Kafka appender; I need to convert this log4j2.xml into a log4j2.properties file in order to merge it with the HDFS log's log4j2.properties file. I am unable to do this because when I launch my dummy process with log4j2.properties instead of the XML, I get an error.
I have tried writing the properties file in several different ways, always resulting in problems with the bootstrap.servers property.
This is my properties file
filters = threshold
filter.threshold.type = ThresholdFilter
filter.threshold.level = ALL
appenders = console,kafka
appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %m%n
appender.kafka.type = Kafka
appender.kafka.name = kafka
appender.kafka.layout.type = PatternLayout
appender.kafka.layout.pattern =%m%n
appender.kafka.property.type = Property
appender.kafka.property.bootstrap.servers = ip:port
appender.kafka.topic = cdh-audit-hdfs
Here the problem is in this line:
appender.kafka.property.bootstrap.servers = ip:port
I have tried the following, to no avail:
appender.kafka.property.bootstrap.servers = ip:port
appender.kafka.property.bootstrap\.servers = ip:port
appender.kafka.property.name = "bootstrap.servers"
appender.kafka.property.bootstrap.servers = ip:port
appender.kafka.property.key = "bootstrap.servers"
appender.kafka.property.value = ip:port
etc...
This is my dummy process:
package blabla

import org.apache.logging.log4j.LogManager

object dummy extends App {
  val logger = LogManager.getLogger
  val record = "...c"
  while (true) {
    logger.info(record)
    Thread.sleep(5000)
  }
}
How do I need to configure my log4j2.properties in order to be able to define this property?
I'm expecting this process to write the record to my Kafka topic, but instead I get errors like this:
Exception in thread "main" org.apache.logging.log4j.core.config.ConfigurationException: No type attribute provided for component bootstrap
Exception in thread "main" org.apache.kafka.common.config.ConfigException: Missing required configuration "bootstrap.servers" which has no default value.
Try this solution; it works for me:
appender.kafka.property.type=Property
appender.kafka.property.name=bootstrap.servers
appender.kafka.property.value=host:port
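Merged into the configuration above, the Kafka appender section would then look like this (ip:port is kept as a placeholder, as in the question):
appender.kafka.type = Kafka
appender.kafka.name = kafka
appender.kafka.layout.type = PatternLayout
appender.kafka.layout.pattern = %m%n
appender.kafka.topic = cdh-audit-hdfs
appender.kafka.property.type = Property
appender.kafka.property.name = bootstrap.servers
appender.kafka.property.value = ip:port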

Producing from localhost to Kafka in HDP Sandbox 2.6.5 not working

I am writing a Kafka client producer as follows:
public class BasicProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, 0);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
        //props.put(ProducerConfig.
        props.put("batch.size", "16384"); // batch size in bytes
        Producer<String, String> producer = new KafkaProducer<String, String>(props);
        TestCallback callback = new TestCallback();
        Random rnd = new Random();
        for (long i = 0; i < 2; i++) {
            //ProducerRecord<String, String> data = new ProducerRecord<String, String>("dke", "key-" + i, "message-" + i);
            // Topic and message
            ProducerRecord<String, String> data = new ProducerRecord<String, String>("dke", "" + i);
            producer.send(data, callback);
        }
        producer.close();
    }

    private static class TestCallback implements Callback {
        @Override
        public void onCompletion(RecordMetadata recordMetadata, Exception e) {
            if (e != null) {
                System.out.println("Error while producing message to topic :" + recordMetadata);
                e.printStackTrace();
            } else {
                String message = String.format("sent message to topic:%s partition:%s offset:%s", recordMetadata.topic(), recordMetadata.partition(), recordMetadata.offset());
                System.out.println(message);
            }
        }
    }
}
OUTPUT:
Error while producing message to topic :null
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
NOTE:
Broker port: localhost:6667 is working.
In your properties, for BOOTSTRAP_SERVERS_CONFIG try changing the port number to 6667.
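For example, with the question's setup that would be (port 6667 is the HDP default explained in the next answer):
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:6667");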
I use Apache Kafka on a Hortonworks (HDP 2.x release) installation. The error message encountered means that the Kafka producer was not able to push the data to the segment log file. From a command-line console, that would mean one of two things:
You are using an incorrect port for the brokers
Your listener config in server.properties is not working
If you encounter the error message while writing via the Scala API, additionally check the connection to the Kafka cluster using telnet <cluster-host> <broker-port>.
NOTE: If you are using the Scala API to create the topic, it takes some time for the brokers to learn about the newly created topic. So, immediately after topic creation, the producers might fail with the error Failed to update metadata after 60000 ms.
I did the following checks in order to resolve this issue:
The first thing I noticed when checking via Ambari is that Kafka brokers listen on port 6667 on HDP 2.x (vanilla Apache Kafka uses 9092).
listeners=PLAINTEXT://localhost:6667
Next, use the IP instead of localhost.
I executed netstat -na | grep 6667
tcp 0 0 192.30.1.5:6667 0.0.0.0:* LISTEN
tcp 1 0 192.30.1.5:52242 192.30.1.5:6667 CLOSE_WAIT
tcp 0 0 192.30.1.5:54454 192.30.1.5:6667 TIME_WAIT
So, I modified the producer call to use the IP and not localhost:
./kafka-console-producer.sh --broker-list 192.30.1.5:6667 --topic rdl_test_2
To check whether new records are being written, monitor the /kafka-logs folder.
cd /kafka-logs/<topic name>/
ls -lart
-rw-r--r--. 1 kafka hadoop 0 Feb 10 07:24 00000000000000000000.log
-rw-r--r--. 1 kafka hadoop 10485756 Feb 10 07:24 00000000000000000000.timeindex
-rw-r--r--. 1 kafka hadoop 10485760 Feb 10 07:24 00000000000000000000.index
Once the producer successfully writes, the segment log file 00000000000000000000.log will grow in size.
See the size below:
-rw-r--r--. 1 kafka hadoop 10485760 Feb 10 07:24 00000000000000000000.index
-rw-r--r--. 1 kafka hadoop       45 Feb 10 09:16 00000000000000000000.log
-rw-r--r--. 1 kafka hadoop 10485756 Feb 10 07:24 00000000000000000000.timeindex
At this point, you can run the consumer-console.sh:
./kafka-console-consumer.sh --bootstrap-server 192.30.1.5:6667 --topic rdl_test_2 --from-beginning
The response is: hello world
After this step, if you want to produce messages via the Scala API, change the listeners value (from localhost to a public IP) and restart the Kafka brokers via Ambari:
listeners=PLAINTEXT://192.30.1.5:6667
A sample producer looks as follows:
package com.scalakafka.sample

import java.util.Properties
import java.util.concurrent.TimeUnit

import org.apache.kafka.clients.producer.{ProducerRecord, KafkaProducer}
import org.apache.kafka.common.serialization.{StringSerializer, StringDeserializer}

class SampleKafkaProducer {

  case class KafkaProducerConfigs(brokerList: String = "192.30.1.5:6667") {
    val properties = new Properties()
    val batchsize: java.lang.Integer = 1
    properties.put("bootstrap.servers", brokerList)
    properties.put("key.serializer", classOf[StringSerializer])
    properties.put("value.serializer", classOf[StringSerializer])
    // properties.put("serializer.class", classOf[StringDeserializer])
    properties.put("batch.size", batchsize)
    // properties.put("linger.ms", 1)
    // properties.put("buffer.memory", 33554432)
  }

  val producer = new KafkaProducer[String, String](KafkaProducerConfigs().properties)

  def produce(topic: String, messages: Iterable[String]): Unit = {
    messages.foreach { m =>
      println(s"Sending $topic and message is $m")
      val result = producer.send(new ProducerRecord(topic, m)).get()
      println(s"the write status is ${result}")
    }
    producer.flush()
    producer.close(10L, TimeUnit.MILLISECONDS)
  }
}
Hope this helps someone.

How to produce messages with headers in Kafka 0.11 using console producer?

How to produce messages with headers in Kafka 0.11 using console producer?
I didn't find any description of this in the Kafka documentation.
Update: Since Kafka 3.2, you can produce records with headers using the kafka-console-producer.sh tool. For details, see https://cwiki.apache.org/confluence/display/KAFKA/KIP-798%3A+Add+possibility+to+write+kafka+headers+in+Kafka+Console+Producer
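A sketch of that syntax, assuming the KIP-798 defaults (headers are given as key:value pairs, separated from the record value by a tab; the property names and delimiters below are taken from the KIP, so check it for the exact details):
kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test --property parse.headers=true
> header_key:header_value<TAB>message value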
Prior to Kafka 3.2, you could not produce messages with headers using the kafka-console-producer.sh tool (ConsoleProducer.scala).
You need to write your own small application. Headers are passed in when creating a ProducerRecord. For example:
// Needed imports (producer API, headers, and producer.properties loading):
import java.io.FileInputStream;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.header.internals.RecordHeader;

public static void main(String[] args) throws Exception {
    Properties producerConfig = new Properties();
    producerConfig.load(new FileInputStream("producer.properties"));
    KafkaProducer<String, String> producer = new KafkaProducer<>(producerConfig);
    List<Header> headers = Arrays.asList(new RecordHeader("header_key", "header_value".getBytes()));
    ProducerRecord<String, String> record = new ProducerRecord<>("topic", 0, "key", "value", headers);
    Future<RecordMetadata> future = producer.send(record);
    future.get();
    producer.close();
}
You can also use kcat to produce messages with headers.
kcat -P -b localhost:9092 -H "facilityCountryCode=US" -H "facilityNum=32619" \
-t test.topic.to.mq -f testkafkaproducerfile.json
For more information, check the GitHub page: https://github.com/edenhill/kcat