KafkaSpout not receiving messages - apache-kafka

I'm trying to integrate Kafka (Kafka_2.10 version 0.8.2.1) with Storm (version 0.9.3) in a Cloudera environment, and have written some code for producers/consumers. I'm able to run the producer code separately with Kafka and see that it works with my consumer code (on the console). I then wrote some code using KafkaSpout and HdfsBolt to write data into HDFS. With this code, I am able to create a topology (and see it in the UI), but the KafkaSpout is not receiving any messages from the producer.
My code snippet is shown below:
public class LoadingData {
public static void main(String[] args) throws AlreadyAliveException, InvalidTopologyException {
String kafkaTopic = "test";
SpoutConfig spoutConfig = new SpoutConfig(new ZkHosts("localhost:2181"),
kafkaTopic, "/kafkastorm", "KafkaSpout");
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("KafkaSpout", new KafkaSpout(spoutConfig),4);
RecordFormat format = new DelimitedRecordFormat().withFieldDelimiter(",");
SyncPolicy syncPolicy = new CountSyncPolicy(10);
FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, Units.MB);
FileNameFormat fileNameFormat = new DefaultFileNameFormat().withPath("/stormstuff");
builder.setBolt("stormbolt", new HdfsBolt()
.withFsUrl("hdfs://localhost:8020")
.withSyncPolicy(syncPolicy)
.withRecordFormat(format)
.withRotationPolicy(rotationPolicy)
.withFileNameFormat(fileNameFormat),1
).shuffleGrouping("KafkaSpout");
String topologyName = "EmployeeTopology";
Config config = new Config();
config.setNumWorkers(1);
StormSubmitter.submitTopology(topologyName, config, builder.createTopology());
}
}
Any ideas/suggestions on what I might be doing wrong? I really appreciate your help! Please let me know if you need any more details.

Related

TopicCommand.alterTopic in Kafka 2.4

I have an old project (it's not mine) and I'm trying to update it from Kafka 2.1 to 2.4.
I have the following piece of code
public synchronized void increasePartitions(String topic, int partitions) throws InvalidPartitionsException, IllegalArgumentException {
StringBuilder commandString = new StringBuilder();
commandString.append("--alter");
commandString.append(" --topic ").append(topic);
commandString.append(" --zookeeper ").append(config.getOrDefault("zookeeper.connect",
"localhost:2181"));
commandString.append(" --partitions ").append(partitions);
String[] command = commandString.toString().split(" ");
TopicCommand.alterTopic(kafkaZkClient, new TopicCommand.TopicCommandOptions(command));
}
It says that the alterTopic method of TopicCommand doesn't exist. I'm looking at the documentation and I don't know how to solve it.
I need this method to do the exact same thing but with Kafka version 2.4.
You should use the Admin API to perform tasks like this.
In order to add partitions, there's the createPartitions() method.
For example, to increase the number of partitions for my-topic to 10:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
Admin admin = Admin.create(props);
Map<String, NewPartitions> newPartitions = new HashMap<>();
newPartitions.put("my-topic", NewPartitions.increaseTo(10));
CreatePartitionsResult createPartitions = admin.createPartitions(newPartitions);
createPartitions.all().get();

My producer can create a topic, but data doesn't seem to be stored inside the broker

My producer can create a topic, but it doesn't seem to store any data in the broker. I can confirm with the kafka-topics script that the topic is created.
When I try to consume with kafka-console-consumer, nothing is consumed. (I am aware of --from-beginning.)
When I produce with kafka-console-producer, my consumer (kafka-console-consumer) consumes the message right away, so something must be wrong with my Java code.
When I run my code against localhost:9092, it works fine, and my consumer code consumes the topic properly. In short, my producer works with the Kafka server on my local machine but not with another Kafka server on a remote machine.
Code :
//this code is inside the main method
Properties properties = new Properties();
//properties.put("bootstrap.servers", "localhost:9092");
//When I used localhost, my consumer code consumes it fine.
properties.put("bootstrap.servers", "192.168.0.30:9092");
properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> kafkaProducer = new KafkaProducer<String, String>(properties);
ProducerRecord<String, String> record = new ProducerRecord<>("test5", "1111","jin1111");
//topic is created, but consumer can't consume any data.
//I tried putting different values for the key and value parameters, but to no avail.
try {
kafkaProducer.send(record);
System.out.println("complete");
} catch (Exception e) {
e.printStackTrace();
} finally {
kafkaProducer.close();
System.out.println("closed");
}
/*//try{
for(int i = 0; i < 10000; i++){
System.out.println(i);
kafkaProducer.send(new ProducerRecord("test", Integer.toString(i), "message - " + i ));
}*/
My CLI (Putty) :
I want to see my consumer consuming when I run my java code. (Those data shown in the image are from the producer script.)
update
After reading the answers and comments, this is what I've tried so far. It is still not consuming any messages. I think the message produced by this code is not being stored in the broker. I tried a different server, too, with the same result: the topic was created, but no consumer shows up in the consumer group list, and no data can be consumed with the consumer script either.
I also tried changing permissions (chown) and editing the /etc/hosts file, but no luck. I'll keep trying until I solve this.
public static void main(String[] args){
Properties properties = new Properties();
//properties.put("bootstrap.servers", "localhost:9092");
properties.put("bootstrap.servers", "192.168.0.30:9092");
properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
properties.put("linger.ms", "1");
properties.put("batch.size", "16384");
properties.put("request.timeout.ms", "30000");
KafkaProducer<String, String> kafkaProducer = new KafkaProducer<String, String>(properties);
ProducerRecord<String, String> record = new ProducerRecord<>("test5", "1111","jin1111");
System.out.println("1");
try {
kafkaProducer.send(record);
//kafkaProducer.send(record).get();
// implement Callback
System.out.println("complete");
kafkaProducer.flush();
System.out.println("flush completed");
} catch (Exception e) {
e.printStackTrace();
} finally {
kafkaProducer.flush();
System.out.println("another flush test");
kafkaProducer.close();
System.out.println("closed");
}
}
When I run this in Eclipse, the console shows :
To complete ppatierno's answer: you should call KafkaProducer.flush() before calling KafkaProducer.close(). It is a blocking call and will not return until all records have been sent.
Yannick
My guess is that your main method exits and the application ends before the Kafka client has sent the message.
The send method is not synchronous. The client buffers messages and sends them once a timeout called the linger time (see linger.ms) expires or the buffer fills to a specific size (see the batch.size parameter, for example). The default linger time is 0, though.
So your main method hands the message to the send method and then exits before the underlying thread in the Kafka client is able to send it.
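To make the buffering behaviour concrete, here is a minimal sketch using only the JDK. BufferedSenderSketch is a hypothetical stand-in written for illustration, not the Kafka client: its send() only queues the message and returns a future, and nothing is delivered until flush() runs. With the real client, the analogous way to wait for delivery is to block on the future returned by send, as in the commented-out kafkaProducer.send(record).get() line in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a buffering producer; this is NOT the Kafka client API.
class BufferedSenderSketch {
    private final List<String> buffer = new ArrayList<>();
    private final List<CompletableFuture<Integer>> pending = new ArrayList<>();
    private final List<String> delivered = new ArrayList<>();

    // Like an asynchronous send(): queue the message and return immediately.
    synchronized CompletableFuture<Integer> send(String message) {
        CompletableFuture<Integer> ack = new CompletableFuture<>();
        buffer.add(message);
        pending.add(ack);
        return ack;
    }

    // Like flush(): deliver everything buffered and complete each future
    // with the position ("offset") its message received.
    synchronized void flush() {
        for (int i = 0; i < buffer.size(); i++) {
            delivered.add(buffer.get(i));
            pending.get(i).complete(delivered.size() - 1);
        }
        buffer.clear();
        pending.clear();
    }

    synchronized int deliveredCount() {
        return delivered.size();
    }

    public static void main(String[] args) {
        BufferedSenderSketch sender = new BufferedSenderSketch();
        CompletableFuture<Integer> ack = sender.send("jin1111");
        // send() has returned, but nothing has been delivered yet.
        System.out.println("delivered before flush: " + sender.deliveredCount());
        System.out.println("acknowledged before flush: " + ack.isDone());
        sender.flush();
        System.out.println("delivered after flush: " + sender.deliveredCount());
        System.out.println("offset: " + ack.join());
    }
}
```

If the program exited right after send() in this sketch, the message would never leave the buffer. Blocking on the returned future also surfaces delivery errors that a fire-and-forget send would hide.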
I finally figured it out. If you run into a similar problem, here are some things you can try.
In your server.properties, uncomment these lines and set the IP and port.
(There seemed to be a problem with the port, so I changed it.)
listeners=PLAINTEXT://192.168.0.30:9093
advertised.listeners=PLAINTEXT://192.168.0.30:9093
(Before restarting your broker with the changed server.properties, you might want to clean out the existing log.dir. Try this if nothing else works.)
Some other things you might want to consider:
Change your log.dir. The default path is usually under /tmp, which is sometimes mounted with noexec, so configure a different location.
Check your /etc/hosts.
Check your permissions, and use chown and chmod to fix them.
Change the ZooKeeper port and the Kafka port if necessary.
Change broker.id.
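Before going through the checklist above, it can help to confirm that the broker port is reachable from the client machine at all. The following is a minimal sketch using only the JDK; the class and method names are made up for illustration, and the address is the one from the listeners line above. Note that a successful TCP connection only shows that something is listening on that port. It does not prove that advertised.listeners is set correctly.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of any Kafka library.
class BrokerPortCheckSketch {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Address assumed from the server.properties lines above; adjust to your broker.
        System.out.println(canConnect("192.168.0.30", 9093, 2000));
    }
}
```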
My working producer code :
public class Producer1 {
public static void main(String[] args){
Properties properties = new Properties();
properties.put("bootstrap.servers", "192.168.0.30:9093");
properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> kafkaProducer = new KafkaProducer<String, String>(properties);
ProducerRecord<String, String> record = new ProducerRecord<>("test", "1","jin");
try {
kafkaProducer.send(record);
System.out.println("complete");
} catch (Exception e) {
e.printStackTrace();
} finally {
kafkaProducer.close();
System.out.println("closed");
}
}
}
working Consumer code:
public class Consumer1 {
public static void main(String[] args) {
Properties props = new Properties();
props.put("bootstrap.servers", "192.168.0.30:9093");
props.put("group.id", "jin");
props.put("auto.offset.reset", "earliest");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Collections.singletonList("test"));
try {
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
for (ConsumerRecord<String, String> record : records){
System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
}
}
} catch (Exception e){
e.printStackTrace();
} finally {
consumer.close();
System.out.println("closed");
}
}
}

does kafka support dotnet core?

I'm trying to make a Kafka producer and consumer, but my project is in .NET Core 2.0 and it doesn't seem to work well with Kafka. This is the proof of concept I've come up with. I'm using Visual Studio 2017 with the kafka-net NuGet package:
using
using KafkaNet;
using KafkaNet.Model;
using KafkaNet.Protocol;
producer
static void Main(string[] args)
{
string payload = "Welcome to Kafka!";
string topic = "IDGTestTopic";
Message msg = new Message(payload);
Uri uri = new Uri("localhost:9092");
var options = new KafkaOptions(uri);
var router = new BrokerRouter(options);
var client = new Producer(router);
client.SendMessageAsync(topic, new List<Message> { msg }).Wait();
Console.ReadLine();
}
consumer
static void Main(string[] args)
{
string topic = "IDGTestTopic";
Uri uri = new Uri("http://localhost:9092");
var options = new KafkaOptions(uri);
var router = new BrokerRouter(options);
var consumer = new Consumer(new ConsumerOptions(topic, router));
foreach (var message in consumer.Consume())
{
Console.WriteLine(Encoding.UTF8.GetString(message.Value));
}
Console.ReadLine();
}
When I try to run the producer first, I get an error message on the BrokerRouter:
$exception {System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values.
Parameter name: port
at System.Net.IPEndPoint..ctor(IPAddress address, Int32 port)
at KafkaNet.DefaultKafkaConnectionFactory.Resolve(Uri kafkaAddress, IKafkaLog log)
at KafkaNet.Model.KafkaOptions.<get_KafkaServerEndpoints>d__0.MoveNext()
at KafkaNet.BrokerRouter..ctor(KafkaOptions kafkaOptions)
at SampleKafkaProducer.Program.Main(String[] args) in C:\v4target\SampleKafka\SampleKafkaProducer\SampleKafkaProducer\Program.cs:line 18} System.ArgumentOutOfRangeException
How is a port of 9092 out of range? My Visual Studio projects are running on ports in the 55000's. Multiple sources I've researched use 9092 as a kafka port.
Does anyone understand the error message? Is part of the main problem because I'm using a version of Kafka not compatible with dotnet core?
The problem is with the uri.
Uri uri = new Uri("localhost:9092");
If you print out the uri.Port, it's -1. Hence the ArgumentOutOfRangeException.
Try this instead:
Uri uri = new Uri("http://localhost:9092");
From the KafkaNet repository, this is how they set up the URI:
var options = new KafkaOptions(new Uri("http://CSDKAFKA01:9092"), new Uri("http://CSDKAFKA02:9092"))
{
Log = new ConsoleLog()
};
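For readers on the JVM side of this thread: java.net.URI has the same pitfall as the .NET Uri described above. A minimal sketch, assuming only the JDK (the class name UriPortSketch is made up for illustration). Without "//", the text before the colon is parsed as the scheme, so no port is found and getPort() reports -1.

```java
import java.net.URI;

// Hypothetical helper for illustration; shows java.net.URI, not the .NET Uri class.
class UriPortSketch {
    // Returns the port java.net.URI extracts from the address, or -1 if it finds none.
    static int portOf(String address) {
        return URI.create(address).getPort();
    }

    public static void main(String[] args) {
        // Without "//", "localhost" is treated as the scheme, so there is no port.
        System.out.println(portOf("localhost:9092"));
        // With a scheme and "//", host and port are parsed as expected.
        System.out.println(portOf("http://localhost:9092"));
    }
}
```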

Kafka Consumer not getting invoked when the kafka Producer is set to Sync

I have a requirement to maintain two topics, one with a synchronous approach and the other with an asynchronous one.
The asynchronous one works as expected and the consumer is invoked for each record; with the synchronous approach, however, the consumer code is not invoked.
Below is the configuration declared in the config file
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9093");
props.put(ProducerConfig.RETRIES_CONFIG, 3);
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.LINGER_MS_CONFIG, 1);
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432);
I have set autoFlush to true here
@Bean( name="KafkaPayloadSyncTemplate")
public KafkaTemplate<String, KafkaPayload> KafkaPayloadSyncTemplate() {
return new KafkaTemplate<String,KafkaPayload>(producerFactory(),true);
}
Control stops after the recordMetadataResults object is returned; no call to the consumer is made
private List<RecordMetadata> sendPayloadToKafkaTopicInSync() throws InterruptedException, ExecutionException {
final List<RecordMetadata> recordMetadataResults = new ArrayList<RecordMetadata>();
KafkaPayload kafkaPayload = constructKafkaPayload();
ListenableFuture<SendResult<String,KafkaPayload>>
future = KafkaPayloadSyncTemplate.send(TestTopic, kafkaPayload);
SendResult<String, KafkaPayload> results;
results = future.get();
recordMetadataResults.add(results.getRecordMetadata());
return recordMetadataResults;
}
Consumer Code
public class KafkaTestListener {
@Autowired
TestServiceImpl TestServiceImpl;
public final CountDownLatch countDownLatch = new CountDownLatch(1);
@KafkaListener(id="POC", topics = "TestTopic", group = "TestGroup")
public void listen(ConsumerRecord<String,KafkaPayload> record, Acknowledgment acknowledgment) {
countDownLatch.countDown();
TestServiceImpl.consumeKafkaMessage(record);
System.out.println("Acknowledgment : " + acknowledgment);
acknowledgment.acknowledge();
}
}
Based on this issue, I have two questions:
Should we manually call listen() inside the listener class when it's a sync producer? If yes, how?
If the listener (@KafkaListener) gets called automatically, what other setup/configuration do I need to add to make this work?
Thanks in advance for your input.
-Srikant
Make sure you set consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); in the consumer properties.
I'm not sure what you mean by sync/async, but producing and consuming are entirely separate operations. You can't affect the consumer from the producer side, because the Kafka broker sits between them.

how can I achieve fields grouping between KafkaSpout and the bolts

I have been sending JSON messages to a Kafka topic, and I am using KafkaSpout to fetch the messages from Kafka and send them to a bolt using shuffle grouping. Now I want to use fields grouping between the KafkaSpout and the bolt instead. Can anyone help me with this? How can I achieve fields grouping between KafkaSpout and the bolts?
You need to implement the backtype.storm.spout.Scheme interface. Basically, it looks something like this:
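import backtype.storm.spout.Scheme;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import org.json.simple.JSONObject;
import org.json.simple.JSONValue;
import org.json.simple.parser.ParseException;
public class FooScheme implements Scheme {
// Turns one raw Kafka message into a tuple with the fields "a", "b" and "json".
public Values deserialize(final byte[] _line) {
try {
Values values = new Values();
JSONObject msg = (JSONObject) JSONValue.parseWithException(new String(_line));
values.add((String) msg.get("a"));
values.add((String) msg.get("b"));
values.add(msg);
return values;
}
catch (ParseException e) {
//handle the exception
return null;
}
}
// Field names must match the order of the values emitted by deserialize.
public Fields getOutputFields() {
return new Fields("a", "b", "json");
}
}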
and you use it with your spout like this:
SpoutConfig spoutConfig = new SpoutConfig( ... your config here ...);
spoutConfig.scheme = new SchemeAsMultiScheme(new FooScheme());
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);
topology.setSpout("kafka-spout", kafkaSpout, 1).setNumTasks(1);
and now you are ready to use fields grouping by "a" or "b" or both.
FooBolt bolt = new FooBolt();
topology.setBolt("foo-bolt", bolt, 1).setNumTasks(1)
.fieldsGrouping("kafka-spout", new Fields("a","b"));
Enjoy
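As a rough illustration of what fields grouping buys you: tuples with equal values in the grouping fields always go to the same bolt task. The sketch below shows the idea with plain JDK code. It is a conceptual model under my own assumptions (hash the grouping values, take the remainder by the number of tasks), not Storm's actual implementation, and FieldsGroupingSketch and chooseTask are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Conceptual model only; not Storm's real grouping code.
class FieldsGroupingSketch {
    // Picks a task index from the values of the grouping fields.
    // Equal value lists hash equally, so they always map to the same task.
    static int chooseTask(List<Object> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        // Two tuples with the same values for fields "a" and "b".
        int first = chooseTask(Arrays.<Object>asList("x", "y"), numTasks);
        int second = chooseTask(Arrays.<Object>asList("x", "y"), numTasks);
        System.out.println(first == second);
    }
}
```

Note that the grouping only makes a visible difference once the bolt runs with more than one task; with a single task, as in the snippet above, every tuple lands in the same place regardless of grouping.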