Flink (1.2) window is producing more than 1 output per window - streaming

The question: the problem is that this program is writing to Kafka more than once per window (it creates 2-3 or more lines per window, when it should create exactly 1 line per window, since the reduce function lets only one element through). I have the same code written in Spark and it works perfectly. I have been trying to find information about this issue and I haven't found anything. I have also tried changing the parallelism of some functions and a few other things, and nothing worked; I can't figure out where the problem could be.
I am testing Flink latency. Here is the environment of my problem:
Cluster: I am using Flink 1.2.0 and OpenJDK 8. I have 3 computers: 1 JobManager and 2 TaskManagers (4 cores, 2 GB RAM and 4 task slots per TaskManager).
Input data: lines produced by a Java producer to a Kafka topic with 24 partitions, each line containing two elements: an incremental value and the creation timestamp:
1 1497790546981
2 1497790546982
3 1497790546983
4 1497790546984
............................
My Java Class:
It reads from a Kafka topic with 24 partitions (Kafka is on the same machine as the JobManager).
The filter functions, together with the union, are useless; I use them just for checking their latency.
Basically, it adds a "1" to each line, then there is a tumbling window every 2 seconds, and the reduce function sums all these 1's and all the timestamps. In the next map function the timestamp sum is divided by the sum of 1's, which gives me the average timestamp. Finally, the last map function appends the current timestamp to each reduced line, plus the difference between this timestamp and the average.
This line is written to Kafka (to a topic with 2 partitions).
//FLINK CONFIGURATION
final StreamExecutionEnvironment env = StreamExecutionEnvironment
.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
//KAFKA CONSUMER CONFIGURATION
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "192.168.0.155:9092");
FlinkKafkaConsumer010<String> myConsumer = new FlinkKafkaConsumer010<>(args[0], new SimpleStringSchema(), properties);
//KAFKA PRODUCER
Properties producerConfig = new Properties();
producerConfig.setProperty("bootstrap.servers", "192.168.0.155:9092");
producerConfig.setProperty("acks", "0");
producerConfig.setProperty("linger.ms", "0");
//MAIN PROGRAM
//Read from Kafka
DataStream<String> line = env.addSource(myConsumer);
//Add 1 to each line
DataStream<Tuple2<String, Integer>> line_Num = line.map(new NumberAdder());
//Filter Odd numbers
DataStream<Tuple2<String, Integer>> line_Num_Odd = line_Num.filter(new FilterOdd());
//Filter Even numbers
DataStream<Tuple2<String, Integer>> line_Num_Even = line_Num.filter(new FilterEven());
//Join Even and Odd
DataStream<Tuple2<String, Integer>> line_Num_U = line_Num_Odd.union(line_Num_Even);
//Tumbling windows every 2 seconds
AllWindowedStream<Tuple2<String, Integer>, TimeWindow> windowedLine_Num_U = line_Num_U
.windowAll(TumblingProcessingTimeWindows.of(Time.seconds(2)));
//Reduce to one line with the sum
DataStream<Tuple2<String, Integer>> wL_Num_U_Reduced = windowedLine_Num_U.reduce(new Reducer());
//Calculate the average of the elements summed
DataStream<String> wL_Average = wL_Num_U_Reduced.map(new AverageCalculator());
//Add timestamp and calculate the difference with the average
DataStream<String> averageTS = wL_Average.map(new TimestampAdder());
//Send the result to Kafka
FlinkKafkaProducer010Configuration<String> myProducerConfig = (FlinkKafkaProducer010Configuration<String>) FlinkKafkaProducer010
.writeToKafkaWithTimestamps(averageTS, "testRes", new SimpleStringSchema(), producerConfig);
myProducerConfig.setWriteTimestampToKafka(true);
env.execute("TimestampLongKafka");
}
//Functions used in the program implementation:
public static class FilterOdd implements FilterFunction<Tuple2<String, Integer>> {
private static final long serialVersionUID = 1L;
public boolean filter(Tuple2<String, Integer> line) throws Exception {
Boolean isOdd = (Long.valueOf(line._1.split(" ")[0]) % 2) != 0;
return isOdd;
}
};
public static class FilterEven implements FilterFunction<Tuple2<String, Integer>> {
private static final long serialVersionUID = 1L;
public boolean filter(Tuple2<String, Integer> line) throws Exception {
Boolean isEven = (Long.valueOf(line._1.split(" ")[0]) % 2) == 0;
return isEven;
}
};
public static class NumberAdder implements MapFunction<String, Tuple2<String, Integer>> {
private static final long serialVersionUID = 1L;
public Tuple2<String, Integer> map(String line) {
Tuple2<String, Integer> newLine = new Tuple2<String, Integer>(line, 1);
return newLine;
}
};
public static class Reducer implements ReduceFunction<Tuple2<String, Integer>> {
private static final long serialVersionUID = 1L;
public Tuple2<String, Integer> reduce(Tuple2<String, Integer> line1, Tuple2<String, Integer> line2) throws Exception {
Long sum = Long.valueOf(line1._1.split(" ")[0]) + Long.valueOf(line2._1.split(" ")[0]);
Long sumTS = Long.valueOf(line1._1.split(" ")[1]) + Long.valueOf(line2._1.split(" ")[1]);
Tuple2<String, Integer> newLine = new Tuple2<String, Integer>(String.valueOf(sum) + " " + String.valueOf(sumTS),
line1._2 + line2._2);
return newLine;
}
};
public static class AverageCalculator implements MapFunction<Tuple2<String, Integer>, String> {
private static final long serialVersionUID = 1L;
public String map(Tuple2<String, Integer> line) throws Exception {
Long average = Long.valueOf(line._1.split(" ")[1]) / line._2;
String result = String.valueOf(line._2) + " " + String.valueOf(average);
return result;
}
};
public static final class TimestampAdder implements MapFunction<String, String> {
private static final long serialVersionUID = 1L;
public String map(String line) throws Exception {
Long currentTime = System.currentTimeMillis();
String totalTime = String.valueOf(currentTime - Long.valueOf(line.split(" ")[1]));
String newLine = line.concat(" " + String.valueOf(currentTime) + " " + totalTime);
return newLine;
}
};
Some output data: this output was written to the topic with 2 partitions, at a producing rate of less than 1000 records/second (in this case it is creating 3 output lines per window):
1969 1497791240910 1497791241999 1089 1497791242001 1091
1973 1497791240971 1497791241999 1028 1497791242002 1031
1970 1497791240937 1497791242094 1157 1497791242198 1261
1917 1497791242912 1497791243999 1087 1497791244051 1139
1905 1497791242971 1497791243999 1028 1497791244051 1080
1916 1497791242939 1497791244096 1157 1497791244199 1260
1994 1497791244915 1497791245999 1084 1497791246002 1087
1993 1497791244966 1497791245999 1033 1497791246004 1038
1990 1497791244939 1497791246097 1158 1497791246201 1262
Thanks in advance!

I don't know exactly why, but I can fix the problem by stopping the Flink cluster and starting it again. After some job executions it starts to produce more outputs, at least x3, and the problem can probably keep growing. I will open an issue on Jira and update this as soon as possible.

Related

How to read all the records in a Kafka topic

I am using Kafka (kafka_2.12-2.1.0) with Spring Kafka on the client side and have got stuck with an issue.
I need to load an in-memory map by reading all the existing messages within a kafka topic. I did this by starting a new consumer (with a unique consumer group id and setting the offset to earliest). Then I iterate over the consumer (poll method) to get all messages and stop when the consumer records become empty.
But I noticed that, when I start polling, the first few iterations return consumer records as empty and then it starts returning the actual records. Now this breaks my logic as our code thinks there are no records in the topic.
I have tried a few other ways (like using offset numbers), but haven't been able to come up with any solution, apart from keeping a record somewhere else that tells me how many messages there are in the topic that need to be read before I stop.
Any ideas, please?
To my understanding, what you are trying to achieve is to have a map constructed in your application based on the values that are already in a specific topic.
For this task, instead of manually polling the topic, you can use a KTable in the Kafka Streams DSL, which will automatically construct a readable key-value store that is fault tolerant, replication enabled and automatically filled with new values.
You can do this simply by calling groupByKey on a stream and then using aggregate.
KStreamBuilder builder = new KStreamBuilder();
KStream<String, Long> myKStream = builder.stream(Serdes.String(), Serdes.Long(), "topic_name");
KTable<String, Long> totalCount = myKStream.groupByKey().aggregate(this::initializer, this::aggregator);
(The actual code may vary depending on the Kafka version, your configuration, etc.)
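For reference, here is a hedged sketch of the same idea against the newer StreamsBuilder API (Kafka 1.0+), since KStreamBuilder is gone in recent releases; the application id, broker address and class name are illustrative placeholders, and "topic_name" is reused from the snippet above:
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Printed;
import org.apache.kafka.streams.state.KeyValueStore;

public class TopicTotalsTable {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "topic-totals");      // placeholder application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, Long> stream =
                builder.stream("topic_name", Consumed.with(Serdes.String(), Serdes.Long()));

        // The aggregate is materialized into a fault-tolerant, replicated key-value store.
        Materialized<String, Long, KeyValueStore<Bytes, byte[]>> store =
                Materialized.with(Serdes.String(), Serdes.Long());
        KTable<String, Long> totalCount = stream
                .groupByKey()
                .aggregate(() -> 0L, (key, value, aggregate) -> aggregate + value, store);

        totalCount.toStream().print(Printed.toSysOut());
        new KafkaStreams(builder.build(), props).start();
    }
}
Here the aggregator just sums the Long values per key; in practice you would plug in whatever aggregation builds the state you actually want to query.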
Read more about Kafka Streams concepts here.
Then I iterate over the consumer (poll method) to get all messages and stop when the consumer records become empty
Kafka is a message streaming platform. Any data you stream is being updated continuously, and you probably should not use it in a way where you expect consumption to stop after a certain number of messages. How will you handle it if a new message comes in after you stop the consumer?
Also, the reason you are getting empty records is probably related to the records being in different partitions, etc.
What is your specific use case here? There might be a good way to do it with Kafka semantics itself.
You have to use 2 consumers: one to load the offsets and another one to read all the records.
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;
public class KafkaRecordReader {
static final Map<String, Object> props = new HashMap<>();
static {
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);
props.put(ConsumerConfig.CLIENT_ID_CONFIG, "sample-client");
}
public static void main(String[] args) {
final Map<TopicPartition, OffsetInfo> partitionOffsetInfos = getOffsets(Arrays.asList("world", "sample"));
final List<ConsumerRecord<byte[], byte[]>> records = readRecords(partitionOffsetInfos);
System.out.println(partitionOffsetInfos);
System.out.println("Read : " + records.size() + " records");
}
private static List<ConsumerRecord<byte[], byte[]>> readRecords(final Map<TopicPartition, OffsetInfo> offsetInfos) {
final Properties readerProps = new Properties();
readerProps.putAll(props);
readerProps.put(ConsumerConfig.CLIENT_ID_CONFIG, "record-reader");
final Map<TopicPartition, Boolean> partitionToReadStatusMap = new HashMap<>();
offsetInfos.forEach((tp, offsetInfo) -> {
partitionToReadStatusMap.put(tp, offsetInfo.beginOffset == offsetInfo.endOffset);
});
final List<ConsumerRecord<byte[], byte[]>> cachedRecords = new ArrayList<>();
try (final KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(readerProps)) {
consumer.assign(offsetInfos.keySet());
for (final Map.Entry<TopicPartition, OffsetInfo> entry : offsetInfos.entrySet()) {
consumer.seek(entry.getKey(), entry.getValue().beginOffset);
}
boolean close = false;
while (!close) {
final ConsumerRecords<byte[], byte[]> consumerRecords = consumer.poll(Duration.ofMillis(100));
for (final ConsumerRecord<byte[], byte[]> record : consumerRecords) {
cachedRecords.add(record);
final TopicPartition currentTp = new TopicPartition(record.topic(), record.partition());
if (record.offset() + 1 == offsetInfos.get(currentTp).endOffset) {
partitionToReadStatusMap.put(currentTp, true);
}
}
boolean done = true;
for (final Map.Entry<TopicPartition, Boolean> entry : partitionToReadStatusMap.entrySet()) {
done &= entry.getValue();
}
close = done;
}
}
return cachedRecords;
}
private static Map<TopicPartition, OffsetInfo> getOffsets(final List<String> topics) {
final Properties offsetReaderProps = new Properties();
offsetReaderProps.putAll(props);
offsetReaderProps.put(ConsumerConfig.CLIENT_ID_CONFIG, "offset-reader");
final Map<TopicPartition, OffsetInfo> partitionOffsetInfo = new HashMap<>();
try (final KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(offsetReaderProps)) {
final List<PartitionInfo> partitionInfos = new ArrayList<>();
topics.forEach(topic -> partitionInfos.addAll(consumer.partitionsFor(topic)));
final Set<TopicPartition> topicPartitions = partitionInfos
.stream()
.map(x -> new TopicPartition(x.topic(), x.partition()))
.collect(Collectors.toSet());
consumer.assign(topicPartitions);
final Map<TopicPartition, Long> beginningOffsets = consumer.beginningOffsets(topicPartitions);
final Map<TopicPartition, Long> endOffsets = consumer.endOffsets(topicPartitions);
for (final TopicPartition tp : topicPartitions) {
partitionOffsetInfo.put(tp, new OffsetInfo(beginningOffsets.get(tp), endOffsets.get(tp)));
}
}
return partitionOffsetInfo;
}
private static class OffsetInfo {
private final long beginOffset;
private final long endOffset;
private OffsetInfo(long beginOffset, long endOffset) {
this.beginOffset = beginOffset;
this.endOffset = endOffset;
}
@Override
public String toString() {
return "OffsetInfo{" +
"beginOffset=" + beginOffset +
", endOffset=" + endOffset +
'}';
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
OffsetInfo that = (OffsetInfo) o;
return beginOffset == that.beginOffset &&
endOffset == that.endOffset;
}
@Override
public int hashCode() {
return Objects.hash(beginOffset, endOffset);
}
}
}
Adding to the above answer from @arshad: the reason you are not getting the records is that you have already read them. See this answer here; using earliest or latest does not matter on the consumer after you have a committed offset for the partition.
I would use a seek to the beginning, or to the particular offset if you knew the starting offset.
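For illustration, here is a minimal sketch of that approach (broker address, topic and class name are placeholders); it assigns the partitions directly, seeks to the beginning, and stops once every partition has reached the end offset snapshotted at startup:
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class MapLoader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");         // no group/commits needed

        Map<String, String> map = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Assign all partitions of the topic explicitly instead of subscribing.
            List<TopicPartition> partitions = consumer.partitionsFor("my-topic").stream() // placeholder topic
                    .map(p -> new TopicPartition(p.topic(), p.partition()))
                    .collect(Collectors.toList());
            consumer.assign(partitions);
            consumer.seekToBeginning(partitions);

            // Snapshot begin/end offsets and drop partitions that are already empty.
            Map<TopicPartition, Long> beginOffsets = consumer.beginningOffsets(partitions);
            Map<TopicPartition, Long> endOffsets = new HashMap<>(consumer.endOffsets(partitions));
            endOffsets.entrySet().removeIf(e -> e.getValue() <= beginOffsets.get(e.getKey()));

            // Keep polling until every remaining partition has reached its snapshotted end offset.
            while (!endOffsets.isEmpty()) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(200))) {
                    map.put(record.key(), record.value());
                    TopicPartition tp = new TopicPartition(record.topic(), record.partition());
                    Long end = endOffsets.get(tp);
                    if (end != null && record.offset() + 1 >= end) {
                        endOffsets.remove(tp);
                    }
                }
            }
        }
        System.out.println("Loaded " + map.size() + " entries");
    }
}
Because the end offsets are captured once up front, messages that arrive after the load starts are deliberately not part of this pass and have to be picked up separately.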

Creating CEP with Apache Flink

I'm trying to implement a very simple Apache Flink CEP for a Kafka InputStream.
The Kafka producer generates a simple double value and sends it as a String via a Kafka topic towards the consumers. At the moment I'm coding a CEP consumer with Flink.
So far this is the code I have written:
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.getConfig().disableSysoutLogging();
env.getConfig().setRestartStrategy(RestartStrategies.fixedDelayRestart(4, 10000));
env.setParallelism(3);
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:9092");
properties.setProperty("group.id", "flink_consumer");
DataStream<String> stream = env
.addSource(new FlinkKafkaConsumer09<>("temp", new SimpleStringSchema(), properties));
Pattern<String, ?> warning= Pattern.<String>begin("first")
.where(new IterativeCondition<String>() {
private static final long serialVersionUID = 1L;
@Override
public boolean filter(String value, Context<String> ctx) throws Exception {
return Double.parseDouble(value) >= 89.0;
}
})
.next("second")
.where(new IterativeCondition<String>() {
private static final long serialVersionUID = 1L;
@Override
public boolean filter(String value, Context<String> ctx) throws Exception {
return Double.parseDouble(value) >= 89.0;
}
})
.within(Time.seconds(10));
DataStream<String> temp = CEP.pattern(stream, warning).select(new PatternSelectFunction<String, String>() {
private static final long serialVersionUID = 1L;
@Override
public String select(Map<String, List<String>> pattern) throws Exception {
List warnung1 = pattern.get("first");
String first = (String) warnung1.get(1);
return first;
}
});
temp.print();
env.execute();
}
If I try to execute this code, this is the error message:
Exception in thread "main" java.lang.NoSuchFieldError: NO_INDEX at
org.apache.flink.cep.PatternStream.select(PatternStream.java:102) at
CEPTest.main(CEPTest.java:50)
So it looks like my generated DataStream with the CEP pattern is wrong, but I don't know what's wrong with that method. Any help would be great!
Edit: I tried some other examples and on every execution I get the same error. So I think something is wrong with my packages?
With Flink 1.6.0 my code works perfectly. (The original NoSuchFieldError most likely came from mixing incompatible Flink and flink-cep versions on the classpath, so aligning them resolves it.)

Cannot produce messages when the main thread sleeps less than 1000

When I am using the Kafka Java API, if I let my main thread sleep for less than 2000 ms, it cannot produce any messages. I really want to know why this happens.
Here is my producer:
public class Producer {
private final KafkaProducer<String, String> producer;
private final String topic;
public Producer(String topic, String[] args) {
//......
//......
producer = new KafkaProducer<>(props);
this.topic = topic;
}
public void producerMsg() throws InterruptedException {
String data = "Apache Storm is a free and open source distributed";
data = data.replaceAll("[\\pP‘’“”]", "");
String[] words = data.split(" ");
Random _rand = new Random();
Random rnd = new Random();
int events = 10;
for (long nEvents = 0; nEvents < events; nEvents++) {
long runtime = new Date().getTime();
int lastIPnum = rnd.nextInt(255);
String ip = "192.168.2." + lastIPnum;
String msg = words[_rand.nextInt(words.length)];
try {
producer.send(new ProducerRecord<>(topic, ip, msg));
System.out.println("Sent message: (" + ip + ", " + msg + ")");
} catch (Exception e) {
e.printStackTrace();
}
}
}
public static void main(String[] args) throws InterruptedException {
Producer producer = new Producer(Constants.TOPIC, args);
producer.producerMsg();
//If I write Thread.sleep(1000),It will not work!!!!!!!!!!!!!!!!!!!!
Thread.sleep(2000);
}
}
I'd appreciate any help with that.
Can you show the props you are using for configuring the producer? I'm only guessing that it's possible that ...
In producerMsg() you are using the producer in the asynchronous way, so just producer.send(), which means that the message is put in an internal buffer used for making batches that will be sent later. The producer has an internal thread that takes batches from the buffer and sends them. Maybe 1000 ms just isn't enough to reach the condition where the producer really sends messages (see batch.size and linger.ms): the main application ends and the producer dies without sending the messages. Giving it more time (2000 ms), it works. By the way, I didn't try the code.
So the reason seems to be your:
props.put("linger.ms", 1000);
which matches your sleep. The producer will start to send messages after 1000 ms, because the batch isn't full yet (batch.size is 16 MB). At the same time, the main thread ends after sleeping 1 second and the producer doesn't send the messages. You have to use a lower linger.ms time.
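A minimal sketch of the two usual fixes, assuming producer settings similar to the question's (broker address, topic, class name and the linger.ms value of 10 are illustrative placeholders): use a short linger.ms, and flush/close the producer before main() returns so buffered records are sent without any Thread.sleep():
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FlushingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("linger.ms", 10); // option 1: don't hold batches open for a full second

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        try {
            for (int i = 0; i < 10; i++) {
                // send() is asynchronous: the record only goes into the client's buffer here
                producer.send(new ProducerRecord<>("test-topic", "key-" + i, "msg-" + i));
            }
        } finally {
            // option 2: push out anything still buffered before the JVM exits
            producer.flush();
            producer.close();
        }
    }
}
With the flush/close in place, the sleep in the question becomes unnecessary.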

Why does the Kryo serialization framework in Apache Storm overwrite data when the bolt gets the values?

Most developers probably use Avro as the serialization framework in a Kafka and Apache Storm setup. But I need to handle more complex data, so I chose the Kryo serialization framework and successfully integrated it into our project, which runs in a Kafka and Apache Storm environment. However, when going further I ran into a strange situation.
I sent 5 messages to Kafka, and the Storm job can read the 5 messages and deserialize them successfully. But the next bolt gets the wrong data values: it prints out the same value as the last message. Then I added a print statement right after the deserialization code, and it actually prints 5 different messages. Why doesn't the next bolt get the right values? See my code below:
KryoScheme.java
public abstract class KryoScheme<T> implements Scheme {
private static final long serialVersionUID = 6923985190833960706L;
private static final Logger logger = LoggerFactory.getLogger(KryoScheme.class);
private Class<T> clazz;
private Serializer<T> serializer;
public KryoScheme(Class<T> clazz, Serializer<T> serializer) {
this.clazz = clazz;
this.serializer = serializer;
}
@Override
public List<Object> deserialize(byte[] buffer) {
Kryo kryo = new Kryo();
kryo.register(clazz, serializer);
T scheme = null;
try {
scheme = kryo.readObject(new Input(new ByteArrayInputStream(buffer)), this.clazz);
logger.info("{}", scheme);
} catch (Exception e) {
String errMsg = String.format("Kryo Scheme failed to deserialize data from Kafka to %s. Raw: %s",
clazz.getName(),
new String(buffer));
logger.error(errMsg, e);
throw new FailedException(errMsg, e);
}
return new Values(scheme);
}}
PrintFunction.java
public class PrintFunction extends BaseFunction {
private static final Logger logger = LoggerFactory.getLogger(PrintFunction.class);
@Override
public void execute(TridentTuple tuple, TridentCollector collector) {
List<Object> data = tuple.getValues();
if (data != null) {
logger.info("Scheme data size: {}", data.size());
for (Object value : data) {
PrintOut out = (PrintOut) value;
logger.info("{}.{}--value: {}",
Thread.currentThread().getName(),
Thread.currentThread().getId(),
out.toString());
collector.emit(new Values(out));
}
}
}}
StormLocalTopology.java
public class StormLocalTopology {
public static void main(String[] args) {
........
BrokerHosts zk = new ZkHosts("xxxxxx");
Config stormConf = new Config();
stormConf.put(Config.TOPOLOGY_DEBUG, false);
stormConf.put(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS, 1000 * 5);
stormConf.put(Config.TOPOLOGY_WORKERS, 1);
stormConf.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 5);
stormConf.put(Config.TOPOLOGY_TASKS, 1);
TridentKafkaConfig actSpoutConf = new TridentKafkaConfig(zk, topic);
actSpoutConf.fetchSizeBytes = 5 * 1024 * 1024 ;
actSpoutConf.bufferSizeBytes = 5 * 1024 * 1024 ;
actSpoutConf.scheme = new SchemeAsMultiScheme(scheme);
actSpoutConf.startOffsetTime = kafka.api.OffsetRequest.LatestTime();
TridentTopology topology = new TridentTopology();
TransactionalTridentKafkaSpout actSpout = new TransactionalTridentKafkaSpout(actSpoutConf);
topology.newStream(topic, actSpout).parallelismHint(4).shuffle()
.each(new Fields("act"), new PrintFunction(), new Fields());
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(topic+"Topology", stormConf, topology.build());
}}
There is also another problem: why can the Kryo scheme only read one message buffer at a time? Is there another way to get multiple message buffers so that the data can be batch-sent to the next bolt?
Also, if I send 1 message the full flow seems to succeed.
If I send 2 messages it goes wrong. The printed output looks like this:
56157 [Thread-18-spout0] INFO s.s.a.s.s.c.KryoScheme - 2016-02-05T17:20:48.122+0800,T6mdfEW#N5pEtNBW
56160 [Thread-20-b-0] INFO s.s.a.s.s.PrintFunction - Scheme data size: 1
56160 [Thread-18-spout0] INFO s.s.a.s.s.c.KryoScheme - 2016-02-05T17:20:48.282+0800,T(o2KnFxtGB0Tlp8
56161 [Thread-20-b-0] INFO s.s.a.s.s.PrintFunction - Thread-20-b-0.99--value: 2016-02-05T17:20:48.282+0800,T(o2KnFxtGB0Tlp8
56162 [Thread-20-b-0] INFO s.s.a.s.s.PrintFunction - Scheme data size: 1
56162 [Thread-20-b-0] INFO s.s.a.s.s.PrintFunction - Thread-20-b-0.99--value: 2016-02-05T17:20:48.282+0800,T(o2KnFxtGB0Tlp8
Sorry, this was my mistake. I just found a bug in my Kryo serializer class: it kept the deserialized event in an instance field, so in a multi-threaded environment the value could be overwritten. Once that state is no longer shared between calls, the code runs well.
See the reference code below:
public class KryoSerializer<T extends BasicEvent> extends Serializer<T> implements Serializable {
private static final long serialVersionUID = -4684340809824908270L;
// Wrong: keeping the event in an instance field lets concurrent deserializations overwrite each other.
// private T event;
@Override
public void write(Kryo kryo, Output output, T event) {
event.write(output);
}
@Override
public T read(Kryo kryo, Input input, Class<T> type) {
// Create a fresh event for every call instead of reusing shared state.
try {
T event = type.newInstance();
event.read(input);
return event;
} catch (InstantiationException | IllegalAccessException e) {
throw new RuntimeException("Failed to instantiate " + type.getName(), e);
}
}
}

Apache Kafka throwing an exception for Scala

I am trying to compile and run a simple Kafka code sample from Apache. When compiling I am getting the following exception, even after adding all the lib files for Scala (I guess).
Exception in thread "main" java.lang.NullPointerException
at scala.Predef$.Integer2int(Predef.scala:303)
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:103)
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:44)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:194)
at scala.collection.mutable.ArrayBuffer.map(ArrayBuffer.scala:44)
at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:102)
at kafka.producer.BrokerPartitionInfo.<init>(BrokerPartitionInfo.scala:32)
at kafka.producer.async.DefaultEventHandler.<init>(DefaultEventHandler.scala:41)
at kafka.producer.Producer.<init>(Producer.scala:60)
at kafka.javaapi.producer.Producer.<init>(Producer.scala:26)
at kafkaTest.TestProducer.main(TestProducer.java:23)
This is my program:
package kafkaTest;
import java.util.*;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
public class TestProducer {
public static void main(String[] args) {
// long events = Long.parseLong(args[0]);
long events = 10l;
Random rnd = new Random();
Properties props = new Properties();
props.put("metadata.broker.list", "broker1:9092,broker2:9092 ");
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("partitioner.class", "kafkaTest.SimplePartitioner"); // this is line no 23
props.put("request.required.acks", "1");
ProducerConfig config = new ProducerConfig(props);
Producer<String, String> producer = new Producer<String, String>(config);
for (long nEvents = 0; nEvents < events; nEvents++) {
long runtime = new Date().getTime();
String ip = "192.168.2.1" + rnd.nextInt(255);
String msg = runtime + ",www.example.com," + ip;
KeyedMessage<String, String> data = new KeyedMessage<String, String>("page_visits", ip, msg);
producer.send(data);
}
producer.close();
}
}
Attached is a screenshot of the library files.
Please let me know the cause of the error/exception.
Edit: this is SimplePartitioner.java
package kafkaTest;
import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;
public class SimplePartitioner implements Partitioner {
public SimplePartitioner(VerifiableProperties props) {
}
public int partition(Object key, int a_numPartitions) {
int partition = 0;
String stringKey = (String) key;
int offset = stringKey.lastIndexOf('.');
if (offset > 0) {
partition = Integer.parseInt(stringKey.substring(offset + 1))
% a_numPartitions;
}
return partition;
}
}
There's a space at the end of your broker list:
props.put("metadata.broker.list", "broker1:9092,broker2:9092 ");
Remove it and it should work fine then:
props.put("metadata.broker.list", "broker1:9092,broker2:9092");
I also got this error when metadata.broker.list has a broker with no port number.