how to write emitted tuple into kafka topic


My application reads messages from one Kafka topic, stores them in MongoDB, runs some validations, and then writes them to another topic. The problem is that the application goes into an infinite loop.
Here is my code:
Hosts zkHosts = new ZkHosts("localhost:2181");
String zkRoot = "/brokers/topics" ;
String clientRequestID = "reqtest";
String clientPendingID = "pendtest";
SpoutConfig kafkaRequestConfig = new SpoutConfig(zkHosts,"reqtest",zkRoot,clientRequestID);
SpoutConfig kafkaPendingConfig = new SpoutConfig(zkHosts,"pendtest",zkRoot,clientPendingID);
kafkaRequestConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
kafkaPendingConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
KafkaSpout kafkaRequestSpout = new KafkaSpout(kafkaRequestConfig);
KafkaSpout kafkaPendingSpout = new KafkaSpout(kafkaPendingConfig);
MongoBolt mongoBolt = new MongoBolt() ;
DeviceFilterBolt deviceFilterBolt = new DeviceFilterBolt() ;
KafkaRequestBolt kafkaReqBolt = new KafkaRequestBolt() ;
abc1DeviceBolt abc1DevBolt = new abc1DeviceBolt() ;
DefaultTopicSelector defTopicSelector = new DefaultTopicSelector(xyzKafkaTopic.RESPONSE.name()) ;
KafkaBolt kafkaRespBolt = new KafkaBolt()
.withTopicSelector(defTopicSelector)
.withTupleToKafkaMapper(new FieldNameBasedTupleToKafkaMapper()) ;
TopologyBuilder topoBuilder = new TopologyBuilder();
topoBuilder.setSpout(xyzComponent.KAFKA_REQUEST_SPOUT.name(), kafkaRequestSpout);
topoBuilder.setSpout(xyzComponent.KAFKA_PENDING_SPOUT.name(), kafkaPendingSpout);
topoBuilder.setBolt(xyzComponent.KAFKA_PENDING_BOLT.name(),
deviceFilterBolt, 1)
.shuffleGrouping(xyzComponent.KAFKA_PENDING_SPOUT.name()) ;
topoBuilder.setBolt(xyzComponent.abc1_DEVICE_BOLT.name(),
abc1DevBolt, 1)
.shuffleGrouping(xyzComponent.KAFKA_PENDING_BOLT.name(),
xyzDevice.abc1.name()) ;
topoBuilder.setBolt(xyzComponent.MONGODB_BOLT.name(),
mongoBolt, 1)
.shuffleGrouping(xyzComponent.abc1_DEVICE_BOLT.name(),
xyzStreamID.KAFKARESP.name());
topoBuilder.setBolt(xyzComponent.KAFKA_RESPONSE_BOLT.name(),
kafkaRespBolt, 1)
.shuffleGrouping(xyzComponent.abc1_DEVICE_BOLT.name(),
xyzStreamID.KAFKARESP.name());
Config config = new Config() ;
config.setDebug(true);
config.setNumWorkers(1);
Properties props = new Properties();
props.put("metadata.broker.list", "localhost:9092");
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("request.required.acks", "1");
config.put(KafkaBolt.KAFKA_BROKER_PROPERTIES, props);
LocalCluster cluster = new LocalCluster();
try {
    cluster.submitTopology("demo", config, topoBuilder.createTopology());
} catch (Exception e) {
    e.printStackTrace();
}
In the code above, KAFKA_RESPONSE_BOLT writes the data into the topic.
abc1_DEVICE_BOLT feeds KAFKA_RESPONSE_BOLT by emitting data like this:
@Override
public void declareOutputFields(OutputFieldsDeclarer ofd) {
    Fields respFields = IoTFields.getKafkaResponseFieldsRTEXY();
    ofd.declareStream(IoTStreamID.KAFKARESP.name(), respFields);
}

@Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
    List<Object> newTuple = new ArrayList<Object>();
    String params = tuple.getStringByField("params");
    // note: ArrayList.add(3, ...) throws IndexOutOfBoundsException unless indices 0-2 already exist
    newTuple.add(3, params);
    ----
    collector.emit(IoTStreamID.KAFKARESP.name(), newTuple);
}

I was stuck on the same question for a long time, and the answer turned out to be very simple.
As far as I understand, KafkaBolt (with the default FieldNameBasedTupleToKafkaMapper) expects incoming tuples to have a field named "message", whether they come from a bolt or a spout. So you need to change your code so that the emitting bolt declares that field. I have not looked at your code in detail, but I believe this will help.
The specific reason is explained at https://mail-archives.apache.org/mod_mbox/incubator-storm-user/201409.mbox/%3C6AF1CAC6-60EA-49D9-8333-0343777B48A7@andrashatvani.com%3E
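To make the field-name requirement concrete, here is a stdlib-only Java sketch. It does not use the Storm API: a tuple is modeled as a map from declared field names to values, and asTuple, keyOf and messageOf are hypothetical helpers that mimic a mapper looking up a required "message" field and an optional "key" field. A tuple declared with only a field such as "params" gives such a mapper nothing to read.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stdlib-only model, NOT the Storm API: a tuple is a map of field name -> value.
class FieldNameMappingSketch {
    static final String KEY_FIELD = "key";         // assumed default key field name
    static final String MESSAGE_FIELD = "message"; // assumed default message field name

    // Pairs declared field names with emitted values, like declareStream + emit.
    static Map<String, Object> asTuple(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("fields and values must have the same size");
        }
        Map<String, Object> tuple = new HashMap<>();
        for (int i = 0; i < fields.size(); i++) {
            tuple.put(fields.get(i), values.get(i));
        }
        return tuple;
    }

    // Key is treated as optional: returns null when there is no "key" field.
    static Object keyOf(Map<String, Object> tuple) {
        return tuple.get(KEY_FIELD);
    }

    // Message is treated as required: fails when there is no "message" field.
    static Object messageOf(Map<String, Object> tuple) {
        if (!tuple.containsKey(MESSAGE_FIELD)) {
            throw new IllegalArgumentException("no field named \"" + MESSAGE_FIELD + "\"");
        }
        return tuple.get(MESSAGE_FIELD);
    }

    public static void main(String[] args) {
        Map<String, Object> good = asTuple(Arrays.asList("key", "message"),
                Arrays.<Object>asList("device-1", "payload"));
        System.out.println(keyOf(good) + " -> " + messageOf(good));

        Map<String, Object> bad = asTuple(Arrays.asList("params"), Arrays.<Object>asList("payload"));
        try {
            messageOf(bad);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Applied to the question, this suggests declaring the KAFKARESP stream with a field literally named "message" (for example new Fields("key", "message")) and emitting the payload in that position. FieldNameBasedTupleToKafkaMapper also appears to accept custom key and message field names through a two-argument constructor; check the Javadoc of the Storm version in use before relying on it.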

Related

How to print TimeWindowedKStream and KTable in Kafka streams?

We have a Kafka process that takes a topic as input and writes time-windowed results to an output topic, using the code below. For debugging, I would like to print the TimeWindowedKStream (groupedStream) and the KTable (aggregatedTable) to see their contents.
String intopic = input_topic;
long window = 60; // window size in minutes
String outtopic = output_topic;
final Serde<String> stringSerde = Serdes.String();
Properties property = new Properties();
property.put("bootstrap.servers", "127.0.0.1:9092");
property.put("group.id", "test-consumer-group");
property.put("application.id", "sliding-window-min-bar");
property.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, stringSerde.getClass().getName());
property.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, stringSerde.getClass().getName());
Duration windowSizeMs = Duration.ofMinutes(window);
StreamsBuilder builder = new StreamsBuilder();
System.out.println(intopic);
KStream<String, String> equitybar = builder.stream(intopic, Consumed.with(stringSerde, stringSerde));
System.out.println(equitybar);
equitybar.print(Printed.toSysOut());
// convert string of csv to a double on the mean value
KStream<String, String> transformedbar = equitybar
.map((key, value) -> KeyValue.pair(key, value.substring(1,value.length()-2).split(",")[2]));
System.out.println(transformedbar);
transformedbar.print(Printed.toSysOut());
// group by equity and sliding window
System.out.println(windowSizeMs);
System.out.println(TimeWindows.of(windowSizeMs).advanceBy(advanceMs));
TimeWindowedKStream<String, String> groupedStream = transformedbar.groupByKey().windowedBy(TimeWindows.of(windowSizeMs).advanceBy(advanceMs));
System.out.println(groupedStream);
KTable<Windowed<String>, String> aggregatedTable = groupedStream.aggregate(
() -> "|",
(aggKey, newValue, aggValue) -> aggValue + newValue.trim() + "|") ;
I tried to print it with the same print method used for KStreams, groupedStream.print(Printed.toSysOut()), but it doesn't seem to work.
Thanks.

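KGroupedStream and TimeWindowedKStream are "just" helper classes that let the DSL offer a fluent API for chaining operators without piling too many overloads onto a single class.
In the DSL there are only two main abstractions that are actual first-class data containers: KStream and KTable. Thus, what you want to do is not possible for groupedStream. You can still print the aggregation result by converting the KTable back into a stream, e.g. aggregatedTable.toStream().print(Printed.toSysOut());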
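Since the intermediate windowed object cannot be printed, it can help to reason about what aggregatedTable will contain. The stdlib-only Java model below is not Kafka Streams code: windowStartsFor and aggregate are hypothetical helpers. It assumes hopping windows of sizeMs aligned to multiples of advanceMs starting at 0, computes which windows a record timestamp falls into, and replays the question's initializer ("|") and aggregator (agg + value.trim() + "|").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stdlib-only model of hopping-window assignment plus the question's string aggregator.
class HoppingWindowSketch {
    // Start times of every window [start, start + sizeMs) containing timestampMs,
    // assuming starts are non-negative multiples of advanceMs.
    static List<Long> windowStartsFor(long timestampMs, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        // Smallest aligned start strictly greater than timestampMs - sizeMs (clamped at 0).
        long first = (Math.max(0, timestampMs - sizeMs + advanceMs) / advanceMs) * advanceMs;
        for (long start = first; start <= timestampMs; start += advanceMs) {
            starts.add(start);
        }
        return starts;
    }

    // Mirrors () -> "|" and (key, v, agg) -> agg + v.trim() + "|" for one window and key.
    static String aggregate(List<String> values) {
        String agg = "|";
        for (String v : values) {
            agg = agg + v.trim() + "|";
        }
        return agg;
    }

    public static void main(String[] args) {
        System.out.println(windowStartsFor(130, 60, 30));  // overlapping (hopping) windows
        System.out.println(windowStartsFor(125, 60, 60));  // advance == size: one window
        System.out.println(aggregate(Arrays.asList("1.5", " 2.0 ")));
    }
}
```

When the advance equals the window size, each record lands in exactly one (tumbling) window; a smaller advance makes the same record contribute to several rows of aggregatedTable.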

applying keyed state on top of stream from co group stream

I have two Kafka sources.
I am trying to perform a word count and merge the counts from the two streams.
I have created a 1-minute window for both streams and applied CoGroupByKey; from a DoFn, I emit <Key, Value> pairs of the form (word, count).
On top of this CoGroupByKey, I apply a stateful ParDo.
Say I get (Test,2) from stream 1 and (Test,3) from stream 2. If they fall into the same window, the CoGroupByKey step merges them into (Test,5); if they do not, it emits (Test,2) and (Test,3) separately.
I then use state to merge these elements.
So the final result should be (Test,5), but I am not getting the expected result. All elements from stream 1 go to one partition and elements from stream 2 go to another, which is why I get:
(Test,2)
(Test,3)
// word count stream from kafka topic 1
PCollection<KV<String,Long>> stream1 = ...
// word count stream from kafka topic 2
PCollection<KV<String,Long>> stream2 = ...
PCollection<KV<String,Long>> windowed1 =
stream1.apply(
Window
.<KV<String,Long>>into(FixedWindows.of(Duration.millis(60000)))
.triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
.withAllowedLateness(Duration.millis(1000))
.discardingFiredPanes());
PCollection<KV<String,Long>> windowed2 =
stream2.apply(
Window
.<KV<String,Long>>into(FixedWindows.of(Duration.millis(60000)))
.triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
.withAllowedLateness(Duration.millis(1000))
.discardingFiredPanes());
final TupleTag<Long> count1 = new TupleTag<Long>();
final TupleTag<Long> count2 = new TupleTag<Long>();
// Merge collection values into a CoGbkResult collection.
PCollection<KV<String, CoGbkResult>> joinedStream =
KeyedPCollectionTuple.of(count1, windowed1).and(count2, windowed2)
.apply(CoGroupByKey.<String>create());
// applying state operation after coGroupKey fun
PCollection<KV<String,Long>> finalCountStream =
joinedStream.apply(ParDo.of(
new DoFn<KV<String, CoGbkResult>, KV<String,Long>>() {
@StateId(stateId)
private final StateSpec<MapState<String, Long>> mapState =
StateSpecs.map();
@ProcessElement
public void processElement(
ProcessContext processContext,
@StateId(stateId) MapState<String, Long> state) {
KV<String, CoGbkResult> element = processContext.element();
// read each side with the TupleTags declared above (count1, count2)
Iterable<Long> values1 = element.getValue().getAll(count1);
Iterable<Long> values2 = element.getValue().getAll(count2);
Long sumAmount =
StreamSupport
.stream(
Iterables.concat(values1, values2).spliterator(), false)
.collect(Collectors.summingLong(n -> n));
System.out.println(element.getKey()+"::"+sumAmount);
// processContext.output(element.getKey()+"::"+sumAmount);
Long currCount =
state.get(element.getKey()).read() == null
? 0L
: state.get(element.getKey()).read();
Long newCount = currCount+sumAmount;
state.put(element.getKey(),newCount);
processContext.output(KV.of(element.getKey(),newCount));
}
}));
finalCountStream
.apply("finalState", ParDo.of(new DoFn<KV<String,Long>, String>() {
@StateId(myState)
private final StateSpec<MapState<String, Long>> mapState =
StateSpecs.map();
@ProcessElement
public void processElement(
ProcessContext c,
@StateId(myState) MapState<String, Long> state) {
KV<String,Long> e = c.element();
Long currCount = state.get(e.getKey()).read()==null
? 0L
: state.get(e.getKey()).read();
Long newCount = currCount+e.getValue();
state.put(e.getKey(),newCount);
c.output(e.getKey()+":"+newCount);
}
}))
.apply(KafkaIO.<Void, String>write()
.withBootstrapServers("localhost:9092")
.withTopic("test")
.withValueSerializer(StringSerializer.class)
.values());

Alternatively, you can use a Flatten + Combine approach, which should give you simpler code:
PCollection<KV<String, Long>> pc1 = ...;
PCollection<KV<String, Long>> pc2 = ...;
PCollectionList<KV<String, Long>> pcs = PCollectionList.of(pc1).and(pc2);
PCollection<KV<String, Long>> merged = pcs.apply(Flatten.<KV<String, Long>>pCollections());
merged.apply(window...).apply(Combine.perKey(Sum.ofLongs()));
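Within a single window, the Flatten + Combine idea amounts to: concatenate both inputs, then sum per key. The stdlib-only Java model below is not Beam code (kv and flattenAndSum are hypothetical helpers); it shows why (Test,2) and (Test,3) become (Test,5) once they sit in the same collection and window.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stdlib-only model of Flatten + Combine.perKey(Sum.ofLongs()) for ONE window.
class FlattenSumSketch {
    static Map<String, Long> flattenAndSum(List<Map.Entry<String, Long>> pc1,
                                           List<Map.Entry<String, Long>> pc2) {
        // Flatten: one collection holding the elements of both inputs.
        List<Map.Entry<String, Long>> merged = new ArrayList<>(pc1);
        merged.addAll(pc2);
        // Combine per key: add up all counts that share the same word.
        Map<String, Long> sums = new TreeMap<>();
        for (Map.Entry<String, Long> kv : merged) {
            sums.merge(kv.getKey(), kv.getValue(), Long::sum);
        }
        return sums;
    }

    static Map.Entry<String, Long> kv(String word, long count) {
        return new SimpleEntry<>(word, count);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> pc1 = new ArrayList<>();
        pc1.add(kv("Test", 2));
        List<Map.Entry<String, Long>> pc2 = new ArrayList<>();
        pc2.add(kv("Test", 3));
        pc2.add(kv("beam", 1));
        System.out.println(flattenAndSum(pc1, pc2));
    }
}
```

In a real pipeline, the windowing and trigger applied to merged still decide when each sum is emitted, so an eager trigger would again produce partial sums.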

You have set up both streams with the trigger Repeatedly.forever(AfterPane.elementCountAtLeast(1)) and discardingFiredPanes(). This causes CoGroupByKey to output as soon as possible after each input element and then reset its state each time. So it is normal behavior that it basically passes each input straight through.
Let me explain in more detail. CoGroupByKey is executed like this:
1. All elements from stream1 and stream2 are tagged as you specified. So every (key, value1) from stream1 effectively becomes (key, (count1, value1)), and every (key, value2) from stream2 becomes (key, (count2, value2)).
2. These tagged collections are flattened together, so there is now one collection with elements like (key, (count1, value1)) and (key, (count2, value2)).
3. The combined collection goes through a normal GroupByKey. This is where triggers fire. With the default trigger, you get (key, [(count1, value1), (count2, value2), ...]) with all the values for a key grouped together. With your trigger, you will often get separate (key, [(count1, value1)]) and (key, [(count2, value2)]) because each grouping fires right away.
4. The output of the GroupByKey is simply wrapped in the CoGbkResult API. In many runners this is just a filtered view of the grouped iterable.
Of course, triggers are nondeterministic, and runners are also allowed to implement CoGroupByKey differently. But the behavior you are seeing is expected. You probably don't want to use a trigger like that or discarding mode; otherwise you need to do more grouping downstream.
Generally, doing a join with CoGBK is going to require some work downstream until Beam supports retractions.
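The effect of the trigger can be modeled without Beam. In the stdlib-only Java sketch below, a "pane" is represented (hypothetically) as the list of counts that GroupByKey hands downstream for one key at one firing. With elementCountAtLeast(1) plus discarding mode, each pane may hold a single element, so the per-pane sums are 2 and then 3; a downstream running total per key, i.e. the extra grouping mentioned above, still reaches 5. This illustrates the answer's reasoning only, not any runner's actual scheduling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stdlib-only model for a single key; each inner list is one fired pane.
class TriggerPaneSketch {
    // What a CoGroupByKey + sum step emits: one sum per fired pane.
    static List<Long> paneSums(List<List<Long>> panes) {
        List<Long> sums = new ArrayList<>();
        for (List<Long> pane : panes) {
            long s = 0;
            for (long v : pane) {
                s += v;
            }
            sums.add(s);
        }
        return sums;
    }

    // Extra downstream grouping: keep a running total across panes (like keyed state).
    static List<Long> runningTotals(List<Long> perPane) {
        List<Long> totals = new ArrayList<>();
        long total = 0;
        for (long v : perPane) {
            total += v;
            totals.add(total);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Default trigger: both sides end up in one pane.
        List<List<Long>> grouped = Arrays.asList(Arrays.asList(2L, 3L));
        // Per-element trigger + discarding: each side fires on its own.
        List<List<Long>> eager = Arrays.asList(Arrays.asList(2L), Arrays.asList(3L));
        System.out.println(paneSums(grouped));
        System.out.println(paneSums(eager));
        System.out.println(runningTotals(paneSums(eager)));
    }
}
```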

PipelineOptions options = PipelineOptionsFactory.create();
options.as(FlinkPipelineOptions.class)
.setRunner(FlinkRunner.class);
Pipeline p = Pipeline.create(options);
PCollection<KV<String,Long>> stream1 = new KafkaWordCount("localhost:9092","test1")
.build(p);
PCollection<KV<String,Long>> stream2 = new KafkaWordCount("localhost:9092","test2")
.build(p);
PCollectionList<KV<String, Long>> pcs = PCollectionList.of(stream1).and(stream2);
PCollection<KV<String, Long>> merged = pcs.apply(Flatten.<KV<String, Long>>pCollections());
merged.apply("finalState", ParDo.of(new DoFn<KV<String,Long>, String>() {
@StateId(myState)
private final StateSpec<MapState<String, Long>> mapState = StateSpecs.map();
@ProcessElement
public void processElement(ProcessContext c, @StateId(myState) MapState<String, Long> state){
KV<String,Long> e = c.element();
System.out.println("Thread ID :"+ Thread.currentThread().getId());
Long currCount = state.get(e.getKey()).read()==null? 0L:state.get(e.getKey()).read();
Long newCount = currCount+e.getValue();
state.put(e.getKey(),newCount);
c.output(e.getKey()+":"+newCount);
}
})).apply(KafkaIO.<Void, String>write()
.withBootstrapServers("localhost:9092")
.withTopic("test")
.withValueSerializer(StringSerializer.class)
.values()
);
p.run().waitUntilFinish();

kafka stream windowed count output unreadable

I am trying a windowed count based on the word count example. It works, except that part of the output is unreadable.
Code:
StringSerializer stringSerializer = new StringSerializer();
StringDeserializer stringDeserializer = new StringDeserializer();
WindowedSerializer<String> windowedSerializer = new WindowedSerializer<>(stringSerializer);
WindowedDeserializer<String> windowedDeserializer = new WindowedDeserializer<>(stringDeserializer);
Serde<Windowed<String>> windowedSerde = Serdes.serdeFrom(windowedSerializer, windowedDeserializer);
TimeWindows window = TimeWindows.of(TimeUnit.MINUTES.toMillis(1)).advanceBy(TimeUnit.MINUTES.toMillis(1));
KStream<String, String> textLines = builder.stream("streams-plaintext-input");
KTable<Windowed<String>, Long> wordCounts = textLines
.flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
.groupBy((key, word) -> word)
.windowedBy(window)
.count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("counts-store"));
wordCounts.toStream().to("streams-plaintext-output", Produced.with(windowedSerde, Serdes.Long()));
KafkaStreams streams = new KafkaStreams(builder.build(), config);
streams.start();
Output:
kafka c[?? 1
yaya c[?? 1
kafka c[?? 2
I guess the unreadable part might be the window duration.
How can I make it readable?
EDIT:
I tried using windowedSerde to print the output:
KStream<Windowed<String>, Long> output = builder.stream("streams-plaintext-output");
output.print(windowedSerde, Serdes.Long());
It still doesn't work.

When reading from the topic, you need to use a Deserializer that matches the Serializer used to produce to the topic. In this case, that is the windowedDeserializer you are already constructing like so:
WindowedDeserializer<String> windowedDeserializer = new WindowedDeserializer<>(stringDeserializer);
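For context on why a plain string reader shows bytes after the word: as far as I know, the legacy WindowedSerializer writes the serialized key followed by the window start as an 8-byte big-endian long, which would plausibly explain output such as kafka c[?? 1. The stdlib-only Java model below assumes that layout (encode, decodeKey and decodeWindowStart are hypothetical helpers, not the library code) and shows that splitting off the last 8 bytes recovers a readable key and window start.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Stdlib-only model of an assumed "key bytes + 8-byte window start" layout.
class WindowedKeyLayoutSketch {
    static final int TIMESTAMP_SIZE = 8; // bytes in a Java long

    static byte[] encode(String key, long windowStartMs) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(keyBytes.length + TIMESTAMP_SIZE);
        buf.put(keyBytes);
        buf.putLong(windowStartMs); // ByteBuffer is big-endian by default
        return buf.array();
    }

    static String decodeKey(byte[] data) {
        return new String(data, 0, data.length - TIMESTAMP_SIZE, StandardCharsets.UTF_8);
    }

    static long decodeWindowStart(byte[] data) {
        return ByteBuffer.wrap(data, data.length - TIMESTAMP_SIZE, TIMESTAMP_SIZE).getLong();
    }

    public static void main(String[] args) {
        byte[] raw = encode("kafka", 1525000000000L);
        // Reading everything as text mixes the word with raw timestamp bytes.
        System.out.println(new String(raw, StandardCharsets.ISO_8859_1));
        // Splitting the layout gives readable parts again.
        System.out.println(decodeKey(raw) + " @ " + decodeWindowStart(raw));
    }
}
```

In the topology itself, the equivalent fix is to read streams-plaintext-output with windowedSerde as the key serde and Serdes.Long() for the values, rather than with the default serdes.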

Config cluster ZK_HOST on Storm Topology

I have just successfully set up my Storm topology as a single node on a single machine. I use KafkaSpout as below:
String zkHostPort = "localhost:2181";
String topic = "sentences";
String zkRoot = "/kafka-sentence-spout";
String zkSpoutId = "sentence-spout";
ZkHosts zkHosts = new ZkHosts(zkHostPort);
SpoutConfig spoutCfg = new SpoutConfig(zkHosts, topic, zkRoot, zkSpoutId);
KafkaSpout kafkaSpout = new KafkaSpout(spoutCfg);
return kafkaSpout;
Now I have set up a ZooKeeper cluster (three nodes: server1.com:2181, server2.com:2181, server3.com:2181) and a Kafka cluster (three nodes). How should I change the Storm topology code for this setup?

Please use the configuration below:
String zkHostPort = "server1.com:2181,server2.com:2181,server3.com:2181";
String topic = "sentences";
String zkRoot = "/kafka-sentence-spout";
String zkSpoutId = "sentence-spout";
ZkHosts zkHosts = new ZkHosts(zkHostPort);
SpoutConfig spoutCfg = new SpoutConfig(zkHosts, topic, zkRoot, zkSpoutId);
KafkaSpout kafkaSpout = new KafkaSpout(spoutCfg);
return kafkaSpout;
Note: the most common mistake here is putting a space after each comma; there must be no spaces between hosts.
Correct:
server1.com:2181,server2.com:2181,server3.com:2181
Wrong:
server1.com:2181, server2.com:2181, server3.com:2181
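One way to avoid the stray-space mistake is to build the host string from a list instead of typing it by hand. The Java sketch below uses only the standard library; build and hasWhitespace are hypothetical helper names, and the resulting string is what would be passed to new ZkHosts(...).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helpers for building a ZooKeeper host string without spaces.
class ZkConnectStringSketch {
    // Trims each "host:port" entry, skips blanks, and joins with bare commas.
    static String build(List<String> hostPorts) {
        List<String> cleaned = new ArrayList<>();
        for (String hp : hostPorts) {
            String t = hp.trim();
            if (!t.isEmpty()) {
                cleaned.add(t);
            }
        }
        return String.join(",", cleaned);
    }

    // True if the string contains any whitespace (the mistake the note warns about).
    static boolean hasWhitespace(String connect) {
        return connect.chars().anyMatch(Character::isWhitespace);
    }

    public static void main(String[] args) {
        String zk = build(Arrays.asList("server1.com:2181", " server2.com:2181", " server3.com:2181"));
        System.out.println(zk);
        System.out.println(hasWhitespace(zk));
    }
}
```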

How to update a text file which always contains a single line?

I have the following code to read a line from a text file.
In the UpdateFile() method I need to delete the existing line and replace it with a new one.
Can anybody please provide any ideas?
Thank you.
FileInfo JFile = new FileInfo(@"C:\test.txt");
using (FileStream JStream = JFile.Open(FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None))
{
    int n = GetNUmber(JStream);
    n = n + 1;
    UpdateFile(JStream, n);
}
private int GetNUmber(FileStream jstream)
{
    StreamReader sr = new StreamReader(jstream);
    string line = sr.ReadToEnd().Trim();
    int result;
    if (string.IsNullOrEmpty(line))
    {
        return 0;
    }
    else
    {
        int.TryParse(line, out result);
        return result;
    }
}
private void UpdateFile(FileStream jstream, int n)
{
    jstream.Seek(0, SeekOrigin.Begin);
    StreamWriter writer = new StreamWriter(jstream);
    writer.WriteLine(n);
    writer.Flush(); // push the buffered text to the file
}

I think the code below can do the job:
StreamWriter writer = new StreamWriter("file path", false); //false means do not append
writer.Write("your new line");
writer.Close();

If you're just writing a single line, there's no need for streams or buffers or any of that. Just write it directly.
using System.IO;
File.WriteAllText(@"C:\test.txt", "hello world");

using System;
using System.IO;
using System.Linq;
var line = File.ReadLines(@"c:\temp\hello.txt").First();
var number = Convert.ToInt32(line);
number++;
File.WriteAllText(@"c:\temp\hello.txt", number.ToString());
Handle the possible exceptions: the file may not exist, it may have no lines, or the conversion to a number may fail.
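The same read, increment, overwrite approach can be sketched in Java (the language used elsewhere in this thread) with java.nio.file.Files. This is a translation for illustration, not the C# answer itself; incrementCounter is a hypothetical name, and, like int.TryParse in the question, it treats a missing, blank, or non-numeric file as 0.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SingleLineCounterSketch {
    static int incrementCounter(Path file) {
        try {
            int current = 0;
            if (Files.exists(file)) {
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
                try {
                    current = text.isEmpty() ? 0 : Integer.parseInt(text);
                } catch (NumberFormatException e) {
                    current = 0; // same fallback as int.TryParse
                }
            }
            int next = current + 1;
            // With no OpenOptions, Files.write creates the file or truncates it first,
            // so the old line is replaced rather than appended to.
            Files.write(file, String.valueOf(next).getBytes(StandardCharsets.UTF_8));
            return next;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("counter", ".txt");
        System.out.println(incrementCounter(file));
        System.out.println(incrementCounter(file));
        Files.delete(file);
    }
}
```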
