Kafka server shuts down after continuously writing data to topic - Scala

When I am writing data to a Kafka topic using Spark Direct Streaming 2.1, after some time the Kafka broker shuts down.
import java.util.HashMap
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

val props = new HashMap[String, Object]()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "ip:9092")
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.StringSerializer")
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](props)
// ProducerRecord(topic, partition, key, value)
val record = new ProducerRecord[String, String](topic, partition, key, message)
producer.send(record)

Related

Presto is giving this error : Cannot invoke "com.fasterxml.jackson.databind.JsonNode.has(String)" because "currentNode" is null

I'm pushing a JSON file into a Kafka topic, connecting the topic to Presto and structuring the JSON data into a queryable table.
The problem I am facing is that Presto is not able to fetch the data; it shows the error: Cannot invoke "com.fasterxml.jackson.databind.JsonNode.has(String)" because "currentNode" is null.
Code for pushing data into the Kafka topic:
import java.io.File
import java.util.Properties
import com.fasterxml.jackson.databind.{DeserializationFeature, JsonNode, ObjectMapper}
import com.fasterxml.jackson.module.scala.{DefaultScalaModule, ScalaObjectMapper}
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object Producer extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.connect.json.JsonSerializer")
  val producer = new KafkaProducer[String, JsonNode](props)
  println("inside producer")

  val mapper = (new ObjectMapper() with ScalaObjectMapper).
    registerModule(DefaultScalaModule).
    configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false).
    findAndRegisterModules(). // register joda and java-time modules automatically
    asInstanceOf[ObjectMapper with ScalaObjectMapper]

  val filename = "/Users/rishunigam/Documents/devicd.json"
  val jsonNode: JsonNode = mapper.readTree(new File(filename))
  // iterate with `until`, not `to`: jsonNode.get(jsonNode.size()) returns null
  // and would push a null record into the topic
  for (i <- 0 until jsonNode.size()) {
    val js = jsonNode.get(i)
    println(js)
    val record = new ProducerRecord[String, JsonNode]("tpch.devicelog", js)
    println(record)
    producer.send(record)
  }
  println("producer complete")
  producer.close()
}

How can I forward Protobuf data from Flink to Kafka and stdout?

I'd like to add some code here and print the protobuf data from Flink to stdout.
I am using Flink's Apache Kafka Connector in order to connect Flink to Kafka.
This is my Flink code.
val env = StreamExecutionEnvironment.getExecutionEnvironment
val props = new Properties()
props.setProperty("bootstrap.servers", "localhost:9092")
val producer = new FlinkKafkaProducer011(topic, new myProtobufSchema, props)
env.addSink(producer)
env.execute("To Kafka")
Here is my Kafka Streams code.
val props: Properties = {
  val p = new Properties()
  p.put(StreamsConfig.APPLICATION_ID_CONFIG, "protobuf-application")
  p.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  p
}

val builder: StreamsBuilder = new StreamsBuilder
// TODO: implement here to stdout
val streams: KafkaStreams = new KafkaStreams(builder.build(), props)
streams.start()

sys.ShutdownHookThread {
  streams.close(Duration.ofSeconds(10))
}
You need to set up the StreamsBuilder to consume from a topic and print each record:
val builder: StreamsBuilder = new StreamsBuilder()
builder
  .stream(topic)
  .print(Printed.toSysOut())

Get kafka record timestamp from kafka message

I want the timestamp at which the message was inserted into the Kafka topic by the producer.
And on the Kafka consumer side, I want to extract that timestamp.
import java.util
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object Producer {
  def main(args: Array[String]): Unit = {
    writeToKafka("quick-start")
  }

  def writeToKafka(topic: String): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9094")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    val record = new ProducerRecord[String, String](topic, "key", "value")
    producer.send(record)
    producer.close()
  }
}

object Consumer {
  def main(args: Array[String]): Unit = {
    consumeFromKafka("quick-start")
  }

  def consumeFromKafka(topic: String): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9094")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("auto.offset.reset", "latest")
    props.put("group.id", "consumer-group")
    val consumer: KafkaConsumer[String, String] = new KafkaConsumer[String, String](props)
    consumer.subscribe(util.Arrays.asList(topic))
    while (true) {
      val records = consumer.poll(1000).asScala
      for (data <- records.iterator)
        println(data.value())
    }
  }
}
Does Kafka provide a way to do this? Otherwise I will have to send an extra field from the producer to the topic.
Kafka has provided a way to do this since v0.10.
From that version on, every message carries a timestamp, available on the consumer side as data.timestamp(), and the kind of timestamp stored is governed by the config "message.timestamp.type" on your brokers. Its value is either CreateTime or LogAppendTime.
Before that version, you would have to implement it by hand, usually by adding a timestamp field to your own data structure.
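For illustration, on 0.10+ the consumer loop from the question could read both the timestamp and its type straight off each record; this is only a sketch built on the consumer code above, not part of the original answer:
while (true) {
  val records = consumer.poll(1000).asScala
  for (data <- records.iterator) {
    // timestamp() is epoch milliseconds; timestampType() reports whether it is
    // CREATE_TIME (set by the producer) or LOG_APPEND_TIME (set by the broker)
    println(s"value=${data.value()} timestamp=${data.timestamp()} type=${data.timestampType()}")
  }
}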

Kafka 2.3.0 producer and consumer

I'm new to Kafka and want to use Kafka 2.3 to implement a producer/consumer app.
I downloaded and installed Kafka 2.3 on my Ubuntu server.
I found some code online and built it on my laptop in IDEA, but the consumer can't get any data.
I checked the topic info on my server, and the topic exists.
I used kafka-console-consumer to check this topic and got the topic's values successfully, but not with my consumer.
So what's wrong with my consumer?
Producer
package com.phitrellis.tool

import java.util.Properties
import java.util.concurrent.{Future, TimeUnit}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer._

object MyKafkaProducer extends App {

  def createKafkaProducer(): Producer[String, String] = {
    val props = new Properties()
    props.put("bootstrap.servers", "*:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("producer.type", "async")
    props.put("acks", "all")
    new KafkaProducer[String, String](props)
  }

  def writeToKafka(topic: String): Unit = {
    val producer = createKafkaProducer()
    val record = new ProducerRecord[String, String](topic, "key", "value22222222222")
    println("start")
    producer.send(record)
    producer.close()
    println("end")
  }

  writeToKafka("phitrellis")
}
Consumer
package com.phitrellis.tool

import java.util
import java.util.Properties
import java.time.Duration
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer

object MyKafkaConsumer extends App {

  def createKafkaConsumer(): KafkaConsumer[String, String] = {
    val props = new Properties()
    props.put("bootstrap.servers", "*:9092")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    // props.put("auto.offset.reset", "latest")
    props.put("enable.auto.commit", "true")
    props.put("auto.commit.interval.ms", "1000")
    props.put("group.id", "test")
    new KafkaConsumer[String, String](props)
  }

  def consumeFromKafka(topic: String) = {
    val consumer: KafkaConsumer[String, String] = createKafkaConsumer()
    consumer.subscribe(util.Arrays.asList(topic))
    while (true) {
      val records = consumer.poll(Duration.ofSeconds(2)).asScala.iterator
      println("true")
      for (record <- records) {
        print(record.value())
      }
    }
  }

  consumeFromKafka("phitrellis")
}
Two lines in your Consumer code are crucial:
props.put("auto.offset.reset", "latest")
props.put("group.id", "test")
To read from the beginning of the topic you have to set auto.offset.reset to earliest (latest causes you to skip messages produced before your Consumer started).
group.id is responsible for group management. If you start processing data with some group.id and then restart your application, or start a new one with the same group.id, only new messages will be read.
For your tests I would suggest adding auto.offset.reset -> earliest and changing the group.id:
props.put("auto.offset.reset", "earliest")
props.put("group.id", "test123")
Additionally:
You have to remember that KafkaProducer::send returns a Future<RecordMetadata> and messages are sent asynchronously; if your program finishes before the Future completes, the messages might never be sent.
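For example, the writeToKafka method from the question could block on the returned Future (or call flush()) before closing, so the record is actually delivered before the JVM exits; a minimal sketch, not part of the original answer:
def writeToKafka(topic: String): Unit = {
  val producer = createKafkaProducer()
  val record = new ProducerRecord[String, String](topic, "key", "value22222222222")
  // send() is asynchronous; get() blocks until the broker acknowledges the write
  val metadata = producer.send(record).get()
  println(s"written to ${metadata.topic()}-${metadata.partition()} at offset ${metadata.offset()}")
  producer.close()
}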
There are two parts here: the producing side and the consuming side.
You don't say anything about the producer, so we're assuming it worked. However, did you check on the servers? You could check the Kafka log files to see if there's any data on those particular topic/partitions.
On the consumer side, to validate, you should try to consume from that same topic using the command line, to make sure the data is in there. Look for "Kafka Consumer Console" at the following link, and follow those steps.
http://cloudurable.com/blog/kafka-tutorial-kafka-from-command-line/index.html
If there is data on the topic, then running that command should get you data. If there isn't, it will just "hang" because it's waiting for data to be written to the topic.
In addition, you can try producing to the same topic using those command-line tools, to make sure your cluster is configured correctly, that you have the right addresses and ports, and that the ports are not blocked.

Can anyone share a Flink Kafka example in Scala?

Can anyone share a working example of Flink Kafka (mainly receiving messages from Kafka) in Scala? I know there is a KafkaWordCount example in Spark. I just need to print out Kafka messages in Flink. It would be really helpful.
The following code shows how to read from a Kafka topic using Flink's Scala DataStream API:
import java.util.Properties

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

object Main {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val properties = new Properties()
    properties.setProperty("bootstrap.servers", "localhost:9092")
    properties.setProperty("zookeeper.connect", "localhost:2181")
    properties.setProperty("group.id", "test")

    val stream = env
      .addSource(new FlinkKafkaConsumer082[String]("topic", new SimpleStringSchema(), properties))
      .print

    env.execute("Flink Kafka Example")
  }
}
To complement what Robert added, below is a piece of application code for sending messages to the Kafka topic.
import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object KafkaProducer {
  def main(args: Array[String]): Unit = {
    KafkaProducer.sendMessageToKafkaTopic("localhost:9092", "topic_name")
  }

  def sendMessageToKafkaTopic(server: String, topic: String): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", server)
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    val record = new ProducerRecord[String, String](topic, "HELLO WORLD!")
    producer.send(record)
    producer.close()
  }
}