Get Kafka record timestamp from Kafka message - Scala

I want the timestamp at which a message was written to a Kafka topic by the producer.
On the consumer side, I want to extract that timestamp.
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object Producer {
  def main(args: Array[String]): Unit = {
    writeToKafka("quick-start")
  }

  def writeToKafka(topic: String): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9094")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    val record = new ProducerRecord[String, String](topic, "key", "value")
    producer.send(record)
    producer.close()
  }
}
import java.util
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._

object Consumer {
  def main(args: Array[String]): Unit = {
    consumeFromKafka("quick-start")
  }

  def consumeFromKafka(topic: String) = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9094")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("auto.offset.reset", "latest")
    props.put("group.id", "consumer-group")
    val consumer: KafkaConsumer[String, String] = new KafkaConsumer[String, String](props)
    consumer.subscribe(util.Arrays.asList(topic))
    while (true) {
      val records = consumer.poll(1000).asScala
      for (data <- records.iterator)
        println(data.value())
    }
  }
}
Does Kafka provide a way to do this, or will I have to send an extra field from the producer to the topic?

Kafka has provided a way to do this since v0.10.
From that version on, every message carries a timestamp, available on the consumer side via data.timestamp(), and the kind of information it holds is governed by the broker/topic config message.timestamp.type, whose value is either CreateTime or LogAppendTime.
Before that version, you'd have to implement it by hand, usually by adding a timestamp field to your data structure.
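For illustration, here is a minimal sketch of the consumer loop above, extended to also print each record's timestamp and timestamp type (assuming the same quick-start setup; timestamp() returns epoch milliseconds):
while (true) {
  val records = consumer.poll(1000).asScala
  for (data <- records) {
    // timestamp() is epoch milliseconds; timestampType() tells you whether it is
    // CreateTime (set by the producer) or LogAppendTime (set by the broker)
    println(s"value=${data.value()} ts=${data.timestamp()} type=${data.timestampType()}")
  }
}
On the producer side, the timestamp can also be set explicitly via the ProducerRecord(topic, partition, timestamp, key, value) constructor; if it is left null and the topic uses CreateTime, the client fills in the current time.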

Related

Add prefix/suffix to a Kafka topic name at runtime

Suppose the environment is TEST1 and the topic name is BENEFIT; the producer should then write to TEST1_BENEFIT.
object hello {
  def Complete(jobId: BigInt, tableName: String, topic: String = configManager.getString("Kafka.Completion.Table.Topic")): Unit = {
    val kafkaServer = configManager.getString("Kafka.Server")
    val props = new Properties()
    props.put("bootstrap.servers", kafkaServer)
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("batch.size", "1")
    props.put("acks", "all")
    val producer = new KafkaProducer[String, String](props)
    // key and value are assumed to be built elsewhere, e.g. from jobId and tableName
    val message = new ProducerRecord[String, String](topic, key, value)
    producer.send(message)
  }
}
As answered before, use string interpolation or concatenation
val message = new ProducerRecord[String, String](s"${environment}_${topic}", key, value)
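A slightly fuller sketch, assuming the environment name comes from the same configManager (the config key Kafka.Environment is hypothetical, not part of the original code):
// Hypothetical config key; substitute however your environment name is actually obtained
val environment = configManager.getString("Kafka.Environment")  // e.g. "TEST1"
val prefixedTopic = s"${environment}_$topic"                     // e.g. "TEST1_BENEFIT"
val message = new ProducerRecord[String, String](prefixedTopic, key, value)
producer.send(message)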

Presto is giving this error: Cannot invoke "com.fasterxml.jackson.databind.JsonNode.has(String)" because "currentNode" is null

I'm pushing a JSON file into a Kafka topic, connecting the topic to Presto, and structuring the JSON data into a queryable table.
The problem I am facing is that Presto is not able to fetch the data; it shows the error Cannot invoke "com.fasterxml.jackson.databind.JsonNode.has(String)" because "currentNode" is null.
Code for pushing data into the Kafka topic:
import java.io.File
import java.util.Properties
import com.fasterxml.jackson.databind.{DeserializationFeature, JsonNode, ObjectMapper}
import com.fasterxml.jackson.module.scala.{DefaultScalaModule, ScalaObjectMapper}
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object Producer extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.connect.json.JsonSerializer")
  val producer = new KafkaProducer[String, JsonNode](props)
  println("inside producer")

  val mapper = (new ObjectMapper() with ScalaObjectMapper).
    registerModule(DefaultScalaModule).
    configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false).
    findAndRegisterModules(). // register joda and java-time modules automatically
    asInstanceOf[ObjectMapper with ScalaObjectMapper]

  val filename = "/Users/rishunigam/Documents/devicd.json"
  val jsonNode: JsonNode = mapper.readTree(new File(filename))
  val s = jsonNode.size()
  for (i <- 0 to jsonNode.size()) {
    val js = jsonNode.get(i)
    println(js)
    val record = new ProducerRecord[String, JsonNode]("tpch.devicelog", js)
    println(record)
    producer.send(record)
  }
  println("producer complete")
  producer.close()
}
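One thing worth noting about the loop above: 0 to jsonNode.size() is inclusive of size, so the last iteration calls jsonNode.get(i) one position past the final element, which returns null and publishes a null record to the topic. A null JSON value in the topic would be consistent with Presto's "currentNode" is null error, so it may be worth bounding the loop with until instead (a guess at the intent, not a confirmed fix):
// 'until' excludes jsonNode.size(), so no null element is sent to the topic
for (i <- 0 until jsonNode.size()) {
  val js = jsonNode.get(i)
  producer.send(new ProducerRecord[String, JsonNode]("tpch.devicelog", js))
}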

How to consume only latest offset in Kafka topic

I am working on a Scala application in which I am using Kafka. My Kafka consumer code is as follows.
def getValues(topic: String): String = {
  val props = new Properties()
  props.put("group.id", "test")
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("auto.offset.reset", "earliest")
  val consumer: KafkaConsumer[String, String] = new KafkaConsumer[String, String](props)
  val topicPartition = new TopicPartition(topic, 0)
  consumer.assign(util.Collections.singletonList(topicPartition))
  val offset = consumer.position(topicPartition) - 1
  var value = ""
  val record = consumer.poll(Duration.ofMillis(500)).asScala
  for (data <- record)
    if (data.offset() == offset) value = data.value()
  value
}
In this I just want to return the latest value. When I run my application, I get the following log:
Resetting offset for partition topic-0 to offset 0
Because of this, val offset = consumer.position(topicPartition) - 1 becomes -1 and data.offset() runs over all offsets from the beginning, so I don't get the latest value. Why is the offset automatically reset to 0? How can I correct it? What is the mistake in my code? Or is there another way I can get the value from the latest offset?
You are looking for the seek method which - according to the JavaDocs - "overrides the fetch offsets that the consumer will use on the next poll(timeout)".
Also make sure that you are setting
props.put("auto.offset.reset", "latest")
Making those two amendments to your code, the following worked for me to fetch only the value at the latest offset of partition 0 in the selected topic:
import java.time.Duration
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import collection.JavaConverters._

def getValues(topic: String): String = {
  val props = new Properties()
  props.put("group.id", "test")
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("auto.offset.reset", "latest")
  val consumer: KafkaConsumer[String, String] = new KafkaConsumer[String, String](props)
  val topicPartition = new TopicPartition(topic, 0)
  consumer.assign(java.util.Collections.singletonList(topicPartition))
  // position() is the offset of the next record to be fetched, so the last written record is one before it
  val offset = consumer.position(topicPartition) - 1
  consumer.seek(topicPartition, offset)
  val records = consumer.poll(Duration.ofMillis(500)).asScala
  // you are only reading one message here if no new messages flow into the Kafka topic
  records.map(_.value()).lastOption.getOrElse("")
}
In the line props.put("auto.offset.reset", "earliest") you set the parameter auto.offset.reset of your Kafka consumer to earliest, which resets the consumer to the earliest available offset. If you want the latest value, you should use latest instead.
You can find the documentation here.
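An alternative sketch that does not depend on auto.offset.reset at all is to ask the consumer for the end of the partition explicitly with seekToEnd and then step back one offset (assuming the same consumer and topicPartition as above):
// Sketch: seek to (end - 1) explicitly, regardless of auto.offset.reset
consumer.assign(java.util.Collections.singletonList(topicPartition))
consumer.seekToEnd(java.util.Collections.singletonList(topicPartition))
val lastOffset = consumer.position(topicPartition) - 1   // position() reflects the end after seekToEnd
consumer.seek(topicPartition, math.max(lastOffset, 0L))
val latest = consumer.poll(Duration.ofMillis(500)).asScala.map(_.value()).lastOption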

Kafka server shuts down after continuously writing data to topic

When I write data to a Kafka topic using Spark Direct Streaming 2.1, Kafka shuts down after some time.
import java.util.HashMap
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

val props = new HashMap[String, Object]()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "ip:9092")
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.StringSerializer")
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](props)
// topic, partition, key and message are defined in the surrounding streaming code;
// this is the four-argument (topic, partition, key, value) constructor
val record = new ProducerRecord[String, String](topic, partition, key, message)
producer.send(record)

Can anyone share a Flink Kafka example in Scala?

Can anyone share a working example of Flink with Kafka (mainly, receiving messages from Kafka) in Scala? I know there is a KafkaWordCount example in Spark. I just need to print out the Kafka messages in Flink. It would be really helpful.
The following code shows how to read from a Kafka topic using Flink's Scala DataStream API:
import java.util.Properties
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

object Main {
  def main(args: Array[String]) {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val properties = new Properties()
    properties.setProperty("bootstrap.servers", "localhost:9092")
    properties.setProperty("zookeeper.connect", "localhost:2181")
    properties.setProperty("group.id", "test")
    val stream = env
      .addSource(new FlinkKafkaConsumer082[String]("topic", new SimpleStringSchema(), properties))
      .print
    env.execute("Flink Kafka Example")
  }
}
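Note that FlinkKafkaConsumer082 is the connector class for Kafka 0.8.2; on newer Flink releases the consumer class is named after the Kafka version it supports (and recent releases ship a single FlinkKafkaConsumer), so pick the class that matches your Flink and Kafka versions; the structure of the example stays the same.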
Complementing what Robert posted above, below is a piece of application code for sending messages to a Kafka topic (the producing side).
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object KafkaProducerExample {
  def main(args: Array[String]): Unit = {
    sendMessageToKafkaTopic("localhost:9092", "topic_name")
  }

  def sendMessageToKafkaTopic(server: String, topic: String): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", server)
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    val record = new ProducerRecord[String, String](topic, "HELLO WORLD!")
    producer.send(record)
    producer.close()
  }
}