Unable to turn a Kafka stream into a Flink table - Scala

I want to make a table out of a Kafka stream and print it, but I am getting an error.
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = StreamTableEnvironment.create(env)
val properties = new Properties()
properties.setProperty("bootstrap.servers", "localhost:9092")
val src = new FlinkKafkaConsumer010[ObjectNode]("broadcast", new JSONKeyValueDeserializationSchema(false), properties)
val stream = env.addSource(src)
val tbl = tEnv.registerDataStream("ASK", stream, 'locationID, 'temp)
env.execute()
Kindly help, thank you.
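One way this is often resolved (a minimal sketch, not a confirmed fix): JSONKeyValueDeserializationSchema yields ObjectNode records, which cannot be registered with named fields such as 'locationID and 'temp directly, so the stream is usually mapped to a typed case class first. The case class, the JSON field names, and the imports (org.apache.flink.streaming.api.scala._ and org.apache.flink.table.api.scala._) below are assumptions:
// Hypothetical sketch: extract the fields from the "value" part of the ObjectNode
// (JSONKeyValueDeserializationSchema nests the payload under "value"), then register the typed stream.
case class LocationReading(locationID: String, temp: Double)

val readings = stream.map { node =>
  val value = node.get("value")
  LocationReading(value.get("locationID").asText(), value.get("temp").asDouble())
}

tEnv.registerDataStream("ASK", readings, 'locationID, 'temp)
tEnv.sqlQuery("SELECT locationID, temp FROM ASK").toAppendStream[LocationReading].print()

env.execute()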

Related

Presto is giving this error: Cannot invoke "com.fasterxml.jackson.databind.JsonNode.has(String)" because "currentNode" is null

I'm pushing a JSON file into a Kafka topic, connecting the topic in Presto, and structuring the JSON data into a queryable table.
The problem I am facing is that Presto is not able to fetch the data; it shows the error Cannot invoke "com.fasterxml.jackson.databind.JsonNode.has(String)" because "currentNode" is null.
Code for pushing data into the Kafka topic:
object Producer extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.connect.json.JsonSerializer")

  val producer = new KafkaProducer[String, JsonNode](props)
  println("inside producer")

  val mapper = (new ObjectMapper() with ScalaObjectMapper).
    registerModule(DefaultScalaModule).
    configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false).
    findAndRegisterModules(). // register joda and java-time modules automatically
    asInstanceOf[ObjectMapper with ScalaObjectMapper]

  val filename = "/Users/rishunigam/Documents/devicd.json"
  val jsonNode: JsonNode = mapper.readTree(new File(filename))

  val s = jsonNode.size()
  for (i <- 0 to jsonNode.size()) {
    val js = jsonNode.get(i)
    println(js)
    val record = new ProducerRecord[String, JsonNode]("tpch.devicelog", js)
    println(record)
    producer.send(record)
  }
  println("producer complete")
  producer.close()
}
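One detail in the loop above may explain the Presto error: 0 to jsonNode.size() is inclusive, so the last iteration calls jsonNode.get(jsonNode.size()), which returns null and publishes a record with a null body; that is at least consistent with the null currentNode reported by Presto's JSON decoder. A sketch of a bounded loop (same producer and topic as above) to rule this out:
// Stay inside the array bounds and skip any null elements before publishing.
for (i <- 0 until jsonNode.size()) {
  Option(jsonNode.get(i)).foreach { js =>
    println(js)
    producer.send(new ProducerRecord[String, JsonNode]("tpch.devicelog", js))
  }
}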

How can I forward Protobuf data from Flink to Kafka and stdout?

I'd like to add some code here to write the protobuf data from Flink to stdout.
I am using Flink's Apache Kafka Connector in order to connect Flink to Kafka.
This is my Flink code.
val env = StreamExecutionEnvironment.getExecutionEnvironment
val props = new Properties()
props.setProperty("bootstrap.servers", "localhost:9092")
val producer = FlinkKafkaProducer011(topic, new myProtobufSchema, props)
env.addSink(producer)
env.execute("To Kafka")
Here is my Kafka Streams code.
val props: Properties = {
  val p = new Properties()
  p.put(StreamsConfig.APPLICATION_ID_CONFIG, "protobuf-application")
  p.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  p
}

val builder: StreamsBuilder = new StreamsBuilder
// TODO: implement here to stdout
val streams: KafkaStreams = new KafkaStreams(builder.build(), props)
streams.start()

sys.ShutdownHookThread {
  streams.close(Duration.ofSeconds(10))
}
You need to set up the StreamsBuilder to consume from a topic (with suitable serdes) and print the resulting stream:
val builder: StreamsBuilder = new StreamsBuilder()
builder.stream[String, Array[Byte]](topic)
  .print(Printed.toSysOut[String, Array[Byte]]())
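For the Flink side of the question (forwarding to Kafka and printing to stdout), a rough sketch under the assumption that a protobuf DataStream already exists and that myProtobufSchema is a SerializationSchema for it; MyProto and protoStream are placeholders, while topic and props come from the snippet above. The key point is that the producer must be attached to a DataStream, not to the StreamExecutionEnvironment:
// Hypothetical: protoStream stands in for however the protobuf records enter the job.
val protoStream: DataStream[MyProto] = ??? // e.g. env.addSource(yourProtobufKafkaConsumer)

protoStream.print() // goes to the taskmanagers' stdout

val producer = new FlinkKafkaProducer011[MyProto](topic, new myProtobufSchema, props)
protoStream.addSink(producer) // forwards the same records to Kafka

env.execute("To Kafka")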

Consuming multiple topics of multiple Kafka brokers in Flink

I am using flink-1.4.2 with Scala and I want to consume multiple Kafka data stream sources. I have used the union function to combine them, but I am only able to use one Kafka source.
def main(args: Array[String]) {
  val kProps = new Properties()
  kProps.setProperty("bootstrap.servers", "kafka01.prod.com:9092")
  kProps.setProperty("group.id", "test_cg")
  kProps.setProperty("enable.auto.commit", "true")
  kProps.setProperty("auto.offset.reset", "latest")

  val kProps2 = new Properties()
  kProps2.setProperty("bootstrap.servers", "kafka04.prod.com:9092")
  kProps2.setProperty("group.id", "test_cg_2")
  kProps2.setProperty("enable.auto.commit", "true")
  kProps2.setProperty("auto.offset.reset", "latest")

  val sink = new BucketingSink[SimpleKafkaOutputMsg]("s3://some-bucket/")
  sink.setBucketer(new DateTimeBucketer[SimpleKafkaOutputMsg]("yyyy-MM-dd-HH"))
  sink.setWriter(new StringWriter[SimpleKafkaOutputMsg])
  sink.setBatchSize(350 * 1024 * 1024) // 350 MB
  sink.setPendingPrefix("file-")
  sink.setPendingSuffix(".csv")

  val env = StreamExecutionEnvironment.getExecutionEnvironment
  env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime)
  env.setParallelism(9)
  env.setStateBackend(new RocksDBStateBackend("file:///tmp/flink/checkpoints", false))

  val topics = List("ROUNDTRIP1")
  val inpStream1 = env.addSource(new FlinkKafkaConsumer011(topics.asJava, new IndexedSectorMessagDes(), kProps))

  val topics2 = List("ROUNDTRIP2")
  val inpStream2 = env.addSource(new FlinkKafkaConsumer011(topics2.asJava, new IndexedSectorMessagDes(), kProps2))

  val inpStream = inpStream1.union(inpStream2)
    .filter(new InvalidFlightsFilterFunction())
    .map(attachUID(_))
    .assignTimestampsAndWatermarks(new IngestionTimeExtractor[IndexedSectorMessage]())

  val intStream = inpStream.flatMap { s => flattenFlights(s) }
  intStream.keyBy(getFlightKey _).process(new KeyedWindowTimeMedianFunction()).addSink(sink)

  env.execute("Scala WindowExample Example")
}

No file written to HDFS in Flink

I'm trying to consume Kafka with Flink and save the result to HDFS, but no file is ever produced and no error message is raised.
By the way, saving to a local file works fine, but when I change the path to HDFS I get nothing.
object kafka2Hdfs {
  private val ZOOKEEPER_HOST = "ip1:2181,ip2:2181,ip3:2181"
  private val KAFKA_BROKER = "ip1:9092,ip2:9092,ip3:9092"
  private val TRANSACTION_GROUP = "transaction"
  val topic = "tgt3"

  def main(args: Array[String]) {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    env.enableCheckpointing(1000L)
    env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE)

    // configure Kafka consumer
    val kafkaProps = new Properties()
    .... // topic infos
    kafkaProps.setProperty("fs.default-scheme", "hdfs://ip:8020")
    val consumer = new FlinkKafkaConsumer010[String](topic, new SimpleStringSchema(), kafkaProps)
    val source = env.addSource(consumer)

    val path = new Path("/user/jay/data")

    // sink
    val rollingPolicy: RollingPolicy[String, String] = DefaultRollingPolicy.create()
      .withRolloverInterval(15000)
      .build()
    val sink: StreamingFileSink[String] = StreamingFileSink
      .forRowFormat(path, new SimpleStringEncoder[String]("UTF-8"))
      .withRollingPolicy(rollingPolicy)
      .build()

    source.addSink(sink)
    env.execute("test")
  }
}
I'm very confused.
Off the top of my head, there could be two things to look into:
Is the HDFS namenode properly configured, so that Flink knows it should write to HDFS instead of to the local disk?
What do the nodemanager and taskmanager logs say? The write could be failing due to a permission issue on HDFS.
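One concrete thing to try, sketched below with the namenode address taken as a placeholder from the hdfs://ip:8020 value in the consumer properties: give the sink a fully qualified HDFS URI rather than a bare path, since /user/jay/data resolves against whatever filesystem Flink treats as the default. As far as I can tell, setting fs.default-scheme inside the Kafka consumer Properties has no effect, because it is a Flink cluster configuration option. Also note that the StreamingFileSink only finalizes part files on successful checkpoints, which the job above does enable.
// Hypothetical: point the StreamingFileSink at an explicit HDFS path (host/port are placeholders).
val path = new Path("hdfs://ip:8020/user/jay/data")

val sink: StreamingFileSink[String] = StreamingFileSink
  .forRowFormat(path, new SimpleStringEncoder[String]("UTF-8"))
  .withRollingPolicy(DefaultRollingPolicy.create().withRolloverInterval(15000).build())
  .build()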

Reactive Kafka not working

I am trying out a simple reactive-kafka program which reads from and writes to Kafka. It starts up but does nothing, even when I am publishing messages to the input topic.
implicit val system = ActorSystem("main")
implicit val materializer = ActorMaterializer()

val kafkaUrl: String = "localhost:9092"

val producerSettings = ProducerSettings(system, new ByteArraySerializer, new StringSerializer)
  .withBootstrapServers(kafkaUrl)

val consumerSettings = ConsumerSettings(system, new ByteArrayDeserializer, new StringDeserializer, Set("inputTopic"))
  .withBootstrapServers(kafkaUrl)
  .withGroupId("group1")
  .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

val flow: RunnableGraph[Control] = Consumer.committableSource(consumerSettings.withClientId("client1"))
  .map { msg =>
    println("msg = " + msg)
    Producer.Message(new ProducerRecord[Array[Byte], String]("test.topic2", msg.value), msg.committableOffset)
  }
  .to(Producer.commitableSink(producerSettings))

flow.run()
It just stays there forever. Any tips on debugging why this is not working?
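A debugging step that sometimes makes the cause visible (a sketch reusing the question's API; it assumes a reasonably recent Akka Streams): keep the materialized values instead of discarding them and log how the stream terminates. Also note that with auto.offset.reset=earliest, an already-committed offset for group1 still wins, so trying a fresh group id is worth it too.
import akka.Done
import akka.stream.scaladsl.Keep
import scala.concurrent.Future
import scala.util.{Failure, Success}
import system.dispatcher

// Sketch: watchTermination surfaces a silent stream failure; Control allows a clean shutdown later.
val (control, done): (Control, Future[Done]) =
  Consumer.committableSource(consumerSettings.withClientId("client1"))
    .map { msg =>
      println("msg = " + msg)
      Producer.Message(new ProducerRecord[Array[Byte], String]("test.topic2", msg.value), msg.committableOffset)
    }
    .watchTermination()(Keep.both)
    .to(Producer.commitableSink(producerSettings))
    .run()

done.onComplete {
  case Success(_)  => println("stream completed")
  case Failure(ex) => println(s"stream failed: $ex")
}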