Adding a name to the source processor of a Kafka Streams app results in a serialization exception (Scala)

I'm trying to name my source processor using the Consumed.as() method (full code below):
val usersOrdersStreams: KStream[UserId, Order] = builder
.stream[UserId, Order](ordersByUserTopic)(Consumed.as("topic-name"))
However, when I run the application I get the following exception:
org.apache.kafka.common.config.ConfigException: Please specify a value serde or set one through StreamsConfig#DEFAULT_VALUE_SERDE_CLASS_CONFIG
When I looked at the definition of .as() I saw this:
public static <K, V> Consumed<K, V> as(final String processorName) {
    return new Consumed<>(null, null, null, null, processorName);
}
So I guessed the issue was that the key/value serdes were set to null.
I tried to solve it by adding a call to withValueSerde():
val orderSerde = ...
val usersOrdersStreams: KStream[UserId, Order] = builder
.stream[UserId, Order](ordersByUserTopic)(Consumed.as("topic-name").withValueSerde(orderSerde))
But got the same error. What am I doing wrong?
Note: if I remove the Consumed.as() part, the code works and no exception is thrown.
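For reference, a minimal sketch of the working variant without the processor name, relying on the implicit Consumed built from the serdes in scope:
val usersOrdersStreams: KStream[UserId, Order] =
  builder.stream[UserId, Order](ordersByUserTopic)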
Following is the full code (some imports were removed for readability reasons):
import org.apache.kafka.common.serialization.Serde
import org.apache.kafka.streams.kstream.{GlobalKTable, JoinWindows, TimeWindows, Windowed}
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.serialization.Serdes
import org.apache.kafka.streams.scala.serialization.Serdes._
import scala.concurrent.duration._
object KafkaStreamsApp {
  implicit def serde[A >: Null : Decoder : Encoder]: Serde[A] = {
    val serializer = (a: A) => a.asJson.noSpaces.getBytes
    val deserializer = (aAsBytes: Array[Byte]) => {
      val aAsString = new String(aAsBytes)
      val aOrError = decode[A](aAsString)
      aOrError match {
        case Right(a) => Option(a)
        case Left(error) =>
          Option.empty
      }
    }
    Serdes.fromFn[A](serializer, deserializer)
  }

  implicit val orderSerde: Serde[Order] = serde[Order]

  // Topics
  final val ordersByUserTopic = "orders-by-user"
  final val filterOrders = "filter-low-orders"
  final val applyMapValues = "mapValues-apply-discount"
  final val payedOrdersTopic = "filtered-orders"

  type UserId = String
  case class Order(user: UserId, amount: Double)

  val builder = new StreamsBuilder

  val usersOrdersStreams: KStream[UserId, Order] =
    builder.stream[UserId, Order](ordersByUserTopic)(Consumed.as("vvv").withValueSerde(orderSerde))

  def paidOrdersTopology(): Unit = {
    usersOrdersStreams
      .filter((_, v) => v.amount > 1000.0, named = Named.as(filterOrders))
      .mapValues(v => v.copy(amount = v.amount * 0.85), named = Named.as(applyMapValues))
      .to(payedOrdersTopic)
  }

  def main(args: Array[String]): Unit = {
    val props = new Properties
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-application")
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.stringSerde.getClass)

    paidOrdersTopology()

    val topology: Topology = builder.build()
    println(topology.describe())

    val application: KafkaStreams = new KafkaStreams(topology, props)
    application.start()
  }
}

So... after some digging I managed to find the issue: the key serde was missing. The following code sets only the value serde, which creates a Consumed object with a null key serde:
val orderSerde = ...
val usersOrdersStreams: KStream[UserId, Order] = builder
.stream[UserId, Order](ordersByUserTopic)(Consumed.as("topic-name").withValueSerde(orderSerde))
When I added the key serde as well:
val orderSerde = ...
val consumed = Consumed.as("topic-name")
  .withKeySerde(Serdes.stringSerde) // the key serde that was previously missing
  .withValueSerde(orderSerde)
val usersOrdersStreams: KStream[UserId, Order] =
builder.stream[UserId, Order](ordersByUserTopic)(consumed)
The code started working.
The only thing I'm not sure about is why the error stated that the value serde was missing, when it was actually the key serde that was missing.
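As a side note, a hedged alternative (assuming your Kafka Streams version ships org.apache.kafka.streams.scala.kstream.Consumed with an as method) is to let the Scala DSL build the named Consumed from the implicit serdes, so neither serde can end up null:
import org.apache.kafka.streams.scala.kstream.{Consumed => ScalaConsumed}

val usersOrdersStreams: KStream[UserId, Order] =
  builder.stream[UserId, Order](ordersByUserTopic)(ScalaConsumed.as[UserId, Order]("topic-name"))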

Related

Flink 1.12 serialize Avro Generic Record to Kafka failed with com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException

I have a DataStream[GenericRecord]:
val consumer = new FlinkKafkaConsumer[String]("input_csv_topic", new SimpleStringSchema(), properties)
val stream = senv.
addSource(consumer).
map(line => {
val arr = line.split(",")
val schemaUrl = "" // avro schema link, standard .avsc file format
val schemaStr = scala.io.Source.fromURL(schemaUrl).mkString.toString().stripLineEnd
import org.codehaus.jettison.json.{JSONObject, JSONArray}
val schemaFields: JSONArray = new JSONObject(schemaStr).optJSONArray("fields")
val genericDevice: GenericRecord = new GenericData.Record(new Schema.Parser().parse(schemaStr))
for(i <- 0 until arr.length) {
val fieldObj: JSONObject = schemaFields.optJSONObject(i)
val columnName = fieldObj.optString("name")
var columnType = fieldObj.optString("type")
if (columnType.contains("string")) {
genericDevice.put(columnName, arr(i))
} else if (columnType.contains("int")) {
genericDevice.put(columnName, toInt(arr(i)).getOrElse(0).asInstanceOf[Number].intValue)
} else if (columnType.contains("long")) {
genericDevice.put(columnName, toLong(arr(i)).getOrElse(0).asInstanceOf[Number].longValue)
}
}
genericDevice
})
val kafkaSink = new FlinkKafkaProducer[GenericRecord](
"output_avro_topic",
new MyKafkaAvroSerializationSchema[GenericRecord](classOf[GenericRecord], "output_avro_topic", "this is the key", schemaStr),
properties,
FlinkKafkaProducer.Semantic.AT_LEAST_ONCE)
stream.addSink(kafkaSink)
Here is the MyKafkaAvroSerializationSchema implementation:
class MyKafkaAvroSerializationSchema[T](avroType: Class[T], topic: String, key: String, schemaStr: String) extends KafkaSerializationSchema[T] {
lazy val schema: Schema = new Schema.Parser().parse(schemaStr)
override def serialize(element: T, timestamp: lang.Long): ProducerRecord[Array[Byte], Array[Byte]] = {
val cl = Thread.currentThread().getContextClassLoader()
val genericData = new GenericData(cl)
val writer = new GenericDatumWriter[T](schema, genericData)
// val writer = new ReflectDatumWriter[T](schema)
// val writer = new SpecificDatumWriter[T](schema)
val out = new ByteArrayOutputStream()
val encoder: BinaryEncoder = EncoderFactory.get().binaryEncoder(out, null)
writer.write(element, encoder)
encoder.flush()
out.close()
new ProducerRecord[Array[Byte], Array[Byte]](topic, key.getBytes, out.toByteArray)
}
}
Here's the stack trace:
com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException
Serialization trace:
reserved (org.apache.avro.Schema$Field)
fieldMap (org.apache.avro.Schema$RecordSchema)
schema (org.apache.avro.generic.GenericData$Record)
How can I use Flink to serialize an Avro GenericRecord to Kafka? I have tested different writers, but I still get com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException. Thanks for your input.
You can simply add the flink-avro module to your project and use the provided AvroSerializationSchema, which can be used for both SpecificRecord and GenericRecord after providing the schema.
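A hedged sketch of that suggestion, reusing schemaStr, properties and stream from the question (the flink-avro dependency and its forGeneric factory are the assumptions here):
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.flink.formats.avro.AvroSerializationSchema
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer

val avroSchema: Schema = new Schema.Parser().parse(schemaStr)
// flink-avro supplies a SerializationSchema[GenericRecord] for this schema
val valueSchema = AvroSerializationSchema.forGeneric(avroSchema)
val kafkaSink = new FlinkKafkaProducer[GenericRecord]("output_avro_topic", valueSchema, properties)
stream.addSink(kafkaSink)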

Scala JavaFx -- Cannot resolve overloaded method 'add' when trying to add tree table columns

I am trying to write an application using JavaFX and Scala (not ScalaFX). When I tried out this example from http://tutorials.jenkov.com/javafx/treetableview.html (Add TreeTableColumn to TreeTableView), I got a "Cannot resolve overloaded method 'add'" error on the last two lines. I was wondering if you could help me get past this issue.
class Phase1 extends Application {
import javafx.scene.control.TreeTableColumn
import javafx.scene.control.TreeTableView
import javafx.scene.control.cell.TreeItemPropertyValueFactory
override def start(primaryStage: Stage): Unit = {
primaryStage.setTitle("Experimental Blocking Tree")
val scene = new Scene(new Group(), 1500, 800)
val sceneRoot = scene.getRoot.asInstanceOf[Group]
val treeTableView = new TreeTableView[Car]
val treeTableColumn1: TreeTableColumn[Car, String] = new TreeTableColumn[Car, String]("Brand")
val treeTableColumn2: TreeTableColumn[Car, String] = new TreeTableColumn[Car, String]("Model")
treeTableColumn1.setCellValueFactory(new TreeItemPropertyValueFactory[Car, String]("brand"))
treeTableColumn2.setCellValueFactory(new TreeItemPropertyValueFactory[Car, String]("model"))
treeTableView.getColumns.add(treeTableColumn1) // cannot resolve overloaded method here
treeTableView.getColumns.add(treeTableColumn2) // and here
}
}
Thanks in advance.
I had the same issue with displaying data in TreeTableView.
Jarek posted a solution here: GitHub Issue
Also this works for me:
import scalafx.beans.property.ReadOnlyStringProperty
case class Car (
val brand: ReadOnlyStringProperty,
val model: ReadOnlyStringProperty
)
class CarStringFactory(val stringValue: ReadOnlyStringProperty) extends scalafx.beans.value.ObservableValue[String, String] {
override def delegate: javafx.beans.value.ObservableValue[String] = stringValue
override def value: String = stringValue.get
}
class YourScalaFXApp {
// ... boilerplate code ...
import scalafx.scene.control.{TreeTableView, TreeTableColumn}
val treeTableView = new TreeTableView[Car]
val treeTableColumn1: TreeTableColumn[Car, String] = new TreeTableColumn[Car, String]("Brand"){
cellValueFactory = {p => new CarStringFactory(p.value.value.value.brand) }
}
val treeTableColumn2: TreeTableColumn[Car, String] = new TreeTableColumn[Car, String]("Model"){
cellValueFactory = {p => new CarStringFactory(p.value.value.value.model) }
}
treeTableView.getColumns.add(treeTableColumn1)
treeTableView.getColumns.add(treeTableColumn2)
}
Refer to
ScalaFX documentation: Properties
TreeTableColumn.cellValueFactory

Trying to define serializer using avro4s but getting a missing implicit error

I am using the Flink (1.7) Kafka client and avro4s (2.0.4), and I want to serialize to a byte array:
class AvroSerializationSchema[IN : SchemaFor : FromRecord: ToRecord] extends SerializationSchema[IN] {
override def serialize(element: IN): Array[Byte] = {
val str = AvroSchema[IN]
val schema: Schema = new Parser().parse(str.toString)
val out = new ByteArrayOutputStream()
val os = AvroOutputStream.data[IN].to(out).build(schema)
os.write(element)
out.close()
out.flush()
os.flush()
os.close()
out.toByteArray
}
}
However, I keep getting this exception:
Error:(15, 35) could not find implicit value for evidence parameter of type com.sksamuel.avro4s.Encoder[IN]
val os = AvroOutputStream.data[IN].to(out).build(schema)
and
Error:(15, 35) not enough arguments for method data: (implicit evidence$3: com.sksamuel.avro4s.Encoder[IN])com.sksamuel.avro4s.AvroOutputStreamBuilder[IN].
Unspecified value parameter evidence$3.
val os = AvroOutputStream.data[IN].to(out).build(schema)
According to the code, IN has to have an Encoder instance:
object AvroOutputStream {
  /**
   * An [[AvroOutputStream]] that does not write the schema. Use this when
   * you want the smallest messages possible at the cost of not having the schema available
   * in the messages for downstream clients.
   */
  def binary[T: Encoder] = new AvroOutputStreamBuilder[T](BinaryFormat)
  def json[T: Encoder] = new AvroOutputStreamBuilder[T](JsonFormat)
  def data[T: Encoder] = new AvroOutputStreamBuilder[T](DataFormat)
}
so it should be something like:
class AvroSerializationSchema[IN : Encoder] ...
You don't need to use FromRecord when writing to the output stream. That is for people who want to have a GenericRecord for their own use. You need to use Encoder.
class AvroSerializationSchema[IN : SchemaFor : Encoder] extends SerializationSchema[IN] {
override def serialize(element: IN): Array[Byte] = {
val str = AvroSchema[IN]
val schema: Schema = new Parser().parse(str.toString)
val out = new ByteArrayOutputStream()
val os = AvroOutputStream.data[IN].to(out).build(schema)
os.write(element)
out.close()
out.flush()
os.flush()
os.close()
out.toByteArray
}
}
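For completeness, a hedged usage sketch; the MyEvent case class below is an assumption, not part of the original question:
case class MyEvent(id: String, amount: Double)

// avro4s derives SchemaFor and Encoder for the case class at compile time
val serializer = new AvroSerializationSchema[MyEvent]
val bytes: Array[Byte] = serializer.serialize(MyEvent("id-1", 42.0))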

How to ensure constant Avro schema generation and avoid the 'Too many schema objects created for x' exception?

I am experiencing a reproducible error while producing Avro messages with reactive kafka and avro4s. Once the identityMapCapacity of the client (CachedSchemaRegistryClient) is reached, serialization fails with
java.lang.IllegalStateException: Too many schema objects created for <myTopic>-value
This is unexpected, since all messages should have the same schema - they are serializations of the same case class.
val avroProducerSettings: ProducerSettings[String, GenericRecord] =
ProducerSettings(system, Serdes.String().serializer(),
avroSerde.serializer())
.withBootstrapServers(settings.bootstrapServer)
val avroProdFlow: Flow[ProducerMessage.Message[String, GenericRecord, String],
ProducerMessage.Result[String, GenericRecord, String],
NotUsed] = Producer.flow(avroProducerSettings)
val avroQueue: SourceQueueWithComplete[Message[String, GenericRecord, String]] =
Source.queue(bufferSize, overflowStrategy)
.via(avroProdFlow)
.map(logResult)
.to(Sink.ignore)
.run()
...
queue.offer(msg)
The serializer is a KafkaAvroSerializer, instantiated with a new CachedSchemaRegistryClient(settings.schemaRegistry, 1000)
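In code, that wiring looks roughly like this (a sketch based on the description above; settings.schemaRegistry is the question's own configuration value):
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient
import io.confluent.kafka.serializers.KafkaAvroSerializer

val registryClient = new CachedSchemaRegistryClient(settings.schemaRegistry, 1000)
val avroSerializer = new KafkaAvroSerializer(registryClient)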
Generating the GenericRecord:
def toAvro[A](a: A)(implicit recordFormat: RecordFormat[A]): GenericRecord =
recordFormat.to(a)
val makeEdgeMessage: (Edge, String) => Message[String, GenericRecord, String] = { (edge, topic) =>
val edgeAvro: GenericRecord = toAvro(edge)
val record = new ProducerRecord[String, GenericRecord](topic, edge.id, edgeAvro)
ProducerMessage.Message(record, edge.id)
}
The schema is created deep in the code (io.confluent.kafka.serializers.AbstractKafkaAvroSerDe#getSchema, invoked by io.confluent.kafka.serializers.AbstractKafkaAvroSerializer#serializeImpl) where I have no influence on it, so I have no idea how to fix the leak. It looks to me like the two Confluent projects do not work well together.
The issues I have found here, here and here do not seem to address my use case.
The two workarounds for me are currently:
not using the schema registry - obviously not a long-term solution
creating a custom SchemaRegistryClient that does not rely on object identity - doable, but I would like to avoid creating more issues than I solve by reimplementing it
Is there a way to generate or cache a consistent schema depending on message/record type and use it with my setup?
edit 2017.11.20
The issue in my case was that each instance of GenericRecord carrying my message was serialized by a different instance of RecordFormat, each containing a different instance of the Schema. The implicit resolution here generated a new instance each time.
def toAvro[A](a: A)(implicit recordFormat: RecordFormat[A]): GenericRecord = recordFormat.to(a)
The solution was to pin the RecordFormat instance to a val and reuse it explicitly. Many thanks to https://github.com/heliocentrist for explaining the details.
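A minimal sketch of that fix, assuming the Edge case class from above: derive the RecordFormat once into a val so the same Schema instance is reused for every message.
import com.sksamuel.avro4s.RecordFormat
import org.apache.avro.generic.GenericRecord

// Derived exactly once; every call to toAvro reuses the same Schema instance
val edgeFormat: RecordFormat[Edge] = RecordFormat[Edge]

def toAvro(edge: Edge): GenericRecord = edgeFormat.to(edge)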
original response:
After waiting for a while (there was also no answer to the GitHub issue) I had to implement my own SchemaRegistryClient. Over 90% is copied from the original CachedSchemaRegistryClient, just translated into Scala. Using a Scala mutable.Map fixed the memory leak. I have not performed any comprehensive tests, so use at your own risk.
import java.util
import io.confluent.kafka.schemaregistry.client.rest.entities.{ Config, SchemaString }
import io.confluent.kafka.schemaregistry.client.rest.entities.requests.ConfigUpdateRequest
import io.confluent.kafka.schemaregistry.client.rest.{ RestService, entities }
import io.confluent.kafka.schemaregistry.client.{ SchemaMetadata, SchemaRegistryClient }
import org.apache.avro.Schema
import scala.collection.mutable
class CachingSchemaRegistryClient(val restService: RestService, val identityMapCapacity: Int)
extends SchemaRegistryClient {
val schemaCache: mutable.Map[String, mutable.Map[Schema, Integer]] = mutable.Map()
val idCache: mutable.Map[String, mutable.Map[Integer, Schema]] =
mutable.Map(null.asInstanceOf[String] -> mutable.Map())
val versionCache: mutable.Map[String, mutable.Map[Schema, Integer]] = mutable.Map()
def this(baseUrl: String, identityMapCapacity: Int) {
this(new RestService(baseUrl), identityMapCapacity)
}
def this(baseUrls: util.List[String], identityMapCapacity: Int) {
this(new RestService(baseUrls), identityMapCapacity)
}
def registerAndGetId(subject: String, schema: Schema): Int =
restService.registerSchema(schema.toString, subject)
def getSchemaByIdFromRegistry(id: Int): Schema = {
val restSchema: SchemaString = restService.getId(id)
(new Schema.Parser).parse(restSchema.getSchemaString)
}
def getVersionFromRegistry(subject: String, schema: Schema): Int = {
val response: entities.Schema = restService.lookUpSubjectVersion(schema.toString, subject)
response.getVersion.intValue
}
override def getVersion(subject: String, schema: Schema): Int = synchronized {
val schemaVersionMap: mutable.Map[Schema, Integer] =
versionCache.getOrElseUpdate(subject, mutable.Map())
val version: Integer = schemaVersionMap.getOrElse(
schema, {
if (schemaVersionMap.size >= identityMapCapacity) {
throw new IllegalStateException(s"Too many schema objects created for $subject!")
}
val version = new Integer(getVersionFromRegistry(subject, schema))
schemaVersionMap.put(schema, version)
version
}
)
version.intValue()
}
override def getAllSubjects: util.List[String] = restService.getAllSubjects()
override def getByID(id: Int): Schema = synchronized { getBySubjectAndID(null, id) }
override def getBySubjectAndID(subject: String, id: Int): Schema = synchronized {
val idSchemaMap: mutable.Map[Integer, Schema] = idCache.getOrElseUpdate(subject, mutable.Map())
idSchemaMap.getOrElseUpdate(id, getSchemaByIdFromRegistry(id))
}
override def getSchemaMetadata(subject: String, version: Int): SchemaMetadata = {
val response = restService.getVersion(subject, version)
val id = response.getId.intValue
val schema = response.getSchema
new SchemaMetadata(id, version, schema)
}
override def getLatestSchemaMetadata(subject: String): SchemaMetadata = synchronized {
val response = restService.getLatestVersion(subject)
val id = response.getId.intValue
val version = response.getVersion.intValue
val schema = response.getSchema
new SchemaMetadata(id, version, schema)
}
override def updateCompatibility(subject: String, compatibility: String): String = {
val response: ConfigUpdateRequest = restService.updateCompatibility(compatibility, subject)
response.getCompatibilityLevel
}
override def getCompatibility(subject: String): String = {
val response: Config = restService.getConfig(subject)
response.getCompatibilityLevel
}
override def testCompatibility(subject: String, schema: Schema): Boolean =
restService.testCompatibility(schema.toString(), subject, "latest")
override def register(subject: String, schema: Schema): Int = synchronized {
val schemaIdMap: mutable.Map[Schema, Integer] =
schemaCache.getOrElseUpdate(subject, mutable.Map())
val id = schemaIdMap.getOrElse(
schema, {
if (schemaIdMap.size >= identityMapCapacity)
throw new IllegalStateException(s"Too many schema objects created for $subject!")
val id: Integer = new Integer(registerAndGetId(subject, schema))
schemaIdMap.put(schema, id)
idCache(null).put(id, schema)
id
}
)
id.intValue()
}
}

playframework 2.4 - Unspecified value parameter headers error

I am upgrading Play Framework from 2.3 to 2.4. After I changed the versions, compiling the same code produces the following error. Since I am a novice at Scala, I am trying to learn enough Scala to solve this issue, but I still don't know what the problem is. What I want to do is add a request header value to the original request headers. Any help will be appreciated.
[error] /mnt/garner/project/app-service/app/com/company/playframework/filters/LoggingFilter.scala:26: not enough arguments for constructor Headers: (headers: Seq[(String, String)])play.api.mvc.Headers.
[error] Unspecified value parameter headers.
[error] val newHeaders = new Headers { val data = (requestHeader.headers.toMap
The LoggingFilter class
class LoggingFilter extends Filter {
val logger = AccessLogger.getInstance();
def apply(next: (RequestHeader) => Future[Result])(requestHeader: RequestHeader): Future[Result] = {
val startTime = System.currentTimeMillis
val requestId = logger.createLog();
val newHeaders = new Headers { val data = (requestHeader.headers.toMap
+ (AccessLogger.X_HEADER__REQUEST_ID -> Seq(requestId))).toList }
val newRequestHeader = requestHeader.copy(headers = newHeaders)
next(newRequestHeader).map { result =>
val endTime = System.currentTimeMillis
val requestTime = endTime - startTime
val bytesToString: Enumeratee[ Array[Byte], String ] = Enumeratee.map[Array[Byte]]{ bytes => new String(bytes) }
val consume: Iteratee[String,String] = Iteratee.consume[String]()
val resultBody : Future[String] = result.body |>>> bytesToString &>> consume
resultBody.map {
body =>
logger.finish(requestId, result.header.status, requestTime, body)
}
result;
}
}
}
Edit
I updated the code as follows and it compiled fine. I changed
val newHeaders = new Headers { val data = (requestHeader.headers.toMap
+ (AccessLogger.X_HEADER__REQUEST_ID -> Seq(requestId))).toList }
to
val newHeaders = new Headers((requestHeader.headers.toSimpleMap
+ (AccessLogger.X_HEADER__REQUEST_ID -> requestId)).toList)
It simply states that if you want to construct Headers you need to supply a field named headers of type Seq[(String, String)]. If you omit the initial new, you will be using the apply function of the corresponding Headers object, which just takes a vararg of (String, String), and your code should work. If you look at the documentation https://www.playframework.com/documentation/2.4.x/api/scala/index.html#play.api.mvc.Headers and flip between the docs for the object and the class, it should become clear.
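Applying that to the question's filter would look roughly like this (a hedged sketch that combines the vararg apply described above with the toSimpleMap variant from the edit):
val newHeaders = Headers((requestHeader.headers.toSimpleMap
  + (AccessLogger.X_HEADER__REQUEST_ID -> requestId)).toSeq: _*)
val newRequestHeader = requestHeader.copy(headers = newHeaders)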