I'm trying to name my source processor using the Consumed.as() method (full code below):
val usersOrdersStreams: KStream[UserId, Order] = builder
.stream[UserId, Order](ordersByUserTopic)(Consumed.as("topic-name"))
However, when I run the application I get the following exception:
org.apache.kafka.common.config.ConfigException: Please specify a value serde or set one through StreamsConfig#DEFAULT_VALUE_SERDE_CLASS_CONFIG
When I looked at the definition of .as() I saw this:
public static <K, V> Consumed<K, V> as(final String processorName) {
return new Consumed<>(null, null, null, null, processorName);
}
So I guessed the issue was that the key/value serdes were set to null.
I tried to solve it by adding a call to withValueSerde():
val orderSerde = ...
val usersOrdersStreams: KStream[UserId, Order] = builder
.stream[UserId, Order](ordersByUserTopic)(Consumed.as("topic-name").withValueSerde(orderSerde))
But got the same error. What am I doing wrong?
Note: if I remove the Consumed.as() part, the code works and the exception is not thrown.
Following is the full code (some imports were removed for readability reasons):
import org.apache.kafka.common.serialization.Serde
import org.apache.kafka.streams.kstream.{GlobalKTable, JoinWindows, TimeWindows, Windowed}
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.serialization.Serdes
import org.apache.kafka.streams.scala.serialization.Serdes._
import scala.concurrent.duration._
object KafkaStreamsApp {
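// A generic JSON serde derived from circe's Encoder/Decoder (the circe imports were elided above).
// Note that decode failures are swallowed: the deserializer returns None, which surfaces as a null value downstream.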
implicit def serde[A >: Null : Decoder : Encoder]: Serde[A] = {
val serializer = (a: A) => a.asJson.noSpaces.getBytes
val deserializer = (aAsBytes: Array[Byte]) => {
val aAsString = new String(aAsBytes)
val aOrError = decode[A](aAsString)
aOrError match {
case Right(a) => Option(a)
case Left(error) =>
Option.empty
}
}
Serdes.fromFn[A](serializer, deserializer)
}
implicit val orderSerde: Serde[Order] = serde[Order]
// Topics
final val ordersByUserTopic = "orders-by-user"
final val filterOrders = "filter-low-orders"
final val applyMapValues = "mapValues-apply-discount"
final val payedOrdersTopic = "filtered-orders"
type UserId = String
case class Order(user: UserId, amount: Double)
val builder = new StreamsBuilder
val usersOrdersStreams: KStream[UserId, Order] =
builder.stream[UserId, Order](ordersByUserTopic)(Consumed.as("vvv").withValueSerde(orderSerde))
def paidOrdersTopology(): Unit = {
usersOrdersStreams
.filter((_, v) => v.amount > 1000.0, named = Named.as(filterOrders))
.mapValues(v => v.copy(amount = v.amount * 0.85), named = Named.as(applyMapValues))
.to(payedOrdersTopic)
}
def main(args: Array[String]): Unit = {
val props = new Properties
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-application")
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.stringSerde.getClass)
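// note: only the default *key* serde is configured here; DEFAULT_VALUE_SERDE_CLASS_CONFIG is never set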
paidOrdersTopology()
val topology: Topology = builder.build()
println(topology.describe())
val application: KafkaStreams = new KafkaStreams(topology, props)
application.start()
}
}
So... after some digging I managed to find the issue: the key serde was missing. The following code sets only the value serde, which creates a Consumed object with a null key serde:
val orderSerde = ...
val usersOrdersStreams: KStream[UserId, Order] = builder
.stream[UserId, Order](ordersByUserTopic)(Consumed.as("topic-name").withValueSerde(orderSerde))
When I added the key serde as well:
val orderSerde = ...
val consumed = Consumed.as("topic-name")
.withKeySerde(Serdes.stringSerde) // the previously missing key serde
.withValueSerde(orderSerde)
val usersOrdersStreams: KStream[UserId, Order] =
builder.stream[UserId, Order](ordersByUserTopic)(consumed)
The code started working.
The only thing I'm not sure about is why the error stated that the value serde was missing, when it was actually the key serde that was absent.
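For completeness: when a Consumed carries null serdes, Kafka Streams falls back to the configured defaults, which is what the original error message hints at. A minimal sketch of that route, assuming OrderSerde is a hypothetical Serde[Order] class with a public no-arg constructor (the anonymous serde built with Serdes.fromFn above cannot be registered by class):
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.stringSerde.getClass)
// OrderSerde is hypothetical: a Serde[Order] class with a public no-arg constructor
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, classOf[OrderSerde])
// with both defaults configured, Consumed.as("topic-name") alone should work
val usersOrdersStreams: KStream[UserId, Order] =
builder.stream[UserId, Order](ordersByUserTopic)(Consumed.as("topic-name"))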
I have a DataStream[GenericRecord]:
val consumer = new FlinkKafkaConsumer[String]("input_csv_topic", new SimpleStringSchema(), properties)
val stream = senv.
addSource(consumer).
map(line => {
val arr = line.split(",")
val schemaUrl = "" // avro schema link, standard .avsc file format
val schemaStr = scala.io.Source.fromURL(schemaUrl).mkString.toString().stripLineEnd
import org.codehaus.jettison.json.{JSONObject, JSONArray}
val schemaFields: JSONArray = new JSONObject(schemaStr).optJSONArray("fields")
val genericDevice: GenericRecord = new GenericData.Record(new Schema.Parser().parse(schemaStr))
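// toInt / toLong are assumed to be user-defined helpers returning Option[Int] / Option[Long]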
for(i <- 0 until arr.length) {
val fieldObj: JSONObject = schemaFields.optJSONObject(i)
val columnName = fieldObj.optString("name")
var columnType = fieldObj.optString("type")
if (columnType.contains("string")) {
genericDevice.put(columnName, arr(i))
} else if (columnType.contains("int")) {
genericDevice.put(columnName, toInt(arr(i)).getOrElse(0).asInstanceOf[Number].intValue)
} else if (columnType.contains("long")) {
genericDevice.put(columnName, toLong(arr(i)).getOrElse(0).asInstanceOf[Number].longValue)
}
}
genericDevice
})
val kafkaSink = new FlinkKafkaProducer[GenericRecord](
"output_avro_topic",
new MyKafkaAvroSerializationSchema[GenericRecord](classOf[GenericRecord], "output_avro_topic", "this is the key", schemaStr),
properties,
FlinkKafkaProducer.Semantic.AT_LEAST_ONCE)
stream.addSink(kafkaSink)
Here is MyKafkaAvroSerializationSchema implementation:
class MyKafkaAvroSerializationSchema[T](avroType: Class[T], topic: String, key: String, schemaStr: String) extends KafkaSerializationSchema[T] {
lazy val schema: Schema = new Schema.Parser().parse(schemaStr)
override def serialize(element: T, timestamp: lang.Long): ProducerRecord[Array[Byte], Array[Byte]] = {
val cl = Thread.currentThread().getContextClassLoader()
val genericData = new GenericData(cl)
val writer = new GenericDatumWriter[T](schema, genericData)
// val writer = new ReflectDatumWriter[T](schema)
// val writer = new SpecificDatumWriter[T](schema)
val out = new ByteArrayOutputStream()
val encoder: BinaryEncoder = EncoderFactory.get().binaryEncoder(out, null)
writer.write(element, encoder)
encoder.flush()
out.close()
new ProducerRecord[Array[Byte], Array[Byte]](topic, key.getBytes, out.toByteArray)
}
}
Here's the stack trace:
com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException
Serialization trace:
reserved (org.apache.avro.Schema$Field)
fieldMap (org.apache.avro.Schema$RecordSchema)
schema (org.apache.avro.generic.GenericData$Record)
How can I use Flink to serialize an Avro GenericRecord to Kafka? I have tested different writers, but I still get com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException. Thanks for your input.
You can simply add the flink-avro module to your project and use the provided AvroSerializationSchema, which works for both SpecificRecord and GenericRecord once you supply the schema.
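A minimal sketch of that route, assuming a Flink version whose flink-avro module ships AvroSerializationSchema, and reusing schemaStr and properties from the question:
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.flink.formats.avro.AvroSerializationSchema

val schema: Schema = new Schema.Parser().parse(schemaStr)
// forGeneric builds a SerializationSchema[GenericRecord] that writes Avro binary instead of falling back to Kryo
val avroSchema = AvroSerializationSchema.forGeneric(schema)

val kafkaSink = new FlinkKafkaProducer[GenericRecord](
"output_avro_topic",
avroSchema,
properties)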
I am trying to write an application using JavaFX and Scala (not ScalaFX). When I tried out this example from http://tutorials.jenkov.com/javafx/treetableview.html (Add TreeTableColumn to TreeTableView), I got a "Cannot resolve overloaded method 'add'" error on the last two lines. I was wondering if you can help me get past this issue.
class Phase1 extends Application {
import javafx.scene.control.TreeTableColumn
import javafx.scene.control.TreeTableView
import javafx.scene.control.cell.TreeItemPropertyValueFactory
override def start(primaryStage: Stage): Unit = {
primaryStage.setTitle("Experimental Blocking Tree")
val scene = new Scene(new Group(), 1500, 800)
val sceneRoot = scene.getRoot.asInstanceOf[Group]
val treeTableView = new TreeTableView[Car]
val treeTableColumn1: TreeTableColumn[Car, String] = new TreeTableColumn[Car, String]("Brand")
val treeTableColumn2: TreeTableColumn[Car, String] = new TreeTableColumn[Car, String]("Model")
treeTableColumn1.setCellValueFactory(new TreeItemPropertyValueFactory[Car, String]("brand"))
treeTableColumn2.setCellValueFactory(new TreeItemPropertyValueFactory[Car, String]("model"))
treeTableView.getColumns.add(treeTableColumn1) // cannot resolve overloaded method here
treeTableView.getColumns.add(treeTableColumn2) // and here
}
}
Thanks in advance.
I had the same issue with displaying data in TreeTableView.
Jarek posted a solution here: GitHub Issue
Also this works for me:
import scalafx.beans.property.ReadOnlyStringProperty
case class Car (
val brand: ReadOnlyStringProperty,
val model: ReadOnlyStringProperty
)
class CarStringFactory(val stringValue: ReadOnlyStringProperty) extends scalafx.beans.value.ObservableValue[String, String] {
override def delegate: javafx.beans.value.ObservableValue[String] = stringValue
override def value: String = stringValue.get
}
class YourScalaFXApp {
// ... boilerplate code ...
import scalafx.scene.control.{TreeTableView, TreeTableColumn}
val treeTableView = new TreeTableView[Car]
val treeTableColumn1: TreeTableColumn[Car, String] = new TreeTableColumn[Car, String]("Brand"){
cellValueFactory = {p => new CarStringFactory(p.value.value.value.brand) }
}
val treeTableColumn2: TreeTableColumn[Car, String] = new TreeTableColumn[Car, String]("Model"){
cellValueFactory = {p => new CarStringFactory(p.value.value.value.model) }
}
treeTableView.getColumns.add(treeTableColumn1)
treeTableView.getColumns.add(treeTableColumn2)
}
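If you would rather stay on plain JavaFX (as the original question asks), a possible workaround (an untested sketch) is to widen each column to the existential element type of getColumns, so Scala's overload resolution can pick the right add:
treeTableView.getColumns.add(treeTableColumn1.asInstanceOf[TreeTableColumn[Car, _]])
treeTableView.getColumns.add(treeTableColumn2.asInstanceOf[TreeTableColumn[Car, _]])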
Refer to
ScalaFX documentation: Properties
TreeTableColumn.cellValueFactory
I am using the Flink (1.7) Kafka client and Avro4s (2.0.4), and I want to serialize to a byte array:
class AvroSerializationSchema[IN : SchemaFor : FromRecord: ToRecord] extends SerializationSchema[IN] {
override def serialize(element: IN): Array[Byte] = {
val str = AvroSchema[IN]
val schema: Schema = new Parser().parse(str.toString)
val out = new ByteArrayOutputStream()
val os = AvroOutputStream.data[IN].to(out).build(schema)
os.write(element)
out.close()
out.flush()
os.flush()
os.close()
out.toByteArray
}
}
However, I keep getting this exception:
Error:(15, 35) could not find implicit value for evidence parameter of type com.sksamuel.avro4s.Encoder[IN]
val os = AvroOutputStream.data[IN].to(out).build(schema)
and
Error:(15, 35) not enough arguments for method data: (implicit evidence$3: com.sksamuel.avro4s.Encoder[IN])com.sksamuel.avro4s.AvroOutputStreamBuilder[IN].
Unspecified value parameter evidence$3.
val os = AvroOutputStream.data[IN].to(out).build(schema)
According to the code, IN has to have an Encoder instance:
object AvroOutputStream {
/**
* An [[AvroOutputStream]] that does not write the schema. Use this when
* you want the smallest messages possible at the cost of not having the schema available
* in the messages for downstream clients.
*/
def binary[T: Encoder] = new AvroOutputStreamBuilder[T](BinaryFormat)
def json[T: Encoder] = new AvroOutputStreamBuilder[T](JsonFormat)
def data[T: Encoder] = new AvroOutputStreamBuilder[T](DataFormat)
}
So it should be something like:
class AvroSerializationSchema[IN : Encoder] ...
You don't need to use FromRecord when writing to the output stream. That is for people who want to have a GenericRecord for their own use. You need to use Encoder.
class AvroSerializationSchema[IN : SchemaFor : Encoder] extends SerializationSchema[IN] {
override def serialize(element: IN): Array[Byte] = {
val str = AvroSchema[IN]
val schema: Schema = new Parser().parse(str.toString)
val out = new ByteArrayOutputStream()
val os = AvroOutputStream.data[IN].to(out).build(schema)
os.write(element)
os.flush()
os.close() // closing the avro4s stream flushes it; a ByteArrayOutputStream needs no explicit close
out.toByteArray
}
}
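A usage sketch, with a hypothetical case class (the SchemaFor and Encoder instances are derived by avro4s at compile time):
case class Event(id: String, value: Double)
val serializer = new AvroSerializationSchema[Event]
val bytes: Array[Byte] = serializer.serialize(Event("a", 1.0))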
I am experiencing a reproducible error while producing Avro messages with reactive-kafka and avro4s. Once the identityMapCapacity of the client (CachedSchemaRegistryClient) is reached, serialization fails with
java.lang.IllegalStateException: Too many schema objects created for <myTopic>-value
This is unexpected, since all messages should have the same schema - they are serializations of the same case class.
val avroProducerSettings: ProducerSettings[String, GenericRecord] =
ProducerSettings(system, Serdes.String().serializer(),
avroSerde.serializer())
.withBootstrapServers(settings.bootstrapServer)
val avroProdFlow: Flow[ProducerMessage.Message[String, GenericRecord, String],
ProducerMessage.Result[String, GenericRecord, String],
NotUsed] = Producer.flow(avroProducerSettings)
val avroQueue: SourceQueueWithComplete[Message[String, GenericRecord, String]] =
Source.queue(bufferSize, overflowStrategy)
.via(avroProdFlow)
.map(logResult)
.to(Sink.ignore)
.run()
...
queue.offer(msg)
The serializer is a KafkaAvroSerializer, instantiated with a new CachedSchemaRegistryClient(settings.schemaRegistry, 1000)
Generating the GenericRecord:
def toAvro[A](a: A)(implicit recordFormat: RecordFormat[A]): GenericRecord =
recordFormat.to(a)
val makeEdgeMessage: (Edge, String) => Message[String, GenericRecord, String] = { (edge, topic) =>
val edgeAvro: GenericRecord = toAvro(edge)
val record = new ProducerRecord[String, GenericRecord](topic, edge.id, edgeAvro)
ProducerMessage.Message(record, edge.id)
}
The schema is created deep inside the code (io.confluent.kafka.serializers.AbstractKafkaAvroSerDe#getSchema, invoked by io.confluent.kafka.serializers.AbstractKafkaAvroSerializer#serializeImpl), where I have no influence over it, so I have no idea how to fix the leak. It looks to me like the two Confluent projects do not work well together.
The issues I have found here, here and here do not seem to address my use case.
The two workarounds for me are currently:
not use schema registry - not a long-term solution obviously
create custom SchemaRegistryClient not relying on object identity - doable but I would like to avoid creating more issues than by reimplementing
Is there a way to generate or cache a consistent schema depending on message/record type and use it with my setup?
Edit 2017.11.20
The issue in my case was that each instance of GenericRecord carrying my message was serialized by a different instance of RecordFormat, each containing a different instance of the Schema. The implicit resolution here generated a new instance each time.
def toAvro[A](a: A)(implicit recordFormat: RecordFormat[A]): GenericRecord = recordFormat.to(a)
The solution was to pin the RecordFormat instance to a val and reuse it explicitly. Many thanks to https://github.com/heliocentrist for explaining the details.
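A minimal sketch of the fix, pinning the format once (Edge as in the snippets above):
import com.sksamuel.avro4s.RecordFormat

// one RecordFormat, and therefore one Schema instance, reused for every message
implicit val edgeFormat: RecordFormat[Edge] = RecordFormat[Edge]

def toAvro[A](a: A)(implicit recordFormat: RecordFormat[A]): GenericRecord =
recordFormat.to(a)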
Original response:
After waiting for a while (there was no answer to the GitHub issue either) I had to implement my own SchemaRegistryClient. Over 90% is copied from the original CachedSchemaRegistryClient, just translated into Scala. Using a Scala mutable.Map fixed the memory leak. I have not performed any comprehensive tests, so use at your own risk.
import java.util
import io.confluent.kafka.schemaregistry.client.rest.entities.{ Config, SchemaString }
import io.confluent.kafka.schemaregistry.client.rest.entities.requests.ConfigUpdateRequest
import io.confluent.kafka.schemaregistry.client.rest.{ RestService, entities }
import io.confluent.kafka.schemaregistry.client.{ SchemaMetadata, SchemaRegistryClient }
import org.apache.avro.Schema
import scala.collection.mutable
class CachingSchemaRegistryClient(val restService: RestService, val identityMapCapacity: Int)
extends SchemaRegistryClient {
val schemaCache: mutable.Map[String, mutable.Map[Schema, Integer]] = mutable.Map()
val idCache: mutable.Map[String, mutable.Map[Integer, Schema]] =
mutable.Map(null.asInstanceOf[String] -> mutable.Map())
val versionCache: mutable.Map[String, mutable.Map[Schema, Integer]] = mutable.Map()
def this(baseUrl: String, identityMapCapacity: Int) {
this(new RestService(baseUrl), identityMapCapacity)
}
def this(baseUrls: util.List[String], identityMapCapacity: Int) {
this(new RestService(baseUrls), identityMapCapacity)
}
def registerAndGetId(subject: String, schema: Schema): Int =
restService.registerSchema(schema.toString, subject)
def getSchemaByIdFromRegistry(id: Int): Schema = {
val restSchema: SchemaString = restService.getId(id)
(new Schema.Parser).parse(restSchema.getSchemaString)
}
def getVersionFromRegistry(subject: String, schema: Schema): Int = {
val response: entities.Schema = restService.lookUpSubjectVersion(schema.toString, subject)
response.getVersion.intValue
}
override def getVersion(subject: String, schema: Schema): Int = synchronized {
val schemaVersionMap: mutable.Map[Schema, Integer] =
versionCache.getOrElseUpdate(subject, mutable.Map())
val version: Integer = schemaVersionMap.getOrElse(
schema, {
if (schemaVersionMap.size >= identityMapCapacity) {
throw new IllegalStateException(s"Too many schema objects created for $subject!")
}
val version = new Integer(getVersionFromRegistry(subject, schema))
schemaVersionMap.put(schema, version)
version
}
)
version.intValue()
}
override def getAllSubjects: util.List[String] = restService.getAllSubjects()
override def getByID(id: Int): Schema = synchronized { getBySubjectAndID(null, id) }
override def getBySubjectAndID(subject: String, id: Int): Schema = synchronized {
val idSchemaMap: mutable.Map[Integer, Schema] = idCache.getOrElseUpdate(subject, mutable.Map())
idSchemaMap.getOrElseUpdate(id, getSchemaByIdFromRegistry(id))
}
override def getSchemaMetadata(subject: String, version: Int): SchemaMetadata = {
val response = restService.getVersion(subject, version)
val id = response.getId.intValue
val schema = response.getSchema
new SchemaMetadata(id, version, schema)
}
override def getLatestSchemaMetadata(subject: String): SchemaMetadata = synchronized {
val response = restService.getLatestVersion(subject)
val id = response.getId.intValue
val version = response.getVersion.intValue
val schema = response.getSchema
new SchemaMetadata(id, version, schema)
}
override def updateCompatibility(subject: String, compatibility: String): String = {
val response: ConfigUpdateRequest = restService.updateCompatibility(compatibility, subject)
response.getCompatibilityLevel
}
override def getCompatibility(subject: String): String = {
val response: Config = restService.getConfig(subject)
response.getCompatibilityLevel
}
override def testCompatibility(subject: String, schema: Schema): Boolean =
restService.testCompatibility(schema.toString(), subject, "latest")
override def register(subject: String, schema: Schema): Int = synchronized {
val schemaIdMap: mutable.Map[Schema, Integer] =
schemaCache.getOrElseUpdate(subject, mutable.Map())
val id = schemaIdMap.getOrElse(
schema, {
if (schemaIdMap.size >= identityMapCapacity)
throw new IllegalStateException(s"Too many schema objects created for $subject!")
val id: Integer = new Integer(registerAndGetId(subject, schema))
schemaIdMap.put(schema, id)
idCache(null).put(id, schema)
id
}
)
id.intValue()
}
}
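A usage sketch for wiring it in, assuming the KafkaAvroSerializer constructor that accepts a SchemaRegistryClient and the settings object from above:
val registryClient = new CachingSchemaRegistryClient(settings.schemaRegistry, 1000)
val avroSerializer = new KafkaAvroSerializer(registryClient)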
I am upgrading Play Framework from 2.3 to 2.4. After I changed the version, compiling the same code produces the following error. Since I am a novice at Scala, I am trying to learn it to solve this issue, but I still don't know what the problem is. What I want to do is add a request header value to the original request headers. Any help will be appreciated.
[error] /mnt/garner/project/app-service/app/com/company/playframework/filters/LoggingFilter.scala:26: not enough arguments for constructor Headers: (headers: Seq[(String, String)])play.api.mvc.Headers.
[error] Unspecified value parameter headers.
[error] val newHeaders = new Headers { val data = (requestHeader.headers.toMap
The LoggingFilter class
class LoggingFilter extends Filter {
val logger = AccessLogger.getInstance();
def apply(next: (RequestHeader) => Future[Result])(requestHeader: RequestHeader): Future[Result] = {
val startTime = System.currentTimeMillis
val requestId = logger.createLog();
val newHeaders = new Headers { val data = (requestHeader.headers.toMap
+ (AccessLogger.X_HEADER__REQUEST_ID -> Seq(requestId))).toList }
val newRequestHeader = requestHeader.copy(headers = newHeaders)
next(newRequestHeader).map { result =>
val endTime = System.currentTimeMillis
val requestTime = endTime - startTime
val bytesToString: Enumeratee[ Array[Byte], String ] = Enumeratee.map[Array[Byte]]{ bytes => new String(bytes) }
val consume: Iteratee[String,String] = Iteratee.consume[String]()
val resultBody : Future[String] = result.body |>>> bytesToString &>> consume
resultBody.map {
body =>
logger.finish(requestId, result.header.status, requestTime, body)
}
result;
}
}
}
Edit
I updated the code as follows and it now compiles. The code changed from:
val newHeaders = new Headers { val data = (requestHeader.headers.toMap
+ (AccessLogger.X_HEADER__REQUEST_ID -> Seq(requestId))).toList }
to
val newHeaders = new Headers((requestHeader.headers.toSimpleMap
+ (AccessLogger.X_HEADER__REQUEST_ID -> requestId)).toList)
It simply states that if you want to construct a Headers via its constructor, you need to supply a field named headers of type Seq[(String, String)]. If you omit the initial new, you will instead be using the apply function of the companion object for Headers, which takes a vararg of (String, String), and your code should work. If you look at the documentation (https://www.playframework.com/documentation/2.4.x/api/scala/index.html#play.api.mvc.Headers) and flip between the docs for the object and the class, it should become clear.
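For illustration, a minimal sketch of the apply route, reusing the names from the question:
val newHeaders = Headers(
(requestHeader.headers.toSimpleMap
+ (AccessLogger.X_HEADER__REQUEST_ID -> requestId)).toSeq: _*)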