KafkaProducer for both keyed and unkeyed ProducerRecords - scala

I'm using the 0.9 Kafka Java client in Scala.
scala> val kafkaProducer = new KafkaProducer[String, String](props)
ProducerRecord has several constructors that allow you to include or not include a key and/or partition.
scala> val keyedRecord = new ProducerRecord("topic", "key", "value")
scala> kafkaProducer.send(keyedRecord)
These lines work with no problem.
However, an unkeyed ProducerRecord gives a type error.
scala> val unkeyedRecord = new ProducerRecord("topic", "value")
res8: org.apache.kafka.clients.producer.ProducerRecord[Nothing,String] =
ProducerRecord(topic=topic, partition=null, key=null, value=value)
scala> kafkaProducer.send(res8)
<console>:17: error: type mismatch;
found : org.apache.kafka.clients.producer.ProducerRecord[Nothing,String]
required: org.apache.kafka.clients.producer.ProducerRecord[String,String]
Note: Nothing <: String, but Java-defined class ProducerRecord is invariant in type K.
You may wish to investigate a wildcard type such as `_ <: String`. (SLS 3.2.10)
kafkaProducer.send(res8)
^
Is this against Kafka's rules or could it be an unnecessary precaution that has come from using this Java API in Scala?
More fundamentally, is it poor form to put keyed and unkeyed messages in the same Kafka topic?
Thank you
Javadoc: http://kafka.apache.org/090/javadoc/org/apache/kafka/clients/producer/package-summary.html
Edit
Could changing the variance of parameter K in KafkaProducer fix this?

It looks like the answer is in the comments, but to spell it out, Scala uses type inference when types are not explicitly provided. Since you wrote:
val unkeyedRecord = new ProducerRecord("topic", "value")
No key is provided, so it defaults to null; since nothing constrains the key type, Scala infers the type parameter K as Nothing. To fix that, declare the type parameters explicitly:
val unkeyedRecord = new ProducerRecord[String,String]("topic", "value")
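With the type parameters spelled out, keyed and unkeyed records both go through the same producer. A minimal sketch, reusing the kafkaProducer and props from the question (props is assumed to configure String serializers):

val keyedRecord   = new ProducerRecord[String, String]("topic", "key", "value")
val unkeyedRecord = new ProducerRecord[String, String]("topic", "value") // key stays null

kafkaProducer.send(keyedRecord)
kafkaProducer.send(unkeyedRecord) // compiles: both are ProducerRecord[String, String]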

Related

Zio-Kafka: Producing messages to topic

I'm trying to produce some messages to a Kafka topic using the library zio-kafka, version 0.15.0.
Clearly, my comprehension of the ZIO ecosystem is suboptimal, because I cannot produce a simple message. My program is the following:
object KafkaProducerExample extends zio.App {
  val producerSettings: ProducerSettings = ProducerSettings(List("localhost:9092"))

  val producer: ZLayer[Blocking, Throwable, Producer[Nothing, String, String]] =
    ZLayer.fromManaged(Producer.make(producerSettings, Serde.string, Serde.string))

  val effect: RIO[Nothing with Producer[Nothing, String, String], RecordMetadata] =
    Producer.produce("topic", "key", "value")

  override def run(args: List[String]): URIO[zio.ZEnv, ExitCode] = {
    effect.provideSomeLayer(producer).exitCode
  }
}
The compiler gives me the following error:
[error] KafkaProducerExample.scala:19:28: Cannot prove that zio.blocking.Blocking with zio.Has[zio.kafka.producer.Producer.Service[Nothing,String,String]] <:< Nothing with zio.kafka.producer.Producer[Nothing,String,String].
[error] effect.provideSomeLayer(producer).exitCode
[error] ^
[error] one error found
Can anyone help me in understanding what's going on?
OK, it was ZIO that required some type hints during the creation of the producer layer:
val producer: ZLayer[Blocking, Throwable, Producer[Any, String, String]] =
  ZLayer.fromManaged(Producer.make[Any, String, String](producerSettings, Serde.string, Serde.string))
When calling the make smart constructor, we have to give it the types we want to use. The first represents the environment needed to build the key and value serializers, while the last two are the types of the messages' keys and values.
In this case, we need no environment at all to build the two serializers, so we pass Any.
Finally, the Producer.produce function also requires some type hints:
val effect: RIO[Producer[Any, String, String], RecordMetadata] =
  Producer.produce[Any, String, String]("topic", "key", "value")
After making the above changes, the types align perfectly, and the compiler is happy again.
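Putting both changes together, the whole program looks roughly like this (a sketch against zio-kafka 0.15.0; the imports are assumed, the rest is the question's code with the type hints added):

import org.apache.kafka.clients.producer.RecordMetadata
import zio._
import zio.blocking.Blocking
import zio.kafka.producer.{Producer, ProducerSettings}
import zio.kafka.serde.Serde

object KafkaProducerExample extends zio.App {
  val producerSettings: ProducerSettings = ProducerSettings(List("localhost:9092"))

  // the layer now provides Producer[Any, String, String] rather than Producer[Nothing, String, String]
  val producer: ZLayer[Blocking, Throwable, Producer[Any, String, String]] =
    ZLayer.fromManaged(Producer.make[Any, String, String](producerSettings, Serde.string, Serde.string))

  // the effect's environment now lines up with what the layer provides
  val effect: RIO[Producer[Any, String, String], RecordMetadata] =
    Producer.produce[Any, String, String]("topic", "key", "value")

  override def run(args: List[String]): URIO[zio.ZEnv, ExitCode] =
    effect.provideSomeLayer(producer).exitCode
}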

How to extend the transformer in Kafka Scala?

I am working on a Kafka streaming implementation of a word counter in Scala in which I extended the transformer:
class WordCounter extends Transformer[String, String, (String, Long)]
It is then called in the stream as follows:
val counter: KStream[String, Long] = filtered_record.transform(new WordCounter, "count")
However, I am getting the error below when running my program via sbt:
[error] required: org.apache.kafka.streams.kstream.TransformerSupplier[String,String,org.apache.kafka.streams.KeyValue[String,Long]]
I can't seem to figure out how to fix it, and could not find any appropriate Kafka example of a similar implementation.
Anyone got any idea of what I am doing wrong?
The signature of transform() is:
def transform[K1, V1](transformerSupplier: TransformerSupplier[K, V, KeyValue[K1, V1]],
                      stateStoreNames: String*): KStream[K1, V1]
Thus, transform() takes a TransformerSupplier as its first argument, not a Transformer.
See also the Javadocs.
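For example, a minimal sketch of the call site, assuming WordCounter is adjusted to extend Transformer[String, String, KeyValue[String, Long]] (which the error message above also asks for):

import org.apache.kafka.streams.KeyValue
import org.apache.kafka.streams.kstream.{KStream, Transformer, TransformerSupplier}

val counter: KStream[String, Long] = filtered_record.transform(
  new TransformerSupplier[String, String, KeyValue[String, Long]] {
    // get() must return a fresh Transformer instance for each stream task
    override def get(): Transformer[String, String, KeyValue[String, Long]] = new WordCounter
  },
  "count"
)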

How to change the key of a KStream and then write to a topic using Scala?

I am using Kafka Streams 1.0. I am reading a topic into a KStream[String, CustomObject], then I am trying to select a new key that comes from one member of the CustomObject. The code looks like this:
val myStream: KStream[String, CustomObject] = builder.stream("topic")
  .mapValues {
    ...
    // code to transform the JSON value into a CustomObject
    customObject
  }

myStream.selectKey((k, v) => v.id)
  .to("outputTopic", Produced.`with`(Serdes.String(), customObjectSerde))
It gives this error:
Error:(109, 7) overloaded method value to with alternatives:
(x$1: String,x$2: org.apache.kafka.streams.kstream.Produced[?0(in value x$1),com.myobject.CustomObject])Unit <and>
(x$1: org.apache.kafka.streams.processor.StreamPartitioner[_ >: ?0(in value x$1), _ >: com.myobject.CustomObject],x$2: String)Unit
cannot be applied to (String, org.apache.kafka.streams.kstream.Produced[String,com.myobject.CustomObject])
).to("outputTopic", Produced.`with`(Serdes.String(),
I am not able to understand what is wrong.
Hopefully somebody can help me. Thanks!
The Kafka Streams API uses Java generics extensively, which makes it hard for the Scala compiler to infer types correctly. Thus, you need to specify the types manually in some cases to avoid ambiguous method overloads.
Also compare: https://docs.confluent.io/current/streams/faq.html#scala-compile-error-no-type-parameter-java-defined-trait-is-invariant-in-type-t
A good way to avoid these issues is not to chain multiple operators, but to introduce a new, explicitly typed KStream variable after each operation:
// not this
myStream.selectKey((k, v) => v.id)
  .to("outputTopic", Produced.`with`(Serdes.String(), customObjectSerde))

// but this
val newStream: KStream[KeyType, ValueType] = myStream.selectKey((k, v) => v.id)
newStream.to("outputTopic", Produced.`with`(Serdes.String(), customObjectSerde))
Btw: Kafka 2.0 will offer a proper Scala API for Kafka Streams (https://cwiki.apache.org/confluence/display/KAFKA/KIP-270+-+A+Scala+Wrapper+Library+for+Kafka+Streams) that will fix those Scala issues.

How to create a method with a spark.broadcast[Map] as parameter?

I know this question looks silly at first, but please look at the code.
I created a broadcast map this way:
val rdd = sqlc
  .read
  .format("jdbc")
  .options(Map("url" -> driver, "dbtable" -> clientsTable))
  .load()
  .select("client_name", "client_age")
  .map { data => (data.getString(0), data.getInt(1)) }
  .collectAsMap()

val clients = sqlc.sparkContext.broadcast(rdd)
Then I create a method that takes this value as a parameter:
def doSomething(clients: Broadcast[Map[String,Int]]) = clients.toString()
But when I call this method in my code, Scala IDE throws this error:
type mismatch;
 found   : org.apache.spark.broadcast.Broadcast[scala.collection.Map[String,Int]]
 required: org.apache.spark.broadcast.Broadcast[scala.collection.immutable.Map[String,Int]]
Note: scala.collection.Map[String,Int] >: Map[String,Int], but class Broadcast is invariant in type T.
You may wish to define T as -T instead. (SLS 4.5)
I can't find what is wrong here; even changing the method signature to an explicit scala.collection.immutable.Map doesn't work... The compiler gives me the same error.
FYI: I am using Scala 2.10 and Scala IDE 4.3.0.
Thanks for any help.
It appears that PairRDDFunctions.collectAsMap returns a scala.collection.Map, so your function's signature should match this type, and not the more-specific scala.collection.immutable.Map, which is the default when you just write Map:
def doSomething(clients: Broadcast[scala.collection.Map[String,Int]])
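A minimal sketch of the fixed method and its call site, reusing the names from the question (the body just stringifies the broadcast content via .value, standing in for whatever the real method does):

def doSomething(clients: Broadcast[scala.collection.Map[String, Int]]): String =
  clients.value.toString

doSomething(clients) // compiles: the parameter type now matches what collectAsMap returned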

Ambiguous Implicit Values when using HMap

HMap seems to be the perfect data structure for my use case; however, I can't get it working:
case class Node[N](node: N)
class ImplVal[K, V]
implicit val iv1 = new ImplVal[Int, Node[Int]]
implicit val iv2 = new ImplVal[Int, Node[String]]
implicit val iv3 = new ImplVal[String, Node[Int]]
val hm = HMap[ImplVal](1 -> Node(1), 2 -> Node("two"), "three" -> Node(3))
My first question is whether it is possible to create those implicit vals automatically. For sure, for typical combinations I could create them manually, but I'm wondering whether there is a more generic, less boilerplate way.
The next question is how to get values out of the map:
val res1 = hm.get(1) // (1) ambiguous implicit values: both value iv2 [...] and value iv1 [...] match expected type ImplVal[Int,V]
To me, Node[Int] (iv1) and Node[String] (iv2) look pretty different :) I thought, despite the JVM type erasure limitations, Scala could differentiate here. What am I missing? Do I have to use other implicit values to make the difference clear?
The explicit version works:
val res2 = hm.get[Int, Node[Int]](1) // (2) works
Of course, in this simple case, I could add the type information to the get call. But in the following case, where only the keys are known in advance, I don't know how to do it:
def get[T <: HList](keys: T): HList = ??? // return the associated values for the keys
Is there any simple solution to this problem?
BTW, what documentation about Scala's type system (or Shapeless, or functional programming in general) would you recommend to understand this whole topic better? I have to admit I'm lacking some background here.
The type of the key determines the type of the value. You have Int keys corresponding to both Node[Int] and Node[String] values, hence the ambiguity. You might find this article helpful in explaining the general mechanism underlying this.
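For example, one way to resolve the ambiguity (a sketch of the idea above, not the only possible fix) is to keep a single value type per key type, so a lookup with an Int key has exactly one matching implicit:

import shapeless._

case class Node[N](node: N)
class ImplVal[K, V]

// one relation per key type: Int keys map to Node[Int], String keys map to Node[Int]
implicit val iv1 = new ImplVal[Int, Node[Int]]
implicit val iv3 = new ImplVal[String, Node[Int]]

val hm = HMap[ImplVal](1 -> Node(1), "three" -> Node(3))

val res1 = hm.get(1)       // Option[Node[Int]] -- no longer ambiguous
val res3 = hm.get("three") // Option[Node[Int]]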