Kafka topic to websocket - scala

I am trying to implement a setup where multiple web browsers open a websocket connection to my akka-http server in order to read all messages posted to a Kafka topic.
So the stream of messages should flow this way:
kafka topic -> akka-http -> websocket connection 1
                         -> websocket connection 2
                         -> websocket connection 3
For now I have created a path for the websocket:
val route: Route =
  path("ws") {
    handleWebSocketMessages(notificationWs)
  }
Then I have created a consumer for my Kafka topic:
val consumerSettings = ConsumerSettings(system,
    new ByteArrayDeserializer, new StringDeserializer)
  .withBootstrapServers("localhost:9092")
  .withGroupId("group1")
  .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

val source = Consumer
  .plainSource(consumerSettings, Subscriptions.topics("topic1"))
And then, finally, I want to connect this source to the websocket in handleWebSocketMessages:
def handleWebSocketMessages: Flow[Message, Message, Any] =
  Flow[Message].mapConcat {
    case tm: TextMessage =>
      TextMessage(source) :: Nil
    case bm: BinaryMessage =>
      // ignore binary messages but drain content to avoid the stream being clogged
      bm.dataStream.runWith(Sink.ignore)
      Nil
  }
Here is the error I get when I try to use source in the TextMessage:
Error:(77, 9) overloaded method value apply with alternatives:
  (textStream: akka.stream.scaladsl.Source[String,Any])akka.http.scaladsl.model.ws.TextMessage
  (text: String)akka.http.scaladsl.model.ws.TextMessage.Strict
cannot be applied to (akka.stream.scaladsl.Source[org.apache.kafka.clients.consumer.ConsumerRecord[Array[Byte],String],akka.kafka.scaladsl.Consumer.Control])
  TextMessage(source)::Nil
I think I'm making numerous mistakes along the way, but I would say the biggest blocker is handleWebSocketMessages.

The first thing is to understand that source is of type Source[ConsumerRecord[K, V], Control].
So it's not something you can pass as an argument to TextMessage.
Now, let's take the websocket's point of view:
An outgoing message is built for each record in the Kafka source: a TextMessage created from the record's String value.
Each incoming message is simply printed with println.
So the Flow can be seen as two components: a Sink for the incoming messages and a Source for the outgoing ones.
val incomingMessages: Sink[Message, Future[Done]] =
  Sink.foreach(println(_))

val outgoingMessages: Source[Message, Consumer.Control] =
  source
    .map { consumerRecord => TextMessage(consumerRecord.value) }

val handleWebSocketMessages: Flow[Message, Message, Any] =
  Flow.fromSinkAndSource(incomingMessages, outgoingMessages)
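Note that with this wiring each websocket connection materializes its own Kafka consumer, and with a shared group id the connections would split the messages between them. If every browser should see every message, one option (not part of the answer above, just a sketch that assumes an implicit materializer is in scope) is to run the consumer once and fan it out with a BroadcastHub:

import akka.NotUsed
import akka.http.scaladsl.model.ws.{Message, TextMessage}
import akka.stream.scaladsl.{BroadcastHub, Flow, Keep, Sink, Source}

// Run the Kafka consumer once and broadcast its elements to every
// materialization of broadcastSource. The buffer size of 256 is an
// arbitrary value chosen for illustration.
val broadcastSource: Source[Message, NotUsed] =
  source
    .map(consumerRecord => TextMessage(consumerRecord.value): Message)
    .toMat(BroadcastHub.sink(bufferSize = 256))(Keep.right)
    .run()

// Every websocket connection now shares the same underlying consumer.
def handleWebSocketMessages: Flow[Message, Message, Any] =
  Flow.fromSinkAndSource(Sink.foreach[Message](println(_)), broadcastSource)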
Hope it helps.

Related

One to One instant messaging using Kafka

I'm using Scala and Kafka to create a topic-based pub-sub architecture.
My question is: how can I handle the one-to-one messaging part of my application using Kafka topics?
This is my producer class:
class Producer(topic: String, key: String, brokers: String, message: String) {

  val producer = new KafkaProducer[String, String](configuration)

  private def configuration: Properties = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers)
    props.put(ProducerConfig.ACKS_CONFIG, "all")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getCanonicalName)
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getCanonicalName)
    props
  }

  def sendMessages(): Unit = {
    val record = new ProducerRecord[String, String](topic, key, message)
    producer.send(record)
    producer.close()
  }
}
And this is my consumer class:
class Consumer(brokers: String, topic: String, groupId: String) {

  val consumer = new KafkaConsumer[String, String](configuration)
  consumer.subscribe(util.Arrays.asList(topic))

  private def configuration: Properties = {
    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers)
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getCanonicalName)
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getCanonicalName)
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId)
    //props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest")
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true)
    props
  }

  def receiveMessages(): Unit = {
    while (true) {
      consumer.poll(Duration.ofSeconds(0)).forEach(record => println(s"Received message: $record"))
    }
  }
}
I also have an auth service that takes care of everything related to authenticating via JWT tokens.
I am confused about how to send messages to specific users. I thought about creating a "Messages" class, but I got lost when it comes to how to send messages to these specific users and how to partition these messages in Kafka for later use:
class Message {

  def sendMessage(sender_id: String, receiver_id: String, content: String): Unit = {
    val newMessage = new Producer(brokers = KAFKA_BROKER, key = sender_id + " to " + receiver_id, topic = "topic_1", message = content)
    newMessage.sendMessages()
  }

  def loadMessage(): Unit = {
    //
  }
}
My thought was to specify a custom key for all messages belonging to the same conversation, but I couldn't find the right way to retrieve these messages later on, as my consumer returns everything contained in that topic no matter what the key is. Meaning, all the users will eventually get all the messages. I know my modeling seems messy, but I couldn't find the right way to do it. I'm also somewhat confused about the usage of the group_id in the consumer.
Could someone show me the right way to achieve what I'm trying to do here, please?
couldn't find the right way to retrieve these messages later on ... consumer returns everything contained in that topic no matter what the key is
You would need to .assign the Consumer instance to a specific partition, not use .subscribe, which reads all partitions. Or you'd use specific topics for each conversation.
But then you need unique partitions/topics for every conversation that will exist. In a regular chat application where users create/remove rooms randomly, that will not scale for Kafka.
Ultimately, I'd suggest writing your data somewhere other than Kafka that you can actually query and index on a "conversationId" and/or user IDs, rather than trying to forward those events directly from Kafka into your "chat" application.
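For reference, a minimal sketch of the .assign approach mentioned above; how a conversation maps to a partition number is hypothetical and left out:

import java.time.Duration
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

// Read only the partition that holds a single conversation instead of
// subscribing to the whole topic.
def readConversation(props: Properties, topic: String, conversationPartition: Int): Unit = {
  val consumer = new KafkaConsumer[String, String](props)
  consumer.assign(java.util.Arrays.asList(new TopicPartition(topic, conversationPartition)))
  while (true) {
    consumer.poll(Duration.ofSeconds(1)).forEach(record => println(s"Received message: ${record.value}"))
  }
}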

Akka HTTP client side websocket closes unexpectedly

I have a websocket endpoint which sends a text message to the client every second. The client never sends any message to the server.
Using the JS code below, it works as expected; it keeps logging the message every second:
var ws = new WebSocket("ws://url_of_my_endpoint");
ws.onmessage = (message) => console.log(message.data);
I want to create a similar consumer in Scala, using Akka HTTP.
I have created the below code, based on the official docs.
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
import system.dispatcher

val url = "ws://url_of_my_endpoint"

val outgoing: Source[Message, NotUsed] = Source.empty

val webSocketFlow =
  Http().webSocketClientFlow(WebSocketRequest(url))

val printSink: Sink[Message, Future[Done]] =
  Sink.foreach[Message] {
    case message: TextMessage.Strict =>
      println("message received: " + message.text)
    case _ => println("some other message")
  }

val (upgradeResponse, closed) =
  outgoing
    .viaMat(webSocketFlow)(Keep.right)
    .toMat(printSink)(Keep.both)
    .run()

val connected = upgradeResponse.map { upgrade =>
  if (upgrade.response.status == StatusCodes.SwitchingProtocols) {
    Done
  } else {
    throw new RuntimeException(s"Connection failed: ${upgrade.response.status}")
  }
}

connected.onComplete(_ => println("Connection established."))
closed.foreach(_ => println("Connection closed."))
The problem is that the connection closes after a few seconds. Sometimes after 1 sec, sometimes after 3-4 seconds. The JS client works just fine, so I assume that the problem is not on the server.
What is the problem in the code? How should it be changed so that it tells me what went wrong?
From the documentation:
Note
Inactive WebSocket connections will be dropped according to the idle-timeout settings. In case you need to keep inactive connections alive, you can either tweak your idle-timeout or inject ‘keep-alive’ messages regularly.
The problem is that you're not sending any messages through the stream, so the inactive connection is closed:
val outgoing: Source[Message, NotUsed] = Source.empty
Try something like the following, which sends a random TextMessage every second:
import scala.concurrent.duration._

val outgoing: Source[Message, NotUsed] =
  Source
    .fromIterator(() => Iterator.continually(TextMessage(scala.util.Random.nextInt().toString)))
    .throttle(1, 1.second)
Alternatively, adjust the aforementioned idle timeout settings or configure the automatic keep-alive support:
This is supported in a transparent way via configuration by setting the: akka.http.client.websocket.periodic-keep-alive-max-idle = 1 second to a specified max idle timeout. The keep alive triggers when no other messages are in-flight during the such configured period. Akka HTTP will then automatically send a Ping frame for each of such idle intervals.
By default, the automatic keep-alive feature is disabled.
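For example, here is a minimal sketch of enabling that keep-alive setting programmatically when constructing the ActorSystem (the same value could of course live in application.conf instead):

import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

// Override the client websocket keep-alive setting so that an idle connection
// gets a Ping frame every second instead of being dropped.
val config = ConfigFactory
  .parseString("akka.http.client.websocket.periodic-keep-alive-max-idle = 1 second")
  .withFallback(ConfigFactory.load())

implicit val system: ActorSystem = ActorSystem("ws-client", config)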
It's possible the documentation has changed since you looked at it as there's now a section to deal with the problem you're having:
https://doc.akka.io/docs/akka-http/current/client-side/websocket-support.html#half-closed-websockets
It explains:
The Akka HTTP WebSocket API does not support half-closed connections which means that if either stream completes the entire connection is closed (after a “Closing Handshake” has been exchanged or a timeout of 3 seconds has passed). This may lead to unexpected behavior, for example if we are trying to only consume messages coming from the server
So the line
val outgoing: Source[Message, NotUsed] = Source.empty
is causing the problem. It could be fixed with the line below, which never completes (unless you complete the Promise linked to Source.maybe):
val outgoing = Source.empty.concatMat(Source.maybe[Message])(Keep.right)
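If you later want to close the connection deliberately, here is a sketch of keeping the materialized Promise around (adapting the question's graph and using Keep.both so the promise isn't discarded):

val ((promise, upgradeResponse), closed) =
  Source.maybe[Message]
    .viaMat(webSocketFlow)(Keep.both)
    .toMat(printSink)(Keep.both)
    .run()

// Completing the promise with None ends the outgoing stream, which triggers
// the closing handshake and shuts the connection down cleanly.
promise.success(None)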
I ran into this problem myself and find the behaviour pretty confusing.

Possible encoding issue with Google PubSub

When running a subscription source from the Alpakka PubSub library, I receive what appears to be encoded data.
@Singleton
class Consumer @Inject()(config: Configuration, credentialsService: google.creds.Service)(implicit actorSystem: ActorSystem) {

  implicit val m: ActorMaterializer = ActorMaterializer.create(actorSystem)

  val logger = Logger(this.getClass)

  val subName: String = config.get[String]("google.pubsub.subname")
  val credentials: Credentials = credentialsService.getCredentials
  val pubSubConfig = PubSubConfig(credentials.projectId, credentials.clientEmail, credentials.privateKey)

  val subSource: Source[ReceivedMessage, NotUsed] = GooglePubSub.subscribe(subName, pubSubConfig)
  val ackSink: Sink[AcknowledgeRequest, Future[Done]] = GooglePubSub.acknowledge(subName, pubSubConfig)

  val computeGraph = Flow[ReceivedMessage].map { x =>
    logger.info(x.message.data)
    x
  }

  val ackGraph = Flow.fromFunction((msgs: Seq[ReceivedMessage]) => AcknowledgeRequest(msgs.map(_.ackId).toList))

  subSource
    .via(computeGraph)
    .groupedWithin(10, 5.minutes)
    .via(ackGraph)
    .to(ackSink)
    .run()
}
I publish the message from the PubSub console. I am expecting my test message to appear; however, when publishing "test" I receive dGVzdA==. Is this an expected result? I have had issues with importing the private key, so could this be a result of that?
The consumer is bound eagerly with Guice.
Data received over the REST APIs will be base64 encoded. My guess would be that the Alpakka Pub/Sub library, which uses the REST APIs, is not decoding the received data for you. It looks like they also have a library that uses the gRPC Pub/Sub client as the underlying layer, which may not suffer from this issue. You can also use the Cloud Pub/Sub Java client library from Scala directly.
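(dGVzdA== is indeed "test" base64-encoded.) In the meantime, here is a sketch of decoding the payload yourself inside computeGraph, assuming message.data carries the base64 string as returned by the REST API:

import java.nio.charset.StandardCharsets
import java.util.Base64

// Decode the base64 payload before logging or processing it.
val computeGraph = Flow[ReceivedMessage].map { x =>
  val decoded = new String(Base64.getDecoder.decode(x.message.data), StandardCharsets.UTF_8)
  logger.info(decoded)
  x
}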

Cannot see message while sinking kafka stream and cannot see print message in flink 1.2

My goal is to use Kafka to read in a string in JSON format, apply a filter to the string, and then sink the message out (still in JSON string format).
For testing purposes, my input string message looks like:
{"a":1,"b":2}
And my implementation is:
def main(args: Array[String]): Unit = {
  // parse input arguments
  val params = ParameterTool.fromArgs(args)

  if (params.getNumberOfParameters < 4) {
    println("Missing parameters!\n"
      + "Usage: Kafka --input-topic <topic> --output-topic <topic> "
      + "--bootstrap.servers <kafka brokers> "
      + "--zookeeper.connect <zk quorum> --group.id <some id> [--prefix <prefix>]")
    return
  }

  val env = StreamExecutionEnvironment.getExecutionEnvironment
  env.getConfig.disableSysoutLogging
  env.getConfig.setRestartStrategy(RestartStrategies.fixedDelayRestart(4, 10000))
  // create a checkpoint every 5 seconds
  env.enableCheckpointing(5000)
  // make parameters available in the web interface
  env.getConfig.setGlobalJobParameters(params)

  // create a Kafka streaming source consumer for Kafka 0.10.x
  val kafkaConsumer = new FlinkKafkaConsumer010(
    params.getRequired("input-topic"),
    new JSONKeyValueDeserializationSchema(false),
    params.getProperties)

  val messageStream = env.addSource(kafkaConsumer)

  val filteredStream: DataStream[ObjectNode] = messageStream.filter(node => node.get("a").asText.equals("1")
    && node.get("b").asText.equals("2"))

  messageStream.print()

  // Refer to: https://stackoverflow.com/documentation/apache-flink/9004/how-to-define-a-custom-deserialization-schema#t=201708080802319255857
  filteredStream.addSink(new FlinkKafkaProducer010[ObjectNode](
    params.getRequired("output-topic"),
    new SerializationSchema[ObjectNode] {
      override def serialize(element: ObjectNode): Array[Byte] = element.toString.getBytes()
    }, params.getProperties
  ))

  env.execute("Kafka 0.10 Example")
}
As can be seen, I want to print the message stream to the console and sink the filtered messages to Kafka. However, I can see neither of them.
The interesting thing is, if I change the schema of the KafkaConsumer from JSONKeyValueDeserializationSchema to SimpleStringSchema, I can see messageStream printed to the console. Code as shown below:
val kafkaConsumer = new FlinkKafkaConsumer010(
  params.getRequired("input-topic"),
  new SimpleStringSchema,
  params.getProperties)

val messageStream = env.addSource(kafkaConsumer)
messageStream.print()
This makes me think that if I use JSONKeyValueDeserializationSchema, my input message is actually not accepted by Kafka. But this seems weird and quite different from the online documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/kafka.html).
Hope someone can help me out!
JSONKeyValueDeserializationSchema() expects a message key with each Kafka message, and I am assuming that no key is supplied when the JSON messages are produced and sent to the Kafka topic.
Thus, to solve the issue, try using JSONDeserializationSchema(), which expects only the message value and creates an ObjectNode based on the message received.
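A sketch of that change applied to the consumer from the question (everything else in the job stays the same):

// Only the deserialization schema changes: JSONDeserializationSchema builds an
// ObjectNode from the message value alone, so records without a key are fine.
val kafkaConsumer = new FlinkKafkaConsumer010(
  params.getRequired("input-topic"),
  new JSONDeserializationSchema(),
  params.getProperties)

val messageStream: DataStream[ObjectNode] = env.addSource(kafkaConsumer)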

running akka stream in parallel

I have a stream that:
- listens for an HTTP POST receiving a list of events
- mapConcats the list of events into stream elements
- converts the events into Kafka records
- produces the records with Reactive Kafka (akka-stream-kafka producer sink)
Here is the simplified code:
// flow to split group of lines into lines
val splitLines = Flow[List[Evt]].mapConcat(list => list)

// sink to produce kafka records in kafka
val kafkaSink: Sink[Evt, Future[Done]] = Flow[Evt]
  .map(evt => new ProducerRecord[Array[Byte], String](evt.eventType, evt.value))
  .toMat(Producer.plainSink(kafka))(Keep.right)

val routes = {
  path("ingest") {
    post {
      (entity(as[List[ReactiveEvent]]) & extractMaterializer) { (eventIngestList, mat) =>
        val ingest = Source.single(eventIngestList).via(splitLines).runWith(kafkaSink)(mat)
        val result = onComplete(ingest) {
          case Success(value) => complete(s"OK")
          case Failure(ex) => complete((StatusCodes.InternalServerError, s"An error occurred: ${ex.getMessage}"))
        }
        complete("eventList ingested: " + result)
      }
    }
  }
}
Could you tell me what is run in parallel and what is sequential?
I think the mapConcat sequentializes the events in the stream, so how could I parallelize the stream so that after the mapConcat each step is processed in parallel?
Would a simple mapAsyncUnordered be sufficient? Or should I use the GraphDSL with a Balance and Merge?
In your case it will be sequential, I think. Also, you're getting the whole request before you start pushing data to Kafka. I'd use the extractDataBytes directive, which gives you src: Source[ByteString, Any]. Then I'd process it like this:
src
  .via(Framing.delimiter(ByteString("\n"), 1024 /* Max size of line */, allowTruncation = true).map(_.utf8String))
  .mapConcat { line =>
    line.split(",").toList
  }
  .async
  .runWith(kafkaSink)(mat)
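For context, here is a sketch of how that could sit inside the route, assuming the request body is newline-delimited CSV and a hypothetical parseEvt function that turns each field into an Evt (needed because kafkaSink expects Evt elements):

val streamingRoutes = {
  path("ingest") {
    post {
      (extractDataBytes & extractMaterializer) { (src, mat) =>
        val ingest = src
          .via(Framing.delimiter(ByteString("\n"), 1024, allowTruncation = true))
          .map(_.utf8String)
          .mapConcat(line => line.split(",").toList)
          .map(parseEvt) // hypothetical: String => Evt
          .async
          .runWith(kafkaSink)(mat)

        onComplete(ingest) {
          case Success(_)  => complete("OK")
          case Failure(ex) => complete((StatusCodes.InternalServerError, s"An error occurred: ${ex.getMessage}"))
        }
      }
    }
  }
}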