Akka Streams Reactive Kafka - OutOfMemoryError under high load - scala

I am running an Akka Streams Reactive Kafka application that should remain functional under heavy load. After running for around 10 minutes, the application goes down with an OutOfMemoryError. I debugged the heap dump and found that akka.dispatch.Dispatcher is taking ~5 GB of memory. Below are my config files.
Akka version: 2.4.18
Reactive Kafka version: 2.4.18
1. application.conf:

consumer {
  num-consumers = "2"
  c1 {
    bootstrap-servers = "localhost:9092"
    bootstrap-servers = ${?KAFKA_CONSUMER_ENDPOINT1}
    groupId = "testakkagroup1"
    subscription-topic = "test"
    subscription-topic = ${?SUBSCRIPTION_TOPIC1}
    message-type = "UserEventMessage"
    poll-interval = 100ms
    poll-timeout = 50ms
    stop-timeout = 30s
    close-timeout = 20s
    commit-timeout = 15s
    wakeup-timeout = 10s
    use-dispatcher = "akka.kafka.default-dispatcher"
    kafka-clients {
      enable.auto.commit = true
    }
  }
}
2. Startup command (from build.sbt):
java -Xmx6g \
-Dcom.sun.management.jmxremote.port=27019 \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Djava.rmi.server.hostname=localhost \
-Dzookeeper.host=$ZK_HOST \
-Dzookeeper.port=$ZK_PORT \
-jar ./target/scala-2.11/test-assembly-1.0.jar
3. Source and Sink actors:

class EventStream extends Actor with ActorLogging {

  implicit val actorSystem = context.system
  implicit val timeout: Timeout = Timeout(10 seconds)
  implicit val materializer = ActorMaterializer()

  val settings = Settings(actorSystem).KafkaConsumers

  override def receive: Receive = {
    case StartUserEvent(id) =>
      startStreamConsumer(consumerConfig("EventMessage" + ".c" + id))
  }

  def startStreamConsumer(config: Map[String, String]) = {
    val consumerSource = createConsumerSource(config)
    val consumerSink = createConsumerSink()
    val messageProcessor = startMessageProcessor(actorA, actorB, actorC)

    log.info("Starting The UserEventStream processing")

    val future = consumerSource.map { message =>
      val m = s"${message.record.value()}"
      messageProcessor ? m
    }.runWith(consumerSink)

    future.onComplete {
      case _ => actorSystem.stop(messageProcessor)
    }
  }

  def startMessageProcessor(actorA: ActorRef, actorB: ActorRef, actorC: ActorRef) =
    actorSystem.actorOf(Props(classOf[MessageProcessor], actorA, actorB, actorC))

  def createConsumerSource(config: Map[String, String]) = {
    val kafkaMBAddress = config("bootstrap-servers")
    val groupID = config("groupId")
    val topicSubscription = config("subscription-topic").split(',').toList
    println(s"Subscription topics $topicSubscription")

    val consumerSettings = ConsumerSettings(actorSystem, new ByteArrayDeserializer, new StringDeserializer)
      .withBootstrapServers(kafkaMBAddress)
      .withGroupId(groupID)
      .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
      .withProperty(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true")

    Consumer.committableSource(consumerSettings, Subscriptions.topics(topicSubscription: _*))
  }

  def createConsumerSink() = Sink.foreach(println)
}
Here actorA, actorB, and actorC do the business-logic processing and database interaction. Is there anything I am missing in handling the Reactive Kafka consumers, such as commit, error, or throttling configuration? Looking at the heap dump, my guess is that the messages are piling up.

One thing I would change is the following:
val future = consumerSource.map { message =>
  val m = s"${message.record.value()}"
  messageProcessor ? m
}.runWith(consumerSink)
In the above code, you're using ask to send messages to the messageProcessor actor and expect replies, but in order for ask to function as a backpressure mechanism, you need to use it with mapAsync (more information is in the documentation). Something like the following:
val future =
  consumerSource
    .mapAsync(parallelism = 5) { message =>
      val m = s"${message.record.value()}"
      messageProcessor ? m
    }
    .runWith(consumerSink)
Adjust the level of parallelism as needed.
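Since the stream is built from Consumer.committableSource but relies on enable.auto.commit, you could also commit offsets explicitly once processing succeeds. A minimal sketch of that, assuming auto-commit is turned off and using the committableOffset carried by each message (not something your current config does):

import actorSystem.dispatcher // ExecutionContext for mapping over futures

val future =
  consumerSource
    .mapAsync(parallelism = 5) { message =>
      // complete the ask first, then surface the offset for committing
      (messageProcessor ? s"${message.record.value()}")
        .map(_ => message.committableOffset)
    }
    .mapAsync(parallelism = 1)(_.commitScaladsl())
    .runWith(Sink.ignore)

This way an offset is only committed after the processor has replied, so a crash cannot silently skip unprocessed messages.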

Related

Why am I getting this timeout during unit test of akka-stream?

I have an akka-gRPC bidirectional streaming service, and I am testing it in a unit test. The service uses akka-stream, and I use TestSink.probe to test the reply messages. I am receiving the messages back from the service, but there is a timeout-related error whose cause I cannot figure out. This is the test:
object GreeterServiceConf {
  // important to enable HTTP/2 in server ActorSystem's config
  val configServer = ConfigFactory.parseString("akka.http.server.preview.enable-http2 = on")
    .withFallback(ConfigFactory.defaultApplication())

  val configString2 =
    """
      |akka.grpc.client {
      |  "helloworld.GreeterService" {
      |    host = 127.0.0.1
      |    port = 8080
      |  }
      |}
      |""".stripMargin

  val configClient = ConfigFactory.parseString(configString2)
}
class GreeterServiceImplSpec
  extends TestKit(ActorSystem("GreeterServiceImplSpec", ConfigFactory.load(GreeterServiceConf.configServer)))
  with AnyWordSpecLike
  with BeforeAndAfterAll
  with Matchers
  with ScalaFutures {

  implicit val patience: PatienceConfig = PatienceConfig(scaled(5.seconds), scaled(100.millis))
  // val testKit = ActorTestKit(conf)

  val serverSystem: ActorSystem = system
  val bound = new GreeterServer(serverSystem).run()
  // make sure server is bound before using client
  bound.futureValue

  implicit val clientSystem: ActorSystem = ActorSystem("GreeterClient", ConfigFactory.load(GreeterServiceConf.configClient))

  val client = GreeterServiceClient(
    GrpcClientSettings
      .fromConfig("helloworld.GreeterService")
      .withTls(false)
  )

  override def afterAll: Unit = {
    TestKit.shutdownActorSystem(system)
    TestKit.shutdownActorSystem(clientSystem)
  }

  "GreeterService" should {
    "reply to multiple requests" in {
      import GreeterServiceData._

      val names = List("John", "Michael", "Simone")
      val expectedReply: immutable.Seq[HelloReply] = names.map { name =>
        HelloReply(s"Hello, $name -> ${mapHelloReply.getOrElse(name, "this person does not exist =(")}")
      }

      val requestStream: Source[HelloRequest, NotUsed] = Source(names).map(name => HelloRequest(name))
      val responseStream: Source[HelloReply, NotUsed] = client.sayHelloToAll(requestStream)

      val sink = TestSink.probe[HelloReply]
      val replyStream = responseStream.runWith(sink)
      replyStream
        .requestNext(HelloReply(s"Hello, John -> I killed Java"))
        .requestNext(HelloReply(s"Hello, Michael -> We are the Jacksons 5"))
        .requestNext(HelloReply(s"Hello, Simone -> I have found a job to work with Scala =)")) // THIS IS THE LINE 122 ON THE ERROR
        // .request(3)
        // .expectNextUnorderedN(expectedReply) // I also tested this but it did not work
        .expectComplete()
    }
  }
}
The error is:
java.lang.AssertionError: assertion failed: timeout (3 seconds) during expectMsg while waiting for OnComplete
  at scala.Predef$.assert(Predef.scala:223)
  at akka.testkit.TestKitBase.expectMsg_internal(TestKit.scala:459)
  at akka.testkit.TestKitBase.expectMsg(TestKit.scala:436)
  at akka.testkit.TestKitBase.expectMsg$(TestKit.scala:436)
  at akka.testkit.TestKit.expectMsg(TestKit.scala:969)
  at akka.stream.testkit.TestSubscriber$ManualProbe.expectComplete(StreamTestKit.scala:479)
  at com.example.helloworld.GreeterServiceImplSpec.$anonfun$new$5(GreeterServiceImplSpec.scala:121)
I got it to work based on the akka-grpc-quickstart-scala.g8 project. I execute runForeach to run the graph with a materialized Sink on the response stream; when the response completes, I assert inside the Future[Done].
"reply to multiple requests" in {
import GreeterServiceData._
import system.dispatcher
val names = List("John", "Martin", "Michael", "UnknownPerson")
val expectedReplySeq: immutable.Seq[HelloReply] = names.map { name =>
HelloReply(s"Hello, $name -> ${mapHelloReply.getOrElse(name, "this person does not exist =(")}")
}
// println(s"expectedReplySeq: ${expectedReplySeq.foreach(println)}")
val requestStream: Source[HelloRequest, NotUsed] = Source(names).map(name => HelloRequest(name))
val responseStream: Source[HelloReply, NotUsed] = client.sayHelloToAll(requestStream)
val done: Future[Done] = responseStream.runForeach { reply: HelloReply =>
// println(s"got streaming reply: ${reply.message}")
assert(expectedReplySeq.contains(reply))
}
// OR USING Sink.foreach[HelloReply])(Keep.right)
val sinkHelloReply = Sink.foreach[HelloReply] { e =>
println(s"element: $e")
assert(expectedReplySeq.contains(e))
}
responseStream.toMat(sinkHelloReply)(Keep.right).run().onComplete {
case Success(value) => println(s"done")
case Failure(exception) => println(s"exception $exception")
}
}
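One caveat (my observation, not part of the original fix): the test never waits on the materialized Future[Done], so the assertions may still be running when the test method returns. With ScalaFutures already mixed in, blocking on completion is a one-liner:

done.futureValue shouldBe Done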
For reference, the whole GreeterServiceImplSpec class is here.

Play + Akka - Join the cluster and ask actor on another ActorSystem

I am able to make a Play app join an existing Akka cluster, make ask calls to an actor running on another ActorSystem, and get results back. But I am having trouble with a couple of things:
I see the message below in the logs when Play tries to join the cluster. I suspect that Play is starting its own Akka cluster? I am really not sure what it means.
Could not register Cluster JMX MBean with name=akka:type=Cluster as it is already registered. If you are running multiple clusters in the same JVM, set 'akka.cluster.jmx.multi-mbeans-in-same-jvm = on' in config
Right now I am re-initializing the ActorSystem every time a request comes to the controller, which I know is not the right way to do it. I am new to Scala, Akka, and Play, and I am having difficulty figuring out how to make this a singleton service and inject it into my controller.
So far I have got this:
class DataRouter @Inject()(controller: DataController) extends SimpleRouter {
  val prefix = "/v1/data"

  override def routes: Routes = {
    case GET(p"/ip/$datatype") =>
      controller.get(datatype)
    case POST(p"/ip/$datatype") =>
      controller.process
  }
}

case class RangeInput(start: String, end: String)

object RangeInput {
  implicit val implicitWrites = new Writes[RangeInput] {
    def writes(range: RangeInput): JsValue = {
      Json.obj(
        "start" -> range.start,
        "end" -> range.end
      )
    }
  }
}

@Singleton
class DataController @Inject()(cc: ControllerComponents)(implicit exec: ExecutionContext) extends AbstractController(cc) {
  private val logger = Logger("play")
  implicit val timeout: Timeout = 115.seconds

  private val form: Form[RangeInput] = {
    import play.api.data.Forms._
    Form(
      mapping(
        "start" -> nonEmptyText,
        "end" -> text
      )(RangeInput.apply)(RangeInput.unapply)
    )
  }

  def get(datatype: String): Action[AnyContent] = Action.async { implicit request =>
    logger.info(s"show: datatype = $datatype")
    logger.trace(s"show: datatype = $datatype")
    //val r: Future[Result] = Future.successful(Ok("hello " + datatype ))
    val config = ConfigFactory.parseString("akka.cluster.roles = [gateway]")
      .withFallback(ConfigFactory.load())
    implicit val system: ActorSystem = ActorSystem(SharedConstants.Actor_System_Name, config)
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    implicit val executionContext = system.dispatcher

    val ipData = system.actorOf(
      ClusterRouterGroup(RandomGroup(Nil), ClusterRouterGroupSettings(
        totalInstances = 100, routeesPaths = List("/user/getipdata"),
        allowLocalRoutees = false, useRoles = Set("static"))).props())

    val res: Future[String] = (ipData ? datatype).mapTo[String]
    //val res: Future[List[Map[String, String]]] = (ipData ? datatype).mapTo[List[Map[String,String]]]
    val futureResult: Future[Result] = res.map { list =>
      Ok(Json.toJson(list))
    }
    futureResult
  }

  def process: Action[AnyContent] = Action.async { implicit request =>
    logger.trace("process: ")
    processJsonPost()
  }

  private def processJsonPost[A]()(implicit request: Request[A]): Future[Result] = {
    logger.debug(request.toString())
    def failure(badForm: Form[RangeInput]) = {
      Future.successful(BadRequest("Test"))
    }
    def success(input: RangeInput) = {
      val r: Future[Result] = Future.successful(Ok("hello " + Json.toJson(input)))
      r
    }
    form.bindFromRequest().fold(failure, success)
  }
}
akka {
  log-dead-letters = off
  log-dead-letters-during-shutdown = off

  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
  }

  remote {
    log-remote-lifecycle-events = off
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      hostname = ${myhost}
      port = 0
    }
  }

  cluster {
    seed-nodes = [
      "akka.tcp://MyCluster@localhost:2541"
    ]
    seed-nodes = ${?SEEDNODE}
  }
}
Answers
Refer to https://www.playframework.com/documentation/2.6.x/ScalaAkka#Built-in-actor-system-name, which has details about configuring the actor system name.
You should not initialize an actor system on every request; use the Play-injected actor system instead. If you wish to customize the actor system, do it by modifying the Akka configuration: create your own ApplicationLoader extending GuiceApplicationLoader and override the builder method to supply your own Akka configuration. The rest is taken care of by Play, which injects this actor system for you.
See https://www.playframework.com/documentation/2.6.x/ScalaDependencyInjection#Advanced:-Extending-the-GuiceApplicationLoader
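A minimal sketch of the controller using the injected ActorSystem (Play provides the ActorSystem binding out of the box; the router-group setup is taken from your question):

import javax.inject.{Inject, Singleton}
import scala.concurrent.ExecutionContext
import scala.concurrent.duration._
import akka.actor.ActorSystem
import akka.cluster.routing.{ClusterRouterGroup, ClusterRouterGroupSettings}
import akka.pattern.ask
import akka.routing.RandomGroup
import akka.util.Timeout
import play.api.libs.json.Json
import play.api.mvc.{AbstractController, Action, AnyContent, ControllerComponents}

@Singleton
class DataController @Inject()(cc: ControllerComponents, system: ActorSystem)
                              (implicit exec: ExecutionContext) extends AbstractController(cc) {

  implicit val timeout: Timeout = 115.seconds

  // created once per application instead of once per request;
  // the injected system is the one Play started and configured
  private val ipData = system.actorOf(
    ClusterRouterGroup(RandomGroup(Nil), ClusterRouterGroupSettings(
      totalInstances = 100, routeesPaths = List("/user/getipdata"),
      allowLocalRoutees = false, useRoles = Set("static"))).props())

  def get(datatype: String): Action[AnyContent] = Action.async {
    (ipData ? datatype).mapTo[String].map(res => Ok(Json.toJson(res)))
  }
}

Because the controller is a @Singleton and the router group is a val, the "Could not register Cluster JMX MBean" warning also goes away: only one ActorSystem (Play's own) ever joins the cluster.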

Akka streams Source.actorRef vs Source.queue vs buffer, which one to use?

I am using akka-streams-kafka to create a stream consumer from a Kafka topic, and Broadcast to serve events from the topic to WebSocket clients.
I have found the following three approaches to create a stream Source.
Question:
My goal is to serve hundreds or thousands of WebSocket clients (some of which might be slow consumers). Which approach scales better? Appreciate any thoughts.
Note that Broadcast lowers the rate down to the slowest consumer.

val BUFFER_SIZE = 100000

Source.actorRef (the source actor does not support backpressure):
val kafkaSourceActorWithBroadcast = {
  val (sourceActorRef, kafkaSource) = Source.actorRef[String](BUFFER_SIZE, OverflowStrategy.fail)
    .toMat(BroadcastHub.sink(bufferSize = 256))(Keep.both).run
  Consumer.plainSource(consumerSettings, Subscriptions.topics(KAFKA_TOPIC))
    .runForeach(record => sourceActorRef ! Util.toJson(record.value()))
  kafkaSource
}
Source.queue
val kafkaSourceQueueWithBroadcast = {
  val (futureQueue, kafkaQueueSource) = Source.queue[String](BUFFER_SIZE, OverflowStrategy.backpressure)
    .toMat(BroadcastHub.sink(bufferSize = 256))(Keep.both).run
  Consumer.plainSource(consumerSettings, Subscriptions.topics(KAFKA_TOPIC))
    .runForeach(record => futureQueue.offer(Util.toJson(record.value())))
  kafkaQueueSource
}
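(Side note on this variant: offer returns a Future[QueueOfferResult] which is discarded inside runForeach, so OverflowStrategy.backpressure never reaches the Kafka source. A sketch, keeping the names above, that would actually propagate backpressure:)

Consumer.plainSource(consumerSettings, Subscriptions.topics(KAFKA_TOPIC))
  // wait for each offer to be accepted before pulling the next record
  .mapAsync(parallelism = 1)(record => futureQueue.offer(Util.toJson(record.value())))
  .runWith(Sink.ignore)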
buffer
val kafkaSourceWithBuffer = Consumer.plainSource(consumerSettings, Subscriptions.topics(KAFKA_TOPIC))
  .map(record => Util.toJson(record.value()))
  .buffer(BUFFER_SIZE, OverflowStrategy.backpressure)
  .toMat(BroadcastHub.sink(bufferSize = 256))(Keep.right).run
WebSocket route code for completeness (the where parameter was unused, so it is dropped here):

val streamRoute =
  path("stream") {
    handleWebSocketMessages(websocketFlow)
  }

def websocketFlow: Flow[Message, Message, NotUsed] = {
  Flow[Message]
    .collect {
      case TextMessage.Strict(msg) => Future.successful(msg)
      case TextMessage.Streamed(stream) =>
        stream.runFold("")(_ + _).flatMap(msg => Future.successful(msg))
    }
    .mapAsync(parallelism = PARALLELISM)(identity)
    .via(logicStreamFlow)
    .map { msg: String => TextMessage.Strict(msg) }
}

private def logicStreamFlow: Flow[String, String, NotUsed] =
  Flow.fromSinkAndSource(Sink.ignore, kafkaSourceActorWithBroadcast)

Parsing stops with Akka Streams mapAsync

I am parsing 50,000 records that contain titles and URLs from a web page. While parsing, I write them to a PostgreSQL database. I deployed my application using docker-compose, but it keeps stopping on some page for no apparent reason. I added logging to figure out what's happening, but there is no connection error or anything like that.
Here is my code for parsing and writing to the database:
object App {
  val db = Database.forURL("jdbc:postgresql://db:5432/toloka?user=user&password=password")
  val browser = JsoupBrowser()
  val catRepo = new CategoryRepo(db)
  val torrentRepo = new TorrentRepo(db)
  val torrentForParseRepo = new TorrentForParseRepo(db)
  val parallelismFactor = 10
  val groupFactor = 10

  implicit val system = ActorSystem("TolokaParser")
  implicit val materializer = ActorMaterializer()
  implicit val executionContext = system.dispatcher

  def parseAndWriteTorrentsForParseToDb(doc: App.browser.DocumentType) = {
    Source(getRecordsLists(doc))
      .grouped(groupFactor)
      .mapAsync(parallelismFactor) { torrentForParse: Seq[TorrentForParse] =>
        torrentForParseRepo.createInBatch(torrentForParse)
      }
      .runWith(Sink.ignore)
  }

  def getRecordsLists(doc: App.browser.DocumentType) = {
    val pages = generatePagesFromHomePage(doc)
    println("torrent links generated")
    println(pages.size)
    val result = for {
      page <- pages
    } yield {
      println(s"Parsing torrent list...$page")
      val tmp = getTitlesAndLinksTuple(getTitlesList(browser.get(page)), getLinksList(browser.get(page)))
      println(tmp.size)
      tmp
    }
    println("torrent links and names tupled")
    result.flatten
  }
}
What may be the cause of such problems?
Add a supervision strategy so that an error does not finalize the stream. For example:
val decider: Supervision.Decider = {
  case _ => Supervision.Resume
}

def parseAndWriteTorrentsForParseToDb = {
  Source.fromIterator(() => List(1, 2, 3).toIterator)
    .grouped(1)
    .mapAsync(1) { torrentForParse: Seq[Int] =>
      Future { 0 }
    }
    .withAttributes(ActorAttributes.supervisionStrategy(decider))
    .runWith(Sink.ignore)
}
The stream should not stop with this supervision strategy on the async stage.
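Applied to the stream from the question, that might look like the following sketch (same names as in the question; a Resume-on-everything decider can hide real failures, so logging the exception before resuming is advisable):

val decider: Supervision.Decider = { e =>
  println(s"Skipping failed element: $e") // log and drop the failing group
  Supervision.Resume
}

def parseAndWriteTorrentsForParseToDb(doc: App.browser.DocumentType) = {
  Source(getRecordsLists(doc))
    .grouped(groupFactor)
    .mapAsync(parallelismFactor) { torrentForParse: Seq[TorrentForParse] =>
      torrentForParseRepo.createInBatch(torrentForParse)
    }
    .withAttributes(ActorAttributes.supervisionStrategy(decider))
    .runWith(Sink.ignore)
}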

request-reply with akka-camel and ActiveMQ

Update: It would seem that an even simpler test case is not working: just trying to send a message from an ActiveMQ producer to an ActiveMQ consumer via the in-process broker. Here is the code:
val brokerURL = "vm://localhost?broker.persistent=false"
val connectionFactory = new ActiveMQConnectionFactory(brokerURL)
val connection = connectionFactory.createConnection()
val session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
val queue = session.createQueue("foo.bar")
val producer = session.createProducer(queue)
val consumer = session.createConsumer(queue)
val message = session.createTextMessage("marco")
producer.send(message)
val resp = consumer.receive(2000)
assert(resp != null)
I'm trying to implement a very simple request-reply pattern using akka-camel. Here's my (testbench) code, which uses ActiveMQ directly to send a message and expects a response:
val brokerURL = "vm://localhost?broker.persistent=false"
// create in-process broker, session, queue, etc...
val connectionFactory = new ActiveMQConnectionFactory(brokerURL)
val connection = connectionFactory.createConnection()
val session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
val queue = session.createQueue("myapp.somequeue")
val producer = session.createProducer(queue)
val tempDest = session.createTemporaryQueue()
val respConsumer = session.createConsumer(tempDest)
val message = session.createTextMessage("marco")
message.setJMSReplyTo(tempDest)
message.setJMSCorrelationID("myCorrelationID")
// create actor system with CamelExtension
val camel = CamelExtension(system)
val camelContext = camel.context
camelContext.addComponent("activemq", ActiveMQComponent.activeMQComponent(brokerURL))
val listener = system.actorOf(Props[Frontend])
// send a message, expect a response
producer.send(message)
val resp: TextMessage = respConsumer.receive(5000).asInstanceOf[TextMessage]
assert(resp.getText() == "polo")
I've tried two different approaches for the Consumer actor. The first is simpler and attempts to reply using sender !:
class Frontend extends Actor with Consumer {
  def endpointUri = "activemq:myapp.somequeue"
  override def autoAck = false

  def receive = {
    case msg: CamelMessage => {
      println("received %s" format msg.bodyAs[String])
      sender ! "polo"
    }
  }
}
The second attempts to reply using the CamelTemplate:
class Frontend extends Actor with Consumer {
  def endpointUri = "activemq:myapp.somequeue"
  override def autoAck = false

  def receive = {
    case msg: CamelMessage => {
      println("received %s" format msg.bodyAs[String])
      val replyTo = msg.getHeaderAs("JMSReplyTo", classOf[ActiveMQTempQueue], camelContext)
      val correlationId = msg.getHeaderAs("JMSCorrelationID", classOf[String], camelContext)
      camel.template.sendBodyAndHeader("activemq:" + replyTo.getQueueName(), "polo", "JMSCorrelationID", correlationId)
    }
  }
}
I do see the println() output from my actor's receive method, so the ActiveMQ message is getting into the actor, but the respConsumer.receive() call in the testbench times out. I've tried many combinations of specifying and omitting headers in the reply, and I've also tried enabling and disabling autoAck.
Thanks in advance.
Turns out I needed to call connection.start() in the JMS code.
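For context, a JMS connection does not deliver messages to its consumers until it is started, so the fix is one line in the test setup (shown against the simpler test case from the update):

val connection = connectionFactory.createConnection()
connection.start() // without this, consumer.receive() never sees any messages
val session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)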