Kafka Streams Scala Memory Leak

My use case is the following:
I want to aggregate data for a specific amount of time and then forward it downstream. Since the built-in suppress feature does not support wall-clock time, I have to implement this myself using a transformer.
After the time window has closed, I forward the aggregated data downstream and delete it from the state store. I tested the behaviour with a limited amount of data, i.e. after all data has been processed, the state store should be empty again and memory usage should decrease. Unfortunately, memory usage always stays at the same level.
SuppressTransformer.scala
class SuppressTransformer[T](stateStoreName: String, windowDuration: Duration) extends Transformer[String, T, KeyValue[String, T]] {
val scheduleInterval: Duration = Duration.ofSeconds(180)
private val keySet = mutable.HashSet.empty[String]
var context: ProcessorContext = _
var store: SessionStore[String, Array[T]] = _
override def init(context: ProcessorContext): Unit = {
this.context = context;
this.store = context.getStateStore(stateStoreName).asInstanceOf[SessionStore[String, Array[T]]]
this.context.schedule(
scheduleInterval,
PunctuationType.WALL_CLOCK_TIME,
_ => {
for (key <- keySet.toList) { // iterate over a snapshot, because keySet is modified inside the loop
val storeEntry = store.fetch(key)
while (storeEntry.hasNext) {
val keyValue: KeyValue[Windowed[String], Array[T]] = storeEntry.next()
val peekKey = keyValue.key
val now = Instant.now()
val windowAge: Long = ChronoUnit.SECONDS.between(peekKey.window().startTime(), now)
if (peekKey.window().start() > 0 && windowAge > windowDuration.toSeconds) { // Check if window is exceeded. If yes, downstream data
val windowedKey: Windowed[String] = keyValue.key
val storeValue = keyValue.value
context.forward(key, storeValue, To.all().withTimestamp(now.toEpochMilli))
context.commit()
this.store.remove(windowedKey) // Delete entry from state store
keySet -= key
}
}
storeEntry.close() // Close iterator to avoid memory leak
}
}
)
}
override def transform(key: String, value: T): KeyValue[String, T] = {
if (!keySet.contains(key)) {
keySet += key
}
null
}
override def close(): Unit = {}
}
class SuppressTransformerSupplier[T](stateStoreName: String, windowDuration: Duration) extends TransformerSupplier[String, T, KeyValue[String, T]] {
override def get(): SuppressTransformer[T] = new SuppressTransformer(stateStoreName, windowDuration)
}
Topology.scala
val windowDuration = Duration.ofMinutes(5)
val stateStore: Materialized[String, util.ArrayList[Bytes], ByteArraySessionStore] =
Materialized
.as[String, util.ArrayList[Bytes]](
new RocksDbSessionBytesStoreSupplier(stateStoreName,
stateStoreRetention.toMillis)
)
builder.stream[String, Bytes](Pattern.compile(topic + "(-\\d+)?"))
.filter((k, _) => k != null)
.groupByKey
.windowedBy(SessionWindows `with` sessionWindowMinDuration `grace` sessionGracePeriodDuration)
.aggregate(initializer = {
new util.ArrayList[Bytes]()
}
)(aggregator = (_: String, instance: Bytes, agg: util.ArrayList[Bytes]) => {
agg.add(instance)
agg
}, merger = (_: String, state1: util.ArrayList[Bytes], state2: util.ArrayList[Bytes]) => {
state1.addAll(state2)
state1
}
)(stateStore)
.toStream
.map((k, v) => (k.key(), v))
.transform(new SuppressTransformerSupplier[util.ArrayList[Bytes]](stateStoreName, windowDuration), stateStoreName)
.unsetRepartitioningRequired()
.to(f"$topic-aggregated")

I don't think this is a memory leak. It could be, but from what you describe, it looks like normal JVM behavior.
The JVM takes as much heap memory as it needs, up to the maximum configured with the -Xmx option. Your state fills the heap (I assume, based on the graph), and the objects are later garbage-collected. But the JVM normally does not return freed heap memory to the OS, which is why your pod's memory usage stays at its peak.
There are a few garbage collectors that can return memory to the OS.
I personally use the faster GC and let the JVM take as much memory as it requires. At the end of the day, that is the point of pod isolation. I normally set the maximum heap size to 80% of the pod's memory limit.
Here is a related question: "Does GC release back memory to OS?"
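To tell a real leak apart from the JVM simply holding on to heap it has already claimed, it can help to log how much of the committed heap is actually in use. The following is a minimal sketch, not part of the original question: `HeapSnapshot` and its field names are made up for illustration, and it relies only on the standard `java.lang.Runtime` methods.

```scala
// Hypothetical helper: compares used, committed and maximum heap sizes.
object HeapSnapshot {
  private val Mb = 1024L * 1024L

  final case class Heap(usedMb: Long, committedMb: Long, maxMb: Long)

  def snapshot(): Heap = {
    val rt = Runtime.getRuntime
    val committed = rt.totalMemory      // heap currently reserved from the OS
    val used = committed - rt.freeMemory // part of it occupied by objects
    Heap(used / Mb, committed / Mb, rt.maxMemory / Mb) // maxMemory reflects -Xmx
  }

  def describe(): String = {
    val h = snapshot()
    s"used=${h.usedMb}MB committed=${h.committedMb}MB max=${h.maxMb}MB"
  }
}
```

If the used value drops after the state store has been emptied (possibly only after a GC cycle) while the pod's memory stays flat, the memory is being retained by the JVM rather than leaked.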

Related

Scala: Process futures in batches sorted by (approximate) completion time

// I have hundreds of tasks converting inputs into outputs, which should be persisted.
case class Op(i: Int)
case class Output(i: Int)
val inputs: Seq[Op] = ??? // Number of inputs is huge
def executeLongRunning(op: Op): Output = {
Thread.sleep(Random.nextInt(1000) + 1000) // I cannot predict which tasks will finish first
println("<==", op)
Output(op.i)
}
def executeSingleThreadedSave(outputs: Seq[Output]): Unit = {
synchronized { // Problem is, persisting output is itself a long-running process,
// which cannot be parallelized (internally uses blocking queue).
Thread.sleep(5000) // persist time is independent of outputs.size
println("==>", outputs) // Order of persisted records does not matter
}
}
// TODO: this needs to be implemented
def magicSaver(eventualOutputs: Seq[Future[Output]], saver: Seq[Output] => Unit): Unit = ???
val eventualOutputs: Seq[Future[Output]] = inputs.map((input: Op) => Future(executeLongRunning(input)))
magicSaver(eventualOutputs, executeSingleThreadedSave)
I could implement magicSaver to be:
def magicSaver(eventualOutputs: Seq[Future[Output]], saver: Seq[Output] => Unit): Unit = {
saver(Await.result(Future.sequence(eventualOutputs), Duration.Inf))
}
But this has the major drawback that we wait for all inputs to be processed before we start persisting outputs, which is not ideal from a fault-tolerance standpoint.
Another implementation is:
def magicSaver(eventualOutputs: Seq[Future[Output]], saver: Seq[Output] => Unit): Unit = {
eventualOutputs.foreach(_.onSuccess { case output: Output => saver(Seq(output)) })
}
but this blows up execution time to inputs.size * 5 seconds (because of the synchronized nature of the saver), which is not acceptable.
I want a way to batch together already completed futures once the number of such futures reaches some trade-off size (100, for example), but I'm not sure how to do that cleanly without explicitly coding polling logic:
def magicSaver(eventualOutputs: Seq[Future[Output]], saver: Seq[Output] => Unit): Unit = {
def waitFor100CompletedFutures(eventualOutputs: Seq[Future[Output]]): (Seq[Output], Seq[Future[Output]]) = {
var completedCount: Int = 0
do {
completedCount = eventualOutputs.count(_.isCompleted)
Thread.sleep(100)
} while ((completedCount < 100) && (completedCount != eventualOutputs.size))
val (completed: Seq[Future[Output]], remaining: Seq[Future[Output]]) = eventualOutputs.partition(_.isCompleted)
(Await.result(Future.sequence(completed), Duration.Inf), remaining)
}
var remaining: Seq[Future[Output]] = eventualOutputs
do {
// A tuple pattern cannot assign to existing vars, so bind fresh names first
val (completed, stillRunning) = waitFor100CompletedFutures(remaining)
saver(completed)
remaining = stillRunning
} while (remaining.nonEmpty)
}
Any elegant solution I'm missing here?

I'm posting my solution here, for reference. It has the benefit that it avoids batching altogether, and invokes processOutput as soon as output becomes available, which is the best situation under constraints I've described.
def magicSaver[T, R](eventualOutputs: Seq[Future[T]],
processOutput: Seq[T] => R)(implicit ec: ExecutionContext): Seq[R] = {
logInfo(s"Size of outputs to save: ${eventualOutputs.size}")
var remaining: Seq[Future[T]] = eventualOutputs
val processorOutput: mutable.ListBuffer[R] = new mutable.ListBuffer[R]
do {
val (currentCompleted: Seq[Future[T]], currentRemaining: Seq[Future[T]]) = remaining.partition(_.isCompleted)
if (remaining.size == currentRemaining.size) {
Thread.sleep(100)
} else {
logInfo(s"Got ${currentCompleted.size} completed records, remaining ${currentRemaining.size}")
val completed = currentCompleted.map(Await.result(_, Duration.Zero))
processorOutput.append(processOutput(completed))
}
remaining = currentRemaining
} while (remaining.nonEmpty)
processorOutput.toList // a ListBuffer is not an immutable Seq on Scala 2.13+
}
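The polling loop above can also be avoided by letting each future push its result into a blocking queue and having a single consumer drain whatever has accumulated. This is only a sketch under a few assumptions: `QueueSaver` and `drainAndSave` are hypothetical names, failed futures are counted as finished but their errors are silently dropped, and the saver is called from the calling thread. Batch sizes then adapt on their own: while one save is running, newly completed results pile up and form the next batch.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Success, Try}

object QueueSaver {
  // Hypothetical helper: saves results in batches as they complete, without polling.
  def drainAndSave[T](futures: Seq[Future[T]], saver: Seq[T] => Unit)
                     (implicit ec: ExecutionContext): Unit = {
    val done = new LinkedBlockingQueue[Try[T]]()
    // Every future enqueues its outcome as soon as it completes.
    futures.foreach(_.onComplete(result => done.put(result)))

    var pending = futures.size
    while (pending > 0) {
      // Block until at least one result is available...
      val batch = ArrayBuffer[Try[T]](done.take())
      // ...then take everything else that has finished in the meantime.
      var next = done.poll()
      while (next != null) {
        batch += next
        next = done.poll()
      }
      pending -= batch.size
      // Assumption: failures are skipped here; real code should handle them.
      val successes = batch.collect { case Success(value) => value }.toList
      if (successes.nonEmpty) saver(successes)
    }
  }
}
```

With the definitions from the question, a call would look like `QueueSaver.drainAndSave(eventualOutputs, executeSingleThreadedSave)`.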

Running two scala functions in parallel, returning the latest value after 5 minutes

I have two Scala functions that are expensive to run. Each one looks like the code below: it keeps improving the value of a variable. I'd like to run them simultaneously and, after 5 minutes (or some other time), terminate both functions and take their latest values up to that point.
def func1(n: Int): Double = {
var a = 0.0D
while (not terminated) {
/// improve value of 'a' with algorithm 1
}
}
def func2(n: Int): Double = {
var a = 0.0D
while (not terminated) {
/// improve value of 'a' with algorithm 2
}
}
I would like to know how I should structure my code for doing that and what is the best practice here? I was thinking about running them in two different threads with a timeout and return their latest value at time out. But it seems there can be other ways for doing that. I am new to Scala so any insight would be tremendously helpful.

It is not hard. Here is one way of doing it:
@volatile var terminated = false
def func1(n: Int): Double = {
var a = 0.0D
while (!terminated) {
a = 0.0001 + a * 0.99999; //some useless formula1
}
a
}
def func2(n: Int): Double = {
var a = 0.0D
while (!terminated) {
a += 0.0001 //much simpler formula2, just for testing
}
a
}
def main(args: Array[String]): Unit = {
val f1 = Future { func1(1) } //work starts here
val f2 = Future { func2(2) } //and here
//aggregate results into one common future
val aggregatedFuture = for{
f1Result <- f1
f2Result <- f2
} yield (f1Result, f2Result)
Thread.sleep(500) //wait here for some calculations in ms
terminated = true //this is where we actually command to stop
//since looping to while() takes time, we need to wait for results
val res = Await.result(aggregatedFuture, 50.millis)
//just a printout
println("results:" + res)
}
But, of course, you may want to look at your while loops and restructure them into more manageable and chainable calculations.
Output: results:(9.999999999933387,31206.34691883926)

I am not 100% sure if this is something you would want to do, but here is one approach (not for 5 minutes, but you can change that) :
object s
{
def main(args: Array[String]): Unit = println(run())
def run(): (Int, Int) =
{
val (s, numNanoSec, seedVal) = (System.nanoTime, 500000000L, 0)
Seq(f1 _, f2 _).par.map(f =>
{
var (i, id) = f(seedVal)
while (System.nanoTime - s < numNanoSec)
{
i = f(i)._1
}
(i, id)
}).seq.maxBy(_._1)
}
def f1(a: Int): (Int, Int) = (a + 1, 1)
def f2(a: Int): (Int, Int) = (a + 2, 2)
}
Output:
me@ideapad:~/junk> scala s.scala
(34722678,2)
me@ideapad:~/junk> scala s.scala
(30065688,2)
me@ideapad:~/junk> scala s.scala
(34650716,2)
Of course this all assumes you have at least two threads available to distribute tasks to.

You can use Future with Await result to do that:
def fun2(): Double = {
@volatile var a = 0.0 // written by the Future's thread, read by the caller on timeout
val f = Future {
// improve a with algorithm 2
a
}
try {
Await.result(f, 5.minutes) // needs import scala.concurrent.duration._
} catch {
case e: TimeoutException => a
}
}
Use Await.result to wait for the algorithm with a timeout; when the timeout is reached, return the current value of a directly.
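One caveat with this approach: a timeout in `Await.result` only stops the waiting, not the `Future` itself, which keeps running in the background. A minimal sketch of one way to handle that is shown below. The names `BestSoFar`, `Improver` and `bestWithin` are invented for illustration, and the `step` function stands in for one iteration of the real algorithm.

```scala
import scala.concurrent.{Await, ExecutionContext, Future, TimeoutException}
import scala.concurrent.duration._

object BestSoFar {
  // Hypothetical worker: publishes its latest value and can be told to stop.
  final class Improver(step: Double => Double) {
    @volatile private var latest = 0.0
    @volatile private var stopped = false

    def run(): Double = {
      while (!stopped) latest = step(latest)
      latest
    }
    def stop(): Unit = stopped = true
    def current: Double = latest
  }

  // Runs the improvement loop for at most `budget` and returns the latest value.
  def bestWithin(budget: FiniteDuration)(step: Double => Double)
                (implicit ec: ExecutionContext): Double = {
    val improver = new Improver(step)
    val worker = Future(improver.run())
    try Await.result(worker, budget)
    catch { case _: TimeoutException => improver.current }
    finally improver.stop() // make sure the background loop actually ends
  }
}
```

Running two algorithms side by side then amounts to calling `bestWithin` inside two separate Futures and combining their results.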

cache using functional callbacks/ proxy pattern implementation scala

How can I implement a cache using functional programming?
A few days ago I came across callbacks and a proxy pattern implementation using Scala.
This code should apply the inner function only if the value is not in the map.
But every time, the map is reinitialized and the values are gone (which seems obvious).
How can I reuse the same cache between different function calls?
class Aggregator{
def memoize(function: Function[Int, Int] ):Function[Int,Int] = {
val cache = HashMap[Int, Int]()
(t:Int) => {
if (!cache.contains(t)) {
println("Evaluating..."+t)
val r = function.apply(t);
cache.put(t,r)
r
}
else
{
cache.get(t).get;
}
}
}
def memoizedDoubler = memoize( (key:Int) => {
println("Evaluating...")
key*2
})
}
object Aggregator {
def main( args: Array[String] ): Unit = {
val agg = new Aggregator()
agg.memoizedDoubler(2)
agg.memoizedDoubler(2)// It should not evaluate again but does
agg.memoizedDoubler(3)
agg.memoizedDoubler(3)// It should not evaluate again but does
}
}

I see what you're trying to do here. The reason it's not working is that every time you call memoizedDoubler, it first calls memoize, which creates a fresh cache. You need to declare memoizedDoubler as a val instead of a def if you want memoize to be called only once.
val memoizedDoubler = memoize( (key:Int) => {
println("Evaluating...")
key*2
})
This answer has a good explanation of the difference between def and val: https://stackoverflow.com/a/12856386/37309
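The difference can be made visible with two counters. This is an illustrative sketch only: `DefVsVal`, `builds` and `evaluations` are made-up names, and the plain `var` counters are not thread-safe.

```scala
import scala.collection.mutable

object DefVsVal {
  var builds = 0      // how many times memoize created a new cache
  var evaluations = 0 // how many times the wrapped function really ran

  def memoize(f: Int => Int): Int => Int = {
    builds += 1
    val cache = mutable.HashMap.empty[Int, Int]
    (t: Int) => cache.getOrElseUpdate(t, f(t))
  }

  private def slowDouble(k: Int): Int = { evaluations += 1; k * 2 }

  // def: memoize runs on every access, so every call gets an empty cache
  def viaDef: Int => Int = memoize(slowDouble)
  // val: memoize runs once, so all calls share one cache
  val viaVal: Int => Int = memoize(slowDouble)
}
```

Calling `DefVsVal.viaVal(2)` twice evaluates the function once, whereas calling `DefVsVal.viaDef(2)` twice evaluates it twice.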

Aren't you declaring a new Map per invocation?
def memoize(function: Function[Int, Int] ):Function[Int,Int] = {
val cache = HashMap[Int, Int]()
rather than specifying one per instance of Aggregator?
For example:
class Aggregator{
private val cache = HashMap[Int, Int]()
def memoize(function: Function[Int, Int] ):Function[Int,Int] = {

To answer your question:
How to implement cache using functional programming
In functional programming there is no concept of mutable state. If you want to change something (like a cache), you need to return the updated cache instance along with the result and use it for the next call.
Here is a modification of your code that follows that approach. The function that calculates values and the cache are both incorporated into Aggregator. When memoize is called, it returns a tuple containing the calculation result (possibly taken from the cache) and the new Aggregator that should be used for the next call.
class Aggregator(function: Function[Int, Int], cache:Map[Int, Int] = Map.empty) {
def memoize:Int => (Int, Aggregator) = {
t:Int =>
cache.get(t).map {
res =>
(res, Aggregator.this)
}.getOrElse {
val res = function(t)
(res, new Aggregator(function, cache + (t -> res)))
}
}
}
object Aggregator {
def memoizedDoubler = new Aggregator((key:Int) => {
println("Evaluating..." + key)
key*2
})
def main(args: Array[String]) {
val (res, doubler1) = memoizedDoubler.memoize(2)
val (res1, doubler2) = doubler1.memoize(2)
val (res2, doubler3) = doubler2.memoize(3)
val (res3, doubler4) = doubler3.memoize(3)
}
}
This prints:
Evaluating...2
Evaluating...3

Thread-safely transforming a value in a mutable map

Suppose I want to use a mutable map in Scala to keep track of the number of times I've seen some strings. In a single-threaded context, this is easy:
import scala.collection.mutable.{ Map => MMap }
class Counter {
val counts = MMap.empty[String, Int].withDefaultValue(0)
def add(s: String): Unit = counts(s) += 1
}
Unfortunately this isn't thread-safe, since the get and the update don't happen atomically.
Concurrent maps add a few atomic operations to the mutable map API, but not the one I need, which would look something like this:
def replace(k: A, f: B => B): Option[B]
I know I can use ScalaSTM's TMap:
import scala.concurrent.stm._
class Counter {
val counts = TMap.empty[String, Int]
def add(s: String): Unit = atomic { implicit txn =>
counts(s) = counts.get(s).getOrElse(0) + 1
}
}
But (for now) that's still an extra dependency. Other options would include actors (another dependency), synchronization (potentially less efficient), or Java's atomic references (less idiomatic).
In general I'd avoid mutable maps in Scala, but I've occasionally needed this kind of thing, and most recently I've used the STM approach (instead of just crossing my fingers and hoping I don't get bitten by the naïve solution).
I know there are a number of trade-offs here (extra dependencies vs. performance vs. clarity, etc.), but is there anything like a "right" answer to this problem in Scala 2.10?

How about this one? Assuming you don't really need a general replace method right now, just a counter.
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger
object CountedMap {
private val counts = new ConcurrentHashMap[String, AtomicInteger]
def add(key: String): Int = {
val zero = new AtomicInteger(0)
val value = Option(counts.putIfAbsent(key, zero)).getOrElse(zero)
value.incrementAndGet
}
}
You get better performance than synchronizing on the whole map, and you also get atomic increments.

The simplest solution is definitely synchronization. If there is not too much contention, performance might not be that bad.
Otherwise, you could try to roll your own STM-like replace implementation. Something like this might do:
object ConcurrentMapOps {
private val rng = new util.Random
private val MaxReplaceRetryCount = 10
private val MinReplaceBackoffTime: Long = 1
private val MaxReplaceBackoffTime: Long = 20
}
implicit class ConcurrentMapOps[A, B]( val m: collection.concurrent.Map[A,B] ) {
import ConcurrentMapOps._
private def replaceBackoff() {
Thread.sleep( (MinReplaceBackoffTime + rng.nextFloat * (MaxReplaceBackoffTime - MinReplaceBackoffTime) ).toLong ) // A bit crude, I know
}
def replace(k: A, f: B => B): Option[B] = {
var retryCount = 0
while ( retryCount <= MaxReplaceRetryCount ) {
// Re-read the current value on every attempt; retrying with a stale value could never succeed
m.get( k ) match {
case None => return None
case Some( old ) =>
if ( m.replace( k, old, f( old ) ) ) {
return Some( old )
}
retryCount += 1
replaceBackoff()
}
}
sys.error("Could not concurrently modify map")
}
}
Note that collision issues are localized to a given key. If two threads access the same map but work on distinct keys, you'll have no collisions and the replace operation will always succeed the first time. If a collision is detected, we wait a bit (a random amount of time, so as to minimize the likeliness of threads fighting forever for the same key) and try again.
I cannot guarantee that this is production-ready (I just threw it together), but it might do the trick.
UPDATE: Of course (as Ionuț G. Stan pointed out), if all you want is to increment/decrement a value, Java's ConcurrentHashMap already provides those operations in a lock-free manner.
My above solution applies if you need a more general replace method that would take the transformation function as a parameter.
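For the plain counter case, a sketch based on `ConcurrentHashMap.merge` (available since Java 8, so not an option on the older JVMs that Scala 2.10 often targeted) might look like the following. `MergeCounter` is a hypothetical name, and the `BiFunction` is written out as an anonymous class so that it does not depend on lambda-to-SAM conversion.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.function.BiFunction

object MergeCounter {
  private val counts = new ConcurrentHashMap[String, Integer]()

  // Combines the existing count with the increment; merge applies it atomically per key.
  private val sum = new BiFunction[Integer, Integer, Integer] {
    override def apply(a: Integer, b: Integer): Integer =
      Integer.valueOf(a.intValue + b.intValue)
  }

  // Inserts 1 if the key is absent, otherwise adds 1; returns the new count.
  def add(key: String): Int = counts.merge(key, Integer.valueOf(1), sum).intValue

  def get(key: String): Int = Option(counts.get(key)).map(_.intValue).getOrElse(0)
}
```

Because `merge` takes the combining function as a parameter, the same pattern extends to other per-key transformations, as long as the function is cheap and free of side effects.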

You're asking for trouble if your map is just sitting there as a val. If it meets your use case, I'd recommend something like
class Counter {
private[this] val myCounts = MMap.empty[String, Int].withDefaultValue(0)
def counts(s: String) = myCounts.synchronized { myCounts(s) }
def add(s: String) = myCounts.synchronized { myCounts(s) += 1 }
def getCounts = myCounts.synchronized { Map[String,Int]() ++ myCounts }
}
for low-contention usage. For high-contention, you should use a concurrent map designed to support such use (e.g. java.util.concurrent.ConcurrentHashMap) and wrap the values in AtomicWhatever.

If you are ok to work with future based interface:
trait SingleThreadedExecutionContext {
val ec = ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor())
}
class Counter extends SingleThreadedExecutionContext {
private val counts = MMap.empty[String, Int].withDefaultValue(0)
def get(s: String): Future[Int] = future(counts(s))(ec)
def add(s: String): Future[Unit] = future(counts(s) += 1)(ec)
}
Test will look like:
class MutableMapSpec extends Specification {
"thread safe" in {
import ExecutionContext.Implicits.global
val c = new Counter
val testData = Seq.fill(16)("1")
await(Future.traverse(testData)(c.add))
await(c.get("1")) mustEqual 16
}
}

How do I rewrite a for loop with a shared dependency using actors

We have some code which needs to run faster. It's already profiled, so we would like to make use of multiple threads. Usually I would set up an in-memory queue and have a number of threads taking jobs off the queue and calculating the results. For the shared data I would use a ConcurrentHashMap or similar.
I don't really want to go down that route again. From what I have read, using actors will result in cleaner code, and if I use Akka, migrating to more than one JVM should be easier. Is that true?
However, I don't know how to think in actors, so I am not sure where to start.
To give a better idea of the problem here is some sample code:
case class Trade(price:Double, volume:Int, stock:String) {
def value(priceCalculator:PriceCalculator) =
(priceCalculator.priceFor(stock) - price)*volume
}
class PriceCalculator {
def priceFor(stock:String) = {
Thread.sleep(20)//a slow operation which can be cached
50.0
}
}
object ValueTrades {
def valueAll(trades:List[Trade],
priceCalculator:PriceCalculator):List[(Trade,Double)] = {
trades.map { trade => (trade,trade.value(priceCalculator)) }
}
def main(args:Array[String]) {
val trades = List(
Trade(30.5, 10, "Foo"),
Trade(30.5, 20, "Foo")
//usually much longer
)
val priceCalculator = new PriceCalculator
val values = valueAll(trades, priceCalculator)
}
}
I'd appreciate it if someone with experience using actors could suggest how this would map on to actors.

This is a complement to my comment on shared results for expensive calculations. Here it is:
import scala.actors._
import Actor._
import Futures._
case class PriceFor(stock: String) // Ask for result
// The following could be an "object" as well, if it's supposed to be singleton
class PriceCalculator extends Actor {
val map = new scala.collection.mutable.HashMap[String, Future[Double]]()
def act = loop {
react {
case PriceFor(stock) => reply(map getOrElseUpdate (stock, future {
Thread.sleep(2000) // a slow operation
50.0
}))
}
}
}
Here's a usage example:
scala> val pc = new PriceCalculator; pc.start
pc: PriceCalculator = PriceCalculator@141fe06
scala> class Test(stock: String) extends Actor {
| def act = {
| println(System.currentTimeMillis().toString+": Asking for stock "+stock)
| val f = (pc !? PriceFor(stock)).asInstanceOf[Future[Double]]
| println(System.currentTimeMillis().toString+": Got the future back")
| val res = f.apply() // this blocks until the result is ready
| println(System.currentTimeMillis().toString+": Value: "+res)
| }
| }
defined class Test
scala> List("abc", "def", "abc").map(new Test(_)).map(_.start)
1269310737461: Asking for stock abc
res37: List[scala.actors.Actor] = List(Test@6d888e, Test@1203c7f, Test@163d118)
1269310737461: Asking for stock abc
1269310737461: Asking for stock def
1269310737464: Got the future back
scala> 1269310737462: Got the future back
1269310737465: Got the future back
1269310739462: Value: 50.0
1269310739462: Value: 50.0
1269310739465: Value: 50.0
scala> new Test("abc").start // Should return instantly
1269310755364: Asking for stock abc
res38: scala.actors.Actor = Test@15b5b68
1269310755365: Got the future back
scala> 1269310755367: Value: 50.0

For simple parallelization, where I throw a bunch of work out to process and then wait for it all to come back, I tend to like to use a Futures pattern.
class ActorExample {
import actors._
import Actor._
class Worker(val id: Int) extends Actor {
def busywork(i0: Int, i1: Int) = {
var sum,i = i0
while (i < i1) {
i += 1
sum += 42*i
}
sum
}
def act() { loop { react {
case (i0:Int,i1:Int) => sender ! busywork(i0,i1)
case None => exit()
}}}
}
val workforce = (1 to 4).map(i => new Worker(i)).toList
def parallelFourSums = {
workforce.foreach(_.start())
val futures = workforce.map(w => w !! ((w.id,1000000000)) );
val computed = futures.map(f => f() match {
case i:Int => i
case _ => throw new IllegalArgumentException("I wanted an int!")
})
workforce.foreach(_ ! None)
computed
}
def serialFourSums = {
val solo = workforce.head
workforce.map(w => solo.busywork(w.id,1000000000))
}
def timed(f: => List[Int]) = {
val t0 = System.nanoTime
val result = f
val t1 = System.nanoTime
(result, t1-t0)
}
def go {
val serial = timed( serialFourSums )
val parallel = timed( parallelFourSums )
println("Serial result: " + serial._1)
println("Parallel result:" + parallel._1)
printf("Serial took %.3f seconds\n",serial._2*1e-9)
printf("Parallel took %.3f seconds\n",parallel._2*1e-9)
}
}
Basically, the idea is to create a collection of workers--one per workload--and then throw all the data at them with !! which immediately gives back a future. When you try to read the future, the sender blocks until the worker's actually done with the data.
You could rewrite the above so that PriceCalculator extended Actor instead, and valueAll coordinated the return of the data.
Note that you have to be careful passing non-immutable data around.
Anyway, on the machine I'm typing this from, if you run the above you get:
scala> (new ActorExample).go
Serial result: List(-1629056553, -1629056636, -1629056761, -1629056928)
Parallel result:List(-1629056553, -1629056636, -1629056761, -1629056928)
Serial took 1.532 seconds
Parallel took 0.443 seconds
(Obviously I have at least four cores; the parallel timing varies rather a bit depending on which worker gets what processor and what else is going on on the machine.)
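For readers whose Scala version does not ship `scala.actors`, the same idea (compute each slow price once, share the pending result, value the trades in parallel) can be sketched with plain Futures. This is an assumption-laden illustration rather than a translation of the answers above: `CachedPriceCalculator` and its `lookups` counter are hypothetical, and the 20 ms sleep stands in for the slow price lookup from the question.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger
import java.util.function.{Function => JFunction}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

case class Trade(price: Double, volume: Int, stock: String)

class CachedPriceCalculator(implicit ec: ExecutionContext) {
  val lookups = new AtomicInteger(0) // counts how often the slow lookup really runs
  private val cache = new ConcurrentHashMap[String, Future[Double]]()

  // Starts the slow lookup; computeIfAbsent calls this at most once per stock.
  private val startLookup = new JFunction[String, Future[Double]] {
    override def apply(stock: String): Future[Double] = Future {
      lookups.incrementAndGet()
      Thread.sleep(20) // placeholder for the slow operation
      50.0
    }
  }

  def priceFor(stock: String): Future[Double] = cache.computeIfAbsent(stock, startLookup)
}

object ValueTrades {
  // Values all trades concurrently; trades on the same stock share one lookup.
  def valueAll(trades: List[Trade], calc: CachedPriceCalculator)
              (implicit ec: ExecutionContext): Future[List[(Trade, Double)]] =
    Future.traverse(trades) { trade =>
      calc.priceFor(trade.stock).map(p => (trade, (p - trade.price) * trade.volume))
    }
}
```

The caller can block on the combined result with `Await.result(ValueTrades.valueAll(trades, calc), 5.seconds)` or keep composing the returned Future.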
