Memory leak issue in (Scala + Akka HTTP) micro service - scala

I have deployed my microservice, written in Scala with Akka HTTP (https://github.com/theiterators/akka-http-microservice), as a Docker container on an AWS server. Since deploying it I have been facing memory leak and performance problems.
I have noticed that memory usage increases when the server receives more requests (e.g. 340 MB, 410 MB, 422 MB...), depending on load, and then drops back to a normal level (around 230 MB). But as the request volume keeps growing, memory usage keeps climbing and is not released even after CPU usage returns to normal; eventually it reaches the container limit (512 MB) and the service crashes.
This could be avoided if unused memory/resources were released properly. The JVM is supposed to manage memory by itself through garbage collection, but it does not seem to reclaim the unwanted objects once a request has been processed. I am using the code below to clean up the Akka HTTP actor objects:
try {
  <-- code block -->
} catch {
  case e: Exception =>
    sys.addShutdownHook(system.shutdown())
} finally {
  sys.addShutdownHook(system.shutdown())
}
How can I release unused memory/resources immediately after a request has been processed?
Please suggest a way to resolve this problem; it is critical for us.

Firstly, you should remove the code that is supposed to solve your problem. A shutdown hook does not free anything when a request finishes; it only schedules work to run when the JVM itself exits. Worse, if you register one on every request, the accumulated hooks themselves will eat up your memory.
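If you want the actor system to be stopped cleanly when the container goes down, register a single hook once at application startup instead. A minimal sketch (the Main object is hypothetical, and it assumes a recent Akka where ActorSystem#terminate() replaces the older shutdown()):
import akka.actor.ActorSystem
import scala.concurrent.Await
import scala.concurrent.duration._

object Main extends App {
  implicit val system: ActorSystem = ActorSystem("my-service")

  // ... bind the Akka HTTP routes and start serving requests ...

  // One hook for the whole application; it runs only when the JVM exits,
  // not after each request, so nothing accumulates on the heap.
  sys.addShutdownHook {
    Await.result(system.terminate(), 30.seconds)
  }
}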

Related

Background task in reactive pipeline (Fire-and-forget)

I have a reactive pipeline to process incoming requests. For each request I need to call a business-relevant function (doSomeRelevantProcessing).
After that is done, I need to notify some external service about what happened. That part of the pipeline should not increase the overall response time.
Also, notifying this external system is not business critical: giving a quick response after the main part of the pipeline is finished is more important than making sure the notification is successful.
As far as I have learned, the only way to run something in the background without slowing down the overall process is to subscribe to it directly in the pipeline, thus achieving fire-and-forget behaviour.
Is there a good alternative to subscribing inside the flatMap?
I am a little worried about what might happen if notifying the external service takes longer than the original processing and a lot of requests come in at once. Could this lead to memory exhaustion or cause the overall process to block?
fun runPipeline(incoming: Mono<Request>) = incoming
    .flatMap { doSomeRelevantProcessing(it) } // this should not be delayed
    .flatMap { doBackgroundJob(it) }          // this can take a moment, but is not super critical

fun doSomeRelevantProcessing(request: Request) = Mono.just(request) // do some processing

fun doBackgroundJob(request: Request) = Mono.deferContextual { ctx: ContextView ->
    val notification = "notification" // build an object from context
    // this uses non-blocking HTTP (i.e. webclient), so it can take a second or so
    notifyExternalService(notification).subscribeOn(Schedulers.boundedElastic()).subscribe()
    Mono.just(Unit)
}

fun notifyExternalService(notification: String) = Mono.just(Unit) // might take a while
I'm answering this assuming that you notify the external service using purely reactive mechanisms - i.e. you're not wrapping a blocking service. If you are then the answer would be different as you're bound by the size of your bounded elastic thread pool, which could quickly become overwhelmed if you have hundreds of requests a second incoming.
(Assuming you are using purely reactive mechanisms, there's no need for the .subscribeOn(Schedulers.boundedElastic()) in your example; it's not buying you anything, since that scheduler is designed for wrapping legacy blocking services.)
Could this lead to memory exhaustion?
It's only a possibility in really extreme cases; the memory used by each individual request will be tiny. It's almost certainly not worth worrying about: if you start seeing memory issues here, you'll almost certainly be hit by other issues elsewhere.
That being said, I'd probably recommend adding .timeout(Duration.ofSeconds(5)) or similar before your inner subscribe call, to make sure the requests are killed off after a while if they haven't completed for whatever reason - this will prevent them from building up.
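For illustration, here is a minimal sketch of that time-boxed fire-and-forget call, written in Scala against reactor-core's Java API (notifyExternalService and fireAndForget are illustrative stand-ins, not your real code):
import java.time.Duration
import reactor.core.publisher.Mono

// Stand-in for the purely reactive, non-blocking notification call.
def notifyExternalService(notification: String): Mono[String] =
  Mono.just("notified")

def fireAndForget(notification: String): Unit = {
  notifyExternalService(notification)
    .timeout(Duration.ofSeconds(5)) // give up after 5 seconds so stalled calls cannot pile up
    .subscribe()                    // fire-and-forget; no subscribeOn needed for a reactive call
  ()
}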
...or [can this cause] the overall process to block?
This one is easier - a short no, it can't.

Linux, warning : __get_request: dev 8:0: request aux data allocation failed, iosched may be disturbed

I am playing with test code to submit a BIO from my own kernel module:
If I use submit_bio(&bio), everything works fine.
If I use bdev->bd_queue->make_request_fn(bdev->bd_queue, &bio), then
I get this in dmesg:
__get_request: dev 8:0: request aux data allocation failed, iosched may be disturbed
My primary target is submitting BIOs to a stackable device driver without calling the submit_bio() routine. Any ideas or pointers?
Our hero Tom Caputi of ZFS encryption fame figured this out.
Basically the scheduler expects an io context in the task struct for the thread that's running your request.
You'll see here that the io context is created in generic_make_request_checks():
https://elixir.bootlin.com/linux/latest/source/block/blk-core.c#L2323
If it is never created for the task struct that's running your request, you'll see that message "io sched may be disturbed." A lousy message if ever there was one. "Scheduler context was not allocated for this task" would have made the problem a bit more obvious.
I'm not the kernel guy that Tom is, but basically, by doing this:
bdev->bd_queue->make_request_fn
your request is being handled by a thread that doesn't have that io context allocated.
Now create_io_context() is not exported, so you can't call it directly.
But if you call this, which is exported:
https://elixir.bootlin.com/linux/latest/source/block/blk-ioc.c#L319
the io context will be allocated and the warning message goes away.
And I imagine there will be some I/O improvement, because the scheduler now has a context to work with.

potential memory leak using TrieMap in Scala and Tomcat

I am using a scala.collection.concurrent.TrieMap wrapped in an object to store configuration values that are fetched remotely.
object persistentMemoryMap {
  val storage: TrieMap[String, CacheEntry] = TrieMap[String, CacheEntry]()
}
It works just fine but I have noticed that when Tomcat is shut down it logs some alarming messages about potential memory leaks
2013-jun-27 08:58:22 org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
ALLVARLIG: The web application [] created a ThreadLocal with key of type [scala.concurrent.forkjoin.ThreadLocalRandom$1] (value [scala.concurrent.forkjoin.ThreadLocalRandom$1#5d529976]) and a value of type [scala.concurrent.forkjoin.ThreadLocalRandom] (value [scala.concurrent.forkjoin.ThreadLocalRandom#59d941d7]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak
I am guessing this thread will terminate on its own eventually, but I am wondering if there is some way to kill it, or should I just leave it alone?
The scala.concurrent.forkjoin.ThreadLocalRandom's value is created only once per thread. It does not hold any references to objects other than the random value generator used by that thread -- the memory it consumes has a fixed size. Once the thread is garbage collected, its thread local random value will be collected as well -- you should just let the GC do its work.
You could still remove it manually by using Java reflection to remove the private modifier on the static field localRandom in the ThreadLocalRandom class:
https://github.com/scala/scala/blob/master/src/forkjoin/scala/concurrent/forkjoin/ThreadLocalRandom.java#L62
You could then call localRandom.set(null) to null out the reference to the random number generator. You should also then ensure that TrieMap is no longer used from that thread, otherwise ThreadLocalRandom will break by assuming that the random number generator is different than null.
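For completeness, a hedged sketch of that reflective cleanup in Scala (the field name localRandom is taken from the linked source, and the code has to run on the thread whose entry you want to clear):
import scala.concurrent.forkjoin.ThreadLocalRandom

def clearThreadLocalRandom(): Unit = {
  // Look up the private static ThreadLocal field and make it accessible.
  val field = classOf[ThreadLocalRandom].getDeclaredField("localRandom")
  field.setAccessible(true)
  val localRandom = field.get(null).asInstanceOf[ThreadLocal[ThreadLocalRandom]]
  // Null out this thread's cached generator; the TrieMap must not be used
  // from this thread afterwards.
  localRandom.set(null)
}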
Seems hacky to me, and I think you should just stick to letting the GC collect the thread local value.

In what condition can memcached run into allocation failure

Each slab class has at least one page, and when there's no memory available to allocate, memcached evicts items from the "tails" list. So why can it run into the MEMCACHED_MEMORY_ALLOCATION_FAILURE state?
I think this is a libmemcached state, not a memcached state. In any case, it occurs when the application requests memory from the underlying allocator (malloc) and malloc returns no heap memory. Since the request for memory cannot be completed, you get an error like this because the application cannot proceed with your request.
This error occurs when the client calls libmemcached. libmemcached returns MEMCACHED_MEMORY_ALLOCATION_FAILURE in several cases:
1. a realloc, malloc or calloc call fails;
2. in the source code segment below:
...
new_size = sizeof(char) * (size_t)((adjust * MEMCACHED_BLOCK_SIZE) + string->current_size);
if (new_size < need)
    return MEMCACHED_MEMORY_ALLOCATION_FAILURE;
...
In practice, though, this second case should not happen.

Memcached memory leak

I'm building an application where I have to use memcached.
I found quite a nice client:
net.spy.memcached.MemcachedClient
With this client everything works great except for one thing: I have a problem closing the connection, and after a while I start fighting a memory leak.
I was looking for a way to close the connection, and I found the shutdown method. But if I use this method like this:
MemcachedClient c = new MemcachedClient(new InetSocketAddress(memcachedIp, memcachedPort));
c.set(something, sessionLifeTime, memcache.toJSONString());
c.shutdown();
then I have a problem adding anything to memcached: in the logs I see that this code opens the connection and then closes it before anything is actually added to memcached.
Do you have any idea what to do?
Additionally, I found the method c.shutdown(2, TimeUnit.SECONDS), which should close the connection after 2 seconds. But I have a JMX monitor connected to my Tomcat, and I can see that the memcached thread doesn't finish after 2 seconds; it never finishes at all...
The reason you are having an issue adding things to memcached like this is that the set(...) function is asynchronous and all it does is put that operation into a queue to be sent to memcached. Since you call shutdown right after this, the operation doesn't actually have time to make it out onto the wire. You need to call set(...).get() in order to make your application thread actually wait for the operation to complete before calling shutdown.
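A minimal sketch of that ordering, written in Scala against the same spymemcached API (the parameter names simply mirror the ones in the question):
import java.net.InetSocketAddress
import java.util.concurrent.TimeUnit
import net.spy.memcached.MemcachedClient

def storeAndClose(memcachedIp: String, memcachedPort: Int,
                  key: String, sessionLifeTime: Int, value: String): Unit = {
  val c = new MemcachedClient(new InetSocketAddress(memcachedIp, memcachedPort))
  c.set(key, sessionLifeTime, value).get() // wait for the operation to complete
  c.shutdown(2, TimeUnit.SECONDS)          // the pending queue is now empty, so shutdown is safe
}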
Also, I haven't experienced I/O threads not dying after calling shutdown with a timeout. One way you can confirm that this is an actual bug is to run a standalone program with spymemcached. If the process doesn't terminate when it completes, then you've found an issue.