How to handle removed data from state - scala

I have a sessionization use case. I keep my sessions in memory with mapWithState() and update them for each incoming log. When a session ends, signaled by a specific log, I want to retrieve it and remove it from my state.
The problem is that I cannot both retrieve and remove (remove()) a session at the end of a batch: retrieval happens outside updateFunction() and removal inside it, so once a session is removed it can no longer be retrieved. And once a session has ended, there should be no more logs for its key.
I can still retrieve my ended sessions if I never remove them, but then the number of "dead" sessions keeps growing, and the state will eventually overflow and threaten the system itself. That solution is not acceptable.
As it seems like a common use-case, I was wondering if anyone had come up with a solution?
EDIT
Sample code below:
def mapWithStateContainer(iResultParsing: DStream[(String, SessionEvent)]) = {
val lStateSpec = StateSpec.function(stateUpdateFunction _).timeout(Seconds(TIMEOUT))
val lResultMapWithState: DStream[(String, Session)] =
iResultParsing.mapWithState(lStateSpec).stateSnapshots()
val lClosedSession: DStream[(String, Session)] =
lResultMapWithState.filter(_._2.mTimeout)
//ideally remove here lClosedSession from the state
}
private def stateUpdateFunction(iKey: String,
iValue: Option[SessionEvent],
iState: State[Session]): Option[(String, Session)] = {
var lResult = None: Option[(String, Session)]
if (iState.isTimingOut()) {
val lClosedSession = iState.get()
lClosedSession.mTimeout = true
lResult = Some(iKey, lClosedSession)
} else if (iState.exists) {
val lUpdatedSession = updateSession(iState.get(), iValue)
iState.update(lUpdatedSession)
lResult = Some(iKey, lUpdatedSession)
// we wish to remove the lUpdatedSession from the state once retrieved with lResult
/*if (lUpdatedSession.mTimeout) {
iState.remove()
lResult = None
}*/
} else {
val lInitialState = initSession(iValue)
iState.update(lInitialState)
lResult = Some(iKey, lInitialState)
}
lResult
}
private def updateSession(iCurrentSession: Session,
iNewData: Option[SessionEvent]): Session = {
//user disconnects manually
if (iNewData.get.mDisconnection) {
iCurrentSession.mTimeout = true
}
iCurrentSession
}

Instead of calling MapWithStateDStream.stateSnapshots(), you can return the updated state as the return value of your mapWithState operation. This way, the finalized state is always available outside your stateful DStream.
This means that you can do:
else if (iState.exists) {
val lUpdatedSession = updateSession(iState.get(), iValue)
iState.update(lUpdatedSession)
if (lUpdatedSession.mTimeout) {
iState.remove()
}
Some(iKey, lUpdatedSession)
}
And now change your graph to:
val lResultMapWithState = iResultParsing
.mapWithState(lStateSpec)
.filter { case (_, session) => session.mTimeout }
What happens now is that the state is removed internally, but because you're returning the session from your StateSpec function, it's still available to you outside for further processing.
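To make the "remove internally, but still return it" idea concrete, here is a minimal sketch that does not use Spark at all. A plain mutable map stands in for the per-key state store, and Session, SessionEvent and update are hypothetical simplified stand-ins for the types and update function in the question, so treat it as an illustration of the pattern rather than of the Spark API.

```scala
import scala.collection.mutable

object SessionStateSketch {
  // Hypothetical, simplified stand-ins for the question's SessionEvent and Session.
  final case class SessionEvent(disconnect: Boolean)
  final case class Session(events: Int, closed: Boolean)

  // Assumption: a plain map plays the role of Spark's per-key state store.
  val store: mutable.Map[String, Session] = mutable.Map.empty

  // Mirrors the update function: compute the new session, drop it from the
  // store if it just closed, and return it to the caller in either case.
  def update(key: String, event: SessionEvent): (String, Session) = {
    val previous = store.getOrElse(key, Session(0, closed = false))
    val updated = Session(previous.events + 1, closed = event.disconnect)
    if (updated.closed) store.remove(key) else store.update(key, updated)
    (key, updated)
  }

  def main(args: Array[String]): Unit = {
    val emitted = Seq(
      update("a", SessionEvent(disconnect = false)),
      update("b", SessionEvent(disconnect = false)),
      update("a", SessionEvent(disconnect = true))
    )
    // Same role as the filter on mTimeout downstream of mapWithState.
    val closedSessions = emitted.filter { case (_, session) => session.closed }
    println(s"closed: $closedSessions, still in state: ${store.keySet}")
  }
}
```

The closed session for key "a" shows up in the emitted values even though it is no longer in the store, which is the behaviour the answer relies on.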

Related

Why different behavior of Single when using just or fromCallable in startWith?

I'm starting a PublishSubject with testPublishSubject.startWith(createSingle().toObservable()).
If I subscribe to this observable, dispose and subscribe again, it will emit a different item depending on how I created the Single. If I create it with just, it emits the same item as the first time (item1), if I create it with fromCallable, it emits the updated item (item2). Why is the behavior different? Is there a way to use just and have it behave like fromCallable?
Edit: Ok, I think I know why it behaves differently. It's because it's not re-creating the Single. fromCallable works only because of the closure, which is executed again with the updated counter.
My updated question would be: Is there a way to have the subject re-create the Single? The reason I want this, is because the Single is fetching a value, which may have been updated, and I need to fetch it again.
var counter = 1
// With this, it works as expected
// fun createSingle(): Single<String> = Single.fromCallable {
// "item-${counter++}"
// }
// With this, the second subscription still shows "item-1"
fun createSingle(): Single<String> = Single.just("item-${counter++}")
val testPublishSubject = PublishSubject.create<String>()
val observable = testPublishSubject.startWith(createSingle().toObservable().doOnNext {
log(">>> single on next: $it")
}).doOnNext {
log(">>> publish subject on next: $it")
}
log(">>> subscribing1")
val disposable1 = observable.subscribe {
log(">>> value subscription 1: $it")
}
log(">>> pushing random item")
testPublishSubject.onNext("random item")
log(">>> disposing subscription1")
disposable1.dispose()
log(">>> subscribing2")
val disposable2 = observable.subscribe {
log(">>> value subscription 2: $it")
}
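The difference described in the edit above (the value passed to just is computed once, while the fromCallable body runs again per subscription) can be shown with a toy model in plain Scala. Source, just and fromCallable here are hypothetical miniatures written for this sketch, not the RxJava classes.

```scala
object EagerVsLazySketch {
  // Toy single-value source: subscribing simply yields the item.
  trait Source[A] { def subscribe(): A }

  // Like Single.just: the argument is evaluated once, when the source is built.
  def just[A](value: A): Source[A] =
    new Source[A] { def subscribe(): A = value }

  // Like Single.fromCallable: the body is re-evaluated on every subscription.
  def fromCallable[A](body: () => A): Source[A] =
    new Source[A] { def subscribe(): A = body() }

  def demo(): (String, String, String, String) = {
    var counter = 1
    def next(): String = { val item = s"item-$counter"; counter += 1; item }

    val eager = just(next())                    // next() runs here, exactly once
    val deferred = fromCallable(() => next())   // next() runs on each subscribe()

    (eager.subscribe(), eager.subscribe(), deferred.subscribe(), deferred.subscribe())
  }
}
```

In RxJava itself, Single.defer is the operator that builds a fresh Single for each subscriber, so wrapping the creation in it may give the re-fetch behaviour asked about here.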

Long living service with coroutines

I want to create a long living service that can handle events.
It receives events via postEvent, stores them in a repository (backed by a database), and sends a batch of them to the API when there are enough events.
Also I'd like to shut it down on demand.
Furthermore I would like to test this service.
This is what I have come up with so far. Currently I'm struggling with unit testing it.
Either the database is shut down prematurely after events are sent to the service via fixture.postEvent(), or the test itself ends up in some sort of deadlock (I was experimenting with various context + job configurations).
What am I doing wrong here?
class EventSenderService(
private val repository: EventRepository,
private val api: Api,
private val serializer: GsonSerializer,
private val requestBodyBuilder: EventRequestBodyBuilder,
) : EventSender, CoroutineScope {
private val eventBatchSize = 25
val job = Job()
private val channel = Channel<Unit>()
init {
job.start()
launch {
for (event in channel) {
val trackingEventCount = repository.getTrackingEventCount()
if (trackingEventCount < eventBatchSize) continue
readSendDelete()
}
}
}
override val coroutineContext: CoroutineContext
get() = Dispatchers.Default + job
override fun postEvent(event: Event) {
launch(Dispatchers.IO) {
writeEventToDatabase(event)
}
}
override fun close() {
channel.close()
job.cancel()
}
private fun readSendDelete() {
try {
val events = repository.getTrackingEvents(eventBatchSize)
val request = requestBodyBuilder.buildFor(events).blockingGet()
api.postEvents(request).blockingGet()
repository.deleteTrackingEvents(events)
} catch (throwable: Throwable) {
Log.e(throwable)
}
}
private suspend fun writeEventToDatabase(event: Event) {
try {
val trackingEvent = TrackingEvent(eventData = serializer.toJson(event))
repository.insert(trackingEvent)
channel.send(Unit)
} catch (throwable: Throwable) {
throwable.printStackTrace()
Log.e(throwable)
}
}
}
Test
@RunWith(RobolectricTestRunner::class)
class EventSenderServiceTest : CoroutineScope {
@Rule
@JvmField
val instantExecutorRule = InstantTaskExecutorRule()
private val api: Api = mock {
on { postEvents(any()) } doReturn Single.just(BaseResponse())
}
private val serializer: GsonSerializer = mock {
on { toJson<Any>(any()) } doReturn "event_data"
}
private val bodyBuilder: EventRequestBodyBuilder = mock {
on { buildFor(any()) } doReturn Single.just(TypedJsonString.buildRequestBody("[ { event } ]"))
}
val event = Event(EventName.OPEN_APP)
private val database by lazy {
Room.inMemoryDatabaseBuilder(
RuntimeEnvironment.systemContext,
Database::class.java
).allowMainThreadQueries().build()
}
private val repository by lazy { database.getRepo() }
val fixture by lazy {
EventSenderService(
repository = repository,
api = api,
serializer = serializer,
requestBodyBuilder = bodyBuilder,
)
}
override val coroutineContext: CoroutineContext
get() = Dispatchers.Default + fixture.job
@Test
fun eventBundling_success() = runBlocking {
(1..40).map { Event(EventName.OPEN_APP) }.forEach { fixture.postEvent(it) }
fixture.job.children.forEach { it.join() }
verify(api).postEvents(any())
assertEquals(15, eventDao.getTrackingEventCount())
}
}
After updating the code as @Marko Topolnik suggested (adding fixture.job.children.forEach { it.join() }), the test never finishes.
One thing you're doing wrong is related to this:
override fun postEvent(event: Event) {
launch(Dispatchers.IO) {
writeEventToDatabase(event)
}
}
postEvent launches a fire-and-forget async job that will eventually write the event to the database. Your test creates 40 such jobs in rapid succession and, while they're queued, asserts the expected state. I can't work out, though, why you assert 15 events after posting 40.
To fix this issue you should use the line you already have:
fixture.job.join()
but change it to
fixture.job.children.forEach { it.join() }
and place it lower, after the loop that creates the events.
I failed to take into account the long-running consumer job you launch in the init block. This invalidates the advice I gave above to join all children of the master job.
Instead you'll have to make a few more changes. Make postEvent return the job it launches, collect all these jobs in the test, and join them. This is more selective and avoids joining the long-living job.
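The shape of that change, sketched in Scala with Futures standing in for coroutine Jobs (an assumption made for illustration; postEvent, written and postAllAndWait are hypothetical names, and the "write" is just a counter increment):

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object JoinPostedJobsSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Stand-in for the database: counts how many events have been written.
  val written = new AtomicInteger(0)

  // Hypothetical postEvent that hands back a handle to the write it started.
  def postEvent(event: String): Future[Unit] = Future {
    written.incrementAndGet()
    ()
  }

  // The test collects the handles and waits only for those writes,
  // not for any long-running consumer that shares the same scope.
  def postAllAndWait(events: Seq[String]): Int = {
    val handles = events.map(postEvent)
    Await.result(Future.sequence(handles), 5.seconds)
    written.get()
  }
}
```

Only after the wait returns is it safe to assert on the database contents.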
As a separate issue, your batching approach isn't ideal because it will always wait for a full batch before doing anything. Whenever there's a lull period with no events, the events will be sitting in the incomplete batch indefinitely.
The best approach is natural batching, where you keep eagerly draining the input queue. When there's a big flood of incoming events, the batch will naturally grow, and when they are trickling in, they'll still be served right away. You can see the basic idea here.
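A minimal sketch of natural batching, written in Scala against java.util.concurrent.LinkedBlockingQueue rather than a coroutine channel (that substitution, and the names nextBatch and maxBatch, are assumptions for illustration): block for the first event, then take whatever else is already queued.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable.ListBuffer

object NaturalBatchingSketch {
  // Blocks until at least one event is available, then drains everything that
  // is already queued, up to maxBatch items in total. Never waits for a full batch.
  def nextBatch[A](queue: LinkedBlockingQueue[A], maxBatch: Int): List[A] = {
    val first = queue.take()
    val rest = new java.util.ArrayList[A]()
    queue.drainTo(rest, maxBatch - 1)

    val batch = ListBuffer(first)
    val it = rest.iterator()
    while (it.hasNext) batch += it.next()
    batch.toList
  }
}
```

Under a flood of events the batches grow up to the cap; when events trickle in, each one is picked up as a batch of one instead of waiting for more to arrive.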

API Observable with dynamic caching

An API I'm polling has a field, cachedUntil, that defines how long the value is cached. The goal is to create an Observable that polls and emits an event every time the cache has expired. What distinguishes this case is that the caching interval is not regular, so Observable.interval does not apply.
In what ways is it possible to implement an Observable that has this behaviour?
The following snippet gives a function that polls the API, emits the requested events, and returns the delay until cachedUntil, i.e. the delay before the next call.
def getContracts(subscriber: Subscriber[Set[EveContract]]): Option[Long] = {
logger.debug("Fetching new contracts")
try {
val response = parser.getResponse(auth)
if(response == null) {
subscriber.onError(new RuntimeException("Unable to fetch contracts from EVE servers"))
None
}
else if(response.hasError) {
logger.error(response.getError.toString)
subscriber.onError(new RuntimeException(response.getError.toString))
None
} else {
subscriber.onNext(response.getAll.toSet) // Emit new polled data
Some(response.getCachedUntil.getTime - new Date().getTime) // Return the cache delay
}
} catch {
case aex: ApiException ⇒
logger.error("An error occurred when querying the EVE API.")
logger.debug("ApiException: ", aex)
subscriber.onError(aex)
None
}
}
It is possible to use Scheduler workers to reschedule a call to getContracts:
Observable[Set[EveContract]](observer ⇒ {
val worker = Schedulers.newThread().createWorker()
def scheduleContracts(delay: Long) {
worker.schedule(new Action0 {
override def call(){
if(!observer.isUnsubscribed) {
val delay = getContracts(observer)
delay match {
// Reschedule a contract fetch after time d has passed.
case Some(d) ⇒
logger.debug(s"Rescheduling contract fetch in: ${d / 1000} s")
scheduleContracts(d)
case _ ⇒
// Otherwise do nothing
logger.debug("Not rescheduling contract fetch, an error has occurred.")
}
} else {
logger.trace("Subscriber has unsubscribed.")
}
}
}, delay, TimeUnit.MILLISECONDS)
}
scheduleContracts(0L)
})
However, I'm very interested in possible other solutions.
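One possible alternative, sketched without Rx: let a java.util.concurrent.ScheduledExecutorService reschedule the poll with whatever delay the previous poll returned. The poll parameter is a hypothetical stand-in for getContracts (one fetch, returning the delay in milliseconds until the next one, or None to stop). A negative delay, which can happen if cachedUntil is already in the past, is clamped to zero.

```scala
import java.util.concurrent.{ScheduledExecutorService, TimeUnit}

object DynamicPollerSketch {
  // Runs `poll` immediately, then again after each delay it returns.
  // Stops rescheduling as soon as `poll` returns None.
  def start(scheduler: ScheduledExecutorService)(poll: () => Option[Long]): Unit = {
    def schedule(delayMs: Long): Unit = {
      scheduler.schedule(new Runnable {
        def run(): Unit = poll() match {
          case Some(nextDelayMs) => schedule(math.max(0L, nextDelayMs))
          case None              => () // error or completion: do not reschedule
        }
      }, delayMs, TimeUnit.MILLISECONDS)
      ()
    }
    schedule(0L)
  }
}
```

This keeps the same self-rescheduling shape as the worker-based version above; shutting down the scheduler plays the role of unsubscribing.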

Discard all messages except the last one in a Scala actor

I have a SwingWorker actor which computes a plot for display from a parameters object it is sent, then draws the plot on the EDT. Some GUI elements can tweak parameters for this plot. When they change, I generate a new parameters object and send it to the worker.
This works so far.
Now when moving a slider, many events are created and queue up in the worker's mailbox. But I only need to compute the plot for the very last set of parameters. Is there a way to drop all messages from the inbox except the last one and process only that?
Currently the code looks like this
val worker = new SwingWorker {
def act() {
while (true) {
receive {
case params: ExperimentParameters => {
//somehow expensive
val result = RunExperiments.generateExperimentData(params)
Swing.onEDT{ GuiElement.redrawWith(result) }
}
}
}
}
}
Meanwhile I have found a solution. You can check the mailbox size of the actor and simply skip the message if it is not 0.
val worker = new SwingWorker {
def act() {
while (true) {
receive {
case params: ExperimentParameters => {
if( mailboxSize == 0) {
//somehow expensive
val result = RunExperiments.generateExperimentData(params)
Swing.onEDT{ GuiElement.redrawWith(result) }
}
}
}
}
}
}
Remember the last event without processing it, use a very short timeout, and process the last event when the timeout fires.
It could look like this (not tested):
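while (true) {
  var lastReceived: Option[ExperimentParameters] = None
  // Block until at least one parameter set arrives.
  receive { case params: ExperimentParameters => lastReceived = Some(params) }
  while (lastReceived.isDefined) {
    // Zero timeout: take the next queued message if there is one, else get TIMEOUT at once.
    receiveWithin(0) {
      case params: ExperimentParameters => lastReceived = Some(params)
      case TIMEOUT =>
        // The mailbox is empty, so lastReceived holds the newest parameters.
        val result = RunExperiments.generateExperimentData(lastReceived.get)
        Swing.onEDT { GuiElement.redrawWith(result) }
        lastReceived = None // leave the inner loop and wait for new input
    }
  }
}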
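The same "keep only the newest" idea, as a small self-contained sketch that does not depend on scala.actors. A LinkedBlockingQueue stands in for the actor mailbox here (an assumption for illustration), and takeLatest is a hypothetical helper name.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.annotation.tailrec

object LatestOnlySketch {
  // Blocks for one message, then discards everything except the newest
  // message that is already queued.
  def takeLatest[A <: AnyRef](mailbox: LinkedBlockingQueue[A]): A = {
    @tailrec
    def drain(latest: A): A =
      Option(mailbox.poll()) match { // poll() returns null when the queue is empty
        case Some(newer) => drain(newer)
        case None        => latest
      }
    drain(mailbox.take())
  }
}
```

A worker loop would call takeLatest, run the expensive computation on the result, and repeat, so slider events that pile up during a computation collapse into a single redraw.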

how to cancel ConsoleReader.readLine()

First of all, I'm learning Scala and am new to the Java world.
I want to create a console and run it as a service that you can start and stop.
I was able to run a ConsoleReader inside an Actor, but I don't know how to stop the ConsoleReader properly.
Here is the code :
import eu.badmood.util.trace
import scala.actors.Actor._
import tools.jline.console.ConsoleReader
object Main {
def main(args:Array[String]){
//start the console
Console.start(message => {
//handle console inputs
message match {
case "exit" => Console.stop()
case _ => trace(message)
}
})
//try to stop the console after a time delay
Thread.sleep(2000)
Console.stop()
}
}
object Console {
private val consoleReader = new ConsoleReader()
private var running = false
def start(handler:(String)=>Unit){
running = true
actor{
while (running){
handler(consoleReader.readLine("\33[32m> \33[0m"))
}
}
}
def stop(){
//how to cancel an active call to ConsoleReader.readLine ?
running = false
}
}
I'm also looking for any advice concerning this code!
The underlying call that reads a character from the input is blocking. On non-Windows platforms it uses System.in.read(), and on Windows it uses org.fusesource.jansi.internal.WindowsSupport.readByte.
So your challenge is to make that blocking call return when you want to stop your console service. See http://www.javaspecialists.eu/archive/Issue153.html and "Is it possible to read from a InputStream with a timeout?" for some ideas. Once you figure that out, have read return -1 when your console service stops, so that ConsoleReader thinks it's done. You'll need ConsoleReader to use your version of that call:
If you are on Windows, you'll probably need to override tools.jline.AnsiWindowsTerminal and use the ConsoleReader constructor that takes a Terminal (otherwise AnsiWindowsTerminal will just use WindowsSupport.readByte directly).
On Unix, there is a ConsoleReader constructor that takes an InputStream, so you could provide your own wrapper around System.in.
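As a rough sketch of the "have read return -1 when the service stops" idea: the wrapper below polls available() instead of blocking in read(), and reports end-of-stream once stop() has been called. StoppableInputStream, stop and pollMs are hypothetical names, and the sketch assumes the wrapped stream reports pending input through available(), which may not hold for every platform or stream.

```scala
import java.io.{FilterInputStream, InputStream}

object StoppableInputSketch {
  class StoppableInputStream(underlying: InputStream, pollMs: Long = 50L)
      extends FilterInputStream(underlying) {

    @volatile private var stopped = false

    // Ask any pending or future read to report end-of-stream.
    def stop(): Unit = stopped = true

    // Polls instead of blocking, so a stop request is noticed within pollMs.
    override def read(): Int = {
      while (!stopped && underlying.available() == 0) Thread.sleep(pollMs)
      if (stopped) -1 else underlying.read()
    }

    // Route bulk reads through the single-byte read so they are stoppable too.
    override def read(buffer: Array[Byte], offset: Int, length: Int): Int =
      if (length == 0) 0
      else {
        val c = read()
        if (c == -1) -1 else { buffer(offset) = c.toByte; 1 }
      }
  }
}
```

The console service would keep a reference to the wrapper, pass it to the reader in place of System.in, and call stop() from its own stop method.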
A few more thoughts:
There is a scala.Console object already, so for less confusion name yours differently.
System.in is a unique resource, so you probably need to ensure that only one caller uses Console.readLine at a time. Right now start will directly call readLine and multiple callers can call start. Probably the console service can readLine and maintain a list of handlers.
Assuming that ConsoleReader.readLine responds to thread interruption, you could rewrite Console to use a Thread which you could then interrupt to stop it.
object Console {
private val consoleReader = new ConsoleReader()
private var thread : Thread = _
def start(handler:(String)=>Unit) : Thread = {
thread = new Thread(new Runnable {
override def run() {
try {
while (true) {
handler(consoleReader.readLine("\33[32m> \33[0m"))
}
} catch {
case ie: InterruptedException =>
}
}
})
thread.start()
thread
}
def stop() {
thread.interrupt()
}
}
You can replace the ConsoleReader's InputStream. IMHO this works reasonably well because STDIN is a "slow" stream. Please adapt the example to your needs. This is only a sketch, but it works:
def createReader() =
terminal.synchronized {
val reader = new ConsoleReader
terminal.enableEcho()
reader.setBellEnabled(false)
reader.setInput(new InputStreamWrapper(reader.getInput())) // turn on InterruptedException for InputStream.read
reader
}
with InputStream wrapper:
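import java.io.{FilterInputStream, InputStream}
import scala.annotation.tailrec
class InputStreamWrapper(is: InputStream, val timeout: Long = 50) extends FilterInputStream(is) {
// Poll instead of blocking, so that Thread.sleep can throw InterruptedException.
@tailrec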
final override def read(): Int = {
if (is.available() != 0)
is.read()
else {
Thread.sleep(timeout)
read()
}
}
}
P.S. I tried to use NIO, but ran into a lot of trouble with System.in (especially cross-platform), so I returned to this variant. CPU load is near 0%, which is suitable for an interactive application like this.