Scala Mongo driver: getting results using Future

I intend to fetch all records that match given criteria from Mongo using the Scala Mongo Driver.
Using Observables, you can access the stream by creating a subscription:
val MaxBuffer: Long = 100
var docs: Queue[Document] = Queue.empty
var sub: Option[Subscription] = None
val q: Observable[Document] = ??? // the query, e.g. collection.find(...)

def fetchMoreRecords: Unit = sub.get.request(MaxBuffer)

// fail(out, ...) and complete(out) come from the surrounding stage, not shown here
q.subscribe(new Observer[Document] {
  override def onSubscribe(subscription: Subscription): Unit = {
    sub = Some(subscription)
    fetchMoreRecords
  }
  override def onError(e: Throwable): Unit = fail(out, e)
  override def onComplete(): Unit = {
    println("Stream is complete")
    complete(out)
  }
  override def onNext(result: Document): Unit = {
    if (docs.size == MaxBuffer) {
      fail(out, new RuntimeException("Buffer overflow"))
    } else {
      docs = docs :+ result
    }
  }
})
(this code is incomplete)
I would need a function like:
def isReady: Future[Boolean] = ???
Which completes as soon as onNext has been called at least once.
The bad way to do this would be:
def isReady: Future[Boolean] = {
  Future {
    def wait: Boolean = {
      if (docs.nonEmpty) true
      else wait // busy-spins, burning a thread until a document arrives
    }
    wait
  }
}
What would be the best way to achieve this?

You want to use Promise:
val promise = Promise[Boolean]()
...
override def onNext(result: Document): Unit = {
  ...
  promise.tryComplete(Success(true))
}
override def onError(e: Throwable): Unit =
  promise.tryComplete(Failure(e))

val future = promise.future
You should do something to handle the case when there are no results (as it is now, the future will never complete if the query returns nothing; completing the promise from onComplete as well covers that case).
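Putting it together, a minimal sketch reusing q, docs, sub and fetchMoreRecords from the question: isReady completes with true on the first document, with false if the stream finishes without emitting anything, and fails if the stream errors.
import scala.concurrent.{Future, Promise}

val ready = Promise[Boolean]()
def isReady: Future[Boolean] = ready.future

q.subscribe(new Observer[Document] {
  override def onSubscribe(subscription: Subscription): Unit = {
    sub = Some(subscription)
    fetchMoreRecords
  }
  override def onNext(result: Document): Unit = {
    docs = docs :+ result
    ready.trySuccess(true) // completes on the first document; no-op afterwards
  }
  override def onError(e: Throwable): Unit = ready.tryFailure(e)
  override def onComplete(): Unit = ready.trySuccess(false) // stream ended with no results
})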

Related

Scala Future flatMap implementation (chaining)

I'm trying to wrap my head around how async tasks are chained together, for example Futures and flatMap:
val a: Future[Int] = Future { 123 }
val b: Future[Int] = a.flatMap(a => Future { a + 321 })
How would I implement something similar, where I build a new Future by waiting for a first one and applying an f on the result? I tried looking at Scala's source code but got stuck here: https://github.com/scala/scala/blob/2.13.x/src/library/scala/concurrent/Future.scala#L216
I would imagine some code like:
def myFlatMap(a: Future[Int])(f: Int => Future[Int]) = {
  Future {
    // somehow wait for a to complete and then apply 'f'
    a.onComplete {
      case Success(x) => f(x)
    }
  }
}
but I guess the above will return a Future[Unit] instead. I would just like to understand the "pattern" of how async tasks are chained together.
(Disclaimer: I'm the main maintainer of Scala Futures)
You wouldn't want to implement flatMap/recoverWith/transformWith yourself—it is a very complex feature to implement safely (stack-safe, memory-safe, and concurrency-safe).
You can see what I am talking about here:
https://github.com/scala/scala/blob/2.13.x/src/library/scala/concurrent/impl/Promise.scala#L442
https://github.com/scala/scala/blob/2.13.x/src/library/scala/concurrent/impl/Promise.scala#L307
https://github.com/scala/scala/blob/2.13.x/src/library/scala/concurrent/impl/Promise.scala#L274
There is more reading on the topic here: https://viktorklang.com/blog/Futures-in-Scala-2.12-part-9.html
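That said, for intuition only, here is a minimal sketch of the Promise-based chaining pattern (the names are illustrative; it deliberately ignores the stack-safety and memory-safety concerns above):
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.{Failure, Success}
import scala.util.control.NonFatal

def myFlatMap[A, B](fa: Future[A])(f: A => Future[B])(implicit ec: ExecutionContext): Future[B] = {
  val p = Promise[B]() // will hold the final result
  fa.onComplete {
    case Success(a) =>
      // run f, and forward whatever its future produces into p
      try p.completeWith(f(a))
      catch { case NonFatal(t) => p.failure(t) } // f itself threw
    case Failure(t) =>
      p.failure(t) // propagate the upstream failure
  }
  p.future
}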
I found that looking at the source for Scala 2.12 made it easier for me to understand how Future/Promise is implemented. I learned that a Future is in fact a Promise, and by looking at the implementation I now understand how "chaining" async tasks can be implemented.
I'm posting the code that I wrote to help me better understand what was going on under the hood.
import java.util.concurrent.{ScheduledThreadPoolExecutor, TimeUnit}
import scala.collection.mutable.ListBuffer

trait MyAsyncTask[T] {
  type Callback = T => Unit

  var result: Option[T]

  // register a new callback to be called when task is completed.
  def onComplete(callback: Callback): Unit

  // complete the given task, and invoke all registered callbacks.
  def complete(value: T): Unit

  // create a new task that waits for the current task to complete, applies
  // the provided function, and completes the new task when done.
  def flatMap[B](f: T => MyAsyncTask[B]): MyAsyncTask[B] = {
    val wrapper = new DefaultAsyncTask[B]()
    onComplete { x =>
      f(x).onComplete { y =>
        wrapper.complete(y)
      }
    }
    wrapper
  }

  def map[B](f: T => B): MyAsyncTask[B] = flatMap(x => CompletedAsyncTask(f(x)))
}
/**
 * task with fixed pre-calculated result.
 */
case class CompletedAsyncTask[T](value: T) extends MyAsyncTask[T] {
  override var result: Option[T] = Some(value)

  override def onComplete(callback: Callback): Unit = {
    // we already have the result, just call the callback.
    callback(value)
  }

  override def complete(value: T): Unit = () // noop, nothing to complete.
}
class DefaultAsyncTask[T] extends MyAsyncTask[T] {
  override var result: Option[T] = None
  var isCompleted = false
  val listeners = new ListBuffer[Callback]()

  /**
   * register callback, to be called when task is completed.
   */
  override def onComplete(callback: Callback): Unit = {
    if (isCompleted) {
      // already completed, just invoke the callback
      callback(result.get)
    } else {
      // add the listener
      listeners.addOne(callback)
    }
  }

  /**
   * trigger all registered callbacks to `onComplete`
   */
  override def complete(value: T): Unit = {
    result = Some(value)
    isCompleted = true // without this, callbacks registered after completion would never fire
    listeners.foreach { listener =>
      listener(value)
    }
  }
}
object MyAsyncTask {
  def apply[T](body: (T => Unit) => Unit): MyAsyncTask[T] = {
    val task = new DefaultAsyncTask[T]()
    // pass in `complete` as the callback.
    body(task.complete)
    task
  }
}
object MyAsyncFlatMap {
  /**
   * helper to simulate an async task.
   */
  def delayedCall(body: => Unit, delay: Int): Unit = {
    val scheduler = new ScheduledThreadPoolExecutor(1)
    val run = new Runnable {
      override def run(): Unit = body
    }
    scheduler.schedule(run, delay, TimeUnit.SECONDS)
  }

  def main(args: Array[String]): Unit = {
    val getAge = MyAsyncTask[Int] { cb =>
      delayedCall({ cb(66) }, 1)
    }
    val getName = MyAsyncTask[String] { cb =>
      delayedCall({ cb("John") }, 2)
    }

    // same as: getAge.flatMap(age => getName.map(name => (age, name)))
    val result: MyAsyncTask[(Int, String)] = for {
      age <- getAge
      name <- getName
    } yield (age, name)

    result.onComplete {
      case (age, name) => println(s"hello $name $age")
    }
  }
}

Alpakka - read Kryo-serialized objects from S3

I have Kryo-serialized binary data stored on S3 (thousands of serialized objects).
Alpakka allows reading the content as data: Source[ByteString, NotUsed]. But the Kryo format doesn't use delimiters, so I can't split each serialized object into a separate ByteString using data.via(Framing.delimiter(...)).
So Kryo actually needs to read the data to know where an object ends, which doesn't look streaming-friendly.
Is it possible to implement this case in a streaming fashion, so that I get a Source[MyObject, NotUsed] at the end of the day?
Here is a graph stage that does that. It handles the case where a serialized object spans two byte strings. It would need to be improved for large objects (not my use case) that can span more than two byte strings in the Source[ByteString, NotUsed].
object KryoReadStage {
  def flow[T](kryoSupport: KryoSupport,
              `class`: Class[T],
              serializer: Serializer[_]): Flow[ByteString, immutable.Seq[T], NotUsed] =
    Flow.fromGraph(new KryoReadStage[T](kryoSupport, `class`, serializer))
}

final class KryoReadStage[T](kryoSupport: KryoSupport,
                             `class`: Class[T],
                             serializer: Serializer[_])
  extends GraphStage[FlowShape[ByteString, immutable.Seq[T]]] {

  override def shape: FlowShape[ByteString, immutable.Seq[T]] = FlowShape.of(in, out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = {
    new GraphStageLogic(shape) {

      setHandler(in, new InHandler {
        override def onPush(): Unit = {
          // prepend any bytes left over from the previous chunk
          val bytes =
            if (previousBytes.length == 0) grab(in)
            else ByteString.fromArrayUnsafe(previousBytes) ++ grab(in)

          Managed(new Input(new ByteBufferBackedInputStream(bytes.asByteBuffer))) { input =>
            var position = 0
            val acc = ListBuffer[T]()
            kryoSupport.withKryo { kryo =>
              var last = false
              while (!last && !input.eof()) {
                tryRead(kryo, input) match {
                  case Some(t) =>
                    acc += t
                    position = input.total().toInt
                    previousBytes = EmptyArray
                  case None =>
                    // partial object: keep the unread tail for the next chunk
                    val bytesLeft = new Array[Byte](bytes.length - position)
                    val bb = bytes.asByteBuffer
                    bb.position(position)
                    bb.get(bytesLeft)
                    last = true
                    previousBytes = bytesLeft
                }
              }
              push(out, acc.toList)
            }
          }
        }

        private def tryRead(kryo: Kryo, input: Input): Option[T] =
          try {
            Some(kryo.readObject(input, `class`, serializer))
          } catch {
            case _: KryoException => None
          }
      })

      setHandler(out, new OutHandler {
        override def onPull(): Unit = {
          pull(in)
        }
      })

      private val EmptyArray: Array[Byte] = Array.empty
      private var previousBytes: Array[Byte] = EmptyArray
    }
  }

  override def toString: String = "KryoReadStage"

  private lazy val in: Inlet[ByteString] = Inlet("KryoReadStage.in")
  private lazy val out: Outlet[immutable.Seq[T]] = Outlet("KryoReadStage.out")
}
Example usage:
client.download(BucketName, key)
  .via(KryoReadStage.flow(kryoSupport, `class`, serializer))
  .flatMapConcat(Source(_))
It uses some additional helpers below.
ByteBufferBackedInputStream:
class ByteBufferBackedInputStream(buf: ByteBuffer) extends InputStream {
  override def read: Int = {
    if (!buf.hasRemaining) -1
    else buf.get & 0xFF
  }

  override def read(bytes: Array[Byte], off: Int, len: Int): Int = {
    if (!buf.hasRemaining) -1
    else {
      val read = Math.min(len, buf.remaining)
      buf.get(bytes, off, read)
      read
    }
  }
}
Managed:
object Managed {
  type AutoCloseableView[T] = T => AutoCloseable

  def apply[T: AutoCloseableView, V](resource: T)(op: T => V): V =
    try {
      op(resource)
    } finally {
      resource.close()
    }
}
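A hypothetical usage example of this loan pattern (the FileInputStream and file name are illustrative only):
// the stream is closed even if read() throws; the implicit
// FileInputStream => AutoCloseable view comes from subtyping
val firstByte: Int = Managed(new java.io.FileInputStream("data.bin")) { in =>
  in.read()
}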
KryoSupport:
trait KryoSupport {
  def withKryo[T](f: Kryo => T): T
}

class PooledKryoSupport(serializers: (Class[_], Serializer[_])*) extends KryoSupport {

  override def withKryo[T](f: Kryo => T): T = {
    pool.run(new KryoCallback[T] {
      override def execute(kryo: Kryo): T = f(kryo)
    })
  }

  private val pool = {
    val factory = new KryoFactory() {
      override def create(): Kryo = {
        val kryo = new Kryo
        // KryoSupport.ScalaSerializers (a default registration list, not shown here)
        // plus any custom serializers passed to the constructor
        (KryoSupport.ScalaSerializers ++ serializers).foreach {
          case (clazz, serializer) =>
            kryo.register(clazz, serializer)
        }
        kryo
      }
    }
    new KryoPool.Builder(factory).softReferences().build()
  }
}

Run code if no results were found in a MongoDB query in Scala

I've got this Scala code:
Bot.dbUserCollection.find(/*filter*/).first().subscribe(new Observer[Document] {
  override def onError(e: Throwable): Unit = Utils.handleError(msg, e)
  override def onComplete(): Unit = {}
  override def onNext(result: Document): Unit = {
    /* here I'm doing operations with the result, if found */
  }
})
What is the best way to execute some code only if no result was found? Can I somehow easily check in onComplete() whether anything was found? I know this has been asked and answered many times for other languages in many places all over the Internet, but unfortunately I've found no Scala answer.
Keeping in mind that onNext will be called when a new document is retrieved and onComplete will be called when the operation is complete regardless of whether or not the query returned any results, you can count the number of results in onNext and check that count in onComplete.
An MVCE follows:
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicLong

import org.mongodb.scala.bson.collection.immutable.Document
import org.mongodb.scala.{MongoClient, MongoCollection, MongoDatabase, Observable, Observer}

import scala.concurrent.Await
import scala.concurrent.duration.Duration

object SimpleInsertionAndRetrieval extends App {

  import Helpers._

  // To directly connect to the default server localhost on port 27017
  val mongoClient: MongoClient = MongoClient()
  val database: MongoDatabase = mongoClient.getDatabase("mydb")
  val collection: MongoCollection[Document] = database.getCollection("test")

  val doc: Document =
    Document("_id" -> 0, "name" -> "MongoDB", "type" -> "database")

  collection.insertOne(doc).results()

  private def observer = new Observer[Document] {
    private val counter = new AtomicLong(0)

    override def onError(e: Throwable): Unit = {
      println(s"Error: $e")
    }

    override def onComplete(): Unit = {
      val numberOfElements = counter.get()
      if (numberOfElements == 0) {
        println("Nothing was found!")
      } else {
        println(s"There were $numberOfElements")
      }
    }

    override def onNext(result: Document): Unit = {
      counter.incrementAndGet()
      // do other things
    }
  }

  collection
    .find()
    .subscribe(observer)

  collection
    .find(Document("{ _id: 1 }")) // There is no document with _id = 1
    .subscribe(observer)

  Thread.sleep(5000)

  /**
   * From the documentation
   *
   * @see https://github.com/mongodb/mongo-scala-driver/blob/master/examples/src/test/scala/tour/Helpers.scala
   */
  object Helpers {

    implicit class DocumentObservable[C](val observable: Observable[Document])
        extends ImplicitObservable[Document] {
      override val converter: (Document) => String = (doc) => doc.toJson
    }

    implicit class GenericObservable[C](val observable: Observable[C])
        extends ImplicitObservable[C] {
      override val converter: (C) => String = (doc) => doc.toString
    }

    trait ImplicitObservable[C] {
      val observable: Observable[C]
      val converter: (C) => String

      def results(): Seq[C] =
        Await.result(observable.toFuture(), Duration(10, TimeUnit.SECONDS))

      def headResult() =
        Await.result(observable.head(), Duration(10, TimeUnit.SECONDS))

      def printResults(initial: String = ""): Unit = {
        if (initial.length > 0) print(initial)
        results().foreach(res => println(converter(res)))
      }

      def printHeadResult(initial: String = ""): Unit =
        println(s"${initial}${converter(headResult())}")
    }
  }
}
This prints
There were 1
Nothing was found!
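An alternative sketch when callbacks aren't needed: the driver's toFuture() helper (the same one used in Helpers above) turns the query into a Future[Seq[Document]], whose emptiness can be checked directly.
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

collection
  .find(Document("{ _id: 1 }"))
  .toFuture()
  .onComplete {
    case Success(docs) if docs.isEmpty => println("Nothing was found!")
    case Success(docs)                 => println(s"There were ${docs.size}")
    case Failure(e)                    => println(s"Error: $e")
  }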

How to create an Akka Stream Source[Seq[A]] from Source[A]

With previous versions of Akka Streams, groupBy returned a Source of Sources that could be materialized into a Source[Seq[A]].
With Akka Streams 2.4 I see that groupBy returns a SubFlow; it's not clear to me how to use this. The transformations I need to apply to the flow have to have the whole Seq available, so I can't just map over the SubFlow (I think).
I've written a class that extends GraphStage that does the aggregation via a mutable collection in the GraphStageLogic, but is there in-built functionality for this? Am I missing the point of SubFlow?
I ended up writing a GraphStage:
class FlowAggregation[A, B](f: A => B) extends GraphStage[FlowShape[A, Seq[A]]] {
  val in: Inlet[A] = Inlet("in")
  val out: Outlet[Seq[A]] = Outlet("out")
  override val shape = FlowShape.of(in, out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
    new GraphStageLogic(shape) {
      private var counter: Option[B] = None
      private var aggregate = scala.collection.mutable.ArrayBuffer.empty[A]

      setHandler(in, new InHandler {
        override def onPush(): Unit = {
          val element = grab(in)
          counter.fold({
            // first element: remember its key and start aggregating
            counter = Some(f(element))
            aggregate += element
            pull(in)
          }) { p =>
            if (f(element) == p) {
              // same key: keep aggregating
              aggregate += element
              pull(in)
            } else {
              // key changed: emit the finished group and start a new one
              push(out, aggregate)
              counter = Some(f(element))
              aggregate = scala.collection.mutable.ArrayBuffer(element)
            }
          }
        }

        override def onUpstreamFinish(): Unit = {
          emit(out, aggregate)
          complete(out)
        }
      })

      setHandler(out, new OutHandler {
        override def onPull(): Unit = {
          pull(in)
        }
      })
    }
}
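Hypothetical usage, assuming a Record type whose key field defines the group boundaries and a source that emits records already ordered by that key:
case class Record(key: String, payload: Int) // illustrative only

val records: Source[Record, NotUsed] = ???
val grouped: Source[Seq[Record], NotUsed] =
  records.via(Flow.fromGraph(new FlowAggregation[Record, String](_.key)))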

Wrap asynchronous code in a blocking call

I'm inserting a record in MongoDB :
val observable: Observable[Completed] = collection.insertOne(doc)
observable.subscribe(new Observer[Completed] {
  override def onNext(result: Completed): Unit = { println("Inserted") }
  override def onError(e: Throwable): Unit = { println("\n\nFailed " + e + "\n\n"); fail() }
  override def onComplete(): Unit = { println("Completed") }
})
The test passes even though the onError callback is invoked. This is because insertOne is an asynchronous method and the test completes before the onError is invoked. I would like to wrap the insertOne method into a blocking call, so subscribe is not invoked until after observable value is set.
Is there an idiomatic way to achieve this in Scala?
The simplest approach to synchronously block on the async operation is to use Await.result on a Future. Since MongoCollection.insertOne returns an Observable[Completed], you can use the implicit ScalaObservable.toFuture on it:
val observable = collection.insertOne(doc)
Await.result(observable.toFuture(), Duration.Inf)