I am writing a Play (2.2) controller in Scala, which should return the result of a query against OrientDB. Now, I have succeeded in writing a synchronous version of said controller, but I'd like to re-write it to work asynchronously.
My question is: given the code below (put together for demonstration purposes), how do I rewrite my controller to interact asynchronously with OrientDB, both for connecting and for querying?
import play.api.mvc.{Action, Controller}
import play.api.libs.json._
import com.orientechnologies.orient.`object`.db.OObjectDatabasePool
import java.util
import com.orientechnologies.orient.core.sql.query.OSQLSynchQuery
import scala.collection.JavaConverters._
object Packages extends Controller {
def packages() = Action { implicit request =>
val db = OObjectDatabasePool.global().acquire("http://localhost:2480", "reader", "reader")
try {
db.getEntityManager().registerEntityClass(classOf[models.Package])
val packages = db.query[util.List[models.Package]](new OSQLSynchQuery[models.Package]("select from Package")).asScala.toSeq
Ok(Json.obj(
"packages" -> Json.toJson(packages)
))
}
finally {
db.close()
}
}
}
EDIT:
Specifically, I wish to use OrientDB's asynchronous API. I know that asynchronous queries are supported by the API, though I'm not sure if you can connect asynchronously as well.
Attempted Solution
Based on Jean's answer, I've tried the following asynchronous implementation, but it fails with the compilation error "value execute is not a member of Nothing", followed by the hint "possible cause: maybe a semicolon is missing before 'value execute'?":
def getPackages(): Future[Seq[models.Package]] = {
val db = openDb
try {
val p = promise[Seq[models.Package]]
val f = p.future
db.command(
new OSQLAsynchQuery[ODocument]("select from Package",
new OCommandResultListener() {
var acc = List[ODocument]()
@Override
def result(iRecord: Any): Boolean = {
val doc = iRecord.asInstanceOf[ODocument]
acc = doc :: acc
true
}
@Override
def end() {
// This is just a dummy
p.success(Seq[models.Package]())
}
// Fails
})).execute()
f
}
finally {
db.close()
}
}
One way is to create a promise, return the future that represents its result, accumulate the results locally as they arrive, and complete the promise (thus resolving the future) when OrientDB notifies you that the command has completed.
def executeAsync(osql: String, params: Map[String, String] = Map()): Future[List[ODocument]] = {
  import scala.concurrent._
  val p = Promise[List[ODocument]]()
  val f = p.future
  // NOTE: `params` is not bound to the query in this sketch.
  val req: OCommandRequest = database.command(
    new OSQLAsynchQuery[ODocument](osql,
      new OCommandResultListener() {
        // Records are prepended, so `acc` ends up in reverse arrival order.
        var acc = List[ODocument]()
        override def result(iRecord: Any): Boolean = {
          val doc = iRecord.asInstanceOf[ODocument]
          acc = doc :: acc
          true // keep receiving records
        }
        override def end(): Unit = {
          // The query is finished: complete the promise with everything collected.
          p.success(acc)
        }
      }))
  req.execute()
  f
}
Be careful though: to enable graph navigation and lazy loading of fields, OrientDB objects used to keep an internal reference to the database instance they were loaded from (or to depend on a thread-local connected database instance). Manipulating these objects asynchronously may result in loading errors. I haven't checked what has changed since 1.6, but this seemed to be deeply embedded in the design.
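Regarding the "value execute is not a member of Nothing" error from the attempted solution: a likely cause, assuming OrientDB declares command with a generic return type along the lines of `<RET extends OCommandRequest> RET command(OCommandRequest)`, is that Scala has nothing to infer RET from when .execute() is chained directly onto the call, so it picks Nothing. The sketch below reproduces that shape with hypothetical stand-in types (Request, Query and Db are not OrientDB classes) and shows one way around it, passing the type argument explicitly:

```scala
// Hypothetical stand-ins; these are NOT OrientDB's real classes.
trait Request { def execute(): String }

class Query(text: String) extends Request {
  def execute(): String = s"executed: $text"
}

object Db {
  // Mirrors the shape of a Java method `<RET extends Request> RET command(Request)`.
  def command[RET <: Request](request: Request): RET = request.asInstanceOf[RET]
}

object InferenceDemo {
  // Db.command(new Query("select from Package")).execute()
  // ^ does not compile: RET is inferred as Nothing, which has no `execute` member.

  // Passing the type argument explicitly leaves nothing to infer.
  val result: String =
    Db.command[Request](new Query("select from Package")).execute()
}
```

With the real API, the analogous call would presumably be db.command[OCommandRequest](...).execute(), but that is untested here.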
It's as simple as wrapping the blocking call in a Future.
import play.api.libs.concurrent.Execution.Implicits.defaultContext
import scala.concurrent.Future
object Packages extends Controller {
def packages = Action.async { implicit request =>
val db = OObjectDatabasePool.global().acquire("http://localhost:2480", "reader", "reader")
db.getEntityManager().registerEntityClass(classOf[models.Package])
val futureResult: Future[Result] = Future(
db.query[util.List[models.Package]](new OSQLSynchQuery[models.Package]("select from Package")).asScala.toSeq
).map(
queryResult => Ok(Json.obj("packages" -> Json.toJson(queryResult)))
).recover {
// Handle each of the exception cases legitimately
case e: UnsupportedOperationException => UnsupportedMediaType(e.getMessage)
case e: MappingException => BadRequest(e.getMessage)
case e: MyServiceException => ServiceUnavailable(e.toString)
case e: Throwable => InternalServerError(e.toString + "\n" + e.getStackTraceString)
}
futureResult.onComplete { case _ =>
db.close()
}
futureResult
}
}
Note that I did not compile the code. There is a lot of room to improve the code.
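One possible improvement is to pull the "acquire, use asynchronously, then close" logic into a helper, so the database handle is released only after the Future has finished. This is a minimal sketch using only the standard library; FakeDb and withResourceAsync are hypothetical names used for illustration, not part of OrientDB or Play:

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.control.NonFatal

object AsyncLoan {
  // Hypothetical stand-in for a pooled database handle; not an OrientDB class.
  final class FakeDb {
    @volatile var closed = false
    def query(sql: String): Seq[String] = {
      require(!closed, "database already closed")
      Seq("a", "b")
    }
    def close(): Unit = { closed = true }
  }

  // Acquire a resource, run an asynchronous computation with it, and release it
  // once that computation has finished, whether it succeeded or failed.
  def withResourceAsync[R, A](acquire: => R)(release: R => Unit)(use: R => Future[A])(
      implicit ec: ExecutionContext): Future[A] =
    Future(acquire).flatMap { resource =>
      val result = try use(resource) catch { case NonFatal(e) => Future.failed[A](e) }
      // andThen runs the release first and only then completes the returned Future.
      result.andThen { case _ => release(resource) }
    }

  // Usage: the query runs on the pool, and the handle is closed afterwards.
  def demo(): (Seq[String], Boolean) = {
    import ExecutionContext.Implicits.global
    val db = new FakeDb
    val rows = withResourceAsync(db)(_.close())(d => Future(d.query("select from Package")))
    (Await.result(rows, 5.seconds), db.closed)
  }
}
```

In a controller, the Future returned by the helper could be mapped to a result and handed to Action.async; the Await in demo is only there to make the sketch observable.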
Let's say we have a fake data source which will return data it holds in batch
import scala.concurrent.Future
import scala.util.Random
class DataSource(size: Int) {
  private var s = 0
  implicit val g = scala.concurrent.ExecutionContext.global
  def getData(): Future[List[Int]] = {
    s = s + 1
    Future {
      Thread.sleep(Random.nextInt(s * 100))
      if (s <= size) {
        List.fill(100)(s)
      } else {
        List()
      }
    }
  }
}
object Test extends App {
val source = new DataSource(100)
implicit val g = scala.concurrent.ExecutionContext.global
def process(v: List[Int]): Unit = {
println(v)
}
def next(f: (List[Int]) => Unit): Unit = {
val fut = source.getData()
fut.onComplete {
case Success(v) => {
f(v)
v match {
case h :: t => next(f)
}
}
}
}
next(process)
Thread.sleep(1000000000)
}
I have my own version above, but part of it is not pure. Ideally, I would like to wrap the Future of each batch into one big Future, and have that wrapper Future succeed when the last batch returns an empty list. My situation is a little different from this post: the next() there is a synchronous call, while mine is also asynchronous.
Is it even possible to do what I want? The next batch should only be fetched once the previous one has resolved, and whether to fetch it at all depends on the size of the batch that was returned.
What's the best way to walk through this type of data source? Are there any existing Scala frameworks that provide the feature I am looking for? Are Play's Iteratee, Enumerator and Enumeratee the right tools? If so, can anyone provide an example of how to use them to implement what I am looking for?
Edit:
With help from chunjef, I tried this out and it worked for me. I did make a small change to his answer, though:
Source.fromIterator(()=>Iterator.continually(source.getData())).mapAsync(1) (f=>f.filter(_.size > 0))
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runForeach(println)
Can someone compare Akka Streams with Play Iteratees? Is it worth my trying Iteratees as well?
Code snip 1:
Source.fromIterator(() => Iterator.continually(ds.getData)) // line 1
.mapAsync(1)(identity) // line 2
.takeWhile(_.nonEmpty) // line 3
.runForeach(println) // line 4
Code snip 2: Assume getData depends on the output of another flow, and I would like to concatenate it with the flow below. However, this yields a "too many open files" error. I'm not sure what causes it; if I understood correctly, mapAsync(1) limits the parallelism to 1.
Flow[Int].mapConcat[Future[List[Int]]](c => {
Iterator.continually(ds.getData(c)).to[collection.immutable.Iterable]
}).mapAsync(1)(identity).takeWhile(_.nonEmpty).runForeach(println)
The following is one way to achieve the same behavior with Akka Streams, using your DataSource class:
import scala.concurrent.Future
import scala.util.Random
import akka.actor.ActorSystem
import akka.stream._
import akka.stream.scaladsl._
object StreamsExample extends App {
implicit val system = ActorSystem("Sandbox")
implicit val materializer = ActorMaterializer()
val ds = new DataSource(100)
Source.fromIterator(() => Iterator.continually(ds.getData)) // line 1
.mapAsync(1)(identity) // line 2
.takeWhile(_.nonEmpty) // line 3
.runForeach(println) // line 4
}
class DataSource(size: Int) {
...
}
A simplified line-by-line overview:
line 1: Creates a stream source that continually calls ds.getData if there is downstream demand.
line 2: mapAsync is a way to deal with stream elements that are Futures. In this case, the stream elements are of type Future[List[Int]]. The argument 1 is the level of parallelism: we specify 1 here because DataSource internally uses a mutable variable, and a parallelism level greater than one could produce unexpected results. identity is shorthand for x => x, which basically means that for each Future, we pass its result downstream without transforming it.
line 3: Essentially, ds.getData is called as long as the result of the Future is a non-empty List[Int]. If an empty List is encountered, processing is terminated.
line 4: runForeach here takes a function List[Int] => Unit and invokes that function for each stream element.
Ideally, I would like to wrap the Future of each batch into one big Future, and have that wrapper Future succeed when the last batch returns an empty list.
I think you are looking for a Promise.
You would set up a Promise before you start the first iteration.
This gives you promise.future, a Future that you can then use to follow the completion of everything.
In your onComplete, you add a case _ => promise.success(()).
Something like
def loopUntilDone(f: (List[Int]) => Unit): Future[Unit] = {
  val promise = Promise[Unit]()
  def next(): Unit = source.getData().onComplete {
    case Success(v) =>
      f(v)
      v match {
        case h :: t => next()
        case _ => promise.success(()) // empty batch: we are done
      }
    case Failure(e) => promise.failure(e)
  }
  // get going
  next()
  // return the Future for everything
  promise.future
}
// future for everything, this is a `Future[Unit]`
// its `onComplete` will be triggered when there is no more data
val everything = loopUntilDone(process)
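The same loop can also be sketched without an explicit Promise, by chaining each fetch with flatMap and recursing until an empty batch arrives. This is only a sketch under stated assumptions: drain, fetch and handle are names introduced here, and the demo uses a small fake source rather than the DataSource above.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BatchLoop {
  // Calls `fetch` repeatedly, one batch at a time, handing each non-empty batch
  // to `handle`. The returned Future completes after the first empty batch and
  // fails if any fetch (or `handle`) fails.
  def drain[A](fetch: () => Future[List[A]])(handle: List[A] => Unit)(
      implicit ec: ExecutionContext): Future[Unit] =
    fetch().flatMap {
      case Nil => Future.successful(())
      case batch =>
        handle(batch)
        drain(fetch)(handle) // the next fetch starts only after this batch was handled
    }

  // Usage with a fake source that yields three batches and then an empty list.
  def demo(): List[List[Int]] = {
    import ExecutionContext.Implicits.global
    val remaining = new AtomicInteger(3)
    val seen = List.newBuilder[List[Int]]
    def fetch(): Future[List[Int]] = Future {
      val n = remaining.getAndDecrement()
      if (n > 0) List.fill(2)(n) else Nil
    }
    Await.result(drain(() => fetch())(seen += _), 5.seconds)
    seen.result()
  }
}
```

The recursion happens inside the flatMap callback, so each step runs asynchronously rather than growing the call stack.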
You are probably looking for a reactive streams library. My personal favorite (and the one I'm most familiar with) is Monix. This is how it works with DataSource unchanged:
import scala.concurrent.duration.Duration
import scala.concurrent.Await
import monix.reactive.Observable
import monix.execution.Scheduler.Implicits.global
object Test extends App {
val source = new DataSource(100)
val completed = // <- this is Future[Unit], completes when foreach is done
Observable.repeatEval(source.getData()) // <- calls getData() again for every element
.flatMap(f => Observable.fromFuture(f)) // <- Here it's Observable[List[Int]], it has collection-like methods
.takeWhile(_.nonEmpty)
.foreach(println)
Await.result(completed, Duration.Inf)
}
I just figured out that flatMapConcat achieves what I wanted. There is no point in starting another question since I already have the answer, so I am putting my sample code here in case someone is looking for something similar.
This type of API is very common in integrations between traditional enterprise applications. DataSource mocks the API, while the Test object demonstrates how client code can use Akka Streams to consume it.
In my small project the API was provided over SOAP, and I used scalaxb to turn the SOAP interface into async-style Scala. With the client calls demonstrated in the Test object, we can consume the API with Akka Streams. Thanks to all for the help.
class DataSource(size: Int) {
private var transactionId: Long = 0
private val transactionCursorMap: mutable.HashMap[TransactionId, Set[ReadCursorId]] = mutable.HashMap.empty
private val cursorIteratorMap: mutable.HashMap[ReadCursorId, Iterator[List[Int]]] = mutable.HashMap.empty
implicit val g = scala.concurrent.ExecutionContext.global
case class TransactionId(id: Long)
case class ReadCursorId(id: Long)
def startTransaction(): Future[TransactionId] = {
Future {
synchronized {
transactionId += 1
}
val t = TransactionId(transactionId)
transactionCursorMap.update(t, Set(ReadCursorId(0)))
t
}
}
def createCursorId(t: TransactionId): ReadCursorId = {
synchronized {
val c = transactionCursorMap.getOrElseUpdate(t, Set(ReadCursorId(0)))
val currentId = c.foldLeft(0L) { (acc, a) => acc.max(a.id) }
val cId = ReadCursorId(currentId + 1)
transactionCursorMap.update(t, c + cId)
cursorIteratorMap.put(cId, createIterator)
cId
}
}
def createIterator(): Iterator[List[Int]] = {
(for {i <- 1 to 100} yield List.fill(100)(i)).toIterator
}
def startRead(t: TransactionId): Future[ReadCursorId] = {
Future {
createCursorId(t)
}
}
def getData(cursorId: ReadCursorId): Future[List[Int]] = {
synchronized {
Future {
Thread.sleep(Random.nextInt(100))
cursorIteratorMap.get(cursorId) match {
case Some(i) if i.hasNext => i.next()
case _ => List()
}
}
}
}
}
object Test extends App {
val source = new DataSource(10)
implicit val system = ActorSystem("Sandbox")
implicit val materializer = ActorMaterializer()
implicit val g = scala.concurrent.ExecutionContext.global
//
// def process(v: List[Int]): Unit = {
// println(v)
// }
//
// def next(f: (List[Int]) => Unit): Unit = {
// val fut = source.getData()
// fut.onComplete {
// case Success(v) => {
// f(v)
// v match {
//
// case h :: t => next(f)
//
// }
// }
//
// }
//
// }
//
// next(process)
//
// Thread.sleep(1000000000)
val s = Source.fromFuture(source.startTransaction())
.map { e =>
source.startRead(e)
}
.mapAsync(1)(identity)
.flatMapConcat(
e => {
Source.fromIterator(() => Iterator.continually(source.getData(e)))
})
.mapAsync(5)(identity)
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runForeach(println)
/*
val done = Source.fromIterator(() => Iterator.continually(source.getData())).mapAsync(1)(identity)
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runFold(List[List[Int]]()) { (acc, r) =>
// println("=======" + acc + r)
r :: acc
}
done.onSuccess {
case e => {
e.foreach(println)
}
}
done.onComplete(_ => system.terminate())
*/
}
I'm trying to build a REST API using Play 2.0. I have a User case class that contains some fields (like username & password) that shouldn't be updatable by the updateMember method.
Is there a good, functional way of dealing with multiple Options? request.body.asJson returns an Option[JsValue], and my user lookup also returns an Option:
package controllers.api
import org.joda.time.LocalDate
import play.api.Play.current
import play.api.db.slick.{DB, Session}
import play.api.mvc._
import play.api.libs.json._
import play.api.libs.functional.syntax._
import models.{Gender, User, UserId}
import repositories.UserRepository
object Member extends Controller {
def updateMember(id: Long) = Action {
DB.withSession {
implicit session: Session =>
val json: JsValue = request.body.asJson // how to deal with this?
val repository = new UserRepository
repository.findById(new UserId(id)).map {
user =>
def usernameAppender = __.json.update(
__.read[JsObject].map { o => o ++ Json.obj("username" -> user.username) }
)
json.transform(usernameAppender) // transform not found
Ok("updated")
}.getOrElse(NotFound)
}
}
}
I could move the map call to where I try to parse the request, but then inside there I guess I'd need another map over the user Option like I already have. So in that style, I'd need a map per Option.
Is there a better way of dealing with multiple Options like this in FP?
You're basically dealing with a nested monad, and the main tool for working with such is flatMap, particularly if both options being None has the same semantic meaning to your program:
request.body.asJson.flatMap { requestJson =>
val repository = new UserRepository
repository.findById(new UserId(id)).map { user =>
def usernameAppender = __.json.update(
__.read[JsObject].map { o => o ++ Json.obj("username" -> user.username) }
)
requestJson.transform(usernameAppender)
Ok("updated") // EDIT: Do you not want to return the JSON?
}
}.getOrElse(NotFound)
But it might also be the case that your Nones have different meanings, in which case, you probably just want to pattern match, and handle the error cases separately:
request.body.asJson match {
case Some(requestJson) =>
val repository = new UserRepository
repository.findById(new UserId(id)).map { user =>
def usernameAppender = __.json.update(
__.read[JsObject].map { o => o ++ Json.obj("username" -> user.username) }
)
requestJson.transform(usernameAppender)
Ok("updated")
}.getOrElse(NotFound)
case None => BadRequest // Or whatever response makes sense for this case
}
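Both shapes can also be written as for-comprehensions, which are sugar for the same flatMap/map calls. The sketch below uses only the standard library and assumes Scala 2.12 or later (where Either is right-biased); parseBody, findUser and the plain String results are hypothetical stand-ins for the request body, the repository lookup and the Play results:

```scala
object OptionChaining {
  // Hypothetical stand-ins for the request body and the user lookup.
  final case class User(id: Long, username: String)

  def parseBody(raw: String): Option[Map[String, String]] =
    if (raw.isEmpty) None else Some(Map("email" -> raw))

  def findUser(id: Long): Option[User] =
    if (id == 1L) Some(User(1L, "alice")) else None

  // Both Nones mean the same thing: one comprehension, one fallback.
  def updateSame(raw: String, id: Long): String = {
    val updated = for {
      body <- parseBody(raw)
      user <- findUser(id)
    } yield body + ("username" -> user.username)
    updated.map(_ => "updated").getOrElse("not found")
  }

  // The Nones mean different things: toRight attaches an error to each one,
  // and the comprehension stops at the first Left.
  def updateDistinct(raw: String, id: Long): String = {
    val updated: Either[String, Map[String, String]] = for {
      body <- parseBody(raw).toRight("bad request")
      user <- findUser(id).toRight("not found")
    } yield body + ("username" -> user.username)
    updated match {
      case Left(error) => error
      case Right(_)    => "updated"
    }
  }
}
```

The Either variant keeps the flat structure of the first version while still distinguishing the two failure cases, which avoids one level of nesting per Option.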
Background: I have a function:
def doWork(symbol: String): Future[Unit]
which initiates some side-effects to fetch data and store it, and completes a Future when it's done. However, the back-end infrastructure has usage limits, such that no more than 5 of these requests can be made in parallel. I have a list of N symbols that I need to get through:
var symbols = Array("MSFT",...)
but I want to sequence them such that no more than 5 are executing simultaneously. Given:
val allowableParallelism = 5
my current solution is (assuming I'm working with async/await):
val symbolChunks = symbols.toList.grouped(allowableParallelism).toList
def toThunk(x: List[String]) = () => Future.sequence(x.map(doWork))
val symbolThunks = symbolChunks.map(toThunk)
val done = Promise[Unit]()
def procThunks(x: List[() => Future[List[Unit]]]): Unit = x match {
case Nil => done.success(())
case x::xs => x().onComplete(_ => procThunks(xs))
}
procThunks(symbolThunks)
await { done.future }
but, for obvious reasons, I'm not terribly happy with it. I feel like this should be possible with folds, but every time I try, I end up eagerly creating the Futures. I also tried out a version with RxScala Observables, using concatMap, but that also seemed like overkill.
Is there a better way to accomplish this?
I have an example of how to do it with scalaz-stream. It's quite a lot of code, because a Scala Future has to be converted to a scalaz Task (an abstraction for a deferred computation). However, this only needs to be added to the project once. Another option is to define doWork with Task in the first place. I personally prefer Task for building async programs.
import scala.concurrent.{Future => SFuture}
import scala.util.Random
import scala.concurrent.ExecutionContext.Implicits.global
import scalaz.stream._
import scalaz.concurrent._
val P = scalaz.stream.Process
val rnd = new Random()
def doWork(symbol: String): SFuture[Unit] = SFuture {
Thread.sleep(rnd.nextInt(1000))
println(s"Symbol: $symbol. Thread: ${Thread.currentThread().getName}")
}
val symbols = Seq("AAPL", "MSFT", "GOOGL", "CVX").
flatMap(s => Seq.fill(5)(s).zipWithIndex.map(t => s"${t._1}${t._2}"))
implicit class Transformer[+T](fut: => SFuture[T]) {
def toTask(implicit ec: scala.concurrent.ExecutionContext): Task[T] = {
import scala.util.{Failure, Success}
import scalaz.syntax.either._
Task.async {
register =>
fut.onComplete {
case Success(v) => register(v.right)
case Failure(ex) => register(ex.left)
}
}
}
}
implicit class ConcurrentProcess[O](val process: Process[Task, O]) {
def concurrently[O2](concurrencyLevel: Int)(f: Channel[Task, O, O2]): Process[Task, O2] = {
val actions =
process.
zipWith(f)((data, f) => f(data))
val nestedActions =
actions.map(P.eval)
merge.mergeN(concurrencyLevel)(nestedActions)
}
}
val workChannel = io.channel((s: String) => doWork(s).toTask)
val process = Process.emitAll(symbols).concurrently(5)(workChannel)
process.run.run
Once you have all these transformations in scope, basically all you need is:
val workChannel = io.channel((s: String) => doWork(s).toTask)
val process = Process.emitAll(symbols).concurrently(5)(workChannel)
Quite short and self-describing.
Although you've already got an excellent answer, I thought I might still offer an opinion or two about these matters.
I remember seeing somewhere (on someone's blog) "use actors for state and use futures for concurrency".
So my first thought would be to utilize actors somehow. To be precise, I would have a master actor with a router launching multiple worker actors, with the number of workers limited by allowableParallelism. So, assuming I have
def doWorkInternal (symbol: String): Unit
which does the work of your doWork, taken 'outside of the future', I would have something along these lines (very rudimentary, ignoring many details, and practically copied from the Akka documentation):
import akka.actor._
case class WorkItem (symbol: String)
case class WorkItemCompleted (symbol: String)
case class WorkLoad (symbols: Array[String])
case class WorkLoadCompleted ()
class Worker extends Actor {
def receive = {
case WorkItem (symbol) =>
doWorkInternal (symbol)
sender () ! WorkItemCompleted (symbol)
}
}
class Master extends Actor {
var pending = Set[String] ()
var originator: Option[ActorRef] = None
var router = {
val routees = Vector.fill (allowableParallelism) {
val r = context.actorOf(Props[Worker])
context watch r
ActorRefRoutee(r)
}
Router (RoundRobinRoutingLogic(), routees)
}
def receive = {
case WorkLoad (symbols) =>
originator = Some (sender ())
context become processing
for (symbol <- symbols) {
router.route (WorkItem (symbol), self)
pending += symbol
}
}
def processing: Receive = {
case Terminated (a) =>
router = router.removeRoutee(a)
val r = context.actorOf(Props[Worker])
context watch r
router = router.addRoutee(r)
case WorkItemCompleted (symbol) =>
pending -= symbol
if (pending.size == 0) {
context become receive
originator.get ! WorkLoadCompleted()
}
}
}
You could query the master actor with ask and receive a WorkLoadCompleted in a future.
But thinking more about 'state' (of number of simultaneous requests in processing) to be hidden somewhere, together with implementing necessary code for not exceeding it, here's something of the 'future gateway intermediary' sort, if you don't mind imperative style and mutable (used internally only though) structures:
object Guardian
{
private val incoming = new collection.mutable.HashMap[String, Promise[Unit]]()
private val outgoing = new collection.mutable.HashMap[String, Future[Unit]]()
private val pending = new collection.mutable.Queue[String]
def doWorkGuarded (symbol: String): Future[Unit] = {
synchronized {
val p = Promise[Unit] ()
incoming(symbol) = p
if (incoming.size <= allowableParallelism)
launchWork (symbol)
else
pending.enqueue (symbol)
p.future
}
}
private def completionHandler (t: Try[Unit]): Unit = {
synchronized {
for (symbol <- outgoing.keySet.toList) { // copy the keys: entries are removed while iterating
val f = outgoing (symbol)
if (f.isCompleted) {
incoming (symbol).completeWith (f)
incoming.remove (symbol)
outgoing.remove (symbol)
}
}
for (i <- outgoing.size until allowableParallelism) {
if (pending.nonEmpty) {
val symbol = pending.dequeue()
launchWork (symbol)
}
}
}
}
private def launchWork (symbol: String): Unit = {
val f = doWork(symbol)
outgoing(symbol) = f
f.onComplete(completionHandler)
}
}
doWork is now exactly like yours, returning Future[Unit]. The idea is that instead of using something like
val futures = symbols.map (doWork (_)).toSeq
val future = Future.sequence(futures)
which would launch the futures without regard to allowableParallelism, I would instead use
val futures = symbols.map (Guardian.doWorkGuarded (_)).toSeq
val future = Future.sequence(futures)
Think about some hypothetical database access driver with non-blocking interface, i.e. returning futures on requests, which is limited in concurrency by being built over some connection pool for example - you wouldn't want it to return futures not taking parallelism level into account, and require you to juggle with them to keep parallelism under control.
This example is more illustrative than practical, since I wouldn't normally expect the 'outgoing' interface to use futures like this (which is quite OK for the 'incoming' interface).
First, some purely functional wrapper around Scala's Future is obviously needed, because a Future is side-effecting and starts running as soon as it is created. Let's call it Deferred:
import scala.concurrent.Future
import scala.util.control.Exception.nonFatalCatch
class Deferred[+T](f: () => Future[T]) {
def run(): Future[T] = f()
}
object Deferred {
def apply[T](future: => Future[T]): Deferred[T] =
new Deferred(() => nonFatalCatch.either(future).fold(Future.failed, identity))
}
And here is the routine:
import java.util.concurrent.CopyOnWriteArrayList
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.immutable.Seq
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.control.Exception.nonFatalCatch
import scala.util.{Failure, Success}
trait ConcurrencyUtils {
def runWithBoundedParallelism[T](parallelism: Int = Runtime.getRuntime.availableProcessors())
(operations: Seq[Deferred[T]])
(implicit ec: ExecutionContext): Deferred[Seq[T]] =
if (parallelism > 0) Deferred {
val indexedOps = operations.toIndexedSeq // index for faster access
val promise = Promise[Seq[T]]()
val acc = new CopyOnWriteArrayList[(Int, T)] // concurrent acc
val nextIndex = new AtomicInteger(parallelism) // keep track of the next index atomically
def run(operation: Deferred[T], index: Int): Unit = {
operation.run().onComplete {
case Success(value) =>
acc.add((index, value)) // accumulate result value
if (acc.size == indexedOps.size) { // we've done
import scala.collection.JavaConversions._
// in concurrent setting next line may be called multiple times, that's why trySuccess instead of success
promise.trySuccess(acc.view.sortBy(_._1).map(_._2).toList)
} else {
val next = nextIndex.getAndIncrement() // get and inc atomically
if (next < indexedOps.size) { // run next operation if exists
run(indexedOps(next), next)
}
}
case Failure(t) =>
promise.tryFailure(t) // same here (may be called multiple times, let's prevent stdout pollution)
}
}
if (operations.nonEmpty) {
indexedOps.view.take(parallelism).zipWithIndex.foreach((run _).tupled) // run as much as allowed
promise.future
} else {
Future.successful(Seq.empty)
}
} else {
throw new IllegalArgumentException("Parallelism must be positive")
}
}
In a nutshell, we initially run as many operations as allowed, and then on each completion we run the next available operation, if any. The only difficulty is maintaining the next-operation index and the results accumulator in a concurrent setting. I'm not an absolute concurrency expert, so let me know if there are potential problems in the code above. Notice that the returned value is also a deferred computation that has to be run.
Usage and test:
import org.scalatest.{Matchers, FlatSpec}
import org.scalatest.concurrent.ScalaFutures
import org.scalatest.time.{Seconds, Span}
import scala.collection.immutable.Seq
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import scala.concurrent.duration._
class ConcurrencyUtilsSpec extends FlatSpec with Matchers with ScalaFutures with ConcurrencyUtils {
"runWithBoundedParallelism" should "return results in correct order" in {
val comp1 = mkDeferredComputation(1)
val comp2 = mkDeferredComputation(2)
val comp3 = mkDeferredComputation(3)
val comp4 = mkDeferredComputation(4)
val comp5 = mkDeferredComputation(5)
val compoundComp = runWithBoundedParallelism(2)(Seq(comp1, comp2, comp3, comp4, comp5))
whenReady(compoundComp.run()) { result =>
result should be (Seq(1, 2, 3, 4, 5))
}
}
// increase default ScalaTest patience
implicit val defaultPatience = PatienceConfig(timeout = Span(10, Seconds))
private def mkDeferredComputation[T](result: T, sleepDuration: FiniteDuration = 100.millis): Deferred[T] =
Deferred {
Future {
Thread.sleep(sleepDuration.toMillis)
result
}
}
}
Use Monix Task. An example from the Monix documentation, with parallelism = 10:
val items = 0 until 1000
// The list of all tasks needed for execution
val tasks = items.map(i => Task(i * 2))
// Building batches of 10 tasks to execute in parallel:
val batches = tasks.sliding(10,10).map(b => Task.gather(b))
// Sequencing batches, then flattening the final result
val aggregate = Task.sequence(batches).map(_.flatten.toList)
// Evaluation:
aggregate.foreach(println)
//=> List(0, 2, 4, 6, 8, 10, 12, 14, 16,...
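The same batch-then-sequence strategy can be sketched with only scala.concurrent, which also shows how a fold can avoid creating the Futures eagerly: the work function is invoked only inside flatMap, after the previous batch has completed. inBatches is a name introduced here for illustration, and the demo's work function is a stand-in for doWork:

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BoundedBatches {
  // Runs `work` over `items` in batches of `parallelism` (which must be positive);
  // a batch starts only after the previous one has completed.
  // Results keep the input order.
  def inBatches[A, B](items: Seq[A], parallelism: Int)(work: A => Future[B])(
      implicit ec: ExecutionContext): Future[List[B]] =
    items.grouped(parallelism).foldLeft(Future.successful(List.empty[B])) { (acc, batch) =>
      // `work` is only called inside flatMap, so no Future is created early.
      acc.flatMap(done => Future.traverse(batch.toList)(work).map(done ++ _))
    }

  // Usage: doubles 1..7 three at a time and records the peak concurrency seen.
  def demo(): (List[Int], Int) = {
    import ExecutionContext.Implicits.global
    val lock = new Object
    var running = 0
    var peak = 0
    def work(i: Int): Future[Int] = Future {
      lock.synchronized { running += 1; peak = math.max(peak, running) }
      Thread.sleep(20)
      lock.synchronized { running -= 1 }
      i * 2
    }
    val results = Await.result(inBatches(1 to 7, 3)(work), 10.seconds)
    (results, lock.synchronized(peak))
  }
}
```

One trade-off of batching is that each batch waits for its slowest element, so fewer than `parallelism` operations may be in flight towards the end of a batch; a sliding-window approach keeps the limit saturated at the cost of more bookkeeping.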
Akka Streams allows you to do the following:
import akka.NotUsed
import akka.stream.Materializer
import akka.stream.scaladsl.Source
import scala.concurrent.Future
def sequence[A: Manifest, B](items: Seq[A], func: A => Future[B], parallelism: Int)(
implicit mat: Materializer
): Future[Seq[B]] = {
val futures: Source[B, NotUsed] =
Source[A](items.toList).mapAsync(parallelism)(x => func(x))
futures.runFold(Seq.empty[B])(_ :+ _)
}
sequence(symbols, doWork, allowableParallelism)
I am using Anorm to access data in my DB. The DB is written to by another service, which is written in Java and persists using Ebean.
I have the following Scala object:
import java.sql.Connection
import scala.concurrent.{ Future, blocking, future }
import scala.concurrent.ExecutionContext.Implicits.global
import anorm.{ SQL, SqlQuery, SqlRow, sqlToSimple, toParameterValue }
import play.api.Logger
import play.api.Play.current
import play.api.db.DB
object Queries {
private val readDataSource: String = play.Configuration.root().getString("data.provider.api.source", "default")
//better IO execution context
import play.api.libs.concurrent.Execution.Implicits.defaultContext
private val dataSetDescription: SqlQuery = SQL("SELECT DISTINCT platform, name FROM data_nugget")
private val identityCreationTime: SqlQuery = SQL("SELECT i.creation_time FROM identity i WHERE platform = {pfm} AND userid = {uid};")
private val identityData: SqlQuery = SQL("SELECT n.name, n.value FROM data_nugget n WHERE platform = {pfm} AND userid = {uid};")
private val playerData: SqlQuery = SQL("SELECT n.platform, n.name, n.value, r.userid, r.registration_time FROM data_nugget n JOIN registration r ON n.platform=r.platform AND n.userid=r.userid WHERE r.playerid = {pid} AND r.application = {app};")
private def withAsyncAnormConnection(function: Connection => Stream[SqlRow]): Future[List[SqlRow]] = {
future {
blocking {
DB.withConnection(readDataSource)(c => function(c)).toList
}
}
}
def fetchDistinctDataNames(): Future[List[SqlRow]] = {
withAsyncAnormConnection(implicit c => dataSetDescription())
}
def fetchIdentityCreationTime(platform: String, userid: String): Future[List[SqlRow]] = {
withAsyncAnormConnection(implicit c => identityCreationTime.on("pfm" -> platform, "uid" -> userid)())
}
def fetchIdentityData(platform: String, userid: String): Future[List[SqlRow]] = {
withAsyncAnormConnection(implicit c => identityData.on("pfm" -> platform, "uid" -> userid)())
}
def fetchRegistrationData(game: String, playerid: String): Future[List[SqlRow]] = {
withAsyncAnormConnection(implicit c => playerData.on("app" -> game, "pid" -> playerid)())
}
}
I use it to wrap the execution of my SQL queries in futures.
Every time I run any of those queries, I get an error with the following stack trace:
(Error,com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
com.mysql.jdbc.SQLError.createSQLException(SQLError.java:987)
com.mysql.jdbc.SQLError.createSQLException(SQLError.java:982)
com.mysql.jdbc.SQLError.createSQLException(SQLError.java:927)
com.mysql.jdbc.ResultSetImpl.checkClosed(ResultSetImpl.java:794)
com.mysql.jdbc.ResultSetImpl.next(ResultSetImpl.java:7139)
anorm.Sql$$anonfun$resultSetToStream$1.apply(Anorm.scala:527)
anorm.Sql$$anonfun$resultSetToStream$1.apply(Anorm.scala:527)
anorm.Useful$.unfold(Anorm.scala:315)
anorm.Useful$$anonfun$unfold$1.apply(Anorm.scala:317)
anorm.Useful$$anonfun$unfold$1.apply(Anorm.scala:317)
scala.collection.immutable.Stream$Cons.tail(Stream.scala:1078)
scala.collection.immutable.Stream$Cons.tail(Stream.scala:1070)
scala.collection.immutable.Stream.foreach(Stream.scala:548)
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:178)
scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
scala.collection.TraversableLike$class.to(TraversableLike.scala:629)
scala.collection.AbstractTraversable.to(Traversable.scala:105)
scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:243)
scala.collection.AbstractTraversable.toList(Traversable.scala:105)
controllers.dataprovider.data.Queries$$anonfun$withAsyncAnormConnection$1$$anonfun$apply$1.apply(Queries.scala:31)
controllers.dataprovider.data.Queries$$anonfun$withAsyncAnormConnection$1$$anonfun$apply$1.apply(Queries.scala:31)
scala.concurrent.impl.ExecutionContextImpl$DefaultThreadFactory$$anon$2$$anon$3.block(ExecutionContextImpl.scala:44)
scala.concurrent.forkjoin.ForkJoinPool.managedBlock(ForkJoinPool.java:2803)
scala.concurrent.impl.ExecutionContextImpl$DefaultThreadFactory$$anon$2.blockOn(ExecutionContextImpl.scala:41)
scala.concurrent.package$.blocking(package.scala:50)
controllers.dataprovider.data.Queries$$anonfun$withAsyncAnormConnection$1.apply(Queries.scala:30)
controllers.dataprovider.data.Queries$$anonfun$withAsyncAnormConnection$1.apply(Queries.scala:30)
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
scala.concurrent.forkjoin.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1417)
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:262)
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1478)
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104))
I have run into this kind of error before in Java services that used JDBC directly, but here I never touch the ResultSet myself, and I convert the Stream of rows I receive from the connection to a list as soon as possible.
What is happening? Where am I closing the ResultSet? What did I get wrong in the refactoring?
As a note, in the prototype of this service (when everything was in the controller) I had the SQL("...") call directly in the code, something like this:
future {
blocking {
DB.withConnection(implicit c => {
SQL("SELECT DISTINCT platform, name FROM data_nugget")().map(row => (row[String]("platform"), row[String]("name"))).toArray
})
}
}
and it worked just fine.
PS : Sorry for the long copy/paste of stacktrace and the code ... trying to be detailed.
I solved it myself; the difference is very subtle.
I changed this function
private def withAsyncAnormConnection(function: Connection => Stream[SqlRow]): Future[List[SqlRow]] = {
future {
blocking {
DB.withConnection(readDataSource)(c => function(c)).toList
}
}
}
to this:
private def withAsyncAnormConnection(function: Connection => Stream[SqlRow]): Future[List[SqlRow]] = {
future {
blocking {
DB.withConnection(readDataSource)(c => function(c).toList)
}
}
}
The trick is that I am using the "loan pattern" of withConnection, so I need to iterate through the whole Stream to get all the rows before the connection is released.
The connection is alive only within these parentheses: (c => function(c).toList)
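To make the difference concrete, here is a minimal sketch that uses only the standard library. FakeConnection and this withConnection are hypothetical stand-ins written for the illustration; they are not the Play or Anorm API. They only mimic a resource whose rows can be read while it is still open.

```scala
object LoanPatternDemo {
  // Hypothetical stand-in for a connection: rows are readable only while it is open.
  final class FakeConnection {
    private var open = true
    def close(): Unit = { open = false }
    // Lazy stream: every element checks the connection at the moment it is computed.
    def rows: Stream[Int] = Stream(1, 2, 3).map { i =>
      if (!open) throw new IllegalStateException("connection closed")
      i
    }
  }

  // Loan pattern: the connection is closed as soon as `block` returns.
  def withConnection[A](block: FakeConnection => A): A = {
    val c = new FakeConnection
    try block(c) finally c.close()
  }

  def main(args: Array[String]): Unit = {
    // Forced inside the block: every row is read while the connection is open.
    val inside = withConnection(c => c.rows.toList)
    println(inside)

    // The lazy Stream escapes the block; forcing it afterwards hits the closed connection.
    val escaped = withConnection(c => c.rows)
    println(scala.util.Try(escaped.toList).isFailure)
  }
}
```

In the first call the list is built before close() runs, so all three rows come back. In the second call only the head of the Stream was computed inside the block, so forcing the tail afterwards throws. This mirrors moving .toList inside the withConnection parentheses.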
There's a difference between the code that is working for you and the code that is not working. In your working example, you are calling map on the lazy Stream of Row instances. In the non-working example, you are calling toList without using map. Maybe map is forcing the full processing of the underlying ResultSet within the withConnection block and toList is not, leaving it lazy until you get outside of the withConnection block after which the underlying ResultSet is closed. Maybe you can modify your new code to try and map the results (mapping the Row to itself, no actual mapping logic) and see if this fixes anything.
I'm trying ScalaQuery, and it is really amazing. I can define a database table as a Scala class and query it easily.
But in the following code, how can I check whether a table already exists, so that I don't call 'Table.ddl.create' twice and get an exception when I run the program a second time?
object Users extends Table[(Int, String, String)]("Users") {
def id = column[Int]("id")
def first = column[String]("first")
def last = column[String]("last")
def * = id ~ first ~ last
}
object Main
{
val database = Database.forURL("jdbc:sqlite:sample.db", driver = "org.sqlite.JDBC")
def main(args: Array[String]) {
database withSession {
// How can I know whether the Users table is already in the DB?
if ( ??? ) {
Users.ddl.create
}
}
}
}
ScalaQuery version 0.9.4 includes a number of helpful SQL metadata wrapper classes in the org.scalaquery.meta package, such as MTable:
http://scalaquery.org/doc/api/scalaquery-0.9.4/#org.scalaquery.meta.MTable
In the test code for ScalaQuery, we can see examples of these classes being used. In particular, see org.scalaquery.test.MetaTest.
I wrote this little function to give me a map of all the known tables, keyed by table name.
import org.scalaquery.meta.MTable
def makeTableMap(dbsess: Session): Map[String, MTable] = {
// List every table known to the database, then key each entry by its name
val tableList = MTable.getTables.list()(dbsess)
tableList.map { t => (t.name.name, t) }.toMap
}
So now, before I create an SQL table, I can check "if (!tableMap.contains(tableName))".
This thread is a bit old, but maybe someone will find this useful. All my DAOs include this:
def create = db withSession {
if (!MTable.getTables.list.exists(_.name.name == MyTable.tableName))
MyTable.ddl.create
}
Here's a full solution that checks on application start, using a PostgreSQL DB with Play Framework:
import globals.DBGlobal
import models.UsersTable
import org.scalaquery.meta.MTable
import org.scalaquery.session.Session
import play.api.GlobalSettings
import play.api.Application
object Global extends GlobalSettings {
override def onStart(app: Application) {
DBGlobal.db.withSession { session : Session =>
import org.scalaquery.session.Database.threadLocalSession
import org.scalaquery.ql.extended.PostgresDriver.Implicit._
if (!makeTableMap(session).contains("tableName")) {
UsersTable.ddl.create(session)
}
}
}
def makeTableMap(dbsess: Session): Map[String, MTable] = {
val tableList = MTable.getTables.list()(dbsess)
val tableMap = tableList.map {
t => (t.name.name, t)
}.toMap
tableMap
}
}
With java.sql.DatabaseMetaData (Interface). Depending on your Database, more or less functions might be implemented.
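As a rough sketch of that approach: the helper below asks DatabaseMetaData.getTables whether a table with a given name is reported. TableCheck and tableExists are names invented for this illustration. It also assumes the name is passed in the case the database stores it in, since some databases upper-case or lower-case unquoted identifiers.

```scala
import java.sql.Connection

object TableCheck {
  // Returns true if the driver's metadata reports at least one table matching `tableName`.
  // Note: the third argument of getTables is a LIKE-style pattern, so `_` and `%`
  // inside the name act as wildcards.
  def tableExists(conn: Connection, tableName: String): Boolean = {
    val rs = conn.getMetaData.getTables(null, null, tableName, Array("TABLE"))
    try rs.next() finally rs.close()
  }
}
```

With a helper like this, the create call can be guarded by something like if (!TableCheck.tableExists(conn, "Users")), where conn is a JDBC connection you already hold.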
See also the related discussion here.
I personally prefer hezamu's suggestion and extend it as follows to keep it DRY:
def createIfNotExists(tables: TableQuery[_ <: Table[_]]*)(implicit session: Session) {
tables foreach {table => if(MTable.getTables(table.baseTableRow.tableName).list.isEmpty) table.ddl.create}
}
Then you can just create your tables with the implicit session:
db withSession {
implicit session =>
createIfNotExists(table1, table2, ..., tablen)
}
You can define the following method in your DAO implementation (taken from Slick MTable.getTables always fails with Unexpected exception[JdbcSQLException: Invalid value 7 for parameter columnIndex [90008-60]]); it returns true or false depending on whether any table is defined in your db:
def checkTable(): Boolean = {
// Run the metadata query and block until the list of tables is available
val tables = Await.result(db.run(MTable.getTables), Duration.Inf)
tables.nonEmpty
}
Or you can check whether some "GIVENTABLENAME" exists by printing the results:
def printTable() ={
val q = db.run(MTable.getTables)
println(Await.result(q, Duration.Inf).toList(0)) //prints first MTable element
println(Await.result(q, Duration.Inf).toList(1))//prints second MTable element
println(Await.result(q, Duration.Inf).toList.toString.contains("MTable(MQName(public.GIVENTABLENAME_pkey),INDEX,null,None,None,None)"))
}
Don't forget to add
import slick.jdbc.meta._
Then call the methods from anywhere with the usual @Inject(). This uses Play 2.4 and play-slick 1.0.0.
Cheers,