Recursive HTTP requests in Scala

I need to make recursive requests and then collect all the models into one List, but I don't understand how to do it. Am I thinking about this the right way?
package kindSir.main
import dispatch.Defaults._
import dispatch._
import kindSir.models._
import org.json4s._
import org.json4s.jackson.JsonMethods._
object ApplicationMain extends App {
def fetchMergeRequests(startPage: Int = 1): Future[List[MergeRequest]] = {
val requestsUrl = url(s"https://gitlab.com/api/v4/projects/gitlab-org%2Fgitlab-ce/merge_requests?state=opened&per_page=3&page=${startPage}")
Http(requestsUrl).map { res =>
(parse(res.getResponseBody), res.getHeader("X-Next-Page").toInt) match {
case (list @ JArray(_), nextPage: Int) =>
val currentList: List[MergeRequest] = MergeRequest.parseList(list).get
val nextPageListFuture: Future[List[MergeRequest]] = fetchMergeRequests(nextPage)
// And how to merge these two lists?
case (list @ JArray(_), _) => MergeRequest.parseList(list).get
case _ => throw new RuntimeException(s"No merge requests for project found")
}
}
}
}

The main problem you're dealing with here is that you're trying to combine data you already have (List[MergeRequest]) with the data you'll retrieve in future (Future[List[MergeRequest]]). There are a few things you need to do to handle this scenario:
Use flatMap instead of map on result of the HTTP request. This allows you to make further HTTP requests inside the recursion but map them back to a single Future.
Call map on the result of the recursion fetchMergeRequests(nextPage) to combine the data you already have with the future data from the recursion.
Wrap the other list in Future.successful() because flatMap requires every branch of the pattern match to return a Future. The only branch that does not need wrapping is the one that throws.
I'm not familiar with the libraries you are using so I haven't tested it, but I think your code should work like this:
import scala.util.Try

def fetchMergeRequests(startPage: Int = 1): Future[List[MergeRequest]] = {
  val requestsUrl = url(s"https://gitlab.com/api/v4/projects/gitlab-org%2Fgitlab-ce/merge_requests?state=opened&per_page=3&page=${startPage}")
  Http(requestsUrl).flatMap { res =>
    // Assumption: X-Next-Page is missing or empty on the last page, so a plain .toInt
    // would throw there; Try(...).toOption turns that case into None instead.
    (parse(res.getResponseBody), Try(res.getHeader("X-Next-Page").toInt).toOption) match {
      case (list @ JArray(_), Some(nextPage)) =>
        val currentList: List[MergeRequest] = MergeRequest.parseList(list).get
        val nextPageListFuture: Future[List[MergeRequest]] = fetchMergeRequests(nextPage)
        // Combine the page we already have with the pages still to come
        nextPageListFuture.map(nextPageList => currentList ++ nextPageList)
      case (list @ JArray(_), None) =>
        // Last page: nothing more to fetch
        Future.successful(MergeRequest.parseList(list).get)
      case _ => throw new RuntimeException("No merge requests for project found")
    }
  }
}
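The same shape can be tried without any HTTP library. The sketch below is only a stand-in: Page and fetchPage are hypothetical names (not part of dispatch or the GitLab API), with nextPage playing the role of the X-Next-Page header.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object PagedFetch {
  // Hypothetical page model: the items on this page plus the next page number, if any.
  final case class Page(items: List[String], nextPage: Option[Int])

  // Hypothetical stand-in for the HTTP call: three pages of fake data.
  def fetchPage(page: Int): Future[Page] = Future {
    val items = List(s"mr-$page-a", s"mr-$page-b")
    Page(items, if (page < 3) Some(page + 1) else None)
  }

  // flatMap keeps the result a single Future even though each step starts another request.
  def fetchAll(page: Int = 1): Future[List[String]] =
    fetchPage(page).flatMap {
      case Page(items, Some(next)) => fetchAll(next).map(rest => items ++ rest)
      case Page(items, None)       => Future.successful(items)
    }
}
```

Calling PagedFetch.fetchAll() yields one Future[List[String]] holding the items of all pages in page order.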

Related

Iterate over a data source asynchronously in batches and stop when the remote returns no data in Scala

Let's say we have a fake data source that returns the data it holds in batches:
import scala.concurrent.Future
import scala.util.{Random, Success}

class DataSource(size: Int) {
  private var s = 0
  implicit val g = scala.concurrent.ExecutionContext.global
  def getData(): Future[List[Int]] = {
    s = s + 1
    Future {
      Thread.sleep(Random.nextInt(s * 100))
      if (s <= size) {
        List.fill(100)(s)
      } else {
        List()
      }
    }
  }
}
object Test extends App {
val source = new DataSource(100)
implicit val g = scala.concurrent.ExecutionContext.global
def process(v: List[Int]): Unit = {
println(v)
}
def next(f: (List[Int]) => Unit): Unit = {
val fut = source.getData()
fut.onComplete {
case Success(v) => {
f(v)
v match {
case h :: t => next(f)
}
}
}
}
next(process)
Thread.sleep(1000000000)
}
This is my attempt; the problem is that part of it is not pure. Ideally, I would like to wrap the Future for each batch in one big Future that succeeds when the last batch returns an empty list. My situation is a little different from this post: the next() there is a synchronous call, while mine is also asynchronous.
Or is it even possible to do what I want? The next batch should only be fetched once the previous one has resolved, and whether to fetch the next batch depends on the size returned.
What's the best way to walk through this type of data source? Are there any existing Scala frameworks that provide the feature I am looking for? Is Play's Iteratee, Enumerator, Enumeratee the right tool? If so, can anyone provide an example of how to use those facilities to implement what I am looking for?
Edit----
With help from chunjef, I tried it out and it worked for me. However, I made a small change based on his answer:
Source.fromIterator(()=>Iterator.continually(source.getData())).mapAsync(1) (f=>f.filter(_.size > 0))
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runForeach(println)
Can someone also compare Akka Streams and Play Iteratees? Is it worth trying Iteratees as well?
Code snip 1:
Source.fromIterator(() => Iterator.continually(ds.getData)) // line 1
.mapAsync(1)(identity) // line 2
.takeWhile(_.nonEmpty) // line 3
.runForeach(println) // line 4
Code snip 2: Assume getData depends on the output of another flow, and I would like to concatenate it with the flow below. However, this yields a "too many open files" error. I'm not sure what causes it; if I understood correctly, mapAsync limits the parallelism to 1.
Flow[Int].mapConcat[Future[List[Int]]](c => {
Iterator.continually(ds.getData(c)).to[collection.immutable.Iterable]
}).mapAsync(1)(identity).takeWhile(_.nonEmpty).runForeach(println)
The following is one way to achieve the same behavior with Akka Streams, using your DataSource class:
import scala.concurrent.Future
import scala.util.Random
import akka.actor.ActorSystem
import akka.stream._
import akka.stream.scaladsl._
object StreamsExample extends App {
implicit val system = ActorSystem("Sandbox")
implicit val materializer = ActorMaterializer()
val ds = new DataSource(100)
Source.fromIterator(() => Iterator.continually(ds.getData)) // line 1
.mapAsync(1)(identity) // line 2
.takeWhile(_.nonEmpty) // line 3
.runForeach(println) // line 4
}
class DataSource(size: Int) {
...
}
A simplified line-by-line overview:
line 1: Creates a stream source that continually calls ds.getData if there is downstream demand.
line 2: mapAsync is a way to deal with stream elements that are Futures. In this case, the stream elements are of type Future[List[Int]]. The argument 1 is the level of parallelism: we specify 1 here because DataSource internally uses a mutable variable, and a parallelism level greater than one could produce unexpected results. identity is shorthand for x => x, which basically means that for each Future, we pass its result downstream without transforming it.
line 3: Essentially, ds.getData is called as long as the result of the Future is a non-empty List[Int]. If an empty List is encountered, processing is terminated.
line 4: runForeach here takes a function List[Int] => Unit and invokes that function for each stream element.
Ideally, I would like to wrap the Future for each batch in one big Future that succeeds when the last batch returns an empty list.
I think you are looking for a Promise.
You would set up a Promise before you start the first iteration.
This gives you promise.future, a Future that you can then use to follow the completion of everything.
In your onComplete, you add a case _ => promise.success(()).
Something like
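import scala.concurrent.{Future, Promise}
import scala.util.{Failure, Success}

def loopUntilDone(f: List[Int] => Unit): Future[Unit] = {
  val promise = Promise[Unit]()
  def next(): Unit = source.getData().onComplete {
    case Success(v) =>
      f(v)
      v match {
        case _ :: _ => next()              // more data: fetch the next batch
        case _ => promise.success(())      // empty batch: signal completion
      }
    case Failure(e) => promise.failure(e)  // propagate the error to the caller
  }
  // get going
  next()
  // return the Future for everything
  promise.future
}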
// future for everything, this is a `Future[Unit]`
// its `onComplete` will be triggered when there is no more data
val everything = loopUntilDone(process)
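An alternative that avoids the explicit Promise is to recurse with flatMap, so that the returned Future itself completes when an empty batch arrives. The sketch below is self-contained and uses a hypothetical FakeSource in place of DataSource; it is an illustration of the idea, not a drop-in replacement.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object BatchLoop {
  // Hypothetical stand-in for DataSource: `size` non-empty batches, then empty lists.
  final class FakeSource(size: Int) {
    private val calls = new AtomicInteger(0)
    def getData(): Future[List[Int]] = Future {
      val n = calls.incrementAndGet()
      if (n <= size) List.fill(3)(n) else Nil
    }
  }

  // Fetches batches one after another; the next call starts only after the
  // previous batch has been processed. Completes once an empty batch is seen.
  def loopUntilDone(source: FakeSource)(f: List[Int] => Unit): Future[Unit] =
    source.getData().flatMap {
      case Nil => Future.successful(())
      case batch =>
        f(batch)
        loopUntilDone(source)(f)
    }
}
```

A failed getData() call fails the returned Future directly, so no separate Failure case is needed.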
You are probably looking for a reactive streams library. My personal favorite (and the one I'm most familiar with) is Monix. This is how it works with DataSource unchanged:
import scala.concurrent.duration.Duration
import scala.concurrent.Await
import monix.reactive.Observable
import monix.execution.Scheduler.Implicits.global
object Test extends App {
val source = new DataSource(100)
val completed = // <- this is Future[Unit], completes when foreach is done
Observable.repeat(Observable.fromFuture(source.getData()))
.flatten // <- Here it's Observable[List[Int]], it has collection-like methods
.takeWhile(_.nonEmpty)
.foreach(println)
Await.result(completed, Duration.Inf)
}
I just figured out that flatMapConcat achieves what I wanted. There is no point in starting another question since I already have the answer, so I'm putting my sample code here in case someone is looking for something similar.
This type of API is very common in integrations between traditional enterprise applications. DataSource mocks the API, while the Test object demonstrates how client code can use Akka Streams to consume it.
In my small project the API was provided over SOAP, and I used scalaxb to turn the SOAP calls into async-style Scala. With the client calls demonstrated in the Test object, we can consume the API with Akka Streams. Thanks to all for the help.
class DataSource(size: Int) {
private var transactionId: Long = 0
private val transactionCursorMap: mutable.HashMap[TransactionId, Set[ReadCursorId]] = mutable.HashMap.empty
private val cursorIteratorMap: mutable.HashMap[ReadCursorId, Iterator[List[Int]]] = mutable.HashMap.empty
implicit val g = scala.concurrent.ExecutionContext.global
case class TransactionId(id: Long)
case class ReadCursorId(id: Long)
def startTransaction(): Future[TransactionId] = {
Future {
synchronized {
transactionId += 1 // increment; adding the id to itself would leave it at 0 forever
}
val t = TransactionId(transactionId)
transactionCursorMap.update(t, Set(ReadCursorId(0)))
t
}
}
def createCursorId(t: TransactionId): ReadCursorId = {
synchronized {
val c = transactionCursorMap.getOrElseUpdate(t, Set(ReadCursorId(0)))
val currentId = c.foldLeft(0L) { (acc, a) => acc.max(a.id) }
val cId = ReadCursorId(currentId + 1)
transactionCursorMap.update(t, c + cId)
cursorIteratorMap.put(cId, createIterator)
cId
}
}
def createIterator(): Iterator[List[Int]] = {
(for {i <- 1 to 100} yield List.fill(100)(i)).toIterator
}
def startRead(t: TransactionId): Future[ReadCursorId] = {
Future {
createCursorId(t)
}
}
def getData(cursorId: ReadCursorId): Future[List[Int]] = {
synchronized {
Future {
Thread.sleep(Random.nextInt(100))
cursorIteratorMap.get(cursorId) match {
case Some(i) => if (i.hasNext) i.next() else List() // an exhausted cursor yields an empty batch instead of throwing
case _ => List()
}
}
}
}
}
object Test extends App {
val source = new DataSource(10)
implicit val system = ActorSystem("Sandbox")
implicit val materializer = ActorMaterializer()
implicit val g = scala.concurrent.ExecutionContext.global
val s = Source.fromFuture(source.startTransaction())
.map { e =>
source.startRead(e)
}
.mapAsync(1)(identity)
.flatMapConcat(
e => {
Source.fromIterator(() => Iterator.continually(source.getData(e)))
})
.mapAsync(5)(identity)
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runForeach(println)
/*
val done = Source.fromIterator(() => Iterator.continually(source.getData())).mapAsync(1)(identity)
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runFold(List[List[Int]]()) { (acc, r) =>
// println("=======" + acc + r)
r :: acc
}
done.onSuccess {
case e => {
e.foreach(println)
}
}
done.onComplete(_ => system.terminate())
*/
}

How to get rid of nested future in scala?

I've got some code in my play framework app that parses a JSON request and uses it to update a user's data. The problem is that I need to return a Future[Result], but my userDAO.update function returns a Future[Int] so I have nested futures.
I've resorted to using Await which isn't very good. How can I rewrite this code to avoid the nested future?
def patchCurrentUser() = Action.async { request =>
Future {
request.body.asJson
}.map {
case Some(rawJson) => Json.fromJson[User](rawJson).map { newUser =>
val currentUserId = 1
logger.info(s"Retrieving users own profile for user ID $currentUserId")
val futureResult: Future[Result] = userDAO.findById(currentUserId).flatMap {
case Some(currentUser) =>
val mergedUser = currentUser.copy(
firstName = newUser.firstName // ... and the other fields
)
userDAO.update(mergedUser).map(_ => Ok("OK"))
case _ => Future { Status(404) }
}
import scala.concurrent.duration._
// this is bad. How can I get rid of this?
Await.result(futureResult, 1 seconds)
}.getOrElse(Status(400))
case _ => Status(400)
}
}
Update:
Sod's law: Just after posting this I worked it out:
Future {
request.body.asJson
}.flatMap {
case Some(rawJson) => Json.fromJson[User](rawJson).map { newUser =>
val currentUserId = 1
userDAO.findById(currentUserId).flatMap {
case Some(currentUser) =>
val updatedUser = currentUser.copy(
firstName = newUser.firstName
)
userDAO.update(updatedUser).map(_ => Ok("OK"))
case _ => Future { Status(404) }
}
}.getOrElse(Future(Status(400)))
case _ => Future(Status(400))
}
But, is there a more elegant way? It seems like I'm peppering Future() around quite liberally which seems like a code smell.
Use flatMap instead of map.
flatMap[A, B](f: A => Future[B])
map[A, B](f: A => B)
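The difference between the two signatures shows up directly in the result type. A minimal sketch, assuming hypothetical findName and update functions in place of the userDAO calls:

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object Flattening {
  // Hypothetical stand-ins for userDAO.findById and userDAO.update.
  def findName(id: Int): Future[String] = Future.successful(s"user-$id")
  def update(name: String): Future[Int] = Future.successful(name.length)

  // map wraps the inner Future instead of flattening it.
  val nested: Future[Future[Int]] = findName(1).map(name => update(name))

  // flatMap chains the two steps into one Future.
  val flat: Future[Int] = findName(1).flatMap(name => update(name))
}
```

With map you end up holding a Future[Future[Int]] and are tempted to Await the inner one; with flatMap the type is already Future[Int].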
A more elegant way is to use a for comprehension.
With a for comprehension the code looks like this:
for {
  jsonOpt <- Future(request.body.asJson)
  result <- jsonOpt match {
    case Some(json) =>
      json.validate[User] match {
        case JsSuccess(newUser, _) =>
          // findById returns Future[Option[User]] in the question, so handle a missing user explicitly
          userDAO.findById(1).flatMap {
            case Some(currentUser) =>
              userDAO.update(currentUser.copy(firstName = newUser.firstName)).map(_ => Ok("ok"))
            case None => Future.successful(Status(404))
          }
        case JsError(_) => Future.successful(Status(400))
      }
    case None => Future.successful(Status(400))
  }
} yield result
As @pamu said, it might clean up your code a bit if you use a for comprehension.
Another interesting approach (and more pure in terms of functional programming) would be to use monad transformers (normally a type like Future[Option[T]] screams monad transformer).
You should take a look at libraries like cats (and/or scalaz). I'll try to give a small "pseudo code" example using cats (because I don't have Play Framework locally):
import cats.data.OptionT
import cats.instances.future._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
def convertJsonToUser(json: Json): Future[Option[User]] = Json.fromJson[User](json)
def convertBodyToJson(request: Request): Future[Option[Json]] = Future {request.body.asJson}
def updateUser(user: User): Future[HttpResult] = Future {
// update user
Ok("ok")
}
def myFunction: Future[HttpResult] = {
val resultOpt: OptionT[Future, HttpResult] = for {
json <- OptionT(convertBodyToJson(request))
user <- OptionT(convertJsonToUser(json))
result <- OptionT.liftF(updateUser(user)) // liftF wraps a Future[A] as OptionT[Future, A]
} yield result
resultOpt.getOrElseF(Future {Status(400)})
}
As you can see, in this case the monad transformer lets you treat a type like Future[Option[T]] as a single "short-circuiting" type (i.e. the for comprehension stops if you have either a failed Future or a Future containing a None).
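To see where the short-circuiting comes from, it can help to write a tiny transformer by hand. The sketch below is a deliberately small stand-in, not cats' OptionT; FutOpt, parse, lookup and handle are hypothetical names, and plain strings stand in for HTTP results.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object MiniOptionT {
  // Hand-rolled wrapper around Future[Option[A]]; only map, flatMap and getOrElse.
  final case class FutOpt[A](value: Future[Option[A]]) {
    def map[B](f: A => B): FutOpt[B] = FutOpt[B](value.map(_.map(f)))
    def flatMap[B](f: A => FutOpt[B]): FutOpt[B] = FutOpt[B](value.flatMap[Option[B]] {
      case Some(a) => f(a).value
      case None    => Future.successful(Option.empty[B]) // short-circuit: later steps never run
    })
    def getOrElse(default: => A): Future[A] = value.map(_.getOrElse(default))
  }

  // Hypothetical steps standing in for JSON parsing and the user lookup.
  def parse(body: String): Future[Option[Int]] =
    Future.successful(scala.util.Try(body.toInt).toOption)
  def lookup(id: Int): Future[Option[String]] =
    Future.successful(if (id == 1) Some("alice") else None)

  def handle(body: String): Future[String] = {
    val result = for {
      id   <- FutOpt(parse(body))
      name <- FutOpt(lookup(id))
    } yield s"updated $name"
    result.getOrElse("400")
  }
}
```

Here handle("1") takes the happy path, while an unparsable body or an unknown id falls through to the default without running the remaining steps.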

How to spawn an unknown amount of Futures and combine the result even if one or more failed?

I want to turn the following sequential code into concurrent code with Futures and need advice on how to structure it.
sequential:
import java.net.URL
val providers = List(
new URL("http://www.cnn.com"),
new URL("http://www.bbc.co.uk"),
new URL("http://www.othersite.com")
)
def download(urls: URL*) = urls.flatMap(url => io.Source.fromURL(url).getLines).distinct
val res = download(providers:_*)
I want to download all sources that are coming in via the varargs of the download method and combine the results into one Seq/List/Set, whatever, together. When one Future failed, let's say because the server is unreachable, it should take all others and move on and return the result nonetheless. firstCompletedOf won't work because I need the results of all, except one failed due to error. I thought about using Future.sequence like below but I can't get it to work. Here is what I tried...
def download(urls: URL*) = Future.sequence {
urls.map { url =>
Future {
io.Source.fromURL(url).getLines
}
}
}
This produces a Seq[Future[Iterator[String]]] which is not compatible with M_[Future[A_]].
A Future[Iterator[String]] is what I want. (I thought I return an Iterator because I need to reuse it later on with reset method on Iterator.)
You can use parallel collections:
import java.net.URL
val providers = List(
new URL("http://www.cnn.com"),
new URL("http://www.bbc.co.uk"),
new URL("http://www.othersite.com")
)
def download(urls: URL*) = urls.par.flatMap(url => {
Try {
io.Source.fromURL(url).getLines
} match {
case Success(e) => e
case Failure(_) => Seq()
}
}).toSeq
val res: Seq[String] = download(providers:_*)
Or if you want the non blocking version with a Future:
def download(urls: URL*) = Future {
blocking {
urls.par.flatMap(url => {
Try {
io.Source.fromURL(url).getLines
} match {
case Success(e) => e
case Failure(_) => Seq()
}
})
}
}
val res: Future[Seq[String]] = download(providers:_*)
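If you would rather stay with one Future per URL, a common pattern is to recover each Future to an empty result before calling Future.sequence, so a single failure cannot fail the combined Future. The sketch below assumes a hypothetical fetchLines in place of io.Source.fromURL(url).getLines, so it runs without network access.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object DownloadAll {
  // Hypothetical stand-in for the real download; fails for the site named "down".
  def fetchLines(site: String): List[String] =
    if (site == "down") throw new java.io.IOException(s"$site unreachable")
    else List(s"$site line 1", "shared line")

  def download(sites: String*): Future[List[String]] = {
    // One Future per site; a failed download is replaced by an empty result.
    val perSite: List[Future[List[String]]] = sites.toList.map { site =>
      Future(fetchLines(site)).recover { case _: java.io.IOException => Nil }
    }
    // Future.sequence turns List[Future[...]] into Future[List[...]].
    Future.sequence(perSite).map(_.flatten.distinct)
  }
}
```

Converting the varargs to a List first gives Future.sequence a concrete collection type to rebuild, and the result is a strict List rather than a one-shot Iterator.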

How to transform submitted json in Play 2.0?

I'm trying to build a REST API using play 2.0. I have a User case class that contains some fields (like username & password) that shouldn't be updatable by the updateMember method.
Is there a good, functional way, of dealing with multiple Options somehow, because request.body.asJson returns an Option[JsValue], and my user lookup also returns an Option:
package controllers.api
import org.joda.time.LocalDate
import play.api.Play.current
import play.api.db.slick.{DB, Session}
import play.api.mvc._
import play.api.libs.json._
import play.api.libs.functional.syntax._
import models.{Gender, User, UserId}
import repositories.UserRepository
object Member extends Controller {
def updateMember(id: Long) = Action {
DB.withSession {
implicit session: Session =>
val json: JsValue = request.body.asJson // how to deal with this?
val repository = new UserRepository
repository.findById(new UserId(id)).map {
user =>
def usernameAppender = __.json.update(
__.read[JsObject].map { o => o ++ Json.obj("username" -> user.username) }
)
json.transform(usernameAppender) // transform not found
Ok("updated")
}.getOrElse(NotFound)
}
}
}
I could move the map call to where I try to parse the request, but then inside there I guess I'd need another map over the user Option like I already have. So in that style, I'd need a map per Option.
Is there a better way of dealing with multiple Options like this in FP?
You're basically dealing with a nested monad, and the main tool for working with such is flatMap, particularly if both options being None has the same semantic meaning to your program:
request.body.asJson.flatMap { requestJson =>
val repository = new UserRepository
repository.findById(new UserId(id)).map { user =>
def usernameAppender = __.json.update(
__.read[JsObject].map { o => o ++ Json.obj("username" -> user.username) }
)
requestJson.transform(usernameAppender)
Ok("updated") // EDIT: Do you not want to return the JSON?
}
}.getOrElse(NotFound)
But it might also be the case that your Nones have different meanings, in which case, you probably just want to pattern match, and handle the error cases separately:
request.body.asJson match {
case Some(requestJson) =>
val repository = new UserRepository
repository.findById(new UserId(id)).map { user =>
def usernameAppender = __.json.update(
__.read[JsObject].map { o => o ++ Json.obj("username" -> user.username) }
)
requestJson.transform(usernameAppender)
Ok("updated")
}.getOrElse(NotFound)
case None => BadRequest // Or whatever response makes sense for this case
}
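The two styles can be compared side by side with plain Options. This is only a sketch: body and findUser are hypothetical stand-ins for request.body.asJson and repository.findById, and strings stand in for Play results.

```scala
object TwoOptions {
  // Hypothetical stand-ins for request.body.asJson and repository.findById.
  def body(raw: String): Option[String] = if (raw.nonEmpty) Some(raw) else None
  def findUser(id: Long): Option[String] = if (id == 1L) Some("alice") else None

  // Both Nones collapse to the same outcome: a for comprehension (flatMap + map).
  def sameMeaning(raw: String, id: Long): String =
    (for {
      json <- body(raw)
      user <- findUser(id)
    } yield s"updated $user with $json").getOrElse("NotFound")

  // Each None has its own outcome: match on the pair.
  def differentMeaning(raw: String, id: Long): String =
    (body(raw), findUser(id)) match {
      case (None, _)             => "BadRequest"
      case (_, None)             => "NotFound"
      case (Some(json), Some(u)) => s"updated $u with $json"
    }
}
```

The for comprehension scales to more Options without extra nesting, at the cost of not telling you which one was empty.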

Handling Option Inside For Comprehension of Futures

Consider the following code inside a Play Framework controller:
val firstFuture = function1(id)
val secondFuture = function2(id)
val resultFuture = for {
first <- firstFuture
second <- secondFuture.map(_.get)
result <- function3(first, second)
} yield Ok(s"Processed $id")
resultFuture.recover { case t => InternalServerError(s"Error organizing files: ${t.getMessage}") }
Here are some details about the functions:
function1 returns Future[List]
function2 returns Future[Option[Person]]
function1 and function2 can run in parallel, but function3 needs the results for both.
Given this information, I have some questions:
Although the application is such that this code is very unlikely to be called with an improper id, I would like to handle this possibility. Basically, I would like to return NotFound if function2 returns None, but I can't figure out how to do that.
Will the recover call handle an Exception thrown any step of the way?
Is there a more elegant or idiomatic way to write this code?
Perhaps using collect, and then you can recover the NoSuchElementException--which yes, will recover a failure from any step of the way. resultFuture will either be successful with the mapped Result, or failed with the first exception that was thrown.
val firstFuture = function1(id)
val secondFuture = function2(id)
val resultFuture = for {
  first <- firstFuture
  second <- secondFuture.collect { case Some(x) => x } // fails with NoSuchElementException on None
  result <- function3(first, second)
} yield Ok(s"Processed $id")
resultFuture
  .recover { case _: java.util.NoSuchElementException => NotFound }
  .recover { case t => InternalServerError(s"Error organizing files: ${t.getMessage}") }
I would go with Scalaz OptionT. Maybe it's overkill when you have only one function returning Future[Option[T]], but it becomes very useful once you start adding more such functions:
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.control.NonFatal
import scalaz.OptionT
import scalaz.OptionT._
import scalaz.std.scalaFuture._
// Wrap 'some' result into OptionT
private def someOptionT[T](t: Future[T]): OptionT[Future, T] =
  optionT[Future](t.map(Some.apply))
val firstFuture = function1(id)
val secondFuture = function2(id)
val action = for {
  list <- someOptionT(firstFuture)
  person <- optionT(secondFuture)
  // function3 returns a Future (see the question), so bind it with <- rather than =
  result <- someOptionT(function3(list, person))
} yield result
action.run.map {
  case None => NotFound
  case Some(result) => Ok(s"Processed $id")
} recover {
  case NonFatal(err) => InternalServerError(s"Error organizing files: ${err.getMessage}")
}
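The same behaviour can also be had without an extra library by matching on the Option inside the for comprehension. A minimal sketch, assuming hypothetical bodies for function1, function2 and function3 and using strings in place of Play's Result values:

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object OptionInFuture {
  // Hypothetical stand-ins for function1, function2 and function3.
  def function1(id: Int): Future[List[Int]] = Future.successful(List(id))
  def function2(id: Int): Future[Option[String]] =
    Future.successful(if (id > 0) Some(s"person-$id") else None)
  def function3(xs: List[Int], p: String): Future[Unit] = Future.successful(())

  // Strings stand in for Ok, NotFound and InternalServerError.
  def handle(id: Int): Future[String] = {
    // Start both futures before the for comprehension so they run in parallel.
    val firstFuture  = function1(id)
    val secondFuture = function2(id)
    val result = for {
      first  <- firstFuture
      second <- secondFuture
      out    <- second match {
                  case Some(person) => function3(first, person).map(_ => s"Ok: processed $id")
                  case None         => Future.successful("NotFound")
                }
    } yield out
    result.recover { case t: Exception => s"InternalServerError: ${t.getMessage}" }
  }
}
```

This keeps the None case out of the exception path, so recover only has to deal with genuine failures.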