How to use Java libraries asynchronously in a Scala Play 2.0 application?

I see in the Play 2.0 Scala doc for calling web services that the idiomatic approach is to use Scala's asynchronous mechanisms to call web services. So if I'm using Java libraries for, say, downloading images from S3 and uploading to Facebook and Twitter (restfb and twitter4j), does this make for a highly inefficient use of resources (what resources?) or does it not make much difference (or no difference at all)?
If it makes a difference, how would I go about making something like the following asynchronous? Is there a quick way, or would I have to write libraries from scratch?
Note this will be running on Heroku, if that matters in this discussion.
def tweetJpeg = Action(parse.urlFormEncoded) { request =>
  val form = request.body
  val folder = form("folder").head
  val mediaType = form("type").head
  val photo = form("photo").head
  val path = folder + "/" + mediaType + "/" + photo

  val config = Play.current.configuration
  val awsAccessKey = config.getString("awsAccessKey").get
  val awsSecretKey = config.getString("awsSecretKey").get
  val awsBucket = config.getString("awsBucket").get

  val awsCred = new BasicAWSCredentials(awsAccessKey, awsSecretKey)
  val amazonS3Client = new AmazonS3Client(awsCred)
  val obj = amazonS3Client.getObject(awsBucket, path)
  val stream = obj.getObjectContent()

  val twitterKey = config.getString("twitterKey").get
  val twitterSecret = config.getString("twitterSecret").get
  val token = form("token").head
  val secret = form("secret").head
  val tweet = form("tweet").head

  val cb = new ConfigurationBuilder()
  cb.setDebugEnabled(true)
    .setOAuthConsumerKey(twitterKey)
    .setOAuthConsumerSecret(twitterSecret)
    .setOAuthAccessToken(token)
    .setOAuthAccessTokenSecret(secret)

  val tf = new TwitterFactory(cb.build())
  val twitter = tf.getInstance()
  val status = new StatusUpdate(tweet)
  status.media(photo, stream)
  val twitResp = twitter.updateStatus(status)

  Logger.info("Tweeted " + twitResp.getText())
  Ok("Tweeted " + twitResp.getText())
}
def facebookJpeg = Action(parse.urlFormEncoded) { request =>
  val form = request.body
  val folder = form("folder").head
  val mediaType = form("type").head
  val photo = form("photo").head
  val path = folder + "/" + mediaType + "/" + photo

  val config = Play.current.configuration
  val awsAccessKey = config.getString("awsAccessKey").get
  val awsSecretKey = config.getString("awsSecretKey").get
  val awsBucket = config.getString("awsBucket").get

  val awsCred = new BasicAWSCredentials(awsAccessKey, awsSecretKey)
  val amazonS3Client = new AmazonS3Client(awsCred)
  val obj = amazonS3Client.getObject(awsBucket, path)
  val stream = obj.getObjectContent()

  val token = form("token").head
  val msg = form("msg").head

  val facebookClient = new DefaultFacebookClient(token)
  val fbClass = classOf[FacebookType]
  val param = com.restfb.Parameter.`with`("message", msg)
  val attachment = com.restfb.BinaryAttachment.`with`(photo + ".png", stream)
  val fbResp = facebookClient.publish("me/photos", fbClass, attachment, param)

  Logger.info("Posted " + fbResp.toString())
  Ok("Posted " + fbResp.toString())
}
My attempt at a guess:
Yes, it's better to do things asynchronously; you're tying up threads if you do everything synchronously. Threads are memory hogs, so your server can only have so many; the more that are tied up waiting, the fewer requests your server can respond to.
No, it's not a huge issue. With node.js (and Rails? Django?) it is a huge issue because there's only one thread, so blocking it blocks the whole web server. A JVM server is multithreaded, so you can still service new requests.
You can easily wrap the whole thing in a future, or do it more granularly, but that doesn't really buy you anything because you're calling the same blocking methods; you're just shifting the wait from one thread to another.
If those Java libraries offer asynchronous methods, you can wrap those in a future to get the real benefits of asynchrony (how would I do that?). Otherwise, yes, you're looking at writing them from the ground up.
I don't really know if running on Heroku matters. Is one dyno == one simultaneous request?

I think it's best to do these requests asynchronously for two main reasons:
high latency (network calls)
failures
With Play, you should use Akka actors to make your actions asynchronous; Akka provides great ways to deal with both of these concerns.
The problem with synchronous code is that it blocks the web server, so the server isn't available to other requests. Here we will do the waiting on threads unrelated to the web server.
You could do something like:
// you will have to write the TwitterActor
val twitterActor = Akka.system.actorOf(Props[TwitterActor], name = "twitter-actor")

def tweetJpeg = Action(parse.urlFormEncoded) { request =>
  val futureMessage = (twitterActor ? request.body).map {
    // Do something with the response from the actor
    case ... => ...
  }
  Async {
    futureMessage.map(message =>
      Ok("Tweeted " + message)
    )
  }
}
Your actor would receive the body and send back the response of the service.
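A minimal sketch of what such an actor could look like, assuming a small message class wrapping the form body (the Tweet class and the reply payload here are illustrative, not part of the original answer):

import akka.actor.Actor

// Hypothetical message type carrying the form body.
case class Tweet(form: Map[String, Seq[String]])

class TwitterActor extends Actor {
  def receive = {
    case Tweet(form) =>
      // Build the credentials and StatusUpdate exactly as in the question's
      // tweetJpeg action, then run the blocking updateStatus call here;
      // this actor's thread does the waiting, not a request thread.
      // val twitResp = twitter.updateStatus(status)
      sender ! "<text of the tweet>" // the reply travels back through the ask (?)
  }
}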
Moreover, with Akka you can tune your setup to have several actors available, add a circuit breaker, and so on.
To go further: http://doc.akka.io/docs/akka/2.1.2/scala/actors.html
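As to the question's "how would I do that?": even without actors, blocking calls can be moved off the request threads by wrapping them in a Future that runs on a dedicated execution context. A minimal sketch for Play 2.1 (the "blocking-io" dispatcher name and its thread-pool configuration in application.conf are assumptions, not from the original answer):

import scala.concurrent.Future
import play.api.Play.current
import play.api.libs.concurrent.Akka

// Look up a dispatcher reserved for blocking IO so that slow S3/Twitter
// calls never starve the default request threads. "blocking-io" is a
// hypothetical dispatcher you would configure in application.conf.
implicit val blockingContext = Akka.system.dispatchers.lookup("blocking-io")

def tweetJpegAsync = Action(parse.urlFormEncoded) { request =>
  Async {
    Future {
      // ... the original blocking S3 download and twitter.updateStatus code ...
      "<text of the tweet>"
    }.map(text => Ok("Tweeted " + text))
  }
}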
Ps: I never tried Play on Heroku, so I don't know the impact of a single dyno.

Related

Making HTTP post requests on Spark using foreachPartition

I need some help understanding the behaviour of the code below in Spark (using Scala and Databricks).
I have a dataframe (read from S3, if that matters), and I want to send its data via HTTP POST requests in batches of 1000 records at most. I repartitioned the dataframe to make sure each partition has no more than 1000 records, and I created a json column for each line (so I only need to put them into an array later on).
The trouble is in making the requests. I created a Serializable object using the following code:
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.http.client.methods.HttpPost
import org.apache.http.impl.client.HttpClientBuilder
import org.apache.http.HttpHeaders
import org.apache.http.entity.StringEntity
import org.apache.commons.io.IOUtils

object postObject extends Serializable {
  val client = HttpClientBuilder.create().build()
  val post = new HttpPost("https://my-cool-api-endpoint")
  post.addHeader(HttpHeaders.CONTENT_TYPE, "application/json")

  def makeHttpCall(row: Iterator[Row]) = {
    val json_str = """{"people": [""" + row.toSeq.map(x => x.getAs[String]("json")).mkString(",") + "]}"
    post.setEntity(new StringEntity(json_str))
    val response = client.execute(post)
    val entity = response.getEntity()
    println(Seq(response.getStatusLine.getStatusCode(), response.getStatusLine.getReasonPhrase()))
    println(IOUtils.toString(entity.getContent()))
  }
}
Now when I try the following:
postObject.makeHttpCall(data.head(2).toIterator)
It works like a charm. The requests go through, there is some output on the screen, and my API gets that data.
But when I try to put it in the foreachPartition:
data.foreachPartition { x =>
  postObject.makeHttpCall(x)
}
Nothing happens. There is no output on screen, and nothing arrives at my API. If I try to rerun it, almost all stages are just skipped. I believe, for some reason, it is just lazily evaluating my requests, but never actually performing them. I don't understand why, or how to force it.
postObject has 2 fields, client and post, which have to be serialized.
I'm not sure that client is serialized properly, and the post object is potentially mutated from several partitions (on the same worker). So many things could go wrong here.
I propose trying to remove postObject and inline its body into foreachPartition directly.
Addition:
Tried to run it myself:
sc.parallelize((1 to 10).toList).foreachPartition(row => {
  val client = HttpClientBuilder.create().build()
  val post = new HttpPost("https://google.com")
  post.addHeader(HttpHeaders.CONTENT_TYPE, "application/json")
  val json_str = """{"people": [""" + row.toSeq.map(x => x.toString).mkString(",") + "]}"
  post.setEntity(new StringEntity(json_str))
  val response = client.execute(post)
  val entity = response.getEntity()
  println(Seq(response.getStatusLine.getStatusCode(), response.getStatusLine.getReasonPhrase()))
  println(IOUtils.toString(entity.getContent()))
})
Ran it both locally and in cluster.
It completes successfully and prints 405 errors to worker logs.
So requests definitely hit the server.
foreachPartition returns nothing as the result. To debug your issue you can change it to mapPartitions:
val responseCodes = sc.parallelize((1 to 10).toList).mapPartitions(row => {
  val client = HttpClientBuilder.create().build()
  val post = new HttpPost("https://google.com")
  post.addHeader(HttpHeaders.CONTENT_TYPE, "application/json")
  val json_str = """{"people": [""" + row.toSeq.map(x => x.toString).mkString(",") + "]}"
  post.setEntity(new StringEntity(json_str))
  val response = client.execute(post)
  val entity = response.getEntity()
  println(Seq(response.getStatusLine.getStatusCode(), response.getStatusLine.getReasonPhrase()))
  println(IOUtils.toString(entity.getContent()))
  Iterator.single(response.getStatusLine.getStatusCode)
}).collect()

println(responseCodes.mkString(", "))
This code returns the list of response codes so you can analyze it.
For me it prints 405, 405 as expected.
There is a way to do this without having to find out what exactly is not serializable. If you want to keep the structure of your code, you can make all fields @transient lazy val. Also, any call with side effects should be wrapped in a block. For example:
@transient lazy val post = {
  val httpPost = new HttpPost("https://my-cool-api-endpoint")
  httpPost.addHeader(HttpHeaders.CONTENT_TYPE, "application/json")
  httpPost
}
That will delay the initialization of all fields until they are used by the workers. Each worker will have its own instance of the object, and you will be able to invoke the makeHttpCall method.
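Applied to the object from the question, a sketch might look like this (same endpoint as the question; the restructuring is the only change):

import org.apache.http.client.methods.HttpPost
import org.apache.http.impl.client.HttpClientBuilder
import org.apache.http.HttpHeaders

object postObject extends Serializable {
  // Nothing here is serialized; each executor JVM builds its own
  // client and post the first time makeHttpCall runs.
  @transient lazy val client = HttpClientBuilder.create().build()
  @transient lazy val post = {
    val httpPost = new HttpPost("https://my-cool-api-endpoint")
    httpPost.addHeader(HttpHeaders.CONTENT_TYPE, "application/json")
    httpPost
  }
}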

Large file download with Play framework

I have some download code that works fine when the file is not zipped, because I know the length; when I provide it, I think Play does not have to bring the whole file into memory while streaming, and it works. The code below works:
def downloadLocalBackup() = Action {
  val pathOfFile = "/opt/mydir/backups/big/backup"
  val file = new java.io.File(pathOfFile)
  val path: java.nio.file.Path = file.toPath
  val source: Source[ByteString, _] = FileIO.fromPath(path)
  logger.info("from local backup set the length in header as " + file.length())
  Ok.sendEntity(HttpEntity.Streamed(source, Some(file.length()), Some("application/zip")))
    .withHeaders("Content-Disposition" -> s"attachment; filename=backup")
}
I don't know how the streaming in the above case handles the difference in speed between disk reads (which are faster than the network). It never runs out of memory, even for large files. But when I use the code below, which uses a ZipOutputStream, I don't understand why it runs out of memory. The same 3GB file, when I try to zip it on the fly, does not work.
def downloadLocalBackup2() = Action {
  val pathOfFile = "/opt/mydir/backups/big/backup"
  val file = new java.io.File(pathOfFile)
  val path: java.nio.file.Path = file.toPath
  val enumerator = Enumerator.outputStream { os =>
    val zipStream = new ZipOutputStream(os)
    zipStream.putNextEntry(new ZipEntry("backup2"))
    val is = new BufferedInputStream(new FileInputStream(pathOfFile))
    val buf = new Array[Byte](1024)
    var len = is.read(buf)
    var totalLength = 0L
    var logged = false
    while (len >= 0) {
      zipStream.write(buf, 0, len)
      len = is.read(buf)
      if (!logged) {
        logged = true
        logger.info("logging the while loop just one time")
      }
    }
    is.close()
    zipStream.close()
  }
  logger.info("log right before sendEntity")
  val kk = Ok.sendEntity(HttpEntity.Streamed(
    Source.fromPublisher(Streams.enumeratorToPublisher(enumerator)).map(x => {
      val kk = Writeable.wByteArray.transform(x); kk
    }),
    None, Some("application/zip"))
  ).withHeaders("Content-Disposition" -> s"attachment; filename=backupfile.zip")
  kk
}
In the first example, Akka Streams handles all details for you. It knows how to read the input stream without loading the complete file in memory. That is the advantage of using Akka Streams as explained in the docs:
The way we consume services from the Internet today includes many instances of streaming data, both downloading from a service as well as uploading to it or peer-to-peer data transfers. Regarding data as a stream of elements instead of in its entirety is very useful because it matches the way computers send and receive them (for example via TCP), but it is often also a necessity because data sets frequently become too large to be handled as a whole. We spread computations or analyses over large clusters and call it “big data”, where the whole principle of processing them is by feeding those data sequentially—as a stream—through some CPUs.
...
The purpose [of Akka Streams] is to offer an intuitive and safe way to formulate stream processing setups such that we can then execute them efficiently and with bounded resource usage—no more OutOfMemoryErrors. In order to achieve this our streams need to be able to limit the buffering that they employ, they need to be able to slow down producers if the consumers cannot keep up. This feature is called back-pressure and is at the core of the Reactive Streams initiative of which Akka is a founding member.
In the second example, you are handling the input/output streams yourself, using the standard blocking API. I'm not 100% sure how writing to a ZipOutputStream works here, but it is possible that it is not flushing the writes and is accumulating everything before close. Note also that Enumerator.outputStream is not backpressure-aware: if the client downloads more slowly than you read and zip the file, the written bytes pile up in memory.
The good thing is that you don't need to handle this manually, since Akka Streams provides a way to gzip a Source of ByteStrings (note that gzip produces a .gz stream, not a .zip archive):
import javax.inject.Inject

import akka.util.ByteString
import akka.stream.scaladsl.{Compression, FileIO, Source}

import play.api.http.HttpEntity
import play.api.mvc.{BaseController, ControllerComponents}

class FooController @Inject()(val controllerComponents: ControllerComponents) extends BaseController {

  def download = Action {
    val pathOfFile = "/opt/mydir/backups/big/backup"
    val file = new java.io.File(pathOfFile)
    val path: java.nio.file.Path = file.toPath
    val source: Source[ByteString, _] = FileIO.fromPath(path)
    val gzipped = source.via(Compression.gzip)
    // The compressed length is unknown up front, so don't send the
    // original file length as Content-Length.
    Ok.sendEntity(HttpEntity.Streamed(gzipped, None, Some("application/gzip")))
      .withHeaders("Content-Disposition" -> s"attachment; filename=backup.gz")
  }
}
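If the download really must be a ZIP archive rather than gzip, Akka Streams' StreamConverters.asOutputStream offers a backpressure-aware bridge to blocking OutputStream code. A sketch under that assumption (blockingEc stands for an execution context you would dedicate to blocking IO):

import java.util.zip.{ZipEntry, ZipOutputStream}
import scala.concurrent.{ExecutionContext, Future}
import akka.stream.scaladsl.{Source, StreamConverters}
import akka.util.ByteString

// write() on the materialized OutputStream blocks when the HTTP response
// cannot keep up, so memory stays bounded while zipping on the fly.
def zipSource(path: java.nio.file.Path)(implicit blockingEc: ExecutionContext): Source[ByteString, _] =
  StreamConverters.asOutputStream().mapMaterializedValue { os =>
    Future {
      val zip = new ZipOutputStream(os)
      try {
        zip.putNextEntry(new ZipEntry("backup"))
        val in = java.nio.file.Files.newInputStream(path)
        try {
          val buf = new Array[Byte](8192)
          var len = in.read(buf)
          while (len >= 0) {
            zip.write(buf, 0, len)
            len = in.read(buf)
          }
        } finally in.close()
      } finally zip.close() // finishes the ZIP entry and closes the response stream
    }
  }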

How to use actor inside of spray route in REST service?

I'm trying to build an event-sourced service with a REST interface using Scala. I'm somewhat new to Scala, although I'm familiar with functional programming (Haskell at a beginner level).
So I've built a persistent actor and a view without major problems. The idea of actors is quite simple, I think.
object Main extends App {
  val system = ActorSystem("HelloSystem")
  val systemActor = system.actorOf(Props[SystemActor], name = "systemactor")
  val trajectoryView = system.actorOf(Props[TrajectoryView], name = "trajectoryView")

  var datas = List()
  val processData = ProcessData(0, List(1, 2, 3), Coordinates(50, 50))

  implicit val timeout = Timeout(5 seconds)

  def intialDatas(): List[ProcessData] =
    (for (i <- 1 to 3) yield ProcessData(i, List(1, 2, 3), Coordinates(50 + i, 50 + i)))(collection.breakOut)

  val command = RegisterProcessCommand(3, this.intialDatas())
  val id = Await.result(systemActor ? command, timeout.duration).asInstanceOf[String]
  println(id)

  systemActor ! MoveProcessCommand(4, ProcessData(4, List(3, 4, 5), Coordinates(54, 54)), id)
  val processes = Await.result(systemActor ? "get", timeout.duration).asInstanceOf[Set[Process]]
  println(processes)

  implicit val json4sFormats = DefaultFormats
  println(write(processes))
  println("*****************")
  systemActor ! "print"

  val getTrajectoryCommand = GetTrajectoryCommand(id)
  Thread.sleep(10000)
  trajectoryView ! "print"
  // val trajectory = Await.result(trajectoryView ? getTrajectoryCommand, timeout.duration).asInstanceOf[ListBuffer[Coordinates]]
  println("******* TRAJECTORY *********")
  trajectoryView ! "print"
  // println(trajectory)
  system.shutdown()
}
I've been able to create a script for playing with the actors I've created.
I've read the tutorials for spray routing, but I've been unable to grasp what exactly I should do to provide a REST interface for the actors I've created.
object Boot extends App {
  implicit val system = ActorSystem("example")
  val systemActor = system.actorOf(Props[SystemActor], name = "systemactor")
  val trajectoryView = system.actorOf(Props[TrajectoryView], name = "trajectoryView")
  val service = system.actorOf(Props[ProcessesService], "processes-rest-service")
  implicit val timeout = Timeout(5 seconds)
  IO(Http) ? Http.Bind(service, interface = "localhost", port = 8080)
}
And a service
class ProcessesService(systemActor: ActorRef) extends Actor with HttpService {
  def actorRefFactory = context
  def receive = runRoute(route)

  val json4sFormats = DefaultFormats
  implicit val timeout = Timeout(5 seconds)

  val route = path("processes") {
    get {
      respondWithMediaType(`application/json`) {
        complete {
          write(Await.result(systemActor ? "get", timeout.duration).asInstanceOf[Set[Process]])
        }
      }
    }
  }
}
I think I need to somehow pass the ActorRef for SystemActor to this ProcessesService, but I'm not sure how. Also, I'm not sure how I should return a response to the request. I understand that I need to somehow pass the "get" message to SystemActor through the ActorRef and then serialize the answer to JSON, but I don't know how to do that.
I would appreciate help!
In spray you can complete routes with a Future.
You should be able to do something like
complete { systemActor ? "get" }
Json serialization is a separate issue.
Oh, your question is vague. Yes, you need to be able to reference an actor within your routes. You could just import the val from Boot where you define it. They're just Scala variables, so where you put them is up to you.
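Putting both points together, a sketch of the wiring (passing the ActorRef through Props, and completing the route with the mapped future instead of Await; json4s write as in the question, and an implicit ExecutionContext in scope for map):

// In Boot: construct the service with the ActorRef instead of Props[ProcessesService]
val service = system.actorOf(Props(new ProcessesService(systemActor)), "processes-rest-service")

// In ProcessesService: complete with a Future, no Await
val route = path("processes") {
  get {
    respondWithMediaType(`application/json`) {
      complete {
        (systemActor ? "get").mapTo[Set[Process]].map(write(_))
      }
    }
  }
}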

How to close enumerated file?

Say, in an action I have:
val linesEnu = {
  val is = new java.io.FileInputStream(path)
  val isr = new java.io.InputStreamReader(is, "UTF-8")
  val br = new java.io.BufferedReader(isr)
  import scala.collection.JavaConversions._
  val rows: scala.collection.Iterator[String] = br.lines.iterator
  Enumerator.enumerate(rows)
}
Ok.feed(linesEnu).as(HTML)
How to close readers/streams?
There is an onDoneEnumerating callback that functions like finally (it will always be called whether or not the Enumerator fails). You can close the streams there.
val linesEnu = {
  val is = new java.io.FileInputStream(path)
  val isr = new java.io.InputStreamReader(is, "UTF-8")
  val br = new java.io.BufferedReader(isr)
  import scala.collection.JavaConversions._
  val rows: scala.collection.Iterator[String] = br.lines.iterator
  Enumerator.enumerate(rows).onDoneEnumerating {
    is.close()
    // ... Anything else you want to execute when the Enumerator finishes.
  }
}
The IO tools provided by Enumerator give you this kind of resource management out of the box—e.g. if you create an enumerator with fromStream, the stream is guaranteed to get closed after running (even if you only read a single line, etc.).
So for example you could write the following:
import play.api.libs.iteratee._
val splitByNl = Enumeratee.grouped(
  Traversable.splitOnceAt[Array[Byte], Byte](_ != '\n'.toByte) &>>
    Iteratee.consume()
) compose Enumeratee.map(new String(_, "UTF-8"))

def fileLines(path: String): Enumerator[String] =
  Enumerator.fromStream(new java.io.FileInputStream(path)).through(splitByNl)
It's a shame that the library doesn't provide a linesFromStream out of the box, but I personally would still prefer to use fromStream with hand-rolled splitting, etc. over using an iterator and providing my own resource management.
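For completeness, a hypothetical action using fileLines (the path is illustrative; the map re-adds the newlines that the splitter strips):

def lines = Action {
  Ok.chunked(fileLines("/opt/mydir/data.txt") &> Enumeratee.map(_ + "\n"))
}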

Scala/Akka WSResponse recursively call

I'm trying to parse some data from an API.
I have a recursive method that calls this method:
def getJsonValue(url: String): JsValue = {
  val builder = new com.ning.http.client.AsyncHttpClientConfig.Builder()
  val client = new play.api.libs.ws.ning.NingWSClient(builder.build())
  val newUrl = url.replace("\"", "").replace("|", "%7C").trim
  val response: Future[WSResponse] = client.url(newUrl).get()
  Await.result(response, Duration.create(10, "seconds")).json
}
Everything works well, but after 128 method calls I'm getting this warning:
WARNING: You are creating too many HashedWheelTimer instances. HashedWheelTimer is a shared resource that must be reused across the application, so that only a few instances are created.
After about 20 more calls I'm getting this exception:
23:24:57.425 [main] ERROR com.ning.http.client.AsyncHttpClient - Unable to instantiate provider com.ning.http.client.providers.netty.NettyAsyncHttpProvider. Trying other providers.
23:24:57.438 [main] ERROR com.ning.http.client.AsyncHttpClient - org.jboss.netty.channel.ChannelException: Failed to create a selector.
Questions:
1. I'm assuming the connections weren't closed, and therefore I can't create new ones?
2. What would be the correct and safe way to make those HTTP calls?
I had the same problem.
I found 2 interesting solutions:
1. Make sure you are not creating tons of clients without closing them.
2. The thread pool you are using may be causing this.
My piece of code (commenting out that line of code solved it; I'm now testing several configurations):
private[this] def withClient(block: NingWSClient => WSResponse): Try[WSResponse] = {
  val config = new NingAsyncHttpClientConfigBuilder().build()
  val clientConfig = new AsyncHttpClientConfig.Builder(config)
    // .setExecutorService(new ThreadPoolExecutor(5, 15, 30L, TimeUnit.SECONDS, new SynchronousQueue[Runnable]))
    .build()
  val client = new NingWSClient(clientConfig)
  val result = Try(block(client))
  client.close()
  result
}
To avoid this, you can use a different provider:
private AsyncHttpProvider httpProvider =new ApacheAsyncHttpProvider(config);
private AsyncHttpClient asyncHttpClient = new AsyncHttpClient(httpProvider,config);
I ran into this same problem. Before you call your recursive method, you should create the builder and the client once, and pass the client into the recursive method as a parameter, as well as into getJsonValue. This is what getJsonValue should look like:
def getJsonValue(url: String, client: NingWSClient): JsValue = {
  val newUrl = url.replace("\"", "").replace("|", "%7C").trim
  val response: Future[WSResponse] = client.url(newUrl).get()
  Await.result(response, Duration.create(10, "seconds")).json
}