Scala/Akka WSResponse recursive call - scala

I'm trying to parse some data from an API.
I have a recursive method that calls this method:
def getJsonValue( url: (String)): JsValue = {
val builder = new com.ning.http.client.AsyncHttpClientConfig.Builder()
val client = new play.api.libs.ws.ning.NingWSClient(builder.build())
val newUrl = url.replace("\"", "").replace("|", "%7C").trim
val response: Future[WSResponse] = client.url(newUrl).get()
Await.result(response, Duration.create(10, "seconds")).json
}
Everything works well, but after 128 method calls I get this warning:
WARNING: You are creating too many HashedWheelTimer instances. HashedWheelTimer is a shared resource that must be reused across the application, so that only a few instances are created.
After about 20 more calls I get this exception:
23:24:57.425 [main] ERROR com.ning.http.client.AsyncHttpClient - Unable to instantiate provider com.ning.http.client.providers.netty.NettyAsyncHttpProvider. Trying other providers.
23:24:57.438 [main] ERROR com.ning.http.client.AsyncHttpClient - org.jboss.netty.channel.ChannelException: Failed to create a selector.
Questions
1. I assume the connections are not being closed, and that is why I can't create new ones. Is that right?
2. What is the correct and safe way to make these HTTP calls?

I had the same problem.
I found 2 interesting solutions:
Make sure you are not creating tons of clients without closing them.
The thread pool you are using may be causing this.
My piece of code (commenting out that line solved it; I'm now testing several configurations):
private[this] def withClient(block: NingWSClient => WSResponse): Try[WSResponse] = {
val config = new NingAsyncHttpClientConfigBuilder().build()
val clientConfig = new AsyncHttpClientConfig.Builder(config)
// .setExecutorService(new ThreadPoolExecutor(5, 15, 30L, TimeUnit.SECONDS, new SynchronousQueue[Runnable]))
.build()
val client = new NingWSClient(clientConfig)
val result = Try(block(client))
client.close()
result
}

To avoid this, you can use a different provider:
private AsyncHttpProvider httpProvider =new ApacheAsyncHttpProvider(config);
private AsyncHttpClient asyncHttpClient = new AsyncHttpClient(httpProvider,config);

I ran into this same problem. Before you call your recursive method, create the builder and client once, and pass the client to the recursive method as well as to getJsonValue. This is what getJsonValue should look like:
def getJsonValue(url: String, client: NingWSClient): JsValue = {
// Reuse the client passed in; do not build a new one on every call.
val newUrl = url.replace("\"", "").replace("|", "%7C").trim
val response: Future[WSResponse] = client.url(newUrl).get()
Await.result(response, Duration.create(10, "seconds")).json
}
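The idea in this answer (build the expensive client once, pass it through the recursion, close it at the end) can be sketched with the standard library alone. StubClient, crawl, run and the example.invalid URLs below are hypothetical stand-ins for illustration, not part of Play WS; the counters only exist to make the number of instances visible.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for an expensive HTTP client such as NingWSClient.
final class StubClient extends AutoCloseable {
  StubClient.created.incrementAndGet()
  def fetch(url: String): String = s"""{"url":"$url"}"""
  override def close(): Unit = StubClient.closed.incrementAndGet()
}

object StubClient {
  val created = new AtomicInteger(0)
  val closed  = new AtomicInteger(0)
}

object SharedClientDemo {
  // The client is a parameter, so the recursion never constructs a new one.
  @annotation.tailrec
  def crawl(urls: List[String], client: StubClient, acc: List[String] = Nil): List[String] =
    urls match {
      case Nil          => acc.reverse
      case head :: tail => crawl(tail, client, client.fetch(head) :: acc)
    }

  // One client for the whole run, closed even if a fetch throws.
  def run(n: Int): List[String] = {
    val client = new StubClient
    try crawl(List.tabulate(n)(i => s"http://example.invalid/$i"), client)
    finally client.close()
  }
}
```

With this shape, 200 calls still create a single client, which is the property the HashedWheelTimer warning is asking for.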

Related

http4s - how create blaze client with limited count of threads?

I am trying to create a blaze client with a limited number of threads like this:
object ReactiveCats extends IOApp {
private val PORT = 8083
private val DELAY_SERVICE_URL = "http://localhost:8080"
// trying create client with limited number of threads
val clientPool: ExecutorService = Executors.newFixedThreadPool(64)
val clientExecutor: ExecutionContextExecutor = ExecutionContext.fromExecutor(clientPool)
private val httpClient = BlazeClientBuilder[IO](clientExecutor).resource
private val httpApp = HttpRoutes.of[IO] {
case GET -> Root / delayMillis =>
httpClient.use { client =>
client
.expect[String](s"$DELAY_SERVICE_URL/$delayMillis")
.flatMap(response => Ok(s"ReactiveCats: $response"))
}
}.orNotFound
// trying to create server on fixed thread pool
val serverPool: ExecutorService = Executors.newFixedThreadPool(64)
val serverExecutor: ExecutionContextExecutor = ExecutionContext.fromExecutor(serverPool)
// start server
override def run(args: List[String]): IO[ExitCode] =
BlazeServerBuilder[IO](serverExecutor)
.bindHttp(port = PORT, host = "localhost")
.withHttpApp(httpApp)
.serve
.compile
.drain
.as(ExitCode.Success)
}
full code and load-tests
 
But the load-test results look like one thread per request.
How can I restrict the number of threads for my blaze client?
There are two obvious things that are wrong with your code:
You're creating an Executor without shutting it down when you're done.
You're using the use method of the httpClient Resource inside the HTTP route, meaning that every time the route is called, it creates, uses and destroys the HTTP client. You should instead create it once during startup.
Executors, like any other resource (e.g. file handles), should always be allocated using Resource.make, like so:
val clientPool: Resource[IO, ExecutorService] = Resource.make(IO(Executors.newFixedThreadPool(64)))(ex => IO(ex.shutdown()))
val clientExecutor: Resource[IO, ExecutionContextExecutor] = clientPool.map(ExecutionContext.fromExecutor)
private val httpClient = clientExecutor.flatMap(ex => BlazeClientBuilder[IO](ex).resource)
The second problem can easily be fixed by allocating the httpClient before building the HTTP app:
private def httpApp(client: Client[IO]): Kleisli[IO, Request[IO], Response[IO]] = HttpRoutes.of[IO] {
case GET -> Root / delayMillis =>
client
.expect[String](s"$DELAY_SERVICE_URL/$delayMillis")
.flatMap(response => Ok(s"ReactiveCats: $response"))
}.orNotFound
…
override def run(args: List[String]): IO[ExitCode] =
httpClient.use { client =>
BlazeServerBuilder[IO](serverExecutor)
.bindHttp(port = PORT, host = "localhost")
.withHttpApp(httpApp(client))
.serve
.compile
.drain
.as(ExitCode.Success)
}
Another potential problem is that you're using IOApp, and it comes with its own thread pool. The best way to fix that is probably to mix in the IOApp.WithContext trait and implement this method:
override protected def executionContextResource: Resource[SyncIO, ExecutionContext] = ???
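The acquire-once, shut-down-at-the-end discipline described in this answer can be illustrated without cats-effect. The sketch below uses only the standard library; withFixedPool and distinctThreads are hypothetical helper names, and the pool and task sizes are arbitrary. It is not a replacement for Resource.make, only a way to see that a fixed pool really does bound the number of threads that run the work.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BoundedPoolDemo {
  // Acquire a fixed pool, run `use`, and always shut the pool down afterwards.
  def withFixedPool[A](threads: Int)(use: ExecutionContext => A): A = {
    val pool = Executors.newFixedThreadPool(threads)
    try use(ExecutionContext.fromExecutor(pool))
    finally pool.shutdown()
  }

  // Runs `tasks` small jobs and returns how many distinct threads executed them.
  def distinctThreads(threads: Int, tasks: Int): Int =
    withFixedPool(threads) { implicit ec =>
      val seen = ConcurrentHashMap.newKeySet[String]()
      val jobs = Future.traverse((1 to tasks).toList) { _ =>
        Future { seen.add(Thread.currentThread().getName); () }
      }
      Await.result(jobs, 30.seconds)
      seen.size
    }
}
```

Calling BoundedPoolDemo.distinctThreads(4, 100) reports at most 4 threads, however many tasks are submitted.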
Copied from my comment:
The answer to the performance issue was to configure the Blaze client properly; for me this was the .withMaxWaitQueueLimit(1024) parameter.

Making HTTP post requests on Spark using foreachPartition

I need some help understanding the behaviour of the code below in Spark (using Scala and Databricks).
I have a dataframe (read from S3, if that matters) and want to send its data by making HTTP POST requests in batches of at most 1000. So I repartitioned the dataframe to make sure each partition has no more than 1000 records. I also created a json column for each row (so I only need to put them in an array later on).
The trouble is with making the requests. I created the following Serializable object:
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.http.client.methods.HttpPost
import org.apache.http.impl.client.HttpClientBuilder
import org.apache.http.HttpHeaders
import org.apache.http.entity.StringEntity
import org.apache.commons.io.IOUtils
object postObject extends Serializable{
val client = HttpClientBuilder.create().build()
val post = new HttpPost("https://my-cool-api-endpoint")
post.addHeader(HttpHeaders.CONTENT_TYPE,"application/json")
def makeHttpCall(row: Iterator[Row]) = {
val json_str = """{"people": [""" + row.toSeq.map(x => x.getAs[String]("json")).mkString(",") + "]}"
post.setEntity(new StringEntity(json_str))
val response = client.execute(post)
val entity = response.getEntity()
println(Seq(response.getStatusLine.getStatusCode(), response.getStatusLine.getReasonPhrase()))
println(IOUtils.toString(entity.getContent()))
}
}
Now when I try the following:
postObject.makeHttpCall(data.head(2).toIterator)
It works like a charm. The requests go through, there is some output on the screen, and my API gets that data.
But when I try to put it in the foreachPartition:
data.foreachPartition { x =>
postObject.makeHttpCall(x)
}
Nothing happens. No output on screen, nothing arrives at my API. If I try to rerun it, almost all stages are just skipped. I believe that, for some reason, it is lazily evaluating my requests without actually performing them. I don't understand why, or how to force it.
postObject has 2 fields, client and post, which have to be serialized.
I'm not sure that client is serialized properly. The post object is potentially mutated from several partitions (on the same worker). So many things could go wrong here.
I propose trying to remove postObject and inlining its body into foreachPartition directly.
Addition:
Tried to run it myself:
sc.parallelize((1 to 10).toList).foreachPartition(row => {
val client = HttpClientBuilder.create().build()
val post = new HttpPost("https://google.com")
post.addHeader(HttpHeaders.CONTENT_TYPE,"application/json")
val json_str = """{"people": [""" + row.toSeq.map(x => x.toString).mkString(",") + "]}"
post.setEntity(new StringEntity(json_str))
val response = client.execute(post)
val entity = response.getEntity()
println(Seq(response.getStatusLine.getStatusCode(), response.getStatusLine.getReasonPhrase()))
println(IOUtils.toString(entity.getContent()))
})
Ran it both locally and in cluster.
It completes successfully and prints 405 errors to worker logs.
So requests definitely hit the server.
foreachPartition returns nothing as the result. To debug your issue you can change it to mapPartitions:
val responseCodes = sc.parallelize((1 to 10).toList).mapPartitions(row => {
val client = HttpClientBuilder.create().build()
val post = new HttpPost("https://google.com")
post.addHeader(HttpHeaders.CONTENT_TYPE,"application/json")
val json_str = """{"people": [""" + row.toSeq.map(x => x.toString).mkString(",") + "]}"
post.setEntity(new StringEntity(json_str))
val response = client.execute(post)
val entity = response.getEntity()
println(Seq(response.getStatusLine.getStatusCode(), response.getStatusLine.getReasonPhrase()))
println(IOUtils.toString(entity.getContent()))
Iterator.single(response.getStatusLine.getStatusCode)
}).collect()
println(responseCodes.mkString(", "))
This code returns the list of response codes so you can analyze it.
For me it prints 405, 405 as expected.
There is a way to do this without having to find out what exactly is not serializable. If you want to keep the structure of your code, you can make all fields @transient lazy val. Also, any call with side effects should be wrapped in a block. For example:
val post = {
val httpPost = new HttpPost("https://my-cool-api-endpoint")
httpPost.addHeader(HttpHeaders.CONTENT_TYPE,"application/json")
httpPost
}
That will delay the initialization of all fields until they are used by the workers. Each worker will have its own instance of the object, and you will be able to invoke the makeHttpCall method.
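The effect of @transient lazy val can be checked outside Spark with plain Java serialization. This is a minimal sketch under stated assumptions: Connection, Poster and roundTrip are hypothetical names, and Connection stands in for a non-serializable resource such as the HTTP client. It shows that the field is skipped when the object is written and rebuilt on first access after it is read back.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical non-serializable resource, standing in for an HTTP client.
final class Connection(val endpoint: String)

class Poster(val endpoint: String) extends Serializable {
  // Not written to the stream; rebuilt on first access after deserialization.
  @transient lazy val connection: Connection = new Connection(endpoint)
  def describe: String = s"POST to ${connection.endpoint}"
}

object TransientLazyDemo {
  // Serializes a value to bytes and reads it back, roughly what happens
  // when a closure's captured objects are shipped to another JVM.
  def roundTrip[A <: Serializable](value: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }
}
```

Without @transient, writing a Poster whose connection had already been initialized would fail with NotSerializableException, because Connection does not implement Serializable.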

Scala Jetty webapp on Heroku 404

I'm experimenting with a Scala web framework (Udash) and trying to run a toy example on Heroku. It runs without issues locally, following the instructions in the Heroku docs:
sbt compile stage
heroku local web
However, once deployed, any URL I type goes to 404, even the landing page of the app. These are the objects I am using:
object Launcher extends CrossLogging {
def main(args: Array[String]): Unit = {
val port = Properties.envOrElse("PORT", "5000").toInt
val server = new ApplicationServer(port, "frontend/target/UdashStatics/WebContent")
server.start()
logger.info(s"Application started...")
}
}
class ApplicationServer(val port: Int, resourceBase: String) {
private val server = new Server(port)
private val contextHandler = new ServletContextHandler
private val appHolder = createAppHolder()
contextHandler.setSessionHandler(new SessionHandler)
contextHandler.setGzipHandler(new GzipHandler)
contextHandler.getSessionHandler.addEventListener(new org.atmosphere.cpr.SessionSupport())
contextHandler.addServlet(appHolder, "/*")
server.setHandler(contextHandler)
def start(): Unit = server.start()
def stop(): Unit = server.stop()
private def createAppHolder() = {
val appHolder = new ServletHolder(new DefaultServlet)
appHolder.setAsyncSupported(true)
appHolder.setInitParameter("resourceBase", resourceBase)
appHolder
}
}
Is there any Heroku configuration/characteristic that I am missing?
EDIT
I tried to apply the suggested changes and ended up with the following ApplicationServer:
class ApplicationServer(val port: Int, val resourceBase: String) {
val server = new Server()
val connector = new ServerConnector(server)
connector.setPort(port)
server.addConnector(connector)
private val appHolder = createAppHolder()
val context = new ServletContextHandler(ServletContextHandler.SESSIONS)
context.setBaseResource(Resource.newResource(resourceBase))
context.setContextPath("/")
context.addServlet(appHolder, "/")
server.setHandler(context)
private def createAppHolder() = {
val appHolder = new ServletHolder("default", classOf[DefaultServlet])
appHolder.setInitParameter("dirAllowed", "true")
appHolder.setInitParameter("resourceBase", resourceBase)
appHolder
}
def start(): Unit = server.start()
def stop(): Unit = server.stop()
}
However, I still get Error 404 even on landing page after deploying to Heroku:
HTTP ERROR 404
Problem accessing /. Reason:
Not Found
When running the app locally I get to the landing page correctly.
Thank you!
A few things to adjust that might help you.
resourceBase as an init-parameter on DefaultServlet is for alternate static file serving.
Use ServletContextHandler.setBaseResource(Resource) instead.
Use Resource.newResource(String) to create a new Resource reference. This should be an absolute path on the filesystem, or an absolute URI reference: no relative paths or URI fragments.
The DefaultServlet must be on the url-pattern of "/", not "/*" (this is a servlet spec requirement).
The DefaultServlet must be named, and must have the name "default" (this is a servlet spec requirement, see link on point 1 for example).
Set ServletContextHandler.setContextPath("/") to be explicit about what base context-path you want to use.
Some observations:
Your example code will only serve static content out of the resourceBase.
Since you have no welcomeFiles configured it would serve <resourceBase>/index.html by default (if you don't specify a specific static resource you want to access)
You have a SessionListener setup (org.atmosphere.cpr.SessionSupport), but since there's nothing that would access a Session, that code is pretty much a no-op.
There's no dynamic results from a custom Servlet or Filter present in your example codebase.
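The point above about passing an absolute path rather than a relative one can be sketched with java.nio alone. ResourceBase, absolute and existingDir are hypothetical helper names; whether a wrong working directory is the actual cause of the 404 on Heroku is an assumption, not something the thread confirms. The sketch only shows how to resolve the relative directory and fail early if it does not exist.

```scala
import java.nio.file.{Files, Path, Paths}

object ResourceBase {
  // Turns a possibly relative directory name into an absolute, normalized path,
  // resolved against the current working directory of the JVM.
  def absolute(dir: String): Path = Paths.get(dir).toAbsolutePath.normalize

  // Returns the absolute path only if it is an existing directory, so a bad
  // working directory shows up at startup rather than as a 404 later.
  def existingDir(dir: String): Option[Path] =
    Some(absolute(dir)).filter(p => Files.isDirectory(p))
}
```

A launcher could log ResourceBase.existingDir("frontend/target/UdashStatics/WebContent") at startup and, when it is defined, hand the path's string form to Resource.newResource(String) as the answer suggests.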

java.io.IOException: WebSocket method must be a GET

I am trying to write a websocket client application that subscribes to a websocket URL, and I am using play-ws for this. But I get the exception below:
Exception in thread "main" java.io.IOException: WebSocket method must be a GET
Dependency used:
"com.typesafe.play" %% "play-ws" % "2.4.0-M1"
The code I used to get the websocket client is below:
trait PlayHelper {
val config = new NingAsyncHttpClientConfigBuilder(DefaultWSClientConfig()).build()
val builder = new AsyncHttpClientConfig.Builder(config)
val wsClient = new NingWSClient(builder.build())
def getBody(future: Future[WSResponse]) = {
val response = Await.result(future, Duration.Inf);
if (response.status != 200)
throw new Exception(response.statusText);
response.body
}
}
object Client extends PlayHelper with App{
def subscribe()={
val url = "ws://localhost:8080"
val body = getBody(wsClient.url(url).get())
Thread.sleep(1000)
println(s"body: $body")
}
subscribe()
}
I'm looking for help with this issue.
I don't think that play-ws supports websockets, you may want to use the AsyncHttpClient directly: https://github.com/AsyncHttpClient/async-http-client#websocket

How to use Java libraries asynchronously in a Scala Play 2.0 application?

I see in the Play 2.0 Scala doc for calling web services that the idiomatic approach is to use Scala's asynchronous mechanisms to call web services. So if I'm using Java libraries for, say, downloading images from S3 and uploading to Facebook and Twitter (restfb and twitter4j), does this make for a highly inefficient use of resources (what resources?) or does it not make much difference (or no difference at all)?
If it makes a difference, how would I go about making something like the following asynchronous? Is there a quick way, or would I have to write libraries from scratch?
Note this will be running on heroku, if that matters in this discussion.
def tweetJpeg = Action(parse.urlFormEncoded) { request =>
val form = request.body
val folder = form("folder").head
val mediaType = form("type").head
val photo = form("photo").head
val path = folder + "/" + mediaType + "/" + photo
val config = Play.current.configuration;
val awsAccessKey = config.getString("awsAccessKey").get
val awsSecretKey = config.getString("awsSecretKey").get
val awsBucket = config.getString("awsBucket").get
val awsCred = new BasicAWSCredentials(awsAccessKey, awsSecretKey)
val amazonS3Client = new AmazonS3Client(awsCred)
val obj = amazonS3Client.getObject(awsBucket, path)
val stream = obj.getObjectContent()
val twitterKey = config.getString("twitterKey").get
val twitterSecret = config.getString("twitterSecret").get
val token = form("token").head
val secret = form("secret").head
val tweet = form("tweet").head
val cb = new ConfigurationBuilder();
cb.setDebugEnabled(true)
.setOAuthConsumerKey(twitterKey)
.setOAuthConsumerSecret(twitterSecret)
.setOAuthAccessToken(token)
.setOAuthAccessTokenSecret(secret)
val tf = new TwitterFactory(cb.build())
val twitter = tf.getInstance()
val status = new StatusUpdate(tweet)
status.media(photo, stream)
val twitResp = twitter.updateStatus(status)
Logger.info("Tweeted " + twitResp.getText())
Ok("Tweeted " + twitResp.getText())
}
def facebookJpeg = Action(parse.urlFormEncoded) { request =>
val form = request.body
val folder = form("folder").head
val mediaType = form("type").head
val photo = form("photo").head
val path = folder + "/" + mediaType + "/" + photo
val config = Play.current.configuration;
val awsAccessKey = config.getString("awsAccessKey").get
val awsSecretKey = config.getString("awsSecretKey").get
val awsBucket = config.getString("awsBucket").get
val awsCred = new BasicAWSCredentials(awsAccessKey, awsSecretKey)
val amazonS3Client = new AmazonS3Client(awsCred)
val obj = amazonS3Client.getObject(awsBucket, path)
val stream = obj.getObjectContent()
val token = form("token").head
val msg = form("msg").head
val facebookClient = new DefaultFacebookClient(token)
val fbClass = classOf[FacebookType]
val param = com.restfb.Parameter.`with`("message", msg)
val attachment = com.restfb.BinaryAttachment.`with`(photo + ".png", stream)
val fbResp = facebookClient.publish("me/photos", fbClass, attachment, param)
Logger.info("Posted " + fbResp.toString())
Ok("Posted " + fbResp.toString())
}
My attempt at a guess:
Yes it's better to do things asynchronous; you're tying up threads if you do everything synchronously. Threads are memory hogs, so your server can only use so many; the more that are tied up waiting, the fewer requests your server can respond to.
No it's not a huge issue. With node.js (and Rails? Django?) it is a huge issue because there's only one thread and so it blocks your whole web server. A JVM server is multithreaded so you can still service new requests.
You can easily wrap the whole thing in a future, or do it more granularly, but that doesn't really buy you anything because you're calling the same methods, so you're just shifting the wait from one thread to another.
If those Java libraries offer asynchronous methods, you can wrap those in a future to get the real benefits of asynchrony (how to do this?). Otherwise, yes, you're looking at writing from the ground up.
Don't really know if running on heroku matters. Is one dyno == one simultaneous request?
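The "wrap it in a future" option mentioned in this answer can be sketched as follows, using only the standard library. blockingUpload is a hypothetical stand-in for a blocking call such as twitter.updateStatus or facebookClient.publish, and the pool size of 4 is arbitrary. As the answer says, the wait is only moved to another thread; the benefit is that this thread belongs to a dedicated pool rather than to the one serving requests.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BlockingOffloadDemo {
  // Hypothetical blocking call; the sleep simulates network latency.
  def blockingUpload(name: String): String = {
    Thread.sleep(50)
    s"uploaded $name"
  }

  // Runs every upload on a dedicated pool and collects the results in order.
  def uploadAll(names: List[String]): List[String] = {
    val pool = Executors.newFixedThreadPool(4) // size is arbitrary
    implicit val blockingEc: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val all = Future.traverse(names)(n => Future(blockingUpload(n)))
      // Await is used here only so the sketch returns a plain value;
      // a controller would return the Future instead of blocking on it.
      Await.result(all, 10.seconds)
    } finally pool.shutdown()
  }
}
```

In a real application the pool would be created once at startup and shared, not built per call as in this demo.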
I think it's best to do these requests asynchronously for two main reasons:
high latency (network calls)
failures
With Play, you should use Akka actors to implement your actions; they provide great ways to deal with these two concerns.
The problem with synchronous code is that it blocks the web server, so it won't be available to other requests. Here the waiting happens in other threads unrelated to the web server.
You could do something like:
// you will have to write the TwitterActor
val twitterActor = Akka.system.actorOf(Props[TwitterActor], name = "twitter-actor")
def tweetJpeg = Action(parse.urlFormEncoded) { request =>
val futureMessage = (twitterActor ? request.body).map {
// Do something with the response from the actor
case ... => ...
}
Async {
futureMessage.map( message =>
Ok("Tweeted " + message)
)
}
}
Your actor would receive the body and send back the response of the service.
Moreover with Akka, you can tune your process to have several actors available, have a circuit breaker ...
To go further: http://doc.akka.io/docs/akka/2.1.2/scala/actors.html
PS: I have never tried Play on Heroku, so I don't know the impact of a single dyno.