I just want to get an HTML page using spray - scala

But their documentation seems to assume I'm already familiar with Scala, Akka and Spray itself. I couldn't find out how to do this simple, basic thing, which I would love to see as a single snippet of code on their home page...
The only thing I could find is how to build a request with their spray-httpx:
import spray.httpx.RequestBuilder._
val req = Get("http://url")
The object doesn't have an operation to send itself anywhere, so I'm sure I'm supposed to use Akka to do it, but their documentation doesn't show the process. Please tell me how to do it. If spray-can can do the same thing (I know it can), I would prefer that way.

There is an example here: http://spray.io/documentation/1.1-SNAPSHOT/spray-client/
import akka.actor.ActorSystem
import scala.concurrent.Future
import spray.http._
import spray.client.pipelining._

implicit val system = ActorSystem()
import system.dispatcher // execution context for futures

val pipeline: HttpRequest => Future[HttpResponse] = sendReceive
val response: Future[HttpResponse] = pipeline(Get("http://spray.io/"))
and an even simpler example here (from the older spray-client wiki; HttpConduit was replaced by the pipelining API in later versions): https://github.com/spray/spray/wiki/spray-client
val conduit = new HttpConduit("github.com")
val responseFuture = conduit.sendReceive(HttpRequest(GET, uri = "/"))
In both cases you have to process the result as you would normally process a Future, e.g.:
for {response <- responseFuture} yield { someFunction(response) }
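Putting it together, here is a minimal end-to-end sketch based on the snippets above (spray-client 1.1-era API; printing the body and the shutdown call are my additions):

import akka.actor.ActorSystem
import scala.concurrent.Future
import spray.client.pipelining._
import spray.http._

object GetPage extends App {
  implicit val system = ActorSystem()
  import system.dispatcher // execution context for futures

  val pipeline: HttpRequest => Future[HttpResponse] = sendReceive
  val responseFuture: Future[HttpResponse] = pipeline(Get("http://spray.io/"))

  responseFuture.foreach { response =>
    println(response.entity.asString) // the HTML body as a String
    system.shutdown()                 // stop the actor system once we're done
  }
}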

Making HTTP post requests on Spark using foreachPartition

I need some help understanding the behaviour of the code below in Spark (using Scala and Databricks).
I have a dataframe (read from S3, if that matters), and I want to send that data by making HTTP post requests in batches of 1000 (at most). So I repartitioned the dataframe to make sure each partition has no more than 1000 records. I also created a JSON column for each line (so I only need to put them in an array later on).
The trouble is in making the requests. I created a Serializable object with the following code:
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.http.client.methods.HttpPost
import org.apache.http.impl.client.HttpClientBuilder
import org.apache.http.HttpHeaders
import org.apache.http.entity.StringEntity
import org.apache.commons.io.IOUtils

object postObject extends Serializable {
  val client = HttpClientBuilder.create().build()
  val post = new HttpPost("https://my-cool-api-endpoint")
  post.addHeader(HttpHeaders.CONTENT_TYPE, "application/json")

  def makeHttpCall(row: Iterator[Row]) = {
    val json_str = """{"people": [""" + row.toSeq.map(x => x.getAs[String]("json")).mkString(",") + "]}"
    post.setEntity(new StringEntity(json_str))
    val response = client.execute(post)
    val entity = response.getEntity()
    println(Seq(response.getStatusLine.getStatusCode(), response.getStatusLine.getReasonPhrase()))
    println(IOUtils.toString(entity.getContent()))
  }
}
Now when I try the following:
postObject.makeHttpCall(data.head(2).toIterator)
It works like a charm. The requests go through, there is some output on the screen, and my API gets that data.
But when I try to put it in the foreachPartition:
data.foreachPartition { x =>
  postObject.makeHttpCall(x)
}
Nothing happens. No output on screen, nothing arrives at my API. If I try to rerun it, almost all stages are just skipped. I believe that, for some reason, it is just lazily evaluating my requests, but not actually performing them. I don't understand why, nor how to force it.
postObject has two fields, client and post, which have to be serialized.
I'm not sure that client is serialized properly, and the post object is potentially mutated from several partitions (on the same worker). So many things could go wrong here.
I propose trying to remove postObject and inlining its body into foreachPartition directly.
Addendum:
I tried to run it myself:
sc.parallelize((1 to 10).toList).foreachPartition(row => {
  val client = HttpClientBuilder.create().build()
  val post = new HttpPost("https://google.com")
  post.addHeader(HttpHeaders.CONTENT_TYPE, "application/json")
  val json_str = """{"people": [""" + row.toSeq.map(x => x.toString).mkString(",") + "]}"
  post.setEntity(new StringEntity(json_str))
  val response = client.execute(post)
  val entity = response.getEntity()
  println(Seq(response.getStatusLine.getStatusCode(), response.getStatusLine.getReasonPhrase()))
  println(IOUtils.toString(entity.getContent()))
})
I ran it both locally and on a cluster.
It completes successfully and prints 405 errors to the worker logs.
So the requests definitely hit the server.
foreachPartition returns nothing as its result. To debug your issue you can change it to mapPartitions:
val responseCodes = sc.parallelize((1 to 10).toList).mapPartitions(row => {
  val client = HttpClientBuilder.create().build()
  val post = new HttpPost("https://google.com")
  post.addHeader(HttpHeaders.CONTENT_TYPE, "application/json")
  val json_str = """{"people": [""" + row.toSeq.map(x => x.toString).mkString(",") + "]}"
  post.setEntity(new StringEntity(json_str))
  val response = client.execute(post)
  val entity = response.getEntity()
  println(Seq(response.getStatusLine.getStatusCode(), response.getStatusLine.getReasonPhrase()))
  println(IOUtils.toString(entity.getContent()))
  Iterator.single(response.getStatusLine.getStatusCode)
}).collect()

println(responseCodes.mkString(", "))
This code returns the list of response codes so you can analyze it.
For me it prints 405, 405 as expected.
There is a way to do this without having to find out what exactly is not serializable. If you want to keep the structure of your code, you can make all fields @transient lazy val. Also, any call with side effects should be wrapped in a block. For example:
@transient lazy val post = {
  val httpPost = new HttpPost("https://my-cool-api-endpoint")
  httpPost.addHeader(HttpHeaders.CONTENT_TYPE, "application/json")
  httpPost
}
That will delay the initialization of all fields until they are used on the workers. Each worker will have its own instance of the object, and you will be able to invoke the makeHttpCall method.
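For reference, here is a sketch of how the whole postObject could look with that change (same hypothetical endpoint as in the question); building a fresh HttpPost per call also avoids mutating shared state across partitions:

import org.apache.spark.sql.Row
import org.apache.http.client.methods.HttpPost
import org.apache.http.impl.client.HttpClientBuilder
import org.apache.http.HttpHeaders
import org.apache.http.entity.StringEntity
import org.apache.commons.io.IOUtils

object postObject extends Serializable {
  // Not serialized with the closure; built lazily on each worker instead.
  @transient lazy val client = HttpClientBuilder.create().build()

  def makeHttpCall(rows: Iterator[Row]): Unit = {
    // A fresh request object per call: no shared mutable state between partitions.
    val post = new HttpPost("https://my-cool-api-endpoint")
    post.addHeader(HttpHeaders.CONTENT_TYPE, "application/json")
    val json_str = """{"people": [""" + rows.map(_.getAs[String]("json")).mkString(",") + "]}"
    post.setEntity(new StringEntity(json_str))
    val response = client.execute(post)
    println(Seq(response.getStatusLine.getStatusCode, response.getStatusLine.getReasonPhrase))
    println(IOUtils.toString(response.getEntity.getContent))
  }
}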

Change a materialized value in a source using the contents of the stream

Alpakka provides a great way to access dozens of different data sources. File-oriented sources such as HDFS and FTP sources are delivered as Source[ByteString, Future[IOResult]]. However, HTTP requests via Akka HTTP are delivered as entity streams of Source[ByteString, NotUsed]. In my use case, I would like to retrieve content from HTTP sources as Source[ByteString, Future[IOResult]] so I can build a unified resource fetcher that works with multiple schemes (hdfs, file, ftp and S3 in this case).
In particular, I would like to convert the Source[ByteString, NotUsed] source to Source[ByteString, Future[IOResult]], where I am able to calculate the IOResult from the incoming byte stream. There are plenty of methods like flatMapConcat and viaMat, but none seem to be able to extract details from the input stream (such as the number of bytes read) or initialise the IOResult structure properly. Ideally, I am looking for a method with the following signature that will update the IOResult as the stream comes in:
def matCalc(src: Source[ByteString, Any]): Source[ByteString, Future[IOResult]] = {
  src.someMatFoldMagic[ByteString, IOResult](IOResult.createSuccessful(0))((m, b) => m.withCount(m.count + b.length))
}
I can't recall any existing functionality that does this out of the box, but you can use the alsoToMat flow operator (surprisingly, I didn't find it in the Akka Streams docs, although you can look it up in the source code documentation and the Java API) together with Sink.fold to accumulate some value and provide it at the very end. E.g.:
def magic(source: Source[Int, Any]): Source[Int, Future[Int]] =
  source.alsoToMat(Sink.fold(0)((acc, _) => acc + 1))((_, f) => f)
The thing is that alsoToMat combines the input materialized value with the one provided in alsoToMat. At the same time, the values produced by the source are not affected by the sink in alsoToMat:
def alsoToMat[Mat2, Mat3](that: Graph[SinkShape[Out], Mat2])(matF: (Mat, Mat2) ⇒ Mat3): ReprMat[Out, Mat3] =
  viaMat(alsoToGraph(that))(matF)
It's not that hard to adapt this function to return an IOResult, which according to the source code is:
final case class IOResult(count: Long, status: Try[Done]) { ... }
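A minimal sketch of that adaptation, counting the bytes of a ByteString source (the function name is mine):

import akka.stream.IOResult
import akka.stream.scaladsl.{Sink, Source}
import akka.util.ByteString
import scala.concurrent.{ExecutionContext, Future}

// Count the bytes flowing through with Sink.fold, then map the resulting
// Future[Long] into an IOResult.
def withIoResult(src: Source[ByteString, Any])(implicit ec: ExecutionContext): Source[ByteString, Future[IOResult]] =
  src.alsoToMat(
    Sink.fold(0L)((acc, bs: ByteString) => acc + bs.length)
  )((_, bytesRead) => bytesRead.map(IOResult.createSuccessful))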
One last thing you need to pay attention to: you want your source to be
Source[ByteString, Future[IOResult]]
but if you want to carry this materialized value to the very end of the stream definition, and then do something based on the future's completion, that might be an error-prone approach. E.g., in this example I finish the work based on that future, so the last value will not be processed:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Keep, Sink, Source}

import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

object App extends App {
  private implicit val sys: ActorSystem = ActorSystem()
  private implicit val mat: ActorMaterializer = ActorMaterializer()
  private implicit val ec: ExecutionContext = sys.dispatcher

  val source: Source[Int, Any] = Source((1 to 5).toList)

  def magic(source: Source[Int, Any]): Source[Int, Future[Int]] =
    source.alsoToMat(Sink.fold(0)((acc, _) => acc + 1))((_, f) => f)

  val f = magic(source).throttle(1, 1.second).toMat(Sink.foreach(println))(Keep.left).run()
  f.onComplete(t => println(s"f1 completed - $t"))

  Await.ready(f, 5.minutes)
  mat.shutdown()
  sys.terminate()
}
This can be done by using a Promise for the materialized value propagation.
val completion = Promise[IOResult]()
val httpWithIoResult = http.mapMaterializedValue(_ => completion.future)
What is left now is to complete the completion promise when the relevant data becomes available.
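For example (a sketch under the same assumptions; `entity` stands in for the Source[ByteString, NotUsed] you got from Akka HTTP), you can count the bytes with a side sink and complete the promise from its materialized value:

import akka.NotUsed
import akka.stream.IOResult
import akka.stream.scaladsl.{Sink, Source}
import akka.util.ByteString
import scala.concurrent.{ExecutionContext, Future, Promise}

def propagateIoResult(entity: Source[ByteString, NotUsed])
                     (implicit ec: ExecutionContext): Source[ByteString, Future[IOResult]] = {
  val completion = Promise[IOResult]()
  entity
    .alsoTo(Sink.fold(0L)((acc, bs: ByteString) => acc + bs.length)
      .mapMaterializedValue { counted =>
        // Complete the promise once the stream finishes and the count is known.
        completion.completeWith(counted.map(IOResult.createSuccessful))
        NotUsed
      })
    .mapMaterializedValue(_ => completion.future)
}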
An alternative approach would be to drop down to the GraphStage API, where you get lower-level control of materialized value propagation. But even there, using Promises is often the chosen implementation for materialized value propagation. Take a look at built-in operator implementations like Ignore.

How to create a WSClient in Scala?

Hello, I'm writing Scala code to pull data from an API.
The data is paginated, so I'm pulling it sequentially.
Now I'm looking for a way to pull multiple pages in parallel, and I'm stuck on creating a WSClient programmatically instead of injecting it.
Does anyone have a solution for creating a WSClient?
I found AhcWSClient(), but it requires an implicit ActorSystem.
When you cannot inject one as suggested in the other answer, you can create a standalone WS client using:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import play.api.libs.ws._
import play.api.libs.ws.ahc.StandaloneAhcWSClient
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
val ws = StandaloneAhcWSClient()
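From there, a quick sketch of pulling several pages in parallel with that client, building on the snippet above (the paginated URLs are made up):

import scala.concurrent.Future

implicit val ec = system.dispatcher

// Hypothetical paginated endpoint; the futures all start immediately,
// so the pages are fetched concurrently.
val pages = (1 to 5).map(p => s"https://api.example.com/items?page=$p")
val responses: Future[Seq[StandaloneWSResponse]] =
  Future.sequence(pages.map(url => ws.url(url).get()))

responses.foreach { rs =>
  rs.foreach(r => println(r.status))
  ws.close()         // release the underlying AsyncHttpClient resources
  system.terminate()
}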
No need to reinvent the wheel here. And I'm not sure why you say you can't inject a WSClient. If you can inject a WSClient, then you could do something like this to run the requests in parallel:
class MyClient @Inject() (wsClient: WSClient)(implicit ec: ExecutionContext) {
  def getSomething(urls: Vector[String]): Future[Something] = {
    // The futures start executing immediately, so the requests run in parallel;
    // no need for .par here (and Future.sequence would not accept a parallel collection).
    val futures = urls.map { url =>
      wsClient.url(url).get()
    }
    Future.sequence(futures).map { responses =>
      // process responses here. You might want to fold them together
    }
  }
}

FS2 stream to unread InputStream

I'd like to convert an fs2.Stream to a java.io.InputStream so I can pass that input stream to an HTTP framework (Finch and Akka HTTP).
I found fs2.io.toInputStream, but this doesn't work (it prints nothing):
import java.io.{ByteArrayInputStream, InputStream}
import cats.effect.IO
import scala.concurrent.ExecutionContext.Implicits.global

object IOTest {
  def main(args: Array[String]): Unit = {
    val is: InputStream = new ByteArrayInputStream("test".getBytes)
    val stream: fs2.Stream[IO, Byte] = fs2.io.readInputStream(IO(is), 128)
    val test: Seq[InputStream] = stream.through(fs2.io.toInputStream).compile.toList.unsafeRunSync()
    println(scala.io.Source.fromInputStream(test.head).mkString)
  }
}
As far as I understand, when I run .unsafeRunSync() it consumes the whole stream, so even though it returns a Seq[InputStream], the underlying input stream is already consumed.
Is there any way I can convert fs2.Stream[IO, Byte] to java.io.InputStream without it being consumed?
Thanks!
The problem is that compile is being invoked prematurely. I'm sure that under the hood fs2.io.toInputStream does the correct thing and brackets the created InputStream, which means that the InputStream must be accessed inside the Stream itself (e.g., in a map/flatMap call):
val wire: fs2.Stream[IO, Byte] = ???

val result: fs2.Stream[IO, String] = for {
  is <- wire.through(fs2.io.toInputStream)
  str = scala.io.Source.fromInputStream(is).mkString // <--- use the InputStream here
} yield str

println(result.compile.lastOrError.unsafeRunSync()) // <--- compile at the _very_ end
Outputs:
test
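If the consumer is a callback-style API that takes an InputStream, the same idea works with evalMap; `serve` below is a hypothetical handler standing in for the framework call:

import java.io.InputStream
import cats.effect.IO
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical handler: whatever the HTTP framework needs to do with the stream.
def serve(is: InputStream): IO[Unit] =
  IO(println(scala.io.Source.fromInputStream(is).mkString))

val wire: fs2.Stream[IO, Byte] = ???

// The InputStream stays bracketed (open) for the duration of each serve call.
val done: IO[Unit] =
  wire.through(fs2.io.toInputStream).evalMap(serve).compile.drain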
It looks like Finch has fs2 support (https://github.com/finagle/finch/tree/master/fs2), Akka has its own stream implementation, and there are fs2 / Akka Streams interop libraries like https://github.com/krasserm/streamz/tree/master/streamz-converter.
So I recommend you take a look at those implementations, because they take care of the resources' life cycle. You probably don't need the whole library, but it can serve as a guideline.
And if you are starting in the "safe zone" with fs2, why move out of it? :)

How to serve an HTML file as the correct content type to a browser with akka-http?

I am looking into the akka-http documentation, and I find there how to serve HTML content with the low-level server API using HttpResponses. However, I cannot find any good examples of how to serve HTML content so that it is correctly rendered in the browser. The only thing I can find and get working is serving String content, as below. I found an example that imports
akka.http.scaladsl.marshallers.xml.ScalaXmlSupport._
but I cannot see that scaladsl contains marshallers (it contains marshalling).
import akka.http.scaladsl.Http
import akka.stream.ActorMaterializer
import akka.http.scaladsl.server.Directives._
import akka.actor.ActorSystem

object HttpAkka extends App {
  implicit val system = ActorSystem()
  implicit val materializer = ActorMaterializer()
  implicit val ec = system.dispatcher

  val route = get {
    pathEndOrSingleSlash {
      complete("<h1>Hello</h1>")
    }
  }

  Http().bindAndHandle(route, "localhost", 8080)
}
I found a relevant question here: Akka-http: Accept and Content-type handling.
I did not fully understand the answer in that link, but I tried the following (I eventually got this to work):
complete {
  HttpResponse(entity = HttpEntity(ContentTypes.`text/html(UTF-8)`, "<h1>Say Hello</h1>"))
}
as well as:
complete {
  respondWithHeader(RawHeader("Content-Type", "text/html(UTF-8)"))
  "<h1> Say hello</h1>"
}
The best way to handle this is probably to use ScalaXmlSupport to build a NodeSeq marshaller that will properly set the content type to text/html. To do that, you first need to add a new dependency, as ScalaXmlSupport is not included by default. Assuming you are using sbt, the dependency to add is as follows:
"com.typesafe.akka" %% "akka-http-xml-experimental" % "2.4.2"
Then, you can setup a route like so to return a Scala NodeSeq that will be flagged as text/html when akka sets the content type:
implicit val system = ActorSystem()
import system.dispatcher
implicit val mater = ActorMaterializer()
implicit val htmlMarshaller = ScalaXmlSupport.nodeSeqMarshaller(MediaTypes.`text/html` )
val route = {
(get & path("foo")){
val resp =
<html>
<body>
<h1>This is a test</h1>
</body>
</html>
complete(resp)
}
}
Http().bindAndHandle(route, "localhost", 8080
The trick to getting text/html vs. text/xml is to use the nodeSeqMarshaller method on ScalaXmlSupport, passing in MediaTypes.`text/html` as the media type. If you just import ScalaXmlSupport._, then the default marshaller that is in scope will set the content type to text/xml.