akka-http - Multipart/FormData stream batch size - scala

I'm receiving a Multipart/FormData request as a server. The request contains a Source[Multipart.FormData.BodyPart, Any], and each Multipart.FormData.BodyPart contains a Source[ByteString, Any] inside. I'm able to set a buffer size, but it is measured in number of elements. How can I account for the size of each ByteString element (the batch size)? I need to bound the whole buffer size in bytes.
entity(as[Multipart.FormData]) { formData =>
  formData
    .parts // Source[Multipart.FormData.BodyPart, Any]
    .flatMap { bodyPart =>
      bodyPart.entity.dataBytes.buffer(???, OverflowStrategy.backpressure) // Source[ByteString, Any]
    }
}

It's unclear what the motivation here is, but buffer's cousin batchWeighted is likely to be what you want.
bodyPart.entity.dataBytes
  .batchWeighted[ByteString](
    max = 1024, // maximum batch weight, in bytes
    costFn = (bs: ByteString) => bs.length,
    seed = (bs: ByteString) => bs // seed with the first element so it isn't dropped
  )(_ ++ _)
(Have not tested this: my mental compiler says "looks good!")
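For reference, here is a hedged sketch (untested, and not part of the answer) of wiring batchWeighted into the route from the question in place of buffer(???). The 1024-byte budget, the use of flatMapConcat, Sink.ignore as a placeholder, and an implicit Materializer in scope are all assumptions:
import akka.http.scaladsl.model.StatusCodes
import akka.http.scaladsl.server.Directives._
import akka.stream.scaladsl.{Sink, Source}
import akka.util.ByteString

entity(as[Multipart.FormData]) { formData =>
  val batched: Source[ByteString, Any] =
    formData.parts.flatMapConcat { bodyPart =>
      bodyPart.entity.dataBytes
        .batchWeighted[ByteString](
          max = 1024,                             // byte budget per emitted batch
          costFn = (bs: ByteString) => bs.length,
          seed = (bs: ByteString) => bs
        )(_ ++ _)
    }
  onSuccess(batched.runWith(Sink.ignore)) { _ => // Sink.ignore is a placeholder for real processing
    complete(StatusCodes.OK)
  }
}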

Related

Avoid synchronous calls in Akka Actors due to Await.result

Because I do some "complex" operations, I think my Actor has become synchronous. The main problem, I think, is that I use Await.result inside the method which returns the responses.
actor:
def process(subscribers: Set[ActorRef]): Receive = {
  case Join(ref)  => context become process(subscribers + ref)
  case Leave(ref) => context become process(subscribers - ref)
  case Push(request) =>
    val filteredSubscribers = (subscribers - sender())
      .filter(s => exists(s, request)) // just some actor filters
    filteredSubscribers.foreach { subscriber =>
      // here I have a Map with each actor's requests
      val actorOptions = getActorOptions(subscriber)
      subscriber ? getResponse(actorOptions, request)
    }
}
The problem is inside getResponse (I think).
def getResponse(actorOptions: JsValue, request: SocketRequest): JsValue = {
  (actorOptions \ "dashboardId").asOpt[Int] match {
    case Some(id) => {
      val response = widgetsService.getByDashboadId(id) map { widgets =>
        val widgetsResponse: List[Future[String]] = widgets.map(w => {
          widgetsService.getDataById(w.id) map {
            data => s"""{ "widgetId": ${w.id}, "data": $data }"""
          }
        })
        var responses: List[String] = List.empty
        widgetsResponse.foreach(f => {
          f.onComplete {
            case Success(value) => responses = value :: responses
            case Failure(e) => println(s"Something happened: ${e.getMessage}")
          }
        })
        // first time when I use Await.result
        // used to populate the responses list with data from all futures
        Await.result(Future.sequence(widgetsResponse), Duration.Inf)
        Json.parse(s"""{
          "dashboardId": $id,
          "widgets": [${responses.mkString(", ")}]
        }""".stripMargin)
      }
      // second time when I use Await.result
      // used to return a JsValue instead of a Future[JsValue]
      Await.result(response, Duration.Inf)
    }
    case None => buildDefaultJson // return a default json value, unimportant for this example
  }
}
Because of that, in the frontend, if I have 2 socket clients, the response for the second will be sent only after the first.
I found that I can obtain a "fake" increase in performance if I wrap the getResponse call in a Future inside my Actor.
filteredSubscribers.foreach { subscriber =>
  val actorOptions = getActorOptions(subscriber)
  Future(subscriber ? getResponse(actorOptions, request))
}
So, for both subscribers the action starts at the same time, but when the first reaches Await.result, the second is blocked until the first is done.
I need to avoid using Await.result there, but I don't know how to get the results of a list of futures without using a for-comprehension (because the list is built dynamically), for the first place where I use it.
Because Akka's ask operator (?) returns a Future[Any], I tried making my getResponse method return a JsValue directly, to be mapped afterwards into a Future[JsValue]. If I remove the second Await.result so that my method returns a Future[JsValue], then the actor ends up with a Future[Future[JsValue]], which I don't think is right.
After some more research and solutions found on SO, my code became:
Future.sequence(widgetsResponse) map { responses =>
  Json.parse(
    s"""
       |{
       |"dashboardId": $id,
       |"tableSourceId": $tableSourceId,
       |"widgets": [ ${responses.mkString(", ")}]
       |}""".stripMargin
  )
}
getResponse now returns a Future[JsValue], removing both Await.result calls, and the actor case becomes:
filteredSubscribers.foreach { subscriber =>
  val actorOptions = getActorOptions(subscriber)
  getResponse(actorOptions, request) map { data =>
    subscriber ? data
  }
}
I don't know why, but it still behaves synchronously. Could this be because of my subscribers type, Set[ActorRef]? I tried a parallel foreach and it looks like this solves my problem:
filteredSubscribers.par.foreach { subscriber =>
  val actorOptions = getActorOptions(subscriber)
  getResponse(actorOptions, request) map { data =>
    subscriber ? data
  }
}
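A minimal non-blocking sketch of that last step (an assumption on my part, not the original code): keep getResponse returning a Future[JsValue] and pipe each result to the subscriber when it completes, so nothing ever blocks the actor. The use of pipeTo instead of ask is a design choice; use ask only if the reply actually matters.
import akka.pattern.pipe

def process(subscribers: Set[ActorRef]): Receive = {
  case Push(request) =>
    import context.dispatcher // ExecutionContext for pipeTo
    val filteredSubscribers = (subscribers - sender()).filter(s => exists(s, request))
    filteredSubscribers.foreach { subscriber =>
      val actorOptions = getActorOptions(subscriber)
      getResponse(actorOptions, request).pipeTo(subscriber) // non-blocking delivery of the JsValue
    }
  // Join/Leave cases as before
}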

How to stream the response of dispatcher?

I have to export the CSV data. The volume of data is very high. So I am streaming the response from the microservice.
We hit our microservice using the Dispatch library.
def stream(method: String, urlString: String): Future[Source[ByteString, NotUsed]] =
  method match {
    case GET =>
      val request = Http(url(urlString))
      request.map { response =>
        response.getStatusCode match {
          case StatusOk => Source.single(ByteString(response.getResponseBody))
        }
      }
  }
This brings in all the data at once. To fix the issue, I would like to modify it so the data is streamed from here as well.
I searched a lot and found this question: Scala dispatch stream response line by line. But it has no answer.
Thanks, and any help will be appreciated.
After searching a lot, I read the response as an InputStream and converted it to an Akka Stream Source. It worked for me.
def stream(method: String, urlString: String): Future[Source[ByteString, Future[IOResult]]] =
  method match {
    case GET =>
      val futureStream = Http(url(urlString) > as.Response(_.getResponseBodyAsStream))
      futureStream.map { inputStream =>
        val source = () => inputStream
        StreamConverters.fromInputStream(source)
      }
  }
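For completeness, a hedged usage sketch (the URL, the target path, and the implicit ActorMaterializer / ExecutionContext are assumptions, not part of the answer): the returned Source can be drained incrementally, for example into a file, without ever buffering the whole body in memory.
import java.nio.file.Paths
import akka.stream.IOResult
import akka.stream.scaladsl.FileIO
import scala.concurrent.Future

val written: Future[IOResult] =
  stream(GET, "http://some-service/export.csv").flatMap { source =>
    source.runWith(FileIO.toPath(Paths.get("export.csv"))) // writes chunks as they arrive
  }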

Appending to given formdata for http post

I am calling the following helper function from an action to make a synchronous HTTP POST with a special key-value pair appended to the URL parameters in the form data:
def synchronousPost(url: String, formdata: Map[String, Seq[String]], timeout: Duration = Duration.create(30, TimeUnit.SECONDS)): String = {
  import ExecutionContext.Implicits.global
  val params = formdata.map { case (k, v) => "%s=%s".format(k, URLEncoder.encode(v.head, "UTF-8")) }.mkString("&")
  val future: Future[WSResponse] = ws.url(url).
    withHttpHeaders(("Content-Type", "application/x-www-form-urlencoded")).
    post(params)
  try {
    Await.result(future, timeout)
    future.value.get.get.body
  } catch {
    case ex: Exception =>
      Logger.error(ex.toString)
      ex.toString
  }
}
It is called like this:
def act = Action { request =>
  request.body.asFormUrlEncoded match {
    case Some(formdata) =>
      synchronousPost(url, formdata + ("handshake" -> List("ok"))) match {
Actually it is copy-pasted from some old gist and I am quite sure it can be rewritten in a cleaner way.
How can the line val params = ... be rewritten in a cleaner way? It seems to be low level.
The original formdata comes in from request.body.asFormUrlEncoded and I simply need to append the handshake parameter to the formdata-map and send the original request back to the sender to do the handshake.
Since formdata is a Map[String, Seq[String]], a data type for which a default WSBodyWritable is provided, you can simply use it as the request body directly:
val future: Future[WSResponse] = ws.url(url)
  .withHttpHeaders(("Content-Type", "application/x-www-form-urlencoded"))
  .post(formdata)
Incidentally, it's considered bad form to use Await when it's easy to make Play controllers return a Future[Result] using Action.async, e.g.:
def asyncPost(url: String, formdata: Map[String, Seq[String]]): Future[String] = {
  ws.url(url)
    .withHttpHeaders(("Content-Type", "application/x-www-form-urlencoded"))
    .post(formdata)
    .map(_.body)
}

def action = Action.async { request =>
  request.body.asFormUrlEncoded match {
    case Some(formdata) =>
      asyncPost(url, formdata + ("handshake" -> List("ok"))).map { body =>
        Ok("Here is the body: " + body)
      } recover {
        case e =>
          Logger.error(e.getMessage, e)
          InternalServerError(e.getMessage)
      }
    case None => Future.successful(BadRequest("No form data given"))
  }
}

playframework scala type mismatch while returning a Seq

I actually have a problem returning a Seq back to the frontend.
My code looks like this:
def getContentComponentsForProcessSteps: Action[AnyContent] = Action.async { implicit request =>
  println("-----------------------------------------------New Request--------------------------------------------------------")
  println(request.body.asJson)
  request.body.asJson.map(_.validate[ProcessStepIds] match {
    case JsSuccess(steps, _) =>
      val contentComponents: Seq[Future[Seq[Future[ContentComponentModel]]]] = steps.steps.map(stepId => { // foreach
        // Fetching all ContentComponent relations
        contentComponentDTO.getContentComponentsByStepId(stepId).map(contentComponents => { // Future[Seq[ContentComponent_ProcessStepTemplateModel]]
          // Iterate over the found relations
          contentComponents.map(contentComponent => { // foreach
            // Fetch the content component
            contentComponentDTO.getContentComponentById(contentComponent.contentComponent_id).flatMap(contentComponent => { // Future[Option[ContentComponentModel]]
              // Fetch the content component data for the types
              val res = getContentComponentDataforOneContentComponent(contentComponent.get)
              res.map(con => con)
            })
          })
        })
      })
      Future.sequence(contentComponents).map(eins => {
        println(eins)
        Ok(Json.obj("Content Components Return" -> "true", "result" -> eins))
      })
    case JsError(_) =>
      Future.successful {
        BadRequest("Can't fetch Content Components")
      }
    case _ => Future.successful {
      BadRequest("Can't fetch Content Components")
    }
  }).getOrElse(Future.successful {
    BadRequest("Can't fetch Content Components")
  })
}
The error I get is a type mismatch.
Thanks for any hint
Look at the type of eins: your error message is telling you that it is a Seq[Seq[Future[ContentComponentModel]]] and not simply a Seq like you thought.
There are two problems with this:
You can't write a Future (or in your case, a sequence of futures) to Json.
You need an implicit Writes in scope to convert your ContentComponentModel to a JSON value.
Depending on what you want your result to look like, you could try flattening eins and then using another Future.sequence, but I think what you really should be doing is changing a lot of your .map calls to .flatMap calls to avoid the nesting in the first place.
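A minimal sketch of that flatMap approach, assuming the DTO signatures implied by the question (getContentComponentsByStepId returning a Future[Seq[...]], getContentComponentById a Future[Option[...]]), an ExecutionContext in scope, and a hypothetical Json.writes for ContentComponentModel to cover the second point above:
// Assumed, not in the question: a Writes so the models can be serialized.
implicit val ccWrites: OWrites[ContentComponentModel] = Json.writes[ContentComponentModel]

val flatResult: Future[Seq[ContentComponentModel]] =
  Future.sequence(steps.steps.map { stepId =>
    contentComponentDTO.getContentComponentsByStepId(stepId).flatMap { relations =>
      Future.sequence(relations.map { relation =>
        contentComponentDTO.getContentComponentById(relation.contentComponent_id).flatMap {
          case Some(cc) => getContentComponentDataforOneContentComponent(cc)
          case None     => Future.failed(new NoSuchElementException("content component not found"))
        }
      })
    }
  }).map(_.flatten) // Seq[Seq[ContentComponentModel]] -> Seq[ContentComponentModel]

flatResult.map { components =>
  Ok(Json.obj("Content Components Return" -> "true", "result" -> Json.toJson(components)))
}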

Akka stream - Splitting a stream of ByteString into multiple files

I'm trying to split an incoming Akka stream of bytes (from the body of an http request, but it could also be from a file) into multiple files of a defined size.
For example, if I'm uploading a 10Gb file, it would create something like 10 files of 1Gb. The files would have randomly generated names. My issue is that I don't really know where to start, because all the responses and examples I've read either store the whole chunk in memory or use a delimiter based on a string. Except I can't really have "chunks" of 1Gb and then just write them to disk...
Is there any easy way to perform that kind of operation? My only idea would be to use something like this http://doc.akka.io/docs/akka/2.4/scala/stream/stream-cookbook.html#Chunking_up_a_stream_of_ByteStrings_into_limited_size_ByteStrings but transformed into something like FlowShape[ByteString, File], writing the chunks into a file myself until the max file size is reached, then creating a new file, and so on, and streaming back the created files. Which looks like an atrocious idea that doesn't use Akka properly...
Thanks in advance
I often revert to purely functional, non-akka techniques for problems such as this and then "lift" those functions into akka constructs. By this I mean I try to use only scala "stuff" and then wrap that stuff inside akka later on...
File Creation
Starting with the FileOutputStream creation based on "randomly generated names":
def randomFileNameGenerator : String = ??? //not specified in question

import java.io.FileOutputStream

val randomFileOutGenerator : () => FileOutputStream =
  () => new FileOutputStream(randomFileNameGenerator)
State
There needs to be some way of storing the "state" of the current file, e.g. the number of bytes already written:
case class FileState(byteCount : Int = 0,
                     fileOut : FileOutputStream = randomFileOutGenerator())
File Writing
First we determine if we'd breach the maximum file size threshold with the given ByteString:
import akka.util.ByteString
val isEndOfChunk : (FileState, ByteString, Int) => Boolean =
  (state, byteString, maxBytes) =>
    state.byteCount + byteString.length > maxBytes
We then have to write the function that creates a new FileState if we've maxed out the capacity of the current one or returns the current state if it is still below capacity:
val closeFileInState : FileState => Unit =
  (_ : FileState).fileOut.close()

val getCurrentFileState : (FileState, ByteString, Int) => FileState =
  (state, byteString, maxBytes) =>
    if (isEndOfChunk(state, byteString, maxBytes)) {
      closeFileInState(state)
      FileState()
    }
    else
      state
The only thing left is to write to the FileOutputStream:
val writeToFileAndReturn : (FileState, ByteString) => FileState =
  (fileState, byteString) => {
    fileState.fileOut write byteString.toArray
    fileState copy (byteCount = fileState.byteCount + byteString.size)
  }

//the signature ordering will become useful
def writeToChunkedFile(maxBytes : Int)(fileState : FileState, byteString : ByteString) : FileState =
  writeToFileAndReturn(getCurrentFileState(fileState, byteString, maxBytes), byteString)
Fold On Any GenTraversableOnce
In scala a GenTraversableOnce is any collection, parallel or not, that has the fold operators. These include Iterator, Vector, Array, Seq, scala Stream, ... The final writeToChunkedFile function perfectly matches the signature of GenTraversableOnce#foldLeft:
val anyIterable : Iterable[ByteString] = ???

val finalFileState = anyIterable.foldLeft(FileState())(writeToChunkedFile(maxBytes))
One final loose end; the last FileOutputStream needs to be closed as well. Since the fold will only emit that last FileState we can close that one:
closeFileInState(finalFileState)
Akka Streams
Akka's Flow gets its fold from FlowOps#fold, which happens to match the foldLeft signature used above. Therefore we can "lift" our regular functions into stream values, similar to the way we used the Iterable foldLeft:
import akka.stream.scaladsl.Flow
def chunkerFlow(maxBytes : Int) : Flow[ByteString, FileState, _] =
  Flow[ByteString].fold(FileState())(writeToChunkedFile(maxBytes))
The nice part about handling the problem with regular functions is that they can be used within other asynchronous frameworks beyond streams, e.g. Futures or Actors. You also don't need an akka ActorSystem in unit testing, just regular language data structures.
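For example, a quick check without any ActorSystem might look like this (a sketch; it assumes randomFileNameGenerator is implemented, and the file side effects still happen):
val chunks: List[ByteString] = List(ByteString("a" * 600), ByteString("b" * 600))
val endState = chunks.foldLeft(FileState())(writeToChunkedFile(maxBytes = 1024))
closeFileInState(endState) // close the last file, as discussed above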
import akka.stream.scaladsl.Sink
import scala.concurrent.Future
def byteStringSink(maxBytes : Int) : Sink[ByteString, _] =
  chunkerFlow(maxBytes) to (Sink foreach closeFileInState)
You can then use this Sink to drain the HttpEntity coming from an HttpRequest.
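For instance, a hedged sketch of that last step (the ActorSystem, the request value, and the 1 GB limit are assumptions, not part of the answer):
import akka.actor.ActorSystem
import akka.http.scaladsl.model.HttpRequest
import akka.stream.ActorMaterializer

implicit val system: ActorSystem = ActorSystem()
implicit val materializer: ActorMaterializer = ActorMaterializer()

def drainToChunkedFiles(request: HttpRequest): Unit = {
  request.entity.dataBytes.runWith(byteStringSink(maxBytes = 1024 * 1024 * 1024))
  () // materialized value ignored; the Sink's foreach stage closes the final FileState
}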
You could write a custom graph stage.
Your issue is similar to the one faced in Alpakka during upload to Amazon S3 (google "alpakka s3 connector"; they won't let me post more than 2 links).
For some reason, however, the S3 connector's DiskBuffer writes the entire incoming source of ByteStrings to a file before emitting the chunk for further stream processing.
What we want is something similar that limits a source of byte strings to a specific length. In their example, they limit the incoming Source[ByteString, _] to a source of fixed-size ByteStrings by maintaining a memory buffer. I adapted it to work with files.
The advantage of this is that you can use a dedicated thread pool for this stage to do the blocking IO. For a good reactive stream you want to keep blocking IO on a separate thread pool in the actor system; a sketch of wiring that up follows the stage code below.
PS: this does not try to make files of an exact size. So if we read 2KB extra into a 100MB file, we write those extra bytes to the current file rather than trying to achieve an exact size.
import java.io.{FileOutputStream, RandomAccessFile}
import java.nio.channels.FileChannel
import java.nio.file.Path

import akka.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler}
import akka.stream._
import akka.util.ByteString

case class MultipartUploadChunk(path: Path, size: Int, partNumber: Int)

// Starts writing the ByteStrings received from upstream to a file. Emits a chunk (with its path)
// after writing partSize bytes. Does not attempt to write an exact number of bytes.
class FileChunker(maxSize: Int, tempDir: Path, partSize: Int)
    extends GraphStage[FlowShape[ByteString, MultipartUploadChunk]] {

  assert(maxSize > partSize, "Max size should be larger than part size. ")

  val in: Inlet[ByteString] = Inlet[ByteString]("PartsMaker.in")
  val out: Outlet[MultipartUploadChunk] = Outlet[MultipartUploadChunk]("PartsMaker.out")

  override val shape: FlowShape[ByteString, MultipartUploadChunk] = FlowShape.of(in, out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
    new GraphStageLogic(shape) with OutHandler with InHandler {

      var partNumber: Int = 0
      var length: Int = 0
      var currentBuffer: Option[PartBuffer] = None

      override def onPull(): Unit =
        if (isClosed(in)) {
          emitPart(currentBuffer, length)
        } else {
          pull(in)
        }

      override def onPush(): Unit = {
        val elem = grab(in)
        length += elem.size
        val currentPart: PartBuffer = currentBuffer match {
          case Some(part) => part
          case None =>
            val newPart = createPart(partNumber)
            currentBuffer = Some(newPart)
            newPart
        }
        currentPart.fileChannel.write(elem.asByteBuffer)
        if (length > partSize) {
          emitPart(currentBuffer, length)
          // 3. increment the part number, reset the length.
          partNumber += 1
          length = 0
        } else {
          pull(in)
        }
      }

      override def onUpstreamFinish(): Unit =
        if (length > 0) emitPart(currentBuffer, length) // emit a part only if something is still left in the current buffer.

      private def emitPart(maybePart: Option[PartBuffer], size: Int): Unit = maybePart match {
        case Some(part) =>
          // 1. flush the part buffer and truncate the file.
          part.fileChannel.force(false)
          // not sure why we do this truncate.. but was being done in alpakka. also maybe safe to do.
          // val ch = new FileOutputStream(part.path.toFile).getChannel
          // try {
          //   println(s"truncating to size $size")
          //   ch.truncate(size)
          // } finally {
          //   ch.close()
          // }
          // 2. emit the part
          val chunk = MultipartUploadChunk(path = part.path, size = length, partNumber = partNumber)
          push(out, chunk)
          part.fileChannel.close() // TODO: probably close elsewhere.
          currentBuffer = None
          // complete the stage if in is closed.
          if (isClosed(in)) completeStage()
        case None => if (isClosed(in)) completeStage()
      }

      private def createPart(partNum: Int): PartBuffer = {
        val path: Path = partFile(partNum)
        // currentPart.deleteOnExit() // TODO: enable in prod. Requests that the file be deleted when the VM dies.
        PartBuffer(path, new RandomAccessFile(path.toFile, "rw").getChannel)
      }

      /**
       * Creates a file in the temp directory with the name bmcs-buffer-part-$partNumber.
       * @param partNumber the part number in the multipart upload.
       * @return the path of the created part file.
       * TODO: add a unique id to the file name, for multiple uploads.
       */
      private def partFile(partNumber: Int): Path =
        tempDir.resolve(s"bmcs-buffer-part-$partNumber.bin")

      setHandlers(in, out, this)
    }

  case class PartBuffer(path: Path, fileChannel: FileChannel) // TODO: see if you need a mapped byte buffer. Might be ok with just an output stream / channel.
}
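To get the dedicated thread pool mentioned above, the stage can be assigned its own dispatcher via stream attributes. A hedged sketch (the dispatcher name and the size values are assumptions; "file-io-dispatcher" would have to be configured in application.conf):
import java.nio.file.Files
import akka.stream.ActorAttributes
import akka.stream.scaladsl.Flow
import akka.util.ByteString

val tempDir = Files.createTempDirectory("chunks")
val chunkingFlow: Flow[ByteString, MultipartUploadChunk, _] =
  Flow
    .fromGraph(new FileChunker(maxSize = 1024 * 1024 * 1024, tempDir = tempDir, partSize = 100 * 1024 * 1024))
    .withAttributes(ActorAttributes.dispatcher("file-io-dispatcher")) // run the blocking file IO on its own pool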
The idiomatic way to split a ByteString stream into multiple files is to use Alpakka's LogRotatorSink. From the documentation:
This sink takes a function as parameter which returns a ByteString => Option[Path] function. If the generated function returns a path, the sink will rotate the file output to this new path and the actual ByteString will be written to this new file too. With this approach the user can define a custom stateful file generation implementation.
The following fileSizeRotationFunction is also from the documentation:
import java.nio.file.Files
import akka.util.ByteString

val fileSizeRotationFunction = () => {
  val max = 10 * 1024 * 1024
  var size: Long = max
  (element: ByteString) => {
    if (size + element.size > max) {
      val path = Files.createTempFile("out-", ".log")
      size = element.size
      Some(path)
    } else {
      size += element.size
      None
    }
  }
}
An example of its use:
import akka.stream.alpakka.file.scaladsl.LogRotatorSink

val source: Source[ByteString, _] = ???
source.runWith(LogRotatorSink(fileSizeRotationFunction))