How can I upload a file from a string in Play Framework? - scala

The Play Framework docs show that it is easy to upload a file:
https://www.playframework.com/documentation/2.5.x/ScalaWS
ws.url(url).post(
  Source(
    FilePart("hello", "hello.txt", Option("text/plain"), FileIO.fromFile(tmpFile)) ::
    DataPart("key", "value") :: List()
  )
)
But what if the file content is already in memory? Is there an alternative to FileIO.fromFile, such as a hypothetical FileIO.fromString(jsonStr)?
val jsonStr = """{ foo: "Bar"} """
ws.url(url).post(
  Source(
    FilePart("hello", "hello.json", Option("application/json"), FileIO.fromString(jsonStr)) ::
    DataPart("key", "value") :: List()
  )
)

All you need is a FilePart whose ref is a Source[ByteString].
Just use
Source.single(ByteString(jsonStr))
as the ref part.


Download pdf file from s3 using akka-stream-alpakka

I am trying to download a pdf file from S3 using the akka-stream-alpakka connector. I have the s3 path and try to download the pdf using a wrapper method over the alpakka s3Client.
def getSource(s3Path: String): Source[ByteString, NotUsed] = {
  val (source, _) = s3Client.download(s3Bucket, s3Path)
  source
}
From my main code, I call the above method and try to convert it to a file:
val file = new File("certificate.pdf")
val res: Future[IOResult] = getSource(data.s3PdfPath)
  .runWith(FileIO.toFile(file))
However, instead of it getting converted to a file, I am stuck with a value of type IOResult. Can someone please guide me as to where I am going wrong?
Nothing is actually going wrong: FileIO.toFile (or its non-deprecated replacement, FileIO.toPath) writes the bytes to the file as the stream runs, and the materialized Future[IOResult] only reports when and how the write finished. For example:
def download(bucket: String, bucketKey: String, filePath: String): Future[IOResult] = {
  val (s3Source: Source[ByteString, _], _) = s3Client.download(bucket, bucketKey)
  s3Source.toMat(FileIO.toPath(Paths.get(filePath)))(Keep.right)
    .run()
}

download(s3Bucket, key, newSigFilepath).onComplete {
  case Success(_)  => // the file at newSigFilepath is ready to use
  case Failure(ex) => // the download or write failed
}
Inspect the IOResult, and if successful you can use your file:
res.foreach {
  case IOResult(bytes, Success(_)) =>
    println(s"$bytes bytes written to $file")
    ... // do whatever you want with your file
  case _ =>
    println("some error occurred.")
}
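Since IOResult's status field is just a scala.util.Try, the success/failure inspection can be sketched in plain Scala. Here WriteResult is a hypothetical stand-in for IOResult, used only so the sketch runs without Akka:

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for akka.stream.IOResult: a byte count plus a Try status.
case class WriteResult(count: Long, status: Try[Unit])

// Pattern match on the status, mirroring the IOResult inspection above.
def describe(res: WriteResult): String = res match {
  case WriteResult(bytes, Success(_)) => s"$bytes bytes written"
  case WriteResult(_, Failure(e))     => s"write failed: ${e.getMessage}"
}
```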

Adding headers to an Akka HTTP HttpRequest

I have an existing Akka HTTP HttpRequest and I want to add two headers to it.
val req: HttpRequest = ???
val hs: Seq[HttpHeader] = Seq(RawHeader("a", "b"))
req.addHeaders(hs)
Expected:
a new HttpRequest object with the additional headers
Actual:
.addHeaders expects a java.lang.Iterable and does not compile.
What is the recommended way of doing this in Scala?
There is a workaround, but it's a bit kludgy:
req.withHeaders(req.headers ++ hs)
Running Scala 2.12.8 and Akka HTTP 10.1.7.
Another workaround that is maybe a tiny sliver less kludgy. This is approximately how addHeaders is defined in the source; I unfortunately have no clue why it is not exposed in the Scala API.
req.mapHeaders(_ ++ hs)
One alternative is to use foldLeft and addHeader:
val req: HttpRequest = ???
val hs: Seq[HttpHeader] = Seq(RawHeader("a", "b"))
hs.foldLeft(req)((r, h) => r.addHeader(h))
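The foldLeft trick works for any immutable type that only exposes a single-element add method. A plain-Scala sketch of the same pattern (Req here is a made-up stand-in for HttpRequest, just to illustrate the fold):

```scala
// Made-up stand-in for HttpRequest: immutable, with only a single-header add method.
case class Req(headers: Vector[(String, String)]) {
  def addHeader(h: (String, String)): Req = copy(headers = headers :+ h)
}

val req = Req(Vector("Host" -> "example.com"))
val hs  = Seq("a" -> "b", "c" -> "d")

// Thread the request through foldLeft, adding one header per step.
val reqWithHeaders = hs.foldLeft(req)((r, h) => r.addHeader(h))
```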
You can copy the existing HttpRequest into a new one, appending the new headers to the ones already present:
val req: HttpRequest = ???
val hs: Seq[HttpHeader] = Seq(RawHeader("a", "b"))
val reqWithHeaders: HttpRequest = req.copy(headers = req.headers ++ hs)

Streaming CSV Source with AKKA-HTTP

I am trying to stream data from MongoDB using reactivemongo-akkastream 0.12.1 and return the result as a CSV stream in one of the routes (using Akka HTTP).
I implemented that following the example here:
http://doc.akka.io/docs/akka-http/10.0.0/scala/http/routing-dsl/source-streaming-support.html#simple-csv-streaming-example
and it seems to work fine.
The only problem I am facing now is how to add headers to the output CSV file. Any ideas?
Thanks
Aside from the fact that that example isn't really a robust method of generating CSV (it doesn't provide proper escaping), you'll need to rework it a bit to add headers. Here's what I would do:
make a Flow to convert a Source[Tweet] to a source of CSV rows, e.g. a Source[List[String]]
concatenate it to a source containing your headers as a single List[String]
adapt the marshaller to render a source of rows rather than tweets
Here's some example code:
case class Tweet(uid: String, txt: String)

def getTweets: Source[Tweet, NotUsed] = ???

val tweetToRow: Flow[Tweet, List[String], NotUsed] =
  Flow[Tweet].map { t =>
    List(
      t.uid,
      t.txt.replaceAll(",", "."))
  }

// provide a marshaller from a row (List[String]) to a ByteString
implicit val tweetAsCsv = Marshaller.strict[List[String], ByteString] { row =>
  Marshalling.WithFixedContentType(ContentTypes.`text/csv(UTF-8)`, () =>
    ByteString(row.mkString(","))
  )
}

// enable csv streaming
implicit val csvStreaming = EntityStreamingSupport.csv()

val route = path("tweets") {
  val headers = Source.single(List("uid", "text"))
  val tweets: Source[List[String], NotUsed] = getTweets.via(tweetToRow)
  complete(headers.concat(tweets))
}
Update: if your getTweets method returns a Future, you can just map over its eventual value and prepend the headers that way, e.g.:
val route = path("tweets") {
  val headers = Source.single(List("uid", "text"))
  val rows: Future[Source[List[String], NotUsed]] = getTweets
    .map(tweets => headers.concat(tweets.via(tweetToRow)))
  complete(rows)
}
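On the escaping caveat mentioned above: if fields may contain commas, quotes, or line breaks, the row.mkString(",") step can be swapped for an RFC 4180-style quoting helper. A minimal sketch (not part of the original answer, just one way to do it):

```scala
// Quote a field if it contains a comma, quote, or line break; double any embedded quotes.
def escapeCsvField(s: String): String =
  if (s.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
    "\"" + s.replace("\"", "\"\"") + "\""
  else s

// Render one CSV row from a List[String] of fields.
def toCsvLine(row: List[String]): String = row.map(escapeCsvField).mkString(",")
```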

Using Spark to extract part of the string in CSV format

Spark newbie here and hopefully you guys can give me some help. Thanks!
I am trying to extract a URL from a CSV file, and the URL is located in the 16th column. The problem is that the URLs were written in a strange format, as you can see from the printout of the code below. What is the best approach to get the URL in the correct format?
case class log(time_stamp: String, url: String)

val logText = sc.textFile("hdfs://...")
  .map(s => s.split(","))
  .map(s => log(s(0).replaceAll("\"", ""), s(15).replaceAll("\"", "")))
  .toDF()
logText.registerTempTable("log")

val results = sqlContext.sql("SELECT * FROM log")
results.map(s => "URL: " + s(1)).collect().foreach(println)
URL: /XXX/YYY/ZZZ/http/www.domain.com/xyz/xyz
URL: /XX/YYY/ZZZ/https/sub.domain.com/xyz/xyz/xyz/xyz
URL: /XXX/YYY/ZZZ/http/www.domain.com/
URL: /VV/XXXX/YYY/ZZZ/https/sub.domain.com/xyz/xyz/xyz
You can try regexp_replace:
import org.apache.spark.sql.functions.regexp_replace
val df = sc.parallelize(Seq(
  (1L, "/XXX/YYY/ZZZ/http/www.domain.com/xyz/xyz"),
  (2L, "/XXX/YYY/ZZZ/https/sub.domain.com/xyz/xyz/xyz/xyz")
)).toDF("id", "url")

df
  .select(regexp_replace($"url", "^(/\\w+){3}/(https?)/", "$2://").alias("url"))
  .show(2, false)
// +--------------------------------------+
// |url |
// +--------------------------------------+
// |http://www.domain.com/xyz/xyz |
// |https://sub.domain.com/xyz/xyz/xyz/xyz|
// +--------------------------------------+
In Spark 1.4 you can try the Hive UDF:
df.selectExpr("""regexp_replace(url, '^(/\w+){3}/(https?)/','$2://') AS url""")
If the number of sections before http(s) can vary, adjust the regexp by replacing {3} with * or a range.
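Since Spark's regexp_replace follows Java regex semantics, the adjusted pattern can be checked with plain String.replaceAll. Here (/\w+)+ accepts any number of leading sections (sample paths taken from the question):

```scala
// (/\w+)+ matches one or more leading /section parts, then the /http/ or /https/ marker.
val pattern = "^(/\\w+)+/(https?)/"

val short = "/XXX/YYY/ZZZ/http/www.domain.com/".replaceAll(pattern, "$2://")
val long  = "/VV/XXXX/YYY/ZZZ/https/sub.domain.com/xyz/xyz/xyz".replaceAll(pattern, "$2://")
```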
The question comes down to parsing the long strings and extracting the domain name. This solution will work as long as none of the leading segments (XXX, YYYYY, etc.) is itself "http" or "https":
def getUrl(data: String): Option[String] = {
  val slidingPairs = data.split("/").sliding(2)
  slidingPairs.flatMap {
    case Array(x, y) =>
      if (x == "http" || x == "https") Some(y) else None
    case _ => None // fewer than two segments
  }.toList.headOption
}
Here are some examples in the REPL:
scala> getUrl("/XXX/YYY/ZZZ/http/www.domain.com/xyz/xyz")
res8: Option[String] = Some(www.domain.com)
scala> getUrl("/XXX/YYY/ZZZ/https/sub.domain.com/xyz/xyz/xyz/xyz")
resX: Option[String] = Some(sub.domain.com)
scala> getUrl("/XXX/YYY/ZZZ/a/asdsd/asdase/123123213/https/sub.domain.com/xyz/xyz/xyz/xyz")
resX: Option[String] = Some(sub.domain.com)

Converting a String to a Map

Given a String: {'Name':'Bond','Job':'Agent','LastEntry':'15/10/2015 13:00'}
I want to parse it into a Map[String, String]. I already tried this answer, but it doesn't work when the character ':' is inside the parsed value. Same thing with the ''' character; it seems to break every JSON mapper...
Thanks for any help.
Let
val s0 = "{'Name':'Bond','Job':'Agent','LastEntry':'15/10/2015 13:00'}"
val s = s0.stripPrefix("{").stripSuffix("}")
Then
(for (e <- s.split(",") ; xs = e.split(":",2)) yield xs(0) -> xs(1)).toMap
Here we split each key-value pair at the first occurrence of ":". This relies on the strong assumption that the key does not contain any ":" (the value may).
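If you also want the surrounding single quotes dropped from the keys and values, one more stripPrefix/stripSuffix pass does it; a sketch extending the snippet above:

```scala
val s0 = "{'Name':'Bond','Job':'Agent','LastEntry':'15/10/2015 13:00'}"
val s  = s0.stripPrefix("{").stripSuffix("}")

// Remove the single quotes wrapping each key and value.
def unquote(x: String): String = x.stripPrefix("'").stripSuffix("'")

// Split on "," for entries, then on the first ":" only, so values may contain ":".
val parsed: Map[String, String] =
  (for (e <- s.split(",") ; xs = e.split(":", 2)) yield unquote(xs(0)) -> unquote(xs(1))).toMap
```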
You can use the familiar jackson-module-scala, which handles this at much larger scale. One caveat: single quotes are not valid JSON, so the parser's ALLOW_SINGLE_QUOTES feature must be enabled.
For example:
val src = "{'Name':'Bond','Job':'Agent','LastEntry':'15/10/2015 13:00'}"
val mapper = new ObjectMapper() with ScalaObjectMapper
mapper.registerModule(DefaultScalaModule)
mapper.configure(JsonParser.Feature.ALLOW_SINGLE_QUOTES, true)
val myMap = mapper.readValue[Map[String, String]](src)