I am very new to akka-http, and I would like to stream a CSV with an arbitrary number of lines.
For instance, I would like to return:
a,1
b,2
c,3
with the following code:
implicit val actorSystem = ActorSystem("system")
implicit val actorMaterializer = ActorMaterializer()
val map = new mutable.HashMap[String, Int]()
map.put("a", 1)
map.put("b", 2)
map.put("c", 3)
val `text/csv` = ContentType(MediaTypes.`text/csv`, `UTF-8`)
val route =
  path("test") {
    complete {
      HttpEntity(`text/csv`, ??? /* using map */)
    }
  }

Http().bindAndHandle(route, "localhost", 8080)
Thanks for your help
EDIT: Thanks to Ramon J Romero y Vigil
package test
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.HttpCharsets.`UTF-8`
import akka.http.scaladsl.model._
import akka.http.scaladsl.server.Directives._
import akka.stream._
import akka.util.ByteString
import scala.collection.mutable
object Test {
  def main(args: Array[String]) {
    implicit val actorSystem = ActorSystem("system")
    implicit val actorMaterializer = ActorMaterializer()

    val map = new mutable.HashMap[String, Int]()
    map.put("a", 1)
    map.put("b", 2)
    map.put("c", 3)

    val mapStream = Stream.fromIterator(() => map.toIterator)
      .map((k: String, v: Int) => s"$k,$v")
      .map(ByteString.apply)

    val `text/csv` = ContentType(MediaTypes.`text/csv`, `UTF-8`)

    val route =
      path("test") {
        complete {
          HttpEntity(`text/csv`, mapStream)
        }
      }

    Http().bindAndHandle(route, "localhost", 8080)
  }
}
With this code I have two compile errors:
Error:(29, 28) value fromIterator is not a member of object scala.collection.immutable.Stream
val mapStream = Stream.fromIterator(() => map.toIterator)
Error:(38, 11) overloaded method value apply with alternatives:
(contentType: akka.http.scaladsl.model.ContentType,file: java.io.File,chunkSize: Int)akka.http.scaladsl.model.UniversalEntity <and>
(contentType: akka.http.scaladsl.model.ContentType,data: akka.stream.scaladsl.Source[akka.util.ByteString,Any])akka.http.scaladsl.model.HttpEntity.Chunked <and>
(contentType: akka.http.scaladsl.model.ContentType,data: akka.util.ByteString)akka.http.scaladsl.model.HttpEntity.Strict <and>
(contentType: akka.http.scaladsl.model.ContentType,bytes: Array[Byte])akka.http.scaladsl.model.HttpEntity.Strict <and>
(contentType: akka.http.scaladsl.model.ContentType.NonBinary,string: String)akka.http.scaladsl.model.HttpEntity.Strict
cannot be applied to (akka.http.scaladsl.model.ContentType.WithCharset, List[akka.util.ByteString])
HttpEntity(`text/csv`, mapStream)
I used a List of tuples to get around the first issue (however, I still do not know how to stream a Map in Scala).
I have no idea about the second error.
Thanks for your help.
(I am using scala 2.11.8)
Use the HttpEntity.apply overload that takes a Source[ByteString, Any]; it creates a Chunked entity. You can read your file using code based on the documentation for streaming file IO with an Akka Streams Source:
import java.nio.file.Paths
import akka.stream.scaladsl._

val file = Paths.get("yourFile.csv")
val entity = HttpEntity(`text/csv`, FileIO.fromPath(file))
The stream will break your file up into chunks; the default chunk size is currently 8192 bytes.
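If you need a different chunk size, FileIO.fromPath takes it as an optional second parameter; a small sketch with an assumed 16 KiB chunk size:
val bigChunkEntity = HttpEntity(`text/csv`, FileIO.fromPath(file, chunkSize = 16384))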
To stream the map that you've created you can use a similar trick:
val mapStream = Source.fromIterator(() => map.toIterator)
  .map { case (k, v) => s"$k,$v\n" } // the iterator yields (key, value) tuples; the newline keeps the rows separate
  .map(ByteString.apply)

val mapEntity = HttpEntity(`text/csv`, mapStream)
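Putting the pieces together, here is a minimal sketch of the corrected program from the EDIT above (assuming Akka HTTP 10.x on Scala 2.11, as in the question); the two key changes are using akka.stream.scaladsl.Source instead of scala.collection.immutable.Stream, and pattern-matching on the (key, value) tuple:
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.HttpCharsets.`UTF-8`
import akka.http.scaladsl.model._
import akka.http.scaladsl.server.Directives._
import akka.stream._
import akka.stream.scaladsl.Source
import akka.util.ByteString
import scala.collection.mutable

object Test {
  def main(args: Array[String]): Unit = {
    implicit val actorSystem = ActorSystem("system")
    implicit val actorMaterializer = ActorMaterializer()

    val map = mutable.HashMap("a" -> 1, "b" -> 2, "c" -> 3)

    // An Akka Streams Source, not a scala.collection.immutable.Stream
    val mapStream = Source.fromIterator(() => map.toIterator)
      .map { case (k, v) => s"$k,$v\n" }
      .map(ByteString.apply)

    val `text/csv` = ContentType(MediaTypes.`text/csv`, `UTF-8`)

    val route =
      path("test") {
        complete {
          // The (ContentType, Source[ByteString, Any]) overload builds a Chunked entity,
          // which is why passing a List[ByteString] matched no HttpEntity.apply overload
          HttpEntity(`text/csv`, mapStream)
        }
      }

    Http().bindAndHandle(route, "localhost", 8080)
  }
}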
Related
I'm trying to write a Circe encoder for an object which has a field of scala.collection.immutable.SortedMultiDict. Circe can't find an encoder instance for that, so I need to write one.
import io.circe.{Decoder, Encoder, HCursor}
import io.circe.generic.semiauto._
import io.circe.parser.decode
import scala.collection.immutable.SortedMultiDict
import io.circe.syntax._
implicit val mapEncoder: Encoder[List[(Long, String)]] = deriveEncoder[List[(Long, String)]]
implicit val mapDecoder: Decoder[List[(Long, String)]] = deriveDecoder[List[(Long, String)]]
implicit val oneEncoder: Encoder[SortedMultiDict[Long, String]] = (a: SortedMultiDict[Long, String]) =>
  mapEncoder(a.toList)
implicit val oneDecoder: Decoder[SortedMultiDict[Long, String]] = (c: HCursor) =>
  mapDecoder.map(SortedMultiDict.from[Long, String])(c)
Sadly, this isn't correct...
val test = SortedMultiDict.from[Long, String](Seq(1666268475626L -> "a5d9f51d-35c7-4fef-b4a3-3d28944eeb2b", 1666268475626L -> "df359396-043c-4b65-bc3 -bf309d433ff5"))
val encodedData = test.asJson.noSpaces
val roundTrip = decode[SortedMultiDict[Long, String]](encodedData)
results in
scala> roundTrip
val res2: Either[io.circe.Error,scala.collection.immutable.SortedMultiDict[Long,String]] = Left(DecodingFailure(Attempt to decode value on failed cursor, List(DownField(head), DownField(::))))
In fact, the derived list encoder doesn't appear to work...
scala> val myList = List((1666268475626L, "a5d9f51d-35c7-4fef-b4a3-3d28944eeb2b"), (1666268475626L, "df359396-043c-4b65-bc3 -bf309d433ff5"))
val myList: List[(Long, String)] = List((1666268475626,a5d9f51d-35c7-4fef-b4a3-3d28944eeb2b), (1666268475626,df359396-043c-4b65-bc3 -bf309d433ff5))
scala> decode[List[(Long, String)]](myList.asJson.noSpaces)
val res0: Either[io.circe.Error,List[(Long, String)]] = Left(DecodingFailure(Attempt to decode value on failed cursor, List(DownField(head), DownField(::))))
Are my expectations of how to do the round trip of encoding/decoding wrong? It's what I'd understood from Circe's codec docs.
EDIT: Well, it works if I change the map codecs to be:
implicit val mapEncoder: Encoder[List[(Long, String)]] = Encoder.encodeList[(Long, String)]
implicit val mapDecoder: Decoder[List[(Long, String)]] = Decoder.decodeList[(Long, String)]
I still don't really understand why the earlier ones don't work, though; explanations welcome...
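For what it's worth, the working codecs from the EDIT can be generalised; a sketch (assuming circe's built-in List and Tuple2 instances, and scala-collection-contrib's SortedMultiDict, as in the question):
import io.circe.{Decoder, Encoder}
import scala.collection.immutable.SortedMultiDict

// Encode the dict as a JSON array of [key, value] pairs, reusing circe's stock instances,
// and rebuild it with SortedMultiDict.from (which needs an Ordering on the keys).
implicit def multiDictEncoder[K: Encoder, V: Encoder]: Encoder[SortedMultiDict[K, V]] =
  Encoder.encodeList[(K, V)].contramap(_.toList)

implicit def multiDictDecoder[K: Decoder: Ordering, V: Decoder]: Decoder[SortedMultiDict[K, V]] =
  Decoder.decodeList[(K, V)].map(SortedMultiDict.from(_))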
In Akka, I want to push elements into a stream and get a result object back for each one. I know the elements could be used as a Source to run a graph, but how can I push an element in and get its result back at runtime?
import akka.actor.ActorSystem
import akka.stream.QueueOfferResult.{Dropped, Enqueued, Failure, QueueClosed}
import akka.stream.{ActorMaterializer, OverflowStrategy}
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.Array.range
import scala.util.Success
object StreamElement {
  implicit val system = ActorSystem("StreamElement")
  implicit val materializer = ActorMaterializer()
  implicit val executionContext = system.dispatcher

  def main(args: Array[String]): Unit = {
    val (queue, value) = Source
      .queue[Int](10, OverflowStrategy.backpressure)
      .map(x => x * x)
      .toMat(Sink.asPublisher(false))(Keep.both)
      .run()

    range(0, 10)
      .map { x =>
        queue.offer(x).onComplete {
          case Success(Enqueued) =>
          case Success(Dropped) =>
          case _ => println("others")
        }
      }
  }
}
How can I get the value returned?
Actually, you want to get the Int result back for each element.
So you can create the Flow once and then connect it to a Source and a Sink for each element.
package tech.parasol.scala.akka
import akka.actor.ActorSystem
import akka.stream.QueueOfferResult.{Dropped, Enqueued, Failure, QueueClosed}
import akka.stream.{ActorMaterializer, OverflowStrategy}
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import scala.Array.range
import scala.util.Success
object StreamElement {
  implicit val system = ActorSystem("StreamElement")
  implicit val materializer = ActorMaterializer()
  implicit val executionContext = system.dispatcher

  val flow = Flow[Int]
    .buffer(16, OverflowStrategy.backpressure)
    .map(x => x * x)

  def main(args: Array[String]): Unit = {
    range(0, 10)
      .map { x =>
        Source.single(x).via(flow).runWith(Sink.head)
          .map(v => println("v ===> " + v))
      }
  }
}
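If you want the squared values back instead of only printing them, the per-element futures can also be combined; a small sketch (assuming the same flow and the executionContext already in scope):
import scala.concurrent.Future

// Runs one short-lived stream per element and collects the results in input order.
val results: Future[Seq[Int]] =
  Future.sequence((0 until 10).map(x => Source.single(x).via(flow).runWith(Sink.head)))

results.foreach(values => println(values.mkString(", ")))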
It's unclear to me why the Scala collection isn't simply fed to the stream as a Source in your sample code. But given that you've already composed a stream whose materialized values are a SourceQueue and a publisher Sink, you can create a Source from that publisher with Source.fromPublisher to collect the transformed values, as shown below:
import akka.actor.ActorSystem
import akka.stream.scaladsl._
import akka.stream._
implicit val system = ActorSystem("system")
implicit val materializer = ActorMaterializer() // Not needed for Akka 2.6+
val (queue, pub) = Source
  .queue[Int](10, OverflowStrategy.backpressure)
  .map(x => x * x)
  .toMat(Sink.asPublisher(false))(Keep.both)
  .run()
val fromQueue = Source(0 until 10).runForeach(queue.offer(_))
val source = Source.fromPublisher(pub)
source.runForeach(x => print(x + " "))
// Output:
// 0 1 4 9 16 25 36 49 64 81
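One detail to be aware of: the queue stays open after the ten offers, so the materialized streams never complete on their own. A sketch (assuming, as here, that the buffer is large enough for all offers) that closes the queue once everything has been offered:
// Once all elements have been offered, complete the queue so downstream consumers finish.
fromQueue.foreach(_ => queue.complete())(system.dispatcher)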
Here is the simplest graph using a Partition and Merge that I could come up with, but when run it gives the following error:
requirement failed: The inlets [] and outlets [] must correspond to the inlets [Merge.in0, Merge.in1] and outlets [Partition.out0, Partition.out1]
I understand that the message indicates that I either have more outputs than inputs or an unconnected flow, but I can't seem to see in this simple example where the mismatch is.
Any help is appreciated.
The graph:
def createGraph()(implicit actorSystem: ActorSystem): Graph[ClosedShape, Future[Done]] = {
  GraphDSL.create(Sink.ignore) { implicit builder: GraphDSL.Builder[Future[Done]] => s =>
    import GraphDSL.Implicits._

    val inputs: List[Int] = List(1, 2, 3, 4)
    val source: Source[Int, NotUsed] = Source(inputs)

    val messageSplit: UniformFanOutShape[Int, Int] = builder.add(Partition[Int](2, i => i % 2))
    val messageMerge: UniformFanInShape[Int, Int] = builder.add(Merge[Int](2))

    val processEven: Flow[Int, Int, NotUsed] = Flow[Int].map(rc => {
      actorSystem.log.debug(s"even: $rc")
      rc
    })
    val processOdd: Flow[Int, Int, NotUsed] = Flow[Int].map(rc => {
      actorSystem.log.debug(s"odd: $rc")
      rc
    })

    source ~> messageSplit.in
    messageSplit.out(0) -> processEven -> messageMerge.in(0)
    messageSplit.out(1) -> processOdd -> messageMerge.in(1)
    messageMerge.out ~> s

    ClosedShape
  }
}
The test:
import akka.actor.ActorSystem
import akka.stream._
import akka.stream.scaladsl.{Flow, GraphDSL, Merge, Partition, RunnableGraph, Sink, Source}
import akka.{Done, NotUsed}
import org.scalatest.FunSpec
import scala.concurrent.Future
class RoomITSpec extends FunSpec {
  implicit val actorSystem: ActorSystem = ActorSystem("RoomITSpec")
  implicit val actorCreator: ActorMaterializer = ActorMaterializer()

  describe("graph") {
    it("should run") {
      val graph = createGraph()
      RunnableGraph.fromGraph(graph).run
    }
  }
}
Small syntactic mistake.
// Notice the curly arrows
messageSplit.out(0) ~> processEven ~> messageMerge.in(0)
messageSplit.out(1) ~> processOdd ~> messageMerge.in(1)
Instead of what you wrote:
// Straight arrows
messageSplit.out(0) -> processEven -> messageMerge.in(0)
messageSplit.out(1) -> processOdd -> messageMerge.in(1)
You ended up generating (and throwing away) tuples instead of adding to the graph.
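For completeness, with the curly arrows the wiring section of the graph reads:
source ~> messageSplit.in
messageSplit.out(0) ~> processEven ~> messageMerge.in(0)
messageSplit.out(1) ~> processOdd ~> messageMerge.in(1)
messageMerge.out ~> s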
I am not able to perform an implicit conversion from an RDD to a DataFrame in a Scala program, although I am importing spark.implicits._.
Any help would be appreciated.
Main Program with the implicits:
object spark1 {
  def main(args: Array[String]) {
    val spark = SparkSession.builder().appName("e1").config("o1", "sv").getOrCreate()
    import spark.implicits._
    val conf = new SparkConf().setMaster("local").setAppName("My App")
    val sc = spark.sparkContext
    val data = sc.textFile("/TestDataB.txt")
    val allSplit = data.map(line => line.split(","))
    case class CC1(LAT: Double, LONG: Double)
    val allData = allSplit.map(p => CC1(p(0).trim.toDouble, p(1).trim.toDouble))
    val allDF = allData.toDF()
    // ... other code
  }
}
Error is as follows:
Error:(40, 25) value toDF is not a member of org.apache.spark.rdd.RDD[CC1]
val allDF = allData.toDF()
When you define the case class CC1 inside the main method, you hit https://issues.scala-lang.org/browse/SI-6649; toDF() then fails to locate the appropriate implicit TypeTag for that class at compile time.
You can see this in this simple example:
import scala.reflect.runtime.universe.TypeTag

case class Out()

object TestImplicits {
  def main(args: Array[String]) {
    case class In()
    val typeTagOut = implicitly[TypeTag[Out]] // compiles
    val typeTagIn = implicitly[TypeTag[In]]   // does not compile: Error:(23, 31) No TypeTag available for In
  }
}
Spark's relevant implicit conversion has the type parameter [T <: Product : TypeTag] (see newProductEncoder in SQLImplicits), which means an implicit TypeTag[CC1] is required.
To fix this, simply move the definition of CC1 out of the method, or out of the object entirely:
case class CC1(LAT: Double, LONG: Double)

object spark1 {
  def main(args: Array[String]) {
    val spark = SparkSession.builder().appName("e1").config("o1", "sv").getOrCreate()
    import spark.implicits._
    val data = spark.sparkContext.textFile("/TestDataB.txt")
    val allSplit = data.map(line => line.split(","))
    val allData = allSplit.map(p => CC1(p(0).trim.toDouble, p(1).trim.toDouble))
    val allDF = allData.toDF()
    // ... other code
  }
}
I thought toDF was in sqlContext.implicits._, so you would need to import that rather than spark.implicits._. At least that is the case in Spark 1.6.
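For reference, in Spark 1.6 that would look roughly like this (a sketch, assuming an existing SparkContext named sc):
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._ // brings toDF() into scope for RDDs of case classes and tuples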
I am new to Spark and I'm using it with Scala. I wrote a simple object that loads fine in spark-shell using :load test.scala.
import org.apache.spark.ml.feature.StringIndexer
object Collaborative{
def trainModel() ={
val data = sc.textFile("/user/PT/data/newfav.csv")
val df = data.map(_.split(",") match {
case Array(user,food,fav) => (user,food,fav.toDouble)
}).toDF("userID","foodID","favorite")
val userIndexer = new StringIndexer().setInputCol("userID").setOutputCol("userIndex")
}
}
Now I want to put it in a class so that I can pass parameters. I use the same code, with a class instead of an object.
import org.apache.spark.ml.feature.StringIndexer
class Collaborative {
  def trainModel() = {
    val data = sc.textFile("/user/PT/data/newfav.csv")
    val df = data.map(_.split(",") match {
      case Array(user, food, fav) => (user, food, fav.toDouble)
    }).toDF("userID", "foodID", "favorite")
    val userIndexer = new StringIndexer().setInputCol("userID").setOutputCol("userIndex")
  }
}
This returns the following errors:
<console>:19: error: value toDF is not a member of org.apache.spark.rdd.RDD[(String, String, Double)]
val df = data.map(_.split(",") match { case Array(user,food,fav) => (user,food,fav.toDouble) }).toDF("userID","foodID","favorite")
<console>:24: error: not found: type StringIndexer
val userIndexer = new StringIndexer().setInputCol("userID").setOutputCol("userIndex")
What am I missing here?
Try this; it seems to work fine.
def trainModel() = {
  val spark = SparkSession.builder().appName("test").master("local").getOrCreate()
  import spark.implicits._
  val data = spark.read.textFile("/user/PT/data/newfav.csv")
  val df = data.map(_.split(",") match {
    case Array(user, food, fav) => (user, food, fav.toDouble)
  }).toDF("userID", "foodID", "favorite")
  val userIndexer = new StringIndexer().setInputCol("userID").setOutputCol("userIndex")
}
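For a fully self-contained version, here is a minimal sketch of the whole class (assuming Spark 2.x and the same CSV path; the fit/transform call at the end is only an illustrative use of the indexer, not something from the original code):
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.{DataFrame, SparkSession}

class Collaborative {
  def trainModel(): DataFrame = {
    val spark = SparkSession.builder().appName("test").master("local").getOrCreate()
    import spark.implicits._

    val data = spark.read.textFile("/user/PT/data/newfav.csv")
    val df = data.map(_.split(",") match {
      case Array(user, food, fav) => (user, food, fav.toDouble)
    }).toDF("userID", "foodID", "favorite")

    // Index the userID column; fit builds the model, transform adds the userIndex column.
    val userIndexer = new StringIndexer().setInputCol("userID").setOutputCol("userIndex")
    userIndexer.fit(df).transform(df)
  }
}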