Source.fromInputStream exception handling during reading lines - scala

I have written a function that takes an InputStream as a parameter and returns an iterator of strings. I accomplish this as follows:
def lineEntry(fileInputStream:InputStream):Iterator[String] = {
Source.fromInputStream(fileInputStream).getLines()
}
I use the method as follows:
val fStream = getSomeInputStreamFromSource()
lineEntry(fStream).foreach{
processTheLine(_)
}
Now it is quite possible that iterating over the result of lineEntry will blow up if it encounters a bad character while the foreach walks the input stream.
What are some of the ways to counter this situation?

Quick solution (for Scala 2.10):
def lineEntry(fileInputStream:InputStream):Iterator[String] = {
implicit val codec = Codec.UTF8 // or any other you like
codec.onMalformedInput(CodingErrorAction.IGNORE)
Source.fromInputStream(fileInputStream).getLines()
}
In Scala 2.9 there's a small difference:
implicit val codec = Codec(Codec.UTF8)
Codec also has a few more configuration options with which you can tune its behaviour in such cases.
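As a minimal sketch of those options: the version below builds a fresh codec with Codec("UTF-8") rather than reconfiguring Codec.UTF8 (which, as far as I can tell, is a shared value), and it replaces malformed bytes with U+FFFD instead of silently dropping them. The names LenientLines and lenientLines and the sample bytes are made up for illustration.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.CodingErrorAction
import scala.io.{Codec, Source}

object LenientLines {
  // Hypothetical helper: reads lines without throwing on malformed UTF-8.
  def lenientLines(in: InputStream): Iterator[String] = {
    // Codec("UTF-8") creates a new instance, so this configuration stays local.
    val codec: Codec = Codec("UTF-8")
      .onMalformedInput(CodingErrorAction.REPLACE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE)
    // Pass the codec explicitly instead of relying on an implicit in scope.
    Source.fromInputStream(in)(codec).getLines()
  }

  // Sample input: the byte 0xFF can never appear in valid UTF-8.
  val sample: Array[Byte] =
    "a".getBytes("UTF-8") ++ Array(0xFF.toByte) ++ "b\nok\n".getBytes("UTF-8")

  val lines: List[String] = lenientLines(new ByteArrayInputStream(sample)).toList
}
```

Whether to use REPLACE or IGNORE depends on whether you want to see where the input was damaged; REPORT (the decoder default) is what makes the iteration throw.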


How to iterate over result of Future List in Scala?

I am new to Scala and was trying my hand at Akka. I am trying to read data from MongoDB in Scala and convert it to JSON and XML.
The code below handles the path /getJson and calls the getJson() function to get the data in the form of a Future.
get {
  concat(
    path("getJson") {
      val f = Patterns.ask(actor1, getJson(), 10.seconds)
      val res = Await.result(f, 10.seconds)
      val result = res.toString
      complete(res.toString)
    }
  )
}
The getJson() method is as follows:
def getJson()= {
val future = collection.find().toFuture()
future
}
I have a Greeting Case class in file Greeting.scala:
case class Greeting(msg:String,name:String)
And a MyJsonProtocol.scala file for marshalling Scala objects to JSON, as follows:
trait MyJsonProtocol extends SprayJsonSupport with DefaultJsonProtocol {
implicit val templateFormat = jsonFormat2(Greeting)
}
I am getting the following output from complete(res.toString) in Postman:
Future(Success(List(
Iterable(
(_id,BsonObjectId{value=5fc73944986ced2b9c2527c4}),
(msg,BsonString{value='Hiiiiii'}),
(name,BsonString{value='Ruchirrrr'})
),
Iterable(
(_id,BsonObjectId{value=5fc73c35050ec6430ec4b211}),
(msg,BsonString{value='Holaaa Amigo'}),
(name,BsonString{value='Pablo'})),
Iterable(
(_id,BsonObjectId{value=5fc8c224e529b228916da59d}),
(msg,BsonString{value='Demo'}),
(name,BsonString{value='RuchirD'}))
)))
Can someone please tell me how to iterate over this output and to display it in JSON format?
When working with Scala, it's very important to know your way around types. The first step towards this is at least knowing the types of your variables and values.
If you look at this method,
def getJson() = {
val future = collection.find().toFuture()
future
}
It lacks type information at all levels, which is a really bad practice.
I am assuming that you are using mongo-scala-driver and that your collection is actually a MongoCollection[Document].
This means that the output of collection.find() should be a FindObservable[Document], hence collection.find().toFuture() should be a Future[Seq[Document]]. So, your getJson method should be written as,
def getJson(): Future[Seq[Document]] =
collection.find().toFuture()
Now, this means that you are passing a Future[Seq[Document]] to your actor1, which is again a bad practice. You should never send any kind of Future values among actors. It looks like your actor1 does nothing but send the same message back. Why is this actor1 even required when it does nothing?
This means your f is a Future[Future[Seq[Document]]]. Then you are using Await.result to get the result of this future f, which is again an anti-pattern, since Await blocks your thread.
Now, your res is a Future[Seq[Document]]. And you are converting it to a String and sending that string back with complete.
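To illustrate the nesting and the blocking issue without Akka or MongoDB, here is a small sketch that uses only scala.concurrent. fetchNames and echo are hypothetical stand-ins for the database query and for an ask that sends the message straight back; the point is only that flatMap keeps a single Future layer and map transforms the eventual result without Await.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object FutureChaining {
  // Stand-in for collection.find().toFuture()
  def fetchNames(): Future[Seq[String]] = Future(Seq("Ruchirrrr", "Pablo"))

  // Stand-in for an ask whose actor replies with the message it received
  def echo[A](message: A): Future[A] = Future(message)

  // Sending the future itself as the message yields a nested future...
  val nested: Future[Future[Seq[String]]] = echo(fetchNames())

  // ...whereas flatMap sends the completed value and keeps a single layer.
  val flat: Future[Seq[String]] = fetchNames().flatMap(names => echo(names))

  // Transform the eventual result with map instead of Await.result.
  val rendered: Future[String] = flat.map(_.mkString(", "))
}
```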
Your JsonProtocol is not working because you are not even passing it any Greeting objects.
You have to do the following:
1. Read raw Bson objects from mongo.
2. Convert the raw Bson objects to your Greeting objects.
3. Complete your result with these Greeting objects. The JsonProtocol should take care of converting these Greeting objects to JSON.
The easiest way to do all this is by using the mongo driver's CodecRegistries.
case class Greeting(msg:String, name:String)
Now, your MongoDAL object will look like the following (it might be missing some imports; fill in any missing imports as you did in your own code).
import scala.concurrent.Future
import org.mongodb.scala.bson.codecs.Macros
import org.mongodb.scala.bson.codecs.DEFAULT_CODEC_REGISTRY
import org.bson.codecs.configuration.CodecRegistries
import org.mongodb.scala.{MongoClient, MongoCollection, MongoDatabase}
object MongoDAL {
val greetingCodecProvider = Macros.createCodecProvider[Greeting]()
val codecRegistry = CodecRegistries.fromRegistries(
CodecRegistries.fromProviders(greetingCodecProvider),
DEFAULT_CODEC_REGISTRY
)
val mongoClient: MongoClient = ... // however you are connecting to mongo and creating a mongo client
val mongoDatabase: MongoDatabase =
mongoClient
.getDatabase("database_name")
.withCodecRegistry(codecRegistry)
val greetingCollection: MongoCollection[Greeting] =
mongoDatabase.getCollection[Greeting]("greeting_collection_name")
def fetchAllGreetings(): Future[Seq[Greeting]] =
greetingCollection.find().toFuture()
}
Now, your route can be defined as
get {
  concat(
    path("getJson") {
      val greetingSeqFuture: Future[Seq[Greeting]] = MongoDAL.fetchAllGreetings()
      // I don't see any need for that actor thing,
      // but if you really need to do that, then you can
      // do that by using flatMap to chain future computations.
      // ask returns an untyped future, so mapTo restores the element type
      val actorResponseFuture: Future[Seq[Greeting]] =
        greetingSeqFuture
          .flatMap(greetingSeq => Patterns.ask(actor1, greetingSeq, 10.seconds))
          .mapTo[Seq[Greeting]]
      // complete can handle futures just fine
      // it will wait for future completion
      // then convert the seq of Greetings to Json using your JsonProtocol
      complete(actorResponseFuture)
    }
  )
}
First of all, don't call toString in complete(res.toString).
As the Akka HTTP JSON support guide says, if you set everything up correctly, your case class will be converted to JSON automatically.
But as I see in the output, your res is not an object of type Greeting. It looks related to Greeting and has the same structure, and it seems to be the raw output of the MongoDB request. If that assumption is correct, you should convert the raw output from MongoDB to your Greeting case class.
I guess it could be done in getJson() after collection.find().

Calling function library scala

I'm looking to call the ATR function from this Scala wrapper for ta-lib, but I can't figure out how to use the wrapper correctly.
package io.github.patceev.talib
import com.tictactec.ta.lib.{Core, MInteger, RetCode}
import scala.concurrent.Future
object Volatility {
def ATR(
highs: Vector[Double],
lows: Vector[Double],
closes: Vector[Double],
period: Int = 14
)(implicit core: Core): Future[Vector[Double]] = {
val arrSize = highs.length - period + 1
if (arrSize < 0) {
Future.successful(Vector.empty[Double])
} else {
val begin = new MInteger()
val length = new MInteger()
val result = Array.ofDim[Double](arrSize)
core.atr(
0, highs.length - 1, highs.toArray, lows.toArray, closes.toArray,
period, begin, length, result
) match {
case RetCode.Success =>
Future.successful(result.toVector)
case error =>
Future.failed(new Exception(error.toString))
}
}
}
}
Would someone be able to explain how to use this function and print the result to the console?
Many thanks in advance.
Regarding syntax, Scala is one of many languages where you call functions and methods passing arguments in parentheses (mostly, but let's keep it simple for now):
def myFunction(a: Int): Int = a + 1
myFunction(1) // myFunction is called and returns 2
On top of this, Scala allows you to specify multiple parameter lists, as in the following example:
def myCurriedFunction(a: Int)(b: Int): Int = a + b
myCurriedFunction(2)(3) // myCurriedFunction returns 5
You can also partially apply myCurriedFunction, but again, let's keep it simple for the time being. The main idea is that you can have multiple lists of arguments passed to a function.
Building on this, Scala allows you to define a list of implicit parameters, which the compiler will automatically retrieve for you based on some scoping rules. Implicit parameters are used, for example, by Futures:
import scala.concurrent.{ExecutionContext, Future}
// this defines how and where callbacks are run
// the compiler will automatically "inject" it for you where needed
implicit val ec: ExecutionContext = ExecutionContext.global
Future(4).map(_ + 1) // this will eventually result in a Future(5)
Note that both Future.apply and map have a second parameter list that takes an implicit execution context. By having one in scope, the compiler will "inject" it for you at the call site, without you having to write it explicitly. You could still have passed it explicitly, in which case the code would have been
Future(4)(ec).map(_ + 1)(ec)
That said, I don't know the specifics of the library you are using, but the idea is that you have to instantiate a value of type Core and either bind it to an implicit val or pass it explicitly.
The resulting code will be something like the following
val highs: Vector[Double] = ???
val lows: Vector[Double] = ???
val closes: Vector[Double] = ???
implicit val core: Core = ??? // instantiate core
val resultsFuture = Volatility.ATR(highs, lows, closes) // core is passed implicitly
for (results <- resultsFuture; result <- results) {
println(result)
}
Note that depending on your situation you may also have to provide an implicit ExecutionContext to run this code (because you are extracting the Vector[Double] from a Future). Choosing the right execution context is a separate issue, but to play around you may want to use the global execution context.
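Since the snippet above cannot run without the library on the classpath, here is a self-contained sketch of the same calling pattern. Engine and movingAverage are hypothetical stand-ins for Core and Volatility.ATR and are not part of ta-lib; only the shape of the call (explicit data, a defaulted period, an implicit last parameter list, a result wrapped in a Future) mirrors the original.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object ImplicitCallSketch {
  // Hypothetical stand-in for com.tictactec.ta.lib.Core
  final class Engine {
    def mean(xs: Vector[Double]): Double = xs.sum / xs.length
  }

  // Same shape as Volatility.ATR: data, a defaulted period, then an implicit engine
  def movingAverage(values: Vector[Double], period: Int = 2)(
      implicit engine: Engine
  ): Future[Vector[Double]] =
    Future.successful(values.sliding(period).map(engine.mean).toVector)

  // Declared implicit, so the compiler supplies it at the call site below
  implicit val engine: Engine = new Engine

  val resultsFuture: Future[Vector[Double]] =
    movingAverage(Vector(1.0, 2.0, 3.0, 4.0))

  def main(args: Array[String]): Unit = {
    // Blocking is acceptable at the very end of a small console program
    val results = Await.result(resultsFuture, 5.seconds)
    results.foreach(println)
  }
}
```

The engine could equally be passed by hand as movingAverage(values)(engine); the implicit only saves writing that second argument list.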
Extra
Regarding some of the points I've left open, here are some pointers that hopefully will turn out to be useful:
Operators
Multiple Parameter Lists (Currying)
Implicit Parameters
Scala Futures

Scala capture method of field without whole instance

I had a piece of code that looks like this:
val foo = df.map(parser.parse) // def parse(str: String): ParsedData = { ... }
However, I found out that this passes along a lambda that captures this. I guess Scala turns the code above into:
val foo = df.map(s => /* this. */parser.parse(s))
whereas my intention was to pass a lightweight capture. In Java, I'd use parser::parse, but that isn't available in Scala.
That does the trick:
val tmp = parser // split val read from capture; must be in method body
val foo = df.map(tmp.parse)
but it makes the code a little more unpleasant. There is an answer by @tiago-henrique-engel that could be used like this:
val tmp = parser.parse _
val foo = df.map(tmp)
but it still requires a temporary val.
So, is there a way to pass a lightweight capture to map without first storing it in a val, to keep the code clean?

Scala Infinite Iterator OutOfMemory

I'm playing around with Scala's lazy iterators, and I've run into an issue. What I'm trying to do is read in a large file, do a transformation, and then write out the result:
object FileProcessor {
  def main(args: Array[String]) {
    val inSource = Source.fromFile("in.txt")
    val outSource = new PrintWriter("out.txt")
    try {
      // this "basic" lazy iterator works fine
      // val iterator = inSource.getLines
      // ...but this one, which incorporates my process method,
      // throws OutOfMemoryExceptions
      val iterator = process(inSource.getLines.toSeq).iterator
      while(iterator.hasNext) outSource.println(iterator.next)
    } finally {
      inSource.close()
      outSource.close()
    }
  }
  // processing in this case just means upper-cases every line
  private def process(contents: Seq[String]) = contents.map(_.toUpperCase)
}
So I'm getting an OutOfMemoryException on large files. I know you can run afoul of Scala's lazy Streams if you keep around references to the head of the Stream. So in this case I'm careful to convert the result of process() to an iterator and throw-away the Seq it initially returns.
Does anyone know why this still causes O(n) memory consumption? Thanks!
Update
In response to fge and huynhjl, it seems like the Seq might be the culprit, but I don't know why. As an example, the following code works fine (and I'm using Seq all over the place). This code does not produce an OutOfMemoryException:
object FileReader {
  def main(args: Array[String]) {
    val inSource = Source.fromFile("in.txt")
    val outSource = new PrintWriter("out.txt")
    try {
      writeToFile(outSource, process(inSource.getLines.toSeq))
    } finally {
      inSource.close()
      outSource.close()
    }
  }
  @scala.annotation.tailrec
  private def writeToFile(outSource: PrintWriter, contents: Seq[String]) {
    if (! contents.isEmpty) {
      outSource.println(contents.head)
      writeToFile(outSource, contents.tail)
    }
  }
  private def process(contents: Seq[String]) = contents.map(_.toUpperCase)
}
As hinted by fge, modify process to take an iterator and remove the .toSeq. inSource.getLines is already an iterator.
Converting to a Seq will cause the items to be remembered. I think it will convert the iterator into a Stream and cause all items to be remembered.
Edit: Ok, it's more subtle. You are doing the equivalent of Iterator.toSeq.iterator by calling iterator on the result of process. This can cause an out of memory exception.
scala> Iterator.continually(1).toSeq.iterator.take(300*1024*1024).size
java.lang.OutOfMemoryError: Java heap space
It may be the same issue as reported here: https://issues.scala-lang.org/browse/SI-4835. Note my comment at the end of the bug, this is from personal experience.
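The first suggestion in this answer (have process take an iterator and drop the .toSeq) can be sketched as follows, assuming the transformation only ever needs one line at a time. LazyProcessing and firstThree are illustrative names, and the endless input is there only to show that nothing is buffered.

```scala
object LazyProcessing {
  // Iterator in, iterator out: each line is transformed only when requested,
  // and no collection holds on to lines that were already consumed.
  def process(lines: Iterator[String]): Iterator[String] =
    lines.map(_.toUpperCase)

  // An endless input is fine because only the requested prefix is evaluated.
  val firstThree: List[String] =
    process(Iterator.continually("line")).take(3).toList
}
```

In the question's main method this would be called as process(inSource.getLines), followed by the same while loop over the returned iterator.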

How to convert Enumeration to Seq/List in scala?

I'm writing a servlet, and need to get all parameters from the request. I found request.getParameterNames returns a java.util.Enumeration, so I have to write code as:
val names = request.getParameterNames
while(names.hasMoreElements) {
val name = names.nextElement
}
I'd like to know whether there is any way to convert an Enumeration to a Seq/List, so that I can use the map method.
Use JavaConverters
See https://stackoverflow.com/a/5184386/133106
Use a wrapper Iterator
You could build up a wrapper:
val nameIterator = new Iterator[SomeType] { def hasNext = names.hasMoreElements; def next = names.nextElement }
Use JavaConversions wrapper
val nameIterator = new scala.collection.JavaConversions.JEnumerationWrapper(names)
Using JavaConversions implicits
If you import
import scala.collection.JavaConversions._
you can do it implicitly (and you’ll also get implicit conversions for other Java collections)
request.getParameterNames.map(println)
Use Iterator.continually
You might be tempted to build an iterator using Iterator.continually like an earlier version of this answer proposed:
val nameIterator = Iterator.continually((names, names.nextElement)).takeWhile(_._1.hasMoreElements).map(_._2)
but it's incorrect, as the last element of the enumeration will be discarded.
The reason is that the hasMoreElements call in the takeWhile is executed after calling nextElement in the continually, thus discarding the last value.
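If you do want an Iterator.continually version, one way to avoid that ordering problem is to emit the enumeration itself and call nextElement only after takeWhile has checked hasMoreElements. This is a sketch of that reordering, not something from the original answer; toIterator and the sample data are made up for illustration.

```scala
object EnumerationIteration {
  // Check hasMoreElements first, then pull the element in the final map step.
  def toIterator[A](e: java.util.Enumeration[A]): Iterator[A] =
    Iterator.continually(e).takeWhile(_.hasMoreElements).map(_.nextElement())

  // Sample enumeration with two entries
  val source = new java.util.ArrayList[String]()
  source.add("hello")
  source.add("world")

  val names: List[String] =
    toIterator(java.util.Collections.enumeration(source)).toList
}
```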
Current best practice (since 2.8.1) is to use scala.collection.JavaConverters
Scaladoc here
This class differs from JavaConversions slightly, in that the conversions are not fully automatic, giving you more control (this is a good thing):
import collection.JavaConverters._
val names = ...
val nameIterator = names.asScala
Using this mechanism, you'll get appropriate and type-safe conversions for most collection types via the asScala/asJava methods.
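A runnable sketch of that conversion, using java.util.Collections.enumeration to stand in for request.getParameterNames (the parameter names here are invented). Note that on Scala 2.13 and later scala.collection.JavaConverters is deprecated in favour of scala.jdk.CollectionConverters, which offers the same asScala call.

```scala
import scala.collection.JavaConverters._

object EnumerationConversion {
  // Stand-in for the servlet's parameter names
  val source = new java.util.ArrayList[String]()
  source.add("msg")
  source.add("name")
  val names: java.util.Enumeration[String] =
    java.util.Collections.enumeration(source)

  // asScala wraps the Enumeration in a Scala Iterator, so map is available
  val upper: List[String] = names.asScala.map(_.toUpperCase).toList
}
```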
I don't disagree with any of the other answers but I had to add a type cast to get this to compile in Scala 2.9.2 and Java 7.
import scala.collection.JavaConversions._
...
val names=request.getParameterNames.asInstanceOf[java.util.Enumeration[String]].toSet
This is a comment on Debilski's answer: the Iterator.continually approach is wrong because it misses the last entry. Here's my test:
val list = new java.util.ArrayList[String]
list.add("hello")
list.add("world")
val en = java.util.Collections.enumeration(list)
val names = Iterator.continually((en, en.nextElement)).takeWhile(_._1.hasMoreElements).map(_._2)
.foreach { name => println("name=" + name) }
Output is
name=hello
The second item (name=world) is missing!
I got this to work by using JavaConversions.enumerationAsScalaIterator as mentioned by others.
Note I don't have enough rep to comment on Debilski's post directly.