Scala - Traversing a ByteString Until Empty

Is there a more concise and/or performant way to traverse the message than what I have here?
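import akka.util.ByteString
import scala.annotation.tailrec

// `delimiter` (a Byte) and GarbledMessageException are defined elsewhere
@throws[GarbledMessageException]
def nextValue(message: ByteString): (ByteString, ByteString) =
  message.indexOf(delimiter) match {
    // Drop the delimiter itself: splitAt(i) would leave it at the head of `rest`,
    // so the next indexOf would find it at 0 and the loop would never advance
    case i if i >= 0 => (message.take(i), message.drop(i + 1))
    case _ => throw new GarbledMessageException("Delimiter Not Found")
  }

@tailrec
def processFields(message: ByteString): Unit = {
  val (value, rest) = nextValue(message)
  // Do work with value
  if (rest.nonEmpty) processFields(rest) // loop until the whole message is consumed
}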
A new ByteString is created for each split, which hurts performance, but at least the underlying byte array is not copied; the new ByteString only references a slice of it.
Can it be made even better than that?

It may depend on specifically what kind of work you are doing, but if you are looking for something more performant than splitting off ByteStrings, take a look at ByteIterator, which you can get by calling iterator on a ByteString.
A ByteIterator would allow you to go directly to primitive values (ints, floats, etc.) without having to split off new ByteStrings first.
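A minimal sketch of that idea, assuming (hypothetically) that the message is simply a run of 4-byte big-endian Ints; `encodeInts` and `readInts` are illustrative names, and akka-actor must be on the classpath. Each `getInt` call reads four bytes and advances the iterator, so no intermediate ByteString is created per field.

```scala
import java.nio.ByteOrder
import akka.util.{ByteIterator, ByteString}

// Assumption: fields are fixed-width 4-byte big-endian Ints
val order: ByteOrder = ByteOrder.BIG_ENDIAN

// Helper to build a test message
def encodeInts(xs: Seq[Int]): ByteString = {
  val builder = ByteString.newBuilder
  xs.foreach(x => builder.putInt(x)(order))
  builder.result()
}

// Reads Ints straight off the iterator instead of splitting off ByteStrings
def readInts(message: ByteString): Vector[Int] = {
  val it: ByteIterator = message.iterator
  val out = Vector.newBuilder[Int]
  while (it.len >= 4) out += it.getInt(order) // any trailing partial field is ignored
  out.result()
}
```

For delimiter-separated text fields the same iterator can be advanced manually, but the gain is largest when the payload is already binary.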

Related

How to find the first duplicate in a Stream in scala

How can I find the first duplicate in a Stream in Scala?
My current idea is to pair each element with a Set of all previous elements, and then call find on the resulting Stream.
So, for each element, we have:
- an insertion into a Set: effectively O(1)
- a contains test on a Set: effectively O(1)
Hence, the overall complexity of this algorithm seems to be O(n).
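def firstDuplicate[A](s: Stream[A]): Option[A] = {
  // Lazily pair each element with the set of all elements before it
  def recurse(s: Stream[A], set: Set[A]): Stream[(A, Set[A])] =
    if (s.isEmpty) Stream.empty // stop at the end instead of calling head on an empty Stream
    else (s.head, set) #:: recurse(s.tail, set + s.head)
  val pairedWithElements = recurse(s, Set.empty)
  // The first element already contained in its predecessors' set is the first duplicate
  pairedWithElements.find { case (e, elems) => elems.contains(e) }.map(_._1)
}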
Is there a better way?
You should make your function tail recursive. The way you have it, you are making pretty much another copy of your whole stream on the stack. Also, I don't understand why you are making a copy of the entire stream (and a whoooole buuunch of sets), and then scanning it again to find the dup. You can tell it's a dup right away, when adding it to the set, and stop right there.
Something like this perhaps:
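import scala.annotation.tailrec

@tailrec
def firstDup[T](s: Stream[T], seen: Set[T] = Set.empty[T]): Option[T] = s match {
  case head #:: _ if seen(head) => Some(head)        // already seen: first duplicate found
  case head #:: tail => firstDup(tail, seen + head) // remember head and keep scanning
  case _ => None                                    // end of a finite stream: no duplicate
}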
The bloom filter suggestion from the comments above is a good idea for truly huge input streams. The "outer shell" would stay the same in that case, you'd just need to change the underlying seen implementation.
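To make the "swap the seen implementation" point concrete, here is a hedged sketch of the same early-exit idea over an Iterator, with the seen structure passed in as a parameter; `firstDupIter` is a hypothetical name, and a mutable hash set stands in for whatever structure you choose.

```scala
import scala.collection.mutable

// Hypothetical variant of firstDup: the "seen" structure is a parameter,
// so a different Set implementation can be plugged in without touching the scan.
def firstDupIter[T](it: Iterator[T], seen: mutable.Set[T] = mutable.HashSet.empty[T]): Option[T] =
  // add returns false when the element was already present;
  // find stops at the first such element, so the input is consumed only that far
  it.find(x => !seen.add(x))
```

Because `find` short-circuits, this also works on infinite inputs that contain a duplicate. A Bloom filter would need a wrapper with the same add semantics (the standard library has none), and since it can report false positives, a "duplicate" it reports would need to be confirmed.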

How should I handle mapping over a DecodeResult?

Often I find myself with JSON containing strings that I want to parse into something more specific than a String. In such a case I need to make a decoder or a codec for it, so I might try something like the following:
CodecJson[URL](_.toString.asJson, h ⇒
h.as[String].flatMap(s ⇒ Try{new URL(s)}.toOption)
)
but this won't compile, because DecodeResult's flatMap expects a function returning a DecodeResult, not an Option.
How should this (seemingly common) behavior be handled?
One option would be to decode to an Option[URL], but that seems like a bummer if you just want to fail the decode.
Is there an accepted way of dealing with these subsequent decoding operations?
To solve the flatMap issue, you can convert the Try into a DecodeResult. The following technique also generalizes this into a codec from String to any T.
import java.net.URL
import scala.util.{Failure, Success, Try}
import argonaut._, Argonaut._

def stringToTCodec[T](toString: T => String, fromString: String => T): CodecJson[T] =
  CodecJson[T](t => toString(t).asJson, h =>
    h.as[String].flatMap(s => Try(fromString(s)) match {
      case Success(u) => DecodeResult.ok(u)
      case Failure(_) => DecodeResult.fail(s, h.history) // parse error becomes a decode failure
    }))
implicit val urlCodec: CodecJson[URL] = stringToTCodec[URL](u => u.toString, s => new URL(s))

Mapping many Eithers to one Either with many

Say I have a monadic function called processOne defined like this:
def processOne(input: Input): Either[ErrorType, Output] = ...
Given a list of "Inputs", I would like to return a corresponding list of "Outputs" wrapped in an Either:
def processMany(inputs: Seq[Input]): Either[ErrorType, Seq[Output]] = ...
processMany will call processOne for each input. However, I would like it to terminate the first time (if ever) that processOne returns a Left and return that Left; otherwise it should return a Right with a list of the outputs.
My question: what is the best way to implement processMany? Is it possible to accomplish this behavior using a for expression, or is it going to be necessary for me to iterate the list myself recursively?
With Scalaz 7:
import scalaz._, Scalaz._
def processMany(inputs: Seq[Input]): Either[ErrorType, Seq[Output]] =
  inputs.toStream traverseU processOne
Converting inputs to a Stream[Input] takes advantage of the non-strict traverse implementation for Stream, i.e. gives you the short-circuiting behaviour you want.
By the way, you tagged this "monads", but traversal requires only an applicative functor (which, as it happens, is probably defined in terms of the monad for Either). For further reference, see the paper The Essence of the Iterator Pattern, or, for a Scala-based interpretation, Eric Torreborre's blog post on the subject.
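The easiest way with standard Scala that doesn't evaluate more than necessary would probably be:
def processMany(inputs: Seq[Input]): Either[ErrorType, Seq[Output]] = {
  Right(inputs.map { x =>
    processOne(x) match {
      case Right(r) => r
      // Non-local return: exits processMany itself (deprecated in Scala 3);
      // only safe when `inputs` is a strict collection such as List or Vector
      case Left(l) => return Left(l)
    }
  })
}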
A fold would be more compact, but it wouldn't short-circuit when it hit a Left (it would just keep carrying it along while you iterated through the entire input).
For now, I've decided to just solve this using recursion, as I am reluctant to add a dependency on a library (Scalaz).
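For comparison, a sketch that keeps the short-circuiting but avoids returning from inside a closure: a while loop drives an Iterator, so the `return` is an ordinary local return. `traverseEither` is a hypothetical helper name, not a standard-library method.

```scala
// Hypothetical helper: applies f to each input in order and stops at the first Left.
def traverseEither[A, E, B](inputs: Seq[A])(f: A => Either[E, B]): Either[E, Seq[B]] = {
  val out = Vector.newBuilder[B]
  val it = inputs.iterator
  while (it.hasNext) {
    f(it.next()) match {
      case Right(b) => out += b
      case Left(e)  => return Left(e) // local return: not inside a lambda
    }
  }
  Right(out.result())
}
```

With the question's types this would read `def processMany(inputs: Seq[Input]) = traverseEither(inputs)(processOne)`.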
(Types and names in my application have been changed here in order to appear more generic)
def processMany(inputs: Seq[Input]): Either[ErrorType, Seq[Output]] = {
  import scala.annotation.tailrec

  @tailrec
  def traverse(acc: Vector[Output], inputs: List[Input]): Either[ErrorType, Seq[Output]] =
    inputs match {
      case Nil => Right(acc)
      case input :: more =>
        processOne(input) match {
          case Right(output) => traverse(acc :+ output, more) // keep going
          case Left(e) => Left(e)                             // stop at the first error
        }
    }

  traverse(Vector[Output](), inputs.toList)
}

Scala - return empty Option if value contained in array

I'm splitting an input of type Option[String] into an Option[Array[String]] as follows:
val input:Option[String] = Option("a=b,1000,what?")
val result: Option[Array[String]] = input map { _.split(",") }
I want to add a test whereby, if any member of the array matches a condition (e.g., is a Long less than 0), the whole array is discarded and None is returned.
Use filter to perform a test on the content of an Option.
Use exists to check whether any member of the collection fulfils a condition.
result.filter(! _.exists(s => test(s)))
or
result.filterNot(_.exists(s => test(s)))
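A self-contained version of the filterNot approach, assuming (hypothetically) that `test` means "parses as a negative Long"; `isNegativeLong` and `splitUnlessNegative` are illustrative names.

```scala
import scala.util.Try

// Hypothetical stand-in for the question's `test`
def isNegativeLong(s: String): Boolean = Try(s.toLong).toOption.exists(_ < 0)

// Split, then drop the whole array if any field fails the test
def splitUnlessNegative(input: Option[String]): Option[Array[String]] =
  input.map(_.split(",")).filterNot(_.exists(isNegativeLong))
```

Fields that are not numbers at all (such as "a=b") make `Try` fail, so they simply do not count as negative.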
Have you considered using find() on the collection? If it returns a Some(x), then something has satisfied the condition.
list.find(_ < 0) match {
  case Some(_) => None   // something matched: discard the whole list
  case None => Some(list)
}
Of course, you can split and then filter as @ziggystar suggests, but if you have a really big String and an element at the beginning matches, it's pointless to finish splitting the string when you know it's going to be discarded.
In this case, if you're worried about time efficiency, you can use a Stream and re-implement the split operation, something like this:
def result(input: Option[String]): Option[Seq[String]] = {
  // Lazily split `chars` on `c`; each field is produced only when it is demanded
  def split(c: Char, chars: Stream[Char]): Stream[String] = {
    val (head, tail) = chars.span(_ != c)
    head.mkString #:: (if (tail.isEmpty) Stream.empty else split(c, tail.tail))
  }
  // `test` is the predicate from the question
  input.map(s => split(',', s.toStream)).filter(_.forall(s => !test(s)))
}
Note that the map/filter structure stays the same, but it is now short-circuiting due to the use of Stream.
If it's a really big string, you probably have it as a Stream[Char] already, which means you don't even have the memory overhead of hanging on to the original String.
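Since Scala 2.13 deprecates Stream in favour of LazyList, here is a hedged alternative sketch of the same short-circuiting using a hand-rolled Iterator; `lazySplit`, `splitUnlessAny` and the `bad` predicate are hypothetical names.

```scala
// Lazily yields the fields of `s` separated by `sep`, scanning only as far as asked
def lazySplit(s: String, sep: Char): Iterator[String] = new Iterator[String] {
  private var start = 0 // index where the next field begins; -1 once exhausted
  def hasNext: Boolean = start >= 0
  def next(): String = {
    if (!hasNext) throw new NoSuchElementException("no more fields")
    val end = s.indexOf(sep, start)
    val field = if (end < 0) s.substring(start) else s.substring(start, end)
    start = if (end < 0) -1 else end + 1
    field
  }
}

// Stops splitting as soon as a field satisfies `bad`, returning None in that case
def splitUnlessAny(input: Option[String], bad: String => Boolean): Option[Vector[String]] =
  input.flatMap { s =>
    val fields = Vector.newBuilder[String]
    val it = lazySplit(s, ',')
    var rejected = false
    while (!rejected && it.hasNext) {
      val f = it.next()
      if (bad(f)) rejected = true else fields += f
    }
    if (rejected) None else Some(fields.result())
  }
```

Unlike String.split, this keeps a trailing empty field ("a," gives "a" and ""), which matches the Stream version above.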

Processing Scala Option[T]

I have a Scala Option[T]. If the value is Some(x), I want to process it with a process that does not return a value (Unit), but if it is None, I want to print an error.
I can use the following code to do this, but I understand that the more idiomatic way is to treat the Option[T] as a sequence and use map, foreach, etc. How do I do this?
opt match {
  case Some(x) => // process x with no return value, e.g. write x to a file
  case None => // print error message
}
I think explicit pattern matching suits your use case best.
At the time, Scala's Option was, sadly, missing a method to do exactly this (Scala 2.10 later added fold to Option itself), so I add one:
import scala.language.implicitConversions

class OptionWrapper[A](o: Option[A]) {
  def fold[Z](default: => Z)(action: A => Z): Z = o.map(action).getOrElse(default)
}
implicit def option_has_utility[A](o: Option[A]): OptionWrapper[A] = new OptionWrapper(o)
which has the slightly nicer (in my view) usage
op.fold{ println("Empty!") }{ x => doStuffWith(x) }
You can see from how it's defined that map/getOrElse can be used instead of pattern matching.
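For comparison, a sketch using the standard library's Option.fold(ifEmpty)(f) (Scala 2.10+), whose shape matches the wrapper; `describe` is a hypothetical function that returns a String only so the result is easy to check.

```scala
// ifEmpty is evaluated only for None; f is applied only for Some
def describe(opt: Option[String]): String =
  opt.fold("Empty!")(x => s"processed $x")
```

Because the result type is inferred from the first argument list alone, both branches must have the default's type (String here, or Unit in the side-effecting case, e.g. `opt.fold(println("Empty!"))(x => doStuffWith(x))`).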
Alternatively, Either already has a fold method (taking both functions in a single parameter list), so you can write
op.toRight(()).fold(_ => println("Empty!"), x => doStuffWith(x))
but this is a little clumsy given that you have to provide the left value (here (), i.e. Unit) and then define a function on that, rather than just stating what you want to happen on None.
The pattern match isn't bad either, especially for longer blocks of code. For short ones, the overhead of the match starts getting in the way of the point. For example:
op.fold{ printError }{ saveUserInput }
has a lot less syntactic overhead than
op match {
  case Some(x) => saveUserInput(x)
  case None => printError
}
and therefore, once you expect it, is a lot easier to comprehend.
I'd recommend simply and safely using opt.get, which throws a NoSuchElementException if opt is None. Or, if you want to throw your own exception, you can do this:
val x = opt.getOrElse(throw new Exception("Your error message"))
// x is of type T
As @missingfaktor says, you are in exactly the scenario where pattern matching gives the most readable result.
If Option has a value you want to do something, if not you want to do something else.
While there are various ways to use map and other functional constructs on Option types, they are generally useful when:
- you want to use the Some case and ignore the None case, e.g. in your case
  opt.foreach(writeToFile) // if None, just do nothing
- or you want to chain operations on more than one Option and get a result only when all of them are Some. For instance, one way of doing this is:
val concatThreeOptions =
  for {
    n1 <- opt1
    n2 <- opt2
    n3 <- opt3
  } yield n1 + n2 + n3 // this will be None if any of the three is None
// we will either write them all to a file or none of them
But neither of these seems to be your case.
Pattern matching is the best choice here.
However, if you want to treat Option as a sequence and to map over it, you can do it, because Unit is a value:
opt map { v =>
  println(v) // process v (result type is Unit)
} getOrElse {
  println("error")
}
By the way, printing an error is some kind of "anti-pattern", so it's better to throw an exception anyway:
opt.getOrElse(throw new SomeException)