Scala: how to traverse stream/iterator collecting results into several different collections - scala

I'm going through log file that is too big to fit into memory and collecting 2 type of expressions, what is better functional alternative to my iterative snippet below?
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)]={
val lines : Iterator[String] = io.Source.fromFile(file).getLines()
val logins: mutable.Map[String, String] = new mutable.HashMap[String, String]()
val errors: mutable.ListBuffer[(String, String)] = mutable.ListBuffer.empty
for (line <- lines){
line match {
case errorPat(date,ip)=> errors.append((ip,date))
case loginPat(date,user,ip,id) =>logins.put(ip, id)
case _ => ""
}
}
errors.toList.map(line => (logins.getOrElse(line._1,"none") + " " + line._1,line._2))
}

Here is a possible solution:
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String,String)] = {
val lines = Source.fromFile(file).getLines
val (err, log) = lines.collect {
case errorPat(inf, ip) => (Some((ip, inf)), None)
case loginPat(_, _, ip, id) => (None, Some((ip, id)))
}.toList.unzip
val ip2id = log.flatten.toMap
err.collect{ case Some((ip,inf)) => (ip2id.getOrElse(ip,"none") + "" + ip, inf) }
}

Corrections:
1) removed unnecessary types declarations
2) tuple deconstruction instead of ulgy ._1
3) left fold instead of mutable accumulators
4) used more convenient operator-like methods :+ and +
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)] = {
val lines = io.Source.fromFile(file).getLines()
val (logins, errors) =
((Map.empty[String, String], Seq.empty[(String, String)]) /: lines) {
case ((loginsAcc, errorsAcc), next) =>
next match {
case errorPat(date, ip) => (loginsAcc, errorsAcc :+ (ip -> date))
case loginPat(date, user, ip, id) => (loginsAcc + (ip -> id) , errorsAcc)
case _ => (loginsAcc, errorsAcc)
}
}
// more concise equivalent for
// errors.toList.map { case (ip, date) => (logins.getOrElse(ip, "none") + " " + ip) -> date }
for ((ip, date) <- errors.toList)
yield (logins.getOrElse(ip, "none") + " " + ip) -> date
}

I have a few suggestions:
Instead of a pair/tuple, it's often better to use your own class. It gives meaningful names to both the type and its fields, which makes the code much more readable.
Split the code into small parts. In particular, try to decouple pieces of code that don't need to be tied together. This makes your code easier to understand, more robust, less prone to errors and easier to test. In your case it'd be good to separate producing your input (lines of a log file) and consuming it to produce a result. For example, you'd be able to make automatic tests for your function without having to store sample data in a file.
As an example and exercise, I tried to make a solution based on Scalaz iteratees. It's a bit longer (includes some auxiliary code for IteratorEnumerator) and perhaps it's a bit overkill for the task, but perhaps someone will find it helpful.
import java.io._;
import scala.util.matching.Regex
import scalaz._
import scalaz.IterV._
object MyApp extends App {
// A type for the result. Having names keeps things
// clearer and shorter.
type LogResult = List[(String,String)]
// Represents a state of our computation. Not only it
// gives a name to the data, we can also put here
// functions that modify the state. This nicely
// separates what we're computing and how.
sealed case class State(
logins: Map[String,String],
errors: Seq[(String,String)]
) {
def this() = {
this(Map.empty[String,String], Seq.empty[(String,String)])
}
def addError(date: String, ip: String): State =
State(logins, errors :+ (ip -> date));
def addLogin(ip: String, id: String): State =
State(logins + (ip -> id), errors);
// Produce the final result from accumulated data.
def result: LogResult =
for ((ip, date) <- errors.toList)
yield (logins.getOrElse(ip, "none") + " " + ip) -> date
}
// An iteratee that consumes lines of our input. Based
// on the given regular expressions, it produces an
// iteratee that parses the input and uses State to
// compute the result.
def logIteratee(errorPat: Regex, loginPat: Regex):
IterV[String,List[(String,String)]] = {
// Consumes a signle line.
def consume(line: String, state: State): State =
line match {
case errorPat(date, ip) => state.addError(date, ip);
case loginPat(date, user, ip, id) => state.addLogin(ip, id);
case _ => state
}
// The core of the iteratee. Every time we consume a
// line, we update our state. When done, compute the
// final result.
def step(state: State)(s: Input[String]): IterV[String, LogResult] =
s(el = line => Cont(step(consume(line, state))),
empty = Cont(step(state)),
eof = Done(state.result, EOF[String]))
// Return the iterate waiting for its first input.
Cont(step(new State()));
}
// Converts an iterator into an enumerator. This
// should be more likely moved to Scalaz.
// Adapted from scalaz.ExampleIteratee
implicit val IteratorEnumerator = new Enumerator[Iterator] {
#annotation.tailrec def apply[E, A](e: Iterator[E], i: IterV[E, A]): IterV[E, A] = {
val next: Option[(Iterator[E], IterV[E, A])] =
if (e.hasNext) {
val x = e.next();
i.fold(done = (_, _) => None, cont = k => Some((e, k(El(x)))))
} else
None;
next match {
case None => i
case Some((es, is)) => apply(es, is)
}
}
}
// main ---------------------------------------------------
{
// Read a file as an iterator of lines:
// val lines: Iterator[String] =
// io.Source.fromFile("test.log").getLines();
// Create our testing iterator:
val lines: Iterator[String] = Seq(
"Error: 2012/03 1.2.3.4",
"Login: 2012/03 user 1.2.3.4 Joe",
"Error: 2012/03 1.2.3.5",
"Error: 2012/04 1.2.3.4"
).iterator;
// Create an iteratee.
val iter = logIteratee("Error: (\\S+) (\\S+)".r,
"Login: (\\S+) (\\S+) (\\S+) (\\S+)".r);
// Run the the iteratee against the input
// (the enumerator is implicit)
println(iter(lines).run);
}
}

Related

Use of Scala Loan pattern in Success Case

I'm following the tutorial from Alvin Alexander to use Loan Pattern
Here is the code what I use -
val year = 2016
val nationalData = {
val source = io.Source.fromFile(s"resources/Babynames/names/yob$year.txt")
// names is iterator of String, split() gives the array
//.toArray & toSeq is a slow process compare to .toSet // .toSeq gives Stream Closed error
val names = source.getLines().filter(_.nonEmpty).map(_.split(",")(0)).toSet
source.close()
names
// println(names.mkString(","))
}
println("Names " + nationalData)
val info = for (stateFile <- new java.io.File("resources/Babynames/namesbystate").list(); if stateFile.endsWith(".TXT")) yield {
val source = io.Source.fromFile("resources/Babynames/namesbystate/" + stateFile)
val names = source.getLines().filter(_.nonEmpty).map(_.split(",")).
filter(a => a(2).toInt == year).map(a => a(3)).toArray // .toSet
source.close()
(stateFile.take(2), names)
}
println(info(0)._2.size + " names from state "+ info(0)._1)
println(info(1)._2.size + " names from state "+ info(1)._1)
for ((state, sname) <- info) {
println("State: " +state + " Coverage of name in "+ year+" "+ sname.count(n => nationalData.contains(n)).toDouble / nationalData.size) // Set doesn't have length method
}
This is how I applied readTextFile, readTextFileWithTry on the above code to learn/experiment Loan Pattern in the above code
def using[A <: { def close(): Unit }, B](resource: A)(f: A => B): B =
try {
f(resource)
} finally {
resource.close()
}
def readTextFile(filename: String): Option[List[String]] = {
try {
val lines = using(fromFile(filename)) { source =>
(for (line <- source.getLines) yield line).toList
}
Some(lines)
} catch {
case e: Exception => None
}
}
def readTextFileWithTry(filename: String): Try[List[String]] = {
Try {
val lines = using(fromFile(filename)) { source =>
(for (line <- source.getLines) yield line).toList
}
lines
}
}
val year = 2016
val data = readTextFile(s"resources/Babynames/names/yob$year.txt") match {
case Some(lines) =>
val n = lines.filter(_.nonEmpty).map(_.split(",")(0)).toSet
println(n)
case None => println("couldn't read file")
}
val data1 = readTextFileWithTry("resources/Babynames/namesbystate")
data1 match {
case Success(lines) => {
val info = for (stateFile <- data1; if stateFile.endsWith(".TXT")) yield {
val source = fromFile("resources/Babynames/namesbystate/" + stateFile)
val names = source.getLines().filter(_.nonEmpty).map(_.split(",")).
filter(a => a(2).toInt == year).map(a => a(3)).toArray // .toSet
(stateFile.take(2), names)
println(names)
}
}
But in the second case, readTextFileWithTry, I am getting the following error -
Failed, message is: java.io.FileNotFoundException: resources\Babynames\namesbystate (Access is denied)
I guess the reason for the failure is from SO what I understand -
I am trying to open the same file on each iteration of the for loop
Apart from that, I have few concerns regarding how I use -
Is it the good way to use? Can some help me how can I use the TRY on multiple occasions?
I tried to change the return type of readTextFileWithTry like Option[A] or Set/Map or Scala Collection to apply higher-order functions later on that. but not able to succeed. Not sure that is a good practice or not.
How can I use higher-order functions in Success case, as there are multiple operations and in Success case the code blocks get bigger? I can't use any field outside of Success case.
Can someone help me to understand?
I think that you problem has nothing to do with "I am trying to open the same file on each iteration of the for loop" and it is actually the same as in the accepted answer
Unfortunately you didn't provide stack trace so it is not clear on which line this happens. I would guess that the falling call is
val data1 = readTextFileWithTry("resources/Babynames/namesbystate")
And looking at your first code sample:
val info = for (stateFile <- new java.io.File("resources/Babynames/namesbystate").list(); if stateFile.endsWith(".TXT")) yield {
it looks like the path "resources/Babynames/namesbystate" points to a directory. But in your second example you are trying to read it as a file and this is the reason for the error. It comes from the fact that your readTextFileWithTry is not a valid substitute for java.io.File.list call. And File.list doesn't need a wrapper because it doesn't use any intermediate closeable/disposable entity.
P.S. it might make more sense to use File.list(FilenameFilter filter) instead of if stateFile.endsWith(".TXT"))

Scala alternative to series of if statements that append to a list?

I have a Seq[String] in Scala, and if the Seq contains certain Strings, I append a relevant message to another list.
Is there a more 'scalaesque' way to do this, rather than a series of if statements appending to a list like I have below?
val result = new ListBuffer[Err]()
val malformedParamNames = // A Seq[String]
if (malformedParamNames.contains("$top")) result += IntegerMustBePositive("$top")
if (malformedParamNames.contains("$skip")) result += IntegerMustBePositive("$skip")
if (malformedParamNames.contains("modifiedDate")) result += FormatInvalid("modifiedDate", "yyyy-MM-dd")
...
result.toList
If you want to use some scala iterables sugar I would use
sealed trait Err
case class IntegerMustBePositive(msg: String) extends Err
case class FormatInvalid(msg: String, format: String) extends Err
val malformedParamNames = Seq[String]("$top", "aa", "$skip", "ccc", "ddd", "modifiedDate")
val result = malformedParamNames.map { v =>
v match {
case "$top" => Some(IntegerMustBePositive("$top"))
case "$skip" => Some(IntegerMustBePositive("$skip"))
case "modifiedDate" => Some(FormatInvalid("modifiedDate", "yyyy-MM-dd"))
case _ => None
}
}.flatten
result.toList
Be warn if you ask for scala-esque way of doing things there are many possibilities.
The map function combined with flatten can be simplified by using flatmap
sealed trait Err
case class IntegerMustBePositive(msg: String) extends Err
case class FormatInvalid(msg: String, format: String) extends Err
val malformedParamNames = Seq[String]("$top", "aa", "$skip", "ccc", "ddd", "modifiedDate")
val result = malformedParamNames.flatMap {
case "$top" => Some(IntegerMustBePositive("$top"))
case "$skip" => Some(IntegerMustBePositive("$skip"))
case "modifiedDate" => Some(FormatInvalid("modifiedDate", "yyyy-MM-dd"))
case _ => None
}
result
Most 'scalesque' version I can think of while keeping it readable would be:
val map = scala.collection.immutable.ListMap(
"$top" -> IntegerMustBePositive("$top"),
"$skip" -> IntegerMustBePositive("$skip"),
"modifiedDate" -> FormatInvalid("modifiedDate", "yyyy-MM-dd"))
val result = for {
(k,v) <- map
if malformedParamNames contains k
} yield v
//or
val result2 = map.filterKeys(malformedParamNames.contains).values.toList
Benoit's is probably the most scala-esque way of doing it, but depending on who's going to be reading the code later, you might want a different approach.
// Some type definitions omitted
val malformations = Seq[(String, Err)](
("$top", IntegerMustBePositive("$top")),
("$skip", IntegerMustBePositive("$skip")),
("modifiedDate", FormatInvalid("modifiedDate", "yyyy-MM-dd")
)
If you need a list and the order is siginificant:
val result = (malformations.foldLeft(List.empty[Err]) { (acc, pair) =>
if (malformedParamNames.contains(pair._1)) {
pair._2 ++: acc // prepend to list for faster performance
} else acc
}).reverse // and reverse since we were prepending
If the order isn't significant (although if the order's not significant, you might consider wanting a Set instead of a List):
val result = (malformations.foldLeft(Set.empty[Err]) { (acc, pair) =>
if (malformedParamNames.contains(pair._1)) {
acc ++ pair._2
} else acc
}).toList // omit the .toList if you're OK with just a Set
If the predicates in the repeated ifs are more complex/less uniform, then the type for malformations might need to change, as they would if the responses changed, but the basic pattern is very flexible.
In this solution we define a list of mappings that take your IF condition and THEN statement in pairs and we iterate over the inputted list and apply the changes where they match.
// IF THEN
case class Operation(matcher :String, action :String)
def processInput(input :List[String]) :List[String] = {
val operations = List(
Operation("$top", "integer must be positive"),
Operation("$skip", "skip value"),
Operation("$modify", "modify the date")
)
input.flatMap { in =>
operations.find(_.matcher == in).map { _.action }
}
}
println(processInput(List("$skip","$modify", "$skip")));
A breakdown
operations.find(_.matcher == in) // find an operation in our
// list matching the input we are
// checking. Returns Some or None
.map { _.action } // if some, replace input with action
// if none, do nothing
input.flatMap { in => // inputs are processed, converted
// to some(action) or none and the
// flatten removes the some/none
// returning just the strings.

Filtering inside `for` with pattern matching

I am reading a TSV file and using using something like this:
case class Entry(entryType: Int, value: Int)
def filterEntries(): Iterator[Entry] = {
for {
line <- scala.io.Source.fromFile("filename").getLines()
} yield new Entry(line.split("\t").map(x => x.toInt))
}
Now I am both interested in filtering out entries whose entryType are set to 0 and ignoring lines with column count greater or lesser than 2 (that does not match the constructor). I was wondering if there's an idiomatic way to achieve this may be using pattern matching and unapply method in a companion object. The only thing I can think of is using .filter on the resulting iterator.
I will also accept solution not involving for loop but that returns Iterator[Entry]. They solutions must be tolerant to malformed inputs.
This is more state-of-arty:
package object liner {
implicit class R(val sc: StringContext) {
object r {
def unapplySeq(s: String): Option[Seq[String]] = sc.parts.mkString.r unapplySeq s
}
}
}
package liner {
case class Entry(entryType: Int, value: Int)
object I {
def unapply(s: String): Option[Int] = util.Try(s.toInt).toOption
}
object Test extends App {
def lines = List("1 2", "3", "", " 4 5 ", "junk", "0, 100000", "6 7 8")
def entries = lines flatMap {
case r"""\s*${I(i)}(\d+)\s+${I(j)}(\d+)\s*""" if i != 0 => Some(Entry(i, j))
case __________________________________________________ => None
}
Console println entries
}
}
Hopefully, the regex interpolator will make it into the standard distro soon, but this shows how easy it is to rig up. Also hopefully, a scanf-style interpolator will allow easy extraction with case f"$i%d".
I just started using the "elongated wildcard" in patterns to align the arrows.
There is a pupal or maybe larval regex macro:
https://github.com/som-snytt/regextractor
You can create variables in the head of the for-comprehension and then use a guard:
edit: ensure length of array
for {
line <- scala.io.Source.fromFile("filename").getLines()
arr = line.split("\t").map(x => x.toInt)
if arr.size == 2 && arr(0) != 0
} yield new Entry(arr(0), arr(1))
I have solved it using the following code:
import scala.util.{Try, Success}
val lines = List(
"1\t2",
"1\t",
"2",
"hello",
"1\t3"
)
case class Entry(val entryType: Int, val value: Int)
object Entry {
def unapply(line: String) = {
line.split("\t").map(x => Try(x.toInt)) match {
case Array(Success(entryType: Int), Success(value: Int)) => Some(Entry(entryType, value))
case _ =>
println("Malformed line: " + line)
None
}
}
}
for {
line <- lines
entryOption = Entry.unapply(line)
if entryOption.isDefined
} yield entryOption.get
The left hand side of a <- or = in a for-loop may be a fully-fledged pattern. So you may write this:
def filterEntries(): Iterator[Int] = for {
line <- scala.io.Source.fromFile("filename").getLines()
arr = line.split("\t").map(x => x.toInt)
if arr.size == 2
// now you may use pattern matching to extract the array
Array(entryType, value) = arr
if entryType == 0
} yield Entry(entryType, value)
Note that this solution will throw a NumberFormatException if a field is not convertible to an Int. If you do not want that, you'll have to encapsulate x.toInt with a Try and pattern match again.

Rewrite string modifications more functional

I'm reading lines from a file
for (line <- Source.fromFile("test.txt").getLines) {
....
}
I basically want to get a list of paragraphs in the end. If a line is empty, that starts as a new paragraph, and I might want to parse some keyword - value pairs in the future.
The text file contains a list of entries like this (or something similar, like an Ini file)
User=Hans
Project=Blow up the moon
The slugs are going to eat the mustard. // multiline possible!
They are sneaky bastards, those slugs.
User=....
And I basically want to have a List[Project] where Project looks something like
class Project (val User: String, val Name:String, val Desc: String) {}
And the Description is that big chunk of text that doesn't start with a <keyword>=, but can stretch over any number of lines.
I know how to do this in an iterative style. Just do a list of checks for the keywords, and populate an instance of a class, and add it to a list to return later.
But I think it should be possible to do this in proper functional style, possibly with match case, yield and recursion, resulting in a list of objects that have the fields User, Project and so on. The class used is known, as are all the keywords, and the file format is not set in stone either. I'm mostly trying to learn better functional style.
You're obviously parsing something, so it might be the time to use... a parser!
Since your language seems to treat line breaks as significant, you will need to refer to this question to tell the parser so.
Apart from that, a rather simple implementation would be
import scala.util.parsing.combinator.RegexParsers
case class Project(user: String, name: String, description: String)
object ProjectParser extends RegexParsers {
override val whiteSpace = """[ \t]+""".r
def eol : Parser[String] = """\r?\n""".r
def user: Parser[String] = "User=" ~> """[^\n]*""".r <~ eol
def name: Parser[String] = "Project=" ~> """[^\n]*""".r <~ eol
def description: Parser[String] = repsep("""[^\n]+""".r, eol) ^^ { case l => l.mkString("\n") }
def project: Parser[Project] = user ~ name ~ description ^^ { case a ~ b ~ c => Project(a, b, c) }
def projects: Parser[List[Project]] = repsep(project,eol ~ eol)
}
And how to use it:
val sample = """User=foo1
Project=bar1
desc1
desc2
desc3
User=foo
Project=bar
desc4 desc5 desc6
desc7 desc8 desc9"""
import scala.util.parsing.input._
val reader = new CharSequenceReader(sample)
val res = ProjectParser.parseAll(ProjectParser.projects, reader)
if(res.successful) {
print("Found projects: " + res.get)
} else {
print(res)
}
Another possible implementation (since this parser is rather simple), using recursion:
import scala.io.Source
case class Project(user: String, name: String, desc: String)
#scala.annotation.tailrec
def parse(source: Iterator[String], list: List[Project] = Nil): List[Project] = {
val emptyProject = Project("", "", "")
#scala.annotation.tailrec
def parseProject(project: Option[Project] = None): Option[Project] = {
if(source.hasNext) {
val line = source.next
if(!line.isEmpty) {
val splitted = line.span(_ != '=')
parseProject(splitted match {
case (h, t) if h == "User" => project.orElse(Some(emptyProject)).map(_.copy(user = t.drop(1)))
case (h, t) if h == "Project" => project.orElse(Some(emptyProject)).map(_.copy(name = t.drop(1)))
case _ => project.orElse(Some(emptyProject)).map(project => project.copy(desc = (if(project.desc.isEmpty) "" else project.desc ++ "\n") ++ line))
})
} else project
} else project
}
if(source.hasNext) {
parse(source, parseProject().map(_ :: list).getOrElse(list))
} else list.reverse
}
And the test:
object Test {
def source = Source.fromString("""User=Hans
Project=Blow up the moon
The slugs are going to eat the mustard. // multiline possible!
They are sneaky bastards, those slugs.
User=Plop
Project=SO
Some desc""")
def test = println(parse(source.getLines))
}
Which gives:
List(Project(Hans,Blow up the moon,The slugs are going to eat the mustard. // multiline possible!
They are sneaky bastards, those slugs.), Project(Plop,SO,Some desc))
To answer your question without also tackling keyword parsing, fold over the lines and aggregate lines unless it's an empty one, in which case you start a new empty paragraph.
lines.foldLeft(List("")) { (l, x) =>
if (x.isEmpty) "" :: l else (l.head + "\n" + x) :: l.tail
} reverse
You'll notice this has some wrinkles in how it handles zero lines, and multiple and trailing empty lines. Adapt to your needs. Also if you are anal about string concatenations you can collect them in a nested list and flatten in the end (using .map(_.mkString)), this is just to showcase the basic technique of folding a sequence not to a scalar but to a new sequence.
This builds a list in reverse order because list prepend (::) is more efficient than appending to l in each step.
You're obviously building something, so you might want to try... a builder!
Like Jürgen, my first thought was to fold, where you're accumulating a result.
A mutable.Builder does the accumulation mutably, with a collection.generic.CanBuildFrom to indicate the builder to use to make a target collection from a source collection. You keep the mutable thing around just long enough to get a result. So that's my plug for localized mutability. Lest one assume that the path from List[String] to List[Project] is immutable.
To the other fine answers (the ones with non-negative appreciation ratings), I would add that functional style means functional decomposition, and usually small functions.
If you're not using regex parsers, don't neglect regexes in your pattern matches.
And try to spare the dots. In fact, I believe that tomorrow is a Spare the Dots Day, and people with sensitivity to dots are advised to remain indoors.
case class Project(user: String, name: String, description: String)
trait Sample {
val sample = """
|User=Hans
|Project=Blow up the moon
|The slugs are going to eat the mustard. // multiline possible!
|They are sneaky bastards, those slugs.
|
|User=Bob
|I haven't thought up a project name yet.
|
|User=Greta
|Project=Burn the witch
|It's necessary to escape from the witch before
|we blow up the moon. I hope Hans sees it my way.
|Once we burn the bitch, I mean witch, we can
|wreak whatever havoc pleases us.
|""".stripMargin
}
object Test extends App with Sample {
val kv = "(.*?)=(.*)".r
def nonnully(s: String) = if (s == null) "" else s + " "
val empty = Project(null, null, null)
val (res, dummy) = ((List.empty[Project], empty) /: sample.lines) { (acc, line) =>
val (sofar, cur) = acc
line match {
case kv("User", u) => (sofar, cur copy (user = u))
case kv("Project", n) => (sofar, cur copy (name = n))
case kv(k, _) => sys error s"Bad keyword $k"
case x if x.nonEmpty => (sofar, cur copy (description = s"${nonnully(cur.description)}$x"))
case _ if cur != empty => (cur :: sofar, empty)
case _ => (sofar, empty)
}
}
val ps = if (dummy == empty) res.reverse else (dummy :: res).reverse
Console println ps
}
The match can be mashed this way, too:
val (res, dummy) = ((List.empty[Project], empty) /: sample.lines) {
case ((sofar, cur), kv("User", u)) => (sofar, cur copy (user = u))
case ((sofar, cur), kv("Project", n)) => (sofar, cur copy (name = n))
case ((sofar, cur), kv(k, _)) => sys error s"Bad keyword $k"
case ((sofar, cur), x) if x.nonEmpty => (sofar, cur copy (description = s"${nonnully(cur.description)}$x"))
case ((sofar, cur), _) if cur != empty => (cur :: sofar, empty)
case ((sofar, cur), _) => (sofar, empty)
}
Before the fold, it seemed simpler to do paragraphs first. Is that imperative thinking?
object Test0 extends App with Sample {
def grafs(ss: Iterator[String]): List[List[String]] = {
val (g, rest) = ss dropWhile (_.isEmpty) span (_.nonEmpty)
val others = if (rest.nonEmpty) grafs(rest) else Nil
g.toList :: others
}
def toProject(ss: List[String]): Project = {
var p = Project("", "", "")
for (line <- ss; parts = line split '=') parts match {
case Array("User", u) => p = p.copy(user = u)
case Array("Project", n) => p = p.copy(name = n)
case Array(k, _) => sys error s"Bad keyword $k"
case Array(text) => p = p.copy(description = s"${p.description} $text")
}
p
}
val ps = grafs(sample.lines) map toProject
Console println ps
}
class Project (val User: String, val Name:String, val Desc: String) {}
object Project {
def apply(str: String): Project = {
val user = somehowFetchUserName(str)
val name = somehowFetchProjectName(str)
val desc = somehowFetchDescription(str)
new Project(user, name, desc)
}
}
val contents: Array[String] = Source.fromFile("test.txt").mkString.split("\\n\\n")
val list = contents map(Project(_))
will end up with the list of projects.

Scala: Read some data of an Enumerator[T] and return the remaining Enumerator[T]

I am using the asynchronous I/O library of the playframework which uses Iteratees and Enumerators. I now have an Iterator[T] as data sink (for simplification say it's an Iterator[Byte] which stores its content into a file). This Iterator[Byte] is passed to the function which handles the writing.
But before writing I want to add some statistical information at the file begin (for simplification say it's one Byte), so I transfer the iterator the following way before passing it to the write function:
def write(value: Byte, output: Iteratee[Byte]): Iteratee[Byte] =
Iteratee.flatten(output.feed(Input.El(value)))
When I now read the stored file from the disk, I get an Enumerator[Byte] for it.
At first I want to read and remove the additional data and then I want to pass the rest of the Enumerator[Byte] to a function which handles the reading.
So I also need to transform the enumerator:
def read(input: Enumerator[Byte]): (Byte, Enumerator[Byte]) = {
val firstEnumeratorEntry = ...
val remainingEnumerator = ...
(firstEnumeratorEntry, remainingEnumerator)
}
But I have no idea, how to do this. How can I read some bytes from an Enumerator and get the remaining Enumerator?
Replacing Iteratee[Byte] with OutputStream and Enumerator[Byte] with InputStream, this would be very easy:
def write(value: Byte, output: OutputStream) = {
output.write(value)
output
}
def read(input: InputStream) = (input.read,input)
But I need the asynchronous I/O of the play framework.
I wonder if you can tackle your goal from another angle.
That function that would use the remaining enumerator, let's call it remaining, presumably it applies to an iteratee to do the processing of the remainder: remaining |>> iteratee yielding another iteratee. Let's call that resulting iteratee iteratee2... Can you check whether you can get a reference to iteratee2? If that's the case, then you can get and process the first byte using a first iteratee head, then combine head and iteratee2 through flatMap:
val head = Enumeratee.take[Byte](1) &>> Iteratee.foreach[Byte](println)
val processing = for { h <- head; i <- iteratee2 } yield (h, i)
Iteratee.flatten(processing).run
If you cannot get a hold of iteratee2 - which would be the case if your enumerator combines with an enumeratee that you did not implement - then this approach won't work.
Here is one way to achieve this by folding within the Iteratee and an appropriate (kind-of) State accumulator (a tuple here)
I go read the routes file, the first byte will be read as a Char and the other will be appended to a String as UTF-8 bytestrings.
def index = Action {
/*let's do everything asyncly*/
Async {
/*for comprehension for read-friendly*/
for (
i <- read; /*read the file */
(r:(Option[Char], String)) <- i.run /*"create" the related Promise and run it*/
) yield Ok("first : " + r._1.get + "\n" + "rest" + r._2) /* map the Promised result in a correct Request's Result*/
}
}
def read = {
//get the routes file in an Enumerator
val file: Enumerator[Array[Byte]] = Enumerator.fromFile(Play.getFile("/conf/routes"))
//apply the enumerator with an Iteratee that folds the data as wished
file(Iteratee.fold((None, ""):(Option[Char], String)) { (acc, b) =>
acc._1 match {
/*on the first chunk*/ case None => (Some(b(0).toChar), acc._2 + new String(b.tail, Charset.forName("utf-8")))
/*on other chunks*/ case x => (x, acc._2 + new String(b, Charset.forName("utf-8")))
}
})
}
EDIT
I found yet another way using Enumeratee but it needs to create 2 Enumerator s (one short lived). However is it a bit more elegant. We use a "kind-of" Enumeratee but the Traversal one which works at a finer level than Enumeratee (chunck level).
We use take 1 that will take only 1 byte and then close the stream. On the other one, we use drop that simply drops the first byte (because we're using a Enumerator[Array[Byte]])
Furthermore, now read2 has a signature much more closer than what you wished, because it returns 2 enumerators (not so far from Promise, Enumerator)
def index = Action {
Async {
val (first, rest) = read2
val enee = Enumeratee.map[Array[Byte]] {bs => new String(bs, Charset.forName("utf-8"))}
def useEnee(enumor:Enumerator[Array[Byte]]) = Iteratee.flatten(enumor &> enee |>> Iteratee.consume[String]()).run.asInstanceOf[Promise[String]]
for {
f <- useEnee(first);
r <- useEnee(rest)
} yield Ok("first : " + f + "\n" + "rest" + r)
}
}
def read2 = {
def create = Enumerator.fromFile(Play.getFile("/conf/routes"))
val file: Enumerator[Array[Byte]] = create
val file2: Enumerator[Array[Byte]] = create
(file &> Traversable.take[Array[Byte]](1), file2 &> Traversable.drop[Array[Byte]](1))
}
Actually we like Iteratees because they compose. So instead of creating multiple Enumerators from your original one, you rather compose the two Iteratees sequentially (read-first and read-rest), and feed it with your single Enumerator.
For this you need a sequential composition method, now I call it andThen. Here is a rough implementation. Note that returning the unconsumed input is a bit harsh, maybe could customize behavior with a typeclass based on the Input type. Also it doesn't handle passing the leftover stuff from the first iterator to the second one (Exercise :).
object Iteratees {
def andThen[E, A, B](a: Iteratee[E, A], b: Iteratee[E, B]): Iteratee[E, (A,B)] = new Iteratee[E, (A,B)] {
def fold[C](
done: ((A, B), Input[E]) => Promise[C],
cont: ((Input[E]) => Iteratee[E, (A, B)]) => Promise[C],
error: (String, Input[E]) => Promise[C]): Promise[C] = {
a.fold(
(ra, aleft) => b.fold(
(rb, bleft) => done((ra, rb), aleft /* could be magicop(aleft, bleft)*/),
(bcont) => cont(e => bcont(e) map (rb => (ra, rb))),
(s, err) => error(s, err)
),
(acont) => cont(e => andThen[E, A, B](acont(e), b)),
(s, err) => error(s, err)
)
}
}
}
Now you can just use the following:
object Application extends Controller {
def index = Action { Async {
val strings: Enumerator[String] = Enumerator("1","2","3","4")
val takeOne = Cont[String, String](e => e match {
case Input.El(e) => Done(e, Input.Empty)
case x => Error("not enough", x)
})
val takeRest = Iteratee.consume[String]()
val firstAndRest = Iteratees.andThen(takeOne, takeRest)
val futureRes = strings(firstAndRest) flatMap (_.run)
futureRes.map(x => Ok(x.toString)) // prints (1,234)
} }
}