Scala conditional accumulation - scala

I'm trying to implement a function that extracts from a given string "placeholders" delimited by $ character.
Processing the string:
val stringToParse = "ignore/me/$aaa$/once-again/ignore/me/$bbb$/still-to-be/ignored
the result should be:
Seq("aaa", "bbb")
What would be a Scala idiomatic alternative of following implementation using var for toggling accumulation?
import fiddle.Fiddle, Fiddle.println
import scalajs.js
import scala.collection.mutable.ListBuffer
#js.annotation.JSExportTopLevel("ScalaFiddle")
object ScalaFiddle {
// $FiddleStart
val stringToParse = "ignore/me/$aaa$/once-again/ignore/me/$bbb$/still-to-be/ignored"
class StringAccumulator {
val accumulator: ListBuffer[String] = new ListBuffer[String]
val sb: StringBuilder = new StringBuilder("")
var open:Boolean = false
def next():Unit = {
if (open) {
accumulator.append(sb.toString)
sb.clear
open = false
} else {
open = true
}
}
def accumulateIfOpen(charToAccumulate: Char):Unit = {
if (open) sb.append(charToAccumulate)
}
def get(): Seq[String] = accumulator.toList
}
def getPlaceHolders(str: String): Seq[String] = {
val sac = new StringAccumulator
str.foreach(chr => {
if (chr == '$') {
sac.next()
} else {
sac.accumulateIfOpen(chr)
}
})
sac.get
}
println(getPlaceHolders(stringToParse))
// $FiddleEnd
}

I'll present two solutions to you. The first is the most direct translation of what you've done. In Scala, if you hear the word accumulate it usually translates to a variant of fold or reduce.
def extractValues(s: String) =
{
// We can combine the functionality of your boolean and StringBuilder by using an Option
s.foldLeft[(ListBuffer[String],Option[StringBuilder])]((new ListBuffer[String], Option.empty))
{
//As we fold through, we have the accumulated list, possibly a partially built String and the current letter
case ((accumulator,sbOption),char) =>
{
char match
{
//This logic pretty much matches what you had, adjusted to work with the Option
case '$' =>
{
sbOption match
{
case Some(sb) =>
{
accumulator.append(sb.mkString)
(accumulator,None)
}
case None =>
{
(accumulator,Some(new StringBuilder))
}
}
}
case _ =>
{
sbOption.foreach(_.append(char))
(accumulator,sbOption)
}
}
}
}._1.map(_.mkString).toList
}
However, that seems pretty complicated, for what sounds like it should be a simple task. We can use regexes, but those are scary so let's avoid them. In fact, with a little bit of thought this problem actually becomes quite simple.
def extractValuesSimple(s: String) =
{
s.split('$'). //Split the string on the $ character
dropRight(1). //Drops the rightmost item, to handle the case with an odd number of $
zipWithIndex.filter{case (str, index) => index % 2 == 1}. //Filter out all of the even indexed items, which will always be outside of the matching $
map{case (str, index) => str}.toList //Remove the indexes from the output
}

Is this solution enough?
scala> val stringToParse = "ignore/me/$aaa$/once-again/ignore/me/$bbb$/still-to-be/ignored"
stringToParse: String = ignore/me/$aaa$/once-again/ignore/me/$bbb$/still-to-be/ignored
scala> val P = """\$([^\$]+)\$""".r
P: scala.util.matching.Regex = \$([^\$]+)\$
scala> P.findAllIn(stringToParse).map{case P(s) => s}.toSeq
res1: Seq[String] = List(aaa, bbb)

Related

Is there a way to use map-like syntax for setting variables in non-map code blocks in Scala? [duplicate]

This question already has answers here:
Chaining operations on values without naming intermediate values
(2 answers)
Closed 2 years ago.
I am often writing code where I want to evaluate an expression and then pass the result of that expression down to the next block of code as a variable using syntax similar to that used with a map or fold expression.
The sort of code that I want to write would look something like we do with an Option or Future:
Some("foo") map { upper => s"works with $upper"}
but without objects that support map.
A good example (but not the only one) is doing replacement of Json elements in with the spray libraries. Something like:
JsObject(upper.fields +
("obj" -> JsObject(
upper.fields("obj").asJsObject { lower: JsObject => lower.fields +
("data" -> JsObject(lower.fields("data")))
}
))
)
Is there a way to accomplish this without resorting to nesting matches?
Answer 1: Here is an example using the mouse library with Scala 2.12:
def withMouse(upper: JsObject): JsObject = {
upper
.|> { upper => println(upper.compactPrint); upper.fields; }
.|> { upper => upper + ("obj" ->
upper("obj").asJsObject.fields
.|> { lower => JsObject(lower + ("data" ->
lower("data")
))}
)}
.|> { newmsg => JsObject(newmsg) }
}
Answer 2: Here is the same example using the ChainingOps library's .pipe method with Scala 2.13
def withPipe(upper: JsObject): JsObject = {
upper
.pipe { upper => println(upper.compactPrint); upper.fields; }
.pipe { upper => upper + ("obj" ->
upper("obj").asJsObject.fields
.pipe { lower => JsObject(lower + ("data" ->
lower("data")
))}
)}
.pipe { newmsg => JsObject(newmsg) }
}
It sounds like you want pipe().
From the ScalaDocs page:
import scala.util.chaining._
val times6 = (_: Int) * 6
//times6: Int => Int = $$Lambda$2023/975629453#17143b3b
val i = (1 - 2 - 3).pipe(times6).pipe(scala.math.abs)
//i: Int = 24
If not on Scala 2.13, there exists mouse which provides similar chaining operators, for example,
input
.<| { println }
.<| { preconditions }
.thrush { program }
.<| { postconditions }
.<| { println }
where
import mouse.all._
import scala.io.StdIn
case class User(name: String, age: Int, previous: Option[User] = None) {
def changeName(newName: String): User =
this.copy(name = newName, previous = Some(this))
}
def preconditions(user: User): Unit = {
assert(user.name.nonEmpty, "User should have a name")
assert(user.age >= 0, "User's age should not be negative")
}
def postconditions(`new`: User): Unit = {
assert(
`new`.previous.exists(_.name != `new`.name),
"User should have changed their name details"
)
}
def program(user: User): User = {
println(s"Please enter new name for $user")
val newName = StdIn.readLine()
user.changeName(newName)
}
val input = User("Picard", 75)
Note how <| just executes side-effect, whilst thrush may transform.

Filtering inside `for` with pattern matching

I am reading a TSV file and using using something like this:
case class Entry(entryType: Int, value: Int)
def filterEntries(): Iterator[Entry] = {
for {
line <- scala.io.Source.fromFile("filename").getLines()
} yield new Entry(line.split("\t").map(x => x.toInt))
}
Now I am both interested in filtering out entries whose entryType are set to 0 and ignoring lines with column count greater or lesser than 2 (that does not match the constructor). I was wondering if there's an idiomatic way to achieve this may be using pattern matching and unapply method in a companion object. The only thing I can think of is using .filter on the resulting iterator.
I will also accept solution not involving for loop but that returns Iterator[Entry]. They solutions must be tolerant to malformed inputs.
This is more state-of-arty:
package object liner {
implicit class R(val sc: StringContext) {
object r {
def unapplySeq(s: String): Option[Seq[String]] = sc.parts.mkString.r unapplySeq s
}
}
}
package liner {
case class Entry(entryType: Int, value: Int)
object I {
def unapply(s: String): Option[Int] = util.Try(s.toInt).toOption
}
object Test extends App {
def lines = List("1 2", "3", "", " 4 5 ", "junk", "0, 100000", "6 7 8")
def entries = lines flatMap {
case r"""\s*${I(i)}(\d+)\s+${I(j)}(\d+)\s*""" if i != 0 => Some(Entry(i, j))
case __________________________________________________ => None
}
Console println entries
}
}
Hopefully, the regex interpolator will make it into the standard distro soon, but this shows how easy it is to rig up. Also hopefully, a scanf-style interpolator will allow easy extraction with case f"$i%d".
I just started using the "elongated wildcard" in patterns to align the arrows.
There is a pupal or maybe larval regex macro:
https://github.com/som-snytt/regextractor
You can create variables in the head of the for-comprehension and then use a guard:
edit: ensure length of array
for {
line <- scala.io.Source.fromFile("filename").getLines()
arr = line.split("\t").map(x => x.toInt)
if arr.size == 2 && arr(0) != 0
} yield new Entry(arr(0), arr(1))
I have solved it using the following code:
import scala.util.{Try, Success}
val lines = List(
"1\t2",
"1\t",
"2",
"hello",
"1\t3"
)
case class Entry(val entryType: Int, val value: Int)
object Entry {
def unapply(line: String) = {
line.split("\t").map(x => Try(x.toInt)) match {
case Array(Success(entryType: Int), Success(value: Int)) => Some(Entry(entryType, value))
case _ =>
println("Malformed line: " + line)
None
}
}
}
for {
line <- lines
entryOption = Entry.unapply(line)
if entryOption.isDefined
} yield entryOption.get
The left hand side of a <- or = in a for-loop may be a fully-fledged pattern. So you may write this:
def filterEntries(): Iterator[Int] = for {
line <- scala.io.Source.fromFile("filename").getLines()
arr = line.split("\t").map(x => x.toInt)
if arr.size == 2
// now you may use pattern matching to extract the array
Array(entryType, value) = arr
if entryType == 0
} yield Entry(entryType, value)
Note that this solution will throw a NumberFormatException if a field is not convertible to an Int. If you do not want that, you'll have to encapsulate x.toInt with a Try and pattern match again.

Scala: how to traverse stream/iterator collecting results into several different collections

I'm going through log file that is too big to fit into memory and collecting 2 type of expressions, what is better functional alternative to my iterative snippet below?
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)]={
val lines : Iterator[String] = io.Source.fromFile(file).getLines()
val logins: mutable.Map[String, String] = new mutable.HashMap[String, String]()
val errors: mutable.ListBuffer[(String, String)] = mutable.ListBuffer.empty
for (line <- lines){
line match {
case errorPat(date,ip)=> errors.append((ip,date))
case loginPat(date,user,ip,id) =>logins.put(ip, id)
case _ => ""
}
}
errors.toList.map(line => (logins.getOrElse(line._1,"none") + " " + line._1,line._2))
}
Here is a possible solution:
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String,String)] = {
val lines = Source.fromFile(file).getLines
val (err, log) = lines.collect {
case errorPat(inf, ip) => (Some((ip, inf)), None)
case loginPat(_, _, ip, id) => (None, Some((ip, id)))
}.toList.unzip
val ip2id = log.flatten.toMap
err.collect{ case Some((ip,inf)) => (ip2id.getOrElse(ip,"none") + "" + ip, inf) }
}
Corrections:
1) removed unnecessary types declarations
2) tuple deconstruction instead of ulgy ._1
3) left fold instead of mutable accumulators
4) used more convenient operator-like methods :+ and +
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)] = {
val lines = io.Source.fromFile(file).getLines()
val (logins, errors) =
((Map.empty[String, String], Seq.empty[(String, String)]) /: lines) {
case ((loginsAcc, errorsAcc), next) =>
next match {
case errorPat(date, ip) => (loginsAcc, errorsAcc :+ (ip -> date))
case loginPat(date, user, ip, id) => (loginsAcc + (ip -> id) , errorsAcc)
case _ => (loginsAcc, errorsAcc)
}
}
// more concise equivalent for
// errors.toList.map { case (ip, date) => (logins.getOrElse(ip, "none") + " " + ip) -> date }
for ((ip, date) <- errors.toList)
yield (logins.getOrElse(ip, "none") + " " + ip) -> date
}
I have a few suggestions:
Instead of a pair/tuple, it's often better to use your own class. It gives meaningful names to both the type and its fields, which makes the code much more readable.
Split the code into small parts. In particular, try to decouple pieces of code that don't need to be tied together. This makes your code easier to understand, more robust, less prone to errors and easier to test. In your case it'd be good to separate producing your input (lines of a log file) and consuming it to produce a result. For example, you'd be able to make automatic tests for your function without having to store sample data in a file.
As an example and exercise, I tried to make a solution based on Scalaz iteratees. It's a bit longer (includes some auxiliary code for IteratorEnumerator) and perhaps it's a bit overkill for the task, but perhaps someone will find it helpful.
import java.io._;
import scala.util.matching.Regex
import scalaz._
import scalaz.IterV._
object MyApp extends App {
// A type for the result. Having names keeps things
// clearer and shorter.
type LogResult = List[(String,String)]
// Represents a state of our computation. Not only it
// gives a name to the data, we can also put here
// functions that modify the state. This nicely
// separates what we're computing and how.
sealed case class State(
logins: Map[String,String],
errors: Seq[(String,String)]
) {
def this() = {
this(Map.empty[String,String], Seq.empty[(String,String)])
}
def addError(date: String, ip: String): State =
State(logins, errors :+ (ip -> date));
def addLogin(ip: String, id: String): State =
State(logins + (ip -> id), errors);
// Produce the final result from accumulated data.
def result: LogResult =
for ((ip, date) <- errors.toList)
yield (logins.getOrElse(ip, "none") + " " + ip) -> date
}
// An iteratee that consumes lines of our input. Based
// on the given regular expressions, it produces an
// iteratee that parses the input and uses State to
// compute the result.
def logIteratee(errorPat: Regex, loginPat: Regex):
IterV[String,List[(String,String)]] = {
// Consumes a signle line.
def consume(line: String, state: State): State =
line match {
case errorPat(date, ip) => state.addError(date, ip);
case loginPat(date, user, ip, id) => state.addLogin(ip, id);
case _ => state
}
// The core of the iteratee. Every time we consume a
// line, we update our state. When done, compute the
// final result.
def step(state: State)(s: Input[String]): IterV[String, LogResult] =
s(el = line => Cont(step(consume(line, state))),
empty = Cont(step(state)),
eof = Done(state.result, EOF[String]))
// Return the iterate waiting for its first input.
Cont(step(new State()));
}
// Converts an iterator into an enumerator. This
// should be more likely moved to Scalaz.
// Adapted from scalaz.ExampleIteratee
implicit val IteratorEnumerator = new Enumerator[Iterator] {
#annotation.tailrec def apply[E, A](e: Iterator[E], i: IterV[E, A]): IterV[E, A] = {
val next: Option[(Iterator[E], IterV[E, A])] =
if (e.hasNext) {
val x = e.next();
i.fold(done = (_, _) => None, cont = k => Some((e, k(El(x)))))
} else
None;
next match {
case None => i
case Some((es, is)) => apply(es, is)
}
}
}
// main ---------------------------------------------------
{
// Read a file as an iterator of lines:
// val lines: Iterator[String] =
// io.Source.fromFile("test.log").getLines();
// Create our testing iterator:
val lines: Iterator[String] = Seq(
"Error: 2012/03 1.2.3.4",
"Login: 2012/03 user 1.2.3.4 Joe",
"Error: 2012/03 1.2.3.5",
"Error: 2012/04 1.2.3.4"
).iterator;
// Create an iteratee.
val iter = logIteratee("Error: (\\S+) (\\S+)".r,
"Login: (\\S+) (\\S+) (\\S+) (\\S+)".r);
// Run the the iteratee against the input
// (the enumerator is implicit)
println(iter(lines).run);
}
}

Scala extending while loops to do-until expressions

I'm trying to do some experiment with Scala. I'd like to repeat this experiment (randomized) until the expected result comes out and get that result. If I do this with either while or do-while loop, then I need to write (suppose 'body' represents the experiment and 'cond' indicates if it's expected):
do {
val result = body
} while(!cond(result))
It does not work, however, since the last condition cannot refer to local variables from the loop body. We need to modify this control abstraction a little bit like this:
def repeat[A](body: => A)(cond: A => Boolean): A = {
val result = body
if (cond(result)) result else repeat(body)(cond)
}
It works somehow but is not perfect for me since I need to call this method by passing two parameters, e.g.:
val result = repeat(body)(a => ...)
I'm wondering whether there is a more efficient and natural way to do this so that it looks more like a built-in structure:
val result = do { body } until (a => ...)
One excellent solution for body without a return value is found in this post: How Does One Make Scala Control Abstraction in Repeat Until?, the last one-liner answer. Its body part in that answer does not return a value, so the until can be a method of the new AnyRef object, but that trick does not apply here, since we want to return A rather than AnyRef. Is there any way to achieve this? Thanks.
You're mixing programming styles and getting in trouble because of it.
Your loop is only good for heating up your processor unless you do some sort of side effect within it.
do {
val result = bodyThatPrintsOrSomething
} until (!cond(result))
So, if you're going with side-effecting code, just put the condition into a var:
var result: Whatever = _
do {
result = bodyThatPrintsOrSomething
} until (!cond(result))
or the equivalent:
var result = bodyThatPrintsOrSomething
while (!cond(result)) result = bodyThatPrintsOrSomething
Alternatively, if you take a functional approach, you're going to have to return the result of the computation anyway. Then use something like:
Iterator.continually{ bodyThatGivesAResult }.takeWhile(cond)
(there is a known annoyance of Iterator not doing a great job at taking all the good ones plus the first bad one in a list).
Or you can use your repeat method, which is tail-recursive. If you don't trust that it is, check the bytecode (with javap -c), add the #annotation.tailrec annotation so the compiler will throw an error if it is not tail-recursive, or write it as a while loop using the var method:
def repeat[A](body: => A)(cond: A => Boolean): A = {
var a = body
while (cond(a)) { a = body }
a
}
With a minor modification you can turn your current approach in a kind of mini fluent API, which results in a syntax that is close to what you want:
class run[A](body: => A) {
def until(cond: A => Boolean): A = {
val result = body
if (cond(result)) result else until(cond)
}
}
object run {
def apply[A](body: => A) = new run(body)
}
Since do is a reserved word, we have to go with run. The result would now look like this:
run {
// body with a result type A
} until (a => ...)
Edit:
I just realized that I almost reinvented what was already proposed in the linked question. One possibility to extend that approach to return a type A instead of Unit would be:
def repeat[A](body: => A) = new {
def until(condition: A => Boolean): A = {
var a = body
while (!condition(a)) { a = body }
a
}
}
Just to document a derivative of the suggestions made earlier, I went with a tail-recursive implementation of repeat { ... } until(...) that also included a limit to the number of iterations:
def repeat[A](body: => A) = new {
def until(condition: A => Boolean, attempts: Int = 10): Option[A] = {
if (attempts <= 0) None
else {
val a = body
if (condition(a)) Some(a)
else until(condition, attempts - 1)
}
}
}
This allows the loop to bail out after attempts executions of the body:
scala> import java.util.Random
import java.util.Random
scala> val r = new Random()
r: java.util.Random = java.util.Random#cb51256
scala> repeat { r.nextInt(100) } until(_ > 90, 4)
res0: Option[Int] = Some(98)
scala> repeat { r.nextInt(100) } until(_ > 90, 4)
res1: Option[Int] = Some(98)
scala> repeat { r.nextInt(100) } until(_ > 90, 4)
res2: Option[Int] = None
scala> repeat { r.nextInt(100) } until(_ > 90, 4)
res3: Option[Int] = None
scala> repeat { r.nextInt(100) } until(_ > 90, 4)
res4: Option[Int] = Some(94)

Functional way to update an object based on flags

suppose you are writing a class that normalizes strings. That class has a number of configuration flags. For example:
val makeLowerCase: Boolean = true
val removeVowels: Boolean = false
val dropFirstCharacter: Boolean = true
If I were writing mutable code, I would write the following for the normalize method.
def normalize(string: String) = {
var s = string
if (makeLowerCase) {
s = s.toLowerCase
}
if (removeVowels) {
s = s.replaceAll("[aeiou]", "")
}
if (dropFirstCharacter) {
s = s.drop(1)
}
s
}
Is there a clean and easy way of writing these without mutation? Nested conditionals becomes nasty fast. I could create a list of String=>String lambdas, filter it based on the configuration, and then fold the string through it, but I hope there is something easier.
Your best bet is to define your own method:
class ConditionalMapper[A](a: A) {
def changeCheck(p: A => Boolean)(f: A => A) = if (p(a)) f(a) else a
def changeIf(b: Boolean)(f: A => A) = if (b) f(a) else a
}
implicit def conditionally_change_anything[A](a: A) = new ConditionalMapper(a)
Now you chain these things together and write:
class Normer(makeLC: Boolean, remVowel: Boolean, dropFirst: Boolean) {
def normalize(s: String) = {
s.changeIf(makeLC) { _.toLowerCase }
.changeIf(remVowel) { _.replaceAll("[aeiou]","") }
.changeIf(dropFirst){ _.substring(1) }
}
}
Which gives you:
scala> val norm = new Normer(true,false,true)
norm: Normer = Normer#2098746b
scala> norm.normalize("The Quick Brown Fox Jumps Over The Lazy Dog")
res1: String = he quick brown fox jumps over the lazy dog
That said, the mutable solution is not bad either--just keep it to a small block and you'll be fine. It's mostly a problem when you let mutability escape into the wild. (Where "the wild" means "outside your method, or inside any method more than a handful of lines long".)
If you use scalaz |> operator or have a similar one defined in your utility classes you can do this:
case class N(
makeLowerCase: Boolean = true,
removeVowels: Boolean = false,
dropFirstCharacter: Boolean = true) {
def normalize(string: String) = (
string
|> (s => if (makeLowerCase) s.toLowerCase else s)
|> (s => if (removeVowels) s.replaceAll("[aeiou]", "") else s)
|> (s => if (dropFirstCharacter) s.drop(1) else s)
)
}
N(removeVowels=true).normalize("DDABCUI")
// res1: String = dbc