Find String in Char iterator - scala

I have a use case where I need to return a String up to a delimiter String (if found) from an iterator of Char.
The contract:
if the iterator is already exhausted at the start, return None
if the delimiter String is found, return all characters before it (an empty String is fine); the delimiter is dropped
otherwise, return the remaining characters
do not eagerly exhaust the iterator!
I do have this working solution, but it feels like Java (which is where I'm coming from)
class MyClass(str: String) {
  def nextString(iterator: Iterator[Char]): Option[String] = {
    val sb = new StringBuilder
    if (!iterator.hasNext) return None
    while (iterator.hasNext) {
      sb.append(iterator.next())
      if (sb.endsWith(str)) return Some(sb.stripSuffix(str))
    }
    Some(sb.toString())
  }
}
Is there a way I can do this in a more functional way (ideally without changing the method signature)?
Update: here is how I test this:
val desmurfer = new MyClass("_smurf_")
val iterator: Iterator[Char] = "Scala_smurf_is_smurf_great_smurf__smurf_".iterator
println(desmurfer.nextString(iterator))
println(desmurfer.nextString(iterator))
println(desmurfer.nextString(iterator))
println(desmurfer.nextString(iterator))
println(desmurfer.nextString(iterator))
println
println(desmurfer.nextString("FooBarBaz".iterator))
println(desmurfer.nextString("".iterator))
Output:
Some(Scala)
Some(is)
Some(great)
Some()
None
Some(FooBarBaz)
None
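Each call consumes characters only up to and including the next delimiter. Because of that, the repeated println calls in the test can be turned into one lazy iterator of pieces. What follows is a minimal sketch that assumes a non-empty delimiter. The helper name pieces is hypothetical. nextString is re-implemented over a java.lang.StringBuilder only so that the block is self-contained.

```scala
import scala.annotation.tailrec

// Sketch of the question's contract: read characters only up to (and including)
// the next delimiter, then stop. Assumes a non-empty delimiter.
class MyClass(str: String) {
  def nextString(iterator: Iterator[Char]): Option[String] = {
    val sb = new java.lang.StringBuilder
    def endsWithSep: Boolean =
      sb.length >= str.length && sb.substring(sb.length - str.length) == str
    @tailrec
    def loop(): String =
      if (endsWithSep) sb.substring(0, sb.length - str.length) // drop the delimiter
      else if (!iterator.hasNext) sb.toString                  // no delimiter: rest of input
      else { sb.append(iterator.next()); loop() }
    if (iterator.hasNext) Some(loop()) else None
  }
}

// Hypothetical helper: call nextString until it returns None. Iterator.continually
// is lazy, so each piece is only read from `it` when the caller asks for it.
def pieces(d: MyClass, it: Iterator[Char]): Iterator[String] =
  Iterator.continually(d.nextString(it)).takeWhile(_.isDefined).map(_.get)
```

Taking only the first piece leaves the rest of the input unread in the underlying iterator.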

How about this one:
scala> def nextString(itr: Iterator[Char], sep: String): Option[String] = {
| def next(res: String): String =
| if(res endsWith sep) res dropRight sep.size else if(itr.hasNext) next(res:+itr.next) else res
| if(itr.hasNext) Some(next("")) else None
| }
nextString: (itr: Iterator[Char], sep: String)Option[String]
scala> val iterator: Iterator[Char] = "Scala_smurf_is_smurf_great".iterator
iterator: Iterator[Char] = non-empty iterator
scala> println(nextString(iterator, "_smurf_"))
Some(Scala)
scala> println(nextString(iterator, "_smurf_"))
Some(is)
scala> println(nextString(iterator, "_smurf_"))
Some(great)
scala> println(nextString(iterator, "_smurf_"))
None
scala> println(nextString("FooBarBaz".iterator, "_smurf_"))
Some(FooBarBaz)

What about this one?
def nextString(iterator: Iterator[Char]): Option[String] = {
  // str is the delimiter from MyClass(str: String)
  val t = iterator.toStream
  val index = t.indexOfSlice(str)
  if (t.isEmpty) None
  else if (index == -1) Some(t.mkString)
  else Some(t.slice(0, index).mkString)
}
It passes these tests:
val desmurfer = new MyClass("_smurf_")
val iterator: Iterator[Char] = "Scala_smurf_is_smurf_great_smurf__smurf_".iterator
assert(desmurfer.nextString(iterator) == Some("Scala"))
assert(desmurfer.nextString(iterator) == Some("is"))
assert(desmurfer.nextString(iterator) == Some("great"))
assert(desmurfer.nextString(iterator) == Some(""))
assert(desmurfer.nextString(iterator) == None)
assert(desmurfer.nextString("FooBarBaz".iterator) == Some("FooBarBaz"))
assert(desmurfer.nextString("".iterator) == None)
Update: removed "index == -1 &&" from the first if condition.

This seems to do what you want. @Eastsun's answer motivated me:
import scala.annotation.tailrec
import scala.collection.immutable.Queue

val str = "hello"
def nextString2(iterator: Iterator[Char]): Option[String] = {
  val maxSize = str.size
  @tailrec
  def inner(collected: List[Char], queue: Queue[Char]): Option[List[Char]] =
    if (queue.size == maxSize && queue.sameElements(str))
      Some(collected.reverse.dropRight(maxSize))
    else
      iterator.find(x => true) match {
        case Some(el) => inner(el :: collected, if (queue.size == maxSize) queue.dequeue._2.enqueue(el) else queue.enqueue(el))
        case None => Some(collected.reverse)
      }
  if (iterator.hasNext)
    inner(Nil, Queue.empty).map(_.mkString)
  else
    None
}
test(nextString2(Nil.iterator)) === None
test(nextString2("".iterator)) === None
test(nextString2("asd".iterator)) === Some("asd")
test(nextString2("asfhello".iterator)) === Some("asf")
test(nextString2("asehelloasdasd".iterator)) === Some("ase")
But I honestly think it's too complicated to use. Sometimes you have to use non-FP constructs in Scala to be performance-efficient.
P.S. I didn't know how to match an iterator on its first element, so I've used iterator.find(x => true), which is ugly. Sorry.
P.P.S. A bit of explanation: I recursively build up collected with the characters consumed so far, and I also keep a queue with the last str.size elements. On each step I compare this queue with str. This might not be the most efficient way of doing it; you might go with the Aho–Corasick algorithm or an analogue if you want more.
P.P.P.S. And I am using the iterator as state, which is probably not the FP way.
P.P.P.P.S. And your test passes as well:
val desmurfer = new MyClass("_smurf_")
val iterator: Iterator[Char] = "Scala_smurf_is_smurf_great".iterator
test(desmurfer.nextString2(iterator)) === Some("Scala")
test(desmurfer.nextString2(iterator)) === Some("is")
test(desmurfer.nextString2(iterator)) === Some("great")
test(desmurfer.nextString2(iterator)) === None
println()
test(desmurfer.nextString2("FooBarBaz".iterator)) === Some("FooBarBaz")
test(desmurfer.nextString2("".iterator)) === None
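The P.P.S. above mentions Aho–Corasick. For a single delimiter, the same idea reduces to Knuth-Morris-Pratt matching. Instead of a queue of the last str.size characters, you keep only the length of the delimiter prefix matched so far. On a mismatch you fall back through a precomputed failure table. Below is a sketch under that assumption. KmpSplitter is a hypothetical name, and the delimiter must be non-empty. Like the answers above, it stops reading right after the delimiter.

```scala
import scala.annotation.tailrec

// Hypothetical KmpSplitter: single-pattern, KMP-style streaming delimiter search.
final class KmpSplitter(sep: String) {
  require(sep.nonEmpty, "delimiter must be non-empty")

  // failure(i): length of the longest proper prefix of sep(0..i) that is also a suffix of it
  private val failure: Array[Int] = {
    val f = new Array[Int](sep.length)
    var k = 0
    for (i <- 1 until sep.length) {
      while (k > 0 && sep(i) != sep(k)) k = f(k - 1)
      if (sep(i) == sep(k)) k += 1
      f(i) = k
    }
    f
  }

  // New matched-prefix length after reading c while k characters of sep were matched
  @tailrec
  private def advance(k: Int, c: Char): Int =
    if (c == sep(k)) k + 1
    else if (k == 0) 0
    else advance(failure(k - 1), c)

  def nextString(it: Iterator[Char]): Option[String] =
    if (!it.hasNext) None
    else {
      val sb = new StringBuilder
      @tailrec
      def loop(matched: Int): String =
        if (matched == sep.length) sb.toString.dropRight(sep.length) // delimiter found: drop it
        else if (!it.hasNext) sb.toString                            // input exhausted
        else {
          val c = it.next()
          sb.append(c)
          loop(advance(matched, c))
        }
      Some(loop(0))
    }
}
```

The failure table also handles overlapping prefixes. For example, with the delimiter "aab", the input "aaab" still finds the match at index 1.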

Here's one I'm posting just because it's a bit warped :) I wouldn't recommend actually using it:
class MyClass2(str: String) {
  val sepLength = str.length
  def nextString(iterator: Iterator[Char]): Option[String] = {
    if (!iterator.hasNext) return None
    // windows of sepLength chars, advancing one char at a time
    val (before, rest) = iterator.sliding(sepLength).span(_.mkString != str)
    val prefix = before.toList // must be consumed before rest is used
    // rest starts with the delimiter window if the delimiter was found
    if (rest.hasNext) Some(prefix.map(_.head).mkString)
    // no delimiter: first char of every window except the last, plus the whole last window
    // (that last window may be a single partial one if the input is shorter than str)
    else Some(prefix.init.map(_.head).mkString + prefix.last.mkString)
  }
}
The idea is this: calling sliding() on the underlying iterator gives a sequence of windows, and one of them will be the delimiter, if it's present. So we can use span (a takeWhile that also hands back the rest) to find the delimiter and to tell whether it was found at all. The first characters of the windows before the delimiter then form the string we skipped over. As I said, warped.
I'd really like sliding to be defined so that it produced all subsequences of length n and at the end sequences of length n-1, n-2....1 for this particular use case, but it doesn't, and the horrible if statement at the end is dealing with the various cases.
It passes the test cases :)
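The sliding trick can be checked in isolation. This sketch uses the "_smurf_" delimiter from the question. It shows the windows that Iterator.sliding produces and how their first characters spell the skipped prefix. It also shows the end-of-input behaviour that the previous paragraph complains about.

```scala
// Windows of the delimiter's length (7), advancing one character at a time
val windows = "Scala_smurf_is".iterator.sliding(7).map(_.mkString).toList

// Windows before the delimiter window; their first characters spell the skipped prefix
val prefix = windows.takeWhile(_ != "_smurf_").map(_.head).mkString

// Once the input runs out, no shorter trailing windows are produced...
val full = "abc".iterator.sliding(2).map(_.mkString).toList
// ...but an input shorter than the window size yields a single partial window
val tooShort = "ab".iterator.sliding(5).map(_.mkString).toList
```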

Update: this version works without converting the iterator to a String:
def nextString(iterator: Iterator[Char]): Option[String] = {
  if (iterator.isEmpty) None
  // note: foldLeft reads the whole iterator, even after the delimiter has been seen
  else Some(iterator.foldLeft("") { (result, currentChar) =>
    if (result.endsWith(str)) result else result + currentChar
  }.stripSuffix(str))
}

A colleague provided the makings of this answer, which is a mixture of his original approach and some polishing from my side. Thanks, Evans!
Then another colleague also added some input. Thanks, Ako :-)
class MyClass(str: String) {
  def nextString(iterator: Iterator[Char]): Option[String] = {
    def nextString(iterator: Iterator[Char], sb: StringBuilder): Option[String] = {
      if (!iterator.hasNext || sb.endsWith(str)) {
        Some(sb.stripSuffix(str))
      } else {
        nextString(iterator, sb.append(iterator.next()))
      }
    }
    if (!iterator.hasNext) None
    else nextString(iterator, new StringBuilder)
  }
}
So far, I like this approach best, so I will accept it in two days unless there is a better answer by then.

Related

Scala conditional accumulation

I'm trying to implement a function that extracts from a given string "placeholders" delimited by $ character.
Processing the string:
val stringToParse = "ignore/me/$aaa$/once-again/ignore/me/$bbb$/still-to-be/ignored"
the result should be:
Seq("aaa", "bbb")
What would be an idiomatic Scala alternative to the following implementation, which uses a var to toggle accumulation?
import fiddle.Fiddle, Fiddle.println
import scalajs.js
import scala.collection.mutable.ListBuffer

@js.annotation.JSExportTopLevel("ScalaFiddle")
object ScalaFiddle {
  // $FiddleStart
  val stringToParse = "ignore/me/$aaa$/once-again/ignore/me/$bbb$/still-to-be/ignored"

  class StringAccumulator {
    val accumulator: ListBuffer[String] = new ListBuffer[String]
    val sb: StringBuilder = new StringBuilder("")
    var open: Boolean = false

    def next(): Unit = {
      if (open) {
        accumulator.append(sb.toString)
        sb.clear
        open = false
      } else {
        open = true
      }
    }

    def accumulateIfOpen(charToAccumulate: Char): Unit = {
      if (open) sb.append(charToAccumulate)
    }

    def get(): Seq[String] = accumulator.toList
  }

  def getPlaceHolders(str: String): Seq[String] = {
    val sac = new StringAccumulator
    str.foreach(chr => {
      if (chr == '$') {
        sac.next()
      } else {
        sac.accumulateIfOpen(chr)
      }
    })
    sac.get
  }

  println(getPlaceHolders(stringToParse))
  // $FiddleEnd
}
I'll present two solutions to you. The first is the most direct translation of what you've done. In Scala, if you hear the word accumulate it usually translates to a variant of fold or reduce.
def extractValues(s: String) =
{
  // We can combine the functionality of your boolean and StringBuilder by using an Option
  s.foldLeft[(ListBuffer[String], Option[StringBuilder])]((new ListBuffer[String], Option.empty))
  {
    // As we fold through, we have the accumulated list, possibly a partially built String and the current letter
    case ((accumulator, sbOption), char) =>
    {
      char match
      {
        // This logic pretty much matches what you had, adjusted to work with the Option
        case '$' =>
        {
          sbOption match
          {
            case Some(sb) =>
            {
              accumulator.append(sb.mkString)
              (accumulator, None)
            }
            case None =>
            {
              (accumulator, Some(new StringBuilder))
            }
          }
        }
        case _ =>
        {
          sbOption.foreach(_.append(char))
          (accumulator, sbOption)
        }
      }
    }
  }._1.map(_.mkString).toList
}
However, that seems pretty complicated, for what sounds like it should be a simple task. We can use regexes, but those are scary so let's avoid them. In fact, with a little bit of thought this problem actually becomes quite simple.
def extractValuesSimple(s: String) =
{
  s.split("\\$", -1). //Split the string on the $ character; the -1 limit keeps trailing empty strings, so a placeholder at the very end is not lost
    dropRight(1). //Drops the rightmost item, to handle the case with an odd number of $
    zipWithIndex.filter{case (str, index) => index % 2 == 1}. //Filter out all of the even indexed items, which will always be outside of the matching $
    map{case (str, index) => str}.toList //Remove the indexes from the output
}
Is this solution enough?
scala> val stringToParse = "ignore/me/$aaa$/once-again/ignore/me/$bbb$/still-to-be/ignored"
stringToParse: String = ignore/me/$aaa$/once-again/ignore/me/$bbb$/still-to-be/ignored
scala> val P = """\$([^\$]+)\$""".r
P: scala.util.matching.Regex = \$([^\$]+)\$
scala> P.findAllIn(stringToParse).map{case P(s) => s}.toSeq
res1: Seq[String] = List(aaa, bbb)
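Below is a sketched variant of the same regex approach. It uses findAllMatchIn and reads capture group 1 directly, instead of pattern matching on P inside map.

```scala
val stringToParse = "ignore/me/$aaa$/once-again/ignore/me/$bbb$/still-to-be/ignored"

// Same pattern as above: group 1 is the text between two $ signs
val P = """\$([^$]+)\$""".r
val placeholders = P.findAllMatchIn(stringToParse).map(_.group(1)).toList
```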

Multiple if else statements to get non-empty value from a map in Scala

I have a string to string map, and its value can be an empty string. I want to assign a non-empty value to a variable to use it somewhere. Is there a better way to write this in Scala?
import scala.collection.mutable
var keyvalue = mutable.Map.empty[String, String]
keyvalue += ("key1" -> "value1")
var myvalue = ""
if (keyvalue.get("key1").isDefined &&
    keyvalue("key1").length > 0) {
  myvalue = keyvalue("key1")
}
else if (keyvalue.get("key2").isDefined &&
    keyvalue("key2").length > 0) {
  myvalue = keyvalue("key2")
}
else if (keyvalue.get("key3").isDefined &&
    keyvalue("key3").length > 0) {
  myvalue = keyvalue("key3")
}
A more idiomatic way would be to use filter to check the length of the string contained in the Option, then orElse and getOrElse to assign to a val. A crude example:
def getKey(key: String): Option[String] = keyvalue.get(key).filter(_.length > 0)
val myvalue: String = getKey("key1")
.orElse(getKey("key2"))
.orElse(getKey("key3"))
.getOrElse("")
Here's a similar way to do it with an arbitrary list of fallback keys. Using a view and collectFirst, keyvalue.get is evaluated only as many times as needed (or for every key, if there is no match).
val myvalue: String = List("key1", "key2", "key3").view
.map(keyvalue.get)
.collectFirst { case Some(value) if(value.length > 0) => value }
.getOrElse("")
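To see the laziness claim in action, this sketch wraps keyvalue.get in a hypothetical countingGet that counts lookups. Because key1 holds an empty string, only key1 and key2 are looked up.

```scala
import scala.collection.mutable

val keyvalue = mutable.Map("key1" -> "", "key2" -> "value2", "key3" -> "value3")

// Hypothetical wrapper around keyvalue.get that counts how many keys were looked up
var lookups = 0
def countingGet(key: String): Option[String] = { lookups += 1; keyvalue.get(key) }

val myvalue: String = List("key1", "key2", "key3").view
  .map(countingGet)
  .collectFirst { case Some(value) if value.length > 0 => value }
  .getOrElse("")
```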
Mmm, it seems it took me too long to devise a generic solution and another answer was accepted, but here it goes:
def getOrTryAgain(map: mutable.Map[String, String], keys: List[String]): Option[String] =
{
  if (keys.isEmpty)
    None
  else
    map.get(keys.head).filter(_.length > 0).orElse(getOrTryAgain(map, keys.tail))
}
val myvalue2 = getOrTryAgain(keyvalue, List("key1", "key2", "key3"))
This one you can use to check for as many keys as you want.
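In getOrTryAgain, the recursive call runs inside orElse, so it is not in tail position. A tail-recursive sketch of the same fallback, pattern matching on the key list, could look like this. KeyFallback and firstNonEmpty are hypothetical names.

```scala
import scala.annotation.tailrec

object KeyFallback {
  // Walk the keys in order and stop at the first key whose value is non-empty
  @tailrec
  def firstNonEmpty(map: collection.Map[String, String], keys: List[String]): Option[String] =
    keys match {
      case Nil => None
      case key :: rest =>
        map.get(key).filter(_.nonEmpty) match {
          case found @ Some(_) => found
          case None            => firstNonEmpty(map, rest)
        }
    }
}
```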

Better safe get from an array in scala?

I want to get the first argument of the main method, which is optional, with something like this:
val all = args(0) == "all"
However, this would fail with an ArrayIndexOutOfBoundsException if no argument is provided.
Is there a simple one-liner that sets all to false when args(0) is missing, without the usual if-no-args-set-false-else... thingy?
In the general case you can use lifting:
args.lift(0).map(_ == "all").getOrElse(false)
Or even (thanks to @enzyme):
args.lift(0).contains("all")
You can use headOption and fold (on Option):
val all = args.headOption.fold(false)(_ == "all")
Of course, as @mohit pointed out, map followed by getOrElse will work as well.
If you really need indexed access, you could pimp a get method on any Seq:
implicit class RichIndexedSeq[V, T <% Seq[V]](seq: T) {
  def get(i: Int): Option[V] =
    if (i < 0 || i >= seq.length) None
    else Some(seq(i))
}
However, if this is really about arguments, you'll probably be better off handling arguments in a fold:
case class MyArgs(n: Int = 1, debug: Boolean = false,
                  file: Option[String] = None)
val myArgs = args.foldLeft(MyArgs()) {
  case (args, "-debug") =>
    args.copy(debug = true)
  case (args, str) if str.startsWith("-n") =>
    args.copy(n = ???) // parse string
  case (args, str) if str.startsWith("-f") =>
    args.copy(file = Some(???)) // parse string
  case _ =>
    sys.error("Unknown arg")
}
if (myArgs.file.isEmpty)
  sys.error("Need file")
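The ??? placeholders above are left for you to fill in. Purely as an illustration, here is a runnable sketch that assumes a hypothetical flag syntax: -debug, -n<number> (e.g. -n3) and -f<path> (e.g. -fdata.txt). It collects errors in an Either instead of calling sys.error. parseArgs is a hypothetical name.

```scala
import scala.util.Try

case class MyArgs(n: Int = 1, debug: Boolean = false, file: Option[String] = None)

// Assumed flag syntax (illustrative only): "-debug", "-n<number>", "-f<path>"
def parseArgs(args: Seq[String]): Either[String, MyArgs] =
  args.foldLeft[Either[String, MyArgs]](Right(MyArgs())) {
    case (Left(err), _)       => Left(err) // keep the first error, ignore the rest
    case (Right(a), "-debug") => Right(a.copy(debug = true))
    case (Right(a), s) if s.startsWith("-n") =>
      Try(s.drop(2).toInt).toOption.map(n => a.copy(n = n)).toRight(s"Bad number: $s")
    case (Right(a), s) if s.startsWith("-f") =>
      Right(a.copy(file = Some(s.drop(2))))
    case (_, s) => Left(s"Unknown arg: $s")
  }.flatMap(a => if (a.file.isEmpty) Left("Need file") else Right(a))
```

Returning an Either keeps the fold pure and leaves the decision to exit (or print usage) to the caller.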
You can use foldLeft with an initial false value:
val all = (false /: args)(_ | _ == "all")
But be careful: one-liners can be difficult to read. Note also that this one is true if any argument equals "all", not just the first.
Something like this will work, assuming args(0) returns an Option (Some or None), which is not the case for a plain Array[String]:
val all = args(0).map(_ == "all").getOrElse(false)

Filtering inside `for` with pattern matching

I am reading a TSV file and using something like this:
case class Entry(entryType: Int, value: Int)
def filterEntries(): Iterator[Entry] = {
  for {
    line <- scala.io.Source.fromFile("filename").getLines()
  } yield new Entry(line.split("\t").map(x => x.toInt))
}
Now I am interested both in filtering out entries whose entryType is set to 0 and in ignoring lines with a column count greater or less than 2 (which do not match the constructor). I was wondering if there's an idiomatic way to achieve this, maybe using pattern matching and an unapply method in a companion object. The only thing I can think of is using .filter on the resulting iterator.
I will also accept a solution that does not involve a for loop, as long as it returns Iterator[Entry]. The solutions must be tolerant of malformed inputs.
This is more state-of-arty:
package object liner {
  implicit class R(val sc: StringContext) {
    object r {
      def unapplySeq(s: String): Option[Seq[String]] = sc.parts.mkString.r unapplySeq s
    }
  }
}
package liner {
  case class Entry(entryType: Int, value: Int)
  object I {
    def unapply(s: String): Option[Int] = util.Try(s.toInt).toOption
  }
  object Test extends App {
    def lines = List("1 2", "3", "", " 4 5 ", "junk", "0, 100000", "6 7 8")
    def entries = lines flatMap {
      case r"""\s*${I(i)}(\d+)\s+${I(j)}(\d+)\s*""" if i != 0 => Some(Entry(i, j))
      case __________________________________________________ => None
    }
    Console println entries
  }
}
Hopefully, the regex interpolator will make it into the standard distro soon, but this shows how easy it is to rig up. Also hopefully, a scanf-style interpolator will allow easy extraction with case f"$i%d".
I just started using the "elongated wildcard" in patterns to align the arrows.
There is a pupal or maybe larval regex macro:
https://github.com/som-snytt/regextractor
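Since Scala 2.13, the standard s interpolator can itself be used in patterns. It matches glob-style rather than by regex. That makes it stricter than the r"..." extractor above (there is no \s* padding), but it needs no custom code beyond an I extractor like the one above. A sketch, assuming Scala 2.13 or later:

```scala
import scala.util.Try

case class Entry(entryType: Int, value: Int)

// Extractor that succeeds only when the whole string parses as an Int
object I {
  def unapply(s: String): Option[Int] = Try(s.toInt).toOption
}

// s"..." pattern (Scala 2.13+): "<a> <b>", where both sides must parse as Ints
def entries(lines: List[String]): List[Entry] = lines.collect {
  case s"${I(i)} ${I(j)}" if i != 0 => Entry(i, j)
}
```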
You can create variables in the head of the for-comprehension and then use a guard:
Edit: ensure the length of the array.
for {
  line <- scala.io.Source.fromFile("filename").getLines()
  arr = line.split("\t").map(x => x.toInt)
  if arr.size == 2 && arr(0) != 0
} yield new Entry(arr(0), arr(1))
I have solved it using the following code:
import scala.util.{Try, Success}
val lines = List(
  "1\t2",
  "1\t",
  "2",
  "hello",
  "1\t3"
)
case class Entry(val entryType: Int, val value: Int)
object Entry {
  def unapply(line: String) = {
    line.split("\t").map(x => Try(x.toInt)) match {
      case Array(Success(entryType: Int), Success(value: Int)) => Some(Entry(entryType, value))
      case _ =>
        println("Malformed line: " + line)
        None
    }
  }
}
for {
  line <- lines
  entryOption = Entry.unapply(line)
  if entryOption.isDefined
} yield entryOption.get
The left-hand side of a <- or = in a for-loop may be a fully-fledged pattern. So you may write this:
def filterEntries(): Iterator[Entry] = for {
  line <- scala.io.Source.fromFile("filename").getLines()
  arr = line.split("\t").map(x => x.toInt)
  if arr.size == 2
  // now you may use pattern matching to extract the array
  Array(entryType, value) = arr
  if entryType != 0 // drop entries whose entryType is 0
} yield Entry(entryType, value)
Note that this solution will throw a NumberFormatException if a field is not convertible to an Int. If you do not want that, you'll have to encapsulate x.toInt with a Try and pattern match again.
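Following that suggestion, here is a sketch that wraps toInt in Try and returns an Option per line, so malformed lines are skipped instead of throwing. parseEntry is a hypothetical helper. filterEntries takes the lines as a parameter so it can be tested without a file; in real code you would pass, for example, scala.io.Source.fromFile("filename").getLines().

```scala
import scala.util.Try

case class Entry(entryType: Int, value: Int)

// Exactly two tab-separated Ints, otherwise None. The -1 limit keeps trailing
// empty fields, so a line such as "1\t2\t" is rejected as malformed.
def parseEntry(line: String): Option[Entry] =
  line.split("\t", -1) match {
    case Array(a, b) =>
      for {
        entryType <- Try(a.toInt).toOption
        value     <- Try(b.toInt).toOption
      } yield Entry(entryType, value)
    case _ => None
  }

def filterEntries(lines: Iterator[String]): Iterator[Entry] =
  for {
    line  <- lines
    entry <- parseEntry(line)
    if entry.entryType != 0
  } yield entry
```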

Scala: Print separators when using output stream

When we need a list of strings to be concatenated, we can use the mkString method:
val concatenatedString = listOfString.mkString
However, when we have a very long list of strings, building the concatenated string may not be a good choice. In that case, it would be more appropriate to print directly to an output stream. Writing to the output stream is simple:
listOfString.foreach(outstream.write _)
However, I don't know a neat way to append separators. One thing I tried is looping with an index:
var i = 0
for (str <- listOfString) {
  if (i != 0) outstream.write(", ")
  outstream.write(str)
  i += 1
}
This works, but it is too wordy. Although I could write a function that encapsulates the code above, I want to know whether the Scala API already has a function that does the same thing.
Thank you.
Here is a function that does what you want in a bit more elegant way:
def commaSeparated(list: List[String]): Unit = list match {
  case List() =>
  case List(a) => print(a)
  case h :: t =>
    print(h + ", ")
    commaSeparated(t)
}
The recursion avoids mutable variables.
To make it even more functional style, you can pass in the function that you want to use on each item, that is:
def commaSeparated(list: List[String], func: String => Unit): Unit = list match {
  case List() =>
  case List(a) => func(a)
  case h :: t =>
    func(h + ", ")
    commaSeparated(t, func)
}
And then call it by:
commaSeparated(mylist, outstream.write _)
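The recursion above is specific to List. Below is a sketch of the same idea over any Iterator, writing straight to a java.io.Writer; writeSeparated is a hypothetical name. It writes the first element, then a separator before each later one, so nothing has to be concatenated in memory first.

```scala
import java.io.{StringWriter, Writer}

// Writes items to out with sep between them (never after the last one),
// pulling one element at a time from the iterator
def writeSeparated(items: Iterator[String], out: Writer, sep: String = ", "): Unit =
  if (items.hasNext) {
    out.write(items.next())
    items.foreach { s => out.write(sep); out.write(s) }
  }
```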
I believe what you want are the overloaded definitions of mkString.
Definitions of mkString:
scala> val strList = List("hello", "world", "this", "is", "bob")
strList: List[String] = List(hello, world, this, is, bob)
def mkString: String
scala> strList.mkString
res0: String = helloworldthisisbob
def mkString(sep: String): String
scala> strList.mkString(", ")
res1: String = hello, world, this, is, bob
def mkString(start: String, sep: String, end: String): String
scala> strList.mkString("START", ", ", "END")
res2: String = STARThello, world, this, is, bobEND
EDIT
How about this?
scala> strList.view.map(_ + ", ").foreach(print) // or .iterator.map
hello, world, this, is, bob,
Not good for parallelized code, but otherwise:
val it = listOfString.iterator
it.foreach{x => print(x); if (it.hasNext) print(' ')}
Here's another approach which avoids the var
listOfString.zipWithIndex.foreach { case (s, i) =>
  if (i != 0) outstream write ","
  outstream write s
}
Self Answer:
I wrote a function encapsulates the code in the original question:
import scala.language.{implicitConversions, reflectiveCalls}

implicit def withSeparator[S >: String](seq: Seq[S]) = new {
  def withSeparator(write: S => Any, sep: String = ",") = {
    var i = 0
    for (str <- seq) {
      if (i != 0) write(sep)
      write(str)
      i += 1
    }
    seq
  }
}
You can use it like this:
listOfString.withSeparator(print _)
The separator can also be assigned:
listOfString.withSeparator(print _, ",\n")
Thank you to everyone who answered. What I wanted was a concise and not too slow way to do this. The implicit function withSeparator looks like what I wanted, so I'm accepting my own answer for this question. Thank you again.
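In Scala 2, the implicit def above returns an anonymous structural type (new { ... }). Calls to withSeparator on it therefore go through reflection, which is why it needs the reflectiveCalls feature import. Below is a sketch of the same helper as an implicit class, which avoids the structural type. SeparatorSyntax and WithSeparatorOps are hypothetical names.

```scala
object SeparatorSyntax {
  // Named implicit class instead of `new { ... }`, so no structural type is involved
  implicit class WithSeparatorOps(seq: Seq[String]) {
    def withSeparator(write: String => Any, sep: String = ","): Seq[String] = {
      val it = seq.iterator
      if (it.hasNext) write(it.next())         // first element: no separator before it
      it.foreach { s => write(sep); write(s) } // every later element gets one
      seq
    }
  }
}
import SeparatorSyntax._
```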