Minimal substring satisfying a condition in Scala

I have a string, let's say val mystr = "abcde", and I want to find the minimal substring of mystr that satisfies a given condition. The condition is checked by an external system, so the only way to do this is to iterate over the prefixes of the string, send each one to the external system, and stop as soon as the response is true.
eg.
callExtSystemWith("a") //Returns false
callExtSystemWith("ab") //Returns false
callExtSystemWith("abc") //Returns true
Then my method should return "abc". I read that breaks are not the Scala way, so I was wondering what the Scala way of achieving this is.
Right now I have:
for (end <- 1 to mystr.length) {
callExtSystemWith(mystr.substring(0,end))
// I Want to break when this is true.
}
Help much appreciated

You can use (1 to s.length).map(s.take).toStream to create a lazy stream of the prefixes a, ab, abc, abcd, and so on.
Then filter those strings, so that only the ones for which callExtSystemWith returns true are left.
Then take the first of them. Because this is a lazy stream, no unnecessary requests will be made to the server once the first match is found.
val s = "abcdefgh"
val strs = (1 to s.length).map(s.take).toStream
strs.filter(callExtSystemWith).headOption match {
case Some(s) => "found"
case _ => "not found"
}
You can also use find instead of filter + headOption

Quite often, break can be replaced with find on some sequence.
Here is another short solution for this problem:
def findSuitablePrefix(mystr: String): Option[String] =
(1 to mystr.length).view.map(mystr.substring(0, _)).find(callExtSystemWith)
.view makes the evaluation lazy to avoid creating extra substrings.
.map transforms the sequence of indexes into a sequence of substrings.
And .find "breaks" after the first element for which callExtSystemWith returns true is found.
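To make the early exit visible, here is a minimal sketch of the same idea with a counting stub in place of the external system. The names PrefixSearchSketch, shortestPrefix and the stubbed callExtSystemWith are assumptions for illustration only; the real call would presumably go over the network.

```scala
object PrefixSearchSketch {
  // Hypothetical stand-in for the external system: it counts calls so the
  // early exit can be observed, and accepts any string of length >= 3.
  var calls = 0
  def callExtSystemWith(s: String): Boolean = {
    calls += 1
    s.length >= 3
  }

  // Shortest prefix accepted by the predicate, or None if none is accepted.
  // The iterator builds each prefix only when find asks for it.
  def shortestPrefix(s: String, accept: String => Boolean): Option[String] =
    (1 to s.length).iterator.map(s.substring(0, _)).find(accept)
}
```

With the stub above, searching "abcde" should make three calls ("a", "ab", "abc") and then stop.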

In Scala there are no normal breaks but there are other solutions. The one I like better is to create a function and force a return (instead of a normal break). Something like:
def callAndBreak(mystr: String): Int = {
for (end <- 1 to mystr.length) {
if (callExtSystemWith(mystr.substring(0, end))) return end
}
-1 // no prefix satisfied the condition
}
Here I return end (or -1 when nothing matches), but you can return anything.

If you want to avoid using return or breaks, you could also use foldLeft:
val finalResult = (1 to mystr.length).foldLeft(false) { (result, end) =>
if(!result) callExtSystemWith(mystr.substring(0, end)) else result
}
However, it is a bit hard to read, and will walk the entire length of the string.
Simple recursion might be a better way:
def go(s: String, end: Int): Boolean = {
if (end > s.length) false
else {
callExtSystemWith(s.substring(0, end)) || go(s, end + 1)
}
}
go(mystr, 1)
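Since the question asks for the substring itself rather than a Boolean, a variant of the recursion that returns an Option may be closer to what is needed. This is only a sketch; RecursivePrefixSketch, shortestPrefix and the accept parameter are hypothetical names standing in for the call to the external system.

```scala
import scala.annotation.tailrec

object RecursivePrefixSketch {
  // Returns the shortest prefix accepted by `accept`, or None.
  def shortestPrefix(s: String, accept: String => Boolean): Option[String] = {
    @tailrec
    def go(end: Int): Option[String] =
      if (end > s.length) None // every prefix, including s itself, was rejected
      else {
        val prefix = s.substring(0, end)
        if (accept(prefix)) Some(prefix) else go(end + 1)
      }
    go(1)
  }
}
```

The @tailrec annotation asks the compiler to verify that the recursion is compiled to a loop, so long strings should not grow the stack.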

Related

How to yield a single element from for loop in scala?

Much like this question:
Functional code for looping with early exit
Say the code is
def findFirst[T](objects: List[T]): Option[T] = {
for (obj <- objects) {
if (expensiveFunc(obj) != null) return Some(obj)
}
None
}
How to yield a single element from a for loop like this in scala?
I do not want to use find, as proposed in the original question; I am curious whether and how it could be implemented using the for loop.
* UPDATE *
First, thanks for all the comments, but I guess I was not clear in the question. I am shooting for something like this:
val seven = for {
x <- 1 to 10
if x == 7
} return x
And that does not compile. The two errors are:
- return outside method definition
- method main has return statement; needs result type
I know find() would be better in this case; I am just learning and exploring the language. And in a more complex case with several iterators, I think finding with for can actually be useful.
Thanks, commenters. I'll start a bounty to make up for the poorly posed question :)
If you want to use a for loop, which uses a nicer syntax than chained invocations of .find, .filter, etc., there is a neat trick. Instead of iterating over strict collections like list, iterate over lazy ones like iterators or streams. If you're starting with a strict collection, make it lazy with, e.g. .toIterator.
Let's see an example.
First, let's define a "noisy" int that will show us when it is invoked:
def noisyInt(i : Int) = () => { println("Getting %d!".format(i)); i }
Now let's fill a list with some of these:
val l = List(1, 2, 3, 4).map(noisyInt)
We want to look for the first element which is even.
val r1 = for (e <- l; v = e(); if v % 2 == 0) yield v
The above line results in:
Getting 1!
Getting 2!
Getting 3!
Getting 4!
r1: List[Int] = List(2, 4)
...meaning that all elements were accessed. That makes sense, given that the resulting list contains all even numbers. Let's iterate over an iterator this time:
val r2 = for (e <- l.toIterator; v = e(); if v % 2 == 0) yield v
This results in:
Getting 1!
Getting 2!
r2: Iterator[Int] = non-empty iterator
Notice that the loop was executed only up to the point where it could figure out whether the result was an empty or non-empty iterator.
To get the first result, you can now simply call r2.next.
If you want a result of an Option type, use:
if(r2.hasNext) Some(r2.next) else None
Edit Your second example in this encoding is just:
val seven = (for {
x <- (1 to 10).toIterator
if x == 7
} yield x).next
...of course, you should be sure that there is always at least one solution if you're going to use .next. Alternatively, convert the result with .toStream and use headOption, which is defined for all Traversables, to get an Option[Int].
You can turn your list into a stream, so that any filters that the for-loop contains are only evaluated on demand. However, yielding from the stream will always return a stream, and what you want is, I suppose, an option. So, as a final step, you can check whether the resulting stream has at least one element and return its head as an option. The headOption function does exactly that.
def findFirst[T](objects: List[T], expensiveFunc: T => Boolean): Option[T] =
(for (obj <- objects.toStream if expensiveFunc(obj)) yield obj).headOption
Why not do exactly what you sketched above, that is, return from the loop early? If you are interested in what Scala actually does under the hood, compile your code with -print. Scala desugars the loop into a foreach and then uses an exception to leave the foreach prematurely.
So what you are trying to do is break out of a loop once your condition is satisfied. The answers to "How do I break out of a loop in Scala?" might be what you are looking for.
Overall, a for comprehension in Scala is translated into map, flatMap and filter operations. So it will not be possible to break out of these functions unless you throw an exception.
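To illustrate that translation, here is a small sketch that compares a for comprehension with the method calls it is roughly rewritten to, and shows that the same pipeline over an iterator only runs as far as needed. The object name ForDesugarSketch and the helper isEven are made up for this example.

```scala
object ForDesugarSketch {
  val xs = List(1, 2, 3, 4, 5, 6)

  // A for comprehension with a guard ...
  val viaFor: List[Int] = for (x <- xs if x % 2 == 0) yield x * 10

  // ... and roughly what the compiler rewrites it to.
  val viaCalls: List[Int] = xs.withFilter(x => x % 2 == 0).map(x => x * 10)

  // Counts how many elements the guard has looked at.
  var visited = 0
  def isEven(x: Int): Boolean = { visited += 1; x % 2 == 0 }

  // Over an iterator the pipeline is lazy: asking for the first element
  // evaluates the guard only until the first match.
  val firstEven: Option[Int] = {
    val it = for (x <- xs.iterator if isEven(x)) yield x
    if (it.hasNext) Some(it.next()) else None
  }
}
```

Here the guard should be evaluated for 1 and 2 only, so there is no need to break out of anything.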
If you are wondering, this is how find is implemented in LinearSeqOptimized.scala, which List inherits:
override /*IterableLike*/
def find(p: A => Boolean): Option[A] = {
var these = this
while (!these.isEmpty) {
if (p(these.head)) return Some(these.head)
these = these.tail
}
None
}
This is a horrible hack. But it would get you the result you wished for.
Idiomatically you'd use a Stream or View and just compute the parts you need.
def findFirst[T](objects: List[T]): T = {
def expensiveFunc(o: T): AnyRef = ??? // unclear what should be returned here
case class MissusedException(val data: T) extends Exception
try {
(for (obj <- objects) {
if (expensiveFunc(obj) != null) throw new MissusedException(obj)
})
objects.head // T must be returned from loop, dummy
} catch {
case MissusedException(obj) => obj
}
}
Why not something like
object Main {
def main(args: Array[String]): Unit = {
val seven = (for (
x <- 1 to 10
if x == 7
) yield x).headOption
}
}
The variable seven will be Some(value) if a value satisfies the condition, and None otherwise.
I hope this helps.
A few implementations without return:
object TakeWhileLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T]): T = if (seq.isEmpty) null.asInstanceOf[T] else
seq(seq.takeWhile(_ == null).size)
}
object OptionLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T], index: Int = 0): T = if (seq.isEmpty) null.asInstanceOf[T] else
Option(seq(index)) getOrElse func(seq, index + 1)
}
object WhileLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T]): T = if (seq.isEmpty) null.asInstanceOf[T] else {
var i = 0
def obj = seq(i)
while (obj == null)
i += 1
obj
}
}
objects.iterator.filter(obj => expensiveFunc(obj) != null).next()
The trick is to get some lazily evaluated view of the collection: an iterator, a Stream, or objects.view. The filter will only execute as far as needed.

Scala - return empty Option if value contained in array

I'm splitting an input of type Option[String] into an Option[Array[String]] as follows:
val input:Option[String] = Option("a=b,1000,what?")
val result: Option[Array[String]] = input map { _.split(",") }
I want to add a test whereby if any member of the array matches (e.g., is a Long less than 0), the whole array is discarded and an empty Option is returned.
Use filter to perform a test on the content of an Option.
Use exists to check whether any member of the collection fulfils a condition.
result.filter(! _.exists(s => test(s)))
or
result.filterNot(_.exists(s => test(s)))
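Put together with the split from the question, this could look like the sketch below. The test function here is an assumption: it treats a field as matching when it parses as a Long below zero, which is the example given in the question. The names DiscardOnMatchSketch and splitUnlessMatch are hypothetical.

```scala
import scala.util.Try

object DiscardOnMatchSketch {
  // Assumed test: true when the field parses as a Long below zero.
  def test(s: String): Boolean = Try(s.trim.toLong).toOption.exists(_ < 0)

  // Splits the input and discards the whole array if any field matches.
  def splitUnlessMatch(input: Option[String]): Option[Array[String]] =
    input.map(_.split(",")).filterNot(_.exists(test))
}
```

Fields that are not numbers at all, such as "a=b", simply do not match, so they do not cause the array to be discarded.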
Have you considered using find() on the collection? If it returns Some(x), then something has satisfied the condition.
list.find(_ < 0) match {
case Some(x) => None
case None => Some(list)
}
Of course you know that you can split and then filter as @ziggystar suggests, but if you have a really big String and an element at the beginning matches, then it's pointless to finish splitting the string when you know it's going to be discarded.
In this case, if you're worried about time efficiency, you can use a Stream and re-implement the split operation, something like this:
def result(input:Option[String]):Option[Seq[String]] = {
def split(c: Char, chars:Stream[Char]):Stream[String] = {
val (head,tail) = chars span(_ != c)
head.mkString #:: (if(tail isEmpty) Stream.empty else split(c, tail tail))
}
input map {s => split(',', Stream(s:_*)) } filter (_.forall (s => !test(s)))
}
Note that the map/filter structure stays the same, but it is now short-circuiting due to the use of Stream.
If it's a really big string, you probably have it as a Stream[Char] already, which means you don't even have the memory overhead of hanging on to the original String.

Does Scala have a library method to build Option-s that takes into account empty strings?

I want to filter out empty strings to put them into an Option. So I quickly built this now:
def StrictOption(s: String) = s match {
case s if s != null && s.trim.length() > 0 => Some(s)
case _ => None
}
Question: is this maybe already somewhere in the standard library?
I don't think there's one single method in the standard library to do this, but you can do this much more tersely than your implementation.
Option(s).filter(_.trim.nonEmpty)
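As a quick illustration of how that one-liner treats the edge cases, here is a sketch wrapped in a helper. The names NonBlankSketch and nonBlank are invented for this example and are not part of the standard library.

```scala
object NonBlankSketch {
  // Option(s) maps null to None; the filter then rejects blank strings.
  // Note that the original (untrimmed) string is what ends up in the Some.
  def nonBlank(s: String): Option[String] = Option(s).filter(_.trim.nonEmpty)
}
```

So null, "" and "   " should all give None, while " hi " is kept as is.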
If you care at all about performance then
if (s.trim.isEmpty) None else Some(s)
is only 4 characters longer than Ben James's solution, and runs 3 times faster, in my benchmark (47 vs 141).
Starting with Scala 2.13, for those not expecting nulls (non-Java context), Option::unless and Option::when are an alternative:
// val str = "hello"
Option.unless(str.isEmpty)(str)
// Option[String] = Some(hello)
Option.when(str.nonEmpty)(str)
// Option[String] = Some(hello)
// val str: String = ""
Option.unless(str.isEmpty)(str)
// Option[String] = None
Option.when(str.nonEmpty)(str)
// Option[String] = None
There's nothing built in; Ben's filter is the best brief version if performance isn't an issue (e.g. you're validating user input). Typically, performance will not be an issue.
Also, note that it's a little strange to use match when you're not actually matching anything; it's just more boilerplate to get an if-else statement. Just say
if ((s ne null) && s.trim.length > 0) Some(s) else None
which is about as fast and brief as anything, unless you want to write your own is-it-whitespace method. Note that trim uses a peculiar criterion: anything above \u0020 (i.e. ' ') is not trimmed, and anything equal or below is. So you could also write your own trimmed-string-is-empty detector, if performance of this operation was particularly important:
def ContentOption(s: String): Option[String] = {
if (s ne null) {
var i = s.length-1
while (i >= 0) {
if (s.charAt(i) > ' ') return Some(s)
i -= 1
}
}
None
}
This could also be achieved with a for-comprehension:
val res = for (v <- Option(s) if v.nonEmpty) yield v
Option("something") produces Some("something")
Option(null) produces None

Is there a way to handle the last case differently in a Scala for loop?

For example suppose I have
for (line <- myData) {
println("}, {")
}
Is there a way to get the last line to print
println("}")
Can you refactor your code to take advantage of built-in mkString?
scala> List(1, 2, 3).mkString("{", "}, {", "}")
res1: String = {1}, {2}, {3}
Before going any further, I'd recommend you avoid println in a for-comprehension. It can sometimes be useful for tracking down a bug that occurs in the middle of a collection, but otherwise leads to code that's harder to refactor and test.
More generally, life usually becomes easier if you can restrict where any sort of side-effect occurs. So instead of:
for (line <- myData) {
println("}, {")
}
You can write:
val lines = for (line <- myData) yield "}, {"
println(lines mkString "\n")
I'm also going to take a guess here that you wanted the content of each line in the output!
val lines = for (line <- myData) yield (line + "}, {")
println(lines mkString "\n")
Though you'd be better off still if you just used mkString directly - that's what it's for!
val lines = myData.mkString("{", "\n}, {", "}")
println(lines)
Note how we're first producing a String, then printing it in a single operation. This approach can easily be split into separate methods and used to implement toString on your class, or to inspect the generated String in tests.
I agree fully with what has been said before about using mkString, and about distinguishing the first iteration rather than the last one. Should you still need to distinguish the last one, Scala collections have an init method, which returns all elements but the last.
So you can do
for(x <- coll.init) workOnNonLast(x)
workOnLast(coll.last)
(init and last are sort of the opposite of tail and head, which are all but the first element and the first element.) Note, however, that depending on the structure, they may be costly. On Vector, all of them are fast. On List, while head and tail are basically free, init and last are both linear in the length of the list. headOption and lastOption may help you when the collection may be empty, replacing workOnLast by
for (x <- coll.lastOption) workOnLast(x)
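For the output format from the question, the init/lastOption approach might be sketched as follows. It returns the lines instead of printing them so the result can be checked; LastElementSketch and render are hypothetical names.

```scala
object LastElementSketch {
  // Every element but the last is followed by "}, {"; the last one by "}".
  // lastOption keeps the empty collection safe (init would throw on it,
  // hence the explicit isEmpty check).
  def render[A](coll: Seq[A]): List[String] =
    if (coll.isEmpty) Nil
    else {
      val nonLast = coll.init.map(x => x.toString + "}, {").toList
      val last = coll.lastOption.map(x => x.toString + "}").toList
      nonLast ++ last
    }
}
```

As noted above, on a List both init and lastOption traverse the whole list, so this is best suited to small collections or to Vector.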
You may take the addString function of the TraversableOnce trait as an example.
def addString(b: StringBuilder, start: String, sep: String, end: String): StringBuilder = {
var first = true
b append start
for (x <- self) {
if (first) {
b append x
first = false
} else {
b append sep
b append x
}
}
b append end
b
}
In your case, the separator is }, { and the end is }
If you don't want to use built-in mkString function, you can make something like
for (line <- lines)
if (line == lines.last) println("last")
else println(line)
UPDATE: As didierd mentioned in the comments, this solution is wrong because the last value can occur several times; he provides a better solution in his answer.
Calling last is fine for Vectors, because it takes "effectively constant time" for them. For Lists it takes linear time, so you can use pattern matching instead:
import scala.annotation.tailrec
@tailrec
def printLines[A](l: List[A]): Unit = {
l match {
case Nil =>
case x :: Nil => println("last")
case x :: xs => println(x); printLines(xs)
}
}
Other answers have rightly pointed to mkString, and for a normal amount of data I would also use that.
However, mkString builds (accumulates) the end-result in-memory through a StringBuilder. This is not always desirable, depending on the amount of data we have.
In this case, if all we want is to "print" we don't need to build the big-result first (and maybe we even want to avoid this).
Consider the implementation of this helper function:
def forEachIsLast[A](iterator: Iterator[A])(operation: (A, Boolean) => Unit): Unit = {
while(iterator.hasNext) {
val element = iterator.next()
val isLast = !iterator.hasNext // if there is no "next", this is the last one
operation(element, isLast)
}
}
It iterates over all elements and invokes operation passing each element in turn, with a boolean value. The value is true if the element passed is the last one.
In your case it could be used like this:
forEachIsLast(myData) { (line, isLast) =>
if(isLast)
println("}")
else
println("}, {")
}
We have the following advantages here:
It operates on each element, one by one, without necessarily accumulating the result in memory (unless you want to).
It does not need to load the whole collection into memory to check its size; it is enough to ask the Iterator whether it is exhausted. You could read data from a big file, from the network, etc.
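Here is a self-contained sketch of the helper in use. It collects the output into a list rather than printing it, purely so that the behaviour can be inspected; IsLastSketch and render are made-up names, and the helper is repeated from above so the block compiles on its own.

```scala
import scala.collection.mutable.ListBuffer

object IsLastSketch {
  // Same helper as above, repeated so the sketch is self-contained.
  def forEachIsLast[A](iterator: Iterator[A])(operation: (A, Boolean) => Unit): Unit =
    while (iterator.hasNext) {
      val element = iterator.next()
      operation(element, !iterator.hasNext) // true only for the final element
    }

  // Appends "}, {" to every line except the last, which gets "}".
  def render(lines: Iterator[String]): List[String] = {
    val out = ListBuffer.empty[String]
    forEachIsLast(lines) { (line, isLast) =>
      out += (if (isLast) line + "}" else line + "}, {")
    }
    out.toList
  }
}
```

In a real streaming scenario you would presumably write each line out inside the callback instead of buffering it.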

What's the most elegant way to find word pairs in a text with Scala?

Given a list of word pairs
val terms = ("word1a", "word1b") :: ("word2a", "word2b") :: ... :: Nil
What's the most elegant way in Scala to test if at least one of the pairs occurs in a text? The test should terminate as quickly as possible when it hits the first match. How would you solve that?
EDIT: To be more precise, I want to know if both words of a pair appear somewhere (not necessarily in order) in the text. If that's the case for one of the pairs in the list, the method should return true. It's not necessary that the matched pair is returned, nor is it important if more than one pair matches.
scala> val text = Set("blah1", "word2b", "blah2", "word2a")
text: scala.collection.immutable.Set[java.lang.String] = Set(blah1, word2b, blah2, word2a)
scala> terms.exists{case (a,b) => text(a) && text(b)}
res12: Boolean = true
EDIT: Note that using a set to represent the tokens in the text makes the lookup from the contains much more efficient. You wouldn't want to use something sequential like a List for that.
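Starting from a raw text rather than a ready-made Set, the same approach could be sketched like this. The whitespace tokenizer is an assumption (real text would probably need punctuation handling), and WordPairSketch, tokens and anyPairPresent are hypothetical names.

```scala
object WordPairSketch {
  // Assumed tokenizer: splits on whitespace only.
  def tokens(text: String): Set[String] =
    text.split("\\s+").filter(_.nonEmpty).toSet

  // True if both words of at least one pair occur anywhere in the text.
  // exists stops at the first pair that matches.
  def anyPairPresent(terms: List[(String, String)], text: String): Boolean = {
    val words = tokens(text)
    terms.exists { case (a, b) => words(a) && words(b) }
  }
}
```

Building the Set reads the whole text once; after that each pair costs two set lookups.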
EDIT 2: Updated for clarification in requirement!
EDIT 3: changed contains to apply per the suggestion in the comment
EDIT - seems like the ambiguous wording of your question means I answered a different question:
Because you are essentially asking for either of the pair; you might as well flatten all these into one big set.
val words = (Set.empty[String] /: terms) { case (s, (w1, w2)) => s + w1 + w2 }
Then you are just asking whether any of these exist in the text:
text.split("\\s") exists words
This is fast because we can use the structure of a Set to lookup quickly whether the word is contained in the text; it terminates early due to the "exists":
scala> val text = "blah1 blah2 word2b"
text: java.lang.String = blah1 blah2 word2b
In the case that your text is very long, you may wish to Stream it, so that the next word to test is lazily computed, rather than split the String into substrings up-front:
scala> val Word = """\s*(\S.*)""".r
Word: scala.util.matching.Regex = \s*(\S.*)
scala> def strmWds(text : String) : Stream[String] = text match {
| case Word(nxt) => val (word, rest) = nxt span (_ != ' '); word #:: strmWds(rest)
| case _ => Stream.empty
| }
strmWds: (text: String)Stream[String]
Now you can:
scala> strmWds(text) exists words
res4: Boolean = true
scala> text.split("\\s") exists words
res3: Boolean = true
I'm assuming that both elements of the pair have to appear in the text, but it doesn't matter where, and it doesn't matter which pair appears.
I'm not sure this is the most elegant, but it's not bad, and it's fairly fast if you expect that the text probably has the words (and thus you don't need to read all of it), and if you can generate an iterator that will give you the words one at a time:
case class WordPair(one: String, two: String) {
private[this] var found_one, found_two = false
def check(s: String): Boolean = {
if (s==one) found_one = true
if (s==two) found_two = true
found_one && found_two
}
def reset(): Unit = {
found_one = false
found_two = false
}
}
val wordpairlist = terms.map { case (w1,w2) => WordPair(w1,w2) }
// May need to wordpairlist.foreach(_.reset()) first, if you do this on multiple texts
text.iterator.exists(w => wordpairlist.exists(_.check(w)))
You could further improve things by putting all the terms in a set, and not even bothering to check the wordpairlist unless the word from the text was in that set.
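That improvement could be sketched as below, using a set of already-seen words instead of per-pair flags, so nothing needs to be reset between texts. This is a variation on the approach above rather than the same code; StreamingPairSketch and anyPairPresent are hypothetical names.

```scala
import scala.collection.mutable

object StreamingPairSketch {
  // Consumes words one at a time and stops as soon as both halves of some
  // pair have been seen. Words that belong to no pair are skipped cheaply.
  def anyPairPresent(terms: List[(String, String)], words: Iterator[String]): Boolean = {
    val interesting: Set[String] = terms.flatMap { case (a, b) => List(a, b) }.toSet
    val seen = mutable.Set.empty[String]
    words.exists { w =>
      // seen.add returns false for a repeated word, which cannot complete
      // any new pair, so the pair list is only checked for new term words.
      interesting(w) && seen.add(w) &&
        terms.exists { case (a, b) => seen(a) && seen(b) }
    }
  }
}
```

Because exists short-circuits, the rest of the iterator is not consumed once a pair is complete.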
If you mean that the words have to occur next to each other in order, you then should change check to
def check(s: String) = {
if (found_one && s==two) found_two = true
else if (s==one) { found_one = true; found_two = false }
else { found_one = false; found_two = false }
found_one && found_two
}
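If adjacency is what you need, a stateless alternative is to slide a window of two words over the text. This is a different technique from the mutable check above, sketched under the assumption that the words arrive as an Iterator[String]; AdjacentPairSketch and anyAdjacentPair are made-up names.

```scala
object AdjacentPairSketch {
  // True when some pair (a, b) occurs as two consecutive words, in order.
  def anyAdjacentPair(terms: List[(String, String)], words: Iterator[String]): Boolean = {
    val wanted = terms.toSet
    words.sliding(2).exists {
      case Seq(a, b) => wanted((a, b))
      case _         => false // the text had fewer than two words
    }
  }
}
```

sliding on an iterator is lazy, so this also stops reading at the first adjacent match.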