Check sequence String to contain strictly two patterns - scala

I have a list of Strings where each element can start with the prefix AA, BB or CC. How can I check that the list contains Strings starting with both AA and BB, but none starting with CC? This is what I have now, and it works; is there a better way to do it? Thanks.
private final val ValidPatternA: String = "^AA.*"
private final val ValidPatternB: String = "^BB.*"
private final val InvalidPatternC: String = "^CC.*"
def main(args: Array[String]): Unit = {
println(isValid(Seq())) // false
println(isValid(Seq("AA0"))) // false
println(isValid(Seq("BB1"))) // false
println(isValid(Seq("CC2"))) // false
println(isValid(Seq("AA0", "BB1", "CC2"))) // false
println(isValid(Seq("AA0", "CC2"))) // false
println(isValid(Seq("BB1", "CC2"))) // false
println(isValid(Seq("AA0", "BB1"))) // true
}
private def isValid(listOfString: Seq[String]) =
!listOfString.exists(_.matches(InvalidPatternC)) &&
listOfString.exists(_.matches(ValidPatternA)) &&
listOfString.exists(_.matches(ValidPatternB))

The code you have is clear and expressive, so the only concern can be performance. A recursive function can do one pass efficiently:
def isValid(listOfString: Seq[String]) = {
@annotation.tailrec
def loop(rem: List[String], foundA: Boolean, foundB: Boolean): Boolean =
rem match {
case Nil => foundA && foundB
case s"CC$_" :: _ => false
case s"AA$_" :: tail => loop(tail, true, foundB)
case s"BB$_" :: tail => loop(tail, foundA, true)
case hd :: tail => loop(tail, foundA, foundB)
}
loop(listOfString.toList, false, false)
}
The @annotation.tailrec annotation asks the compiler to verify that loop is tail-recursive, so it is compiled into a fast loop with rem, foundA and foundB stored in local variables, and the recursive call to loop becoming a goto back to the start of the function.
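As a rough illustration of what that compiled loop amounts to, here is a hand-written while loop with the same logic. This is only a sketch: the object name TailrecSketch is made up for the example, and startsWith is used in place of the interpolator patterns.

```scala
object TailrecSketch {
  // Same logic as the tail-recursive loop, written as an explicit loop.
  def isValid(listOfString: Seq[String]): Boolean = {
    var rem = listOfString.toList
    var foundA = false
    var foundB = false
    while (rem.nonEmpty) {
      val s = rem.head
      if (s.startsWith("CC")) return false // a CC element invalidates the list at once
      if (s.startsWith("AA")) foundA = true
      else if (s.startsWith("BB")) foundB = true
      rem = rem.tail
    }
    // Valid only if both prefixes were seen.
    foundA && foundB
  }
}
```

Like the recursive version, it stops at the first CC element instead of scanning the rest of the list.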

You can optimize it by using a bit mask to reduce the number of collection traversals to one.
private def isValid(listOfString: Seq[String]) =
listOfString.foldLeft(0) { (mask, str) =>
mask | str.matches(ValidPatternA).compare(false) | str.matches(ValidPatternB).compare(false) << 1 | str.matches(InvalidPatternC).compare(false) << 2
} == 3 // 1 & 2 & ^4
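The same idea may be easier to follow with named bits. The sketch below is a variation, not the code above: MaskSketch, SeenA, SeenB, SeenC and bits are names invented for the example, and startsWith replaces the regex matches.

```scala
object MaskSketch {
  private val SeenA = 1 // bit 0: an element starting with "AA" was seen
  private val SeenB = 2 // bit 1: an element starting with "BB" was seen
  private val SeenC = 4 // bit 2: an element starting with "CC" was seen

  // Map one string to the bit it contributes to the mask.
  private def bits(s: String): Int =
    if (s.startsWith("AA")) SeenA
    else if (s.startsWith("BB")) SeenB
    else if (s.startsWith("CC")) SeenC
    else 0

  // One traversal: OR all bits together, then require exactly A and B.
  def isValid(xs: Seq[String]): Boolean =
    xs.foldLeft(0)((mask, s) => mask | bits(s)) == (SeenA | SeenB)
}
```

Unlike the recursive version, the fold always walks the whole list, even after a CC element has been seen.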

Related

Build conditionally list avoiding mutations

Let's say I want to build a list of pizza ingredients conditionally:
val ingredients = scala.collection.mutable.ArrayBuffer("tomatoes", "cheese")
if (!isVegetarian()) {
ingredients += "Pepperoni"
}
if (shouldBeSpicy()) {
ingredients += "Jalapeno"
}
//etc
Is there a functional way to build this array using immutable collections?
I thought about:
val ingredients = List("tomatoes", "cheese") ++ List(
if (!isVegetarian()) Some("Pepperoni") else None,
if (shouldBeSpicy()) Some("Jalapeno") else None
).flatten
but is there a better way?
Here is another possible way that is closer to @Antot's but IMHO is much simpler.
What is unclear in your original code is where isVegetarian and shouldBeSpicy actually come from. Here I assume that there is a PizzaConf class, as follows, to provide those configuration settings:
case class PizzaConf(isVegetarian: Boolean, shouldBeSpicy: Boolean)
Assuming this, I think the simplest way is to have an allIngredients of type List[(String, Function1[PizzaConf, Boolean])], i.e. one that stores ingredients and functions to check their corresponding availability. Given that, buildIngredients becomes trivial:
val allIngredients: List[(String, Function1[PizzaConf, Boolean])] = List(
("Pepperoni", conf => !conf.isVegetarian), // meat only for non-vegetarian pizzas
("Jalapeno", conf => conf.shouldBeSpicy)
)
def buildIngredients(pizzaConf: PizzaConf): List[String] = {
allIngredients
.filter(_._2(pizzaConf))
.map(_._1)
}
or you can merge filter and map using collect, as follows:
def buildIngredients(pizzaConf: PizzaConf): List[String] =
allIngredients.collect({ case (ing, cond) if cond(pizzaConf) => ing })
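For a self-contained view of the collect variant, the sketch below adds the always-present ingredients from the question. The names PizzaCollectSketch, baseIngredients and optionalIngredients are introduced only for illustration.

```scala
object PizzaCollectSketch {
  final case class PizzaConf(isVegetarian: Boolean, shouldBeSpicy: Boolean)

  // Ingredients that every pizza gets.
  val baseIngredients: List[String] = List("tomatoes", "cheese")

  // Each optional ingredient is paired with the condition that enables it.
  val optionalIngredients: List[(String, PizzaConf => Boolean)] = List(
    ("Pepperoni", conf => !conf.isVegetarian),
    ("Jalapeno", conf => conf.shouldBeSpicy)
  )

  // collect filters and maps in one pass.
  def buildIngredients(conf: PizzaConf): List[String] =
    baseIngredients ++ optionalIngredients.collect { case (ing, cond) if cond(conf) => ing }
}
```

For example, PizzaCollectSketch.buildIngredients(PizzaCollectSketch.PizzaConf(isVegetarian = true, shouldBeSpicy = true)) yields List(tomatoes, cheese, Jalapeno).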
Your original approach is not bad. I would probably just stick with list:
val ingredients =
List("tomatoes", "cheese") ++
List("Pepperoni", "Sausage").filter(_ => !isVegetarian) ++
List("Jalapeno").filter(_ => shouldBeSpicy)
Which makes it easy to add more ingredients connected to a condition (see "Sausage" above)
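On Scala 2.13 or later, Option.when offers a similar shape for single ingredients. This is a sketch under that version assumption; the object name and the function parameters are made up so the example is self-contained.

```scala
object OptionWhenSketch {
  // Option.when(cond)(value) is Some(value) when cond holds, otherwise None.
  // An Option can be appended to a List with ++ and contributes zero or one element.
  def ingredients(isVegetarian: Boolean, shouldBeSpicy: Boolean): List[String] =
    List("tomatoes", "cheese") ++
      Option.when(!isVegetarian)("Pepperoni") ++
      Option.when(shouldBeSpicy)("Jalapeno")
}
```

The List(...).filter(_ => cond) form above remains handier when several ingredients share one condition.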
You could start with the full list of ingredients and then filter out the ingredients not passing the conditions:
Set("tomatoes", "cheese", "Pepperoni", "Jalapeno")
.filter {
case "Pepperoni" => !isVegetarian;
case "Jalapeno" => shouldBeSpicy;
case _ => true // ingredients by default
}
which for:
val isVegetarian = true
val shouldBeSpicy = true
would return:
Set(tomatoes, cheese, Jalapeno)
This can be achieved by creating a sequence of predicates, which defines the conditions applied to filter the ingredients.
// available ingredients
val ingredients = Seq("tomatoes", "cheese", "ham", "mushrooms", "pepper", "salt")
// predicates
def isVegetarian(ingredient: String): Boolean = ingredient != "ham"
def isSpicy(ingredient: String): Boolean = ingredient == "pepper"
def isSalty(ingredient: String): Boolean = ingredient == "salt"
// to negate another predicate
def not(predicate: (String) => Boolean)(ingr: String): Boolean = !predicate(ingr)
// sequences of conditions for different pizzas:
val vegeterianSpicyPizza: Seq[(String) => Boolean] = Seq(isSpicy, isVegetarian)
val carnivoreSaltyNoSpices: Seq[(String) => Boolean] = Seq(not(isSpicy), isSalty)
// main function: builds a list of ingredients for specified conditions!
def buildIngredients(recipe: Seq[(String) => Boolean]): Seq[String] = {
ingredients.filter(ingredient => recipe.exists(_(ingredient)))
}
println("veg spicy: " + buildIngredients(vegeterianSpicyPizza))
// veg spicy: List(tomatoes, cheese, mushrooms, pepper, salt)
println("carn salty: " + buildIngredients(carnivoreSaltyNoSpices))
// carn salty: List(tomatoes, cheese, ham, mushrooms, salt)
Inspired by other answers, I came up with something like this:
case class If[T](conditions: (Boolean, T)*) {
def andAlways(values: T*): List[T] =
conditions.filter(_._1).map(_._2).toList ++ values
}
It could be used like:
val isVegetarian = false
val shouldBeSpicy = true
val ingredients = If(
!isVegetarian -> "Pepperoni",
shouldBeSpicy -> "Jalapeno",
).andAlways(
"Cheese",
"Tomatoes"
)
Still waiting for a better option :)
If any ingredient will only need testing against one condition, you could do something like this:
val commonIngredients = List("Cheese", "Tomatoes")
val nonVegetarianIngredientsWanted = {
if (!isVegetarian)
List("Pepperoni")
else
List.empty
}
val spicyIngredientsWanted = {
if (shouldBeSpicy)
List("Jalapeno")
else
List.empty
}
val pizzaIngredients = commonIngredients ++ nonVegetarianIngredientsWanted ++ spicyIngredientsWanted
This doesn't work if you have ingredients which are tested in two categories: for example, if you have spicy sausage, it should only be included if !isVegetarian and shouldBeSpicy. One method of doing this would be to test both conditions together:
val optionalIngredients = {
// match on the two Boolean flags, not on the ingredient lists
(!isVegetarian, shouldBeSpicy) match {
case (false, false) => List.empty
case (false, true) => List("Jalapeno")
case (true, false) => List("Pepperoni")
case (true, true) => List("Pepperoni", "Jalapeno", "Spicy Sausage")
}
}
val pizzaIngredients = commonIngredients ++ optionalIngredients
This can be extended to test any number of conditions, though of course the number of case arms needed grows exponentially with the number of conditions tested.
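One way to sidestep that growth, sketched here as an assumption rather than as part of the answer above, is to pair each ingredient with its own combined condition, so that adding a condition adds one line per affected ingredient. The object and method names are illustrative.

```scala
object CombinedConditionsSketch {
  def pizzaIngredients(isVegetarian: Boolean, shouldBeSpicy: Boolean): List[String] = {
    val commonIngredients = List("Cheese", "Tomatoes")
    // Each ingredient carries the full condition under which it is wanted.
    val optional = List(
      ("Pepperoni", !isVegetarian),
      ("Jalapeno", shouldBeSpicy),
      ("Spicy Sausage", !isVegetarian && shouldBeSpicy)
    )
    // Keep only the ingredients whose condition evaluated to true.
    commonIngredients ++ optional.collect { case (ingredient, true) => ingredient }
  }
}
```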

Using a predicate to search through a string - Scala

I'm having difficulty figuring out how to search through a string with a given predicate and determining its position in the string.
def find(x: Char => Boolean): Boolean = {
}
Example, if x is (_ == ' ')
String = "hi my name is"
It would add 2 to the counter and return true
I'm guessing that this is what you want...
Since find is a higher-order function (HOF) - that is, it's a function that takes a function as an argument - it likely needs to be applied to a String instance. The predicate (the function argument to find) determines when the character you're looking for is found, and the find method reports the position at which the character was found. So find should return an Option[Int], not a Boolean, that way you don't lose the information about where the character was found. Note that you can still change an Option[Int] result to a Boolean value (with true indicating the search was successful, false not) by applying .isDefined to the result.
Note that I've renamed find to myFind to avoid a clash with the built-in String.find method (which does a similar job).
import scala.annotation.tailrec
// Implicit class cannot be a top-level element, so it's put in an object.
object StringUtils {
// "Decorate" strings with additional functions.
final implicit class MyRichString(val s: String)
extends AnyVal {
// Find a character satisfying predicate p, report position.
def myFind(p: Char => Boolean): Option[Int] = {
// Helper function to keep track of current position.
@tailrec
def currentPos(pos: Int): Option[Int] = {
// If we've passed the end of the string, return None. Didn't find a
// character satisfying predicate.
if(pos >= s.length) None
// Otherwise, if the predicate passes for the current character,
// return position wrapped in Some.
else if(p(s(pos))) Some(pos)
// Otherwise, perform another iteration, looking at the next character.
else currentPos(pos + 1)
}
// Start by looking at the first (0th) character.
currentPos(0)
}
}
}
import StringUtils._
val myString = "hi my name is"
myString.myFind(_ == ' ') // Should report Some(2)
myString.myFind(_ == ' ').isDefined // Should report true
myString.myFind(_ == 'X') // Should report None
myString.myFind(_ == 'X').isDefined // Should report false
If the use of an implicit class is a little too much effort, you could implement this as a single function that takes the String as an argument:
def find(s: String, p: Char => Boolean): Option[Int] = {
// Helper function to keep track of current position.
@tailrec
def currentPos(pos: Int): Option[Int] = {
// If we've passed the end of the string, return None. Didn't find a
// character satisfying predicate.
if(pos >= s.length) None
// Otherwise, if the predicate passes for the current character,
// return position wrapped in Some.
else if(p(s(pos))) Some(pos)
// Otherwise, perform another iteration, looking at the next character.
else currentPos(pos + 1)
}
// Start by looking at the first (0th) character.
currentPos(0)
}
val myString = "hi my name is"
find(myString, _ == ' ') // Should report Some(2)
find(myString, _ == ' ').isDefined // Should report true
find(myString, _ == 'X') // Should report None
find(myString, _ == 'X').isDefined // Should report false
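If hand-written recursion is not required, the standard indexWhere method already reports the position of the first character satisfying a predicate, returning -1 when there is none. A small sketch wrapping it in an Option (findPos and IndexWhereSketch are made-up names):

```scala
object IndexWhereSketch {
  // indexWhere returns the index of the first match, or -1 if nothing matches.
  def findPos(s: String, p: Char => Boolean): Option[Int] = {
    val i = s.indexWhere(p)
    if (i >= 0) Some(i) else None
  }
}
```

IndexWhereSketch.findPos("hi my name is", _ == ' ') gives Some(2), in line with the recursive versions above.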
Counter:
"hi my name is".count (_ == 'm')
"hi my name is".toList.filter (_ == 'i').size
Boolean:
"hi my name is".toList.exists (_ == 'i')
"hi my name is".contains ('j')
Position(s):
"hi my name is".zipWithIndex.filter {case (a, b) => a == 'i'}
res8: scala.collection.immutable.IndexedSeq[(Char, Int)] = Vector((i,1), (i,11))
Usage of find:
scala> "hi my name is".find (_ == 'x')
res27: Option[Char] = None
scala> "hi my name is".find (_ == 's')
res28: Option[Char] = Some(s)
I would suggest separating the character search and position into individual methods, each of which leverages built-in functions in String, and wrap in an implicit class:
object MyStringOps {
implicit class CharInString(s: String) {
def charPos(c: Char): Int = s.indexOf(c)
def charFind(p: Char => Boolean): Boolean =
s.find(p) match {
case Some(_) => true
case None => false
}
}
}
import MyStringOps._
"hi my name is".charPos(' ')
// res1: Int = 2
"hi my name is".charFind(_ == ' ')
// res2: Boolean = true

Better safe get from an array in scala?

I want to get first argument for main method that is optional, something like this:
val all = args(0) == "all"
However, this would fail with exception if no argument is provided.
Is there any one-liner simple method to set all to false when args[0] is missing; and not doing the common if-no-args-set-false-else... thingy?
In general case you can use lifting:
args.lift(0).map(_ == "all").getOrElse(false)
Or even (thanks to @enzyme):
args.lift(0).contains("all")
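To see why this is safe, note that lift turns the index lookup into a function returning Option: Some(element) for a valid index and None otherwise. A minimal sketch (wantsAll and LiftSketch are names chosen for the example):

```scala
object LiftSketch {
  // args.lift(0) is None for an empty array, so no exception can be thrown;
  // Option.contains then compares the wrapped value, if any, with "all".
  def wantsAll(args: Array[String]): Boolean =
    args.lift(0).contains("all")
}
```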
You can use headOption and fold (on Option):
val all = args.headOption.fold(false)(_ == "all")
Of course, as @mohit pointed out, map followed by getOrElse will work as well.
If you really need indexed access, you could pimp a get method on any Seq:
implicit class RichIndexedSeq[V, T <% Seq[V]](seq: T) {
def get(i: Int): Option[V] =
if (i < 0 || i >= seq.length) None
else Some(seq(i))
}
However, if this is really about arguments, you'll probably be better off handling arguments in a fold:
case class MyArgs(n: Int = 1, debug: Boolean = false,
file: Option[String] = None)
val myArgs = args.foldLeft(MyArgs()) {
case (args, "-debug") =>
args.copy(debug = true)
case (args, str) if str.startsWith("-n") =>
args.copy(n = ???) // parse string
case (args, str) if str.startsWith("-f") =>
args.copy(file = Some(???)) // parse string
case _ =>
sys.error("Unknown arg")
}
if (myArgs.file.isEmpty)
sys.error("Need file")
You can use foldLeft with initial false value:
val all = (false /: args)(_ | _ == "all")
But be careful: one-liners can be difficult to read.
Something like this will work assuming args(0) returns Some or None:
val all = args(0).map(_ == "all").getOrElse(false)

Simple functional way for grouping successive elements? [duplicate]

I'm trying to 'group' a string into segments; I guess this example explains it more succinctly:
scala> val str: String = "aaaabbcddeeeeeeffg"
... (do something)
res0: List("aaaa","bb","c","dd","eeeeee","ff","g")
I can think of a few ways to do this in an imperative style (with vars and stepping through the string to find groups), but I was wondering if a better functional solution could be attained. I've been looking through the Scala API but there doesn't seem to be anything that fits my needs.
Any help would be appreciated.
You can split the string recursively with span:
def s(x : String) : List[String] = if(x.size == 0) Nil else {
val (l,r) = x.span(_ == x(0))
l :: s(r)
}
Tail recursive:
@annotation.tailrec def s(x : String, y : List[String] = Nil) : List[String] = {
if(x.size == 0) y.reverse
else {
val (l,r) = x.span(_ == x(0))
s(r, l :: y)
}
}
It seems that all the other answers concentrate on collection operations, but a pure string + regex solution is much simpler:
str split """(?<=(\w))(?!\1)""" toList
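In this regex I use a positive lookbehind to capture the previous character and a negative lookahead to require that the next character differs from it, so the split happens exactly at the boundaries between runs.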
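A small sketch of the regex in use, written with explicit method calls instead of postfix notation (the wrapper names RegexSplitSketch and group are made up). Note that \w only matches word characters, so runs of spaces or punctuation are not split this way.

```scala
object RegexSplitSketch {
  // (?<=(\w)) captures the character just before the split point;
  // (?!\1) rejects the split point if the next character is the same one.
  def group(s: String): List[String] =
    s.split("""(?<=(\w))(?!\1)""").toList
}
```

RegexSplitSketch.group("aaaabbcddeeeeeeffg") should give List(aaaa, bb, c, dd, eeeeee, ff, g).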
def group(s: String): List[String] = s match {
case "" => Nil
case s => s.takeWhile(_==s.head) :: group(s.dropWhile(_==s.head))
}
Edit: Tail recursive version:
def group(s: String, result: List[String] = Nil): List[String] = s match {
case "" => result reverse
case s => group(s.dropWhile(_==s.head), s.takeWhile(_==s.head) :: result)
}
It can be used just like the other because the second parameter has a default value and thus doesn't have to be supplied.
Make it one-liner:
scala> val str = "aaaabbcddddeeeeefff"
str: java.lang.String = aaaabbcddddeeeeefff
scala> str.groupBy(identity).map(_._2)
res: scala.collection.immutable.Iterable[String] = List(eeeee, fff, aaaa, bb, c, dddd)
UPDATE:
As @Paul mentioned the order, here is an updated version:
scala> str.groupBy(identity).toList.sortBy(_._1).map(_._2)
res: List[String] = List(aaaa, bb, c, dddd, eeeee, fff)
You could use some helper functions like this:
val str = "aaaabbcddddeeeeefff"
def zame(chars:List[Char]) = chars.span(_==chars.head) // span keeps only the leading run; partition would also pull in later, non-adjacent repeats
def q(chars:List[Char]):List[List[Char]] = chars match {
case Nil => Nil
case rest =>
val (thesame,others) = zame(rest)
thesame :: q(others)
}
q(str.toList) map (_.mkString)
This should do the trick, right? No doubt it can be cleaned up into one-liners even further
A functional* solution using fold:
def group(s : String) : Seq[String] = {
s.tail.foldLeft(Seq(s.head.toString)) { case (carry, elem) =>
if ( carry.last(0) == elem ) {
carry.init :+ (carry.last + elem)
}
else {
carry :+ elem.toString
}
}
}
There is a lot of cost hidden in all those sequence operations performed on strings (via implicit conversion). I guess the real complexity depends heavily on the kind of Seq the strings are converted to.
(*) Afaik all/most operations in the collection library depend on iterators, an imho inherently unfunctional concept. But the code looks functional, at least.
Starting Scala 2.13, List is now provided with the unfold builder which can be combined with String::span:
List.unfold("aaaabbaaacdeeffg") {
case "" => None
case rest => Some(rest.span(_ == rest.head))
}
// List[String] = List("aaaa", "bb", "aaa", "c", "d", "ee", "ff", "g")
or alternatively, coupled with Scala 2.13's Option#unless builder:
List.unfold("aaaabbaaacdeeffg") {
rest => Option.unless(rest.isEmpty)(rest.span(_ == rest.head))
}
// List[String] = List("aaaa", "bb", "aaa", "c", "d", "ee", "ff", "g")
Details:
Unfold (def unfold[A, S](init: S)(f: (S) => Option[(A, S)]): List[A]) is based on an internal state (init) which is initialized in our case with "aaaabbaaacdeeffg".
For each iteration, we span (def span(p: (Char) => Boolean): (String, String)) this internal state in order to find the prefix containing the same symbol and produce a (String, String) tuple which contains the prefix and the rest of the string. span is very fortunate in this context as it produces exactly what unfold expects: a tuple containing the next element of the list and the new internal state.
The unfolding stops when the internal state is "" in which case we produce None as expected by unfold to exit.
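To make the mechanism concrete, here is a hand-rolled stand-in for List.unfold, which could also serve on versions before 2.13. It is a sketch: unfoldList and UnfoldSketch are hypothetical names, not part of the standard library.

```scala
import scala.annotation.tailrec

object UnfoldSketch {
  // Repeatedly apply f to the state: Some((elem, next)) emits elem and continues
  // with next, None stops the unfolding.
  def unfoldList[A, S](init: S)(f: S => Option[(A, S)]): List[A] = {
    @tailrec
    def loop(state: S, acc: List[A]): List[A] =
      f(state) match {
        case None => acc.reverse
        case Some((elem, next)) => loop(next, elem :: acc)
      }
    loop(init, Nil)
  }

  // span yields (leading run, remainder): exactly the (element, new state) pair.
  def group(s: String): List[String] =
    unfoldList(s) { rest =>
      if (rest.isEmpty) None else Some(rest.span(_ == rest.head))
    }
}
```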
Edit: I should have read more carefully. The code below is not functional.
Sometimes, a little mutable state helps:
def group(s : String) = {
var tmp = ""
val b = Seq.newBuilder[String]
s.foreach { c =>
if ( tmp != "" && tmp.head != c ) {
b += tmp
tmp = ""
}
tmp += c
}
b += tmp
b.result
}
Runtime O(n) (if segments have at most constant length) and tmp.+= probably creates the most overhead. Use a string builder instead for strict runtime in O(n).
group("aaaabbcddeeeeeeffg")
> Seq[String] = List(aaaa, bb, c, dd, eeeeee, ff, g)
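The StringBuilder variant suggested above might look like the following sketch. BuilderGroupSketch is a made-up name, and the result type is List[String] here rather than Seq[String].

```scala
object BuilderGroupSketch {
  def group(s: String): List[String] = {
    val groups = List.newBuilder[String]
    val current = new StringBuilder // accumulates the run being read
    s.foreach { c =>
      // A different character ends the current run: emit it and start over.
      if (current.nonEmpty && current.charAt(0) != c) {
        groups += current.toString
        current.clear()
      }
      current.append(c)
    }
    // Emit the last run; nothing is emitted for an empty input.
    if (current.nonEmpty) groups += current.toString
    groups.result()
  }
}
```

Appending to the builder avoids creating a new String for every character, which is the overhead the tmp += c version pays.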
If you want to use scala API you can use the built in function for that:
str.groupBy(c => c).values
Or, if you want it sorted and in a list:
str.groupBy(c => c).values.toList.sorted

Functional way to update an object based on flags

Suppose you are writing a class that normalizes strings. That class has a number of configuration flags. For example:
val makeLowerCase: Boolean = true
val removeVowels: Boolean = false
val dropFirstCharacter: Boolean = true
If I were writing mutable code, I would write the following for the normalize method.
def normalize(string: String) = {
var s = string
if (makeLowerCase) {
s = s.toLowerCase
}
if (removeVowels) {
s = s.replaceAll("[aeiou]", "")
}
if (dropFirstCharacter) {
s = s.drop(1)
}
s
}
Is there a clean and easy way of writing these without mutation? Nested conditionals become nasty fast. I could create a list of String=>String lambdas, filter it based on the configuration, and then fold the string through it, but I hope there is something easier.
Your best bet is to define your own method:
class ConditionalMapper[A](a: A) {
def changeCheck(p: A => Boolean)(f: A => A) = if (p(a)) f(a) else a
def changeIf(b: Boolean)(f: A => A) = if (b) f(a) else a
}
implicit def conditionally_change_anything[A](a: A) = new ConditionalMapper(a)
Now you chain these things together and write:
class Normer(makeLC: Boolean, remVowel: Boolean, dropFirst: Boolean) {
def normalize(s: String) = {
s.changeIf(makeLC) { _.toLowerCase }
.changeIf(remVowel) { _.replaceAll("[aeiou]","") }
.changeIf(dropFirst){ _.substring(1) }
}
}
Which gives you:
scala> val norm = new Normer(true,false,true)
norm: Normer = Normer@2098746b
scala> norm.normalize("The Quick Brown Fox Jumps Over The Lazy Dog")
res1: String = he quick brown fox jumps over the lazy dog
That said, the mutable solution is not bad either--just keep it to a small block and you'll be fine. It's mostly a problem when you let mutability escape into the wild. (Where "the wild" means "outside your method, or inside any method more than a handful of lines long".)
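For comparison, the approach mentioned in the question (a list of String => String functions filtered by the flags) can be written compactly with the standard Function.chain, which composes a sequence of functions left to right. A sketch, with ChainSketch, Normalizer and steps as illustrative names:

```scala
object ChainSketch {
  final case class Normalizer(makeLowerCase: Boolean, removeVowels: Boolean, dropFirstCharacter: Boolean) {
    // Pair each step with its flag and keep only the enabled steps, in order.
    private val steps: List[String => String] = List(
      (makeLowerCase, (s: String) => s.toLowerCase),
      (removeVowels, (s: String) => s.replaceAll("[aeiou]", "")),
      (dropFirstCharacter, (s: String) => s.drop(1))
    ).collect { case (true, f) => f }

    // Function.chain applies the remaining steps one after another.
    val normalize: String => String = Function.chain(steps)
  }
}
```

ChainSketch.Normalizer(true, false, true).normalize("The Quick Brown Fox") gives "he quick brown fox".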
If you use scalaz |> operator or have a similar one defined in your utility classes you can do this:
case class N(
makeLowerCase: Boolean = true,
removeVowels: Boolean = false,
dropFirstCharacter: Boolean = true) {
def normalize(string: String) = (
string
|> (s => if (makeLowerCase) s.toLowerCase else s)
|> (s => if (removeVowels) s.replaceAll("[aeiou]", "") else s)
|> (s => if (dropFirstCharacter) s.drop(1) else s)
)
}
N(removeVowels=true).normalize("DDABCUI")
// res1: String = dbc
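Since Scala 2.13 the standard library ships a comparable operator, pipe, in scala.util.chaining, so scalaz is not needed for this pattern. A sketch under that version assumption (PipeSketch is a made-up wrapper name):

```scala
import scala.util.chaining._

object PipeSketch {
  final case class N(
      makeLowerCase: Boolean = true,
      removeVowels: Boolean = false,
      dropFirstCharacter: Boolean = true) {
    // pipe passes the value on its left to the function on its right.
    def normalize(string: String): String =
      string
        .pipe(s => if (makeLowerCase) s.toLowerCase else s)
        .pipe(s => if (removeVowels) s.replaceAll("[aeiou]", "") else s)
        .pipe(s => if (dropFirstCharacter) s.drop(1) else s)
  }
}
```

PipeSketch.N(removeVowels = true).normalize("DDABCUI") gives "dbc", as with the |> version.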