How to determine if a string contains any element of a set - scala

I have a sentence, and I want to determine if it contains any elements of a set.
val sentence = "Hello, today is a fine day to learn scala"
val mySet = Set("day", "scala")

What about:
mySet.exists(word => sentence.contains(word))
It will return true if at least one word from the set is present in the string.

Here's a solution that...
is case-insensitive ("scala" does match "Scala")
ignores sub-strings ("rat" does not match "rats")
ignores punctuation (!?,-) unless specifically specified in mySet
mySet.mkString("(?i)\\b(", "|", ")\\b")
.r.unanchored
.matches(sentence)
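A possible variant of the regex approach, as a sketch only. It assumes the entries of the set should be treated as literal text, so each one goes through Pattern.quote, and it uses findFirstIn, which is also available on older Scala versions. The names WordMatch and containsAnyWord are made up for illustration.

```scala
import java.util.regex.Pattern

object WordMatch {
  // Hypothetical helper: true if any entry of `words` occurs in `sentence`
  // as a whole word, ignoring case.
  def containsAnyWord(sentence: String, words: Set[String]): Boolean =
    if (words.isEmpty) false // an empty alternation would match at any word boundary
    else {
      val pattern = words
        .map(w => Pattern.quote(w)) // treat each entry literally, not as a regex
        .mkString("(?i)\\b(?:", "|", ")\\b")
        .r
      pattern.findFirstIn(sentence).isDefined
    }
}
```

With the values from the question, WordMatch.containsAnyWord(sentence, mySet) is true, while a set containing only "rat" does not match a sentence that only contains "rats".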

Related

Match capitalise and lower case

I’m trying to match against a Scala sequence of strings with .contains("pear"). I’m able to match pear, but is there a way to match regardless of case (for example "Pear") other than calling toLowerCase first or using a regex? This is what I have so far.
val fruits = Seq("apple", "PEAR")
fruits.map(_.toLowerCase).contains("pear")
Boolean = true
As sinanspd said
fruits.exists(_.equalsIgnoreCase("pear"))
This is better since rather than converting every element of fruits to lowercase, it only converts as many characters as needed to reject or match an element.
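The same idea wrapped in a small helper, as a sketch. FruitMatch and containsIgnoreCase are hypothetical names, not part of the standard library.

```scala
object FruitMatch {
  // Hypothetical helper: case-insensitive membership test.
  // exists stops at the first element that matches.
  def containsIgnoreCase(items: Seq[String], target: String): Boolean =
    items.exists(_.equalsIgnoreCase(target))
}
```

FruitMatch.containsIgnoreCase(fruits, "pear") and FruitMatch.containsIgnoreCase(fruits, "Pear") both return true for the fruits above.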

Scala: Transforming List of Strings containing long descriptions to list of strings containing only last sentences

I have a List[String], for example:
val test=List("this is, an extremely long sentence. Check; But. I want this sentence.",
"Another. extremely. long. (for eg. description). But I want this sentence.",
..)
I want the result to be like:
List("I want this sentence", "But I want this sentence"..)
I tried a few approaches, but they didn't work:
test.map(x=>x.split(".").reverse.head)
test.map(x=>x.split(".").last)
Try using this
test.reverse.head.split("\\.").last
To handle any exception (this needs import scala.util.Try):
Try(List[String]().reverse.head.split("\\.").last).getOrElse("YOUR_DEFAULT_STRING")
You can map over your List, split each String and then take the last element. Try the code below.
val list = List("this is, an extremely long sentence. Check; But. I want this sentence.",
"Another. extremely. long. (for eg. description). But I want this sentence.")
list.map(_.split("\\.").last.trim)
It will give you
List(I want this sentence, But I want this sentence)
test.map (_.split("\\.").last)
split takes a regular expression, in which the dot matches any character, so you have to escape it.
Maybe you want to include question marks and bangs:
test.map (_.split("[!?.]").last)
and trim surrounding whitespace:
test.map (_.split("[!?.]").last.trim)
reverse.head would have been a good idea too, if last didn't already exist:
scala> test.map (_.split("[!?.]").reverse.head.trim)
res138: List[String] = List(I want this sentence, But I want this sentence)
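To make the escaping point concrete, here is a small sketch comparing the unescaped split with three ways of splitting on a literal dot. DotSplit and the sample string are made up for illustration.

```scala
import java.util.regex.Pattern

object DotSplit {
  val sample = "one. two. three."

  // "." as a regex matches every character, so every piece is empty
  // and split drops them all: the result is an empty array.
  val unescaped: Array[String] = sample.split(".")

  // Three ways to split on a literal dot.
  val escaped: Array[String] = sample.split("\\.")              // escaped regex
  val byChar: Array[String]  = sample.split('.')                // Char overload
  val quoted: Array[String]  = sample.split(Pattern.quote(".")) // quoted literal
}
```

This is also why split(".").last fails in the question: calling last on an empty array throws.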
You can do this a number of ways:
For each string in your original list: split by ., reverse the list, take the first value
test.map(_.split('.').reverse.headOption)
// List(Some( I want this sentence), Some( But I want this sentence))
.headOption results in Some("string") or None, and you can do something like a .getOrElse("no valid string found") on it. You can trim the unwanted whitespace if you want.
Regex match
test.map { sentence =>
val regex = ".*\\.\\s*([^.]*)\\.$".r
val regex(value) = sentence
value
}
This will fetch any string at the end of a longer string which is preceded by a full stop and optional whitespace and followed by a full stop. You can modify the regex to change the exact matching rules, and I recommend playing around with regex101.com if you fancy learning more regex. It's very good.
This solution is better suited to more complicated examples and requirements, so it's worth keeping in mind. If you are worried that the regex might not match, you can check whether it matches before extracting the value:
test.map { sentence =>
val regexString = ".*\\.\\s*([^.]*)\\.$"
val regex = regexString.r
if(sentence.matches(regexString)) {
val regex(value) = sentence
value
} else ""
}
Take the last after splitting the string by .
test.map(_.split('.').map(_.trim).lastOption)
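Putting the pieces from the answers above together, a minimal sketch that avoids exceptions on empty input. LastSentence, lastSentence and lastSentences are hypothetical names, and treating ".", "!" and "?" as the only sentence terminators is an assumption.

```scala
object LastSentence {
  // Last non-empty sentence of a text, or None if there is none.
  def lastSentence(text: String): Option[String] =
    text.split("[.!?]") // split on sentence terminators
      .map(_.trim)      // drop surrounding whitespace
      .filter(_.nonEmpty)
      .lastOption

  // Strings without any sentence are skipped rather than mapped to a default.
  def lastSentences(texts: List[String]): List[String] =
    texts.flatMap(t => lastSentence(t))
}
```

For the test list from the question, LastSentence.lastSentences(test) gives List("I want this sentence", "But I want this sentence").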

How to strip everything except digits from a string in Scala (quick one liners)

This is driving me nuts... there must be a way to strip out all non-digit characters (or perform other simple filtering) in a String.
Example: I want to turn a phone number ("+72 (93) 2342-7772" or "+1 310-777-2341") into a simple numeric String (not an Int), such as "729323427772" or "13107772341".
I tried "[\\d]+".r.findAllIn(phoneNumber), which returns an iterator of matches that I would then have to recombine into a String somehow... seems horribly wasteful.
I also came up with: phoneNumber.filter("0123456789".contains(_)) but that becomes tedious for other situations. For instance, removing all punctuation... I'm really after something that works with a regular expression so it has wider application than just filtering out digits.
Anyone have a fancy Scala one-liner for this that is more direct?
You can use filter, treating the string as a character sequence and testing the character with isDigit:
"+72 (93) 2342-7772".filter(_.isDigit) // res0: String = 729323427772
You can use replaceAll and Regex.
"+72 (93) 2342-7772".replaceAll("[^0-9]", "") // res1: String = 729323427772
Another approach, define the collection of valid characters, in this case
val d = '0' to '9'
and so for val a = "+72 (93) 2342-7772", filter on collection inclusion for instance with either of these,
for (c <- a if d.contains(c)) yield c
a.filter(d.contains)
a.collect{ case c if d.contains(c) => c }
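Since the question asks for something regex-based with wider application, here is a short sketch along those lines. Strip and its method names are made up for illustration; \D and \p{Punct} are standard java.util.regex character classes.

```scala
object Strip {
  // Remove everything that is not a digit (\D is the negation of \d).
  def digitsOnly(s: String): String = s.replaceAll("\\D", "")

  // The findAllIn attempt from the question: mkString joins the matches again.
  def digitsViaFindAll(s: String): String = "\\d+".r.findAllIn(s).mkString

  // The same pattern applied to another filter: drop ASCII punctuation.
  def withoutPunctuation(s: String): String = s.replaceAll("\\p{Punct}", "")
}
```

Strip.digitsOnly("+1 310-777-2341") gives "13107772341". One difference worth knowing: _.isDigit also accepts non-ASCII Unicode digits, while [^0-9] keeps only ASCII ones.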

Getting IndexOutOfBoundsException while searching for a substring

I have a string like
var word = "banana"
and sentences like
var sent = "the monkey is holding a banana which is yellow"
var sent1 = "banana!!"
I want to search banana in sent and then write to a file in the following way:
the monkey is holding a
banana
which is yellow
I'm doing it in the following way:
var before = sent.substring(0, sent.indexOf(word))
var after = sent.substring(sent.indexOf(word) + word.length)
println(before)
println(after)
This works fine but when I do the same for sent1, then it gives me IndexOutOfBoundsException. I think it is because there is nothing before banana in sent1. How to deal with this?
You can split based on the word and you will get an array with everything before and after the word.
val search = sent.split(word)
search: Array[String] = Array("the monkey is holding a ", " which is yellow")
This works in the "banana!!" case:
"banana!!".split(word)
res5: Array[String] = Array("", "!!")
Now you can write the three lines to a file in your favorite way:
println(search(0))
println(word)
println(search(1))
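A caveat with the split approach: split treats word as a regex and drops trailing empty strings, so search(1) does not exist when the word is at the very end of the sentence. An alternative sketch that stays with indexOf and makes the "not found" case explicit; SplitAround and splitAround are hypothetical names.

```scala
object SplitAround {
  // Text before and after the first occurrence of `word`,
  // or None when `word` does not occur (indexOf returns -1 in that case).
  def splitAround(sentence: String, word: String): Option[(String, String)] = {
    val idx = sentence.indexOf(word)
    if (idx < 0) None
    else Some((sentence.substring(0, idx), sentence.substring(idx + word.length)))
  }
}
```

Calling substring with a negative index (the -1 from a failed indexOf) is one way to end up with an IndexOutOfBoundsException, which the None branch avoids.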
What if you had more than one occurrence of the word? .split understands regular expressions, so you could improve the previous solution with something like this:
sent
.split("\\s+(?=banana)|(?<=banana)\\s+")
.foreach(println)
\\s means a whitespace character
(?=<word>) means "followed by <word>"
(?<=<word>) means "preceded by <word>"
So, this would split your string into pieces, using as separators any whitespace either followed or preceded by "banana", and not the word itself. The actual word ends up in the list, just like the other parts of the string, so you don't need to print it out explicitly.
This regex trick is called "positive look-around" ( ?= is look-ahead, ?<= is look-behind) in case you are wondering.
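The look-around split can be sketched for an arbitrary word like this. LookaroundSplit and splitKeepingWord are made-up names, and quoting the word with Pattern.quote is an added assumption so that regex metacharacters in it are taken literally.

```scala
import java.util.regex.Pattern

object LookaroundSplit {
  // Split on whitespace that is directly followed or directly preceded by `word`,
  // so the word itself is kept as its own piece.
  def splitKeepingWord(sentence: String, word: String): Array[String] = {
    val w = Pattern.quote(word) // treat the word literally inside the regex
    sentence.split("\\s+(?=" + w + ")|(?<=" + w + ")\\s+")
  }
}
```

LookaroundSplit.splitKeepingWord(sent, word).foreach(println) prints the three lines asked for in the question, and a sentence without surrounding whitespace such as "banana!!" comes back as a single piece.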

Implement Scala-style String Interpolation In Scala

I want to implement a Scala-style string interpolation in Scala. Here is an example,
val str = "hello ${var1} world ${var2}"
At runtime I want to replace "${var1}" and "${var2}" with some runtime strings. However, when trying to use Regex.replaceAllIn(target: CharSequence, replacer: (Match) ⇒ String), I ran into the following problem:
import scala.util.matching.Regex
val placeholder = new Regex("""(\$\{\w+\})""")
placeholder.replaceAllIn(str, m => s"A${m.matched}B")
java.lang.IllegalArgumentException: No group with name {var1}
at java.util.regex.Matcher.appendReplacement(Matcher.java:800)
at scala.util.matching.Regex$Replacement$class.replace(Regex.scala:722)
at scala.util.matching.Regex$MatchIterator$$anon$1.replace(Regex.scala:700)
at scala.util.matching.Regex$$anonfun$replaceAllIn$1.apply(Regex.scala:410)
at scala.util.matching.Regex$$anonfun$replaceAllIn$1.apply(Regex.scala:410)
at scala.collection.Iterator$class.foreach(Iterator.scala:743)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1174)
at scala.util.matching.Regex.replaceAllIn(Regex.scala:410)
... 32 elided
However, when I removed '$' from the regular expression, it worked:
val placeholder = new Regex("""(\{\w+\})""")
placeholder.replaceAllIn(str, m => s"A${m.matched}B")
res2: String = hello $A{var1}B world $A{var2}B
So my question is whether this is a bug in Scala Regex. And if so, are there other elegant ways to achieve the same goal (other than a brute-force replaceAllLiterally on all placeholders)?
$ is treated specially in the replacement string. This is described in the documentation of replaceAllIn:
In the replacement String, a dollar sign ($) followed by a number will be interpreted as a reference to a group in the matched pattern, with numbers 1 through 9 corresponding to the first nine groups, and 0 standing for the whole match. Any other character is an error. The backslash (\) character will be interpreted as an escape character and can be used to escape the dollar sign. Use Regex.quoteReplacement to escape these characters.
(Actually, that doesn't mention named group references, so I guess it's only sort of documented.)
Anyway, the takeaway here is that you need to escape the $ characters in the replacement string if you don't want them to be treated as references.
new scala.util.matching.Regex("""(\$\{\w+\})""")
.replaceAllIn("hello ${var1} world ${var2}", m => s"A\\${m.matched}B")
// "hello A${var1}B world A${var2}B"
It's hard to tell what behavior you're expecting. The issue is that s"${m.matched}" is turning into "${var1}" (and "${var2}"). The '$' is a special character that says "place the group with name {var1} here instead".
For example:
scala> placeholder.replaceAllIn(str, m => "$1")
res0: String = hello ${var1} world ${var2}
It replaces the match with the first capturing group (which is m itself).
It's hard to tell exactly what you're doing, but you could escape any $ like so:
scala> placeholder.replaceAllIn(str, m => s"${m.matched.replace("$","\\$")}")
res1: String = hello ${var1} world ${var2}
If what you really want to do is evaluate var1/var2 as variables in the local scope of the method, that's not possible. In fact, the s"Hello, $name" pattern is actually converted into new StringContext("Hello, ", "").s(name) at compile time.
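If the goal is to substitute values supplied at runtime, one way is to look them up in a Map and pass each replacement through Regex.quoteReplacement, the helper mentioned in the documentation quoted above. A minimal sketch: Template, render and the values map are hypothetical names, and leaving unknown placeholders untouched is an assumption about the desired behavior.

```scala
import scala.util.matching.Regex

object Template {
  // Matches ${name} and captures the name in group 1.
  private val Placeholder: Regex = """\$\{(\w+)\}""".r

  // Replace each ${name} with values(name); unknown names are left as they are.
  def render(template: String, values: Map[String, String]): String =
    Placeholder.replaceAllIn(template, m =>
      // quoteReplacement escapes '$' and '\' so the result is inserted literally
      Regex.quoteReplacement(values.getOrElse(m.group(1), m.matched)))
}
```

For example, Template.render(str, Map("var1" -> "A", "var2" -> "$B")) gives "hello A world $B", with the dollar sign in the value inserted literally.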