Replace all elements of a Seq from a String - scala

I have a String and a Seq like this:
Array[String] = Array(a, the, an)
String = "This is a sentence that includes articles a, an and the"
I want to replace each occurrence of an element of the Seq within the String with "".
Currently, I'm doing something like this:
val a = Array("a" , "the", "an" )
var str = "This is a sentence that includes articles a, an and the"
a.foldLeft( "" ){ (x,y) => str=str.replaceAll(s"\\b${x}\\b", ""); str }
It seems to be working but doesn't look very Scala-ish mostly because of the re-assignment of the string for each iteration.
Is there any other way to do this?

This seems to be the correct variant:
a.foldLeft(str){ case (acc,item) => acc.replaceAll(s"\\b${item}\\b", "")}

It's just
a.foldLeft(str) { (x,y) => x.replaceAll(s"\\b${y}\\b", "") }
For foldLeft, x is already the intermediate result you want, no need to store it in a var.
(As a side note, your original code doesn't work correctly in general: if a is empty, it'll return "" instead of str.)
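As a minimal sketch of the fold described above, the snippet below wraps it in a function. The names RemoveWords and removeWords are hypothetical, and the use of Pattern.quote is an added assumption so that a word containing regex metacharacters is matched literally. Note that the spaces surrounding a removed word are left in place.

```scala
import java.util.regex.Pattern

object RemoveWords {
  // Fold over the words, threading the partially cleaned text through as the accumulator.
  def removeWords(text: String, words: Seq[String]): String =
    words.foldLeft(text) { (acc, word) =>
      // Pattern.quote makes the word a literal; \b restricts the match to whole words.
      acc.replaceAll("\\b" + Pattern.quote(word) + "\\b", "")
    }
}
```

With an empty Seq the fold simply returns the input text, which is the case the original var-based version got wrong.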

Extracting a Word after delimiter in scala

I want to extract a word from a String in Scala
val text = "This ball is from Rock. He is good boy"
I want to extract "Rock" from the string.
I tried:
val op = text.substring(4)
text is not a fixed-length string; I just want to pick the first word after "From".
This doesn't give the right word. Can anyone suggest a solution?
This does what you want:
text.drop(text.indexOfSlice("from ")+5).takeWhile(_.isLetter)
or more generally
val delim = "from "
text.drop(text.indexOfSlice(delim)+delim.length).takeWhile(_.isLetter)
The indexOfSlice finds the position of the delimiter and the drop removes everything up to the end of the delimiter. The takeWhile takes all the letters for the word and stops at the first non-letter character (in this case ".").
Note that this is case sensitive so it will not find "From ", it will only work with "from ". If more complex testing is required then use split to convert to separate words and check each word in turn.
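One thing the one-liner does not handle is a missing delimiter: indexOfSlice then returns -1 and drop silently starts from the wrong offset. The following is a sketch, not part of the original answer, that returns an Option instead. The names WordAfter and wordAfter are hypothetical.

```scala
object WordAfter {
  // Returns the run of letters right after the first occurrence of delim, if any.
  def wordAfter(text: String, delim: String): Option[String] = {
    val start = text.indexOf(delim)
    if (start < 0) None // delimiter not present
    else {
      val word = text.drop(start + delim.length).takeWhile(_.isLetter)
      if (word.isEmpty) None else Some(word)
    }
  }
}
```

For the example sentence, WordAfter.wordAfter(text, "from ") yields Some("Rock"), and a text without the delimiter yields None.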
Your attempt fails because substring(4) returns everything from index 4 to the end of the string.
In your case you first want to split the string into words, which can be done with split, and then access the word you want.
Note: split gives you an array of strings, and array indices start at 0 in Scala, so "Rock" is at index 4.
This piece of code should work for you:
val text = "This ball is from Rock. He is good boy"
val splitText = text.split("\\W+")
println(splitText(4))
Based on your comment below, I would write something like this, using a function that finds the index of the word immediately following a given word (in this case "from"):
import scala.util.control.Breaks.{break, breakable}
object Question extends App {
  val text = "This ball is from Rock. He is from good boy"
  val splitText = text.split("\\W+")
  val index = getIndex(splitText, "from")
  println(index)
  // Guard against "from" being absent or being the last word
  if (index != -1 && index < splitText.length) {
    println(splitText(index))
  }
  // Returns the index of the word right after the first match of subString, or -1
  def getIndex(arrSplit: Array[String], subString: String): Int = {
    var output = -1
    var index = 0
    breakable {
      for (item <- arrSplit) {
        if (item.equalsIgnoreCase(subString)) {
          output = index + 1
          break()
        }
        index = index + 1
      }
    }
    output
  }
}
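The same lookup can be sketched without breakable or mutable counters, using standard collection methods. This is an alternative formulation rather than the answerer's code, and the names NextWord and wordAfter are hypothetical.

```scala
object NextWord {
  // Splits on non-word characters and returns the word following the first
  // case-insensitive match of marker, or None if there is no such word.
  def wordAfter(text: String, marker: String): Option[String] = {
    val words = text.split("\\W+")
    val i = words.indexWhere(_.equalsIgnoreCase(marker))
    if (i < 0) None else words.drop(i + 1).headOption
  }
}
```

Returning an Option covers both the missing-marker case and the case where the marker is the last word.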
I hope this is what you are expecting:
object Test2 extends App {
  val text = "This ball is from Rock. He is good boy"
  private val words = text.split(" ")
  private val wordToTrack = "from"
  val indexOfFrom = words.indexOf(wordToTrack)
  val expectedText = words.filterNot {
    case word if words.contains(wordToTrack) && indexOfFrom + 1 < words.length =>
      word == words(indexOfFrom + 1)
    case _ => false
  }.mkString(" ")
  print(expectedText)
}
words.contains(wordToTrack) guards against the tracked word ("from" in this example) being missing from the input text, and the bounds check covers the case where it is the last word.
I have used a pattern-matching function together with filterNot to get the desired result. Note that this prints the text with the word that follows "from" removed.
You probably want something more general so that you can extract a word from a sentence if that word is present in the input, without having to hard-code offsets:
def extractWordFromString(input: String, word: String): Option[String] = {
  val wordLength = word.length
  val start = input.indexOfSlice(word)
  if (start == -1) None else Some(input.slice(start, start + wordLength))
}
Executing extractWordFromString(text, "Rock") will give you an option containing the target word from input if it was found, and an empty option otherwise. That way you can handle the case where the word you were searching for was not found.

Replacing/deleting some characters from a string which is in a List

I'm supposed to ignore every ",", ".", "-" and '"' in the strings held in a List.
The list looks like this, e.g.: List("This is an exercise, which I have problem with", "and I don't know, how to do it.", "text-,.")
What I've tried so far is map, but it doesn't compile. I also considered replace, but decided against it, because I would need one replace call for each character I want to ignore, e.g. replace(",", "").replace(".", "") and so on, wouldn't I?
Is there a method that lets me specify all the characters I want to ignore at once?
My code:
val lines = io.Source.fromResource("ogniem-i-mieczem.txt").getLines.toList
println(lines.map{
case "," => ""
case "." => ""
case "-" => ""
case "''" => ""
})
A simple regex and replaceAllIn() should do it.
val inLst =
List("This is an exercise, which I have problem with"
, "and I don't know, how to do it."
, "text-,.")
inLst.map("[-,.\"]".r.replaceAllIn(_, ""))
//res0: List[String] =
// List(This is an exercise which I have problem with
// , and I don't know how to do it
// , text)
You could apply a regex and replace them all at once. Something like this:
import scala.util.matching.Regex
val regex = "\\.*,*-*\"*".r
val sampleText = "Hi there. This comma, should be gone. and dots and quotes \"as well."
val result = regex.replaceAllIn(sampleText, "")
println(s"result: $result")
// result: Hi there This comma should be gone and dots and quotes as well
Applied to your sample code, it could look like this:
import scala.util.matching.Regex
val regex = "\\.*,*-*\"*".r
val result = io.Source
.fromResource("ogniem-i-mieczem.txt")
.getLines
.toList
.map { line => regex.replaceAllIn(line, "") }
println(s"Result: $result")
The regex-based answers are the way to go, but as a learning exercise, note that a string can be treated as a sequence of characters. That means it behaves like a collection, so the usual suspects (map, filter, etc.) also work:
lines map { _.filterNot { exclusionList.contains } }
where
val exclusionList = Set(',', '.', '-', '"')
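Put together as a self-contained sketch, with the sample list from the question standing in for the file contents (the names StripPunctuation, excluded and strip are hypothetical):

```scala
object StripPunctuation {
  // A Set[Char] is also a function Char => Boolean, so it can be passed to filterNot directly.
  val excluded: Set[Char] = Set(',', '.', '-', '"')

  // Removes every excluded character from every line.
  def strip(lines: List[String]): List[String] =
    lines.map(_.filterNot(excluded))
}
```

Adding another character to ignore only means adding it to the set; the apostrophe in "don't" is kept because it is not in the set.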

Join array of strings?

In JavaScript you can join an array of strings, e.g.:
fruits = ["orange", "apple", "banana"];
joined = fruits.join(", ");
console.log(joined)
// "orange, apple, banana"
How do you do this in ReasonML?
You can use Js.Array.joinWith:
let fruits = [|"orange", "apple", "banana"|];
let joined = Js.Array.joinWith(", ", fruits);
Js.log(joined);
// "orange, apple, banana"
Converting an array to a string of joined values sounds like a job for Array.fold_left, however running
Array.fold_left((a, b) => a ++ "," ++ b, "", fruits);
produces ",orange,apple,banana".
Ideally, the starting value for the fold (the second argument) should be the first element of the array, and the array actually folded should be the rest; this avoids the leading comma. Unfortunately, this isn't easily doable with arrays, but it is with lists:
let fruitList = Array.to_list(fruits);
let joined = List.fold_left((a, b) => a ++ "," ++ b, List.hd(fruitList), List.tl(fruitList));
/*joined = "orange,apple,banana"*/
See the ReasonML docs on lists.
Here's how to implement your own join function in ReasonML:
let rec join = (char: string, list: list(string)): string => {
  switch (list) {
  | [] => raise(Failure("Passed an empty list"))
  | [tail] => tail
  | [head, ...tail] => head ++ char ++ join(char, tail)
  };
};
With this, Js.log(join("$", ["a", "b", "c"])) gives you "a$b$c", much like JavaScript would.

The most idiomatic way to perform conditional concatenation of strings in Scala

I'm curious what the best way would be to build a String value by sequentially appending text chunks when some of the chunks depend dynamically on external conditions. The solution should be idiomatic Scala without significant speed or memory penalties.
For instance, how could one rewrite the following Java method in Scala?
public String test(boolean b) {
    StringBuilder s = new StringBuilder();
    s.append("a").append(1);
    if (b) {
        s.append("b").append(2);
    }
    s.append("c").append(3);
    return s.toString();
}
Since Scala is both functional and imperative, the term idiomatic depends on which paradigm you prefer to follow. You've solved the problem following the imperative paradigm. Here's one of the ways you could do it functionally:
def test( b : Boolean ) =
"a1" +
( if( b ) "b2" else "" ) +
"c3"
These days, idiomatic means string interpolation.
s"a1${if(b) "b2" else ""}c3"
You can even nest the string interpolation:
s"a1${if(b) s"$someMethod" else ""}"
What about making the different components of the string functions in their own right? They have to make a decision, which is responsibility enough for a function in my book.
def test(flag: Boolean) = {
  def a = "a1"
  def b = if (flag) "b2" else ""
  def c = "c3"
  a + b + c
}
The added advantage of this is it clearly breaks apart the different components of the final string, while giving an overview of how they fit together at a high level, unencumbered by anything else, at the end.
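When the number of optional chunks grows, one way to keep the components separate is to model each chunk as an Option and concatenate only the ones that are present. This is a sketch of that variation, not something proposed in the answers; the object name ConditionalConcat is hypothetical.

```scala
object ConditionalConcat {
  def test(flag: Boolean): String = {
    // Each chunk is Some(text) when it should appear and None otherwise.
    val parts: Seq[Option[String]] = Seq(
      Some("a1"),
      if (flag) Some("b2") else None,
      Some("c3")
    )
    // flatten drops the None entries; mkString joins the remaining chunks.
    parts.flatten.mkString
  }
}
```

ConditionalConcat.test(true) gives "a1b2c3" and ConditionalConcat.test(false) gives "a1c3".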
As @om-nom-nom said, your code is already sufficiently idiomatic:
def test(b: Boolean): String = {
  val sb = new StringBuilder
  sb.append("a").append(1)
  if (b) sb.append("b").append(2)
  sb.append("c").append(3)
  sb.toString
}
I can suggest an alternative version, but it's not necessarily more performant or "scala-ish":
def test2(b: Boolean): String = "%s%d%s%s%s%d".format(
  "a",
  1,
  if (b) "b" else "",
  if (b) 2 else "",
  "c",
  3)
In Scala, a String can be treated as a sequence of characters. Thus, a functional way to solve your problem would be with map:
"abc".map {
  case 'a' => "a1"
  case 'b' => if (b) "b2" else ""
  case 'c' => "c3"
  case _ => ""
}.mkString("")

What's the most elegant way to find word pairs in a text with Scala?

Given a list of word pairs
val terms = ("word1a", "word1b") :: ("word2a", "word2b") :: ... :: Nil
What's the most elegant way in Scala to test whether at least one of the pairs occurs in a text? The test should terminate as quickly as possible when it hits the first match. How would you solve that?
EDIT: To be more precise, I want to know whether both words of a pair appear somewhere (not necessarily in order) in the text. If that's the case for one of the pairs in the list, the method should return true. The matched pair does not need to be returned, nor does it matter whether more than one pair matches.
scala> val text = Set("blah1", "word2b", "blah2", "word2a")
text: scala.collection.immutable.Set[java.lang.String] = Set(blah1, word2b, blah2, word2a)
scala> terms.exists{case (a,b) => text(a) && text(b)}
res12: Boolean = true
EDIT: Note that using a set to represent the tokens in the text makes the lookup from the contains much more efficient. You wouldn't want to use something sequential like a List for that.
EDIT 2: Updated for clarification in requirement!
EDIT 3: changed contains to apply per the suggestion in the comment
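A self-contained sketch of this approach, starting from a raw text rather than a ready-made Set. The names WordPairs and containsAnyPair are hypothetical, and splitting on non-word characters is an assumption about how the text should be tokenized.

```scala
object WordPairs {
  // True if both words of at least one pair occur anywhere in the text.
  def containsAnyPair(text: String, terms: List[(String, String)]): Boolean = {
    // Tokenize once into a Set so that each membership test is a hash lookup.
    val tokens = text.split("\\W+").toSet
    // exists stops at the first pair for which both lookups succeed.
    terms.exists { case (a, b) => tokens(a) && tokens(b) }
  }
}
```

The trade-off is that the whole text is tokenized up front; the early exit applies to the list of pairs, not to reading the text.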
EDIT - seems like the ambiguous wording of your question means I answered a different question:
Because you are essentially asking for either of the pair; you might as well flatten all these into one big set.
val words = (Set.empty[String] /: terms) { case (s, (w1, w2)) => s + w1 + w2 }
Then you are just asking whether any of these exist in the text:
text.split("\\s") exists words
This is fast because we can use the structure of a Set to lookup quickly whether the word is contained in the text; it terminates early due to the "exists":
scala> val text = "blah1 blah2 word2b"
text: java.lang.String = blah1 blah2 word2b
In the case that your text is very long, you may wish to Stream it, so that the next word to test is lazily computed, rather than split the String into substrings up-front:
scala> val Word = """\s*(.*)""".r
Word: scala.util.matching.Regex = \s*(.*)
scala> def strmWds(text : String) : Stream[String] = text match {
| case Word(nxt) if nxt.nonEmpty => val (word, rest) = nxt span (_ != ' '); word #:: strmWds(rest)
| case _ => Stream.empty
| }
strmWds: (text: String)Stream[String]
Now you can:
scala> strmWds(text) exists words
res4: Boolean = true
scala> text.split("\\s") exists words
res3: Boolean = true
I'm assuming that both elements of the pair have to appear in the text, but it doesn't matter where, and it doesn't matter which pair appears.
I'm not sure this is the most elegant, but it's not bad, and it's fairly fast if you expect that the text probably has the words (and thus you don't need to read all of it), and if you can generate an iterator that will give you the words one at a time:
case class WordPair(one: String, two: String) {
  private[this] var found_one, found_two = false
  def check(s: String): Boolean = {
    if (s == one) found_one = true
    if (s == two) found_two = true
    found_one && found_two
  }
  def reset(): Unit = {
    found_one = false
    found_two = false
  }
}
val wordpairlist = terms.map { case (w1,w2) => WordPair(w1,w2) }
// May need to wordpairlist.foreach(_.reset()) first, if you do this on multiple texts
text.iterator.exists(w => wordpairlist.exists(_.check(w)))
You could further improve things by putting all the terms in a set, and not even bothering to check the wordpairlist unless the word from the text was in that set.
If you mean that the words have to occur next to each other in order, you should then change check to
def check(s: String) = {
  if (found_one && s == two) found_two = true
  else if (s == one) { found_one = true; found_two = false }
  else { found_one = false; found_two = false }
  found_one && found_two
}
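For the adjacent-in-order case, a stateless alternative is to slide a window of two words over the word iterator and look each window up in a set of pairs. This is a sketch under the assumption that the words arrive as an Iterator[String]; the names AdjacentPairs and hasAdjacentPair are hypothetical.

```scala
object AdjacentPairs {
  // True if some pair (a, b) from terms occurs as two consecutive words, in that order.
  def hasAdjacentPair(words: Iterator[String], terms: Set[(String, String)]): Boolean =
    words.sliding(2).exists {
      case Seq(a, b) => terms((a, b)) // window of two neighbouring words
      case _         => false         // input with fewer than two words
    }
}
```

Because sliding on an Iterator is lazy and exists short-circuits, words are only consumed up to the first matching window.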