Extracting a word after a delimiter in Scala

I want to extract a word from a String in Scala:
val text = "This ball is from Rock. He is good boy"
I want to extract "Rock" from the string.
I tried:
val op = text.substring(4)
text is not a fixed-length string. I just want to pick the first word after "from".
This doesn't give the right word. Can anyone suggest a solution?

This does what you want:
text.drop(text.indexOfSlice("from ")+5).takeWhile(_.isLetter)
or more generally
val delim = "from "
text.drop(text.indexOfSlice(delim)+delim.length).takeWhile(_.isLetter)
The indexOfSlice finds the position of the delimiter and the drop removes everything up to the end of the delimiter. The takeWhile takes all the letters for the word and stops at the first non-letter character (in this case ".").
Note that this is case sensitive so it will not find "From ", it will only work with "from ". If more complex testing is required then use split to convert to separate words and check each word in turn.
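A minimal sketch of the same idea wrapped in a function, assuming you want None when the delimiter is absent and a case-insensitive search. The name wordAfter is hypothetical, and lowercasing both strings assumes text whose length does not change under toLowerCase (true for plain ASCII).

```scala
// Hypothetical helper: first run of letters after `delim`, ignoring case.
// Assumes lowercasing does not change string length (fine for ASCII text).
def wordAfter(text: String, delim: String): Option[String] = {
  val start = text.toLowerCase.indexOf(delim.toLowerCase)
  if (start < 0) None // delimiter not present
  else {
    val word = text.drop(start + delim.length).takeWhile(_.isLetter)
    if (word.isEmpty) None else Some(word)
  }
}
```

With the text from the question, wordAfter(text, "from ") gives Some("Rock"), and a text without the delimiter gives None instead of a misleading slice.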

This is because substring(4) tells Scala to return everything from index 4 to the end of the string.
In your case you first want to split the string into words, which can be done with the split function, and then access the word you want.
Note: split gives you an array of strings, and array indices begin at 0 in Scala, so Rock is at index 4.
This piece of code should work for you:
val text = "This ball is from Rock. He is good boy"
val splitText = text.split("\\W+")
println(splitText(4))
Based on your comment below, I would write something like this, using a function that finds the index of the word immediately following a given word (in this case from):
import scala.util.control.Breaks.{break, breakable}
object Question extends App {
val text = "This ball is from Rock. He is from good boy"
val splitText = text.split("\\W+")
val index = getIndex(splitText, "from")
println(index)
if (index != -1) {
println(splitText(index))
}
def getIndex(arrSplit: Array[String], subString: String): Int = {
var output = -1
var index = 0
breakable {
for(item <- arrSplit) {
if(item.equalsIgnoreCase(subString)) {
output = index + 1
break
}
index = index + 1
}
}
output
}
}
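The same lookup can be sketched without breakable, assuming the standard indexWhere method is acceptable. The name wordFollowing is hypothetical; it also returns None when the marker is the last word, a case the loop above does not check.

```scala
// Hypothetical helper: the word right after the first occurrence of `marker`.
def wordFollowing(text: String, marker: String): Option[String] = {
  val words = text.split("\\W+")
  val i = words.indexWhere(_.equalsIgnoreCase(marker))
  // Guard both "marker missing" and "marker is the last word".
  if (i >= 0 && i + 1 < words.length) Some(words(i + 1)) else None
}
```

Returning an Option leaves it to the caller to decide what to do when nothing follows the marker.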

I hope this is what you are expecting:
object Test2 extends App {
val text = "This ball is from Rock. He is good boy"
private val words = text.split(" ")
private val wordToTrack = "from"
val indexOfFrom = words.indexOf(wordToTrack);
val expectedText = words.filterNot{
case word if indexOfFrom + 1 < words.size && words.contains(wordToTrack) =>
word == words(indexOfFrom + 1)
case _ => false
}.mkString(" ")
print(expectedText)
}
words.contains(wordToTrack) guards against the case where the tracked word (from in this example) is missing from the input string.
I have used a partial function along with filterNot to get the desired result.

You probably want something more general so that you can extract a word from a sentence if that word is present in the input, without having to hard-code offsets:
def extractWordFromString(input: String, word: String): Option[String] = {
val wordLength = word.length
val start = input.indexOfSlice(word)
if (start == -1) None else Some(input.slice(start, start + wordLength))
}
Executing extractWordFromString(text, "Rock") will give you an option containing the target word from input if it was found, and an empty option otherwise. That way you can handle the case where the word you were searching for was not found.

Related

Customised string in spark scala

I have a string like "debug#compile". My end goal is to convert the first letter of each word to uppercase, so I should end up with "Debug#Compile", where 'D' and 'C' are uppercase.
My logic:
1) Split the string on the delimiters. They will be special characters, so I have to check for them every time.
2) Convert each word's first letter to uppercase using map, then join the words again.
I am trying my best but cannot work out the code for this. Can anyone help? Even hints would help!
Below is my code:
object WordCase {
def main(args: Array[String]) {
val s="siddhesh#kalgaonkar"
var b=""
val delimeters= Array("#","_")
if(delimeters(0)=="#")
{
b=s.split(delimeters(0).toString).map(_.capitalize).mkString(delimeters(0).toString())
}
else if(delimeters(0)=="_")
{
b=s.split(delimeters(0).toString).map(_.capitalize).mkString(delimeters(0).toString())
}
else{
println("Non-Standard String")
}
println(b)
}
}
My code capitalizes the first letter of every word based on a constant delimiter and then has to merge the words again. For the first case, "#", it capitalizes the first letter of every word, but it fails for the second case, "_". Am I making a silly mistake in the looping?
scala> val s="siddhesh#kalgaonkar"
scala> val specialChar = (s.split("[a-zA-Z0-9]") filterNot Seq("").contains).mkString
scala> s.replaceAll("[^a-zA-Z]+"," ").split(" ").map(_.capitalize).mkString(",").replaceAll(",",specialChar)
res41: String = Siddhesh#Kalgaonkar
You can handle other special characters the same way:
scala> val s="siddhesh_kalgaonkar"
s: String = siddhesh_kalgaonkar
scala> val specialChar = (s.split("[a-zA-Z0-9]") filterNot Seq("").contains).mkString
specialChar: String = _
scala> s.replaceAll("[^a-zA-Z]+"," ").split(" ").map(_.capitalize).mkString(",").replaceAll(",",specialChar)
res42: String = Siddhesh_Kalgaonkar
I solved it the easy way:
object WordCase {
def main(args: Array[String]) {
val s = "siddhesh_kalgaonkar"
var b = s.replaceAll("[^a-zA-Z]+", " ").split(" ").map(_.capitalize).mkString(" ") //Replacing delimiters with space and capitalizing first letter of each word
val c=b.indexOf(" ") //Getting index of space
val d=s.charAt(c).toString // Getting delimiter character from the original string
val output=b.replace(" ", d) //Now replacing space with the delimiter character in the modified string i.e 'b'
println(output)
}
}
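If the string may mix several delimiters (for example "a_b#c"), one possible sketch is to capitalize each run of letters in place so the delimiters are never removed. This assumes a word means a run of ASCII letters; capitalizeWords and wordPattern are hypothetical names.

```scala
import scala.util.matching.Regex

// A "word" is assumed to be a run of ASCII letters; everything else is kept as-is.
val wordPattern: Regex = "[a-zA-Z]+".r

// Hypothetical helper: capitalize every word, leaving delimiters untouched.
def capitalizeWords(s: String): String =
  wordPattern.replaceAllIn(s, m => Regex.quoteReplacement(m.matched.capitalize))
```

Because nothing is split and rejoined, there is no need to remember which delimiter was used.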

extraction with scala regex

I have the following string as input :
val input=" 12 BANANAPPLESTRAWBERRY LOC 8(05). "
And I want to parse this string using Regex and pattern matching
The wanted output shall look like:
val output=" BANANAPPLESTRAWBERRY Int(5)"
I wrote the following function:
val rega = """(.{6}[^*])(\s*)(\d*)(\s*)(\S*)(\s*)(\S{4})(8\(0*(\d+)\))(.*)""".r
def functionEx(s: String, regb: Regex): String = {
s match {
case regb(start, space, nb, space2, name, space3, loc, vartype, length, end) => name + "(Int"+ length +")"
case _ => ""
}
}
When I call this function on my input, functionEx(input, rega), I get the empty string: ""
Any help, please?
Thanks a lot
I don't know what you were trying to do with that regex. It captures all kinds of groups that seem mostly irrelevant (whitespace, the full stop at the end). It also contains parts like \S{4} that don't seem to match anything, and a magic number like 9 that doesn't appear in the input string. So it doesn't match the input, the second case in the pattern match applies, and "" is returned.
Try this instead:
val input=" 12 BANANAPPLESTRAWBERRY LOC 8(05). "
import scala.util.matching.Regex
val rega = """^(.{6})\s+(\d*)\s+(\w+)\s+(\w+)\s+8\(0*(\d+)\)\s*\.\s*""".r
def functionEx(s: String, regb: Regex): String = {
s match {
case regb(start, nb, name, loc, length) => name + " Int("+ length +")"
case _ => ""
}
}
println(functionEx(input ,rega))
It produces the following output:
BANANAPPLESTRAWBERRY Int(5)
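A variant sketch that does not depend on a fixed six-character prefix, assuming the line always has the shape "number name keyword 8(digits)." with arbitrary whitespace between the fields. The names fieldPattern and describeField are hypothetical, and returning an Option instead of "" is a design choice, not part of the original answer.

```scala
import scala.util.matching.Regex

// Assumed layout: <number> <name> <keyword> 8(<digits>). with flexible whitespace.
val fieldPattern: Regex = """\s*(\d+)\s+(\w+)\s+(\w+)\s+8\(0*(\d+)\)\s*\.\s*""".r

// Hypothetical helper: Some("NAME Int(n)") on a match, None otherwise.
def describeField(line: String): Option[String] = line match {
  case fieldPattern(_, name, _, length) => Some(s"$name Int($length)")
  case _ => None
}
```

A pattern used in a match must cover the whole string, which is why the trailing whitespace and full stop are still part of the regex.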

Minimal Substring Satisfying a condition in scala

I have a string, let's say val mystr = "abcde", and I want to find the minimal substring of mystr that satisfies a given condition. I have to send a string to an external system, so the only way to do this is to iterate over the length of the string, make requests to the external system, and break when the external system returns true,
e.g.
callExtSystemWith("a") //Returns false
callExtSystemWith("ab") //Returns false
callExtSystemWith("abc") //Returns true
Then my method should return "abc". I read that breaks are not the Scala way, so I was wondering what the Scala way of achieving this is.
Right now I have:
for (end <- 1 to mystr.length) {
callExtSystemWith(mystr.substring(0,end))
// I Want to break when this is true.
}
Help much appreciated
You can use (1 to s.length).map(s.take).toStream to create a lazy stream with a, ab, abc, abcd, and so on.
Then filter those strings, so that only the ones for which callExtSystemWith returns true are left.
Then get the first string for which callExtSystemWith returns true. Because this is a lazy stream, no unnecessary requests will be made to the server once the first match is found.
val s = "abcdefgh"
val strs = (1 to s.length).map(s.take).toStream
strs.filter(callExtSystemWith).headOption match {
case Some(s) => "found"
case _ => "not found"
}
You can also use find instead of filter + headOption
Quite often break can be replaced with find on some sequence
So here is another short solution for this problem:
def findSuitablePrefix(mystr: String): Option[String] =
(1 to mystr.length).view.map(mystr.substring(0, _)).find(callExtSystemWith)
.view makes the evaluation lazy to avoid creating extra substrings.
.map transforms the sequence of indexes into a sequence of substrings.
And .find "breaks" after the first element for which callExtSystemWith returns true is found.
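To see the early exit, here is a small sketch with a stand-in for the external system. The stub below (true once the prefix has at least three characters) and the calls counter are assumptions for illustration only.

```scala
// Stand-in for the real external call: counts invocations, accepts length >= 3.
var calls = 0
def callExtSystemWith(s: String): Boolean = { calls += 1; s.length >= 3 }

def findSuitablePrefix(mystr: String): Option[String] =
  (1 to mystr.length).view.map(mystr.substring(0, _)).find(callExtSystemWith)

val result = findSuitablePrefix("abcde")
```

With this stub, result is Some("abc") and calls is 3: the prefixes "abcd" and "abcde" are never sent.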
Scala has no built-in break statement, but there are other solutions. The one I like better is to create a function and force a return (instead of a normal break). Something like:
def callAndBreak(mystr: String): Int = {
for (end <- 1 to mystr.length) {
if (callExtSystemWith(mystr.substring(0, end))) return end
}
-1 // no prefix satisfied the condition
}
Here I return end (or -1 when nothing matches), but you can return anything.
If you want to avoid using return or breaks, you could also use foldLeft:
val finalResult = (1 to mystr.length).foldLeft(false) { (result, end) =>
if(!result) callExtSystemWith(mystr.substring(0, end)) else result
}
However, it is a bit hard to read, and will walk the entire length of the string.
Simple recursion might be a better way:
def go(s: String, end: Int): Boolean = {
if (end > s.length) false
else {
callExtSystemWith(s.substring(0, end)) || go(s, end + 1)
}
}
go(mystr, 1)
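A variant sketch of the recursive approach that returns the matching prefix rather than a Boolean. The check is passed in as a parameter (accepts) so the example is self-contained; minimalPrefix is a hypothetical name.

```scala
import scala.annotation.tailrec

// Hypothetical helper: shortest prefix of `s` for which `accepts` is true.
def minimalPrefix(s: String, accepts: String => Boolean): Option[String] = {
  @tailrec
  def go(end: Int): Option[String] =
    if (end > s.length) None // every prefix, including s itself, was rejected
    else {
      val prefix = s.substring(0, end)
      if (accepts(prefix)) Some(prefix) else go(end + 1)
    }
  go(1)
}
```

In the setting of the question this would be called as minimalPrefix(mystr, callExtSystemWith).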

Scala Connect Four Moving Between Rows

I am trying to make a Connect Four game in Scala. Currently it prints the board every time and switches between players, asking each to enter the column they want. The problem is that I don't know how to change rows: all 64 moves stay in row 7 (the first row). I was thinking of checking whether there is already an X or O in the spot the user wants to play and, if so, bumping up the row. Would I use an if/else for this? So if an X or O is there, move up a row, else make the move.
// Initialize the grid
val table = Array.fill(9,8)('.')
var i = 0;
while(i < 8){
table(8)(i) = (i+'0').toChar
i = i+1;
}
/* printGrid: Print out the grid provided */
def printGrid(table: Array[Array[Char]]) {
table.foreach( x => println(x.mkString(" ")))
}
var allowedMoves = 64
var spotsLeft = 8
//Player One
def player1(){
printGrid(table)
println("Player 1 it is your turn. Choose a column 0-7")
val move = readInt
table(7)(move) = ('X')
}
//Player Two
def player2(){
printGrid(table)
println("Player 2 it is your turn. Choose a column 0-7")
val move = readInt
table(7)(move) = ('O')
}
for (turn <- 1 to 32) {
player1()
player2()
}
I am not sure if you're still baffled by my comments, but let me try to give you a more in-depth explanation here.
The code in question is:
//Player One
def player1(){
printGrid(table)
println("Player 1 it is your turn. Choose a column 0-7")
val move = readInt
table(7)(move) = ('X')
}
//Player Two
def player2(){
printGrid(table)
println("Player 2 it is your turn. Choose a column 0-7")
val move = readInt
table(7)(move) = ('O')
}
for (turn <- 1 to 32) {
player1()
player2()
}
where the players take turns. But before going straight to the answer, let us refactor this code a bit by removing the duplication we have here. player1 and player2 are almost the same implementation. So let us pass the distinct parts as a parameter. The distinct part is the name of the player and the symbol that represents this player in the table. So let us define a class Player:
case class Player(name: String, symbol: Char)
and contract the two functions into one:
def player(player: Player): Unit ={
printGrid(table)
println(s"${player.name} it is your turn. Choose a column 0-7")
val move = readInt
table(7)(move) = player.symbol
}
for (turn <- 1 to 32) {
player(Player("Player 1", 'X'))
player(Player("Player 2", 'O'))
}
Now, we don't have to do everything twice, but the problem is still the same.
Okay, let's say, we are going to use conditionals: If table(7)(move) is occupied, then we choose table(6)(move). However, if this is also occupied, we choose table(5)(move). This goes on, until we find the column is completely full, in which case we may for example want to throw an exception. In code, this would look as follows:
def player(player: Player): Unit = {
printGrid(table)
println(s"${player.name} it is your turn. Choose a column 0-7")
val move = readInt
if (table(7)(move) != '.') {
if (table(6)(move) != '.') {
if (table(5)(move) != '.') {
if (table(4)(move) != '.') {
if (table(3)(move) != '.') {
if (table(2)(move) != '.') {
if (table(1)(move) != '.') {
if (table(0)(move) != '.') {
throw new IllegalArgumentException(s"Column $move is already full")
} else table(0)(move) = player.symbol
} else table(1)(move) = player.symbol
} else table(2)(move) = player.symbol
} else table(3)(move) = player.symbol
} else table(4)(move) = player.symbol
} else table(5)(move) = player.symbol
} else table(6)(move) = player.symbol
} else table(7)(move) = player.symbol
}
Let's run the code and... yay, it works! But it is terrible code. There is an awful lot of duplication, and we couldn't easily make the table bigger.
Okay, what problem do we really want to solve? We want to find the highest index of the row that has a free space at move, i.e., contains a '.' at move.
How could we find this index? There exists a function indexOf that takes an argument x and returns the index of the first occurrence of x in the array. But our array table is a two-dimensional array, and we care only about the value at move in each inner array. Fortunately, every collection in Scala provides a function map to map each element, so we could do: table map (_(move)).
Say that we receive the following array: . . O X O O 2, so the index of the last occurrence of . is 1. But indexOf returns the first index, so indexOf('.') would return 0. We could reverse the array, because finding the first index in the reversed array is equivalent to finding the last index in the original array, but this is a bit tricky, as we would also need to invert the index, because the index in the reversed array is generally not the same as the index in the original array.
Let's apply a little trick: Instead of finding the last index of ., let's find the index of the first element that is not . and subtract one. But the function indexOf does not allow us to pass a not x. However, we can solve this problem by modifying our map function slightly: Instead of table map (_(move)), let's map to table map (_(move) == '.'). Now, we need to find the index of the first false value and subtract one.
The whole solution would look as follows:
def player(player: Player): Unit = {
printGrid(table)
println(s"${player.name} it is your turn. Choose a column 0-7")
val move = readInt
val freeRows = table map (_(move) == '.')
val indexOfLastFreeRow = (freeRows indexOf false) - 1
if (indexOfLastFreeRow == -1) throw new IllegalArgumentException(s"Column $move is already full")
else table(indexOfLastFreeRow)(move) = player.symbol
}
for (turn <- 1 to 32) {
player(Player("Player 1", 'X'))
player(Player("Player 2", 'O'))
}
case class Player(name: String, symbol: Char)
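As an alternative to the indexOf trick, the row lookup can be sketched as a small pure function using lastIndexWhere, which searches from the end of the array. The name lowestFreeRow and the tiny demo board are assumptions for illustration; a label row of digits is never '.', so it is skipped automatically.

```scala
// Hypothetical helper: index of the lowest row that is still free in `column`.
def lowestFreeRow(table: Array[Array[Char]], column: Int): Option[Int] = {
  val row = table.lastIndexWhere(r => r(column) == '.')
  if (row >= 0) Some(row) else None // None means the column is full
}

// Demo board: 4 rows, 3 columns, two pieces already stacked in column 0.
val demo: Array[Array[Char]] = Array(
  "...".toCharArray,
  "...".toCharArray,
  "O..".toCharArray,
  "X..".toCharArray
)
```

Returning an Option lets the caller ask the player for another column instead of throwing when the column is full.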
I hope this answer helps. As a final note: I would still not work with plain arrays, but instead define a class Table and Column and let them provide the functionality to add elements, and an appropriate toString to print the table.
case class Table(columns: List[Column]) {
override def toString = (for (i <- 0 until columns.head.values.length) yield {
columns map (_.values(i).toString) reduceLeft (_ + " " + _)
}) reduceLeft (_ + System.lineSeparator + _)
def add(entry: Char, columnIndex: Int): Table = {
val (prefix, column :: suffix) = columns.splitAt(columnIndex)
Table(prefix ++ (column.add(entry) :: suffix))
}
}
object Table {
val EmptyEntry = '.'
def empty(numberOfColumns: Int, numberOfRows: Int): Table =
Table(List.fill(numberOfColumns)(Column.empty(numberOfRows)))
}
case class Column(values: List[Char]) {
def isFull: Boolean = !values.contains(Table.EmptyEntry)
def add(entry: Char): Column = {
if (isFull) this
else {
val (empty, filled) = values.partition(_ == Table.EmptyEntry)
Column(empty.dropRight(1) ++ (entry :: filled))
}
}
}
object Column {
def empty(numberOfRows: Int): Column =
Column(List.fill(numberOfRows)(Table.EmptyEntry))
}

What's the most elegant way to find word pairs in a text with Scala?

Given a list of word pairs
val terms = ("word1a", "word1b") :: ("word2a", "word2b") :: ... :: Nil
What's the most elegant way in Scala to test whether at least one of the pairs occurs in a text? The test should terminate as quickly as possible when it hits the first match. How would you solve that?
EDIT: To be more precise, I want to know whether both words of a pair appear somewhere (not necessarily in order) in the text. If that's the case for one of the pairs in the list, the method should return true. The matched pair does not need to be returned, and it does not matter if more than one pair matches.
scala> val text = Set("blah1", "word2b", "blah2", "word2a")
text: scala.collection.immutable.Set[java.lang.String] = Set(blah1, word2b, blah2, word2a)
scala> terms.exists{case (a,b) => text(a) && text(b)}
res12: Boolean = true
EDIT: Note that using a set to represent the tokens in the text makes the lookup from the contains much more efficient. You wouldn't want to use something sequential like a List for that.
EDIT 2: Updated for clarification in requirement!
EDIT 3: changed contains to apply per the suggestion in the comment
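Put together as a self-contained sketch, assuming the text arrives as one whitespace-separated string. The sample terms and the name containsAnyPair are placeholders. Note that this version tokenizes the whole text before testing, so only the search over the pairs stops early.

```scala
// Placeholder pairs standing in for the real list of terms.
val terms: List[(String, String)] =
  List(("word1a", "word1b"), ("word2a", "word2b"))

// Hypothetical helper: true if both words of at least one pair occur in the text.
def containsAnyPair(text: String, pairs: List[(String, String)]): Boolean = {
  val tokens = text.split("\\s+").toSet // set lookup is effectively constant time
  pairs.exists { case (a, b) => tokens(a) && tokens(b) }
}
```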
EDIT - seems like the ambiguous wording of your question means I answered a different question:
Because you are essentially asking for either of the pair; you might as well flatten all these into one big set.
val words = (Set.empty[String] /: terms) { case (s, (w1, w2)) => s + w1 + w2 }
Then you are just asking whether any of these exist in the text:
text.split("\\s") exists words
This is fast because we can use the structure of a Set to lookup quickly whether the word is contained in the text; it terminates early due to the "exists":
scala> val text = "blah1 blah2 word2b"
text: java.lang.String = blah1 blah2 word2b
In the case that your text is very long, you may wish to Stream it, so that the next word to test is lazily computed, rather than split the String into substrings up-front:
scala> val Word = """\s*(\S.*)""".r
Word: scala.util.matching.Regex = \s*(\S.*)
scala> def strmWds(text : String) : Stream[String] = text match {
| case Word(nxt) => val (word, rest) = nxt span (_ != ' '); word #:: strmWds(rest)
| case _ => Stream.empty
| }
strmWds: (text: String)Stream[String]
Now you can:
scala> strmWds(text) exists words
res4: Boolean = true
scala> text.split("\\s") exists words
res3: Boolean = true
I'm assuming that both elements of the pair have to appear in the text, but it doesn't matter where, and it doesn't matter which pair appears.
I'm not sure this is the most elegant, but it's not bad, and it's fairly fast if you expect that the text probably has the words (and thus you don't need to read all of it), and if you can generate an iterator that will give you the words one at a time:
case class WordPair(one: String, two: String) {
private[this] var found_one, found_two = false
def check(s: String): Boolean = {
if (s==one) found_one = true
if (s==two) found_two = true
found_one && found_two
}
def reset {
found_one = false
found_two = false
}
}
val wordpairlist = terms.map { case (w1,w2) => WordPair(w1,w2) }
// May need to wordpairlist.foreach(_.reset) first, if you do this on multiple texts
text.iterator.exists(w => wordpairlist.exists(_.check(w)))
You could further improve things by putting all the terms in a set, and not even bothering to check the wordpairlist unless the word from the text was in that set.
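A sketch of that improvement, assuming the text is available as an Iterator[String] of words. The name anyPairSeen is hypothetical; it keeps a set of the relevant words seen so far and only consults the pair list when a relevant word arrives.

```scala
import scala.collection.mutable

// Hypothetical helper: stops consuming `words` once some pair is complete.
def anyPairSeen(words: Iterator[String], pairs: List[(String, String)]): Boolean = {
  val relevant: Set[String] = pairs.flatMap { case (a, b) => List(a, b) }.toSet
  val seen = mutable.Set.empty[String]
  words.exists { w =>
    if (relevant(w)) {
      seen += w
      // Only now can a pair have become complete.
      pairs.exists { case (a, b) => seen(a) && seen(b) }
    } else false
  }
}
```

Because the state lives inside the function, there is nothing to reset between texts.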
If you mean that the words have to occur next to each other in order, you then should change check to
def check(s: String) = {
if (found_one && s==two) found_two = true
else if (s==one) { found_one = true; found_two = false }
else { found_one = false; found_two = false }
found_one && found_two
}