How to split a string by a delimiter from the right?
For example, with a hypothetical rightSplit method:
scala> "hello there how are you?".rightSplit(" ", 1)
res0: Array[java.lang.String] = Array(hello there how are, you?)
Python has a .rsplit() method which is what I'm after in Scala:
In [1]: "hello there how are you?".rsplit(" ", 1)
Out[1]: ['hello there how are', 'you?']
I think the simplest solution is to search for the index position and then split based on that. For example:
scala> val msg = "hello there how are you?"
msg: String = hello there how are you?
scala> msg splitAt (msg lastIndexOf ' ')
res1: (String, String) = (hello there how are," you?")
And since someone remarked that lastIndexOf returns -1 when there is no space, note that this case is handled fine: splitAt(-1) simply puts the whole string in the second element:
scala> val msg = "AstringWithoutSpaces"
msg: String = AstringWithoutSpaces
scala> msg splitAt (msg lastIndexOf ' ')
res0: (String, String) = ("",AstringWithoutSpaces)
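The splitAt approach keeps the space at the start of the second part (" you?"). Below is a sketch of a more general rightSplit that drops the delimiter and mirrors Python's rsplit(sep, maxsplit). rightSplit is a hypothetical helper, not a standard library method. It assumes a non-empty literal (non-regex) separator, and a negative maxSplit means "no limit", as in Python:

```scala
// Hypothetical helper mirroring Python's str.rsplit(sep, maxsplit).
// Assumes sep is a non-empty literal string (not a regex).
def rightSplit(s: String, sep: String, maxSplit: Int): List[String] = {
  require(sep.nonEmpty, "separator must be non-empty")
  // Work from the right, peeling off at most maxSplit trailing pieces.
  @scala.annotation.tailrec
  def loop(rest: String, splitsLeft: Int, acc: List[String]): List[String] = {
    // stop splitting once the budget is used up (a negative budget never reaches 0)
    val idx = if (splitsLeft == 0) -1 else rest.lastIndexOf(sep)
    if (idx < 0) rest :: acc
    else loop(rest.substring(0, idx), splitsLeft - 1, rest.substring(idx + sep.length) :: acc)
  }
  loop(s, maxSplit, Nil)
}

println(rightSplit("hello there how are you?", " ", 1))
// List(hello there how are, you?)
```

A string without the separator comes back as a single-element list, matching the -1 case discussed above.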
You could use plain old regular expressions:
scala> val LastSpace = " (?=[^ ]+$)"
LastSpace: String = " (?=[^ ]+$)"
scala> "hello there how are you?".split(LastSpace)
res0: Array[String] = Array(hello there how are, you?)
The lookahead (?=[^ ]+$) matches a space only if it is followed by one or more non-space characters ([^ ]+) that run all the way to the end of the string ($). Only the last space can satisfy this, and because a lookahead consumes nothing, the final token stays intact.
This solution won't break if there is only one token:
scala> "hello".split(LastSpace)
res1: Array[String] = Array(hello)
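The same lookahead pattern should work for another single-character delimiter by substituting it in both places. A sketch with a comma, assuming the delimiter is not a regex metacharacter (otherwise it would need escaping). It also shows one difference from Python's rsplit: a trailing delimiter is not split off, because [^,]+ needs at least one character after the comma:

```scala
// The lookahead trick with ',' substituted for ' ' in both places.
// Assumption: the delimiter is not a regex metacharacter.
val LastComma = ",(?=[^,]+$)"

val parts = "a,b,c".split(LastComma)
println(parts.mkString("[", " | ", "]")) // [a,b | c]

// No comma here is followed by non-commas up to the end, so nothing is split.
val trailing = "a,b,".split(LastComma)
println(trailing.mkString("[", " | ", "]")) // [a,b,]
```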
Another option is to split on every space, then rejoin all tokens except the last:
scala> val sl = "hello there how are you?".split(" ").reverse.toList
sl: List[String] = List(you?, are, how, there, hello)
scala> val sr = (sl.head :: (sl.tail.reverse.mkString(" ") :: Nil)).reverse
sr: List[String] = List(hello there how are, you?)
Related
I have a function which should take in a long string and separate it into a list of strings where each list element is a sentence of the article. I am going to achieve this by splitting on space and then grouping the elements from that split according to the tokens which end with a dot:
def getSentences(article: String): List[String] = {
  val separatedBySpace = article
    .map((c: Char) => if (c == '\n') ' ' else c)
    .split(" ")
  val splitAt: List[Int] = Range(0, separatedBySpace.size)
    .filter(i => endsWithDot(separatedBySpace(i))).toList
  ??? // TODO: group separatedBySpace into sentences using splitAt
}
I have separated the string on space, and I've found each index that I want to group the list on. But how do I now turn separatedBySpace into a list of sentences based on splitAt?
Example of how it should work:
article = "I like donuts. I like cats."
result = List("I like donuts.", "I like cats.")
PS: Yes, I know that my algorithm for splitting the article into sentences has flaws; I just want a quick, naive method to get the job done.
I ended up solving this by using recursion:
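// endsWithDot was not shown in the question; a simple version is assumed here
def endsWithDot(token: String): Boolean = token.endsWith(".")

def getSentenceTokens(article: String): List[List[String]] = {
  val separatedBySpace: List[String] = article
    .replace('\n', ' ')
    .replaceAll(" +", " ") // collapse runs of spaces (regex)
    .split(" ")
    .toList
  // a sentence starts at index 0 and right after every token ending with a dot
  val splitAt: List[Int] = separatedBySpace.indices
    .filter(i => (i > 0 && endsWithDot(separatedBySpace(i - 1))) || i == 0)
    .toList
  groupBySentenceTokens(separatedBySpace, splitAt, List())
}

def groupBySentenceTokens(tokens: List[String], splitAt: List[Int], sentences: List[List[String]]): List[List[String]] = {
  if (splitAt.size <= 1) {
    if (splitAt.size == 1) {
      // last sentence: from the final start index to the end of the tokens
      sentences :+ tokens.slice(splitAt.head, tokens.size)
    } else {
      sentences
    }
  }
  // slice from this start index up to the next one, then recurse on the rest
  else groupBySentenceTokens(tokens, splitAt.tail, sentences :+ tokens.slice(splitAt.head, splitAt.tail.head))
}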
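Instead of recursing, splitAt can also be consumed by pairing each sentence-start index with the next start (or the end of the token list) and slicing between them. A sketch under the same naive sentence rule; endsWithDot is assumed to be a plain endsWith(".") check, since the question doesn't show it:

```scala
// Assumed helper: the question does not show endsWithDot, so this is a guess.
def endsWithDot(token: String): Boolean = token.endsWith(".")

def getSentences(article: String): List[String] = {
  // Vector gives fast indexed access for tokens(i - 1) below
  val tokens: Vector[String] =
    article.replace('\n', ' ').split(" +").toVector.filter(_.nonEmpty)
  // a sentence starts at index 0 and right after every token ending with a dot
  val starts: List[Int] =
    tokens.indices.filter(i => i == 0 || endsWithDot(tokens(i - 1))).toList
  // each sentence ends where the next one starts; the last one at tokens.size
  val ends: List[Int] = starts.drop(1) :+ tokens.size
  starts.zip(ends).map { case (startIdx, endIdx) =>
    tokens.slice(startIdx, endIdx).mkString(" ")
  }
}

println(getSentences("I like donuts. I like cats."))
// List(I like donuts., I like cats.)
```

Zipping starts with ends replaces the explicit recursion over splitAt.tail; an empty article yields an empty list because there are no start indices.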
val s: String = """I like donuts. I like cats
This is amazing"""
s.split("\\.|\n").map(_.trim).toList
//result: List[String] = List("I like donuts", "I like cats", "This is amazing")
To include the dots in the sentences, fold over the words and close a sentence whenever a word ends with a dot (shown here on the question's example):
val article = "I like donuts. I like cats."
val (pending, sentences) = article.replace("\n", " ").split(" ")
  .foldLeft((List.empty[String], List.empty[String])) {
    case ((current, done), word) =>
      if (word.endsWith(".")) (Nil, done :+ (current :+ word).mkString(" "))
      else (current :+ word, done)
  }
// words after the last dot (if any) form a final sentence
val result = if (pending.isEmpty) sentences else sentences :+ pending.mkString(" ")
//result = List("I like donuts.", "I like cats.")
I'm trying to parse a command line argument for an sbt InputTask using SBT Parsers (http://www.scala-sbt.org/0.13/docs/Parsing-Input.html) but I'm failing to write a parser to match the following pseudo-regex:
\s*(-n|--dry-run)\s*
Here's the most sensible way of expressing this that I can think of. The results here should be Some(true) if the input string matches.
import sbt.complete.Parser
import sbt.complete.DefaultParsers._
val dryRunOptions: Parser[String] = OptSpace ~> ("-n" | "--dry-run") <~ OptSpace
val dryRunParser: Parser[Boolean] = flag(dryRunOptions)
Parser(dryRunParser)("-n").result
Parser(dryRunParser)(" -n").result
Parser(dryRunParser)("-n ").result
Parser(dryRunParser)(" -n ").result
Parser(dryRunParser)("--dry-run").result
Parser(dryRunParser)(" --dry-run").result
Parser(dryRunParser)("--dry-run ").result
Parser(dryRunParser)(" --dry-run ").result
Unfortunately, this does not match any of these cases!
res0: Option[Boolean] = None
res1: Option[Boolean] = None
res2: Option[Boolean] = None
res3: Option[Boolean] = None
res4: Option[Boolean] = None
res5: Option[Boolean] = None
res6: Option[Boolean] = None
res7: Option[Boolean] = None
I can get this to match several of the cases with a couple of variations on this but never all of them. Any help appreciated!
You are checking the correctness of your parser the wrong way. In this case you should use .resultEmpty.isValid instead of .result, as in the tests here. With that change, your code turns out to work fine:
import sbt.complete.Parser
import sbt.complete.DefaultParsers._
val dryRunOptions: Parser[String] = OptSpace ~> ("-n" | "--dry-run") <~ OptSpace
val dryRunParser: Parser[Boolean] = flag(dryRunOptions)
val test = Seq("-n", " -n", "-n ", " -n ",
"--dry-run", " --dry-run", "--dry-run ", " --dry-run ")
test.foldLeft(true)((b:Boolean, input:String) =>
b && Parser(dryRunParser)(input).resultEmpty.isValid)
And the result:
res0: Boolean = true
I have a requirement to concatenate two potentially empty address lines into one (with a space in between the two lines), but I need it to return a None if both address lines are None (this field is going into an Option[String] variable). The following command gets me what I want in terms of the concatenation:
Seq(myobj.address1, myobj.address2).flatten.mkString(" ")
But that gives me an empty string instead of a None in case address1 and address2 are both None.
This converts a single string to an Option, giving None if the string is null or empty after trimming:
(kudos to @Miroslav Machura for this simpler version)
Option(x).filter(_.trim.nonEmpty)
Alternative version, using collect:
Option(x).collect { case x if x.trim.nonEmpty => x }
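Applied to the original two address lines, the same filter idea can be combined with the question's flatten and mkString. A sketch; joinAddress is a hypothetical name, and treating blank lines like missing ones is an assumption beyond the question:

```scala
// Sketch: join two optional address lines, returning None when nothing is left.
def joinAddress(address1: Option[String], address2: Option[String]): Option[String] = {
  // drop None as well as blank lines, so no stray separator is produced
  val parts = Seq(address1, address2).flatten.filter(_.trim.nonEmpty)
  // mkString on an empty Seq gives "", which the filter turns into None
  Some(parts.mkString(" ")).filter(_.nonEmpty)
}

println(joinAddress(Some("1 Main St"), Some("Apt 2"))) // Some(1 Main St Apt 2)
println(joinAddress(None, None))                       // None
```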
Assuming:
val list1 = List(Some("aaaa"), Some("bbbb"))
val list2 = List(None, None)
Using plain Scala:
scala> Option(list1).map(_.flatten).filter(_.nonEmpty).map(_.mkString(" "))
res38: Option[String] = Some(aaaa bbbb)
scala> Option(list2).map(_.flatten).filter(_.nonEmpty).map(_.mkString(" "))
res39: Option[String] = None
Or using scalaz:
import scalaz._; import Scalaz._
scala> list1.flatten.toNel.map(_.toList.mkString(" "))
res35: Option[String] = Some(aaaa bbbb)
scala> list2.flatten.toNel.map(_.toList.mkString(" "))
res36: Option[String] = None
Well, in Scala there is the Option[T] type, which is intended to eliminate various run-time problems caused by nulls.
So here is how you use Options: an Option[T] holds one of two kinds of values, Some[T] or None.
// A nice String
val niceStr = "I am a nice String"
// A nice String option
val niceStrOption: Option[String] = Some(niceStr)
// A None option
val noneStrOption: Option[String] = None
Now, coming to your part of the problem:
// If myobj.address1 and myobj.address2 were plain Strings, you would not need flatten:
//   val yourString = Seq(myobj.address1, myobj.address2).mkString(" ")
// But both are Option[String], so you flatten the Seq[Option[String]] into a Seq[String]:
val yourString = Seq(myobj.address1, myobj.address2).flatten.mkString(" ")

// So what really happens when you flatten a Seq[Option[String]]?
// Let's say we have a Seq[Option[String]] like this:
val seqOfStringOptions = Seq(Some("dsf"), None, Some("sdf"))
println(seqOfStringOptions)
// List(Some(dsf), None, Some(sdf))
// Now let's flatten it:
val flatSeqOfStrings = seqOfStringOptions.flatten
println(flatSeqOfStrings)
// List(dsf, sdf)
// Every None is dropped and every Some(s) contributes its String s.
// So if both address1 and address2 are None, the flattened list is empty.

// What happens when we build a String from an empty list of Strings?
val emptyStringList: List[String] = List()
val stringFromEmptyList = emptyStringList.mkString(" ")
println(stringFromEmptyList)
// "" (an empty String)

// So yourString is always a String, though it may be empty ("").
// Since it is always a String, pattern matching gives us the Option[String]:
val yourRequiredOption: Option[String] = yourString match {
  // if yourString is "", give None
  case "" => None
  // otherwise give Some(yourString)
  case someStringVal => Some(someStringVal)
}
You might also use the reduce method here:
val mySequenceOfOptions = Seq(myAddress1, myAddress2, ...)
mySequenceOfOptions.reduce[Option[String]] {
case(Some(soFar), Some(next)) => Some(soFar + " " + next)
case(None, next) => next
case(soFar, None) => soFar
}
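One caveat: reduce throws an exception when the Seq is empty. A sketch of the same joining idea using flatten plus reduceOption, which returns None for an empty Seq as well as for a Seq of only Nones; joinAll is a hypothetical name:

```scala
// Drop the Nones, then join what is left; reduceOption yields None when nothing is left.
def joinAll(lines: Seq[Option[String]]): Option[String] =
  lines.flatten.reduceOption(_ + " " + _)

println(joinAll(Seq(Some("1 Main St"), None, Some("Apt 2")))) // Some(1 Main St Apt 2)
println(joinAll(Seq(None, None)))                              // None
```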
Here's a function that should solve the original problem.
def mergeAddresses(addr1: Option[String],
                   addr2: Option[String]): Option[String] = {
  val str = s"${addr1.getOrElse("")} ${addr2.getOrElse("")}"
  // trim so that a missing line does not leave a stray leading/trailing space
  if (str.trim.isEmpty) None else Some(str.trim)
}
The answer from @dk14 is actually incomplete: if list2 contains a Some(""), it will not yield None, because after flattening the list is List(""), which is non-empty, so filter(_.nonEmpty) keeps it:
val list2 = List(None, None, Some(""))
// this yields Some()
println(Option(list2).map(_.flatten).filter(_.nonEmpty).map(_.mkString(" ")))
But it's close. You just need to make sure empty strings are converted to None, so we combine it with @juanmirocks's answer:
val list1 = List(Some("aaaa"), Some("bbbb"))
val list2 = List(None, None, Some(""))
// yields Some(aaaa bbbb)
println(Option(list1.map(_.collect { case x if x.trim.nonEmpty => x }))
.map(_.flatten).filter(_.nonEmpty).map(_.mkString(" ")))
// yields None
println(Option(list2.map(_.collect { case x if x.trim.nonEmpty => x }))
.map(_.flatten).filter(_.nonEmpty).map(_.mkString(" ")))
I was looking for a helper function like the one below in the standard library but haven't found one yet, so in the meantime I defined:
def string_to_Option(x: String): Option[String] = {
if (x.nonEmpty)
Some(x)
else
None
}
With the help of the above, you can then write:
import scala.util.chaining.scalaUtilChainingOps
object TEST123 {
def main(args: Array[String]): Unit = {
val address1 = ""
val address2 = ""
val result =
Seq(
address1 pipe string_to_Option,
address2 pipe string_to_Option
).flatten.mkString(" ") pipe string_to_Option
println(s"The result is «${result}»")
// prints out: The result is «None»
}
}
With Scala 2.13:
Option.unless(address.isEmpty)(address)
For example:
val address = "foo"
Option.unless(address.isEmpty)(address) // Some("foo")
val address = ""
Option.unless(address.isEmpty)(address) // None
implicit class EmptyToNone(s: String):
  def toOption: Option[String] = if (s.isEmpty) None else Some(s)
Example:
scala> "".toOption
val res0: Option[String] = None
scala> "foo".toOption
val res1: Option[String] = Some(foo)
(tested with Scala 3.2.2)
For example, I want to encrypt each token of a sentence and reduce them to a final encrypted text:
def convert(str: String) = {
str + ":"
}
val tokens = "Hi this is a text".split("\\ ").toList
val reduce = tokens.reduce((a, b) => convert(a) + convert(b))
println(reduce)
// result is `Hi:this::is::a::text:`
val fold = tokens.fold("") {
case (a, b) => convert(a) + convert(b)
}
println(fold)
// result is `:Hi::this::is::a::text:`
val scan = tokens.scan("") {
case (a, b) => convert(a) + convert(b)
}
println(scan)
// result is List(, :Hi:, :Hi::this:, :Hi::this::is:, :Hi::this::is::a:, :Hi::this::is::a::text:)
Assume that convert is an encryption function, so each token should be encrypted exactly once, not twice. But fold, reduce and scan re-encrypt the already encrypted accumulator. The result I want is Hi:this:is:a:text:
Well, if you want to encrypt each token individually, map should work:
val tokens = "Hi this is a text".split("\\ ").toList
val encrypted = tokens.map(convert).mkString
println(encrypted) //prints Hi:this:is:a:text:
def convert(str: String) = {
str + ":"
}
Edit: If you want to use a fold:
val encrypted = tokens.foldLeft("")((result, token) => result + convert(token))
A one-liner specialised to this exact example:
"Hi this is a text" split " " mkString("",":",":")
Or
val tokens = "Hi this is a text" split " "
val sep = ":"
val encrypted = tokens mkString("",sep,sep)
Note that fold and reduce combine two operands at every step. However, you want to encrypt each token on its own, which is a unary operation. Therefore you should first do a map and then either a fold or a reduce:
tokens map(convert)
Reduce / Fold:
scala> tokens.map(convert).fold("")(_ + _)
res10: String = Hi:this:is:a:text:
scala> tokens.map(convert).reduce(_ + _)
res11: String = Hi:this:is:a:text:
In fact, you can simply use mkString, which makes it even more concise:
scala> tokens.map(convert).mkString
res12: String = Hi:this:is:a:text:
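You can also do the conversion in parallel using par (on Scala 2.13+, par needs the separate scala-parallel-collections module and import scala.collection.parallel.CollectionConverters._):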
scala> tokens.par.map(convert).mkString
res13: String = Hi:this:is:a:text:
scala> tokens.par.map(convert).reduce(_ + _)
res14: String = Hi:this:is:a:text:
I think your main problem is how reduce and fold work; the other answers explain that.
As for your question, fold can help:
"Hi this is a text".split("\\ ").fold("") { (a, b) => a + convert(b) }
Here is a version with the code cleaned up and unnecessary conversions removed:
def convert(str: String) = str + ":"
val tokens = "Hi this is a text" split " "
val encrypted = (tokens map convert) mkString ""
mkString can be seen as a specialized version of reduce (or fold) for Strings.
If, for some reason, you don't want to use mkString, the code would look like this:
def convert(str: String) = str + ":"
val tokens = "Hi this is a text" split " "
val encrypted = (tokens map convert) reduce (_ + _)
Or, shortened, with foldLeft:
val encrypted = "Hi this is a text".split(" ").foldLeft ("") { case (accum, str) => accum + convert(str) }
Does Scala have an API to do a "chomp" on a String?
Preferably, I would like to convert the string "abcd \n" to "abcd".
Thanks
Ajay
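There's java.lang.String.trim(), but that also removes leading whitespace. There's also stripLineEnd (on RichString in old Scala versions, StringOps today), but that only removes a single trailing line ending such as \n or \r\n, so the space in "abcd \n" would remain.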
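To see the difference, here is a small check of both methods on input like the question's (the leading spaces are added only to show what trim does):

```scala
val input = "  abcd \n"

// trim removes whitespace at both ends, including the leading spaces
val trimmed = input.trim
// stripLineEnd removes only the trailing line ending; the space before it stays
val stripped = input.stripLineEnd

println("[" + trimmed + "]")  // [abcd]
println("[" + stripped + "]") // [  abcd ]
```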
If you don't want to use Apache Commons Lang, you can roll your own, along these lines.
scala> def chomp(text: String) = text.reverse.dropWhile(" \n\r".contains(_)).reverse
chomp: (text: String)String
scala> "[" + chomp(" a b cd\r \n") + "]"
res28: java.lang.String = [ a b cd]
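A regex-based variant of the hand-rolled chomp above strips every trailing whitespace character matched by Java's \s class (space, tab, newline, vertical tab, form feed, carriage return), not just the three characters listed in that code. A sketch; chompRegex is a hypothetical name. On Java 11+, String.stripTrailing() is a built-in alternative.

```scala
// Remove the run of whitespace anchored at the end of the string, if any.
def chompRegex(text: String): String = text.replaceAll("\\s+$", "")

println("[" + chompRegex(" a b cd\r \n") + "]") // [ a b cd]
println("[" + chompRegex("abcd \n") + "]")      // [abcd]
```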
There is in fact out-of-the-box support for chomp [1]:
scala> val input = "abcd\n"
input: java.lang.String =
abcd
scala> "[%s]".format(input)
res2: String =
[abcd
]
scala> val chomped = input.stripLineEnd
chomped: String = abcd
scala> "[%s]".format(chomped)
res3: String = [abcd]
[1] For some definition of chomp; this is really the same answer as sepp2k's, but showing how to use it on a String.
Why not use Apache Commons Lang and its StringUtils.chomp() function? One of the great things about Scala is that you can leverage existing Java libraries.