Exception Index Out Of Bounds when using deleteCharAt with StringBuilder - scala

I want to delete duplicate characters from my string using the deleteCharAt method, but it's throwing an exception:
def removeDuplicate(str: String): String = {
  var sb = new StringBuilder(str);
  for (i <- 0 until str.length) {
    for (z <- i + 1 until str.length) {
      if (str(i) == str(z)) {
        sb.deleteCharAt(i);
      }
    }
  }
  return sb.toString;
}

The exception comes from mixing two strings: the loops compute positions in str, but deleteCharAt removes characters from sb. Every deletion makes sb one character shorter while i keeps running up to str.length - 1, so after a few deletions sb.deleteCharAt(i) points past the end of sb. Even before that happens, the shifted positions mean that some calls delete the wrong character. This is why removing characters from a string while looping through it is usually not a good idea: the length changes every time you delete a character, and you have to keep track of that. An alternative and more intuitive way is to build a new string and add only characters that have not already appeared in it:
def removeDuplicate(str: String): String = {
  var sb = ""
  for (i <- 0 until str.length) {
    if (!(sb contains str(i))) {
      sb += str(i)
    }
  }
  sb
}
scala> removeDuplicate("abbccssds")
res13: String = abcsd
scala> removeDuplicate("abbeedsff")
res14: String = abedsf
scala> removeDuplicate("abbeedsffgg")
res15: String = abedsfg
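For comparison, here is a sketch that keeps the question's StringBuilder/deleteCharAt idea but always reads and deletes through sb itself, removing the later copy of a duplicate so that earlier positions never shift. The object and method names are made up for illustration; the one-liner uses the standard distinct method, which keeps the first occurrence of each character.

```scala
object RemoveDuplicateExamples {
  // Same idea as the question, but every index refers to sb (never to the
  // original str), so indices stay valid while sb shrinks.
  def withDeleteCharAt(str: String): String = {
    val sb = new StringBuilder(str)
    var i = 0
    while (i < sb.length) {
      var z = i + 1
      while (z < sb.length) {
        // Delete the later copy; do not advance z, because the next
        // character has just shifted into position z.
        if (sb(i) == sb(z)) sb.deleteCharAt(z)
        else z += 1
      }
      i += 1
    }
    sb.toString
  }

  // Standard-library version: keeps the first occurrence of each character.
  def withDistinct(str: String): String = str.distinct
}
```

Both produce the same results as the REPL session above, e.g. "abbccssds" becomes "abcsd".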

Related

Iterate and trim string based on condition in spark Scala

I have a dataframe 'regexDf' like the one below:
id,regex
1,(.*)text1(.*)text2(.*)text3(.*)text4(.*)|(.*)text2(.*)text5(.*)text6(.*)
2,(.*)text1(.*)text5(.*)text6(.*)|(.*)text2(.*)
If the length of the regex exceeds some maximum length, for example 50, then I want to remove the last text token from each of the regex strings separated by '|' for that id. In the above dataframe, the regex for id 1 is longer than 50, so the last tokens 'text4(.*)' and 'text6(.*)' should be removed from each split regex string. Even after removing them, the regex string for id 1 is still longer than 50, so the next last tokens 'text3(.*)' and 'text5(.*)' should be removed as well. So the final dataframe will be:
id,regex
1,(.*)text1(.*)text2(.*)|(.*)text2(.*)
2,(.*)text1(.*)text5(.*)text6(.*)|(.*)text2(.*)
I am able to trim the last tokens using the following code
// split takes a regex, so the "|" separator has to be escaped
val reducedStr = regex.split("\\|").foldLeft(List[String]()) {
  (regexStr, eachRegex) => {
    regexStr :+ eachRegex.replaceAll("\\(\\.\\*\\)\\w+\\(\\.\\*\\)$", "\\(\\.\\*\\)")
  }
}.mkString("|")
I tried using a while loop to check the length and trim the text tokens on each iteration, but it is not working. I also want to avoid using var and while loops. Is it possible to achieve this without a while loop?
val optimizeRegexString = udf((regex: String) => {
  if (regex.length >= 50) {
    var len = regex.length;
    var resultStr: String = ""
    while (len >= maxLength) {
      // Note: this always re-splits the original `regex`, not the previous
      // `resultStr`, so `len` stops changing after the first pass and the
      // loop never ends while the result is still too long.
      val reducedStr = regex.split("\\|").foldLeft(List[String]()) {
        (regexStr, eachRegex) => {
          regexStr :+ eachRegex
            .replaceAll("\\(\\.\\*\\)\\w+\\(\\.\\*\\)$", "\\(\\.\\*\\)")
        }
      }.mkString("|")
      len = reducedStr.length
      resultStr = reducedStr
    }
    resultStr
  } else {
    regex
  }
})
regexDf.withColumn("optimizedRegex", optimizeRegexString(col("regex")))
Following SathiyanS's and Pasha's suggestions, I changed the recursive method into a function:
def optimizeRegex(regexDf: DataFrame): DataFrame = {
  val shrinkString = (s: String) => {
    if (s.length > 50) {
      val extractedString: String = shrinkString(s.split("\\|")
        .map(s => s.substring(0, s.lastIndexOf("text"))).mkString("|"))
      extractedString
    }
    else s
  }
  def shrinkUdf = udf((regex: String) => shrinkString(regex))
  regexDf.withColumn("regexString", shrinkUdf(col("regex")))
}
Now I am getting the compile error "recursive value shrinkString needs type":
Error:(145, 39) recursive value shrinkString needs type
val extractedString: String = shrinkString(s.split("\\|")
.map(s => s.substring(0, s.lastIndexOf("text"))).mkString("|"));
Recursion:
def shrink(s: String): String = {
  if (s.length > 50)
    shrink(s.split("\\|").map(s => s.substring(0, s.lastIndexOf("text"))).mkString("|"))
  else s
}
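The issue seems to be with how the function is defined and called: a recursive val needs an explicit type, whereas a recursive method (def) with a declared result type, like the one above, compiles fine. Some additional info:
It can also be placed in an object and called like a static function: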
object ShrinkContainer {
  def shrink(s: String): String = {
    if (s.length > 50)
      shrink(s.split("\\|").map(s => s.substring(0, s.lastIndexOf("text"))).mkString("|"))
    else s
  }
}
Using it with the dataframe:
def shrinkUdf = udf((regex: String) => ShrinkContainer.shrink(regex))
df.withColumn("regex", shrinkUdf(col("regex"))).show(truncate = false)
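Drawbacks: this is only a basic example of the approach. Some edge cases have to be handled by the author of the question, for example a part that does not contain "text" (lastIndexOf then returns -1 and substring throws an exception), or a large number of parts separated by "|" (e.g. 100), where the parts run out of "text" tokens before the whole string is short enough, which ends in the same exception.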
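One way to cover those edge cases is sketched below, under the assumption that a part without "text" should simply be left alone: the recursion stops as soon as a step no longer changes the string, so it returns the shortest string it could reach instead of throwing. SafeShrink and dropLastToken are hypothetical names, and maxLen defaults to the 50 used in the question.

```scala
import scala.annotation.tailrec

object SafeShrink {
  // Drops everything from the last "text" onward in one |-separated part.
  // A part without "text" is returned unchanged instead of throwing.
  private def dropLastToken(part: String): String = {
    val idx = part.lastIndexOf("text")
    if (idx < 0) part else part.substring(0, idx)
  }

  @tailrec
  def shrink(s: String, maxLen: Int = 50): String =
    if (s.length <= maxLen) s
    else {
      val next = s.split("\\|").map(dropLastToken).mkString("|")
      // Nothing left to remove: stop instead of recursing or throwing.
      if (next == s) s
      else shrink(next, maxLen)
    }
}
```

The @tailrec annotation makes the compiler check that the recursion is a loop in disguise, so long inputs cannot overflow the stack.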
This is how I would do it.
First, a function for removing the last token from a regex:
def deleteLastToken(s: String): String =
  s.replaceFirst("""[^)]+\(\.\*\)$""", "")
Then, a function that shortens the entire regex string by deleting the last token from all the |-separated fields:
def shorten(r: String) = {
  val items = r.split("[|]").toSeq
  val shortenedItems = items.map(deleteLastToken)
  shortenedItems.mkString("|")
}
Then, for a given input regex string, create the stream of all the shortened strings you get by applying the shorten function repeatedly. This is an infinite stream, but it is lazily evaluated, so only as many elements as required are actually computed:
val regex = "(.*)text1(.*)text2(.*)text3(.*)text4(.*)|(.*)text2(.*)text5(.*)text6(.*)"
val allShortened = Stream.iterate(regex)(shorten)
Finally, you can treat allShortened as any other sequence. For solving our problem, you can drop all elements while they don't satisfy the length requirement, and then keep only the first one of the remaining ones:
val result = allShortened.dropWhile(_.length > 50).head
You can see all the intermediate values by printing some elements of allShortened:
allShortened.take(10).foreach(println)
// Prints:
// (.*)text1(.*)text2(.*)text3(.*)text4(.*)|(.*)text2(.*)text5(.*)text6(.*)
// (.*)text1(.*)text2(.*)text3(.*)|(.*)text2(.*)text5(.*)
// (.*)text1(.*)text2(.*)|(.*)text2(.*)
// (.*)text1(.*)|(.*)
// (.*)|(.*)
// (.*)|(.*)
// (.*)|(.*)
// (.*)|(.*)
// (.*)|(.*)
// (.*)|(.*)
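As the output shows, the sequence eventually repeats "(.*)|(.*)" forever, so dropWhile(...).head would never return for a limit below its length (9). A hedged sketch that detects that fixed point and returns None instead, using Iterator.iterate rather than Stream (Stream is deprecated since Scala 2.13 in favour of LazyList). deleteLastToken and shorten are copied from above; ShortenUntil and shortenTo are made-up names:

```scala
object ShortenUntil {
  def deleteLastToken(s: String): String =
    s.replaceFirst("""[^)]+\(\.\*\)$""", "")

  def shorten(r: String): String =
    r.split("[|]").map(deleteLastToken).mkString("|")

  // Some(first candidate that fits), or None when shortening reaches a
  // fixed point that is still longer than maxLen.
  def shortenTo(regex: String, maxLen: Int): Option[String] = {
    val steps: Iterator[String] = Iterator.iterate(regex)(shorten)
    // Look at each element together with its successor.
    steps.sliding(2).collectFirst {
      case Seq(cur, _) if cur.length <= maxLen => Some(cur)
      case Seq(cur, next) if cur == next       => None // no further progress
    }.flatten
  }
}
```

Checking the length case first means a fixed point that already fits is still returned as a result.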
Just to add to @pasha701's answer, here is a solution that works in Spark.
val df = sc.parallelize(Seq((1,"(.*)text1(.*)text2(.*)text3(.*)text4(.*)|(.*)text2(.*)text5(.*)text6(.*)"),(2,"(.*)text1(.*)text5(.*)text6(.*)|(.*)text2(.*)"))).toDF("ID", "regex")
df.show()
//prints
+---+------------------------------------------------------------------------+
|ID |regex |
+---+------------------------------------------------------------------------+
|1 |(.*)text1(.*)text2(.*)text3(.*)text4(.*)|(.*)text2(.*)text5(.*)text6(.*)|
|2 |(.*)text1(.*)text5(.*)text6(.*)|(.*)text2(.*) |
+---+------------------------------------------------------------------------+
Now you can use @pasha701's shrink function via a UDF:
val shrink: String => String = (s: String) => if (s.length > 50) shrink(s.split("\\|").map(s => s.substring(0,s.lastIndexOf("text"))).mkString("|")) else s
def shrinkUdf = udf((regex: String) => shrink(regex))
df.withColumn("regex", shrinkUdf(col("regex"))).show(truncate = false)
//prints
+---+---------------------------------------------+
|ID |regex |
+---+---------------------------------------------+
|1 |(.*)text1(.*)text2(.*)|(.*)text2(.*) |
|2 |(.*)text1(.*)text5(.*)text6(.*)|(.*)text2(.*)|
+---+---------------------------------------------+

Scala - Count number of adjacent repeated chars in String

I have this function that counts the number of adjacent repeated chars inside a String.
def adjacentCount(s: String): Int = {
  var cont = 0
  for (a <- s.sliding(2)) {
    if (a(0) == a(1)) cont = cont + 1
  }
  cont
}
But I'm supposed to create a function that does exactly the same, using only immutable variables and no loop instructions, in a "purely" functional way.
You can just use the count method on the Iterator:
val s = "aabcddd"
s.sliding(2).count(p => p(0) == p(1))
// res1: Int = 3
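One caveat: for a one-character string, sliding(2) yields a single one-character window, so p(1) throws (the original loop has the same problem). A sketch of an alternative that pairs each character with its successor via zip, so a window is never short; AdjacentCount is just an illustrative name:

```scala
object AdjacentCount {
  // s.drop(1) is s shifted by one, so zip pairs every char with the next one.
  // For "" or a single char there are no pairs, and the count is 0.
  def adjacentCount(s: String): Int =
    s.zip(s.drop(1)).count { case (a, b) => a == b }
}
```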

scala regex multiple integers

I have the following string that I would like to match on: 1-10 employees.
Here is my regex statement: val regex = ("\\d+").r
The problem I have is that I'm trying to find a way to extract the matched data and determine which returned value is bigger.
Here is what I'm doing to process it:
def setMinAndMaxValue(currentCompany: CurrentCompany, matchIterator: Iterator[Regex.Match]): CurrentCompany = {
  var max = 0
  println(s"matchIterator - $matchIterator")
  matchIterator.collect {
    case regex(s: String) => println("found string")
    case regex(IntConv(x)) =>
      println("regex case")
      if (x > max) max = x
  }
  val (minVal, maxVal) = rangesForMaxValue(max)
  val newDetails = currentCompany.details.copy(minSize = Some(minVal), maxSize = Some(maxVal))
  currentCompany.copy(details = newDetails)
}
object IntConv {
  def unapply(s: String): Option[Int] = Try {
    Some(s.toInt)
  }.toOption.flatten
}
I thought I was confused by your original question, then you clarified it with code and now I have no idea what you're trying to do.
To extract numbers from a string, try this.
val re = """(\d+)""".r
val nums = re.findAllIn(string_with_numbers).map(_.toInt).toList
Then you can just call nums.min and nums.max, and do whatever other number processing you need.
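Applied to the question's input, a minimal sketch could look like this. Note also that Iterator.collect in the question is lazy and its result is discarded, so its cases never run. EmployeeRange and minMax are hypothetical names, and returning None when the text contains no number is an assumption:

```scala
object EmployeeRange {
  private val number = """\d+""".r

  // (smallest, largest) of all integers in the text, or None if there are none.
  def minMax(text: String): Option[(Int, Int)] = {
    val nums = number.findAllIn(text).map(_.toInt).toList
    if (nums.isEmpty) None else Some((nums.min, nums.max))
  }
}
```

For "1-10 employees" the "-" is not part of \d+, so the matches are "1" and "10".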

Indentation preserving string interpolation in scala

I was wondering if there is any way of preserving indentation while doing string interpolation in scala. Essentially, I was wondering if I could interpose my own StringContext. Macros would address this problem, but I'd like to wait until they are official.
This is what I want:
val x = "line1 \nline2"
val str = s"> ${x}"
str should evaluate to
> line1
  line2
Answering my own question by converting Daniel Sobral's very helpful answer to code. Hopefully it will be of use to someone else with the same issue. I have not used implicit classes since I am still on a pre-2.10 version.
Usage:
import Indenter._ and use string interpolation like so: e" $foo "
Example
import Indenter._
object Ex {
  def main(args: Array[String]): Unit = {
    val name = "Foo"
    val fields = "x: Int\ny: String\nz: Double"
    // fields has several lines. All of them will be indented by the same amount.
    print(e"""
class $name {
  ${fields}
}
""")
  }
}
should print
class Foo {
  x: Int
  y: String
  z: Double
}
Here's the custom indenting context.
class IndentStringContext(sc: StringContext) {
  def e(args: Any*): String = {
    val sb = new StringBuilder()
    for ((s, a) <- sc.parts zip args) {
      sb append s
      val ind = getindent(s)
      if (ind.size > 0) {
        sb append a.toString().replaceAll("\n", "\n" + ind)
      } else {
        sb append a.toString()
      }
    }
    if (sc.parts.size > args.size)
      sb append sc.parts.last
    sb.toString()
  }
  // get white indent after the last new line, if any
  def getindent(str: String): String = {
    val lastnl = str.lastIndexOf("\n")
    if (lastnl == -1) ""
    else {
      val ind = str.substring(lastnl + 1)
      if (ind.trim.isEmpty) ind // ind is all whitespace. Use this
      else ""
    }
  }
}
object Indenter {
  // top level implicit defs allowed only in 2.10 and above
  implicit def toISC(sc: StringContext) = new IndentStringContext(sc)
}
You can write your own interpolators, and you can shadow the standard interpolators with your own. However, I have no idea what the semantics behind your example are, so I'm not even going to try.
Check out my presentation on Scala 2.10 on either Slideshare or SpeakerDeck, as they contain examples on all the manners in which you can write/override interpolators. Starts on slide 40 (for now -- the presentation might be updated until 2.10 is finally out).
For anybody seeking a post-2.10 answer:
object Interpolators {
  implicit class Regex(sc: StringContext) {
    def r = new util.matching.Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
  }
  implicit class IndentHelper(val sc: StringContext) extends AnyVal {
    import sc._
    def process = StringContext.treatEscapes _
    def ind(args: Any*): String = {
      checkLengths(args)
      parts.zipAll(args, "", "").foldLeft("") {
        case (a, (part, arg)) =>
          val processed = process(part)
          val prefix = processed.split("\n").last match {
            case r"""([\s|]+)$d.*""" => d
            case _ => ""
          }
          val argLn = arg.toString
            .split("\n")
          val len = argLn.length
          // Todo: Fix newline bugs
          val indented = argLn.zipWithIndex.map {
            case (s, i) =>
              val res = if (i < 1) { s } else { prefix + s }
              if (i == len - 1) { res } else { res + "\n" }
          }.mkString
          a + processed + indented
      }
    }
  }
}
Here's a short solution. Full code and tests are on Scastie. There are two versions there: a plain indented interpolator, and a slightly more complex indentedWithStripMargin interpolator, which makes the code a bit more readable:
assert(indentedWithStripMargin"""abc
                               |123456${"foo\nbar"}-${"Line1\nLine2"}""" == s"""|abc
                                                                                 |123456foo
                                                                                 |      bar-Line1
                                                                                 |          Line2""".stripMargin)
Here is the core function:
def indentedHelper(parts: List[String], args: List[String]): String = {
  // In string interpolation, there is always one more string than argument
  assert(parts.size == 1+args.size)
  (parts, args) match {
    // The simple case is where there is one part (and therefore zero args). In that case,
    // we just return the string as-is:
    case (part0 :: Nil, Nil) => part0
    // If there is more than one part, we can simply take the first two parts and the first arg,
    // merge them together into one part, and then use recursion. In other words, we rewrite
    //     indented"A ${10/10} B ${2} C ${3} D ${4} E"
    // as
    //     indented"A 1 B ${2} C ${3} D ${4} E"
    // and then we can rely on recursion to rewrite that further as:
    //     indented"A 1 B 2 C ${3} D ${4} E"
    // then:
    //     indented"A 1 B 2 C 3 D ${4} E"
    // then:
    //     indented"A 1 B 2 C 3 D 4 E"
    case (part0 :: part1 :: tailparts, arg0 :: tailargs) => {
      // If 'arg0' has newlines in it, we will need to insert spaces. To decide how many spaces,
      // we count how many characters come after the last newline in 'part0'. If there is no
      // newline, then we just take the length of 'part0':
      val i = part0.reverse.indexOf('\n')
      val n = if (i == -1)
        part0.size // if no newlines in part0, we just take its length
      else
        i // the number of characters after the last newline
      // After every newline in arg0, we must insert 'n' spaces:
      val arg0WithPadding = arg0.replaceAll("\n", "\n" + " "*n)
      val mergeTwoPartsAndOneArg = part0 + arg0WithPadding + part1
      // recurse:
      indentedHelper(mergeTwoPartsAndOneArg :: tailparts, tailargs)
    }
    // The two cases above are exhaustive, but the compiler thinks otherwise, hence we need
    // to add this dummy.
    case _ => ???
  }
}
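To use indentedHelper as an actual interpolator, it has to be exposed through an implicit class on StringContext. A minimal sketch, assuming Scala 2.10 or later; IndentedInterpolator, IndentedContext and indented are illustrative names, not the code behind the Scastie link, and the helper is restated with @tailrec and an explicit error instead of ???. Unlike s"...", it does not process escape sequences in the literal parts, because sc.parts are the raw strings:

```scala
import scala.annotation.tailrec

object IndentedInterpolator {
  // Same merge-and-recurse idea as indentedHelper above.
  @tailrec
  def indentedHelper(parts: List[String], args: List[String]): String =
    (parts, args) match {
      case (part0 :: Nil, Nil) => part0
      case (part0 :: part1 :: tailParts, arg0 :: tailArgs) =>
        // Characters after the last newline in part0 (or all of part0).
        val i = part0.reverse.indexOf('\n')
        val n = if (i == -1) part0.length else i
        val padded = arg0.replace("\n", "\n" + " " * n)
        indentedHelper((part0 + padded + part1) :: tailParts, tailArgs)
      case _ =>
        throw new IllegalArgumentException("expected exactly one more part than args")
    }

  // Makes indented"..." available after `import IndentedInterpolator._`.
  implicit class IndentedContext(val sc: StringContext) {
    def indented(args: Any*): String =
      indentedHelper(sc.parts.toList, args.map(_.toString).toList)
  }
}
```

Usage: after import IndentedInterpolator._, indented"ab${"foo\nbar"}" aligns "bar" under "foo".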

Does using an implicit type cast change the type of the variable?

I am getting an error from a piece of code. I will only show one line of code, the one that, judging from the error report, I believe is causing it:
b = temp(temp.length-1).toInt; //temp is an ArrayBuffer[String]
the error is:
For input string: "z"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:449)
at java.lang.Integer.parseInt(Integer.java:499)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:231)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
at Driver$.stringParse$1(Driver.scala:59)
at Driver$.main(Driver.scala:86)
at Driver.main(Driver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at scala.tools.nsc.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:78)
at scala.tools.nsc.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:24)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:88)
at scala.tools.nsc.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:78)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:101)
at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:33)
at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:40)
at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:56)
at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:80)
at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:89)
at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
From what I can tell, the problem is related to this: since the string is immutable, I know it cannot be changed, but I am not sure. I am basing this on the line
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
Once I do something like my line of code above, does it change the whole object? temp is an ArrayBuffer[String], so I am trying to access a string representation of a number and convert it. But in doing so, does this change what it is and keep me from doing anything else with it?
If you believe putting all my code will be helpful, let me know to edit it, but it is a lot and I don't want to annoy anybody. I appreciate anybody who can help me understand this!
*EDIT: MY CODE (only here to help me figure out my error, but not necessary to look at; I just can't see where it's giving me this error).
The point of my code is to parse either one of the strings at the top. It puts the "and" together into one string and then reads the two symbols that go with it. It parses str just fine, but it finds a problem when it reads "z" in str2 and "y" in str3. As one can see, the problem is with the second string after the "and" when recursing. It's also important to note that the string has to be in that form, so it can only be parsed like "(and x (and y z))", not in any other way that would be more convenient.
val str = "(and x y)";
val str2 = "(and x (and y z))"; //case with expression on the right side
val str3 = "(and (and x y) z)"; //case with expression on the left side
var i = 0; //just counter used to loop through the finished parsed array to make a list
//var position = 0; //this is used for when passing it in the parser to start off at zero
var hold = new ArrayBuffer[String]();//finished array should be here
def stringParse ( exp: String, expreshHolder: ArrayBuffer[String] ): ArrayBuffer[String] = { //takes two arguments, string, arraybuffer
  var b = 0; //position of where in the expression String I am currently in
  var temp = expreshHolder; //holder of expressions without parens
  var arrayCounter = 0;
  if(temp.length == 0)
    b = 0;
  else {
    b = temp(temp.length-1).toInt;
    temp.remove(temp.length-1);
    arrayCounter = temp.length;
  } //this sets the position of wherever the string was read last plus removes that check from the end of the ArrayBuffer
  //just counts to make sure an empty spot in the array is there to put in the strings
  if(exp(b) == '(') {
    b = b + 1;
    while(exp(b) == ' '){b = b + 1;} //point of this is to just skip any spaces between paren and start of expression type
    if(exp(b) == 'a') {
      //first create the 'and', 'or', 'not' expression types to figure out
      temp += exp(b).toString;
      b = b+1;
      temp(arrayCounter) = temp(arrayCounter) + exp(b).toString; //concatenates the second letter
      b = b+1;
      temp(arrayCounter) = temp(arrayCounter) + exp(b).toString; //concatenates the last letter for the expression type
      //arrayCounter+=1;
      //this part now takes the symbols and puts them in an array
      b+=1;
      while(exp(b) == ' ') {b+=1;} //just skips any spaces until it reaches the FIRST symbol
      if(exp(b) == '(') {
        temp += b.toString;
        temp = stringParse(exp, temp);
        b = temp(temp.length-1).toInt;
        temp.remove(temp.length-1);
        arrayCounter = temp.length-1
      } else {
        temp += exp(b).toString;
        arrayCounter+=1; b+=1;
      }
      while(exp(b) == ' ') {b+=1;} //just skips any spaces until it reaches the SECOND symbol
      if(exp(b) == '(') {
        temp += b.toString;
        temp = stringParse(exp, temp);
        b = temp(temp.length-1).toInt;
        temp.remove(temp.length-1);
        arrayCounter = temp.length-1
      } else {
        temp += exp(b).toString;
        arrayCounter+=1;
        b+=1;
      }
      temp;
    } else { var fail = new ArrayBuffer[String]; fail +="failed"; fail;}
  }
}
hold = stringParse(str2, hold);
for(test <- hold) println(test);
What does temp contain? Your code assumes that it contains Strings that can be converted to Ints, but it seems that you have a String "z" in there instead. That would produce the error:
scala> "z".toInt
java.lang.NumberFormatException: For input string: "z"
...
Here's a recreation of what temp might look like:
val temp = ArrayBuffer("1", "2", "z")
temp(temp.length-1).toInt //java.lang.NumberFormatException: For input string: "z"
So you need to figure out why some String "z" is getting into temp.
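To make that kind of problem visible without a stack trace, the lookup can report what it actually found instead of throwing. A small sketch; IndexCheck and lastAsIndex are hypothetical names:

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Try

object IndexCheck {
  // Reads the last element of temp as an index. Returns Left with a readable
  // message when temp is empty or its last element is not a number.
  def lastAsIndex(temp: ArrayBuffer[String]): Either[String, Int] =
    temp.lastOption match {
      case None    => Left("temp is empty")
      case Some(s) => Try(s.toInt).toOption.toRight(s"expected an index but found '$s'")
    }
}
```

With the recreated buffer above, lastAsIndex(ArrayBuffer("1", "2", "z")) returns a Left mentioning 'z' instead of throwing NumberFormatException.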
EDIT:
So you're adding "expressions" to temp (temp += exp(b).toString) and also adding indices (temp += b.toString). Then you're assuming that temp only holds indices (b = temp(temp.length-1).toInt). You need to decide what temp is for, and then use it exclusively for that purpose.
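One way to follow that advice is to return the position separately from the collected tokens instead of storing it in temp. A hedged sketch of such a parser for the question's input format; AndParser is a made-up name, it reads exactly three letters for the operator (as the original code does, so it handles "and" but not "or"), and it assumes well-formed input apart from the two require checks:

```scala
object AndParser {
  // Parses "(op a b)" starting at `start`, where a and b are single-character
  // symbols or nested expressions. Returns the flattened tokens and the index
  // just past the closing paren; the position is never stored in the tokens.
  def parse(exp: String, start: Int): (Vector[String], Int) = {
    def skip(p: Int): Int =
      if (p < exp.length && exp(p) == ' ') skip(p + 1) else p

    def operand(p: Int): (Vector[String], Int) = {
      val q = skip(p)
      if (exp(q) == '(') parse(exp, q) // nested expression: recurse
      else (Vector(exp(q).toString), q + 1) // single-character symbol
    }

    val p0 = skip(start)
    require(exp(p0) == '(', s"expected '(' at $p0")
    val opStart = skip(p0 + 1)
    val op = exp.substring(opStart, opStart + 3) // three letters, like the original
    val (left, p1) = operand(opStart + 3)
    val (right, p2) = operand(p1)
    val close = skip(p2)
    require(exp(close) == ')', s"expected ')' at $close")
    (Vector(op) ++ left ++ right, close + 1)
  }
}
```

For example, parse("(and x (and y z))", 0) yields the tokens and, x, and, y, z together with the end position 17, without any numbers mixed into the token list.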
No, toInt doesn't change the object, it takes the object as an argument and returns an integer, leaving the object as is.
I can't understand your question because I can't understand your code.
Let's try to simplify your code.
First of all, you have expressions with an expression type and a list of operands:
scala> :paste
// Entering paste mode (ctrl-D to finish)
abstract sealed class Operand
case class IdentOperand(name: String) extends Operand { override def toString(): String = name }
case class IntOperand(i: Int) extends Operand { override def toString(): String = i.toString() }
case class ExprOperand(expr: Expression) extends Operand { override def toString(): String = expr.toString() }
case class Expression(exprType: String, operands: Seq[Operand]) {
override def toString(): String = operands.mkString("(" + exprType + " ", " ", ")")
}
// Exiting paste mode, now interpreting.
defined class Operand
defined class IdentOperand
defined class IntOperand
defined class ExprOperand
defined class Expression
scala> Expression("and", Seq(IdentOperand("x"), IdentOperand("y")))
res0: Expression = (and x y)
scala> Expression("and", Seq(IdentOperand("x"), ExprOperand(Expression("and", Seq(IdentOperand("y"), IdentOperand("z"))))))
res1: Expression = (and x (and y z))
scala> Expression("and", Seq(ExprOperand(Expression("and", Seq(IdentOperand("x"), IdentOperand("y")))), IdentOperand("z")))
res2: Expression = (and (and x y) z)
Now we have to parse strings to expressions of this type:
scala> import scala.util.parsing.combinator._
import scala.util.parsing.combinator._
scala> object ExspessionParser extends JavaTokenParsers {
| override def skipWhitespace = false;
|
| def parseExpr(e: String) = parseAll(expr, e)
|
| def expr: Parser[Expression] = "(" ~> exprType ~ operands <~ ")" ^^ { case exprType ~ operands => Expression(exprType, operands) }
| def operands: Parser[Seq[Operand]] = rep(" "~>operand)
| def exprType: Parser[String] = "and" | "not" | "or"
| def operand: Parser[Operand] = variable | exprOperand
| def exprOperand: Parser[ExprOperand] = expr ^^ (ExprOperand( _ ))
| def variable: Parser[IdentOperand] = ident ^^ (IdentOperand( _ ))
| }
defined module ExspessionParser
scala> ExspessionParser.parseExpr("(and x y)")
res3: ExspessionParser.ParseResult[Expression] = [1.10] parsed: (and x y)
scala> ExspessionParser.parseExpr("(and x (and y z))")
res4: ExspessionParser.ParseResult[Expression] = [1.18] parsed: (and x (and y z))
scala> ExspessionParser.parseExpr("(and (and x y) z)")
res5: ExspessionParser.ParseResult[Expression] = [1.18] parsed: (and (and x y) z)
And now (as far as I understand your code) we have to replace the string operands (x, y, z) with integer values. Let's add these 2 methods to the Expression class:
def replaceOperands(ints: Seq[Int]): Expression = replaceOperandsInner(ints)._2
private def replaceOperandsInner(ints: Seq[Int]): (Seq[Int], Expression) = {
  var remainInts = ints
  val replacedOperands = operands.collect {
    case n: IdentOperand =>
      val replacement = remainInts.head
      remainInts = remainInts.tail
      IntOperand(replacement)
    case ExprOperand(e) =>
      val (remain, replaced) = e.replaceOperandsInner(remainInts)
      remainInts = remain
      ExprOperand(replaced)
  }
  (remainInts, Expression(exprType, replacedOperands))
}
And now we can do this:
scala> ExspessionParser.parseExpr("(and (and x y) z)").get.replaceOperands(Seq(1, 2, 3))
res7: Expression = (and (and 1 2) 3)
And if you have integer values in string form, then you can just convert them first:
scala> Seq("1", "2", "3") map { _.toInt }
res8: Seq[Int] = List(1, 2, 3)