Scala replace function with comma as "," - scala

In the input below, I have to replace only the last comma (,) between two colons (:) with ",".
println(input)
//[level:1,File:one,three,Flag:NA][level:1,File:two,Flag:NA]
println(input.replace(",", "\",\""))
I am getting this result:
//[level:1","File:one","three","Flag:NA][level:1","File:two","Flag:NA]
The expected result is:
[level:1","File:one,three","Flag:NA][level:1","File:two","Flag:NA]
Kindly help me.

val str1 = "[level:1,File:one,three,Flag:NA][level:1,File:two,Flag:NA]"
// Matches a comma that is directly followed by a key such as "File:"
val regex1 = raw"(,)(\w+:)".r
val matches = regex1.findAllMatchIn(str1)
// Replace each matched ",key:" in turn with "\",\"key:"
val str2 = matches.foldLeft(str1) { case (str, m) =>
  str.replaceFirst(m.group(0), "\",\"" + m.group(2))
}
// str2: String = [level:1","File:one,three","Flag:NA][level:1","File:two","Flag:NA]
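The same result can probably be had without folding over the matches, by using a single replaceAll with a lookahead. This is only a minimal sketch, not the method used in the answer above. It assumes every key consists of word characters followed by a colon, and the names input and quoted are illustrative.

```scala
// Sample input taken from the question.
val input = "[level:1,File:one,three,Flag:NA][level:1,File:two,Flag:NA]"

// Replace a comma only when it is directly followed by a key such as `File:`.
// The lookahead (?=\w+:) checks for the key without consuming it,
// so the comma inside "one,three" is left alone.
val quoted = input.replaceAll(raw",(?=\w+:)", "\",\"")

println(quoted)
// [level:1","File:one,three","Flag:NA][level:1","File:two","Flag:NA]
```

Because the lookahead consumes nothing, the key does not have to be re-inserted in the replacement string.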

Related

How to use if-else condition in Scala's Filter?

I have an ArrayBuffer with data in the following format: period_name:character varying(15) year:bigint. The data in it represents column name of a table and its datatype. My requirement is to extract the column name period and the datatype, just character varying excluding substring from "(" till ")" and then send all the elements to a ListBuffer. I came up with the following logic:
for (i <- receivedGpData) {
  gpTypes = i.split("\\:")
  if (gpTypes(1).contains("(")) {
    gpColType = gpTypes(1).substring(0, gpTypes(1).indexOf("("))
    prepList += gpTypes(0) + " " + gpColType
  } else {
    prepList += gpTypes(0) + " " + gpTypes(1)
  }
}
The above code works, but I am trying to implement the same thing using Scala's map and filter functions. What I don't understand is how to use the if-else condition in filter after the condition:
var reList = receivedGpData.map(element => element.split(":"))
  .filter { x => x(1).contains("(") }
Could anyone let me know how I can implement the same logic as the for-loop using Scala's map and filter functions?
val receivedGpData = Array("bla:bla(1)","bla2:cat")
val res = receivedGpData
  .map(_.split(":"))
  .map(s => (s(0), s(1).takeWhile(_ != '(')))
  .map(s => s"${s._1} ${s._2}").toList
println(res)
Using regex:
val p = "(\\w+):([.[^(]]*)(\\(.*\\))?".r
val res = data.map{case p(x,y,_)=>x+" "+y}
In Scala REPL:
scala> val data = Array("period_name:character varying(15)","year:bigint")
data: Array[String] = Array(period_name:character varying(15), year:bigint)
scala> val p = "(\\w+):([.[^(]]*)(\\(.*\\))?".r
p: scala.util.matching.Regex = (\w+):([.[^(]]*)(\(.*\))?
scala> val res = data.map{case p(x,y,_)=>x+" "+y}
res: Array[String] = Array(period_name character varying, year bigint)
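To address the if-else part of the question directly: filter only keeps or drops elements, so the branching belongs inside map, where if/else can be used as an expression. The following is a minimal sketch under the assumption that every element contains exactly one name and one type separated by a colon. The names prepList, rawType and colType are illustrative.

```scala
val receivedGpData = Seq("period_name:character varying(15)", "year:bigint")

val prepList: List[String] = receivedGpData.map { element =>
  // Split into at most two parts: column name and raw type.
  val parts = element.split(":", 2)
  val name = parts(0)
  val rawType = parts(1)
  // if/else is an expression, so it can compute the value inside map directly.
  val colType =
    if (rawType.contains("(")) rawType.substring(0, rawType.indexOf("("))
    else rawType
  s"$name $colType"
}.toList

println(prepList)
// List(period_name character varying, year bigint)
```

An element without a colon would fail on parts(1), so real input may need a guard.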

for loop into map method with Spark using Scala

Hi, I want to use a "for" loop inside a map method in Scala.
How can I do it?
For example, here I want to generate a random word for each line read:
val rdd = file.map(line => (line, {
  val chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
  val word = new String;
  val res = new String;
  val rnd = new Random;
  val len = 4 + rnd.nextInt((6-4)+1);
  for (i <- 1 to len) {
    val char = chars(rnd.nextInt(51));
    word.concat(char.toString);
  }
  word;
}))
My current output is :
Array[(String, String)] = Array((1,""), (2,""), (3,""), (4,""), (5,""), (6,""), (7,""), (8,""), (9,""), (10,""), (11,""), (12,""), (13,""), (14,""), (15,""), (16,""), (17,""), (18,""), (19,""), (20,""), (21,""), (22,""), (23,""), (24,""), (25,""), (26,""), (27,""), (28,""), (29,""), (30,""), (31,""), (32,""), (33,""), (34,""), (35,""), (36,""), (37,""), (38,""), (39,""), (40,""), (41,""), (42,""), (43,""), (44,""), (45,""), (46,""), (47,""), (48,""), (49,""), (50,""), (51,""), (52,""), (53,""), (54,""), (55,""), (56,""), (57,""), (58,""), (59,""), (60,""), (61,""), (62,""), (63,""), (64,""), (65,""), (66,""), (67,""), (68,""), (69,""), (70,""), (71,""), (72,""), (73,""), (74,""), (75,""), (76,""), (77,""), (78,""), (79,""), (80,""), (81,""), (82,""), (83,""), (84,""), (85,""), (86...
I don't know why the right side is empty.
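There's no need for a var here. It's a one-liner:
Seq.fill(len)(chars(rnd.nextInt(chars.length))).mkString
This creates a sequence of Char of length len by repeatedly calling chars(rnd.nextInt(chars.length)), then turns it into a String. Note that nextInt(n) excludes n itself, so with 52 characters the bound has to be 52 (chars.length); with nextInt(51) the last character, 'Z', is never picked.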
Thus you'll get something like this:
import org.apache.spark.rdd.RDD
import scala.util.Random
val chars = ('a' to 'z') ++ ('A' to 'Z')
val rdd = file.map(line => {
  val randomWord = {
    val rnd = new Random
    val len = 4 + rnd.nextInt((6 - 4) + 1)
    // nextInt's bound is exclusive, so chars.length covers every index
    Seq.fill(len)(chars(rnd.nextInt(chars.length))).mkString
  }
  (line, randomWord)
})
word.concat doesn't modify word; it returns a new String, which the loop discards (Strings are immutable). You can make word a var and append each character to it:
var word = new String
....
for {
  ...
  word += char
  ...
}
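To try the idea without a Spark cluster, here is a minimal sketch that uses a plain Seq in place of the RDD; the call to map has the same shape. The helper name randomWord, the sample lines and the fixed seed are assumptions made for illustration.

```scala
import scala.util.Random

// 52 letters: 'a'..'z' followed by 'A'..'Z'.
val chars = ('a' to 'z') ++ ('A' to 'Z')

// Hypothetical helper: builds a random word of 4 to 6 letters.
def randomWord(rnd: Random): String = {
  val len = 4 + rnd.nextInt(3) // nextInt(3) yields 0, 1 or 2
  // nextInt(chars.length) yields 0 to 51, so every letter can be picked.
  Seq.fill(len)(chars(rnd.nextInt(chars.length))).mkString
}

val lines = Seq("1", "2", "3")   // stands in for the lines of the file
val rnd = new Random(42)         // fixed seed only to make runs repeatable
val pairs = lines.map(line => (line, randomWord(rnd)))
pairs.foreach(println)
```

No mutable String is needed because the word is built as a value and returned from the helper.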

How to use reduce or fold to encrypt a list of string tokens?

For example I want to encrypt each token of a sentence and reduce them to a final encrypted text:
def convert(str: String) = {
str + ":"
}
val tokens = "Hi this is a text".split("\\ ").toList
val reduce = tokens.reduce((a, b) => convert(a) + convert(b))
println(reduce)
// result is `Hi:this::is::a::text:`
val fold = tokens.fold("") {
case (a, b) => convert(a) + convert(b)
}
println(fold)
// result is `:Hi::this::is::a::text:`
val scan = tokens.scan("") {
case (a, b) => convert(a) + convert(b)
}
println(scan)
// result is List(, :Hi:, :Hi::this:, :Hi::this::is:, :Hi::this::is::a:, :Hi::this::is::a::text:)
Assume that convert is an encryption function, so each token should be encrypted only once, not twice. But fold, reduce and scan re-encrypt the already encrypted tokens. The desired result is Hi:this:is:a:text:
Well if you want to encrypt each Token individually, map should work.
val tokens = "Hi this is a text".split("\\ ").toList
val encrypted = tokens.map(convert).mkString
println(encrypted) //prints Hi:this:is:a:text:
def convert(str: String) = {
str + ":"
}
Edit: If you want to use a fold:
val encrypted = tokens.foldLeft("")((result, token) => result + convert(token))
A one-liner specialised for this very example:
"Hi this is a text" split " " mkString("",":",":")
Or
val tokens = "Hi this is a text" split " "
val sep = ":"
val encrypted = tokens mkString("",sep,sep)
Note that fold and reduce operate on two operands in every step. However, you want to encrypt each of the tokens, which is a unary operation. Therefore you should first do a map and then either a fold or a reduce:
tokens map(convert)
Reduce / Fold:
scala> tokens.map(convert).fold("")(_ + _)
res10: String = Hi:this:is:a:text:
scala> tokens.map(convert).reduce(_ + _)
res11: String = Hi:this:is:a:text:
In fact, you can simply use mkString, which is even more concise:
scala> tokens.map(convert).mkString
res12: String = Hi:this:is:a:text:
You can also do the conversion in parallel using par (from Scala 2.13 on, par lives in the separate scala-parallel-collections module):
scala> tokens.par.map(convert).mkString
res13: String = Hi:this:is:a:text:
scala> tokens.par.map(convert).reduce(_ + _)
res14: String = Hi:this:is:a:text:
I think your main problem is understanding how reduce and fold work. You can learn that from the other answers.
As for your question, fold can help:
"Hi this is a text".split("\\ ").fold("") { (a, b) => a + convert(b) }
Here is a version with the code cleaned up and unnecessary conversions removed:
def convert(str: String) = str + ":"
val tokens = "Hi this is a text" split " "
val encrypted = (tokens map convert).mkString
mkString can be seen as a specialized version of reduce (or fold) for Strings.
If for some reason you don't want to use mkString, the code would look like this:
def convert(str: String) = str + ":"
val tokens = "Hi this is a text" split " "
val encrypted = (tokens map convert) reduce (_ + _)
Or shortened with fold:
val encrypted = "Hi this is a text".split(" ").foldLeft("") { case (accum, str) => accum + convert(str) }
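The difference between the question's reduce and the foldLeft variants can be seen side by side in the following minimal sketch. It assumes convert is just a stand-in for the real encryption function, and the names doubled and once are illustrative.

```scala
def convert(token: String): String = token + ":" // stand-in for encryption

val tokens = "Hi this is a text".split(" ").toList

// reduce feeds its own result back in as `a`, so earlier tokens get converted again.
val doubled = tokens.reduce((a, b) => convert(a) + convert(b))

// foldLeft keeps the accumulator untouched and converts only the new token.
val once = tokens.foldLeft("")((acc, token) => acc + convert(token))

println(doubled) // Hi:this::is::a::text:
println(once)    // Hi:this:is:a:text:
```

The accumulator and the element play different roles in foldLeft, which is why only the element should be passed through convert.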

How to split a string by delimiter from the right?

How to split a string by a delimiter from the right?
e.g.
scala> "hello there how are you?".rightSplit(" ", 1)
res0: Array[java.lang.String] = Array(hello there how are, you?)
Python has a .rsplit() method which is what I'm after in Scala:
In [1]: "hello there how are you?".rsplit(" ", 1)
Out[1]: ['hello there how are', 'you?']
I think the simplest solution is to search for the index position and then split based on that. For example:
scala> val msg = "hello there how are you?"
msg: String = hello there how are you?
scala> msg splitAt (msg lastIndexOf ' ')
res1: (String, String) = (hello there how are," you?")
And since someone remarked on lastIndexOf returning -1, that's perfectly fine with the solution:
scala> val msg = "AstringWithoutSpaces"
msg: String = AstringWithoutSpaces
scala> msg splitAt (msg lastIndexOf ' ')
res0: (String, String) = ("",AstringWithoutSpaces)
You could use plain old regular expressions:
scala> val LastSpace = " (?=[^ ]+$)"
LastSpace: String = " (?=[^ ]+$)"
scala> "hello there how are you?".split(LastSpace)
res0: Array[String] = Array(hello there how are, you?)
The pattern matches a space, and the lookahead (?=[^ ]+$) requires that this space be followed by one or more non-space characters ([^ ]+) running up to the end of the string ($). Only the last space satisfies that.
This solution won't break if there is only one token:
scala> "hello".split(LastSpace)
res1: Array[String] = Array(hello)
scala> val sl = "hello there how are you?".split(" ").reverse.toList
sl: List[String] = List(you?, are, how, there, hello)
scala> val sr = (sl.head :: (sl.tail.reverse.mkString(" ") :: Nil)).reverse
sr: List[String] = List(hello there how are, you?)
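For something closer to Python's rsplit(sep, 1), the lastIndexOf idea can be wrapped in a small function. This is a minimal sketch, not a standard library method: the name rsplitOnce is hypothetical, and it only handles a single-character separator and a single split.

```scala
// Hypothetical helper mimicking Python's rsplit(sep, 1) for a single-character separator.
def rsplitOnce(s: String, sep: Char): List[String] = {
  val idx = s.lastIndexOf(sep)
  if (idx < 0) List(s)                                  // no separator: one token
  else List(s.substring(0, idx), s.substring(idx + 1))  // drop the separator itself
}

println(rsplitOnce("hello there how are you?", ' ')) // List(hello there how are, you?)
println(rsplitOnce("AstringWithoutSpaces", ' '))     // List(AstringWithoutSpaces)
```

Unlike splitAt, this drops the separator, so the second part has no leading space.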

Matrix to CSV in Scala

Writing an MxN matrix ( M rows, N columns ) to a CSV file:
My first attempt, using map, works, but creates N references to the stringbuffer. It also writes an unnecessary comma at the end of each row.
def matrix2csv(matrix:List[List[Double]], filename: String ) = {
  val pw = new PrintWriter( filename )
  val COMMA = ","
  matrix.map( row => {
    val sbuf = new StringBuffer
    row.map( elt => sbuf.append( elt ).append( COMMA ))
    pw.println(sbuf)
  })
  pw.flush
  pw.close
}
My second attempt, using reduce, also works but looks clunky:
def matrix2csv(matrix:List[List[Double]], filename: String ) = {
  val pw = new PrintWriter( filename )
  val COMMA = ","
  matrix.map( row => {
    val sbuf = new StringBuffer
    val last = row.reduce( (a,b)=> {
      sbuf.append(a).append(COMMA)
      b
    })
    sbuf.append(last)
    pw.println(sbuf)
  })
  pw.flush
  pw.close
}
Any suggestions on a more concise and idiomatic approach ? Thanks.
You can obtain the string representation easily:
val csvString = matrix.map{ _.mkString(", ") }.mkString("\n")
Then you just need to dump it in a file.
Pay attention to line endings (here "\n"): they vary by platform.
Idiomatically, you're misusing map by using it to perform side-effecting operations. You should use foreach for this instead.
This is what it could look like if you used a foreach and replaced your StringBuffer boilerplate with a call to the mkString method:
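import java.io.PrintWriter
def matrix2csv(matrix: List[List[Double]], filename: String): Unit = {
  val pw = new PrintWriter(filename)
  val COMMA = ","
  // foreach, not map: println is a side effect and no result is collected
  matrix.foreach { row => pw.println(row mkString COMMA) }
  pw.flush()
  pw.close()
}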
Note that mkString uses a StringBuilder (a non-thread-safe StringBuffer, which is fine here).
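Putting the two suggestions together, one option is to build the CSV text with mkString and keep the file writing separate. This is a minimal sketch: the names matrixToCsv and writeCsv are hypothetical, and using System.lineSeparator() as the default row separator is an assumption about what "platform line ending" should mean here.

```scala
import java.io.PrintWriter

// Builds the CSV text: cells joined by commas, rows joined by the separator.
def matrixToCsv(matrix: List[List[Double]], lineSep: String = System.lineSeparator()): String =
  matrix.map(_.mkString(",")).mkString(lineSep)

// Writes the CSV text to a file, closing the writer even if writing fails.
def writeCsv(matrix: List[List[Double]], filename: String): Unit = {
  val pw = new PrintWriter(filename)
  try pw.print(matrixToCsv(matrix))
  finally pw.close()
}

println(matrixToCsv(List(List(1.0, 2.0), List(3.0, 4.0)), "\n"))
// 1.0,2.0
// 3.0,4.0
```

Keeping the string building pure makes it easy to check the output without touching the file system, and mkString avoids the trailing comma from the first attempt.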