How can I split a string into a list using Scala?

I have a string str = "one,two,(three,four), five"
I want to split this string into a list, like this:
List[String] = ("one", "two", "(three,four)", "five")
I have no idea how to do this.
Thanks

We can try matching on the pattern \(.*?\)|[^, ]+:
val str = "one,two,(three,four), five"
val re = """\(.*?\)|[^, ]+""".r
for(m <- re.findAllIn(str)) println(m)
This prints:
one
two
(three,four)
five
This regex pattern first tries to match a parenthesized (...) term. If that fails, it matches a run of characters other than commas and spaces, consuming one CSV term at a time. Because the (...) alternative is tried first, commas inside the parentheses are never treated as separators.
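One limitation: the lazy .*? stops at the first ")", so nested parentheses such as "(a,(b,c))" are not kept together. As an alternative (a hand-written scanner rather than a regex, and not part of the answer above), the sketch below splits only on commas at parenthesis depth 0 and trims each term. `TopLevelSplit` and `splitTopLevel` are hypothetical names, and it assumes the parentheses are balanced.

```scala
import scala.collection.mutable.ListBuffer

object TopLevelSplit {
  // Splits on commas that are not inside any parentheses.
  // Assumes balanced parentheses; empty terms are dropped.
  def splitTopLevel(s: String): List[String] = {
    val parts = ListBuffer.empty[String]
    val current = new StringBuilder
    var depth = 0 // current nesting level of parentheses
    for (ch <- s) ch match {
      case '(' => depth += 1; current += ch
      case ')' => depth -= 1; current += ch
      case ',' if depth == 0 =>
        // a separator at the top level ends the current term
        parts += current.toString.trim
        current.clear()
      case _ => current += ch // includes commas inside parentheses
    }
    parts += current.toString.trim // last term has no trailing comma
    parts.toList.filter(_.nonEmpty)
  }
}
```

For example, `TopLevelSplit.splitTopLevel("a,(b,(c,d)),e")` keeps "(b,(c,d))" as a single term, which the regex version would break apart.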

Scala: split two words that aren't separated

I have a corpus with words like applefruit that aren't separated by any separator, and I would like to split them. Since this can be a non-linear problem, I would like to pass a custom dictionary and split only when a word from the dictionary is a substring of a word in the corpus.
If my dictionary contains only apple and the corpus has three words, applefruit, applebananafruit and bananafruit, the output should look like: apple, fruit, apple, bananafruit, bananafruit.
Notice that I am not splitting bananafruit; the goal is to make the process faster by splitting only on the words provided in the dictionary. I am using Scala 2.x.
You can use regular expressions with split:
scala> "foobarfoobazfoofoobatbat".split("(?<=foo)|(?=foo)")
res27: Array[String] = Array(foo, bar, foo, baz, foo, foo, batbat)
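Or, if your dictionary (and/or the list of strings to split) has more than one word, combine all the words into one pattern:
// wordList: List[String] is the dictionary, toSplit: List[String] is the corpus
val rx = wordList.map { w => s"(?<=$w)|(?=$w)" }.mkString("|")
val result: List[String] = toSplit.flatMap(_.split(rx))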
You could do a regex find and replace on the following pattern:
(?=apple)|(?<=apple)
and then replace each match with a comma surrounded by spaces. We could try:
val input = "bananaapplefruit"
val output = input.replaceAll("(?=apple)|(?<=apple)", " , ")
println(output) // banana , apple , fruit
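Putting the split-based answer together for the example in the question, here is a minimal sketch. It assumes the dictionary words should be matched literally, so each one is escaped with `java.util.regex.Pattern.quote` before being placed in the lookarounds, and the pattern is compiled once and reused for every corpus word. `DictionarySplit` and `splitOnDictionary` are hypothetical names.

```scala
import java.util.regex.Pattern

object DictionarySplit {
  // Splits each corpus word before and after every occurrence of a dictionary word.
  // Dictionary entries are quoted so characters like '.' are matched literally.
  def splitOnDictionary(corpus: List[String], dictionary: List[String]): List[String] =
    if (dictionary.isEmpty) corpus // an empty pattern would split between every character
    else {
      val rx = dictionary
        .map { w =>
          val q = Pattern.quote(w)
          s"(?<=$q)|(?=$q)" // zero-width split points around the word
        }
        .mkString("|")
      val compiled = Pattern.compile(rx) // compile once, reuse for every word
      corpus.flatMap(word => compiled.split(word).toList)
    }
}
```

Words that contain no dictionary entry, such as bananafruit, come back unchanged because the pattern never matches inside them.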

Scala: Convert a string to a string array with and without split, given that all special characters except "(" and ")" are allowed

I have a string
val a = "((x1,x2),(y1,y2),(z1,z2))"
I want to parse this into a Scala array
val arr = Array(("x1","x2"),("y1","y2"),("z1","z2"))
Is there a way to do this directly with an expr() equivalent?
If not, how would one do this using split?
Note: x1, x2, x3, etc. are strings and can contain special characters, so the key would be to use the () delimiters to parse the data.
Code I put together from the answers by Dici and Bogdan Vakulenko:
// The original read the input with a.getString(1); here a is the plain String from above
val x2 = a.trim.split("[()]").grouped(2).map(x => x(0).trim).toArray
val x3 = x2.drop(1) // the first group is always empty because the input starts with "(("
val jmap = new java.util.HashMap[String, String]()
for (i <- x3) {
  val index = i.lastIndexOf(",")
  val fv = i.slice(0, index).trim
  val lv = i.substring(index + 1).trim
  jmap.put(fv, lv)
}
This is still susceptible to a "," inside the second string.
Actually, I think a regex is the most convenient way to solve this.
val a = "((x1,x2),(y1,y2),(z1,z2))"
val regex = "(\\((\\w+),(\\w+)\\))".r
println(
regex.findAllMatchIn(a)
.map(matcher => (matcher.group(2), matcher.group(3)))
.toList
)
Note that I made some assumptions about the format:
no whitespace in the string (the regex could easily be updated to allow it if needed)
always tuples of two elements, never more
empty strings are not valid tuple elements
only word characters (letters, digits and underscore, as matched by \w) are allowed (this would also be easy to change)
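Without a regex, you can strip the parentheses and spaces, split on commas, and pair up consecutive elements (this assumes the elements themselves contain no commas):
val a = "((x1,x2),(y1,y2),(z1,z2))"
a.replaceAll("[\\(\\) ]","")
  .split(",")
  .grouped(2) // non-overlapping pairs; sliding(2) would also yield (x2,y1) and (y2,z1)
  .map(x => (x(0), x(1)))
  .toArray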
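Since the question says the elements may contain any special character except parentheses, one more option (a sketch, not taken from either answer) is to match each innermost (...) group with [^()]* and split its contents on the first comma only. It assumes the first element of each pair contains no comma; everything after that comma is treated as the second element. `PairParser` and `parsePairs` are hypothetical names.

```scala
import scala.util.matching.Regex

object PairParser {
  // An innermost group: "(" then any characters that are not parentheses, then ")".
  private val Inner: Regex = """\(([^()]*)\)""".r

  // Parses "((a,b),(c,d))" into Array(("a","b"), ("c","d")).
  // Assumption: the first element has no comma; the rest of the group is the second element.
  def parsePairs(s: String): Array[(String, String)] =
    Inner.findAllMatchIn(s).map { m =>
      m.group(1).split(",", 2) match {
        case Array(first, second) => (first, second)
        case Array(only)          => (only, "") // group without a comma
      }
    }.toArray
}
```

For example, `PairParser.parsePairs("((a-b!,c d),(e.f,g,h))")` yields the pairs ("a-b!", "c d") and ("e.f", "g,h").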

Efficiently counting occurrences of each character in a file in Scala

I am new to Scala. I want the fastest way to get a map of occurrence counts for each character in a text file. How can I do that? (I used groupBy, but I believe it is too slow.)
I think that groupBy() is probably fairly efficient, but it only collects the elements into groups, which means that counting them requires a second traversal.
To count all Chars in a single traversal, you'd probably need something like this:
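import scala.io.Source
val tally = Array.ofDim[Long](Char.MaxValue + 1) // one slot per possible Char, so non-ASCII input can't overflow the index
val src = Source.fromFile("someFile.txt")
try src.foreach(tally(_) += 1) finally src.close()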
An Array is used for its fast indexing: the index is the character that was counted (its numeric code), so each character costs a single array update.
tally('e') //res0: Long = 74
tally('x') //res1: Long = 1
You can do the following:
Read the file first:
import scala.io.Source
val lines = Source.fromFile("/Users/Al/.bash_profile").getLines.toList
You can then write a method that takes the list of lines read and counts the occurrences of a given character:
def getCharCount(c: Char, lines: Seq[String]) = {
  lines.foldLeft(0) { (acc, elem) =>
    elem.count(_ == c) + acc
  }
}
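Since the question asks for a map of counts, here is a minimal sketch (not from either answer) that keeps the single-pass array tally and then converts it to a Map[Char, Long]. `CharCount`, `countChars` and `countCharsInFile` are hypothetical names. It assumes the file is read with the platform's default encoding, and characters outside the Basic Multilingual Plane are counted as two separate surrogate Chars.

```scala
import scala.io.Source

object CharCount {
  // Single pass over the characters; one Long counter per possible Char value
  // (65,536 slots, roughly 512 KB).
  def countChars(chars: Iterator[Char]): Map[Char, Long] = {
    val tally = new Array[Long](Char.MaxValue + 1)
    chars.foreach(c => tally(c) += 1)
    // Keep only characters that actually occurred.
    tally.indices.iterator
      .filter(i => tally(i) > 0)
      .map(i => i.toChar -> tally(i))
      .toMap
  }

  // Reads the file as a stream of Chars and closes it afterwards.
  def countCharsInFile(path: String): Map[Char, Long] = {
    val src = Source.fromFile(path)
    try countChars(src) finally src.close()
  }
}
```

Taking an Iterator[Char] keeps the counting logic independent of the file, so it can be checked on a plain string with `CharCount.countChars("hello".iterator)`.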

What's an elegant way to append String elements in a Scala Set?

I have an immutable Scala Set[String] containing a few strings, say {"a", "b", "c"}. I basically want to join them into a String that looks like "\"a\",\"b\",\"c\"". I know I can make a var resultStr and use a for-loop to get the result. But since Scala encourages using an immutable val over var, and also has so many operations defined over sets, I was wondering if there was a more elegant way to achieve the result.
Thanks.
If you want to make a String from a Scala collection, you can simply use mkString(sep). For example:
Set("a", "b", "c").mkString(",")
You can use map to surround them with quotes and then mkString to join them up with comma as the separator.
s.map(x => s""""$x"""").mkString(",")
or
s.map(x => "\"" + x + "\"").mkString(",")
(The triple-quoted string lets you write literal double quotes without escaping them, and $x is string interpolation.)
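Another option is the three-argument overload mkString(start, sep, end), which adds the quotes without a separate map step. A minimal sketch, with `QuoteJoin` and `quoteJoin` as hypothetical names; the empty check is there because mkString with a start and end would otherwise return a pair of quotes for an empty set. Note that the iteration order of a Set is not guaranteed, so sort first (e.g. with .toSeq.sorted) if the output order matters.

```scala
object QuoteJoin {
  // Wraps each element in double quotes and joins them with commas:
  // start = opening quote, sep = closing quote + comma + opening quote, end = closing quote.
  def quoteJoin(items: Iterable[String]): String =
    if (items.isEmpty) "" // otherwise the result would be "\"\""
    else items.mkString("\"", "\",\"", "\"")
}
```

For example, `QuoteJoin.quoteJoin(Set("a", "b", "c").toSeq.sorted)` returns "\"a\",\"b\",\"c\"".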

Scala: convert a split string to vals in one line

I saw the following answer: Scala split string to tuple, but in that question the OP is asking to convert a string to a List. I would like to take a string, split it by some character, and convert it to a tuple so the parts can be saved as vals:
val (a,b,c) = "A.B.C".split(".").<toTupleMagic>
Is this possible? This would be a conversion from an Array[String] to a Tuple3 of (String,String,String).
It is unnecessary:
val Array(a, b, c) = "A.B.C".split('.')
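Note that I changed the argument to split from a String to a Char: if you pass a String, it is treated as a regex pattern, and . matches any character, so every character becomes a separator. Because trailing empty strings are dropped, you get an empty array back, and val Array(a, b, c) then fails with a MatchError.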
If you truly want to convert it to tuple, you can use Shapeless.
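If you want an actual tuple without adding Shapeless, a plain pattern match on the array is enough for a fixed arity. This is a minimal sketch, not a Shapeless-based solution; `SplitToTuple` and `toTuple3` are hypothetical names, and returning an Option is a design choice here to avoid the MatchError that val Array(a, b, c) throws when the field count is wrong.

```scala
object SplitToTuple {
  // Converts a '.'-separated string with exactly three fields into a Tuple3.
  // Returns None instead of throwing when there are more or fewer fields.
  def toTuple3(s: String): Option[(String, String, String)] =
    s.split('.') match {
      case Array(a, b, c) => Some((a, b, c))
      case _              => None
    }
}
```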