Error in code Regex - scala

I am trying to find only the words that contain exactly 3 occurrences of a letter (e in the example below),
and I need to do this using a regex.
val inputString = """edepak,suman,employdee,eeeee,eme,ev"""
and I have written the code below.
val numberPatteren = "([a-z]*e){3,}".r
but I am getting the output below, which is not what I expected.
employdee,eeeee
The output should be only employdee.
Can you please help me with this?

You can achieve that simply by doing the following
scala> inputString.split(",").filter(word => word.count(_ == 'e') == 3).mkString(",")
//res16: String = employdee
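If you want to use a regex, you can do it as below. The pattern removes every letter other than e (and every digit), so a word with exactly three e's is left with length 3: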
scala> val numberPatteren = "[a-df-zA-DF-Z0-9]".r
//numberPatteren: scala.util.matching.Regex = [a-df-zA-DF-Z0-9]
scala> inputString.split(",").filter(numberPatteren.replaceAllIn(_, "").length == 3).mkString(",")
//res0: String = employdee
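As for why the original pattern also returned eeeee: the quantifier {3,} means "three or more", and the pattern is not anchored to the whole word. A minimal sketch of a regex-only check, assuming the goal is exactly three lowercase e's per word; the names ExactlyThreeE and wordsWithThreeE are made up for this illustration:

```scala
object ExactlyThreeE {
  // A run of non-e characters, then exactly three groups of
  // "an e followed by a run of non-e characters".
  val exactlyThreeE = "[^e]*(?:e[^e]*){3}".r

  // matches() requires the whole word to fit the pattern,
  // so a fourth or fifth e makes the match fail.
  def wordsWithThreeE(input: String): Seq[String] =
    input.split(",").toSeq.filter(w => exactlyThreeE.pattern.matcher(w).matches())
}
```

Calling ExactlyThreeE.wordsWithThreeE(inputString).mkString(",") on the input from the question should give employdee.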

Related

extracting sub string using pattern matching in scala

I want to extract domain name from uri.
For example, the input to the regular expression may be one of the types below:
test.net
https://www.test.net
https://test.net
http://www.test.net
http://test.net
In all cases the result should be test.net.
Below is the code I implemented for this purpose:
val re = "([http[s]?://[w{3}\\.]?]+)(.*)".r
But I didn't get the expected result.
Below is my output:
val re(prefix, domain) = "https://www.test.net"
prefix: String = https://www.t
domain: String = est.net
What is the problem with my regular expression, and how can I fix it?
You are using a character class
[http.?://(www.)?]
This means:
either an h
or a t
or a t
or a .
or a ?
or a :
or a /
or a /
or a (
or a w
or a w
or a w
or a .
or a )
or a ?
It does not include an s, so it will not match https://.
It is not clear to me why you are using a character class here, nor why you are using duplicate characters in the class.
Ideally, you shouldn't try to parse URIs yourself; someone else has already done the hard work. You could, for example, use the java.net.URI class:
import java.net.URI
val u1 = new URI("test.net")
u1.getHost
// res: String = null
val u2 = new URI("https://www.test.net")
u2.getHost
// res: String = www.test.net
val u3 = new URI("https://test.net")
u3.getHost
// res: String = test.net
val u4 = new URI("http://www.test.net")
u4.getHost
// res: String = www.test.net
val u5 = new URI("http://test.net")
u5.getHost
// res: String = test.net
Unfortunately, as you can see, what you want to achieve does not actually comply with the official URI syntax.
If you can fix that, then you can use java.net.URI. Otherwise, you will need to go back to your old solution and parse the URI yourself:
val re = "(?>https?://)?(?>www\\.)?([^/?#]*)".r // the dot after www is escaped so it only matches a literal "."
val re(domain1) = "test.net"
//=> domain1: String = test.net
val re(domain2) = "https://www.test.net"
//=> domain2: String = test.net
val re(domain3) = "https://test.net"
//=> domain3: String = test.net
val re(domain4) = "http://www.test.net"
//=> domain4: String = test.net
val re(domain5) = "http://test.net"
//=> domain5: String = test.net
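If the inputs cannot be fixed at the source, one possible middle ground is to add a scheme when it is missing and then let java.net.URI do the parsing. This is only a sketch: the names DomainExtractor and domain are hypothetical, and defaulting to "http://" for scheme-less input is an assumption, not something the question specifies.

```scala
import java.net.URI
import scala.util.Try

object DomainExtractor {
  // Returns the host without a leading "www.", or None if the input cannot be parsed.
  def domain(input: String): Option[String] = {
    // Assumption: input without "://" is a bare host, so prepend a default scheme.
    val withScheme = if (input.contains("://")) input else "http://" + input
    Try(new URI(withScheme)).toOption
      .flatMap(u => Option(u.getHost)) // getHost may return null
      .map(_.stripPrefix("www."))
  }
}
```

The Option result makes the "no host found" case explicit instead of surfacing a null or a MatchError.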

Converting a String to a Map

Given a String : {'Name':'Bond','Job':'Agent','LastEntry':'15/10/2015 13:00'}
I want to parse it into a Map[String,String]. I already tried this answer, but it doesn't work when the character : is inside the parsed value. The same goes for the ' character, which seems to break every JSON mapper...
Thanks for any help.
Let
val s0 = "{'Name':'Bond','Job':'Agent','LastEntry':'15/10/2015 13:00'}"
val s = s0.stripPrefix("{").stripSuffix("}")
Then
(for (e <- s.split(",") ; xs = e.split(":",2)) yield xs(0) -> xs(1)).toMap
Here we split each key-value pair at the first occurrence of ":". Note that this relies on strong assumptions: no key contains a ":" and no key or value contains a ",".
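Note that the split-based version keeps the surrounding single quotes in both keys and values. A minimal sketch that matches the quoted pairs directly, so that ":" and "," inside a value are harmless; it assumes keys and values never contain a single quote, and the names QuotedPairs and parse are made up for this example:

```scala
object QuotedPairs {
  // 'key' : 'value', where neither side contains a single quote (assumption).
  private val pair = "'([^']*)'\\s*:\\s*'([^']*)'".r

  def parse(s: String): Map[String, String] =
    pair.findAllMatchIn(s).map(m => m.group(1) -> m.group(2)).toMap
}
```

With the string from the question, QuotedPairs.parse(s0) should map LastEntry to 15/10/2015 13:00 without any quotes.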
You can use the familiar jackson-module-scala, which handles this much more robustly.
For example:
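import com.fasterxml.jackson.core.JsonParser
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.{DefaultScalaModule, ScalaObjectMapper} // older releases: ...scala.experimental.ScalaObjectMapper
val src = "{'Name':'Bond','Job':'Agent','LastEntry':'15/10/2015 13:00'}"
val mapper = new ObjectMapper() with ScalaObjectMapper
mapper.registerModule(DefaultScalaModule)
mapper.configure(JsonParser.Feature.ALLOW_SINGLE_QUOTES, true) // strict JSON only allows double quotes
val myMap = mapper.readValue[Map[String,String]](src)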

Scala: read a file and save each line to a variable?

scalaresult.txt
0~250::250~500::500~750::750~1000::1000~1250
481::827::750::256::1000
scala code
val filename = "/home/user/scalaresult.txt"
for ( (line,index) <- Source.fromFile(filename).getLines().zipWithIndex){
println(line)
println(index)
}
//val step_x = "0~250::250~500::500~750::750~1000::1000~1250"
//val step_y = "481::827::750::256::1000"
Seq("java", "-jar", "/home/user/birt2.jar" , step_x , step_y , "BarChart").lines
I have a file: scalaresult.txt
I need to save the first line (index 0) to step_x
and the second line (index 1) to step_y.
How can I do this? Please guide me. Thank you.
This is not the optimal solution, but you can try the following (I'm not a Scala expert yet! :P):
scala> val it = Source.fromFile(filename).getLines().toList
it: List[String] = List(0~250::250~500::500~750::750~1000::1000~1250, "481::827::750::256::1000 ")
scala> it(1)
res7: String = "481::827::750::256::1000 "
scala> it(0)
res8: String = 0~250::250~500::500~750::750~1000::1000~1250
If all you are trying to do is take the two lines from the file and insert them into the sequence, the indexer on the list will do the trick. Mind you, it's an O(n) operation on a list, so if there were a lot of lines, it wouldn't be the best approach.
import scala.io.Source
val filename = "/home/user/scalaresult.txt"
// getLines() returns an Iterator, which cannot be indexed, so materialize it as a List first
val lines = Source.fromFile(filename).getLines().toList
val seq = Seq("java", "-jar", "/home/user/birt2.jar" , lines(0) , lines(1), "BarChart")
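If only the first two lines are needed, a slightly more defensive variant reads just those two, trims them (the REPL output above shows a trailing space on the second line), and closes the file afterwards. This is a sketch under those assumptions; FirstTwoLines, firstTwo and fromFile are hypothetical names:

```scala
import scala.io.Source

object FirstTwoLines {
  // Takes at most two lines from the iterator; None if fewer than two exist.
  def firstTwo(lines: Iterator[String]): Option[(String, String)] =
    lines.take(2).toList match {
      case x :: y :: Nil => Some((x.trim, y.trim))
      case _             => None
    }

  // Opens the file, reads the first two lines, and always closes the source.
  def fromFile(path: String): Option[(String, String)] = {
    val src = Source.fromFile(path)
    try firstTwo(src.getLines()) finally src.close()
  }
}
```

A caller could then pattern match on the result, for example `FirstTwoLines.fromFile(filename).foreach { case (step_x, step_y) => ... }`, and build the command sequence inside.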

Spark Data Loading Issue

I am getting an IndexOutOfBoundsException while doing the following operations in spark-shell:
val input = sc.textFile("demo.txt")
input.collect
Both of the above calls work fine.
val out = input.map(_.split(",")).map(r => r(1))
I get the out-of-bounds exception for the line above.
demo.txt looks like this (header: Name,Gender,age):
Danial,,14
,Male,18
Hema,,
With Pig the same file works without any issue!
You can try this out yourself, just start the Scala console and enter your sample lines.
scala> "Danial,,14".split(",")
res0: Array[String] = Array(Danial, "", 14)
scala> ",Male,18".split(",")
res1: Array[String] = Array("", Male, 18)
scala> "Hema,,".split(",")
res2: Array[String] = Array(Hema)
So ooops, the last line doesn't work. Add the number of expected columns to split:
scala> "Hema,,".split(",", 3)
res3: Array[String] = Array(Hema, "", "")
or even better, write a real parser. String.split isn't suitable for production code.
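Short of a real parser, a negative limit tells String.split to keep all trailing empty fields without hard-coding the column count. A small sketch, with CsvFields, fields and column as made-up names; it still assumes fields never contain a comma themselves:

```scala
object CsvFields {
  // A limit of -1 keeps trailing empty strings instead of dropping them.
  def fields(line: String): List[String] = line.split(",", -1).toList

  // Safe column access: None instead of an exception when the column is missing.
  def column(line: String, i: Int): Option[String] = fields(line).lift(i)
}
```

In the Spark job from the question, the equivalent change would be `input.map(_.split(",", -1)).map(r => r(1))`.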

Scala: String Chomp

Does Scala have an API to do a "chomp" on a String?
Preferably, I would like to convert a string "abcd \n" to "abcd"
Thanks
Ajay
There's java.lang.String.trim(), but that also removes leading whitespace. There's also RichString.stripLineEnd, but that only removes \n and \r.
If you don't want to use Apache Commons Lang, you can roll your own, along these lines.
scala> def chomp(text: String) = text.reverse.dropWhile(" \n\r".contains(_)).reverse
chomp: (text: String)String
scala> "[" + chomp(" a b cd\r \n") + "]"
res28: java.lang.String = [ a b cd]
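The same idea can be written without reversing the string twice, by removing the trailing run of spaces and line breaks with a regex. This is just an alternative sketch with the same character set (space, \r, \n); the object name Chomp is made up:

```scala
object Chomp {
  // Removes any trailing run of spaces, \r and \n; \z anchors at the very end of the input.
  def chomp(text: String): String = text.replaceAll("[ \\r\\n]+\\z", "")
}
```

Like the version above, it leaves leading whitespace untouched.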
There is in fact out-of-the-box support for chomp [1]:
scala> val input = "abcd\n"
input: java.lang.String =
abcd
scala> "[%s]".format(input)
res2: String =
[abcd
]
scala> val chomped = input.stripLineEnd
chomped: String = abcd
scala> "[%s]".format(chomped)
res3: String = [abcd]
[1] For some definition of chomp; really the same answer as sepp2k's, but showing how to use it on a String.
Why not use Apache Commons Lang and the StringUtils.chomp() function? One of the great things about Scala is that you can leverage existing Java libraries.