Regex to extract part of string between parentheses - scala

I have the string below and I want to extract only List((foo, bar), (foo1, bar1)) from it. The line has many other characters before and after the part I want to extract.
some garbage string PARAMS=List((foo, bar), (foo1, bar1)) some garbage string
I have tried these regexes:
(?:PARAMS)=(List\((.*?)\))
(?:PARAMS)=(List\(([^)]+)\))
but they give me the output below in group(1):
List((foo, bar)

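The regex .*List\((.*)\).* works, because the greedy (.*) runs to the last closing parenthesis in the line instead of stopping at the first one.
Use a Scala regex together with pattern matching, then split on any of (, comma, or ) and group the pieces into pairs.
A Scala Regex can be used as an extractor: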
val r = """.*List\((.*)\).*""".r
Pattern match on the string, using the regex as an extractor:
val result = str match {
case r(value) => value
case _ => ""
}
Then split on any of (, comma, or ) and group the pieces into pairs:
result.split("""[(),]""").filterNot(_.trim.isEmpty)
.grouped(2)
.toList
.map(pair => (pair(0), pair(1)))
Scala REPL
scala> val str = """some garbage string PARAMS=List((foo, bar), (foo1, bar1)) some garbage string"""
str: String = "some garbage string PARAMS=List((foo, bar), (foo1, bar1)) some garbage string"
scala> val r = """.*List\((.*)\).*""".r
r: util.matching.Regex = .*List\((.*)\).*
scala> val result = str match {
case r(value) => value
case _ => ""
}
result: String = "(foo, bar), (foo1, bar1)"
scala> result.split("""[(),]""").filterNot(_.trim.isEmpty).grouped(2).toList.map(pair => (pair(0), pair(1)))
res46: List[(String, String)] = List(("foo", " bar"), ("foo1", " bar1"))
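If the goal is the whole List(...) text, as asked in the question, one possible sketch is a pattern that tolerates a single level of nested parentheses, followed by a second pattern that pulls out each pair and trims it. The names ParamsExtractor, extractList and pairs are made up for illustration, and the patterns assume that the pairs never contain parentheses or extra commas of their own.

```scala
object ParamsExtractor {
  // "List(" followed by any mix of "(...)" groups and non-parenthesis characters, then ")".
  // Assumption: parentheses nest at most one level deep inside List(...).
  private val ListPattern = """PARAMS=(List\((?:\([^()]*\)|[^()])*\))""".r

  // One "(left, right)" pair with no nested parentheses and exactly one comma.
  private val PairPattern = """\(([^(),]*),([^(),]*)\)""".r

  /** Returns the full "List(...)" text that follows "PARAMS=", if present. */
  def extractList(line: String): Option[String] =
    ListPattern.findFirstMatchIn(line).map(_.group(1))

  /** Returns the pairs inside the extracted list, with surrounding whitespace trimmed. */
  def pairs(line: String): List[(String, String)] =
    extractList(line).toList.flatMap { list =>
      PairPattern.findAllMatchIn(list).map(m => (m.group(1).trim, m.group(2).trim)).toList
    }
}
```

Unlike the greedy .* version, this does not run past the closing parenthesis of List(...) when the trailing garbage contains parentheses too.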

Related

how to extract part of string that did not match pattern

I want to extract the part of a string that did not match a pattern.
My matching condition is that the string should be of length 5 and should contain only N or Y.
Ex:
NYYYY => valid
NY => Invalid , length is invalid
NYYSY => Invalid. character at position 3 is invalid
If the string is invalid, I want to find out which particular character did not match. Ex: in NYYSY the 4th character did not match.
I tried pattern matching in Scala:
val Pattern = "([NY]{5})".r
paramList match {
case Pattern(c) => true
case _ => false
}
Returns a String indicating validation status.
def validate(str :String, len :Int, cs :Seq[Char]) :String = {
val checkC = cs.toSet
val errs = str.zipAll(Range(0,len), 1.toChar, -1).flatMap{ case (c,x) =>
if (x < 0) Some("too long")
else if (checkC(c)) None
else if (c == 1) Some("too short")
else Some(s"'$c' at index $x")
}
str + ": " + (if (errs.isEmpty) "valid" else errs.distinct.mkString(", "))
}
testing:
validate("NTYYYNN", 4, "NY") //res0: String = NTYYYNN: 'T' at index 1, too long
validate("NYC", 7, "NY") //res1: String = NYC: 'C' at index 2, too short
validate("YNYNY", 5, "NY") //res2: String = YNYNY: valid
Here's one approach that returns a list of (Char, Int) tuples of invalid characters and their corresponding positions in a given string:
def checkString(validChars: List[Char], validLength: Int, s: String) = {
val Pattern = s"([${validChars.mkString}]{$validLength})".r
s match {
case Pattern(_) => Vector.empty[(Char, Int)]
case s =>
val invalidList = s.zipWithIndex.filter{case (c, _) => !validChars.contains(c)}
if (invalidList.nonEmpty) invalidList else Vector(('\u0000', -1))
}
}
List("NYYYY", "NY", "NNSYYTN").map(checkString(List('N', 'Y'), 5, _))
// res1: List(Vector(), Vector((?,-1)), Vector((S,2), (T,5)))
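As shown above, an empty Vector represents a valid string, and a Vector containing (null-char, -1) means the string has valid characters but an invalid length. Note that when a string has both invalid characters and an invalid length, only the invalid characters are reported.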
Here is one suggestion which might suit your needs:
"NYYSY".split("(?<=[^NY])|(?=[^NY])").foreach(println)
NYY
S
Y
This solution splits the input string at every point where either the preceding or the following character is not Y or N. Each run of valid characters and each invalid character then appears as a separate row in the output.
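If the positions of the offending characters are needed rather than the pieces, a related sketch is to search for the invalid characters directly and read the offset of each match. The names InvalidPositions and invalidAt are hypothetical, and the valid alphabet is hard-coded to N and Y as in the question.

```scala
object InvalidPositions {
  // Any single character that is neither N nor Y.
  private val Invalid = "[^NY]".r

  /** Returns each invalid character together with its zero-based index. */
  def invalidAt(s: String): List[(Char, Int)] =
    Invalid.findAllMatchIn(s).map(m => (m.matched.head, m.start)).toList
}
```

An empty result only says that every character is valid; the length still has to be checked separately.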
You can use additional regular expressions to detect the specific issue:
val Pattern = "([NY]{5})".r
val TooLong = "([NY]{5})(.+)".r
val WrongChar = "([NY]*)([^NY].*)".r
paramList match {
case Pattern(c) => // Good
case TooLong(head, rest) => // Extra character(s) in sequence
case WrongChar(head, rest) => // Wrong character in sequence
case _ => // Too short
}
You can work out the index of the error using head.length and the failing character is rest.head.
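As a rough sketch of how those match arms could be filled in, the following returns a short message per case. The object name NyDiagnosis, the method name diagnose and the message wording are assumptions made for illustration; only the three patterns come from the answer above.

```scala
object NyDiagnosis {
  private val Valid     = "([NY]{5})".r
  private val TooLong   = "([NY]{5})(.+)".r
  private val WrongChar = "([NY]*)([^NY].*)".r

  /** Describes why a string is not exactly five characters drawn from N and Y. */
  def diagnose(s: String): String = s match {
    case Valid(_)              => "valid"
    // Five valid characters followed by anything else.
    case TooLong(_, rest)      => s"too long by ${rest.length}"
    // head is the valid prefix, so its length is the index of the first bad character.
    case WrongChar(head, rest) => s"invalid '${rest.head}' at index ${head.length}"
    // Only valid characters, but fewer than five of them.
    case _                     => "too short"
  }
}
```

Because the cases are tried in order, a string such as NYYYYS is reported as too long rather than as containing a wrong character; swapping the TooLong and WrongChar cases would change that.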
You can achieve this by pattern matching on each character of the string, without using any sort of regex or complex string manipulation.
def check(value: String): Unit = {
if(value.length!=5) println(s"$value length is invalid.")
else value.foldLeft((0, Seq[String]())){
case (r, char) =>
char match {
case 'Y' | 'N' => r._1+1 -> r._2
case c @ _ => r._1+1 -> {r._2 ++ List(s"Invalid character `$c` in position ${r._1}")}
}
}._2 match {
case Nil => println(s"$value is valid.")
case errors: List[String] => println(s"$value is invalid - [${errors.mkString(", ")}]")
}
}
check("NYCNBNY")
NYCNBNY length is invalid.
check("NYCNB")
NYCNB is invalid - [Invalid character `C` in position 2, Invalid character `B` in position 4]
check("NYNNY")
NYNNY is valid.

list with case class scala

I have the following issue. I have the following list as input:
val input:List[Item]=List(FRDVE,12
SDED,13
prog-d,11
PROG-D,15
a-prog-d,17)
with
case class Item(Name:String,Number:Int)
The aim is to find only the first line where the name contains either prog-d or PROG-D,
so for this case the expected output is:
val output="prog-d"
I wrote the following code :
def getName(input:List[Item]):Option[String]={
val matchLine=input.filter(Name.contains("prog-d"))
matchLine match {
case head::tail => Some(matchLine.head.split(",")(0))
case isEmpty => None
}
}
Here I am getting an error saying that Name doesn't exist, and I don't know how to pass several possibilities to contains. It should basically be: Name.contains("prog-d" ||"PROG-D")
Any recommendations please
Thanks a lot
You can use collectFirst:
input.collectFirst { case Item(s, _) if s.equalsIgnoreCase("prog-d") => s }
This avoids both map and filter, so only as many entries as necessary are inspected.
Full code:
case class Item(name: String, number: Int)
val input: List[Item] = List(
Item("FRDVE", 12),
Item("SDED", 13),
Item("prog-d", 11),
Item("PROG-D", 15),
Item("a-prog-d", 17),
)
val output = input.collectFirst {
case Item(s, _) if s.equalsIgnoreCase("prog-d") => s
}
println(output)
You can also use find, which returns an Option holding the first element that matches your condition (it stops iterating once a match is found).
case class Item(Name:String,Number:Int)
val input = List(Item("FRDVE",12), Item("SDED",13), Item("prog-d",11), Item("PROG-D",15), Item("a-prog-d",17))
input.find(_.Name.equalsIgnoreCase("prog-d")) match {
case Some(item) => item.Name
case None => "" //your default string
}
You can use item.Name == "prog-d" || item.Name == "PROG-D" or item.Name.equalsIgnoreCase("prog-d"):
scala> val input = List(Item("FRDVE",12), Item("SDED",13), Item("prog-d",11), Item("PROG-D",15), Item("a-prog-d",17))
input: List[Item] = List(Item(FRDVE,12), Item(SDED,13), Item(prog-d,11), Item(PROG-D,15), Item(a-prog-d,17))
scala> input.filter(item => item.Name.equalsIgnoreCase("prog-d")).map(_.Name)
res1: List[String] = List(prog-d, PROG-D)
If you want the first match, use headOption and map the result to whatever data you need.
scala> val output = input.filter(item => item.Name.equalsIgnoreCase("prog-d")).headOption
output: Option[Item] = Some(Item(prog-d,11))
scala> val outputName = input.filter(item => item.Name.equalsIgnoreCase("prog-d")).headOption.map(_.Name)
outputName: Option[String] = Some(prog-d)
NOTE: .head is not safe to use, because List().head throws when the list is empty:
scala> List.empty[Item].head
java.util.NoSuchElementException: head of empty list
at scala.collection.immutable.Nil$.head(List.scala:428)
at scala.collection.immutable.Nil$.head(List.scala:425)
... 28 elided
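The question asks for names that contain prog-d in either case, while the answers above test for equality. A minimal sketch of a case-insensitive contains check with collectFirst could look like this; FirstMatch and firstNameContaining are hypothetical names, and lower-casing both sides is just one simple way to ignore case.

```scala
object FirstMatch {
  final case class Item(name: String, number: Int)

  /** Returns the name of the first item whose name contains needle, ignoring case. */
  def firstNameContaining(items: List[Item], needle: String): Option[String] = {
    val wanted = needle.toLowerCase
    // collectFirst stops at the first element for which the guard holds.
    items.collectFirst { case Item(n, _) if n.toLowerCase.contains(wanted) => n }
  }
}
```

With the list from the question this yields Some("prog-d"); an item such as a-prog-d would also qualify if it came first.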

Addition of numbers recursively in Scala

In this Scala code I'm trying to analyze a string that contains a sum (such as 12+3+5) and return the result (20). I'm using regex to extract the first digit and parse the trail to be added recursively. My issue is that since the regex returns a String, I cannot add up the numbers. Any ideas?
object TestRecursive extends App {
val plus = """(\w*)\+(\w*)""".r
println(parse("12+3+5"))
def parse(str: String) : String = str match {
// sum
case plus(head, trail) => parse(head) + parse(trail)
case _ => str
}
}
You might want to use the parser combinators for an application like this.
"""(\w*)\+(\w*)""".r also matches "+" or "23+" or "4 +5" // but captures it only in the first group.
What you could do instead is:
scala> val numbers = "[+-]?\\d+"
numbers: String = [+-]?\d+
scala> numbers.r.findAllIn("1+2-3+42").map(_.toInt).reduce(_ + _)
res4: Int = 42
scala> numbers.r.findAllIn("12+3+5").map(_.toInt).reduce(_ + _)
res5: Int = 20
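To stay closer to the recursive shape of the question, one option is to let parse return an Int and split at the first + by hand. This is only a sketch under narrow assumptions: the input contains nothing but non-negative integers separated by +, and anything else makes toInt throw a NumberFormatException. The object name RecursiveSum is made up.

```scala
object RecursiveSum {
  /** Sums a string such as "12+3+5" by peeling off the number before the first '+'. */
  def parse(str: String): Int = str.indexOf('+') match {
    case -1 => str.trim.toInt                                 // no '+' left: a single number
    case i  => str.take(i).trim.toInt + parse(str.drop(i + 1)) // head number plus the rest
  }
}
```

The key change from the original code is the return type: once each branch yields an Int, the + in the recursive case adds numbers instead of concatenating strings.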

Converting MongoCursor to JSON

Using Casbah, I query Mongo.
val mongoClient = MongoClient("localhost", 27017)
val db = mongoClient("test")
val coll = db("test")
val results: MongoCursor = coll.find(builder)
var matchedDocuments = List[DBObject]()
for(result <- results) {
matchedDocuments = matchedDocuments :+ result
}
Then, I convert the List[DBObject] into JSON via:
val jsonString: String = buildJsonString(matchedDocuments)
Is there a better way to convert from "results" (MongoCursor) to JSON (JsValue)?
private def buildJsonString(list: List[DBObject]): Option[String] = {
def go(list: List[DBObject], json: String): Option[String] = list match {
case Nil => Some(json)
case x :: xs if(json == "") => go(xs, x.toString)
case x :: xs => go(xs, json + "," + x.toString)
case _ => None
}
go(list, "")
}
Assuming you want implicit conversion (like in flavian's answer), the easiest way to join the elements of your list with commas is:
private implicit def buildJsonString(list: List[DBObject]): String =
list.mkString(",")
Which is basically the answer given in Scala: join an iterable of strings
If you want to include the square brackets to properly construct a JSON array you'd just change it to:
list.mkString("[", ",", "]") // punctuation madness
However, if you'd actually like to get Play JsValue elements, as you seem to indicate in the original question, then you could do:
list.map { x => Json.parse(x.toString) }
Which should produce a List[JsValue] instead of a String. However, if you're just going to convert it back to a string again when sending a response, then it's an unneeded step.
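The joining step can be tried without Casbah or Play on the classpath. In this sketch plain strings stand in for the x.toString output of each DBObject, which the answer above relies on being JSON; JsonArrayJoin and toJsonArray are hypothetical names.

```scala
object JsonArrayJoin {
  /** Wraps already-serialized JSON documents in a JSON array. */
  def toJsonArray(docs: Seq[String]): String =
    docs.mkString("[", ",", "]") // prefix, separator, suffix
}
```

An empty sequence comes out as "[]", so no special case is needed for a query without results.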

In Scala, how to find an element in CSV by a pair of key values?

For example, from the following file:
Name,Surname,E-mail
John,Smith,john.smith@hotmail.com
Nancy,Smith,nancy.smith@gmail.com
Jane,Doe,jane.doe@aol.com
John,Doe,john.doe@yahoo.com
how do I get the e-mail address of John Doe?
I currently use the following code, but it can match on only one key field:
val src = Source.fromFile(file)
val iter = src.getLines().drop(1).map(_.split(","))
var quote = ""
iter.find( _(1) == "Doe" ) foreach (a => println(a(2)))
src.close()
I've tried writing "iter.find( _(0) == "John" && _(1) == "Doe" )", but this raises an error saying that only one parameter is expected (enclosing the condition into extra pair of parentheses does not help).
The underscore as a placeholder for a parameter to a lambda doesn't work the way that you think.
a => println(a)
// is equivalent to
println(_)
(a,b) => a + b
// is equivalent to
_ + _
a => a + a
// is not equivalent to
_ + _
That is, the first underscore means the first parameter and the second one means the second parameter and so on. So that's the reason for the error that you're seeing -- you're using two underscores but have only one parameter. The fix is to use the explicit version:
iter.find( a=> a(0) == "John" && a(1) == "Doe" )
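A small sketch of the same rule in compilable form: two underscores make a two-parameter function, while using one parameter twice needs a named parameter. The rows and addresses below are invented sample data, and UnderscoreDemo and emailOf are hypothetical names.

```scala
object UnderscoreDemo {
  // Each underscore stands for a different parameter, so this takes two arguments.
  val add: (Int, Int) => Int = _ + _

  // Using the same parameter twice requires naming it.
  val twice: Int => Int = a => a + a

  // Hypothetical rows, shaped like CSV lines after split(",").
  val rows: List[Array[String]] = List(
    Array("John", "Smith", "john.smith@example.com"),
    Array("John", "Doe", "john.doe@example.com")
  )

  /** Finds the e-mail for a given first name and surname, if any row matches both. */
  def emailOf(first: String, last: String): Option[String] =
    rows.find(a => a(0) == first && a(1) == last).map(_(2))
}
```

The final map(_(2)) is fine with an underscore because the parameter is used exactly once.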
You can use Regex:
scala> def getRegex(v1: String, v2: String) = (v1 + "," + v2 +",(\\S+)").r
getRegex: (v1: String,v2: String)scala.util.matching.Regex
scala> val src = """John,Smith,john.smith@hotmail.com
| Nancy,Smith,nancy.smith@gmail.com
| Jane,Doe,jane.doe@aol.com
| John,Doe,john.doe@yahoo.com
| """
src: java.lang.String =
John,Smith,john.smith@hotmail.com
Nancy,Smith,nancy.smith@gmail.com
Jane,Doe,jane.doe@aol.com
John,Doe,john.doe@yahoo.com
scala> val MAIL = getRegex("John","Doe")
MAIL: scala.util.matching.Regex = John,Doe,(\S+)
scala> val itr = src.lines
itr: Iterator[String] = non-empty iterator
scala> for(MAIL(address) <- itr) println(address)
john.doe@yahoo.com
scala>
You could also do a pattern match on the result of split in a for comprehension.
val firstName = "John"
val surName = "Doe"
val emails = for {
Array(`firstName`, `surName`, email) <-
src.getLines().drop(1) map { _ split ',' }
} yield { email }
println(emails.mkString(","))
Note the backticks in the pattern: this means we match on the value of firstName instead of introducing a new variable that matches anything and shadows the val firstName.
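The difference the backticks make can be seen side by side in the sketch below, which uses collect instead of a for comprehension so that both variants are easy to compare. The rows and addresses are invented sample data, and BacktickDemo and its method names are hypothetical.

```scala
object BacktickDemo {
  // Hypothetical rows, shaped like CSV lines after split(',').
  val rows: List[Array[String]] = List(
    Array("John", "Smith", "john.smith@example.com"),
    Array("John", "Doe", "john.doe@example.com")
  )

  /** Backticks: the pattern compares against the values of first and last. */
  def withBackticks(first: String, last: String): List[String] =
    rows.collect { case Array(`first`, `last`, email) => email }

  /** No backticks: first and last are fresh pattern variables that match anything
    * and shadow the method parameters, so every three-column row is accepted. */
  def withoutBackticks(first: String, last: String): List[String] =
    rows.collect { case Array(first, last, email) => email }
}
```

Calling both with "John" and "Doe" returns one address from the first method and both addresses from the second.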