how to extract part of string that did not match pattern - scala

I want to extract the part of a string that did not match a pattern.
My pattern matching condition is that the string should be of length 5 and should contain only N or Y.
Ex:
NYYYY => valid
NY => Invalid , length is invalid
NYYSY => Invalid. character at position 3 is invalid
If the string is invalid, I want to find out which particular character did not match. Ex: in NYYSY the 4th character (index 3) did not match.
I tried with pattern matching in scala
val Pattern = "([NY]{5})".r
paramList match {
case Pattern(c) => true
case _ => false
}

Returns a String indicating validation status.
def validate(str :String, len :Int, cs :Seq[Char]) :String = {
val checkC = cs.toSet
val errs = str.zipAll(Range(0,len), 1.toChar, -1).flatMap{ case (c,x) =>
if (x < 0) Some("too long")
else if (checkC(c)) None
else if (c == 1) Some("too short")
else Some(s"'$c' at index $x")
}
str + ": " + (if (errs.isEmpty) "valid" else errs.distinct.mkString(", "))
}
testing:
validate("NTYYYNN", 4, "NY") //res0: String = NTYYYNN: 'T' at index 1, too long
validate("NYC", 7, "NY") //res1: String = NYC: 'C' at index 2, too short
validate("YNYNY", 5, "NY") //res2: String = YNYNY: valid

Here's one approach that returns a list of (Char, Int) tuples of invalid characters and their corresponding positions in a given string:
def checkString(validChars: List[Char], validLength: Int, s: String) = {
val Pattern = s"([${validChars.mkString}]{$validLength})".r
s match {
case Pattern(_) => Vector.empty[(Char, Int)]
case s =>
val invalidList = s.zipWithIndex.filter{case (c, _) => !validChars.contains(c)}
if (invalidList.nonEmpty) invalidList else Vector(('\u0000', -1))
}
}
List("NYYYY", "NY", "NNSYYTN").map(checkString(List('N', 'Y'), 5, _))
// res1: List(Vector(), Vector((?,-1)), Vector((S,2), (T,5)))
As shown above, an empty Vector represents a valid string, and a Vector containing (null-char, -1) means the string has only valid characters but an invalid length.

Here is one suggestion which might suit your needs:
"NYYSY".split("(?<=[^NY])|(?=[^NY])").foreach(println)
NYY
S
Y
This solution splits the input string at any point when either the preceding or following character is not a Y or a N. This places each island of valid and invalid characters as separate rows in the output.
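If the positions of the offending characters are needed rather than the split pieces, a small variation is to search for the invalid characters directly. This is only a sketch: invalidPositions is a hypothetical helper name, and the valid alphabet N/Y is hard-coded as in the question.

```scala
// Hypothetical helper: report every character that is not N or Y,
// together with its zero-based index in the input.
val InvalidChar = "[^NY]".r

def invalidPositions(s: String): List[(Int, String)] =
  InvalidChar.findAllMatchIn(s).map(m => (m.start, m.matched)).toList
```

For "NYYSY" this yields a single entry for S at index 3. Note that it does not check the length, so that would still need a separate test.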

You can use additional regular expressions to detect the specific issue:
val Pattern = "([NY]{5})".r
val TooLong = "([NY]{5})(.+)".r
val WrongChar = "([NY]*)([^NY].*)".r
paramList match {
case Pattern(c) => // Good
case TooLong(head, rest) => // Extra character(s) in sequence
case WrongChar(head, rest) => // Wrong character in sequence
case _ => // Too short
}
You can work out the index of the error using head.length and the failing character is rest.head.
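Putting that together, the match can be wrapped in a function that returns a description of the problem. This is a sketch under the question's assumptions (length 5, alphabet N/Y); the function name diagnose and the message wording are made up for illustration.

```scala
// Hypothetical wrapper around the three patterns above.
def diagnose(s: String): String = {
  val Valid     = "([NY]{5})".r
  val TooLong   = "([NY]{5})(.+)".r
  val WrongChar = "([NY]*)([^NY].*)".r
  s match {
    case Valid(_) =>
      "valid"
    case TooLong(head, rest) =>
      // head is the valid 5-character prefix; rest starts at the first extra character
      s"too long: unexpected '${rest.head}' at index ${head.length}"
    case WrongChar(head, rest) =>
      // head is the valid prefix; rest starts at the first bad character
      s"invalid character '${rest.head}' at index ${head.length}"
    case _ =>
      "too short"
  }
}
```

Only the first problem is reported: a string that is both too long and contains a bad character is classified by whichever case matches first.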

You can achieve this by pattern matching on each character of the string, without using any regex or complex string manipulation.
def check(value: String): Unit = {
if(value.length!=5) println(s"$value length is invalid.")
else value.foldLeft((0, Seq[String]())){
case (r, char) =>
char match {
case 'Y' | 'N' => r._1+1 -> r._2
case c @ _ => r._1+1 -> {r._2 ++ List(s"Invalid character `$c` in position ${r._1}")}
}
}._2 match {
case Nil => println(s"$value is valid.")
case errors: List[String] => println(s"$value is invalid - [${errors.mkString(", ")}]")
}
}
check("NYCNBNY")
NYCNBNY length is invalid.
check("NYCNB")
NYCNB is invalid - [Invalid character `C` in position 2, Invalid character `B` in position 4]
check("NYNNY")
NYNNY is valid.

Related

How can I emit periodic results over an iteration?

I might have something like this:
val found = source.toCharArray.foreach{ c =>
// Process char c
// Sometimes (e.g. on newline) I want to emit a result to be
// captured in 'found'. There may be 0 or more captured results.
}
This shows my intent. I want to iterate over some collection of things. Whenever the need arises I want to "emit" a result to be captured in found. It's not a direct 1-for-1 like map. collect() is a "pull", applying a partial function over the collection. I want a "push" behavior, where I visit everything but push out something when needed.
Is there a pattern or collection method I'm missing that does this?
Apparently, you have a Collection[Thing], and you want to obtain a new Collection[Event] by emitting a Collection[Event] for each Thing. That is, you want a function
(Collection[Thing], Thing => Collection[Event]) => Collection[Event]
That's exactly what flatMap does.
You can write it down with nested fors where the second generator defines what "events" have to be "emitted" for each input from the source. For example:
val input = "a2ba4b"
val result = (for {
c <- input
emitted <- {
if (c == 'a') List('A')
else if (c.isDigit) List.fill(c.toString.toInt)('|')
else Nil
}
} yield emitted).mkString
println(result)
prints
A||A||||
because each 'a' emits an 'A', each digit emits the right amount of tally marks, and all other symbols are ignored.
There are several other ways to express the same thing, for example, the above expression could also be rewritten with an explicit flatMap and with a pattern match instead of if-else:
println(input.flatMap{
case 'a' => "A"
case d if d.isDigit => "|" * (d.toString.toInt)
case _ => ""
})
I think you are looking for a way to build a Stream for your condition. Streams are lazy and are computed only when required.
val sourceString = "sdfdsdsfssd\ndfgdfgd\nsdfsfsggdfg\ndsgsfgdfgdfg\nsdfsffdg\nersdff\n"
val sourceStream = sourceString.toCharArray.toStream
def foundStreamCreator( source: Stream[Char], emmitBoundaryFunction: Char => Boolean): Stream[String] = {
def loop(sourceStream: Stream[Char], collector: List[Char]): Stream[String] =
sourceStream.isEmpty match {
case true => collector.mkString.reverse #:: Stream.empty[String]
case false => {
val char = sourceStream.head
emmitBoundaryFunction(char) match {
case true =>
collector.mkString.reverse #:: loop(sourceStream.tail, List.empty[Char])
case false =>
loop(sourceStream.tail, char :: collector)
}
}
}
loop(source, List.empty[Char])
}
val foundStream = foundStreamCreator(sourceStream, c => c == '\n')
val foundIterator = foundStream.toIterator
foundIterator.next()
// res0: String = sdfdsdsfssd
foundIterator.next()
// res1: String = dfgdfgd
foundIterator.next()
// res2: String = sdfsfsggdfg
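The same lazy, on-demand behaviour can also be sketched with a plain Iterator, which avoids depending on Stream (deprecated in Scala 2.13 in favour of LazyList). The name emitChunks and its parameters are hypothetical, not part of the answer above.

```scala
// Hypothetical helper: lazily emit one String per boundary-delimited chunk.
// Nothing is consumed from `source` until next() is called.
def emitChunks(source: Iterator[Char], isBoundary: Char => Boolean): Iterator[String] =
  new Iterator[String] {
    def hasNext: Boolean = source.hasNext
    def next(): String = {
      val sb = new StringBuilder
      var done = false
      // Read characters until a boundary is seen or the input runs out.
      while (!done && source.hasNext) {
        val c = source.next()
        if (isBoundary(c)) done = true else sb += c
      }
      sb.toString
    }
  }
```

Calling emitChunks(sourceString.iterator, _ == '\n') gives an iterator whose next() returns one line at a time. Unlike the Stream version, it does not emit a trailing empty string when the input ends with a boundary.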
It looks like foldLeft to me:
val found = ((List.empty[String], "") /: source.toCharArray) {case ((agg, tmp), char) =>
if (char == '\n') (tmp :: agg, "") // <- emit
else (agg, tmp + char)
}._1
Where you keep collecting items in a temporary location and then emit it when you run into a character signifying something. Since I used List you'll have to reverse at the end if you want it in order.
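A sketch of the same fold with the final reverse included, and with any text after the last newline also emitted (the snippet above drops it). The name emitLines is hypothetical.

```scala
// Hypothetical helper: fold over the characters, emitting the pending
// text whenever a newline is seen.
def emitLines(source: String): List[String] = {
  val (emitted, pending) = source.foldLeft((List.empty[String], "")) {
    case ((acc, cur), '\n') => (cur :: acc, "") // emit
    case ((acc, cur), c)    => (acc, cur + c)   // keep collecting
  }
  // Flush text after the last newline, then restore input order.
  (if (pending.nonEmpty) pending :: emitted else emitted).reverse
}
```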

How to exclude a string from parsed text file using Scala?

A sample text file looks like this:
Date: Nov 12, 2004
Support_Addresses: Support@microsoft.com, suport@yahoo.com,
google@gmail.com,
support@comcast.net
Notes: Need to renew support contracts for software and services.
Expected output is:
Nov 12, 2004
Support@microsoft.com, suport@yahoo.com, google@gmail.com, support@comcast.net
Need to renew support contracts for software and services.
Basically, I need to exclude the field titles from the lines, so things like “Date:” , “Support_Addresses: “ and “Notes: “ are removed from the lines before they are saved to a CSV file. I have this code from other projects:
val support_agreements = lines
.dropWhile(line => !line.startsWith("Support_Addresses: "))
.takeWhile(line => !line.startsWith("Notes: "))
.flatMap(_.split(","))
.map(_.trim())
.filter(_.nonEmpty)
.mkString(", ")
But it does not remove the field titles/names. I am using startsWith, but it includes the field name. How can I exclude the field name from the line?
This should do it:
text.lines.map{ line =>
line.indexOf(':') match {
case x if x > 0 =>
line.substring(x + 1).trim
case _ => line.trim
}
}.mkString("\n")
It iterates through the lines and, if a line contains a colon, keeps only the text after the first colon; lines without a colon are just trimmed.
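The per-line step can be pulled out into a small function, which makes it easy to try on individual lines. This is a sketch: stripLabel and stripLabels are hypothetical names, and split("\n") is used instead of lines only to avoid depending on which lines method the String resolves to on a given Scala/JDK combination.

```scala
// Hypothetical helper: drop everything up to and including the first colon.
def stripLabel(line: String): String = {
  val i = line.indexOf(':')
  if (i > 0) line.substring(i + 1).trim else line.trim
}

// Apply the helper to every line of a multi-line text.
def stripLabels(text: String): String =
  text.split("\n").map(stripLabel).mkString("\n")
```

One caveat of this approach: a continuation line that happens to contain a colon would also be cut at that colon, and continuation lines are not joined onto the line they belong to.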
Here's what I came up with. It builds a map of data m which could be manipulated usefully. This is then printed in the form you wanted.
def processValue(s: String): List[String] =
s.split(",").toList.map(_.trim).filterNot(_.isEmpty)
val retros = lines.foldLeft(List.empty[(String, List[String])]) {
case (acc, l) =>
l.indexOf(':') match {
case -1 =>
acc match {
case Nil => acc // ???
case h :: t => (h._1, h._2 ++ processValue(l)) :: t
}
case n =>
val key = l.substring(0, n).trim
val value = processValue(l.substring(n+1))
(key, value) :: acc
}
}
val m = retros.reverse.toMap
m.values.map(_.mkString(", ")).foreach(println)

Regex to extract part of string between parenthesis

I have the string below and I want to extract only the List((...), (...)) part from it. The line has many other characters before and after the part I want to extract.
some garbage string PARAMS=List((foo, bar), (foo1, bar1)) some garbage string
I have tried these regex
(?:PARAMS)=(List\((.*?)\))
(?:PARAMS)=(List\(([^)]+)\))
but they give me the output below in group(1):
List((foo, bar)
The regex .*List\((.*)\).* works.
You can use a Scala regex together with pattern matching, then split on any of (, a comma, or ) and group the results.
A Scala Regex provides an extractor:
val r = """.*List\((.*)\).*""".r
Pattern matching using the regex extractor:
val result = str match {
case r(value) => value
case _ => ""
}
Then split on any of (, a comma, or ), and group the pieces into pairs:
result.split("""[(|,|)]""").filterNot(s => s.isEmpty || s.trim.isEmpty)
.grouped(2)
.toList
.map(pair => (pair(0), pair(1))).toList
Scala REPL
scala> val str = """some garbage string PARAMS=List((foo, bar), (foo1, bar1)) some garbage string"""
str: String = "some garbage string PARAMS=List((foo, bar), (foo1, bar1)) some garbage string"
scala> val r = """.*List\((.*)\).*""".r
r: util.matching.Regex = .*List\((.*)\).*
scala> val result = str match {
case r(value) => value
case _ => ""
}
result: String = "(foo, bar), (foo1, bar1)"
scala> result.split("""[(|,|)]""").filterNot(s => s.isEmpty || s.trim.isEmpty).grouped(2).toList.map(pair => (pair(0), pair(1))).toList
res46: List[(String, String)] = List(("foo", " bar"), ("foo1", " bar1"))
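As the REPL output shows, the second element of each pair keeps its leading space. An alternative sketch is to match each "(a, b)" pair directly and let the regex discard the surrounding whitespace. The names Pair and extractPairs are hypothetical, and this assumes the elements themselves contain no commas or parentheses.

```scala
// Hypothetical pattern: one parenthesised pair, whitespace around elements ignored.
val Pair = """\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)""".r

// Collect every pair found anywhere in the input line.
def extractPairs(s: String): List[(String, String)] =
  Pair.findAllMatchIn(s).map(m => (m.group(1), m.group(2))).toList
```

Note that this picks up any parenthesised pair in the line, so if the surrounding garbage can also contain text like "(x, y)", the PARAMS=List(...) part should be isolated first with the regex shown earlier.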

Spray routing filter path parameter

Given this snippet of code:
val passRoute = (path("passgen" / IntNumber) & get) { length =>
complete {
if(length > 0){
logger.debug(s"new password generated of length $length")
newPass(length)
}
else {
logger.debug("using default length 8 when no length specified")
newPass(8)
}
}
}
How could I replace the if-else with a match-case pattern, possibly also using an Option with Some/None?
My aim is to filter on the length and handle three cases: the length exists and is an Int, the length is missing, or the value is something other than an Int.
I have tried this but it does not work.
val passRoute = (path("passgen" / IntNumber) & get) { length =>
complete {
length match {
case Some(match: Int) => print("Is an int")
case None => print("length is missing")
//missing the case for handling non int but existent values
// can be case _ => print("non int") ???
}
}
}
My guess is that in your non-working code, length is still an Int, and hence does not match with either Some or None. If you wanted to translate your if-else code to a match-statement, I'd suggest something similar to the following code, which matches for positive Int values:
List(10, -1, 3, "hello", 0, -2) foreach {
case length: Int if length > 0 => println("Found length " + length)
case _ => println("Length is missing")
}
If you want to be fancy, you can also define a custom extractor:
object Positive {
def unapply(i: Int): Option[Int] = if (i > 0) Some(i) else None
}
List(10, -1, 3, "hello", 0, -2) foreach {
case Positive(length) => println("Found length " + length)
case _ => println("Length is missing")
}
And if you somehow do have Option values, the following should work:
List(10, -1, 3, "hello", 0, -2) map (Some(_)) foreach {
case Some(length: Int) if length > 0 => println("Found length " + length)
case _ => println("Length is missing")
}
All of those snippets print
Found length 10
Length is missing
Found length 3
Length is missing
Length is missing
Length is missing
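Applied to the route in the question, the length decision can be isolated in a plain function and called from inside complete. This is a sketch, assuming (as the answer above guesses) that length is always an Int by the time the route body runs; effectiveLength is a hypothetical name and the default of 8 is taken from the question.

```scala
// Hypothetical helper: choose the requested length if positive, else the default.
def effectiveLength(length: Int, default: Int = 8): Int = length match {
  case n if n > 0 => n
  case _          => default
}
```

Inside the route this would be used as newPass(effectiveLength(length)). The same thing can be written with Option as Option(length).filter(_ > 0).getOrElse(8).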

Why does scala complain when given this pattern match on an integral value?

Goal: Write a function that generates a new String excluding a specified character (identified by the index)
Example:
takeAllExcept(0, "abc") returns bc
takeAllExcept(1, "abc") returns ac
takeAllExcept(2, "abc") returns ab
What I did initially:
def takeAllExcept( index: Int, s: String ): String = {
val lastIndex = s.length()-1
index match {
case 0 => return s.slice(1, s.length)
case lastIndex => return s.slice(0, s.length()-1)
case _ => { s.slice(0, index) + s.slice(index+1, s.length) }
}
}
The compiler complains that the statement block for case _ is unreachable.
How I fixed it
def takeAllExcept( index: Int, s: String ): String = {
val lastIndex = s.length()-1
if( index == 0 )
return s.slice(1, s.length)
if( index == lastIndex )
return s.slice(0, s.length()-1)
s.slice(0, index) + s.slice(index+1, s.length)
}
I want to know why my initial attempt failed with the unreachable code. It looks legit to me. Also, is there an in-built facility in scala that already does this ?
lastIndex in the pattern is an implicit declaration of a new name that is bound to whatever value is put into the match and shadows the already defined lastIndex, as the other two posts already pointed out. There are two other possibilities instead of using upper case identifiers (see Peter's post):
Using backticks to let the compiler know that this shall not be a declaration of a new identifier:
case `lastIndex` => ...
Using pattern guards:
case x if x == lastIndex => ...
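For completeness, a sketch of the original function with the backtick fix applied. The logic is the asker's; only the pattern is changed, and the slices are written with drop/take/dropRight for brevity.

```scala
def takeAllExcept(index: Int, s: String): String = {
  val lastIndex = s.length - 1
  index match {
    case 0           => s.drop(1)
    // Backticks make this compare against the existing val
    // instead of binding a new variable.
    case `lastIndex` => s.dropRight(1)
    case _           => s.take(index) + s.drop(index + 1)
  }
}
```

With the backticks in place the last case is reachable again, so the compiler warning goes away.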
If you want to do a lot of index-based removal on strings, it would be faster to call toBuffer on the string and then use the remove(i: Int) method of Buffer. That is slower for a single operation, because you have to convert the Buffer back to a String when you're done, but it is a lot faster if you do many random-access operations. When you're done, call mkString on the Buffer to get your String back. For a single removal I would do it like Peter suggested, or here is an alternative:
def takeAllExcept(i: Int, s: String) = s.take(i) + s.drop(i+1)
Your first question:
def takeAllExcept( index: Int, s: String ): String = {
val LastIndex = s.length()-1
index match {
case 0 => return s.slice(1, s.length)
case LastIndex => return s.slice(0, s.length()-1)
case _ => { s.slice(0, index) + s.slice(index+1, s.length) }
}
}
The lastIndex after the case is newly bound while pattern matching and hides the definition of val lastIndex = s.length()-1. As my example shows, you can use upper case names, then scala uses a defined val in scope.
To answer your second question in a way I would solve it:
def takeAllExcept(i: Int, s: String): String = {
val (prefix,suffix) = s.splitAt(i)
prefix + suffix.tail
}
val lastIndex = s.length()-1
index match {
case 0 => return s.slice(1, s.length)
case lastIndex => return s.slice(0, s.length()-1)
case _ => { s.slice(0, index) + s.slice(index+1, s.length) }
}
The second clause does not try to match index with lastIndex as you would have expected from e.g. Prolog. Instead it matches any value and binds the value to the name lastIndex, shadowing the previous binding of this variable.