How can I match a single String against an Array of String patterns in Scala? - scala

val aggFilters = Array("IR*","IR_")
val aggCodeVal = "IR_CS_BPV"
val flag = compareFilters(aggFilters,aggCodeVal)
I want to compare aggCodeVal against each pattern in aggFilters. The first pattern, "IR*", matches "IR_CS_BPV" but the second, "IR_", does not, so I want to leave the for loop as soon as a match is found instead of going on to test "IR_". I don't want to use a break statement as in Java.
def compareFilters(aggFilters: Array[String], aggCodeVal: String): Boolean = {
  var flag: Boolean = false
  for (aggFilter <- aggFilters) {
    if (aggFilter.endsWith("*")
      && aggCodeVal.startsWith(aggFilter.substring(0, aggFilter.length() - 1))) {
      flag = true
    }
    else if (aggFilter.startsWith("*")
      && aggCodeVal.startsWith(aggFilter.substring(1, aggFilter.length()))) {
      flag = true
    }
    else if (((aggFilter startsWith "*")
      && aggFilter.endsWith("*"))
      && aggCodeVal.startsWith(aggFilter.substring(1, aggFilter.length() - 1))) {
      flag = true
    }
    else if (aggFilter.equals(aggCodeVal)) {
      flag = true
    }
    else {
      flag = false
    }
  }
  flag
}

If * is your only wild-card character, you should be able to leverage Regex to do your match testing.
def compareFilters(aggFilters: Array[String], aggCodeVal: String): Boolean =
aggFilters.exists(f => s"$f$$".replace("*",".*").r.findAllIn(aggCodeVal).hasNext)
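One caveat with the one-liner above: findAllIn looks for the pattern anywhere in the string, so "IR*" would also accept "XIR_CS_BPV", and any other regex metacharacter in a filter (such as "." or "+") is read as regex syntax. Here is a minimal sketch that quotes the literal pieces and requires a full match, assuming * should mean "any run of characters". WildcardRegex and toRegex are names made up for this example.

```scala
import java.util.regex.Pattern

object WildcardRegex {
  // Hypothetical helper: turn a filter such as "IR*" into a regex.
  // Literal pieces are quoted, and each '*' becomes ".*".
  def toRegex(filter: String): Pattern = {
    // limit -1 keeps trailing empty pieces, so a trailing '*' is not lost
    val pieces = filter.split("\\*", -1).map(piece => Pattern.quote(piece))
    Pattern.compile(pieces.mkString(".*"))
  }

  // matches() requires the whole input to match, so no explicit ^ or $ is needed.
  // exists stops at the first filter that matches.
  def compareFilters(aggFilters: Array[String], aggCodeVal: String): Boolean =
    aggFilters.exists(f => toRegex(f).matcher(aggCodeVal).matches())
}
```

With this version, "IR*" accepts "IR_CS_BPV" but rejects "XIR_CS_BPV", and a filter like "I.*" is treated as the literal prefix "I." rather than as a regex.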

You can use the built-in exists method to do it for you.
Extract a function that compares a single filter, with this signature:
def compareFilter(aggFilter: String, aggCodeVal: String): Boolean
And then:
def compareFilters(aggFilters: Array[String], aggCodeVal: String): Boolean = {
aggFilters.exists(filter => compareFilter(filter, aggCodeVal))
}
The implementation of compareFilter, BTW, can be shortened to something like:
def compareFilter(aggFilter: String, aggCodeVal: String): Boolean = {
(aggFilter.startsWith("*") && aggFilter.endsWith("*") && aggCodeVal.startsWith(aggFilter.drop(1).dropRight(1))) ||
(aggFilter.endsWith("*") && aggCodeVal.startsWith(aggFilter.dropRight(1))) ||
(aggFilter.startsWith("*") && aggCodeVal.startsWith(aggFilter.drop(1))) ||
aggFilter.equals(aggCodeVal)
}
But - double check me on that one, not sure I followed your logic perfectly.
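If the intent is the usual glob-like reading, here is a sketch under that assumption (the question does not confirm it): "X*" means starts with X, "*X" means ends with X, and "*X*" means contains X. The both-ends case is tested first because a filter such as "*X*" also ends with "*". WildcardFilter is a made-up wrapper name.

```scala
object WildcardFilter {
  // Assumed semantics (not confirmed by the question):
  //   "*X*" -> contains X, "X*" -> starts with X, "*X" -> ends with X,
  //   anything else -> exact match.
  def compareFilter(aggFilter: String, aggCodeVal: String): Boolean = {
    val leading  = aggFilter.startsWith("*")
    val trailing = aggFilter.endsWith("*")
    if (leading && trailing && aggFilter.length >= 2)
      aggCodeVal.contains(aggFilter.drop(1).dropRight(1))
    else if (trailing) aggCodeVal.startsWith(aggFilter.dropRight(1))
    else if (leading) aggCodeVal.endsWith(aggFilter.drop(1))
    else aggFilter == aggCodeVal
  }

  // exists stops at the first matching filter, so no break is needed.
  def compareFilters(aggFilters: Array[String], aggCodeVal: String): Boolean =
    aggFilters.exists(compareFilter(_, aggCodeVal))
}
```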

Related

Shorten MQTT topic filtering function

I wrote the following logic function, but I am sure it is possible to write it (way) shorter.
In case you are unfamiliar with MQTT wildcards, you can read up on them here.
self is the topic we are "subscribed" to, containing zero or more wildcards. incoming is the topic we received something on, which must match the self topic either fully, or conforming to the wildcard rules.
All my tests on this function succeed, but I just don't like the lengthiness and "iffyness" of this Scala function.
def filterTopic(incoming: String, self: String): Boolean = {
if (incoming == self || self == "#") {
true
} else if (self.startsWith("#") || (self.contains("#") && !self.endsWith("#")) || self.endsWith("+")) {
false
} else {
var valid = true
val selfSplit = self.split('/')
var j = 0
for (i <- selfSplit.indices) {
if (selfSplit(i) != "+" && selfSplit(i) != "#" && selfSplit(i) != incoming.split('/')(i)) {
valid = false
}
j += 1
}
if (j < selfSplit.length && selfSplit(j) == "#") {
j += 1
}
j == selfSplit.length && valid
}
}
Here's a shot at it, assuming that '+' can be at the end and that the topics are otherwise well-structured:
def filterTopic(incoming: String, self: String): Boolean = {
  // helper function that works on lists of parts of the topics
  def go(incParts: List[String], sParts: List[String]): Boolean = (incParts, sParts) match {
    // if they're equivalent lists, the topics match
    case (is, ss) if is == ss => true
    // if sParts is just a single "#", the topics match
    case (_, "#" :: Nil) => true
    // if sParts starts with '+', just check if the rest match
    case (_ :: is, s :: ss) if s == "+" =>
      go(is, ss)
    // otherwise the first parts have to match, and we check the rest
    case (i :: is, s :: ss) if i == s =>
      go(is, ss)
    // otherwise they don't match
    case _ => false
  }
  // split the topic strings into parts
  go(incoming.split('/').toList, self.split('/').toList)
}
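To make the behaviour concrete, here is the same recursive idea restated compactly so it can be run on its own, followed by a few sample checks. This is a sketch under the same assumptions (well-structured topics, '+' allowed at the end). TopicMatch and matches are names made up for this example.

```scala
import scala.annotation.tailrec

object TopicMatch {
  // Compare the topic and the filter level by level.
  def matches(incoming: String, filter: String): Boolean = {
    @tailrec
    def go(inc: List[String], flt: List[String]): Boolean = (inc, flt) match {
      case (_, "#" :: Nil)              => true       // trailing multi-level wildcard
      case (_ :: is, "+" :: fs)         => go(is, fs) // single-level wildcard
      case (i :: is, f :: fs) if i == f => go(is, fs) // literal level must be equal
      case (Nil, Nil)                   => true       // both exhausted together
      case _                            => false
    }
    go(incoming.split('/').toList, filter.split('/').toList)
  }
}
```

For example, matches("a/b/c", "a/+/c") and matches("a/b/c", "a/#") both hold, while matches("a/b/c", "a/+") does not, because the filter runs out one level early.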

Algorithms to check if string has unique characters in Scala. Why is O(N²) version quicker?

I implemented two versions of the algorithm in Scala (without using sets, by the way). A first one :
def isContained(letter: Char, word: String): Boolean =
if (word == "") false
else if (letter == word.head) true
else isContained(letter, word.tail)
def hasUniqueChars(stringToCheck: String): Boolean =
if (stringToCheck == "") true
else if (isContained(stringToCheck.head, stringToCheck.tail)) false
else hasUniqueChars(stringToCheck.tail)
which is in O(N²).
And a second one :
def hasUniqueChars2Acc(str: String, asciiTable: List[Boolean]): Boolean = {
if (str.length == 0) true
else if (asciiTable(str.head.toByte)) false
else hasUniqueChars2Acc(str.tail, asciiTable.updated(str.head.toByte, true))
}
def hasUniqueChars2(str: String): Boolean = {
val virginAsciiTable = List.fill(128)(false)
if (str.length > 128) false
else hasUniqueChars2Acc(str, virginAsciiTable)
}
which is in O(N).
But when testing, the second version takes as much as 20 times the duration of the first one. Why? Is it related to the .updated method ?
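One likely factor, offered as a hypothesis rather than a measured result: indexing into a List and calling List.updated are both linear in the index, and on current JVMs str.tail builds a new string on every call. So the second version does much more work per character than its O(N) label suggests. Here is a sketch that uses a mutable Array[Boolean] and an index loop instead, assuming ASCII-only input as in the question. UniqueChars and hasUniqueCharsArray are names made up for this example.

```scala
object UniqueChars {
  // Assumes every character is 7-bit ASCII (code < 128), as the question does;
  // any other character would index outside the table.
  def hasUniqueCharsArray(str: String): Boolean = {
    if (str.length > 128) false // pigeonhole: more chars than distinct ASCII codes
    else {
      val seen = new Array[Boolean](128) // constant-time read and write
      var i = 0
      var unique = true
      while (unique && i < str.length) {
        val code = str.charAt(i).toInt
        if (seen(code)) unique = false else seen(code) = true
        i += 1
      }
      unique
    }
  }
}
```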

Using a predicate to search through a string - Scala

I'm having difficulty figuring out how to search through a string with a given predicate and determining its position in the string.
def find(x: Char => Boolean): Boolean = {
}
Example, if x is (_ == ' ')
String = "hi my name is"
It would add 2 to the counter and return true
I'm guessing that this is what you want...
Since find is a higher-order function (HOF) - that is, it's a function that takes a function as an argument - it likely needs to be applied to a String instance. The predicate (the function argument to find) determines when the character you're looking for is found, and the find method reports the position at which the character was found. So find should return an Option[Int], not a Boolean, that way you don't lose the information about where the character was found. Note that you can still change an Option[Int] result to a Boolean value (with true indicating the search was successful, false not) by applying .isDefined to the result.
Note that I've renamed find to myFind to avoid a clash with the built-in String.find method (which does a similar job).
import scala.annotation.tailrec
// Implicit class cannot be a top-level element, so it's put in an object.
object StringUtils {
// "Decorate" strings with additional functions.
final implicit class MyRichString(val s: String)
extends AnyVal {
// Find a character satisfying predicate p, report position.
def myFind(p: Char => Boolean): Option[Int] = {
// Helper function to keep track of current position.
@tailrec
def currentPos(pos: Int): Option[Int] = {
// If we've passed the end of the string, return None. Didn't find a
// character satisfying predicate.
if(pos >= s.length) None
// Otherwise, if the predicate passes for the current character,
// return position wrapped in Some.
else if(p(s(pos))) Some(pos)
// Otherwise, perform another iteration, looking at the next character.
else currentPos(pos + 1)
}
// Start by looking at the first (0th) character.
currentPos(0)
}
}
}
import StringUtils._
val myString = "hi my name is"
myString.myFind(_ == ' ') // Should report Some(2)
myString.myFind(_ == ' ').isDefined // Should report true
myString.myFind(_ == 'X') // Should report None
myString.myFind(_ == 'X').isDefined // Should report false
If the use of an implicit class is a little too much effort, you could implement this as a single function that takes the String as an argument:
def find(s: String, p: Char => Boolean): Option[Int] = {
// Helper function to keep track of current position.
@tailrec
def currentPos(pos: Int): Option[Int] = {
// If we've passed the end of the string, return None. Didn't find a
// character satisfying predicate.
if(pos >= s.length) None
// Otherwise, if the predicate passes for the current character,
// return position wrapped in Some.
else if(p(s(pos))) Some(pos)
// Otherwise, perform another iteration, looking at the next character.
else currentPos(pos + 1)
}
// Start by looking at the first (0th) character.
currentPos(0)
}
val myString = "hi my name is"
find(myString, _ == ' ') // Should report Some(2)
find(myString, _ == ' ').isDefined // Should report true
find(myString, _ == 'X') // Should report None
find(myString, _ == 'X').isDefined // Should report false
Counter:
"hi my name is".count (_ == 'm')
"hi my name is".toList.filter (_ == 'i').size
Boolean:
"hi my name is".toList.exists (_ == 'i')
"hi my name is".contains ('j')
Position(s):
"hi my name is".zipWithIndex.filter {case (a, b) => a == 'i'}
res8: scala.collection.immutable.IndexedSeq[(Char, Int)] = Vector((i,1), (i,11))
Usage of find:
scala> "hi my name is".find (_ == 'x')
res27: Option[Char] = None
scala> "hi my name is".find (_ == 's')
res28: Option[Char] = Some(s)
I would suggest separating the character search and position into individual methods, each of which leverages built-in functions in String, and wrap in an implicit class:
object MyStringOps {
implicit class CharInString(s: String) {
def charPos(c: Char): Int = s.indexOf(c)
def charFind(p: Char => Boolean): Boolean =
s.find(p) match {
case Some(_) => true
case None => false
}
}
}
import MyStringOps._
"hi my name is".charPos(' ')
// res1: Int = 2
"hi my name is".charFind(_ == ' ')
// res2: Boolean = true
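For a predicate rather than a fixed character, the standard library's indexWhere already reports the position of the first match and returns -1 when there is none. A minimal sketch that wraps that result in an Option. FindPosition and findPos are names made up for this example.

```scala
object FindPosition {
  // indexWhere returns -1 when no character satisfies the predicate,
  // so map that sentinel to None.
  def findPos(s: String, p: Char => Boolean): Option[Int] = {
    val i = s.indexWhere(p)
    if (i >= 0) Some(i) else None
  }
}
```

Calling .isDefined on the result gives the Boolean form if only success or failure matters.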

Multiple if else statements to get non-empty value from a map in Scala

I have a string to string map, and its value can be an empty string. I want to assign a non-empty value to a variable to use it somewhere. Is there a better way to write this in Scala?
import scala.collection.mutable
var keyvalue = mutable.Map.empty[String, String]
keyvalue += ("key1" -> "value1")
var myvalue = ""
if (keyvalue.get("key1").isDefined &&
keyvalue("key1").length > 0) {
myvalue = keyvalue("key1")
}
else if (keyvalue.get("key2").isDefined &&
keyvalue("key2").length > 0) {
myvalue = keyvalue("key2")
}
else if (keyvalue.get("key3").isDefined &&
keyvalue("key3").length > 0) {
myvalue = keyvalue("key3")
}
A more idiomatic way would be to use filter to check the length of the string contained in the Option, then orElse and getOrElse to assign to a val. A crude example:
def getKey(key: String): Option[String] = keyvalue.get(key).filter(_.length > 0)
val myvalue: String = getKey("key1")
.orElse(getKey("key2"))
.orElse(getKey("key3"))
.getOrElse("")
Here's a similar way to do it with an arbitrary list of fallback keys. Using a view and collectFirst, we evaluate keyvalue.get only as many times as needed (or for every key, if nothing matches).
val myvalue: String = List("key1", "key2", "key3").view
.map(keyvalue.get)
.collectFirst { case Some(value) if(value.length > 0) => value }
.getOrElse("")
Mmm, it seems it took me too long to devise a generic solution and another answer was accepted, but here it goes:
def getOrTryAgain(map: mutable.Map[String, String], keys: List[String]): Option[String] =
{
if(keys.isEmpty)
None
else
map.get(keys.head).filter(_.length > 0).orElse(getOrTryAgain(map, keys.tail))
}
val myvalue2 = getOrTryAgain(keyvalue, List("key1", "key2", "key3"))
This one you can use to check for as many keys as you want.
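The same lookup can also be written without explicit recursion. This is a sketch, not a replacement for the answers above: an Iterator is lazy, so find stops at the first key that yields a non-empty value. FirstNonEmpty and firstNonEmpty are names made up for this example.

```scala
object FirstNonEmpty {
  // Hypothetical helper: returns the first non-empty value among the
  // given keys, trying them in order. Missing keys are skipped by flatMap.
  def firstNonEmpty(map: scala.collection.Map[String, String], keys: Seq[String]): Option[String] =
    keys.iterator.flatMap(k => map.get(k)).find(_.nonEmpty)
}
```

The parameter type scala.collection.Map accepts both mutable and immutable maps, so it works with the mutable keyvalue map from the question.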

Scala: Group an Iterable into an Iterable of Iterables by a predicate

I have very large Iterators that I want to split into pieces. I have a predicate that looks at an item and returns true if it is the start of a new piece. I need the pieces to be Iterators, because even the pieces will not fit into memory. There are so many pieces that I would be wary of a recursive solution blowing out your stack. The situation is similar to this question, but I need Iterators instead of Lists, and the "sentinels" (items for which the predicate is true) occur (and should be included) at the beginning of a piece. The resulting iterators will only be used in order, though some may not be used at all, and they should only use O(1) memory. I imagine this means they should all share the same underlying iterator. Performance is important.
If I were to take a stab at a function signature, it would be this:
def groupby[T](iter: Iterator[T])(startsGroup: T => Boolean): Iterator[Iterator[T]] = ...
I would have loved to use takeWhile, but it loses the last element. I investigated span, but it buffers results. My current best idea involves BufferedIterator, but maybe there is a better way.
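To illustrate why BufferedIterator helps: its head method inspects the next element without consuming it, so the element that starts the next piece is still there when the current piece ends. The sketch below collects a piece eagerly into a List purely to show the peeking, which is not what the question ultimately needs (lazy pieces). PeekDemo and firstGroup are names made up for this example.

```scala
import scala.collection.BufferedIterator

object PeekDemo {
  // Illustration only: materialises one group, whereas the real
  // requirement is for each group to stay an Iterator.
  def firstGroup[T](iter: BufferedIterator[T])(startsGroup: T => Boolean): List[T] = {
    val group = List.newBuilder[T]
    if (iter.hasNext) {
      group += iter.next() // the first element always opens the group
      // head peeks at the next element without consuming it
      while (iter.hasNext && !startsGroup(iter.head)) group += iter.next()
    }
    group.result()
  }
}
```

With Iterator(10, 11, 12, 20, 21).buffered and the predicate _ % 10 == 0, the first call collects 10, 11, 12 and leaves 20 as the head for the next call.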
You'll know you've got it right because something like this doesn't crash your JVM:
groupby((1 to Int.MaxValue).iterator)(_ % (Int.MaxValue / 2) == 0).foreach(group => println(group.sum))
groupby((1 to Int.MaxValue).iterator)(_ % 10 == 0).foreach(group => println(group.sum))
You have an inherent problem. Iterable implies that you can get multiple iterators. Iterator implies that you can only pass through once. That means that your Iterable[Iterable[T]] should be able to produce Iterator[Iterable[T]]s. But when this returns an element--an Iterable[T]--and that asks for multiple iterators, the underlying single iterator can't comply without either caching the results of the list (too big) or calling the original iterable and going through absolutely everything again (very inefficient).
So, while you could do this, I think you should conceive of your problem in a different way.
If you could start with a Seq instead, you could grab subsets as ranges.
If you already know how you want to use your iterable, you could write a method
def process[T](source: Iterable[T])(starts: T => Boolean)(handlers: T => Unit *)
which increments through the set of handlers each time starts fires off a "true". If there's any way you can do your processing in one sweep, something like this is the way to go. (Your handlers will have to save state via mutable data structures or variables, however.)
If you can permit iteration on the outer list to break the inner list, you could have an Iterable[Iterator[T]] with the additional constraint that once you iterate to a later sub-iterator, all previous sub-iterators are invalid.
Here's a solution of the last type (from Iterator[T] to Iterator[Iterator[T]]; one can wrap this to make the outer layers Iterable instead).
class GroupedBy[T](source: Iterator[T])(starts: T => Boolean)
extends Iterator[Iterator[T]] {
private val underlying = source
private var saved: T = _
private var cached = false
private var starting = false
private def cacheNext(): Unit = {
saved = underlying.next
starting = starts(saved)
cached = true
}
private def oops(): Nothing = throw new java.util.NoSuchElementException("empty iterator")
// Comment the next line if you do NOT want the first element to always start a group
if (underlying.hasNext) { cacheNext(); starting = true }
def hasNext = {
while (!(cached && starting) && underlying.hasNext) cacheNext()
cached && starting
}
def next = {
if (!(cached && starting) && !hasNext) oops()
starting = false
new Iterator[T] {
var presumablyMore = true
def hasNext = {
if (!cached && !starting && underlying.hasNext && presumablyMore) cacheNext()
presumablyMore = cached && !starting
presumablyMore
}
def next = {
if (presumablyMore && (cached || hasNext)) {
cached = false
saved
}
else oops()
}
}
}
}
Here's my solution using BufferedIterator. It doesn't let you skip iterators correctly, but it's fairly simple and functional. The first element(s) go into a group even if !startsGroup(first).
def groupby[T](iter: Iterator[T])(startsGroup: T => Boolean): Iterator[Iterator[T]] =
new Iterator[Iterator[T]] {
val base = iter.buffered
override def hasNext = base.hasNext
override def next() = Iterator(base.next()) ++ new Iterator[T] {
override def hasNext = base.hasNext && !startsGroup(base.head)
override def next() = if (hasNext) base.next() else Iterator.empty.next()
}
}
Update: Keeping a little state lets you skip iterators and prevent people from messing with previous ones:
def groupby[T](iter: Iterator[T])(startsGroup: T => Boolean): Iterator[Iterator[T]] =
new Iterator[Iterator[T]] {
val base = iter.buffered
var prev: Iterator[T] = Iterator.empty
override def hasNext = base.hasNext
override def next() = {
while (prev.hasNext) prev.next() // Exhaust previous iterator; take* and drop* do NOT always work!! (Jira SI-5002?)
prev = Iterator(base.next()) ++ new Iterator[T] {
var hasMore = true
override def hasNext = { hasMore = hasMore && base.hasNext && !startsGroup(base.head) ; hasMore }
override def next() = if (hasNext) base.next() else Iterator.empty.next()
}
prev
}
}
If you are looking at memory constraints then the following will work. You can only use it if your underlying iterable object supports views. This implementation will iterate over the Iterable and then generate IterableViews which can then be iterated over. This implementation does not care if the very first element tests as a start group since it will be regardless.
def groupby[T](iter: Iterable[T])(startsGroup: T => Boolean): Iterable[Iterable[T]] = new Iterable[Iterable[T]] {
def iterator = new Iterator[Iterable[T]] {
val i = iter.iterator
var index = 0
var nextView: IterableView[T, Iterable[T]] = getNextView()
private def getNextView() = {
val start = index
var hitStartGroup = false
while ( i.hasNext && ! hitStartGroup ) {
val next = i.next()
index += 1
hitStartGroup = ( index > 1 && startsGroup( next ) )
}
if ( hitStartGroup ) {
if ( start == 0 ) iter.view( start, index - 1 )
else iter.view( start - 1, index - 1 )
} else { // hit end
if ( start == index ) null
else if ( start == 0 ) iter.view( start, index )
else iter.view( start - 1, index )
}
}
def hasNext = nextView != null
def next() = {
if ( nextView != null ) {
val next = nextView
nextView = getNextView()
next
} else null
}
}
}
You can maintain a low memory footprint by using Streams. Use result.toIterator if you need an iterator again.
With streams, there's no mutable state, only a single conditional and it's nearly as concise as Jay Hacker's solution.
def batchBy[A,B](iter: Iterator[A])(f: A => B): Stream[(B, Iterator[A])] = {
val base = iter.buffered
val empty = Stream.empty[(B, Iterator[A])]
def getBatch(key: B) = {
Iterator(base.next()) ++ new Iterator[A] {
def hasNext: Boolean = base.hasNext && (f(base.head) == key)
def next(): A = base.next()
}
}
def next(skipList: Option[Iterator[A]] = None): Stream[(B, Iterator[A])] = {
skipList.foreach{_.foreach{_=>}}
if (base.isEmpty) empty
else {
val key = f(base.head)
val batch = getBatch(key)
Stream.cons((key, batch), next(Some(batch)))
}
}
next()
}
I ran the tests:
scala> batchBy((1 to Int.MaxValue).iterator)(_ % (Int.MaxValue / 2) == 0)
.foreach{case(_,group) => println(group.sum)}
-1610612735
1073741823
-536870909
2147483646
2147483647
The second test prints too much to paste to Stack Overflow.
import scala.collection.mutable.ArrayBuffer
object GroupingIterator {
/**
* Create a new GroupingIterator with a grouping predicate.
*
* @param it The original iterator
* @param p Predicate controlling the grouping
* @tparam A Type of elements iterated
* @return A new GroupingIterator
*/
def apply[A](it: Iterator[A])(p: (A, IndexedSeq[A]) => Boolean): GroupingIterator[A] =
new GroupingIterator(it)(p)
}
/**
* Group elements in sequences of contiguous elements that satisfy a predicate. The predicate
* tests each single potential next element of the group with the help of the elements grouped so far.
* If it returns true, the potential next element is added to the group, otherwise
* a new group is started with the potential next element as first element
*
* @param self The original iterator
* @param p Predicate controlling the grouping
* @tparam A Type of elements iterated
*/
class GroupingIterator[+A](self: Iterator[A])(p: (A, IndexedSeq[A]) => Boolean) extends Iterator[IndexedSeq[A]] {
private[this] val source = self.buffered
private[this] val buffer: ArrayBuffer[A] = ArrayBuffer()
def hasNext: Boolean = source.hasNext
def next(): IndexedSeq[A] = {
if (hasNext)
nextGroup()
else
Iterator.empty.next()
}
private[this] def nextGroup(): IndexedSeq[A] = {
assert(source.hasNext)
buffer.clear()
buffer += source.next
while (source.hasNext && p(source.head, buffer)) {
buffer += source.next
}
buffer.toIndexedSeq
}
}