Specs2 + Scalacheck Generate Tuple with different Strings - scala

I have to test a loop-free graph, and always checking whether the Strings are different is not very usable (it throws an exception). There must be a better solution, but I am not able to come up with it, and I am kind of lost in the specs2 documentation.
This is an example of the code:
"BiDirectionalEdge" should {
"throw an Error for the wrong DirectedEdges" in prop {
(a :String, b :String, c :String, d :String) =>
val edge1 = createDirectedEdge(a, b, c)
val edge2 = createDirectedEdge(c, b, d)
new BiDirectionalEdge(edge1, edge2) must throwA[InvalidFormatException] or(a mustEqual d)
}
}
If a and c are the same, createDirectedEdge will throw an exception (I have a different test for that behaviour).

Yep, there's a better way: this is precisely what conditional properties are for. Just add your condition followed by ==>:
"BiDirectionalEdge" should {
"throw an Error for the wrong DirectedEdges" in prop {
(a: String, b: String, c: String, d: String) => (a != c) ==> {
val edge1 = createDirectedEdge(a, b, c)
val edge2 = createDirectedEdge(c, b, d)
new BiDirectionalEdge(edge1, edge2) must
throwA[InvalidFormatException] or(a mustEqual d)
}
}
}
If the condition is likely to fail often, you should probably take a different approach (see the ScalaCheck guide for details), but in your case a conditional property is totally appropriate.


Working with strings and objects

I think my problem is pretty straightforward - I have a text file made out of 'E' and 'B' symbols, for example:
EBBEBBB BBEB
E
BEB BEB B
B
Now I want to get this data. When I use it, I don't want it to be in the form of strings, because then you can pass anything that would not work, for example a number or any other invalid symbol. That's why I figured I could create some case objects that extend a single trait (as shown below). The problem is, I don't know how I should correctly convert my string data to that particular data structure:
sealed trait EB
case object E extends EB
case object B extends EB
case class EB_Text(data: Vector[EB])
def convertText(fileData: Vector[String]) : EB_Text = {
// Match each symbol and check if it's 'E' or 'B'?
// If I find an invalid symbol here, what do I return? Should I return an Option here?
}
Thank you! ^^
You can construct the function in the following way:
def convertText(fileData: Vector[String]) : EB_Text = {
EB_Text(fileData.map{
singleLine =>
singleLine.replaceAll(" ","").toUpperCase().collect{
case 'E' => E
case 'B' => B
}
}.flatten)
}
You do not need to make any modifications to the case objects that you defined earlier. You can keep them as they are:
sealed trait EB
case object E extends EB
case object B extends EB
case class EB_Text(data: Vector[EB])
Invoking the function with the following input:
val input = Vector("EBBEBBB BBEB"," E","BEB BEB B"," B ")
convertText(input)
gives the output:
res0: EB_Text = EB_Text(Vector(E, B, B, E, B, B, B, B, B, E, B, E, B, E, B, B, E, B, B, B))
I hope this answers your question.
Your logic is correct, and returning an Option sounds like the right thing to do. If you don't need the Option though (or any other information related to the convert operation, e.g. the underlying string), there is no reason to construct it.
I also don't think that it is necessary to wrap the result in a case class.
def convertText(fileData: Vector[String]) : Vector[EB] = {
// walk over every character of every line; anything other than 'E' or 'B' is skipped
for (s <- fileData; ch <- s if ch == 'E' || ch == 'B') yield {
if (ch == 'E') E
else B
}
}
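If you do want invalid symbols to be reported rather than silently dropped, a strict variant could look roughly like the sketch below. The names parseSymbol and convertTextStrict are made up for illustration, and treating whitespace as a harmless separator is an assumption based on the sample file in the question.

```scala
sealed trait EB
case object E extends EB
case object B extends EB

// Hypothetical helper: parse a single character, None for anything unexpected.
def parseSymbol(ch: Char): Option[EB] = ch match {
  case 'E' => Some(E)
  case 'B' => Some(B)
  case _   => None
}

// Hypothetical strict variant: whitespace is ignored (assumption),
// and the whole result is None if any other symbol is not 'E' or 'B'.
def convertTextStrict(fileData: Vector[String]): Option[Vector[EB]] = {
  val parsed: Vector[Option[EB]] =
    fileData.flatMap(line => line.filterNot(_.isWhitespace).toVector.map(parseSymbol))
  if (parsed.forall(_.isDefined)) Some(parsed.flatten) else None
}
```

The caller then has to handle the None case explicitly, which is the main point of returning an Option here.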

Foldable "foldMap" that take a partial function: foldCollect?

Say, I have the following object:
case class MyFancyObject(a: String, b: Int, c : Vector[String])
What I need is a single Vector[String] containing all the c values that match a given partial function.
E.g.:
val xs = Vector(
MyFancyObject("test1",1,Vector("test1-1","test1-2","test1-3")),
MyFancyObject("test2",2,Vector("test2-1","test2-2","test2-3")),
MyFancyObject("test3",3,Vector("test3-1","test3-2","test3-3")),
MyFancyObject("test4",4,Vector("test4-1","test4-2","test4-3"))
)
val partialFunction1 : PartialFunction[MyFancyObject,Vector[String]] = {
case MyFancyObject(_,b,c) if b > 2 => c
}
What I need to get is: Vector("test3-1","test3-2","test3-3","test4-1","test4-2","test4-3").
I solved this doing the following:
val res1 = xs.foldMap{
case MyFancyObject(_,b,c) if b > 2 => c
case _ => Vector.empty[String]
}
However, this made me curious. What I am doing here seemed to be a pretty common and natural thing: for each element of a foldable collection, try to apply a partial function and, should that fail, default to the Monoid's empty (Vector.empty in my case). I searched in the library and I did not find anything doing this already, so I ended up adding this extension method in my code:
implicit class FoldableExt[F[_], A](foldable : F[A]) {
def foldCollect[B](pF: PartialFunction[A, B])(implicit F : Foldable[F], B : Monoid[B]) : B = {
F.foldMap(foldable)(pF.applyOrElse(_, (_ : A) => B.empty))
}
}
My question here is:
Is there any reason why such a method would not be available already? Is it not a generic and common enough scenario, or am I missing something?
I think that if you really need the partial function, you don't want it to leak outside, because it's not very nice to use. The best thing to do, if you want to reuse your partialFunction1, is to lift it to make it a total function that returns Option. Then you can provide your default case in the same place you use your partial function. Here's the approach:
val res2 = xs.foldMap(partialFunction1.lift).getOrElse(Vector.empty)
The foldMap(partialFunction1.lift) returns Some(Vector(test3-1, test3-2, test3-3, test4-1, test4-2, test4-3)). This is exactly what you have in res1, but wrapped in Option.
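As a side note, for a plain Vector the same result can be had with the standard library alone, which may explain why a dedicated method feels less pressing. The sketch below uses no Foldable or Monoid type classes; the stand-alone foldCollect with explicit zero and combine arguments is a hypothetical stand-in for the extension method in the question, not a library function.

```scala
case class MyFancyObject(a: String, b: Int, c: Vector[String])

val xs = Vector(
  MyFancyObject("test1", 1, Vector("test1-1", "test1-2", "test1-3")),
  MyFancyObject("test3", 3, Vector("test3-1", "test3-2", "test3-3")),
  MyFancyObject("test4", 4, Vector("test4-1", "test4-2", "test4-3"))
)

val partialFunction1: PartialFunction[MyFancyObject, Vector[String]] = {
  case MyFancyObject(_, b, c) if b > 2 => c
}

// collect keeps only the elements where the partial function is defined;
// flatten then concatenates the resulting vectors.
val viaCollect: Vector[String] = xs.collect(partialFunction1).flatten

// Hypothetical stand-alone foldCollect: the monoid is passed explicitly
// as a zero value and a combine function instead of a type class instance.
def foldCollect[A, B](as: Iterable[A])(zero: B)(combine: (B, B) => B)(pf: PartialFunction[A, B]): B =
  as.foldLeft(zero)((acc, a) => combine(acc, pf.applyOrElse(a, (_: A) => zero)))

val viaFold: Vector[String] = foldCollect(xs)(Vector.empty[String])(_ ++ _)(partialFunction1)
```

Both values contain the c vectors of the objects with b > 2, in their original order.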

Topological sort in scala

I'm looking for a nice implementation of topological sorting in scala.
The solution should be stable:
If input is already sorted, the output should be unchanged
The algorithm should be deterministic (hashCode has no effect)
I suspect there are libraries that can do this, but I wouldn't like to add nontrivial dependencies due to this.
Example problem:
case class Node(name: String)(val referenced: Node*)
val a = Node("a")()
val b = Node("b")(a)
val c = Node("c")(a)
val d = Node("d")(b, c)
val e = Node("e")(d)
val f = Node("f")()
assertEquals("Previous order is kept",
Vector(f, a, b, c, d, e),
topoSort(Vector(f, a, b, c, d, e)))
assertEquals(Vector(a, b, c, d, f, e),
topoSort(Vector(d, c, b, f, a, e)))
Here the order is defined such that if the nodes were, say, declarations in a programming language referencing other declarations, the resulting order would be such that no declaration is used before it has been declared.
Here is my own solution. Additionally, it returns any loops detected in the input.
The format of the nodes is not fixed, because the caller provides a visitor that takes a node and a callback and calls the callback for each referenced node.
If the loop reporting is not necessary, it should be easy to remove.
import scala.collection.mutable
// Based on https://en.wikipedia.org/wiki/Topological_sorting?oldformat=true#Depth-first_search
object TopologicalSort {
case class Result[T](result: IndexedSeq[T], loops: IndexedSeq[IndexedSeq[T]])
type Visit[T] = (T) => Unit
// A visitor is a function that takes a node and a callback.
// The visitor calls the callback for each node referenced by the given node.
type Visitor[T] = (T, Visit[T]) => Unit
def topoSort[T <: AnyRef](input: Iterable[T], visitor: Visitor[T]): Result[T] = {
// Buffer, because it is operated in a stack like fashion
val temporarilyMarked = mutable.Buffer[T]()
val permanentlyMarked = mutable.HashSet[T]()
val loopsBuilder = IndexedSeq.newBuilder[IndexedSeq[T]]
val resultBuilder = IndexedSeq.newBuilder[T]
def visit(node: T): Unit = {
if (temporarilyMarked.contains(node)) {
val loopStartIndex = temporarilyMarked.indexOf(node)
val loop = temporarilyMarked.slice(loopStartIndex, temporarilyMarked.size)
.toIndexedSeq
loopsBuilder += loop
} else if (!permanentlyMarked.contains(node)) {
temporarilyMarked += node
visitor(node, visit)
permanentlyMarked += node
temporarilyMarked.remove(temporarilyMarked.size - 1, 1)
resultBuilder += node
}
}
for (i <- input) {
if (!permanentlyMarked.contains(i)) {
visit(i)
}
}
Result(resultBuilder.result(), loopsBuilder.result())
}
}
In the example of the question this would be applied like this:
import TopologicalSort._
def visitor(node: Node, callback: (Node) => Unit): Unit = {
node.referenced.foreach(callback)
}
assertEquals("Previous order is kept",
Vector(f, a, b, c, d, e),
topoSort(Vector(f, a, b, c, d, e), visitor).result)
assertEquals(Vector(a, b, c, d, f, e),
topoSort(Vector(d, c, b, f, a, e), visitor).result)
Some thoughts on complexity:
The worst case complexity of this solution is actually above O(n + m) because the temporarilyMarked buffer is scanned for each node.
The asymptotic complexity would be improved if temporarilyMarked were replaced with, for example, a HashSet.
A true O(n + m) would be achieved if the marks were stored directly inside the nodes, but storing them outside makes writing a generic solution easier.
I haven't run any performance tests, but I suspect scanning the temporarilyMarked buffer is not a problem even in large graphs as long as they are not very deep.
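To illustrate the HashSet idea, here is a rough sketch of a variant that tracks the nodes on the current path in a set. It is a simplification, not the code above: the name topoSortFast, the referenced function and the Either result are assumptions made for this example, and it only reports the first node at which a loop is detected instead of collecting every loop.

```scala
import scala.collection.mutable

// Hypothetical variant: same depth-first scheme, but the "on the current path"
// check is a HashSet lookup instead of a scan over a buffer.
// Returns Left(node) for the first node found to close a loop, else Right(order).
def topoSortFast[T](input: Iterable[T])(referenced: T => Iterable[T]): Either[T, Vector[T]] = {
  val onPath = mutable.HashSet[T]()
  val done = mutable.HashSet[T]()
  val result = Vector.newBuilder[T]
  var cycleAt: Option[T] = None

  def visit(node: T): Unit =
    if (cycleAt.isEmpty) {
      if (onPath.contains(node)) cycleAt = Some(node)
      else if (!done.contains(node)) {
        onPath += node
        referenced(node).foreach(visit)
        onPath -= node
        done += node
        result += node
      }
    }

  input.foreach(visit)
  cycleAt.toLeft(result.result())
}

// Usage with the graph from the question, encoded as a Map of names (assumption).
val refs: Map[String, List[String]] = Map(
  "a" -> Nil, "b" -> List("a"), "c" -> List("a"),
  "d" -> List("b", "c"), "e" -> List("d"), "f" -> Nil)

val sorted = topoSortFast(Vector("d", "c", "b", "f", "a", "e"))(refs)
```

Because the path itself is no longer kept in order, reconstructing the full loop would need an extra stack next to the set.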
Example code and test on GitHub
Very similar code is also published here. That version has a test suite, which can be useful for experimenting and exploring the implementation.
Why would you detect loops
Detecting loops can be useful for example in serialization situations where most of the data can be handled as a DAG, but loops can be handled with some kind of special arrangement.
The test suite in the GitHub code linked to in the section above contains various cases with multiple loops.
Here's a purely functional implementation that returns the topological ordering ONLY if the graph is acyclic.
case class Node(label: Int)
case class Graph(adj: Map[Node, Set[Node]]) {
case class DfsState(discovered: Set[Node] = Set(), activeNodes: Set[Node] = Set(), tsOrder: List[Node] = List(),
isCylic: Boolean = false)
def dfs: (List[Node], Boolean) = {
def dfsVisit(currState: DfsState, src: Node): DfsState = {
val newState = currState.copy(discovered = currState.discovered + src, activeNodes = currState.activeNodes + src,
isCylic = currState.isCylic || adj(src).exists(currState.activeNodes))
val finalState = adj(src).filterNot(newState.discovered).foldLeft(newState)(dfsVisit(_, _))
finalState.copy(tsOrder = src :: finalState.tsOrder, activeNodes = finalState.activeNodes - src)
}
val stateAfterSearch = adj.keys.foldLeft(DfsState()) {(state, n) => if (state.discovered(n)) state else dfsVisit(state, n)}
(stateAfterSearch.tsOrder, stateAfterSearch.isCylic)
}
def topologicalSort: Option[List[Node]] = dfs match {
case (topologicalOrder, false) => Some(topologicalOrder)
case _ => None
}
}

How to create an Iteratee that passes through values to an inner Iteratee unless a specific value is found

I've got an ADT that's essentially a cross between Option and Try:
sealed trait Result[+T]
case object Empty extends Result[Nothing]
case class Error(cause: Throwable) extends Result[Nothing]
case class Success[T](value: T) extends Result[T]
(assume common combinators like map, flatMap etc are defined on Result)
Given an Iteratee[A, Result[B]] called inner, I want to create a new Iteratee[Result[A], Result[B]] with the following behavior:
If the input is a Success(a), feed a to inner
If the input is an Empty, no-op
If the input is an Error(err), I want inner to be completely ignored, instead returning a Done iteratee with the Error(err) as its result.
Example Behavior:
// inner: Iteratee[Int, Result[List[Int]]]
// inputs:
1
2
3
// output:
Success(List(1,2,3))
// wrapForResultInput(inner): Iteratee[Result[Int], Result[List[Int]]]
// inputs:
Success(1)
Success(2)
Error(Exception("uh oh"))
Success(3)
// output:
Error(Exception("uh oh"))
This sounds to me like the job for an Enumeratee, but I haven't been able to find anything in the docs that looks like it'll do what I want, and the internal implementations are still voodoo to me.
How can I implement wrapForResultInput to create the behavior described above?
Adding some more detail that won't really fit in a comment:
Yes it looks like I was mistaken in my question. I described it in terms of Iteratees but it seems I really am looking for Enumeratees.
At a certain point in the API I'm building, there's a Transformer[A] class that is essentially an Enumeratee[Event, Result[A]]. I'd like to allow clients to transform that object by providing an Enumeratee[Result[A], Result[B]], which would result in a Transformer[B] aka an Enumeratee[Event, Result[B]].
For a more complex example, suppose I have a Transformer[AorB] and want to turn that into a Transformer[(A, List[B])]:
// the Transformer[AorB] would give
a, b, a, b, b, b, a, a, b
// but the client wants to have
a -> List(b),
a -> List(b, b, b),
a -> Nil
a -> List(b)
The client could implement an Enumeratee[AorB, Result[(A, List[B])]] without too much trouble using Enumeratee.grouped, but they are required to provide an Enumeratee[Result[AorB], Result[(A, List[B])]], which seems to introduce a lot of complication that I'd like to hide from them if possible.
val easyClientEnumeratee = Enumeratee.grouped[AorB]{
for {
_ <- Enumeratee.dropWhile(_ != a) ><> Iteratee.ignore
headResult <- Iteratee.head.map{ Result.fromOption }
bs <- Enumeratee.takeWhile(_ == b) ><> Iteratee.getChunks
} yield headResult.map{_ -> bs}
}
val harderEnumeratee = ??? ><> easyClientEnumeratee
val oldTransformer: Transformer[AorB] = ... // assume it already exists
val newTransformer: Transformer[(A, List[B])] = oldTransformer.andThen(harderEnumeratee)
So what I'm looking for is the ??? to define the harderEnumeratee in order to ease the burden on the user who already implemented easyClientEnumeratee.
I guess the ??? should be an Enumeratee[Result[AorB], AorB], but if I try something like
Enumeratee.collect[Result[AorB]] {
case Success(ab) => ab
case Error(err) => throw err
}
the error will actually be thrown; I actually want the error to come back out as an Error(err).
The simplest implementation would use the Iteratee.fold2 method, which can collect elements until something happens.
Since you return a single result and can't really return anything until you have verified that there are no errors, an Iteratee is enough for such a task:
def listResults[E] = Iteratee.fold2[Result[E], Either[Throwable, List[E]]](Right(Nil)) { (state, elem) =>
val Right(list) = state
val next = elem match {
case Empty => (Right(list), false)
case Success(x) => (Right(x :: list), false)
case Error(t) => (Left(t), true)
}
Future(next)
} map {
case Right(list) => Success(list.reverse)
case Left(th) => Error(th)
}
Now, if we prepare a little playground
import scala.concurrent.ExecutionContext.Implicits._
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
val good = Enumerator.enumerate[Result[Int]](
Seq(Success(1), Empty, Success(2), Success(3)))
val bad = Enumerator.enumerate[Result[Int]](
Seq(Success(1), Success(2), Error(new Exception("uh oh")), Success(3)))
def runRes[X](e: Enumerator[Result[X]]) : Result[List[X]] = Await.result(e.run(listResults), 3 seconds)
we can verify those results
runRes(good) //res0: Result[List[Int]] = Success(List(1, 2, 3))
runRes(bad) //res1: Result[List[Int]] = Error(java.lang.Exception: uh oh)
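For readers who find the iteratee machinery opaque, the same stop-on-first-error logic can be sketched over a plain Iterator with no Play dependency. This is only a synchronous model of what the fold2 version above does; collectResults is a hypothetical name, and the Result ADT is repeated from the question so that the snippet stands alone.

```scala
import scala.annotation.tailrec

sealed trait Result[+T]
case object Empty extends Result[Nothing]
case class Error(cause: Throwable) extends Result[Nothing]
case class Success[T](value: T) extends Result[T]

// Hypothetical synchronous model: consume inputs until the first Error,
// skipping Empty and accumulating Success values.
def collectResults[T](inputs: Iterator[Result[T]]): Result[List[T]] = {
  @tailrec
  def loop(acc: List[T]): Result[List[T]] =
    if (!inputs.hasNext) Success(acc.reverse)
    else inputs.next() match {
      case Empty      => loop(acc)
      case Success(x) => loop(x :: acc)
      case Error(t)   => Error(t) // stop here; later inputs are never read
    }
  loop(Nil)
}
```

As with the boolean flag returned to fold2, inputs after the first Error are left unconsumed.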

Nearest keys in a SortedMap

Given a key k in a SortedMap, how can I efficiently find the largest key m that is less than or equal to k, and also the smallest key n that is greater than or equal to k. Thank you.
Looking at the source code for 2.9.0, the following code seems to be about the best you can do:
def getLessOrEqual[A,B](sm: SortedMap[A,B], bound: A): B = {
val key = sm.to(bound).lastKey
sm(key)
}
I don't know exactly how the splitting of the RedBlack tree works, but I guess it's something like an O(log n) traversal of the tree/construction of new elements and then a rebalancing, presumably also O(log n). Then you need to go down the new tree again to get the last key. Unfortunately, you can't retrieve the value in the same go, so you have to go down again to fetch the value.
In addition the lastKey might throw an exception and there is no similar method that returns an Option.
I'm waiting for corrections.
Edit and personal comment
The SortedMap area of the std lib seems to be a bit neglected. I'm also missing a mutable SortedMap. Looking through the sources, I noticed that some important methods are missing (like the one the OP asks for or the ones pointed out in my answer) and that some have poor implementations, like 'last', which is defined by TraversableLike and goes through the complete tree from first to last to obtain the last element.
Edit 2
Now that the question has been reformulated, my answer is not valid anymore (well, it wasn't before anyway). I think you have to do the thing I'm describing twice, once for lessOrEqual and once for greaterOrEqual. You can take a shortcut if you find the equal element.
Scala's SortedSet trait has no method that will give you the closest element to some other element.
It is presently implemented with TreeSet, which is based on RedBlack. The RedBlack tree is not exposed through public methods on TreeSet; it is only reachable through the protected method tree. Unfortunately, that is basically useless: you'd have to override the methods returning TreeSet to return your subclass, but most of them are based on newSet, which is private.
So, in the end, you'd have to duplicate most of TreeSet. On the other hand, it isn't all that much code.
Once you have access to RedBlack, you'd have to implement something similar to RedBlack.Tree's lookup, so you'd have O(log n) performance. That's actually the same complexity as range, though it would certainly do less work.
Alternatively, you'd make a zipper for the tree, so that you could actually navigate through the set in constant time. It would be a lot more work, of course.
Using Scala 2.11.7, the following will give what you want:
scala> val set = SortedSet('a', 'f', 'j', 'z')
set: scala.collection.SortedSet[Char] = TreeSet(a, f, j, z)
scala> val beforeH = set.to('h').last
beforeH: Char = f
scala> val afterH = set.from('h').head
afterH: Char = j
Generally you should use lastOption and headOption, as the specified elements may not exist. If you are looking to squeeze a little more efficiency out, you can try replacing from(...).head with keysIteratorFrom(...).next()
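Applied to a SortedMap, the Option-returning version might look like the sketch below. The helper names floorEntry and ceilingEntry are invented for this example. It uses the same to and from methods as the REPL session above; as far as I know, Scala 2.13 deprecates them in favor of rangeTo and rangeFrom, so expect deprecation warnings there.

```scala
import scala.collection.immutable.SortedMap

// Hypothetical helpers: largest entry with key <= k, smallest entry with key >= k.
// Both return None when no such entry exists instead of throwing.
def floorEntry[K, V](m: SortedMap[K, V], k: K): Option[(K, V)] =
  m.to(k).lastOption

def ceilingEntry[K, V](m: SortedMap[K, V], k: K): Option[(K, V)] =
  m.from(k).headOption

val sm = SortedMap(10 -> "ten", 20 -> "twenty", 30 -> "thirty")
val below = floorEntry(sm, 25)
val above = ceilingEntry(sm, 25)
```

When k itself is a key, both helpers return that same entry, which matches the "less than or equal" and "greater than or equal" wording of the question.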
Sadly, the Scala library only allows this type of query to be made efficiently:
and also the smallest key n that is greater than or equal to k.
val n = TreeMap(...).keysIteratorFrom(k).next
You can hack this by keeping two structures, one with normal keys, and one with negated keys. Then you can use the other structure to make the second type of query.
val n = - TreeMap(...).keysIteratorFrom(-k).next
Looks like I should file a ticket to add 'fromIterator' and 'toIterator' methods to 'Sorted' trait.
Well, one option is certainly using java.util.TreeMap.
It has floorKey and ceilingKey methods, which do exactly what you want (lowerKey and higherKey are the strict variants that exclude k itself).
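A small sketch of the java.util.TreeMap route from Scala: floorEntry and ceilingEntry are the inclusive lookups (greatest key <= k, least key >= k) and return the key together with its value in one call. The wrapper names floorOf and ceilingOf and the sample data are made up for illustration. The Java methods return null when nothing matches, hence the Option(...) wrapping.

```scala
import java.util.{TreeMap => JTreeMap}

val m = new JTreeMap[Int, String]()
m.put(10, "ten")
m.put(20, "twenty")
m.put(30, "thirty")

// Hypothetical wrappers: turn the nullable Java entries into Options of pairs.
def floorOf(k: Int): Option[(Int, String)] =
  Option(m.floorEntry(k)).map(e => (e.getKey, e.getValue))

def ceilingOf(k: Int): Option[(Int, String)] =
  Option(m.ceilingEntry(k)).map(e => (e.getKey, e.getValue))
```

Wrapping the entry rather than the key avoids a pitfall: with Int keys, a null returned by floorKey would be unboxed to 0 before Option ever sees it.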
I had a similar problem: I wanted to find the closest element to a given key in a SortedMap. I remember the answer to this question being, "You have to hack TreeSet," so when I had to implement it for a project, I found a way to wrap TreeSet without getting into its internals.
I didn't see jazmit's answer, which more closely answers the original poster's question with minimum fuss (two method calls). However, those method calls do more work than needed for this application (multiple tree traversals), and my solution provides lots of hooks where other users can modify it to their own needs.
Here it is:
import scala.collection.immutable.TreeSet
import scala.collection.SortedMap
// generalize the idea of an Ordering to metric sets
trait MetricOrdering[T] extends Ordering[T] {
def distance(x: T, y: T): Double
def compare(x: T, y: T) = {
val d = distance(x, y)
if (d > 0.0) 1
else if (d < 0.0) -1
else 0
}
}
class MetricSortedMap[A, B]
(elems: (A, B)*)
(implicit val ordering: MetricOrdering[A])
extends SortedMap[A, B] {
// while TreeSet searches for an element, keep track of the best it finds
// with *thread-safe* mutable state, of course
private val best = new java.lang.ThreadLocal[(Double, A, B)]
best.set((-1.0, null.asInstanceOf[A], null.asInstanceOf[B]))
private val ord = new MetricOrdering[(A, B)] {
def distance(x: (A, B), y: (A, B)) = {
val diff = ordering.distance(x._1, y._1)
val absdiff = Math.abs(diff)
// the "to" position is a key-null pair; the object of interest
// is the other one
if (absdiff < best.get._1)
(x, y) match {
// in practice, TreeSet always picks this first case, but that's
// insider knowledge
case ((to, null), (pos, obj)) =>
best.set((absdiff, pos, obj))
case ((pos, obj), (to, null)) =>
best.set((absdiff, pos, obj))
case _ =>
}
diff
}
}
// use a TreeSet as a backing (not TreeMap because we need to get
// the whole pair back when we query it)
private val treeSet = TreeSet[(A, B)](elems: _*)(ord)
// find the closest key and return:
// (distance to key, the key, its associated value)
def closest(to: A): (Double, A, B) = {
treeSet.headOption match {
case Some((pos, obj)) =>
best.set((ordering.distance(to, pos), pos, obj))
case None =>
throw new java.util.NoSuchElementException(
"SortedMap has no elements, and hence no closest element")
}
treeSet((to, null.asInstanceOf[B])) // called for side effects
best.get
}
// satisfy the contract (or throw UnsupportedOperationException)
def +[B1 >: B](kv: (A, B1)): SortedMap[A, B1] =
new MetricSortedMap[A, B](
elems :+ (kv._1, kv._2.asInstanceOf[B]): _*)
def -(key: A): SortedMap[A, B] =
new MetricSortedMap[A, B](elems.filter(_._1 != key): _*)
def get(key: A): Option[B] = treeSet.find(_._1 == key).map(_._2)
def iterator: Iterator[(A, B)] = treeSet.iterator
def rangeImpl(from: Option[A], until: Option[A]): SortedMap[A, B] =
new MetricSortedMap[A, B](treeSet.rangeImpl(
from.map((_, null.asInstanceOf[B])),
until.map((_, null.asInstanceOf[B]))).toSeq: _*)
}
// test it with A = Double
implicit val doubleOrdering =
new MetricOrdering[Double] {
def distance(x: Double, y: Double) = x - y
}
// and B = String
val stuff = new MetricSortedMap[Double, String](
3.3 -> "three",
1.1 -> "one",
5.5 -> "five",
4.4 -> "four",
2.2 -> "two")
println(stuff.iterator.toList)
println(stuff.closest(1.5))
println(stuff.closest(1000))
println(stuff.closest(-1000))
println(stuff.closest(3.3))
println(stuff.closest(3.4))
println(stuff.closest(3.2))
I've been doing:
val m = SortedMap(myMap.toSeq:_*)
val offsetMap = (m.toSeq zip m.keys.toSeq.drop(1)).map {
case ( (k,v),newKey) => (newKey,v)
}.toMap
I do this when I want the values of my map offset by one key. I'm also looking for a better way, preferably one that doesn't store an extra map.