Working with strings and objects - scala

I think my problem is pretty straight forward - I have a text file made out of 'E' and 'B' symbols, for example:
EBBEBBB BBEB
E
BEB BEB B
B
Now i want to get this data. When i use it, i don't want it to be in a form of strings because you can pass anything that would not work, for example something like a number or any other invalid symbol. That's why i figured i could create some case objects that extends a single trait(like shown below). Problem is, I don't know how i should CORRECTLY convert my string data to that particular data structure that I made:
sealed trait EB
case object E extends EB
case object B extends EB
case class EB_Text(data: Vector[EB])
def convertText(fileData: Vector[String]) : EB_Text = {
//Match each symbol and check if it's 'E' or 'B' ?
//If i find an invalid symbol here, what do i return? Should i return AN Option here?
}
Thank you! ^^

You can construct the function in the following way
def convertText(fileData: Vector[String]) : EB_Text = {
EB_Text(fileData.map{
singleLine =>
singleLine.replaceAll(" ","").toUpperCase().collect{
case 'E' => E
case 'B' => B
}
}.flatten)
}
You do not need to make any modifications to the case objects that you had defined earlier. You can keep them as it is
sealed trait EB
case object E extends EB
case object B extends EB
case class EB_Text(data: Vector[EB])
On invoking the function with the following input, you will get the output as
val input = Vector("EBBEBBB BBEB"," E","BEB BEB B"," B ")
convertText(input)
you will get the output as
res0: EB_Text = EB_Text(Vector(E, B, B, E, B, B, B, B, B, E, B, E, B, E, B, B, E, B, B, B))
I hope this answers your question.

Your logic is correct and returning an options sounds like the right thing to do. If you don't need the option though (or any other information related to the convert operation, e.g. the underlying string) there is no reason to construct it.
I also don't think that is necessary to wrap the result in a case class.
def convertText(fileData: Vector[String]) : Vector[EB] = {
for (s <- fileData if s == "E" || s == "B") yield {
if (s == "E") E
else B
}
}

Related

Scala Add elements to set only by checking specific fields of a case class

I have a case class
case class ApiCall(a: String, b: String, c: String, d: String, e: String, f: String)
I also have a mutable Set: private var apiCalls: mutable.Set[ApiCall] = mutable.Set[ApiCall]()
The problem is, I may get the following ApiCall elements:
ApiCall(a1, b1, c1, d1, e1, f1)
ApiCall(a1, b1, c1, d2, e2, f2)
I have to add elements to the set only if a unique combination of (a, b, c) of the case class ApiCall doesn't already exist. I cannot modify the case class itself because it is being used in multiple places.
Is it possible to add case class elements to the set only by looking at certain fields, and not all?
You might want to use Map instead of Set in your case:
val apiCalls = mutable.Map[(String, String, String), ApiCall]()
Also values are replaced for matching keys inside Map, you might need a separate method to update the API calls map:
def updateApiCalls(call: ApiCall): Unit = {
apiCalls.getOrElseUpdate((call.a, call.b, call.c), call)
()
}
I kind of solved it using a small workaround:
private var _apiCalls: mutable.Set[ApiCall] = mutable.Set[ApiCall]() is my Set of ApiCalls
I wrote a function which will add to set only if there's no 3 part key (a, b, c) already existing in the Set:
def addApiCall(e: X): Unit = {
val m = _x.find(m => m.a == e.a && m.b == e.b && m.c == e.c)
if (m.isEmpty)
_x += e
}
I'm not going to have too many elements in the Set either, so I found this approach easier to handle for me..

Foldable "foldMap" that take a partial function: foldCollect?

Say, I have the following object:
case class MyFancyObject(a: String, b: Int, c : Vector[String])
And what I needed is to get a single Vector[String] containing all 'c's that match a given partial function.
E.g.:
val xs = Vector(
MyFancyObject("test1",1,Vector("test1-1","test1-2","test1-3")),
MyFancyObject("test2",2,Vector("test2-1","test2-2","test2-3")),
MyFancyObject("test3",3,Vector("test3-1","test3-2","test3-3")),
MyFancyObject("test4",4,Vector("test4-1","test4-2","test4-3"))
)
val partialFunction1 : PartialFunction[MyFancyObject,Vector[String]] = {
case MyFancyObject(_,b,c) if b > 2 => c
}
What I need to get is: Vector("test3-1","test3-2","test3-3","test4-1","test4-2","test4-3").
I solved this doing the following:
val res1 = xs.foldMap{
case MyFancyObject(_,b,c) if b > 2 => c
case _ => Vector.empty[String]
}
However, this made me curious. What I am doing here seemed to be a pretty common and natural thing: for each element of a foldable collection, try to apply a partial function and, should that fail, default to the Monoid's empty (Vector.empty in my case). I searched in the library and I did not find anything doing this already, so I ended up adding this extension method in my code:
implicit class FoldableExt[F[_], A](foldable : F[A]) {
def foldCollect[B](pF: PartialFunction[A, B])(implicit F : Foldable[F], B : Monoid[B]) : B = {
F.foldMap(foldable)(pF.applyOrElse(_, (_ : A) => B.empty))
}
}
My question here is:
Is there any reason why such a method would not be in available already? Is it not a generic and common enough scenario, or am I missing something?
I think that if you really need the partial function, you don't want it to leak outside, because it's not very nice to use. The best thing to do, if you want to reuse your partialFunction1, is to lift it to make it a total function that returns Option. Then you can provide your default case in the same place you use your partial function. Here's the approach:
val res2 = xs.foldMap(partialFunction1.lift).getOrElse(Vector.empty)
The foldMap(partialFunction1.lift) returns Some(Vector(test3-1, test3-2, test3-3, test4-1, test4-2, test4-3)). This is exactly what you have in res1, but wrapped in Option.

Topological sort in scala

I'm looking for a nice implementation of topological sorting in scala.
The solution should be stable:
If input is already sorted, the output should be unchanged
The algorithm should be deterministic (hashCode has no effect)
I suspect there are libraries that can do this, but I wouldn't like to add nontrivial dependencies due to this.
Example problem:
case class Node(name: String)(val referenced: Node*)
val a = Node("a")()
val b = Node("b")(a)
val c = Node("c")(a)
val d = Node("d")(b, c)
val e = Node("e")(d)
val f = Node("f")()
assertEquals("Previous order is kept",
Vector(f, a, b, c, d, e),
topoSort(Vector(f, a, b, c, d, e)))
assertEquals(Vector(a, b, c, d, f, e),
topoSort(Vector(d, c, b, f, a, e)))
Here the order is defined such that if the nodes were say declarations in a programming language referencing other declarations, the result order would
be such that no declaration is used before it has been declared.
Here is my own solution. Additionnally it returns possible loops detected in the input.
The format of the nodes is not fixed because the caller provides a visitor that
will take a node and a callback and call the callback for each referenced node.
If the loop reporting is not necessary, it should be easy to remove.
import scala.collection.mutable
// Based on https://en.wikipedia.org/wiki/Topological_sorting?oldformat=true#Depth-first_search
object TopologicalSort {
case class Result[T](result: IndexedSeq[T], loops: IndexedSeq[IndexedSeq[T]])
type Visit[T] = (T) => Unit
// A visitor is a function that takes a node and a callback.
// The visitor calls the callback for each node referenced by the given node.
type Visitor[T] = (T, Visit[T]) => Unit
def topoSort[T <: AnyRef](input: Iterable[T], visitor: Visitor[T]): Result[T] = {
// Buffer, because it is operated in a stack like fashion
val temporarilyMarked = mutable.Buffer[T]()
val permanentlyMarked = mutable.HashSet[T]()
val loopsBuilder = IndexedSeq.newBuilder[IndexedSeq[T]]
val resultBuilder = IndexedSeq.newBuilder[T]
def visit(node: T): Unit = {
if (temporarilyMarked.contains(node)) {
val loopStartIndex = temporarilyMarked.indexOf(node)
val loop = temporarilyMarked.slice(loopStartIndex, temporarilyMarked.size)
.toIndexedSeq
loopsBuilder += loop
} else if (!permanentlyMarked.contains(node)) {
temporarilyMarked += node
visitor(node, visit)
permanentlyMarked += node
temporarilyMarked.remove(temporarilyMarked.size - 1, 1)
resultBuilder += node
}
}
for (i <- input) {
if (!permanentlyMarked.contains(i)) {
visit(i)
}
}
Result(resultBuilder.result(), loopsBuilder.result())
}
}
In the example of the question this would be applied like this:
import TopologicalSort._
def visitor(node: BaseNode, callback: (Node) => Unit): Unit = {
node.referenced.foreach(callback)
}
assertEquals("Previous order is kept",
Vector(f, a, b, c, d, e),
topoSort(Vector(f, a, b, c, d, e), visitor).result)
assertEquals(Vector(a, b, c, d, f, e),
topoSort(Vector(d, c, b, f, a, e), visitor).result)
Some thoughts on complexity:
The worst case complexity of this solution is actually above O(n + m) because the temporarilyMarked array is scanned for each node.
The asymptotic complexity would be improved if the temporarilyMarked would be replaced with for example a HashSet.
A true O(n + m) would be achieved if the marks were be stored directly inside the nodes, but storing them outside makes writing a generic solution easier.
I haven't run any performance tests, but I suspect scanning the temporarilyMarked array is not a problem even in large graphs as long as they are not very deep.
Example code and test on Github
I have very similar code is also published here. That version has a test suite which can be useful for experimenting and exploring the implementation.
Why would you detect loops
Detecting loops can be useful for example in serialization situations where most of the data can be handled as a DAG, but loops can be handled with some kind of special arrangement.
The test suite in the Github code linked to in above section contains various cases with multiple loops.
Here's a purely functional implementation that returns the topological ordering ONLY if the graph is acyclic.
case class Node(label: Int)
case class Graph(adj: Map[Node, Set[Node]]) {
case class DfsState(discovered: Set[Node] = Set(), activeNodes: Set[Node] = Set(), tsOrder: List[Node] = List(),
isCylic: Boolean = false)
def dfs: (List[Node], Boolean) = {
def dfsVisit(currState: DfsState, src: Node): DfsState = {
val newState = currState.copy(discovered = currState.discovered + src, activeNodes = currState.activeNodes + src,
isCylic = currState.isCylic || adj(src).exists(currState.activeNodes))
val finalState = adj(src).filterNot(newState.discovered).foldLeft(newState)(dfsVisit(_, _))
finalState.copy(tsOrder = src :: finalState.tsOrder, activeNodes = finalState.activeNodes - src)
}
val stateAfterSearch = adj.keys.foldLeft(DfsState()) {(state, n) => if (state.discovered(n)) state else dfsVisit(state, n)}
(stateAfterSearch.tsOrder, stateAfterSearch.isCylic)
}
def topologicalSort: Option[List[Node]] = dfs match {
case (topologicalOrder, false) => Some(topologicalOrder)
case _ => None
}
}

How to create an Iteratee that passes through values to an inner Iteratee unless a specific value is found

I've got an ADT that's essentially a cross between Option and Try:
sealed trait Result[+T]
case object Empty extends Result[Nothing]
case class Error(cause: Throwable) extends Result[Nothing]
case class Success[T](value: T) extends Result[T]
(assume common combinators like map, flatMap etc are defined on Result)
Given an Iteratee[A, Result[B] called inner, I want to create a new Iteratee[Result[A], Result[B]] with the following behavior:
If the input is a Success(a), feed a to inner
If the input is an Empty, no-op
If the input is an Error(err), I want inner to be completely ignored, instead returning a Done iteratee with the Error(err) as its result.
Example Behavior:
// inner: Iteratee[Int, Result[List[Int]]]
// inputs:
1
2
3
// output:
Success(List(1,2,3))
// wrapForResultInput(inner): Iteratee[Result[Int], Result[List[Int]]]
// inputs:
Success(1)
Success(2)
Error(Exception("uh oh"))
Success(3)
// output:
Error(Exception("uh oh"))
This sounds to me like the job for an Enumeratee, but I haven't been able to find anything in the docs that looks like it'll do what I want, and the internal implementations are still voodoo to me.
How can I implement wrapForResultInput to create the behavior described above?
Adding some more detail that won't really fit in a comment:
Yes it looks like I was mistaken in my question. I described it in terms of Iteratees but it seems I really am looking for Enumeratees.
At a certain point in the API I'm building, there's a Transformer[A] class that is essentially an Enumeratee[Event, Result[A]]. I'd like to allow clients to transform that object by providing an Enumeratee[Result[A], Result[B]], which would result in a Transformer[B] aka an Enumeratee[Event, Result[B]].
For a more complex example, suppose I have a Transformer[AorB] and want to turn that into a Transformer[(A, List[B])]:
// the Transformer[AorB] would give
a, b, a, b, b, b, a, a, b
// but the client wants to have
a -> List(b),
a -> List(b, b, b),
a -> Nil
a -> List(b)
The client could implement an Enumeratee[AorB, Result[(A, List[B])]] without too much trouble using Enumeratee.grouped, but they are required to provide an Enumeratee[Result[AorB], Result[(A, List[B])] which seems to introduce a lot of complication that I'd like to hide from them if possible.
val easyClientEnumeratee = Enumeratee.grouped[AorB]{
for {
_ <- Enumeratee.dropWhile(_ != a) ><> Iteratee.ignore
headResult <- Iteratee.head.map{ Result.fromOption }
bs <- Enumeratee.takeWhile(_ == b) ><> Iteratee.getChunks
} yield headResult.map{_ -> bs}
val harderEnumeratee = ??? ><> easyClientEnumeratee
val oldTransformer: Transformer[AorB] = ... // assume it already exists
val newTransformer: Transformer[(A, List[B])] = oldTransformer.andThen(harderEnumeratee)
So what I'm looking for is the ??? to define the harderEnumeratee in order to ease the burden on the user who already implemented easyClientEnumeratee.
I guess the ??? should be an Enumeratee[Result[AorB], AorB], but if I try something like
Enumeratee.collect[Result[AorB]] {
case Success(ab) => ab
case Error(err) => throw err
}
the error will actually be thrown; I actually want the error to come back out as an Error(err).
Simplest implementation of such would be Iteratee.fold2 method, that could collect elements until something is happened.
Since you return single result and can't really return anything until you verify there is no errors, Iteratee would be enough for such a task
def listResults[E] = Iteratee.fold2[Result[E], Either[Throwable, List[E]]](Right(Nil)) { (state, elem) =>
val Right(list) = state
val next = elem match {
case Empty => (Right(list), false)
case Success(x) => (Right(x :: list), false)
case Error(t) => (Left(t), true)
}
Future(next)
} map {
case Right(list) => Success(list.reverse)
case Left(th) => Error(th)
}
Now if we'll prepare little playground
import scala.concurrent.ExecutionContext.Implicits._
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
val good = Enumerator.enumerate[Result[Int]](
Seq(Success(1), Empty, Success(2), Success(3)))
val bad = Enumerator.enumerate[Result[Int]](
Seq(Success(1), Success(2), Error(new Exception("uh oh")), Success(3)))
def runRes[X](e: Enumerator[Result[X]]) : Result[List[X]] = Await.result(e.run(listResults), 3 seconds)
we can verify those results
runRes(good) //res0: Result[List[Int]] = Success(List(1, 2, 3))
runRes(bad) //res1: Result[List[Int]] = Error(java.lang.Exception: uh oh)

Specs2 + Scalacheck Generate Tuple with different Strings

I have to test an loop-free graph and always checking whether the Strings are different is not very usable (it throws an exception). There must be a better solution, but I am not able to come up with it, and i am kind of lost in the specs2 documentation.
This is an example of the code:
"BiDirectionalEdge" should {
"throw an Error for the wrong DirectedEdges" in prop {
(a :String, b :String, c :String, d :String) =>
val edge1 = createDirectedEdge(a, b, c)
val edge2 = createDirectedEdge(c, b, d)
new BiDirectionalEdge(edge1, edge2) must throwA[InvalidFormatException] or(a mustEqual d)
}
if a and c are the same, createDirectedEdge will throw an exception (i have different test for that behaviour).
Yep, there's a better way—this is precisely what conditional properties are for. Just add your condition followed by ==>:
"BiDirectionalEdge" should {
"throw an Error for the wrong DirectedEdges" in prop {
(a: String, b: String, c: String, d: String) => (a != c) ==>
val edge1 = createDirectedEdge(a, b, c)
val edge2 = createDirectedEdge(c, b, d)
new BiDirectionalEdge(edge1, edge2) must
throwA[InvalidFormatException] or(a mustEqual d)
}
}
If the condition is likely to fail often, you should probably take a different approach (see the ScalaCheck guide for details), but in your case a conditional property is totally appropriate.