I'm trying to model a relationship which can be reversed. For example, the reverse of North might be South. The reverse of Left might be Right. I'd like to use a case class to represent my relationships. I found a similar solution that uses case objects, but it's not quite what I want.
Here's my non-functional code:
case class Relationship(name: String, opposite:Relationship)
def relationshipFactory(nameA:String, nameB:String): Relationship = {
lazy val x:Relationship = Relationship(nameA, Relationship(nameB, x))
x
}
val ns = relationshipFactory("North", "South")
ns // North
ns.opposite // South
ns.opposite.opposite // North
ns.opposite.opposite.opposite // South
Can this code be changed so that:
It doesn't crash
I can create these things on demand as pairs.
If you really want to build graphs of immutable objects with circular dependencies, you have to declare opposite as def, and (preferably) throw one more lazy val into the mix:
abstract class Relationship(val name: String) {
def opposite: Relationship
}
object Relationship {
/** Factory method */
def apply(nameA: String, nameB: String): Relationship = {
lazy val x: Relationship = new Relationship(nameA) {
lazy val opposite = new Relationship(nameB) {
def opposite = x
}
}
x
}
/** Extractor */
def unapply(r: Relationship): Option[(String, Relationship)] =
Some((r.name, r.opposite))
}
val ns = Relationship("North", "South")
println(ns.name)
println(ns.opposite.name)
println(ns.opposite.opposite.name)
println(ns.opposite.opposite.opposite.name)
You can quickly convince yourself that nothing bad happens if you run a few million rounds on this circle of circular dependencies:
// just to demonstrate that it doesn't blow up in any way if you
// call it a hundred million times:
// Should be "North"
println((1 to 100000000).foldLeft(ns)((r, _) => r.opposite).name)
It indeed prints "North". It doesn't work with case classes, but you can always add your own extractors, so this works:
val Relationship(x, op) = ns
val Relationship(y, original) = op
println(s"Extracted x = $x y = $y")
It prints "North" and "South" for x and y.
However, the more obvious thing to do would be to just save both components of a relation, and add opposite as a method that constructs the opposite pair.
case class Rel(a: String, b: String) {
def opposite: Rel = Rel(b, a)
}
Actually, this is already implemented in the standard library:
scala> val rel = ("North", "South")
rel: (String, String) = (North,South)
scala> rel.swap
res0: (String, String) = (South,North)
You have cyclic dependencies, so this won't work. One option is to do:
case class Relationship(name: String)
and have a setter to specify the opposite. The factory would then do:
def relationshipFactory(nameA:String, nameB:String): Relationship = {
val x:Relationship = Relationship(nameA)
val opposite = Relationship(nameB)
x.setOpposite(opposite)
opposite.setOpposite(x)
x
}
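A minimal sketch of the setter idea, under some assumptions: it uses a plain class rather than a case class, and the field `opp` and the factory name `pair` are made up for illustration. The mutable field is private and only the companion's factory assigns it, so callers still see an effectively immutable pair.

```scala
// Hypothetical sketch: the constructor is private so instances can only be
// created in linked pairs through the factory in the companion object.
final class Relationship private (val name: String) {
  // Assigned exactly once by Relationship.pair before the object escapes.
  private var opp: Relationship = null

  def opposite: Relationship = opp

  // Only the name is printed, so toString cannot recurse through the cycle.
  override def toString: String = s"Relationship($name)"
}

object Relationship {
  /** Builds both halves, wires them to each other, and returns the first. */
  def pair(nameA: String, nameB: String): Relationship = {
    val a = new Relationship(nameA)
    val b = new Relationship(nameB)
    a.opp = b
    b.opp = a
    a
  }
}

val ns = Relationship.pair("North", "South")
println(ns.opposite.name)
println(ns.opposite.opposite.name)
```

The trade-off is that correctness now depends on the factory being the only place that touches the mutable field.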
another option:
case class Relationship(name: String) {
lazy val opposite = Utils.computeOpposite(this)
}
and have the opposite logic on the Utils object
Yet another option: you probably don't want several South instances, so you should use case objects or enums (more on that at http://pedrorijo.com/blog/scala-enums/)
Using enums, you can use pattern matching to do that logic without any overhead
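As a rough sketch of the case-object variant, assuming a fixed, known set of directions (the names `Direction`, `East` and `West` are illustrative and not taken from the question):

```scala
// Each direction is a singleton, so there is never more than one North.
sealed trait Direction {
  // The match is checked for exhaustiveness because the trait is sealed.
  def opposite: Direction = this match {
    case North => South
    case South => North
    case East  => West
    case West  => East
  }
}
case object North extends Direction
case object South extends Direction
case object East  extends Direction
case object West  extends Direction

println(North.opposite)
println(North.opposite.opposite)
```

This only fits when the pairs are known at compile time; it does not cover creating new pairs on demand.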
I'm trying to implement a functional Breadth First Search in Scala to compute the distances between a given node and all the other nodes in an unweighted graph. I've used a State Monad for this with the signature as :-
case class State[S,A](run:S => (A,S))
Other functions such as map, flatMap, sequence, modify etc etc are similar to what you'd find inside a standard State Monad.
Here's the code :-
case class Node(label: Int)
case class BfsState(q: Queue[Node], nodesList: List[Node], discovered: Set[Node], distanceFromSrc: Map[Node, Int]) {
val isTerminated = q.isEmpty
}
case class Graph(adjList: Map[Node, List[Node]]) {
def bfs(src: Node): (List[Node], Map[Node, Int]) = {
val initialBfsState = BfsState(Queue(src), List(src), Set(src), Map(src -> 0))
val output = bfsComp(initialBfsState)
(output.nodesList,output.distanceFromSrc)
}
@tailrec
private def bfsComp(currState:BfsState): BfsState = {
if (currState.isTerminated) currState
else bfsComp(searchNode.run(currState)._2)
}
private def searchNode: State[BfsState, Unit] = for {
node <- State[BfsState, Node](s => {
val (n, newQ) = s.q.dequeue
(n, s.copy(q = newQ))
})
s <- get
_ <- sequence(adjList(node).filter(!s.discovered(_)).map(n => {
modify[BfsState](s => {
s.copy(s.q.enqueue(n), n :: s.nodesList, s.discovered + n, s.distanceFromSrc + (n -> (s.distanceFromSrc(node) + 1)))
})
}))
} yield ()
}
Please can you advise on :-
Should the State Transition on dequeue in the searchNode function be a member of BfsState itself?
How do I make this code more performant/concise/readable?
First off, I suggest moving all the private defs related to bfs into bfs itself. This is the convention for methods that are solely used to implement another.
Second, I suggest simply not using State for this matter. State (like most monads) is about composition. It is useful when you have many things that all need access to the same global state. In this case, BfsState is specialized to bfs, will likely never be used anywhere else (it might be a good idea to move the class into bfs too), and the State itself is always run, so the outer world never sees it. (In many cases, this is fine, but here the scope is too small for State to be useful.) It'd be much cleaner to pull the logic of searchNode into bfsComp itself.
Third, I don't understand why you need both nodesList and discovered, when you can just call _.toList on discovered once you've done your computation. I've left it in in my reimplementation, though, in case there's more to this code that you haven't displayed.
def bfsComp(old: BfsState): BfsState = {
if (old.q.isEmpty) old // You don't need isTerminated, I think
else {
val (currNode, newQ) = old.q.dequeue
val newState = old.copy(q = newQ)
val next = adjList(currNode)
.filterNot(old.discovered) // Set[T] <: T => Boolean and filterNot means you don't need to write !old.discovered(_)
.foldLeft(newState) { case (BfsState(q, nodes, discovered, distance), adjNode) =>
BfsState(
q.enqueue(adjNode),
adjNode :: nodes,
discovered + adjNode,
distance + (adjNode -> (distance(currNode) + 1))
)
}
bfsComp(next) // keep going until the queue is empty
}
}
def bfs(src: Node): (List[Node], Map[Node, Int]) = {
// I suggest moving BfsState and bfsComp into this method
val output = bfsComp(BfsState(Queue(src), List(src), Set(src), Map(src -> 0)))
(output.nodesList, output.distanceFromSrc)
// Could get rid of nodesList and say output.discovered.toList
}
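For reference, here is one self-contained way the State-free version could look. It is a sketch rather than a drop-in replacement, and it deviates from the code above in a few places: the loop is nested inside bfs, the discovered set is folded into the distance map, nodes missing from adjList are treated as having no neighbours, and the node list is returned in discovery order instead of reversed.

```scala
import scala.annotation.tailrec
import scala.collection.immutable.Queue

final case class Node(label: Int)

final case class Graph(adjList: Map[Node, List[Node]]) {
  def bfs(src: Node): (List[Node], Map[Node, Int]) = {
    // `order` is accumulated in reverse; `dist` doubles as the discovered set.
    @tailrec
    def loop(q: Queue[Node], order: List[Node], dist: Map[Node, Int]): (List[Node], Map[Node, Int]) =
      q.dequeueOption match {
        case None => (order.reverse, dist)
        case Some((curr, rest)) =>
          // Neighbours not seen yet; distinct guards against duplicate edges.
          val fresh = adjList.getOrElse(curr, Nil).distinct.filterNot(dist.contains)
          val d = dist(curr) + 1
          loop(
            fresh.foldLeft(rest)(_ enqueue _),
            fresh.reverse ::: order,
            dist ++ fresh.map(_ -> d)
          )
      }
    loop(Queue(src), List(src), Map(src -> 0))
  }
}

// A small diamond-shaped graph: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4
val graph = Graph(Map(
  Node(1) -> List(Node(2), Node(3)),
  Node(2) -> List(Node(4)),
  Node(3) -> List(Node(4)),
  Node(4) -> Nil
))
val (visited, distances) = graph.bfs(Node(1))
println(visited.map(_.label))
println(distances.map { case (n, d) => n.label -> d })
```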
In the event that you think you do have a good reason for using State here, here are my thoughts.
You use def searchNode. The point of a State is that it is pure and immutable, so it should be a val, or else you reconstruct the same State every use.
You write:
node <- State[BfsState, Node](s => {
val (n, newQ) = s.q.dequeue
(n, s.copy(q = newQ))
})
First off, Scala's syntax was designed so that you don't need to have both a () and {} surrounding an anonymous function:
node <- State[BfsState, Node] { s =>
// ...
}
Second, this doesn't look quite right to me. One benefit of using for-syntax is that the anonymous functions are hidden from you and there is minimal indentation. I'd just write it out
oldState <- get
(node, newQ) = oldState.q.dequeue
newState = oldState.copy(q = newQ)
Footnote: would it make sense to make Node an inner class of Graph? Just a suggestion.
Using Monocle I can define a Lens to read a case class member without issue,
val md5Lens = GenLens[Message](_.md5)
This can be used to compare the value of md5 between two objects and fail with an error message that includes the field name when the values differ.
Is there a way to produce a user-friendly string from the Lens alone that identifies the field being read by the lens? I want to avoid providing the field name explicitly
val md5LensAndName = (GenLens[Message](_.md5), "md5")
If there is a solution that also works with lenses with more than one component then even better. For me it would be good even if the solution only worked to a depth of one.
This is fundamentally impossible. Conceptually, a lens is nothing more than a pair of functions: one to get a value from an object and one to obtain a new object using a given value. Those functions may or may not be implemented by accessing the source object's fields. In fact, even the GenLens macro can use a chain of field accessors like _.field1.field2 to generate composite lenses to the fields of nested objects. That can be confusing at first, but this feature has its uses. For example, you can decouple the format of data storage and representation:
import monocle._
case class Person private(value: String) {
import Person._
private def replace(
array: Array[String], index: Int, item: String
): Array[String] = {
val copy = Array.ofDim[String](array.length)
array.copyToArray(copy)
copy(index) = item
copy
}
def replaceItem(index: Int, item: String): Person = {
val array = value.split(delimiter)
val newArray = replace(array, index, item)
val newValue = newArray.mkString(delimiter)
Person(newValue)
}
def getItem(index: Int): String = {
val array = value.split(delimiter)
array(index)
}
}
object Person {
private val delimiter: String = ";"
val nameIndex: Int = 0
val cityIndex: Int = 1
def apply(name: String, address: String): Person =
Person(Array(name, address).mkString(delimiter))
}
val name: Lens[Person, String] =
Lens[Person, String](
_.getItem(Person.nameIndex)
)(
name => person => person.replaceItem(Person.nameIndex, name)
)
val city: Lens[Person, String] =
Lens[Person, String](
_.getItem(Person.cityIndex)
)(
city => person => person.replaceItem(Person.cityIndex, city)
)
val person = Person("John", "London")
val personAfterMove = city.set("New York")(person)
println(name.get(personAfterMove)) // John
println(city.get(personAfterMove)) // New York
While not very performant, this example illustrates the idea: the Person class doesn't have city or address fields, but by wrapping a data extractor and a string-rebuilding function into a Lens, we can pretend it has them. For more complex objects, lens composition works as usual: the inner lens just operates on the extracted object, relying on the outer one to pack it back.
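To make the composition remark concrete without depending on Monocle, here is a hand-rolled sketch. `SimpleLens`, `andThen`, `Address` and `Employee` are all invented for this illustration and are not Monocle's API. Note that the composed lens holds only two functions, which is why no field name can be recovered from it.

```scala
// A lens reduced to its essence: a getter and a setter, nothing else.
final case class SimpleLens[S, A](get: S => A, set: A => S => S) {
  // Composition: the inner lens edits the extracted part,
  // and the outer lens packs the edited part back into the whole.
  def andThen[B](inner: SimpleLens[A, B]): SimpleLens[S, B] =
    SimpleLens[S, B](
      s => inner.get(get(s)),
      b => s => set(inner.set(b)(get(s)))(s)
    )
}

final case class Address(city: String)
final case class Employee(name: String, address: Address)

val address = SimpleLens[Employee, Address](_.address, a => e => e.copy(address = a))
val city    = SimpleLens[Address, String](_.city, c => a => a.copy(city = c))

val employeeCity = address.andThen(city)
val moved = employeeCity.set("New York")(Employee("John", Address("London")))
println(employeeCity.get(moved))
```

If a name is needed for error messages, it has to travel alongside the lens explicitly, as in the tuple from the question.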
I want to check whether a specific id is contained in an Enumeration.
So I wrote this contains function:
object Enum extends Enumeration {
type Enum = Value
val A = Value(2, "A")
def contains(value: Int): Boolean = {
Enum.values.map(_.id).contains(value)
}
}
But the time cost is unexpectedly high when the id is a big number, such as one with more than eight digits:
val A = Value(222222222, "A")
Then the contains function costs over 1000 ms per call.
I also noticed that the first call always costs hundreds of milliseconds, whether the id is big or small.
I can't figure out why.
First, let's talk about the cost of Enum.values. This is implemented here:
https://github.com/scala/scala/blob/0b47dc2f28c997aed86d6f615da00f48913dd46c/src/library/scala/Enumeration.scala#L83
The implementation is essentially setting up a mutable map. Once it is set up, it is re-used.
The cost for big numbers in your Value arises because, internally, the Scala library uses a BitSet.
See here: https://github.com/scala/scala/blob/0b47dc2f28c997aed86d6f615da00f48913dd46c/src/library/scala/Enumeration.scala#L245
So, for larger numbers, BitSet will be bigger. That only happens when you call Enum.values.
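The size effect can be seen on a bare bit set, independent of Enumeration. The sketch below uses scala.collection.immutable.BitSet purely as an illustration of how such a structure scales with its largest element; it does not reproduce Enumeration's internals.

```scala
import scala.collection.immutable.BitSet

// A bit set stores one bit per possible id up to the largest element,
// packed into 64-bit words, regardless of how many elements it holds.
val small = BitSet(2)
val large = BitSet(222222222)

val smallWords = small.toBitMask.length
val largeWords = large.toBitMask.length

println(s"words backing BitSet(2):         $smallWords")
println(s"words backing BitSet(222222222): $largeWords")
```

Both sets contain a single element, yet the second needs millions of 64-bit words, so building or copying it on every call is costly.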
Depending on your specific use case, you can choose between using Enumeration or case objects:
Case objects vs Enumerations in Scala
It sure looks like the mechanics of Enumeration don't handle large ints well in that position. The Scaladocs for the class don't say anything about this, but they don't advertise using Enumeration.Value the way you do either. They say, e.g., val A = Value, where you say val A = Value(2000, "A").
If you want to keep your contains method as you have it, why don't you cache the Enum.values.map(_.id)? Much faster.
object mult extends App {
object Enum extends Enumeration {
type Enum = Value
val A1 = Value(1, "A")
val A2 = Value(2, "A")
val A222 = Enum.Value(222222222, "A")
def contains(value: Int): Boolean = {
Enum.values.map(_.id).contains(value)
}
val cache = Enum.values.map(_.id)
def contains2(value: Int): Boolean = {
cache.contains(value)
}
}
def clockit(desc: String, f: => Unit) = {
val start = System.currentTimeMillis
f
val end = System.currentTimeMillis
println(s"$desc ${end - start}")
}
clockit("initialize Enum ", Enum.A1)
clockit("contains 2 ", Enum.contains(2))
clockit("contains 222222222 ", Enum.contains(222222222))
clockit("contains 222222222 ", Enum.contains(222222222))
clockit("contains2 2 ", Enum.contains2(2))
clockit("contains2 222222222", Enum.contains2(222222222))
}
Say I have the following data structure:
case class Timestamped[CC[M] <: Seq[M]](elems : CC, timestamp : String)
So it's essentially a sequence with an attribute -- a timestamp -- attached to it. This works fine and I could create new instances with the syntax
val t = Timestamped(Seq(1,2,3,4),"2014-02-25")
t.elems.head // 1
t.timestamp // "2014-02-25"
The syntax is unwieldy and instead I want to be able to do something like:
val t = Timestamped(1,2,3,4)("2014-02-25")
t.head // 1
t.timestamp // "2014-02-25"
Where Timestamped is just an extension of a Seq and its implementation SeqLike, with a single attribute val timestamp : String.
This seems easy to do; just use a Seq with a mixin TimestampMixin { val timestamp : String }. But I can't figure out how to create the constructor. My question is: how do I create a constructor in the companion object, that creates a sequence with an extra member value? The signature is as follows:
object Timestamped {
def apply[M](elems: M*)(timestamp : String) : Seq[M] with TimestampMixin = ???
}
You'll see that it's not straightforward; collections use Builders to instantiate themselves, so I can't simply call the constructor and override some vals.
Scala collections are very complicated structures when it comes down to it. Extending Seq requires implementing apply, length, and iterator methods. In the end, you'll probably end up duplicating existing code for List, Set, or something else. You'll also probably have to worry about CanBuildFroms for your collection, which in the end I don't think is worth it if you just want to add a field.
Instead, consider an implicit conversion from your Timestamped type to Seq.
case class Timestamped[A](elems: Seq[A])(val timestamp: String) // val: parameters in a second list are not fields by default
object Timestamped {
implicit def toSeq[A](ts: Timestamped[A]): Seq[A] = ts.elems
}
Now, whenever I try to call a method from Seq, the compiler will implicitly convert Timestamped to Seq, and we can proceed as normal.
scala> val ts = Timestamped(List(1,2,3,4))("1/2/34")
ts: Timestamped[Int] = Timestamped(List(1, 2, 3, 4))
scala> ts.filter(_ > 2)
res18: Seq[Int] = List(3, 4)
There is one major drawback here, and it's that we're now stuck with Seq after performing operations on the original Timestamped.
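One way to soften that drawback is to route operations through a helper that re-attaches the timestamp. The method name `transform` below is hypothetical and not part of the answer above; this is a sketch of the idea only.

```scala
final case class Timestamped[A](elems: Seq[A])(val timestamp: String) {
  // Hypothetical helper: apply any Seq operation and keep the timestamp.
  def transform[B](f: Seq[A] => Seq[B]): Timestamped[B] =
    Timestamped(f(elems))(timestamp)
}

val ts  = Timestamped(List(1, 2, 3, 4))("1/2/34")
val big = ts.transform(_.filter(_ > 2))

println(big.elems)
println(big.timestamp)
```

The cost is that callers must remember to use the helper; anything that goes through the implicit conversion still yields a plain Seq.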
Go the other way... extend Seq, it only has 3 abstract members:
case class Stamped[T](elems: Seq[T], stamp: Long) extends Seq[T] {
override def apply(i: Int) = elems.apply(i)
override def iterator = elems.iterator
override def length = elems.length
}
val x = Stamped(List(10,20,30), 15L)
println(x.head) // 10
println(x.stamp) // 15
println(x.map { _ * 10}) // List(100, 200, 300)
println(x.filter { _ > 20}) // List(30)
Keep in mind, this only works as long as Seq is specific enough for your use cases; if you later find you need more complex collection behavior, this may become untenable.
EDIT: Added a version closer to the signature you were trying to create. Not sure if this helps you any more:
case class Stamped[T](elems: T*)(stamp: Long) extends Seq[T] {
def timeStamp = stamp
override def apply(i: Int) = elems.apply(i)
override def iterator = elems.iterator
override def length = elems.length
}
val x = Stamped(10,20,30)(15L)
println(x.head) // 10
println(x.timeStamp) // 15
println(x.map { _ * 10}) // List(100, 200, 300)
println(x.filter { _ > 20}) // List(30)
Where elems would end up being a generically created WrappedArray.
I have a class that represents sales orders:
class SalesOrder(val f01:String, val f02:Int, ..., val f50:Date)
The fXX fields are of various types. I am faced with the problem of creating an audit trail of my orders. Given two instances of the class, I have to determine which fields have changed. I have come up with the following:
class SalesOrder(val f01:String, val f02:Int, ..., val f50:Date){
def auditDifferences(that:SalesOrder): List[String] = {
def diff[A](fieldName:String, getField: SalesOrder => A) =
if(getField(this) != getField(that)) Some(fieldName) else None
val diffList = diff("f01", _.f01) :: diff("f02", _.f02) :: ...
:: diff("f50", _.f50) :: Nil
diffList.flatten
}
}
I was wondering what the compiler does with all the _.fXX functions: are they instantiated just once (statically) and shared by all instances of my class, or will they be instantiated every time I create an instance of my class?
My worry is that, since I will use a lot of SalesOrder instances, it may create a lot of garbage. Should I use a different approach?
One clean way of solving this problem would be to use the standard library's Ordering type class. For example:
class SalesOrder(val f01: String, val f02: Int, val f03: Char) {
def diff(that: SalesOrder) = SalesOrder.fieldOrderings.collect {
case (name, ord) if !ord.equiv(this, that) => name
}
}
object SalesOrder {
val fieldOrderings: List[(String, Ordering[SalesOrder])] = List(
"f01" -> Ordering.by(_.f01),
"f02" -> Ordering.by(_.f02),
"f03" -> Ordering.by(_.f03)
)
}
And then:
scala> val orderA = new SalesOrder("a", 1, 'a')
orderA: SalesOrder = SalesOrder@5827384f
scala> val orderB = new SalesOrder("b", 1, 'b')
orderB: SalesOrder = SalesOrder@3bf2e1c7
scala> orderA diff orderB
res0: List[String] = List(f01, f03)
You almost certainly don't need to worry about the performance of your original formulation, but this version is (arguably) nicer for unrelated reasons.
Yes, that creates 50 short-lived functions. I don't think you should be worried unless you have clear evidence that this causes a performance problem in your case.
But I would define a method that transforms SalesOrder into a Map[String, Any], then you would just have
trait SalesOrder {
def fields: Map[String, Any]
}
def diff(a: SalesOrder, b: SalesOrder): Iterable[String] = {
val af = a.fields
val bf = b.fields
af.collect { case (key, value) if bf(key) != value => key }
}
If the field names are indeed just incremental numbers, you could simplify
trait SalesOrder {
def fields: Iterable[Any]
}
def diff(a: SalesOrder, b: SalesOrder): Iterable[String] =
(a.fields zip b.fields).zipWithIndex.collect {
case ((av, bv), idx) if av != bv => f"f${idx + 1}%02d"
}