Scala BitSet and shift operations

I'm looking for a way to represent a set of integers with a bit vector (which would be the characteristic function of that set of integers) and be able to perform bitwise operations on this set.
Initially I thought Scala's BitSet would be the ideal candidate. However, it seems BitSet doesn't support shift operations according to the documentation [1]. Upon further investigation I also found that the related Java BitSet implementation doesn't support shift operations either [2].
Am I left with the only option of implementing my own BitSet class that supports shift operations? Moreover, according to the description given in [3], it doesn't sound that difficult to support shift operations in Scala's BitSet implementation, or have I misunderstood something here?
Thanks in advance.

The usual trick when faced with a need for retrofitting new functionality is the "Pimp My Library" pattern. Implicitly convert the BitSet to a dedicated type intended to perform the added operation:
import scala.collection.immutable.BitSet

class ShiftableBitSet(bs: BitSet) {
  def shiftLeft(n: Int): BitSet = bs.map(_ + n) // one possible implementation
}
implicit def bitsetIsShiftable(bs: BitSet): ShiftableBitSet = new ShiftableBitSet(bs)

val sample = BitSet(1, 2, 3, 5, 7, 9)
val shifted = sample.shiftLeft(2) // BitSet(3, 4, 5, 7, 9, 11)
Alter shiftLeft to whatever name and with whatever arguments you prefer.
UPDATE
If you know for certain that you'll have an immutable BitSet, then a (slightly hacky) way to access the raw underlying array is to pattern match. It's not too painful either, as there are only 3 possible concrete subclasses of an immutable BitSet:
import collection.immutable.BitSet
val bitSet = BitSet(1,2,3)
bitSet match {
  case bs: BitSet.BitSet1 => Array(bs.elems)
  case bs: BitSet.BitSetN => bs.elems
  case _                  => sys.error("unusable BitSet")
}
Annoyingly, the elems1 param to BitSet2 isn't a val, and the elems param to a mutable BitSet is marked protected. So it's not perfect, but should do the trick if your set is non-trivial and immutable. For the trivial cases, "normal" access to the set won't be too expensive.
And yes, this technique would be used within the wrapper as described above.
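If a shift by whole 64-bit words is enough, a sketch that sticks to the public bit-mask API (toBitMask and fromBitMask on the immutable BitSet companion) avoids the pattern match entirely, at the cost of copying the words:
import scala.collection.immutable.BitSet

// Sketch: shift left by whole 64-bit words by prepending empty words.
// Word 0 holds bits 0-63, so each prepended word shifts elements by 64.
def shiftLeftWords(bs: BitSet, words: Int): BitSet =
  BitSet.fromBitMask(new Array[Long](words) ++ bs.toBitMask)

shiftLeftWords(BitSet(0, 1), 1) // BitSet(64, 65)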

You can just use map, for example to shift left by 4 positions:
import collection.immutable.BitSet
val bitSet = BitSet(1,2,3)
bitSet map (_ + 4)
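If you also need right shifts, note that a BitSet can only hold non-negative Ints, so elements that would fall below zero have to be dropped; a minimal sketch:
import scala.collection.immutable.BitSet

// Sketch: shift in either direction, discarding elements that would
// become negative (a BitSet cannot hold negative Ints).
def shift(bs: BitSet, n: Int): BitSet =
  bs.filter(_ + n >= 0).map(_ + n)

shift(BitSet(1, 2, 3), 4)  // BitSet(5, 6, 7)
shift(BitSet(1, 2, 3), -2) // BitSet(0, 1)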


Why do you need Arbitraries in scalacheck?

I wonder why Arbitrary is needed, because automated property testing requires a property definition, like
val prop = forAll(v: T => check that property holds for v)
and a generator for the value v. The user guide says that you can create custom generators for custom types (a generator for trees is given as an example). Yet it does not explain why you need arbitraries on top of that.
Here is a piece of the manual:
implicit lazy val arbBool: Arbitrary[Boolean] = Arbitrary(oneOf(true, false))
To get support for your own type T you need to define an implicit def
or val of type Arbitrary[T]. Use the factory method Arbitrary(...) to
create the Arbitrary instance. This method takes one parameter of type
Gen[T] and returns an instance of Arbitrary[T].
It clearly says that we need Arbitrary on top of Gen. The justification for Arbitrary is not satisfactory, though:
The arbitrary generator is the generator used by ScalaCheck when it
generates values for property parameters.
IMO, to use the generators, you need to import them rather than wrap them into arbitraries! Otherwise, one can argue that we need to wrap arbitraries into something else to make them usable (and so on ad infinitum, wrapping the wrappers endlessly).
Can you also explain how arbitrary[Int] converts an argument type into a generator? It is very curious, and I feel that these are related questions.
forAll { v: T => ... } is implemented with the help of Scala implicits. That means that the generator for the type T is found implicitly instead of being explicitly specified by the caller.
Scala implicits are convenient, but they can also be troublesome if you're not sure what implicit values or conversions currently are in scope. By using a specific type (Arbitrary) for doing implicit lookups, ScalaCheck tries to constrain the negative impacts of using implicits (this use also makes it similar to Haskell typeclasses that are familiar for some users).
So, you are entirely correct that Arbitrary is not really needed. The same effect could have been achieved through implicit Gen[T] values, arguably with a bit more implicit scoping confusion.
As an end-user, you should think of Arbitrary[T] as the default generator for the type T. You can (through scoping) define and use multiple Arbitrary[T] instances, but I wouldn't recommend it. Instead, just skip Arbitrary and specify your generators explicitly:
val myGen1: Gen[T] = ...
val myGen2: Gen[T] = ...
val prop1 = forAll(myGen1) { t => ... }
val prop2 = forAll(myGen2) { t => ... }
arbitrary[Int] works just like forAll { n: Int => ... }: it just looks up the implicit Arbitrary[Int] instance and uses its generator. The implementation is simple:
def arbitrary[T](implicit a: Arbitrary[T]): Gen[T] = a.arbitrary
The implementation of Arbitrary might also be helpful here:
sealed abstract class Arbitrary[T] {
  val arbitrary: Gen[T]
}
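To make the quoted manual text concrete, here is a minimal sketch of wiring a custom Gen into an implicit Arbitrary so that forAll can find it (the Temperature type is made up for illustration):
import org.scalacheck.{Arbitrary, Gen, Prop}
import org.scalacheck.Prop.forAll

// Hypothetical user type.
case class Temperature(celsius: Double)

// An explicit generator...
val tempGen: Gen[Temperature] =
  Gen.choose(-50.0, 50.0).map(Temperature(_))

// ...flagged as the default by wrapping it in an implicit Arbitrary.
implicit val arbTemp: Arbitrary[Temperature] = Arbitrary(tempGen)

// forAll now finds the generator through the implicit Arbitrary.
val prop: Prop = forAll { (t: Temperature) => t.celsius.abs <= 50.0 }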
ScalaCheck was ported from the Haskell QuickCheck library. In Haskell, type classes only allow one instance for a given type, forcing you into this sort of separation.
In Scala, though, there isn't such a constraint, and it would be possible to simplify the library. My guess is that ScalaCheck, being (initially written as) a 1-to-1 mapping of QuickCheck, is easier for Haskellers to jump into :)
Here is the Haskell definition of Arbitrary
class Arbitrary a where
  -- | A generator for values of the given type.
  arbitrary :: Gen a
And Gen
newtype Gen a
As you can see, they have very different semantics: Arbitrary is a type class, and Gen is a wrapper with a bunch of combinators to build generators.
I agree that the argument of "limiting the scope through semantics" is a bit vague and does not seem to be taken seriously when it comes to organizing the code: the Arbitrary instances sometimes simply delegate to Gen instances, as in
/** Arbitrary instance of Calendar */
implicit lazy val arbCalendar: Arbitrary[java.util.Calendar] =
  Arbitrary(Gen.calendar)
and sometimes defines its own generator
/** Arbitrary BigInt */
implicit lazy val arbBigInt: Arbitrary[BigInt] = {
  val long: Gen[Long] =
    Gen.choose(Long.MinValue, Long.MaxValue).map(x => if (x == 0) 1L else x)
  val gen1: Gen[BigInt] = for { x <- long } yield BigInt(x)
  /* ... */
  Arbitrary(frequency((5, gen0), (5, gen1), (4, gen2), (3, gen3), (2, gen4)))
}
So in effect this leads to code duplication (each default Gen being mirrored by an Arbitrary) and some confusion (why isn't Arbitrary[BigInt] wrapping a default Gen[BigInt]?).
My reading of that is that you might need to have multiple instances of Gen, so Arbitrary is used to "flag" the one that you want ScalaCheck to use?

How to efficiently select a random element from a Scala immutable HashSet

I have a scala.collection.immutable.HashSet that I want to randomly select an element from.
I could solve the problem with an extension method like this:
import scala.collection.immutable.HashSet
import scala.util.Random

implicit class HashSetExtensions[T](h: HashSet[T]) {
  def nextRandomElement(): Option[T] = {
    val list = h.toList
    list match {
      case null | Nil => None
      case _          => Some(list(Random.nextInt(list.length)))
    }
  }
}
...but converting to a list will be slow. What would be the most efficient solution?
WARNING: This answer is for experimental use only. For a real project you should probably use your own collection types.
So I did some research in the HashSet source, and I think there is little opportunity to extract the inner structure of the key class HashTrieSet without violating package boundaries.
I did come up with this code, which extends Ben Reich's solution:
package scala.collection
import scala.collection.immutable.HashSet
import scala.util.Random
package object random {
  implicit class HashSetRandom[T](set: HashSet[T]) {
    def randomElem: Option[T] = set match {
      case trie: HashSet.HashTrieSet[T] =>
        trie.elems(Random.nextInt(trie.elems.length)).randomElem
      case _ => Some(set.size) collect {
        case size if size > 0 => set.iterator.drop(Random.nextInt(size)).next
      }
    }
  }
}
The file should be created somewhere under the src/scala/collection/random folder.
Note the scala.collection package: this is what makes the elems member of HashTrieSet visible. This is the only solution I could think of that could run better than O(n). The current version should have complexity O(log n), like any of immutable.HashSet's operations.
Another warning: the private structure of HashSet is not part of Scala's standard library API, so it could change in any version, breaking this code (though it hasn't changed since 2.8).
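Usage, once the file is compiled into your project (a sketch):
import scala.collection.immutable.HashSet
import scala.collection.random._

val set = HashSet(1, 2, 3, 4, 5)
set.randomElem // Some of a random member; None for an empty set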
Since size is O(1) on HashSet, and iterator is as lazy as possible, I think this solution would be relatively efficient:
implicit class RichHashSet[T](val h: HashSet[T]) extends AnyVal {
  def nextRandom: Option[T] = Some(h.size) collect {
    case size if size > 0 => h.iterator.drop(Random.nextInt(size)).next
  }
}
And if you're trying to get every ounce of efficiency, you could use a match instead of the more concise Some/collect idiom, as sketched below.
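A sketch of that match-based variant (assuming the same imports; the class name is different only to avoid clashing with the one above):
implicit class RichHashSetMatch[T](val h: HashSet[T]) extends AnyVal {
  def nextRandom: Option[T] = h.size match {
    case 0    => None
    case size => Some(h.iterator.drop(Random.nextInt(size)).next())
  }
}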
You can look at the mutable HashSet implementation to see the size method. The iterator method defined there basically just calls iterator on FlatHashTable. The same basic efficiencies apply to immutable HashSet, if that's what you're working with. For comparison, the toList implementation on HashSet lives all the way up the type hierarchy at TraversableOnce, uses far more primitive building blocks that are probably less efficient, and (of course) must iterate the entire collection to generate the List. If you were going to convert the entire set to a Traversable collection, you should use Array or Vector, which have constant-time lookup.
You might also note that there is nothing special about HashSet in the above methods, and you could enrich Set[T] instead, if you so chose (although there would be no guarantee that this would be as efficient on other Set implementations, of course).
As a side note, when implementing enriched classes for extension methods, you should always consider making an implicit, user-defined value class by extending AnyVal. You can read about some of the advantages and limitations in the docs, and on this answer.

what's the conceptual purpose of the Tuple2?

Take this as a follow-up to this SO question.
I'm new to Scala and working through the 99 problems. The given solution to P09 is:
object P09 {
  def pack[A](ls: List[A]): List[List[A]] = {
    if (ls.isEmpty) List(List())
    else {
      val (packed, next) = ls span { _ == ls.head }
      if (next == Nil) List(packed)
      else packed :: pack(next)
    }
  }
}
The span function is doing all the work here. As you can see from the API doc (it's the link), span returns a Tuple2 (actually the doc says it returns a pair, but that's been deprecated in favor of Tuple2). I was trying to figure out why you don't get something back like a list of lists or some such thing, and stumbled across the SO link above. As I understand it, the reason for the Tuple2 has to do with increasing performance by not having to deal with Java's boxing/unboxing of things like ints into objects like Integers. My questions are:
1) is that an accurate statement?
2) are there other reasons for something like span to return a Tuple2?
thx!
A TupleN object has at least two major differences when compared to a "standard" List+:
(less importantly) the size of the tuple is known beforehand, allowing the programmer and the compiler to reason about it better.
(more importantly) a tuple preserves the type information for each of its elements/"slots".
Note that, as alluded, the type Tuple2 is a part of the TupleN family, all utilizing the same concept. For example:
scala> ("1",2,3l)
res0: (String, Int, Long) = (1,2,3)
scala> res0.getClass
res1: Class[_ <: (String, Int, Long)] = class scala.Tuple3
As you can see here, each of the elements in the 3-tuple has a distinct type, allowing for better pattern matching, stricter type protection etc.
+heterogeneous lists are also possible in Scala, but, so far, they're not part of the standard library, and arguably harder to understand, especially for newcomers.
span returns exactly two values. A Tuple2 can hold exactly two values. A list can contain arbitrarily many values. Therefore Tuple2 is just a better fit than using a list.
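To see both points at once: span's Tuple2 result can be destructured directly, with each part keeping its precise type:
val xs = List(1, 1, 2, 3)

// span returns (List[Int], List[Int]); a pattern bind unpacks it.
val (prefix, rest) = xs.span(_ == xs.head)
// prefix: List[Int] = List(1, 1)
// rest:   List[Int] = List(2, 3)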

Override equality for floating point values in Scala

Note: Bear with me, I'm not asking how to override equals or how to create a custom method to compare floating point values.
Scala is very nice in allowing comparison of objects by value and in providing a series of tools to do so with little code: in particular, case classes, tuples, and comparison of entire collections.
I often call methods that do intensive computations and generate a non-trivial data structure to return, and I can then write a unit test that, given a certain input, calls the method and compares the results against a hardcoded value. For instance:
def compute() = {
  // do a lot of computations here to produce the set below...
  Set(('a', 1), ('b', 3))
}
val A = compute()
val equal = A == Set(('a', 1), ('b', 3))
// equal = true
This is a bare-bones example and I'm omitting here any code from specific test libraries, etc.
Given that floating point values are not reliably compared with equals, the following, rather equivalent, example fails:
def compute() = {
  // do a lot of computations here to produce the set below...
  Set(('a', 1.0/3.0), ('b', 3.1))
}
val A = compute()
val equal2 = A == Set(('a', 0.33333), ('b', 3.1)) // Use some arbitrary precision here
// equal2 = false
What I would want is to have a way to make all floating-point comparisons in that call to use an arbitrary level of precision. But note that I don't control (or want to alter in any way) either Set or Double.
I tried defining an implicit conversion from Double to a new class and then overriding equals on that class to compare approximately. I could then use instances of that class in my hardcoded validations.
implicit class DoubleAprox(d: Double) {
  override def hashCode = d.hashCode()
  override def equals(other: Any): Boolean = other match {
    case that: Double => (d - that).abs < 1e-5
    case _            => false
  }
}
val equals3 = DoubleAprox(1.0/3.0) == 0.33333 // true
val equals4 = 0.33333 == DoubleAprox(1.0/3.0) // false
But as you can see, it breaks symmetry. Given that I'm then comparing more complex data-structures (sets, tuples, case classes), I have no way to define a priori if equals() will be called on the left or the right. Seems like I'm bound to traverse all the structures and then do single floating-point comparisons on the branches... So, the question is: is there any way to do this at all??
As a side note: I gave a good read to an entire chapter on object equality and several blogs, but they only provide solutions for inheritance problems and require you to basically own all the classes involved and change all of them. And it all seems rather convoluted given what it is trying to solve.
Seems to me that equality is one of those things that is fundamentally broken in Java due to the method having to be added to each class and permanently overridden time and again. What seems more intuitive to me would be to have comparison methods that the compiler can find. Say, you would provide equals(DoubleAprox, Double) and it would be used every time you want to compare 2 objects of those classes.
I think that changing the meaning of equality to mean anything fuzzy is a bad idea. See my comments in Equals for case class with floating point fields for why.
However, it can make sense to do this in a very limited scope, e.g. for testing. I think for numerical problems you should consider using the spire library as a dependency. It contains a large number of useful things, among them a type class for equality and mechanisms to derive type class instances for composite types (collections, tuples, etc.) based on the type class instances for the individual scalar types.
Since, as you observe, equality in the Java world is fundamentally broken, spire uses other operators (=== for type-safe equality).
Here is an example how you would redefine equality for a limited scope to get fuzzy equality for comparing test results:
// import the machinery for operators like === (when an Eq type class instance is in scope)
import spire.syntax.all._
object Test extends App {
  // redefine the equality for Double, just in this scope, to mean fuzzy equality
  implicit object FuzzyDoubleEq extends spire.algebra.Eq[Double] {
    def eqv(a: Double, b: Double) = (a - b).abs < 1e-5
  }

  // this passes. === looks up the Eq instance for Double in the implicit scope. And
  // since we have not imported the default instance but defined our own, this will
  // find the Eq instance defined above and use its eqv method
  require(0.0 === 0.000001)

  // import automatic generation of type class instances for tuples based on type
  // class instances of the scalars. If there is an Eq available for each scalar
  // type of the tuple, this will also make an Eq instance available for the tuple
  import spire.std.tuples._
  require((0.0, 0.0) === (0.000001, 0.0)) // works also for tuples containing doubles

  // import automatic generation of type class instances for arrays based on type
  // class instances of the scalars. If there is an Eq instance for the element
  // type of the array, there will also be one for the entire array
  import spire.std.array._
  require(Array(0.0, 1.0) === Array(0.000001, 1.0)) // and for arrays of doubles

  import spire.std.seq._
  require(Seq(1.0, 0.0) === Seq(1.000000001, 0.0))
}
Java equals is indeed not as principled as it should be - people who are very bothered about this use something like Scalaz' Equal and ===. But even that assumes a symmetry of the types involved; I think you would have to write a custom typeclass to allow comparing heterogeneous types.
It's quite easy to write a new typeclass and have instances recursively derived for case classes, using Shapeless' automatic type class instance derivation. I'm not sure that extends to a two-parameter typeclass though. You might find it best to create distinct EqualityLHS and EqualityRHS typeclasses, and then your own equality method for comparing A: EqualityLHS and B: EqualityRHS, which could be pimped onto A as an operator if desired. (Of course it should be possible to extend the technique generically to support two-parameter typeclasses in full generality rather than needing such workarounds, and I'm sure shapeless would greatly appreciate such a contribution).
Best of luck - hopefully this gives you enough to find the rest of the answer yourself. What you want to do is by no means trivial, but with the help of modern Scala techniques it should be very much within the realms of possibility.
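For illustration, here is a hand-rolled sketch of such a two-parameter typeclass (ApproxEq and =~= are made-up names; no library is assumed). The Double instance is symmetric by construction, and instances compose through tuples:
trait ApproxEq[A, B] {
  def approxEq(a: A, b: B): Boolean
}

object ApproxEq {
  // Fuzzy equality for doubles; symmetric because of abs.
  implicit val doubleDouble: ApproxEq[Double, Double] =
    (a: Double, b: Double) => (a - b).abs < 1e-5

  // Derive an instance for pairs from instances for the components.
  implicit def tuple2[A1, B1, A2, B2](implicit
      fst: ApproxEq[A1, B1],
      snd: ApproxEq[A2, B2]): ApproxEq[(A1, A2), (B1, B2)] =
    (a: (A1, A2), b: (B1, B2)) =>
      fst.approxEq(a._1, b._1) && snd.approxEq(a._2, b._2)
}

// Pimp the operator onto the left-hand side.
implicit class ApproxEqOps[A](private val a: A) extends AnyVal {
  def =~=[B](b: B)(implicit ev: ApproxEq[A, B]): Boolean = ev.approxEq(a, b)
}

// Usage:
// (1.0 / 3.0, 3.1) =~= (0.33333, 3.1) // true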

Scala Option object inside another Option object

I have a model which has some Option fields, which in turn contain other Option fields. For example:
case class First(second: Option[Second], name: Option[String])
case class Second(third: Option[Third], title: Option[String])
case class Third(numberOfSmth: Option[Int])
I'm receiving this data from external JSONs, and sometimes the data may contain nulls; that was the reason for this model design.
So the question is: what is the best way to get the deepest field?
firstOption.get.second.get.third.get.numberOfSmth.get
The above (for some firstOption: Option[First]) looks really ugly, and it may throw an exception if one of the objects is None. I was looking into the Scalaz lib, but didn't figure out a better way to do that.
Any ideas?
The solution is to use Option.map and Option.flatMap:
firstOption.flatMap(_.second.flatMap(_.third.map(_.numberOfSmth)))
Or the equivalent (see the update at the end of this answer):
firstOption flatMap(_.second) flatMap(_.third) map(_.numberOfSmth)
This returns an Option[Int] (provided that numberOfSmth returns an Int). If any of the options in the call chain is None, the result will be None, otherwise it will be Some(count) where count is the value returned by numberOfSmth.
Of course this can get ugly very fast. For this reason Scala supports for comprehensions as syntactic sugar. The above can be rewritten as:
for {
  first  <- firstOption
  second <- first.second
  third  <- second.third
} yield third.numberOfSmth
Which is arguably nicer (especially if you are not yet used to seeing map/flatMap everywhere, as will certainly be the case after a while using scala), and generates the exact same code under the hood.
For more background, you may check this other question: What is Scala's yield?
UPDATE:
Thanks to Ben James for pointing out that flatMap is associative. In other words, x flatMap (y flatMap z) is the same as x flatMap y flatMap z. While the latter is usually not shorter, it has the advantage of avoiding any nesting, which is easier to follow.
Here is some illustration in the REPL (the 4 styles are equivalent, with the first two using flatMap nesting, the other two using flat chains of flatMap):
scala> val l = Some(1,Some(2,Some(3,"aze")))
l: Some[(Int, Some[(Int, Some[(Int, String)])])] = Some((1,Some((2,Some((3,aze))))))
scala> l.flatMap(_._2.flatMap(_._2.map(_._2)))
res22: Option[String] = Some(aze)
scala> l flatMap(_._2 flatMap(_._2 map(_._2)))
res23: Option[String] = Some(aze)
scala> l flatMap(_._2) flatMap(_._2) map(_._2)
res24: Option[String] = Some(aze)
scala> l.flatMap(_._2).flatMap(_._2).map(_._2)
res25: Option[String] = Some(aze)
There is no need for scalaz:
for {
  first  <- yourFirst
  second <- first.second
  third  <- second.third
  number <- third.numberOfSmth
} yield number
Alternatively you can use nested flatMaps
This can be done by chaining calls to flatMap:
def getN(first: Option[First]): Option[Int] =
  first flatMap (_.second) flatMap (_.third) flatMap (_.numberOfSmth)
You can also do this with a for-comprehension, but it's more verbose as it forces you to name each intermediate value:
def getN(first: Option[First]): Option[Int] =
  for {
    f <- first
    s <- f.second
    t <- s.third
    n <- t.numberOfSmth
  } yield n
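A quick check of either getN, using the case classes from the question:
val sample = First(Some(Second(Some(Third(Some(42))), None)), Some("a"))

getN(Some(sample))            // Some(42)
getN(None)                    // None
getN(Some(First(None, None))) // None: the chain short-circuits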
I think it is overkill for your problem, but just as a general reference: this nested access problem is addressed by a concept called Lenses. They provide a nice mechanism to access nested data types by simple composition. As an introduction, you might want to check, for instance, this SO answer or this tutorial. The question of whether it makes sense to use Lenses in your case is whether you also have to perform a lot of updates in your nested option structure (note: update not in the mutable sense, but returning a new, modified but immutable instance). Without Lenses this leads to lengthy nested case class copy code. If you do not have to update at all, I would stick to om-nom-nom's suggestion. A minimal hand-rolled sketch follows.
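A minimal hand-rolled sketch of the idea (Lens and the field lens below are made-up names; no library is assumed):
// A lens is a getter plus a copying setter; lenses compose to reach
// deeper fields without hand-written nested copy calls.
case class Lens[A, B](get: A => B, set: (A, B) => A) {
  def andThen[C](inner: Lens[B, C]): Lens[A, C] =
    Lens(
      a => inner.get(get(a)),
      (a, c) => set(a, inner.set(get(a), c))
    )
}

// A lens into First's optional second field.
val secondL: Lens[First, Option[Second]] =
  Lens(_.second, (f, s) => f.copy(second = s))

// set returns a new, modified but immutable instance:
val updated = secondL.set(First(None, Some("x")), Some(Second(None, None)))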
This nested access problem is addressed by a concept called Lenses. They provide a nice mechanism to access nested data types by simple composition. As introduction you might want to check for instance this SO answer or this tutorial. The question whether it makes sense to use Lenses in your case is whether you also have to perform a lot of updates in you nested option structure (note: update not in the mutable sense, but returning a new modified but immutable instance). Without Lenses this leads to lengthy nested case class copy code. If you do not have to update at all, I would stick to om-nom-nom's suggestion.