Bitstream Library in Scala

I need to compress/decompress some data with an old, in-house developed algorithm.
There I have a lot of operations like:
if the next bit is 0, take the following 6 bits and interpret them as an Int
if the next bits are 10, take the following 9 bits and interpret them as an Int
etc.
Does anybody know of something like a "Bitstream" class in Scala? (I didn't find anything and hope I don't have to implement it myself.)
Thanks
Edit:
I combined the answer below with http://www.scala-lang.org/node/8413 ("The Architecture of Scala Collections"). In case somebody needs the same thing:
abstract class Bit
object Bit {
  val fromInt: Int => Bit = Array(Low, High)
  val toInt: Bit => Int = Map(Low -> 0, High -> 1)
}
case object High extends Bit
case object Low extends Bit

import collection.IndexedSeqLike
import collection.mutable.{Builder, ArrayBuffer}
import collection.generic.CanBuildFrom
import collection.IndexedSeq

// IndexedSeqLike implements all concrete methods of IndexedSeq
// in terms of newBuilder (methods like take, filter, drop).
final class BitSeq private (val bits: Array[Int], val length: Int)
  extends IndexedSeq[Bit]
  with IndexedSeqLike[Bit, BitSeq]
{
  import BitSeq._

  // Mandatory for IndexedSeqLike
  override protected[this] def newBuilder: Builder[Bit, BitSeq] =
    BitSeq.newBuilder

  // Mandatory for IndexedSeq
  def apply(idx: Int): Bit = {
    if (idx < 0 || length <= idx)
      throw new IndexOutOfBoundsException
    Bit.fromInt((bits(idx / N) >> (idx % N)) & M)
  }
}

object BitSeq {
  // Bits per Int
  private val N = 32
  // Bitmask to isolate a bit
  private val M = 0x01

  def fromSeq(buf: Seq[Bit]): BitSeq = {
    val bits = new Array[Int]((buf.length + N - 1) / N)
    for (i <- 0 until buf.length) {
      bits(i / N) |= Bit.toInt(buf(i)) << (i % N)
    }
    new BitSeq(bits, buf.length)
  }

  def apply(bits: Bit*) = fromSeq(bits)

  def newBuilder: Builder[Bit, BitSeq] = new ArrayBuffer[Bit] mapResult fromSeq

  // Needed so that map etc. on a BitSeq return a BitSeq again
  implicit def canBuildFrom: CanBuildFrom[BitSeq, Bit, BitSeq] =
    new CanBuildFrom[BitSeq, Bit, BitSeq] {
      def apply(): Builder[Bit, BitSeq] = newBuilder
      def apply(from: BitSeq): Builder[Bit, BitSeq] = newBuilder
    }
}
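For example, a small usage sketch (assuming the definitions above are in scope):
val bits = BitSeq(High, Low, High, High)        // built via apply/fromSeq
bits(0)                                         // High
bits.take(2)                                    // still a BitSeq, via newBuilder
bits map { b => if (b == High) Low else High }  // still a BitSeq, via canBuildFrom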

There isn't any existing class that I'm aware of, but you can leverage the existing classes to help out with almost all of the difficult operations. The trick is to turn your data into a stream of Ints (or Bytes if there wouldn't be enough memory). You can then use all the handy collections methods (e.g. take) and are only left with the problem of turning a run of bits back into an Int. But that's easy if you pack the bits in MSB order.
object BitExample {
  def bitInt(ii: Iterator[Int]): Int = (0 /: ii)((i, b) => (i << 1) | b)
  def bitInt(ii: Iterable[Int]): Int = bitInt(ii.iterator)

  class ArrayBits(bytes: Array[Byte]) extends Iterator[Int] {
    private[this] var buffer = 0
    private[this] var index, shift = -1
    def hasNext = (shift > 0) || (index + 1 < bytes.length)
    def next = {
      if (shift <= 0) {
        index += 1
        buffer = bytes(index) & 0xFF
        shift = 7
      }
      else shift -= 1
      (buffer >> shift) & 0x1
    }
  }
}
And then you do things like
import BitExample._
val compressed = new ArrayBits( Array[Byte](14,29,126) ).toStream
val headless = compressed.dropWhile(_ == 0)
val (test,rest) = headless.splitAt(3)
if (bitInt(test) > 4) println(bitInt(rest.take(6)))
(You can decide whether you want to use the iterator directly or as a stream, list, or whatever.)
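To connect this back to the decode rules in the question, here is a minimal sketch assuming MSB-first packing and the BitExample definitions above (the byte values are made up):
import BitExample._

val bits = new ArrayBits(Array[Byte](14, 29, 126)).toStream

val value =
  if (bits.head == 0) bitInt(bits.tail.take(6))        // prefix 0: 6-bit value
  else if (bits(1) == 0) bitInt(bits.drop(2).take(9))  // prefix 10: 9-bit value
  else sys.error("prefix not handled in this sketch")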

Related

Incrementing 'i' in scala for loop by differing amounts depending on circumstance

I want to write a for loop in scala, but the counter should get incremented by more than one (the amount is variable) in some special cases.
You can do this with a combination of a filter and an external var. Here is an example:
var nextValidVal = 0
for (i <- 0 to 99; if i >= nextValidVal) {
var amountToSkip = 0
// Whatever this loop is for
nextValidVal = if (amountToSkip > 0) i + amountToSkip + 1 else nextValidVal
}
So in the main body of your loop, you can set amountToSkip to n according to your conditions. The next n values of i will then be skipped.
If the values you iterate over are pulled from some other kind of sequence, you could do it like this:
var skip = 0
for (o <- someCollection if { val res = skip == 0; skip = if (!res) skip - 1 else 0; res } ) {
// Do stuff
}
If you set skip to a positive value in the body of the loop, the next n elements of the sequence will be skipped.
Of course, this is terribly imperative and side-effecty. I would look for other ways to do this wherever possible, by mapping or filtering or folding the original sequence.
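For example, a recursive sketch that skips a variable number of elements without any var (the skip condition is made up):
import scala.annotation.tailrec

@tailrec
def process(xs: List[Int]): Unit = xs match {
  case Nil => ()
  case x :: rest =>
    println(x)
    val amountToSkip = if (x % 10 == 0) 2 else 0  // hypothetical condition
    process(rest.drop(amountToSkip))              // skip the next n elements
}

process((0 to 30).toList)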
You could implement your own Stream that reflects the changing step, for example:
import scala.collection.immutable.Stream
import ForStream._
object Test {
def main(args: Array[String]): Unit = {
val range = 0 to 20 by 1 withVariableStep; // in case you like definition through range
//val range = ForStream(0,20,1) // direct definition
for (i<- range) {
println(s"i=$i")
range.step = range.step + 1
}
}
}
object ForStream{
implicit def toForStream(range: Range): ForStream = new ForStreamMaster(range.start, range.end,range.step)
def apply(head:Int, end:Int, step:Int) = new ForStreamMaster(head, end,step)
}
abstract class ForStream(override val head: Int, val end: Int, var step: Int) extends Stream[Int] {
override val tailDefined = false
override val isEmpty = head > end
def withVariableStep = this
}
class ForStreamMaster(_head: Int, _end: Int, _Step: Int) extends ForStream(_head, _end,_Step){
override def tail = if (isEmpty) Stream.Empty else new ForStreamSlave(head + step, end, step, this)
}
class ForStreamSlave(_head: Int, _end: Int, _step: Int, val master: ForStream) extends ForStream(_head, _end,_step){
override def tail = if (isEmpty) Stream.Empty else new ForStreamSlave(head + master.step, end, master.step, master)
}
This prints:
i=0
i=2
i=5
i=9
i=14
i=20
You can define a ForStream from a Range with implicits, or define it directly. But be careful:
You are not iterating Range anymore!
Stream should be immutable, but step is mutable!
Also, as @om-nom-nom noted, this might be better implemented with recursion.
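For comparison, a minimal tail-recursive sketch that reproduces the same output without a mutable Stream (the names are made up):
import scala.annotation.tailrec

@tailrec
def loop(i: Int, step: Int, end: Int)(body: Int => Unit): Unit =
  if (i <= end) {
    body(i)
    loop(i + step + 1, step + 1, end)(body)  // the step grows by one each iteration
  }

loop(0, 1, 20)(i => println(s"i=$i"))        // i=0, 2, 5, 9, 14, 20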
Why not use the do-while loop?
var x = 0;
do{
...something
if(condition){change x to something else}
else{something else}
x+=1
}while(some condition for x)
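A concrete, runnable instance of that pattern might look like this (the skip condition is made up):
var x = 0
do {
  println(s"x=$x")
  if (x % 5 == 0) x += 2  // jump further ahead in some special case
  x += 1
} while (x <= 20)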

How to add 'Array[Ordered[Any]]' as a method parameter

Below is an implementation of Selection sort written in Scala.
The line ss.sort(arr) causes this error:
type mismatch; found : Array[String] required: Array[Ordered[Any]]
Since StringOps inherits from Ordered, should this type not be inferred?
How can I pass the array of Strings to the sort() method?
Here is the complete code:
object SelectionSortTest {
def main(args: Array[String]){
val arr = Array("Hello","World")
val ss = new SelectionSort()
ss.sort(arr)
}
}
class SelectionSort {
def sort(a : Array[Ordered[Any]]) = {
var N = a.length
for (i <- 0 until N) {
var min = i
for(j <- i + 1 until N){
if( less(a(j) , a(min))){
min = j
}
exchange(a , i , min)
}
}
}
def less(v : Ordered[Any] , w : Ordered[Any]) = {
v.compareTo(w) < 0
}
def exchange(a : Array[Ordered[Any]] , i : Integer , j : Integer) = {
var swap : Ordered[Any] = a(i)
a(i) = a(j)
a(j) = swap
}
}
Array is invariant. You cannot use an Array[A] as an Array[B] even if A is a subtype of B. See here why: Why are Arrays invariant, but Lists covariant?
Ordered is invariant as well, so your implementation of less will not work either (a String is an Ordered[String], not an Ordered[Any]).
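To see the variance issue in isolation, a small sketch (unrelated to the sorting code):
class Animal
class Dog extends Animal

val dogs: Array[Dog] = Array(new Dog)
// val animals: Array[Animal] = dogs       // does not compile: Array is invariant
val animals: List[Animal] = List(new Dog)  // fine: List is covariant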
You should make your implementation generic the following way:
object SelectionSortTest {
  def main(args: Array[String]) {
    val arr = Array("Hello", "World")
    val ss = new SelectionSort()
    ss.sort(arr)
  }
}

class SelectionSort {
  def sort[T <% Ordered[T]](a: Array[T]) = {
    val N = a.length
    for (i <- 0 until N) {
      var min = i
      for (j <- i + 1 until N) {
        if (a(j) < a(min)) { // call less-than directly on Ordered[T]
          min = j
        }
      }
      exchange(a, i, min) // swap once per position, after the scan
    }
  }

  def exchange[T](a: Array[T], i: Integer, j: Integer) = {
    val swap = a(i)
    a(i) = a(j)
    a(j) = swap
  }
}
The somewhat bizarre-looking constraint T <% Ordered[T] (a view bound) means "any type T that can be implicitly converted to Ordered[T]". This ensures that you can still use the less-than operator.
See this for details:
What are Scala context and view bounds?
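For reference, a view bound is just sugar for an implicit conversion parameter, so the signature above is roughly equivalent to this sketch:
def sort[T](a: Array[T])(implicit toOrdered: T => Ordered[T]) = {
  // ... same body as above; a(j) < a(min) still works because a(j) is
  // implicitly converted to Ordered[T] before < is called
}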
The answer by @gzm0 (with some very nice links) suggests Ordered. I'm going to complement it with an answer covering Ordering, which provides equivalent functionality without imposing on your classes as much.
Let's adjust the sort method to accept an array of type 'T' for which an Ordering implicit instance is defined.
def sort[T : Ordering](a: Array[T]) = {
val ord = implicitly[Ordering[T]]
import ord._ // now comparison operations such as '<' are available for 'T'
// ...
if (a(j) < a(min))
// ...
}
The [T : Ordering] and implicitly[Ordering[T]] combo is equivalent to an implicit parameter of type Ordering[T]:
def sort[T](a: Array[T])(implicit ord: Ordering[T]) = {
import ord._
// ...
}
Why is this useful?
Imagine you are provided with a case class Account(balance: Int) by some third party. You can now add an Ordering for it like so:
// somewhere in scope
implicit val accountOrdering = new Ordering[Account] {
def compare(x: Account, y: Account) = x.balance - y.balance
}
// or, more simply
implicit val accountOrdering: Ordering[Account] = Ordering by (_.balance)
As long as that instance is in scope, you should be able to use sort(accounts).
If you want to use some different ordering, you can also provide it explicitly, like so: sort(accounts)(otherOrdering).
Note that this isn't very different from providing an implicit conversion to Ordered (at least not within the context of this question).
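For instance, building on the Account example above (a small sketch; accounts is a hypothetical Array[Account]):
val byBalanceDesc: Ordering[Account] = accountOrdering.reverse  // highest balance first
sort(accounts)(byBalanceDesc)  // explicit ordering overrides the implicit one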
Even though, when coding Scala, I usually prefer a functional programming style (combinators or recursion) over an imperative style (variables and iteration), THIS TIME, for this specific problem, old-school imperative nested loops result in simpler and more performant code.
I don't think falling back to imperative style is a mistake for certain classes of problems, such as sorting algorithms, which usually transform the input buffer in place (more like a procedure) rather than producing a new sorted collection.
Here is my solution:
package bitspoke.algo
import scala.math.Ordered
import scala.collection.mutable.Buffer
abstract class Sorter[T <% Ordered[T]] {
// algorithm provided by subclasses
def sort(buffer : Buffer[T]) : Unit
// check if the buffer is sorted
def sorted(buffer : Buffer[T]) = buffer.isEmpty || buffer.view.zip(buffer.tail).forall { t => t._2 >= t._1 }
// swap elements in buffer
def swap(buffer : Buffer[T], i:Int, j:Int) {
val temp = buffer(i)
buffer(i) = buffer(j)
buffer(j) = temp
}
}
class SelectionSorter[T <% Ordered[T]] extends Sorter[T] {
def sort(buffer : Buffer[T]) : Unit = {
for (i <- 0 until buffer.length) {
var min = i
for (j <- i until buffer.length) {
if (buffer(j) < buffer(min))
min = j
}
swap(buffer, i, min)
}
}
}
As you can see, to achieve parametric polymorphism, I preferred scala.math.Ordered over java.lang.Comparable, and Scala view bounds over upper bounds. That works for primitive types thanks to Scala's implicit conversions to their rich wrappers.
You can write a client program as follows:
import bitspoke.algo._
import scala.collection.mutable._
val sorter = new SelectionSorter[Int]
val buffer = ArrayBuffer(3, 0, 4, 2, 1)
sorter.sort(buffer)
assert(sorter.sorted(buffer))

Selection Sort Generic type implementation

I worked my way through implementing recursive versions of selection sort and quicksort. I am trying to modify the code so that it can sort a list of any generic type; I want to assume that the generic type supplied can be converted to Comparable at runtime.
Does anyone have a link, code or tutorial on how to do this, please?
I am trying to modify this particular code:
def main(args: Array[String]) {
  val l = List(2, 4, 5, 6, 8)
  print(quickSort(l))
}

def quickSort(x: List[Int]): List[Int] = {
  x match {
    case xh :: xt =>
      val (first, pivot, second) = partition(x)
      quickSort(first) ::: (pivot :: quickSort(second))
    case Nil => x
  }
}

def partition(x: List[Int]) = {
  val pivot = x.head
  var first: List[Int] = List()
  var second: List[Int] = List()
  val fun = (i: Int) => {
    if (i < pivot)
      first = i :: first
    else
      second = i :: second
  }
  x.tail.foreach(fun)
  (first, pivot, second)
}
Language: SCALA
In Scala, Java's Comparator is replaced by Ordering (quite similar, but with more useful methods). Instances are provided for several types (primitives, strings, BigDecimals, etc.) and you can provide your own implementations.
You can then use a Scala implicit to ask the compiler to pick the correct one for you:
def sort[A]( lst: List[A] )( implicit ord: Ordering[A] ) = {
...
}
If you are using a predefined ordering, just call:
sort( myLst )
and the compiler will infer the second argument. If you want to declare your own ordering, use the keyword implicit in the declaration. For instance:
implicit val fooOrdering = new Ordering[Foo] {
def compare( f1: Foo, f2: Foo ) = {...}
}
and it will be used implicitly if you try to sort a List of Foo.
If you have several implementations for the same type, you can also explicitly pass the correct ordering object:
sort( myFooLst )( fooOrdering )
More info in this post.
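Putting it together with the code from the question, a sketch of the same quicksort made generic with an implicit Ordering (using partition from the standard library instead of the hand-written one):
def quickSort[A](x: List[A])(implicit ord: Ordering[A]): List[A] = x match {
  case Nil => Nil
  case pivot :: rest =>
    val (first, second) = rest.partition(ord.lt(_, pivot))
    quickSort(first) ::: pivot :: quickSort(second)
}

quickSort(List(2, 4, 5, 6, 8))  // picks up the predefined Ordering[Int]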
For Quicksort, I'll modify an example from the "Scala By Example" book to make it more generic.
import scala.collection.mutable.ArraySeq

class Quicksort[A <% Ordered[A]] {
  def sort(a: ArraySeq[A]): ArraySeq[A] =
    if (a.length < 2) a
    else {
      val pivot = a(a.length / 2)
      sort(a filter (pivot >)) ++ (a filter (pivot == )) ++
        sort(a filter (pivot <))
    }
}
Test with Int
scala> val quicksort = new Quicksort[Int]
quicksort: Quicksort[Int] = Quicksort@38ceb62f
scala> val a = ArraySeq(5, 3, 2, 2, 1, 1, 9, 39 ,219)
a: scala.collection.mutable.ArraySeq[Int] = ArraySeq(5, 3, 2, 2, 1, 1, 9, 39, 219)
scala> quicksort.sort(a).foreach(n=> (print(n), print (" " )))
1 1 2 2 3 5 9 39 219
Test with a custom class implementing Ordered
scala> case class Meh(x: Int, y:Int) extends Ordered[Meh] {
| def compare(that: Meh) = (x + y).compare(that.x + that.y)
| }
defined class Meh
scala> val q2 = new Quicksort[Meh]
q2: Quicksort[Meh] = Quicksort@7677ce29
scala> val a3 = ArraySeq(Meh(1,1), Meh(12,1), Meh(0,1), Meh(2,2))
a3: scala.collection.mutable.ArraySeq[Meh] = ArraySeq(Meh(1,1), Meh(12,1), Meh(0,1), Meh(2,2))
scala> q2.sort(a3)
res7: scala.collection.mutable.ArraySeq[Meh] = ArraySeq(Meh(0,1), Meh(1,1), Meh(2,2), Meh(12,1))
Even though, when coding Scala, I usually prefer a functional programming style (combinators or recursion) over an imperative style (variables and iteration), THIS TIME, for this specific problem, old-school imperative nested loops result in simpler code for the reader. I don't think falling back to imperative style is a mistake for certain classes of problems, such as sorting algorithms, which usually transform the input buffer in place (like a procedure) rather than producing a new sorted one.
Here is my solution:
package bitspoke.algo
import scala.math.Ordered
import scala.collection.mutable.Buffer
abstract class Sorter[T <% Ordered[T]] {
// algorithm provided by subclasses
def sort(buffer : Buffer[T]) : Unit
// check if the buffer is sorted
def sorted(buffer : Buffer[T]) = buffer.isEmpty || buffer.view.zip(buffer.tail).forall { t => t._2 >= t._1 }
// swap elements in buffer
def swap(buffer : Buffer[T], i:Int, j:Int) {
val temp = buffer(i)
buffer(i) = buffer(j)
buffer(j) = temp
}
}
class SelectionSorter[T <% Ordered[T]] extends Sorter[T] {
def sort(buffer : Buffer[T]) : Unit = {
for (i <- 0 until buffer.length) {
var min = i
for (j <- i until buffer.length) {
if (buffer(j) < buffer(min))
min = j
}
swap(buffer, i, min)
}
}
}
As you can see, to achieve parametric polymorphism, I preferred scala.math.Ordered over java.lang.Comparable, and Scala view bounds over upper bounds. That works for primitive types thanks to Scala's many implicit conversions to their rich wrappers.
You can write a client program as follows:
import bitspoke.algo._
import scala.collection.mutable._
val sorter = new SelectionSorter[Int]
val buffer = ArrayBuffer(3, 0, 4, 2, 1)
sorter.sort(buffer)
assert(sorter.sorted(buffer))

Finite Growable Array in Scala

I would like to be able to grow an Array-like structure up to a maximum size, after which the oldest (1st) element would be dropped off the structure every time a new element is added. I don't know what the best way to do this is, but one way would be to extend the ArrayBuffer class, and override the += operator so that if the maximum size has been reached, the first element is dropped every time a new one is added. I haven't figured out how to properly extend collections yet. What I have so far is:
class FiniteGrowableArray[A](maxLength:Int) extends scala.collection.mutable.ArrayBuffer {
override def +=(elem:A): <insert some return type here> = {
// append element
if(length > maxLength) remove(0)
<returned collection>
}
}
Can someone suggest a better path and/or help me along this one? NOTE: I will need to arbitrarily access elements within the structure multiple times in between the += operations.
Thanks
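For reference, a minimal sketch of the ArrayBuffer-based idea described in the question, using composition instead of inheritance (all names here are illustrative):
import scala.collection.mutable.ArrayBuffer

class FiniteGrowableArray[A](maxLength: Int) {
  private val buf = ArrayBuffer.empty[A]
  def +=(elem: A): this.type = {
    buf += elem
    if (buf.length > maxLength) buf.remove(0)  // drop the oldest element
    this
  }
  def apply(i: Int): A = buf(i)  // arbitrary access between += operations
  def length: Int = buf.length
}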
As others have discussed, you want a ring buffer. However, you also have to decide if you actually want all of the collections methods or not, and if so, what happens when you filter a ring buffer of maximum size N--does it keep its maximum size, or what?
If you're okay with merely being able to view your ring buffer as part of the collections hierarchy (but don't want to use collections efficiently to generate new ring buffers) then you can just:
class RingBuffer[T: ClassManifest](maxsize: Int) {
private[this] val buffer = new Array[T](maxsize+1)
private[this] var i0,i1 = 0
private[this] def i0up = { i0 += 1; if (i0>=buffer.length) i0 -= buffer.length }
private[this] def i0dn = { i0 -= 1; if (i0<0) i0 += buffer.length }
private[this] def i1up = { i1 += 1; if (i1>=buffer.length) i1 -= buffer.length }
private[this] def i1dn = { i1 -= 1; if (i1<0) i1 += buffer.length }
private[this] def me = this
def apply(i: Int) = {
val j = i+i0
if (j >= buffer.length) buffer(j-buffer.length) else buffer(j)
}
def size = if (i1<i0) buffer.length+i1-i0 else i1-i0
def :+(t: T) = {
buffer(i1) = t
i1up; if (i1==i0) i0up
this
}
def +:(t: T) = {
i0dn; if (i0==i1) i1dn
buffer(i0) = t
this
}
def popt = {
if (i1==i0) throw new java.util.NoSuchElementException
i1dn; buffer(i1)
}
def poph = {
if (i1==i0) throw new java.util.NoSuchElementException
val ans = buffer(i0); i0up; ans
}
def seqView = new IndexedSeq[T] {
def apply(i: Int) = me(i)
def length = me.size
}
}
Now you can easily use this directly, and you can jump out to IndexedSeq when needed:
val r = new RingBuffer[Int](4)
r :+ 7 :+ 9 :+ 2
r.seqView.mkString(" ") // Prints 7 9 2
r.popt // Returns 2
r.poph // Returns 7
r :+ 6 :+ 5 :+ 4 :+ 3
r.seqView.mkString(" ") // Prints 6 5 4 3 -- 9 fell off the end
0 +: 1 +: 2 +: r
r.seqView.mkString(" ") // Prints 0 1 2 6 -- added to front; 3,4,5 fell off
r.seqView.filter(_>1) // Vector(2,6)
and if you want to put things back into a ring buffer, you can
class RingBufferImplicit[T: ClassManifest](ts: Traversable[T]) {
def ring(maxsize: Int) = {
val rb = new RingBuffer[T](maxsize)
ts.foreach(rb :+ _)
rb
}
}
implicit def traversable2ringbuffer[T: ClassManifest](ts: Traversable[T]) = {
new RingBufferImplicit(ts)
}
and then you can do things like
val rr = List(1,2,3,4,5).ring(4)
rr.seqView.mkString(" ") // Prints 2 3 4 5

The cost of nested methods

In Scala one can define methods inside other methods. This limits their scope of use to the enclosing definition block. I use them to improve the readability of code that uses several higher-order functions. In contrast to anonymous function literals, this allows me to give them meaningful names before passing them on.
For example:
import scala.collection.mutable.HashSet

// PersonRecord (with a fullName member) is assumed to be defined elsewhere.
class AggregatedPerson extends HashSet[PersonRecord] {
  def mostFrequentName: String = {
    type NameCount = (String, Int)
    def moreFirst(a: NameCount, b: NameCount) = a._2 > b._2
    def countOccurrences(nameGroup: (String, List[PersonRecord])) =
      (nameGroup._1, nameGroup._2.size)

    iterator.toList.groupBy(_.fullName).
      map(countOccurrences).iterator.toList.
      sortWith(moreFirst).head._1
  }
}
Is there any runtime cost due to the nested method definitions that I should be aware of?
Does the answer differ for closures?
During compilation, the nested methods moreFirst and countOccurrences are moved out to the same level as mostFrequentName. They get compiler-synthesized names: moreFirst$1 and countOccurrences$1.
In addition, when you refer to one of these methods without an argument list, it is lifted into a function. So map(countOccurrences) is the same as writing map((a: (String, List[PersonRecord])) => countOccurrences(a)). This anonymous function is compiled to a separate class, AggregatedPerson$$anonfun$mostFrequentName$2, which does nothing more than forward to countOccurrences$1.
As a side note, the process of lifting a method to a function is called eta expansion. It is triggered if you omit the argument list in a context where a function type is expected (as in your example), or if you use _ in place of the entire argument list or in place of each argument (val f1 = countOccurrences _; val f2 = countOccurrences(_)).
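In isolation, eta expansion looks like this (a standalone sketch, not tied to the code above):
def twice(i: Int): Int = i * 2

val f: Int => Int = twice   // expected function type triggers eta expansion
val g = twice _             // explicit eta expansion
List(1, 2, 3).map(twice)    // same mechanism as map(countOccurrences)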
If the code were directly in the closure, you would have one fewer method call on your stack and one fewer synthetic method generated. The performance impact of this is likely to be zero in most cases.
I find it to be a fantastically useful tool to structure code, and consider your example very idiomatic Scala.
Another useful tool is using small blocks to initialize a val:
val a = {
val temp1, temp2 = ...
f(temp1, temp2)
}
You can use scalac -print to see exactly how Scala code is translated into a form ready for the JVM. Here's the output for your program:
[[syntax trees at end of cleanup]]// Scala source: nested-method.scala
package <empty> {
class AggregatedPerson extends scala.collection.mutable.HashSet with ScalaObject {
def mostFrequentName(): java.lang.String = AggregatedPerson.this.iterator().toList().groupBy({
(new AggregatedPerson$$anonfun$mostFrequentName$1(AggregatedPerson.this): Function1)
}).map({
{
(new AggregatedPerson$$anonfun$mostFrequentName$2(AggregatedPerson.this): Function1)
}
}, collection.this.Map.canBuildFrom()).$asInstanceOf[scala.collection.MapLike]().iterator().toList().sortWith({
{
(new AggregatedPerson$$anonfun$mostFrequentName$3(AggregatedPerson.this): Function2)
}
}).$asInstanceOf[scala.collection.IterableLike]().head().$asInstanceOf[Tuple2]()._1().$asInstanceOf[java.lang.String]();
final def moreFirst$1(a: Tuple2, b: Tuple2): Boolean = scala.Int.unbox(a._2()).>(scala.Int.unbox(b._2()));
final def countOccurrences$1(nameGroup: Tuple2): Tuple2 = new Tuple2(nameGroup._1(), scala.Int.box(nameGroup._2().$asInstanceOf[scala.collection.SeqLike]().size()));
def this(): AggregatedPerson = {
AggregatedPerson.super.this();
()
}
};
#SerialVersionUID(0) #serializable final <synthetic> class AggregatedPerson$$anonfun$mostFrequentName$1 extends scala.runtime.AbstractFunction1 {
final def apply(x$1: PersonRecord): java.lang.String = x$1.fullName();
final <bridge> def apply(v1: java.lang.Object): java.lang.Object = AggregatedPerson$$anonfun$mostFrequentName$1.this.apply(v1.$asInstanceOf[PersonRecord]());
def this($outer: AggregatedPerson): AggregatedPerson$$anonfun$mostFrequentName$1 = {
AggregatedPerson$$anonfun$mostFrequentName$1.super.this();
()
}
};
#SerialVersionUID(0) #serializable final <synthetic> class AggregatedPerson$$anonfun$mostFrequentName$2 extends scala.runtime.AbstractFunction1 {
final def apply(nameGroup: Tuple2): Tuple2 = AggregatedPerson$$anonfun$mostFrequentName$2.this.$outer.countOccurrences$1(nameGroup);
<synthetic> <paramaccessor> private[this] val $outer: AggregatedPerson = _;
final <bridge> def apply(v1: java.lang.Object): java.lang.Object = AggregatedPerson$$anonfun$mostFrequentName$2.this.apply(v1.$asInstanceOf[Tuple2]());
def this($outer: AggregatedPerson): AggregatedPerson$$anonfun$mostFrequentName$2 = {
if ($outer.eq(null))
throw new java.lang.NullPointerException()
else
AggregatedPerson$$anonfun$mostFrequentName$2.this.$outer = $outer;
AggregatedPerson$$anonfun$mostFrequentName$2.super.this();
()
}
};
#SerialVersionUID(0) #serializable final <synthetic> class AggregatedPerson$$anonfun$mostFrequentName$3 extends scala.runtime.AbstractFunction2 {
final def apply(a: Tuple2, b: Tuple2): Boolean = AggregatedPerson$$anonfun$mostFrequentName$3.this.$outer.moreFirst$1(a, b);
<synthetic> <paramaccessor> private[this] val $outer: AggregatedPerson = _;
final <bridge> def apply(v1: java.lang.Object, v2: java.lang.Object): java.lang.Object = scala.Boolean.box(AggregatedPerson$$anonfun$mostFrequentName$3.this.apply(v1.$asInstanceOf[Tuple2](), v2.$asInstanceOf[Tuple2]()));
def this($outer: AggregatedPerson): AggregatedPerson$$anonfun$mostFrequentName$3 = {
if ($outer.eq(null))
throw new java.lang.NullPointerException()
else
AggregatedPerson$$anonfun$mostFrequentName$3.this.$outer = $outer;
AggregatedPerson$$anonfun$mostFrequentName$3.super.this();
()
}
}
}
There is a small runtime cost. You can observe it here (apologies for the long code):
object NestBench {
def countRaw() = {
var sum = 0
var i = 0
while (i<1000) {
sum += i
i += 1
var j = 0
while (j<1000) {
sum += j
j += 1
var k = 0
while (k<1000) {
sum += k
k += 1
sum += 1
}
}
}
sum
}
def countClosure() = {
var sum = 0
var i = 0
def sumI {
sum += i
i += 1
var j = 0
def sumJ {
sum += j
j += 1
var k = 0
def sumK {
def sumL { sum += 1 }
sum += k
k += 1
sumL
}
while (k<1000) sumK
}
while (j<1000) sumJ
}
while (i<1000) sumI
sum
}
def countInner() = {
var sum = 0
def whileI = {
def whileJ = {
def whileK = {
def whileL() = 1
var ksum = 0
var k = 0
while (k<1000) { ksum += k; k += 1; ksum += whileL }
ksum
}
var jsum = 0
var j = 0
while (j<1000) {
jsum += j; j += 1
jsum += whileK
}
jsum
}
var isum = 0
var i = 0
while (i<1000) {
isum += i; i += 1
isum += whileJ
}
isum
}
whileI
}
def countFunc() = {
def summer(f: => Int)() = {
var sum = 0
var i = 0
while (i<1000) {
sum += i; i += 1
sum += f
}
sum
}
summer( summer( summer(1) ) )()
}
def nsPerIteration(f:() => Int): (Int,Double) = {
val t0 = System.nanoTime
val result = f()
val t1 = System.nanoTime
(result , (t1-t0)*1e-9)
}
def main(args: Array[String]) {
for (i <- 1 to 5) {
val fns = List(countRaw _, countClosure _, countInner _, countFunc _)
val labels = List("raw","closure","inner","func")
val results = (fns zip labels) foreach (fl => {
val x = nsPerIteration( fl._1 )
printf("Method %8s produced %d; time/it = %.3f ns\n",fl._2,x._1,x._2)
})
}
}
}
There are four different methods for summing integers:
A raw while loop ("raw")
While loop inner methods that are closures over the sum variable
While loop inner methods that return the partial sum
A while-and-sum nested function
And we see the results on my machine in terms of nanoseconds taken in the inner loop:
scala> NestBench.main(Array[String]())
Method raw produced -1511174132; time/it = 0.422 ns
Method closure produced -1511174132; time/it = 2.376 ns
Method inner produced -1511174132; time/it = 0.402 ns
Method func produced -1511174132; time/it = 0.836 ns
Method raw produced -1511174132; time/it = 0.418 ns
Method closure produced -1511174132; time/it = 2.410 ns
Method inner produced -1511174132; time/it = 0.399 ns
Method func produced -1511174132; time/it = 0.813 ns
Method raw produced -1511174132; time/it = 0.411 ns
Method closure produced -1511174132; time/it = 2.372 ns
Method inner produced -1511174132; time/it = 0.399 ns
Method func produced -1511174132; time/it = 0.813 ns
Method raw produced -1511174132; time/it = 0.411 ns
Method closure produced -1511174132; time/it = 2.370 ns
Method inner produced -1511174132; time/it = 0.399 ns
Method func produced -1511174132; time/it = 0.815 ns
Method raw produced -1511174132; time/it = 0.412 ns
Method closure produced -1511174132; time/it = 2.357 ns
Method inner produced -1511174132; time/it = 0.400 ns
Method func produced -1511174132; time/it = 0.817 ns
So, bottom line is: nesting functions really doesn't hurt you at all in simple cases--the JVM will figure out that the call can be inlined (thus raw and inner give the same times). If you take a more functional approach, the function call can't be completely neglected, but the time taken is vanishingly small (approximately 0.4 ns extra per call). If you use a lot of closures, then each closure call adds an overhead of something like 1 ns, at least in this case where a single mutable variable is written to.
You can modify the code above to find answers to other questions, but the bottom line is that it is all very fast, ranging between "no penalty whatsoever" through to "only worry about in the very tightest inner loops that otherwise have minimal work to do".
(P.S. For comparison, the creation of a single small object takes ~4 ns on my machine.)
Current as of January 2014
The benchmark above is ~3 years old, and HotSpot and the compiler have evolved significantly since then. I am also using Google Caliper to perform the benchmarks.
Code
import com.google.caliper.SimpleBenchmark
class Benchmark extends SimpleBenchmark {
def timeRaw(reps: Int) = {
var i = 0
var result = 0L
while (i < reps) {
result += 0xc37e ^ (i * 0xd5f3)
i = i + 1
}
result
}
def normal(i: Int): Long = 0xc37e ^ (i * 0xd5f3)
def timeNormal(reps: Int) = {
var i = 0
var result = 0L
while (i < reps) {
result += normal(i)
i = i + 1
}
result
}
def timeInner(reps: Int) = {
def inner(i: Int): Long = 0xc37e ^ (i * 0xd5f3)
var i = 0
var result = 0L
while (i < reps) {
result += inner(i)
i = i + 1
}
result
}
def timeClosure(reps: Int) = {
var i = 0
var result = 0L
val closure = () => result += 0xc37e ^ (i * 0xd5f3)
while (i < reps) {
closure()
i = i + 1
}
result
}
def normal(i: Int, j: Int, k: Int, l: Int): Long = i ^ j ^ k ^ l
def timeUnboxed(reps: Int) = {
var i = 0
var result = 0L
while (i < reps) {
result += normal(i,i,i,i)
i = i + 1
}
result
}
val closure = (i: Int, j: Int, k: Int, l: Int) => (i ^ j ^ k ^ l).toLong
def timeBoxed(reps: Int) = {
var i = 0
var result = 0L
while (i < reps) {
closure(i,i,i,i)
i = i + 1
}
result
}
}
Results
benchmark ns linear runtime
Normal 0.576 =
Raw 0.576 =
Inner 0.576 =
Closure 0.532 =
Unboxed 0.893 =
Boxed 15.210 ==============================
What is very surprising is that the closure test completed about 0.04 ns faster than the others. This seems to be an idiosyncrasy of HotSpot rather than of the execution environment; multiple runs have shown the same trend.
Using a closure that performs boxing is a huge performance hit: it takes approximately 3.579 ns to perform one unboxing and reboxing, enough to do a lot of primitive math operations. In this specific situation, things might get better with the work being done on a new optimizer. In the general case, boxing might be alleviated by miniboxing.
Edit:
The new optimizer doesn't really help here; it makes Closure 0.1 ns slower and Boxed 0.1 ns faster:
benchmark ns linear runtime
Raw 0.574 =
Normal 0.576 =
Inner 0.575 =
Closure 0.645 =
Unboxed 0.889 =
Boxed 15.107 ==============================
Performed with 2.11.0-20131209-220002-9587726041 from magarciaEPFL/scala
Versions
JVM
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
Scalac
Scala compiler version 2.10.3 -- Copyright 2002-2013, LAMP/EPFL