In Scala one might define methods inside other methods. This limits their scope of use to inside of definition block. I use them to improve readability of code that uses several higher-order functions. In contrast to anonymous function literals, this allows me to give them meaningful names before passing them on.
For example:
class AggregatedPerson extends HashSet[PersonRecord] {
def mostFrequentName: String = {
type NameCount = (String, Int)
def moreFirst(a: NameCount, b: NameCount) = a._2 > b._2
def countOccurrences(nameGroup: (String, List[PersonRecord])) =
(nameGroup._1, nameGroup._2.size)
iterator.toList.groupBy(_.fullName).
map(countOccurrences).iterator.toList.
sortWith(moreFirst).head._1
}
}
Is there any runtime cost because of the nested method definition I should be aware of?
Does the answer differ for closures?
During compilaton, the nested functions are moveFirst and countOccurences are moved out to the same level as mostFrequentName. They get compiler synthesized names: moveFirst$1 and countOccurences$1.
In addition, when you refer to one of these methods without an argument list, it is lifted into a function. So map(countOccurences) is the same as writing map((a: (String, List[PersonRecord])) => countOccurences(a)). This anonymous function is compiled to a separate class AggregatedPerson$$anonfun$mostFrequentName$2, which does nothing more than forward to countOccurences$.
As a side note, the process of lifting the method to a function is called Eta Expansion. It is triggered if you omit the argument list in a context where a function type is expected (as in your example), or if you use _ in place of the entire argument list, or in place of each argument (val f1 = countOccurences _ ; val f2 = countOccurences(_).
If the code was directly in the closure, you would have one fewer method call in your stack, and one fewer synthetic method generated. The performance impact of this is likely to be zero in most cases.
I find it to be a fantastically useful tool to structure code, and consider your example very idiomatic Scala.
Another useful tool is using small blocks to initialize a val:
val a = {
val temp1, temp2 = ...
f(temp1, temp2)
}
You can use scalac -print to see exactly how Scala code is translated into a form ready for the JVM. Heres the output from your program:
[[syntax trees at end of cleanup]]// Scala source: nested-method.scala
package <empty> {
class AggregatedPerson extends scala.collection.mutable.HashSet with ScalaObject {
def mostFrequentName(): java.lang.String = AggregatedPerson.this.iterator().toList().groupBy({
(new AggregatedPerson$$anonfun$mostFrequentName$1(AggregatedPerson.this): Function1)
}).map({
{
(new AggregatedPerson$$anonfun$mostFrequentName$2(AggregatedPerson.this): Function1)
}
}, collection.this.Map.canBuildFrom()).$asInstanceOf[scala.collection.MapLike]().iterator().toList().sortWith({
{
(new AggregatedPerson$$anonfun$mostFrequentName$3(AggregatedPerson.this): Function2)
}
}).$asInstanceOf[scala.collection.IterableLike]().head().$asInstanceOf[Tuple2]()._1().$asInstanceOf[java.lang.String]();
final def moreFirst$1(a: Tuple2, b: Tuple2): Boolean = scala.Int.unbox(a._2()).>(scala.Int.unbox(b._2()));
final def countOccurrences$1(nameGroup: Tuple2): Tuple2 = new Tuple2(nameGroup._1(), scala.Int.box(nameGroup._2().$asInstanceOf[scala.collection.SeqLike]().size()));
def this(): AggregatedPerson = {
AggregatedPerson.super.this();
()
}
};
#SerialVersionUID(0) #serializable final <synthetic> class AggregatedPerson$$anonfun$mostFrequentName$1 extends scala.runtime.AbstractFunction1 {
final def apply(x$1: PersonRecord): java.lang.String = x$1.fullName();
final <bridge> def apply(v1: java.lang.Object): java.lang.Object = AggregatedPerson$$anonfun$mostFrequentName$1.this.apply(v1.$asInstanceOf[PersonRecord]());
def this($outer: AggregatedPerson): AggregatedPerson$$anonfun$mostFrequentName$1 = {
AggregatedPerson$$anonfun$mostFrequentName$1.super.this();
()
}
};
#SerialVersionUID(0) #serializable final <synthetic> class AggregatedPerson$$anonfun$mostFrequentName$2 extends scala.runtime.AbstractFunction1 {
final def apply(nameGroup: Tuple2): Tuple2 = AggregatedPerson$$anonfun$mostFrequentName$2.this.$outer.countOccurrences$1(nameGroup);
<synthetic> <paramaccessor> private[this] val $outer: AggregatedPerson = _;
final <bridge> def apply(v1: java.lang.Object): java.lang.Object = AggregatedPerson$$anonfun$mostFrequentName$2.this.apply(v1.$asInstanceOf[Tuple2]());
def this($outer: AggregatedPerson): AggregatedPerson$$anonfun$mostFrequentName$2 = {
if ($outer.eq(null))
throw new java.lang.NullPointerException()
else
AggregatedPerson$$anonfun$mostFrequentName$2.this.$outer = $outer;
AggregatedPerson$$anonfun$mostFrequentName$2.super.this();
()
}
};
#SerialVersionUID(0) #serializable final <synthetic> class AggregatedPerson$$anonfun$mostFrequentName$3 extends scala.runtime.AbstractFunction2 {
final def apply(a: Tuple2, b: Tuple2): Boolean = AggregatedPerson$$anonfun$mostFrequentName$3.this.$outer.moreFirst$1(a, b);
<synthetic> <paramaccessor> private[this] val $outer: AggregatedPerson = _;
final <bridge> def apply(v1: java.lang.Object, v2: java.lang.Object): java.lang.Object = scala.Boolean.box(AggregatedPerson$$anonfun$mostFrequentName$3.this.apply(v1.$asInstanceOf[Tuple2](), v2.$asInstanceOf[Tuple2]()));
def this($outer: AggregatedPerson): AggregatedPerson$$anonfun$mostFrequentName$3 = {
if ($outer.eq(null))
throw new java.lang.NullPointerException()
else
AggregatedPerson$$anonfun$mostFrequentName$3.this.$outer = $outer;
AggregatedPerson$$anonfun$mostFrequentName$3.super.this();
()
}
}
}
There is a small runtime cost. You can observe it here (apologies for the long code):
object NestBench {
def countRaw() = {
var sum = 0
var i = 0
while (i<1000) {
sum += i
i += 1
var j = 0
while (j<1000) {
sum += j
j += 1
var k = 0
while (k<1000) {
sum += k
k += 1
sum += 1
}
}
}
sum
}
def countClosure() = {
var sum = 0
var i = 0
def sumI {
sum += i
i += 1
var j = 0
def sumJ {
sum += j
j += 1
var k = 0
def sumK {
def sumL { sum += 1 }
sum += k
k += 1
sumL
}
while (k<1000) sumK
}
while (j<1000) sumJ
}
while (i<1000) sumI
sum
}
def countInner() = {
var sum = 0
def whileI = {
def whileJ = {
def whileK = {
def whileL() = 1
var ksum = 0
var k = 0
while (k<1000) { ksum += k; k += 1; ksum += whileL }
ksum
}
var jsum = 0
var j = 0
while (j<1000) {
jsum += j; j += 1
jsum += whileK
}
jsum
}
var isum = 0
var i = 0
while (i<1000) {
isum += i; i += 1
isum += whileJ
}
isum
}
whileI
}
def countFunc() = {
def summer(f: => Int)() = {
var sum = 0
var i = 0
while (i<1000) {
sum += i; i += 1
sum += f
}
sum
}
summer( summer( summer(1) ) )()
}
def nsPerIteration(f:() => Int): (Int,Double) = {
val t0 = System.nanoTime
val result = f()
val t1 = System.nanoTime
(result , (t1-t0)*1e-9)
}
def main(args: Array[String]) {
for (i <- 1 to 5) {
val fns = List(countRaw _, countClosure _, countInner _, countFunc _)
val labels = List("raw","closure","inner","func")
val results = (fns zip labels) foreach (fl => {
val x = nsPerIteration( fl._1 )
printf("Method %8s produced %d; time/it = %.3f ns\n",fl._2,x._1,x._2)
})
}
}
}
There are four different methods for summing integers:
A raw while loop ("raw")
While loop inner methods that are closures over the sum variable
While loop inner methods that return the partial sum
A while-and-sum nested function
And we see the results on my machine in terms of nanoseconds taken in the inner loop:
scala> NestBench.main(Array[String]())
Method raw produced -1511174132; time/it = 0.422 ns
Method closure produced -1511174132; time/it = 2.376 ns
Method inner produced -1511174132; time/it = 0.402 ns
Method func produced -1511174132; time/it = 0.836 ns
Method raw produced -1511174132; time/it = 0.418 ns
Method closure produced -1511174132; time/it = 2.410 ns
Method inner produced -1511174132; time/it = 0.399 ns
Method func produced -1511174132; time/it = 0.813 ns
Method raw produced -1511174132; time/it = 0.411 ns
Method closure produced -1511174132; time/it = 2.372 ns
Method inner produced -1511174132; time/it = 0.399 ns
Method func produced -1511174132; time/it = 0.813 ns
Method raw produced -1511174132; time/it = 0.411 ns
Method closure produced -1511174132; time/it = 2.370 ns
Method inner produced -1511174132; time/it = 0.399 ns
Method func produced -1511174132; time/it = 0.815 ns
Method raw produced -1511174132; time/it = 0.412 ns
Method closure produced -1511174132; time/it = 2.357 ns
Method inner produced -1511174132; time/it = 0.400 ns
Method func produced -1511174132; time/it = 0.817 ns
So, bottom line is: nesting functions really doesn't hurt you at all in simple cases--the JVM will figure out that the call can be inlined (thus raw and inner give the same times). If you take a more functional approach, the function call can't be completely neglected, but the time taken is vanishingly small (approximately 0.4 ns extra per call). If you use a lot of closures, then closing them gives an overhead of something like 1 ns per call at least in this case where a single mutable variable is written to.
You can modify the code above to find answers to other questions, but the bottom line is that it is all very fast, ranging between "no penalty whatsoever" through to "only worry about in the very tightest inner loops that otherwise have minimal work to do".
(P.S. For comparison, the creation of a single small object takes ~4 ns on my machine.)
Current as of January 2014
The current benchmark is ~3 years old and Hotspot and the compiler have significantly evolved. I am also using Google Caliper to perform the benchmarks.
Code
import com.google.caliper.SimpleBenchmark
class Benchmark extends SimpleBenchmark {
def timeRaw(reps: Int) = {
var i = 0
var result = 0L
while (i < reps) {
result += 0xc37e ^ (i * 0xd5f3)
i = i + 1
}
result
}
def normal(i: Int): Long = 0xc37e ^ (i * 0xd5f3)
def timeNormal(reps: Int) = {
var i = 0
var result = 0L
while (i < reps) {
result += normal(i)
i = i + 1
}
result
}
def timeInner(reps: Int) = {
def inner(i: Int): Long = 0xc37e ^ (i * 0xd5f3)
var i = 0
var result = 0L
while (i < reps) {
result += inner(i)
i = i + 1
}
result
}
def timeClosure(reps: Int) = {
var i = 0
var result = 0L
val closure = () => result += 0xc37e ^ (i * 0xd5f3)
while (i < reps) {
closure()
i = i + 1
}
result
}
def normal(i: Int, j: Int, k: Int, l: Int): Long = i ^ j ^ k ^ l
def timeUnboxed(reps: Int) = {
var i = 0
var result = 0L
while (i < reps) {
result += normal(i,i,i,i)
i = i + 1
}
result
}
val closure = (i: Int, j: Int, k: Int, l: Int) => (i ^ j ^ k ^ l).toLong
def timeBoxed(reps: Int) = {
var i = 0
var result = 0L
while (i < reps) {
closure(i,i,i,i)
i = i + 1
}
result
}
}
Results
benchmark ns linear runtime
Normal 0.576 =
Raw 0.576 =
Inner 0.576 =
Closure 0.532 =
Unboxed 0.893 =
Boxed 15.210 ==============================
What is very surprising is that closure test was completed about 4ns faster than the others. This seems to be an idiosyncrasy of Hotspot instead of the execution environment, multiple runs have returned the same trend.
Using a closure that performs boxing is a huge performance hit, it takes approx 3.579ns to perform one unboxing and reboxing, enough to do a lot of primitive math operations. In this specific position, things might get better with the work being done on a new optimizer. In the general case, boxing might be alleviated by miniboxing.
Edit:
The new optimizer doesn't really help here, it makes Closure 0.1 ns slower and Boxed 0.1 ns faster:
benchmark ns linear runtime
Raw 0.574 =
Normal 0.576 =
Inner 0.575 =
Closure 0.645 =
Unboxed 0.889 =
Boxed 15.107 ==============================
Performed with 2.11.0-20131209-220002-9587726041 from magarciaEPFL/scala
Versions
JVM
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
Scalac
Scala compiler version 2.10.3 -- Copyright 2002-2013, LAMP/EPFL
Related
I had a piece of code:
def this(vectors: List[DenseVector[Double]]) {
this(vectors.length)
var resultVector = vectors.head
for (vector <- vectors) {
resultVector = kron(resultVector.toDenseMatrix, vector.toDenseMatrix).toDenseVector
}
_vector = resultVector
}
It worked just the way I wanted it to work. The problem is that I needed complex values in stead of doubles. After importing breeze.math.Complex, I changed the code to:
def this(vectors: List[DenseVector[Complex]]) {
this(vectors.length)
var resultVector = vectors.head
for (vector <- vectors) {
resultVector = kron(resultVector.toDenseMatrix, vector.toDenseMatrix).toDenseVector
}
_vector = resultVector
}
This however results into the errors:
Error:(42, 26) could not find implicit value for parameter impl: breeze.linalg.kron.Impl2[breeze.linalg.DenseMatrix[breeze.math.Complex],breeze.linalg.DenseMatrix[breeze.math.Complex],VR]
resultVector = kron(resultVector.toDenseMatrix, vector.toDenseMatrix).toDenseVector
^
Error:(42, 26) not enough arguments for method apply: (implicit impl: breeze.linalg.kron.Impl2[breeze.linalg.DenseMatrix[breeze.math.Complex],breeze.linalg.DenseMatrix[breeze.math.Complex],VR])VR in trait UFunc.
Unspecified value parameter impl.
resultVector = kron(resultVector.toDenseMatrix, vector.toDenseMatrix).toDenseVector
^
Is this a bug or am I forgetting to do something?
I found the problem in the following way:
I first rewrote the function to use less matrix conversions
As there was a problem with the implicit impl variable of kron, I also rewrote the function call to explicitly state which variable to use to use
.
def this(vectors: List[DenseVector[Complex]]) {
this(vectors.length)
var resultMatrix = vectors.head.toDenseMatrix
for (i <- 1 until vectors.length) {
resultMatrix = kron(resultMatrix, vectors(i).toDenseMatrix)(kron.kronDM_M[Complex, Complex, DenseMatrix[Complex], Complex])
}
_vector = resultMatrix.toDenseVector
}
This showed me that there was no ScalarMulOp for V2, M, DenseMatrix[RV] where M is a Matrix[V1], V1 and V2 are the input types and RV is the output type of the ScalarMulOp
Digging through the source code of breeze I found in DenseMatrixOps that there only was an implicit ScalarMulOp for the above types if V1, V2 and RV are of type Int, Long, Float and Double. By copying the function and making it specific for Complex numbers, I was able to get the kronecker product to work. Now I could also remove the explicit use of (kron.kronDM_M[Complex, Complex, DenseMatrix[Complex], Complex]). The ScalarMulOp function in question is:
implicit def s_dm_op_Complex_OpMulScalar(implicit op: OpMulScalar.Impl2[Complex, Complex, Complex]):
OpMulScalar.Impl2[Complex, DenseMatrix[Complex], DenseMatrix[Complex]] =
new OpMulScalar.Impl2[Complex, DenseMatrix[Complex], DenseMatrix[Complex]] {
def apply(b: Complex, a: DenseMatrix[Complex]): DenseMatrix[Complex] = {
val res: DenseMatrix[Complex] = DenseMatrix.zeros[Complex](a.rows, a.cols)
val resd: Array[Complex] = res.data
val ad: Array[Complex] = a.data
var c = 0
var off = 0
while (c < a.cols) {
var r = 0
while (r < a.rows) {
resd(off) = op(b, ad(a.linearIndex(r, c)))
r += 1
off += 1
}
c += 1
}
res
}
implicitly[BinaryRegistry[Complex, Matrix[Complex], OpMulScalar.type, Matrix[Complex]]].register(this)
}
I want to write a for loop in scala, but the counter should get incremented by more than one (the amount is variable) in some special cases.
You can do this with a combination of a filter and an external var. Here is an example:
var nextValidVal = 0
for (i <- 0 to 99; if i >= nextValidVal) {
var amountToSkip = 0
// Whatever this loop is for
nextValidVal = if (amountToSkip > 0) i + amountToSkip + 1 else nextValidVal
}
So in the main body of your loop, you can set amountToSkip to n according to your conditions. The next n values of i´s sequence will be skipped.
If your sequence is pulled from some other kind of sequence, you could do it like this
var skip = 0
for (o <- someCollection if { val res = skip == 0; skip = if (!res) skip - 1 else 0; res } ) {
// Do stuff
}
If you set skip to a positive value in the body of the loop, the next n elements of the sequence will be skipped.
Of course, this is terribly imperative and side-effecty. I would look for other ways to to this where ever possible, by mapping or filtering or folding the original sequence.
You could implement your own stream to reflect step, for example:
import scala.collection.immutable.Stream
import ForStream._
object Test {
def main(args: Array[String]): Unit = {
val range = 0 to 20 by 1 withVariableStep; // in case you like definition through range
//val range = ForStream(0,20,1) // direct definition
for (i<- range) {
println(s"i=$i")
range.step = range.step + 1
}
}
}
object ForStream{
implicit def toForStream(range: Range): ForStream = new ForStreamMaster(range.start, range.end,range.step)
def apply(head:Int, end:Int, step:Int) = new ForStreamMaster(head, end,step)
}
abstract class ForStream(override val head: Int, val end: Int, var step: Int) extends Stream[Int] {
override val tailDefined = false
override val isEmpty = head > end
def withVariableStep = this
}
class ForStreamMaster(_head: Int, _end: Int, _Step: Int) extends ForStream(_head, _end,_Step){
override def tail = if (isEmpty) Stream.Empty else new ForStreamSlave(head + step, end, step, this)
}
class ForStreamSlave(_head: Int, _end: Int, _step: Int, val master: ForStream) extends ForStream(_head, _end,_step){
override def tail = if (isEmpty) Stream.Empty else new ForStreamSlave(head + master.step, end, master.step, master)
}
This prints:
i=0
i=2
i=5
i=9
i=14
i=20
You can define ForStream from Range with implicits, or define it directly. But be carefull:
You are not iterating Range anymore!
Stream should be immutable, but step is mutable!
Also as #om-nom-nom noted, this might be better implemented with recursion
Why not use the do-while loop?
var x = 0;
do{
...something
if(condition){change x to something else}
else{something else}
x+=1
}while(some condition for x)
Just messing about here, with circular buffers. Is this a sensible implementation or is there a faster/more reliable way to skin this cat?
class CircularBuffer[T](size: Int)(implicit mf: Manifest[T]) {
private val arr = new scala.collection.mutable.ArrayBuffer[T]()
private var cursor = 0
val monitor = new ReentrantReadWriteLock()
def push(value: T) {
monitor.writeLock().lock()
try {
arr(cursor) = value
cursor += 1
cursor %= size
} finally {
monitor.writeLock().unlock()
}
}
def getAll: Array[T] = {
monitor.readLock().lock()
try {
val copy = new Array[T](size)
arr.copyToArray(copy)
copy
} finally {
monitor.readLock().unlock()
}
}
}
Creation
A type declaration to an existing scala collection class and an appender function is easier to implement than "rolling your own". As noted in the comments this implementation will likely not be as performant as a "true" circular buffer, but it gets the job done with very little coding:
import scala.collection.immutable
type CircularBuffer[T] = immutable.Vector[T]
def emptyCircularBuffer[T] : CircularBuffer[T] = immutable.Vector.empty[T]
def addToCircularBuffer[T](maxSize : Int)(buffer : CircularBuffer[T], item : T) : CircularBuffer[T] =
(buffer :+ item) takeRight maxSize
This means that your "CircularBuffer" is actually a Vector and you now get all of the corresponding Vector methods (filter, map, flatMap, etc...) for free:
var intCircularBuffer = emptyCircularBuffer[Int]
//Vector(41)
intCircularBuffer = addToCircularBuffer(2)(intCircularBuffer, 41)
//Vector(41, 42)
intCircularBuffer = addToCircularBuffer(2)(intCircularBuffer, 42)
//Vector(42, 43)
intCircularBuffer = addToCircularBuffer(2)(intCircularBuffer, 43)
//Vector(42)
val evens : CircularBuffer[Int] = intCircularBuffer filter ( _ % 2 == 0)
Indexing
You can similarly add a function for circular indexing:
def circularIndex[T](buffer : CircularBuffer[T])(index : Int) : T =
buffer(index % buffer.size)
I would like to be able to grow an Array-like structure up to a maximum size, after which the oldest (1st) element would be dropped off the structure every time a new element is added. I don't know what the best way to do this is, but one way would be to extend the ArrayBuffer class, and override the += operator so that if the maximum size has been reached, the first element is dropped every time a new one is added. I haven't figured out how to properly extend collections yet. What I have so far is:
class FiniteGrowableArray[A](maxLength:Int) extends scala.collection.mutable.ArrayBuffer {
override def +=(elem:A): <insert some return type here> = {
// append element
if(length > maxLength) remove(0)
<returned collection>
}
}
Can someone suggest a better path and/or help me along this one? NOTE: I will need to arbitrarily access elements within the structure multiple times in between the += operations.
Thanks
As others have discussed, you want a ring buffer. However, you also have to decide if you actually want all of the collections methods or not, and if so, what happens when you filter a ring buffer of maximum size N--does it keep its maximum size, or what?
If you're okay with merely being able to view your ring buffer as part of the collections hierarchy (but don't want to use collections efficiently to generate new ring buffers) then you can just:
class RingBuffer[T: ClassManifest](maxsize: Int) {
private[this] val buffer = new Array[T](maxsize+1)
private[this] var i0,i1 = 0
private[this] def i0up = { i0 += 1; if (i0>=buffer.length) i0 -= buffer.length }
private[this] def i0dn = { i0 -= 1; if (i0<0) i0 += buffer.length }
private[this] def i1up = { i1 += 1; if (i1>=buffer.length) i1 -= buffer.length }
private[this] def i1dn = { i1 -= 1; if (i1<0) i1 += buffer.length }
private[this] def me = this
def apply(i: Int) = {
val j = i+i0
if (j >= buffer.length) buffer(j-buffer.length) else buffer(j)
}
def size = if (i1<i0) buffer.length+i1-i0 else i1-i0
def :+(t: T) = {
buffer(i1) = t
i1up; if (i1==i0) i0up
this
}
def +:(t: T) = {
i0dn; if (i0==i1) i1dn
buffer(i0) = t
this
}
def popt = {
if (i1==i0) throw new java.util.NoSuchElementException
i1dn; buffer(i1)
}
def poph = {
if (i1==i0) throw new java.util.NoSuchElementException
val ans = buffer(i0); i0up; ans
}
def seqView = new IndexedSeq[T] {
def apply(i: Int) = me(i)
def length = me.size
}
}
Now you can use this easily directly, and you can jump out to IndexedSeq when needed:
val r = new RingBuffer[Int](4)
r :+ 7 :+ 9 :+ 2
r.seqView.mkString(" ") // Prints 7 9 2
r.popt // Returns 2
r.poph // Returns 7
r :+ 6 :+ 5 :+ 4 :+ 3
r.seqView.mkString(" ") // Prints 6 5 4 3 -- 7 fell off the end
0 +: 1 +: 2 +: r
r.seqView.mkString(" ") // Prints 0 1 2 6 -- added to front; 3,4,5 fell off
r.seqView.filter(_>1) // Vector(2,6)
and if you want to put things back into a ring buffer, you can
class RingBufferImplicit[T: ClassManifest](ts: Traversable[T]) {
def ring(maxsize: Int) = {
val rb = new RingBuffer[T](maxsize)
ts.foreach(rb :+ _)
rb
}
}
implicit def traversable2ringbuffer[T: ClassManifest](ts: Traversable[T]) = {
new RingBufferImplicit(ts)
}
and then you can do things like
val rr = List(1,2,3,4,5).ring(4)
rr.seqView.mkString(" ") // Prints 2,3,4,5
I need to compress/decompress some Data with a old, in house developed Algorithm.
There i have a lot of operations like:
if the next bit is 0 take the following 6 Bits and interpret them as an Int
if the next bits are 10 take the following 9 Bits and interpret them as an Int
etc.
Knows somebody something like a "Bitstrem" class in Scala? (I didn't found anything and hope that i didn't have to implement it by myself.)
Thanks
Edit:
I combined the answer with http://www.scala-lang.org/node/8413 ("The Architecture of Scala Collections") If somebody needs the samething:
abstract class Bit
object Bit {
val fromInt: Int => Bit = Array(Low, High)
val toInt: Bit => Int = Map(Low -> 0, High -> 1)
}
case object High extends Bit
case object Low extends Bit
import collection.IndexedSeqLike
import collection.mutable.{Builder, ArrayBuffer}
import collection.generic.CanBuildFrom
import collection.IndexedSeq
// IndexedSeqLike implements all concrete methods of IndexedSeq
// with newBuilder. (methods like take, filter, drop)
final class BitSeq private (val bits: Array[Int], val length: Int)
extends IndexedSeq[Bit]
with IndexedSeqLike[Bit, BitSeq]
{
import BitSeq._
// Mandatory for IndexedSeqLike
override protected[this] def newBuilder: Builder[Bit, BitSeq] =
BitSeq.newBuilder
//Mandatory for IndexedSeq
def apply(idx: Int): Bit = {
if(idx < 0 || length <= idx)
throw new IndexOutOfBoundsException
Bit.fromInt(bits(idx/N) >> (idx % N) & M)
}
}
object BitSeq {
// Bits per Int
private val N = 32
// Bitmask to isolate a bit
private val M = 0x01
def fromSeq(buf: Seq[Bit]): BitSeq = {
val bits = new Array[Int]((buf.length + N - 1) / N)
for(i <- 0 until buf.length) {
bits(i/N) |= Bit.toInt(buf(i)) << (i % N)
}
new BitSeq(bits, buf.length)
}
def apply(bits: Bit*) = fromSeq(bits)
def newBuilder: Builder[Bit, BitSeq] = new ArrayBuffer mapResult fromSeq
// Needed for map etc. (BitSeq map {:Bit} should return a BitSeq)
implicit def canBuilderFrom: CanBuildFrom[BitSeq, Bit, BitSeq] =
new CanBuildFrom[BitSeq, Bit, BitSeq] {
def apply(): Builder[Bit, BitSeq] = newBuilder
def apply(from: BitSeq): Builder[Bit, BitSeq] = newBuilder
}
}
There isn't any existing class that I'm aware of, but you can leverage the existing classes to help out with almost all of the difficult operations. The trick is to turn your data into a stream of Ints (or Bytes if there wouldn't be enough memory). You then can use all the handy collections methods (e.g. take) and only are left with the problem of turning bits into memory. But that's easy if you pack the bits in MSB order.
object BitExample {
def bitInt(ii: Iterator[Int]): Int = (0 /: ii)((i,b) => (i<<1)|b)
def bitInt(ii: Iterable[Int]): Int = bitInt(ii.iterator)
class ArrayBits(bytes: Array[Byte]) extends Iterator[Int] {
private[this] var buffer = 0
private[this] var index,shift = -1
def hasNext = (shift > 0) || (index+1 < bytes.length)
def next = {
if (shift <= 0) {
index += 1
buffer = bytes(index) & 0xFF
shift = 7
}
else shift -= 1
(buffer >> shift) & 0x1
}
}
}
And then you do things like
import BitExample._
val compressed = new ArrayBits( Array[Byte](14,29,126) ).toStream
val headless = compressed.dropWhile(_ == 0)
val (test,rest) = headless.splitAt(3)
if (bitInt(test) > 4) println(bitInt(rest.take(6)))
(You can decide whether you want to use the iterator directly or as a stream, list, or whatever.)