I'm curious if Scala has some gem hidden in its collection classes that I can use. Basically I'm looking for something like a FIFO queue, but that has an upper-limit on its size such that when the limit is hit, the oldest (first) element is removed from the queue. I've implemented this myself in Java in the past, but I'd rather use something standard if possible.
An often preferable alternative to subclassing is the (unfortunately named) "pimp my library" pattern. You can use it to add an enqueueFinite method to Queue, like so:
import scala.collection.immutable.Queue

class FiniteQueue[A](q: Queue[A]) {
  def enqueueFinite[B >: A](elem: B, maxSize: Int): Queue[B] = {
    var ret = q.enqueue(elem)
    while (ret.size > maxSize) { ret = ret.dequeue._2 }
    ret
  }
}

implicit def queue2finitequeue[A](q: Queue[A]): FiniteQueue[A] = new FiniteQueue[A](q)
Whenever queue2finitequeue is in scope, you can treat Queue objects as though they have the enqueueFinite method:
val maxSize = 3
val q1 = Queue(1, 2, 3)
val q2 = q1.enqueueFinite(5, maxSize)
val q3 = q2.map(_+1)
val q4 = q3.enqueueFinite(7, maxSize)
The advantage of this approach over subclassing is that enqueueFinite is available to all Queues, including those that are constructed via operations like enqueue, map, ++, etc.
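(Side note: on Scala 2.10 and later, the same extension is usually written as an implicit class rather than a separate conversion method. A minimal sketch with the same enqueueFinite as above; the enclosing object is only there because implicit classes cannot be declared at the top level.)

import scala.collection.immutable.Queue

object FiniteQueueSyntax {
  implicit class FiniteQueueOps[A](q: Queue[A]) {
    def enqueueFinite[B >: A](elem: B, maxSize: Int): Queue[B] = {
      var ret = q.enqueue(elem)
      while (ret.size > maxSize) ret = ret.dequeue._2
      ret
    }
  }
}

import FiniteQueueSyntax._
Queue(1, 2, 3).enqueueFinite(4, maxSize = 3)   // Queue(2, 3, 4)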
Update: As Dylan points out in the comments, enqueueFinite also needs to take a parameter for the maximum queue size and drop elements as necessary. I have updated the code.
Why don't you just subclass a FIFO queue? Something like this should work: (pseudocode follows...)
class Limited(limit: Int) extends FIFO {
  override def enqueue() = {
    if (size >= limit) {
      // remove oldest element
    }
    super.enqueue()
  }
}
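For concreteness, a minimal runnable sketch of this subclassing idea against the Scala 2.12-era scala.collection.mutable.Queue (on 2.13 and later you would override addOne instead):

import scala.collection.mutable

class BoundedQueue[A](limit: Int) extends mutable.Queue[A] {
  override def enqueue(elems: A*): Unit = {
    super.enqueue(elems: _*)
    while (size > limit) dequeue()   // drop the oldest elements once over the limit
  }
}

val q = new BoundedQueue[Int](3)
q.enqueue(1, 2, 3, 4)   // q now contains 2, 3, 4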
Here is an immutable solution:
class FixedSizeFifo[T](val limit: Int)
    (private val out: List[T], private val in: List[T])
    extends Traversable[T] {

  override def size = in.size + out.size

  def :+(t: T) = {
    val (nextOut, nextIn) = if (size == limit) {
      if (out.nonEmpty) {
        (out.tail, t :: in)
      } else {
        (in.reverse.tail, List(t))
      }
    } else (out, t :: in)
    new FixedSizeFifo(limit)(nextOut, nextIn)
  }

  private lazy val deq = {
    if (out.isEmpty) {
      val revIn = in.reverse
      (revIn.head, new FixedSizeFifo(limit)(revIn.tail, List()))
    } else {
      (out.head, new FixedSizeFifo(limit)(out.tail, in))
    }
  }

  override lazy val head = deq._1
  override lazy val tail = deq._2

  def foreach[U](f: T => U) = (out ::: in.reverse) foreach f
}

object FixedSizeFifo {
  def apply[T](limit: Int) = new FixedSizeFifo[T](limit)(List(), List())
}
An example:
val fifo = FixedSizeFifo[Int](3) :+ 1 :+ 2 :+ 3 :+ 4 :+ 5 :+ 6
println( fifo ) //prints: FixedSizeFifo(4, 5, 6)
println( fifo.head ) //prints: 4
println( fifo.tail :+ 7 :+8 ) //prints: FixedSizeFifo(6, 7, 8)
This is the approach I took, extending Scala's standard mutable.Queue class.
import scala.collection.mutable

class LimitedQueue[A](maxSize: Int) extends mutable.Queue[A] {
  override def +=(elem: A): this.type = {
    if (length >= maxSize) dequeue()   // drop the oldest element before appending
    appendElem(elem)
    this
  }
}
And a simple use case:
var q2 = new LimitedQueue[Int](2)
q2 += 1
q2 += 2
q2 += 3
q2 += 4
q2 += 5
q2.foreach { n =>
  println(n)
}
You'll get only 4 and 5 in the console as the old elements were dequeued beforehand.
Related
I am trying to understand the "add" and "extract" methods of the FPTree class:
(https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala).
What is the purpose of the 'summaries' variable?
Where is the Group list?
I assume it is the following; am I correct?
val numParts = if (numPartitions > 0) numPartitions else data.partitions.length
val partitioner = new HashPartitioner(numParts)
What will 'summaries' contain for the 3 transactions {a,b,c}, {a,b}, {b,c}, where all are frequent?
def add(t: Iterable[T], count: Long = 1L): FPTree[T] = {
  require(count > 0)
  var curr = root
  curr.count += count
  t.foreach { item =>
    val summary = summaries.getOrElseUpdate(item, new Summary)
    summary.count += count
    val child = curr.children.getOrElseUpdate(item, {
      val newNode = new Node(curr)
      newNode.item = item
      summary.nodes += newNode
      newNode
    })
    child.count += count
    curr = child
  }
  this
}
def extract(
    minCount: Long,
    validateSuffix: T => Boolean = _ => true): Iterator[(List[T], Long)] = {
  summaries.iterator.flatMap { case (item, summary) =>
    if (validateSuffix(item) && summary.count >= minCount) {
      Iterator.single((item :: Nil, summary.count)) ++
        project(item).extract(minCount).map { case (t, c) =>
          (item :: t, c)
        }
    } else {
      Iterator.empty
    }
  }
}
After a bit of experimenting, it is pretty straightforward:
1+2) The partition is indeed the Group representative.
It is also how the conditional transactions are calculated:
private def genCondTransactions[Item: ClassTag](
    transaction: Array[Item],
    itemToRank: Map[Item, Int],
    partitioner: Partitioner): mutable.Map[Int, Array[Int]] = {
  val output = mutable.Map.empty[Int, Array[Int]]
  // Filter the basket by frequent items pattern and sort their ranks.
  val filtered = transaction.flatMap(itemToRank.get)
  ju.Arrays.sort(filtered)
  val n = filtered.length
  var i = n - 1
  while (i >= 0) {
    val item = filtered(i)
    val part = partitioner.getPartition(item)
    if (!output.contains(part)) {
      output(part) = filtered.slice(0, i + 1)
    }
    i -= 1
  }
  output
}
The summaries map is just a helper that stores the count of each item across the transactions.
extract/project generate the frequent itemsets (FIS) using up/down recursion over dependent FP-Trees (project), while checking summaries to decide whether a path needs to be traversed.
The summaries of node 'a' will contain {b:2, c:1}, and the children of node 'a' are 'b' and 'c'.
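To make the counting concrete, here is a simplified stand-in (a plain mutable map, not the Spark class) that replays what add does to the top-level summaries for the three transactions from the question:

import scala.collection.mutable

val transactions = Seq(Seq("a", "b", "c"), Seq("a", "b"), Seq("b", "c"))
val counts = mutable.Map.empty[String, Long].withDefaultValue(0L)
for (t <- transactions; item <- t) counts(item) += 1
// counts now holds a -> 2, b -> 3, c -> 2; the real Summary additionally keeps the list of tree nodes per item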
I am trying to create a frequency distribution.
My data is in the following pattern (ColumnIndex, (Value, countOfValue)) of type (Int, (Any, Long)). For instance, (1, (A, 10)) means for column index 1, there are 10 A's.
My goal is to get the top 100 values for all my indexes, or keys.
Right away I can make it less compute intensive for my workload by doing an initial filter:
val freqNumDist = numRDD.filter(x => x._2._2 > 1)
Now I found an interesting example of a class here, which seems to fit my use case:
class TopNList(val maxSize: Int) extends Serializable {
  val topNCountsForColumnArray = new mutable.ArrayBuffer[(Any, Long)]
  var lowestColumnCountIndex: Int = -1
  var lowestValue = Long.MaxValue

  def add(newValue: Any, newCount: Long): Unit = {
    if (topNCountsForColumnArray.length < maxSize - 1) {
      topNCountsForColumnArray += ((newValue, newCount))
    } else if (topNCountsForColumnArray.length == maxSize) {
      updateLowestValue
    } else {
      if (newCount > lowestValue) {
        topNCountsForColumnArray.insert(lowestColumnCountIndex, (newValue, newCount))
        updateLowestValue
      }
    }
  }

  def updateLowestValue: Unit = {
    var index = 0
    topNCountsForColumnArray.foreach { r =>
      if (r._2 < lowestValue) {
        lowestValue = r._2
        lowestColumnCountIndex = index
      }
      index += 1
    }
  }
}
So now what I was thinking was putting together an aggregateByKey to use this class in order to get my top 100 values! The problem is that I am unsure of how to use this class in aggregateByKey in order to accomplish this goal.
val initFreq: TopNList = new TopNList(100)

def freqSeq(u: TopNList, v: (Double, Long)) = (
  u.add(v._1, v._2)
)

def freqComb(u1: TopNList, u2: TopNList) = (
  u2.topNCountsForColumnArray.foreach(r => u1.add(r._1, r._2))
)

val freqNumDist = numRDD.filter(x => x._2._2 > 1).aggregateByKey(initFreq)(freqSeq, freqComb)
The obvious problem is that nothing is returned by the functions I am using. So I am wondering how to modify this class or do I need to think about this in a whole new light and just cherry pick some of the functions out of this class and add them to the functions I am using for the aggregateByKey?
I'm either thinking about classes wrong or the entire aggregateByKey or both!
Your projection implementations (freqSeq, freqComb) return Unit, while aggregateByKey expects them to return TopNList.
If you intentionally keep the style of your solution, the relevant implementation should be:
def freqSeq(u: TopNList, v: (Any, Long)): TopNList = {
  u.add(v._1, v._2) // this call gives a void result (Unit)
  u                 // this one is of type TopNList
}

def freqComb(u1: TopNList, u2: TopNList): TopNList = {
  u2.topNCountsForColumnArray.foreach(r => u1.add(r._1, r._2))
  u1
}
Just take a look at the aggregateByKey signature in PairRDDFunctions to see what it expects:
def aggregateByKey[U](zeroValue : U)(seqOp : scala.Function2[U, V, U], combOp : scala.Function2[U, U, U])(implicit evidence$3 : scala.reflect.ClassTag[U]) : org.apache.spark.rdd.RDD[scala.Tuple2[K, U]] = { /* compiled code */ }
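With the corrected return types, the call from the question stays essentially the same. A minimal sketch (numRDD and the (Any, Long) value shape are taken from the question):

// aggregateByKey serializes the zero value and hands out fresh deserialized copies,
// so the mutable TopNList is not shared across keys.
val freqNumDist = numRDD
  .filter { case (_, (_, count)) => count > 1 }
  .aggregateByKey(new TopNList(100))(freqSeq, freqComb)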
I worked my way through implementing recursive versions of selection sort and quicksort, and I am trying to modify the code so that it can sort a list of any generic type; I want to assume that the generic type supplied can be converted to Comparable at runtime.
Does anyone have a link, code, or tutorial on how to do this, please?
I am trying to modify this particular code:
def main(args: Array[String]) {
  val l = List(2, 4, 5, 6, 8)
  print(quickSort(l))
}

def quickSort(x: List[Int]): List[Int] = {
  x match {
    case xh :: xt =>
      val (first, pivot, second) = partition(x)
      quickSort(first) ::: (pivot :: quickSort(second))
    case Nil => x
  }
}

def partition(x: List[Int]) = {
  val pivot = x.head
  var first: List[Int] = List()
  var second: List[Int] = List()
  val fun = (i: Int) => {
    if (i < pivot)
      first = i :: first
    else
      second = i :: second
  }
  x.tail.foreach(fun)
  (first, pivot, second)
}
Language: SCALA
In Scala, Java's Comparator is replaced by Ordering (quite similar, but it comes with more useful methods). Orderings are implemented for several types (primitives, strings, BigDecimals, etc.) and you can provide your own implementations.
You can then use a Scala implicit to ask the compiler to pick the correct one for you:
def sort[A]( lst: List[A] )( implicit ord: Ordering[A] ) = {
...
}
If you are using a predefined ordering, just call:
sort( myLst )
and the compiler will infer the second argument. If you want to declare your own ordering, use the keyword implicit in the declaration. For instance:
implicit val fooOrdering = new Ordering[Foo] {
  def compare(f1: Foo, f2: Foo) = { ... }
}
and it will be used implicitly if you try to sort a List of Foo.
If you have several implementations for the same type, you can also explicitly pass the correct ordering object:
sort( myFooLst )( fooOrdering )
More info in this post.
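Applied to the quickSort from the question, a minimal sketch could look like this (it uses List.partition instead of the hand-rolled partition, purely to keep the example short):

def quickSort[A](xs: List[A])(implicit ord: Ordering[A]): List[A] = xs match {
  case Nil => Nil
  case pivot :: rest =>
    // split the remaining elements around the pivot using the implicit Ordering
    val (smaller, larger) = rest.partition(ord.lt(_, pivot))
    quickSort(smaller) ::: (pivot :: quickSort(larger))
}

quickSort(List(2, 4, 5, 6, 8))       // List(2, 4, 5, 6, 8)
quickSort(List("pear", "apple"))     // List(apple, pear)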
For Quicksort, I'll modify an example from the "Scala By Example" book to make it more generic.
import scala.collection.mutable.ArraySeq

class Quicksort[A <% Ordered[A]] {
  def sort(a: ArraySeq[A]): ArraySeq[A] =
    if (a.length < 2) a
    else {
      val pivot = a(a.length / 2)
      sort(a filter (pivot >)) ++ (a filter (pivot ==)) ++
        sort(a filter (pivot <))
    }
}
Test with Int
scala> val quicksort = new Quicksort[Int]
quicksort: Quicksort[Int] = Quicksort#38ceb62f
scala> val a = ArraySeq(5, 3, 2, 2, 1, 1, 9, 39, 219)
a: scala.collection.mutable.ArraySeq[Int] = ArraySeq(5, 3, 2, 2, 1, 1, 9, 39, 219)
scala> quicksort.sort(a).foreach(n=> (print(n), print (" " )))
1 1 2 2 3 5 9 39 219
Test with a custom class implementing Ordered
scala> case class Meh(x: Int, y:Int) extends Ordered[Meh] {
| def compare(that: Meh) = (x + y).compare(that.x + that.y)
| }
defined class Meh
scala> val q2 = new Quicksort[Meh]
q2: Quicksort[Meh] = Quicksort#7677ce29
scala> val a3 = ArraySeq(Meh(1,1), Meh(12,1), Meh(0,1), Meh(2,2))
a3: scala.collection.mutable.ArraySeq[Meh] = ArraySeq(Meh(1,1), Meh(12,1), Meh(0,1), Meh(2,2))
scala> q2.sort(a3)
res7: scala.collection.mutable.ArraySeq[Meh] = ArraySeq(Meh(0,1), Meh(1,1), Meh(2,2), Meh(12,1))
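(View bounds such as A <% Ordered[A] have since been deprecated; the same class can be written against an implicit Ordering instead. A rough sketch:)

import scala.collection.mutable.ArraySeq

class Quicksort[A](implicit ord: Ordering[A]) {
  import ord._   // brings the infix <, > comparison operators for A into scope

  def sort(a: ArraySeq[A]): ArraySeq[A] =
    if (a.length < 2) a
    else {
      val pivot = a(a.length / 2)
      sort(a filter (pivot > _)) ++ (a filter (pivot == _)) ++ sort(a filter (pivot < _))
    }
}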
Even though, when coding Scala, I usually prefer a functional style (via combinators or recursion) over an imperative style (via variables and iteration), this time, for this specific problem, old-school imperative nested loops result in simpler code for the reader. I don't think falling back to imperative style is a mistake for certain classes of problems, such as sorting algorithms, which usually transform the input buffer in place (like a procedure) rather than producing a new sorted one.
Here is my solution:
package bitspoke.algo

import scala.math.Ordered
import scala.collection.mutable.Buffer

abstract class Sorter[T <% Ordered[T]] {
  // algorithm provided by subclasses
  def sort(buffer: Buffer[T]): Unit

  // check if the buffer is sorted (>= so that equal neighbours still count as sorted)
  def sorted(buffer: Buffer[T]) = buffer.isEmpty || buffer.view.zip(buffer.tail).forall { t => t._2 >= t._1 }

  // swap elements in buffer
  def swap(buffer: Buffer[T], i: Int, j: Int) {
    val temp = buffer(i)
    buffer(i) = buffer(j)
    buffer(j) = temp
  }
}
class SelectionSorter[T <% Ordered[T]] extends Sorter[T] {
  def sort(buffer: Buffer[T]): Unit = {
    for (i <- 0 until buffer.length) {
      var min = i
      for (j <- i until buffer.length) {
        if (buffer(j) < buffer(min))
          min = j
      }
      swap(buffer, i, min)
    }
  }
}
As you can see, rather than using java.lang.Comparable, I preferred scala.math.Ordered and Scala view bounds over upper bounds. That certainly works thanks to the many Scala implicit conversions from primitive types to their rich wrappers.
You can write a client program as follows:
import bitspoke.algo._
import scala.collection.mutable._
val sorter = new SelectionSorter[Int]
val buffer = ArrayBuffer(3, 0, 4, 2, 1)
sorter.sort(buffer)
assert(sorter.sorted(buffer))
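(The same caveat about view bounds applies here: they are deprecated in newer Scala. A sketch of the abstract Sorter with a context bound instead, using Ordering.Implicits for the infix comparisons:)

import scala.collection.mutable.Buffer
import scala.math.Ordering.Implicits._   // infix <, <= etc. for any T with an implicit Ordering

abstract class Sorter[T: Ordering] {
  // algorithm provided by subclasses
  def sort(buffer: Buffer[T]): Unit

  // check if the buffer is sorted
  def sorted(buffer: Buffer[T]): Boolean =
    buffer.isEmpty || buffer.view.zip(buffer.tail).forall { case (a, b) => a <= b }

  // swap elements in buffer
  def swap(buffer: Buffer[T], i: Int, j: Int): Unit = {
    val temp = buffer(i)
    buffer(i) = buffer(j)
    buffer(j) = temp
  }
}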
Is it possible to have angle brackets in method names, e.g.:
class Foo(ind1:Int,ind2:Int){...}
var v = new Foo(1,2)
v(1) = 3 //updates ind1
v<1> = 4 //updates ind2
The real situation is obviously more complicated than this! I am trying to provide a convenient user interface.
This response is not meant to be taken too seriously - just a proof that this can almost be achieved using some hacks.
class Vector(values: Int*) {
  val data = values.toArray

  def <(i: Int) = new {
    def `>_=`(x: Int) {
      data(i) = x
    }
    def > {
      println("value at " + i + " is " + data(i))
    }
  }

  override def toString = data.mkString("<", ", ", ">")
}
val v = new Vector(1, 2, 3)
println(v) // prints <1, 2, 3>
v<1> = 10
println(v) // prints <1, 10, 3>
v<1> // prints: value at 1 is 10
Using this class we can have a vector that uses <> instead of () for "read" and write access.
The compiler (2.9.0.1) crashes if > returns a value. It might be a bug or a result of misusing >.
Edit: I was wrong; kassens's answer shows how to do it as you want.
It is not possible to implement a method that would be called when you write v<1> = 4 (except, maybe, if you write a compiler plugin?). However, something like this would be possible:
class Foo {
  def at(i: Int) = new Assigner(i)

  class Assigner(i: Int) {
    def :=(v: Int) = println("assigning " + v + " at index " + i)
  }
}
Then:
val f = new Foo
f at 4 := 6
With a little trickery you can actually get quite close to what you want.
object Foo {
  val a: Array[Int] = new Array(100)
  def <(i: Int) = new Updater(a, i)
}

class Updater(a: Array[Int], i: Int) {
  def update(x: Int) {
    a(i) = x
  }
  def >() = this
}
Foo<1>() = 123
I am not sure why Scala requires the () though. And yes, this is a bit of a hack...
I would like to be able to grow an Array-like structure up to a maximum size, after which the oldest (1st) element would be dropped off the structure every time a new element is added. I don't know what the best way to do this is, but one way would be to extend the ArrayBuffer class, and override the += operator so that if the maximum size has been reached, the first element is dropped every time a new one is added. I haven't figured out how to properly extend collections yet. What I have so far is:
class FiniteGrowableArray[A](maxLength: Int) extends scala.collection.mutable.ArrayBuffer {
  override def +=(elem: A): <insert some return type here> = {
    // append element
    if (length > maxLength) remove(0)
    <returned collection>
  }
}
Can someone suggest a better path and/or help me along this one? NOTE: I will need to arbitrarily access elements within the structure multiple times in between the += operations.
Thanks
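For reference, a filled-in version of the sketch in the question that compiles against Scala 2.12-style collections could look like the following; on 2.13 the method to override would be addOne rather than +=:

import scala.collection.mutable.ArrayBuffer

class FiniteGrowableArray[A](maxLength: Int) extends ArrayBuffer[A] {
  override def +=(elem: A): this.type = {
    super.+=(elem)                      // append the element
    if (length > maxLength) remove(0)   // drop the oldest element once over capacity
    this
  }
}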
As others have discussed, you want a ring buffer. However, you also have to decide if you actually want all of the collections methods or not, and if so, what happens when you filter a ring buffer of maximum size N--does it keep its maximum size, or what?
If you're okay with merely being able to view your ring buffer as part of the collections hierarchy (but don't want to use collections efficiently to generate new ring buffers) then you can just:
class RingBuffer[T: ClassManifest](maxsize: Int) {
  private[this] val buffer = new Array[T](maxsize + 1)
  private[this] var i0, i1 = 0
  private[this] def i0up = { i0 += 1; if (i0 >= buffer.length) i0 -= buffer.length }
  private[this] def i0dn = { i0 -= 1; if (i0 < 0) i0 += buffer.length }
  private[this] def i1up = { i1 += 1; if (i1 >= buffer.length) i1 -= buffer.length }
  private[this] def i1dn = { i1 -= 1; if (i1 < 0) i1 += buffer.length }
  private[this] def me = this

  def apply(i: Int) = {
    val j = i + i0
    if (j >= buffer.length) buffer(j - buffer.length) else buffer(j)
  }

  def size = if (i1 < i0) buffer.length + i1 - i0 else i1 - i0

  def :+(t: T) = {
    buffer(i1) = t
    i1up; if (i1 == i0) i0up
    this
  }

  def +:(t: T) = {
    i0dn; if (i0 == i1) i1dn
    buffer(i0) = t
    this
  }

  def popt = {
    if (i1 == i0) throw new java.util.NoSuchElementException
    i1dn; buffer(i1)
  }

  def poph = {
    if (i1 == i0) throw new java.util.NoSuchElementException
    val ans = buffer(i0); i0up; ans
  }

  def seqView = new IndexedSeq[T] {
    def apply(i: Int) = me(i)
    def length = me.size
  }
}
Now you can use this easily directly, and you can jump out to IndexedSeq when needed:
val r = new RingBuffer[Int](4)
r :+ 7 :+ 9 :+ 2
r.seqView.mkString(" ") // Prints 7 9 2
r.popt // Returns 2
r.poph // Returns 7
r :+ 6 :+ 5 :+ 4 :+ 3
r.seqView.mkString(" ") // Prints 6 5 4 3 -- 7 fell off the end
0 +: 1 +: 2 +: r
r.seqView.mkString(" ") // Prints 0 1 2 6 -- added to front; 3,4,5 fell off
r.seqView.filter(_>1) // Vector(2,6)
and if you want to put things back into a ring buffer, you can
class RingBufferImplicit[T: ClassManifest](ts: Traversable[T]) {
  def ring(maxsize: Int) = {
    val rb = new RingBuffer[T](maxsize)
    ts.foreach(rb :+ _)
    rb
  }
}

implicit def traversable2ringbuffer[T: ClassManifest](ts: Traversable[T]) = {
  new RingBufferImplicit(ts)
}
and then you can do things like
val rr = List(1,2,3,4,5).ring(4)
rr.seqView.mkString(" ") // Prints 2,3,4,5