scala collections circular buffer - scala

Just messing about here, with circular buffers. Is this a sensible implementation or is there a faster/more reliable way to skin this cat?
class CircularBuffer[T](size: Int)(implicit mf: Manifest[T]) {
private val arr = new scala.collection.mutable.ArrayBuffer[T]()
private var cursor = 0
val monitor = new ReentrantReadWriteLock()
def push(value: T) {
monitor.writeLock().lock()
try {
arr(cursor) = value
cursor += 1
cursor %= size
} finally {
monitor.writeLock().unlock()
}
}
def getAll: Array[T] = {
monitor.readLock().lock()
try {
val copy = new Array[T](size)
arr.copyToArray(copy)
copy
} finally {
monitor.readLock().unlock()
}
}
}

Creation
A type declaration to an existing scala collection class and an appender function is easier to implement than "rolling your own". As noted in the comments this implementation will likely not be as performant as a "true" circular buffer, but it gets the job done with very little coding:
import scala.collection.immutable
type CircularBuffer[T] = immutable.Vector[T]
def emptyCircularBuffer[T] : CircularBuffer[T] = immutable.Vector.empty[T]
def addToCircularBuffer[T](maxSize : Int)(buffer : CircularBuffer[T], item : T) : CircularBuffer[T] =
(buffer :+ item) takeRight maxSize
This means that your "CircularBuffer" is actually a Vector and you now get all of the corresponding Vector methods (filter, map, flatMap, etc...) for free:
var intCircularBuffer = emptyCircularBuffer[Int]
//Vector(41)
intCircularBuffer = addToCircularBuffer(2)(intCircularBuffer, 41)
//Vector(41, 42)
intCircularBuffer = addToCircularBuffer(2)(intCircularBuffer, 42)
//Vector(42, 43)
intCircularBuffer = addToCircularBuffer(2)(intCircularBuffer, 43)
//Vector(42)
val evens : CircularBuffer[Int] = intCircularBuffer filter ( _ % 2 == 0)
Indexing
You can similarly add a function for circular indexing:
def circularIndex[T](buffer : CircularBuffer[T])(index : Int) : T =
buffer(index % buffer.size)

Related

Distinct elements in priority queue

I am using this implementation of a bounded priority queue, https://gist.github.com/ryanlecompte/5746241, and it works perfectly. However, I want this queue to not contain any element with the same ordering 'ord'. How can I achieve this?
I tried to update the maybeReplaceLowest function by giving the function lteq instead of lt.
private def maybeReplaceLowest(a: A) {
if (ord.lteq(a, head)) {
dequeue()
super.+=(a)
}
}
But I think it does not work because the element which has the same ord as the new element might not be at the head. What could be a quick workaround for this problem?
Many thanks.
Scala PriorityQueue is implemented using Array. Which is very inefficient for lookup operation, which you need to do for every insert (to check if the element already exists in the queue.
For TreeSet is ideal for lookup and ordered storage.
But since this is a sealed class (scala-2.12.8) I created composite class
import scala.collection.mutable
class PrioritySet[A](val maxSize:Int)(implicit ordering:Ordering[A]) {
var set = mutable.TreeSet[A]()(ordering)
private def removeAdditional(): this.type = {
val additionalElements = set.size - maxSize
if( additionalElements > 0) { set = set.drop(additionalElements) }
this
}
def +(elem: A): this.type = {
set += elem
removeAdditional()
}
def ++=(elem: A, elements: A*) : this.type = {
set += elem ++= elements
removeAdditional()
}
override def toString = this.getClass.getName + set.mkString("(", ", ", ")")
}
This can be used as
>>> val set = new PrioritySet[Int](4)
>>> set ++=(1000,3,42,2,5, 1,2,4, 100)
>>> println(set)
PrioritySet(5, 42, 100, 1000)

How to aggregateByKey with custom class for frequency distribution?

I am trying to create a frequency distribution.
My data is in the following pattern (ColumnIndex, (Value, countOfValue)) of type (Int, (Any, Long)). For instance, (1, (A, 10)) means for column index 1, there are 10 A's.
My goal is to get the top 100 values for all my index's or Keys.
Right away I can make it less compute intensive for my workload by doing an initial filter:
val freqNumDist = numRDD.filter(x => x._2._2 > 1)
Now I found an interesting example of a class, here which seems to fit my use case:
class TopNList (val maxSize:Int) extends Serializable {
val topNCountsForColumnArray = new mutable.ArrayBuffer[(Any, Long)]
var lowestColumnCountIndex:Int = -1
var lowestValue = Long.MaxValue
def add(newValue:Any, newCount:Long): Unit = {
if (topNCountsForColumnArray.length < maxSize -1) {
topNCountsForColumnArray += ((newValue, newCount))
} else if (topNCountsForColumnArray.length == maxSize) {
updateLowestValue
} else {
if (newCount > lowestValue) {
topNCountsForColumnArray.insert(lowestColumnCountIndex, (newValue, newCount))
updateLowestValue
}
}
}
def updateLowestValue: Unit = {
var index = 0
topNCountsForColumnArray.foreach{ r =>
if (r._2 < lowestValue) {
lowestValue = r._2
lowestColumnCountIndex = index
}
index+=1
}
}
}
So Now What I was thinking was putting together an aggregateByKey to use this class in order to get my top 100 values! The problem is that I am unsure of how to use this class in aggregateByKey in order to accomplish this goal.
val initFreq:TopNList = new TopNList(100)
def freqSeq(u: (TopNList), v:(Double, Long)) = (
u.add(v._1, v._2)
)
def freqComb(u1: TopNList, u2: TopNList) = (
u2.topNCountsForColumnArray.foreach(r => u1.add(r._1, r._2))
)
val freqNumDist = numRDD.filter(x => x._2._2 > 1).aggregateByKey(initFreq)(freqSeq, freqComb)
The obvious problem is that nothing is returned by the functions I am using. So I am wondering how to modify this class or do I need to think about this in a whole new light and just cherry pick some of the functions out of this class and add them to the functions I am using for the aggregateByKey?
I'm either thinking about classes wrong or the entire aggregateByKey or both!
Your projections implementations (freqSeq, freqComb) return Unit while you expect them to return TopNList
If intentially keep the style of your solution, the relevant impl should be
def freqSeq(u: TopNList, v:(Any, Long)) : TopNList = {
u.add(v._1, v._2) // operation gives void result (Unit)
u // this one of TopNList type
}
def freqComb(u1: TopNList, u2: TopNList) : TopNList = {
u2.topNCountsForColumnArray.foreach (r => u1.add (r._1, r._2) )
u1
}
Just take a look on aggregateByKey signature of PairRDDFunctions, what does it expect for
def aggregateByKey[U](zeroValue : U)(seqOp : scala.Function2[U, V, U], combOp : scala.Function2[U, U, U])(implicit evidence$3 : scala.reflect.ClassTag[U]) : org.apache.spark.rdd.RDD[scala.Tuple2[K, U]] = { /* compiled code */ }

IndexedSeq-based equivalent of Stream?

I have a lazily-calculated sequence of objects, where the lazy calculation depends only on the index (not the previous items) and some constant parameters (p:Bar below). I'm currently using a Stream, however computing the stream.init is typically wasteful.
However, I really like that using Stream[Foo] = ... gets me out of implementing a cache, and has very light declaration syntax while still providing all the sugar (like stream(n) gets element n). Then again, I could just be using the wrong declaration:
class FooSrcCache(p:Bar) {
val src : Stream[FooSrc] = {
def error() : FooSrc = FooSrc(0,p)
def loop(i: Int): Stream[FooSrc] = {
FooSrc(i,p) #:: loop(i + 1)
}
error() #:: loop(1)
}
def apply(max: Int) = src(max)
}
Is there a Stream-comparable base Scala class, that is indexed instead of linear?
PagedSeq should do the job for you:
class FooSrcCache(p:Bar) {
private def fill(buf: Array[FooSrc], start: Int, end: Int) = {
for (i <- start until end) {
buf(i) = FooSrc(i,p)
}
end - start
}
val src = new PagedSeq[FooSrc](fill _)
def apply(max: Int) = src(max)
}
Note that this might calculate FooSrc with higher indices than you requested.

How to add 'Array[Ordered[Any]]' as a method parameter

Below is an implementation of Selection sort written in Scala.
The line ss.sort(arr) causes this error :
type mismatch; found : Array[String] required: Array[Ordered[Any]]
Since the type Ordered is inherited by StringOps should this type not be inferred ?
How can I add the array of Strings to sort() method ?
Here is the complete code :
object SelectionSortTest {
def main(args: Array[String]){
val arr = Array("Hello","World")
val ss = new SelectionSort()
ss.sort(arr)
}
}
class SelectionSort {
def sort(a : Array[Ordered[Any]]) = {
var N = a.length
for (i <- 0 until N) {
var min = i
for(j <- i + 1 until N){
if( less(a(j) , a(min))){
min = j
}
exchange(a , i , min)
}
}
}
def less(v : Ordered[Any] , w : Ordered[Any]) = {
v.compareTo(w) < 0
}
def exchange(a : Array[Ordered[Any]] , i : Integer , j : Integer) = {
var swap : Ordered[Any] = a(i)
a(i) = a(j)
a(j) = swap
}
}
Array is invariant. You cannot use an Array[A] as an Array[B] even if A is subtype of B. See here why: Why are Arrays invariant, but Lists covariant?
Neither is Ordered, so your implementation of less will not work either.
You should make your implementation generic the following way:
object SelectionSortTest {
def main(args: Array[String]){
val arr = Array("Hello","World")
val ss = new SelectionSort()
ss.sort(arr)
}
}
class SelectionSort {
def sort[T <% Ordered[T]](a : Array[T]) = {
var N = a.length
for (i <- 0 until N) {
var min = i
for(j <- i + 1 until N){
if(a(j) < a(min)){ // call less directly on Ordered[T]
min = j
}
exchange(a , i , min)
}
}
}
def exchange[T](a : Array[T] , i : Integer , j : Integer) = {
var swap = a(i)
a(i) = a(j)
a(j) = swap
}
}
The somewhat bizarre statement T <% Ordered[T] means "any type T that can be implicitly converted to Ordered[T]". This ensures that you can still use the less-than operator.
See this for details:
What are Scala context and view bounds?
The answer by #gzm0 (with some very nice links) suggests Ordered. I'm going to complement with an answer covering Ordering, which provides equivalent functionality without imposing on your classes as much.
Let's adjust the sort method to accept an array of type 'T' for which an Ordering implicit instance is defined.
def sort[T : Ordering](a: Array[T]) = {
val ord = implicitly[Ordering[T]]
import ord._ // now comparison operations such as '<' are available for 'T'
// ...
if (a(j) < a(min))
// ...
}
The [T : Ordering] and implicitly[Ordering[T]] combo is equivalent to an implicit parameter of type Ordering[T]:
def sort[T](a: Array[T])(implicit ord: Ordering[T]) = {
import ord._
// ...
}
Why is this useful?
Imagine you are provided with a case class Account(balance: Int) by some third party. You can now add an Ordering for it like so:
// somewhere in scope
implicit val accountOrdering = new Ordering[Account] {
def compare(x: Account, y: Account) = x.balance - y.balance
}
// or, more simply
implicit val accountOrdering: Ordering[Account] = Ordering by (_.balance)
As long as that instance is in scope, you should be able to use sort(accounts).
If you want to use some different ordering, you can also provide it explicitly, like so: sort(accounts)(otherOrdering).
Note that this isn't very different from providing an implicit conversion to Ordering (at least not within the context of this question).
Even though, when coding Scala, I'm used to prefer functional programming style (via combinators or recursion) over imperative style (via variables and iterations), THIS TIME, for this specific problem, old school imperative nested loops result in simpler and performant code.
I don't think falling back to imperative style is a mistake for certain classes of problems, such as sorting algorithms which usually transform the input buffer (more like a procedure) rather than resulting to a new sorted collection.
Here it is my solution:
package bitspoke.algo
import scala.math.Ordered
import scala.collection.mutable.Buffer
abstract class Sorter[T <% Ordered[T]] {
// algorithm provided by subclasses
def sort(buffer : Buffer[T]) : Unit
// check if the buffer is sorted
def sorted(buffer : Buffer[T]) = buffer.isEmpty || buffer.view.zip(buffer.tail).forall { t => t._2 > t._1 }
// swap elements in buffer
def swap(buffer : Buffer[T], i:Int, j:Int) {
val temp = buffer(i)
buffer(i) = buffer(j)
buffer(j) = temp
}
}
class SelectionSorter[T <% Ordered[T]] extends Sorter[T] {
def sort(buffer : Buffer[T]) : Unit = {
for (i <- 0 until buffer.length) {
var min = i
for (j <- i until buffer.length) {
if (buffer(j) < buffer(min))
min = j
}
swap(buffer, i, min)
}
}
}
As you can see, to achieve parametric polymorphism, rather than using java.lang.Comparable, I preferred scala.math.Ordered and Scala View Bounds rather than Upper Bounds. That's certainly works thanks to Scala Implicit Conversions of primitive types to Rich Wrappers.
You can write a client program as follows:
import bitspoke.algo._
import scala.collection.mutable._
val sorter = new SelectionSorter[Int]
val buffer = ArrayBuffer(3, 0, 4, 2, 1)
sorter.sort(buffer)
assert(sorter.sorted(buffer))

Selection Sort Generic type implementation

I worked my way implementing a recursive version of selection and quick sort,i am trying to modify the code in a way that it can sort a list of any generic type , i want to assume that the generic type supplied can be converted to Comparable at runtime.
Does anyone have a link ,code or tutorial on how to do this please
I am trying to modify this particular code
'def main (args:Array[String]){
val l = List(2,4,5,6,8)
print(quickSort(l))
}
def quickSort(x:List[Int]):List[Int]={
x match{
case xh::xt =>
{
val (first,pivot,second) = partition(x)
quickSort (first):::(pivot :: quickSort(second))
}
case Nil => {x}
}
}
def partition (x:List[Int])=
{
val pivot =x.head
var first:List[Int]=List ()
var second : List[Int]=List ()
val fun=(i:Int)=> {
if (i<pivot)
first=i::first
else
second=i::second
}
x.tail.foreach(fun)
(first,pivot,second)
}
enter code here
def main (args:Array[String]){
val l = List(2,4,5,6,8)
print(quickSort(l))
}
def quickSort(x:List[Int]):List[Int]={
x match{
case xh::xt =>
{
val (first,pivot,second) = partition(x)
quickSort (first):::(pivot :: quickSort(second))
}
case Nil => {x}
}
}
def partition (x:List[Int])=
{
val pivot =x.head
var first:List[Int]=List ()
var second : List[Int]=List ()
val fun=(i:Int)=> {
if (i<pivot)
first=i::first
else
second=i::second
}
x.tail.foreach(fun)
(first,pivot,second)
} '
Language: SCALA
In Scala, Java Comparator is replaced by Ordering (quite similar but comes with more useful methods). They are implemented for several types (primitives, strings, bigDecimals, etc.) and you can provide your own implementations.
You can then use scala implicit to ask the compiler to pick the correct one for you:
def sort[A]( lst: List[A] )( implicit ord: Ordering[A] ) = {
...
}
If you are using a predefined ordering, just call:
sort( myLst )
and the compiler will infer the second argument. If you want to declare your own ordering, use the keyword implicit in the declaration. For instance:
implicit val fooOrdering = new Ordering[Foo] {
def compare( f1: Foo, f2: Foo ) = {...}
}
and it will be implicitly use if you try to sort a List of Foo.
If you have several implementations for the same type, you can also explicitly pass the correct ordering object:
sort( myFooLst )( fooOrdering )
More info in this post.
For Quicksort, I'll modify an example from the "Scala By Example" book to make it more generic.
class Quicksort[A <% Ordered[A]] {
def sort(a:ArraySeq[A]): ArraySeq[A] =
if (a.length < 2) a
else {
val pivot = a(a.length / 2)
sort (a filter (pivot >)) ++ (a filter (pivot == )) ++
sort (a filter(pivot <))
}
}
Test with Int
scala> val quicksort = new Quicksort[Int]
quicksort: Quicksort[Int] = Quicksort#38ceb62f
scala> val a = ArraySeq(5, 3, 2, 2, 1, 1, 9, 39 ,219)
a: scala.collection.mutable.ArraySeq[Int] = ArraySeq(5, 3, 2, 2, 1, 1, 9, 39, 21
9)
scala> quicksort.sort(a).foreach(n=> (print(n), print (" " )))
1 1 2 2 3 5 9 39 219
Test with a custom class implementing Ordered
scala> case class Meh(x: Int, y:Int) extends Ordered[Meh] {
| def compare(that: Meh) = (x + y).compare(that.x + that.y)
| }
defined class Meh
scala> val q2 = new Quicksort[Meh]
q2: Quicksort[Meh] = Quicksort#7677ce29
scala> val a3 = ArraySeq(Meh(1,1), Meh(12,1), Meh(0,1), Meh(2,2))
a3: scala.collection.mutable.ArraySeq[Meh] = ArraySeq(Meh(1,1), Meh(12,1), Meh(0
,1), Meh(2,2))
scala> q2.sort(a3)
res7: scala.collection.mutable.ArraySeq[Meh] = ArraySeq(Meh(0,1), Meh(1,1), Meh(
2,2), Meh(12,1))
Even though, when coding Scala, I'm used to prefer functional programming style (via combinators or recursion) over imperative style (via variables and iterations), THIS TIME, for this specific problem, old school imperative nested loops result in simpler code for the reader. I don't think falling back to imperative style is a mistake for certain classes of problems (such as sorting algorithms which usually transform the input buffer (like a procedure) rather than resulting to a new sorted one
Here it is my solution:
package bitspoke.algo
import scala.math.Ordered
import scala.collection.mutable.Buffer
abstract class Sorter[T <% Ordered[T]] {
// algorithm provided by subclasses
def sort(buffer : Buffer[T]) : Unit
// check if the buffer is sorted
def sorted(buffer : Buffer[T]) = buffer.isEmpty || buffer.view.zip(buffer.tail).forall { t => t._2 > t._1 }
// swap elements in buffer
def swap(buffer : Buffer[T], i:Int, j:Int) {
val temp = buffer(i)
buffer(i) = buffer(j)
buffer(j) = temp
}
}
class SelectionSorter[T <% Ordered[T]] extends Sorter[T] {
def sort(buffer : Buffer[T]) : Unit = {
for (i <- 0 until buffer.length) {
var min = i
for (j <- i until buffer.length) {
if (buffer(j) < buffer(min))
min = j
}
swap(buffer, i, min)
}
}
}
As you can see, rather than using java.lang.Comparable, I preferred scala.math.Ordered and Scala View Bounds rather than Upper Bounds. That's certainly works thanks to many Scala Implicit Conversions of primitive types to Rich Wrappers.
You can write a client program as follows:
import bitspoke.algo._
import scala.collection.mutable._
val sorter = new SelectionSorter[Int]
val buffer = ArrayBuffer(3, 0, 4, 2, 1)
sorter.sort(buffer)
assert(sorter.sorted(buffer))