Shrink method in array stack (Scala)

I am trying to implement a resize function in Scala 3 that shrinks and grows an array stack depending on its number of elements and the array's size. Here is my code:
package adt

import scala.reflect.ClassTag

class ArrayStack[A: ClassTag]:
  private var dataArray: Array[A] = Array.fill(10)(null.asInstanceOf[A])
  private var top: Int = 0
  private var sz: Int = 10

  def push(elem: A): Unit =
    if top == sz then resize(grow = true)
    dataArray(top) = elem
    top += 1

  def pop(): A =
    if top == (sz / 2) - 1 then resize(grow = false)
    top -= 1
    dataArray(top)

  def peek(): A =
    dataArray(top - 1)

  def isEmpty(): Boolean =
    top == 0

  def resize(grow: Boolean): Unit =
    val newSize = if grow then sz * 2 else sz / 2
    val newArray: Array[A] = Array.fill(newSize)(null.asInstanceOf[A])
    for (a <- 0 until dataArray.length) do newArray(a) = dataArray(a)
    dataArray = newArray
    sz = newSize
However, in my JUnit tests, I get an array index out of bounds exception.
@Test def pushMultiple(): Unit =
  val stack = new ArrayStack[Int]
  val pushArr = Array.tabulate(100)(i => Math.round(i * 100))
  pushArr.foreach(stack.push(_))
  for (n <- pushArr.reverse) do assertEquals(n, stack.pop())

@Test def popMultiple(): Unit =
  val stack = new ArrayStack[Int]
  val pushArr = Array.tabulate(100)(i => Math.round(i * 100))
  pushArr.foreach(stack.push(_))
  for (i <- 1 to 1000) do stack.pop()
  assertTrue(stack.isEmpty())
Error message:
Test arraystack_test.pushMultiple failed: java.lang.ArrayIndexOutOfBoundsException: Index 80 out of bounds for length 80, took 0.032 sec
    at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:75)
    at adt.ArrayStack.resize$$anonfun$1(ArrayStack.scala:30)
    at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.scala:18)
    at scala.collection.immutable.Range.foreach(Range.scala:190)
    at adt.ArrayStack.resize(ArrayStack.scala:30)
    at adt.ArrayStack.pop(ArrayStack.scala:17)
    at arraystack_test.pushMultiple$$anonfun$2(arraystack_test.scala:26)
    at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.scala:18)
    at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1324)
    at arraystack_test.pushMultiple(arraystack_test.scala:26)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:566)
    ...
Test arraystack_test.popMultiple failed: java.lang.ArrayIndexOutOfBoundsException: Index 80 out of bounds for length 80, took 0.0 sec
    at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:75)
    at adt.ArrayStack.resize$$anonfun$1(ArrayStack.scala:30)
    at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.scala:18)
    at scala.collection.immutable.Range.foreach(Range.scala:190)
    at adt.ArrayStack.resize(ArrayStack.scala:30)
    at adt.ArrayStack.pop(ArrayStack.scala:17)
    at arraystack_test.popMultiple$$anonfun$2(arraystack_test.scala:32)
    at scala.runtime.java8.JFunction1$mcII$sp.apply(JFunction1$mcII$sp.scala:17)
    at scala.collection.immutable.Range.foreach(Range.scala:190)
    at arraystack_test.popMultiple(arraystack_test.scala:32)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:566)
Can anyone please help me figure out why my resize shrink function does not work?

Never mind, I figured it out.
In the resize method, this line causes the index out of bounds exception:
for (a <- 0 until dataArray.length) do newArray(a) = dataArray(a)
When the method is called to grow, it works fine. But when it is called to shrink, newArray is only half the length of dataArray, so copying every index of dataArray overruns newArray and throws the exception.
To fix this:
def resize(grow: Boolean): Unit =
  val newSize = if grow then sz * 2 else sz / 2
  val endRange: Int = if grow then dataArray.length else newSize
  val newArray: Array[A] = Array.fill(newSize)(null.asInstanceOf[A])
  for (a <- 0 until endRange) do newArray(a) = dataArray(a)
  dataArray = newArray
  sz = newSize
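As a side note (not part of the original answer), the copy can also be bounded by the smaller of the two arrays, which covers both growing and shrinking without a separate endRange; a minimal sketch using Array.copy:

def resize(grow: Boolean): Unit =
  val newSize = if grow then sz * 2 else sz / 2
  val newArray: Array[A] = Array.fill(newSize)(null.asInstanceOf[A])
  // Copy only as many elements as both arrays can hold.
  Array.copy(dataArray, 0, newArray, 0, math.min(dataArray.length, newSize))
  dataArray = newArray
  sz = newSize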

Related

How to find Sum at Each partition in Spark

I have created a class and used it to create an RDD. I want to calculate the sum of LoudnessRate (a member of the class) for each partition. This sum will later be used to calculate the mean LoudnessRate of each partition.
I have tried the following code, but it does not calculate the sum and returns 0.0.
My code is:
object sparkBAT {
  def main(args: Array[String]): Unit = {
    val numPartitions = 3
    val N = 50
    val d = 5
    val MinVal = -10
    val MaxVal = 10
    val conf = new SparkConf().setMaster(locally("local")).setAppName("spark Sum")
    val sc = new SparkContext(conf)
    val ba = List.fill(N)(new BAT(d, MinVal, MaxVal))
    val rdd = sc.parallelize(ba, numPartitions)
    var arrSum = Array.fill(numPartitions)(0.0) // Declare array that will hold the sum for each partition
    rdd.mapPartitionsWithIndex((k, iterator) => iterator.map(x => arrSum(k) += x.LoudnessRate)).collect()
    arrSum foreach println
  }
}
class BAT(dim: Int, min: Double, max: Double) extends Serializable {
  val random = new Random()
  var position: List[Double] = List.fill(dim)(random.nextDouble() * (max - min) + min)
  var velocity: List[Double] = List.fill(dim)(math.random)
  var PulseRate: Double = 0.1
  var LoudnessRate: Double = 0.95
  var frequency: Double = math.random
  var fitness: Double = math.random
  var BestPosition: List[Double] = List.fill(dim)(math.random)
  var BestFitness: Double = math.random
}
Changing my comment to an answer as requested. Original comment:
You are modifying arrSum in executor JVMs and printing its values in the driver JVM. You can map the iterators to singleton iterators and use collect to move the values to the driver. Also, don't use iterator.map for side effects; iterator.foreach is meant for that.
And here is a sample snippet of how to do it. First, create an RDD with two partitions, 0 -> 1,2,3 and 1 -> 4,5. Naturally you would not need this in actual code, but since the sc.parallelize behaviour changes depending on the environment, this will always create uniform RDDs for reproduction:
object DemoPartitioner extends Partitioner {
  override def numPartitions: Int = 2
  override def getPartition(key: Any): Int = key match {
    case num: Int => num
  }
}

val rdd = sc
  .parallelize(Seq((0, 1), (0, 2), (0, 3), (1, 4), (1, 5)))
  .partitionBy(DemoPartitioner)
  .map(_._2)
And then the actual trick:
val sumsByPartition = rdd.mapPartitionsWithIndex {
  case (partitionNum, it) => Iterator.single(partitionNum -> it.sum)
}.collect().toMap
println(sumsByPartition)
Outputs:
Map(0 -> 6, 1 -> 9)
The problem is that you're using arrSum (a regular collection), which is declared in your driver and updated in the executors. Whenever you do that, you need to use accumulators.
This should help
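For illustration only (my sketch, not the answerer's code), here is one way the accumulator approach could look, assuming the rdd and numPartitions from the question and a Spark version with the AccumulatorV2 API (sc.doubleAccumulator):

import org.apache.spark.TaskContext

// One named accumulator per partition; executors add to them, and the driver
// reads the merged values after the action has completed.
val partitionSums = Array.tabulate(numPartitions)(i => sc.doubleAccumulator(s"partitionSum$i"))

rdd.foreachPartition { it =>
  val idx = TaskContext.getPartitionId()
  it.foreach(bat => partitionSums(idx).add(bat.LoudnessRate))
}

partitionSums.map(_.value) foreach println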

Scala nested for loop

I'm new to Scala and am using the Processing library in Scala. I have two questions here:
val grid: Array[Cell] = Array()
val w = 60
val rows = height / w
val cols = width / w

override def setup(): Unit = {
  for (j <- 0 until rows;
       i <- 0 until cols) {
    val cell = new Cell(i, j)
    grid :+ cell
    println(s"at row : $j, at col: $i") // runs only once (at row : 0, at col: 0)
  }
}

override def draw(): Unit = {
  background(0)
  grid.foreach(cell => cell.display()) // nothing happens
}
But if I replace the variables rows and cols with height/w and width/w in the nested loop, as follows:
for (j <- 0 until height / w;
     i <- 0 until width / w) {
  val cell = new Cell(i, j)
  grid :+ cell
  println(s"at row : $j, at col: $i") // runs as an ordinary nested for loop
}
The second question is about the class Cell:
class Cell(i: Int, j: Int) {
  def display(): Unit = {
    val x = this.i * w
    val y = this.j * w
    println("it works") // doesn't work
    // creating a square
    stroke(255)
    line(x, y, x + w, y)         // top
    line(x + w, y, x + x, y + w) // right
    line(x, y + w, x + w, y + w) // bottom
    line(x, y, x, y + w)         // left
  }
}
The display method doesn't work when called from draw(), but no errors show up.
Use tabulate to initialise your Array:
val grid = Array.tabulate(rows * cols) { i => new Cell(i % cols, i / cols) }
If you still have a problem with the display function then please post it as a separate question.
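A side note (not part of the original answer): the likely reason grid stays empty is that :+ on an Array returns a new array rather than mutating the receiver, so grid :+ cell builds a copy and immediately discards it, and display is never called on anything. A tiny illustration:

val grid: Array[Int] = Array()
grid :+ 1             // returns a new Array(1), which is thrown away
println(grid.length)  // 0 -- grid itself is unchanged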

Scala: Haar Wavelet Transform

I am trying to implement the Haar Wavelet Transform in Scala. I am using this Python code for reference: GitHub link to a Python implementation of HWT.
I am also including my Scala version here. I am new to Scala, so forgive the not-so-good code.
/**
 * Created by vipul vaibhaw on 1/11/2017.
 */
import scala.collection.mutable.{ListBuffer, MutableList, ArrayBuffer}

object HaarWavelet {
  def main(args: Array[String]): Unit = {
    var samples = ListBuffer(
      ListBuffer(1, 4),
      ListBuffer(6, 1),
      ListBuffer(0, 2, 4, 6, 7, 7, 7, 7),
      ListBuffer(1, 2, 3, 4),
      ListBuffer(7, 5, 1, 6, 3, 0, 2, 4),
      ListBuffer(3, 2, 3, 7, 5, 5, 1, 1, 0, 2, 5, 1, 2, 0, 1, 2, 0, 2, 1, 0, 0, 2, 1, 2, 0, 2, 1, 0, 0, 2, 1, 2)
    )
    for (i <- 0 to samples.length) {
      var ubound = samples(i).max + 1
      var length = samples(i).length
      var deltas1 = encode(samples(i), ubound)
      var deltas = deltas1._1
      var avg = deltas1._2
      println("Input: %s, boundary = %s, length = %s" format(samples(i), ubound, length))
      println("Haar output:%s, average = %s" format(deltas, avg))
      println("Decoded: %s" format(decode(deltas, avg, ubound)))
    }
  }

  def wrap(value: Int, ubound: Int): Int = {
    (value + ubound) % ubound
  }

  def encode(lst1: ListBuffer[Int], ubound: Int): (ListBuffer[Int], Int) = {
    //var lst = ListBuffer[Int]()
    //lst1.foreach(x=>lst+=x)
    var lst = lst1
    var deltas = new ListBuffer[Int]()
    var avg = 0
    while (lst.length >= 2) {
      var avgs = new ListBuffer[Int]()
      while (lst.nonEmpty) {
        // getting first two elements from the list and removing them
        val a = lst.head
        lst -= 1 // removing index 0 element from the list
        val b = lst.head
        lst -= 1 // removing index 0 element from the list
        if (a <= b) {
          avg = (a + b) / 2
        } else {
          avg = (a + b + ubound) / 2
        }
        var delta = wrap(b - a, ubound)
        avgs += avg
        deltas += delta
      }
      lst = avgs
    }
    (deltas, avg % ubound)
  }

  def decode(deltas: ListBuffer[Int], avg: Int, ubound: Int): ListBuffer[Int] = {
    var avgs = ListBuffer[Int](avg)
    var l = 1
    while (deltas.nonEmpty) {
      for (i <- 0 to l) {
        val delta = deltas.last
        deltas -= -1
        val avg = avgs.last
        avgs -= -1
        val a = wrap(math.ceil(avg - delta / 2.0).toInt, ubound)
        val b = wrap(math.ceil(avg + delta / 2.0).toInt, ubound)
      }
      l *= 2
    }
    avgs
  }

  def is_pow2(n: Int): Boolean = {
    (n & -n) == n
  }
}
But the code gets stuck at "var deltas1 = encode(samples(i), ubound)" and doesn't give any output. How can I improve my implementation? Thanks in advance!
Your error is on this line:
lst -= 1 // removing index 0 element from the list.
This doesn't remove index 0 from the list. It removes the element 1 (if it exists). This means that the list never becomes empty. The while-loop while (lst.nonEmpty) will therefore never terminate.
To remove the first element of the list, simply use lst.remove(0).
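To make the difference concrete, here is a small sketch (mine, not from the original answer) of the two operations on a ListBuffer:

import scala.collection.mutable.ListBuffer

val lst = ListBuffer(3, 1, 2)
lst -= 1       // removes the element equal to 1 (if present) -> ListBuffer(3, 2)
lst.remove(0)  // removes and returns the element at index 0  -> 3, leaving ListBuffer(2)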

Use Spark to run KMeans clustering, program blocks?

When I use the Apache Spark Scala API to run KMeans clustering, my program is as follows:
object KMeans {
  def closestPoint(p: Vector, centers: Array[Vector]) = {
    var index = 0
    var bestIndex = 0
    var closest = Double.PositiveInfinity
    for (i <- 0 until centers.length) {
      var tempDist = p.squaredDist(centers(i))
      if (tempDist < closest) {
        closest = tempDist
        bestIndex = i
      }
    }
    bestIndex
  }

  def parseVector(line: String): Vector = {
    new Vector(line.split("\\s+").map(s => s.toDouble))
  }

  def main(args: Array[String]): Unit = {
    System.setProperty("hadoop.home.dir", "F:/OpenSoft/hadoop-2.2.0")
    val sc = new SparkContext("local", "kmeans cluster",
      "G:/spark-0.9.0-incubating-bin-hadoop2",
      SparkContext.jarOfClass(this.getClass()))
    val lines = sc.textFile("G:/testData/synthetic_control.data.txt") // RDD[String]
    val count = lines.count
    val data = lines.map(parseVector _) // RDD[Vector]
    data.foreach(println)
    val K = 6
    val convergeDist = 0.1
    val kPoint = data.takeSample(withReplacement = false, K, 42) // Array[Vector]
    kPoint.foreach(println)
    var tempDist = 1.0
    while (tempDist > convergeDist) {
      val closest = data.map(p => (closestPoint(p, kPoint), (p, 1)))
      val pointStat = closest.reduceByKey { case ((x1, y1), (x2, y2)) =>
        (x1 + x2, y1 + y2)
      }
      val newKPoint = pointStat.map { pair =>
        (pair._1, pair._2._1 / pair._2._2)
      }.collectAsMap()
      tempDist = 0.0
      for (i <- 0 until K) {
        tempDist += kPoint(i).squaredDist(newKPoint(i))
      }
      for (newP <- newKPoint) {
        kPoint(newP._1) = newP._2
      }
      println("Finish iteration (delta=" + tempDist + ")")
    }
    println("Finish centers: ")
    kPoint.foreach(println)
    System.exit(0)
  }
}
When I run it in local mode, the log info is as follows:
..................
14/03/31 11:29:15 INFO HadoopRDD: Input split: hdfs://hadoop-01:9000/data/synthetic_control.data:0+288374
The program then blocks and does not continue running.
Can anyone help me?

scala priority queue not ordering properly?

I'm seeing some strange behavior with Scala's collection.mutable.PriorityQueue. I'm performing an external sort and testing it with 1M records. Each time I run the test and verify the results, between 10 and 20 records are not sorted properly. If I replace the Scala PriorityQueue implementation with a java.util.PriorityQueue, it works 100% of the time. Any ideas?
Here's the code (sorry it's a bit long...). I test it using the tools gensort -a 1000000 and valsort from http://sortbenchmark.org/
def externalSort(inFileName: String, outFileName: String)
                (implicit ord: Ordering[String]): Int = {
  val MaxTempFiles = 1024
  val TempBufferSize = 4096
  val inFile = new java.io.File(inFileName)

  /** Partitions input file and sorts each partition */
  def partitionAndSort()(implicit ord: Ordering[String]): List[java.io.File] = {

    /** Gets block size to use */
    def getBlockSize: Long = {
      var blockSize = inFile.length / MaxTempFiles
      val freeMem = Runtime.getRuntime().freeMemory()
      if (blockSize < freeMem / 2)
        blockSize = freeMem / 2
      else if (blockSize >= freeMem)
        System.err.println("Not enough free memory to use external sort.")
      blockSize
    }

    /** Sorts and writes data to temp files */
    def writeSorted(buf: List[String]): java.io.File = {
      // Create new temp buffer
      val tmp = java.io.File.createTempFile("external", "sort")
      tmp.deleteOnExit()
      // Sort buffer and write it out to tmp file
      val out = new java.io.PrintWriter(tmp)
      try {
        for (l <- buf.sorted) {
          out.println(l)
        }
      } finally {
        out.close()
      }
      tmp
    }

    val blockSize = getBlockSize
    var tmpFiles = List[java.io.File]()
    var buf = List[String]()
    var currentSize = 0

    // Read input and divide into blocks
    for (line <- io.Source.fromFile(inFile).getLines()) {
      if (currentSize > blockSize) {
        tmpFiles ::= writeSorted(buf)
        buf = List[String]()
        currentSize = 0
      }
      buf ::= line
      currentSize += line.length() * 2 // 2 bytes per char
    }
    if (currentSize > 0) tmpFiles ::= writeSorted(buf)
    tmpFiles
  }

  /** Merges results of sorted partitions into one output file */
  def mergeSortedFiles(fs: List[java.io.File])
                      (implicit ord: Ordering[String]): Int = {

    /** Temp file buffer for reading lines */
    class TempFileBuffer(val file: java.io.File) {
      private val in = new java.io.BufferedReader(
        new java.io.FileReader(file), TempBufferSize)
      private var curLine: String = ""
      readNextLine() // prep first value

      def currentLine = curLine
      def isEmpty = curLine == null

      def readNextLine() {
        if (curLine == null) return
        try {
          curLine = in.readLine()
        } catch {
          case _: java.io.EOFException => curLine = null
        }
        if (curLine == null) in.close()
      }

      override protected def finalize() {
        try {
          in.close()
        } finally {
          super.finalize()
        }
      }
    }

    val wrappedOrd = new Ordering[TempFileBuffer] {
      def compare(o1: TempFileBuffer, o2: TempFileBuffer): Int = {
        ord.compare(o1.currentLine, o2.currentLine)
      }
    }

    val pq = new collection.mutable.PriorityQueue[TempFileBuffer]()(wrappedOrd)

    // Init queue with item from each file
    for (tmp <- fs) {
      val buf = new TempFileBuffer(tmp)
      if (!buf.isEmpty) pq += buf
    }

    var count = 0
    val out = new java.io.PrintWriter(new java.io.File(outFileName))
    try {
      // Read each value off of queue
      while (pq.size > 0) {
        val buf = pq.dequeue()
        out.println(buf.currentLine)
        count += 1
        buf.readNextLine()
        if (buf.isEmpty) {
          buf.file.delete() // don't need anymore
        } else {
          // re-add to priority queue so we can process next line
          pq += buf
        }
      }
    } finally {
      out.close()
    }
    count
  }

  mergeSortedFiles(partitionAndSort())
}
My tests don't show any bugs in PriorityQueue.
import scala.collection.mutable.PriorityQueue
import org.scalacheck._
import Prop._

object PriorityQueueProperties extends Properties("PriorityQueue") {
  def listToPQ(l: List[String]): PriorityQueue[String] = {
    val pq = new PriorityQueue[String]
    l foreach (pq +=)
    pq
  }

  def pqToList(pq: PriorityQueue[String]): List[String] =
    if (pq.isEmpty) Nil
    else { val h = pq.dequeue; h :: pqToList(pq) }

  property("Enqueued elements are dequeued in reverse order") =
    forAll { (l: List[String]) => l.sorted == pqToList(listToPQ(l)).reverse }

  property("Adding/removing elements doesn't break sorting") =
    forAll { (l: List[String], s: String) =>
      (l.size > 0) ==>
        ((s :: l.sorted.init).sorted == {
          val pq = listToPQ(l)
          pq.dequeue
          pq += s
          pqToList(pq).reverse
        })
    }
}
scala> PriorityQueueProperties.check
+ PriorityQueue.Enqueued elements are dequeued in reverse order: OK, passed 100 tests.
+ PriorityQueue.Adding/removing elements doesn't break sorting: OK, passed 100 tests.
If you could somehow reduce the input enough to make a test case, it would help.
I ran it with five million inputs several times, and the output always matched the expected result. My guess from looking at your code is that your Ordering is the problem (i.e. it's giving inconsistent answers).
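To illustrate that point (my sketch, not part of the original answer): if compare does not define a consistent total order, the heap invariant cannot be maintained and the dequeue order becomes arbitrary. A deliberately inconsistent Ordering reproduces the same kind of symptom:

import scala.collection.mutable.PriorityQueue
import scala.util.Random

// Hypothetical ordering that answers randomly -- inconsistent by construction.
val badOrd: Ordering[Int] = Ordering.fromLessThan((_, _) => Random.nextBoolean())

val pq = PriorityQueue(1, 5, 3, 9, 7, 2, 8)(badOrd)
val drained = List.fill(7)(pq.dequeue())
println(drained) // usually not in any sorted order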