Converting List of lists to Stream - Functional Programming - scala

I am working in Scala to convert a list of lists into a list of custom "Point" objects:
class Point(val x: Int, val y: Int) {
var cX: Int = x
var cY: Int = y
}
Should I use foreach or map in this case?
def list_To_Point(_listOfPoints :List[List[String]]) : List[Point] = {
var elem =
lazy val _list: List[Point] = _listOfPoints.map(p=> new Point(p[0],p[1])
_list
}
I couldn't figure out where exactly the problem is.

def listToPoint(l:List[List[String]]):List[Point] =
l.collect({case x::y::Nil => new Point(x.toInt,y.toInt)})
But you really shouldn't use a List[String] to represent what is basically (Int,Int) …

Ugly and untested, but it should work (please consider making your structures immutable):
case class Point(x:Int,y:Int)
object Point {
def listToPoint(listOfPoints:List[List[String]]):List[Point] =
listOfPoints.map(p => new Point(p(0).toInt,p(1).toInt))
}
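Following the remark that a pair of Ints is the better representation, here is a minimal sketch that parses each inner list into an immutable `Point`, assuming malformed rows (wrong length or non-numeric text) should be skipped rather than throw, as `p(0).toInt` would. `PointParsing` and `listToPoints` are hypothetical names.

```scala
import scala.util.Try

// Immutable point, as suggested above.
case class Point(x: Int, y: Int)

object PointParsing {
  // Hypothetical helper: keep rows of exactly two integer strings, silently drop anything else.
  def listToPoints(rows: List[List[String]]): List[Point] =
    rows.flatMap {
      case List(xs, ys) =>
        // Try(...).toOption turns a NumberFormatException into None.
        (for {
          x <- Try(xs.trim.toInt)
          y <- Try(ys.trim.toInt)
        } yield Point(x, y)).toOption
      case _ => None
    }
}
```

If bad input should fail loudly instead, the plain `map` with `p(0).toInt` from the answer above is the simpler choice.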

Related

How to aggregateByKey with custom class for frequency distribution?

I am trying to create a frequency distribution.
My data is in the following pattern (ColumnIndex, (Value, countOfValue)) of type (Int, (Any, Long)). For instance, (1, (A, 10)) means for column index 1, there are 10 A's.
My goal is to get the top 100 values for each of my indexes (keys).
Right away, I can make this less compute-intensive for my workload with an initial filter:
val freqNumDist = numRDD.filter(x => x._2._2 > 1)
I found an interesting example of a class here, which seems to fit my use case:
class TopNList (val maxSize:Int) extends Serializable {
val topNCountsForColumnArray = new mutable.ArrayBuffer[(Any, Long)]
var lowestColumnCountIndex:Int = -1
var lowestValue = Long.MaxValue
def add(newValue:Any, newCount:Long): Unit = {
if (topNCountsForColumnArray.length < maxSize -1) {
topNCountsForColumnArray += ((newValue, newCount))
} else if (topNCountsForColumnArray.length == maxSize) {
updateLowestValue
} else {
if (newCount > lowestValue) {
topNCountsForColumnArray.insert(lowestColumnCountIndex, (newValue, newCount))
updateLowestValue
}
}
}
def updateLowestValue: Unit = {
var index = 0
topNCountsForColumnArray.foreach{ r =>
if (r._2 < lowestValue) {
lowestValue = r._2
lowestColumnCountIndex = index
}
index+=1
}
}
}
So now I was thinking of putting together an aggregateByKey that uses this class to get my top 100 values. The problem is that I am unsure how to use this class in aggregateByKey to accomplish this goal.
val initFreq:TopNList = new TopNList(100)
def freqSeq(u: (TopNList), v:(Double, Long)) = (
u.add(v._1, v._2)
)
def freqComb(u1: TopNList, u2: TopNList) = (
u2.topNCountsForColumnArray.foreach(r => u1.add(r._1, r._2))
)
val freqNumDist = numRDD.filter(x => x._2._2 > 1).aggregateByKey(initFreq)(freqSeq, freqComb)
The obvious problem is that nothing is returned by the functions I am using. So I am wondering whether to modify this class, or whether I need to think about this in a whole new light and just cherry-pick some of its functions into the functions I pass to aggregateByKey?
I'm either thinking about classes wrong or the entire aggregateByKey or both!
Your implementations of freqSeq and freqComb return Unit, while aggregateByKey expects them to return TopNList.
If you intentionally want to keep the style of your solution, the relevant implementations should be:
def freqSeq(u: TopNList, v:(Any, Long)) : TopNList = {
u.add(v._1, v._2) // add returns Unit, so it cannot be the result
u // return the updated accumulator, which has type TopNList
}
def freqComb(u1: TopNList, u2: TopNList) : TopNList = {
u2.topNCountsForColumnArray.foreach (r => u1.add (r._1, r._2) )
u1
}
Take a look at the signature of aggregateByKey in PairRDDFunctions to see what it expects for each argument:
def aggregateByKey[U: ClassTag](zeroValue: U)(seqOp: (U, V) => U, combOp: (U, U) => U): RDD[(K, U)]
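The `TopNList` logic in the question also looks off: `lowestValue` starts at `Long.MaxValue` and, once the buffer holds `maxSize - 1` items, no reachable branch updates it, so later, larger counts appear to be ignored. A minimal sketch of an alternative accumulator follows; `TopN`, its `add`/`merge` methods and `TopNDemo.topNByKey` are hypothetical names, not Spark API. Both methods return `this`, which is what `seqOp` and `combOp` need. Since Spark is not on the classpath here, `topNByKey` only imitates the per-key fold locally with `groupBy` and `foldLeft`.

```scala
import scala.collection.mutable

// Hypothetical accumulator: keeps at most maxSize (value, count) pairs with the largest counts.
class TopN(val maxSize: Int) extends Serializable {
  val items: mutable.ArrayBuffer[(Any, Long)] = mutable.ArrayBuffer.empty[(Any, Long)]

  // seqOp-style: fold one pair in and return the accumulator itself.
  def add(value: Any, count: Long): TopN = {
    if (items.length < maxSize) items += ((value, count))
    else if (items.nonEmpty) {
      // Find the current smallest count and replace it if the new count is larger.
      val minIdx = items.indices.minBy(i => items(i)._2)
      if (count > items(minIdx)._2) items(minIdx) = (value, count)
    }
    this
  }

  // combOp-style: merge another partial result into this one and return this.
  def merge(other: TopN): TopN = {
    other.items.foreach { case (v, c) => add(v, c) }
    this
  }
}

object TopNDemo {
  // Local stand-in for aggregateByKey: a fresh zero value per key, then seqOp over that key's pairs.
  def topNByKey(data: Seq[(Int, (Any, Long))], n: Int): Map[Int, List[(Any, Long)]] =
    data.groupBy(_._1).map { case (key, rows) =>
      val acc = rows.foldLeft(new TopN(n)) { case (t, (_, (v, c))) => t.add(v, c) }
      key -> acc.items.sortBy(-_._2).toList
    }
}
```

With Spark, the same methods would plug in roughly as `numRDD.filter(_._2._2 > 1).aggregateByKey(new TopN(100))((acc, v) => acc.add(v._1, v._2), (a, b) => a.merge(b))` (not exercised here). My understanding is that Spark serializes the zero value and deserializes a fresh copy per key, so a mutable zero value is safe there.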

Scala update value in a map while iterating

I'm trying to update the values in a map[String, WordCount] while iterating over it:
case class WordCount(name: String,
id: Int,
var count: Double,
var links: scala.collection.mutable.HashMap[String, Double],
var ent: Double) {
def withEnt(v: Double): WordCount = {println(s"v: $v"); copy(ent = v)}
}
var targets = mutable.HashMap[String, WordCount]()
The function calEnt is:
def calEnt(total: Double, links: scala.collection.mutable.HashMap[String, Double]): Double = {
var p: Double = 0.0
var ent: Double = 0.0
links.map(d => {
p = d._2 / total
ent -= p * Math.log(p)
})
return ent
}
And I'm using:
targets.map(m => m._2.withEnt(calEnt(m._2.count, m._2.links)))
to iterate over the map, calculate the new value of ent, and update it with withEnt. I can print the new value to the console, but it is not set inside the map. What is the right way to do this?
Use foreach:
targets.foreach(m => targets(m._1) = m._2.withEnt(calEnt(m._2.count, m._2.links)))
Self-contained example:
val m = scala.collection.mutable.HashMap[Int, Int](1 -> 1, 2 -> 2)
println(m)
m.foreach(p => m(p._1) = p._2 + 1)
println(m)
The map method won't modify your targets HashMap; it returns a new, modified HashMap instead. Try this:
targets = targets.map(m => (m._1, m._2.withEnt(calEnt(m._2.count, m._2.links))))
Note also that we map to pairs of keys (m._1) and modified values, not just to values.
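As a further sketch: `calEnt` can be written without `var`s by folding over the link weights (using `map` only for its side effects, as in the question, builds a throwaway collection), and the update loop can walk a snapshot (`toList`) so the map is never modified while it is being traversed. The `WordCount` here is a trimmed, hypothetical version of the question's class (immutable `links`, no `id`), and `EntropyUpdate` is a hypothetical name.

```scala
import scala.collection.mutable

// Trimmed stand-in for the question's WordCount.
case class WordCount(name: String, count: Double, links: Map[String, Double], ent: Double) {
  def withEnt(v: Double): WordCount = copy(ent = v)
}

object EntropyUpdate {
  // Same formula as calEnt (natural log), but as a pure fold instead of map + vars.
  def calEnt(total: Double, links: Map[String, Double]): Double =
    links.values.foldLeft(0.0) { (ent, w) =>
      val p = w / total
      ent - p * math.log(p)
    }

  // Iterate over a snapshot of the entries, then write each updated value back by key.
  def updateAll(targets: mutable.HashMap[String, WordCount]): Unit =
    for ((key, wc) <- targets.toList)
      targets(key) = wc.withEnt(calEnt(wc.count, wc.links))
}
```

As in the question, a link weight of 0 would produce `0 * log(0)`, which is NaN in floating point, so such entries may need filtering first.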

DRY when passing similar functions to Scala map()

def doubleList(noList:List[Int]) = {
val result = noList.map{ number =>
number*2
}
result
}
def halfList(noList:List[Int]) = {
val result = noList.map{ number =>
number/2
}
result
}
def mapFunctionDRY(noList:List[Int])(codeBlock: () => Int) = {
}
println(halfList(List(1,2,3)))
println(doubleList(List(1,2,4)))
I was playing around with Scala and noticed a violation of DRY (Don't Repeat Yourself) in the two functions above, doubleList and halfList. I want to isolate the code common to both functions and pass in only the code block that differs, so that my code no longer violates the DRY principle. I know that you can pass a code block as an argument in Scala; that is what I intend to do in mapFunctionDRY.
I want mapFunctionDRY to look like this:
def mapFunctionDRY(noList:List[Int])(codeBlock: () => Int) = {
noList.map{ number =>
codeBlock()
}
}
and the code in doubleList and halfList to look similar to this:
def doubleList(noList:List[Int]) = { mapFunctionDRY(noList){ () => number*2 } }
But I get a compilation error if I do that. How can I pass the code in as a parameter here to avoid violating DRY? Can this code be reduced further to keep it DRY?
You don't need to reinvent what map already does; it is already quite DRY:
def double(x: Int) = x * 2
def half(x: Int) = x / 2
val xs = List(1,2,3,4)
xs.map(double)
// List[Int] = List(2, 4, 6, 8)
xs.map(half)
// List[Int] = List(0, 1, 1, 2)
The compilation error occurs because you want to map each Int to another Int, but codeBlock: () => Int is a function that takes no arguments, so number is not in scope inside your block.
codeBlock: Int => Int should do what you want (inside mapFunctionDRY, call it as codeBlock(number)). Then you can define something like this:
def doubleList(noList:List[Int]) = { mapFunctionDRY(noList){ (number : Int) => number*2 } }
Haven't tested it though.
Edit: As the others said, this function is not really useful: it is just map, but weaker, in the sense that it can only be applied to List[Int].
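Picking up the point in the edit: if a named helper is still wanted, making it generic removes the List[Int] restriction. A minimal sketch follows (the `DryMapping` object is a hypothetical container). The separate parameter lists let the compiler fix `A` from the list before it type-checks a shorthand like `_ * 2`.

```scala
object DryMapping {
  // Generic variant of mapFunctionDRY: works for any element types, not just Int.
  def mapFunctionDRY[A, B](xs: List[A])(elementFn: A => B): List[B] =
    xs.map(elementFn)

  // The two original functions now differ only in the per-element function.
  def doubleList(xs: List[Int]): List[Int] = mapFunctionDRY(xs)(_ * 2)
  def halfList(xs: List[Int]): List[Int] = mapFunctionDRY(xs)(_ / 2)
}
```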
Why are you building a wrapper around map, which already provides the DRY-est solution to your problem? I would suggest a different strategy:
val mapDouble = (x: Int) => x * 2
val mapHalf = (x: Int) => x / 2
List(1, 2, 3).map(mapDouble)
List(1, 2, 3).map(mapHalf)
Your functions operate on one element of the list. Therefore, instead of codeBlock being a () => Int, I would change it to (Int) => Int, i.e. given one element of the list, what do you want to do with it?
This results in the following code:
def mapFunctionDRY(noList:List[Int])(elementFn: (Int) => Int) = {
noList.map{ number =>
elementFn(number)
}
}
And if you're into short code, then the equivalent code is:
def mapFunctionDRY(noList:List[Int])(elementFn: (Int) => Int) = noList.map(elementFn)
There are many other ways to stay DRY. For example, you could define the operations separately to be able to reuse them:
val doubleOperation: Int => Int = _ * 2
val halfOperation: Int => Int = _ / 2
def doubleList(noList:List[Int]) = noList.map(doubleOperation)
def halfList(noList:List[Int]) = noList.map(halfOperation)
Or you could use function currying to save yourself one line of code:
def mapFunction(fn: Int => Int)(noList: List[Int]) = noList.map(fn)
val doubleList = mapFunction(_ * 2) _ // trailing _ turns the remaining parameter list into a function
val halfList = mapFunction(_ / 2) _
I think what you are looking for here is currying.
def func(factor:Double)(noList:List[Int]) = {
val result = noList.map{ number =>
number*factor
}
result
}
Now you can call this function with func(0.5)(noList) or func(2.0)(noList). Note that multiplying by a Double makes the result a List[Double].
You could even keep references to the different versions of your function:
val halfed = (x: List[Int]) => func(0.5)(x)
val doubled = (x: List[Int]) => func(2.0)(x)

IndexedSeq-based equivalent of Stream?

I have a lazily-calculated sequence of objects, where the lazy calculation depends only on the index (not the previous items) and some constant parameters (p:Bar below). I'm currently using a Stream, however computing the stream.init is typically wasteful.
However, I really like that using Stream[Foo] = ... gets me out of implementing a cache, and has very light declaration syntax while still providing all the sugar (like stream(n) gets element n). Then again, I could just be using the wrong declaration:
class FooSrcCache(p:Bar) {
val src : Stream[FooSrc] = {
def error() : FooSrc = FooSrc(0,p)
def loop(i: Int): Stream[FooSrc] = {
FooSrc(i,p) #:: loop(i + 1)
}
error() #:: loop(1)
}
def apply(max: Int) = src(max)
}
Is there a Stream-comparable base Scala class, that is indexed instead of linear?
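PagedSeq should do the job for you. Its fill function is called like java.io.Reader.read, with (buffer, offset, length), and must return the number of elements it wrote, so the global index has to be tracked separately:
import scala.collection.immutable.PagedSeq
class FooSrcCache(p:Bar) {
private var nextIndex = 0 // global index of the next FooSrc to produce
private def fill(buf: Array[FooSrc], offset: Int, length: Int): Int = {
for (k <- 0 until length) {
buf(offset + k) = FooSrc(nextIndex, p)
nextIndex += 1
}
length
}
val src = new PagedSeq[FooSrc](fill _)
def apply(i: Int) = src(i)
}
Note that this might calculate FooSrc with higher indices than you requested, since pages are filled in blocks. Also note that PagedSeq was deprecated in Scala 2.11 and is no longer part of the 2.13 standard library.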
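If PagedSeq is not available in your Scala version, a different technique, plain memoization keyed by index, also fits the requirement that element i depends only on i and the constant parameters. A minimal sketch; `IndexedLazyCache` is a hypothetical name, and it is not thread-safe.

```scala
import scala.collection.mutable

// Hypothetical index-keyed lazy cache: element i is computed on first access only,
// without computing any lower indices, and then memoised. Not thread-safe.
final class IndexedLazyCache[A](compute: Int => A) {
  private val cache = mutable.HashMap.empty[Int, A]

  def apply(i: Int): A = {
    require(i >= 0, s"negative index: $i")
    // getOrElseUpdate only evaluates compute(i) when i is not cached yet.
    cache.getOrElseUpdate(i, compute(i))
  }

  // Number of elements actually computed so far.
  def computedCount: Int = cache.size
}
```

With the question's types this would be something like `new IndexedLazyCache(i => FooSrc(i, p))`; unlike a Stream, asking for element 1000 does not force elements 0 to 999.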

Selection Sort Generic type implementation

I have implemented recursive versions of selection sort and quicksort, and I am trying to modify the code so that it can sort a list of any generic type. I want to assume that the supplied generic type can be converted to Comparable at runtime.
Does anyone have a link, code, or tutorial on how to do this, please?
I am trying to modify this particular code:
def main (args:Array[String]){
val l = List(2,4,5,6,8)
print(quickSort(l))
}
def quickSort(x:List[Int]):List[Int]={
x match{
case xh::xt =>
{
val (first,pivot,second) = partition(x)
quickSort (first):::(pivot :: quickSort(second))
}
case Nil => {x}
}
}
def partition (x:List[Int])=
{
val pivot =x.head
var first:List[Int]=List ()
var second : List[Int]=List ()
val fun=(i:Int)=> {
if (i<pivot)
first=i::first
else
second=i::second
}
x.tail.foreach(fun)
(first,pivot,second)
}
In Scala, Java's Comparator is replaced by Ordering (quite similar, but it comes with more useful methods). Orderings are implemented for several types (primitives, strings, BigDecimals, etc.) and you can provide your own implementations.
You can then use Scala implicits to ask the compiler to pick the correct one for you:
def sort[A]( lst: List[A] )( implicit ord: Ordering[A] ) = {
...
}
If you are using a predefined ordering, just call:
sort( myLst )
and the compiler will infer the second argument. If you want to declare your own ordering, use the keyword implicit in the declaration. For instance:
implicit val fooOrdering = new Ordering[Foo] {
def compare( f1: Foo, f2: Foo ) = {...}
}
and it will be used implicitly when you sort a List of Foo.
If you have several implementations for the same type, you can also explicitly pass the correct ordering object:
sort( myFooLst )( fooOrdering )
More info in this post.
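Applied to the code in the question, a minimal sketch of the list-based quicksort generalised with an implicit `Ordering[A]` (the `GenericSort` object is a hypothetical container; `List.partition` replaces the hand-written `partition` with its `var`s):

```scala
object GenericSort {
  // The question's list-based quicksort, generalised from Int to any A with an implicit Ordering[A].
  def quickSort[A](xs: List[A])(implicit ord: Ordering[A]): List[A] = xs match {
    case Nil => Nil
    case pivot :: rest =>
      // Same split as the question's partition: elements below the pivot go left, the rest go right.
      val (smaller, others) = rest.partition(x => ord.lt(x, pivot))
      quickSort(smaller) ::: pivot :: quickSort(others)
  }
}
```

Calling `GenericSort.quickSort(List(2, 4, 5, 6, 8))` picks up the built-in `Ordering[Int]`; an explicit ordering such as `Ordering.Int.reverse` can be passed as the second argument.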
For Quicksort, I'll modify an example from the "Scala By Example" book to make it more generic.
import scala.collection.mutable.ArraySeq
class Quicksort[A <% Ordered[A]] {
def sort(a:ArraySeq[A]): ArraySeq[A] =
if (a.length < 2) a
else {
val pivot = a(a.length / 2)
sort(a.filter(pivot > _)) ++ a.filter(pivot == _) ++
sort(a.filter(pivot < _))
}
}
Test with Int
scala> val quicksort = new Quicksort[Int]
quicksort: Quicksort[Int] = Quicksort@38ceb62f
scala> val a = ArraySeq(5, 3, 2, 2, 1, 1, 9, 39 ,219)
a: scala.collection.mutable.ArraySeq[Int] = ArraySeq(5, 3, 2, 2, 1, 1, 9, 39, 219)
scala> quicksort.sort(a).foreach(n=> (print(n), print (" " )))
1 1 2 2 3 5 9 39 219
Test with a custom class implementing Ordered
scala> case class Meh(x: Int, y:Int) extends Ordered[Meh] {
| def compare(that: Meh) = (x + y).compare(that.x + that.y)
| }
defined class Meh
scala> val q2 = new Quicksort[Meh]
q2: Quicksort[Meh] = Quicksort@7677ce29
scala> val a3 = ArraySeq(Meh(1,1), Meh(12,1), Meh(0,1), Meh(2,2))
a3: scala.collection.mutable.ArraySeq[Meh] = ArraySeq(Meh(1,1), Meh(12,1), Meh(0,1), Meh(2,2))
scala> q2.sort(a3)
res7: scala.collection.mutable.ArraySeq[Meh] = ArraySeq(Meh(0,1), Meh(1,1), Meh(2,2), Meh(12,1))
Even though I usually prefer a functional programming style (combinators or recursion) over an imperative one (variables and iteration) when coding Scala, THIS TIME, for this specific problem, old-school imperative nested loops result in simpler code for the reader. I don't think falling back to an imperative style is a mistake for certain classes of problems, such as sorting algorithms, which usually transform the input buffer in place (like a procedure) rather than returning a new sorted one.
Here is my solution:
package bitspoke.algo
import scala.math.Ordered
import scala.collection.mutable.Buffer
abstract class Sorter[T <% Ordered[T]] {
// algorithm provided by subclasses
def sort(buffer : Buffer[T]) : Unit
// check if the buffer is sorted
def sorted(buffer : Buffer[T]) = buffer.isEmpty || buffer.view.zip(buffer.tail).forall { t => t._2 >= t._1 } // >= so duplicates count as sorted
// swap elements in buffer
def swap(buffer : Buffer[T], i:Int, j:Int): Unit = {
val temp = buffer(i)
buffer(i) = buffer(j)
buffer(j) = temp
}
}
class SelectionSorter[T <% Ordered[T]] extends Sorter[T] {
def sort(buffer : Buffer[T]) : Unit = {
for (i <- 0 until buffer.length) {
var min = i
for (j <- i until buffer.length) {
if (buffer(j) < buffer(min))
min = j
}
swap(buffer, i, min)
}
}
}
As you can see, rather than java.lang.Comparable I preferred scala.math.Ordered, and Scala view bounds rather than upper bounds. This works thanks to Scala's many implicit conversions of primitive types to rich wrappers.
You can write a client program as follows:
import bitspoke.algo._
import scala.collection.mutable._
val sorter = new SelectionSorter[Int]
val buffer = ArrayBuffer(3, 0, 4, 2, 1)
sorter.sort(buffer)
assert(sorter.sorted(buffer))
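A possible variant for newer Scala code, where view bounds (`<%`) are discouraged in favour of implicit parameters: the same in-place selection sort taking an implicit `Ordering[T]`, which also lets callers pass a custom ordering explicitly. `OrderingSelectionSort` is a hypothetical name, and its sortedness check uses a non-strict comparison so that duplicates are accepted.

```scala
import scala.collection.mutable.Buffer

object OrderingSelectionSort {
  // In-place selection sort; the implicit Ordering[T] replaces the view bound T <% Ordered[T].
  def sort[T](buffer: Buffer[T])(implicit ord: Ordering[T]): Unit =
    for (i <- buffer.indices) {
      // Find the index of the smallest element in buffer(i until length).
      var min = i
      for (j <- i + 1 until buffer.length)
        if (ord.lt(buffer(j), buffer(min))) min = j
      // Swap it into position i.
      val tmp = buffer(i)
      buffer(i) = buffer(min)
      buffer(min) = tmp
    }

  // Non-decreasing check, so equal neighbours (duplicates) count as sorted.
  def isSorted[T](buffer: Buffer[T])(implicit ord: Ordering[T]): Boolean =
    buffer.indices.drop(1).forall(i => ord.lteq(buffer(i - 1), buffer(i)))
}
```

Usage mirrors the client program above, e.g. `OrderingSelectionSort.sort(ArrayBuffer(3, 0, 4, 2, 1))`, or `sort(words)(Ordering[String].reverse)` for a descending sort.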