Delay implementation in Scala

I have implemented the following cons_stream function in Scala, but it does not work and I am not sure why.
def cons_stream[T, U](x : T, y : U) =
{
def delay = () => y
/// Delay takes no parameters but returns y
(f : String ) =>
{
if ( f == "x") x
else if( f == "y") delay
else throw new Error("Invalid string use x or y")
}
}
The corresponding car and cdr functions are:
def stream_car[T](f : String => T) : T = f("x")
def stream_cdr[T](f : String => Any) : T = force(f("y").asInstanceOf[() => T])
Now I have the definition of a stream of integers starting from n:
def integers_starting_from_n[T, U](n : Int) : String => Any =
{
cons_stream(n, integers_starting_from_n(n+1))
}
Unfortunately when I try to access the stream using either stream_car or stream_cdr I get a stack overflow:
def integers = integers_starting_from_n(1)
stream_car(integers)
I have no idea why. Any help is appreciated.

I assume that the stack is full of integers_starting_from_n calls. Correct? That function is recursive, and the recursive call runs before cons_stream can execute, because cons_stream takes the value of integers_starting_from_n(n+1) as a parameter and Scala evaluates arguments eagerly by default.

To define a stream, make the tail a call-by-name parameter by prepending => to its type; a by-name argument is evaluated only when it is used. For your example, use call-by-name for y:
def cons_stream[T, U](x:T, y: => U)
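A minimal sketch of how the by-name version could look end to end. The question does not show force, so the definition here is an assumption, and the explicit String => Any return type is added only to make the recursion type-check.

```scala
// The tail y is by-name, so the recursive call passed as y is NOT evaluated
// when cons_stream is called; it runs only when the delay is forced.
def cons_stream[T, U](x: T, y: => U): String => Any = {
  val delay = () => y // wraps y in a thunk without evaluating it
  (f: String) =>
    if (f == "x") x
    else if (f == "y") delay
    else throw new IllegalArgumentException("Invalid string: use x or y")
}

// Assumed helper (not shown in the question): runs a delayed computation.
def force[T](thunk: () => T): T = thunk()

def stream_car[T](f: String => T): T = f("x")
def stream_cdr[T](f: String => Any): T = force(f("y").asInstanceOf[() => T])

def integers_starting_from_n(n: Int): String => Any =
  cons_stream(n, integers_starting_from_n(n + 1))

val integers = integers_starting_from_n(1)
val first = stream_car(integers)                              // head of the stream
val second = stream_car(stream_cdr[String => Any](integers))  // head of the tail
```

With the tail delayed, each call to stream_cdr builds exactly one more cell, so the stack no longer overflows.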

Related

scala variable number by name parameters [duplicate]

This question already has answers here:
Scala variable argument list with call-by-name possible?
(2 answers)
Closed 6 years ago.
I am trying to implement a control flow structure which can accept a variable number of by-name parameters.
See the CalculateGroup method and its use.
I was trying to follow this post, but I still have some issues.
Judging from the error, I suspect I need to change how the type of predicate is annotated in the CalculateGroup function?
Here is the current code:
def compare[T : Numeric](x: T)(y: T) : Boolean = implicitly[Numeric[T]].gt( x, y )
val items = compare[Double](10) _
val assertionsEnabled = true
def Calculate( predicate: => Boolean ) =
if (assertionsEnabled && !predicate)
throw new AssertionError
Calculate{
items(5)
}
def CalculateGroup( list: (predicate: => Boolean) *) =
{
list.foreach( (p : (predicate: => Boolean) ) => {
if (assertionsEnabled && !predicate)
throw new AssertionError
})
}
CalculateGroup{
items(5),
items(3),
items(8)
}
Error details:
scala ControlFlow.scala
/Users/pavel/Documents/ControlFlow/ControlFlow.scala:36: error: ')' expected but ':' found.
def CalculateGroup( list: (predicate: => Boolean) *) =
^
/Users/pavel/Documents/ControlFlow/ControlFlow.scala:68: error: ')' expected but '}' found.
}
^
two errors found
You cannot use by-name varargs. Instead, you could use a lazy collection such as Iterator or maybe Stream:
def compare[T : Numeric](x: T)(y: T) : Boolean = implicitly[Numeric[T]].gt( x, y )
val items = compare[Double](10) _
val assertionsEnabled = true
def Calculate(predicate: => Boolean) =
if (assertionsEnabled && !predicate)
throw new AssertionError
Calculate{
items(5)
}
def CalculateGroup(list: Iterator[Boolean]) =
{
list.foreach { (p : Boolean ) =>
if (assertionsEnabled && !p) {
throw new AssertionError
}
}
}
CalculateGroup{Iterator(
items(5),
items(3),
items(8)
)}
You have a syntax problem: you are writing a name followed by a colon (predicate:) inside the parameter type, both in the signature of the method CalculateGroup and in the foreach. Just remove them and it should compile.
Note that in that position the word predicate is not an alias for a variable; the compiler expects a type there, such as the name of a class, and by convention type names are capitalized. Method names, on the other hand, should not be capitalized.
Update
To have multiple by-name parameters just do this:
def CalculateGroup( list: (=> Boolean) *) =
{
list.foreach( (p : (=> Boolean) ) => {
if (assertionsEnabled && !p)
throw new AssertionError
})
}
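As the first answer notes, by-name varargs are not available, and Iterator(a, b, c) evaluates its arguments as soon as the iterator is built. One possible workaround, sketched here under the assumption that the caller can wrap each check in an explicit zero-argument function, is to take a variable number of thunks. The lowercase name calculateGroup and the simplified items function are illustrative stand-ins, not the asker's code.

```scala
val assertionsEnabled = true

// Each check is an explicit thunk (() => Boolean), so it is evaluated
// only inside the loop, one at a time, and only if assertions are enabled.
def calculateGroup(checks: (() => Boolean)*): Unit =
  checks.foreach { check =>
    if (assertionsEnabled && !check())
      throw new AssertionError
  }

// Simplified stand-in for compare[Double](10) _ from the question: 10 > y
val items: Double => Boolean = y => 10.0 > y

calculateGroup(
  () => items(5),
  () => items(3),
  () => items(8)
)
```

The call site is slightly noisier than true by-name syntax, but evaluation stops at the first failing check and later thunks are never run.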

How to aggregateByKey with custom class for frequency distribution?

I am trying to create a frequency distribution.
My data is in the following pattern (ColumnIndex, (Value, countOfValue)) of type (Int, (Any, Long)). For instance, (1, (A, 10)) means for column index 1, there are 10 A's.
My goal is to get the top 100 values for each of my indexes (keys).
Right away I can make it less compute intensive for my workload by doing an initial filter:
val freqNumDist = numRDD.filter(x => x._2._2 > 1)
I found an interesting example of a class that seems to fit my use case:
class TopNList (val maxSize:Int) extends Serializable {
val topNCountsForColumnArray = new mutable.ArrayBuffer[(Any, Long)]
var lowestColumnCountIndex:Int = -1
var lowestValue = Long.MaxValue
def add(newValue:Any, newCount:Long): Unit = {
if (topNCountsForColumnArray.length < maxSize -1) {
topNCountsForColumnArray += ((newValue, newCount))
} else if (topNCountsForColumnArray.length == maxSize) {
updateLowestValue
} else {
if (newCount > lowestValue) {
topNCountsForColumnArray.insert(lowestColumnCountIndex, (newValue, newCount))
updateLowestValue
}
}
}
def updateLowestValue: Unit = {
var index = 0
topNCountsForColumnArray.foreach{ r =>
if (r._2 < lowestValue) {
lowestValue = r._2
lowestColumnCountIndex = index
}
index+=1
}
}
}
So now I was thinking of putting together an aggregateByKey call that uses this class to get my top 100 values. The problem is that I am unsure how to use this class in aggregateByKey to accomplish this.
val initFreq:TopNList = new TopNList(100)
def freqSeq(u: (TopNList), v:(Double, Long)) = (
u.add(v._1, v._2)
)
def freqComb(u1: TopNList, u2: TopNList) = (
u2.topNCountsForColumnArray.foreach(r => u1.add(r._1, r._2))
)
val freqNumDist = numRDD.filter(x => x._2._2 > 1).aggregateByKey(initFreq)(freqSeq, freqComb)
The obvious problem is that nothing is returned by the functions I am using. So I am wondering how to modify this class or do I need to think about this in a whole new light and just cherry pick some of the functions out of this class and add them to the functions I am using for the aggregateByKey?
I'm either thinking about classes wrong or the entire aggregateByKey or both!
Your function implementations (freqSeq, freqComb) return Unit, while aggregateByKey expects them to return TopNList.
If you intentionally keep the style of your solution, the relevant implementation should be:
def freqSeq(u: TopNList, v:(Any, Long)) : TopNList = {
u.add(v._1, v._2) // operation gives void result (Unit)
u // this one of TopNList type
}
def freqComb(u1: TopNList, u2: TopNList) : TopNList = {
u2.topNCountsForColumnArray.foreach (r => u1.add (r._1, r._2) )
u1
}
Take a look at the signature of aggregateByKey in PairRDDFunctions to see what it expects: both seqOp and combOp must return a value of the accumulator type U.
def aggregateByKey[U: ClassTag](zeroValue: U)(seqOp: (U, V) => U, combOp: (U, U) => U): RDD[(K, U)]
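To illustrate the (U, V) => U and (U, U) => U shapes without a Spark cluster, here is a small local sketch. TopN is a hypothetical, simplified immutable replacement for TopNList, and the two Seq values merely stand in for RDD partitions; none of this is Spark API.

```scala
// Hypothetical simplified accumulator: keeps the maxSize largest counts seen so far.
final case class TopN(maxSize: Int, entries: Vector[(Any, Long)] = Vector.empty) {
  def add(value: Any, count: Long): TopN =
    copy(entries = (entries :+ ((value, count))).sortBy(-_._2).take(maxSize))
  def merge(other: TopN): TopN =
    other.entries.foldLeft(this)((acc, e) => acc.add(e._1, e._2))
}

// Both functions return the accumulator, matching (U, V) => U and (U, U) => U.
def seqOp(acc: TopN, v: (Any, Long)): TopN = acc.add(v._1, v._2)
def combOp(a: TopN, b: TopN): TopN = a.merge(b)

// Two local "partitions" of (columnIndex, (value, count)) records.
val partition1 = Seq((1, ("A", 10L)), (1, ("B", 3L)), (2, ("X", 7L)))
val partition2 = Seq((1, ("C", 8L)), (2, ("Y", 9L)))

// Within a partition: fold every record of a key into the zero value with seqOp.
def aggregatePartition(part: Seq[(Int, (Any, Long))]): Map[Int, TopN] =
  part.groupBy(_._1).map { case (k, rows) =>
    k -> rows.map(_._2).foldLeft(TopN(2))(seqOp)
  }

// Across partitions: merge the per-key accumulators with combOp.
val merged: Map[Int, TopN] =
  (aggregatePartition(partition1).toSeq ++ aggregatePartition(partition2).toSeq)
    .groupBy(_._1)
    .map { case (k, parts) => k -> parts.map(_._2).reduce((a, b) => combOp(a, b)) }
```

With a real RDD the same two functions would be passed as aggregateByKey(zero)(seqOp, combOp), with Spark doing the per-partition fold and the merge.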

Use of _ when invoking a method

The output of the code below:
getNum(_);
getNum(3);
def getNum(num: Int) {
println("Num is " + num)
}
is
Num is 3
Why is getNum(_); not invoked? How is _ used in this case?
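What would you expect it to be? getNum(null)?
The expression getNum(_) is translated into something like:
{ x:Int => getNum(x) }
This is an anonymous function, which is itself a value. The statement only creates that function and discards it, so getNum is never called and nothing is printed.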
You could do for example:
val f = getNum(_)
f(42)
Then you'd see:
Num is 42
_ is used to partially apply a function. Partial application of a function produces another function with some of its parameters already applied.
val f = getNum(_) // partially apply
f(3) // apply the function
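A small sketch that makes the difference between creating the function and calling it observable. The calls counter is added purely for illustration and is not part of the original code.

```scala
var calls = 0 // illustrative counter, not in the original question

def getNum(num: Int): Unit = {
  calls += 1
  println("Num is " + num)
}

val f: Int => Unit = getNum(_)  // builds a function value; getNum is not called here
val callsAfterCreation = calls  // still 0
f(42)                           // now getNum runs and prints "Num is 42"
val callsAfterInvoke = calls    // 1
```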

Simplify/DRY up a case statement in Scala for Twirl Templates

So I'm using play Twirl templates (not within play; independent project) and I have some templates that generate some database DDLs. The following works:
if(config.params.showDDL.isSupplied) {
print( BigSenseServer.config.options("dbms") match {
case "mysql" => txt.mysql(
BigSenseServer.config.options("dbDatabase"),
InetAddress.getLocalHost().getCanonicalHostName,
BigSenseServer.config.options("dboUser"),
BigSenseServer.config.options("dboPass"),
BigSenseServer.config.options("dbUser"),
BigSenseServer.config.options("dbPass")
)
case "pgsql" => txt.pgsql(
BigSenseServer.config.options("dbDatabase"),
InetAddress.getLocalHost().getCanonicalHostName,
BigSenseServer.config.options("dboUser"),
BigSenseServer.config.options("dboPass"),
BigSenseServer.config.options("dbUser"),
BigSenseServer.config.options("dbPass")
)
case "mssql" => txt.mssql$.MODULE$(
BigSenseServer.config.options("dbDatabase"),
InetAddress.getLocalHost().getCanonicalHostName,
BigSenseServer.config.options("dboUser"),
BigSenseServer.config.options("dboPass"),
BigSenseServer.config.options("dbUser"),
BigSenseServer.config.options("dbPass")
)
})
System.exit(0)
}
But I have a lot of repeated statements. If I try to assign the case to a variable and use the $.MODULE$ trick, I get an error saying my variable doesn't take parameters:
val b = BigSenseServer.config.options("dbms") match {
case "mysql" => txt.mysql$.MODULE$
case "pgsql" => txt.pgsql$.MODULE$
case "mssql" => txt.mssql$.MODULE$
}
b("string1","string2","string3","string4","string5","string6")
and the error:
BigSense/src/main/scala/io/bigsense/server/BigSenseServer.scala:32: play.twirl.api.BaseScalaTemplate[T,F] with play.twirl.api.Template6[A,B,C,D,E,F,Result] does not take parameters
What's the best way to simplify this Scala code?
EDIT: Final Solution using a combination of the answers below
The answers below suggest creating factory classes, but I really want to avoid that since I already have the Twirl generated template object. The partially applied functions gave me a better understanding of how to achieve this. Turns out all I needed to do was to pick the apply methods and to eta-expand these; if necessary in combination with partial function application. The following works great:
if(config.params.showDDL.isSupplied) {
print((config.options("dbms") match {
case "pgsql" =>
txt.pgsql.apply _
case "mssql" =>
txt.mssql.apply _
case "mysql" =>
txt.mysql.apply(InetAddress.getLocalHost().getCanonicalHostName,
_:String, _:String, _:String,_:String, _:String)
})(
config.options("dbDatabase"),
config.options("dboUser"),
config.options("dboPass"),
config.options("dbUser"),
config.options("dbPass")
))
System.exit(0)
}
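The idea of the final solution can be tried out without Twirl. In this sketch, mysqlTpl and pgsqlTpl are hypothetical stand-ins for generated template objects (their parameter lists are invented for illustration), and the explicit function type on template is an addition that gives every branch the same type.

```scala
// Hypothetical stand-ins for generated template objects with apply methods.
object mysqlTpl {
  def apply(host: String, db: String, user: String): String = s"mysql:$host/$db/$user"
}
object pgsqlTpl {
  def apply(db: String, user: String): String = s"pgsql:$db/$user"
}

def render(dbms: String): String = {
  // Every branch yields a plain function of the same type, which can be applied.
  val template: (String, String) => String = dbms match {
    case "pgsql" => pgsqlTpl.apply _                                  // eta-expand apply
    case "mysql" => mysqlTpl.apply("localhost", _: String, _: String) // pre-fill one argument
  }
  template("sensors", "admin") // common arguments are passed once
}
```

Selecting the apply method rather than the object itself is what gives the match a function type that accepts arguments.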
You can try to use eta-expansion and partially applied functions.
Given a factory with some methods:
class Factory {
def mysql(i: Int, s: String) = s"x: $i/$s"
def pgsql(i: Int, s: String) = s"y: $i/$s"
def mssql(i: Int, j: Int, s: String) = s"z: $i/$j/$s"
}
You can abstract over the methods like this:
val factory = new Factory()
// Arguments required by all factory methods
val i = 5
val s = "Hello"
Seq("mysql", "pgsql", "mssql").foreach {
name =>
val f = name match {
case "mysql" =>
// Eta-expand: Convert method into function
factory.mysql _
case "pgsql" =>
factory.pgsql _
case "mssql" =>
// Argument for only one factory method
val j = 10
// Eta-expand, then apply function partially
factory.mssql(_ :Int, j, _: String)
}
// Fill in common arguments into the new function
val result = f(i, s)
println(name + " -> " + result)
}
As you can see in the "mssql" case, the arguments may even differ, yet the common arguments only need to be passed once. The foreach loop is only there to test each case; the code in its body shows how to partially apply a function.
You can try to do this by using tupled to create a tupled version of the function.
object X {
def a(x : Int, y : Int, z : Int) = "A" + x + y + z
def b(x : Int, y : Int, z : Int) = "B" + x + y + z
def c(x : Int, y : Int, z : Int) = "C" + x + y + z
}
val selectedFunc = X.a _
selectedFunc.tupled((1, 2, 3)) //returns A123
More specifically, you would store your parameters in a tuple:
val params = (BigSenseServer.config.options("dbDatabase"),
InetAddress.getLocalHost().getCanonicalHostName) //etc.
and then in your match statement:
case "mysql" => (txt.mysql.apply _).tupled(params)

How to extract filter code to local variable

I'm filtering a list using this code :
linkVOList = linkVOList.filter(x => x.getOpen().>=(100))
The type x is inferred by Scala which is why it can find the .getOpen() method.
Can the code 'x => x.getOpen()' be extracted to a local variable? Something like:
val xval = 'x => x.getOpen()'
and then :
linkVOList = linkVOList.filter(xval.>=(100))
I think this is difficult because the .filter method infers the type, whereas I need to work out the type outside of the .filter method. Perhaps this can be achieved using instanceof or an alternative method?
There are a couple of ways to do what you are asking, but both ways will explicitly have to know the type of object they are working with:
case class VO(open:Int)
object ListTesting {
def main(args: Array[String]) {
val linkVOList = List(VO(200))
val filtered = linkVOList.filter(x => x.open.>=(100))
val filterFunc = (x:VO) => x.open.>=(100)
linkVOList.filter(filterFunc)
def filterFunc2(x:VO) = x.open.>=(100)
linkVOList.filter(filterFunc2)
}
}
Since you haven't provided any such information, I'll assume the following preconditions:
trait GetsOpen { def getOpen() : Int }
def linkVOList : List[GetsOpen]
Then you can extract the function like this:
val f = (x : GetsOpen) => x.getOpen()
or this:
val f : GetsOpen => Int = _.getOpen()
And use it like this:
linkVOList.filter( f.andThen(_ >= 100) )
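For a version that can be run as is, the sketch below supplies a concrete implementation of the assumed trait. LinkVO and the sample values are hypothetical; only GetsOpen and getOpen come from the preconditions above.

```scala
trait GetsOpen { def getOpen(): Int }

// Hypothetical concrete type, used only to have data to filter.
final case class LinkVO(open: Int) extends GetsOpen {
  def getOpen(): Int = open
}

val linkVOList: List[GetsOpen] = List(LinkVO(50), LinkVO(100), LinkVO(200))

// The extracted accessor needs an explicit type, because nothing outside
// filter tells the compiler what x is.
val getOpen: GetsOpen => Int = _.getOpen()

// Compose the accessor with the comparison to obtain the predicate.
val isAtLeast100: GetsOpen => Boolean = getOpen.andThen(_ >= 100)

val filtered = linkVOList.filter(isAtLeast100)
```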
Just use
import language.higherKinds
def inferMap[A,C[A],B](c: C[A])(f: A => B) = f
scala> val f = inferMap(List(Some("fish"),None))(_.isDefined)
f: Option[String] => Boolean = <function1>
Now, this is not the value but the function itself. If you want the values, just
val opened = linkVOList.map(x => x.open)
(linkVOList zip opened).filter(_._2 >= 100).map(_._1)
but if you want the function then
val xfunc = inferMap(linkVOList)(x => x.open)
but you have to use it like
linkVOList.filter(xfunc andThen { _ >= 100 })
or
linkVOList.filter(x => xfunc(x) >= 100)
since you don't actually have the values but a function to compute the values.