HOMap implementation example - scala

I was watching this video by Daniel Spiewak and tried to implement the higher-kinds sample from it. Here's what I got:
/* bad style */
val map: Map[Option[Any], List[Any]] = Map (
Some("foo") -> List("foo", "bar", "baz"),
Some(42) -> List(1, 1, 2, 3, 5, 8),
Some(true) -> List(true, false, true, false)
)
val xs: List[String] =
map(Some("foo")).asInstanceOf[List[String]] // ugly cast
val ys: List[Int] =
map(Some(42)).asInstanceOf[List[Int]] // another one
println(xs)
println(ys)
/* higher kinds usage */
// HOMAP :: ((* => *) x (* => *)) => *
class HOMap[K[_], V[_]](delegate: Map[K[Any], V[Any]]) {
def apply[A](key: K[A]): V[A] =
delegate(key.asInstanceOf[K[Any]]).asInstanceOf[V[A]]
}
object HOMap {
type Pair[K[_], V[_]] = (K[A], V[A]) forSome { type A }
def apply[K[_], V[_]](tuples: Pair[K, V]*) =
new HOMap[K, V](Map(tuples: _*))
}
val map_b: HOMap[Option, List] = HOMap[Option, List](
Some("foo") -> List("foo", "bar", "baz"),
Some(42) -> List(1, 1, 2, 3, 5, 8),
Some(true) -> List(true, false, true, false)
)
val xs_b: List[String] = map_b(Some("foo"))
val ys_b: List[Int] = map_b(Some(42))
println(xs_b)
println(ys_b)
Unfortunately, when I run this I get a type mismatch error:
username#host:~/workspace/scala/samples$ scala higher_kinds.scala
/home/username/workspace/scala/samples/higher_kinds.scala:30: error: type mismatch;
found : Main.$anon.HOMap.Pair[K,V]*
required: Seq[(K[Any], V[Any])]
new HOMap[K, V](Map(tuples: _*))
^
one error found
My questions:
How can I fix this? I understand that I just need to pass in the right type, but I have little experience with this kind of thing in Scala and can't figure it out.
Why does this happen? The tuples: _* idiom is widely used for passing varargs to Map, but here it produces a strange type, Main.$anon.HOMap.Pair[K,V]*, rather than what I expected.
Why does that example no longer work? Did some recent change to the Scala language change the syntax?
Thanks for answers!

The problem is type variance. In def apply[K[_], V[_]], nothing guarantees that K[A] and V[A] can be treated as K[Any] and V[Any], which is what Map(tuples: _*) needs in order to build a Map[K[Any], V[Any]].
Add a covariance annotation (+) to the K and V type parameters:
object HOMap {
def apply[K[+_], V[+_]](tuples: ((K[A], V[A]) forSome { type A })*) =
new HOMap[K, V](Map(tuples: _*))
}
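
Existential types (forSome) exist only in Scala 2, so here is a minimal alternative sketch; it is not the method from the answer above. It stores everything in a Map[Any, Any], keeps the single cast inside the class, and adds entries one at a time so that each key and value are checked to share the same A. The names empty and updated are hypothetical, chosen for illustration.

```scala
// Sketch: a map whose keys K[A] and values V[A] always share the same A.
// The single unchecked cast is confined to apply.
class HOMap[K[_], V[_]] private (delegate: Map[Any, Any]) {
  // Look up the value stored for this key; throws NoSuchElementException if absent.
  def apply[A](key: K[A]): V[A] =
    delegate(key).asInstanceOf[V[A]]

  // Hypothetical builder method: returns a new HOMap with one more entry.
  def updated[A](key: K[A], value: V[A]): HOMap[K, V] =
    new HOMap[K, V](delegate.updated(key, value))
}

object HOMap {
  // Hypothetical factory for an empty map.
  def empty[K[_], V[_]]: HOMap[K, V] = new HOMap[K, V](Map.empty)
}

val homap = HOMap.empty[Option, List]
  .updated(Some("foo"), List("foo", "bar", "baz"))
  .updated(Some(42), List(1, 1, 2, 3, 5, 8))

val xs: List[String] = homap(Some("foo")) // no cast at the call site
val ys: List[Int] = homap(Some(42))
```

Because updated takes the key and value together, a mismatched pair such as an Option[String] key with a List[Int] value is no longer guaranteed to be rejected if A can widen to a common supertype, so treat this as an illustration of the idea rather than a fully safe design.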

Related

apply/get methods in Scala

If we go by the definition in "Programming in Scala" book:
When you apply parentheses surrounding one or more values to a
variable, Scala will transform the code into an invocation of a method
named apply on that variable
Then what about accessing the elements of an array? For example, is x(0) transformed to x.apply(0) (assuming x is an array)? I tried executing that line and it threw an error. I also tried x.get(0), which threw an error as well.
Can anyone please help?
Yes, () implies apply().
An Array example:
scala> val data = Array(1, 1, 2, 3, 5, 8)
data: Array[Int] = Array(1, 1, 2, 3, 5, 8)
scala> data.apply(0)
res0: Int = 1
scala> data(0)
res1: Int = 1
Unrelated, but a safer alternative is lift, which returns an Option instead of throwing on an out-of-range index:
scala> data.lift(0)
res4: Option[Int] = Some(1)
scala> data.lift(100)
res5: Option[Int] = None
Note: scala.Array is mutable,
scala> data(0) = 100
scala> data
res7: Array[Int] = Array(100, 1, 2, 3, 5, 8)
This does not work with an explicit apply call; think of apply as a getter, not a mutator:
scala> data.apply(0) = 100
<console>:13: error: missing argument list for method apply in class Array
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing `apply _` or `apply(_)` instead of `apply`.
data.apply(0) = 100
^
Use .update if you want to mutate through an explicit method call:
scala> data.update(0, 200)
scala> data
res11: Array[Int] = Array(200, 1, 2, 3, 5, 8)
A user-defined apply method:
scala> object Test {
|
| case class User(name: String, password: String)
|
| object User {
| def apply(): User = User("updupd", "password")
| }
|
| }
defined object Test
scala> Test.User()
res2: Test.User = User(updupd,password)
If you add an apply method to an object, you can apply that object (like you can apply functions).
To do that, apply the object as if it were a function, directly with () and without a dot:
val array:Array[Int] = Array(1,2,3,4)
array(0) == array.apply(0)
For
x(1)=200
which you mention in the comment, the answer is different. It also gets translated to a method call, but not to apply; instead it's
x.update(1, 200)
Just like apply, this will work with any type which defines a suitable update method.
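
To make both translations concrete, here is a minimal sketch with a user-defined type. Grid and its fields are hypothetical names invented for this illustration; only the apply/update rewriting itself comes from the answers above.

```scala
// Hypothetical class used only to illustrate the apply/update desugaring.
class Grid(width: Int, height: Int) {
  private val cells = Array.fill(width * height)(0)

  // grid(x, y) is rewritten by the compiler to grid.apply(x, y)
  def apply(x: Int, y: Int): Int = cells(y * width + x)

  // grid(x, y) = v is rewritten to grid.update(x, y, v)
  def update(x: Int, y: Int, value: Int): Unit = {
    cells(y * width + x) = value
  }
}

val grid = new Grid(3, 2)
grid(1, 1) = 42            // same as grid.update(1, 1, 42)
val cell = grid(1, 1)      // same as grid.apply(1, 1)
val untouched = grid(0, 0) // cells start at 0
```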

Accessing Previous output while operator chaining in Scala

How can I access the output of one operation inside the next operation in a chain? For example:
scala> List(1,4,3,4,4,5,6,7)
res0: List[Int] = List(1, 4, 3, 4, 4, 5, 6, 7)
scala> res0.removeDuplicates.slice(0, ???.size -2)
In the above line, I need to perform the slice operation after removing duplicates. To do this, how can I access the output of .removeDuplicates so that I can use its size in the slice operation?
I need to do this in a single step, not in multiple steps like:
scala> res0.removeDuplicates
res1: List[Int] = List(1, 4, 3, 5, 6, 7)
scala> res1.slice(0, res1.size -2)
res2: List[Int] = List(1, 4, 3, 5)
I want to access intermediate results in the final operation. removeDuplicates() is just an example.
In list.op1().op2().op3().finalop(), I want to access the outputs of op1, op2 and op3 inside finalop.
Wrapping it in an Option may be one option (no pun intended):
val finalResult = Some(foo).map { foo =>
foo.op1(foo.stuff)
}.map { foo =>
foo.op2(foo.stuff)
}.map { foo =>
foo.op3(foo.stuff)
}.get.finalOp
You can make the wrapping part implicit to make it a little nicer:
object Tapper {
implicit class Tapped[T](val v: T) extends AnyVal {
def tap[R](f: T => R) = f(v)
}
}
import Tapper._
val finalResult = foo
.tap(f => f.op1(f.stuff))
.tap(f => f.op2(f.stuff))
.tap(f => f.finalOp(f.stuff))
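
Since Scala 2.13 the standard library ships a similar extension in scala.util.chaining. Note the naming: what the snippet above calls tap corresponds to the library's pipe (apply a function and return its result), while the library's own tap runs a side effect and returns the original value. A minimal sketch for the list from the question, assuming distinct is used in place of removeDuplicates:

```scala
import scala.util.chaining._

val input = List(1, 4, 3, 4, 4, 5, 6, 7)

// pipe hands the intermediate result to the function as `xs`,
// so its size is available inside the same chain.
val result = input.distinct.pipe(xs => xs.slice(0, xs.size - 2))
```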
With a for comprehension, you can compose operations in a quite readable way while keeping access to intermediate results:
val res = for {
ls1 <- Option(list.op1)
ls2 = ls1.op2() // Possible to access list, ls1
ls3 = ls2.op3() // Possible to access list, ls1, ls2
} yield ls3.finalOp() // Possible to access list, ls1, ls2, ls3
For example:
scala> val ls = List(1,1,2,2,3,3,4,4)
ls: List[Int] = List(1, 1, 2, 2, 3, 3, 4, 4)
scala> :paste
// Entering paste mode (ctrl-D to finish)
for {
ls1 <- Option(ls.map(_ * 2))
ls2 = ls1.map(_ + ls1.size)
ls3 = ls2.filter(_ < ls1.size + ls2.size)
} yield ls3.sum
// Exiting paste mode, now interpreting.
res15: Option[Int] = Some(72)
You do not need to know the length if you use dropRight:
scala> val a = List(1,4,3,4,4,5,6,7)
a: List[Int] = List(1, 4, 3, 4, 4, 5, 6, 7)
scala> a.dropRight(2)
res0: List[Int] = List(1, 4, 3, 4, 4, 5)
So do this: res0.removeDuplicates.dropRight(2)
If you really need it in one function, you can write a custom foldLeft, something like this:
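import scala.collection.mutable

var count = 0
val found = mutable.HashSet[Int]()
res0.foldLeft(List[Int]()) { (z, i) =>
  // keep the first four distinct elements; pass the accumulator through otherwise
  if (!found.contains(i) && count < 4) {
    found += i
    count += 1
    z :+ i
  } else z
}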
However I don't really see the problem in chaining calls like in res0.removeDuplicates.slice. One benefit of functional programming is that our compiler can optimize in situations like this where we just want a certain behavior and don't want to specify the implementation.
You want to process some data through a series of transformations: someData -> op1 -> op2 -> op3 -> finalOp. However, inside op3, you would like to have access to intermediate results from the processing done in op1. The key here is to pass to the next function in the processing chain all the information that will be required downstream.
Let's say that your input is xs: Seq[String] and op1 is of type (xs: Seq[String]) => Seq[String]. You want to modify op1 to return case class ResultWrapper(originalInputLength: Int, deduplicatedItems: Seq[String], somethingNeededInOp5: SomeType). If all of your ops pass along what the other ops need down the line, you will get what you need. It's not very elegant, because there is coupling between your ops: the upstream needs to save the info that the downstream needs. They are not really "different operations" any more at this point.
One thing you can do is to use a Map[A,B] as your "result wrapper". This way, there is less coupling between ops, but less type safety as well.
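
A minimal sketch of the wrapper idea, under stated assumptions: the field names follow the ResultWrapper described above, while op1, finalOp and the sample input are hypothetical stand-ins for the real operations.

```scala
// Hypothetical wrapper: op1 returns its result plus what later steps need.
final case class ResultWrapper(originalInputLength: Int, deduplicatedItems: Seq[String])

def op1(xs: Seq[String]): ResultWrapper =
  ResultWrapper(xs.length, xs.distinct)

// A later step can read both the current items and the upstream information.
def finalOp(r: ResultWrapper): String =
  s"kept ${r.deduplicatedItems.length} of ${r.originalInputLength} items"

val summary = finalOp(op1(Seq("a", "b", "a", "c")))
```

The coupling mentioned above is visible here: finalOp only works because op1 chose to record originalInputLength.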

Spark: aggregateByKey into a pair of lists

I have a keyed set of records that contain book id as well as reader id fields.
case class Book(book: Int, reader: Int)
How can I use aggregateByKey to combine all records with the same key into one record of the following format:
(key: Int, (books: List[Int], readers: List[Int]))
where books is a list of all books and readers is a list of all readers from records with the given key?
My code (below) results in compilation errors:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkContext, SparkConf}
object Aggr {
case class Book(book: Int, reader: Int)
val bookArray = Array(
(2,Book(book = 1, reader = 700)),
(3,Book(book = 2, reader = 710)),
(4,Book(book = 3, reader = 710)),
(2,Book(book = 8, reader = 710)),
(3,Book(book = 1, reader = 720)),
(4,Book(book = 2, reader = 720)),
(4,Book(book = 8, reader = 720)),
(3,Book(book = 3, reader = 730)),
(4,Book(book = 8, reader = 740))
)
def main(args: Array[String]) {
Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)
// set up environment
val conf = new SparkConf()
.setMaster("local[5]")
.setAppName("Aggr")
.set("spark.executor.memory", "2g")
val sc = new SparkContext(conf)
val books = sc.parallelize(bookArray)
val aggr = books.aggregateByKey((List()[Int], List()[Int]))
({case
((bookList:List[Int],readerList:List[Int]), Book(book, reader)) =>
(bookList ++ List(book), readerList ++ List(reader))
},
{case ((bookLst1:List[Int], readerLst1:List[Int]),
(bookLst2:List[Int], readerLst2:List[Int])
) => (bookLst1 ++ bookLst2, readerLst1 ++ readerLst2) })
}
}
Errors:
Error:(36, 44) object Nil does not take type parameters.
val aggr = books.aggregateByKey((List()[Int], List()[Int]))
Error:(37, 6) missing parameter type for expanded function The argument types of an anonymous function must be fully known. (SLS 8.5) Expected type was: ?
({case
^
^
Update
When initializing the accumulator with (List(0), List(0)), everything compiles, but extra zeros are inserted into the result. Very interesting:
val aggr : RDD[(Int, (List[Int], List[Int]))] = books.aggregateByKey((List(0), List(0))) (
{case
((bookList:List[Int],readerList:List[Int]), Book(book, reader)) =>
(bookList ++ List(book), readerList ++ List(reader))
},
{case ((bookLst1:List[Int], readerLst1:List[Int]),
(bookLst2:List[Int], readerLst2:List[Int])
) => (bookLst1 ++ bookLst2, readerLst1 ++ readerLst2) }
)
This results in the following output:
[Stage 0:> (0 + 0) / 5](2,(List(0, 1, 0, 8),List(0, 700, 0, 710)))
(3,(List(0, 2, 0, 1, 0, 3),List(0, 710, 0, 720, 0, 730)))
(4,(List(0, 3, 0, 2, 8, 0, 8),List(0, 710, 0, 720, 720, 0, 740)))
If I could use empty lists as initializers instead of lists with zeros, I would not get the extra zeros, of course; the lists would concatenate nicely.
Can somebody please explain why the empty-list initializer (List(), List()) results in an error while (List(0), List(0)) compiles? Is it a Scala bug or a feature?
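Actually, your logic is fine; only the syntax needs two small changes. Put the type parameter before the parentheses (List[Int]() instead of List()[Int]), and move the opening parenthesis of the second argument list up to the end of the first line, because a line that starts with ( is parsed as a new statement. Change this: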
val aggr = books.aggregateByKey((List()[Int], List()[Int]))
({case
Into this:
val aggr = books.aggregateByKey((List[Int](), List[Int]())) (
{case
These links might shed some light on why this didn't work for you:
What are the precise rules for when you can omit parenthesis, dots, braces, = (functions), etc.? (first answer)
http://docs.scala-lang.org/style/method-invocation.html#suffix-notation
Answering your update: you misplaced the type parameter for your lists. If you had written List[Int]() instead of List()[Int], everything would have worked. The compiler error message correctly identifies the problem, but it is not easy to understand. By putting [Int] at the end, you are passing a type parameter to the result of List(). The result of List() is Nil, a singleton object representing an empty list, and it does not take type parameters.
As for why List(0) also works: Scala performs type inference when it can. You've given the list one element, 0, which is an Int, so the type is inferred as List[Int]. Note, however, that this is not an empty list but a list containing a single zero. You probably want to use List[Int]() instead.
Just using List() doesn't work because Scala has nothing to infer the element type from: the empty list is typed as List[Nothing], which does not match the List[Int] values your functions return.
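
A minimal sketch of the same aggregation on plain Scala collections, with no Spark involved: groupBy and foldLeft stand in for aggregateByKey here, which is an assumption made so the snippet runs without a SparkContext. It shows a typed empty accumulator (List.empty[Int] is equivalent to List[Int]()) producing lists without extra zeros.

```scala
case class Book(book: Int, reader: Int)

// A few of the records from the question.
val records = Seq(
  2 -> Book(book = 1, reader = 700),
  3 -> Book(book = 2, reader = 710),
  4 -> Book(book = 3, reader = 710),
  2 -> Book(book = 8, reader = 710)
)

// Typed empty lists: the element type is stated, so nothing is left to inference.
val zero = (List.empty[Int], List.empty[Int])

val aggr: Map[Int, (List[Int], List[Int])] =
  records.groupBy(_._1).map { case (key, group) =>
    key -> group.foldLeft(zero) { case ((books, readers), (_, Book(b, r))) =>
      (books :+ b, readers :+ r)
    }
  }
```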

How to convert Iterable[Try[U]] to Iterable[U], keeping only the successes?

I tried
val tryValues : Iterable[Try[Int]] = ...
val successValues = tryValues.filter(_.isSuccess).map(_.get)
but the compiler gives a warning that map may throw an exception.
Is there a warning-free way to do this?
You want to use collect to pattern match out all the values which are Success, and discard anything else.
val successValues: Iterable[Int] = tryValues collect { case Success(x) => x }
collect accepts a PartialFunction as an argument. Any values from the collection that the PartialFunction is defined for will be mapped, and the rest will be discarded.
Example:
scala> val tryValues = List(1, 1, 0, 1, 1).map(x => Try(1 / x))
tryValues: List[scala.util.Try[Int]] = List(Success(1), Success(1), Failure(java.lang.ArithmeticException: / by zero), Success(1), Success(1))
scala> val successValues = tryValues collect { case Success(x) => x }
successValues: List[Int] = List(1, 1, 1, 1)
Another option, if you don't need to log anything about the failures, is to flatMap using toOption on the Try, like so:
val successValues = tryValues.flatMap(_.toOption)
The following is a for-comprehension approach:
val successValues = for { Success(n) <- tryValues } yield n
For more information have a look at the answer
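
For comparison, a self-contained sketch of the collect and flatMap approaches side by side. The sample input (parsing strings) is an assumption made for illustration; the last line shows that the same collect pattern can keep the failures instead, for example for logging.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical input: "x" fails to parse, the other two succeed.
val tryValues: Iterable[Try[Int]] = List("1", "x", "3").map(s => Try(s.toInt))

val viaCollect: Iterable[Int] = tryValues.collect { case Success(n) => n }
val viaFlatMap: Iterable[Int] = tryValues.flatMap(_.toOption)

// The same pattern keeps the failures instead, e.g. for logging.
val errors: Iterable[Throwable] = tryValues.collect { case Failure(e) => e }
```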

Idiomatic to find the first element in a collection that matches a given sub-type in Scala

I am teaching myself Scala, Akka, and Play by developing a model of an order book. I need to find the first element in a collection (specifically a priority queue) of various types of Ask orders that matches a certain type of Ask order (specifically a LimitOrderAsk)
The solution that I have come up with is the following:
bestLimitOrderAsk = askBook find {
case ask: LimitOrderAsk => true
case _ => false
}
I am new to Scala and I am not sure that this is the idiomatic Scala way to solve this problem. Thoughts?
Two options:
askBook.collectFirst{case ask: LimitOrderAsk => ask}
or:
askBook.find(_.isInstanceOf[LimitOrderAsk])
If you only need to know whether such an element exists, add .nonEmpty at the end of the expression, or use exists:
askBook.collectFirst{case ask: LimitOrderAsk => ask}.nonEmpty
askBook.exists(_.isInstanceOf[LimitOrderAsk])
Examples:
scala> List(5, null, "aaa", "bbb").find(_.isInstanceOf[String])
res30: Option[Any] = Some(aaa)
scala> List(5, null, "aaa", "bbb").collectFirst{case a: String => a}
res31: Option[String] = Some(aaa)
Boolean result:
scala> List(5, null, "aaa").find(_.isInstanceOf[String]).nonEmpty
res32: Boolean = true
scala> List(5, null).find(_.isInstanceOf[String]).nonEmpty
res33: Boolean = false
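
A sketch closer to the question's domain, under stated assumptions: the Ask hierarchy, its fields and the sample prices are hypothetical, and a plain Seq stands in for the priority queue. It illustrates the practical difference between the two options: find keeps the element type Ask, while collectFirst narrows the result to LimitOrderAsk.

```scala
// Hypothetical order types standing in for the question's model.
sealed trait Ask { def price: Double }
final case class MarketOrderAsk(price: Double) extends Ask
final case class LimitOrderAsk(price: Double, limit: Double) extends Ask

// A plain Seq stands in for the priority queue.
val askBook: Seq[Ask] = Seq(
  MarketOrderAsk(10.0),
  LimitOrderAsk(10.5, 11.0),
  LimitOrderAsk(10.7, 12.0)
)

// find keeps the static element type Ask ...
val viaFind: Option[Ask] = askBook.find(_.isInstanceOf[LimitOrderAsk])

// ... while collectFirst narrows it to LimitOrderAsk, so `limit` is accessible.
val viaCollectFirst: Option[LimitOrderAsk] =
  askBook.collectFirst { case ask: LimitOrderAsk => ask }

val firstLimit: Option[Double] = viaCollectFirst.map(_.limit)
```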