apply/get methods in Scala

If we go by the definition in the "Programming in Scala" book:
When you apply parentheses surrounding one or more values to a
variable, Scala will transform the code into an invocation of a method
named apply on that variable
Then what about accessing the elements of an array? E.g., is x(0) transformed to x.apply(0)? (Let's assume that x is an array.) I tried to execute the above line and it threw an error. I also tried x.get(0), which also threw an error.
Can anyone please help?

Parentheses applied to a value translate to a call to apply(). An Array example:
scala> val data = Array(1, 1, 2, 3, 5, 8)
data: Array[Int] = Array(1, 1, 2, 3, 5, 8)
scala> data.apply(0)
res0: Int = 1
scala> data(0)
res1: Int = 1
Not directly related, but a safer alternative is lift, which returns an Option:
scala> data.lift(0)
res4: Option[Int] = Some(1)
scala> data.lift(100)
res5: Option[Int] = None
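Since lift returns an Option, it composes with the usual Option combinators; for instance (a small added illustration, not from the original answer):
scala> data.lift(100).getOrElse(-1)
res6: Int = -1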
**Note:** scala.Array can be mutated:
scala> data(0) = 100
scala> data
res7: Array[Int] = Array(100, 1, 2, 3, 5, 8)
For this you cannot use apply; think of apply as a getter, not a mutator:
scala> data.apply(0) = 100
<console>:13: error: missing argument list for method apply in class Array
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing `apply _` or `apply(_)` instead of `apply`.
data.apply(0) = 100
^
Use .update instead if you want to mutate:
scala> data.update(0, 200)
scala> data
res11: Array[Int] = Array(200, 1, 2, 3, 5, 8)
A user-defined apply method:
scala> object Test {
     |   case class User(name: String, password: String)
     |
     |   object User {
     |     def apply(): User = User("updupd", "password")
     |   }
     | }
defined object Test
defined object Test
scala> Test.User()
res2: Test.User = User(updupd,password)

If you add an apply method to an object, you can apply that object (like you can apply functions). The way to do that is simply to apply the object as if it were a function, directly with (), without a dot.
val array:Array[Int] = Array(1,2,3,4)
array(0) == array.apply(0)

For
x(1) = 200
which you mention in the comment, the answer is different. It also gets translated to a method call, but not to apply; instead it's
x.update(1, 200)
Just like apply, this will work with any type which defines a suitable update method.
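To illustrate that last point (a minimal sketch of my own; the Box class is made up), any type that defines suitable apply and update methods gets both pieces of syntactic sugar:
class Box {
  private val cells = new Array[Int](2)
  def apply(i: Int): Int = cells(i)               // b(i)     => b.apply(i)
  def update(i: Int, v: Int): Unit = cells(i) = v // b(i) = v => b.update(i, v)
}

val b = new Box
b(0) = 42      // translated to b.update(0, 42)
println(b(0))  // translated to b.apply(0); prints 42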


scala: list as input and output of function

I am new to Scala. I have a very simple problem.
Given a list in Python:
x = [1, 100, "a1", "b1"]
I can write a function that will return the last two elements:
def f(w):
    if w[0] >= 1 and w[1] <= 100:
        return [w[2], w[3]]
How do I do the equivalent in Scala?
val v = List(1, 100, "a1", "b1")
def g(L: List[Any]): List[String] = {
  if(L(0)>=1 & L(1<=100)) {return List(L(2), L(3))}
}
val w = g(v)
This gets me the error
List[Any] = List(1, 100, a, b)
Incomplete expression
You can't get a List[String] from a List[Any]. (Well, you can, but it's a really bad thing to do.)
Don't, don't, don't create a List[Any]. Unlike Python, Scala is a strictly typed language, which means the compiler keeps a close watch on the type of each variable and every collection. When the compiler loses track of the List type it becomes List[Any], and you've lost all the assistance the compiler offers to help you write programs that don't crash.
To mix types in a collection you can use tuples. Here's the type-safe Scala way to write your g() method.
def g(tup: (Int, Int, String, String)): List[String] =
  if (tup._1 >= 1 & tup._2 <= 100) List(tup._3, tup._4)
  else List()
Usage:
val v = (1, 100, "a1", "b1")
val w = g(v) //w: List[String] = List(a1, b1)
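As a stylistic variant (my own sketch, not part of the original answer), you can destructure the tuple with a pattern match instead of the _1 ... _4 accessors:
def g2(tup: (Int, Int, String, String)): List[String] = tup match {
  case (lo, hi, a, b) if lo >= 1 && hi <= 100 => List(a, b)
  case _ => List()
}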
It seems like you have a typo here:
if(L(0)>=1 & L(1<=100)) {return List(L(2), L(3))}
Wouldn't it be like this?
if(L(0)>=1 & L(1)<=100) {return List(L(2), L(3))}
The error seems to point out there's something wrong with that extra bracket there.
You can use a built-in function in Scala that does this!
scala> List(1,2,3,4,5).takeRight(2)
res44: List[Int] = List(4, 5)

Accessing Previous output while operator chaining in Scala

How do I access an intermediate output value to perform a subsequent operation? For example:
scala> List(1,4,3,4,4,5,6,7)
res0: List[Int] = List(1, 4, 3, 4, 4, 5, 6, 7)
scala> res0.removeDuplicates.slice(0, ???.size -2)
In the above line, I need to perform the slice operation after removing duplicates. To do this, how do I access the output of .removeDuplicates, so that I can use its size for the slice operation?
I need to perform this in a single step, not in multiple steps like:
scala> res0.removeDuplicates
res1: List[Int] = List(1, 4, 3, 5, 6, 7)
scala> res1.slice(0, res1.size -2)
res2: List[Int] = List(1, 4, 3, 5)
I want to access intermediate results in the final operation; removeDuplicates is just an example.
In list.op1().op2().op3().finalop(), I want to access the output of op1, op2, and op3 in finalop.
Wrapping it into an Option may be one option (no pun intended):
val finalResult = Some(foo).map { foo =>
  foo.op1(foo.stuff)
}.map { foo =>
  foo.op2(foo.stuff)
}.map { foo =>
  foo.op3(foo.stuff)
}.get.finalOp
You can make the wrapping part implicit to make it a little nicer:
object Tapper {
  implicit class Tapped[T](val v: T) extends AnyVal {
    def tap[R](f: T => R) = f(v)
  }
}
import Tapper._

val finalResult = foo
  .tap(f => f.op1(f.stuff))
  .tap(f => f.op2(f.stuff))
  .tap(f => f.finalOp(f.stuff))
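As an aside, Scala 2.13 and later ship this pattern in the standard library as scala.util.chaining: its pipe is the transforming version (what tap does above), while the standard tap runs a side effect and returns the value unchanged. A small sketch:
import scala.util.chaining._

val result = List(1, 2, 2, 3)
  .pipe(xs => xs.distinct)              // transform, like the custom tap above
  .tap(xs => println(s"distinct: $xs")) // side effect; returns the list unchanged
  .pipe(xs => xs.slice(0, xs.size - 2))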
With for comprehension it is possible to compose operations in quite readable way with ability to access intermediate results:
val res = for {
  ls1 <- Option(list.op1())
  ls2 = ls1.op2()       // can access list and ls1
  ls3 = ls2.op3()       // can access list, ls1 and ls2
} yield ls3.finalOp()   // can access list, ls1, ls2 and ls3
For example:
scala> val ls = List(1,1,2,2,3,3,4,4)
ls: List[Int] = List(1, 1, 2, 2, 3, 3, 4, 4)
scala> :paste
// Entering paste mode (ctrl-D to finish)
for {
  ls1 <- Option(ls.map(_ * 2))
  ls2 = ls1.map(_ + ls1.size)
  ls3 = ls2.filter(_ < ls1.size + ls2.size)
} yield ls3.sum
// Exiting paste mode, now interpreting.
res15: Option[Int] = Some(72)
You will not need to know the length if you use dropRight:
scala> val a = List(1,4,3,4,4,5,6,7)
a: List[Int] = List(1, 4, 3, 4, 4, 5, 6, 7)
scala> a.dropRight(2)
res0: List[Int] = List(1, 4, 3, 4, 4, 5)
So do this: res0.removeDuplicates.dropRight(2)
If you really need it in one function, you can write a custom foldLeft, something like this:
import scala.collection.mutable

var count = 0
val found = mutable.HashSet[Int]()
res0.foldLeft(List[Int]()) { (z, i) =>
  if (!found.contains(i) && count < 4) { // keep the first 4 distinct elements
    found += i
    count += 1
    z :+ i
  } else z
}
However, I don't really see the problem with chaining calls as in res0.removeDuplicates.slice(...). One benefit of functional programming is that the compiler is free to optimize in situations like this, where we just want a certain behavior and don't want to pin down the implementation.
You want to process some data through a series of transformations: someData -> op1 -> op2 -> op3 -> finalOp. However, inside op3, you would like to have access to intermediate results from the processing done in op1. The key here is to pass to the next function in the processing chain all the information that will be required downstream.
Let's say that your input is xs: Seq[String] and op1 is of type (xs: Seq[String]) => Seq[String]. You want to modify op1 to return case class ResultWrapper(originalInputLength: Int, deduplicatedItems: Seq[String], somethingNeededInOp5: SomeType). If all of your ops pass along what the other ops need down the line, you will get what you need. It's not very elegant, because there is coupling between your ops: the upstream needs to save the info that the downstream needs. They are not really "different operations" any more at this point.
One thing you can do is to use a Map[A,B] as your "result wrapper". This way, there is less coupling between ops, but less type safety as well.
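To make the wrapper idea concrete, here is a minimal sketch (the ResultWrapper fields and the ops are illustrative assumptions, not a fixed API):
// Illustrative only: each op returns its result plus whatever downstream needs.
case class ResultWrapper(originalInputLength: Int, deduplicated: Seq[String])

def op1(xs: Seq[String]): ResultWrapper =
  ResultWrapper(xs.length, xs.distinct)

def finalOp(r: ResultWrapper): Seq[String] =
  r.deduplicated.take(r.originalInputLength - 2) // uses info carried from op1

finalOp(op1(Seq("a", "b", "b", "c"))) // Seq(a, b)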

Spark: aggregateByKey into a pair of lists

I have a keyed set of records that contain a book id as well as a reader id field:
case class Book(book: Int, reader: Int)
How can I use aggregateByKey to combine all records with the same key into one record of the following format?
(key: Int, (books: List[Int], readers: List[Int]))
Here books is a list of all books and readers is a list of all readers from records with the given key.
My code (below) results in compilation errors:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkContext, SparkConf}

object Aggr {

  case class Book(book: Int, reader: Int)

  val bookArray = Array(
    (2, Book(book = 1, reader = 700)),
    (3, Book(book = 2, reader = 710)),
    (4, Book(book = 3, reader = 710)),
    (2, Book(book = 8, reader = 710)),
    (3, Book(book = 1, reader = 720)),
    (4, Book(book = 2, reader = 720)),
    (4, Book(book = 8, reader = 720)),
    (3, Book(book = 3, reader = 730)),
    (4, Book(book = 8, reader = 740))
  )

  def main(args: Array[String]) {
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

    // set up environment
    val conf = new SparkConf()
      .setMaster("local[5]")
      .setAppName("Aggr")
      .set("spark.executor.memory", "2g")
    val sc = new SparkContext(conf)

    val books = sc.parallelize(bookArray)
    val aggr = books.aggregateByKey((List()[Int], List()[Int]))
    ({ case ((bookList: List[Int], readerList: List[Int]), Book(book, reader)) =>
         (bookList ++ List(book), readerList ++ List(reader)) },
     { case ((bookLst1: List[Int], readerLst1: List[Int]),
             (bookLst2: List[Int], readerLst2: List[Int])) =>
         (bookLst1 ++ bookLst2, readerLst1 ++ readerLst2) })
  }
}
Errors:
Error:(36, 44) object Nil does not take type parameters.
val aggr = books.aggregateByKey((List()[Int], List()[Int]))
Error:(37, 6) missing parameter type for expanded function. The argument types of an anonymous function must be fully known. (SLS 8.5) Expected type was: ?
({case
 ^
Update
When initializing the accumulator with (List(0), List(0)), everything compiles, but extra zeros are inserted into the result. Very interesting:
val aggr: RDD[(Int, (List[Int], List[Int]))] = books.aggregateByKey((List(0), List(0)))(
  { case ((bookList: List[Int], readerList: List[Int]), Book(book, reader)) =>
      (bookList ++ List(book), readerList ++ List(reader)) },
  { case ((bookLst1: List[Int], readerLst1: List[Int]),
          (bookLst2: List[Int], readerLst2: List[Int])) =>
      (bookLst1 ++ bookLst2, readerLst1 ++ readerLst2) }
)
This results in the following output:
(2,(List(0, 1, 0, 8),List(0, 700, 0, 710)))
(3,(List(0, 2, 0, 1, 0, 3),List(0, 710, 0, 720, 0, 730)))
(4,(List(0, 3, 0, 2, 8, 0, 8),List(0, 710, 0, 720, 720, 0, 740)))
Provided I could use empty lists as initializers instead of lists with zeros, I would not have the extra zeros, of course; the lists would concatenate nicely.
Can somebody please explain to me why the empty-list initializer (List(), List()) results in an error while (List(0), List(0)) compiles? Is it a Scala bug or a feature?
Actually you're doing everything OK; it's only that your syntax is a bit off. You need to fix the type parameters and move one parenthesis, from this:
val aggr = books.aggregateByKey((List()[Int], List()[Int]))
({case
Into this:
val aggr = books.aggregateByKey((List[Int](), List[Int]())) (
{case
These links might shed some light on why this didn't work for you:
What are the precise rules for when you can omit parenthesis, dots, braces, = (functions), etc.? (first answer)
http://docs.scala-lang.org/style/method-invocation.html#suffix-notation
Answering your update: you misplaced the type declaration for your lists. If you had declared them as List[Int]() instead of List()[Int], everything would have worked. The compiler error message is correctly telling you the problem, but it's not easy to understand: by putting [Int] at the end, you are passing a type parameter to the result of the List() call. The result of List() is Nil, a singleton object representing an empty list, and it does not take type parameters.
As for why List(0) also works: Scala performs type inference where it can. You've given one element of the list, 0, which is an integer, so it infers that this is a List[Int]. Note, however, that this does not declare an empty list, but a list with a single zero. You probably want List[Int]() instead.
Just using List() doesn't work because Scala cannot infer the element type of an empty list.
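Putting both fixes together, the corrected call from the question would look roughly like this (a sketch; the pattern-matching anonymous functions compile because the zero value now fixes the accumulator type):
val aggr = books.aggregateByKey((List[Int](), List[Int]()))(
  { case ((bookList, readerList), Book(book, reader)) =>
      (bookList :+ book, readerList :+ reader) },
  { case ((books1, readers1), (books2, readers2)) =>
      (books1 ++ books2, readers1 ++ readers2) }
)
// e.g. key 2 yields (2, (List(1, 8), List(700, 710))), with no stray zeros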

HOMap implementation example

I was watching this video by Daniel Spiewak and tried to implement the sample about Higher Kinds from it. Here's what I got:
/* bad style */
val map: Map[Option[Any], List[Any]] = Map(
  Some("foo") -> List("foo", "bar", "baz"),
  Some(42) -> List(1, 1, 2, 3, 5, 8),
  Some(true) -> List(true, false, true, false)
)

val xs: List[String] =
  map(Some("foo")).asInstanceOf[List[String]] // ugly cast
val ys: List[Int] =
  map(Some(42)).asInstanceOf[List[Int]] // another one

println(xs)
println(ys)

/* higher kinds usage */
// HOMAP :: ((* => *) x (* => *)) => *
class HOMap[K[_], V[_]](delegate: Map[K[Any], V[Any]]) {
  def apply[A](key: K[A]): V[A] =
    delegate(key.asInstanceOf[K[Any]]).asInstanceOf[V[A]]
}

object HOMap {
  type Pair[K[_], V[_]] = (K[A], V[A]) forSome { type A }

  def apply[K[_], V[_]](tuples: Pair[K, V]*) =
    new HOMap[K, V](Map(tuples: _*))
}

val map_b: HOMap[Option, List] = HOMap[Option, List](
  Some("foo") -> List("foo", "bar", "baz"),
  Some(42) -> List(1, 1, 2, 3, 5, 8),
  Some(true) -> List(true, false, true, false)
)

val xs_b: List[String] = map_b(Some("foo"))
val ys_b: List[Int] = map_b(Some(42))

println(xs_b)
println(ys_b)
Unfortunately, launching this I get a type mismatch error:
username#host:~/workspace/scala/samples$ scala higher_kinds.scala
/home/username/workspace/scala/samples/higher_kinds.scala:30: error: type mismatch;
found : Main.$anon.HOMap.Pair[K,V]*
required: Seq[(K[Any], V[Any])]
new HOMap[K, V](Map(tuples: _*))
^
one error found
My questions:
How can I fix this? I fully understand that I just need to pass in the right type, but my experience with this kind of thing in Scala is poor and I can't figure it out.
Why does this happen? The tuples: _* expansion is widely used for passing arguments to Map, but here it somehow produces a strange type, Main.$anon.HOMap.Pair[K,V]*, and not what it's supposed to give.
Why does that example no longer work? Have recent changes to the Scala language altered some syntax?
Thanks for the answers!
The problem is in the type variance conditions. In the line def apply[K[_], V[_]] you need a guarantee that the containers K[_] and V[_] can be cast to K[Any] and V[Any].
Just add a covariance constraint (+) to the K and V containers:
object HOMap {
  def apply[K[+_], V[+_]](tuples: (Pair[K[A], V[A]] forSome { type A })*) =
    new HOMap[K, V](Map(tuples: _*))
}

SortedSet map does not always preserve element ordering in result?

Given the following Scala 2.9.2 code:
Updated with a non-working example:
import collection.immutable.SortedSet

case class Bar(s: String)

trait Foo {
  val stuff: SortedSet[String]
  def makeBars(bs: Map[String, String]) =
    stuff.map(k => Bar(bs.getOrElse(k, "-"))).toList
}

case class Bazz(rawStuff: List[String]) extends Foo {
  val stuff = SortedSet(rawStuff: _*)
}

// test it out....
val b = Bazz(List("A", "B", "C"))
b.makeBars(Map("A" -> "1", "B" -> "2", "C" -> "3"))
// List[Bar] = List(Bar(1), Bar(2), Bar(3))
// Looks good?

// Make a really big list not in order. This is why we pass it to a SortedSet...
val data = Stream.continually(util.Random.shuffle(List("A","B","C","D","E","F"))).take(100).toList
val b2 = Bazz(data.flatten)

// And how about a sparse map...?
val bs = util.Random.shuffle(Map("A" -> "1", "B" -> "2", "E" -> "5").toList).toMap
b2.makeBars(bs)
// res24: List[Bar] = List(Bar(1), Bar(2), Bar(-), Bar(5))
I've discovered that, in some cases, the makeBars method of classes extending Foo does not return a sorted List. In fact, the list ordering does not reflect the ordering of the SortedSet.
What am I missing about the above code? Why doesn't Scala always map a SortedSet to a List with elements ordered by the SortedSet's ordering?
You're being surprised by implicit resolution.
The map method requires a CanBuildFrom instance that's compatible with the target collection type (in simple cases, identical to the source collection type) and the mapper function's return type.
In the particular case of SortedSet, its implicit CanBuildFrom requires that an Ordering[A] (where A is the return type of the mapper function) be available. When your map function returns something that the compiler already knows how to find an Ordering for, you're good:
scala> val ss = collection.immutable.SortedSet(10,9,8,7,6,5,4,3,2,1)
ss: scala.collection.immutable.SortedSet[Int] = TreeSet(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

scala> val result1 = ss.map(_ * 2)
result1: scala.collection.immutable.SortedSet[Int] = TreeSet(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
// still sorted because Ordering[Int] is readily available
scala> val result2 = ss.map(_ + " is a number")
result2: scala.collection.immutable.SortedSet[String] = TreeSet(1 is a number, 10 is a number, 2 is a number, 3 is a number, 4 is a number, 5 is a number, 6 is a number, 7 is a number, 8 is a number, 9 is a number)
// The default Ordering[String] is an "asciibetical" sort,
// so 10 comes between 1 and 2. :)
However, when your mapper function turns out to return a type for which no Ordering is known, the implicit on SortedSet doesn't match (specifically, no value can be found for its implicit parameter), so the compiler looks "upward" for a compatible CanBuildFrom and finds the generic one from Set.
scala> case class Foo(i: Int)
defined class Foo
scala> val result3 = ss.map(Foo(_))
result3: scala.collection.immutable.Set[Foo] = Set(Foo(10), Foo(4), Foo(6), Foo(7), Foo(1), Foo(3), Foo(5), Foo(8), Foo(9), Foo(2))
// The default Set is a hash set, therefore ordering is not preserved
Of course, you can get around this by simply supplying an instance of Ordering[Foo] that does whatever you expect:
scala> implicit val fooIsOrdered: Ordering[Foo] = Ordering.by(_.i)
fooIsOrdered: Ordering[Foo] = scala.math.Ordering$$anon$9@7512dbf2

scala> val result4 = ss.map(Foo(_))
result4: scala.collection.immutable.SortedSet[Foo] = TreeSet(Foo(1), Foo(2), Foo(3), Foo(4), Foo(5), Foo(6), Foo(7), Foo(8), Foo(9), Foo(10))
// And we're back!
Finally, note that toy examples often don't exhibit the problem, because the Scala collection library has special implementations for small (n <= 4) Sets and Maps, and those happen to iterate in insertion order.
You're probably making assumptions about what SortedSet does based on Java. You need to specify what order you want the elements to be in. See http://www.scala-lang.org/docu/files/collections-api/collections_8.html
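For the makeBars example in the question, a minimal fix along the lines of the answer above (my own sketch) is to map over an ordered sequence instead of the set, so no Ordering[Bar] is needed and the key order is preserved:
def makeBars(bs: Map[String, String]): List[Bar] =
  stuff.toList.map(k => Bar(bs.getOrElse(k, "-")))
// stuff.toList iterates in the SortedSet's key order, and List.map keeps it.
// (An implicit Ordering[Bar] would also restore a SortedSet, but it would
// re-sort by Bar's field rather than by the original keys.)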