I can use an = in a Scala for-comprehension (as specified in section 6.19 of the SLS) as follows:
Option
Suppose I have some function String => Option[Int]:
scala> def intOpt(s: String) = try { Some(s.toInt) } catch { case _ => None }
intOpt: (s: String)Option[Int]
Then I can use it thus
scala> for {
| str <- Option("1")
| i <- intOpt(str)
| val j = i + 10 //Note use of = in generator
| }
| yield j
res18: Option[Int] = Some(11)
It was my understanding that this was essentially equivalent to:
scala> Option("1") flatMap { str => intOpt(str) } map { i => i + 10 } map { j => j }
res19: Option[Int] = Some(11)
That is, the embedded generator was a way of injecting a map into a sequence of flatMap calls. So far so good.
Either.RightProjection
What I actually want is a for-comprehension like the previous example, but over the Either monad.
However, the same chain written against the Either.RightProjection monad/functor does not compile:
scala> def intEither(s: String): Either[Throwable, Int] =
| try { Right(s.toInt) } catch { case x => Left(x) }
intEither: (s: String)Either[Throwable,Int]
Then use:
scala> for {
| str <- Option("1").toRight(new Throwable()).right
| i <- intEither(str).right //note the "right" projection is used
| val j = i + 10
| }
| yield j
<console>:17: error: value map is not a member of Product with Serializable with Either[java.lang.Throwable,(Int, Int)]
i <- intEither(str).right
^
The issue has something to do with the function that a right-projection expects as an argument to its flatMap method (i.e. it expects an R => Either[L, R]). But even if I drop the call to right on the second generator, it still won't compile:
scala> for {
| str <- Option("1").toRight(new Throwable()).right
| i <- intEither(str) // no "right" projection
| val j = i + 10
| }
| yield j
<console>:17: error: value map is not a member of Either[Throwable,Int]
i <- intEither(str)
^
Mega-Confusion
But now I get doubly confused. The following works just fine:
scala> for {
| x <- Right[Throwable, String]("1").right
| y <- Right[Throwable, String](x).right //note the "right" here
| } yield y.toInt
res39: Either[Throwable,Int] = Right(1)
But this does not:
scala> Right[Throwable, String]("1").right flatMap { x => Right[Throwable, String](x).right } map { y => y.toInt }
<console>:14: error: type mismatch;
found : Either.RightProjection[Throwable,String]
required: Either[?,?]
Right[Throwable, String]("1").right flatMap { x => Right[Throwable, String](x).right } map { y => y.toInt }
^
I thought these were equivalent
What is going on?
How can I embed an = generator in a for comprehension across an Either?
The fact that you cannot embed the = in the for-comprehension is related to this issue reported by Jason Zaugg; the solution is to Right-bias Either (or create a new data type isomorphic to it).
For your mega-confusion, you expanded the for sugar incorrectly. The desugaring of
for {
b <- x(a)
c <- y(b)
} yield z(c)
is
x(a) flatMap { b =>
y(b) map { c =>
z(c) }}
and not
x(a) flatMap { b => y(b)} map { c => z(c) }
Hence you should have done this:
scala> Right[Throwable, String]("1").right flatMap { x => Right[Throwable, String](x).right map { y => y.toInt } }
res49: Either[Throwable,Int] = Right(1)
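To see why the nesting matters, here is a minimal sketch using Option (chosen only because it behaves the same across Scala versions; half is a hypothetical helper, not something from the question). The nested form keeps b in scope for the innermost function, which the chained form cannot do.

```scala
object NestedDesugaring {
  // Hypothetical helper for illustration: halves even numbers, fails on odd ones.
  def half(n: Int): Option[Int] = if (n % 2 == 0) Some(n / 2) else None

  // The for-comprehension...
  val sugared: Option[Int] =
    for {
      b <- half(20)
      c <- half(b)
    } yield b + c

  // ...and its nested desugaring: the map sits inside the flatMap,
  // so both b and c are visible in the final function.
  val nested: Option[Int] =
    half(20).flatMap { b =>
      half(b).map { c =>
        b + c
      }
    }
}
```

Both values come out the same, because the compiler produces the nested shape and not a flat chain of calls.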
More fun about desugaring (the `j = i + 10` issue)
for {
b <- x(a)
c <- y(b)
x1 = f1(b)
x2 = f2(b, x1)
...
xn = fn(.....)
d <- z(c, xn)
} yield w(d)
is desugared into
x(a) flatMap { b =>
y(b) map { c =>
x1 = ..
...
xn = ..
(c, x1, .., xn)
} flatMap { (_c1, _x1, .., _xn) =>
z(_c1, _xn) map w }}
So in your case, y(b) has result type Either which doesn't have map defined.
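As a sketch of what right-biasing buys you: assuming Scala 2.12 or later, where Either itself has map and flatMap, the original comprehension can be written without projections and the = definition compiles. The names RightBiasedEither and plusTen are made up for this illustration.

```scala
import scala.util.Try

object RightBiasedEither {
  // Same idea as intEither in the question, written with Try (toEither needs Scala 2.12+).
  def intEither(s: String): Either[Throwable, Int] = Try(s.toInt).toEither

  // Hypothetical wrapper so the comprehension can be exercised with different inputs.
  def plusTen(input: Option[String]): Either[Throwable, Int] =
    for {
      str <- input.toRight[Throwable](new NoSuchElementException("no input"))
      i   <- intEither(str) // no .right projection needed
      j = i + 10            // the value definition now desugars to Either.map
    } yield j
}
```

On older Scala versions, where Either has no map, this sketch does not apply and the projection-based workaround is still needed.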
Related
I've got a draft that works, in the sense that it delivers the correct answer. But I suspect it could be improved aesthetically.
Aim: Find the first solution in a list of many possible solutions. Once the first solution is found, don't calculate any further. In a real-world application, computing each solution/non-solution would of course be more complex.
Don't like: The Solution=Left and NoSolution=Right aliasing is counterintuitive, since Right normally stands for success and here Left and Right are swapped (technically, when using Either, only Left short-circuits the for-comprehension).
Is there a nice way to improve this implementation, or is there another solution?
package playground
object Test {
def main(args: Array[String]): Unit = {
test
}
val Solution = Left
val NoSolution = Right
def test: Unit = {
{
// Find the first solution in a list of computations and print it out
val result = for {
_ <- if (1 == 2) Solution("impossible") else NoSolution()
_ <- NoSolution()
_ <- NoSolution(3)
_ <- Solution("*** Solution 1 ***")
_ <- NoSolution("oh no")
_ <- Solution("*** Solution 2 ***")
x <- NoSolution("no, no")
} yield x
if (result.isLeft)
println(result.merge) // Prints: *** Solution 1 ***
}
}
}
So you're looking for something that's "monaduck": i.e. has flatMap/map but doesn't necessarily obey any monadic laws (Scala doesn't even require that flatMap have monadic shape: the chain after desugaring just has to typecheck); cf. duck-typing.
trait Trial[+Result] {
def result: Option[Result]
def flatMap[R >: Result](f: Unit => Trial[R]): Trial[R]
def map[R](f: Result => R): Trial[R]
}
case object NoSolution extends Trial[Nothing] {
def result = None
def flatMap[R](f: Unit => Trial[R]): Trial[R] = f(())
def map[R](f: Nothing => R): Trial[R] = this // Result is not in scope in this object; for Trial[Nothing] it is Nothing
}
case class Solution[Result](value: Result) extends Trial[Result] {
def result = Some(value)
def flatMap[R >: Result](f: Unit => Trial[R]): Trial[R] = this
def map[R](f: Result => R): Trial[R] = Solution(f(value))
}
scala> for {
| _ <- if (1 == 2) Solution("nope") else NoSolution
| _ <- NoSolution
| _ <- Solution("yay!")
| _ <- NoSolution
| x <- Solution("nope")
| } yield x
res0: Trial[String] = Solution(yay!)
scala> for {
| _ <- if (1 == 2) Solution("nope") else NoSolution
| _ <- NoSolution
| _ <- Solution("yay!")
| x <- NoSolution
| } yield x
res1: Trial[String] = Solution(yay!)
scala> for {
| _ <- if (1 == 2) Solution("nope") else NoSolution
| x <- NoSolution
| } yield x
res2: Trial[String] = NoSolution
Clearly, monadic laws are being violated: the only thing we could use for pure is Solution, but
scala> val f: Unit => Trial[Any] = { _ => NoSolution }
f: Unit => Trial[Any] = $Lambda$107382/0x00000008433be840@6c0e35d7
scala> Solution(5).flatMap(f)
res7: Trial[Any] = Solution(5)
scala> f(5)
<console>:13: warning: a pure expression does nothing in statement position
f(5)
^
res8: Trial[Any] = NoSolution
Absent Scala's willingness to convert any pure value to Unit, that wouldn't even type check, but still, it breaks left identity.
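If the for-comprehension syntax is not essential, a law-abiding alternative (a different technique from the Trial type above) is to chain ordinary Options with orElse. Its argument is by-name, so candidates after the first hit are never evaluated. A minimal sketch; candidate and the evaluated counter are hypothetical and exist only to make the short-circuiting observable:

```scala
object FirstSolution {
  // Counts how many candidates were actually computed.
  var evaluated = 0

  // Hypothetical stand-in for an expensive computation that may or may not find a solution.
  def candidate(result: Option[String]): Option[String] = {
    evaluated += 1
    result
  }

  // orElse takes its alternative by name, so evaluation stops at the first Some.
  val first: Option[String] =
    candidate(None)
      .orElse(candidate(Some("*** Solution 1 ***")))
      .orElse(candidate(Some("*** Solution 2 ***")))
}
```

Here Some plays the role of Solution and None the role of NoSolution, so success is no longer encoded as the "wrong" side of an Either.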
I am very curious how Scala desugars the following for-comprehension:
for {
a <- Option(5)
b = a * 2
c <- if (b == 10) Option(100) else None
} yield b + c
My difficulty comes from having both b and c in the yield, because they seem to be bound at different steps
This is the sanitized output of desugar - a command available in Ammonite REPL:
Option(5)
.map { a =>
val b = a * 2;
(a, b)
}
.flatMap { case (a, b) =>
(if (b == 10) Option(100) else None)
.map(c => b + c)
}
Both b and c can be present in yield because it does not desugar to chained calls to map/flatMap, but rather to nested calls.
You can even ask the compiler. The following command:
scala -Xprint:parser -e "for {
a <- Option(5)
b = a * 2
c <- if (b == 10) Option(100) else None
} yield b + c"
yields this output
[[syntax trees at end of parser]] // scalacmd7617799112170074915.scala
package <empty> {
object Main extends scala.AnyRef {
def <init>() = {
super.<init>();
()
};
def main(args: Array[String]): scala.Unit = {
final class $anon extends scala.AnyRef {
def <init>() = {
super.<init>();
()
};
Option(5).map(((a) => {
val b = a.$times(2);
scala.Tuple2(a, b)
})).flatMap(((x$1) => x$1: @scala.unchecked match {
case scala.Tuple2((a @ _), (b @ _)) => if (b.$eq$eq(10))
Option(100)
else
None.map(((c) => b.$plus(c)))
}))
};
new $anon()
}
}
}
Taking only the piece you are interested in and improving the readability, you get this:
Option(5).map(a => {
val b = a * 2
(a, b)
}).flatMap(_ match {
case (_, b) =>
if (b == 10)
Option(100)
else
None.map(c => b + c)
})
Edit
As reported in a comment, literally translating from the compiler output seems to highlight a bug in how the desugared expression is rendered. The sum should be mapped on the result of the if expression, rather than on the None in the else branch:
Option(5).map(a => {
val b = a * 2
(a, b)
}).flatMap(_ match {
case (_, b) =>
(if (b == 10) Option(100) else None).map(c => b + c)
})
It's probably worth it to ask the compiler team if this is a bug.
These two snippets are equivalent:
scala> for {
| a <- Option(5)
| b = a * 2
| c <- if (b == 10) Option(100) else None
| } yield b + c
res70: Option[Int] = Some(110)
scala> for {
| a <- Option(5)
| b = a * 2
| if (b == 10)
| c <- Option(100)
| } yield b + c
res71: Option[Int] = Some(110)
Since no collection yielding multiple values is involved, there is only one big step or, arguably, 3 to 4 small steps. If a had been None, the whole loop would have terminated early, yielding None.
The desugaring is a flatMap/withFilter/map.
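Spelled out by hand for the second snippet, that chain looks roughly like the following sketch. The input is turned into a parameter (a hypothetical addition) so the guard can be seen to fail as well; the exact trees scalac generates differ in detail.

```scala
object GuardDesugaring {
  // The comprehension with a guard, parameterised on the starting value.
  def sugared(n: Int): Option[Int] =
    for {
      a <- Option(n)
      b = a * 2
      if (b == 10)
      c <- Option(100)
    } yield b + c

  // Hand-written equivalent: the definition becomes a map to a tuple,
  // the guard becomes withFilter, and the last generator a nested map.
  def desugared(n: Int): Option[Int] =
    Option(n)
      .map { a => val b = a * 2; (a, b) }
      .withFilter { case (_, b) => b == 10 }
      .flatMap { case (_, b) => Option(100).map(c => b + c) }
}
```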
Given:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
def f: Future[Either[String, Int]] = Future { Right(100)}
def plus10(x: Int): Future[Either[String, Int]] =
Future { Right(x + 10) }
I'm trying to chain the Future[...] together as so:
scala> for {
| x <- f
| y <- for { a <- x.right } yield plus10(a)
| } yield y
<console>:17: error: value map is not a member of Product with
Serializable with
scala.util.Either[String,scala.concurrent.Future[Either[String,Int]]]
y <- for { a <- x.right } yield plus10(a)
^
I am expecting to get: Future{Right(110)} as a result, but I get the above compile-time error.
Travis Brown gave an excellent answer on how to use Monad Transformers to fix my code here. However, how can I fix my code without Monad Transformers?
Turns out that I can use Either#fold:
scala> for {
| a <- f
| b <- a.fold(_ => Future { Left("bad") }, xx => plus10(xx) )
| } yield b
res16: scala.concurrent.Future[Either[String,Int]] =
scala.concurrent.impl.Promise$DefaultPromise@67fc2aad
scala> res16.value
res17: Option[scala.util.Try[Either[String,Int]]] =
Some(Success(Right(110)))
I was about to answer when yours appeared, but you might still look at this:
val res = for {
x <- f
y <- x.fold(x => Future{Left(x)}, plus10)
} yield y
It is a little more concise on the right side and keeps the left side.
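If this pattern comes up repeatedly, the fold can be wrapped in a small helper. This is only a sketch: flatMapRight is a made-up name, not part of the standard library, and it uses pattern matching in place of fold.

```scala
import scala.concurrent.{ExecutionContext, Future}

object FutureEither {
  // Hypothetical helper: run f on a Right, pass a Left through untouched.
  def flatMapRight[L, A, B](fa: Future[Either[L, A]])(f: A => Future[Either[L, B]])(
      implicit ec: ExecutionContext): Future[Either[L, B]] =
    fa.flatMap {
      case Right(a) => f(a)
      case Left(l)  => Future.successful[Either[L, B]](Left(l))
    }

  // Same behaviour as plus10 in the question, but already completed.
  def plus10(x: Int): Future[Either[String, Int]] = Future.successful(Right(x + 10))
}
```

With an implicit ExecutionContext in scope, FutureEither.flatMapRight(f)(plus10) then chains the two steps and keeps the original Left value, like the second fold above.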
I've got a hadoopFiles object which is generated from sc.newAPIHadoopFile.
scala> hadoopFiles
res1: org.apache.spark.rdd.RDD[(org.apache.hadoop.io.LongWritable, org.apache.hadoop.io.Text)] = UnionRDD[64] at union at <console>:24
I intend to iterate over all the lines in hadoopFiles with a map and filter them along the way. An if check is applied inside the map, and it causes a compile error:
scala> val rowRDD = hadoopFiles.map(line =>
| line._2.toString.split("\\^") map {
| field => {
| var pair = field.split("=", 2)
| if(pair.length == 2)
| (pair(0) -> pair(1))
| }
| } toMap
| ).map(kvs => Row(kvs("uuid"), kvs("ip"), kvs("plt").trim))
<console>:33: error: Cannot prove that Any <:< (T, U).
} toMap
^
However, if I remove the if(pair.length == 2) part, it works fine:
scala> val rowRDD = hadoopFiles.map(line =>
| line._2.toString.split("\\^") map {
| field => {
| var pair = field.split("=", 2)
| (pair(0) -> pair(1))
| }
| } toMap
| ).map(kvs => Row(kvs("uuid"), kvs("ip"), kvs("plt").trim))
warning: there was one feature warning; re-run with -feature for details
rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.catalyst.expressions.Row] = MappedRDD[66] at map at <console>:33
Could anyone tell me the reason for this phenomenon and show me the correct way to apply the if statement? Thanks a lot!
P.S. We could use this simplified example to test:
"1=a^2=b^3".split("\\^") map {
field => {
var pair = field.split("=", 2)
if(pair.length == 2)
pair(0) -> pair(1)
else
return
}
} toMap
To map over a collection and only keep some of the mapped elements, you can use flatMap. flatMap takes a function that returns a collection, for instance an Option. The if expression then needs an else part that returns an empty Option, i.e. None.
scala> val rowRDD = hadoopFiles.map(line =>
| line._2.toString.split("\\^") flatMap {
| field => {
| var pair = field.split("=", 2)
| if (pair.length == 2)
| Some(pair(0) -> pair(1))
| else
| None
| }
| } toMap
| ).map(kvs => Row(kvs("uuid"), kvs("ip"), kvs("plt").trim))
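The same idea can be tried without Spark on the simplified input from the question. A minimal sketch; ParseFields and parse are illustrative names, and the pattern match stands in for the length check:

```scala
object ParseFields {
  // Splits "k=v^k=v^..." into a Map, silently dropping fields without an '='.
  def parse(line: String): Map[String, String] =
    line.split("\\^").toList.flatMap { field =>
      field.split("=", 2) match {
        case Array(key, value) => Some(key -> value) // well-formed field: keep it
        case _                 => None               // malformed field: skip it
      }
    }.toMap
}
```

Calling ParseFields.parse("1=a^2=b^3") keeps the two well-formed pairs and drops the trailing "3".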
You can use collect:
val res = "1=a^2=b^3".split("\\^") collect {
_.split("=", 2) match {
case Array(a, b) => a -> b
}
} toMap
println(res) // Map(1 -> a, 2 -> b)
In your particular case the following happens:
case class Row(uuid: String, ip: String, plt: String)
val hadoopFiles = List(("", "uuid=a^ip=b^plt"))
val rowRDD = hadoopFiles.map(line =>
line._2.toString.split("\\^") map {
field =>
{
var pair = field.split("=", 2)
val res = if (pair.length == 2)
(pair(0) -> pair(1))
res // res: Any (common super class for (String, String)
// which is Tuple2 and Unit (result for case when
// pair.length != 2)
}
} /* <<< returns Array[Any] */ /*toMap*/ )
//.map(kvs => Row(kvs("uuid"), kvs("ip"), kvs("plt").trim))
The result of inner map is Any and map yields Array[Any]. If you look at toMap definition you will see:
def toMap[T, U](implicit ev: A <:< (T, U)): immutable.Map[T, U] = {
val b = immutable.Map.newBuilder[T, U]
for (x <- self)
b += x // <<< implicit conversion from each `x` of class `A` in `self`
// to (T, U) because we have `implicit ev: A <:< (T, U)`
b.result()
}
For your Array[Any] the compiler cannot find implicit evidence that Any is a subtype of (T, U), which is exactly what Any <:< (T, U) in the error message asks for. Because of this your code fails to compile.
If you add else alternative:
val rowRDD = hadoopFiles.map(line =>
(line._2.toString.split("\\^") map {
field =>
{
var pair = field.split("=", 2)
val res = if (pair.length == 2)
(pair(0) -> pair(1))
else ("" -> "") // dummy, just for demo
res // res: (String, String)
}
} toMap) withDefaultValue ("")
/*withDefaultValue just to avoid Exception for this demo*/ )
.map(kvs => Row(kvs("uuid"), kvs("ip"), kvs("plt").trim))
println(rowRDD) // List(Row(a,b,))
Here your result will be Array[(String, String)], and the evidence (String, String) <:< (T, U) is found with T = U = String. So the code compiles and works.
Is there any difference between this code:
for(term <- term_array) {
val list = hashmap.get(term)
...
}
and:
for(term <- term_array; val list = hashmap.get(term)) {
...
}
Inside the loop I'm changing the hashmap with something like this
hashmap.put(term, string :: list)
While checking for the head of list it seems to be outdated somehow when using the second code snippet.
The difference is that a val written in the for clause is a definition, which the compiler threads through by pattern matching on a tuple, whereas a val written in the loop body is just a value inside a function literal. See Programming in Scala, Section 23.1 For Expressions:
for {
p <- persons // a generator
n = p.name // a definition
if (n startsWith "To") // a filter
} yield n
You see the real difference when you compile sources with scalac -Xprint:typer <filename>.scala:
object X {
val x1 = for (i <- (1 to 5); x = i*2) yield x
val x2 = for (i <- (1 to 5)) yield { val x = i*2; x }
}
After code transforming by the compiler you will get something like this:
private[this] val x1: scala.collection.immutable.IndexedSeq[Int] =
scala.this.Predef.intWrapper(1).to(5).map[(Int, Int), scala.collection.immutable.IndexedSeq[(Int, Int)]](((i: Int) => {
val x: Int = i.*(2);
scala.Tuple2.apply[Int, Int](i, x)
}))(immutable.this.IndexedSeq.canBuildFrom[(Int, Int)]).map[Int, scala.collection.immutable.IndexedSeq[Int]]((
(x$1: (Int, Int)) => (x$1: (Int, Int) @unchecked) match {
case (_1: Int, _2: Int)(Int, Int)((i @ _), (x @ _)) => x
}))(immutable.this.IndexedSeq.canBuildFrom[Int]);
private[this] val x2: scala.collection.immutable.IndexedSeq[Int] =
scala.this.Predef.intWrapper(1).to(5).map[Int, scala.collection.immutable.IndexedSeq[Int]](((i: Int) => {
val x: Int = i.*(2);
x
}))(immutable.this.IndexedSeq.canBuildFrom[Int]);
This can be simplified to:
val x1 = (1 to 5).map {i =>
val x: Int = i * 2
(i, x)
}.map {
case (i, x) => x
}
val x2 = (1 to 5).map {i =>
val x = i * 2
x
}
Instantiating variables inside for loops makes sense if you want to use that variable in the for statement, like:
for (i <- is; a = something; if (a)) {
...
}
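And the reason why your list is outdated is that the second snippet translates to a map call followed by a foreach call, such as:
term_array.map {
term => val list = hashmap.get(term); (term, list)
} foreach {
case (term, list) => ...
}
So for a strict collection every lookup has already run by the time you reach ..., and changes made to the hashmap in the loop body are not reflected in list. The other example translates to: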
term_array.foreach {
term => val list= hashmap.get(term)
...
}
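The effect is easy to reproduce in isolation. The sketch below assumes a strict collection (List) and a mutable map; both method names are hypothetical. Each method records the size of list as seen on every iteration.

```scala
import scala.collection.mutable

object StaleLookupDemo {
  // Lookup inside the loop body: each iteration sees the updates made by earlier ones.
  def lookupInBody(terms: List[String]): List[Int] = {
    val hashmap = mutable.Map.empty[String, List[String]]
    val seen = mutable.ListBuffer.empty[Int]
    for (term <- terms) {
      val list = hashmap.getOrElse(term, Nil)
      seen += list.size
      hashmap.put(term, "entry" :: list)
    }
    seen.toList
  }

  // Lookup as a definition in the for clause: all lookups run in the map step,
  // before the body has modified the map.
  def lookupInGenerator(terms: List[String]): List[Int] = {
    val hashmap = mutable.Map.empty[String, List[String]]
    val seen = mutable.ListBuffer.empty[Int]
    for (term <- terms; list = hashmap.getOrElse(term, Nil)) {
      seen += list.size
      hashmap.put(term, "entry" :: list)
    }
    seen.toList
  }
}
```

For the input List("a", "a", "a") the first method observes sizes 0, 1, 2, while the second observes 0, 0, 0, which is the "outdated" list from the question.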