I'm trying to use cats-effect in Scala, and at the end of the world I have the type:
IO[Vector[IO[Vector[IO[Unit]]]]]
I found only one method to run it:
for {
  row <- rows.unsafeRunSync()
} yield
  for {
    cell <- row.unsafeRunSync()
  } yield cell.handleErrorWith(errorHandlingFunc).unsafeRunSync()
But it looks pretty ugly. Please help me understand how I can perform complex side effects.
UPDATE:
1) First IO - I open an Excel file and get the vector of rows, i.e. IO[Vector[Row]].
2) Second IO - I perform a query to the DB for each row. I can't compose the IO monad with Vector[_].
3) Third IO - I create a PDF file for each row from the Excel file, using the Vector[Results] from the DB.
So I have the following functions:
1) String=>IO[Vector[Row]]
2) Row=>IO[Vector[Results]]
3) Vector[Results] => IO[Unit]
For the sake of example here's a nonsense action I've just made up off the top of my head with the same type:
import cats.effect.IO
val actions: IO[Vector[IO[Vector[IO[Unit]]]]] =
  IO(readLine).flatMap(in => IO(in.toInt)).map { count =>
    (0 until count).toVector.map { _ =>
      IO(System.nanoTime).map { t =>
        (0 until 2).toVector.map { _ =>
          IO(println(t.toString))
        }
      }
    }
  }
Here we're reading a string from standard input, parsing it as an integer, looking at the current time that many times, and printing it twice each time.
The correct way to flatten this type would be to use sequence to rearrange the layers:
import cats.implicits._
val program = actions.flatMap(_.sequence).flatMap(_.flatten.sequence_)
(Or something similar—there are lots of reasonable ways you could write this.)
This program has type IO[Unit], and works as we'd expect:
scala> program.unsafeRunSync
// I typed "3" here
8058983807657
8058983807657
8058984254443
8058984254443
8058984270434
8058984270434
Any time you see a deeply nested type involving multiple layers of IO and collections like this, though, it's likely that the best thing to do is to avoid getting into that situation in the first place (usually by using traverse). In this case we could rewrite our original actions like this:
val actions: IO[Unit] =
  IO(readLine).flatMap(in => IO(in.toInt)).flatMap { count =>
    (0 until count).toVector.traverse_ { _ =>
      IO(System.nanoTime).flatMap { t =>
        (0 until 2).toVector.traverse { _ =>
          IO(println(t.toString))
        }
      }
    }
  }
This will work exactly the same way as our program, but we've avoided the nesting by replacing the maps in our original actions with either flatMap or traverse. Knowing which you need where is something that you learn through practice, but when you're starting out it's best to go in the smallest steps possible and follow the types.
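Applied to the three functions in the update, the same pattern collapses the whole pipeline into a single IO[Unit] without ever building the nested type. Here's a minimal sketch, using the hypothetical names readRows, queryDb, and writePdf for those three functions, and reusing the errorHandlingFunc from the question:
import cats.effect.IO
import cats.implicits._

def readRows(path: String): IO[Vector[Row]] = ???      // String => IO[Vector[Row]]
def queryDb(row: Row): IO[Vector[Results]] = ???       // Row => IO[Vector[Results]]
def writePdf(results: Vector[Results]): IO[Unit] = ??? // Vector[Results] => IO[Unit]

def program(path: String): IO[Unit] =
  readRows(path).flatMap { rows =>
    // one IO action per row: query the DB, then write the PDF,
    // with the error handling applied per row
    rows.traverse_ { row =>
      queryDb(row).flatMap(writePdf).handleErrorWith(errorHandlingFunc)
    }
  }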
Let's say I have some Repository API where I have wrapped the transactions in a (Scalaz) Reader monad. Now I want to run computations over the results, and save the results back into the Repository. I tried something like:
type UOW[A] = Reader[Transaction, A]

object Record1Repo {
  override def findAll: UOW[Seq[Record1]] = Reader(t => {
    ...
  })
}
...
repo.run {
  for {
    all: Seq[Record1] <- Record1Repo.findAll
    record: Record <- all
    encoding: Encoding <- Processor.encode(record)
    _ <- Record2Repo.save(Record2(encoding))
  } yield {
    logger.info(s"processed record id=${record.id}")
  }
}
But it falls apart with the futile attempt to map over the results in record <- all.
I'm quite new to this type of functional programming and couldn't find how to express my intention properly. Any suggestions are welcome.
It fails because you are breaking out of the Reader monad.
You start with a Reader, but then you extract from a Seq, so the comprehension cannot be translated into a flatMap/map chain within the Reader structure.
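The way to stay inside the Reader monad is to traverse the Seq with the Reader's Applicative instead of drawing elements from it in the comprehension. Here is a rough sketch of that idea, assuming Processor.encode and Record2Repo.save both return UOW values (using Scalaz's traverse_ over List with the Reader applicative):
import scalaz._
import Scalaz._

val program: UOW[Unit] =
  Record1Repo.findAll.flatMap { all =>
    // traverse_ runs one UOW action per record and discards the results,
    // staying inside the Reader the whole time
    all.toList.traverse_ { record =>
      for {
        encoding <- Processor.encode(record)
        _ <- Record2Repo.save(Record2(encoding))
      } yield logger.info(s"processed record id=${record.id}")
    }
  }

repo.run(program)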
I have this data type:
counted: org.apache.spark.rdd.RDD[(String, Seq[(String, Int)])] = MapPartitionsRDD[24] at groupByKey at <console>:28
And I'm trying to apply the following to this type:
def func = 2
counted.flatMap { x => counted.map { y => ((x._1+","+y._1),func) } }
So each sequence is compared to every other and a function is applied. For simplicity the function just returns 2. When I attempt the above function I receive this error:
scala> counted.flatMap { x => counted.map { y => ((x._1+","+y._1),func) } }
<console>:33: error: type mismatch;
found : org.apache.spark.rdd.RDD[(String, Int)]
required: TraversableOnce[?]
counted.flatMap { x => counted.map { y => ((x._1+","+y._1),func) } }
How can this function be applied using Spark?
I have tried
val dataArray = counted.collect
dataArray.flatMap { x => dataArray.map { y => ((x._1+","+y._1),func) } }
which converts the collection to an Array and applies the same function. But I run out of memory when I try this method. I think using an RDD is more efficient than using an Array? The maximum amount of memory I can allocate is 7g; is there a mechanism in Spark that lets me use hard drive space to augment the available RAM?
The collection I'm running this function on contains 20'000 entries, so 20'000^2 comparisons (400'000'000), but in Spark terms this is quite small?
Short answer:
counted.cartesian(counted).map {
case ((x, _), (y, _)) => (x + "," + y, func)
}
Please use pattern matching to extract tuple elements for nested tuples to avoid unreadable chained underscore notation. Using _ for the second elements shows the reader that these values are being ignored.
Now, if func doesn't use the second elements, something even more readable (and maybe more efficient) would be to do this:
val projected = counted.map(_._1)
projected.cartesian(projected).map(x => (x._1 + "," + x._2, func))
Note that you do not need curly braces if your lambda fits in a single semantic line; this is a very common mistake in Scala.
I would like to know why you wish to have this Cartesian product; there are often ways to avoid it that are significantly more scalable. Please say what you're going to do with this Cartesian product, and I will try to find a scalable way of doing what you want.
One final point: please put spaces between operators.
@RexKerr pointed out to me in the comment section that I was somewhat incorrect, so I deleted my comments. But while doing that, I had the chance to read the post again and came up with an idea that might be of some use to you.
Since what you are trying to implement is actually an operation over a Cartesian product, you might want to try just calling RDD#cartesian. Here is a dumb example, but if you can give some real code, maybe I'll be able to do something like this in that case as well:
// get collection with the type corresponding to the type in question:
val v1 = sc.parallelize(List("q"-> (".", 0), "s"->(".", 1), "f" -> (".", 2))).groupByKey
// try doing something
v1.cartesian(v1).map { x => (x._1._1 + "," + x._2._1, 2) }.foreach(println)
I am a newbie to Scala and I am writing Scala code to implement the Pastry protocol. The protocol itself does not matter. There are nodes, and each node has a routing table which I want to populate.
Here is the part of the code:
def act () {
def getMatchingNode (initialMatch :String) : Int = {
val len = initialMatch.length
for (i <- 0 to noOfNodes-1) {
var flag : Int = 1
for (j <- 0 to len-1) {
if (list(i).key.charAt(j) == initialMatch(j)) {
continue
}
else {
flag = 0
}
}
if (flag == 1) {
return i
}
}
return -1
}
// iterate over rows
for (ii <- 0 to rows - 1) {
for (jj <- 0 to 15) {
var initialMatch = ""
for (k <- 0 to ii-1) {
initialMatch = initialMatch + key.charAt(k)
}
initialMatch += jj
println("initialMatch",initialMatch)
if (getMatchingNode(initialMatch) != -1) {
Routing(0)(jj) = list(getMatchingNode(initialMatch)).key
}
else {
Routing(0)(jj) = "NULL"
}
}
}
}// act
The problem is that when the call to getMatchingNode takes place, the actor suddenly dies by itself. 'list' is the list of all nodes (a list of node objects).
Also, this behaviour is not consistent. The call to getMatchingNode should take place 15 times for each actor (for 10 nodes).
But while debugging, the actor kills itself inside the getMatchingNode call, sometimes after one call and sometimes after 3-4 calls.
The Scala library code which gets executed is this:
def run() {
try {
beginExecution()
try {
if (fun eq null)
handler(msg)
else
fun()
} catch {
case _: KillActorControl =>
// do nothing
case e: Exception if reactor.exceptionHandler.isDefinedAt(e) =>
reactor.exceptionHandler(e)
}
reactor.kill()
}
Eclipse shows that this code has been called from the for loop in the getMatchingNode function:
def getMatchingNode (initialMatch :String) : Int = {
val len = initialMatch.length
for (i <- 0 to noOfNodes-1)
The strange thing is that sometimes the loop behaves normally, and sometimes it goes to the Scala code which kills the actor.
Any input on what's wrong with the code?
Any help would be appreciated.
Found the error.
The 'continue' clause in the for loop caused the trouble.
I thought we could use continue in Scala as we do in C++/Java, but that is not the case.
Removing the continue solved the issue.
From the book "Programming in Scala", 2nd ed., by M. Odersky:
You may have noticed that there has been no mention of break or continue. Scala leaves out these commands because they do not mesh well with function literals, a feature described in the next chapter. It is clear what continue means inside a while loop, but what would it mean inside a function literal? While Scala supports both imperative and functional styles of programming, in this case it leans slightly towards functional programming in exchange for simplifying the language. Do not worry, though. There are many ways to program without break and continue, and if you take advantage of function literals, those alternatives can often be shorter than the original code.
I really suggest reading the book if you want to learn Scala.
Your code is based on tons of nested for loops, which can more often than not be rewritten using the higher-order functions available on the most appropriate collection.
You can rewrite your function like the following (I'm trying to make it approachable for newcomers):
//works if "list" contains "nodes" with an attribute "node.key: String"
def getMatchingNode(initialMatch: String): Int = {
  //a new list with the corresponding keys
  val nodeKeys = list.map(node => node.key)
  //zips each key with its index in the list (creates pairs), then looks for a key starting with the prefix
  //(the original loop compares character by character, i.e. a startsWith check)
  val matchOption: Option[(String, Int)] = nodeKeys.zipWithIndex find { case (key, index) => key.startsWith(initialMatch) }
  //we extract the index from an eventual match, i.e. the right projection of the pair
  val idxOption = matchOption map { case (key, index) => index } //now we have an Option[Int] with a possible index
  //returns the content of the option if it's full (Some) or the default value -1 if there was no match (None). See Option[T] for more details
  idxOption.getOrElse(-1)
}
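For what it's worth, once the intent is phrased as "the index of the first node whose key starts with the given prefix", the whole function collapses to a one-liner, since indexWhere already returns -1 when nothing matches:
def getMatchingNode(initialMatch: String): Int =
  list.indexWhere(node => node.key.startsWith(initialMatch))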
The potential to easily transform or operate on a collection's elements is what makes continue, and for loops in general, less used in Scala.
You can convert the row iteration in a similar way, but if you need to work a lot with the collection's indices, I would suggest using an IndexedSeq or one of its implementations, like ArrayBuffer.
I have a list of possible input values:
val inputValues = List(1,2,3,4,5)
I have a function that takes a really long time to compute and gives me a result:
def reallyLongFunction( input: Int ) : Option[String] = { ..... }
Using Scala parallel collections, I can easily do:
inputValues.par.map( reallyLongFunction( _ ) )
to compute all the results in parallel. The problem is, I don't really want all the results; I only want the FIRST result. As soon as one of my inputs is a success, I want my output, and I want to move on with my life. This did a lot of extra work.
So how do I get the best of both worlds? I want to
Get the first result that returns something from my long function
Stop all my other threads from useless work.
Edit -
I solved it like a dumb Java programmer by having
@volatile var done = false;
which is set and checked inside my reallyLongFunction. This works, but does not feel very Scala-like. I would like a better way to do this...
(Updated: no, it doesn't work, doesn't do the map)
Would it work to do something like:
inputValues.par.find({ v => reallyLongFunction(v); true })
The implementation uses this:
protected[this] class Find[U >: T](pred: T => Boolean, protected[this] val pit: IterableSplitter[T]) extends Accessor[Option[U], Find[U]] {
@volatile var result: Option[U] = None
def leaf(prev: Option[Option[U]]) = { if (!pit.isAborted) result = pit.find(pred); if (result != None) pit.abort }
protected[this] def newSubtask(p: IterableSplitter[T]) = new Find(pred, p)
override def merge(that: Find[U]) = if (this.result == None) result = that.result
}
which looks pretty similar in spirit to your @volatile, except you don't have to look at it ;-)
I interpreted your question in the same way as huynhjl, but if you just want to search and discard the Nones, you could do something like this to avoid repeating the computation when a suitable outcome is found:
class Computation[A, B](value: A, function: A => B) {
  lazy val result = function(value)
}
def f(x: Int) = { // your function here
  Thread.sleep(100 - x)
  if (x > 5) Some(x * 10)
  else None
}
val list = List.range(1, 20) map (i => new Computation(i, f))
val found = list.par find (_.result.isDefined)
//found is Option[Computation[Int,Option[Int]]]
val result = found map (_.result.get)
//result is Option[Int]
However find for parallel collections seems to do a lot of unnecessary work (see this question), so this might not work well, with current versions of Scala at least.
Volatile flags are used in the parallel collections (take a look at the source for find, exists, and forall), so I think your idea is a good one. It's actually better if you can include the flag in the function itself. It kills referential transparency on your function (i.e. for certain inputs your function now sometimes returns None rather than Some), but since you're discarding the stopped computations, this shouldn't matter.
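As a rough sketch of that idea (the names here are made up, and as noted this trades referential transparency for an early exit), using an AtomicBoolean in place of the @volatile flag:
import java.util.concurrent.atomic.AtomicBoolean

val done = new AtomicBoolean(false)

// wraps reallyLongFunction: skips the expensive work once any thread has succeeded
def guarded(input: Int): Option[String] =
  if (done.get) None
  else reallyLongFunction(input) match {
    case some @ Some(_) => done.set(true); some
    case None => None
  }

val result: Option[String] =
  inputValues.par.map(guarded).find(_.isDefined).flatten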
If you're willing to use a non-core library, I think Futures would be a good match for this task. For instance:
Akka's Futures include Futures.firstCompletedOf
Twitter's Futures include Future.select
...both of which appear to enable the functionality you're looking for.
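For what it's worth, in more recent Scala versions the standard library's scala.concurrent offers a similar combinator, Future.find, which completes with the first result satisfying a predicate (although it does not cancel the futures that are still running). A sketch:
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val futures = inputValues.map(v => Future(reallyLongFunction(v)))

// completes with the first Some produced by any future, or None if none succeeds
val first: Future[Option[Option[String]]] = Future.find(futures)(_.isDefined)

val result: Option[String] = Await.result(first, 1.minute).flatten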
Let's say we have a list of states and we want to sequence them:
import cats.data.State
import cats.instances.list._
import cats.syntax.traverse._
trait MachineState
case object ContinueRunning extends MachineState
case object StopRunning extends MachineState
case class Machine(candy: Int)
val addCandy: Int => State[Machine, MachineState] = amount =>
  State[Machine, MachineState] { machine =>
    val newCandyAmount = machine.candy + amount
    if (newCandyAmount > 10)
      (machine, StopRunning)
    else
      (machine.copy(newCandyAmount), ContinueRunning)
  }
List(addCandy(1),
     addCandy(2),
     addCandy(5),
     addCandy(10),
     addCandy(20),
     addCandy(50)).sequence.run(Machine(0)).value
The result would be:
(Machine(8), List(ContinueRunning, ContinueRunning, ContinueRunning, StopRunning, StopRunning, StopRunning))
It's obvious that the last 3 steps are redundant. Is there a way to make this sequence stop early? Here, when StopRunning gets returned, I would like to stop. For example, a list of Eithers would fail fast and stop the sequence early if needed (because it acts like a monad).
For the record - I do know that it is possible to simply write a tail recursion that checks each state being run and, if some condition is satisfied, stops the recursion. I just want to know if there is a more elegant way of doing this. The recursion solution seems like a lot of boilerplate to me; am I wrong or not?
Thank you!:))
There are 2 things here that need to be done.
The first is understanding what is actually happening:
State takes some state value, threads it between many composed calls, and in the process produces some output value as well
in your case Machine is the state threaded between calls, while MachineState is the output of a single operation
sequence (usually) takes a collection (here List) of some parametric type (here State[Machine, _]) and flips the nesting (here: List[State[Machine, _]] -> State[Machine, List[_]], where _ is the gap that you'll be filling with your type)
the result is that you'll thread the state (Machine(0)) through all the functions, while combining the output of each of them (MachineState) into a list of outputs
// ammonite
// to better see how many times things are being run
# {
  val addCandy: Int => State[Machine, MachineState] = amount =>
    State[Machine, MachineState] { machine =>
      val newCandyAmount = machine.candy + amount
      println("new attempt with " + machine + " and " + amount)
      if (newCandyAmount > 10)
        (machine, StopRunning)
      else
        (machine.copy(newCandyAmount), ContinueRunning)
    }
}
addCandy: Int => State[Machine, MachineState] = ammonite.$sess.cmd24$$$Lambda$2669/1733815710#25c887ca
# List(addCandy(1),
       addCandy(2),
       addCandy(5),
       addCandy(10),
       addCandy(20),
       addCandy(50)).sequence.run(Machine(0)).value
new attempt with Machine(0) and 1
new attempt with Machine(1) and 2
new attempt with Machine(3) and 5
new attempt with Machine(8) and 10
new attempt with Machine(8) and 20
new attempt with Machine(8) and 50
res25: (Machine, List[MachineState]) = (Machine(8), List(ContinueRunning, ContinueRunning, ContinueRunning, StopRunning, StopRunning, StopRunning))
In other words, what you want is circuit breaking, so .sequence might not be what you want.
As a matter of fact, you probably want something else - to combine a list of A => (A, B) functions into one function which stops running the next computation once some result is StopRunning (in your code nothing tells the library what the circuit-break condition is or how it should be performed). I would suggest doing it explicitly with some other combinator, e.g.:
# {
  List(addCandy(1),
       addCandy(2),
       addCandy(5),
       addCandy(10),
       addCandy(20),
       addCandy(50))
    .reduce { (a, b) =>
      a.flatMap {
        // flatMap and map use the MachineState -
        // the second component is the result, after all!
        // we pattern match on it to decide whether to
        // proceed with the computation or stop it
        case ContinueRunning => b // runs the next computation
        case StopRunning => State.pure(StopRunning) // returns the current result without modifying it
      }
    }
    .run(Machine(0))
    .value
}
new attempt with Machine(0) and 1
new attempt with Machine(1) and 2
new attempt with Machine(3) and 5
new attempt with Machine(8) and 10
res23: (Machine, MachineState) = (Machine(8), StopRunning)
This eliminates the need to run the code inside addCandy for the remaining elements - but you cannot really get rid of the code that combines the states together, so this reduce logic will be applied at runtime n-1 times (where n is the size of your list), and that cannot be helped.
BTW, if you take a closer look at Either, you will find that it also computes n results and only then combines them, so it looks like it's circuit breaking but in fact isn't: sequence combines the results of "parallel" computations, but won't interrupt them if any of them failed.
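If you would rather lean on a library combinator than an explicit reduce, here is one more sketch (my own illustration, not the only option): fold the amounts with Foldable's foldM, using Either as the short-circuiting monad. After the first Left, the stepping function is simply never invoked again:
import cats.instances.either._
import cats.instances.list._
import cats.syntax.foldable._

// Left short-circuits the fold: the remaining amounts are never run
val stopped: Either[Machine, Machine] =
  List(1, 2, 5, 10, 20, 50).foldM(Machine(0)) { (machine, amount) =>
    addCandy(amount).run(machine).value match {
      case (next, ContinueRunning) => Right(next)
      case (next, StopRunning) => Left(next)
    }
  }

val finalMachine: Machine = stopped.merge // Machine(8)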