Why is the Empty input case needed in Scala Iteratees? - scala

The 3 descriptions of the Iteratee pattern in Scala that I've seen all include 3 cases for input. For example, from James:
sealed trait Input[+E]
object Input {
case object EOF extends Input[Nothing]
case object Empty extends Input[Nothing]
case class El[+E](e: E) extends Input[E]
}
More details see blogs by James, Runar, Josh.
My question is simply: why precisely is the Empty input case needed?
The iteratee pattern defines a relationship between a producer and a consumer of a stream of values. Intuitively, it seems that if any input is empty, the producer that "runs" the iteratee should simply collapse that empty item away, and not call the iteratee until non-empty input is available.
I note the pull-based analog of iteratees, the much more familiar iterators, do not define an empty case, although it's possible that elements have been filtered away "inside" the iterator.
trait Iterator[E] {
next: E // like El
hasNext: Boolean //like EOF
}
While all the above blogs mention the need for an Empty input in passing, but they don't discuss explicitly why it cannot be eliminated altogether. I notice the example iterators shown treat Empty input as a no-op.
I'd really like an example, with code, of a plausible "real-world-ish" problem that requires the Empty input message to solve.

Let's say you connect an enumerator that feeds some elements to the peek iteratee that looks at the first element and returns it but does not consume it, leaving it to be used by possibly another iteratee that will be composed with peek. Then you would want to provide a mechanism for peek to put back the element. From what I can tell from both Play and Scalaz iteratee, the done iteratee takes an argument just for this purpose. So you can do something like in pseudo code: done(Some(result), El(result)). See this implementation of peek.
Now if you implement something like head which will actually consume the element, then it feels like one way to do it is to return done(Some(result), emptyInput) to indicate that the input was consumed.
See also this comment in the playframework source code showing the second argument of Done(_, _) is for unused input and initialized as an empty default. So empty is not something seldom used for which it's hard to find real-world example. It is really key to the implementation of the iteratees. In fact it may be interesting to see which iteratee frameworks do not have empty and how they managed to implement peek and head.

James Roper gave a useful response here, including this snippet I found interesting:
I guess another way it could be implemented is to have Option[Input]
as the left over input for Done. This would make implementing
iteratees simpler since they wouldn't need to handle empty.

Related

How to convert `fs2.Stream[IO, T]` to `Iterator[T]` in Scala

Need to fill in the methods next and hasNext and preserve laziness
new Iterator[T] {
val stream: fs2.Stream[IO, T] = ...
def next(): T = ???
def hasNext(): Boolean = ???
}
But cannot figure out how an earth to do this from a fs2.Stream? All the methods on a Stream (or on the "compiled" thing) are fairly useless.
If this is simply impossible to do in a reasonable amount of code, then that itself is a satisfactory answer and we will just rip out fs2.Stream from the codebase - just want to check first!
fs2.Stream, while similar in concept to Iterator, cannot be converted to one while preserving laziness. I'll try to elaborate on why...
Both represent a pull-based series of items, but the way in which they represent that series and implement the laziness differs too much.
As you already know, Iterator represents its pull in terms of the next() and hasNext methods, both of which are synchronous and blocking. To consume the iterator and return a value, you can directly call those methods e.g. in a loop, or use one of its many convenience methods.
fs2.Stream supports two capabilities that make it incompatible with that interface:
cats.effect.Resource can be included in the construction of a Stream. For example, you could construct a fs2.Stream[IO, Byte] representing the contents of a file. When consuming that stream, even if you abort early or do some strange flatMap, the underlying Resource is honored and your file handle is guaranteed to be closed. If you were trying to do the same thing with iterator, the "abort early" case would pose problems, forcing you to do something like Iterator[Byte] with Closeable and the caller would have to make sure to .close() it, or some other pattern.
Evaluation of "effects". In this context, effects are types like IO or Future, where the process of obtaining the value may perform some possibly-asynchronous action, and may perform side-effects. Asynchrony poses a problem when trying to force the process into a synchronous interface, since it forces you to block your current thread to wait for the asynchronous answer, which can cause deadlocks if you aren't careful. Libraries like cats-effect strongly discourage you from calling methods like unsafeRunSync.
fs2.Stream does allow for some special cases that prevent the inclusion of Resource and Effects, via its Pure type alias which you can use in place of IO. That gets you access to Stream.PureOps, but that only gets you methods that consume the whole stream by building a collection; the laziness you want to preserve would be lost.
Side note: you can convert an Iterator to a Stream.
The only way to "convert" a Stream to an Iterator is to consume it to some collection type via e.g. .compile.toList, which would get you an IO[List[T]], then .map(_.iterator) that to get an IO[Iterator[T]]. But ultimately that doesn't fit what you're asking for since it forces you to consume the stream to a buffer, breaking laziness.
#Dima mentioned the "XY Problem", which was poorly-received since they didn't really elaborate (initially) on the incompatibility, but they're right. It would be helpful to know why you're trying to make a Stream-to-Iterator conversion, in case there's some other approach that would serve your overall goal instead.

Tagless-final effect propagation

The tagless-final pattern lets us write pure functional programs which are explicit about the effects they require.
However, scaling this pattern might become challenging. I'll try to demonstrate this with an example. Imagine a simple program that reads records from the database and prints them to the console. We will require some custom typeclasses Database and Console, in addition to Monad from cats/scalaz in order to compose them:
def main[F[_]: Monad: Console: Database]: F[Unit] =
read[F].flatMap(Console[F].print)
def read[F[_]: Functor: Database]: F[List[String]] =
Database[F].read.map(_.map(recordToString))
The problem starts when I want to add a new a effect to a function in the inner layers. For example, I want my read function to log a message if no records were found
def read[F[_]: Monad: Database: Logger]: F[List[String]] =
Database[F].read.flatMap {
case Nil => Logger[F].log("no records found") *> Nil.pure
case records => records.map(recordToString).pure
}
But now, I have to add the Logger constraint to all the callers of read up the chain. In this contrived example it's just main, but imagine this is several layers down a complicated real-world application.
We can look at this issue in two ways:
We can say it's a good thing that were explicit about our effects, and we know exactly which effects are needed by each layer
We can also say that this leaks implementation details - main doesn't care about logging, it's just needs the result of read. Also, in real applications you see really long chains of effects in the top layers. It feels like a code-smell, but I can't put my finger on what other approach I can take.
Would love to get your insights on this.
Thanks.
We can also say that this leaks implementation details - main doesn't
care about logging, it's just needs the result of read. Also, in real
applications you see really long chains of effects in the top layers.
It feels like a code-smell, but I can't put my finger on what other
approach I can take.
I actually believe the contrary is true. One of the key promises of pure FP is equational reasoning as a means of deriving the method implementation from it's signature. If read needs a logging effect in order to do it's business, then by all means it should be declaratively expressed in the signature. Another advantage of being explicit about your effects is the fact that when they start to accumulate, perhaps we need to rethink what this specific method is doing and split it up into smaller components? Or should this effect really be used here?
It is true that effects stack up, but as #TravisBrown mentioned in the comments, it is usually the highest place in the call stack that has to "suffer the consequence" of actually providing all the implicit evidence for the entire call tree.

(Scala) Am I using Options correctly?

I'm currently working on my functional programming - I am fairly new to it. Am i using Options correctly here? I feel pretty insecure on my skills currently. I want my code to be as safe as possible - Can any one point out what am I doing wrong here or is it not that bad? My code is pretty straight forward here:
def main(args: Array[String]): Unit =
{
val file = "myFile.txt"
val myGame = Game(file) //I have my game that returns an Option here
if(myGame.isDefined) //Check if I indeed past a .txt file
{
val solutions = myGame.get.getAllSolutions() //This returns options as well
if(solutions.isDefined) //Is it possible to solve the puzzle(crossword)
{
for(i <- solutions.get){ //print all solutions to the crossword
i.solvedCrossword foreach println
}
}
}
}
-Thanks!! ^^
When using Option, it is recommended to use match case instead of calling 'isDefined' and 'get'
Instead of the java style for loop, use higher-order function:
myGame match {
case Some(allSolutions) =>
val solutions = allSolutions.getAllSolutions
solutions.foreach(_.solvedCrossword.foreach(println))
case None =>
}
As a rule of thumb, you can think of Option as a replacement for Java's null pointer. That is, in cases where you might want to use null in Java, it often makes sense to use Option in Scala.
Your Game() function uses None to represent errors. So you're not really using it as a replacement for null (at least I'd consider it poor practice for an equivalent Java method to return null there instead of throwing an exception), but as a replacement for exceptions. That's not a good use of Option because it loses error information: you can no longer differentiate between the file not existing, the file being in the wrong format or other types of errors.
Instead you should use Either. Either consists of the cases Left and Right where Right is like Option's Some, but Left differs from None in that it also takes an argument. Here that argument can be used to store information about the error. So you can create a case class containing the possible types of errors and use that as an argument to Left. Or, if you never need to handle the errors differently, but just present them to the user, you can use a string with the error message as the argument to Left instead of case classes.
In getAllSolutions you're just using None as a replacement for the empty list. That's unnecessary because the empty list needs no replacement. It's perfectly fine to just return an empty list when there are no solutions.
When it comes to interacting with the Options, you're using isDefined + get, which is a bit of an anti pattern. get can be used as a shortcut if you know that the option you have is never None, but should generally be avoided. isDefined should generally only be used in situations where you need to know whether an option contains a value, but don't need to know the value.
In cases where you need to know both whether there is a value and what that value is, you should either use pattern matching or one of Option's higher-order functions, such as map, flatMap, getOrElse (which is kind of a higher-order function if you squint a bit and consider by-name arguments as kind-of like functions). For cases where you want to do something with the value if there is one and do nothing otherwise, you can use foreach (or equivalently a for loop), but note that you really shouldn't do nothing in the error case here. You should tell the user about the error instead.
If all you need here is to print it in case all is good, you can use for-comprehension which is considered quite idiomatic Scala way
for {
myGame <- Game("mFile.txt")
solutions <- myGame.getAllSolutions()
solution <- solutions
crossword <- solution.solvedCrossword
} println(crossword)

Either monadic operations

I just start to be used to deal with monadic operations.
For the Option type, this Cheat Sheet of Tony Morris helped:
http://blog.tmorris.net/posts/scalaoption-cheat-sheet/
So in the end it seems easy to understand that:
map transforms the value of inside an option
flatten permits to transform Option[Option[X]] in Option[X]
flatMap is somehow a map operation producing an Option[Option[X]] and then flattened to Option[X]
At least it is what I understand until now.
For Either, it seems a bit more difficult to understand since Either itself is not right biaised, does not have map / flatMap operations... and we use projection.
I can read the Scaladoc but it is not as clear as the Cheat Sheet on Options.
Can someone provide an Either Sheet Cheat to describe the basic monadic operations?
It seems to me that Either.joinRight is a bit like RightProjection.flatMap and seems to be the equivalent of Option.flatten for Either.
It seems to me that if Either was Right biaised, then Either.flatten would be Either.joinRight no?
In this question: Either, Options and for comprehensions I ask about for comprehension with Eiher, and one of the answers says that we can't mix monads because of the way it is desugared into map/flatMap/filter.
When using this kind of code:
def updateUserStats(user: User): Either[Error,User] = for {
stampleCount <- stampleRepository.getStampleCount(user).right
userUpdated <- Right(copyUserWithStats(user,stampleCount)).right
userSaved <- userService.update(userUpdated).right
} yield userSaved
Does this mean that all my 3 method calls must always return Either[Error,Something]?
I mean if I have a method call Either[Throwable,Something] it won't work right?
Edit:
Is Try[Something] exactly the same as a right-biaised Either[Throwable,Something]?
Either was never really meant to be an exception handling based structure. It was meant to represent a situation where a function really could possible return one of two distinct types, but people started the convention where the left type is a supposed to be a failed case and the right is success. If you want to return a biased type for some pass/fail type business checks logic, then Validation from scalaz works well. If you have a function that could return a value or a Throwable, then Try would be a good choice. Either should be used for situations where you really might get one of two possible types, and now that I am using Try and Validation (each for different types of situations), I never use Either any more.

Configuration data in Scala -- should I use the Reader monad?

How do I create a properly functional configurable object in Scala? I have watched Tony Morris' video on the Reader monad and I'm still unable to connect the dots.
I have a hard-coded list of Client objects:
class Client(name : String, age : Int){ /* etc */}
object Client{
//Horrible!
val clients = List(Client("Bob", 20), Client("Cindy", 30))
}
I want Client.clients to be determined at runtime, with the flexibility of either reading it from a properties file or from a database. In the Java world I'd define an interface, implement the two types of source, and use DI to assign a class variable:
trait ConfigSource {
def clients : List[Client]
}
object ConfigFileSource extends ConfigSource {
override def clients = buildClientsFromProperties(Properties("clients.properties"))
//...etc, read properties files
}
object DatabaseSource extends ConfigSource { /* etc */ }
object Client {
#Resource("configuration_source")
private var config : ConfigSource = _ //Inject it at runtime
val clients = config.clients
}
This seems like a pretty clean solution to me (not a lot of code, clear intent), but that var does jump out (OTOH, it doesn't seem to me really troublesome, since I know it will be injected once-and-only-once).
What would the Reader monad look like in this situation and, explain it to me like I'm 5, what are its advantages?
Let's start with a simple, superficial difference between your approach and the Reader approach, which is that you no longer need to hang onto config anywhere at all. Let's say you define the following vaguely clever type synonym:
type Configured[A] = ConfigSource => A
Now, if I ever need a ConfigSource for some function, say a function that gets the n'th client in the list, I can declare that function as "configured":
def nthClient(n: Int): Configured[Client] = {
config => config.clients(n)
}
So we're essentially pulling a config out of thin air, any time we need one! Smells like dependency injection, right? Now let's say we want the ages of the first, second and third clients in the list (assuming they exist):
def ages: Configured[(Int, Int, Int)] =
for {
a0 <- nthClient(0)
a1 <- nthClient(1)
a2 <- nthClient(2)
} yield (a0.age, a1.age, a2.age)
For this, of course, you need some appropriate definition of map and flatMap. I won't get into that here, but will simply say that Scalaz (or RĂșnar's awesome NEScala talk, or Tony's which you've seen already) gives you all you need.
The important point here is that the ConfigSource dependency and its so-called injection are mostly hidden. The only "hint" that we can see here is that ages is of type Configured[(Int, Int, Int)] rather than simply (Int, Int, Int). We didn't need to explicitly reference config anywhere.
As an aside, this is the way I almost always like to think about monads: they hide their effect so it's not polluting the flow of your code, while explicitly declaring the effect in the type signature. In other words, you needn't repeat yourself too much: you say "hey, this function deals with effect X" in the function's return type, and don't mess with it any further.
In this example, of course the effect is to read from some fixed environment. Another monadic effect you might be familiar with include error-handling: we can say that Option hides error-handling logic while making the possibility of errors explicit in your method's type. Or, sort of the opposite of reading, the Writer monad hides the thing we're writing to while making its presence explicit in the type system.
Now finally, just as we normally need to bootstrap a DI framework (somewhere outside our usual flow of control, such as in an XML file), we also need to bootstrap this curious monad. Surely we'll have some logical entry point to our code, such as:
def run: Configured[Unit] = // ...
It ends up being pretty simple: since Configured[A] is just a type synonym for the function ConfigSource => A, we can just apply the function to its "environment":
run(ConfigFileSource)
// or
run(DatabaseSource)
Ta-da! So, contrasting with the traditional Java-style DI approach, we don't have any "magic" occurring here. The only magic, as it were, is encapsulated in the definition of our Configured type and the way it behaves as a monad. Most importantly, the type system keeps us honest about which "realm" dependency injection is occurring in: anything with type Configured[...] is in the DI world, and anything without it is not. We simply don't get this in old-school DI, where everything is potentially managed by the magic, so you don't really know which portions of your code are safe to reuse outside of a DI framework (for example, within your unit tests, or in some other project entirely).
update: I wrote up a blog post which explains Reader in greater detail.