Reentrant locks within monads in Scala - scala

A colleague of mine stated the following, about using a Java ReentrantReadWriteLock in some Scala code:
Acquiring the lock here is risky. It's "reentrant", but that internally depends on the thread context. F may run different stages of the same computation in different threads. You can easily cause a deadlock.
F here refers to some effectful monad.
Basically what I'm trying to do is to acquire the same reentrant lock twice, within the same monad.
Could somebody clarify why this could be a problem?
The code is split into two files. The outermost one:
val lock: Resource[F, Unit] = for {
// some other resource
_ <- store.writeLock
} yield ()
lock.use { _ =>
for {
// stuff
_ <- EitherT(store.doSomething())
// other stuff
} yield ()
}
Then, in the store:
import java.util.concurrent.locks.{Lock, ReentrantReadWriteLock}
import cats.effect.{Resource, Sync}
private def lockAsResource[F[_]](lock: Lock)(implicit F: Sync[F]): Resource[F, Unit] =
Resource.make {
F.delay(lock.lock())
} { _ =>
F.delay(lock.unlock())
}
private val lock = new ReentrantReadWriteLock
val writeLock: Resource[F, Unit] = lockAsResource(lock.writeLock())
def doSomething(): F[Either[Throwable, Unit]] = writeLock.use { _ =>
// etc etc
}
The writeLock in the two pieces of code is the same, and it's a cats.effect.Resource[F, Unit] wrapping a ReentrantReadWriteLock's writeLock. There are some reasons why I was writing the code this way, so I wouldn't want to dig into that. I would just like to understand why (according to my colleague, at least), this could potentially break stuff.
Also, I'd like to know if there is some alternative in Scala that would allow something like this without the risk for deadlocks.

IIUC your question:
You expect that for each interaction with the Resource lock.lock and lock.unlock actions happen in the same thread.
1) There is no guarantee at all since you are using arbitrary effect F here.
It's possible to write an implementation of F that executes every action in a new thread.
2) Even if we assume that F is IO then the body of doSomething someone could do IO.shift. So the next actions including unlock would happen in another thread. Probably it's not possible with the current signature of doSomething but you get the idea.
Also, I'd like to know if there is some alternative in Scala that would allow something like this without the risk for deadlocks.
You can take a look at scalaz zio STM.

Related

Using scala-cats IO type to encapsulate a mutable Java library

I understand that generally speaking there is a lot to say about deciding what one wants to model as effect This discussion is introduce in Functional programming in Scala on the chapter on IO.
Nonethless, I have not finished the chapter, i was just browsing it end to end before takling it together with Cats IO.
In the mean time, I have a bit of a situation for some code I need to deliver soon at work.
It relies on a Java Library that is just all about mutation. That library was started a long time ago and for legacy reason i don't see them changing.
Anyway, long story short. Is actually modeling any mutating function as IO a viable way to encapsulate a mutating java library ?
Edit1 (at request I add a snippet)
Readying into a model, mutate the model rather than creating a new one. I would contrast jena to gremlin for instance, a functional library over graph data.
def loadModel(paths: String*): Model =
paths.foldLeft(ModelFactory.createOntologyModel(new OntModelSpec(OntModelSpec.OWL_MEM)).asInstanceOf[Model]) {
case (model, path) ⇒
val input = getClass.getClassLoader.getResourceAsStream(path)
val lang = RDFLanguages.filenameToLang(path).getName
model.read(input, "", lang)
}
That was my scala code, but the java api as documented in the website look like this.
// create the resource
Resource r = model.createResource();
// add the property
r.addProperty(RDFS.label, model.createLiteral("chat", "en"))
.addProperty(RDFS.label, model.createLiteral("chat", "fr"))
.addProperty(RDFS.label, model.createLiteral("<em>chat</em>", true));
// write out the Model
model.write(system.out);
// create a bag
Bag smiths = model.createBag();
// select all the resources with a VCARD.FN property
// whose value ends with "Smith"
StmtIterator iter = model.listStatements(
new SimpleSelector(null, VCARD.FN, (RDFNode) null) {
public boolean selects(Statement s) {
return s.getString().endsWith("Smith");
}
});
// add the Smith's to the bag
while (iter.hasNext()) {
smiths.add(iter.nextStatement().getSubject());
}
So, there are three solutions to this problem.
1. Simple and dirty
If all the usage of the impure API is contained in single / small part of the code base, you may just "cheat" and do something like:
def useBadJavaAPI(args): IO[Foo] = IO {
// Everything inside this block can be imperative and mutable.
}
I said "cheat" because the idea of IO is composition, and a big IO chunk is not really composition. But, sometimes you only want to encapsulate that legacy part and do not care about it.
2. Towards composition.
Basically, the same as above but dropping some flatMaps in the middle:
// Instead of:
def useBadJavaAPI(args): IO[Foo] = IO {
val a = createMutableThing()
mutableThing.add(args)
val b = a.bar()
b.computeFoo()
}
// You do something like this:
def useBadJavaAPI(args): IO[Foo] =
for {
a <- IO(createMutableThing())
_ <- IO(mutableThing.add(args))
b <- IO(a.bar())
result <- IO(b.computeFoo())
} yield result
There are a couple of reasons for doing this:
Because the imperative / mutable API is not contained in a single method / class but in a couple of them. And the encapsulation of small steps in IO is helping you to reason about it.
Because you want to slowly migrate the code to something better.
Because you want to feel better with yourself :p
3. Wrap it in a pure interface
This is basically the same that many third party libraries (e.g. Doobie, fs2-blobstore, neotypes) do. Wrapping a Java library on a pure interface.
Note that as such, the amount of work that has to be done is way more than the previous two solutions. As such, this is worth it if the mutable API is "infecting" many places of your codebase, or worse in multiple projects; if so then it makes sense to do this and publish is as an independent module.
(it may also be worth to publish that module as an open-source library, you may end up helping other people and receive help from other people as well)
Since this is a bigger task is not easy to just provide a complete answer of all you would have to do, it may help to see how those libraries are implemented and ask more questions either here or in the gitter channels.
But, I can give you a quick snippet of how it would look like:
// First define a pure interface of the operations you want to provide
trait PureModel[F[_]] { // You may forget about the abstract F and just use IO instead.
def op1: F[Int]
def op2(data: List[String]): F[Unit]
}
// Then in the companion object you define factories.
object PureModel {
// If the underlying java object has a close or release action,
// use a Resource[F, PureModel[F]] instead.
def apply[F[_]](args)(implicit F: Sync[F]): F[PureModel[F]] = ???
}
Now, how to create the implementation is the tricky part.
Maybe you can use something like Sync to initialize the mutable state.
def apply[F[_]](args)(implicit F: Sync[F]): F[PureModel[F]] =
F.delay(createMutableState()).map { mutableThing =>
new PureModel[F] {
override def op1: F[Int] = F.delay(mutableThing.foo())
override def op2(data: List[String]): F[Unit] = F.delay(mutableThing.bar(data))
}
}

How does the cats-effect IO monad really work?

I'm new to functional programming and Scala, and I was checking out the Cats Effect framework and trying to understand what the IO monad does. So far what I've understood is that writing code in the IO block is just a description of what needs to be done and nothing happens until you explicitly run using the unsafe methods provided, and also a way to make code that performs side-effects referentially transparent by actually not running it.
I tried executing the snippet below just to try to understand what it means:
object Playground extends App {
var out = 10
var state = "paused"
def changeState(newState: String): IO[Unit] = {
state = newState
IO(println("Updated state."))
}
def x(string: String): IO[Unit] = {
out += 1
IO(println(string))
}
val tuple1 = (x("one"), x("two"))
for {
_ <- x("1")
_ <- changeState("playing")
} yield ()
println(out)
println(state)
}
And the output was:
13
paused
I don't understand why the assignment state = newState does not run, but the increment and assign expression out += 1 run. Am I missing something obvious on how this is supposed to work? I could really use some help. I understand that I can get this to run using the unsafe methods.
In your particular example, I think what is going on is that regular imperative Scala coded is unaffected by the IO monad--it runs when it normally would under the rules of Scala.
When you run:
for {
_ <- x("1")
_ <- changeState("playing")
} yield ()
this immediately calls x. That has nothing to do with the IO monad; it's just how for comprehensions are defined. The first step is to evaluate the first statement so you can call flatMap on it.
As you observe, you never "run" the monadic result, so the argument to flatMap, the monadic continuation, is never invoked, resulting in no call to changeState. This is specific to the IO monad, as, e.g., the List monad's flatMap would have immediately invoked the function (unless it were an empty list).

Functional scala log accumulator

I'm working on a Scala project using cats library, mainly. In there, we have calls like
for {
_ <- initSomeServiceAndLog("something from a far away service")
_ <- initSomeOtherServiceAndLog("something from another far away service")
a <- b()
c <- d(a)
} yield c
Imagine that b also logs something or might throw a business error (I know, we avoid to throw in Scala, but it's not the case right now). I'm looking for a solution to accumulate logs and print them all in the end, in a single message.
For a happy path, I saw that Writer Monad from Cats might be an acceptable solution.
But what if b method throws? The requirements are to logs everything - all previous logs and the error message, in a single message, with some kind of unique trace ID.
Any thoughts? Thanks in advance
Implementing functional logging (in a way that preserves logs even if error happened) using monad transformers like Writer (WriterT) or State (StateT) is hard. However, if we don't be anal about FP approach we could do the following:
use some IO monad
with it create something like in-memory storage for logs
however implement in in a functional way
Personally I would pick either cats.effect.concurrent.Ref or monix.eval.TaskLocal.
Example using Ref (and Task):
type Log = Ref[Task, Chain[String]]
type FunctionalLogger = String => Task[Unit]
val createLog: Task[Log] = Ref.of[Task, Chain[String]](Chain.empty)
def createAppender(log: Log): FunctionalLogger =
entry => log.update(chain => chain.append(entry))
def outputLog(log: Log): Task[Chain[String]] = log.get
with helpers like that I could:
def doOperations(logger: FunctionalLogger) = for {
_ <- operation1(logger) // logging is a side effect managed by IO monad
_ <- operation2(logger) // so it is referentially transparent
} yield result
createLog.flatMap { log =>
doOperations(createAppender(log))
.recoverWith(...)
.flatMap { result =>
outputLog(log)
...
}
}
However, making sure that output is called is a bit of a pain so we could use some form of Bracket or Resource to handle it:
val loggerResource: Resource[Task, FunctionalLogger] = Resource.make {
createLog // acquiring resource - IO operation that accesses something
} { log =>
outputLog(log) // releasing resource - works like finally in try-catchso it should
.flatMap(... /* log entries or sth */) // be called no matter if error occured
}.map(createAppender)
loggerResource.use { logger =>
doSomething(logger)
}
If you don't like passing this appender around explicitly you could use Kleisli to inject it:
type WithLogger[A] = Kleisli[Task, FunctionalLogger, A]
// def operation1: WithLogger[A]
// def operation2: WithLogger[B]
def doSomething: WithLogger[C] = for {
a <- operation1
b <- operation2
} yield c
loggerResource.use { logger =>
doSomething(logger)
}
TaskLocal would be used in a very similar way.
At the end of the day you would end up with:
type that says that it is logging
mutability managed through IO, so referential transparency would not be lost
certainty that even if IO fails, log will be preserved and the results sent
I believe some purist would not like this solution, but it has all the benefits of FP, so I would personally use it.

Implement while(true) in cats [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 3 years ago.
Improve this question
How do you implement the following loops in cats?
First (normal while(true) loop):
while(true) { doSomething() }
Second (while(true) loop with increment):
var i = 1
while(true) { i +=1; doSomething() }
Third (while(true) with several independent variables inside):
var x = 1
var y = 2
while(true) {
x = someCalculation()
y = otherCalculation()
doSomething()
}
I think your question is somewhat ill-posed, but it's ill-posed in an interesting way, so maybe I should try to explain what I mean by that.
Asking "how do I implement
var i = 1
while(true) { i +=1; doSomething() }
in Cats" is answered trivially: in exactly the same way as you would implement it in ordinary Scala without using any libraries. In this particular case, Cats will not enable you to implement anything that behaves drastically different at runtime. What it does enable you to do is to express more precisely what (side) effects each piece of your code has, and to encode it as type level information, which can be verified at compile time during static type checking.
So, the question should not be
"How do I do X in Cats?"
but rather
"How do I prove / make explicit that my code does (or does not) have certain side effects using Cats?".
The while loop in your example simply performs some side effects in doSomething(), and it messes with mutable state of the variable i whenever it wants to, without ever making it explicit in the types of the constituent subexpressions.
Now you might take something like effects.IO, and at least wrap the body of your doSomething in an IO, thereby making it explicit that it performs input/output operations (here: printing to StdOut):
// Your side-effectful `doSomething` function in IO
def doSomething: IO[Unit] = IO { println("do something") }
Now you might ask how to write down the loop in such a way that it also becomes obvious that it performs such IO-operations. You could do something like this:
// Literally `while(true) { ... }`
def whileTrue_1: IO[Unit] =
Monad[IO].whileM_(IO(true)) { doSomething }
// Shortcut using `foreverM` syntax
import cats.syntax.flatMap._
def whileTrue_2: IO[Nothing] = doSomething.foreverM
// Use `>>` from `syntax.flatMap._`
def whileTrue_3: IO[Unit] = doSomething >> whileTrue_3
Now, if you want to throw in the mutable variable i into the mix, you could treat writing/reading of mutable memory as yet another IO operation:
// Treat access to mutable variable `i` as
// yet another IO side effect, do something
// with value of `i` again and again.
def whileInc_1: IO[Unit] = {
var i = 1
def doSomethingWithI: IO[Unit] = IO {
println(s"doing sth. with $i")
}
Monad[IO].whileM_(IO(true)) {
for {
_ <- IO { i += 1 }
_ <- doSomethingWithI
} yield ()
}
}
Alternatively, you might decide that tracking all accesses / changes of the state of i is important enough that you want to make it explicit, for example using the StateT monad transformer that keeps track of the state of type Int:
// Explicitly track variable `i` in a state monad
import cats.data.StateT
import cats.data.StateT._
def whileInc_2: IO[Unit] = {
// Let's make `doSthWithI` not too boring,
// so that it at least accesses the state
// with variable `i`
def doSthWithI: StateT[IO, Int, Unit] =
for {
i <- get[IO, Int]
_ <- liftF(IO { println(i) })
} yield ()
// Define the loop
val loop = Monad[StateT[IO, Int, ?]].whileM_(
StateT.pure(true)
) {
for {
i <- get[IO, Int]
_ <- set[IO, Int](i + 1)
_ <- doSthWithI
} yield ()
}
// The `_._2` is there only to make the
// types match up, it's never actually used,
// because the loop runs forever.
loop.run(1).map(_._2)
}
It would function similarly with two variables x and y (just use (Int, Int) instead of Int as the state).
Admittedly, this code seems a bit more verbose, and especially the last example begins to look like enterprise-edition fizz-buzz, but the point is that if you consistently apply these techniques to your code base, you don't have to dig into the body of a function to get a fairly good idea of what it can (or cannot) do, based on its signature alone. This in turn is advantageous when you are trying to understand the code (you can understand what it does by skimming over signatures, without reading the code in the bodies), and it also forces you to write simpler code that is easier to test (at least that's the idea).
Functional alternative for while(true) loop in your case would be just recursive function:
#tailrec
def loop(x: Int, y: Int) { //every call of loop will receive updated state
doSomething()
loop(someCalculation(), otherCalculation()) //next iteration with updated state
}
loop(1,2); //start "loop" with initial state
The very important thing here is #tailrec annotation. It checks if the recursive call to loop is in the tail position and therefore tail-call optimalization can be applied. If that wouldn't be the case your "loop" would cause StackOverflowException.
An interesting fact is that optimized function will look in bytecode very similarly to while loop.
This approach is not really directly related to cats, but rather with functional programming. Recursion is very commonly used in FP.

How the sample Simple IO Type get rid of side effects in "FP in Scala"?

I am reading chapter 13.2.1 and came across the example that can handle IO input and get rid of side effect in the meantime:
object IO extends Monad[IO] {
def unit[A](a: => A): IO[A] = new IO[A] { def run = a }
def flatMap[A,B](fa: IO[A])(f: A => IO[B]) = fa flatMap f
def apply[A](a: => A): IO[A] = unit(a)
}
def ReadLine: IO[String] = IO { readLine }
def PrintLine(msg: String): IO[Unit] = IO { println(msg) }
def converter: IO[Unit] = for {
_ <- PrintLine("Enter a temperature in degrees Fahrenheit: ")
d <- ReadLine.map(_.toDouble)
_ <- PrintLine(fahrenheitToCelsius(d).toString)
} yield ()
I have couple of questions regarding this piece of code:
In the unit function, what does def run = a really do?
In the ReadLine function, what does IO { readLine } really do? Will it really execute the println function or just return an IO type?
What does _ in the for comprehension mean (_ <- PrintLine("Enter a temperature in degrees Fahrenheit: ")) ?
Why it removes the IO side effects? I saw these functions still interact with inputs and outputs.
The definition of your IO is as follows:
trait IO { def run: Unit }
Following that definition, you can understand that writing new IO[A] { def run = a } means initialising an anonymous class from your trait, and assigning a to be the method that runs when you call IO.run. Because a is a by name parameter, nothing is actually ran at creation time.
Any object or class in Scala which follows a contract of an apply method, can be called as: ClassName(args), where the compiler will search for an apply method on the object/class and convert it to a ClassName.apply(args) call. A more elaborate answer can be found here. As such, because the IO companion object posses such a method:
def apply[A](a: => A): IO[A] = unit(a)
The expansion is allowed to happen. Thus we actually call IO.apply(readLine) instead.
_ has many overloaded uses in Scala. This occurrence means "I don't care about the value returned from PrintLine, discard it". It is so because the value returned is of type Unit, which we have nothing to do with.
It is not that the IO datatype removes the part of doing IO, it's that it defers it to a later point in time. We usually say IO runs at the "edges" of the application, in the Main method. These interactions with the out side world will still occur, but since we encapsulate them inside IO, we can reason about them as values in our program, which brings a lot of benefit. For example, we can now compose side effects and depend on the success/failure of their execution. We can mock out these IO effects (using other data types such as Const), and many other surprisingly nice properties.
The simplest way to look at IO monad as a small piece of program definition.
Thus:
This is IO definition, run method defines what IO monad does. new IO[A] { def run = a } is scala way of creating an instance of class and defining method run.
There is a bit of syntactical sugar is going on. IO { readLine } is same as IO.apply { readLine } or IO.apply(readLine) where readLine is call-by-name function of type => String. This calls the unit method from object IO and thus this is just creation of instance of IO class that does not run yet.
Since IO is a monad, for comprehension can be used. It requires storing a result of each monad operation in a syntax like result <- someMonad. To ignore the result, _ can be used, thus _ <- someMonad reads as execute the monad but ignore the result.
This methods are all IO definitions, they don't run anything and thus there is no side effect. Side effects only appears when IO.run is called.