How do I abstract over effects and use ContextShift with Scala Cats? - scala

I am writing a function in Scala with Cats that does some I/O and that will be called by other parts of the code. I'm also learning Cats, and I want my function to:
Be generic in its effect and use an F[_]
Run on a dedicated thread pool
Introduce async boundaries
I assume that all my functions are generic in F[_] up to the main method, because I'm trying to follow these Cats guidelines.
But I struggle to make these constraints work using ContextShift or ExecutionContext. I have written a full example here, and this is an extract from it:
object ComplexOperation {
  // Thread pool for ComplexOperation internal use only
  val cs = IO.contextShift(
    ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor())
  )

  // Complex operation that takes resources and time
  def run[F[_]: Sync](input: String): F[String] =
    for {
      r1 <- Sync[F].delay(cs.shift) *> op1(input)
      r2 <- Sync[F].delay(cs.shift) *> op2(r1)
      r3 <- Sync[F].delay(cs.shift) *> op3(r2)
    } yield r3

  def op1[F[_]: Sync](input: String): F[Int] = Sync[F].delay(input.length)
  def op2[F[_]: Sync](input: Int): F[Boolean] = Sync[F].delay(input % 2 == 0)
  def op3[F[_]: Sync](input: Boolean): F[String] = Sync[F].delay(s"Complex result: $input")
}
This clearly doesn't abstract over effects as ComplexOperation.run needs a ContextShift[IO] to be able to introduce async boundaries. What is the right (or best) way of doing this?
Creating ContextShift[IO] inside ComplexOperation.run makes the function depend on IO which I don't want.
Moving the creation of a ContextShift[IO] to the caller simply shifts the problem: the caller is also generic in F[_], so how does it obtain a ContextShift[IO] to pass to ComplexOperation.run without explicitly depending on IO?
Remember that I don't want to use one global ContextShift[IO] defined at the topmost level but I want each component to decide for itself.
Should my ComplexOperation.run create the ContextShift[IO] or is it the responsibility of the caller?
Am I doing this right at least? Or am I going against standard practices?

So I took the liberty to rewrite your code, hope it helps:
import cats.effect._

object Functions {
  def sampleFunction[F[_]: Sync : ContextShift](file: String, blocker: Blocker): F[String] = {
    val handler: Resource[F, Int] =
      Resource.make(
        blocker.blockOn(openFile(file))
      ) { handler =>
        blocker.blockOn(closeFile(handler))
      }
    handler.use(handler => doWork(handler))
  }

  private def openFile[F[_]: Sync](file: String): F[Int] = Sync[F].delay {
    println(s"Opening file $file with handler 2")
    2
  }

  private def closeFile[F[_]: Sync](handler: Int): F[Unit] = Sync[F].delay {
    println(s"Closing file handler $handler")
  }

  private def doWork[F[_]: Sync](handler: Int): F[String] = Sync[F].delay {
    println(s"Calculating the value on file handler $handler")
    "The final value"
  }
}

object Main extends IOApp {
  override def run(args: List[String]): IO[ExitCode] = {
    val result = Blocker[IO].use { blocker =>
      Functions.sampleFunction[IO](file = "filePath", blocker)
    }
    for {
      data <- result
      _ <- IO(println(data))
    } yield ExitCode.Success
  }
}
You can see it running here.
So, what does this code do?
First, it creates a Resource for the file, since the file has to be closed even on failure or cancellation.
It uses Blocker to run the open and close operations on a blocking thread pool (the shifting is done through ContextShift).
Finally, in the main, it creates a default Blocker for IO, uses it to call your function, and prints the result.
Feel free to ask any questions.

Related

ConcurrentHashMap[String, AtomicInteger] or ConcurrentHashMap[String, Int] for thread-safe counters?

When incrementing concurrent counters by key in ConcurrentHashMap is it safe to use regular Int for value or do we have to use AtomicInteger? For example consider the following two implementations
ConcurrentHashMap[String, Int]
final class ExpensiveMetrics(implicit system: ActorSystem, ec: ExecutionContext) {
  import scala.collection.JavaConverters._
  private val chm = new ConcurrentHashMap[String, Int]().asScala
  system.scheduler.schedule(5.seconds, 60.seconds)(publishAllMetrics())

  def countRequest(key: String): Unit =
    chm.get(key) match {
      case Some(value) => chm.update(key, value + 1)
      case None => chm.update(key, 1)
    }

  private def resetCount(key: String) = chm.replace(key, 0)

  private def publishAllMetrics(): Unit =
    chm foreach { case (key, value) =>
      // publishMetric(key, value.doubleValue())
      resetCount(key)
    }
}
ConcurrentHashMap[String, AtomicInteger]
final class ExpensiveMetrics(implicit system: ActorSystem, ec: ExecutionContext) {
  import scala.collection.JavaConverters._
  private val chm = new ConcurrentHashMap[String, AtomicInteger]().asScala
  system.scheduler.schedule(5.seconds, 60.seconds)(publishAllMetrics())

  def countRequest(key: String): Unit =
    chm.getOrElseUpdate(key, new AtomicInteger(1)).incrementAndGet()

  private def resetCount(key: String): Unit =
    chm.getOrElseUpdate(key, new AtomicInteger(0)).set(0)

  private def publishAllMetrics(): Unit =
    chm foreach { case (key, value) =>
      // publishMetric(key, value.doubleValue())
      resetCount(key)
    }
}
Is the former implementation safe? If not, at what point in the snippet can race-condition be introduced and why?
The context of the question are AWS CloudWatch metrics which can get very expensive on high-frequency APIs if posted on each request. So I am trying to "batch" them up and publish them periodically.
The first implementation is not correct, because the countRequest method is not atomic. Consider this sequence of events:
Threads A and B both call countRequest with key "foo"
Thread A obtains the counter value, let's call it x
Thread B obtains the counter value. It's the same value x, because Thread A hasn't updated the counter yet.
Thread B updates the map with the new counter value, x+1
Thread A updates the map, and because it obtained the counter value before B wrote the new counter value, it also writes x+1.
The counter should be x+2, but it is x+1. It's a classic lost update problem.
The second implementation has a similar problem due to the use of the `getOrElseUpdate` method. `ConcurrentHashMap` does not have that method, therefore the Scala wrapper needs to emulate it. I think the implementation is the one inherited from `scala.collection.mutable.MapOps`, and it is defined like so:
```
def getOrElseUpdate(key: K, op: => V): V =
get(key) match {
case Some(v) => v
case None => val d = op; this(key) = d; d
}
```
This is obviously not atomic.
To implement this correctly, use the compute method on ConcurrentHashMap.
This method will execute atomically, so you won't need an AtomicInteger.
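As a minimal sketch of that advice, the counter below keeps plain Int values and increments them with merge, a sibling of compute on ConcurrentHashMap that is also atomic per key. The names RequestCounter, countRequest and current are hypothetical, and the sketch assumes Scala 2.12+ on Java 8+ so that a Scala lambda can be used as a BiFunction. Publishing and resetting the metrics is left out.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.function.BiFunction

// Hypothetical helper, not part of the original question's code.
object RequestCounter {
  private val counts = new ConcurrentHashMap[String, Int]()

  // Combines the stored value with the supplied one.
  private val sum: BiFunction[Int, Int, Int] = (a, b) => a + b

  // merge stores 1 for a missing key, otherwise stores sum(old, 1).
  // The whole read-modify-write is atomic for that key.
  def countRequest(key: String): Unit = {
    counts.merge(key, 1, sum)
    ()
  }

  // Returns 0 for keys that were never counted.
  def current(key: String): Int = counts.getOrDefault(key, 0)
}
```

Because the update happens inside the map's own per-key atomic operation, two threads calling countRequest with the same key cannot overwrite each other's increment.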

MVar tryPut returns true and isEmpty also returns true

I wrote a simple callback (handler) which I pass to an async API, and I want to wait for the result:
object Handlers {
  val logger: Logger = Logger("Handlers")
  implicit val cs: ContextShift[IO] =
    IO.contextShift(ExecutionContext.Implicits.global)

  class DefaultHandler[A] {
    val response: IO[MVar[IO, A]] = MVar.empty[IO, A]

    def onResult(obj: Any): Unit = {
      obj match {
        case obj: A =>
          println(response.flatMap(_.tryPut(obj)).unsafeRunSync())
          println(response.flatMap(_.isEmpty).unsafeRunSync())
        case _ => logger.error("Wrong expected type")
      }
    }

    def getResponse: A = {
      response.flatMap(_.take).unsafeRunSync()
    }
  }
}
But for some reason both tryPut and isEmpty return true when I call onResult manually, so when I then call getResponse it blocks forever.
This is my test:
class HandlersTest extends FunSuite {
  test("DefaultHandler.test") {
    val handler = new DefaultHandler[Int]
    handler.onResult(3)
    val response = handler.getResponse
    assert(response != 0)
  }
}
Can somebody explain why tryPut returns true but nothing is put? And what is the right way to use MVar/channels in Scala?
IO[X] means that you have a recipe to create some X. So in your example, every time you run response you create a new MVar: you are putting into one MVar and then asking another.
Here is how I would do it.
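import scala.concurrent.ExecutionContext
import scala.reflect.ClassTag
import cats.effect.{ContextShift, IO}
import cats.effect.concurrent.MVar

object Handlers {
  // Same logger and ContextShift as in the question; MVar.empty[IO, A] needs the ContextShift.
  val logger: Logger = Logger("Handlers")
  implicit val cs: ContextShift[IO] =
    IO.contextShift(ExecutionContext.Implicits.global)

  trait DefaultHandler[A] {
    def onResult(obj: Any): IO[Unit]
    def getResponse: IO[A]
  }

  object DefaultHandler {
    // The MVar is created once, when this IO runs, and shared by both methods.
    def apply[A : ClassTag]: IO[DefaultHandler[A]] =
      MVar.empty[IO, A].map { response =>
        new DefaultHandler[A] {
          override def onResult(obj: Any): IO[Unit] = obj match {
            case obj: A =>
              for {
                r1 <- response.tryPut(obj)
                _ <- IO(println(r1))
                r2 <- response.isEmpty
                _ <- IO(println(r2))
              } yield ()
            case _ =>
              IO(logger.error("Wrong expected type"))
          }

          override def getResponse: IO[A] =
            response.take
        }
      }
  }
}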
The "unsafe" is sort of a hint, but every time you call unsafeRunSync, you should basically think of it as an entire new universe. Before you make the call, you can only describe instructions for what will happen, you can't actually change anything. During the call is when all the changes occur. Once the call completes, that universe is destroyed, and you can read the result but no longer change anything. What happens in one unsafeRunSync universe doesn't affect another.
You need to call it exactly once in your test code. That means your test code needs to look something like:
val test = for {
  handler <- Handlers.DefaultHandler[Int]
  _ <- handler.onResult(3)
  response <- handler.getResponse
} yield response
assert(test.unsafeRunSync() == 3)
Note this doesn't really buy you much over just using the MVar directly. I think you're trying to mix side effects inside IO and outside it, but that doesn't work. All the side effects need to be inside.

Scala: Invokation of methods with/without () with overridable implicits

Here is a definition of method, that uses ExecutionContext implicitly, and allows client to override it. Two execution contexts are used to test it:
val defaultEc = ExecutionContext.fromExecutor(
Executors.newFixedThreadPool(5))
Names of threads look like: 'pool-1-thread-1' to 'pool-1-thread-5'
And the 2nd one from Scala:
scala.concurrent.ExecutionContext.Implicits.global
Names of threads look like: 'scala-execution-context-global-11'
Client can override default implicit via:
implicit val newEc = scala.concurrent.ExecutionContext.Implicits.global
Unfortunately, the implicit is only overridden when the method is invoked without a trailing ():
val r = FutureClient.f("testDefault") // prints scala-execution-context-global-11
This does not work:
val r = FutureClient.f("testDefault")() // still prints pool-1-thread-1
Why does it work this way? It makes the API much more complicated for clients.
Here is a full code to run it and play:
object FutureClient {
  // thread names will be from 'pool-1-thread-1' to 'pool-1-thread-5'
  val defaultEc = ExecutionContext.fromExecutor(
    Executors.newFixedThreadPool(5))

  def f(beans: String)
       (implicit executor: ExecutionContext = defaultEc)
      : Future[String] = Future {
    println("thread: " + Thread.currentThread().getName)
    TimeUnit.SECONDS.sleep(Random.nextInt(3))
    s"$beans"
  }
}

class FutureTest {
  // prints thread: pool-1-thread-1
  @Test def testFDefault(): Unit = {
    val r = FutureClient.f("testDefault")
    while (!r.isCompleted) {
      TimeUnit.SECONDS.sleep(2)
    }
  }

  // prints thread: scala-execution-context-global-11
  @Test def testFOverridable(): Unit = {
    implicit val newEc = scala.concurrent.ExecutionContext.Implicits.global
    val r = FutureClient.f("testDefault")
    while (!r.isCompleted) {
      TimeUnit.SECONDS.sleep(2)
    }
  }

  // prints pool-1-thread-1, but not 'scala-execution-context-global-11',
  // because the client invokes f with () at the end
  @Test def testFOverridableWrong(): Unit = {
    implicit val newEc = scala.concurrent.ExecutionContext.Implicits.global
    val r = FutureClient.f("testDefault")()
    while (!r.isCompleted) {
      TimeUnit.SECONDS.sleep(2)
    }
  }
}
I have already discussed a couple of related topics, but they are related to API definition, so it is a new issue, not covered by those topics.
Scala Patterns To Avoid: Implicit Arguments With Default Values
f("testDefault") (or f("testDefault")(implicitly)) means that the implicit argument is resolved from the implicit scope; the default value is only used when no implicit is found.
f("testDefault")(newEc) means that you pass the implicit argument explicitly. Writing f("testDefault")() also supplies the implicit parameter list explicitly, but since the list is empty, the parameter falls back to its default value and the implicit scope is never consulted.
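The three call shapes can be compared side by side in a small sketch that does not involve threads at all. Label and describe are hypothetical stand-ins for ExecutionContext and FutureClient.f, and the behaviour shown is that of Scala 2, which the question uses.

```scala
object ImplicitDefaultDemo {
  // Hypothetical stand-in for ExecutionContext.
  final case class Label(value: String)

  // Hypothetical stand-in for FutureClient.f: an implicit parameter with a default.
  def describe(msg: String)(implicit label: Label = Label("default")): String =
    s"$msg via ${label.value}"

  def run(): List[String] = {
    implicit val local: Label = Label("implicit")
    List(
      describe("a"),                    // resolved from implicit scope: "a via implicit"
      describe("b")(),                  // explicit empty list, default used: "b via default"
      describe("c")(Label("explicit"))  // explicit argument: "c via explicit"
    )
  }
}
```

One way to avoid the surprise for API clients is to drop the default value from the implicit parameter, so that an empty argument list no longer compiles.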

Play/Scala: Making unknown number of I/O calls in parallell, watining for the results

So, I read the article here about parallel comprehension. He gives the following code example:
// Make 3 parallel async calls
val fooFuture = WS.url("http://foo.com").get()
val barFuture = WS.url("http://bar.com").get()
val bazFuture = WS.url("http://baz.com").get()

for {
  foo <- fooFuture
  bar <- barFuture
  baz <- bazFuture
} yield {
  // Build a Result using foo, bar, and baz
  Ok(...)
}
All fine so far, but, I am in a situation where I don't know how many WS.get()'s I need to do always, I want it to be dynamic. So for instance:
val checks = Seq(callOne(param), callTwo(param))
Where the calls are:
def callOne(param: String): Future[Boolean] = {
  // do something and return the Future with a true/false value
  Future(true)
}

def callTwo(param: String): Future[Boolean] = {
  // do something and return the Future with a true/false value
  Future(false)
}
So, my question is, how shall I react on the results of my sequence with WS calls (or database queries for that matter), in a for-yield?
I have given two example of calls, but I want the same code be able to process 1 to many number of calls in parallel and gather the results in the for-yield to ultimately proceed to do other things.
Important: All calls should be carried out in parallel, the quickest ones will complete before the slow ones without any respect to what order they are fired.
Future.sequence is likely what you want.
Example usage:
val futures = List(WS.url("http://foo.com").get(), WS.url("http://bar.com").get())
Future.sequence(futures) // => Transforms a Seq[Future[_]] to a Future[Seq[_]]
The future returned from Future.sequence will not be completed until all of the futures in the input sequence are completed.
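Applied to the callOne/callTwo example from the question, a minimal sketch could look like the following. The bodies of the two calls are made-up placeholders, and allPass and allPassBlocking are hypothetical names; the blocking helper exists only to make the sketch easy to try out and should not be used in a Play action.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelChecks {
  // Placeholder checks; real code would call WS or a database here.
  def callOne(param: String): Future[Boolean] = Future(param.nonEmpty)
  def callTwo(param: String): Future[Boolean] = Future(param.length > 3)

  // The futures start running as soon as the Seq is built, so they run
  // concurrently; Future.sequence only gathers their results in order.
  def allPass(param: String): Future[Boolean] = {
    val checks: Seq[Future[Boolean]] = Seq(callOne(param), callTwo(param))
    for (results <- Future.sequence(checks)) yield results.forall(identity)
  }

  // Demonstration helper only: blocks the calling thread for the result.
  def allPassBlocking(param: String): Boolean =
    Await.result(allPass(param), 5.seconds)
}
```

The same shape works for any number of calls, because the for-yield only ever sees the single combined future.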
Bonus:
If your futures are heterogeneously typed, and you need to preserve their types, you can use an HList. I've written the following snippet, which takes an HList of futures and transforms it into a Future containing an HList of resolved values:
import shapeless._
import scala.concurrent.{ExecutionContext,Future}

object FutureHelpers {
  object FutureReducer extends Poly2 {
    import scala.concurrent.ExecutionContext.Implicits.global
    implicit def f[A, B <: HList] = at[Future[A], Future[B]] { (f, resultFuture) =>
      for {
        result <- resultFuture
        value <- f
      } yield value :: result
    }
  }

  // Like Future.sequence, but for HList
  // hsequence(Future { 1 } :: Future { "string" } :: HNil)
  // => Future { 1 :: "string" :: HNil }
  def hsequence[T <: HList](hlist: T)(implicit
    executor: ExecutionContext,
    folder: RightFolder[T, Future[HNil], FutureReducer.type]) = {
    hlist.foldRight(Future.successful[HNil](HNil))(FutureReducer)
  }
}

Thread-safely transforming a value in a mutable map

Suppose I want to use a mutable map in Scala to keep track of the number of times I've seen some strings. In a single-threaded context, this is easy:
import scala.collection.mutable.{ Map => MMap }

class Counter {
  val counts = MMap.empty[String, Int].withDefaultValue(0)
  def add(s: String): Unit = counts(s) += 1
}
Unfortunately this isn't thread-safe, since the get and the update don't happen atomically.
Concurrent maps add a few atomic operations to the mutable map API, but not the one I need, which would look something like this:
def replace(k: A, f: B => B): Option[B]
I know I can use ScalaSTM's TMap:
import scala.concurrent.stm._

class Counter {
  val counts = TMap.empty[String, Int]
  def add(s: String): Unit = atomic { implicit txn =>
    counts(s) = counts.get(s).getOrElse(0) + 1
  }
}
But (for now) that's still an extra dependency. Other options would include actors (another dependency), synchronization (potentially less efficient), or Java's atomic references (less idiomatic).
In general I'd avoid mutable maps in Scala, but I've occasionally needed this kind of thing, and most recently I've used the STM approach (instead of just crossing my fingers and hoping I don't get bitten by the naïve solution).
I know there are a number of trade-offs here (extra dependencies vs. performance vs. clarity, etc.), but is there anything like a "right" answer to this problem in Scala 2.10?
How about this one? Assuming you don't really need a general replace method right now, just a counter.
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

object CountedMap {
  private val counts = new ConcurrentHashMap[String, AtomicInteger]

  def add(key: String): Int = {
    val zero = new AtomicInteger(0)
    // putIfAbsent returns null when our zero was inserted, otherwise the existing counter
    val value = Option(counts.putIfAbsent(key, zero)).getOrElse(zero)
    value.incrementAndGet
  }
}
You get better performance than synchronizing on the whole map, and you also get atomic increments.
The simplest solution is definitely synchronization. If there is not too much contention, performance might not be that bad.
Otherwise, you could try to roll up your own STM-like replace implementation. Something like this might do:
object ConcurrentMapOps {
  private val rng = new util.Random
  private val MaxReplaceRetryCount = 10
  private val MinReplaceBackoffTime: Long = 1
  private val MaxReplaceBackoffTime: Long = 20
}

implicit class ConcurrentMapOps[A, B](val m: collection.concurrent.Map[A, B]) {
  import ConcurrentMapOps._

  private def replaceBackoff(): Unit = {
    Thread.sleep((MinReplaceBackoffTime + rng.nextFloat * (MaxReplaceBackoffTime - MinReplaceBackoffTime)).toLong) // A bit crude, I know
  }

  def replace(k: A, f: B => B): Option[B] = {
    var retryCount = 0
    while (retryCount <= MaxReplaceRetryCount) {
      m.get(k) match {
        case None => return None
        case Some(old) =>
          // Re-read the value on every attempt: retrying with a stale `old` could never succeed
          if (m.replace(k, old, f(old))) return Some(old)
          retryCount += 1
          replaceBackoff()
      }
    }
    sys.error("Could not concurrently modify map")
  }
}
Note that collision issues are localized to a given key. If two threads access the same map but work on distinct keys, you'll have no collisions and the replace operation will always succeed the first time. If a collision is detected, we wait a bit (a random amount of time, so as to minimize the likelihood of threads fighting forever for the same key) and try again.
I cannot guarantee that this is production-ready (I just threw it together), but it might do the trick.
UPDATE: Of course (as Ionuț G. Stan pointed out), if all you want is to increment/decrement a value, Java's ConcurrentHashMap already provides those operations in a lock-free manner.
My above solution applies if you need a more general replace method that would take the transformation function as a parameter.
You're asking for trouble if your map is just sitting there as a val. If it meets your use case, I'd recommend something like
class Counter {
  private[this] val myCounts = MMap.empty[String, Int].withDefaultValue(0)
  def counts(s: String) = myCounts.synchronized { myCounts(s) }
  def add(s: String) = myCounts.synchronized { myCounts(s) += 1 }
  // Returns an immutable snapshot so callers never see the mutable map
  def getCounts = myCounts.synchronized { Map[String,Int]() ++ myCounts }
}
for low-contention usage. For high-contention, you should use a concurrent map designed to support such use (e.g. java.util.concurrent.ConcurrentHashMap) and wrap the values in AtomicWhatever.
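For the high-contention case, one possible reading of "AtomicWhatever" is java.util.concurrent.atomic.LongAdder, which is designed for frequently updated counters. The sketch below is only an illustration under that assumption: HighContentionCounter and its methods are hypothetical names, and it assumes Scala 2.12+ on Java 8+ (so it would not compile on Scala 2.10 as written).

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.LongAdder
import java.util.function.{Function => JFunction}

// Hypothetical class, not taken from any of the answers above.
final class HighContentionCounter {
  private val counts = new ConcurrentHashMap[String, LongAdder]()

  // Creates the adder for a key the first time that key is seen.
  private val newAdder: JFunction[String, LongAdder] = _ => new LongAdder

  // computeIfAbsent is atomic, so every thread gets the same adder per key.
  def add(s: String): Unit = counts.computeIfAbsent(s, newAdder).increment()

  // get returns null for unknown keys, hence the Option wrapper.
  def count(s: String): Long = Option(counts.get(s)).fold(0L)(_.sum())
}
```

The map only guards creation of each counter; the increments themselves go through the adder, so threads working on the same key do not lock the map.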
If you are ok to work with future based interface:
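import java.util.concurrent.Executors
import scala.collection.mutable.{ Map => MMap }
import scala.concurrent.{ ExecutionContext, Future }

trait SingleThreadedExecutionContext {
  val ec = ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor())
}

class Counter extends SingleThreadedExecutionContext {
  private val counts = MMap.empty[String, Int].withDefaultValue(0)
  // Every read and write runs on the single-threaded executor, so they never overlap
  def get(s: String): Future[Int] = Future(counts(s))(ec)
  def add(s: String): Future[Unit] = Future(counts(s) += 1)(ec)
}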
Test will look like:
class MutableMapSpec extends Specification {
  "thread safe" in {
    import ExecutionContext.Implicits.global
    val c = new Counter
    val testData = Seq.fill(16)("1")
    await(Future.traverse(testData)(c.add))
    await(c.get("1")) mustEqual 16
  }
}