I wrote a very simple mechanism that allows at most a given number of function calls within a given number of seconds. Think of it as a basic rate limiter.
It takes the computation to limit as an argument and returns the result of that computation.
The problem is that executions can be synchronous (of type => A) or asynchronous (of type => Future[A]) and that leads to two extremely similar functions:
import scala.collection.mutable.Queue
import scala.concurrent.{ExecutionContext, Future}
case class Limiter[A](max: Int, seconds: Int) {
  private val queue = Queue[Long]()
  def limit(value: => A): Option[A] = {
    val now = System.currentTimeMillis()
    if (queue.length == max) {
      val oldest = queue.head
      if (now - oldest < seconds * 1000) return None
      else queue.dequeue()
    }
    queue.enqueue(now)
    Some(value)
  }
  // Future.map needs an ExecutionContext; an already-failed limit check needs no scheduling
  def limitFuture(future: => Future[A])(implicit ec: ExecutionContext): Future[Option[A]] = {
    val now = System.currentTimeMillis()
    if (queue.length == max) {
      val oldest = queue.head
      if (now - oldest < seconds * 1000) return Future.successful(None)
      else queue.dequeue()
    }
    future.map { x =>
      queue.enqueue(now)
      Some(x)
    }
  }
}
(I am not actually using Option but a set of types I defined; Option is used here for simplicity's sake.)
Examples of usage:
// Prevent more than 5 runs/minute. Useful, for example, to prevent email spamming
val emailLimit = Limiter[Boolean](5, 60)
val result = emailLimit.limitFuture { sendEmail(...) } // `sendEmail` returns a future
// Prevent more than 1 run/hour. Useful, for example, to cache an HTML response
val htmlLimit = Limiter[String](1, 3600)
val html = htmlLimit.limit { getHTML(...) } // `getHTML` returns the HTML as a string directly
How can I refactor these methods to avoid repetition? Later needs might include other argument types, not only plain values and Futures, so I'd like to keep my options open if possible.
The only "solution" I could come up with so far is to replace limit:
def limit(value: => A): Option[A] = {
  Await.result(limitFuture(Future.successful(value)), 5.seconds)
}
Well, it works, but it feels backwards. I would rather have the => A version be the base one that other methods build on or, even better, a generic (private) method that both limit and limitFuture could delegate to.
Actually, it would be even better-er if a single limit function could take care of this regardless of the argument type, but I doubt it's possible.
You can condense this down to one method with an implicit parameter handling the differences:
import scala.collection.mutable.Queue
import scala.concurrent.{ExecutionContext, Future}
trait Limitable[A, B] {
  type Out
  def none: Out
  def some(b: B, f: () => Unit): Out
}
// The refinement { type Out = ... } in each result type keeps the concrete Out visible to callers;
// with a plain Limitable[A, A] result type, l.Out would stay an opaque abstract type.
implicit def rawLimitable[A]: Limitable[A, A] { type Out = Option[A] } =
  new Limitable[A, A] {
    type Out = Option[A]
    def none: Out = None
    def some(a: A, f: () => Unit): Out = {
      f()
      Some(a)
    }
  }
implicit def futureLimitable[A](implicit ec: ExecutionContext): Limitable[A, Future[A]] { type Out = Future[Option[A]] } =
  new Limitable[A, Future[A]] {
    type Out = Future[Option[A]]
    def none: Out = Future.successful(None)
    def some(future: Future[A], f: () => Unit): Out = future.map { a =>
      f()
      Some(a)
    }
  }
case class Limiter[A](max: Int, seconds: Int) {
  private val queue = Queue[Long]()
  def limit[B](in: => B)(implicit l: Limitable[A, B]): l.Out = {
    val now = System.currentTimeMillis()
    if (queue.length == max) {
      val oldest = queue.head
      if (now - oldest < seconds * 1000) return l.none
      else queue.dequeue()
    }
    l.some(in, () => queue.enqueue(now))
  }
}
And use it like:
val limit = Limiter[String](1, 3600)
limit.limit("foo")
limit.limit(Future("bar")) // Future(...) needs an implicit ExecutionContext in scope
You can use the Applicative typeclass from cats or scalaz. Applicative, among other things, lets you lift a value into some context F (using pure) and is also a functor, so you can use map on F[A].
Currently you want it for the Id and Future types (you need an ExecutionContext in scope for the Future applicative to work). It will also work for things like Vector or Validated, though you might have problems adding custom collection types.
import cats._, implicits._
import scala.concurrent._
import scala.collection.mutable.Queue
case class Limiter[A](max: Int, seconds: Int) {
  private val queue = Queue[Long]()
  def limitA[F[_]: Applicative](value: => F[A]): F[Option[A]] = {
    val now = System.currentTimeMillis()
    if (queue.length == max) {
      val oldest = queue.head
      if (now - oldest < seconds * 1000) return none[A].pure[F]
      else queue.dequeue()
    }
    value.map { x =>
      queue.enqueue(now)
      x.some
    }
  }
  // or leave these e.g. for source compatibility
  def limit(value: => A): Option[A] = limitA[Id](value)
  def limitFuture(future: => Future[A])(implicit ec: ExecutionContext): Future[Option[A]] = limitA(future)
}
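If adding cats is not an option, the same single-method idea can be sketched with a hand-rolled two-method typeclass. This is a minimal sketch under the assumption that only pure and map are needed; the names Lift, idLift, futureLift and limitF are hypothetical, not cats or scalaz API, and like the original it keeps timestamps in a plain mutable Queue, so it is not thread-safe.

```scala
import scala.collection.mutable.Queue
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical minimal stand-in for cats' Applicative: only pure and map.
trait Lift[F[_]] {
  def pure[A](a: A): F[A]
  def map[A, B](fa: F[A])(f: A => B): F[B]
}

object Lift {
  type Id[A] = A

  // Plain values: pure is the identity and map is plain function application.
  implicit val idLift: Lift[Id] = new Lift[Id] {
    def pure[A](a: A): Id[A] = a
    def map[A, B](fa: Id[A])(f: A => B): Id[B] = f(fa)
  }

  // Futures: map runs f asynchronously, so an ExecutionContext is required.
  implicit def futureLift(implicit ec: ExecutionContext): Lift[Future] = new Lift[Future] {
    def pure[A](a: A): Future[A] = Future.successful(a)
    def map[A, B](fa: Future[A])(f: A => B): Future[B] = fa.map(f)
  }
}

case class Limiter[A](max: Int, seconds: Int) {
  // Timestamps of recent calls; a plain mutable Queue, so not thread-safe.
  private val queue = Queue[Long]()

  // The single generic method: F is Lift.Id for plain values and Future for futures.
  def limitF[F[_]](value: => F[A])(implicit L: Lift[F]): F[Option[A]] = {
    val now = System.currentTimeMillis()
    if (queue.length == max) {
      if (now - queue.head < seconds * 1000L) return L.pure[Option[A]](None)
      queue.dequeue()
    }
    // value is only evaluated when the call is allowed; the timestamp is recorded inside map.
    L.map[A, Option[A]](value) { a =>
      queue.enqueue(now)
      Some(a)
    }
  }

  def limit(value: => A): Option[A] = limitF[Lift.Id](value)

  def limitFuture(value: => Future[A])(implicit ec: ExecutionContext): Future[Option[A]] =
    limitF(value)
}
```

As in the cats version, the Id case needs its type argument spelled out, while Future is inferred from the argument.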
Notes:
I'm using none[A] instead of None: Option[A] and a.some instead of Some(a): Option[A]. These helpers are available in both cats and scalaz and you need them because F[_] here is not defined as covariant.
You have to specify Id as a type explicitly, e.g. .limitA[Id](3). This is not the case with Future, however.
Your map call is strange. It is parsed as:
future.map {
  queue.enqueue(now) // in current thread
  x => Some(x)
}
which is the same as:
queue.enqueue(now) // in current thread
future.map {
  x => Some(x)
}
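The same parsing applies to any map call, not only Future's. A minimal sketch with a List, so the order is deterministic: the statement before the lambda runs once, when the argument block is evaluated, and only the lambda itself runs per element (the counters are illustrative).

```scala
// Count how often each part of the argument block runs.
var blockRuns = 0
var lambdaRuns = 0

val result = List(1, 2, 3).map {
  blockRuns += 1 // runs once, while the argument block is evaluated
  (x: Int) => { lambdaRuns += 1; x * 10 } // the block's value: the function map applies
}
```
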
Background
I have been reading the book Functional Programming in Scala, and have some questions regarding the content in Chapter 7: Purely functional parallelism.
Here is the code for the answers in the book: Par.scala, but I am confused about certain parts of it.
Here is the first part of the code of Par.scala, which stands for Parallelism:
import java.util.concurrent._
object Par {
  type Par[A] = ExecutorService => Future[A]
  def unit[A](a: A): Par[A] = (es: ExecutorService) => UnitFuture(a)
  private case class UnitFuture[A](get: A) extends Future[A] {
    def isDone = true
    def get(timeout: Long, units: TimeUnit): A = get
    def isCancelled = false
    def cancel(evenIfRunning: Boolean): Boolean = false
  }
  def map2[A, B, C](a: Par[A], b: Par[B])(f: (A, B) => C): Par[C] =
    (es: ExecutorService) => {
      val af = a(es)
      val bf = b(es)
      UnitFuture(f(af.get, bf.get))
    }
  def fork[A](a: => Par[A]): Par[A] =
    (es: ExecutorService) => es.submit(new Callable[A] {
      def call: A = a(es).get
    })
  def lazyUnit[A](a: => A): Par[A] =
    fork(unit(a))
  def run[A](es: ExecutorService)(a: Par[A]): Future[A] = a(es)
  def asyncF[A, B](f: A => B): A => Par[B] =
    a => lazyUnit(f(a))
  def map[A, B](pa: Par[A])(f: A => B): Par[B] =
    map2(pa, unit(()))((a, _) => f(a))
}
The simplest possible model for Par[A] might be ExecutorService => Future[A], and run simply returns the Future.
unit promotes a constant value to a parallel computation by returning a UnitFuture, which is a simple implementation of Future that just wraps a constant value.
map2 combines the results of two parallel computations with a binary function.
fork marks a computation for concurrent evaluation. The evaluation won’t actually occur until forced by run. This is its simplest and most natural implementation; even though it has its problems, let's put them aside for now.
lazyUnit wraps its unevaluated argument in a Par and marks it for concurrent evaluation.
run extracts a value from a Par by actually performing the computation.
asyncF converts any function A => B to one that evaluates its result asynchronously.
Questions
fork is the function that confuses me most here, because it takes a lazy (by-name) argument, which is evaluated later, when it is called. So my questions are mostly about when we should use fork, i.e., when we need lazy evaluation and when we need to have the value directly.
Here is an exercise from the book:
EXERCISE 7.5
Hard: Write this function, called sequence. No additional primitives are required. Do not call run.
def sequence[A](ps: List[Par[A]]): Par[List[A]]
And here are the answers (offered here).
First
def sequence_simple[A](l: List[Par[A]]): Par[List[A]] =
  l.foldRight[Par[List[A]]](unit(List()))((h, t) => map2(h, t)(_ :: _))
What is the difference between the code above and the following:
def sequence_simple[A](l: List[Par[A]]): Par[List[A]] =
  l.foldLeft[Par[List[A]]](unit(List()))((t, h) => map2(h, t)(_ :: _))
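One concrete difference can be checked directly: with foldLeft, each new element is prepended to the list accumulated so far, so the result comes out reversed. A minimal sketch, assuming a stripped-down Par where unit uses CompletableFuture.completedFuture instead of the book's UnitFuture; ParFold, sequenceRight and sequenceLeft are names chosen here for illustration.

```scala
import java.util.concurrent.{CompletableFuture, ExecutorService, Executors, Future}

object ParFold {
  type Par[A] = ExecutorService => Future[A]

  // Already-completed future, standing in for the book's UnitFuture.
  def unit[A](a: A): Par[A] = _ => CompletableFuture.completedFuture(a)

  // Same shape as the book's map2: run both sides, then combine the results.
  def map2[A, B, C](a: Par[A], b: Par[B])(f: (A, B) => C): Par[C] =
    es => CompletableFuture.completedFuture(f(a(es).get(), b(es).get()))

  // foldRight builds map2(p1, map2(p2, map2(p3, unit(Nil)))).
  def sequenceRight[A](l: List[Par[A]]): Par[List[A]] =
    l.foldRight[Par[List[A]]](unit(List.empty[A]))((h, t) => map2(h, t)(_ :: _))

  // foldLeft builds map2(p3, map2(p2, map2(p1, unit(Nil)))): later elements end up in front.
  def sequenceLeft[A](l: List[Par[A]]): Par[List[A]] =
    l.foldLeft[Par[List[A]]](unit(List.empty[A]))((t, h) => map2(h, t)(_ :: _))
}
```
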
Additionally
def sequenceRight[A](as: List[Par[A]]): Par[List[A]] =
  as match {
    case Nil => unit(Nil)
    case h :: t => map2(h, fork(sequenceRight(t)))(_ :: _)
  }
def sequenceBalanced[A](as: IndexedSeq[Par[A]]): Par[IndexedSeq[A]] = fork {
  if (as.isEmpty) unit(Vector())
  else if (as.length == 1) map(as.head)(a => Vector(a))
  else {
    val (l, r) = as.splitAt(as.length / 2)
    map2(sequenceBalanced(l), sequenceBalanced(r))(_ ++ _)
  }
}
In sequenceRight, fork is applied directly to the recursive call. In sequenceBalanced, however, fork wraps the whole function body.
So what is the difference between the code above and the following, where the placement of fork is swapped?
def sequenceRight[A](as: List[Par[A]]): Par[List[A]] = fork {
  as match {
    case Nil => unit(Nil)
    case h :: t => map2(h, sequenceRight(t))(_ :: _)
  }
}
def sequenceBalanced[A](as: IndexedSeq[Par[A]]): Par[IndexedSeq[A]] =
  if (as.isEmpty) unit(Vector())
  else if (as.length == 1) map(as.head)(a => Vector(a))
  else {
    val (l, r) = as.splitAt(as.length / 2)
    map2(fork(sequenceBalanced(l)), fork(sequenceBalanced(r)))(_ ++ _)
  }
Finally, given the sequence defined above, we have the following function:
def parMap[A, B](ps: List[A])(f: A => B): Par[List[B]] = fork {
  val fbs: List[Par[B]] = ps.map(asyncF(f))
  sequence(fbs)
}
Can I also implement the function in the following way, using the lazyUnit defined at the beginning? Is lazyUnit(ps.map(f)) lazy?
def parMapByLazyUnit[A, B](ps: List[A])(f: A => B): Par[List[B]] =
  lazyUnit(ps.map(f))
I did not completely understand your question, but I see a major problem with the following solution:
def parMapByLazyUnit[A, B](ps: List[A])(f: A => B): Par[List[B]] =
  lazyUnit(ps.map(f))
To understand the problem, let's look at lazyUnit and the fork it calls:
def fork[A](a: => Par[A]): Par[A] =
  (es: ExecutorService) => es.submit(new Callable[A] {
    def call: A = a(es).get
  })
def lazyUnit[A](a: => A): Par[A] =
  fork(unit(a))
So lazyUnit takes an expression of type => A, submits it to the ExecutorService to be evaluated, and returns the result of this parallel computation wrapped as a Par[A].
In parMap, for every element of ps: List[A], we not only have to evaluate the corresponding mapping using the function f: A => B, but we have to do these evaluations in parallel.
But our solution lazyUnit(ps.map(f)) submits the whole { ps.map(f) } evaluation as a single task to our ExecutorService, which means the elements are not processed in parallel.
What we need to do is make sure that, for each element a in ps: List[A], the function f: A => B is executed as a separate task on our ExecutorService.
As we learned from our implementation, we can run an expression exp: => A by using lazyUnit(exp) to get a result: Par[A].
So, we will do exactly that for every a: A in ps: List[A],
val parMappedTmp = ps.map( a => lazyUnit(f(a) ) )
// or
val parMappedTmp = ps.map( a => asyncF(f)(a) )
// or
val parMappedTmp = ps.map(asyncF(f))
But now parMappedTmp is a List[Par[B]], whereas we need a Par[List[B]].
So, you will need a function with the following signature to get what you wanted,
def sequence[A](ps: List[Par[A]]): Par[List[A]]
Once you have it,
val parMapped = sequence(parMappedTmp)
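The difference in task granularity can be made visible by counting submissions. A minimal sketch, assuming a stripped-down Par (unit via CompletableFuture.completedFuture rather than the book's UnitFuture) and a hypothetical CountingExecutor that counts every task handed to it: parMap should submit one task for the outer fork plus one per element, while lazyUnit(ps.map(f)) submits a single task. A cached thread pool is used because the naive blocking fork can deadlock on a small fixed pool.

```scala
import java.util.concurrent.{AbstractExecutorService, Callable, CompletableFuture, ExecutorService, Executors, Future, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object Par {
  type Par[A] = ExecutorService => Future[A]

  // Already-completed future, standing in for the book's UnitFuture.
  def unit[A](a: A): Par[A] = _ => CompletableFuture.completedFuture(a)

  def map2[A, B, C](a: Par[A], b: Par[B])(f: (A, B) => C): Par[C] =
    es => {
      val af = a(es)
      val bf = b(es)
      CompletableFuture.completedFuture(f(af.get(), bf.get()))
    }

  // Same naive fork as in the book: one submitted task that blocks on the inner Par.
  def fork[A](a: => Par[A]): Par[A] =
    es => es.submit(new Callable[A] { def call(): A = a(es).get() })

  def lazyUnit[A](a: => A): Par[A] = fork(unit(a))
  def asyncF[A, B](f: A => B): A => Par[B] = a => lazyUnit(f(a))

  def sequence[A](ps: List[Par[A]]): Par[List[A]] =
    ps.foldRight[Par[List[A]]](unit(List.empty[A]))((h, t) => map2(h, t)(_ :: _))

  // One task per element, plus the outer fork.
  def parMap[A, B](ps: List[A])(f: A => B): Par[List[B]] =
    fork(sequence(ps.map(asyncF(f))))

  // The whole map runs as one task.
  def parMapByLazyUnit[A, B](ps: List[A])(f: A => B): Par[List[B]] =
    lazyUnit(ps.map(f))
}

// Hypothetical helper: counts tasks passed to execute (AbstractExecutorService.submit goes through execute).
class CountingExecutor extends AbstractExecutorService {
  private val delegate = Executors.newCachedThreadPool()
  val submitted = new AtomicInteger(0)
  def execute(task: Runnable): Unit = { submitted.incrementAndGet(); delegate.execute(task) }
  def shutdown(): Unit = delegate.shutdown()
  def shutdownNow(): java.util.List[Runnable] = delegate.shutdownNow()
  def isShutdown(): Boolean = delegate.isShutdown()
  def isTerminated(): Boolean = delegate.isTerminated()
  def awaitTermination(timeout: Long, unit: TimeUnit): Boolean = delegate.awaitTermination(timeout, unit)
}
```

Both versions compute the same list; only the number of tasks differs.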
I have an Iterator[Record] which is ordered on record.id this way:
record.id=1
record.id=1
...
record.id=1
record.id=2
record.id=2
..
record.id=2
Records of a specific ID could occur a large number of times, so I want to write a function that takes this iterator as input, and returns an Iterator[Iterator[Record]] output in a lazy manner.
I was able to come up with the following, but it fails with a StackOverflowError after 500K records or so:
def groupByIter[T, B](iterO: Iterator[T])(func: T => B): Iterator[Iterator[T]] = new Iterator[Iterator[T]] {
  var iter = iterO
  def hasNext = iter.hasNext
  def next() = {
    val first = iter.next()
    val firstValue = func(first)
    val (i1, i2) = iter.span(el => func(el) == firstValue)
    iter = i2
    Iterator(first) ++ i1
  }
}
What am I doing wrong?
The trouble here is that each Iterator.span call creates another closure stacked on top of the trailing iterator, and without any trampolining it is very easy to overflow the stack.
Actually, I don't think there is an implementation that avoids memoizing elements of the prefix iterator, since the following iterator could be accessed before the prefix is drained.
Even the .span implementation uses a Queue to memoize elements in its Leading helper.
So the easiest implementation I can imagine is the following, via Stream:
implicit class StreamChopOps[T](xs: Stream[T]) {
  def chopBy[U](f: T => U): Stream[Stream[T]] = xs match {
    case x #:: _ =>
      def eq(e: T) = f(e) == f(x)
      xs.takeWhile(eq) #:: xs.dropWhile(eq).chopBy(f)
    case _ => Stream.empty
  }
}
It may not be the most performant, as it memoizes a lot, but if the result is iterated properly, the GC should handle the excess intermediate streams.
You could use it as myIterator.toStream.chopBy(f)
A simple check confirms that the following code runs without a StackOverflowError:
Iterator.fill(10000000)(Iterator(1,1,2)).flatten //1,1,2,1,1,2,...
.toStream.chopBy(identity) //(1,1),(2),(1,1),(2),...
.map(xs => xs.sum * xs.size).sum //60000000
Inspired by the chopBy implemented by @Odomontois, here is a chopBy I implemented for Iterator. Of course, each bulk has to fit in the allocated memory. It doesn't look very elegant, but it seems to work :)
import scala.annotation.tailrec
implicit class IteratorChopOps[A](toChopIter: Iterator[A]) {
  def chopBy[U](f: A => U) = new Iterator[Traversable[A]] {
    var next_el: Option[A] = None
    @tailrec
    private def accum(acc: List[A]): List[A] = {
      next_el = None
      val new_acc = hasNext match {
        case true =>
          val next = toChopIter.next()
          acc match {
            case Nil =>
              acc :+ next
            case _ MatchTail t if (f(t) == f(next)) =>
              acc :+ next
            case _ =>
              next_el = Some(next)
              acc
          }
        case false =>
          next_el = None
          return acc
      }
      next_el match {
        case Some(_) =>
          new_acc
        case None => accum(new_acc)
      }
    }
    def hasNext = {
      toChopIter.hasNext || next_el.isDefined
    }
    def next(): Traversable[A] = accum(next_el.toList)
  }
}
And here is the extractor for matching the tail:
object MatchTail {
  def unapply[A](l: Traversable[A]) = Some((l.init, l.last))
}
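If materializing each group is acceptable (as both answers above assume), a BufferedIterator avoids both the nested span closures and the extra bookkeeping. A minimal sketch; ConsecutiveGrouping and groupConsecutiveBy are hypothetical names, and the source iterator must not be used directly after calling it, because the wrapper buffers one element ahead.

```scala
object ConsecutiveGrouping {
  implicit class GroupConsecutiveOps[A](it: Iterator[A]) {
    // Groups runs of consecutive elements with equal keys; each run is materialized as a Vector.
    def groupConsecutiveBy[K](key: A => K): Iterator[Vector[A]] = new Iterator[Vector[A]] {
      private val buffered = it.buffered
      def hasNext: Boolean = buffered.hasNext
      def next(): Vector[A] = {
        // head throws NoSuchElementException when exhausted, as Iterator.next() should.
        val k = key(buffered.head)
        val group = Vector.newBuilder[A]
        while (buffered.hasNext && key(buffered.head) == k) group += buffered.next()
        group.result()
      }
    }
  }
}
```

Each call to next() consumes exactly one run with a plain loop, so the stack depth stays constant regardless of input size.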
I have a vague idea: pass (or chain) some implicit value in the following manner, without introducing parameters to the block f:
def block(x: Int)(f: => Unit)(implicit v: Int) = {
  implicit val nv = v + x
  f
}
def fun(implicit v: Int) = println(v)
such that if I used something like:
implicit val ii: Int = 0
block(1) {
  block(2) {
    fun
  }
}
It would print 3.
If only I could write def block(x: Int)(f: implicit Int => Unit).
In other words, I'm looking for a design pattern that would let me implement this DSL: access some cumulative value inside nested blocks without explicitly passing it as a parameter. Is it possible? (Implicits are not necessary; they are just a hint to emphasize that I don't want to pass that accumulator explicitly.) Of course, the code above prints 0.
EDIT: One possible use: composing HTTP routes in the following manner:
prefix("path") {
  prefix("subpath") {
    post("action1") { (req, res) => do action }
    get("action2") { (req, res) => do action }
  }
}
Here post and get would access (how?) the accumulated prefix, say List("path", "subpath") or "/path/subpath/".
Consider using DynamicVariable for this. It's really simple to use, and thread-safe:
import scala.util.DynamicVariable
val acc: DynamicVariable[Int] = new DynamicVariable(0)
def block(x: Int)(f: => Unit) = {
  acc.withValue(acc.value + x)(f)
}
def fun = println(acc.value)
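Applied to the routing example from the question, a minimal sketch might look like this. The Routes object, its registered buffer and the handler-less post/get signatures are hypothetical simplifications; the point is that each prefix block extends the accumulated path only for the dynamic extent of its body. Because DynamicVariable is thread-local, the full path is computed at registration time rather than when a handler later runs.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.DynamicVariable

// Hypothetical routing DSL; only the prefix accumulation matters here.
object Routes {
  // Path segments accumulated by the enclosing prefix blocks on this thread.
  private val segments = new DynamicVariable[List[String]](Nil)

  // Every registered route as (HTTP method, full path).
  val registered = ListBuffer.empty[(String, String)]

  def prefix(segment: String)(body: => Unit): Unit =
    segments.withValue(segments.value :+ segment)(body)

  private def fullPath(action: String): String =
    (segments.value :+ action).mkString("/", "/", "")

  def post(action: String): Unit = registered += ("POST" -> fullPath(action))
  def get(action: String): Unit = registered += ("GET" -> fullPath(action))
}
```
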
Passing state via implicit is dirty and will lead to unexpected and hard to track down bugs. What you're asking to do is build a function that can compose in such a way that nested calls accumulate over some operation and anything else uses that value to execute the function?
// op combines the value accumulated so far with the value produced by a nested block
case class StateAccum[S](init: S)(val op: (S, S) => S) {
  def flatMap[A <: S](f: S => StateAccum[A]): StateAccum[S] = {
    val out = f(init).init
    StateAccum(op(init, out))(op)
  }
  def apply[A](f: S => A): A = f(init)
}
which could allow you to do exactly what you're after with a slight change in how you're calling it.
Now, if you really want the nested control structures, your apply would have to use an implicit value to distinguish the types of the return such that it applied the function to one and a flatMap to StateAccum returns. It gets crazy but looks like the following:
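// Inside StateAccum, replacing the simple apply:
def apply[A](f: S => A)(implicit mapper: Mapper[S, A]): mapper.Out = mapper(this, f)
trait Mapper[S, A] {
  type Out
  def apply(s: StateAccum[S], f: S => A): Out
}
object Mapper extends LowPriorityMapper {
  // Nested StateAccum: combine it with the outer one via flatMap.
  implicit def nested[S, A <: S]: Mapper[S, StateAccum[A]] { type Out = StateAccum[S] } =
    new Mapper[S, StateAccum[A]] {
      type Out = StateAccum[S]
      def apply(s: StateAccum[S], f: S => StateAccum[A]): Out = s.flatMap(f)
    }
}
trait LowPriorityMapper {
  // Any other result: just apply f to the current value.
  implicit def plain[S, A]: Mapper[S, A] { type Out = A } =
    new Mapper[S, A] {
      type Out = A
      def apply(s: StateAccum[S], f: S => A): Out = f(s.init)
    }
}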
I'm basically following the example given at the Scala API page for delimited continuations. The code below works fine:
import scala.util.continuations._
import scala.collection.mutable.HashMap
val sessions = new HashMap[Int, Int => Unit]
def ask(prompt: String): Int @cps[Unit] = shift {
  ret: (Int => Unit) => {
    val id = sessions.size
    printf("%s\nrespond with: submit(0x%x, ...)\n", prompt, id)
    sessions += id -> ret
  }
}
def submit(id: Int, addend: Int): Unit = {
  sessions.get(id) match {
    case Some(continueWith) => continueWith(addend)
  }
}
def go = reset {
  println("Welcome!")
  val first = ask("Please give me a number")
  val second = ask("Please enter another number")
  printf("The sum of your numbers is: %d\n", first + second)
}
However, when I modify go to the following:
def go = reset {
  println("Welcome!")
  List("First?","Second?").map[Int @cps[Unit]](ask)
}
I get this error:
error: wrong number of type parameters for method map: [B, That](f: String => B)
(implicit bf: scala.collection.generic.CanBuildFrom[List[String],B,That])That
List("First?","Second?").map[Int @cps[Unit]](ask)
^
Adding Any as a second type parameter doesn't help. Any idea what types I should be supplying?
The reason is that this is simply not possible without creating a CPS-transformed map method on List: the CPS annotations make the compiler turn your methods “inside out” in order to pass the continuation back to where it is needed and the standard List.map does not obey the transformed contract. If you want to have your mind wrapped in Klein bottles for a while you may look at the class files produced from your source, in particular the method signatures.
This is the primary reason why the CPS plugin will never be a complete generic solution to this problem, which is not due to a deficiency but caused by an inherent mismatch between “straight” code and continuation passing style.
You need to give correct parameter for the CanBuildFrom implicit to be found:
List("First?","Second?").map[Int @cps[Unit], List[Int @cps[Unit]]](ask)
But do you really need to be explicit about the type? Maybe just .map(ask) will work.
Here's the closest thing I could work out. It uses shiftR to reify the continuation rather than reset it, uses a foldRight to construct the suspended continuation chain, uses a shift/reset block to get the continuation after the suspension, and an "animate" method to kick off the suspended continuation.
import scala.collection.mutable.HashMap
import scala.util.continuations._
val sessions = new HashMap[Int, (Unit => Unit, Int)]
val map = new HashMap[Int, Int]
def ask(pair: (String, Int)) = pair match {
  case (prompt, index) => shiftR { (ret: Unit => Unit) => {
    val id = sessions.size
    printf("%s\nrespond with: submit(0x%x, ...)\n", prompt, id)
    sessions += id -> (ret, index)
    ()
  }}
}
def submit(id: Int, addend: Int): Unit = {
  sessions.get(id) match {
    case Some((continue, index)) => { map.put(index, addend); continue() }
  }
}
// sum of all submitted values
def sum(m: HashMap[Int, Int]): Int = m.values.sum
type Suspended = ControlContext[Unit, Unit, Unit]
class AnimateList(l: List[Suspended]) {
  def suspend(k: => Unit) = (c: Unit) => k
  def animate(k: Unit => Unit): Unit = {
    l.foldRight(k)(
      (elem: Suspended, acc: Unit => Unit) => suspend(elem.fun(acc, ex => ())))()
  }
}
implicit def listToAnimateList(l: List[Suspended]) = new AnimateList(l)
reset {
  val conts = List("First?","Second?","Third?").zipWithIndex.map(ask)
  shift { conts.animate }
  println(sum(map))
}
The general question is how to return additional information from methods, besides the actual result of the computation, in such a way that this information can silently be ignored.
Take for example the method dropWhile on Iterator. The returned result is the mutated iterator. But maybe sometimes I might be interested in the number of elements dropped.
In the case of dropWhile, this information could be generated externally by adding an index to the iterator and calculating the number of dropped steps afterwards. But in general this is not possible.
A simple solution is to return a tuple with the actual result and the optional information. But then I need to handle the tuple whenever I call the method, even if I'm not interested in the optional information.
So the question is whether there is some clever way of gathering such optional information.
Maybe through Option[X => Unit] parameters with call-back functions that default to None? Is there something more clever?
Just my two cents here…
You could declare this:
case class RichResult[+A, +B](val result: A, val info: B)
with an implicit conversion to A:
import scala.language.implicitConversions
implicit def unwrapRichResult[A, B](richResult: RichResult[A, B]): A = richResult.result
Then:
def someMethod: RichResult[Int, String] = /* ... */
val richRes = someMethod
val res: Int = someMethod
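A concrete use, with a hypothetical dropWhileCounting helper standing in for the elided someMethod: callers who only want the iterator can ignore the count, and callers who want it read info. Defining the conversion in the companion object places it in implicit scope, so it is found without an explicit import of unwrapRichResult.

```scala
import scala.language.implicitConversions

case class RichResult[+A, +B](result: A, info: B)

object RichResult {
  // Lets a RichResult be used wherever its plain result is expected.
  implicit def unwrapRichResult[A, B](richResult: RichResult[A, B]): A = richResult.result
}

// Hypothetical helper: drops leading elements that satisfy p and reports how many were dropped.
def dropWhileCounting[A](it: Iterator[A])(p: A => Boolean): RichResult[Iterator[A], Int] = {
  val buffered = it.buffered
  var dropped = 0
  while (buffered.hasNext && p(buffered.head)) {
    buffered.next()
    dropped += 1
  }
  RichResult(buffered, dropped)
}
```
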
It's definitely not more clever, but you could just create a method that drops the additional information.
def removeCharWithCount(str: String, x: Char): (String, Int) =
  (str.replace(x.toString, ""), str.count(_ == x))
// alias that drops the additional return information
def removeChar(str: String, x: Char): String =
  removeCharWithCount(str, x)._1
Here is my take (with some edits with a more realistic example):
package info {
  trait Info[T] { var data: Option[T] }
  object Info {
    implicit def makeInfo[T]: Info[T] = new Info[T] {
      var data: Option[T] = None
    }
  }
}
Then suppose your original method (and use case) is implemented like this:
object Test extends App {
  def dropCounterIterator[A](iter: Iterator[A]) = new Iterator[A] {
    def hasNext = iter.hasNext
    def next() = iter.next()
    override def dropWhile(p: (A) => Boolean): Iterator[A] = {
      var count = 0
      var current: Option[A] = None
      while (hasNext && p({current = Some(next()); current.get})) { count += 1 }
      current match {
        case Some(a) => Iterator.single(a) ++ this
        case None => Iterator.empty
      }
    }
  }
  val i = dropCounterIterator(Iterator.from(1))
  val ii = i.dropWhile(_ < 10)
  println(ii.next())
}
To provide and get access to the info, the code would be modified only slightly:
import info.Info // line added
object Test extends App {
  def dropCounterIterator[A](iter: Iterator[A]) = new Iterator[A] {
    def hasNext = iter.hasNext
    def next() = iter.next()
    // note: overloaded variant because of the extra parameter list, not overridden
    def dropWhile(p: (A) => Boolean)(implicit info: Info[Int]): Iterator[A] = {
      var count = 0
      var current: Option[A] = None
      while (hasNext && p({current = Some(next()); current.get})) { count += 1 }
      info.data = Some(count) // line added here
      current match {
        case Some(a) => Iterator.single(a) ++ this
        case None => Iterator.empty
      }
    }
  }
  val i = dropCounterIterator(Iterator.from(1))
  val info = implicitly[Info[Int]] // line added here
  val ii = i.dropWhile((x: Int) => x < 10)(info) // line modified
  println(ii.next())
  println(info.data.get) // line added here
}
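Note that the type inference is affected and I had to annotate the type of the function passed to dropWhile, most likely because dropWhile is now overloaded, which keeps the compiler from inferring the lambda's parameter type from the expected type.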
You want dropWhileM with the State monad threading a counter through the computation.
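A minimal sketch of that idea, with a small hand-rolled State type so it runs without cats or scalaz; State, dropWhileM and lessThanCounting here are illustrative definitions, not the libraries' API. The predicate runs inside State, so it can increment a counter for each element it drops. The recursion is not stack-safe for very long dropped prefixes.

```scala
// Minimal hand-rolled State: a state transition that also yields a value.
final case class State[S, A](run: S => (A, S)) {
  def map[B](f: A => B): State[S, B] =
    State[S, B] { s => val (a, s1) = run(s); (f(a), s1) }
  def flatMap[B](f: A => State[S, B]): State[S, B] =
    State[S, B] { s => val (a, s1) = run(s); f(a).run(s1) }
}

object State {
  def pure[S, A](a: A): State[S, A] = State[S, A](s => (a, s))
  def modify[S](f: S => S): State[S, Unit] = State[S, Unit](s => ((), f(s)))
}

// dropWhile whose predicate runs inside State, so it can update the state as it goes.
def dropWhileM[S, A](as: List[A])(p: A => State[S, Boolean]): State[S, List[A]] =
  as match {
    case Nil => State.pure[S, List[A]](Nil)
    case h :: t =>
      p(h).flatMap(drop => if (drop) dropWhileM(t)(p) else State.pure[S, List[A]](as))
  }

// Predicate that increments the counter for every element it drops.
def lessThanCounting(n: Int): Int => State[Int, Boolean] =
  x => if (x < n) State.modify[Int](_ + 1).map(_ => true) else State.pure[Int, Boolean](false)
```

Running the result with an initial count of 0 yields both the remaining list and the number of dropped elements; a caller who does not care about the count simply ignores the second tuple element.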