FIFO queue via memcached backend

Is it plausible to implement a FIFO queue with memcached?
I found a recipe for Redis, but as far as I know memcached has no list type, and its append() method looks very basic.

Is it plausible to implement a FIFO queue with memcached? Yes. Is it practical? No. The following is a very basic implementation in Python.
from pylibmc import Client

class Queue:
    """A sample implementation of a First-In-First-Out
    data structure backed by memcached."""

    def __init__(self):
        self.in_ptr = 0
        self.out_ptr = 0
        self.client = Client(['/tmp/memcached.sock'])

    def push(self, obj):
        # each item lives under its own key, numbered by the write pointer
        key = 'fifoq{0}'.format(self.in_ptr)
        self.client.add(key, obj)
        self.in_ptr += 1

    def pop(self):
        if self.in_ptr > self.out_ptr:
            key = 'fifoq{0}'.format(self.out_ptr)
            ret = self.client.get(key)  # may be None if the entry was evicted
            self.client.delete(key)     # free the slot so keys don't pile up
            self.out_ptr += 1
            return ret
        return None
This has no thread safety, does no type checking, and does not catch any exceptions. Once you add those features it will work reasonably well as a FIFO queue, but I am not convinced that memcached is the ideal backend. As you have discovered yourself, Redis already has this feature built in, and it will be much more robust and scalable.

No, because you can never be sure that a piece of data will still be in the queue when you go to read it: memcached may evict data at any time.


using streams vs actors for periodic tasks

I'm working with the Akka/Scala/Play stack.
Usually I use a stream to perform certain tasks. For example, I have a stream that wakes every minute, picks up something from the DB, calls another service to enrich its data via an API, and saves the enrichment back to the DB.
Something like this:
class FetcherAndSaveStream @Inject()(fetcherAndSaveGraph: FetcherAndSaveGraph, dbElementsSource: DbElementsSource)
                                    (implicit val mat: Materializer,
                                     implicit val exec: ExecutionContext) extends LazyLogging {

  def graph[M1, M2](source: Source[BDElement, M1],
                    sink: Sink[BDElement, M2],
                    switch: SharedKillSwitch): RunnableGraph[(M1, M2)] = {
    val fetchAndSaveDataFromExternalService: Flow[BDElement, BDElement, NotUsed] =
      fetcherAndSaveGraph.fetchEndSaveEnrichment
    source.viaMat(switch.flow)(Keep.left)
      .via(fetchAndSaveDataFromExternalService)
      .toMat(sink)(Keep.both)
      .withAttributes(supervisionStrategy(resumingDecider))
  }

  def runGraph(switchSharedKill: SharedKillSwitch): (NotUsed, Future[Done]) = {
    logger.info("FetcherAndSaveStream is now running")
    graph(dbElementsSource.dbElements(), Sink.ignore, switchSharedKill).run()
  }
}
I wonder: is this better than just using an actor that ticks every minute and does the same thing? How do streams and actors compare for this kind of task, and when should I choose one over the other? Thanks!
You can use both, depending on requirements you have for your solution which are not listed here. The general concern to take into consideration: actors are lower-level than streams, so they require more code and more debugging.
Basically, streams are good for tasks where you have a relatively large amount of data to process with low memory consumption. With streams you don't need to restart the stream every n seconds; you can set it up to run alongside the application, which can make your code more concise by omitting the scheduler logic.
I will omit your DI and architecture details and sketch a solution in pseudocode:
val yourConsumer: Sink[YourDBRecord, _] = ???
val recordsSource: Source[YourDBRecord, _] = ???

val runnableGraph = Source.repeat(())
  .throttle(1, n.seconds)
  .mapAsync(yourParallelism) { _ =>
    fetchReasonableAmountOfRecordsFromDB
  }
  .mapConcat(identity)
  .to(yourConsumer)
This stream will do your work. You can even enhance it with more sophisticated logic, e.g. adapting the polling rate to the workload using a feedback loop in the graph API. You can also attach the error-handling strategy you need so that the stream resumes after a crash.
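For example (a sketch of my own, applied to the runnableGraph above), a resuming supervision strategy, the same attribute the question's own graph uses, drops a failing element and keeps the stream alive:

import akka.stream.ActorAttributes.supervisionStrategy
import akka.stream.Supervision.resumingDecider

// skip elements whose processing throws, instead of tearing the whole stream down
val resilientGraph = runnableGraph.withAttributes(supervisionStrategy(resumingDecider))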
Moreover, there are Alpakka connectors for databases capable of doing this; check whether one of them fits your purpose, or look at them for implementation details.
What you gain by doing so: backpressure, the ability to compose with other streams, and clean, concise code with no timers managed directly by you.
https://doc.akka.io/docs/akka/current/stream/stream-rate.html
You can also create an actor, but then you have to do by hand all the things Akka Streams does for you: back-pressure (in case you want to interoperate with streams), scheduling, chunking, and memory management (so as not to load 100,000 or so entries into memory in one batch), etc.
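For comparison, here is a rough sketch of the actor-based variant (my own illustration; TickWorker and fetchAndEnrich are made-up names). Notice that scheduling, batching, and overload protection all become your responsibility:

import akka.actor.Actor
import scala.concurrent.duration._

class TickWorker extends Actor {
  import context.dispatcher
  private case object Tick

  // hand-rolled scheduling that the stream's throttle gave you for free
  private val tick = context.system.scheduler.schedule(1.minute, 1.minute, self, Tick)

  def receive: Receive = {
    case Tick =>
      // bounding the batch size and preventing overlapping runs is now your job
      fetchAndEnrich()
  }

  private def fetchAndEnrich(): Unit = () // placeholder for the DB read + API enrichment

  override def postStop(): Unit = tick.cancel()
}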

Asynchronous IO (socket) in Scala

import java.nio.channels.{AsynchronousServerSocketChannel, AsynchronousSocketChannel}
import java.net.InetSocketAddress
import scala.concurrent.{Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global

class Master {
  val port = 8000 // assumed value; `port` was left undefined in the original snippet

  val server: AsynchronousServerSocketChannel = AsynchronousServerSocketChannel.open()
  server.bind(new InetSocketAddress("localhost", port))

  val client: Future[AsynchronousSocketChannel] = Future { blocking { server.accept().get() } }
}
This is pseudocode of what I'm trying to do.
Before asking this question, I searched about it and found a related answer.
In her answer, I wonder what this means: "If you want to absolutely prevent additional threads from ever being created, then you ought to use an AsyncIO library, such as Java's NIO library."
Since I don't want to suffer from either running out of memory (when using blocking calls) or thread-pool hell (in the opposite case), her answer was exactly what I had been looking for. However, as you can see in my pseudocode, a new thread will be created for each client (I made just one client for simplicity) because of blocking, even though I used NIO as she suggested.
1) Please explain her suggestion with a simple example.
2) Is my pseudocode an appropriate approach for asynchronous IO in Scala, or is there a better alternative?
Answer to Question 1
She suggests two things:
a) If you are wrapping a blocking call in a Future, use scala.concurrent.blocking. blocking tells the default execution context to spawn temporary threads to prevent starvation.
Let's say blockingServe() blocks. To run multiple blockingServe() calls concurrently, we use Futures:
Future {
  blockingServe() // blockingServe() could be serverSocket.accept()
}
But the above code leads to starvation in an evented model. To deal with the starvation, we have to ask the execution context to create new temporary threads to serve the extra requests. This is communicated to the execution context using scala.concurrent.blocking:
Future {
  blocking {
    blockingServe() // blockingServe() could be serverSocket.accept()
  }
}
b)
We still have not achieved non-blocking. The code still blocks, just on a different thread (i.e. it is asynchronous).
How can we achieve true non-blocking?
We can achieve it by using a non-blocking API.
So, in your case, you have to use java.nio.channels.ServerSocketChannel (in non-blocking mode) to get a truly non-blocking model.
Notice that in your code snippet you mixed a) and b), which is not required.
Answer to Question 2
import java.net.InetSocketAddress
import java.nio.channels.{SelectionKey, Selector, ServerSocketChannel}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val selector = Selector.open()
val serverChannel = ServerSocketChannel.open()
serverChannel.configureBlocking(false)
serverChannel.socket().bind(new InetSocketAddress("192.168.2.1", 5000))
serverChannel.register(selector, SelectionKey.OP_ACCEPT)

def assignWork[A](serverChannel: ServerSocketChannel, selector: Selector, work: => Future[A]) = {
  work
  // recurse
}

assignWork[Unit](serverChannel, selector, Future(()))
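To make the "true non-blocking" point concrete, here is a minimal single-threaded selector loop (my own sketch, not from the original answer). One thread services any number of connections, and no thread is ever created per client:

import java.net.InetSocketAddress
import java.nio.channels.{SelectionKey, Selector, ServerSocketChannel}

object SelectorLoop extends App {
  val selector = Selector.open()
  val serverChannel = ServerSocketChannel.open()
  serverChannel.configureBlocking(false)
  serverChannel.socket().bind(new InetSocketAddress("localhost", 5000))
  serverChannel.register(selector, SelectionKey.OP_ACCEPT)

  while (true) {
    selector.select() // parks this one thread until at least one channel is ready
    val keys = selector.selectedKeys().iterator()
    while (keys.hasNext) {
      val key = keys.next()
      keys.remove()
      if (key.isAcceptable) {
        // accept() returns immediately (possibly null) because the channel is non-blocking
        val client = serverChannel.accept()
        if (client != null) {
          client.configureBlocking(false)
          client.register(selector, SelectionKey.OP_READ)
        }
      }
      // key.isReadable / key.isWritable would be handled here in the same way
    }
  }
}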

Akka HTTP: Blocking in a future blocks the server

I am trying to use Akka HTTP to Basic-authenticate my request.
It so happens that I have an external resource to authenticate through, so I have to make a REST call to this resource.
This takes some time, and while it's processing, the rest of my API seems to be blocked, waiting for this call.
I have reproduced this with a very simple example:
// used dispatcher:
implicit val system = ActorSystem()
implicit val executor = system.dispatcher
implicit val materializer = ActorMaterializer()

val routes =
  (post & entity(as[String])) { e =>
    complete {
      Future {
        Thread.sleep(5000)
        e
      }
    }
  } ~
  (get & path(Segment)) { r =>
    complete {
      "get"
    }
  }
If I post to the POST endpoint, my GET endpoint is also stuck waiting for the 5 seconds that the POST endpoint dictated.
Is this expected behaviour, and if it is, how do I perform blocking operations without blocking my entire API?
What you observe is expected behaviour – yet of course it's very bad. The good news is that known solutions and best practices exist to guard against it. In this answer I'd like to explain the issue in short and then in depth – enjoy the read!
Short answer: don't block the routing infrastructure! Always use a dedicated dispatcher for blocking operations!
Cause of the observed symptom: The problem is that you're using context.dispatcher as the dispatcher the blocking futures execute on. The same dispatcher (which is in simple terms just a "bunch of threads") is used by the routing infrastructure to actually handle the incoming requests – so if you block all available threads, you end up starving the routing infrastructure. (A thing up for debate and benchmarking is if Akka HTTP could protect from this, I'll add that to my research todo-list).
Blocking must be treated with special care to not impact other users of the same dispatcher (which is why we make it so simple to separate execution onto different ones), as explained in the Akka docs section: Blocking needs careful management.
Something else I want to bring to attention here is that one should avoid blocking APIs entirely if possible: if your long-running operation is not really one operation but a series of them, you could separate them into different actors or sequenced futures (a quick sketch of the latter follows below). Anyway, just wanted to point out: if possible, avoid such blocking calls; yet if you have to make them, the following explains how to deal with them properly.
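(To illustrate that last point with a sketch of my own, not from the original answer: three dependent stages, each returning a Future instead of blocking, can be sequenced so that no thread ever waits between them. The stage functions here are hypothetical stand-ins.)

import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global

// hypothetical non-blocking stages standing in for "a series of operations"
def fetchUser(id: Long): Future[String]        = Future(s"user-$id")
def fetchProfile(user: String): Future[String] = Future(s"profile-of-$user")
def render(profile: String): Future[String]    = Future(s"<html>$profile</html>")

// sequenced futures: each stage starts when the previous completes,
// without parking a thread in between
val page: Future[String] =
  for {
    user    <- fetchUser(42L)
    profile <- fetchProfile(user)
    html    <- render(profile)
  } yield html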
In-depth analysis and solutions:
Now that we know what is wrong conceptually, let's look at what exactly is broken in the above code and what the right solution to this problem looks like:
Colour = thread state (in the profiler screenshots this answer refers to):
turquoise – SLEEPING
orange – WAITING
green – RUNNABLE
Now let's investigate 3 pieces of code and how they impact the dispatchers and the performance of the app. To force this behaviour, the app has been put under the following load:
[a] keep firing GET requests (see the code in the initial question above for that); these don't block
[b] then, after a while, fire 2000 POST requests, which will incur the 5-second blocking before returning the future
1) [bad] Dispatcher behaviour on bad code:
// BAD! (due to the blocking in Future):
implicit val defaultDispatcher = system.dispatcher

val routes: Route = post {
  complete {
    Future { // uses defaultDispatcher
      Thread.sleep(5000)                  // will block on the default dispatcher,
      System.currentTimeMillis().toString // starving the routing infra
    }
  }
}
So we expose our app to the [a] load, and you can see a number of akka.actor.default-dispatcher threads already handling the requests – a small green snippet, with orange meaning the other threads are actually idle.
Then we start the [b] load, which causes these threads to block – you can see an early thread "default-dispatcher-2,3,4" going into blocking after being idle before. We also observe that the pool grows – new threads "default-dispatcher-18,19,20,21..." are started, but they go to sleep immediately (!) – we're wasting precious resources here!
The number of threads started this way depends on the default dispatcher configuration, but it likely will not exceed 50 or so. Since we just fired 2000 blocking operations, we starve the entire thread pool – the blocking operations dominate to the point that the routing infrastructure has no thread available to handle the other requests – very bad!
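(For reference, and to the best of my knowledge, these are the approximate fork-join defaults for the default dispatcher; exact values vary by Akka version, and they are where the "50 or so" ceiling comes from:)

akka.actor.default-dispatcher {
  fork-join-executor {
    // pool size = available processors * factor, clamped to [min, max]
    parallelism-min = 8
    parallelism-factor = 3.0
    parallelism-max = 64
  }
}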
Let's do something about it (which, incidentally, is an Akka best practice: always isolate blocking behaviour, as shown below):
2) [good!] Dispatcher behaviour with well-structured code/dispatchers:
In your application.conf, configure a dispatcher dedicated to blocking behaviour:
my-blocking-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    // in Akka versions before 2.4.2:
    core-pool-size-min = 16
    core-pool-size-max = 16
    max-pool-size-min = 16
    max-pool-size-max = 16

    // or in Akka 2.4.2+:
    fixed-pool-size = 16
  }
  throughput = 100
}
You should read up on the various options in the Akka Dispatchers documentation. The main point, though, is that we picked a ThreadPoolExecutor, which has a hard limit on the number of threads it keeps available for the blocking ops. The size settings depend on what your app does and how many cores your server has.
Next we need to use it instead of the default one:
// GOOD (due to the blocking in Future):
implicit val blockingDispatcher = system.dispatchers.lookup("my-blocking-dispatcher")

val routes: Route = post {
  complete {
    Future { // uses the good "blocking dispatcher" that we configured,
             // instead of the default dispatcher – the blocking is isolated
      Thread.sleep(5000)
      System.currentTimeMillis().toString
    }
  }
}
We pressure the app with the same load: first a bit of normal requests, and then we add the blocking ones. This is how the thread pools behave in this case:
Initially the normal requests are easily handled by the default dispatcher; you can see a few green lines there – that's actual execution (I'm not really putting the server under heavy load, so it's mostly idle).
Now when we start issuing the blocking ops, the my-blocking-dispatcher-* pool kicks in and starts up to the number of configured threads. It handles all the sleeping there. Also, after a certain period of nothing happening on those threads, it shuts them down. If we were to hit the server with another bunch of blocking ops, the pool would start new threads to take care of the sleep()-ing, but in the meantime we're not wasting our precious threads on "just stay there and do nothing".
When using this setup, the throughput of the normal GET requests was not impacted; they were still happily served on the (still pretty free) default dispatcher.
This is the recommended way of dealing with any kind of blocking in reactive applications. It is often referred to as "bulkheading" (or "isolating") the badly behaving parts of an app; in this case the bad behaviour is sleeping/blocking.
3) [workaround-ish] Dispatcher behaviour when blocking is applied properly:
In this example we use the scala.concurrent.blocking method (see its scaladoc), which can help when faced with blocking ops. It generally causes more threads to be spun up to survive the blocking operations.
// OK, default dispatcher, but we'll use `blocking`
implicit val dispatcher = system.dispatcher

val routes: Route = post {
  complete {
    Future { // uses the default dispatcher (it's a fork-join pool)
      blocking { // will cause much more threads to be spun up, avoiding starvation somewhat,
                 // but at the cost of exploding the number of threads (which eventually
                 // may also lead to starvation problems, but on a different layer)
        Thread.sleep(5000)
        System.currentTimeMillis().toString
      }
    }
  }
}
The app will behave like this:
You'll notice that a LOT of new threads are created; this is because blocking hints "oh, this will block, so we need more threads". This makes the total time we spend blocked smaller than in example 1), but then we have hundreds of threads doing nothing once the blocking ops have finished... Sure, they will eventually be shut down (the FJP does this), but for a while we have a large (uncontrolled) number of threads running, in contrast to solution 2), where we know exactly how many threads we're dedicating to the blocking behaviours.
Summing up: Never block the default dispatcher :-)
The best practice is to use the pattern shown in 2), to have a dispatcher for the blocking operations available, and execute them there.
Discussed Akka HTTP version: 2.0.1
Profiler used: Many people have asked me in response to this answer privately what profiler I used to visualise the Thread states in the above pics, so adding this info here: I used YourKit which is an awesome commercial profiler (free for OSS), though you can achieve the same results using the free VisualVM from OpenJDK.
Strange, but for me everything works fine (no blocking). Here is the code:
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route
import akka.stream.ActorMaterializer
import scala.concurrent.Future

object Main {
  implicit val system = ActorSystem()
  implicit val executor = system.dispatcher
  implicit val materializer = ActorMaterializer()

  val routes: Route = (post & entity(as[String])) { e =>
    complete {
      Future {
        Thread.sleep(5000)
        e
      }
    }
  } ~
  (get & path(Segment)) { r =>
    complete {
      "get"
    }
  }

  def main(args: Array[String]) {
    Http().bindAndHandle(routes, "0.0.0.0", 9000).onFailure {
      case e =>
        system.shutdown()
    }
  }
}
You can also wrap your async code in the onComplete or onSuccess directive (onComplete hands your inner route a Try, while onSuccess extracts the successful value):
onComplete(Future { Thread.sleep(5000); e }) { result => complete(result.getOrElse("error")) }
onSuccess(Future { Thread.sleep(5000); e }) { result => complete(result) }

How to implement actor model without Akka?

How can I implement simple actors without Akka? I don't need high performance for many (an unbounded number of) actor instances, green threads, IoC (lifecycle, Props-based factories, ActorRefs), supervision, backpressure, etc. I need only sequentiality (a queue) + a handler + state + message passing.
As a side note, I actually need a small actor-based pipeline (with recursive links) plus some parallel actors to optimize a DSP algorithm's calculation. It will live inside a library without transitive dependencies, so I don't want (and can't, as it's a jar plugin) to push the user to create and pass an akkaSystem; the library should have as simple and lightweight an interface as possible. I don't need IoC, as it's just a library (a set of functions), not a framework, so its complexity is more algorithmic than structural. However, I see actors as a good instrument for describing protocols, and I can actually decompose the algorithm into a small number of asynchronously interacting entities, so actors fit my needs.
Why not Akka
Akka is heavy, which means that:
it's an external dependency;
it has a complex interface and implementation;
it is non-transparent to the library's user; for example, all instances are managed by Akka's IoC, so there is no guarantee that one logical actor is always backed by the same instance: a restart creates a new one;
it requires additional migration support, comparable to Scala's own migration support.
It may also be harder to debug Akka's green threads using jstack/jconsole/jvisualvm, as one actor may act on any thread.
Sure, Akka's jar (1.9 MB) and memory consumption (2.5 million actors per GB) aren't heavy at all, so you can run it even on Android. But it's also known that you should use specialized tools to watch and analyze actors (like Typesafe Activator/Console), which the user may not be familiar with (and I wouldn't push them to learn them). It's all fine for an enterprise project, which almost always has IoC, a set of specialized tools, and continuous migration, but this isn't a good approach for a simple library.
P.S. About dependencies: I don't have any and I don't want to add any (I'm even avoiding scalaz, which actually fits here a little bit), as that would lead to heavy maintenance: I'd have to keep my simple library up to date with Akka.
Here is the most minimal and efficient actor in the JVM world, with an API based on the Minimalist Scala actor from Viktor Klang:
https://github.com/plokhotnyuk/actors/blob/41eea0277530f86e4f9557b451c7e34345557ce3/src/test/scala/com/github/gist/viktorklang/Actor.scala
It is handy and safe in use, but it isn't type-safe in message receiving and cannot send messages between processes or hosts.
Main features:
simplest FSM-like API with just 3 states (Stay, Become and Die): https://github.com/plokhotnyuk/actors/blob/41eea0277530f86e4f9557b451c7e34345557ce3/src/test/scala/com/github/gist/viktorklang/Actor.scala#L28-L30
minimalistic error handling: just proper forwarding to the default exception handler of the executor threads: https://github.com/plokhotnyuk/actors/blob/41eea0277530f86e4f9557b451c7e34345557ce3/src/test/scala/com/github/gist/viktorklang/Actor.scala#L52-L53
fast async initialization, taking ~200 ns to complete, so there is no need for additional futures/actors for time-consuming actor initialization: https://github.com/plokhotnyuk/actors/blob/41eea0277530f86e4f9557b451c7e34345557ce3/out0.txt#L447
smallest memory footprint, ~40 bytes in a passive state (BTW, a new String() occupies the same amount of bytes in the JVM heap): https://github.com/plokhotnyuk/actors/blob/41eea0277530f86e4f9557b451c7e34345557ce3/out0.txt#L449
very efficient message processing, with throughput of ~90M msg/sec on a 4-core CPU: https://github.com/plokhotnyuk/actors/blob/41eea0277530f86e4f9557b451c7e34345557ce3/out0.txt#L466
very efficient message sending/receiving, with ~100 ns latency: https://github.com/plokhotnyuk/actors/blob/41eea0277530f86e4f9557b451c7e34345557ce3/out0.txt#L472
per-actor tuning of fairness via the batch parameter: https://github.com/plokhotnyuk/actors/blob/41eea0277530f86e4f9557b451c7e34345557ce3/src/test/scala/com/github/gist/viktorklang/Actor.scala#L32
Example of a stateful counter:
def process(self: Address, msg: Any, state: Int): Effect = if (state > 0) {
  println(msg + " " + state)
  self ! msg
  Become { msg =>
    process(self, msg, state - 1)
  }
} else Die

val actor = Actor(self => msg => process(self, msg, 5))
Results:
scala> actor ! "a"
a 5
a 4
a 3
a 2
a 1
This will use a FixedThreadPool (and thus its internal task queue):
import scala.concurrent._

trait Actor[T] {
  implicit val context = ExecutionContext.fromExecutor(java.util.concurrent.Executors.newFixedThreadPool(1))
  def receive: T => Unit
  def !(m: T) = Future { receive(m) }
}
A FixedThreadPool of size 1 guarantees sequentiality here. Of course it's NOT the best way to manage your threads if you need 100500 dynamically created actors, but it's fine if you need a fixed number of actors per application to implement your protocol.
Usage:
class Ping(pong: => Actor[Int]) extends Actor[Int] {
  def receive = {
    case m: Int =>
      println(m)
      if (m > 0) pong ! (m - 1)
  }
}

object System {
  // be careful with mutual lazy-val links between different systems (objects);
  // that's why people prefer ActorRef
  lazy val ping: Actor[Int] = new Ping(pong)
  lazy val pong: Actor[Int] = new Ping(ping)
}
System.ping ! 5
Results:
import scala.concurrent._
defined trait Actor
defined class Ping
defined object System
res17: scala.concurrent.Future[Unit] = scala.concurrent.impl.Promise$DefaultPromise#6be61f2c
5
4
3
2
1
0
scala> System.ping ! 5; System.ping ! 7
5
7
4
6
3
5
2
res19: scala.concurrent.Future[Unit] = scala.concurrent.impl.Promise$DefaultPromise#54b053b1
4
1
3
0
2
1
0
This implementation uses two Java threads, so the ping-pong counting runs roughly "twice" as fast as it would without parallelization.
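One more illustration (my own addition, building on the Actor[T] trait above): because each actor instance owns a single-threaded pool, plain mutable state inside receive is safe without locks:

// the size-1 executor runs receive for one message at a time, in submission order
class Counter extends Actor[Int] {
  private var total = 0
  def receive = { n =>
    total += n
    println(s"running total = $total")
  }
}

val counter = new Counter
(1 to 5).foreach(counter ! _) // prints running totals 1, 3, 6, 10, 15, in order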

Playframework non-blocking Action

I came across a problem I have not found an answer for yet.
I am running Play Framework 2 with Scala.
I was required to write an Action method that performs multiple Future calls.
My questions:
1) Is the attached code non-blocking and therefore written the way it should be?
2) Is there a guarantee that both DAO results are caught at any given time?
def index = Action.async {
  val t2: Future[(List[PlayerCol], List[CreatureCol])] = for {
    p <- PlayerDAO.findAll()
    c <- CreatureDAO.findAlive()
  } yield (p, c)
  t2.map(t => Ok(views.html.index(t._1, t._2)))
}
Thanks for your feedback.
Is the attached code non-blocking and therefore written the way it should be?
That depends on a few things. First, I'm going to assume that PlayerDAO.findAll() and CreatureDAO.findAlive() return Future[List[PlayerCol]] and Future[List[CreatureCol]] respectively. What matters most is what these functions actually do under the hood: are they making JDBC calls, or using an asynchronous DB driver?
If the answer is JDBC (or some other synchronous db driver), then you're still blocking, and there's no way to make it fully "non-blocking". Why? Because JDBC calls block their current thread, and wrapping them in a Future won't fix that. In this situation, the most you can do is have them block a different ExecutionContext than the one Play is using to handle requests. This is generally a good idea, because if you have several db requests running concurrently, they can block Play's internal thread pool used for handling HTTP requests, and suddenly your server will have to wait to handle other requests (even if they don't require database calls).
For more on different ExecutionContexts see the thread pools documentation and this answer.
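As an illustration of that isolation (a sketch of my own for the Play 2.3/2.4-era APIs; the dispatcher name "contexts.db-operations" and the DAO body are made up):

import play.api.Play.current
import play.api.libs.concurrent.Akka
import scala.concurrent.{ExecutionContext, Future}

// a dedicated pool, configured under "contexts.db-operations" in application.conf
val dbContext: ExecutionContext =
  Akka.system.dispatchers.lookup("contexts.db-operations")

def findAllPlayers(): Future[List[String]] = Future {
  // the blocking JDBC work happens here, on dbContext,
  // leaving Play's default pool free to serve other requests
  List("p1", "p2") // stand-in for a real JDBC query
}(dbContext)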
If your answer is an asynchronous database driver like ReactiveMongo (there's also ScalikeJDBC's async variant, and maybe some others), then you're in good shape, and I probably made you read a little more than you had to. In that scenario your index controller function would be fully non-blocking.
Is there a guarantee that both DAO results are caught at any given time?
I'm not quite sure what you mean by this. In your current code, you're actually making these calls in sequence: CreatureDAO.findAlive() isn't executed until PlayerDAO.findAll() has returned. Since they don't depend on each other, this seems unintentional. To make them run in parallel, you should instantiate the Futures before combining them in a for-comprehension:
def index = Action.async {
  val players: Future[List[PlayerCol]] = PlayerDAO.findAll()
  val creatures: Future[List[CreatureCol]] = CreatureDAO.findAlive()

  val t2: Future[(List[PlayerCol], List[CreatureCol])] = for {
    p <- players
    c <- creatures
  } yield (p, c)

  t2.map(t => Ok(views.html.index(t._1, t._2)))
}
The only guarantee you have about both results being complete is that yield isn't executed until both Futures have completed (or never, if either fails), and likewise the body of t2.map(...) isn't executed until t2 has completed.
Further reading:
Are there any benefits in using non-async actions in Play Framework 2.2?
Understanding the Difference Between Non-Blocking Web Service Calls vs Non-Blocking JDBC