scalacache memoization asynchronous refresh - scala

I'd like to do TTL-based memoization in Scala with active, asynchronous refresh.
The ScalaCache documentation shows TTL-based memoization like this:
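import scalacache._
import memoization._
import scala.concurrent.duration._ // provides 60.seconds
implicit val scalaCache = ScalaCache(new MyCache())
// User and MyCache are assumed to be defined elsewhere
def getUser(id: Int): User = memoize(60.seconds) {
  // Do DB lookup here...
  User(id, s"user$id")
}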
I'm curious whether, once the TTL of an existing value expires, the DB lookup is triggered synchronously and lazily during the next getUser invocation, or whether the refresh happens eagerly and asynchronously, even before the next getUser call.
If the ScalaCache implementation is synchronous, is there an alternative library that can refresh the cache actively and asynchronously?

Expiration and refresh are closely related but different mechanisms. An expired entry is considered stale and cannot be used, so it must be discarded and refetched. An entry eligible for being refreshed means that the content is still valid to use, but the data should be refetched as it may be out of date. Guava provides these TTL policies under the names expireAfterWrite and refreshAfterWrite, which may be used together if the refresh time is smaller than the expiration time.
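The difference between the two policies can be pinned down with a small, library-independent sketch in Scala (standard library only). It only classifies a single entry by its age; `Entry`, `Freshness`, and `classify` are hypothetical names, and time is passed in as milliseconds so the logic is deterministic.

```scala
// Possible states of a cached entry relative to the two policies.
sealed trait Freshness
case object Fresh extends Freshness        // usable as-is
case object NeedsRefresh extends Freshness // still usable, but should be reloaded
case object Expired extends Freshness      // must be discarded and reloaded before use

// Hypothetical entry type: a value plus the time it was written.
final case class Entry[V](value: V, writtenAtMillis: Long)

// refreshAfterMillis must be smaller than expireAfterMillis, matching the
// condition for combining refreshAfterWrite with expireAfterWrite.
def classify[V](e: Entry[V], nowMillis: Long,
                refreshAfterMillis: Long, expireAfterMillis: Long): Freshness = {
  require(refreshAfterMillis < expireAfterMillis, "refresh must be shorter than expiry")
  val age = nowMillis - e.writtenAtMillis
  if (age >= expireAfterMillis) Expired
  else if (age >= refreshAfterMillis) NeedsRefresh
  else Fresh
}
```

An entry in the `NeedsRefresh` window can still be returned to the caller while a reload is arranged; only `Expired` forces the caller to wait for fresh data.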
The design of most caches prefers discarding unused content. An active refresh would require a dedicated thread that reloads entries regardless of whether they have been used. Therefore most caching libraries do not provide active refresh themselves, but make it easy for applications to add that customization on top.
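Adding active refresh on top could look roughly like the following minimal sketch, which uses only the JDK's `ScheduledExecutorService` and `ConcurrentHashMap`. `ActiveRefreshCache` and its `load` function are hypothetical; it reloads every known key on a fixed schedule and, unlike a real cache, never evicts anything.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, ScheduledExecutorService, TimeUnit}
import scala.jdk.CollectionConverters._

// Hypothetical cache that keeps every loaded key fresh by reloading it on a
// fixed schedule, whether or not it has been read since the last reload.
final class ActiveRefreshCache[K, V](load: K => V, periodMillis: Long) {
  private val data = new ConcurrentHashMap[K, V]()
  private val scheduler: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor()

  // The first read of a key loads it synchronously; later reads never wait on a reload.
  def get(key: K): V = data.computeIfAbsent(key, (k: K) => load(k))

  // Reload all known keys; a failed load keeps the previous value.
  def refreshAll(): Unit =
    data.keySet().asScala.toList.foreach { k =>
      try { data.put(k, load(k)); () }
      catch { case _: Exception => () }
    }

  def start(): Unit = {
    scheduler.scheduleAtFixedRate(() => refreshAll(), periodMillis, periodMillis, TimeUnit.MILLISECONDS)
    ()
  }

  def stop(): Unit = { scheduler.shutdownNow(); () }
}
```

Because reloads run on the scheduler's thread, readers never block on them; the cost is that keys nobody reads any more keep being reloaded, which is exactly why most libraries leave this to the application.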
When a read in Guava detects that an entry is eligible for refresh, that caller performs the reload. All subsequent reads while the refresh is in progress obtain the current value. This means that the refresh is performed synchronously on the user's thread that triggered it, and asynchronously with respect to other threads reading that value. A refresh can be made fully asynchronous by overriding CacheLoader.reload to perform the work on an executor.
Caffeine is a rewrite of Guava's cache and differs slightly by always performing the refresh asynchronously to the user's thread. The cache delegates the operation to an executor, by default ForkJoinPool.commonPool(), which is a JVM-wide executor. The Policy API provides a means of inspecting the runtime state of the cache, such as the age of an entry, for adding application-specific custom behavior.
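The read-triggered, asynchronous refresh described for Caffeine (and for Guava with an overridden `CacheLoader.reload`) can be modelled in plain Scala as below. This is a simplified sketch, not either library's implementation: `RefreshOnReadCache`, the injected clock, and the `load` function are hypothetical, and the reload runs on whichever `ExecutionContext` is supplied, standing in for the executor.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

final class RefreshOnReadCache[K, V](load: K => V, refreshAfterMillis: Long, now: () => Long)(
    implicit ec: ExecutionContext) {

  // One cached value plus a flag marking an in-flight reload.
  private final class Slot(val value: V, val writtenAt: Long) {
    val refreshing = new AtomicBoolean(false)
  }
  private val slots = new ConcurrentHashMap[K, Slot]()

  def get(key: K): V = {
    // A missing key is loaded synchronously, like a plain cache miss.
    val slot = slots.computeIfAbsent(key, (k: K) => new Slot(load(k), now()))
    // Only the first reader to notice eligibility schedules a reload;
    // it and every other reader return the current value immediately.
    if (now() - slot.writtenAt >= refreshAfterMillis && slot.refreshing.compareAndSet(false, true)) {
      Future(load(key)).onComplete {
        case Success(v) => slots.put(key, new Slot(v, now()))
        case Failure(_) => slot.refreshing.set(false) // keep old value; a later read retries
      }
    }
    slot.value
  }
}
```

With a thread pool as the `ExecutionContext`, the caller that triggers the refresh is not delayed at all, which is the behavior attributed to Caffeine; running `load(key)` inline instead of inside `Future` would give the Guava default, where the triggering caller pays for the reload.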
For other ScalaCache backends, support is mixed. Ehcache has a RefreshAheadCache decorator that refreshes lazily using its own thread pool. Redis and memcached do not refresh, as they are not aware of the system of record. LruMap has expiration support grafted on and does not have any refresh capabilities.

Related

Creating an atomic process for a netconf edit-config request

I am creating a custom system in which a user submitting a NETCONF edit-config initiates a set of actions that atomically alter the configuration of our system, after which the user is notified of its success or failure.
Think of it as a big SQL transaction that, at the end, either commits or rolls back.
So, the steps are:
The user submits an edit-config.
The system accepts the config and works to implement it.
If the config succeeds, the system sends back a thumbs-up response (I'm not sure what the formal way of doing this is).
If the config fails, the system sends back a thumbs-down response (and I will have to make sure the config is rolled back internally).
All of this is done atomically, so if a user submits two configs in a row, they won't conflict with each other.
Our working idea (probably not the best one) was to accept the edit-config and then, within sysrepo, edit some of our leafs to set success or failure flags within the same session as the initial change. We hoped this would keep everything atomic, since edits made outside that session could let multiple configuration changes conflict with each other.
We weren't sure whether to do this with pure NETCONF or to use sysrepo directly. We noticed the many plugins/bindings available for sysrepo and figured those could be used to talk to our datastore directly.
That said, our working idea is most likely not a best-practice approach. What would be the best way to achieve this?
Our system is:
netopeer 1.1.27
sysrepo 1.4.58
libyang 1.0.167
libnetconf2 1.1.24
And our YANG module is:
module rxmbn {
  namespace "urn:com:zug:rxmbn";
  prefix rxmbn;
  container rxmbn-config {
    config true;
    leaf raw {
      type string;
    }
    leaf raw_hashCode {
      type int32;
    }
    leaf odl_last_processed_hashCode {
      type int32;
    }
    leaf processed {
      type boolean;
      default "false";
    }
  }
}
Currently we can:
Execute an edit-config against the netopeer server
See the new config registered in the sysrepo datastore
Capture the moment sysrepo registers the data, via sysrepo's API
But we are having problems with:
Atomically editing the datastore during the update session (due to locks, which is expected; if there is no way to edit during an update session, that is fine and not necessary, since the main goal is the next item)
Atomically reacting to the new edit-config and responding to the end user
We are all a bit new to NETCONF and YANG, so I am sure there is some way to leverage the notification or event API, either through the netopeer session or through sysrepo; we just don't know enough yet.
If there are any examples or implementation advice to create an atomic transaction for this, that'd be really useful.
I know nothing of sysrepo, so this is from a NETCONF perspective.
NETCONF servers process requests serially within a single session, in a request-response fashion, meaning that everything you do within a single NETCONF session should already be "atomic": you cannot send two requests and have them applied in reverse order or in parallel, no matter what you do. A well-behaved client also waits for each response from the server before sending a new request, especially if all updates must succeed and execute in a specific order. The protocol also defines no way to cancel a request that has already been sent to a server.
If you need to prevent other sessions from modifying a datastore while one session performs a sequence of several <edit-config> operations, use the <lock> and <unlock> NETCONF operations to lock the entire datastore. There is also RFC 5717 (partial lock), which locks only a specific branch of the datastore.
Using notifications to report success of an <edit-config> would be highly unusual - that's what <rpc-reply> and <rpc-error> are there for within the same session. You would use notifications to inform other sessions about what's happening. In fact, there are standard base notifications for config changes.
I suggest reading the entire RFC 6241 before proceeding further. There are things like candidate datastores and confirmed commits that you should know about.
Which component are you developing? Netconf client/manager or Netconf server?
In general, the Netconf server should implement individual Netconf RPC operations in an atomic way.
When a NETCONF client wants to perform a set of operations atomically, it should follow the procedure explained in Appendix E.1 of RFC 6241.
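The lock / edit / commit / unlock sequence these answers point to can be sketched from the client side as below. The `NetconfSession` trait is a hypothetical stand-in, not the API of netopeer, libnetconf2, or sysrepo: each method represents sending one RPC (`<lock>`, `<edit-config>`, `<validate>`, `<commit>`, `<discard-changes>`, `<unlock>`) and waiting for its `<rpc-reply>`, with `Left` modelling an `<rpc-error>`. The steps are a simplification of the Appendix E.1 procedure, assuming the server supports the candidate datastore, and `FakeSession` exists only so the sketch runs.

```scala
// Left(message) stands in for an <rpc-error>, Right(()) for <ok/>.
type Reply = Either[String, Unit]

// Hypothetical client-side view of one NETCONF session.
trait NetconfSession {
  def lock(target: String): Reply
  def unlock(target: String): Reply
  def editConfig(target: String, config: String): Reply
  def validate(source: String): Reply
  def commit(): Reply
  def discardChanges(): Reply
}

// Either the change is committed to running, or nothing changes; locks are always released.
def applyAtomically(s: NetconfSession, config: String): Reply =
  s.lock("running").flatMap { _ =>
    val result = s.lock("candidate").flatMap { _ =>
      val r = for {
        _ <- s.editConfig("candidate", config)
        _ <- s.validate("candidate")
        _ <- s.commit()
      } yield ()
      if (r.isLeft) s.discardChanges() // drop uncommitted candidate edits
      s.unlock("candidate")
      r
    }
    s.unlock("running")
    result
  }

// In-memory stand-in for a server, just enough to exercise the control flow.
final class FakeSession(rejectIf: String => Boolean) extends NetconfSession {
  var running: String = ""
  private var candidate: String = ""
  private var locks = Set.empty[String]
  def lock(target: String): Reply =
    if (locks(target)) Left(s"lock-denied: $target") else { locks += target; Right(()) }
  def unlock(target: String): Reply = { locks -= target; Right(()) }
  def editConfig(target: String, config: String): Reply = { candidate = config; Right(()) }
  def validate(source: String): Reply =
    if (rejectIf(candidate)) Left("invalid-value") else Right(())
  def commit(): Reply = { running = candidate; Right(()) }
  def discardChanges(): Reply = { candidate = running; Right(()) }
  def isLocked: Boolean = locks.nonEmpty
}
```

The `Reply` returned by `applyAtomically` plays the role of the "thumbs up / thumbs down" from the question: success or failure is reported through the replies in the same session rather than through a notification.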

HTTP GET for 'background' job creation and acquiring

I'm designing an API for a job scheduler. There is one scheduler with a set of resources and DB tables for them. There are also multiple 'workers' that request 'jobs' from the scheduler. A worker can't create a job; it can only request one. Jobs must be calculated on the server side. A job is also a dynamic entity, calculated from multiple DB tables and the current time. There is no 'job' table.
In general this system is very similar to a task queue, but without the queue. I need a method for a worker to request the next task. That task should be calculated and assigned to that agent.
Is it OK to use GET verb to retrieve and 'lock' job for the specific worker?
In terms of resources, this query does not modify anything; only internal DB state is updated. To the client it looks like fetching records one by one, and it doesn't know about the internal modifications.
In pure REST style I should probably define a job table and a CRUD API for it. Then I would need to create an auxiliary service to POST jobs to that table. Each agent would then list jobs using GET and lock one using PATCH. That approach potentially requires multiple retries due to race conditions (a job may already be locked by another agent). It also looks somewhat complicated if I need to assign a job to a specific agent based on server-side logic; in that case I would need to implement logic on the client side to iterate through jobs based on the different responses.
This approach looks complicated.
Is it OK to use GET verb to retrieve and 'lock' job for the specific worker?
Maybe? But probably not.
The important thing to understand about GET is that it is safe:
The purpose of distinguishing between safe and unsafe methods is to allow automated retrieval processes (spiders) and cache performance optimization (pre-fetching) to work without fear of causing harm. In addition, it allows a user agent to apply appropriate constraints on the automated use of unsafe methods when processing potentially untrusted content.
If aggressive cache performance optimization would make a mess in your system, then GET is not the HTTP method you want triggering that behavior.
If you were designing your client interactions around resources, then you would probably have something like a list of jobs assigned to a worker. Reading the current representation of that resource doesn't require that a server change it, so GET is completely appropriate. And of course the server could update that resource for its own reasons at any time.
Requests to modify that resource should not be safe. For instance, if the client is going to signal that some job was completed, that should be done via an unsafe method (POST/PUT/PATCH/DELETE/...)
I don't have such a resource. It's an ephemeral resource spread across the tables. There is no DB table for it and no ID column to update that job. Why I don't have such a table is another question, but it's a current requirement and limitation.
Fair enough, though the main lesson still stands.
Another way of thinking about it is to think about failure. The network is unreliable. In a distributed environment, the client cannot distinguish a lost request from a lost response. All it knows is that it didn't receive an acknowledgement for the request.
When you use GET, you are implicitly telling the client that it is safe (there's that word again) to resend the request. Not only that, but you are also implicitly telling any intermediate components that it is safe to repeat the request.
If there are no adverse effects to handling multiple copies of the same request, then GET is fine. But if processing multiple copies of the same request is expensive, then you should probably be using POST instead.
It's not required that the GET handler be safe: the standard only describes the semantics of the messages; it doesn't constrain the implementation at all. But any resulting loss of property is properly understood to be the responsibility of the server.
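If the claim is moved to POST, one common technique (an assumption here, not something the answer prescribes) for making client retries harmless is to have the worker send its own request identifier and have the server remember which job it handed out for it. The following in-memory Scala sketch models only that server-side logic; `JobScheduler`, `claimNext`, the `requestId`, and the endpoint named in the comment are hypothetical, and a real implementation would record the claim in the database in the same transaction as the job assignment.

```scala
import scala.collection.mutable

final case class Job(id: Long, workerId: String)

// computeNext stands in for the server-side calculation across DB tables;
// None means there is no work for this worker right now.
final class JobScheduler(computeNext: String => Option[Long]) {
  // (workerId, requestId) -> job already handed out for that request
  private val claimed = mutable.Map.empty[(String, String), Job]

  // Handler for a hypothetical `POST /workers/{workerId}/claims` carrying a requestId.
  // A retried request (e.g. the response was lost) gets the same job back
  // instead of claiming a second one.
  def claimNext(workerId: String, requestId: String): Option[Job] = synchronized {
    claimed.get((workerId, requestId)).orElse {
      computeNext(workerId).map { jobId =>
        val job = Job(jobId, workerId)
        claimed((workerId, requestId)) = job
        job
      }
    }
  }
}
```

A worker that times out simply repeats the POST with the same `requestId`; only a new `requestId` causes a new job to be calculated and locked.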

Lagom: Asynchronous Operations in Command Handlers

In Lagom, what do you do when a command handler must perform some asynchronous operations? For example:
override def behavior = Actions().onCommand[MyCommand, Done] {
  case (cmd, ctx, state) =>
    // some complex code that performs asynchronous operations
    // (for example, querying other persistent entities or the read-side
    // by making calls that return Future[...] and composing those),
    // as summarized in a placeholder below:
    val events: Future[Seq[Event]] = ???
    events map {
      xs => ctx.thenPersistAll(xs: _*) { () => ctx.reply(Done) }
    }
}
The problem with code like this is that the compiler expects the command handler to return Persist, not Future[Persist].
Is this done on purpose, to make sure that the events are persisted in the correct order (that is, the events generated by a prior command must be saved before the events generated by a later command)? But can't that be handled by proper management of the event offsets, so that the journal always orders them correctly, regardless of when they are actually saved?
And what does one do in situations like this, when the command handling is complex enough to require making asynchronous calls from the command handler?
There is a similar question on the mailing list with an answer from James.
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/lagom-framework/Z6lynjNTqgE
In short, your entity in a CQRS application is a consistency boundary and should only depend on data that is immediately available inside it, not outside (no calls to external services).
What you are probably looking for is what's called command enrichment: you receive a request, collect some data from external services, and build a command containing everything you need to send to your entity.
You certainly should not query the read-side to make business decisions inside your write-side entity. You also should not make business decisions on data coming from other entities.
Your entity should be able to make all the decision because it is the consistency boundary of your model.
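Command enrichment can be sketched with plain Scala `Future`s rather than Lagom's API: the service layer performs the asynchronous lookups first and only then sends a fully populated command, so the entity's command handler can decide synchronously. `PlaceOrder`, `OrderService`, `decide`, the lookup functions, and `sendToEntity` are hypothetical placeholders; in Lagom, the last step would typically be an `ask` on the entity's `PersistentEntityRef`.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Everything the entity needs to decide is carried inside the command itself.
final case class PlaceOrder(orderId: String, itemId: String, unitPrice: BigDecimal, customerIsActive: Boolean)

final class OrderService(
    lookupPrice: String => Future[BigDecimal],       // e.g. a call to another service
    lookupCustomerStatus: String => Future[Boolean], // e.g. a call to another service
    sendToEntity: PlaceOrder => Future[String]       // stand-in for sending to the entity
)(implicit ec: ExecutionContext) {

  def placeOrder(orderId: String, customerId: String, itemId: String): Future[String] = {
    // Start both lookups before composing so they run concurrently.
    val priceF = lookupPrice(itemId)
    val activeF = lookupCustomerStatus(customerId)
    for {
      price <- priceF
      active <- activeF
      reply <- sendToEntity(PlaceOrder(orderId, itemId, price, active))
    } yield reply
  }
}

// What the entity's command handler can then do, with no asynchronous calls.
def decide(cmd: PlaceOrder): Either[String, String] =
  if (!cmd.customerIsActive) Left("customer inactive")
  else Right(s"OrderPlaced(${cmd.orderId}, ${cmd.unitPrice})")
```

The asynchronous work stays outside the consistency boundary; the entity only ever sees a complete command and returns its decision immediately.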
What I've been doing in these cases is to pass the PersistentEntityRef to the asynchronous operation, so that it can issue commands to the entity; it's those command handlers (not the one that spawned the async computation) that then persist the events.
Just bear in mind that none of this is atomic, so you have to think about what happens if the asynchronous operation fails midway through issuing the commands or if some commands succeed and some fail, etc. Presumably you'll want some retry mechanism for systemic failures. If you build your command-handlers to be idempotent, it will help you deal with the duplicates.
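The idempotency advice can be illustrated with a small, framework-free model of an entity whose command handler ignores commands it has already processed, so a retried command from the asynchronous operation does not produce duplicate events. `Entity`, `AddItem`, and the `commandId` field are hypothetical, and in a real persistent entity the set of processed IDs would be part of the persisted state rather than a mutable field.

```scala
// Each command carries an ID chosen by the sender and reused on retries.
final case class AddItem(commandId: String, item: String)

// Events are modelled as plain strings; state is the event log plus seen command IDs.
final class Entity {
  private var processed = Set.empty[String]
  private var events = Vector.empty[String]

  // Returns the events persisted for this command (none if it is a duplicate).
  def handle(cmd: AddItem): Seq[String] = synchronized {
    if (processed(cmd.commandId)) Seq.empty
    else {
      val evts = Seq(s"ItemAdded(${cmd.item})")
      events ++= evts
      processed += cmd.commandId
      evts
    }
  }

  def log: Vector[String] = synchronized(events)
}
```

With handlers like this, the asynchronous operation can retry any command whose reply it did not receive without having to know whether the first attempt was actually persisted.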

spray-cache: Return old value if the future fails

We are using spray-cache (can't move to akka-http yet) to cache results from a downstream service we are calling. The effect we want is, if the data is more than 15 minutes old, do the call, otherwise return the cached data.
Our problem is that, if the service call fails, spray-cache will remove the entry from the cache. What we need is to return the old cached data (even if it's stale), and retry the downstream request when the next request comes in.
It looks like Spray does not ship with a default cache implementation that does what you want. According to the spray-caching docs there are two implementations to the Cache trait: SimpleLruCache and ExpiringLruCache.
What you want is a Cache that distinguishes entry expiration (removal of the entry from the cache) from entry refresh (fetching or calculating a more recent copy of the entry).
Since both default implementations merge these two concepts into a single timeout value, I think your best bet is to write a new Cache implementation that distinguishes refresh from expiration.
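A minimal sketch of such a cache, written against plain `scala.concurrent.Future` rather than spray's `Cache` trait: an entry older than `refreshAfterMillis` triggers a downstream call, but if that call fails the last good value is returned and the entry is left in place, so the next request retries. `StaleOnFailureCache` and its parameters are hypothetical; adapting it to spray-caching would mean implementing the `Cache` trait around the same logic.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.concurrent.{ExecutionContext, Future}

final class StaleOnFailureCache[K, V](refreshAfterMillis: Long, now: () => Long)(
    implicit ec: ExecutionContext) {

  private final case class Cached(value: V, fetchedAt: Long)
  private val entries = new ConcurrentHashMap[K, Cached]()

  def apply(key: K)(fetch: => Future[V]): Future[V] =
    Option(entries.get(key)) match {
      case Some(c) if now() - c.fetchedAt < refreshAfterMillis =>
        Future.successful(c.value) // fresh enough: no downstream call
      case existing =>
        fetch
          .map { v => entries.put(key, Cached(v, now())); v }
          .recoverWith { case e =>
            existing match {
              // Serve the stale value; the entry is unchanged, so the next request retries.
              case Some(c) => Future.successful(c.value)
              case None    => Future.failed(e) // nothing cached yet to fall back on
            }
          }
    }
}
```

In this sketch, concurrent requests for the same stale key may each call the downstream service; sharing a single in-flight call per key is left out for brevity.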

Passing values from request to all the layers below controller

If a Play controller retrieves values from the Request (e.g. the logged-in user and their role) and those values need to be passed to all the layers below the controller (e.g. the service layer, DAO layer, etc.), what's the best way to create a "threadlocal"-style object that any class in the application can use to retrieve the "user" and "userRole" values for that particular request? I am trying to avoid adding implicit parameters to a bunch of methods, and Play Cache doesn't look like an appropriate fit here. Play's different scopes (session, flash, etc.) also wouldn't behave correctly, given that all the code is asynchronous: controller methods are async, service methods return Futures, etc. The desired effect is that of a "threadlocal" in an asynchronous environment.
Alternatives that are not a good fit
These alternatives are probably not helpful, because they assume a global state accessible by all functions across the processing of a request:
Thread local storage is a technique that is helpful for applications that process the request in a single thread, and that block until a response is generated. Although it's possible to do this with Play Framework, it's usually not the optimal design, since Play's strengths are of more benefit for asynchronous, non-blocking applications.
Session and flash are meant to carry data across HTTP requests. They're not globally available to all classes in an application; it would be necessary to pass the modified request across function calls to retrieve them.
A cache could in theory be used to carry this information, but it would have to have a unique key for each request, and it would be necessary to pass this key in each function call. Additionally, it would be necessary to make sure the cache data is not at risk of being evicted while processing the request, not even when cache memory is full.
Alternatives that may be a good fit
Assuming the controller, possibly through the Action call, retrieves the security data (user, role, etc.), and that the controller only validates the request and generates a response, delegating domain logic to a domain object (possibly a service object):
Using the call stack: Pass the security data to all functions that need it, through an implicit parameter. Although the question is about finding an alternative to doing that, this approach makes it explicit what is being sent to the called function, and which functions require this data, instead of resorting to state maintained elsewhere.
Using OOP: Pass the security data in the constructor of the domain object, and in the domain object's methods, retrieve the security data from the object's instance.
Using actors: Pass the security data in the message sent to the actor.
If a domain object's method calls a function that also needs the security data, the same pattern would be applied: either pass it as (a possibly implicit) parameter, through a constructor, or in a message.
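The first two approaches can be sketched in plain Scala without any Play types. `SecurityContext`, `ReportDao`, `ReportService`, and `ReportServiceForRequest` are hypothetical names; in a Play application the controller would build the `SecurityContext` from the `Request`, and the methods would typically return `Future`s.

```scala
// The per-request security data extracted by the controller.
final case class SecurityContext(user: String, role: String)

// Call stack: the context travels as an implicit parameter through each layer.
object ReportDao {
  def findReports()(implicit sc: SecurityContext): List[String] =
    if (sc.role == "admin") List("report-1", "report-2") else List(s"report-of-${sc.user}")
}

object ReportService {
  // The implicit sc is forwarded to the DAO without being mentioned explicitly.
  def listReports()(implicit sc: SecurityContext): List[String] = ReportDao.findReports()
}

// OOP: the context is fixed when the per-request domain object is constructed.
final class ReportServiceForRequest(sc: SecurityContext) {
  def listReports(): List[String] = ReportDao.findReports()(sc)
}
```

In both variants the data dependency is visible in the signatures, and nothing depends on which thread a `Future` callback happens to run on.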