What is the default Maximum Size for a Google Guava Cache?

Given a Guava cache created with the code below, is there a maximum cache size if one is not set?
LoadingCache<String, String> loadingCache = CacheBuilder.newBuilder().build(new CacheLoader<String, String>() {
    @Override
    public String load(String key) throws Exception {
        return key.toUpperCase();
    }
});
In my case I really want a cache with no upper bound in size. I am using the cache to store permissions for logged in users and will evict items from the cache on user logout or session expiry.

The default cache is unbounded, as the javadoc for CacheBuilder explains:
"These features are all optional"
and
"By default cache instances created by CacheBuilder will not perform any type of eviction."

The simple answer is that there is no limit, if by "default" you mean that CacheBuilder.maximumSize() is never called.
And I don't think your application needs a size-based eviction strategy. When a user's session expires, just remove that user's entry from the cache (Cache.invalidate(key)).
Also, uppercasing a String doesn't need a cache; calling toUpperCase() directly is simpler and more efficient than caching the result.
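For illustration, a minimal sketch of that eviction-on-expiry approach (the Cache calls are real Guava API; userId and loadPermissions are placeholders for your own session and permission-loading code):

Cache<String, Set<String>> permissions = CacheBuilder.newBuilder().build();

// On login (or on the first permission check), populate the user's entry:
permissions.put(userId, loadPermissions(userId));

// On logout or session expiry, evict exactly that user's entry:
permissions.invalidate(userId);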

Related

spray-cache: Return old value if the future fails

We are using spray-cache (can't move to akka-http yet) to cache results from a downstream service we are calling. The effect we want is, if the data is more than 15 minutes old, do the call, otherwise return the cached data.
Our problem is that, if the service call fails, spray-cache will remove the entry from the cache. What we need is to return the old cached data (even if it's stale), and retry the downstream request when the next request comes in.
It looks like Spray does not ship with a default cache implementation that does what you want. According to the spray-caching docs there are two implementations of the Cache trait: SimpleLruCache and ExpiringLruCache.
What you want is a Cache that distinguishes entry expiration (removal of the entry from the cache) from entry refresh (fetching or calculating a more recent copy of the entry).
Since both default implementations merge these two concepts into a single timeout value, I think your best bet will be to write a new Cache implementation that distinguishes refresh from expiration.
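As a rough illustration of that distinction (this is not spray's API, and a real spray Cache implementation would be in Scala; it is just the per-entry bookkeeping such an implementation needs), each entry can carry two timestamps:

// Sketch of the per-entry state a refresh-aware cache would track.
final class Entry<V> {
    final V value;
    final long refreshAtMillis; // after this, serve the value but refetch in the background
    final long expireAtMillis;  // after this, the value is too stale to serve at all

    Entry(V value, long refreshAtMillis, long expireAtMillis) {
        this.value = value;
        this.refreshAtMillis = refreshAtMillis;
        this.expireAtMillis = expireAtMillis;
    }

    boolean needsRefresh(long now) { return now >= refreshAtMillis && now < expireAtMillis; }
    boolean isExpired(long now)    { return now >= expireAtMillis; }
}

On a hit, isExpired would force a blocking refetch, while needsRefresh would return the cached value and kick off an asynchronous reload.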

scalacache memoization asynchronous refresh

I'd like to do TTL-based memoization with active asynchronous refresh in Scala.
The ScalaCache example in the documentation allows for TTL-based memoization as follows:
import scalacache._
import memoization._
implicit val scalaCache = ScalaCache(new MyCache())
def getUser(id: Int): User = memoize(60 seconds) {
  // Do DB lookup here...
  User(id, s"user${id}")
}
I'm curious whether, after the TTL expires for an existing value, the DB lookup gets triggered synchronously and lazily during the next getUser invocation, or whether the refresh happens eagerly and asynchronously, even before the next getUser call.
If the ScalaCache implementation is synchronous, is there an alternate library that provides ability to refresh cache actively and asynchronously ?
Expiration and refresh are closely related but different mechanisms. An expired entry is considered stale and cannot be used, so it must be discarded and refetched. An entry eligible for refresh is still valid to use, but the data should be refetched as it may be out of date. Guava provides these TTL policies under the names expireAfterWrite and refreshAfterWrite, which may be used together if the refresh time is smaller than the expiration time.
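In Guava the combination looks like this (the Result type, callDownstreamService, and the 15-minute/1-hour values are illustrative):

LoadingCache<String, Result> cache = CacheBuilder.newBuilder()
    .refreshAfterWrite(15, TimeUnit.MINUTES) // stale but usable: a read triggers a refetch
    .expireAfterWrite(1, TimeUnit.HOURS)     // hard limit: discard and reload synchronously
    .build(new CacheLoader<String, Result>() {
        @Override
        public Result load(String key) throws Exception {
            return callDownstreamService(key);
        }
    });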
Most cache designs prefer discarding unused content. An active refresh would require a dedicated thread that reloads entries regardless of whether they have been used. Therefore most caching libraries do not provide active refresh themselves, but make it easy for applications to add that customization on top.
When a read in Guava detects that an entry is eligible for refresh, that caller performs the operation. All subsequent reads while the refresh is in progress obtain the current value. This means that the refresh is performed synchronously on the thread of the user who triggered it, and asynchronously with respect to other threads reading that value. A refresh may be made fully asynchronous by overriding CacheLoader.reload to perform the work on an executor.
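That asynchronous variant follows the pattern shown in the Guava javadoc: reload hands the refetch to an executor (here executor is assumed to be an ExecutorService you manage, and fetch is a placeholder for the real lookup):

CacheLoader<String, Result> loader = new CacheLoader<String, Result>() {
    @Override
    public Result load(String key) throws Exception {
        return fetch(key); // synchronous fetch, used on a cache miss
    }

    @Override
    public ListenableFuture<Result> reload(String key, Result oldValue) {
        // Run the refresh on a background executor instead of the reading thread.
        ListenableFutureTask<Result> task = ListenableFutureTask.create(() -> fetch(key));
        executor.execute(task);
        return task;
    }
};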
Caffeine is a rewrite of Guava's cache and differs slightly by always performing the refresh asynchronously with respect to a user's thread. The cache delegates the operation to an executor, by default ForkJoinPool.commonPool(), which is a JVM-wide executor. The Policy API provides a means of inspecting the runtime state of the cache, such as the age of an entry, for adding application-specific custom behavior.
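The Caffeine equivalent might look like this (myExecutor and fetch are placeholders; omitting the executor uses the common pool):

LoadingCache<String, Result> cache = Caffeine.newBuilder()
    .refreshAfterWrite(15, TimeUnit.MINUTES)
    .executor(myExecutor)      // optional; defaults to ForkJoinPool.commonPool()
    .build(key -> fetch(key)); // the refresh always runs on the executor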
Support among the other ScalaCache backends is mixed. Ehcache has a RefreshAheadCache decorator that refreshes lazily using its own thread pool. Redis and memcached do not refresh, as they are not aware of the system of record. LruMap has expiration support grafted on and does not have any refresh capabilities.

Statelessness of REST

I am creating a REST service that has two methods, GetAll and GetById. In my scenario a database request is very costly, so I want to store the output of GetAll somewhere (a cache) and use it for subsequent GetById requests.
One of the characteristics of REST is that it should be stateless: a request cannot depend on a past request, and a service treats each request independently.
I want to understand the ideal approach for handling such scenarios, and how to design this requirement in REST.
The proper way to achieve what you want is by using caching, like MemoryCache.
You create a separate, private function which fetches all the data and caches it in memory. Then you can have both GetAll and GetById use that function.
Your service will remain stateless.
MemoryCache usage example
MemoryCache cache = MemoryCache.Default;
string cacheName = "MyCache";
if (!cache.Contains(cacheName) || cache[cacheName] == null)
{
    // get data
    var data = ...
    // cache data; evict after one day without access
    cache.Set(cacheName, data, new CacheItemPolicy() { SlidingExpiration = TimeSpan.FromDays(1) });
}
return cache[cacheName];
The requirement for statelessness in REST is that the service should appear to be stateless. It doesn't matter if the service maintains some state internally. That's just an implementation detail.
There are many solutions to this. One solution I would recommend is the following:
Configure Infinispan as your cache mechanism. In it, store a concurrent hash map whose key is the query and whose value is the result set from your database.
The second time you want to getAll, check whether the query exists as a key in your cache; if yes, retrieve the value from the cache, otherwise contact the database and then insert the result into the cache.
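A sketch of that check-then-populate (cache-aside) flow, using a plain ConcurrentHashMap to stand in for the Infinispan cache (Item and queryDatabase are placeholders; computeIfAbsent gives an atomic check-and-insert):

private final ConcurrentMap<String, List<Item>> cache = new ConcurrentHashMap<>();

public List<Item> getAll() {
    // The first caller populates the cache; later callers reuse the cached result set.
    return cache.computeIfAbsent("getAll", query -> queryDatabase(query));
}

public Item getById(int id) {
    // GetById is served from the cached GetAll result, so it never hits the database.
    return getAll().stream()
        .filter(item -> item.getId() == id)
        .findFirst()
        .orElse(null);
}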

Atomic get and delete in memcached?

Is there a way to do atomic get-and-delete in memcached?
In other words, I want to get the value for a key if it exists and delete it immediately, so this value can be read once and only once.
I think this pseudocode might work, but note the caveat postscript:
# When setting:
SET key-0 value   # store the value under a versioned key
SET key-ns 0      # per-key namespace counter
# When getting:
ns = INCR key-ns  # atomically bump the counter; only one reader wins each version
GET key-{ns - 1}  # read the version the counter pointed at before the bump
Constraint: I have millions of keys that could be accessed millions of times, and only a small percentage will have a value set at any given time. I don't want to have to update an atomic counter for every key with every get access request as above.
The canonical, yet generic, answer to your question is: a lock-free hash table with a relaxed memory model.
The more relaxed your memory model, the more you gain from a good lock-free design; it's a way to get more performance out of the same chipset.
Here is a talk about that. I don't think it's possible to answer your question in a single post on hash tables and lock-free programming, and I'm not even going to try.
You cannot do this with memcached in a single command, since there is no API that supports exactly what you're asking for. What I would do to get the behavior you're looking for is to implement some sort of marking scheme to signify whether another client has read the data. For example, you could create a JSON document as follows:
{
    "data": "value",
    "used": false
}
When you get the item, check whether it has already been used by another client by examining the used field. If it hasn't been used, set the value using the cas value you got when you read the item, making sure the document is updated to reflect the fact that a client has already accessed this key.
If the set operation fails because the cas is invalid, that means another client has obtained this item and already updated it in memcached to signify that it has been used. In that case just cancel whatever you were doing with the item and move on.
If the set operation succeeds, your client is the sole owner of this data. You can now delete it from memcached and do whatever processing you like.
Note that when doing the set I would also add an expiration time of about 5 seconds. This way, if your application crashes, the documents will clean themselves up even if you don't finish the entire process of deleting them.
To put some code to the answer from @mikewied, I think the basic gist is... (using Node.js):
var Memcached = require('memcached');
var memcache = new Memcached('localhost:11211');

var getOnce = function(key, callback) {
    // gets is the check-and-set get (vs regular get)
    memcache.gets(key, function(err, data) {
        if (!data) {
            // Cache miss, nothing to see here.
            callback(null);
        } else {
            var yourData = data[key];
            // Do a check-and-set to remove the data from the cache.
            // This sets the value to null *only* if no one else already did.
            memcache.cas(key, null /* new data */, data.cas, 10, function(err) {
                if (err) {
                    // Check-and-set failed! (Here we'll treat it like a cache miss)
                    yourData = null;
                }
                callback(yourData);
            });
        }
    });
};
I'm not an expert on Memcached and so I may be wrong. My answer is from reading the documentation and my experience using Memcached.
IMO this is not possible with memcached's current implementation.
To demonstrate why, here is a simple example of the race condition:
two processes start at the same time
both execute a get/delete at the same time
memcached replies to both get commands at the same time
done (the desired result was for one get/delete to execute atomically and the second get/delete to fail; instead memcached did get, get, delete, failed delete)
Getting an atomic get/delete would require either:
a new atomic command for memcached, let's call it get_delete, or
some sort of synchronization lock shared by all the memcached clients, to ensure that the get and delete commands are executed while the lock is held
so all clients would grab the synchronization lock whenever they need to enter the critical section (i.e. get, delete), then release the lock after the critical section, as in the sketch below
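A common way to approximate that client-side lock is to use memcached's add command, which succeeds only if the key does not already exist. Here is a rough sketch using the spymemcached Java client (the lock-key naming and 5-second lock expiry are illustrative assumptions, and the result is only pseudo-atomic):

import net.spy.memcached.MemcachedClient;

// Pseudo-atomic get-and-delete guarded by an add-based lock.
// Returns null if the item is absent or another client holds the lock.
Object getAndDelete(MemcachedClient client, String key) throws Exception {
    String lockKey = key + ":lock";
    // add() succeeds only if lockKey does not exist yet, so it acts as a mutex;
    // the 5-second expiry frees the lock if this client crashes mid-operation.
    if (!client.add(lockKey, 5, "locked").get()) {
        return null; // someone else is in the critical section
    }
    try {
        Object value = client.get(key);
        if (value != null) {
            client.delete(key);
        }
        return value;
    } finally {
        client.delete(lockKey); // release the lock
    }
}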

xmpp pubsub understanding

The subscriber will only receive content published from the moment he subscribes to a node; all old content published before that will not be received by the subscriber. Is this correct? May I know what I need to do in order for the subscriber to receive all the previous old content?
You can configure your nodes to be persistent or transient. According to the specification (XEP-0060):
Whether the node is persistent or transient is determined by the "pubsub#persist_items" configuration field.
However, your pubsub service (or server) might be configured to ignore persistence of events. (If you're using Openfire, I think there is a configurable limit for the maximum total size of stored items)
Since I know you are using smackx-pubsub, here's some code:

// create a new node
pubSubManager.createNode(nodeId, newConfigureForm(persistent, includePayload, accessModel));

// change an existing node
node.sendConfigurationForm(newConfigureForm(persistent, includePayload, accessModel));

private ConfigureForm newConfigureForm(final boolean persistent, final boolean includePayload, final AccessModel accessModel) {
    final ConfigureForm form = new ConfigureForm(FormType.submit);
    form.setPersistentItems(persistent);
    form.setDeliverPayloads(includePayload);
    form.setAccessModel(accessModel);
    return form;
}
PS: Can you tell me why I get the feeling that we're doing a kind of pair programming here? ;)