How does timed cache expiry work? - guava

I know that Guava Cache allows individual caches to be configured with an expiry time. Does Guava do this using a timer that wakes up after a configured number of seconds to invalidate the cache?
I have a long-running transaction. Whatever is in the cache at the start of the transaction should remain available until the end of the transaction, even if a cache entry's validity period expires while the transaction is running. Is this possible in Guava?
Thanks,
Yash

From When Does Cleanup Happen? · CachesExplained · google/guava Wiki:
Caches built with CacheBuilder do not perform cleanup and evict values "automatically," or instantly after a value expires, or anything of the sort. Instead, it performs small amounts of maintenance during write operations, or during occasional read operations if writes are rare.
The reason for this is as follows: if we wanted to perform Cache maintenance continuously, we would need to create a thread, and its operations would be competing with user operations for shared locks. Additionally, some environments restrict the creation of threads, which would make CacheBuilder unusable in that environment.
Instead, we put the choice in your hands. If your cache is high-throughput, then you don't have to worry about performing cache maintenance to clean up expired entries and the like. If your cache does writes only rarely and you don't want cleanup to block cache reads, you may wish to create your own maintenance thread that calls Cache.cleanUp() at regular intervals.
If you want to schedule regular cache maintenance for a cache which only rarely has writes, just schedule the maintenance using ScheduledExecutorService.
As such, if you are only doing reads, you might be OK if you call Cache.cleanUp() just before and after your transaction, but there is still no guarantee.
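For completeness, the wiki's suggestion of scheduling the maintenance yourself might look roughly like the sketch below; the interval and the single-threaded executor are arbitrary illustrative choices, not anything Guava prescribes (and note this evicts expired entries on schedule, the opposite of keeping them alive, but it shows how Guava's cleanup is driven):
// Minimal sketch: periodically call Cache.cleanUp() so expired entries are
// actually evicted even if the cache sees few reads/writes.
// Requires java.util.concurrent.Executors, ScheduledExecutorService and TimeUnit.
Cache<Integer, String> cache = CacheBuilder.newBuilder()
        .expireAfterAccess(30, TimeUnit.SECONDS)
        .build();
ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();
cleaner.scheduleAtFixedRate(cache::cleanUp, 1, 1, TimeUnit.MINUTES);
// Remember to shut the executor down when the cache is no longer needed:
// cleaner.shutdown();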
However, instead of trying to force items to stay in the cache you might instead simply evict items to another cache/map using a removalListener and then when you read you will first need to check the cache and then, if it wasn't there, check the items evicted during the long-running transaction.
The following is an oversimplified example:
// Requires Guava's Cache/CacheBuilder/RemovalListener plus the static imports
// java.util.concurrent.TimeUnit.SECONDS and
// com.google.common.util.concurrent.Uninterruptibles.sleepUninterruptibly.
Map<Integer, String> evicted = new ConcurrentHashMap<>();
Cache<Integer, String> cache = CacheBuilder.newBuilder()
        .expireAfterAccess(2, SECONDS)
        .removalListener((RemovalListener<Integer, String>) notification ->
                evicted.put(notification.getKey(), notification.getValue()))
        .build();
assert evicted.size() == 0 && cache.size() == 0;

cache.put(0, "a");
cache.put(1, "b");
cache.put(2, "c");
assert evicted.size() == 0 && cache.size() == 3;

sleepUninterruptibly(1, SECONDS);
assert evicted.size() == 0 && cache.size() == 3;
cache.put(3, "d");
assert evicted.size() == 0 && cache.size() == 4;

sleepUninterruptibly(1, SECONDS);
// Entries 0, 1 and 2 are now past their 2-second access expiry; cleanUp()
// evicts them, which fires the removal listener and copies them into evicted.
cache.cleanUp();
assert evicted.size() == 3 && cache.size() == 1;

// Read path: check the cache first, then fall back to the evicted entries.
Integer key = 2;
String value;
{
    value = cache.getIfPresent(key);
    if (value == null) value = evicted.get(key);
}
assert Objects.equals(value, "c");
Your actual code would need to conditionally put into evicted, clean up evicted, and manage multiple evicted maps if you're running long-running transactions concurrently (or use a common cache between the threads with a different eviction strategy), but hopefully this demonstrates the idea sufficiently to get you started; a sketch of the conditional capture follows below.
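As a rough illustration of the "conditionally put into evicted" part, you could gate the removal listener on a flag that is only set while the long-running transaction is in flight. This is only a sketch: transactionActive is a made-up name, and a single flag obviously only works for one transaction at a time.
// Requires java.util.concurrent.atomic.AtomicBoolean in addition to the imports above.
AtomicBoolean transactionActive = new AtomicBoolean(false);
Map<Integer, String> evicted = new ConcurrentHashMap<>();
Cache<Integer, String> cache = CacheBuilder.newBuilder()
        .expireAfterAccess(2, SECONDS)
        .removalListener((RemovalListener<Integer, String>) notification -> {
            // Capture evicted entries only while the transaction is running.
            if (transactionActive.get()) {
                evicted.put(notification.getKey(), notification.getValue());
            }
        })
        .build();

transactionActive.set(true);   // start of the long-running transaction
try {
    // ... reads: cache.getIfPresent(key), falling back to evicted.get(key) ...
} finally {
    transactionActive.set(false); // end of the transaction
    evicted.clear();              // discard the captured entries
}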

Related

Enforce max threshold of tracked connections in eBPF

I have a kprobe eBPF program that tracks a number of active TCP connections. To reduce the overhead, I set an upper limit on the number of TCP connections that can be tracked simultaneously. Thus, I have to maintain a counter in the eBPF program so that when a new connection is established, the counter is increased if it is below the limit, and when a connection finishes, the counter is decreased. Since the program is reentrant, the manipulation of the counter should be atomic.
I tried bpf_spin_lock; however, it cannot be used in tracing programs (such as kprobes). eBPF also has few atomic operations. I know of __sync_fetch_and_add, but it is not enough to implement the logic here.
I found some discussion online https://lists.iovisor.org/g/iovisor-dev/topic/bpf_concurrency/74407447?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,20,74407447. The discussion is still open without any viable solution.
Is there any readily available solution out there? Thanks a lot!
After some discussion in comments, we concluded that the best solution is probably to rely on the map's max_entries parameter to enforce a limit on the number of tracked connections.
With that approach, the BPF program would then look something like:
// conntrack is assumed to be a BPF hash map declared with max_entries set to the
// connection limit. Update an existing connection or insert a new one.
res = bpf_map_update_elem(&conntrack, &tuple, &connection, 0);
if (res == -E2BIG) {
    // The map is full: the max. threshold for tracked connections was reached.
} else if (res != 0) {
    // The map update failed for some other reason.
} else {
    // The map update was successful.
}
Accessing hash maps in BPF is an atomic operation, so even in the case of concurrent accesses, the map will never contain more elements than max_entries. Note, however, that access to specific map elements is not atomic.
We discarded per-CPU arrays to maintain per-CPU counters because there's no guarantee the event closing the connection (e.g., TCP FIN) will be processed on the same CPU as the event opening it (e.g., TCP SYN).

state manager parallel transactions in runasync

In Service Fabric stateful services, there is RunAsync(cancellationToken), with a using() block for the state manager transaction.
The legacy code I want to refactor contains two queues with dequeue attempts inside a while(true) loop with a 1-second delay. I would like to get rid of this unnecessary delay and instead use two distinct reactive queues (semaphores paired with reliable queues).
The problem is that the two distinct workflows depending on these two queues now need to be separated into two separate threads, because if both queues run on a single thread, one wait() will block the other code from running. (I know the best practice would probably be to separate these two tasks into two microservices; next project.)
I came up with below code as a solution:
protected override async Task RunAsync(CancellationToken cancellationToken)
{
    await Task.WhenAll(AsyncTask1(cancellationToken), AsyncTask2(cancellationToken)).ConfigureAwait(false);
}
And each task contains something like:
while (true)
{
    cancellationToken.ThrowIfCancellationRequested();
    using (var tx = this.StateManager.CreateTransaction())
    {
        var maybeMessage = await messageQueue.TryDequeueAsync(tx, cancellationToken).ConfigureAwait(false);
        if (maybeMessage.HasValue)
        {
            DoWork();
        }
        await tx.CommitAsync().ConfigureAwait(false);
    }
}
It seems to work, but I just want to make sure that using(StateManager.CreateTransaction()) is OK to use in this parallel way.
According to the documentation:
Depending on the replica role, for single-entity operations (like TryDequeueAsync) the ITransaction uses the Repeatable Read isolation level (when primary) or the Snapshot isolation level (when secondary).
Repeatable Read: any Repeatable Read operation by default takes Shared locks.
Snapshot: any read operation done using Snapshot isolation is lock-free.
So if DoWork doesn't modify the reliable collection, then multiple transactions can be executed in parallel with no problems.
In the case of multiple reads / updates, this can cause deadlocks and should be done with care.

Can we prevent deadlocks and timeouts on ReliableQueue's in Service Fabric?

We have a stateful service in Service Fabric with both a RunAsync method and a couple of service calls.
One service call allows enqueuing something in a ReliableQueue:
using(ITransaction tx = StateManager.CreateTransaction())
{
    await queue.EnqueueAsync(tx, message);
    queueLength = await queue.GetCountAsync(tx);
    await tx.CommitAsync();
}
The RunAsync on the other hand tries to dequeue things:
using(ITransaction tx = StateManager.CreateTransaction())
{
    await queue.TryDequeueAsync(tx);
    queueLength = await queue.GetCountAsync(tx);
    await tx.CommitAsync();
}
The GetCountAsync seems to cause deadlocks, because the two transactions block each other. Would it help if we switched the order, so first counting and then the dequeue/enqueue?
This is likely due to the fact that the ReliableQueue today is strict FIFO and allows only one reader or writer at a time. You're probably not seeing deadlocks, you're seeing timeouts (please correct me if that is not the case). There's no real way to prevent the timeouts other than to:
Ensure that the transactions are not long lived - any longer than you need and you're blocking other work on the queue.
Increase the default transaction timeout (the default is 4 seconds, you can pass in a different value)
Reordering things shouldn't cause any change.
Having two transactions in two different places shouldn't cause deadlocks, as they act like mutexes. What will cause them though is creating transactions within transactions.
Perhaps that is what's happening? I've lately developed the habit of naming functions that create transactions Transactional, i.e. DoSomethingTransactionalAsync, and if it's a private helper I'll usually create two versions, one taking a tx and one creating a tx.
For example:
AddToProcessingQueueAsync(ITransaction tx, int num) and AddToProcessingQueueTransactionalAsync(int num).

Service Fabric reliable queue long operation

I'm trying to understand some best practices for service fabric.
If I have a queue that is added to by a web service or some other mechanism, and a back-end task to process that queue, what is the best approach to handle long-running operations in the background?
Use TryPeekAsync in one transaction, process, and then, if successful, use TryDequeueAsync to finally dequeue.
Use TryDequeueAsync to remove an item, put it into a dictionary, and then remove it from the dictionary when complete. On startup of the service, check the dictionary for anything pending before the queue.
Both ways feel slightly wrong, but I can't work out if there is a better way.
One option is to process the queue in RunAsync, something along the lines of this:
protected override async Task RunAsync(CancellationToken cancellationToken)
{
    var store = await StateManager.GetOrAddAsync<IReliableQueue<T>>("MyStore").ConfigureAwait(false);
    while (!cancellationToken.IsCancellationRequested)
    {
        using (var tx = StateManager.CreateTransaction())
        {
            var itemFromQueue = await store.TryDequeueAsync(tx).ConfigureAwait(false);
            if (!itemFromQueue.HasValue)
            {
                await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken).ConfigureAwait(false);
                continue;
            }

            // Process item here.
            // Remember to clone the dequeued item if it is a custom type and you are going to mutate it.
            // If successful, await tx.CommitAsync();
            // If processing fails, either let it run out of the using transaction scope, or call tx.Abort();
        }
    }
}
Regarding the comment about cloning the dequeued item if you are to mutate it, look under the "Recommendations" part here:
https://azure.microsoft.com/en-us/documentation/articles/service-fabric-reliable-services-reliable-collections/
One limitation with Reliable Collections (both Queue and Dictionary) is that you only have a parallelism of 1 per partition. So for high-activity queues it might not be the best solution. This might be the issue you're running into.
What we've been doing is to use ReliableQueues for situations where the write volume is very low. For higher-throughput queues, where we need durability and scale, we're using ServiceBus Topics. That also gives us the advantage that if a service was stateful only due to having the ReliableQueue, it can now be made stateless. Though this adds a dependency on a 3rd-party service (in this case ServiceBus), and that might not be an option for you.
Another option would be to create a durable pub/sub implementation to act as the queue. I've done tests before using actors for this, and it seemed to be a viable option without spending too much time on it, since we didn't have any issues depending on ServiceBus. Here is another SO question about that: Pub/sub pattern in Azure Service Fabric
If processing is very slow, use two queues: a fast one where you store the work without interruption and a slow one to process it. RunAsync is used to move messages from the fast queue to the slow one.

Conditional Work Queue Insertion for beanstalkd?

I'm using the Perl client of beanstalkd. I need a simple way to not enqueue the same work twice.
I basically need something that waits until there are K elements and then groups them together. To accomplish this, I have the producer:
insert item(s) into DB
insert a queue item into beanstalkd
And the consumer:
while ( 1 ) {
    beanstalkd.retrieve
    if ( DB items >= K )
        func_to_process_all_items
    kill job
}
This is linear in the number of requests/processing, but in the case of:
insert 1 item
... repeat many times ...
insert 1 item
Assuming all these insertions happened before a job was retrieved, this would add N queue items, and it would do something like this:
check DB, process N items
check DB, no items
... many times ...
check DB, no items
Is there a smarter way to do this so that it does not insert/process the later job requests unnecessarily?
I had a related requirement. I only wanted to process a specific job once within a few minutes, but the producer could queue several instances of the same job. I used memcache to store the job identifier and set the expiry of the key to just a few minutes.
When a worker tried to add the job identifier to memcache, only the first would succeed - on failure to add the job id, the worker would delete the job. After a few minutes, the key expires from memcache and the job can be processed again.
Not particularly elegant, but it works.
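The same "only the first add succeeds" idea can be sketched with any store that offers an atomic add plus expiry. Purely for illustration (and to tie back to the Guava question above), here it is with an in-process Guava cache; note that, unlike memcache, this only deduplicates within a single worker process, and processJob/deleteJob are hypothetical helpers:
// Sketch only: remember job ids for a few minutes so duplicates can be dropped.
Cache<String, Boolean> recentlySeen = CacheBuilder.newBuilder()
        .expireAfterWrite(5, TimeUnit.MINUTES)
        .build();

void handle(String jobId) {
    // putIfAbsent is atomic: only the first caller to add a given id sees null.
    boolean firstToAdd = recentlySeen.asMap().putIfAbsent(jobId, Boolean.TRUE) == null;
    if (firstToAdd) {
        processJob(jobId);  // hypothetical: do the real work
    } else {
        deleteJob(jobId);   // hypothetical: drop the duplicate job
    }
}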
Will this work for you?
Create two tubes, "buffer" and "live". Your producer only ever adds to the "buffer" tube.
Create two workers, one watching "buffer" and the other watching "live", that call the blocking reserve() call.
Whenever the "buffer" worker returns from reserve(), it buries the job if there are fewer than K items. If there are exactly K, it "kicks" all K jobs and transfers them to the "live" tube.
The "live" watcher will now return from its own reserve().
You just need to take care that a job never returns to the buffer queue from the buried state. A failsafe way to do this might be to delete it and then add it to live.
The two separate queues are only for cleaner separation. You could do the same with a single queue by burying every job until there are K-1 and then, on the arrival of the K-th job, kicking all of them live.