Guava BloomFilter how to expire once a day

I am using a Guava BloomFilter to remove duplicate messages in a service that receives logs.
Is there a way for bloom filter to expire like guava cache does?

No, a BloomFilter does not have any remove functionality, and it cannot: a BloomFilter only keeps track of what may possibly be in the set.
Removing one entry from a BloomFilter would result in false negatives for other entries, and a BloomFilter must be 100% accurate about what is not in the set.

You can't even remove from Guava's BloomFilter, let alone automatically expire entries.

If you need remove functionality, consider using a Counting Bloom Filter.
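Guava does not ship a counting Bloom filter, so for illustration only, here is a toy sketch of the idea (the hashing is deliberately simplistic): each item increments k counters instead of setting k bits, so a remove can decrement them again.

```java
import java.util.Objects;

// Toy counting Bloom filter: counters instead of bits, so removal is possible.
public class CountingBloomFilterSketch {
    private final int[] counts;
    private final int numHashes;

    public CountingBloomFilterSketch(int size, int numHashes) {
        this.counts = new int[size];
        this.numHashes = numHashes;
    }

    // Simplistic hash for illustration: mixes the item with the hash index i.
    private int index(String item, int i) {
        return Math.floorMod(Objects.hash(item, i), counts.length);
    }

    public void add(String item) {
        for (int i = 0; i < numHashes; i++) counts[index(item, i)]++;
    }

    // Decrementing the same counters removes the item without turning
    // other items that share buckets into false negatives.
    public void remove(String item) {
        if (!mightContain(item)) return; // avoid driving counters negative
        for (int i = 0; i < numHashes; i++) counts[index(item, i)]--;
    }

    public boolean mightContain(String item) {
        for (int i = 0; i < numHashes; i++) {
            if (counts[index(item, i)] == 0) return false;
        }
        return true;
    }
}
```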

As suggested by Tavian Barnes, one solution here might be to create a class that wraps a BloomFilter and (once a day, or however often you want) atomically replaces that BloomFilter with a new one and repopulates it.
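That wrapper is not part of Guava; a minimal sketch of the idea might look like the following, assuming string message IDs and made-up sizing parameters (1,000,000 expected insertions, 1% false-positive rate). The daily swap simply starts from an empty filter; repopulation, if you need it, could be hooked into the same scheduled task.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class DailyExpiringBloomFilter {
    private final AtomicReference<BloomFilter<CharSequence>> filter =
            new AtomicReference<>(newFilter());
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public DailyExpiringBloomFilter() {
        // Swap in a fresh, empty filter once a day; callers always see either
        // the old filter or the new one, never a half-built state.
        scheduler.scheduleAtFixedRate(() -> filter.set(newFilter()), 1, 1, TimeUnit.DAYS);
    }

    public boolean mightContain(String messageId) {
        return filter.get().mightContain(messageId);
    }

    public void put(String messageId) {
        filter.get().put(messageId);
    }

    private static BloomFilter<CharSequence> newFilter() {
        // Placeholder sizing; tune expected insertions and fpp for your log volume.
        return BloomFilter.create(
                Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);
    }
}
```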

Consider using a sliding Bloom filter. This can solve your problem:
https://programming.guide/sliding-bloom-filter.html

Related

Message ordering of ReactiveKafkaConsumerTemplate receiveAutoAck

I am wondering whether the ReactiveKafkaConsumerTemplate of the spring-kafka project guarantees correct ordering of messages. The reactor-kafka reference documentation states that messages should be consumed using the concatMap operator, but ReactiveKafkaConsumerTemplate uses the flatMap operator, at least in the case of the receiveAutoAck method here:
https://github.com/spring-projects/spring-kafka/blob/master/spring-kafka/src/main/java/org/springframework/kafka/core/reactive/ReactiveKafkaConsumerTemplate.java#L69
Reference documentation of the reactor-kafka project:
https://projectreactor.io/docs/kafka/release/reference/#_auto_acknowledgement_of_batches_of_records
I am interested in using receiveAutoAck as it seems to be the simplest and most convenient approach, and it suffices for my use case. The only way to overcome this behaviour of the receiveAutoAck method seems to be to subclass ReactiveKafkaConsumerTemplate and override it. Is this correct?
I don't think it really matters here, because internally the source of data is Flux.fromIterable(consumerRecords), which cannot lose its order: however hard we try to process records in parallel, they still come out in the order of that one iterator. Yes, the order between the iterators we flatten is unpredictable, but that doesn't matter here, since we only care about ordering within a single partition, nothing more.
Nevertheless, I think we should definitely change this to the mentioned concatMap() to avoid such confusion in the future. Feel free to provide a contribution on the matter!
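To make the distinction concrete, here is a small reactor-core sketch (not the spring-kafka code itself): flatMap may interleave elements from different inner publishers, but each Flux.fromIterable(...) still emits its own elements in iteration order, while concatMap keeps the batches strictly in sequence as well.

```java
import reactor.core.publisher.Flux;

import java.util.List;

public class BatchOrderingSketch {
    public static void main(String[] args) {
        // Each inner list stands in for one batch of consumer records from a poll.
        Flux<List<Integer>> batches = Flux.just(List.of(1, 2, 3), List.of(4, 5, 6));

        // flatMap: elements of a single batch keep their iteration order,
        // but elements of different batches may interleave.
        batches.flatMap(Flux::fromIterable)
               .doOnNext(n -> System.out.println("flatMap:   " + n))
               .blockLast();

        // concatMap: one batch at a time, so the overall order is deterministic.
        batches.concatMap(Flux::fromIterable)
               .doOnNext(n -> System.out.println("concatMap: " + n))
               .blockLast();
    }
}
```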

Scala PriorityQueue conflict resolution?

I'm working on a project that uses a PriorityQueue and A*. After digging around a lot, I think part of the problem I'm hitting while my search runs lies in the PriorityQueue. I'm guessing that when it generates nodes with equal scores (for example, one earlier and one later), it chooses the earlier one rather than the one that was most recently generated.
Does anyone know if a PriorityQueue prioritizes the newest node if the scores are the same? If not, how can I make it do this?
Thanks!
PriorityQueue uses a heap to select the next element. Beyond that it makes no guarantees about how the elements are ordered. If it is important to you that nodes are ordered by addition order, you should keep a count of the number of items added and prioritize by the tuple (priority, -order).
If you do anything else, even if it happens to work now, it may break at any arbitrary time since the API makes no guarantees about how it chooses from among equal elements.
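A minimal sketch of that decorated-priority idea, written in Java for consistency with the other snippets here (Scala's mutable PriorityQueue can be given an equivalent Ordering): ties on the score are broken by an insertion counter, with the newest node winning.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class TieBreakSketch {
    // An A*-style node: lower score is better, 'order' is an insertion counter.
    record Node(int score, long order, String label) {}

    public static void main(String[] args) {
        PriorityQueue<Node> open = new PriorityQueue<>(
                Comparator.comparingInt(Node::score)
                          .thenComparing(Comparator.comparingLong(Node::order).reversed()));

        open.add(new Node(5, 0, "older"));
        open.add(new Node(5, 1, "newer")); // same score, added later

        System.out.println(open.poll().label()); // prints "newer": the tie goes to the newest node
    }
}
```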

Difference between ways of selecting descendants

At the risk of being redirected to a duplicate question: what is the difference between using $("ul li") and $("ul").find("li")?
You may find this similar question of interest: What is the fastest method for selecting descendant elements in jQuery?
If you only need first level children, using .children() will give you a performance boost since there is less to interrogate.
$("ul").find("li") is faster than $("ul li").
$("ul li") is going to be changed into $("ul").find("li") after some checks and breaking.
In my own measurements, find() appeared faster than children() even for single-level selections.

How can I implement incr/decr on top of a key/value store?

How can I implement incr/decr on top of a key/value store?
I'm using a key/value store that doesn't support incr and decr, which is why I want to build this myself. I have used incr and decr in Redis and Memcached, so, as mentioned in some of the answers, that is a perfect example of how I want incr and decr to behave; thanks to those who mentioned it.
The point of having an incr() function is that it's all internal to the store. You don't have to pull data out and push it back in.
What you're doing sounds like you want to put some logic in your code that pulls the data out, increments it and pushes it back in... While it's not very hard (I think I've just described how you'd do it), it does defeat the point somewhat.
To get the benefit you'd need to change the source of your key store. Might be easy.
But a lot of caches already have this. If you really need this for speed, perhaps you should find an alternate store like memcached that does support it.
Memcached has this functionality built in.
Edit: it looks like you're not going to get an atomic update without changing the store itself, as there doesn't appear to be a lock function. If there is (and this is not pretty), you can lock the value, get it, increment it in your application, put it back, and unlock it. Suboptimal, though.
It kind of seems like without a compareAndSet you are out of luck. But it helps to consider the problem from another angle. For example, if you were implementing an atomic counter that shows the number of upvotes for a question, one way would be to have a "table" per question and to put a +1 for each upvote and a -1 for each downvote. Then to "get" you would sum the "table". For this to work I assume "tables" are inexpensive and you don't care how long "get" takes to compute; you only mentioned incr/decr.
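As an illustration only (the class and method names below are made up, and an in-memory map stands in for the store), that per-question "table" of +1/-1 deltas could look roughly like this; writes never read or modify an existing value, and "get" pays the cost of summing the deltas.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class VoteLedgerSketch {
    // One append-only "table" of +1/-1 deltas per question.
    private final Map<String, List<Integer>> deltas = new ConcurrentHashMap<>();

    public void upvote(String questionId)   { deltasFor(questionId).add(+1); }
    public void downvote(String questionId) { deltasFor(questionId).add(-1); }

    // "get" is a full scan over the deltas, trading read cost for never
    // needing a read-modify-write cycle on a single counter value.
    public int score(String questionId) {
        return deltasFor(questionId).stream().mapToInt(Integer::intValue).sum();
    }

    private List<Integer> deltasFor(String questionId) {
        return deltas.computeIfAbsent(questionId, k -> new CopyOnWriteArrayList<>());
    }
}
```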
If you wish to atomically increment or decrement an int value associated with a key of e.g. type string, and if you'll know all of the keys in advance of having to perform the atomic operations on any of them, use Dictionary<string, int[]> and pre-populate the dictionary with a single-item array for each key value. It will then be possible to perform atomic operations (e.g. increment) on items via code like Threading.Interlocked.Increment(MyDict[keyString][0]);. If you need to be able to deal with keys that are not known in advance, you may need to use a ConcurrentDictionary instead of Dictionary, but you need to be careful if two threads try to simultaneously create dictionary entries for the same key.
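That answer describes the .NET approach (Dictionary plus Interlocked); a rough Java analogue, for consistency with the other snippets here and assuming the counters live inside one application process, would be a ConcurrentHashMap of AtomicLongs, where computeIfAbsent takes care of two threads racing to create the same key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class InProcessCounters {
    private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // Atomically increments the counter for the key, creating it on first use.
    public long incr(String key) {
        return counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }

    public long decr(String key) {
        return counters.computeIfAbsent(key, k -> new AtomicLong()).decrementAndGet();
    }
}
```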
Since increment and decrement are simple addition and subtraction operations that are "commutative", what you need to implement is a PN-Counter. It is a CRDT (commutative replicated data type). Various examples of how to implement this on Riak are available around the web and on Github.
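A PN-Counter keeps one grow-only count of increments and one of decrements per replica; the value is their difference, and merging two copies takes the per-replica maximum, so replicas converge no matter the sync order. A bare-bones, single-threaded sketch (replica IDs and class name are made up):

```java
import java.util.HashMap;
import java.util.Map;

public class PnCounterSketch {
    // Per-replica grow-only counts of increments and decrements.
    private final Map<String, Long> increments = new HashMap<>();
    private final Map<String, Long> decrements = new HashMap<>();

    public void incr(String replicaId) { increments.merge(replicaId, 1L, Long::sum); }
    public void decr(String replicaId) { decrements.merge(replicaId, 1L, Long::sum); }

    public long value() {
        long p = increments.values().stream().mapToLong(Long::longValue).sum();
        long n = decrements.values().stream().mapToLong(Long::longValue).sum();
        return p - n;
    }

    // Merge by taking the per-replica maximum, so syncing is order-independent.
    public void merge(PnCounterSketch other) {
        other.increments.forEach((r, c) -> increments.merge(r, c, Long::max));
        other.decrements.forEach((r, c) -> decrements.merge(r, c, Long::max));
    }
}
```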

Any tips or best practices for adding a new item to a history while maintaining a maximum total number of items?

I'm working on some basic logging/history functionality for a Core Data iPhone app. I want to maintain a maximum number of history items.
My general plan is to ignore the maximum when adding a new item and enforce it whenever I need to fetch all the items anyway (e.g. for searching or browsing the history). Alternatively, I could do it when adding a new item: fetch the current items, add the new one, and delete the oldest one if we're at the maximum. The second way seems less efficient, since I would be fetching all the items when I otherwise wouldn't need to.
So, the questions:
Which way is better? Is there an even better way to do this that I'm not considering?
How many items would be a reasonable maximum? The history is used for text field autocompletion, so more items means better usability, unless the number of items is so huge that it's slowing stuff down.
Thanks!
Whichever method is easier to implement is the right one. You shouldn't bother with a more efficient/more complicated implementation unless it proves it's needed.
If these objects are in a to-many relationship of some kind, I'd use the relationship to manage the maximum number. (Override add<Whatever>Object: and delete the extraneous items there.)
If you're just fetching them, then that's really your only opportunity to filter them out. If you're using an NSArrayController, you might be able to implement a subclass that detects when new objects are added and chops off the extra ones.
If the items are added manually by the user, then you can safely use the method of cleaning up later. With text data, a user won't enter more than a few hundred items at most, and text data takes up very little room. If the items are added by software, you have to check every so many entries or risk spilling over.
You might not want to spend a lot of time on this. Autocomplete is not that big, usually just a few hundred entries. I would write it the simplest way, with cleanup later, and then fiddle with it only if you hit a definite performance bottleneck.
Remember, premature optimization is the root of all programming evil. That and the dweebs in marketing.