Upon expiry, remove the entry from Guava Cache and LoadingCache - guava

I am trying to use a Guava cache in my project, and I have been investigating Guava's Cache and LoadingCache.
I am using the following code to create the cache:
Cache<Person, Integer> cache = CacheBuilder.newBuilder().expireAfterAccess(5, TimeUnit.SECONDS).build();
cache.get(key, CallableProvidedForLoading)
or
final LoadingCache<Person, Integer> cache = CacheBuilder.newBuilder()
        .maximumSize(100)
        .expireAfterAccess(6, TimeUnit.SECONDS)
        .expireAfterWrite(6, TimeUnit.SECONDS)
        .build(
                new CacheLoader<Person, Integer>() {
                    @Override
                    public Integer load(Person key) throws Exception {
                        // can't return null, use -1 to indicate an invalid value
                        return -1;
                    }
                }
        );
What I want to achieve is that if an entry is not accessed for a period, for example 10 minutes, it is removed from the cache.
But it looks to me as if Cache and LoadingCache will try to reload the entry when the time expires, which keeps the entry in the cache even though it will not be used any more.
So I would like to ask how to remove the entry from the cache upon expiry.
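For what it's worth, here is a minimal sketch of how I would try to observe the expiry-driven removal, using the same CacheBuilder API as above plus a RemovalListener; Guava evicts expired entries lazily (piggybacking on reads and writes), so the explicit cleanUp() call is only there to make the removal visible:
RemovalListener<Person, Integer> onRemoval = notification -> {
    // EXPIRED is reported when the entry is evicted because its timeout elapsed
    if (notification.getCause() == RemovalCause.EXPIRED) {
        System.out.println("expired and removed: " + notification.getKey());
    }
};

Cache<Person, Integer> cache = CacheBuilder.newBuilder()
        .expireAfterAccess(10, TimeUnit.MINUTES)
        .removalListener(onRemoval)
        .build();

// Guava performs eviction lazily; cleanUp() forces pending expired entries
// to be removed and the listener to fire.
cache.cleanUp();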

Mongo change stream with Spring: resumeAt vs startAfter and fault tolerance in case of connection loss

I can't find an answer on Stack Overflow, nor in any documentation.
I have the following change stream code (it listens to a whole DB, not a specific collection).
Mongo version is 4.2.
@Configuration
public class DatabaseChangeStreamListener {

    // Constructor, fields etc...

    @PostConstruct
    public void initialize() {
        MessageListenerContainer container = new DefaultMessageListenerContainer(mongoTemplate, new SimpleAsyncTaskExecutor(), this::onException);
        ChangeStreamRequest.ChangeStreamRequestOptions options =
                new ChangeStreamRequest.ChangeStreamRequestOptions(mongoTemplate.getDb().getName(), null, buildChangeStreamOptions());
        container.register(new ChangeStreamRequest<>(this::onDatabaseChangedEvent, options), Document.class);
        container.start();
    }

    private ChangeStreamOptions buildChangeStreamOptions() {
        return ChangeStreamOptions.builder()
                .returnFullDocumentOnUpdate()
                .filter(newAggregation(match(where(OPERATION_TYPE).in(INSERT.getValue(), UPDATE.getValue(), REPLACE.getValue(), DELETE.getValue()))))
                .resumeAt(Instant.now().minusSeconds(1))
                .build();
    }

    // more code
}
I want the stream to start listening from system initialization time only, without picking up anything earlier in the oplog. Will .resumeAt(Instant.now().minusSeconds(1)) work?
Do I need to use the startAfter method? If so, how can I find the latest resumeToken in the DB?
Or does it work out of the box, so that I don't need to add any resume/start lines?
Second question: I never stop the container (it should live as long as the app is running). In case of a disconnection from MongoDB followed by a reconnection, will the listener in the current configuration continue to consume messages? (I am having a hard time simulating a DB disconnection.)
If it will not resume handling events, what do I need to change in the configuration so that the change stream continues and picks up all the events from the last received resumeToken prior to the disconnection?
I have read this great Medium article on change streams in production,
but it uses the cursor directly, and I want to use Spring's DefaultMessageListenerContainer, as it is much more elegant.
So I will answer my own questions (some more dumb, some less :)...):
When no resumeAt timestamp is provided, the change stream starts from the current time and does not replay any previous events.
The difference between resuming after an event and resuming at a timestamp can be found here: Stack Overflow answer.
But keep in mind that a timestamp is inclusive of the event, so if you want to start from the next event (in Java) do:
private BsonTimestamp getNextEventTimestamp(BsonTimestamp timestamp) {
    return new BsonTimestamp(timestamp.getValue() + 1);
}
In case of a network disconnection the change stream will not resume,
so I recommend taking the following approach in case of an error:
private void onException(Throwable throwable) {
    ScheduledExecutorService executorService = newSingleThreadScheduledExecutor();
    executorService.scheduleAtFixedRate(() -> recreateChangeStream(executorService), 0, 1, TimeUnit.SECONDS);
}

private void recreateChangeStream(ScheduledExecutorService executorService) {
    try {
        mongoTemplate.getDb().runCommand(new BasicDBObject("ping", "1"));
        container.stop();
        startNewContainer();
        executorService.shutdown();
    } catch (Exception ignored) {
    }
}
First I create a scheduled task that runs repeatedly (but only one at a time, thanks to newSingleThreadScheduledExecutor()). It tries to ping the DB; after a successful ping I stop the old container and start a new one. You can also pass the last timestamp you took so that you get all the events you might have missed (see the sketch below).
Timestamp retrieval from an event:
BsonTimestamp resumeAtTimestamp = changeStreamDocument.getClusterTime();
Then I shut down the task.
Also make sure the resumeAtTimestamp exists in the oplog...
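For completeness, a rough sketch of what the startNewContainer() call above could look like when it also passes along the last cluster time. This is only a sketch: the lastSeenTimestamp field is hypothetical (it would be updated in onDatabaseChangedEvent from changeStreamDocument.getClusterTime()), and it assumes the ChangeStreamOptions builder accepts a BsonTimestamp via resumeAt:
// Hypothetical sketch only; container and lastSeenTimestamp are assumed fields.
private void startNewContainer() {
    ChangeStreamOptions resumeOptions = ChangeStreamOptions.builder()
            .returnFullDocumentOnUpdate()
            // getNextEventTimestamp() from above skips the already-processed event
            .resumeAt(getNextEventTimestamp(lastSeenTimestamp))
            .build();
    ChangeStreamRequest.ChangeStreamRequestOptions requestOptions =
            new ChangeStreamRequest.ChangeStreamRequestOptions(mongoTemplate.getDb().getName(), null, resumeOptions);
    container = new DefaultMessageListenerContainer(mongoTemplate, new SimpleAsyncTaskExecutor(), this::onException);
    container.register(new ChangeStreamRequest<>(this::onDatabaseChangedEvent, requestOptions), Document.class);
    container.start();
}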

Scaffeine: how to set different expiration time for default value

Scala application use case:
We have a Scala-based module that reads data from the global cache (Redis) and saves it into the local cache (Scaffeine). As we want this data to be refreshed asynchronously, we are using a LoadingCache with the refreshAfterWrite duration set to a refresh window of 2.second.
Question:
We need to set a different expiry time when setting values in the local cache, based on whether the key is present in Redis (the global cache) or not.
e.g.
If the key is not present in the global cache, we want to save that key in the local cache with a default value and the refresh window set to 5.minutes.
If the key is present in the global cache, we want to store it in the local cache with the actual value and the refresh window set to 30.minutes.
Sample code
object LocalCache extends App {

  // data being stored in the cache
  class DataObject(data: String) {
    override def toString: String = {
      "[ 'data': '" + this.data + "' ]"
    }
  }

  // loader helper
  private def loaderHelper(key: Int): Future[DataObject] = {
    // this method will be replaced by a read from the Redis cache;
    // for now, it returns different values per key
    if (key == 1) Future.successful(new DataObject("LOADER_HELPER_1"))
    else if (key == 2) Future.successful(new DataObject("LOADER_HELPER_2"))
    else Future.successful(new DataObject("LOADER_HELPER"))
  }

  // async loader
  private def loader(key: Int): DataObject = {
    Try {
      Await.result(loaderHelper(key), 1.seconds)
    } match {
      case Success(result) =>
        result
      case Failure(exception: Exception) =>
        val temp: DataObject = new DataObject("LOADER")
        temp
    }
  }

  // initCache
  private def initCache(maximumSize: Int): LoadingCache[Int, DataObject] =
    Scaffeine()
      .recordStats()
      .expireAfterWrite(2.second)
      .maximumSize(maximumSize)
      .build(loader)

  // operations on the cache
  val cache: LoadingCache[Int, DataObject] = initCache(maximumSize = 500)
  cache.put(1, new DataObject("foo"))
  cache.put(2, new DataObject("hoo"))

  println("sleeping for 3 sec\n")
  Thread.sleep(3000)
  println(cache.getIfPresent(1).toString)
  println(cache.getIfPresent(2).toString)
  println(cache.get(3).toString)

  println("sleeping for 10 sec\n")
  Thread.sleep(10000)
  println("waking up from 10 sec sleep")
  println(cache.get(1).toString)
  println(cache.get(2).toString)
  println(cache.get(3).toString)

  println("\nCache Stats: " + cache.stats())
}
I see that custom policies can be used to override the expireAfter policies (expireAfterWrite/Update/Access), but I can't find anything for the refreshAfterWrite policy, which refreshes the data asynchronously. Any help will be appreciated.
P.S.
I'm quite new to Scala and am still exploring Scaffeine.
Unfortunately variable refresh is not supported yet. There is an open issue to provide that feature.
At the moment expiration can be custom per entry, but automatic refresh is fixed. A manual refresh may be triggered by LoadingCache.refresh(key) if you want to manage it yourself. For example, you could periodically iterate over the entries (via the asMap() view) and refresh manually based on custom criteria.
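As a rough illustration, a periodic manual refresh could look like the sketch below. It is written in plain Java against the underlying Caffeine LoadingCache (which Scaffeine wraps); the needsRefresh predicate is a made-up stand-in for whatever per-entry criteria you track yourself:
// Hypothetical sketch: periodically walk the asMap() view and trigger manual refreshes.
// `cache` is a com.github.benmanes.caffeine.cache.LoadingCache<Integer, DataObject>;
// `needsRefresh` is an assumed predicate for your own per-entry criteria.
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(() -> {
    for (Integer key : cache.asMap().keySet()) {
        if (needsRefresh(key)) {
            cache.refresh(key); // asynchronous reload through the cache loader
        }
    }
}, 1, 1, TimeUnit.MINUTES);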
The AsyncLoadingCache could be useful instead of blocking on a future within your cache loader. The cache will return the in-flight future, won't make it expirable until the value materializes, and will remove it if it fails. Note that the synchronous() view is very useful for async caches to access more operations.
For testing, you might find Guava's fake ticker useful to simulate time.

Esper EPL window select not working for a basic example

Everything I read says this should work: I need my listener to trigger every 10 seconds with events. What I am getting now is a listener trigger for every event that comes in. What am I missing? The basic requirement is to create summarized statistics every 10s. Ideally I just want to pump data into the runtime. So, in this example, I would expect a dump of 10 records once every 10 seconds.
class StreamTest {

    private final Configuration configuration = new Configuration();
    private final EPRuntime runtime;
    private final CompilerArguments args = new CompilerArguments();
    private final EPCompiler compiler;

    public StreamTest() {
        configuration.getCommon().addEventType(CommonLogEntry.class);
        runtime = EPRuntimeProvider.getRuntime(this.getClass().getSimpleName(), configuration);
        args.getPath().add(runtime.getRuntimePath());
        compiler = EPCompilerProvider.getCompiler();
    }

    @Test
    void testDisplayStatsEvery10S() throws Exception {
        // Display stats every 10s about the traffic during those 10s:
        EPCompiled compiled = compiler.compile("select * from CommonLogEntry.win:time(10)", args);
        runtime.getDeploymentService().deploy(compiled).getStatements()[0].addListener(
                (old, newEvents, epStatement, epRuntime) ->
                        Arrays.stream(old).forEach(e -> System.out.format("%s: received %n", LocalTime.now()))
        );
        new BufferedReader(new InputStreamReader(this.getClass().getResourceAsStream("/access.log")))
                .lines()
                .map(CommonLogEntry::new)
                .forEachOrdered(e -> {
                    runtime.getEventService().sendEventBean(e, e.getClass().getSimpleName());
                    try {
                        Thread.sleep(TimeUnit.SECONDS.toMillis(1));
                    } catch (InterruptedException ex) {
                        System.err.println(ex);
                    }
                });
    }
}
Which currently outputs every second, corresponding to the sleep in my stream:
11:00:54.676: received
11:00:55.684: received
11:00:56.689: received
11:00:57.694: received
11:00:58.698: received
11:00:59.700: received
A time window is a sliding window. There is a chapter on basic concepts that explains how data windows work; here is the link to the basic concepts chapter.
It is not clear what the requirements are, but I think what you want to achieve is collecting events for a while and then releasing them. You can draw inspiration from the solution patterns.
This will collect events for 10 seconds.
create schema StockTick(symbol string, price double);
create context CtxBatch start #now end after 10 seconds;
context CtxBatch select * from StockTick#keepall output snapshot when terminated;
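If it helps, wiring this into the test from the question could look roughly like the sketch below. It reuses the compiler, args and runtime fields defined there; the @name('batch') statement name and the listener body are just illustrative, and the StockTick schema would be replaced by your CommonLogEntry type in the real test:
// Sketch only: compile the module above and attach a listener to the named statement.
String epl =
        "create schema StockTick(symbol string, price double);\n"
        + "create context CtxBatch start @now end after 10 seconds;\n"
        + "@name('batch') context CtxBatch select * from StockTick#keepall output snapshot when terminated;";
EPCompiled compiled = compiler.compile(epl, args);
EPDeployment deployment = runtime.getDeploymentService().deploy(compiled);
runtime.getDeploymentService()
       .getStatement(deployment.getDeploymentId(), "batch")
       .addListener((newEvents, oldEvents, statement, epRuntime) ->
               System.out.format("%s: batch of %d events%n",
                       LocalTime.now(), newEvents == null ? 0 : newEvents.length));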

How to process a KStream in a batch of max size or fallback to a time window?

I would like to create a Kafka Streams-based application that processes a topic and takes messages in batches of size X (e.g. 50), but if the stream has low flow, gives me whatever the stream has within Y seconds (e.g. 5).
So, instead of processing messages one by one, I process a List[Record] where the size of the list is 50 (or maybe less).
This is to make some I/O bound processing more efficient.
I know that this can be implemented with the classic Kafka API but was looking for a stream-based implementation that can also handle offset committing natively, taking errors/failures into account.
I couldn't find anything related in the docs or by searching around and was wondering if anyone has a solution to this problem.
@Matthias J. Sax's answer is nice; I just want to add an example for this, as I think it might be useful for someone.
Let's say we want to combine incoming values into the following type:
public class MultipleValues { private List<String> values; }
To collect messages into batches with a max size, we need to create a transformer:
public class MultipleValuesTransformer implements Transformer<String, String, KeyValue<String, MultipleValues>> {

    private ProcessorContext processorContext;
    private String stateStoreName;
    private KeyValueStore<String, MultipleValues> keyValueStore;
    private Cancellable scheduledPunctuator;

    public MultipleValuesTransformer(String stateStoreName) {
        this.stateStoreName = stateStoreName;
    }

    @Override
    public void init(ProcessorContext processorContext) {
        this.processorContext = processorContext;
        this.keyValueStore = (KeyValueStore) processorContext.getStateStore(stateStoreName);
        scheduledPunctuator = processorContext.schedule(Duration.ofSeconds(30), PunctuationType.WALL_CLOCK_TIME, this::doPunctuate);
    }

    @Override
    public KeyValue<String, MultipleValues> transform(String key, String value) {
        MultipleValues itemValueFromStore = keyValueStore.get(key);
        if (isNull(itemValueFromStore)) {
            itemValueFromStore = MultipleValues.builder().values(Collections.singletonList(value)).build();
        } else {
            List<String> values = new ArrayList<>(itemValueFromStore.getValues());
            values.add(value);
            itemValueFromStore = itemValueFromStore.toBuilder()
                    .values(values)
                    .build();
        }
        if (itemValueFromStore.getValues().size() >= 50) {
            processorContext.forward(key, itemValueFromStore);
            keyValueStore.put(key, null);
        } else {
            keyValueStore.put(key, itemValueFromStore);
        }
        return null;
    }

    private void doPunctuate(long timestamp) {
        KeyValueIterator<String, MultipleValues> valuesIterator = keyValueStore.all();
        while (valuesIterator.hasNext()) {
            KeyValue<String, MultipleValues> keyValue = valuesIterator.next();
            if (nonNull(keyValue.value)) {
                processorContext.forward(keyValue.key, keyValue.value);
                keyValueStore.put(keyValue.key, null);
            }
        }
    }

    @Override
    public void close() {
        scheduledPunctuator.cancel();
    }
}
and we need to create a key-value store, add it to the StreamsBuilder, and build the KStream flow using the transform method:
Properties props = new Properties();
...
Serde<MultipleValues> multipleValuesSerde = Serdes.serdeFrom(new JsonSerializer<>(), new JsonDeserializer<>(MultipleValues.class));

StreamsBuilder builder = new StreamsBuilder();
String storeName = "multipleValuesStore";
KeyValueBytesStoreSupplier storeSupplier = Stores.persistentKeyValueStore(storeName);
StoreBuilder<KeyValueStore<String, MultipleValues>> storeBuilder =
        Stores.keyValueStoreBuilder(storeSupplier, Serdes.String(), multipleValuesSerde);
builder.addStateStore(storeBuilder);

builder.stream("source", Consumed.with(Serdes.String(), Serdes.String()))
        .transform(() -> new MultipleValuesTransformer(storeName), storeName)
        .print(Printed.<String, MultipleValues>toSysOut().withLabel("transformedMultipleValues"));

KafkaStreams kafkaStreams = new KafkaStreams(builder.build(), props);
kafkaStreams.start();
With this approach we use the incoming key as the key we aggregate by. If you need to collect messages not by key but by some of the message's fields, you need the following flow to trigger repartitioning of the KStream (by using an intermediate topic):
.selectKey(..)
.through(intermediateTopicName)
.transform( ..)
The simplest way might be to use a stateful transform() operation. Each time you receive a record, you put it into the store. When you have received 50 records, you do your processing, emit output, and delete the records from the store.
To enforce processing if you don't reach the limit within a certain amount of time, you can register a wall-clock punctuation.
It seems that there is no need to use Processors or Transformers and transform() to batch events by count. Regular groupBy() and reduce()/aggregate() should do the trick:
KeyValueSerde keyValueSerde = new KeyValueSerde(); // simple custom Serde
final AtomicLong batchCount = new AtomicLong(0L);
myKStream
    .groupBy((k, v) -> KeyValue.pair(k, batchCount.getAndIncrement() / batchSize),
             Grouped.keySerde(keyValueSerde))
    .reduce(this::windowReducer) // <-- how you want to aggregate values in batch
    .toStream()
    .filter((k, v) -> /* pass through full batches only */)
    .selectKey((k, v) -> k.key)
    ...
You'd also need to add a straightforward Serde for the standard KeyValue<String, Long>.
This option is obviously only helpful when you don't need a "punctuator" to emit incomplete batches on timeout. It also doesn't guarantee the order of elements in the batch in case of distributed processing.
You can also concatenate the count to the key string to form the new key (instead of using KeyValue). That would simplify the example even further (to using Serdes.String()).
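To make that last variant concrete, here is a rough sketch with a plain string key; it is not from the original answer, and batchSize, windowReducer and valueSerde are assumed to exist as in the snippet above:
// Hypothetical variant: encode the batch number into the key as "originalKey#batchNumber",
// so that plain Serdes.String() is enough and no custom KeyValue Serde is needed.
final AtomicLong batchCount = new AtomicLong(0L);
myKStream
    .groupBy((k, v) -> k + "#" + (batchCount.getAndIncrement() / batchSize),
             Grouped.with(Serdes.String(), valueSerde))
    .reduce(this::windowReducer)
    .toStream()
    .filter((k, v) -> true /* pass through full batches only, as above */)
    .selectKey((k, v) -> k.substring(0, k.lastIndexOf('#'))) // strip the batch number again
    // ...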

Flink getting past bad messages in Kafka: "poison message"

This is the first time I'm trying to get this to work, so bear with me. I'm trying to
learn checkpointing with Kafka and handling "bad" messages, restarting
without losing state.
Use Case:
Use checkpointing.
Read a stream of integers from Kafka, keep a running sum.
If a "bad" Kafka message read, restart app, skip the "bad" message, keep
state. My stream would look something look like this:
set1,5
set1,7
set1,foobar
set1,6
I want my app to keep a running sum of the integers it has seen and, if it crashes,
restart without losing state, so the app behavior/running sum would be:
5,
12,
app crashes and restarts, reads checkpoint
18
etc.
However, I'm finding that when my app restarts, it keeps reading the bad "foobar"
message and doesn't get past it. Source code below. The mapper bombs when I
try to parse "foobar" as an Integer.
How can I modify the app to get past the "poison" message?
env.enableCheckpointing(1000L);
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500L);
env.getCheckpointConfig().setCheckpointTimeout(10000);
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
env.setStateBackend(new FsStateBackend("hdfs://mymachine:9000/flink/checkpoints"));

Properties properties = new Properties();
properties.setProperty("bootstrap.servers", BROKERS);
properties.setProperty("zookeeper.connect", ZOOKEEPER_HOST);
properties.setProperty("group.id", "consumerGroup1");

FlinkKafkaConsumer08 kafkaConsumer = new FlinkKafkaConsumer08<>(topicName, new SimpleStringSchema(), properties);
DataStream<String> messageStream = env.addSource(kafkaConsumer);
DataStream<Tuple2<String, Integer>> sums = messageStream
        .map(new NumberMapper())
        .keyBy(0)
        .sum(1);
sums.print();

private static class NumberMapper implements MapFunction<String, Tuple2<String, Integer>> {

    public Tuple2<String, Integer> map(String input) throws Exception {
        return parseData(input);
    }

    private Tuple2<String, Integer> parseData(String record) {
        String[] tokens = record.toLowerCase().split(",");
        // Get Key
        String key = tokens[0];
        // Get Integer Value
        String integerValue = tokens[1];
        System.out.println("Trying to Parse=" + integerValue);
        Integer value = Integer.parseInt(integerValue);
        // Build Tuple
        return new Tuple2<String, Integer>(key, value);
    }
}
You could change the NumberMapper into a FlatMap and filter out invalid elements:
private static class NumberMapper implements FlatMapFunction<String, Tuple2<String, Integer>> {

    public void flatMap(String input, Collector<Tuple2<String, Integer>> collector) throws Exception {
        Optional<Tuple2<String, Integer>> optionalResult = parseData(input);
        optionalResult.ifPresent(collector::collect);
    }

    private Optional<Tuple2<String, Integer>> parseData(String record) {
        String[] tokens = record.toLowerCase().split(",");
        // Get Key
        String key = tokens[0];
        // Get Integer Value
        String integerValue = tokens[1];
        try {
            Integer value = Integer.parseInt(integerValue);
            // Build Tuple
            return Optional.of(Tuple2.of(key, value));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
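The pipeline from the question would then call flatMap instead of map, roughly like this (everything else stays the same):
// The keyed sum stays the same; only the operator changes from map() to flatMap().
DataStream<Tuple2<String, Integer>> sums = messageStream
        .flatMap(new NumberMapper())
        .keyBy(0)
        .sum(1);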