JPA: Automatically delete data after a time interval

I have the following scenario:
Data is created by the sender on the server.
If the receiver does not request the data within (for example) three days, the data should be deleted; otherwise the receiver gets the data, and after it has been retrieved the data is deleted.
In other words:
The data should only exist for a limited time.
I could create a small jar file that deletes the data, probably run as a cron job. But I think that is a very bad solution.
Is it possible to create a trigger with JPA/Java EE that is called after a specific time period? As far as I know, triggers can only be called before/after insert, update and delete events, so that won't be a solution.
Note: I'm currently using an H2 database and WildFly for my RESTEasy application. But I might change that in the future, so the solution should be adaptable.
Best regards,
Maik

Java EE brings everything you need. You can @Schedule a periodically executed cleanup job:

@Singleton
public class Cleanup {

    // hour defaults to "0", so it must be widened for the job to really run every 5 minutes
    @Schedule(minute = "*/5", hour = "*")
    public void clean() {
        // will be called every 5 minutes
    }
}
Or you can programmatically create a Timer which will be executed after a certain amount of time:
@Singleton
public class Cleanup {

    @Resource
    TimerService timerService;

    public void deleteLater(long id) {
        String timerInfo = MyEntity.class.getName() + ':' + id;
        long timeout = 1000 * 60 * 5; // 5 minutes
        timerService.createTimer(timeout, timerInfo);
    }

    @Timeout
    public void handleTimeout(Timer timer) {
        Serializable timerInfo = timer.getInfo(); // the information you passed to the timer
    }
}
More information can be found in the Timer Service Tutorial.
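For the actual cleanup, the scheduled method could issue a bulk JPQL delete based on a creation timestamp. The following is only a sketch under assumptions: the entity name StoredData and its createdAt column are hypothetical stand-ins for your own data model.

@Singleton
public class Cleanup {

    @PersistenceContext
    private EntityManager em;

    // runs at the top of every hour and removes rows older than three days
    @Schedule(minute = "0", hour = "*")
    public void clean() {
        Date limit = new Date(System.currentTimeMillis() - TimeUnit.DAYS.toMillis(3));
        em.createQuery("DELETE FROM StoredData d WHERE d.createdAt < :limit")
          .setParameter("limit", limit, TemporalType.TIMESTAMP)
          .executeUpdate();
    }
}

Since the bean is a container-managed EJB, the bulk delete runs inside a container-managed transaction; the same query could also be triggered from the @Timeout method of the Timer variant.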

Related

Design of a pipeline that invokes a maximum number of requests per second

My goal is to create a pipeline that invokes a back-end (Cloud hosted) service a maximum number of times per second ... how can I achieve that?
Back story: Imagine a back-end service that is invoked with a single input and returns a single output. This service has quotas associated with it that permit a maximum number of requests per second (let's say 10 requests per second). Now imagine an unbounded source PCollection where I wish to transform the elements in the input by passing them through my back-end service. I can envisage a ParDo invoking the back-end service once for each element in the input PCollection. However, this doesn't perform any kind of flow control against the back-end.
I could imagine my DoFn logic testing the response from the back-end and retrying until it succeeds, but this doesn't feel right. If I have 100 workers, then I seem to be burning a lot of resources and putting a load on the back-end. What I think I want to do is throttle the calls to the back-end from the pipeline.
Good Day, kolban. In addition to Bruno Volpato's helpful RampupThrottlingFn example, I've seen a combination of the following. Please do not hesitate at all to let me know how I can update the example with more clarity.
PeriodicImpulse - emits an Instant at a fixed specified interval.
Fix the number of workers with the maxNumWorkers and numWorkers (Please see Dataflow Pipeline Options), if using the Dataflow runner.
Beam Metrics API to monitor the actual resource request count over time and set alerts. When using Dataflow, the Beam Metrics API automatically connects to Cloud Monitoring as custom metrics (a sketch of querying the counter directly follows the example code below).
The following shows abbreviated code starting from the whole pipeline followed by some details as needed to provide clarity. It assumes a target of 10 workers, using Dataflow with the arguments --maxNumWorkers=10 and --numWorkers=10 and a goal to limit the resource requests among all workers to 10 requests per second. This translates to 1 request per second per worker.
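If the worker count is set programmatically rather than on the command line, a hedged sketch could look like the following (assuming the Dataflow runner dependency, which provides DataflowPipelineOptions, is on the classpath):

DataflowPipelineOptions options = PipelineOptionsFactory
    .fromArgs(args)
    .withValidation()
    .as(DataflowPipelineOptions.class);
options.setNumWorkers(10);     // start with 10 workers
options.setMaxNumWorkers(10);  // prevent autoscaling beyond 10 workers
Pipeline pipeline = Pipeline.create(options);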
PeriodicImpulse limits the Request creation to 1 per second
public class MyPipeline {

    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(/* Usually with options */);

        PCollection<Response> responses = pipeline
            .apply(
                "PeriodicImpulse",
                PeriodicImpulse
                    .create()
                    .withInterval(Duration.standardSeconds(1L)))
            .apply(
                "Build Requests",
                ParDo.of(new RequestFn()))
            .apply(ResourceTransform.create());

        pipeline.run();
    }
}
The RequestFn DoFn emits a Request for each Instant emitted by PeriodicImpulse:
class RequestFn extends DoFn<Instant, Request> {

    @ProcessElement
    public void process(@Element Instant instant, OutputReceiver<Request> receiver) {
        receiver.output(
            // populate the required id; here derived from the triggering instant
            Request.builder().setId(instant.toString()).build());
    }
}
ResourceTransform transforms Requests to Responses, incrementing a Counter
class ResourceTransform extends PTransform<PCollection<Request>, PCollection<Response>> {

    static ResourceTransform create() {
        return new ResourceTransform();
    }

    @Override
    public PCollection<Response> expand(PCollection<Request> input) {
        return input.apply("Consume Resource", ParDo.of(new ResourceFn()));
    }
}
class ResourceFn extends DoFn<Request, Response> {

    private Counter counter = Metrics.counter(ResourceFn.class, "some:resource");

    private transient ResourceClient client = null;

    @Setup
    public void setup() {
        client = new ResourceClient();
    }

    @ProcessElement
    public void process(@Element Request request, OutputReceiver<Response> receiver) {
        counter.inc(); // Increment the counter to track resource requests over time.
        // not showing error handling
        Response response = client.execute(request);
        receiver.output(response);
    }
}
Request and Response classes
(Aside: consider creating a Schema for the request input and response output classes. Example below uses AutoValue and AutoValueSchema)
@DefaultSchema(AutoValueSchema.class)
@AutoValue
abstract class Request {

    /* abstract Getters. */
    abstract String getId();

    // factory method for the generated builder, so Request.builder() in RequestFn compiles
    static Builder builder() {
        return new AutoValue_Request.Builder();
    }

    @AutoValue.Builder
    static abstract class Builder {

        /* abstract Setters. */
        abstract Builder setId(String value);

        abstract Request build();
    }
}
@DefaultSchema(AutoValueSchema.class)
@AutoValue
abstract class Response {

    /* abstract Getters. */
    abstract String getId();

    @AutoValue.Builder
    static abstract class Builder {

        /* abstract Setters. */
        abstract Builder setId(String value);

        abstract Response build();
    }
}
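Outside of Dataflow's Cloud Monitoring integration, the same counter can also be read through the Beam Metrics API once the pipeline has run. The following is a minimal sketch, assuming a runner whose PipelineResult supports metrics querying (for example the DirectRunner):

PipelineResult result = pipeline.run();
result.waitUntilFinish();

MetricQueryResults metrics = result
    .metrics()
    .queryMetrics(
        MetricsFilter.builder()
            .addNameFilter(MetricNameFilter.named(ResourceFn.class, "some:resource"))
            .build());

// print the attempted count of the request counter
for (MetricResult<Long> counter : metrics.getCounters()) {
    System.out.println(counter.getName() + ": " + counter.getAttempted());
}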

How can I have multiple instances of a Spring Boot repository (interface), to achieve complete test-state isolation?

1) Contextualization:
In order to have complete test-state isolation in all tests of my test class,
I would like to have a new repository (DAO) instance for each individual test;
My repository is an interface, which is why I cannot simply instantiate it.
My goal is:
Run all tests in parallel, meaning at the same time;
That is why I need individual/multiple instances of the repository (DAO) in each test;
Those multiple instances will make sure that a test finishing does not interfere with tests that are still running.
Below is the code for the above situation:
1.1) Code:
Current working status: working, BUT with the SAME repository instance;
Current behaviour:
The tests are not stable;
SOMETIMES they interfere with each other;
meaning, the test that finishes early destroys the repository bean that is still being used by the test that is still running.
public class ServiceTests2 extends ConfigTests {

    private List<Customer> customerList;
    private Flux<Customer> customerFlux;

    @Lazy
    @Autowired
    private ICustomerRepo repo;

    private ICustomerService service;

    @BeforeEach
    public void setUp() {
        service = new CustomerService(repo);

        Customer customer1 = customerWithName().create();
        Customer customer2 = customerWithName().create();

        customerList = Arrays.asList(customer1, customer2);
        customerFlux = service.saveAll(customerList);
    }

    @Test
    @DisplayName("Save")
    public void save() {
        StepVerifier.create(customerFlux)
                    .expectNextSequence(customerList)
                    .verifyComplete();
    }

    @Test
    @DisplayName("Find: Objects")
    public void find_object() {
        StepVerifier
            .create(customerFlux)
            .expectNext(customerList.get(0))
            .expectNext(customerList.get(1))
            .verifyComplete();
    }
}
2) The ERROR happening:
This ERROR happens in the failed tests:
3) Question:
How can I create multiple instances of the repository,
even though it is an interface (which does not allow instantiation),
in order to have COMPLETE TEST ISOLATION,
meaning: ONE different instance of the repository in each test?
Thanks a lot for any help or idea.
You can use the @DirtiesContext annotation on the test class that modifies the application context.
Java Doc
Spring documentation
By default, this will mark the application context as dirty after the entire test class is run. If you would like to mark the context as dirty after a single test method, then you can either annotate the test method instead or set the classMode property to AFTER_EACH_TEST_METHOD at your class level annotation.
@DirtiesContext(classMode = ClassMode.AFTER_EACH_TEST_METHOD)
When an application context is marked dirty, it is removed from the testing framework's cache and closed; thus the underlying Spring container is rebuilt for any subsequent test that requires a context with the same set of resource locations.
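Applied to the test class from the question, a minimal sketch could look like the following (the annotation and ClassMode come from spring-test; the class name is taken from the question):

import org.springframework.test.annotation.DirtiesContext;
import org.springframework.test.annotation.DirtiesContext.ClassMode;

// A fresh application context (and therefore a fresh repository bean) is built
// for every test method, at the cost of slower test runs.
@DirtiesContext(classMode = ClassMode.AFTER_EACH_TEST_METHOD)
public class ServiceTests2 extends ConfigTests {
    // ... tests as shown in the question
}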

How to make multiple DB calls from different threads under the same transaction?

I have a requirement to perform a clean insert (delete + insert) of a huge number of records (close to 100K) per request. For testing purposes, I'm testing my code with 10K. Even with 10K, the operation runs for 30 seconds, which is not acceptable. I'm doing some level of batch inserts provided by Spring Data JPA. However, the results are not satisfactory.
My code looks like this:
@Transactional
public void saveAll(HttpServletRequest httpRequest) throws IOException {
    List<Person> persons = new ArrayList<>();
    try (ServletInputStream sis = httpRequest.getInputStream()) {
        deletePersons(); // deletes all persons based on some criteria
        Person p;
        while ((p = nextPerson(sis)) != null) {
            persons.add(p);
            if (persons.size() % 2000 == 0) {
                savePersons(persons); // uses Spring repository to perform saveAll() and flush()
                persons.clear();
            }
        }
        savePersons(persons); // uses Spring repository to perform saveAll() and flush()
        persons.clear();
    }
}
@Transactional
public void savePersons(List<Person> persons) {
    System.out.println(new Date() + " Before save");
    repository.saveAll(persons);
    repository.flush();
    System.out.println(new Date() + " After save");
}
I have also set below properties
spring.jpa.properties.hibernate.jdbc.batch_size=40
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
spring.jpa.properties.hibernate.jdbc.batch_versioned_data=true
spring.jpa.properties.hibernate.id.new_generator_mappings=false
Looking at the logs, I noticed that the insert operation takes around 3 to 4 seconds to save 2000 records, but not much time on iteration. So I believe the time taken to read through the stream is not a bottleneck; the inserts are. I also checked the logs and confirmed that Spring is doing batches of 40 inserts as per the property set.
I'm trying to see if there is a way to improve the performance by using multiple threads (say 2) that would read from a blocking queue and, once they have accumulated say 2000 records, call save. In theory, this may provide better results. But the problem is, as I read, Spring manages transactions at the thread level, and a transaction cannot propagate across threads. I need the whole operation (delete + insert) to be atomic. I looked into a few posts about Spring transaction management but could not find the right direction.
Is there a way I can achieve this kind of parallelism using Spring transactions? If Spring transactions are not the answer, are there any other techniques that can be used?
Thanks
Unsure if this will be helpful to you - it is working well in a test app. Also, I do not know if it will be in the "good graces" of senior Spring personnel, but my hope is to learn, so I am posting this suggestion.
In a Spring Boot test app, the following injects a JPA repository into the ApplicationRunner, which then injects the same into Runnables managed by an ExecutorService. Each Runnable gets a BlockingQueue that is continually filled by a separate KafkaConsumer (which acts as a producer for the queue). The Runnables use queue.take() to pop from the queue, and this is followed by a repo.save(). (Batch inserts could readily be added to the thread, but that hasn't been done since the application has not yet required it; a sketch of that variation follows the run() method below.)
The test app currently implements JPA for Postgres (or Timescale) DB and is running 10 threads with 10 queues being fed by 10 Consumers.
The JPA repository is provided by:
public interface DataRepository extends JpaRepository<DataRecord, Long> {
}
The Spring Boot main program is:
@SpringBootApplication
@EntityScan(basePackages = "com.xyz.model")
public class DataApplication {

    private final String[] topics = { "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9" };

    ExecutorService executor = Executors.newFixedThreadPool(topics.length);

    public static void main(String[] args) {
        SpringApplication.run(DataApplication.class, args);
    }

    @Bean
    ApplicationRunner init(DataRepository dataRepository) {
        return args -> {
            for (String topic : topics) {
                BlockingQueue<DataRecord> queue = new ArrayBlockingQueue<>(1024);
                JKafkaConsumer consumer = new JKafkaConsumer(topic, queue);
                consumer.start();
                JMessageConsumer messageConsumer = new JMessageConsumer(dataRepository, queue);
                executor.submit(messageConsumer);
            }
            executor.shutdown();
        };
    }
}
And the Consumer Runnable has a constructor and run() method as follows:
public JMessageConsumer(DataRepository dataRepository, BlockingQueue<DataRecord> queue) {
    this.queue = queue;
    this.dataRepository = dataRepository;
}

@Override
public void run() {
    running.set(true);
    while (running.get()) {
        // remove record from FIFO blocking queue
        DataRecord dataRecord;
        try {
            dataRecord = queue.take();
        } catch (InterruptedException e) {
            logger.error("queue exception: " + e.getMessage());
            continue;
        }
        // write to database
        dataRepository.save(dataRecord);
    }
}
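As mentioned above, batch inserts could be added to the thread. The following is a hedged sketch of that variation (not part of the original test app): the loop drains whatever has accumulated in the queue into a list and saves it in one saveAll() call.

@Override
public void run() {
    running.set(true);
    List<DataRecord> batch = new ArrayList<>(2000);
    while (running.get()) {
        try {
            // block until at least one record is available, then drain the rest
            batch.add(queue.take());
            queue.drainTo(batch, 1999);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            break;
        }
        // one saveAll() per batch instead of one save() per record
        dataRepository.saveAll(batch);
        batch.clear();
    }
}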
I'm into learning, so any thoughts/concerns/feedback are appreciated...

Score corruption when using computed values to calculate score

I have a use case where:
A job can be one of many types, say A, B and C.
A tool can be configured to one of the types A, B or C.
A job can be assigned to a tool. The end time of the job depends on the currently configured type of the tool. If the tool's currently configured type is different from the type of the job, then time needs to be added to change the tool configuration.
My @PlanningEntity is Allocation, with startTime and tool as @PlanningVariable. I tried to add the currentConfiguredToolType in the Allocation as a @CustomShadowVariable and update the toolType in the shadow listener's afterVariableChanged() method, so that I have the correct toolType for the next job assigned to the tool. However, it is giving me inconsistent results.
[EDIT]: I did some debugging to see if the toolType is set correctly. I found that the toolType is being set correctly in the afterVariableChanged() method. However, when I looked at the next job assigned to the tool, I saw that the toolType had not changed. Is it because of multiple threads executing this flow? One thread changing the toolType of the tool the first time, and a second thread simultaneously assigning the times the second time without taking into account the changes made by the first thread?
[EDIT]: I was using 6.3.0.Final earlier (until yesterday). I switched to 6.5.0.Final today. There too I am seeing similar results, where the toolType seems to be set properly in the afterVariableChanged() method but is not taken into account for the next allocation on that tool.
[EDIT]: Domain code looks something like below:
@PlanningEntity
public class Allocation {

    private Job job;

    // planning variables
    private LocalDateTime startTime;
    private Tool tool;

    // shadow variable
    private ToolType toolType;

    private LocalDateTime endTime;

    @PlanningVariable(valueRangeProviderRefs = TOOL_RANGE)
    public Tool getTool() {
        return this.tool;
    }

    @PlanningVariable(valueRangeProviderRefs = START_TIME_RANGE)
    public LocalDateTime getStartTime() {
        return this.startTime;
    }

    @CustomShadowVariable(variableListenerClass = ToolTypeVariableListener.class,
            sources = {@CustomShadowVariable.Source(variableName = "tool")})
    public ToolType getCurrentToolType() {
        return this.toolType;
    }

    private void setToolType(ToolType type) {
        this.toolType = type;
        this.tool.setToolType(type);
    }

    private void setStartTime(LocalDateTime startTime) {
        this.startTime = startTime;
        this.endTime = getTimeTakenForJob() + getTypeChangeTime();
        ...
    }

    private LocalDateTime getTypeChangeTime() {
        // typeChangeTimeMap is available and is populated with data
        return typeChangeTimeMap.get(tool.getType());
    }
}
public class Tool {
    ...
    private ToolType toolType;

    // getter and setter for this:
    public void setToolType(ToolType toolType) { ... }
    public ToolType getToolType() { ... }
}
public class ToolTypeVariableListener implements VariableListener<Allocation> {
    ...
    @Override
    public void afterVariableChanged(ScoreDirector scoreDirector, Allocation entity) {
        scoreDirector.afterVariableChanged(entity, "currentToolType");
        if (entity.getTool() != null && entity.getStartTime() != null) {
            entity.setCurrentToolType(entity.getJob().getType());
        }
        scoreDirector.afterVariableChanged(entity, "currentToolType");
    }
}
[EDIT]: When I did some debugging, it looks like the toolType set in the machine for one allocation is used in calculating the type change time for an allocation belonging to a different evaluation set. Not sure how to avoid this.
If this is indeed the case, what is a good way to model problems like this, where the state of an item affects the time taken? Or am I totally off? I guess I am totally lost here.
[EDIT]: This is not an issue with how OptaPlanner is invoked, but score corruption when the rule to penalize based on endTime is added. More details in the comments.
[EDIT]: I commented out the rules one by one and saw that the score corruption occurs only when the score depends on the computed values endTime and toolTypeChange. It is fine when the score depends on startTime, which is a planning variable alone. However, that does not give me the best results. It gives me a solution with a negative hard score, which means it violated the rule of not assigning the same tool to different jobs at the same time.
Can computed values not be used for score calculations?
Any help or pointer is greatly appreciated.
best,
Alice
The ToolTypeVariableListener seems to lack calls to the before/after methods, which can cause score corruption. Turn on FULL_ASSERT to verify.
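For illustration, the usual pattern is to wrap the actual shadow variable change in a beforeVariableChanged()/afterVariableChanged() pair. The following is a sketch under that assumption, reusing the names from the question; it is not the poster's original code:

public class ToolTypeVariableListener implements VariableListener<Allocation> {

    @Override
    public void afterVariableChanged(ScoreDirector scoreDirector, Allocation entity) {
        if (entity.getTool() != null && entity.getStartTime() != null) {
            // notify the score director BEFORE mutating the shadow variable ...
            scoreDirector.beforeVariableChanged(entity, "currentToolType");
            entity.setCurrentToolType(entity.getJob().getType());
            // ... and AFTER, so incremental score calculation stays consistent
            scoreDirector.afterVariableChanged(entity, "currentToolType");
        }
    }

    // beforeVariableChanged, beforeEntityAdded, afterEntityAdded, etc. omitted
}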

How to access memcached asynchronously in netty

I am writing a server in Netty, in which I need to make a call to memcached. I am using spymemcached and can easily make the memcached call synchronously. I would like this memcached call to be async. Is that possible? The examples provided with Netty do not seem to be helpful.
I tried using callbacks: I created an ExecutorService pool in my handler and submitted a callback worker to this pool, like this:
public class MyHandler extends ChannelInboundMessageHandlerAdapter<MyPOJO> implements CallbackInterface {
    ...
    private static ExecutorService pool = Executors.newFixedThreadPool(20);

    @Override
    public void messageReceived(ChannelHandlerContext ctx, MyPOJO pojo) {
        ...
        CallingbackWorker worker = new CallingbackWorker(key, this);
        pool.submit(worker);
        ...
    }

    public void myCallback() {
        // get response
        this.ctx.nextOutboundMessageBuf().add(response);
    }
}
CallingbackWorker looks like:
public class CallingbackWorker implements Callable<Object> {

    public CallingbackWorker(String key, CallbackInterface c) {
        this.c = c;
        this.key = key;
    }

    public Object call() {
        // get value from key
        c.myCallback(value);
        return value;
    }
}
However, when I do this, this.ctx.nextOutboundMessageBuf() in myCallback gets stuck.
So, overall, my question is: how to do async memcached calls in Netty?
There are two problems here: a small-ish issue with the way you're trying to code this, and a bigger one with many libraries that provide async service calls, but no good way to take full advantage of them in an async framework like Netty. That forces users into suboptimal hacks like this one, or a less-bad, but still not ideal approach I'll get to in a moment.
First the coding problem. The issue is that you're trying to call a ChannelHandlerContext method from a thread other than the one associated with your handler, which is not allowed. That's pretty easy to fix, as shown below. You could code it a few other ways, but this is probably the most straightforward:
private static ExecutorService pool = Executors.newFixedThreadPool(20);

public void channelRead(final ChannelHandlerContext ctx, final Object msg) {
    //...
    final GetFuture<String> future = memcachedClient().asyncGet("foo", stringTranscoder());

    // first wait for the response on a pool thread
    pool.execute(new Runnable() {
        public void run() {
            String value;
            Exception err;
            try {
                value = future.get(3, TimeUnit.SECONDS); // or whatever timeout you want
                err = null;
            } catch (Exception e) {
                err = e;
                value = null;
            }
            // put results into final variables; compiler won't let us do it directly above
            final String fValue = value;
            final Exception fErr = err;

            // now process the result on the ChannelHandler's thread
            ctx.executor().execute(new Runnable() {
                public void run() {
                    handleResult(fValue, fErr);
                }
            });
        }
    });
    // note that we drop through to here right after calling pool.execute() and
    // return, freeing up the handler thread while we wait on the pool thread.
}

private void handleResult(String value, Exception err) {
    // handle it
}
That will work, and might be sufficient for your application. But you've got a fixed-size thread pool, so if you're ever going to handle much more than 20 concurrent connections, that will become a bottleneck. You could increase the pool size or use an unbounded one, but at that point you might as well be running under Tomcat, as memory consumption and context-switching overhead start to become issues, and you lose the scalability that was the attraction of Netty in the first place!
And the thing is, Spymemcached is NIO-based, event-driven, and uses just one thread for all its work, yet provides no way to fully take advantage of its event-driven nature. I expect they'll fix that before too long, just as Netty 4 and Cassandra recently have, by providing callback (listener) methods on Future objects.
Meanwhile, being in the same boat as you, I researched the alternatives, and not being too happy with what I found, I wrote (yesterday) a Future tracker class that can poll up to thousands of Futures at a configurable rate, and call you back on the thread (Executor) of your choice when they complete. It uses just one thread to do this. I've put it up on GitHub if you'd like to try it out, but be warned that it's still wet, as they say. I've tested it a lot in the past day, and even with 10000 concurrent mock Future objects, polling once a millisecond, its CPU utilization is negligible, though it starts to go up beyond 10000. Using it, the example above looks like this:
// in some globally-accessible class:
public static final ForeignFutureTracker FFT = new ForeignFutureTracker(1, TimeUnit.MILLISECONDS);

// in a handler class:
public void channelRead(final ChannelHandlerContext ctx, final Object msg) {
    // ...
    final GetFuture<String> future = memcachedClient().asyncGet("foo", stringTranscoder());

    // add a listener for the Future, with a timeout in 2 seconds, and pass
    // the Executor for the current context so the callback will run
    // on the same thread.
    Global.FFT.addListener(future, 2, TimeUnit.SECONDS, ctx.executor(),
        new ForeignFutureListener<String, GetFuture<String>>() {

            public void operationSuccess(String value) {
                // do something ...
                ctx.fireChannelRead(someval);
            }

            public void operationTimeout(GetFuture<String> f) {
                // do something ...
            }

            public void operationFailure(Exception e) {
                // do something ...
            }
        });
}
You don't want more than one or two FFT instances active at any time, or they could become a drain on CPU. But a single instance can handle thousands of outstanding Futures; about the only reason to have a second one would be to handle higher-latency calls, like S3, at a slower polling rate, say 10-20 milliseconds.
One drawback of the polling approach is that it adds a small amount of latency. For example, polling once a millisecond, on average it will add 500 microseconds to the response time. That won't be an issue for most applications, and I think is more than offset by the memory and CPU savings over the thread pool approach.
I expect within a year or so this will be a non-issue, as more async clients provide callback mechanisms, letting you fully leverage NIO and the event-driven model.