I want to reuse an existing, transactional,paginated service class, which retrieves the items using JPA from a database, inside a Spring batch job, as a reader. I want to do that instead of using directly the JpaPagingItemReader basically because the JPA query is more complex to build and the service already provides this functionality.
My question would be what are the things I should take into account when developing the Spring batch adapter over this service. Although the reference documentation http://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#pagingItemReaders has a section on reusing existing services, it doesn't say anything regarding the constraints, if there are any, of using such a transactional service.
Now, I looked at the JpaPagingItemReader as an example for building the reader, and I came up with a couple of questions I couldn't find answers for netiher in the documentation or on stackoverflow, although this post https://stackoverflow.com/a/26549831/4473261 helped.
The first thing I noticed is that a new transaction is used by the JpaPagingItemReader for reading a page of data. The above post says that this new transaction is needed "so that features like retry and skip can be correctly performed.". I have also found this article related to the matter https://blog.codecentric.de/en/2012/03/transactions-in-spring-batch-part-3-skip-and-retry/ that says that "when a skippable exception occurs during reading, we just increase the skip count and keep the exception for a later call on the onSkipInRead method of the SkipListener, if configured. There’s no rollback". So I assume that the reader has to do any reading of the records in a new transaction so that if a rollback of the transaction started when the processing of the chunk began happened, then the reader is not affected. I am wondering if this is true and if in this case my adapter should create a new transaction, invoke the service inside that transaction and then commit the transaction, similarly to how the JpaPagingItemReader does it. If that's true though, I wonder why there isn't any template provided by the framework which creates the transaction, delegates to the service the actual call to retrieve the data and then commits the transaction.
Greetings,
Cristi
From a reader perspective, there really isn't much to be concerned about. You can see in our JmsItemReader which obviously works with a transactional store that we don't take any additional precautions within the ItemReader itself.
What really matters is how you configure your step. When configuring your step, you'll need to mark the reader as transactional so that Spring Batch handles rollback correctly. When Spring Batch reads items in a fault tollerant step, the default behavior is to buffer them so that they won't be re-read on failure (retry, skip, etc). However, since the items read from a transactional store are tied to the transaction (and therefore reset when the rollback occurs), you need to tell Spring Batch to not buffer the items as they are read.
To mark the ItemReader as transactional, you'll set the not-quite-well-named flag is-reader-transactional-queue to true. You can read more about configuring steps and transactions in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/configureStep.html
Related
I am configuring a new Job where I need to read the data from the database and in the processor, the data will be used to call a Rest endpoint with payload. In the payload along with dynamic data, I need to pass reference data which is constant for each record getting processed in the job. This reference data is stored in DB. I am thinking to implement the following approach.
In the beforeJob listener method make a DB call and populate the reference data object and use it for the whole job run.
In the processor make a DB call to get the reference data and cache the query so there will be no DB call to fetch the same data for each record.
Please suggest if these approaches are correct or if there is a better way to implement them in Spring batch.
For performance reasons, I would not recommend doing a DB call in the item processor, unless that is really a requirement.
The first approach seems reasonable to me, since the reference data is constant. You can populate/clear a cache with a JobExecutionListener and use the cache in your chunk-oriented step. Please refer to the following thread for more details and a complete sample: Spring Batch With Annotation and Caching.
I have a simple model with repository configured persisting to postgresql. Using spring-data-rest, the api's are available out of the box for all the crud operations.
Now I want to introduce the caching with redis-6.0. So that any write(rest api's for POST or PUT, DELETE) operation, the model is persisted to db first and updated to the cache.
For the read operation(rest api GET), the item is looked into cache first, if available, then use that or else use spring-data-rest default behavior in this case i.e. find it in postgresql.
Write Operations (POST, PUT, DELETE):
Using the RepositoryEventHandler, HandleAfterCreate, HandleAfterDelete, HandleAfterSave events are subscribed and used to sync up the cache. This reasonably keep the cache to latest.
Read Operations(GET):
I do not see any event listener for read operation. Read is the only operation, that I want to bypass hitting the db as much as possible. But currently do not find a way to do this.
Please let me know, if there is a way to listen for the read operation and do cache lookup first.
Thanks.
I am using Spring-Boot, Spring-Data/JPA with Hazelcast client/server topology. In parts of my test application, I am calculating time when performing CRUD operations on the client side (the server is the one interacting with a relational db). I configured the map(Store) to be write-behind by setting write-delay-seconds to 10.
Spring-Data's save() returns the persisted entity. In the client app, therefore, the application flow will be blocked until the (server) returns the persisted entity.
Would like to know is there is an alternative in which case the client does NOT have to wait for the entity to persist. Was under the impression that once new data is stored in the Map, persisting to the backed happens asynchronously -> the client app would NOT have to wait.
Map config in hazelast.xml:
<map name="com.foo.MyMap">
<map-store enabled="true" initial-mode="EAGER">
<class-name>com.foo.MyMapStore</class-name>
<write-delay-seconds>10</write-delay-seconds>
</map-store>
</map>
#NeilStevenson I don't find your response particularly helpful. I asked on an earlier post about where and how to generate the Map keys. You pointed me to the documentation which fails to shed any light on this topic. Same goes for the hazelcast (and other) examples.
The point of having the cache in the 1st place, is to avoid hitting the database. When we add data (via save()), we need to also generate an unique key for the Map. This key also becomes the Entity.Id in the database table. Since, again, its the hazelcast client that generates these Ids, there is no need to wait for the record to be persisted in the backend.
The only reason to wait for save() to return the persisted object would be to catch any exceptions NOT because of the ID.
That unfortunately is how it is meant to work, see https://docs.spring.io/spring-data/commons/docs/current/api/org/springframework/data/repository/CrudRepository.html#save-S-.
Potentially the external store mutates the saved entry in some way.
Although you know it won't do this, there isn't a variant on the save defined.
So the answer seems to be this is not currently available in the general purpose Spring repository definition. Why not raise a feature request for the Spring Data team ?
I'm working on a sample Service Fabric project, where I have to maintain a shopping list. For this I have a ShoppingList actor, which is identifiable by a specific id. It stores the current list content in its state using StateManager. All works fine.
However, in parallel I'd like to maintain the shopping list content in a sql database. In particular:
store all add/remove item request for future analysis (ML)
on first actor initialization load list content from db (e.g. after cluster has been re-created)
What is the best approach to achieve that? Create a custom StateProvider (how? can't find examples)?
Or maybe have another service/actor for handling all db operations (possibly using queues and reminders)?
All examples seem to completely rely on default StateManager, with no data persistence to external storage, so I'm not sure what's the best practice.
The best way will be to have a separate entity responsible for storing data to DB. And actor will just send an event (not implying SF events) with some data about performed operation, and another entity will catch it and perform the rest of the work.
But of course you can implement this thing in actor itself, but it will bring two possible issues:
Actor will be not able to process other requests if there will be some issues with DB or connectivity between actor and DB or if there will be high loading of DB itself and it will process requests slowly. The actor would have to wait till transferring to DB successfully completes.
Possible overloading of DB with many single connections from many actors instead of one or several connection from another entity and batch insertion.
So, your final solution will depend on workload of your system. But definitely you will need a reliable queue to safely store data in DB if value of such data is too high to afford a loss.
Also, I think you could use default state manager to store logs and information about transactions before it will be transferred to DB and remove from service's state after transaction completes. There is no need to have permanent storage of such data in services.
And another things to take into consideration — reading from DB. Probably, if you have relationship database and will update with new records only one table + if there will be huge amount of actors that will query such data on activation, you will have performance degradation as this table will be locked for reading or writing if you will not configure it to behave differently. So, probably, you will need caching system to read data for actors activation — depends on your workload.
And about implementing your custom State Manager: take a look at this example. Basically, all you need to do is to implement IReliableStateManagerReplica interface and pass it to StatefullService constructor.
How any query is processed in DSpace and data is managed between front end and PostgreSQL
Like every other webapp running in a Servlet Container like Tomcat, the file WEB-INF/web.xml controls how a query is processed. In case of DSpace's JSPUI you'll find this file in [dspace-install]/webapps/jspui/WEB-INF/web.xml. The JSPUI defines several filters, listeners and servlets to process a request.
The filters are used to report that the JSPUI is running, that restricted areas can be seen by authenticated users or even by authenticated administrators only and to handle Content Negotiation.
The listeners ensure that DSpace has started correctly. During its start DSpace loads the configuration, opens database connections that it uses in a connection pool, let Spring do its IoC magic and so on.
For the beginning the most important part to see how a query is processed are the servlets and the servlet-mappings. A servlet-mapping defines which servlet is used to process a request with a specific request path: e.g. all requests to example.com/dspace-jspui/handle/* will be processed by org.dspace.app.webui.servlet.HandleServlet, all requests to example.com/dspace-jspui/submit will be processed by org.dspace.app.webui.servlet.SubmissionController.
The servlets uses their Java code ;-) and the DSpace Java API to process the request. You'll find most of it in the dspace-api module (see [dspace-source]/dspace-api/src/main/java/...) and some smaller part in dspace-services module ([dspace-source]dspace-services/src/main/java/...). Within the DSpace Java API their are two important classes if you're interested in the communication with the database:
One is org.dspace.core.Context. The context contains information whether and which user is logged in, an initialized and connected database connection (if all went well) and a cache. The methods Context.abort(), Context.commit() and Context.complete() are used to manage the database transaction. That is the reason, why almost all methods manipulating the database requests a Context as method parameter: it controls the database connection and the database transaction.
The other one is org.dspace.storage.rdbms.DatabaseManager. The DatabaseManager is used to handle database queries, updates, deletes and so on. All DSpaceObjects contains an object TableRow which contains the information of the object stored in the database. Inside the DSpaceObject classes (e.g. org.dspace.content.Item, org.dspace.content.Collection, ...) the TableRow may be manipulated and the changes stored back to the database by using DatabaseManager.update(Context, DSpaceObject). The DatabaseManager provides several methods to send SQL queries to the database, to update, delete, insert or even create data in the database. Just take a look to its API or look for "SELECT" it the DSpace source to get an example.
In JSPUI it is important to use Context.commit() if you want to commit the database state. If a request is processed and Context.commit() was not called, then the transaction will be aborted and the changes gets lost. If you call Context.complete() the transaction will be committed, the database connection will be freed and the context is marked as been finished. After you called Context.complete() the context cannot be used for a database connection any more.
DSpace is quite a huge project and their could be written a lot more about its ORM, the initialization of the database and so on. But this should already help you to start developing for DSpace. I would recommend you to read the part "Architecture" in the DSpace manual: https://wiki.duraspace.org/display/DSDOC5x/Architecture
If you have more specific questions you are always invited to ask them here on stackoverflow or on our mailing lists (http://sourceforge.net/p/dspace/mailman/) dspace-tech (for any question about DSpace) and dspace-devel (for question regarding the development of DSpace).
It depends on the version of DSpace you are running, along with your configuration.
In DSpace 4.0 or above, by default, the DSpace JSPUI uses Apache Solr for all searching and browsing. DSpace performs all indexing and querying of Solr via its Discovery module. The Discovery (Solr) based searche/indexing classes are available under the "org.dspace.discovery" package.
In earlier versions of DSpace (3.x or below), by default, the DSpace JSPUI uses Apache Lucene directly. In these older versions, DSpace called Lucene directly for all indexing and searching. The Lucene based search/indexing classes are available under the "org.dspace.search" package.
In both situations, queries are passed directly to either Solr or Lucene (again depending on the version of DSpace). The results are parsed and displayed within the DSpace UI.