How a query is processed in DSpace and how data is managed between the front end and PostgreSQL
As with every other webapp running in a servlet container like Tomcat, the file WEB-INF/web.xml controls how a query is processed. In the case of DSpace's JSPUI you'll find this file in [dspace-install]/webapps/jspui/WEB-INF/web.xml. The JSPUI defines several filters, listeners and servlets to process a request.
The filters are used to report that the JSPUI is running, to ensure that restricted areas can be seen only by authenticated users (or, for some areas, by authenticated administrators only), and to handle content negotiation.
The listeners ensure that DSpace has started correctly. During its start DSpace loads the configuration, opens the database connections it uses in a connection pool, lets Spring do its IoC magic and so on.
To begin with, the most important parts for seeing how a query is processed are the servlets and the servlet-mappings. A servlet-mapping defines which servlet is used to process a request with a specific request path: e.g. all requests to example.com/dspace-jspui/handle/* will be processed by org.dspace.app.webui.servlet.HandleServlet, while all requests to example.com/dspace-jspui/submit will be processed by org.dspace.app.webui.servlet.SubmissionController.
The servlets use their Java code ;-) and the DSpace Java API to process the request. You'll find most of it in the dspace-api module (see [dspace-source]/dspace-api/src/main/java/...) and a smaller part in the dspace-services module ([dspace-source]/dspace-services/src/main/java/...). Within the DSpace Java API there are two important classes if you're interested in the communication with the database:
One is org.dspace.core.Context. The context contains information about whether and which user is logged in, an initialized and connected database connection (if all went well) and a cache. The methods Context.abort(), Context.commit() and Context.complete() are used to manage the database transaction. That is the reason why almost all methods that manipulate the database require a Context as a method parameter: it controls the database connection and the database transaction.
The other one is org.dspace.storage.rdbms.DatabaseManager. The DatabaseManager is used to handle database queries, updates, deletes and so on. Every DSpaceObject contains a TableRow object which holds the information of the object as stored in the database. Inside the DSpaceObject classes (e.g. org.dspace.content.Item, org.dspace.content.Collection, ...) the TableRow may be manipulated and the changes stored back to the database by using DatabaseManager.update(Context, TableRow). The DatabaseManager provides several methods to send SQL queries to the database and to update, delete, insert or even create data in the database. Just take a look at its API or search for "SELECT" in the DSpace source to find an example.
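To make this concrete, here is a rough sketch of how DatabaseManager and TableRow are typically used (based on the pre-DSpace 6 API; the table and column names are only illustrative and exact method signatures may vary slightly between versions):

import org.dspace.core.Context;
import org.dspace.storage.rdbms.DatabaseManager;
import org.dspace.storage.rdbms.TableRow;
import org.dspace.storage.rdbms.TableRowIterator;

public class DatabaseManagerSketch
{
    // Load a single row by primary key, change a column and write it back.
    public static void renameGroup(Context context, int groupId, String newName) throws Exception
    {
        TableRow row = DatabaseManager.find(context, "epersongroup", groupId);
        row.setColumn("name", newName);
        DatabaseManager.update(context, row);
    }

    // Send an arbitrary SQL query; each result row is wrapped in a TableRow.
    public static void listArchivedItems(Context context) throws Exception
    {
        TableRowIterator rows = DatabaseManager.query(context,
                "SELECT item_id FROM item WHERE in_archive = true");
        try
        {
            while (rows.hasNext())
            {
                TableRow row = rows.next();
                System.out.println("Item " + row.getIntColumn("item_id"));
            }
        }
        finally
        {
            rows.close();
        }
    }
}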
In JSPUI it is important to call Context.commit() if you want to commit the database state. If a request is processed and Context.commit() was not called, the transaction will be aborted and the changes get lost. If you call Context.complete() the transaction will be committed, the database connection will be freed and the context is marked as finished. After you have called Context.complete() the context cannot be used for database access any more.
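Putting both classes together, a typical usage pattern looks roughly like this (a sketch assuming the pre-DSpace 6 API, where Item.find() still exists; in newer versions this logic moved to service classes):

import org.dspace.content.Item;
import org.dspace.core.Context;

public class ContextLifecycleSketch
{
    public static void withdrawItem(int itemId) throws Exception
    {
        // Creating the context borrows a database connection and starts a transaction.
        Context context = new Context();
        try
        {
            Item item = Item.find(context, itemId);
            item.withdraw();

            // Commit the transaction, free the connection, mark the context as finished.
            context.complete();
        }
        finally
        {
            // If complete() was never reached, roll everything back.
            if (context.isValid())
            {
                context.abort();
            }
        }
    }
}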
DSpace is quite a huge project and a lot more could be written about its ORM, the initialization of the database and so on. But this should already help you to start developing for DSpace. I would recommend reading the "Architecture" part of the DSpace manual: https://wiki.duraspace.org/display/DSDOC5x/Architecture
If you have more specific questions you are always invited to ask them here on Stack Overflow or on our mailing lists (http://sourceforge.net/p/dspace/mailman/): dspace-tech (for any question about DSpace) and dspace-devel (for questions regarding the development of DSpace).
It depends on the version of DSpace you are running, along with your configuration.
In DSpace 4.0 or above, by default, the DSpace JSPUI uses Apache Solr for all searching and browsing. DSpace performs all indexing and querying of Solr via its Discovery module. The Discovery (Solr) based search/indexing classes are available under the "org.dspace.discovery" package.
In earlier versions of DSpace (3.x or below), by default, the DSpace JSPUI uses Apache Lucene directly. In these older versions, DSpace called Lucene directly for all indexing and searching. The Lucene based search/indexing classes are available under the "org.dspace.search" package.
In both situations, queries are passed directly to either Solr or Lucene (again depending on the version of DSpace). The results are parsed and displayed within the DSpace UI.
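For the Discovery (Solr) case, a query can also be issued programmatically along these lines (a sketch based on the DSpace 4.x/5.x org.dspace.discovery classes; method names may differ slightly between versions):

import java.util.List;

import org.dspace.content.DSpaceObject;
import org.dspace.core.Context;
import org.dspace.discovery.DiscoverQuery;
import org.dspace.discovery.DiscoverResult;
import org.dspace.discovery.SearchService;
import org.dspace.discovery.SearchUtils;

public class DiscoverySearchSketch
{
    public static void search(Context context, String userQuery) throws Exception
    {
        SearchService searchService = SearchUtils.getSearchService();

        DiscoverQuery query = new DiscoverQuery();
        query.setQuery(userQuery);   // e.g. "dc.title:thesis"
        query.setMaxResults(10);

        // The query is sent to Solr; the response is parsed back into DSpace objects.
        DiscoverResult result = searchService.search(context, query);

        System.out.println("Total hits: " + result.getTotalSearchResults());
        List<DSpaceObject> found = result.getDspaceObjects();
        for (DSpaceObject dso : found)
        {
            System.out.println(dso.getHandle() + " - " + dso.getName());
        }
    }
}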
Related
I am aware of the CQRS pattern, wherein a Query is to be used for reading data and a Command is to be used for updating data.
I have a peculiar case where the REST API is a POST but does not update data directly; rather, it calls an external POST API of another system and passes the details along.
In such a case, which one holds true: should I use a Query or a Command?
Update:
The system involves multiple DBs, though not necessarily different DBs for query and command.
Super simple. If you know for sure the call will not update or modify state or data, then it's a query; if it does (or might), then it's a command.
However, CQRS is often more about the physical structure of your system. You might have separate command and query databases... and that complicates the answer. It can have both logical and physical aspects.
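As a hypothetical illustration (all class and endpoint names below are made up): a handler like this only triggers a state change in another system and returns nothing for the caller to read, so under CQRS it is naturally modelled as a command, even though your own database is never touched directly.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ForwardOrderCommandHandler
{
    private final HttpClient httpClient = HttpClient.newHttpClient();

    // A command: it (indirectly) changes state in the external system;
    // it does not return data for the caller to read.
    public void handle(String orderJson) throws Exception
    {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://external.example.com/api/orders")) // hypothetical endpoint
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(orderJson))
                .build();

        HttpResponse<Void> response =
                httpClient.send(request, HttpResponse.BodyHandlers.discarding());

        if (response.statusCode() >= 300)
        {
            throw new IllegalStateException(
                    "External system rejected the command: " + response.statusCode());
        }
    }
}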
I am using Spring Boot and Spring Data/JPA with a Hazelcast client/server topology. In parts of my test application, I am measuring the time taken when performing CRUD operations on the client side (the server is the one interacting with a relational DB). I configured the map (MapStore) to be write-behind by setting write-delay-seconds to 10.
Spring Data's save() returns the persisted entity. In the client app, therefore, the application flow will be blocked until the server returns the persisted entity.
I would like to know if there is an alternative in which the client does NOT have to wait for the entity to be persisted. I was under the impression that once new data is stored in the Map, persisting it to the backend happens asynchronously, so the client app would NOT have to wait.
Map config in hazelcast.xml:
<map name="com.foo.MyMap">
<map-store enabled="true" initial-mode="EAGER">
<class-name>com.foo.MyMapStore</class-name>
<write-delay-seconds>10</write-delay-seconds>
</map-store>
</map>
@NeilStevenson I don't find your response particularly helpful. I asked in an earlier post about where and how to generate the Map keys. You pointed me to the documentation, which fails to shed any light on this topic. The same goes for the Hazelcast (and other) examples.
The point of having the cache in the first place is to avoid hitting the database. When we add data (via save()), we also need to generate a unique key for the Map. This key also becomes the Entity.Id in the database table. Since, again, it's the Hazelcast client that generates these IDs, there is no need to wait for the record to be persisted in the backend.
The only reason to wait for save() to return the persisted object would be to catch any exceptions, NOT because of the ID.
That unfortunately is how it is meant to work, see https://docs.spring.io/spring-data/commons/docs/current/api/org/springframework/data/repository/CrudRepository.html#save-S-.
Potentially the external store mutates the saved entry in some way.
Although you know it won't do this, there isn't a variant of save defined for that case.
So the answer seems to be that this is not currently available in the general-purpose Spring repository definition. Why not raise a feature request with the Spring Data team?
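In the meantime, if blocking on save() is the main pain point, one workaround (a sketch only, outside Spring Data; MyEntity and the generator name are placeholders, and FlakeIdGenerator needs Hazelcast 3.10+) is to generate the key on the client and write straight to the write-behind map through the Hazelcast client, letting the configured MapStore persist the entry asynchronously:

import java.io.Serializable;

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.flakeidgen.FlakeIdGenerator;

public class NonBlockingWriteSketch
{
    public static void main(String[] args)
    {
        HazelcastInstance client = HazelcastClient.newHazelcastClient(new ClientConfig());

        IMap<Long, MyEntity> map = client.getMap("com.foo.MyMap");
        FlakeIdGenerator idGenerator = client.getFlakeIdGenerator("myEntityIds");

        // Generate the key (and the entity id) on the client side.
        long id = idGenerator.newId();
        MyEntity entity = new MyEntity(id, "some payload");

        // set() returns as soon as the entry is in the map; with write-behind configured,
        // the MapStore writes it to the relational DB up to write-delay-seconds later.
        // map.setAsync(id, entity) would avoid blocking on the network round trip as well.
        map.set(id, entity);

        client.shutdown();
    }

    // Placeholder entity, just to keep the sketch self-contained.
    public static class MyEntity implements Serializable
    {
        private final long id;
        private final String payload;

        public MyEntity(long id, String payload)
        {
            this.id = id;
            this.payload = payload;
        }

        public long getId() { return id; }
        public String getPayload() { return payload; }
    }
}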
I have a business scenario where, whenever a new record is loaded into a DB table,
a) A notification will be sent to the client. The notification message conveys that data is loaded and ready for querying.
b) Upon receiving the notification, the client will make an OData query to the JBoss virtual DB. OData is supported by the Teiid VDB.
The problem is that the new records (inserted via a manual/automated SQL script) are not returned in the OData query response. It always returns the cached result for the first 5 minutes, because OData has a default cache time setting of 5 minutes.
We want Teiid to always return all the records, including the newly inserted ones.
I tried the following options but they are not working as expected (https://developer.jboss.org/wiki/AHowToGuideForMaterializationcachingViewsInTeiid):
1) Cache hints
/*+ cache(ttl:300000) */ select * from Source.UpdateProduct
2) OPTION NOCACHE
This works when I make a JDBC query to the DB.
Please suggest how to turn off this caching in the case of an OData REST query.
I think the Teiid documentation at https://docs.jboss.org/author/display/TEIID/OData+Support could help.
You don't specify the version of Teiid you use, so I'm linking to the most current version's documentation.
When you go through the docs page, at the bottom there is a Configuration section listing several configurable options.
Doesn't the skiptoken-cache-time option serve your need? Try setting it to a lower value or zero and see if that helps. Just locate the odata WAR, open it, and change the WEB-INF/web.xml file.
Jan
I want to reuse an existing, transactional, paginated service class, which retrieves the items using JPA from a database, inside a Spring Batch job as a reader. I want to do that instead of using the JpaPagingItemReader directly, basically because the JPA query is more complex to build and the service already provides this functionality.
My question is what things I should take into account when developing the Spring Batch adapter over this service. Although the reference documentation http://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#pagingItemReaders has a section on reusing existing services, it doesn't say anything regarding the constraints, if there are any, of using such a transactional service.
Now, I looked at the JpaPagingItemReader as an example for building the reader, and I came up with a couple of questions I couldn't find answers to either in the documentation or on Stack Overflow, although this post https://stackoverflow.com/a/26549831/4473261 helped.
The first thing I noticed is that a new transaction is used by the JpaPagingItemReader for reading a page of data. The above post says that this new transaction is needed "so that features like retry and skip can be correctly performed". I have also found this article related to the matter https://blog.codecentric.de/en/2012/03/transactions-in-spring-batch-part-3-skip-and-retry/ which says that "when a skippable exception occurs during reading, we just increase the skip count and keep the exception for a later call on the onSkipInRead method of the SkipListener, if configured. There's no rollback". So I assume that the reader has to do any reading of the records in a new transaction, so that if the transaction started when the processing of the chunk began is rolled back, the reader is not affected. I am wondering if this is true and if, in that case, my adapter should create a new transaction, invoke the service inside that transaction and then commit the transaction, similarly to how the JpaPagingItemReader does it. If that's true, though, I wonder why there isn't any template provided by the framework which creates the transaction, delegates to the service the actual call to retrieve the data and then commits the transaction.
Greetings,
Cristi
From a reader perspective, there really isn't much to be concerned about. You can see in our JmsItemReader, which obviously works with a transactional store, that we don't take any additional precautions within the ItemReader itself.
What really matters is how you configure your step. When configuring your step, you'll need to mark the reader as transactional so that Spring Batch handles rollback correctly. When Spring Batch reads items in a fault-tolerant step, the default behavior is to buffer them so that they won't be re-read on failure (retry, skip, etc.). However, since items read from a transactional store are tied to the transaction (and therefore reset when a rollback occurs), you need to tell Spring Batch not to buffer the items as they are read.
To mark the ItemReader as transactional, you'll set the not-quite-well-named flag is-reader-transactional-queue to true. You can read more about configuring steps and transactions in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/configureStep.html
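As for wrapping the existing service itself, an adapter could look roughly like this (a sketch only: MyEntity, MyPagedService and its findPage() method are placeholders standing in for your own entity and service, not framework classes):

import java.util.Iterator;
import java.util.List;

import org.springframework.batch.item.ItemReader;

// Placeholders for your existing entity and transactional service,
// just to keep the sketch self-contained.
class MyEntity
{
}

interface MyPagedService
{
    List<MyEntity> findPage(int page, int size);
}

public class PagedServiceItemReader implements ItemReader<MyEntity>
{
    private final MyPagedService service;   // your existing @Transactional service
    private final int pageSize;

    private int page = 0;
    private Iterator<MyEntity> currentPage = null;

    public PagedServiceItemReader(MyPagedService service, int pageSize)
    {
        this.service = service;
        this.pageSize = pageSize;
    }

    @Override
    public MyEntity read()
    {
        if (currentPage == null || !currentPage.hasNext())
        {
            // Each page is fetched through the service, which runs in its own transaction.
            List<MyEntity> items = service.findPage(page++, pageSize);
            if (items.isEmpty())
            {
                return null;   // null signals "end of input" to Spring Batch
            }
            currentPage = items.iterator();
        }
        return currentPage.next();
    }
}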
I have set up a subscription between Orion ContextBroker and Cosmos BigData using Cygnus, and data is properly persisted in Cosmos when an update is made to Orion.
But I want to analyze the data in Cosmos and return the results to Orion, and finally access the result data in Orion from "outside".
How would one do this? Of course, I would like the solution I build to be as "automated" as possible, but mostly I just want to solve this problem.
Any advice is much appreciated!
As a general response (as the question is also very general ;), what you need is a process that accesses the information stored in Cosmos (either using HDFS APIs such as WebHDFS or HttpFs, Hive queries, general MapReduce jobs on top of Hadoop, etc.), then implements the client side of the NGSI API that Orion exposes in order to inject context elements into Orion based on the information retrieved from Cosmos. The key operation for doing so in the Orion API is updateContext.
The automation degree would depend on how you implement that process. It can be as automated as you want.
EDIT: considering the comments on this answer, I will try to add more detail.
What I mean is to develop a piece of software (let's call it APOS, A Piece Of Software) implementing the following behaviour:
APOS will grab data from Cosmos using any of the interfaces provided by Cosmos, i.e. WebHDFS/HttpFs, Hive, MapReduce jobs, etc.
APOS will process the data to produce some result
APOS will inject that result into Orion, using the Orion REST API described in the Orion user manual. The updateContext operation is particularly useful for that task. From a client-server point of view, Orion is a server exposing a REST API and APOS is the client interacting with that server.
It is completely up to you how to implement this APOS and how to orchestrate the flow from 1 to 3 (e.g. it can run in batch mode every midnight, be triggered by user interaction on a web portal, etc.).
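As a rough sketch of step 3 (the host, entity id/type and attribute below are made-up examples to be adapted to your own model), APOS could push a result into Orion using the NGSI v1 updateContext operation:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OrionUpdateContextSketch
{
    public static void main(String[] args) throws Exception
    {
        // APPEND creates the entity/attribute if it does not exist yet, or updates it otherwise.
        String payload = "{"
                + "\"contextElements\": [{"
                + "\"type\": \"AnalysisResult\","
                + "\"isPattern\": \"false\","
                + "\"id\": \"CosmosResult1\","
                + "\"attributes\": [{\"name\": \"average\", \"type\": \"float\", \"value\": \"22.5\"}]"
                + "}],"
                + "\"updateAction\": \"APPEND\""
                + "}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://orion-host:1026/v1/updateContext")) // your Orion instance
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}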
At the present moment, FI-WARE doesn't provide any generic enabler to convert Cosmos data to NGSI, given that each particular realization of steps 1 to 3 above is different and depends on the use case. However, note that there is a software component named Cygnus which implements the other direction: from NGSI to Cosmos.