Hibernate Search 6.0.2.Final with opendistro

I have a Spring Boot 2.4.2 application integrated with Hibernate Search 6.0.2.Final.
When using standard Elasticsearch, everything works fine (read/write) when persisting new entities. The index also gets created as expected, as myindex-000001, based on the default simple index layout strategy.
However, when I switched the backend to opendistro (latest), I only see a single index created, named myindex-write (different from the expected myindex-000001).
The write operations work fine as expected (thanks to the -write suffix), but the read operations fail with the following error:
root_cause": [
{
"type": "index_not_found_exception",
"reason": "no such index [myindex-read]",
"resource.type": "index_or_alias",
"resource.id": "myindex-read",
"index_uuid": "_na_",
"index": "myindex-read"
}
]
GET /_cat/aliases on opendistro shows that there are no aliases for the index.
What is the best way to resolve this? The no-alias strategy shown here? The downside of no-alias is losing blue-green-deployment-style reindexing. Is a custom index layout strategy the best way to solve this?

All of the above issues were caused by my having set hibernate.search.schema_management.strategy to none. With that strategy, the indexes (and their read/write aliases) need to be created manually, as mentioned in the docs here.
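For reference, one way to create the index and both aliases up front is via the Elasticsearch high-level REST client. The snippet below is only a sketch, assuming that client (RestHighLevelClient, CreateIndexRequest, Alias) is on the classpath and that the names follow Hibernate Search's default simple index layout strategy; adjust the index name and aliases to your setup:

// Sketch only: create the index and the aliases that the default "simple" layout expects
void createHibernateSearchIndex(RestHighLevelClient client) throws IOException {
    CreateIndexRequest request = new CreateIndexRequest("myindex-000001")
            .alias(new Alias("myindex-read"))
            .alias(new Alias("myindex-write").writeIndex(true));
    client.indices().create(request, RequestOptions.DEFAULT);
}

Alternatively, switching hibernate.search.schema_management.strategy back to a creating strategy such as create-or-validate lets Hibernate Search create the index and aliases itself.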


Spring data mongodb federation attempts -- how can I get interface methods to use a custom configured mongotemplate?

In my application, I need to be able to connect to any number of MongoDB hosts, and any number of databases in any of those hosts, to support at least this basic level of query federation. This is specified by configuration, so, for any given installation of our app, I cannot know ahead of time how many collections I will need to access. I based my attempt on configuration that I saw in this Baeldung article, with some modifications to suit my requirements. My configuration looks something like this YAML:
datasources:
  - name: source1
    uri: mongodb://user1:pass1@127.0.0.1:27017
    fq_collection: db1.coll1
  - name: source2
    uri: mongodb://user1:pass1@192.168.0.100:27017
    fq_collection: db2.coll2
And, depending on the installation, there could be any number of datasources entries. So, in my @Configuration class, I can iterate through these entries, which are injected via configuration properties. I want to create a MongoTemplate for each of them, since I cannot rely on the default MongoTemplate. The solution I have attempted is to create a repository interface, and then to create a custom impl that will accept the configured MongoTemplate. Here is the code I use to create each repository instance with its template:
public MongoRepository<Item, String> mongoCustomRepositories(MongoTemplate template) {
    // Custom fragment implementation that carries the per-datasource template
    MyCustomMongoRepository customImpl = new MyCustomMongoRepositoryImpl(template);
    // Factory bound to the same template, so derived query methods use it as well
    MongoRepositoryFactory repositoryFactory = new MongoRepositoryFactory(template);
    return repositoryFactory.getRepository(MyMongoRepository.class, customImpl);
}
I call it from a @Bean method that returns the list of all of these repositories created from the config entries, and I can then inject the repositories into service classes.
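Roughly, that wiring looks like the following sketch (the DatasourceProperties/Entry names here are simplified stand-ins for my actual configuration-properties classes, and the database name is taken from the first part of fq_collection):

@Bean
public List<MongoRepository<Item, String>> repositoriesFromConfig(DatasourceProperties props) {
    List<MongoRepository<Item, String>> repositories = new ArrayList<>();
    for (DatasourceProperties.Entry entry : props.getDatasources()) {
        // One client and one template per configured datasource
        MongoClient client = MongoClients.create(entry.getUri());
        String database = entry.getFqCollection().split("\\.")[0];   // e.g. "db1" from "db1.coll1"
        MongoTemplate template = new MongoTemplate(client, database);
        repositories.add(mongoCustomRepositories(template));
    }
    return repositories;
}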
UPDATE/EDIT: OK, I set MongoDB profiling to 2 in order to log the queries. It turns out that, in fact, the queries are being sent to MongoDB, but the problem is that the collection names are not being set for the model. I can't believe that I forgot about this, but I did, so it was using the lower-camel-case model class name, which guarantees that there are no documents to be retrieved. I have default collection names, but the specific collection names are set in the configuration, as the example YAML shows. I have a couple of ideas, but if anyone has a suggestion about how to set these dynamically, that would help a lot.
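One option I'm considering, since MongoTemplate operations accept an explicit collection name, is to have the custom impl route every query to the configured collection rather than relying on the class-derived name. A rough sketch (the extra constructor argument and the findByName method are illustrative, not my actual code):

public class MyCustomMongoRepositoryImpl implements MyCustomMongoRepository {

    private final MongoTemplate template;
    private final String collectionName;   // e.g. "coll1" from fq_collection "db1.coll1"

    public MyCustomMongoRepositoryImpl(MongoTemplate template, String collectionName) {
        this.template = template;
        this.collectionName = collectionName;
    }

    public List<Item> findByName(String name) {
        Query query = new Query(Criteria.where("name").is(name));
        // Passing the collection name explicitly overrides the class-derived default
        return template.find(query, Item.class, collectionName);
    }
}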
EDIT 2: I did a bunch of work and I have it almost working. You can see my work in my repo. However, in doing this, I uncovered a bug in spring-data-mongodb, and I filed an issue.

"winning-configuration-property" algorithm : spring configuration properties run time determination based on application specific qualifiers

I need to implement a "winning-configuration-property" algorithm in my application.
For example, for the property dashboard_material, I would create a file dashboard_material.yml (I am planning to represent each property as a file; this is slightly negotiable) with the following values. I believe this format presents a consolidated view of the variants of the property, and is more suitable for impact analysis when someone changes a value for a particular scenario:
car_type=luxury&car_subtype=luxury_sedan&special_features=none : leather
car_type=luxury&car_subtype=luxury_sedan&special_features=limited_edition : premium_leather
car_type=economy : pseudo_leather
default : pseudo_leather
I need the closest match. A luxury car can be a sedan or a compact.
I am assuming these are "decorators" of a car in object-oriented terms, but I am not finding any useful implementation for the above problem among sample decorator patterns.
For example, an API call GET /configurations/dashboard_material should return the following values based on the input parameters:
car_type=economy : pseudo_leather
car_type=luxury&car_subtype=luxury_sedan : leather
car_type=luxury : pseudo_leather (from the default; error or null if no value exists)
This looks very similar to the "specifications" or "QueryDSL" problem with GET APIs, in terms of slicing and dicing based on criteria, but basically I am looking for a run-time determination of a config value within a single microservice
(which is a Spring config client; I use git2consul to push values from Git into the Consul KV store, and the Spring config client is tied into the Consul KV. I am open to any equivalent or better alternatives).
I would ideally like to do all the processing as a post-processing step after the configuration is read (from either a Spring config server or the Consul KV),
so that no real processing happens after a query is received. The same post-processing will also have to happen after every Spring config client "refresh" interval, based on configuration property updates.
I have previously seen such an implementation (as an ops engineer) with Netflix Archaius, but again I am not finding any suitable text on the Archaius pages.
My trivial/naive solution would be to create multiple maps/trees to store the data and then consolidate them into a single map based on the API request, effectively overriding some of the values from the lower-priority maps.
I am looking for any open-source implementations or references, to avoid having to write new code.
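For illustration, a minimal Java sketch of that naive consolidation idea, under one possible reading of "closest match" (a rule applies when each of its qualifiers either matches the request or is none and absent from the request; the most specific applicable rule wins). All names here are hypothetical:

// rules would be parsed from dashboard_material.yml, one Rule per non-default line
static class Rule {
    final Map<String, String> qualifiers;   // e.g. {car_type=luxury, car_subtype=luxury_sedan, special_features=none}
    final String value;                     // e.g. "leather"
    Rule(Map<String, String> qualifiers, String value) {
        this.qualifiers = qualifiers;
        this.value = value;
    }
}

static String resolve(List<Rule> rules, Map<String, String> request, String defaultValue) {
    Rule best = null;
    for (Rule rule : rules) {
        boolean applies = rule.qualifiers.entrySet().stream().allMatch(q ->
                q.getValue().equals(request.get(q.getKey()))
                        || ("none".equals(q.getValue()) && !request.containsKey(q.getKey())));
        if (applies && (best == null || rule.qualifiers.size() > best.qualifiers.size())) {
            best = rule;   // prefer the rule that matches the most qualifiers
        }
    }
    return best != null ? best.value : defaultValue;
}

// resolve(rules, Map.of("car_type", "economy"), "pseudo_leather")                               -> pseudo_leather
// resolve(rules, Map.of("car_type", "luxury", "car_subtype", "luxury_sedan"), "pseudo_leather") -> leather
// resolve(rules, Map.of("car_type", "luxury"), "pseudo_leather")                                -> pseudo_leather (default)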

Titan - How to Use 'Lucene' Search Backend

I am attempting to use the Lucene search backend with Titan. I am setting the index.search.backend property to lucene like so:
TitanFactory.Builder config = TitanFactory.build();
config.set("storage.backend", "hbase");
config.set("storage.hostname", "node1");
config.set("storage.hbase.table", "titan");
config.set("index.search.backend", "lucene");
config.set("index.search.directory", "/tmp/foo");
TitanGraph graph = config.open();
GraphOfTheGodsFactory.load(graph);
graph.getVertices().forEach(v -> System.out.println(v.toString()));
Of course, this does not work because this setting is of the GLOBAL_OFFLINE variety. The logs make me aware of this. Titan ignores my 'lucene' setting and then attempts to use Elasticsearch as the search backend.
WARN com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration - Local setting index.search.backend=lucene (Type: GLOBAL_OFFLINE) is overridden by globally managed value (elasticsearch). Use the ManagementSystem interface instead of the local configuration to control this setting.
After some reading, I understand that I need to use the Management System to set the index.search.backend. I need some code that looks something like the following.
TitanManagement mgmt = graph.getManagementSystem();   // one management transaction for both settings
mgmt.set("index.search.backend", "lucene");
mgmt.set("index.search.directory", "/tmp/foo");
mgmt.commit();
I am confused about how to integrate this into my original example code above. Since this is a GLOBAL_OFFLINE setting, I cannot set it on an open graph. At the same time, I do not know how to get a graph unless I open one first. How do I set the search backend correctly?
There is no in-memory search backend. The supported search backends are Lucene, Solr, and Elasticsearch.
Lucene is a good option for a small-scale, single-machine search backend. You need to set two properties to do this, index.search.backend and index.search.directory:
index.search.backend=lucene
index.search.directory=/path/to/titansearchindexdir
As you've noted, the search backend is a GLOBAL_OFFLINE setting, so you should configure it before initially creating your graph. Since you've already created a titan table in your HBase, either disable and drop that table, or point your graph configuration at a new storage.hbase.table.
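For example, something along these lines should pick up lucene on first initialization when it runs against a fresh, empty table (this just adapts the snippet from the question; the table and directory names are arbitrary):

TitanFactory.Builder config = TitanFactory.build();
config.set("storage.backend", "hbase");
config.set("storage.hostname", "node1");
config.set("storage.hbase.table", "titan_lucene");              // a new, empty table, so the GLOBAL_OFFLINE settings take effect
config.set("index.search.backend", "lucene");
config.set("index.search.directory", "/tmp/titan-lucene-index");
TitanGraph graph = config.open();                                // the search backend is now fixed as lucene for this graph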

Marklogic REST API search for latest document version

We need to restrict a MarkLogic search to the latest version of managed documents, using MarkLogic's REST API. We're using MarkLogic 6.
Using straight XQuery, you can use dls:documents-query() as an additional-query option (see
Is there any way to restrict marklogic search on specific version of the document).
But the REST API requires XML, not arbitrary XQuery. You can turn ordinary cts queries into XML easily enough (execute <some-element>{cts:word-query("hello world")}</some-element> in QConsole).
If I try that with dls:documents-query(), I get this:
<cts:properties-query xmlns:cts="http://marklogic.com/cts">
  <cts:registered-query>
    <cts:id>17524193535823153377</cts:id>
  </cts:registered-query>
</cts:properties-query>
Apart from being less than totally transparent... how safe is that number? We'll need to put it in our query options, so it's not something we can regenerate every time we need it. I've looked at two different installations here and the number's the same, but is it guaranteed to be the same, and will it ever change? On, for example, a MarkLogic upgrade?
Also, assuming the number is safe, will the registered-query always be there? The documentation says that registered queries may be cleared by the system at various times, but it's talking about user-defined registered queries, and I'm not sure how much of that applies to internal queries.
Is this even the right approach? If we can't do this we can always set up collections and restrict the search that way, but we'd rather use dls:documents-query if possible.
The number is a registered query id, and is deterministic. That is, it will be the same every time the query is registered. That behavior has been invariant across a couple of major releases, but is not guaranteed. And as you already know, the server can unregister a query at any time. If that happens, any query using that id will throw an XDMP-UNREGISTERED error. So it's best to regenerate the query when you need it, perhaps by calling dls:documents-query again. It's safest to do this in the same request as the subsequent search.
So I'd suggest extending the REST API with your own version of the search endpoint. Your new endpoint could add dls:documents-query to the input query. That way the registered query would be generated in the same request with the subsequent search. For ML6, http://docs.marklogic.com/6.0/guide/rest-dev/extensions explains how to do this.
The call to dls:documents-query() makes sure the query is actually registered (on the fly if necessary), but that won't work from the REST API. You could extend the REST API with a custom extension as suggested by Mike, but you could also use the following:
cts:properties-query(
  cts:and-not-query(
    cts:element-value-query(
      xs:QName("dls:latest"),
      "true",
      (),
      0
    ),
    cts:element-query(
      xs:QName("dls:version-id"),
      cts:and-query(())
    )
  )
)
That is the query that is registered by dls:documents-query(). It might not be future-proof, though, so check it at each upgrade. You can find the definition of the function in /Modules/MarkLogic/dls.xqy.
HTH!

Naming resources prior to creation using Neo4j Restful API

This is for people with knowledge of REST and Neo4j.
Is it possible to name a node before creating it in Neo4j?
The typical RESTful thing: you create a URI "XXX/db/data/node/mynode" and you want to create a node with this identifier if it does not already exist.
For all that I have researched (and tested) up to the present moment, the answer is: "no, it is not possible; Neo4j will always automatically assign ids to the created nodes, and attempting to create such a URI and use POST to cause its creation will result in a 405".
Thanks in advance.
That's correct: you can't set the id of a node. What you can do is add some other kind of id as a property (see create node with properties). Just make sure it's indexed automatically, and then you can query that index for exact matches.
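For example, against the legacy /db/data REST endpoints, you could store your own identifier as a normal property when creating the node, and later look it up by exact match through the node auto index (assuming node auto-indexing is enabled and includes that property). A rough sketch with plain HTTP calls against a default local install; the myId property name is just an example:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Neo4jRestExample {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();

        // Create the node, storing our own identifier as a property
        HttpRequest create = HttpRequest.newBuilder(URI.create("http://localhost:7474/db/data/node"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"myId\": \"mynode\"}"))
                .build();
        System.out.println(http.send(create, HttpResponse.BodyHandlers.ofString()).body());

        // Later: exact-match lookup via the node auto index (auto-indexing must cover "myId")
        HttpRequest lookup = HttpRequest.newBuilder(URI.create("http://localhost:7474/db/data/index/auto/node/myId/mynode"))
                .GET()
                .build();
        System.out.println(http.send(lookup, HttpResponse.BodyHandlers.ofString()).body());
    }
}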