Applying the same Analyzer to Queries and Fields - hibernate-search

I am trying to build a basic search for my API backend. Users pass arbitrary queries and the backend is supposed to return results (obviously). I would prefer a solution that works with a local index as well as Elasticsearch.
On my entity I defined an analyzer like this:
@AnalyzerDef(name = "ngram",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = StandardFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class),
        @TokenFilterDef(factory = NGramFilterFactory.class,
            params = {
                @Parameter(name = "minGramSize", value = "2"),
                @Parameter(name = "maxGramSize", value = "3") })
    }
)
For the query, I tried the following:
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(this.entityManager);
Analyzer analyzer = fullTextEntityManager.getSearchFactory().getAnalyzer("ngram");
QueryParser queryParser = new MultiFieldQueryParser(ALL_FIELDS, analyzer);
queryParser.setDefaultOperator(QueryParser.AND_OPERATOR);
org.apache.lucene.search.Query query = queryParser.parse(queryString);
javax.persistence.Query persistenceQuery =
fullTextEntityManager.createFullTextQuery(query, MyEntity.class);
List<MyEntity> result = persistenceQuery.getResultList();
As far as I understand, I need to provide an analyzer for the query so that the search terms are "ngram-tokenized" and a match can be found. Before, I used SimpleAnalyzer and as a result the search only matched full words, which - I think - backs my theory (sorry, I am still learning this).
The above code gives me a NullPointerException:
java.lang.NullPointerException: null
at org.hibernate.search.engine.impl.ImmutableSearchFactory.getAnalyzer(ImmutableSearchFactory.java:370) ~[hibernate-search-engine-5.11.1.Final.jar:5.11.1.Final]
at org.hibernate.search.engine.impl.MutableSearchFactory.getAnalyzer(MutableSearchFactory.java:203) ~[hibernate-search-engine-5.11.1.Final.jar:5.11.1.Final]
at org.hibernate.search.impl.SearchFactoryImpl.getAnalyzer(SearchFactoryImpl.java:50) ~[hibernate-search-orm-5.11.1.Final.jar:5.11.1.Final]
in the line
Analyzer analyzer = fullTextEntityManager.getSearchFactory().getAnalyzer("ngram");

You cannot retrieve the analyzer from Hibernate Search when using the Elasticsearch integration, because in that case there is no analyzer locally: the analyzer only exists remotely, in the Elasticsearch cluster.
If you only need a subset of the query syntax, try out the "simple query string" query: it's a query that can be built using the DSL (so it will work the same with Lucene and Elasticsearch) and that provides the most common features (boolean queries, fuzziness, phrases, ...). For example:
Query luceneQuery = queryBuilder.simpleQueryString()
        .onFields("name", "history", "description")
        .matching("war + (peace | harmony)")
        .createQuery();
The syntax is a bit different, but only because it's aiming at end users and tries to be simpler.
EDIT: If simple query strings are not an option, you can create an analyzer manually: this should work even when using the Elasticsearch integration.
org.apache.lucene.analysis.custom.CustomAnalyzer#builder() should be a good starting point. There are several examples in the javadoc of that class.
Make sure you only create the analyzer once and store it somewhere, e.g. in a static constant: creating an analyzer may be costly.
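For example, a rough, untested equivalent of the "ngram" @AnalyzerDef above could look like the following (the class name Analyzers is just an example; the tokenizer, filters and parameters are copied from the annotation). You would then pass Analyzers.NGRAM to the MultiFieldQueryParser instead of calling getSearchFactory().getAnalyzer(...):
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.core.StopFilterFactory;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.ngram.NGramFilterFactory;
import org.apache.lucene.analysis.standard.StandardFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;

public final class Analyzers {

    // Built once and reused; mirrors the "ngram" @AnalyzerDef on the entity.
    public static final Analyzer NGRAM;

    static {
        try {
            NGRAM = CustomAnalyzer.builder()
                    .withTokenizer(StandardTokenizerFactory.class)
                    .addTokenFilter(StandardFilterFactory.class)
                    .addTokenFilter(LowerCaseFilterFactory.class)
                    .addTokenFilter(StopFilterFactory.class)
                    .addTokenFilter(NGramFilterFactory.class,
                            "minGramSize", "2", "maxGramSize", "3")
                    .build();
        } catch (IOException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private Analyzers() {
    }
}
Keep in mind this is only a query-side approximation: if the analyzer definition used for indexing (locally or in the Elasticsearch cluster) ever changes, this copy has to be kept in sync by hand.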

Related

Is there an ExampleMatcher with not equals condition

I am working with Spring Data JPA and I would like to know whether, with Query by Example (QBE), I can get all the records where a column value does not equal 'XXX'.
I have seen ExampleMatcher, but couldn't find anything like "not equals".
Employee filterBy = new Employee();
filterBy.setLastName("ar");

// Filter - ignore-case search and "contains" match
ExampleMatcher matcher = ExampleMatcher.matching()
    .withStringMatcher(StringMatcher.CONTAINING) // match strings containing the pattern
    .withIgnoreCase();                           // ignore case sensitivity
Example<Employee> example = Example.of(filterBy, matcher);
The above code gets all the records where the last name contains "ar", but I am looking for records where the last name is not "ar".
Is there any other ExampleMatcher for that?
BizExceptionConfig condition = configRequestPair.getCondition();
ExampleMatcher exampleMatcher = ExampleMatcher.matching()
    .withMatcher("appCode", startsWith())
    .withMatcher("name", startsWith())
    .withMatcher("code", startsWith());
if (Objects.isNull(condition.getLifecycle())) {
    condition.setLifecycle(LifeCycle.DELETE.getCode());
    HashMap<String, Integer> params = new HashMap<>();
    params.put("$ne", LifeCycle.DELETE.getCode());
    exampleMatcher = exampleMatcher.withTransformer("lifecycle", (obj) -> Optional.of(params));
}
Example<BizExceptionConfig> example = Example.of(condition, exampleMatcher);
Page<BizExceptionConfig> pageRecord = bizExcConfigRepository.findAll(example, PageUtil.toPageRequest(configRequestPair.getPage()));
This problem can be solved with "withTransformer". JPA is rather limited here,
so I suggest using MongoTemplate instead. I hope this helps.
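If you do go the MongoTemplate route, a minimal sketch of the same "not equals" filter could look like this (assuming Spring Data MongoDB, an injected mongoTemplate, and the entity/field names from the snippet above):
import java.util.List;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;

// Criteria.ne() issues the same $ne operator as the withTransformer trick above.
Query notDeleted = new Query(Criteria.where("lifecycle").ne(LifeCycle.DELETE.getCode()));
List<BizExceptionConfig> records = mongoTemplate.find(notDeleted, BizExceptionConfig.class);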
Your problem could be solved with QBE by using REGEX as StringMatcher and the solution would look like the following:
Employee filterBy = new Employee();
filterBy.setLastName("ar");
filterBy.setLastName(String.format("^(?!.*%s$).*$", Pattern.quote(filterBy.getLastName())));
//filterBy.setLastName(String.format(".*(?<!%s)$", Pattern.quote(filterBy.getLastName()))); // this would be another alternative
ExampleMatcher matcher = ExampleMatcher.matching()
.withStringMatcher(StringMatcher.REGEX) // Match string containing pattern
.withIgnoreCase(); // ignore case sensitivity
Example<Employee> example = Example.of(filterBy, matcher);
Unfortunately, even though a developer might at first think that regular expressions are supported (since the aforementioned enum constant exists), Spring currently doesn't support them - and according to the discussion on the related Jira issue, it never will: https://jira.spring.io/browse/DATAJPA-944

How to fix FirstOrDefault returning Null in Linq

My LINQ query keeps returning a null error on FirstOrDefault:
The cast to value type 'System.Int32' failed because the materialized value is null
because it can't find any records to match on the ClinicalAssetID from the ClinicalReading table - fair enough!
But I want the fields in my details page to simply appear blank if the table does not have a matching entry.
How can I handle the null issue when using the OrderBy function?
Current Code:
var ClinicalASSPATINCVM = (from s in db.ClinicalAssets
    join cp in db.ClinicalPATs on s.ClinicalAssetID equals cp.ClinicalAssetID into AP
    from subASSPAT in AP.DefaultIfEmpty()
    join ci in db.ClinicalINSs on s.ClinicalAssetID equals ci.ClinicalAssetID into AI
    from subASSINC in AI.DefaultIfEmpty()
    join co in db.ClinicalReadings on s.ClinicalAssetID equals co.ClinicalAssetID into AR
    let subASSRED = AR.OrderByDescending(subASSRED => subASSRED.MeterReadingDone).FirstOrDefault()
    select new ClinicalASSPATINCVM
    {
        ClinicalAssetID = s.ClinicalAssetID,
        AssetTypeName = s.AssetTypeName,
        ProductName = s.ProductName,
        ModelName = s.ModelName,
        SupplierName = s.SupplierName,
        ManufacturerName = s.ManufacturerName,
        SerialNo = s.SerialNo,
        PurchaseDate = s.PurchaseDate,
        PoNo = s.PoNo,
        Costing = s.Costing,
        TeamName = s.TeamName,
        StaffName = s.StaffName,
        WarrantyEndDate = subASSPAT.WarrantyEndDate,
        InspectionDate = subASSPAT.InspectionDate,
        InspectionOutcomeResult = subASSPAT.InspectionOutcomeResult,
        InspectionDocumnets = subASSPAT.InspectionDocumnets,
        LastTypeofInspection = subASSINC.LastTypeofInspection,
        NextInspectionDate = subASSINC.NextInspectionDate,
        NextInspectionType = subASSINC.NextInspectionType,
        MeterReadingDone = subASSRED.MeterReadingDone,
        MeterReadingDue = subASSRED.MeterReadingDue,
        MeterReading = subASSRED.MeterReading,
        MeterUnitsUsed = subASSRED.MeterUnitsUsed,
        FilterReplaced = subASSRED.FilterReplaced
    }).FirstOrDefault(x => x.ClinicalAssetID == id);
Tried this but doesn't work
.DefaultIfEmpty(new ClinicalASSPATINCVM())
.FirstOrDefault()
Error was:
CS1929 'IOrderedEnumerable<ClinicalReading>' does not contain a definition for 'DefaultIfEmpty' and the best extension method overload 'Queryable.DefaultIfEmpty<ClinicalASSPATINCVM>(IQueryable<ClinicalASSPATINCVM>, ClinicalASSPATINCVM)' requires a receiver of type 'IQueryable<ClinicalASSPATINCVM>'
I feel a little closer with this, but it still errors:
let subASSRED = AR.OrderByDescending(subASSRED => (subASSRED.MeterReadingDone != null) ? subASSRED.MeterReadingDone : String.Empty).FirstOrDefault()
Error:
CS0173 Type of conditional expression cannot be determined because there is no implicit conversion between 'System.DateTime?' and 'string'
The original error means that at least one of the following properties of the ClinicalASSPATINCVM class - MeterReadingDone, MeterReadingDue, MeterReading, MeterUnitsUsed, or FilterReplaced - is of type int.
Remember that subASSRED here
let subASSRED = AR.OrderByDescending(subASSRED => subASSRED.MeterReadingDone).FirstOrDefault()
might be null (no corresponding record).
Now look at this part of the projection:
MeterReadingDone = subASSRED.MeterReadingDone,
MeterReadingDue = subASSRED.MeterReadingDue,
MeterReading = subASSRED.MeterReading,
MeterUnitsUsed = subASSRED.MeterUnitsUsed,
FilterReplaced = subASSRED.FilterReplaced
If this were LINQ to Objects, all of these would throw an NRE (NullReferenceException) at runtime. In LINQ to Entities the query is converted to and executed as SQL. SQL has no issue with an expression like subASSRED.SomeProperty because SQL supports NULL naturally, even if SomeProperty normally does not allow NULL. So the SQL query executes fine, but then EF must materialize the result into objects, and the C# object property is not nullable - hence the error in question.
To solve it, find the int property (or properties) and use the following pattern inside the query:
SomeIntProperty = (int?)subASSRED.SomeIntProperty ?? 0 // or other meaningful default
or change the receiving object's property type to int? and leave the original query as is.
Do the same for any other non-nullable value type property, e.g. DateTime, double, decimal, Guid etc.
Your problem is that your DefaultIfEmpty is executed as IQueryable. Perform it AsEnumerable and it will work:
// create the default element only once!
static readonly ClinicalASSPATINCVM defaultElement = new ClinicalASSPATINCVM();

var result = <my big linq query>
    .Where(x => x.ClinicalAssetID == id)
    .AsEnumerable()
    .DefaultIfEmpty(defaultElement)
    .FirstOrDefault();
This won't lead to a performance penalty!
Database management systems are extremely optimized for selecting data. One of the slower parts of a database query is the transport of the selected data to your local process. Hence it is wise to let the DBMS do most of the selecting, and only move the data to your local process once you know you only have the data you really plan to use.
In your case, you need at most one element from your DBMS, and if there is nothing, you want to use a default object instead.
AsEnumerable will move the selected data to your local process in a smart way, probably per "page" of selected data.
The page size is a good compromise: not too small, so you don't have to ask for the next page too often; not too large, so that you don't transfer many more items than you actually use.
Besides, because of the Where statement you expect at most one element anyway. So fetching a full "page" is no problem; the page will contain only one element.
After the page is fetched, DefaultIfEmpty checks if the page is empty, and if so, returns a sequence containing the defaultElement. If not, it returns the complete page.
After the DefaultIfEmpty you only take the first element, which is what you want.

mapReduce inline results with java mongodb driver 3.2

How do I get inline results from a mapReduce with the MongoDB Java driver 3.2?
with driver version 2.x I was doing:
DBCollection coll = client.getDB(dbName).getCollection(collName);
coll.mapReduce(map, reduce, null, OutputType.INLINE, query);
The new 3.x driver has two mapReduce() methods returning MapReduceIterable, which lacks a method to specify the INLINE output mode.
MongoCollection<Document> coll = client.getDatabase(dbName).getCollection(collName);
coll.mapReduce(map, reduce)
    .filter(query);
You can create the map-reduce command manually:
String mapFunction = ...
String reduceFunction = ...
BsonDocument command = new BsonDocument();
BsonJavaScript map = new BsonJavaScript(mapFunction);
BsonJavaScript red = new BsonJavaScript(reduceFunction);
BsonDocument query = new BsonDocument("someidentifier", new BsonString("somevalue"));
command.append("mapreduce", new BsonString("mySourceCollection"));
command.append("query", query);
command.append("map", map);
command.append("reduce", red);
command.append("out", new BsonDocument("inline", new BsonBoolean(true)));
Document result = mongoClient.getDatabase(database).runCommand(command);
I think this is extremely ugly, but it is the only working solution I found so far using 3.2. (... and would be very interested in a better variant, too... ;-))
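For what it's worth, with inline output the command reply should carry the emitted documents under a "results" array (the standard mapReduce command reply format), so reading them from the result above might look roughly like this (untested sketch):
import java.util.List;

// "results" is expected to hold the inline output documents of the command above.
@SuppressWarnings("unchecked")
List<Document> inlineResults = (List<Document>) result.get("results");
for (Document doc : inlineResults) {
    System.out.println(doc.toJson());
}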
I think I found it...
I had a deeper look into MongoDB's Java driver source and it seems that the INLINE output feature is implicitly accessible:
The class MapReduceIterableImpl<TDocument, TResult> (MapReduceIterableImpl.java), which is the default implementation of the interface return type of mapReduce(), holds a private boolean inline with initial value true.
The only place where this can ever be switched to false is the method collectionName(final String collectionName), for which the description is as follows:
Sets the collectionName for the output of the MapReduce
The default action is replace the collection if it exists, to change this use action(com.mongodb.client.model.MapReduceAction).
If you never call this method on the object instance after mapReduce(), it will remain true as initialized...meaning: if there is no output collection, it must be inline.
Later on, when you access your result with iterator(), first(), forEach(...) etc., internally the execute() method gets called, which has the magic if condition:
if (inline) {
MapReduceWithInlineResultsOperation<TResult> operation =
new MapReduceWithInlineResultsOperation<TResult>(namespace,
new BsonJavaScript(mapFunction),
new BsonJavaScript(reduceFunction),
codecRegistry.get(resultClass))
.filter(toBsonDocument(filter))
.limit(limit)
.maxTime(maxTimeMS, MILLISECONDS)
.jsMode(jsMode)
.scope(toBsonDocument(scope))
.sort(toBsonDocument(sort))
.verbose(verbose)
.readConcern(readConcern);
....
} else {
MapReduceToCollectionOperation operation =
new MapReduceToCollectionOperation(namespace, new BsonJavaScript(mapFunction), new BsonJavaScript(reduceFunction),
collectionName)
.filter(toBsonDocument(filter))
.limit(limit)
.maxTime(maxTimeMS, MILLISECONDS)
.jsMode(jsMode)
.scope(toBsonDocument(scope))
.sort(toBsonDocument(sort))
.verbose(verbose)
.action(action.getValue())
.nonAtomic(nonAtomic)
.sharded(sharded)
.databaseName(databaseName)
.bypassDocumentValidation(bypassDocumentValidation);
...so it is instantiating MapReduceWithInlineResultsOperation when collectionName() has not been called.
I had no chance to test it because my NetBeans hates me at the moment, but I think it is pretty clear.
What do you think, did I miss something?
Would be glad if I could help you shift the code to API 3.x, great project!
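So, if that reading of the source is right, the inline case in 3.x should boil down to simply never calling collectionName(). A minimal sketch, reusing the names from the question (map and reduce being the JavaScript function strings) and an invented filter field:
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

// No collectionName(...) call, so (per the analysis above) the driver should run
// the map-reduce inline and the iterable streams the in-memory results directly.
MongoCollection<Document> coll = client.getDatabase(dbName).getCollection(collName);
for (Document doc : coll.mapReduce(map, reduce)
                        .filter(Filters.eq("someidentifier", "somevalue"))) {
    System.out.println(doc.toJson());
}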

How to make Appstats show both small and read operations?

I'm profiling my application locally (using the dev server) to get more information about how GAE works. My tests compare the common full Entity query and the Projection Query. Both tests run the same query, but the projection is specified with 2 properties. The test kind has 100 properties, all with the same value for each Entity, with a total of 10 Entities. An image with the Datastore viewer and the Appstats-generated data is shown below. In the Appstats image, Request 4 is a memcache flush, Request 3 is the test database creation (it was already created, so no costs here), Request 2 is the full Entity query and Request 1 is the projection query.
I'm surprised that both queries resulted in the same amount of reads. My guess is that small and read operations are being reported the same by Appstats. If this is the case, I want to separate them in the reports. These are the query-related functions:
// Full Entity Query
public ReturnCodes doQuery() {
DatastoreService dataStore = DatastoreServiceFactory.getDatastoreService();
for(int i = 0; i < numIters; ++i) {
Filter filter = new FilterPredicate(DBCreation.PROPERTY_NAME_PREFIX + i,
FilterOperator.NOT_EQUAL, i);
Query query = new Query(DBCreation.ENTITY_NAME).setFilter(filter);
PreparedQuery prepQuery = dataStore.prepare(query);
Iterable<Entity> results = prepQuery.asIterable();
for(Entity result : results) {
log.info(result.toString());
}
}
return ReturnCodes.SUCCESS;
}
// Projection Query
public ReturnCodes doQuery() {
DatastoreService dataStore = DatastoreServiceFactory.getDatastoreService();
for(int i = 0; i < numIters; ++i) {
String projectionPropName = DBCreation.PROPERTY_NAME_PREFIX + i;
Filter filter = new FilterPredicate(DBCreation.PROPERTY_NAME_PREFIX + i,
FilterOperator.NOT_EQUAL, i);
Query query = new Query(DBCreation.ENTITY_NAME).setFilter(filter);
query.addProjection(new PropertyProjection(DBCreation.PROPERTY_NAME_PREFIX + 0, Integer.class));
query.addProjection(new PropertyProjection(DBCreation.PROPERTY_NAME_PREFIX + 1, Integer.class));
PreparedQuery prepQuery = dataStore.prepare(query);
Iterable<Entity> results = prepQuery.asIterable();
for(Entity result : results) {
log.info(result.toString());
}
}
return ReturnCodes.SUCCESS;
}
Any ideas?
EDIT: To get a better overview of the problem I created another test, which does the same query but uses a keys-only query instead. For this case, Appstats correctly shows DATASTORE_SMALL operations in the report. I'm still pretty confused about the behavior of the projection query, which should also be reporting DATASTORE_SMALL operations. Please help!
[I wrote the go port of appstats, so this is based on my experience and recollection.]
My guess is this is a bug in appstats, which is a relatively unmaintained program. Projection queries are new, so appstats may not be aware of them, and treats them as normal read queries.
For some background, calculating costs is difficult. For write ops, the costs are returned with the results, as they must be, since the app has no way of knowing what changed (which is where the write costs happen). For reads and small ops, however, there is a formula to calculate the cost. Each appstats implementation (Python, Java, Go) must implement this calculation, including reflection or whatever is needed over the request object to determine what's going on. The APIs for doing this are not entirely obvious, and there are lots of little things, so it's easy to get it wrong and annoying to get it right.

Lucene.Net - weird behaviour in different servers

I was writing a search for one of our sites: (SITE A)
BooleanQuery booleanQuery = new BooleanQuery();
foreach (var field in fields)
{
    QueryParser qp = new QueryParser(field, new StandardAnalyzer());
    Query query = qp.Parse(search.ToLower() + "*");
    if (field.Contains("Title")) { query.SetBoost((float)1.8); }
    booleanQuery.Add(query, BooleanClause.Occur.SHOULD);
}
// CODE DIFFERENCE IS HERE
Query query2 = new TermQuery(new Term("StateProperties.IsActive", "True"));
booleanQuery.Add(query2, BooleanClause.Occur.MUST);
// END CODE DIFFERENCE
Lucene.Net.Search.TopScoreDocCollector collector = Lucene.Net.Search.TopScoreDocCollector.create(21, true);
searcher.Search(booleanQuery, collector);
hits = collector.TopDocs().scoreDocs;
This was working as expected.
Since we own a few sites and they use the same skeleton, I uploaded the search to another site (SITE B),
but the search stopped returning results.
After playing around a bit with the code, I managed to make it work like so (showing only the rewritten lines of code):
QueryParser qp2 = new QueryParser("StateProperties.IsActive", new StandardAnalyzer());
Query query2 = qp2.Parse("True");
booleanQuery.Add(query2, BooleanClause.Occur.MUST);
Does anyone know why this is happening?
I have checked the Lucene DLL version, and it's the same version on both sites (2.9.2.2).
Is the code I have written for SITE A wrong? Is the SITE B code wrong?
Is this my fault at all? Can the production server influence something like this?
Don't they have individual indexes on disk? If they have been indexed differently, they would also return different results. One thing that comes to mind is some sort of case sensitivity that matters, because a TermQuery will look for an EXACT match, whereas the parser will try to tokenize/filter the search term according to the analyzer (and will probably search for "true" instead of "True").