Exclude document(s) if condition true - hibernate-search

I have three fields in an entity:
establishmentNameEn
IsTelPublishDa
isTelSecret
I have a fuzzy search on establishmentNameEn. Now I want to exclude documents if the value of IsTelPublishDa is 0 or the value of isTelSecret is 1.
My final query is: (+establishmentNameEn:kamran~1 +(-IsTelPublishDa:[0 TO 0] -isTelSecret:[1 TO 1]))
But it returns no results.
Query code:
private org.apache.lucene.search.Query excludeDoc(QueryBuilder queryBuilder) {
List<org.apache.lucene.search.Query> queries = new ArrayList<>();
queries.add(queryBuilder.keyword().onField("IsTelPublishDa").matching(0).createQuery());
queries.add(queryBuilder.keyword().onField("isTelSecret").matching(1).createQuery());
BooleanQuery.Builder builder = new BooleanQuery.Builder();
for (Query qu : queries) {
builder.add(qu, BooleanClause.Occur.MUST_NOT);
}
return builder.build();
}
Main method:
Query fuzzyQuery = queryBuilder.keyword().fuzzy().withEditDistanceUpTo(1).onField("establishmentNameEn").matching(word).createQuery();
luceneQuery.add(fuzzyQuery);
luceneQuery.add(excludeDoc(queryBuilder));
BooleanQuery.Builder builder = new BooleanQuery.Builder();
for (Query qu : luceneQuery) {
builder.add(qu, BooleanClause.Occur.MUST);
}

This will never match anything, because the boolean query only contains negative clauses:
BooleanQuery.Builder builder = new BooleanQuery.Builder();
for (Query qu : queries) {
builder.add(qu, BooleanClause.Occur.MUST_NOT);
}
return builder.build();
That's quite confusing, but that's how Lucene works, and you're using a low-level Lucene API when you're using BooleanQuery.Builder.
Solution #1
If you want to avoid that kind of surprise in the future, make sure you always have positive clauses in your query. For example, refactor your code to add the "MUST_NOT" clause to the top-level boolean query:
// Main code
BooleanQuery.Builder builder = new BooleanQuery.Builder();
builder.add(queryBuilder.keyword().fuzzy().withEditDistanceUpTo(1).onField("establishmentNameEn").matching(word).createQuery(), BooleanClause.Occur.MUST);
builder.add(excludedDoc(queryBuilder), BooleanClause.Occur.MUST_NOT);
private org.apache.lucene.search.Query excludedDoc(QueryBuilder queryBuilder) {
BooleanQuery.Builder builder = new BooleanQuery.Builder();
builder.add(queryBuilder.keyword().onField("IsTelPublishDa").matching(0).createQuery(), BooleanClause.Occur.SHOULD);
builder.add(queryBuilder.keyword().onField("isTelSecret").matching(1).createQuery(), BooleanClause.Occur.SHOULD);
return builder.build();
}
Solution #2
Alternatively, you can just keep your code as is, but use the Hibernate Search DSL instead of BooleanQuery.Builder. The Hibernate Search DSL "fixes" some of the most confusing aspects of Lucene, so that this query will work as expected (matching all documents except those that match the clauses):
BooleanJunction<?> booleanJunction = queryBuilder.bool();
for (Query qu : queries) {
booleanJunction.mustNot(qu);
}
return booleanJunction.createQuery();
More details...
If you want to know why exactly this doesn't work...
A boolean query matches nothing by default. A document only becomes a candidate when a positive clause (MUST or SHOULD) matches it; the candidates are then filtered by the other clauses, positive or negative.
So in your case, no document is ever selected in the first place, the "must not" clauses have nothing to filter, and the query still matches nothing.
Just adding a MatchAllDocs clause would make it work as expected:
BooleanQuery.Builder builder = new BooleanQuery.Builder();
builder.add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST);
for (Query qu : queries) {
builder.add(qu, BooleanClause.Occur.MUST_NOT);
}
return builder.build();
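To make the rule above concrete, here is a toy model of boolean-query matching in plain Java. It is not Lucene code and does not reflect Lucene's actual implementation; the class ToyBooleanQuery, its helper methods, and the sample documents are all hypothetical, and scoring is ignored entirely. It only sketches the matching rule: negative clauses filter, they never select.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model of boolean-query matching. Hypothetical names; not Lucene code. */
final class ToyBooleanQuery implements Predicate<Map<String, Integer>> {
    enum Occur { MUST, SHOULD, MUST_NOT }

    private final List<Predicate<Map<String, Integer>>> queries = new ArrayList<>();
    private final List<Occur> occurs = new ArrayList<>();

    ToyBooleanQuery add(Predicate<Map<String, Integer>> query, Occur occur) {
        queries.add(query);
        occurs.add(occur);
        return this;
    }

    @Override
    public boolean test(Map<String, Integer> doc) {
        boolean hasMust = false;
        boolean shouldHit = false;
        for (int i = 0; i < queries.size(); i++) {
            boolean hit = queries.get(i).test(doc);
            switch (occurs.get(i)) {
                case MUST:
                    hasMust = true;
                    if (!hit) return false;
                    break;
                case MUST_NOT:
                    if (hit) return false;
                    break;
                case SHOULD:
                    shouldHit |= hit;
                    break;
            }
        }
        // Negative clauses only filter: a positive clause must have selected the document.
        return hasMust || shouldHit;
    }

    /** Matches documents whose field has exactly the given value. */
    static Predicate<Map<String, Integer>> field(String name, int value) {
        return doc -> Integer.valueOf(value).equals(doc.get(name));
    }

    /** Only MUST_NOT clauses, as in the question's excludeDoc(). */
    static ToyBooleanQuery negativeOnly() {
        return new ToyBooleanQuery()
                .add(field("IsTelPublishDa", 0), Occur.MUST_NOT)
                .add(field("isTelSecret", 1), Occur.MUST_NOT);
    }

    /** Same exclusions, plus a match-all MUST clause to select every document first. */
    static ToyBooleanQuery withMatchAll() {
        return new ToyBooleanQuery()
                .add(doc -> true, Occur.MUST)
                .add(field("IsTelPublishDa", 0), Occur.MUST_NOT)
                .add(field("isTelSecret", 1), Occur.MUST_NOT);
    }

    /** Made-up sample data: only the first document should survive the exclusions. */
    static List<Map<String, Integer>> sampleDocs() {
        return List.of(
                Map.of("IsTelPublishDa", 1, "isTelSecret", 0),
                Map.of("IsTelPublishDa", 0, "isTelSecret", 0),
                Map.of("IsTelPublishDa", 1, "isTelSecret", 1));
    }

    static long countMatches(Predicate<Map<String, Integer>> query, List<Map<String, Integer>> docs) {
        return docs.stream().filter(query).count();
    }

    public static void main(String[] args) {
        System.out.println(countMatches(negativeOnly(), sampleDocs())); // 0
        System.out.println(countMatches(withMatchAll(), sampleDocs())); // 1
    }
}
```

In this model the purely negative query selects nothing, while the variant with a match-all clause keeps the one document that has neither flag set.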

Related

Spring mongoTemplate. Sort is not working in geo query (NearQuery)

I have a problem with mongoTemplate in Spring when I am trying to query using NearQuery with a Sort. The Sort does not work:
Query query = new Query();
query.with(new Sort(Direction.DESC, "timeStamp"));
Criteria criteria = new Criteria();
criteria.and("type").is("MeasurementPoint");
query.addCriteria(criteria);
NearQuery queryN = NearQuery.near(p).maxDistance(new Distance(distance, Metrics.KILOMETERS)).num(range).query(query);
GeoResults<MeasurementPoint> geoPoints = mongoTemplate.geoNear(queryN, MeasurementPoint.class);
I do not know what I am doing wrong but the geoResult returns me the first match, not the last one (Sorted DESC). So, I assume that the Sort is not working properly.
Any idea? Is it a bug?
Thanks!
Unfortunately, it isn't possible to sort geoNear results, since it doesn't return a cursor and the default sort is by distance from the point. What you could do is sort the results manually in Java code. Note that the code below ignores the distance and sorts only by "timeStamp".
// Copy the results first, in case the list returned by getContent() is unmodifiable.
List<GeoResult<MeasurementPoint>> results = new ArrayList<>(geoPoints.getContent());
Collections.sort(results, new Comparator<GeoResult<MeasurementPoint>>() {
@Override
public int compare(GeoResult<MeasurementPoint> o1, GeoResult<MeasurementPoint> o2) {
// Ascending by timeStamp; swap o1 and o2 for descending order.
return o1.getContent().getTimeStamp() == o2.getContent().getTimeStamp() ? 0 :
(o1.getContent().getTimeStamp() > o2.getContent().getTimeStamp() ? 1 : -1);
}
});
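Since the question asks for descending order, the manual sort can also be written with Comparator.comparingLong(...).reversed(). The sketch below is self-contained and therefore uses a hypothetical Point class in place of the real mapped type; with actual GeoResult objects the key extractor would presumably be something like r -> r.getContent().getTimeStamp(), assuming getTimeStamp() returns a long.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortByTimestampDemo {
    /** Hypothetical stand-in for the mapped document; only the sort key matters here. */
    static final class Point {
        final String name;
        final long timeStamp;

        Point(String name, long timeStamp) {
            this.name = name;
            this.timeStamp = timeStamp;
        }

        long getTimeStamp() {
            return timeStamp;
        }
    }

    /** Returns a new list sorted by timeStamp, newest first; the input is left untouched. */
    static List<Point> newestFirst(List<Point> results) {
        List<Point> sorted = new ArrayList<>(results); // copy: the input may be unmodifiable
        sorted.sort(Comparator.comparingLong(Point::getTimeStamp).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Point> points = List.of(new Point("a", 10L), new Point("b", 30L), new Point("c", 20L));
        for (Point p : newestFirst(points)) {
            System.out.println(p.name + " " + p.timeStamp);
        }
    }
}
```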
An alternative approach is to use $geoWithin and $centerSphere. Since you're limiting results to some distance (the distance variable), that could work.
Query query = Query.query(Criteria.where("coords").withinSphere(new Circle(p, new Distance(distance, Metrics.KILOMETERS).getNormalizedValue())));
query.with(new Sort(Direction.DESC, "timeStamp"));
Criteria criteria = new Criteria();
criteria.and("type").is("MeasurementPoint");
query.addCriteria(criteria);
List<MeasurementPoint> geoPoints = mongoTemplate.find(query, MeasurementPoint.class);
You could find more information about $geoWithin and $centerSphere here:
http://docs.mongodb.org/manual/reference/operator/geoWithin/
http://docs.mongodb.org/manual/reference/operator/centerSphere/

Simple JPA 2 criteria query "where" condition

I'm learning jpa-hibernate basics.
I have this query for getting all users:
CriteriaBuilder cb = getEntityManager().getCriteriaBuilder();
CriteriaQuery cq = cb.createQuery();
cq.select(cq.from(Utente.class));
return getEntityManager().createQuery(cq).getResultList();
Now I want to filter by a boolean field named 'ghost' where it equals true (or false, it depends).
Translated:
SELECT * FROM users WHERE ghost = 0;
Do I have to use cq.where() ? How?
Yes, you have to use cq.where().
Try something like this:
Root<Utente> utente = cq.from(Utente.class);
boolean myCondition = true; // or false
Predicate predicate = cb.equal(utente.get(Utente_.ghost), myCondition);
cq.where(predicate);
Here I have used the canonical metamodel class Utente_, which should be generated automatically. This avoids typos in field names and improves type safety. Otherwise you can use:
Predicate predicate = cb.equal(utente.get("ghost"), myCondition);

Lucene query with field dependency

I have a lucene index of documents that have an _IsPrivate field. I need to query the index to retrieve all documents that are either _IsPrivate == false or _IsPrivate == true and _Owner == me. I've been trying the following lucene query, but I'm not getting the expected results...
_IsPrivate:false OR (_IsPrivate:true AND _Owner:me)
The result is that I'm only getting documents that I own (public and private).
Any thoughts on how I can rewrite my query?
I would use "BooleanQuery" to perform that kind of operation. You make 2 queries, one for each complete search statement, and then add them together with the "SHOULD" operator.
var bq = new BooleanQuery();
// Public documents
var bq1 = new BooleanQuery();
bq1.add(new TermQuery(new Term("_IsPrivate", "false")), BooleanClause.Occur.MUST);
// Private documents that I own
var bq2 = new BooleanQuery();
bq2.add(new TermQuery(new Term("_IsPrivate", "true")), BooleanClause.Occur.MUST);
bq2.add(new TermQuery(new Term("_Owner", "me")), BooleanClause.Occur.MUST);
// A document may match either group
bq.add(bq1, BooleanClause.Occur.SHOULD);
bq.add(bq2, BooleanClause.Occur.SHOULD);
It might be a bit cumbersome, but I really like to organise my queries this way.
Hope it helps.

Lucene.Net BooleanClause issue

I'm having an issue with Lucene.Net and a BooleanQuery. This is my code:
BooleanQuery query = new BooleanQuery();
String[] types = searchTypes.Split(',');
foreach (string t in types)
query.Add(new TermQuery(new Term("document type", t.ToLower())), BooleanClause.Occur.SHOULD);
This should basically be an OR statement going through documents that have a certain type, which works on its own. However, I also have this query:
Query documentTitleQuery = new WildcardQuery(new Term("title", "*" + documentTitle.ToLower() + "*"));
query.Add(documentTitleQuery, BooleanClause.Occur.MUST);
Which searches for words in a title. Both of these queries work fine on their own. When they are used together, it seems Lucene is treating the documentTitleQuery as an OR. So both queries together should return documents of a specific type AND contain specific words in the title, but it is returning all types that have specific words in the title.
Use one more layer of Boolean query to group both:
BooleanQuery topQuery = new BooleanQuery();
...
BooleanQuery query1 = new BooleanQuery();
...
BooleanQuery query2 = new BooleanQuery();
...
topQuery.add(query1, BooleanClause.Occur.MUST);
topQuery.add(query2, BooleanClause.Occur.MUST);
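The reason the extra layer helps: when a boolean query contains a MUST clause, its SHOULD clauses become optional for matching, so the type clauses in the question stop restricting the result. The toy model below sketches that rule in plain Java. It is not Lucene or Lucene.Net code; ShouldWithMustDemo, ToyBool, Doc, and the sample data are all hypothetical, and scoring is ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class ShouldWithMustDemo {
    enum Occur { MUST, SHOULD }

    /** Hypothetical stand-in for an indexed document. */
    static final class Doc {
        final String type;
        final String title;

        Doc(String type, String title) {
            this.type = type;
            this.title = title;
        }
    }

    /** Toy boolean query: matching only, no scoring. */
    static final class ToyBool implements Predicate<Doc> {
        private final List<Predicate<Doc>> queries = new ArrayList<>();
        private final List<Occur> occurs = new ArrayList<>();

        ToyBool add(Predicate<Doc> query, Occur occur) {
            queries.add(query);
            occurs.add(occur);
            return this;
        }

        @Override
        public boolean test(Doc doc) {
            boolean hasMust = false;
            boolean shouldHit = false;
            for (int i = 0; i < queries.size(); i++) {
                boolean hit = queries.get(i).test(doc);
                if (occurs.get(i) == Occur.MUST) {
                    hasMust = true;
                    if (!hit) return false;
                } else {
                    shouldHit |= hit;
                }
            }
            // With a MUST clause present, SHOULD clauses are optional for matching.
            return hasMust || shouldHit;
        }
    }

    static Predicate<Doc> typeIs(String type) {
        return doc -> doc.type.equals(type);
    }

    static Predicate<Doc> titleContains(String word) {
        return doc -> doc.title.contains(word);
    }

    /** Flat query as in the question: type clauses and title clause side by side. */
    static ToyBool flat() {
        return new ToyBool()
                .add(typeIs("pdf"), Occur.SHOULD)
                .add(typeIs("doc"), Occur.SHOULD)
                .add(titleContains("report"), Occur.MUST);
    }

    /** Nested query: the OR group of types becomes a single MUST clause. */
    static ToyBool nested() {
        ToyBool types = new ToyBool()
                .add(typeIs("pdf"), Occur.SHOULD)
                .add(typeIs("doc"), Occur.SHOULD);
        return new ToyBool()
                .add(types, Occur.MUST)
                .add(titleContains("report"), Occur.MUST);
    }

    static List<Doc> sampleDocs() {
        return List.of(
                new Doc("pdf", "annual report"),
                new Doc("xls", "sales report"),
                new Doc("doc", "meeting notes"));
    }

    static long count(Predicate<Doc> query) {
        return sampleDocs().stream().filter(query).count();
    }

    public static void main(String[] args) {
        System.out.println(count(flat()));   // 2: the xls document slips through
        System.out.println(count(nested())); // 1: only the pdf report remains
    }
}
```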

Lucene.NET "OR"

How do I do an "OR" in Lucene.NET? Basically what I have is an array of IDs and I want to return any records where a particular field contains any of the values. I previously was doing this with just one value, but now I want to convert the following code so that MetaDataID is an array of possible values instead of one single value.
if (MetaDataID.Length > 0)
completeQuery.Add(new QueryParser("MetaData", new StandardAnalyzer()).Parse(MetaDataID), BooleanClause.Occur.MUST);
When combining Lucene queries where you want any index record that contains any one of multiple possible values with additional criteria that must also be met, create multiple boolean query objects.
For the first group of "OR" conditions:
BooleanQuery booleanQueryInner = new BooleanQuery();
Query query1 = new TermQuery(new Term("id", "<id 1>"));
Query query2 = new TermQuery(new Term("id", "<id 2>"));
Query query3 = new TermQuery(new Term("id", "<id 3>"));
Query query4 = new TermQuery(new Term("id", "<id 4>"));
booleanQueryInner.add(query1, BooleanClause.Occur.SHOULD);
booleanQueryInner.add(query2, BooleanClause.Occur.SHOULD);
booleanQueryInner.add(query3, BooleanClause.Occur.SHOULD);
booleanQueryInner.add(query4, BooleanClause.Occur.SHOULD);
Now combine this with the other conditions in the query:
BooleanQuery booleanQueryOuter = new BooleanQuery();
booleanQueryOuter.add(booleanQueryInner, BooleanClause.Occur.MUST);
booleanQueryOuter.add(booleanQueryOtherConditions, BooleanClause.Occur.MUST);
Now index records will only be returned if they meet one of the conditions in the inner "OR" group and also meet the conditions in the "other conditions" query.
You need to use BooleanClause.Occur.SHOULD instead of BooleanClause.Occur.MUST
e.g.:
BooleanQuery booleanQuery = new BooleanQuery();
Query query1 = new TermQuery(new Term("id", "<id 1>"));
Query query2 = new TermQuery(new Term("id", "<id 2>"));
booleanQuery.add(query1, BooleanClause.Occur.SHOULD);
booleanQuery.add(query2, BooleanClause.Occur.SHOULD);
When you really want to parse your query, you just need to choose the correct Analyzer and format for your query.
The StandardAnalyzer is not a good choice when you are indexing anything but English full text, especially not in your case! It filters out numbers!
The shortest solution in your case is to create an analyzer that tokenizes at a separator and to combine your values into a single string.
Example:
Create a Tokenizer that splits at typical separators and an Analyzer that uses it
using System.IO;
using System.Linq;
using Lucene.Net.Analysis;
namespace Project.Analysis
{
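public class TermTokenizer : LetterTokenizer
{
// some static separators
private static readonly char[] NONTOKEN_CHARS = new char[] { ',', ';', ' ', '\n', '\t' };
// constructor needed by "new TermTokenizer(reader)" in the analyzer below
public TermTokenizer(TextReader reader) : base(reader) { }
protected override bool IsTokenChar(char c)
{
return !NONTOKEN_CHARS.Contains(c);
}
}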
public class LowerCaseTermAnalyzer : Analyzer
{
public override TokenStream TokenStream(string fieldName, TextReader reader)
{
return new LowerCaseFilter(new TermTokenizer(reader));
}
}
}
Use the new analyzer in your parser
(You need to include System.Linq)
if (MetaDataID.Length > 0)
{
// the search term will look like this: "1;5;7"
string searchTerm = string.Join(";", MetaDataID);
// the query parser uses the new Analyzer
QueryParser parser = new QueryParser("MetaData",new LowerCaseTermAnalyzer());
// the parsed search term (only used internally) will look like this:
// "MetaData:1 MetaData:5 MetaData:7", which is essentially what you want to achieve
completeQuery.Add(parser.Parse(searchTerm), BooleanClause.Occur.MUST);
}
Be careful when using BooleanQuery for retrieving documents by id, because it has a limit on the maximum number of boolean clauses.
The basic "OR" clause in Lucene is performed like this, assuming that your searchable field is named "id":
"id:1 id:2 id:3 id:4"
Instead of an "AND" query:
"+id:1 +id:2 +id:3 +id:4"
Using the standard QueryParser and a StringBuilder should do the magic for you.
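As a rough sketch of the StringBuilder part, the query strings above could be assembled like this. The example is in Java rather than C#, and the class OrQueryString and its methods are hypothetical helpers, not part of Lucene. It assumes the parser's default operator is OR and that the values contain no characters that need escaping in query syntax.

```java
final class OrQueryString {
    /** Joins field:value pairs with spaces, putting the given prefix before each pair. */
    private static String join(String field, String prefix, String... values) {
        StringBuilder sb = new StringBuilder();
        for (String value : values) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(prefix).append(field).append(':').append(value);
        }
        return sb.toString();
    }

    /** "OR" form: any of the values may match (with OR as the default operator). */
    static String anyOf(String field, String... values) {
        return join(field, "", values);
    }

    /** "AND" form: every clause is required. */
    static String allOf(String field, String... values) {
        return join(field, "+", values);
    }

    public static void main(String[] args) {
        System.out.println(anyOf("id", "1", "2", "3", "4")); // id:1 id:2 id:3 id:4
        System.out.println(allOf("id", "1", "2", "3", "4")); // +id:1 +id:2 +id:3 +id:4
    }
}
```

The resulting string would then be handed to the query parser in place of a single value.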