Lucene.net 4.8: Search by facet field - lucene.net

I am trying to write a filter in Lucene.NET that matches all blog posts that have at least one tag from a set of tags.
I'm trying the following but this always returns 0 results:
var bq = new BooleanQuery();
var termsQuery = new BooleanQuery();
foreach (var tag in tags)
{
termsQuery.Add(new TermQuery(new Term("TagSlugs", tag)), Occur.SHOULD);
}
bq.Add(termsQuery, Occur.MUST);
var hits = searcher.Search(bq, page * pageSize);
What am I doing wrong?
My document looks like this:
var doc = new Document
{
new StoredField("Id", blogPost.Id),
new Int32Field("ModuleId", blogPost.ModuleId, Field.Store.YES),
new TextField("Title", blogPost.Title, Field.Store.YES),
new StringField("Slug", blogPost.Slug, Field.Store.YES),
new StoredField("ImagePath", blogPost.ImagePath),
new TextField("Intro", blogPost.Intro, Field.Store.YES),
new TextField("Html", blogPost.Title, Field.Store.YES),
new Int64Field("PublishDate", blogPost.PublishDate.Ticks, Field.Store.YES),
new FacetField("PublishDateTag", blogPost.PublishDate.Year.ToString(), blogPost.PublishDate.Month.ToString(), blogPost.PublishDate.Year.ToString())
};
foreach (var tag in blogPost.TagObjects)
{
doc.Add(new FacetField("Tags", tag.Name));
doc.Add(new FacetField("TagSlugs", tag.Slug));
}

You don't need facets in this specific case; you can index the tag slug in a simple string field and filter by that.
But if you do want to use facets, try reading this demo: https://lucene.apache.org/core/4_8_0/demo/org/apache/lucene/demo/facet/SimpleFacetsExample.html
It's in Java, but readable if you are a C# developer.
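A minimal sketch of the string-field approach, assuming Lucene.NET 4.8 with the Lucene.Net.Analysis.Common package. As far as I understand, a FacetField is not indexed as a plain term under its dimension name (facet data is written through FacetsConfig.Build into a separate facet field), which would explain why a TermQuery on "TagSlugs" finds nothing. The field name "TagSlug", the sample posts, and the class and method names below are hypothetical.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class TagFilterSketch
{
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    // Indexes a few hypothetical posts in memory and returns how many of
    // them carry at least one of the requested tag slugs.
    public static int CountPostsWithAnyTag(IEnumerable<string> tagSlugs)
    {
        var posts = new Dictionary<string, string[]>
        {
            ["post-1"] = new[] { "lucene", "search" },
            ["post-2"] = new[] { "dotnet" },
            ["post-3"] = new[] { "cooking" }
        };

        using (var directory = new RAMDirectory())
        using (var analyzer = new StandardAnalyzer(MatchVersion))
        {
            using (var writer = new IndexWriter(directory, new IndexWriterConfig(MatchVersion, analyzer)))
            {
                foreach (var post in posts)
                {
                    var doc = new Document { new StringField("Slug", post.Key, Field.Store.YES) };
                    foreach (var slug in post.Value)
                    {
                        // One un-analyzed, searchable term per tag (hypothetical field name).
                        doc.Add(new StringField("TagSlug", slug, Field.Store.NO));
                    }
                    writer.AddDocument(doc);
                }
            } // disposing the writer commits the documents

            // SHOULD clauses only: a post matches if it has any of the tags.
            var anyTag = new BooleanQuery();
            foreach (var slug in tagSlugs)
            {
                anyTag.Add(new TermQuery(new Term("TagSlug", slug)), Occur.SHOULD);
            }

            using (var reader = DirectoryReader.Open(directory))
            {
                return new IndexSearcher(reader).Search(anyTag, 10).TotalHits;
            }
        }
    }
}
```

The FacetField entries can stay on the document for counting; the string field is only there so that a plain TermQuery has something to match.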

Related

How to implement search with multiple filters using lucene.net

I'm new to lucene.net. I want to implement search functionality on a client database. I have the following scenario:
Users will search for clients based on the currently selected city.
If the user wants to search for clients in another city, then he has to change the city and perform the search again.
To refine the search results we need to provide filters on Areas (multiple), Pincode, etc. In other words, I need the Lucene equivalent of the following SQL queries:
SELECT * FROM CLIENTS
WHERE CITY = N'City1'
AND (Area like N'%area1%' OR Area like N'%area2%')
SELECT * FROM CLIENTS
WHERE CITY IN ('MUMBAI', 'DELHI')
AND CLIENTTYPE IN ('GOLD', 'SILVER')
Below is the code I've implemented to provide search with city as a filter:
private static IEnumerable<ClientSearchIndexItemDto> _search(string searchQuery, string city, string searchField = "")
{
// validation
if (string.IsNullOrEmpty(searchQuery.Replace("*", "").Replace("?", "")))
return new List<ClientSearchIndexItemDto>();
// set up Lucene searcher
using (var searcher = new IndexSearcher(_directory, false))
{
var hits_limit = 1000;
var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
// search by single field
if (!string.IsNullOrEmpty(searchField))
{
var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, searchField, analyzer);
var query = parseQuery(searchQuery, parser);
var hits = searcher.Search(query, hits_limit).ScoreDocs;
var results = _mapLuceneToDataList(hits, searcher);
analyzer.Close();
searcher.Dispose();
return results;
}
else // search by multiple fields (ordered by RELEVANCE)
{
var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, new[]
{
"ClientId",
"ClientName",
"ClientTypeNames",
"CountryName",
"StateName",
"DistrictName",
"City",
"Area",
"Street",
"Pincode",
"ContactNumber",
"DateModified"
}, analyzer);
var query = parseQuery(searchQuery, parser);
var f = new FieldCacheTermsFilter("City",new[] { city });
var hits = searcher.Search(query, f, hits_limit, Sort.RELEVANCE).ScoreDocs;
var results = _mapLuceneToDataList(hits, searcher);
analyzer.Close();
searcher.Dispose();
return results;
}
}
}
Now I have to provide more filters on Area, Pincode, etc., where Area can have multiple values. I tried a BooleanQuery like below:
var cityFilter = new TermQuery(new Term("City", city));
var areasFilter = new FieldCacheTermsFilter("Area", areas); // areas is a string[]
BooleanQuery filterQuery = new BooleanQuery();
filterQuery.Add(cityFilter, Occur.MUST);
filterQuery.Add(areasFilter, Occur.MUST); // filterQuery.Add has no overload that accepts string[]
If we perform the same operation with a single area, it works fine.
I've also tried ChainedFilter like below, which doesn't seem to satisfy the requirement. The code below performs an OR operation on city and areas, but the requirement is to perform an OR operation between the given areas within the given city.
var f = new ChainedFilter(new Filter[] { cityFilter, areasFilter });
Can anybody suggest to me how to achieve this in lucene.net? Your help will be appreciated.
You're looking for the BooleanFilter. Almost any query object has a matching filter object.
Look into TermsFilter (from Lucene.Net.Contrib.Queries) if your indexing doesn't match the requirements of FieldCacheTermsFilter. From the documentation of the latter: "this filter requires that the field contains only a single term for all documents".
var cityFilter = new FieldCacheTermsFilter("CITY", new[] {"MUMBAI", "DELHI"});
var clientTypeFilter = new FieldCacheTermsFilter("CLIENTTYPE", new [] { "GOLD", "SILVER" });
var areaFilter = new TermsFilter();
areaFilter.AddTerm(new Term("Area", "area1"));
areaFilter.AddTerm(new Term("Area", "area2"));
var filter = new BooleanFilter();
filter.Add(new FilterClause(cityFilter, Occur.MUST));
filter.Add(new FilterClause(clientTypeFilter, Occur.MUST));
filter.Add(new FilterClause(areaFilter, Occur.MUST));
IndexSearcher searcher = null; // TODO.
Query query = null; // TODO.
Int32 hits_limit = 0; // TODO.
var hits = searcher.Search(query, filter, hits_limit, Sort.RELEVANCE).ScoreDocs;
What you are looking for is nested boolean queries: an OR group (on your cities) where the whole group is itself combined with the other filters as an AND:
filter1 AND filter2 AND filter3 AND (filtercity1 OR filtercity2 OR filtercity3)
There is already a good description of how to do this here:
How to create nested boolean query with lucene API (a AND (b OR c))?

Query without condition in MongoDB + C#

I'm trying to use collection.FindAndModify and give it an IMongoQuery which selects all the documents. But I cannot find how to create a query without any conditions!
Can anyone tell me how to do this? I'm using MongoDB C# Driver v1.8.3.
Here's my code:
var query = ???;
var sortBy = SortBy.Ascending(new string[] { "last_update" });
var update = Update<Entity>.Set(e => e.last_update, DateTime.Now);
var fields = Fields.Include(new string[] { "counter", "_id" });
var m = collection.FindAndModify(query, sortBy, update, fields, false, false);
What should I write in place of ??? to select all the documents?
Use an empty QueryDocument:
var query = new QueryDocument();
But keep in mind that FindAndModify will only modify the first matching document.

MongoDB query in C#

I'd like to get certain documents that match a specific clause, but don't know how to achieve that WHERE effect in relational databases. I have a simple database with words and their translations (objects with 2 fields) and use this code
var words = database.GetCollection<Word>("Dictionary")
to get them. But this gets the whole collection. What if there were thousands of records in the collection? How to get just the records I want?
Use regular expression matching as below. The "i" option makes the match case-insensitive.
var collections = mongoDatabase.GetCollection("Abcd");
var queryA = Query.And(
Query.Matches("strName", new BsonRegularExpression("ABCD", "i")),
Query.Matches("strVal", new BsonRegularExpression("4121", "i")));
var queryB = Query.Or(
Query.Matches("strName", new BsonRegularExpression("ABCD","i")),
Query.Matches("strVal", new BsonRegularExpression("33156", "i")));
var getA = collections.Find(queryA);
var getB = collections.Find(queryB);
Use Query.And or Query.Or when you want to search over multiple fields.
This assumes you have a class called Word that is modeled like your collection.
MongoServer _server = new MongoClient(connectionString).GetServer();
MongoDatabase _database = _server.GetDatabase(database);
MongoCollection _collection = _database.GetCollection(collection);
var results = _collection.FindAs<Word>(Query.EQ("MyField","WordToFind"));

Lucene.net DeleteDocuments deleting too much?

I have, for example, two docs in the index. Both of them have an "Id" field.
Now, I issue a DeleteDocuments on the IndexWriter, giving it the Id of the first item.
So creating the index:
var document = new global::Lucene.Net.Documents.Document();
document.Add(new Field("Content", "content", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
document.Add(new Field("Id", "vladanstrigo", Field.Store.YES, Field.Index.NOT_ANALYZED));
var document2 = new global::Lucene.Net.Documents.Document();
document2.Add(new Field("Content", "content second", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
document2.Add(new Field("Id", "ivanstrigo", Field.Store.YES, Field.Index.NOT_ANALYZED));
var directory = FSDirectory.Open("...directory...");
var analyzer = ...GetAnalyzer();
var indexWriter = ...GetWriter();
indexWriter.AddDocument(document);
indexWriter.AddDocument(document2);
This works great...I get two documents in index and they work perfect.
But when I do:
indexWriter.DeleteDocuments(new Term("Id", "ivanstrigo"));
The IndexWriter deleted ALL documents in index...not only the one matching this term...which I don't know how to stop. I only want to delete this one!
What am I doing wrong?
Found an answer: when creating the IndexWriter, I had passed "true" for the create flag, which recreated a new, empty index every time the writer was opened.
It works now.
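A minimal sketch of that fix, assuming Lucene.NET 3.0.3 (the class and helper names are hypothetical, and a RAMDirectory stands in for the real index directory). The first writer is opened with create set to true to build the index; the second is opened with create set to false, so the delete only removes the matching document.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class DeleteSketch
{
    // Builds a two-document index, deletes one document by Id,
    // and returns how many documents are left.
    public static int RemainingAfterDelete()
    {
        var directory = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        // create: true starts a fresh, empty index (wiping any existing one).
        using (var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            writer.AddDocument(MakeDoc("vladanstrigo"));
            writer.AddDocument(MakeDoc("ivanstrigo"));
        }

        // create: false opens the existing index, so earlier documents survive.
        using (var writer = new IndexWriter(directory, analyzer, false, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            writer.DeleteDocuments(new Term("Id", "ivanstrigo"));
        }

        using (var reader = IndexReader.Open(directory, true))
        {
            return reader.NumDocs();
        }
    }

    private static Document MakeDoc(string id)
    {
        var document = new Document();
        // NOT_ANALYZED keeps the Id as a single exact term for DeleteDocuments.
        document.Add(new Field("Id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        return document;
    }
}
```

Passing true in the second block instead would leave zero documents, which matches the behavior described in the question.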

Lucene.NET "OR"

How do I do an "OR" in Lucene.NET? Basically, I have an array of IDs and I want to return any records where a particular field contains any of the values. I previously did this with just one value, but now I want to convert the following code so that MetaDataID is an array of possible values instead of a single value.
if (MetaDataID.Length > 0)
completeQuery.Add(new QueryParser("MetaData", new StandardAnalyzer()).Parse(MetaDataID), BooleanClause.Occur.MUST);
When combining Lucene queries where you want any index record that contains any one of multiple possible values with additional criteria that must also be met, create multiple boolean query objects.
For the first group of "OR" conditions:
BooleanQuery booleanQueryInner = new BooleanQuery();
Query query1 = new TermQuery(new Term("id", "<id 1>"));
Query query2 = new TermQuery(new Term("id", "<id 2>"));
Query query3 = new TermQuery(new Term("id", "<id 3>"));
Query query4 = new TermQuery(new Term("id", "<id 4>"));
booleanQueryInner.Add(query1, BooleanClause.Occur.SHOULD);
booleanQueryInner.Add(query2, BooleanClause.Occur.SHOULD);
booleanQueryInner.Add(query3, BooleanClause.Occur.SHOULD);
booleanQueryInner.Add(query4, BooleanClause.Occur.SHOULD);
Now combine the inner group with the other conditions in an outer query:
BooleanQuery booleanQueryOuter = new BooleanQuery();
booleanQueryOuter.Add(booleanQueryInner, BooleanClause.Occur.MUST);
booleanQueryOuter.Add(booleanQueryOtherConditions, BooleanClause.Occur.MUST);
Now index records will only be returned if they meet one of the conditions in the inner "OR" group and also meet the conditions in the "other conditions" query.
You need to use BooleanClause.Occur.SHOULD instead of BooleanClause.Occur.MUST
e.g.:
BooleanQuery booleanQuery = new BooleanQuery();
Query query1 = new TermQuery(new Term("id", "<id 1>"));
Query query2 = new TermQuery(new Term("id", "<id 2>"));
booleanQuery.Add(query1, BooleanClause.Occur.SHOULD);
booleanQuery.Add(query2, BooleanClause.Occur.SHOULD);
If you really want to parse your query, you just need to choose the correct Analyzer and format for your query.
The StandardAnalyzer is built for English full text, which makes it a poor choice for a field that holds a list of IDs like yours.
The shortest solution in your case is to create an analyzer that tokenizes at a separator and to combine your values into a single string.
Example:
Create a Tokenizer that splits at typical separators and an Analyzer that uses it
using System.IO;
using System.Linq;
using Lucene.Net.Analysis;
namespace Project.Analysis
{
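public class TermTokenizer : LetterTokenizer
{
// some static separators
private static readonly char[] NONTOKEN_CHARS = new char[] { ',', ';', ' ', '\n', '\t' };
// pass the reader on to the base tokenizer (needed by new TermTokenizer(reader) below)
public TermTokenizer(TextReader reader) : base(reader)
{
}
// every character that is not a separator is part of a token
protected override bool IsTokenChar(char c)
{
return !NONTOKEN_CHARS.Contains(c);
}
}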
public class LowerCaseTermAnalyzer : Analyzer
{
public override TokenStream TokenStream(string fieldName, TextReader reader)
{
return new LowerCaseFilter(new TermTokenizer(reader));
}
}
}
Use the new analyzer in your parser
(You need to include System.Linq)
if (MetaDataID.Length > 0)
{
// the search term will look like this: "1;5;7"
string searchTerm = string.Join(";", MetaDataID);
// the query parser uses the new Analyzer
QueryParser parser = new QueryParser("MetaData", new LowerCaseTermAnalyzer());
// the parsed search term (only used internally) will look like this:
// "MetaData:1 MetaData:5 MetaData:7", which is essentially what you want to achieve
completeQuery.Add(parser.Parse(searchTerm), BooleanClause.Occur.MUST);
}
Be careful when using BooleanQuery for retrieving documents by id, because it limits the number of clauses (1024 by default); exceeding the limit throws a TooManyClauses exception.
The basic "OR" clause in Lucene is performed like this, assuming that your searchable field is named "id":
"id:1 id:2 id:3 id:4"
Instead of an "AND" query:
"+id:1 +id:2 +id:3 +id:4"
Using the standard QueryParser and a StringBuilder should do the magic for you.
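A minimal sketch of the StringBuilder part, using only the base class library. The class and method names are hypothetical, and the resulting string is meant to be handed to QueryParser.Parse. It assumes the ids contain no query-syntax characters; if they might, escaping each one with QueryParser.Escape first would be the safer route.

```csharp
using System.Collections.Generic;
using System.Text;

public static class IdQueryText
{
    // Builds "id:1 id:2 id:3": clauses without a prefix are optional,
    // so the parsed query matches any one of the ids (OR).
    public static string BuildOr(string field, IEnumerable<string> ids)
    {
        return Build(field, ids, "");
    }

    // Builds "+id:1 +id:2 +id:3": the '+' prefix makes every clause
    // required (AND).
    public static string BuildAnd(string field, IEnumerable<string> ids)
    {
        return Build(field, ids, "+");
    }

    private static string Build(string field, IEnumerable<string> ids, string prefix)
    {
        var sb = new StringBuilder();
        foreach (var id in ids)
        {
            if (sb.Length > 0)
            {
                sb.Append(' '); // single space between clauses
            }
            sb.Append(prefix).Append(field).Append(':').Append(id);
        }
        return sb.ToString();
    }
}
```

For example, IdQueryText.BuildOr("id", new[] { "1", "2", "3", "4" }) returns "id:1 id:2 id:3 id:4".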