Google like autosuggest with Lucene.net - lucene.net

I have a Lucene index that stores customers that basically includes a view model (documents fields that are stored and not indexed), an ID (field stored and indexed to permit find and update of document), and a list of terms covered by the google-like search (multiple field instances of name Term). Terms may be field in the view model or not.
This works fine for the actual searching of documents by term. The question is how I can implement auto-suggest, basically get a list of Term (the field, not Lucene Term) values that might be the continuation of the entered value (i.e. "Co" might result in "Colorado", "Coloring Book", etc because those are actual values in at least one Document's Term field.

Theres a lot of way to do this, but if you need a quick and simple way to do it, use a TermEnum.
Just paste this little code sample in a new C# console application and check if it works for you to start from.
RAMDirectory dir = new RAMDirectory();
IndexWriter iw = new IndexWriter(dir, new KeywordAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED);
Document d = new Document();
Field f = new Field("text", "", Field.Store.YES, Field.Index.ANALYZED);
d.Add(f);
f.SetValue("abc");
iw.AddDocument(d);
f.SetValue("colorado");
iw.AddDocument(d);
f.SetValue("coloring book");
iw.AddDocument(d);
iw.Commit();
IndexReader reader = iw.GetReader();
TermEnum terms = reader.Terms(new Term("text", "co"));
int maxSuggestsCpt = 0;
// will print:
// colorado
// coloring book
do
{
Console.WriteLine(terms.Term.Text);
maxSuggestsCpt++;
if (maxSuggestsCpt >= 5)
break;
}
while (terms.Next() && terms.Term.Text.StartsWith("co"));
reader.Dispose();
iw.Dispose();

Related

Getting ElasticSearch document fields inside of loaded records in searchkick

Is it possible to get ElasticSearch document fields inside of loaded AR records?
Here is a gist that illustrates what I mean: https://gist.github.com/allomov/39c30905e94c646fb11637b45f43445d
In this case I want to avoid additional computation of total_price after getting response from ES. The solution that I currently see is to include the relationship and run total_price computation for each record, which is not so optimal way to perform this operation, as I see it.
result = Product.search("test", includes: :product_components).response
products_with_total_prices = result.map do |product|
{
product: product
total_price: product.product_components.map(&:price).compact.sum
}
end
Could you please tell if it is possible to mix ES document fields into AR loaded record?
As far as I'm aware it isn't possible to get a response that merges the document fields into the loaded record.
Usually I prefer to completely rely on the data in the indexed document where possible (using load: false as a search option), and only load the AR record(s) as a second step if necessary. For example:
result = Product.search("test", load: false).response
# If you also need AR records, could do something like:
product_ids = result.map(&:id)
products_by_id = {}
Product.where(id: product_ids).find_each do |ar_product|
products_by_id[ar_product.id] = ar_product
end
merged_result = result.map do |es_product|
es_product[:ar_product] = products_by_id[es_product.id]}
end
Additionally, it may be helpful to retrieve the document stored in the ES index for a specific record, which I would normally do by defining the following method in your Product class:
def es_document
return nil unless doc = Product.search_index.retrieve(self).presence
Hashie::Mash.new doc
end
You can use select: true and the with_hit method to get the record and the search document together. For your example:
result = Product.search("test", select: true)
products_with_total_prices =
result.with_hit.map do |product, hit|
{
product: product,
total_price: hit["_source"]["total_price"]
}
end

How can I upsert a record and array element at the same time?

That is meant to be read as a dual upsert operation, upsert the document then the array element.
So MongoDB is a denormalized store for me (we're event sourced) and one of the things I'm trying to deal with is the concurrent nature of that. The problem is this:
Events can come in out of order, so each update to the database need to be an upsert.
I need to be able to not only upsert the parent document but an element in an array property of that document.
For example:
If the document doesn't exist, create it. All events in this stream have the document's ID but only part of the information depending on the event.
If the document does exist, then update it. This is the easy part. The update command is just written as UpdateOneAsync and as an upsert.
If the event is actually to update a list, then that list element needs to be upserted. So if the document doesn't exist, it needs to be created and the list item will be upserted (resulting in an insert); if the document does exist, then we need to find the element and update it as an upsert, so if the element exists then it is updated otherwise it is inserted.
If at all possible, having it execute as a single atomic operation would be ideal, but if it can only be done in multiple steps, then so be it. I'm getting a number of mixed examples on the net due to the large change in the 2.x driver. Not sure what I'm looking for beyond the UpdateOneAsync. Currently using 2.4.x. Explained examples would be appreciated. TIA
Note:
Reiterating that this is a question regarding the MongoDB C# driver 2.4.x
Took some tinkering, but I got it.
var notificationData = new NotificationData
{
ReferenceId = e.ReferenceId,
NotificationId = e.NotificationId,
DeliveredDateUtc = e.SentDate.DateTime
};
var matchDocument = Builders<SurveyData>.Filter.Eq(s => s.SurveyId, e.EntityId);
// first upsert the document to make sure that you have a collection to write to
var surveyUpsert = new UpdateOneModel<SurveyData>(
matchDocument,
Builders<SurveyData>.Update
.SetOnInsert(f => f.SurveyId, e.EntityId)
.SetOnInsert(f => f.Notifications, new List<NotificationData>())){ IsUpsert = true};
// then push a new element if none of the existing elements match
var noMatchReferenceId = Builders<SurveyData>.Filter
.Not(Builders<SurveyData>.Filter.ElemMatch(s => s.Notifications, n => n.ReferenceId.Equals(e.ReferenceId)));
var insertNewNotification = new UpdateOneModel<SurveyData>(
matchDocument & noMatchReferenceId,
Builders<SurveyData>.Update
.Push(s => s.Notifications, notificationData));
// then update the element that does match the reference ID (if any)
var matchReferenceId = Builders<SurveyData>.Filter
.ElemMatch(s => s.Notifications, Builders<NotificationData>.Filter.Eq(n => n.ReferenceId, notificationData.ReferenceId));
var updateExistingNotification = new UpdateOneModel<SurveyData>(
matchDocument & matchReferenceId,
Builders<SurveyData>.Update
// apparently the mongo C# driver will convert any negative index into an index symbol ('$')
.Set(s => s.Notifications[-1].NotificationId, e.NotificationId)
.Set(s => s.Notifications[-1].DeliveredDateUtc, notificationData.DeliveredDateUtc));
// execute these as a batch and in order
var result = await _surveyRepository.DatabaseCollection
.BulkWriteAsync(
new []{ surveyUpsert, insertNewNotification, updateExistingNotification },
new BulkWriteOptions { IsOrdered = true })
.ConfigureAwait(false);
The post linked as being a dupe was absolutely helpful, but it was not the answer. There were a few things that needed to be discovered.
The 'second statement' in the linked example didn't work
correctly, at least when translated literally. To get it to work, I had to match on the
element and then invert the logic by wrapping it in the Not() filter.
In order to use 'this index' on the match, you have to use a
negative index on the array. As it turns out, the C# driver will
convert any negative index to the '$' character when the query is
rendered.
In order to ensure they are run in order, you must include bulk write
options with IsOrdered set to true.

Columns Priority while searching with Lucene.NET

Team,
I have 6 indexed columns to search as below.
Name
Description
SKU
Category
Price
SearchCriteria
Now, While searching I have need to perform search on "SearchCritera" column first then rest of the columns.
In short - The products with matched "SearchCritera" shold display on the top of search results.
var parser = new MultiFieldQueryParser(Version.LUCENE_30,
new[] { "SearchCriteria",
"Name",
"Description",
"SKU",
"Category",
"Price"
}, analyzer);
var query = parseQuery(searchQuery, parser);
var finalQuery = new BooleanQuery();
finalQuery.Add(parser.Parse(searchQuery), Occur.SHOULD);
var hits = searcher.Search(finalQuery, null, hits_limit, Sort.RELEVANCE);
There are 2 ways to do it.
The first method is using field boosting:
During indexing set a boost to the fields by their priority:
Field name = new Field("Name", strName, Field.Store.NO, Field.Index.ANALYZED);
name.Boost = 1;
Field searchCriteria = new Field("SearchCriteria", strSearchCriteria, Field.Store.NO, Field.Index.ANALYZED);
searchCriteria.Boost = 2;
doc.Add(name);
doc.Add(searchCriteria);
This way the scoring of the terms in SearchCriteria field will be doubled then the scoring of the terms in the Name field.
This method is better if you always wants SearchCriteria to be more important than Name.
The second method is to using MultiFieldQueryParser boosting during search:
Dictionary<string,float> boosts = new Dictionary<string,float>();
boosts.Add("SearchCriteria",2);
boosts.Add("Name",1);
MultiFieldQueryParser parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30new[], new[] { "SearchCriteria", "Name"}, analyzer, boosts);
This method is better if you want the boosting to work only in some scenarios of your application.
You should try and see if the boosting number fits your needs (the sensitivity of the priority you are looking for) and change them according to your needs.
to make the example short and readable I used only 2 of your fields but you should use all of them of curse…

Composite views in couchbase

I'm new to Couchbase and am struggling to get a composite index to do what I want it to. The use-case is this:
I have a set of "Enumerations" being stored as documents
Each has a "last_updated" field which -- as you may have guessed -- stores the last time that the field was updated
I want to be able to show only those enumerations which have been updated since some given date but still sort the list by the name of the enumeration
I've created a Couchbase View like this:
function (doc, meta) {
var time_array;
if (doc.doc_type === "enum") {
if (doc.last_updated) {
time_array = doc.last_updated.split(/[- :]/);
} else {
time_array = [0,0,0,0,0,0];
}
for(var i=0; i<time_array.length; i++) { time_array[i] = parseInt(time_array[i], 10); }
time_array.unshift(meta.id);
emit(time_array, null);
}
}
I have one record that doesn't have the last_updated field set and therefore has it's time fields are all set to zero. I thought as a first test I could filter out that result and I put in the following:
startkey = ["a",2012,0,0,0,0,0]
endkey = ["Z",2014,0,0,0,0,0]
While the list is sorted by the 'id' it isn't filtering anything! Can anyone tell me what I'm doing wrong? Is there a better composite view to achieve these results?
In couchbase when you query view by startkey - endkey you're unable to filter results by 2 or more properties. Couchbase has only one index, so it will filter your results only by first param. So your query will be identical to query with:
startkey = ["a"]
endkey = ["Z"]
Here is a link to complete answer by Filipe Manana why it can't be filtered by those dates.
Here is a quote from it:
For composite keys (arrays), elements are compared from left to right and comparison finishes as soon as a element is different from the corresponding element in the other key (same as what happens when comparing strings à la memcmp() or strcmp()).
So if you want to have a view that filters by date, date array should go first in composite key.

NumericRangeQuery in NHibernate.Search

I am creating a search, where the user can both choose an interval and search on a term in the same go.
This is however giving me trouble, since I have up until have only used the usual text query.
I am wondering how I am to go about using both a NumericRangeQuery and a regular term query. Usually I would use a query below:
var parser = new MultiFieldQueryParser(
new[] { "FromPrice", "ToPrice", "Description"}, new SimpleAnalyzer());
Query query = parser.Parse(searchQuery.ToString());
IFullTextSession session = Search.CreateFullTextSession(this.Session);
IQuery fullTextQuery = session.CreateFullTextQuery(query, new[] { typeof(MyObject) });
IList<MyObject> results = fullTextQuery.List<MyObject>();
But if I was to e.g. search the range FromPrice <-> ToPrice and also the description, how should I do this, since session.CreateFullTextQuery only takes one Query object?
you can create a single query that is a BooleanQuery combining all the conditions you want to be met.
For the ranges, heres a link to the synthax using the QueryParser:
http://lucene.apache.org/core/old_versioned_docs/versions/2_9_2/queryparsersyntax.html#Range Searches