How to group the results of a Lucene.Net search?

I have managed to create documents and do some complex searching too, but I am facing a problem grouping the search results.
Books are displayed after a search, which works fine. Along with this, I need an author grouping with counts, based on the same search query. For example:
Author Name | Count
A           | 12
B           | 2
I am using Lucene.Net 3.0.3.0, which does not support grouping, but there might be some workaround. I need the same feature for price ranges too.

Everything is possible if you write a custom Collector. What you describe are facets, and they can easily be computed by counting the document values yourself. The core part is calling the IndexSearcher.Search overload that accepts a collector. The collector should read the values, usually through a field-cache implementation, and do the required calculation.
This is a short demonstration using some classes from my demo-project Corelicious.Lucene.
var postTypes = new Dictionary<Int32, Int32>();
searcher.Search(query, new DelegatingCollector((reader, doc, scorer) => {
    var score = scorer.Score();
    if (score > 0) {
        var postType = SingleFieldCache.Default.GetInt32(reader, "PostTypeId", doc);
        if (postType.HasValue) {
            if (postTypes.ContainsKey(postType.Value)) {
                postTypes[postType.Value]++;
            } else {
                postTypes[postType.Value] = 1;
            }
        }
    }
}));
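Note that DelegatingCollector and SingleFieldCache come from the linked demo project, not from Lucene.Net itself. If you don't want to take that dependency, a minimal sketch of such a delegating collector against the Lucene.Net 3.0.3 Collector API could look like the following (the demo project's actual implementation may differ; SingleFieldCache presumably wraps the built-in field cache, FieldCache_Fields.DEFAULT in 3.0.3):
public class DelegatingCollector : Collector {
    private readonly Action<IndexReader, int, Scorer> _onCollect;
    private IndexReader _reader;
    private Scorer _scorer;

    public DelegatingCollector(Action<IndexReader, int, Scorer> onCollect) {
        _onCollect = onCollect;
    }

    // Called once per index segment with that segment's sub-reader.
    public override void SetNextReader(IndexReader reader, int docBase) {
        _reader = reader;
    }

    public override void SetScorer(Scorer scorer) {
        _scorer = scorer;
    }

    // Called once per matching document; forwards to the supplied delegate.
    public override void Collect(int doc) {
        _onCollect(_reader, doc, _scorer);
    }

    // Facet counting does not depend on documents arriving in order.
    public override bool AcceptsDocsOutOfOrder {
        get { return true; }
    }
}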
Full code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml;
using Corelicious.Lucene;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Directory = Lucene.Net.Store.Directory;
using Version = Lucene.Net.Util.Version;

namespace ConsoleApplication {
    public static class Program {
        public static void Main(string[] args) {
            Console.WriteLine("Creating directory...");
            var directory = new RAMDirectory();
            var analyzer = new StandardAnalyzer(Version.LUCENE_30);
            CreateIndex(directory, analyzer);

            var userQuery = "calculate pi";
            var queryParser = new QueryParser(Version.LUCENE_30, "Body", analyzer);
            var query = queryParser.Parse(userQuery);
            Console.WriteLine("Query: '{0}'", query);

            var indexReader = IndexReader.Open(directory, readOnly: true);
            var searcher = new IndexSearcher(indexReader);

            var postTypes = new Dictionary<Int32, Int32>();
            searcher.Search(query, new DelegatingCollector((reader, doc, scorer) => {
                var score = scorer.Score();
                if (score > 0) {
                    var postType = SingleFieldCache.Default.GetInt32(reader, "PostTypeId", doc);
                    if (postType.HasValue) {
                        if (postTypes.ContainsKey(postType.Value)) {
                            postTypes[postType.Value]++;
                        } else {
                            postTypes[postType.Value] = 1;
                        }
                    }
                }
            }));

            Console.WriteLine("Post type summary");
            Console.WriteLine("Post type | Count");
            foreach (var pair in postTypes.OrderByDescending(x => x.Value)) {
                var postType = (PostType)pair.Key;
                Console.WriteLine("{0,-10} | {1}", postType, pair.Value);
            }

            Console.ReadLine();
        }

        public enum PostType {
            Question = 1,
            Answer = 2,
            Tag = 4
        }

        public static void CreateIndex(Directory directory, Analyzer analyzer) {
            using (var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
            using (var xmlStream = File.OpenRead("/Users/sisve/Downloads/Stack Exchange Data Dump - Sept 2011/Content/092011 Mathematics/posts.xml"))
            using (var xmlReader = XmlReader.Create(xmlStream)) {
                while (xmlReader.ReadToFollowing("row")) {
                    var tags = xmlReader.GetAttribute("Tags") ?? String.Empty;
                    var title = xmlReader.GetAttribute("Title") ?? String.Empty;
                    var body = xmlReader.GetAttribute("Body");

                    var doc = new Document();
                    // tags are stored as <tag1><tag2>
                    foreach (Match match in Regex.Matches(tags, "<(.*?)>")) {
                        doc.Add(new Field("Tags", match.Groups[1].Value, Field.Store.NO, Field.Index.NOT_ANALYZED));
                    }
                    doc.Add(new Field("Title", title, Field.Store.NO, Field.Index.ANALYZED));
                    doc.Add(new Field("Body", body, Field.Store.NO, Field.Index.ANALYZED));
                    doc.Add(new Field("PostTypeId", xmlReader.GetAttribute("PostTypeId"), Field.Store.NO, Field.Index.NOT_ANALYZED));
                    writer.AddDocument(doc);
                }
                writer.Optimize();
                writer.Commit();
            }
        }
    }
}
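The question also asks for the same feature over price ranges; the identical counting approach works, reading a numeric field per hit and bucketing it. A hedged sketch, assuming a hypothetical "Price" field indexed NOT_ANALYZED as an integer:
var priceRanges = new Dictionary<String, Int32>();
searcher.Search(query, new DelegatingCollector((reader, doc, scorer) => {
    var price = SingleFieldCache.Default.GetInt32(reader, "Price", doc);
    if (price.HasValue) {
        // Bucket boundaries are arbitrary; pick whatever ranges the UI needs.
        var bucket = price.Value < 10 ? "0-9"
            : price.Value < 50 ? "10-49"
            : "50+";
        if (priceRanges.ContainsKey(bucket)) {
            priceRanges[bucket]++;
        } else {
            priceRanges[bucket] = 1;
        }
    }
}));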

Related

Azure Search CreateIndexAsync fails with CamelCase field names FieldBuilder

Using Azure Search v11, I can't get the following to work, although with the standard FieldBuilder the index is created fine.
private static async Task CreateIndexAsync(SearchIndexClient indexClient, string indexName, Type type)
{
    var builder = new FieldBuilder
    {
        Serializer = new JsonObjectSerializer(new JsonSerializerOptions { PropertyNamingPolicy = new CamelCaseNamingPolicy() })
    };
    var searchFields = builder.Build(type).ToArray();
    var definition = new SearchIndex(indexName, searchFields);
    await indexClient.CreateIndexAsync(definition);
}
public class CamelCaseNamingPolicy : JsonNamingPolicy
{
    public override string ConvertName(string name)
    {
        return char.ToLower(name[0]) + name.Substring(1);
    }
}
See our sample for FieldBuilder. Basically, you must use a naming policy for both FieldBuilder and the SearchClient:
var clientOptions = new SearchClientOptions
{
    Serializer = new JsonObjectSerializer(
        new JsonSerializerOptions
        {
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        }),
};

var builder = new FieldBuilder
{
    Serializer = clientOptions.Serializer,
};

var index = new SearchIndex("name")
{
    Fields = builder.Build(type),
};

var indexClient = new SearchIndexClient(uri, clientOptions);
await indexClient.CreateIndexAsync(index);

await Task.Delay(5000); // creating the index can take a little while

var searchClient = new SearchClient(uri, clientOptions);
var response = await searchClient.SearchAsync("whatever");
While this sample works (our sample code comes from oft-executed tests), if you have further troubles, please be sure to post the exact exception message you are getting.
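For completeness, a minimal hypothetical model type to pass to builder.Build(type) might look like this; with the camelCase policy above, the generated index fields come out as hotelId and hotelName (SimpleField/SearchableField are the attributes from Azure.Search.Documents.Indexes):
// Hypothetical model for illustration only.
public class Hotel
{
    [SimpleField(IsKey = true)]
    public string HotelId { get; set; }

    [SearchableField(IsFilterable = true, IsSortable = true)]
    public string HotelName { get; set; }
}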

Using Dynamic LINQ with EF.Functions.Like

On the Dynamic LINQ website there's an example using the Like function.
I am unable to get it to work with EF Core 3.1.
[Test]
public void DynamicQuery()
{
    using var context = new SamDBContext(Builder.Options);
    var config = new ParsingConfig { ResolveTypesBySimpleName = true };
    var lst = context.Contacts.Where(config, "DynamicFunctions.Like(FirstName, \"%Ann%\")").ToList();
    lst.Should().HaveCountGreaterThan(1);
}
Example from the Dynamic LINQ website
var example1 = Cars.Where(c => EF.Functions.Like(c.Brand, "%t%"));
example1.Dump();
var config = new ParsingConfig { ResolveTypesBySimpleName = true };
var example2 = Cars.Where(config, "DynamicFunctions.Like(Brand, \"%t%\")");
example2.Dump();
It looks just like my code, but I am getting the following error:
System.Linq.Dynamic.Core.Exceptions.ParseException : No property or field 'DynamicFunctions' exists in type 'Contact'
You don't need ResolveTypesBySimpleName; implement your own custom type provider instead.
The piece below allows the use of PostgreSQL ILike with unaccent:
public class LinqCustomProvider : DefaultDynamicLinqCustomTypeProvider
{
    public override HashSet<Type> GetCustomTypes()
    {
        var result = base.GetCustomTypes();
        result.Add(typeof(NpgsqlFullTextSearchDbFunctionsExtensions));
        result.Add(typeof(NpgsqlDbFunctionsExtensions));
        result.Add(typeof(DbFunctionsExtensions));
        result.Add(typeof(DbFunctions));
        result.Add(typeof(EF));
        return result;
    }
}

// ....
var expressionString = $"EF.Functions.ILike(EF.Functions.Unaccent(People.Name), \"%{value}%\")";
var config = new ParsingConfig()
{
    DateTimeIsParsedAsUTC = true,
    CustomTypeProvider = new LinqCustomProvider()
};
return query.Where(config, expressionString);
Hope this helps people, took me some time to get this sorted.
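Applied back to the question's test, a sketch under the assumption that registering typeof(EF) and typeof(DbFunctionsExtensions), as in the provider above, is enough for EF Core 3.1:
var config = new ParsingConfig { CustomTypeProvider = new LinqCustomProvider() };
var lst = context.Contacts
    .Where(config, "EF.Functions.Like(FirstName, \"%Ann%\")")
    .ToList();
lst.Should().HaveCountGreaterThan(1);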

How to update many documents using UpdateManyAsync

I have the following method to update a document in MongoDB:
public async Task UpdateAsync(T entity)
{
    await _collection.ReplaceOneAsync(filter => filter.Id == entity.Id, entity);
}
Which works fine - I was just wondering if anybody has an example of how the UpdateManyAsync function works:
public async Task UpdateManyAsync(IEnumerable<T> entities)
{
    await _collection.UpdateManyAsync(); // What are the parameters here?
}
Any advice is appreciated!
UpdateManyAsync works the same way as update with multi: true in the Mongo shell: you specify a filtering condition and an update operation, and it affects multiple documents. For instance, to increment the field a in all documents where a is greater than 10, you can use this method:
var builder = Builders<SampleClass>.Update;
await myCollection.UpdateManyAsync(x => x.a > 10, builder.Inc(x => x.a, 1));
However, I guess you'd like to replace multiple documents. That can be achieved using the BulkWrite method. If you need a generic method in C#, you can introduce a marker interface to build the filter part of each replace operation:
public interface IMongoIdentity
{
    ObjectId Id { get; set; }
}
Then you can add a generic constraint to your class and use BulkWriteAsync in .NET like below:
class YourRepository<T> where T : IMongoIdentity
{
    IMongoCollection<T> collection;

    public async Task UpdateManyAsync(IEnumerable<T> entities)
    {
        var updates = new List<WriteModel<T>>();
        var filterBuilder = Builders<T>.Filter;
        foreach (var doc in entities)
        {
            var filter = filterBuilder.Where(x => x.Id == doc.Id);
            updates.Add(new ReplaceOneModel<T>(filter, doc));
        }
        await collection.BulkWriteAsync(updates);
    }
}
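A hypothetical entity and call site, to show how the marker interface fits together (the names are made up for illustration):
public class Book : IMongoIdentity
{
    public ObjectId Id { get; set; }   // maps to MongoDB's _id by convention
    public string Title { get; set; }
}

// Each document whose _id matches an entity's Id is replaced in one bulk round-trip.
var repository = new YourRepository<Book>();
await repository.UpdateManyAsync(modifiedBooks);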
Following on from @mickl's answer: if T is a plain generic type without such a marker interface, you cannot use x => x.Id. Use reflection instead, as below:
public async Task<string> UpdateManyAsync(IEnumerable<T> entities)
{
    var updates = new List<WriteModel<T>>();
    var filterBuilder = Builders<T>.Filter;
    foreach (var doc in entities)
    {
        foreach (PropertyInfo prop in typeof(T).GetProperties())
        {
            if (prop.Name == "Id")
            {
                var filter = filterBuilder.Eq(prop.Name, prop.GetValue(doc));
                updates.Add(new ReplaceOneModel<T>(filter, doc));
                break;
            }
        }
    }
    BulkWriteResult result = await _collection.BulkWriteAsync(updates);
    return result.ModifiedCount.ToString();
}
Or you can go by the BsonId attribute:
public async Task UpdateManyAsync(IEnumerable<TEntity> objs, CancellationToken cancellationToken = default)
{
    var updates = new List<WriteModel<TEntity>>();
    var filterBuilder = Builders<TEntity>.Filter;
    foreach (var obj in objs)
    {
        foreach (var prop in typeof(TEntity).GetProperties())
        {
            object[] attrs = prop.GetCustomAttributes(true);
            foreach (object attr in attrs)
            {
                var bsonId = attr as BsonIdAttribute;
                if (bsonId != null)
                {
                    var filter = filterBuilder.Eq(prop.Name, prop.GetValue(obj));
                    updates.Add(new ReplaceOneModel<TEntity>(filter, obj));
                    break;
                }
            }
        }
    }
    await _dbCollection.BulkWriteAsync(updates, null, cancellationToken);
}

not able to save documents in mongodb c# with .Net driver 2.0

I want to save documents in a collection. My method is below, but it is not saving at all.
internal static void InitializeDb()
{
    var db = GetConnection();
    var collection = db.GetCollection<BsonDocument>("locations");
    var locations = new List<BsonDocument>();
    var json = JObject.Parse(File.ReadAllText(@"..\..\test_files\TestData.json"));
    foreach (var d in json["locations"])
    {
        using (var jsonReader = new JsonReader(d.ToString()))
        {
            var context = BsonDeserializationContext.CreateRoot(jsonReader);
            var document = collection.DocumentSerializer.Deserialize(context);
            locations.Add(document);
        }
    }
    collection.InsertManyAsync(locations);
}
If I make it async and await it, it runs too late; I need this to run first and only then test the data.
For future reference: calling Wait() on the task at the end makes the async call behave synchronously:
internal static void InitializeDb()
{
    var db = GetConnection();
    var collection = db.GetCollection<BsonDocument>("locations");
    var locations = new List<BsonDocument>();
    var json = JObject.Parse(File.ReadAllText(@"..\..\test_files\TestData.json"));
    foreach (var d in json["locations"])
    {
        using (var jsonReader = new JsonReader(d.ToString()))
        {
            var context = BsonDeserializationContext.CreateRoot(jsonReader);
            var document = collection.DocumentSerializer.Deserialize(context);
            locations.Add(document);
        }
    }
    collection.InsertManyAsync(locations).Wait();
}
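As an aside (not from the original answer): Wait() wraps failures in an AggregateException, while blocking with GetAwaiter().GetResult() rethrows the original exception, which is often easier to diagnose in tests:
collection.InsertManyAsync(locations).GetAwaiter().GetResult();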

Merge Self-tracking entities

A graph of objects is stored in the database, and the same object graph is serialized into a binary package. The package is transmitted over the network to the client; it is then necessary to merge the data from the package with the data from the database.
Source code of the merge:
// objList - data from package
var objectIds = objList.Select(row => row.ObjectId).ToArray();
// result - data from Database
var result = SomeService.Instance.LoadObjects(objectIds);
foreach (var OSobj in objList)
{
    var obj = result.Objects.ContainsKey(OSobj.ObjectId)
        ? result.Objects[OSobj.ObjectId]
        : result.Objects.CreateNew(OSobj.ObjectId);
    var targetObject = result.DataObjects.Where(x => x.ObjectId == OSobj.ObjectId).FirstOrDefault();
    targetObject.StopTracking();
    var importedProperties = ImportProperties(targetObject.Properties, OSobj.Properties);
    targetObject.Properties.Clear();
    foreach (var property in importedProperties)
    {
        targetObject.Properties.Add(property);
    }
    targetObject.StartTracking();
}
return result;
And the code of the ImportProperties method:
static List<Properties> ImportProperties(
    IEnumerable<Properties> targetProperties,
    IEnumerable<Properties> sourceProperties)
{
    Func<Guid, bool> hasElement = targetProperties
        .ToDictionary(e => e.PropertyId, e => e)
        .ContainsKey;
    var tempTargetProperties = new List<Properties>();
    foreach (var sourceProperty in sourceProperties)
    {
        if (!hasElement(sourceProperty.PropertyId))
        {
            sourceProperty.AcceptChanges();
            tempTargetProperties.Add(sourceProperty.MarkAsAdded());
        }
        else
        {
            sourceProperty.AcceptChanges();
            tempTargetProperties.Add(sourceProperty.MarkAsModified());
        }
    }
    return tempTargetProperties;
}
The server saves the incoming changes like this:
_context.ApplyChanges("OSEntities.Objects", entity);
_context.SaveChanges(SaveOptions.DetectChangesBeforeSave);
When the server tries to save the changes, an exception occurs:
AcceptChanges cannot continue because the object's key values conflict with another object in the ObjectStateManager. Make sure that the key values are unique before calling AcceptChanges.
But if I change the code of the ImportProperties method as follows, the error does not occur and the changes are saved successfully:
static List<Properties> ImportProperties(
    IEnumerable<Properties> targetProperties,
    IEnumerable<Properties> sourceProperties)
{
    Func<Guid, bool> hasElement = targetProperties.ToDictionary(e => e.PropertyId, e => e).ContainsKey;
    var tempTargetProperties = new List<Properties>();
    foreach (var sourceProperty in sourceProperties)
    {
        if (!hasElement(sourceProperty.PropertyId))
        {
            var newProp = new Properties
            {
                ElementId = sourceProperty.ElementId,
                Name = sourceProperty.Name,
                ObjectId = sourceProperty.ObjectId,
                PropertyId = sourceProperty.PropertyId,
                Value = sourceProperty.Value
            };
            tempTargetProperties.Add(newProp);
        }
        else
        {
            var modifiedProp = new Properties
            {
                ElementId = sourceProperty.ElementId,
                Name = sourceProperty.Name,
                ObjectId = sourceProperty.ObjectId,
                PropertyId = sourceProperty.PropertyId,
                Value = sourceProperty.Value
            };
            modifiedProp.MarkAsModified();
            tempTargetProperties.Add(modifiedProp);
        }
    }
    return tempTargetProperties;
}
Why is there an exception?
When you transport an object graph (an entity with n-level-deep navigation properties) to a client application, the entities record any changes made in their respective change trackers. When the entity (or object graph) is sent back to the server side of the application, basically all you need to do is:
try
{
    using (Entities context = new Entities())
    {
        context.ApplyChanges(someEntity);
        context.SaveChanges();
    }
}
catch
{
    ...
}
I don't see the need for all the code you posted above. What are you trying to achieve with it?