Lucene get documents containing specific field name - lucene.net

I am using Lucene in my project and have run into an issue: I need to find documents that contain a field with a specific name. I have only been able to find solutions where you create a search term from a name/value pair, like this:
IndexSearcher searcher = new IndexSearcher(directoryReader);
TermQuery query = new TermQuery(new Term("name", "value"));
TopDocs topdocs = searcher.search(query, numberToReturn);
But as I stated, I need to find documents by field name only, and then access the value of that field in the matching documents.
Although I am working with Lucene.NET, I will be thankful for a solution in any language.
Thank you in advance.

I found this solution and made a small change to it:
var queryParser = new QueryParser(Version.LUCENE_30, "content", analyzer);
queryParser.AllowLeadingWildcard = true;
var query = queryParser.Parse( "*" );
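Why this works: a `*` wildcard on a field expands to every term indexed in that field, so it matches every document that has at least one indexed term there (a stored-only field, or one that analyzed to zero tokens, would not match). The Java sketch below is a toy model of an inverted index, not Lucene code; the class and method names are hypothetical. It shows both halves of the question: finding documents that have a field, and reading the field's value, which in Java Lucene would be `searcher.doc(hit.doc).get("fieldName")` on a stored field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index (field -> term -> doc ids); illustrative only, not Lucene.
final class FieldWildcardSketch {
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
    private final Map<Integer, Map<String, String>> stored = new HashMap<>();

    // Index a document given as field name -> whitespace-separated text.
    void addDocument(int docId, Map<String, String> fields) {
        stored.put(docId, fields);
        for (Map.Entry<String, String> e : fields.entrySet()) {
            Map<String, Set<Integer>> terms =
                postings.computeIfAbsent(e.getKey(), k -> new TreeMap<>());
            for (String term : e.getValue().toLowerCase().split("\\s+")) {
                if (!term.isEmpty()) {
                    terms.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
                }
            }
        }
    }

    // Equivalent of field:* : the union of the postings of every term in the field.
    Set<Integer> docsWithField(String field) {
        Set<Integer> result = new TreeSet<>();
        for (Set<Integer> docs : postings.getOrDefault(field, Map.of()).values()) {
            result.addAll(docs);
        }
        return result;
    }

    // Like reading a stored field from a hit; null if the document lacks the field.
    String storedValue(int docId, String field) {
        return stored.getOrDefault(docId, Map.of()).get(field);
    }

    public static void main(String[] args) {
        FieldWildcardSketch index = new FieldWildcardSketch();
        index.addDocument(1, Map.of("content", "hello world", "author", "ann"));
        index.addDocument(2, Map.of("author", "bob"));
        index.addDocument(3, Map.of("content", "goodbye"));
        System.out.println(index.docsWithField("content")); // prints [1, 3]
        System.out.println(index.docsWithField("author"));  // prints [1, 2]
    }
}
```

In the real answer above, replacing the default field `"content"` passed to the `QueryParser` constructor with your own field name targets that field instead.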

Related

Why is EWS search filter returning fewer emails than I see from outlook?

I am using the following code to retrieve emails whose subject contains "MS" and "QW". I see more than 8 emails satisfying the search criteria, but the code returns only two. Can anyone help me figure out what the problem is?
var filter1 = new SearchFilter.ContainsSubstring(ItemSchema.Subject, "MS", ContainmentMode.Substring, ComparisonMode.IgnoreCase);
var filter2 = new SearchFilter.ContainsSubstring(ItemSchema.Subject, "QW");
var sf = new SearchFilter.SearchFilterCollection(LogicalOperator.And, filter1, filter2);
var findResults = service.FindItems(WellKnownFolderName.Inbox, sf, view);
A few things I can see: with the second search filter you haven't specified the ContainmentMode or ComparisonMode. You also don't seem to have added the filters to the SearchFilterCollection, e.g. you should have:
sf.Add(filter1);
sf.Add(filter2);
That search will yield pretty poor performance on a folder with a large number of items. I would suggest you look at AQS instead; then you can do:
service.FindItems(WellKnownFolderName.Inbox, "Subject:MS AND Subject:QW", view);
That will search against the Content Indexes and yield better performance.
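If the subject terms come from user input, the AQS string can be assembled rather than hard-coded. The Java helper below is a hypothetical sketch: it only builds the string (for example `Subject:MS AND Subject:QW`) that would be passed as the query-string argument of `FindItems`. Quoting terms that contain spaces is an assumption about AQS phrase syntax, and terms containing double quotes are not handled.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that builds an AQS query requiring every term in the subject.
final class AqsSubjectQuery {
    static String allSubjectTerms(List<String> terms) {
        return terms.stream()
            .map(String::trim)
            .filter(t -> !t.isEmpty())
            // Assumption: AQS treats a double-quoted value as a single phrase.
            .map(t -> t.contains(" ") ? "Subject:\"" + t + "\"" : "Subject:" + t)
            .collect(Collectors.joining(" AND "));
    }

    public static void main(String[] args) {
        System.out.println(allSubjectTerms(List.of("MS", "QW"))); // prints Subject:MS AND Subject:QW
    }
}
```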

Query Documents in iManage (Worksite)

I am using the Worksite API to query documents in iManage (version 8.5). My code is listed below. If I use only one search parameter, the code works without any problem. However, if I add more than one parameter, it returns either null or no results (result.Count = 0).
Then I changed my code to use the ManOrQuery class (provided by the Worksite API; see the commented-out lines), and that still doesn't work.
// Search for documents matching the specified date range.
iManageSearch rds = new iManageSearch(isession);
// Populate searchparameters
IManProfileSearchParameters searchparams = Utility.CreateUnpopulatedProfileParams(idms);
//searchparams.Add(imProfileAttributeID.imProfileCreateDate, dateRange.Value);
//searchparams.Add(imProfileAttributeID.imProfileAuthor, srchKey);
//searchparams.Add(imProfileAttributeID.imProfileFullText, srchKey);
searchparams.Add(imProfileAttributeID.imProfileDocNum, srchKey);
//searchparams.Add(imProfileAttributeID.imProfileDescription, srchKey);
// Search documents
IManDocuments results = rds.GetDocuments(Utility.BuildDatabaseList(isession.Databases), searchparams);
// tried the other way to search document
//QueryBuilder qb = new QueryBuilder();
//ManOrQuery orquery = qb.CreateORQuery;
//qb.AddORSearchFieldValue(orquery, imProfileAttributeID.imProfileDocNum, srchKey);
//qb.AddORSearchFieldValue(orquery, imProfileAttributeID.imProfileAuthor, srchKey);
//qb.AddORSearchFieldValue(orquery, imProfileAttributeID.imProfileFullText, srchKey);
//IManContents results = qb.GetContents(iworkarea, Utility.BuildDatabaseList(isession.Databases), (IManQuery)orquery);
int c = results.Count;
On my UI, I have a textbox where users enter a search term, and I would like to compare that value with the Author, DocNumber, DocTitle and also the content of the documents. My goal is to build a query like (docAuthor=srchKey OR docNum=srchKey OR docDescription=srchKey ...). I've been banging my head against this; I hope someone can help. Thank you.
PS: I also referred to a post here How to get information out of iManage / Desksite, but that doesn't work for me....
I know it's been a while since this question was posted, and I have searched around Stack Overflow without finding much help with this problem. However, I have managed to write some code that works (for me at least), and if it's too late to help you, I hope it might help someone else.
I can't see how you set up the database in the code above, so there may be a problem there, as the syntax for adding your search parameters appears to be correct.
Update:
I have spoken to our administrators, and it appears that searching depends on the indexer settings of the server. This is potentially why your code was not working. In my case, I had to disable the indexer in the database properties in the WorkSite Service Manager so that it would use SQL.
IManDMS dms = (IManDMS)Activator.CreateInstance(Type.GetTypeFromProgID("iManage.ManDMS"));
IManSession session = dms.Sessions.Add(serverName);
session.TrustedLogin2(userToken);
IManDatabase database = session.Databases.ItemByName(libraryName);
IManProfileSearchParameters searchparameters = dms.CreateProfileSearchParameters();
// add search parameters
// this works (just to prove that we can search for a document)
searchparameters.Add(imProfileAttributeID.imProfileDocNum, "4882408");
searchparameters.Add(imProfileAttributeID.imProfileCreateDate, new DateTime(2015, 04, 8).ToShortDateString());
// run the search
IManContents searchResults = database.SearchDocuments(searchparameters, true);
// process the results
foreach (IManDocument item in ((IEnumerable)searchResults).OfType<IManDocument>())
{
// do something with the document
}
session.Logout();
dms.Sessions.RemoveByObject(session);

How do I add a 'where not' to a QueryBuilder Query

I want to search the entire content tree, but not specific trees that have a 'Do Not Search' property at their base.
The Query Builder API page does not reference anything besides AND and OR.
Is it possible to exclude paths from the search or can I only explicitly include paths?
The first three lines are "/content AND /content/path/es". I want "/content AND NOT(/content/path/es)"
map.put("group.1_path", "/content");
map.put("group.2_path", "/content/path/es");
map.put("group.p.or","false");
I have tried the next two with both true and false, and they have no effect:
map.put("group.2_path.p.not", "true");
map.put("group.2_path.not", "true");
map.put("group.2_path", "not('/content/path/es')");
I can't find any documentation that mentions 'not', '!', or any other name that might be used instead.
Yes, it is possible, but not exactly in the way you are trying.
You can exclude pages with certain properties using the property predicate evaluator.
For example, if you want to exclude pages that have the property "donotsearch" on their jcr:content node, you can query it using the property operation "exists":
map.put("path", "/content/geometrixx/en/toolbar");
map.put("type", "cq:Page");
/* Relative path to the property to check for */
map.put("property", "jcr:content/donotsearch");
/* Operation to perform on the value of the prop, in this case existence check */
map.put("property.operation", "exists");
/* Value for the prop, false = not, by default it is true */
map.put("property.value", "false");
This would result in the following XPath Query
/jcr:root/content/geometrixx/en/toolbar//element(*, cq:Page)
[
not(jcr:content/@donotsearch)
]
But if you would like to exclude pages with a certain value for the property donotsearch, you can change the above query as shown below:
map.put("property", "jcr:content/donotsearch"); // the property to check
map.put("property.operation", "unequals"); // unequals excludes the value; equals, like, etc. are also available
map.put("property.value", "valueToExclude"); // placeholder: the value of the property
You can find a lot of other information about querying in the docs.
I'm not sure what version of CQ you're using (you linked to the 5.4 docs), but in 5.5 and above, the PredicateGroup class has a setNegated method that excludes results matching the group.
You can't set negation on an individual Predicate, but there would be nothing to stop you creating a group with just the predicate that you wish to negate:
Predicate pathPredicate = new Predicate("path").set("path", "/content/path/es");
PredicateGroup doNotSearchGroup = new PredicateGroup();
doNotSearchGroup.setNegated(true);
doNotSearchGroup.add(pathPredicate);
Query query = queryBuilder.createQuery(doNotSearchGroup, session); // session: your JCR Session
EDIT: Just to update in relation to your comment, you should be able to add a PredicateGroup to another PredicateGroup (as PredicateGroup is a subclass of Predicate). So once you have your negated group, combine it with the path search:
Predicate pathPredicate = new Predicate("path");
pathPredicate.set("path", "/content");
PredicateGroup combinedPredicate = new PredicateGroup();
combinedPredicate.add(pathPredicate);
combinedPredicate.add(doNotSearchGroup);
Query query = queryBuilder.createQuery(combinedPredicate, session); // session: your JCR Session
It is a pretty straightforward implementation.
Use
map.put("group.p.not", "true");
map.put("group.1_path", "/path1/where/you/donot/want/to/search");
map.put("group.2_path", "/path2/where/you/donot/want/to/search");
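For the question's exact case ("/content AND NOT(/content/path/es)"), the map form of the negated-group idea might look like the Java sketch below. It only builds the predicate parameters; in AEM the map would typically be handed to `PredicateGroup.create(map)` and then to `queryBuilder.createQuery(group, session)`. Whether `group.p.not` is honored depends on your CQ/AEM version (one answer in this thread reports it did not work), so treat this as an assumption to verify, for example in the QueryBuilder debugger, which accepts one `key=value` pair per line.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: QueryBuilder predicate parameters for "everything under root except one subtree".
final class ExcludePathPredicates {
    static Map<String, String> excludeSubtree(String root, String excluded) {
        Map<String, String> map = new LinkedHashMap<>(); // keeps insertion order for display
        map.put("path", root);             // search everything under root...
        map.put("group.p.not", "true");    // ...but negate the following group
        map.put("group.1_path", excluded); // the subtree to leave out
        return map;
    }

    // Renders the map as newline-separated key=value pairs.
    static String toQueryString(Map<String, String> map) {
        return map.entrySet().stream()
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(excludeSubtree("/content", "/content/path/es")));
    }
}
```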
I've run into the same problem and while I couldn't fully solve it I was able to come up with a workaround using groups and the unequals operator. Something like:
path=/var/xxx
1_property=jcr:primaryType
1_property.value=rep:ACL
1_property.operation=unequals
2_property=jcr:primaryType
2_property.value=rep:GrantACE
2_property.operation=unequals
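When there are several values to exclude, the numbered `N_property` predicates of this workaround can be generated instead of written by hand. The Java sketch below is a hypothetical helper that reproduces the parameters above for any list of values; top-level predicates are combined with AND by default, so a node matches only if the property differs from every listed value.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: path plus one "unequals" property predicate per excluded value.
final class UnequalsPredicates {
    static Map<String, String> excludeValues(String path, String property, List<String> values) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("path", path);
        int n = 1;
        for (String value : values) {
            String prefix = n + "_property"; // 1_property, 2_property, ...
            map.put(prefix, property);
            map.put(prefix + ".value", value);
            map.put(prefix + ".operation", "unequals");
            n++;
        }
        return map;
    }

    public static void main(String[] args) {
        excludeValues("/var/xxx", "jcr:primaryType", List.of("rep:ACL", "rep:GrantACE"))
            .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```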
Btw, map.put("group.p.not",true) did not work for me.
This link has a lot of useful information: https://hashimkhan.in/2015/12/02/query-builder/

How to search across all the fields?

In Lucene, we can use TermQuery to search for a term in a field. I am wondering how to search for a keyword across a set of fields, or across all searchable fields.
Another approach, which doesn't require indexing anything more than what you already have, nor combining different queries, is to use the MultiFieldQueryParser.
You can provide a list of fields where you want to search on and your query, that's all.
MultiFieldQueryParser queryParser = new MultiFieldQueryParser(
Version.LUCENE_41,
new String[]{"title", "content", "description"},
new StandardAnalyzer(Version.LUCENE_41));
Query query = queryParser.parse("here goes your query");
This is how I would do it with the original lucene library written in Java. I'm not sure whether the MultiFieldQueryParser is available in lucene.net too.
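What MultiFieldQueryParser does with the default OR operator is rewrite each query term as a disjunction over the given fields, so "term1 term2" over title and body becomes `(title:term1 body:term1) (title:term2 body:term2)`. The stdlib Java sketch below mimics only that string expansion; it skips analysis, operators and escaping, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Simplified illustration of multi-field expansion: each term becomes an OR across fields.
final class MultiFieldExpansion {
    static String expand(String queryText, List<String> fields) {
        return Arrays.stream(queryText.trim().split("\\s+"))
            .filter(term -> !term.isEmpty())
            .map(term -> fields.stream()
                .map(field -> field + ":" + term)
                .collect(Collectors.joining(" ", "(", ")")))
            .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(expand("here goes", List.of("title", "content")));
        // prints (title:here content:here) (title:goes content:goes)
    }
}
```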
Two approaches
1) Index-time approach: use a catch-all field. This simply means appending the text of all the fields (the total text of your input document) and placing the resulting large text in a single field. You have to add this additional field at indexing time to act as the catch-all field.
2) Search-time approach: use a BooleanQuery to combine multiple queries, for example TermQuery instances, so that together they cover all the target fields.
See the example at the end of the article.
Use approach 2 if you know the target-field list at runtime; otherwise, you have to use approach 1.
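A minimal sketch of the index-time catch-all field from approach 1, using plain Java maps to stand in for documents. The field name `all` is a hypothetical choice; in Lucene this extra value would become one more indexed (usually unstored) text field, and queries would then target only that field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of approach 1: copy a document and add a field holding all field values joined.
final class CatchAllField {
    static final String ALL = "all"; // hypothetical catch-all field name

    static Map<String, String> withCatchAll(Map<String, String> doc) {
        Map<String, String> out = new LinkedHashMap<>(doc); // keep the original fields
        out.put(ALL, String.join(" ", doc.values()));       // concatenate in iteration order
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Lucene in Action");
        doc.put("body", "full text search");
        System.out.println(withCatchAll(doc).get(ALL)); // prints Lucene in Action full text search
    }
}
```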
Another easy way to search across all fields with MultiFieldQueryParser is to pass it every field name in the index, obtained with IndexReader.GetFieldNames(IndexReader.FieldOption.ALL).
Here is an example in C#.
Directory directory = FSDirectory.Open(new DirectoryInfo(HostingEnvironment.MapPath(VirtualIndexPath)));
// get analyzer
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
// get a read-only index reader and a searcher
IndexReader indexReader = IndexReader.Open(directory, true);
Searcher indexSearch = new IndexSearcher(indexReader);
// add all field names in the index to the MultiFieldQueryParser using the reader's GetFieldNames method
var queryParser = new MultiFieldQueryParser(Version.LUCENE_29, indexReader.GetFieldNames(IndexReader.FieldOption.ALL).ToArray(), analyzer);
var query = queryParser.Parse(Criteria);
// perform search, asking for up to MaxDoc() hits
TopDocs resultDocs = indexSearch.Search(query, indexReader.MaxDoc());
var hits = resultDocs.scoreDocs;
Click here to check out my previous answer to the same question in VB.NET.

Lucene.net Fuzzy Phrase Search

I have tried this myself for a considerable time and looked all over the net, but have been unable to find ANY examples of fuzzy phrase searching with Lucene.NET 2.9.2 (C#).
Is someone able to advise how to do this in detail and/or provide some example code? I would seriously appreciate any help, as I am totally stuck.
I assume that you have Lucene running and created a search index with some fields in it. So let's assume further that:
var fields = ... // a string[] of the field names you wish to search in
var version = Version.LUCENE_29; // your Lucene version
var queryString = "some string to search for";
Once you have all of these you can go ahead and define a search query on multiple fields like this:
var analyzer = LuceneIndexProvider.CreateAnalyzer();
var query = new MultiFieldQueryParser(version, fields, analyzer).Parse(queryString);
Maybe you already got that far and are only missing the fuzzy part. I simply add a tilde ~ to every word in the queryString to tell Lucene to do a fuzzy search for all words in the queryString:
if (fuzzy && !string.IsNullOrEmpty(queryString)) {
// first escape the queryString so that e.g. ~ will be escaped
queryString = QueryParser.Escape(queryString);
// now split, add ~ and join the queryString back together
queryString = string.Join("~ ",
queryString.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)) + "~";
// now queryString will be "some~ string~ to~ search~ for~"
}
The key point here is that Lucene uses fuzzy search only for terms that end with a ~. That and some more helpful info was found on
http://scatteredcode.wordpress.com/2011/05/26/performing-a-fuzzy-search-with-multiple-terms-through-multiple-lucene-net-document-fields/.
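For readers on Java Lucene, the same escape-then-append-tilde transformation can be sketched with the standard library alone. The `escape` method below is a hypothetical re-implementation that approximates `QueryParser.escape` (the exact set of special characters varies by Lucene version; `/` was added for regex syntax in later versions). Note that each term becomes its own fuzzy clause, so word order is not enforced the way a phrase query would enforce it.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Sketch of the answer's fuzzy rewrite: escape special characters, then append ~ to each term.
final class FuzzyQueryString {
    // Approximation of the classic query parser's special characters (version-dependent).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // backslash-escape special characters
            }
            sb.append(c);
        }
        return sb.toString();
    }

    static String toFuzzy(String queryString) {
        if (queryString == null || queryString.isBlank()) {
            return queryString; // nothing to rewrite
        }
        return Arrays.stream(escape(queryString).trim().split("\\s+"))
            .map(term -> term + "~")
            .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(toFuzzy("some string to search for")); // prints some~ string~ to~ search~ for~
    }
}
```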