Lucene.NET version 4.8 beta casing issue [duplicate] - lucene.net

This question already has answers here:
Java Lucene 4.5 how to search by case insensitive
(3 answers)
Closed 3 years ago.
I'm using Lucene.NET version 4.8 (beta) for a small search task in a solution I'm working on, but I'm having trouble making searches case-insensitive. I know that Lucene itself isn't case-insensitive, but according to the StandardAnalyzer documentation, using the StandardAnalyzer should lowercase the stored data, as long as the queries are built correctly.
So any idea what I'm doing wrong here? I've stored the data "Kirsten" in a field in 4 different documents, and when searching for (lowercased) "kirsten" I get no hits, but when searching for "Kirsten" I get the expected 4.
Here's my query code:
query = query.ToLowerInvariant();
BooleanQuery q = new BooleanQuery {
    new BooleanClause(new WildcardQuery(new Term(FieldNames.Name, query + WildcardQuery.WILDCARD_STRING)), Occur.SHOULD),
    new BooleanClause(new WildcardQuery(new Term("mt-year", query)), Occur.SHOULD),
    new BooleanClause(new WildcardQuery(new Term("mt-class", query + WildcardQuery.WILDCARD_STRING)), Occur.SHOULD)
};
The issue is that users will always type the lowercase version and expect it to match both lower- and upper-case data.

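As @Peska wrote in the comments, the problem was that I used StringField instead of TextField when adding the document (and data) to Lucene. A StringField is indexed verbatim as a single token and never passes through the analyzer, so "Kirsten" kept its capital K and the lowercased wildcard query could not match it.
Once I switched to TextField, which is run through the StandardAnalyzer and lowercased at index time, everything worked as expected.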
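The difference between the two field types can be illustrated with a toy model. The sketch below is plain Python, not Lucene.NET's API: the function names are hypothetical, and the "analyzer" is reduced to splitting on word characters and lowercasing. It only shows why a lowercased wildcard query misses a verbatim-indexed value.

```python
import fnmatch
import re


def index_string_field(value):
    """Toy model of an un-analyzed field: the whole value is one term, kept verbatim."""
    return [value]


def index_text_field(value):
    """Toy model of an analyzed field: split into word tokens and lowercase each one."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def wildcard_hits(terms, pattern):
    """Case-sensitive wildcard match against the indexed terms."""
    return [term for term in terms if fnmatch.fnmatchcase(term, pattern)]


# The query side lowercases the user input and appends a wildcard,
# mirroring what the C# code in the question does.
query = "Kirsten".lower() + "*"

string_field_hits = wildcard_hits(index_string_field("Kirsten"), query)
text_field_hits = wildcard_hits(index_text_field("Kirsten"), query)
```

Under this model the verbatim field yields no hits for "kirsten*", while the analyzed field matches, which is the behaviour described in the question.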

Related

How to get range of the text in Word document programmatically in MS Office Word add-in using JS API [duplicate]

This question already exists:
Get range of the text in Word document programmatically in MS Office Word add-in using JS API [closed]
Closed 3 years ago.
I need to get the numerical range of a piece of text (for example: start point 30, end point 35) in a Word document programmatically, via an MS Office add-in, but I can't find how to do that in JS.
Here is an example:
Hello my friend Pete, I talked to your friend Robert yesterday and he
told about his friend Ann.
So, I need to get the range of any word I want and build an array of word ranges for future development. For example, for the word "Pete" the range should be (16, 20) if the text starts at 0. When I researched this online, I found some information suggesting that this is impossible with the JS API.
But I found this functionality in the .NET docs. Here is the link:
https://learn.microsoft.com/en-us/visualstudio/vsto/how-to-programmatically-define-and-select-ranges-in-documents?view=vs-2019
So, the final question: is it possible to implement the functionality described above with the JS API (and if so, how complicated is it and how can I achieve it), or should I switch to .NET so I don't waste my time?
I'm assuming that you mean literally to get the numerical bounds of a range, not that you want to use numerical bounds that you already have to get a range. So if I understand what you want to do, then I recommend that you try the following:
Get a reference to the entire text as a Range object. Then use the Range.split method to get the range that precedes "Pete". The text of the first member of the returned ranges collection is "Hello my friend ". The length of this string is your start point. Your end point is the sum of the start point and the length of "Pete".
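var target = "Pete";
// Assumes this runs inside Word.run(async (context) => { ... }) and myRange covers the whole text.
var ranges = myRange.split([target], true, true, false);
var precedingRange = ranges.getFirst();
precedingRange.load("text");
await context.sync(); // property values are only readable after a sync
var startPoint = precedingRange.text.length;
var endPoint = startPoint + target.length;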
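The arithmetic behind this approach can be checked on the plain string, outside Word. The following Python sketch is not the Office JS API; the helper name word_range is hypothetical, and it only reproduces the "length of the preceding text" calculation on the example sentence from the question.

```python
def word_range(text, target):
    """Return (start, end) offsets of the first occurrence of target, or None if absent."""
    preceding, found, _ = text.partition(target)
    if not found:
        return None
    start_point = len(preceding)  # length of everything before the target
    return (start_point, start_point + len(target))


text = ("Hello my friend Pete, I talked to your friend Robert yesterday and he "
        "told about his friend Ann.")

pete_range = word_range(text, "Pete")
```

For "Pete" this gives the (16, 20) range the question expects, with the end offset being exclusive.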

Performance issue with fluent query in EF vs SharpRepository

I was having some performance issues using SharpRepository, and after playing around with the SQL Query Profiler I found the reason.
With EF I can do stuff like this:
var books = db.Books.Where(item => item.Year == "2016");
if (!string.IsNullOrEmpty(search_author))
    books = books.Where(item => item.Author.Contains(search_author));
return (books.ToList());
EF will not really do anything until books is used (last line) and then it will compile a query that will select only the small set of data matching year and author from the db.
But SharpRepository evaluates books at once, so this:
var books = book_repo.Books.FindAll(item => item.Year == "2016");
if (!string.IsNullOrEmpty(search_author))
    books = books.Where(item => item.Author.Contains(search_author));
return (books.ToList());
will compile a query like "select * from Books where Year = '2016'" at the first line, and get ALL those records from the database! Then, at the second line, it will search for the author within the C# code... That behaviour can make a major difference in performance with large databases, and it explains why my queries timed out...
I tried using repo.GetAll().Where() instead of repo.FindAll().... but it worked the same way.
Am I misunderstanding something here, and is there a way around this issue?
You can use repo.AsQueryable(), but by doing that you lose some of the functionality that SharpRepository can provide, like caching or any aspects/hooks you are using. It basically takes you out of the generic repo layer and lets you use the underlying LINQ provider. It has its benefits for sure, but in your case you can just build the predicate conditionally and pass that in to the FindAll method.
You can do this by building an Expression predicate or by using Specifications. Working with LINQ expressions does not always feel clean, but you can do it. Alternatively, you can use the Specification pattern built into SharpRepository:
ISpecification<Book> spec = new Specification<Book>(x => x.Year == 2016);
if (!string.IsNullOrEmpty(search_author))
{
    spec = spec.And(x => x.Author.Contains(search_author));
}
return repo.FindAll(spec);
For more info on Specifications you can look here: https://github.com/SharpRepository/SharpRepository/blob/develop/SharpRepository.Samples/HowToUseSpecifications.cs
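The idea of composing the full predicate first and handing it to the repository once can be sketched in a language-neutral way. The Python below is a toy model, not SharpRepository's API: the Specification class, and_ method, find_all function and Book fields are all hypothetical stand-ins, and filtering an in-memory list stands in for the single database query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    title: str
    author: str
    year: int


class Specification:
    """Wraps a predicate so that conditions can be combined before anything is evaluated."""

    def __init__(self, predicate):
        self.predicate = predicate

    def and_(self, other_predicate):
        # Build a new specification; nothing is filtered at this point.
        return Specification(lambda item: self.predicate(item) and other_predicate(item))

    def is_satisfied_by(self, item):
        return self.predicate(item)


def find_all(books, spec):
    """Stand-in for the repository call: one pass with the complete predicate."""
    return [book for book in books if spec.is_satisfied_by(book)]


def search_books(books, search_author=None):
    spec = Specification(lambda book: book.year == 2016)
    if search_author:
        spec = spec.and_(lambda book: search_author in book.author)
    return find_all(books, spec)


library = [
    Book("A", "Jane Doe", 2016),
    Book("B", "John Roe", 2016),
    Book("C", "Jane Doe", 2015),
]
```

Because the author condition is folded into the specification before find_all runs, the data source only ever sees the combined filter, which is the point of the answer above.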
Ivan Stoev provided this answer:
"The problem is that most of the repository methods return IEnumerable. Try repo.AsQueryable(). "

search function in mongoDB with case-insensitive query [duplicate]

This question already has answers here:
case insensitive find in mongodb for usernames in php
(2 answers)
Closed 8 years ago.
I am writing a search function with MongoDB, and it has a problem with case-insensitive matching. This is the code in my function:
$where = array(TblFact::Fou_Name => array('$regex' => $SearchNameFactory));
With this code, when the data is in uppercase and I search in lowercase, it returns null. Can anyone help me find a solution for a case-insensitive query?
I look forward to your reply. Thanks ...
Thanks, everyone, for helping me. My problem has been resolved by using:
$where = array(TblFact::Fou_Name => new MongoRegex("/{$SearchNameFactory}/i"));
I hope it helps anyone who runs into the same problem.
Thanks.
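For comparison, the same case-insensitive match can be expressed with MongoDB's $regex and $options query operators. The sketch below only builds the filter document in Python and checks the pattern locally, without connecting to a database. The field name "fou_name" and the helper name are assumptions for illustration, and escaping the user's input is an addition that the original code does not make.

```python
import re


def case_insensitive_filter(field, search_term):
    """Build a MongoDB filter document matching search_term anywhere in field, ignoring case."""
    # re.escape keeps characters such as "." or "*" in user input from acting as regex syntax.
    return {field: {"$regex": re.escape(search_term), "$options": "i"}}


# "fou_name" is a hypothetical field name; the real value of TblFact::Fou_Name is not shown.
where = case_insensitive_filter("fou_name", "acme")

# Local sanity check of the pattern using Python's own regex engine.
matches_uppercase = re.search(where["fou_name"]["$regex"], "ACME Factory", re.IGNORECASE) is not None
```

A filter document of this shape could then be passed to a driver's find call. Note that an unanchored case-insensitive regex generally cannot make efficient use of an index, so it may be slow on large collections.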

Sphinx Search term boost

Is there a way I can add a weight to each word in my query?
I need to do something like this (Lucene query):
"word1^50|word2^45|word3^25|word4^20"
All answers I found online are old and I was hoping this changed.
UPDATE:
Sphinx introduced term boosting in version 2.2.3: http://sphinxsearch.com/docs/current/extended-syntax.html
Usage:
select id,weight() from ljplain where match('open source^2') limit 2 option ranker=expr('sum(max_idf)*1000');
No, nothing has really changed. The same old workarounds should still work, though.

Solr search error when dealing with Arabic string

I have been struggling with Arabic search in Solr for several days and have run some experiments. Here is a simple reproduction of the problem.
After I store an Arabic sentence (for now only one word, السوري) in the database and have Solr index it, I query it with q=*:*&wt=python (without the wt part, the output was garbled characters). The response is:
'\u00d8\u00a7\u00d9\u201e\u00d8\u00b3\u00d9\u02c6\u00d8\u00b1\u00d9\u0160'
The actual word I stored for indexing is encoded differently:
'\xd8\xa7\xd9\x84\xd8\xb3\xd9\x88\xd8\xb1\xd9\x8a'
As you can tell, there is a one-to-one correspondence, \xd8↔\u00d8 and so on. But I don't know the name of this encoding, so I cannot convert it. And when I run the search as <>/select/?q=السوري&wt=python, the response is:
{'responseHeader':{'status':0,'QTime':0,'params':{'wt':'python','q':u'\u0627\u0644\u0633\u0648\u0631\u064a'}},'response':{'numFound':0,'start':0,'docs':[]}}
No docs are found, and the query seems to use a third encoding, u'\u0627\u0644\u0633\u0648\u0631\u064a'. If I take it and call encode('utf8'), it converts back to '\xd8\xa7\xd9\x84\xd8\xb3\xd9\x88\xd8\xb1\xd9\x8a'.
In summary, when it (السوري) is in my code (Python) or in the database (MySQL), it appears as form1:
'\xd8\xa7\xd9\x84\xd8\xb3\xd9\x88\xd8\xb1\xd9\x8a'
When it is indexed by Solr, it is converted to form2:
'\u00d8\u00a7\u00d9\u201e\u00d8\u00b3\u00d9\u02c6\u00d8\u00b1\u00d9\u0160'
And when I use <>/select/?q=السوري&wt=python to query from the browser (Google Chrome), it becomes form3:
'\u0627\u0644\u0633\u0648\u0631\u064a'
(which can be converted back to form1 with encode('utf8')). But since the forms are different, the search matches nothing.
Therefore, these three different encodings may be the core problem. Could anyone help me figure this out and solve the search problem?
Thanks in advance.
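One possible reading of the three forms, offered as a sketch rather than a confirmed diagnosis: form1 is the UTF-8 byte sequence, form3 is the decoded Unicode string, and form2 is what results when those UTF-8 bytes are decoded as Windows-1252 instead of UTF-8 (classic mojibake). The Python 3 snippet below checks this against the exact values quoted above. The post does not show where in the MySQL-to-Solr pipeline the wrong decoding would happen, and the helper name repair_mojibake is purely illustrative.

```python
# Python 3 sketch; the original post uses Python 2 string literals.

# form1: the UTF-8 bytes as stored in the code / database
FORM1 = b"\xd8\xa7\xd9\x84\xd8\xb3\xd9\x88\xd8\xb1\xd9\x8a"
# form2: the string Solr returned for the indexed document
FORM2 = "\u00d8\u00a7\u00d9\u201e\u00d8\u00b3\u00d9\u02c6\u00d8\u00b1\u00d9\u0160"
# form3: the query string as Solr echoed it back (the real Unicode text)
FORM3 = "\u0627\u0644\u0633\u0648\u0631\u064a"


def repair_mojibake(garbled: str) -> str:
    """Undo a UTF-8-read-as-Windows-1252 mix-up by reversing the wrong decode step."""
    return garbled.encode("cp1252").decode("utf-8")


# form3 encoded as UTF-8 is exactly form1.
form3_as_utf8 = FORM3.encode("utf-8")
# form1 wrongly decoded as Windows-1252 is exactly form2.
form1_misread = FORM1.decode("cp1252")
# Reversing the wrong decode recovers the real text.
repaired = repair_mojibake(FORM2)
```

If that reading holds, the indexed term and the query term really are different strings, which would explain numFound being 0. The usual remedy in such cases is to correct the character encoding at the point where the data is read for indexing, rather than repairing strings afterwards.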