How to write a Lucene search to find records containing part of a word - lucene.net

I am using a Lucene search to find records.
I am able to search for any record starting with a given string.
LuceneQuery: {+(+nodeName:cen*^4.0)}
My words are:
Lucene
Cent
Excentric
Percent
How do I return all of them? The query I am using returns only one record.
<add name="DocumentSearchSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"
enableLeadingWildcards="true"/>
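A prefix query like cen* only matches terms that start with "cen", which is why only "Cent" comes back. With enableLeadingWildcards="true" in the config above, a query of the form *cen* should match terms that contain "cen" anywhere. As a rough illustration (plain JavaScript standing in for the analyzer, not actual Lucene matching):

```javascript
// The four sample words from the question. A prefix match (cen*) finds only
// "Cent"; a contains-style match (*cen*, leading wildcard enabled) finds all four.
const words = ["Lucene", "Cent", "Excentric", "Percent"];

const prefixMatches = words.filter(w => w.toLowerCase().startsWith("cen"));
const containsMatches = words.filter(w => w.toLowerCase().includes("cen"));

console.log(prefixMatches);   // only "Cent"
console.log(containsMatches); // all four words
```

Keep in mind that leading-wildcard queries are expensive in Lucene, since the index cannot be used as a prefix tree for them.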

Related

PostgreSQL FTS exact search

Hello everyone. I reread all the documentation on full-text search, and the following question arose:
how do I perform an exact word search using PostgreSQL?
For example, I have 100 document titles.
Names:
Document
Document red
The document is green outdated
I want to send a "Document" request and get only the Document, and not see the rest in the search results.
Is this possible with PostgreSQL FTS?
I read all the documentation on full-text search and did not see such capabilities or examples.
If your concern is just not to build yet another index, you can write an FTS query to use the index, and combine it with a simple AND equality check to rule out the things the index returns but which you don't want:
WHERE doc = 'Document' AND
      tsv @@ phraseto_tsquery('english', 'Document')

MongoDB. Searching substring in string field

Currently, I have a MongoDB instance, which contains a collection with a lot of entities. Each entity contains a string attribute, which represents some text. My goal is to provide a strict text search in the collection. It should work like this MySQL query:
SELECT *
FROM texts
WHERE text LIKE '%test%';
A MongoDB text index would be great, but it doesn't provide a strict search. How could I organize a strict search for such data? Could I do some optimization?
I have already checked other software (such as ElasticSearch, Lucene, MongoDB, ClickHouse), but I haven't found a way to do it. Searching as it is now takes too much time.
In MongoDB you can do it as follows:
db.texts.find({ text: /test/ })
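The regex filter behaves like a substring match on each document's text field, equivalent to LIKE '%test%'. A plain-JavaScript sketch of what the query does per document (the docs array is invented sample data):

```javascript
// Sample documents standing in for the collection.
const docs = [
  { text: "a test string" },
  { text: "no match here" },
  { text: "TESTING" },
];

// Like { text: /test/ }: case-sensitive substring match.
const caseSensitive = docs.filter(d => /test/.test(d.text));

// Like { text: /test/i }: add the i flag for case-insensitive matching.
const caseInsensitive = docs.filter(d => /test/i.test(d.text));
```

Note that a non-anchored regex like this cannot use an ordinary index, so MongoDB scans every document in the collection; only anchored, case-sensitive prefix regexes such as /^test/ can use an index efficiently.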

Can a $text search perform a partial match

I'm very confused by this behavior. It seems inconsistent and strange, especially since I've read that Mongo isn't supposed to support partial search terms in full text search. I'm using version 3.4.7 of Mongo DB Community Server. I'm doing these tests from the Mongo shell.
So, I have a Mongo DB collection with a text index assigned. I created the index like this:
db.submissions.createIndex({"$**":"text"})
There is a document in this collection that contains these two values:
"Craig"
"Dr. Bob".
My goal is to do a text search for a document that has multiple matching terms in it.
So, here are tests I've run, and their inconsistent output:
SINGLE TERM, COMPLETE
db.submissions.find({"$text":{"$search":"\"Craig\""}})
Result: Gets me the document with this value in it.
SINGLE TERM, PARTIAL
db.submissions.find({"$text":{"$search":"\"Crai\""}})
Result: Returns nothing, because this partial search term doesn't exactly match anything in the document.
MULTIPLE TERMS, COMPLETE
db.submissions.find({"$text":{"$search":"\"Craig\" \"Dr. Bob\""}})
Result: Returns the document with both of these terms in it.
MULTIPLE TERMS, ONE PARTIAL
db.submissions.find({"$text":{"$search":"\"Craig\" \"Dr. Bo\""}})
Result: Returns the document with both terms in it, despite the fact that one term is partial. There is nothing in the document that matches "Dr. Bo"
MULTIPLE TERMS, BOTH PARTIAL
db.submissions.find({"$text":{"$search":"\"Crai\" \"Dr. Bo\""}})
Result: Returns the document with both terms in it, despite the fact that both terms are partial and incomplete. There is nothing in the document that matches either "Crai" or "Dr. Bo".
Question
So, it all boils down to: why? Why is it, when I do a text search with a partial term with only a single value, nothing gets returned. When I do a text search with two partial terms, I get the matching result? It just seems so strange and inconsistent.
MongoDB $text searches do not support partial matching. MongoDB allows text search queries on string content with support for case insensitivity, delimiters, stop words and stemming. And the terms in your search string are, by default, OR'ed.
Taking your (very useful :) examples one by one:
SINGLE TERM, PARTIAL
// returns nothing because there is no whole word with the value `Crai` in your
// text index and there is no whole word for which `Crai` is a recognised stem
db.submissions.find({"$text":{"$search":"\"Crai\""}})
MULTIPLE TERMS, COMPLETE
// returns the document because it contains all of these words
// note in the text index Dr. Bob is not a single entry since "." is a delimiter
db.submissions.find({"$text":{"$search":"\"Craig\" \"Dr. Bob\""}})
MULTIPLE TERMS, ONE PARTIAL
// returns the document because it contains the whole word "Craig" and it
// contains the whole word "Dr"
db.submissions.find({"$text":{"$search":"\"Craig\" \"Dr. Bo\""}})
MULTIPLE TERMS, BOTH PARTIAL
// returns the document because it contains the whole word "Dr"
db.submissions.find({"$text":{"$search":"\"Crai\" \"Dr. Bo\""}})
Bear in mind that the $search string is ...
A string of terms that MongoDB parses and uses to query the text index. MongoDB performs a logical OR search of the terms unless specified as a phrase.
So, if at least one term in your $search string matches then MongoDB matches that document.
To verify this behaviour, if you edit your document changing Dr. Bob to DrBob then the following queries will return no documents:
db.submissions.find({"$text":{"$search":"\"Craig\" \"Dr. Bo\""}})
db.submissions.find({"$text":{"$search":"\"Crai\" \"Dr. Bo\""}})
These now return no matches because Dr is no longer a whole word in your text index because it is not followed by the . delimiter.
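The behavior above can be sketched with a toy tokenizer that splits on whitespace and the "." delimiter. This is only a rough model of the delimiter behavior described in the answer, not MongoDB's actual text-index tokenizer:

```javascript
// Toy tokenizer: "." and whitespace split text into whole words.
function tokenize(s) {
  return s.split(/[\s.]+/).filter(Boolean);
}

// The document's indexed whole words: "Craig", "Dr", "Bob".
const indexed = new Set([...tokenize("Craig"), ...tokenize("Dr. Bob")]);

// The search term "Dr. Bo" contributes the words ["Dr", "Bo"]; "Dr" is a
// whole word in the index, so the OR'ed search matches the document.
const searchTerms = tokenize("Dr. Bo");
const matches = searchTerms.some(t => indexed.has(t));
```

With the document changed to "DrBob", tokenize("DrBob") yields the single word "DrBob", neither "Dr" nor "Bo" appears in the index, and the match fails, which is exactly what the edited queries above show.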
You can do partial searching in a Mongoose-backed database using an external library called mongoose-fuzzy-search, where the stored text is broken into n-grams.
See the library's documentation for more information.
User.fuzzySearch('jo').sort({ age: -1 }).exec(function (err, users) {
  if (err) return console.error(err);
  console.log(users);
});
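The n-gram idea behind such fuzzy-search libraries can be sketched in a few lines: each stored field is expanded into its substrings of a minimum length, so a partial term like "jo" can match a stored value like "john". This is an illustrative sketch of the general technique, not the library's actual implementation:

```javascript
// Generate all substrings ("n-grams") of text with length >= minLength.
function ngrams(text, minLength = 2) {
  const grams = [];
  for (let n = minLength; n <= text.length; n++) {
    for (let i = 0; i + n <= text.length; i++) {
      grams.push(text.slice(i, i + n));
    }
  }
  return grams;
}

// "john" expands to jo, oh, hn, joh, ohn, john - so a search for "jo" hits it.
const stored = ngrams("john");
```

The trade-off is index size: every field value is stored many times over as n-grams, in exchange for fast partial-term lookups.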

Autocomplete by most frequent words - postgres or lucene?

We're using Postgres and its full-text feature to search for documents (post content) in our system, and it works really well.
For autocomplete we want to build an index (a dictionary?) of all words used in the documents and search by the most frequent ones.
We will always search for one word, never for a phrase.
So if I write:
"th"
I will receive (suppose the most frequent words in our documents):
"this"
"there"
"thoughts"
...
How do I do this with Postgres? Or do we need a more advanced solution like Apache Lucene / Solr?
Neither Postgres full-text search (which produces lexemes) nor Postgres trigrams seems suitable for this job. Or am I wrong?
I don't want to manually parse the text and ignore all English stopwords, which would be error-prone. Postgres does a good job of this while building the lexeme index. But instead of lexemes, we need to build and search a word dictionary without normalization.
Thank you for your assistance
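The desired behavior can be sketched outside the database: build a word-frequency dictionary with no stemming or normalization, then complete a prefix with the most frequent words. The function and sample data below are illustrative, not an answer from the thread:

```javascript
// Build a raw word-frequency map over the documents (no stemming, no
// stopword removal) and return the k most frequent words for a prefix.
function topCompletions(docs, prefix, k = 3) {
  const freq = new Map();
  for (const doc of docs) {
    for (const w of doc.toLowerCase().match(/[a-z]+/g) ?? []) {
      freq.set(w, (freq.get(w) ?? 0) + 1);
    }
  }
  return [...freq.entries()]
    .filter(([w]) => w.startsWith(prefix))
    .sort((a, b) => b[1] - a[1])
    .slice(0, k)
    .map(([w]) => w);
}

const completions = topCompletions(
  ["this there this", "there this thoughts"], "th");
```

On the Postgres side, one avenue worth noting is that the 'simple' text-search configuration only lowercases words without stemming them, so statistics over to_tsvector('simple', body) would give unnormalized word frequencies rather than lexemes.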

Normalize unicode

Let's say I have a document indexed with Apache Solr that contains this string:
Klüft skräms inför
I want to be able to find it by searching with this keyword (note "u" vs. "ü"):
kluft
Is there a way to do this?
Use the ASCIIFoldingFilterFactory for both the index and query analyzers.
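A rough JavaScript equivalent of what ASCII folding does, for intuition only (Solr's filter handles far more characters than combining marks): decompose to NFKD so that accented letters split into a base letter plus a combining mark, then strip the combining marks.

```javascript
// Decompose to NFKD and strip combining diacritical marks (U+0300-U+036F),
// so "ü" becomes "u" + COMBINING DIAERESIS and then just "u".
function asciiFold(s) {
  return s.normalize("NFKD").replace(/[\u0300-\u036f]/g, "");
}

const folded = asciiFold("Klüft skräms inför");
```

Applying the filter on both the index and query analyzers is what makes this work: both "Klüft" in the index and "kluft" in the query fold to the same token.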