$and operator on MongoDB full text search with word stemming - mongodb

Our project is built on MongoDB, and I'm doing a full text search that needs an $and across the search words (every word must match).
I've searched for similar questions about this problem (MongoDB Text Search AND multiple search words), but the solution of surrounding words with quotation marks disables word stemming, which is the whole point of using full text search.
For example, this code will not find "Items that want to be found":
find({$text:{$search:"\"Item\"",$language:"en"}})
Can anyone provide a solution to this WITH word stemming?

Related

VSCode multiline search of two words?

I saw an SO post that says you can search multiline text using a regex or the actual literal text. But what if you want to (quickly) search for two or three words within a given range of lines?
For example, suppose you want to find a multiline block of text that contains both "ruby" and "regex" (say you want to know where you took a note in a txt, Markdown, or rich text file; you might be looking for "how to use regex in ruby" or "the ruby regex tutorial").
You can use a simple (but redundant) regex like ruby(.*\n)+regex|regex(.*\n)+ruby, but to me it doesn't look elegant. For three or more words, this kind of workaround needs one alternative per ordering of the words, so the redundancy grows very quickly. Not good.
Is there a smarter way to do this? Thanks.
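One option that avoids listing every ordering is a chain of lookaheads, one per word. Here is a minimal sketch in JavaScript, assuming the regex engine supports lookaheads (whether a particular editor's search box accepts this pattern is something to check; the helper names escapeRegex and allWordsPattern are made up for illustration):

```javascript
// Escape regex metacharacters so each word is matched literally.
function escapeRegex(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one pattern that requires every word somewhere in the text,
// in any order. [\s\S]* is used instead of .* so it crosses line breaks.
function allWordsPattern(words) {
  const lookaheads = words
    .map((w) => `(?=[\\s\\S]*${escapeRegex(w)})`)
    .join("");
  return new RegExp(`^${lookaheads}`);
}

const note = "the ruby\ntutorial covers\nregex basics";
console.log(allWordsPattern(["ruby", "regex"]).test(note)); // true
console.log(allWordsPattern(["ruby", "regex", "python"]).test(note)); // false
```

Unlike the alternation, this only tests whether a whole block of text contains every word; it does not select the span between them. The pattern grows by one lookahead per word.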

Odd to_tsquery results for s:* and t:*

I was experimenting with PostgreSQL's text search feature - particularly with the normalization function to_tsquery.
I was using the english configuration (dictionary), and for some reason s and t won't normalize. I understand why i and a would not, but s and t? Interesting.
Are they being treated as a single space and a tab?
Here is the query:
select
to_tsquery('english', 'a:*') as for_a,
to_tsquery('english', 's:*') as for_s,
to_tsquery('english', 't:*') as for_t,
to_tsquery('english', 'u:*') as for_u
fiddle just in case.
You will see that 'u:*' is returned as 'u:*', while 'a:*' returns nothing.
The letters s and t are considered stop words in the english text search dictionary, so they get discarded. You can read the stop word list in tsearch_data/english.stop under the Postgres shared folder, which you can locate by running pg_config --sharedir.
With pg 11 on ubuntu/debian/mint, that would be
cat /usr/share/postgresql/11/tsearch_data/english.stop
Quoting from the docs,
Stop words are words that are very common, appear in almost every document, and have no discrimination value. Therefore, they can be ignored in the context of full text searching.
It is best to set English grammar aside and think of words in the programmatic, logical way described above. Full text search does not try to infer context from sentence structure, so it has no use for these words. After all, it's called full text search and not natural language search.
As to how they arrived at the conclusion to add s and t to the stop word list, statistical analysis must have revealed these characters to be noise.

What query can be used to replace \n with a space in the text of all documents in MongoDB?

I have many documents whose text contains \n, and I want to replace each one with a space. Can anyone help me do that in MongoDB?
{"Definition": "The \n sequence is a popular one found in many languages that support escape sequences."}
I want it to be like this:
{"Definition": "The sequence is a popular one found in many languages that support escape sequences."}
Thanks
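A possible approach, sketched under a few assumptions: the \n in the documents is an actual newline character, the server is MongoDB 4.4 or newer (where the $replaceAll aggregation operator is available for pipeline-style updates), and the collection is called terms (a made-up name). The filter and pipeline are built as plain objects so the snippet also runs outside the shell; replaceNewlines is a hypothetical local helper that shows the same transformation on one document.

```javascript
// Match only documents whose Definition contains a newline.
const filter = { Definition: /\n/ };

// Pipeline-style update: rewrite Definition with every newline replaced by a space.
const pipeline = [
  {
    $set: {
      Definition: {
        $replaceAll: { input: "$Definition", find: "\n", replacement: " " },
      },
    },
  },
];

// In mongosh this would be run as (collection name `terms` is an assumption):
//   db.terms.updateMany(filter, pipeline);

// Local equivalent of the update for a single document, for illustration only.
function replaceNewlines(doc) {
  return { ...doc, Definition: doc.Definition.replace(/\n/g, " ") };
}

console.log(replaceNewlines({ Definition: "first line\nsecond line" }));
```

Because the newline is swapped for a space rather than removed, any spaces that already surrounded it remain in place.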

MongoDB Text Search AND multiple search words with word stemming

I am trying to search for multiple words in text inclusively (AND operation) without losing word stemming.
For example:
db.supplies.runCommand("text", {search:"printers inks"})
should return results with (printer and ink) or (printers and ink) or (printer and inks) or (printers and inks), instead of all results with either printer or ink.
This post covers searching for multiple words as an AND operation, but its solution doesn't match stemmed words: MongoDB Text Search AND multiple search words.
The only way I could think of is to generate all permutations of the words and then run one search per permutation (which could be a large number).
This may not be an effective way to search a large collection.
Is there a better, smarter way to do it?
So is there a reason you have to use a text search? If it were me, I would use a regular expression.
https://docs.mongodb.com/manual/reference/operator/query/regex/
Off the top of my head, something like this:
db.collection.find({products:/printers inks|printers|inks/})
Now I suppose you can do the same thing with a text search too:
db.collection.find({$text:{$search : "\"printers inks\" printers inks"}})
Note the escaped quotes.
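Another direction, if stemming has to be kept: let an ordinary $text query such as {$text:{$search:"printers inks"}} fetch the candidates (unquoted terms match documents containing either word), then keep only the documents that contain every stem. Below is a rough sketch of that second step in plain JavaScript. Everything in it is hypothetical: toyStem only strips a trailing "s" and merely stands in for MongoDB's real stemmer, and the candidates array stands in for the query results.

```javascript
// Toy stemmer (hypothetical): lowercases and strips one trailing "s".
// A real stemmer handles far more cases; this only illustrates the idea.
function toyStem(word) {
  const w = word.toLowerCase();
  return w.length > 3 && w.endsWith("s") ? w.slice(0, -1) : w;
}

// Split text into words and return the set of their stems.
function stemsOf(text) {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  return new Set(words.map(toyStem));
}

// Keep only the candidates whose field contains every stem of the search string.
function requireAllTerms(candidates, search, field) {
  const wanted = [...stemsOf(search)];
  return candidates.filter((doc) => {
    const present = stemsOf(doc[field]);
    return wanted.every((stem) => present.has(stem));
  });
}

// Stand-in for the documents an OR-style $text query might return.
const candidates = [
  { _id: 1, products: "Printer with spare ink" },
  { _id: 2, products: "Printers only" },
  { _id: 3, products: "Inks for printers" },
];

const hits = requireAllTerms(candidates, "printers inks", "products");
console.log(hits.map((d) => d._id)); // [ 1, 3 ]
```

The filtering happens in the application, so on a large collection the candidate set from the text query should be kept as small as possible.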

Removing some paragraph marks in a word document

I copy text from PDF files into Word 2010 documents using ABBYY conversion software. I find the result contains many incorrect line breaks. Is there any way I can remove such marks if they are not preceded by either "." or "?" or "!"?
I write macros in Excel but have no experience with Word coding.
You could do a search and replace, depending on whether you can find some sort of rule which you can apply. Maybe a little screenshot?
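The rule from the question can be written as a single regular expression. The sketch below is in JavaScript rather than Word VBA, purely to illustrate the rule; it assumes the breaks are plain "\n" characters and that a regex engine with lookbehind is available. The function name joinBrokenLines is made up.

```javascript
// Replace a line break with a space unless it directly follows ".", "?" or "!".
function joinBrokenLines(text) {
  return text.replace(/(?<![.?!])\n/g, " ");
}

const pasted = "This sentence was\nbroken by the converter.\nIs this a new paragraph?\nYes";
console.log(joinBrokenLines(pasted));
```

Breaks after sentence-ending punctuation are kept, so genuine paragraph ends survive as long as they finish with one of those three characters.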