Is it possible to get an exact match on the keywords, based on how many of them there are? The example below should make this clearer :)
In the index I have this:
record 1 "This is text"
record 2 "This is text and text"
Then, when I search for "This is text", I need to find only the first record.
Please note that I tried many filters, but none seems to work; I always get both of them.
An extended match mode query of
"^this is text$"
should do it. Read up on the field-start, field-end and phrase operators for more information.
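As a minimal sketch, assuming the query string is built in application code before it is handed to whatever Sphinx client you use in extended match mode, a small helper can wrap the user's phrase in phrase quotes and add the field-start and field-end operators. The helper name exact_field_query is hypothetical, and the list of characters to escape is an assumption to check against the documentation for your Sphinx version.

```python
import re

# Characters treated as operators in extended query syntax (assumed list;
# verify it against the documentation for your Sphinx version).
_SPECIAL = r'([\\()|\-!@~"&/^$=<])'

def exact_field_query(phrase: str) -> str:
    """Return an extended-syntax query for a field that holds exactly `phrase`."""
    escaped = re.sub(_SPECIAL, r'\\\1', phrase.strip())
    # Phrase quotes, with ^ (field start) and $ (field end) inside them.
    return f'"^{escaped}$"'

print(exact_field_query("This is text"))  # "^This is text$"
```

With this query, record 2 ("This is text and text") should no longer match, because the field does not end right after the phrase.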
I want to search strings like "number 1" or "number 152" or "number 36985".
In all above strings "number " will be constant but digits will change and can have any length.
I tried the Search option with wildcards, but it doesn't seem to work.
Basic regex operators like + do not seem to work.
I tried 'number*[1-9]*' and 'number*[1-9]+' but no luck.
This regular expression only selects up to one digit. E.g., if the string is 'number 12345' it only matches number 12345 (the part which is in bold).
Does anyone know how to do this?
Word doesn't use regular expressions in its search (Find) functionality. It has its own set of wildcard rules. These are very similar to RegEx, but not identical and not as powerful.
Using Word's wildcards, the search text below locates the examples given in the question. (Note that the semicolon separator in 1;100 may be something else, depending on the list separator set in Windows or on the Mac. My European locale uses a semicolon; the United States would use a comma, for example.)
"number [0-9]{1;100}"
The 100 is an arbitrary number I chose for the maximum number of repeats of the search term just before it. Depending on how long you expect a number to be, this can be much smaller...
The logic of the search text is: number is a literal; the valid range of characters following the literal are 0 through 9; there may be one to one hundred of these characters - anything in that range is a match.
The only way RegEx can be used in Word is to extract a string and run the search on the string. But this dissociates the string from the document, meaning Word-specific content (formatting, fields, etc.) will be lost.
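For comparison, here is a minimal sketch of the "extract a string and run the search on the string" route, assuming the document text has already been pulled out into a plain Python string (how that extraction happens is outside this sketch). The pattern is the regular-expression counterpart of the wildcard search above; the function name find_numbers is only illustrative.

```python
import re

# Regex counterpart of the Word wildcard search "number [0-9]{1;100}":
# the literal "number ", followed by one or more digits.
NUMBER_PATTERN = re.compile(r"number [0-9]+")

def find_numbers(text: str) -> list[str]:
    """Return every 'number <digits>' occurrence found in the extracted text."""
    return NUMBER_PATTERN.findall(text)

sample = "See number 1, number 152 and number 36985."
print(find_numbers(sample))  # ['number 1', 'number 152', 'number 36985']
```

As noted above, the matches are plain strings, so any formatting or fields attached to them in the document are gone.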
Try putting < and > on the ends of your search string to indicate the beginning and ending of the desired strings. This works for me: '<number [1-9]*>'. So does '<number [1-9]@>' which is probably what you want. Note that in Word wildcards the @ is used where + is used in other RegEx systems.
I am trying to search for multiple words in text inclusively (an AND operation) without losing word stemming.
For example:
db.supplies.runCommand("text", {search:"printers inks"})
should return results with (printer and ink), (printer inks), (printers ink) or (printers inks), instead of all results with either printer or ink.
This post covers searching for multiple words as an AND operation, but the solution doesn't search for stemmed words -> MongoDB Text Search AND multiple search words.
The only way I could think of is creating a permutation of all the words and then running the search once per permutation (which could be a large number).
This may not be an effective way to search a large collection.
Is there a better and smarter way to do it?
So is there a reason you have to use a text search? If it were me, I would use a regular expression.
https://docs.mongodb.com/manual/reference/operator/query/regex/
Off the top of my head something like this.
db.collection.find({products:/printers inks|printers|inks/})
Now I suppose you can do the same thing with a text search too.
db.collection.find({$text:{$search : "\"printers inks\" printers inks"}})
Note the escaped quotes.
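Note that a plain alternation like the regex above matches documents containing either word. If the regex route is taken and AND semantics are needed, one option is a lookahead per word. The sketch below is only an illustration under stated assumptions: the helper and_regex is hypothetical, stripping a trailing "s" is a crude stand-in for real stemming, and the field name products is taken from the example above. It builds the filter document in Python and checks the pattern locally with the re module; such a regex query will not use the text index.

```python
import re

def and_regex(words: list[str]) -> str:
    """Build a pattern requiring every word, in any order, with an optional plural 's'."""
    parts = []
    for word in words:
        # Crude "stemming": drop one trailing "s", then allow it back optionally.
        stem = word[:-1] if word.endswith("s") else word
        parts.append(rf"(?=.*\b{re.escape(stem)}s?\b)")
    return "".join(parts)

pattern = and_regex(["printers", "inks"])

# Filter document that could be passed to find(); "products" is an assumed field name.
query = {"products": {"$regex": pattern, "$options": "i"}}

for text in ["printer ink", "Printers and inks", "printers only", "ink only"]:
    print(text, "->", bool(re.search(pattern, text, re.IGNORECASE)))
```

Only the first two sample strings contain both words, so only they match.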
We have two fields: keywords (weight 10) and text (weight 1).
Let's see three records:
A: keywords = "some stuff, happy cat", text = "This is A"
B: keywords = "where stuff is, some dogs", text = "This is B some stuff"
C: keywords = "where some stuff is", text = "This is B some stuff"
When searching for some stuff we want record A to rank above B and C.
Sphinx shows A below the others, because it has fewer mentions of stuff. But A has an exact match in keywords (the comma is a meaningful separator), so it is the only right answer.
How can Sphinx be configured to achieve that? Any kind of text preprocessing is allowed.
You can check the various ranking modes to find one that fits your requirement.
Please see the SPH_RANK_SPH04 ranking mode; it should work as you expect.
You should mention which version of sphinx you are using.
Please read more details on ranking modes here
In your example C is most relevant.
You can filter by exact matches using quotes around search term.
You need to set the matching mode to SPH_MATCH_EXTENDED2, which tells Sphinx to return documents that contain the exact string.
I recommend you take a look at extended search syntax.
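Since the question allows preprocessing, here is a small sketch of the ordering it asks for, written in plain Python rather than in Sphinx. It treats each comma-separated item in keywords as one keyword and moves records with an exact keyword match to the front. The function names are hypothetical, and this only illustrates the desired result; it says nothing about how Sphinx ranks internally.

```python
records = {
    "A": "some stuff, happy cat",
    "B": "where stuff is, some dogs",
    "C": "where some stuff is",
}

def has_exact_keyword(keywords: str, query: str) -> bool:
    """True if one comma-separated keyword equals the query exactly."""
    items = [item.strip().lower() for item in keywords.split(",")]
    return query.strip().lower() in items

def order_by_exact_keyword(records: dict[str, str], query: str) -> list[str]:
    # sorted() is stable, so records without an exact match keep their order.
    return sorted(records, key=lambda rid: not has_exact_keyword(records[rid], query))

print(order_by_exact_keyword(records, "some stuff"))  # ['A', 'B', 'C']
```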
My statement uses the CONTAINS function, but sometimes it does not return the correct records. For example, I want to return the rows where one field contains the word 'Your':
SELECT [Email]
,[Comment]
FROM [USERS]
WHERE CONTAINS(Comment, 'Your')
It gives me 0 results even though the field contains this word (the same happens with 'as', 'to', 'was', 'me'). When I use 'given' instead of 'Your', I do get a result. Is there a list of words that cannot be used with CONTAINS? Or are these words too short (when I use 'name', I do get results)? The word 'Your' is at the beginning of the Comment field.
The field is of type 'text' and has a full-text index enabled.
Words such as those you mention are "stop words"; they are expressly excluded from being indexed and searched in Full Text Search due to how common (and thereby meaningless for searches) they are. You'll notice the same thing when searching Google, for instance.
It is possible to edit the list, but I would avoid doing so except perhaps to add words to it; the words in the list are chosen very well, IMHO, for their lack of utility in searches.
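To make the effect concrete, here is a toy sketch of stop-word filtering at indexing time. It is not how the full-text engine is implemented; the five-word stop list is just the set of words from the question, and the function names are made up for illustration.

```python
# Tiny illustrative stop list; a real full-text stop list is much longer.
STOP_WORDS = {"your", "as", "to", "was", "me"}

def index_terms(text: str) -> set[str]:
    """Return the lower-cased words that a toy index would keep."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return {w for w in words if w and w not in STOP_WORDS}

def contains(text: str, term: str) -> bool:
    """Toy lookup: a term is found only if it survived stop-word filtering."""
    return term.lower() in index_terms(text)

comment = "Your name was given to me."
print(contains(comment, "Your"))   # False
print(contains(comment, "given"))  # True
print(contains(comment, "name"))   # True
```

The word 'Your' is in the text but never reaches the index, which is why searching for it finds nothing.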
I have an app that utilizes hashtags to help tag posts. I am trying to have a more detailed search.
Let's say one of the records I'm searching is:
The #bird flew very far.
When I search for "flew", "fle", or "#bird", it should return the record.
However, when I search "#bir", it should NOT return the sentence, because the search term doesn't match the whole tag.
I'm also not sure if "bird" should even return the sentence. I'd be interested in how to do that as well, though.
Right now, I have a very basic search:
SELECT "posts".* FROM "posts" WHERE (body LIKE '%search%')
Any ideas?
You could do this with LIKE but it would be rather hideous; regexes will serve you better here. If you want to ignore the hashes then a simple search like this will do the trick:
WHERE body ~ E'\\mbird\\M'
That would find 'The bird flew very far.' and 'The #bird flew very far.'. You'd want to strip off any #s before searching though, as this:
WHERE body ~ E'\\m#bird\\M'
wouldn't find either of those results due to the nature of \m and \M.
If you don't want to ignore #s in body then you'd have to expand and modify the \m and \M shortcuts yourself with something like this:
WHERE body ~ E'(^|[^\\w#])#bird($|[^\\w#])'
-- search term goes here  ^^^^^
Using E'(^|[^\\w#])#bird($|[^\\w#])' would find 'The #bird flew very far.' but not 'The bird flew very far.' whereas E'(^|[^\\w#])bird($|[^\\w#])' would find 'The bird flew very far.' but not 'The #bird flew very far.'. You might also want to look at \A instead of ^ and \Z instead of $ as there are subtle differences but I think $ and ^ would be what you want.
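The behaviour of the hand-expanded boundaries can be checked outside the database. The sketch below uses the same pattern shape in Python's re module, which is an assumption of convenience: Python's regex dialect is not PostgreSQL's, but ^, $, \w and the character classes used here behave the same way for these inputs. The helper name matches_term is hypothetical.

```python
import re

def matches_term(body: str, term: str) -> bool:
    """True if `term` appears in `body` bounded by characters that are neither word characters nor '#'."""
    pattern = rf"(^|[^\w#]){re.escape(term)}($|[^\w#])"
    return re.search(pattern, body) is not None

tagged = "The #bird flew very far."
plain = "The bird flew very far."

print(matches_term(tagged, "#bird"))  # True
print(matches_term(plain, "#bird"))   # False
print(matches_term(tagged, "bird"))   # False
print(matches_term(plain, "bird"))    # True
print(matches_term(tagged, "#bir"))   # False
```

The last line is the case from the question: a partial tag such as "#bir" does not match because the character after it is a word character.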
You should keep in mind that none of these regex searches (or your LIKE search for that matter) will use indexes, so you're setting yourself up for lots of table scans and performance problems unless you can restrict the searches using something that will use an index. You might want to look at a full-text search solution instead.
It might help to parse the hash tags out of the text and store them in an array in a separate column called say hashtags when the articles are inserted/updated. Remove them from the article body before feeding it into to_tsvector and store the tsvector in a column of the table. Then use:
WHERE body_tsvector @@ to_tsquery('search') OR 'search' = ANY (hashtags)
You could use a trigger on the table to maintain the hashtags column and the body_tsvector stripped of hash tags, so that the application doesn't have to do the work. Parse them out of the text when entries are INSERTed or UPDATEd.
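If the parsing is done in the application instead of a trigger, it might look like the sketch below. The assumptions are mine: a hashtag is "#" followed by word characters, tags are stored lower-cased without the "#", and the function name split_hashtags is hypothetical. The returned list would go into the hashtags column and the stripped text into to_tsvector.

```python
import re

# Assumed tag syntax: "#" followed by one or more word characters.
HASHTAG = re.compile(r"#(\w+)")

def split_hashtags(body: str) -> tuple[list[str], str]:
    """Return (hashtags without '#', body with the hashtags removed)."""
    tags = [tag.lower() for tag in HASHTAG.findall(body)]
    stripped = HASHTAG.sub("", body)
    # Collapse the whitespace left behind by the removed tags.
    stripped = re.sub(r"\s+", " ", stripped).strip()
    return tags, stripped

tags, text = split_hashtags("The #bird flew very far.")
print(tags)  # ['bird']
print(text)  # The flew very far.
```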