The latest stable MongoDB version is currently v2.4.6, and full text search is still in beta.
Which of these will be faster?
Create one field named "text", extract the text from all other fields, concatenate it into that field, and index only that field:
db.collection.ensureIndex( { "text": "text" } );
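For the single-field approach, the concatenation has to happen in application code before the document is saved. A minimal sketch, assuming the documents are plain dicts and that every string value (including strings nested in sub-documents and arrays) should end up in the combined field; the helper names collect_strings and with_text_field are made up for illustration:

```python
def collect_strings(value):
    """Recursively yield every string found in a nested dict/list structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for child in value.values():
            yield from collect_strings(child)
    elif isinstance(value, (list, tuple)):
        for child in value:
            yield from collect_strings(child)


def with_text_field(doc, field="text"):
    """Return a copy of doc with all its string values joined into one field."""
    # Skip _id and any previous combined field so they are not indexed twice.
    source = {k: v for k, v in doc.items() if k not in ("_id", field)}
    combined = dict(doc)
    combined[field] = " ".join(collect_strings(source))
    return combined


post = {"_id": 1, "title": "Cool", "tags": ["mongo", "search"], "views": 10}
print(with_text_field(post)["text"])  # Cool mongo search
```

Note that this stores every string twice (once in its own field, once in "text"), which is worth keeping in mind when comparing disk usage.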
Or just index all fields with a wildcard text index:
db.collection.ensureIndex( { "$**": "text" } );
Which one is better in terms of performance and disk storage requirements?
Related
How do I index nested objects in PyMongo so that I can perform full text search? For example, I have a document like this:
{
  "_id": ObjectId("5e8b0fa1c869790699efdb2d"),
  "xmlfileid": "334355343343223567",
  "threads": {
    "threads_participants": {
      "participant": [
        {
          "#reference": "rits_dbx_1"
        },
        {
          "#reference": "rits_dbx_2"
        }
      ]
    },
    "thread": {
      "namedAnchor": "{' ': 'NORP', 'Ho': 'PERSON', 'Lets': 'PERSON', 'Boris Johnson': 'PERSON', 'Britain': 'GPE'}",
      "selectedText": {
        "fragment": [
          {
            "#class": "next_steps",
            "#text": "rits_dbx_1 said hello this is a good site."
          },
          {
            "#class": "other",
            "#text": "rits_dbx_1 said ho ho."
          },
          {
            "#class": "other",
            "#text": "rits_dbx_1 said lets put some meaningful stuff here."
          }
        ]
      }
    }
  }
}
I've placed a search box on my website, and when a user types text into it I want to display the matching #text, its #class, and the xmlfileid.
So far I've created the index using the command below. I don't know whether it's the right way to get the result, and I'd also appreciate help with the query.
db.xml_collection.createIndex({"threads.thread.selectedText.fragment": "text"})
In my Python code I have this, but it prints nothing:
result = collection.find({"$text": {"$search": "ho ho"}})
Your index is wrong.
MongoDB provides text indexes to support text search queries on string content. text indexes can include any field whose value is a string or an array of string elements.
https://docs.mongodb.com/manual/core/index-text/
If you want to index only the #text field, change your index to this:
db.xml_collection.createIndex({"threads.thread.selectedText.fragment.#text": "text"})
Alternatively, you can create a wildcard text index, and MongoDB will index every field whose value is a string or an array of strings:
db.xml_collection.createIndex({"$**": "text"})
Note: You need to drop any previous text index on this collection first, because a collection can have at most one text index.
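A $text query returns whole documents, not individual fragments, so picking out the #text, #class and xmlfileid to display is left to the application. A rough sketch of that post-filtering step, assuming documents shaped like the one in the question; matching_fragments is a made-up helper, and its plain case-insensitive word comparison is only an approximation of MongoDB's own matching (it does no stemming):

```python
import re


def matching_fragments(doc, search):
    """Return (xmlfileid, #class, #text) for fragments containing any search word."""
    words = {w.lower() for w in search.split()}
    fragments = doc["threads"]["thread"]["selectedText"]["fragment"]
    hits = []
    for fragment in fragments:
        # Split the fragment text into lower-cased word tokens.
        tokens = set(re.findall(r"\w+", fragment["#text"].lower()))
        if words & tokens:
            hits.append((doc["xmlfileid"], fragment["#class"], fragment["#text"]))
    return hits


doc = {
    "xmlfileid": "334355343343223567",
    "threads": {"thread": {"selectedText": {"fragment": [
        {"#class": "next_steps", "#text": "rits_dbx_1 said hello this is a good site."},
        {"#class": "other", "#text": "rits_dbx_1 said ho ho."},
    ]}}},
}
print(matching_fragments(doc, "ho ho"))
```

With PyMongo, this would be called on each document yielded by collection.find({"$text": {"$search": "ho ho"}}). Keep in mind that find() returns a cursor, so the results have to be iterated before anything is printed.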
I want to filter the database by an input text and show the matching data. I have tried the $text query shown below. It works for text values, but how do I filter on numeric fields like id or date?
find({$text :{$search:<input text>, $caseSensitive: false}})
I want the output to contain the data that matches the input value, which can be text, a date, or a number.
MongoDB provides text indexes to support text search queries on string content. text indexes can include any field whose value is a string or an array of string elements.
To perform text search queries, you must have a text index on your collection:
db.stores.createIndex( { name: "text", description: "text" } )
You can pass your search string in the $search parameter:
db.stores.find( { $text: { $search: "coffee", $caseSensitive: false } } )
Full documentation is available here: https://docs.mongodb.com/manual/text-search/
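Since a text index only covers string content, numbers and dates need ordinary field filters instead of $text. One possible way to handle a single search box is to look at the input and build a different filter for each kind of value. This is only a sketch: the field names "id" and "date" and the YYYY-MM-DD date format are assumptions, not something taken from the question's schema, and build_query is a made-up helper:

```python
from datetime import datetime, timedelta


def build_query(user_input):
    """Pick a filter based on what the input looks like: number, date or text."""
    value = user_input.strip()
    if value.isdecimal():
        # Hypothetical numeric field "id".
        return {"id": int(value)}
    try:
        day = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        # Anything else is treated as free text for the text index.
        return {"$text": {"$search": value, "$caseSensitive": False}}
    # Hypothetical date field "date": match the whole calendar day.
    return {"date": {"$gte": day, "$lt": day + timedelta(days=1)}}


print(build_query("42"))
print(build_query("coffee"))
print(build_query("2020-04-06"))
```

The returned dict can be passed to find() as the filter document.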
I have a collection named "posts", with documents like this:
{
"_id" : ObjectId("5afc22290c06a67f081fa463"),
"title" : "Cool",
"description" : "this is amazing"
}
I have put a text index on title and description:
db.posts.createIndex( { title: "text", description: "text" } )
The problem is that when I search for, say, "amaz", it returns the document containing "this is amazing" above, whereas it should return that document only when I type "amazing":
db.posts.find({ $text: { $search: 'amaz' } }, (err, results) => {
return res.json(results);
});
Credit to @amenadiel for the original data here:
https://stackoverflow.com/a/24316510/7948962
From the MongoDB docs:
https://docs.mongodb.com/manual/core/index-text/
Index Entries
text index tokenizes and stems the terms in the indexed fields for the index entries. text index stores one index entry for each unique stemmed term in each indexed field for each document in the collection. The index uses simple language-specific suffix stemming.
This lets you search by the stem of a word and have the database return all related results. In your scenario, amaz is a somewhat odd-looking stem compared with, say, talk, which is the stem of both talking and talked, or walk, the stem of walking and walked.
In your case, the word amazing in your text is stemmed to amaz. If your field contained a word such as amazed, it would be stored under the same amaz stem, and those documents would also be returned by a search for amaz.
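To make the effect concrete, here is a toy illustration of suffix stemming. It is not the stemmer MongoDB uses, just a crude stand-in that strips a few English suffixes, but it shows why "amaz" and "amazing" end up as the same index term; toy_stem and matches are made-up names:

```python
def toy_stem(word):
    """Strip a few common English suffixes; a toy stand-in for real stemming."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(search_term, text):
    """True if the stemmed search term equals the stem of any word in text."""
    stems = {toy_stem(w) for w in text.split()}
    return toy_stem(search_term) in stems


print(toy_stem("amazing"), toy_stem("amazed"), toy_stem("talking"))  # amaz amaz talk
print(matches("amaz", "this is amazing"))  # True
```

Both the indexed text and the search term go through the same reduction, so anything that reduces to the same stem is treated as a match.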
I've read the MongoDB documentation on getting the indexes within a collection, and have also searched SO and Google for my question. I want to get the actual indexed values.
Or maybe my understanding of how MongoDB indexes work is incorrect. If I've been indexing a field called text that contains paragraphs, am I right in thinking that what gets indexed is each word in the paragraph?
In either case, I want to retrieve the values that were indexed, which db.collection.getIndexes() doesn't seem to return.
Well yes and no, in summary.
Indexes work on the "values" of the fields they are supplied to index, and are much like a "card index" in that there is a point of reference to look at to find the location of something that matches that term.
What "you" seem to be asking about here is "text indexes". This is a special index format in MongoDB and other databases as well that looks at the "text" content of a field and breaks down every "word" in that content into a value in that "index".
Typically we do:
db.collection.createIndex({ "text": "text" })
Where the "field name" here is "text" as you asked, but more importantly the type of index here is "text".
This allows you to then insert data like this:
db.collection.insert({ "text": "The quick brown fox jumped over the lazy dog" })
And then search like this, using the $text operator:
db.collection.find({ "$text": { "$search": "brown fox" } })
This returns the documents whose indexed "text" matches the terms you gave in your query. Results are not sorted by relevance by default; to "rank" them, project and sort on the match score using { "$meta": "textScore" }.
Note that a "text" index and its query do not target a specific field; the $text operator searches whatever the text index covers. The index itself can be built over multiple fields, but there can "only be one" text index on any given collection, and trying to create a second one raises an error.
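As a rough mental model of what "each word becomes a value in the index" means, here is a toy inverted index in plain Python. It is only a conceptual sketch: a real text index also stems words and drops stop words, and build_inverted_index is a made-up helper, not a MongoDB or PyMongo function:

```python
from collections import defaultdict


def build_inverted_index(docs, field="text"):
    """Map each lower-cased word to the set of _id values whose field contains it."""
    index = defaultdict(set)
    for doc in docs:
        for word in doc.get(field, "").lower().split():
            index[word].add(doc["_id"])
    return index


docs = [
    {"_id": 1, "text": "The quick brown fox jumped over the lazy dog"},
    {"_id": 2, "text": "A brown bear"},
]
index = build_inverted_index(docs)
print(sorted(index["brown"]))  # [1, 2]
print(sorted(index["fox"]))    # [1]
```

Looking up a word is then a direct key lookup rather than a scan over every document, which is the point of the "card index" analogy.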
As per MongoDB's docs:
"db.collection.getIndexes() returns an array of documents that hold index information for the collection. Index information includes the keys and options used to create the index. For information on the keys and index options, see db.collection.createIndex()."
You first have to create the index on the collection, using the createIndex() method:
db.records.createIndex( { userid: 1 } )
Queries on the userid field are supported by the index:
Example:
db.records.find( { userid: 2 } )
db.records.find( { userid: { $gt: 10 } } )
Indexes help you avoid scanning the whole collection. They are basically references, or pointers, to specific parts of your collection.
The docs explain it better:
http://docs.mongodb.org/manual/tutorial/create-an-index/
I have a collection whose documents look something like this:
{ "text1" : "text",
"url" : "http:....",
"title" : "the title",
......,
"search_metadata" : { "tags" : [ "tag1", "tag2", "tag3" ],
"title" : "the title",
"topcis": [ "topic1", "topic2"]
}
}
I want to be able to add a text index to search_metadata and all its subdocuments.
ensureIndex({search_metadata:"text"}) gives me no results
and:
ensureIndex({"$**":"text"}) will give me irrelevant data
How can I make it happen?
From the text indexes page:
text indexes can include any field whose value is a string or an array
of string elements. To perform queries that access the text index, use
the $text query operator
Your search_metadata field is a sub-document, not a string or an array of strings, so as currently structured it is not in the right format for a text index in MongoDB.
Embedded in search_metadata, however, you have both strings and arrays of strings, and a text index can cover those. An index on { "search_metadata.tags": "text" }, for example, meets the criteria and should work just fine.
Hence, it's a choice between restructuring the field to meet the text index criteria, or a matter of indexing the relevant sub-fields. If you take the latter approach you may find that you don't need text indexes on each of the fields and a simpler (and far smaller) index may serve you just as well (using a normal index on tags and then $elemMatch for example).
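If you go with indexing the relevant sub-fields, the compound key list can be derived from a sample document instead of being typed by hand. A small sketch, assuming every document has the same sub-fields as the sample; text_index_keys is a made-up helper:

```python
def text_index_keys(doc, parent):
    """List (dotted_path, "text") pairs for string and string-array sub-fields."""
    keys = []
    for name, value in doc[parent].items():
        is_string = isinstance(value, str)
        is_string_list = (
            isinstance(value, list)
            and len(value) > 0
            and all(isinstance(item, str) for item in value)
        )
        if is_string or is_string_list:
            keys.append((f"{parent}.{name}", "text"))
    return keys


sample = {
    "text1": "text",
    "search_metadata": {
        "tags": ["tag1", "tag2", "tag3"],
        "title": "the title",
        "topcis": ["topic1", "topic2"],  # key spelled as in the question
    },
}
print(text_index_keys(sample, "search_metadata"))
```

The result is a list of (field, "text") pairs, which is the shape PyMongo's create_index() accepts as a key specification, so it could be passed along as collection.create_index(keys), with collection standing for your own PyMongo collection object.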