Get text words from query - mongodb

I've read the MongoDB documentation on getting the indexes within a collection, and have also searched SO and Google for my question. I want to get the actual indexed values.
Or maybe my understanding of how MongoDB indexes work is incorrect. If I've been indexing a field called text that contains paragraphs, am I right in thinking that what gets indexed is each word in the paragraph?
Either way, I want to retrieve the values that were indexed, which db.collection.getIndexes() doesn't seem to return.

Well yes and no, in summary.
Indexes work on the "values" of the fields they are supplied to index, and are much like a "card index" in that there is a point of reference to look at to find the location of something that matches that term.
What "you" seem to be asking about here is "text indexes". This is a special index format in MongoDB and other databases as well that looks at the "text" content of a field and breaks down every "word" in that content into a value in that "index".
Typically we do:
db.collection.createIndex({ "text": "text" })
Where the "field name" here is "text" as you asked, but more importantly the type of index here is "text".
This allows you to then insert data like this:
db.collection.insert({ "text": "The quick brown fox jumped over the lazy dog" })
And then search like this, using the $text operator:
db.collection.find({ "$text": { "$search": "brown fox" } })
This returns the documents whose indexed text matches the terms in your query. The results can be ranked by relevance by projecting and sorting on the "textScore" metadata.
Note that a "text" index and its query do not target a specific field, although the index itself can be built over multiple fields. The main constraint is that there can be only one text index on any given collection; attempting to create a second one results in an error.
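To picture what "one entry per word" means, here is a minimal sketch in plain JavaScript. It is a toy model, not MongoDB's implementation: the function name buildWordIndex is made up for illustration, and a real text index also stems words and drops stop words.

```javascript
// Toy model of a text index: maps each lowercased word to the ids of the
// documents containing it. NOT MongoDB's implementation; real text indexes
// also stem words and drop stop words.
function buildWordIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const words = String(doc[field] ?? "").toLowerCase().match(/[a-z0-9]+/g) || [];
    // A Set ensures each document id is recorded once per word.
    for (const word of new Set(words)) {
      if (!index.has(word)) index.set(word, []);
      index.get(word).push(doc._id);
    }
  }
  return index;
}

const sampleDocs = [
  { _id: 1, text: "The quick brown fox jumped over the lazy dog" },
  { _id: 2, text: "A brown bear" },
];
const wordIndex = buildWordIndex(sampleDocs, "text");
console.log(wordIndex.get("brown")); // ids of documents containing "brown"
console.log([...wordIndex.keys()]);  // the "indexed values" in this toy model
```

In this model the "indexed values" the question asks about are simply the keys of the map.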

As per mongodb's docs:
"db.collection.getIndexes() returns an array of documents that hold index information for the collection. Index information includes the keys and options used to create the index. For information on the keys and index options, see db.collection.createIndex()."
You first have to create the index on the collection, using the createIndex() method:
db.records.createIndex( { userid: 1 } )
Queries on the userid field are supported by the index:
Example:
db.records.find( { userid: 2 } )
db.records.find( { userid: { $gt: 10 } } )
Indexes help you avoid scanning the whole collection. They are essentially references, or pointers, to the documents in your collection that hold a given value.
The docs explain it better:
http://docs.mongodb.org/manual/tutorial/create-an-index/

Related

Is there a way to define single fields that are never indexed in firestore in all collections

I understand that indexes have a cost in Firestore. Most of the time we simply store objects without really caring about indexes, even when we don't want most of the fields to be indexed.
If I understand correctly, every field at every level is indexed. For example, in the following document in pseudo-JSON:
{
"root_field1": "abc" (indexed)
"root_field2": "def" (indexed)
"root_field3": {
"sub_field1": "ghi" (indexed)
"sub_field2": "jkl" (indexed)
"sub_field3": {
"inner_field1": "mno" (indexed)
"inner_field2": "pqr" (indexed)
}
}
}
Let’s assume that I have the following record
{
"name": "abc"
"birthdate": "2000-01-01"
"gender": "m"
}
Let’s assume that I just want the field "name" to be indexed. One solution (A), without having to specify every field is to define it this way (i.e. move the root fields to a sub level unindexed), and exclude unindexed from being indexed
{
"name": "abc"
"unindexed": {
"birthdate": "2000-01-01"
"gender": "m"
}
}
Ideally I would like to just specify a prefix such as _ to prevent a field from being indexed, but there is no global solution for that.
{
"name": "abc"
"_birthdate": "2000-01-01"
"_gender": "m"
}
Is my solution (A) correct and is there a more elegant generic solution?
Thanks!
According to the documentation:
https://cloud.google.com/firestore/docs/query-data/indexing
Add a single-field index exemption
Single-field index exemptions allow you to override automatic index
settings for specific fields in a collection. You can add a
single-field index exemption from the console:
Go to the Single Field Indexes section.
Click Add Exemption.
Enter a Collection ID and Field path.
Select new indexing settings for this field. Enable or disable
automatically updated ascending, descending, and array-contains
single-field indexes for this field.
Click Save Exemption.
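If you go with approach (A) from the question, the reshaping can be done in application code before writing the record. The following is only a sketch: nestUnindexed is a hypothetical helper, "unindexed" is the field name proposed in the question, and it assumes an exemption has been configured for that field path as described in the steps above.

```javascript
// Hypothetical helper: keeps the listed keys at the top level and moves
// every other field under a single `unindexed` map.
// Assumes the record has no top-level key already called "unindexed".
function nestUnindexed(record, indexedKeys) {
  const result = { unindexed: {} };
  for (const [key, value] of Object.entries(record)) {
    if (indexedKeys.includes(key)) {
      result[key] = value;
    } else {
      result.unindexed[key] = value;
    }
  }
  return result;
}

const shaped = nestUnindexed(
  { name: "abc", birthdate: "2000-01-01", gender: "m" },
  ["name"]
);
console.log(JSON.stringify(shaped, null, 2));
```

The trade-off is that readers of the data must know to look under unindexed for the remaining fields.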

MongoDb Indexing full text search

I have a collection named "posts" whose documents look like this:
{
"_id" : ObjectId("5afc22290c06a67f081fa463"),
"title" : "Cool",
"description" : "this is amazing"
}
And I have put a text index on title and description:
db.posts.createIndex( { title: "text", description: "text" } )
The problem is that when I search for, say, "amaz", it returns the document above containing "this is amazing", while it should return it only when I type "amazing":
db.posts.find({ $text: { $search: 'amaz' } }, (err, results) => {
return res.json(results);
});
Credit to @amenadiel for the original data here:
https://stackoverflow.com/a/24316510/7948962
From the MongoDB docs:
https://docs.mongodb.com/manual/core/index-text/
Index Entries
text index tokenizes and stems the terms in the indexed fields for the index entries. text index stores one index entry for each unique stemmed term in each indexed field for each document in the collection. The index uses simple language-specific suffix stemming.
This lets you search on a word stem and have the database return all related results. In your specific scenario, amaz is a slightly odd stem because it is not itself a word, unlike talking and talked, which are both stemmed to talk, or walking and walked, which are stemmed to walk.
In your case, the word amazing in your text is stemmed to amaz. If the field contained a word such as amazed, it would be reduced to the same amaz stem, and those documents would also be returned by a search for amaz.
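To make the stemming behaviour concrete, here is a deliberately crude suffix stripper in plain JavaScript. It is an illustration only: toyStem is a made-up function that handles just "-ing" and "-ed", whereas MongoDB uses a proper language-specific stemmer.

```javascript
// Crude suffix stripper for illustration only. MongoDB uses a proper
// language-specific stemmer; this sketch only handles "-ing" and "-ed".
function toyStem(word) {
  const w = word.toLowerCase();
  for (const suffix of ["ing", "ed"]) {
    // Keep at least three characters so short words are left alone.
    if (w.endsWith(suffix) && w.length - suffix.length >= 3) {
      return w.slice(0, -suffix.length);
    }
  }
  return w;
}

for (const w of ["amazing", "amazed", "talking", "talked", "walk"]) {
  console.log(w, "->", toyStem(w));
}

// The search term goes through the same step, so "amaz" and "amazing"
// end up as the same index key in this toy model.
console.log(toyStem("amaz") === toyStem("amazing"));
```

Because the query term and the stored text are reduced the same way, a search for amaz and a search for amazing look up the same entry.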

mongoDB text index on subdocuments

I have a collection that looks something like this
{ "text1" : "text",
"url" : "http:....",
"title" : "the title",
......,
"search_metadata" : { "tags" : [ "tag1", "tag2", "tag3" ],
"title" : "the title",
"topcis": [ "topic1", "topic2"]
}
}
I want to be able to add a text index to search_metadata and all its subdocuments.
ensureIndex({search_metadata:"text"}) Gives me no results
and:
ensureIndex({"$**":"text"}) will give me irrelevant data
How can I make it happen?
From the text indexes page:
text indexes can include any field whose value is a string or an array
of string elements. To perform queries that access the text index, use
the $text query operator
Your search_metadata field is a sub-document, not a string or an array of strings, so as currently structured it is not in the right format for a text index in MongoDB.
However, search_metadata contains both strings and arrays of strings, and those can be text indexed. An index on {"search_metadata.tags": "text"}, for example, fits the criteria and should work just fine.
Hence, it's a choice between restructuring the field to meet the text index criteria, or a matter of indexing the relevant sub-fields. If you take the latter approach you may find that you don't need text indexes on each of the fields and a simpler (and far smaller) index may serve you just as well (using a normal index on tags and then $elemMatch for example).
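If you choose the restructuring route, one possible way is to flatten the strings inside search_metadata into a single array field before saving. The sketch below is plain JavaScript under assumptions: collectStrings is a hypothetical helper and search_text is an invented field name, neither comes from MongoDB.

```javascript
// Hypothetical helper: gathers every string found under a nested value
// (objects and arrays included) into one flat array.
function collectStrings(value, out = []) {
  if (typeof value === "string") {
    out.push(value);
  } else if (Array.isArray(value)) {
    for (const item of value) collectStrings(item, out);
  } else if (value !== null && typeof value === "object") {
    for (const item of Object.values(value)) collectStrings(item, out);
  }
  return out;
}

const page = {
  title: "the title",
  search_metadata: {
    tags: ["tag1", "tag2", "tag3"],
    title: "the title",
    topics: ["topic1", "topic2"],
  },
};

// `search_text` is an assumed field name; an array of strings like this
// is a shape that a text index accepts.
page.search_text = collectStrings(page.search_metadata);
console.log(page.search_text);
```

The flattened field has to be kept in sync whenever search_metadata changes, which is the cost of this approach compared with indexing the individual sub-fields.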

Multi-key Indexing on an Entire Array

MongoDB's docs explain multi-key indexing. Consider this comment document.
{
"_id": ObjectId(...),
"title": "Grocery Quality",
"comments": [
{ author_id: ObjectId(...),
date: Date(...),
text: "Please expand the cheddar selection." },
{ author_id: ObjectId(...),
date: Date(...),
text: "Please expand the mustard selection." },
{ author_id: ObjectId(...),
date: Date(...),
text: "Please expand the olive selection." }
]
}
The docs explain that it's possible to index on the comments.text, or any comments' field. However, is it possible to index on the comments key itself?
This post demonstrates indexing on an array of strings, however, the above comments field is an array of JSON objects.
Based on Antoine Girbal's article, it appears possible to index on an array of JSON objects where each JSON object has a different key name. However, it doesn't appear possible where each JSON object in the array shares the same key names.
Example - https://gist.github.com/kman007us/6797422
Yes, you can index subdocuments, and they can be part of a multikey index. When you index whole subdocuments, the index only matches queries against the whole subdocument, e.g.:
db.test.find({records: {hair: "brown"}})
This finds documents whose records contain a subdocument that is exactly {hair: "brown"}, and it can use the index to do so.
If you want to find any subdocuments that have hair = "brown" alongside other fields, dot notation is needed, e.g.:
db.test.find({"records.hair": "brown"})
However, there is no index to use for that, so it's a full collection scan.
Please note: There are limitations on index size and whole documents could easily exceed that size.
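The difference between the two query shapes can be imitated in memory with plain JavaScript. This is only a sketch of the matching semantics, not how MongoDB evaluates queries; the sample data and the helper names matchesWhole and matchesField are invented for illustration.

```javascript
// In-memory imitation of the two query shapes; not how MongoDB evaluates them.
const people = [
  { _id: 1, records: [{ hair: "brown" }] },
  { _id: 2, records: [{ hair: "brown", eyes: "blue" }] },
];

// Like find({records: {hair: "brown"}}): an array element must equal the
// whole subdocument, with no extra fields.
const matchesWhole = (doc, sub) =>
  doc.records.some((r) => JSON.stringify(r) === JSON.stringify(sub));

// Like find({"records.hair": "brown"}): only the named field is compared.
const matchesField = (doc, key, value) =>
  doc.records.some((r) => r[key] === value);

const wholeIds = people.filter((d) => matchesWhole(d, { hair: "brown" })).map((d) => d._id);
const fieldIds = people.filter((d) => matchesField(d, "hair", "brown")).map((d) => d._id);
console.log(wholeIds, fieldIds);
```

Document 2 is found only by the field comparison, because its subdocument carries an extra eyes field and so is not exactly {hair: "brown"}.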

How to Retrieve any element value from mongoDB?

Suppose I have following collection :
{ "_id" : ObjectId("4f1d8132595bb0e4830d15cc"),
"Data" : "[
{ "id1": "100002997235643", "from": {"name": "Joannah" ,"id": "100002997235643"} , "label" : "test" } ,
{ "id1": "100002997235644", "from": {"name": "Jon" ,"id": "100002997235644"} , "label" : "test1" }
]" ,
"stat" : "true"
}
How can I retrieve id1, name, id, label or any other element?
I am able to get the _id field and Data (the complete array), but not the inner elements of Data.
A query always returns top-level documents, not individual embedded elements. If you want to get individual elements of your array back as results, you will have to make those elements top-level documents (that is, put them in their own collection) and maintain an array of their _ids in this document.
That said, unless the array becomes very large it's almost always more efficient to simply grab your entire document and find the appropriate element in your app.
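Picking the element out in the app, as suggested above, could look like the following plain JavaScript sketch. It assumes Data is stored as a real array rather than as a string, and findById1 is a hypothetical helper name.

```javascript
// Stand-in for a document fetched from the collection.
// Assumes `Data` is stored as a real array, not as a string.
const fetched = {
  Data: [
    { id1: "100002997235643", from: { name: "Joannah", id: "100002997235643" }, label: "test" },
    { id1: "100002997235644", from: { name: "Jon", id: "100002997235644" }, label: "test1" },
  ],
  stat: "true",
};

// Hypothetical helper: returns the first array element with the given id1,
// or undefined when nothing matches.
function findById1(doc, id1) {
  return doc.Data.find((item) => item.id1 === id1);
}

const hit = findById1(fetched, "100002997235644");
console.log(hit.from.name, hit.label);
```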
I don't think you can do that. It is explained here.
If you want to access specific fields, then following MongoDB Documentation,
you could add a field selection argument to your query, but you should redesign your documents for this to be useful:
Field Selection
In addition to the query expression, MongoDB queries can take some additional arguments. For example, it's possible to request only certain fields be returned. If we just wanted the social security numbers of users with the last name of 'Smith,' then from the shell we could issue this query:
// retrieve ssn field for documents where last_name == 'Smith':
db.users.find({last_name: 'Smith'}, {'ssn': 1});
// retrieve all fields *except* the thumbnail field, for all documents:
db.users.find({}, {thumbnail:0});