Difference between `{id: 1, title: "text"}` and `{title: "text", id: 1}` MongoDB Indexes - mongodb

I am designing a schema where a User document stores Skill ids. A Skill document looks like this:
Skill = {
id: ObjectId
title: String
description: String
}
User = {
id: ObjectId
skills: [ObjectId]
}
On the frontend, a user can add Skills by searching for them by title. Therefore, I indexed the skills collection with {id: 1, title: "text"}. I want to know whether key order matters when a text index key is combined with regular ascending keys.

Yes, it makes a big difference. The documentation on compound text indexes says:
If the compound text index includes keys preceding the text index key, to perform a $text search, the query predicate must include equality match conditions on the preceding keys.
That means if you do {id: 1, title: 'text'} you can only use the text index on title if you also constrain the search to within a single id.
If you do {title: 'text', id: 1} you will be able to text-search title by itself, or further constrained by id, or retrieve the matching id for the text search results.
Are you sure you need that id column in the index? Why not just the text index?
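To make the quoted rule concrete, here is a minimal sketch in plain JavaScript. It is a simplified model of the rule only, not MongoDB's query planner, and the helper names isEquality and canUseTextIndex are made up for illustration. It assumes equality values are primitives; values such as ObjectId are not handled.

```javascript
// Simplified model of the rule quoted above; NOT MongoDB's actual planner.
// indexKeys: ordered [field, type] pairs, e.g. [["id", 1], ["title", "text"]].
// query: a find() filter object.

// Treat primitives as equality matches; an operator object only counts
// if it is exactly { $eq: value }. (Hypothetical helper.)
function isEquality(condition) {
  if (condition === null || typeof condition !== "object") return true;
  const ops = Object.keys(condition);
  return ops.length === 1 && ops[0] === "$eq";
}

// Returns true when a $text query satisfies the "equality on preceding keys" rule.
function canUseTextIndex(indexKeys, query) {
  if (!("$text" in query)) return false;
  for (const [field, type] of indexKeys) {
    if (type === "text") return true; // reached the text key: all preceding keys passed
    if (!(field in query) || !isEquality(query[field])) return false;
  }
  return false; // the index has no text key
}

const idFirst = [["id", 1], ["title", "text"]];
const titleFirst = [["title", "text"], ["id", 1]];
const textOnly = { $text: { $search: "java" } };

console.log(canUseTextIndex(idFirst, textOnly));                 // text search alone, id first
console.log(canUseTextIndex(idFirst, { id: 42, ...textOnly }));  // equality on id plus text search
console.log(canUseTextIndex(titleFirst, textOnly));              // text search alone, title first
```

With id first, the text search alone does not qualify; with title first it does, which is the case the question needs.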

Yes, it does matter. Straight from the docs:
The order of the fields listed in a compound index is important. The index will contain references to documents sorted first by the values of the item field and, within each value of the item field, sorted by values of the stock field. See Sort Order for more information.
With your current structure, if you query only the title field without an equality condition on the id field, you will not be able to use the index.

Related

How can I search in arrays of integers with a compound MongoDB Atlas search query?

I am working on a function that helps me find similar documents, sorted by score, using the full-text search feature of MongoDB Atlas.
I set my collection index as "dynamic".
I am looking for similarities in text fields, such as "name" or "description", but I also want to look in another field, "thematic", that stores integer values (ids) of thematics.
Example:
Let's say that I have a reference document as follows:
{
name: "test",
description: "It's a glorious day!",
thematic: [9, 3, 2, 33]
}
I want my search to match these integers in the thematic field and include their weight in the score calculation.
For instance, if I compare my reference document with:
{
name: "test2",
description: "It's a glorious night!",
thematic: [9, 3, 6, 22]
}
I want to increase the score since the thematic field shares the 9 and 3 values with the reference document.
Question:
What search operator should I use to achieve this? I can input array of strings as queries with a text operator but I don't know how to proceed with integers.
Should I go for another approach? Like splitting the array to compare into several compound.should.term queries?
Edit:
After a fair amount of search, I found this here and here:
Atlas Search cannot index numeric or date values if they are part of an array.
Before I consider changing the whole data structure of my objects, I wanted to make sure that there is no workaround.
For instance, could it be done with custom analyzers?
I solved it by adding a trigger to my collection. Each time a document is inserted or updated, I update string counterparts of thematic and similar fields, e.g. _thematic, which store the integers as strings. I then use this _thematic field for search.
Here is sample code demonstrating it:
exports = function (changeEvent) {
  const fullDocument = changeEvent.fullDocument;
  // Convert every value to its string form, e.g. [9, 3] -> ["9", "3"].
  const format = (itemSet) => Object.values(itemSet).map((item) => item.toString());
  const setter = {
    _thematic: fullDocument.thematic ? format(fullDocument.thematic) : [],
  };
  const docId = changeEvent.documentKey._id;
  const collection = context.services.get("my-cluster").db("dev").collection("projects");
  // Return the promise so the function does not finish before the write completes.
  return collection.findOneAndUpdate({ _id: docId }, { $set: setter });
};
I'm pretty sure it can be done in a cleaner way, so if someone posts one, I'll switch the selected answer to theirs.
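For the query side, a sketch of how the stringified field might be searched. The function name buildSimilarityStage is hypothetical, the index name "default" is an assumption, and the stage relies on the _thematic field described above being indexed as strings. It uses the text operator with an array of query strings, as mentioned in the question.

```javascript
// Builds a $search stage that scores documents by similarity to a reference
// document. Hypothetical helper; field names follow the example above.
function buildSimilarityStage(reference) {
  const should = [
    { text: { query: reference.name, path: "name" } },
    { text: { query: reference.description, path: "description" } },
  ];
  // Integers are searched through their string counterparts in _thematic.
  const thematic = (reference.thematic || []).map(String);
  if (thematic.length > 0) {
    should.push({ text: { query: thematic, path: "_thematic" } });
  }
  // "default" is an assumed index name; replace it with your own.
  return { $search: { index: "default", compound: { should } } };
}

const stage = buildSimilarityStage({
  name: "test",
  description: "It's a glorious day!",
  thematic: [9, 3, 2, 33],
});
console.log(JSON.stringify(stage, null, 2));
```

The resulting object would be passed as the first stage of an aggregate() pipeline; each should clause that matches adds to the document's score.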
Another way to solve this is to make a custom analyzer with a character mapping that replaces each digit with its string counterpart. I haven't tried this one, though. See https://docs.atlas.mongodb.com/reference/atlas-search/analyzers/custom/#mapping
Alternatives welcome!

How to nested query in mongodb?

How do I index nested objects with PyMongo so that I can perform full-text search? For example, I have a document like this:
{
"_id": ObjectId("5e8b0fa1c869790699efdb2d"),
"xmlfileid": "334355343343223567",
"threads":{
"threads_participants":{
"participant":[
{
"#reference": "rits_dbx_1"
},
{
"#reference": "rits_dbx_2"
}
]
},
"thread":{
"namedAnchor":"{' ': 'NORP', 'Ho': 'PERSON', 'Lets': 'PERSON', 'Boris Johnson': 'PERSON', 'Britain': 'GPE'}",
"selectedText":{
"fragment":[
{
"#class":"next_steps",
"#text":"rits_dbx_1 said hello this is a good site."
},
{
"#class":"other",
"#text":"rits_dbx_1 said ho ho."
},
{
"#class":"other",
"#text":"rits_dbx_1 said lets put some meaningful stuff here."
}
]
}
}
}
}
I've placed a search box on my website, and when the user types part of a #text value into it, I want to display the matching #text, its #class, and the xmlfileid.
So far I've created the index with the command below. I don't know whether this is the right way to get the result, so please help with the query too.
db.xml_collection.createIndex({"threads.thread.selectedText.fragment": "text"})
In my Python code I have this, but it prints nothing:
result = collection.find({"$text": {"$search": "ho ho"}})
Your index is wrong.
MongoDB provides text indexes to support text search queries on string content. text indexes can include any field whose value is a string or an array of string elements.
https://docs.mongodb.com/manual/core/index-text/
If you want to index only #text field, change your index to this:
db.xml_collection.createIndex({"threads.thread.selectedText.fragment.#text": "text"})
Alternatively, you can create a wildcard text index, and MongoDB will index every field whose value is a string or an array of strings:
db.xml_collection.createIndex({"$**": "text"})
Note: You need to drop any previous text index on this collection first, because a collection can have only one text index.
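A $text query returns whole documents, so picking out the fragments to display is left to the application. Below is a rough sketch of that step in plain JavaScript. The function name matchingFragments is made up for illustration, and the matching is a naive whole-word comparison, which is only an approximation of MongoDB's text search (no stemming, no stop words).

```javascript
// Given one document returned by the $text query, keep only the fragments
// whose #text contains at least one of the search words. Hypothetical helper.
function matchingFragments(doc, search) {
  const terms = search.toLowerCase().split(/\s+/).filter(Boolean);
  const fragments = doc.threads?.thread?.selectedText?.fragment ?? [];
  return fragments
    .filter((f) => {
      // Split the fragment text into lowercase words for a whole-word comparison.
      const words = new Set((f["#text"] || "").toLowerCase().split(/[^a-z0-9_]+/));
      return terms.some((t) => words.has(t));
    })
    .map((f) => ({ xmlfileid: doc.xmlfileid, class: f["#class"], text: f["#text"] }));
}

const sampleDoc = {
  xmlfileid: "334355343343223567",
  threads: { thread: { selectedText: { fragment: [
    { "#class": "next_steps", "#text": "rits_dbx_1 said hello this is a good site." },
    { "#class": "other", "#text": "rits_dbx_1 said ho ho." },
    { "#class": "other", "#text": "rits_dbx_1 said lets put some meaningful stuff here." },
  ] } } },
};
console.log(matchingFragments(sampleDoc, "ho ho"));
```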

How can I query whether each element inside an array match a collection field

I am using MongoDB to store user information. The collection has a userId field. My application receives many userIds, which I keep in an array. I need to check whether all the userIds in that array exist in the collection and, if not, find the missing ones. Is there a single query that does this? I don't want to query the userIds one by one. What is the best way to achieve this?
The user collection is very simple as below and there is no nested data.
userId: String
name: String
gender: String
phone: String
For example, I have an array of ids [1, 2, 3]. Currently I have to run three queries to check whether users exist for those three ids.
This can be done with the $in operator.
Example:
db.users.find( {userId: { $in: [ 1, 2, 3 ] } } );
Once you have the matching users back, you can determine in the application layer which ids did not come back.
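The application-layer step could look like the sketch below. The function name findMissingIds is hypothetical, and foundUsers stands for the array of documents returned by the $in query. The comparison is type-sensitive, so the ids in the array must have the same type as the stored userId (a String in the schema above).

```javascript
// Returns the requested ids that have no matching user document.
// Hypothetical helper; foundUsers is the result of the $in query.
function findMissingIds(requestedIds, foundUsers) {
  const found = new Set(foundUsers.map((user) => user.userId));
  return requestedIds.filter((id) => !found.has(id));
}

// Example with stand-in query results: user "2" does not exist.
const missing = findMissingIds(
  ["1", "2", "3"],
  [{ userId: "1", name: "Ann" }, { userId: "3", name: "Bob" }]
);
console.log(missing);
```

An empty result means every id in the array exists in the collection.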

Get text words from query

I've read the MongoDB documentation on getting the indexes within a collection, and have also searched SO and Google for my question. I want to get the actual indexed values.
Or maybe my understanding of how MongoDB indexes work is incorrect. If I've been indexing a field called text that contains paragraphs, am I right in thinking that what gets indexed is each word in the paragraph?
Either case I want to retrieve the values that were indexed, which db.collection.getIndexes() doesn't seem to be returning.
Well yes and no, in summary.
Indexes work on the "values" of the fields they are supplied to index, and are much like a "card index" in that there is a point of reference to look at to find the location of something that matches that term.
What "you" seem to be asking about here is "text indexes". This is a special index format in MongoDB and other databases as well that looks at the "text" content of a field and breaks down every "word" in that content into a value in that "index".
Typically we do:
db.collection.createIndex({ "text": "text" })
Where the "field name" here is "text" as you asked, but more importantly the type of index here is "text".
This allows you to then insert data like this:
db.collection.insert({ "text": "The quick brown fox jumped over the lazy dog" })
And then search like this, using the $text operator:
db.collection.find({ "$text": { "$search": "brown fox" } })
This returns the documents whose indexed text matches the terms in your query. Each match gets a relevance score, which you can project and sort on with { "$meta": "textScore" } to rank the results.
Note that a "text" index and its query do not target a specific field: the index itself can span multiple fields, and $text searches all of them. The constraint is that there can "only be one" text index on any given collection; attempting to create a second one results in an error.
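To picture what "every word becomes a value in the index" means, here is a toy inverted index in plain JavaScript. It is only a conceptual sketch with made-up function names, not how MongoDB stores text indexes; a real text index also drops stop words such as "the" and stems words, which this sketch does not do.

```javascript
// Split text into lowercase words. (Toy tokenizer, no stemming or stop words.)
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Map each word to the _ids of the documents that contain it.
function buildInvertedIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const word of new Set(tokenize(doc.text))) {
      if (!index.has(word)) index.set(word, []);
      index.get(word).push(doc._id);
    }
  }
  return index;
}

const toyIndex = buildInvertedIndex([
  { _id: 1, text: "The quick brown fox jumped over the lazy dog" },
  { _id: 2, text: "A brown bear" },
]);
console.log(toyIndex.get("brown")); // ids of documents containing "brown"
console.log(toyIndex.get("fox"));   // ids of documents containing "fox"
```

Looking up a search term is then a single map lookup instead of a scan over every document's text.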
As per mongodb's docs:
"db.collection.getIndexes() returns an array of documents that hold index information for the collection. Index information includes the keys and options used to create the index. For information on the keys and index options, see db.collection.createIndex()."
You first have to create the index on the collection, using the createIndex() method:
db.records.createIndex( { userid: 1 } )
Queries on the userid field are supported by the index:
Example:
db.records.find( { userid: 2 } )
db.records.find( { userid: { $gt: 10 } } )
Indexes help you avoid scanning the whole collection. They are essentially ordered references (pointers) to the documents in your collection.
The docs explain it better:
http://docs.mongodb.org/manual/tutorial/create-an-index/

Multi-key Indexing on an Entire Array

MongoDB's docs explain multi-key indexing. Consider this comment document.
{
  "_id": ObjectId(...),
  "title": "Grocery Quality",
  "comments": [
    { author_id: ObjectId(...),
      date: Date(...),
      text: "Please expand the cheddar selection." },
    { author_id: ObjectId(...),
      date: Date(...),
      text: "Please expand the mustard selection." },
    { author_id: ObjectId(...),
      date: Date(...),
      text: "Please expand the olive selection." }
  ]
}
The docs explain that it's possible to index on the comments.text, or any comments' field. However, is it possible to index on the comments key itself?
This post demonstrates indexing on an array of strings, however, the above comments field is an array of JSON objects.
Based on Antoine Girbal's article, it appears possible to index on an array of JSON objects where each JSON object has a different key name. However, it doesn't appear possible where each JSON object in the array shares the same key names.
Example - https://gist.github.com/kman007us/6797422
Yes, you can index subdocuments, and they can be part of a multikey index. When you index whole subdocuments, the index only matches queries against the whole subdocument, e.g.:
db.test.find({records: {hair: "brown"}})
This searches for records elements that are exactly {hair: "brown"}, and it can use the index to find them.
If you want to find any subdocuments that have hair = "brown" alongside any other fields, dot notation is needed, e.g.:
db.test.find({"records.hair": "brown"})
However, with only the whole-subdocument index in place, there is no index to use for that, so it's a full collection scan.
Please note: There are limitations on index size and whole documents could easily exceed that size.
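The difference between the two query shapes can be modelled in plain JavaScript. This is a rough sketch of the matching semantics only, with hypothetical function names; it says nothing about index usage. Comparing via JSON.stringify is a deliberate simplification that also makes the whole-subdocument comparison sensitive to field order, as exact subdocument matches are in MongoDB.

```javascript
// Models db.test.find({records: sub}): an array element must equal sub exactly.
function matchesWholeSubdocument(doc, sub) {
  const target = JSON.stringify(sub);
  return doc.records.some((record) => JSON.stringify(record) === target);
}

// Models db.test.find({"records.<field>": value}): other fields are ignored.
function matchesDotNotation(doc, field, value) {
  return doc.records.some((record) => record[field] === value);
}

const exact = { records: [{ hair: "brown" }] };
const extra = { records: [{ hair: "brown", eyes: "blue" }] };

console.log(matchesWholeSubdocument(exact, { hair: "brown" })); // element is exactly {hair: "brown"}
console.log(matchesWholeSubdocument(extra, { hair: "brown" })); // element has an extra field
console.log(matchesDotNotation(extra, "hair", "brown"));        // extra fields do not matter
```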