Index not picked with nested field hierarchy but picked when the fields are flattened - mongodb

I have been struggling for two weeks to understand why my indexes are not picked when I "explain" my queries.
I have this query:
{ "$and": [
{ "extraProperties.class": "Residential" }, { "extraProperties.type": "Sale" }, { "extraProperties.propertyType": "Condo Apartment" }, { "extraProperties.propertyTypeStyle": "Apartment" } ] }
The above query won't pick this index:
{ "extraProperties.class": 1, "extraProperties.type": 1, "extraProperties.propertyType": 1, "extraProperties.propertyTypeStyle": 1 }
I have been testing everything these past few days, and I finally decided to flatten the hierarchy. Now my query looks like this:
{ "$and": [
{ "class": "Residential" }, { "type": "Sale" }, { "propertyType": "Condo Apartment" }, { "propertyTypeStyle": "Apartment" }
] }
Now the above query will pick this index:
{ "class": 1, "type": 1, "propertyType": 1, "propertyTypeStyle": 1 }
Could someone explain what is going on here?
Explain result:
https://drive.google.com/file/d/1bs_mqO-1FEBHQ_FsBWgP2TQ2jeiQz4q_/view?usp=sharing
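When comparing the two explain results, it can help to reduce each winning plan to a short list of stages. Below is a minimal sketch, assuming the plan follows the queryPlanner.winningPlan shape of recent MongoDB versions (stage, inputStage, inputStages, indexName). The summarizePlan helper and the sample explain fragment are hypothetical and hand-written for illustration; they are not taken from the linked file.

```javascript
// Walk a winning plan and collect each stage name plus the index it uses, if any.
// Assumption: the plan uses the fields stage, inputStage, inputStages, indexName.
function summarizePlan(plan, out = []) {
  if (!plan) return out;
  out.push(plan.indexName ? `${plan.stage}(${plan.indexName})` : plan.stage);
  if (plan.inputStage) summarizePlan(plan.inputStage, out);
  (plan.inputStages || []).forEach((child) => summarizePlan(child, out));
  return out;
}

// Hypothetical explain fragment, written by hand for this example only.
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "class_1_type_1" },
    },
  },
};

const stages = summarizePlan(sampleExplain.queryPlanner.winningPlan);
console.log(stages.join(" <- "));
```

A plan that contains COLLSCAN and no IXSCAN stage means no index was picked for that query.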

Related

Tricky MongoDB search challenge

I have a tricky mongoDB problem that I have never encountered.
The Documents:
The documents in my collection have a search object containing named keys and array values. The keys are named after one of eight categories and the corresponding value is an array containing items from that category.
{
_id: "bRtjhGNQ3eNqTiKWa",
/* */
search :{
usage: ["accounting"],
test: ["knowledgetest", "feedback"]
},
test: {
type:"list",
vals: [
{name:'knowledgetest', showName: 'Wissenstest'},
{name:'feedback', showName: '360 Feedback'},
]
},
usage: {
type:"list",
vals: [
{name:'accounting', showName: 'Accounting'},
]
}
},
{
_id: "7bgvegeKZNXkKzuXs",
/* */
search :{
usage: ["recruiting"],
test: ["intelligence", "feedback"]
},
test: {
type:"list",
vals: [
{name:'intelligence', showName: 'Intelligenztest'},
{name:'feedback', showName: '360 Feedback'},
]
},
usage: {
type:"list",
vals: [
{name:'recruiting', showName: 'Recruiting'},
]
}
},
The Query
The query is an object containing the same category keys and array values.
{
usage: ["accounting", "assessment"],
test : ["feedback"]
}
The desired outcome
If the query is empty, I want all documents.
If the query has one category and any number of items, I want all the documents that have all of the items in the specified category.
If the query has more than one category, I want all the documents that have all of the items in all of the specified categories.
My tries
I tried all kinds of variations of:
XX.find({
'search': {
"$elemMatch": {
'tool': {
"$in" : ['feedback']
}
}
}
})
No success.
EDIT
Tried: 'search.test': {$all: (query.test ? query.test : [])} which gives me no results if I have nothing selected; the right documents when I am only looking inside the test category; and nothing when I additionally look inside the usage category.
This is at the heart of my app, so I have put up a bounty.
let tools = []
const search = {}
for (var q in query) {
if (query.hasOwnProperty(q)) {
if (query[q]) {
search['search.'+q] = {$all: query[q] }
}
}
}
if (Object.keys(query).length > 0) {
tools = ToolsCollection.find(search).fetch()
} else {
tools = ToolsCollection.find({}).fetch()
}
Works like a charm
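The behaviour of the loop above can be checked without a database. This is a minimal sketch: buildSearch mirrors the loop, and matches is a hypothetical in-memory stand-in for find() that understands only "search.<category>": { $all: [...] } clauses. One deliberate difference from the code above: empty arrays are skipped, on the assumption that an empty $all list matches no documents (which would explain the "no results if I have nothing selected" case from the EDIT).

```javascript
// Build the filter like the loop above: one $all clause per non-empty category.
function buildSearch(query) {
  const search = {};
  for (const [category, items] of Object.entries(query)) {
    if (Array.isArray(items) && items.length > 0) {
      search["search." + category] = { $all: items };
    }
  }
  return search;
}

// Hypothetical stand-in for find(): supports only "search.<key>": { $all: [...] }.
function matches(doc, search) {
  return Object.entries(search).every(([path, cond]) => {
    const key = path.split(".")[1];
    const values = (doc.search && doc.search[key]) || [];
    return cond.$all.every((item) => values.includes(item));
  });
}

const docs = [
  { _id: "bRtjhGNQ3eNqTiKWa", search: { usage: ["accounting"], test: ["knowledgetest", "feedback"] } },
  { _id: "7bgvegeKZNXkKzuXs", search: { usage: ["recruiting"], test: ["intelligence", "feedback"] } },
];

// Return the ids of the documents that satisfy the query.
const run = (query) =>
  docs.filter((doc) => matches(doc, buildSearch(query))).map((doc) => doc._id);

console.log(run({}));
console.log(run({ usage: ["accounting"], test: ["feedback"] }));
```

Because an empty query produces the empty filter {}, the separate branch for "no categories selected" is not strictly needed in this sketch.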
What I already hinted at in the comment: your document structure does not support efficient and simple searching. I can only guess the reason, but I suspect that you stick to some relational ideas like "schemas" or "normalization" which just don't make sense for a document database.
Without digging deeper into the problem of modeling, I could imagine something like this for your case:
{
_id: "bRtjhGNQ3eNqTiKWa",
/* */
search :{
usage: ["accounting"],
test: ["knowledgetest", "feedback"]
},
test: {
"knowledgetest" : {
"showName": "Wissenstest"
},
"feedback" : {
"showName": "360 Feedback"
}
},
usage: {
"accounting" : {
"values" : [ "knowledgetest", "feedback" ],
"showName" : "Accounting"
}
}
},
{
_id: "7bgvegeKZNXkKzuXs",
/* */
search : {
usage: ["recruiting"],
test: ["intelligence", "feedback"]
},
test: {
"intelligence" : {
showName: 'Intelligenztest'
},
"feedback" : {
showName: '360 Feedback'
}
},
usage: {
"recruiting" : {
"values" : [ "intelligence", "feedback" ],
"showName" : "Recruiting"
}
}
}
Then, a search for "knowledgetest" and "feedback" in "accounting" would be a simple
{ "usage.accounting.values" : { $all : [ "knowledgetest", "feedback"] } }
which can easily be used multiple times in an and condition:
{ $and: [
{ "usage.accounting.values" : { $all : [ "knowledgetest", "feedback" ] } },
{ "usage.anothercategory.values" : { $all : [ "knowledgetest", "assessment" ] } }
] }
Even the zero-times-case matches your search requirements, because an and-filter with none of these criteria yields {} which is the find-everything filter expression.
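To illustrate the zero, one, and many cases, here is a minimal sketch of a filter builder for the restructured documents. The buildUsageFilter name and the shape of its criteria argument (usage name mapped to required items) are assumptions made for this example. The empty case returns {} directly because, as far as I know, MongoDB rejects an $and with an empty array.

```javascript
// criteria: hypothetical map of usage name -> items that must all be present.
function buildUsageFilter(criteria) {
  const clauses = Object.entries(criteria).map(([usage, items]) => ({
    ["usage." + usage + ".values"]: { $all: items },
  }));
  if (clauses.length === 0) return {};          // find-everything filter
  if (clauses.length === 1) return clauses[0];  // a single clause needs no $and
  return { $and: clauses };
}

console.log(JSON.stringify(buildUsageFilter({})));
console.log(JSON.stringify(buildUsageFilter({
  accounting: ["knowledgetest", "feedback"],
  anothercategory: ["knowledgetest", "assessment"],
})));
```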
Once more, to make it absolutely clear: when using mongo, forget everything you know as "best practice" from the relational world. What you need to consider is: what are your queries, and how can your document model support these queries in an ideal way?

Exclude nested subdocuments in Mongodb without arrays

Here is how data are inserted in a "Products" MongoDB collection (using Meteor):
Products.insert(
{
productOne:
{
publicData:
{
pricePerUnit : 1,
label : "The first product"
},
privateData:
{
test1: "xxxxx",
test2: "xxxxx"
}
},
productTwo:
{
publicData:
{
pricePerUnit : 2,
label : "The second product"
},
privateData:
{
test1: "yyyyy",
test2: "yyyyy"
}
}
}
);
I would like to retrieve all the products, but without the "privateData" subdocuments, to get this:
{
productOne:
{
publicData:
{
pricePerUnit : 1,
label : "The first product"
}
},
productTwo:
{
publicData:
{
pricePerUnit : 2,
label : "The second product"
}
}
}
I tried several things with "$elemMatch" but honestly I didn't succeed with any of them, and I have trouble understanding how I am even supposed to do that.
Would anyone have a suggestion? Any help would be greatly appreciated.
Thanks!
Your query would be something similar to this:
Products.find({}, {
fields: {
privateData: 0
}
})
privateData: 0 makes sure that the field is omitted.
Please refer to https://docs.mongodb.org/manual/tutorial/project-fields-from-query-results/ for more info.
If you can use the aggregation framework, you can use the $project operator:
db.<collection_name>.aggregate( [ { $project: { publicData: 1 } } ] );
And you will get back all of your documents with only the publicData field.
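One thing to watch for: in the inserted document, privateData sits under productOne and productTwo rather than at the top level, so an exclusion most likely has to name the full dotted paths, for example { "productOne.privateData": 0, "productTwo.privateData": 0 }. The sketch below only emulates that idea in plain JavaScript so the expected result can be checked; omitPaths is a hypothetical helper, not a MongoDB or Meteor API.

```javascript
// Return a deep copy of doc with each dotted path removed, similar in spirit
// to an exclusion projection such as { "productOne.privateData": 0 }.
function omitPaths(doc, paths) {
  const copy = JSON.parse(JSON.stringify(doc)); // fine for plain JSON-like data
  for (const path of paths) {
    const keys = path.split(".");
    const last = keys.pop();
    let node = copy;
    for (const key of keys) {
      node = node && typeof node === "object" ? node[key] : undefined;
    }
    if (node && typeof node === "object") delete node[last];
  }
  return copy;
}

const product = {
  productOne: {
    publicData: { pricePerUnit: 1, label: "The first product" },
    privateData: { test1: "xxxxx", test2: "xxxxx" },
  },
  productTwo: {
    publicData: { pricePerUnit: 2, label: "The second product" },
    privateData: { test1: "yyyyy", test2: "yyyyy" },
  },
};

const publicOnly = omitPaths(product, ["productOne.privateData", "productTwo.privateData"]);
console.log(JSON.stringify(publicOnly, null, 2));
```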

Merge changeset documents in a query

I have recorded changes from an information system in a mongo database. Every time a set of values are set or changed, a record is saved in the mongo database.
The change collection is in the following form:
{ "user_id": 1, "timestamp": { "date": "2010-09-22 09:28:02", "timezone_type": 3, "timezone": "Europe/Paris" }, "changes": { "fieldA": "valueA", "fieldB": "valueB", "fieldC": "valueC" } }
{ "user_id": 1, "timestamp": { "date": "2010-09-24 19:01:52", "timezone_type": 3, "timezone": "Europe/Paris" }, "changes": { "fieldA": "new_valueA", "fieldB": null, "fieldD": "valueD" } }
{ "user_id": 1, "timestamp": { "date": "2010-10-01 11:11:02", "timezone_type": 3, "timezone": "Europe/Paris" }, "changes": { "fieldD": "new_valueD" } }
Of course there are thousands of records per user with different attributes, which adds up to millions of records. What I want to do is to see a user's status at a given time. For example, user_id 1 at 2010-09-30 would be
fieldA: new_valueA
fieldC: valueC
fieldD: valueD
This means I need to flatten all the changes prior to a given date for a given user into a single record. Can I do that directly in mongo ?
Edit: I am using the 2.0 version of mongodb hence cannot benefit from the aggregation framework.
Edit: It seems I have found the answer to my question.
var mapTimeAndChangesByUserId = function() {
var key = this.user_id;
var value = { timestamp: this.timestamp.date, changes: this.changes };
emit(key, value);
}
var reduceMergeChanges = function(user_id, changeset) {
var mergeFunction = function(a, b) { for (var attr in b) a[attr] = b[attr]; };
var result = {};
changeset.forEach(function(e) { mergeFunction(result, e.changes); });
return { timestamp: changeset.pop().timestamp, changes: result };
}
The reduce function merges the changes in the order they come and returns the result.
db.user_change.mapReduce(
mapTimeAndChangesByUserId,
reduceMergeChanges,
{
out: { inline: 1 },
query: { user_id: 1, "timestamp.date": { $lt: "2010-09-30" } },
sort: { "timestamp.date": 1 }
});
"results" : [
{
"_id": 1,
"value": {
"timestamp": "2010-09-24 19:01:52",
"changes": {
"fieldA": "new_valueA",
"fieldB": null,
"fieldC": "valueC",
"fieldD": "valueD"
}
}
}
]
Which is fine for me.
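The merge itself can be tried outside of MongoDB. This is a minimal sketch in plain JavaScript, assuming the dates are strings in the "YYYY-MM-DD HH:MM:SS" format shown in the question, so that string comparison orders them chronologically. The stateAt function and its dropNulls option are hypothetical names introduced here; dropNulls removes fields whose latest value is null, which gives the three-field status from the question.

```javascript
// Merge all changes of one user that are older than the cutoff, oldest first,
// so later values overwrite earlier ones.
function stateAt(changes, userId, cutoff, { dropNulls = false } = {}) {
  const merged = {};
  changes
    .filter((c) => c.user_id === userId && c.timestamp.date < cutoff)
    .sort((a, b) => a.timestamp.date.localeCompare(b.timestamp.date))
    .forEach((c) => Object.assign(merged, c.changes));
  if (dropNulls) {
    for (const key of Object.keys(merged)) {
      if (merged[key] === null) delete merged[key];
    }
  }
  return merged;
}

const changeLog = [
  { user_id: 1, timestamp: { date: "2010-10-01 11:11:02" }, changes: { fieldD: "new_valueD" } },
  { user_id: 1, timestamp: { date: "2010-09-22 09:28:02" }, changes: { fieldA: "valueA", fieldB: "valueB", fieldC: "valueC" } },
  { user_id: 1, timestamp: { date: "2010-09-24 19:01:52" }, changes: { fieldA: "new_valueA", fieldB: null, fieldD: "valueD" } },
];

console.log(stateAt(changeLog, 1, "2010-09-30"));
console.log(stateAt(changeLog, 1, "2010-09-30", { dropNulls: true }));
```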
You could write a MR to do this.
Since the fields are a lot like tags, you can modify a nice cookbook example of counting tags here: http://cookbook.mongodb.org/patterns/count_tags/. Of course, instead of counting you want the latest value applied for that field (an assumption, since this is not clear in your question).
So let's get our map function:
map = function() {
if (!this.changes) {
// If there were no changes for some reason, skip this record
return;
}
// We iterate the changes
for (var index in this.changes) {
emit(index /* We emit the field name */, this.changes[index] /* We emit the field value */);
}
}
And now for our reduce:
reduce = function(key, values) {
// This part depends on the sort of your input query. If you sort by
// date (ts) DESC, you will probably want the first element (index 0)
// rather than the last one, which is what is picked here.
return values[values.length - 1];
}
And this will output a single document per field change of the type:
{
_id: your_field_ie_fieldA,
value: whoop
}
You can then iterate over the (most likely inline) output and, bam, you have your changes.
This is of course only one way of doing it, and it is not designed to be run completely inline with your app. However, that all depends on the size of the data you're working on; it could run very close to inline.
I am unsure whether group and distinct can run on this, but it looks like group might: http://docs.mongodb.org/manual/reference/method/db.collection.group/#db-collection-group. I should note that group is basically an MR wrapper, but you could do something like this (untested, just like the MR above):
db.col.group( {
key: { 'changes.fieldA': 1 /* , the rest of the fields */ },
cond: { 'timestamp.date': { $gt: new Date( '01/01/2012' ) } },
reduce: function ( curr, result ) { },
initial: { }
} )
But it does require you to define the keys instead of just iterating over them programmatically (maybe there is a better way).

How can I find records greater than or equal to a time in MongoDB?

I have a MongoDB document structured like this:
{
"_id": ObjectId("50cf904a07ef604c8cc3d091"),
"lessons": {
"0": {
"lesson_name": "View and Edit Lists",
"release_time": ISODate("2012-12-17T00:00:00Z"),
"requires_anim": false,
"requires_qq": true
},
"1": {
"lesson_name": "Leave a Tip",
"release_time": ISODate("2012-12-18T00:00:00Z"),
"requires_anim": false,
"requires_qq": true
}
}
}
I have a number of such documents. I'd like to get all documents for which the release time of a lesson is greater than or equal to a given time. Here's the query I wrote:
db.lessons.find({"lessons.release_time":{"$gte": ISODate("2012-12-16")}});
But this is not returning any documents. Any ideas on what I'm doing wrong and how to correct it? Thanks.
Here's the result of my testing:
> db.testc.insert( { lessons: [
{release_time: ISODate("2012-12-17T00:00:00Z")},
{release_time: ISODate("2012-12-18T00:00:00Z")}
] } )
> db.testc.find({"lessons.release_time":{"$gte": ISODate("2012-12-16")}})
{ "_id" : ObjectId("50cfa093ab08a4592c73f927"),
"lessons" : [
{ "release_time" : ISODate("2012-12-17T00:00:00Z") },
{ "release_time" : ISODate("2012-12-18T00:00:00Z") }
] }
Your query is fine but, as others have pointed out, most likely your data is not structured as an array.
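In the question's document, lessons is an object keyed by "0" and "1" rather than an array, which is why the path "lessons.release_time" finds nothing there. Here is a minimal sketch of the difference in plain JavaScript, using Date in place of ISODate. The hasLessonReleasedSince helper is hypothetical and only illustrates that reading the values of the keyed object gives the array shape the query expects.

```javascript
// Accept either shape: a real array, or an object keyed by "0", "1", ...
function hasLessonReleasedSince(doc, since) {
  const lessons = Array.isArray(doc.lessons)
    ? doc.lessons
    : Object.values(doc.lessons || {});
  return lessons.some((lesson) => lesson.release_time >= since);
}

// Same shape as the document in the question (object, not array).
const keyedDoc = {
  lessons: {
    "0": { lesson_name: "View and Edit Lists", release_time: new Date("2012-12-17T00:00:00Z") },
    "1": { lesson_name: "Leave a Tip", release_time: new Date("2012-12-18T00:00:00Z") },
  },
};

// The array form that the dotted-path query can reach into.
const arrayDoc = { lessons: Object.values(keyedDoc.lessons) };

console.log(hasLessonReleasedSince(keyedDoc, new Date("2012-12-16")));
console.log(Array.isArray(arrayDoc.lessons));
```

Storing lessons as an array in the first place, as in the test insert above, lets the original query work unchanged.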

Efficiency of indexed embedded array

I am currently evaluating the efficiency of different databases for a use case. In MongoDB, I would like to store around 1 million objects with the following structure. Each object will have between 5 and 10 objects in the foos array.
{
name:"my name",
foos:[
{
foo:"...",
bar:"..."
},
{
foo:"...",
bar:"..."
},
{
foo:"...",
bar:"..."
}
]
}
I often need to search for objects where the foos array contains an object with a specific property, e.g.:
// mongo collection
[
{
name:'my name',
foos:[
{
foo:'one_foo',
bar:'a_bar'
},
{
foo:'two_foo',
bar:'b_bar'
}
]
},
{
name:'another name',
foos:[
{
foo:'another foo',
bar:'a_bar'
},
{
foo:'just another foo',
bar:'c_bar'
}
]
}
]
// search (pseudo code)
{ foos: {$elemMatch: {bar: 'c_bar'}} }
// returns
{
name:'another name',
foos:[
{
foo:'another foo',
bar:'a_bar'
},
{
foo:'just another foo',
bar:'c_bar'
}
]
}
Can this efficiently be done with mongo and how should the indexes be set?
I don't want you to evaluate performance for me; I just want an idea of how mongo performs for my use case or what optimization could look like.
MongoDB has documentation explaining how to create indexes on embedded documents, through dot notation:
Dot Notation (Reaching into Objects)
> db.blogposts.findOne()
{ title : "My First Post", author: "Jane",
comments : [{ by: "Abe", text: "First" },
{ by : "Ada", text : "Good post" } ]
}
> db.blogposts.find( { "comments.by" : "Ada" } )
> db.blogposts.ensureIndex( { "comments.by" : 1 } );
As for the performance characteristic... just test it with your dataset.
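For the collection in the question, the matching index would be on "foos.bar" (in current shells createIndex replaces ensureIndex). An index on a field inside an array holds one entry per array element, so a lookup goes from the value straight to the documents that contain it. The sketch below is only a conceptual model of that idea in plain JavaScript; buildMultikeyIndex and lookup are hypothetical helpers and do not reflect how MongoDB stores its indexes internally.

```javascript
// Conceptual model: map each distinct sub-field value to the positions of
// the documents whose array contains an element with that value.
function buildMultikeyIndex(docs, arrayField, subField) {
  const index = new Map();
  docs.forEach((doc, position) => {
    for (const element of doc[arrayField] || []) {
      const key = element[subField];
      if (!index.has(key)) index.set(key, new Set());
      index.get(key).add(position);
    }
  });
  return index;
}

// Resolve a value to documents without scanning the whole collection.
function lookup(docs, index, key) {
  return [...(index.get(key) || [])].map((position) => docs[position]);
}

const collection = [
  { name: "my name", foos: [{ foo: "one_foo", bar: "a_bar" }, { foo: "two_foo", bar: "b_bar" }] },
  { name: "another name", foos: [{ foo: "another foo", bar: "a_bar" }, { foo: "just another foo", bar: "c_bar" }] },
];

const barIndex = buildMultikeyIndex(collection, "foos", "bar");
console.log(lookup(collection, barIndex, "c_bar").map((doc) => doc.name));
```

With 5 to 10 elements per document, such an index would hold roughly that many entries per document, which is worth keeping in mind when estimating index size.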