Matching on compound _id fields in MongoDB aggregate

I'm a MongoDB novice so please forgive me if this question has an obvious answer...
Context:
I've followed the example in the MongoDB docs to implement hierarchical aggregation using map-reduce. The example uses a "compound" _id field as a map-reduce key producing aggregate documents like this...
{
  _id: { u: "rick", d: ISODate("2010-10-10T14:00:00Z") },
  value: {
    ts: ISODate("2010-10-10T15:01:00Z"),
    total: 254,
    count: 10,
    mean: 25.4
  }
}
This is all well and good. My particular use case requires that values for several similar keys be emitted at each map step. For example...
{
  _id: { u: "rick", d: ISODate("2010-10-10T14:00:00Z"), hobby: "wizardry" },
  value: {
    ts: ISODate("2010-10-10T15:01:00Z"),
    total: 254,
    count: 10,
    mean: 25.4
  }
}
{
  _id: { u: "rick", d: ISODate("2010-10-10T14:00:00Z"), gender: "male" },
  value: {
    ts: ISODate("2010-10-10T15:01:00Z"),
    total: 254,
    count: 10,
    mean: 25.4
  }
}
(The values are the same, but the _id keys are slightly different.)
This is also well and good.
Question:
Now I'd like to aggregate over my hierarchical collections (views), which contain documents having several different compound _id fields, but only over documents with $matching _id fields. For example, I'd like to aggregate over just the documents possessing the {u: String, d: Date, hobby: String} type _id or just the documents with an _id of type {u: String, d: Date}.
I'm aware that I can use the $exists operator to restrict which _id fields should and shouldn't be permitted, but I don't want to have to create a separate aggregation for each _id (potentially many).
Is there a simple way of programmatically restricting $matching documents to those containing (or not containing) particular fields in an aggregate?
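For illustration, the kind of per-shape $match I'd like to avoid repeating for every _id combination looks roughly like this (a sketch only; "views" stands in for one of my hierarchical collections):
// Sketch of the per-shape restriction I'd rather not duplicate: keep only
// documents whose _id has u, d and hobby but no other tag field.
db.views.aggregate([
  { $match: {
      "_id.u": { $exists: true },
      "_id.d": { $exists: true },
      "_id.hobby": { $exists: true },
      "_id.gender": { $exists: false }
  } },
  // ...remaining aggregation stages...
])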

I think the best way to address this issue is by storing your data differently. Your "_id" sort of has arbitrary values as keys, and that is something you should avoid. I would probably store the documents as:
{
  _id: { u: "rick", d: ISODate("2010-10-10T14:00:00Z"), type: "hobby", value: "wizardry" }
}
{
  _id: { u: "rick", d: ISODate("2010-10-10T14:00:00Z"), type: "gender", value: "male" }
}
And then your $match becomes simple, even without having to create a different match for each type.
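For example, something like this (a sketch assuming the restructured _id above; "views" is a placeholder collection name):
// Sketch: with the type/value pair inside _id, a single $match on the type
// field restricts the pipeline without enumerating possible _id shapes.
db.views.aggregate([
  { $match: { "_id.type": "hobby" } },
  // ...remaining aggregation stages...
])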

Related

Find a value in multiple nested fields with the same name in MongoDB

In some documents I have a property with a complex structure like:
{
  content: {
    foo: {
      id: 1,
      name: 'First',
      active: true
    },
    bar: {
      id: 2,
      name: 'Second',
      active: false
    },
    baz: {
      id: 3,
      name: 'Third',
      active: true
    }
  }
}
I'm trying to make a query that can find all documents with a given value in the field name across the different second-level objects foo, bar, and baz.
I guess that a solution could be:
db.getCollection('mycollection').find({ $or: [
{'content.foo.name': 'First'},
{'content.bar.name': 'First'},
{'content.baz.name': 'First'}
]})
But I want to do it dynamically, without needing to specify the key names of the nested fields or repeating the value to find on every line.
If some regexp on field names were available, a solution could be:
db.getCollection('mycollection').find({'content.*.name': 'First'}) // Match
db.getCollection('mycollection').find({'content.*.name': 'Third'}) // Match
db.getCollection('mycollection').find({'content.*.name': 'Fourth'}) // Doesn't match
Is there any way to do it?
I would say this is a bad schema if you don't know your keys in advance. Personally, I'd recommend changing this to an array structure.
Regardless, what you can do is use the aggregation $objectToArray operator, then query the newly created array. Mind you, this approach requires a collection scan each time you execute the query.
db.collection.aggregate([
  {
    $addFields: {
      arr: {
        "$objectToArray": "$content"
      }
    }
  },
  {
    $match: {
      "arr.v.name": "First"
    }
  },
  {
    $project: {
      arr: 0
    }
  }
])
Mongo Playground
Another, hackier approach you can take is to create a wildcard text index, and then you could execute a $text query to search for the name. Obviously this comes with the text index/query limitations and might not be right for your use case.
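A minimal sketch of that idea (assuming a wildcard text index over all string fields; note that $text matches the value anywhere in the indexed strings, not only in the nested name fields):
// Sketch: wildcard text index, then a $text search for the value.
db.getCollection('mycollection').createIndex({ "$**": "text" })
db.getCollection('mycollection').find({ $text: { $search: "First" } })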

Is it possible to write a Mongo query that validates a condition in each element of an array?

Consider the following documents:
{
  _id: 1,
  a: [{ b: true }, { b: true }]
},
{
  _id: 2,
  a: [{ b: false }, { b: true }]
},
{
  _id: 3,
  a: [{ b: true }]
}
I'd like to write a query that will return all of the top level documents that have an array ("a") that contain only elements matching {b : true}. In this example, I'm expecting the first and third document to be returned, but not the second.
When writing the query like this, all 3 documents are returned:
{a : {b : true}}
Is there an operator that I'm missing? I've reviewed quite a few of them ($all), and I'm not sure which would match best for this use case.
Thanks so much
Simply use $allElementsTrue on the array a.b.
db.collection.find({
  $expr: {
    "$allElementsTrue": "$a.b"
  }
})
Here is the Mongo Playground for your reference.
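If you would rather avoid $expr, a $not/$elemMatch query gives the same result for the sample documents (a sketch; edge cases such as a missing or empty array behave differently from $allElementsTrue):
// Keep documents where no element of "a" has b different from true.
db.collection.find({
  a: { $not: { $elemMatch: { b: { $ne: true } } } }
})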

MongoDB range query on entire embedded document value

Assume we have a collection foo with index {tag: 1} where tag is a single key-value pair (there are a lot more details in my actual use-case, but I'm trying to distill down the problem):
{_id: 1, tag: {bar: "BAR"}}
{_id: 2, tag: {baz: "BAZ"}}
When I query {tag: { $gte: { baz: MinKey() }}}, it returns BOTH documents (unexpected).
When I query {tag: { $gte: { baz: "" }}}, it returns only {_id: 2, tag: {baz: "BAZ"}} (expected).
According to https://docs.mongodb.com/manual/reference/bson-type-comparison-order/#objects, BSON objects are ordered: 1) by field names and 2) by field values.
So why does {tag: { $gte: { baz: MinKey() }}} return {_id: 1, tag: {bar: "BAR"}} when bar is NOT GREATER THAN baz?
Note the comment a few lines down in the documentation you linked:
Non-existent Fields
The comparison treats a non-existent field as if it were an empty BSON Object. As such, a sort on the a field in documents { } and { a: null } would treat the documents as equivalent in sort order.
This is telling you that non-existent fields and fields set to null are treated specially.
In order for the documents {} and {a: null} to be equivalent in sort order, the sort algorithm must be considering the missing sort field to be present, and have a value of null.
If you explicitly add the missing field, just to see how it looks, the ordering makes more sense.
The filter {tag: { $gte: { baz: MinKey() }}} applied to {_id: 1, tag: {bar: "BAR"}} is essentially comparing {baz: MinKey()} with {baz: null, bar: "BAR"}.
Near the top of the documentation you linked, it states that MinKey is less than null, so that is the proper ordering.
EDIT
In general, querying is most efficient when the field names are not themselves data. In terms of a tabular database, which column would contain "baz"?
A slight change in the schema would simplify this type of query. Instead of {tagname: tagvalue}, use {k: tagname, v: tagvalue}. You could then index tag.k and/or tag.v and query on tag.k to find all documents with a "baz" tag; querying tags with inequality operators would also work more intuitively.
db.collection.find({"tag.k":{$gte:"baz"}})
Exact matches could be done with $elemMatch, like:
db.collection.find({tag: {$elemMatch:{k:"baz",v:"BAZ"}}})
If you really need the returned documents to contain {tagname: tagvalue}, the $arrayToObject aggregation operator can do that:
db.collection.aggregate([
  {
    $match: {
      "tag.k": { $gte: "baz" }
    }
  },
  {
    $addFields: {
      tag: { $arrayToObject: [["$tag"]] }
    }
  }
])
Playground

Mongodb: How to update calculated field in atomic operation

In an atomic update operation, I would like to (re)calculate a field in a document based on the updated values of other fields in the same document.
Example:
Documents have this structure:
{ _id: xxx, nrValues: 4, sumOfValues: 20, meanValue: 5 }
{ _id: yyy, nrValues: 3, sumOfValues: 12, meanValue: 4 }
The (indexed) meanValue field is needed for fast sort/find operations.
Now I would like to "add" new values to documents like in this pseudocode:
Documents.update( { _id: xxx },
{ $inc: { nrValues: 1, sumOfValues: newValue },
$set: { meanValue: sumOfValues / nrValues } } );
Is there a way to find real code for this pseudocode?
BTW: I know that I could use map/reduce/aggregation to sort/find the documents this way without having an actual meanValue field at all, but I do want this actual field in the document and in my (Meteor based) situation I cannot use map/reduce/aggregation.
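For reference, one way this can be expressed today is an update that uses an aggregation pipeline (a sketch assuming MongoDB 4.2+ and direct driver/shell access, which may not apply in a Meteor setup; the collection name documents, as well as xxx and newValue, stand in for the names in the pseudocode):
// Sketch, assuming MongoDB 4.2+ pipeline-style updates.
// Two $set stages so meanValue is computed from the already-updated fields.
db.documents.updateOne(
  { _id: xxx },
  [
    { $set: {
        nrValues: { $add: ["$nrValues", 1] },
        sumOfValues: { $add: ["$sumOfValues", newValue] }
    } },
    { $set: {
        meanValue: { $divide: ["$sumOfValues", "$nrValues"] }
    } }
  ]
)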

Storing null vs not storing the key at all in MongoDB

It seems to me that when you are creating a Mongo document and have a field {key: value} which is sometimes not going to have a value, you have two options:
Write {key: null} i.e. write null value in the field
Don't store the key in that document at all
Both options are easily queryable: in one you query for {key: null}, and in the other you query for {key: {$exists: false}}.
I can't really think of any differences between the two options that would have any impact in an application scenario (except that option 2 has slightly less storage).
Can anyone tell me if there are any reasons one would prefer either of the two approaches over the other, and why?
EDIT
After asking the question, it also occurred to me that indexes may behave differently in the two cases, i.e. a sparse index can be created for option 2.
Indeed, you also have a third possibility:
key: "" (an empty value)
And you're forgetting a particular behaviour of the null value.
A query on key: null will retrieve all documents where key is null or where key doesn't exist, whereas a query on $exists: false will retrieve only documents where the key field doesn't exist.
To come back to your exact question, it depends on your queries and on what the data represents. If you need to record that, for example, a user set a value and then unset it, you should keep the field as null or empty. If you don't need that, you may remove the field.
Note that, since MongoDB doesn't use field-name dictionary compression, field: null consumes disk space and RAM, while storing no key at all consumes no resources.
It really comes down to:
Your scenario
Your querying manner
Your index needs
Your language
I personally have chosen to store null keys. It makes it much easier to integrate into my app. I use PHP with Active Record, and using null values makes my life a lot easier since I am not having to put the stress of field dependency upon the app. Also, I do not need to write any complex code to deal with magic for setting non-existent variables.
I personally would not store an empty value like "", since if you're not careful you could end up with two empty values, null and "", and then you'll have a haphazard time querying for a specific one. So I personally prefer null for empty values.
As for space and indexes: it depends on how many rows might not have this column, but I doubt you will really notice the index size increase due to a few extra docs with null in them. I mean, the difference in storage is minute, especially if the corresponding key name is small as well. That goes for large setups too.
I am quite frankly unsure of the index usage between $exists and null; however, null could be a more standardised method by which to query for existence, since remember that MongoDB is schemaless, which means you have no requirement to have that field in the doc. That again produces two empty values: non-existent and null. So it's better to choose one or the other.
I choose null.
Another point you might want to consider is the behaviour when you use OGM tools like Hibernate OGM.
If you are using Java, Hibernate OGM supports the JPA standard. So if you can write a JPQL query, it would theoretically be easy to switch to an alternate NoSQL datastore that is supported by the OGM tool.
JPA does not define an equivalent for $exists in Mongo, so if you have optional attributes in your collection then you cannot write a proper JPQL query for them. In such a case, if the attribute's value is stored as NULL, then it is still possible to write a valid JPQL query like the one below.
SELECT p FROM pppoe p where p.logout IS null;
I think in terms of disk space the difference is negligible. If you need to create an index on this field, then consider a partial index.
An index with { partialFilterExpression: { key: { $exists: true } } } can be much smaller than a normal index.
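For example (a sketch; key is the field name from the question):
// Index only documents where "key" exists, so documents that omit the
// field take up no space in the index.
db.collection.createIndex(
  { key: 1 },
  { partialFilterExpression: { key: { $exists: true } } }
)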
It should also be noted that queries behave differently; see values like these:
db.collection.insertMany([
  { _id: 1, a: 1 },
  { _id: 2, a: '' },
  { _id: 3, a: undefined },
  { _id: 4, a: null },
  { _id: 5 }
])
db.collection.aggregate([
  {
    $set: {
      type: { $type: "$a" },
      ifNull: { $ifNull: ["$a", true] },
      defined: { $ne: ["$a", undefined] },
      existing: { $ne: [{ $type: "$a" }, "missing"] }
    }
  }
])
{ _id: 1, a: 1, type: double, ifNull: 1, defined: true, existing: true }
{ _id: 2, a: "", type: string, ifNull: "", defined: true, existing: true }
{ _id: 3, a: undefined, type: undefined, ifNull: true, defined: false, existing: true }
{ _id: 4, a: null, type: null, ifNull: true, defined: true, existing: true }
{ _id: 5, type: missing, ifNull: true, defined: false, existing: false }
Or with db.collection.find():
db.collection.find({ a: { $exists: false } })
{ _id: 5 }
db.collection.find({ a: { $exists: true} })
{ _id: 1, a: 1 },
{ _id: 2, a: '' },
{ _id: 3, a: undefined },
{ _id: 4, a: null }
db.collection.find({ a: null })
{ _id: 3, a: undefined },
{ _id: 4, a: null },
{ _id: 5 }
db.collection.find({ a: {$ne: null} })
{ _id: 1, a: 1 },
{ _id: 2, a: '' },
db.collection.find({ a: {$type: "null"} })
{ _id: 4, a: null }