Background
I have a collection of items. Here is one:
{ "_id" : ObjectId("5d3e315975132b3a43149225"),
"thetime" : 201812,
"name": "watermelon",
"cost" : 7,
"info" : "empty",
"taglist" : [
{ "color" : "red" },
{ "color": "green"},
{ "store" : "market" },
{ "taste" : "sweet" } ]
}
I am trying to run an aggregation with a $match on every item that has the key color in its taglist (at least once). Later I want to group on the total cost of every color, every store, etc. So my output for just this item would be (red: $7, green: $7). The point is: I don't want to use find(); I want to use $match.
Question:
How do I match on a tuple key in an array?
What I have so far
This query works for getting items that have the key value pair: (color, red): db.items.aggregate([{$match: {"taglist":{"color":"red"}}}]);
But, I do not know how to change the query to return all the items with any color.
Note: I'd prefer to not start with an $unwind because the documents can actually be larger than this one and performance is important.
If you want to return all the documents that have the key color, you can use $exists:
db.items.aggregate([{$match: {"taglist.color":{$exists:true}}}])
Before the match you need to unwind taglist:
db.items.aggregate([
{
$unwind: "$taglist"
},
{
$match: { "taglist.color": { $exists: true } }
}
]);
You can then group further on the basis of the key.
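To make the grouping step concrete, here is a minimal sketch of how the whole pipeline might look for the sample document in the question. The output field name total and the helper totalsByColor are my own; the in-memory helper is not a MongoDB API and only illustrates what the pipeline is expected to compute. The leading $match runs before the $unwind so that only documents with a color tag get unwound.

```javascript
// Pipeline sketch: filter first, unwind only the survivors, then sum cost per color.
const pipeline = [
  { $match: { "taglist.color": { $exists: true } } },
  { $unwind: "$taglist" },
  // After the unwind each document holds a single tag; keep the color tags only.
  { $match: { "taglist.color": { $exists: true } } },
  { $group: { _id: "$taglist.color", total: { $sum: "$cost" } } }
];

// Plain-JavaScript illustration of the same logic (hypothetical helper).
function totalsByColor(docs) {
  const totals = {};
  docs.forEach(function (doc) {
    (doc.taglist || []).forEach(function (tag) {
      if ("color" in tag) {
        totals[tag.color] = (totals[tag.color] || 0) + doc.cost;
      }
    });
  });
  return totals;
}

// The sample item from the question, reduced to the fields that matter here.
const sample = [{
  name: "watermelon",
  cost: 7,
  taglist: [{ color: "red" }, { color: "green" }, { store: "market" }, { taste: "sweet" }]
}];
const sampleTotals = totalsByColor(sample);

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.items.aggregate(pipeline).forEach(printjson);
}
```

Swapping "color" for "store" in the second $match and in the $group key would give the totals per store instead.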
Related
I have a document in which data are like
collection A
{
"_id" : 69.0,
"values" : [
{
"date_data" : "2016-12-16 10:00:00",
"valueA" : 8,
"valuB" : 9
},
{
"date_data" : "2016-12-16 11:00:00",
"valueA" : 8,
"valuB" : 9
},.......
}
collection B
{
"_id" : 69.0,
"values" : [
{
"date_data" : "2017-12-16 10:00:00",
"valueA" : 8,
"valuB" : 9
},
{
"date_data" : "2017-12-16 11:00:00",
"valueA" : 8,
"valuB" : 9
},.......
}
Data is stored every hour. Because it is all stored in one document, it may at some point reach the 16 MB document size limit. That's why I'm thinking of spreading the data across years, meaning that within one collection each _id will hold one year's data. But when we want to show the data combined, how can we use the aggregate function?
For example, collectionA has data from 7 Dec 2016 to 7 Dec 2017 and collectionB has data from 6 Dec 2015 to 6 Dec 2016. How can I show data between 1 Dec 2016 and 1 Jan 2017, which is spread across different collections?
This is straightforward with MongoDB's $lookup stage, which is the equivalent of a left outer join. Each document on the left is scanned for the value of a local field, and documents from the right (the foreign collection) are matched on that value. For your case, here is the parent collection
Parent A
Child collection B
Now all we have to do is make a query from the collection A
With a very simple aggregation $lookup query, you'll see the following result
db.getCollection('uniques').aggregate([
{
"$lookup": {
"from": "values",//Get data from values table
"localField": "_id", //The field _id of the current table uniques
"foreignField": "parent_id", //The foreign column containing a matching value
"as": "related" //An array containing all items under 69
}
},
{
"$unwind": "$related" //Unwind that array
},
{
"$project": {
"value_id": "$related._id",//project only what you need
"date": "$related.date_data",
"a": "$related.valueA",
"b": "$related.valueB"
}
}
], {"allowDiskUse": true})
Remember a few things
The local field for the lookup doesn't care whether it is indexed, so run it over the collection with the fewest documents
The foreign field works best when it is indexed or is the _id itself
There is an option to specify a pipeline and do some custom filtering while matching; I wouldn't recommend it, as pipelines are ridiculously slow
Don't forget "allowDiskUse" if you are going to aggregate large amounts of data
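If the two yearly collections share the same shape, as in the question, another option is to filter each one to the date range and concatenate the results. This is a different technique from the $lookup shown above. It is a minimal sketch that assumes MongoDB 4.4 or later (where the $unionWith stage is available), that the collections are literally named collectionA and collectionB, and that date_data strings always use the fixed "YYYY-MM-DD HH:MM:SS" format so that string comparison matches chronological order.

```javascript
// Range bounds as strings; this only works because the format sorts lexicographically.
const from = "2016-12-01 00:00:00";
const to = "2017-01-01 23:59:59";

// Stages applied identically to both collections.
const rangeStages = [
  { $unwind: "$values" },
  { $match: { "values.date_data": { $gte: from, $lte: to } } }
];

// Run the stages on collectionA, append the same selection from collectionB, then order by date.
const pipeline = rangeStages.concat([
  { $unionWith: { coll: "collectionB", pipeline: rangeStages } },
  { $sort: { "values.date_data": 1 } }
]);

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.collectionA.aggregate(pipeline, { allowDiskUse: true }).forEach(printjson);
}
```

Storing date_data as real dates rather than strings would make the range match less fragile, at the cost of a migration.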
I have a collection of documents where a "tags" field was switched over from being a space separated list of tags to an array of individual tags. I want to update the previous space-separated fields to all be arrays like the new incoming data.
I'm also having problems with the $type selector because it is applying the type operation to individual array elements, which are strings. So filtering by type just returns everything.
How can I get every document that looks like the first example into the format for the second example?
{
"_id" : ObjectId("12345"),
"tags" : "red blue green white"
}
{
"_id" : ObjectId("54321"),
"tags" : [
"red",
"orange",
"black"
]
}
We can't use the $type operator to filter our documents here because the type of the elements in our array is "string" and as mentioned in the documentation:
When applied to arrays, $type matches any inner element that is of the specified BSON type. For example, when matching for $type : 'array', the document will match if the field has a nested array. It will not return results where the field itself is an array.
But fortunately MongoDB also provides the $exists operator which can be used here with a numeric array index.
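As a minimal sketch of that filter: "tags.0" only exists when tags is an array with at least one element, so requiring it to be absent excludes every document whose tags is already a non-empty array. The extra $type check is my addition, mirroring the "query" option of the mapReduce call later in this answer, and it narrows the result to string values. The string alias "string" for $type assumes MongoDB 3.2 or later; collection is the placeholder name used throughout this answer.

```javascript
// Matches documents whose "tags" is still a plain string rather than an array.
const legacyTagsFilter = {
  "tags.0": { $exists: false }, // no element at index 0, so not a non-empty array
  tags: { $type: "string" }     // the value itself is a string
};

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.collection.find(legacyTagsFilter).forEach(printjson);
}
```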
Now how can we update those documents?
Well, in MongoDB 3.2 and earlier, the only option we have is mapReduce(), but first let's look at the alternative available in the upcoming release of MongoDB.
Starting from MongoDB 3.4, we can $project our documents and use the $split operator to split our string into an array of substrings.
Note that to split only those "tags" that are strings, we need the $cond logical operator. The condition here is $eq, which evaluates to true when the $type of the field is equal to "string". By the way, the $type aggregation operator is new in 3.4.
Finally we can overwrite the old collection using the $out pipeline stage operator. But we need to explicitly specify the inclusion of any other fields in the $project stage.
db.collection.aggregate(
[
{ "$project": {
"tags": {
"$cond": [
{ "$eq": [
{ "$type": "$tags" },
"string"
]},
{ "$split": [ "$tags", " " ] },
"$tags"
]
}
}},
{ "$out": "collection" }
]
)
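If listing every other field in $project is impractical, one alternative is $addFields, which was also introduced in 3.4 and keeps all existing fields while overwriting only the ones you name. This is a minimal sketch under that assumption, with collection as the same placeholder name. Because $out replaces the whole target collection, the shell call is left commented out; trying it on a copy first seems prudent.

```javascript
// Same conditional split as the $project version, but every other field is carried through.
const pipeline = [
  { $addFields: {
      tags: {
        $cond: [
          { $eq: [{ $type: "$tags" }, "string"] }, // only split string values
          { $split: ["$tags", " "] },
          "$tags"                                   // arrays are left untouched
        ]
      }
  } },
  { $out: "collection" }
];

// In the mongo shell (destructive, so not executed here):
// db.collection.aggregate(pipeline);
```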
With mapReduce, we need to use Array.prototype.split() to emit the array of substrings in our map function. We also need to filter our documents using the "query" option. From there we need to iterate the "results" array and $set the new value for "tags" with bulk operations, using the bulkWrite() method new in 3.2 or the now deprecated Bulk() API if we are on 2.6 or 3.0, as shown here.
db.collection.mapReduce(
function() { emit(this._id, this.tags.split(" ")); },
function(key, value) {},
{
"out": { "inline": 1 },
"query": {
"tags.0": { "$exists": false },
"tags": { "$type": 2 }
}
}
)['results']
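The write-back step described above can be sketched as follows. Inline mapReduce results are documents of the form { _id, value }, so each one maps to a single updateOne operation. The helper name toBulkOps is hypothetical, and the sample results array only stands in for the real output of the mapReduce call.

```javascript
// Turn inline mapReduce results into bulkWrite operations (hypothetical helper).
function toBulkOps(results) {
  return results.map(function (r) {
    return {
      updateOne: {
        filter: { _id: r._id },
        update: { $set: { tags: r.value } }
      }
    };
  });
}

// Stand-in for the "results" array returned by mapReduce.
const results = [
  { _id: "12345", value: "red blue green white".split(" ") }
];
const ops = toBulkOps(results);

// In the mongo shell (3.2+), with `results` taken from the real mapReduce output:
// db.collection.bulkWrite(toBulkOps(results));
```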
Given the following MongoDB collection of documents :
{
title : 'shirt one',
tags : [
'shirt',
'cotton',
't-shirt',
'black'
]
},
{
title : 'shirt two',
tags : [
'shirt',
'white',
'button down collar'
]
},
{
title : 'shirt three',
tags : [
'shirt',
'cotton',
'red'
]
},
...
How do you retrieve a list of items matching a list of tags, ordered by the total number of matched tags? For example, given this list of tags as input:
['shirt', 'cotton', 'black']
I'd want to retrieve the items ranked in desc order by total number of matching tags:
item total matches
-------- --------------
Shirt One 3 (matched shirt + cotton + black)
Shirt Three 2 (matched shirt + cotton)
Shirt Two 1 (matched shirt)
In a relational schema, tags would be a separate table, and you could join against that table, count the matches, and order by the count.
But, in Mongo... ?
Seems this approach could work,
break the input tags into multiple "IN" statements
query for items by "OR"'ing together the tag inputs
i.e. where ( 'shirt' IN items.tags ) OR ( 'cotton' IN items.tags )
this would return, for example, three instances of "Shirt One", 2 instances of "Shirt Three", etc
map/reduce that output
map: emit(this._id, {...});
reduce: count total occurrences of _id
finalize: sort by counted total
But I'm not clear on how to implement this as a Mongo query, or if this is even the most efficient approach.
As I answered in "In MongoDB search in an array and sort by number of matches", it's possible using the Aggregation Framework.
Assumptions
tags attribute is a set (no repeated elements)
Query
This approach forces you to unwind the results and re-evaluate the match predicate against the unwound results, so it's really inefficient.
db.test_col.aggregate([
{$match: {tags: {$in: ["shirt","cotton","black"]}}},
{$unwind: "$tags"},
{$match: {tags: {$in: ["shirt","cotton","black"]}}},
{$group: {
_id:{"_id":"$_id"},
matches:{$sum:1}
}},
{$sort:{matches:-1}}
]);
Expected Results
{
"result" : [
{
"_id" : {
"_id" : ObjectId("5051f1786a64bd2c54918b26")
},
"matches" : 3
},
{
"_id" : {
"_id" : ObjectId("5051f1726a64bd2c54918b24")
},
"matches" : 2
},
{
"_id" : {
"_id" : ObjectId("5051f1756a64bd2c54918b25")
},
"matches" : 1
}
],
"ok" : 1
}
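On MongoDB 2.6 or later, one way to avoid the $unwind is to count the overlap directly with $setIntersection and $size. This is a different technique from the query in this answer, offered as a minimal sketch under the same assumption that tags has no repeated elements. The collection name test_col comes from the answer; the helper countMatches is hypothetical and only mirrors the pipeline in plain JavaScript.

```javascript
const searchTags = ["shirt", "cotton", "black"];

// Count matching tags per document without unwinding the array.
const pipeline = [
  { $match: { tags: { $in: searchTags } } },
  { $project: {
      title: 1,
      matches: { $size: { $setIntersection: ["$tags", searchTags] } }
  } },
  { $sort: { matches: -1 } }
];

// Plain-JavaScript mirror of the $size/$setIntersection step (hypothetical helper).
function countMatches(tags, wanted) {
  return tags.filter(function (t) { return wanted.indexOf(t) !== -1; }).length;
}

// The three sample documents from the question.
const shirts = [
  { title: "shirt one", tags: ["shirt", "cotton", "t-shirt", "black"] },
  { title: "shirt two", tags: ["shirt", "white", "button down collar"] },
  { title: "shirt three", tags: ["shirt", "cotton", "red"] }
];
const ranked = shirts
  .map(function (s) { return { title: s.title, matches: countMatches(s.tags, searchTags) }; })
  .sort(function (a, b) { return b.matches - a.matches; });

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.test_col.aggregate(pipeline).forEach(printjson);
}
```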
Right now, it isn't possible to do unless you use MapReduce. The only problem with MapReduce is that it is slow (compared to a normal query).
The aggregation framework is slated for 2.2 (so should be available in 2.1 dev release) and should make this sort of thing much easier to do without MapReduce.
Personally, I do not think using M/R is an efficient way to do it. I would rather query for all the documents and do those calculations on the application side. It is easier and cheaper to scale your app servers than it is to scale your database servers, so let the app servers do the number crunching. Of course, this approach may not work for you given your data access patterns and requirements.
An even simpler approach may be to just include a count property in each of your tag objects and whenever you $push a new tag to the array, you also $inc the count property. This is a common pattern in the MongoDB world, at least until the aggregation framework.
I'll second @Bryan in saying that MapReduce is the only possible way at the moment (and it's far from perfect). But, in case you desperately need it, here you go :-)
var m = function() {
var searchTerms = ['shirt', 'cotton', 'black'];
var me = this;
this.tags.forEach(function(t) {
searchTerms.forEach(function(st) {
if(t == st) {
emit(me._id, {matches : 1});
}
})
})
};
var r = function(k, vals) {
var result = {matches : 0};
vals.forEach(function(v) {
result.matches += v.matches;
})
return result;
};
db.shirts.mapReduce(m, r, {out: 'found01'});
db.found01.find();
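The mapReduce output is not ordered. Each document in found01 has the shape { _id, value: { matches } }, so the ranking asked for in the question can be obtained by sorting on value.matches. A small sketch follows; the sample rows are made up to illustrate the shape, and real _id values would be ObjectIds rather than titles.

```javascript
// Sort specification for the output collection: most matches first.
const sortSpec = { "value.matches": -1 };

// Made-up rows in the shape mapReduce writes to the output collection.
const sampleOutput = [
  { _id: "shirt two", value: { matches: 1 } },
  { _id: "shirt one", value: { matches: 3 } },
  { _id: "shirt three", value: { matches: 2 } }
];

// The same ordering applied in plain JavaScript, for illustration only.
const rankedSample = sampleOutput.slice().sort(function (a, b) {
  return b.value.matches - a.value.matches;
});

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.found01.find().sort(sortSpec).forEach(printjson);
}
```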
I have a document with an array field, similar to this:
{
"_id" : "....",
"Statuses" : [
{ "Type" : 1, "Timestamp" : ISODate(...) },
{ "Type" : 2, "Timestamp" : ISODate(...) },
//Etc. etc.
]
}
How can I update a specific Status item's Timestamp, by specifying its Type value?
From mongodb shell you can do this by
db.your_collection.update(
{ _id: ObjectId("your_objectid"), "Statuses.Type": 1 },
{ $set: { "Statuses.$.Timestamp": "new timestamp" } }
)
so the c# equivalent
var query = Query.And(
Query.EQ("_id", "your_doc_id"),
Query.EQ("Statuses.Type", 1)
);
var result = your_collection.Update(
query,
Update.Set("Statuses.$.Timestamp", "new timestamp"),
UpdateFlags.Multi,
SafeMode.True
);
This will update the specific document; you can remove the _id filter if you want to update the whole collection.
Starting with MongoDB 3.6, the $[<identifier>] positional operator may be used. Unlike the $ positional operator, which updates at most one array element per document, the $[<identifier>] operator will update every matching array element. This is useful for scenarios where a given document may have multiple matching array elements that need to be updated.
db.yourCollection.update(
{ _id: "...." },
{ $set: {"Statuses.$[element].Timestamp": ISODate("2021-06-23T03:47:18.548Z")} },
{ arrayFilters: [{"element.Type": 1}] }
);
The arrayFilters option matches the array elements to update, and $[element] is used within the $set update operator to indicate that only array elements that matched the arrayFilters condition should be updated.
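The difference between the two operators can be illustrated in plain JavaScript. This sketch does not call MongoDB; the helpers setFirstMatch and setAllMatches are hypothetical stand-ins that mimic what $ and $[element] do to a document's Statuses array when two elements share the same Type.

```javascript
// Mimics the `$` positional operator: only the first matching element changes.
function setFirstMatch(statuses, type, timestamp) {
  const copy = statuses.map(function (s) { return Object.assign({}, s); });
  for (let i = 0; i < copy.length; i++) {
    if (copy[i].Type === type) {
      copy[i].Timestamp = timestamp;
      break;
    }
  }
  return copy;
}

// Mimics `$[element]` with arrayFilters: every matching element changes.
function setAllMatches(statuses, type, timestamp) {
  return statuses.map(function (s) {
    return s.Type === type
      ? Object.assign({}, s, { Timestamp: timestamp })
      : Object.assign({}, s);
  });
}

// Two elements share Type 1, which is where the operators differ.
const statuses = [
  { Type: 1, Timestamp: "old" },
  { Type: 2, Timestamp: "old" },
  { Type: 1, Timestamp: "old" }
];
const first = setFirstMatch(statuses, 1, "new");
const all = setAllMatches(statuses, 1, "new");
```

When Type is unique within each Statuses array, both operators touch the same single element and the choice mostly comes down to server version.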