Save Subset of MongoDB Collection to Another Collection - mongodb

I have a collection like so:
{date: 20120101}
{date: 20120103}
{date: 20120104}
{date: 20120005}
{date: 20120105}
How do I save a subset of those documents with the date '20120105' to another collection?
i.e. db.subset.save(db.full_set.find({date: "20120105"}));

I would advise using the aggregation framework:
db.full_set.aggregate([ { $match: { date: "20120105" } }, { $out: "subset" } ])
It runs about 100 times faster than forEach, at least in my case. This is because the entire aggregation pipeline runs inside the mongod process, whereas a solution based on find() and insert() has to send all of the documents from the server to the client and then back again. Even if the server and client are on the same machine, that round trip carries a performance penalty.

Here's the shell version:
db.full_set.find({ date: "20120105" }).forEach(function (doc) {
    db.subset.insert(doc);
});
Note: As of MongoDB 2.6, the aggregation framework makes it possible to do this faster; see @melan's answer for details.

Actually, there is an equivalent of SQL's insert into ... select from in MongoDB: first you convert the matching documents into an array of documents, then you insert that array into the target collection:
db.subset.insert(db.full_set.find({date:"20120105"}).toArray())
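On MongoDB 3.2+ shells, insertMany is the idiomatic spelling of the same idea. A minimal sketch; note that toArray() pulls the whole result set into client memory, so this only suits subsets that fit in RAM:
// Bulk-insert the matching documents in one round trip (3.2+).
db.subset.insertMany(db.full_set.find({ date: "20120105" }).toArray())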

The most general solution is this:
Make use of aggregation (per @melan's answer):
db.full_set.aggregate({$match:{your query here...}},{$out:"sample"})
db.sample.copyTo("subset")
This works even when "subset" already contains documents before the operation: the "old" documents are preserved and the new subset is inserted alongside them.
Take care, though: copyTo() replaces documents that have the same _id.
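Note also that copyTo() was deprecated and then removed in MongoDB 4.2, so on newer servers you'd get the same "append, replacing documents with the same _id" behaviour with $merge instead. A sketch, reusing the "sample" and "subset" names from above:
// 4.2+ replacement for copyTo(): append "sample" into "subset",
// overwriting any documents that share an _id.
db.sample.aggregate([
    { $merge: { into: "subset", on: "_id", whenMatched: "replace", whenNotMatched: "insert" } }
])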

There's no direct equivalent of SQL's insert into ... select from ....
You have to take care of it yourself. Fetch documents of interest and save them to another collection.
You can do it in the shell, but I'd use a small external script in Ruby. Something like this:
# Uses the legacy (pre-2.x) mongo Ruby driver API.
require 'mongo'

db     = Mongo::Connection.new.db('mydb')
source = db.collection('source_collection')
target = db.collection('target_collection')

# Copy each matching document into the target collection.
source.find(date: "20120105").each do |doc|
  target.insert doc
end

MongoDB's aggregate() supports the $out operator, which lets you save the subset to a new collection. Here are the details:
$out takes the documents returned by the aggregation pipeline and writes them to a specified collection.
The $out operation creates a new collection in the current database if one does not already exist.
The collection is not visible until the aggregation completes.
If the aggregation fails, MongoDB does not create the collection.
Syntax:
{ $out: "<output-collection>" }
Example
A collection books contains the following documents:
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 }
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 }
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 }
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
The following aggregation operation pivots the data in the books collection to have titles grouped by authors and then writes the results to the authors collection.
db.books.aggregate( [
{ $group : { _id : "$author", books: { $push: "$title" } } },
{ $out : "authors" }
] )
After the operation, the authors collection contains the following documents:
{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }
{ "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }
For the question as asked, use the following query and you will get a new collection named 'col_20120105' in your database:
db.full_set.aggregate([
{ $match : { date : "20120105" } },
{ $out : "col_20120105" }
]);

You can also use the $merge aggregation pipeline stage (available since MongoDB 4.2).
db.full_set.aggregate([
{$match: {...}},
{ $merge: {
into: { db: 'your_db', coll: 'your_another_collection' },
on: '_id',
whenMatched: 'keepExisting',
whenNotMatched: 'insert'
}}
])
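For the original question's data, a concrete version of this (with the {...} placeholder filled in by the question's filter) might look like:
// Copy the 20120105 documents into "subset", keeping any existing
// documents there that share an _id.
db.full_set.aggregate([
    { $match: { date: "20120105" } },
    { $merge: { into: "subset", on: "_id", whenMatched: "keepExisting", whenNotMatched: "insert" } }
])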

Related

MongoDB Sorting: Equivalent Aggregation Query

I have the following students collection:
{ "_id" : ObjectId("5f282eb2c5891296d8824130"), "name" : "Rajib", "mark" : "1000" }
{ "_id" : ObjectId("5f282eb2c5891296d8824131"), "name" : "Rahul", "mark" : "1200" }
{ "_id" : ObjectId("5f282eb2c5891296d8824132"), "name" : "Manoj", "mark" : "1000" }
{ "_id" : ObjectId("5f282eb2c5891296d8824133"), "name" : "Saroj", "mark" : "1400" }
My requirement is to sort the collection on the 'mark' field in descending order, but the 'mark' field should not appear in the final result. The result should come out as:
{ "name" : "Saroj" }
{ "name" : "Rahul" }
{ "name" : "Rajib" }
{ "name" : "Manoj" }
I tried the following query and it works fine.
db.students.find({},{"_id":0,"name":1}).sort({"mark":-1})
My MongoDB version is v4.2.8. Now the question is: what is the equivalent aggregation query? I tried the following two queries, but neither gave me the desired result.
db.students.aggregate([{"$project":{"name":1,"_id":0}},{"$sort":{"mark":-1}}])
db.students.aggregate([{"$project":{"name":1,"_id":0,"mark":1}},{"$sort":{"mark":-1}}])
Why does it work in find()?
As per Cursor.Sort: when a set of results is both sorted and projected, the MongoDB query engine will always apply the sorting first.
Why doesn't it work in aggregate()?
As per Aggregation Pipeline: the pipeline consists of stages, and each stage transforms the documents as they pass through. Stages do not need to produce one output document for every input document; some stages may generate new documents or filter out documents.
What you need to correct:
Swap the pipeline order: if the mark field is not selected in $project, it is no longer available to later stages, so a subsequent $sort on it has no effect.
db.students.aggregate([
{ "$sort": { "mark": -1 } },
{ "$project": { "name": 1, "_id": 0 } }
])
Playground: https://mongoplayground.net/p/xtgGl8AReeH
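Since you're on v4.2.8, an alternative sketch of the same pipeline uses a $unset stage (new in 4.2) instead of $project: sort first, then drop the fields you don't want:
// $unset removes the listed fields after sorting, leaving only "name".
db.students.aggregate([
    { "$sort": { "mark": -1 } },
    { "$unset": ["_id", "mark"] }
])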

MongoDB get all embedded documents where condition is met

I did this in my mongodb:
db.teams.insert({name:"Alpha team",employees:[{name:"john"},{name:"david"}]});
db.teams.insert({name:"True team",employees:[{name:"oliver"},{name:"sam"}]});
db.teams.insert({name:"Blue team",employees:[{name:"jane"},{name:"raji"}]});
db.teams.find({"employees.name":/.*o.*/});
But what I got was:
{ "_id" : ObjectId("5ddf3ca83c182cc5354a15dd"), "name" : "Alpha team", "employees" : [ { "name" : "john" }, { "name" : "david" } ] }
{ "_id" : ObjectId("5ddf3ca93c182cc5354a15de"), "name" : "True team", "employees" : [ { "name" : "oliver" }, { "name" : "sam" } ] }
But what I really want is
[{"name":"john"},{"name":"oliver"}]
I'm having a hard time finding examples of this that don't use some kind of programmatic iterator/loop, and the examples I do find return the parent document, which means I'd have to parse out the embedded employees array and do some kind of UNION.
E.g.:
How to get embedded document in mongodb?
Retrieve only the queried element in an object array in MongoDB collection
Can someone point me in the right direction?
Add a projection to filter out the fields you don't need; see the MongoDB projection documentation.
Your find query should be constructed with the projection parameter, like below:
db.teams.find({"employees.name":/.*o.*/}, {_id:0, "employees.name": 1});
This will return the matching parent documents with the employees arrays trimmed to just the name fields:
{ "employees" : [ { "name" : "john" }, { "name" : "david" } ] }
{ "employees" : [ { "name" : "oliver" }, { "name" : "sam" } ] }
Note that the projection keeps every employee of each matching team, not only the ones whose name matched.
Can be solved with a simple aggregation pipeline.
db.teams.aggregate([
{$unwind : "$employees"},
{$match : {"employees.name":/.*o.*/}},
])
EDIT:
OP wants to skip the parent fields. Modified query:
db.teams.aggregate([
{$unwind : "$employees"},
{$match : {"employees.name":/.*o.*/}},
{$project : {"name":"$employees.name",_id:0}}
])
Output:
{ "name" : "john" }
{ "name" : "oliver" }

MongoDB 4.0 aggregation addFields not saving documents after using toDate

I have the following documents,
{
"_id" : ObjectId("5b85312981c1634f59751604"),
"date" : "0"
},
{
"_id" : ObjectId("5b85312981c1634f59751604"),
"date" : "20180330"
},
{
"_id" : ObjectId("5b85312981c1634f59751604"),
"date" : "20180402"
},
{
"_id" : ObjectId("5b85312981c1634f59751604"),
"date" : "20180323"
},
I tried to convert date to ISODate using $toDate in aggregation,
db.documents.aggregate( [ { "$addFields": { "received_date": { "$cond": [ {"$ne": ["$date", "0"] }, {"$toDate": "$date"}, new Date("1970-01-01") ] } } } ] )
The query executed fine, but when I run
db.documents.find({})
to examine all the documents, nothing has changed. I am wondering how to fix this. I am using MongoDB 4.0.6 on Linux Mint 19.1 x64.
As mentioned in the comments, aggregate doesn't update documents in the database directly; it only produces a transformed output of them.
If you'd like to permanently add a new field to documents via aggregation (aka update the documents in the database), use the following .forEach/.updateOne method:
Your example:
db.documents
.aggregate([{"$addFields":{"received_date":{"$cond":[{"$ne":["$date","0"]}, {"$toDate": "$date"}, new Date("1970-01-01")]}}}])
.forEach(function (x){db.documents.updateOne({_id: x._id}, {$set: {"received_date": x.received_date}})})
Since _id's value is an ObjectId(), there may be a slight modification you need to make to {_id: x._id}. If there is, let me know and I'll update it!
Another example:
db.users.find().pretty()
{ "_id" : ObjectId("5acb81b53306361018814849"), "name" : "A", "age" : 1 }
{ "_id" : ObjectId("5acb81b5330636101881484a"), "name" : "B", "age" : 2 }
{ "_id" : ObjectId("5acb81b5330636101881484b"), "name" : "C", "age" : 3 }
db.users
.aggregate([{$addFields:{totalAge:{$sum:"$age"}}}])
.forEach(function (x){db.users.updateOne({name: x.name}, {$set: {totalAge: x.totalAge}})})
Being able to update collections via the aggregation pipeline seems quite valuable, given what aggregation lets you do (e.g. what you did in your question, calculations based on other fields within the document, and so on). I'm newer to MongoDB, so maybe updating collections via the aggregation pipeline is "bad practice", but it works and it's been quite valuable for me. I wonder why it isn't more straightforward to do?
Note: I came up with this method after discovering Nazo's now-deprecated .save() method. Shoutout to Nazo!
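For what it's worth, MongoDB 4.2+ addresses this directly: updateMany() accepts an aggregation pipeline as the update document, so the whole operation runs server-side in one command. It won't help on the asker's 4.0.6, but on newer servers a sketch of the question's example looks like:
// MongoDB 4.2+: update with an aggregation pipeline, no client round trip.
db.documents.updateMany({}, [
    { $set: { received_date: {
        $cond: [ { $ne: [ "$date", "0" ] }, { $toDate: "$date" }, new Date("1970-01-01") ]
    } } }
])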

MongoDB - simple sub query example

Given the data:
> db.parameters.find({})
{ "_id" : ObjectId("56cac0cd0b5a1ffab1bd6c12"), "name" : "Speed", "groups" : [ "
123", "234" ] }
> db.groups.find({})
{ "_id" : "123", "name" : "Group01" }
{ "_id" : "234", "name" : "Group02" }
{ "_id" : "567", "name" : "Group03" }
I would like to supply an _id and have a query return all groups that appear in the groups array of the given parameters document.
The straightforward solution seems to be making several DB calls in PyMongo:
Get parameter from parameters table based on the supplied _id
For each element of groups array select a document from groups collection
But this has a lot of unnecessary overhead. I feel there must be a better, faster way to do this within MongoDB (without running custom JS in the DB). Or should I restructure my data by normalising it a little (like a table of relationships), neglecting the document-based approach?
Again, please help me find a solution that works from the PyMongo interface.
You can do this within a single query using the aggregation framework. In particular you'd need to run an aggregation pipeline that uses the $lookup operator to do a left join from the parameters collection to the groups collection.
Consider running the following pipeline:
db.parameters.aggregate([
{ "$unwind": "$groups" },
{
"$lookup": {
"from": "groups",
"localField": "groups",
"foreignField": "_id",
"as": "grp"
}
},
{ "$unwind": "$grp" }
])
Sample Output
/* 1 */
{
"_id" : ObjectId("56cac0cd0b5a1ffab1bd6c12"),
"name" : "Speed",
"groups" : "123",
"grp" : {
"_id" : "123",
"name" : "Group01"
}
}
/* 2 */
{
"_id" : ObjectId("56cac0cd0b5a1ffab1bd6c12"),
"name" : "Speed",
"groups" : "234",
"grp" : {
"_id" : "234",
"name" : "Group02"
}
}
If your MongoDB server version does not support the $lookup pipeline operator, then you'd need to execute two queries, as follows:
from bson.objectid import ObjectId

# get the group ids from the parameters document
ids = db.parameters.find_one({ "_id": ObjectId("56cac0cd0b5a1ffab1bd6c12") })["groups"]
# query the groups collection with the ids from the previous query
db.groups.find({ "_id": { "$in": ids } })
EDIT: matched the field name in the aggregation query to the field name in example dataset (within the question)

Mongodb : count array values with mapreduce / aggregation

I have documents with the following structure:
{
"name" : "John",
"items" : [
{"key1" : "value1"},
{"key1" : "value1"}
]
}
I have built a simple function to count the total number of items:
var count = 0;
db.collection.find({},{items:1}).limit(10000).forEach(
function (doc) {
if(doc.items){
count += doc.items.length;
}
}
)
print(count);
But after ~1 million items my function breaks and the Mongo shell exits. I've looked at the new aggregation framework as well as map-reduce, and I'm not sure which would be the best to use for a simple count like this.
Suggestions welcome! Thanks.
It becomes very easy when you use aggregation: http://docs.mongodb.org/manual/core/aggregation-pipeline/
db.collection.aggregate([
    { $unwind : "$items" },
    { $group : { _id : null, items_count : { $sum : 1 } } }
])
To return the count of items for each document instead, group on _id:
{ $group : { _id : "$_id", items_count : { $sum : 1 } } }
You can store the length of doc.items as a field on the document itself. This introduces some redundancy on disk, but it is a fast and easy way to deal with large collections.
{
"name" : "John",
"itemsLength" : 2,
"items" : [
{"key1" : "value1"},
{"key1" : "value1"}
]
}
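To keep such a counter from drifting, bump it in the same update that modifies the array; a minimal sketch:
// Push a new item and increment the stored counter atomically.
db.collection.updateOne(
    { name : "John" },
    { $push : { items : { key1 : "value1" } }, $inc : { itemsLength : 1 } }
)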
Another option may be map-reduce but, I think, without sharding map-reduce would be slow.