How to create a collection from a query in MongoDB?

I have an existing collection that I run queries on. For further data processing, it would be handy to create subset collections from a query.
I understand that I can use aggregate with $match and $expr to, e.g., $group some values, and at the end use $out to write the results to a new collection.
What I'm stuck on is that I don't want to $group anything; I just want to put the documents that $match finds into a new collection. And not the complete documents with all their fields, only the fields I select, like when you run db[collection].find({...}, {"key1": 1})
where the matching documents come back containing only key1: value1 and not key2: value2, even though key2 also exists in the original collection.
How do I achieve that using aggregate without grouping anything? I read through the documentation and couldn't find another stage that looks suitable.

As I mentioned in the comments, $project is the right stage for this: place it between $match and $out so that only the fields you list are written to the new collection.
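A minimal sketch of that pipeline, assuming hypothetical collection names source and subset and hypothetical fields status, key1 and key2. In the mongo shell you would pass the pipeline array to db.source.aggregate(pipeline); because that needs a running server, the block below instead applies the same three stages to an in-memory array. The helper runPipeline is purely illustrative: it only understands equality filters in $match and inclusion-style $project, nothing like the server's full semantics.

```javascript
// Pipeline as it would be passed to db.source.aggregate(pipeline) in the shell.
// Collection and field names are hypothetical.
const pipeline = [
  { $match: { status: "active" } },  // keep only the matching documents
  { $project: { _id: 0, key1: 1 } }, // keep key1, drop every other field
  { $out: "subset" },                // write the result to a new collection
];

// Illustrative in-memory stand-in for the server: equality-only $match,
// inclusion-only $project, and an $out that stores the result under `database`.
function runPipeline(docs, stages, database) {
  let current = docs;
  for (const stage of stages) {
    if (stage.$match) {
      current = current.filter((d) =>
        Object.entries(stage.$match).every(([k, v]) => d[k] === v));
    } else if (stage.$project) {
      current = current.map((d) => {
        const out = {};
        if (stage.$project._id !== 0) out._id = d._id;
        for (const [k, v] of Object.entries(stage.$project)) {
          if (k !== "_id" && v === 1 && k in d) out[k] = d[k];
        }
        return out;
      });
    } else if (stage.$out) {
      database[stage.$out] = current; // replaces any existing collection
      current = [];                   // the aggregate call returns no documents
    }
  }
  return current;
}

const db = {
  source: [
    { _id: 1, status: "active", key1: "a", key2: "x" },
    { _id: 2, status: "closed", key1: "b", key2: "y" },
    { _id: 3, status: "active", key1: "c", key2: "z" },
  ],
};

runPipeline(db.source, pipeline, db);
console.log(JSON.stringify(db.subset)); // [{"key1":"a"},{"key1":"c"}]
```

On a real server, $out likewise replaces the target collection if it already exists, so rerunning the pipeline refreshes the subset rather than appending to it.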

Related

How to project in MongoDB after sort?

In a find operation fields can be excluded, but what if I want to do a find, then a sort, and only after that the projection? Do you know any trick or operation for it?
doc: fields {Object}, the fields to return in the query. Object of fields to include or exclude (not both), {'a':1}
You can run an ordinary find query with conditions, a projection, and a sort. I think you want to sort on a field that you don't want to project. Don't worry about that: you can sort on a field even if the projection leaves it out.
What you can't do is set the sort field to 0 alongside other fields set to 1. A projection cannot mix inclusion and exclusion (_id is the only exception), so that find query is rejected.
// This query works: someOtherField is not projected but can still be sorted on
db.collection.find(
{_id:'someId'},
{'someField':1})
.sort({'someOtherField':1})
// This query fails: the projection mixes inclusion (1) and exclusion (0)
db.collection.find(
{_id:'someId'},
{'someField':1,'someOtherField':0})
.sort({'someOtherField':1})
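To make the rule concrete, the sketch below classifies a projection document roughly the way the server does before running find(). projectionKind is a hypothetical helper that covers only 1/0 values; it shows why {'someField':1,'someOtherField':0} is rejected, while an exclusion-only projection such as {'someOtherField':0} would still let you sort on someOtherField.

```javascript
// Rough model of the check applied to a find() projection: fields set to 1
// (include) and fields set to 0 (exclude) cannot be combined, except that
// _id may be excluded from an inclusion projection. Only 1/0 values handled.
function projectionKind(spec) {
  const others = Object.entries(spec).filter(([field]) => field !== "_id");
  const includes = others.some(([, v]) => v === 1);
  const excludes = others.some(([, v]) => v === 0);
  if (includes && excludes) {
    // The server rejects such a query; its actual error text differs.
    throw new Error("projection mixes inclusion and exclusion");
  }
  if (includes) return "inclusion";
  if (excludes) return "exclusion";
  return spec._id === 1 ? "inclusion" : "exclusion"; // only _id, or empty spec
}

console.log(projectionKind({ someField: 1 }));      // inclusion
console.log(projectionKind({ someOtherField: 0 })); // exclusion
try {
  projectionKind({ someField: 1, someOtherField: 0 });
} catch (e) {
  console.log(e.message); // projection mixes inclusion and exclusion
}
```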
If you still don't get the required results, look into the MongoDB Aggregation Framework.
Here is a sample aggregation query for your requirement:
db.collection.aggregate([
{$match: {_id:'someId'}},
{$sort: {someField:1}},
{$project: {_id:1,someOtherField:1}},
])
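The same idea as a runnable sketch: the stages below mirror the sample query, except that $match uses a hypothetical group field (so that more than one document reaches the sort), and the hypothetical applyStage helper runs them on plain arrays instead of a server. It shows that someField can drive the sort even though $project drops it afterwards.

```javascript
// Stages mirroring the sample query; "group" is a hypothetical field used
// in place of _id so that more than one document reaches $sort.
const pipeline = [
  { $match: { group: "g1" } },
  { $sort: { someField: 1 } },
  { $project: { _id: 1, someOtherField: 1 } },
];

// Illustrative stand-in for the server: equality-only $match,
// single-key $sort, and inclusion-only $project.
function applyStage(docs, stage) {
  if (stage.$match) {
    return docs.filter((d) =>
      Object.entries(stage.$match).every(([k, v]) => d[k] === v));
  }
  if (stage.$sort) {
    const [[key, dir]] = Object.entries(stage.$sort);
    return [...docs].sort((a, b) => (a[key] < b[key] ? -dir : a[key] > b[key] ? dir : 0));
  }
  if (stage.$project) {
    return docs.map((d) => {
      const out = {};
      if (stage.$project._id !== 0) out._id = d._id;
      for (const [k, v] of Object.entries(stage.$project)) {
        if (k !== "_id" && v === 1 && k in d) out[k] = d[k];
      }
      return out;
    });
  }
  throw new Error("stage not supported by this sketch");
}

const docs = [
  { _id: 1, group: "g1", someField: 30, someOtherField: "c" },
  { _id: 2, group: "g2", someField: 10, someOtherField: "x" },
  { _id: 3, group: "g1", someField: 20, someOtherField: "a" },
  { _id: 4, group: "g1", someField: 10, someOtherField: "b" },
];

// The sort uses someField, which $project then removes.
const result = pipeline.reduce(applyStage, docs);
console.log(JSON.stringify(result));
// [{"_id":4,"someOtherField":"b"},{"_id":3,"someOtherField":"a"},{"_id":1,"someOtherField":"c"}]
```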

MongoDB Query Nested Array Search

I need to query documents in MongoDB that contain nested arrays. I see a lot of examples using the simple $in operator. The only problem is that I strictly need to check for proper subsets.
Consider the following document.
{data: [[1,2,3], [4,5,6]]}
The query needs to return documents containing all of [1,2,3], where 1, 2 and 3 can be in any order. That rules out the following query, because it only matches the elements in exactly that order.
{data:{$elemMatch:{$all:[[1,2,3]]}}}
I've also tried nested $elemMatch operators without success, because $in returns the document even if only one element matches, as in the following:
{data:{$elemMatch:{$elemMatch:{$in:[1,4]}}}}
Not sure what your actual query looks like, but this should do what you need:
db.documentDto.find({"some_field":{"$elemMatch":{"$in":[1,2,3]}} })
I haven't got a complete answer (and not much time, as it's late here), but I would consider:
Using the aggregation pipeline instead of a query, if you're not already
Using the $unwind operator to deconstruct your nested arrays
Using $sort to sort the contents of the arrays, so that you can compare them
Using $match to filter out the arrays that don't contain the subset values, which you can now check based on order
Using $group to group the results back together by _id
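The condition the question asks for can be pinned down with a plain JavaScript predicate over sample data: a document matches when at least one inner array contains every target value, in any order. This is only a client-side model of the semantics, not a MongoDB query; hasInnerSuperset and the two extra sample documents are hypothetical. On the server, a query of the form {data: {$elemMatch: {$all: [1, 2, 3]}}} should express the same per-element test, but treat that as an assumption to verify against your own data and server version.

```javascript
// Sample documents: the one from the question plus two hypothetical ones.
const docs = [
  { _id: 1, data: [[1, 2, 3], [4, 5, 6]] },
  { _id: 2, data: [[3, 1, 2, 9]] },    // 1, 2, 3 present, different order
  { _id: 3, data: [[1, 4], [2, 3]] },  // values split across inner arrays
];

// True if at least one inner array contains every value in target,
// regardless of order or of extra values in that inner array.
function hasInnerSuperset(doc, target) {
  return doc.data.some((inner) => target.every((v) => inner.includes(v)));
}

const matching = docs.filter((d) => hasInnerSuperset(d, [1, 2, 3])).map((d) => d._id);
console.log(JSON.stringify(matching)); // [1,2]
```

Document 3 is the case that an $in-based query wrongly accepts: every target value appears somewhere, but never all within one inner array.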
Ref:
http://docs.mongodb.org/manual/reference/operator/aggregation-pipeline/ will give you info on each of the above.
From a quick search I came up with a similar question/example that might be helpful: Mongodb sort inner array

Is it possible to get a slice of a slice in Mongo?

I'm querying a mongo collection that has a field that is an array of arrays. I want to find a record with a projection of one deep value out of the array of arrays. Conceptually, this is a $slice of a $slice. Is there a way to do this in Mongo?
For example - I have a record:
{
name: "foo",
text: [["part 1:1", "part 1:2"],["part 2:1","part 2:2"]]
}
and want to select the record, projecting only "part 2:2".
db.collection.find({"name":"foo"},{text: {$slice: [1,1]}})
gives me the array with both "part 2:1" and "part 2:2". How do I get just "part 2:2"?
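A small plain JavaScript model of why this happens: the find() projection {$slice: [skip, limit]} keeps limit elements starting at index skip, and it applies only to the top-level array. sliceProjection is a hypothetical helper and ignores negative skip values.

```javascript
// Sample document from the question.
const doc = { name: "foo", text: [["part 1:1", "part 1:2"], ["part 2:1", "part 2:2"]] };

// Rough model of the find() projection {text: {$slice: [skip, limit]}}:
// keep `limit` elements starting at index `skip` of the top-level array.
// Negative skip values (counting from the end) are not handled here.
function sliceProjection(arr, [skip, limit]) {
  return arr.slice(skip, skip + limit);
}

const projected = sliceProjection(doc.text, [1, 1]);
console.log(JSON.stringify(projected)); // [["part 2:1","part 2:2"]]

// Reaching "part 2:2" needs a second index into the inner array,
// which a single find() projection cannot express.
console.log(projected[0][1]); // part 2:2
```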
You need to use the aggregation pipeline to chain one $slice after another, because of the limitations of the projection that is part of a find query.
The query below is invalid because the inner $slice would return an array instead of an index, so the outer $slice fails.
db.collection.find({"name":"foo"},{text: {$slice:[{$slice: [1,1]}]}})
Moreover, there is no way to operate on a projected field within the same projection; if there were, we could have modified text further by applying another $slice to it.
The way to go would be:
Match the record with the name as foo.
Unwind the text array to get to the first level.
Unwind again to get to the level that we want.
Group the records together by name.
Project the last record in the group, which is also the last element of the last nested array.
The Code:
db.collection.aggregate([
{$match:{"name":"foo"}},
{$unwind:"$text"},
{$unwind:"$text"},
{$group:{"_id":"$name","text":{$last:"$text"}}},
{$project:{"name":"$_id","text":1}}
])
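To see what the two $unwind stages and the $group stage do to the sample document, here is a plain JavaScript sketch that reproduces them; unwind is a hypothetical helper modelled on $unwind for array fields only, and the loop stands in for {$group: {_id: "$name", text: {$last: "$text"}}}.

```javascript
const doc = { name: "foo", text: [["part 1:1", "part 1:2"], ["part 2:1", "part 2:2"]] };

// Model of $unwind on an array field: one output document per element,
// each a copy of the input with the field replaced by that element.
function unwind(docs, field) {
  return docs.flatMap((d) => d[field].map((v) => ({ ...d, [field]: v })));
}

const once = unwind([doc], "text");  // 2 docs: text is each inner array
const twice = unwind(once, "text");  // 4 docs: text is each string

// Stand-in for {$group: {_id: "$name", text: {$last: "$text"}}}:
// later documents overwrite earlier ones, so the last value per name wins.
const grouped = {};
for (const d of twice) grouped[d.name] = d.text;

console.log(JSON.stringify(grouped)); // {"foo":"part 2:2"}
```

$last depends on the order in which documents reach $group, which here is simply the unwind order. On MongoDB 3.2 and later, the $arrayElemAt expression can likely pick the value without unwinding, e.g. {$project: {text: {$arrayElemAt: [{$arrayElemAt: ["$text", 1]}, 1]}}}; verify that against your server version.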
Or, if you want to project the element at a particular position, you can use the $skip and $limit stages to achieve this.
var orderOfElement = 2;
db.collection.aggregate([
{$match:{"name":"foo"}},
{$unwind:"$text"},
{$unwind:"$text"},
{$skip:orderOfElement -1},
{$limit:1}
])
Which projects the second element in order in the nested arrays.
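A plain JavaScript sketch of what the $skip/$limit pipeline returns for the sample document, assuming the unwound documents arrive in array order. It makes explicit that the position is counted across all inner arrays, so orderOfElement = 2 yields "part 1:2", and orderOfElement = 4 would be needed for "part 2:2".

```javascript
const doc = { name: "foo", text: [["part 1:1", "part 1:2"], ["part 2:1", "part 2:2"]] };
const orderOfElement = 2;

// Two $unwind stages emit the strings in this order, one per document.
const unwound = doc.text.flat();

// {$skip: orderOfElement - 1} followed by {$limit: 1}
const skip = orderOfElement - 1;
const picked = unwound.slice(skip, skip + 1);

console.log(JSON.stringify(picked)); // ["part 1:2"]
```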
If you just want the specific value "part 2:2" in the shell, the query below will help.
db.collection.find({"name":"foo"},{_id:0,name:0})[0].text[1][1];
part 2:2

difference between aggregate ($match) and find, in MongoDB?

What is the difference between the $match operator used inside the aggregate function and the regular find in Mongodb?
Why doesn't the find function allow renaming the field names like the aggregate function?
e.g. In aggregate we can pass the following string:
{ "$project" : { "OrderNumber" : "$PurchaseOrder.OrderNumber" , "ShipDate" : "$PurchaseOrder.ShipDate"}}
Whereas find does not allow this.
Why doesn't the aggregate output come back as a DBCursor or a List? And why can't we get a count of the documents that are returned?
Thank you.
Why doesn't the aggregate output come back as a DBCursor or a List?
The aggregation framework was created to solve easy problems that otherwise would require map-reduce.
This framework is commonly used to compute data that requires the full database as input and produces few documents as output.
What is the difference between the $match operator used inside the aggregate function and the regular find in Mongodb?
One of the differences, as you stated, is the return type: find operations return a DBCursor.
Other differences:
The aggregation result must be under 16 MB, because it is returned as a single document. If you are using shards, the full data must be collected at a single point after the first $group or $sort.
$match's main purpose is to make aggregation more powerful (filtering at any stage of the pipeline), but it has other uses, such as improving aggregation performance when it is placed early in the pipeline.
And why can't we get a count of the documents that are returned?
You can. Just count the elements in the resulting array, or add the following stage to the end of the pipeline:
{$group: {_id: null, count: {$sum: 1}}}
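A plain JavaScript sketch of both options on hypothetical result documents: counting the array on the client, and what the trailing $group stage computes. groupCount is an illustrative helper, not a driver function.

```javascript
// Hypothetical documents standing in for the pipeline's output.
const results = [{ _id: "a" }, { _id: "b" }, { _id: "c" }];

// Option 1: count on the client by taking the array length.
const clientCount = results.length;

// Option 2: what a trailing {$group: {_id: null, count: {$sum: 1}}} computes.
// Every document lands in the single group keyed null and adds 1 to count.
// Like the real stage, an empty input produces no output document at all.
function groupCount(docs) {
  if (docs.length === 0) return [];
  return [{ _id: null, count: docs.reduce((n) => n + 1, 0) }];
}

console.log(clientCount, JSON.stringify(groupCount(results))); // 3 [{"_id":null,"count":3}]
```

The empty-input case is worth remembering: code that reads result[0].count after such a $group needs to handle an empty result.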
Why doesn't the find function allow renaming the field names like the aggregate function?
MongoDB is young and features are still coming. Maybe in a future version we'll be able to do that. Renaming fields is more critical in aggregation than in find.
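To make the renaming in the question concrete: in $project, a string value that starts with $ is a field path read from the input document, and the key it is assigned to becomes the new field name. The sketch below models only that case in plain JavaScript; resolvePath and project are hypothetical helpers, and the sample order document is made up.

```javascript
// Projection spec from the question: new field name -> "$path" reference.
const projectSpec = {
  OrderNumber: "$PurchaseOrder.OrderNumber",
  ShipDate: "$PurchaseOrder.ShipDate",
};

// Follow a dotted path such as "PurchaseOrder.OrderNumber" into nested objects.
function resolvePath(doc, path) {
  return path.split(".").reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

// Rough model of $project for specs whose values are all "$path" strings;
// _id is kept, as $project does unless it is set to 0.
function project(doc, spec) {
  const out = { _id: doc._id };
  for (const [newName, ref] of Object.entries(spec)) {
    out[newName] = resolvePath(doc, ref.slice(1)); // strip the leading "$"
  }
  return out;
}

// Hypothetical input document.
const order = { _id: 7, PurchaseOrder: { OrderNumber: "PO-1", ShipDate: "2013-05-01" } };
console.log(JSON.stringify(project(order, projectSpec)));
// {"_id":7,"OrderNumber":"PO-1","ShipDate":"2013-05-01"}
```

As far as I know, MongoDB 4.4 and later also accept aggregation expressions like these in find() projections; check the documentation for your server version.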
EDIT (2014/02/26):
MongoDB 2.6 aggregation operations will return a cursor.
EDIT (2014/04/09):
MongoDB 2.6 was released with the predicted aggregation changes.
I investigated a few things about the aggregate and find calls.
I did this with a descending sort on a collection of 160k documents and limited my output to a few documents.
1. The aggregate command is slower than the find command.
2. When you access the data (e.g. with ToList()), the aggregate command is faster than find.
3. If you look at the total times (points 1 + 2), the commands seem to be about equal.
Maybe aggregate automatically does the equivalent of ToList(), so it doesn't have to be done again. If you don't call ToList() afterwards, the find() call will be much faster.
7 ms vs 50 ms (5 documents)

how can I manipulate the value field of MapReduce?

When using MapReduce, each resulting document 'result' is structured like this:
{ '_id' : 123, 'value' : {'sum_donations' : 999, 'nbr_visitors' : 50 } }
I could access _id and value field by using:
db.result.find() OR db.result.find({},{_id:1, value:1})
Is there a way to select _id and sum_donations without selecting the nbr_visitors? Something like this:
{'_id': 123, 'sum_donations': 999}
Or should I just create another MapReduce function that return that for me?
I was thinking about having one MapReduce Collection and manipulate it to answer different questions.
I tried
db.result.find({},{_id:1, value.sum_donations:1}) but it didn't work.
There are two problems to doing this:
The value field of the MR output cannot currently be shaped from within the MR itself. There is a JIRA ticket for it, but it's not exactly on the "list": https://jira.mongodb.org/browse/SERVER-2517
The query language of Mongo cannot automatically project your fields to the top level document. Subdocument fields stay in the subdocument.
You could (if you're using MongoDB 2.2) use the aggregation framework here with the $project operator, but I believe this is overkill and would slow down your system and your program.
So the best way to do this at the moment is to extend your code to grab the field out of that subdocument. Simply coding around it is probably the most performant, direct, and easiest method for now.
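A minimal sketch of coding around it in plain JavaScript: fetch the documents and flatten them on the client. In the shell, a dotted key in a projection has to be quoted, e.g. db.result.find({}, {"value.sum_donations": 1}), and even then the field comes back nested under value. pickDonations is a hypothetical helper, and the second sample document is made up.

```javascript
// Documents shaped like the MapReduce output in the question (sample values).
const mrResults = [
  { _id: 123, value: { sum_donations: 999, nbr_visitors: 50 } },
  { _id: 124, value: { sum_donations: 10, nbr_visitors: 3 } },
];

// Lift the one nested field to the top level on the client side,
// dropping nbr_visitors.
function pickDonations(docs) {
  return docs.map(({ _id, value }) => ({ _id, sum_donations: value.sum_donations }));
}

console.log(JSON.stringify(pickDonations(mrResults)));
// [{"_id":123,"sum_donations":999},{"_id":124,"sum_donations":10}]
```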