I would like to create a filtering query for one of my collections in MongoDB. Basically, I want to retrieve every document in my collection but exclude some fields from all of them. The MongoDB documentation shows something like this:
db.users.find({}, {thumbnail:0});
But I would like to do more: I want to exclude three different fields, something more like this:
db.users.find({}, {thumbnail: 0, a: 0, b: 0});
The problem is that this does not work. I keep receiving those fields in the query results.
I also tried something like this:
db.users.find({}, {{thumbnail: 0}, {a: 0}, {b: 0}});
But MongoDB doesn't even accept that syntax.
Can anyone help me?
As I wrote in the comments on the question, I discovered that the person who generated the collection gave me the wrong information about the data structure. The documents actually look something like this: {_id: ..., "1" : {a : "a", b : "b", d : "d", ...}, ... }. So a top-level projection such as {a: 0, b: 0} cannot remove a and b, because those fields only exist inside the numbered subdocuments.
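To show why the projection seemed to do nothing, here is a small in-memory JavaScript sketch that mimics an exclusion projection. It is not MongoDB itself; the helper `excludePaths` and the sample document are hypothetical. Excluding top-level `a` and `b` leaves the nested copies alone, while dot-notation paths such as `"1.a"` reach them. In the shell the equivalent should be something like `db.users.find({}, {"1.a": 0, "1.b": 0})`, repeated for every numbered subdocument you know about.

```javascript
// Hypothetical in-memory sketch of an exclusion projection such as {a: 0, b: 0}.
// Paths may use dot notation ("1.a") to reach fields inside subdocuments.
function excludePaths(doc, paths) {
  // Deep-copy so the input document is left untouched.
  const copy = JSON.parse(JSON.stringify(doc));
  for (const path of paths) {
    const parts = path.split(".");
    let node = copy;
    // Walk down to the parent of the last path segment.
    for (const part of parts.slice(0, -1)) {
      if (node === null || typeof node !== "object") break;
      node = node[part];
    }
    // Remove the field only if its parent actually exists.
    if (node !== null && typeof node === "object") {
      delete node[parts[parts.length - 1]];
    }
  }
  return copy;
}

const sampleDoc = { _id: 7, "1": { a: "a", b: "b", d: "d" } };
console.log(excludePaths(sampleDoc, ["a", "b"]));     // nested a and b are still there
console.log(excludePaths(sampleDoc, ["1.a", "1.b"])); // only d is left under "1"
```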
Say I have a Foo document like the following:
{
_id: 1,
bar: [{_id: 1, ...bar props}, {_id: 2, ...bar props}, {_id: 3, ...bar props}],
... other foo props
}
How do I query the database for a single Bar, such that my result looks like:
{_id: 2, ...bar props}
Something like:
db().collection('foo').findOne({ _id: 1, {foo: _id: 2}}, {foo: 1, _id: 0})
Matching and projection are separate operations in MongoDB and you should also keep them separate when you are thinking (and asking) about queries.
You cannot "query for a single Bar". Queries always match documents. What you can do is find a document which contains a Bar which matches conditions, or you can find a document which contains exactly one Bar which also matches conditions, etc. In all of these cases you still get the top-level document(s) as a result.
To retrieve only one, several, or all of the Bars in whichever documents matched your query conditions, instead of those documents themselves, use projection (either the second argument to find or the $project aggregation pipeline stage).
When you are using the aggregation pipeline, you can mix $match and $project stages so that, for example, you $match to filter down documents, then $project to reduce the documents to some of their fields, then $match to further filter down the resulting documents, and so on. Still matching and projection are separate operations.
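As a sketch of that separation, assuming the collection is named `foo` as in the question and MongoDB 3.4 or later, a pipeline could match the document, unwind `bar`, match the wanted element, and then promote it to the top level. The last step uses `$replaceRoot`, a reshaping stage rather than `$project`, because `$project` alone would still wrap the element in a `bar` field. The pipeline is shown as data for `db.foo.aggregate(pipeline)`; `runSingleBar` and the `name` property are hypothetical, an in-memory stand-in for these four stages only so the effect can be checked without a server.

```javascript
// Pipeline to pass to db.foo.aggregate(singleBarPipeline) in the mongo shell.
const singleBarPipeline = [
  { $match: { _id: 1 } },                // matching: pick the Foo document(s)
  { $unwind: "$bar" },                   // one document per element of bar
  { $match: { "bar._id": 2 } },          // matching again, now per element
  { $replaceRoot: { newRoot: "$bar" } }, // reshaping: the Bar becomes the result
];

// Hypothetical in-memory emulation of exactly the four stages above.
function runSingleBar(foos, fooId, barId) {
  return foos
    .filter((foo) => foo._id === fooId)                        // $match
    .flatMap((foo) => foo.bar.map((bar) => ({ ...foo, bar }))) // $unwind
    .filter((doc) => doc.bar._id === barId)                    // $match
    .map((doc) => doc.bar);                                    // $replaceRoot
}

const foos = [
  { _id: 1, bar: [{ _id: 1, name: "x" }, { _id: 2, name: "y" }, { _id: 3, name: "z" }] },
];
console.log(runSingleBar(foos, 1, 2)); // [ { _id: 2, name: 'y' } ]
```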
Suppose the following collection of documents, each with a 'user_id' field and an array of the ids that this user follows:
{"user_id": 1 , "follows" : [2,30]},
{"user_id": 2 , "follows" : [1,40]},
{"user_id": 3 , "follows" : [2,50]},
... large collection
I would like to remove from "follows" the numbers that don't exist in the collection as an id. Think of it as a data-cleaning procedure, where follows of users that no longer exist need to be deleted. Example output for the input above:
{"user_id": 1 , "follows" : [2]},
{"user_id": 2 , "follows" : [1]},
{"user_id": 3 , "follows" : [2]},
... large collection
I thought about a projection with "$filter", but I can't find an expression that checks whether a document with that id exists anywhere in the collection ($filter seems to be limited to the current document).
Then I tried to aggregate a set of all ids to use in an $in condition, but that failed miserably due to the size of the collection (too large object error).
I thought about unwinding, but I hit the same wall: I can't find an expression for $match or $project that answers the question "Does this value of 'follows' exist as an 'id' in the collection?"
The only other option I see is doing the filtering client-side with a few independent queries, but I wanted to check with the community first in case I'm missing something.
You could do a $lookup, like this:
{ $lookup: {
  from: 'users',
  localField: 'follows',
  foreignField: 'user_id',
  as: 'follows'
} }
This will produce results like { user_id: 1, follows: [ {user_id: 2, follows: [1, 40] } ] }. Then you should be able to get the result you want with $addFields, mapping follows to follows.user_id:
{ $addFields: { follows: "$follows.user_id" } }
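Put together, and assuming the collection is called `users` as in the `$lookup` above, the cleanup could be the two-stage pipeline below, passed to `db.users.aggregate(cleanupPipeline)`. The function `cleanFollows` is a hypothetical in-memory emulation of those two stages, useful for checking the expected output on the sample data. On a large collection an index on `user_id` should help the `$lookup`, and `$lookup` may not preserve the original order of the `follows` array.

```javascript
// Pipeline to pass to db.users.aggregate(cleanupPipeline) in the mongo shell.
const cleanupPipeline = [
  {
    $lookup: {
      from: "users",          // self-join on the same collection
      localField: "follows",  // array of followed ids
      foreignField: "user_id",
      as: "follows",          // overwrite follows with the matched user documents
    },
  },
  // Keep only the user_id of each matched document.
  { $addFields: { follows: "$follows.user_id" } },
];

// Hypothetical in-memory emulation of the two stages, for checking results.
function cleanFollows(users) {
  return users.map((user) => {
    // $lookup: every document whose user_id appears in user.follows
    const matched = users.filter((other) => user.follows.includes(other.user_id));
    // $addFields: map the matched documents back to their ids
    return { ...user, follows: matched.map((other) => other.user_id) };
  });
}

const users = [
  { user_id: 1, follows: [2, 30] },
  { user_id: 2, follows: [1, 40] },
  { user_id: 3, follows: [2, 50] },
];
console.log(cleanFollows(users)); // follows become [2], [1] and [2]
```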
I have a MongoDB collection that looks something like this:
{
{
_id: objectId('aabbccddeeff'),
objectName: 'MyFirstObject',
objectLength: 0xDEADBEEF,
objectSource: 'Source1',
accessCounter: {
'firstLocationCode' : 283,
'secondLocationCode' : 543,
'ThirdLocationCode' : 564,
'FourthLocationCode' : 12,
}
}
...
}
Now, assuming that this is not the only record in the collection and that most or all of the documents contain the accessCounter subdocument, how would I go about selecting the first x documents with the most accesses from a specific location?
A sample "query" will be something like:
"Select the first 10 documents From myCollection where the accessCounter.firstLocationCode are the highest"
So a sample result would be the X documents whose accessCounter.firstLocationCode values are the greatest in the database.
Thank you for taking the time to read my question.
No need for an aggregation; this is a basic query:
db.collection.find().sort({"accessCounter.firstLocationCode": -1}).limit(10)
To speed this up, create an index on the field you sort by:
db.collection.createIndex({"accessCounter.firstLocationCode": -1})
An index on the whole accessCounter subdocument would not help this sort, because such an index only supports matches on the entire subdocument. If you run the same query for every location, create one index per location field.
You can speed this up further, if you only need the counter value, by making it a so-called covered query: a query whose returned values come from the index itself. For example, with an index on accessCounter.secondLocationCode (covering queries on embedded fields requires MongoDB 3.6 or later), the query for the top secondLocationCode values should be covered:
db.collection.find({}, {_id: 0, "accessCounter.secondLocationCode": 1})
.sort({"accessCounter.secondLocationCode": -1}).limit(10)
which translates to "Get all documents ('{}'), don't return the _id field as is done by default ('_id: 0'), and return only the 'accessCounter.secondLocationCode' field ('"accessCounter.secondLocationCode": 1'). Sort the returned values in descending order and give me the first ten."
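The sort-and-limit logic can be sketched in memory as follows; the shell equivalent would be `db.collection.find().sort({"accessCounter.firstLocationCode": -1}).limit(10)`. `topByLocation` and the sample documents are hypothetical, not a driver API. Documents missing the counter are treated as lowest, which mirrors how MongoDB places missing or null values at the end of a descending sort.

```javascript
// Hypothetical in-memory version of
//   db.collection.find().sort({ "accessCounter.<location>": -1 }).limit(n)
function topByLocation(docs, location, n) {
  // Missing counters count as lowest, so they end up last in a descending sort.
  const count = (doc) => doc.accessCounter?.[location] ?? -Infinity;
  return [...docs] // copy so the input array is not reordered
    .sort((x, y) => (count(y) > count(x) ? 1 : count(y) < count(x) ? -1 : 0)) // descending
    .slice(0, n); // like .limit(n)
}

const objects = [
  { objectName: "A", accessCounter: { firstLocationCode: 283, secondLocationCode: 543 } },
  { objectName: "B", accessCounter: { firstLocationCode: 900 } },
  { objectName: "C" },
  { objectName: "D", accessCounter: { firstLocationCode: 12 } },
];
console.log(topByLocation(objects, "firstLocationCode", 2).map((d) => d.objectName)); // [ 'B', 'A' ]
```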
This might be trivial, but I haven't figured out a way to do it.
Say I have the following records in the database:
{ A: 1, B: 2, C: "Red" }
{ A: 1, B: 2, C: "Blue"}
{ A: 1, B: 3, C: "Red" }
I want to return all records matching {A: 1, C: "Red"}, except those whose B value is shared by another record with C: "Blue". So for the above records, only the 3rd record should be returned. The 1st record should not be returned because two records share its B value, and one of them has C: "Blue".
The only way I can think of is two queries to the database: first query {A: 1, C: "Red"}, then check the results against all the other records. I suppose the second step might actually take many more than one query.
I don't really want to query with just {A: 1}. Since I'm doing all this through the API, that would be a single database query, but the resulting list could be much bigger than I'd like.
Is there a query that can do what I want via just 1 database call? Thanks.
I don't think it's possible with one query. But you can use an aggregation to get all the B values you want, and then query the database for those B values:
db.test1.aggregate(
[
{$group: {_id: "$B", count: {$sum:1}}},
{$match: {count:1}}
]
)
This returns every B value for which there is only one record in your collection.
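A sketch of the full two-step approach, assuming the `test1` collection from the aggregation above: first collect the B values that occur exactly once, then run the original filter restricted to those values. This follows the answer's "only one record per B" rule, which is stricter than the question's "no Blue record shares this B" (two Red records with the same B would also be dropped). The shell calls are in comments; `uniqueBs` and `redWithUniqueB` are hypothetical in-memory stand-ins so the result can be checked.

```javascript
// In the mongo shell the two steps would be roughly:
//   const bs = db.test1.aggregate([
//     { $group: { _id: "$B", count: { $sum: 1 } } },
//     { $match: { count: 1 } },
//   ]).toArray().map((g) => g._id);
//   db.test1.find({ A: 1, C: "Red", B: { $in: bs } });

// Step 1 (in memory): B values that appear in exactly one record.
function uniqueBs(records) {
  const counts = new Map();
  for (const r of records) counts.set(r.B, (counts.get(r.B) || 0) + 1);
  return [...counts].filter(([, count]) => count === 1).map(([b]) => b);
}

// Step 2 (in memory): the original {A: 1, C: "Red"} filter, limited to those B values.
function redWithUniqueB(records) {
  const bs = new Set(uniqueBs(records));
  return records.filter((r) => r.A === 1 && r.C === "Red" && bs.has(r.B));
}

const records = [
  { A: 1, B: 2, C: "Red" },
  { A: 1, B: 2, C: "Blue" },
  { A: 1, B: 3, C: "Red" },
];
console.log(redWithUniqueB(records)); // [ { A: 1, B: 3, C: 'Red' } ]
```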
I'm trying to use the aggregation framework to group a lot of strings together to identify the unique ones. I must also keep some information about the rest of the fields. This would be analogous to using the * operator in MySQL with a GROUP BY statement:
SELECT *
FROM my_table
GROUP BY field1
I have tried using the aggregation framework, and it works fine just to get unique fields.
db.mycollection.aggregate({
$group : { _id : "$field1"}
})
What if I want the other fields that went with it? MySQL would just give me the first one that appeared in the group (which I'm fine with). That's what I thought the $first operator did.
db.mycollection.aggregate({
$group : {
_id : "$field1",
another_field : {$first : "$field2"}
}})
I expected this to group by field1 while still giving me back the other fields attached to each document. When I try it, I get:
exception: aggregation result exceeds maximum document size (16MB)
I have a feeling this is because it returns the whole aggregation result as one document. Can I return it as a JSON array instead?
Thanks in advance.
You're doing the aggregation correctly, but as the error message indicates, the full result of the aggregate call cannot be larger than 16 MB.
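Workarounds would be to either add a filter to reduce the size of the result, or use map-reduce instead and output the result to another collection. (From MongoDB 2.6 on, aggregate can return a cursor and supports an $out stage, so the 16 MB limit applies to each result document rather than to the whole result.)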
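For reference, here is the same grouping written as a pipeline array with a `$out` stage appended, so the groups are written to a collection rather than returned inline (`$out` is available from MongoDB 2.6; the target name `field1_groups` is hypothetical). `groupFirst` is a hypothetical in-memory emulation of `$group` with `$first`: it keeps the value from the first document seen in each group, so without a preceding `$sort`, "first" only means first in input order.

```javascript
// Pipeline for db.mycollection.aggregate(groupPipeline) in the mongo shell.
const groupPipeline = [
  { $group: { _id: "$field1", another_field: { $first: "$field2" } } },
  { $out: "field1_groups" }, // hypothetical target collection
];

// Hypothetical in-memory emulation of the $group stage above.
// Note: MongoDB's $group does not guarantee output order; the Map here
// simply keeps insertion order.
function groupFirst(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Only the first document seen for each field1 value contributes field2.
    if (!groups.has(doc.field1)) {
      groups.set(doc.field1, { _id: doc.field1, another_field: doc.field2 });
    }
  }
  return [...groups.values()];
}

const rows = [
  { field1: "x", field2: 1 },
  { field1: "y", field2: 2 },
  { field1: "x", field2: 3 },
];
console.log(groupFirst(rows)); // two groups: x -> 1, y -> 2
```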
If the number of unique groupings in the result does not exceed 20,000, you could use the group() function (deprecated in MongoDB 3.4 and removed in 4.2), like this:
db.mycollection.group({key: {field1: 1, field2: 1}, reduce: function (curr, result) {}, initial: {}})
The last option would be map-reduce:
db.mycollection.mapReduce(function () { emit({field1: this.field1, field2: this.field2}, 1); }, function (key, values) { return 1; }, {out: {replace: "unique_field1_field2"}})
and your result would be in the "unique_field1_field2" collection.
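To make the map and reduce phases concrete, here is a hypothetical in-memory emulation, not the server's implementation. The map function emits the document's own values (`this.field1`, `this.field2`) as the key, which is what yields one output entry per distinct pair; emitting a constant key would collapse every document into a single entry. As in MongoDB, reduce is only called for keys that received more than one value.

```javascript
let emit; // assigned by runMapReduce before map() runs

// Map function as it would be passed to mapReduce: `this` is the current document.
function map() {
  emit({ field1: this.field1, field2: this.field2 }, 1);
}

// Reduce: the value does not matter, only the distinct keys do.
function reduce(key, values) {
  return 1;
}

// Hypothetical driver imitating the map / group-by-key / reduce steps in memory.
function runMapReduce(docs) {
  const buckets = new Map();
  emit = (key, value) => {
    const id = JSON.stringify(key); // group emitted keys by their JSON form
    if (!buckets.has(id)) buckets.set(id, { key, values: [] });
    buckets.get(id).values.push(value);
  };
  for (const doc of docs) map.call(doc);
  return [...buckets.values()].map(({ key, values }) => ({
    _id: key,
    value: values.length > 1 ? reduce(key, values) : values[0],
  }));
}

const pairs = [
  { field1: "a", field2: 1 },
  { field1: "a", field2: 1 },
  { field1: "a", field2: 2 },
];
console.log(runMapReduce(pairs).map((r) => r._id)); // the two distinct (field1, field2) pairs
```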
Another alternative is to use the distinct function:
db.mycollection.distinct('field1')
This function accepts a second argument, a query, which you can use to filter the documents.
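A hypothetical in-memory sketch of what `distinct` with a query does, filtering on `field2` purely for illustration; the shell call would be something like `db.mycollection.distinct('field1', { field2: 'y' })`. Keep in mind that `distinct` returns all its values in a single document, so a very large set of distinct values can run into the same 16 MB limit.

```javascript
// Hypothetical in-memory version of db.mycollection.distinct("field1", query),
// with the query represented here as a predicate function.
// Simplified: unlike MongoDB, it does not flatten array values into separate entries.
function distinct(docs, field, predicate = () => true) {
  const seen = new Set();
  for (const doc of docs) {
    if (predicate(doc) && doc[field] !== undefined) seen.add(doc[field]);
  }
  return [...seen];
}

const items = [
  { field1: "a", field2: "x" },
  { field1: "b", field2: "y" },
  { field1: "a", field2: "y" },
];
console.log(distinct(items, "field1"));                          // [ 'a', 'b' ]
console.log(distinct(items, "field1", (d) => d.field2 === "y")); // [ 'b', 'a' ]
```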