MongoDB: how to select items with nested array count > 0

The database is near 5GB. I have documents like:
{
_id: ..
user: "a"
hobbies: [{
_id: ..
name: football
},
{
_id: ..
name: beer
}
...
]
}
I want to return users who have more than 0 "hobbies".
I've tried
db.collection.find({"hobbies" : { $gt : 0}}).limit(10)
and it takes all the RAM and returns no result.
How do I run this query?
And how do I return only: id, name, count?
How do I do it with the official C# driver?
TIA
P.S.
Nearby I found:
"Add a new field to handle category size. It's a usual practice in the Mongo world."
Is this true?

In this specific case, you can use list indexing to solve your problem:
db.collection.find({"hobbies.0" : {$exists : true}}).limit(10)
This just makes sure a 0th element exists. You can do the same to make sure the list is shorter than n or between x and y in length by checking for the existence of elements at the ends of the range.
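The range variant mentioned above can be sketched as a small filter builder. This is only an illustration: `array_length_filter` is a hypothetical helper name, and it assumes the field is always an array when present. It returns a plain dict that could be passed to pymongo's `collection.find(...)`.

```python
def array_length_filter(field, min_len=None, max_len=None):
    """Build a filter matching arrays whose length lies in [min_len, max_len].

    Hypothetical helper; assumes `field` holds an array whenever it is present.
    """
    query = {}
    if min_len is not None and min_len > 0:
        # An element at index min_len - 1 exists  <=>  length >= min_len
        query[f"{field}.{min_len - 1}"] = {"$exists": True}
    if max_len is not None:
        # No element at index max_len  <=>  length <= max_len
        query[f"{field}.{max_len}"] = {"$exists": False}
    return query


non_empty = array_length_filter("hobbies", min_len=1)
two_to_five = array_length_filter("hobbies", min_len=2, max_len=5)
print(non_empty)
print(two_to_five)
```

Note that a filter with only `max_len` would also match documents where the field is missing altogether, so combine it with a lower bound if that matters.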

Have you tried using hobbies.length? I haven't tested this, but I believe this is the way to query on the length of the array in MongoDB:
db.collection.find({$where: '(this.hobbies.length > 0)'})

You can (sort of) check for a range of array lengths with the $size operator using a logical $not:
db.collection.find({array: {$not: {$size: 0}}})

That's somewhat true.
According to the manual
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24size
$size
The $size operator matches any array with the specified number of
elements. The following example would match the object {a:["foo"]},
since that array has just one element:
db.things.find( { a : { $size: 1 } } );
You cannot use $size to find a range of sizes (for example: arrays
with more than 1 element). If you need to query for a range, create an
extra size field that you increment when you add elements
So you can check for array size 0, but not for things like 'larger than 0'

Earlier questions explain how to handle the array count issue. Although in your case if ZERO really is the only value you want to test for, you could set the array to null when it's empty and set the option to not serialize it, then you can test for the existence of that field. Remember to test for null and to create the array when you want to add a hobby to a user.
For #2, provided you added the count field it's easy to select the fields you want back from the database and include the count field.
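For the "return only id, name, count" part, here is a minimal sketch of an aggregation pipeline that computes the count on the fly instead of reading a stored field. It assumes `hobbies` is always an array when present, and `summarize` is a hypothetical in-memory stand-in used only to show the expected output shape. With pymongo the pipeline would be passed to `collection.aggregate(pipeline)`.

```python
# Assumption: "hobbies" is always an array when it is present.
pipeline = [
    # Keep only documents that have at least one array element
    {"$match": {"hobbies.0": {"$exists": True}}},
    # Return _id (included by default), user, and the array length as "count"
    {"$project": {"user": 1, "count": {"$size": "$hobbies"}}},
    {"$limit": 10},
]


def summarize(docs):
    """In-memory stand-in for the pipeline above (illustration only)."""
    return [
        {"_id": d["_id"], "user": d["user"], "count": len(d["hobbies"])}
        for d in docs
        if d.get("hobbies")
    ][:10]


sample = [
    {"_id": 1, "user": "a", "hobbies": [{"name": "football"}, {"name": "beer"}]},
    {"_id": 2, "user": "b", "hobbies": []},
]
print(summarize(sample))
```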

1 - If you only need to separate users with zero hobbies from the rest, and the hobbies key is not set for someone with zero hobbies, use the $exists operator.
Add an index on "hobbies" to improve performance:
db.collection.find( { hobbies : { $exists : true } } );
2 - However, if a person with zero hobbies has an empty array, and a person with 1 hobby has an array with 1 element, then use this generic solution:
Maintain a field called "hcount" (hobby count), and always set it equal to the size of the hobbies array in every update.
Create an index on the "hcount" field.
Then you can run queries like:
db.collection.find( { hcount : 0 } ) // people with 0 hobbies
db.collection.find( { hcount : 5 } ) // people with 5 hobbies
3 - From @JohnP's answer, "$size" is also a good operator for this purpose.
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24size
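The counter-field approach from this answer could look roughly like the sketch below. The helper names are hypothetical, and `apply_add_hobby` only mimics the update in memory so the idea can be checked without a server. With pymongo, the update document would go to `collection.update_one({"_id": user_id}, add_hobby_update(hobby))` and the filter and projection to `collection.find(has_hobbies, fields)`.

```python
def add_hobby_update(hobby):
    """Update document that appends a hobby and bumps the counter together."""
    return {"$push": {"hobbies": hobby}, "$inc": {"hcount": 1}}


# Range queries become ordinary comparisons on the (indexable) counter field
has_hobbies = {"hcount": {"$gt": 0}}
# Projection returning only _id (default), user, and the count
fields = {"user": 1, "hcount": 1}


def apply_add_hobby(doc, hobby):
    """In-memory stand-in for the update above (illustration only)."""
    updated = dict(doc)
    updated["hobbies"] = doc.get("hobbies", []) + [hobby]
    updated["hcount"] = doc.get("hcount", 0) + 1
    return updated


user = apply_add_hobby({"user": "a", "hobbies": [], "hcount": 0}, {"name": "football"})
print(user)
```

Because `$push` and `$inc` sit in the same update document, the array and its counter change in one operation on that document.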

Related

Mongodb: Query on the last N documents(some portion) of a collection only

In my collection that has say 100 documents, I want to run the following query:
collection.find({"$text" : {"$search" : "some_string"}})
Assume that a suitable "text" index already exists. My question is: how can I run this query on the last 'n' documents only?
All the questions I found on the web ask how to get the last n docs, whereas my question is how to search within the last n docs only.
More generally, how can I run a Mongo query on some portion, say 20%, of a collection?
What I tried
I'm using pymongo, so I tried to use skip() and limit() to get the last n documents, but I didn't find a way to run a query on the cursor that those functions return.
After @hhsarh's answer, here's what I tried, to no avail:
# here's what I tried after initial answers
recents = information_collection.aggregate([
{"$match" : {"$text" : {"$search" : "healthline"}}},
{"$sort" : {"_id" : -1}},
{"$limit" : 1},
])
The result still comes from the whole collection instead of just the last record/document, as the above code attempts.
The last document doesn't contain "healthline" in any field, so the intended result of the query is an empty list []. But I get a document.
Can someone please explain how this is possible?
What you are looking for can be achieved using MongoDB Aggregation
Note: As pointed out by @turivishal, $text won't work if it is not in the first stage of the aggregation pipeline.
collection.aggregate([
{
"$sort": {
"_id": -1
}
},
{
"$limit": 10 // `n` value, where n is the number of last records you want to consider
},
{
"$match" : {
// All your find query goes here
}
},
], {allowDiskUse: true}) // just in case the computation exceeds 100MB
Since _id is indexed by default, the above aggregation query should be fast, but its performance degrades in proportion to n.
Note: If you are using pymongo, replace the last line of the code example with the line below:
], allowDiskUse=True)
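To see why the order of the stages matters, here is a pure-Python simulation of both orderings on a toy list. The sample documents and the substring-based `matches` function are illustrative assumptions and do not reproduce real $text semantics.

```python
docs = [
    {"_id": 1, "text": "healthline article"},
    {"_id": 2, "text": "healthline news"},
    {"_id": 3, "text": "unrelated post"},
]


def matches(doc, term):
    # Stand-in for the real query: a plain substring test, not $text
    return term in doc["text"]


def match_then_limit(docs, term, n):
    # $match -> $sort {_id: -1} -> $limit n: newest n among the matching docs
    hits = [d for d in docs if matches(d, term)]
    return sorted(hits, key=lambda d: d["_id"], reverse=True)[:n]


def limit_then_match(docs, term, n):
    # $sort {_id: -1} -> $limit n -> $match: matches among the newest n docs
    newest = sorted(docs, key=lambda d: d["_id"], reverse=True)[:n]
    return [d for d in newest if matches(d, term)]


print(match_then_limit(docs, "healthline", 1))  # newest matching document
print(limit_then_match(docs, "healthline", 1))  # empty: newest doc does not match
```

The first ordering mirrors the pipeline from the question, which is why it still returned a document; the second mirrors the pipeline in this answer.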
It is not possible with the $text operator, because there is a restriction:
The $match stage that includes a $text must be the first stage in the pipeline
It means we can't limit the documents before the $text operator runs; read more about the $text operator restrictions.
As a second option, this might be possible if you use the $regex regular expression operator instead of $text for searching.
If you need the search to behave like $text, you have to transform your search input as follows:
let's assume searchInput is your input variable
and searchFields is the list of fields to search
split the search input on whitespace, loop over the resulting words, and convert each one to a regular expression
loop over searchFields and prepare an $in condition for each field
import re

searchInput = "This is search"
searchFields = ["field1", "field2"]
# one case-insensitive pattern per word; re.escape keeps the user input literal
searchRegex = [re.compile(re.escape(s), re.IGNORECASE) for s in searchInput.split()]
# one {field: {"$in": [patterns]}} condition per search field
searchPayload = [{f: {"$in": searchRegex}} for f in searchFields]
print(searchPayload)
The payload would then look like this (shown in shell notation):
[
{'field1': {'$in': [/This/i, /is/i, /search/i]}},
{'field2': {'$in': [/This/i, /is/i, /search/i]}}
]
Use the searchPayload variable with the $or operator in a $match placed as the last stage of the pipeline:
recents = information_collection.aggregate([
# 1 = ascending, -1 descending you can use anyone as per your requirement
{ "$sort": { "_id": 1 } },
# use any limit of number as per your requirement
{ "$limit": 10 },
{ "$match": { "$or": searchPayload } }
])
print(list(recents))
Note: $regex searches can cause performance issues.
To improve search performance you can create a compound index on your search fields, for example:
information_collection.create_index([("field1", 1), ("field2", 1)])

MongoDB, retrieve specific field in array of objects

In my collection I have an array of objects. I'd like to share only a subset of those objects, but I can't find out how to do this?
Here are a few things I tried:
db.collections.find({},
{ fields: {
'myField': 1, // works
'myArray': 1, // works
'myArray.$': 1, // doesn't work
'myArray.$.myNestedField': 1, // doesn't work
'myArray.0.myNestedField': 1, // doesn't work
}
});
Use 'myArray.myNestedField': 1 to project nested fields from the array.
I'll briefly explain all the variants you have.
'myField': 1 -- Projecting a field value
'myArray': 1 -- Projecting an array as a whole (its elements can be scalars, embedded documents, or sub-documents)
The variants below work only when the query preceding the projection includes a condition on the array, so that the positional operator ($) has a match to refer to; they project only the first element matching the query.
'myArray.$': 1
'myArray.$.myNestedField': 1
The following is not a valid projection operation:
'myArray.0.myNestedField': 1
More here on how to query & project documents
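As a rough picture of what the 'myArray.myNestedField': 1 projection returns, the sketch below mimics it in plain Python. `project_nested` is a hypothetical helper, the sample document is made up, and the sketch assumes every array element is an embedded document.

```python
def project_nested(doc, array_field, nested_field):
    """Mimic the projection {array_field + "." + nested_field: 1} in memory.

    Assumes every element of the array is an embedded document (a dict).
    """
    result = {"_id": doc["_id"]}  # _id is included unless explicitly excluded
    if array_field in doc:
        result[array_field] = [
            # Each element keeps only the requested nested field
            {nested_field: item[nested_field]} if nested_field in item else {}
            for item in doc[array_field]
        ]
    return result


doc = {
    "_id": 1,
    "myField": "x",
    "myArray": [
        {"myNestedField": "a", "other": 1},
        {"myNestedField": "b", "other": 2},
    ],
}
print(project_nested(doc, "myArray", "myNestedField"))
```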

What does $sum:1 mean in Mongo

I have a collection foo:
{ "_id" : ObjectId("5837199bcabfd020514c0bae"), "x" : 1 }
{ "_id" : ObjectId("583719a1cabfd020514c0baf"), "x" : 3 }
{ "_id" : ObjectId("583719a6cabfd020514c0bb0") }
I use this query:
db.foo.aggregate({$group:{_id:1, avg:{$avg:"$x"}, sum:{$sum:1}}})
Then I get a result:
{ "_id" : 1, "avg" : 2, "sum" : 3 }
What does {$sum:1} mean in this query?
From the official docs:
When used in the $group stage, $sum has the following syntax and returns the collective sum of all the numeric values that result from applying a specified expression to each document in a group of documents that share the same group by key:
{ $sum: <expression> }
Since in your example the expression is 1, it will aggregate a value of one for each document in the group, thus yielding the total number of documents per group.
Basically, it adds up the value of the expression for each document. In this case there are 3 documents, so the result is 1+1+1 = 3. For more details, see the MongoDB documentation: https://docs.mongodb.com/v3.2/reference/operator/aggregation/sum/
For example, if the query were:
db.foo.aggregate({$group:{_id:1, avg:{$avg:"$x"}, sum:{$sum:"$x"}}})
then the sum value would be 1+3=4
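The same arithmetic can be traced in plain Python on the three documents from the question (with _id omitted). This is only a simulation of what the accumulators compute, not how the server implements them.

```python
foo = [{"x": 1}, {"x": 3}, {}]  # the third document has no "x" field

# {$sum: 1}: add the constant 1 once per document, i.e. count the documents
count = sum(1 for _ in foo)

# {$sum: "$x"}: add up x, skipping documents where x is missing
values = [d["x"] for d in foo if "x" in d]
total_x = sum(values)

# {$avg: "$x"}: average over the documents that actually have x
avg_x = sum(values) / len(values)

print(count, total_x, avg_x)
```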
I'm not sure which MongoDB version was current 6 years ago or whether it had all these goodies, but it stands to reason that {$sum:1} is nothing but a workaround for {$count:{}}.
The $count accumulator is available from MongoDB 5.0, and the documentation describes it as functionally equivalent to { $sum: 1 } in a $group stage, so the gain is readability rather than speed. And think of why you're even asking: because {$sum:1} is a less-than-obvious idiom.
My option would be:
db.foo.aggregate({$group:{_id:1, avg:{$avg:"$x"}, sum:{$count:{}}}})
I just tried this on Mongo 5.0.14 and it runs fine.
The good old "Just because you can, doesn't mean you should." is still a thing, no?

search in limited number of record MongoDB

I want to search in the first 1000 records of my collection, whose name is CityDB. I used the following code:
db.CityDB.find({'index.2':"London"}).limit(1000)
but it does not work: it returns the first 1000 matches, whereas I want to search within just the first 1000 records, not all records. Could you please help me?
Thanks,
Amir
Note that there is no guarantee that a query returns your documents in any particular order unless you sort explicitly. Documents in a new collection are usually returned in insertion order, but various things can cause that order to change unexpectedly, so don't rely on it. By the way: auto-generated _ids start with a timestamp, so when you sort by _id, the documents are returned in creation order.
Now to your actual question. When you want to limit the documents first and then filter that limited set, you can use the aggregation pipeline. It lets you apply the $limit stage first and then the $match stage to the remaining documents.
db.CityDB.aggregate([
// { $sort: { _id: 1 } }, // <- uncomment when you want the first 1000 by creation-time
{ $limit: 1000 },
{ $match: { 'index.2':"London" } }
])
I can think of two ways to achieve this:
1) Keep a global counter, and every time you insert a document into your collection, add a field count = currentCounter and increase currentCounter by 1. When you need to search only among k elements (the condition below selects the k most recently inserted ones), query like this:
db.CityDB.find({
'index.2':"London",
count : {
'$gte' : currentCounter - k
}
})
This is not atomic and might sometimes give you more than k elements on a heavily loaded system (but it can use indexes).
Here is another approach, which works nicely in the shell:
2) Create your dummy data:
var k = 100;
for(var i = 1; i<k; i++){
db.a.insert({
_id : i,
z: Math.floor(1 + Math.random() * 10)
})
}
output = [];
And now search for z == 3 among only k records (here the k most recent in natural order):
k = 10;
db.a.find().sort({$natural : -1}).limit(k).forEach(function(el){
if (el.z == 3){
output.push(el)
}
})
As you can see, output holds the correct elements:
output
I think it is pretty straightforward to modify my example for your needs.
P.S. Also take a look at the aggregation framework; there might be a way to achieve what you need with it.

Count fields in a MongoDB Collection

I have a collection of documents like this one:
{
"_id" : ObjectId("..."),
"field1": "some string",
"field2": "another string",
"field3": 123
}
I'd like to be able to iterate over the entire collection, and find the entire number of fields there are. In this example document there are 3 (I don't want to include _id), but it ranges from 2 to 50 fields in a document. Ultimately, I'm just looking for the average number of fields per document.
Any ideas?
Iterate over the entire collection, and find the entire number of fields there are
Now you can utilise the aggregation operator $objectToArray (SERVER-23310) to turn keys into values and count them. This operator is available in MongoDB v3.4.4+.
For example:
db.collection.aggregate([
{"$project":{"numFields":{"$size":{"$objectToArray":"$$ROOT"}}}},
{"$group":{"_id":null, "fields":{"$sum":"$numFields"}, "docs":{"$sum":1}}},
{"$project":{"total":{"$subtract":["$fields", "$docs"]}, _id:0}}
])
The first stage, $project, turns all keys into an array so the fields can be counted. The second stage, $group, sums the number of keys/fields in the collection and also counts the documents processed. The third stage, $project, subtracts the total number of documents from the total number of fields (as you don't want to count _id).
You can easily add $avg in the last stage to compute the average.
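One possible way to get the average directly is sketched below: a pipeline variant (written as a Python list for pymongo's `collection.aggregate`) that averages the per-document field count minus one for _id, next to a plain-Python check of the same arithmetic. The `avgFields` name and the sample documents are assumptions for illustration.

```python
# Pipeline variant: average of (number of keys - 1) per document, _id excluded
pipeline = [
    {"$project": {"numFields": {"$size": {"$objectToArray": "$$ROOT"}}}},
    {"$group": {
        "_id": None,
        "avgFields": {"$avg": {"$subtract": ["$numFields", 1]}},
    }},
]

# The same arithmetic in plain Python on two made-up documents
docs = [
    {"_id": 1, "field1": "some string", "field2": "another string", "field3": 123},
    {"_id": 2, "field1": "some string", "field2": "another string"},
]
field_counts = [len(d) - 1 for d in docs]  # subtract 1 for _id
average = sum(field_counts) / len(field_counts)
print(field_counts, average)
```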
PRIMARY> var count = 0;
PRIMARY> db.my_table.find().forEach( function(d) { for(f in d) { count++; } });
PRIMARY> count
1074942
This is the simplest way I could figure out to do this. On really large datasets, it probably makes sense to go the Map-Reduce path. But while your set is small enough, this will do. Note that this count includes the _id field of every document.
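This visits every field of every document, so its cost is proportional to the total number of fields rather than O(n^2) in the number of documents, but I'm not sure there is a better way.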
You could create a Map-Reduce job. In the Map step iterate over the properties of each document as a javascript object, output the count and reduce to get the total.
A simple way is to find() all the documents and, for each document in the result, count its keys.
db.getCollection().find(<condition>)
Then, for each document (doc) in the result, get the number of keys:
Object.keys(doc).length