search in limited number of record MongoDB - mongodb

I want to search in the first 1000 records of my document whose name is CityDB. I used the following code:
db.CityDB.find({'index.2':"London"}).limit(1000)
but it does not work, it return the first 1000 of finding, but I want to search just in the first 1000 records not all records. Could you please help me.
Thanks,
Amir

Note that there is no guarantee that your documents are returned in any particular order by a query as long as you don't sort explicitely. Documents in a new collection are usually returned in insertion order, but various things can cause that order to change unexpectedly, so don't rely on it. By the way: Auto-generated _id's start with a timestamp, so when you sort by _id, the objects are returned by creation-date.
Now about your actual question. When you first want to limit the documents and then perform a filter-operation on this limited set, you can use the aggregation pipeline. It allows you to use $limit-operator first and then use the $match-operator on the remaining documents.
db.CityDB.aggregate(
// { $sort: { _id: 1 } }, // <- uncomment when you want the first 1000 by creation-time
{ $limit: 1000 },
{ $match: { 'index.2':"London" } }
)

I can think of two ways to achieve this:
1) You have a global counter and every time you input data into your collection you add a field count = currentCounter and increase currentCounter by 1. When you need to select your first k elements, you find it this way
db.CityDB.find({
'index.2':"London",
count : {
'$gte' : currentCounter - k
}
})
This is not atomic and might give you sometimes more then k elements on a heavy loaded system (but it can support indexes).
Here is another approach which works nice in the shell:
2) Create your dummy data:
var k = 100;
for(var i = 1; i<k; i++){
db.a.insert({
_id : i,
z: Math.floor(1 + Math.random() * 10)
})
}
output = [];
And now find in the first k records where z == 3
k = 10;
db.a.find().sort({$natural : -1}).limit(k).forEach(function(el){
if (el.z == 3){
output.push(el)
}
})
as you see your output has correct elements:
output
I think it is pretty straight forward to modify my example for your needs.
P.S. also take a look in aggregation framework, there might be a way to achieve what you need with it.

Related

Selecting data from MongoDB where K of N criterias are met

I have documents with four fields: A, B, C, D Now I need to find documents where at least three fields matches. For example:
Query: A=a, B=b, C=c, D=d
Returned documents:
a,b,c,d (four of four met)
a,b,c (three of four met)
a,b,d (another three of four met)
a,c,d (another three of four met)
b,c,d (another three of four met)
So far I created something like:
`(A=a AND B=b AND C=c)
OR (A=a AND B=b AND D=d)
OR (A=a AND C=c AND D=d)
OR (B=b AND C=c AND D=d)`
But this is ugly and error prone.
Is there a better way to achieve it? Also, query performance matters.
I'm using Spring Data but I believe it does not matter. My current code:
Criteria c = new Criteria();
Criteria ca = Criteria.where("A").is(doc.getA());
Criteria cb = Criteria.where("B").is(doc.getB());
Criteria cc = Criteria.where("C").is(doc.getC());
Criteria cd = Criteria.where("D").is(doc.getD());
c.orOperator(
new Criteria().andOperator(ca,cb,cc),
new Criteria().andOperator(ca,cb,cd),
new Criteria().andOperator(ca,cc,cd),
new Criteria().andOperator(cb,cc,cd)
);
Query query = new Query(c);
return operations.find(query, Document.class, "documents");
Currently in MongoDB we cannot do this directly, since we dont have any functionality supporting Permutation/Combination on the query parameters.
But we can simplify the query by breaking the condition into parts.
Use Aggregation pipeline
$project with records (A=a AND B=b) --> This will give the records which are having two conditions matching.(Our objective is to find the records which are having matches for 3 out of 4 or 4 out of 4 on the given condition)`
Next in the pipeline use OR condition (C=c OR D=d) to find the final set of records which yields our expected result.
Hope it Helps!
The way you have it you have to do all permutations in your query. You can use the aggregation framework to do this without permuting all combinations. And it is generic enough to do with any K. The downside is I think you need Mongodb 3.2+ and also Spring Data doesn't support these oparations yet: $filter $concatArrays
But you can do it pretty easy with the java driver.
[
{
$project:{
totalMatched:{
$size:{
$filter:{
input:{
$concatArrays:[ ["$A"], ["$B"], ["$C"],["$D"]]
},
as:"attr",
cond:{
$eq:["$$attr","a"]
}
}
}
}
}
},
{
$match:{
totalMatched:{ $gte:3 }
}
}
]
All you are doing is you are concatenating the values of all the fields you need to check in a single array. Then select a subset of those elements that are equal to the value you are looking for (or any condition you want for that matter) and finally getting the size of that array for each document.
Now all you need to do is to $match the documents that have a size of greater than or equal to what you want.

MongoDB Query advice for weighted randomized aggregation

By far I have encountered ways for selecting random documents but my problem is a bit more of a pickle.So here goes
I have a collection which contains say a 1000+ documents (products)
say each document has a more or less generic format of .Say for simplicity it is
{"_id":{},"name":"Product1","groupid":5}
The groupid is a number say between 1 to 20 denoting the product belongs to that group.
Now if my query input is something like an array of {groupid->weight} for eg {[{"2":4},{"7":6}]} and say another parameter n(=10 say) Then I need to be able to pick 4 random documents that belong to groupid 2 and 6 random documents that belong to groupid 7.
The only solution i can think of is to run 'm' subqueries where m is the array length in the query input.
How do I accomplish this an efficient manner in MongoDB using probably a Mapreduce.
Picking up n random documents for each group.
Group the records by the groupid field. Emit the groupid as key
and the record as value.
For each group pick n random documents from the values array.
Let,
var parameter = {"5":1,"6":2}; //groupid->weight, keep it as an Object.
be the input to the map reduce functions.
The map function, emit only those group ids which we have provided as the parameter.
var map = function map(){
if(parameter.hasOwnProperty(this.groupid)){
emit(this.groupid,this);
}
}
The reduce function, for each group, get random records based on the parameter object in scope.
var reduce = function(key,values){
var length = values.length;
var docs = [];
var added = [];
var i= 1;
while(i<=parameter[key]){
var index = Math.floor(Math.random()*length);
if(added.indexOf(index) == -1){
docs.push(values[index]);
added.push(index);
i++;
}
else{
i--;
}
}
return {result:docs};
}
Invoking map reduce on the collection, by passing the parameter object in scope.
db.collection.mapReduce(map,
reduce,
{out: "sam",
scope:{"parameter":{"5":1,"6":2,"n":10}}})
To get the dumped output:
db.sam.find({},{"_id":0,"value.result":1}).pretty()
When you bring the parameter n into picture, you need to specify the number of documents for each group as a ratio, or else that parameter is not necessary at all.

Number of items in the aggregation with MongoDB 2.6

My query looks like that:
var x = db.collection.aggregate(...);
I want to know the number of items in the result set. The documentation says that this function returns a cursor. However it contains far less methods/fields than when using db.collection.find().
for (var k in x) print(k);
Produces
_firstBatch
_cursor
hasNext
next
objsLeftInBatch
help
toArray
forEach
map
itcount
shellPrint
pretty
No count() method! Why is this cursor different from the one returned by find()? itcount() returns some type of count, but the documentation says "for testing only".
Using a group stage in my aggregation ({$group:{_id:null,cnt:{$sum:1}}}), I can get the count, like that:
var cnt = x.hasNext() ? x.next().cnt : 0;
Is there a more straight forward way to get this count? As in db.collection.find(...).count()?
Barno's answer is correct to point out that itcount() is a perfectly good method for counting the number of results of the aggregation. I just wanted to make a few more points and clear up some other points of confusion:
No count() method! Why is this cursor different from the one returned by find()?
The trick with the count() method is that it counts the number of results of find() on the server side. itcount(), as you can see in the code, iterates over the cursor, retrieving the results from the server, and counts them. The "it" is for "iterate". There's currently (as of MongoDB 2.6), no way to just get the count of results from an aggregation pipeline without returning the cursor of results.
Using a group stage in my aggregation ({$group:{_id:null,cnt:{$sum:1}}}), I can get the count
Yes. This is a reasonable way to get the count of results and should be more performant than itcount() since it does the work on the server and does not need to send the results to the client. If the point of the aggregation within your application is just to produce the number of results, I would suggest using the $group stage to get the count. In the shell and for testing purposes, itcount() works fine.
Where have you read that itcount() is "for testing only"?
If in the mongo shell I do
var p = db.collection.aggregate(...);
printjson(p.help)
I receive
function () {
// This is the same as the "Cursor Methods" section of DBQuery.help().
print("\nCursor methods");
print("\t.toArray() - iterates through docs and returns an array of the results")
print("\t.forEach( func )")
print("\t.map( func )")
print("\t.hasNext()")
print("\t.next()")
print("\t.objsLeftInBatch() - returns count of docs left in current batch (when exhausted, a new getMore will be issued)")
print("\t.itcount() - iterates through documents and counts them")
print("\t.pretty() - pretty print each document, possibly over multiple lines")
}
If I do
printjson(p)
I find that
"itcount" : function (){
var num = 0;
while ( this.hasNext() ){
num++;
this.next();
}
return num;
}
This function
while ( this.hasNext() ){
num++;
this.next();
}
It is very similar var cnt = x.hasNext() ? x.next().cnt : 0; And this while is perfect for count...

MongoDB Aggregating Array Items

Given the Schema Displayed Below & MongoDB shell version: 2.0.4:
How would I go about counting the number of items in the impressions array ?
I am thinking I have to do a Map Reduce, which seems to be a little complex, I thought that the count Function would do it. Can someone Demonstrate this?
A simple example of counting is:
var count = 0;
db.engagement.find({company_name: "me"},{impressions:1}).forEach(
function (doc) {
count += doc.impressions.length;
}
)
print("Impressions: " + count);
If you have a large number of documents to process, you would be better maintaining the count as an explicit field. You could either update the count when pushing to the impressions array, or use an incremental MapReduce to re-count for updated documents as needed.

Random Sampling from Mongo

I have a mongo collection with documents. There is one field in every document which is 0 OR 1. I need to random sample 1000 records from the database and count the number of documents who have that field as 1. I need to do this sampling 1000 times. How do i do it ?
For people coming to the answer, you should now use the new $sample aggregation function, new in 3.2.
https://docs.mongodb.org/manual/reference/operator/aggregation/sample/
db.collection_of_things.aggregate(
[ { $sample: { size: 15 } } ]
)
Then add another step to count up the 0s and 1s using $group to get the count. Here is an example from the MongoDB docs.
For MongoDB 3.0 and before, I use an old trick from SQL days (which I think Wikipedia use for their random page feature). I store a random number between 0 and 1 in every object I need to randomize, let's call that field "r". You then add an index on "r".
db.coll.ensureIndex(r: 1);
Now to get random x objects, you use:
var startVal = Math.random();
db.coll.find({r: {$gt: startVal}}).sort({r: 1}).limit(x);
This gives you random objects in a single find query. Depending on your needs, this may be overkill, but if you are going to be doing lots of sampling over time, this is a very efficient way without putting load on your backend.
Here's an example in the mongo shell .. assuming a collection of collname, and a value of interest in thefield:
var total = db.collname.count();
var count = 0;
var numSamples = 1000;
for (i = 0; i < numSamples; i++) {
var random = Math.floor(Math.random()*total);
var doc = db.collname.find().skip(random).limit(1).next();
if (doc.thefield) {
count += (doc.thefield == 1);
}
}
I was gonna edit my comment on #Stennies answer with this but you could also use a seprate auto incrementing ID index here as an alternative if you were to skip over HUGE amounts of record (talking huge here).
I wrote another answer to another question a lot like this one where some one was trying to find nth record of the collection:
php mongodb find nth entry in collection
The second half of my answer basically describes one potential method by which you could approach this problem. You would still need to loop 1000 times to get the random row of course.
If you are using mongoengine, you can use a SequenceField to generate an incremental counter.
class User(db.DynamicDocument):
counter = db.SequenceField(collection_name="user.counters")
Then to fetch a random list of say 100, do the following
def get_random_users(number_requested):
users_to_fetch = random.sample(range(1, User.objects.count() + 1), min(number_requested, User.objects.count()))
return User.objects(counter__in=users_to_fetch)
where you would call
get_random_users(100)