How to delete rows within an interval in MongoDB

I have a collection with a lot of rows, for instance:
{ "_id" : 1, "state" : "1+" }
I want to set up a cron job that removes the first N rows of my collection.
I tried:
db.history.remove(
{
_id :
{
$lt : db.history.find().sort({_id:1}).limit(1)._id + N
}
,
$atomic : true
}
);
where N is the number of rows to remove. I plan to pass this string to --eval in my cron task, but the command returns nothing.
What am I doing wrong? I could probably write a server-side function that takes N as a parameter...

The following works for me:
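// v is a cursor over the first 2 documents, sorted by _id
for (var v = db.ninja.find().sort({_id:1}).limit(2); v.hasNext();)
{
// Remove each document returned by the cursor
db.ninja.remove(v.next());
}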
Note:
1) Replace ninja with the name of your collection.
2) The variable v holds the cursor over the documents sorted by _id. I used a limit of 2; replace it with your value of N.
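The question's version most likely returns nothing because find() returns a cursor, not a document, so `.limit(1)._id` is undefined and the `$lt` bound becomes NaN, which matches no numeric _id. The working loop's logic can be checked outside the shell with a minimal plain-JavaScript model. Here an array stands in for the collection, and `removeFirstN` is a hypothetical helper, not a MongoDB API.

```javascript
// Hypothetical in-memory stand-in for a collection: an array of { _id, ... } docs.
// Mirrors the shell loop: sort by _id ascending, take the first n, remove them.
function removeFirstN(docs, n) {
  const sorted = [...docs].sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
  const toRemove = new Set(sorted.slice(0, n).map((d) => d._id));
  return docs.filter((d) => !toRemove.has(d._id));
}

const history = [
  { _id: 3, state: "1+" },
  { _id: 1, state: "1+" },
  { _id: 2, state: "2-" },
  { _id: 4, state: "3+" },
];
console.log(removeFirstN(history, 2).map((d) => d._id)); // [ 3, 4 ]
```

The order of the remaining documents is the original order; only the removal decision depends on the sort by _id.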

Related

Custom index comparator in MongoDB

I'm working with a dataset composed of probabilistically encrypted elements that are indistinguishable from random samples. As a result, repeated encryptions of the same number produce different ciphertexts. However, they are still comparable through a special function that applies algorithms such as SHA256 to compare two ciphertexts.
I want to add a list of these ciphertexts to a MongoDB database and index it using a tree-based structure (e.g., an AVL tree). I can't simply use the database's default indexing because, as described, the records must be compared using the special function.
An example: suppose I have a database db and a collection c made of documents of the following type:
{
"_id":ObjectId,
"r":string
}
Moreover, let F(int,string,string) be the following function:
F(h,l,r) = ( SHA256(l | r) + h ) % 3
where the operator | denotes string concatenation.
I want to execute the following query efficiently, e.g. on a collection with a suitable index:
db.c.find( { F(h,l,r) :{ $eq: 0 } } )
for h and l chosen arbitrarily rather than fixed. That is, suppose I want to find all records that satisfy F(h1,l1,r) = 0 for some pair (h1, l1). Later, I want to do the same with a pair (h2, l2) such that h1 != h2 and l1 != l2. h and l may take any integer value.
How can I do that?
You can execute this query using the $where operator, but $where cannot use an index, so performance depends on the size of your dataset.
db.c.find({$where: function() { return F(1, "bb", this.r) == 0; }})
Before executing the code above, you need to store your function F on the MongoDB server:
db.system.js.save({
_id: "F",
value: function(h, l, r) {
// the body of function
}
})
See also: "Store a JavaScript Function on the Server" in the MongoDB documentation.
I've also tried a solution that stores the result of the function in your collection, changing the schema as below:
{
"_id": ObjectId,
"r": {
"_key": F(H, L, value),
"value": String
}
}
The field r._key holds the value of F(h,l,r) for fixed h and l, and the field r.value holds the original r field.
So you can create an index on r._key, and your query condition becomes:
db.c.find( { "r._key" : 0 } )
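The precomputed-key idea can be sketched in plain Node.js, with arrays standing in for the collection and `withKey` as a hypothetical helper. Reading SHA256(l | r) as an unsigned 256-bit integer is an assumption; the question does not specify it. The sketch also makes the trade-off visible: r._key is valid only for the one (H, L) pair it was computed with, so querying with a different pair, as the question requires, means recomputing the keys or falling back to the $where scan.

```javascript
const crypto = require("crypto");

// Assumption: SHA256(l | r) is read as an unsigned 256-bit integer
// (hex digest converted to BigInt); the question does not pin this down.
function F(h, l, r) {
  const digest = crypto.createHash("sha256").update(l + r).digest("hex");
  const sum = BigInt("0x" + digest) + BigInt(h);
  return Number(((sum % 3n) + 3n) % 3n); // normalize, in case h is negative
}

// Precompute r._key for ONE fixed (H, L) pair, as in the answer's schema.
function withKey(docs, H, L) {
  return docs.map((d) => ({ _id: d._id, r: { _key: F(H, L, d.r), value: d.r } }));
}

const docs = [{ _id: 1, r: "aa" }, { _id: 2, r: "bb" }, { _id: 3, r: "cc" }];
const keyed = withKey(docs, 1, "bb");

// Same result as the $where scan (F evaluated for every document)...
const viaWhere = docs.filter((d) => F(1, "bb", d.r) === 0).map((d) => d._id);
// ...and as the indexable equality query db.c.find({ "r._key": 0 }).
const viaKey = keyed.filter((d) => d.r._key === 0).map((d) => d._id);
console.log(viaWhere, viaKey);
```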

Iterate and save cursor - MongoDB

I have this array in MongoDB:
I want to iterate and make queries on it:
> for (var i = 0; i < AllegriTeams.length; i++) { a[i] = db.team.find({ _id: AllegriTeams[i].team_id }, { _id: 0, official_name: 1 }) }
At the end of the for loop, the array a contains only the first two official names; I lose the last official_name.
Your loop looks correct and it is not clear why you would not receive three items in your array a.
Check that your variable AllegriTeams has three elements.
> AllegriTeams.length
3
I mimicked your setup and I received the results you expected where a has three elements. Here's what I did:
// 1. Log into mongo and use the "test" database, for example
> use test
// 2. Create data
> db.team.insert({"_id": "Juv.26", "official_name":"Juv.26.xxx"})
WriteResult({ "nInserted" : 1 })
> db.team.insert({"_id": "Mil.74", "official_name":"Mil.74.xxx"})
WriteResult({ "nInserted" : 1 })
> db.team.insert({"_id": "Cag.00", "official_name":"Cag.00.xxx"})
WriteResult({ "nInserted" : 1 })
// 3. Create the AllegriTeams variable
> var AllegriTeams = [ { "team_id":"Juv.26"}, {"team_id":"Mil.74"}, {"team_id":"Cag.00"}]
// 4. Create the "a" array
> var a = []
// 5. Run the for loop. Consider using "findOne" instead of "find".
> for (var i=0; i < AllegriTeams.length; i++) { a[i]=db.team.find({ _id:AllegriTeams[i].team_id}, {_id:0, official_name:1})}
{ "official_name" : "Cag.00.xxx" }
// 6. Get length of "a"
> a.length
3
As an aside, note that find() returns a cursor, so the values stored in your a array are cursors, not documents. A cursor can be iterated only once: in step 5 the shell printed (and thereby iterated) the last cursor, which is likely why the last official_name seems to be lost. Consider using findOne() instead, since it returns a document.
Again, check that your AllegriTeams variable has three array elements.
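To illustrate the cursor-versus-document point, here is a loose analogy in plain JavaScript. A generator stands in for a cursor (lazy, readable once) and an eager lookup stands in for findOne(). `fakeFind` and `fakeFindOne` are hypothetical helpers, not shell functions.

```javascript
// Hypothetical stand-ins: `team` plays the collection, fakeFind returns a lazy,
// one-shot iterator (like a cursor), fakeFindOne returns the document itself.
const team = [
  { _id: "Juv.26", official_name: "Juv.26.xxx" },
  { _id: "Mil.74", official_name: "Mil.74.xxx" },
  { _id: "Cag.00", official_name: "Cag.00.xxx" },
];

function* fakeFind(id) {
  for (const doc of team) {
    if (doc._id === id) yield { official_name: doc.official_name };
  }
}

function fakeFindOne(id) {
  const doc = team.find((d) => d._id === id);
  return doc ? { official_name: doc.official_name } : null;
}

const AllegriTeams = [{ team_id: "Juv.26" }, { team_id: "Mil.74" }, { team_id: "Cag.00" }];

const cursors = AllegriTeams.map((t) => fakeFind(t.team_id));
// Reading a cursor (as the shell does when it prints one) exhausts it.
console.log([...cursors[2]]); // [ { official_name: 'Cag.00.xxx' } ]
console.log([...cursors[2]]); // [] (nothing left to read)

const docs = AllegriTeams.map((t) => fakeFindOne(t.team_id));
console.log(docs.map((d) => d.official_name)); // [ 'Juv.26.xxx', 'Mil.74.xxx', 'Cag.00.xxx' ]
```

Storing documents rather than cursors avoids the problem entirely, because a document can be read any number of times.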

Search in a limited number of records in MongoDB

I want to search in the first 1000 records of my collection named CityDB. I used the following code:
db.CityDB.find({'index.2':"London"}).limit(1000)
but it does not work: it returns the first 1000 matches, whereas I want to search only within the first 1000 records, not all records. Could you please help me?
Thanks,
Amir
Note that a query returns documents in no guaranteed order unless you sort explicitly. Documents in a new collection are usually returned in insertion order, but various things can cause that order to change unexpectedly, so don't rely on it. By the way, auto-generated ObjectId values in _id start with a timestamp, so sorting by _id returns documents (roughly) in creation order.
Now about your actual question: when you want to limit the documents first and then filter this limited set, you can use the aggregation pipeline. It lets you apply a $limit stage first and then a $match stage to the remaining documents.
db.CityDB.aggregate([
// { $sort: { _id: 1 } }, // <- uncomment when you want the first 1000 by creation time
{ $limit: 1000 },
{ $match: { 'index.2': "London" } }
])
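The difference between find().limit() and a $limit stage followed by $match can be modeled on a plain array (hypothetical sample data, no database). The first filters and then truncates; the second truncates and then filters.

```javascript
// Plain-JavaScript model of the two orders of operations (no database involved).
// `cities` is hypothetical sample data; each doc has an `index` array as in the question.
const cities = [
  { _id: 1, index: ["a", "b", "Paris"] },
  { _id: 2, index: ["a", "b", "London"] },
  { _id: 3, index: ["a", "b", "Rome"] },
  { _id: 4, index: ["a", "b", "London"] },
];
const isLondon = (d) => d.index[2] === "London";
const N = 2;

// find({...}).limit(N): filter first, then keep the first N matches.
const findThenLimit = cities.filter(isLondon).slice(0, N);

// aggregate([{ $sort: { _id: 1 } }, { $limit: N }, { $match: {...} }]):
// take the first N documents by _id, then filter only within those.
const limitThenMatch = [...cities]
  .sort((a, b) => a._id - b._id)
  .slice(0, N)
  .filter(isLondon);

console.log(findThenLimit.map((d) => d._id)); // [ 2, 4 ]
console.log(limitThenMatch.map((d) => d._id)); // [ 2 ]
```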
I can think of two ways to achieve this:
1) Keep a global counter. Every time you insert data into your collection, add a field count = currentCounter and increase currentCounter by 1. When you need to select among your k most recently inserted elements, query like this:
db.CityDB.find({
'index.2':"London",
count : {
'$gte' : currentCounter - k
}
})
This is not atomic and may sometimes return more than k elements on a heavily loaded system, but it can use an index.
Here is another approach, which works nicely in the shell:
2) Create your dummy data:
var k = 100;
for(var i = 1; i<k; i++){
db.a.insert({
_id : i,
z: Math.floor(1 + Math.random() * 10)
})
}
output = [];
Now find the records where z == 3 among the first k records in reverse natural order (roughly, the k most recently inserted):
k = 10;
db.a.find().sort({$natural : -1}).limit(k).forEach(function(el){
if (el.z == 3){
output.push(el)
}
})
As you can see, output contains the correct elements:
output
I think it is pretty straightforward to modify my example for your needs.
P.S. Also take a look at the aggregation framework; there might be a way to achieve what you need with it.
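The counter approach (1) can be modeled in plain JavaScript, with an array standing in for the collection; `insert` and `collection` are hypothetical names, not driver APIs. It shows that the predicate count >= currentCounter - k picks the k newest documents, while count < k (assuming the counter starts at 0) would pick the k oldest, which is what the question literally asks for.

```javascript
// In-memory model of the counter approach (hypothetical names, no database).
let currentCounter = 0;
const collection = [];

function insert(doc) {
  // Stamp each document with the counter value, then advance the counter.
  collection.push({ ...doc, count: currentCounter });
  currentCounter += 1;
}

["London", "Paris", "London", "Rome", "London"].forEach((city) => insert({ city }));

const k = 3;
// Same predicate as the answer's query: count >= currentCounter - k (the k newest).
const newestK = collection.filter((d) => d.city === "London" && d.count >= currentCounter - k);
// The k oldest would instead use count < k.
const oldestK = collection.filter((d) => d.city === "London" && d.count < k);

console.log(newestK.map((d) => d.count)); // [ 2, 4 ]
console.log(oldestK.map((d) => d.count)); // [ 0, 2 ]
```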

Search for a record where a value is between two item fields in MongoDB

I have a MongoDB collection with over 5 million items. Each item has "start" and "end" fields containing integer values.
The ranges of different items never overlap.
e.g. this would be invalid:
{start:100, end:200}
{start:150, end:250}
I am trying to locate an item where a given value is between start and end
start <= VALUE <= end
The following query works, but it takes 5 to 15 seconds to return:
db.blocks.find({ "start" : { $lt : 3232235521 }, "end" :{ $gt : 3232235521 }}).limit(1);
I've added the following indexes for testing, with very little improvement:
db.blocks.ensureIndex({start:1});
db.blocks.ensureIndex({end:1});
// and also a compound one
db.blocks.ensureIndex({start:1,end:1});
Edit:
Running explain() on the query gives:
> db.blocks.find({ "start" : { $lt : 3232235521 }, "end" :{ $gt : 3232235521 }}).limit(1).explain();
{
"cursor" : "BtreeCursor end_1",
"nscanned" : 1160982,
"nscannedObjects" : 1160982,
"n" : 0,
"millis" : 5779,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"end" : [
[
3232235521,
1.7976931348623157e+308
]
]
}
}
What would be the best approach to speeding this specific query up?
Actually, I'm working on a similar problem, and a friend of mine found a nice way to solve it.
If your data doesn't overlap, you can:
1) query on the start field, sorting by start in descending order and taking one result;
2) validate that result against the end field.
For example:
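function findRange(x) {
// Closest range that starts at or before x (uses the index on start)
var result = db.collection.find({start: {$lte: x}}).sort({start: -1}).limit(1).toArray()[0];
// Validate with end: x may fall in a gap between ranges
if (result && result.end >= x) {
return result;
}
return null; // no range contains x
}
findRange(100);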
If you are sure that a range containing x always exists, you don't need to validate the result.
With this approach you only need an index on a single field (start here, or end if you mirror the query), and the query becomes much faster.
Edit:
I did some benchmarks: with the compound index a query takes 100-100,000 ms, whereas with a single index it takes 1-5 ms.
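The same idea can be modeled on a sorted in-memory array, where a binary search plays the role of the index on start. This is a sketch assuming non-overlapping ranges, as in the question; `blocks` is sample data and this `findRange` is a plain-JavaScript analogue, not the shell function.

```javascript
// In-memory model of the "sort by start, take one, check end" technique.
function findRange(sortedBlocks, x) {
  // Binary search for the last block whose start <= x (the array analogue of
  // find({start: {$lte: x}}).sort({start: -1}).limit(1)).
  let lo = 0;
  let hi = sortedBlocks.length - 1;
  let candidate = null;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (sortedBlocks[mid].start <= x) {
      candidate = sortedBlocks[mid];
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  // Validate against end: the candidate may end before x (x falls in a gap).
  return candidate !== null && candidate.end >= x ? candidate : null;
}

const blocks = [
  { start: 0, end: 99 },
  { start: 100, end: 200 },
  { start: 300, end: 400 },
].sort((a, b) => a.start - b.start);

console.log(findRange(blocks, 150)); // { start: 100, end: 200 }
console.log(findRange(blocks, 250)); // null (gap between 200 and 300)
```

Because ranges never overlap, only the nearest start at or below x can contain x, which is why a single-field lookup plus one check suffices.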
I would expect a compound index to work faster for you:
db.blocks.ensureIndex({start:1, end:1});
You can also use explain() to see the number of scanned objects, etc., and choose the best index.
Also, if you are using MongoDB < 2.0, upgrade to 2.0+, where indexes are faster.
You can also limit the results to optimize the query.
This might help: introduce some redundancy. If the interval lengths don't vary much, you can add a tag field to each record. The tag is a single value or string that represents a large interval: for example, tag 50,000 marks all records whose intervals lie at least partially in the range 0-50,000, tag 100,000 marks those in the range 50,000-100,000, and so on. You can then index on the tag as the primary key and on one of the record's range endpoints as the secondary key.
Records that straddle the edge of a big interval get more than one tag, so the tag field becomes a multikey (array) field. At query time, you compute the big-interval tag for the value and use it in the query.
As a starting point for tests, aim for roughly the square root of the total number of records per tag, then fine-tune the big-interval size.
Of course, this makes writes a bit slower.
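The tagging scheme can be sketched in plain JavaScript. The bucket width `W`, the half-open buckets [tag - W, tag), and the helper names are assumptions for illustration; the answer leaves these details open.

```javascript
// Hypothetical bucket width; the answer suggests tuning it so that each tag
// holds roughly sqrt(total records) records.
const W = 50000;

// Tag of a value: the upper bound of its big interval, with buckets [tag - W, tag).
const tagOf = (v) => (Math.floor(v / W) + 1) * W;

// A record gets every tag its [start, end] range touches (a multikey array).
function tagsFor(start, end) {
  const tags = [];
  for (let t = tagOf(start); t <= tagOf(end); t += W) tags.push(t);
  return tags;
}

const records = [
  { start: 10, end: 40000 },
  { start: 45000, end: 60000 }, // straddles the 50,000 edge -> two tags
  { start: 120000, end: 130000 },
].map((r) => ({ ...r, tags: tagsFor(r.start, r.end) }));

// Query: compute the value's tag, narrow by tag, then check the range.
// Roughly { tags: tagOf(x), start: { $lte: x }, end: { $gte: x } } in MongoDB.
function lookup(x) {
  const t = tagOf(x);
  return records.filter((r) => r.tags.includes(t) && r.start <= x && r.end >= x);
}

console.log(records[1].tags); // [ 50000, 100000 ]
console.log(lookup(55000).length); // 1
```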

MongoDB: how to select items with nested array count > 0

The database is nearly 5 GB. I have documents like:
{
_id: ..,
user: "a",
hobbies: [{
_id: ..,
name: "football"
},
{
_id: ..,
name: "beer"
}
...
]
}
I want to return users who have more than 0 "hobbies".
I've tried
db.collection.find({"hobbies" : { $gt : 0}}).limit(10)
and it consumes all the RAM and returns no result.
How do I conduct this query?
And how do I return only id, name, and count?
How do I do it with the official C# driver?
TIA
P.S.
Nearby, I've found:
"Add a new field to handle category size. It's a usual practice in the Mongo world."
Is this true?
In this specific case, you can use list indexing to solve your problem:
db.collection.find({"hobbies.0" : {$exists : true}}).limit(10)
This just makes sure a 0th element exists. In the same way, you can check that the list is shorter than n, or that its length is between x and y, by checking whether elements exist at the ends of the range.
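The positional-existence trick amounts to "element i exists exactly when the array is longer than i". Here is a plain-JavaScript model of it (hypothetical helper names, sample users, no database); it assumes the lower bound x is at least 1.

```javascript
// "arr.i" exists  <=>  arr is an array with length > i.
const hasIndex = (arr, i) => Array.isArray(arr) && arr.length > i;

// { "hobbies.0": { $exists: true } }  ->  at least one element
const nonEmpty = (doc) => hasIndex(doc.hobbies, 0);

// { "hobbies.<x-1>": { $exists: true }, "hobbies.<y>": { $exists: false } }
//   ->  length between x and y inclusive (assumes x >= 1)
const lengthBetween = (doc, x, y) => hasIndex(doc.hobbies, x - 1) && !hasIndex(doc.hobbies, y);

const users = [
  { user: "a", hobbies: [{ name: "football" }, { name: "beer" }] },
  { user: "b", hobbies: [] },
  { user: "c" }, // no hobbies field at all
  { user: "d", hobbies: [{ name: "chess" }, { name: "golf" }, { name: "tennis" }] },
];
console.log(users.filter(nonEmpty).map((u) => u.user)); // [ 'a', 'd' ]
console.log(users.filter((u) => lengthBetween(u, 1, 2)).map((u) => u.user)); // [ 'a' ]
```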
Have you tried using hobbies.length? I haven't tested this, but I believe this is the right way to query on array length in MongoDB:
db.collection.find({$where: '(this.hobbies && this.hobbies.length > 0)'})
You can (sort of) check for a range of array lengths with the $size operator using a logical $not:
db.collection.find({array: {$not: {$size: 0}}})
That's somewhat true.
According to the manual:
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24size
$size
The $size operator matches any array with the specified number of elements. The following example would match the object {a:["foo"]}, since that array has just one element:
db.things.find( { a : { $size: 1 } } );
You cannot use $size to find a range of sizes (for example: arrays with more than 1 element). If you need to query for a range, create an extra size field that you increment when you add elements.
So you can check for array size 0, but not for things like "larger than 0".
Earlier questions explain how to handle the array-count issue. In your case, though, if zero really is the only value you want to test for, you could set the array to null when it's empty and configure the driver not to serialize null values; then you can test for the existence of that field. Remember to check for null and to create the array when you add a hobby to a user.
For your second question, provided you add the count field, it's easy to select which fields to return from the database, including the count field.
If you only need to tell zero hobbies apart from some hobbies, and the hobbies key is not set for someone with zero hobbies, use the $exists operator.
Add an index on "hobbies" to improve performance:
db.collection.find( { hobbies : { $exists : true } } );
However, if a person with zero hobbies has an empty array and a person with one hobby has a one-element array, use this generic solution:
Maintain a field called "hcount" (hobby count) and keep it equal to the size of the hobbies array on every update.
Create an index on the "hcount" field.
Then you can run queries like:
db.collection.find( { hcount : 0 } ) // people with 0 hobbies
db.collection.find( { hcount : 5 } ) // people with 5 hobbies
3 - As in JohnP's answer, $size is also a good operator for this purpose (for exact counts).
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24size
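The hcount approach depends on every write updating both fields together. Here is a minimal in-memory sketch with hypothetical helpers and no database. In MongoDB, the natural analogue is a single update per document that changes both fields at once, for example $push on hobbies combined with $inc on hcount, since a single-document update is atomic.

```javascript
// Hypothetical in-memory helpers (no database): every write that changes
// hobbies also updates hcount, so hcount always equals hobbies.length.
function addHobby(user, hobby) {
  user.hobbies.push(hobby);
  user.hcount += 1; // MongoDB analogue: { $push: { hobbies: hobby }, $inc: { hcount: 1 } }
}

function setHobbies(user, hobbies) {
  user.hobbies = hobbies;
  user.hcount = hobbies.length; // MongoDB analogue: one $set of both fields
}

const users = [
  { user: "a", hobbies: [], hcount: 0 },
  { user: "b", hobbies: [], hcount: 0 },
  { user: "c", hobbies: [], hcount: 0 },
];
addHobby(users[0], { name: "football" });
addHobby(users[0], { name: "beer" });
setHobbies(users[2], [{ name: "chess" }]);

// Equivalents of db.collection.find({ hcount: 0 }) and find({ hcount: 2 })
console.log(users.filter((u) => u.hcount === 0).map((u) => u.user)); // [ 'b' ]
console.log(users.filter((u) => u.hcount === 2).map((u) => u.user)); // [ 'a' ]
```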