Remove given number of records in Mongodb - mongodb

I have Too much records in my Collection, can I have only desired number of records and remove others without any condition?
I have a collection called Products with around 10,0000 of records and its slowing down my Local application, I am thinking to shrink this huge amount of records to something around 1000, How can do it?
OR
How to copy a collection with limited number of records?

If you want to copy collection with limited number of records without any filter condition, for loop can be used . It copies 1000 document form originalCollection to copyCollection.
db.originalCollection.find().limit(1000).forEach( function(doc)
{db.copyCollection.insert(doc)} );

Related

MongoDb optimisation

Say I have a MongoDB collection with 5,000 records (or thousands). I want to update this collection with new records.
These new records can be the same or with few updated records.
To do this, I have 2 approaches:
Iterate over these records and determine which ones to update (based on a condition). If the condition is truthy update the record.
Delete all the existing records (deleteMany), and then simply add new data.
Which one would be more performant and optimised? And why? Or this there another way?

MongoDB Collection Structure Performance

I have a MongoDB database of semi-complex records and my reporting queries are struggling as the collection size increases. I want to make some reporting Views that are optimized for quick searching and aggregating. Here is an sample format:
var record = {
fieldOne:"",
fieldTwo:"",
fieldThree:"", //There is approx 30 fields at this level
ArrayOne:[
{subItem1:""},
{subItem2:""} // There are usually about 10-15 items in this array
],
ArrayTwo:[
{subItem1:""}, //ArrayTwo items reference ArrayOne item ids for ref
{subItem2:""} // There are usually about 20-30 items in this array
],
ArrayThree:[
{subItem1:""},// ArrayThree items reference both ArrayOne and ArrayTwo items for ref
{subItem2:""},// There are usually about 200-300 items in this array
{subArray:[
{subItem1:""},
{subItem2:""} // There are usually about 5 items in this array
]}
]
};
I used to have this data where ArrayTwo was inside ArrayOne items and ArrayThree was inside ArrayTwo items so that referencing a parent was implied, but reporting became a nightmare with multiple nested levels of arrays.
I have a field called 'fieldName' at every level which is a way we target objects in the arrays.
I will often need to aggregate values from any of the 3 arrays across thousands of records in a query.
I see two ways of doing it.
A). Flatten and go Vertically, making a single smaller record in the database for every item in ArrayThree, essentially adding 200 records per single complex record. I tried this and I already have 200K records in 5 days of new data coming in. The benefit to this is that I have fieldNames that I can put indexing on.
B). Flatten Horizontally, making every array flat all within a single collection record. I would use the FieldName located in each array object as the key. This would make a single record with 200-300 fields in it. This would make a lot less records in the collection, but the fields would be dynamic, so adding indexes would not be possible(that I know of).
At this time, I have approx 300K existing records that I would be building this View off of. If I go vertical, that would place 60 Million simple records in the db and if I go Horizontal, it would be 300K records with 200 fields flattened in each with no indexing ability.
What's the right way to approach this?
I'd be inclined to stick with the mongo philosophy and do individual entries for each distinct set/piece of information, rather than relying on references within a weird composite object.
60 Million records is "a lot" (but it really isn't "a ton"), and mongodb loves to have lots of little things tossed at it. On the flipside, you'd end up with fewer big objects and take up just as much space.
(*using the wired tiger back end with compression will make your disk go further too).
**edit:
I'd also add that you really really really want indexes at the end of the day, so that's another vote for this approach.

Get top 50 records for a certain value w/ mongo and meteor

In my meteor project, I have a leaderboard of sorts, where it shows players of every level on a chart, spread across every level in the game. For simplicitys sake, lets say there are levels 1-100. Currently, to avoid overloading meteor, I just tell the server to send me every record newer than two weeks old, but that's not sufficient for an accurate leaderboard.
What I'm trying to do is show 50 records representing each level. So, if there are 100 records at level 1, 85 at level 2, 65 at level 3, and 45 at level 4, I want to show the latest 50 records from each level, making it so I would have [50, 50, 50, 45] records, respectively.
The data looks something like this:
{
snapshotDate: new Date(),
level: 1,
hp: 5000,
str: 100
}
I think this requires some mongodb aggregation, but I couldn't quite figure out how to do this in one query. It would be trivial to do it in two, though - select all records, group by level, sort each level by date, then take the last 50 records from each level. However, I would prefer to do it in one operation, if I could. Is it currently possible to do something like this?
Currently there is no way to pick up n top records of a group, in the aggregation pipeline. There is an unresolved open ticket regarding this: https://jira.mongodb.org/browse/SERVER-9377.
There are two solutions to this:
Keep your document structure as it is now and aggregate, but,
grab the n top records and slice off the remaining records for each group, in the client side.
Code:
var top_records = [];
db.collection.aggregate([
// The sort operation needs to come before the $group,
// because once the records are grouped by level,
// there exists only one document per group.
{$sort:{"snapshotDate":-1}},
// Maintain all the records in an array in sorted order.
{$group:{"_id":"$level","recs":{$push:"$$ROOT"}}},
],{allowDiskUse: true}).forEach(function(level){
level.recs.splice(50); //Keep only the top 50 records.
top_records.push(level);
})
Remember that this loads all the documents for each level and removes the unwanted records in the client side.
Alter your document structure to accomplish what you really need. If
you only need the top n records always, keep them always in sorted
order in the root document.This is accomplished using a sorted capped array.
Your document would look like this:
{
level:1,
records:[{snapshotDate:2,hp:5000,str:100},
{snapshotDate:1,hp:5001,str:101}]
}
where, records is an capped array of size n and always has sub documents sorted in descending order of their snapshotDate.
To make the records array work that way, we always perform an update operation when we need to insert documents to it for any level.
db.collection.update({"level":1},
{$push:{
recs:{
$each:[{snapshotDate:1,hp:5000,str:100},
{snapshotDate:2,hp:5001,str:101}],
$sort:{"snapshotDate":-1},
$slice:50 //Always trim the array size to 50.
}
}},{upsert:true})
What this does is, is always keeps the size of the records array to 50 and always sorts the records whenever new sub documents are inserted at a level.
A simple find, db.collection.find({"level":{$in:[1,2,..]}}), would give you the top 50 records in order, for each selected level.

what is the Recommended max emits in map function?

I am new to mongoDb and planning to use map reduce for computing large amount of data.
As you know we have map function to match the criteria and then emit the required data for a given filed. In my map function I have multiple emits. As of now I have 50 Fields emitted from a single document. That means from a single document in a collection explodes to 40 document in temp table. So if I have 1 million documents to be processed it will 1million * 40 documents in temp table by end of map function.
The next step is to sort on this collection. (I haven't used sort param of map will it help?)
Thought of splitting the map function into two….but then one more problem …while performing map function if by chance I ran into an exception thought of skipping entire document data (I.e not to emit any data from that document) but if I split I won't be able to….
In mongoDB.org i found a comment which said..."When I run MR job, with sort - it takes 1.5 days to reach 23% at first stage of MR. When I run MR job, without sort, it takes about 24-36 hours for all job.Also when turn off jsMode is speed up my MR twice ( before i turn off sorting )"
Will enabling sort help? or Will turning OFF jsmode help? i am using mongo 2.0.5
Any suggestion?
Thanks in advance .G
The next step is to sort on this collection. (I haven't used sort param of map will it help?)
Don't know what you mean, MR's don't have sort params, only the incoming query has a sort param. The sort param of the incoming query only sorts the data going in. Unless you are looking for some specific behaviour that will avoid sorting the final output using an incoming sort you don't normally need to sort.
How are you looking to use this MR. Obviusly it won't be in realtime else you would just kill your servers so Ima guess it is a background process that runs and formats data to the way you want. I would suggest looking into incremental MRs so that you do delta updates throughout the day to limit the amount of resources used at any given time.
So if I have 1 million documents to be processed it will 1million * 40 documents in temp table by end of map function.
Are you emiting multiple times? If not then the temp table should have only one key per row with documents of the format:
{
_id: emitted_id
[{ //each of your docs that you emit }]
}
This is shown: http://kylebanker.com/blog/2009/12/mongodb-map-reduce-basics/
or Will turning OFF jsmode help? i am using mongo 2.0.5
Turning off jsmode is unlikely to do anything significant and results from it have varied.

Is there a way to limit the number of records in certain collection

Assuming I insert the following records (e.g. foo1, foo2, foo3, foo4, .... foo10)
I would like the collection to retain only 5 records at any point in time (e.g. it could be foo1, ... foo5 OR foo2, ... foo6 or foo6, ... foo10)
How should I achieve this?
Sounds like you're looking for a capped collection:
Capped collections are fixed sized collections...
[...]
Once the space is fully utilized, newly added objects will replace the oldest objects in the collection.
You can achieve this using a command similar to
db.createCollection("collectionName",{capped:true,size:10000,max:5})
where 10000 is the size in bytes and 5 is the maximum number of documents you would restrict in the collection.
By using
db.createCollection("collectionName",{capped:true,size:10000,max:1000})
command, limits can be put on the no. of records in collection
Capped Collections sound great but it limits remove operation.
In a capped Collection you can only insert new elements and old elements will be removed automatically when certain size is reached.
However if you want to insert and delete any document in any order and limit the maximum number of document in a collection, mongoDB does not offer a direct solution.
When I encounter this problem I use an extra count variable in another collection.
This collection has a validation rule that avoids count variables to become negative.
Count variable should always be non negative integer number.
"count": {
"$gte": 0
}
The algorithm was simple.
Decrement the count by one.
If it succeed insert the document.
If it fails it means there is no space left.
Vice versa for deletion.
Also you can use transactions to prevent failures(Count is decremented but service is failed just before insertion operation).