Best way to get the first inserted record in a MongoDB collection

I need to fetch the first inserted record in a MongoDB collection. I am currently using the query below:
db.users.find({}).sort({"created_at":1}).limit(1);
However, this uses a lot of memory. The collection has about 100K records.
What is an efficient way to do this?

The MongoDB _id field is a unique identifier. If you do not supply one, it is generated automatically when a document is inserted into a collection.
By default the _id field stores an ObjectId value, and it is always indexed.
According to MongoDB documentation,
The 12-byte ObjectId value consists of:
4-byte value representing the seconds since the Unix epoch,
3-byte machine identifier,
2-byte process id, and
3-byte counter, starting with a random value.
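As a rough illustration of the layout quoted above, the sketch below splits the 24-character hexadecimal form of an ObjectId into those fields. The helper name parseObjectId is made up for this example and is not part of MongoDB. Newer MongoDB versions document the five middle bytes as a single random value rather than machine and process identifiers, so only the timestamp and counter should be relied on.

```javascript
// Hypothetical helper (not a MongoDB API): split a 24-character hex
// ObjectId string into the fields of the layout quoted above.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  // The first 4 bytes (8 hex digits) are seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return {
    createdAt: new Date(seconds * 1000),
    machine: hex.slice(8, 14),                // 3 bytes (legacy layout)
    processId: hex.slice(14, 18),             // 2 bytes (legacy layout)
    counter: parseInt(hex.slice(18, 24), 16), // 3 bytes
  };
}

const parts = parseObjectId("507f1f77bcf86cd799439011");
console.log(parts.createdAt.toISOString()); // 2012-10-17T21:13:27.000Z
```

In the mongo shell the same timestamp is available through the getTimestamp() method of an ObjectId.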
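Based on the description in the question, to fetch the first inserted record, try running the following find operation in the MongoDB shell:
db.users.find({}).sort({"_id":1}).limit(1);
This query sorts the results by the _id field. Because an ObjectId starts with a Unix epoch timestamp, ascending _id order closely follows insertion order, and because _id is indexed the server can read the first entry from the index instead of sorting the whole collection in memory. Note that the timestamp has one-second resolution, so documents created within the same second by different clients are not guaranteed to sort in exact insertion order.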

In addition, you can add filters to the query to get the first record inserted that matches some criteria.
For example, suppose your collection stores employees from the IT, ADMIN and FINANCE departments and you want the first document inserted for IT (i.e. the first IT employee). Then you can execute:
db.users.find({"Dept" : "IT"}).sort({"_id":1}).limit(1);
Similarly, to find the last IT employee:
db.users.find({"Dept" : "IT"}).sort({"_id":-1}).limit(1);
Note: the unfiltered query only needs the first entry of the _id index, but on large or sharded collections the filtered queries can take considerable time when no index supports the filter, because the server may have to walk the _id index in ascending or descending order until it finds a matching document. A compound index such as {Dept: 1, _id: 1} avoids that scan.
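To make the semantics of the filtered first and last queries concrete, here is a small in-memory sketch in plain JavaScript. It is only an illustration under stated assumptions: the function firstBySortedId and the sample users are invented for this example, _id values are represented as lowercase hex strings, and nothing here talks to a real MongoDB server.

```javascript
// In-memory stand-in for find(filter).sort({_id: direction}).limit(1).
// `docs`, `filter` and `direction` are illustrative names, not MongoDB API.
function firstBySortedId(docs, filter, direction) {
  // Keep only documents whose fields equal every key in the filter.
  const matches = docs.filter((doc) =>
    Object.keys(filter).every((key) => doc[key] === filter[key])
  );
  // Equal-length lowercase hex strings compare like the underlying bytes.
  matches.sort(
    (a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0) * direction
  );
  return matches.length > 0 ? matches[0] : null;
}

const users = [
  { _id: "5f0000020000000000000002", name: "Bea", Dept: "IT" },
  { _id: "5f0000010000000000000001", name: "Al", Dept: "IT" },
  { _id: "5f0000030000000000000003", name: "Cy", Dept: "FINANCE" },
  { _id: "5f0000040000000000000004", name: "Di", Dept: "IT" },
];

console.log(firstBySortedId(users, { Dept: "IT" }, 1).name);  // Al
console.log(firstBySortedId(users, { Dept: "IT" }, -1).name); // Di
```

The sketch filters everything before sorting, which is exactly the cost a supporting index lets the server avoid.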

Related

Remove obsolete collection in mongodb

I want to delete all the collections from my db which have not been used for a long time. Is there any way I can check when a particular collection was last used?
It depends what you mean by 'last used'. If you mean the last time a document was inserted into the collection, you can convert the ObjectId of the most recently inserted document into a date. Assuming the _id values are auto-generated ObjectIds, the following query should return the date the last document was inserted:
db.<collection_name>.find({}, {_id: 1}).sort({_id: -1}).limit(1).next()._id.getTimestamp()
Sorting on _id in descending order puts the document with the newest ObjectId first. A plain findOne({}) would instead return the first document in natural order, which is typically one of the oldest documents rather than the most recent. You can then take the _id field and call its getTimestamp() function.
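Once the newest _id is known, deciding whether a collection looks obsolete is simple date arithmetic. The sketch below is a hedged illustration in plain JavaScript: daysSinceObjectId and isStale are hypothetical helper names, the _id is passed as its 24-character hex string, and the 30-day threshold is an arbitrary example value.

```javascript
// Hypothetical helper: whole days between the creation time embedded in an
// ObjectId (given as 24 hex characters) and the reference time `now`.
function daysSinceObjectId(hex, now = new Date()) {
  const seconds = parseInt(hex.slice(0, 8), 16); // leading 4-byte timestamp
  const ageMs = now.getTime() - seconds * 1000;
  return Math.floor(ageMs / (24 * 60 * 60 * 1000));
}

// Hypothetical policy check: treat a collection as stale when its newest
// document is older than `maxAgeDays`.
function isStale(newestIdHex, maxAgeDays, now = new Date()) {
  return daysSinceObjectId(newestIdHex, now) > maxAgeDays;
}

// A fixed reference date keeps the example reproducible.
const reference = new Date("2013-01-01T00:00:00Z");
console.log(daysSinceObjectId("507f1f77bcf86cd799439011", reference)); // 75
console.log(isStale("507f1f77bcf86cd799439011", 30, reference));       // true
```

This only reflects inserts. Updates, deletes and reads do not change any _id, so a collection can be in active use and still look stale by this measure.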
I'm not sure if there is any way to reliably tell when a collection was last queried. If you're running your database with profiling enabled then there might be entries in the db.system.profile collection. The oplog may also help, although it only records write operations, not reads.

inserting into a mongodb collection with manually specified _id

Suppose I have a collection in mongodb and the objects in the collection have an _id that is an ObjectID that I selected in some random manner completely external to mongoDB, such as starting with ObjectID 0000 ... 0000 and incrementing by 10000, or maybe just used a random number generator to make the ObjectID's.
Suppose I then go to add another item to the collection, but I don't have an ObjectID in mind for the new object, and am satisfied with letting the system pick one. Would the system ever select a ObjectID that was already a part of the collection?
If it is relevant, I am using the java API and the python API to do this.
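_id is always unique within a collection. There is an implicit unique index on the _id field.
So an insert can never end up with an _id that is already taken: if a generated value did collide with an existing one, the insert would fail with a duplicate key error instead of overwriting the existing document.
Also, an ObjectId is a 12-byte value (usually shown as a 24-character hexadecimal string) in which:
4 bytes are the timestamp.
3 bytes are a machine identifier.
2 bytes are the process id.
3 bytes are the counter.
So if MongoDB or the driver generates an _id for your document, a collision with the manually chosen _id values already in your collection is extremely unlikely, and the unique index guarantees that two documents in the collection never share an _id.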
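The behaviour described above can be mimicked with a toy in-memory collection. This is only a sketch under stated assumptions, not how MongoDB is implemented: ToyCollection and its methods are invented for the example, the generated ids merely imitate the ObjectId layout, and Math.random is used purely for illustration.

```javascript
// Toy in-memory collection that imitates a unique index on _id.
class ToyCollection {
  constructor() {
    this.docs = new Map(); // _id -> stored document
    // 3-byte counter that starts at a random value.
    this.counter = Math.floor(Math.random() * 0x1000000);
  }

  // Build an ObjectId-like hex string:
  // 4-byte timestamp + 5 filler bytes + 3-byte counter = 24 hex characters.
  generateId() {
    const hex = (value, width) => value.toString(16).padStart(width, "0");
    const seconds = Math.floor(Date.now() / 1000);
    const filler = Math.floor(Math.random() * 0x10000000000); // 5 bytes
    this.counter = (this.counter + 1) % 0x1000000;
    return hex(seconds, 8) + hex(filler, 10) + hex(this.counter, 6);
  }

  // Keep a supplied _id untouched, generate one otherwise, and reject
  // duplicates the way a unique index would.
  insert(doc) {
    const _id = doc._id !== undefined ? doc._id : this.generateId();
    if (this.docs.has(_id)) {
      throw new Error("duplicate key error: _id " + _id);
    }
    const stored = { ...doc, _id };
    this.docs.set(_id, stored);
    return stored;
  }
}

const coll = new ToyCollection();
coll.insert({ _id: "000000000000000000000000", name: "manual" });
const generated = coll.insert({ name: "generated" });
console.log(generated._id.length); // 24
```

Inserting a second document with _id "000000000000000000000000" into coll throws, which is the in-memory analogue of a duplicate key error.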
Hope this helps.

Mongodb query forward from username X

I have a problem with long Mongo find results. How can I start a query at a given _id X and continue forward from there?
For example, I have a collection with the details of 1000 users, and I know there is a user called Peter in it. I can run Users.find({userName: "Peter"}) to get this user's _id, but how can I also get all the users after him, without returning the documents that come before "Peter"?
With the little information you have given, you need to do this in two steps:
Get the id of the first record that matches the name "Peter".
db.test.findOne({"userName":"Peter"},{"_id":1});
The documentation describes findOne as follows: "Returns one document that satisfies the specified query criteria. If multiple documents satisfy the query, this method returns the first document according to the natural order which reflects the order of documents on the disk. In capped collections, natural order is the same as insertion order."
Once you have the id of Peter's record, you can retrieve that record and every record after it by selecting the ids greater than or equal to it:
db.test.find({"_id":{$gte:x}});
Here x is the _id value returned by the first query. Use $gt instead of $gte if Peter's own record should be excluded.
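The two steps can be traced on plain data. The following is an in-memory sketch in JavaScript, not a query against a real server: fromUserOnward and the sample people array are invented for illustration, and the _id values are shown as lowercase hex strings.

```javascript
// In-memory stand-in for the two shell queries above.
function fromUserOnward(docs, userName) {
  // Step 1: like findOne({userName: ...}), take the first matching document.
  const start = docs.find((doc) => doc.userName === userName);
  if (!start) return [];
  // Step 2: like find({_id: {$gte: x}}), keep ids at or after the start id,
  // returned here in ascending _id order.
  return docs
    .filter((doc) => doc._id >= start._id)
    .sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

const people = [
  { _id: "5f0000010000000000000001", userName: "Anna" },
  { _id: "5f0000020000000000000002", userName: "Peter" },
  { _id: "5f0000030000000000000003", userName: "Quinn" },
  { _id: "5f0000040000000000000004", userName: "Rosa" },
];

console.log(fromUserOnward(people, "Peter").map((doc) => doc.userName));
// [ 'Peter', 'Quinn', 'Rosa' ]
```

"After" here means "with a larger _id", which matches insertion order only as closely as the ObjectId timestamps do.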

Does _id field change in MongoDB when copying data from one collection to another?

We are planning on using MongoDB _id as a key that we would provide to the client. Therefore, the requirement is that this key should not change if we ever need to move the data from one collection to another. The copy will be performed using db.copyDatabase() or mongoimport.
One of the ways in which data can be copied from one collection to another is iterating through the documents in the first collection(C1) and inserting these documents in the second collection(C2). In this case _id should remain the same(in C2) because it would be present in the documents(of C1) being inserted(same as the case in which we would provide an _id ourselves).
However, if there is an alternate way in which documents are copied, the _id might change since it depends on :
(1) The UNIX timestamp
(2) Machine identifier
(3) ProcessId
(**This should only happen if MongoDB while copying removes _id from documents in C1 and regenerated them while inserting into C2?)
We want the _id values to be the same irrespective of the location of the destination collection:
(1) within the same database
(2) different database - same machine
(3) different database - different machine
Thanks
No, the _id values will not change.
A new ObjectId is generated when a document without an _id field is inserted into the database. When you insert a document which already has an _id field, MongoDB won't touch it.
The timestamp, machine identifier and processID refer to those where the ObjectID was generated. This can be a database server, but it can also be generated by the MongoDB driver on the application server. In that case MongoDB will not change it on its own.
By the way: the _id can be an auto-generated ObjectId, but it doesn't have to be. You can also use any other value as _id, as long as you can guarantee that it's unique. So when your data already has a natural key, you can use this as _id if you want to.

Difference between "id" and "_id" fields in MongoDB

Is there any difference between using the field ID or _ID from a MongoDB document?
I am asking this, because I usually use "_id", however I saw this sort({id:-1}) in the documentation: http://www.mongodb.org/display/DOCS/Optimizing+Object+IDs#OptimizingObjectIDs-Sortbyidtosortbyinsertiontime
EDIT
Turns out the docs were wrong.
I expect it's just a typo in the documentation. The _id field is the primary key for every document. It is called _id; some drivers and object mappers also expose it as id. Attempting to use an id key may result in an illegal ObjectId format error.
That section is just indicating that the automatically generated ObjectIDs start with a timestamp so it's possible to sort your documents automatically. This is pretty cool since the _id is automatically indexed in every collection. See http://www.mongodb.org/display/DOCS/Object+IDs for more information. Specifically under "BSON ObjectID Specification".
A BSON ObjectID is a 12-byte value consisting of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter. Note that the timestamp and counter fields must be stored big endian unlike the rest of BSON.
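Because the timestamp occupies the leading bytes in big-endian order, a boundary ObjectId for a given date can be built by writing the seconds as 8 hex digits and padding the rest with zeros. The helper below is a sketch with an invented name, objectIdHexFromDate, and is not part of MongoDB.

```javascript
// Hypothetical helper: the smallest ObjectId-style hex string whose embedded
// timestamp equals the given date (truncated to whole seconds).
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  // toString(16) writes the most significant digit first, which matches the
  // big-endian byte order of the timestamp field.
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

console.log(objectIdHexFromDate(new Date("2012-10-17T21:13:27Z")));
// 507f1f770000000000000000
```

In the mongo shell such a string could be wrapped as ObjectId("...") and used in a range filter on _id, for example with $gte, to select documents created on or after that date, assuming the _id values are auto-generated ObjectIds.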
The _id field is the default field for BSON ObjectIds, and it is indexed by default.
_id and id are not the same. You may also add a field called id if you want, but it will not be indexed unless you add an index.
It is just a typo in the docs.
In Mongoid, id is an alias for _id; calling id returns the _id of the document.
https://github.com/mongodb/mongoid/blob/master/lib/mongoid/fields.rb#L47
If the _id field is not specified, an ObjectId is generated automatically.
My two cents:
The _id field
MongoDB assigns an _id field to each document and creates a primary index on it. Secondary indexes can be added as well. By default, MongoDB creates values of type ObjectID for the _id field. This type is defined in the BSON spec and is structured this way:
ObjectID (12 bytes, shown as a 24-character hex string) = Date (4 bytes, a timestamp value representing the number of seconds since the Unix epoch) + machine identifier (3 bytes) + PID (2 bytes) + Counter (3 bytes)