I want to get the most recent record in a collection. How do I do that?
Note: I know the following command-line queries work:
1. db.test.find().sort({"idate":-1}).limit(1).forEach(printjson);
2. db.test.find().skip(db.test.count()-1).forEach(printjson)
where idate holds the timestamp of when the record was added.
The problem is that the bigger the collection gets, the longer it takes to get the data back, and my 'test' collection is really, really huge. I need a query with a constant-time response.
If there is a better MongoDB command-line query, please let me know.
This is a rehash of the previous answer, but it is more likely to work across different MongoDB versions.
db.collection.find().limit(1).sort({$natural:-1})
This will give you the last document in a collection:
db.collectionName.findOne({}, {sort:{$natural:-1}})
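$natural:-1 means the reverse of the natural order, which approximates the order in which the records were inserted. Natural order is only guaranteed to match insertion order for capped collections; for ordinary collections it is an internal storage detail you should not rely on.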
Edit: For all the downvoters, the above is Mongoose syntax; the mongo CLI syntax is:
db.collectionName.find({}).sort({$natural:-1}).limit(1)
Yet another way of getting the last item from a MongoDB collection (never mind the example data):
> db.collection.find().sort({'_id':-1}).limit(1)
Unsorted query
> db.Sports.find()
{ "_id" : ObjectId("5bfb5f82dea65504b456ab12"), "Type" : "NFL", "Head" : "Patriots Won SuperBowl 2017", "Body" : "Again, the Pats won the Super Bowl." }
{ "_id" : ObjectId("5bfb6011dea65504b456ab13"), "Type" : "World Cup 2018", "Head" : "Brazil Qualified for Round of 16", "Body" : "The Brazilians are happy today, due to the qualification of the Brazilian Team for the Round of 16 for the World Cup 2018." }
{ "_id" : ObjectId("5bfb60b1dea65504b456ab14"), "Type" : "F1", "Head" : "Ferrari Lost Championship", "Body" : "By two positions, Ferrari loses the F1 Championship, leaving the Italians in tears." }
Sorted query (_id: reverse order)
> db.Sports.find().sort({'_id':-1})
{ "_id" : ObjectId("5bfb60b1dea65504b456ab14"), "Type" : "F1", "Head" : "Ferrari Lost Championship", "Body" : "By two positions, Ferrari loses the F1 Championship, leaving the Italians in tears." }
{ "_id" : ObjectId("5bfb6011dea65504b456ab13"), "Type" : "World Cup 2018", "Head" : "Brazil Qualified for Round of 16", "Body" : "The Brazilians are happy today, due to the qualification of the Brazilian Team for the Round of 16 for the World Cup 2018." }
{ "_id" : ObjectId("5bfb5f82dea65504b456ab12"), "Type" : "NFL", "Head" : "Patriots Won SuperBowl 2017", "Body" : "Again, the Pats won the Super Bowl." }
sort({'_id':-1}) returns all documents in descending order of their _id.
Sorted query (_id: reverse order), limited to one result: getting the latest (last) document in a collection.
> db.Sports.find().sort({'_id':-1}).limit(1)
{ "_id" : ObjectId("5bfb60b1dea65504b456ab14"), "Type" : "F1", "Head" : "Ferrari Lost Championship", "Body" : "By two positions, Ferrari loses the F1 Championship, leaving the Italians in tears." }
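To see why sort({'_id':-1}).limit(1) picks the F1 document, here is a small in-memory Python sketch over the three _id values from the example (other fields trimmed). It assumes the ObjectIds are held as 24-character lowercase hex strings, for which plain string comparison orders them the same way as their 12 underlying bytes; the `latest` helper is hypothetical.

```python
# ObjectId hex strings copied from the example above; other fields trimmed.
sports = [
    {"_id": "5bfb5f82dea65504b456ab12", "Type": "NFL"},
    {"_id": "5bfb6011dea65504b456ab13", "Type": "World Cup 2018"},
    {"_id": "5bfb60b1dea65504b456ab14", "Type": "F1"},
]


def latest(docs, n=1):
    """Python analogue of find().sort({'_id': -1}).limit(n).

    Equal-length lowercase hex strings compare in the same order as the
    12-byte ObjectIds they encode, so sorting the strings is enough here.
    """
    return sorted(docs, key=lambda d: d["_id"], reverse=True)[:n]


print(latest(sports))
```

With PyMongo, the server-side equivalent would be `collection.find().sort("_id", -1).limit(1)` or `collection.find_one(sort=[("_id", -1)])`, where `collection` is assumed to be a `pymongo.collection.Collection`.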
I need a query with constant time response
By default, indexes in MongoDB are B-trees. Searching a B-tree is an O(log N) operation, so even find({_id:...}) will not give a constant-time, O(1) response.
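To put the O(log N) claim into numbers, here is a rough back-of-envelope sketch. The fanout of 100 keys per node is an assumed, illustrative figure; real index pages hold a varying number of keys depending on key size.

```python
def btree_levels(n_keys: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout**h >= n_keys.

    Reaching the largest key (what sort({_id: -1}).limit(1) does) means
    walking from the root to the rightmost leaf: one node per level.
    """
    if n_keys < 1 or fanout < 2:
        raise ValueError("need at least one key and a fanout of 2 or more")
    levels, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout
        levels += 1
    return levels


for n in (10**3, 10**6, 10**9):
    print(f"{n:>13,} keys -> {btree_levels(n, 100)} levels")
```

Under this assumption, a million-fold increase in N (from a thousand to a billion keys) adds only three levels, which is why the lookup stays fast in practice even though it is not O(1).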
That said, you can also sort by _id if you are using ObjectId for your IDs. Of course, even that is only accurate to the second: the ObjectId timestamp has one-second resolution, so documents created within the same second (especially by different clients) are not guaranteed to sort in insertion order.
You may have to resort to "writing twice": write once to the main collection and again to a "last updated" collection. Without transactions this will not be perfect, but with only one item in the "last updated" collection, reading it will always be fast.
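A minimal in-memory sketch of the "writing twice" idea. The class and its fields are hypothetical stand-ins for the two collections; with a real server, the second write would typically be an upsert into a collection that only ever holds one document.

```python
class WriteTwiceStore:
    """Hypothetical stand-in for a main collection plus a one-document
    "last updated" collection."""

    def __init__(self):
        self.main = []            # grows without bound, like the 'test' collection
        self.last_updated = None  # always holds at most one document

    def insert(self, doc):
        self.main.append(doc)     # write 1: the real data
        # Write 2: overwrite the single "last updated" document. Without a
        # transaction, a failure between the two writes leaves this stale.
        self.last_updated = dict(doc)

    def latest(self):
        # The cost of this read does not depend on len(self.main).
        return self.last_updated


store = WriteTwiceStore()
for i in range(5):
    store.insert({"idate": i})
print(store.latest())
```

The read path never touches the main collection, which is what makes it constant time; the price is the extra write on every insert.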
PHP 7.1 with the MongoDB library:
$data = $collection->findOne([],['sort' => ['_id' => -1],'projection' => ['_id' => 1]]);
My solution (Node.js driver):
db.collection("name of collection").find({}, {limit: 1}).sort({$natural: -1})
If you are using auto-generated MongoDB ObjectIds in your documents, each _id contains a timestamp in its first 4 bytes, which can be used to find the latest document inserted into the collection. I understand this is an old question, but here is one more alternative for anyone still ending up here.
db.collectionName.aggregate(
[{$group: {_id: null, latestDocId: { $max: "$_id"}}}, {$project: {_id: 0, latestDocId: 1}}])
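The above query returns the _id of the latest document inserted into the collection. Note that $group has to read every document, so on a large collection it is slower than sort({_id:-1}).limit(1), which can use the _id index.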
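The timestamp inside an ObjectId can be decoded without a server. This stdlib-only sketch treats ObjectIds as 24-character hex strings (an assumption for the example; PyMongo's `bson.ObjectId` exposes the same value as its `generation_time` property). The `objectid_timestamp` helper is hypothetical.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex: str) -> datetime:
    """Decode the creation time stored in the first 4 bytes of an ObjectId.

    The first 8 hex characters are a big-endian count of seconds since the
    Unix epoch, so the result only has one-second resolution.
    """
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    return datetime.fromtimestamp(int(oid_hex[:8], 16), tz=timezone.utc)


ids = [
    "5bfb5f82dea65504b456ab12",
    "5bfb6011dea65504b456ab13",
    "5bfb60b1dea65504b456ab14",
]
# Same idea as {$max: "$_id"}: for equal-length lowercase hex strings,
# the largest string is the largest ObjectId.
newest = max(ids)
print(newest, objectid_timestamp(newest).isoformat())
```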
This is how to get the last record for each value of x in the "foo" collection (replace foo, x, y, etc. with your own names):
db.foo.aggregate([{$sort:{ x : 1, date : 1 } },{$group: { _id: "$x" ,y: {$last:"$y"},yz: {$last:"$yz"},date: { $last : "$date" }}} ],{ allowDiskUse:true })
You can add fields to or remove fields from the $group stage.
Help articles: https://docs.mongodb.com/manual/reference/operator/aggregation/group/#pipe._S_group
https://docs.mongodb.com/manual/reference/operator/aggregation/last/
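A rough Python analogue of the pipeline above, using made-up documents with the answer's placeholder field names x, y and date. It only mirrors the logic ($sort by x and date, then $last within each x group); it does not model allowDiskUse or memory limits, and `last_per_group` is a hypothetical helper.

```python
from itertools import groupby


def last_per_group(docs, group_key, sort_key):
    """Sort by (group_key, sort_key), then keep the last document per group,
    like {$sort: {x: 1, date: 1}} followed by {$group: {..., $last: ...}}."""
    ordered = sorted(docs, key=lambda d: (d[group_key], d[sort_key]))
    return {k: list(g)[-1] for k, g in groupby(ordered, key=lambda d: d[group_key])}


# Illustrative documents only.
docs = [
    {"x": "a", "y": 1, "date": 3},
    {"x": "b", "y": 2, "date": 1},
    {"x": "a", "y": 5, "date": 7},
    {"x": "b", "y": 9, "date": 4},
]
result = last_per_group(docs, "x", "date")
print(result)
```

The sort step matters: $last is only meaningful when the documents reach $group in a known order.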
Mongo CLI syntax:
db.collectionName.find({}).sort({$natural:-1}).limit(1)
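Let Mongo create the ID. An ObjectId is not an auto-incremented hash: it starts with a 4-byte creation timestamp and ends with an incrementing counter, so sorting by _id roughly follows insertion order.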
PyMongo:
self._collection.find().sort("_id",-1).limit(1)
I'm getting more and more puzzled discovering how overcomplicated and badly designed MongoDB query writing is. Anyway, I have this kind of document in a db with thousands of records:
db.messages.aggregate([{$limit: 1}]).pretty()
{
"_id" : ObjectId("4f16fc97d1e2d32371003f42"),
"body" : "Hey Gillette,\n\nThe heat rate is going to depend on the type of fuel and the construction \ndate of the unit. Unfortunately, most of that info is proprietary. \n\nChris Gaskill is the head of our fundamentals group and he might be able to \nsupply you with some of the guidelines.\n\n-Bass\n\n\n \n\tEnron North America Corp.\n\t\n\tFrom: Lisa Gillette 04/05/2001 02:31 PM\n\t\n\nTo: Eric Bass/HOU/ECT#ECT\ncc: \nSubject: Power Generation Question\n\nHey Bass,\n\nI have a question and I am hoping you can help me. I am wanting to compile a \nlist of all the different types of power plants and their respective heat \nrates to determine some sort of generation ratio.\n\ni.e. Coal 4 mmbtu = 1 MW\n Simple Cycle 11 mmbtu = 1 MW\n\nPlease let me know if you can help me or point me to someone who can. Just \nFYI...Bryan suggested that I call you so blame him as you curse me under your \nbreath right now.\n\nThanks,\nLisa\n\n",
"filename" : "1045.",
"headers" : {
"Content-Transfer-Encoding" : "7bit",
"Content-Type" : "text/plain; charset=us-ascii",
"Date" : ISODate("2001-04-05T14:45:00Z"),
"From" : "eric.bass#enron.com",
"Message-ID" : "<2106897.1075854772243.JavaMail.evans#thyme>",
"Mime-Version" : "1.0",
"Subject" : "Re: Power Generation Question",
"To" : [
"lisa.gillette#enron.com"
],
"X-FileName" : "ebass.nsf",
"X-Folder" : "\\Eric_Bass_Jun2001\\Notes Folders\\Sent",
"X-From" : "Eric Bass",
"X-Origin" : "Bass-E",
"X-To" : "Lisa Gillette",
"X-bcc" : "",
"X-cc" : ""
},
"mailbox" : "bass-e",
"subFolder" : "sent"
}
And I need to find records from address X to address Y.
I managed to get the "From" records with
db.messages.find({"headers.From": "eric.bass#enron.com"}).pretty().count()
But I can't get the "To" records (and I need to get both together).
To query the "To" field I've tried:
db.messages.find({headers: {$elemMatch :{ "To": "lisa.gillette#enron.com"}}})
But it returns nothing
What am I missing?
Thanks
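$elemMatch: to use this operator, you give it the array field and the matching condition. Your query returns nothing because headers is an embedded document, not an array, and $elemMatch only matches array fields. In your case it should be: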
db.messages.find({"headers.To": {$elemMatch :{$eq:"lisa.gillette#enron.com"}}})
$elemMatch is most useful when several conditions must be satisfied by the same array element. If you specify only a single condition, you don't need $elemMatch; a plain query on the dotted field is enough:
db.messages.find({"headers.To": "lisa.gillette#enron.com"});
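A simplified Python model of the matching rules involved, using the headers from the sample message. It is a sketch, not MongoDB's real matcher: `resolve`, `eq_match` and `elem_match` are hypothetical helpers, and paths that pass through arrays are not modelled. It shows why the original query fails (headers is not an array) and why the dotted equality works (an array field matches when any element equals the value).

```python
def resolve(doc, dotted_path):
    """Follow a dotted path through embedded documents; None if missing."""
    current = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(part)
    return current


def eq_match(doc, path, value):
    """Simplified equality: an array field matches if any element equals value."""
    field = resolve(doc, path)
    if isinstance(field, list):
        return value in field or field == value
    return field == value


def elem_match(doc, path, predicate):
    """Simplified $elemMatch: only array fields can ever match."""
    field = resolve(doc, path)
    return isinstance(field, list) and any(predicate(e) for e in field)


message = {"headers": {"From": "eric.bass#enron.com", "To": ["lisa.gillette#enron.com"]}}

print(eq_match(message, "headers.To", "lisa.gillette#enron.com"))
print(elem_match(message, "headers", lambda e: isinstance(e, dict) and e.get("To") == "lisa.gillette#enron.com"))
```

To get messages from X to Y in one query, both conditions can go in the same filter, which MongoDB treats as an implicit AND, e.g. {"headers.From": "eric.bass#enron.com", "headers.To": "lisa.gillette#enron.com"}.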
I have a collection with users. Each user has comments. I want to track, for some specific users (according to their ids), whether there is a new comment.
I guess tailable cursors are what I need, but my main problem is that I want to track subdocuments, not documents.
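Sample of tracking documents in Python:
import time
from pymongo import MongoClient, CursorType

db = MongoClient().my_db
coll = db.my_collection  # must be a capped collection for a tailable cursor
cursor = coll.find(cursor_type=CursorType.TAILABLE)
while cursor.alive:
    try:
        doc = cursor.next()
        print(doc)
    except StopIteration:
        time.sleep(1)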
One solution is to poll every x minutes and check whether the number of comments has changed. However, I do not find the polling solution very appealing. Is there a better way to track changes, probably with tailable cursors?
PS: I have a comment_id field (which is an ObjectId) in each comment.
Small update:
Since comment_id is an ObjectId, I can store the biggest (= latest) one for each user, then periodically check whether it is still the latest one. I don't mind if the method is not precisely real time; even 10 minutes of delay is fine. I currently have 70k users and 180k comments, but I worry about the scalability of this method.
This would be my solution; evaluate whether it fits your requirements.
I am assuming the following data structure:
db.user.find().pretty()
{
"_id" : ObjectId("5335123d900f7849d5ea2530"),
"user_id" : 200,
"comments" : [
{
"comment_id" : 1,
"comment" : "hi",
"createDate" : ISODate("2012-01-01T00:00:00Z")
},
{
"comment_id" : 2,
"comment" : "bye",
"createDate" : ISODate("2013-01-01T00:00:00Z")
}
]
}
{
"_id" : ObjectId("5335123e900f7849d5ea2531"),
"user_id" : 201,
"comments" : [
{
"comment_id" : 3,
"comment" : "hi",
"createDate" : ISODate("2012-01-01T00:00:00Z")
},
{
"comment_id" : 4,
"comment" : "bye",
"createDate" : ISODate("2013-01-01T00:00:00Z")
}
]
}
I added a createDate attribute to each comment. Add an index as follows:
db.user.ensureIndex({"user_id":1,"comments.createDate":-1})
You can search for the latest comments with this query:
db.user.find({"user_id":200,"comments.createDate":{$gt:ISODate('2012-12-31')}})
The time used for the "greater than" comparison would be the time you last checked. Since the query can use the index, the search will be faster. You can use the same idea to check for new comments at some interval. Note that the query returns the whole user document, with all of its comments, not just the new ones.
You can also use a UTC timestamp instead of an ISODate; that way you don't have to worry about the BSON date type.
Note that when creating the index, I specified descending order on createDate.
If a user document will accumulate too many comments over time, I would suggest moving comments to a separate collection, with user_id as one of the fields in each comment document. That will give better performance in the long run.
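A sketch of one polling round of the approach described above, in plain Python over the two sample user documents (ISODate values converted to datetimes). The `poll` helper is hypothetical. It does the per-comment filtering on the client side, because a find() on "comments.createDate" returns the whole matching user document, and it returns a new high-water mark to use as the last-checked time in the next round.

```python
from datetime import datetime

# The two sample user documents from above, with ISODate values as datetimes.
users = [
    {"user_id": 200, "comments": [
        {"comment_id": 1, "comment": "hi", "createDate": datetime(2012, 1, 1)},
        {"comment_id": 2, "comment": "bye", "createDate": datetime(2013, 1, 1)},
    ]},
    {"user_id": 201, "comments": [
        {"comment_id": 3, "comment": "hi", "createDate": datetime(2012, 1, 1)},
        {"comment_id": 4, "comment": "bye", "createDate": datetime(2013, 1, 1)},
    ]},
]


def poll(users, tracked_ids, last_checked):
    """One polling round over the tracked users.

    Returns the comments created after last_checked, plus the newest
    createDate seen (or last_checked if nothing is new).
    """
    found = [
        c
        for u in users
        if u["user_id"] in tracked_ids
        for c in u["comments"]
        if c["createDate"] > last_checked
    ]
    newest = max((c["createDate"] for c in found), default=last_checked)
    return found, newest


found, last_checked = poll(users, {200}, datetime(2012, 12, 31))
print([c["comment_id"] for c in found], last_checked)
```

Running `poll` again with the returned `last_checked` finds nothing until a newer comment arrives, so each round only reports what is new since the previous one.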
I am playing around with the bios example collection from http://docs.mongodb.org/manual/reference/bios-example-collection to educate myself about querying MongoDB.
I want to retrieve information about the awards won by _id : 1 in year : 1975.
I tried several queries, among them:
db.bios.find({
"_id" : 1,
"awards" : {
"year" : 1975
}
});
but I never receive the proper document back. How can I retrieve this document in the array?
You have to use the dot notation:
db.bios.find({"_id" : 1, "awards.year" : 1975 });
It's a rather pointless query, because _id alone already identifies a single document, but I guess that's because you're playing with an example.
If you search for "awards" : { "year" : 1975 }, MongoDB looks for an exact match of an entire subdocument. In this case, that is not what you want. Since awards is an array, the query only matches if one of its elements is exactly { "year" : 1975 }, with no other fields, which never happens here because every award also has other fields. If you wanted to match a specific award document in the list on several fields at once, $elemMatch would be the way to go.
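A small Python model contrasting the two query shapes. The bio document is hypothetical (shaped like the bios example, with made-up award values), and both helpers are simplifications of MongoDB's matcher: in particular, MongoDB's exact subdocument match also requires the same field order, which Python dict equality ignores.

```python
# Hypothetical bio document shaped like the bios example (values made up).
bio = {
    "_id": 1,
    "awards": [
        {"award": "Award A", "year": 1967, "by": "Org A"},
        {"award": "Award B", "year": 1975, "by": "Org B"},
    ],
}


def exact_match(doc, field, subdoc):
    """Model of {field: subdoc}: on an array, some element must equal subdoc."""
    value = doc.get(field)
    if isinstance(value, list):
        return subdoc in value
    return value == subdoc


def dotted_match(doc, field, key, wanted):
    """Model of {"field.key": wanted}: any element with key == wanted matches."""
    value = doc.get(field)
    items = value if isinstance(value, list) else [value]
    return any(isinstance(e, dict) and e.get(key) == wanted for e in items)


print(exact_match(bio, "awards", {"year": 1975}))
print(dotted_match(bio, "awards", "year", 1975))
# A matching document still comes back whole; picking out the 1975 award
# is shown here as a separate client-side step.
print([a for a in bio["awards"] if a["year"] == 1975])
```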
I have a MongoDB $within query that looks like this:
db.action.find( { $and : [
{ actionType : "PLAY" },
{
location : {
$within : {
$polygon : [ [ 0.0, 0.1 ], [ 0.0, 0.2 ] .. [ a.b, c.d ] ]
}
}
}
] } ).sort( { time : -1 } ).limit(50)
With regard to the action collection documents
There are 5 actionTypes
The action documents MAY or MAY NOT have a location with a ratio of approximately 70:30 for PLAY actions
Actions of other types never have a location
The action documents will ALWAYS have time
The collection contains the following indexes
// I am interested in recent actions
db.action.ensureIndex({"time": -1})
// I am interested in recent actions by a specific user
db.action.ensureIndex({"userId": 1, "time": -1})
// I am interested in recent actions that relate to a unique song id
db.action.ensureIndex({"songId": 1, "time": -1})
I am experimenting with the following two indexes
LocationOnly: db.action.ensureIndex({"location":"2d"})
LocationPlusTime: db.action.ensureIndex({"location": "2d", "time": -1})
Running the identical query with each index gives the following explain() output:
LocationOnly
{
"cursor":"BasicCursor",
"isMultiKey":false,
"n":50,
"nscannedObjects":91076,
"nscanned":91076,
"nscannedObjectsAllPlans":273229,
"nscannedAllPlans":273229,
"scanAndOrder":true,
"indexOnly":false,
"nYields":1,
"nChunkSkips":0,
"millis":1090,
"indexBounds":{},
"server":"xxxx"
}
LocationPlusTime
{
"cursor":"BasicCursor",
"isMultiKey":false,
"n":50,
"nscannedObjects":91224,
"nscanned":91224,
"nscannedObjectsAllPlans":273673,
"nscannedAllPlans":273673,
"scanAndOrder":true,
"indexOnly":false,
"nYields":44,
"nChunkSkips":0,
"millis":1156,
"indexBounds":{},
"server":"xxxxx"
}
Given
The geosearch will cover documents of ALL types
The geosearch will cover documents with NO Location and WITH Location in a ratio of roughly 60:40
My questions are
Can anybody explain why isMultiKey="false" on the second explain plan?
Can anybody explain why there are more yields on the 2nd explain plan?
My speculative thoughts are
The potential for NULL location is reducing the effectiveness of the GeoSpatial index.
Compound Indexes of the GeoSpatial variety are not as powerful as standard compound indexes.
UPDATE
A sample document looks like this.
{ "_id" : "adba1154f1f3d4ddfafbff9bb3ae98f2a50e76ffc74a38bae1c44d251db315d25c99e7a1b4a8acb13d11bcd582b9843e335006a5be1d3ac8a502a0a205c0c527",
"_class" : "ie.soundwave.backstage.model.action.Action",
"time" : ISODate("2013-04-18T10:11:57Z"),
"actionType" : "PLAY",
"location" : { "lon" : -6.412839696767714, "lat" : 53.27401934563561 },
"song" : { "_id" : "82e08446c87d21b032ccaee93109d6be",
"title" : "Motion Sickness", "album" : "In Our Heads", "artist" : "Hot Chip"
},
"userId" : "51309ed6e4b0e1fb33d882eb", "createTime" : ISODate("2013-04-18T10:12:59.127Z")
}
UPDATE
The geo-query looks like this
https://www.google.com/maps/ms?msid=214949566612971430368.0004e267780661744eb95&msa=0&ll=-0.01133,-0.019226&spn=0.14471,0.264187
For various reasons, approximately 250,000 documents exist in our DB at the point (0, 0).
I played with this for a number of days and got the result I was looking for.
Firstly, given that action types other than "PLAY" CANNOT have a location, the additional query parameter "actionType==PLAY" was unnecessary, so I removed it. Straight away the query plan flipped from a "time-reverse-b-tree" cursor to a "Geobrowse-polygon" cursor, and for my test search, latency improved by a factor of about 10.
Next, I revisited the 2dsphere index as suggested by Derick. That gave another latency improvement of roughly 5x. Overall, a much better user experience for map searches was achieved.
I have one refinement remaining. Queries in areas where there are no plays for a number of days have generally increased in latency. This is due to the query looking back in time until it can find "some play". If necessary, I will add in a time range guard to limit the search space of these queries to a set number of days.
Thanks for the hints Derick.
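The time range guard mentioned above is only planned, so here is a minimal sketch of the filter it could produce, built as a PyMongo-style filter dict (building the dict needs no server). The helper name, the 7-day window and the polygon are hypothetical. The $geoWithin/$geometry form is the GeoJSON query shape used with a 2dsphere index, assuming location is indexed that way; GeoJSON coordinates are [longitude, latitude] and a polygon ring must be closed.

```python
from datetime import datetime, timedelta, timezone


def recent_plays_filter(ring, days, now=None):
    """Build a find() filter limiting a geo search to the last `days` days.

    ring: list of [lon, lat] pairs whose first and last points are equal.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if ring[0] != ring[-1]:
        raise ValueError("GeoJSON polygon rings must be closed")
    return {
        "time": {"$gte": now - timedelta(days=days)},
        "location": {
            "$geoWithin": {"$geometry": {"type": "Polygon", "coordinates": [ring]}}
        },
    }


ring = [[0.0, 0.1], [0.0, 0.2], [0.1, 0.2], [0.0, 0.1]]  # hypothetical area
query = recent_plays_filter(ring, 7, now=datetime(2013, 4, 18, tzinfo=timezone.utc))
print(query)
```

A filter like this could then be passed to `find()` together with the existing `.sort({time: -1}).limit(50)`, so quiet areas stop scanning back indefinitely.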
Has anybody used MongoDB's Array type to implement a stack?
I know that I can append to an Array like so:
db.blogposts.update( {_id:5}, {$push: {comments: {by: "Abe", text:"First"}}})
Here, the end of the array is the top of the stack... I don't see a way to implement this with the top of the stack at the zeroth index, but I'd love to be wrong.
And I know that I can peek at the last value of the array like so:
db.blogposts.find( {_id:5}, {comments: {$slice:-1}})
With an implementation like this, can I "peek" at the top of the stack in a MongoDB update statement? That would give me the semantic, "push this item on the stack if the top of the stack is X". I need this to be an atomic operation!
Any advice appreciated. Thanks!
Unfortunately, there is currently no way to do this exactly as you have described.
As Chris Shain pointed out, https://jira.mongodb.org/browse/SERVER-2191 - "$push() to front of array" and similarly https://jira.mongodb.org/browse/SERVER-1824 - "Support for inserting into specific array index" would help, but these features are currently not slated for a specific release version.
As a possible work-around, you could add a field named "lastElement" (or equivalent) to your document, which contains a copy of the last element pushed to the array. In your update statement, you could then query against the "lastElement" value, and if it matches, simultaneously set it to the new value and push the same value to the array in a single, atomic operation.
For example:
> db.blogposts.save({_id:5, comments:[{by: "Abe", text:"First"}], lastElement:{by: "Abe", text:"First"}})
> db.blogposts.find().pretty()
{
"_id" : 5,
"comments" : [
{
"by" : "Abe",
"text" : "First"
}
],
"lastElement" : {
"by" : "Abe",
"text" : "First"
}
}
> db.blogposts.update({"lastElement.text":"First"}, {$set:{lastElement:{by: "Joe", text:"Second"}}, $push:{comments:{by: "Joe", text:"Second"}}})
> db.blogposts.find().pretty()
{
"_id" : 5,
"comments" : [
{
"by" : "Abe",
"text" : "First"
},
{
"by" : "Joe",
"text" : "Second"
}
],
"lastElement" : {
"by" : "Joe",
"text" : "Second"
}
}
>
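An in-memory Python model of the "lastElement" work-around shown above. The class and method names are hypothetical, and the lock only stands in for MongoDB's guarantee that an update to a single document is atomic: the check on the top of the stack and the push happen together or not at all.

```python
import threading


class Post:
    """Hypothetical model of a blogposts document with a lastElement copy."""

    def __init__(self, first):
        self.comments = [first]
        self.last_element = first
        self._lock = threading.Lock()

    def push_if_top(self, expected_text, new_comment):
        """Push new_comment only if the current top has text == expected_text.

        Returns True when the push happened, roughly like an update that
        matched one document.
        """
        with self._lock:
            if self.last_element["text"] != expected_text:
                return False
            self.comments.append(new_comment)
            self.last_element = new_comment
            return True


post = Post({"by": "Abe", "text": "First"})
print(post.push_if_top("First", {"by": "Joe", "text": "Second"}))
print(post.push_if_top("First", {"by": "Ann", "text": "Late"}))
```

The second call is rejected because the top of the stack is no longer "First", which is the "push this item on the stack if the top of the stack is X" semantic from the question.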
As an alternative, you may consider the strategy outlined in the "Update if Current" section of the "Atomic Operations" documentation: http://www.mongodb.org/display/DOCS/Atomic+Operations
I realize these are work-arounds and not ideal solutions. Hopefully the above will help you to accomplish your goal, or at least provide some food for thought for you to come up with a different solution. If you do, please share it here so that any members of the Community who may be experiencing similar issues may have the benefit of your experience. Thanks.
As of MongoDB v2.6, this appears to be supported via the $position operator: http://docs.mongodb.org/manual/reference/operator/update/position/
Note that $position must be used together with $each, and $each takes an array:
db.blogposts.update(
  { _id: 5 },
  { $push: {
      comments: {
        $each: [ { by: "Abe", text: "First" } ],
        $position: 0
      }
  } }
);
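A small Python model of what {$push: {comments: {$each: [...], $position: 0}}} does to the array. The `push_each` helper is hypothetical and only handles non-negative positions. With the top of the stack at index 0, peeking becomes the projection {comments: {$slice: 1}} instead of {$slice: -1}.

```python
def push_each(array, items, position=None):
    """Simplified model of {$push: {field: {$each: items, $position: position}}}.

    Without position the items are appended; with a non-negative position
    they are inserted there (a position past the end appends).
    """
    if position is None:
        return array + list(items)
    if position < 0:
        raise ValueError("this sketch only models non-negative positions")
    return array[:position] + list(items) + array[position:]


stack = []
for text in ("First", "Second", "Third"):
    stack = push_each(stack, [{"by": "Abe", "text": text}], position=0)

top = stack[:1]  # like projecting {comments: {$slice: 1}}
print(top)
```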