Find Subdocument in Array with mongodb - mongodb

I am playing around with the The bios Example Collection from http://docs.mongodb.org/manual/reference/bios-example-collection to educate myself about querying mongodb.
I want to retrieve informations about the awards won by _id : 1 in year : 1975.
I tried several queries, among those
bios.find({
"_id" : 1,
"awards" : {
"year" : 1975
}
});
but I never receive the proper document back. How can I retrieve this document in the array?

You have to use the dot notation:
bios.find({"_id" : 1, "awards.year" : 1975 });
It's a rather pointless query, because you also have the _id in the query, but I guess that's due to the fact that you're playing with an example. Also, you're saying you're looking for awards from 1967, but the code says 1975.
If you search for "awards" : { "year" : 1975 }, mongodb will look for an exact match of the entire subdocument awards. In this case, that is not what you want. Also, since awards is an array, this will always be false. If you wanted to look up a specific award document in a list, $elemMatch would be the way to go.

Related

Complex-ish mongo query runs fairly slow, combination of $and $or $in and regex

I'm running some queries to a mongodb 2.4.9 server that populate a datatable on a webpage. The user needs to be able to do a substring search across multiple fields, sort the data on various columns, and flip through the results in pages. I have to check multiple fields for matches since the user could be searching for anything related to the documents. There are about 300,000 documents in the collection so the database is relatively small.
I have indexes created for the created_by, requester, desc.name, metaprogram.id, program.id, and arr.programid fields. I've also created indexes [("created", 1), ("created_by", 1), ("requester", 1)] and [("created_by", 1), ("requester", 1)] at the suggestion of Dex.
It's also worth mentioning that documents might not have all of the fields that are being searched for here. Some documents might have a metaprogram.id but not the other ID fields for example.
An example of a query I might run is
{
"$query" : {
"$and" : [
{
"created_by" : {"$ne" : "automation"},
"requester" : {"$in" : ["Broadway", "Spec", "Falcon"] }
},
{
"$or" : [
{"requester" : /month/i },
{"created_by" : /month/i },
{"desc.name" : /month/i },
{"metaprogram.id" : {"$in" : [708, 2314, 709 ] } },
{"program.id" : {"$in" : [708, 2314, 709 ] } },
{"arr.programid" : {"$in" : [708, 2314, 709 ] } }
]
}
]
},
"$orderby" : {
"created" : 1
}
}
with differing orderby, limit, and skip values as well.
Queries on average take 500-1500ms to complete.
I've looked into how to make it faster, but haven't been able to come up with anything. Some of the text searching stuff looks handy but as far as I know each collection only supports at most one text index and it doesn't support pagination (skips). I'm sure that prefix searching instead of regex substring matches would be faster as well but I need substring matching.
Is there anything you can think of to improve the speed of a query like this?
It's quite hard to optimize a query when it's unpredictable.
Analyze how the system is being used and place indexes on the most popular fields.
Use .explain() to make sure the indexes are being used.
Also limit the results returned to a value of 50 or 100. The user doesn't need to see everything at once.
Try upgrading mongodb to see if there's a performance improvement.
Side note:
You might want to consider using ElasticSearch as a search engine instead of Mongodb. ElasticSearch would store the searchable fields and return the Mongodb Ids for matched results. ElasticSearch is a magnitude faster as a search engine than Mongodb.
More info:
How to find queries not using indexes or slow in mongodb
Range query for MongoDB pagination
http://www.elasticsearch.org/overview/

Search full document in mongodb for a match

Is there a way to match a value with every array and sub document inside the document in mongodb collection and return the document
{
"_id" : "2000001956",
"trimline1" : "abc",
"trimline2" : "xyz",
"subtitle" : "www",
"image" : {
"large" : 0,
"small" : 0,
"tiled" : 0,
"cropped" : false
},
"Kytrr" : {
"count" : 0,
"assigned" : 0
}
}
for eg if in the above document I am searching for xyz or "ab" or "xy" or "z" or "0" this document should be returned.
I actually have to achieve this at the back end using C# driver but a mongo query would also help greatly.
Please advice.
Thanks
You could probably do this using '$where'
db.mycollection({$where:"JSON.stringify(this).indexOf('xyz')!=-1"})
I'm converting the whole record to a big string and then searching to see if your element is in the resulting string. Probably won't work if your xyz is in the fieldnames!
You can make it iterate through the fields to make a big string and then search it though.
This isn't the most elegant way and will involve a full tablescan. It will be faster if you look through the individual fields!
While Malcolm's answer above would work, when your collection gets large or you have high traffic, you'll see this fall over pretty quickly. This is because of 2 things. First, dropping down to javascript is a big deal and second, this will always be a full table scan because $where can't use an index.
MongoDB 2.6 introduced text indexing which is on by default (it was in beta in 2.4). With it, you can have a full text index on all the fields in the document. The documentation gives the following example where a text index is created for every field and names the index "TextIndex".
db.collection.ensureIndex(
{ "$**": "text" },
{ name: "TextIndex" }
)

Mongodb tail subdocuments

I have a collection with users. Each user has comments. I want to track for some specific users (according to theirs ids) if there is a new comment.
Tailable cursor I guess are what I need but my main problem is that I want to track subdocuments and not documents.
Sample of tracking documents in python:
db = Connection().my_db
coll = db.my_collection
cursor = coll.find(tailable=True)
while cursor.alive:
try:
doc = cursor.next()
print doc
except StopIteration:
time.sleep(1)
One solution is to run intervals every x time and see if the number of the comments has changed. However I do not find the interval solution very appealing. Is there any better way to track changes? Probably with tailable cursors.
PS: I have a comment_id field (which is an ObjectID) in each comment.
Small update:
Since I have the commect_id bson, I can store the biggest (=latest) one in each user. Then run intervals compare the bson if it's still the latest one. I don't mind not to be a precisely real time method. Even 10 minutes of delay is fine. However now I have 70k users and 180k comments but I worry for the scalability of this method.
This would be my solution. Evaluate if it fits your requirement -
I am assuming a data structure as follows
db.user.find().pretty()
{
"_id" : ObjectId("5335123d900f7849d5ea2530"),
"user_id" : 200,
"comments" : [
{
"comment_id" : 1,
"comment" : "hi",
"createDate" : ISODate("2012-01-01T00:00:00Z")
},
{
"comment_id" : 2,
"comment" : "bye",
"createDate" : ISODate("2013-01-01T00:00:00Z")
}
]
}
{
"_id" : ObjectId("5335123e900f7849d5ea2531"),
"user_id" : 201,
"comments" : [
{
"comment_id" : 3,
"comment" : "hi",
"createDate" : ISODate("2012-01-01T00:00:00Z")
},
{
"comment_id" : 4,
"comment" : "bye",
"createDate" : ISODate("2013-01-01T00:00:00Z")
}
]
}
I added createDate attribute to the document. Add an index as follows -
db.user.ensureIndex({"user_id":1,"comments.createDate":-1})
You can search for latest comments with the query -
db.user.find({"user_id":200,"comments.createDate":{$gt:ISODate('2012-12-31')}})
The time used for "greater than" comparison would be last checked time. Since you are using index, the search will be faster. You can follow the same idea of checking in for new comments in some interval.
You can also use UTC time stamp, instead of ISODate. That way you don't have to worry about bson data type.
Note that while creating index on createDate, I have specified descending index.
If you will have too many comments within a user document, over a period of time, I would suggest that, you move comments to a different collection. Use user_id as one of the attributes in the comment document. That will give a better performance in the long run.

Get the latest record from mongodb collection

I want to know the most recent record in a collection. How to do that?
Note: I know the following command line queries works:
1. db.test.find().sort({"idate":-1}).limit(1).forEach(printjson);
2. db.test.find().skip(db.test.count()-1).forEach(printjson)
where idate has the timestamp added.
The problem is longer the collection is the time to get back the data and my 'test' collection is really really huge. I need a query with constant time response.
If there is any better mongodb command line query, do let me know.
This is a rehash of the previous answer but it's more likely to work on different mongodb versions.
db.collection.find().limit(1).sort({$natural:-1})
This will give you one last document for a collection
db.collectionName.findOne({}, {sort:{$natural:-1}})
$natural:-1 means order opposite of the one that records are inserted in.
Edit: For all the downvoters, above is a Mongoose syntax,
mongo CLI syntax is: db.collectionName.find({}).sort({$natural:-1}).limit(1)
Yet another way of getting the last item from a MongoDB Collection (don't mind about the examples):
> db.collection.find().sort({'_id':-1}).limit(1)
Normal Projection
> db.Sports.find()
{ "_id" : ObjectId("5bfb5f82dea65504b456ab12"), "Type" : "NFL", "Head" : "Patriots Won SuperBowl 2017", "Body" : "Again, the Pats won the Super Bowl." }
{ "_id" : ObjectId("5bfb6011dea65504b456ab13"), "Type" : "World Cup 2018", "Head" : "Brazil Qualified for Round of 16", "Body" : "The Brazilians are happy today, due to the qualification of the Brazilian Team for the Round of 16 for the World Cup 2018." }
{ "_id" : ObjectId("5bfb60b1dea65504b456ab14"), "Type" : "F1", "Head" : "Ferrari Lost Championship", "Body" : "By two positions, Ferrari loses the F1 Championship, leaving the Italians in tears." }
Sorted Projection ( _id: reverse order )
> db.Sports.find().sort({'_id':-1})
{ "_id" : ObjectId("5bfb60b1dea65504b456ab14"), "Type" : "F1", "Head" : "Ferrari Lost Championship", "Body" : "By two positions, Ferrari loses the F1 Championship, leaving the Italians in tears." }
{ "_id" : ObjectId("5bfb6011dea65504b456ab13"), "Type" : "World Cup 2018", "Head" : "Brazil Qualified for Round of 16", "Body" : "The Brazilians are happy today, due to the qualification of the Brazilian Team for the Round of 16 for the World Cup 2018." }
{ "_id" : ObjectId("5bfb5f82dea65504b456ab12"), "Type" : "NFL", "Head" : "Patriots Won SuperBowl 2018", "Body" : "Again, the Pats won the Super Bowl" }
sort({'_id':-1}), defines a projection in descending order of all documents, based on their _ids.
Sorted Projection ( _id: reverse order ): getting the latest (last) document from a collection.
> db.Sports.find().sort({'_id':-1}).limit(1)
{ "_id" : ObjectId("5bfb60b1dea65504b456ab14"), "Type" : "F1", "Head" : "Ferrari Lost Championship", "Body" : "By two positions, Ferrari loses the F1 Championship, leaving the Italians in tears." }
I need a query with constant time response
By default, the indexes in MongoDB are B-Trees. Searching a B-Tree is a O(logN) operation, so even find({_id:...}) will not provide constant time, O(1) responses.
That stated, you can also sort by the _id if you are using ObjectId for you IDs. See here for details. Of course, even that is only good to the last second.
You may to resort to "writing twice". Write once to the main collection and write again to a "last updated" collection. Without transactions this will not be perfect, but with only one item in the "last updated" collection it will always be fast.
php7.1 mongoDB:
$data = $collection->findOne([],['sort' => ['_id' => -1],'projection' => ['_id' => 1]]);
My Solution :
db.collection("name of collection").find({}, {limit: 1}).sort({$natural: -1})
If you are using auto-generated Mongo Object Ids in your document, it contains timestamp in it as first 4 bytes using which latest doc inserted into the collection could be found out. I understand this is an old question, but if someone is still ending up here looking for one more alternative.
db.collectionName.aggregate(
[{$group: {_id: null, latestDocId: { $max: "$_id"}}}, {$project: {_id: 0, latestDocId: 1}}])
Above query would give the _id for the latest doc inserted into the collection
This is how to get the last record from all MongoDB documents from the "foo" collection.(change foo,x,y.. etc.)
db.foo.aggregate([{$sort:{ x : 1, date : 1 } },{$group: { _id: "$x" ,y: {$last:"$y"},yz: {$last:"$yz"},date: { $last : "$date" }}} ],{ allowDiskUse:true })
you can add or remove from the group
help articles: https://docs.mongodb.com/manual/reference/operator/aggregation/group/#pipe._S_group
https://docs.mongodb.com/manual/reference/operator/aggregation/last/
Mongo CLI syntax:
db.collectionName.find({}).sort({$natural:-1}).limit(1)
Let Mongo create the ID, it is an auto-incremented hash
mymongo:
self._collection.find().sort("_id",-1).limit(1)

MongoDB search - find newest within document

I've got a collection containing a few hundreds thousand documents, and I need to efficiently perform a check on one of those documents.
Here's an example of one of those documents:
{
"_id" : ObjectId( "4ee9b4239641dce218030000" ),
"last_seen" : Date( 1323938851402 ),
"is" : {
"4ee910de9641dc4906000000" : Date( 1323938851302 ),
"4ee211df9621dc4206000000" : Date( 1323938851402 ),
"4ef913de9631db4922000000" : Date( 1323938851102 ),
}
}
I know the _id of the document is 4ee9b4239641dce218030000, and I need to determine whether the item in the "is" object with ID 4ee211df9621dc4206000000 is the newest out of the 3 and has a timestamp value of in the last 5 minutes.
If it helps the timestamp stored in "last_seen" will always be the same value as the latest record in the "is" object. Perhaps they could be compared?
Any thoughts on how this can be done with MongoDB efficiently?
Why would you want to solve this with a query? You're already doing a find-by-id. Just write code that fetches the latest element from "is". Don't overcomplicate things by changing your schema or make arbitrary assumptions about recent activity timewindows when you can do it with one line of code in whatever language you prefer working with.
How about making "is" an array, sorted by Date, and then the query would be something like this:
`{$and : [ {_id : "4ee9b..." },
{is[0].date: {$gt : now-5min }},
{is[0]._id : "4ee21.." }]}`
You get no results unless item "4ee9b.." is the newest and less than 5min old.
best to keep "is" in another collection. If so, you can sort it anyway you like so find whether it's the latest id or not.