MongoDB find inside sub array

I have a document that's set up like this:
{
_id : ObjectId(),
info : [
[
1399583281000,
20.13
],
[
1399583282000,
20.13
],
[
1399583283000,
20.13
],
[
1399583285000,
20.13
],
[
1399583286000,
20.13
]
]
}
This data could be spread across multiple documents. In general, each document contains data in the info for 59 periods (seconds).
What I would like to do is get all of the info data where the timestamp is greater than a specific time.
Any ideas how I would go about doing this?
Thank you
EDIT:
So, I've found that this seems to return all of the documents:
db.infos.find({
info:{
$elemMatch:{
0:{
$gt:1399583306000
}
}
}
})
But maybe I need to use this in an aggregate query, so that it returns just the matching values?

You're on the right track, but there are a few things to note here. Nested arrays (especially with anonymous keys) are not a great way to store things, but as long as you consistently know the position, this should be reasonably okay.
There is a distinct difference between matching documents and matching "elements of an array". Though your current value would actually not match (your search value is not within the bounds of the document), if the value actually was valid your query correctly matches the "document" here, which contains a matching element in the array.
The "document" contains all of the array elements, even those that do not match, but the condition says the "document" does match, so it is returned. If you just want the matching "elements" then use .aggregate() instead:
db.infos.aggregate([
// Still match the document
{ "$match": {
"info": {
"$elemMatch": { "0": {"$gte": 1399583285000} }
}
}},
// unwind the array for the matched documents
{ "$unwind": "$info" },
// Match only the elements
{ "$match": { "info.0": { "$gte": 1399583285000 } } },
// Group back to the original form if you want
{ "$group": {
"_id": "$_id",
"info": { "$push": "$info" }
}}
])
And that returns just the elements that matched the condition:
{
"_id" : ObjectId("536c1145e99dc11e65ed07ce"),
"info" : [
[
1399583285000,
20.13
],
[
1399583286000,
20.13
]
]
}
Of course, if you only ever expect one element to match, you could simply use projection with .find():
db.infos.find(
{
"info":{
"$elemMatch":{
"0": {
"$gt": 1399583285000
}
}
}
},
{
"info.$": 1
}
)
But with a term like $gt you are likely to get multiple hits within a document so the aggregate approach is going to be safer considering that the positional $ operator is only going to return the first match.
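The difference between the two approaches can be sketched in plain JavaScript. This is only an in-memory emulation, not the MongoDB API: the function names `firstMatch` and `allMatches` and the threshold value are made up for illustration.

```javascript
// Plain-JavaScript emulation (not the MongoDB API) of the two behaviours.
const doc = {
  info: [
    [1399583281000, 20.13],
    [1399583282000, 20.13],
    [1399583283000, 20.13],
    [1399583285000, 20.13],
    [1399583286000, 20.13]
  ]
};
const threshold = 1399583283000; // hypothetical search value

// Like the positional $ projection: only the first matching element survives.
function firstMatch(info, min) {
  const hit = info.find(pair => pair[0] > min);
  return hit === undefined ? [] : [hit];
}

// Like $unwind + $match + $group: every matching element survives.
function allMatches(info, min) {
  return info.filter(pair => pair[0] > min);
}

console.log(firstMatch(doc.info, threshold).length); // 1
console.log(allMatches(doc.info, threshold).length); // 2
```

With a range condition, the second form is the one that keeps every element past the threshold.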

How to $concat two fields inside a subdocument in mongodb

WHAT I WANT TO ACHIEVE
Let us say I have this object:
{
"_id" : ObjectId("5aec063380a7490014e88792"),
"personal_info" : {
"dialing_code": "+44",
"phone_number": "67467885664"
}
}
I need to concat the two values personal_info.dialing_code (+44) and phone_number (67467885664) into one value, +4467467885664, and compare it to a given value. I need to retrieve the specific record from the database that matches that value.
PROBLEM
I am having trouble concatenating two fields inside a subdocument and I am receiving this error:
{
"name": "MongoError",
"message": "$concat only supports strings, not object",
"ok": 0,
"errmsg": "$concat only supports strings, not object",
"code": 16702,
"codeName": "Location16702"
}
ATTEMPT #1
I have tried this:
UserModel.aggregate([
{ $unwind: '$personal_info' },
{$project: {
concat_p: {$concat: [
'$personal_info.dialing_code',
'$personal_info.phone_number'
]}
}}
])
It gives me the error mentioned above, and as a result I cannot do a $match right after.
ATTEMPT #2
I also tried this:
UserModel.aggregate([
{ $unwind: '$personal_info' },
{$project: {
p_dialing_code: '$personal_info.dialing_code',
p_phone_number: '$personal_info.phone_number',
concat_p: {$concat: [
'$p_dialing_code',
'$p_phone_number'
]}
}}
])
I have successfully taken the subdocument values out one level; however, when I try concatenating them, it produces null values. This is the result I am getting:
{
"_id": "5af0998036daa90014129d6e",
"p_dialing_code": "+44",
"p_phone_number": "13231213213244",
"concat_p": null
}
I know how to do it in the $match stage, but I have had no luck concatenating the values inside the subdocument. Clearly, I need to do this first before I can compare. Thanks
It seems like you have different types under personal_info.dialing_code and personal_info.phone_number fields. In your example $concat is applied to every document in your collection and that's why you're getting an exception, since $concat strictly expects its parameters to be strings.
So it will work fine for the document posted in your question but will throw an exception for something like this:
{
"_id" : ObjectId("5aec063380a7490014e88792"),
"personal_info" : {
"dialing_code": {},
"phone_number": "67467885664"
}
}
One way to fix this is to add a $match condition before $project and use the $type operator to keep only the documents that have strings in the fields you want to concatenate.
db.UserModel.aggregate([
{
$match: {
$expr: {
$and: [
{ $eq: [ { $type: "$personal_info.dialing_code" }, "string" ] },
{ $eq: [ { $type: "$personal_info.phone_number" }, "string" ] }
]
}
}
},
{$project: {
concat_p: {$concat: [
"$personal_info.dialing_code",
"$personal_info.phone_number"
]}
}}
])
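The idea of filtering on type before concatenating can be sketched in plain JavaScript. This is an in-memory emulation rather than the aggregation pipeline itself; `concatPhones` and the sample `users` array are hypothetical names used only for illustration.

```javascript
// Plain-JavaScript emulation (not the MongoDB API) of "$match on $type, then $concat".
function concatPhones(docs) {
  return docs
    // Keep only documents where both fields are strings (the $type check).
    .filter(d =>
      d.personal_info !== undefined &&
      typeof d.personal_info.dialing_code === "string" &&
      typeof d.personal_info.phone_number === "string")
    // Join the two strings (the $concat step).
    .map(d => ({
      _id: d._id,
      concat_p: d.personal_info.dialing_code + d.personal_info.phone_number
    }));
}

const users = [
  { _id: 1, personal_info: { dialing_code: "+44", phone_number: "67467885664" } },
  { _id: 2, personal_info: { dialing_code: {}, phone_number: "67467885664" } }
];

// Compare the joined value to the number being looked up.
const found = concatPhones(users).filter(d => d.concat_p === "+4467467885664");
console.log(found.length); // 1
```

The second document is skipped instead of raising an error, which is the effect the $type filter is meant to have.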

Mongo Sort by Count of Matches in Array

Let's say my test data is
db.multiArr.insert({"ID" : "fruit1","Keys" : ["apple", "orange", "banana"]})
db.multiArr.insert({"ID" : "fruit2","Keys" : ["apple", "carrot", "banana"]})
To get an individual fruit like carrot I do
db.multiArr.find({'Keys':{$in:['carrot']}})
When I do an $or query for carrot and banana, I see both records, fruit1 and then fruit2
db.multiArr.find({ $or: [{'Keys':{$in:['carrot']}}, {'Keys':{$in:['banana']}}]})
Result of the output should be fruit2 and then fruit1, because fruit2 has both carrot and banana
To actually answer this first, you need to "calculate" the number of matches to the given condition in order to "sort" the results to return with the preference to the most matches on top.
For this you need the aggregation framework, which is what you use for "calculation" and "manipulation" of data in MongoDB:
db.multiArr.aggregate([
{ "$match": { "Keys": { "$in": [ "carrot", "banana" ] } } },
{ "$project": {
"ID": 1,
"Keys": 1,
"order": {
"$size": {
"$setIntersection": [ ["carrot", "banana"], "$Keys" ]
}
}
}},
{ "$sort": { "order": -1 } }
])
On a MongoDB version older than 2.6, which lacks the $setIntersection and $size operators, you can use the longer form:
db.multiArr.aggregate([
{ "$match": { "Keys": { "$in": [ "carrot", "banana" ] } } },
{ "$unwind": "$Keys" },
{ "$group": {
"_id": "$_id",
"ID": { "$first": "$ID" },
"Keys": { "$push": "$Keys" },
"order": {
"$sum": {
"$cond": [
{ "$or": [
{ "$eq": [ "$Keys", "carrot" ] },
{ "$eq": [ "$Keys", "banana" ] }
]},
1,
0
]
}
}
}},
{ "$sort": { "order": -1 } }
])
In either case the function here is to first match the possible documents to the conditions by providing a "list" of arguments with $in. Once the results are obtained you want to "count" the number of matching elements in the array to the "list" of possible values provided.
In the modern form the $setIntersection operator compares the two "lists" returning a new array that only contains the "unique" matching members. Since we want to know how many matches that was, we simply return the $size of that list.
In older versions, you pull apart the document array with $unwind in order to perform operations on it since older versions lacked the newer operators that worked with arrays without alteration. The process then looks at each value individually and if either expression in $or matches the possible values then the $cond ternary returns a value of 1 to the $sum accumulator, otherwise 0. The net result is the same "count of matches" as shown for the modern version.
The final thing is simply to $sort the results based on the "count of matches" that was returned so the most matches are on "top". This is "descending order" and therefore you supply the -1 to indicate that.
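The match, count and sort steps can be mirrored in plain JavaScript. This is an in-memory sketch, not the aggregation framework; `rankByMatches` is a hypothetical helper name.

```javascript
// Plain-JavaScript emulation (not the MongoDB API) of $match + $setIntersection/$size + $sort.
function rankByMatches(docs, wanted) {
  const wantedSet = new Set(wanted);
  return docs
    // $match with $in: keep documents holding at least one wanted value.
    .filter(d => d.Keys.some(k => wantedSet.has(k)))
    // $size of $setIntersection: count the unique wanted values present.
    .map(d => ({
      ...d,
      order: new Set(d.Keys.filter(k => wantedSet.has(k))).size
    }))
    // $sort descending on the count.
    .sort((a, b) => b.order - a.order);
}

const fruits = [
  { ID: "fruit1", Keys: ["apple", "orange", "banana"] },
  { ID: "fruit2", Keys: ["apple", "carrot", "banana"] }
];

const ranked = rankByMatches(fruits, ["carrot", "banana"]);
console.log(ranked.map(d => d.ID + ":" + d.order).join(" ")); // fruit2:2 fruit1:1
```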
Addendum concerning $in and arrays
You are misunderstanding a couple of things about MongoDB queries for starters. The $in operator is actually intended for a "list" of arguments like this:
{ "Keys": { "$in": [ "carrot", "banana" ] } }
Which is essentially the shorthand way of saying "Match either 'carrot' or 'banana' in the property 'Keys'". And could even be written in long form like this:
{ "$or": [{ "Keys": "carrot" }, { "Keys": "banana" }] }
Which really should lead you to if it were a "singular" match condition, then you simply supply the value to match to the property:
{ "Keys": "carrot" }
So that should cover the misconception that you use $in to match a property that is an array within a document. Rather the "reverse" case is the intended usage where instead you supply a "list of arguments" to match a given property, be that property an array or just a single value.
The MongoDB query engine makes no distinction between a single value or an array of values in an equality or similar operation.
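A simplified way to picture this behaviour in plain JavaScript follows. It is a sketch of the idea only, not how the query engine is implemented, and it ignores cases such as comparing whole arrays for equality; `fieldMatches` and `fieldIn` are hypothetical names.

```javascript
// Plain-JavaScript sketch (not the MongoDB API): equality is tested against
// each element when the field is an array, or against the value itself otherwise.
function fieldMatches(fieldValue, wanted) {
  const values = Array.isArray(fieldValue) ? fieldValue : [fieldValue];
  return values.some(v => v === wanted);
}

// $in is then "does any value in the argument list match the field".
function fieldIn(fieldValue, list) {
  return list.some(wanted => fieldMatches(fieldValue, wanted));
}

console.log(fieldMatches(["apple", "carrot", "banana"], "carrot")); // true
console.log(fieldMatches("carrot", "carrot"));                      // true
console.log(fieldIn(["apple", "orange", "banana"], ["carrot", "banana"])); // true
console.log(fieldIn("apple", ["carrot", "banana"]));                // false
```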

How to concatenate all values and find specific substring in Mongodb?

I have json document like this:
{
"A": [
{
"C": "abc",
"D": "de"
},
{
"C": "fg",
"D": "hi"
}
]
}
I would like to check whether "A" contains the string ef or not:
first concatenate all values into abcdefghi, then search for ef.
In XML, XPATH it would be something like:
//A[contains(., 'ef')]
Is there any similar query in Mongodb?
All options are pretty horrible for this type of search, but there are a few approaches you can take. Please note though that the end case here is likely the best solution, but I present the options in order to illustrate the problem.
If your keys in the array "A" are consistently defined and always contained an array, you would be searching like this:
db.collection.aggregate([
// Filter the documents containing your parts
{ "$match": {
"$and": [
{ "$or": [
{ "A.C": /e/ },
{ "A.D": /e/ }
]},
{"$or": [
{ "A.C": /f/ },
{ "A.D": /f/ }
]}
]
}},
// Keep the original form and a copy of the array
{ "$project": {
"_id": {
"_id": "$_id",
"A": "$A"
},
"A": 1
}},
// Unwind the array
{ "$unwind": "$A" },
// Join the two fields and push to a single array
{ "$group": {
"_id": "$_id",
"joined": { "$push": {
"$concat": [ "$A.C", "$A.D" ]
}}
}},
// Copy the array
{ "$project": {
"C": "$joined",
"D": "$joined"
}},
// Unwind both arrays
{ "$unwind": "$C" },
{ "$unwind": "$D" },
// Join the copies and test if they are the same
{ "$project": {
"joined": { "$concat": [ "$C", "$D" ] },
"same": { "$eq": [ "$C", "$D" ] },
}},
// Discard the "same" elements and search for the required string
{ "$match": {
"same": false,
"joined": { "$regex": "ef" }
}},
// Project the original form of the matching documents
{ "$project": {
"_id": "$_id._id",
"A": "$_id.A"
}}
])
So apart from the horrible $regex matching there are a few hoops to go through in order to get the fields "joined" in order to again search for the string in sequence. Also note the reverse joining that is possible here that could possibly produce a false positive. Currently there would be no simple way to avoid that reverse join or otherwise filter it, so there is that to consider.
Another approach is to basically run everything through arbitrary JavaScript. The mapReduce method can be your vehicle for this. Here you can be a bit looser with the types of data that can be contained in "A" and try to tie in some more conditional matching to attempt to reduce the set of documents you are working on:
db.collection.mapReduce(
function () {
var joined = "";
if ( Object.prototype.toString.call( this.A ) === '[object Array]' ) {
this.A.forEach(function(doc) {
for ( var k in doc ) {
joined += doc[k];
}
});
} else {
joined = this.A; // presuming this is just a string
}
var id = this._id;
delete this["_id"];
if ( joined.match(/ef/) )
emit( id, this );
},
function(){}, // will not reduce
{
"query": {
"$or": [
{ "A": /ef/ },
{ "$and": [
{ "$or": [
{ "A.C": /e/ },
{ "A.D": /e/ }
]},
{"$or": [
{ "A.C": /f/ },
{ "A.D": /f/ }
]}
] }
]
},
"out": { "inline": 1 }
}
);
So you can use that with whatever arbitrary logic to search the contained objects. This one just differentiates between "arrays" and presumes otherwise a string, allowing the additional part of the query to just search for the matching "string" element first, and which is a "short circuit" evaluation.
But really at the end of the day, the best approach is to simply have the data present in your document, and you would have to maintain this yourself as you update the document contents:
{
"A": [
{
"C": "abc",
"D": "de"
},
{
"C": "fg",
"D": "hi"
}
],
"search": "abcdefghi"
}
So that is still going to invoke a horrible usage of $regex type queries but at least this avoids ( or rather shifts to writing the document ) the overhead of "joining" the elements in order to effect the search for your desired string.
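Maintaining such a field could look roughly like the following plain-JavaScript sketch, run in application code before the document is written. The helper names `buildSearch` and `withSearch` are hypothetical, and the sketch assumes every element of "A" has string fields "C" and "D" as in the sample.

```javascript
// Plain-JavaScript sketch (not the MongoDB API) of building the "search" field
// before saving the document. Assumes each element has string fields C and D.
function buildSearch(doc) {
  return doc.A.map(item => item.C + item.D).join("");
}

// Return a copy of the document with the joined field attached.
function withSearch(doc) {
  return { ...doc, search: buildSearch(doc) };
}

const stored = withSearch({
  A: [
    { C: "abc", D: "de" },
    { C: "fg", D: "hi" }
  ]
});

console.log(stored.search);            // abcdefghi
console.log(/ef/.test(stored.search)); // true
```

The helper would need to be called again whenever the contents of "A" change, which is the maintenance cost mentioned above.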
Where this eventually leads is that a "full blown" text search solution, and that means an external one at this time as opposed to the text search facilities in MongoDB, is probably going to be your best performance option.
Either using the "pre-stored" approach in creating your "joined" field or otherwise where supported ( Solr is one solution that can do this ) have a "computed field" in this text index that is created when indexing document content.
At any rate, those are the approaches and the general point of the problem. This is not XPath searching, nor is there some "XPath like" view of an entire collection in this sense, so you are best served by structuring your data towards the methods that are going to give you the best performance.
With all of that said, your sample here is a fairly contrived example, and if you had an actual use case for something "like" this, then that actual case may make a very interesting question indeed. Actual cases generally have different solutions than the contrived ones. But now you have something to consider.

Mongo. Narrowing down results of nested array

If I have a document like this:
{
"name" : "Foo",
"words" :
[
"lorem",
"ipsum",
"dolor",
"sit",
"amet",
...
]
}
Let's say this words array is pretty big. Now I need a query that would fetch that document:
db.docs.find({'name':'Foo'}) - that will get whole document
but what I want, instead of fetching the entire words array (cause it's too big) I would like to retrieve only elements that meet some criteria. Let's say I want to see only words that start with "a" or have a length of at least 3 characters.
You know maybe something like this:
// this won't work!
db.docs.find({
"$where":"(this.words.map(function(e){ if (e.length >=3) { return e } }))"
})
EDIT
You cannot filter array contents using find; you can only match that the array contains the condition. So in order to filter the contents of the array you need to make use of aggregate:
db.docs.aggregate([
// Still makes sense to match the documents that meet the condition
{ "$match": {
"name": "Foo",
"words": { "$regex": "^[A-Za-z0-9_]{4,}" }
}},
// Unwind the array to "de-normalize"
{ "$unwind": "$words" },
// Actually "filter" the array elements
{ "$match": { "words": { "$regex": "^[A-Za-z0-9_]{4,}" } } },
// Group back the document with the "filtered" array
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"words": { "$push": "$words" }
}}
])
That makes use of a regular expression condition that will match at least 4 characters from the start of the string. The ^ anchor is quite important here as it allows an index to be used, which is much more efficient than the alternatives.
The result returned will look like this:
{
"result" : [
{
"_id" : ObjectId("5341f0476cbcc02b995092ac"),
"name" : "Foo",
"words" : [
"lorem",
"ipsum",
"dolor"
]
}
],
"ok" : 1
}
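The unwind, match and group sequence amounts to filtering each document's array, which can be sketched in plain JavaScript. This is an in-memory emulation rather than the pipeline itself; `filterWords` and the sample data are made up for illustration.

```javascript
// Plain-JavaScript emulation (not the MongoDB API) of
// $match -> $unwind -> $match -> $group on the "words" array.
function filterWords(docs, name, pattern) {
  return docs
    // First $match: the right name and at least one qualifying word.
    .filter(d => d.name === name && d.words.some(w => pattern.test(w)))
    // $unwind + $match + $group: keep only the qualifying words.
    .map(d => ({ ...d, words: d.words.filter(w => pattern.test(w)) }));
}

const docs = [
  { name: "Foo", words: ["lorem", "sit", "dolor", "at"] },
  { name: "Bar", words: ["ipsum"] }
];

// Same pattern as above: at least 4 word characters from the start.
const result = filterWords(docs, "Foo", /^[A-Za-z0-9_]{4,}/);
console.log(result[0].words.join(",")); // lorem,dolor
```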
You can also throw a lot of arbitrary JavaScript at mapReduce and test the length of elements in the array, but that will take considerably longer to execute.
--
The terms are quite simple, you simply add the additional operator to the query document as so:
db.docs.find({ "name": "Foo", "$where": "(this.words.length > 3)" })
You really should not be using the $where operator unless absolutely necessary, and even then you really should think about what you are doing. Heed the warnings that are given in that document.
As stated in the manual page for $size, probably the best way to deal with detecting array length for a given range (rather than exact) is to create a "counter" field in your document that is updated as elements are added/removed from the array. This makes a very simple and efficient query:
db.docs.find({ "name": "Foo", "counter": { "$gt": 3 } })
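Keeping the counter in step with the array is the application's job. The plain-JavaScript sketch below only illustrates the bookkeeping on an in-memory object; `addWord` and `removeWord` are hypothetical helpers. In MongoDB itself one would typically pair an array update such as `$push` with `$inc` on the counter in the same update.

```javascript
// Plain-JavaScript sketch (not the MongoDB API) of keeping "counter" equal
// to the length of "words" whenever the array changes.
function addWord(doc, word) {
  doc.words.push(word);
  doc.counter += 1;
  return doc;
}

function removeWord(doc, word) {
  const index = doc.words.indexOf(word);
  if (index !== -1) {
    // Only decrement when something was actually removed.
    doc.words.splice(index, 1);
    doc.counter -= 1;
  }
  return doc;
}

const entry = { name: "Foo", words: ["lorem", "ipsum", "dolor"], counter: 3 };
addWord(entry, "sit");
removeWord(entry, "missing");
console.log(entry.counter, entry.counter > 3); // 4 true
```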
Of course from MongoDB versions 2.6 and upwards you can also do this:
db.docs.aggregate([
{ "$project": {
"name": 1,
"words": 1,
"count": { "$size": "$words" }
}},
{ "$match": {
"count": { "$gt": 3 }
}}
])
Either of those forms is going to perform a lot better than using something that is going to remove the use of an index and then invoke the JavaScript interpreter over each resulting document. Or even just use the $size operator for an exact size of the array.

How to sort by 'value' of a specific key within a property stored as an array with k-v pairs in mongodb

I have a mongodb collection, let's call it rows containing documents with the following general structure:
{
"setid" : 154421,
"date" : ISODate("2014-02-22T14:06:48.229Z"),
"version" : 2,
"data" : [
{
"k" : "name",
"v" : "ryan"
},
{
"k" : "points",
"v" : "375"
},
{
"k" : "email",
"v" : "ryan#123.com"
}
],
}
There is no guarantee what values of k and v might populate the "data" property for any particular document (e.g. other documents might have 5 k-v pairs with different key names in it). The only rule is that documents with the same setid have the same k-v pairs (i.e. the rows collection might hold 100 other documents with setid = 154421 that have the same set of 3 keys in the data property: "name", "points", "email", with their own respective values).
How would one, with this setup, construct a query to retrieve all rows with a particular setid sorted by points? I need, in effect, some way of saying 'sort by the field data.v where the value of k==points', or something like that.
Something like this:
db.rows.find({setid:154421},{$sort:{'data.v',-1}, {$where: k:'points'}}})
I know this is the incorrect syntax, but I'm just taking a stab at it to illustrate my point.
Is it possible?
Assuming that what you want would be all the documents that have the "points" value as a "key" in the array, and then sort on the "value" for that "key", then this is a little out of scope for the .find() method.
Reason being, if you did something like this:
db.collection.find({
"setid": 154421, "data.k": "points" }
).sort({ "data.v" : -1 })
The problem is that even though the matched documents do contain an element with the key "points", there is no way of telling which data.v you are referring to for the sort. Also, a sort within .find() results will not do something like this:
db.collection.find({
"setid": 154421, "data.k": "points" }
).sort({ "data.$.v" : -1 })
Which would be trying to use a positional operator within a sort, essentially telling it which element's v value to use. But this is not supported and is not likely to be; the most likely explanation is that the "index" value would differ in every document.
But this kind of selective sorting can be done with the use of .aggregate().
db.collection.aggregate([
// Actually shouldn't need the setid
{ "$match": { "data": {"$elemMatch": { "k": "points" } } } },
// Saving the original document before you filter
{ "$project": {
"doc": {
"_id": "$_id",
"setid": "$setid",
"date": "$date",
"version": "$version",
"data": "$data"
},
"data": "$data"
}},
// Unwind the array
{ "$unwind": "$data" },
// Match the "points" entries, so filtering to only these
{ "$match": { "data.k": "points" } },
// Sort on the value, presuming you want the highest
{ "$sort": { "data.v": -1 } },
// Restore the document
{ "$project": {
"setid": "$doc.setid",
"date": "$doc.date",
"version": "$doc.version",
"data": "$doc.data"
}}
])
Of course that presumes the data array only has the one element that has the key points. If there were more than one, you would need to $group before the sort like this:
// Group to remove the duplicates and get highest
{ "$group": {
"_id": "$doc",
"value": { "$max": "$data.v" }
}},
// Sort on the value
{ "$sort": { "value": -1 } },
// Restore the document
{ "$project": {
"_id": "$_id._id",
"setid": "$_id.setid",
"date": "$_id.date",
"version": "$_id.version",
"data": "$_id.data"
}}
So there is one usage of .aggregate() in order to do some complex sorting on documents and still return the original document result in full.
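The selective sort can be pictured with a plain-JavaScript sketch over in-memory documents. It is not the aggregation pipeline; `sortByKey` and the sample rows are hypothetical. One assumption of the sketch that goes beyond the pipeline above: it converts v to a number before comparing, because the sample stores points as the string "375" and strings would otherwise compare lexically.

```javascript
// Plain-JavaScript emulation (not the MongoDB API) of sorting whole documents
// by the "v" of the element whose "k" equals a given key.
function sortByKey(docs, key) {
  const valueOf = d => d.data.find(e => e.k === key).v;
  return docs
    // Keep only documents that have the key at all.
    .filter(d => d.data.some(e => e.k === key))
    // Assumption of this sketch: compare numerically, highest first.
    .sort((a, b) => Number(valueOf(b)) - Number(valueOf(a)));
}

const rows = [
  { setid: 154421, data: [{ k: "name", v: "ryan" }, { k: "points", v: "375" }] },
  { setid: 154421, data: [{ k: "name", v: "anna" }, { k: "points", v: "1000" }] },
  { setid: 154421, data: [{ k: "name", v: "lee" }, { k: "points", v: "90" }] },
  { setid: 154421, data: [{ k: "name", v: "nopoints" }] }
];

const sorted = sortByKey(rows, "points");
console.log(sorted.map(d => d.data[0].v).join(",")); // anna,ryan,lee
```

Each returned document keeps its full data array, matching the goal of returning the original document in full.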
Do some more reading on aggregation operators and the general framework. It's a useful tool to learn that takes you beyond .find().