Array intersection in MongoDB - mongodb

OK, there are a couple of things going on here. I have two collections: test and test1. The documents in both collections have an array field (tags and tags1, respectively) that contains some tags. I need to find the intersection of these tags and also fetch the whole document from collection test1 if even a single tag matches.
> db.test.find();
{
"_id" : ObjectId("5166c19b32d001b79b32c72a"),
"tags" : [
"a",
"b",
"c"
]
}
> db.test1.find();
{
"_id" : ObjectId("5166c1c532d001b79b32c72b"),
"tags1" : [
"a",
"b",
"x",
"y"
]
}
> db.test.find().forEach(function(doc){db.test1.find({tags1:{$in:doc.tags}})});
Surprisingly this doesn't return anything. However when I try it with a single document, it works:
> var doc = db.test.findOne();
> db.test1.find({tags1:{$in:doc.tags}});
{ "_id" : ObjectId("5166c1c532d001b79b32c72b"), "tags1" : [ "a", "b", "x", "y" ] }
But this is only part of what I need; I need the intersection as well. So I tried this:
> db.test1.find({tags1:{$in:doc.tags}},{"tags1.$":1});
{ "_id" : ObjectId("5166c1c532d001b79b32c72b"), "tags1" : [ "a" ] }
But it returned just "a", whereas both "a" and "b" are in tags1. Does the positional operator return just the first match? Also, using $in won't exactly give me an intersection. How can I get an intersection (it should return "a" and "b") irrespective of which array is compared against the other?
Now say there's an operator that can do this:
> db.test1.find({tags1:{$intersection:doc.tags}},{"tags1.$":1});
{ "_id" : ObjectId("5166c1c532d001b79b32c72b"), "tags1" : [ "a", "b" ] }
My requirement is, I need the entire tags1 array PLUS this intersection, in the same query like this:
> db.test1.find({tags1:{$intersection:doc.tags}},{"tags1":1, "tags1.$":1});
{ "_id" : ObjectId("5166c1c532d001b79b32c72b"), "tags1": [ "a", "b", "x", "y" ],
"tags1" : [ "a", "b" ] }
But this is invalid JSON. Is renaming the key possible, or is this only possible through the aggregation framework (and across different collections)? I tried the above query with $in, but it behaved as if it totally ignored the "tags1":1 projection.
PS: I am going to have at least 10k docs in test1 and very few (<10) in test. And this query is in real-time, so I want to avoid mapreduce :)
Thanks for any help!

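In newer versions (MongoDB 2.6 and later, which added $setIntersection) you can use the aggregation framework to accomplish this.
db.test1.aggregate([
    // Keep only the test1 documents that share at least one tag with doc
    {
        $match: {
            tags1: {
                $in: doc.tags
            }
        }
    },
    // Return the full tags1 array plus the computed intersection
    {
        $project: {
            tags1: 1,
            intersection: {
                $setIntersection: [doc.tags, "$tags1"]
            }
        }
    }
]);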
As you can see, the match portion is exactly the same as your initial find() query. The project portion generates the result fields. In this case, it selects tags1 from the matching documents and also creates intersection from the input and the matching docs.

Mongo doesn't have any inherent ability to retrieve array intersections. If you really need to use ad-hoc querying, get the intersection on the client side.
On the other hand, consider using Map-Reduce and storing its output as a collection. You can augment the returned objects in the finalize section to add the intersecting tags. Schedule the MR job with cron to run every few seconds. You get the benefit of a permanent collection you can query from the client side.
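A minimal sketch of the client-side option mentioned above, in plain JavaScript. It assumes the matching test1 document has already been fetched with the $in query; intersectTags and the sample values are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: intersect two tag arrays on the client.
function intersectTags(a, b) {
  const lookup = new Set(b);
  // Keep the order of `a`, drop duplicates, keep only tags also present in `b`.
  return [...new Set(a)].filter((tag) => lookup.has(tag));
}

// Sample data copied from the question.
const docTags = ["a", "b", "c"];
const test1Doc = { tags1: ["a", "b", "x", "y"] };

// Attach the intersection next to the untouched tags1 array.
const result = {
  tags1: test1Doc.tags1,
  intersection: intersectTags(test1Doc.tags1, docTags),
};
console.log(result.intersection); // ["a", "b"]
```

This returns both the full tags1 array and the intersection under a separate key, which avoids the duplicate-key problem from the question.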

If you want this in real time, you should consider moving away from server-side JavaScript, which runs on a single thread and is therefore likely to be quite slow (this is no longer true as of v2.4; see http://docs.mongodb.org/manual/core/server-side-javascript/).
The positional operator only returns the first matching value. Without knowing the internal implementation, from a performance point of view it doesn't even make sense to keep looking for further matches once the document has already been evaluated as a match. So I doubt that you can use it for this.
I don't know if you need the Cartesian product for your search, but I would consider merging the tags from your few test documents into one array and then running a single $in search with it on test1, returning all matching documents. On your local machine you could then have multiple threads that generate the intersection for each document.
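The merging step described above could look roughly like this. It is only a sketch in plain JavaScript: the second test document and the name mergeTags are made up for illustration, and the resulting filter is what you would pass to db.test1.find() in the shell.

```javascript
// Sample contents of the small `test` collection (second document is hypothetical).
const testDocs = [
  { _id: 1, tags: ["a", "b", "c"] },
  { _id: 2, tags: ["c", "d"] },
];

// Merge every tags array into one duplicate-free list.
function mergeTags(docs) {
  return [...new Set(docs.flatMap((doc) => doc.tags))];
}

// One filter for a single round trip against test1.
const filter = { tags1: { $in: mergeTags(testDocs) } };
console.log(JSON.stringify(filter));
```

The per-document intersections then still have to be computed on the client, since the single query only tells you which test1 documents matched at least one of the merged tags.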
Depending on how frequently your test1 and test collections change, and how often you perform this query, you might precalculate this information. That would allow you to simply query the field that contains the intersection information.
The document is invalid because it has two fields named tags1.

Related

MongoDB - Looking up documents based on criteria defined by the documents themselves

Overall, I am trying to find a system design to quickly look up stored objects whose metadata matches data bundled on incoming events. Which fields are required, however, are themselves part of the stored objects, and are not fields that I can hardcode into a lookup query.
My system has a policies collection stored in MongoDB with documents that look like this:
{
id: 123,
name: "Jason's Policy",
requirements: {
"var1": "aaa",
"var2": "bbb"
// could have any number more, and each policy can have different field/values under requirements
}
}
My system receives events that look like this:
// Event 1 - matches all requirements under above policy
{
id: 777,
"var1": "aaa",
"var2": "bbb"
}
// Event 2 - does not match all requirements from above policy since var1 is undefined
{
id: 888,
"var2": "bbb",
"var3": "zzz"
}
As I receive events, how can I efficiently look up all the policies whose requirements are fully satisfied by the values received in the event?
As an example, in the sample data above, event 1 should return the policy (since var1 and var2 match the policy requirements), but event 2 should not return the policy (since var1 does not match/ is missing).
I can think of brute-force ways to do this on the application server itself (think nested for loops) but efficiency will be key as we receive hundreds of events per second.
I am open to recommendations for document schema changes that can satisfy the general problem (looking up documents based on criteria itself defined in our documents). I am also open to any overall design recommendations that address the problem, too (perhaps there is a better way to structure our system to trigger policy actions in response to events).
Thanks!
Not sure what the exact scenario is, but I can think of two here:
1. You need an exact match. For that you can run the query below:
db.getCollection('test').find({'requirements':{'var1':'aaa','var2':'bbb'}})
For the above query to work, you need to save the requirements object with its keys (var1 and var2) sorted, because an exact match on an embedded document is sensitive to field order.
2. You need to match documents in which all the given properties exist, and you don't care if there is anything extra in the policies collection. For that you need to change how policies are stored, like this:
{
"_id" : ObjectId("603250b0775428e32b9b303f"),
"id" : 123,
"name" : "Jason's Policy",
"requirements" : {
"var1" : "aaa",
"var2" : "bbb"
},
"requirements_search" : [
"var1aaa",
"var2bbb",
"var3ccc"
]
}
then you can run the below query,
db.getCollection('test').find({'requirements_search':{'$all' : ['var1aaa','var2bbb']}})
I found an answer to my question in another post: Find Documents in MongoDB whose with an array field is a subset of a query array.
MongoDB offers a $setIsSubset operator that can check if a document's array values are a subset of the array values in a query. Translated to my use case: if a given policy's requirements are a subset of the event's metadata, then I know that the event data fully meets the requirements for that policy.
For completeness, below is the MongoDB aggregation that solved my problem. I still need to research if there is a more efficient overall system design to facilitate what I need, but at a minimum, this Mongo aggregation will fetch the results that I need.
// Requires us to flatten policy requirements into an array like the following
//
// {
// "id" : 123,
// "name" : "Jason's Policy",
// "requirements" : [
// "var1_aaa",
// "var2_bbb"
// ]
// }
//
// Event matches all policy requirements and has extra unrelated attributes
// {
// id: 777,
// "var1": "aaa",
// "var2": "bbb",
// "var3": "ccc"
// }
db.collection.aggregate([
{$project: {
doc: '$$ROOT',
isSubset: {$setIsSubset: ['$requirements', ['var1_aaa', 'var2_bbb', 'var3_ccc']]}
}},
{$match: {isSubset: true}},
{$project: {_id: 0, 'doc.name': 1}}
])
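For illustration, here is one way the event could be flattened into the token array that the pipeline above expects. This is a sketch under assumptions: the helper names eventToTokens and buildPolicyPipeline are hypothetical, and the event's own id field is assumed not to be part of any requirement.

```javascript
// Turn an event into "key_value" tokens, skipping its own id field.
function eventToTokens(event) {
  return Object.entries(event)
    .filter(([key]) => key !== "id")
    .map(([key, value]) => `${key}_${value}`);
}

// Build the aggregation pipeline for one incoming event.
function buildPolicyPipeline(event) {
  return [
    {
      $project: {
        doc: "$$ROOT",
        isSubset: { $setIsSubset: ["$requirements", eventToTokens(event)] },
      },
    },
    { $match: { isSubset: true } },
    { $project: { _id: 0, "doc.name": 1 } },
  ];
}

const event = { id: 777, var1: "aaa", var2: "bbb", var3: "ccc" };
console.log(eventToTokens(event)); // ["var1_aaa", "var2_bbb", "var3_ccc"]
```

The returned array can be passed straight to db.collection.aggregate(). Note that a plain "_" separator becomes ambiguous if keys or values can themselves contain underscores, so a different separator may be safer in practice.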

Count documents based on Array value and inner Array value

Before I explain my use case, I'd like to state that yes, I could change this application so that it would store things in a different manner or even split it into 2 collections for that matter. But that's not my intention, and I'd rather want to know if this is at all possible within MongoDB (since I am quite new to MongoDB). I can for sure work around this problem if I'd really need to, but rather looking for a method to achieve what I want (no I am not being lazy here, I really want to know a way to do this).
Let's get to the problem then.
I have a document like below:
{
"_id" : ObjectId("XXXXXXXXXXXXXXXXXXXXX"),
"userId" : "XXXXXXX",
"licenses" : [
{
"domain" : "domain1.com",
"addons" : [
{"slug" : "1"},
{"slug" : "2"}
]
},
{
"domain" : "domain2.com",
"addons" : [
{"slug" : "1"},
]
}
]
}
My goal is to check if a specific domain has a specific addon. When I use the below query to count the documents with domain: domain2.com and addon slug: 2 the result should be: 0. However with the below query it returns 1. I know that this is because the query is executed document wide and not just the license index that matched domain2.com. So my question is, how to do a sub $and (or however you'd call it)?
db.test.countDocuments(
{$and: [
{"licenses.domain": "domain2.com"},
{"licenses.addons.slug": "2"},
]}
)
Basically I am looking for something like this (below isn't working obviously), but below should return 0, not 1:
db.test.countDocuments(
{$and: [
{
"licenses.domain": "domain2.com",
$and: [
{ "licenses.addons.slug": "2"}
]
}
]}
)
I know there are $group and $filter operators, and I have been trying many combinations to no avail. I am lost at this point; I feel like I am completely missing the logic of Mongo here. However, I believe this must be relatively easy to accomplish with a single query (just not for me, I guess).
I have been trying to find my answer on the official documentation and via stack overflow/google, but I really couldn't find any such use case.
Any help is greatly appreciated! Thanks :)
What you are describing is searching for a document whose array contains a single element that matches multiple criteria.
This is exactly what the $elemMatch operator does.
Try using this for the filter part:
{
licenses: {
$elemMatch: {
domain: "domain2.com",
"addons.slug": "2"
}
}
}
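To see why the two filters give different counts, here is a plain-JavaScript emulation of the two matching rules applied to the sample document. It is not MongoDB's matching engine, just a sketch of the semantics; both function names are hypothetical.

```javascript
const doc = {
  userId: "XXXXXXX",
  licenses: [
    { domain: "domain1.com", addons: [{ slug: "1" }, { slug: "2" }] },
    { domain: "domain2.com", addons: [{ slug: "1" }] },
  ],
};

// Emulates {"licenses.domain": d, "licenses.addons.slug": s}:
// each condition may be satisfied by a DIFFERENT array element.
function matchesAcrossElements(d, domain, slug) {
  const domainHit = d.licenses.some((l) => l.domain === domain);
  const slugHit = d.licenses.some((l) => l.addons.some((a) => a.slug === slug));
  return domainHit && slugHit;
}

// Emulates {licenses: {$elemMatch: {domain: d, "addons.slug": s}}}:
// both conditions must hold for the SAME array element.
function matchesSameElement(d, domain, slug) {
  return d.licenses.some(
    (l) => l.domain === domain && l.addons.some((a) => a.slug === slug)
  );
}

console.log(matchesAcrossElements(doc, "domain2.com", "2")); // true  (counted as 1)
console.log(matchesSameElement(doc, "domain2.com", "2"));    // false (counted as 0)
```

In the shell, the $elemMatch filter above goes directly into the count, i.e. db.test.countDocuments({licenses: {$elemMatch: {domain: "domain2.com", "addons.slug": "2"}}}).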

Return MongoDB documents that don't contain specific inner array items

How can I return a set of documents, each not containing a specific item in an inner array?
My data scheme is:
Posts:
{
"_id" : ObjectId("57f91ec96241783dac1e16fe"),
"votedBy" : [
{
"userId" : "101",
"vote": 1
},
{
"userId" : "202",
"vote": 2
}
],
"__v" : NumberInt(0)
}
I want to return a set of posts, none of which contain a given userId in any of the votedBy array items.
The official documentation implies that this is possible:
MongoDB documentation: Field with no specific array index
Though it returns an empty set (for the more simple case of finding a document with a specific array item).
It seems like I have to know the index for a correct set of results, like:
votedBy.0.userId.
This Question is the closest I found, with this solution (Applied on my scheme):
db.collection.find({"votedBy": { $not: {$elemMatch: {userId: 101 } } } })
It works fine if the only inner document in the array matches the one I wish not to return, but in the example case I specified above, the document returns, because it finds the userId=202 inner document.
Just to clarify: I want to return all the documents, that NONE of their votedBy array items have the given userId.
I also tried a simpler array, containing only the userId's as an array of Strings, but still, each of them receives an Id and the search process is just the same.
Another solution I tried is using a different collection for uservotes, and applying a lookup to perform a SQL-similar join, but it seems like there is an easier way.
I am using mongoose (node.js).
Use $ne on the embedded userId:
db.collection.find({'votedBy.userId': {$ne: '101'}})
It will filter out all the documents that have at least one votedBy element with userId = "101".
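A plain-JavaScript emulation of what this $ne filter does on the sample post (a sketch of the semantics only; noneVotedBy is a hypothetical name). It also hints at one possible reason the $not/$elemMatch attempt in the question returned the document: that query used the number 101, while the stored userId is the string "101", and a string never equals a number in this comparison.

```javascript
const post = {
  votedBy: [
    { userId: "101", vote: 1 },
    { userId: "202", vote: 2 },
  ],
};

// Emulates {'votedBy.userId': {$ne: value}}:
// the document matches only if NO array element has that userId.
function noneVotedBy(doc, value) {
  return (doc.votedBy || []).every((v) => v.userId !== value);
}

console.log(noneVotedBy(post, "101")); // false: the post is excluded
console.log(noneVotedBy(post, 101));   // true: number vs. string, the post is returned
```

So the value in the filter needs to have the same type as the stored userId for the exclusion to work.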

Storing a query in Mongo

This is the case: a webshop in which I want to configure which items should be listed in the shop based on a set of parameters.
I want this to be configurable, because that allows me to experiment with different parameters and also change their values easily.
I have a Product collection that I want to query based on multiple parameters.
A couple of these are found here:
within product:
"delivery" : {
"maximum_delivery_days" : 30,
"average_delivery_days" : 10,
"source" : 1,
"filling_rate" : 85,
"stock" : 0
}
but also other parameters exist.
An example of such query to decide whether or not to include a product could be:
"$or" : [
{
"delivery.stock" : 1
},
{
"$or" : [
{
"$and" : [
{
"delivery.maximum_delivery_days" : {
"$lt" : 60
}
},
{
"delivery.filling_rate" : {
"$gt" : 90
}
}
]
},
{
"$and" : [
{
"delivery.maximum_delivery_days" : {
"$lt" : 40
}
},
{
"delivery.filling_rate" : {
"$gt" : 80
}
}
]
},
{
"$and" : [
{
"delivery.delivery_days" : {
"$lt" : 25
}
},
{
"delivery.filling_rate" : {
"$gt" : 70
}
}
]
}
]
}
]
Now to make this configurable, I need to be able to handle boolean logic, parameters and values.
So, I got the idea, since such query itself is JSON, to store it in Mongo and have my Java app retrieve it.
Next thing is using it in the filter (e.g. find, or whatever) and work on the corresponding selection of products.
The advantage of this approach is that I can actually analyse the data and the effectiveness of the query outside of my program.
I would store it by name in the database. E.g.
{
"name": "query1",
"query": { the thing printed above starting with "$or"... }
}
using:
db.queries.insert({
"name" : "query1",
"query": { the thing printed above starting with "$or"... }
})
Which results in:
2016-03-27T14:43:37.265+0200 E QUERY Error: field names cannot start with $ [$or]
at Error (<anonymous>)
at DBCollection._validateForStorage (src/mongo/shell/collection.js:161:19)
at DBCollection._validateForStorage (src/mongo/shell/collection.js:165:18)
at insert (src/mongo/shell/bulk_api.js:646:20)
at DBCollection.insert (src/mongo/shell/collection.js:243:18)
at (shell):1:12 at src/mongo/shell/collection.js:161
But I CAN STORE it using Robomongo, but not always. Obviously I am doing something wrong. But I have NO IDEA what it is.
If it fails, and I create a brand new collection and try again, it succeeds. Weird stuff that goes beyond what I can comprehend.
But when I try updating values in the "query", changes are not going through. Never. Not even sometimes.
I can however create a new object and discard the previous one. So, the workaround is there.
db.queries.update(
{"name": "query1"},
{"$set": {
... update goes here ...
}
}
)
doing this results in:
WriteResult({
"nMatched" : 0,
"nUpserted" : 0,
"nModified" : 0,
"writeError" : {
"code" : 52,
"errmsg" : "The dollar ($) prefixed field '$or' in 'action.$or' is not valid for storage."
}
})
seems pretty close to the other message above.
Needless to say, I am pretty clueless about what is going on here, so I hope some of the wizards here are able to shed some light on the matter.
I think the error message contains the important info you need to consider:
QUERY Error: field names cannot start with $
Since you are trying to store a query (or part of one) in a document, you'll end up with attribute names that contain mongo operator keywords (such as $or, $ne, $gt). The mongo documentation actually references this exact scenario - emphasis added
Field names cannot contain dots (i.e. .) or null characters, and they must not start with a dollar sign (i.e. $)...
I wouldn't trust 3rd party applications such as Robomongo in these instances. I suggest debugging/testing this issue directly in the mongo shell.
My suggestion would be to store an escaped version of the query in your document as to not interfere with reserved operator keywords. You can use the available JSON.stringify(my_obj); to encode your partial query into a string and then parse/decode it when you choose to retrieve it later on: JSON.parse(escaped_query_string_from_db)
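A small sketch of that round trip in plain JavaScript, using a shortened version of the query from the question. The variable names are illustrative only.

```javascript
// Shortened version of the query from the question.
const query = {
  $or: [
    { "delivery.stock": 1 },
    {
      $and: [
        { "delivery.maximum_delivery_days": { $lt: 60 } },
        { "delivery.filling_rate": { $gt: 90 } },
      ],
    },
  ],
};

// The document to insert: "query" is now a plain string, so no field
// name in the stored document starts with "$" or contains ".".
const storedDoc = { name: "query1", query: JSON.stringify(query) };

// Later, after reading the document back, turn the string into a filter again.
const filter = JSON.parse(storedDoc.query);
console.log(typeof storedDoc.query); // "string"
console.log(Object.keys(filter));    // ["$or"]
```

The trade-off is that the server only sees an opaque string, so individual thresholds inside the stored query cannot be queried or updated with $set; the whole string has to be replaced.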
Your approach of storing the query as a JSON object in MongoDB is not viable.
You could potentially store your query logic and fields in MongoDB, but you have to have an external app build the query with the proper MongoDB syntax.
MongoDB queries contain operators, and some of those have special characters in them.
There are rules for MongoDB field names. These rules do not allow for special characters.
Look here: https://docs.mongodb.org/manual/reference/limits/#Restrictions-on-Field-Names
The probable reason you can sometimes successfully create the doc using Robomongo is because Robomongo is transforming your query into a string and properly escaping the special characters as it sends it to MongoDB.
This also explains why your attempt to update them never works. You tried to create a document, but instead created something that is a string object, so your update conditions are probably not retrieving any docs.
I see two problems with your approach.
In following query
db.queries.insert({
"name" : "query1",
"query": { the thing printed above starting with "$or"... }
})
Valid JSON expects key-value pairs. Here, in "query", you are storing an object without a key. You have two options: either store the query as text, or create another key inside the curly braces.
The second problem is that you are storing query values without wrapping them in quotes. All string values must be wrapped in quotes.
so your final document should appear as
db.queries.insert({
"name" : "query1",
"query": 'the thing printed above starting with "$or"... '
})
Now try, it should work.
Obviously my attempt to store a query in Mongo the way I did was foolish, as became clear from the answers from both #bigdatakid and #lix. So what I finally did was this: I altered the naming of the fields to comply with the Mongo requirements.
E.g. instead of $or I used _$or etc. and instead of using a . inside the name I used a #. Both of which I am replacing in my Java code.
This way I can still easily try and test the queries outside of my program. In my Java program I just change the names and use the query. Using just 2 lines of code. It simply works now. Thanks guys for the suggestions you made.
String documentAsString = query.toJson().replaceAll("_\\$", "\\$").replaceAll("#", ".");
Object q = JSON.parse(documentAsString);
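For comparison, the same renaming idea can be applied key by key instead of on the whole JSON string. The sketch below is plain JavaScript rather than the Java used above, and every name in it (mapKeys, encodeKey, decodeKey) is hypothetical. Working on keys only means that a "#" or "_$" appearing inside a value is left alone, which a global string replace would not guarantee.

```javascript
// Recursively rebuild a query object, renaming every key with renameKey.
function mapKeys(value, renameKey) {
  if (Array.isArray(value)) {
    return value.map((item) => mapKeys(item, renameKey));
  }
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [renameKey(key), mapKeys(inner, renameKey)])
    );
  }
  return value; // numbers, strings and booleans pass through untouched
}

// "$or" -> "_$or", "delivery.stock" -> "delivery#stock"
const encodeKey = (key) => (key.startsWith("$") ? "_" + key : key).split(".").join("#");
// Reverse of encodeKey
const decodeKey = (key) => (key.startsWith("_$") ? key.slice(1) : key).split("#").join(".");

const query = { $or: [{ "delivery.stock": 1 }, { "delivery.filling_rate": { $gt: 90 } }] };
const storable = mapKeys(query, encodeKey); // safe to insert
const restored = mapKeys(storable, decodeKey); // usable as a filter again
console.log(JSON.stringify(storable));
```

This assumes, as in the scheme above, that real field names never contain "#" and never start with "_$".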

How $id helps in mongodb?

I have made a MongoDB document a reference to another document, but I think it is not working the way I want it to!
For example:
> db.ttt.insert({_id: "a", b:"b" })
> db.ttt.insert({_id: "b", b: {$id:"a" } })
> db.ttt.find()
{ "_id" : "a", "b" : "b" }
{ "_id" : "b", "b" : { "$id" : "a" } }
Since I am making my last insertion a reference to the first, it should be equivalent to:
{
_id: "b",
b: {
{_id: "a", b:"b" }
}
}
So why does this query fail?
> db.ttt.find({"b.b":"b"} )
I may have misunderstood how $id works. But if this can't be done by referencing, what other choices do I have? And what is the advantage of referencing?
Firstly, embedding a document inside a document is different from referencing a document. In your case, you are referencing a document rather than embedding it, so you are not supposed to treat it like an embedded document. What you are doing is querying it as if it were an embedded document, which it is not.
The MongoDB documentation is very clear about how referencing can be used, and I think you should read its section on references. In short: once you have the result from db.ttt.find({ "_id" : "b"}), your application should make another query to find the referenced document, using the reference you got from the first query.
The important thing to remember is that with an embedded document you can run one query to get the result, whereas with a referenced document you need a second query.
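The two-step lookup can be illustrated with a plain-JavaScript stand-in for the collection. This is only a sketch: the array ttt mimics the two documents from the question, and findById is a hypothetical helper standing in for db.ttt.findOne({_id: ...}).

```javascript
// In-memory stand-in for the ttt collection from the question.
const ttt = [
  { _id: "a", b: "b" },
  { _id: "b", b: { $id: "a" } },
];

// Hypothetical helper mimicking db.ttt.findOne({_id: id}).
function findById(collection, id) {
  return collection.find((doc) => doc._id === id) || null;
}

// First query: fetch the document that holds the reference.
const referrer = findById(ttt, "b");
// Second query: follow the stored id to the referenced document.
const referenced = findById(ttt, referrer.b.$id);
console.log(referenced); // { _id: "a", b: "b" }
```

The stored { $id: "a" } is just data; nothing resolves it automatically, which is why a filter such as {"b.b": "b"} finds nothing on the referring document.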