Could there be a (unique) workaround for duplicate "$or" keys when working with boolean logic in MongoDB?

The question I asked earlier about querying in MongoDB resolved that issue, but another question grew out of the original idea.
Under similar conditions, suppose that I'm trying to query:
Example: {
"AttributeA": type,
"AttributeB": type,
"AttributeC": type,
"AttributeD": type
etc...
}
But I want to find all elements given the conditions where:
(Attribute A matches criteria1 or Attribute B matches criteria2)
and
(Attribute C matches criteria3 or Attribute D matches criteria4 or Attribute E matches criteria5)
The $in operator only expresses an $or condition when every alternative tests the same attribute (e.g. in my previous question, AttributeC matching criteria3, 4, or 5). So the general layout of this new query would be more like:
db.Example.find({
$or:[ {AttrA : "criteria1"}, {AttrB : "criteria2"}],
$or:[ {AttrC : "criteria3"}, {AttrD : "criteria4"}, {AttrE : "criteria5"} ]
})
But under the conditions above it seems impossible without a duplicate "$or" operator unless I do some boolean algebra and separate it into:
((A+B)*(C+D+E) = AC + AD + AE + BC + BD + BE) aka
AttrA matches ... and AttrC matches ...
or
AttrA matches ... and AttrD matches ...
or
...
AttrB matches ... and AttrE matches ...
meaning that the format would look like
db.Example.find({
$or:[
{ $and:[{AttrA : "criteria1"}, {AttrC : "criteria3"}] },
{ $and:[{AttrA : "criteria1"}, {AttrD : "criteria4"}] },
...,
{ $and:[{AttrB : "criteria2"}, {AttrE : "criteria5"}] }
]
})
Though I'm not even sure whether MongoDB allows duplicate "$and"s either.
Could there be an easier way or am I overcomplicating the conditional queries?
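For reference, the hand distribution described in the question can be generated mechanically. This is a minimal Python sketch using PyMongo-style dicts. `distribute` is a hypothetical helper, not a driver function. It also shows why the approach scales badly: a group of 2 alternatives times a group of 3 already produces 6 terms. Each term sits in its own object inside the `$or` array, so no key is ever duplicated.

```python
from itertools import product


def distribute(left, right):
    """Expand (l1 OR l2 ...) AND (r1 OR r2 ...) into an $or of two-clause $and terms."""
    # Every $and lives in a separate dict inside the $or list, so keys never collide.
    return {"$or": [{"$and": [l, r]} for l, r in product(left, right)]}


group1 = [{"AttrA": "criteria1"}, {"AttrB": "criteria2"}]
group2 = [{"AttrC": "criteria3"}, {"AttrD": "criteria4"}, {"AttrE": "criteria5"}]

expanded = distribute(group1, group2)
print(len(expanded["$or"]))  # 6 terms: AC, AD, AE, BC, BD, BE
```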

There is no need to distribute the conditions manually here. You need to use $and explicitly rather than relying on the implicit one:
db.Example.find({
$and: [
{ $or:[ {AttrA : "criteria1"}, {AttrB : "criteria2"}] },
{ $or:[ {AttrC : "criteria3"}, {AttrD : "criteria4"}, {AttrE : "criteria5"} ] }
]
})
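To convince yourself that the explicit `$and` form selects exactly the same documents as the hand-distributed form, the sketch below compares them on all 32 combinations of matching and non-matching fields. It uses a hypothetical `matches` helper that models only `$and`, `$or` and exact top-level equality. It is a toy evaluator for illustration, not MongoDB's actual matcher.

```python
from itertools import product


def matches(doc, query):
    """Hypothetical mini-evaluator: supports $and, $or and exact top-level equality only."""
    for key, cond in query.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(doc, sub) for sub in cond):
                return False
        elif doc.get(key) != cond:
            return False
    return True  # every top-level key is satisfied: the implicit AND


fields = [("AttrA", "criteria1"), ("AttrB", "criteria2"),
          ("AttrC", "criteria3"), ("AttrD", "criteria4"), ("AttrE", "criteria5")]
group1 = [{f: v} for f, v in fields[:2]]
group2 = [{f: v} for f, v in fields[2:]]

# The answer's form: an explicit $and around two $or clauses.
explicit_and = {"$and": [{"$or": group1}, {"$or": group2}]}
# The question's hand-distributed form: AC + AD + AE + BC + BD + BE.
distributed = {"$or": [{"$and": [l, r]} for l, r in product(group1, group2)]}

# Every combination of "field matches / field holds something else" (2**5 documents).
docs = [{f: (v if hit else "other") for (f, v), hit in zip(fields, hits)}
        for hits in product([True, False], repeat=len(fields))]

agree = all(matches(d, explicit_and) == matches(d, distributed) for d in docs)
print(agree)  # True
```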
This case is covered in the MongoDB documentation for the $and operator.
The underlying problem, at least in JavaScript (which includes the shell), is that an object literal does not support duplicate field names: the last one silently overwrites the earlier ones. Consider this example using Node directly:
$ node
Welcome to Node.js v16.17.0.
Type ".help" for more information.
> let x = { x: 123, y: 456, x: 789 }
undefined
> x
{ x: 789, y: 456 }
>
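The same silent overwrite happens in Python, so a PyMongo filter written with two "$or" keys loses the first one without any error. In the small sketch below, `reject_duplicates` is a hypothetical helper. The `object_pairs_hook` parameter of `json.loads` is standard library, and the helper uses it to turn the silent overwrite into an error when a query arrives as JSON text.

```python
import json

# Python dict literals behave like JavaScript object literals: the last duplicate wins.
x = {"x": 123, "y": 456, "x": 789}
print(x)  # {'x': 789, 'y': 456}

# So a filter written with two "$or" keys silently keeps only the second one.
query = {
    "$or": [{"AttrA": "criteria1"}, {"AttrB": "criteria2"}],
    "$or": [{"AttrC": "criteria3"}, {"AttrD": "criteria4"}],
}
print(len(query))  # 1


def reject_duplicates(pairs):
    """object_pairs_hook that raises instead of silently dropping a repeated key."""
    keys = [k for k, _ in pairs]
    dupes = {k for k in keys if keys.count(k) > 1}
    if dupes:
        raise ValueError(f"duplicate keys: {sorted(dupes)}")
    return dict(pairs)


text = '{"$or": [{"a": 1}], "$or": [{"b": 2}]}'
try:
    json.loads(text, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)  # duplicate keys: ['$or']
```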

Related

(Boolean logic) What is a workaround for duplicate keys when working with boolean logic in MongoDB?

When I'm trying to query for several categories in a database, such as:
Example: {
"AttributeA": type,
"AttributeB": type,
"AttributeC": type,
"AttributeD": type
etc...
}
And say that I'm trying to query for samples that have (AttributeA matching criteria1 or AttributeB matching criteria2) and (AttributeC matching criteria3, 4, 5, or 6). How would we compose the .find() method in MongoDB to match without error or complicated boolean algebra (distributing out the "and")?
I've tried using notation such as:
db.Example.find({
$or:[{"AttributeA" : criteria1, "AttributeB" : criteria2}],
$or:[{"AttributeC" : criteria3, "AttributeC" : criteria4, ...}]
})
in an attempt for an easy query satisfying the conditions above, but in some cases this would result in only one condition of the query being satisfied (giving a "duplicate key '$or'" warning before compiling) and in other instances (different platforms) this straight up gives an error along the lines of the warning.
If I understood the question correctly, I think you are looking for a query predicate such as:
{
$or: [
{
AttributeA: "criteria1"
},
{
AttributeB: "criteria2"
}
],
AttributeC: {
$in: [
"criteria3",
"criteria4",
"criteria5"
]
}
}
> How would we compose the .find() method in MongoDB to match without ... complicated boolean algebra (distributing out the "and")?
By default, MongoDB applies an implicit $and operator to query predicates at the same level. An explicit $and operator is also available, but you don't need it in your situation: the database combines the top-level conditions with a logical AND for you.
> AttributeC matching criteria3, 4, 5, or 6
This is a perfect use for the $in operator, which I've used in the query above as:
AttributeC: {
$in: [ "criteria3", "criteria4", "criteria5" ]
}
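In PyMongo the same predicate is just a Python dict. The sketch below builds it with hypothetical helper names and would be passed to something like `collection.find(q)`. It also builds a longer equivalent that spells the `$in` out as an `$or` over the same field, which is exactly the repetition `$in` saves you from writing.

```python
def build_filter(a_value, b_value, c_values):
    """Top-level keys are ANDed implicitly; $in is an OR over one field's values."""
    return {
        "$or": [{"AttributeA": a_value}, {"AttributeB": b_value}],
        "AttributeC": {"$in": list(c_values)},
    }


def build_filter_long_form(a_value, b_value, c_values):
    """Same logic with the $in written out as a second $or inside an explicit $and."""
    return {"$and": [
        {"$or": [{"AttributeA": a_value}, {"AttributeB": b_value}]},
        {"$or": [{"AttributeC": v} for v in c_values]},
    ]}


q = build_filter("criteria1", "criteria2", ["criteria3", "criteria4", "criteria5"])
print(q)
```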

multi updating a key along the documents of a collection using pymongo

I have lots of documents inside a collection. Each document has the following structure:
{
"_id" : ObjectId(....),
"valor" : {
"AB" : {
"X" : 0.0,
"Y" : 142.6,
},
"FJ" : {
"X" : 0.2,
"Y" : 3.33
....
The collection currently has about 200 documents, and I have noticed that one of the keys inside valor has the wrong name. In this case, let's say "FJ" should be "JOF" in all the documents of the collection.
I'm pretty sure it is possible to change the key in all the documents using PyMongo's update function. The problem is that the online documentation at https://docs.mongodb.com/v3.0/reference/method/db.collection.update/ only explains how to change values (I want the values to stay as they are and only change the keys).
This is what I have tried:
def multi_update(spec_key,key_updte):
    rdo=col.update((valor.spec_key),{"$set":(valor.key_updte)},multi=True)
    return rdo
print(multi_update('FJ','JOF'))
But it outputs name 'valor' is not defined. I thought I should use valor.specific_key to access the corresponding JSON.
How can I rename only the key across all the documents of the collection?
You have two problems. First, valor is not an identifier in your Python code; it's a field name in a MongoDB document. You need to put it in single or double quotes in Python to make it a string that you can use in a PyMongo update expression.
Your second problem is that, in a regular update document, $set only assigns the values you give it; it can't copy one field's value into another field. To rename a key in place you can use the $rename update operator. Alternatively, you can reshape all the documents in your collection using the aggregate command with a $project stage and store the results in a second collection using a $out stage.
Here's a complete example to play with:
from pymongo import MongoClient
db = MongoClient().test
collection = db.collection
collection.delete_many({})
collection.insert_one({
"valor" : {
"AB" : {
"X" : 0.0,
"Y" : 142.6,
},
"FJ" : {
"X" : 0.2,
"Y" : 3.33}}})
collection.aggregate([{
"$project": {
"valor": {
"AB": "$valor.AB",
"JOF": "$valor.FJ"
}
}
}, {
"$out": "collection2"
}])
This is the dangerous part. First, check that "collection2" has all the documents you want, in the desired shape. Then:
collection.drop()
db.collection2.rename("collection")
import pprint
pprint.pprint(collection.find_one())
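A possibly simpler alternative to copying into collection2 is the $rename update operator. It renames a key in place and accepts dotted paths into embedded documents, though not into arrays. Below is a minimal sketch, assuming PyMongo 3.0 or newer for `update_many`. `rename_in_collection` and `rename_locally` are hypothetical helpers: the second is a pure-Python preview you can run on a sample document before touching real data.

```python
import copy


def rename_in_collection(collection, old_path="valor.FJ", new_path="valor.JOF"):
    """Rename a (possibly embedded) field in every document, server-side, via $rename."""
    # update_many is the PyMongo 3.0+ replacement for update(..., multi=True).
    return collection.update_many({old_path: {"$exists": True}},
                                  {"$rename": {old_path: new_path}})


def rename_locally(doc, old_path, new_path):
    """Pure-Python preview of the same rename on one plain dict (no arrays on the path)."""
    result = copy.deepcopy(doc)
    *old_parents, old_leaf = old_path.split(".")
    parent = result
    for part in old_parents:
        parent = parent.get(part)
        if not isinstance(parent, dict):
            return result  # old path missing: $rename leaves such documents unchanged
    if old_leaf not in parent:
        return result
    value = parent.pop(old_leaf)
    *new_parents, new_leaf = new_path.split(".")
    target = result
    for part in new_parents:
        target = target.setdefault(part, {})
    target[new_leaf] = value
    return result


sample = {"valor": {"AB": {"X": 0.0, "Y": 142.6}, "FJ": {"X": 0.2, "Y": 3.33}}}
print(rename_locally(sample, "valor.FJ", "valor.JOF"))
# With a real PyMongo collection: rename_in_collection(db.collection)
```

This avoids the drop-and-rename step entirely. Keep in mind that the `$project` approach above also discards any keys inside valor that are not listed in the projection, whereas `$rename` touches only the one key.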

Array intersection in MongoDB

OK, there are a couple of things going on here. I have two collections: test and test1. The documents in both collections have an array field (tags and tags1, respectively) that contains some tags. I need to find the intersection of these tags and also fetch the whole document from collection test1 if even a single tag matches.
> db.test.find();
{
"_id" : ObjectId("5166c19b32d001b79b32c72a"),
"tags" : [
"a",
"b",
"c"
]
}
> db.test1.find();
{
"_id" : ObjectId("5166c1c532d001b79b32c72b"),
"tags1" : [
"a",
"b",
"x",
"y"
]
}
> db.test.find().forEach(function(doc){db.test1.find({tags1:{$in:doc.tags}})});
Surprisingly this doesn't return anything. However when I try it with a single document, it works:
> var doc = db.test.findOne();
> db.test1.find({tags1:{$in:doc.tags}});
{ "_id" : ObjectId("5166c1c532d001b79b32c72b"), "tags1" : [ "a", "b", "x", "y" ] }
But this is only part of what I need; I need the intersection as well. So I tried this:
> db.test1.find({tags1:{$in:doc.tags}},{"tags1.$":1});
{ "_id" : ObjectId("5166c1c532d001b79b32c72b"), "tags1" : [ "a" ] }
But it returned just "a", whereas both "a" and "b" are in tags1. Does the positional operator return only the first match? Also, $in won't exactly give me an intersection. How can I get the intersection (which should return "a" and "b") regardless of which array is compared against the other?
Now suppose there were an operator that could do this:
> db.test1.find({tags1:{$intersection:doc.tags}},{"tags1.$":1});
{ "_id" : ObjectId("5166c1c532d001b79b32c72b"), "tags1" : [ "a", "b" ] }
My requirement is, I need the entire tags1 array PLUS this intersection, in the same query like this:
> db.test1.find({tags1:{$intersection:doc.tags}},{"tags1":1, "tags1.$":1});
{ "_id" : ObjectId("5166c1c532d001b79b32c72b"), "tags1": [ "a", "b", "x", "y" ],
"tags1" : [ "a", "b" ] }
But this is invalid JSON. Is renaming a key possible, or is this only possible through the aggregation framework (and across different collections?)? I tried the above query with $in, but it behaved as if it completely ignored the "tags1":1 projection.
PS: I am going to have at least 10k docs in test1 and very few (<10) in test. And this query is in real-time, so I want to avoid mapreduce :)
Thanks for any help!
In MongoDB 2.6 and newer (the release that added $setIntersection), you can use aggregation to accomplish this.
var doc = db.test.findOne();
// tags1 lives in test1, so the pipeline runs against test1
db.test1.aggregate([
{
$match: {
tags1: {
$in: doc.tags
}
}
},
{
$project: {
tags1: 1,
intersection: {
$setIntersection: [doc.tags, "$tags1"]
}
}
}
]);
As you can see, the $match stage is exactly the same as your initial find() query. The $project stage generates the result fields: it keeps tags1 from each matching document and adds an intersection field computed from the input tags and that document's tags1.
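From Python, the same two stages can be written as a list of dicts. This is a minimal sketch assuming PyMongo, where you would run something like `db.test1.aggregate(pipeline)` against test1, the collection that holds tags1. `intersection_pipeline` is a hypothetical helper name.

```python
def intersection_pipeline(tags):
    """Aggregation pipeline for test1: match on any shared tag, then add the shared tags."""
    return [
        # Same filter as the original find(): one shared tag is enough.
        {"$match": {"tags1": {"$in": list(tags)}}},
        # Keep the full tags1 array and put the shared tags in a separately named field.
        {"$project": {
            "tags1": 1,
            "intersection": {"$setIntersection": [list(tags), "$tags1"]},
        }},
    ]


pipeline = intersection_pipeline(["a", "b", "c"])
print(pipeline)
```

Because the shared tags go into a new field name, the "two fields named tags1" problem from the question does not arise. `$setIntersection` treats its inputs as sets, so duplicates are dropped and the order of the result is not guaranteed.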
Before MongoDB 2.6, Mongo had no built-in way to retrieve array intersections. If you really need ad-hoc querying, compute the intersection on the client side.
On the other hand, consider using Map-Reduce and storing its output as a collection. You can augment the returned objects in the finalize function to add the intersecting tags, and schedule the MR job (e.g. with cron) to run every few seconds. You get the benefit of a permanent collection that you can query from the client side.
If you want this in real time, consider moving away from server-side JavaScript, which runs on a single thread and can be quite slow (this is no longer true as of v2.4, see http://docs.mongodb.org/manual/core/server-side-javascript/).
The positional operator only returns the first matching value. Without knowing the internal implementation, from a performance point of view it doesn't even make sense to look for further matches once the document has already been evaluated as a match. So I doubt you can go this route.
I don't know whether you need the Cartesian product for your search, but I would consider merging the tags of your few test documents into one array and running a single $in query against test1, which returns all matching documents. On your local machine you could then use multiple threads to compute the intersection for each document.
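A minimal sketch of that split: one server-side query with the merged tags, then the intersection computed on the client. The helper names `union_filter` and `with_intersection` are hypothetical. The filter would be passed to something like `db.test1.find(...)`, and each returned document would then be post-processed in Python.

```python
def union_filter(test_docs):
    """One $in filter covering the tags of all (few) test documents."""
    all_tags = sorted({tag for doc in test_docs for tag in doc["tags"]})
    return {"tags1": {"$in": all_tags}}


def with_intersection(test1_doc, tags):
    """Attach the shared tags (in tags1 order) without losing the full tags1 array."""
    wanted = set(tags)
    shared = [t for t in test1_doc["tags1"] if t in wanted]
    return {**test1_doc, "intersection": shared}


test_docs = [{"tags": ["a", "b", "c"]}]
test1_doc = {"tags1": ["a", "b", "x", "y"]}
print(union_filter(test_docs))  # {'tags1': {'$in': ['a', 'b', 'c']}}
print(with_intersection(test1_doc, test_docs[0]["tags"])["intersection"])  # ['a', 'b']
```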
Depending on how frequently your test and test1 collections change and how often you run this query, you might precompute this information, which would let you simply query a field that holds the intersection.
The document is invalid because it has two fields named tags1.

Find documents with arrays not containing a document with a particular field value in MongoDB

I'm trying to find all documents whose array does not contain a single embedded document with a specific field value. For example, here is a sample collection:
{ _id : 1,
docs : [
{ foo : 1,
bar : 2},
{ foo : 3,
bar : 3}
]
},
{ _id : 2,
docs : [
{ foo : 2,
bar : 2},
{ foo : 3,
bar : 3}
]
}
I want to find every record whose docs array does not contain any document with foo = 1. In the example above, only the second document should be returned.
I have tried the following, but it only tells me whether there are any elements that don't match (which returns document 1):
db.collection.find({"docs": { $not: {$elemMatch: {foo: 1 } } } })
UPDATE: The query above actually does work. As many times happens, my data was wrong, not my code.
I have also looked at the $nin operator but the examples only show when the array contains a list of primitive values, not an additional document. When I've tried to do this with something like the following, it looks for the EXACT document rather than just the foo field I want.
db.collection.find({"docs": { $nin: {'foo':1 } } })
Is there any way to accomplish this with the basic operators?
Using $nin will work, but you have the syntax wrong. It should be:
db.collection.find({'docs.foo': {$nin: [1]}})
Use the $ne operator:
db.collection.find({'docs.foo': {$ne: 1}})
Update: I'd advise against using $nin in this case.
{'docs.foo': {$ne: 1}} takes all elements of docs, and for each of them it checks whether the foo field equals 1 or not. If it finds a match, it discards the document from the result list.
{'docs.foo': {$nin: [1]}} takes all elements of docs, and for each element it checks whether its foo field matches any of the members of the array [1]. This is a Cartesian product: you compare an array to another array, each element to each element. Although MongoDB might be smart and optimize this query, I assume you only use $nin because "it has something to do with arrays". But once you understand what happens here, you'll realize $nin is superfluous and may perform worse.
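Both `{'docs.foo': {$ne: 1}}` and the `$not`/`$elemMatch` query from the question's update return only document 2. The simplified pure-Python model below shows why. The helper names are hypothetical, and the model ignores details of MongoDB's matcher such as nested arrays and type comparison rules. It only illustrates that both queries test the same condition: "no element of docs has foo == 1".

```python
def ne_on_array_path(doc, array_field, sub_field, value):
    """Models {'<array>.<sub>': {'$ne': value}} for an array of subdocuments:
    the document matches only if NO element has sub_field == value."""
    return all(elem.get(sub_field) != value for elem in doc.get(array_field, []))


def not_elem_match(doc, array_field, sub_field, value):
    """Models {<array>: {'$not': {'$elemMatch': {<sub>: value}}}}."""
    return not any(elem.get(sub_field) == value for elem in doc.get(array_field, []))


collection = [
    {"_id": 1, "docs": [{"foo": 1, "bar": 2}, {"foo": 3, "bar": 3}]},
    {"_id": 2, "docs": [{"foo": 2, "bar": 2}, {"foo": 3, "bar": 3}]},
]
print([d["_id"] for d in collection if ne_on_array_path(d, "docs", "foo", 1)])  # [2]
print([d["_id"] for d in collection if not_elem_match(d, "docs", "foo", 1)])    # [2]
```

In this model, a document without a docs field matches both predicates. That mirrors how `$ne` and `$not` also match documents where the field is missing.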

MongoDB Casbah query field not exist or specific value

I would like to perform a query using casbah in order to find all objects that have a certain field not set (the field does not exist) or the field has a particular value.
I have tried using
val query = ("_id.serviceName" $in serviceNames) ++ ($or(("element" $exists false), MongoDBObject("element" -> "value")))
but I obtain an error:
found com.mongodb.casbah.commons.Imports.DBObject
required (String, Any)
Is it possible to express such query?
Thanks
Looks like this may be a bug in the right-hand value filter for $or; it doesn't appear to accept a preconstructed DBObject from the $exists DSL statement. It definitely should, and I'm filing a bug internally to fix this. In the meantime, you can construct this by writing the "$exists" statement by hand:
scala> val query = ("_id.serviceName" $in serviceNames) ++ ($or(("element" -> MongoDBObject("$exists" -> false)), ("element" -> "value")))
query: com.mongodb.casbah.commons.Imports.DBObject = { "$or" : [ { "element" : { "$exists" : false}} , { "element" : "value"}] , "_id.serviceName" : { "$in" : [ "foo" , "bar" , "baz" , "bah"]}}
Sorry for the trouble... I've created a bug entry for this to correct for the next release.
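For comparison, here is the same filter as a plain Python dict, as it might be passed to PyMongo's `find`. It makes the structure Casbah prints explicit: `$or` takes a list of complete condition documents, and the `$in` on `_id.serviceName` is ANDed with it implicitly. `service_filter` is a hypothetical helper written for illustration.

```python
def service_filter(service_names, element_value="value"):
    """Hypothetical helper: dict form of the query printed by the Casbah workaround."""
    return {
        "$or": [
            {"element": {"$exists": False}},  # field not set at all
            {"element": element_value},       # or set to the wanted value
        ],
        "_id.serviceName": {"$in": list(service_names)},
    }


q = service_filter(["foo", "bar", "baz", "bah"])
print(q)
```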