MongoDB: apply $not to $and - any ideas?

I'm trying to do something like this :
select * from table where not (a=3 and b=4 and c=3 or x=4)
I would expect this to work:
db.table.find( {
$not : {
$or : [
{ $and : [ { a : 3 },
{ b : 4 },
{ c : 3 }
] } ,
{ x : 4 }
]
}
} )
But it does not work.
I have read this article.
Something like this: {a : { $ne : 3}, b : { $ne : 4} ...} does not suit me,
because my program takes different queries like this (a=3 and b=4 and c=3 or x=4) from users (queries against multilevel embedded objects and arrays).
Writing a procedure which automatically applies $not to those queries looks like a long and thankless task. Do you have any ideas?
PS: Why doesn't MongoDB have a simple way to do that?
For example, find all the documents that match the condition, and return the remaining documents of the collection.
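One possible way to negate an arbitrary query document without rewriting it is to wrap it in $nor, which selects the documents that fail every expression in its array. With a single expression it behaves like a logical NOT. A minimal sketch, assuming the user query is already a valid query document (the helper name negate is made up for illustration):

```javascript
// Hypothetical helper: wrap any query document so that it selects
// exactly the documents the original query would NOT select.
function negate(query) {
  // $nor matches documents that fail every expression in the array;
  // with one expression this acts as a logical NOT.
  return { $nor: [query] };
}

// The user-supplied condition: (a=3 and b=4 and c=3) or x=4
const userQuery = {
  $or: [
    { $and: [{ a: 3 }, { b: 4 }, { c: 3 }] },
    { x: 4 }
  ]
};

const negated = negate(userQuery);
console.log(JSON.stringify(negated));
// In the mongo shell this would be used as: db.table.find(negated)
```

Note that $nor also returns documents that do not contain the queried fields at all, which may or may not be what the SQL-style negation is meant to do.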

Related

Possible to avoid $unwind / aggregation on large array using $elemMatch and regular query?

I have a collection of documents (call it 'logs') which looks similar to this:
{
"_id" : ObjectId("52f523892491e4d58e85d70a"),
"ds_id" : "534d35d72491de267ca08e96",
"eT" : NumberLong(1391784000),
"vars" : [{
"n" : "ActPow",
"val" : 73.4186401367188,
"u" : "kWh",
"dt" : "REAL",
"cM" : "AVE",
"Q" : 99
}, {
"n" : "WinSpe",
"val" : 3.06327962875366,
"u" : "m/s",
"dt" : "REAL",
"cM" : "AVE",
"Q" : 99
}]
}
The vars array holds about 150 subdocuments, not just the two I have shown above. What I'd like to do now is to run a query which retrieves the val of the two subdocuments in the vars array that I have shown above.
Using the aggregation framework, I've been able to come up with the following:
db.logs.aggregate( [
{ $match :
{ ds_id: "534d35d72491de267ca08e96",
eT: { $lt : 1391784000 },
vars: { $elemMatch: { n: "PowCrvVld", val: 3 }}
}
},
{ $unwind : "$vars" },
{ $match :
{ "vars.n" : { $in : ["WinSpe", "ActPow"] } }
},
{ $project : { "vars.n" : 1, N : 1 }
}
]);
While this works, I run up against the 16MB limit when running larger queries. Seeing as I have about 150 subdocuments in the vars array, I'd also like to avoid $unwind if it's possible.
Using a regular query and using $elemMatch I have been able to retrieve ONE of the values:
db.logs.TenMinLog.find({
ds_id : "534d35d72491de267ca08e96",
eT : { $lt : 1391784000 },
vars : { $elemMatch : { n : "PowCrvVld", val : 3 }
}
}, {
ds_id : 1,
vars : { $elemMatch : { n : "ActPow", cM : "AVE" } }
});
What my question comes down to is if there's a way to use $elemMatch on an array multiple times in the <projection> part of find. If not, is there another way to easily retrieve those two subdocuments without using $unwind? I am also open to other suggestions that would be more performant that I may not be aware of. Thanks!
If you're using MongoDB 2.6 you can use the $redact operator to prune the elements from the vars array.
In MongoDB 2.6 you can also return results as a cursor to avoid the 16MB limit. From the docs:
In MongoDB 2.6 the aggregate command can return results as a cursor or
store the results in a collection, which are not subject to the size
limit. The db.collection.aggregate() returns a cursor and can return
result sets of any size.
I'd strongly consider a move to MongoDB version 2.6. Aggregation has been enhanced to return a cursor which eliminates the 16MB document limit:
Changed in version 2.6:
The db.collection.aggregate() method returns a cursor and can return
result sets of any size. Previous versions returned all results in a
single document, and the result set was subject to a size limit of 16
megabytes.
http://docs.mongodb.org/manual/core/aggregation-pipeline/
Also there are a number of enhancements that you may find useful for more complex aggregation queries:
Aggregation Enhancements
The aggregation pipeline adds the ability to return result sets of any
size, either by returning a cursor or writing the output to a
collection. Additionally, the aggregation pipeline supports variables
and adds new operations to handle sets and redact data.
The db.collection.aggregate() now returns a cursor, which enables the
aggregation pipeline to return result sets of any size. Aggregation
pipelines now support an explain operation to aid analysis of
aggregation operations. Aggregation can now use a more efficient
external-disk-based sorting process.
New pipeline stages:
$out stage to output to a collection.
$redact stage to allow additional control to accessing the data.
New or modified operators:
set expression operators.
$let and $map operators to allow for the use of variables.
$literal operator and $size operator.
$cond expression now accepts either an object or an array.
http://docs.mongodb.org/manual/release-notes/2.6/
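Another option that avoids $unwind, on servers newer than the 2.6 release discussed here, is the $filter expression inside $project. This is only a sketch under the assumption that MongoDB 3.2 or later is available; the helper name buildVarsPipeline and its parameters are hypothetical:

```javascript
// Hypothetical helper: builds a pipeline that keeps only the named
// entries of the `vars` array, without $unwind.
// Assumes MongoDB 3.2+ (the $filter expression does not exist in 2.6).
function buildVarsPipeline(dsId, before, names) {
  return [
    { $match: {
        ds_id: dsId,
        eT: { $lt: before },
        vars: { $elemMatch: { n: "PowCrvVld", val: 3 } }
    } },
    { $project: {
        ds_id: 1,
        vars: {
          $filter: {
            input: "$vars",
            as: "v",
            // keep an element when its n equals any requested name
            cond: { $or: names.map(function (n) {
              return { $eq: ["$$v.n", n] };
            }) }
          }
        }
    } }
  ];
}

const pipeline = buildVarsPipeline(
  "534d35d72491de267ca08e96", 1391784000, ["WinSpe", "ActPow"]);
console.log(JSON.stringify(pipeline, null, 1));
// In the mongo shell: db.logs.aggregate(pipeline)
```

Each result document then carries at most the two requested subdocuments instead of all 150 or so.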
Maybe this works.
db.logs.TenMinLog.find({
ds_id : "534d35d72491de267ca08e96",
eT : { $lt : 1391784000 },
$or : [ { vars : { $elemMatch : { n : "PowCrvVld", val : 3 } } },
{ vars : { $elemMatch : { n : <whatever>, val : <whatever> } } } ]
}, {
ds_id : 1,
vars : { $elemMatch : { n : "ActPow", cM : "AVE" } }
});
Hope it works as you want.

mongodb micro-optimization of batch inserts ? or is this an important optimization?

premise : update statements are harmless, since the driver by default works with one-way messaging (as long as getLastError isn't used).
question : Is the following fragment the best way to do this in mongodb for high-volume inserts ? Is it possible to fold steps 2 and 3 ?
edit : old buggy form , see below
// step 1 : making sure the top-level document is present (an upsert in the real example)
db.test.insert( { x :1} )
// step 2 : making sure the sub-document entry is present
db.test.update( { x:1 }, { "$addToSet" : { "u" : { i : 1, p : 2 } } }, false)
// step 3 : increment an integer within the subdocument
db.test.update( { x : 1, "u.i" : 1}, { "$inc" : { "u.$.c" : 1 } },false)
I have a feeling there is no way around operation 3, since the $ operator requires priming in the query part of an update. Am I right ?
If this is the best way to do things, can I get creative in my code and go nuts with update operations ?
edit : new form
There was a bug in my logic, thanks Gates. Still want to fold the updates if possible :D
// make sure the top-level entry exists and increase the incidence counter
db.test.update( { x : 1 }, { $inc : { i : 1 } }, true ) // 1
// implicitly creates the array
db.test.update( { x : 1 , u : { $not : { $elemMatch : { i : 1 } } } } ,
{ $push : { u : { i : 1 , p : 2 , c : 0 } } }) // 2
db.test.update( { x : 1 , "u.i" : 1 }, { $inc : { "u.$.c" : 1 } }, false) // 3
notes : $addToSet is not useful in this case, since it does an element-wise match; there is no way to express which fields of an array element may be mutable, as in C++ OO bitwise-comparison parlance.
Edit (OP): the question is pointless, the data model is wrong. Please vote to close.
So, the first thing to note is that the $ positional operator is a little sketchy. It has a lot of "gotchas": it doesn't play well with upserts, it only affects the first true match, etc.
To understand "folding" of #2 and #3, you need to look at the output of your commands:
db.test.insert( { x :1} )
{ x:1 } // DB value
db.test.update( { x:1 }, { "$addToSet" : { "u" : { i : 1, p : 2 } } }, false)
{ x:1, u: [ {i:1,p:2} ] } // DB value
db.test.update( { x : 1, "u.i" : 1}, { "$inc" : { "u.$.c" : 1 } },false)
{ x:1, u: [ {i:1,p:2,c:1} ] } // DB value
Based on the sequence you provided, the whole thing can be rolled into a single update.
If you're only looking to roll together #2 & #3, then you're worried about matching 'u.i':1 with u.$.c. But there are some edge cases here you have to clarify.
Let your starting document be the following:
{
x:1,
u: [
{i:1, p:2},
{i:1, p:3}
]
}
What do you expect from running update #3?
As written you get:
{
x:1,
u: [
{i:1, p:2, c:1},
{i:1, p:3}
]
}
Is this correct? Is that first document legal (semantically correct)? Depending on the answers, this may actually be an issue of document structure.
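The edge case above can be imitated in plain JavaScript to make the "first match only" behaviour of the positional $ operator concrete. This is an in-memory illustration, not MongoDB code, and the function name incFirstMatch is made up:

```javascript
// In-memory imitation of update #3:
//   query  { "u.i": i }   update  { $inc: { "u.$.c": 1 } }
// The positional $ refers to the FIRST array element matched by the query.
function incFirstMatch(doc, i) {
  const idx = doc.u.findIndex(function (e) { return e.i === i; });
  if (idx === -1) return false;            // query matches nothing, no update
  doc.u[idx].c = (doc.u[idx].c || 0) + 1;  // $inc creates the field if absent
  return true;
}

const doc = { x: 1, u: [{ i: 1, p: 2 }, { i: 1, p: 3 }] };
incFirstMatch(doc, 1);
console.log(JSON.stringify(doc));
// prints {"x":1,"u":[{"i":1,"p":2,"c":1},{"i":1,"p":3}]}
```

Only the first element with i equal to 1 receives the counter; the second one is left untouched, which is why the document structure matters here.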

Get "data from collection b not in collection a" in a MongoDB shell query

I have two MongoDB collections that share a common _id. Using the mongo shell, I want to find all documents in one collection that do not have a matching _id in the other collection.
Example:
> db.Test.insert({ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "foo" : 1 })
> db.Test.insert({ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "foo" : 2 })
> db.Test.insert({ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 })
> db.Test.insert({ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 })
> db.Test.find()
{ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "foo" : 1 }
{ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "foo" : 2 }
{ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 }
{ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 }
> db.Test2.insert({ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "bar" : 1 });
> db.Test2.insert({ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "bar" : 2 });
> db.Test2.find()
{ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "bar" : 1 }
{ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "bar" : 2 }
Now I want some query or queries that returns the two documents in Test where the _id's do not match any document in Test2:
{ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 }
{ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 }
I've tried various combinations of $not, $ne, $or, $in but just can't get the right combination and syntax. Also, I don't mind if db.Test2.find({}, {"_id": 1}) is executed first, saved to some variable, which is then used in a second query (though I can't get that to work either).
Update: Zachary's answer pointing to the $nin answered the key part of the question. For example, this works:
> db.Test.find({"_id": {"$nin": [ObjectId("4f08a75f306b428fb9d8bb2e"), ObjectId("4f08a766306b428fb9d8bb2f")]}})
{ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 }
{ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 }
But (acknowledging that this is not scalable, and trying to do it anyway because that is not an issue in this situation) I still can't combine the two queries in the shell. This is the closest I can get, which is obviously less than ideal:
vals = db.Test2.find({}, {"_id": 1}).toArray()
db.Test.find({"_id": {"$nin": [ObjectId(vals[0]._id), ObjectId(vals[1]._id)]}})
Is there a way to return just the values in the find command so that vals can be used directly as the array input to $nin?
In mongo 3.2 the following code seems to work
db.collectionb.aggregate([
{
$lookup: {
from: "collectiona",
localField: "collectionb_fk",
foreignField: "collectiona_fk",
as: "matched_docs"
}
},
{
$match: {
"matched_docs": { $eq: [] }
}
}
]);
based on this https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#use-lookup-with-an-array example
Answering your follow-up. I'd use map().
Given this:
> b1 = {i: 1}
> db.b.save(b1)
> db.b.save({i: 2})
> db.a.save({_id: b1._id})
All you need is:
> vals = db.a.find({}, {_id: 1}).map(function(a){return a._id;})
> db.b.find({_id: {$nin: vals}})
which returns
{ "_id" : ObjectId("4f08c60d6b5e49fa3f6b46c1"), "i" : 2 }
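The same two-step idea can be wrapped in a small function. This sketch uses distinct("_id"), which returns a plain array of values rather than documents, so no map() is needed; the helper name findUnmatched is hypothetical, and the arguments are assumed to be shell collection objects such as db.Test and db.Test2:

```javascript
// Hypothetical helper: return the documents of `source` whose _id does
// not appear in `other`. Both arguments are expected to behave like
// mongo shell collections (distinct(), find().toArray()).
function findUnmatched(source, other) {
  // distinct("_id") yields an array of _id values, not documents
  const ids = other.distinct("_id");
  // $nin keeps documents whose _id is not in that array
  return source.find({ _id: { $nin: ids } }).toArray();
}

// In the mongo shell: findUnmatched(db.Test, db.Test2)
```

Like the map() version, this holds every _id of the second collection in memory, so it is only reasonable for small collections.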
You will have to save the _ids from collection A to not pull them again from collection B, but you can do it using $nin. See Advanced Queries for all of the MongoDB operators.
Your end query, using the example you gave would look something like:
db.Test.find({"_id": {"$nin": [ObjectId("4f08a75f306b428fb9d8bb2e"),
ObjectId("4f08a766306b428fb9d8bb2f")]}})
Note that this approach won't scale. If you need a solution that scales, you should be setting a flag in collections A and B indicating if the _id is in the other collection and then query off of that instead.
Updated for second part:
The second part is impossible. MongoDB does not support joins or any sort of cross querying between collections in a single query. Querying from one collection, saving the results and then querying from the second is your only choice unless you embed the data in the rows themselves as I mention earlier.
I've made a script that marks every document in the second collection that appears in the first collection, and then processes the documents in the second collection.
var first = db.firstCollection.aggregate([ {'$unwind':'$secondCollectionField'} ])
while (first.hasNext()){ var doc = first.next(); db.secondCollection.update( {_id:doc.secondCollectionField} ,{$set:{firstCollectionField:doc._id}} ); }
// ...then process the documents in the second collection that have no mark
db.secondCollection.find({"firstCollectionField":{$exists:false}})

Using $not in mongodb

I'm trying to do something like this :
select * from table where not (a=3 and b=4 and c=3 or x=4)
I would expect this to work:
db.table.find({
$not: {
$or: [
{
$and: [
{ a: 3 },
{ b: 4 },
{ c: 3 }
]
},
{ x: 4 }
]
}
})
But it gives me an error:
error: { "$err" : "invalid operator: $and", "code" : 10068 }
Is there another way to express it in mongodb?
Firstly, I don't think you mean "and", as a field will never be 3 and 4 at the same time - it can only be 3 or 4. So, assuming you do want "documents where b is not 3 or 4", you can use $nin (not in) like this:
db.table.find({b:{$nin: [3,4]}});
Using { $not : { $and : []} } will not work ($not is not like other operators; it can only be applied to negate the check of another operator).
$and is not the problem here; this also doesn't work (though without reporting any errors):
{ $not : { a : { $gt : 14 } } }
you'd have to rewrite it to
{ a : { $not : { $gt : 14 } } }
Coming back to your query:
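not (a=3 and b=4 and c=3 or x=4)
is equivalent (by De Morgan's laws, with "and" binding tighter than "or") to:
(a!=3 or b!=4 or c!=3) and x!=4
and that you can do in mongo:
{ $or : [ { a : { $ne : 3 } }, { b : { $ne : 4 } }, { c : { $ne : 3 } } ], x : { $ne : 4 } }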
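A negation like this is easy to get wrong, so it can help to check it against the original condition on every combination of matching and non-matching values. A minimal sketch in plain JavaScript (no database involved; the function names are made up for illustration), where De Morgan's laws turn not ((a=3 and b=4 and c=3) or x=4) into (a!=3 or b!=4 or c!=3) and x!=4:

```javascript
// Original condition: (a=3 and b=4 and c=3) or x=4
function original(d) {
  return (d.a === 3 && d.b === 4 && d.c === 3) || d.x === 4;
}

// De Morgan form of its negation: (a!=3 or b!=4 or c!=3) and x!=4
function negatedForm(d) {
  return (d.a !== 3 || d.b !== 4 || d.c !== 3) && d.x !== 4;
}

// Try every combination of a "matching" and a "non-matching" value
// for each of the four fields (16 documents in total).
let mismatches = 0;
[3, 0].forEach(function (a) {
  [4, 0].forEach(function (b) {
    [3, 0].forEach(function (c) {
      [4, 0].forEach(function (x) {
        const d = { a: a, b: b, c: c, x: x };
        if (negatedForm(d) !== !original(d)) mismatches++;
      });
    });
  });
});
console.log(mismatches); // prints 0
```

The check only covers documents in which all four fields are present; documents with missing fields need separate thought.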

mongodb $elemMatch

According to mongodb doc, syntax for $elemMatch would be,
t.find( { x : { $elemMatch : { a : 1, b : { $gt : 1 } } } } )
I have tried and it works fine.
This means it finds documents in which an object matching {a:1, b:'more than 1'} exists in the array x.
I have a requirement, where I need to figure out, if all the objects in an array exist in the database or not.
for example, let's say I have an array,
a=[{a:1, b:2},{a:3, b:4}, {a:5, b:6}]
and I need to find out if x contains all of them.
t.find( { x : { $elemMatch : { a : {$all:[1]}, b : {$all:[2]} } } } ) and it finds out all x containing {a:1, b:2}
But if I try, t.find( { x : { $elemMatch : { a : {$all:[1,3]}, b : {$all:[2,4]} } } } ), it fails. I know this is not correct.
Is there any way I can achieve this ?
Ideally, it should be,
t.find( { x : { $elemMatch : { $all : [ {a:1, b:2}, {a:3, b:4} ] } } } )
I tried it, and it does not work.
t.find({$and:[{a:{$elemMatch:{a:1, b:2}}}, {a:{$elemMatch:{a:3, b:4}}}, {a:{$elemMatch:{a:5, b:6}}}]})
It isn't a particularly high performance option though.
You can not use elemMatch for this, but you can simply just create a query which checks whether a matches the whole array:
db.items.insert({ 'foo' : 1, 'a' : [{a:1, b:2},{a:3, b:4}, {a:5, b:6}]});
db.items.insert({ 'foo' : 1, 'a' : [{a:1, b:2},{a:3, b:4}, {a:8, b:7}]});
db.items.find({'a': [{a:1, b:2},{a:3, b:4}, {a:8, b:7}]});
{ "_id" : ObjectId("4f3391856e196eca5eaa7518"), "foo" : 1, "a" : [ { "a" : 1, "b" : 2 }, { "a" : 3, "b" : 4 }, { "a" : 8, "b" : 7 } ] }
However, for this to work the order of the elements in the array needs to be the same in the document and in the query. The following will not find anything:
db.items.find({'a': [{a:3, b:4},{a:1, b:2}, {a:8, b:7}]});
(Because {a:3, b:4} and {a:1, b:2} are swapped).
How about this:
db.items.find({x : { $all: [
{$elemMatch: {a: 1, b: 2}},
{$elemMatch: {a: 3, b: 4}},
{$elemMatch: {a: 5, b: 6}}
]}})
Check out the Mongo docs.
Also, note the docs warning:
In the current release, queries that use the $all operator must scan
all the documents that match the first element in the query array. As
a result, even with an index to support the query, the operation may
be long running, particularly when the first element in the array is
not very selective.
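Since the list of required objects usually arrives as an array, the $all / $elemMatch query can be generated from it rather than written by hand. A small sketch, with a hypothetical helper name allElemMatch and the array from the question:

```javascript
// Hypothetical helper: build a query that requires `field` to contain
// an element matching each of the given sub-documents, in any order.
function allElemMatch(field, items) {
  const query = {};
  query[field] = {
    $all: items.map(function (item) { return { $elemMatch: item }; })
  };
  return query;
}

const wanted = [{ a: 1, b: 2 }, { a: 3, b: 4 }, { a: 5, b: 6 }];
const query = allElemMatch("x", wanted);
console.log(JSON.stringify(query));
// In the mongo shell: t.find(query)
```

Given the warning quoted above, it may be worth putting the most selective object first in the array.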