How do I accomplish the following?
db.test.save( {a: [1,2,3]} );
db.test.find( {a: [1,2,3,4]} ); //must match because it contains all required values [1, 2, 3]
db.test.find( {a: [1,2]} ); //must NOT match because the required value 3 is missing
I know about $in and $all but they work differently.
Interesting. The problem is that the $in and $all operators are applied to the elements of the array you pass in the query, not to the elements of the arrays stored in the documents. To summarize your question: you want a match whenever the array in a document is a subset of the passed array. I can't think of a way to do this unless you swap your input and output. What I mean is this. Let's take your first input:
db.test.find( {a: [1,2,3,4]} );
Consider putting this in a temporary collection, say temp:
db.temp.save( {a: [1,2,3,4]} );
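Now iterate over each document in the test collection and look it up in temp with the $all operator, which ensures its array is completely contained, i.e., do something like this:
db.test.find().forEach(function (doc) {
    // doc matches if every element of doc.a is present in the array stored in temp
    if (db.temp.findOne({ a: { $all: doc.a } })) {
        printjson(doc);
    }
});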
This is definitely a workaround! I am not sure if I am missing any other operator that can do this job.
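The swapped lookup can be tried out without a server. The following is a minimal Python sketch, assuming the documents are plain dicts; `contained_in` and `find_subsets` are hypothetical helper names, and the sample data is made up for illustration. It mimics the check the workaround relies on: every element of the stored array must appear in the query array.

```python
def contained_in(stored, query):
    """True when every element of `stored` also appears in `query`."""
    return all(value in query for value in stored)

# Stand-in for the `test` collection (hypothetical sample data).
test_docs = [{"a": [1, 2, 3]}, {"a": [1, 5]}]

def find_subsets(docs, query):
    # Keep the documents whose array `a` is fully covered by `query`.
    return [doc for doc in docs if contained_in(doc["a"], query)]

print(find_subsets(test_docs, [1, 2, 3, 4]))  # the first document matches
print(find_subsets(test_docs, [1, 2]))        # nothing matches, 3 is missing
```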
There is currently no operator that will find documents whose array is a subset of the given values. There is a proposed improvement to add a $subset operator. Please feel free to upvote the issue in JIRA:
https://jira.mongodb.org/browse/SERVER-974
Other potential workarounds, which may not be suitable for your use case, may involve map reduce or the new aggregation framework (available in MongoDB version 2.2):
http://docs.mongodb.org/manual/reference/aggregation/#_S_project
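On newer servers the subset test can probably be written as a single query. This is only a sketch, assuming MongoDB 3.6 or later (where `$expr` is accepted by `find`) and the `$setIsSubset` aggregation operator; `subset_filter` is a hypothetical helper name, and the field name `a` comes from the question. `$setIsSubset` treats both arrays as sets, so duplicates are ignored, and it expects both operands to be arrays, so documents where `a` is missing may need to be filtered out first.

```python
def subset_filter(field, allowed):
    """Build a find() filter matching documents whose array `field`
    only contains values from `allowed`."""
    return {"$expr": {"$setIsSubset": ["$" + field, list(allowed)]}}

query = subset_filter("a", [1, 2, 3, 4])
print(query)
# With pymongo this would be passed as: db.test.find(query)
```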
So I'm working with a system that can only take the query argument of a .find() call as input, which means that I cannot supply an iterative script or an aggregation query.
However, I need to be able to find all documents that contain in one of their array fields the value that is set in another field. So, if my documents look as follows:
{ _id: ..., FieldA: x, FieldB: [x, y, z] }
Then I want to find all documents that contain the value of FieldA in FieldB, in this example contain x in FieldB.
Is that at all possible?
You can use $in with $expr for your case.
db.collection.find({
$expr: {
"$in": [
"$FieldA",
"$FieldB"
]
}
})
Here is the Mongo playground for your reference.
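For readers who want to check the semantics locally, here is a small Python sketch. It shows the same filter as a dict (the form pymongo accepts) next to a hypothetical stand-in function, `field_a_in_field_b`, that mimics what the expression evaluates per document; the sample documents are invented for illustration.

```python
# The same filter as a Python dict, as pymongo would take it.
query = {"$expr": {"$in": ["$FieldA", "$FieldB"]}}

def field_a_in_field_b(doc):
    """Local stand-in for the $expr above: is FieldA an element of FieldB?"""
    return doc["FieldA"] in doc["FieldB"]

docs = [
    {"_id": 1, "FieldA": "x", "FieldB": ["x", "y", "z"]},
    {"_id": 2, "FieldA": "q", "FieldB": ["x", "y", "z"]},
]
matching_ids = [doc["_id"] for doc in docs if field_a_in_field_b(doc)]
print(matching_ids)
```

Because the whole condition lives inside the filter document, it fits the constraint that only the query argument of find() can be supplied.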
In my collection that has say 100 documents, I want to run the following query:
collection.find({"$text": {"$search": "some_string"}})
Assume that a suitable "text" index already exists. My question is: how can I run this query on the last n documents only?
All the questions I found on the web ask how to get the last n docs, whereas my question is how to search within the last n docs only.
More generally, how can I run a Mongo query on some portion, say 20%, of a collection?
What I tried
I'm using pymongo, so I tried skip() and limit() to get the last n documents, but I didn't find a way to run a query on the cursor that those functions return.
After @hhsarh's answer, here's what I tried, to no avail:
# here's what I tried after initial answers
recents = information_collection.aggregate([
{"$match" : {"$text" : {"$search" : "healthline"}}},
{"$sort" : {"_id" : -1}},
{"$limit" : 1},
])
The result still comes from the whole collection instead of just the last document, as the code above intends.
The last document doesn't contain "healthline" in any field, so the intended result of the query is an empty list []. But I get a document back.
Can someone explain how this is possible?
What you are looking for can be achieved with the MongoDB aggregation framework.
Note: As pointed out by @turivishal, $text will not work unless it is in the first stage of the aggregation pipeline, so the approach below applies to queries other than $text.
collection.aggregate([
{
"$sort": {
"_id": -1
}
},
{
"$limit": 10 // `n` value, where n is the number of last records you want to consider
},
{
"$match" : {
// All your find query goes here
}
},
], { allowDiskUse: true }) // in case the computation exceeds the 100 MB memory limit
Since _id is indexed by default, the sort on _id in the first stage can use that index, so the query should be reasonably fast. The cost of the later stages still grows with n.
Note: If you are using pymongo, replace the last line in the code example with the line below:
], allowDiskUse=True)
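As a rough pymongo-flavoured sketch of the same idea, the pipeline can be built by a small helper. `last_n_pipeline` is a hypothetical name, and the `status` filter is a made-up example of a non-$text condition. "Last n" here means the n largest _id values, which matches insertion order only under the assumption that the default ObjectId _id is used.

```python
def last_n_pipeline(n, match):
    """Pipeline that applies `match` only to the n documents with the largest _id."""
    return [
        {"$sort": {"_id": -1}},   # largest _id first
        {"$limit": n},            # keep only the last n documents
        {"$match": match},        # then filter within that window
    ]

pipeline = last_n_pipeline(10, {"status": "active"})
print(pipeline)
# With pymongo: collection.aggregate(pipeline, allowDiskUse=True)
```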
This is not possible with the $text operator, because of the following restriction:
The $match stage that includes a $text must be the first stage in the pipeline
This means we cannot limit the documents before the $text operator is applied; read more about the $text operator restrictions.
A second option that may work is to search with the $regex regular expression operator instead of $text.
If you need the search to behave like $text, you have to transform your search input as follows:
Let's assume searchInput is your input variable,
and searchFields is the list of fields to search.
Split the search input on whitespace, loop over the resulting words, and convert each one to a regular expression.
Loop over searchFields and prepare an $in condition for each field.
import re

searchInput = "This is search"
searchFields = ["field1", "field2"]
# one case-insensitive regular expression per word of the input
searchRegex = [re.compile(s, re.IGNORECASE) for s in searchInput.split()]
# one $in condition per search field
searchPayload = [{f: {"$in": searchRegex}} for f in searchFields]
print(searchPayload)
The resulting payload looks like this (shown in mongo shell regex notation):
[
{'field1': {'$in': [/This/i, /is/i, /search/i]}},
{'field2': {'$in': [/This/i, /is/i, /search/i]}}
]
Use the searchPayload variable with the $or operator in a $match placed as the last stage of the pipeline:
recents = information_collection.aggregate([
# 1 = ascending, -1 descending you can use anyone as per your requirement
{ "$sort": { "_id": 1 } },
# use any limit of number as per your requirement
{ "$limit": 10 },
{ "$match": { "$or": searchPayload } }
])
print(list(recents))
Note: $regex searches can cause performance issues.
A compound index on the search fields may help, although case-insensitive regular expressions generally cannot use an index efficiently. With pymongo the index is created like this:
information_collection.create_index([("field1", 1), ("field2", 1)])
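One detail worth guarding against is user input that contains regex metacharacters. The sketch below is an assumption-laden variant of the payload builder: `build_search_payload` and `matches_locally` are hypothetical helpers, `re.escape` is added so that words are matched literally, and the local matcher only approximates `{"$or": payload}` for plain string fields.

```python
import re

def build_search_payload(search_input, search_fields):
    """One {field: {"$in": [regex, ...]}} clause per field."""
    # re.escape keeps characters such as "." or "(" from acting as regex syntax.
    patterns = [re.compile(re.escape(word), re.IGNORECASE)
                for word in search_input.split()]
    return [{field: {"$in": patterns}} for field in search_fields]

def matches_locally(doc, payload):
    """Rough local stand-in for {"$or": payload} on string fields."""
    for clause in payload:
        for field, condition in clause.items():
            value = doc.get(field, "")
            if any(p.search(value) for p in condition["$in"]):
                return True
    return False

payload = build_search_payload("This is search", ["field1", "field2"])
print(matches_locally({"field1": "A SEARCH engine"}, payload))
print(matches_locally({"field1": "nothing relevant"}, payload))
```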
We are trying to query system.profile to collect all operations that affect a given document (e.g. DATA.COD: 12).
This is a snippet of a system.profile document:
{
op:"update",
ns:"db.myCollection",
command:{
q: {
"DATA.COD":12,
NAME:"PIPPO"
},
u:{
FIELD:"PLUTO"},
...
}
We'd like something like this
{op:"update", "command.q.DATA.COD":{"$exists":true},ns:"db.myCollection"}
but the dot inside the field name breaks it (the query looks for a subdocument instead). We have already tried escaping it, with no success so far.
This is not very pretty, and likely not very efficient, but you should be able to get an equivalent match using $expr and some aggregation ops:
db.system.profile.find({op:"update",$expr:{$gt:[{$size:{$filter:{input:{$objectToArray:"$$ROOT.command.q"},cond:{$eq:["$$this.k","DATA.COD"]}}}},0]}})
To break this down:
op:"update" - exact match on the op field
$expr - use aggregation expressions
{$objectToArray:"$$ROOT.command.q"} - convert the command.q subdocument to an array of documents each containing a single key-value pair
{$eq:["$$this.k","DATA.COD"]} - check if the current key name is "DATA.COD"
{$filter:{input: ... , cond: ...}} - eliminate elements from the input array that do not match the condition
{$size: ...} - return the size of the array
{$gt:[ ..., 0]} - determine if the first argument is greater than zero
Summary:
Convert q: {"DATA.COD":12, NAME:"PIPPO"} to q:[{k:"DATA.COD", v:12},{k:"NAME",v:"PIPPO"}]
eliminate all array elements that do not match k=="DATA.COD"
Match the document if the array still contains any elements
repeat for all documents in the collection
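The breakdown above can be mirrored step by step in plain Python, which may make the data flow easier to follow. This is only an illustration on a dict shaped like the profile snippet from the question; `object_to_array` and `has_key` are hypothetical helper names, not MongoDB APIs.

```python
def object_to_array(subdoc):
    """Mimic $objectToArray: {"a": 1} -> [{"k": "a", "v": 1}]."""
    return [{"k": key, "v": value} for key, value in subdoc.items()]

def has_key(profile_doc, key_name):
    pairs = object_to_array(profile_doc["command"]["q"])       # $objectToArray
    kept = [pair for pair in pairs if pair["k"] == key_name]   # $filter + $eq
    return len(kept) > 0                                       # $size + $gt

entry = {
    "op": "update",
    "ns": "db.myCollection",
    "command": {"q": {"DATA.COD": 12, "NAME": "PIPPO"}, "u": {"FIELD": "PLUTO"}},
}
print(has_key(entry, "DATA.COD"))
print(has_key(entry, "DATA.XYZ"))
```

Because the key name is compared as a string value, the dot in "DATA.COD" is never interpreted as a path separator.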
Here is the problem I want to resolve:
each document contains an array of 30 integers
the documents are grouped under a certain condition (not relevant here)
while grouping them, I want to:
add together the 29 last elements of the array (skipping the first one) of each document
sum the previous result among the same group, and return it
Data structure is very difficult to change and I cannot afford a migration + I still need the 30 values for another purpose. Here is what I tried, unsuccessfully:
db.collection.aggregate([
{$match: {... some matching query ...}},
{$project: {total_29_last_values: {$add: ["$my_array.1", "$my_array.2", ..., "$my_array.29"]}}},
{$group: {
... some grouping here ...
my_result: {$sum: "$total_29_last_values"}
}}
])
Theoretically (IMHO) this should work, given the definition of $add in mongodb documentation, but for some reason it fails:
exception: $add only supports numeric or date types, not Array
Maybe there is not support for adding together elements of an array, but this seems strange...
Thanks for your help !
From the docs,
The $add expression has the following syntax:
{ $add: [ <expression1>, <expression2>, ... ] }
The arguments can be any valid expression as long as they resolve to
either all numbers or to numbers and a date.
It clearly states that the $add operator accepts only numbers or dates.
$my_array.1 resolves to an empty array, i.e. []. (You can match against a particular index, as in {$match:{"a.0":1}}, but you cannot derive the value at a particular index of an array this way. For that you need to use the $ or the $slice operators. This is currently an unresolved issue: JIRA1, JIRA2)
And the $add expression becomes $add:[[],[],[],..].
$add does not take an array as input and hence you get the error stating that it does not support Array as input.
What you need to do is:
Match the documents.
Unwind the my_array field.
Group by the _id of each document to get the sum of all the elements of the array, along with its first element.
Project, for each grouped document, that sum minus the first element, which is the sum of the last 29 elements.
Group the documents again, based on your condition, to get the total.
Stage operators:
db.collection.aggregate([
{$match:{}}, // condition
{$unwind:"$my_array"},
{$group:{"_id":"$_id",
"first_element":{$first:"$my_array"},
"sum_of_all":{$sum:"$my_array"}}},
{$project:{"_id":"$_id",
"sum_of_29":{$subtract:["$sum_of_all","$first_element"]}}},
{$group:{"_id":" ", // whatever condition
"my_result":{$sum:"$sum_of_29"}}}
])
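To sanity-check the arithmetic behind the pipeline (sum of all elements minus the first element, then a per-group total), here is a small Python emulation on in-memory dicts. The grouping field `kind`, the helper name, and the short sample arrays are assumptions made for illustration; the real arrays hold 30 integers.

```python
from collections import defaultdict

def sum_last_elements_by_group(docs, group_key):
    """Per group, add up every array element except the first of each document."""
    totals = defaultdict(int)
    for doc in docs:
        values = doc["my_array"]
        sum_of_all = sum(values)            # $sum over the unwound elements
        first_element = values[0]           # $first
        totals[doc[group_key]] += sum_of_all - first_element  # $subtract, then $sum
    return dict(totals)

docs = [
    {"_id": 1, "kind": "x", "my_array": [10, 1, 2]},
    {"_id": 2, "kind": "x", "my_array": [20, 3, 4]},
    {"_id": 3, "kind": "y", "my_array": [30, 5]},
]
print(sum_last_elements_by_group(docs, "kind"))
```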
I have a collection of documents like this one:
{
"_id" : ObjectId("..."),
"field1": "some string",
"field2": "another string",
"field3": 123
}
I'd like to be able to iterate over the entire collection, and find the entire number of fields there are. In this example document there are 3 (I don't want to include _id), but it ranges from 2 to 50 fields in a document. Ultimately, I'm just looking for the average number of fields per document.
Any ideas?
Iterate over the entire collection, and find the entire number of fields there are
Now you can utilise aggregation operator $objectToArray (SERVER-23310) to turn keys into values and count them. This operator is available in MongoDB v3.4.4+
For example:
db.collection.aggregate([
{"$project":{"numFields":{"$size":{"$objectToArray":"$$ROOT"}}}},
{"$group":{"_id":null, "fields":{"$sum":"$numFields"}, "docs":{"$sum":1}}},
{"$project":{"total":{"$subtract":["$fields", "$docs"]}, _id:0}}
])
The first $project stage turns each document's keys into an array and counts them. The second stage, $group, sums the number of keys/fields across the collection and counts the documents processed. The third stage, $project, subtracts the number of documents from the total number of fields (since you don't want to count _id).
You can easily add $avg in the last stage to compute the average.
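One possible way to get the average directly is sketched below as a pymongo-style pipeline, with a plain Python stand-in to check the numbers. The output field name `avgFields` and the helper `average_fields_locally` are assumptions, and the sketch presumes every document has an _id, which is why 1 is subtracted per document.

```python
# Subtract 1 per document so that _id is not counted, then average.
pipeline = [
    {"$project": {"numFields": {"$subtract": [{"$size": {"$objectToArray": "$$ROOT"}}, 1]}}},
    {"$group": {"_id": None, "avgFields": {"$avg": "$numFields"}}},
]

def average_fields_locally(docs):
    """Local stand-in for the pipeline above, on plain dicts."""
    counts = [len(doc) - 1 for doc in docs]   # every document has an _id
    return sum(counts) / len(counts)

docs = [
    {"_id": 1, "field1": "some string", "field2": "another string", "field3": 123},
    {"_id": 2, "field1": "some string"},
]
print(average_fields_locally(docs))
```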
PRIMARY> var count = 0;
PRIMARY> db.my_table.find().forEach( function(d) { for(f in d) { count++; } });
PRIMARY> count
1074942
This is the most simple way I could figure out how to do this. On really large datasets, it probably makes sense to go the Map-Reduce path. But, while your set is small enough, this'll do.
This visits every field of every document, so the cost grows with the total number of fields in the collection, but I'm not sure there is a better way.
You could create a Map-Reduce job. In the Map step iterate over the properties of each document as a javascript object, output the count and reduce to get the total.
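The shape of such a job can be illustrated in plain Python. This is not MongoDB's mapReduce API, only a sketch of the two steps on in-memory dicts; `map_step` and `reduce_step` are hypothetical names and the sample documents are invented.

```python
from functools import reduce

def map_step(doc):
    """Emit the number of fields in one document, ignoring _id."""
    return sum(1 for key in doc if key != "_id")

def reduce_step(left, right):
    """Combine two partial counts."""
    return left + right

docs = [
    {"_id": 1, "field1": "a", "field2": "b", "field3": 123},
    {"_id": 2, "field1": "a", "field2": "b"},
]
total = reduce(reduce_step, map(map_step, docs), 0)
print(total, total / len(docs))
```

Dividing the reduced total by the number of documents gives the average number of fields per document.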
A simple way is to find() the matching documents and count the keys of each one.
db.getCollection(<name>).find(<condition>)
Then, for each document in the result, get the number of keys:
Object.keys(doc).length