Can I do a query based on multiple documents? - mongodb

I have a collection of documents where every document should have another, matching document. (This isn't by design, only for my current operation.) However, the current count for the number of documents in the collection is an odd number. Is there a way I can do a query on values of a key that aren't shared by one other document?
I.e., if I have a collection like this:
{_id:'dcab0001', foo: 1, bar: 'dfgdgd'}
{_id:'dcab0002', foo: 2, bar: 'tjhttj'}
{_id:'dcab0003', foo: 1, bar: 'ydgdge'}
{_id:'dcab0004', foo: 3, bar: 'jyutkf'}
{_id:'dcab0005', foo: 3, bar: 'pofsth'}
I could do a query on foo that would return the document with _id dcab0002.

You could do this with MapReduce, or in MongoDB 2.2+ using the Aggregation Framework.
Here's an example using the Aggregation Framework:
db.pairs.aggregate(
// Group by values of 'foo' and count duplicates
{ $group: {
_id: '$foo',
key: { $push: '$_id' },
dupes: { $sum: 1 }
}},
// Find the 'foo' values that are unpaired (odd number of dupes)
{ $match: {
dupes: { $mod: [ 2, 1 ] }
}}
// Optionally, could add a $project to tidy up the output
)
Sample output:
{
"result" : [
{
"_id" : 2,
"key" : [
"dcab0002"
],
"dupes" : 1
}
],
"ok" : 1
}
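To make the two stages easier to follow, here is a minimal sketch in plain JavaScript (no MongoDB required) that mirrors the $group and $match logic on the sample collection from the question. The helper name findUnpaired is made up for illustration and is not part of any MongoDB API.

```javascript
// Hypothetical helper: mirrors the $group + $match stages in plain JavaScript.
function findUnpaired(docs, field) {
  // Counterpart of $group: collect _ids and count documents per field value.
  const groups = new Map();
  for (const doc of docs) {
    const value = doc[field];
    if (!groups.has(value)) {
      groups.set(value, { _id: value, key: [], dupes: 0 });
    }
    const group = groups.get(value);
    group.key.push(doc._id);
    group.dupes += 1;
  }
  // Counterpart of $match with $mod: [2, 1]: keep groups with an odd count.
  return [...groups.values()].filter((g) => g.dupes % 2 === 1);
}

const pairs = [
  { _id: 'dcab0001', foo: 1, bar: 'dfgdgd' },
  { _id: 'dcab0002', foo: 2, bar: 'tjhttj' },
  { _id: 'dcab0003', foo: 1, bar: 'ydgdge' },
  { _id: 'dcab0004', foo: 3, bar: 'jyutkf' },
  { _id: 'dcab0005', foo: 3, bar: 'pofsth' },
];

const unpaired = findUnpaired(pairs, 'foo');
console.log(JSON.stringify(unpaired));
// [{"_id":2,"key":["dcab0002"],"dupes":1}]
```

Note that an odd count is what gets flagged, so a value shared by three documents would show up as well, which is usually what you want when hunting for the one that lacks a partner.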

Related

Filter only documents that have ALL FIELDS non null (with aggregation framework)

I have many documents, but I want to figure out how to get only documents that have ALL FIELDS non null.
Suppose I have these documents:
[
{
'a': 1,
'b': 2,
'c': 3
},
{
'a': 9,
'b': 12
},
{
'a': 5
}
]
After filtering, only the first document should remain, because it is the only one with ALL FIELDS non-null. How can I do this?
If you want only the documents that have ALL FIELDS without listing each one in the filter, as in { a: {$exists : true}, b : {$exists : true}, c : {$exists : true} }, which gets unwieldy once a document has tens of fields, you can try the following hack, provided it performs well enough. It assumes a fixed schema: every document may contain only the fields a, b and c (plus the default _id) and nothing else.
If you know the total number of fields, you can check each document's field count; a full count means every field exists:
db.collection.aggregate([
/** add a new field which counts no.of fields in the document */
{
$addFields: { count: { $size: { $objectToArray: "$$ROOT" } } }
},
{
$match: { count: { $eq: 4 } } // we've 4 as 3 fields + _id
},
{
$project: { count: 0 }
}
])
Test : mongoplayground
Note : This only checks that the fields exist; it does not check for null, [] or '' values on those fields. It also might not work for nested fields.
If you want to check that all fields exist by name, and you can pass all the field names as input, try the query below:
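Since the count-based check only proves that the fields are present, here is a rough sketch, in plain JavaScript outside MongoDB, of what the stricter "exists and is not null" condition amounts to. The helper name hasAllFieldsNonNull and the fourth sample document are assumptions added for illustration.

```javascript
// Hypothetical helper: true only when every required field exists and is not null.
function hasAllFieldsNonNull(doc, requiredFields) {
  return requiredFields.every(
    (name) => doc[name] !== undefined && doc[name] !== null
  );
}

const docs = [
  { a: 1, b: 2, c: 3 },
  { a: 9, b: 12 },
  { a: 5 },
  { a: 7, b: null, c: 4 }, // not in the question: has all keys, but one is null
];

const complete = docs.filter((doc) => hasAllFieldsNonNull(doc, ['a', 'b', 'c']));
console.log(JSON.stringify(complete));
// [{"a":1,"b":2,"c":3}]
```

The fourth document would pass a pure field-count check, which is the gap the note describes. In a query, the per-field condition { a: { $ne: null } } should exclude both missing and null values, at the cost of listing the fields.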
db.collection.aggregate([
/** create a field with all keys/field names in the document */
{
$addFields: {
data: {
$let: {
vars: { data: { $objectToArray: "$$ROOT" } },
in: "$$data.k"
}
}
}
},
{
$match: { data: { $all: [ "b", "c", "a" ] } } /** List down all the field names from schema */
},
{
$project: { data: 0 }
}
])
Test : mongoplayground
Ref : aggregation-pipeline
You can use explain to check the performance of your queries.

MongoDB Find values passed in that don't match

I'm currently stuck on an issue with MongoDB aggregation. I have an array of '_ids' and need to check which of them exist in a specific collection.
Example:
I have 3 records in 'Collection 1' with _id 1,2,3. I can find the matching values using:
$match: {
_id: {
$in: [1, 2, 3, 4]
}
}
However, what I want to know is which of the values I passed in (1, 2, 3, 4) don't match a record. (In this case _id 4 has no matching record.)
So instead of returning the records with _id 1, 2, 3, it needs to return the _id that doesn't exist; in this example, '_id: 4'.
The query should also disregard any extra records in the collection. For example, if the collection held records with IDs 1-10 and I passed in a query to determine whether the _ids 1, 7, 15 existed, the value I'm expecting would be along the lines of '_id: 15 doesn't exist'.
My first thought was to use $project within an aggregation to hold each _id that was passed in, and then attach each record in the collection to the matching _id passed in. E.g.:
Record 1:
{
_id: 1,
Collection1: [
record details: ...,
...
...
]
},
{
_id: 2,
Collection1: [] // This _id passed in, doesn't have a matching collection
}
However, I can't seem to get a working example in this instance. Any help would be appreciated!
If the input documents are:
{ _id: 1 },
{ _id: 2 },
{ _id: 5 },
{ _id: 10 }
And the array to match is:
var INPUT_ARRAY = [ 1, 7, 15 ]
The following aggregation:
db.test.aggregate( [
{
$match: {
_id: {
$in: INPUT_ARRAY
}
}
},
{
$group: {
_id: null,
matches: { $push: "$_id" }
}
},
{
$project: {
ids_not_exist: { $setDifference: [ INPUT_ARRAY, "$matches" ] },
_id: 0
}
}
] )
Returns:
{ "ids_not_exist" : [ 7, 15 ] }
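For anyone who wants to check the set logic without a database, here is a small plain-JavaScript sketch of what the $match/$in, $group and $setDifference stages compute together. The helper name idsNotFound is an assumption for illustration only.

```javascript
// Hypothetical helper: rough plain-JavaScript counterpart of the pipeline.
function idsNotFound(inputIds, docs) {
  // Counterpart of $match + $group: the _ids that are present in the collection.
  const existing = new Set(docs.map((doc) => doc._id));
  // Rough counterpart of $setDifference: input values with no matching document.
  // (Unlike $setDifference, this does not remove duplicate input values.)
  return inputIds.filter((id) => !existing.has(id));
}

const testDocs = [{ _id: 1 }, { _id: 2 }, { _id: 5 }, { _id: 10 }];

const missing = idsNotFound([1, 7, 15], testDocs);
console.log(JSON.stringify(missing)); // [7,15]

// When none of the input ids exist, all of them are reported.
const noneMatch = idsNotFound([7, 15], testDocs);
console.log(JSON.stringify(noneMatch)); // [7,15]
```

One difference worth verifying against the real pipeline: when no document matches the $in, the $group stage receives no input and, as far as I know, emits no document at all, so the aggregation returns an empty result instead of the full input array. The calling code may need to treat "no result" as "all ids are missing".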
Are you looking for $not ?
MDB Docs

Aggregate on array of embedded documents

I have a MongoDB collection with multiple documents. Each document has an array with multiple subdocuments (or embedded documents, I guess?). Each of these subdocuments is in this format:
{
"name": string,
"count": integer
}
Now I want to aggregate these subdocuments to find
1. The top X counts and their names.
2. Same as 1., but the names have to match a regex before sorting and limiting.
I have tried the following for 1. already - it does return me the top X but unordered, so I'd have to order them again which seems somewhat inefficient.
[{
$match: {
_id: id
}
}, {
$unwind: {
path: "$array"
}
}, {
$sort: {
'count': -1
}
}, {
$limit: x
}]
Since I'm rather new to MongoDB, this is pretty confusing for me. Happy for any help. Thanks in advance.
The sort has to include the array name in order to avoid an additional sort later on.
Given the following document to work with:
{
students: [{
count: 4,
name: "Ann"
}, {
count: 7,
name: "Brad"
}, {
count: 6,
name: "Beth"
}, {
count: 8,
name: "Catherine"
}]
}
As an example, the following aggregation query matches any name containing the letter "h" or "e" (the character class /[he]/ matches either one). The $match on the name needs to come after the "$unwind" step so that only the array elements you need are kept.
db.tests.aggregate([
{$match: {
_id: ObjectId("5c1b191b251d9663f4e3ce65")
}},
{$unwind: {
path: "$students"
}},
{$match: {
"students.name": /[he]/
}},
{$sort: {
"students.count": -1
}},
{$limit: 2}
])
This is the output given the above mentioned input:
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 8, "name" : "Catherine" } }
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 6, "name" : "Beth" } }
Both names contain the letters "h" and "e", and the output is sorted from high to low.
When setting the limit to 1, the output is limited to:
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 8, "name" : "Catherine" } }
In this case only the highest count has been kept after having matched the names.
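As a cross-check of the stage order, here is a minimal plain-JavaScript sketch that mirrors $unwind, the regex $match, $sort and $limit on the sample document. The helper name topStudents is hypothetical, and the ObjectId is written as a plain string because the sketch runs outside MongoDB.

```javascript
// Hypothetical helper: mirrors $unwind -> $match (regex) -> $sort -> $limit.
function topStudents(doc, pattern, limit) {
  return doc.students
    // $unwind: one output row per array element
    .map((student) => ({ _id: doc._id, students: student }))
    // $match on "students.name"
    .filter((row) => pattern.test(row.students.name))
    // $sort: { "students.count": -1 }
    .sort((x, y) => y.students.count - x.students.count)
    // $limit
    .slice(0, limit);
}

const doc = {
  _id: '5c1b191b251d9663f4e3ce65', // plain string stand-in for the ObjectId
  students: [
    { count: 4, name: 'Ann' },
    { count: 7, name: 'Brad' },
    { count: 6, name: 'Beth' },
    { count: 8, name: 'Catherine' },
  ],
};

const top2 = topStudents(doc, /[he]/, 2).map((row) => row.students.name);
console.log(top2); // [ 'Catherine', 'Beth' ]

const top1 = topStudents(doc, /[he]/, 1).map((row) => row.students.name);
console.log(top1); // [ 'Catherine' ]
```

Brad has a higher count than Beth but is dropped by the name filter before sorting, which is why the filter has to run before $sort and $limit.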
=====================
Edit for the extra question:
Yes, the first $match can be changed to filter on specific universities.
{$match: {
university: "University X"
}},
That will give one or more matching documents (in case you have a document per year or so) and the rest of the aggregation steps would still be valid.
The following match would retrieve the students for the given university for a given academic year in case that would be needed.
{$match: {
university: "University X",
academic_year: "2018-2019"
}},
That should narrow it down to get the correct documents.

How to aggregate all existing field in my document [duplicate]

I got a problem when I use db.collection.aggregate in MongoDB.
I have a data structure like:
_id:...
Segment:{
"S1":1,
"S2":5,
...
"Sn":10
}
That is, Segment may have several sub-attributes with numeric values, and I'd like to sum them up as 1 + 5 + .. + 10.
The problem is: I'm not sure about the sub attributes names since for each document the segment numbers are different. So I cannot list each segment name. I just want to use something like a for loop to sum all values together.
I tried queries like:
db.collection.aggregate([
{$group:{
_id:"$Account",
total:{$sum:"$Segment.$"}
}}
])
but it doesn't work.
You have made the classic mistake of using arbitrary field names. MongoDB is "schema-free", but that doesn't mean you don't need to think about your schema. Key names should be descriptive, and in your case "S2", for example, does not really mean anything. For most kinds of queries and operations, you will need to redesign your schema to store your data like this:
_id:...
Segment:[
{ field: "S1", value: 1 },
{ field: "S2", value: 5 },
{ field: "Sn", value: 10 },
]
You can then run your query like:
db.collection.aggregate( [
{ $unwind: "$Segment" },
{ $group: {
_id: '$_id',
sum: { $sum: '$Segment.value' }
} }
] );
This then results in something like the following (with the only document from your question):
{
"result" : [
{
"_id" : ObjectId("51e4772e13573be11ac2ca6f"),
"sum" : 16
}
],
"ok" : 1
}
Starting with Mongo 3.4.4, this can be achieved with inline operations, avoiding expensive stages such as $group:
// { _id: "xx", segments: { s1: 1, s2: 3, s3: 18, s4: 20 } }
db.collection.aggregate([
{ $addFields: {
total: { $sum: {
$map: { input: { $objectToArray: "$segments" }, as: "kv", in: "$$kv.v" }
}}
}}
])
// { _id: "xx", total: 42, segments: { s1: 1, s2: 3, s3: 18, s4: 20 } }
The idea is to transform the object (containing the numbers to sum) into an array. This is the role of $objectToArray, which, starting with Mongo 3.4.4, transforms { s1: 1, s2: 3, ... } into [ { k: "s1", v: 1 }, { k: "s2", v: 3 }, ... ]. This way, we don't need to care about the field names, since we can access the values through their "v" fields.
Having an array rather than an object is a first step towards being able to sum its elements. But the elements obtained with $objectToArray are objects and not simple integers. We can get past this by mapping (the $map operation) these array elements to extract the value of their "v" field, which in our case produces this kind of array: [1, 3, 18, 20].
Finally, it's a simple matter of summing elements within this array, using the $sum operation.
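The three steps can be replayed in plain JavaScript, where Object.entries plays roughly the role of $objectToArray. This is only a sketch of the logic under that analogy; the function name sumSegments is made up for illustration.

```javascript
// Plain-JavaScript counterpart of $objectToArray -> $map -> $sum.
function sumSegments(segments) {
  // $objectToArray: { s1: 1, ... } becomes [ { k: "s1", v: 1 }, ... ]
  const pairs = Object.entries(segments).map(([k, v]) => ({ k, v }));
  // $map: keep only the "v" of each pair
  const values = pairs.map((kv) => kv.v);
  // $sum: add the values up
  return values.reduce((total, v) => total + v, 0);
}

const total = sumSegments({ s1: 1, s2: 3, s3: 18, s4: 20 });
console.log(total); // 42
```

The field names never appear in the computation, which is the whole point: the sum works for any number of segments, whatever they are called.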
Segment: {s1: 10, s2: 4, s3: 12}
{$set: {"new_array":{$objectToArray: "$Segment"}}}, //makes field names all "k" or "v"
{$project: {_id:0, total:{$sum: "$new_array.v"}}}
"total" will be 26.
$set is an alias for $addFields that is available from MongoDB 4.2. (I'm using 4.2.) The intermediate "new_array" field looks like this:
"new_array": [
{
"k": "s1",
"v": 10
},
{
"k": "s2",
"v": 4
},
{
"k": "s3",
"v": 12
}
]
You can also use regular expressions, e.g. /^s/i for words starting with "s".
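One way to read the regular-expression remark is as a filter on the key names before summing, so that only segments whose name matches the pattern are counted. The sketch below shows that idea in plain JavaScript; the helper name sumMatchingKeys and the extra "other" key are assumptions for illustration, not something from the answers above.

```javascript
// Hypothetical helper: sum only the values whose key matches a pattern.
function sumMatchingKeys(segment, pattern) {
  return Object.entries(segment)
    .filter(([key]) => pattern.test(key))
    .reduce((sum, [, value]) => sum + value, 0);
}

// "other" is added here only to show a key that the pattern rejects.
const segment = { s1: 10, s2: 4, s3: 12, other: 100 };

const filteredTotal = sumMatchingKeys(segment, /^s/i);
console.log(filteredTotal); // 26
```

Inside a pipeline, the same filter would have to be applied to the "k" values of the array produced by $objectToArray.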
