Get number of documents per field in MongoDB, like Studio3T Schema operation - mongodb

How do I obtain the distribution of fields in a MongoDB collection, i.e. the total count of documents for each field (without knowing the fields)?
E.g. considering these documents:
{ "doc": { "a": …, "b": … } }
{ "doc": { "a": …, "c": … } }
{ "doc": { "a": …, "c": …, "d": { "e": … } } }
I would like to get
{ "a": 3, "b": 1, "c": 2, "d": 1, "d.e": 1 }
Studio3T has a "Schema" feature which does exactly that (and a bit more) for a random sample of the DB. How is the query constructed?

One way is to use db.collection.countDocuments() with the $exists operator:
db.collection.countDocuments({ a: { $exists: true } });
db.collection.countDocuments({ b: { $exists: true } });
db.collection.countDocuments({ c: { $exists: true } });
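If the field names are not known up front, one option is to walk a sample of documents client-side and count every dotted path once per document. The sketch below is only an illustration: countFieldPaths is a hypothetical helper, the sample values are made up (the question elides them), and BSON types such as ObjectId or Date would need an extra check before recursing.

```javascript
// Count, for every dotted field path, how many documents contain it.
// Hypothetical helper; runs client-side on an array of plain objects.
function countFieldPaths(docs) {
  const counts = {};
  const walk = (value, prefix, seen) => {
    for (const [key, child] of Object.entries(value)) {
      const path = prefix ? `${prefix}.${key}` : key;
      seen.add(path);
      // Recurse into nested plain objects only (arrays are treated as leaves).
      if (child !== null && typeof child === "object" && !Array.isArray(child)) {
        walk(child, path, seen);
      }
    }
  };
  for (const doc of docs) {
    const seen = new Set(); // each path counts at most once per document
    walk(doc, "", seen);
    for (const path of seen) counts[path] = (counts[path] || 0) + 1;
  }
  return counts;
}

// Made-up values standing in for the elided ones in the question.
const sample = [
  { doc: { a: 1, b: 2 } },
  { doc: { a: 1, c: 3 } },
  { doc: { a: 1, c: 3, d: { e: 4 } } },
];
const fieldCounts = countFieldPaths(sample.map((d) => d.doc));
console.log(fieldCounts); // { a: 3, b: 1, c: 2, d: 1, 'd.e': 1 }
```

In the shell, the input could come from something like db.collection.aggregate([{ $sample: { size: 1000 } }]).toArray(), where the sample size is an arbitrary choice.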

Related

MongoDB query: keys and their distinct values (each key independently)?

If you have this collection of objects:
{ "a": 10, "b": 20, "c": 30 }
{ "a": 11, "b": 20, "c": 31 }
{ "a": 10, "b": 20, "c": 31 }
There is a way to get distinct values, for example, for field "a":
[10, 11]
There is also a way to get distinct values of any tuple, for example, for pairs of ("b", "c"):
[
{"b": 20, "c": 30},
{"b": 20, "c": 31}
]
Is there a way to query distinct values for each field individually in a single query?
For example, I can simply use query 1 above 3 times for "a", "b", "c":
[10, 11]
[20]
[30, 31]
But I guess it might be less efficient and there should be a better option.
Bonus: How to do it if the list of fields is not known upfront?
Ideally, the single query should return all keys and their distinct values:
{
"a": [10, 11],
"b": [20],
"c": [30, 31]
}
Assuming you don't know the full list of the fields beforehand, you need to use $objectToArray to convert the $$ROOT document into an array of k-v tuples. Then group by the field name and $addToSet the values.
db.collection.aggregate([
{
"$project": {
_id: 0,
arr: {
"$objectToArray": "$$ROOT"
}
}
},
{
"$unwind": "$arr"
},
{
$match: {
"arr.k": {
$ne: "_id"
}
}
},
{
$group: {
_id: "$arr.k",
values: {
"$addToSet": "$arr.v"
}
}
}
])
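The pipeline above returns one document per key. To get the single document the question asks for, two more stages can be appended. This is a sketch rather than a tested answer: the variable name and the intermediate pairs field are made up, and the order of values inside each array is not guaranteed with $addToSet.

```javascript
// Same stages as above, plus two stages that fold the per-key results
// into one document. `distinctPerKeyPipeline` and `pairs` are illustrative names.
const distinctPerKeyPipeline = [
  { $project: { _id: 0, arr: { $objectToArray: "$$ROOT" } } },
  { $unwind: "$arr" },
  { $match: { "arr.k": { $ne: "_id" } } },
  { $group: { _id: "$arr.k", values: { $addToSet: "$arr.v" } } },
  // Collect the per-key documents into one array of { k, v } pairs...
  { $group: { _id: null, pairs: { $push: { k: "$_id", v: "$values" } } } },
  // ...and turn that array back into a single document.
  { $replaceRoot: { newRoot: { $arrayToObject: "$pairs" } } },
];

// In the shell: db.collection.aggregate(distinctPerKeyPipeline)
```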

Sum from array object value by mongodb

I am trying to solve a problem. I want to write a query that finds, among my documents, the one with the greatest sum of the A and B values across the elements of an array. I have written an example below. I am new to MongoDB and have searched a lot, but could not find a solution. Can somebody help me solve this problem? Here are my sample documents:
document1:
{
"_id" : "1",
"array": [
{
"user": "1",
"A": 2,
"B": 0
},
{
"user": "2",
"A": 3,
"B": 1
},
{
"user": "3",
"A": 0,
"B": 5
}
]
}
and document 2:
{
"_id" : "2",
"array": [
{
"user": "4",
"A": 1,
"B": 1
},
{
"user": "5",
"A": 2,
"B": 2
}
]
}
For example, the sum of all A and B values across the array elements in document 1 is 11, and the same sum for document 2 is 6. So I want document 1 as the output, because its total is greater than that of document 2.
You can try this query:
Create an auxiliary field called total (or whatever name you want) and $add the values. This adds the $sum of each array, i.e. all values of A and all values of B together.
Then $sort by the auxiliary field so the greatest total comes first.
$limit to only one document (the greatest).
And $project to leave the auxiliary field out of the output.
db.collection.aggregate([
{
"$addFields": {
"total": {
"$add": [
{
"$sum": "$array.A"
},
{
"$sum": "$array.B"
}
]
}
}
},
{
"$sort": {
"total": -1
}
},
{
"$limit": 1
},
{
"$project": {
"total": 0
}
}
])
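To check the arithmetic the pipeline relies on, here is the same computation done client-side on the two sample documents from the question. The helper name total simply mirrors the auxiliary field; this is a sanity check, not a replacement for the aggregation.

```javascript
// The two sample documents from the question.
const docs = [
  { _id: "1", array: [
    { user: "1", A: 2, B: 0 },
    { user: "2", A: 3, B: 1 },
    { user: "3", A: 0, B: 5 },
  ] },
  { _id: "2", array: [
    { user: "4", A: 1, B: 1 },
    { user: "5", A: 2, B: 2 },
  ] },
];

// Equivalent of $add: [{ $sum: "$array.A" }, { $sum: "$array.B" }]
const total = (doc) => doc.array.reduce((sum, el) => sum + el.A + el.B, 0);

// Equivalent of $sort: { total: -1 } followed by $limit: 1
const greatest = docs.reduce((best, doc) => (total(doc) > total(best) ? doc : best));

console.log(total(docs[0]), total(docs[1])); // 11 6
console.log(greatest._id); // 1
```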

Get value from field if another field matches condition MongoDB

For example, I have a document with this structure:
{
"_id": "1230987",
"Z": [{
"A": [{
"B": {
"C": [{
"E": "2104331180",
"D": "boroda.jpg"
}, {
"E": "1450987095",
"D": "small.PNG"
}]
}
}]
}]
}
How could I get the value of field E if the value in field D matches a condition?
Use an $elemMatch projection:
db.collection.find({ "Z.A.B.C.D": <condition> }, { "Z.A.B.C": { $elemMatch: { D: <condition> } } })
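In case the $elemMatch projection is not accepted on a dotted path (worth checking against your server version), an aggregation that unwinds each array level is another way to reach the matching elements. This is a sketch under assumptions: buildEByDPipeline is a made-up helper, and "small.PNG" is just one example condition taken from the sample document.

```javascript
// Build a pipeline that returns the E values whose sibling D matches.
// `dCondition` stands for whatever condition the query needs,
// e.g. a literal such as "small.PNG" or an operator document.
function buildEByDPipeline(dCondition) {
  return [
    { $match: { "Z.A.B.C.D": dCondition } }, // pre-filter whole documents
    { $unwind: "$Z" },
    { $unwind: "$Z.A" },
    { $unwind: "$Z.A.B.C" },
    { $match: { "Z.A.B.C.D": dCondition } }, // keep only the matching C elements
    { $project: { _id: 0, E: "$Z.A.B.C.E", D: "$Z.A.B.C.D" } },
  ];
}

const eByDPipeline = buildEByDPipeline("small.PNG");
// In the shell: db.collection.aggregate(eByDPipeline)
```

The first $match is only there so that documents without any matching D are dropped before the $unwind stages multiply the rest.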

Project object existence boolean in MongoDB

I have a document structure that looks like this (two example docs below).
{
"A": "value"
},
{
"A": "value",
"B": {
"a": "value",
"b": "value"
}
}
I want to aggregate such that the value of field A is projected while a true/false value is returned depending on whether the object B exists. The result of the query would be:
{
"A": "value",
"B": false
},
{
"A": "value",
"B": true
}
An even shorter solution:
db.collection.aggregate({
$project: {
A: 1,
B: { $cond: ["$B", true, false] }
}
})
or
db.collection.aggregate({
$project: {
A: 1,
B: { $ifNull: [{ $toBool: "$B" }, false] }
}
})
However, the following documents will yield a different result than with the other answers. Check whether such documents can occur in your application.
{
'A': 'value5',
'B': false
},
{
'A': 'value5',
'B': []
}
You can use the aggregation below:
db.collection.aggregate([
{ "$addFields": {
"B": {
"$cond": [
{ "$eq": ["$B", undefined] },
false,
true
]
}
}}
])
You may use $type operator:
If the argument is a field that is missing in the input document, $type returns the string "missing".
db.collection.aggregate([
{
$project: {
A: 1,
B: {
$ne: [
{
$type: "$B"
},
"missing"
]
}
}
}
])
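If only an actual embedded document should count as "exists" (so that values like B: false or B: [] report false), a stricter variant compares the $type result with "object" instead of "missing". This is a sketch of that idea, not something from the answers above; the variable name is made up.

```javascript
// Project A as-is and B as a boolean that is true only when B is present
// AND is an embedded document. The aggregation $type operator returns
// "object" for embedded documents, "array" for arrays and "missing"
// for absent fields.
const objectExistsPipeline = [
  {
    $project: {
      A: 1,
      B: { $eq: [{ $type: "$B" }, "object"] },
    },
  },
];

// In the shell: db.collection.aggregate(objectExistsPipeline)
```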

How to select only not null values when aggregating with first or last in mongodb?

My data represents a dictionary that receives a bunch of updates and potentially new fields (metadata being added to a post). So something like:
> db.collection.find()
{ _id: ..., 'A': 'apple', 'B': 'banana' },
{ _id: ..., 'A': 'artichoke' },
{ _id: ..., 'B': 'blueberry' },
{ _id: ..., 'C': 'cranberry' }
The challenge - I want to find the first (or last) value for each key ignoring blank values (i.e. I want some kind of conditional group by that works at a field not document level). (Equivalent to the starting or ending version of the metadata after updates).
The problem is that:
db.collection.aggregate([
{ $group: {
_id: null,
A: { $last: '$A' },
B: { $last: '$B' },
C: { $last: '$C' }
}}
])
fills in the blanks with nulls (rather than skipping them in the result), so I get:
{ '_id': ..., 'A': null, 'B': null, 'C': 'cranberry' }
when I want:
{ '_id': ..., 'A': 'artichoke', 'B': 'blueberry', 'C': 'cranberry' }
I don't think this is what you really want, but it does solve the problem you are asking. The aggregation framework cannot really do this, as you are asking for "last results" of different columns from different documents. There is really only one way to do this and it is pretty insane:
db.collection.aggregate([
{ "$group": {
"_id": null,
"A": { "$push": "$A" },
"B": { "$push": "$B" },
"C": { "$push": "$C" }
}},
{ "$unwind": "$A" },
{ "$group": {
"_id": null,
"A": { "$last": "$A" },
"B": { "$last": "$B" },
"C": { "$last": "$C" }
}},
{ "$unwind": "$B" },
{ "$group": {
"_id": null,
"A": { "$last": "$A" },
"B": { "$last": "$B" },
"C": { "$last": "$C" }
}},
{ "$unwind": "$C" },
{ "$group": {
"_id": null,
"A": { "$last": "$A" },
"B": { "$last": "$B" },
"C": { "$last": "$C" }
}},
])
Essentially you compact down the documents pushing all of the found elements into arrays. Then each array is unwound and the $last element is taken from there. You need to do this for each field in order to get the last element of each array, which was the last match for that field.
Not really good, and certain to blow past the 16MB BSON limit on any meaningful collection.
So what you are really after is looking for a "last seen" value for each field. You could brute force this by iterating the collection and keeping values that are not null. You can even do this on the server like this with mapReduce:
db.collection.mapReduce(
function () {
if (start == 0)
emit( 1, "A" );
start++;
current = this;
Object.keys(store).forEach(function(key) {
if ( current.hasOwnProperty(key) )
store[key] = current[key];
});
},
function(){},
{
"scope": { "start": 0, "store": { "A": null, "B": null, "C": null } },
"finalize": function(){ return store },
"out": { "inline": 1 }
}
)
That will work as well, but iterating the whole collection is nearly as bad as mashing everything together with aggregate.
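For comparison, the brute-force idea of iterating the documents and keeping the values that are not null can also be sketched client-side. lastSeenValues is a made-up helper, and it assumes the documents arrive in the order you consider "first to last".

```javascript
// Keep, for each key, the last value seen that is neither missing nor null.
function lastSeenValues(docs, keys) {
  const store = Object.fromEntries(keys.map((key) => [key, null]));
  for (const doc of docs) {
    for (const key of keys) {
      if (doc[key] !== undefined && doc[key] !== null) store[key] = doc[key];
    }
  }
  return store;
}

// The sample collection from the question, without the _id fields.
const updates = [
  { A: "apple", B: "banana" },
  { A: "artichoke" },
  { B: "blueberry" },
  { C: "cranberry" },
];
console.log(lastSeenValues(updates, ["A", "B", "C"]));
// { A: 'artichoke', B: 'blueberry', C: 'cranberry' }
```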
What you really want in this case is three queries, ideally in parallel, to just get the discrete value last seen for each property:
> db.collection.find({ "A": { "$exists": true } }).sort({ "$natural": -1 }).limit(1)
{ "_id" : ObjectId("54b319cd6997a054ce4d71e7"), "A" : "artichoke" }
> db.collection.find({ "B": { "$exists": true } }).sort({ "$natural": -1 }).limit(1)
{ "_id" : ObjectId("54b319cd6997a054ce4d71e8"), "B" : "blueberry" }
> db.collection.find({ "C": { "$exists": true } }).sort({ "$natural": -1 }).limit(1)
{ "_id" : ObjectId("54b319cd6997a054ce4d71e9"), "C" : "cranberry" }
Actually, even better is to create a sparse index on each property and query via $gt and a blank string. This makes sure an index is used, and as a sparse index it will only contain documents where the property is present. You'll need to .hint() this, but you still want $natural ordering for the sort:
db.collection.createIndex({ "A": -1 },{ "sparse": true })
db.collection.createIndex({ "B": -1 },{ "sparse": true })
db.collection.createIndex({ "C": -1 },{ "sparse": true })
> db.collection.find({ "A": { "$gt": "" } }).hint({ "A": -1 }).sort({ "$natural": -1 }).limit(1)
{ "_id" : ObjectId("54b319cd6997a054ce4d71e7"), "A" : "artichoke" }
> db.collection.find({ "B": { "$gt": "" } }).hint({ "B": -1 }).sort({ "$natural": -1 }).limit(1)
{ "_id" : ObjectId("54b319cd6997a054ce4d71e8"), "B" : "blueberry" }
> db.collection.find({ "C": { "$gt": "" } }).hint({ "C": -1 }).sort({ "$natural": -1 }).limit(1)
{ "_id" : ObjectId("54b319cd6997a054ce4d71e9"), "C" : "cranberry" }
That's the best way to solve what you are saying here. But as I said, this is how you think you need to solve it. Your real problem likely has another way to approach both storing and querying.
Starting with MongoDB 3.6, for those using $first or $last as a way to get one value from grouped records (not necessarily the actual first or last), $group's $mergeObjects can be used as a way to find a non-null value from grouped items:
// { "A" : "apple", "B" : "banana" }
// { "A" : "artichoke" }
// { "B" : "blueberry" }
// { "C" : "cranberry" }
db.collection.aggregate([
{ $group: {
_id: null,
A: { $mergeObjects: { a: "$A" } },
B: { $mergeObjects: { b: "$B" } },
C: { $mergeObjects: { c: "$C" } }
}}
])
// { _id: null, A: { a: "artichoke" }, B: { b: "blueberry" }, C: { c: "cranberry" } }
$mergeObjects accumulates an object based on each grouped record. The thing to note is that $mergeObjects gives priority to values that aren't null. But it requires wrapping the accumulated field in an object, hence the "awkward" { a: "$A" }.
If the output format isn't exactly what you expect, one can always use an additional $project stage.
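As an illustration of that extra $project stage, the wrapper objects can be unwrapped back into plain A, B and C fields. This is a sketch; the variable name is made up and the field paths simply follow the wrapper keys used in the $group above.

```javascript
// $group with $mergeObjects as above, followed by a $project that
// unwraps { a: ... }, { b: ... } and { c: ... } into plain fields.
const lastNonNullPipeline = [
  {
    $group: {
      _id: null,
      A: { $mergeObjects: { a: "$A" } },
      B: { $mergeObjects: { b: "$B" } },
      C: { $mergeObjects: { c: "$C" } },
    },
  },
  { $project: { _id: 0, A: "$A.a", B: "$B.b", C: "$C.c" } },
];

// In the shell: db.collection.aggregate(lastNonNullPipeline)
```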
So I've just thought about how to answer this, but would be interested to hear people's opinions on how right/wrong this is. Based on the reply from @NeilLunn I guess I'll hit the BSON limit, making his version better for pulling the data, but it's important to my app that I can run this query in one go. (Perhaps my real problem is the data design).
The problem we have is that in the "group by" we pull in a version of A, B, C for every document. So my solution is to tell the aggregation what fields it should pull in by changing (slightly) the original data structure to tell the engine which keys are in each document:
> db.collection.find()
{ _id: ..., 'A': 'apple', 'B': 'banana', 'Keys': ['A', 'B']},
{ _id: ..., 'A': 'artichoke', 'Keys': ['A']},
{ _id: ..., 'B': 'blueberry', 'Keys': ['B']},
{ _id: ..., 'C': 'cranberry', 'Keys': ['C']}
Now we can $unwind on 'Keys' and then group with 'Keys' as '_id'. Thus:
db.collection.aggregate([
{'$unwind': '$Keys'},
{'$group':
{'_id': '$Keys',
'A': {'$last': '$A'},
'B': {'$last': '$B'},
'C': {'$last': '$C'}
}
}
])
I get back a series of documents with _id equal to the key:
{_id: 'A', 'A': 'artichoke', 'B': null, 'C': null},
{_id: 'B', 'A': null, 'B': 'blueberry', 'C': null},
{_id: 'C', 'A': null, 'B': null, 'C': 'cranberry'}
You can then pull the results you want, knowing that the value for key X is only valid for the result where _id is X.
(Of course the next question is how to reduce this series of documents to one, taking the appropriate field each time)
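One simple way to do that last step is client-side: for each result document, take only the field named by its _id. A minimal sketch, using the result documents shown above as input (the variable names are made up):

```javascript
// The per-key results returned by the aggregation above.
const grouped = [
  { _id: "A", A: "artichoke", B: null, C: null },
  { _id: "B", A: null, B: "blueberry", C: null },
  { _id: "C", A: null, B: null, C: "cranberry" },
];

// For the document with _id X, only the value of field X is valid.
const merged = grouped.reduce((acc, doc) => {
  acc[doc._id] = doc[doc._id];
  return acc;
}, {});

console.log(merged); // { A: 'artichoke', B: 'blueberry', C: 'cranberry' }
```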