Find out all types used in a MongoDB collection - mongodb

Let's say you want to use mongoexport/import to update a collection (for reasons explained here. You should make sure the types in the collection are JSON-safe.
How can one determine all the types used in all documents of a collection, including within array elements, using the aggregation framework?

You can use $objectToArray in combination with $map and $type.
I think something like this should get you started:
db.collection.aggregate([
{ $project: {
types: {
$map: {
input: { $objectToArray: "$$CURRENT" },
in: { $type: [ "$$this.v" ] }
}
}
}
}
])
Note it is not recursive and it would not go deep into the values of the arrays since I am not also sure how many levels you want to go deep and even what is the desired output. So hopefully that is a good start for you.
You can see that aggregation with provided input with various types working here.

Related

Is it possible to $setOnInsert with aggregation pipeline?

MongoDB has recently added an option to perform an update operation by providing an aggregation pipeline rather than the standard modifier object. Check MongoDB's docs on this topic.
The ability to use aggregation pipeline, whose statements can refer to existing document properties, can be extremely useful in situations when certain fields needs to be evaluated based on other fields, e.g. during data migration.
Moreover, most of the standard update operators like $set, $push, $inc, etc. can be successfully replicated with the aggregation expression language so in some sense this new functionality generalizes the good old modifiers technique. Though, I must admit the pipeline can become quite verbose if one tries to do things like $addToSet. This of course brings up a whole bunch of performance related questions, but let's ignore them for now.
So far, there's been just one thing which I haven't been able to fully replicate with the aggregation pipeline update, namely the $setOnInsert operator. Let's assume that I want to perform an upsert:
db.test.update(selector, pipeline, { upsert: true });
My initial intuition was that the $$ROOT variable (which I can use in the pipeline) will equal null unless there exists a document that matches selector. Unfortunately, but probably for a good reason, MongoDB developers decided that $$ROOT should be derived from selector by default. It makes sense when you think about how normal $setOnInsert works, but it also makes it practically impossible to distinguish between an update and an insert within pipeline.
I know what you're thinking. You can look at $$ROOT._id. This is a good idea, though if _id is part of the selector it doesn't work anymore. I have figured out that this can be bypassed by tricking MongoDB a little bit and doing things like:
selector = {
_id: { $in: [value, 'fake'] },
}
instead if the simpler { _id: value }, but this doesn't look clean. Please note that if $in only contains one element, then Mongo is actually clever enough to figure out what the identifier should be and it populates $$ROOT accordingly (sic!).
I am wondering if anyone has a better idea how to approach this. Maybe there's some hidden variable that I could potentially use inside the pipeline itself to distinguish between update and insert (e.g. in $merge stage there's $$new variable which serves a similar purpose)?
If there is no matching documents, $$ROOT will have only _id field. So you can transform $$ROOT to array by its key/value pairs and check if the size of that array is equal to 1. If it is then create a new document, and if it is not then do nothing.
$objectToArray and $size to convert $$ROOT to an array by its key/value pairs and to get the size of that array
$cond to check if the size of the array above is equal to 1. If it is then merge current $$ROOT (which is only _id field) with the update object. If it is not, return the current $$ROOT. In both scenarios, put result in result feild.
$mergeObjects to merge $$ROOT and the update that you are sending, and put that in the result field
$replaceRoot to replace root to the result field from previous stage
db.collection.update({
_id: 1
},
[
{
$set: {
result: {
$cond: {
if: {
"$eq": [
{
$size: {
$objectToArray: "$$ROOT"
},
},
1
]
},
then: {
$mergeObjects: [
"$$ROOT",
{
key: 3
}
]
},
else: "$$ROOT"
},
}
}
},
{
$replaceRoot: {
newRoot: "$result"
}
}
],
{
upsert: true
})
Working example

MongoDB query over several collections with one sort stage

I have some data with identical layout divided over several collections, say we have collections named Jobs.Current, Jobs.Finished, Jobs.ByJames.
I have implemented a complex query using some aggregation stages on one of these collections, where the last stage is the sorting. It's something like this (but in real it's implemented in C# and additionally doing a projection):
db.ArchivedJobs.aggregate([ { $match: { Name: { $gte: "A" } } }, { $addFields: { "UpdatedTime": { $max: "$Transitions.TimeStamp" } } }, { $sort: { "__tmp": 1 } } ])
My new requirement is to include all these collections into my query. I could do it by simply running the same query on all collections in sequence - but then I still need to sort the results together. As this sort isn't so trivial (using an additional field being created by a $max on a sub-array) and I'm using skip and limit options I hope it's possible to do it in a way like:
Doing the query I already implemented on all relevant collections by defining appropriate aggregation steps
Sorting the whole result afterwards inside the same aggregation request
I found something with a $lookup stage, but couldn't apply it to my request as it needs to do some field-oriented matching (?). I need to access the complete objects.
The data is something like
{
"_id":"4242",
"name":"Stream recording Test - BBC 60 secs switch",
"transitions":[
{
"_id":"123",
"timeStamp":"2020-02-13T14:59:40.449Z",
"currentProcState":"Waiting"
},
{
"_id":"124",
"timeStamp":"2020-02-13T14:59:40.55Z",
"currentProcState":"Running"
},
{
"_id":"125",
"timeStamp":"2020-02-13T15:00:23.216Z",
"currentProcState":"Error"
} ],
"currentState":"Error"
}

How can I use an aggregation pipeline to see which documents have a field with a string that starts with any of the strings in a list?

I am using mongo server version 3.4, so my question pertains to the functionality of that version. I cannot upgrade anytime soon, so please keep that in mind. If have a field in some documents in a MongoDB collection that may contain a string but also have trailing characters, how might I find them when submitting multiple "startsWith" strings to be evaluated in the same query? I may have some difficulty explaining this, so let me show some examples. Let's say that I have a field called "description" in all of my documents. This description might be encoded so that the text is not completely straightforward. Some values might be:
green:A-4_ABC
yellow:C-12_456
red:A-431_ZXCVQ
yellow_green:C-12_999
brown:B-3_R
gray:EN-44_195
EDIT: I think I made a mistake with using words in my keys. The keys are a randomized string of numbers, letters, and underscores, followed by a colon, then one to three letters, followed by a dash, then a couple of numbers, then an underscore, and lastly followed by several alphanumeric characters:
LKEF543SLI54EH2J897FQ_HF234EWOH:ZX-82_FR2
I realize that this sounds arbitrary and stupid, but it is an encoding of information that is intended to result in a unique key. It is in data that I receive, so I cannot change it, unfortunately.
Now, I want to find all of the documents with descriptions that start with any of the following values, and all of these values must be submitted in the same query. I might have hundreds of submitted values, and I need to get all matching documents at once. Here is a short list of what might be submitted in a single query:
green:A-4
red:A-431
gray:EN-44
yellow_green:C-12
Note that it was not accidental that the text is everything prior to the last underscore. And, as with one of the examples, there might be more than one underscore. With my use case, I cannot create a query that hard-codes these strings in the javascript regex format. And the $in filter does not work with "startsWith" functionality, particularly when you pass in a list of strings (though I am familiar with supplying a list of hard-coded javascript regexes). Is there any way to use the $in operator where I can take a list of strings that are passed in from the user who wants to run a query like this? Or is there something equivalent? The cherry on the top of all of this would be to find a way to project the matching document with the string that it matched (either from the query, or by some substring magic that I cannot seem to figure out).
EDIT: Specifically, when I find each document, I want to be able to project everything from they key up until the LAST underscore, like:
LKEF543SLI54EH2J897FQ_HF234EWOH:ZX-82
(along with its value)
Thanks in advance for any nudges in the right direction.
We use $objectToArray to get {k:field_name, v:field_value} array. Then we split by _ token all values and convert to object with $arrayToObject operator.
Next step we apply $match operator to filter documents and exclude data with $unset.
Note: If your document contains array or subdocuments, we may use $filter before we convert $objectToArray.
db.collection.aggregate([
{
$addFields: {
data: {
$arrayToObject: {
$map: {
input: {
$objectToArray: "$$ROOT"
},
in: {
k: "$$this.k",
v: {
$arrayElemAt: [
{
$split: [
{
$toString: "$$this.v"
},
"_"
]
},
0
]
}
}
}
}
}
}
},
{
$match: {
"data.green": "A-4",
"data.red": "A-431",
"data.gray": "EN-44",
"data.yellow_green": "C-12"
}
},
{
$unset: "data"
}
])
MongoPlayground

What is the purpose of $eq

I noticed a new $eq operator released with MongoDB 3.0 and I don't understand the purpose of it. For instance these two queries are exactly the same:
db.users.find({age:21})
and
db.users.find({age:{$eq:21}})
Does anyone know why this was necessary?
The problem was that you'd have to handle equality differently from comparison when you had some kind of query builder, so it's
{ a : { $gt : 3 } }
{ a : { $lt : 3 } }
but
{ a : 3 }
for equality, which looks completely different. The same applies for composition of $not, as JohnnyHK already pointed out. Also, comparing with $eq saves you from having to $-escape user provided strings. Therefore, people asked for alternatives that are syntactically closer and it was implemented. The Jira ticket contains a longer discussion which mentions all these points.
The clearer syntax of an $eq operator might also make sense in the aggregation framework to compare two fields, should such a feature be implemented.
Also, the feature has apparently been around since 2.5, was added to the documentation relatively late.
One specific application I can see for $eq is with cases like the $not operator which requires that its value is an operator-expression.
This allows you to construct a query like:
db.zips.find({state: {$not: {$eq: 'NY'}}})
Before, the closest you could get to this semantically was:
db.zips.find({state: {$not: {$regex: /^NY$/}}})
I realize there are other ways to represent the functionality of that query, but if you need to use the $not operator there for other reasons, this would now allow it.
In filter part of an aggregation query if you need to check if some field is equal to a value you can not use assign syntax:
db.sales.aggregate([
{
$project: {
items: {
$filter: {
input: "$items",
as: "item",
cond: { $eq: [ "$$item.price", 100 ] }
}
}
}
}
])

Mongo complex sorting?

I know how to sort queries in MongoDB by multiple fields, e.g., db.coll.find().sort({a:1,b:-1}).
Can I sort with a user-defined function; e.g., supposing a and b are integers, by the difference between a and b (a-b)?
Thanks!
UPDATE: This answer appears to be out of date; it seems that custom sorting can be more or less achieved by using the $project function of the aggregation pipeline to transform the input documents prior to sorting. See also #Ari's answer.
I don't think this is possible directly; the sort documentation certainly doesn't mention any way to provide a custom compare function.
You're probably best off doing the sort in the client, but if you're really determined to do it on the server you might be able to use db.eval() to arrange to run the sort on the server (if your client supports it).
Server-side sort:
db.eval(function() {
return db.scratch.find().toArray().sort(function(doc1, doc2) {
return doc1.a - doc2.a
})
});
Versus the equivalent client-side sort:
db.scratch.find().toArray().sort(function(doc1, doc2) {
return doc1.a - doc2.b
});
Note that it's also possible to sort via an aggregation pipeline and by the $orderby operator (i.e. in addition to .sort()) however neither of these ways lets you provide a custom sort function either.
Ran into this and this is what I came up with:
db.collection.aggregate([
{
$project: {
difference: { $subtract: ["$a", "$b"] }
// Add other keys in here as necessary
}
},
{
$sort: { difference: -1 }
}
])
Why don't create the field with this operation and sort on it ?