How to update a string field in MongoDB and manipulate string values?

I have a MongoDB collection with some documents that have a field called Personal.FirstName and another field called Personal.Surname. Some documents are messed up and have the person's first name and last name in both fields. For example, there are some documents that have Personal.FirstName = 'John Doe' and Personal.Surname = 'John Doe'.
I want to write a mongo update statement that will do the following:
Find all of the documents that have a Personal section
Find all of the documents where Personal.FirstName == Personal.Surname
Update Personal.FirstName to be just the first part of Personal.FirstName before the space
Update Personal.Surname to be just the second part of Personal.Surname after the space
Is this possible in a mongo update statement? I am new to mongo and know very little about how to query it.
EDIT: here is an example document
{
  "_id" : LUUID("fcd140b1-ec0f-0c49-aa79-fed00899290e"),
  "Personal" : {
    "FirstName" : "John Doe",
    "Surname" : "John Doe"
  }
}

Before MongoDB 4.2 (which added updates that take an aggregation pipeline), you can't do this in a single update statement, but you can achieve it by iterating over the results like this:
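db.name.find({
  "Personal": { "$exists": true },
  "$where": "this.Personal.FirstName == this.Personal.Surname"
}).forEach(function (e) {
  if (typeof e.Personal.FirstName !== "string") return; // both fields missing
  var parts = e.Personal.FirstName.split(" ");
  if (parts.length < 2) return; // no space, so nothing to split
  db.name.updateOne( // save() is deprecated; $set only the two name fields
    { "_id": e._id },
    { "$set": { "Personal.FirstName": parts[0], "Personal.Surname": parts.slice(1).join(" ") } }
  );
});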
result:
{ "_id" : "fcd140b1-ec0f-0c49-aa79-fed00899290e", "Personal" : { "FirstName" : "John", "Surname" : "Doe" } }
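Before running the loop against real data, the splitting rule can be checked in isolation. Below is a minimal plain JavaScript sketch (runs in Node, no database needed) of what the loop does to one Personal sub-document. The helper name fixDuplicatedName is hypothetical, and treating everything after the first space as the surname is an assumption for names with more than two words.

```javascript
// Hypothetical helper, not part of MongoDB: returns the corrected
// { FirstName, Surname } pair, or null when the document should be left alone.
function fixDuplicatedName(personal) {
  // Same condition as the query: a Personal section with two equal name fields.
  if (!personal || typeof personal.FirstName !== "string") return null;
  if (personal.FirstName !== personal.Surname) return null;

  const parts = personal.FirstName.split(" ");
  // A single word contains no space, so there is nothing to split.
  if (parts.length < 2) return null;

  // Assumption: everything after the first space belongs to the surname.
  return { FirstName: parts[0], Surname: parts.slice(1).join(" ") };
}

// Usage on a few sample sub-documents.
const samples = [
  { FirstName: "John Doe", Surname: "John Doe" },
  { FirstName: "John", Surname: "Doe" },
  { FirstName: "Cher", Surname: "Cher" },
];
for (const personal of samples) {
  console.log(JSON.stringify(personal), "->", JSON.stringify(fixDuplicatedName(personal)));
}
```

Returning null instead of throwing keeps the caller's loop simple: anything that is not a duplicated two-part name is skipped.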

The idea is to get a subset of the documents from your collection by filtering the documents that match the specified criteria. Once you have the subset, you iterate over it and update each document within a loop.
To get the subset, you can run an aggregation pipeline, which should be faster than filtering with find() and the $where operator. Take the following example aggregate() operation, which uses $redact as the filtering mechanism and then a $project stage to create the FirstName and Surname fields that you can use in your update. The cursor returned by aggregate() can then be iterated with its forEach() method, updating the matching documents in the collection one by one:
db.collection.aggregate([
  // Only documents with a string FirstName; this also keeps $split/$size
  // below from failing on documents that have no Personal section
  { "$match": { "Personal.FirstName": { "$type": "string" } } },
  {
    "$redact": {
      "$cond": [
        {
          "$and": [
            { "$eq": [ "$Personal.FirstName", "$Personal.Surname" ] },
            // more than one part means the name contains a space
            { "$gt": [ { "$size": { "$split": [ "$Personal.FirstName", " " ] } }, 1 ] }
          ]
        },
        "$$KEEP",
        "$$PRUNE"
      ]
    }
  },
  {
    "$project": {
      "FirstName": { "$arrayElemAt": [ { "$split": [ "$Personal.FirstName", " " ] }, 0 ] },
      "Surname": { "$arrayElemAt": [ { "$split": [ "$Personal.FirstName", " " ] }, 1 ] }
    }
  }
]).forEach(function(doc) {
  db.collection.updateOne(
    { "_id": doc._id },
    {
      "$set": {
        "Personal.FirstName": doc.FirstName,
        "Personal.Surname": doc.Surname
      }
    }
  )
})
Using the aggregation framework with the $redact pipeline stage lets you evaluate the logical condition with the $cond operator, and uses the special system variables $$KEEP to "keep" the document when the logical condition is true, or $$PRUNE to "remove" the document when the condition is false.
This should perform significantly better, because the $redact stage uses MongoDB's native operators, whereas a query with the $where operator invokes the JavaScript engine to evaluate JavaScript code on every document, which can be very slow. (MongoDB evaluates non-$where query operators before $where expressions, and those non-$where conditions may use an index.)
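To make the $redact/$project logic concrete, here is a plain JavaScript emulation (not MongoDB code; runs in Node) of what the two stages compute for a small in-memory collection. At the top level of a document, $$KEEP and $$PRUNE act as a keep/drop filter. One assumption worth stating: $split always returns at least one element, so this sketch only keeps names that split into more than one part (a size check of > 0 would let single-word names through). The function name redactAndProject is hypothetical.

```javascript
// Hypothetical emulation of the $redact + $project stages on plain objects.
function redactAndProject(docs) {
  return docs
    .filter((doc) => {
      const p = doc.Personal || {};
      // Skip documents without a string FirstName (nothing to split).
      if (typeof p.FirstName !== "string") return false;
      // $eq: both name fields hold the same value.
      if (p.FirstName !== p.Surname) return false;
      // $size of $split: more than one part means the name contains a space.
      return p.FirstName.split(" ").length > 1;
    })
    .map((doc) => {
      const parts = doc.Personal.FirstName.split(" ");
      // $project keeps _id by default; $arrayElemAt picks elements 0 and 1.
      return { _id: doc._id, FirstName: parts[0], Surname: parts[1] };
    });
}

const collection = [
  { _id: 1, Personal: { FirstName: "John Doe", Surname: "John Doe" } },
  { _id: 2, Personal: { FirstName: "Jane", Surname: "Roe" } },
  { _id: 3, Personal: { FirstName: "Cher", Surname: "Cher" } },
  { _id: 4 },
];
console.log(redactAndProject(collection));
```

Only document 1 survives: document 2 has different names, document 3 has nothing to split, and document 4 has no Personal section.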

Related

How can I filter documents in MongoDB?

I have a query collection in MongoDB which contains documents in the format below:
{
  _id : ObjectId("61aced92ede..."),
  query : "How to solve...?",
  answer : [],
  is_solved : false
}
Now, I want to filter the documents with the following conditions:
Return all documents that are not solved (is_solved : false).
Return only "n" of the documents that are solved.
So the result will have all unsolved documents and only n (e.g. 10) solved documents, in a single array.
You can use this aggregation query:
First use $facet to create two branches: the solved documents and the unsolved documents.
In each branch, do the necessary $match, and $limit the solved documents.
Then concatenate the two arrays using $concatArrays.
db.collection.aggregate([
  {
    "$facet": {
      "not_solved": [
        { "$match": { "is_solved": false } }
      ],
      "solved": [
        { "$match": { "is_solved": true } },
        { "$limit": 10 }
      ]
    }
  },
  {
    "$project": {
      "result": {
        "$concatArrays": [ "$not_solved", "$solved" ]
      }
    }
  }
])
In the example I linked, I used $limit: 1 to make the output easier to see.
Also, if you want, you can add $unwind at the end of the aggregation to get the values at the top level, as in the second linked example.
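The shape of the result can be sketched in plain JavaScript (an in-memory emulation for Node, not MongoDB code): each $facet branch is an independent filter over the same input, and $concatArrays joins the two arrays with the unsolved documents first. The function name unsolvedPlusSolved and the parameter n are hypothetical.

```javascript
// Hypothetical emulation of the $facet + $project/$concatArrays pipeline.
function unsolvedPlusSolved(docs, n) {
  // "not_solved" branch: { $match: { is_solved: false } }
  const notSolved = docs.filter((d) => d.is_solved === false);
  // "solved" branch: { $match: { is_solved: true } } followed by { $limit: n }
  const solved = docs.filter((d) => d.is_solved === true).slice(0, n);
  // $concatArrays: one array, unsolved documents first.
  return { result: notSolved.concat(solved) };
}

const queries = [
  { _id: 1, is_solved: false },
  { _id: 2, is_solved: true },
  { _id: 3, is_solved: true },
  { _id: 4, is_solved: false },
];
console.log(unsolvedPlusSolved(queries, 1).result.map((d) => d._id));
```

In MongoDB itself, $limit without a preceding $sort does not specify which solved documents are kept, so add a $sort inside the "solved" branch if that matters. Also keep in mind that $facet returns everything as one document, which must stay under the 16 MB BSON document limit.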

If a $match inside an aggregation returns nothing, how do I run a different query?

I use match to select some documents from the collection, and then output all other documents except those found.
If match doesn't find any documents, then I need to display all available documents from the collection.
How can this be done?
Without an example I don't know if I've understood correctly, but you can try this aggregation query (or add these aggregation stages to your query).
The idea is to use $facet to create two branches:
First branch: match the value.
Second branch: get everything.
Then use $project to output one of these options using $cond and $size.
In the $project, if the array returned by the "exists" branch is empty (no results), the result is no_exists (i.e. all documents); otherwise it is the exists array.
db.collection.aggregate([
  {
    "$facet": {
      "exists": [
        {
          "$match": {
            // your match
          }
        }
      ],
      "no_exists": []
    }
  },
  {
    "$project": {
      "result": {
        "$cond": {
          "if": { "$eq": [ { "$size": "$exists" }, 0 ] },
          "then": "$no_exists",
          "else": "$exists"
        }
      }
    }
  }
])
In the linked examples, the first case has a matching value and outputs only that value; the second has no match and outputs the whole collection.
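The fallback logic amounts to "use the matches if there are any, otherwise use everything". A plain JavaScript sketch of the same decision (in-memory, runs in Node; matchOrEverything and its predicate argument are hypothetical stand-ins for the "// your match" placeholder):

```javascript
// Hypothetical emulation of the $facet + $cond/$size fallback.
function matchOrEverything(docs, predicate) {
  const exists = docs.filter(predicate); // "exists" branch: your $match
  const noExists = docs.slice();         // "no_exists" branch: every document
  // $cond: if $size of "exists" is 0, output "no_exists", else "exists".
  return { result: exists.length === 0 ? noExists : exists };
}

const items = [{ _id: 1, v: "a" }, { _id: 2, v: "b" }];
console.log(matchOrEverything(items, (d) => d.v === "a").result); // matching document only
console.log(matchOrEverything(items, (d) => d.v === "z").result); // whole collection
```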

MongoDB sort by value in embedded document array

I have a MongoDB collection of documents formatted as shown below:
{
  "_id" : ...,
  "username" : "foo",
  "challengeDetails" : [
    {
      "ID" : ...,
      "pb" : 30081,
    },
    {
      "ID" : ...,
      "pb" : 23995,
    },
    ...
  ]
}
How can I write a find query for records that have a challengeDetails document with a matching ID, and sort them by the corresponding pb?
I have tried (this is using the NodeJS driver, which is why the projection syntax is weird)
const result = await collection
  .find(
    { "challengeDetails.ID": challengeObjectID },
    {
      projection: { "challengeDetails.$": 1 },
      sort: { "challengeDetails.0.pb": 1 }
    }
  )
This returns the correct records (documents with challengeDetails for only the matching ID) but they're not sorted.
I think this doesn't work because as the docs say:
When the find() method includes a sort(), the find() method applies the sort() to order the matching documents before it applies the positional $ projection operator.
But they don't explain how to sort after projecting. How would I write a query to do this? (I have a feeling aggregation may be required but am not familiar enough with MongoDB to write that myself)
You need to use aggregation to sort an array:
$unwind to deconstruct the array
$match to match the value
$sort for sorting
$group to reconstruct the array
Here is the code
db.collection.aggregate([
  { "$unwind": "$challengeDetails" },
  { "$match": { "challengeDetails.ID": 2 } },
  { "$sort": { "challengeDetails.pb": 1 } },
  {
    "$group": {
      "_id": "$_id",
      "username": { "$first": "$username" },
      "challengeDetails": { "$push": "$challengeDetails" }
    }
  },
  // $group does not guarantee the order of its output documents, so sort again
  { "$sort": { "challengeDetails.pb": 1 } }
])
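The four stages can be mirrored in plain JavaScript to see what each one does (an in-memory sketch for Node, not MongoDB code; sortByMatchingPb is a hypothetical name). One detail the sketch makes explicit: MongoDB's $group does not guarantee the order of its output documents, so the final ordering here is done after grouping.

```javascript
// Hypothetical emulation of $unwind -> $match -> $sort -> $group (-> $sort).
function sortByMatchingPb(docs, id) {
  // $unwind: one row per challengeDetails element, keeping the parent fields.
  const rows = docs.flatMap((doc) =>
    (doc.challengeDetails || []).map((detail) => ({ ...doc, challengeDetails: detail }))
  );
  // $match on the element's ID, then $sort by the element's pb (ascending).
  const matched = rows
    .filter((r) => r.challengeDetails.ID === id)
    .sort((a, b) => a.challengeDetails.pb - b.challengeDetails.pb);
  // $group by _id: $first username, $push the matching elements.
  const groups = new Map();
  for (const r of matched) {
    if (!groups.has(r._id)) {
      groups.set(r._id, { _id: r._id, username: r.username, challengeDetails: [] });
    }
    groups.get(r._id).challengeDetails.push(r.challengeDetails);
  }
  // Order the grouped documents by their smallest matching pb.
  return [...groups.values()].sort(
    (a, b) => a.challengeDetails[0].pb - b.challengeDetails[0].pb
  );
}

const users = [
  { _id: "u1", username: "foo", challengeDetails: [{ ID: 1, pb: 5 }, { ID: 2, pb: 30081 }] },
  { _id: "u2", username: "bar", challengeDetails: [{ ID: 2, pb: 23995 }] },
];
console.log(sortByMatchingPb(users, 2).map((u) => u.username));
```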

How to filter array in a mongodb query

In mongodb, I have a collection that contains a single document that looks like the following:
{
  "_id" : ObjectId("5552b7fd9e8c7572e36e39df"),
  "StackSummaries" : [
    {
      "StackId" : "arn:aws:cloudformation:ap-southeast-2:406119630047:stack/XXXX-30fb22a-285-439ee279-c7c8d36/4ebd8770-f8f4-11e4-bf36-503f2370240f",
      "TemplateDescription" : "XXXX",
      "StackStatusReason" : "",
      "CreationTime" : "2015-05-12T22:14:50.535Z",
      "StackName" : "XXXX",
      "StackStatus" : "CREATE_COMPLETE"
    },
    {
      "TemplateDescription" : "XXXX",
      "StackStatusReason" : "",
      "CreationTime" : "2015-05-11T04:02:05.543Z",
      "StackName" : "XXXX",
      "StackStatus" : "DELETE_COMPLETE",
      "StackId" : "arn:aws:cloudformation:ap-southeast-2:406119630047:stack/XXXXX/7c8d04e0-f792-11e4-bb12-506726f15f9a"
    },
    { ... },
    { many others }
  ]
}
i.e. the imported results of the AWS CLI command aws cloudformation list-stacks
I'm trying to find the items of the StackSummaries array that have a StackStatus of CREATE_COMPLETE or UPDATE_COMPLETE. After much experimenting and reading other SO posts I arrived at the following:
db.cf_list_stacks.aggregate( {$match: {"StackSummaries.StackStatus": "CREATE_COMPLETE"}})
However this still returns the whole document (and I haven't even worried about UPDATE_COMPLETE).
I'm coming from an SQL background and struggling with simple queries like this. Any ideas on how to get the information I'm looking for?
SO posts I've looked at:
MongoDB query with elemMatch for nested array data
MongoDB: multiple $elemMatch
$projection vs $elemMatch
Make $elemMatch (projection) return all objects that match criteria
Update
Notes on things I learned while understanding this topic:
aggregate() is just a pipeline (like a Unix shell pipeline) where each $ operator is just another step. And like shell pipelines they can look complex, but you just build them up step by step until you get the results you want
Mongo has a great webinar: Exploring the Aggregation Framework
RoboMongo is a good tool (GPL3) for working with Mongo data and queries
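The "pipeline like a Unix shell pipeline" idea can be shown directly: if every stage is a function from an array of documents to an array of documents, the pipeline is just those functions applied in order. A minimal JavaScript sketch (runs in Node; unwind, match, project and runPipeline are hypothetical stand-ins, not the MongoDB driver API):

```javascript
// Apply each stage to the output of the previous one, like `a | b | c` in a shell.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

// Rough stand-ins for three aggregation stages.
const unwind = (field) => (docs) =>
  docs.flatMap((d) => {
    const v = d[field];
    // Like $unwind's default: missing/null/empty arrays produce no output.
    const values = Array.isArray(v) ? v : v == null ? [] : [v];
    return values.map((el) => ({ ...d, [field]: el }));
  });
const match = (predicate) => (docs) => docs.filter(predicate);
const project = (fn) => (docs) => docs.map(fn);

const doc = {
  _id: 1,
  StackSummaries: [
    { StackName: "a", StackStatus: "CREATE_COMPLETE" },
    { StackName: "b", StackStatus: "DELETE_COMPLETE" },
  ],
};
const out = runPipeline([doc], [
  unwind("StackSummaries"),
  match((d) => d.StackSummaries.StackStatus === "CREATE_COMPLETE"),
  project((d) => ({ _id: d._id, StackName: d.StackSummaries.StackName })),
]);
console.log(out); // [ { _id: 1, StackName: 'a' } ]
```

Building a pipeline then means adding one stage at a time and checking the intermediate output, exactly as described in the note above.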
If you only want the objects inside the StackSummaries array, you should use the $unwind stage to expand the array, filter the documents you want, and then project only the parts of the document that you actually need.
The query would look something like this:
db.cf_list_stacks.aggregate([
  { '$unwind' : '$StackSummaries' },
  { '$match' : { 'StackSummaries.StackStatus' : 'CREATE_COMPLETE' } },
  { '$project' : {
      'TemplateDescription' : '$StackSummaries.TemplateDescription',
      'StackStatusReason' : '$StackSummaries.StackStatusReason',
      ...
  } }
])
With MongoDB 3.4 and newer, you can leverage the $addFields and $filter operators with the aggregation framework to get the desired result.
Consider running the following pipeline:
db.cf_list_stacks.aggregate([
  {
    "$addFields": {
      "StackSummaries": {
        "$filter": {
          "input": "$StackSummaries",
          "as": "el",
          "cond": {
            "$in": [ "$$el.StackStatus", ["CREATE_COMPLETE", "UPDATE_COMPLETE"] ]
          }
        }
      }
    }
  }
]);
For MongoDB 3.2
db.cf_list_stacks.aggregate([
  {
    "$project": {
      "StackSummaries": {
        "$filter": {
          "input": "$StackSummaries",
          "as": "el",
          "cond": {
            "$or": [
              { "$eq": ["$$el.StackStatus", "CREATE_COMPLETE"] },
              { "$eq": ["$$el.StackStatus", "UPDATE_COMPLETE"] }
            ]
          }
        }
      }
    }
  }
]);
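What $filter does to the array can be sketched in plain JavaScript (in-memory, runs in Node; filterStackSummaries is a hypothetical name). The "$in" condition and the "$or" of two "$eq" conditions keep exactly the same elements; $in is only needed as an aggregation expression from 3.4 on. Note that the $addFields version keeps the document's other fields, while the $project version outputs only _id and StackSummaries.

```javascript
const WANTED = ["CREATE_COMPLETE", "UPDATE_COMPLETE"];

// Hypothetical emulation of { $filter: { input, as: "el", cond } } on one document.
function filterStackSummaries(doc, cond) {
  const input = doc.StackSummaries;
  // $filter returns null when the input field is missing or null.
  const filtered = input == null ? null : input.filter(cond);
  return { ...doc, StackSummaries: filtered }; // $addFields-style: other fields kept
}

// cond written as $in ...
const inCond = (el) => WANTED.includes(el.StackStatus);
// ... and as $or of two $eq checks (the 3.2 form).
const orCond = (el) => el.StackStatus === "CREATE_COMPLETE" || el.StackStatus === "UPDATE_COMPLETE";

const stacks = {
  _id: 1,
  StackSummaries: [
    { StackName: "a", StackStatus: "CREATE_COMPLETE" },
    { StackName: "b", StackStatus: "DELETE_COMPLETE" },
    { StackName: "c", StackStatus: "UPDATE_COMPLETE" },
  ],
};
console.log(filterStackSummaries(stacks, inCond).StackSummaries.map((s) => s.StackName));
console.log(filterStackSummaries(stacks, orCond).StackSummaries.map((s) => s.StackName));
```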
For MongoDB 3.0 and below
db.cf_list_stacks.aggregate([
  { "$unwind": "$StackSummaries" },
  {
    "$match": {
      "StackSummaries.StackStatus": { "$in": ["CREATE_COMPLETE", "UPDATE_COMPLETE"] }
    }
  },
  {
    "$group": {
      "_id": "$_id",
      "StackSummaries": { "$addToSet": "$StackSummaries" }
    }
  }
])
The above pipeline has the $unwind operator which deconstructs the StackSummaries array field from the input documents to output a document for each element. Each output document replaces the array with an element value.
Further filtering is required after the $unwind to keep only the documents that meet the given criteria, so a $match stage follows.
To rebuild the original array field after the $unwind, you need to group the documents with the $group stage; within the group you can then use the $addToSet accumulator to push the matching elements back into an array.
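The $unwind / $match / $group round trip can be sketched in plain JavaScript to show what $addToSet changes compared with $push (in-memory, runs in Node; unwindMatchGroup and its dedupe flag are hypothetical). Comparing elements with JSON.stringify is only an approximation of how MongoDB compares documents by value.

```javascript
// Hypothetical emulation of $unwind -> $match -> $group with $addToSet or $push.
function unwindMatchGroup(docs, statuses, dedupe) {
  const groups = new Map();
  for (const doc of docs) {
    for (const el of doc.StackSummaries || []) {         // $unwind
      if (!statuses.includes(el.StackStatus)) continue;   // $match with $in
      if (!groups.has(doc._id)) groups.set(doc._id, { _id: doc._id, StackSummaries: [] });
      const arr = groups.get(doc._id).StackSummaries;
      // $addToSet skips an element equal to one already collected; $push keeps it.
      if (dedupe && arr.some((x) => JSON.stringify(x) === JSON.stringify(el))) continue;
      arr.push(el);
    }
  }
  return [...groups.values()];
}

const dup = { StackName: "a", StackStatus: "CREATE_COMPLETE" };
const input = [{ _id: 1, StackSummaries: [dup, { ...dup }, { StackName: "b", StackStatus: "DELETE_COMPLETE" }] }];
console.log(unwindMatchGroup(input, ["CREATE_COMPLETE", "UPDATE_COMPLETE"], true)[0].StackSummaries.length);  // 1
console.log(unwindMatchGroup(input, ["CREATE_COMPLETE", "UPDATE_COMPLETE"], false)[0].StackSummaries.length); // 2
```

If duplicates or element order matter, $push is the safer accumulator here, since $addToSet drops equal elements and does not specify the order of the resulting array. Also note that with this approach a document with no matching element disappears from the output entirely, whereas the $filter versions keep it with an empty array.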
Given that you are trying to find the items of the StackSummaries array that have a StackStatus of CREATE_COMPLETE or UPDATE_COMPLETE, you could also try an $elemMatch projection, but an $elemMatch projection returns only the first array element that matches, so it cannot return all of the CREATE_COMPLETE and UPDATE_COMPLETE items at this time. There is a JIRA issue for this. The query would look like this:
db.cf_list_stacks.find(
  {
    "StackSummaries.StackStatus": {
      "$in": ["CREATE_COMPLETE", "UPDATE_COMPLETE"]
    }
  },
  {
    "StackSummaries": {
      "$elemMatch": {
        "StackStatus": {
          "$in": ["CREATE_COMPLETE", "UPDATE_COMPLETE"]
        }
      }
    }
  }
)
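This will only give you the first matching element of StackSummaries in each document (with the sample data above, the CREATE_COMPLETE stack), not every element with one of those statuses.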
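The first-match behaviour of the $elemMatch projection can be emulated in a few lines of plain JavaScript (in-memory sketch for Node; elemMatchProjection is a hypothetical name), which shows why it cannot return both CREATE_COMPLETE and UPDATE_COMPLETE items at once.

```javascript
// Hypothetical emulation of the projection { StackSummaries: { $elemMatch: ... } }.
function elemMatchProjection(doc, predicate) {
  // Only the FIRST matching element is kept, wrapped in a one-element array.
  const first = (doc.StackSummaries || []).find(predicate);
  // With no match, the field is left out and only _id is returned.
  return first === undefined ? { _id: doc._id } : { _id: doc._id, StackSummaries: [first] };
}

const stackDoc = {
  _id: 1,
  StackSummaries: [
    { StackName: "a", StackStatus: "CREATE_COMPLETE" },
    { StackName: "b", StackStatus: "UPDATE_COMPLETE" },
  ],
};
const wanted = (el) => ["CREATE_COMPLETE", "UPDATE_COMPLETE"].includes(el.StackStatus);
console.log(elemMatchProjection(stackDoc, wanted)); // only stack "a", although "b" also matches
```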

MongoDB Aggregation - Possible to have an OR operator in a group function?

I need to perform an aggregate on a mongodb collection. The issue I'm having is that I need to do an OR operation in a group function.
My documents have different keys for values where ideally they should be the same. I'm stuck with these values so I cannot just normalise the data.
For example, the collection uses three different field names for the email field:
{
  _id : 1,
  name : "docType1",
  e_mail : "joe#example.com"
}
{
  _id : 2,
  name : "docType2",
  email : "joe#example.com"
}
{
  _id : 3,
  name : "docType3",
  mail : "joe#example.com"
}
Is it possible to perform a group function that groups the documents by the email values once the key is either e_mail, email or mail?
Perhaps it's possible in combination with a $project, but I've been unsuccessful in my attempts so far, as I think I would also need an OR there?
Yes, it is possible with the aggregation framework. You basically need to test with the $ifNull operator and nest the next field to try as its second argument:
db.collection.aggregate([
  { "$group": {
    "_id": { "$ifNull": [
      "$e_mail",
      { "$ifNull": [
        "$email",
        { "$ifNull": [
          "$mail",
          false
        ]}
      ]}
    ]},
    "count": { "$sum": 1 }
  }},
  { "$match": { "_id": { "$ne": false } } }
])
So $ifNull takes two arguments: the first is the expression to test, which is returned if it exists and is not null; the second is the alternate value to return otherwise. The innermost false is a sentinel for documents that have none of the three fields, and the final $match removes that group.
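The nested $ifNull works as a coalesce: take the first of e_mail, email and mail that is present and not null, else the false sentinel. A plain JavaScript sketch of the same grouping (in-memory, runs in Node; ifNull and groupByAnyEmail are hypothetical helpers, with a missing field represented as undefined):

```javascript
// $ifNull-like helper: missing (undefined) and null both fall through to the replacement.
const ifNull = (value, replacement) => (value === undefined || value === null ? replacement : value);

// Hypothetical emulation of the $group on the coalesced key plus the final $match.
function groupByAnyEmail(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const key = ifNull(doc.e_mail, ifNull(doc.email, ifNull(doc.mail, false)));
    counts.set(key, (counts.get(key) || 0) + 1); // "count": { "$sum": 1 }
  }
  // { $match: { _id: { $ne: false } } } drops documents with none of the fields.
  return [...counts]
    .filter(([key]) => key !== false)
    .map(([key, count]) => ({ _id: key, count }));
}

const docs = [
  { _id: 1, name: "docType1", e_mail: "joe#example.com" },
  { _id: 2, name: "docType2", email: "joe#example.com" },
  { _id: 3, name: "docType3", mail: "joe#example.com" },
  { _id: 4, name: "noEmail" },
];
console.log(groupByAnyEmail(docs)); // [ { _id: 'joe#example.com', count: 3 } ]
```

On MongoDB 5.0 and newer, $ifNull also accepts more than two arguments (the last one being the replacement value), so the nesting can be flattened to { "$ifNull": [ "$e_mail", "$email", "$mail", false ] }.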