MongoDB aggregation pipeline: convert string to document/object

I have a field of type String that contains the string representation of an object/document:
" {"a":35,b:[1,2,3,4]}"
I know it is a strange construct, but I can't change it.
My goal is to extract, for example, the value of "a".
Because the documents represented by the string are nested and repeated, a regex doesn't fit.
So how can I convert this String to an object in a MongoDB aggregation/query so that I can process it in a following aggregation step?
(I could extract the string with Python, build a dict and extract the information there, but I'd like to stay inside the aggregation pipeline for better performance.)

In MongoDB 4.4 this works:
db.target.aggregate([{
  $project: {
    X: "$AD_GRAPHIC",
    Y: {
      $function: {
        // Parse the JSON string stored in AD_GRAPHIC into an embedded document
        body: function(jsonString) {
          return JSON.parse(jsonString)
        },
        args: [ "$AD_GRAPHIC" ],
        lang: "js"
      }
    }
  }
}])
Basically, use the $function operator to invoke the JavaScript JSON parser (this assumes server-side JavaScript is enabled).
Results
{ "_id" : ObjectId("60093dc8f2c829000e38a8d0"), "X" : "{\"alias\":\"MEDIA_DIR\",\"path\":\"modem.jpg\"}", "Y" : { "alias" : "MEDIA_DIR", "path" : "modem.jpg" } }
{ "_id" : ObjectId("60093dc8f2c829000e38a8d1"), "X" : "{\"alias\":\"MEDIA_DIR\",\"path\":\"monitor.jpg\"}", "Y" : { "alias" : "MEDIA_DIR", "path" : "monitor.jpg" } }
{ "_id" : ObjectId("60093dc8f2c829000e38a8d2"), "X" : "{\"alias\":\"MEDIA_DIR\",\"path\":\"mousepad.jpg\"}", "Y" : { "alias" : "MEDIA_DIR", "path" : "mousepad.jpg" } }
{ "_id" : ObjectId("60093dc8f2c829000e38a8d3"), "X" : "{\"alias\":\"MEDIA_DIR\",\"path\":\"keyboard.jpg\"}", "Y" : { "alias" : "MEDIA_DIR", "path" : "keyboard.jpg" } }
>
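One caveat worth illustrating: the sample string in the question, {"a":35,b:[1,2,3,4]}, has an unquoted key, so it is not strict JSON and JSON.parse would throw on it. Below is a minimal sketch of a more defensive function body, assuming you would rather get null than have the function throw. The name parseOrNull is hypothetical, and the function is shown in plain JavaScript so it can be tried outside the server.

```javascript
// Hypothetical helper: a defensive variant of the body passed to $function.
// Returns the parsed value, or null when the input is not valid JSON.
function parseOrNull(jsonString) {
  try {
    return JSON.parse(jsonString);
  } catch (e) {
    return null;
  }
}

// Strict JSON (all keys quoted) parses as expected.
const strict = parseOrNull('{"a":35,"b":[1,2,3,4]}');
// The string from the question has an unquoted key and is rejected.
const loose = parseOrNull('{"a":35,b:[1,2,3,4]}');

console.log(strict.a);  // 35
console.log(loose);     // null
```

The same function could presumably be used as the body of $function in place of the bare JSON.parse call, so that malformed rows end up with a null Y instead of raising an error.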

There's no native aggregation operator in the MongoDB engine that parses a blob of JSON from a field. I'd recommend doing it client-side in your language of choice and, if required, saving the result back.
Alternatively, if your data is too big and you still need to aggregate it, you could use a regex to project the required fields out of the JSON and then use them later to filter, etc.
For example, if we insert the following document:
> db.test.insertOne({ name: 'test', blob: '{"a":35,b:[1,2,3,4]}' })
{
"acknowledged" : true,
"insertedId" : ObjectId("5ed9fe21b5d91941c9e85cdb")
}
We can then just project out the array with some regex:
db.test.aggregate([
{ $addFields: { b: { $regexFind: { input: "$blob", regex: /\[(((\d+,*))+)\]/ } } } },
{ $addFields: { b: { $split: [ { $arrayElemAt: [ "$b.captures", 0 ] }, "," ] } } }
]);
{
"_id" : ObjectId("5ed9fe21b5d91941c9e85cdb"),
"name" : "test",
"blob" : "{\"a\":35,b:[1,2,3,4]}",
"b" : [
"1",
"2",
"3",
"4"
]
}
This means we can then filter, sort, or apply any other aggregation stage to the extracted values (note that they are strings, as the output shows).
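To see what the two stages above do, here is a rough plain-JavaScript equivalent of the regex and the split, run against the same blob. It is only a sketch for experimenting with the pattern. The final numeric conversion is an addition not present in the pipeline above; inside the pipeline, a $map with $toInt would be one way to get the same effect.

```javascript
// The blob from the example document above.
const blob = '{"a":35,b:[1,2,3,4]}';

// Same pattern as in the $regexFind stage; capture group 1 holds "1,2,3,4".
const match = blob.match(/\[(((\d+,*))+)\]/);

// Equivalent of $arrayElemAt [captures, 0] followed by $split on ",".
const asStrings = match[1].split(",");

// The split yields strings; convert explicitly when numbers are needed.
const asNumbers = asStrings.map(Number);

console.log(asStrings);
console.log(asNumbers);
```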

You could just use JSON.parse() in the shell, provided the string is valid JSON (keys must be quoted).
For example:
db.getCollection('system').find({
a: JSON.parse('{"a":35,"b":[1,2,3,4]}').a
})

Related

Insert new fields to document at given array index in MongoDB

I have the following document structure in a MongoDB collection :
{
"A" : [ {
"B" : [ { ... } ]
} ]
}
I'd like to update this to :
{
"A" : [ {
"B" : [ { ... } ],
"x" : [],
"y" : { ... }
} ]
}
In other words, I want the "x" and "y" fields to be added to the first element of the "A" array without losing "B".
OK, as there is only one object in the A array, you could simply do as below:
Sample Collection Data:
{
"_id" : ObjectId("5e7c3cadc16b5679b4aeec26"),
A:[
{
B: [{ abc: 1 }]
}
]
}
Query :
/** Insert new fields into 'A' array's first object by index 0 */
db.collection.updateOne(
{ "_id" : ObjectId("5e7c3cadc16b5679b4aeec26") },
{ $set: { "A.0.x": [] , "A.0.y" : {abcInY :1 }} }
)
Output :
{
"_id" : ObjectId("5e7c3cadc16b5679b4aeec26"),
"A" : [
{
"B" : [
{
"abc" : 1
}
],
"x" : [],
"y" : {
"abcInY" : 1.0
}
}
]
}
Or using the positional operator $:
db.collection.updateOne(
{ _id: ObjectId("5e7c3cadc16b5679b4aeec26") , 'A': {$exists : true}},
{ $set: { "A.$.x": [] , "A.$.y" : {abcInY :1 }} }
)
Note: The result will be the same, but the two queries behave differently. With the positional operator, fields x and y are inserted into the first object of the A array only when the A field exists in the document; otherwise the query inserts nothing (optionally, you can also check that A is an array). With index 0, as in the first query, if A doesn't exist in the document the update creates A as an object and inserts the fields inside it, which can cause data inconsistency across documents with two types of A field. Below is the result of the first query when A doesn't exist:
{
"_id" : ObjectId("5e7c3f77c16b5679b4af4caf"),
"noA" : 1,
"A" : {
"0" : {
"x" : [],
"y" : {
"abcInY" : 1.0
}
}
}
}
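The difference described in the note can be mimicked with a small plain-JavaScript model of a dotted-path write. This is a simplified illustration under the assumption that a missing parent is always created as an object. It is not MongoDB's implementation, and setPath is a hypothetical helper.

```javascript
// Simplified model (an assumption, not MongoDB's code) of how $set walks a dotted path.
function setPath(doc, path, value) {
  const keys = path.split(".");
  let node = doc;
  for (const key of keys.slice(0, -1)) {
    // A missing parent is created as a plain object, even for a numeric key like "0".
    if (node[key] === undefined) node[key] = {};
    node = node[key];
  }
  node[keys[keys.length - 1]] = value;
  return doc;
}

// A exists and is an array: "0" addresses its first element, B is preserved.
const withA = setPath({ A: [{ B: [{ abc: 1 }] }] }, "A.0.x", []);
// A is missing: it is created as an object with a key named "0".
const withoutA = setPath({ noA: 1 }, "A.0.x", []);

console.log(JSON.stringify(withA));     // {"A":[{"B":[{"abc":1}],"x":[]}]}
console.log(JSON.stringify(withoutA));  // {"noA":1,"A":{"0":{"x":[]}}}
```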
@whoami Thanks for the suggestion, I think your first solution should work. However, I think I was able to get another solution to this, though I'm not sure if it's better or worse (performance-wise?) than what you have here. My solution is:
db.coll.update( { "_id" : ObjectId("5e7c4eb3a74cce7fd94a3fe7") }, [ { "$addFields" : { "A" : { "x" : [ 1, 2, 3 ], "y" : { "abc" } } } } ] )
The issue with this is that if "A" has more than one array entry, this will update all elements under "A", which is not something I want. Just out of curiosity, is there a way of limiting this solution to only the first entry in "A"?

How to find special characters in a particular attribute of json data stored in mongodb using mongo query

How can I search for special characters in a particular nested JSON object field? I have a field which stores nested JSON data.
I need to write a MongoDB query to fetch all the names which contain special characters.
Student collection:
Example:
{
_id: 123,
student: {
"personalinfo": {
"infoid": "YYY21",
"name": "test##!*"
}
}
}
I have tried a few regular expressions, but I am not sure how to loop over array elements.
I expect it to print the infoid and name of the entries which have special characters in the name field.
The following query can do the trick:
db.collection.distinct("student.personalinfo.name",{"student.personalinfo.name": { $not: /^[\w]+[\w ]*$/ } })
Data set:
{
"_id" : ObjectId("5d77a5babd4e75c58d59821d"),
"student" : {
"personalinfo" : {
"infoid" : "YYY21",
"name" : "test##!*"
}
}
}
{
"_id" : ObjectId("5d77a5babd4e75c58d59821e"),
"student" : {
"personalinfo" : {
"infoid" : "YYY21",
"name" : "Bruce##"
}
}
}
{
"_id" : ObjectId("5d77a5babd4e75c58d59821f"),
"student" : {
"personalinfo" : {
"infoid" : "YYY21",
"name" : "Tony"
}
}
}
{
"_id" : ObjectId("5d77a5babd4e75c58d598220"),
"student" : {
"personalinfo" : {
"infoid" : "YYY21",
"name" : "Natasha"
}
}
}
Output:
[ "test##!*", "Bruce##" ]
Try using the regex below; it matches all Unicode punctuation and symbols.
db.collection.find({'student.personalinfo.name': {$regex: "[\\p{P}\\p{S}]"}})
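Both patterns can be tried client-side before putting them in a query. The sketch below runs them over the names from the data set above in plain JavaScript, and also shows an escaping pitfall when a \p{...} class is written inside a string literal. JavaScript's regex engine is not the one the server uses, so treat this as an approximation.

```javascript
const names = ["test##!*", "Bruce##", "Tony", "Natasha"];

// Same pattern as in the distinct() query: only word characters and spaces allowed.
const plain = /^[\w]+[\w ]*$/;
const flagged = names.filter((name) => !plain.test(name));

// Unicode punctuation/symbol classes; JavaScript needs the "u" flag for \p{...}.
const punctOrSymbol = /[\p{P}\p{S}]/u;
const flaggedUnicode = names.filter((name) => punctOrSymbol.test(name));

// Escaping pitfall: in a string literal "\p" collapses to "p" unless the backslash is doubled.
const single = "[\p{P}]";
const doubled = "[\\p{P}]";

console.log(flagged);         // [ 'test##!*', 'Bruce##' ]
console.log(flaggedUnicode);  // [ 'test##!*', 'Bruce##' ]
console.log(single, doubled); // [p{P}] [\p{P}]
```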

Update a json document in a mongodb database

I'm trying to update an existing document in a MongoDb. There are many explanations how to do this if you want to update or add key/value pairs on the first level. But in my use-case, I need to create with the first updateOne (with upsert option set) a document with the following structure:
{
"_id" : "1234",
"raw" : {
"meas" : {
"meas1" : {
"data" : "blabla"
}
}
}
}
In the second command, I need to add - in the same document - a "meas2" field at the level of "meas1". My desired output is:
{
"_id" : "1234",
"raw" : {
"meas" : {
"meas1" : {
"data" : "blabla"
},
"meas2" : {
"data" : "foo"
}
}
}
}
I played with statements like
updateOne({"_id":"1234"},{$set:{"raw":{"meas":{"meas2":{"data":"foo"}}}}}, {"upsert":true})
and also with $push, both variants with insert (here only the document) and also with insertOne, but nothing produces the desired output. Is there a MongoDB expert who could give a hint? I'm sure this functionality exists. Thanks in advance!
When you update with {$set: {"raw":{"meas":{"meas2":{"data":"foo"}}}}} you're not adding "meas2" to "meas"; you're overwriting "raw" completely.
To change or add a single field in an embedded document, refer to it with dot notation.
The command you want is updateOne({"_id": "1234"}, {$set: {"raw.meas.meas2": { "data" : "foo" }}}, {"upsert": true})
You need to understand the concept below first:
Set Fields in Embedded Documents; for details, check the official MongoDB documentation.
For your problem, just look at the execution below in the mongo shell:
> db.st4.insert({
... "_id" : "1234",
... "raw" : {
... "meas" : {
... "meas1" : {
... "data" : "blabla"
... }
... }
... }
... })
WriteResult({ "nInserted" : 1 })
> db.st4.find()
{ "_id" : "1234", "raw" : { "meas" : { "meas1" : { "data" : "blabla" } } } }
>
> // The query below replaces the whole raw document with {"meas":{"meas2":{"data":"foo"}}}; it does not add to it
> //db.st4.updateOne({"_id":"1234"},{$set:{"raw":{"meas":{"meas2":{"data":"foo"}}}}}, {"upsert":true})
> // With dot notation you write the value at the path raw.meas.meas2, i.e. you add or replace only the meas2 document.
> db.st4.updateOne({"_id":"1234"},{$set: {"raw.meas.meas2": { "data" : "foo" }}}, {"upsert":true})
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }
> db.st4.find().pretty()
{
"_id" : "1234",
"raw" : {
"meas" : {
"meas1" : {
"data" : "blabla"
},
"meas2" : {
"data" : "foo"
}
}
}
}
>
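When the measurement name is only known at run time, the dotted key can be built with a computed property name. The sketch below assumes the document layout from the question. buildMeasUpdate is a hypothetical helper, and the updateOne call is shown only as a comment because it needs a live collection.

```javascript
// Hypothetical helper: builds the $set document for one measurement.
function buildMeasUpdate(measName, data) {
  // Computed key, e.g. "raw.meas.meas2"; only this path is written, siblings are kept.
  return { $set: { [`raw.meas.${measName}`]: { data: data } } };
}

const update = buildMeasUpdate("meas2", "foo");
console.log(JSON.stringify(update));  // {"$set":{"raw.meas.meas2":{"data":"foo"}}}

// Intended use in the shell (not executed here):
// db.st4.updateOne({ _id: "1234" }, update, { upsert: true })
```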

Retrieving value of an embedded object in mongo

Followup Question
Thanks @4J41 for your spot-on resolution. Along the same lines, I'd also like to validate one other thing.
I have a mongo document that contains an array of strings, and I need to convert this array of strings into an array of objects, each containing a key-value pair. Below is my current approach.
Mongo Record:
Same mongo record in my initial question below.
Current Query:
templateAttributes.find({platform:"V1"}).map(function(c){
//instantiate a new array
var optionsArray = [];
for (var i=0;i< c['available']['Community']['attributes']['type']['values'].length; i++){
optionsArray[i] = {}; // creates a new object
optionsArray[i].label = c['available']['Community']['attributes']['type']['values'][i];
optionsArray[i].value = c['available']['Community']['attributes']['type']['values'][i];
}
return optionsArray;
})[0];
Result:
[{label:"well-known", value:"well-known"},
{label:"simple", value:"simple"},
{label:"complex", value:"complex"}]
Is my approach efficient enough, or is there a way to optimize the above query to get the same desired result?
Initial Question
I have a mongo document like below:
{
"_id" : ObjectId("57e3720836e36f63695a2ef2"),
"platform" : "A1",
"available" : {
"Community" : {
"attributes" : {
"type" : {
"values" : [
"well-known",
"simple",
"complex"
],
"defaultValue" : "well-known"
},
[......]
}
I'm trying to query the DB and retrieve only the value of defaultValue field.
I tried:
db.templateAttributes.find(
{ platform: "A1" },
{ "available.Community.attributes.type.defaultValue": 1 }
)
as well as
db.templateAttributes.findOne(
{ platform: "A1" },
{ "available.Community.attributes.type.defaultValue": 1 }
)
But they both seem to retrieve the entire object hierarchy, like below:
{
"_id" : ObjectId("57e3720836e36f63695a2ef2"),
"available" : {
"Community" : {
"attributes" : {
"type" : {
"defaultValue" : "well-known"
}
}
}
}
}
The only way I could get it to work was with find and a map function, but that seems a bit convoluted.
Does anyone have a simpler way to get this result?
db.templateAttributes.find(
{ platform: "A1" },
{ "available.Community.attributes.type.defaultValue": 1 }
).map(function(c){
return c['available']['Community']['attributes']['type']['defaultValue']
})[0]
Output
well-known
You could try the following.
Using find:
db.templateAttributes.find({ platform: "A1" }, { "available.Community.attributes.type.defaultValue": 1 }).toArray()[0]['available']['Community']['attributes']['type']['defaultValue']
Using findOne:
db.templateAttributes.findOne({ platform: "A1" }, { "available.Community.attributes.type.defaultValue": 1 })['available']['Community']['attributes']['type']['defaultValue']
Using aggregation:
db.templateAttributes.aggregate([
{"$match":{platform:"A1"}},
{"$project": {_id:0, default:"$available.Community.attributes.type.defaultValue"}}
]).toArray()[0].default
Output:
well-known
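The chained bracket access in the find and findOne variants throws if no document matches or if one level of the hierarchy is missing. As a sketch only, a small path-walking helper can make that access tolerant; getPath is a hypothetical name, and the sample document is a trimmed copy of the one in the question.

```javascript
// Hypothetical helper: walks a dotted path and returns undefined if any level is missing.
function getPath(doc, path) {
  return path.split(".").reduce(
    (node, key) => (node == null ? undefined : node[key]),
    doc
  );
}

// Trimmed copy of the document from the question.
const sample = {
  platform: "A1",
  available: {
    Community: {
      attributes: {
        type: { values: ["well-known", "simple", "complex"], defaultValue: "well-known" }
      }
    }
  }
};

const found = getPath(sample, "available.Community.attributes.type.defaultValue");
const missing = getPath({}, "available.Community.attributes.type.defaultValue");

console.log(found);    // well-known
console.log(missing);  // undefined
```

With findOne, which returns null when nothing matches, the helper would also return undefined instead of raising a TypeError.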
Edit: Answering the updated question: Please use aggregation here.
db.templateAttributes.aggregate([
{"$match":{platform:"A1"}}, {"$unwind": "$available.Community.attributes.type.values"},
{$group: {"_id": null, "val":{"$push":{label:"$available.Community.attributes.type.values",
value:"$available.Community.attributes.type.values"}}}}
]).toArray()[0].val
Output:
[
{
"label" : "well-known",
"value" : "well-known"
},
{
"label" : "simple",
"value" : "simple"
},
{
"label" : "complex",
"value" : "complex"
}
]
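For comparison, the loop from the follow-up question can be written more compactly on the client with Array.prototype.map. This is only a sketch of the client-side transformation on the values array; on the server, the $map aggregation operator could play a similar role without the $unwind and $group round trip, although that variant is not shown or tested here.

```javascript
// The values array from the document in the question.
const values = ["well-known", "simple", "complex"];

// Each string becomes an object that uses it as both label and value.
const options = values.map((v) => ({ label: v, value: v }));

console.log(options);
```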

get a mongo document based in two different values

I have the following document structure
{
"_id" : "aaa0001",
"path" : "/some/path",
"information" : {
"name" : "info"
},
"colors" : {
"colors" : [
{
"key" : "AAAA001",
"name" : "White"
},
{
"key" : "BBBB002",
"name" : "Black"
}
]
}
}
The idea is that I have to return the document by the color key. I have two parameters, the "path" and the "color", so I was trying something like this:
db.components.find(
{$and:[
{"path" : "/some/path"},
{"colors":{"colors" : {$elemMatch: { "key" : "AAAA001" } } } }
]})
I'm getting the following message "Script is executed successfully, but there is no results to show".
Can anyone give me some directions regarding this?
thanks
Use the following query:
db.components.find({
"path": "/some/path",
"colors.colors.key" : "AAAA001"
})
MongoDB expects the query document to contain field-value pairs { <field>: <value> }. So, in your example, you're querying for a document whose colors field is exactly equal to:
{"colors" : {$elemMatch: { "key" : "AAAA001" } } }
As for $and and $elemMatch operators, you don't need them in such a simple query.
For more information read Query Documents.
Update
You can also select only matching subdocument from colors array using Positional Operator $:
db.components.find({
"path": "/some/path",
"colors.colors.key" : "AAAA001"
}, {
_id: 0,
"colors.colors.$": 1
})
Note, though, that the projection does not change the structure of your documents, so you get:
{ "colors" : { "colors" : [ { "key" : "AAAA001", "name" : "White" } ] } }
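To make the matching rule concrete, here is a client-side approximation in plain JavaScript of what the dotted-path query and the positional projection select. It is a sketch, not how the server evaluates queries, and the function names are hypothetical. If $elemMatch is preferred, it belongs on the dotted path, as in { "colors.colors": { $elemMatch: { "key": "AAAA001" } } }.

```javascript
// The document from the question.
const component = {
  _id: "aaa0001",
  path: "/some/path",
  information: { name: "info" },
  colors: {
    colors: [
      { key: "AAAA001", name: "White" },
      { key: "BBBB002", name: "Black" }
    ]
  }
};

// Approximates { path: ..., "colors.colors.key": ... }:
// the dotted path matches if any element of the array has that key.
function matchesComponent(doc, path, colorKey) {
  return doc.path === path && doc.colors.colors.some((c) => c.key === colorKey);
}

// Approximates the "colors.colors.$" projection: the first matching element only.
function firstMatchingColor(doc, colorKey) {
  return doc.colors.colors.find((c) => c.key === colorKey);
}

console.log(matchesComponent(component, "/some/path", "AAAA001"));  // true
console.log(matchesComponent(component, "/some/path", "CCCC003"));  // false
console.log(firstMatchingColor(component, "AAAA001").name);         // White
```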