Remove HTML Tags MongoDB - mongodb

I am writing a query to extract customer descriptions in MongoDB. Unfortunately, the description is stored as HTML. Is there a way to either replace every HTML tag with " " or remove the tags entirely?
Below is a sample document:
{
"_id" : ObjectId("61f72aefdc85500a8baa6bb8"),
"CustomerPin" : "22010871",
"CustomerName" : "TestLastName, TestFirstName",
"Age" : 39.0,
"Gender" : "Male",
"Description" : "<p><span>This will be a test description</span><br/></p>",
}
The output should have the "p", "span", and "br" tags removed. Is there a function in MongoDB that removes them all at once, without repeating $project?
This is the expected output:
{
"_id" : ObjectId("61f72aefdc85500a8baa6bb8"),
"CustomerPin" : "22010871",
"CustomerName" : "TestLastName, TestFirstName",
"Age" : 39.0,
"Gender" : "Male",
"Description" : "This will be a test description",
}
Thanks!

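One way to do it is to strip all tags with a regex in a pre hook of the save method, before the document is written. Note that replace() returns a new string, so the result has to be assigned back:
Description = Description.replace(/(<([^>]+)>)/gi, "");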
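As a minimal sketch, here is that regex applied to the sample document's Description in plain JavaScript, outside of any hook. The stripTags helper name is hypothetical, and a regex like this only handles simple markup (it does not cope with comments or a ">" inside an attribute value).

```javascript
// Hypothetical helper: remove anything that looks like an HTML tag.
function stripTags(html) {
  return html.replace(/<[^>]+>/g, "");
}

const description = "<p><span>This will be a test description</span><br/></p>";
console.log(stripTags(description)); // This will be a test description
```

To replace each tag with a space instead, pass " " as the second argument to replace().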

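If you are on MongoDB 4.2 or later, you can do this in an aggregation pipeline, provided you have a regex that extracts the text content from the HTML. Below is such a pipeline and regex. Note that $regexFind returns only the first match, so this extracts the first run of text between tags.
db.getCollection("name_of_your_collection").aggregate([
  {
    // Capture the first run of characters that is not inside a tag
    $set: {
      contentRegex: {
        $regexFind: { input: "$Description", regex: /([^<>]+)(?!([^<]+)?>)/i }
      }
    }
  },
  {
    // Fall back to the original value when nothing matched
    $set: {
      content: { $ifNull: ["$contentRegex.match", "$Description"] }
    }
  },
  {
    // Drop the temporary field
    $unset: [ "contentRegex" ]
  }
])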
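To see what that regex picks up, here is a small plain-JavaScript sketch (not MongoDB code; firstText and allText are hypothetical names). A first-match lookup, which is how $regexFind behaves, returns only the first run of text, so markup with several text segments would need every match collected, for example with $regexFindAll.

```javascript
// Same pattern as in the pipeline above: a run of text that is not inside a tag.
const textRun = /([^<>]+)(?!([^<]+)?>)/;

// First match only, with a fallback to the input (like the $ifNull stage).
function firstText(html) {
  const m = html.match(textRun);
  return m ? m[0] : html;
}

// All matches joined together (illustrative alternative).
function allText(html) {
  const globalRun = new RegExp(textRun.source, "g");
  return Array.from(html.matchAll(globalRun), (m) => m[0]).join("");
}

console.log(firstText("<p><span>This will be a test description</span><br/></p>"));
console.log(firstText("<p>Hello <b>world</b></p>")); // only the first segment
console.log(allText("<p>Hello <b>world</b></p>"));   // every segment, joined
```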

Related

Mongo DB aggregation pipeline: convert string to document/object

I have a field of type "String" that contains the string representation of an object/document:
" {"a":35,b:[1,2,3,4]}"
I know it is a strange construct, but I can't change it.
My goal is to extract, for example, the value of "a".
Because the documents represented by the string are nested and repeated, a regex doesn't fit.
So how can I convert this string to an object in a MongoDB aggregation/query, so that I can process it in a following aggregation step?
(I could extract the string with Python, build a dict and extract the info, but I'd like to stay inside the aggregation pipeline for better performance.)
In MongoDB 4.4 this works:
db.target.aggregate([{$project: {
X: "$AD_GRAPHIC",
Y : {
$function : {
body: function(jsonString) {
return JSON.parse(jsonString)
},
args: [ "$AD_GRAPHIC"],
lang: "js"
}
}
}
}])
Basically, use the $function operator to invoke the JSON parser (this assumes server-side JavaScript is enabled).
Results:
{ "_id" : ObjectId("60093dc8f2c829000e38a8d0"), "X" : "{\"alias\":\"MEDIA_DIR\",\"path\":\"modem.jpg\"}", "Y" : { "alias" : "MEDIA_DIR", "path" : "modem.jpg" } }
{ "_id" : ObjectId("60093dc8f2c829000e38a8d1"), "X" : "{\"alias\":\"MEDIA_DIR\",\"path\":\"monitor.jpg\"}", "Y" : { "alias" : "MEDIA_DIR", "path" : "monitor.jpg" } }
{ "_id" : ObjectId("60093dc8f2c829000e38a8d2"), "X" : "{\"alias\":\"MEDIA_DIR\",\"path\":\"mousepad.jpg\"}", "Y" : { "alias" : "MEDIA_DIR", "path" : "mousepad.jpg" } }
{ "_id" : ObjectId("60093dc8f2c829000e38a8d3"), "X" : "{\"alias\":\"MEDIA_DIR\",\"path\":\"keyboard.jpg\"}", "Y" : { "alias" : "MEDIA_DIR", "path" : "keyboard.jpg" } }
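Because the body passed to $function is ordinary JavaScript, it can be tried outside the database first. The sketch below is a variant under stated assumptions: the parseOrNull name and the null fallback are not part of the answer above, and are only there to avoid throwing when a stored string is not valid JSON.

```javascript
// Hypothetical variant of the $function body: return null instead of throwing.
function parseOrNull(jsonString) {
  try {
    return JSON.parse(jsonString);
  } catch (err) {
    return null; // not valid JSON
  }
}

// A string like the ones in the results above parses fine.
console.log(parseOrNull('{"alias":"MEDIA_DIR","path":"modem.jpg"}').path); // modem.jpg

// The string from the question has an unquoted key, so it is not valid JSON.
console.log(parseOrNull('{"a":35,b:[1,2,3,4]}')); // null
```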
There's no native way in the MongoDB engine to parse a blob of JSON from a field, so I'd recommend doing it client-side in your language of choice and then, if required, saving the result back.
Alternatively, if your data is too big and you still need to aggregate it, you could use a regex to project the required fields out of the JSON and then use them later to filter, etc.
For example, if we insert the following document:
> db.test.insertOne({ name: 'test', blob: '{"a":35,b:[1,2,3,4]}' })
{
"acknowledged" : true,
"insertedId" : ObjectId("5ed9fe21b5d91941c9e85cdb")
}
We can then just project out the array with some regex:
db.test.aggregate([
{ $addFields: { b: { $regexFind: { input: "$blob", regex: /\[(((\d+,*))+)\]/ } } } },
{ $addFields: { b: { $split: [ { $arrayElemAt: [ "$b.captures", 0 ] }, "," ] } } }
]);
{
"_id" : ObjectId("5ed9fe21b5d91941c9e85cdb"),
"name" : "test",
"blob" : "{\"a\":35,b:[1,2,3,4]}",
"b" : [
"1",
"2",
"3",
"4"
]
}
This means we can do some filtering, sorting and any of the other aggregation stages.
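The two stages above can be mirrored in plain JavaScript to see what each one produces. This is only an illustration of the regex and the split. The final conversion to numbers is an extra step that is not part of the pipeline above.

```javascript
const blob = '{"a":35,b:[1,2,3,4]}';

// Stage 1: like $regexFind, take the first capture group of the first match.
const found = blob.match(/\[(((\d+,*))+)\]/);

// Stage 2: like $split on captures[0].
const b = found ? found[1].split(",") : [];
console.log(b); // the array elements, still as strings

// Extra step (assumption): convert to numbers for numeric filtering or sorting.
const numbers = b.map(Number);
console.log(numbers);
```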
You could just use JSON.parse() on the client, provided the string is valid JSON (every key quoted).
For example:
db.getCollection('system').find({
a: JSON.parse('{"a":35,"b":[1,2,3,4]}').a
})

Perform a search on main collection field and array of objects simultaneously

I have my document structure as below:
{
"codeId" : 8.7628945723895E13, // long numeric value stored in scientific notation by Mongodb
"problemName" : "Hardware Problem",
"problemErrorCode" : "97695686856",
"status" : "active",
"problemDescription" : "ghdsojgnhsdjgh sdojghsdjoghdghd i0dhgjodshgddsgsdsdfghsdfg",
"subProblems" : [
{
"codeId" : 8.76289457238896E14,
"problemName" : "Some problem",
"problemErrorCode" : "57790389503490249640",
"problemDescription" : "This is edited",
"status" : "active",
"_id" : ObjectId("589476eeae39b20b1c15535b")
},
...
]
}
I have a search field that searches by codeId, which basically serves as parentCodeID in the search fields.
Now, along with parentIdCode, I want to search by codeId, problemCode, problemName and problemDescription as well.
How do I query the subProblems entries with a regex search and, at the same time, combine that with a parent field using an "$or" clause to achieve this?
You can try something like this:
query = {
  '$or': [
    { "codeId": somevalue },
    {
      "subProblems.codeId": {
        "$regex": searchValue,
        "$options": "i"
      }
    }
    // add the rest of the subProblems fields here, one condition per field
    // (an empty {} inside $or would match every document)
  ]
};
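A sketch of how the remaining conditions might be filled in is shown below. The buildSearchQuery and escapeRegex helpers are hypothetical, and the choice of fields is an assumption based on the sample document. Since $regex only matches string values, the numeric codeId is compared by equality here and the regex is applied to the string fields.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: one $or condition per searchable field.
function buildSearchQuery(parentCodeId, searchValue) {
  const pattern = { $regex: escapeRegex(searchValue), $options: "i" };
  return {
    $or: [
      { codeId: parentCodeId },
      { "subProblems.problemName": pattern },
      { "subProblems.problemErrorCode": pattern },
      { "subProblems.problemDescription": pattern },
    ],
  };
}

const searchQuery = buildSearchQuery(87628945723895, "some problem");
console.log(JSON.stringify(searchQuery, null, 2));
```

The resulting object can be passed to find() as the filter.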

Select LIKE string in MongoDB with Array

I have the following document with this array:
"r" : [{
"id" : "890",
"ca" : "Other CPF Schemes and Priorities",
"su" : "National Day Rally 2015"
}, {
"id" : "1031-52347",
"ca" : "Current Events",
"su" : "Lee Kuan Yew"
}]
and I would like to list all documents where an id contains a dash, so the document with "id" : "1031-52347" will be returned.
I tried this:
{
r: { id: { $in: [/^-/] } }
}
but I was not able to get anything.
What would be the correct syntax?
I used this regex:
^[0-9]+-[0-9]+$
You should try this database query:
{
  "r": { "$elemMatch": { "id": { "$regex": new RegExp("^[0-9]+-[0-9]+$") } } }
}
UPDATE
Working database queries by Blakes Seven
db.mydb.find({ "r.id": { "$regex": "^[0-9]+-[0-9]+$" }})
or
db.mydb.find({ "r.id": /^[0-9]+-[0-9]+$/ })
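The difference between the attempted pattern and the working one can be checked in plain JavaScript. This sketch uses made-up in-memory documents shaped like the one in the question. The matching helper is hypothetical and only imitates the "any array element matches" behaviour of a query on "r.id".

```javascript
// Sample data modelled on the question (the _id values are invented).
const docs = [
  { _id: 1, r: [{ id: "890" }, { id: "1031-52347" }] },
  { _id: 2, r: [{ id: "890" }] },
];

// Return the _id of every document where some element of r has a matching id.
function matching(documents, regex) {
  return documents
    .filter((doc) => doc.r.some((entry) => regex.test(entry.id)))
    .map((doc) => doc._id);
}

// Digits, a dash, digits: matches "1031-52347", so document 1 is returned.
console.log(matching(docs, /^[0-9]+-[0-9]+$/));

// /^-/ only matches ids that START with a dash, so nothing is returned.
console.log(matching(docs, /^-/));
```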

How to query and update nested arrays

I am building a course system. Each course has multiple sections, each section has multiple steps. My datastructure is as follows:
{
"_id" : "Mtz4DMTwMMKWTWbzE",
"slug" : "how-to-be-awesome",
"title" : "How to be awesome",
"description" : "In 4 easy lessons.",
"createdAt" : ISODate("2014-08-25T13:33:24.675Z"),
"sections" : [
{
"title" : "Be cool",
"description" : "Title says it all really",
"steps" : [
{
"title" : "Wear sunglasses",
"description" : "Always works."
},
{
"title" : "Be funny",
"description" : "Make an occasional joke. But no lame ones."
}
]
}
]
}
This worked for adding steps:
Course._collection.update( { _id: course._id, sections: section }, {
"$push": {
"sections.$.steps": step
}
})
But I can't figure out how to update a step. I tried to give the steps an ID and do it like that, but it's not working, apparently because it's two arrays deep, and you can't have two positionals ($) in a query. I tried something like this:
Course._collection.update( { _id: course._id, 'sections.steps._id': step._id }, {
"$set": {
"sections.steps.$.title": "test updated title"
}
})
But this gave the following error:
can't append to array using string field name: steps
Is there a way to do this? Or is my schema design off?
Thanks!
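One possible direction, assuming MongoDB 3.6 or later (where the all-positional $[] and filtered positional $[<identifier>] operators with arrayFilters are available) and assuming each step carries an _id as described above. The sketch only builds the update specification as data and shows the intended effect on an in-memory copy. The helper names are hypothetical, and whether Course._collection passes arrayFilters through depends on the driver in use, which is not checked here.

```javascript
// Hypothetical helper: the three pieces an update call would need.
function buildStepTitleUpdate(courseId, stepId, newTitle) {
  return {
    filter: { _id: courseId },
    // $[] visits every section; $[step] visits only steps matching arrayFilters.
    update: { $set: { "sections.$[].steps.$[step].title": newTitle } },
    options: { arrayFilters: [{ "step._id": stepId }] },
  };
}

// In-memory illustration of the effect the update is meant to have.
function setStepTitle(course, stepId, newTitle) {
  for (const section of course.sections) {
    for (const step of section.steps) {
      if (step._id === stepId) step.title = newTitle;
    }
  }
  return course;
}

const course = {
  _id: "Mtz4DMTwMMKWTWbzE",
  sections: [
    { title: "Be cool", steps: [{ _id: "s1", title: "Wear sunglasses" }, { _id: "s2", title: "Be funny" }] },
  ],
};
setStepTitle(course, "s2", "test updated title");
console.log(course.sections[0].steps.map((step) => step.title));
```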

Modify a document inside an array in MongoDB

Past answers (from mid 2013 and before) don't seem to work and links to the documentation are all out of date.
Example user object:
{
"name": "Joe Bloggs",
"email": "joebloggs@example.com",
"workstations" : [
{ "number" : "10001",
"nickname" : "home" },
{ "number" : "10002",
"nickname" : "work" },
{ "number" : "10003",
"nickname" : "vacation" }
]
}
How can I modify the nickname of a workstation?
I tried using $set, workstations.$ and workstations.nickname but none gave the desired results.
Short answer: you have to use the array index. For example, to update the nickname of workstation 10002: {$set:{"workstations.1.nickname":"newnickname"}}
Here is the complete example:
> db.test.update({"_id" : ObjectId("5332b7cf4761549fb7e1e72f")},{$set:{"workstations.1.nickname":"newnickname"}})
> db.test.findOne()
{
"_id" : ObjectId("5332b7cf4761549fb7e1e72f"),
"email" : "joebloggs@example.com",
"name" : "Joe Bloggs",
"workstations" : [
{
"number" : "10001",
"nickname" : "home"
},
{
"nickname" : "newnickname",
"number" : "10002"
},
{
"number" : "10003",
"nickname" : "vacation"
}
]
}
If you don't know the index (the position of the workstation in the array), you can update the doc using $elemMatch:
> db.test.update(
{
"email": "joebloggs@example.com",
"workstations": { "$elemMatch": { "number" : "10002" } }
},
{
"$set": { "workstations.$.nickname": "newnickname2" }
}
)
@naimdjon's answer would work. To generalize, you could use the $elemMatch operator in combination with the $ positional operator to update one element in the array, using the query below:
db.test.update({
// Find the document where name="Joe Bloggs" and the element in the workstations array where number = "10002"
"name": "Joe Bloggs",
"workstations":{$elemMatch:{"number":"10002"}}
},
{
// Update the nickname in the element matched
$set:{"workstations.$.nickname":"newnickname"}
})
Note: $elemMatch is only required if you need to match more than one component in the array. If you are going to match on just the number, you could use "workstations.number":"10002"
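As an in-memory illustration of what the positional $ operator does, the sketch below finds the first array element that satisfies the condition and changes only that element. The updateFirstMatch helper is hypothetical and is not a MongoDB API.

```javascript
// Hypothetical helper: update the first array element matching the predicate.
function updateFirstMatch(doc, arrayField, predicate, changes) {
  const index = doc[arrayField].findIndex(predicate);
  if (index === -1) return false; // nothing matched, nothing updated
  Object.assign(doc[arrayField][index], changes);
  return true;
}

const user = {
  name: "Joe Bloggs",
  workstations: [
    { number: "10001", nickname: "home" },
    { number: "10002", nickname: "work" },
    { number: "10003", nickname: "vacation" },
  ],
};

updateFirstMatch(user, "workstations", (w) => w.number === "10002", { nickname: "newnickname" });
console.log(user.workstations[1]);
```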
As long as you know "which" entry you wish to update then the positional $ operator can be of help. But you need to update your query form:
db.collection.update(
{
"email": "joebloggs@example.com",
"workstations": { "$elemMatch": { "nickname" : "work" } }
},
{
"$set": { "workstations.$.nickname": "new name" }
}
)
So that is the general form. What you need to do here is "match" something in the array in order to get a "position" to use for the update.
Alternately, where you know the position, then you can just "specify" the position with "dot notation":
db.collection.update(
{
"email": "joebloggs@example.com",
},
{
"$set": { "workstations.1.nickname": "new name" }
}
)
Which updates the second element in the array, and does not need the "matching" part in the query.
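The dot-notation path can be pictured as a walk through the document, one key or array index at a time. The setByPath helper below is a hypothetical in-memory sketch of that idea. Unlike a real $set, it assumes every intermediate key already exists.

```javascript
// Hypothetical helper: follow a dotted path and set the final key.
function setByPath(doc, path, value) {
  const keys = path.split(".");
  let target = doc;
  for (const key of keys.slice(0, -1)) {
    target = target[key]; // "1" indexes the second array element
  }
  target[keys[keys.length - 1]] = value;
  return doc;
}

const account = {
  workstations: [
    { number: "10001", nickname: "home" },
    { number: "10002", nickname: "work" },
  ],
};

setByPath(account, "workstations.1.nickname", "new name");
console.log(account.workstations[1]);
```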