Edit Mongodb array values to only contain alphabetic characters - mongodb

I am trying to edit the data in my MongoDB collection after importing it from a CSV file. One of my fields contains a mixture of alphabetic and numeric characters, and I want to edit it so that it contains only alphabetic characters, to make querying easier.
So I want to change this:
"categories" : [
"9100005:1:factual",
"9200041:2:arts_culture_and_the_media",
"9200055:2:history"
]
Into this:
"categories" : [
"factual",
"arts culture and the media",
"history"
]
I know that using $regex I can query for certain categories, so using:
db.bbc.find( { categories: {$regex: /factual/ }} )
I get all records with "factual" as a category, but I am unsure how to use $regex to query for multiple categories, and I feel that avoiding regular expressions entirely would be easier. Does anyone know what command I would have to run to do this? Thanks.

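Try this (test1 and Category are the names from my test collection; substitute your own, e.g. bbc and categories):
db.test1.find().snapshot().forEach(function (el) {
  // Strip every non-alphabetic character from each array element.
  for (var a in el.Category) {
    el.Category[a] = el.Category[a].replace(/[^A-Za-z]/g, "");
    print(el.Category[a]);
  }
  // Write the modified document back to the collection.
  db.test1.save(el);
});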
Edit:
To preserve spaces, add a space to the character class.
You can use this line:
el.Category[a].replace(/[^A-Za-z ]/g, "");
To keep underscores, add an underscore to the character class:
el.Category[a].replace(/[^A-Za-z_]/g, "");
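Stripping characters alone does not turn "arts_culture_and_the_media" into "arts culture and the media", so here is a minimal sketch of one way to get exactly the output asked for in the question. It assumes every value has the form id:level:label; cleanCategory is a hypothetical helper name, not part of MongoDB.

```javascript
// Hypothetical helper: keep only the text after the last colon,
// then turn underscores into spaces.
function cleanCategory(raw) {
  // lastIndexOf returns -1 when there is no colon, so slice(0) keeps the whole string.
  const label = raw.slice(raw.lastIndexOf(":") + 1);
  return label.replace(/_/g, " ");
}

const categories = [
  "9100005:1:factual",
  "9200041:2:arts_culture_and_the_media",
  "9200055:2:history",
];

const cleaned = categories.map(cleanCategory);
console.log(cleaned);
```

Inside the forEach loop above, the same function could be applied with el.categories = el.categories.map(cleanCategory) before saving the document.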

Related

mongodb $regex not working with $options x

I have a models collection of documents in MongoDB Atlas. This is how a document looks:
{
name: "Iphone 11 Pro Max",
description: "",
}
I have a value like "Iphone11ProMax" that I retrieved from the URL params. I want to query the above document with this value, but I can't, because the value has no spaces and I cannot insert them manually since the params change. So I tried using the $regex operator like this:
const {name} = req.params;
const pattern = new RegExp(name);
Model.findOne({name: {$regex: pattern, $options: 'x'}});
Since the 'x' option ignores whitespace, I thought it might work, but it did not. Any suggestions on this?
The "$regex" "$options" "x" only ignores whitespace in the regex pattern, not in the target string.
There are several ways to query your collection with the value retrieved from the URL. One option is to programmatically place \s* between each character of this value and use the result as your "$regex" pattern. After transforming the value, the query could look like this.
N.B.: Depending on your driver or language, you may need to escape the backslash (writing \\s* rather than \s*) in the "$regex" string.
db.collection.find({
"name": {
// put "\\s*" between every character in regex
"$regex": "I\\s*p\\s*h\\s*o\\s*n\\s*e\\s*1\\s*1\\s*P\\s*r\\s*o\\s*M\\s*a\\s*x"
}
})
Try it on mongoplayground.net.
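The transformation itself can be done in a few lines. This is a minimal sketch, assuming Node.js; escapeRegex and spacedPattern are hypothetical helper names, and anchoring with ^ and $ is my assumption that the whole name should match rather than a substring.

```javascript
// Escape regex metacharacters so the URL value is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: "abc" -> "a\s*b\s*c"
function spacedPattern(value) {
  return Array.from(value).map(escapeRegex).join("\\s*");
}

// Assumption: the whole name must match, hence the ^ and $ anchors.
const pattern = "^" + spacedPattern("Iphone11ProMax") + "$";

console.log(new RegExp(pattern).test("Iphone 11 Pro Max")); // true
console.log(new RegExp(pattern).test("Iphone 12 Pro Max")); // false
```

The resulting string can then be passed as the "$regex" value, for example Model.findOne({name: {$regex: pattern}}). Escaping matters because the value comes straight from the URL.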

How to nested query in mongodb?

How do I index nested objects in PyMongo so that I can perform full-text search? For example, I have a collection object like this:
{
"_id": ObjectId("5e8b0fa1c869790699efdb2d"),
"xmlfileid": "334355343343223567",
"threads":{
"threads_participants":{
"participant":[
{
"#reference": "rits_dbx_1"
},
{
"#reference": "rits_dbx_2"
}
]
},
"thread":{
"namedAnchor":"{' ': 'NORP', 'Ho': 'PERSON', 'Lets': 'PERSON', 'Boris Johnson': 'PERSON', 'Britain': 'GPE'}",
"selectedText":{
"fragment":[
{
"#class":"next_steps",
"#text":"rits_dbx_1 said hello this is a good site."
},
{
"#class":"other",
"#text":"rits_dbx_1 said ho ho."
},
{
"#class":"other",
"#text":"rits_dbx_1 said lets put some meaningful stuff here."
},
]
}
}
}
}
I've placed a search box on my website, and when the user types text into it I want to display the matching #text, its #class, and the xmlfileid.
So far I've created an index using the command below. I don't know whether it's the right way to get the result, so please help with the query too.
db.xml_collection.createIndex({"threads.thread.selectedText.fragment": "text"})
In my Python code I have this, but it prints nothing:
result = collection.find({"$text": {"$search": "ho ho"}})
Your index is wrong: the fragment field holds an array of sub-documents, not strings.
MongoDB provides text indexes to support text search queries on string content. text indexes can include any field whose value is a string or an array of string elements.
https://docs.mongodb.com/manual/core/index-text/
If you want to index only #text field, change your index to this:
db.xml_collection.createIndex({"threads.thread.selectedText.fragment.#text": "text"})
Alternatively, you can create a wildcard text index, and MongoDB will index every field whose value is a string or an array of strings:
db.xml_collection.createIndex({"$**": "text"})
Note: A collection can have only one text index, so you need to drop any existing text index on this collection first.
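A $text query returns whole documents, so showing only the matching #text, its #class, and the xmlfileid still needs a step on the client. Below is a minimal sketch of that step, assuming the document shape from the question. matchingFragments is a hypothetical helper, and its plain case-insensitive substring test is a simplification that does not reproduce MongoDB's text-search stemming or scoring.

```javascript
// Hypothetical helper: given one document returned by the $text query,
// list the fragments whose "#text" contains the phrase (case-insensitive).
function matchingFragments(doc, phrase) {
  const needle = phrase.toLowerCase();
  const fragments = doc.threads.thread.selectedText.fragment;
  return fragments
    .filter((f) => f["#text"].toLowerCase().includes(needle))
    .map((f) => ({
      xmlfileid: doc.xmlfileid,
      class: f["#class"],
      text: f["#text"],
    }));
}

// Trimmed copy of the sample document from the question.
const sampleDoc = {
  xmlfileid: "334355343343223567",
  threads: {
    thread: {
      selectedText: {
        fragment: [
          { "#class": "next_steps", "#text": "rits_dbx_1 said hello this is a good site." },
          { "#class": "other", "#text": "rits_dbx_1 said ho ho." },
          { "#class": "other", "#text": "rits_dbx_1 said lets put some meaningful stuff here." },
        ],
      },
    },
  },
};

console.log(matchingFragments(sampleDoc, "ho ho"));
```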

How can I use an aggregation pipeline to see which documents have a field with a string that starts with any of the strings in a list?

I am using mongo server version 3.4, so my question pertains to the functionality of that version. I cannot upgrade anytime soon, so please keep that in mind. If I have a field in some documents in a MongoDB collection that may contain a string followed by trailing characters, how might I find those documents when submitting multiple "startsWith" strings to be evaluated in the same query? I may have some difficulty explaining this, so let me show some examples. Let's say that I have a field called "description" in all of my documents. This description might be encoded so that the text is not completely straightforward. Some values might be:
green:A-4_ABC
yellow:C-12_456
red:A-431_ZXCVQ
yellow_green:C-12_999
brown:B-3_R
gray:EN-44_195
EDIT: I think I made a mistake with using words in my keys. The keys are a randomized string of numbers, letters, and underscores, followed by a colon, then one to three letters, followed by a dash, then a couple of numbers, then an underscore, and lastly followed by several alphanumeric characters:
LKEF543SLI54EH2J897FQ_HF234EWOH:ZX-82_FR2
I realize that this sounds arbitrary and stupid, but it is an encoding of information that is intended to result in a unique key. It is in data that I receive, so I cannot change it, unfortunately.
Now, I want to find all of the documents with descriptions that start with any of the following values, and all of these values must be submitted in the same query. I might have hundreds of submitted values, and I need to get all matching documents at once. Here is a short list of what might be submitted in a single query:
green:A-4
red:A-431
gray:EN-44
yellow_green:C-12
Note that it was not accidental that the text is everything prior to the last underscore. And, as with one of the examples, there might be more than one underscore. With my use case, I cannot create a query that hard-codes these strings in the javascript regex format. And the $in filter does not work with "startsWith" functionality, particularly when you pass in a list of strings (though I am familiar with supplying a list of hard-coded javascript regexes). Is there any way to use the $in operator where I can take a list of strings that are passed in from the user who wants to run a query like this? Or is there something equivalent? The cherry on the top of all of this would be to find a way to project the matching document with the string that it matched (either from the query, or by some substring magic that I cannot seem to figure out).
EDIT: Specifically, when I find each document, I want to be able to project everything from the key up until the LAST underscore, like:
LKEF543SLI54EH2J897FQ_HF234EWOH:ZX-82
(along with its value)
Thanks in advance for any nudges in the right direction.
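We use $objectToArray to get an array of {k: field_name, v: field_value} pairs. Then we split each value on the _ token, keep the first piece, and convert the result back to an object with the $arrayToObject operator.
In the next step we apply the $match operator to filter documents, and then remove the temporary data field with $unset.
Note: If your document contains arrays or subdocuments, you may need $filter to exclude them before the conversion. Also note that $toString requires MongoDB 4.0 and the $unset stage requires 4.2.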
db.collection.aggregate([
{
$addFields: {
data: {
$arrayToObject: {
$map: {
input: {
$objectToArray: "$$ROOT"
},
in: {
k: "$$this.k",
v: {
$arrayElemAt: [
{
$split: [
{
$toString: "$$this.v"
},
"_"
]
},
0
]
}
}
}
}
}
}
},
{
$match: {
"data.green": "A-4",
"data.red": "A-431",
"data.gray": "EN-44",
"data.yellow_green": "C-12"
}
},
{
$unset: "data"
}
])
MongoPlayground
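For a server limited to 3.4, another route is to turn the submitted strings into anchored regular expressions on the client and pass them to $in, which accepts regular expression objects. This is a minimal sketch under that assumption. prefixRegexes and keyBeforeLastUnderscore are hypothetical helper names, the field name description comes from the question, and the trailing _[^_]*$ encodes my reading that the submitted text is everything before the last underscore.

```javascript
// Escape regex metacharacters so submitted strings are matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: one anchored regex per submitted prefix.
// "<prefix>_<tail>" matches only when <tail> contains no further underscore.
function prefixRegexes(prefixes) {
  return prefixes.map((p) => new RegExp("^" + escapeRegex(p) + "_[^_]*$"));
}

// Hypothetical helper: everything before the last underscore of a stored value.
function keyBeforeLastUnderscore(description) {
  const i = description.lastIndexOf("_");
  return i === -1 ? description : description.slice(0, i);
}

const submitted = ["green:A-4", "red:A-431", "gray:EN-44", "yellow_green:C-12"];
const query = { description: { $in: prefixRegexes(submitted) } };

console.log(keyBeforeLastUnderscore("LKEF543SLI54EH2J897FQ_HF234EWOH:ZX-82_FR2"));
// LKEF543SLI54EH2J897FQ_HF234EWOH:ZX-82
```

The query object can be passed to db.collection.find(query) or used inside a $match stage, and keyBeforeLastUnderscore can be applied to each returned description to recover the string it matched.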

mongoDB text index on subdocuments

I have a collection that looks something like this
{ "text1" : "text",
"url" : "http:....",
"title" : "the title",
......,
"search_metadata" : { "tags" : [ "tag1", "tag2", "tag3" ],
"title" : "the title",
"topcis": [ "topic1", "topic2"]
}
}
I want to be able to add a text index to search_metadata and all its subdocuments.
ensureIndex({search_metadata:"text"}) gives me no results
and:
ensureIndex({"$**":"text"}) gives me irrelevant data.
How can I make it happen?
From the text indexes page:
text indexes can include any field whose value is a string or an array
of string elements. To perform queries that access the text index, use
the $text query operator
Your search_metadata field is a sub-document, not a string or an array of strings, so as currently structured it is not in the right format for a text index in MongoDB.
Embedded in search_metadata, however, you have both strings and arrays of strings, and those can be text-indexed; an index on {"search_metadata.tags": "text"}, for example, fits the criteria and should work just fine.
Hence, it's a choice between restructuring the field to meet the text index criteria and indexing the relevant sub-fields. If you take the latter approach, you may find that you don't need text indexes on each of the fields, and a simpler (and far smaller) index may serve you just as well (a normal index on tags, queried with $elemMatch, for example).
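Because a collection can have only one text index, indexing several sub-fields means listing them all in a single index specification. Here is a minimal sketch of building that specification; textIndexSpec is a hypothetical helper name, and the choice of tags and title is only an example.

```javascript
// Hypothetical helper: build one text-index key spec covering several sub-fields.
function textIndexSpec(parent, subFields) {
  const spec = {};
  for (const name of subFields) {
    // Dotted paths address fields inside the sub-document.
    spec[parent + "." + name] = "text";
  }
  return spec;
}

const spec = textIndexSpec("search_metadata", ["tags", "title"]);
console.log(spec);
```

The resulting object can be passed to createIndex (or ensureIndex on older shells) on the collection.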

Querying array of arrays in MongoDB

I have a mongo collection which has an array of arrays (bigrams from an NLP process) that I'd like to be able to search, for example:
{
"sentence" : "will most likely be",
"biGrams" : [["will","most"], ["most","likely"], ["likely", "be"]]
},
{
"sentence" : "likely most people use stackoverflow",
"biGrams" : [["likely","most"], ["most","people"], ["people", "use"], ["use", "stackoverflow"]]
}
What I'd like to be able to do is search through the biGrams array for a certain instance of one of these bigrams, e.g. search for all sentences that contain the bigram ["most","likely"].
I've tried this;
find({'biGrams':{$elemMatch: {$elemMatch:{$in:['most','likely']}} }})
But this obviously finds all cases where the word 'most' or 'likely' is present. Order is important too, i.e. I don't want to find docs with ['likely','most'] in this example.
Thanks in advance, I'm stumped....
How about
find({"biGrams":["most","likely"]})
Searching on a field that holds an array also compares the value against each element of that array (it "unwinds" one level), so searching for a particular array should be an exact match on the inner array: same elements, same order.
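To make the expected behaviour concrete, here is a plain JavaScript sketch that mimics the comparison on the two sample documents. It only illustrates the semantics and is not how the server implements matching; sameArray and hasBigram are hypothetical helper names.

```javascript
// Exact, order-sensitive comparison of two arrays of strings.
function sameArray(a, b) {
  return a.length === b.length && a.every((x, i) => x === b[i]);
}

// True when any element of doc.biGrams equals the given bigram.
function hasBigram(doc, bigram) {
  return doc.biGrams.some((pair) => sameArray(pair, bigram));
}

const docs = [
  {
    sentence: "will most likely be",
    biGrams: [["will", "most"], ["most", "likely"], ["likely", "be"]],
  },
  {
    sentence: "likely most people use stackoverflow",
    biGrams: [["likely", "most"], ["most", "people"], ["people", "use"], ["use", "stackoverflow"]],
  },
];

const hits = docs.filter((d) => hasBigram(d, ["most", "likely"]));
console.log(hits.map((d) => d.sentence)); // [ 'will most likely be' ]
```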