Sanitize object literal in javascript? [duplicate] - mongodb

It seems mongo does not allow insertion of keys with a dot (.) or dollar sign ($). However, when I imported a JSON file that contained a dot in a key using the mongoimport tool, it worked fine. The driver, on the other hand, complains when I try to insert that element.
This is what the document looks like in the database:
{
"_id": {
"$oid": "..."
},
"make": "saab",
"models": {
"9.7x": [
2007,
2008,
2009,
2010
]
}
}
Am I doing this all wrong and should not be using hash maps like that with external data (i.e. the models) or can I escape the dot somehow? Maybe I am thinking too much Javascript-like.

MongoDB doesn't support keys with a dot in them, so you're going to have to preprocess your JSON file to remove/replace them before importing it, or you'll be setting yourself up for all sorts of problems.
There isn't a standard workaround for this issue; the best approach depends too much on the specifics of the situation. But I'd avoid any key encoder/decoder approach if possible, as you'll continue to pay the inconvenience of that in perpetuity, whereas a JSON restructure would presumably be a one-time cost.

As mentioned in other answers MongoDB does not allow $ or . characters as map keys due to restrictions on field names. However, as mentioned in Dollar Sign Operator Escaping this restriction does not prevent you from inserting documents with such keys, it just prevents you from updating or querying them.
The problem of simply replacing . with [dot] or U+FF0E (as mentioned elsewhere on this page) is, what happens when the user legitimately wants to store the key [dot] or U+FF0E?
An approach that Fantom's afMorphia driver takes, is to use unicode escape sequences similar to that of Java, but ensuring the escape character is escaped first. In essence, the following string replacements are made (*):
\ --> \\
$ --> \u0024
. --> \u002e
A reverse replacement is made when map keys are subsequently read from MongoDB.
Or in Fantom code:
Str encodeKey(Str key) {
return key.replace("\\", "\\\\").replace("\$", "\\u0024").replace(".", "\\u002e")
}
Str decodeKey(Str key) {
return key.replace("\\u002e", ".").replace("\\u0024", "\$").replace("\\\\", "\\")
}
The only time a user needs to be aware of such conversions is when constructing queries for such keys.
Given it is common to store dotted.property.names in databases for configuration purposes I believe this approach is preferable to simply banning all such map keys.
(*) afMorphia actually performs full / proper unicode escaping rules as mentioned in Unicode escape syntax in Java but the described replacement sequence works just as well.
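For readers who do not use Fantom, the same replacements can be sketched in JavaScript. This is only an illustration of the scheme described above, not afMorphia's code; the function names simply mirror the Fantom ones. Decoding is done in a single left-to-right pass, so that an escaped backslash followed by the literal text u002e is not mistaken for an escaped dot.

```javascript
// Illustration only: mirrors the replacement table above, not afMorphia's implementation.
function encodeKey(key) {
  // Escape the escape character first, then the two reserved characters.
  return key
    .replace(/\\/g, "\\\\")
    .replace(/\$/g, "\\u0024")
    .replace(/\./g, "\\u002e");
}

function decodeKey(key) {
  // One pass over the string: every backslash starts exactly one escape sequence.
  return key.replace(/\\(\\|u0024|u002e)/g, (whole, seq) => {
    if (seq === "\\") return "\\";
    return seq === "u0024" ? "$" : ".";
  });
}

console.log(encodeKey("dotted.property.$name")); // dotted\u002eproperty\u002e\u0024name
console.log(decodeKey(encodeKey("dotted.property.$name"))); // dotted.property.$name
```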

The latest stable version (v3.6.1) of MongoDB now supports dots (.) in keys or field names.
Field names can contain dots (.) and dollar ($) characters now

The Mongo docs suggest replacing illegal characters such as $ and . with their unicode equivalents.
In these situations, keys will need to substitute the reserved $ and . characters. Any character is sufficient, but consider using the Unicode full width equivalents: U+FF04 (i.e. “$”) and U+FF0E (i.e. “.”).

A solution I just implemented that I'm really happy with involves splitting the key name and value into two separate fields. This way, I can keep the characters exactly the same, and not worry about any of those parsing nightmares. The doc would look like:
{
...
keyName: "domain.com",
keyValue: "unregistered",
...
}
You can still query this easy enough, just by doing a find on the fields keyName and keyValue.
So instead of:
db.collection.find({"domain.com":"unregistered"})
which wouldn't actually work as expected, you would run:
db.collection.find({keyName:"domain.com", keyValue:"unregistered"})
and it will return the expected document.
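If the dotted keys live in a map inside the document (like models in the question), the same idea can be applied by turning the map into an array of pairs before saving. A minimal sketch, assuming plain JavaScript objects; the helper names mapToPairs and pairsToMap are made up for this illustration:

```javascript
// Hypothetical helpers: turn { "9.7x": [...] } into [{ keyName, keyValue }] and back.
function mapToPairs(map) {
  return Object.entries(map).map(([keyName, keyValue]) => ({ keyName, keyValue }));
}

function pairsToMap(pairs) {
  return Object.fromEntries(pairs.map(({ keyName, keyValue }) => [keyName, keyValue]));
}

const modelsMap = { "9.7x": [2007, 2008, 2009, 2010] };
const storedPairs = mapToPairs(modelsMap);

console.log(JSON.stringify(storedPairs));
console.log(JSON.stringify(pairsToMap(storedPairs)));
```

The key text is kept untouched as a value, so nothing has to be escaped or decoded later.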

You can try using a hash of the original key as the key, and then store the original key inside the JSON value.
var crypto = require("crypto");
function md5(value) {
    return crypto.createHash('md5').update( String(value) ).digest('hex');
}
var data = {
    "_id": {
        "$oid": "..."
    },
    "make": "saab",
    "models": {}
}
var version = "9.7x";
// the hex digest contains no '.' or '$', so it is safe as a key
data.models[ md5(version) ] = {
    "version": version,
    "years" : [
        2007,
        2008,
        2009,
        2010
    ]
}
You would then access the models using the hash later.
var version = "9.7x";
collection.find( { _id : ...}, function(e, data ) {
    var models = data.models[ md5(version) ];
});

It is supported now
MongoDb 3.6 onwards supports both dots and dollar in field names.
See below JIRA: https://jira.mongodb.org/browse/JAVA-2810
Upgrading your Mongodb to 3.6+ sounds like the best way to go.

You'll need to escape the keys. Since it seems most people don't know how to properly escape strings, here are the steps:
Choose an escape character (best to choose a character that's rarely used), e.g. '~'.
To escape, first replace all instances of the escape character with some sequence prepended with your escape character (e.g. '~' -> '~s'), then replace whatever character or sequence you need to escape with some sequence prepended with your escape character, e.g. '.' -> '~p'.
To unescape, first turn every instance of your second escape sequence back into the original character (e.g. '~p' -> '.'), then transform your escape character sequence back into a single escape character (e.g. '~s' -> '~').
Also, remember that mongo doesn't allow keys to start with '$', so you have to do something similar there.
Here's some code that does it:
// returns an escaped mongo key
exports.escape = function(key) {
    return key.replace(/~/g, '~s')
              .replace(/\./g, '~p')
              .replace(/^\$/g, '~d')
}
// returns an unescaped mongo key
exports.unescape = function(escapedKey) {
    return escapedKey.replace(/^~d/g, '$')
                     .replace(/~p/g, '.')
                     .replace(/~s/g, '~')
}

From the MongoDB docs "the '.' character must not appear anywhere in the key name". It looks like you'll have to come up with an encoding scheme or do without.

A late answer, but if you use Spring and Mongo, Spring can manage the conversion for you with MappingMongoConverter. It's the solution by JohnnyHK but handled by Spring.
@Autowired
private MappingMongoConverter converter;
@PostConstruct
public void configureMongo() {
    converter.setMapKeyDotReplacement("xxx");
}
If your stored Json is :
{ "axxxb" : "value" }
Through Spring (MongoClient) it will be read as :
{ "a.b" : "value" }

As another user mentioned, encoding/decoding this can become problematic in the future, so it's probably just easier to replace all keys that have a dot. Here's a recursive function I made to replace keys with '.' occurrences:
import json
from pprint import pprint
def mongo_jsonify(dictionary):
    # Recursively copy the structure, replacing '.' in every dict key with '-'
    new_dict = {}
    if type(dictionary) is dict:
        for k, v in dictionary.items():
            new_k = k.replace('.', '-')
            if type(v) is dict:
                new_dict[new_k] = mongo_jsonify(v)
            elif type(v) is list:
                new_dict[new_k] = [mongo_jsonify(i) for i in v]
            else:
                new_dict[new_k] = dictionary[k]
        return new_dict
    else:
        return dictionary
if __name__ == '__main__':
    with open('path_to_json', "r") as input_file:
        d = json.load(input_file)
    d = mongo_jsonify(d)
    pprint(d)
You can modify this code to replace '$' too, as that is another character that mongo won't allow in a key.

I use the following escaping in JavaScript for each object key:
key.replace(/\\/g, '\\\\').replace(/^\$/, '\\$').replace(/\./g, '\\_')
What I like about it is that it replaces only a $ at the beginning, and it does not use Unicode characters, which can be tricky to use in the console. To me, _ is much more readable than a Unicode character. It also does not replace one set of special characters ($, .) with another (Unicode), but properly escapes with the traditional \.

Not perfect, but will work in most situations: replace the prohibited characters by something else. Since it's in keys, these new chars should be fairly rare.
/** This will replace \ with ⍀, ^$ with '₴' and dots with ⋅ to make the object compatible for mongoDB insert.
Caveats:
1. If you have any of ⍀, ₴ or ⋅ in your original documents, they will be converted to \, $ or . upon decoding.
2. Recursive structures are always an issue. A cheap way to prevent a stack overflow is by limiting the number of levels. The default max level is 10.
*/
encodeMongoObj = function(o, level = 10) {
    var build = {}, key, newKey, value
    //if (typeof level === "undefined") level = 20 // default level if not provided
    for (key in o) {
        value = o[key]
        if (typeof value === "object") value = (level > 0) ? encodeMongoObj(value, level - 1) : null // If this is an object, recurse if we can
        newKey = key.replace(/\\/g, '⍀').replace(/^\$/, '₴').replace(/\./g, '⋅') // replace special chars prohibited in mongo keys
        build[newKey] = value
    }
    return build
}
/** This will decode an object encoded with the above function. We assume the structure is not recursive since it should come from Mongodb */
decodeMongoObj = function(o) {
    var build = {}, key, newKey, value
    for (key in o) {
        value = o[key]
        if (typeof value === "object") value = decodeMongoObj(value) // If this is an object, recurse
        newKey = key.replace(/⍀/g, '\\').replace(/^₴/, '$').replace(/⋅/g, '.') // restore the original special chars
        build[newKey] = value
    }
    return build
}
Here is a test:
var nastyObj = {
"sub.obj" : {"$dollar\\backslash": "$\\.end$"}
}
nastyObj["$you.must.be.kidding"] = nastyObj // make it recursive
var encoded = encodeMongoObj(nastyObj, 1)
console.log(encoded)
console.log( decodeMongoObj( encoded) )
and the results - note that the values are not modified:
{
sub⋅obj: {
₴dollar⍀backslash: "$\\.end$"
},
₴you⋅must⋅be⋅kidding: {
sub⋅obj: null,
₴you⋅must⋅be⋅kidding: null
}
}
{
"sub.obj": {
$dollar\\backslash: "$\\.end$"
},
"$you.must.be.kidding": {
"sub.obj": {},
"$you.must.be.kidding": {}
}
}
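Note that in the decoded output above the null values came back as {}, because typeof null is "object"; arrays would likewise be rebuilt as plain objects by the for...in loop. A variant that leaves arrays, null and non-plain objects (dates, ObjectIds and the like) alone could look like the sketch below. mapKeysDeep, encodeMongoKey and decodeMongoKey are made-up names, and the depth limit of the original is left out, so this assumes non-recursive input.

```javascript
// True only for objects created by {} / Object.create(null), not for Date, arrays, etc.
function isPlainObject(value) {
  if (value === null || typeof value !== "object") return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

// Rebuilds the structure, passing every object key through mapKey.
// Assumption: the input has no cycles (no depth limit is applied here).
function mapKeysDeep(value, mapKey) {
  if (Array.isArray(value)) return value.map((item) => mapKeysDeep(item, mapKey));
  if (!isPlainObject(value)) return value; // primitives, null, Date, ...
  const out = {};
  for (const [key, child] of Object.entries(value)) {
    out[mapKey(key)] = mapKeysDeep(child, mapKey);
  }
  return out;
}

// Same substitutions as in the answer above.
const encodeMongoKey = (key) =>
  key.replace(/\\/g, "⍀").replace(/^\$/, "₴").replace(/\./g, "⋅");
const decodeMongoKey = (key) =>
  key.replace(/⍀/g, "\\").replace(/^₴/, "$").replace(/⋅/g, ".");

const encodedDoc = mapKeysDeep({ "sub.obj": { "$x": null, list: [{ "a.b": 1 }] } }, encodeMongoKey);
console.log(JSON.stringify(encodedDoc));
console.log(JSON.stringify(mapKeysDeep(encodedDoc, decodeMongoKey)));
```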

There is an ugly way to query it. I would not recommend using it in an application, only for debugging purposes (it works only on embedded objects):
db.getCollection('mycollection').aggregate([
{$match: {mymapfield: {$type: "object" }}}, //filter objects with right field type
{$project: {mymapfield: { $objectToArray: "$mymapfield" }}}, //"unwind" map to array of {k: key, v: value} objects
{$match: {mymapfield: {k: "my.key.with.dot", v: "myvalue"}}} //query
])

For PHP I substitute the HTML entity for the period, that is, &#46;.
It stores in MongoDB like this:
"validations" : {
"4e25adbb1b0a55400e030000" : {
"associate" : "true"
},
"4e25adb11b0a55400e010000" : {
"associate" : "true"
}
}
and the PHP code...
$entry = array('associate' => $associate);
$data = array( '$set' => array( 'validations.' . str_replace(".", "&#46;", $validation) => $entry ));
$newstatus = $collection->update($key, $data, $options);

Lodash pairs (renamed toPairs in Lodash 4) will allow you to change
{ 'connect.sid': 's:hyeIzKRdD9aucCc5NceYw5zhHN5vpFOp.0OUaA6' }
into
[ [ 'connect.sid',
's:hyeIzKRdD9aucCc5NceYw5zhHN5vpFOp.0OUaA6' ] ]
using
var newObj = _.pairs(oldObj);

You can store it nested as it is and flatten it into dotted keys afterwards.
I wrote this example in LiveScript. You can use the livescript.net website to evaluate it.
test =
  field:
    field1: 1
    field2: 2
    field3: 5
    nested:
      more: 1
      moresdafasdf: 23423
  field3: 3
get-plain = (json, parent)->
  | typeof! json is \Object => json |> obj-to-pairs |> map -> get-plain it.1, [parent,it.0].filter(-> it?).join(\.)
  | _ => key: parent, value: json
test |> get-plain |> flatten |> map (-> [it.key, it.value]) |> pairs-to-obj
It will produce
{"field.field1":1,
"field.field2":2,
"field.field3":5,
"field.nested.more":1,
"field.nested.moresdafasdf":23423,
"field3":3}

Here is my tip: you can use JSON.stringify to save an Object/Array whose key names contain dots, then parse the string back into an Object with JSON.parse when you read the data from the database.
Another workaround:
Restructure your schema like:
key : {
    "keyName": "a.b",
    "value": [Array]
}

The latest MongoDB does support keys with a dot, but the Java MongoDB driver does not. To make it work in Java, I pulled the code from the GitHub repo of java-mongo-driver, changed its isValid Key function accordingly, built a new jar from it, and am using that now.

Replace the dot (.) or dollar ($) with other characters that will never be used in the real document, and restore the dot (.) or dollar ($) when retrieving the document. This strategy won't affect the data that the user reads.
You can pick any character for the replacement.

The strange thing is, using mongojs, I can create a document with a dot if I set the _id myself, however I cannot create a document when the _id is generated:
Does work:
db.testcollection.save({"_id": "testdocument", "dot.ted.": "value"}, (err, res) => {
console.log(err, res);
});
Does not work:
db.testcollection.save({"dot.ted": "value"}, (err, res) => {
console.log(err, res);
});
I first thought that updating a document with a dot key also worked, but it's identifying the dot as a subkey!
Seeing how mongojs handles the dot (subkey), I'm going to make sure my keys don't contain a dot.

As @JohnnyHK has mentioned, do remove punctuation or '.' from your keys, because it will create much larger problems once your data accumulates into a larger dataset. It causes problems especially when you call aggregation operators like $merge, which need to access and compare keys and will throw an error. I learnt it the hard way; those who are starting out, please don't repeat my mistake.

In our case the properties with a period are never queried by users directly. However, they can be created by users.
So we serialize our entire model first and string-replace all instances of the specific fields. Our period fields can show up in many locations, and the structure of the data is not predictable.
var dataJson = serialize(dataObj);
foreach(pf in periodFields)
{
    var encodedPF = pf.replace(".", "ENCODE_DOT");
    // replace() returns a new string, so the result has to be assigned back
    dataJson = dataJson.replace(pf, encodedPF);
}
Then later after our data is flattened we replace instances of the encodedPF so we can write the decoded version in our files
Nobody will ever need a field named ENCODE_DOT so it will not be an issue in our case.
The result is the following
color.one will be in the database as colorENCODE_DOTone
When we write our files we replace ENCODE_DOT with .

/home/user/anaconda3/lib/python3.6/site-packages/pymongo/collection.py
Found it in error messages. If you use anaconda (find the correspondent file if not), simply change the value from check_keys = True to False in the file stated above. That'll work!

Related

MongoDB and NextJS: Find a certain data matches regardless if uppercase or lowercase

The goal of this code is to display the current numbers of death, recoveries and critical for covid 19 around the world.
The search function codes are as follows:
const search = (e) => {
e.preventDefault() //to avoid page redirection
const countryMatch = countryCollection.find(country => country.country_name === targetCountry)
if (!countryMatch || countryMatch === null|| countryMatch === 'undefined') {
alert("Country Does Not Exist, use another name.")
setName("")
setTargetCountry("")
} else {
setName(countryMatch.country_name)
setDeathCount(toNum(countryMatch.deaths))
setCriticalCount(toNum(countryMatch.serious_critical))
setRecoveryCount(toNum(countryMatch.total_recovered))
}
}
Our task is to find a country regardless of whether it's in upper or lower case, e.g. Malaysia vs malaysia.
REGULAR EXPRESSION
What you need is regular expression or RegExp. MongoDb supports regular expression for your searches.
In Your case it can be something like
countryCollections.find({'country':new RegExp(countryName,flag)},callback)
flag determines how you want to search
for case insensitive search use 'i'
More about RegExp can be found on mongoDB docs https://docs.mongodb.com/manual/reference/operator/query/regex/
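One caveat when building a RegExp from user input is that characters such as . or ( have a special meaning, and an unanchored pattern also matches substrings. A small sketch of an exact, case-insensitive match; escapeRegExp and exactIgnoreCase are helper names chosen for this example, and the sample data is invented:

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Anchored on both ends, so "mal" does not match "Malaysia".
function exactIgnoreCase(text) {
  return new RegExp("^" + escapeRegExp(text) + "$", "i");
}

const sampleCountries = [{ country_name: "Malaysia" }, { country_name: "Mali" }];
const countryPattern = exactIgnoreCase("malaysia");
const foundCountry = sampleCountries.find((c) => countryPattern.test(c.country_name));

console.log(foundCountry); // { country_name: 'Malaysia' }
```

The same RegExp object can be used client-side, as here, or passed as the value for the field in a MongoDB query filter.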
Given how you use MongoDB, I would say this is an excellent case for text indexes.
Here is an example for you:
Schema.index(
// making field available for $text search and $meta sorting
{
'field': 'text',
'embedDoc.field': 'text',
},
{
//options of index
weights: // weight for each field
{
'field': 2,
'embedDoc.field': 1,
},
name: 'Countries', // Index Name for Mongo Compass and .explain debug
})
I guess you should try that. It will solve all your potential problems with text search, such as apostrophes or diacritic symbols in the search term, upper and lower case, and so on. But please check the documentation of text indexes before implementing them: they are quite sensitive and can be tuned for many cases. There is no universal silver bullet.

How to rename mongoDB columns big data?

I have a database with 15,574,934 records in mongo
I want to rename some columns to:
db.offerPhotos.files.update({}, {$rename: { 'orgFilename': 'metadata.orgFilename', 'offerId': 'metadata.offerId', 'batch': 'metadata.batch', 'group': 'metadata.group', 'size': 'metadata.size', 'mimeType': 'metadata.mimeType'}}, false, true)
I do it by mongo CLI but I'm waiting and waiting and nothing happens
How to do it better?
As per MongoDB naming conventions
Field names cannot contain the null character.
Top-level field names cannot start with the dollar sign ($) character.
The use of $ and . in field names is not recommended and is not supported by the official MongoDB drivers
Otherwise, starting in MongoDB 3.6, the server permits storage of
field names that contain dots (i.e. .) and dollar signs (i.e. $).
Reason for the slowness:
The $rename operator logically performs an $unset of both the old name and the new name, and then performs a $set operation with the new name.
So for each document it first performs an unset on both the old and the new name, and then a set operation: three operations per document in total.
Reference
And you are actually making it worse, because you are converting a field to a nested one.
Input doc:
{
"_id":"",
"key":1
}
Rename:
db.test.update({}, {
"$rename": {
"key": "key1.name"
}
})
Output:
/* 1 */
{
"_id" : ObjectId("5ef718a290c7f76c305aa21c"),
"key1" : {
"name" : 1
}
}
It's converted to a nested one. You are making it worse because each rename results in 3 operations on a nested doc, which in turn contains many fields.
With 15,574,934 records and 6 renamed fields, that is approximately 15.6 million * 3 * 6 operations in total. Hence it is slow.

How can I use an aggregation pipeline to see which documents have a field with a string that starts with any of the strings in a list?

I am using mongo server version 3.4, so my question pertains to the functionality of that version. I cannot upgrade anytime soon, so please keep that in mind. If I have a field in some documents in a MongoDB collection that may contain a string but also have trailing characters, how might I find them when submitting multiple "startsWith" strings to be evaluated in the same query? I may have some difficulty explaining this, so let me show some examples. Let's say that I have a field called "description" in all of my documents. This description might be encoded so that the text is not completely straightforward. Some values might be:
green:A-4_ABC
yellow:C-12_456
red:A-431_ZXCVQ
yellow_green:C-12_999
brown:B-3_R
gray:EN-44_195
EDIT: I think I made a mistake with using words in my keys. The keys are a randomized string of numbers, letters, and underscores, followed by a colon, then one to three letters, followed by a dash, then a couple of numbers, then an underscore, and lastly followed by several alphanumeric characters:
LKEF543SLI54EH2J897FQ_HF234EWOH:ZX-82_FR2
I realize that this sounds arbitrary and stupid, but it is an encoding of information that is intended to result in a unique key. It is in data that I receive, so I cannot change it, unfortunately.
Now, I want to find all of the documents with descriptions that start with any of the following values, and all of these values must be submitted in the same query. I might have hundreds of submitted values, and I need to get all matching documents at once. Here is a short list of what might be submitted in a single query:
green:A-4
red:A-431
gray:EN-44
yellow_green:C-12
Note that it was not accidental that the text is everything prior to the last underscore. And, as with one of the examples, there might be more than one underscore. With my use case, I cannot create a query that hard-codes these strings in the javascript regex format. And the $in filter does not work with "startsWith" functionality, particularly when you pass in a list of strings (though I am familiar with supplying a list of hard-coded javascript regexes). Is there any way to use the $in operator where I can take a list of strings that are passed in from the user who wants to run a query like this? Or is there something equivalent? The cherry on the top of all of this would be to find a way to project the matching document with the string that it matched (either from the query, or by some substring magic that I cannot seem to figure out).
EDIT: Specifically, when I find each document, I want to be able to project everything from the key up until the LAST underscore, like:
LKEF543SLI54EH2J897FQ_HF234EWOH:ZX-82
(along with its value)
Thanks in advance for any nudges in the right direction.
We use $objectToArray to get an array of {k: field_name, v: field_value} pairs. Then we split every value on the _ token and convert the result back into an object with the $arrayToObject operator.
In the next step we apply the $match operator to filter the documents, and then drop the helper data with $unset.
Note: If your document contains arrays or subdocuments, we may need $filter before we apply $objectToArray.
db.collection.aggregate([
{
$addFields: {
data: {
$arrayToObject: {
$map: {
input: {
$objectToArray: "$$ROOT"
},
in: {
k: "$$this.k",
v: {
$arrayElemAt: [
{
$split: [
{
$toString: "$$this.v"
},
"_"
]
},
0
]
}
}
}
}
}
}
},
{
$match: {
"data.green": "A-4",
"data.red": "A-431",
"data.gray": "EN-44",
"data.yellow_green": "C-12"
}
},
{
$unset: "data"
}
])
MongoPlayground
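An alternative that stays within a plain $match, and therefore should also work on 3.4 (as far as I know, $toString and the $unset stage only appeared in later server versions), is to build one anchored regular expression from the submitted list. The sketch below assumes the field is called description as in the question; buildPrefixPattern and keyBeforeLastUnderscore are hypothetical helper names, and extracting the matched prefix is done client-side rather than in the pipeline.

```javascript
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// "^(?:p1|p2|...)_[^_]*$": one of the prefixes, then the LAST underscore and its tail.
function buildPrefixPattern(prefixes) {
  return "^(?:" + prefixes.map(escapeRegExp).join("|") + ")_[^_]*$";
}

// Everything up to (not including) the last underscore.
function keyBeforeLastUnderscore(description) {
  const cut = description.lastIndexOf("_");
  return cut === -1 ? description : description.slice(0, cut);
}

const submittedPrefixes = ["green:A-4", "red:A-431", "gray:EN-44", "yellow_green:C-12"];
const prefixPattern = buildPrefixPattern(submittedPrefixes);

// Filter document that could be passed to find() or used inside a $match stage.
const descriptionFilter = { description: { $regex: prefixPattern } };

// Local check against the sample values from the question.
const sampleDescriptions = [
  "green:A-4_ABC", "yellow:C-12_456", "red:A-431_ZXCVQ",
  "yellow_green:C-12_999", "brown:B-3_R", "gray:EN-44_195",
];
const prefixMatcher = new RegExp(prefixPattern);
const matchedDescriptions = sampleDescriptions.filter((d) => prefixMatcher.test(d));
for (const d of matchedDescriptions) {
  console.log(keyBeforeLastUnderscore(d), "<-", d);
}
```

Because the pattern is anchored with ^, the server may be able to use an index on description, although with a long alternation that is worth checking with explain.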

Meteor: How to do a case insensitive collection.findOne()?

I'm implementing a way for users to change their username in a Meteor app I am writing. Before accepting changes, I want to check if the username already exists. Usernames can contain upper and lowercase, but they must be unique names regardless of case. For example, bob and Bob cannot exist together.
The problem is that I can't seem to figure out how to do a collection.findOne() that is case insensitive. For example, say I have a collection called Profiles, I'd like to be able to do something like this:
newName = "bob";
//Assume "Bob" exists as a username in the Profiles collection;
var isAlreadyRegistered = Profiles.findOne({"username": newName});
if (isAlreadyRegistered == null) {
saveUsername();
};
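You can use a regular expression.
// note: the literal /^newName$/i would match the text "newName", not the variable's value
var isAlreadyRegistered = Profiles.findOne({"username": new RegExp("^" + newName + "$", "i") });
Or you can query like this also:
var isAlreadyRegistered = Profiles.findOne({ "username" : {
    $regex : new RegExp("^" + newName + "$", "i") } }
);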
There are two ways and your mileage may vary on the best approach for you, but both are fairly horrible actually since MongoDB does case "sensitive" matching:
First approach is to use $regex:
Profiles.findOne({ "username": {
"$regex": "^" + newName + "\\b", "$options": "i"
}})
That matches the word and only the exact word from the beginning of the string in a case insensitive way. The problem here is that you are scanning an index.
The second approach is to project using aggregate:
db.collection("profiles").aggregate([
    { "$project": {
        "username": 1,
        "lower": { "$toLower": "$username" }
    }},
    // match on the projected lowercase copy, not on the original field
    { "$match": {
        "lower": newName
    }}
])
And you do that where of course newName has already been converted to lowercase.
The problem here is that it will $project over every document in the pipeline. But it can be useful if you are able to $match on something else first.
Of course I think that aggregate is only available on the server side and not through Minimongo, so there is that to consider.
As a solution to your underlying use-case, I suggest using two fields to store the username rather than one.
The built-in username field should store the lowercase version of the username. The other, extra field stores the original case-sensitive version.
Searches would be conducted against the 'username' field with the search criteria lowercased as well before use.
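A minimal sketch of that two-field idea, using an in-memory array in place of the Profiles collection. The field name displayUsername is an assumption made for this example; the answer above does not name the extra field.

```javascript
// Build the two stored fields from the name the user typed.
// "displayUsername" is a hypothetical field name for the original-case version.
function toProfileFields(chosenName) {
  return { username: chosenName.toLowerCase(), displayUsername: chosenName };
}

// Plain equality on the lowercased field; no regular expression needed.
function isAlreadyRegistered(profiles, newName) {
  const wanted = newName.toLowerCase();
  return profiles.some((profile) => profile.username === wanted);
}

const existingProfiles = [toProfileFields("Bob")];
console.log(isAlreadyRegistered(existingProfiles, "bob"));   // true
console.log(isAlreadyRegistered(existingProfiles, "alice")); // false
```

Against the real collection, the check would become an ordinary equality query on username with newName.toLowerCase(), which can use a normal index.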

mongodb evaluate if string ends with another substring

I would like to project a field with a value if another field ends with a substring but another value if it's not
How can I do that?
Example (I omit what's not important):
Doc 1:
{
'Field1': 'A perfect normal string'
}
Doc N:
{
'Field1': 'This one ends with my substring'
}
The ideal will be:
$project: {
'HasSubstring': {$cond: [{$regex: 'substring$'}, true, false]}
}
But this doesn't work because we can't (????) use $regex inside a $cond
Anyone could point me, please?
Thanks a lot
PS: I can't use the regex filter in the match because I need both docs for grouping them
How about:
Store a 'Field1Reversed' field, with the characters reversed, then use:
$project: {
'HasSubstring': {$cond:[{$eq:[ 'gnirtsbus', {$substr:["$Field1Reversed",0,9]}] } ,true,false]}
}
There are unicode implications to using substr, but these can be resolved in your app if necessary (measure length of comparand to get the proper substr # of bytes)
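The reversed field and the arguments for the comparison can be prepared in application code. A sketch assuming Node.js (for Buffer); reverseString and suffixCheckArgs are hypothetical helpers, and reversing by code point does not handle combining characters:

```javascript
// Reverse by Unicode code point rather than by UTF-16 code unit.
function reverseString(text) {
  return Array.from(text).reverse().join("");
}

// Values to plug into the $eq / $substr expression: the reversed suffix
// and its length in UTF-8 bytes.
function suffixCheckArgs(suffix) {
  const reversedSuffix = reverseString(suffix);
  return { reversedSuffix, byteLength: Buffer.byteLength(reversedSuffix, "utf8") };
}

const sampleDoc = { Field1: "This one ends with my substring" };
sampleDoc.Field1Reversed = reverseString(sampleDoc.Field1);

const checkArgs = suffixCheckArgs("substring");
console.log(checkArgs); // { reversedSuffix: 'gnirtsbus', byteLength: 9 }
console.log(sampleDoc.Field1Reversed.startsWith(checkArgs.reversedSuffix)); // true
```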