Can MongoDB resolve an empty string as a property name?

Does anybody know how to query a value in MongoDB from this (valid JSON, pretty-printed) object:
var a = JSON.parse(`
{
"vnut_okraj_podmienky": {
"": {
"standart_podmienky": {
"type": "radio",
"value": "on"
},
"nestand_teplota": {
"type": "number",
"value": "24"
},
"nestand_vlhkost": {
"type": "number",
"value": "70"
}
}
}
}
`);
In the browser console I can obtain the value (24) with:
a.vnut_okraj_podmienky[""].nestand_teplota.value
but mongosh returns [] for this (the collection name is irrelevant):
db.isover_projects.distinct("vnut_okraj_podmienky.''.nestand_teplota.value")
and the error MongoServerError: FieldPath field names may not be empty strings.
for:
db.isover_projects.distinct("vnut_okraj_podmienky..nestand_teplota.value")

The MongoDB server stores data in BSON.
According to the specification at https://bsonspec.org/spec.html a field name must be
Zero or more modified UTF-8 encoded characters followed by '\x00'. The (byte*) MUST NOT contain '\x00', hence it is not full UTF-8.
So it technically can store the empty string as a field name.
This works in simple queries as well:
>db.collection.find({"":{a:1}})
[ { _id: ObjectId("616c4783e3be8ecf36d5e932"), '': { a: 1 } } ]
This also works with dotted notation:
>db.collection.find({".a":1})
[ { _id: ObjectId("616c4783e3be8ecf36d5e932"), '': { a: 1 } } ]
However, that does not work if you try to use that empty field name with update, projection, or aggregation operators:
>db.collection.aggregate([{$match:{".a":1}},{$set:{".b":2}}])
MongoError: Invalid $set :: caused by :: FieldPath field names may not be empty strings.
So while it is technically permitted to store a document with a field whose name is the empty string, not all operations are supported on such fields.
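Since plain find queries do return such documents, one possible workaround is to read them and extract the value on the client side. The following is a minimal sketch in plain JavaScript, assuming the collection is small enough to load into memory. getPath and distinctAtPath are hypothetical helper names, and the docs array stands in for the result of db.isover_projects.find().toArray():

```javascript
// Walk a document along an explicit list of key segments.
// Passing an array instead of a dotted string lets "" be a valid segment.
function getPath(doc, segments) {
  let current = doc;
  for (const key of segments) {
    if (current === null || typeof current !== 'object' || !(key in current)) {
      return undefined;
    }
    current = current[key];
  }
  return current;
}

// Collect the distinct values found at the given path across all documents.
function distinctAtPath(docs, segments) {
  const seen = new Set();
  for (const doc of docs) {
    const value = getPath(doc, segments);
    if (value !== undefined) seen.add(value);
  }
  return [...seen];
}

// Assumption: sample data standing in for db.isover_projects.find().toArray().
const docs = [
  { vnut_okraj_podmienky: { '': { nestand_teplota: { type: 'number', value: '24' } } } },
  { vnut_okraj_podmienky: { '': { nestand_teplota: { type: 'number', value: '24' } } } },
  { vnut_okraj_podmienky: { '': { nestand_teplota: { type: 'number', value: '19' } } } },
];

const values = distinctAtPath(docs, ['vnut_okraj_podmienky', '', 'nestand_teplota', 'value']);
console.log(values); // [ '24', '19' ]
```

The path is given as an array of segments so that the empty key never has to be encoded inside a dotted string.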

Related

Change data type from string to date while skipping missing data

The core collection (other collections in the DB refer back to this one) in my DB contains 3 fields with date information which at this point is formatted as strings like MM/DD/YYYY. Further, there are a range of documents for which this field contains missing data, i.e. "". I populated this collection by running the mongoimport command on a JSON file.
My goal is to convert these date-fields into actual ISODate data types, so as to allow filtering the collection by dates. Further, I want MongoDB to know that empty strings indicate missing values. I have read quite widely on this, leading me to try a bunch of things:
Trying a forEach statement - This worked, but only for the very first document.
db.collection.find().forEach(function(element){
element.startDate = ISODate(element.startDate);
db.collection.save(element);
})
Using kind of a for-loop: this worked well, but stopped once it encountered a missing value (so it transformed about 11 values):
db.collection.update(
{
"startDate":{
"$type":"string"
}
},
[
{
"$set":{
"startDate":{
"$dateFromString":{
"dateString":"$startDate",
"format":"%m/%d/%Y"
}
}
}
}
]
)
So, both of these approaches kind of worked - but I don't know how to apply them to the entire collection. Further, I'd be interested in performing this task in the most efficient way possible. However, I only want to do this once - data that will be added in the future should hopefully be correctly formatted at the import stage.
db.collection.updateMany(
{
"$and": [
{ "startDate": { "$type": "string" } },
{ "startDate": { "$ne": "" } }
]
},
[
{
"$set": {
"startDate": {
"$dateFromString": {
"dateString": "$startDate",
"format": "%m/%d/%Y"
}
}
}
}
]
)
Filtering out empty strings before doing the transformation means that documents with an empty string in the date field are skipped rather than causing the conversion to fail.
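Because the question mentions three date fields and wants empty strings treated as missing values, the same update can be generated per field. This is a sketch only: buildDateConversions is a hypothetical helper, endDate is an assumed field name, and replacing "" with null is one possible way to mark missing data, not something the original answer prescribes.

```javascript
// Build updateMany arguments for each date field:
//  1. convert non-empty MM/DD/YYYY strings to dates,
//  2. replace empty strings with null (assumption: null marks missing data).
function buildDateConversions(fields, format = '%m/%d/%Y') {
  return fields.flatMap((field) => [
    {
      filter: { [field]: { $type: 'string', $ne: '' } },
      update: [
        { $set: { [field]: { $dateFromString: { dateString: `$${field}`, format } } } },
      ],
    },
    {
      filter: { [field]: '' },
      update: [{ $set: { [field]: null } }],
    },
  ]);
}

// 'endDate' is a made-up field name used only for illustration.
const ops = buildDateConversions(['startDate', 'endDate']);
console.log(JSON.stringify(ops, null, 2));

// In mongosh the generated operations could then be applied with:
// ops.forEach(({ filter, update }) => db.collection.updateMany(filter, update));
```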

MongoDB Atlas Search - Multiple terms in search-string with 'and' condition (not 'or')

In the documentation of MongoDB Atlas search, it says the following for the autocomplete operator:
query: String or strings to search for. If there are multiple terms in
a string, Atlas Search also looks for a match for each term in the
string separately.
For the text operator, the same thing applies:
query: The string or strings to search for. If there are multiple
terms in a string, Atlas Search also looks for a match for each term
in the string separately.
Matching each term separately seems like odd behaviour to me. We need multiple searches in our app, and for each we expect fewer results the more words you type, not more.
Example: When searching for "John Doe", I expect only results with both "John" and "Doe". Currently, I get results that match either "John" or "Doe".
Is this not possible using MongoDB Atlas Search, or am I doing something wrong?
Update
Currently, I have solved it by splitting the search-term on space (' ') and adding each individual keyword to a separate must-sub-clause (with the compound operator). However, then the search query no longer returns any results if there is one keyword with only one character. To account for that, I split keywords with one character from those with multiple characters.
The snippet below works, but for this I need to save two generated fields on each document:
searchString: a string with all the searchable fields concatenated, e.g. "John Doe Man Streetstreet Citycity"
searchArray: the above string uppercased & split on space (' ') into an array
const must = [];
const searchTerms = 'John D'.split(' ');
for (let i = 0; i < searchTerms.length; i += 1) {
if (searchTerms[i].length === 1) {
must.push({
regex: {
path: 'searchArray',
query: `${searchTerms[i].toUpperCase()}.*`,
},
});
} else if (searchTerms[i].length > 1) {
must.push({
autocomplete: {
query: searchTerms[i],
path: 'searchString',
fuzzy: {
maxEdits: 1,
prefixLength: 4,
maxExpansions: 20,
},
},
});
}
}
db.getCollection('someCollection').aggregate([
{
$search: {
compound: { must },
},
},
]).toArray();
Update 2 - Full example of unexpected behaviour
Create collection with following documents:
db.getCollection('testing').insertMany([{
"searchString": "John Doe ExtraTextHere"
}, {
"searchString": "Jane Doe OtherName"
}, {
"searchString": "Doem Sarah Thisistestdata"
}])
Create search index 'default' on this collection:
{
"mappings": {
"dynamic": false,
"fields": {
"searchString": {
"type": "autocomplete"
}
}
}
}
Do the following query:
db.getCollection('testing').aggregate([
{
$search: {
autocomplete: {
query: "John Doe",
path: 'searchString',
fuzzy: {
maxEdits: 1,
prefixLength: 4,
maxExpansions: 20,
},
},
},
},
]).toArray();
When a user searches for "John Doe", this query returns all the documents that have either "John" OR "Doe" in the path "searchString". In this example, that means all 3 documents. The more words the user types, the more results are returned. This is not the expected behaviour. I would expect more words to match fewer results because the search term gets more precise.
An edgeGram tokenization strategy might be better for your use case because it works left-to-right.
Try this index definition, taken from the docs:
{
"mappings": {
"dynamic": false,
"fields": {
"searchString": [
{
"type": "autocomplete",
"tokenization": "edgeGram",
"minGrams": 3,
"maxGrams": 10,
"foldDiacritics": true
}
]
}
}
}
Also, change your query clause from must to filter. That will exclude the documents that do not contain all the tokens.
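Put together, the suggestion could look roughly like the sketch below, which splits the user input on whitespace and adds one autocomplete clause per term to compound.filter. buildSearchStage is a hypothetical helper name. Note that with minGrams set to 3, terms shorter than three characters would probably not match and may still need the separate handling described in the question.

```javascript
// Build a $search stage that requires every whitespace-separated term to match.
function buildSearchStage(input, path = 'searchString') {
  const terms = input.split(/\s+/).filter((term) => term.length > 0);
  const filter = terms.map((term) => ({
    autocomplete: { query: term, path },
  }));
  return { $search: { compound: { filter } } };
}

const searchStage = buildSearchStage('John Doe');
console.log(JSON.stringify(searchStage, null, 2));

// In mongosh the stage could be used like this:
// db.getCollection('testing').aggregate([searchStage]).toArray();
```

An empty input produces an empty filter array, so the caller should check for that before running the query.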

RemoteTransportException, Fielddata is disabled on text fields when doing aggregation on text field

I am migrating from 2.x to 5.x
I am adding values to the index like this
indexInto (indexName / indexType) id someKey source foo
however I would also want to fetch all values by field:
def getValues(tag: String) = {
client execute {
search(indexName / indexType) query ("_field_names", tag) aggregations (termsAggregation("agg") field tag size 1)
}
}
But I am getting this exception :
RemoteTransportException[[8vWOLB2][172.17.0.5:9300][indices:data/read/search[phase/query]]];
nested: IllegalArgumentException[Fielddata is disabled on text fields
by default. Set fielddata=true on [my_tag] in order to load fielddata
in memory by uninverting the inverted index. Note that this can
however use significant memory.];
I thought of using keyword as shown here, but the fields are not known in advance (they are sent by the user), so I cannot use predefined mappings.
By default, all unknown fields (those not specified in the mappings) are indexed in Elasticsearch as text fields.
If you look at the mappings of such a field, you can see that a sub-field of type 'keyword' is defined under fields; it is indexed but not analyzed.
GET new_index2/_mappings
{
"new_index2": {
"mappings": {
"type": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
so you can use the keyword sub-field of a text field for aggregations, like the following:
POST new_index2/_search
{
"aggs": {
"NAME": {
"terms": {
"field": "name.keyword",
"size": 10
}
}
}
}
Note the field name.keyword.
So your Scala query can work if you switch to the keyword sub-field:
def getValues(tag: String) = {
client.execute {
search(indexName / indexType)
.query("_field_names", tag)
.aggregations {
termsAgg("agg", s"$tag.keyword")
}.size(1)
}
}
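For reference, the raw request body for a user-supplied field could be built as in the sketch below. It is written in JavaScript only to show the JSON shape. termsAggBody is a hypothetical helper, and it assumes the field was created by the default dynamic mapping and therefore has a keyword sub-field.

```javascript
// Build a search body that aggregates on the keyword sub-field of a dynamic text field.
// Assumption: the field was dynamically mapped, so `<tag>.keyword` exists.
function termsAggBody(tag, size = 10) {
  return {
    size: 0, // return only the aggregation, no hits
    query: { exists: { field: tag } },
    aggs: {
      agg: { terms: { field: `${tag}.keyword`, size } },
    },
  };
}

const body = termsAggBody('my_tag');
console.log(JSON.stringify(body, null, 2));
```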
Hope this helps.
Thanks

Get value by key of object field in a MongoDB document or default value

I have a collection in a MongoDB database. Each document has (among others) one object field that looks like this:
name: {
"en-US": "Foo",
"es-ES": "Bar"
...
}
The en-US key is guaranteed to exist, but any other keys/values are not.
Is there a way I can query the documents in such a way that the result would contain the field name and the value of that field would be either the language I passed (let it be es-ES) or if that key doesn't exist, the value of en-US?
I solved my problem with:
db.products.aggregate([
{
$project: {
short_id: 1,
name: {
$cond: {
if: {
$eq: ["$name.es-ES", undefined]
},
then: '$name.en-US',
else: '$name.es-ES'
}
}
}
}
])
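An alternative that avoids the explicit comparison is the $ifNull operator, which falls back to its second argument when the first is null or missing. A sketch, where localizedNameProject is a hypothetical helper that builds the stage for any requested language:

```javascript
// Build a $project stage that picks the requested language and
// falls back to en-US when that key is missing or null.
function localizedNameProject(lang, fallback = 'en-US') {
  return {
    $project: {
      short_id: 1,
      name: { $ifNull: [`$name.${lang}`, `$name.${fallback}`] },
    },
  };
}

const projectStage = localizedNameProject('es-ES');
console.log(JSON.stringify(projectStage, null, 2));

// In mongosh: db.products.aggregate([projectStage])
```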

What does it mean in JSON

{
"messageshow": [
{
"message_id": "497",
"message": "http://flur.p-sites.info/api/messages/voice/1360076234.caff",
"message_pic": "<UIImage: 0xa29e160>",
"uid": "44",
"created": "4 hours ago",
"username": "pari",
"first_name": "pp",
"last_name": "pp",
"profile_pic": "http://flur.p-sites.info/api/uploads/13599968121.jpg",
"tag_user": {
"tags": [
{
"message": "false"
}
]
},
"boos_list": {
"booslist": [
{
"message": "false"
}
]
},
"aplouds_list": {
"aploudslist": [
{
"message": "false"
}
]
},
"total_comments": 0,
"total_boos": 0,
"total_applouds": 0
},
{
"message_id": "496",
"message": "http://flur.p-sites.info/api/messages/voice/1360076182.caff",
"message_pic": "<UIImage: 0xa3b0610>",
"uid": "44",
"created": "4 hours ago",
"username": "pari",
"first_name": "pp",
"last_name": "pp",
"profile_pic": "http://flur.p-sites.info/api/uploads/13599968121.jpg",
"tag_user": {
"tags": [
{
"message": "false"
}
]
},
"boos_list": {
"booslist": [
{
"message": "false"
}
]
},
"aplouds_list": {
"aploudslist": [
{
"message": "false"
}
]
},
"total_comments": 0,
"total_boos": 0,
"total_applouds": 0
}
]
}
In this JSON most values are wrapped in double quotes, but a few are not quoted at all. What does that indicate?
A value shown without quotes is treated as a numeric value.
For JSON beginners:
JSON Syntax Rules
JSON syntax is a subset of the JavaScript object notation syntax:
Data is in name/value pairs
Data is separated by commas
Curly braces hold objects
Square brackets hold arrays
JSON data is written as name/value pairs.
A name/value pair consists of a field name (in double quotes), followed by a colon, followed by a value:
"firstName" : "John"
This is simple to understand, and is equivalent to the JavaScript statement:
firstName = "John"
JSON values can be:
A number (integer or floating point)
A string (in double quotes)
A Boolean (true or false)
An array (in square brackets)
An object (in curly brackets)
null
JSON Objects :
JSON objects are written inside curly brackets,
Objects can contain multiple name/values pairs:
{ "firstName":"John" , "lastName":"Doe" }
This is also simple to understand, and is equivalent to the JavaScript statements:
firstName = "John"
lastName = "Doe"
JSON Arrays :
JSON arrays are written inside square brackets.
An array can contain multiple objects:
{
"employees": [
{ "firstName":"John" , "lastName":"Doe" },
{ "firstName":"Anna" , "lastName":"Smith" },
{ "firstName":"Peter" , "lastName":"Jones" }
]
}
In the example above, the object "employees" is an array containing three objects. Each object is a record of a person (with a first name and a last name).
These are the basics of JSON.
For more detail, refer to this site.
Thanks
Values without double quotes are numbers, Booleans, or null.
Values starting with a [ square bracket are arrays.
Values starting with a { curly brace are JSON objects nested inside an attribute/value pair.
That depends on the type of the value. If the value is of a numerical type, it is written WITHOUT quotes.
If it is not a numerical type, it is written WITH quotes (for example strings, like most values in your example).
In addition to strings, JSON supports numerical values. So in this case the values without quotes are simply numbers.
They are numeric values. As per the JSON docs:
A value can be a string in double quotes, or a number, or true or
false or null, or an object or an array. These structures can be
nested.
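A quick way to check this is to parse a small piece of JSON and inspect the resulting types. In the sketch below, uid, total_comments and tags mirror the question's data, while tagged and extra are made-up keys added to show a Boolean and null:

```javascript
const parsed = JSON.parse(`{
  "uid": "44",
  "total_comments": 0,
  "tagged": false,
  "extra": null,
  "tags": [ { "message": "false" } ]
}`);

console.log(typeof parsed.uid);             // string  (quoted)
console.log(typeof parsed.total_comments);  // number  (unquoted)
console.log(typeof parsed.tagged);          // boolean (unquoted)
console.log(parsed.extra === null);         // true    (unquoted null)
console.log(Array.isArray(parsed.tags));    // true    (square brackets)
console.log(typeof parsed.tags[0].message); // string  ("false" in quotes is text, not a Boolean)
```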