MongoDB query with case-insensitive schema element

In my MongoDB collection I have added a record as follows
db.teacher.insert({_id:1 ,"name":"Kaushik"})
If I search
db.teacher.find({name:"Kaushik"})
I get one record. But if I try "NAME" instead of "name" i.e.
db.teacher.find({NAME:"Kaushik"})
It won't return any record.
This means I must know exactly how the schema element is spelled, including its case. Is there a way to write a query that ignores the case of the schema element (i.e. the field name)?
We can search the element value case-insensitively as follows:
> db.teacher.find({name:/kAUSHIK/i})
{ "_id" : 1, "name" : "Kaushik" }
Is there something similar for the schema element, something like:
> db.teacher.find({/NAME/i:"kaushik"})

We can search the element value using case insensitive [...]
Is there [something] similar for schema element [?]
No.
We may assume that JavaScript and JSON are case sensitive, and so are MongoDB queries.
That being said, internally MongoDB uses BSON, and the spec says nothing about case-sensitivity of keys. The BNF grammar only says that an element name is a NUL-terminated modified UTF-8 string:
e_name ::= cstring Key name
cstring ::= (byte*) "\x00" Zero or more modified UTF-8 encoded
characters followed by '\x00'. The
(byte*) MUST NOT contain '\x00', hence
it is not full UTF-8.
But from the source code (here or here, for example), it appears that MongoDB's BSON implementation uses strcmp to perform a binary comparison on element names, confirming there is no way to achieve what you want.
This might indeed be an issue beyond case sensitivity: because of combining characters, the same character can have several binary representations, and MongoDB does not perform Unicode normalization. For example:
> db.collection.insert({"é":1})
> db.collection.find({"é":1}).count()
1
> db.collection.find({"e\u0301":1}).count()
0
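The normalization caveat can be reproduced in plain JavaScript (a sketch, not mongo shell): object keys, like BSON element names, are compared as exact code-unit sequences.

```javascript
// Plain JavaScript sketch: keys are compared byte-for-byte, with no
// Unicode normalization applied on insertion or lookup.
const precomposed = "\u00e9";  // "é" as a single precomposed code point
const combining = "e\u0301";   // "e" followed by a combining acute accent

const doc = {};
doc[precomposed] = 1;

console.log(precomposed === combining);  // false: different code units
console.log(doc[combining]);             // undefined: no normalization on lookup
console.log(precomposed.normalize("NFC") === combining.normalize("NFC")); // true
```

A client application that wants these treated as equal has to normalize keys itself (for example with String.prototype.normalize) before writing or querying.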

This relates to the JavaScript engine and the JSON specification: in JavaScript, identifiers are case sensitive. This means you can have a document with two fields named "name" and "Name" (or "NAME"), and MongoDB treats them as two distinct fields.
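To illustrate, here is a client-side sketch (not a server feature, and findKeyIgnoreCase is a hypothetical helper): since keys are compared byte-for-byte, a case-insensitive key lookup has to scan every key itself.

```javascript
// Hypothetical client-side helper: find a field name ignoring case.
// MongoDB offers no such lookup; this only works on a fetched document.
function findKeyIgnoreCase(doc, name) {
  const target = name.toLowerCase();
  return Object.keys(doc).find(k => k.toLowerCase() === target);
}

const doc = { _id: 1, name: "Kaushik" };
console.log(findKeyIgnoreCase(doc, "NAME")); // "name"
console.log(findKeyIgnoreCase(doc, "age"));  // undefined
```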

You could use a regex like
db.teacher.find({name:/^kaushik$/i})
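Mongo shell regexes follow JavaScript semantics, so the effect of the anchors can be checked in plain JavaScript:

```javascript
// Anchors matter: /^...$/i requires the whole value to match,
// while an unanchored pattern matches any substring.
const anchored = /^kaushik$/i;
const unanchored = /kaushik/i;

console.log(anchored.test("Kaushik"));         // true
console.log(anchored.test("Kaushik Kumar"));   // false: whole-string match only
console.log(unanchored.test("Kaushik Kumar")); // true: substring match
```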

Related

Mongo: Is & char ignored in text index [duplicate]

So I have a document in a collection where one of the fields has the value "###".
I indexed the collection and tried running the query:
db.getCollection('TestCollection').find({$text:{$search:"\"###\""}})
But it didn't show the result
How can I work around this?
Sample Document:
{
"_id" : ObjectId("5b90dc6d3de8562a6ef7c409"),
"field" : "value",
"field2" : "###"
}
Text search is designed to index strings based on language heuristics. Text indexing involves two general steps: tokenizing (converting a string into individual terms of interest) followed by stemming (converting each term into a root form for indexing based on language-specific rules).
During the tokenizing step certain characters (for example, punctuation symbols such as #) are classified as word separators (aka delimiters) rather than text input and used to separate the original string into terms. Language-specific stop words (common words such as "the", "is", or "on" in English) are also excluded from a text index.
Since your search phrase of ### consists entirely of delimiters, there is no corresponding entry in the text index.
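The tokenizing step described above can be sketched in plain JavaScript (a rough approximation only; the real tokenizer is language-aware and also applies stemming and stop-word removal):

```javascript
// Rough sketch: split on runs of non-alphanumeric delimiters and drop
// empty terms. A string made entirely of delimiters yields no terms,
// so nothing for it ever reaches the text index.
function tokenize(s) {
  return s.split(/[^A-Za-z0-9]+/).filter(t => t.length > 0);
}

console.log(tokenize("value with # punctuation")); // ["value", "with", "punctuation"]
console.log(tokenize("###"));                      // []
```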
If you want to match generic string patterns, you should use regular expressions rather than text search. For example: db.getCollection('TestCollection').find({field2:/###/}). However, please note the caveats on index usage for regular expressions.
Your query has too many curly braces; remove them:
db.getCollection('so2').find({$text:{$search:"\"###\""}})
If you run it, Mongo tells you you're missing a text index. Add it like this:
db.so2.createIndex( { field2: "text" } )
The value you're using is pretty small. Try using longer values.

Mongo doesn't match string with strange space characters

I have two documents with a field named contact_name that contains the exact same first name and last name, but the space separating them seems to be different.
The input could be from any keyboard/device since it comes through an API(iOS app, android app, browser).
MongoDb version is 3.0.3
How could I make Mongo match any type of space?
One space is encoded as 0x20 and the other as 0xa0.
You could use a regex pattern:
db.collection.find({contact_name: {$regex: /firstName.*lastName/}})
And this may be better for matching blank spaces (it's not easy for me to test whether \s will match 0xa0 in MongoDB, but it promises to work just as well):
db.collection.find({contact_name: {$regex: /firstName\s+lastName/}})
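The \s behavior can at least be checked in plain JavaScript, whose regex semantics the mongo shell follows. Note this is an assumption for the server side: $regex is evaluated with PCRE there, where \s may not match U+00A0 unless Unicode matching is enabled.

```javascript
// In JavaScript, \s matches the non-breaking space U+00A0 as well as
// an ordinary space U+0020; a literal space in the pattern does not.
const flexible = /first\s+last/;
console.log(flexible.test("first last"));          // true: U+0020
console.log(flexible.test("first\u00a0last"));     // true: U+00A0
console.log(/first last/.test("first\u00a0last")); // false: literal space only
```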

Algolia tag not searchable when ending with special characters

I'm coming across a strange situation where I cannot search on string tags that end with a special character. So far I've tried ) and ].
For example, given a Fruit index with a record with a tag apple (red), if you query (using the JS library) with tagFilters: "apple (red)", no results will be returned even if there are records with this tag.
However, if you change the tag to "apple (red" (no longer ending with a special character), results will be returned.
Is this a known issue? Is there a way to get around this?
EDIT
I saw this FAQ on special characters. However, it seems that even if I set () as separator characters to index, that only affects the attributes that are directly searchable, not the tags. Is this correct? Can I change the separator characters to index on tags?
You should try using the array syntax for your tags:
tagFilters: ["apple (red)"]
The reason it is currently failing is because of the syntax of tagFilters. When you pass a string, it tries to parse it using a special syntax, documented here, where commas mean "AND" and parentheses delimit an "OR" group.
By the way, tagFilters is now deprecated for a much clearer syntax available with the filters parameter. For your specific example, you'd use it this way:
filters: '_tags:"apple (red)"'

How to use regex to include/exclude some input files in sc.textFile?

I am trying to filter files by date using Apache Spark's file-to-RDD function sc.textFile(). I attempted the following:
sc.textFile("/user/Orders/201507(2[7-9]{1}|3[0-1]{1})*")
This should match the following:
/user/Orders/201507270010033.gz
/user/Orders/201507300060052.gz
Any idea how to achieve this?
Looking at the accepted answer, it seems to use some form of glob syntax. It also reveals that the API is an exposure of Hadoop's FileInputFormat.
Searching reveals that paths supplied to FileInputFormat's addInputPath or setInputPath "may represent a file, a directory, or, by using glob, a collection of files and directories". Perhaps, SparkContext also uses those APIs to set the path.
The syntax of the glob includes:
* (matches zero or more characters)
? (matches a single character)
[ab] (character class)
[^ab] (negated character class)
[a-b] (character range)
{a,b} (alternation)
\c (escaped character)
Following the example in the accepted answer, it is possible to write your path as:
sc.textFile("/user/Orders/2015072[7-9]*,/user/Orders/2015073[0-1]*")
It's not clear how the alternation syntax can be used here, since the comma is used to delimit a list of paths (as shown above). According to zero323's comment, no escaping is necessary:
sc.textFile("/user/Orders/201507{2[7-9],3[0-1]}*")
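The glob subset used above can be sanity-checked with a toy glob-to-regex converter (an illustration only; it ignores escapes and nested braces, and is not Hadoop's actual implementation):

```javascript
// Toy converter for the glob subset above: '*' -> '.*', '?' -> '.',
// '{a,b}' -> '(a|b)'; character classes pass through unchanged.
// Not Hadoop's GlobPattern; no escape or nested-brace handling.
function globToRegExp(glob) {
  const body = glob
    .replace(/\./g, "\\.")
    .replace(/\*/g, ".*")
    .replace(/\?/g, ".")
    .replace(/\{([^}]+)\}/g, (_, alts) => "(" + alts.split(",").join("|") + ")");
  return new RegExp("^" + body + "$");
}

const re = globToRegExp("/user/Orders/201507{2[7-9],3[0-1]}*");
console.log(re.test("/user/Orders/201507270010033.gz")); // true
console.log(re.test("/user/Orders/201507300060052.gz")); // true
console.log(re.test("/user/Orders/201507260010033.gz")); // false: day 26 excluded
```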

In mongodb, need quotes around keys for CRUD operations, example: "_id" vs _id?

I'm reading over the MongoDB manual. Some examples have quotes around the keys, e.g. db.test.find({"_id" : 5}), and others don't, e.g. db.test.find({_id : 5}).
Both the quoted and unquoted versions work, but I'm wondering whether there is some nuanced difference here I don't know about, or whether one is the preferred best practice.
Thanks.
In JavaScript (the language of the MongoDB shell) those are treated exactly the same. The quotes are needed, however, when a key contains a period like when you're using dot notation to match against an embedded field as in:
db.test.find({"name.last": "Jones"})
My preference is to not use the quotes unless they're needed.
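The shell's behavior here is plain JavaScript object-literal semantics, which can be checked directly:

```javascript
// Quoted and unquoted keys create the same property; quotes are only
// required when the key is not a valid identifier (e.g. a dotted path).
const a = { _id: 5 };
const b = { "_id": 5 };
console.log(a._id === b._id); // true: same key either way

const query = { "name.last": "Jones" }; // unquoted name.last would be a syntax error
console.log(Object.keys(query)[0]);     // "name.last"
```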