MongoDB case insensitive key search - mongodb

I am able to query values without regard to case, but I would like to query keys case-insensitively, so users can type them in all lower case.
This doesn't work, because it is not valid JSON:
{
/^lastName$/i: "Jones"
}
Is there a strategy I could use for this, besides just making a new collection of keys as values?

There is currently no way to do this.
MongoDB is "schema-free" but that should not be confused with "doesn't have a schema". There's an implicit assumption that your code has some control over the names of the keys that actually appear in the system.
Let's flip the question around.
Is there a good reason that users are inserting case-sensitive keys?
Can you just cast all keys to lower-case when they're inserted?
Again, MongoDB assumes that you have some knowledge of the available keys. Your question implies that you have no knowledge of the available keys. You'll need to close this gap.
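One way to close that gap, as a minimal sketch rather than a definitive implementation, is to lower-case every key before a document is written and to lower-case whatever key the user types before querying. The helper name `lowercase_keys` is hypothetical, and the sketch assumes no two keys in the same document differ only by case (otherwise one would overwrite the other):

```python
def lowercase_keys(value):
    """Return a copy of value with every dict key lower-cased, recursively."""
    if isinstance(value, dict):
        return {str(k).lower(): lowercase_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [lowercase_keys(item) for item in value]
    return value

# Normalize a document before inserting it
doc = lowercase_keys({"lastName": "Jones", "Address": {"City": "Leeds"}})

# Normalize the key the user typed before building the query
query = {"LASTNAME".lower(): "Jones"}
```

With keys stored this way, an ordinary exact-match query on the lower-cased key is enough.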

Related

Why does Map addition of duplicates only take the last key's value

Per this question:
Best way to merge two maps and sum the values of same key?
I would need to use scalaz to get what I want, however I am curious if anybody knew why the below does not work as I expect?
Map(1->2.0)+(1->1.0) //Map(1->1.0)
I would expect this to result in Map(1->3.0). But, it appears that maps only return the last key as shown by:
Map(1->1.0, 1->3.0) //Map(1->3.0)
So, based on the documentation
Adds two or more elements to this collection and returns a new collection.
and the above, my guess is that the map might store the values, but only return the last item? This is just not my intuition of what the add should do...maybe it is an efficiency move.
Once I have more of a moment, I will take a look at the code and try to figure it out from there, but wanted to ask here in case somebody already knew?
It has nothing to do with efficiency; it's typing. Map plus elements returns a map of compatible type. You don't know the type, so you can't know to add numbers. You could list them instead, but Seq(2.0,1.0) is not a supertype of 2.0. So you'd end up having a map to Any, which doesn't help you out at all when it comes to keeping your types straight, and you wouldn't have any way to replace an existing element with another.
So, + adds a new element if the key does not exist, or replaces the existing element if the key does exist. (The docs should say so, though.)
If you want the "other" behavior, you need a more complex transformation, and that's what Scalaz' |+| will do for you for the natural addition on the elements.
@RexKerr's answer is perfectly correct, but I think it doesn't emphasise the crucial misunderstanding here.
The + operation on a Map means put - it puts a new key/value pair into the map (potentially replacing the existing pair for that key). It doesn't have anything to do with addition (and Rex's answer explains further how it cannot necessarily have anything to do with addition). You seem to come from a C# background, so you should think of:
myMap + (1 -> 1.0)
as being
myMap[1] = 1.0
The ability to insert a new key/value pair is a fundamental operation of a Map/Dictionary datatype. The ability you want to encode is something much less fundamental (and a special case for a more general ability to merge maps by key, as is mentioned in the question you reference, and here).
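The same distinction can be sketched in Python terms (an analogy only, since the question is about Scala; `put` and `merge_sum` are hypothetical names): putting a pair replaces the value for an existing key, while summing by key is a separate merge step.

```python
def put(mapping, key, value):
    """Return a new dict with key set to value, replacing any existing entry."""
    updated = dict(mapping)
    updated[key] = value
    return updated

def merge_sum(left, right):
    """Return a new dict whose values are summed for keys present in both."""
    merged = dict(left)
    for key, value in right.items():
        merged[key] = merged.get(key, 0) + value
    return merged

replaced = put({1: 2.0}, 1, 1.0)        # the old value for key 1 is replaced
summed = merge_sum({1: 2.0}, {1: 1.0})  # values for key 1 are added together
```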

Scalar values vs array in MongoDB

Consider the following two collections and the notes that follow. Which one do you think is more appropriate?
// #1
{x:'a'}
{x:'b'}
{x:'c'}
{x:['d','e']}
{x:'f'}
//#2
{x:['a']}
{x:['b']}
{x:['c']}
{x:['d','e']}
{x:['f']}
Some facts:
Field x usually has only one value (95%) and sometimes more (5%).
MongoDB treats {x:['a']} like {x:'a'} when querying.
MongoVUE shows scalar values in #1 directly and shows Array[0] for #2.
Using #1, when you want to append a new value you have to convert the data type.
#1 may be a little faster in some CRUD operations (?)
To amplify @ZaidMasud's point, I recommend staying with either scalars or arrays and not mixing both. If you have unavoidable reasons for having both (legacy data, say) then I recommend that you get very familiar with how Mongo queries work with arrays; it is not intuitive at first glance. See for example this puzzler.
From a schema design perspective, even though MongoDB allows you to store different data types for a key value pair, it's not necessarily a good idea to do so. If there is no compelling reason to use different data types, it's often best to use the same datatype for a given key/value pair.
So given that reasoning, I would prefer #2. Application code will generally be simpler in this case. Additionally, if you ever need to use the Aggregation Framework, you will find it useful to have uniform types.
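As a rough illustration of why uniform types keep application code simpler, here is a minimal sketch that converts mixed documents (style #1) into the all-array form (style #2). The helper names are hypothetical and the documents are plain in-memory dicts, not a live collection:

```python
def as_list(value):
    """Wrap a scalar in a list; copy an existing list."""
    return list(value) if isinstance(value, list) else [value]

def normalize_x(doc):
    """Return a copy of doc whose 'x' field is always a list."""
    return {**doc, "x": as_list(doc["x"])}

mixed = [{"x": "a"}, {"x": ["d", "e"]}]
uniform = [normalize_x(d) for d in mixed]

# With uniform arrays, appending needs no type check or conversion
uniform[0]["x"].append("z")
```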

Use optional keys or a catch-all key in MongoMapper?

Suppose I'm working on a MongoMapper class that looks like this:
class Animal
  include MongoMapper::Document
  key :type, String, :required => true
  key :color, String
  key :feet, Integer
end
Now I want to store a bird's wingspan. Would it be better to add this, even though it's irrelevant for many documents and feels a bit untidy:
key :wingspan, Float
Or this, even though it's a nondescript catch-all that feels like a hack:
key :metadata, Hash
It seems like the :metadata approach (for which there's precedent in the code I'm inheriting) is almost redundant to the Mongo document as a whole: they're both intended to be schemaless buckets of key-value pairs.
However, it also seems like adding animal-specific keys is a slippery slope to a pretty ugly model.
Any alternatives (create a Bird subclass)?
MongoMapper doesn't store keys that are nil, so if you did define key :wingspan only the documents that actually set that key would store it.
If you opt not to define the key, you can still set/get it with my_bird[:wingspan] = 23. (The [] call will actually automatically define a key for you; similarly if a doc comes back from MongoDB with a key that's not explicitly defined a key will be defined for it and all docs of that class--it's kind of a bug to define it for the whole class but since nil keys aren't stored it's not so much of a problem.)
If bird has its own behavior as well (it probably does), then a subclass makes sense. For birds and animals I would take this route, since every bird is an animal. MongoDB is much nicer than ActiveRecord for Single Table/Single Collection Inheritance, because you don't need a billion migrations and your code makes it clear which attributes go with which classes.
It's hard to give a good answer without knowing how you intend to extend the database in the future and how you expect to use the information you store. If you were storing large numbers of birds and wanted to summarize on wingspan, then wingspan would be helpful even if it would be unused for other animals. If you plan to store random arbitrary information for every known animal, there are too many possibilities to try to track in a schema and the metadata approach would be more usable.

MongoDB: What's a good way to get a list of all unique tags?

What's the best way to keep track of unique tags for a collection of documents millions of items large? The normal way of doing tagging seems to be indexing multikeys. I will frequently need to get all the unique keys, though. I don't have access to mongodb's new "distinct" command, either, since my driver, erlmongo, doesn't seem to implement it, yet.
Even if your driver doesn't implement distinct, you can implement it yourself. In JavaScript (sorry, I don't know Erlang, but it should translate pretty directly) you can say:
result = db.$cmd.findOne({"distinct" : "collection_name", "key" : "tags"})
So, that is: you do a findOne on the "$cmd" collection of whatever database you're using. Pass it the collection name and the key you want to run distinct on.
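As a sketch of what that command looks like from another language, the following Python builds the same command document and emulates, on in-memory data, the kind of result the server returns for an array-valued tags field. The function names are hypothetical, and sending the document to the server is left to whatever raw-command facility the driver offers.

```python
def distinct_command(collection, key):
    """Build the command document; the command name must be the first key."""
    return {"distinct": collection, "key": key}

def emulate_distinct(docs, key):
    """Collect unique values of key, flattening array values as distinct does."""
    seen = []
    for doc in docs:
        value = doc.get(key, [])
        for item in (value if isinstance(value, list) else [value]):
            if item not in seen:
                seen.append(item)
    return seen

command = distinct_command("collection_name", "tags")
tags = emulate_distinct(
    [{"tags": ["a", "b"]}, {"tags": ["b", "c"]}, {"tags": "d"}], "tags"
)
```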
If you ever need a command your driver doesn't provide a helper for, you can look at http://www.mongodb.org/display/DOCS/List+of+Database+Commands for a somewhat complete list of database commands.
I know this is an old question, but I had the same issue and could not find a real solution in PHP for it.
So I came up with this:
http://snipplr.com/view/59334/list-of-keys-used-in-mongodb-collection/
John, you may find it useful to use Variety, an open source tool for analyzing a collection's schema: https://github.com/jamescropcho/variety
Perhaps you could run Variety every N hours in the background, and query the newly-created varietyResults database to retrieve a listing of unique keys which begin with a given string (i.e. are descendants of a specific parent).
Let me know if you have any questions, or need additional advice.
Good luck!

Hashes vs Numeric id's

When creating a web application that somehow displays a unique identifier for a recurring entity (videos on YouTube, or book sections on a site like mine), would it be better to use a uniform-length identifier like a hash, or the unique key of the item in the database (1, 2, 3, etc.)?
Besides revealing a little information about the internals of your app (which I think is immaterial), why would using a hash be better than just using the unique id?
In short: Which is better to use as a publicly displayed unique identifier - a hash value, or a unique key from the database?
Edit: I'm opening up this question again because Dmitriy brought up the good point of not tying down the naming to db specific property. Will this sort of tie down prevent me from optimizing/normalizing the database in the future?
The platform uses php/python with ISAM /w MySQL.
Unless you're trying to hide the state of your internal object ID counter, hashes are needlessly slow (to generate and to compare), needlessly long, needlessly ugly, and needlessly capable of colliding. GUIDs are also long and ugly, making them just as unsuitable for human consumption as hashes are.
For inventory-like things, just use a sequential (or sharded) counter instead. If you migrate to a different database, you will just have to initialize the new counter to a value at least as large as your largest existing record ID. Pretty much every database server gives you a way to do this.
If you are trying to hide the state of your counter, perhaps because you're counting users and don't want competitors to know how many you have, I suggest avoiding the display of your internal IDs. If you insist on displaying them and don't want the drawbacks of a hash, you might consider using a maximal-period linear feedback shift register to generate IDs.
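A minimal sketch of that last idea, assuming 16-bit IDs are enough: a Galois LFSR with the tap mask 0xB400 (a commonly cited maximal-period configuration) visits every non-zero 16-bit value exactly once before repeating, so the IDs are unique but not in counting order. The function names and the seed are illustrative only.

```python
def lfsr16_next(state):
    """Advance a 16-bit Galois LFSR (tap mask 0xB400) by one step."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state

def lfsr_ids(seed=0xACE1):
    """Yield each state once, stopping when the sequence returns to the seed."""
    state = seed
    while True:
        yield state
        state = lfsr16_next(state)
        if state == seed:
            return

ids = list(lfsr_ids())
```

Anyone who knows the tap mask can reconstruct the sequence, so this only obscures the counter casually; it is not a security measure.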
I typically use hashes if I don't want the user to be able to guess the next ID in the series. But for your book sections, I'd stick with numerical id's.
Using hashes is preferable in case you need to rebuild your database for some reason, for example, and the ordering changes. The ordinal numbers will move around -- but the hashes will stay the same.
Not relying on the order you put things into a box, but on properties of the things, just seems.. safer.
But watch out for collisions, obviously.
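To make "properties of the things" concrete, here is a hedged sketch that derives an ID from a record's own fields instead of its insertion order. The choice of fields (title and author), the normalization, and the 12-character truncation are all assumptions for illustration; a shorter ID raises the chance of the collisions mentioned above.

```python
import hashlib

def content_id(title, author, length=12):
    """Derive a stable ID from normalized properties of the record."""
    # Join with a separator unlikely to occur in the fields themselves
    canonical = "\x1f".join([title.strip().lower(), author.strip().lower()])
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]

first = content_id("Moby-Dick", "Herman Melville")
again = content_id("  moby-dick ", "HERMAN MELVILLE")  # same book, re-imported
other = content_id("Moby-Dick", "H. Melville")         # different properties
```

Rebuilding the database in a different order leaves such IDs unchanged, but editing any of the hashed fields changes the ID.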
With hashes you:
Are free to merge the database with a similar one (or a backup), if necessary.
Are not doing something that could help guessing attacks, even a bit.
Are not disclosing more private information about the user than necessary; e.g. if somebody sees user number 2 log in to your current database, they learn that he is an oldie.
Are (provided that you use a long hash or a GUID) greatly helping yourself in case you're bought by YouTube and they decide to integrate your databases.
Are helping yourself in case a search engine appears that indexes by GUID.
Please let us know if the last 6 months brought you some clarity on this question...
Hashes aren't guaranteed to be unique, nor, I believe, consistent.
Will your users have to remember/use the value? Or are you looking at it from a security POV?
From a security perspective, it shouldn't matter - since you shouldn't just be relying on people not guessing a different but valid ID of something they shouldn't see in order to keep them out.
Yeah, I don't think you're looking for a hash - you're more likely looking for a Guid. If you're on the .Net platform, try System.Guid.
However, the most important reason not to use a Guid is for performance. Doing database joins and lookups on (long) strings is very suboptimal. Numbers are fast. So, unless you really need it, don't do it.
Hashes have the advantage that you can check if they are valid or not BEFORE performing any check to your database whether they exist or not. This can help you to fend off attacks with random hashes as you don't need to burden your database with fake lookups.
Therefore, if your hash has some kind of well-defined format, for example with a checksum at the end, you can check if it's correct without needing to go to the database.
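One possible shape for such a format, sketched under assumptions: the public ID is a body followed by a short keyed checksum, so malformed or random IDs can be rejected without a database lookup. The key, the checksum length, and the function names are hypothetical.

```python
import hashlib
import hmac

SECRET = b"hypothetical-server-side-key"  # assumption: kept private on the server
CHECK_LEN = 4  # hex characters of checksum appended to each ID

def _checksum(body):
    """Return the first CHECK_LEN hex characters of an HMAC-SHA256 of body."""
    digest = hmac.new(SECRET, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:CHECK_LEN]

def make_public_id(body):
    """Append the checksum to the body to form the public ID."""
    return body + _checksum(body)

def is_well_formed(public_id):
    """Check the trailing checksum without touching the database."""
    if len(public_id) <= CHECK_LEN:
        return False
    body, check = public_id[:-CHECK_LEN], public_id[-CHECK_LEN:]
    return hmac.compare_digest(check.encode("utf-8"), _checksum(body).encode("utf-8"))

public_id = make_public_id("9f2c1a")
tampered = public_id[:-1] + ("0" if public_id[-1] != "0" else "1")
```

A four-character checksum still lets roughly one random guess in 65,536 through, so the database check remains the final authority.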