Mongo field name change not updating indexes - mongodb

While I understand a foreign key constraint would not make sense for a NoSQL database, should it not ensure that it updates the indexes if it allows me to rename fields?
http://www.mongodb.org/display/DOCS/Updating#Updating-%24rename
{ $rename : { old_field_name : new_field_name } }
but if I had
db.mycollections.ensureIndex({old_field_name:1});
wouldn't it be great if the index was updated automatically?
Is it that, since system.indexes is just another table and such an automatic update would imply a foreign key constraint of sorts, the index update is not done? Or am I missing certain flags?

It doesn't do it.
The answer to your question "wouldn't it be great if the index was updated automatically?" is, "no, not really".
If you think that renaming fields is a good idea, you can add the new index yourself at the same time. A rename will likely force lots of other changes in your code (queries, updates, map reduce operations, ...), so why single out index recreation as the one thing that should happen automatically on what is a very rare operation, when it's just one of many things you'd need to do manually?
If you care about this feature, go request it. 10gen are incredibly responsive to suggestions, but I wouldn't be surprised if the answer was "why is this important?"

Quoting Mike O' Brien:
The $rename operator is analogous to doing a $set/$unset in a single atomic operation. It's a shortcut for situations where you need to take a value and move it into another field, without the need to do it in 2 steps (one to fetch the value of the field, another to set the new one).
Doing a $rename does mean the data is changing. If I use $rename to rename a field named "x" to "y", but the field named "y" already existed in the document, the old value for "y" is overwritten, and the field "x" will no longer exist anymore. If either "x" or "y" is indexed, then the operation will update those indexes to reflect the final values resulting from the operation. The same applies when using rename to move a field from within an embedded document up to the top level (e.g. renaming "a.b" to "c") or vice versa.
With the behavior suggested in the SO question (i.e., renaming a field maintains the relationship between the field it was moved to and its value in the index), things can get really confusing, and it becomes difficult to reason about what the "correct" expected behavior is for certain operations. For example, if I create an index on field "A" in a collection, rename "A" to "B" in one of the documents, and then do things like:
update({"A" : }, {"$set":{"B":}}) // find a document where A= and set its value of B to
update({"B" : }, {"$set":{"A":}}) // find a document where B= and set its value of A to
Should these be equivalent?
In general, having the database maintain indexes across a collection by field name is a design decision that keeps behavior predictable and simple.
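To make the field-name point concrete, here is a minimal in-memory sketch in plain JavaScript. It is not the MongoDB API and not how the server is implemented; the helpers `renameField` and `buildIndex` are hypothetical names made up for illustration. It only models the idea that an index is tied to a field name, so after a rename the index on the old name has nothing left to hold and the index on the new name is something you create yourself.

```javascript
// In-memory model only: this is not how MongoDB is implemented.

// Mimics $rename as a $set of the new name plus an $unset of the old one.
// An existing value under `to` is overwritten, as described above.
function renameField(doc, from, to) {
  if (!(from in doc)) return doc; // nothing to rename
  doc[to] = doc[from];
  delete doc[from];
  return doc;
}

// Builds a toy index: value of `field` -> list of documents having that field.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    // Simplification: documents without the field are skipped here
    // (a real non-sparse index would store them under a null key).
    if (!(field in doc)) continue;
    const key = doc[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc);
  }
  return index;
}

const docs = [
  { _id: 1, old_field_name: "a" },
  { _id: 2, old_field_name: "b" },
];
docs.forEach((d) => renameField(d, "old_field_name", "new_field_name"));

const oldIndex = buildIndex(docs, "old_field_name"); // empty: no document has the field
const newIndex = buildIndex(docs, "new_field_name"); // the index you would add yourself
console.log(oldIndex.size, newIndex.size);
```

In the real shell, the manual step corresponds to creating an index on the new field name and dropping the one on the old name.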

Related

MongoDB: how to persist selected document from collection

I'm new to MongoDB and I'm not sure how to best solve my fairly basic problem.
I have a Collection of "emoji" Documents in my database. At any given time, there is one (and only one) "selected" emoji Document. This is determined and updated by the application. How can I persist the information of which one is selected to the database?
Approach 1:
Add a new Collection to hold this kind of metadata of the emoji collection? I'm thinking it would hold a single document with a reference to the currently selected emoji document. This seems to hurt the OO design. A whole collection, with a single document, to hold a single property. But it does have flexibility to add more metadata.
Approach 2:
Add a new boolean field to each emoji Document indicating whether or not it is the current selected emoji. This seems like a lot of extra info to track for each Document, when only one should have a true value. I would also be concerned with maintaining consistency.
I know I'm not the first person to have this issue, but I couldn't find a solution for this as a general case. Thanks!
MongoDB is schemaless, so you can just add the boolean field to the currently selected emoji and remove it when the selection changes. You should add a sparse unique index to make querying this field faster. You could set the field using this syntax:
db.emojis.update({name:"b"},{$set:{selected:true}})
And simply unset it like this:
db.emojis.update({name:"b"},{$unset:{selected:""}})
You could create the following sparse unique index to ensure there is only ever one document with selected:true
db.emojis.createIndex( { selected: 1 } , { sparse: true, unique: true } )
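As a rough illustration of why the sparse and unique options work together here, the following plain JavaScript sketch models such an index in memory. It is not MongoDB's implementation; the class `SparseUniqueIndex` and its methods are hypothetical. Documents without the field never enter the index, so only the one document carrying selected:true can occupy the single slot.

```javascript
// Toy model of a sparse, unique index on one field (not MongoDB's implementation).
class SparseUniqueIndex {
  constructor(field) {
    this.field = field;
    this.entries = new Map(); // indexed value -> _id of the owning document
  }

  // Models $set on the indexed field; throws if another document holds the value.
  add(doc, value) {
    const owner = this.entries.get(value);
    if (owner !== undefined && owner !== doc._id) {
      throw new Error(`duplicate key: ${this.field}=${value}`);
    }
    this.entries.set(value, doc._id);
    doc[this.field] = value;
  }

  // Models $unset: the document drops out of the sparse index entirely.
  remove(doc) {
    if (this.field in doc) {
      this.entries.delete(doc[this.field]);
      delete doc[this.field];
    }
  }
}

const emojis = [
  { _id: 1, name: "a" },
  { _id: 2, name: "b" },
];
const selectedIndex = new SparseUniqueIndex("selected");

selectedIndex.add(emojis[1], true); // select "b"

let rejected = false;
try {
  selectedIndex.add(emojis[0], true); // second selection without unsetting the first
} catch (err) {
  rejected = true; // the unique constraint refuses a second selected:true
}

// Changing the selection: unset the old one first, then set the new one.
selectedIndex.remove(emojis[1]);
selectedIndex.add(emojis[0], true);
console.log(rejected, emojis);
```

Note that the unset and the set are two separate updates on two documents, so there is a short window in which no emoji is selected.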

Mongodb id on bulk insert performance

I have a class/object that has a GUID, and I want to use that field as the _id when it is saved to MongoDB. Is it possible to use another value instead of the ObjectId?
Are there any performance considerations when doing a bulk insert with a custom _id field? Is _id indexed? If I set the _id to a different field, would it slow down the bulk insert? I'm inserting about 10 million records.
1) Yes you can use that field as the id. There is no mention of what API (if any) you are using for inserting the documents. So if you would do the insertion at the command line, the command would be:
db.collection.insert({_id : <BSONString_version_of_your_guid_value>, field1 : value1, ...});
It doesn't have to be a BSON string. Change it to whatever BSON type most closely matches your GUID's original type (except the array type; arrays aren't allowed as the value of the _id field).
2) As far as I know, providing your own ids does affect the performance of db.collection.insert, especially in bulk, BUT if the ids are sorted, there shouldn't be a performance loss. The reason, quoting:
The structure of index is a B-tree. ObjectIds have an excellent
insertion order as far as the index tree is concerned: they are always
increasing, meaning they are always inserted at the right edge of
B-tree. This, in turn, means that MongoDB only has to keep the right
edge of the B-Tree in memory.
Conversely, a random value in the _id field means that _ids will be
inserted all over the tree. Then the machine must move a page of the
index into memory, update a tiny piece of it, then probably ignore it
until it slides out of memory again. This is less efficient.
:from the book `50 Tips and Tricks for MongoDB Developers`
The tip's title says - "Override _id when you have your own simple, unique id." Clearly it is better to use your id if you have one and you don't need the properties of an ObjectId. And it is best if your ids are increasing for the reason stated above.
3) There is a default index on _id field by MongoDB.
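The B-tree argument quoted above can be illustrated with a small simulation in plain JavaScript. This is a toy model under stated assumptions, none of which come from MongoDB itself: leaf pages hold a fixed number of keys, key k lives on page floor(k / PAGE_SIZE), and memory is a small LRU cache of pages. It counts how often a page has to be loaded when keys arrive in increasing order versus shuffled order.

```javascript
// Toy model: count how often an index "page" must be loaded into a small cache.
// Assumptions (not from MongoDB): fixed-size leaf pages, an LRU cache, and
// key k living on page floor(k / PAGE_SIZE).
const PAGE_SIZE = 100; // keys per leaf page
const CACHE_PAGES = 4; // pages that fit in memory
const N = 10000; // number of inserted keys

function countPageLoads(keys) {
  const cache = new Map(); // Map keeps insertion order, used here as an LRU list
  let loads = 0;
  for (const key of keys) {
    const page = Math.floor(key / PAGE_SIZE);
    if (cache.has(page)) {
      cache.delete(page); // re-inserted below to refresh its recency
    } else {
      loads += 1;
      if (cache.size >= CACHE_PAGES) {
        cache.delete(cache.keys().next().value); // evict least recently used page
      }
    }
    cache.set(page, true);
  }
  return loads;
}

// Deterministic Fisher-Yates shuffle driven by a small linear congruential
// generator, so repeated runs give the same order.
function shuffled(keys, seed) {
  const out = keys.slice();
  let state = seed;
  for (let i = out.length - 1; i > 0; i--) {
    state = (state * 1664525 + 1013904223) % 4294967296;
    const j = Math.floor((state / 4294967296) * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

const increasing = Array.from({ length: N }, (_, i) => i);
const sequentialLoads = countPageLoads(increasing); // one load per page
const randomLoads = countPageLoads(shuffled(increasing, 42)); // many repeated loads
console.log(sequentialLoads, randomLoads);
```

With increasing keys every page is loaded exactly once (N / PAGE_SIZE loads), while the shuffled order keeps pulling pages back in after they were evicted.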
So...
1) Yes. It is possible to use types other than ObjectId, including a GUID, which will be saved as BinData.
2) Yes, there are considerations. It's better if your _id is always increasing (like a growing number, or ObjectId); otherwise inserts land all over the index and more of it has to be kept in memory. If you plan on using sharding, the _id should also be hashed evenly.
3) _id indeed has an index automatically.
4) It depends on the type you choose. See point 2.
Conclusion: It's better to keep using ObjectId unless you have a good reason not to.

Does updating a MongoDB record rewrite the whole record or only the updated fields?

I have a MongoDB collection as follows:
comment_id (number)
comment_title (text)
score (number)
time_score (number)
final_score (number)
created_time (timestamp)
Score is an integer that's usually updated using $inc with 1 or -1 whenever someone votes up or down for that record.
But time_score is updated using a function of the timestamp, the current time, and other factors such as how many whole days and whole weeks have passed, etc.
So I do $inc (with 1 or -1) on the db directly for score, but for time_score I retrieve the data from the db, calculate the new score, and write it back. What I'm worried about is that if many users increment the "score" field during my calculation of time_score, then writing time_score to the db will corrupt the last value of score.
To be more clear: does updating specific fields in a record in Mongo rewrite the whole record or only the updated fields? (Assume that all these fields are indexed.)
By default, whole documents are rewritten. To specify the fields that are changed without modifying anything else, use the $set operator.
Edit: The comments on this answer are correct - any of the update modifiers will cause only relevant fields to be rewritten rather than the whole document. By "default", I meant a case where no special modifiers are used (a vanilla document is provided).
The algorithm you are describing is definitely not thread-safe.
When you read the entire document, change one field and then write back the entire document, you are creating a race condition - any field in the document that is modified after your read but before your write will be overwritten by your update.
That's one of many reasons to use $set or $inc operators to atomically set individual fields rather than updating the entire document based on possibly stale values in it.
Another reason is that setting/updating a single field "in-place" is much more efficient than writing the entire document. In addition, you put less load on your network when you pass a smaller update document ({$set:{field:value}}) rather than an entire new version of the document.
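The race described above can be replayed deterministically with a plain JavaScript sketch. The `store` object and the helpers `replaceDocument`, `setField`, and `incField` are hypothetical in-memory stand-ins, loosely modelled on a whole-document update, $set, and $inc; they are not the MongoDB API. Both scenarios use the same interleaving: a vote arrives between the worker's read and its write.

```javascript
// In-memory stand-in for a collection holding a single comment document.
const store = { 1: { comment_id: 1, score: 10, time_score: 0 } };

// Whole-document replace: everything in the stored document is overwritten.
function replaceDocument(id, newDoc) {
  store[id] = { ...newDoc };
}

// Field-level updates, modelled on $set and $inc: only the named field changes.
function setField(id, field, value) {
  store[id][field] = value;
}
function incField(id, field, amount) {
  store[id][field] += amount;
}

// Scenario 1: read, compute, write the whole document back.
const stale = { ...store[1] }; // worker reads the document (score is 10)
incField(1, "score", 1); // meanwhile a user votes: stored score becomes 11
stale.time_score = 42; // worker finishes its calculation on the stale copy
replaceDocument(1, stale); // the vote is lost: stored score is 10 again
const scoreAfterReplace = store[1].score;

// Scenario 2: same interleaving, but the worker only sets its own field.
store[1] = { comment_id: 1, score: 10, time_score: 0 };
incField(1, "score", 1); // the vote
setField(1, "time_score", 42); // the worker touches time_score only
const scoreAfterSet = store[1].score; // the vote survives

console.log(scoreAfterReplace, scoreAfterSet); // prints: 10 11
```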

The fastest way to show Documents with a certain property first in MongoDB

I have collections with a huge number of Documents on which I need to run custom searches with various queries.
Each Document has a boolean property. Let's call it "isInTop".
I need to show Documents which have this property first in all queries.
Yes, I can easily sort on this field like:
.sort( { isInTop: -1 } );
And create a proper index with the field "isInTop" as the last field in it. But this will work slowly, as indexes in Mongo work best with unique fields.
So is there a solution to show Documents with the field "isInTop" on top of each query?
I see two solutions here.
First: give the Documents which need to be on top an _id from the "future". As you know, ObjectId contains a timestamp. So I can create an ObjectId with a timestamp from the future and use natural order.
Second: create a separate collection for the Documents which need to be on top, and query it first.
Are there any other solutions for this problem? Which will work faster?
UPDATE
I have solved this issue by sorting on a custom field which represents rank.
Using the _id field trick you mention has the problem that at some point in time you will reach the special time, and you can't change the _id field (without inserting a new document and removing the old one).
Creating a special collection which just holds the ones you care about is probably the best option. It gives you the ability to logically (and to some extent, physically) separate the documents.
MongoDB has also recently introduced support for "sparse" indexes, which may fulfill your needs as well. You could set the "isInTop" field only when you want a document to be special, and then create a sparse index on it, which would not have the problems you would normally have with a single indexed boolean field (in B-trees).
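The separate-collection option can be sketched in plain JavaScript as follows. Everything here is an assumption for illustration: `topDocs` and `allDocs` are in-memory arrays standing in for two collections, the top documents are assumed to also exist in the main collection, and `findWithTopFirst` is a hypothetical helper rather than a MongoDB call.

```javascript
// Hypothetical in-memory "collections"; in MongoDB these would be two collections.
const topDocs = [{ _id: 7, title: "pinned" }];
const allDocs = [
  { _id: 1, title: "first" },
  { _id: 7, title: "pinned" },
  { _id: 9, title: "pinned later" },
];

// Runs the same filter against both collections and puts the top matches first.
function findWithTopFirst(filter, limit) {
  const top = topDocs.filter(filter);
  const topIds = new Set(top.map((d) => d._id));
  // Skip documents already returned from the top collection.
  const rest = allDocs.filter((d) => filter(d) && !topIds.has(d._id));
  return top.concat(rest).slice(0, limit);
}

const results = findWithTopFirst((d) => d.title.includes("pinned"), 10);
console.log(results.map((d) => d._id)); // prints: [ 7, 9 ]
```

The cost of this design is a second query per search and some application-side merging, in exchange for keeping the special documents small and separate.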

MongoDB: Speed of field ("inside record") search in comparison with speed of search in "global scope"

My question may not be very well formulated because I haven't worked with MongoDB yet, but I'd like to know one thing.
I have an object (record/document/anything else) in my database - in global scope.
And I have a really huge array of other objects inside this object.
So, what about the speed of search in global scope vs. search "inside" the object? Is it possible to index all "inner" records?
Thanks in advance.
So, like this
users: {
  ..
  user_maria:
  {
    age: "18",
    best_comments :
    {
      goodnight:"23rr",
      sleeptired:"dsf3"
      ..
    }
  }
  user_ben:
  {
    age: "18",
    best_comments :
    {
      one:"23rr",
      two:"dsf3"
      ..
    }
  }
}
So, how can I make it fast to find user_maria->best_comments->goodnight (index context of collections "best_comment") ?
First of all, your example schema is very questionable. If you want to embed comments (which is a big if), you'd want to store them in an array for appropriate indexing. Also, post your schema in JSON format so we don't have to parse the whole name/value thing :
db.users {
  name: "maria",
  age: 18,
  best_comments: [
    {
      title: "goodnight",
      comment: "23rr"
    },
    {
      title: "sleeptired",
      comment: "dsf3"
    }
  ]
}
With that schema in mind you can put an index on name and best_comments.title for example like so :
db.users.ensureIndex({name:1, 'best_comments.title':1})
Then, when you want the query you mentioned, simply do
db.users.find({name:"maria", 'best_comments.title':"goodnight"})
And the database will hit the index and will return this document very fast.
Now, all that said, your schema is very questionable. You mention you want to query specific comments, but that requires either the comments being in a separate collection or filtering the comments array app-side. Additionally, having huge, ever-growing embedded arrays in documents can become a problem. Documents have a 16 MB limit, and if documents keep increasing in size, Mongo will have to continuously move them on disk.
My advice :
Put comments in a separate collection
Either do a document per comment or make comment bucket documents (say, 100 comments per document)
Read up on Mongo/NoSQL schema design. You always query for root documents so if you end up needing a small part of a large embedded structure you need to reexamine your schema or you'll be pumping huge documents over the connection and require app-side filtering.
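The bucket idea from the advice above could look roughly like this in plain JavaScript. It is only a sketch: the helper `toBuckets` and the field names `user`, `bucket`, and `comments` are illustrative assumptions, not something prescribed by the answer or by MongoDB. Each resulting object would be stored as one document in a separate comments collection.

```javascript
// Splits a user's comments into bucket documents of at most `bucketSize` comments.
// Field names (user, bucket, comments) are illustrative, not from the original post.
function toBuckets(user, comments, bucketSize = 100) {
  const buckets = [];
  for (let i = 0; i < comments.length; i += bucketSize) {
    buckets.push({
      user, // owner of the comments, used to find the buckets again
      bucket: buckets.length, // running bucket number, starting at 0
      comments: comments.slice(i, i + bucketSize), // at most bucketSize entries
    });
  }
  return buckets;
}

// Example data: 250 generated comments for one user.
const comments = Array.from({ length: 250 }, (_, i) => ({
  title: `title-${i}`,
  comment: `text-${i}`,
}));

const buckets = toBuckets("maria", comments);
console.log(buckets.map((b) => b.comments.length)); // prints: [ 100, 100, 50 ]
```

Bucketing keeps every document far below the size limit while still letting one read fetch a batch of comments.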
I'm not sure I understand your question but it sounds like you have one record with many attributes.
record = {'attr1':1, 'attr2':2, etc.}
You can create an index on any single attribute or any combination of attributes. Also, you can create any number of indices on a single collection (MongoDB collection == MySQL table), whether or not each record in the collection has the attributes being indexed on.
edit: I don't know what you mean by 'global scope' within MongoDB. To insert any data, you must define a database and collection to insert that data into.
Database 'Example':
  Collection 'table1':
    records: {a:1,b:1,c:1}
             {a:1,b:2,d:1}
             {a:1,c:1,d:1}
    indices:
      ensureIndex({a:1, d:1}) <- this will index on a (ascending), then by d (ascending); the fact that record 1 doesn't have an attribute 'd' doesn't matter, and this will increase query performance
edit 2:
Well first of all, in your table here, you are assigning multiple values to the attribute "name" and "value". MongoDB will ignore/overwrite the original instantiations of them, so only the final ones will be included in the collection.
I think you need to reconsider your schema here. You're trying to use it as a series of key value pairs, and it is not specifically suited for this (if you really want key value pairs, check out Redis).
Check out: http://www.jonathanhui.com/mongodb-query