I've got a lot of data in MongoDB, which we access primarily via MongoEngine. Sometimes data first ended up in field F1, and we later decided that field F2 was a better place for it, so we moved it over and stopped using F1.
That's convenient, but now we've got a bunch of stale (or useless) data in old F1 keys, and new documents are being created with empty F1 keys, for no reason.
While MongoDB's schemalessness is convenient, I still appreciate the strict=True feature (which is on by default) and try to avoid turning it off except when absolutely necessary. I don't like turning off all the safety checks on a collection.
So is there any way to delete a field F1 from my MongoDB collection, without downtime, and without strict=False?
If I remove the field from my Document subclass first, MongoEngine will complain when it tries to load existing documents.
If I remove the field from my database first, MongoEngine will create it for any new records, until the model is updated.
Is there any way with MongoEngine to say "This is an old field. You can load it (or ignore it) if it's there, but don't create it for any new documents"?
"If I remove the field from my database first, MongoEngine will create it for any new records, until the model is updated."
That's only true if you explicitly write to that field or if the field has a default value set. Otherwise the field won't exist in MongoDB.
So as a first step I suggest removing the code that writes to that field and removing the default value (or setting it to None). Then it's safe to remove the field from the database.
Below is a small proof:
import mongoengine

mongoengine.connect('test')  # connect to a local database named "test"

class Foo(mongoengine.Document):
    a = mongoengine.IntField()
    b = mongoengine.ListField(default=None)

f = Foo().save()
type(f.a)  # NoneType
type(f.b)  # NoneType
And the database query:
> db.foo.findOne()
{ "_id" : ObjectId("56c49ae8ee8b341b4ea02fcb") }
The save() method seems to replace an entire document/record in my database. If I know the primary key of a document in my database, what is the best way in Grails for me to update a single field on that document without first querying the database using get()?
For instance, I don't like the following code because it executes two queries when all I want to do is to update myField.
def key = "foo"
def doc = MyDomain.get(key) // This is the query I want to eliminate
doc.myField = "bar"
doc.save()
In situations where I know the primary key, I want to simply update a single field, similar to how Ruby on Rails leverages the ActiveRecord update_attribute() method.
Even though my specific database is MongoDB, I think the question is applicable to any database, SQL or NoSQL. If the database supports the ability to update just a single field on one record via one query, using Grails can I avoid the extra get() query to retrieve a full record from the database?
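For reference, the single operation I'm hoping to issue looks like this at the MongoDB driver level (a pymongo sketch; the names come from my Grails example above):

from pymongo import MongoClient

db = MongoClient()['mydb']  # hypothetical database name

# one query: match by primary key and set a single field,
# leaving the rest of the document untouched
db.my_domain.update({'_id': 'foo'}, {'$set': {'myField': 'bar'}})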
I'm a beginner with MongoDB. I want to know whether there is any way to load a predefined schema into MongoDB (for example, the way Cassandra uses a .cql file for this purpose).
If there is, please point me to some documentation about the structure of that file and the way to restore it.
If there is not, how can I create an index only once, when I create a collection? I think it is wrong to create the index every time I call the insert method or run my program.
P.S.: I have a multi-threaded program in which every thread inserts into and updates my Mongo collection. I want to create the index only once.
Thanks.
To create an index on a collection you use the ensureIndex command. You only need to call it once to create an index on a collection.
If you call ensureIndex repeatedly with the same arguments, only the first call will create the index; all subsequent calls have no effect.
So if you know what indexes you're going to use for your database, you can create a script that calls that command for each of them.
An example insert_index.js file that creates 2 indexes for collA and collB collections:
db.collA.ensureIndex({ a : 1});
db.collB.ensureIndex({ b : -1});
You can call it from a shell like this:
mongo --quiet localhost/dbName insert_index.js
This will create those indexes on a database named dbName on your localhost. It's worth noting that if your database and/or collections do not exist yet, this will create both the database and the collections for which you're adding the indexes.
Edit
To clarify a little bit: MongoDB is schemaless, so there is no schema to restore.
You can only create indexes and collections (by using createCollection helper).
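For example, a small pymongo sketch (the database and collection names are illustrative):

from pymongo import MongoClient

db = MongoClient()['dbName']

# explicitly create a collection; MongoDB would otherwise create it
# lazily on the first insert
db.create_collection('collA')
# indexes can be declared right away, even on an empty collection
db.collA.ensure_index([('a', 1)])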
MongoDB is basically schemaless so there is no definition of a schema or namespaces to be restored.
In the case of indexes, these can be created at any time. There does not need to be a collection present, or even the required fields for the index; this will all be sorted out as the collections are created and documents that match the defined fields are inserted.
Commands to create an index are generally the same with each implementation language, for example:
db.collection.ensureIndex({ a: 1, b: -1 })
This defines an index on the target collection in the target database referencing field "a" and field "b", the latter in descending order. It will happen even if the collection, or even the database, does not yet exist; in that case a blank namespace is established.
Subsequent calls to the same index creation method do not actually re-create the index. Where an index identical to an existing one is specified, the call is effectively skipped as a "no-operation".
As such, you can simply feed all your required index creation statements at application startup and anything that is not already present will be created. Anything that already exists will be left alone.
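As an illustration, such a startup routine in Python might look like the following (a pymongo sketch; the database, collection, and field names are assumptions):

from pymongo import MongoClient, ASCENDING, DESCENDING

def ensure_indexes(db):
    # safe to run on every startup: calls for indexes that already
    # exist are no-ops, and anything missing gets created
    db.collA.ensure_index([('a', ASCENDING)])
    db.collB.ensure_index([('b', DESCENDING)])

if __name__ == '__main__':
    ensure_indexes(MongoClient()['dbName'])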
I need to create a document in mongodb and then immediately want it available in my application. The normal way to do this would be (in Python code):
doc_id = collection.insert({'name': 'mike', 'email': 'mike@gmail.com'})
doc = collection.find_one({'_id':doc_id})
There are two problems with this:
two requests to the server
not atomic
So, I tried using the find_and_modify operation to effectively do a "create and return" with the help of upserts like this:
doc = collection.find_and_modify(
    # a query that can never match an existing document
    query={'__no_field__': '__no_value__'},
    # If the <update> argument contains only field and value pairs,
    # and no $set or $unset, the method REPLACES the existing document
    # with the document in the <update> argument, except for the _id field
    update={'name': 'mike', 'email': 'mike@gmail.com'},
    # since no document matches the query, the upsert creates one
    upsert=True,
    # this will return the updated (in our case, newly created) document
    new=True,
)
This indeed works as expected. My question is: whether this is the right way to accomplish a "create and return" or is there any gotcha that I am missing?
What exactly are you missing from a plain old regular insert call?
If it is not knowing what the _id will be, you could just create the _id yourself first and insert the document. Then you know exactly what it will look like. None of the other fields will be different from what you sent to the database.
If you are worried about guarantees that the insert will have succeeded you can check the return code, and also set a write concern that provides enough assurances (such as that it has been flushed to disk or replicated to enough nodes).
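Putting both ideas together, a minimal pymongo sketch (the database and collection names are assumptions):

from bson.objectid import ObjectId
from pymongo import MongoClient

collection = MongoClient()['test']['people']  # hypothetical names

# pre-generate the _id so the full document is known before the insert
doc = {'_id': ObjectId(), 'name': 'mike', 'email': 'mike@gmail.com'}
# w=1 waits for acknowledgement; j=True additionally waits for the journal
collection.insert(doc, w=1, j=True)
# no follow-up query needed: doc is exactly what was stored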
I have a queue mechanism which I want to keep small, so all queued objects, after they are done processing, are moved to a different collection called history, where they are no longer updated and exist only for reference and info.
Is it possible to take an object from one collection, remove it and insert it into another collection without changing the _id ?
I now solved this by creating a second id in the schema which I transfer to the new object, but I'd rather keep referencing _id.
Also if you think I'm overlooking something and don't need a second collection to keep my queueing mechanism fast, I'd love to hear about it.
Here's how I currently do it (using step and underscore)
// moving data to a new document in the history collection
var data = _.omit(process.toObject(), '_id');
console.log("new history data", data);
var history = new db.History(data);
history.save(this.parallel());
db.Queue.remove({ _id: process._id }, this.parallel());
You can copy/move a doc from one collection to another without changing its _id just fine. If you create a new doc that already has an _id then Mongoose/Mongo will use it. And _id values only need to be unique within a single collection, not between collections.
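The same move with the _id preserved, sketched at the driver level in pymongo (the database and collection names are assumptions; the Mongoose code above is equivalent apart from dropping the _id):

from pymongo import MongoClient

db = MongoClient()['mydb']  # hypothetical database name

def move_to_history(job_id):
    # move a finished job from the queue to history, keeping its _id
    doc = db.queue.find_one({'_id': job_id})
    db.history.insert(doc)                # the original _id is reused
    db.queue.remove({'_id': doc['_id']})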
I tried using the delete operator (delete object._id;) to remove the property and let MongoDB assign one itself, but for some reason delete did not work.
This works for me:
obj['_id'] = undefined;
If you are not going to use the _id then this will fix your problem.
While I understand that a foreign key constraint would not make sense for a NoSQL database, shouldn't it ensure that indexes are updated when it allows me to rename fields?
http://www.mongodb.org/display/DOCS/Updating#Updating-%24rename
{ $rename: { "old_field_name": "new_field_name" } }
but if I had
db.mycollections.ensureIndex({old_field_name:1});
wouldn't it be great if the index was updated automatically?
Is it that, since system.indexes is simply just another collection, such an automatic update would imply a foreign key constraint of sorts, and so the index update is not done? Or am I missing certain flags?
It doesn't do it.
The answer to your question "wouldn't it be great if the index was updated automatically?" is, "no, not really".
If you think renaming a field is a good idea, you can add the new index yourself at the same time. You'll likely have lots of other changes to make in your code to reflect the rename (queries, updates, map/reduce operations, ...), so why single out index recreation as something that should happen automatically? A rename is a very rare operation, and the index is just one of many things you'd need to update manually.
If you care about this feature, go request it; 10gen are incredibly responsive to suggestions, but I wouldn't be surprised if the answer was "why is this important?"
Quoting Mike O'Brien:
The $rename operator is analogous to doing a $set/$unset in a single atomic operation. It's a shortcut for situations where you need to take a value and move it into another field, without the need to do it in 2 steps (one to fetch the value of the field, another to set the new one).
Doing a $rename does mean the data is changing. If I use $rename to rename a field named "x" to "y", but the field named "y" already existed in the document, the old value of "y" is overwritten and the field "x" will no longer exist. If either "x" or "y" is indexed, then the operation will update those indexes to reflect the final values resulting from the operation. The same applies when using rename to move a field from within an embedded document up to the top level (e.g. renaming "a.b" to "c") or vice versa.
If the behavior suggested in the SO question (i.e., renaming a field maintains the relationship between the field it was moved to and its value in the index) were implemented, things could get really confusing and it would be difficult to reason about what the "correct" expected behavior is for certain operations. For example, if I create an index on field "A" in a collection, rename "A" to "B" in one of the documents, and then do things like:
update({"A" : }, {"$set":{"B":}}) // find a document where A= and set its value of B to
update({"B" : }, {"$set":{"A":}}) // find a document where B= and set its value of A to
Should these be equivalent?
In general, having the database maintain indexes across a collection by field name is a design decision that keeps behavior predictable and simple.
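To make the quoted $rename behavior concrete, a hedged pymongo sketch (the collection name is taken from the question above; the database name is an assumption):

from pymongo import MongoClient

db = MongoClient()['mydb']  # hypothetical database name

# rename "x" to "y" in every document that has an "x" field; any
# existing "y" is overwritten, "x" is removed, and indexes on either
# field are updated as part of the same operation
db.mycollections.update(
    {'x': {'$exists': True}},
    {'$rename': {'x': 'y'}},
    multi=True,
)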