MongoDB 1.6.5: how to rename field in collection

The $rename operator is only available in the development version 1.7.2.
How can I rename a field in 1.6.5?

The simplest way to perform such an operation is to loop through the data set, re-mapping the field name. The easiest way to do this is to write a function that performs the re-write and then run it with the .find().forEach() syntax in the shell.
Here's a sample from the shell:
db.foo.save({ a : 1, b : 2, c : 3});
db.foo.save({ a : 4, b : 5, c : 6});
db.foo.save({ a : 7, b : 8 });
db.foo.find();
remap = function (x) {
    if (x.c) {
        db.foo.update({_id: x._id}, {$set: {d: x.c}, $unset: {c: 1}});
    }
};
db.foo.find().forEach(remap);
db.foo.find();
In the case above I'm doing an $unset and a $set in the same action. MongoDB does not support transactions across collections, but the update above touches a single document, and single-document updates are atomic. So you're guaranteed that the set and unset either both succeed or both fail.
The only limitation here is that you'll need to manage outside writers to keep the data consistent. My normal preference for this is simply to turn off writes while this updates. If this option is not available, then you'll have to figure out what level of consistency you want for the data. (I can provide some ideas here, but it's really going to be specific to your data and system)

db.collection_name.update({}, {$rename: {"oldname": "newname"}}, false, true);
This will rename the field in every document in the collection.
Also, I discovered that if the field you're renaming appears in the index catalog (db.collection_name.getIndexes()), then you'll also have to drop and recreate the index using the new field name.
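If you hit that case, a minimal sketch of rebuilding the index by hand (the field names are placeholders, matching the rename above):
db.collection_name.dropIndex({oldname: 1}); // drop the index keyed on the old field name
db.collection_name.ensureIndex({newname: 1}); // recreate it on the new field name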

Related

How to search values in real time on a badly designed database?

I have a collection named Company which has the following structure:
{
    "_id" : ObjectId("57336ea1a7454c0100d889e4"),
    "currentMonth" : 62,
    "variables1" : { ... },
    ...
    "variables61" : { ... },
    "variables62" : {
        "name" : "Test",
        "email" : "email@test.com",
        ...
    },
    "country" : "US"
}
I need to be able to search for companies by name with up-to-date data. I don't have permission to change this data structure because many applications still use it. For the moment I haven't found a way to index these variables with this data structure, which makes searching slow.
Today each of these documents can be several megabytes in size and there are over 20,000 of them in this collection.
The system I want to implement uses a search engine to index the names of companies, but for that it needs to be able to detect changes in the collection.
MongoDB's change stream seems like a viable option but I'm not sure how to make it scalable and efficient.
Do you have any suggestions that would help me solve this problem? Any suggestion on the steps needed to set up the above system?
Usually with MongoDB you can add new fields to documents and existing applications would simply ignore the extra fields (though they naturally would not be populated by old code). Therefore:
1. Create a task that is regularly executed which goes through all documents in your collection, figures out the name for each document from its fields, then writes the name into a top-level field.
2. Add an index on that field.
3. In your search code, look up by the values of that field.
4. Compare the calculated name to the source-of-truth name. If different, discard the document.
If names don't change once set, step 1 only needs to go through documents that are missing the top-level name and step 4 is not needed.
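A minimal shell sketch of steps 1 and 2, assuming (purely for illustration) that the source-of-truth name lives in a sub-document such as variables62 and that companyName is the new top-level field:
db.Company.find({companyName: {$exists: false}}).forEach(function (doc) {
    var name = doc.variables62 && doc.variables62.name; // assumed location of the name
    if (name) {
        db.Company.update({_id: doc._id}, {$set: {companyName: name}});
    }
});
db.Company.ensureIndex({companyName: 1}); // step 2: index the new top-level field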
Using the change detection pattern with monstache, I was able to synchronise MongoDB with Elasticsearch in real time, applying a Filter based on the current month and then a Map over the variables to be indexed 🎊

Is there any way to recover recently deleted documents in MongoDB?

I removed some documents by mistake in my last query. Is there any way to roll back my last query on the mongo collection?
Here is my last query:
db.foo.remove({ "name" : "some_x_name"})
Is there any rollback/undo option? Can I get my data back?
There is no rollback option (rollback has a different meaning in a MongoDB context), and strictly speaking there is no supported way to get these documents back - the precautions you can/should take are covered in the comments. With that said however, if you are running a replica set, even a single node replica set, then you have an oplog. With an oplog that covers when the documents were inserted, you may be able to recover them.
The easiest way to illustrate this is with an example. I will use a simplified example with just 100 deleted documents that need to be restored. To go beyond this (a huge number of documents, or perhaps you wish to restore only selectively, etc.) you will either want to change the code to iterate over a cursor or write this in your language of choice outside the MongoDB shell. The basic logic remains the same.
First, let's create our example collection foo in the database dropTest. We will insert 100 documents without a name field and 100 documents with an identical name field so that they can be mistakenly removed later:
use dropTest;
for(i=0; i < 100; i++){db.foo.insert({_id : i})};
for(i=100; i < 200; i++){db.foo.insert({_id : i, name : "some_x_name"})};
Now, let's simulate the accidental removal of our 100 name documents:
> db.foo.remove({ "name" : "some_x_name"})
WriteResult({ "nRemoved" : 100 })
Because we are running in a replica set, we still have a record of these documents in the oplog (being inserted), and thankfully those inserts have not (yet) fallen off the end of the oplog (remember, the oplog is a capped collection). Let's see if we can find them:
use local;
db.oplog.rs.find({op : "i", ns : "dropTest.foo", "o.name" : "some_x_name"}).count();
100
The count looks correct; we seem to still have our documents. I know from experience that the only piece of the oplog entry we will need here is the o field, so let's add a projection to only return that (output snipped for brevity, but you get the idea):
db.oplog.rs.find({op : "i", ns : "dropTest.foo", "o.name" : "some_x_name"}, {"o" : 1});
{ "o" : { "_id" : 100, "name" : "some_x_name" } }
{ "o" : { "_id" : 101, "name" : "some_x_name" } }
{ "o" : { "_id" : 102, "name" : "some_x_name" } }
{ "o" : { "_id" : 103, "name" : "some_x_name" } }
{ "o" : { "_id" : 104, "name" : "some_x_name" } }
To re-insert those documents, we can just store them in an array, then iterate over the array and insert the relevant pieces. First, let's create our array:
var deletedDocs = db.oplog.rs.find({op : "i", ns : "dropTest.foo", "o.name" : "some_x_name"}, {"o" : 1}).toArray();
> deletedDocs.length
100
Next we remind ourselves that we only have 100 docs in the collection now, then loop over the 100 inserts, and finally revalidate our counts:
use dropTest;
db.foo.count();
100
// simple for loop to re-insert the relevant elements
for (var i = 0; i < deletedDocs.length; i++) {
    db.foo.insert({_id : deletedDocs[i].o._id, name : deletedDocs[i].o.name});
}
// check total and name counts again
db.foo.count();
200
db.foo.count({name : "some_x_name"})
100
And there you have it, with some caveats:
This is not meant to be a true restoration strategy; for that, look at backups (MMS, other) and delayed secondaries, as mentioned in the comments.
It's not going to be particularly quick to query the documents out of the oplog (any oplog query is a table scan) on a large busy system.
The documents may age out of the oplog at any time (you can, of course, make a copy of the oplog for later use to give you more time)
Depending on your workload you might have to de-dupe the results before re-inserting them
Larger sets of documents will be too large for an array as demonstrated, so you will need to iterate over a cursor instead (see the sketch after this list)
The format of the oplog is considered internal and may change at any time (without notice), so use at your own risk
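A minimal sketch of that cursor-based variant, re-inserting documents straight from the oplog (same namespaces as the example above):
var oplog = db.getSiblingDB("local").oplog.rs;
var target = db.getSiblingDB("dropTest").foo;
oplog.find({op : "i", ns : "dropTest.foo", "o.name" : "some_x_name"}, {"o" : 1}).forEach(function (entry) {
    target.insert(entry.o); // duplicate _id errors are reported in the WriteResult rather than thrown
});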
I understand this is a bit old, but I wanted to share something I researched in this area that may be useful to others with a similar problem.
The fact is that MongoDB does not physically delete data immediately - it only marks it for deletion. This behaviour is, however, version specific, and there is currently no documentation or standardization of it - which would enable a third-party tool developer (or someone in desperate need) to build a tool or write a simple script that works reliably across versions. I opened a ticket for this: https://jira.mongodb.org/browse/DOCS-5151.
I did explore one option which is at a much lower level and may need fine-tuning based on the version of MongoDB used. It is understandably too low level for most people's liking; however, it works and can be handy when all else fails.
My approach involves directly working with the binary in the file and using a Python script (or commands) to identify, read and unpack (BSON) the deleted data.
My approach is inspired by this GitHub project (I am NOT the developer of this project). On my blog I have tried to simplify the script and extract a specific deleted record from a raw MongoDB file.
Currently a record is marked for deletion with "\xee" at the start of the record. This is what a deleted record looks like in the raw db file:
'\xee\xee\xee\xee\x07_id\x00U\x19\xa6g\x9f\xdf\x19\xc1\xads\xdb\xa8\x02name\x00\x04\x00\x00\x00AAA\x00\x01marks\x00\x00\x00\x00\x00\x00@\x9f@\x00'
I replaced the first block with the size of the record which I identified earlier based on other records.
y = "3\x00\x00\x00" + x[20804:20800+51]
Finally, using the bson package (which comes with pymongo), I decoded the binary into a readable object.
bson.decode_all(y)
[{u'_id': ObjectId('5519a6679fdf19c1ad73dba8'), u'name': u'AAA', u'marks': 2000.0}]
The decoded BSON is now a Python object and can be dumped into a recovery collection or simply logged somewhere.
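Putting those pieces together, a minimal Python sketch; the file path and offsets are assumptions carried over from the example above, and you would need to locate the record in your own file first:
import struct
import bson  # the bson package that ships with pymongo

RECORD_OFFSET = 20800  # assumed start of the deleted record in the raw file
RECORD_LEN = 51        # assumed BSON length of the record (0x33, i.e. "3")

with open("/data/db/mydb.0", "rb") as f:  # hypothetical raw data file
    x = f.read()

# The first 4 bytes were overwritten with the \xee deletion marker;
# patch them back with the little-endian BSON document length.
y = struct.pack("<i", RECORD_LEN) + x[RECORD_OFFSET + 4 : RECORD_OFFSET + RECORD_LEN]
print(bson.decode_all(y))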
Needless to say, this or any other recovery technique should ideally be done in a staging area on a backup copy of the database file.

How to use an index with MongoCollection.Update()

I am writing a method that updates a single document in a very large MongoCollection,
and I have an index that I want the MongoCollection.Update() call to use to drastically reduce lookup time, but I can't seem to find anything like MongoCursor.SetHint(string indexName).
Is using an index on an update operation possible? If so, how?
You can create an index that matches the query section of your update command.
For example, if you have this collection, named data:
> db.data.find()
{ "_id" : ObjectId("5334908bd7f87918dae92eaf"), "name" : "omid" }
{ "_id" : ObjectId("5334943fd7f87918dae92eb0"), "name" : "ali" }
{ "_id" : ObjectId("53349478d7f87918dae92eb1"), "name" : "reza" }
and if you do this update query:
> db.data.update({name: 'ali'}, {name: 'Ali'})
Without any defined index, the number of scanned documents is 2:
"nscanned" : 2,
But if you define an index matching your query, here on the name field:
db.data.ensureIndex({name:1})
Now if you update it again:
> db.data.update({name: 'Ali'}, {name: 'ALI'})
MongoDB uses your index for the update, and the number of scanned documents is 1:
"nscanned" : 1,
But if you want to hint for the update, you can hint it on your query:
// Assume that this index and its field exist.
> var cursor = db.data.find({name: 'ALI'}).hint({family: 1})
Then use it in your update query:
> db.data.update(cursor, {name: 'ALI'})
If you have already indexed your collection, update will use the correct index right away. There is no point in providing a hint (in fact, you can't hint with update).
Hint is only for debugging and testing purposes. Mongo is in most cases smart enough to automatically decide which index (if you have many of them) should be used in a particular query and it reviews its strategy from time to time.
So short answer - do nothing. If you have an index and it is useful, it will be automatically used on find, update, delete, findOne.
If you want to see if it is used - take the part of the query which searches for something and run it through find with explain.
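For example, a quick check along these lines (collection and field names are illustrative):
db.data.find({name: 'Ali'}).explain() // inspect which index was chosen and the nscanned value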
Example for hellboy. This is just an example and in real life it can be more complex.
So you have a collection with docs like this: {a : int, b : timestamp}. You have 2 indexes: one on a, another on b. Now you need to run a query like "a > 5 and b is after 2014". For some reason it uses index a, which does not give you the fastest time (maybe because you have 1000 elements, most of them are bigger than 5, and only 10 are after 2014). So you decide to hint it to use the b index. Cool, it works much faster now. But your collection changes, and now it is the year 2020 and most of your documents have b bigger than 2014. So now your index b is not doing as much work. But mongo still uses it, because you told it to.

MongoDB: Modify each document on server

Given a large (millions+) collection of documents similar to:
{ _id : ObjectId, "a" : 3, "b" : 5 }
What is the most efficient way to process these documents directly on the server, with the results added to each document within the same collection? For example, add a key c whose value equals a+b.
{ _id : ObjectId, "a" : 3, "b" : 5, "c" : 8 }
I'd prefer to do this in the shell.
Seems that find().forEach() would waste time in transit between the db and the shell, and mapReduce() seems intended to process groups of objects down into aggregated data (though I may be misunderstanding).
EDIT: I'd prefer a solution that doesn't block, if there is one (other than using a cursor on the client)...
From the MongoDB Docs on db.eval():
"db.eval() is used to evaluate a function (written in JavaScript) at the database server.
This is useful if you need to touch a lot of data lightly. In that scenario, network transfer of the data could be a bottleneck."
The documentation has an example of how to use it that is very similar to what you are trying to do.
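For instance, a minimal sketch of that approach for the a+b case (coll is a placeholder collection name; note that the legacy db.eval takes a global lock while it runs, so it does block):
db.eval(function () {
    db.coll.find().forEach(function (doc) {
        db.coll.update({_id: doc._id}, {$set: {c: doc.a + doc.b}});
    });
});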
forEach is your best option. I would run it on the server (from the shell) to reduce latency.
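The same loop driven from the shell rather than inside db.eval, again with coll as a placeholder; filtering on documents still missing c makes the job safe to re-run:
db.coll.find({c: {$exists: false}}).forEach(function (doc) {
    db.coll.update({_id: doc._id}, {$set: {c: doc.a + doc.b}});
});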

Mongo field name change not updating indexes

While I understand a foreign key constraint would not make sense for a NoSQL database, shouldn't it ensure that the indexes are updated if it allows me to rename fields?
http://www.mongodb.org/display/DOCS/Updating#Updating-%24rename
{ $rename : { old_field_name : new_field_name } }
but if I had
db.mycollections.ensureIndex({old_field_name:1});
wouldn't it be great if the index was updated automatically?
Is it that since system.indexes is simply just another table and such a automatic update would imply a foreign key constraint of sorts, the index update is not done? Or am I missing certain flags?
It doesn't do it.
The answer to your question "wouldn't it be great if the index was updated automatically?" is, "no, not really".
If you do think that renaming a field is a good idea, you can add the new index yourself at the same time. You'll likely have lots of other changes to make in your code to reflect the rename (queries, updates, map-reduce operations, ...), so why single out index recreation as something that should happen automatically, when it's just one thing out of many that you'd need to do manually for what is a very rare operation?
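A minimal sketch of doing both by hand, using the collection and field names from the question ($rename requires the new name as a string):
db.mycollections.update({}, {$rename: {old_field_name: "new_field_name"}}, false, true); // rename in every document
db.mycollections.dropIndex({old_field_name: 1}); // drop the index on the old name
db.mycollections.ensureIndex({new_field_name: 1}); // recreate it on the new name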
If you care about this feature, go request it; 10Gen are incredibly responsive to suggestions, but I wouldn't be surprised if the answer was "why is this important?"
Quoting Mike O'Brien:
The $rename operator is analogous to doing a $set/$unset in a single atomic operation. It's a shortcut for situations where you need to take a value and move it into another field, without the need to do it in 2 steps (one to fetch the value of the field, another to set the new one).
Doing a $rename does mean the data is changing. If I use $rename to rename a field named "x" to "y", but the field named "y" already existed in the document, the old value for "y" is overwritten, and the field "x" will no longer exist anymore. If either "x" or "y" is indexed, then the operation will update those indexes to reflect the final values resulting from the operation. The same applies when using rename to move a field from within an embedded document up to the top level (e.g. renaming "a.b" to "c") or vice versa.
If the behavior suggested in the SO question were adopted (i.e., renaming a field maintains the relationship between the field the value was moved to and its old entry in the index), things could get really confusing and make it difficult to reason about what the "correct" expected behavior is for certain operations. For example, if I create an index on field "A" in a collection, rename "A" to "B" in one of the documents, and then do things like:
update({"A" : }, {"$set":{"B":}}) // find a document where A= and set its value of B to
update({"B" : }, {"$set":{"A":}}) // find a document where B= and set its value of A to
Should these be equivalent?
In general, having the database maintain indexes across a collection by field name is a design decision that keeps behavior predictable and simple.