How to move object from one collection to another without changing _id - mongodb

I have a queue mechanism which I want to keep small, so once queued objects are done processing they are moved to a different collection called history, where they are no longer updated and are only kept for reference and info.
Is it possible to take an object from one collection, remove it, and insert it into another collection without changing the _id?
I've solved this for now by adding a second id field to the schema and copying it over to the new object, but I'd rather keep referencing _id.
Also if you think I'm overlooking something and don't need a second collection to keep my queueing mechanism fast, I'd love to hear about it.
Here's how I currently do it (using step and underscore):

```js
// moving data to a new process in the history collection
var data = _.omit(process.toObject(), '_id');
console.log("new history data", data);
var history = new db.History(data);
history.save(this.parallel());
db.Queue.remove({ _id: process._id }, this.parallel());
```

You can copy/move a doc from one collection to another without changing its _id just fine. If you create a new doc that already has an _id then Mongoose/Mongo will use it. And _id values only need to be unique within a single collection, not between collections.
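A minimal sketch of such a move, assuming the Node.js driver and collections named `queue` and `history` (the names are assumptions, not from the original post). The doc is copied verbatim, _id included, then the original is deleted; since _id only has to be unique within each collection, reusing it in `history` is fine:

```javascript
// Copy a doc to `history` keeping its _id, then delete it from `queue`.
async function moveToHistory(db, id) {
  const doc = await db.collection('queue').findOne({ _id: id });
  if (!doc) return false;                        // nothing to move
  await db.collection('history').insertOne(doc); // same _id, new collection
  await db.collection('queue').deleteOne({ _id: id });
  return true;
}
```

Doing the insert before the delete means that if the process dies between the two writes, the worst case is a leftover copy in the queue, not a lost document.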

I tried using `delete object._id;` to remove the property and let MongoDB assign one itself, but for some reason `delete` did not work. (On a Mongoose document the `_id` path is defined by the schema rather than as an own property, so `delete` has no effect there; it only works on a plain object such as the result of `toObject()`.)
This works for me:
obj['_id'] = undefined;
If you are not going to use the _id then this will fix your problem.

Related

Is it possible to run a "dummy" query to see how many documents _would_ be inserted

I am using MongoDB to track unique views of a resource.
Every time a user views a specific resource for the first time, a new view is logged in the db.
If that same user views the same resource again, the unique compound index on the collection blocks the insert of the duplicate.
For bulk inserts, with { ordered: false }, Mongo allows the new views through and blocks the duplicates. The return value of the insert is an object with an insertedCount property, telling me how many docs made it past the unique index.
In some cases, I want to know how many docs would be inserted before running the query. Then, based on the dummy insertedCount, I would choose to run the query, or not.
Is there a way to test a query and have it do everything except actually inserting the docs?
I could solve this by running some JS server-side to get the answer I need, but I would prefer to let the db do those checks.
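One client-side sketch of such a "dry run" (the field names `userId`/`resourceId` are assumptions): fetch the unique keys that already exist, then count how many candidates would survive the index, de-duplicating within the batch itself as well:

```javascript
// Count how many candidate docs would pass a unique (userId, resourceId)
// index, without inserting anything.
async function wouldInsertCount(coll, candidates) {
  const existing = await coll
    .find({ $or: candidates.map(c => ({ userId: c.userId, resourceId: c.resourceId })) })
    .toArray();
  const seen = new Set(existing.map(d => d.userId + ':' + d.resourceId));
  let count = 0;
  for (const c of candidates) {
    const key = c.userId + ':' + c.resourceId;
    if (!seen.has(key)) { seen.add(key); count += 1; } // de-dupe in-batch too
  }
  return count;
}
```

Note this is not atomic: a concurrent writer can change the answer between the dry run and the real insert, so treat the count as an estimate rather than a guarantee.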

mongodb multiple documents insert or update by unique key

I would like to get a list of items from an external resource periodically and save them into a collection.
There are several possible solutions but they are not optimal, for example:
Delete the entire collection and save the new list of items
Get all items from the collection using "find({})" and use it to filter out existing items and save those that do not exist.
But a better solution would be to set a unique key and do a kind of "update or insert".
Right now, if I save an item whose unique key already exists, I get an error.
Is there a way to do this at all?
**Note:** upsert won't do the job, since it updates all matched items with the same value, so it's really only suitable for a single document.
I have a feeling you can achieve what you want simply by using the "normal" insertMany with the ordered option set to false. The documentation states that:

> Note that one document was inserted: the first document of _id: 13 will insert successfully, but the second insert will fail. This will also stop additional documents left in the queue from being inserted. With ordered to false, the insert operation would continue with any remaining documents.

So you will get "duplicate key" exceptions which, however, you can simply ignore in your case.
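A sketch of that approach with the Node.js driver: attempt every document with `{ ordered: false }` and treat duplicate-key errors (code 11000) as "already seen" rather than failures:

```javascript
// Insert a batch, silently skipping docs rejected by the unique index.
// Returns the number of documents that actually made it in.
async function insertNewOnly(coll, items) {
  try {
    const res = await coll.insertMany(items, { ordered: false });
    return res.insertedCount;                  // everything was new
  } catch (err) {
    const writeErrors = err.writeErrors || [];
    const onlyDupes = writeErrors.length > 0 &&
      writeErrors.every(e => e.code === 11000);
    if (onlyDupes) {
      return items.length - writeErrors.length; // duplicates skipped
    }
    throw err;                                  // some other failure
  }
}
```

If you want the duplicates to be *updated* rather than skipped, the usual shape is `bulkWrite` with one `updateOne` + `upsert: true` operation per item, keyed on the unique fields.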

mongoDB: is it better to remove a field to nullify it?

I would like to know which is better for an existing document: removing the field entirely, or keeping the field and simply setting it to null?
I'd also like to know more about the process of saving the document: Will the document be saved in the same place or will an entirely new document be saved somewhere else on the disk?
Note: This question has absolutely nothing in common with "Is shortening MongoDB property names worthwhile?"! I'm asking about saving a record that already exists by removing a field instead of nullifying it.
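To make the two options concrete, here is a small illustration of the difference (runnable anywhere): setting a field to null keeps the key in the stored document, while $unset removes the key entirely. The shell equivalents would be `db.coll.update({ _id: id }, { $set: { f: null } })` versus `db.coll.update({ _id: id }, { $unset: { f: "" } })`:

```javascript
// What each operation leaves behind in the document:
const nulled  = { _id: 1, f: null };  // after $set:   { f: null }
const removed = { _id: 1 };           // after $unset: { f: "" }

console.log('f' in nulled);   // key still present, still stored on disk
console.log('f' in removed);  // key (and its storage) gone
```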

Is it possible to delete a field with MongoEngine, without strict=False?

I've got a lot of data in MongoDB, which we access primarily via MongoEngine, and sometimes data first ended up in field F1, and then we later decided that field F2 is a better place for it, so we moved it over there, and stopped using F1.
That's convenient, but now we've got a bunch of stale (or useless) data in old F1 keys, and new documents are being created with empty F1 keys, for no reason.
While MongoDB being schemaless is convenient, I still appreciate the strict=True feature (which is on by default), and try to avoid turning it off except when absolutely necessary. I don't like turning off all the safety checks on a collection.
So is there any way to delete a field F1 from my MongoDB collection, without downtime, and without strict=False?
If I remove the field from my Document subclass first, MongoEngine will complain when it tries to load existing documents.
If I remove the field from my database first, MongoEngine will create it for any new records, until the model is updated.
Is there any way with MongoEngine to say "This is an old field. You can load it (or ignore it) if it's there, but don't create it for any new documents"?
If I remove the field from my database first, MongoEngine will create it for any new records, until the model is updated
That's only true if you explicitly write to that field or if the field has a default value set. Otherwise the field won't exist in MongoDB.
So as a first step, I suggest removing the code that writes to that field and dropping the default value (or setting it to None). Then it's safe to remove the field from the database.
Below is a small proof:

```python
import mongoengine

class Foo(mongoengine.Document):
    a = mongoengine.IntField()
    b = mongoengine.ListField(default=None)

f = Foo().save()
type(f.a)  # NoneType
type(f.b)  # NoneType
```

And the database query:

```js
> db.foo.findOne()
{ "_id" : ObjectId("56c49ae8ee8b341b4ea02fcb") }
```

MongoDB, return recent document for each user_id in collection

Looking for similar functionality to Postgres' Distinct On.
Have a collection of documents {user_id, current_status, date}, where status is just text and date is a Date. Still in the early stages of wrapping my head around mongo and getting a feel for best way to do things.
Would map-reduce be the best solution here (map emits everything, reduce keeps a record of the latest one), or is there a built-in solution without pulling out MR?
There is a distinct command, however I'm not sure that's what you need. Distinct is kind of a "query" command and with lots of users, you're probably going to want to roll up data not in real-time.
Map-Reduce is probably one way to go here.
Map Phase: Your key would simply be an ID. Your value would be something like the following {current_status:'blah',date:1234}.
Reduce Phase: Given an array of values, you would grab the most recent and return only it.
To make this work optimally you'll probably want to look at a new feature in 1.8.0: "re-reduce" (incremental map-reduce), which will let you process only new data instead of re-processing the whole status collection.
The other way to do this is to build a "most-recent" collection and tie the status insert to that collection. So when you insert a new status for the user, you update their "most-recent".
Depending on the importance of this feature, you could possibly do both things.
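The "most-recent" collection idea can be sketched like this (the collection and field names are assumptions): every status insert also upserts that user's single row in a side collection, so reads of the latest status become a plain lookup:

```javascript
// Record a status and keep a one-doc-per-user "most recent" collection
// in sync via an upsert keyed on the user id.
function recordStatus(db, userId, status, date) {
  db.statuses.insert({ user_id: userId, current_status: status, date: date });
  db.most_recent.update(
    { _id: userId },                                    // one doc per user
    { $set: { current_status: status, date: date } },
    { upsert: true }
  );
}
```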
Current solution that seems to be working well:

```js
map = function () { emit(this.user.id, this.created_at); };

// We call new Date just in case something's not being stored as a date and
// is instead just a string, 'cause my date gathering/inserting function is
// kind of stupid atm.
reduce = function (key, values) {
  return new Date(Math.max.apply(Math, values.map(function (x) {
    return new Date(x);
  })));
};

res = db.statuses.mapReduce(map, reduce);
```
Another way to achieve the same result would be to use the group command, which is a kind of map-reduce shortcut that lets you aggregate on a specific key or set of keys.
In your case it would read like this:

```js
db.coll.group({
  key: { user_id: true },
  reduce: function (obj, prev) {
    // keep the most recent status per user
    if (new Date(obj.date) > new Date(prev.date)) {
      prev.status = obj.status;
      prev.date = obj.date;
    }
  },
  initial: { status: "", date: new Date(0) }
});
```
However, unless you have a rather small fixed amount of users I strongly believe that a better solution would be, as previously suggested, to keep a separate collection containing only the latest status-message for each user.
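For what it's worth, on later MongoDB versions the usual way to get the latest document per user is the aggregation framework rather than map-reduce or group. A sketch, using the field names from the question:

```javascript
// Sort newest-first, then keep the first (i.e. latest) doc per user.
const latestPerUserPipeline = [
  { $sort: { date: -1 } },
  { $group: {
      _id: "$user_id",
      current_status: { $first: "$current_status" },
      date: { $first: "$date" }
  } }
];
// usage: db.statuses.aggregate(latestPerUserPipeline)
```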