Clean up content after application bug - mongodb

I'm new to MongoDB. We just realized a bug in our application which results in multiple MongoDB entries instead of updating the (edited) document.
The application is already online, so we are now trying to manage the trouble that comes with that.
The situation: in MongoDB there are lots of documents containing this structure:
"_id" : ObjectId("4fd9ede5a6b9579f5b000003"),
"USER" : "my_username",
"matchID" : 18809,
"data1" : 2,
"data2" : 1,
"tippDate" : ISODate("2012-06-14T13:57:57Z"),
"data3" : 0
If the user changes the data in the application, the application inserts a new document instead of updating the existing one, like this:
{
"_id" : ObjectId("4fd9ede5a6b9579f5b000003"),
"USER" : "my_username",
"matchID" : 18809,
"data1" : 2,
"data2" : 1,
"tippDate" : ISODate("2012-06-14T13:57:57Z"),
"data3" : 0
}
{
"_id" : ObjectId("4fd9ede5a6b9579f5b000002"),
"USER" : "my_username",
"matchID" : 18809,
"data1" : 4,
"data2" : 2,
"tippDate" : ISODate("2012-06-14T12:45:33Z"),
"data3" : 0
}
Right now the bug on the application side is fixed, but we still have to clean up the database.
The goal is to keep only the newest record/document for each user.
One way is to handle this on the application side: load all the data for one user, order it by date, remove all of that user's documents, and write the newest entry back to MongoDB.
But: isn't it possible to do that in MongoDB, like a DELETE with JOINs in MySQL?
Thank you for any kind of help or hints!

Isn't it possible to do that in MongoDB, like a DELETE with JOINs in MySQL?
No. MongoDB does not support joins at all.
However, MongoDB does have sorting. So you can run a script to fetch each user, sort them by date and then delete the old ones.
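For example, here is a minimal shell sketch of such a script (the collection name "tipps" is an assumption; it keeps the newest document per USER/matchID pair, matching the example data, and removes the rest):

db.tipps.distinct("USER").forEach(function (user) {
    db.tipps.distinct("matchID", { USER: user }).forEach(function (match) {
        // Sort newest first, skip the document we want to keep,
        // and collect the ids of everything older.
        var oldIds = db.tipps.find({ USER: user, matchID: match })
                             .sort({ tippDate: -1 })
                             .skip(1)
                             .map(function (doc) { return doc._id; });
        oldIds.forEach(function (id) { db.tipps.remove({ _id: id }); });
    });
});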
Also, please note that you can override the _id field. It does not have to be an ObjectId(). Based on your description, you have a unique user_name, so why not simply use that as the _id?
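For example, a sketch (this assumes one document per user; _id is backed by a unique index, so the buggy duplicate insert would then have failed outright):

db.tipps.insert({ _id: "my_username", matchID: 18809, data1: 2, data2: 1, data3: 0 });
db.tipps.insert({ _id: "my_username", matchID: 18809, data1: 4, data2: 2, data3: 0 }); // rejected: duplicate key

Updates would then address the document directly via db.tipps.update({ _id: "my_username" }, ...).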

Related

How to get mongodb schema dump

I can take a MongoDB data backup, but I am not sure about a MongoDB schema backup.
Is there any way to take a dump of the MongoDB schema only, not the data?
You need to use mongorestore, which restores the BSON dumps produced by mongodump (mongoimport is the tool for things like importing JSON or CSV).
You can read more about mongorestore in the docs below; I'd take a look and read up on them, as they are very helpful.
http://www.mongodb.org/display/DOCS/Import+Export+Tools#ImportExportTools-mongorestore
You can also check out http://learnmongo.com for tips and help!
Or you can visit this link; hope it may be helpful for you:
How to use the dumped data by mongodump?
MongoDB is a NoSQL database.
There is no fixed schema for any collection, so there are no functions available in the mongo shell to find a collection's schema.
A fixed schema applies to RDBMS databases. In a NoSQL DB such as MongoDB it is not required, but you can enforce the same schema through your implementation logic, if needed.
Documents in the same collection can have different schemas. Please see the example below:
db.mycollection.insert([
{ "_id":1, "name":"A"},
{ "_id":2, "name":"CD", "age":29},
{ "_id":3, "name":"AB", "age":28},
{ "_id":4, "name":"ABC", "age":27, "emailId":"abc#xyz.com"},
{ "_id":5, "name":"ABCD", "age":29, "emailId":"abcd#xyz.com"}]);
db.mycollection.find();
{ "_id" : 1, "name" : "A" }
{ "_id" : 2, "name" : "CD", "age" : 29 }
{ "_id" : 3, "name" : "AB", "age" : 28 }
{ "_id" : 4, "name" : "ABC", "age" : 27, "emailId" : "abc#xyz.com" }
{ "_id" : 5, "name" : "ABCD", "age" : 29, "emailId" : "abcd#xyz.com" }
An approach to find the schema
In Mongo Shell
var k = db.mycollection.findOne();
for ( i in k){print (i)};
_id
name
This approach will work for you if all the documents in your collection follow the same schema.
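If they don't, a slightly longer sketch can collect the union of all top-level keys across the collection (note this scans every document, so it is only practical on small collections):

var keys = {};
db.mycollection.find().forEach(function (doc) {
    for (var k in doc) { keys[k] = true; }
});
for (var k in keys) { print(k); }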
Here's how I did it:
mongodump --uri="mongodb://localhost/mydb" -o ./mydb-dump
find ./mydb-dump -name '*.bson' -exec truncate -s 0 {} \;
Explanation: I'm dumping the whole database, then truncating all the .bson files (which hold collection data) to zero bytes.
Limitation: Obviously, this is only practical if the source database is small, otherwise you're generating a huge data dump only to throw away most of it.
To restore this:
mongorestore --uri="mongodb://some-other-server/mydb" ./mydb-dump
If there's a better way to do this, I'd love to know what it is!
MongoDB Compass GUI has a way to export the schema to JSON.
At the time of this post, there doesn't seem to be a way to do this in bulk, so this will have to be done for each collection one by one.
From the docs:
You can export your schema after analyzing it. This is useful for sharing your schema and comparing schemas across collections.
If you have not already done so, analyze your schema:
Select your desired collection and click the Schema tab. Click Analyze Schema.
Once your schema has been analyzed, export your schema:
In the top menu bar, click Collection. From the dropdown, click Share Schema as JSON.
Your schema is copied to your clipboard as a JSON object.
See full docs here ~ https://www.mongodb.com/docs/compass/master/schema/export/

Mongodb: How to avoid locking on big collection updates

I have an events collection of 2,502,011 documents and would like to perform an update on all of them. Unfortunately I am facing a lot of MongoDB page faults due to the write lock.
Question: How can I avoid those faults in order to be sure that all my events are correctly updated?
Here is the information about my events collection:
> db.events.stats()
{
"count" : 2502011,
"size" : 2097762368,
"avgObjSize" : 838.4305136947839,
"storageSize" : 3219062784,
"numExtents" : 21,
"nindexes" : 6,
"lastExtentSize" : 840650752,
"paddingFactor" : 1.0000000000874294,
"systemFlags" : 0,
"userFlags" : 0,
"totalIndexSize" : 1265898256,
"indexSizes" : {
"_id_" : 120350720,
"destructured_created_at_1" : 387804032,
"destructured_updated_at_1" : 419657728,
"data.assigned_author_id_1" : 76053152,
"emiting_class_1_data.assigned_author_id_1_data.user_id_1_data.id_1_event_type_1" : 185071936,
"created_at_1" : 76960688
}
}
Here is what an event look like:
> db.events.findOne()
{
"_id" : ObjectId("4fd5d4586107d93b47000065"),
"created_at" : ISODate("2012-06-11T11:19:52Z"),
"data" : {
"project_id" : ObjectId("4fc3d2abc7cd1e0003000061"),
"document_ids" : [
"4fc3d2b45903ef000300007d",
"4fc3d2b45903ef000300007e"
],
"file_type" : "excel",
"id" : ObjectId("4fd5d4586107d93b47000064")
},
"emiting_class" : "DocumentExport",
"event_type" : "created",
"updated_at" : ISODate("2013-07-31T08:52:48Z")
}
I would like to update each event to add 2 new fields based on the existing created_at and updated_at. Please correct me if I am wrong, but it seems you can't use the mongo update command when you need to access the current element's data along the way.
This is my update loop:
db.events.find().forEach(
    function (e) {
        var created_at = new Date(e.created_at);
        var updated_at = new Date(e.updated_at);
        e.destructured_created_at = [e.created_at]; // omitted the actual values
        e.destructured_updated_at = [e.updated_at]; // omitted the actual values
        db.events.save(e);
    }
)
When running the above command, I get a huge amount of page faults due to the write lock on the database.
I think you are confused here: it is not the write lock causing that, it is MongoDB querying for your update documents. The lock does not exist during a page fault (in fact it only exists when actually updating, or rather saving, a document on disk); it gives way to other operations.
The lock is more of a mutex in MongoDB.
Page faults on this size of data are perfectly normal; since you obviously do not query this data often, I am unsure what you are expecting to see. I am definitely unsure what you mean by your question:
Question: How can I avoid those faults in order to be sure that all my events are correctly updated?
OK, the problem you may be seeing is that you are getting page thrashing on that machine, which in turn destroys your IO bandwidth and floods your working set with data that is not needed. Do you really need to add this field to ALL documents eagerly? Can it not be added on demand by the application when that data is used again?
Another option is to do this in batches.
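For example, a rough shell sketch of batching the loop from the question (the batch size and pause are illustrative; snapshot() keeps the cursor from revisiting documents that are moved when they grow):

var processed = 0;
db.events.find().snapshot().forEach(function (e) {
    e.destructured_created_at = [e.created_at]; // placeholder values, as above
    e.destructured_updated_at = [e.updated_at];
    db.events.save(e);
    if (++processed % 1000 === 0) {
        print("processed " + processed);
        sleep(1000); // back off between batches so other operations get the lock
    }
});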
One feature you could make use of here is priority queues that dictate that such an update is a background task that shouldn't affect the current workings of your mongod too much. I hear such a feature is due (can't find the JIRA :/).
Please correct me if I am wrong but it seems you can't use the mongo update command when you need to access current's element data along the way.
You are correct.

Mongodb how to get the sum of two columns and save it to another column

Hi, my MongoDB collection looks like this:
{ "created" : ISODate("2013-01-01T00:00:00Z"), "total_page_impression_count" : 500, "total_page_story_count" : 7 }
{ "created" : ISODate("2013-01-02T00:00:00Z"), "total_page_impression_count" : 511, "total_page_story_count" : 7 }
{ "created" : ISODate("2013-01-03T00:00:00Z"), "total_page_impression_count" : 513, "total_page_story_count" : 7 }
and I want to persist the value of (total_page_impression_count + total_page_story_count) in another field in the same collection. Does anyone know a way to do that? From what I have found so far, it is not possible the way it is in SQL. I'll be grateful if anyone can help. Thank you in advance.
You can iterate over all documents and write the aggregate field manually, using your language driver or the JavaScript shell. Yes, there is no update users set full_name = first_name + last_name kind of command in MongoDB.
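A minimal shell sketch of that iteration (using "yourtable" as a stand-in collection name and "total_count" as the new field; both are assumptions):

db.yourtable.find().forEach(function (doc) {
    db.yourtable.update(
        { _id: doc._id },
        { $set: { total_count: doc.total_page_impression_count + doc.total_page_story_count } }
    );
});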
I think application logic is just that: application logic. Saving aggregation results to create a new level of denormalization, with a questionable performance gain, is first of all the developer's choice. So does 10gen really need to implement such functionality at all?
(translated with Google Translate)
MongoDB is not an RDB.
MongoDB has no "join" function.
db.yourtable.insert({ "created" : ISODate("2013-01-01T00:00:00Z"), "total_page_impression_count" : 500, "total_page_story_count" : 7 })
You need to insert twice:
db.yourtable.insert({ "created" : ISODate("2013-01-01T00:00:00Z"), "total_page_impression_count" : 500, "total_page_story_count" : 7 });
db.othertable.insert({"total_page_story_count":507});

MongoDB: Doing $inc on multiple keys

I need help incrementing the value of all keys in participants without having to know the names of the keys inside it.
> db.conversations.findOne()
{
"_id" : ObjectId("4faf74b238ba278704000000"),
"participants" : {
"4f81eab338ba27c011000001" : NumberLong(2),
"4f78497938ba27bf11000002" : NumberLong(2)
}
}
I've tried something like
$mongodb->conversations->update(array('_id' => new \MongoId($objectId)), array('$inc' => array('participants' => 1)));
to no avail...
You need to redesign your schema. It is never a good idea to have "random key names". Even though MongoDB is schemaless, you still need to have defined key names. You should change your schema to:
{
"_id" : ObjectId("4faf74b238ba278704000000"),
"participants" : [
{ _id: "4f81eab338ba27c011000001", count: NumberLong(2) },
{ _id: "4f78497938ba27bf11000002", count: NumberLong(2) }
]
}
Sadly, even with that, you can't update all embedded counts in one command. There is currently an open feature request for that: https://jira.mongodb.org/browse/SERVER-1243
In order to still update everything, you should (see the sketch after this list):
query the document
update all the counts on the client side
store the document again
In order to prevent race conditions with that, have a look at "Compare and Swap" and the following paragraphs.
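Something like this minimal shell sketch (it assumes the redesigned array schema above plus a "version" field for the compare-and-swap; any field that changes on every write works):

var doc = db.conversations.findOne({ _id: ObjectId("4faf74b238ba278704000000") });
for (var i = 0; i < doc.participants.length; i++) {
    doc.participants[i].count = doc.participants[i].count + 1;
}
db.conversations.update(
    { _id: doc._id, version: doc.version },                            // compare: only if unchanged since the read
    { $set: { participants: doc.participants }, $inc: { version: 1 } } // swap
);
// If no document matched (check getLastError / the WriteResult),
// another writer got there first: re-read and retry.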
It is not possible to update all nested elements in one single operation in the current version of MongoDB, so I can advise using "foreach {}".
Read the related topic: How to Update Multiple Array Elements in mongodb
I hope this feature will be implemented in a future version.

In MongoDB, how does one get the value in a field for an embedded document, but query based on a different value

I have a basic structure like this:
> db.users.findOne()
{
"_id" : ObjectId("4f384903cd087c6f720066d7"),
"current_sign_in_at" : ISODate("2012-02-12T23:19:31Z"),
"current_sign_in_ip" : "127.0.0.1",
"email" : "something#gmail.com",
"encrypted_password" : "$2a$10$fu9B3M/.Gmi8qe7pXtVCPu94mBVC.gn5DzmQXH.g5snHT4AJSZYCu",
"last_sign_in_at" : ISODate("2012-02-12T23:19:31Z"),
"last_sign_in_ip" : "127.0.0.1",
"name" : "Trip Jameson",
"sign_in_count" : 100,
"usertimes" : [
...thousands and thousands of records like this one....
{
"enddate" : 348268392.115282,
"idle" : 0,
"startdate" : 348268382.116728,
"title" : "My Awesome Title"
},
]
}
So I want to find only the usertimes for a single user where the title is "My Awesome Title", and then I want to see what the value of "idle" was in that record (or records).
So far all I can figure out is that I can find the entire user record with a search like:
> db.users.find({'usertimes.title':"My Awesome Title"})
This just returns the entire User record though, which is useless for my purposes. Am I misunderstanding something?
Returning only partial embedded documents is currently not supported by MongoDB.
The matching user record will always be returned (at least with the current MongoDB version).
See this question for a similar case:
Filtering embedded documents in MongoDB
This is the corresponding JIRA issue in the MongoDB space:
http://jira.mongodb.org/browse/SERVER-142
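Until that is supported, you can filter on the client side, e.g. with a shell sketch like this:

db.users.find({ 'usertimes.title': "My Awesome Title" }).forEach(function (u) {
    u.usertimes.forEach(function (t) {
        if (t.title === "My Awesome Title") {
            print(u.name + ": idle = " + t.idle);
        }
    });
});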
Use:
db.users.find({'usertimes.title': "My Awesome Title"}, {'usertimes.idle': 1});
(Note that this still returns the idle values of all usertimes entries, not just the matching ones.)
May I suggest you take a more detailed look at http://www.mongodb.org/display/DOCS/Querying; it'll explain things for you.