Unique Values in NoSQL - MongoDB

Consider MongoDB or Couchbase. What if I need a certain value to be unique (maybe incremental) within the range of UINT32?
Well, I guess I could add a field like another_id and use something like this to increment it (mongo):
function getNextSequence(name) {
    // Atomically increment the named counter and return the new value.
    var ret = db.counters.findAndModify(
        {
            query: { _id: name },
            update: { $inc: { seq: 1 } },
            new: true
        }
    );
    return ret.seq;
}
db.users.insert(
    {
        another_id: getNextSequence("userid"),
        name: "Stack O. Flow"
    }
)
But really the question is:
Is this approach safe?
Should I even use NoSQL for this? (Consider that I only have around 50M rows of data, but I really need fast reads and writes, because these 50M rows are updated several times per second.)
If I should stick with SQL, which one should I use? I've used MySQL and it was too slow (though lack of optimization might be at fault; I was joining quite a few tables).
Thank you for any suggestions.

There is a specific counter object in Couchbase that should do what you want. Here is an example of it with Node.js.
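A minimal sketch, assuming the 2.x Node.js SDK's bucket.counter() API (the cluster address and bucket name are placeholders):

var couchbase = require('couchbase');
var cluster = new couchbase.Cluster('couchbase://localhost');
var bucket = cluster.openBucket('default');

// Atomically increment the counter document, creating it at 1 if it doesn't exist yet.
bucket.counter('user::kirk::counter', 1, { initial: 1 }, function (err, result) {
    if (err) throw err;
    console.log('next value:', result.value);
});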
You could relate it to the main object you are using by doing an objectID such as:
original_objectID::counter.
Then when you go to get the original object, you just do another get for the counter object by ID and you are done. You can increment it easily as well. So if you needed to get the object and the original objectID was
user::kirk
then that user's counter object would be:
user::kirk::counter
And you can get and set it by that ID. It works very well in Couchbase.

Related

Best way to count documents in MongoDB

We have a collection with a large number of documents, let's say around 100k. We now want to count the number of documents which have the key x set.
If I try it with Collection.countDocuments({ x: { $exists: true } }) I get the result, but it instantly creates a warning in the console: Query Targeting: Scanned Objects / Returned has gone above 1000.
So, is there a better way to count the documents? There is an index on the field; is it possible to get the length of the index?
Thanks
There's no real way of viewing the index trees in Mongo; what other people have linked just returns the size of the tree, and I'm not sure how useful that information is in this context.
Now to your question: is this the best way to count?
The answer is yes ... -ish.
countDocuments is a wrapper function; it just simulates the following pipeline:
db.collection.aggregate([
    { $match: <query> },
    { $group: { _id: null, n: { $sum: 1 } } }
])
This pipeline is the most efficient way to go, but the difference between running this aggregation and using the wrapper function is about 100-200 milliseconds, depending on your machine spec.
Meaning, if you're looking for "way" better performance, you're not going to find it.
With that said, this warning is stupid here: it just means you have more than 1000 documents with that field. Its true purpose is to alert you when you're trying to query 1-20 documents without a proper index.
You can use the indexSizes field returned by the stats() method.
The stats() method "Returns statistics about the collection".
See the example here:
https://docs.mongodb.com/manual/reference/method/db.collection.stats/#basic-stats-lookup
{
    ...,
    "indexSizes" : {
        "_id_" : 237568,
        "cuisine_1" : 143360,
        "borough_1_cuisine_1" : 151552,
        "borough_1_address.zipcode_1" : 151552
    },
    ...
}
Note that the indexSizes keys return the size as in storage space used, not a document count.
Check with explain() whether the index is actually being used (update that in the question as well).
You can use the hint option to check the performance after specifying an index.
Alternatively, pre-calculating the count with the $inc operator might be a good option, if your use case allows it.
Try cursor.count() to see if it's faster; countDocuments should be faster, but there's no harm in checking:
https://docs.mongodb.com/manual/reference/method/cursor.count/
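A quick sketch of those checks in the shell (the collection name and the { x: 1 } index are assumed from the question):

// Confirm the index is used: look for an IXSCAN stage in the winning plan.
db.collection.find({ x: { $exists: true } }).explain("executionStats");

// Force a candidate index with hint() and compare the execution stats.
db.collection.find({ x: { $exists: true } }).hint({ x: 1 }).explain("executionStats");

// Or maintain a precomputed counter, bumping it with $inc on every insert/delete.
db.counters.updateOne({ _id: "x_count" }, { $inc: { n: 1 } }, { upsert: true });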

optimizing query for $exists in sub property

I need to search for the existence of a property that is within another object.
The collection contains documents that look like:
{
    "properties": {
        "source": {
            "a/name": 12837,
            "a/different/name": 76129
        }
    }
}
As you can see below, part of the query string is from a variable.
With some help from JohnnyHK (see mongo query - does property exist? for more info), I've got a query that works by doing the following:
var name = 'a/name';
var query = {};
query['properties.source.' + name] = { $exists: true };
collection.find(query).toArray(function (err, docs) { /* ... */ });
Now I need to see if I can index the collection to improve the performance of this query.
I don't have a clue how to do this or if it is even possible to index for this.
Suggestions?
Two things are happening here.
First, you are probably looking for sparse indexes:
http://docs.mongodb.org/manual/core/index-sparse/
In your case it could be a sparse index on the "properties.source.a/name" field. Creating an index on the field will dramatically improve your query lookup time.
db.yourCollectionName.createIndex( { "properties.source.a/name": 1 }, { sparse: true } )
Second: whenever you want to know whether your query is fast or slow, use the mongo console, run your query, and call the explain method on its result.
db.yourCollectionName.find(query).explain();
Thanks to it you will know whether your query uses indexes, how many documents it had to check in order to complete, and some other useful information.

Keeping the default mongo _id and a unique index in MongoDB

Is it good or bad practice to keep the standard "_id" generated by mongo in a document, as well as my own unique identifier such as "name"? Or should I just replace the generated _id with the actual name, so my documents will look like this:
{
    _id: "782yb238b2327b3",
    name: "my_name"
}
or just like this:
{
    _id: "my_name"
}
This depends on the scenario. There is nothing wrong with having your own unique ID; it may be a string or a number, and it completely depends on your situation, as long as it's unique. The important thing is that you are in charge of it. You would want to add an index to it, of course.
For example, I have an additional ID field which is a number called 'ID', because I required a sequential number as an identifier. Another use case may be that you're migrating an application, so you have to conform to a particular sequence pattern.
The sequences for the unique identifiers could easily be stored in a separate document/collection, as sketched below.
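A sketch of that separate counters collection (the same findAndModify pattern as in the first question; the sequence name is illustrative):

// Seed one counter document per sequence.
db.counters.insert({ _id: "userID", seq: 0 });
// Atomically claim the next value whenever a new document needs an ID.
db.counters.findAndModify({
    query: { _id: "userID" },
    update: { $inc: { seq: 1 } },
    new: true
});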
There is no issue with using the built-in _id if you have no requirement for a custom one. An interesting fact is that you can get the creation date out of the _id. Always useful:
db.col.insert( { name: "test" } );
var doc = db.col.findOne( { name: "test" } );
var timestamp = doc._id.getTimestamp(); // creation time embedded in the ObjectId

mongo: multiple queries or not?

I'm wondering the best way to query MongoDB for many objects, where each one has an array of _ids attached to it. I want to grab the referenced objects as well. The objects' schemas look like this:
var headlineSchema = new Schema({
title : String,
source : String,
edits : Array // list of edits, stored as an array of _id's
...
});
...and the one that's referenced, if needed:
var messageSchema = new Schema({
message : String,
user : String,
headlineID : ObjectId // also contains a ref. back to headline it's incl. in
...
});
One part of the problem (well, depending on whether I want to keep going down this route) is that pushing the message ids is not working (edits remains an empty array [] afterwards):
db.headline.update({_id : headlineid}, {$push: {edits : messageid} }, true);
When I do my query, I need to grab about 30 'headlines' at a time, and each one could contain references to up to 20 or 30 'messages'. My question is, what is the best way to fetch all of these things? I know mongo isn't a relational db, so what I'm intending is to first grab the headlines that I need, and then loop through all 30 of them to grab any attached messages.
db.headline.find({ 'date': { $gte: start, $lt: end } }, function (err, docs) {
    if (err) { console.log(err.message); }
    if (docs) {
        docs.forEach(function (doc) {
            doc.edits.forEach(function (ed) {
                db.messages.find({ _id: ed }, function (err, msg) {
                    // save stuff
                });
            });
        });
    }
});
This just seems wrong, but I'm unsure how else to proceed. Should I even bother with keeping an array of attached messages? I'm not married to the way I've set up my schema, either. If there is a better way to track relationships between them, or a better query to achieve this, please let me know.
Thanks
Does each message belong to only one headline? If so, you can store the headline id as part of each message. Then for each headline, do:
db.messages.find({headline_id: current-headline-id-here})
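If you go this route, an index on that field keeps each lookup cheap (a sketch; the schema above calls the field headlineID):

// Index the back-reference so each headline's messages come from an index scan.
db.messages.createIndex({ headlineID: 1 });
db.messages.find({ headlineID: currentHeadlineId });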
You could try using the $in operator for selecting a list of ObjectIds
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24in
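For example, a sketch using the edits array from a fetched headline document:

// One round trip for all attached messages instead of one find() per id:
db.messages.find({ _id: { $in: doc.edits } });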

Ways to implement data versioning in MongoDB

Can you share your thoughts on how you would implement data versioning in MongoDB? (I've asked a similar question regarding Cassandra. If you have any thoughts on which db is better for this, please share.)
Suppose that I need to version records in a simple address book. (Address book records are stored as flat JSON objects.) I expect that the history:
will be used infrequently
will be used all at once to present it in a "time machine" fashion
won't contain more than a few hundred versions of a single record
won't expire
I'm considering the following approaches:
Create a new collection to store the history of records, or changes to the records. It would store one object per version, with a reference to the address book entry. Such records would look as follows:
{
    '_id': 'new id',
    'user': user_id,
    'timestamp': timestamp,
    'address_book_id': 'id of the address book record',
    'old_record': { 'first_name': 'Jon', 'last_name': 'Doe' ... }
}
This approach can be modified to store an array of versions per document, but that seems to be a slower approach without any advantages.
Store versions as serialized (JSON) objects attached to address book entries. I'm not sure how to attach such objects to MongoDB documents. Perhaps as an array of strings.
(Modelled after Simple Document Versioning with CouchDB)
The first big question when diving into this is "how do you want to store changesets"?
Diffs?
Whole record copies?
My personal approach would be to store diffs. Because the display of these diffs is really a special action, I would put the diffs in a different "history" collection.
I would use a separate collection to save memory. You generally don't want the full history for a simple query, so by keeping the history out of the object you also keep it out of the commonly accessed memory when that data is queried.
To make my life easy, I would make a history document contain a dictionary of time-stamped diffs. Something like this:
{
    _id : "id of address book record",
    changes : {
        1234567 : { "city" : "Omaha", "state" : "Nebraska" },
        1234568 : { "city" : "Kansas City", "state" : "Missouri" }
    }
}
To make my life really easy, I would make this part of my DataObjects (EntityWrapper, whatever) that I use to access my data. Generally these objects have some form of history, so that you can easily override the save() method to make this change at the same time.
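A rough sketch of that idea (a hypothetical wrapper; the collection names and the timestamped-diff format follow the example above):

// Hypothetical save() override: apply the change and record the previous
// values under a timestamped key in the companion history document.
function saveWithHistory(doc, changes) {
    var ts = Date.now();
    var oldValues = {};
    Object.keys(changes).forEach(function (k) {
        oldValues["changes." + ts + "." + k] = doc[k];
    });
    db.history.update({ _id: doc._id }, { $set: oldValues }, { upsert: true });
    db.addressbook.update({ _id: doc._id }, { $set: changes });
}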
UPDATE: 2015-10
It looks like there is now a spec for handling JSON diffs (JSON Patch, RFC 6902). This seems like a more robust way to store the diffs/changes.
There is a versioning scheme called "Vermongo" which addresses some aspects which haven't been dealt with in the other replies.
One of these issues is concurrent updates, another one is deleting documents.
Vermongo stores complete document copies in a shadow collection. For some use cases this might cause too much overhead, but I think it also simplifies many things.
https://github.com/thiloplanz/v7files/wiki/Vermongo
Here's another solution using a single document for the current version and all old versions:
{
    _id: ObjectId("..."),
    data: [
        { vid: 1, content: "foo" },
        { vid: 2, content: "bar" }
    ]
}
data contains all versions. The data array is ordered; new versions only get $pushed to the end of the array. data.vid is the version id, which is an incrementing number.
Get the most recent version:
find(
    { "_id": ObjectId("...") },
    { "data": { $slice: -1 } }
)
Get a specific version by vid:
find(
    { "_id": ObjectId("...") },
    { "data": { $elemMatch: { "vid": 1 } } }
)
Return only specified fields:
find(
    { "_id": ObjectId("...") },
    { "data": { $elemMatch: { "vid": 1 } }, "data.content": 1 }
)
Insert new version: (and prevent concurrent insert/update)
update(
    {
        "_id": ObjectId("..."),
        $and: [
            { "data.vid": { $not: { $gt: 2 } } },
            { "data.vid": 2 }
        ]
    },
    { $push: { "data": { "vid": 3, "content": "baz" } } }
)
2 is the vid of the current most recent version and 3 is the new version being inserted. Because you need the most recent version's vid, it's easy to get the next version's vid: nextVID = oldVID + 1.
The $and condition will ensure that 2 is the latest vid.
This way there's no need for a unique index, but the application logic has to take care of incrementing the vid on insert, as sketched below.
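In application code that flow could look like this (a sketch; the collection name docs is a placeholder):

// Read the latest version to learn its vid...
var doc = db.docs.findOne({ "_id": id }, { "data": { $slice: -1 } });
var oldVID = doc.data[0].vid;
// ...then attempt the guarded push; if nothing matched, a concurrent writer won and you retry.
db.docs.update(
    { "_id": id, $and: [ { "data.vid": { $not: { $gt: oldVID } } }, { "data.vid": oldVID } ] },
    { $push: { "data": { "vid": oldVID + 1, "content": "new content" } } }
);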
Remove a specific version:
update(
    { "_id": ObjectId("...") },
    { $pull: { "data": { "vid": 2 } } }
)
That's it!
(remember the 16MB per document limit)
If you're looking for a ready-to-roll solution:
Mongoid has built-in simple versioning
http://mongoid.org/en/mongoid/docs/extras.html#versioning
mongoid-history is a Ruby plugin that provides a significantly more complicated solution with auditing, undo and redo
https://github.com/aq1018/mongoid-history
I worked through this solution, which accommodates published, draft, and historical versions of the data:
{
    published: {},
    draft: {},
    history: {
        "1": {
            metadata: <value>,
            document: {}
        },
        ...
    }
}
I explain the model further here: http://software.danielwatrous.com/representing-revision-data-in-mongodb/
For those that may implement something like this in Java, here's an example:
http://software.danielwatrous.com/using-java-to-work-with-versioned-data/
It includes all the code, which you can fork if you like:
https://github.com/dwatrous/mongodb-revision-objects
If you are using mongoose, I have found the following plugin to be a useful implementation of the JSON Patch format
mongoose-patch-history
Another option is to use the mongoose-history plugin.
let mongoose = require('mongoose');
let mongooseHistory = require('mongoose-history');
let Schema = mongoose.Schema;

let MySchema = new Schema({
    title: String,
    status: Boolean
});
MySchema.plugin(mongooseHistory);
// The plugin will automatically create a new collection with the schema name + "_history".
// In this case, a collection named "my_schema_history" will be created.
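Usage then looks like regular mongoose (a sketch; the model name Post is assumed):

let Post = mongoose.model('Post', MySchema);
// Saving through mongoose now also records the change in the history collection.
new Post({ title: 'Hello', status: true }).save();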
I have used the package below for a Meteor/MongoDB project, and it works well. The main advantage is that it stores history/revisions within an array in the same document, so there is no need for additional publications or middleware to access the change history. It can support a limited number of previous versions (e.g. the last ten versions), and it also supports change concatenation (so all changes that happened within a specific period will be covered by one revision).
nicklozon/meteor-collection-revisions
Another sound option is to use Meteor Vermongo (here)