Mongo - most efficient way to $inc values in this schema - mongodb

I'm a Mongo newbie, and trying to make this schema work.
It's intended for logging events as they happen on a high-traffic website, just incrementing the number of times a certain action happens.
{
_id: "20110924",
Vals:[
{ Key: "SomeStat1", Value: 1 },
{ Key: "SomeStat2", Value: 2 },
{ Key: "SomeStat3", Value: 3 }
]
}
,
{
_id: "20110925",
Vals:[
{ Key: "SomeStat1", Value: 3 },
{ Key: "SomeStat8", Value: 13 },
{ Key: "SomeStat134", Value: 63 }
]
}, etc.
So _id here is the date, then an array of the different stats, and the number of times they've occurred. There may be no stats for that day, and the stat keys can be dynamic.
I'm looking for the most efficient way to achieve these updates, avoiding race conditions...so ideally all atomic.
I've gotten stuck trying to do the $inc. When I specify that it should upsert, it uses the whole document as the match condition, so it fails on duplicate keys. Similarly for $addToSet: if I $addToSet with { Key: "SomeStat1" }, it won't be seen as a duplicate because matching is against the entire array element, so it gets inserted alongside the existing SomeStat1 value.
What's the best approach here? Is there a way to control how $addToSet matches? Or do I need a different schema?
Thanks in advance.

You're using a schema that makes atomic updates impossible. Do it like this:
{
_id: "20110924",
Vals: {
SomeStat1: 1,
SomeStat2: 2,
SomeStat3: 3,
}
}
Or you can skip the Vals subdocument and embed the stats directly in the main document.
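With the stats as an object, the whole operation becomes a single atomic upsert via dot notation. A minimal sketch, where the helper and collection names are hypothetical and stat keys are assumed to be valid MongoDB field names (no '.' and no leading '$'):

```javascript
// Sketch: build a dot-notation $inc spec for a dynamic stat key.
// Assumes stat keys are valid MongoDB field names (no '.' and no
// leading '$'); helper and collection names are hypothetical.
function buildStatInc(statKey, amount) {
  const inc = {};
  inc["Vals." + statKey] = amount;
  return { $inc: inc };
}

// Intended shell usage; the upsert atomically creates the day's
// document (and the stat field) if either is missing:
// db.stats.update({ _id: "20110924" }, buildStatInc("SomeStat1", 1), { upsert: true });
```

$inc creates the field if it does not exist, so no read-then-write is needed and there is no race window.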

Take a look at this article which explains the issues around serializing a Dictionary using the MongoDB C# driver. See also this work item on the C# driver.

Related

Mongo best practice to structure nested document array

I've been struggling to find a solution to the following problem and seem to get conflicting advice from various mongodb posts. I am trying to figure out how to correctly represent an "array" of sub-objects such that:
they can be upserted (i.e. updated or new element created if needed, in a single operation)
the ids of the objects are available as values that can be searched, not just as keys (which you can't really search on in mongo).
I have a structure that I can represent as an array (repr A):
{
_id: 1,
subdocs: [
{ sd_id: 1, title: t1 },
{ sd_id: 2, title: t2 },
...
]
}
or as a nested document (repr B)
{
_id: 1,
subdocs: {
1: { title: t1 },
2: { title: t2 },
...
}
}
I would like to be able to update OR insert (i.e. upsert) new subdocs without having to use extra in-application logic.
In repr B this is straight-forward as I can simply use set
$set: { "subdocs.3.title": "t3" }
in an update with upsert: true.
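The repr-B upsert path can be built dynamically with a small helper; names here are hypothetical, and subdoc ids are assumed to be valid field names (no '.' and no leading '$'):

```javascript
// Sketch: build the repr-B upsert path dynamically. Assumes subdoc
// ids are valid field names; helper/collection names are hypothetical.
function buildSubdocSet(sdId, title) {
  const set = {};
  set["subdocs." + sdId + ".title"] = title;
  return { $set: set };
}

// Intended usage:
// db.coll.update({ _id: 1 }, buildSubdocSet(3, "t3"), { upsert: true });
```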
In repr A it is possible to update an existing record using arrayFilters, with something like:
update({ _id: 1 }, { $set: { "subdocs.$[i].title": "t3" } }, { arrayFilters: [ { "i.sd_id": 3 } ], upsert: true })
The problem is that while the above will update an existing subobject, it will not create a new subobject (i.e. one with sd_id: 3) if it does not exist; it is not a true upsert. The docs claim that $[] supports upsert, but this does not work for me.
While repr B does allow for update/upserts there is no way to search on the ids of the subdocuments because they are now keys rather than values.
The only solution to the above is to use a denormalized representation with e.g. the id being stored as both a key and a value:
subdocs: {
1: { sd_id: 1, title: t1 },
2: { sd_id: 2, title: t2 },
...
}
But this seems precarious (because the values might get out of sync).
So my question is whether there is a way around this? Am I perhaps missing a way to do an upsert in case A?
UPDATE: I found a workaround that lets me effectively use repr A, even though I'm not sure it's optimal. It involves using two writes rather than one:
update({ _id: 1, "subdocs.sd_id": { $ne: 3 } }, { $push: { subdocs: { sd_id: 3 } } })
update({ _id: 1 }, { $set: { "subdocs.$[i].title": "t3" } }, { arrayFilters: [ { "i.sd_id": 3 } ] })
The first line above ensures that we only ever insert one subdoc with sd_id 3 (it has an effect only if that id does not already exist), while the second line updates the record (which should now definitely exist). I can put both in an ordered bulkWrite to make it all work.
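The ordered two-write approach can be sketched as a single bulkWrite call; the helper name is hypothetical, and this assumes MongoDB 3.6+ for arrayFilters:

```javascript
// Sketch: the two-step upsert as one ordered bulkWrite (MongoDB 3.6+
// for arrayFilters; helper name is hypothetical). Ordered execution
// guarantees the stub $push runs before the $set.
function buildUpsertOps(docId, sdId, title) {
  return [
    { updateOne: {
        // Only push a stub if no subdoc with this sd_id exists yet.
        filter: { _id: docId, "subdocs.sd_id": { $ne: sdId } },
        update: { $push: { subdocs: { sd_id: sdId } } }
    } },
    { updateOne: {
        // Now the subdoc definitely exists; set its title.
        filter: { _id: docId },
        update: { $set: { "subdocs.$[i].title": title } },
        arrayFilters: [ { "i.sd_id": sdId } ]
    } }
  ];
}

// Intended usage:
// db.coll.bulkWrite(buildUpsertOps(1, 3, "t3"), { ordered: true });
```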

MongoDB - Aggregation, group by an array value

I have the following document structure:
{
_id: ...,
name: "Item1",
Props: [
{
Key: "numberKey",
Val: 1234
},
{
Key: "dateKey",
Val: Date("2013-09-09")
}]
}
This is simplified and there can be various Keys and Values in Props field in the real application.
My question - is it possible to $group and $sum "numberKey"s by "dateKey"s?
What structure should I use if this is not possible? I need to let users add keys and values, so I need something flexible.
Unfortunately, that isn't possible with aggregation on your schema. The problem is that aggregation operates on the values selected by the $group clause, and each element must carry all the data needed; your setup separates what you want to group by from what you want to sum. You could use a mapReduce job instead. http://docs.mongodb.org/manual/core/map-reduce/ should get you started.
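A minimal sketch of what such a mapReduce pair could look like, dry-run here against the example document (the emit() stand-in and the collection name are assumptions, not part of the original):

```javascript
// Sketch: a map/reduce pair that groups "numberKey" sums by "dateKey".
// The Key names match the example document; emit() below is a local
// stand-in for the shell's emit so the pair can be dry-run here.
const emitted = [];
function emit(key, value) { emitted.push({ key: key, value: value }); }

function mapProps() {
  var dateVal = null;
  var numberVal = 0;
  this.Props.forEach(function (p) {
    if (p.Key === "dateKey") dateVal = p.Val;
    if (p.Key === "numberKey") numberVal = p.Val;
  });
  if (dateVal !== null) emit(dateVal, numberVal);
}

function reduceProps(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Dry run against the example document (date kept as a string here):
mapProps.call({
  Props: [
    { Key: "numberKey", Val: 1234 },
    { Key: "dateKey", Val: "2013-09-09" }
  ]
});
const total = reduceProps("2013-09-09", emitted.map(function (e) { return e.value; }));

// Intended server-side usage (collection name hypothetical):
// db.items.mapReduce(mapProps, reduceProps, { out: { inline: 1 } });
```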
Let me know if you have any other questions.
Best,
Charlie

JSON Schema with dynamic key field in MongoDB

We want i18n support for objects stored in a mongodb collection.
Currently our schema is like:
{
_id: "id",
name: "name",
localization: [{
lan: "en-US",
name: "name_in_english"
}, {
lan: "zh-TW",
name: "name_in_traditional_chinese"
}]
}
but my thought is that the "lan" field is unique, so can I just use it as a key? Then the structure would be
{
_id: "id",
name: "name",
localization: {
"en-US": "name_in_english",
"zh-TW": "name_in_traditional_chinese"
}
}
which would be neater and easier to parse (just localization[language] would get the value I want for a specific language).
But then the question is: Is this a good practice in storing data in MongoDB? And how to pass the json-schema check?
It is not a good practice to have values as keys. The language codes are values and as you say you can not validate them against a schema. It makes querying against it impossible. For example, you can't figure out if you have a language translation for "nl-NL" as you can't compare against keys and neither is it possible to easily index this. You should always have descriptive keys.
However, as you say, having the languages as keys makes it a lot easier to pull the data out as you can just access it by ['nl-NL'] (or whatever your language's syntax is).
I would suggest an alternative schema:
{
your_id: "id_for_name",
lan: "en-US",
name: "name_in_english"
}
{
your_id: "id_for_name",
lan: "zh-TW",
name: "name_in_traditional_chinese"
}
Now you can:
set an index on { your_id: 1, lan: 1 } for speedy lookups
query for each translation individually and just get that translation:
db.so.find( { your_id: "id_for_name", lan: 'en-US' } )
query for all the versions for each id using this same index:
db.so.find( { your_id: "id_for_name" } )
and also much more easily update the translation for a specific language:
db.so.update(
{ your_id: "id_for_name", lan: 'en-US' },
{ $set: { name: "ooga" } }
)
Neither of those points are possible with your suggested schemas.
Obviously the second schema example is much better suited to your task (and if the lan field is unique, as you mentioned, that holds here too).
Getting an element from a dictionary/associative array/mapping/whatever it's called in your language is much cheaper than scanning a whole array of values. It's also more efficient from a storage-size point of view: remember that all fields are stored in MongoDB as-is, so every record holds the full key name for each field, not a compressed representation or index.
My experience shows that MongoDB is mature enough to be used as the main storage for your application, even under high load (whatever that means ;) ). The main problem is how you fight database-level locks (well, we'll wait for the promised table-level locks, which I hope will speed MongoDB up a lot more), though data loss is possible if your MongoDB cluster is built badly (dig into the docs and articles around the Internet for more information).
As for the schema check, you must do it in your programming language on the application side before inserting records; that's why Mongo is called schemaless.
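A minimal sketch of such an application-side check, assuming the per-translation schema suggested above (the validator and its locale regex are illustrative assumptions, not an official json-schema mechanism):

```javascript
// Sketch: minimal application-side shape check before insert. The
// validator and its locale regex are assumptions for illustration.
function isValidTranslation(doc) {
  return typeof doc.your_id === "string" &&
         /^[a-z]{2}-[A-Z]{2}$/.test(doc.lan || "") &&
         typeof doc.name === "string" &&
         doc.name.length > 0;
}

// e.g. run over each record before db.so.insert(record).
```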
There is a case where an object is necessarily better than an array: supporting upserts into a set. For example, if you want to update an item having name 'item1' to have val 100, or insert such an item if one doesn't exist, all in one atomic operation. With an array, you'd have to do one of two operations. Given a schema like
{ _id: 'some-id', itemSet: [ { name: 'an-item', val: 123 } ] }
you'd have commands
// Update:
db.coll.update(
{ _id: id, 'itemSet.name': 'item1' },
{ $set: { 'itemSet.$.val': 100 } }
);
// Insert:
db.coll.update(
{ _id: id, 'itemSet.name': { $ne: 'item1' } },
{ $addToSet: { 'itemSet': { name: 'item1', val: 100 } } }
);
You'd have to query first to know which is needed in advance, which can exacerbate race conditions unless you implement some versioning. With an object, you can simply do
db.coll.update(
{ _id: id },
{ $set: { 'itemSet.item1': { val: 100 } } }
);
If this is a use case you have, then you should go with the object approach. One drawback is that querying for a specific name requires scanning. If that is also needed, you can add a separate array specifically for indexing. This is a trade-off with MongoDB. Upserts would become
db.coll.update(
{ _id: id },
{
$set: { 'itemSet.item1': { val: 100 } },
$addToSet: { itemNames: 'item1' }
}
);
and the query would then simply be
db.coll.find({ itemNames: 'item1' })
(Note: the $ positional operator does not support array upserts.)
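One way to sketch this as a reusable helper for arbitrary item names, keying the set by name and keeping the denormalized itemNames array in sync (the helper name is hypothetical, and names are assumed to be valid field names, i.e. no '.' or leading '$'):

```javascript
// Sketch: build the combined upsert for an arbitrary item name, keying
// the set by name and keeping the itemNames index array in sync.
// Helper name is hypothetical; assumes names are valid field names.
function buildItemUpsert(name, val) {
  const set = {};
  set["itemSet." + name] = { val: val };
  return { $set: set, $addToSet: { itemNames: name } };
}

// Intended usage:
// db.coll.update({ _id: id }, buildItemUpsert("item1", 100));
```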

mongodb mapreduce function does not provide skip functionality, is there any solution to this?

MongoDB's mapReduce function does not provide any way to skip records in the database the way find's skip does. It offers query, sort & limit options, but I want to skip some records and I can't find a way to do it. Please provide solutions.
Thanks in advance.
Ideally a well-structured map-reduce query would allow you to skip particular documents in your collection.
Alternatively, as Sergio points out, you can simply not emit particular documents in map(). Using scope to define a global counter variable is one way to restrict emit to a specified range of documents. As an example, to skip the first 20 docs that are sorted by ObjectID (and thus, sorted by insertion time):
db.collection_name.mapReduce(map, reduce, { out: "example_output", sort: { _id: 1 }, scope: { counter: 0 } });
Map function:
function() {
counter++;
if (counter > 20) {
emit(key, value);
}
}
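The counter trick can be dry-run outside the shell; here a local variable stands in for the scope document and a stub replaces emit() (both are test scaffolding, not part of mapReduce itself):

```javascript
// Dry run of the counter-based skip. `counter` stands in for the
// mapReduce scope document ({ counter: 0 }), and emit() is a stub
// collector; neither exists like this on the server.
let counter = 0;
const emitted = [];
function emit(key, value) { emitted.push(key); }

// Same shape as the map function above: skip the first 20 documents.
function map() {
  counter++;
  if (counter > 20) {
    emit(this._id, 1);
  }
}

// Feed 30 fake documents through in sorted order:
for (let i = 1; i <= 30; i++) {
  map.call({ _id: i });
}
```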
I'm not sure in which version this feature first appeared, but certainly since MongoDB 2.6 the mapReduce() function provides a query parameter:
query : document
Optional. Specifies the selection criteria, using query operators, for determining the documents input to the map function.
Example
Consider the following map-reduce operations on a collection orders that contains documents of the following prototype:
{
_id: ObjectId("50a8240b927d5d8b5891743c"),
cust_id: "abc123",
ord_date: new Date("Oct 04, 2012"),
status: 'A',
price: 25,
items: [ { sku: "mmm", qty: 5, price: 2.5 },
{ sku: "nnn", qty: 5, price: 2.5 } ]
}
Perform the map-reduce operation on the orders collection using the mapFunction2, reduceFunction2, and finalizeFunction2 functions.
db.orders.mapReduce( mapFunction2,
reduceFunction2,
{
out: { merge: "map_reduce_example" },
query: { ord_date:
{ $gt: new Date('01/01/2012') }
},
finalize: finalizeFunction2
}
)
This operation uses the query field to select only those documents with ord_date greater than new Date('01/01/2012'). It then outputs the results to the collection map_reduce_example. If the map_reduce_example collection already exists, the operation merges the existing contents with the results of this map-reduce operation.

mongoDB: unique index on a repeated value

So I'm pretty new to mongoDB, so I figure this could be a misunderstanding of general usage, so bear with me.
I have a document schema I'm working with as such
{
name: "bob",
email: "bob@gmail.com",
logins: [
{ u: 'a', p: 'b', public_id: '123' },
{ u: 'x', p: 'y', public_id: 'abc' }
]
}
My problem is that I need to ensure that the public ids are unique within a document and across the collection.
Furthermore, there are some existing records being migrated from a MySQL DB that don't have these values, and they will therefore all be replaced by null values in mongo.
I figure its either an index
db.users.ensureIndex({ "logins.public_id": 1 }, { unique: true });
which isn't working because of the missing keys and is throwing an E11000 duplicate key error.
Or this is a more fundamental schema problem, in that I shouldn't be nesting objects in an array structure like that. In which case, what? A separate collection for the user logins? That seems to go against the idea of an embedded document.
If you expect u and p to always have the same values on each insert (as in your example snippet), you might want to use the $addToSet operator on inserts to ensure the uniqueness of your public_id field. Otherwise I think it's quite difficult to make them unique across a whole collection without external maintenance or js functions.
If not, I would store them in their own collection and use the public_id as the _id field to ensure their cross-document uniqueness inside a collection. Maybe that contradicts the idea of embedded docs in a document database, but depending on your requirements I think that's negligible.
Furthermore there are some existing records being migrated from a MySQL DB that don't have these values, and will therefore all be replaced by null values in mongo.
So you want to apply a unique index to a data set that's not truly unique. I think this is just a modeling problem.
If a null logins.public_id is going to violate your uniqueness constraint, then just don't write the field at all:
{
logins: [
{ u: 'a', p: 'b' },
{ u: 'x', p: 'y' }
]
}
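A minimal sketch of stripping the nulls before insert (the helper name is hypothetical):

```javascript
// Sketch: drop null/undefined public_id values before insert so they
// never reach the unique index (helper name is hypothetical).
function cleanLogins(logins) {
  return logins.map(function (login) {
    const copy = Object.assign({}, login);
    // `== null` matches both null and undefined.
    if (copy.public_id == null) delete copy.public_id;
    return copy;
  });
}
```

You would likely also make the unique index sparse, so that documents with no public_id at all are skipped rather than all colliding on null.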
Thanks all.
In the end I opted to separate this into two collections, one for users and one for logins.
For users, this looked a little like:
userDocument = {
...
logins: [
DBRef('loginsCollection', loginDocument._id),
DBRef('loginsCollection', loginDocument2._id),
]
}
loginDocument = {
...
user: new DBRef('userCollection', userDocument._id)
}
Although not what I was originally after (a single collection), it is working nicely, and by utilising MongoDB's _id uniqueness there is a constraint now built in at the database level rather than implemented at the application level.