Is it possible to store documents with multiple formats in one lift-mongodb-record model? - mongodb

I am trying to build a model that can insert documents with a varying field (only one field differs between formats).
Also, that field has a custom format and will be an embedded object.
The documentation from Assembla uses a case class to define the embedded field, so I can't use a base class (trait/abstract class) to define it in my record model.
I would like to insert different formats, such as:
{
  _id: ...,
  same_field: "s1",
  specify_field: {
    format1: "..."
  }
}
{
  _id: ...,
  same_field: "s1",
  specify_field: {
    formatX: "..",
    formatY: ".."
  }
}
How should I construct my model?
Thanks.

Related

In MongoDB, can we get distinct values of a field regardless of hierarchy?

I am creating an application to store and display multiple hierarchies. I am storing JSON data in a nested tree format like the following:
{
"text":"Node1",
"children":
[{
"text":"Node2",
"children":[...]
},
{
"text":"Node3",
"children":[...]
}]
}
Is there a way to get the distinct values for the field text regardless of where the field appears in the hierarchy? It could be the text of the parent, child, grandchild or further descendants.
Desired output is ["Node1","Node2","Node3",...]
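No answer is included here, but one straightforward approach is a sketch like the following: fetch the document and walk the tree recursively on the client side (rather than querying inside MongoDB), collecting every "text" value and de-duplicating. The field names "text" and "children" are taken from the question's document shape.

```javascript
// Recursively collect every "text" value from a nested { text, children } tree.
function collectTexts(node, out = []) {
  if (!node || typeof node !== 'object') return out;
  if (typeof node.text === 'string') out.push(node.text);
  for (const child of node.children || []) collectTexts(child, out);
  return out;
}

const tree = {
  text: 'Node1',
  children: [
    { text: 'Node2', children: [] },
    { text: 'Node3', children: [] }
  ]
};

// De-duplicate while preserving first-seen order.
const distinctTexts = [...new Set(collectTexts(tree))];
// distinctTexts: [ 'Node1', 'Node2', 'Node3' ]
```

For large collections this pays the cost of shipping whole documents to the client; a server-side aggregation would need a known maximum depth or a schema change.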

Insert multiple documents referenced by another Schema

I have the following two schemas:
var SchemaOne = new mongoose.Schema({
  id_headline: { type: String, required: true },
  tags: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Tag' }]
});

var tagSchema = new mongoose.Schema({
  _id: { type: String, required: true, index: { unique: true } }, // value
  name: { type: String, required: true }
});
As you can see, in the first schema there is an array of references to the second schema.
My problem is:
Suppose that, in my backend server, I receive an array of tags (just the id's) and, before creating the SchemaOne document, I need to verify if the received tags already exist in the database and, if not, create them. Only after having all the tags stored in the database, I may assign this received array to the tags array of the to be created SchemaOne document.
I'm not sure how to implement this. Can you give me a helping hand?
So let's assume you have input being sent to your server that essentially resolves to this:
var input = {
  "id_headline": "title",
  "tags": [
    { "name": "one" },
    { "name": "two" }
  ]
};
And as you state, you are not sure whether any of the "tags" entries already exist, but of course the "name" is also unique, so it can be used to look up the associated object.
What you are basically going to have to do here is "lookup" each of the elements within "tags" and return the document with the reference to use to the objects in the "Tag" model. The ideal method here is .findOneAndUpdate(), with the "upsert" option set to true. This will create the document in the collection where it is not found, and at any rate will return the document content with the reference that was created.
Note that naturally, you want to ensure you have those array items resolved "first", before proceeding to save the main "SchemaOne" object. The async library has some methods that help structure this:
async.waterfall(
  [
    function(callback) {
      async.map(input.tags, function(tag, callback) {
        Tag.findOneAndUpdate(
          { "name": tag.name },
          { "$setOnInsert": { "name": tag.name } },
          { "upsert": true, "new": true },
          callback
        );
      }, callback);
    },
    function(tags, callback) {
      Model.findOneAndUpdate(
        { "id_headline": input.id_headline },
        { "$addToSet": {
          "tags": { "$each": tags.map(function(tag) { return tag._id }) }
        }},
        { "upsert": true, "new": true },
        callback
      );
    }
  ],
  function(err, result) {
    // if err then do something to report it, otherwise it's done.
  }
);
So async.waterfall is a special flow-control method that passes the result returned from each of the functions specified in the array of arguments to the next one, right through to the final callback, which receives the result of the last function in the list. It basically "cascades" or "waterfalls" results down each step. This is exactly what is needed to pass the results of the "tags" creation into the main model creation/modification.
The async.map within the first executed stage looks at each of the elements within the array of the input. So for each item contained in "tags", the .findOneAndUpdate() method is called to look for and possibly create if not found, the specified "tag" entry in the collection.
Since the output of .map() is going to be an array of those documents, it is simply passed through to the next stage. Each iteration returns a document; when the iteration is complete, you have all the documents.
The next usage of .findOneAndUpdate() with "upsert" is optional, and of course considers that the document with the matching "id_headline" may or may not exist. The same case is true that if it is there then the "update" is processed, if not then it is simply created. You could optionally .insert() or .create() if the document was known not to be there, but the "update" action gives some interesting options.
Namely here is the usage of $addToSet, where if the document already existed then the specified items would be "added" to any content that was already there, and of course as a "set", any items already present would not be new additions. Note that only the _id fields are required here when adding to the array with an atomic operator, hence the .map() function employed.
An alternate case on "updating" could be to simply "replace" the array content using the $set atomic operation if it was the intent to only store those items that were mentioned in the input and no others.
In a similar manner the $setOnInsert shown when "creating"/"looking for" items in "Tags" makes sure that there is only actual "modification" when the object is "created/inserted", and that removes some write overhead on the server.
So using .findOneAndUpdate(), at least for the "Tags" entries, is the most optimal way of handling this. It avoids double handling such as:
Querying to see if the document exists by name
If no result is returned, then send an additional statement to create one
That means two operations to the database with communication back and forth, which the actions here using "upserts" simplifies into a single request for each item.

JSON Schema with dynamic key field in MongoDB

We want to have i18n support for objects stored in a MongoDB collection.
Currently our schema is like:
{
  _id: "id",
  name: "name",
  localization: [{
    lan: "en-US",
    name: "name_in_english"
  }, {
    lan: "zh-TW",
    name: "name_in_traditional_chinese"
  }]
}
But my thought is that the field "lan" is unique, so can I just use it as a key? The structure would then be:
{
  _id: "id",
  name: "name",
  localization: {
    "en-US": "name_in_english",
    "zh-TW": "name_in_traditional_chinese"
  }
}
which would be neater and easier to parse (just localization[language] would get the value I want for a specific language).
But then the question is: Is this a good practice in storing data in MongoDB? And how to pass the json-schema check?
It is not a good practice to have values as keys. The language codes are values, and as you say, you cannot validate them against a schema. It also makes querying against them impossible. For example, you can't figure out whether you have a translation for "nl-NL", as you can't compare against keys, and neither is it possible to easily index this. You should always have descriptive keys.
However, as you say, having the languages as keys makes it a lot easier to pull the data out as you can just access it by ['nl-NL'] (or whatever your language's syntax is).
I would suggest an alternative schema:
{
  your_id: "id_for_name",
  lan: "en-US",
  name: "name_in_english"
}
{
  your_id: "id_for_name",
  lan: "zh-TW",
  name: "name_in_traditional_chinese"
}
Now you can:
set an index on { your_id: 1, lan: 1 } for speedy lookups
query for each translation individually and just get that translation:
db.so.find( { your_id: "id_for_name", lan: 'en-US' } )
query for all the versions for each id using this same index:
db.so.find( { your_id: "id_for_name" } )
and also much more easily update the translation for a specific language:
db.so.update(
{ your_id: "id_for_name", lan: 'en-US' },
{ $set: { name: "ooga" } }
)
Neither of those points are possible with your suggested schemas.
Obviously the second schema example is much better for your task (provided the lan field is unique, as you mentioned; that seems true to me as well).
Getting an element from a dictionary/associative array/mapping/whatever it is called in your language is much cheaper than scanning a whole array of values. In the current case it is also more efficient in terms of storage size: remember that all fields are stored in MongoDB as-is, so every record holds the full key name of each JSON field, not some compact representation or index.
My experience shows that MongoDB is mature enough to be used as the main storage for your application, even under high load (whatever that means ;) ). The main problem is how you fight database-level locks (we are still waiting for the promised table-level locks, which I hope will speed MongoDB up considerably), though data loss is possible if your MongoDB cluster is built badly (dig into the docs and articles around the Internet for more information).
As for the schema check, you must do it in your programming language on the application side, before inserting records. That's why Mongo is called schemaless.
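Such an application-side check could be sketched like this (a minimal example, not a full validator; KNOWN_LANGUAGES is an assumed list of your supported codes, and a real project might use a JSON-Schema validator library instead):

```javascript
// Assumed set of language codes the application supports.
const KNOWN_LANGUAGES = new Set(['en-US', 'zh-TW']);

// Validate the localization array shape before inserting the document.
function isValidLocalization(doc) {
  return Array.isArray(doc.localization) &&
    doc.localization.every(entry =>
      KNOWN_LANGUAGES.has(entry.lan) && typeof entry.name === 'string');
}

const ok = isValidLocalization({
  localization: [{ lan: 'en-US', name: 'name_in_english' }]
});
const bad = isValidLocalization({
  localization: [{ lan: 'nl-NL', name: 'naam' }]
});
// ok === true, bad === false
```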
There is a case where an object is necessarily better than an array: supporting upserts into a set. For example, if you want to update an item having name 'item1' to have val 100, or insert such an item if one doesn't exist, all in one atomic operation. With an array, you'd have to do one of two operations. Given a schema like
{ _id: 'some-id', itemSet: [ { name: 'an-item', val: 123 } ] }
you'd have commands
// Update:
db.coll.update(
{ _id: id, 'itemSet.name': 'item1' },
{ $set: { 'itemSet.$.val': 100 } }
);
// Insert:
db.coll.update(
{ _id: id, 'itemSet.name': { $ne: 'item1' } },
{ $addToSet: { 'itemSet': { name: 'item1', val: 100 } } }
);
You'd have to query first to know which is needed in advance, which can exacerbate race conditions unless you implement some versioning. With an object, you can simply do
db.coll.update(
  { _id: id },
  { $set: { 'itemSet.item1': { val: 100 } } }
);
If this is a use case you have, then you should go with the object approach. One drawback is that querying for a specific name requires scanning. If that is also needed, you can add a separate array specifically for indexing. This is a trade-off with MongoDB. Upserts would become
db.coll.update(
  { _id: id },
  {
    $set: { 'itemSet.item1': { val: 100 } },
    $addToSet: { itemNames: 'item1' }
  }
);
and the query would then simply be
db.coll.find({ itemNames: 'item1' })
(Note: the $ positional operator does not support array upserts.)

mongodb pointer to another collection's item

Is it possible to point from one collection's item's value to another collection's item?
example:
db.col2.save( { value: 'test' } );
db.col1.save( { title: 'testing', something: [code to point to another collection's item] } );
db.col1.find().toArray()
[
  {
    "_id" : ObjectId([someobjectidhere]),
    "title" : "testing",
    "something" : {
      "value" : "test"
    }
  }
]
Yes, you can point to another document; however, unlike SQL, you can't do a join to retrieve both at the same time.
Therefore you would need to do two retrieves: one to get the first document (then extract the reference in code), and then use that reference to get the second document.
MongoDB does not support joins. In MongoDB some data is “denormalized,” or stored with related data in documents to remove the need for joins. However, in some cases it makes sense to store related information in separate documents, typically in different collections or databases.
You can refer to the documentation for DBRef here.
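The two-retrieve pattern described above can be sketched with plain in-memory arrays standing in for the two collections (with the real shell or driver, each lookup would be a findOne() call; the ids here are made up for illustration):

```javascript
// Stand-ins for db.col2 and db.col1; 'something' stores the reference.
const col2 = [{ _id: 'abc123', value: 'test' }];
const col1 = [{ _id: 'def456', title: 'testing', something: 'abc123' }];

// 1st retrieve: get the referring document.
const doc = col1.find(d => d.title === 'testing');

// Extract the stored reference in code, then do the 2nd retrieve.
const referenced = col2.find(d => d._id === doc.something);

// Stitch the result together in the application.
const combined = {
  _id: doc._id,
  title: doc.title,
  something: { value: referenced.value }
};
// combined.something.value === 'test'
```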

Ways to implement data versioning in MongoDB

Can you share your thoughts on how you would implement data versioning in MongoDB? (I've asked a similar question regarding Cassandra. If you have any thoughts about which db is better for this, please share.)
Suppose that I need to version records in a simple address book. (Address book records are stored as flat JSON objects.) I expect that the history:
will be used infrequently
will be used all at once to present it in a "time machine" fashion
there won't be more than a few hundred versions for a single record.
history won't expire.
I'm considering the following approaches:
Create a new collection to store the history of records, or changes to the records. It would store one object per version, with a reference to the address book entry. Such records would look as follows:
{
  '_id': 'new id',
  'user': user_id,
  'timestamp': timestamp,
  'address_book_id': 'id of the address book record',
  'old_record': { 'first_name': 'Jon', 'last_name': 'Doe' ... }
}
This approach can be modified to store an array of versions per document. But that seems to be a slower approach, without any advantages.
Store versions as serialized (JSON) objects attached to the address book entries. I'm not sure how to attach such objects to MongoDB documents. Perhaps as an array of strings.
(Modelled after Simple Document Versioning with CouchDB)
The first big question when diving in to this is "how do you want to store changesets"?
Diffs?
Whole record copies?
My personal approach would be to store diffs. Because the display of these diffs is really a special action, I would put the diffs in a different "history" collection.
I would use the separate collection to save memory. You generally don't want a full history returned for a simple query, so by keeping the history out of the object you also keep it out of the commonly accessed memory when that data is queried.
To make my life easy, I would make a history document contain a dictionary of time-stamped diffs. Something like this:
{
  _id : "id of address book record",
  changes : {
    1234567 : { "city" : "Omaha", "state" : "Nebraska" },
    1234568 : { "city" : "Kansas City", "state" : "Missouri" }
  }
}
To make my life really easy, I would make this part of my DataObjects (EntityWrapper, whatever) that I use to access my data. Generally these objects have some form of history, so that you can easily override the save() method to make this change at the same time.
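The diff part of that save() override could be sketched as follows (a sketch only: whether you store the old or the new values under each timestamp is a design choice, and the field/collection names in the comments are assumptions, not a fixed API):

```javascript
// For each changed top-level field, record the value being replaced,
// so it can be stored under a timestamp key in the history document.
function diffOldValues(oldDoc, newDoc) {
  const changes = {};
  for (const key of Object.keys(newDoc)) {
    if (oldDoc[key] !== newDoc[key]) changes[key] = oldDoc[key];
  }
  return changes;
}

const previous = { city: 'Omaha', state: 'Nebraska', zip: '68102' };
const updated  = { city: 'Kansas City', state: 'Missouri', zip: '68102' };

const changes = diffOldValues(previous, updated);
// changes: { city: 'Omaha', state: 'Nebraska' }

// A save() override would then, conceptually, do two writes:
//   db.history.update({ _id: recordId },
//     { $set: { ['changes.' + Date.now()]: changes } }, { upsert: true });
//   db.addressBook.save(updated);
```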
UPDATE: 2015-10
It looks like there is now a spec for handling JSON diffs. This seems like a more robust way to store the diffs / changes.
There is a versioning scheme called "Vermongo" which addresses some aspects which haven't been dealt with in the other replies.
One of these issues is concurrent updates, another one is deleting documents.
Vermongo stores complete document copies in a shadow collection. For some use cases this might cause too much overhead, but I think it also simplifies many things.
https://github.com/thiloplanz/v7files/wiki/Vermongo
Here's another solution using a single document for the current version and all old versions:
{
  _id: ObjectId("..."),
  data: [
    { vid: 1, content: "foo" },
    { vid: 2, content: "bar" }
  ]
}
data contains all versions. The data array is ordered; new versions will only get $pushed to the end of the array. data.vid is the version id, which is an incrementing number.
Get the most recent version:
find(
{ "_id":ObjectId("...") },
{ "data":{ $slice:-1 } }
)
Get a specific version by vid:
find(
{ "_id":ObjectId("...") },
{ "data":{ $elemMatch:{ "vid":1 } } }
)
Return only specified fields:
find(
{ "_id":ObjectId("...") },
{ "data":{ $elemMatch:{ "vid":1 } }, "data.content":1 }
)
Insert new version: (and prevent concurrent insert/update)
update(
  {
    "_id": ObjectId("..."),
    $and: [
      { "data.vid": { $not: { $gt: 2 } } },
      { "data.vid": 2 }
    ]
  },
  { $push: { "data": { "vid": 3, "content": "baz" } } }
)
2 is the vid of the current most recent version and 3 is the new version being inserted. Because you need the most recent version's vid anyway, it's easy to get the next version's vid: nextVID = oldVID + 1.
The $and condition will ensure that 2 is the latest vid.
This way there's no need for a unique index, but the application logic has to take care of incrementing the vid on insert.
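That application-side bookkeeping could be sketched as a small helper that, given the latest vid read beforehand, builds the guarded update shown above (the _id match is omitted here for brevity; the function name is illustrative):

```javascript
// Build the query-and-update pair for inserting version latestVid + 1,
// guarded so it only applies if latestVid is still the newest version.
function buildNewVersionOp(latestVid, content) {
  return {
    query: {
      $and: [
        { 'data.vid': { $not: { $gt: latestVid } } }, // no newer version exists
        { 'data.vid': latestVid }                      // the version we read is present
      ]
    },
    update: { $push: { data: { vid: latestVid + 1, content: content } } }
  };
}

const op = buildNewVersionOp(2, 'baz');
// op.update: { $push: { data: { vid: 3, content: 'baz' } } }
```

If the guarded update matches no document, another writer got there first, and the application should re-read and retry.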
Remove a specific version:
update(
{ "_id":ObjectId("...") },
{ $pull:{ "data":{ "vid":2 } } }
)
That's it!
(remember the 16MB per document limit)
If you're looking for a ready-to-roll solution -
Mongoid has built in simple versioning
http://mongoid.org/en/mongoid/docs/extras.html#versioning
mongoid-history is a Ruby plugin that provides a significantly more complicated solution with auditing, undo and redo
https://github.com/aq1018/mongoid-history
I worked through this solution, which accommodates published, draft, and historical versions of the data:
{
  published: {},
  draft: {},
  history: {
    "1" : {
      metadata: <value>,
      document: {}
    },
    ...
  }
}
I explain the model further here: http://software.danielwatrous.com/representing-revision-data-in-mongodb/
For those that may implement something like this in Java, here's an example:
http://software.danielwatrous.com/using-java-to-work-with-versioned-data/
It includes all the code, which you can fork if you like:
https://github.com/dwatrous/mongodb-revision-objects
If you are using mongoose, I have found the following plugin to be a useful implementation of the JSON Patch format:
mongoose-patch-history
Another option is to use mongoose-history plugin.
let mongoose = require('mongoose');
let mongooseHistory = require('mongoose-history');
let Schema = mongoose.Schema;

let MySchema = new Schema({
  title: String,
  status: Boolean
});
MySchema.plugin(mongooseHistory);
// The plugin will automatically create a new collection with the schema name + "_history".
// In this case, a collection named "my_schema_history" will be created.
I have used the package below for a Meteor/MongoDB project, and it works well. The main advantage is that it stores history/revisions within an array in the same document, so there is no need for additional publications or middleware to access the change history. It can keep a limited number of previous versions (e.g. the last ten), and it also supports change concatenation (so all changes that happened within a specific period are covered by one revision).
nicklozon/meteor-collection-revisions
Another sound option is to use Meteor Vermongo (here)