Mongo reference to field in the same document - mongodb

I am currently thinking about how to structure data for a node.js app i will be developping.
I am thinking of using Mongo as database, but cannot think of a good way to accomplish what I want. All my researches led me to cross-documents references, but this is not what i am looking for, as everything lies in my 'event' document.
My document consists in an event in which people take part that can be represented like that :
{
name: "Some event name",
people: ["A", "B", "C", "D", ... ], // Could be more complex objects, with name, mail for instance
actions : [
{
name: "Some action name"
someAttr: 4.21
leader: "A" // One of people
followers : ["A", "B", "D"] // from people too
},
...
]
}
As you can see, the people are repeated in the document. They can be repeated numerous times as each event will gather lots of actions.
Should I keep my schema this way, or is there something more clever to do such as referencing people from the actions with the index of the global people list with something like that ?
{
name: "Some event name",
people: ["A", "B", "C", "D", ... ],
actions : [
{
name: "Some action name"
someAttr: 4.21
leader: 0
followers : [0, 1, 3]
},
...
]
}
If so, what is the smartest way to accomplish that ?
Thank you very much
Note : This is a theoretical question, and I am a beginner in the Mongo and node.js world

The question "linking or embedding" has been discussed multiple times. A good start for your resaerch can be this question.
EDIT:
If (as you said) people are complex objects, than you should reference them by an id or a name. It makes no sense, to store the (complex) objects twice (or more) in the same document.

Related

MongoDB - Looking up documents based on criteria defined by the documents themselves

Overall, I am trying to find a system design to quickly look up stored objects whose metadata matches data bundled on incoming events. Which fields are required, however, are themselves part of the stored objects, and are not fields that I can hardcode into a lookup query.
My system has a policies collection stored in MongoDB with documents that look like this:
{
id: 123,
name: "Jason's Policy",
requirements: {
"var1": "aaa",
"var2": "bbb"
// could have any number more, and each policy can have different field/values under requirements
}
}
My system receives events that look like this:
// Event 1 - matches all requirements under above policy
{
id: 777,
"var1": "aaa",
"var2": "bbb"
}
// Event 2 - does not match all requirements from above policy since var1 is undefined
{
id: 888,
"var2": "bbb",
"var3": "zzz"
}
As I receive events, how can I efficiently look up all the policies whose requirements are fully satisfied by the values received in the event?
As an example, in the sample data above, event 1 should return the policy (since var1 and var2 match the policy requirements), but event 2 should not return the policy (since var1 does not match/ is missing).
I can think of brute-force ways to do this on the application server itself (think nested for loops) but efficiency will be key as we receive hundreds of events per second.
I am open to recommendations for document schema changes that can satisfy the general problem (looking up documents based on criteria itself defined in our documents). I am also open to any overall design recommendations that address the problem, too (perhaps there is a better way to structure our system to trigger policy actions in response to events).
Thanks!
Not sure what's the exact scenario but can think of 2 here,
You need an exact match. For that you can run the below querydb.getCollection('test').find({'requirements':{'var1':'aaa','var2':'bbb'}})
for above query to run you need to save requirements object after sorting it's keys var1 and var2.
You need to match all properties exists and don't care if anything is extra in policies collection. You need to change policies being stored as,
{
"_id" : ObjectId("603250b0775428e32b9b303f"),
"id" : 123,
"name" : "Jason's Policy",
"requirements" : {
"var1" : "aaa",
"var2" : "bbb"
},
"requirements_search" : [
"var1aaa",
"var2bbb",
"var3ccc"
]
}
then you can run the below query,
db.getCollection('test').find({'requirements_search':{'$all' : ['var1aaa','var2bbb']}})
I found an answer to my question in another post: Find Documents in MongoDB whose with an array field is a subset of a query array.
MongoDB offers a $setIsSubset operator that can check if a document's array values are a subset of the array values in a query. Translated to my use case: if a given policy's requirements are a subset of the event's metadata, then I know that the event data fully meets the requirements for that policy.
For completeness, below is the MongoDB aggregation that solved my problem. I still need to research if there is a more efficient overall system design to facilitate what I need, but at a minimum, this Mongo aggregation will fetch the results that I need.
// Requires us to flatten policy requirements into an array like the following
//
// {
// "id" : 123,
// "name" : "Jason's Policy",
// "requirements" : [
// "var1_aaa",
// "var2_bbb"
// ]
// }
//
// Event matches all policy requirements and has extra unrelated attributes
// {
// id: 777,
// "var1": "aaa",
// "var2": "bbb",
// "var3": "ccc"
// }
db.collection.aggregate([
{$project: {
doc: '$$ROOT',
isSubset: {$setIsSubset: ['$requirements', ['var1_aaa', 'var2_bbb', 'var3_ccc']]}
}},
{$match: {isSubset: true}},
{$project: {_id: 0, 'doc.name': 1}}
])

MongoDB, positional argument, two levels deep - still not possible?

I have a collection with the following structure:
event_id: "id",
votedUp: 0,
votedDown: 0,
questions: [
{
title: "title"
desc: "desc"
type: "TEXT"
answers: []
},
{
title: "title"
desc: "desc"
type: "MULTIPLE"
answers: [
{
value: "ans1",
answered: 0
},
{
value: "ans2",
answered: 2
},
]
}
]
Important thing here: question doesn't make any sense without an Event, so it can't have it's own ID - they are identified by their properties within given Event.
This is some kind of questionnaire, mixed a little bit with survey.
I have two use cases:
User completes questionnaire.
For TEXT questions, I need to push submitted answers.
For multi/single choice I need to increment 'answered' field.
How can I achieve this WITHOUT changing the model using mongodb? I've found this issue and I'm quite terrified. It seems like there's no way to do this with one query.
Is it still the case? Any workarounds?
The only idea I have is to retrieve document from database, edit it and save again, but since mongo has nothing to do with transactions, it could be dangerous...

Meteor Collection: find element in array

I have no experience with NoSQL. So, I think, if I just try to ask about the code, my question can be incorrect. Instead, let me explain my problem.
Suppose I have e-store. I have catalogs
Catalogs = new Mongo.Collection('catalogs);
and products in that catalogs
Products = new Mongo.Collection('products');
Then, people add there orders to temporary collection
Order = new Mongo.Collection();
Then, people submit their comments, phone, etc and order. I save it to collection Operations:
Operations.insert({
phone: "phone",
comment: "comment",
etc: "etc"
savedOrder: Order //<- Array, right? Or Object will be better?
});
Nice, but when i want to get stats by every product, in what Operations product have used. How can I search thru my Operations and find every operation with that product?
Or this way is bad? How real pro's made this in real world?
If I understand it well, here is a sample document as stored in your Operation collection:
{
clientRef: "john-001",
phone: "12345678",
other: "etc.",
savedOrder: {
"someMetadataAboutOrder": "...",
"lines" : [
{ qty: 1, itemRef: "XYZ001", unitPriceInCts: 1050, desc: "USB Pen Drive 8G" },
{ qty: 1, itemRef: "ABC002", unitPriceInCts: 19995, desc: "Entry level motherboard" },
]
}
},
{
clientRef: "paul-002",
phone: null,
other: "etc.",
savedOrder: {
"someMetadataAboutOrder": "...",
"lines" : [
{ qty: 3, itemRef: "XYZ001", unitPriceInCts: 950, desc: "USB Pen Drive 8G" },
]
}
},
Given that, to find all operations having item reference XYZ001 you simply have to query:
> db.operations.find({"savedOrder.lines.itemRef":"XYZ001"})
This will return the whole document. If instead you are only interested in the client reference (and operation _id), you will use a projection as an extra argument to find:
> db.operations.find({"savedOrder.lines.itemRef":"XYZ001"}, {"clientRef": 1})
{ "_id" : ObjectId("556f07b5d5f2fb3f94b8c179"), "clientRef" : "john-001" }
{ "_id" : ObjectId("556f07b5d5f2fb3f94b8c17a"), "clientRef" : "paul-002" }
If you need to perform multi-documents (incl. multi-embedded documents) operations, you should take a look at the aggregation framework:
For example, to calculate the total of an order:
> db.operations.aggregate([
{$match: { "_id" : ObjectId("556f07b5d5f2fb3f94b8c179") }},
{$unwind: "$savedOrder.lines" },
{$group: { _id: "$_id",
total: {$sum: {$multiply: ["$savedOrder.lines.qty",
"$savedOrder.lines.unitPriceInCts"]}}
}}
])
{ "_id" : ObjectId("556f07b5d5f2fb3f94b8c179"), "total" : 21045 }
I'm an eternal newbie, but since no answer is posted, I'll give it a try.
First, start by installing robomongo or a similar software, it will allow you to have a look at your collections directly in mongoDB (btw, the default port is 3001)
The way I deal with your kind of problem is by using the _id field. It is a field automatically generated by mongoDB, and you can safely use it as an ID for any item in your collections.
Your catalog collection should have a string array field called product where you find all your products collection items _id. Same thing for the operations: if an order is an array of products _id, you can do the same and store this array of products _id in your savedOrder field. Feel free to add more fields in savedOrder if necessary, e.g. you make an array of objects products with additional fields such as discount.
Concerning your queries code, I assume you will find all you need on the web as soon as you figure out what your structure is.
For example, if you have a product array in your savedorder array, you can pull it out like that:
Operations.find({_id: "your operation ID"},{"savedOrder.products":1)
Basically, you ask for all the products _id in a specific operation. If you have several savedOrders in only one operation, you can specify too the savedOrder _id, if you used the one you had in your local collection.
Operations.find({_id: "your_operation_ID", "savedOrder._id": "your_savedOrder_ID"},{"savedOrder.products":1)
ps: to bad-ass coders here, if I'm doing it wrong, please tell me.
I find an answer :) Of course, this is not a reveal for real professionals, but is a big step for me. Maybe my experience someone find useful. All magic in using correct mongo operators. Let solve this problem in pseudocode.
We have a structure like this:
Operations:
1. Operation: {
_id: <- Mongo create this unique for us
phone: "phone1",
comment: "comment1",
savedOrder: [
{
_id: <- and again
productId: <- whe should save our product ID from 'products'
name: "Banana",
quantity: 100
},
{
_id:,
productId: <- Another ID, that we should save if order
name: "apple",
quantity: 50
}
]
And if we want to know, in what Operation user take "banana", we should use mongoDB operator"elemMatch" in Mongo docs
db.getCollection('operations').find({}, {savedOrder: {$elemMatch:{productId: "f5mhs8c2pLnNNiC5v"}}});
In simple, we get documents our saved order have products with id that we want to find. I don't know is it the best way, but it works for me :) Thank you!

MongoDB Schema Design for language database

I need some advice on MongoDB schema design for a natural language database.
I need to store for each language texts and words like:
lang: {
_id: "English",
texts : [
{ text : "This is a first text",
date : Date("2011-09-19T04:00:10.112Z"),
tag : "test1"
},
{ text : "Second One",
date : Date("2011-09-19T04:00:10.112Z"),
tag : "test2"
}
],
words : [
{
word : "This",
},
{
word : "is",
},
{
word : "a",
},
{
word : "first",
},
{
word : "text",
},
{
word : "second",
},
{
word : "one",
}
]
}
And then I need to know each words and texts a user has associated. The word/text amount tends to be huge and I need to list all words on a language and all words a user has associated for that language.
From my perspective I think storing the user_ids that are associated with a given word in an array for the word is maybe a good approach like:
lang: {
_id: "English",
texts : [
...
],
words : [
{
word : "This",
users: [user1,user2,user3]
},
{
word : "is",
users: [user1,user2]
},
...
]
}
Having in mind that a word can be associated to hundreds of thousand of users and the document limit (as I read) is 4MB and that I need to:
List all words for a given user and language
Is this a good approach? Or can you think of a better one?
Hope this question is clear enough and that someone can give me a help on this ;)
Thank you all!
I don't think this is a good approach, for just the reason you mention: the document size limit. It looks like with your approach, you are definitely going to run up against the limit. I would go for a flatter approach (which should also make your collection easier to query). Something like this:
[
{
user: "user1",
word: "This",
lang: "en"
},
{
user: "user1",
word: "is",
lang: "en"
},
// et cetera...
]
In other words, grow vertically by adding documents rather than horizontally by adding more data to one document. You can query words for a given user with db.find( { user: "user1", lang: "en" });.
This approach isn't "normalized", of course, so if you're concerned about space then you might want to create a separate collection for users, words, and languages and reference them in the main collection by an ID. But since there are no join queries in MongoDB, you have to weigh query performance against space efficiency.
dbaseman is correct (and upvoted), but a couple of other points:
First, the document limit is now 16MB (Max Document Size), as of this writing, assuming you are running a recent versionof MongoDB.
Second, unbounded growth is generally a bad idea in MongoDB, this type of document size expansion can cause MongoDB to have to move the document if it exceeds the current space allocated to it. You can read more about this in the Padding Factor section of the documentation.
Those types of moves are relatively expensive, especially if they happen frequently. Therefore, if you do go with this type of design limiting the size (essentially bounding that growth) of the comments equivalent in your main collection (most recent X, most popular X etc.) and perhaps even pre-populating that document field (essentially manual padding) to beyond the average size will reduce the moves caused additions/changes.
This is the reason why tip #6 in the MongoDB Developers tips and tricks book from O'Reilly is:
Tip #6: Do not embed fields that have unbound growth

MongoDB many-to-many search

this is my collections: (many-to-many)
actors:
{
_id: 1,
name: "Name 1"
}
movies:
{
_id: 1,
name: "The Terminator",
production_year: 1984,
actors: [
{
actors_id: 1,
role_id : 1
},
{
actors_id: 2,
role_id : 1
}
]
}
I can't get a list of actors for some movie
it is not a problem when I have this:
{
_id: 1,
name: "The Terminator",
production_year: 1984,
actors: [1,2,3,4,5] (actors id's)
}
var a = db.movies.findOne(name:"The Terminator").actors
db.actors.find({"_id":{$in:a}})
but, how can I make it with this above structure:
if, I do this var a = db.movies.findOne(name:"The Terminator").actors
it returns me this:
[
{
actors_id: 1,
role_id : 1
},
{
actors_id: 2,
role_id : 1
}
]
How do I get only this in array [1,2] (actors_id) to get the names of actors (with $in)
Thanks,
Zoran
You don't. Within MongoDB you always query for documents so you have to make sure your schema is such that you can get all the information you need by querying for specific documents. There is no join/view like functionality in MongoDB.
Denormalization is usually the most appropriate choice in such cases. Your schema looks like it's designed for a traditional relational database and you will have to try and let go of some of the schema design principles that come with relational data.
Specifically for your example you could add the actor name to the embedded array so you have that information after querying for the movie.
Finally, consider if you're using the right tool for what you need to do. Too often people think of MongoDB is a "fast MySQL" which is entirely wrong. Document databases are very different to RDBMS and even k/v stores. If you have a lot of related data use an RDBMS.
variable a in db.movies.findOne(name:"The Terminator").actors is an array of documents, so you'd have to make it an array of integers (ids)