MongoDB many-to-many search - mongodb

These are my collections (many-to-many):
actors:
{
_id: 1,
name: "Name 1"
}
movies:
{
_id: 1,
name: "The Terminator",
production_year: 1984,
actors: [
{
actors_id: 1,
role_id : 1
},
{
actors_id: 2,
role_id : 1
}
]
}
I can't get a list of actors for a given movie.
It is not a problem when I have this:
{
_id: 1,
name: "The Terminator",
production_year: 1984,
actors: [1, 2, 3, 4, 5] // actor ids
}
var a = db.movies.findOne({name: "The Terminator"}).actors
db.actors.find({"_id":{$in:a}})
But how can I do the same with the structure above?
If I do this: var a = db.movies.findOne({name: "The Terminator"}).actors
it returns this:
[
{
actors_id: 1,
role_id : 1
},
{
actors_id: 2,
role_id : 1
}
]
How do I get just the actors_id values as an array, [1, 2], so I can fetch the actors' names with $in?
Thanks,
Zoran

You don't. Within MongoDB you always query for documents, so you have to make sure your schema lets you get all the information you need by querying for specific documents. There is no join/view-like functionality in MongoDB.
Denormalization is usually the most appropriate choice in such cases. Your schema looks like it's designed for a traditional relational database and you will have to try and let go of some of the schema design principles that come with relational data.
Specifically for your example you could add the actor name to the embedded array so you have that information after querying for the movie.
Finally, consider whether you're using the right tool for what you need to do. Too often people think of MongoDB as a "fast MySQL", which is entirely wrong. Document databases are very different from RDBMSs and even k/v stores. If you have a lot of related data, use an RDBMS.

The variable a in db.movies.findOne({name: "The Terminator"}).actors is an array of documents, so you'd have to turn it into an array of integers (ids) before passing it to $in.
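A minimal sketch of that conversion, runnable in the mongo shell or any JavaScript runtime. The array literal stands in for what findOne returns for the movie in the question; the names embedded and actorIds are only illustrative.

```javascript
// Stand-in for db.movies.findOne({name: "The Terminator"}).actors
var embedded = [
  { actors_id: 1, role_id: 1 },
  { actors_id: 2, role_id: 1 }
];

// Keep only the actors_id of each embedded document.
var actorIds = embedded.map(function (entry) {
  return entry.actors_id;
});

// In the shell, this array can then be passed to $in:
// db.actors.find({ _id: { $in: actorIds } })
```

This still takes two round trips (one for the movie, one for the actors), which is the price of keeping the two collections separate.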

Related

Meteor Collection: find element in array

I have no experience with NoSQL, so if I just ask about the code, my question may be wrong. Instead, let me explain my problem.
Suppose I have an e-store. I have catalogs
Catalogs = new Mongo.Collection('catalogs');
and products in those catalogs
Products = new Mongo.Collection('products');
Then people add their orders to a temporary collection
Order = new Mongo.Collection();
Then people submit their comments, phone, etc. and the order. I save it to the Operations collection:
Operations.insert({
phone: "phone",
comment: "comment",
etc: "etc",
savedOrder: Order // <- An array, right? Or would an object be better?
});
Nice, but what if I want stats for every product, that is, the operations in which a product was used? How can I search through my Operations and find every operation containing that product?
Or is this approach bad? How do real pros do this in the real world?
If I understand it well, here is a sample document as stored in your Operation collection:
{
clientRef: "john-001",
phone: "12345678",
other: "etc.",
savedOrder: {
"someMetadataAboutOrder": "...",
"lines" : [
{ qty: 1, itemRef: "XYZ001", unitPriceInCts: 1050, desc: "USB Pen Drive 8G" },
{ qty: 1, itemRef: "ABC002", unitPriceInCts: 19995, desc: "Entry level motherboard" },
]
}
},
{
clientRef: "paul-002",
phone: null,
other: "etc.",
savedOrder: {
"someMetadataAboutOrder": "...",
"lines" : [
{ qty: 3, itemRef: "XYZ001", unitPriceInCts: 950, desc: "USB Pen Drive 8G" },
]
}
},
Given that, to find all operations having item reference XYZ001 you simply have to query:
> db.operations.find({"savedOrder.lines.itemRef":"XYZ001"})
This will return the whole document. If instead you are only interested in the client reference (and operation _id), you will use a projection as an extra argument to find:
> db.operations.find({"savedOrder.lines.itemRef":"XYZ001"}, {"clientRef": 1})
{ "_id" : ObjectId("556f07b5d5f2fb3f94b8c179"), "clientRef" : "john-001" }
{ "_id" : ObjectId("556f07b5d5f2fb3f94b8c17a"), "clientRef" : "paul-002" }
If you need to perform operations across multiple documents (including multiple embedded documents), you should take a look at the aggregation framework.
For example, to calculate the total of an order:
> db.operations.aggregate([
{$match: { "_id" : ObjectId("556f07b5d5f2fb3f94b8c179") }},
{$unwind: "$savedOrder.lines" },
{$group: { _id: "$_id",
total: {$sum: {$multiply: ["$savedOrder.lines.qty",
"$savedOrder.lines.unitPriceInCts"]}}
}}
])
{ "_id" : ObjectId("556f07b5d5f2fb3f94b8c179"), "total" : 21045 }
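To see what that pipeline computes, the same arithmetic can be sketched in plain JavaScript. The operation object below copies the relevant fields of the first sample document; orderTotal is a hypothetical helper, not part of MongoDB.

```javascript
// Sample operation shaped like the first document above.
var operation = {
  clientRef: "john-001",
  savedOrder: {
    lines: [
      { qty: 1, itemRef: "XYZ001", unitPriceInCts: 1050 },
      { qty: 1, itemRef: "ABC002", unitPriceInCts: 19995 }
    ]
  }
};

// $unwind yields one record per order line; $group with $sum then adds
// qty * unitPriceInCts over those records. reduce() does both at once here.
function orderTotal(op) {
  return op.savedOrder.lines.reduce(function (sum, line) {
    return sum + line.qty * line.unitPriceInCts;
  }, 0);
}

var total = orderTotal(operation);
```

For a single order this is easy to do client-side; the aggregation framework pays off when the same computation has to run over many operations on the server.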
I'm an eternal newbie, but since no answer is posted, I'll give it a try.
First, start by installing Robomongo or similar software; it will let you look at your collections directly in MongoDB (by the way, with Meteor's default setup the database listens on port 3001).
The way I deal with your kind of problem is by using the _id field. It is a field automatically generated by mongoDB, and you can safely use it as an ID for any item in your collections.
Your catalogs collection should have a string array field called product holding the _id of each item from your products collection. The same goes for the operations: if an order is an array of product _id values, you can store that array in your savedOrder field. Feel free to add more fields to savedOrder if necessary, e.g. make it an array of product objects with additional fields such as discount.
Concerning your queries code, I assume you will find all you need on the web as soon as you figure out what your structure is.
For example, if you have a products array in your savedOrder, you can pull it out like this (in Meteor the projection goes in the fields option):
Operations.find({_id: "your operation ID"}, {fields: {"savedOrder.products": 1}})
Basically, you ask for all the product _id values in a specific operation. If you have several savedOrders in a single operation, you can also specify the savedOrder _id, if you kept the one you had in your local collection.
Operations.find({_id: "your_operation_ID", "savedOrder._id": "your_savedOrder_ID"}, {fields: {"savedOrder.products": 1}})
ps: to bad-ass coders here, if I'm doing it wrong, please tell me.
I found an answer :) Of course, this is no revelation for real professionals, but it is a big step for me. Maybe someone will find my experience useful. All the magic is in using the correct Mongo operators. Let's solve this problem in pseudocode.
We have a structure like this:
Operations:
1. Operation: {
_id: <- Mongo creates this unique id for us
phone: "phone1",
comment: "comment1",
savedOrder: [
{
_id: <- and again
productId: <- we should save our product ID from 'products'
name: "Banana",
quantity: 100
},
{
_id: <- generated as well
productId: <- another product ID, saved when it is ordered
name: "apple",
quantity: 50
}
]
}
And if we want to know in which Operation a user took "banana", we should use MongoDB's $elemMatch operator (see the Mongo docs) in the query filter:
db.getCollection('operations').find({savedOrder: {$elemMatch: {productId: "f5mhs8c2pLnNNiC5v"}}});
Put simply, we get the documents whose saved order contains a product with the id we want to find. I don't know whether it is the best way, but it works for me :) Thank you!

JSON Schema with dynamic key field in MongoDB

I want to have i18n support for objects stored in a MongoDB collection.
Currently our schema looks like this:
{
_id: "id",
name: "name",
localization: [{
lan: "en-US",
name: "name_in_english"
}, {
lan: "zh-TW",
name: "name_in_traditional_chinese"
}]
}
But since the field "lan" is unique, can I just use it as a key? The structure would then be
{
_id: "id",
name: "name",
localization: {
"en-US": "name_in_english",
"zh-TW": "name_in_traditional_chinese"
}
}
which would be neater and easier to parse (just localization[language] would get the value I want for a specific language).
But then the question is: is this good practice for storing data in MongoDB? And how do I pass the json-schema check?
It is not a good practice to have values as keys. The language codes are values and as you say you can not validate them against a schema. It makes querying against it impossible. For example, you can't figure out if you have a language translation for "nl-NL" as you can't compare against keys and neither is it possible to easily index this. You should always have descriptive keys.
However, as you say, having the languages as keys makes it a lot easier to pull the data out as you can just access it by ['nl-NL'] (or whatever your language's syntax is).
I would suggest an alternative schema:
{
your_id: "id_for_name",
lan: "en-US",
name: "name_in_english"
}
{
your_id: "id_for_name",
lan: "zh-TW",
name: "name_in_traditional_chinese"
}
Now you can:
set an index on { your_id: 1, lan: 1 } for speedy lookups
query for each translation individually and just get that translation:
db.so.find( { your_id: "id_for_name", lan: 'en-US' } )
query for all the versions for each id using this same index:
db.so.find( { your_id: "id_for_name" } )
and also much more easily update the translation for a specific language:
db.so.update(
{ your_id: "id_for_name", lan: 'en-US' },
{ $set: { name: "ooga" } }
)
Neither of those points are possible with your suggested schemas.
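If the application still wants the convenient localization[language] lookup from the question, one option is to rebuild that map from the per-language documents after querying. A sketch, assuming docs holds what the query on your_id returns; toLocalizationMap is a hypothetical helper.

```javascript
// Stand-in for db.so.find({ your_id: "id_for_name" }).toArray()
var docs = [
  { your_id: "id_for_name", lan: "en-US", name: "name_in_english" },
  { your_id: "id_for_name", lan: "zh-TW", name: "name_in_traditional_chinese" }
];

// Fold the per-language documents into a { language: name } lookup object.
function toLocalizationMap(translations) {
  var map = {};
  translations.forEach(function (t) {
    map[t.lan] = t.name;
  });
  return map;
}

var localization = toLocalizationMap(docs);
// localization["zh-TW"] now reads like the keyed schema from the question.
```

That keeps the stored documents queryable and indexable while the calling code gets the keyed access it wanted.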
Obviously the second schema example is much better for your task (of course, if lan field is unique as you mentioned, that seems true to me also).
Getting an element from a dictionary/associative array/mapping/whatever_it_is_called_in_your_language is much cheaper than scanning a whole array of values, and in this case it is also more efficient in terms of storage size (remember that all field names are stored in MongoDB as-is, so every record holds the whole key name for each JSON field, not its representation or index or whatever).
My experience shows that MongoDB is mature enough to be used as the main storage for your application, even under high load (whatever that means ;) ). The main problem is how you fight database-level locks (we'll wait for the promised table-level locks, which I hope will speed MongoDB up a lot more), though data loss is possible if your MongoDB cluster is built badly (dig into the docs and articles on the Internet for more information).
As for the schema check, you must do it in your programming language on the application side before inserting records; that's why Mongo is called schemaless.
There is a case where an object is necessarily better than an array: supporting upserts into a set. For example, if you want to update an item having name 'item1' to have val 100, or insert such an item if one doesn't exist, all in one atomic operation. With an array, you'd have to do one of two operations. Given a schema like
{ _id: 'some-id', itemSet: [ { name: 'an-item', val: 123 } ] }
you'd have commands
// Update:
db.coll.update(
{ _id: id, 'itemSet.name': 'item1' },
{ $set: { 'itemSet.$.val': 100 } }
);
// Insert:
db.coll.update(
{ _id: id, 'itemSet.name': { $ne: 'item1' } },
{ $addToSet: { 'itemSet': { name: 'item1', val: 100 } } }
);
You'd have to query first to know which is needed in advance, which can exacerbate race conditions unless you implement some versioning. With an object, you can simply do
// Assumes itemSet is an object keyed by item name, e.g. { item1: { val: 100 } }
db.coll.update(
{ _id: id },
{ $set: { 'itemSet.item1.val': 100 } }
);
If this is a use case you have, then you should go with the object approach. One drawback is that querying for a specific name requires scanning. If that is also needed, you can add a separate array specifically for indexing. This is a trade-off with MongoDB. Upserts would become
// Assumes itemSet is keyed by item name; itemNames mirrors the keys so they can be indexed
db.coll.update(
{ _id: id },
{
$set: { 'itemSet.item1.val': 100 },
$addToSet: { itemNames: 'item1' }
}
);
and the query would then simply be
db.coll.find({ itemNames: 'item1' })
(Note: the $ positional operator does not support array upserts.)

$unwind an object in aggregation framework

In the MongoDB aggregation framework, I was hoping to use the $unwind operator on an object (ie. a JSON collection). Doesn't look like this is possible, is there a workaround? Are there plans to implement this?
For example, take the article collection from the aggregation documentation. Suppose there is an additional field "ratings" that is a map from user -> rating. Could you calculate the average rating for each user?
Other than this, I'm quite pleased with the aggregation framework.
Update: here's a simplified version of my JSON collection per request. I'm storing genomic data. I can't really make genotypes an array, because the most common lookup is to get the genotype for a random person.
variants: [
{
name: 'variant1',
genotypes: {
person1: 2,
person2: 5,
person3: 7,
}
},
{
name: 'variant2',
genotypes: {
person1: 3,
person2: 3,
person3: 2,
}
}
]
It is not possible to do the type of computation you are describing with the aggregation framework - and it's not because there is no $unwind method for non-arrays. Even if the person:value objects were documents in an array, $unwind would not help.
The "group by" functionality (whether in MongoDB or in any relational database) is done on the value of a field or column. We group by value of field and sum/average/etc based on the value of another field.
A simple example is a variant of what you suggest: a ratings field added to the example article collection, not as a map from user to rating but as an array like this:
{ title : "title of article", ...
ratings: [
{ voter: "user1", score: 5 },
{ voter: "user2", score: 8 },
{ voter: "user3", score: 7 }
]
}
Now you can aggregate this with:
[ {$unwind: "$ratings"},
{$group : {_id : "$ratings.voter", averageScore: {$avg:"$ratings.score"} } }
]
But this example structured as you describe it would look like this:
{ title : "title of article", ...
ratings: {
user1: 5,
user2: 8,
user3: 7
}
}
or even this:
{ title : "title of article", ...
ratings: [
{ user1: 5 },
{ user2: 8 },
{ user3: 7 }
]
}
Even if you could $unwind this, there is nothing to aggregate on here. Unless you know the complete list of all possible keys (users) you cannot do much with this. [*]
An analogous relational DB schema to what you have would be:
CREATE TABLE T (
user1 integer,
user2 integer,
user3 integer
...
);
That's not what would be done; instead we would do this:
CREATE TABLE T (
username varchar(32),
score integer
);
and now we aggregate using SQL:
select username, avg(score) from T group by username;
There is an enhancement request for MongoDB that may allow you to do this in the aggregation framework in the future: the ability to project values to keys and vice versa. Meanwhile, there is always map/reduce.
[*] There is a complicated way to do this if you know all unique keys (you can find all unique keys with a method similar to this) but if you know all the keys you may as well just run a sequence of queries of the form db.articles.find({"ratings.user1":{$exists:true}},{_id:0,"ratings.user1":1}) for each userX which will return all their ratings and you can sum and average them simply enough rather than do a very complex projection the aggregation framework would require.
Since 3.4.4, you can transform an object into an array using $objectToArray.
See:
https://docs.mongodb.com/manual/reference/operator/aggregation/objectToArray/
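A sketch of how that might look for the ratings example from the earlier answer. The pipeline array uses the $objectToArray, $unwind and $group stages and assumes a collection named articles with a ratings object in each document. The function averageByUser is a hypothetical plain-JavaScript emulation of the same steps, included only so the logic can be checked without a server.

```javascript
// Pipeline for db.articles.aggregate(pipeline); the collection name is assumed.
var pipeline = [
  // { user1: 5, user2: 8 } becomes [ { k: "user1", v: 5 }, { k: "user2", v: 8 } ]
  { $project: { ratings: { $objectToArray: "$ratings" } } },
  { $unwind: "$ratings" },
  { $group: { _id: "$ratings.k", averageScore: { $avg: "$ratings.v" } } }
];

// Plain-JavaScript emulation of the same three steps.
function averageByUser(articles) {
  var sums = {};
  articles.forEach(function (article) {
    Object.keys(article.ratings).forEach(function (user) {
      var entry = sums[user] || (sums[user] = { total: 0, count: 0 });
      entry.total += article.ratings[user];
      entry.count += 1;
    });
  });
  var averages = {};
  Object.keys(sums).forEach(function (user) {
    averages[user] = sums[user].total / sums[user].count;
  });
  return averages;
}

// Made-up sample data for the emulation.
var averages = averageByUser([
  { title: "a", ratings: { user1: 5, user2: 8, user3: 7 } },
  { title: "b", ratings: { user1: 3, user2: 8 } }
]);
```

The k and v field names are the ones $objectToArray produces, which is why the $group stage refers to ratings.k and ratings.v.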
This is an old question, but I've run across a tidbit of information through trial and error that people may find useful.
It's actually possible to unwind on a dummy value by fooling the parser this way:
db.Opportunity.aggregate([
{ $project: {
Field1: 1, Field2: 1, Field3: 1,
DummyUnwindField: { $ifNull: [null, [1.0]] }
}
},
{ $unwind: "$DummyUnwindField" }
]);
This will produce 1 row per document, regardless of whether or not the value exists. You may be able to tinker with this to generate the results you want. I had hoped to combine this with multiple $unwinds (sort of like emit() in map/reduce), but alas, the last $unwind wins or they combine as an intersection rather than a union, which makes it impossible to achieve the results I was looking for. I am sadly disappointed with the aggregation framework, as it doesn't fit the one use case I was hoping to use it for (and which a lot of the questions on StackOverflow in this area seem to be asking about): ordering results based on match rate. Improving the poor map/reduce performance would have made this entire feature unnecessary.
This is what I found & extended.
Let's create an experimental database in mongo
db.copyDatabase('livedb' , 'experimentdb')
Now use experimentdb and wrap the object in an array in your experimentcollection
db.getCollection('experimentcollection').find({}).forEach(function(e){
if(e.store){
e.ratings = [e.ratings]; // name of the object field to wrap in an array, e.g. ratings
db.experimentcollection.save(e);
}
})
Some nerdy JS code to flatten each document into a flat object
var flatArray = [];
var data = db.experimentcollection.find().toArray();
for (var index = 0; index < data.length; index++) {
var flatObject = {};
for (var prop in data[index]) {
var value = data[index][prop];
if (Array.isArray(value) && prop === 'ratings') {
for (var i = 0; i < value.length; i++) {
for (var inProp in value[i]) {
flatObject[inProp] = value[i][inProp];
}
}
}else{
flatObject[prop] = value;
}
}
flatArray.push(flatObject);
}
printjson(flatArray);

Querying and grouping in mongoDb?

Part 1:
I have (student) collection:
{
sname : "",
studentId: "123",
age: "",
gpa: "",
}
I'm trying to get only two keys from it:
{
sname : "",
studentId: "123"
}
So I need to eliminate age and gpa to have only sname and studentId. How can I do that?
Part2:
Then I have 'subject' collection :
{
subjectName : "Math",
studentId : "123",
teacherName: ""
}
I need to match/combine the previous keys (in part1) with the correct studentId so I will end up with something like this :
{
sname : "",
studentId: "123",
subjectName : "Math"
}
How can I do this, and is that the right way to think about getting the result? I tried to read about group and mapReduce but I didn't find a clear example.
To answer your first question, you can do this:
db.student.find({}, {"sname":1, "studentId":1});
The first {} in that is the limiting query, which in this case includes the entire collection. The second half specifies keys with a 1 or 0 depending on whether or not you want them back. Don't mix include and excludes in a single query though. Except for a couple special cases, mongo won't accept it.
Your second question is more difficult. What you're asking for is a join and mongo doesn't support that. There is no way to connect the two collections on studentId. You'll need to find all the students that you want, then use those studentIds to find all the matching subjects. Then you'll need to merge the two results in your own code. You can do this through whatever driver you're using, or you can do this in javascript in the shell itself, but either way, you'll have to merge them with your own code.
Edit:
Here's an example of how you could do this in the shell with the output going to a collection called "out".
db.student.find({}, {"sname":1, "studentId":1}).forEach(
function (st) {
db.subject.find({"studentId":st.studentId}, {"subjectName":1}).forEach(
function (sub) {
db.out.insert({"sname":st.sname, "studentId":st.studentId, "subjectName":sub.subjectName});
}
);
}
);
If this isn't data that changes all that often, you could just drop the "out" collection and repopulate it periodically with this shell script. Then your code could query directly from "out". If the data does change frequently, you'll want to do this merging in your code on the fly.
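For the on-the-fly variant, the merge can be done with one lookup table instead of a query per student. A sketch in plain JavaScript: the two arrays stand in for the results of the student and subject queries (the names and the second student are made up for illustration), and mergeByStudentId is a hypothetical helper.

```javascript
// Stand-ins for the results of db.student.find(...) and db.subject.find(...)
var students = [
  { sname: "Ann", studentId: "123" },
  { sname: "Bob", studentId: "456" }
];
var subjects = [
  { subjectName: "Math", studentId: "123" },
  { subjectName: "Art", studentId: "123" },
  { subjectName: "Math", studentId: "456" }
];

// Index students by studentId, then emit one merged record per subject.
function mergeByStudentId(studentDocs, subjectDocs) {
  var byId = {};
  studentDocs.forEach(function (st) {
    byId[st.studentId] = st;
  });
  return subjectDocs
    .filter(function (sub) {
      // Skip subjects whose student was not fetched.
      return Object.prototype.hasOwnProperty.call(byId, sub.studentId);
    })
    .map(function (sub) {
      var st = byId[sub.studentId];
      return { sname: st.sname, studentId: st.studentId, subjectName: sub.subjectName };
    });
}

var merged = mergeByStudentId(students, subjects);
```

This needs only two queries in total (students, then subjects with studentId in the fetched ids), rather than one subject query per student as in the shell loop.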
Another, and possibly better, option is to include the "subject" data in the "student" collection or vice versa. This will result in a more mongodb friendly structure. If you run into this joining problem frequently, mongo may not be the way to go and a relational database may be better suited to your needs.
Mongo's find() method lets you include or exclude certain fields from the results.
Check out Field Selection in the docs for more info. You could do either:
db.users.find({}, { 'sname': 1, 'studentId': 1 });
db.users.find({}, { 'age': 0, 'gpa': 0 });
To relate your students and subjects, you could either look up a student's subjects separately, like this (studentId is stored as a string, so query with a string):
db.subjects.find({ studentId: "123" });
Or embed subject data with each student, and retrieve it together with the student document:
{
sname : "Roland Browning",
studentId: "123",
age: 14,
gpa: "B",
subjects: [ { name : "French", teacher: "Mr Bronson" }, ... ]
}

MongoDB: Is a range query possible using multikeys?

var jd = {
type: "Person",
attributes: {
name: "John Doe",
age: 30
}
};
var pd = {
type: "Person",
attributes: {
name: "Penelope Doe",
age: 26
}
};
var ss = {
type: "Book",
attributes: {
name: "The Sword Of Shannara",
author: "Terry Brooks"
}
};
db.things.save(jd);
db.things.save(pd);
db.things.save(ss);
db.things.ensureIndex({attributes: 1})
db.things.find({"attributes.age": 30}) // => John Doe
db.things.find({"attributes.age": 30}).explain() // => BasicCursor... (don't want a scan)
db.things.find({"attributes.age": {$gte: 18}}) // John Doe, Penelope Doe (via a scan)
The goal is that all attributes be indexed and searchable via range queries and that the index actually be used (as opposed to a collection scan). There's no telling what attributes a document will have. I have read about multikeys but they seem only to work (by index) with exact-match queries.
Multikeys prefer this format for a document:
var pd = {
type: "Person",
attributes: [
{name: "Penelope Doe"},
{age: 26}
]
};
Is there a pattern whereby, with one index, I can find items by attribute using a range?
EDIT:
In a schemaless DB it makes sense to have potentially a limitless array of types, yet a collection name practically implies some sort of type. But if we go to the extreme, we want to allow for any number of types within a collection (so that we don't have to define a collection for every conceivable custom type a user might imagine). Searching, therefore, by attributes (of any sort) with just a single deep index (that supports ranged queries) makes this sort of thing far more feasible. Seems to me a natural fit for a schemaless DB.
Opened a ticket if you wanna vote it up:
http://jira.mongodb.org/browse/SERVER-2675
Yes, range queries work with multikeys. However, multikeys are for arrays rather than embedded objects.
In the example above try
db.things.ensureIndex({"attributes.age": 1})
Range queries are possible using multikeys; however, expressing the query can be tricky.
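One commonly used way to express it, offered here as an assumption rather than as this answer's own method, is to store attributes as an array of { k, v } pairs, index both fields, and put the key and the range inside a single $elemMatch. The sketch below shows the shell commands as comments and emulates the filter in plain JavaScript; the third document and the helper hasAttributeAtLeast are made up for illustration.

```javascript
// Documents with attributes stored as key/value pairs (assumed layout).
var things = [
  { type: "Person", attributes: [{ k: "name", v: "John Doe" }, { k: "age", v: 30 }] },
  { type: "Person", attributes: [{ k: "name", v: "Penelope Doe" }, { k: "age", v: 26 }] },
  { type: "Person", attributes: [{ k: "name", v: "Junior Doe" }, { k: "age", v: 12 }] }
];

// In the shell, the index and query would look like this:
// db.things.createIndex({ "attributes.k": 1, "attributes.v": 1 })
// db.things.find({ attributes: { $elemMatch: { k: "age", v: { $gte: 18 } } } })

// Plain-JavaScript emulation of that $elemMatch filter:
// the same array element must match both the key and the range.
function hasAttributeAtLeast(doc, key, min) {
  return doc.attributes.some(function (attr) {
    return attr.k === key && typeof attr.v === "number" && attr.v >= min;
  });
}

var adults = things.filter(function (doc) {
  return hasAttributeAtLeast(doc, "age", 18);
});
```

Without $elemMatch, the two conditions could be satisfied by different array elements, which is the tricky part the answer alludes to. Whether the index is actually used is worth confirming with explain().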