Hopefully I can describe this correctly, but I come from the RDBMS world and I'm building an inventory-type application with Meteor. Meteor and MongoDB may not be the best option for this application, but hopefully it can be done, and this seems like a situation that many converts will run into.
I'm trying to forget many of the things I know about relational databases and referential integrity so I can get my head wrapped around MongoDB, but I'm hung up on this issue and on how I would appropriately find the data with Meteor.
The inventory application will have a number of drop-downs, but I'll use an example to explain better. Let's say I want to track an item, so I'll want the Name, Qty on Hand, Manufacturer, and Location. There is much more than that, but I'm keeping it simple.
The Name and Qty on Hand are easy since they are entered by the user, but the Manufacturer and the Location should be chosen in a drop-down from a data-driven list (I'm assuming a Collection of sorts), or a new entry added to the list if it is a new Manufacturer or Location. Odds are that I will use the Autocomplete package as well, but the point is the same. I certainly wouldn't want the end user to misspell the Manufacturer name and thereby end up with documents that are supposed to have the same Manufacturer but don't, due to a typo. So I need some way to enforce the integrity of the data stored for Manufacturer and Location.
This matters because when the user views all inventory items later, they will have the option of filtering the data. They might want to filter the inventory items by Manufacturer, or by Location, or by both.
In my relational way of thinking this would just be three tables: INVENTORY, MANUFACTURER, and LOCATION. In the INVENTORY table I would store the ID of the related row in each of the other two tables.
I'm trying to figure out how to store this data with MongoDB and, equally important, how to then find these Manufacturer and Location items to populate the drop-down in the first place.
I found the following article which helps me understand some things but not quite what I need to connect the dots in my head.
Thanks!
referential data
[EDIT]
I'm still working at this, of course, but the best I've come up with is to do it the normalized way, much like the approach in the above article. Something like this:
inventory
{
  name: "Pen",
  manufacturer: {id: "25643"},
  location: {id: "95789"}
}
manufacturer
{
  name: "BIC",
  id: "25643"
}
location
{
  name: "East Warehouse",
  id: "95789"
}
This (in a simpler form) seems like an extremely common need for many, if not most, applications, so I want to make sure I'm approaching it correctly. Even if this example code is correct, should I use an id field with generated numbers like that, or should I just use the built-in _id field?
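To make the reference approach concrete, here is a minimal in-memory sketch, assuming plain JavaScript arrays stand in for the Meteor collections; the names `manufacturers`, `locations`, `inventory`, `dropDownOptions`, `filterInventory` and `withNames` are hypothetical. Each inventory item stores only the `_id` of its manufacturer and location, the drop-down is populated straight from the lookup collection, and filtering compares IDs, so a typo in a display name cannot split one manufacturer into two.

```javascript
// Hypothetical lookup collections: one document per distinct manufacturer/location.
const manufacturers = [
  { _id: "m1", name: "BIC" },
  { _id: "m2", name: "Pilot" },
];
const locations = [
  { _id: "l1", name: "East Warehouse" },
  { _id: "l2", name: "West Warehouse" },
];

// Inventory items reference the lookup documents by _id only.
const inventory = [
  { _id: "i1", name: "Pen", qtyOnHand: 120, manufacturerId: "m1", locationId: "l1" },
  { _id: "i2", name: "Marker", qtyOnHand: 40, manufacturerId: "m2", locationId: "l1" },
  { _id: "i3", name: "Pencil", qtyOnHand: 300, manufacturerId: "m1", locationId: "l2" },
];

// Drop-down options come straight from the lookup collection, sorted by label.
function dropDownOptions(collection) {
  return collection
    .map((doc) => ({ value: doc._id, label: doc.name }))
    .sort((a, b) => a.label.localeCompare(b.label));
}

// Filter by manufacturer, location, or both; an omitted filter matches everything.
function filterInventory(items, { manufacturerId, locationId } = {}) {
  return items.filter(
    (item) =>
      (manufacturerId === undefined || item.manufacturerId === manufacturerId) &&
      (locationId === undefined || item.locationId === locationId)
  );
}

// Resolve the referenced names for display (a client-side "join").
function withNames(item) {
  const byId = (coll, id) => coll.find((doc) => doc._id === id);
  return {
    ...item,
    manufacturer: byId(manufacturers, item.manufacturerId)?.name,
    location: byId(locations, item.locationId)?.name,
  };
}

console.log(dropDownOptions(manufacturers));
console.log(filterInventory(inventory, { manufacturerId: "m1", locationId: "l2" }).map(withNames));
```

In Meteor the filter would correspond to a selector such as `{manufacturerId: ..., locationId: ...}` passed to the inventory collection's `find()`. Reusing the built-in `_id` as the reference, as in this sketch, is one way to avoid maintaining a second generated `id` field.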
I come from a similar background, so I don't know if I'm doing it correctly in my application, but I have gone with an approach similar to yours. My app is an e-learning app, so an Organisation will have many Courses.
So my schema looks similar to yours, except that I have an array of objects that look like {course_id: <id>}.
I then registered a helper that takes the data from the organisation and adds in the additional data I need about the courses.
// Gets Organisation Courses - in your case this could get the locations/manufacturers
UI.registerHelper('organisationCourses', function() {
  var user = Meteor.user();
  if (!user) {
    return false;
  }
  var organisation = Organisations.findOne({_id: user.profile.organisation._id});
  // Guard against the organisation document not being available yet
  return organisation ? organisation.courses.courses : false;
});
// This takes the course data and, for each course_id value, finds and adds all the course data to the object
UI.registerHelper('courseData', function() {
  var courseContent = this;
  var course = Courses.findOne({'_id': courseContent.course_id});
  // Copy into a new object so the data context itself is not mutated
  return _.extend({}, courseContent, _.omit(course, '_id'));
});
Then from my page all I have to call is:
{{#each organisationCourses}}
  {{#with courseData}}
    {{> admListCoursesItem}}
  {{/with}}
{{/each}}
If I remember rightly, I picked up this approach from an EventedMind how-to video.
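The `courseData` helper calls `Courses.findOne` once per course. A variant, sketched here under the assumption that plain arrays stand in for the collections, gathers all referenced IDs first and resolves them in a single lookup (in MongoDB that would be one `find` with an `$in` selector). `attachCourses` and the sample data are hypothetical.

```javascript
// Hypothetical stand-ins for the Organisations and Courses collections.
const courses = [
  { _id: "c1", title: "Intro to Safety", hours: 2 },
  { _id: "c2", title: "Fire Drills", hours: 1 },
  { _id: "c3", title: "First Aid", hours: 4 },
];
const organisation = {
  _id: "o1",
  courses: [{ course_id: "c3" }, { course_id: "c1" }],
};

// Resolve every {course_id} reference with one batched lookup.
function attachCourses(refs, courseCollection) {
  const ids = new Set(refs.map((ref) => ref.course_id));
  // Equivalent in spirit to Courses.find({_id: {$in: [...ids]}}).
  const byId = new Map(
    courseCollection.filter((c) => ids.has(c._id)).map((c) => [c._id, c])
  );
  return refs.map((ref) => {
    const course = byId.get(ref.course_id);
    if (!course) return { ...ref, missing: true }; // dangling reference
    const { _id, ...rest } = course; // drop _id, as _.omit(course, '_id') does
    return { ...ref, ...rest };
  });
}

console.log(attachCourses(organisation.courses, courses));
```

The original order of the references is preserved, and new objects are returned rather than extending the data context.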
Related
I have a general MongoDB question, as I have recently found an issue with how I store things.
Currently, there is a collection called spaces like this:
{
  _id: 5e1c4689429a8a0decf16f69,
  challengers: [
    5dfa24dce9cbc0180fb60226,
    5dfa26f46719311869ac1756,
    5dfa270c6719311869ac1757
  ],
  tasks: [],
  owner: 5dfa24dce9cbc0180fb60226,
  name: 'testSpace',
  description: 'testSpace'
}
As you can see, this has a challengers array, in which we store the IDs of the users.
Would it be okay if, instead of storing the ID, I stored the entire User object, minus fields such as password?
Or should I continue with this reference approach of referring to the IDs of other documents?
The problem I have is that when I go through all the spaces that a user has, I want to see which members are part of each space (the challengers array). However, I obviously receive the IDs instead of the names and emails. I am therefore struggling to send the correct data to the frontend (I have tried some manual manipulation without luck).
So, if I have to continue down the reference path, I will need to solve this problem somehow.
If it is okay to store the entire object in the array, it would be a lot easier.
However, I want to follow best practice.
Thank you everyone!
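If you stay with references, the IDs can be resolved into the fields the frontend needs before the response is sent. This sketch assumes in-memory arrays and shortened string IDs; with real `ObjectId`s you would key the map on `id.toString()`, since two `ObjectId` instances are not equal by reference. In MongoDB itself the same join can be done server-side with an aggregation `$lookup` stage followed by a `$project` that leaves out sensitive fields. `publicUser` and `expandChallengers` are hypothetical names.

```javascript
// Hypothetical users collection (IDs shortened for readability).
const users = [
  { _id: "u1", name: "Ana", email: "ana@example.com", password: "hash1" },
  { _id: "u2", name: "Ben", email: "ben@example.com", password: "hash2" },
  { _id: "u3", name: "Cy", email: "cy@example.com", password: "hash3" },
];
const space = {
  _id: "s1",
  name: "testSpace",
  owner: "u1",
  challengers: ["u1", "u2", "u3"],
  tasks: [],
};

// Whitelist the fields that are safe to send to the frontend.
function publicUser(user) {
  return { _id: user._id, name: user.name, email: user.email };
}

// Replace each challenger ID (and the owner ID) with the public view of that user.
function expandChallengers(spaceDoc, userCollection) {
  const byId = new Map(userCollection.map((u) => [u._id, u]));
  return {
    ...spaceDoc,
    owner: byId.has(spaceDoc.owner) ? publicUser(byId.get(spaceDoc.owner)) : null,
    challengers: spaceDoc.challengers
      .filter((id) => byId.has(id)) // skip IDs of users that no longer exist
      .map((id) => publicUser(byId.get(id))),
  };
}

console.log(JSON.stringify(expandChallengers(space, users), null, 2));
```

The stored document keeps only IDs, so a user's name or email changes in one place; the expanded shape exists only in the response.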
I'm designing a REST API where you can search for data in different countries, but since you can search for the same thing, at the same time, in different countries (max 4), I am unsure of the best/correct way to do it.
This would work to start with to get data (I'm using cars as an example):
/api/uk,us,nl/car/123
That request could return different IDs for the different countries (uk=1, us=2, nl=3), so what do I do when data is then requested for those 3 countries?
For a nice structure I could get the data one at a time:
/api/uk/car/1
/api/us/car/2
/api/nl/car/3
But that is not very efficient since it hits the backend 3 times.
I could do this:
/api/car/?uk=1&us=2&nl=3
But that doesn't work very well if I want to add to that path:
/api/uk/car/1/owner
Because that would then turn into:
/api/car/owner/?uk=1&us=2&nl=3
Which doesn't look good.
Anyone got suggestions on how to structure this in a good way?
I answered a similar question before, so I will stick to that idea:
You have a set of elements (cars) and you want to filter it in some way. My advice is to add any filter as a query field. If the field is not present, choose one country based on the client's locale:
mydomain.com/api/v1/car?countries=uk,us,nl
This field should disappear when you look for a specific car or its owner
mydomain.com/api/v1/car/1/owner
because the country is not needed (unless the car ID 1 is reused for each country)
Update:
I really did not expect that one car ID could be shared by several cars; an ID should be unique (like a primary key in a database). Given that it can, it makes sense to keep the country parameter in the owner search:
mydomain.com/api/v1/car/1/owner?countries=uk,us
This should return a list of people who own a car with the ID 1... but to me this makes little sense as functionality, so for this search I would allow only one country:
mydomain.com/api/v1/car/1/owner?country=uk
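A minimal sketch of reading the `countries` filter, assuming Node's built-in WHATWG `URL` class. `parseCountries`, the `MAX_COUNTRIES` limit of 4 (taken from the question) and the `fallbackCountry` argument (standing in for whatever you derive from the client's locale) are hypothetical.

```javascript
const MAX_COUNTRIES = 4; // limit taken from the question

// Parse ?countries=uk,us,nl (or ?country=uk) into a clean, de-duplicated list.
function parseCountries(requestUrl, fallbackCountry) {
  const params = new URL(requestUrl, "http://localhost").searchParams;
  const raw = params.get("countries") ?? params.get("country");
  if (!raw) return [fallbackCountry]; // no filter given: use the client's locale
  const list = [
    ...new Set(
      raw
        .split(",")
        .map((c) => c.trim().toLowerCase())
        .filter((c) => /^[a-z]{2}$/.test(c)) // silently drop malformed codes
    ),
  ];
  if (list.length === 0) return [fallbackCountry];
  if (list.length > MAX_COUNTRIES) {
    throw new RangeError(`at most ${MAX_COUNTRIES} countries per request`);
  }
  return list;
}

console.log(parseCountries("/api/v1/car?countries=uk,us,nl", "uk"));
console.log(parseCountries("/api/v1/car/1/owner", "nl"));
```

Because the filter is a query field rather than a path segment, the same function serves both `/car` and `/car/1/owner` without changing the URL structure.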
How do I change a live data schema in MongoDB?
For example, if I have a "Users" collection with the following document:
var user = {
  _id: 123312,
  name: "name",
  age: 12,
  address: {
    country: "",
    city: "",
    location: ""
  }
};
Now, in a new version of my application, if I add a new property to the "User" entity, say weight, height, or adult (based on the user's age), how do I change all the current live data that does not have the adult property? I read about MapReduce and the group aggregation command, but they seem suited to analytic operations and other calculations, or am I wrong?
So what is the best way to change your current running data schema in MongoDB ?
It really depends upon your programming language. MongoDB is really good at having a dynamic schema. I think your pattern of thought at the moment is too SQL-oriented, in that you believe all rows, even if they do not yet have a value, must have the new field.
The reality is quite different. Documents that have nothing meaningful to put in the field do not require it, and in your application you can just check whether the returned document has a value; if not, you can treat it as null, just as you would in a fixed SQL schema.
This is one aspect where MongoDB shines: you don't have to apply the new field to every document up front; instead you can lazily fill it in as data is entered by the user.
So just code the field into your application and let the user do the work for you.
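A sketch of the lazy approach: documents without the new fields are left untouched, and the application supplies a value when it reads them. It assumes the user document has an `age` field as in the question and that adulthood starts at 18; `readUser` and `ADULT_AGE` are hypothetical.

```javascript
const ADULT_AGE = 18; // assumption: adjust to your own business rule

// Fill in fields that older documents may lack, without writing to the DB.
function readUser(doc) {
  return {
    weight: null, // unknown until the user enters it
    ...doc, // stored values win over the defaults above
    adult: doc.adult ?? (typeof doc.age === "number" ? doc.age >= ADULT_AGE : null),
  };
}

const oldDoc = { _id: 123312, name: "name", age: 12 };
const newDoc = { _id: 5, name: "other", age: 30, weight: 70, adult: true };
console.log(readUser(oldDoc), readUser(newDoc));
```

Routing every read through one function like this keeps the "missing means default" rule in a single place.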
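The best way to add this field to existing documents is to write a loop in the mongo shell, ideally run on or close to the primary of your replica set (if you have one; otherwise just on the server), like so:
// Only touch documents that don't have the field yet, so the loop is safe to re-run
db.users.find({weight: {$exists: false}}).forEach(function(doc){
  doc.weight = '44 stone';
  db.users.save(doc);
});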
That is currently the best way to do something like what you're asking.
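When the new field is derived from existing data, as `adult` is from `age`, the backfill only needs to apply that rule to documents that lack the field. The sketch below shows the logic in memory, assuming the same 18-year threshold; `migrateAdult` is hypothetical. On MongoDB 4.2 or later the same backfill can run server-side, without pulling documents to the client, as an update that takes an aggregation pipeline, e.g. `db.users.updateMany({adult: {$exists: false}, age: {$type: "number"}}, [{$set: {adult: {$gte: ["$age", 18]}}}])`.

```javascript
const ADULT_AGE = 18; // assumption, matching the question's "adult" example

// Backfill `adult` only where it is missing, so the migration is safe to re-run.
function migrateAdult(docs) {
  let modified = 0;
  for (const doc of docs) {
    if (doc.adult !== undefined || typeof doc.age !== "number") continue;
    doc.adult = doc.age >= ADULT_AGE; // mutates in place, like save(doc) in the shell loop
    modified += 1;
  }
  return modified;
}

const users = [
  { _id: 1, age: 12 },
  { _id: 2, age: 40 },
  { _id: 3, age: 20, adult: true },
  { _id: 4, name: "no age yet" },
];
console.log(migrateAdult(users), users);
```

Returning the count of modified documents makes it easy to confirm that a second run is a no-op.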
We have nested categories for several products (e.g., Sports -> Basketball -> Men's, Sports -> Tennis -> Women's) and are using Mongo instead of MySQL.
We know how to store nested categories in a SQL database like MySQL, but would appreciate any advice on what to do for Mongo. The operation we need to optimize for is quickly finding all products in one category or subcategory, which could be nested several layers below a root category (e.g., all products in the Men's Basketball category or all products in the Women's Tennis category).
This Mongo doc suggests one approach, but it says it doesn't work well when operations are needed on subtrees, which we need (since categories can span multiple levels).
Any suggestions on the best way to efficiently store and search nested categories of arbitrary depth?
The first thing you want to decide is exactly what kind of tree you will use.
The big thing to consider is your data and access patterns. You have already stated that 90% of all your work will be querying, and by the sounds of it (e-commerce) updates will only be run by administrators, most likely rarely.
So you want a schema that lets you query quickly for children through a path, i.e. Sports -> Basketball -> Men's, Sports -> Tennis -> Women's, and that doesn't really need to scale for updates.
As you rightly pointed out, MongoDB does have a good documentation page for this: https://docs.mongodb.com/manual/applications/data-models-tree-structures/ where 10gen lays out different models and schema methods for trees and describes the main ups and downs of each.
The one that should catch the eye if you are looking to query easily is materialised paths: https://docs.mongodb.com/manual/tutorial/model-tree-structures-with-materialized-paths/
This is a very interesting way to build up trees, since to query the example you gave above, "Womens" in "Tennis", you can simply use a prefix regex (which can use the index: http://docs.mongodb.org/manual/reference/operator/regex/ ) like so:
db.products.find({category: /^Sports,Tennis,Womens(,|$)/})
to find all products listed under a certain path of your tree.
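To build such a query from category names, the path segments can be regex-escaped so that characters like `.` or `+` in a name are matched literally. The sketch assumes, as in the example, that paths are comma-separated names with no commas inside a name; `escapeRegex` and `subtreeRegex` are hypothetical helpers, and the in-memory `filter` stands in for `db.products.find({category: regex})`. Keeping the `^` anchor keeps it a prefix match, which is what allows MongoDB to use an index on `category`.

```javascript
// Escape regex metacharacters so category names are matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Match a category and everything below it, e.g. Sports,Tennis,Womens and
// Sports,Tennis,Womens,Shoes, but not Sports,Tennis,WomensWear.
function subtreeRegex(segments) {
  return new RegExp("^" + segments.map(escapeRegex).join(",") + "(,|$)");
}

const products = [
  { name: "Racket A", category: "Sports,Tennis,Womens" },
  { name: "Shoe B", category: "Sports,Tennis,Womens,Shoes" },
  { name: "Dress C", category: "Sports,Tennis,WomensWear" },
  { name: "Ball D", category: "Sports,Basketball,Mens" },
];

const re = subtreeRegex(["Sports", "Tennis", "Womens"]);
console.log(products.filter((p) => re.test(p.category)).map((p) => p.name));
```

The `(,|$)` ending is what separates a true child path from a sibling whose name merely starts with the same text.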
Unfortunately this model is really bad at updating: if you move a category or change its name, you have to update all products, and there could be thousands of products under one category.
A better method would be to store a cat_id on the product and then put the categories into a separate collection with the schema:
{
  _id: ObjectId(),
  name: 'Women\'s',
  path: 'Sports,Tennis,Womens',
  normed_name: 'all_special_chars_and_spaces_and_case_sensitive_letters_taken_out_like_this'
}
Now your path queries only involve the categories collection, which should be much smaller and more performant. The exception is when you delete a category: the products will still need touching.
So an example of changing "Tennis" to "Badmin":
db.categories.find({path: /^Sports,Tennis(,|$)/}).forEach(function(doc){
  // Replace only the anchored prefix, so other path segments are left alone
  doc.path = doc.path.replace(/^Sports,Tennis/, "Sports,Badmin");
  db.categories.save(doc);
});
Unfortunately MongoDB provides no in-query document reflection at the moment, so you do have to pull the documents out client side, which is a little annoying; however, hopefully it shouldn't result in too many categories being brought back.
And this is basically how it works. It is a bit of a pain to update, but the power of being able to query instantly on any path using an index is, I believe, a better fit for your scenario.
Of course, the added benefit is that this schema is compatible with nested set models (http://en.wikipedia.org/wiki/Nested_set_model), which I have found time and time again to be just awesome for e-commerce sites. For example, Tennis might be under both "Sports" and "Leisure", and you want multiple paths depending on where the user came from.
The materialised-path schema easily supports this by just adding another path; it's that simple.
Hope that makes sense; it was quite a long one.
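The same rename can be sketched in memory, assuming the comma-separated path format above; `renameSegment` is a hypothetical helper. It rewrites the path of the renamed node and of every descendant, and leaves a sibling such as `Sports,TennisBalls` alone. The display `name` of the renamed node itself would still be updated separately.

```javascript
// Rename the node at `oldPrefix` (e.g. "Sports,Tennis") to `newName`,
// rewriting the path of the node itself and of all its descendants.
function renameSegment(categories, oldPrefix, newName) {
  const parts = oldPrefix.split(",");
  const newPrefix = [...parts.slice(0, -1), newName].join(",");
  let changed = 0;
  for (const cat of categories) {
    if (cat.path === oldPrefix || cat.path.startsWith(oldPrefix + ",")) {
      cat.path = newPrefix + cat.path.slice(oldPrefix.length);
      changed += 1;
    }
  }
  return changed;
}

const categories = [
  { name: "Tennis", path: "Sports,Tennis" },
  { name: "Women's", path: "Sports,Tennis,Womens" },
  { name: "Tennis Balls", path: "Sports,TennisBalls" },
];
console.log(renameSegment(categories, "Sports,Tennis", "Badmin"), categories);
```

Only category documents are touched; products that store a `cat_id` need no change.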
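A sketch of the multiple-path idea, assuming each category document holds a `paths` array instead of a single `path` string (`paths`, `underPrefix` and `findInSubtree` are hypothetical names). In MongoDB a regex condition on an array field matches if any element matches, so `db.categories.find({paths: /^Leisure,Tennis(,|$)/})` would find the node through either parent; the in-memory version uses `some`.

```javascript
// True if `path` is `prefix` itself or lies below it.
function underPrefix(path, prefix) {
  return path === prefix || path.startsWith(prefix + ",");
}

// Hypothetical category documents: Tennis sits under both Sports and Leisure.
const categories = [
  { _id: 1, name: "Tennis", paths: ["Sports,Tennis", "Leisure,Tennis"] },
  { _id: 2, name: "Women's", paths: ["Sports,Tennis,Womens", "Leisure,Tennis,Womens"] },
  { _id: 3, name: "Basketball", paths: ["Sports,Basketball"] },
];

// A category matches if ANY of its paths is in the requested subtree.
function findInSubtree(docs, prefix) {
  return docs.filter((doc) => doc.paths.some((p) => underPrefix(p, prefix)));
}

console.log(findInSubtree(categories, "Leisure").map((c) => c.name));
```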
If all categories are distinct, then think of them as tags. The hierarchy doesn't need to be encoded in the items, because you don't need it when you query for items; the hierarchy is a presentational thing. Tag each item with all the categories in its path, so "Sport > Baseball > Shoes" could be saved as {..., categories: ["sport", "baseball", "shoes"], ...}. If you want all items in the "Sport" category, search for {categories: "sport"}; if you want just the shoes, search for {categories: "shoes"}.
This doesn't capture the hierarchy, but if you think about it that doesn't matter. If the categories are distinct, the hierarchy doesn't help you when you query for items. There will be no other "baseball", so when you search for that you will only get things below the "baseball" level in the hierarchy.
My suggestion relies on categories being distinct, and I guess they aren't in your current model. However, there's no reason why you can't make them distinct. You've probably chosen to use the strings you display on the page as category names in the database. If you instead use symbolic names like "sport" or "womens_shoes", and use a lookup table to find the string to display on the page, you can easily make sure they are distinct, because they have nothing to do with what is displayed on the page. (This will also save you hours of work if the name of a category ever changes, and it will make translating the site easier, if you ever need to do that.)
So if you have two "Shoes" in the hierarchy (for example "Tennis > Women's > Shoes" and "Tennis > Men's > Shoes"), you can just add a qualifier to make them distinct (for example "womens_shoes" and "mens_shoes", or "tennis_womens_shoes"). The symbolic names are arbitrary and can be anything; you could even use numbers and just take the next number in the sequence every time you add a category.
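A sketch of this approach, assuming symbolic category keys and a separate display-name table (`CATEGORY_LABELS`, `itemsInCategory`, `displayName` and the sample items are hypothetical). Each item carries every key on its path, so a single membership test finds everything at or below a category; in MongoDB that is `db.items.find({categories: "womens_tennis_shoes"})`, which an index on `categories` (a multikey index, since the field is an array) can serve.

```javascript
// Display names live in one lookup table; items store only the symbolic keys.
const CATEGORY_LABELS = new Map([
  ["sport", "Sport"],
  ["tennis", "Tennis"],
  ["womens_tennis", "Women's"],
  ["womens_tennis_shoes", "Shoes"],
  ["mens_tennis", "Men's"],
  ["mens_tennis_shoes", "Shoes"],
]);

// Each item is tagged with every category on its path.
const items = [
  { name: "Court Shoe W", categories: ["sport", "tennis", "womens_tennis", "womens_tennis_shoes"] },
  { name: "Court Shoe M", categories: ["sport", "tennis", "mens_tennis", "mens_tennis_shoes"] },
  { name: "Racket", categories: ["sport", "tennis"] },
];

// All items at or below a category: one equality test on the array.
function itemsInCategory(all, key) {
  return all.filter((item) => item.categories.includes(key));
}

// Translate a symbolic key into the text shown on the page.
function displayName(key) {
  return CATEGORY_LABELS.get(key) ?? key;
}

console.log(itemsInCategory(items, "tennis").map((i) => i.name));
console.log(displayName("womens_tennis_shoes"));
```

Both "Shoes" entries display the same label but remain distinct keys, so a query for one never returns items from the other.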
I have songs and tags. A tag can be of a type like "recording location" or "last recording date".
In a relational model I would have a join model that holds song_id and tag_id. But in a document-based DB like IndexedDB I would store the tags and their info directly in the document. I wonder whether that would lead to DB bloat in the long run, if I do not have many unique tags?
If another song required one of the tags already used on another song, I would have a duplicate tag.
I could of course use a join store here too, but this would then also mean manual fetches over 2 tables.
I have several questions to the model:
Should I have a songs and a tags store?
How are bulk updates of the tags attached to each song performed?
What indices would I probably need to make this fast?
My main aspect is to search via tag values (and filtered by type).
The key to using IDB and other NoSQL stores well is not getting caught up in join IDs, and instead trying to make each object store useful in its own right (and using indexes well!). Remember, this is just how SQL-ish databases work under the hood, but it lets you do specialized joins when you need them rather than in the general case.
Bulk updates are more of a challenge, but this is about optimizing for the most common case (looking up songs/tags) rather than for rarer things (bulk-changing tag names).
The most basic schema you're looking at is storing songs as something like:
{ name: "Name of Song",
  id: <song id>,
  tags: ["tag1", "tag2", "tag3"] }
And tags can simply be id'd by their name:
{ name: "tagname",
  description: "Some tag description or whatever" }
First create your song objectStore:
var songs = db.createObjectStore("songs", {keyPath: "id"});
Then create a multiEntry index:
songs.createIndex("tags", "tags", {multiEntry: true});
Then create a tag objectStore:
var tags = db.createObjectStore("tags", {keyPath: "name"});
Now you can do the joins yourself if you really need to, but often it's just one-off stuff.
var trans = db.transaction(["songs", "tags"]);
var songTags = [];
trans.objectStore("songs").get("songid").onsuccess = function(e) {
  var song = e.target.result;
  for (var i = 0; i < song.tags.length; i++) {
    trans.objectStore("tags").get(song.tags[i]).onsuccess = function(e) {
      var tag = e.target.result;
      songTags.push(tag);
    };
  }
};
trans.oncomplete = function(e) {
  showSongTags(songTags);
};
Thanks to the 'tags' index on songs, the reverse is pretty easy too. Note that we're using tag names directly, not messing around with some intermediate numeric tag_id.
var trans = db.transaction(["songs", "tags"]);
var songs = [];
trans.objectStore("songs").index("tags").openCursor("tag1").onsuccess = function(e) {
  var cursor = e.target.result;
  if (!cursor) return;
  // Read the current record before advancing the cursor
  songs.push(cursor.value);
  cursor.continue();
};
trans.oncomplete = function(e) {
  showSongs(songs);
};
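IndexedDB does not run in plain Node, so this sketch models in memory what a `multiEntry` index does: each distinct element of a song's `tags` array becomes its own index key, so one lookup by tag name returns every matching song. It also shows one way a bulk tag rename could work: use the index to find the affected songs and rewrite only their arrays. In real IndexedDB that would be a `"readwrite"` transaction iterating `index("tags").openCursor(oldName)` and calling `cursor.update()`, plus replacing the record in the `tags` store, whose key is the name. `buildTagIndex`, `songsWithTag` and `renameTag` are hypothetical.

```javascript
// Model of a multiEntry index: tag name -> set of song ids.
function buildTagIndex(songs) {
  const index = new Map();
  for (const song of songs) {
    for (const tag of new Set(song.tags)) { // one entry per distinct tag in the array
      if (!index.has(tag)) index.set(tag, new Set());
      index.get(tag).add(song.id);
    }
  }
  return index;
}

function songsWithTag(songs, index, tag) {
  const ids = index.get(tag) ?? new Set();
  return songs.filter((s) => ids.has(s.id));
}

// Bulk rename: touch only the songs the index says carry the old tag.
function renameTag(songs, oldName, newName) {
  const ids = buildTagIndex(songs).get(oldName) ?? new Set();
  for (const song of songs) {
    if (!ids.has(song.id)) continue;
    song.tags = [...new Set(song.tags.map((t) => (t === oldName ? newName : t)))];
  }
  return ids.size;
}

const songs = [
  { id: "s1", name: "Song A", tags: ["live", "studio-b"] },
  { id: "s2", name: "Song B", tags: ["studio-b"] },
  { id: "s3", name: "Song C", tags: ["live"] },
];
const tagIndex = buildTagIndex(songs);
console.log(songsWithTag(songs, tagIndex, "live").map((s) => s.name));
console.log(renameTag(songs, "studio-b", "studio-berlin"), songs);
```

The rename cost grows with the number of songs carrying the tag, which is acceptable if, as the answer suggests, renames are rare compared with lookups.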
You're basically facing the same problem people face when they have to choose between a relational database like MySQL and a document-oriented one like MongoDB. Duplicating data (not to mention the keys themselves) takes up space, and it's definitely more "third normal form"-ish to store one copy and use foreign keys instead.
That said, I've been on both sides of this in my IndexedDB work. Concerns about storage efficiency like yours need to be weighed against the way you're actually accessing the data. When you want a foreign-key-like schema in IndexedDB, it necessarily requires two or more object stores, likely stored as two separate files on the underlying filesystem. That means for each query where you want foreign-key data (here, tags), you need at least two object store hits, probably two transactions, and, I'm assuming, the extra I/O overhead associated with those.
I'd go with the document-oriented approach and try to make the storage hit less severe with tricks like key shorthand (e.g. "n" instead of "name").
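A minimal sketch of the key-shorthand trick, assuming a fixed, hand-written one-to-one mapping (`KEY_MAP`, `renameKeys`, `pack` and `unpack` are hypothetical). Records are packed before being written and unpacked after being read, so the rest of the application keeps using the long names. Any index key paths would then have to use the short names (e.g. `t` instead of `tags`).

```javascript
// Long field name -> short stored name (must be one-to-one, and no long
// name may collide with a short one).
const KEY_MAP = new Map([
  ["name", "n"],
  ["description", "d"],
  ["tags", "t"],
]);
const REVERSE_MAP = new Map([...KEY_MAP].map(([long, short]) => [short, long]));

// Rename top-level keys using the given mapping; unknown keys pass through.
function renameKeys(record, mapping) {
  return Object.fromEntries(
    Object.entries(record).map(([key, value]) => [mapping.get(key) ?? key, value])
  );
}

const pack = (record) => renameKeys(record, KEY_MAP);
const unpack = (record) => renameKeys(record, REVERSE_MAP);

const song = { id: "s1", name: "Song A", tags: ["live"] };
const stored = pack(song);
console.log(stored, unpack(stored));
```

Only top-level keys are shortened here; nested objects would need the same treatment applied recursively if they are large.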