Most efficient way to store nested categories (or hierarchical data) in Mongo? - mongodb

We have nested categories for several products (e.g., Sports -> Basketball -> Men's, Sports -> Tennis -> Women's ) and are using Mongo instead of MySQL.
We know how to store nested categories in a SQL database like MySQL, but would appreciate any advice on what to do for Mongo. The operation we need to optimize for is quickly finding all products in one category or subcategory, which could be nested several layers below a root category (e.g., all products in the Men's Basketball category or all products in the Women's Tennis category).
This Mongo doc suggests one approach, but it says it doesn't work well when operations are needed for subtrees, which we need (since categories can reach multiple levels).
Any suggestions on the best way to efficiently store and search nested categories of arbitrary depth?

The first thing you want to decide is exactly what kind of tree you will use.
The big thing to consider is your data and access patterns. You have already stated that 90% of all your work will be querying and by the sounds of it (e-commerce) updates will only be run by administrators, most likely rarely.
So you want a schema that lets you query quickly for children through a path, e.g. Sports -> Basketball -> Men's or Sports -> Tennis -> Women's, and that doesn't really need to scale for updates.
As you rightly pointed out, MongoDB has a good documentation page for this: https://docs.mongodb.com/manual/applications/data-models-tree-structures/ in which 10gen lays out different models and schema methods for trees and describes the main ups and downs of each.
The one that should catch the eye if you are looking to query easily is materialised paths: https://docs.mongodb.com/manual/tutorial/model-tree-structures-with-materialized-paths/
This is a very interesting way to build trees because, to query the example you gave above for "Womens" under "Tennis", you can simply use a prefix-anchored regex (which can use the index: http://docs.mongodb.org/manual/reference/operator/regex/ ) together with an exact match on the category's own path, like so:
db.products.find({$or: [{category: 'Sports,Tennis,Womens'}, {category: /^Sports,Tennis,Womens,/}]})
to find all products listed under a certain path of your tree.
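As a rough sketch of how such a filter could be built from path segments, the snippet below escapes regex metacharacters in category names and matches either the category's own path or anything beneath it. The helper names (`escapeRegExp`, `subtreeFilter`, `matches`) and the sample products are hypothetical, and it assumes paths are comma-joined with no trailing comma. The `matches` function is only an in-memory stand-in so the example runs without a database.

```javascript
// Escape regex metacharacters so category names are matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a filter matching the category itself or anything below it.
// Assumption: paths are comma-joined and have no trailing comma.
function subtreeFilter(segments) {
  const path = segments.join(",");
  return {
    $or: [
      { category: path },
      { category: new RegExp("^" + escapeRegExp(path) + ",") },
    ],
  };
}

// Tiny in-memory stand-in for the query engine, for illustration only.
function matches(filter, doc) {
  return filter.$or.some(({ category }) =>
    category instanceof RegExp
      ? category.test(doc.category)
      : category === doc.category
  );
}

// Hypothetical sample data.
const products = [
  { name: "Racket", category: "Sports,Tennis,Womens" },
  { name: "Skirt", category: "Sports,Tennis,Womens,Clothing" },
  { name: "Cap", category: "Sports,Tennis,WomensPlus" },
  { name: "Ball", category: "Sports,Basketball,Mens" },
];

const filter = subtreeFilter(["Sports", "Tennis", "Womens"]);
const found = products.filter((p) => matches(filter, p)).map((p) => p.name);
console.log(found); // ["Racket", "Skirt"]
```

In the mongo shell the same filter object would be passed as db.products.find(filter). Note that "Sports,Tennis,WomensPlus" is not matched, because the prefix must be followed by a comma.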
Unfortunately, this model is really bad at updating: if you move a category or change its name, you have to update all of its products, and there could be thousands of products under one category.
A better method would be to house a cat_id on the product and then separate the categories into a separate collection with the schema:
{
    _id: ObjectId(),
    name: 'Women\'s',
    path: 'Sports,Tennis,Womens',
    normed_name: 'all_special_chars_and_spaces_and_case_sensitive_letters_taken_out_like_this'
}
So now your path queries only involve the categories collection, which should make them much smaller and more performant. The exception is deleting a category: the products will still need touching.
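To make the two-step lookup concrete, here is a minimal in-memory sketch: first collect the ids of the category and everything below it, then select products by cat_id. The arrays stand in for the two collections, the sample documents and the function names are hypothetical, and the shell calls in the comments are the assumed equivalents rather than something taken from the answer.

```javascript
// Hypothetical stand-ins for the categories and products collections.
const categories = [
  { _id: 1, name: "Sports", path: "Sports" },
  { _id: 2, name: "Tennis", path: "Sports,Tennis" },
  { _id: 3, name: "Women's", path: "Sports,Tennis,Womens" },
  { _id: 4, name: "Men's", path: "Sports,Basketball,Mens" },
];
const products = [
  { name: "Racket", cat_id: 3 },
  { name: "Tennis balls", cat_id: 2 },
  { name: "Jersey", cat_id: 4 },
];

// Step 1: ids of the category at `path` and of every category below it.
// Assumed shell equivalent:
//   db.categories.find({$or: [{path: path}, {path: /^<path>,/}]}, {_id: 1})
function categoryIdsUnder(path) {
  return categories
    .filter((c) => c.path === path || c.path.startsWith(path + ","))
    .map((c) => c._id);
}

// Step 2: products whose cat_id is one of those ids.
// Assumed shell equivalent: db.products.find({cat_id: {$in: ids}})
function productsUnder(path) {
  const ids = new Set(categoryIdsUnder(path));
  return products.filter((p) => ids.has(p.cat_id)).map((p) => p.name);
}

console.log(productsUnder("Sports,Tennis")); // ["Racket", "Tennis balls"]
```

The trade-off is one extra round trip to the categories collection per product query, in exchange for renames and moves touching only category documents.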
So an example of changing "Tennis" to "Badmin":
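// Match the Tennis category itself plus every category below it
db.categories.find({$or: [{path: 'Sports,Tennis'}, {path: /^Sports,Tennis,/}]}).forEach(function(doc){
    // Rewrite only the leading "Sports,Tennis" part of the path
    doc.path = doc.path.replace(/^Sports,Tennis/, 'Sports,Badmin');
    db.categories.updateOne({_id: doc._id}, {$set: {path: doc.path}});
});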
Unfortunately, MongoDB provides no in-query document reflection at the moment, so you have to pull the documents out client-side, which is a little annoying. Hopefully, though, it shouldn't result in too many categories being brought back.
And this is basically how it works really. It is a bit of a pain to update but the power of being able to query instantly on any path using an index is more fitting for your scenario I believe.
Of course, the added benefit is that this schema is compatible with nested set models: http://en.wikipedia.org/wiki/Nested_set_model which I have found time and time again are just awesome for e-commerce sites. For example, Tennis might be under both "Sports" and "Leisure", and you want multiple paths depending on where the user came from.
The schema for materialised paths easily supports this by just adding another path, that simple.
Hope it makes sense, quite a long one there.

If all categories are distinct then think of them as tags. The hierarchy doesn't need to be encoded in the items because you don't need it when you query for items. The hierarchy is a presentational thing. Tag each item with all the categories in its path, so "Sport > Baseball > Shoes" could be saved as {..., categories: ["sport", "baseball", "shoes"], ...}. If you want all items in the "Sport" category, search for {categories: "sport"}; if you want just the shoes, search for {categories: "shoes"}.
This doesn't capture the hierarchy, but if you think about it that doesn't matter. If the categories are distinct, the hierarchy doesn't help you when you query for items. There will be no other "baseball", so when you search for that you will only get things below the "baseball" level in the hierarchy.
My suggestion relies on categories being distinct, and I guess they aren't in your current model. However, there's no reason why you can't make them distinct. You've probably chosen to use the strings you display on the page as category names in the database. If you instead use symbolic names like "sport" or "womens_shoes" and use a lookup table to find the string to display on the page, you can easily make sure that they are distinct, because they don't have anything to do with what is displayed on the page. (This will also save you hours of work if the name of a category ever changes, and it will make translating the site easier, if you ever need to do that.) So if you have two "Shoes" in the hierarchy (for example "Tennis > Women's > Shoes" and "Tennis > Men's > Shoes"), you can just add a qualifier to make them distinct (for example "womens_shoes" and "mens_shoes", or "tennis_womens_shoes"). The symbolic names are arbitrary and can be anything; you could even use numbers and just use the next number in the sequence every time you add a category.
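A small sketch of the tag idea with symbolic names and a display lookup table might look like this. Everything here (the item documents, the `displayNames` table, and the function names) is made up for illustration. It also assumes the tags are stored in path order so a breadcrumb can be rebuilt from them.

```javascript
// Display strings live in a lookup table keyed by symbolic name (hypothetical names).
const displayNames = {
  sport: "Sport",
  tennis: "Tennis",
  womens_shoes: "Shoes",
  mens_shoes: "Shoes",
};

// Hypothetical items, each tagged with every category on its path.
const items = [
  { name: "Court shoe W", categories: ["sport", "tennis", "womens_shoes"] },
  { name: "Court shoe M", categories: ["sport", "tennis", "mens_shoes"] },
  { name: "Racket", categories: ["sport", "tennis"] },
];

// In-memory equivalent of the query {categories: tag}:
// keep items whose categories array contains the tag.
function itemsInCategory(tag) {
  return items
    .filter((item) => item.categories.includes(tag))
    .map((item) => item.name);
}

// Turn an item's symbolic tags into a display breadcrumb.
// Assumption: tags are stored in path order.
function breadcrumb(item) {
  return item.categories.map((tag) => displayNames[tag]).join(" > ");
}

console.log(itemsInCategory("womens_shoes")); // ["Court shoe W"]
console.log(breadcrumb(items[0]));            // "Sport > Tennis > Shoes"
```

Both shoe categories display as "Shoes" while staying distinct in the data, which is the point of separating symbolic names from display strings.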

Related

Geofire TableView - CircleQuery Users for leaderboard [duplicate]

I'm trying to figure out how to run a filtered query with Geofire.
Suppose I have restaurants in different categories, and I want to add the category to my query. How do I go about this?
One way I have now is to query the keys with Geofire, loop through each key to get the restaurant, and insert the matching restaurants into an array.
This seems so inefficient. Is there any other way to go about this?
Ideally I will have the filtered results, and only load each item when they're about to be shown.
Cheers!
Firebase queries can only filter by one condition. Geofire already does quite some "magic" to allow it to filter on both longitude and latitude. Adding another property to that equation might be possible, but is well beyond what Geofire handles by default. See GeoFire: How to add extra conditions within the query?
If you only ever want to access one category at a time, you can put the restaurants in a top-level node per category and point Geofire to one category.
/category1
    item1
        g: "pns0h0mf2u"
        l: [-53.435719, 140.808716]
    item2
        g: "u417k3dwub"
        l: [56.83069, 1.94822]
/category2
    item3
        g: "8m3rz3s480"
        l: [30.902225, -166.66809]
/items
    item1: ...
    item2: ...
    item3: ...
In the above example, we have two categories: category1 with 2 items and category2 with just 1 item. For each item, we see the data that Geofire uses: a geohash and the latitude and longitude. We also keep a single list with the other properties of these 3 items.
But more commonly, you simply do the extra filtering in client-side code. If you're worried about the performance of that: measure it, share the code, JSON data and measurements.
This is an old question, but I've seen it in a few places on the web, so I thought I might share one trick I've used.
The Problem
If you have a large collection in your database, maybe containing hundreds of thousands of keys, for example, it might not be feasible to grab them all. If you're trying to filter results based on location in addition to other criteria, you're stuck with something like:
1. Execute the location query
2. Loop through each returned geofire key and grab the corresponding data in the database
3. Check each returned piece of data to see if it matches the other criteria
Unfortunately, that's a lot of network requests, which is quite slow.
More concretely, let's say we want to get all users within e.g. 100 miles of a particular location that are male and between ages 20 and 25. If there are 10,000 users within 100 miles, that means 10,000 network requests to grab the user data and compare their gender and age.
The Workaround:
You can store the data you need for your comparisons in the geofire key itself, separated by a delimiter. Then, you can just split the keys returned by the geofire query to get access to the data. You still have to filter through them, but it's much faster than sending hundreds or thousands of requests.
For instance, you could use the format UserID*gender*age, which might look something like facebook:1234567*male*24. The important points are:
- Separate data points by a delimiter
- Use a valid character for the delimiter: "It can include any unicode characters except for . $ # [ ] / and ASCII control characters 0-31 and 127."
- Use a character that is not going to be found elsewhere in your database. I used *, but that might not work for you. Do not use any characters from -0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz, since those are fair game for keys generated by Firebase's push()
- Choose a consistent order for the data - in this case, UserID first, then gender, then age.
You can store up to 768 bytes of data in firebase keys, which goes a long way.
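The key format described above can be sketched as a pair of encode/decode helpers plus a client-side filter. This is only an illustration: the function names and the sample keys are hypothetical, and the `keys` array stands in for whatever the geofire location query returns.

```javascript
const DELIMITER = "*";

// Build a key such as "facebook:1234567*male*24".
function encodeKey(userId, gender, age) {
  const parts = [userId, gender, String(age)];
  if (parts.some((part) => part.includes(DELIMITER))) {
    throw new Error("field contains the delimiter");
  }
  return parts.join(DELIMITER);
}

// Split a key back into its fields, in the same fixed order.
function decodeKey(key) {
  const [userId, gender, age] = key.split(DELIMITER);
  return { userId, gender, age: Number(age) };
}

// Filter the keys returned by a location query without further network requests.
function filterKeys(keys, gender, minAge, maxAge) {
  return keys
    .map(decodeKey)
    .filter((u) => u.gender === gender && u.age >= minAge && u.age <= maxAge);
}

// Hypothetical keys, as they might come back from the location query.
const keys = [
  encodeKey("facebook:1234567", "male", 24),
  encodeKey("facebook:7654321", "female", 22),
  encodeKey("facebook:1112223", "male", 31),
];

console.log(filterKeys(keys, "male", 20, 25));
// [{ userId: "facebook:1234567", gender: "male", age: 24 }]
```

Only the users that survive this filter then need a follow-up request for their full records.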
Hope this helps!

REST API structure for multiple countries

I'm designing a REST API where you can search for data in different countries, but since you can search for the same thing, at the same time, in different countries (max 4), I am unsure of the best/correct way to do it.
This would work to start with to get data (I'm using cars as an example):
/api/uk,us,nl/car/123
That request could return different ids for the different countries (uk=1,us=2,nl=3), so what do I do when data is requested for those 3 countries?
For a nice structure I could get the data one at the time:
/api/uk/car/1
/api/us/car/2
/api/nl/car/3
But that is not very efficient since it hits the backend 3 times.
I could do this:
/api/car/?uk=1&us=2&nl=3
But that doesn't work very well if I want to add to that path:
/api/uk/car/1/owner
Because that would then turn into:
/api/car/owner/?uk=1&us=2&nl=3
Which doesn't look good.
Anyone got suggestions on how to structure this in a good way?
I answered a similar question before, so I will stick to that idea:
You have a set of elements (cars) and you want to filter it in some way. My advice is to add any filter as a field. If the field is not present, then choose one country based on the locale of the client:
mydomain.com/api/v1/car?countries=uk,us,nl
This field should disappear when you look for a specific car or its owner
mydomain.com/api/v1/car/1/owner
because the country is not needed (unless the car ID 1 is reused for each country)
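On the server side, reading such a countries field could look roughly like the sketch below. The function name `countriesFor` and the fallback argument are hypothetical; the limit of four comes from the question, and the parsing uses the standard URL API.

```javascript
const MAX_COUNTRIES = 4; // the question allows at most four countries per search

// Return the country codes requested in the URL, falling back to the
// client's locale country when the `countries` field is absent or empty.
function countriesFor(requestUrl, localeCountry) {
  const raw = new URL(requestUrl).searchParams.get("countries") || "";
  const codes = [
    ...new Set(
      raw
        .split(",")
        .map((code) => code.trim().toLowerCase())
        .filter(Boolean)
    ),
  ];
  if (codes.length === 0) {
    return [localeCountry];
  }
  if (codes.length > MAX_COUNTRIES) {
    throw new RangeError("at most " + MAX_COUNTRIES + " countries may be requested");
  }
  return codes;
}

console.log(countriesFor("https://mydomain.com/api/v1/car?countries=uk,us,nl", "nl"));
// ["uk", "us", "nl"]
console.log(countriesFor("https://mydomain.com/api/v1/car", "nl"));
// ["nl"]
```

Keeping the filter in the query string means the resource path itself stays the same whether one or several countries are searched.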
Update:
I really did not expect that the id of a car could be shared by several cars; an ID should be unique (like a primary key in a database). In that case, it makes sense to keep the country parameter on the owner search:
mydomain.com/api/v1/car/1/owner?countries=uk,us
This should return a list of people who own a car with the id 1... but to me this makes little sense as functionality, so for this search I would allow only one country:
mydomain.com/api/v1/car/1/owner?country=uk

Mongo - How to query by category

I have the need to list links on a page, grouped together by category.
In MySQL this would be easy - I'd have a category table and a links table. The links table would include a category id. I could then loop over the entries in the category table and query the links table where category matches the current category.
I've been reading about model tree structures in Mongo and things seem to work a little differently (link). The basic gist seems to be to store either a parent or a child reference in the document directly, which allows you to create a tree structure. I'm unsure how to use these sorts of structures to output the data as I need it. I need to somehow loop over the categories and then query the links based on that.
I'm wondering if I can just take the same approach as I would have with SQL? So have a separate categories collection and then in the links collection just have an object id that points to the category? I can then loop over categories and query the links collection each time for all links with a matching category.
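For what it's worth, the approach described here (a separate categories collection, with each link holding a category id) can be sketched in plain JavaScript as follows. The arrays stand in for the two collections, and the field name `category_id`, the sample documents, and the function name are all hypothetical. Instead of querying links once per category, this version groups all links in a single pass.

```javascript
// Hypothetical stand-ins for the categories and links collections.
const categories = [
  { _id: "c1", name: "News" },
  { _id: "c2", name: "Tools" },
];
const links = [
  { title: "Example A", url: "https://example.com/a", category_id: "c1" },
  { title: "Example B", url: "https://example.com/b", category_id: "c2" },
  { title: "Example C", url: "https://example.com/c", category_id: "c1" },
];

// Group link titles by category_id in one pass, then walk the categories in order.
function linksByCategory(categories, links) {
  const grouped = new Map(categories.map((c) => [c._id, []]));
  for (const link of links) {
    const bucket = grouped.get(link.category_id);
    if (bucket) {
      bucket.push(link.title);
    }
  }
  return categories.map((c) => ({ category: c.name, links: grouped.get(c._id) }));
}

console.log(linksByCategory(categories, links));
// [{ category: "News", links: ["Example A", "Example C"] },
//  { category: "Tools", links: ["Example B"] }]
```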

Meteor and MongoDB dropdown population and integrity

Hopefully I can describe this correctly but I come from the RDBMS world and I'm building an inventory type application with Meteor. Meteor and Mongodb may not be the best option for this application but hopefully it can be done and this seems like a circumstance that many converts will run into.
I'm trying to forget many of the things I know about relational databases and referential integrity so I can get my head wrapped around Mongodb but I'm hung up on this issue and how I would appropriately find the data with Meteor.
The inventory application will have a number of drop downs but I'll use an example to better explain. Let's say I wanted to track an item so I'll want the Name, Qty on Hand, Manufacturer, and Location. Much more than that but I'm keeping it simple.
The Name and Qty on Hand are easy since they are entered by the user, but the Manufacturer and the Location should be chosen in a drop down from a data driven list (I'm assuming a Collection of sorts), or a new one added to the list if it is a new Manufacturer or Location. Odds are that I will use the Autocomplete package as well, but the point is the same. I certainly wouldn't want the end user to misspell the Manufacturer name and thereby end up with documents that are supposed to have the same Manufacturer but that don't due to a typo. So I need some way to enforce the integrity of the data stored for Manufacturer and Location.
The reason is because when the user is viewing all inventory items later, they will have the option of filtering the data. They might want to filter the inventory items by Manufacturer. Or by Location. Or by both.
In my relational way of thinking this would just be three tables. INVENTORY, MANUFACTURER, and LOCATION. In the INVENTORY table I would store the ID of the related respective table row.
I'm trying to figure out how to store this data with Mongodb and, equally important, how to then find these Manufacturer and Location items to populate the drop down in the first place.
I found the following article which helps me understand some things but not quite what I need to connect the dots in my head.
Thanks!
referential data
[EDIT]
Still working at this, of course, but the best I've come up with is to do it the normalized way, much like what is listed in the above article. Something like this:
inventory
{
    name: "Pen",
    manufacturer: {id: "25643"},
    location: {id: "95789"}
}
manufacturer
{
    name: "BIC",
    id: "25643"
}
location
{
    name: "East Warehouse",
    id: "95789"
}
Seems like this (in a more simple form) would have to be an extremely common need for many/most applications so want to make sure that I'm approaching it correctly. Even if this example code were correct, should I use an id field with generated numbers like that or should I just use the built-in _id field?
I've come from a similar background, so I don't know if I'm doing it correctly in my application, but I have gone for a similar option to yours. My app is an e-learning app, so an Organisation will have many Courses.
So my schema looks similar to yours, except I have an array of objects that look like {course_id: <id>}
I then registered a helper that takes the data from the organisation and adds in the additional data I need about the courses.
// Gets Organisation Courses - in your case this could get the locations/manufacturers
UI.registerHelper('organisationCourses', function() {
    var user = Meteor.user();
    if (user) {
        var organisation = Organisations.findOne({_id: user.profile.organisation._id});
        var courses = organisation.courses.courses;
        return courses;
    } else {
        return false;
    }
});
// Takes the course data and, for each course_id value, finds the course and adds all of its data to the object
UI.registerHelper('courseData', function() {
    var courseContent = this;
    var course = Courses.findOne({'_id': courseContent.course_id});
    return _.extend(courseContent, _.omit(course, '_id'));
});
Then from my page all I have to call is:
{{#each organisationCourses}}
    {{#with courseData}}
        {{> admListCoursesItem}}
    {{/with}}
{{/each}}
If I remember rightly I picked up this approach from an EventedMind How-to video.

Need a good database design for this situation

I am making an application for a restaurant.
For some food items, there are some add-ons available - e.g. Toppings for Pizza.
My current design for Order Table-
FoodId || AddOnId
If a customer opts for multiple addons for a single food item (say Topping and Cheese Dip for a Pizza), how am I gonna manage?
Solutions I thought of -
1. Ids separated by commas in AddOnId column (Bad idea I guess)
2. Saving combinations of all addons as a different addon in the Addon Master Table.
3. Making another Trans table only for the addons of an ordered food item.
Please suggest.
PS - I searched a lot for a similar question but couldn't find one.
Your relationship works like this:
(1 Order) has (1 or more Food Items) which have (0 or more toppings).
The most detailed structure for this will be 3 tables (in addition to Food Item and Topping):
Order
Order to Food Item
Order to Food Item to Topping
Now, for some additional details. Let's start fleshing out the tables with some fields...
Order
    OrderId
    Cashier
    Server
    OrderTime
Order to Food Item
    OrderToFoodItemId
    OrderId
    FoodItemId
    Size
    BaseCost
Order to Food Item to Topping
    OrderToFoodItemId
    ToppingId
    LeftRightOrWhole
Notice how much information you can now store about an order that is not dependent on anything except that particular order?
While it may appear to be more work to maintain more tables, the truth is that it structures your data, allowing you many added advantages... not the least of which is being able to more easily compose sophisticated reports.
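To illustrate how the three tables fit together, here is a small sketch that uses plain JavaScript arrays as stand-ins for the tables and joins them for one order. The row values and the function name `orderDetails` are made up; only the field names follow the layout above.

```javascript
// Hypothetical rows; each array stands in for one table.
const orders = [
  { orderId: 1, cashier: "Ann", server: "Bob", orderTime: "2024-01-01T12:00:00Z" },
];
const orderToFoodItem = [
  { orderToFoodItemId: 10, orderId: 1, foodItemId: 100, size: "Large", baseCost: 12.0 },
  { orderToFoodItemId: 11, orderId: 1, foodItemId: 101, size: "Small", baseCost: 3.5 },
];
const orderToFoodItemToTopping = [
  { orderToFoodItemId: 10, toppingId: 7, leftRightOrWhole: "Whole" },
  { orderToFoodItemId: 10, toppingId: 9, leftRightOrWhole: "Left" },
];

// Join the three tables for one order: each ordered item with its toppings.
function orderDetails(orderId) {
  const order = orders.find((o) => o.orderId === orderId);
  if (!order) {
    return null;
  }
  const items = orderToFoodItem
    .filter((line) => line.orderId === orderId)
    .map((line) => ({
      foodItemId: line.foodItemId,
      size: line.size,
      toppings: orderToFoodItemToTopping
        .filter((t) => t.orderToFoodItemId === line.orderToFoodItemId)
        .map((t) => t.toppingId),
    }));
  return { orderId: order.orderId, orderTime: order.orderTime, items };
}

console.log(JSON.stringify(orderDetails(1).items));
// [{"foodItemId":100,"size":"Large","toppings":[7,9]},{"foodItemId":101,"size":"Small","toppings":[]}]
```

Because toppings hang off OrderToFoodItemId rather than FoodItemId, the same pizza can appear twice in one order with different toppings.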
You want to model two many-to-many relationships, by the sound of it.
i.e. many products (food items) can belong to many orders, and many addons can belong to many products:
Orders
    Id
Products
    Id
OrderLines
    Id
    OrderId
    ProductId
Addons
    Id
ProductAddons
    Id
    ProductId
    AddonId
Option 1 is certainly a bad idea as it breaks even first normal form.
Why don't you go for a many-to-many relationship?
Situation: one food can have many toppings, and one topping can be on many foods.
You have a food table, a toppings table, and a FoodToppings bridge table.
This is just a brief idea; expand the database to fit your requirements.
You're right, the first one is a bad idea, because it does not comply with the normal forms and it would be hard to maintain (e.g. if you remove an addon, you would need to parse strings to remove its id from each row, which is really slow).
There is nothing wrong with the table you already have, but its primary key will be (foodId, addonId) rather than foodId alone.
Alternatively, you can add a separate "id" column to avoid using a compound primary key.