Basically we have four tables that are joined: Malls, Stores, Brands, and Categories.
Malls can have many Stores.
Each Store is linked to a Brand.
Each Brand has many Categories.
e.g. Mall A has 2 "McDonald's Cafes", each belonging to the Brand "McDonald's". "McDonald's" Brand has "Fast Food" and "Breakfast" as Categories.
We're trying to figure out how to display all the Categories that exist within Mall A.
e.g. Mall A has "Fast Food" and "Breakfast" categories, based on the "McDonald's" stores.
Ideally this would be a query so that the Categories are updated automatically.
We've tried querying for all Stores within a Mall, and then finding the Categories via the Store-Brand join, then reducing duplicates. But some Malls have more than 700 Stores, so this process is quite expensive in terms of querying and processing the data.
I managed to figure it out! Using Sequelize, my query was as follows:
postgres.BrandsCategories.findAndCountAll({
limit,
offset,
include: [
{
model: postgres.Brands,
as: "brands",
include: [
{
model: postgres.Stores,
as: "stores",
where: {
mallId: parent.dataValues.id
}
}
]
}
]
})
Related
I am designing some data model for mango db, I have some requirement similar to below json.
Single_Collection.
{
"collegeid": 1234,
"Name": "aaaa",
"otherinfo": 1,
"studnet":[
{
"stdid": 1,
"name": "n1"
},
{
"stdid": 2,
"name": "n2"
}
]
}
Two Collections.
College Info
{
"collegeid": 1234,
"Name": "aaaa",
"otherinfo": 1
}
Student Info collection
{
"collegeid": 1234,
"stdid": 1,
"name": "n1"
}
{
"collegeid": 1234,
"stdid": 2,
"name": "n2"
}
Which is the better way interms of reading performance(Having single collection or separating it out), I have more read like given student ID find out the college ID .
Student ID list will be very big.
Also I perform more student insertion operations
IMO,each model design has its own Pros & Cons, what we say "better way" is depending on your use cases(How you query the data? Do you need all the data at the beginning? Do you need paging? etc...)
Let's start from your requirements.
Your requirements
Given a college ID, find out the students in this college.
Given student ID, find out his college ID.
Relation between objects
Obviously college & student is 1:m mapping, because a lot of students in one college but each student can stay in one college only.
I will show you some different model designs and also provide the Pros & Cons for each model.
Approach 1 - Embed students into college
This is the design you mentioned as a single collection.
{
"collegeid":1234,
"Name":"aaaa",
"otherinfo":1,
"studnet":[
{
"stdid":1,
"name":"n1"
},
{
"stdid":2,
"name":"n2"
}
]
}
Pros:
Very natural model for human to read & front-end to display.
Good performance when loading the college and all the students in it. Because the data stored in the engine is in continuous. The engine needs fewer I/O to do that.
Cons:
If you have huge number of students in a college, the size of the document will be very big. It will be inefficient if you add/remove/update student frequently.
There is not a quick way to achieve requirement(2). Since we only maintain the mapping from college -> students, you have to walk through all the documents to find out which college contains the specified studentID.
Approach 2 - Student has reference to college
This is the design you mentioned as a Two Collections. It is similar to the RDBMS tables, student model owns a reference key point to its college.
College collection:
{
"collegeid":1234,
"Name":"aaaa",
"otherinfo":1
}
Student collection:
{
"collegeid":1234,
"stdid":1,
"name":"n1"
}
{
"collegeid":1234,
"stdid":2,
"name":"n2"
}
Pros:
Can achieve requirement(1) and (2). Remember to add index on "collegeid" and "stdid" field.
Every document can be maintained in small size, it is easy for the engine to store the data.
Cons:
College and students are separated. It will be slower than Approach 1 if loading a college and all its students(Need two queries).
You need to merge college and students together by yourself before displaying in UI.
Approach 3 - Duplicated data in college and students
This approach looks like we mix up approach 1 and approach 2. We have two collections: college will have its students embeded in itself, and also a separated student collection. So, the student data is duplicated in both collections.
College collection:
{
"collegeid":1234,
"Name":"aaaa",
"otherinfo":1,
"studnet":[ // duplicated here!
{
"stdid":1,
"name":"n1"
},
{
"stdid":2,
"name":"n2"
}
]
}
Student collection:
{
"collegeid":1234,
"stdid":1,
"name":"n1"
}
{
"collegeid":1234,
"stdid":2,
"name":"n2"
}
Pros:
You have all Pros from Approach 1 & Approach 2.
Cons:
The document in college collection will grow to very big.
You have to keep the data from college collection and student collection Synchronous by yourself.
Approach 4 - Duplicated data in college(Only studentID) and students
This is a variant from Approach 3.
We assume that your use case is:
User can search for a college.
User click one college in search result.
A new UI show all student IDs to user(maybe in a grid or list).
User click one student ID.
System loads the full data of the specified student and show to user in another UI.
In a short word, user does not need the full data of all students at the beginning, he just need students' basic info(e.g. student ID). If user accepts such scenario, you can have below model:
College collection:
{
"collegeid":1234,
"Name":"aaaa",
"otherinfo":1,
"studnetIds":[1, 2] // only student IDs are duplicated
}
Student collection:
{
"collegeid":1234,
"stdid":1,
"name":"n1"
}
{
"collegeid":1234,
"stdid":2,
"name":"n2"
}
College only has the studnet IDs in it. This is the difference compared to Approach 3.
Pros:
Can achieve requirement(1) and (2).
You don't need to worry about the college document grows to huge size. Since it only owns the student IDs.
If user accepts above scenario, this will be a better desgin on the balance of performance/complex/data size.
Cons:
Suitable to specified use case, if the requirement is changed in the future, will break the scenario and this model will be not good.
Summary
You should be very clear to your use cases.
Based on use cases, compare the approachs to see whether you can accept the Pros & Cons.
Load testing is important!
I am building a game app using Firebase and Swift, and store my data like this.
players: {
"category": {
"playerId": {
name: "Joe",
score: 3
}
}
}
There are two queries I need to make based on the score:
Get the top 5 players from a single category.
Get the overall top 100 players of the game.
I have no problem getting the data for the first query, using
ref.child("players").child(category).queryOrdered(byChild: "score").queryLimited(toLast:5)
But I am having trouble figuring out the second query. How can I do this efficiently?
You'll either need to add an additional data structure to allow the second query (which is quite common in NoSQL databases) or change your current structure to allow both queries.
The latter could be accomplished by turning the data into a single flat list, with some additional synthetic properties. E.g.
players: {
"playerId": {
category: "category1"
name: "Joe",
score: 3,
category_score: "category1_3"
}
}
With the new additional properties category and category_score, you can get the top scorers for a category with:
ref.child("players")
.queryOrdered(byChild: "category_score")
.queryStarting(atValue:"category1_")
.queryEnding(atValue("category1_~")
.queryLimited(toLast:5)
And the top 100 scores overall with:
ref.child("players")
.queryOrdered(byChild: "score")
.queryLimited(toLast:100)
For more on this, see my answer here: Firebase Database queries can only order/filter on a single property. In many cases it is possible to combine the values you want to filter on into a single (synthetic) property. For an example of this and other approaches, see my answer here: http://stackoverflow.com/questions/26700924/query-based-on-multiple-where-clauses-in-firebase.
I'm trying to design a schema for Products, Suppliers and Manufacturers:
A product can have many suppliers but only 1 manufacturer.
A supplier can have many products and many manufacturers.
A manufacturer can have many suppliers and many products.
I've reviewed this page where 10gen indicates "To avoid mutable, growing arrays, store the publisher reference inside the book document". In my example I consider the product-->manufacturer relationship to be equivalent to that of book-->publisher. I would therefore do this:
{
_id: "widgets",
manufacturer_name: "Widgets Inc.",
founded: 1980,
location: "CA"
}
{
_id: 123456789,
product_name: "Steel Widget",
color: "Steel",
manufacturer_id: "widgets"
}
{
_id: 223456789,
product_name: "White Widget",
color: "White",
manufacturer_id: "widgets"
}
What is the best way to handle the SUPPLIER (with relationships to many products and many manufacturers) such that I "avoid mutable, growing arrays"??
Note: This is one way to model it. Data modeling has a lot to do with the use cases and the according questions you want to ask. Your use case might need a different model.
I'd probably model it like this
manufacturer
{
_id:"ACME",
name: "ACME Corporation"
…
}
products
{
_id:ObjectId(...),
manufacturer: "ACME",
name: "SuperFoo",
description: "Without SuperFoo, you can't bar or baz!",
…
}
Now comes the problem. Since potentially, if we embed all the products in a supplier document or vice versa, we could break the 16MB size limit easily, I'd use a different approach:
Supplier:
{
_id:ObjectId(...),
"name": "FooMart",
"location: { city:"Cologne",state:"MN",country:"US",geo:[44.770833,-93.783056]}
}
productSuppliers:
{
_id:ObjectId(...),
product:ObjectId(...),
supplier:ObjectId(...)
}
This way, each product can have gazillions of suppliers. Now here come the drawbacks.
For finding a supplier for a certain product, you have to do a two step query. First, you have to find all supplier IDs for a given product:
db.productSuppliers.find({product:<some ObjectId>},{_id:0,supplier:1})
Now, let's say we want to find all suppliers of SuperFoo near Cologne,MN at a max distance of 10 miles:
db.suppliers.find({
_id:{$in:[<ObjectIds of suppliers from the first query>]},
"location.geo": { $near :
{
$geometry: { type: "Point", coordinates: [ 44.770833, -93.783056] },
$maxDistance: 16093
}
}
})
You need to do quite some smart indexing to make these queries efficient. Which indices you need depends on your use case. The problem with indices is that they are only efficient when kept in RAM. So you really should be careful when creating your indices.
Again: The way you model data heavily depends on your use cases. In case you only have a few products per manufacturer or only a few suppliers per product or if each supplier only supplies products by a certain manufacturer, the model may look quite different.
Please have a close look at MongoDB's data modeling documentation. Most problems with MongoDB stem from wrong data modeling for the respective use case.
There are several collections, i.e. Country, Province, City, Univ.
Just like in the real world, every country has several provinces, and every province has several cities, and every city has several universities.
How can I know whether a university is in the given country?For example, country0 may have some universities, what are their _ids?
Documents in those collections are showed below:
{
_id:"country0",
provinces:[
{
$ref:"Province",
$id:"province0"
},
...
]
}
{
_id:"province0",
belongs:{$ref:"Country", $id:"country0"},
cities:[
{
$ref:"City",
$id:"city0"
}
...
]
}
{
_id:"city0",
belongsTo:{$ref:"Province",$id:"province0"},
univs:[
{
$ref:"Univ",
$id:"univ0"
}
...
]
}
{
_id:"univ0",
address:{$ref:"City", $id:"city0"}
}
If there are only two collections, I know fetch() may be useful.
Also, python drivers may be useful, but I can't know their performance well, because I can't use db.system.profile in a .py file.
MongoDB doesn't do joins. N queries are required to get information from N collections. In this situation, to get the _id's of universities in a given country in an array one could do the following (in the mongo shell):
> var country = db.countries.findOne({ "_id": "country0" });
> var province_ids = [];
> country.provinces.forEach(function(province) { province_ids.push(province["$id"]); });
> var provinces = db.provinces.find({ "_id": { "$in": province_ids });
> var city_ids = [];
> provinces.forEach(function(province) { province.cities.forEach(function(city) { city_ids.push(city["$id"]); }); });
> var cities = db.cities.find({ "_id": { "$in": city_ids } });
> univ_ids = [];
> cities.forEach(function(city) { city.univs.forEach(function(univ) { univ_ids.push(univ["$id"]); }); });
It's also possible to accomplish the same thing using the belongsTo field, using similar steps. This is cumbersome and it seems like there should be a better way. There is! Normalize the data. Countries have provinces, which have cities, which have universities, but the relationships are fixed and not of huge cardinality. For doing queries like "what universities are in a given country?" I would suggest storing province documents entirely within countries and university documents entirely within city documents. You could store cities inside of province documents, or inside country documents directly, but a province or country could have hundreds or thousands of cities and this might be too much information for one document (16MB limit per document in MongoDB). Having provinces in countries and universities in cities reduces the number of queries necessary to two.
Another option is to store more information in each child document. Essentially you have a forest (a collection of trees): countries are parents of provinces which are parents of cities which are parents of universities. The belongsTo field is a parent reference. You could store a reference to all ancestors instead of just the parent. Then finding all universities in a given country is one query on the universities collection.
> db.universities.findOne();
{
_id: "univ0",
city: "city0",
province: "province0",
country: "country0"
}
> db.universities.find({ "country": "country0" });
The schema design that is best for you depends on the types of queries your application will need to answer and their relative frequencies and importance. I can't determine that from your question so I can't firmly recommend one schema over another.
As to your mini-question about performance and the db.system.profile collection, note that db.system.profile is a collection. You can query it from a .py file using a driver.
I am having some difficulty structuring my data so that I can benefit from reactivity in Meteor. Mainly nesting arrays of objects makes queries tricky.
The three main views I am trying to project this data onto are
waiter: shows order for one table, each persons meal (items nested, essentially what I have below)
kitchen manager: columns of orders by table (only needs table, note, and the items)
cook: columns of items, by category where started=true (only need item info)
Currently I have a meteor collection of order objects like this:
Order {
table: "1",
waiter: "neil",
note: "note from kitchen",
meals: [
{
seat: "1",
items: [ {n: "potato", category: "fryer", started: false },
{n: "water", category: "drink" }
]
},
{
seat: "2",
items: [ {n: "water", category: "drink" } ]
},
]
}
Is there any way to query inside the nested array and apply some projection, or do I need to look at an entirely different data model?
Assuming you're building this app for one restaurant, there shouldn't be many active orders at any given time—presumably you don't have thousands of tables. Even if you want to keep the orders in the database after they're served, you could add a field active to separate out the current ones.
Then your query is simple: activeOrders = Orders.find({active: true}).fetch(). The fetch returns an array, which you could loop through several times for each of your views, using nested if and for loops as necessary to dig down into the child objects. See also Underscore's _.pluck. You don't need to get everything right with some complicated Mongo query, and in fact your app will run faster if you query once and process the results however many times you need to.