I'm creating a web platform on AWS to enable restaurant owners in my community to create menus. I'm moving from a relational database model to a NoSQL solution and wondering the best way to organize this data. My current relational model is as follows:
Table 'restaurants': id (int / primary key), name, owner (int)
Table 'categories': id (int / primary key), restaurant (int), parent (int)
Table 'items': id (primary key), name, category (int)
Only the owner should be allowed to create/update/delete places, categories, and items.
What would you recommend as a de-normalized solution given the ownership constraint? I was thinking of doing the following:
Table 'restaurants': id (primary key), owner (sort key), categories (list of ids)
Table 'categories': id (primary key), restaurant (id), items (list of item objects), subcategories (list of category ids)
Wondering if it'd be better to have all category data contained within the restaurant table. As an example, a user should only be able to add an item to a category if they are the owner of the associated restaurant, which would take an additional query, per above.
It depends mostly on how you use your data. If the restaurant is usually read in full, it is fine to keep everything in the restaurants table.
If many operations touch only one category, for example when many users are interested only in food and not in drinks, then it would be better to store the data per category.
I think for some restaurants it would be better to split the menu into categories and keep the common data (address, phone, opening hours and so on) at the restaurant level.
I don't think write performance is important here; this seems to be a website where over 90% of the traffic is reads.
Perhaps add a caching layer such as Redis or Memcached? That would speed up reads even more.
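As a rough illustration of the "everything in the restaurants table" option, here is a minimal in-memory sketch in Python. It is not DynamoDB code: the class names, the `owner` field as a user id, and the `add_item` helper are all assumptions made for the example. The point it shows is that when categories live inside the restaurant record, the ownership check needs no second lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    items: list[str] = field(default_factory=list)
    subcategories: list["Category"] = field(default_factory=list)


@dataclass
class Restaurant:
    id: str
    owner: str  # hypothetical user id of the owner
    name: str
    # Categories are embedded in the restaurant record, keyed by name.
    categories: dict[str, Category] = field(default_factory=dict)


def add_item(restaurant: Restaurant, user_id: str, category: str, item: str) -> None:
    """Add an item to a category, but only if user_id owns the restaurant."""
    if restaurant.owner != user_id:
        raise PermissionError("only the owner may modify this restaurant")
    # The owner and the category sit in the same record: one read, one write.
    restaurant.categories.setdefault(category, Category(category)).items.append(item)
```

With a separate categories table, the same check would need the restaurant to be fetched first (or the owner id to be copied onto each category), which is the extra query mentioned in the question.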
The current situation is this:
There is a category table [category] with only a little over 50 records; it almost never grows and is rarely modified.
There is a product table [product] that currently holds millions of records and will keep growing.
The two have a many-to-many relationship: one category contains many products, and each product can belong to multiple categories.
The category list almost never changes. A category contains about 1000 products, and the product list of a category changes infrequently.
Query requirements:
Query all categories (excluding the list of products under the category)
Query the category list by product_id
Query the product list by category_id
Operational requirements:
Modify the product list of a category (add a product to or delete a product from a category, and sort the product list within a category, so the product list of a category must be ordered).
Which many-to-many design works best for this scenario? Some options I have considered:
1. Follow the usual SQL database design and add a Category<-->Product relation table.
[Question] The order of the products within each category is hard to maintain. For example, when the front end reorders a large part of a category's products and submits the result, the Category<-->Product relation table needs an extra index field to record the order, and many records have to be updated. This is not particularly friendly to the operational requirements. Is there anything that can be optimized?
2. The NoSQL way: add a products:[] field directly to the category to hold the list of products in that category.
[Evaluation] One of the query requirements is to query all categories (excluding the list of products under each category), and this design would pull out a lot of unnecessary data (the products) every time. Not applicable.
3. Add products:[] to the Category<-->Product association table.
[Question] This can meet the operational requirements, but how would I satisfy query requirement 2 [Query the category list by product_id], and would there be performance problems?
You need a third table (a junction table) to complete the relationship. The two key columns together form its primary key, and each one is also a foreign key to its parent table.
tblProductCategories
PK product_id FK
PK category_id FK
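The junction table above can be sketched as follows, using Python's built-in sqlite3 module as a stand-in for whatever database is actually in use. The `position` column and the helper names are assumptions added for the example, to cover the ordering requirement from the question; they are not part of the answer itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_category (
    product_id  INTEGER NOT NULL REFERENCES product(product_id),
    category_id INTEGER NOT NULL REFERENCES category(category_id),
    position    INTEGER NOT NULL,  -- assumed column: order within the category
    PRIMARY KEY (product_id, category_id)
);
-- Serves "products by category_id" in display order.
CREATE INDEX idx_pc_category ON product_category (category_id, position);
""")
conn.executemany("INSERT INTO category VALUES (?, ?)", [(1, "food"), (2, "drinks")])
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [(10, "soup"), (11, "tea"), (12, "cake")])
conn.executemany("INSERT INTO product_category VALUES (?, ?, ?)",
                 [(10, 1, 0), (12, 1, 1), (11, 2, 0), (10, 2, 1)])


def categories_of(product_id):
    """Query the category list by product_id (uses the primary key)."""
    rows = conn.execute(
        "SELECT category_id FROM product_category "
        "WHERE product_id = ? ORDER BY category_id", (product_id,))
    return [r[0] for r in rows]


def products_in(category_id):
    """Query the ordered product list by category_id (uses the index)."""
    rows = conn.execute(
        "SELECT product_id FROM product_category "
        "WHERE category_id = ? ORDER BY position", (category_id,))
    return [r[0] for r in rows]


def reorder(category_id, ordered_product_ids):
    """Rewrite the positions of one category in a single transaction."""
    with conn:
        conn.executemany(
            "UPDATE product_category SET position = ? "
            "WHERE category_id = ? AND product_id = ?",
            [(pos, category_id, pid) for pos, pid in enumerate(ordered_product_ids)])
```

A large reorder still touches one row per product, but with about 1000 products per category and infrequent changes, that is a single small batch per request.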
I'm new to database design and I am working on a project that requires the use of a single entity (medication) that could be tied to any number of patients, and each patient could have a different dosage. What would be the best way to lay out a table for this type of situation? I could use a single table and just store each individual medication and dosage and tie that to the unique patient. But that would give me duplicate entries in the medication table (same med with just a different dosage).
What I would like is to have a single entry for each medication name and have each patient have a unique dosage for that particular med. Of course a single patient could also have many different medications, so I would have to be able to have a unique dosage for each med for different patients.
I am using the Entity Framework model-first approach. Would I use a single table T_Patient_Medication, use the two table IDs as a combined primary key, and then add a dosage field for that combination? If so, how would I create the association to tie this table to the other two? Any suggestions?
Off the top of my head:
-a medication table(MedicineId, MedicineName, etc).
-a patient table(PatientId, PatientName, etc)
-a patient-medicine table(MedicineId, PatientId, Dosage, date, notes etc).
In other words, the medication table contains one row per unique med, and the patient table contains one row for each unique patient.
The patient-medicine table is where these two things meet: it contains a patientId, a medicineId and then anything else unique about that patient getting that medicine (i.e. Dr. name, dosage, date started etc). Personally I would make each row in the patient-medicine table also have its own unique ID, separate from the combination of the patientId and medicineId (if your primary key is PatientId+MedicineId, what are you going to do when the same patient goes back on the same medicine at a different time?). Each record should have its own unique id in my way of thinking.
There would be foreign keys between the tables to enforce this relationship: i.e. you can't add a row to the patient-medicine table unless the patientId exists in the patient table and the medicineId exists in the medicine table; and, equally important, they prevent you from deleting rows from tables where there are dependent records in other tables. If you take the time to set up all those foreign keys (relationships), it will be a breeze in EF to navigate the related records.
It is without a doubt more complicated than this, but that is the basic idea of a relational design.
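The three tables described above can be sketched like this, using Python's built-in sqlite3 module rather than Entity Framework. Table names, column names and the sample rows are placeholders made up for the example. It shows the surrogate key on the patient-medicine table and the foreign keys rejecting orphan rows and blocked deletes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE medication (
    medicine_id   INTEGER PRIMARY KEY,
    medicine_name TEXT NOT NULL UNIQUE      -- one row per unique med
);
CREATE TABLE patient (
    patient_id   INTEGER PRIMARY KEY,
    patient_name TEXT NOT NULL
);
CREATE TABLE patient_medication (
    patient_medication_id INTEGER PRIMARY KEY,  -- own id, not (patient, med)
    patient_id   INTEGER NOT NULL REFERENCES patient(patient_id),
    medicine_id  INTEGER NOT NULL REFERENCES medication(medicine_id),
    dosage       TEXT NOT NULL,
    date_started TEXT NOT NULL
);
""")
conn.execute("INSERT INTO medication VALUES (1, 'Med A')")
conn.execute("INSERT INTO patient VALUES (1, 'Patient One')")

INSERT_PM = ("INSERT INTO patient_medication "
             "(patient_id, medicine_id, dosage, date_started) VALUES (?, ?, ?, ?)")
# The same patient goes on the same med twice; the surrogate key allows it.
conn.execute(INSERT_PM, (1, 1, "250 mg", "2020-01-01"))
conn.execute(INSERT_PM, (1, 1, "500 mg", "2021-06-01"))


def prescription_count(patient_id, medicine_id):
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM patient_medication "
        "WHERE patient_id = ? AND medicine_id = ?",
        (patient_id, medicine_id)).fetchone()
    return count


def is_rejected(sql, params=()):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False
```

For instance, `is_rejected(INSERT_PM, (1, 999, "1 mg", "2022-01-01"))` is true because medicine 999 does not exist, and deleting medication 1 is rejected while prescriptions still reference it.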
I have read Eric Evans' Domain Driven Design book and I have been trying to apply some of the concepts.
In his book, Eric talks about aggregates and how aggregate roots should have a unique global id whereas aggregate members should have a unique local id. I have been trying to apply that concept to my database tables and I'm running into some issues.
I have two tables in my PostgreSQL database: facilities and employees where employees can be assigned to a single facility.
In the past, I would lay out the employees table as follows:
CREATE TABLE "employees" (
"employeeid" serial NOT NULL PRIMARY KEY,
"facilityid" integer NOT NULL,
...
FOREIGN KEY ("facilityid") REFERENCES "facilities" ("facilityid")
);
where employeeid is a globally unique id. I would then add code in the backend for access control validation, preventing users of one facility from accessing rows pertaining to other facilities. I have a feeling this might not be the safest way to do it.
What I am now considering is this layout:
CREATE TABLE "employees" (
"employeeid" integer NOT NULL,
"facilityid" integer NOT NULL,
...
PRIMARY KEY ("employeeid", "facilityid"),
FOREIGN KEY ("facilityid") REFERENCES "facilities" ("facilityid")
);
where employeeid is unique (locally) for a given facilityid but needs to be paired with a facilityid to be unique globally.
Concretely, this is what I am looking for:
Employee A (employeeid: 1, facilityid: 1)
Employee B (employeeid: 2, facilityid: 1)
Employee C (employeeid: 1, facilityid: 2)
where A, B and C are 3 distinct employees and...
adding an employee D to facility 1 would give him the keys (employeeid : 3, facilityid: 1)
adding an employee E to facility 2 would give him the keys (employeeid : 2, facilityid: 2)
I see two ways of achieving this:
I could use triggers or stored procedures to automatically generate new employeeids and store the last ids for every facility in another table for quicker access but I am concerned about concurrency issues and ending up with 2 employees from the same facility with the same id.
I could possibly create a new sequence for each facility to manage the employeeids but I fear ending up with thousands of sequences to manage and with procedures to delete those sequences in case a facility is deleted. Is there anything wrong with this? It seems heavy to me.
Which approach should I take? Is there anything I'm missing out on?
I am inferring from your question that you will be running a single database for all facilities, or at least that if you have a local database as the "master" for each facility that the data will need to be combined in a central database without collisions.
I would make the facilityid the high order part of the primary key. You could probably assign new employee numbers using a simple SELECT max(employeeid) + 1 ... WHERE facilityid = n approach, since adding employees to any one facility is presumably not something that happens hundreds of times per second from multiple concurrent sources. There is some chance that this could generate an occasional serialization failure, but it is my opinion that any database access should be through a framework which recognizes those and automatically retries the transaction.
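A minimal sketch of the max + 1 approach with automatic retry, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The `add_employee` helper, the `name` column and the retry count are assumptions for the example. In PostgreSQL the error to retry on would be a serialization failure or a unique violation; here SQLite's `IntegrityError` plays that role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employees (
    facilityid INTEGER NOT NULL,   -- high-order part of the key
    employeeid INTEGER NOT NULL,   -- unique only within a facility
    name       TEXT NOT NULL,
    PRIMARY KEY (facilityid, employeeid)
)""")


def add_employee(facilityid, name, max_retries=3):
    """Insert an employee with the next free id for that facility."""
    for _ in range(max_retries):
        try:
            with conn:  # commits on success, rolls back on error
                (next_id,) = conn.execute(
                    "SELECT COALESCE(MAX(employeeid), 0) + 1 "
                    "FROM employees WHERE facilityid = ?",
                    (facilityid,)).fetchone()
                conn.execute("INSERT INTO employees VALUES (?, ?, ?)",
                             (facilityid, next_id, name))
            return next_id
        except sqlite3.IntegrityError:
            # Another writer took the same id first; recompute and try again.
            continue
    raise RuntimeError("could not allocate an employee id")
```

The composite primary key is what makes this safe: two concurrent writers can compute the same id, but only one insert can succeed, and the loser simply retries.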
I guess you overstressed the aggregate root concept here. In my understanding of modelling an employee (that depends on your context) an employee is almost always an aggregate root possibly referenced by another aggregate root facility.
Both employee and facility almost always have natural keys. For the employee this is typically some employee id (printed on employee identification badges, or at least maintained in the human resources software system), and facilities have natural keys too, almost always containing a location part and a number, like "MUC-1" for facility 1 located in Munich. But that all depends on your context. If employee and facility have such natural keys, your database model should be quite clear.
I've been wondering how facebook manages the database design for all the different things that you can "like". If there is only one thing to like, this is simple, just a foreign key to what you like and a foreign key to who you are.
But there must be hundreds of different tables that you can "like" on facebook. How do they store the likes?
If you want to represent this sort of structure in a relational database, then you need to use a hierarchy normally referred to as table inheritance. In table inheritance, you have a single table that defines a parent type, then child tables whose primary keys are also foreign keys back to the parent.
Using the Facebook example, you might have something like this:
User
------------
UserId (PK)
Item
-------------
ItemId (PK)
ItemType (discriminator column)
OwnerId (FK to User)
Status
------------
ItemId (PK, FK to Item)
StatusText
RelationshipUpdate
------------------
ItemId (PK, FK to Item)
RelationshipStatus
RelationTo (FK to User)
Like
------------
OwnerId (FK to User)
ItemId (FK to Item)
Compound PK of OwnerId, ItemId
In the interest of completeness, it's worth noting that Facebook doesn't use an RDBMS for this sort of thing. They have opted for a NoSQL solution for this sort of storage. However, this is one way of storing such loosely-coupled information within an RDBMS.
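The table-inheritance layout above can be sketched with Python's built-in sqlite3 module. The plural table names and the helper functions are choices made for the example (the plurals avoid clashing with the SQL keyword LIKE); only the Status child type is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY);
CREATE TABLE items (
    item_id   INTEGER PRIMARY KEY,
    item_type TEXT NOT NULL,                      -- discriminator column
    owner_id  INTEGER NOT NULL REFERENCES users(user_id)
);
CREATE TABLE statuses (
    item_id     INTEGER PRIMARY KEY REFERENCES items(item_id),  -- PK and FK
    status_text TEXT NOT NULL
);
CREATE TABLE likes (
    owner_id INTEGER NOT NULL REFERENCES users(user_id),
    item_id  INTEGER NOT NULL REFERENCES items(item_id),
    PRIMARY KEY (owner_id, item_id)               -- one like per user and item
);
""")
conn.executemany("INSERT INTO users VALUES (?)", [(1,), (2,)])


def post_status(owner_id, text):
    """Create the parent items row, then the child statuses row."""
    with conn:
        cur = conn.execute(
            "INSERT INTO items (item_type, owner_id) VALUES ('status', ?)",
            (owner_id,))
        item_id = cur.lastrowid
        conn.execute("INSERT INTO statuses VALUES (?, ?)", (item_id, text))
    return item_id


def like(user_id, item_id):
    """Like any item, whatever its concrete type."""
    with conn:
        conn.execute("INSERT INTO likes VALUES (?, ?)", (user_id, item_id))


def like_count(item_id):
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM likes WHERE item_id = ?", (item_id,)).fetchone()
    return count
```

Because likes reference the parent table only, adding a new likeable type means adding one child table; the likes table never changes.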
Facebook does not have traditional foreign keys and such, as they don't use relational databases for most of their data storage. Simply, they don't cut it for that.
However, they use several NoSQL-type data stores. The "Like" is most likely attributed through a service, probably set up in an SOA-style manner throughout their infrastructure. This way the "Like" can be attributed to anything they want it to be associated with, with vast scalability and no tightly coupled relational issues to deal with, which is something Facebook can't really afford at the volume they operate.
They could also be using an AOP (Aspect Oriented Programming) style processing mechanism to "attach" a "Like" to anything that may need one at page rendering time, but I get the notion that it is asynchronous processing via JavaScript against an SOA style web service or other delivery mechanism.
Either way, I'd love to hear how they have this setup from an architecture perspective myself. Considering their volume, even the simple "Like" button becomes a significant implementation of technology.
You can have a table with Id, ForeignId and Type. Type can be anything like Photo, Status, Event, etc. ForeignId would be the id of the record in the table named by Type. This works for both comments and likes. You only need one table for all likes, one for all comments, and the one I described.
Example:
Items
Id | Foreign Id | Type
----+-------------+--------
1 | 322 | Photo
4 | 346 | Status
Likes
Id | User Id | Item Id
----+-------------+--------
1 | 111 | 1
Here, user with Id 111 likes the photo with Id 322.
Note: I assume you are using an RDBMS, but see Adron's answer. Facebook does not use an RDBMS for most of their data.
I'm pretty sure Facebook does not store "like" information in an RDBMS the way some others have suggested. With millions of users and possibly thousands of likes per item, we're looking at thousands of rows to join, which would impact performance.
The best approach here is to append all "likes" to a single row. For example, use a table with a user_like_id column of text datatype, and append the ids of everyone who liked the post to it. In this case, you only query one row and you get everything. This will be a lot faster than joining tables and getting counts.
EDIT: I haven't been here on this site lately and I just discovered this answer has been downvoted. Well, here's an example post with like count and their avatars. This is my design where I just implemented what I'm talking about.
The two components here are 1.) XREF table and 2.) JSON object.
The likes are still stored in an XREF table. But at the same time, the data is appended to a JSON object that is stored in a text column on the post table.
Why did I store the likes info in a text column as JSON? So that there's no need to do db lookups/joins for the likes. If someone unlikes the post, the JSON object is just updated.
Now I don't know why this answer was downvoted by some users here. This answer provides quick data retrieval. This is close to the NoSQL approach, which is how FB accesses data. In this case, there's no need for extra joins/lookups to get the likes info.
And here's the table that holds the likes. It's just a simple XREF mapping between user and item table.
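A minimal sketch of the XREF-plus-JSON idea, using Python's built-in sqlite3 and json modules. The table and column names (`post`, `post_like`, `likes_json`) and the helper functions are assumptions for the example, not the original design. The XREF table stays the source of truth, and the JSON column is rebuilt from it in the same transaction so the two cannot drift apart.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (
    post_id    INTEGER PRIMARY KEY,
    body       TEXT NOT NULL,
    likes_json TEXT NOT NULL DEFAULT '[]'   -- denormalized copy for fast reads
);
CREATE TABLE post_like (                    -- XREF table, source of truth
    post_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    PRIMARY KEY (post_id, user_id)
);
""")
conn.execute("INSERT INTO post (post_id, body) VALUES (1, 'hello')")


def _refresh_cache(post_id):
    """Rebuild the JSON column from the XREF rows."""
    user_ids = [r[0] for r in conn.execute(
        "SELECT user_id FROM post_like WHERE post_id = ? ORDER BY user_id",
        (post_id,))]
    conn.execute("UPDATE post SET likes_json = ? WHERE post_id = ?",
                 (json.dumps(user_ids), post_id))


def like(post_id, user_id):
    with conn:  # XREF write and cache refresh commit together
        conn.execute("INSERT OR IGNORE INTO post_like VALUES (?, ?)",
                     (post_id, user_id))
        _refresh_cache(post_id)


def unlike(post_id, user_id):
    with conn:
        conn.execute("DELETE FROM post_like WHERE post_id = ? AND user_id = ?",
                     (post_id, user_id))
        _refresh_cache(post_id)


def likes_of(post_id):
    """Read path: one row, no join."""
    (cached,) = conn.execute(
        "SELECT likes_json FROM post WHERE post_id = ?", (post_id,)).fetchone()
    return json.loads(cached)
```

The trade-off is that every like rewrites the post row, so this suits posts with a modest number of likes better than very popular ones.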
I'm willing to give MongoDB and CouchDB a serious try. So far I've worked a bit with Mongo, but I'm also intrigued by Couch's RESTful approach.
Having worked for years with relational DBs, I still don't get what is the best way to get some things done with non relational databases.
For example, if I have 1000 car shops and 1000 car types, I want to specify what kind of cars each shop sells. Each car has 100 features. Within a relational database I'd make a middle table to link each car shop with the car types it sells via IDs. What is the NoSQL approach? If every car shop sells 50 car types, it means replicating a huge amount of data if I have to store all the features of all the car types it sells within each car shop!
Any help appreciated.
I can only speak to CouchDB.
The best way to stick your data in the db is to not normalize it at all beyond converting it to JSON. If that data is "cars" then stick all the data about every car in the database.
You then use map/reduce to create a normalized index of the data. So, if you want an index of every car, sorted first by shop, then by car-type you would emit each car with an index of [shop, car-type].
Map reduce seems a little scary at first, but you don't need to understand all the complicated stuff or even btrees, all you need to understand is how the key sorting works.
http://wiki.apache.org/couchdb/View_collation
With that alone you can create amazing normalized indexes over differing documents with the map reduce system in CouchDB.
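To make the key-sorting idea concrete, here is a small pure-Python imitation of a view keyed by [shop, car-type]. It does not use CouchDB at all: `map_car`, `build_view`, `query` and the sample documents are hypothetical, and Python's list ordering only approximates CouchDB's collation rules. Real CouchDB map functions are written in JavaScript and call `emit(key, value)`.

```python
docs = [
    {"_id": "c1", "shop": "shop-b", "car_type": "sedan", "features": {"doors": 4}},
    {"_id": "c2", "shop": "shop-a", "car_type": "suv", "features": {"doors": 5}},
    {"_id": "c3", "shop": "shop-a", "car_type": "sedan", "features": {"doors": 4}},
]


def map_car(doc):
    """Counterpart of a map function: yield (key, value) pairs per document."""
    if "shop" in doc and "car_type" in doc:
        yield [doc["shop"], doc["car_type"]], doc["_id"]


def build_view(documents):
    """Run the map over every document and keep the rows sorted by key."""
    rows = [row for doc in documents for row in map_car(doc)]
    return sorted(rows, key=lambda row: row[0])


def query(view, startkey, endkey):
    """Return the values whose key lies in [startkey, endkey]."""
    return [value for key, value in view if startkey <= key <= endkey]


view = build_view(docs)
# All cars of shop-a: every key that starts with "shop-a".
shop_a_cars = query(view, ["shop-a"], ["shop-a", "\uffff"])
```

Because compound keys sort element by element, a range over the first element returns one shop's cars already grouped by car type, which is the whole trick behind using a view as an index.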
In MongoDB, a commonly used approach is to store a list of the _ids of the car types in each car shop. So there is no separate join table, but you are still basically doing a client-side join.
Embedded documents become more relevant for cases that aren't many-to-many like this.
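The list-of-_ids approach can be sketched with plain Python dictionaries standing in for the two collections, so the example runs without a MongoDB server. The collection contents, the `car_type_ids` field name and the helper functions are assumptions for the example. Against a real server, the second step would typically be one query with an `$in` filter on `_id`.

```python
# Stand-ins for two collections, keyed by _id.
car_types = {
    "t1": {"_id": "t1", "name": "sedan", "features": {"doors": 4}},
    "t2": {"_id": "t2", "name": "suv", "features": {"doors": 5}},
}
shops = {
    "s1": {"_id": "s1", "name": "Shop One", "car_type_ids": ["t1", "t2"]},
    "s2": {"_id": "s2", "name": "Shop Two", "car_type_ids": ["t2"]},
}


def car_types_sold_by(shop_id):
    """Client-side join: read the shop, then fetch the referenced car types."""
    shop = shops[shop_id]                                # first lookup
    return [car_types[t] for t in shop["car_type_ids"]]  # second lookup


def shops_selling(car_type_id):
    """Reverse direction: find shops whose id list contains the car type."""
    return [s["_id"] for s in shops.values() if car_type_id in s["car_type_ids"]]
```

Each car type, with its 100 features, is stored once; a shop document carries only the ids, so 50 car types per shop cost 50 short references rather than 50 copies.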
Coming from a HBase/BigTable point of view, typically you would completely denormalize your data, and use a "list" field, or multidimensional map column (see this link for a better description).
The word "column" is another loaded word like "table" and "base" which carries the emotional baggage of years of RDBMS experience. Instead, I find it easier to think about this like a multidimensional map - a map of maps if you will.
For your example of a many-to-many relationship, you can still create two tables, and use your multidimensional map column to hold the relationship between the tables.
See the FAQ question 20 in the Hadoop/HBase FAQ:
Q [Michael Dagaev]: How would you design an Hbase table for many-to-many association between two entities, for example Student and Course?
I would define two tables:
Student: student id, student data (name, address, ...), courses (use course ids as column qualifiers here)
Course: course id, course data (name, syllabus, ...), students (use student ids as column qualifiers here)
Does it make sense?
A [Jonathan Gray]: Your design does make sense. As you said, you'd probably have two column-families in each of the Student and Course tables. One for the data, another with a column per student or course. For example, a student row might look like:
Student: id/row/key = 1001
data:name = Student Name
data:address = 123 ABC St
courses:2001 = (If you need more information about this association, for example, if they are on the waiting list)
courses:2002 = ...
This schema gives you fast access to the queries, show all classes for a student (student table, courses family), or all students for a class (courses table, students family).
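The "map of maps" view of the Student/Course design can be illustrated with nested Python dictionaries: row key, then column family, then column qualifier, then value. This is only an in-memory picture of the layout, not HBase client code, and the `enroll` helper and sample ids are assumptions for the example.

```python
# table -> row key -> column family -> column qualifier -> value
students = {}
courses = {}


def enroll(student_id, course_id, note=""):
    """Record the association on both sides, as the FAQ answer describes.

    The cell value can carry extra information about the association,
    for example a waiting-list flag.
    """
    student_row = students.setdefault(student_id, {"data": {}, "courses": {}})
    course_row = courses.setdefault(course_id, {"data": {}, "students": {}})
    student_row["courses"][course_id] = note
    course_row["students"][student_id] = note


def classes_for(student_id):
    """All classes for a student: one row read, courses family."""
    return sorted(students[student_id]["courses"])


def students_in(course_id):
    """All students for a class: one row read, students family."""
    return sorted(courses[course_id]["students"])


enroll("1001", "2001")
enroll("1001", "2002", note="waiting list")
enroll("1002", "2001")
students["1001"]["data"]["name"] = "Student Name"
```

The relationship is written twice, once per direction, which is the denormalization cost that buys a single-row read for either query.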
In a relational database, the concept is very clear: one table for cars with columns like "car_id, car_type, car_name, car_price", and another table for shops with columns "shop_id, car_id, shop_name, sale_count"; the "car_id" column links the two tables together for data operations. All the columns must be well defined when the database is created.
NoSQL database systems do not require you to pre-define these columns and tables. You just construct your records in a certain format, say JSON, like:
{"car": {"id": 1, "type": "auto", "name": "ford"}, "shop": {"id": 100, "name": "some_shop"}},
{"car": {"id": 2, "type": "auto", "name": "benz"}, "shop": {"id": 105, "name": "my_shop"}},
.....
After your system has gone online, you may find some flaws in the design of your db structure; say you want to add an "employee" field to "shop" for future records. Your new records then look like:
{"car": {"id": 3, "type": "auto", "name": "RR"}, "shop": {"id": 108, "name": "other_shop", "employee": "Bill"}},
NoSQL systems allow you to do this without any schema change, whereas a relational database would first need its schema altered (for example by adding a column to the shops table).
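A small sketch of what this means for the code that reads such records, using Python's built-in json module. The `shop_employee` helper is hypothetical. Older records simply lack the new field, so the reader has to tolerate its absence.

```python
import json

records = [
    '{"car": {"id": 1, "type": "auto", "name": "ford"},'
    ' "shop": {"id": 100, "name": "some_shop"}}',
    '{"car": {"id": 3, "type": "auto", "name": "RR"},'
    ' "shop": {"id": 108, "name": "other_shop", "employee": "Bill"}}',
]


def shop_employee(raw):
    """Return the shop's employee, or None for records written before the field existed."""
    return json.loads(raw)["shop"].get("employee")


employees = [shop_employee(r) for r in records]
```

The flexibility moves the schema into the application: nothing stops a record from omitting or misspelling a field, so the reading code carries the checks a relational schema would otherwise enforce.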