Aggregate Root Choice (for all Football/Soccer fans) - entity-framework

I have been reading StackOverflow for weeks, but I still could not decide whether my DDD Aggregate Root choice is correct. Long story short -- here are the entities. It is about the football/soccer domain:
League, Team and Match
Each Team can participate in one or more Leagues by playing Matches (e.g. English Premier League, UEFA Champions League). Each team has HomeMatches and AwayMatches in a certain League. Each Match has a League, HomeTeam and AwayTeam.
Each League has many Matches.
I think I need to have two repositories -- LeagueRepository where I can get all the matches for a certain league for a certain period. Through this repository I will automatically update the database when a round of matches has been played and I will record the results accordingly.
I also need a TeamRepository, where I can get all the Matches for a certain team in different leagues for different periods of time. This is for statistical purposes, e.g. give me all Liverpool Home Matches in the English Premier League for the last 10 years. Yeah, you guessed right -- it is about betting chances and odds calculations :)
Long story short -- my Domain is the Football/Soccer World. Those of you who follow the sport know those details and what a League, Team and Match is.
Is it OK to have two separate aggregate roots -- League and Team? I can reach a given match through either one of them. Is this OK with the DDD design?
Or should I introduce a new Entity called Sport and make it the sole aggregate root. Then a Sport will have many Leagues and many Teams.
I am using EF code-first approach and I am trying to identify my Repositories and aggregate roots. If you were designing this database, how would you structure those three entities -- League, Team and Match. Of course we are over-simplifying things here.
All your thoughts and comments will be greatly appreciated.
Thanks in advance.

To me, Sport seems like a wasted concept. If you don't need to model soccer, baseball, basketball, etc., then I imagine the Sport model would be mostly empty and a waste of space. If you think of your program in terms of teams and leagues, then those are the two repositories I would stick with. What advantage do you see in a Sport repository, other than just having one root?
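A minimal sketch of the two-root layout may help. It is written in Python only to show the shape (the question uses EF code-first in C#), and it assumes that Match is owned by the League aggregate and refers to teams by id, that both repositories are in-memory stand-ins for EF-backed ones, and that home_matches is a hypothetical read-only query for the statistics use case. All class and method names are illustrative.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Team:
    # Aggregate root for team data only; it does not own any matches.
    team_id: int
    name: str


@dataclass
class Match:
    # Entity inside the League aggregate; teams are referenced by id, not by object.
    match_id: int
    played_on: date
    home_team_id: int
    away_team_id: int
    home_goals: Optional[int] = None
    away_goals: Optional[int] = None


@dataclass
class League:
    # Aggregate root: matches are added and updated only through the league.
    league_id: int
    name: str
    matches: list = field(default_factory=list)

    def schedule(self, match: Match) -> None:
        self.matches.append(match)

    def record_result(self, match_id: int, home_goals: int, away_goals: int) -> None:
        for m in self.matches:
            if m.match_id == match_id:
                m.home_goals, m.away_goals = home_goals, away_goals
                return
        raise KeyError(f"match {match_id} is not in league {self.league_id}")


class LeagueRepository:
    # In-memory stand-in for an EF-backed repository (hypothetical).
    def __init__(self) -> None:
        self._leagues: dict = {}

    def add(self, league: League) -> None:
        self._leagues[league.league_id] = league

    def get(self, league_id: int) -> League:
        return self._leagues[league_id]

    def matches_between(self, league_id: int, start: date, end: date) -> list:
        return [m for m in self.get(league_id).matches if start <= m.played_on <= end]


class TeamRepository:
    # Loads Team roots; the statistics query reads matches through the League roots.
    def __init__(self, leagues: LeagueRepository) -> None:
        self._teams: dict = {}
        self._leagues = leagues

    def add(self, team: Team) -> None:
        self._teams[team.team_id] = team

    def get(self, team_id: int) -> Team:
        return self._teams[team_id]

    def home_matches(self, team_id: int, league_id: int, since: date) -> list:
        # Read-only query, e.g. "all Liverpool home matches in the EPL since <date>".
        return [m for m in self._leagues.get(league_id).matches
                if m.home_team_id == team_id and m.played_on >= since]


leagues = LeagueRepository()
teams = TeamRepository(leagues)
teams.add(Team(1, "Liverpool"))
teams.add(Team(2, "Everton"))
epl = League(10, "English Premier League")
epl.schedule(Match(100, date(2014, 4, 1), home_team_id=1, away_team_id=2))
epl.schedule(Match(101, date(2014, 9, 1), home_team_id=2, away_team_id=1))
leagues.add(epl)
epl.record_result(100, home_goals=2, away_goals=0)
liverpool_home = teams.home_matches(1, 10, since=date(2004, 1, 1))
```

Keeping Match inside League means results are recorded through a single root, while the per-team statistics come from a query instead of a second copy of the matches stored on Team.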

Three white papers that you should read to better understand how you should go about choosing your aggregates: http://dddcommunity.org/library/vernon_2011/


Practical usage of noSQL [closed]

I’m starting a new web project and have to decide what database to use. I know, the question is very long but please bear with me on this.
I am very familiar with relational databases and have used frameworks like hibernate to get my data from the DB into Objects. But I have no experience with noSQL DBs. I am aware of the concepts of Document, Key-Value, etc. types.
While I do my research one question pops out every time and I don’t know how someone would handle this in noSQL DBs like MongoDB or any other Document-Typed noSQL DB where consistency takes top priority.
For example: let’s assume that we are creating a small shopping management system where customers can buy and sell stuff.
We have:
CUSTOMERs
ORDERs
PRODUCTs
A single CUSTOMER can have multiple ORDERs and an ORDER can have multiple PRODUCTs.
In a traditional RDBMS I would of course have 3 tables.
In the first version of our application, the front end for the customer should display his/her personal data, ORDERs and all the PRODUCTs he or she bought per order. Also which products are available for sale. So I guess in noSQL I would model the CUSTOMER class like this:
{
"id": 993784,
"firstname": "John",
"lastname": "Doe",
"orders": [
{
"id": 3234,
"quantity": 4,
"products": [
{
"id": 378234,
"type": "TV",
"resolution": "1920x1080",
"screenSize":37,
"price": 999
}
]
}
],
"products": [
{
"id": 7932,
"type": "car",
"sold": false,
"horsepower": 90
}
]
}
But later I want to extend my application to have 3 different UIs instead of only the first one:
The CUSTOMER Dashboard where a customer can view all his/her orders.
The PRODUCT Dashboard where a customer can add or remove products in his/her store.
THE SOLD Dashboard where a customer can view all sold PRODUCTs ready for shipping.
One very important thing to consider (the reason why I even bother asking this question): I want to be flexible with the classes like PRODUCT because products can have different properties. For example: A TV has screen size and resolution while a car has horsepower and other properties. And if a user adds a new product, he or she should be able to dynamically add those properties depending on what he/she knows about it.
Now to some practical use cases of two fictional users Jane and John:
Let's say, Jane buys from John. Does that mean I have to create the PRODUCTs two times? One time as a child of Jane's ORDER and another time to stay in the "products" property of John?
Later Jane wants to view all products that are available from any user. Do I have to load every user to query the "products" property to generate a list of all products?
In version 2 of the application I want to enable John to view all outgoing orders (not orders he made but orders from other users who bought stuff from him) instead of viewing all sold products. How would this be done in noSQL? Would I now need to create an "outgoing" array of orders and duplicate them? (an outgoing order of Jane is an incoming order of John)
Some of you may say that noSQL is not right for this use case but isn’t that very common? Especially when we do not know what the future brings? If it does not fit for this use case, what use case would it fit into? Only baby applications (I guess not)? Wasn’t noSQL designed for more complex and flexible data?
Thank you very much for your advice and opinions!
EDIT 1:
Because this question was put on hold for being imprecise:
I made a very clear and simple example. So my question is not a general one about the use of noSQL but about how to handle this specific example. How would an experienced noSQL user handle this use case? How should this data be modeled? A recommendation to simply not use noSQL at all for this use case is also a valid answer to me.
I simply want to know how to use a noSQL database but still be able to manage entities and avoid redundancy.
For example: Are MongoDB's DBRefs/Manual refs a good way to achieve this? Performance issues because of multiple queries? What else to think about? I guess these questions can probably be answered quite well.
There probably isn't the one right answer to your question. But I'll make a start.
While it is technically possible in NoSQL to store some business entity together with all entities that are transitively linked with it (like Customer, Order, Product), it isn't always clever to do so. The traditional reasons for separating entities, namely redundancy and the resulting update and delete anomalies, don't just go away because a different platform is used.
So if you stored the product description with every customer who buys or sells this product, you will get update anomalies. If you have to change the screen size from 37 to 35, you'll have to find all customer records containing this product, which can be quite cumbersome.
Also, building up such a deeply nested structure favors one direction of evaluating it over all other directions. If you put all orders and products into the customer document, this works very well for getting a comprehensive view of a customer: whatever she bought throughout her lifetime. But if you want to query your database by orders (which orders need to be fulfilled tonight?) or products (who ordered product 1234?), you'll have to load tons of data that are of no interest to this query.
Similar questions arise from storing all orders with a customer. Old orders will sometimes still be of interest, so they may not be deleted. But do you want to load lots of orders every time you load the customer?
This doesn't mean not to make use of the complex structuring made possible by a document store. As a rule of thumb, I would suggest: As long as the nested information belongs to the same business entity, put it into one document. If, e.g., the product description has some hierarchic structure, like nested sections consisting of text, pics, and videos, they may all go into one document. But entities with a totally different life cycle, like customers, orders, and suppliers, should be kept separate. Another indicator is references: A product will frequently be referenced as a whole, e.g. when it is ordered by a customer or ordered from a supplier. But the different parts of the product description may possibly never be referenced from the outside.
This rule of thumb wasn't completely precise, and it's not supposed to be. One person's business entity is another person's dumb attribute. Imagine the color of a car: For the car owner, it's just a piece of information describing a car. For the manufacturer, it's a business entity, having an availability, a price, one or more suppliers, a way of handling it, etc.
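To make the rule of thumb concrete for the Jane/John cases in the question, here is a plain-Python sketch in which customers, products and orders are kept separate, with dicts standing in for three MongoDB collections linked by manual id references. The field names seller_id, buyer_id and product_ids and all helper functions are hypothetical.

```python
# Hypothetical stand-ins for three separate MongoDB collections.
customers = {
    1: {"_id": 1, "firstname": "John", "lastname": "Doe"},
    2: {"_id": 2, "firstname": "Jane"},
}
products = {
    # Each product exists exactly once; seller_id is a manual reference to a customer.
    378234: {"_id": 378234, "seller_id": 1, "type": "TV", "resolution": "1920x1080",
             "screenSize": 37, "price": 999, "sold": False},
    7932: {"_id": 7932, "seller_id": 1, "type": "car", "horsepower": 90, "sold": False},
}
orders = {}


def place_order(order_id, buyer_id, product_ids):
    # The order stores product references, not copies of the product documents.
    orders[order_id] = {"_id": order_id, "buyer_id": buyer_id, "product_ids": list(product_ids)}
    for pid in product_ids:
        products[pid]["sold"] = True


def available_products():
    # Reads only the products collection; no customer documents are loaded.
    return [p for p in products.values() if not p["sold"]]


def outgoing_orders(seller_id):
    # John's "outgoing" orders are derived from the buyers' orders, not duplicated.
    return [o for o in orders.values()
            if any(products[pid]["seller_id"] == seller_id for pid in o["product_ids"])]


def order_with_products(order_id):
    # Application-side join: resolve the references when the order is displayed.
    order = orders[order_id]
    return {**order, "products": [products[pid] for pid in order["product_ids"]]}


place_order(3234, buyer_id=2, product_ids=[378234])  # Jane buys John's TV
products[378234]["screenSize"] = 35  # one update, seen by every order that references it
```

With references, the screen-size change from the answer touches one document, "all available products" reads only the products collection, and John's outgoing orders are derived from Jane's order instead of being stored twice. The price is an extra lookup whenever an order is shown together with its products.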
Your question also touches the aspect of dynamically adding attributes. This is often praised as one of the goodies of NoSQL, but it's no free lunch. Let's assume, as you mentioned, that the user may add attributes. That's technically possible, but how will these attributes be processed by the system? There won't be a specific view, nor specific business rules, for those attributes. So the best the system can do is offer some generic mechanism for displaying those attributes that were defined at runtime and never reflected in the program code.
This doesn't mean the feature is useless. Imagine your product description may be complex, as described above. You might build a generic mechanism to display (and edit) descriptions made up of sections, texts, images, etc., and afterwards the users may enter descriptions of unlimited width and depth. But in contrast, imagine your user will add a tiny delivery date attribute to the order. Unless the system knows specifically how to interpret this date, it will just be a dumb piece of information without any effect.
Now imagine not the user, but the developer adds new attributes. She has the opportunity to enhance the code at the same time, e.g. building some functionality around delivery dates. But this means that, although the database doesn't require it on its own, a new release of the software needs to be rolled out to make use of the new information.
The absence of a database schema even makes the programmer's task more complicated. When a relational table has a certain column, you can be sure that each of its records has this column. If you want to make sure that it has a meaningful value, make it NOT NULL, and you can be sure that each record contains a value of the correct data type. Nothing like that is guaranteed by schemaless databases. So, when reading a record, defensive programming is needed to find out which parts are present and whether they have the expected content. The same holds for database maintenance via administrative tools. Adding an attribute and initializing it with a default value is a 2-liner in SQL, or a couple of mouse clicks in pgAdmin. For a schemaless database, you will write a short program on your own to achieve this.
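A short plain-Python sketch of the defensive reading and the hand-written backfill described above, assuming records arrive as dicts, the way a document store driver typically returns them. screen_size_inches and backfill are hypothetical helpers, not part of any database API.

```python
from typing import Optional


def screen_size_inches(product: dict) -> Optional[float]:
    # The field may be missing, null, or stored with an unexpected type.
    value = product.get("screenSize")
    if isinstance(value, bool):  # bool is a subclass of int; treat it as invalid
        return None
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return None
    return None


def backfill(records: list, field: str, default) -> int:
    # Hand-written equivalent of adding a column with a default value:
    # give every record that lacks the field the default and count the changes.
    changed = 0
    for record in records:
        if field not in record:
            record[field] = default
            changed += 1
    return changed


records = [{"screenSize": 37}, {"screenSize": "37"}, {"screenSize": "big"}, {}]
sizes = [screen_size_inches(r) for r in records]  # [37.0, 37.0, None, None]
updated = backfill(records, "sold", False)        # 4
```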
This doesn't mean that I dislike NoSQL databases. But I think the "schemaless" characteristic is sometimes overestimated, and I wouldn't make it the main, or only, reason to employ such a database.

Schema.org - Correct way to set up a football (soccer) club with multiple teams

I have a football club - Deal Community Sports FC, which has two teams - First Team and Reserves.
I began implementing the SportsTeam markup from schema.org for Deal Community Sports, but then ran into a brick wall of confusion when it came to including the two teams as part of the club as a whole.
Should I be marking up Deal Community Sports as an Organization or SportsClub and then including the two teams as members, or is there another, more suitable way to do it? Ideally I don't want to have the club and each team as entirely separate entities, as this does not seem right.
Any suggestions?
For the whole club, you could use one of these types (whichever matches according to your understanding of how the club works): SportsOrganization or the more specific SportsTeam.
For each team, use SportsTeam.
And to relate these entities, you could use one of these properties (whichever is appropriate according to your understanding of the club):
department
member
subOrganization
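One possible JSON-LD markup, built in Python so the structure can be checked: the club as a SportsOrganization, with each team as a SportsTeam linked through subOrganization (one of the three properties listed above; department or member may fit your understanding of the club better). The printed JSON is what would go inside a script element of type application/ld+json. The sport value is an assumption.

```python
import json

# One possible structure: the club as the parent organization, each team linked
# via subOrganization. "department" or "member" could be used instead.
club = {
    "@context": "https://schema.org",
    "@type": "SportsOrganization",
    "name": "Deal Community Sports FC",
    "sport": "Football",  # assumed value
    "subOrganization": [
        {"@type": "SportsTeam", "name": "First Team", "sport": "Football"},
        {"@type": "SportsTeam", "name": "Reserves", "sport": "Football"},
    ],
}

markup = json.dumps(club, indent=2)
print(markup)  # paste inside <script type="application/ld+json"> ... </script>
```

This keeps the club and its teams in one connected graph rather than as unrelated entities.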

MongoDB model design for meteorjs app

I'm more used to a relational database and am having a hard time thinking about how to design my database in mongoDB, and am even more unclear when taking into account some of the special considerations of database design for meteorjs, where I understand you often prefer separate collections over embedded documents/data in order to make better use of some of the benefits you get from collections.
Let's say I want to track students' progress in high school. They need to complete certain required classes each school year in order to progress to the next year (freshman, sophomore, junior, senior), and they can also complete some electives. I need to track when the students complete each requirement or elective. And the requirements may change slightly from year to year, but I need to remember for example that Johnny completed all of the freshman requirements as they existed two years ago.
So I have:
Students
Requirements
Electives
Grades (frosh, etc.)
Years
Mostly, I'm trying to think about how to set up the requirements. In a relational DB, I'd have a table of requirements, with className, grade, and year, and a table of student_requirements, that tracks the students as they complete each requirement. But I'm thinking in MongoDB/meteorjs, I'd have a model for each grade/level that gets stored with a studentID and initially instantiates with false values for each requirement, like:
{
student: [studentID],
class: 'freshman',
year: 2014,
requirements: {
class1: false,
class2: false
}
}
and as the student completes a requirement, it updates like:
{
student: [studentID],
class: 'freshman',
year: 2014,
requirements: {
class1: false,
class2: [completionDateTime]
}
}
So in this way, each student will collect four Requirements documents, which are somewhat dictated by their initial instantiation values. And instead of the actual requirements for each grade/year living in the database, they would essentially live in the code itself.
Some of the actions I would like to be able to support are marking off requirements across a set of students at one time, and showing a grid of users/requirements to see who needs what.
Does this sound reasonable? Or is there a better way to approach this? I'm pretty early in this application and am hoping to avoid painting myself into a corner. Any help or suggestions are appreciated. Thanks! :-)
Currently I'm thinking about my application's data design too. I've read the examples in the MongoDB manual on data model design (docs.mongodb.org/manual/core/data-model-design/) and on embedded one-to-one relationships (docs.mongodb.org/manual/tutorial/model-embedded-one-to-one-relationships-between-documents/).
They say:
In general, use embedded data models when:
you have “contains” relationships between entities.
you have one-to-many relationships between entities. In these relationships the “many” or child documents always appear with or are viewed in the context of the “one” or parent documents.
The normalized approach uses a reference in a document, to another document. Just like in the Meteor.js book. They create a web app which shows posts, and each post has a set of comments. They use two collections, the posts and the comments. When adding a comment it's submitted together with the post_id.
So in your example you have a students collection. And each student has to fulfill requirements? And each student has their own requirements, like a post has its own comments?
Then I would handle it like they did in the book. With two collections. I think that should be the normalized approach, not the embedded.
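A plain-Python sketch of that two-collection idea applied to the question, with lists of dicts standing in for the collections (in Meteor they would be separate Mongo collections). It assumes requirements are stored as data keyed by grade and year, so the 2014 freshman requirements remain on record after later years change, and that each completion is its own document. The field names and the helpers mark_complete and grid are hypothetical; class1/class2 come from the question.

```python
from datetime import datetime

# Hypothetical stand-ins for the collections. Requirements are data keyed by
# grade and year, so the 2014 freshman list stays on record if later years change.
requirements = [
    {"_id": "fr-2014-1", "grade": "freshman", "year": 2014, "className": "class1"},
    {"_id": "fr-2014-2", "grade": "freshman", "year": 2014, "className": "class2"},
]
students = [{"_id": "johnny"}, {"_id": "sue"}]
completions = []  # one document per (student, requirement) once it is completed


def mark_complete(student_ids, requirement_id, when):
    # Mark one requirement off for a whole set of students at once.
    for sid in student_ids:
        completions.append({"studentId": sid, "requirementId": requirement_id,
                            "completedAt": when})


def grid(grade, year):
    # {student: {className: completion time or None}} for a "who needs what" view.
    reqs = [r for r in requirements if r["grade"] == grade and r["year"] == year]
    done = {(c["studentId"], c["requirementId"]): c["completedAt"] for c in completions}
    return {s["_id"]: {r["className"]: done.get((s["_id"], r["_id"])) for r in reqs}
            for s in students}


mark_complete(["johnny", "sue"], "fr-2014-1", datetime(2014, 10, 1))
mark_complete(["johnny"], "fr-2014-2", datetime(2015, 3, 2))
table = grid("freshman", 2014)
```

Compared with pre-instantiated per-student documents, the requirement lists live in the database rather than in code, and both the bulk mark-off and the grid are queries over the completions.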
I'm a little confused myself, so maybe you can tell me, if my answer makes sense.
Maybe you can help me too? I'm trying to make an app that manages a flea market.
Users of the app create events.
The creator of the event invites users to be cashiers for that event.
Users create lists of stuff they want to sell. There is a max. number of lists/sellers per event and a max. number of positions on a list (25/50).
Cashiers type in the positions of those lists at the event, to track what is sold.
Event creators make billings for the sold stuff of each list, to hand out the money afterwards.
I'm confused how to set up the data design. I need Events and Lists. Do I use the normalized approach, or the embedded one?
Edit:
After reading percona.com/blog/2013/08/01/schema-design-in-mongodb-vs-schema-design-in-mysql/ I found the following advice:
If you read people information 99% of the time, having 2 separate collections can be a good solution: it avoids keeping in memory data is almost never used (passport information) and when you need to have all information for a given person, it may be acceptable to do the join in the application.
Same thing if you want to display the name of people on one screen and the passport information on another screen.
But if you want to display all information for a given person, storing everything in the same collection (with embedding or with a flat structure) is likely to be the best solution

Introduction to object databases

I'm trying to understand the idea of noSQL databases, to be more precise, the concept behind the neo4j graph database. I have experience with SQL databases (MySQL, MS SQL), but the limitations of managing hierarchical data made me expand my knowledge. But now I have some questions and I can't find their answers (maybe I don't know what to search for).
Imagine we have a list of the countries in the world. Each country has its GDP every year. Each country has its GDP calculated by different sources - World Bank, their government, CIA etc. What's the best way to organise data in this case?
The simplest thing which came in mind is to have the node (the values are imaginary):
China:
GDPByWorldBank2012: 999,
GDPByCIA2011: 994,
GDPByGovernment2012: 1102,
In relational database, I would split the data in three tables: Countries, Sources and Values, where in Values I would have value of GDP, year, id of the country and id of the source.
Another thing that came to mind is to create nodes CIA and World Bank, but a Government node looks really weird. Even so, the idea is to have relationships (valueOfGDP):
CIA -> valueOfGDP - {year: 2011, value: 994} -> China
World Bank -> valueOfGDP - {year: 2012, value: 999} -> China
This looks pretty weird for me, what is more, what happens when we add the values for all the years from one source? We would have multiple relationships or what?
I'm sorry if my questions are too dumb, and I would be happy if someone explained this to me or showed me what book/article to read.
Thanks in advance. :)
Your questions are very legit and you're not the only one having difficulties to grasp graph modelling at first ;)
It is always easier to start thinking about the questions you wanna answer with your data before modelling it up front.
Let's imagine you wanna retrieve the GDP of year 2012 computed by CIA of all countries.
A simple way to achieve this is to label all country nodes uniformly and give each one a name property holding the country name.
Moreover, CIA, World Bank and Government are all "sources" in this domain, so let's label them uniformly as well.
For instance, that could give something like:
(:Organization {name: "CIA"})-[:HAS_COMPUTED_GDP {year: 2011, value: 994}]->(:Country {name: "China"})
With the Cypher Query Language, following this model, you would execute the following query:
MATCH (cia:Organization {name: "CIA"})-[gdp:HAS_COMPUTED_GDP]->(country:Country)
WHERE gdp.year = 2012
RETURN cia, country, gdp
In this query, I look CIA up by its name as a starting point (rather than by node ID, which is an internal technical notion that shouldn't be used), match the relevant subgraph, and finally return CIA, the GDP relationships and their linked countries matching the input constraints. An index on the name property of Organization nodes keeps that starting lookup fast.
Although Neo4J is totally schemaless, this does not mean you should necessarily have a totally flexible data model. Having a little structure will always help to make your queries or traversals easier to read.
If you're not familiar with Cypher Query Language (which is not the only way to read or write data into the graph), have a look at the excellent documentation of Neo4J (Cypher: http://docs.neo4j.org/chunked/stable/cypher-query-lang.html, complete: http://docs.neo4j.org/chunked/stable/index.html) and try some queries there: http://console.neo4j.org/!
And to answer your second question, if you wanna add another year of GDP computations, this will just boil down to adding new relationship "HAS_COMPUTED_GDP" between the organizations and the countries, no more no less.
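For readers without a Neo4j instance at hand, here is a plain-Python sketch of the same shape (this is not Neo4j's API). Each GDP figure is a relationship with year and value properties between a source and a country, so adding another year or source means appending one more relationship. The figures are the imaginary ones from the question; the Relationship class and the gdp_by helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    # Plain-Python stand-in for (source)-[:HAS_COMPUTED_GDP {year, value}]->(country).
    source: str
    rel_type: str
    country: str
    year: int
    value: float


graph = [
    Relationship("CIA", "HAS_COMPUTED_GDP", "China", 2011, 994),
    Relationship("World Bank", "HAS_COMPUTED_GDP", "China", 2012, 999),
]


def gdp_by(source, year):
    # Same filter as the Cypher query: one source, one year, all countries.
    return [(r.country, r.value) for r in graph
            if r.source == source and r.rel_type == "HAS_COMPUTED_GDP" and r.year == year]


# Another source or year is just one more relationship; no node changes.
graph.append(Relationship("Government", "HAS_COMPUTED_GDP", "China", 2012, 1102))
```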
Hope it helps :)

Am I violating any NF rule in my database design?

Good day!
I am a newbie at creating databases... I need to create a DB for my recruitment web application.
My database schema is as follows:
NOTE: I included the applicant_id on other tables... e.g. exam, interview, exam type.
Am I violating any normalization rule? If I do, what do you recommend to improve my design? Thank you
Overall looks good. A few minor points to consider:
Interviewer is also a person. If it's stored as free text, you will need program logic to prevent different spellings / misspellings of the same name.
The longest real life email address I've seen was 62 characters.
In exam you use the reserved word date for a column name
(subjective) I would rename applicant_date to applied_at
I don't see a postal / zip code for the applicant
All result columns are VARCHAR(4). If they use the same values, can they be normalized?
Birthdate is better to store than age. You don't want to schedule someone for an interview on their birthday (or, if you're cruel by nature, you do want that :) ). Age can be derived from it and will always be correct.
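A small Python sketch of deriving age, and the birthday check, from a stored birthdate. age_on and is_birthday are hypothetical helper names.

```python
from datetime import date


def age_on(birthdate: date, on: date) -> int:
    # Full years elapsed: subtract one if the birthday has not come round yet that year.
    had_birthday = (on.month, on.day) >= (birthdate.month, birthdate.day)
    return on.year - birthdate.year - (0 if had_birthday else 1)


def is_birthday(birthdate: date, on: date) -> bool:
    # Useful for (not) scheduling an interview on the applicant's birthday.
    return (on.month, on.day) == (birthdate.month, birthdate.day)


born = date(1990, 6, 15)
print(age_on(born, date(2014, 6, 14)))  # 23
print(age_on(born, date(2014, 6, 15)))  # 24
```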
EDIT:
Given that result is PASS or FAIL, simply declare the field a boolean and name it 'passed'. A lot faster.
One area where I could see a potential problem is the Interviewer being integrated in interview. Also I would like to point at the source channel in applicant, which could potentially get blobbed (depending on what you're going to store in there).
You don't seem to be violating any normalization rules at first glance. It's not clear from your schema design, however, that applicant_id references the applicant table. Make sure you declare it as a foreign key that references the applicant table when actually implementing the schema.
Not to make any assumptions on your data, but can the result of a screening be stored in 4 characters?
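A minimal SQLite sketch (via Python's sqlite3) of declaring applicant_id as a foreign key. Since the full schema isn't shown, it assumes cut-down applicant and exam tables; the exam_date and passed columns follow the earlier suggestions about the keyword date and a boolean result. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys unless this is on
conn.executescript("""
CREATE TABLE applicant (
    applicant_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE exam (
    exam_id      INTEGER PRIMARY KEY,
    applicant_id INTEGER NOT NULL REFERENCES applicant(applicant_id),
    exam_date    TEXT NOT NULL,                             -- avoids the keyword DATE
    passed       INTEGER NOT NULL CHECK (passed IN (0, 1))  -- boolean stored as 0/1
);
""")
conn.execute("INSERT INTO applicant (applicant_id, name) VALUES (1, 'Ann')")
conn.execute("INSERT INTO exam (applicant_id, exam_date, passed) VALUES (1, '2014-05-01', 1)")

try:
    # No applicant 99 exists, so the foreign key rejects this row.
    conn.execute("INSERT INTO exam (applicant_id, exam_date, passed) VALUES (99, '2014-05-01', 0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```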
Age and gender are generally illegal questions to ask in interviews, so you may not want to record such things. You might want a separate interviewer table. You also might want a separate table that stores qualifications, so you can search for people you have interviewed with C# knowledge when the next opening comes up. I'd probably do something like a Qualifications table that is the lookup for the quals you want to add to the Applicant Qualification table. Then you'd need the qualification id, applicantId, years and skill level in the Applicant Qualification table.
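A SQLite sketch of the qualifications lookup and junction table described above, with a query for "everyone we've interviewed who knows C#". Table and column names are assumptions derived from the answer's description, skill_level is left as a plain integer scale, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE applicant (applicant_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Lookup table: one row per qualification you care about.
CREATE TABLE qualification (qualification_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Junction table: which applicant has which qualification, and how much of it.
CREATE TABLE applicant_qualification (
    applicant_id     INTEGER NOT NULL REFERENCES applicant(applicant_id),
    qualification_id INTEGER NOT NULL REFERENCES qualification(qualification_id),
    years            INTEGER,
    skill_level      INTEGER,
    PRIMARY KEY (applicant_id, qualification_id)
);
INSERT INTO applicant VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO qualification VALUES (1, 'C#'), (2, 'SQL');
INSERT INTO applicant_qualification VALUES (1, 1, 5, 4), (2, 2, 3, 3);
""")


def applicants_with(qualification, min_years=0):
    # Everyone holding the qualification with at least min_years of experience.
    rows = conn.execute("""
        SELECT a.name, aq.years
          FROM applicant a
          JOIN applicant_qualification aq ON aq.applicant_id = a.applicant_id
          JOIN qualification q ON q.qualification_id = aq.qualification_id
         WHERE q.name = ? AND aq.years >= ?
         ORDER BY aq.years DESC""", (qualification, min_years))
    return rows.fetchall()


csharp = applicants_with("C#")
```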
I notice results is a varchar 4 field; I assume you are planning to put Pass/fail in it. I would consider having a numeric score as well. The guy who got 80% of the questions right passed, but the guy who got 100% of them right might be the better candidate. In fact, for interviews I might have interview questions and results tables. Then you can record the score and any comments about each question, which can help later in the evaluation of a lot of candidates. We did this manually in paper spreadsheets once when we were interviewing several hundred people (we had over a hundred openings at the time and this was way before personal computers) and found it most helpful to be able to compare answers to questions. It's hard to remember 200 people you interviewed and who said what. It might help later, when you have a new opening, to find the people who were strong on the questions most pertinent to the new job but might not have been given a job at the time of the interview (5 excellent candidates, 1 job for instance).
I might also consider a field to mark a candidate as permanently unacceptable for hiring for some reason, such as he committed a felony, he lied on the resume and you caught him, or he was just totally clueless in the interview. This makes it easy to prevent this person from being considered repeatedly.
I think that your DB structure has a lot of limitations for future use. For example, you can't even store a description of the exam, because this table stores only the score and the exam date. It may be that this kind of information is already stored in another system and you only have to design the result container. But even then, the exam, screening and interview are just forms of test; that's why the information about them should be stored in one table and distinguished by a type ID. If you decide on this approach, you have to create another table to store the results.
So the definition of that should look more like this:
TEST
    TEST_ID
    TEST_TYPE_ID ref TEST_TYPE - the table that defines the test type
    TEST_REQUIRED_SCORE - the score that must be reached to pass the exam
    ... - many other properties of TEST, like duration, expiry date, active/inactive, etc.
APPLICANT_RESULTS
    APPLICANT_ID ref APPLICANT
    TEST_ID ref TEST
    TEST_DATE - the day of the exam
    TEST_START - the time when the test started
    TEST_FINISH - the time when the test ended
    APPLICANT_RESULT - the applicant's result on the test
This kind of structure is more flexible and gives an easy way to specify the requirements between tests, in a table like this:
TEST_REQUIREMENTS - the table that specifies the test hierarchy and limitations
    TEST_ID ref TEST
    REQUIRED_TEST ref TEST
    ORDER - the order of exams
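The TEST / APPLICANT_RESULTS / TEST_REQUIREMENTS design can be sketched in SQLite as follows, together with one query that the requirements table makes possible: which tests an applicant may take because every prerequisite has been passed. Assumptions: a result counts as passed when it is at least TEST_REQUIRED_SCORE, the ORDER column is renamed test_order because ORDER is a reserved word, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE applicant (applicant_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE test_type (test_type_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE test (
    test_id             INTEGER PRIMARY KEY,
    test_type_id        INTEGER NOT NULL REFERENCES test_type(test_type_id),
    test_required_score INTEGER NOT NULL
);
CREATE TABLE applicant_results (
    applicant_id     INTEGER NOT NULL REFERENCES applicant(applicant_id),
    test_id          INTEGER NOT NULL REFERENCES test(test_id),
    test_date        TEXT,
    test_start       TEXT,
    test_finish      TEXT,
    applicant_result INTEGER
);
CREATE TABLE test_requirements (
    test_id          INTEGER NOT NULL REFERENCES test(test_id),
    required_test_id INTEGER NOT NULL REFERENCES test(test_id),
    test_order       INTEGER,  -- ORDER is a reserved word, hence the rename
    PRIMARY KEY (test_id, required_test_id)
);
-- Invented sample data: screen -> exam -> interview.
INSERT INTO applicant VALUES (1, 'Ann');
INSERT INTO test_type VALUES (1, 'screen'), (2, 'exam'), (3, 'interview');
INSERT INTO test VALUES (1, 1, 50), (2, 2, 70), (3, 3, 60);
INSERT INTO test_requirements VALUES (2, 1, 1), (3, 2, 2);
INSERT INTO applicant_results (applicant_id, test_id, applicant_result) VALUES (1, 1, 80);
""")

# Tests whose prerequisites the applicant has all passed
# (tests already taken are not excluded here).
ELIGIBLE_SQL = """
SELECT t.test_id
  FROM test t
 WHERE NOT EXISTS (
       SELECT 1
         FROM test_requirements r
        WHERE r.test_id = t.test_id
          AND NOT EXISTS (
              SELECT 1
                FROM applicant_results ar
                JOIN test rt ON rt.test_id = ar.test_id
               WHERE ar.applicant_id = ?
                 AND ar.test_id = r.required_test_id
                 AND ar.applicant_result >= rt.test_required_score))
 ORDER BY t.test_id
"""
eligible = [row[0] for row in conn.execute(ELIGIBLE_SQL, (1,))]
```

With the sample data, the applicant has passed the screen (80 against 50), so the screen and the exam come back as eligible, while the interview stays blocked until the exam is passed.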
Another scenario is that in the future your employer will want to switch to an e-exam system. In that case, the only things you will need are:
Create a table that will store the question definitions (one question can be used in an exam, screening or interview).
Create a table that will store the question answers.
Create a table that will store which questions belong to each test.
Create a table that stores the applicant's answer to each question.
A trigger that will update the overall score of the test.
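A rough SQLite sketch of the last steps: per-question answers with awarded points, and a trigger that keeps the overall score in applicant_results up to date. Table names, the points column and the sample rows are hypothetical. A real system would also need UPDATE and DELETE triggers (or a view instead of a stored total) if answers can be re-marked or removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question (question_id INTEGER PRIMARY KEY, question_text TEXT NOT NULL);
CREATE TABLE test_question (            -- which questions make up which test
    test_id     INTEGER NOT NULL,
    question_id INTEGER NOT NULL REFERENCES question(question_id),
    max_points  INTEGER NOT NULL,
    PRIMARY KEY (test_id, question_id)
);
CREATE TABLE applicant_results (        -- overall score per applicant and test
    applicant_id     INTEGER NOT NULL,
    test_id          INTEGER NOT NULL,
    applicant_result INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (applicant_id, test_id)
);
CREATE TABLE applicant_answer (         -- one row per answered question
    applicant_id INTEGER NOT NULL,
    test_id      INTEGER NOT NULL,
    question_id  INTEGER NOT NULL,
    answer       TEXT,
    points       INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (applicant_id, test_id, question_id)
);
-- Recompute the overall score whenever an answer is recorded.
CREATE TRIGGER update_overall_score AFTER INSERT ON applicant_answer
BEGIN
    UPDATE applicant_results
       SET applicant_result = (SELECT COALESCE(SUM(a.points), 0)
                                 FROM applicant_answer a
                                WHERE a.applicant_id = NEW.applicant_id
                                  AND a.test_id = NEW.test_id)
     WHERE applicant_id = NEW.applicant_id AND test_id = NEW.test_id;
END;
""")
conn.executescript("""
INSERT INTO question VALUES (1, 'What is a foreign key?'), (2, 'What is 3NF?');
INSERT INTO test_question VALUES (10, 1, 5), (10, 2, 5);
INSERT INTO applicant_results (applicant_id, test_id) VALUES (1, 10);
INSERT INTO applicant_answer VALUES (1, 10, 1, 'A column referencing another key', 4);
INSERT INTO applicant_answer VALUES (1, 10, 2, 'Third normal form', 3);
""")
score = conn.execute(
    "SELECT applicant_result FROM applicant_results WHERE applicant_id = 1 AND test_id = 10"
).fetchone()[0]
```

The applicant_results row has to exist before answers arrive (for example, created when the test starts), because the trigger only updates it.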