Relational database, multiple master detail - rdbms

I am designing a relational database for storing application information.
Each application can contain participants of different types: legal persons and physical persons.
The forms for legal persons and physical persons are different and contain different fields.
I want a common primary key for all participants, so I created three tables:
participants - here I store common information;
participants_phys_dtls - here I store fields of physical person;
participants_lgl_dtls - here I store fields of legal person;
The downside of this approach is that the structure is complicated: to get a participant's information I have to join three tables using left joins.
An alternative solution is to merge these three tables into one (participants).
The downside of that solution is that the table is big and ambiguous.
Please advise which solution to choose and why, or suggest a better solution for this problem.

You should go for the first approach you mentioned in your question, the one that splits the data into three tables. This protects you from update, delete, and insert anomalies.
This is a good read about normalization. Go through this link and you will know how normalization works and how we can normalize our DB Design. Hope it helps :)
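A minimal sketch of the three-table layout, using Python's built-in sqlite3 module as a stand-in for whatever RDBMS is actually in use. Only the three table names come from the question; the column names (kind, address, first_name, company_name and so on) are assumptions for illustration. Each detail table reuses the participant's primary key as its own primary key, which is what gives all participants a common key.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Common fields; all column names here are hypothetical.
CREATE TABLE participants (
    participant_id INTEGER PRIMARY KEY,
    kind           TEXT NOT NULL CHECK (kind IN ('phys', 'lgl')),
    address        TEXT
);
-- One optional detail row per participant, sharing the same key.
CREATE TABLE participants_phys_dtls (
    participant_id INTEGER PRIMARY KEY REFERENCES participants(participant_id),
    first_name     TEXT NOT NULL,
    last_name      TEXT NOT NULL
);
CREATE TABLE participants_lgl_dtls (
    participant_id  INTEGER PRIMARY KEY REFERENCES participants(participant_id),
    company_name    TEXT NOT NULL,
    registration_no TEXT
);
""")

conn.execute("INSERT INTO participants VALUES (1, 'phys', '1 Main St')")
conn.execute("INSERT INTO participants_phys_dtls VALUES (1, 'Jane', 'Doe')")
conn.execute("INSERT INTO participants VALUES (2, 'lgl', '2 Market Sq')")
conn.execute("INSERT INTO participants_lgl_dtls VALUES (2, 'Acme Ltd', 'REG-001')")

# The two left joins the question mentions; columns of the other
# participant type simply come back as NULL (None in Python).
rows = conn.execute("""
    SELECT p.participant_id, p.kind, ph.last_name, lg.company_name
    FROM participants p
    LEFT JOIN participants_phys_dtls ph ON ph.participant_id = p.participant_id
    LEFT JOIN participants_lgl_dtls  lg ON lg.participant_id = p.participant_id
    ORDER BY p.participant_id
""").fetchall()
```

If the joins become tedious to repeat, the same SELECT could be wrapped in a view so that callers query it like a single table.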

Related

PostgreSQL: JSON column or one-to-many table for config options

We currently have a table which stores information about users. Some of the columns hold information such as user ID, name etc., but many other columns (booleans, integers and varchars etc) hold configuration options for each user.
This has over time resulted in the width of the table becoming quite big and I think the time has come to migrate this to something new, so I want to remove all the "option"-related columns to a separate data structure.
The typical way of doing this, from my experience, would be to have a new table which would simply have option_id and option_name, and a second new table which would contain user_id, option_id, option_value, for example.
However, a colleague suggested using the new jsonb column type as an alternative, but I don't know if I like the idea of storing relational data in a non-relational way. From a Java point of view, it's pretty much the same as far as I can tell - it'll just be turned into a POJO and then cached on the object.
I should mention the number of users will be quite low, only going into the thousands, and number of columns could and will go into the hundreds.
Does anyone have advice on the best way forward here?
Technically, you have already de-normalized your database structure by adding columns to a table that are irrelevant to some of the entities stored therein.
Using JSON is just another way to de-normalize, cramming a bunch of values into a single row-column field. The excellent binary support for JSON in Postgres (the jsonb data type) then lets you index elements within those JSON documents, as a way to quickly access those embedded values. This is quite screwy from a relational point of view, but is handy for some situations.
Either approach is commonly done for this kind of problem, and is not necessarily bad. In general, de-normalizing is often a pay-now-or-pay-later kind of solution. But for something like user preferences, there may not be a pay-later penalty, as there often is with most business-oriented problem domains.
Nevertheless, you should consider a normalized database structure.
By the way, this kind of table-structure Question might be better asked in the sister site, http://DBA.StackExchange.com/.
I suggest searching Stack Overflow, that DBA site, and the wider Internet for discussions of database design for storing user preferences. Like this.
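For comparison, here is a rough sketch of the normalized option tables described in the question, again using Python's sqlite3 module rather than PostgreSQL so that it runs anywhere. The table names (users, options, user_options), the sample option names and the load_options helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users   (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE options (option_id INTEGER PRIMARY KEY, option_name TEXT NOT NULL UNIQUE);
-- One row per (user, option) pair instead of one column per option.
CREATE TABLE user_options (
    user_id      INTEGER NOT NULL REFERENCES users(user_id),
    option_id    INTEGER NOT NULL REFERENCES options(option_id),
    option_value TEXT,
    PRIMARY KEY (user_id, option_id)
);
INSERT INTO users   VALUES (1, 'alice');
INSERT INTO options VALUES (1, 'dark_mode'), (2, 'page_size');
INSERT INTO user_options VALUES (1, 1, 'true'), (1, 2, '50');
""")


def load_options(conn, user_id):
    """Return one user's options as a dict, ready to cache on an object."""
    rows = conn.execute(
        """SELECT o.option_name, uo.option_value
           FROM user_options uo
           JOIN options o ON o.option_id = uo.option_id
           WHERE uo.user_id = ?""",
        (user_id,),
    )
    return dict(rows.fetchall())


alice_options = load_options(conn, 1)
```

Note the trade-off this sketch makes visible: every value lands in a single text column, so booleans and integers have to be converted by the application, whereas a JSON document keeps those types inside the value itself.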

Is it good practice to have 2 or more tables with the same columns?

I'm creating a web-app that lets users search for restaurants and cafes. Since I currently have no data other than their type to differentiate the two, I have two options on storing the list of eateries.
Use a single table for both restaurants and cafes, and have an enum (text) column stating if an entry is a restaurant or cafe.
Create two separate tables, one for restaurants, and one for cafes.
I will never need to execute a query that collects data from both, so the only thing that matters to me I guess is performance. What would you suggest as the better option for PostgreSQL?
Typical database modeling would lend itself to a single table. The main reason is maintainability. If you have two tables with the same columns and your client decides they want to add a column, say hours of operation, you now have to write two sets of code for creating the column, reading the new column, updating the new column, etc. Also, what if your client wants you to start tracking bars? Now you need a third table with a third set of code. It gets messy quick. It would be better to have two tables: a data table (say Establishment) with most of the columns (name, location, etc.) and a second "type" table (say EstablishmentType) with a row for Restaurant, Cafe, Bar, etc., and of course a foreign key linking the two. This way you can have "X" types and only need to maintain a single set of code.
There are of course exceptions to this rule where you may want separate tables:
Performance due to a HUGE data set. (It depends on your server, but we're talking at least hundreds of thousands of rows before it should matter in Postgres.) If this is the reason, I would suggest table inheritance to keep much of the proper maintainability while speeding up performance.
Cafes and Restaurants have two completely different sets of functionality in your website. If the entirety of your code is saying "if Cafe, do this; if Restaurant, do that", then you already have two sets of code to maintain, with the added hassle of if logic in your code. If that's the case, two separate tables are a much cleaner and more logical option.
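The Establishment / EstablishmentType layout from this answer could be sketched as follows. This uses Python's sqlite3 module as a stand-in for PostgreSQL; the lower-case table names, the column names, the sample rows and the find_by_type helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE establishment_type (
    type_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE establishment (
    establishment_id INTEGER PRIMARY KEY,
    name             TEXT NOT NULL,
    location         TEXT,
    type_id          INTEGER NOT NULL REFERENCES establishment_type(type_id)
);
INSERT INTO establishment_type VALUES (1, 'Restaurant'), (2, 'Cafe');
INSERT INTO establishment VALUES
    (1, 'Trattoria Roma', 'Main St', 1),
    (2, 'Bean There',     'Oak Ave', 2);
""")


def find_by_type(conn, type_name):
    """One query serves every type; only the parameter changes."""
    rows = conn.execute(
        """SELECT e.name
           FROM establishment e
           JOIN establishment_type t ON t.type_id = e.type_id
           WHERE t.name = ?
           ORDER BY e.name""",
        (type_name,),
    )
    return [name for (name,) in rows]


# Tracking bars later means adding rows, not a third table and a third set of code.
conn.execute("INSERT INTO establishment_type VALUES (3, 'Bar')")
conn.execute("INSERT INTO establishment VALUES (3, 'The Tap', 'Elm Rd', 3)")

cafes = find_by_type(conn, "Cafe")
bars = find_by_type(conn, "Bar")
```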
In the end I chose to use 2 separate tables, as I really will never need to search for both at the same time, and this way I can expand a single table in the future if I need to add another data field specific to cafes, for example.

JSON or relational tables for complex user profiles

I am trying to design a Postgres database for holding a variety of information about users and see two obvious ways to go about it - specifically, the different many-many relations.
Store the basic user data in a user_info table. In separate tables, store the many-many relations like what schools someone attended, places they worked at, and so on. There will be a large number of such tables, (it is easy to add things like what places someone visited, what books they've read, etc. etc. I expect this to grow to a rather large list of tables).
In the main user_info table, store a JSON blob (properly organized of course) with all this additional info.
Which of these two options should I choose? Naturally, read performance is more important. I know that JSON is generally slower than ordinary relational tables but I am unsure if looking up info from a lot of different tables (as in option 1) will be slower than getting a single json blob and displaying it in the browser. As a further note, the JSONB format, in Postgres, actually has good indexing options.
Update:
Following some comments that a graphdb is what needs to be used: I should clarify the question is not about the choice of technology (rdbms vs graph db). But about the choice of data type given the technology (rdbms).
NoSQL is great for when you don't know what data you're going to store or how it's going to be used, or it fits well with the list/hash model. Relational databases are great for when you have a lot of certainty about the data, how it will be used, and when it fits into the relational model. I would suggest a hybrid approach, especially given PostgreSQL 9.2's JSON performance improvements.
Make traditional relationships for things you know are solid.
Make use of JSON for data that you want to capture but aren't sure you need.
For simple lists, make use of PostgreSQL arrays or JSON rather than join tables.
Abstract this all behind model classes.
As you gain more knowledge about the data, change how it's stored.
For example, make tables for People, Schools, Work and Places, and join tables between them. Fields like People.name and Places.address are normal columns. Store things like "list of a person's pets" as an array of TEXT or a JSON field until you feel you need a Pets table. Put any extra information you don't immediately know what to do with, like "how big is a school's endowment", into a JSON metadata column.
Using model classes allows you to refactor your database without worrying about every piece of code that touches the database. Just be sure that all code which makes assumptions about the table structure goes into model methods.
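To illustrate the model-class idea, here is a small sketch in which a hypothetical Person class is the only code that knows a person's pets are stored as a JSON list. It uses Python's sqlite3 and json modules with a plain TEXT column; in PostgreSQL the column would more likely be a jsonb or TEXT[] column. The table, column and method names are assumptions.

```python
import json
import sqlite3


class Person:
    """Hypothetical model class: the only code that knows how pets are stored."""

    def __init__(self, conn, person_id):
        self.conn = conn
        self.person_id = person_id

    def pets(self):
        row = self.conn.execute(
            "SELECT pets FROM people WHERE person_id = ?", (self.person_id,)
        ).fetchone()
        # A NULL column (or a missing row) means "no pets recorded yet".
        return json.loads(row[0]) if row and row[0] else []

    def add_pet(self, pet_name):
        pets = self.pets() + [pet_name]
        self.conn.execute(
            "UPDATE people SET pets = ? WHERE person_id = ?",
            (json.dumps(pets), self.person_id),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL, pets TEXT)"
)
conn.execute("INSERT INTO people (person_id, name) VALUES (1, 'Ann')")

ann = Person(conn, 1)
ann.add_pet("Rex")
ann.add_pet("Tom")
ann_pets = ann.pets()
```

If a real Pets table is introduced later, only pets() and add_pet() need to change; callers keep using the same two methods.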

How to create HBase columns / table for related but separated entities

I saw a video tutorial on HBase, where data was stored in a table like this:
EmployeeName - Height - ProjectInfo
------------------------------------
Jdoe - 5'7" - ProjA-TeamLead, ProjB-Contributor
What happens when a business requirement comes up that the name of ProjA has to be changed to ProjX?
Wouldn't there be a separate table where Project information is stored?
In a relational database, yes: you'd have a project table, and the employee table would refer to it via a foreign key and only store the immutable project id (rather than the name). Then when you want to query it (in a relational database), you'd do a JOIN like:
SELECT
    employee.name,
    employee.height,
    project.name,
    employee_project_role.role_name
FROM
    employee
    INNER JOIN employee_project_role
        ON employee_project_role.employee_id = employee.employee_id
    INNER JOIN project
        ON employee_project_role.project_id = project.project_id
This isn't how things are done in HBase (and other NoSQL databases); the reason is that since these databases are geared towards extremely large data sets, and distributed over many machines, the actual algorithms to transparently execute complex joins like this become a lot harder to pull off in ways that perform well. Thus, HBase doesn't even have built-in joins.
Instead, the general approach with systems like this is that you denormalize your data and store things in a single table. So in this case, there might be one row per employee, and denormalized into that row is all of the employee's project role info (probably in separate columns -- the contents of a row in HBase are actually a key/value map, so you can represent repeating things like all of their different roles easily).
You're absolutely right, though: if you change the name of the project, that means you'd need to change the data that's stored for every employee. In this respect, the relational model is "cleaner". But if you're dealing with Petabytes of data or trillions of rows, the "clean" abstraction of a relational database becomes a lot messier, because you end up having to shard it all manually. The point of systems like HBase is to pay these costs up front in the design process, and not just assume the relational database will magically solve problems like this for you at scale. (Because it won't).
That said: if you don't expect to have at least terabytes of data (that's a million MB, remember), just do it in a relational database. It'll be much easier.
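The cost of renaming a project in a denormalized layout can be sketched with plain Python dictionaries. This does not use the HBase API; it only mimics the "row key to key/value map" shape described above, and the family:qualifier names, the second employee and the rename_project helper are hypothetical.

```python
# Plain-Python stand-in for a denormalized table: row key -> {column: value}.
# The "info:" and "project:" column names are made up for illustration.
employees = {
    "jdoe": {
        "info:height": "5'7\"",
        "project:ProjA": "TeamLead",
        "project:ProjB": "Contributor",
    },
    "asmith": {
        "info:height": "6'0\"",
        "project:ProjA": "Contributor",
    },
}


def rename_project(table, old_name, new_name):
    """Rewrite the project column in every row that has it; return rows touched."""
    old_key = f"project:{old_name}"
    new_key = f"project:{new_name}"
    touched = 0
    for columns in table.values():
        if old_key in columns:
            columns[new_key] = columns.pop(old_key)
            touched += 1
    return touched


rows_touched = rename_project(employees, "ProjA", "ProjX")
```

In the relational version the same rename is a single-row update to the project table; here every employee row that mentions the project has to be rewritten.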
I think going through this presentation will give you some perspective:
http://ianvarley.com/coding/HBaseSchema_HBaseCon2012.pdf
And for a more programmatic representation, have a look at:
http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable

SQL table structure

I am starting a new project that will handle surveys and reviews. At this point I am trying to figure out what would be the best SQL table structure to store and handle such information.
Basically, the survey will contain ratings, text reviews and additional optional information that clients can choose to share. I am thinking of either storing each piece of information in a separate column, or merging all this data and storing it as XML in one column.
I am not sure what would be a better solution, but I have the following issues on my mind:
- would a possible increase in the amount of information collected be a problem with a single XML column?
- would a single XML column have any serious impact on performance when extracting and handling information from it?
If you ever have a reason to query on a single piece of info, or update it alone, then don't store that data in XML, but instead as a separate column.
It is rare, IMO, that storing XML (or any other composite type of data) is a good idea in a DB. Although there are always exceptions.
Well, to keep this simple, you have two choices: dynamic or static surveys.
Dynamic surveys would look like this:
Not only would reporting be more complicated, but so would the UI. The number of questions is unknown and you would eventually need logic to handle order, grouping, and data types.
Static surveys would look more like this:
Although you certainly give up some flexibility, the solution (including reports) is considerably simpler. You need not handle order, grouping, or data types (at least dynamically).
I like to argue that "Simplicity is the best Design" in almost everything.
Since I cannot know your requirements in detail, I cannot say which is the better fit. But I can tell you this: the dynamic design is often built when the static one would be sufficient.
Best of luck!
If you don't want to fight with a relational database that expects relational data you probably want reasonably normalized data. I don't see in your case what advantage the XML would give you. If you have multiple values entered in the survey, you probably want another table for survey entries with a foreign key to the survey.
If this is going to be a relatively extensive application you might think about a table for survey definition, a table for survey question, a table for survey response, and a table for survey question response. If the survey data can be multiple types, you might need a table for each kind of question that might be asked, though in some cases a column might do.
EDIT - I think you would at least have one row per answer to a question. If the answer is complex (doesn't correspond to just one instance of a simple data type) it might actually be multiple rows (though denormalizing into multiple columns is probably O.K. if the number of columns is small and fixed). If an answer to one question needs to be stored in multiple rows, you would almost certainly end up with one table that represents the answer, and has one row per answer, plus another table that represents pieces of the answer, and has one row per piece.
If the reason you are considering XML is that the answers are going to be of very different types (for example, a review with a rating, a title, a header, a body, and a comments section for one question; a list of hyperlinks for another question, etc.) then the answer table might actually have to be several tables, so that you can model the data for each type of question. That would be a pretty complicated case though.
Hopefully one row per response, in a single table, would be sufficient.
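A rough sketch of the four tables suggested above (survey definition, question, response, question response), using Python's sqlite3 module as a stand-in for the actual database. All table and column names and the sample data are assumptions. It stores one row per answer to a question, so a single piece of information such as a rating can be queried directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE survey (survey_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE survey_question (
    question_id INTEGER PRIMARY KEY,
    survey_id   INTEGER NOT NULL REFERENCES survey(survey_id),
    prompt      TEXT NOT NULL
);
CREATE TABLE survey_response (
    response_id INTEGER PRIMARY KEY,
    survey_id   INTEGER NOT NULL REFERENCES survey(survey_id),
    client_name TEXT
);
-- One row per answer to a question.
CREATE TABLE survey_question_response (
    response_id INTEGER NOT NULL REFERENCES survey_response(response_id),
    question_id INTEGER NOT NULL REFERENCES survey_question(question_id),
    rating      INTEGER,
    review_text TEXT,
    PRIMARY KEY (response_id, question_id)
);
INSERT INTO survey VALUES (1, 'Service review');
INSERT INTO survey_question VALUES (1, 1, 'Rate the service'), (2, 1, 'Rate the food');
INSERT INTO survey_response VALUES (1, 1, 'client A'), (2, 1, 'client B');
INSERT INTO survey_question_response VALUES
    (1, 1, 4, 'Friendly'), (1, 2, 5, NULL),
    (2, 1, 2, 'Slow'),     (2, 2, 3, NULL);
""")

# Average rating per question: plain SQL, no XML parsing needed.
avg_ratings = conn.execute("""
    SELECT q.prompt, AVG(r.rating)
    FROM survey_question q
    JOIN survey_question_response r ON r.question_id = q.question_id
    GROUP BY q.question_id, q.prompt
    ORDER BY q.question_id
""").fetchall()
```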
To piggyback off of Flimzy's answer, you want to simply store the data in the database and not a specific format (i.e. XML). You might have a requirement at the moment for XML, but tomorrow it might be a CSV or a fixed-width DAT file. Also, if you store just the data, then you can use the "power" of the database to search on specific columns of information and then return it as XML, if desired.