Third Normal Form uniqueness constraint - database-normalization

I am trying to put my database in 3NF and I am confused about one thing. In the explanation below I do not understand how Zip can be the primary key of the Address table if the same Zip can occur more than once. In the Student_Detail table a recurring Zip is fine, but as a primary key won't it lose its uniqueness?
Third Normal Form (3NF)
Third Normal Form requires that every non-prime attribute of a table be dependent on the primary key; in other words, a non-prime attribute must not be determined by another non-prime attribute. Such transitive functional dependencies should be removed from the table, and the table must also be in Second Normal Form. For example, consider a table with the following fields.
Student_Detail Table :
Student_id - Student_name - DOB - Street - city - State - Zip
In this table Student_id is the primary key, but street, city and state depend upon Zip. The dependency between Zip and the other fields is called a transitive dependency. Hence, to reach 3NF, we need to move street, city and state to a new table, with Zip as the primary key.
New Student_Detail Table :
Student_id - Student_name - DOB - Zip
Address Table :
Zip - Street - city - state
The advantages of removing the transitive dependency are:
Amount of data duplication is reduced.
Data integrity is achieved.
Example:
http://www.studytonight.com/dbms/database-normalization.php

I'm assuming this is your question
i do not understand how Zip can be the primary key of the address table if the zip can occur more than once.
and the reason why you don't understand is just because Zip is a bad example.
All the explanation is correct. If you can infer any "non-prime" attribute based upon another "non-prime" attribute, you have what is called a "transitive dependency". You pull those attributes into a different table and in their place you insert an FK reference.
Zip will not be able to appear more than once in the Address table, because that attribute is its PK. I believe it is just a bad example, although the explanation is correct. Try to analyse it with different subjects.
Check if this example helps you in any way.
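For illustration, a minimal DDL sketch of the decomposition described above (the column types are assumptions; the point is that Zip appears exactly once, as the primary key of the Address table, while it may repeat freely as a foreign key in Student_Detail):

CREATE TABLE Address (
    Zip    varchar(10)  PRIMARY KEY,
    Street varchar(100) NOT NULL,
    City   varchar(50)  NOT NULL,
    State  varchar(50)  NOT NULL
);

CREATE TABLE Student_Detail (
    Student_id   integer PRIMARY KEY,
    Student_name varchar(100) NOT NULL,
    DOB          date NOT NULL,
    Zip          varchar(10) REFERENCES Address (Zip)  -- many students may share one Zip
);

Uniqueness is only required within Address: each Zip is stored there once, together with the street, city and state it determines, and every student row just points at it.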

Related

mapping generalization constraints to sql (STI approach)

I'm trying to model the following relationships between entities, mainly consisting of a partial, disjoint generalization.
[image: original EERD]
[image: 'mapped' to relational]
Since I didn't need the subclasses to have any particular attributes I decided to use the "single table inheritance" approach, added the "type" field and moved the relationships towards the parent.
After that I had two choices to make:
1- the type for the "business type" attribute
2- a way to constrain participation to at most one of the 4 relationships based on the type attribute
For the sake of portability and extensibility I decided to implement no. 1 as a lookup table (rather than an enum or a hardcoded check).
About no.2 I figured the best way to enforce participation and exclusivity constraints on the four relationships would be a trigger.
The problem is that now I'm not really sure how to write a trigger function; for instance it would have to reference values inserted into business type, so I'd have to make sure those can't be changed or deleted in the future.
I feel like I'm doing something wrong so I wanted to ask for feedback before going further; is this a suitable approach in your opinion?
I found an interesting article describing a possible solution: it feels a bit like a 'hack', but it should work (it is intended for SQL Server, but it can easily be applied to Postgres too).
EDIT:
It consists of adding a type field to the parent table and then having every child table reference that field along with the parent's id via a composite foreign key (a UNIQUE constraint on this pair of fields has to be added to the parent beforehand, since the columns referenced by an FK must be unique).
Now, in order to force the type field to match the table it belongs to, one adds a check constraint or an always-generated value ensuring that the type column always has the same value
(e.g. CHECK (Business_type_id = 1) in the Husbandry table, where 1 represents 'husbandry' in the type lookup table).
The only issue is that it requires a whole extra column in every subclass table, each containing the same generated value repeated over and over (a waste of space?), and it may fall apart as soon as the IDs in the lookup table are modified.
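To make this concrete, here is a minimal sketch under assumed names (business, business_type and husbandry are placeholders, since the actual schema isn't shown in the question; 1 is assumed to be the 'husbandry' row of the lookup table):

CREATE TABLE business_type (
    business_type_id integer PRIMARY KEY,
    name             text NOT NULL UNIQUE
);

CREATE TABLE business (
    business_id      serial  PRIMARY KEY,
    business_type_id integer NOT NULL REFERENCES business_type,
    -- composite target for the subclass foreign keys
    UNIQUE (business_id, business_type_id)
);

CREATE TABLE husbandry (
    business_id      integer PRIMARY KEY,
    -- pinned to the 'husbandry' type; the composite FK below then guarantees
    -- that the referenced parent row has this same type
    business_type_id integer NOT NULL DEFAULT 1 CHECK (business_type_id = 1),
    FOREIGN KEY (business_id, business_type_id)
        REFERENCES business (business_id, business_type_id)
);

The CHECK plus the composite FK is what enforces that a husbandry row can only point at a business row whose type is 'husbandry'.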

Deciding primary key for DynamoDB

I have 3 fields to store in DynamoDB: identity-1, identity-2, score.
identity-1 and identity-2 are always unique in the table, i.e. no two entries can have the same identity-1 or identity-2.
We want to allow entries to have either identity-1 or identity-2, or both. Example:
identity-1 | identity-2 | score
a1         | b1         | s1
a2         |            | s2
           | b3         | s3
Access patterns are as follows:
Query identity-2 from identity-1
Query score from identity-1
Query score from identity-2
How do I define primary key in such case?
This is a "many:1" problem and there's a few ways to tackle it with DynamoDB. The simple answer here is to leverage Global Secondary Indexes (GSI). For every "identity" you wanted to do a direct look up from, you'd create a GSI.
GSI-1 would include Identity-1 as the hash key and you'd include Identity-2 and any other identities as a non-key attribute to include. You'd create a GSI for each identity you wanted to query directly on. You could also include the score as a non-key attribute if you wanted to directly look up score from any identity without having to resolve to the primary key (which we'll talk about).
The thing to consider with GSIs, though, is that they consume extra storage and throughput. If you create a GSI which includes all your attributes for every identity, you'd be paying for an additional copy of your table for each identity.
The other issue, so far, is that you haven't chosen a primary key for your table. You'll need a field to be your primary key, and since none of your identities is guaranteed to be present, you'll need a field that always is. It's often convenient to just call it what it is, so we'll call it pk.
You've got a few choices for pk here. One is to define pk as a composite of your identities. For example: item.pk = item["identity-1"] || item["identity-2"]. Then you could do a query on the table for identity == pk and, if you don't find anything, look up the index for the given identity. This works fine for your simple example, but as you want to do more complex things (such as many different identity types), you might find it to be a bit of a headache.
From past experience, however, my recommendation would be to adjust your approach slightly and have a "users" table and a "scores" table. "users" would have as its pk a guid unique for every user and all their identities (call it "user_id"); you could then create a GSI on that table for every identity back to user_id. "scores" would then use "user_id" as its pk as well, with no need for an index. Your application would always resolve to a "user_id" when a user logs in or is otherwise identified - then you can look up the score without needing to track identity, and you can look up all the associated identities or other user information without needing to create a very "fat" index of every identity -> every other identity.

Self References

For an assessment task I'm doing, an entity Album has the attribute also_bought, which is a self-referential attribute. However, this one attribute has multiple entries for any one album - as the also_bought recommendations are rarely just a single recommendation - and thus is a bit of a question mark when it comes to normalisation. I'm not sure whether it passes 1NF or not.
To be clear, the entire entity's set is
Album(album_id, title, playtime, genre, release_date, price, also_bought)
"Also bought" items should be stored in a separate table, something like.
AlsoBought (table)
album_id
also_bought_album_id
Then configure foreign keys from both columns to reference Album.album_id.
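A rough DDL sketch of that (the Album column types are assumptions based on the attribute list in the question):

CREATE TABLE Album (
    album_id     integer PRIMARY KEY,
    title        varchar(200) NOT NULL,
    playtime     integer,
    genre        varchar(50),
    release_date date,
    price        numeric(8,2)
);

CREATE TABLE AlsoBought (
    album_id             integer NOT NULL REFERENCES Album (album_id),
    also_bought_album_id integer NOT NULL REFERENCES Album (album_id),
    PRIMARY KEY (album_id, also_bought_album_id)
);

Each recommendation becomes one row in AlsoBought, so an album can have any number of "also bought" albums without a multi-valued column.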
You mean that Album is a "self-referencing table" because it has a FK (foreign key) from one column list to another in the same table? (A FK constraint holds when subrow values for a column list must appear elsewhere.) If you mean that the type of also_bought is a list of album_ids, there is no FK from the former to the latter, because values for the former (lists of ids) are not values for the latter (ids). There's a constraint that is reminding you of a FK.
Anyway, normalization is done to one table, and doesn't depend on FKs.
But any time you are "normalizing to 1NF" by eliminating "non-atomic columns", you have to start by deciding what your "table" "columns" contain. If you decide a cell for a column in a row contains "many values", then you don't have a relational table and you have to come up with one. The easiest way is to assume a set-valued column to get a relation and then follow the standard rules for eliminating too-complex column types.

Avoid duplicates in JPA Many-to-One

I have asked a variation of this question previously for Hibernate. However, I am now working with Eclipselink and it still troubles me. It is very simple:
I need to persist an address object that includes a city name in a many-to-one relationship. I would like to be able to persist the address and cascade save the city - but only if the city is unique. From my understanding this is not supported directly by JPA? Possible solutions include using the city name as the unique id in the city table and querying the city table for a specific city and then adding that object to the address prior to saving. I have certainly seen several StackOverflow questions/answers that seem to suggest this is the approach (JPA cascade persist - many to one)
Am I missing something here? Is there an alternative/better approach?
Yes, the solution is to get the city from the database, create it if it doesn't exist, and set it to the address.
There's no way around that. The city name doesn't have to be the primary key, though. I would use an autogenerated, non-functional, surrogate key as the PK, and add a unique constraint on the name of the city. This would at least allow you to fix typos in city names without having to update the thousand addresses referencing them.
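A rough sketch of the schema side of that advice (names and types are assumptions, not the asker's actual mapping):

CREATE TABLE city (
    city_id serial PRIMARY KEY,           -- autogenerated surrogate key
    name    varchar(100) NOT NULL UNIQUE  -- uniqueness enforced on the name
);

CREATE TABLE address (
    address_id serial PRIMARY KEY,
    street     varchar(200),
    city_id    integer NOT NULL REFERENCES city (city_id)
);

On the JPA side the look-up-then-create logic stays the same; the unique constraint just guarantees that a concurrent attempt to insert the same city name fails instead of silently creating a duplicate.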

DDD: modelling aggregate entities' unique global/local id in PostgreSQL

I have read Eric Evans' Domain Driven Design book and I have been trying to apply some of the concepts.
In his book, Eric talks about aggregates and how aggregate roots should have a unique global id whereas aggregate members should have a unique local id. I have been trying to apply that concept to my database tables and I'm running into some issues.
I have two tables in my PostgreSQL database: facilities and employees where employees can be assigned to a single facility.
In the past, I would lay out the employees table as follows:
CREATE TABLE "employees" (
"employeeid" serial NOT NULL PRIMARY KEY,
"facilityid" integer NOT NULL,
...
FOREIGN KEY ("facilityid") REFERENCES "facilities" ("facilityid")
);
where employeeid is a globally unique id. I would then add code in the backend for access control validation, preventing users of one facility from accessing rows pertaining to other facilities. I have a feeling this might not be the safest way to do it.
What I am now considering is this layout:
CREATE TABLE "employees" (
"employeeid" integer NOT NULL,
"facilityid" integer NOT NULL,
...
PRIMARY KEY ("employeeid", "facilityid"),
FOREIGN KEY ("facilityid") REFERENCES "facilities" ("facilityid")
);
where employeeid is unique (locally) for a given facilityid but needs to be paired with a facilityid to be unique globally.
Concretely, this is what I am looking for:
Employee A (employeeid: 1, facilityid: 1)
Employee B (employeeid: 2, facilityid: 1)
Employee C (employeeid: 1, facilityid: 2)
where A, B and C are 3 distinct employees and...
adding an employee D to facility 1 would give him the keys (employeeid : 3, facilityid: 1)
adding an employee E to facility 2 would give him the keys (employeeid : 2, facilityid: 2)
I see two ways of achieving this:
I could use triggers or stored procedures to automatically generate new employeeids and store the last ids for every facility in another table for quicker access but I am concerned about concurrency issues and ending up with 2 employees from the same facility with the same id.
I could possibly create a new sequence for each facility to manage the employeeids but I fear ending up with thousands of sequences to manage and with procedures to delete those sequences in case a facility is deleted. Is there anything wrong with this? It seems heavy to me.
Which approach should I take? Is there anything I'm missing out on?
I am inferring from your question that you will be running a single database for all facilities, or at least that if you have a local database as the "master" for each facility that the data will need to be combined in a central database without collisions.
I would make the facilityid the high-order part of the primary key. You could probably assign new employee numbers using a simple SELECT max(employeeid) + 1 ... WHERE facilityid = n approach, since adding employees to any one facility is presumably not something that happens hundreds of times per second from multiple concurrent sources. There is some chance that this could generate an occasional serialization failure, but in my opinion any database access should go through a framework which recognizes those and automatically retries the transaction.
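A sketch of that approach, assuming the second table layout from the question (42 stands for some existing facilityid; the other employee columns are omitted for brevity):

BEGIN ISOLATION LEVEL SERIALIZABLE;

INSERT INTO employees (employeeid, facilityid)
SELECT coalesce(max(employeeid), 0) + 1, 42
FROM employees
WHERE facilityid = 42;

COMMIT;

Two concurrent inserts for the same facility cannot both commit with the same employeeid: one of them will fail (with a serialization failure or a primary-key violation) and can simply be retried.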
I think you are overstressing the aggregate root concept here. In my understanding of modelling an employee (and that depends on your context), an employee is almost always an aggregate root, possibly referenced by another aggregate root, facility.
Both employee and facility almost always have natural keys. For the employee this is typically some employee id (printed on employee identification badges, or at least maintained in the human resources software system), and facilities almost always have natural keys too, typically containing a location part and a number, like "MUC-1" for facility 1 located in Munich. But that all depends on your context. If employee and facility have such natural keys, your database model should be quite clear.