I'm likely overthinking a problem here and may well get downvoted but I'm prepared to take the hit. I'm building my first schema in a data warehouse.
2 tables: events and contacts:
events(id(pk), cid, other, fields, here)
contacts(id (pk), cid(fk), other, fields, here)
Someone visits our website and registers. A line item is generated in events column "id" and a "cid" for contacts is generated. A new record is added to contacts.
I have two questions:
Can I make the primary key of contacts cid? Thus the primary key is also a foreign key?
I'm using MySQL Workbench to create the schema. When I create the contacts table I can set cid as a foreign key and choose a cardinality of either 1-1 or 1-many. From the point of view of the contacts table, is the relationship 1-1 or 1-many? There will only ever be one record per cid in contacts, but if that user does multiple things (such as receiving an email from us) they will appear multiple times in the events table. So, logically, 1-many. But when I create this in Workbench, the relation line is drawn as 1-many with the many end at contacts, not the other way around as desired. Shouldn't it be the other way around?
What is the relationship between events.cid and contacts.cid?
If a user's registration results in a single contact_ record while each user visit to the web site (each Session started) results in an event_ record belonging to that user’s contact_ record, then you have a One-To-Many relationship.
`contact_` = parent table (the One)
`event_` = child table (the Many)
Notice how I boiled down that relationship into a single sentence. That should be your goal when doing analysis work to determine table structure.
Relationships are almost always defined as a link from a primary key on the parent table to a foreign key on the child table.
How you define the primary key is up to you. First decide whether you want a natural key or a surrogate key. In my experience a natural key never works out as the values always eventually change.
If using a surrogate key, decide what type. The usual choices are an integer tied to an automatically incrementing sequence generator, or a UUID. If you will ever federate data with other databases or systems, then UUID is the way to go. If using an integer, decide on size: a 32-bit integer handles about 2.1 billion records if signed, or about 4.3 billion if unsigned. A 64-bit integer can track about 9.2 quintillion records if signed, or about 18.4 quintillion if unsigned.
The foreign key in the child table is simply a copy of its assigned parent's primary key value. So the foreign key must have the same data type as that parent primary key.
If a particular parent record owns multiple records in the child table, each of those child records will carry a copy of that parent’s primary key. So if the user Susan has five events, her primary key value appears once in the contact_ table and that same value appears five times in the event_ table stored in the foreign key column.
If cid uniquely identifies each contact_ record amongst all the other contact_ records, then you have a primary key. No need to define another.
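To make the parent/child relationship concrete, here is a minimal sketch using Python's built-in sqlite3 module in place of MySQL. The column names `name` and `kind` and the sample data are made up for illustration; only `contact_`, `event_`, `id` and `cid` come from the discussion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.executescript("""
CREATE TABLE contact_ (
    cid  INTEGER PRIMARY KEY,   -- cid identifies the contact, so it is the PK
    name TEXT NOT NULL          -- hypothetical column
);
CREATE TABLE event_ (
    id   INTEGER PRIMARY KEY,
    cid  INTEGER NOT NULL REFERENCES contact_ (cid),  -- copy of the parent's PK
    kind TEXT NOT NULL          -- hypothetical column
);
""")

# One parent row (the One) ...
conn.execute("INSERT INTO contact_ (cid, name) VALUES (1, 'Susan')")
# ... and five child rows (the Many), each carrying Susan's key.
conn.executemany(
    "INSERT INTO event_ (cid, kind) VALUES (?, ?)",
    [(1, kind) for kind in ("register", "email", "visit", "visit", "purchase")],
)

parent_rows = conn.execute(
    "SELECT COUNT(*) FROM contact_ WHERE cid = 1").fetchone()[0]
child_rows = conn.execute(
    "SELECT COUNT(*) FROM event_ WHERE cid = 1").fetchone()[0]

# An event pointing at a contact that does not exist is rejected.
try:
    conn.execute("INSERT INTO event_ (cid, kind) VALUES (99, 'visit')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Susan's key appears once in `contact_` and five times in `event_`, and the foreign key sits on the Many side, which is the direction the Workbench diagram should show.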
I'm writing a program that generates large numbers of rows to be inserted into a PostgreSQL database. Due to the presence of multiple indices, this process is getting slower over time. That's why I want to move to using COPY which seems to be significantly faster. The problem is that one of the tables has a foreign key into another, and I do not have the IDs for the foreign key column.
I was thinking that maybe if I could reserve a range in the sequence used for the primary key of the first table, I could do the ID assignment manually but I don't think Postgres natively supports such an operation. Is there a way to achieve this another way?
First off, identify the business key for the parent and child tables from your source data. Create those tables with a unique constraint on the business key. This will not be the surrogate (auto-generated) PK. Now create a staging table with all the necessary columns (except the FK) and load the raw rows into it, for example with COPY. Since you will most likely use it across sessions, this is a permanent table, but the intent is single-time usage. From it, insert into the parent table, generating the PK from the sequence. Then insert into the child, selecting the FK by joining to the parent on the business key.
insert into parent ( <columns> )
select <columns>
from stage
on conflict (business_key) do nothing;
insert into child ( <columns>, parent_id )
select s.<columns>, p.id
from stage s
join parent p on s.business_key = p.business_key
on conflict (parent_id, child_bk) do nothing;
Since the above is rather obscure in the abstract see a concrete example here. There is no need attempting to "reserve a range", just let the pk/fk develop naturally.
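As a rough, runnable sketch of the same staging pattern, here it is in Python's built-in sqlite3. Note the assumptions: SQLite's `INSERT OR IGNORE` stands in for PostgreSQL's `ON CONFLICT ... DO NOTHING`, the staging rows are inserted directly rather than loaded with COPY, and all table and column names (`parent_bk`, `child_bk`, `load_from_stage`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (
    id           INTEGER PRIMARY KEY,   -- surrogate PK, generated on insert
    business_key TEXT NOT NULL UNIQUE
);
CREATE TABLE child (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL REFERENCES parent (id),
    child_bk  TEXT NOT NULL,
    UNIQUE (parent_id, child_bk)
);
CREATE TABLE stage (                    -- raw rows, no FK column
    parent_bk TEXT NOT NULL,
    child_bk  TEXT NOT NULL
);
""")

# In PostgreSQL this is where COPY would bulk-load the staging table.
conn.executemany(
    "INSERT INTO stage (parent_bk, child_bk) VALUES (?, ?)",
    [("acme", "order-1"), ("acme", "order-2"), ("globex", "order-1")],
)

def load_from_stage(conn):
    # Step 1: create missing parents; the database assigns the PKs.
    conn.execute("""
        INSERT OR IGNORE INTO parent (business_key)
        SELECT DISTINCT parent_bk FROM stage
    """)
    # Step 2: create children, looking up the FK through the business key.
    conn.execute("""
        INSERT OR IGNORE INTO child (parent_id, child_bk)
        SELECT p.id, s.child_bk
        FROM stage s
        JOIN parent p ON p.business_key = s.parent_bk
    """)

load_from_stage(conn)
load_from_stage(conn)  # running it again adds nothing

parent_count = conn.execute("SELECT COUNT(*) FROM parent").fetchone()[0]
child_count = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
linked = conn.execute("""
    SELECT p.business_key, c.child_bk
    FROM child c JOIN parent p ON p.id = c.parent_id
    ORDER BY p.business_key, c.child_bk
""").fetchall()
```

The program generating the rows never needs to know a parent id; the join on the business key supplies it at load time.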
Do I really need an extra identity field say called id on a bridge table? For primary tables I set an id and have it start incrementing from 0. But not sure about bridge tables.
Example:
user
  user_id (identity)
  name
user_communities
  id (identity) - do I even need this??
  user_id
  community_id
communities
  community_id (identity)
  name
No, you don't need an additional generated primary key on a bridge table - at least not if (user_id, community_id) is the primary key.
You would only need it in case you would allow a user to participate in the same community multiple times, e.g. with different roles.
Your relationship links two entities, thus you have the ids of the two entities in it. In that case the id in your bridge table is unnecessary.
But, although rarer, you could also have higher order relationships which connect two relationships or a relationship with other entities. Say, for example, you want to qualify a relationship with a set of properties (the strength of the relationship, its participants, etc), you could have a relationship properties table that links to the relationship (thus you would need its id) to a set of name-values pairs. You could even have a bridge table between two different bridge tables to connect them and assign certain properties to the connection (which relationship has priority over the other, e.g.)
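A small sketch of the bridge table without its own id, using Python's built-in sqlite3 (the sample rows are made up; the table and column names follow the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE "user" (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE communities (
    community_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE user_communities (
    user_id      INTEGER NOT NULL REFERENCES "user" (user_id),
    community_id INTEGER NOT NULL REFERENCES communities (community_id),
    PRIMARY KEY (user_id, community_id)   -- the pair itself is the key
);
""")

conn.execute("INSERT INTO \"user\" (user_id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO communities (community_id, name) VALUES (10, 'chess')")
conn.execute("INSERT INTO user_communities (user_id, community_id) VALUES (1, 10)")

# The composite primary key already prevents joining the same community twice.
try:
    conn.execute(
        "INSERT INTO user_communities (user_id, community_id) VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

membership_count = conn.execute(
    "SELECT COUNT(*) FROM user_communities").fetchone()[0]
```

If a user could join the same community more than once (for example with different roles), the pair would no longer be unique and a separate id, or a wider key that includes the role, would be needed.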
I would like to do two things in PostgreSQL:
version rows in a table by a date range
ensure the integrity of the table by setting up single column foreign keys
For me, it seems I can do only one of the above at the same time.
Detailed example:
I need to version the content of a table by date range, so that at any particular point in time there is only one row for each customId: there is a unique index on (customId, validFrom, validUntil) and there are no overlapping ranges. It is important, however, that none of those columns is unique by itself.
By using this method I can query my table and get the valid entity for any point in time, but I could not figure out how to link this table via the customId key to another table so the integrity of the table is guarded.
The problem is that the customId key is not unique as there can be more than one of the same key when multiple ranges are recorded.
One solution I have used before is to create another x_history table when I am only interested in the latest state of the entity, and copy the old state to the history table every time. This time that wouldn't work well, because I would constantly have to query two tables, as it is "random" which version of the data I am interested in during selects.
Example by data:
table a:
  id (PK)
  custom_id (unique in any single point of time via the above composite unique index)
  valid_from (timestamp, storing the start of the validity of a)
  valid_until (timestamp, storing the end of the validity of a)
table b:
  id (PK)
  a__custom_id (unique in any single point of time)
  valid_from (timestamp, storing the start of the validity of b)
  valid_until (timestamp, storing the end of the validity of b)
I would like to insert only those rows into table b for which
b.a__custom_id exists in a.custom_id
(b.a__custom_id, b.valid_from, b.valid_until) is unique
You cannot easily have both foreign keys and historical data.
One way would be to have the validity range as part of the primary key, but then you have to update many rows whenever you modify an entry in the referenced table.
I think you can get away with a history table if you include the currently active version in the history table. Then you can just query the history table, and the table with the current values is just there for foreign keys.
The history table would have an exclusion constraint over the primary key and the time range.
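Here is a rough sketch of that layout in Python's built-in sqlite3. SQLite has no exclusion constraints, so a trigger stands in for the PostgreSQL exclusion constraint; the `payload` column, the `a_history` name, the `payload_at` helper, the half-open `[valid_from, valid_until)` convention and the ISO date strings are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE a (                       -- current values only; the FK target
    custom_id TEXT PRIMARY KEY NOT NULL,
    payload   TEXT NOT NULL
);
CREATE TABLE a_history (               -- every version, including the current one
    custom_id   TEXT NOT NULL REFERENCES a (custom_id),
    payload     TEXT NOT NULL,
    valid_from  TEXT NOT NULL,
    valid_until TEXT NOT NULL,
    CHECK (valid_from < valid_until)
);
CREATE TABLE b (                       -- other tables reference a, not a_history
    id           INTEGER PRIMARY KEY,
    a__custom_id TEXT NOT NULL REFERENCES a (custom_id)
);
-- Stand-in for an exclusion constraint: reject overlapping ranges per key.
CREATE TRIGGER a_history_no_overlap
BEFORE INSERT ON a_history
BEGIN
    SELECT RAISE(ABORT, 'overlapping validity range')
    WHERE EXISTS (
        SELECT 1 FROM a_history h
        WHERE h.custom_id = NEW.custom_id
          AND h.valid_from < NEW.valid_until
          AND NEW.valid_from < h.valid_until
    );
END;
""")

conn.execute("INSERT INTO a VALUES ('X', 'v2')")
conn.execute("INSERT INTO a_history VALUES ('X', 'v1', '2020-01-01', '2021-01-01')")
conn.execute("INSERT INTO a_history VALUES ('X', 'v2', '2021-01-01', '9999-12-31')")
conn.execute("INSERT INTO b (a__custom_id) VALUES ('X')")

def rejected(sql):
    """Return True if the statement is refused by a constraint or trigger."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.DatabaseError:
        return True

overlap_rejected = rejected(
    "INSERT INTO a_history VALUES ('X', 'bad', '2020-06-01', '2020-09-01')")
orphan_rejected = rejected("INSERT INTO b (a__custom_id) VALUES ('missing')")

def payload_at(custom_id, ts):
    """Look up the version of custom_id that was valid at timestamp ts."""
    row = conn.execute(
        "SELECT payload FROM a_history "
        "WHERE custom_id = ? AND valid_from <= ? AND ? < valid_until",
        (custom_id, ts, ts),
    ).fetchone()
    return row[0] if row else None

old_version = payload_at("X", "2020-06-15")
new_version = payload_at("X", "2022-03-01")
```

Point-in-time queries touch only `a_history`, while `a` exists so that `b` has a single-column key to reference.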
I am looking at multi-tenancy database schema design for an SaaS concept. It will be ASP.NET MVC -> EF, but that isn't so important.
Below you can see an example database schema (the Tenant being the Company). The CompanyId is replicated throughout the schema and the primary key has been placed on both the natural key, plus the tenant Id.
Plugging this schema into the Entity Framework gives the following errors when I add the tables into the Entity Model file (Model1.edmx):
The relationship 'FK_Order_Customer' uses the set of foreign keys '{CustomerId, CompanyId}' that are partially contained in the set of primary keys '{OrderId, CompanyId}' of the table 'Order'. The set of foreign keys must be fully contained in the set of primary keys, or fully not contained in the set of primary keys to be mapped to a model.
The relationship 'FK_OrderLine_Customer' uses the set of foreign keys '{CustomerId, CompanyId}' that are partially contained in the set of primary keys '{OrderLineId, CompanyId}' of the table 'OrderLine'. The set of foreign keys must be fully contained in the set of primary keys, or fully not contained in the set of primary keys to be mapped to a model.
The relationship 'FK_OrderLine_Order' uses the set of foreign keys '{OrderId, CompanyId}' that are partially contained in the set of primary keys '{OrderLineId, CompanyId}' of the table 'OrderLine'. The set of foreign keys must be fully contained in the set of primary keys, or fully not contained in the set of primary keys to be mapped to a model.
The relationship 'FK_OrderLine_Product' uses the set of foreign keys '{ProductId, CompanyId}' that are partially contained in the set of primary keys '{OrderLineId, CompanyId}' of the table 'OrderLine'. The set of foreign keys must be fully contained in the set of primary keys, or fully not contained in the set of primary keys to be mapped to a model.
The question is in two parts:
Is my database design incorrect? Should I refrain from these compound primary keys? I'm questioning my sanity regarding the fundamental schema design (frazzled brain syndrome). Please feel free to suggest the 'idealized' schema.
Alternatively, if the database design is correct, then is EF unable to match the keys because it perceives these foreign keys as a potential mis-configured 1:1 relationships (incorrectly)? In which case, is this an EF bug and how can I work around it?
On a quick scan of EF's error messages, it clearly doesn't like the way you're setting up compound keys, and I think it's probably nudging you in the right direction. Give some thought again to what makes your primary keys unique. Is the OrderID alone not unique, without a CompanyID? Is a ProductID not unique, without a CompanyID? An OrderLine certainly should be unique without a CompanyID, since an OrderLine should be associated only with a single Order.
If you truly need the CompanyID for all of these, which probably means that the company in question is supplying you with ProductID and OrderID, then you might want to go a different direction, and generate your own primary keys that are not intrinsic to the data. Simply set up an auto-increment column for your primary key, and let these be the internal OrderID, OrderLineID, ProductID, CompanyID, etc. At that point, the OrderLine won't need the customer's OrderID or CompanyID; the foreign key reference to the Order would be its starting point. (And the CustomerID should never be an attribute of an order line; it's an attribute of the order, not the order line.)
Compound keys are just messy. Try designing the model without them, and see if it simplifies things.
I think the error is not in the design.
It is not in EF either.
It is in the SQL Server relationships.
Read the EF message:
The relationship 'FK_Order_Customer'
uses the set of foreign keys
'{CustomerId, CompanyId}' that are
partially contained in the set of
primary keys '{OrderId, CompanyId}' of
the table 'Order'. The set of foreign
keys must be fully contained in the
set of primary keys, or fully not
contained in the set of primary keys
to be mapped to a model.
ERROR
Actually, the relation between Order and Customer uses only one field (you probably dragged the field "CustomerId" from the Order table to the "Id" of the Customer table with the mouse).
SOLUTION
Right-click on the wire that connects Order and Customer and add CompanyId to the relation as well.
PS: The design is correct.
Putting the CompanyId in each table is the right solution in a multi-tenant architecture because it helps to scale (you usually want to select only the records of the logged-in company).
I think storing the company number in each of the tables is hurting you more than helping. I can understand why you want to do this (as the programmer/dba you can just go into any table and 'see' what data belongs to who which is comforting), but it is getting in the way of you setting up the database the way it should be.
Avoid the compound keys and your design gets a whole lot simpler.
If you absolutely have to add a CompanyID column to each table, add it as a regular column and not as part of a composite key. A composite key is mostly used when you have to implement a many-to-many relationship.
As someone mentioned, also create a non-clustered index on CompanyID so that joins to the Company table benefit.
Thanks!
First: like others said, when referencing a foreign key, use the whole primary key in the other table (ie. both fields).
Second, I cannot imagine not using a CompanyID column in most tables in a serious application. Orderdetail might perhaps be an exception in this case (also global lookup tables perhaps, unless they are tenant-dependent). The thing is, you cannot do any safe sort of free-form search on a table without either adding the CompanyID, or doing JOINs up to the point where you reach a table which has that column. The latter obviously costs performance. Perhaps in this case you could make an exception for orderdetail and only search in the joined version (only two tables). Then again, it's not really consistent.
Also, regarding making it a compound key or not: it's possible, but it opens up the possibility that a bug writes information incorrectly (into non-existent or other people's administrations) for the duration of the bug. Try to fix that in production, not to mention explaining to customers why they are seeing their competitors' orders in their system.
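To illustrate "use the whole primary key" from the first point, here is a minimal sketch in Python's built-in sqlite3 rather than SQL Server. The original schema diagram is not reproduced here, so the tables are a guess reconstructed from the key sets named in the EF error messages, and the sample ids are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE Customer (
    CustomerId INTEGER NOT NULL,
    CompanyId  INTEGER NOT NULL,          -- the tenant
    PRIMARY KEY (CustomerId, CompanyId)
);
CREATE TABLE "Order" (
    OrderId    INTEGER NOT NULL,
    CompanyId  INTEGER NOT NULL,
    CustomerId INTEGER NOT NULL,
    PRIMARY KEY (OrderId, CompanyId),
    -- The FK references both columns of Customer's primary key.
    FOREIGN KEY (CustomerId, CompanyId)
        REFERENCES Customer (CustomerId, CompanyId)
);
""")

# Customer 1 belongs to company 100.
conn.execute("INSERT INTO Customer (CustomerId, CompanyId) VALUES (1, 100)")

# Same tenant on both sides: accepted.
conn.execute(
    'INSERT INTO "Order" (OrderId, CompanyId, CustomerId) VALUES (1, 100, 1)')

# Company 200 trying to attach an order to company 100's customer: rejected,
# because no Customer row (1, 200) exists.
try:
    conn.execute(
        'INSERT INTO "Order" (OrderId, CompanyId, CustomerId) VALUES (2, 200, 1)')
    cross_tenant_rejected = False
except sqlite3.IntegrityError:
    cross_tenant_rejected = True

order_count = conn.execute('SELECT COUNT(*) FROM "Order"').fetchone()[0]
```

With the two-column foreign key, the database itself refuses a row that mixes tenants. Whether EF can map this depends on the "fully contained or fully not contained" rule quoted in the error messages, which this sketch does not address.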
What are the different types of keys in RDBMS? Please include examples with your answer.
(I) Super Key – An attribute or a combination of attributes that is used to identify the records uniquely is known as a Super Key. A table can have many Super Keys.
E.g. of Super Key
ID
ID, Name
ID, Address
ID, Department_ID
ID, Salary
Name, Address
Name, Address, Department_ID
So on as any combination which can identify the records uniquely will be a Super Key.
(II) Candidate Key – It can be defined as a minimal Super Key or irreducible Super Key. In other words, an attribute or a combination of attributes that identifies the record uniquely, while none of its proper subsets can identify the records uniquely.
E.g. of Candidate Key
ID
Name, Address
For above table we have only two Candidate Keys (i.e. Irreducible Super Key) used to identify the records from the table uniquely. ID Key can identify the record uniquely and similarly combination of Name and Address can identify the record uniquely, but neither Name nor Address can be used to identify the records uniquely as it might be possible that we have two employees with similar name or two employees from the same house.
(III) Primary Key – A Candidate Key that is used by the database designer for unique identification of each row in a table is known as Primary Key. A Primary Key can consist of one or more attributes of a table.
E.g. of Primary Key - Database designer can use one of the Candidate Key as a Primary Key. In this case we have “ID” and “Name, Address” as Candidate Key, we will consider “ID” Key as a Primary Key as the other key is the combination of more than one attribute.
(IV) Foreign Key – A foreign key is an attribute or combination of attributes in one base table that points to a candidate key (generally the primary key) of another table. The purpose of the foreign key is to ensure referential integrity of the data, i.e. only values that are supposed to appear in the database are permitted.
E.g. of Foreign Key – Let us consider another table, the Department table, with attributes “Department_ID”, “Department_Name”, “Manager_ID”, “Location_ID” and with Department_ID as its Primary Key. Now the Department_ID attribute of the Employee table (the dependent or child table) can be defined as a Foreign Key, as it can reference the Department_ID attribute of the Department table (the referenced or parent table). A Foreign Key value must match an existing value in the parent table or be NULL.
(V) Composite Key – If we use multiple attributes to create a Primary Key then that Primary Key is called Composite Key (also called a Compound Key or Concatenated Key).
E.g. of Composite Key, if we have used “Name, Address” as a Primary Key then it will be our Composite Key.
(VI) Alternate Key – Alternate Key can be any of the Candidate Keys except for the Primary Key.
E.g. of Alternate Key is “Name, Address” as it is the only other Candidate Key which is not a Primary Key.
(VII) Secondary Key – The attributes that are not even the Super Key but can be still used for identification of records (not unique) are known as Secondary Key.
E.g. of Secondary Key can be Name, Address, Salary, Department_ID etc. as they can identify the records but they might not be unique.
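The difference between super keys and candidate keys can be checked mechanically on sample data. Below is a small Python sketch; the four employee rows are invented for the example, and uniqueness in a sample only shows that a column set is not ruled out as a key. Whether it really is a key is a design decision about all data the table may ever hold.

```python
from itertools import combinations

COLUMNS = ("ID", "Name", "Address", "Department_ID")
ROWS = [  # made-up sample data
    (1, "Ann", "12 Oak St", 10),
    (2, "Ann", "7 Elm St", 10),
    (3, "Bob", "12 Oak St", 10),
    (4, "Cy", "3 Pine St", 20),
]

def is_unique(cols):
    """True if no two rows share the same values for the given columns."""
    idx = [COLUMNS.index(c) for c in cols]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(projected)

def super_keys():
    """Every non-empty column set that identifies each sample row uniquely."""
    return [
        frozenset(cols)
        for n in range(1, len(COLUMNS) + 1)
        for cols in combinations(COLUMNS, n)
        if is_unique(cols)
    ]

def candidate_keys():
    """Super keys with no proper subset that is also a super key."""
    keys = super_keys()
    return [k for k in keys if not any(other < k for other in keys)]

all_super = super_keys()
minimal = candidate_keys()
```

On this sample there are ten super keys (every set containing ID, plus the two sets containing both Name and Address) but only two candidate keys, {ID} and {Name, Address}, matching the lists above.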
From here and here (after I googled your title):
Alternate key - An alternate key is any candidate key which is not selected to be the primary key
Candidate key - A candidate key is a field or combination of fields that can act as a primary key field for that table to uniquely identify each record in that table.
Compound key - compound key (also called a composite key or concatenated key) is a key that consists of 2 or more attributes.
Primary key - a primary key is a value that can be used to identify a unique row in a table. Attributes are associated with it. Examples of primary keys are Social Security numbers (associated to a specific person) or ISBNs (associated to a specific book).
In the relational model of data, a primary key is a candidate key chosen as the main method of uniquely identifying a tuple in a relation.
Superkey - A superkey is defined in the relational model as a set of attributes of a relation variable (relvar) for which it holds that in all relations assigned to that variable there are no two distinct tuples (rows) that have the same values for the attributes in this set. Equivalently a superkey can also be defined as a set of attributes of a relvar upon which all attributes of the relvar are functionally dependent.
Foreign key - a foreign key (FK) is a field or group of fields in a database record that points to a key field or group of fields forming a key of another database record in some (usually different) table. Usually a foreign key in one table refers to the primary key (PK) of another table. This way references can be made to link information together and it is an essential part of database normalization
Ólafur forgot the surrogate key:
A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data.
There also exists a UNIQUE KEY.
The main difference between PRIMARY KEY and UNIQUE KEY is that the PRIMARY KEY never takes NULL value while a UNIQUE KEY may take NULL value.
Also, there can be only one PRIMARY KEY in a table while UNIQUE KEY may be more than one.
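A quick sketch of both differences using Python's built-in sqlite3. The `employee` table and its columns are hypothetical. Two caveats are specific to the engine: SQLite needs an explicit NOT NULL on a non-integer PRIMARY KEY column to reject NULLs, and how many NULLs a UNIQUE column accepts varies by database (SQLite allows several, SQL Server's unique constraint allows only one).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    emp_id   TEXT PRIMARY KEY NOT NULL,  -- one primary key per table
    emp_mail TEXT UNIQUE,                -- first unique key, NULL allowed
    badge_no TEXT UNIQUE                 -- second unique key on the same table
)
""")

# Two rows with a NULL in the UNIQUE column emp_mail are both accepted here.
conn.execute("INSERT INTO employee VALUES ('e1', NULL, 'b1')")
conn.execute("INSERT INTO employee VALUES ('e2', NULL, 'b2')")

def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# A NULL primary key is refused.
null_pk_rejected = rejected(
    "INSERT INTO employee VALUES (NULL, 'a@x.test', 'b3')")
# A repeated non-NULL value in a UNIQUE column is refused.
dup_unique_rejected = rejected(
    "INSERT INTO employee VALUES ('e3', NULL, 'b1')")

null_mail_rows = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE emp_mail IS NULL").fetchone()[0]
```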
There is also a SURROGATE KEY. It is useful when one non-prime attribute depends on another non-prime attribute and, when splitting up the table, you do not know which key to choose as the primary key. In that case, use a surrogate key as the primary key. Usually this key is system-defined, numeric, and automatically incremented for new rows. E.g.: MS Access = AutoNumber, MySQL = AUTO_INCREMENT column, SQL Server = IDENTITY column, Oracle = sequence.
Partial Key:
It is a set of attributes that can uniquely identify weak entities that are related to the same owner entity. It is sometimes called a Discriminator.
Alternate Key:
All Candidate Keys excluding the Primary Key are known as Alternate Keys.
Artificial Key:
If no obvious key, either stand alone or compound is available, then the last resort is to simply create a key, by assigning a unique number to each record or occurrence. Then this is known as developing an artificial key.
Compound Key:
If no single data element uniquely identifies occurrences within a construct, then combining multiple elements to create a unique identifier for the construct is known as creating a compound key.
Natural Key:
When one of the data elements stored within a construct is utilized as the primary key, then it is called the natural key.
Sharing the notes I usually keep while reading on the Internet; I hope they are helpful to someone.
Candidate Key or available keys
Candidate keys are the keys that are candidates to become the primary key of a table. In simple words, any key that fulfills all the requirements of a primary key (not null and unique) is a candidate for primary key, and so is known as a candidate key. Every table must have at least one candidate key, but can have several.
Primary Key
The candidate key that is chosen to identify the rows of a table is known as the primary key. There is only one primary key per table. In SQL Server, creating a primary key on a table creates a clustered index on its columns by default, unless the table already has a clustered index or the key is declared NONCLUSTERED.
Foreign Key
Foreign keys are used to define a relationship between two tables. When we want to implement a relationship between two tables, we use a foreign key; the rule it enforces is known as referential integrity. We can create more than one foreign key per table. A foreign key is generally the primary key of one table that appears as a field in another table related to the first. In other words, if table A has a primary key X, and table B has a field X that links to it, then X is a foreign key in B.
Alternate Key or Secondary
If a table has more than one candidate key, then after the primary key has been chosen from them, the remaining candidate keys are known as alternate keys of that table. A very simple example: suppose we have a table named Employee with two columns, EmpID and EmpMail, both NOT NULL and both holding unique values. Both columns are therefore candidate keys. If we make EmpID the primary key of the table, then EmpMail is an alternate key.
Composite Key
When we create a key on more than one column, that key is known as a composite key. For example, take a table Student with two columns, Sid and SrefNo. If we define the primary key on these two columns together, that key is a composite key.
Natural keys
A natural key is one or more existing data attributes that are unique to the business concept. For the Customer table there were two candidate keys, in this case CustomerNumber and SocialSecurityNumber. Link http://www.agiledata.org/essays/keys.html
Surrogate key
Introduce a new column, called a surrogate key, which is a key that has no business meaning. An example of which is the AddressID column of the Address table in Figure 1. Addresses don't have an "easy" natural key because you would need to use all of the columns of the Address table to form a key for itself (you might be able to get away with just the combination of Street and ZipCode depending on your problem domain), therefore introducing a surrogate key is a much better option in this case. Link http://www.agiledata.org/essays/keys.html
Unique key
A unique key is a superkey--that is, in the relational model of database organization, a set of attributes of a relation variable for which it holds that in all relations assigned to that variable, there are no two distinct tuples (rows) that have the same values for the attributes in this set
Aggregate or Compound keys
When more than one column is combined to form a unique key, their combined value is used to access each row and maintain uniqueness. These keys are referred to as aggregate or compound keys. Values are not combined, they are compared using their data types.
Simple key
A simple key is made from only one attribute.
Super key
A superkey is defined in the relational model as a set of attributes of a relation variable (relvar) for which it holds that in all relations assigned to that variable there are no two distinct tuples (rows) that have the same values for the attributes in this set. Equivalently a super key can also be defined as a set of attributes of a relvar upon which all attributes of the relvar are functionally dependent.
Partial Key or Discriminator key
It is a set of attributes that can uniquely identify weak entities that are related to the same owner entity. It is sometimes called a Discriminator.