JOIN without keys? - T-SQL

In T-SQL (SQL Server 2008), is it technically correct to INNER JOIN two tables on key-less columns (columns with no defined relationships)? Thank you.

Yes. I often join a child of a child table directly, bypassing the "middle" table.
Foreign keys are constraints for data integrity: they have nothing to do with query formulation. You should have them anyway, unless you don't care about data quality and integrity.
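For illustration, a minimal sketch (the table and column names here are hypothetical): the join compiles and runs the same way whether or not a foreign key relates the two columns.

SELECT e.EmployeeName, d.DepartmentName
FROM dbo.Employees AS e
INNER JOIN dbo.Departments AS d
    ON e.DepartmentCode = d.DepartmentCode; -- no FOREIGN KEY needs to exist between these columns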

You can join on any two fields that contain the same data (and they should be the same datatype as well). If the fields are indexed, you should not have performance issues either, unless you are joining on two varchar(4000) fields. You may even need to do this when you have a poor database design and one column is serving more than one purpose, especially if you have used an EAV model.
However, what you won't get in this scenario is data integrity. Without a foreign key to enforce the rules, you may find that the related table has values with no match in the parent table. This can result in rows that should appear in queries not appearing, or rows that cannot be correctly interpreted. So it is good practice to set up foreign keys where they are applicable.
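If you do add the constraint later, a minimal sketch looks like this (hypothetical names; note that existing orphaned rows must be cleaned up first, or the constraint created WITH NOCHECK, for the ALTER to succeed):

ALTER TABLE dbo.Orders
    ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerID) REFERENCES dbo.Customers (CustomerID);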

It will work, though it might not be very efficient... You should definitely create the foreign keys if you can.

Technically it will work. No problems there. However, sometimes the query plan generator will use FKs to help make better use of indexes. But from a design standpoint, it's not such a great idea. You should be using FKs as much as possible, especially if you want to go the ORM route.

Related

PostgreSQL - What are the best practices for the USING table expression?

I'm taking a course about PostgreSQL, coming from a MySQL background, and I stumbled upon the USING table expression. I know it is a shorthand to, well, shorten the ON conditions for JOINs, but I have questions:
https://www.postgresql.org/docs/13/queries-table-expressions.html
Are they actually used?
I think that having, say, a "customerid" PRIMARY KEY on some "customers" table just to be able to use USING is way less convenient than just having a normal "id" PRIMARY KEY as I've always done; is it bad practice?
USING clauses are used quite often. It is rather a design choice for the tables in a database. Sometimes customers.id is used in the primary table and sometimes customers.customer_id.
Usually you'll see customer_id as foreign keys in other tables.
If in your queries you plan to do a lot of simple joins on foreign vs. primary keys, structuring the tables to be able to use the USING clause might be worth it if it simplifies many queries.
I would say neither of the two options could be considered bad practice.
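To make the shorthand concrete, a small sketch (table and column names are hypothetical); the two queries are equivalent, and USING simply requires the join column to have the same name on both sides:

SELECT c.name, o.total
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.customer_id;

SELECT c.name, o.total
FROM customers AS c
JOIN orders AS o USING (customer_id); -- with USING, SELECT * would return customer_id only once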

Normalize and use primary / foreign keys SQL Server 2008 R2

NOTE: I have never done this before:
What are some steps or documentation to help normalize tables/views in a database? Currently, there are several tables and views in the database that do not use the primary/foreign key concept and repeat the same information across multiple tables.
I'd like to clean this up and also set up a process that would keep relationships updated. For example, if a person's zipcode changes or a record is removed, then the related rows in other tables are updated automatically.
NOTE: My question is about normalizing existing database tables. The tables are live, so how do I approach normalization? Do I create a brand new database with the table structure I want and then move the data to that database? Once the data is moved, do I plug in stored procedures and imports?
This question is somewhat broad, so I will only explain the concept.
Views are generally used for reporting/data presentation purposes and therefore I would not try to normalise them. Your case may be different.
You also need to be clear about primary / foreign key concept:
Lack of actual constraints (e.g. PRIMARY KEY, FOREIGN KEY) defined on the table does not mean that the tables do not have logical relationships on columns.
Data maintenance can be implemented in Triggers.
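As a sketch of the trigger idea (all names hypothetical; T-SQL, since the question targets SQL Server 2008 R2), a zipcode change on the parent table could be propagated like this:

CREATE TRIGGER trg_Person_ZipSync ON dbo.Person
AFTER UPDATE
AS
BEGIN
    -- copy the changed zipcode to the dependent address rows
    UPDATE a
    SET a.ZipCode = i.ZipCode
    FROM dbo.Address AS a
    JOIN inserted AS i ON i.PersonID = a.PersonID;
END;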
If you really have a situation where a lot of highly de-normalised data exists in tables for no apparent reason and you want to normalise it, then this problem can be approached in two ways:
Full re-write - I would recommend this for small / new apps.
"Gradual" refactoring - for large / mature applications, where the underlying data relationships are complex and / or may not be fully understood.
Within "gradual" refactoring there are a few options as well:
2.a. You take one old table and replace it with a new table, changing all code that uses the old table to use the new one at the same time. For large systems this can be problematic, as you simply may not be aware of all the places that reference the table. On the other hand, it may be useful where the table structure change is not significant and/or the number of dependencies is small.
2.b. Another way is to create new table(s) (in the same database) in the shape / form you desire. The current tables are then replaced with views that return data identical to the old tables but sourced from the new tables. This approach removes / minimises the need to modify all dependencies immediately. The drawback is that the view replacing an old table can become rather complex, especially if INSTEAD OF triggers need to be implemented on it.
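A minimal sketch of approach 2.b (all names hypothetical): a de-normalised Person table is split into normalised tables, and a view with the old name keeps existing queries working.

-- New normalised tables
CREATE TABLE dbo.PersonCore (
    PersonID int PRIMARY KEY,
    Name nvarchar(100) NOT NULL
);
CREATE TABLE dbo.PersonAddress (
    PersonID int NOT NULL REFERENCES dbo.PersonCore (PersonID),
    ZipCode char(5) NOT NULL
);

-- View with the old table's name and shape, so existing code keeps working
-- (the old dbo.Person table would first be renamed, then dropped after migration)
CREATE VIEW dbo.Person AS
SELECT c.PersonID, c.Name, a.ZipCode
FROM dbo.PersonCore AS c
JOIN dbo.PersonAddress AS a ON a.PersonID = c.PersonID;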

Postgres: how to delete entries in multiple tables

I have a task to implement a 'rollback' (not the usual rollback) function for a batch of entries from different tables. For example:
def rollback(cursor, entries):
    # entries is a dict of the form:
    # {'table_name1': [id1, id2, ...], 'table_name2': [id1, id2, ...], ...}
    ...
I need to delete the entries in each table_name. But because these entries may have relationships between them, it is a bit complex. My idea is in several steps:
Find out all columns from all tables that are nullable.
Update all entries, setting all nullable columns to null. After this step there should be no circular dependencies (if there were, I think the rows couldn't have been inserted into the tables in the first place).
Find out their dependencies and make a topological sort.
Delete one by one.
My questions are:
Does the idea make sense?
Has anyone done something similar before? And how?
How do I query the meta tables for step 3? I'm quite new to PostgreSQL.
Any ideas and suggestions would be appreciated.
(1) and (2) are not right. It's quite likely that there will be columns defined NOT NULL REFERENCES othertable(othercol) - there are in any normal schema.
What I think you need to do is to sort the foreign key dependency graph to find an ordering that allows you to DELETE, table-by-table, the data you need to remove. Be aware that circular dependencies are possible due to deferred foreign key constraints, so you need to demote/ignore DEFERRABLE INITIALLY DEFERRED constraints; you can temporarily violate those so long as it's all consistent again at COMMIT time.
Even then you might run into issues. What if a client used SET CONSTRAINTS to make a DEFERRABLE INITIALLY IMMEDIATE constraint DEFERRED during a transaction? You'd then fail to cope with the circular dependency. To handle this, your code should issue SET CONSTRAINTS ALL DEFERRED before proceeding.
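Sketched inside the deleting transaction, that looks like:

BEGIN;
SET CONSTRAINTS ALL DEFERRED; -- every DEFERRABLE constraint is now checked at COMMIT
-- ... DELETE from the tables in dependency order here ...
COMMIT;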
You will need to look at the information_schema or the PostgreSQL-specific system catalogs to work out the dependencies. It might be worth a look at the pg_dump source code too, since it tries to order dumped tables to avoid dependency conflicts. You'll be particularly interested in the pg_constraint catalog, or its information_schema equivalents information_schema.referential_constraints, information_schema.constraint_table_usage and information_schema.constraint_column_usage.
You can use the either the information_schema or pg_catalog. Don't use both. information_schema is SQL-standard and more portable, but can be slow to query and doesn't have all the information pg_catalog contains. On the flip side, pg_catalog's schema isn't guaranteed to remain compatible across major versions (like 9.1 to 9.2) - though it generally does - and its use isn't portable.
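As a starting point for mapping the dependencies, a hedged sketch of one way to list foreign-key constraints from pg_catalog (this is one common phrasing, not the only one):

SELECT con.conname AS constraint_name,
       con.conrelid::regclass AS referencing_table,
       con.confrelid::regclass AS referenced_table,
       con.condeferrable AS is_deferrable
FROM pg_catalog.pg_constraint AS con
WHERE con.contype = 'f' -- 'f' = foreign key constraints
ORDER BY con.conrelid::regclass::text;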

Advantages and disadvantages of having composite primary key

Instead of having a composite primary key (the table in question maintains the relationship between two tables, each representing an entity), the proposed design has an identity column as the primary key, with a unique constraint enforced over the two columns that hold the primary keys of the related entities.
For me, having an identity column for each relationship table breaks the normalisation rules.
What are the industry standards?
What are the considerations to make before making the design decision on this?
Which approach is right?
There are lots of tables where you may want to have an identity column as a primary key. However, in the case of a M:M relationship table you describe, best practice is NOT to use a new identity column for the primary key.
RThomas's link in his comment provides the excellent reasons why the best practice is to NOT add an identity column. Here's that link.
The cons will outweigh the pros in pretty much every case, but since you asked for pros and cons I put a couple of unlikely pros in as well.
Cons
Adds complexity
Can lead to duplicate relationships unless you enforce uniqueness on the relationship (which a primary key would do by default).
Likely slower: db must maintain two indexes rather than one.
Pros
All the pros are pretty sketchy
If you had a situation where you needed to use the primary key of the relationship table as a join to a separate table (e.g. an audit table?), the join would likely be faster. (As noted, though, adding and removing records will likely be slower. Further, if your relationship table relates tables that themselves use unique IDs, the speed increase from using one identity column in the join vs. two will be minimal.)
The application, for simplicity, may assume that every table it works with has a unique ID as its primary key. (That's poor design in the app but you may not have control over it.) You could imagine a scenario where it is better to introduce some extra complexity in the DB than the extra complexity into such an app.
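For concreteness, a sketch of the two designs for a hypothetical M:M table (T-SQL-style syntax; all names invented):

-- Composite primary key (generally preferred for a pure junction table)
CREATE TABLE dbo.StudentCourse (
    StudentID int NOT NULL REFERENCES dbo.Student (StudentID),
    CourseID  int NOT NULL REFERENCES dbo.Course (CourseID),
    PRIMARY KEY (StudentID, CourseID)
);

-- Identity primary key plus a unique constraint (two indexes to maintain)
CREATE TABLE dbo.StudentCourse2 (
    StudentCourseID int IDENTITY(1,1) PRIMARY KEY,
    StudentID int NOT NULL REFERENCES dbo.Student (StudentID),
    CourseID  int NOT NULL REFERENCES dbo.Course (CourseID),
    CONSTRAINT UQ_StudentCourse UNIQUE (StudentID, CourseID)
);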
Cons:
Composite primary keys have to be imported into all referencing tables. That means larger indexes and more code to write (e.g. the joins, the updates). If you are systematic about using composite primary keys, it can become very cumbersome.
You can't update a part of the primary key. E.g. if you use (university_id, student_id) as the primary key in a table of university students, and one student changes university, you have to delete and recreate the record.
Pros:
Composite primary keys allow you to enforce a common kind of constraint in a powerful and seamless way. Suppose you have a table UNIVERSITY, a table STUDENT, a table COURSE, and a table STUDENT_COURSE (which student follows which course). If there is a constraint that you always have to be a student of university A in order to follow a course of university A, then that constraint will be automatically validated if university_id is a part of the composite keys of both STUDENT and COURSE.
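A sketch of how that constraint falls out of the composite keys (using the table names from the example above; the column types are assumptions):

CREATE TABLE STUDENT (
    university_id int NOT NULL,
    student_id    int NOT NULL,
    PRIMARY KEY (university_id, student_id)
);
CREATE TABLE COURSE (
    university_id int NOT NULL,
    course_id     int NOT NULL,
    PRIMARY KEY (university_id, course_id)
);
CREATE TABLE STUDENT_COURSE (
    university_id int NOT NULL,
    student_id    int NOT NULL,
    course_id     int NOT NULL,
    PRIMARY KEY (university_id, student_id, course_id),
    -- both foreign keys share university_id, so a student can only
    -- follow courses offered by their own university
    FOREIGN KEY (university_id, student_id) REFERENCES STUDENT (university_id, student_id),
    FOREIGN KEY (university_id, course_id) REFERENCES COURSE (university_id, course_id)
);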
You have to create all the key columns in each table wherever the composite key is used as a foreign key. This is the biggest disadvantage.

Does the entity framework preserve ordering when it does inserts into the database?

We plan on using identity columns in our SQL Server database. Currently we are using GUIDs to generate unique IDs, but it turns out that the ordering is relevant, so we are considering switching to identity columns.
Since ordering is relevant, we want to make sure that the order in which we add objects to the entity context is also the order in which they are inserted into the database. This is relevant since SQL Server will be generating the values for the identity column.
Is this guaranteed by the entity framework? If not, what is an efficient solution to generating your own unique integer ids for a database that is being updated from different processes.
I am just guessing here (although it should be pretty easy to test), but I don't think EF can guarantee the order. I am pretty sure that the internal structure is based on an IEnumerable type, probably a List, which is just enumerated during insert, and enumeration is, as far as I know, not guaranteed to be in the same order every time.
I would instead add a dedicated "sort order" column to your database table and take it from there.
I wouldn't rely on insert order as the order in which your records are returned. Sure, it'll work most of the time, but what if you ever need to insert a row somewhere in the middle? I'd say your best bet is to add an ordinal column and actively generate ordinals for each row as you'd like them to be returned.
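A minimal sketch of that ordinal-column approach (hypothetical names, T-SQL syntax):

CREATE TABLE dbo.Items (
    ItemID    int IDENTITY(1,1) PRIMARY KEY,
    SortOrder int NOT NULL, -- set explicitly by the application
    Payload   nvarchar(200) NOT NULL
);

-- Read back in the intended order, independent of physical insert order:
SELECT ItemID, Payload
FROM dbo.Items
ORDER BY SortOrder;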