How to establish a relationship between two columns in one table (T-SQL)

I want to establish a one-to-one relationship between two columns (a program code and a test code) in the same table. I want all tests with the same test code to have the same program code.
My first thought was to use a UDF to find cases where the same test code corresponds to two different programs. I learned that this won't work because T-SQL only checks UDFs in check constraints after INSERTs, not after UPDATEs:
Why is T-SQL allowing me to violate a check constraint that uses a UDF?
My next thought was to move the logic from the UDF into the check constraint itself. But T-SQL says that subqueries are not allowed in a check constraint. This also means I can't use EXISTS syntax (which I think also uses a subquery).
ALTER TABLE [dbo].[mytable] WITH CHECK ADD CONSTRAINT [oneProgramPerTest] CHECK
(
(select COUNT(*)
from mydb.dbo.mytable u1
inner join
mydb.dbo.mytable u2 on u1.testcode=u2.testcode and u1.progcode <> u2.progcode
)=0
)
Unless there is some way to enforce this logic without (1) a UDF or (2) a subquery, it seems like I need to create a "dummy" table of program codes and then enforce a one-to-one relationship between test codes from myTable and the dummy table. This seems really ugly, so there has got to be a better way. Right?

Have you read about normalization (and if you haven't, why are you designing a database?). You should have a structure with
tableA
id (PK)
programcode
other fields
tableB
programcode (PK)
testcode
Add a formal foreign key between the two tables and define programcode as the PK in tableB.
Then to get the data you want:
select <Name specific fields, never use select *>
from tableA a
join tableB b on a.programcode = b.programcode
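To see the suggested layout behave, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. The table and column names follow the answer above; the UNIQUE constraint on testcode and the sample codes ('P1', 'T1' and so on) are assumptions added for illustration and are not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE tableB (
    programcode TEXT PRIMARY KEY,
    testcode    TEXT NOT NULL UNIQUE   -- assumption: makes the mapping one-to-one
);
CREATE TABLE tableA (
    id          INTEGER PRIMARY KEY,
    programcode TEXT NOT NULL REFERENCES tableB (programcode)
);
""")

conn.execute("INSERT INTO tableB VALUES ('P1', 'T1')")
conn.execute("INSERT INTO tableA (programcode) VALUES ('P1')")


def try_insert(sql):
    """Run one statement; return True if accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A second program for the same test code violates the UNIQUE constraint.
second_program_same_test = try_insert("INSERT INTO tableB VALUES ('P2', 'T1')")
# A row pointing at a program that does not exist violates the foreign key.
unknown_program = try_insert("INSERT INTO tableA (programcode) VALUES ('P9')")

rows = conn.execute(
    "SELECT a.id, a.programcode, b.testcode "
    "FROM tableA a JOIN tableB b ON a.programcode = b.programcode"
).fetchall()
```

Because the pairing lives in exactly one row of tableB, there is nothing left for a check constraint, UDF or subquery to police.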

Is there a way to reserve a range in a postgres sequence?

I'm writing a program that generates large numbers of rows to be inserted into a PostgreSQL database. Due to the presence of multiple indices, this process is getting slower over time. That's why I want to move to using COPY which seems to be significantly faster. The problem is that one of the tables has a foreign key into another, and I do not have the IDs for the foreign key column.
I was thinking that maybe if I could reserve a range in the sequence used for the primary key of the first table, I could do the ID assignment manually but I don't think Postgres natively supports such an operation. Is there a way to achieve this another way?
First off, from your source data identify the business key for the parent and child tables. Create those tables with a unique constraint on the business key. This will not be the surrogate (auto-generated) PK. Now create a staging table with all the necessary columns (except the FK). Since you will most likely be using it across sessions, this is a permanent table, but the intent is single-time usage. With the staging table loaded, insert into the parent table, generating the PK from the sequence. Then insert into the child, selecting the FK by referencing the business key from the parent.
insert into parent ( <columns> )
select <columns>
from stage
on conflict (business_key) do nothing;
insert into child ( <columns>, parent_id )
select s.<columns>, a.id
from stage s
join parent a on s.business_key = a.business_key
on conflict (parent_id, child_bk) do nothing;
Since the above is rather obscure in the abstract, see a concrete example here. There is no need to attempt to "reserve a range"; just let the PK/FK develop naturally.
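The staging approach above can be sketched end to end. This is only an illustration under stated assumptions: it runs on Python's built-in sqlite3 module rather than PostgreSQL (SQLite accepts the same ON CONFLICT ... DO NOTHING clause, but wants a WHERE clause after the SELECT), and the names parent_bk, child_bk, load_from_stage and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (
    id        INTEGER PRIMARY KEY,      -- surrogate key, generated on insert
    parent_bk TEXT NOT NULL UNIQUE      -- business key
);
CREATE TABLE child (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL REFERENCES parent (id),
    child_bk  TEXT NOT NULL,
    UNIQUE (parent_id, child_bk)
);
CREATE TABLE stage (parent_bk TEXT, child_bk TEXT);  -- no FK column here
""")
conn.executemany(
    "INSERT INTO stage VALUES (?, ?)",
    [("acme", "widget"), ("acme", "gadget"), ("globex", "widget")],
)


def load_from_stage(conn):
    """Copy staged rows into parent and child, resolving the FK by business key."""
    conn.execute("""
        INSERT INTO parent (parent_bk)
        SELECT parent_bk FROM stage
        WHERE true   -- SQLite needs a WHERE to parse ON CONFLICT after a SELECT
        ON CONFLICT (parent_bk) DO NOTHING
    """)
    conn.execute("""
        INSERT INTO child (parent_id, child_bk)
        SELECT p.id, s.child_bk
        FROM stage s
        JOIN parent p ON p.parent_bk = s.parent_bk
        WHERE true
        ON CONFLICT (parent_id, child_bk) DO NOTHING
    """)


load_from_stage(conn)
load_from_stage(conn)  # running it again adds nothing, thanks to DO NOTHING

parent_count = conn.execute("SELECT count(*) FROM parent").fetchone()[0]
pairs = conn.execute(
    "SELECT p.parent_bk, c.child_bk FROM child c "
    "JOIN parent p ON p.id = c.parent_id ORDER BY p.parent_bk, c.child_bk"
).fetchall()
```

In PostgreSQL the stage table would be filled with COPY, which is where the speed-up in the question comes from; the two INSERT ... SELECT statements then run once per batch.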

Create unique integer id column for result rows of union query

I have a view as below in which I union several tables, and I'm thinking it might be a good idea to have a unique row number for each row in the result set. The immediate reason is that I have an admin tool which doesn't know I'm using a view rather than an ordinary table, and which expects a unique id to be present, but I'm now speculating it might be worth doing more generally (i.e. it may make sense to do this in certain theoretical terms; discussion on this would be welcome). I'm wondering how to do this in PostgreSQL.
CREATE VIEW subscriptions AS (
SELECT subscriber_id, course, end_at
FROM subscriptions_individual_stripe
UNION ALL SELECT subscriber_id, course, end_at
FROM subscriptions_individual_bank_transfer
ORDER BY end_at DESC);
Discussion
The reason these are separate tables is of course that they are actually different entities, and yet I also need to be able to contemplate them in a combined way, hence the VIEW. This is my way of avoiding so-called 'polymorphic relationships' in certain popular web frameworks.
I have a tool that expects an id and while my first thought was that views don't need a unique key, on the other hand, maybe they do...?
Reason being two records could exist in one of the UNIONed tables which were only unique by virtue of the primary key. If one does not include the primary key, the union should remove one of those, so a record would be lost. Should we also take that into account, i.e. select the primary key (here an integer id) for each of the UNIONed tables, but, "convert it" to some other unique id, so the view has its own unique integer primary key? Of course this won't be usable in terms of referencing anything in the original UNIONed tables, but I'm OK with that (The view is a terminal point of my analysis, I don't intend to do anything further with it, and of course it is not writable).
Update
I'm accepting S-Man's answer below because it is a solution to the question I asked, however, as pointed out, the row_number() must not be treated as if it was a real identifier because it will not be.
So as an important aside, I'm left wondering what row_number() is really intended for then. Perhaps it's (mainly? occasionally?) useful where you want to output some query when you plan to export the data somewhere else (i.e. seems almost spreadsheet-ish), and you abandon any sense of it being integrated with the rest of your database?
Table inheritance may be better as Abelisto has pointed out in the comments.
You can add a row number to the UNION result using the row_number() window function:
CREATE VIEW v_myview AS
SELECT
row_number() OVER (ORDER BY ...) AS id,
*
FROM (
SELECT ...
UNION
SELECT ...
) AS foo;
The main problem with this is that you should never treat this id as a real identifier, because the data in the underlying tables can change. One table might produce a few more records today than it did yesterday, in which case the generated row numbers would no longer point to the same records as before.
Edit: Removed the md5 solution I added before because of some problems with uniqueness on same data.
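The warning about unstable ids is easy to reproduce. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (window functions need SQLite 3.25 or newer); the view and table names come from the question, while the sample rows and the snapshot helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subscriptions_individual_stripe
    (subscriber_id INTEGER, course TEXT, end_at TEXT);
CREATE TABLE subscriptions_individual_bank_transfer
    (subscriber_id INTEGER, course TEXT, end_at TEXT);

CREATE VIEW subscriptions AS
SELECT row_number() OVER (ORDER BY end_at DESC) AS id, foo.*
FROM (
    SELECT subscriber_id, course, end_at FROM subscriptions_individual_stripe
    UNION ALL
    SELECT subscriber_id, course, end_at FROM subscriptions_individual_bank_transfer
) AS foo;

INSERT INTO subscriptions_individual_stripe VALUES (1, 'sql', '2024-01-31');
INSERT INTO subscriptions_individual_bank_transfer VALUES (2, 'sql', '2024-02-28');
""")


def snapshot():
    """Map each generated id to the subscriber it currently points at."""
    return dict(conn.execute("SELECT id, subscriber_id FROM subscriptions").fetchall())


before = snapshot()
# One new row with a later end_at shifts every existing row down by one.
conn.execute("INSERT INTO subscriptions_individual_stripe VALUES (3, 'sql', '2024-03-31')")
after = snapshot()
```

Here id 1 refers to subscriber 2 before the insert and to subscriber 3 afterwards, so the number is fine for display or for a tool that only needs distinct values within one result set, but not for anything that stores the id and looks it up later.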

Fetching rows from multiple tables with UNION ALL or using one table in production?

I know that for a relational database like PostgreSQL, using separate tables would be more efficient, but I'm concerned about performance because the most frequently executed query will fetch rows from multiple tables using UNION ALL.
I have two options for handling this problem. The first one is:
table1 -> column1, column2
table2 -> column1, column2
table3 -> column1, column2, column3
In this solution I have to use 3 different queries merged with UNION ALL in production, and this query will be run every time a user logs in to the system (it is the most frequently executed query in the system).
The other is:
table -> column1, column2, typeColumn, extraColumnForTable3
In this solution I have to create an extra column, typeColumn, to distinguish which type the row is. I also have to create a column extraColumnForTable3 for the table3 type; it will be NULL for rows of the table1 and table2 types. In this solution the most frequently executed query will include only one SELECT statement.
There will be millions of rows in production, so I'm concerned about performance. NULL values may occupy extra space in the database, but I think that is negligible. I will use a partial index that excludes NULL values, so I don't think it will affect the other queries that fetch specific types. Which one do you think is more efficient in production?
In general I find that extensive use of UNION suggests bad database design. There are cases where UNION and UNION ALL make sense but they should be relatively rare outside of recursive common table expressions.
PostgreSQL provides a fairly large number of options for keeping performance on a single table manageable, and as you point out partial indexes are a very good way to manage this problem.
The major problem with breaking up tables such that such UNION statements are common is that it makes primary and foreign key management quite problematic. In general it is almost always far better to make sure your data structure is clear and manageable first, and then worry about optimization than it is to worry about optimization and then try to make the optimized solution manageable.
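For reference, the single-table option from the question might look like the sketch below. It runs on Python's built-in sqlite3 module as a stand-in for PostgreSQL (both support partial indexes with a WHERE clause). The column names are the question's; the table name items, the index name, the CHECK constraint, the sample rows, and the idea that column1 identifies the user are all assumptions for illustration. It shows the shape of the design only and says nothing about performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    column1              INTEGER NOT NULL,   -- assumed to identify the user
    column2              TEXT    NOT NULL,
    typeColumn           TEXT    NOT NULL
        CHECK (typeColumn IN ('table1', 'table2', 'table3')),
    extraColumnForTable3 TEXT                -- NULL for table1 and table2 rows
);
-- Partial index: only rows that actually carry the extra column are indexed.
CREATE INDEX items_extra_idx ON items (extraColumnForTable3)
    WHERE extraColumnForTable3 IS NOT NULL;
""")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        (1, "a", "table1", None),
        (1, "b", "table2", None),
        (1, "c", "table3", "x"),
        (2, "d", "table3", "y"),
    ],
)

# The frequent query is a single SELECT instead of three joined by UNION ALL.
all_for_user = conn.execute(
    "SELECT column2, typeColumn FROM items WHERE column1 = ? ORDER BY column2",
    (1,),
).fetchall()

# A type-specific lookup that the partial index is able to serve.
only_type3 = conn.execute(
    "SELECT column2 FROM items WHERE extraColumnForTable3 = ?",
    ("x",),
).fetchall()
```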

What is the difference between these two T-SQL statements?

In a SSIS package at work there are some SQL tasks that create staging tables for holding import data. All the statements take this form:
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.tbNewTable') AND type in (N'U'))
BEGIN
TRUNCATE TABLE dbo.tbNewTable
END
ELSE
BEGIN
CREATE TABLE dbo.tbNewTable (
ColumnA VARCHAR(10) NULL,
ColumnB VARCHAR(10) NULL,
ColumnC INT NULL
) ON PRIMARY
END
In Itzik Ben-Gan's T-SQL Fundamentals I see a different form of statement for creating a table:
IF OBJECT_ID('dbo.tbNewTable', 'U') IS NOT NULL
BEGIN
DROP TABLE dbo.tbNewTable
END
CREATE TABLE dbo.tbNewTable (
ColumnA VARCHAR(10) NULL,
ColumnB VARCHAR(10) NULL,
ColumnC INT NULL
) ON PRIMARY
Each of these appears to do the same thing. After execution, there will be an empty table called tbNewTable in the dbo schema.
Are there any practical or theoretical differences between the two? What implications might they have?
The first one assumes that if the table exists, it has the same columns as those it would create. The second one does not make that assumption. So if a table with that name happened to exist and had a different set of columns, the two would have very different results.
The first will not actually DROP the table -- it merely TRUNCATES all the data in said table. Hence why the CREATE is guarded.
Thus the form with the DROP will allow the subsequent CREATE to change the schema (when the new table is created) even if tbNewTable previously existed.
Because the DROP/CREATE alters the database schema, it may also not be allowed in all cases. For instance, a view created WITH SCHEMABINDING will prevent the table from being dropped. (This also holds true for more general FK relationships, should any exist.)
...when SCHEMABINDING is specified, the base table or tables cannot be modified in a way that would affect the view definition.
The TRUNCATE should be marginally faster in one of those constant "don't care" ways: there should be no performance consideration given to one over the other.
There are also permission differences. TRUNCATE only requires the ALTER permission.
The minimum permission required is ALTER on table_name. TRUNCATE TABLE permissions default to the table owner...
Happy coding.
These are very different.
The first does an equality check on the sys.objects system table and looks to see if there is a matching table name. If so, it truncates the table, removing all rows but keeping the table structure itself, i.e. the actual table is never dropped.
In the second, the check to make sure that the table exists is implicitly done using the OBJECT_ID() method. If so, the table is dropped completely - rows and structure.
If you have a primary and foreign key constraint on the table, you'll certainly have issues dropping it completely... and if you have other tables that are linked to the table you are trying to 'truncate' you'll have issues there too, unless you have cascade deletion turned on.
I tend to dislike either construction in an SSIS package. I create the tables in a deployment script and I want the package to fail if one of the tables I use is missing later on because then something drastically wrong has happened and I want to investigate what before I try putting data anywhere.

Django-ORM Left join with all columns of both tables

I have two tables, A and B, and I need all the columns of both tables using the Django ORM (left join).
I am a newbie to Django and programming, please help.
One way is to use the .values() method on your queryset (though what you are asking is not very clear). This returns dictionaries rather than model instances, but behaves more like a left join done in SQL directly against the database, i.e. it returns the rows with null entries from table B.
This assumes table A has a foreign key to table B in the models file:
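# 'tableB' below stands for the name of the foreign key field on TableA
TableA.objects.filter(<your filters here>).values(
'field1', 'field2', ...,                   # columns of table A
'tableB__field1', 'tableB__field2', ...)   # columns of table B, reached through the foreign key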
https://docs.djangoproject.com/en/1.3/topics/db/aggregation/#values