Database design preference: Using a DateTime and a BIT in SQL 2000 - tsql

I need to explain this by example:
Is there a best practice or preference for specifying a DateTime and BIT in a database table?
In my database I have a Widget table. I need to know if a widget is "Closed" and its "Closed Date". Business rules say that if a widget is closed, it must have a closed date. If a widget is not closed, it should not have a "Closed Date".
To design this, I could do the following:
(Example 1):
CREATE TABLE [Widget]
(
[WidgetID] INT IDENTITY(1,1)
,[ClosedDate] DATETIME NULL
)
or (Example 2):
CREATE TABLE [Widget]
(
[WidgetID] INT IDENTITY(1,1)
,[IsClosed] BIT NOT NULL CONSTRAINT [DF_Widget_IsClosed] DEFAULT (0)
,[ClosedDate] DATETIME NULL
)
I think that Example 1 is cleaner because it is one less column to have to worry about. But, whenever I need to evaluate whether a Widget is Closed, I would need an extra step to figure out if the ClosedDate column IS NOT NULL.
Example 2 creates extra overhead because now I have to keep both the IsClosed and ClosedDate values in sync.
Is there a best practice when designing something like this?
Would querying the table be more performant for Example 2? Is there any reason why I should choose one design over the other?
Note: I would be accessing this value through an ORM tool as well as Stored Procedures.

I think that option 1 is better. Data integrity is easier to maintain (it is impossible to have a closed date alongside a flag that says the opposite), it takes less disk space in the case of extra large tables, and queries will still be performant and clear for teammates to understand.

The first is better. Checking for null is cheap, whereas keeping a separate flag makes it possible to have a closed date yet not be closed.
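For illustration, a minimal example of what that NULL check looks like in queries against the table from the question:

-- Closed vs. open widgets are distinguished purely by the nullable date.
SELECT [WidgetID] FROM [Widget] WHERE [ClosedDate] IS NOT NULL;  -- closed widgets
SELECT [WidgetID] FROM [Widget] WHERE [ClosedDate] IS NULL;      -- open widgets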

I would make the IsClosed column a computed column.
CREATE TABLE [Widget](
[WidgetID] INT IDENTITY(1,1),
[ClosedDate] DATETIME NULL,
IsClosed AS CAST(CASE WHEN ClosedDate IS NULL THEN 0 ELSE 1 END AS BIT)
)
The reason is that you are not storing anything and you can now code your application code and stored procs to use this column. If your business rule ever changes you can convert this into a real column and you will not need to change other code. Otherwise you will have business logic sprinkled throughout your application code and stored procs. This way, it is only in 1 place.
Finally, when you move to SQL 2005 you can add the PERSISTED clause, so the value is stored, improving performance slightly, and you will not have an issue with keeping them in sync.
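A minimal sketch of what that looks like once PERSISTED is available (SQL 2005 and later), using the same table as above:

CREATE TABLE [Widget](
[WidgetID] INT IDENTITY(1,1),
[ClosedDate] DATETIME NULL,
-- Stored on disk and kept in sync by the engine, not by application code.
IsClosed AS CAST(CASE WHEN ClosedDate IS NULL THEN 0 ELSE 1 END AS BIT) PERSISTED
)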

I would not assign semantic meaning to NULL. Doing so will bubble through your business logic and you will get code like ...
public class Widget
{
// stuff
public bool IsClosed
{
    get
    {
        // what do you put here?
        // it was null in the db so you have to use DateTime.MinValue or some such sentinel.
        return ( _closeDate == ?? );
    }
}
// more stuff
}
Using null in that fashion is bad. NULL (and null) mean "I don't know". You are assigning semantic meaning to that answer when in reality, you should not. The closed status is the closed status and the closed date is the closed date, don't combine them. (God forbid you ever want to re-open a Widget but still remember when it got closed in the first place, for example.)
Eric Lippert has a nice blog post on using null in this way (kinda) as well.
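If you do go with Example 2 and keep both columns, a CHECK constraint can make the database enforce the stated business rule so the flag and the date cannot drift apart. A minimal sketch (it would need relaxing if you later want re-opened widgets to keep their old closed date):

CREATE TABLE [Widget]
(
[WidgetID] INT IDENTITY(1,1)
,[IsClosed] BIT NOT NULL CONSTRAINT [DF_Widget_IsClosed] DEFAULT (0)
,[ClosedDate] DATETIME NULL
-- Closed widgets must have a date; open widgets must not.
,CONSTRAINT [CK_Widget_Closed] CHECK
 (
   ([IsClosed] = 1 AND [ClosedDate] IS NOT NULL)
   OR ([IsClosed] = 0 AND [ClosedDate] IS NULL)
 )
)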

Related

Entity Framework OrderBy().Where() expression

I've been using Entity Framework for a little while now with no issues, until I stumbled upon a curly one... well, it is for me at least. I have searched the internet and cannot find anything related to this, but I assume it is merely because I am asking the wrong question. So here goes...
query= query.OrderByDescending(u => u.DateCreated);
This is simple and works fine. However, the table being queried is for workflow and there are 4 date columns: CreatedDate, EstimatedDate, RevisedDate and ActualDate. At the beginning of the workflow for this element the CreatedDate will be set and all the other date columns will be NULL. As the element progresses through the workflow the subsequent dates will be filled in.
So what I am trying to achieve is this: I don't want any grouping, I just want the date used for OrderBy() to be the last date in the workflow.
I can achieve this by adding another column to my table called FilterDate which is used solely for sorting and gets updated with the appropriate date based upon workflow, however this is adding another column to my table just because I can't come up with a smart method of achieving this.
It's not pretty, but this should be what you're looking for, assuming that ActualDate (if populated) is always >= RevisedDate >= EstimatedDate >= CreatedDate:
query= query.OrderByDescending(u => u.ActualDate.HasValue
? u.ActualDate.Value
: u.RevisedDate.HasValue
? u.RevisedDate.Value
: u.EstimatedDate.HasValue
? u.EstimatedDate.Value
: u.CreatedDate);
This will order by whichever date is available, preferring the Actual over the Revised over the Estimated and defaulting to the Created.
This doesn't handle the case where a RevisedDate could be after an ActualDate, for instance. If a row had a RevisedDate of 2020-12-02 and an ActualDate of 2020-11-25, this query would use the ActualDate for comparison, not the later RevisedDate.
If I understand your question properly then you don't need to add an extra column to your table.
You just need to add a [NotMapped] decorated property to your object, because that property won't be saved in the database.
[NotMapped]
public DateTime FilteringDate
{
get
{
if (ActualDate.HasValue) return (DateTime)ActualDate;
else if (RevisedDate.HasValue) return (DateTime)RevisedDate;
else if (EstimatedDate.HasValue) return (DateTime)EstimatedDate;
else return CreatedDate;
}
}
Usage:
query = query.OrderByDescending(u => u.FilteringDate);

How to implement a high performing non incremental ID in postgresql? [duplicate]

I would like to replace some of the sequences I use for ids in my PostgreSQL db with my own custom-made id generator. The generator would produce a random number with a check digit at the end. So this:
SELECT nextval('customers')
would be replaced by something like this:
SELECT get_new_rand_id('customer')
The function would then return a numerical value such as: [1-9][0-9]{9} where the last digit is a checksum.
The concerns I have are:
How do I make the thing atomic?
How do I avoid returning the same id twice (this would be caught by trying to insert it into a column with a unique constraint, but then it's too late, I think)?
Is this a good idea at all?
Note1: I do not want to use a uuid since the id is to be communicated to customers, and 10 digits are far simpler to communicate than a 36-character uuid.
Note2: The function would rarely be called with SELECT get_new_rand_id() but would be assigned as default value on the id-column instead of nextval().
EDIT: Ok, good discussion below! Here is some explanation of why:
So why would I over-complicate things this way? The purpose is to hide the primary key from the customers.
I give each new customer a unique customerId (a generated serial number in the db). Since I communicate that number with the customer, it is a fairly simple task for my competitors to monitor my business (there are other numbers, such as invoice nr and order nr, that have the same properties). It is this monitoring I would like to make a little bit harder (note: not impossible, but harder).
Why the check digit?
Before there was any talk of hiding the serial nr, I added a check digit to the order nr since there were clumsy fingers at some points in production, and my thought was that this would be a good practice to keep in the future.
After reading the discussion I can certainly see that my approach is not the best way to solve my problem, but I have no other good idea of how to solve it, so please help me out here.
Should I add an extra column where I put the id I expose to the customer and keep the serial as primary key?
How can I generate the id to expose in a sane and efficient way?
Is the check digit necessary?
For generating unique and random-looking identifiers from a serial, using ciphers might be a good idea. Since their output is bijective (there is a one-to-one mapping between input and output values), you will not have any collisions, unlike with hashes, which also means your identifiers don't have to be as long as hashes.
Most cryptographic ciphers work on 64-bit or larger blocks, but the PostgreSQL wiki has an example PL/pgSQL procedure for a "non-cryptographic" cipher function that works on (32-bit) int type. Disclaimer: I have not tried using this function myself.
To use it for your primary keys, run the CREATE FUNCTION call from the wiki page, and then on your empty tables do:
ALTER TABLE foo ALTER COLUMN foo_id SET DEFAULT pseudo_encrypt(nextval('foo_foo_id_seq')::int);
And voila!
pg=> insert into foo (foo_id) values(default);
pg=> insert into foo (foo_id) values(default);
pg=> insert into foo (foo_id) values(default);
pg=> select * from foo;
foo_id
------------
1241588087
1500453386
1755259484
(3 rows)
I added my comment to your question and then realized that I should have explained myself better... My apologies.
You could have a second key - not the primary key - that is visible to the user. That key could use the primary key as the seed for the hash function you describe and be the one that you use to do lookups. That key would be generated by a trigger on insert (which is much simpler than trying to ensure atomicity of the operation), and that is the key that you share with your clients, never the PK.
I know there is debate (although I can't understand why) over whether PKs should be visible to user applications or not. Modern database design practice, and my personal experience, all seem to suggest that PKs should NOT be visible to users. They tend to attach meaning to them and, over time, that is a very bad thing - regardless of whether there is a check digit in the key or not.
Your joins will still be done using the PK. This other generated key is just supposed to be used for client lookups. They are the face, the PK is the guts.
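A minimal PostgreSQL sketch of that idea, assuming a customers table with a serial customer_id primary key and a public_id column for the exposed key (all names are illustrative; a BEFORE INSERT trigger is used here since it can set the column directly):

ALTER TABLE customers ADD COLUMN public_id bigint UNIQUE;

CREATE OR REPLACE FUNCTION set_public_id() RETURNS trigger AS $$
BEGIN
    -- Derive the visible key from the PK; any bijective or random scheme
    -- would do, e.g. the pseudo_encrypt() function mentioned above.
    NEW.public_id := pseudo_encrypt(NEW.customer_id::int);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customers_public_id
    BEFORE INSERT ON customers
    FOR EACH ROW
    EXECUTE PROCEDURE set_public_id();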
Hope that helps.
Edit: FWIW, there is little to be said about "right" or "wrong" in database design. Sometimes it boils down to a choice. I think the choice you face will be better served by leaving the PK alone and creating a secondary key - just that.
I think you are way over-complicating this. Why not let the database do what it does best and let it take care of atomicity and ensuring that the same id is not used twice? Why not use a PostgreSQL SERIAL type and get an autogenerated surrogate primary key, just like an integer IDENTITY column in SQL Server or DB2? Use that on the column instead. Plus it will be faster than your user-defined function.
I concur regarding hiding this surrogate primary key and using an exposed secondary key (with a unique constraint on it) to look up clients in your interface.
Are you using a sequence because you need a unique identifier across several tables? This is usually an indication that you need to rethink your table design, and those several tables should perhaps be combined into one, with an autogenerated surrogate primary key.
Also see here
How you generate the random and unique ids is a useful question - but you seem to be making a counterproductive assumption about when to generate them!
My point is that you do not need to generate these ids at the time of creating your rows, because they are essentially independent of the data being inserted.
What I do is pre-generate random ids for future use; that way I can take my own sweet time and absolutely guarantee they are unique, and there's no processing to be done at the time of the insert.
For example I have an orders table with order_id in it. This id is generated on the fly when the user enters the order, incrementally 1,2,3 etc forever. The user does not need to see this internal id.
Then I have another table - random_ids with (order_id, random_id). I have a routine that runs every night which pre-loads this table with enough rows to more than cover the orders that might be inserted in the next 24 hours. (If I ever get 10000 orders in one day I'll have a problem - but that would be a good problem to have!)
This approach guarantees uniqueness and takes any processing load away from the insert transaction and into the batch routine, where it does not affect the user.
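A minimal PostgreSQL sketch of that pre-generation idea (the table layout follows the description above; the id range, block size and generation logic are illustrative, and ON CONFLICT needs 9.5+):

-- Pool of pre-generated ids, consumed as orders are created.
CREATE TABLE random_ids (
    order_id  integer PRIMARY KEY,     -- the internal, incremental id
    random_id bigint  NOT NULL UNIQUE  -- the id exposed to customers
);

-- Nightly batch: fill one block ahead; the UNIQUE constraint plus
-- ON CONFLICT DO NOTHING silently drops any duplicate random values.
INSERT INTO random_ids (order_id, random_id)
SELECT gs, (1000000000 + floor(random() * 9000000000))::bigint
FROM generate_series(10001, 20000) AS gs
ON CONFLICT (random_id) DO NOTHING;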
Your best bet would probably be some form of hash function, and then a checksum added to the end.
If you're not using this too often (you do not have a new customer every second, do you?) then it is feasible to just get a random number and then try to insert the record. Just be prepared to retry the insert with another number when it fails with a unique constraint violation.
I'd use numbers 100000 to 999999 (900,000 possible numbers of the same length) and a check digit using the UPC or ISBN-10 algorithm. 2 check digits would be better though, as they'll eliminate 99% of human errors instead of 90%.
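A minimal PL/pgSQL sketch of a UPC-style check digit, in case it helps (the function name is illustrative; the weighting follows the UPC scheme of 3/1 counting from the rightmost digit):

CREATE OR REPLACE FUNCTION append_check_digit(num text) RETURNS text AS $$
DECLARE
    s int := 0;
    d int;
    i int;
BEGIN
    -- Weight digits 3, 1, 3, 1, ... from the rightmost digit.
    FOR i IN 1 .. length(num) LOOP
        d := substr(num, length(num) - i + 1, 1)::int;
        s := s + CASE WHEN i % 2 = 1 THEN d * 3 ELSE d END;
    END LOOP;
    RETURN num || ((10 - (s % 10)) % 10)::text;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Example: SELECT append_check_digit('123456');  -- returns '1234565'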

Set nextval sequence data type to integer only

I have an issue running around my mind regarding the default for the 'id' field in my PostgreSQL database. Here is the syntax:
nextval('unsub_keyword_id_seq'::regclass)
However, I don't really understand it even after reading the documentation, and I would like to restrict the value to integers (digits only). I tried to alter the column by changing regclass to other OIDs, but each time it returns errors.
I would really appreciate it if this could be solved soon.
Update:
Thinking about the data type for the column came to me after some trial and error with the code that produces the id for the column.
Does integer (in PostgreSQL, in this case) have its own default length or not?
If I need to insert a long id, should I set the column length?
Kindly advise.
Sorry if my questions are quite confusing; your comments may help me improve them.
From the comments:
I need to insert an id with a length of 50, consisting of 2 letters with the rest numeric. The problem occurs because the data type is integer and the insert is unsuccessful. Is it possible to insert my desired data while keeping the integer data type?
If I understand this correctly, you probably need to format a string, e.g.
format('%s%s', 'XX', nextval('some_sequence_name'))
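A minimal sketch of how that could be wired up as a column default, reusing the sequence name from the question (the table name and 'XX' prefix are illustrative; format() needs PostgreSQL 9.1+):

CREATE SEQUENCE unsub_keyword_id_seq;

-- The visible id is text, built from a prefix plus a sequence value,
-- since an integer column cannot hold letters.
CREATE TABLE unsub_keyword (
    id text PRIMARY KEY
        DEFAULT format('%s%s', 'XX', nextval('unsub_keyword_id_seq'))
);

INSERT INTO unsub_keyword DEFAULT VALUES;  -- id becomes e.g. 'XX1'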

duplicate primary key in return table created by select union

I have the following query called searchit
SELECT 2 AS sourceID, BLOG_COMMENTS.bID, BLOG_TOPICS.Topic_Title,
BLOG_TOPICS.LFD, BLOG_TOPICS.LC,
BLOG_COMMENTS.Comment_Narrative
FROM BLOG_COMMENTS INNER JOIN BLOG_TOPICS
ON BLOG_COMMENTS.bID = BLOG_TOPICS.bID
WHERE (BLOG_COMMENTS.Comment_Narrative LIKE #Phrase)
This query executes AND returns the correct results in the query builder!
HOWEVER, the query needs to run in code-behind, so I have the following line:
DataTable blogcomments = btad.searchit(aphrase);
There are no null fields in any row of any column in EITHER of the tables. The tables are small enough I can easily detect null data. Note that bID is key for blog_topics and cID is key for blog comments.
In any case, when I run this I get the following error:
Failed to enable constraints. One or more rows contain values
violating non-null, unique, or foreign-key constraints.
Tables have a 1:N relationship - many comments for each blog entry. If I run the query with DISTINCT and remove Comment_Narrative from the return fields, it returns data correctly (but I need the other rows!). However, when I return the other rows, I get the above error!
I think this tells me that there is a constraint on the return table that I did not put there, so it must somehow be inheriting that constraint from the call to the query itself, because one of the tables happens to have a primary key defined (which it MUST have). But why does the query work fine in the query builder? The query builder does not care that bID is duped in the result (and it should not be), but the code-behind DOES care.
Addendum:
Just as tests,
I removed the bID from the return list and I still get the error.
I removed the primary key from blog_topics.bID and I get the same error.
This kinda tells me that it's not the fact that my bID is duped that is causing the problem.
Another test:
I went into the designer code (I know it's nasty, I'm just desperate).
I added the following:
// zzz
try
{
this.Adapter.Fill(dataTable);
}
catch ( global::System.Exception ex )
{
}
Oddly enough, when I run it, I get the same error as before AND it doesn't show the changes I've made in the error message:
Line 13909: }
Line 13910: BPLL_Dataset.BLOG_TOPICSDataTable dataTable = new BPLL_Dataset.BLOG_TOPICSDataTable();
Line 13911: this.Adapter.Fill(dataTable);
Line 13912: return dataTable;
Line 13913: }
I'm stumped.... Unless maybe it sees I'm not doing anything in the try catch and is optimizing for me.
Another addendum:
Suspecting that it was ignoring the test code I added to the designer, I added something to the catch. It produces the SAME error and acts like it does not see this code. (Well, okay, it DOES NOT see this code, because it prints out the same as before in the browser.)
// zzz
try
{
this.Adapter.Fill(dataTable);
}
catch ( global::System.Exception ex )
{
System.Web.HttpContext.Current.Response.Redirect("errorpage.aspx");
}
The thing is, when I made the original post, I was ALREADY trying to do a work-around. I'm not sure how far I can afford to go down the rabbit hole. Maybe I read the whole mess into C# and do all the joins and crap myself. I really hate to do that, because I've only recently gotten out of the habit, but I perceive I'm making a good faith effort to use the tool the way God and Microsoft intended. From wit's end, tff.
You don't really show how you're running this query from C#... but I'm assuming either as straight text in a SqlCommand or through some ORM... Have you attempted writing this query as a stored procedure and calling it that way? The stored procedure would be easier to test and run by itself with sample data.
Given the fact that the error is mentioning null values I would presume that, if it is a problem with the query and not some other element of your code, then it'd have to be on one of the following fields:
BLOG_COMMENTS.bID
BLOG_TOPICS.bID
BLOG_COMMENTS.Comment_Narrative
If any of those fields are Nullable then you should be doing a COALESCE or an ISNULL on them before using them in any comparison or Join. It's situations like these which explain why most DBAs prefer to have as few nullable columns in tables as possible - they cause overhead and are prone to errors.
If that still doesn't fix your problem, then COALESCE/ISNULL all fields that are nullable and are being returned by this query. Take all null values out of the equation and just get the thing working and then, if you really need the null values to be null, go back through and remove the COALESCE/ISNULLs one at a time until you find the culprit.
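For illustration, here is what that could look like applied to the query from the question, assuming Comment_Narrative is the nullable column (the parameter placeholder is written as in the question):

SELECT 2 AS sourceID, BLOG_COMMENTS.bID, BLOG_TOPICS.Topic_Title,
       BLOG_TOPICS.LFD, BLOG_TOPICS.LC,
       -- Replace NULLs before they reach the typed DataTable.
       ISNULL(BLOG_COMMENTS.Comment_Narrative, '') AS Comment_Narrative
FROM BLOG_COMMENTS INNER JOIN BLOG_TOPICS
     ON BLOG_COMMENTS.bID = BLOG_TOPICS.bID
WHERE ISNULL(BLOG_COMMENTS.Comment_Narrative, '') LIKE #Phrase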
My problem came from ignorance and a bit of dullness. I did not realize that just because a field is a key in the SQL table does not mean it has to be a key in the TableAdapter. If one has a key field defined in the SQL table and then creates a table adapter, the corresponding field in the adapter will also be a key. All I had to do was unset the key field in the TableAdapter and it worked.
Solution:
Select the key field in the adapter.
Right click
Select "Delete Key" (keeps the field, but removes the "key" icon)
That's it.

How to alter Postgres table data based on its contents?

This is probably a super simple question, but I'm struggling to come up with the right keywords to find it on Google.
I have a Postgres table that has among its contents a column of type text named content_type. That stores what type of entry is stored in that row.
There are only about 5 different types, and I decided I want to change one of them to display as something else in my application (I had been directly displaying these).
It struck me that it's funny that my view is being dictated by my database model, and I decided I would convert the types being stored in my database as strings into integers, and enumerate the possible types in my application with constants that convert them into their display names. That way, if I ever got the urge to change any category names again, I could just change it with one alteration of a constant. I also have the hunch that storing integers might be somewhat more efficient than storing text in the database.
First, a quick threshold question of, is this a good idea? Any feedback or anything I missed?
Second, and my main question, what's the Postgres command I could enter to make an alteration like this? I'm thinking I could start by renaming the old content_type column to old_content_type and then creating a new integer column content_type. However, what command would look at a row's old_content_type and fill in the new content_type column based off of that?
If you're finding that you need to change the display values, then yes, it's probably a good idea not to store them in a database. Integers are also more efficient to store and search, but I really wouldn't worry about it unless you've got millions of rows.
You just need to run an update to populate your new column:
update table_name set content_type = (case when old_content_type = 'a' then 1
when old_content_type = 'b' then 2 else 3 end);
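For completeness, a minimal end-to-end sketch of the rename / add / update steps the question describes (the table name and the 'a', 'b' values are placeholders for the real type names):

-- Keep the old text column around until the new integer column is verified.
ALTER TABLE table_name RENAME COLUMN content_type TO old_content_type;
ALTER TABLE table_name ADD COLUMN content_type integer;

UPDATE table_name
SET content_type = CASE old_content_type
                       WHEN 'a' THEN 1
                       WHEN 'b' THEN 2
                       ELSE 3
                   END;

-- Once the application has been switched over:
-- ALTER TABLE table_name DROP COLUMN old_content_type;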
If you're on Postgres 8.4 then using an enum type instead of a plain integer might be a good idea.
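A minimal sketch of that enum alternative, assuming you keep the original text column and its stored values match the enum labels exactly (type and label names are illustrative):

CREATE TYPE content_type_enum AS ENUM ('article', 'photo', 'video', 'link', 'other');

-- Convert the existing text column in place; the USING cast requires the
-- stored strings to match the enum labels.
ALTER TABLE table_name
    ALTER COLUMN content_type TYPE content_type_enum
    USING content_type::content_type_enum;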
Ideally you'd have these fields referring to a table containing the definitions of the type, via a foreign key constraint. This way you know that your database is clean and has no invalid values (i.e. referential integrity).
There are many ways to handle this:
Having a table for each field that can contain a number of values (i.e. like an enum) is the most obvious - see the sketch after this list - but it breaks down when you have a table that requires many attributes.
You can use the Entity-attribute-value model, but beware that this is too easy to abuse and cause problems when things grow.
You can use, or refer to, my implementation solution PET (Parameter Enumeration Tables). This is a halfway house between 1 & 2.
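A minimal sketch of option 1, the lookup table plus foreign key (all names and values are illustrative):

CREATE TABLE content_types (
    content_type_id integer PRIMARY KEY,
    display_name    text NOT NULL UNIQUE
);

INSERT INTO content_types VALUES (1, 'Article'), (2, 'Photo'), (3, 'Video');

-- Referential integrity: rows can only use a defined type.
ALTER TABLE table_name
    ADD CONSTRAINT fk_content_type
    FOREIGN KEY (content_type) REFERENCES content_types (content_type_id);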