Postgres Custom Domain Type vs UNIQUE - postgresql

I wan't to be sure that not a empty text can be stored into my table. Therefore I created a domain type:
CREATE DOMAIN non_empty_text AS TEXT CHECK( VALUE ~ '\S' ); and changed all text types to non_empty_text.
So far so good. But would it be more efficient when I would change the type back to text and create a UNIQUE index and a row with empty values?

You of course need to benchmark this, but offhand, I'd say you should change your approach.
The current domain type logic you have evaluates the string in memory. The second approach requires accessing an index and looking for a block that may or may not be in the cache. Accessing the storage, even if it doesn't happen all the time, is so expensive compared to an in-memory operation, that this probably isn't a good idea.

I think your approach with the domain is correct. The alternative with the unique constraint is an interesting idea, but I would consider it premature optimization.

Related

Is it possible to prevent the reading and/or setting of a field value with DBIx Class?

I'm working in a project that uses Catalyst and DBIx::Class.
I have a requirement where, under a certain condition, users should not be able to read or set a specific field in a table (e.g. the last_name field in a list of users that will be presented and may be edited by the user).
Instead of applying the conditional logic to each part of the project where that table field is read or set, risking old or new cases where the logic is missed, is it possible to implement the logic directly in the DBIx::Class based module, to never return or change the value of that field when the condition is met?
I've been trying to find the answer, and I'm still reading, but I'm somewhat new to DBIx::Class and its documentation. Any help would be highly appreciated. Thank you!
I‘d use an around Moose method modifier on the column accessor generated by DBIC.
This won‘t be a real security solution as you can still access data without the Result class, for example when using HashRefInflator.
Same for calling get_column.
Real security would be at the database level with column level security and not allowing the database user used by the application to fetch that field.
Another solution I can think of is an additional Result class for that table that doesn‘t include the column, maybe even defaulting to it and only use the one including the column when the user has a special role.

PostgreSQL - Any downside to using JSONB as a single function argument

So I have a PG function create_order (language is PL/PGSQL) which accepts a quite a few arguments.
I have noticed that every time I modify argument name, its type, or if I add a new argument, I have to drop the function (CREATE OR REPLACE does not work)
So I have been thinking what if I just accept one single argument of type jsonb and call it day...So the signature would look like create_order(args jsonb)
My questions are
is this considered bad practice in PG "world" (affects performance or some other aspect) or bad practice from a programming standpoint
if #1 is bad approach, would it better to create custom composite type and use it as an argument to a function?
I don't see a big problem with a jsonb function parameter, except maybe that individual parameters make it more obvious what the input values are. But that's nothing that can't be fixed with documentation.
On the other hand, I also see no problem with dropping and recreating functions when the signature changes. It may serve as a reminder that calling sites need to be updated.
I'd say that you should go with the approach that fits you and the problem at hand best – it does not matter from the PostgreSQL point of view.

CHECK CONSTRAINT text is changed by SQL Server. How do I avoid or work around this?

When I define a CHECK CONSTRAINT on a table, I find the condition clause stored can be different than what I entered.
Example:
Alter table T1 add constraint C1 CHECK (field1 in (1,2,3))
Looking at what is stored:
select cc.Definition from sys.check_constraints cc
inner join sys.objects o on o.object_id = cc.parent_object_id
where cc.type = 'C' and cc.name = 'T1';
I see:
([field1]=(3) OR [field1]=(2) OR [field1]=(1))
Whilst these are equivalent, they are not the same text.
(A similar behaviour occurs when using a BETWEEN clause).
My reason for wishing this did not happen is that I am trying to programatically ensure that all my CHECK constraints are correct by comparing the text I would use to define
the constraint with that stored in sys.check_constraints - and if different then drop and recreate the constraint.
However, in these cases, they are always different and so the program would always think it needs to recreate the constraint.
Question is:
Is there any known reason why SQL Server does this translation? Is it just removing a bit of syntactic sugar and storing the clause in a simpler form?
Is there a way to avoid the behaviour (other than to write my constraint clauses in the long form to match what SQL Server would change it to)?
Is there another way to tell if my check constraint is 'out of date' and needs recreating?
Is there any known reason why SQL Server does this translation? Is it just removing a bit of syntactic sugar and storing the clause in a simpler form?
I'm not aware of any reasons documented in the Books Online, or elsewhere. However, my guess is that it's normalized for some purposes that are internal to SQL Server. It might allow SQL Server to be a bit lenient in defining the expression (such as using Database for a column name), but guaranteeing that the column names are always appropriately escaped for whatever engine needs to parse the expression (ie, [Database]).
Is there a way to avoid the behaviour (other than to write my constraint clauses in the long form to match what SQL Server would change it to)?
Probably not. But if your constraints aren't terribly complicated, is re-writing the constraint clauses in the long form such a bad idea?
Is there another way to tell if my check constraint is 'out of date' and needs recreating?
Before I answer this directly, I'd point out that there's a bit of programming philosophy involved here. The API that SQL Server provides for the text of a CHECK constraint only guarantees that you'll get something equivalent to the original expression. While you could certainly build some fancy methods to try to ensure that you'll always be able to reproduce SQL Server's normalized version of the expression, there's no guarantee that Microsoft won't change its normalization rules in the future. And indeed, there's probably no guarantee that two equivalent expressions will always be normalized identically!
So, I'd first advise you to re-examine your architecture, and see if you can accomplish the same result without having to rely on undocumented API behavior.
Having said that, there are a number of methods outlined in this question (and answer).
Another alternative, which is a bit more brute-force but perhaps acceptable, would be to always assume that the expression is "out of date" and simply drop/re-create the constraint every time you check. Unless you're expecting these constraints to frequently become out-of-date (or the tables are quite large), it seems this would be a decent solution. You could probably even run it in a transaction, so that if the new constraint is already violated, simply roll-back the transaction and report the error.

Pretty URLs with hashes (md5)

In our web application we display a list of pulses, but for linking and such we make every pulse uniquely available. In our Couch DB we are giving every pulse a unique id by md5'ing their unique attributes. I.E.: www.foo.com/bar/
Though these md5 sums are extremely long and make for ugly URLs. Is there another way to hash the attributes that will require less characters but still guarantee uniqueness.
Thanks a lot
Instead of creating an ugly md5 you could use a method like this to create a random string of a given length containing certain characters and insert this into a row next to the md5 row that is used for retrieving the data from the database using the 'pretty url' string. One thing to think about would be to take out the vowels from the possible characters as with them, you could end up with bad words :) Also, make sure it does not already exist in the database of course, and if it does just create another one... that won't happen very often though.

Hashes vs Numeric id's

When creating a web application that some how displays the display of a unique identifier for a recurring entity (videos on YouTube, or book section on a site like mine), would it be better to use a uniform length identifier like a hash or the unique key of the item in the database (1, 2, 3, etc).
Besides revealing a little, what I think is immaterial, information about the internals of your app, why would using a hash be better than just using the unique id?
In short: Which is better to use as a publicly displayed unique identifier - a hash value, or a unique key from the database?
Edit: I'm opening up this question again because Dmitriy brought up the good point of not tying down the naming to db specific property. Will this sort of tie down prevent me from optimizing/normalizing the database in the future?
The platform uses php/python with ISAM /w MySQL.
Unless you're trying to hide the state of your internal object ID counter, hashes are needlessly slow (to generate and to compare), needlessly long, needlessly ugly, and needlessly capable of colliding. GUIDs are also long and ugly, making them just as unsuitable for human consumption as hashes are.
For inventory-like things, just use a sequential (or sharded) counter instead. If you migrate to a different database, you will just have to initialize the new counter to a value at least as large as your largest existing record ID. Pretty much every database server gives you a way to do this.
If you are trying to hide the state of your counter, perhaps because you're counting users and don't want competitors to know how many you have, I suggest avoiding the display of your internal IDs. If you insist on displaying them and don't want the drawbacks of a hash, you might consider using a maximal-period linear feedback shift register to generate IDs.
I typically use hashes if I don't want the user to be able to guess the next ID in the series. But for your book sections, I'd stick with numerical id's.
Using hashes is preferable in case you need to rebuild your database for some reason, for example, and the ordering changes. The ordinal numbers will move around -- but the hashes will stay the same.
Not relying on the order you put things into a box, but on properties of the things, just seems.. safer.
But watch out for collisions, obviously.
With hashes you
Are free to merge the database with a similar one (or a backup), if necessary
Are not doing something that could help some guessing attacks even a bit
Are not disclosing more private information about the user than necessary, e.g. if somebody sees a user number 2 in your current database log in, they're getting information that he is an oldie.
(Provided that you use a long hash or a GUID,) greatly helping youself in case you're bought by YouTube and they decide to integrate your databases.
Helping yourself in case there appears a search engine that indexes by GUID.
Please let us know if the last 6 months brought you some clarity on this question...
Hashes aren't guaranteed to be unique, nor, I believe, consistent.
will your users have to remember/use the value? or are you looking at it from a security POV?
From a security perspective, it shouldn't matter - since you shouldn't just be relying on people not guessing a different but valid ID of something they shouldn't see in order to keep them out.
Yeah, I don't think you're looking for a hash - you're more likely looking for a Guid.If you're on the .Net platform, try System.Guid.
However, the most important reason not to use a Guid is for performance. Doing database joins and lookups on (long) strings is very suboptimal. Numbers are fast. So, unless you really need it, don't do it.
Hashes have the advantage that you can check if they are valid or not BEFORE performing any check to your database whether they exist or not. This can help you to fend off attacks with random hashes as you don't need to burden your database with fake lookups.
Therefor, if your hash has some kind of well-defined format with for example a checksum at the end, you can check if it's correct without needing to go to the database.