Weak points when comparing columns in a trigger as hstore data - PostgreSQL

On occasion I'd like to determine, before an UPDATE, which columns have changed. To make this as generic as possible, I don't want to hard-code schema, table, or column names in the function. I found some solutions here on SO and elsewhere, and particularly liked the idea of using hstore from this answer.
The widely cited downside of hstore is that I lose the data types; everything is stringified.
But using it in the context of a trigger (with no complex columns like json or hstore), where both NEW and OLD have the same set of columns with matching data types, I can think of just one problem: NULL and empty values will not be distinguishable.
What other problems might I face when I detect changes in a trigger function like this:
changes := hstore(NEW) - hstore(OLD);
The alternative seems to be to use jsonb and then write some jsonb_diff function to discover changes. hstore's built-in subtract operator seems far more robust, but maybe I have not considered all of its weak points.
I'm using Postgres 9.6.
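For context, a minimal sketch of what such a generic trigger might look like (the function name log_changes is invented for illustration; it is not from the question):

-- Generic change detection with hstore; requires the hstore extension.
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE OR REPLACE FUNCTION log_changes() RETURNS trigger AS $$
DECLARE
    changes hstore;
BEGIN
    -- Pairs present in NEW whose values differ from OLD,
    -- i.e. the changed columns with their new values.
    changes := hstore(NEW) - hstore(OLD);
    IF changes <> ''::hstore THEN
        RAISE NOTICE 'changed columns: %', changes;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Works for any table, without naming schema, table, or columns:
-- CREATE TRIGGER trg_changes BEFORE UPDATE ON some_table
--     FOR EACH ROW EXECUTE PROCEDURE log_changes();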

Related

kdb - How to pass a table by reference to a kdb function

Define the question
Given an empty table myt defined by
myt:([] id:`int$(); score:`int$())
It is trivial to insert one or more records into it, for example
`myt upsert `id`score!1 100
But when it comes to defining a function that inserts into a given table, a different trick seems to be needed.
A first try version could be
upd:{[t] t upsert `id`score!42 314;}
upd[myt]
Apparently it updates nothing in myt itself, only a local copy of it.
Difficulties of possible solutions
Possible solution 1: using the global variable instead
Let myt be a global variable; the variable is then accessed inside the function.
upd:{`myt upsert `id`score!42 314;}
upd[]
It looks like a good solution, except when many myts are required. In that situation, one has to provide many copies of the upd function, as follows:
upd0:{`myt0 upsert `id`score!42 314;}
upd1:{`myt1 upsert `id`score!42 314;}
upd2:{`myt2 upsert `id`score!42 314;}
...
So, the global variable solution is not a good solution here.
Possible solution 2: amending table outside function
One can also solve the problem by amending myt just outside the function, returning the modified result by removing the ending ;.
upd:{[t] t upsert `id`score!42 314} / return the upserted result
myt:upd[myt]
It works! But after running this code millions of times, it gets slower and slower. Because this solution discards the "in-place" property of the upsert operator, the copy overhead increases as the table grows.
Pass argument by reference?
Maybe the concept of "pass-by-reference" is the solution here. Or maybe q has its own solution for this problem and I have not grasped the essential idea.
[UPDATE] Solved by adding "`" to use call-by-name
As cillianreilly answers, it is enough to add a "`" symbol in front of myt to refer to the global variable when passing it into the function. So the solution is direct:
upd:{[t] t upsert `id`score!42 314;}
upd[`myt] / it works
Your first version should achieve what you want. If you pass the table name as a symbol, it will update the global variable and return the table name. If you pass the table itself, it will return the updated table, which you can use in an assignment, as you found in possible solution 2. Note that the actual table will not have been updated by this operation.
q){[t;x]t upsert x}[myt;`id`score!42 314]
id score
--------
42 314
q)count myt
0
q){[t;x]t upsert x}[`myt;`id`score!42 314]
`myt
q)count myt
1
For possible solution 1, why would you need hundreds of myt tables? Regardless, there is no need to hardcode the table name into the function. You can just pass the table name as a symbol as demonstrated above, which will update the global for you. The official kx kdb tick example given on their github uses insert for exactly this scenario, but in practice a lot of developers use upsert. https://github.com/KxSystems/kdb-tick/blob/master/tick/r.q#L6
Hope this helps.

Postgres case sensitivity: implications of LOWER()

I have run into the issue of case-sensitive searching in Postgres, and have started to deal with it by using LOWER on each side of every WHERE test.
So far so good. However, I understand that in order to make use of indexes, they should be created using LOWER too, which makes sense.
However, what of the PK? Presumably these indexes are not going to be effective, because it does not seem possible to create a PK using a function on the chosen PK field. Isn't this a concern for any filtering or joining done on PKs?
Is there a way of working around this ?
Here are some thoughts on this subject; a sketch of the first three follows the list.
First, you could add a constraint for any column requiring that the data stored be lower case. That would solve the problem inside the database.
Second, you could use a trigger to convert any value to lower case.
Third, you can use ilike. Combined with a trigram index (pg_trgm), this can make use of indexes for case-insensitive searches.
And fourth, if all your primary keys are synthetic numeric primary keys, then you don't need to worry about case sensitivity.
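A hedged sketch of the first three approaches (the users table and email column are invented for illustration):

-- Option 1: a CHECK constraint forcing stored data to lower case.
ALTER TABLE users
    ADD CONSTRAINT users_email_lower_chk CHECK (email = lower(email));

-- Option 2: a trigger that lower-cases the value on write.
CREATE OR REPLACE FUNCTION lowercase_email() RETURNS trigger AS $$
BEGIN
    NEW.email := lower(NEW.email);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_lowercase_email
    BEFORE INSERT OR UPDATE ON users
    FOR EACH ROW EXECUTE PROCEDURE lowercase_email();

-- Option 3: ILIKE, index-assisted via the pg_trgm extension.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX users_email_trgm ON users USING gin (email gin_trgm_ops);
SELECT * FROM users WHERE email ILIKE 'alice@example.com';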
You can still create a functional index on PK (even consisting of many columns):
CREATE TABLE test(a text, b text, c text, d text, primary key (a,b,c));
CREATE INDEX ON test (lower(a), lower(b), lower(c));
Though it sounds like some data clean-up work is needed if you are experiencing this kind of behaviour almost everywhere in your database (e.g. storing everything in lower case).

Automatic password hashing in PostgreSQL

I have been using PostgreSQL for the past few weeks and I have been loving it!
I use crypt() and gen_salt() to generate the password hashes, by adding it to the insert query like so:
crypt(:password, gen_salt('bf', 8))
Likewise for the select I use something like:
crypt(:password, u.password)
I want to simplify my SQL code by automating the hashing on the table's password column, instead of doing it in the SQL queries or in additional functions.
To be clear: when I insert a row into the table, I want the hash/compare to happen immediately.
Is there a way? And if yes, would that be wise?
I won't comment on the "would that be wise?" part of the question (not because I think it's unwise, but because I don't know enough about your needs).
If you want to automatically compute a column value during an INSERT or UPDATE, you need a trigger (see CREATE TRIGGER).
If you want to automatically compute a column value during a SELECT, you need a view (see CREATE VIEW).
There are other ways to achieve what you ask, but triggers and views are probably the most straightforward mechanisms.
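For the write side, a minimal sketch of such a trigger, assuming pgcrypto is installed and a users table with a password column (both assumptions, not from the question):

-- Hash the password column automatically on INSERT/UPDATE.
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE OR REPLACE FUNCTION hash_password() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        NEW.password := crypt(NEW.password, gen_salt('bf', 8));
    ELSIF NEW.password IS DISTINCT FROM OLD.password THEN
        -- Only re-hash when the value changed, so unrelated UPDATEs
        -- don't hash the stored hash a second time.
        NEW.password := crypt(NEW.password, gen_salt('bf', 8));
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_hash_password
    BEFORE INSERT OR UPDATE ON users
    FOR EACH ROW EXECUTE PROCEDURE hash_password();

The comparison side stays a plain query, e.g. WHERE password = crypt(:password, password), as in the question.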

Why create an empty (no rows, no columns) table in PostgreSQL?

In an answer to this question I learned that you can create an empty table in PostgreSQL:
create table t();
Is there any real use case for this? Why would you create an empty table? Because you don't know what columns it will have?
These are the things, from my point of view, that a column-less table is good for. They probably fall more into the warm and fuzzy category.
1. One practical use of creating a table before you add any user-defined columns to it is that it allows you to iterate fast when creating a new system, or just when doing rapid dev iterations in general (see the sketch after this list).
2. Kind of more of 1, but it lets you stub out tables that your app logic or procedures can make reference to, even if the columns have yet to be put in place.
3. I could see it coming in handy at a big company with lots of developers. Maybe you want to reserve a name months in advance, before your work is complete. Just add the new column-less table to the build. Of course someone could still hijack it, but you may be able to win the argument that you had it in use well before they came along with their other plans. Kind of fringe, but a valid benefit.
All of these are handy and I miss them when I'm not working in PostgreSQL.
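For example, a tiny sketch of the stub-then-extend workflow from points 1 and 2 (table and column names invented):

-- Reserve/stub the table first; flesh it out as the design settles.
CREATE TABLE invoices();

-- Later iterations add columns without renaming or recreating anything:
ALTER TABLE invoices ADD COLUMN id serial PRIMARY KEY;
ALTER TABLE invoices ADD COLUMN amount numeric(10,2);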
I don't know the precise reason for its inclusion in PostgreSQL, but a zero-column table - or rather a zero-attribute relation - plays a role in the theory of relational algebra, on which SQL is (broadly) based.
Specifically, a zero-attribute relation with no tuples (in SQL terms, a table with no columns and no rows) is the relational equivalent of zero or false, while a relation with no attributes but one tuple (SQL: no columns, but one row, which isn't possible in PostgreSQL as far as I know) is true or one. Hugh Darwen, an outspoken advocate of relational theory and critic of SQL, dubbed these "Table Dum" and "Table Dee", respectively.
In normal algebra x + 0 == x and x * 0 == 0, whereas x * 1 == x; the idea is that in relational algebra, Table Dum and Table Dee can be used as similar primitives for joins, unions, etc.
PostgreSQL internally refers to tables (as well as views and sequences) as "relations", so although it is geared around implementing SQL, which isn't defined by this kind of pure relation algebra, there may be elements of that in its design or history.
It is not an empty table - only an empty result. PostgreSQL rows contain some invisible (by default) system columns. I am not sure, but it could be an artifact from the dark ages, when Postgres was an object-relational database and supported the POSTQUEL language. Such an empty table can work as an abstract ancestor in a class hierarchy.
List of system columns
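A small sketch of the abstract-ancestor idea, together with the system columns that make even a column-less table non-empty (table names invented):

-- A column-less table as an abstract ancestor in an inheritance hierarchy.
CREATE TABLE node();
CREATE TABLE city (name text, population int) INHERITS (node);
CREATE TABLE road (name text, length_km numeric) INHERITS (node);

INSERT INTO city VALUES ('Oslo', 700000);

-- Selecting from the parent still exposes the hidden system columns:
SELECT tableoid::regclass, ctid, xmin FROM node;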
I don't think mine is the intended usage, but recently I've used an empty table as a lock for a view which I create and change dynamically with EXECUTE. The function which creates/replaces the view takes an ACCESS EXCLUSIVE lock on the empty table, and the other functions which use the view take an ACCESS SHARE lock.
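A sketch of that locking pattern, assuming an empty table named view_lock (all names invented):

-- The empty table exists only to be locked.
CREATE TABLE view_lock();

-- Writer: the function that rebuilds the view with EXECUTE.
BEGIN;
LOCK TABLE view_lock IN ACCESS EXCLUSIVE MODE;
-- ... EXECUTE 'CREATE OR REPLACE VIEW my_view AS ...' ...
COMMIT;

-- Readers: take the weakest lock, so they and the rebuild exclude each other.
BEGIN;
LOCK TABLE view_lock IN ACCESS SHARE MODE;
SELECT * FROM my_view;
COMMIT;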

How to alter Postgres table data based on its contents?

This is probably a super simple question, but I'm struggling to come up with the right keywords to find it on Google.
I have a Postgres table that has among its contents a column of type text named content_type. That stores what type of entry is stored in that row.
There are only about 5 different types, and I decided I want to change one of them to display as something else in my application (I had been directly displaying these).
It struck me that it's funny that my view is being dictated by my database model, and I decided I would convert the types being stored in my database as strings into integers, and enumerate the possible types in my application with constants that convert them into their display names. That way, if I ever got the urge to change any category names again, I could just change it with one alteration of a constant. I also have the hunch that storing integers might be somewhat more efficient than storing text in the database.
First, a quick threshold question of, is this a good idea? Any feedback or anything I missed?
Second, and my main question, what's the Postgres command I could enter to make an alteration like this? I'm thinking I could start by renaming the old content_type column to old_content_type and then creating a new integer column content_type. However, what command would look at a row's old_content_type and fill in the new content_type column based off of that?
If you're finding that you need to change the display values, then yes, it's probably a good idea not to store them in a database. Integers are also more efficient to store and search, but I really wouldn't worry about it unless you've got millions of rows.
You just need to run an update to populate your new column:
update table_name
set content_type = case when old_content_type = 'a' then 1
                        when old_content_type = 'b' then 2
                        else 3
                   end;
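For completeness, the surrounding DDL the question described would be along these lines (table and column names as in the question; 'a' and 'b' stand in for the real type strings):

ALTER TABLE table_name RENAME COLUMN content_type TO old_content_type;
ALTER TABLE table_name ADD COLUMN content_type integer;
-- ... run the UPDATE above ...
ALTER TABLE table_name DROP COLUMN old_content_type; -- once nothing reads the old strings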
If you're on Postgres 8.4 then using an enum type instead of a plain integer might be a good idea.
Ideally you'd have these fields refer to a table containing the definitions of each type, via a foreign key constraint. This way you know that your database is clean and has no invalid values (i.e. referential integrity).
There are many ways to handle this:
1. Having a table for each field that can contain a number of values (i.e. like an enum) is the most obvious - but it breaks down when you have a table that requires many attributes. (A sketch of this approach follows the list.)
2. You can use the entity-attribute-value model, but beware: it is all too easy to abuse, causing problems as things grow.
3. You can use, or refer to, my implementation solution PET (Parameter Enumeration Tables). This is a halfway house between 1 & 2.
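A minimal sketch of option 1, the lookup-table approach (the type names here are invented):

-- A lookup table plus a foreign key guarantees only valid types are stored.
CREATE TABLE content_types (
    id   integer PRIMARY KEY,
    name text NOT NULL UNIQUE
);

INSERT INTO content_types VALUES (1, 'article'), (2, 'video'), (3, 'image');

ALTER TABLE table_name
    ADD CONSTRAINT table_name_content_type_fkey
    FOREIGN KEY (content_type) REFERENCES content_types (id);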