How to create composite UNIQUE constraint with nullable columns? - postgresql

Let's say I have a table with several columns [a, b, c, d] which can all be nullable. This table is managed with Typeorm.
I want to create a unique constraint on [a, b, c]. However, this constraint does not work if one of these columns is NULL. I can insert, for instance, [a=0, b=1, c=NULL, d=0] and [a=0, b=1, c=NULL, d=1], where d has different values.
With raw SQL, I could set multiple partial constraints (Create unique constraint with null columns); however, in my case, the unique constraint is on 10 columns. It seems absurd to set a constraint for every possible combination...
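For reference, the partial-index pattern from that linked question would look roughly like this if only c were nullable; with ten nullable columns it blows up to one index per NULL/NOT NULL combination:
create unique index on the_table (a, b, c) where c is not null;
create unique index on the_table (a, b) where c is null;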
I could also create a sort of hash function, but that method does not seem clean to me.
Does Typeorm provide a solution for such cases?

If you have values that can never appear in those columns, you can use them as a replacement in the index:
create unique index on the_table (coalesce(a,-1), coalesce(b, -1), coalesce(c, -1));
That way NULL values are treated as equal inside the index, without the need to store the replacement values in the table.
If those columns are float (or numeric, which accepts Infinity from PostgreSQL 14 on) rather than integer or bigint, using '-Infinity' might be a better substitution value.
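For example, a minimal sketch assuming a, b and c are double precision columns:
create unique index on the_table (coalesce(a, '-Infinity'::float8),
                                  coalesce(b, '-Infinity'::float8),
                                  coalesce(c, '-Infinity'::float8));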
There is a drawback to this, though:
The index will not be usable for queries on those columns unless you also use the coalesce() expression. So with the above index, a query like:
select *
from the_table
where a = 10
and b = 100;
would not use the index. You would need to use the same expressions as used in the index itself:
select *
from the_table
where coalesce(a, -1) = 10
and coalesce(b, -1) = 100;

Related

How to re-map array column values in select in Postgresql?

Is it possible to re-map integer values from a Postgres array column in the select? This is what I have:
select unnest(tag_ids) from mention m where id = 288201;
unnest
---------
-143503
-143564
125192
143604
137694
tag_ids is an integer[] column.
I would like to translate those numbers. Functions like abs(unnest(..)) work, but I found I cannot use a CASE statement. Tx.
If you want to do anything non-trivial with the elements of an array after unnesting, use the set-returning function like a table, i.e. put it in the FROM clause:
select u.tag_id
from mention m
cross join unnest(m.tag_ids) as u(tag_id)
where m.id = 288201;
Now, u.tag_id is an integer column that you can use like any other column, e.g. in a CASE expression.
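For instance, the re-mapping from the question could then be written with a CASE expression (the mapping rule here is made up for illustration, flipping the sign of negative tags):
select case when u.tag_id < 0 then -u.tag_id else u.tag_id end as mapped_tag_id
from mention m
cross join unnest(m.tag_ids) as u(tag_id)
where m.id = 288201;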

How to store point (x, y) in a database?

I need to store a location (an x, y point) in my database, where the point can be null and X and Y are always less than 999. At the moment I'm using EF Core Code First and a PostgreSQL database, but I'd like to be flexible so that I can switch to MSSQL without too much work. I'm not planning to move away from EF Core.
Right now, I have two columns: LocationX and LocationY, both of type int?. I'm not sure this is a good solution, because technically the DB allows (X=2, Y=null), and it shouldn't. Either both are null, or both are not.
My second option is to store it in one string column: "123x321", with a max length of 7.
Is there a better way?
Thanks,
A check constraint could be used to enforce that both columns are NULL or NOT NULL at the same time:
CREATE TABLE t(id INT,
               x INT,
               y INT,
               CHECK((x IS NULL AND y IS NULL) OR (x IS NOT NULL AND y IS NOT NULL))
);
In addition to the check constraint suggested by @LukaszSzozda, you can restrict the x, y values with an additional check constraint on each. So, assuming they must also be in the range 0 to 999:
CREATE TABLE t(id INT,
               x INT constraint x_range check (x >= 0 and x <= 999),
               y INT constraint y_range check (y >= 0 and y <= 999),
               CHECK((x IS NULL AND y IS NULL) OR (x IS NOT NULL AND y IS NOT NULL))
);
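For illustration, with that definition a mixed pair is rejected while an all-NULL or fully specified pair is accepted:
INSERT INTO t VALUES (1, 10, 20);     -- ok
INSERT INTO t VALUES (2, NULL, NULL); -- ok
INSERT INTO t VALUES (3, 2, NULL);    -- fails the NULL-pairing CHECK
INSERT INTO t VALUES (4, 1000, 5);    -- fails x_range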
As far as your idea of storing a single string goes: very bad. Not only will you have the issue of separating the values every time you need them, it also allows distinctly invalid data. Values like '1234567' and even 'abcdefg' are completely valid as far as the database is concerned.
So your table definition must account for and eliminate them. With this, your table definition becomes:
create table txy
( xy_string varchar(7)
, constraint xy_format check( xy_string ~* '^\d{1,3}x\d{1,3}$' )
);

insert into txy(xy_string)
values ('1x2'), ('354X512'), ('38x92');
This is actually a reduction, as it is back to a single constraint, but your queries now require something like:
select xy_string
, regexp_replace(xy_string, '^(\d+)(X|x)(\d+)','\1') x
, regexp_replace(xy_string, '^(\d+)(X|x)(\d+)','\3') y
from txy;
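As a side note, split_part() would do the same extraction a little more simply, assuming the single 'x'/'X' separator enforced by the constraint:
select xy_string
     , split_part(lower(xy_string), 'x', 1)::int as x
     , split_part(lower(xy_string), 'x', 2)::int as y
from txy;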
In short, never store groups of numeric values as a single delimited string. The additional work is just not worth it.

Use sum function in calculated column

Is it possible to use a sum function in a calculated column?
If yes, I would like to create a calculated column that calculates the sum of a column in the same table where the date is smaller than the date of this entry. Is this possible?
And lastly, would this optimize repeated calls for this value, compared to the example query below?
SELECT ProductGroup, SalesDate, (
SELECT SUM(Sales)
FROM SomeList
WHERE (ProductGroup= KVU.ProductGroup) AND (SalesDate<= KVU.SalesDate)) AS cumulated
FROM SomeList AS KVU
Is it possible to use a sum function in a calculated column?
Yes, it's possible using a scalar-valued function (scalar UDF) for your computed column, but this would be a disaster. Using scalar UDFs for computed columns destroys performance. Adding a scalar UDF that accesses data (which would be required here) makes things even worse.
It sounds to me like you just need a good ol' fashioned index to speed things up. First some sample data:
IF OBJECT_ID('dbo.somelist','U') IS NOT NULL DROP TABLE dbo.somelist;
GO
CREATE TABLE dbo.somelist
(
ProductGroup INT NOT NULL,
[Month] TINYINT NOT NULL CHECK ([Month] <= 12),
Sales DECIMAL(10,2) NOT NULL
);
INSERT dbo.somelist
VALUES (1,1,22),(2,1,45),(2,1,25),(2,1,19),(1,2,100),(1,2,200),(2,2,50.55);
and the correct index:
CREATE NONCLUSTERED INDEX nc_somelist ON dbo.somelist(ProductGroup,[Month])
INCLUDE (Sales);
With this index in place this query would be extremely efficient:
SELECT s.ProductGroup, s.[Month], SUM(s.Sales)
FROM dbo.somelist AS s
GROUP BY s.ProductGroup, s.[Month];
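Note that the query above aggregates per group rather than accumulating. If the running total asked about in the question is what's needed, a window function is the usual way to express it; a sketch against the same sample table (the default RANGE frame includes ties, matching the <= in the original correlated subquery):
SELECT s.ProductGroup, s.[Month], s.Sales,
       SUM(s.Sales) OVER (PARTITION BY s.ProductGroup ORDER BY s.[Month]) AS cumulated
FROM dbo.somelist AS s;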
If you needed to get a COUNT by month & product group you could create an indexed view like so:
CREATE VIEW dbo.vw_somelist WITH SCHEMABINDING AS
SELECT s.ProductGroup, s.[Month], TotalSales = COUNT_BIG(*)
FROM dbo.somelist AS s
GROUP BY s.ProductGroup, s.[Month];
GO
CREATE UNIQUE CLUSTERED INDEX uq_cl__vw_somelist ON dbo.vw_somelist(ProductGroup, [Month]);
Once that indexed view was in place, your COUNTs would be pre-aggregated. You cannot, however, include a SUM over a nullable expression in an indexed view; since Sales is declared NOT NULL here, a SUM can be pre-aggregated the same way.
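A sketch of that variant (the view and index names here are made up):
CREATE VIEW dbo.vw_somelist_sum WITH SCHEMABINDING AS
SELECT s.ProductGroup, s.[Month],
       SalesTotal = SUM(s.Sales),
       RowCnt = COUNT_BIG(*) -- COUNT_BIG(*) is required in an indexed view with GROUP BY
FROM dbo.somelist AS s
GROUP BY s.ProductGroup, s.[Month];
GO
CREATE UNIQUE CLUSTERED INDEX uq_cl__vw_somelist_sum ON dbo.vw_somelist_sum(ProductGroup, [Month]);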

PostgreSQL hierarchical nested set huge database

I have a database that must store thousands of scenarios (each scenario with a single unix_timestamp value). Each scenario has 1,800,000 records organized in a nested set structure.
The general table structure is given by:
table_skeleton:
- unix_timestamp integer
- lft integer
- rgt integer
- value
Usually, my SELECTs retrieve all nested values within a specific scenario, for example:
SELECT * FROM table_skeleton WHERE unix_timestamp = 123 AND lft >= 10 AND rgt <= 53
So I divided my table hierarchically into a master and child tables grouped by date, for example:
table_skeleton_201303 inherits table_skeleton:
- unix_timestamp integer
- lft integer
- ...
and
table_skeleton_201304 inherits table_skeleton:
- unix_timestamp integer
- lft integer
- ...
And I also created an index on each child table according to the usual searches I am expecting, for example:
Create Index idx_201303
on table_skeleton_201303
using btree(unix_timestamp, lft, rgt)
It improved the retrieval, but it still takes about 1 minute for each select.
I imagined that this was because the index was too big to always be loaded into memory, so I tried to create a partial index for each timestamp, for example:
Create Index idx_201303_1362981600
on table_skeleton_201303
using btree(lft, rgt)
WHERE unix_timestamp = 1362981600
And in fact the second type of index created is much, much, much smaller than the general one. However, when I run an EXPLAIN ANALYZE for the SELECT I've previously shown here, the query planner ignores my new partial index and keeps using the giant old one.
Is there a reason for that?
Is there any new approach to optimize such type of huge nested set hierarchical database?
When you filter a table by field_a > x AND field_b > y, an index on (field_a, field_b) will only be used for field_a > x (and even that just may happen, depending on the data distribution and the percentage of rows with field_a > x, as per the collected statistics); the condition field_b > y is then applied as a filter to the rows found, not through the index.
In a case like that, two indexes, one for each field, could both be used and their results combined, roughly the internal equivalent of:
SELECT *
FROM the_table t
JOIN (SELECT id FROM the_table WHERE field_a > x) ta ON (ta.id = t.id)
JOIN (SELECT id FROM the_table WHERE field_b > y) tb ON (tb.id = t.id);
There is a chance you could benefit from a GiST index, treating your lft and rgt fields as a point:
CREATE EXTENSION IF NOT EXISTS btree_gist;  -- lets the plain integer column take part in the GiST index
CREATE INDEX ON table_skeleton USING GIST (unix_timestamp, point(lft, rgt));

SELECT *
FROM table_skeleton
WHERE unix_timestamp = 123
  AND point(lft, rgt) <@ box(point(10, '-Infinity'), point('Infinity', 53));

Postgresql Select all columns and column names with a specific value for a row

I have a table with many (1000+) columns and many rows (~1M). The columns either have the value 1 or are NULL.
I want to be able to retrieve, for a specific row (user), the column names that have a value of 1.
Since there are many columns in the table, specifying them all would yield an extremely long query.
You're doing something SQL is quite bad at - dynamic access to columns, or treating a row as a set. It'd be nice if this were easier, but it doesn't work well with SQL's typed nature and the concept of a relation. Working with your data set in its current form is going to be frustrating; consider storing an array, json, or hstore of values instead.
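For instance, a minimal sketch of the jsonb variant; the table and column names here are hypothetical:
CREATE TABLE user_flags (id serial PRIMARY KEY, flags jsonb NOT NULL DEFAULT '{}');
CREATE INDEX ON user_flags USING gin (flags);

-- column names that are set for user 42
SELECT jsonb_object_keys(flags) FROM user_flags WHERE id = 42;

-- all users that have flag "a" set to 1 (can use the GIN index)
SELECT id FROM user_flags WHERE flags @> '{"a": 1}';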
Actually, for this particular data model, you could probably use a bitfield. See bit(n) and bit varying(n).
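And a minimal sketch of the bit-string variant, again with made-up names, assuming the ~1000 attributes are packed in a fixed, agreed-upon order:
CREATE TABLE user_flag_bits (id serial PRIMARY KEY, flags bit(1000) NOT NULL);

-- store a row with flag number 2 (0-based, counted from the left) switched on
INSERT INTO user_flag_bits (flags) VALUES (set_bit(B'0'::bit(1000), 2, 1));

-- is flag 2 set for user 1?
SELECT get_bit(flags, 2) = 1 AS has_flag_2 FROM user_flag_bits WHERE id = 1;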
It's still possible to make a working query with your current model using PostgreSQL extensions, though.
Given sample:
CREATE TABLE blah (id serial primary key, a integer, b integer, c integer);
INSERT INTO blah(a,b,c) VALUES (NULL, NULL, 1), (1, NULL, 1), (NULL, NULL, NULL), (1, 1, 1);
I would unpivot each row into a key/value set using hstore (or in newer PostgreSQL versions, the json functions). SQL itself provides no way to dynamically access columns, so we have to use an extension. So:
SELECT id, hs FROM blah, LATERAL hstore(blah) hs;
then extract the hstores to sets:
SELECT id, k, v FROM blah, LATERAL each(hstore(blah)) kv(k,v);
... at which point you can filter for values matching the criteria. Note that all columns have been converted to text, so you may want to cast them back:
SELECT id, k FROM blah, LATERAL each(hstore(blah)) kv(k,v) WHERE v::integer = 1;
You also need to exclude id from matching, so:
regress=> SELECT id, k FROM blah, LATERAL each(hstore(blah)) kv(k,v) WHERE v::integer = 1 AND
k <> 'id';
id | k
----+---
1 | c
2 | a
2 | c
4 | a
4 | b
4 | c
(6 rows)