How to query PostgreSQL for all records where hstore is empty? - postgresql

I would like to query my PostgreSQL table for all rows that have an empty hstore. It must be obvious to everyone but me, as I can't find any documentation on how to do it, nor any other StackOverflow questions answering it. Checking for NULL isn't helpful, as I get back all rows, even those where properties has no keys/values:
SELECT widgets.*
FROM "widgets"
WHERE properties IS NOT NULL
Any ideas how to do this?

One option would be to return all keys of the column as an array and check if that array is not empty:
SELECT *
FROM widgets
WHERE properties IS NOT NULL
AND array_length(akeys(properties),1) > 0;

A simpler way of doing this, per a message on the PostgreSQL mailing list, is:
SELECT * FROM widgets WHERE properties = ''::HSTORE;
An aside:
As a general rule, I would recommend that you don't have a column which can be empty sometimes and NULL other times, outside of distinct cases where you want a semantic difference there, e.g. for an application that interprets the two differently. Especially for BOOLEAN columns, I generally choose to prevent this extra degree of freedom.
I often do something similar to:
ALTER TABLE widgets ALTER COLUMN properties SET NOT NULL;
-- OR:
ALTER TABLE widgets ADD CONSTRAINT its_null_or_nothin CHECK ( properties <> ''::HSTORE );
Depending on what you find easier to remember, you can also create an empty hstore via hstore('{}'::TEXT[]).
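To see it all together, here is a minimal, self-contained sketch (the widgets table and its sample rows are hypothetical; the empty-hstore comparison comes from the answer above):
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE widgets (id serial PRIMARY KEY, properties hstore);

INSERT INTO widgets (properties) VALUES
    ('a=>1'),                  -- non-empty hstore
    (''),                      -- empty hstore
    (hstore('{}'::text[])),    -- also an empty hstore
    (NULL);                    -- NULL, which is distinct from empty

SELECT id FROM widgets WHERE properties = ''::hstore;  -- returns only the two empty rows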

Related

PostgresQL: Value is only allowed in one of two columns

I have a database with two tables, "Config" and "Config_xml", each consisting of the same columns (id, content, modifier, etc...). The only difference is that config only contains non-xml strings in its content column, whereas config_xml contains an xml string in its content column.
Now I'd like to combine these two tables into one, providing a content column and an xml_content column, to simplify querying, because at the moment I always have to query both tables.
Now, is there a way to constrain each row to allow a value in only one of content or xml_content?
Thanks in advance.
You can use a check constraint that requires one column to be null.
alter table the_table
add constraint check_content
check (num_nulls(content, xml_content) = 1);
To also avoid empty strings, you might want to use:
check (num_nulls(nullif(trim(content::text), ''), nullif(trim(xml_content::text), '')) = 1)
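A quick sketch of how that constraint behaves (the table layout and sample values here are hypothetical; num_nulls requires PostgreSQL 9.6 or later):
CREATE TABLE the_table (
    id          serial PRIMARY KEY,
    content     text,
    xml_content xml,
    CONSTRAINT check_content CHECK (num_nulls(content, xml_content) = 1)
);

INSERT INTO the_table (content) VALUES ('plain text');              -- OK: exactly one value
INSERT INTO the_table (xml_content) VALUES ('<a>1</a>');            -- OK
INSERT INTO the_table (content, xml_content) VALUES ('x', '<a/>');  -- fails: num_nulls = 0
INSERT INTO the_table (id) VALUES (DEFAULT);                        -- fails: num_nulls = 2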

Fast way to check if PostgreSQL jsonb column contains certain string

For the past two days I've been reading a lot about jsonb, full text search, GIN indexes, trigram indexes and what not, but I still cannot find a definitive, or at least good enough, answer on how to quickly search whether a row of type JSONB contains a certain string as a value. Since it's a search feature, the behavior should be like that of ILIKE.
What I have is:
A table, let's call it app.table_1, which contains a lot of columns, one of which is of type JSONB; let's call it column_jsonb.
The data inside column_jsonb will always be flat (no nested objects, etc.), but the keys can vary. An example of the data in the column, with obfuscated values, looks like this:
"{""Key1"": ""Value1"", ""Key2"": ""Value2"", ""Key3"": null, ""Key4"": ""Value4"", ""Key5"": ""Value5""}"
I have a GIN index on this column, which doesn't seem to affect the search time significantly (I am testing with 20k records now, which takes about 550ms). The index looks like this:
CREATE INDEX ix_table_1_column_jsonb_gin
ON app.table_1 USING gin
(column_jsonb jsonb_path_ops)
TABLESPACE pg_default;
I am interested only in the VALUES and the way I am searching them now is this:
EXISTS(SELECT value FROM jsonb_each(column_jsonb) WHERE value::text ILIKE search_term)
Here, search_term is a variable coming from the front end with the string that the user is searching for.
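For context, the fragment above as a stand-alone query would look roughly like this (a sketch; search_term stands for the bound parameter supplied by the application):
SELECT t.*
FROM app.table_1 AS t
WHERE EXISTS (
    SELECT 1
    FROM jsonb_each(t.column_jsonb) AS kv(key, value)
    WHERE kv.value::text ILIKE search_term
);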
I have the following questions:
Is it possible to make the check faster without modifying the data model? I've read that a trigram index might be useful for similar cases, but at least to me it seems that converting jsonb to text and then checking will be slower, and I am not sure whether a trigram index will even work if the column's original type is JSONB and I explicitly cast each row to text. If I'm wrong, I would really appreciate an explanation, with an example if possible.
Is there some JSONB function that I am not aware of which offers what I am searching for out of the box? I'm constrained to PostgreSQL v11.9, so some new things coming with version 12 are not available to me.
If it's not possible to achieve a significant improvement with the current data structure, can you propose a way to restructure the data in column_jsonb? Maybe another column of some other type, with the data persisted in some other way, I don't know...
Thank you very much in advance!
If the data structure is flat, and you regularly need to search the values, and the values are all the same type, a traditional key/value table would seem more appropriate.
create table table1_options (
table1_id bigint not null references table1(id),
key text not null,
value text not null
);
create index table1_options_key on table1_options(key);
create index table1_options_value on table1_options(value);
select *
from table1_options
where value ilike 'some search%';
I've used simple B-Tree indexes, but you can use whatever you need to speed up your particular searches.
The downsides are that all values must have the same type (doesn't seem to be a problem here) and you need an extra table for each table. That last one can be mitigated somewhat with table inheritance.
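If you go that route, back-filling the key/value table from the existing JSONB column could look roughly like this (a sketch: table1 stands for the question's app.table_1, and JSON nulls are skipped because value is declared NOT NULL):
INSERT INTO table1_options (table1_id, key, value)
SELECT t.id, kv.key, kv.value
FROM table1 AS t
CROSS JOIN LATERAL jsonb_each_text(t.column_jsonb) AS kv(key, value)
WHERE kv.value IS NOT NULL;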

Prevent non-collation characters in a NVarChar column using constraint?

Little weird requirement, but here it goes. We have a CustomerId VarChar(25) column in a table. We need to make it NVarChar(25) to work around issues with type conversions.
But we don't want to allow non-Latin characters to be stored in this column. Is there any way to place such a constraint on the column? I'd rather let the database handle this check. In general we're OK with NVarChar for all of our strings, but some columns, like IDs, are not good candidates for this because of the possibility of look-alike strings from different languages.
Example:
CustomerId NVarChar(25) - PK
Value 1: BOPOH
Value 2: ВОРОН
Those 2 strings are different (the second one is Cyrillic).
I want to prevent this entry scenario. I want to make sure Value 2 can not be saved into the field.
Just in case it helps somebody. Not sure it's the most "elegant" solution, but I placed a constraint like this on those fields:
ALTER TABLE [dbo].[Carrier] WITH CHECK ADD CONSTRAINT [CK_Carrier_CarrierId] CHECK ((CONVERT([varchar](25),[CarrierId],(0))=[CarrierId]))
GO
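A quick, hypothetical test of the idea (assuming CarrierId is the only column that must be supplied and the database uses a Latin default collation): the round trip through varchar mangles non-Latin characters, so the CHECK rejects them.
INSERT INTO [dbo].[Carrier] (CarrierId) VALUES (N'BOPOH');  -- Latin letters survive the conversion: passes
INSERT INTO [dbo].[Carrier] (CarrierId) VALUES (N'ВОРОН');  -- Cyrillic is mangled by CONVERT: check fails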

Postgres ENUM data type or CHECK CONSTRAINT?

I have been migrating a MySQL db to Pg (9.1), and have been emulating MySQL ENUM data types by creating a new data type in Pg and then using that as the column definition. My question: could I, and would it be better to, use a CHECK CONSTRAINT instead? The MySQL ENUM types are implemented to enforce specific value entries in the rows. Could that be done with a CHECK CONSTRAINT? And, if yes, would it be better (or worse)?
Based on the comments and answers here, and some rudimentary research, I have the following summary to offer for comments from the Postgres-erati. Will really appreciate your input.
There are three ways to restrict entries in a Postgres database table column. Consider a table to store "colors" where you want only 'red', 'green', or 'blue' to be valid entries.
Enumerated data type
CREATE TYPE valid_colors AS ENUM ('red', 'green', 'blue');
CREATE TABLE t (
color VALID_COLORS
);
Advantages are that the type can be defined once and then reused in as many tables as needed. A standard query can list all the values for an ENUM type, and can be used to make application form widgets.
SELECT n.nspname AS enum_schema,
       t.typname AS enum_name,
       e.enumlabel AS enum_value
FROM pg_type t
JOIN pg_enum e ON t.oid = e.enumtypid
JOIN pg_catalog.pg_namespace n ON n.oid = t.typnamespace
WHERE t.typname = 'valid_colors';
enum_schema | enum_name | enum_value
-------------+---------------+------------
public | valid_colors | red
public | valid_colors | green
public | valid_colors | blue
Disadvantages are that the ENUM type is stored in the system catalogs, so a query like the one above is required to view its definition. The values are not apparent when viewing the table definition. And, since an ENUM type is actually a data type separate from the built-in NUMERIC and TEXT data types, the regular numeric and string operators and functions don't work on it. So, one can't do a query like
SELECT * FROM t WHERE color LIKE 'bl%';
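For instance (a sketch using the table above), the comparison only works after an explicit cast to text:
INSERT INTO t (color) VALUES ('blue');

SELECT * FROM t WHERE color LIKE 'bl%';        -- fails: there is no LIKE operator for the enum type
SELECT * FROM t WHERE color::text LIKE 'bl%';  -- works, but only with the explicit cast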
Check constraints
CREATE TABLE t (
colors TEXT CHECK (colors IN ('red', 'green', 'blue'))
);
Two advantages are that, one, "what you see is what you get," that is, the valid values for the column are recorded right in the table definition, and two, all native string or numeric operators work.
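A brief sketch of both points, using the table above:
INSERT INTO t (colors) VALUES ('blue');     -- passes the CHECK
INSERT INTO t (colors) VALUES ('purple');   -- rejected by the CHECK
SELECT * FROM t WHERE colors LIKE 'bl%';    -- plain TEXT, so no cast is needed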
Foreign keys
CREATE TABLE valid_colors (
id SERIAL PRIMARY KEY NOT NULL,
color TEXT
);
INSERT INTO valid_colors (color) VALUES
('red'),
('green'),
('blue');
CREATE TABLE t (
color_id INTEGER REFERENCES valid_colors (id)
);
Essentially the same as creating an ENUM type, except, the native numeric or string operators work, and one doesn't have to query system catalogs to discover the valid values. A join is required to link the color_id to the desired text value.
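The lookup join, as a sketch:
SELECT t.color_id, vc.color
FROM t
JOIN valid_colors AS vc ON vc.id = t.color_id
WHERE vc.color LIKE 'bl%';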
As other answers point out, check constraints have flexibility issues, but setting a foreign key on an integer id requires joining during lookups. Why not just use the allowed values as natural keys in the reference table?
To adapt the schema from punkish's answer:
CREATE TABLE valid_colors (
color TEXT PRIMARY KEY
);
INSERT INTO valid_colors (color) VALUES
('red'),
('green'),
('blue');
CREATE TABLE t (
color TEXT REFERENCES valid_colors (color) ON UPDATE CASCADE
);
Values are stored inline as in the check constraint case, so there are no joins, but new valid value options can be easily added and existing values can be updated via ON UPDATE CASCADE (e.g. if it's decided "red" should actually be "Red", update valid_colors accordingly and the change propagates automatically).
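For example, a sketch of the rename described above:
UPDATE valid_colors SET color = 'Red' WHERE color = 'red';
-- rows in t that referenced 'red' now read 'Red'; no separate UPDATE on t is needed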
One of the big disadvantages of Foreign keys vs Check constraints is that any reporting or UI displays will have to perform a join to resolve the id to the text.
In a small system this is not a big deal but if you are working on a HR or similar system with very many small lookup tables then this can be a very big deal with lots of joins taking place just to get the text.
My recommendation would be that if the values are few and rarely changing, then use a constraint on a text field otherwise use a lookup table against an integer id field.
PostgreSQL has enum types, and they work as they should. I don't know if an enum is "better" than a constraint; they just both work.
From my point of view, given the same set of values:
An enum is a better solution if you will use it on multiple columns.
If you want to limit the values of only one column in your application, a check constraint is a better solution.
Of course, there are a whole lot of other parameters which could creep into your decision process (typically, the fact that built-in operators are not available), but I think these two are the most prevalent ones.
I'm hoping somebody will chime in with a good answer from the PostgreSQL database side as to why one might be preferable to the other.
From a software developer point of view, I have a slight preference for using check constraints, since PostgreSQL enums require a cast in your SQL to do an update/insert, such as:
INSERT INTO table1 (colA, colB) VALUES('foo', 'bar'::myenum)
where "myenum" is the enum type you specified in PostgreSQL.
This certainly makes the SQL non-portable (which may not be a big deal for most people), but also is just yet another thing you have to deal with while developing applications, so I prefer having VARCHARs (or other typical primitives) with check constraints.
As a side note, I've noticed that MySQL enums do not require this type of cast, so this is something particular to PostgreSQL in my experience.

Why does Postgres handle NULLs inconsistently where unique constraints are involved?

I recently noticed an inconsistency in how Postgres handles NULLs in columns with a unique constraint.
Consider a table of people:
create table People (
pid int not null,
name text not null,
SSN text unique,
primary key (pid)
);
The SSN column should be kept unique. We can check that:
-- Add a row.
insert into People(pid, name, SSN)
values(0, 'Bob', '123');
-- Test the unique constraint.
insert into People(pid, name, SSN)
values(1, 'Carol', '123');
The second insert fails because it violates the unique constraint on SSN. So far, so good. But let's try a NULL:
insert into People(pid, name, SSN)
values(1, 'Carol', null);
That works.
select *
from People;
0;"Bob";"123"
1;"Carol";"<NULL>"
A unique column will take a null. Interesting. How can Postgres assert that null is in any way unique, or not unique for that matter?
I wonder if I can add two rows with null in a unique column.
insert into People(pid, name, SSN)
values(2, 'Ted', null);
select *
from People;
0;"Bob";"123"
1;"Carol";"<NULL>"
2;"Ted";"<NULL>"
Yes I can. Now there are two rows with NULL in the SSN column even though SSN is supposed to be unique.
The Postgres documentation says, "For the purpose of a unique constraint, null values are not considered equal."
Okay. I can see the point of this. It's a nice subtlety in null-handling: By considering all NULLs in a unique-constrained column to be disjoint, we delay the unique constraint enforcement until there is an actual non-null value on which to base that enforcement.
That's pretty cool. But here's where Postgres loses me. If all NULLs in a unique-constrained column are not equal, as the documentation says, then we should see all of the nulls in a select distinct query.
select distinct SSN
from People;
"<NULL>"
"123"
Nope. There's only a single null there. It seems like Postgres has this wrong. But I wonder: Is there another explanation?
Edit:
The Postgres docs do specify, in the section on SELECT DISTINCT, that "null values are considered equal in this comparison." While I do not understand that notion, I'm glad it's spelled out in the docs.
It is almost always a mistake when dealing with null to say:
"nulls behave like so-and-so here, so they should behave like such-and-such here"
There is an excellent essay on the subject from a Postgres perspective. It can be briefly summed up by saying that nulls are treated differently depending on the context, and that you shouldn't make the mistake of assuming anything about them.
The bottom line is, PostgreSQL does what it does with nulls because the SQL standard says so.
Nulls are obviously tricky and can be interpreted in multiple ways (unknown value, absent value, etc.), and so when the SQL standard was initially written, the authors had to make some calls at certain places. I'd say time has proved them more or less right, but that doesn't mean that there couldn't be another database language that handles unknown and absent values slightly (or wildly) differently. But PostgreSQL implements SQL, so that's that.
As was already mentioned in a different answer, Jeff Davis has written some good articles and presentations on dealing with nulls.
NULL is considered to be unique because NULL doesn't represent the absence of a value. A NULL in a column is an unknown value. When you compare two unknowns, you don't know whether or not they are equal because you don't know what they are.
Imagine that you have two boxes marked A and B. If you don't open the boxes and you can't see inside, you never know what the contents are. If you're asked "Are the contents of these two boxes the same?" you can only answer "I don't know".
In this case, PostgreSQL will do the same thing. When asked to compare two NULLs, it says "I don't know." This has a lot to do with the crazy semantics around NULL in SQL databases. The article linked to in the other answer is an excellent starting point to understanding how NULLs behave. Just beware: it varies by vendor.
Multiple NULL values in a unique index are okay because x = NULL is false for all x and, in particular, when x is itself NULL. You'll also run into this behavior in WHERE clauses where you have to say WHERE x IS NULL and WHERE x IS NOT NULL rather than WHERE x = NULL and WHERE x <> NULL.
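A short sketch of that behaviour, using the People table from the question:
SELECT NULL = NULL;                      -- yields NULL, not true
SELECT * FROM People WHERE SSN = NULL;   -- matches nothing, for the same reason
SELECT * FROM People WHERE SSN IS NULL;  -- returns Carol and Ted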