Postgres 9.5 how to support boolean gin index - postgresql

Since btree_gin in 9.5 does not support boolean data type, how can I use boolean column as part of multi-column gin index?

Technically, it's possible, but you need to index the (is_read::int::bit) expression (instead of the column directly). But: you would need to use this expression in your WHERE clauses, to make use of this index; i.e.:
WHERE is_read::int::bit = '1'
-- or
WHERE is_read::int::bit = '0'
-- or even
WHERE is_read::int::bit < '1' -- which is just an obfuscated version of "= '0'"
However, this will make your queries less readable. And maybe even slower (see later).
If you only ever query for one value (i.e. WHERE is_read or WHERE NOT is_read, but not both), a partial index would be a better fit.
Dropping the column from the index could also make it (somewhat) more compact, which can even speed up your queries in some cases.
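As a rough sketch of the two index variants discussed above: the table messages and its tags column are hypothetical (they do not appear in the question), and the btree_gin extension is assumed to be installable on your server.

```sql
-- Hypothetical schema, for illustration only.
CREATE EXTENSION IF NOT EXISTS btree_gin;  -- supplies a GIN operator class for bit

CREATE TABLE messages (
    id      bigserial PRIMARY KEY,
    tags    text[]  NOT NULL,
    is_read boolean NOT NULL
);

-- Variant 1: multi-column GIN index on the cast expression.
-- Queries must repeat the exact expression to use it.
CREATE INDEX messages_tags_is_read_idx
    ON messages USING gin (tags, (is_read::int::bit));

SELECT id
FROM   messages
WHERE  tags @> ARRAY['urgent']
AND    is_read::int::bit = '0';

-- Variant 2: partial GIN index, useful if only unread rows are ever searched.
CREATE INDEX messages_unread_tags_idx
    ON messages USING gin (tags)
    WHERE NOT is_read;

SELECT id
FROM   messages
WHERE  tags @> ARRAY['urgent']
AND    NOT is_read;
```

Comparing the plans with EXPLAIN (ANALYZE, BUFFERS) on your real data should show which variant pays off.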
I advise you to test each of these methods on your actual data, or show us (in another, follow-up question) the queries you are concerned with.
Here is a comparison for the above cases with some fairly artificial data:
http://rextester.com/OWXUA55980

Related

Eliminate accents of a string in postgresql [duplicate]

In Microsoft SQL Server, it's possible to specify an "accent insensitive" collation (for a database, table or column), which means that it's possible for a query like
SELECT * FROM users WHERE name LIKE 'João'
to find a row with a Joao name.
I know that it's possible to strip accents from strings in PostgreSQL using the unaccent_string contrib function, but I'm wondering if PostgreSQL supports these "accent insensitive" collations so the SELECT above would work.
Update for Postgres 12 or later
Postgres 12 adds nondeterministic ICU collations, enabling case-insensitive and accent-insensitive grouping and ordering. The manual:
ICU locales can only be used if support for ICU was configured when PostgreSQL was built.
If so, this works for you:
CREATE COLLATION ignore_accent (provider = icu, locale = 'und-u-ks-level1-kc-true', deterministic = false);
CREATE INDEX users_name_ignore_accent_idx ON users(name COLLATE ignore_accent);
SELECT * FROM users WHERE name = 'João' COLLATE ignore_accent;
Read the manual for details.
This blog post by Laurenz Albe may help to understand.
But ICU collations also have drawbacks. The manual:
[...] they also have some drawbacks. Foremost, their use leads to a
performance penalty. Note, in particular, that B-tree cannot use
deduplication with indexes that use a nondeterministic collation.
Also, certain operations are not possible with nondeterministic
collations, such as pattern matching operations. Therefore, they
should be used only in cases where they are specifically wanted.
My "legacy" solution may still be superior:
For all versions
Use the unaccent module for that - which is completely different from what you are linking to.
unaccent is a text search dictionary that removes accents (diacritic
signs) from lexemes.
Install once per database with:
CREATE EXTENSION unaccent;
If you get an error like:
ERROR: could not open extension control file
"/usr/share/postgresql/<version>/extension/unaccent.control": No such file or directory
Install the contrib package on your database server as instructed in this related answer:
Error when creating unaccent extension on PostgreSQL
Among other things, it provides the function unaccent() you can use with your example (where LIKE seems not needed).
SELECT *
FROM users
WHERE unaccent(name) = unaccent('João');
Index
To use an index for that kind of query, create an index on the expression. However, Postgres only accepts IMMUTABLE functions for indexes. If a function can return a different result for the same input, the index could silently break.
unaccent() only STABLE not IMMUTABLE
Unfortunately, unaccent() is only STABLE, not IMMUTABLE. According to this thread on pgsql-bugs, this is due to three reasons:
It depends on the behavior of a dictionary.
There is no hard-wired connection to this dictionary.
It therefore also depends on the current search_path, which can change easily.
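To illustrate the restriction, here is what happens when the shipped function is indexed directly. This assumes the users table from the question and an installed unaccent extension; the index name is made up.

```sql
-- Attempt to index the STABLE function as shipped by the module:
CREATE INDEX users_unaccent_raw_idx ON users (unaccent(name));
-- ERROR:  functions in index expression must be marked IMMUTABLE
```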
Some tutorials on the web instruct to just alter the function volatility to IMMUTABLE. This brute-force method can break under certain conditions.
Others suggest a simple IMMUTABLE wrapper function (like I did myself in the past).
There is an ongoing debate whether to make the variant with two parameters IMMUTABLE which declares the used dictionary explicitly. Read here or here.
Another alternative would be this module with an IMMUTABLE unaccent() function by Musicbrainz, provided on Github. Haven't tested it myself. I think I have come up with a better idea:
Best for now
This approach is more efficient than other solutions floating around, and safer.
Create an IMMUTABLE SQL wrapper function executing the two-parameter form with hard-wired, schema-qualified function and dictionary.
Since nesting a non-immutable function would disable function inlining, base it on a copy of the C-function, (fake) declared IMMUTABLE as well. Its only purpose is to be used in the SQL function wrapper. Not meant to be used on its own.
The sophistication is needed as there is no way to hard-wire the dictionary in the declaration of the C function. (Would require to hack the C code itself.) The SQL wrapper function does that and allows both function inlining and expression indexes.
CREATE OR REPLACE FUNCTION public.immutable_unaccent(regdictionary, text)
RETURNS text
LANGUAGE c IMMUTABLE PARALLEL SAFE STRICT AS
'$libdir/unaccent', 'unaccent_dict';
Then:
CREATE OR REPLACE FUNCTION public.f_unaccent(text)
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT AS
$func$
SELECT public.immutable_unaccent(regdictionary 'public.unaccent', $1)
$func$;
In Postgres 14 or later, an SQL-standard function is slightly cheaper still:
CREATE OR REPLACE FUNCTION public.f_unaccent(text)
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT
BEGIN ATOMIC
SELECT public.immutable_unaccent(regdictionary 'public.unaccent', $1);
END;
See:
What does BEGIN ATOMIC mean in a PostgreSQL SQL function / procedure?
Drop PARALLEL SAFE from both functions for Postgres 9.5 or older.
public being the schema where you installed the extension (public is the default).
The explicit type declaration (regdictionary) defends against hypothetical attacks with overloaded variants of the function by malicious users.
Previously, I advocated a wrapper function based on the STABLE function unaccent() shipped with the unaccent module. That disabled function inlining. This version executes ten times faster than the simple wrapper function I had here earlier.
And that was already twice as fast as the first version which added SET search_path = public, pg_temp to the function - until I discovered that the dictionary can be schema-qualified, too. Still (Postgres 12) not too obvious from documentation.
If you lack the necessary privileges to create C functions, you are back to the second best implementation: An IMMUTABLE function wrapper around the STABLE unaccent() function provided by the module:
CREATE OR REPLACE FUNCTION public.f_unaccent(text)
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT AS
$func$
SELECT public.unaccent('public.unaccent', $1) -- schema-qualify function and dictionary
$func$;
Finally, the expression index to make queries fast:
CREATE INDEX users_unaccent_name_idx ON users(public.f_unaccent(name));
Remember to recreate indexes involving this function after any change to function or dictionary, like an in-place major release upgrade that would not recreate indexes. Recent major releases all had updates for the unaccent module.
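A minimal sketch of such a rebuild, using the index name from the example above. A plain REINDEX blocks writes to the table while it runs; the CONCURRENTLY variant is only available in Postgres 12 or later.

```sql
-- Rebuild the expression index after the function or dictionary changed.
REINDEX INDEX users_unaccent_name_idx;

-- Postgres 12 or later: rebuild without blocking writes.
REINDEX INDEX CONCURRENTLY users_unaccent_name_idx;
```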
Adapt queries to match the index (so the query planner will use it):
SELECT * FROM users
WHERE f_unaccent(name) = f_unaccent('João');
We don't need the function in the expression to the right of the operator. There we can also supply unaccented strings like 'Joao' directly.
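For example, a query with an already unaccented literal on the right-hand side could look like this (a sketch relying on the f_unaccent() wrapper and the expression index defined above):

```sql
-- The left-hand expression matches the index; the literal needs no function call.
SELECT *
FROM   users
WHERE  f_unaccent(name) = 'Joao';
```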
The faster function does not translate to much faster queries using the expression index. Index look-ups operate on pre-computed values and are very fast either way. But index maintenance and queries not using the index benefit. And access methods like bitmap index scans may have to recheck values in the heap (the main relation), which involves executing the underlying function. See:
"Recheck Cond:" line in query plans with a bitmap index scan
Security for client programs has been tightened with Postgres 10.3 / 9.6.8 etc. You need to schema-qualify function and dictionary name as demonstrated when used in any indexes. See:
'text search dictionary “unaccent” does not exist' entries in postgres log, supposedly during automatic analyze
Ligatures
In Postgres 9.5 or older ligatures like 'Œ' or 'ß' have to be expanded manually (if you need that), since unaccent() always substitutes a single letter:
SELECT unaccent('Œ Æ œ æ ß');
unaccent
----------
E A e a S
You will love this update to unaccent in Postgres 9.6:
Extend contrib/unaccent's standard unaccent.rules file to handle all
diacritics known to Unicode, and expand ligatures correctly (Thomas
Munro, Léonard Benedetti)
Bold emphasis mine. Now we get:
SELECT unaccent('Œ Æ œ æ ß');
unaccent
----------
OE AE oe ae ss
Pattern matching
For LIKE or ILIKE with arbitrary patterns, combine this with the module pg_trgm in PostgreSQL 9.1 or later. Create a trigram GIN (typically preferable) or GIST expression index. Example for GIN:
CREATE INDEX users_unaccent_name_trgm_idx ON users
USING gin (f_unaccent(name) gin_trgm_ops);
Can be used for queries like:
SELECT * FROM users
WHERE f_unaccent(name) LIKE ('%' || f_unaccent('João') || '%');
GIN and GIST indexes are more expensive (to maintain) than plain B-tree:
Difference between GiST and GIN index
There are simpler solutions for just left-anchored patterns. More about pattern matching and performance:
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
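For the left-anchored case, one possible sketch is a B-tree expression index with the text_pattern_ops operator class, which is what LIKE typically needs when the database does not use the C locale. The index name is made up, and f_unaccent() is the wrapper defined earlier.

```sql
-- B-tree index usable for left-anchored LIKE patterns.
CREATE INDEX users_unaccent_name_pattern_idx
    ON users (public.f_unaccent(name) text_pattern_ops);

-- Prefix search; the pattern must not start with a wildcard.
SELECT *
FROM   users
WHERE  public.f_unaccent(name) LIKE 'Joao%';
```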
pg_trgm also provides useful operators for "similarity" (%) and "distance" (<->).
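A hedged sketch of how these operators might be combined with f_unaccent(), assuming pg_trgm is installed. The % operator filters by the configured similarity threshold, and <-> returns the trigram distance. A GIN trigram index can support the % filter, but ordering by <-> is only index-assisted with GiST.

```sql
-- Closest unaccented matches for a possibly misspelled search term.
SELECT name,
       f_unaccent(name) <-> 'Joao' AS distance
FROM   users
WHERE  f_unaccent(name) % 'Joao'
ORDER  BY distance
LIMIT  10;
```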
Trigram indexes also support simple regular expressions with ~ et al. and case insensitive pattern matching with ILIKE:
PostgreSQL accent + case insensitive search
No, PostgreSQL does not support collations in that sense
PostgreSQL does not support collations like that (accent insensitive or not) because no comparison can return equal unless things are binary-equal. This is because internally it would introduce a lot of complexities for things like a hash index. For this reason collations in their strictest sense only affect ordering and not equality.
Workarounds
Full-Text-Search Dictionary that Unaccents lexemes.
For FTS, you can define your own dictionary using unaccent,
CREATE EXTENSION unaccent;
CREATE TEXT SEARCH CONFIGURATION mydict ( COPY = simple );
ALTER TEXT SEARCH CONFIGURATION mydict
ALTER MAPPING FOR hword, hword_part, word
WITH unaccent, simple;
Which you can then index with a functional index,
-- Just some sample data...
CREATE TABLE myTable ( myCol )
AS VALUES ('fóó bar baz'),('qux quz');
-- No index required, but feel free to create one
CREATE INDEX ON myTable
USING GIST (to_tsvector('mydict', myCol));
You can now query it very simply
SELECT *
FROM myTable
WHERE to_tsvector('mydict', myCol) @@ 'foo & bar';
mycol
-------------
fóó bar baz
(1 row)
See also
Creating a case-insensitive and accent/diacritics insensitive search on a field
Unaccent by itself.
The unaccent module can also be used by itself without FTS-integration, for that check out Erwin's answer
I'm pretty sure PostgreSQL relies on the underlying operating system for collation. It does support creating new collations, and customizing collations. I'm not sure how much work that might be for you, though. (Could be quite a lot.)

How to access a HSTORE column using PostgreSQL C library (libpq)?

I cannot find any documentation regarding HSTORE data access using the C library. Currently I'm considering to just convert the HSTORE columns into arrays in my queries but is there a way to avoid such conversions?
libpqtypes appears to have some support for hstore.
Another option is to avoid directly interacting with hstore in your code. You can still benefit from it in the database without dealing with its text representation on the client side. Say you want to fetch a hstore field; you just use:
SELECT t.id, kv.k, kv.v FROM thetable t, LATERAL each(t.hstorefield) AS kv(k, v);
or on old PostgreSQL versions you can use the quirky and nonstandard set-returning-function-in-SELECT form:
SELECT t.id, each(t.hstorefield) FROM thetable t;
(but watch out if selecting multiple records from t this way: you'll get weird results, whereas LATERAL will be fine).
Another option is to use hstore_to_array or hstore_to_matrix when querying, if you're comfortable dealing with PostgreSQL array representation.
To create hstore values you can use the hstore constructors that take arrays. Those arrays can in turn be created with array_agg over a VALUES clause if you don't want to deal with PostgreSQL's array representation in your code.
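As a small sketch of that idea, the two-array hstore constructor can be fed from a VALUES list. The keys and values here are made up; in client code they would typically be bound as individual text parameters.

```sql
-- Build one hstore value from key/value pairs without writing an array literal.
SELECT hstore(array_agg(k), array_agg(v)) AS h
FROM  (VALUES ('color', 'red'),
              ('size',  'L')) AS kv(k, v);
```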
All this mess should go away in future, as PostgreSQL 9.4 is likely to have much better interoperation between hstore and json types, allowing you to just use the json representation when interacting with hstore.
The binary protocol for hstore is not complicated.
See the _send and _recv functions from its IO code.
Of course, that means requesting (or binding) it in binary format in libpq.
(see the paramFormats[] and resultFormat arguments to PQexecParams)

Why do Postgres Hstore indexes work for ? (operator) and not for EXIST (function)?

http://www.postgresql.org/docs/9.2/static/hstore.html states:
hstore has GiST and GIN index support for the @>, ?, ?& and ?| operators
Yet the indexes don't work for the EXIST function, which appears to be equivalent to the ? operator.
What is the difference between operators and functions that makes it harder to index one or the other?
Might future versions of the Hstore extension make these truly equivalent?
Look up the documentation for "CREATE OPERATOR CLASS", which describes how you can create indexing methods for arbitrary operators. You also need to use "CREATE OPERATOR" to create an operator based on the EXIST function first.
(Caveat: I have no experience with hstore)
http://www.postgresql.org/docs/9.0/static/sql-createoperator.html
http://www.postgresql.org/docs/9.0/static/sql-createopclass.html
Here's your problem: PostgreSQL functions are planner-opaque. The planner has no way of knowing that the operator and the function are semantically equivalent. This comes up a lot.
PostgreSQL does have functional indexes so you can index outputs of immutable functions but this may not quite make things work perfectly well here since you'd probably be able to only index which rows return true for a given call, but this could still be very useful with partial indexes. For example you could always do something like:
CREATE INDEX bar_has_aaa ON foo(exist(bar, 'aaa'));
or
CREATE INDEX bar_has_aaa ON foo(id) WHERE exist(bar, 'aaa');
But I don't see this going exactly where you need it to go. Hopefully it points you in the right direction though.
Edit: The following strikes me as a better workaround. Suppose we have a table foo:
CREATE TABLE foo (
id serial,
bar hstore
);
We can create a table method bar_keys:
CREATE FUNCTION bar_keys(foo) RETURNS text[] IMMUTABLE LANGUAGE SQL AS $$
SELECT akeys($1.bar);
$$;
Then we can index that using GIN:
CREATE INDEX foo_bar_keys_idx ON foo USING gin(bar_keys(foo));
And we can use it in our queries:
SELECT * FROM foo WHERE foo.bar_keys @> array['aaa'];
That should use an index. Note you could just index/use akeys directly, but I think a virtual column leads to cleaner syntax.
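The direct variant mentioned in the note could be sketched as follows, assuming the table foo from above and that hstore's akeys() is declared IMMUTABLE in your installation (the index name is made up):

```sql
-- Index the key array directly, without the bar_keys() wrapper.
CREATE INDEX foo_bar_akeys_idx ON foo USING gin (akeys(bar));

-- The query has to repeat the indexed expression.
SELECT *
FROM   foo
WHERE  akeys(bar) @> ARRAY['aaa'];
```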

Is it possible to use CASE with IN?

I'm trying to construct a T-SQL statement with a WHERE clause determined by an input parameter. Something like:
SELECT * FROM table
WHERE id IN
CASE WHEN @param THEN
(1,2,4,5,8)
ELSE
(9,7,3)
END
I've tried all combination of moving the IN, CASE etc around that I can think of. Is this (or something like it) possible?
try this:
SELECT * FROM table
WHERE (@param='??' AND id IN (1,2,4,5,8))
OR (@param!='??' AND id in (9,7,3))
this will have a problem using an index.
The key with dynamic search conditions is to make sure an index is used, rather than to reuse code easily, eliminate duplication in a query, or do everything with a single query. Here is a very comprehensive article on how to handle this topic:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
It covers all the issues and methods of trying to write queries with multiple optional search conditions. The main thing you need to be concerned with is not the duplication of code, but the use of an index. If your query fails to use an index, it will perform poorly. There are several techniques that can be used, which may or may not allow an index to be used.
here is the table of contents:
Introduction
The Case Study: Searching Orders
The Northgale Database
Dynamic SQL
Introduction
Using sp_executesql
Using the CLR
Using EXEC()
When Caching Is Not Really What You Want
Static SQL
Introduction
x = @x OR @x IS NULL
Using IF statements
Umachandar's Bag of Tricks
Using Temp Tables
x = @x AND @x IS NOT NULL
Handling Complex Conditions
Hybrid Solutions – Using both Static and Dynamic SQL
Using Views
Using Inline Table Functions
Conclusion
Feedback and Acknowledgements
Revision History
If you are on the proper version of SQL Server 2008, there is an additional technique that can be used, see: Dynamic Search Conditions in T-SQL Version for SQL 2008 (SP1 CU5 and later)
If you are on that proper release of SQL Server 2008, you can just add OPTION (RECOMPILE) to the query and the local variable's value at run time is used for the optimizations.
Consider this, OPTION (RECOMPILE) will take this code (where no index can be used with this mess of ORs):
WHERE
(@search1 IS NULL or Column1=@Search1)
AND (@search2 IS NULL or Column2=@Search2)
AND (@search3 IS NULL or Column3=@Search3)
and optimize it at run time to be (provided that only @Search2 was passed in with a value):
WHERE
Column2=@Search2
and an index can be used (if you have one defined on Column2)
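Applied to the query from the question, a sketch might look like the following. The table name tbl and the bit type for @param are assumptions; with OPTION (RECOMPILE) on a suitable SQL Server build, the optimizer should be able to discard the branch that cannot match for the current value of @param.

```sql
-- Assumed: @param is a bit flag, tbl has an index on id.
DECLARE @param bit = 1;

SELECT *
FROM   tbl
WHERE  (@param = 1 AND id IN (1, 2, 4, 5, 8))
   OR  (@param = 0 AND id IN (9, 7, 3))
OPTION (RECOMPILE);
```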
if @param = 'whatever'
select * from tbl where id in (1,2,4,5,8)
else
select * from tbl where id in (9,7,3)

Does Firebird need manual reindexing?

I use both Firebird embedded and Firebird Server, and from time to time I need to reindex the tables using a procedure like the following:
CREATE PROCEDURE MAINTENANCE_SELECTIVITY
AS
DECLARE VARIABLE S VARCHAR(200);
BEGIN
FOR select RDB$INDEX_NAME FROM RDB$INDICES INTO :S DO
BEGIN
S = 'SET statistics INDEX ' || s || ';';
EXECUTE STATEMENT :s;
END
SUSPEND;
END
I guess this is normal using embedded, but is it really needed using a server? Is there a way to configure the server to do it automatically when required or periodically?
First, let me point out that I'm no Firebird expert, so I'm answering on the basis of how SQL Server works.
In that case, the answer is both yes, and no.
The indexes are of course updated on SQL Server, in the sense that if you insert a new row, all indexes for that table will contain that row, so it will be found. So basically, you don't need to keep reindexing the tables for that part to work. That's the "no" part.
The problem, however, is not with the index, but with the statistics. You're saying that you need to reindex the tables, but then you show code that manipulates statistics, and that's why I'm answering.
The short answer is that statistics slowly go out of whack as time goes by. They might not deteriorate to a point where they're unusable, but they will deteriorate from the perfect level they're at when you recreate/recalculate them. That's the "yes" part.
The main problem with stale statistics is that if the distribution of the keys in the indexes changes drastically, the statistics might not pick that up right away, and thus the query optimizer will pick the wrong indexes, based on the old, stale, statistics data it has on hand.
For instance, let's say one of your indexes has statistics that says that the keys are clumped together in one end of the value space (for instance, int-column with lots of 0's and 1's). Then you insert lots and lots of rows with values that make this index contain values spread out over the entire spectrum.
If you now do a query that uses a join from another table, on a column with low selectivity (also lots of 0's and 1's) against the table with this index of yours, the query optimizer might deduce that this index is good, since it will fetch many rows that will be used at the same time (they're on the same data page).
However, since the data has changed, it'll jump all over the index to find the relevant pieces, and thus not be so good after all.
After recalculating the statistics, the query optimizer might see that this index is sub-optimal for this query, and pick another index instead, which is more suited.
Basically, you need to recalculate the statistics periodically if your data is in flux. If your data rarely changes, you probably don't need to do it very often, but I would still add a maintenance job with some regularity that does this.
As for whether or not it is possible to ask Firebird to do it on its own, then again, I'm on thin ice, but I suspect it is. In SQL Server you can set up maintenance jobs that do this on a schedule, and at the very least you should be able to kick off a batch file from the Windows scheduler to do something like it.
That does not reindex; it recomputes the selectivity of the indexes, which the optimizer uses to select the best index. You don't need to do that unless the index size changes a lot. If you create the index before you add data, you need to do the recalculation.
Embedded and Server should have exactly the same functionality apart from the process model.
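For a single index, the recalculation can be run directly, and the stored selectivity can be inspected in the system table. This is a sketch only: the index name IDX_CUSTOMER_NAME is hypothetical, and the filter on RDB$SYSTEM_FLAG assumes you want to skip system indexes.

```sql
-- Recompute the selectivity of one (hypothetical) index.
SET STATISTICS INDEX IDX_CUSTOMER_NAME;

-- Inspect the stored selectivity of user-defined indexes.
SELECT RDB$INDEX_NAME, RDB$RELATION_NAME, RDB$STATISTICS
FROM   RDB$INDICES
WHERE  COALESCE(RDB$SYSTEM_FLAG, 0) = 0
ORDER  BY RDB$RELATION_NAME, RDB$INDEX_NAME;
```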
I wanted to update this answer for newer Firebird. Here is the updated DSQL.
SET TERM ^ ;
CREATE OR ALTER PROCEDURE NEW_PROCEDURE
AS
DECLARE VARIABLE S VARCHAR(300);
begin
FOR select 'SET statistics INDEX ' || RDB$INDEX_NAME || ';'
FROM RDB$INDICES
WHERE RDB$INDEX_NAME <> 'PRIMARY' INTO :S
DO BEGIN
EXECUTE STATEMENT :s;
END
end^
SET TERM ; ^
GRANT EXECUTE ON PROCEDURE NEW_PROCEDURE TO SYSDBA;