Indexing an array for full text search - postgresql

I am trying to index documents to be searchable on their tag array.
CREATE INDEX doc_search_idx ON documents
USING gin (
to_tsvector('english', array_to_string(tags, ' ')) ||
to_tsvector('english', coalesce(notes, '')));
Here tags is a (ci)text[]. However, PG refuses to use array_to_string() in the index expression because it is not marked IMMUTABLE:
PG::InvalidObjectDefinition: ERROR: functions in index expression must be marked IMMUTABLE
I've tried creating a homebrew immutable array_to_string function, but I feel like I'm playing with fire as I don't know what I'm doing. Any way not to re-implement it?
It looks like I could just repackage the same function and label it immutable, but there seem to be risks in doing that.
How do I index the array for full-text search?

In my initial answer I suggested a plain cast to text: tags::text. However, while most casts to text from basic types are defined IMMUTABLE, this is not the case for array types. Obviously because (quoting Tom Lane in a post to pgsql-general):
Because it's implemented via array_out/array_in rather than any more
direct method, and those are marked stable because they potentially
invoke non-immutable element I/O functions.
Bold emphasis mine.
We can work with that. The general case cannot be marked IMMUTABLE. But for the case at hand (casting citext[] or text[] to text) we can safely assume immutability. Create a simple IMMUTABLE SQL function that wraps the cast. However, the appeal of my simple solution is mostly gone now. You might as well wrap array_to_string() (like you already pondered), for which similar considerations apply.
For citext[] (create separate functions for text[] if needed):
Either (based on a plain cast to text):
CREATE OR REPLACE FUNCTION f_ciarr2text(citext[])
RETURNS text LANGUAGE sql IMMUTABLE AS 'SELECT $1::text';
This is faster.
Or (using array_to_string() for a result without curly braces):
CREATE OR REPLACE FUNCTION f_ciarr2text(citext[])
RETURNS text LANGUAGE sql IMMUTABLE AS $$SELECT array_to_string($1, ',')$$;
This is a bit more correct.
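To illustrate the difference between the two variants (assumes the citext extension is installed; sample value only):
SELECT '{foo,bar}'::citext[]::text;                 -- '{foo,bar}' (with curly braces)
SELECT array_to_string('{foo,bar}'::citext[], ','); -- 'foo,bar'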
Then:
CREATE INDEX doc_search_idx ON documents USING gin (
to_tsvector('english', COALESCE(f_ciarr2text(tags), '')
|| ' ' || COALESCE(notes,'')));
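A query must repeat the index expression verbatim to use the index. A sketch (the search term 'foo' is illustrative):
SELECT *
FROM documents
WHERE to_tsvector('english', COALESCE(f_ciarr2text(tags), '')
|| ' ' || COALESCE(notes,'')) @@ to_tsquery('english', 'foo');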
I did not use the polymorphic type ANYARRAY like in your answer, because I know text[] or citext[] are safe, but I can't vouch for all other array types.
Tested in Postgres 9.4 and works for me.
I added a space between the two strings to avoid false positive matches across the concatenated strings. There is an example in the manual.
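A quick illustration of the false positives the separator prevents:
SELECT to_tsvector('english', 'super' || 'car');        -- 'supercar':1
SELECT to_tsvector('english', 'super' || ' ' || 'car'); -- 'car':2 'super':1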
If you sometimes want to search just tags or just notes, consider a multicolumn index instead:
CREATE INDEX doc_search_idx ON documents USING gin (
to_tsvector('english', COALESCE(f_ciarr2text(tags), '')),
to_tsvector('english', COALESCE(notes,'')));
The risks you are referring to apply to temporal functions mostly, which are used in the referenced question. If time zones (or just the type timestamptz) are involved, results are not actually immutable. We do not lie about immutability here. Our functions are actually IMMUTABLE. Postgres just can't tell from the general implementation it uses.
Related
Often people think they need text search, while similarity search with trigram indexes would be a better fit:
PostgreSQL LIKE query performance variations
Not relevant in this exact case, but while working with citext, consider this:
Index on column with data type citext not used

Here's my naive solution, to wrap it and call it immutable, as suspected.
CREATE FUNCTION immutable_array_to_string(arr ANYARRAY, sep TEXT)
RETURNS text
AS $$
SELECT array_to_string(arr, sep);
$$
LANGUAGE SQL
IMMUTABLE
;
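A sketch of how this wrapper slots into the index from the accepted answer (mirroring its COALESCE pattern):
CREATE INDEX doc_search_idx ON documents USING gin (
to_tsvector('english',
coalesce(immutable_array_to_string(tags, ' '), '')
|| ' ' || coalesce(notes, '')));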


Postgres full text search for partial word

I've been seeing a lot of examples around like this one:
postgres full text search like operator
They all specify that you can do a prefix search like this:
SELECT *
FROM eventlogging
WHERE description_tsv @@ to_tsquery('mess:*');
and it will retrieve a word like: "message"
However, what I do not see anywhere is whether or not there is a way to search for different parts of a word, such as a suffix?
The example that I am having trouble with right now is this:
CREATE TABLE IF NOT EXISTS project (
id VARCHAR NOT NULL,
org_name VARCHAR NOT NULL DEFAULT '',
project_name VARCHAR NOT NULL DEFAULT ''
);
insert into project(id, org_name, project_name) values ('123', 'org', 'proj');
insert into project(id, org_name, project_name) values ('456', 'huh', 'org');
insert into project(id, org_name, project_name) values ('789', 'orgs', 'project');
CREATE OR REPLACE FUNCTION get_projects(query_in VARCHAR)
RETURNS TABLE (id VARCHAR, org_name VARCHAR, project_name VARCHAR) AS $$
BEGIN
RETURN QUERY
SELECT * FROM project WHERE (
to_tsvector('simple', coalesce(project.project_name, '')) ||
to_tsvector('simple', coalesce(project.org_name, ''))
) @@ to_tsquery('simple', query_in);
END;
$$ LANGUAGE plpgsql;
The following example returns:
select * from get_projects('org');
 id  | org_name | project_name
-----+----------+--------------
 123 | org      | proj
 456 | huh      | org
My question is: why does it not return orgs? Similarly, if I search for proj, I only get the project named "proj" but not the one named "project."
Bonus points: how can I get results if I search for a substring? For example, if I search for the string jec, I would like to get back the project named project. I'm not really looking for fuzzy searching, but I would say that I am looking for substring searching.
Am I completely wrong to be using to_tsquery? I also tried plainto_tsquery and I tried using english instead of simple, but several references said to stick with simple.
Full text search is different from substring search. Full text search is about searching whole words, omitting frequent words from indexing, ignoring inflection and the like. PostgreSQL full text search extends that somewhat by allowing prefix searches.
To search for substrings, you have to search with a condition like
WHERE word ~ 'suffix\M'
(This would be a suffix search with the regular expression matching operator ~.)
To speed up a search like that, create a trigram index:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX ON tab USING gin (doc gin_trgm_ops);
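Queries of the following shape can then use that index (reusing tab/doc from above; patterns illustrative):
SELECT * FROM tab WHERE doc ~ 'suffix\M';   -- suffix search via regex
SELECT * FROM tab WHERE doc LIKE '%jec%';   -- arbitrary substring search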
So-called prefix searching doesn't really thematically belong in full text searching. I think it was tossed in because, given that tokens would be stored in a btree anyway, adding that "feature" was free. No other types of partial matching are mentioned in the context of FTS because they don't exist.
You discuss the partial matching that does exist in FTS, the :* notation. But your example doesn't actually use it, which is why you don't see it working. If you do use it, it does work:
select * from get_projects('org:*');
But given your description, it sounds like you don't want FTS in the first place. You want LIKE or regex, perhaps with index support from pg_trgm.
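For instance, a minimal sketch against the project table from the question (index names are illustrative):
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX project_name_trgm_idx ON project USING gin (project_name gin_trgm_ops);
CREATE INDEX org_name_trgm_idx ON project USING gin (org_name gin_trgm_ops);

SELECT * FROM project
WHERE project_name LIKE '%jec%'
OR org_name LIKE '%jec%';
This returns the row with project_name 'project' for the substring 'jec'.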
but several references said to stick with simple.
It is hard to know how good the judgement of anonymous references is, but if you only want to use 'simple' then most likely you shouldn't be using FTS in the first place. 'simple' is useful for analyzing, learning, or debugging real FTS situations, and can serve as a baseline for building up more complex configurations.

Eliminate accents of a string in postgresql [duplicate]

In Microsoft SQL Server, it's possible to specify an "accent insensitive" collation (for a database, table or column), which means that it's possible for a query like
SELECT * FROM users WHERE name LIKE 'João'
to find a row with a Joao name.
I know that it's possible to strip accents from strings in PostgreSQL using the unaccent_string contrib function, but I'm wondering if PostgreSQL supports these "accent insensitive" collations so the SELECT above would work.
Update for Postgres 12 or later
Postgres 12 adds nondeterministic ICU collations, enabling case-insensitive and accent-insensitive grouping and ordering. The manual:
ICU locales can only be used if support for ICU was configured when PostgreSQL was built.
If so, this works for you:
CREATE COLLATION ignore_accent (provider = icu, locale = 'und-u-ks-level1-kc-true', deterministic = false);
CREATE INDEX users_name_ignore_accent_idx ON users(name COLLATE ignore_accent);
SELECT * FROM users WHERE name = 'João' COLLATE ignore_accent;
fiddle
Read the manual for details.
This blog post by Laurenz Albe may help to understand.
But ICU collations also have drawbacks. The manual:
[...] they also have some drawbacks. Foremost, their use leads to a
performance penalty. Note, in particular, that B-tree cannot use
deduplication with indexes that use a nondeterministic collation.
Also, certain operations are not possible with nondeterministic
collations, such as pattern matching operations. Therefore, they
should be used only in cases where they are specifically wanted.
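For example, a pattern-matching query against the nondeterministic collation fails (exact error text may vary by version):
SELECT * FROM users WHERE name LIKE 'Joao%' COLLATE ignore_accent;
-- ERROR:  nondeterministic collations are not supported for LIKE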
My "legacy" solution may still be superior:
For all versions
Use the unaccent module for that - which is completely different from what you are linking to.
unaccent is a text search dictionary that removes accents (diacritic
signs) from lexemes.
Install once per database with:
CREATE EXTENSION unaccent;
If you get an error like:
ERROR: could not open extension control file
"/usr/share/postgresql/<version>/extension/unaccent.control": No such file or directory
Install the contrib package on your database server like instructed in this related answer:
Error when creating unaccent extension on PostgreSQL
Among other things, it provides the function unaccent() you can use with your example (where LIKE seems not needed).
SELECT *
FROM users
WHERE unaccent(name) = unaccent('João');
Index
To use an index for that kind of query, create an index on the expression. However, Postgres only accepts IMMUTABLE functions for indexes. If a function can return a different result for the same input, the index could silently break.
unaccent() only STABLE not IMMUTABLE
Unfortunately, unaccent() is only STABLE, not IMMUTABLE. According to this thread on pgsql-bugs, this is due to three reasons:
It depends on the behavior of a dictionary.
There is no hard-wired connection to this dictionary.
It therefore also depends on the current search_path, which can change easily.
Some tutorials on the web instruct you to just alter the function volatility to IMMUTABLE. This brute-force method can break under certain conditions.
Others suggest a simple IMMUTABLE wrapper function (like I did myself in the past).
There is an ongoing debate whether to make the variant with two parameters IMMUTABLE which declares the used dictionary explicitly. Read here or here.
Another alternative would be this module with an IMMUTABLE unaccent() function by Musicbrainz, provided on Github. Haven't tested it myself. I think I have come up with a better idea:
Best for now
This approach is more efficient than other solutions floating around, and safer.
Create an IMMUTABLE SQL wrapper function executing the two-parameter form with hard-wired, schema-qualified function and dictionary.
Since nesting a non-immutable function would disable function inlining, base it on a copy of the C function, (fake-)declared IMMUTABLE as well. Its only purpose is to be used inside the SQL wrapper function; it is not meant to be used on its own.
The sophistication is needed as there is no way to hard-wire the dictionary in the declaration of the C function. (Would require to hack the C code itself.) The SQL wrapper function does that and allows both function inlining and expression indexes.
CREATE OR REPLACE FUNCTION public.immutable_unaccent(regdictionary, text)
RETURNS text
LANGUAGE c IMMUTABLE PARALLEL SAFE STRICT AS
'$libdir/unaccent', 'unaccent_dict';
Then:
CREATE OR REPLACE FUNCTION public.f_unaccent(text)
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT AS
$func$
SELECT public.immutable_unaccent(regdictionary 'public.unaccent', $1)
$func$;
In Postgres 14 or later, an SQL-standard function is slightly cheaper, yet:
CREATE OR REPLACE FUNCTION public.f_unaccent(text)
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT
BEGIN ATOMIC
SELECT public.immutable_unaccent(regdictionary 'public.unaccent', $1);
END;
See:
What does BEGIN ATOMIC mean in a PostgreSQL SQL function / procedure?
Drop PARALLEL SAFE from both functions for Postgres 9.5 or older.
public being the schema where you installed the extension (public is the default).
The explicit type declaration (regdictionary) defends against hypothetical attacks with overloaded variants of the function by malicious users.
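A quick sanity check of the wrapper (assuming the extension is installed in schema public):
SELECT public.f_unaccent('João');  -- 'Joao'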
Previously, I advocated a wrapper function based on the STABLE function unaccent() shipped with the unaccent module. That disabled function inlining. This version executes ten times faster than the simple wrapper function I had here earlier.
And that was already twice as fast as an earlier version that added SET search_path = public, pg_temp to the function, before I discovered that the dictionary can be schema-qualified, too. Even in Postgres 12 this is not obvious from the documentation.
If you lack the necessary privileges to create C functions, you are back to the second best implementation: An IMMUTABLE function wrapper around the STABLE unaccent() function provided by the module:
CREATE OR REPLACE FUNCTION public.f_unaccent(text)
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT AS
$func$
SELECT public.unaccent('public.unaccent', $1) -- schema-qualify function and dictionary
$func$;
Finally, the expression index to make queries fast:
CREATE INDEX users_unaccent_name_idx ON users(public.f_unaccent(name));
Remember to recreate indexes involving this function after any change to function or dictionary, like an in-place major release upgrade that would not recreate indexes. Recent major releases all had updates for the unaccent module.
Adapt queries to match the index (so the query planner will use it):
SELECT * FROM users
WHERE f_unaccent(name) = f_unaccent('João');
We don't need the function in the expression to the right of the operator. There we can also supply unaccented strings like 'Joao' directly.
The faster function does not translate to much faster queries using the expression index. Index look-ups operate on pre-computed values and are very fast either way. But index maintenance and queries not using the index benefit. And access methods like bitmap index scans may have to recheck values in the heap (the main relation), which involves executing the underlying function. See:
"Recheck Cond:" line in query plans with a bitmap index scan
Security for client programs has been tightened with Postgres 10.3 / 9.6.8 etc. You need to schema-qualify function and dictionary name as demonstrated when used in any indexes. See:
'text search dictionary “unaccent” does not exist' entries in postgres log, supposedly during automatic analyze
Ligatures
In Postgres 9.5 or older ligatures like 'Œ' or 'ß' have to be expanded manually (if you need that), since unaccent() always substitutes a single letter:
SELECT unaccent('Œ Æ œ æ ß');
unaccent
----------
E A e a S
You will love this update to unaccent in Postgres 9.6:
Extend contrib/unaccent's standard unaccent.rules file to handle all
diacritics known to Unicode, and expand ligatures correctly (Thomas
Munro, Léonard Benedetti)
Bold emphasis mine. Now we get:
SELECT unaccent('Œ Æ œ æ ß');
unaccent
----------
OE AE oe ae ss
Pattern matching
For LIKE or ILIKE with arbitrary patterns, combine this with the module pg_trgm in PostgreSQL 9.1 or later. Create a trigram GIN (typically preferable) or GIST expression index. Example for GIN:
CREATE INDEX users_unaccent_name_trgm_idx ON users
USING gin (f_unaccent(name) gin_trgm_ops);
Can be used for queries like:
SELECT * FROM users
WHERE f_unaccent(name) LIKE ('%' || f_unaccent('João') || '%');
GIN and GIST indexes are more expensive (to maintain) than plain B-tree:
Difference between GiST and GIN index
There are simpler solutions for just left-anchored patterns. More about pattern matching and performance:
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
pg_trgm also provides useful operators for "similarity" (%) and "distance" (<->).
Trigram indexes also support simple regular expressions with ~ et al. and case insensitive pattern matching with ILIKE:
PostgreSQL accent + case insensitive search
No, PostgreSQL does not support collations in that sense
PostgreSQL does not support collations like that (accent insensitive or not) because no comparison can return equal unless things are binary-equal. This is because internally it would introduce a lot of complexities for things like a hash index. For this reason collations in their strictest sense only affect ordering and not equality.
Workarounds
Full-Text-Search Dictionary that Unaccents lexemes.
For FTS, you can define your own dictionary using unaccent,
CREATE EXTENSION unaccent;
CREATE TEXT SEARCH CONFIGURATION mydict ( COPY = simple );
ALTER TEXT SEARCH CONFIGURATION mydict
ALTER MAPPING FOR hword, hword_part, word
WITH unaccent, simple;
Which you can then index with a functional index,
-- Just some sample data...
CREATE TABLE myTable ( myCol )
AS VALUES ('fóó bar baz'),('qux quz');
-- No index required, but feel free to create one
CREATE INDEX ON myTable
USING GIST (to_tsvector('mydict', myCol));
You can now query it very simply
SELECT *
FROM myTable
WHERE to_tsvector('mydict', myCol) @@ 'foo & bar'
mycol
-------------
fóó bar baz
(1 row)
See also
Creating a case-insensitive and accent/diacritics insensitive search on a field
Unaccent by itself.
The unaccent module can also be used by itself without FTS-integration, for that check out Erwin's answer
I'm pretty sure PostgreSQL relies on the underlying operating system for collation. It does support creating new collations, and customizing collations. I'm not sure how much work that might be for you, though. (Could be quite a lot.)

Syntax error in create aggregate

Trying to create an aggregate function:
create aggregate min (my_type) (
sfunc = least,
stype = my_type
);
ERROR: syntax error at or near "least"
LINE 2: sfunc = least,
^
What am I missing?
Although the manual calls least a function:
The GREATEST and LEAST functions select the largest or smallest value from a list of any number of expressions.
I can not find it:
\dfS least
List of functions
Schema | Name | Result data type | Argument data types | Type
--------+------+------------------+---------------------+------
(0 rows)
Like CASE, COALESCE and NULLIF, GREATEST and LEAST are listed in the chapter Conditional Expressions. These SQL constructs are not implemented as functions, as @Laurenz explained in the meantime.
The manual advises:
Tip: If your needs go beyond the capabilities of these conditional
expressions, you might want to consider writing a stored procedure in
a more expressive programming language.
The terminology is a bit off here as well, since Postgres does not support true "stored procedures", just functions. (Which is why there is an open TODO item "Implement stored procedures".)
This manual page might be sharpened to avoid confusion ...
@Laurenz also provided an example. I would just use LEAST in the function to get identical functionality:
CREATE FUNCTION f_least(anyelement, anyelement)
RETURNS anyelement LANGUAGE sql IMMUTABLE AS
'SELECT LEAST($1, $2)';
Do not make it STRICT; that would be incorrect. LEAST(1, NULL) returns 1, not NULL.
Even if STRICT was correct, I would not use it, because it can prevent function inlining.
Note that this function is limited to exactly two parameters while LEAST takes any number of parameters. You might overload the function to cover 3, 4 etc. input parameters. Or you could write a VARIADIC function for up to 100 parameters.
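A minimal sketch of such a VARIADIC variant (untested; assumes a min() aggregate exists for the element type). Like LEAST, min() ignores NULL values:
CREATE FUNCTION f_least(VARIADIC anyarray)
RETURNS anyelement LANGUAGE sql IMMUTABLE AS
'SELECT min(x) FROM unnest($1) x';

SELECT f_least(3, 1, 2);  -- returns 1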
LEAST and GREATEST are not real functions; internally they are parsed as MinMaxExpr (see src/include/nodes/primnodes.h).
You could achieve what you want with a generic function like this:
CREATE FUNCTION my_least(anyelement, anyelement) RETURNS anyelement
LANGUAGE sql IMMUTABLE CALLED ON NULL INPUT
AS 'SELECT LEAST($1, $2)';
(thanks to Erwin Brandstetter for the CALLED ON NULL INPUT and the idea to use LEAST.)
Then you can create your aggregate as
CREATE AGGREGATE min(my_type) (sfunc = my_least, stype = my_type);
This will only work if there are comparison functions for my_type, otherwise you have to come up with a different my_least function.

Executing queries dynamically in PL/pgSQL

I have found solutions (I think) to the problem I'm about to ask for on Oracle and SQL Server, but can't seem to translate this into a Postgres solution. I am using Postgres 9.3.6.
The idea is to be able to generate "metadata" about the table content for profiling purposes. This can only be done (AFAIK) by having queries run for each column so as to find out, say... min/max/count values and such. In order to automate the procedure, it is preferable to have the queries generated by the DB, then executed.
With an example salesdata table, I'm able to generate a select query for each column, returning the min() value, using the following snippet:
SELECT 'SELECT min('||column_name||') as minval_'||column_name||' from salesdata '
FROM information_schema.columns
WHERE table_name = 'salesdata'
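Each result row is one complete query string, e.g. (column names hypothetical):
SELECT min(id) as minval_id from salesdata
SELECT min(amount) as minval_amount from salesdata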
The advantage being that the db will generate the code regardless of the number of columns.
Now there's a myriad of places I had in mind for storing these queries, either a variable of some sort or a table column, the idea being to then have these queries executed.
I thought of storing the generated queries in a variable, then executing them using the EXECUTE (or EXECUTE IMMEDIATE) statement, which is the approach employed here (see right pane). But Postgres won't let me declare a variable outside a function, and I've been scratching my head over how this would fit together, whether that's even the direction to follow, or whether there's something simpler.
Would you have any pointers? I'm currently trying something like this, inspired by this other question, but have no idea whether I'm headed in the right direction:
CREATE OR REPLACE FUNCTION foo()
RETURNS void AS
$$
DECLARE
dyn_sql text;
BEGIN
dyn_sql := SELECT 'SELECT min('||column_name||') from salesdata'
FROM information_schema.columns
WHERE table_name = 'salesdata';
execute dyn_sql
END
$$ LANGUAGE PLPGSQL;
System statistics
Before you roll your own, have a look at the system table pg_statistic or the view pg_stats:
This view allows access only to rows of pg_statistic that correspond
to tables the user has permission to read, and therefore it is safe to
allow public read access to this view.
It might already have some of the statistics you are about to compute. It's populated by ANALYZE, so you might run that for new (or any) tables before checking.
-- ANALYZE tbl; -- optionally, to init / refresh
SELECT * FROM pg_stats
WHERE tablename = 'tbl'
AND schemaname = 'public';
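For instance, a few pg_stats columns map directly to typical profiling needs (see the manual for the full list):
SELECT attname, null_frac, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'tbl'
AND schemaname = 'public';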
Generic dynamic plpgsql function
You want to return the minimum value for every column in a given table. This is not a trivial task, because a function (like SQL in general) demands to know the return type at creation time - or at least at call time with the help of polymorphic data types.
This function does everything automatically and safely. Works for any table, as long as the aggregate function min() is allowed for every column. But you need to know your way around PL/pgSQL.
CREATE OR REPLACE FUNCTION f_min_of(_tbl anyelement)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT format('SELECT (t::%2$s).* FROM (SELECT min(%1$s) FROM %2$s) t'
, string_agg(quote_ident(attname), '), min(' ORDER BY attnum)
, pg_typeof(_tbl)::text)
FROM pg_attribute
WHERE attrelid = pg_typeof(_tbl)::text::regclass
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0 -- no system columns
);
END
$func$;
Call (important!):
SELECT * FROM f_min_of(NULL::tbl); -- tbl being the table name
db<>fiddle here
Old sqlfiddle
You need to understand these concepts:
Dynamic SQL in plpgsql with EXECUTE
Polymorphic types
Row types and table types in Postgres
How to defend against SQL injection
Aggregate functions
System catalogs
Related answer with detailed explanation:
Table name as a PostgreSQL function parameter
Refactor a PL/pgSQL function to return the output of various SELECT queries
Postgres data type cast
How to set value of composite variable field using dynamic SQL
How to check if a table exists in a given schema
Select columns with particular column names in PostgreSQL
Generate series of dates - using date type as input
Special difficulty with type mismatch
I am taking advantage of Postgres defining a row type for every existing table. Using the concept of polymorphic types I am able to create one function that works for any table.
However, some aggregate functions return related but different data types as compared to the underlying column. For instance, min(varchar_column) returns text, which is bit-compatible, but not exactly the same data type. PL/pgSQL functions have a weak spot here and insist on data types exactly as declared in the RETURNS clause. No attempt to cast, not even implicit casts, not to speak of assignment casts.
That should be improved. Tested with Postgres 9.3. Did not retest with 9.4, but I am pretty sure, nothing has changed in this area.
That's where this construct comes in as workaround:
SELECT (t::tbl).* FROM (SELECT ... FROM tbl) t;
By casting the whole row to the row type of the underlying table explicitly we force assignment casts to get original data types for every column.
This might fail for some aggregate functions. sum() returns numeric for a sum(bigint_column) to accommodate a sum overflowing the base data type. Casting back to bigint might fail ...
@Erwin Brandstetter, many thanks for the extensive answer. pg_stats does indeed provide a few things, but what I really need to draw a complete profile is a variety of things: min and max values, counts, counts of NULLs, means, etc., so a bunch of queries have to be run for each column, some with GROUP BY and such.
Also, thanks for highlighting the importance of data types; I was sort of expecting those to throw a spanner in the works at some point. My main concern was with how to automate the query generation and, above all, its execution.
I have tried the function you provide (I probably will need to start learning some plpgsql) but get an error at the SELECT (t::tbl):
ERROR: type "tbl" does not exist
BTW, what is the (t::abc) notation called? In Python this would be a list slice, but that's probably not the case in PL/pgSQL.

Why do Postgres Hstore indexes work for ? (operator) and not for EXIST (function)?

http://www.postgresql.org/docs/9.2/static/hstore.html states:
hstore has GiST and GIN index support for the @>, ?, ?& and ?| operators
Yet the indexes don't work for the EXIST function, which appears to be equivalent to the ? operator.
What is the difference between operators and functions that makes it harder to index one or the other?
Might future versions of the Hstore extension make these truly equivalent?
Look up the documentation for "CREATE OPERATOR CLASS", which describes how you can create indexing methods for arbitrary operators. You would also need "CREATE OPERATOR" to create an operator based on the EXIST function first.
(Caveat: I have no experience with hstore)
http://www.postgresql.org/docs/9.0/static/sql-createoperator.html
http://www.postgresql.org/docs/9.0/static/sql-createopclass.html
Here's your problem: PostgreSQL functions are planner-opaque. The planner has no way of knowing that the operator and the function are semantically equivalent. This comes up a lot.
PostgreSQL does have functional indexes, so you can index the output of immutable functions. That may not quite get you all the way here, since you'd probably only be able to index which rows return true for a given call, but it can still be very useful with partial indexes. For example you could always do something like:
CREATE INDEX bar_has_aaa ON foo(exist(bar, 'aaa'));
or
CREATE INDEX bar_has_aaa ON foo(id) WHERE exist(bar, 'aaa');
But I don't see this going exactly where you need it to go. Hopefully it points you in the right direction though.
Edit: The following strikes me as a better workaround. Suppose we have a table foo:
CREATE TABLE foo (
id serial,
bar hstore
);
We can create a table method bar_keys:
CREATE FUNCTION bar_keys(foo) RETURNS text[] IMMUTABLE LANGUAGE SQL AS $$
SELECT akeys($1.bar);
$$;
Then we can index that using GIN:
CREATE INDEX foo_bar_keys_idx ON foo USING gin(bar_keys(foo));
And we can use it in our queries:
SELECT * FROM foo WHERE foo.bar_keys @> array['aaa'];
That should use an index. Note you could just index/use akeys directly, but I think a virtual column leads to cleaner syntax.
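The direct variant mentioned above would look like this (akeys() is the stock hstore function; index name illustrative):
CREATE INDEX foo_akeys_idx ON foo USING gin (akeys(bar));

SELECT * FROM foo WHERE akeys(bar) @> array['aaa'];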