Is there a way to index in postgres for fast substring searches - postgresql

I have a database and want to be able to look up in a table a search that's something like:
select * from table where column like 'abc%def%ghi'
or
select * from table where column like '%def%ghi'
Is there a way to index the column so that this isn't too slow?
Edit:
I should also clarify that the database is read-only and won't be updated often.

Options for text search and indexing include:
full-text indexing with dictionary-based search, including support for prefix search, e.g. to_tsvector(mycol) @@ to_tsquery('search:*')
text_pattern_ops indexes to support prefix string matches, e.g. LIKE 'abc%', but not infix searches like '%blah%'. A reverse()d index may be used for suffix searching.
pg_trgm trigram indexes on newer versions, as demonstrated in this recent dba.stackexchange.com post.
An external search and indexing tool like Apache Solr.
From the minimal information given above, I'd say that only a trigram index will be able to help you, since you're doing infix searches on a string and not looking for dictionary words. Unfortunately, trigram indexes are huge and rather inefficient; don't expect some kind of magical performance boost, and keep in mind that they take a lot of work for the database engine to build and keep up to date.
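As a hedged sketch of the trigram approach (the table and column names here are hypothetical), assuming PostgreSQL 9.1+ with the pg_trgm extension available:

```sql
-- Install the extension (once per database)
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- A GIN trigram index can serve LIKE/ILIKE patterns even with
-- a leading wildcard
CREATE INDEX mytable_mycol_trgm ON mytable USING gin (mycol gin_trgm_ops);

-- Both of the original queries can now use the index:
SELECT * FROM mytable WHERE mycol LIKE 'abc%def%ghi';
SELECT * FROM mytable WHERE mycol LIKE '%def%ghi';
```

Since the table is read-only and rarely updated, the high build and maintenance cost of a trigram index is mostly a one-time expense here.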

If you need just to, for instance, get unique substrings in an entire table, you can create a substring index:
CREATE INDEX i_test_sbstr ON tablename (substring(columnname, 5, 3));
-- start at position 5, go for 3 characters
It is important that the substring() parameters in the index definition are
the same as you use in your query.
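For example (the search value is hypothetical), a query that can use this index must repeat the identical expression:

```sql
-- Uses i_test_sbstr because the expression matches the index definition
SELECT * FROM tablename WHERE substring(columnname, 5, 3) = 'abc';
```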
ref: http://www.postgresql.org/message-id/BANLkTinjUhGMc985QhDHKunHadM0MsGhjg#mail.gmail.com

For the LIKE operator, use one of the operator classes varchar_pattern_ops or text_pattern_ops:
create index test_index on test_table (col varchar_pattern_ops);
That will only work if the pattern does not start with a %; in that case another strategy is required.
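An illustrative sketch (names are hypothetical): prefix patterns can use such an index directly, and suffix patterns can be served by indexing the reversed string:

```sql
-- Prefix search: can use the text_pattern_ops index above
SELECT * FROM test_table WHERE col LIKE 'abc%';

-- Suffix search: index the reversed column ...
CREATE INDEX test_rev_index ON test_table (reverse(col) text_pattern_ops);
-- ... and reverse the pattern, turning '%xyz' into the prefix 'zyx%'
SELECT * FROM test_table WHERE reverse(col) LIKE reverse('%xyz');
```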

Related

How does GIN index deal with tsquery with both & and |?

When I read the documentation, I only see an "extractValue" function, and I don't know how it works.
When I pass a query like
Select *
from people
WHERE people.belongings && to_tsquery('hat & (case | bag)')
(and I have a gin index on people.belongings)
would this query use the index? what would the extractValue do to this query?
And another question: why doesn't, or why can't, the GiST index index an array's elements individually like the GIN index does?
extractValue is a support function that is used when building the index, not when searching it. In the case of full text search, it would be fed the tsvector and returns the index keys contained in it.
The support function used to get the keys in a tsquery is extractQuery. For full text search, that is gin_extract_tsquery. It is defined in src/backend/utils/adt/tsginidx.c if you are interested in the implementation. What it does is convert the tsquery into an internal representation that can be searched in the index.
The actual check if an index entry matches the search expression is done by gin_tsquery_consistent.
The support functions are described in the documentation.
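As a hedged illustration (assuming belongings is a tsvector column with a GIN index, and using the standard @@ match operator for tsvector/tsquery), you can see these support functions at work indirectly by checking the plan:

```sql
EXPLAIN
SELECT *
FROM people
WHERE belongings @@ to_tsquery('hat & (case | bag)');
-- A "Bitmap Index Scan" on the GIN index in the output means
-- gin_extract_tsquery and gin_tsquery_consistent are being used
-- to probe and check the index entries.
```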

Is there a way to create an index on a set of words (not all words) in postgres?

I want some sort of limited, indexed full-text search. With FTS, Postgres will index all the words in the text, but I want it to track only a given set of words. For example, I have a database of tweets, and I want them to be indexed only by special words that I specify: awesome, terrible, etc.
In case anyone is interested in such a specific thing: I did it by creating a custom dictionary (thanks Mark).
I documented my findings here: https://hackmd.io/#z889TbyuRlm0vFIqFl_AYQ/r1gKJQBZS
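A minimal sketch of the custom-dictionary approach (all object names and the synonym file are hypothetical; the synonym template maps each tracked word to itself, and tokens that no dictionary recognizes are simply not indexed):

```sql
-- $SHAREDIR/tsearch_data/tracked.syn would contain lines like:
--   awesome awesome
--   terrible terrible
CREATE TEXT SEARCH DICTIONARY tracked_words (
    TEMPLATE = synonym,
    SYNONYMS = tracked
);

CREATE TEXT SEARCH CONFIGURATION tracked_only (COPY = simple);
ALTER TEXT SEARCH CONFIGURATION tracked_only
    ALTER MAPPING FOR asciiword, word WITH tracked_words;
-- (For a real setup you would also drop or remap the remaining
-- token types so nothing else gets indexed.)

CREATE INDEX tweets_tracked_idx ON tweets
    USING gin (to_tsvector('tracked_only', text));
```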

Reasons for creating an Index for a string field in MongoDB

When I create an Index on a string-type field in MongoDB I get no significant speed boost from it. In fact, when I use the query:
db.movies.find({plot: /some text/}).explain("executionStats")
an index actually slows the query down by 30-50% in my database (~55k docs).
I know that I can use a "text" index, which works fine for me, but I was wondering why you would create a "normal" index on a string field at all.
An index on a string field will improve the performance of exact matches, like:
db.movies.find({name: "some movie"})
Indexes will also be used for find queries with a prefix regular expression:
db.movies.find({plot: /^begins with/})
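A short sketch in the mongo shell (collection and field names follow the question):

```javascript
// A regular B-tree index on the string field
db.movies.createIndex({ plot: 1 });

// Anchored prefix regex: the planner can turn this into a bounded
// index range scan
db.movies.find({ plot: /^begins with/ }).explain("executionStats");

// Unanchored regex: every index key (or document) must be examined,
// which is why the index can make this query slower, not faster
db.movies.find({ plot: /some text/ }).explain("executionStats");
```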

Convert Sphinx Index to Table?

I go through a pretty intense sphinx configuration each day to convert the millions of records into a usable/searchable sphinx index.
However I now need to export that as an xml file, if not that as a new table.
Naturally I could do most/all of the work I do in the Sphinx index in MySQL as well, but it seems like a lot of unnecessary work if I've just generated a Sphinx index. Can I somehow 'export' that index to a table, or is the full-text index essentially now useless to me as readable data?
Well it depends WHAT you want out.
The Sphinx index is essentially an inverted index: https://en.wikipedia.org/wiki/Inverted_index
... as such it's good for finding which 'documents' contain a given word; it literally stores that as a list. (This is ideally suited to the fundamental function of a query; Sphinx just does the heavy lifting for multi-word queries, as well as ranking results.)
... such a structure is NOT organized by document, so you can't directly get a list of which words are in a given document. (To compute that, you would have to traverse the entire data structure.)
But if it's the inverted index that you DO want, you can dump it with indextool:
http://sphinxsearch.com/docs/current.html#ref-indextool
... e.g. the --dumpdict and even --dumphitlist commands.
(although dumpdict only works on dict=keywords indexes)
You might be interested in the --dump-rows option on indexer
http://sphinxsearch.com/docs/current.html#ref-indexer
... it dumps out the textual data during indexing, retrieved from MySQL.
It's not dumped from the index itself, and is not subject to all the 'magic' tokenizing and normalizing Sphinx does (charset_table/wordforms etc.).
Back to indextool: there are also the --fold, --htmlstrip and --morph commands, which can be used in a stream to tokenize text.
In theory you could use these to harness the 'power' of Sphinx, and the settings from an actual index, to create a processed dataset (similar to what Sphinx does when generating an index).
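An illustrative shell session (config path and index name are hypothetical; assumes the Sphinx binaries are on the PATH):

```shell
# Dump the dictionary (only works with dict=keywords indexes)
indextool --config /etc/sphinx/sphinx.conf --dumpdict myindex

# Dump the hitlist for one keyword
indextool --config /etc/sphinx/sphinx.conf --dumphitlist myindex someword

# Dump the raw rows fetched from MySQL during indexing,
# before any tokenizing/normalizing is applied
indexer --config /etc/sphinx/sphinx.conf --dump-rows rows.sql myindex
```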

"LIKE" or "=" Operator not using index seek

I've got the following query:
SELECT * FROM myDB.dbo.myTable
WHERE name = 'Stockholm'
or
SELECT * FROM myDB.dbo.myTable
WHERE name LIKE('Stockholm')
I have created a full-text index, which is used when I write CONTAINS(name, 'Stockholm'), but in the two cases above the plan only shows a clustered index scan. This is way too slow, above 1 second. I'm a little confused, because I only want to search for exact matches, which should be as fast as CONTAINS(), shouldn't it? I've read that LIKE should use an index seek, at least if you don't use wildcards, or at least no wildcard at the beginning of the word you're searching for.
Thank you in advance
I bet you have no index on the name column. A full-text index is NOT a regular database index and is NOT used unless you use a full-text predicate like CONTAINS.
As stated by @Panagiotis Kanavos and @Damien_The_Unbeliever, there was no "non-fulltext" index on my name column. I simply had to add an index with the following query:
CREATE INDEX my_index_name
ON myDB.dbo.myTable (name)
This improves the performance from slightly above one second to under half a second.