How does a GIN index deal with a tsquery with both & and |? - postgresql

When I read the document, I just see an "extractValue" function, but I don't know how it works.
When I pass a query like
Select *
from people
WHERE people.belongings && to_tsquery('hat & (case | bag)')
(and I have a gin index on people.belongings)
Would this query use the index? What would extractValue do to this query?
And another question: why can't (or why doesn't) a GiST index index an array's elements individually the way a GIN index does?

extractValue is a support function that is used when building the index, not when searching it. In the case of full text search, it is fed the tsvector and returns the index keys (lexemes) contained in it.
The support function used to get the keys from a tsquery is extractQuery. For full text search, that is gin_extract_tsquery. It is defined in src/backend/utils/adt/tsginidx.c if you are interested in the implementation. What it does is convert the tsquery into an internal representation that can be searched in the index.
The actual check if an index entry matches the search expression is done by gin_tsquery_consistent.
The support functions are described in the documentation.
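As a rough sketch of how that looks in practice (assuming belongings is a tsvector column and using the @@ match operator rather than &&; the table and column names are taken from the question):
CREATE INDEX people_belongings_idx ON people USING GIN (belongings);

EXPLAIN (ANALYZE)
SELECT *
FROM people
WHERE belongings @@ to_tsquery('hat & (case | bag)');
If the planner chooses the index, gin_extract_tsquery pulls the keys hat, case and bag out of the tsquery, and gin_tsquery_consistent then re-checks the & / | structure for every candidate row found through those keys.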

Related

mongodb index on regex fields not working

I'm new to MongoDB and I'm facing a performance issue that I need your help with. I have a collection with 400k records. Without any index on the collection, each query takes 20-30s, so I created indexes for the fields that are usually used in search queries. The problem is that when I use $regex to search a string field that has an index on it, MongoDB does not use the index and still scans all records in the collection. I've searched the internet for "index on regex fields mongodb" and found answers saying that "MongoDB uses the prefix of the RegEx to look up indexes", meaning you have to use a "^" prefix for the index to work, like db.users.find({name: /^key word/}), but that is not working for me. Does "index on $regex field" need MongoDB Atlas to work? I'm using the community version of MongoDB. Thanks!
There's a lot to unpack here. We'll split the answer into two parts, the first to try and answer some of the direct questions about index usage and the second to explore solutions to satisfy the application requirements.
Index Usage with $regex
As with any database index that captures the full string value as the key, MongoDB can use the index for a $regex operation, but its efficiency in doing so greatly depends on the regex being applied. That is what the Index Use documentation from the comments and the other answers you reference are describing.
In the comments you mention that an example query might be db.users.find({name: {$regex: '.*keyword.*', $options: 'i'}}). That means that the regex is both unanchored and case-insensitive. The aforementioned documentation states directly:
Case insensitive regular expression queries generally cannot use indexes effectively.
Why is this? Because the substring that you are searching for can be found in any string value captured by the index. So the document with matching value {name: 'a keyword'} would be located at one end of the index, {name: 'keyWord'} may be somewhere in the middle, and {name: 'Z keyword'} may be at the end. The only way to ensure correct results is for the database to scan the index for all string values. So while it is still using the index, it may not be efficient, as most of the scanned values will not match and will be discarded.
You may always use .explain() to better understand how the database is answering the query, such as if and how it is using an index.
Solutions
So what do we do about this?
Well, as @rickhg12hs suggests in the comments, it depends on exactly what you are trying to achieve. You reiterate that you are looking for 'full regex search capability', but that is really an approach/solution rather than a goal. If what you really need, for example, is just to match an exact string in a case-insensitive manner, then something as simple as a case-insensitive index would likely do the trick.
However, if you truly do wish to perform arbitrary substring searching, then you are really looking at search engine capabilities. In that situation your best bets would probably be to emulate their indexes directly in MongoDB (e.g. have the application manually tokenize the strings to be indexed), stand up something like Solr/Elasticsearch next to MongoDB, or use MongoDB's Atlas Search offering. The $text operator mentioned in the comments has limitations when it comes to substring searching (such as matching just part of a word), which may or may not be relevant for your needs.
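As a minimal sketch of the simpler case-insensitive index route mentioned above (assuming a users collection and mongosh; the collation values are only an example):
// Case-insensitive index via a strength-2 collation
db.users.createIndex({ name: 1 }, { collation: { locale: "en", strength: 2 } })

// The query has to specify the same collation for the index to be eligible
db.users.find({ name: "keyword" }).collation({ locale: "en", strength: 2 })
Note that this covers whole-value, case-insensitive matches only; arbitrary substring search still needs one of the search-engine approaches above.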

Does the phrase search operator <-> work with JSONB documents or only relational tables?

Does the phrase search operator <-> work with JSONB documents or only relational tables in PostgreSQL?
I haven't experimented with this yet, as I haven't yet set up Postgres hosting. The answer to this question will help determine what database and what tools I will be using.
I found this sample code at: https://compose.com/articles/mastering-postgresql-tools-full-text-search-and-phrase-search/:
SELECT document_id, document_text FROM documents
WHERE document_tokens @@ to_tsquery('jump <-> quick');
I just need to know if this operator is supported by JSONB documents.
The phrase search capability is integrated into the text search data type tsquery. The text search operator @@ you display takes a tsvector to the left and a tsquery to the right. And a tsvector can be built from any character type as well as from a JSON document.
Related:
Match a phrase ending in a prefix with full text search
You can convert your json or jsonb document to a text search vector with one of the dedicated functions:
to_tsvector()
json(b)_to_tsvector()
Note that these only include values from the JSON document, not keys. Typically, that's what you want.
Basic example:
SELECT to_tsvector(jsonb '{"foo":"jump quickly"}')
@@ to_tsquery('jump <-> quick:*');
Demonstrating prefix matching on top of phrase search while being at it. See:
Get partial match from GIN indexed TSVECTOR column
Alternatively, you can simply create the tsvector from the text representation of your JSON document to also include key names:
SELECT to_tsvector((jsonb '{"foo-fighter":"jump quickly"}')::text)
@@ to_tsquery('foo <-> fight:*');
Produces a bigger tsvector, obviously.
Both can be indexed (which is the main point of text search). Only indexes are bound to relational tables. (And you can index the expression!)
The expression itself can be applied to any value; it is not bound to tables as you seem to imply.
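A minimal sketch of such an expression index, assuming a table docs with a jsonb column doc (both names are made up); the text search configuration is spelled out explicitly so the expression is immutable and can be indexed:
CREATE INDEX docs_doc_fts_idx ON docs USING GIN (to_tsvector('english', doc));

SELECT *
FROM docs
WHERE to_tsvector('english', doc) @@ to_tsquery('english', 'jump <-> quick:*');
The query has to repeat the exact same expression for the planner to consider the index.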

Is GIN index on postgres jsonb column nested?

I have a JSON document which I am storing as jsonb in Postgres:
{
  "name": "Mr. Json",
  "dept": {
    "team": {
      "aliases": ["a1", "a2", "a3"],
      "team_name": "xyz"
    },
    "type": "engineering",
    "lead": "Mr. L"
  },
  "hobbies": ["Badminton", "Chess"],
  "is_active": true
}
I have created a GIN index on the column.
I need to do exact match queries like all rows containing type='engineering' and lead='Mr. L'.
I am currently doing containment queries like:
data #> '{"dept":{"type":"engineering"}}' and data #> '{"dept":{"lead":"Mr. L"}}'
I looked at the query plan, which shows that the GIN index is being used, but I am unsure if this is correct or if there is a better way of achieving this.
Will I have to construct another index on the nested keys?
Does indexing a jsonb column index the nested keys or just the top-level ones?
Also, please share some good resources on this.
From the docs:
The default GIN operator class for jsonb supports queries with the top-level key-exists operators ?, ?& and ?| and the path/value-exists operator @>.
For containment (@>) it works with nested values. For the other operators it works only for top-level keys, or for whatever level is used in an expression index. Also, according to the documentation, an expression index on the level you want to query will be faster than a simple index on the whole column (which makes sense, as it is smaller).
If you are doing only containment search, consider using jsonb_path_ops while building your index. It is smaller and faster.
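A sketch of what that could look like for the document above, assuming the jsonb column is called data and the table is called people (both names are assumptions):
CREATE INDEX people_dept_path_idx ON people USING GIN ((data -> 'dept') jsonb_path_ops);

SELECT *
FROM people
WHERE data -> 'dept' @> '{"type": "engineering", "lead": "Mr. L"}';
Putting both key/value pairs into one containment test against the indexed expression lets a single index condition cover the whole filter.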

"LIKE" or "=" Operator not using index seek

I've got the following query:
SELECT * FROM myDB.dbo.myTable
WHERE name = 'Stockholm'
or
SELECT * FROM myDB.dbo.myTable
WHERE name LIKE('Stockholm')
I have created a fulltext index, which is used when I query with CONTAINS(name, 'Stockholm'), but in the two cases above it only performs a clustered index scan. This is way too slow, above one second. I'm a little confused, because I only want to search for perfect matches, which should be as fast as CONTAINS(), shouldn't it? I've read that LIKE should at least use an index seek if you don't use wildcards, or at least no wildcard at the beginning of the word you're searching for.
Thank you in advance
I bet you have no indexes on the name column. A full-text index is NOT a regular database index and is NOT used unless you use a full-text predicate like CONTAINS.
As stated by @Panagiotis Kanavos and @Damien_The_Unbeliever, there was no "non-fulltext" index on my name column. I simply had to add an index with the following query:
CREATE INDEX my_index_name
ON myDB.dbo.myTable (name)
This improves the performance from slightly above one second to under a half second.

Is there a way to index in postgres for fast substring searches

I have a database and want to be able to run a search on a table that's something like:
select * from table where column like 'abc%def%ghi'
or
select * from table where column like '%def%ghi'
Is there a way to index the column so that this isn't too slow?
Edit:
Can I also clarify that the database is read-only and won't be updated often.
Options for text search and indexing include:
full-text indexing with dictionary-based search, including support for prefix search, e.g. to_tsvector(mycol) @@ to_tsquery('search:*')
text_pattern_ops indexes to support prefix string matches, e.g. LIKE 'abc%', but not infix searches like '%blah%'. A reverse()d index may be used for suffix searching.
pg_trgm trigram indexes on newer versions, as demonstrated in this recent dba.stackexchange.com post.
An external search and indexing tool like Apache Solr.
From the minimal information given above, I'd say that only a trigram index will be able to help you, since you're doing infix searches on a string and not looking for dictionary words. Unfortunately, trigram indexes are huge and rather inefficient; don't expect some kind of magical performance boost, and keep in mind that they take a lot of work for the database engine to build and keep up to date.
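A minimal pg_trgm sketch, using made-up placeholder names for the table and column:
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX tablename_columname_trgm_idx ON tablename USING GIN (columname gin_trgm_ops);

-- patterns with leading or embedded wildcards can now use a bitmap index scan
SELECT * FROM tablename WHERE columname LIKE '%def%ghi';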
If you just need to, for instance, get unique substrings in an entire table, you can create a substring index:
CREATE INDEX i_test_sbstr ON tablename (substring(columname, 5, 3));
-- start at position 5, go for 3 characters
It is important that the substring() parameters in the index definition are the same as the ones you use in your query.
ref: http://www.postgresql.org/message-id/BANLkTinjUhGMc985QhDHKunHadM0MsGhjg@mail.gmail.com
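For example (a sketch with a made-up comparison value), a query that repeats the indexed expression verbatim can use that index:
SELECT * FROM tablename
WHERE substring(columname, 5, 3) = 'abc';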
For the LIKE operator, use one of the operator classes varchar_pattern_ops or text_pattern_ops:
create index test_index on test_table (col varchar_pattern_ops);
That will only work if the pattern does not start with a %; otherwise another strategy is required.
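For example, a left-anchored pattern against the indexed column (a sketch reusing the same made-up names) can then be answered with an index scan:
SELECT * FROM test_table WHERE col LIKE 'abc%';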