I am trying to query a table for rows whose name starts with the letter g.
"hospital @name ^g"
Is this right? I am getting 0 results for it even after indexing the table with names and all other columns.
Do you have min_prefix_len set to 1 for that index? Without this Sphinx treats "g" as a separate word and will match only those documents that start with one-letter word "g", for example "g anotherword".
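To picture what min_prefix_len changes, here is a toy Python model. It is not Sphinx's actual code, and index_terms and matches are made-up helpers. The assumption is that with prefix indexing on, the indexer also stores word beginnings of at least the minimum length, so a bare g can match words that merely start with g.

```python
def index_terms(text, min_prefix_len=0):
    """Return the keywords a toy indexer would store for `text`.

    Assumption: with min_prefix_len > 0, every word prefix of at least
    that length is stored alongside the full word.
    """
    terms = set()
    for word in text.lower().split():
        terms.add(word)
        if min_prefix_len > 0:
            for length in range(min_prefix_len, len(word)):
                terms.add(word[:length])
    return terms


def matches(query_word, text, min_prefix_len=0):
    """True if the query word is among the stored keywords."""
    return query_word.lower() in index_terms(text, min_prefix_len)


# Without prefixes only the literal one-letter word "g" is found.
print(matches("g", "general hospital"))       # no prefix indexing
print(matches("g", "general hospital", 1))    # min_prefix_len = 1
print(matches("g", "g anotherword"))          # literal word "g"
```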
Does the phrase search operator <-> work with JSONB documents or only relational tables in PostgreSQL?
I haven't experimented with this yet, as I haven't yet set up Postgres hosting. The answer to this question will help determine what database and what tools I will be using.
I found this sample code at: https://compose.com/articles/mastering-postgresql-tools-full-text-search-and-phrase-search/:
SELECT document_id, document_text FROM documents
WHERE document_tokens @@ to_tsquery('jump <-> quick');
I just need to know if this operator is supported by JSONB documents.
The phrase search capability is integrated into the text search data type tsquery. The text search operator @@ you display takes a tsvector to the left and a tsquery to the right. And a tsvector can be built from any character type as well as from a JSON document.
Related:
Match a phrase ending in a prefix with full text search
You can convert your json or jsonb document to a text search vector with one of the dedicated functions:
to_tsvector()
json(b)_to_tsvector()
Note that these only include values from the JSON document, not keys. Typically, that's what you want.
Basic example:
SELECT to_tsvector(jsonb '{"foo":"jump quickly"}')
@@ to_tsquery('jump <-> quick:*');
Demonstrating prefix matching on top of phrase search while being at it. See:
Get partial match from GIN indexed TSVECTOR column
Alternatively, you can simply create the tsvector from the text representation of your JSON document to also include key names:
SELECT to_tsvector((jsonb '{"foo-fighter":"jump quickly"}')::text)
@@ to_tsquery('foo <-> fight:*');
Produces a bigger tsvector, obviously.
Both can be indexed (which is the main point of text search). Only indexes are bound to relational tables. (And you can index the expression!)
The expression itself can be applied to any value, not bound to tables like you seem to imply.
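To make the "values only" versus "text representation" difference concrete, here is a small Python sketch of which text each approach would hand to the tokenizer. It only models the idea; string_values is a hypothetical helper, and json.dumps stands in for the ::text cast.

```python
import json


def string_values(node):
    """Yield every string value in a parsed JSON document, skipping keys."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from string_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from string_values(item)


doc = {"foo-fighter": "jump quickly"}

# Roughly what to_tsvector(jsonb) looks at: the values, not the keys.
values_only = " ".join(string_values(doc))

# Roughly what to_tsvector(jsonb::text) looks at: keys and values.
with_keys = json.dumps(doc)

print(values_only)
print(with_keys)
```

So a search for the key name can only succeed against the second form.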
Can I have multiple types of index on the same field? Will it affect performance?
Example:
db.users.createIndex({"username":"text"})
db.users.createIndex({"username":1})
Yes, you can have different types of index on a single field, for example text, 2dsphere, or hashed.
You cannot, however, create a second index with the same key specification that differs only in options such as sparse or unique.
Every write operation then has to update the relevant entry in each of these indexes, so writes get somewhat more expensive.
For a MongoDB field that contains strings (for example, state or province names), what (if any) difference is there between creating an index on a string-type field:
db.collection.ensureIndex( { field: 1 } )
and creating a text index on that field:
db.collection.ensureIndex( { field: "text" } )
Where, in both cases, field is of string type.
I'm looking for a way to do a case-insensitive search on a text field which would contain a single word (maybe more). Being new to Mongo, I'm having trouble distinguishing between using the above two index methods, and even something like a $regex search.
The two index options are very different.
When you create a regular index on a string field it indexes the
entire value in the string. Mostly useful for single word strings
(like a username for logins) where you can match exactly.
A text index on the other hand will tokenize and stem the content of
the field. So it will break the string into individual words or
tokens, and will further reduce them to their stems so that variants
of the same word will match ("talk" matching "talks", "talked" and
"talking" for example, as "talk" is a stem of all three). Mostly
useful for true text (sentences, paragraphs, etc).
Text Search
Text search supports the search of string content in documents of a
collection. MongoDB provides the $text operator to perform text search
in queries and in aggregation pipelines.
The text search process:
tokenizes and stems the search term(s) during both the index creation and the text command execution.
assigns a score to each document that contains the search term in the indexed fields. The score determines the relevance of a document to a given search query.
The $text operator can search for words and phrases. The query matches
on the complete stemmed words. For example, if a document field
contains the word blueberry, a search on the term blue will not match
the document. However, a search on either blueberry or blueberries
will match.
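Here is a toy Python sketch of the tokenize-and-stem idea, just to show why blue misses blueberry while blueberries hits. The suffix rules are a deliberately naive assumption of mine and are not the stemmer MongoDB actually uses; stem, tokenize and text_match are made-up names.

```python
import re


def stem(word):
    """Naive suffix stripping; a stand-in for a real stemmer."""
    w = word.lower()
    for suffix, replacement in (("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")):
        # Keep at least three characters so short words stay intact.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)] + replacement
    return w


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def text_match(document, query):
    """True if any stemmed query term equals a stemmed document term."""
    doc_stems = {stem(token) for token in tokenize(document)}
    query_stems = {stem(token) for token in tokenize(query)}
    return bool(doc_stems & query_stems)


print([stem(w) for w in ("talks", "talked", "talking")])
print(text_match("fresh blueberry pie", "blue"))
print(text_match("fresh blueberry pie", "blueberries"))
```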
$regex searches can be used with regular indexes on string fields, to
provide some pattern matching and wildcard search. Not a terribly
effective user of indexes but it will use indexes where it can:
If an index exists for the field, then MongoDB matches the regular
expression against the values in the index, which can be faster than a
collection scan. Further optimization can occur if the regular
expression is a “prefix expression”, which means that all potential
matches start with the same string. This allows MongoDB to construct a
“range” from that prefix and only match against those values from the
index that fall within that range.
http://docs.mongodb.org/manual/core/index-text/
http://docs.mongodb.org/manual/reference/operator/query/regex/
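The "prefix expression" optimization quoted above can be pictured with a sorted list standing in for the index. This is only a sketch of the idea under that assumption, not MongoDB's implementation; prefix_range_scan and the sample keys are made up for illustration.

```python
import bisect
import re


def prefix_range_scan(sorted_keys, prefix):
    """Return the slice of a sorted key list that starts with `prefix`.

    Two binary searches bound the range, so keys outside it are never
    inspected. Assumption: no key contains the highest code point, which
    is used here as an upper sentinel.
    """
    low = bisect.bisect_left(sorted_keys, prefix)
    high = bisect.bisect_left(sorted_keys, prefix + "\U0010ffff")
    return sorted_keys[low:high]


index_keys = sorted(["california", "alaska", "arkansas", "alabama", "arizona"])

# /^ala/ is a prefix expression: only the "ala" range is examined.
candidates = prefix_range_scan(index_keys, "ala")
pattern = re.compile(r"^ala")
range_result = [key for key in candidates if pattern.search(key)]

# A non-prefix pattern would have to test every key in the index.
full_result = [key for key in index_keys if pattern.search(key)]

print(range_result)
print(range_result == full_result)
```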
Text indexes allow you to search for words inside texts. You can do the same using a regex on a non text-indexed text field, but it would be much slower.
Prior to MongoDB 2.6, text search operations had to be run through their own command, which was a big drawback because you couldn't combine them with other filters or treat the result as an ordinary cursor. Now text search is just another operator for the usual find method, and that's super nice.
So, why are a text index and its subsequent searches faster than a regex on a non-indexed text field? Because text indexes work like a dictionary, a clever one that's capable of discarding words on a per-language basis (defaults to English). When you run a text search query, you run it against the dictionary, saving yourself the time that would otherwise be spent iterating over the whole collection.
Keep in mind that the text index will grow along with your collection, and it can use a lot of space. I learnt this the hard way when using capped collections. There's no way to cap text indexes.
A regular index on a text field, such as
db.collection.ensureIndex( { field: 1 } )
will be useful only if you search for the whole text. It's used, for example, to look up alphanumeric hashes. It doesn't make sense to apply this kind of index when storing text paragraphs, phrases, etc.
I have a number of fields that hold unordered information, and I need to find the records that match a certain search pattern. For example:
field1 , field2
1 , "house, cars, people"
2 , "mazda, Jefff, cat 15th stre"
3 , "do, money, arreaz, cars"
.
.
N , "cars, postgres, json, abat"
How do I get Postgres to return only the records that contain the word "cars"?
Thanks to anyone who can help
You can use regular expressions. To match your particular example you can do the following:
field2 ~ '^cars$' or
field2 ~ ', *cars *,' or
field2 ~ '^cars *,' or
field2 ~ ', *cars$'
With regular expressions you can anchor the match to the beginning (^) or the end ($) of the string, or to the surrounding commas. The ' *' allows for the optional spaces around the commas in your sample data.
If PostgreSQL's definition of word characters in regular expressions (words consist of alphanumeric characters and/or the underscore _) is good enough for you, you can use this:
WHERE field2 ~ '\mcars\M'
You can also use the pg_trgm contrib module to speed up your queries.
But I strongly recommend that you look at PostgreSQL's full text search support (if you want to search the field like a description), or revise your structure (for example, use a separate table for the field2 entries (preferred), or even an array instead of the plain comma-separated field).
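For comparison, here is the same whole-word idea as a quick Python sketch run over the sample rows. Python's \b is used as a rough analogue of PostgreSQL's \m and \M, and the split variant mimics storing the entries as an array. Row 4 stands in for row N and row 5 is an invented negative case.

```python
import re

rows = {
    1: "house, cars, people",
    2: "mazda, Jefff, cat 15th stre",
    3: "do, money, arreaz, cars",
    4: "cars, postgres, json, abat",
    5: "carsharing, scars",  # hypothetical row: must not match
}

# Whole-word match, similar in spirit to field2 ~ '\mcars\M'.
word_cars = re.compile(r"\bcars\b")


def ids_by_regex(table):
    """Ids of rows whose field contains 'cars' as a whole word."""
    return [row_id for row_id, field in table.items() if word_cars.search(field)]


def ids_by_split(table):
    """Ids of rows where 'cars' is one of the comma-separated entries."""
    return [
        row_id
        for row_id, field in table.items()
        if "cars" in [entry.strip() for entry in field.split(",")]
    ]


print(ids_by_regex(rows))
print(ids_by_split(rows))
```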
I am implementing sphinx search in my rails application.
I want fuzzy search that tolerates spelling mistakes, e.g. if I enter the search query charact*a*ristics, it should find charact*e*ristics.
How should I implement this?
Sphinx doesn't naturally allow for spelling mistakes - it doesn't care if the words are spelled correctly or not, it just indexes them and matches them.
There are two options around this: either use thinking-sphinx-raspell to catch spelling errors by users when they search, and offer them the choice to search again with an improved query (much like Google does); or maybe use the soundex or metaphone morphologies so words are indexed in a way that accounts for how they sound. Search on this page for stemming, you'll find the relevant section. Also have a read of Sphinx's documentation on the matter.
I've no idea how reliable either option would be - personally, I'd opt for #1.
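To see why a sound-based morphology can absorb this particular typo, here is a minimal Python sketch of classic American Soundex. It is my own implementation for illustration and may differ in details from what Sphinx does internally.

```python
# Consonant groups of classic Soundex; vowels, h, w and y carry no code.
CODES = {}
for letters, digit in (
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the four-character Soundex code of `word`."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so repeats across vowels count
    return (result + "000")[:4]


print(soundex("characteristics"), soundex("charactaristics"))
```

Both spellings collapse to the same code because they differ only in a vowel, which Soundex ignores. A typo that changes a consonant group would not be caught this way.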
By default, Sphinx does not pay any attention to wildcard searching using an asterisk character. You can turn it on, though:
development:
  enable_star: true
# ... repeat for other environments
See http://pat.github.io/thinking-sphinx/advanced_config.html Wildcard/Star Syntax section.
Yes, Sphinx generally uses the extended match modes.
There are the following matching modes available:
SPH_MATCH_ALL, matches all query words (default mode);
SPH_MATCH_ANY, matches any of the query words;
SPH_MATCH_PHRASE, matches query as a phrase, requiring perfect match;
SPH_MATCH_BOOLEAN, matches query as a boolean expression (see Section 5.2, “Boolean query syntax”);
SPH_MATCH_EXTENDED, matches query as an expression in Sphinx internal query language (see Section 5.3, “Extended query syntax”);
SPH_MATCH_EXTENDED2, an alias for SPH_MATCH_EXTENDED;
SPH_MATCH_FULLSCAN, matches query, forcibly using the "full scan" mode as below. NB, any query terms will be ignored, such that filters, filter-ranges and grouping will still be applied, but no text-matching.
SPH_MATCH_EXTENDED2 was used during 0.9.8 and 0.9.9 development cycle, when the internal matching engine was being rewritten (for the sake of additional functionality and better performance). By 0.9.9-release, the older version was removed, and SPH_MATCH_EXTENDED and SPH_MATCH_EXTENDED2 are now just aliases.
enable_star
Enables star-syntax (or wildcard syntax) when searching through prefix/infix indexes. Optional, default is 0 (do not use wildcard syntax), for compatibility with 0.9.7. Known values are 0 and 1.
For example, assume that the index was built with infixes and that enable_star is 1. Searching should work as follows:
"abcdef" query will match only those documents that contain the exact "abcdef" word in them.
"abc*" query will match those documents that contain any words starting with "abc" (including the documents which contain the exact "abc" word only);
"*cde*" query will match those documents that contain any words which have "cde" characters in any part of the word (including the documents which contain the exact "cde" word only).
"*def" query will match those documents that contain any words ending with "def" (including the documents that contain the exact "def" word only).
Example:
enable_star = 1
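The four star-syntax cases listed above can be mimicked in Python with fnmatch, purely as a model of the matching rules. This says nothing about how Sphinx implements them; matching_words and the word list are invented for the example.

```python
from fnmatch import fnmatchcase


def matching_words(pattern, words):
    """Return the words that satisfy a star-syntax style pattern."""
    return [word for word in words if fnmatchcase(word, pattern)]


words = ["abc", "abcdef", "cde", "def", "xcdey", "zzz"]

for pattern in ("abcdef", "abc*", "*cde*", "*def"):
    print(pattern, matching_words(pattern, words))
```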