Postgres/full text search showing a preview of part of a document - postgresql

I'm using postgres 9.3 with full text search and I'm running a query like
select * from jobs where fts @@ plainto_tsquery('pg_catalog.english','search term');
I'm getting the proper results, however, I'd like to be able to get a portion of the search results that match the terms searched. The FTS column is just a to_tsvector() of the description column. What I'd like to do is show a short excerpt of the description, with the terms highlighted. Any ideas on how I'd achieve this?

This is what the ts_headline() function is intended for.
It is designed to deliver you excerpts or highlights of the "original" text you have normalized. The most basic usage would be this:
SELECT ts_headline(description, keywords) as result
FROM jobs, plainto_tsquery('pg_catalog.english','search term') as keywords
WHERE fts @@ keywords;
Note that "description" in this query is my guess at the name of the column that holds the original text, and "fts" is my guess at the column that contains the normalized text.
This query will return a result set containing an excerpt of your original text with the matching tokens highlighted with HTML <b> tags.
You can pass a comma-separated string of optional values into this function to alter its behavior. You could, for example, change the surrounding tags you get back by setting the StartSel and StopSel values:
SELECT ts_headline(description, keywords, 'StartSel=<em>,StopSel=</em>') as result
FROM jobs, plainto_tsquery('pg_catalog.english','search term') as keywords
WHERE fts @@ keywords;
Now the <b> tags will become <em> tags. They do not have to be HTML tags; you can pass in (almost) any string.
Another popular setting is the number of excerpts to return. MaxFragments controls the maximum number of excerpts, while MaxWords and MinWords set how much text each excerpt contains.
SELECT ts_headline(description, keywords, 'MaxFragments=4,MaxWords=5,MinWords=2') as result
FROM jobs, plainto_tsquery('pg_catalog.english','search term') as keywords
WHERE fts @@ keywords;
The above query will now show a maximum of four excerpts, each between two and five words long.
If you wish to simply show the whole document with the results highlighted, you could use the HighlightAll value, which overrides all fragment values set:
SELECT ts_headline(description, keywords, 'HighlightAll=true') as result
FROM jobs, plainto_tsquery('pg_catalog.english','search term') as keywords
WHERE fts @@ keywords;
Note: beware of using ts_headline() for it is a possible bottleneck in performance. For each record you wish to highlight, the database has to go and fetch the whole text, parse it and insert the desired start and end elements.
Please use the function with great care and only set it loose on a small portion (top five or top ten records) of your complete result set.
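One way to follow that advice is to rank and limit the matches in a subquery, and call ts_headline() only on the rows that survive the LIMIT. The sketch below wraps such a query in a small Python helper. It assumes a psycopg2-style DB-API connection (hence the %s placeholders) and an "id" column on jobs; the names HEADLINE_SQL, fetch_excerpts and top_hits are made up for illustration, and "description" and "fts" are the same guessed column names as above.

```python
# Rank and limit first, then build headlines only for the surviving rows.
# Assumptions: jobs has an "id" column; conn is a DB-API connection whose
# driver uses %s placeholders (psycopg2 does).
HEADLINE_SQL = """
SELECT id, ts_headline(description, keywords) AS excerpt
FROM (
    SELECT id, description, keywords, ts_rank(fts, keywords) AS rank
    FROM jobs, plainto_tsquery('pg_catalog.english', %s) AS keywords
    WHERE fts @@ keywords
    ORDER BY rank DESC
    LIMIT %s
) AS top_hits
ORDER BY rank DESC;
"""


def fetch_excerpts(conn, term, limit=10):
    """Return (id, excerpt) rows for the best `limit` matches of `term`."""
    cur = conn.cursor()
    try:
        # Both values are passed as parameters, never interpolated into the SQL.
        cur.execute(HEADLINE_SQL, (term, limit))
        return cur.fetchall()
    finally:
        cur.close()
```

With a real connection this would be called as fetch_excerpts(conn, 'search term', 5). Because the LIMIT sits inside the subquery, ts_headline() has at most that many documents to parse.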

Related

Using MYSQLI to select rows in which part of a column matches part of an input

I have a database in which one of the columns contains a series of information 'tags' about the row that are stored as a comma-separated list (a string) of dynamic length. I am using mysqli within PHP, and I want to select rows in which any of these items match any of the items in an input string.
For example, there could be a row describing an apple, containing the tags: "tasty, red, fruit, sour, sweet, green." I want this to show up as a result in a query like: "SELECT * FROM table WHERE info tags IN ('blue', 'red', 'yellow')", because it has at least one item ("red") overlapping. Kind of like "array_intersect" in PHP.
I think I could use IN if each row had only one tag, and I could use LIKE if I used only one input tag, but both are of dynamic length. I know I can loop over all the input tags, but I was hoping to put this in a single query. Is that possible? If not, can I use a different structure to store the tags in the database to make this possible (something other than a comma separated string)?
I think the best approach would be to create a tags table (id + label) and then a separate "table_tags" table which holds table_id and tag_id.
That means using JOINs to get the final result.
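A minimal sketch of that normalized layout follows, using Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The table names (items, tags, item_tags), the sample rows and the helper items_with_any_tag are all made up for illustration. The JOIN plus IN query itself should carry over to mysqli largely unchanged.

```python
import sqlite3

# In-memory stand-in database; the schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, label TEXT UNIQUE);
    CREATE TABLE item_tags (
        item_id INTEGER REFERENCES items(id),
        tag_id  INTEGER REFERENCES tags(id),
        PRIMARY KEY (item_id, tag_id)
    );
    INSERT INTO items VALUES (1, 'apple'), (2, 'sky'), (3, 'meeting');
    INSERT INTO tags VALUES (1, 'red'), (2, 'sweet'), (3, 'blue'), (4, 'bored');
    INSERT INTO item_tags VALUES (1, 1), (1, 2), (2, 3), (3, 4);
""")


def items_with_any_tag(conn, wanted):
    """Return names of items carrying at least one of the wanted tags."""
    wanted = list(wanted)
    if not wanted:
        return []
    # One placeholder per input tag, so the list can be of any length.
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT DISTINCT i.name
        FROM items AS i
        JOIN item_tags AS it ON it.item_id = i.id
        JOIN tags AS t ON t.id = it.tag_id
        WHERE t.label IN ({placeholders})
        ORDER BY i.name
    """
    return [row[0] for row in conn.execute(sql, wanted)]
```

Calling items_with_any_tag(conn, ['blue', 'red', 'yellow']) matches on whole tag labels, so an item tagged only "bored" is not picked up by a search for "red".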
Another (but lazy) solution would be to prefix and suffix the tags with commas, so the full column contains something like:
,tasty,red,fruit,sour,sweet,green,
You can then do a LIKE search without worrying about overlapping words (e.g. red vs. bored) and still get a proper match by using LIKE '%,WORD,%'

Openedge SDO -> smart data browser - I want to filter the query results

I have an SDO supplying data to a read-only browser. The SDO query joins several tables and has calculated fields as well as natural data fields.
The users now want a search facility so the browser will only show rows where the search word appears in ANY of the text fields.
For example they want to see rows where
customer.name matches "*bob*" OR
customer.address1 matches "*bob*" OR
product.description matches "*bob*" OR
calc_field_1 matches "*bob*" OR
calc_field_2 matches "*bob*" OR ...
Ideally the answer will filter the SDO output as it is created - but I am also happy to filter the data on the way to the smartbrowser or in the smartbrowser.
The business problem you're trying to solve is fraught with performance issues if you implement it as written. I'd suggest
adding another character column to the table or db,
putting all the words from the other columns in it,
applying a word-index to the new column,
doing a search on that column, and then linking back to the source tables.
It'll be much faster and easier to use.
I used a very simple solution in the end. Users can enter a string they are looking for. If the string is in a cell in the browser then the cell is highlighted in yellow.
Before this the users had to scroll up and down trying to spot the cells of interest in hundreds of rows. We did not have the time or budget for anything fancier.
The important bit of code in the smartbrowser is like this...
on row-display of br_table in frame f-main
do:
if rowObject.field1 matches "*BOB*" then
rowObject.field1:BGCOLOR in browse br_table = 14.
if rowObject.field2 matches "*BOB*" then
rowObject.field2:BGCOLOR in browse br_table = 14.
if rowObject.field3 matches "*BOB*" then
rowObject.field3:BGCOLOR in browse br_table = 14.
... etc ...
It's not hard-coded to look only for Bob, but you should get the idea.

elasticsearch array field of keywords - how to index it

I've got input that is analogous to tags, where there are a couple of strings per record, and they should be thought of as keywords, not to be tokenized or broken up or analyzed in any particular way. I want it to show up in faceting "as-is", including spaces, slashes, dashes and ampersands.
I don't think I need multi_field here. There is one input value per record "keyPhrases" but the input value is a simple json array of strings.
I want elasticsearch to insert into the facets each of the values, and tag the record with all of the phrases.
Usually there are only one or two or three phrases per record, but there could be more. The set of keyPhrases is fairly small, like 30 or at most like 50. They could be thought of as "categories".
The faceting keeps breaking up the input strings and using lowercasing, even though I'm trying to specify not_analyzed, keyword tokenizer, keyword analyzer, and trying things like that.
I have other fields that keep their spacing and capitalization as I desire in the facets returned, however those fields are not_analyzed and are also store: true, but are also just exactly 1 string input per record, as opposed to many per record.
I could just take the top 1 keyPhrase per record and flatten it, but ideally all the tags would work and be available as facets.
Any ideas on how to do this?
Well, this is embarrassing.
My strict mapping wasn't actually committed to the server at the time I was trying this.
(I was dropping the index and creating the index again with each new mapping, and hadn't realized it, and this was not the final mapping, so it was getting loaded and then dropped.)

Unable to use Sphinx MVA sql_attr_multi

I have a field called "tags" and it has values (say) "Music, Art, Sports, Food" etc. How can I use setFilter function in PHP-Sphinx for this field. I know that it has to be an integer and should be used as an array in PHP. So, if I use a numeric field for tags, what about the delimiters (in this case comma). Currently, I am using "sql_attr_multi" like this…
sql_attr_multi = uint tags from field
I have to filter the search based on any of the keywords the user has selected, Music, Sports, Food etc. As such, only MVA is the right option to do this. But I am just not able to figure out, how to do this. I can store all tag elements as numeric values and make the tags field as int. But what about the comma or how will I convert the whole string (Music, Art, Sports, Food) as an integer. Later, how do I call setFilter using PHP.
Any help is highly appreciated.
Using an MVA suggests you already have unique IDs for each tag.
You would have those if you had a separate table for tags (with a PK) and a many-to-many table joining your documents and tags. (That's a very common way to store tags, in normal form.)
If you have a text column containing the tag text, it would be easier to just use a field. You can easily filter by fields in the main text query.
crispy creams @tags Food
for example (that's an extended-mode query)
(But fields can't do Grouping like you can with Attributes)

Is it possible to perform a Sphinx search on one string attribute?

sql_query=SELECT id,headline,summary,body,tags,issues,published_at
FROM sphinx_search
I am working on the search feature of my Web site and I am using Sphinx, Perl and Sphinx::Search. As long as I want to search in all the attributes and I don't restrict it to just one, everything goes well. However when the user searches for a specific tag, I can't just give the result of a fuzzy search, I want to use the power of Sphinx to search only on tags or issues, maybe sometimes the user wants to search on headline and issues.
How can I perform such a task?
You need to put it in Extended Match Mode
https://metacpan.org/module/JJSCHUTZ/Sphinx-Search-0.27.2/lib/Sphinx/Search.pm#SetMatchMode
Then you can use Extended Query syntax
http://sphinxsearch.com/docs/current.html#extended-syntax
Which includes the field search operator
@tags keyword1
(Be careful: in Sphinx, the word "attribute" has a specific meaning. Attributes are values attached to the document, useful for sorting/grouping/filtering and for returning with the result set. I think you are talking about fields. All the columns from the sql_query that you don't mark as attributes are fields, and are full-text searchable.)