postgres: Search for multiple words in string - postgresql

I'm implementing a rudimentary form of search for my company's backoffice system. I would like to find all product names that contain all the words in a search query.
So if I have these two products:
Deodorant with cucumber flavor
Deodorant with apple flavor
the search query: cucumber deodorant should match only Deodorant with cucumber flavor.
I can make it work for the query deodorant cucumber like so:
SELECT product_name FROM products WHERE name ~* regexp_replace('deodorant cucumber', '\s+', '.*', 'g');
but this approach does not work when the word order in the query differs from the word order in the product name.
I know that this is possible by doing something like
SELECT product_name FROM products WHERE name ~* 'deodorant' AND name ~* 'cucumber';
but I would ideally like to stay away from string interpolation as it becomes a bit messy in my current environment.
Is there another, more elegant way to do this?

You could convert the value into an array and then use the "contains" operator @> with the words you are looking for:
SELECT product_name
FROM products
WHERE regexp_split_to_array(lower(name), '\s+') @> array['deodorant', 'cucumber'];
Online example: https://rextester.com/GNL7374
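Since the question mentions wanting to avoid string interpolation: the whole search string can also be passed as a single bind parameter and split inside the query, so no per-word SQL has to be built in the application. A minimal sketch of that idea (the literal here stands in for the parameter):
SELECT product_name
FROM products
WHERE regexp_split_to_array(lower(name), '\s+')
      @> regexp_split_to_array(lower('deodorant cucumber'), '\s+');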

Related

Search selected tables in database and line up results?

I'm trying to build a search page for my case management system, but I'm struggling to get something dynamic in place.
I would like to search multiple tables for a given string.
And return a list of cases these refer to.
E.g. I have 3 tables (the project includes several more, but to explain I'll just use 3).
1: Case main table, including caseID, title, and description.
2: notes table, including an ntext note field.
This is 1:* from the case table so each case can have multiple notes
3: address table, including street and city for the case
This is also 1:* from the case table so each case can have multiple addresses
I would like to search for e.g. "Sunset Boulevard", and if the string is found in either the case title, a note or an address, I would like to return the list of cases that match.
I can do a normal SELECT statement to get the caseID and Title, and specify in the WHERE clause which cases to include, e.g.:
SELECT CaseID, Title
FROM Cases
WHERE Cases.caseID IN ( SELECT CaseID FROM notes WHERE Notes.note like '%Sunset boulevard%' )
OR Cases.caseID IN ( SELECT CaseID FROM address WHERE address.address1 like '%Sunset boulevard%' )
And then expand the WHERE clause to all the columns I want to search.
But that won't give me any hint as to where the searched string was found.
I also found another article here, https://stackoverflow.com/a/709137, and could use this to search the entire database for fields, but this will still not give me a list of cases.
Has anyone got a "best practice" for creating a small search engine on a website?
Best practice would be to move such massive search functionality outside the OLTP area and use a dedicated search engine, e.g. Solr, Sphinx, Elasticsearch, etc.
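That said, if you want to stay in plain SQL for now, a UNION with a literal label per branch would also answer the "where was it found" part of the question. A rough sketch, using the table and column names from the question (adjust for your actual schema):
SELECT c.CaseID, c.Title, 'title' AS found_in
FROM Cases c
WHERE c.Title LIKE '%Sunset Boulevard%'
UNION
SELECT c.CaseID, c.Title, 'note' AS found_in
FROM Cases c
JOIN notes n ON n.CaseID = c.CaseID
WHERE n.note LIKE '%Sunset Boulevard%'
UNION
SELECT c.CaseID, c.Title, 'address' AS found_in
FROM Cases c
JOIN address a ON a.CaseID = c.CaseID
WHERE a.address1 LIKE '%Sunset Boulevard%'
A case that matches in several places shows up once per place, which is exactly the hint the plain IN version cannot give.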

How can I match up user inputs to ambiguous city names?

We have a set of tables that our other tables reference for location data. Some examples are:
Find all companies within X miles of X City
Create a company profile's location as X City
We solve the problem of multiple cities with similar names by matching on State as well, but now we've run into a different set of problems. We use Google's Place Autocomplete for both geocoding and matching up a user's query with our cities. This works fairly well until Google's format deviates from ours.
Example:
St. Louis !== Saint Louis and
Ameca del Torro !== Ameca Torro
Is there a way to fuzzy match cities in our queries?
Our query to match cities now looks like:
SELECT c.id
FROM city c
INNER JOIN state s
ON s.id = c.state_id
WHERE c.name = 'Los Angeles' AND s.short_name = 'CA'
I've also considered denormalizing city and simply storing coordinates to still accomplish the radius search. We have around 2 million rows in our company table now, so a radius search would be performed on that rather than on the city table with a JOIN on company. This would also mean we couldn't (simply) create custom regions for cities, or add other attributes to cities in the future.
I found this answer, but it basically affirms that our way of normalizing input is a good method; it doesn't show how to match against our local table (unless Google offers a City Name export I don't know about).
The short answer is that you can use Postgres's full text search functionality, with a customized search configuration.
Since you're dealing with place names, you probably want to avoid stemming, so you can use the simple configuration as a starting point. You can also add stop words that make sense for place names (with the examples above, you could probably consider "St.", "Saint", and "del" as stop words).
A pretty basic outline of setting up your customized configuration is below (a DDL sketch follows the list):
1. Create a stopwords file and put it in your $SHAREDIR/tsearch_data Postgres directory. See https://www.postgresql.org/docs/9.1/static/textsearch-dictionaries.html#TEXTSEARCH-STOPWORDS.
2. Create a dictionary that uses this stopwords list (you can probably use pg_catalog.simple as your template dictionary). See https://www.postgresql.org/docs/9.1/static/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY.
3. Create a search configuration for place names. See https://www.postgresql.org/docs/9.1/static/textsearch-configuration.html.
4. Alter your search configuration to use the dictionary you created in Step 2 (cf. the link above).
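As a rough DDL sketch of those steps (the names places_dict and places are illustrative, and a stopwords file places.stop is assumed to already be in $SHAREDIR/tsearch_data):
-- Step 2: a dictionary based on the simple template, with custom stop words
CREATE TEXT SEARCH DICTIONARY places_dict (
    TEMPLATE = pg_catalog.simple,
    STOPWORDS = places  -- reads $SHAREDIR/tsearch_data/places.stop
);
-- Step 3: a configuration for place names, copied from simple
CREATE TEXT SEARCH CONFIGURATION places (COPY = pg_catalog.simple);
-- Step 4: route plain word tokens through the new dictionary
ALTER TEXT SEARCH CONFIGURATION places
    ALTER MAPPING FOR asciiword, word WITH places_dict;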
Another consideration is how to handle internationalization. It seems that the issue with your second example (Ameca del Torro vs. Ameca Torro) might be a Spanish vs. English representation of the name. If that's the case, you could also consider storing both a "localized" and a "universal" (e.g. English) version of the city name.
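If you go that route, it could be as simple as two extra columns on the city table (the column names are just an illustration):
ALTER TABLE cities
    ADD COLUMN name_local text,      -- e.g. 'Ameca del Torro'
    ADD COLUMN name_universal text;  -- e.g. 'Ameca Torro'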
In the end, your query (using full-text search) might look like this (where 'places' is the name of your search configuration):
SELECT cities."id"
FROM cities
INNER JOIN "state" ON "state".id = cities.state_id
WHERE
"state".short_name = 'CA'
AND TO_TSVECTOR('places', cities.name) @@ TO_TSQUERY('places', 'Los & Angeles')

How to get best matching products by number of matches in postgres

A Postgres 9.1 shopping cart contains this products table:
create table products (
    id char(30) primary key,
    name char(50),
    description text
);
The cart has a search field. If something is entered into it, an autocomplete dropdown must show the best matches, ordered by the number of products matched by this criterion.
How to implement such a query in Postgres 9.1? The search should be performed on the name and description fields of the products table.
It is sufficient to use a substring match; full text search with sophisticated text matching is not strictly required.
Update
The word joote can be part of a product name or description.
For example, for the first match the text may contain
.. See on jootetina ..
and for another product
Kasutatakse jootetina tegemiseks ..
and another with upper case
Jootetina on see ..
In this case the query should return the word jootetina and a match count of 3.
How to make it work like the autocomplete that happens when a search term is typed in the Google Chrome address bar?
How to implement this?
Or, if this is difficult, how to return the word jootetina from all those texts which match the search term joote?
select word, count(distinct id) as total
from (
    -- split each product's name and description into individual words
    select id,
           regexp_split_to_table(name || ' ' || description, E'\\s+') as word
    from products
) s
where position(lower('joote') in lower(word)) > 0  -- case-insensitive substring match
group by word
order by 2 desc, 1
First of all, do not use the data type char(n). That's a misunderstanding; you want varchar(n) or just text. I suggest text.
Any downsides of using data type "text" for storing strings?
With that fixed, you need a smart index-based approach or this is a performance nightmare. Either trigram GIN indexes on the original columns or a text_pattern_ops btree index on a materialized view of individual words (with count).
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
The MV approach is probably superior for many repetitions among words.
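For illustration, a minimal sketch of the MV variant (the names are mine, and since CREATE MATERIALIZED VIEW only exists from Postgres 9.3, a plain table that you recreate periodically stands in for it on 9.1):
-- one row per distinct word, with the number of products containing it
CREATE TABLE product_words AS
SELECT word, count(DISTINCT id) AS total
FROM (
    SELECT id,
           regexp_split_to_table(lower(name || ' ' || description), '\s+') AS word
    FROM products
) s
GROUP BY word;
-- btree index supporting left-anchored patterns such as 'joote%'
CREATE INDEX product_words_word_idx ON product_words (word text_pattern_ops);
-- autocomplete lookup
SELECT word, total
FROM product_words
WHERE word LIKE 'joote%'
ORDER BY total DESC, word;
Note that the text_pattern_ops btree only helps left-anchored patterns; for a true infix match like the position() query above, the trigram GIN index option is the one to reach for.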

postgresql multiple column search with ranking

I want to search for multiple columns in multiple tables. Like this:
Given tables:
Users
id
first_name
last_name
email
Companies
user_id
address
Lands
name
company_id
Let's say a User is Johny Bravo (johny.bravo@gmail.com), working in Washington in the United States.
I want to find the record based on the query
"ate" -> from United States, or
"rav" from Bravo
When I type "rav", my Johny Bravo's rank is higher than that of Johny Bravos with other emails, so it comes first in the results.
How can I implement such functionality?
I've looked at ts_vector and ts_rank, but it seems that only a right-hand wildcard is supported ("to_tsquery('Brav:*')" will work). Also, I don't need full-text-search functionality (I will look for addresses and usernames, so no need to alias names etc.). I can do a wildcard search, but then I would have to calculate the ranking manually in the application.
You could use the pg_trgm extension.
You must have the contrib modules installed; then you create the extension:
create extension pg_trgm;
Then you can create trigram indexes (note that user is a reserved word in Postgres, so the table from the question is referenced as users here):
create index users_trgm_idx on users using gist (user_data gist_trgm_ops);
And you can then run a query which will give you the first 10 most similar values:
select * from users order by user_data <-> 'rav' limit 10;
Note that you can replace user_data with an immutable function, which can concatenate all of the info into one (text) field thus enabling search across more fields.
To get "ranking score", you can use similarity function, which returns 1 for identical strings and 0 for completely unrelated.
If you need full text search across whole database, a better solution might be a separate search facility, such as Apache Solr.

Mysql to Sphinx query conversion

How can I write this query in Sphinx: select * from vehicle_details where make LIKE "%john%" OR id IN (1,2,3,4)? Can anyone help me? I've searched a lot and I can't find the answer. Please help.
Well, if you really want to use Sphinx, you could perhaps make id into a fake keyword, so you can use it in the MATCH, e.g.
sql_query = SELECT id, CONCAT('id',id) as _id, make, description FROM ...
Now you have a id based keyword you can match on.
SELECT * FROM index WHERE MATCH('(@make *john*) | (@_id id1|id2|id3|id4)')
But do read up on Sphinx keyword matching: Sphinx by default only matches whole words, so you need to enable part-word matching with wildcards (e.g. with min_infix_len) to get close to a simple LIKE '%..%' match (which doesn't take words into account).
Actually this is pretty hard to do, because you are mixing a string search (the LIKE, which will become a MATCH) with an attribute filter.
I would suggest two separate queries: one to Sphinx for the text filter, and the IN filter done directly in the database (MySQL?). Then merge the results in the application.