Combining all text fields into a single searchable field? - sphinx

One would normally have this query in their sphinx.conf file:
sql_query = SELECT id,text_field1,text_field2,text_field3 FROM table_name
Would there be much difference if I combined all the fields into one searchable text field, like so?
sql_query = SELECT id, CONCAT(text_field1,text_field2,text_field3) as searchable_text FROM table_name
What benefits does one have over the other?
Thanks!

I think either way is generally fine. However, Sphinx can restrict a query to specific fields (see the extended query syntax examples). If you merge all the columns into one field, you lose that ability.
You'll also lose the ability to weight certain fields higher than others.

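CONCAT(text_field1,text_field2,text_field3) is wrong: without a separator, the last word of one column runs into the first word of the next.
Use CONCAT(text_field1,' ',text_field2,' ',text_field3) instead.
But it's better to let the index keep the fields separate. Plain searches return the same results, but you can still restrict a query to one field when needed, e.g.:
'@text_field2 foo'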
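To see why the separator matters, here is a small Python sketch that mimics a whitespace tokenizer. It is a simplification (Sphinx's real tokenizer is governed by charset_table and related settings), and the row values are made up, but it shows the effect: gluing the columns together fuses the boundary words, so neither original word is searchable any more.

```python
def tokenize(text):
    # Simplified stand-in for a full-text tokenizer: lowercase, split on whitespace.
    return text.lower().split()

row = ("Red apple", "Banana split", "Cherry pie")  # hypothetical column values

glued = "".join(row)       # like CONCAT(text_field1, text_field2, text_field3)
separated = " ".join(row)  # like CONCAT(text_field1, ' ', text_field2, ' ', text_field3)

print(tokenize(glued))      # ['red', 'applebanana', 'splitcherry', 'pie']
print(tokenize(separated))  # ['red', 'apple', 'banana', 'split', 'cherry', 'pie']
```

In MySQL, CONCAT() also returns NULL if any argument is NULL, so a single NULL column would blank the whole document; CONCAT_WS(' ', text_field1, text_field2, text_field3) adds the separator and skips NULL arguments.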

Related

Multiple optional query parameters with PostgreSQL

I use PostgreSQL 10 and I want to build queries that have multiple optional parameters.
A user must enter an area name, but can then pick none, or any combination, of the following: event, event date, category, category date, style.
So a full query could be "all the banks (category), constructed in 1990 (category date) with modern architecture (style), that got renovated in 1992 (event and event date) in the area of NYC (area)".
My problem is that all those are in different tables, connected by many-to-many tables, so I cannot do something like
SELECT * FROM mytable
WHERE (Event IS NULL OR Event = event)
I don't know whether anything good will come of just joining four tables.
I can easily find the area id, since it is required, but beyond that I don't know what the user chose.
Any suggestions on how to approach this in Postgres?
Thanks
It might be optimal to build the entire query dynamically and only join in tables that you know you're going to need in order to apply the user's filters, but it's impractical. You're better off creating a view on the full set of tables. Use LEFT OUTER JOINs to ensure that you don't accidentally filter out valid combinations and index your tables to ensure that the query planner can navigate the table graph quickly. Then query the view with a WHERE clause reflecting only the filters you want to apply.
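A minimal sketch of the view-plus-WHERE approach, using Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs anywhere. The tables (buildings, categories, styles and their link tables) are hypothetical, and events are left out for brevity. Only the filters the user actually supplied are added to the WHERE clause; column names come from a fixed whitelist because they cannot be passed as bind parameters. With psycopg2 against PostgreSQL the placeholders would be %s instead of ?.

```python
import sqlite3

# Hypothetical schema: buildings linked to categories and styles through
# many-to-many tables. Names are illustrative, not from the question.
SCHEMA = """
CREATE TABLE buildings (id INTEGER PRIMARY KEY, name TEXT, area TEXT);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE styles (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE building_categories (building_id INTEGER, category_id INTEGER);
CREATE TABLE building_styles (building_id INTEGER, style_id INTEGER);

-- LEFT JOINs keep buildings that have no category or no style.
CREATE VIEW building_search AS
SELECT b.id, b.name, b.area, c.name AS category, s.name AS style
FROM buildings b
LEFT JOIN building_categories bc ON bc.building_id = b.id
LEFT JOIN categories c ON c.id = bc.category_id
LEFT JOIN building_styles bs ON bs.building_id = b.id
LEFT JOIN styles s ON s.id = bs.style_id;
"""

def search(conn, area, **optional):
    """Query the view with the required area plus whichever optional filters were given."""
    allowed = {"category", "style"}  # whitelist: column names cannot be bound as parameters
    clauses, params = ["area = ?"], [area]
    for column, value in optional.items():
        if column not in allowed:
            raise ValueError(f"unknown filter: {column}")
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = ("SELECT DISTINCT id, name FROM building_search WHERE "
           + " AND ".join(clauses) + " ORDER BY id")
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO buildings VALUES (?, ?, ?)",
                 [(1, "First Bank", "NYC"), (2, "Old Bank", "NYC"), (3, "Gallery", "NYC")])
conn.executemany("INSERT INTO categories VALUES (?, ?)", [(1, "bank"), (2, "museum")])
conn.executemany("INSERT INTO styles VALUES (?, ?)", [(1, "modern"), (2, "gothic")])
conn.executemany("INSERT INTO building_categories VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])
conn.executemany("INSERT INTO building_styles VALUES (?, ?)", [(1, 1), (2, 2)])

print(search(conn, "NYC"))                                   # all three buildings
print(search(conn, "NYC", category="bank"))                  # buildings 1 and 2
print(search(conn, "NYC", category="bank", style="modern"))  # building 1 only
```

The DISTINCT is there because a building with several categories or styles appears once per combination in the joined view. The Gallery has no style at all, yet it still shows up when no style filter is given, which is what the LEFT JOINs are for.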
If performance becomes a concern and you don't mind having non-realtime data, you could use a materialized view to cache the results. Materialized views can be indexed directly, but this is a pretty radical change, so don't do it unless you have to.

How to get best matching products by number of matches in postgres

A Postgres 9.1 shopping cart contains a products table:
create table products (
id char(30) primary key,
name char(50),
description text );
The cart has a search field. If something is entered into it, an autocomplete dropdown must show the best-matching products, ordered by the number of products matching the criteria.
How can such a query be implemented in Postgres 9.1? The search should be performed on the name and description fields of the products table.
It is sufficient to use substring match. Full text search with sophisticated text match is not strictly required.
Update
Word joote can be part of product name or description.
For example, the text of the first match may contain
.. See on jootetina ..
and for other product
Kasutatakse jootetina tegemiseks ..
and another with upper case
Jootetina on see ..
In this case query should return word jootetina and matching count 3.
How can I make it work like the autocomplete in the Google Chrome address bar?
How can this be implemented?
Or, if that is difficult, how can I return the word jootetina from all those texts that match the search term joote?
select lower(word) as word, count(distinct id) as total
from (
select id,
regexp_split_to_table(coalesce(name, '') || ' ' || coalesce(description, ''), E'\\s+') as word
from products
) s
where position(lower('joote') in lower(word)) > 0
group by lower(word)
order by 2 desc, 1
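The same logic, written as a Python sketch for illustration: split each product's name and description on whitespace, keep words containing the search term, and count distinct product ids per word. Grouping on the lowercased word is what makes "Jootetina" and "jootetina" count together. The product rows below are hypothetical, built from the snippets in the question; for real data this belongs in the database, not in application code.

```python
import re
from collections import defaultdict

def matching_words(products, term):
    """Return (word, product_count) pairs for words containing `term`,
    most frequent first, then alphabetically."""
    term = term.lower()
    ids_per_word = defaultdict(set)
    for product_id, name, description in products:
        # Treat missing name/description as empty, like coalesce(..., '').
        text = f"{name or ''} {description or ''}"
        for word in re.split(r"\s+", text.lower()):
            if word and term in word:
                ids_per_word[word].add(product_id)
    return sorted(((w, len(ids)) for w, ids in ids_per_word.items()),
                  key=lambda pair: (-pair[1], pair[0]))

products = [
    (1, "Toode A", ".. See on jootetina .."),
    (2, "Toode B", "Kasutatakse jootetina tegemiseks .."),
    (3, "Toode C", "Jootetina on see .."),
]
print(matching_words(products, "joote"))  # [('jootetina', 3)]
```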
First of all, do not use the data type char(n). That's a misunderstanding; you want varchar(n) or just text. I suggest text.
See: Any downsides of using data type "text" for storing strings?
With that fixed, you need a smart index-based approach, or this is a performance nightmare: either trigram GIN indexes (pg_trgm) on the original columns, or a text_pattern_ops btree index on a materialized view of individual words (with counts). Materialized views only arrived in Postgres 9.3, so on 9.1 that would be a regular table you keep up to date yourself.
See: Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
The materialized-view approach is probably superior when the same words repeat many times.
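For intuition about why a trigram index can serve LIKE '%joote%', here is a rough Python model of pg_trgm. It follows the documented extraction rule (each word is padded with two leading spaces and one trailing space, so "cat" yields "  c", " ca", "cat", "at "), but it is a simplification, not pg_trgm's actual code. A row can only contain the fragment if it contains all of the fragment's trigrams; that containment test is the cheap prefilter, and a recheck then removes false positives.

```python
import re

def trigrams(text):
    """Rough model of pg_trgm extraction: lowercase, take runs of letters/digits
    as words, pad each with two leading spaces and one trailing space,
    and collect every 3-character window."""
    result = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result

def required_trigrams(fragment):
    """Trigrams any text containing `fragment` must include. No padding,
    because LIKE '%fragment%' says nothing about word boundaries."""
    fragment = fragment.lower()
    return {fragment[i:i + 3] for i in range(len(fragment) - 2)}

def candidates(rows, fragment):
    """Prefilter on trigram containment (what the index does), then recheck."""
    needed = required_trigrams(fragment)
    prefiltered = [r for r in rows if needed <= trigrams(r)]
    return [r for r in prefiltered if fragment.lower() in r.lower()]

rows = [".. See on jootetina ..", "Kasutatakse jootetina tegemiseks ..", "Unrelated product text"]
print(candidates(rows, "joote"))
```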

Mysql to Sphinx query conversion

How can I write this query in Sphinx?
select * from vehicle_details where make LIKE "%john%" OR id IN (1,2,3,4)
I've searched a lot and can't find the answer. Can anyone help?
Well, if you really want to use Sphinx, you could make the id into a fake keyword so you can use it in the MATCH, e.g.
sql_query = SELECT id, CONCAT('id',id) as _id, make, description FROM ...
Now you have an id-based keyword you can match on.
SELECT * FROM index WHERE MATCH('(@make *john*) | (@_id id1|id2|id3|id4)')
But do read up on Sphinx keyword matching: by default Sphinx only matches whole words, so you need to enable partial-word matching with wildcards (e.g. with min_infix_len) to get close to a simple LIKE '%...%' match, which doesn't take word boundaries into account.
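If the id list comes from application code, the MATCH() text can be assembled with a small helper. This is a hypothetical Python sketch: it assumes the fake id<N> keywords are indexed in the _id field as described above, uses Sphinx's @field operator, and assumes min_infix_len is set so *john* works. It does not escape Sphinx special characters in the make term, which real code should do.

```python
def build_match(make_term, ids):
    """Hypothetical helper: MATCH() text equivalent to
    make LIKE '%term%' OR id IN (...), using fake id<N> keywords in _id."""
    if not ids:
        return f"@make *{make_term}*"
    # int() rejects anything that is not a numeric id.
    id_keywords = "|".join(f"id{int(i)}" for i in ids)
    return f"(@make *{make_term}*) | (@_id {id_keywords})"

print(build_match("john", [1, 2, 3, 4]))
# (@make *john*) | (@_id id1|id2|id3|id4)
```

The resulting string is then passed as a bound parameter to SELECT * FROM index WHERE MATCH(...).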
It's actually pretty hard to do, because you are mixing a string search (the LIKE, which becomes a MATCH) with an attribute filter.
I would suggest two separate queries: one to Sphinx for the text filter, and the IN filter done directly in the database (MySQL?). Merge the results in the application.
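Because the original condition is an OR, merging means taking the union of the two id lists. A minimal Python sketch, assuming each query returns a list of ids (the function name is hypothetical); it keeps Sphinx's relevance order first and appends ids found only by the database query:

```python
def merge_ids(sphinx_ids, db_ids):
    """Union of ids from both queries without duplicates:
    Sphinx results first (relevance order), then database-only ids."""
    seen = set()
    merged = []
    for doc_id in list(sphinx_ids) + list(db_ids):
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged

print(merge_ids([7, 3, 1], [1, 2, 3, 4]))  # [7, 3, 1, 2, 4]
```

The merged ids can then be used in a single SELECT ... WHERE id IN (...) to fetch the full rows.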

Is it possible to filter table names within a single database?

As far as I can tell, the search filter in the navigator will only search available database names, not table names.
If you click on a table name and start typing, it appears that a simple search is performed, matching the first letters of the table names.
I'm looking for a way to search all table names in a selected database. Sometimes there can be a lot of tables to sort through. It seems like a feature that would likely be there, but I can't find it.
Found out the answer...
If you type for example *.test_table or the schema name instead of the asterisk it will filter them. The key is that the schema/database must be specified in the search query. The asterisk notation works with the table names as well. For example *.*test* will filter any table in any schema with test anywhere in the table name.
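The filter syntax described above behaves like shell-style globbing over schema.table names. As an illustration only (this models the behaviour and is not Workbench's code; case-insensitivity is an assumption), Python's fnmatch module reproduces it:

```python
from fnmatch import fnmatchcase

def filter_tables(qualified_names, pattern):
    """Keep 'schema.table' names matching a glob pattern such as '*.*test*'.
    Case-insensitive by assumption: both sides are lowercased."""
    pattern = pattern.lower()
    return [name for name in qualified_names if fnmatchcase(name.lower(), pattern)]

tables = ["shop.orders", "shop.test_orders", "crm.customers", "crm.customers_test"]
print(filter_tables(tables, "*.*test*"))  # ['shop.test_orders', 'crm.customers_test']
print(filter_tables(tables, "crm.*"))     # ['crm.customers', 'crm.customers_test']
```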
You can use the command
SHOW TABLES like '%%';
To always have it at hand, you can add it as a snippet to the SQL Additions panel on the right.
Then you can always either bring it in your editor and type your search key between %%, or just execute it as it is (It will fetch all the tables of the database) and then just filter using the "filter rows" input of the result set.

FullText Index - Searching values from another table

Is it possible, in SQL Server 2008, using the full text index syntax, to run a query such as this one?
SELECT *
FROM TABLE_TO_SEARCH S,
TABLE_WITH_STRINGS_TO_SEARCH SS
WHERE
CONTAINS(S.WHOLE_NAME,SS.FIRST_NAME)
OR CONTAINS(S.WHOLE_NAME,SS.LAST_NAME)
I need to search for FIRST_NAME in the WHOLE_NAME column of TABLE_TO_SEARCH, which has a full-text index on it. It doesn't seem to be a valid query, though. Is there any workaround that still uses the full-text index?
LATER EDIT:
Here is the business case: each night I download information about "blacklisted" individuals from several websites and insert it into a table in this format: WholeName, LastName, FirstName, MiddleName. But the data is chaotic: WholeName does not necessarily contain the last, first or middle name; or WholeName is null while the other three fields have values; or all four fields are null; and so on. The data may also repeat, as one blacklisted individual may come from two or more of these websites. What I need to do is compare this data, chaotic as it is, against our customer data based on our customers' first and last names, and give each customer a matching score (rank) against the files we download from these websites.
First I tried CHARINDEX and LIKE, but I couldn't build a scoring algorithm on top of them, and comparing just our customers' first and last names against only the WholeName column of TABLE_TO_SEARCH took 6 hours. I thought a full-text index would make it easier and faster, but apparently I was wrong.
Has anyone dealt with a task like this? And if so, what was the best approach?
After skimming http://technet.microsoft.com/en-us/library/ms187787.aspx and http://technet.microsoft.com/en-us/library/ms142571.aspx I don't think it is possible to do your search in this way. Not only that, but it seems this type of index wouldn't work well with names anyway.
If you only need to check one name at a time, assign the values to variables and pass them to CONTAINS; that way you can use the full-text index.
Otherwise, I would suggest splitting the WHOLE_NAME column (if there is a space or unique character between the first and last name) and comparing each part to those other columns. If you are working with a huge data set, you may want to experiment with doing this at a temp table level and creating an index.
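The split-and-compare idea can be sketched as follows. This is a hypothetical scoring scheme, written in Python for readability rather than T-SQL: each blacklist row's four name columns are broken into lowercase name tokens (NULLs yield none), and a customer scores one point for each of their first and last names found among those tokens. In SQL Server the same comparison would be done set-based, for example by loading the tokens into an indexed temp table.

```python
import re

def name_tokens(text):
    """Lowercase runs of letters; None or empty text yields no tokens."""
    return set(re.findall(r"[^\W\d_]+", (text or "").lower()))

def match_score(customer_first, customer_last,
                whole_name, first_name, last_name, middle_name):
    """Hypothetical score (0, 1 or 2): one point for each customer name whose
    tokens all appear somewhere in the blacklist row's four columns."""
    row_tokens = (name_tokens(whole_name) | name_tokens(first_name)
                  | name_tokens(last_name) | name_tokens(middle_name))
    score = 0
    for part in (customer_first, customer_last):
        part_tokens = name_tokens(part)
        if part_tokens and part_tokens <= row_tokens:
            score += 1
    return score

print(match_score("John", "Smith", "SMITH, John A.", None, None, None))  # 2
print(match_score("John", "Smith", None, "Jon", "Smith", None))          # 1
```

Exact token matching misses spelling variants such as "Jon" versus "John"; a fuzzier comparison (SQL Server's SOUNDEX or DIFFERENCE functions, for instance) could be layered on top if needed.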
Good luck!