Can I create an index FROM another index? - sphinx

I am applying various regexp_filter rules in an index to modify the indexed text. The data is originally in tagged HTML format (multiple zones). After doing so, is it possible to make a second index, based on the first modified index, that uses only one of the zones in the original index?
index_zones = Title, Author, Description
Can I, after indexing this with a custom configuration, then duplicate this index in some way that says
Create IndexB based on IndexA using ZONE:(Title) only
Say, for instance, I used the following regexp_filter to add "Used" to the Title zone:
regexp_filter=(<Title>.*?ipad.*?)(<\/Title>)<Description>.*?Used.*?<\/Description>=>\1 Used\2
Now I want to reindex or make a new index with just the newly indexed
<Title>Bla bla ipad bla bla Used</Title>
Is this possible? If not, can I then update my MySQL table with the newly indexed text?

I don't think it's possible to create an index based on an existing Sphinx index. I also don't think it's possible to retrieve the regexp_filter result; I'm pretty sure it's only available to query against.
Why don't you do your regexes before Sphinx indexing? For example, create a new DB column ipad_used_regex and populate it with whatever scripting language you choose. Or, using MariaDB with its PCRE regex enhancements, you could build the regex match into the SQL, something like this:
SELECT Title, REGEXP_REPLACE(Title, "(<Title>.*?ipad.*?)(<\/Title>)<Description>.*?Used.*?<\/Description>", '\\1 Used\\2') as ipad_used_regex
FROM `your_table`
You could then use this SQL in your Sphinx index source (for example, as its sql_query).
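Below is a minimal Python sketch of the "regex before indexing" step. It assumes the tagged HTML sits in a single text column and that the script writes the result back to the hypothetical ipad_used_regex column before Sphinx runs. It applies the same pattern as the regexp_filter in the question. It also pulls out just the rewritten Title zone, which could feed a second, Title-only index. The function names are illustrative.

```python
import re

# Same rule as the regexp_filter in the question; re.DOTALL is an assumption
# so that zones spanning several lines still match.
IPAD_USED = re.compile(
    r"(<Title>.*?ipad.*?)(</Title>)<Description>.*?Used.*?</Description>",
    re.DOTALL,
)
TITLE_ZONE = re.compile(r"<Title>(.*?)</Title>", re.DOTALL)


def apply_ipad_used(html: str) -> str:
    """Append " Used" to the Title zone when the rule matches.

    Like the regexp_filter, the whole match (Title plus Description) is
    replaced by the Title zone with " Used" added; other text is kept.
    """
    return IPAD_USED.sub(r"\1 Used\2", html)


def title_only(html: str) -> str:
    """Text of the (rewritten) Title zone, or "" if there is none."""
    match = TITLE_ZONE.search(apply_ipad_used(html))
    return match.group(1) if match else ""
```

Because the rewrite happens before indexing, the modified text exists in the database. It can then be stored, queried, or indexed again, which is not possible once the change lives only inside a Sphinx index.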

Related

Must be possible to filter table names in a single database?

As far as I can tell, the search filter in the navigator will only search available database names, not table names.
If you click on a table name and start typing, it appears that a simple search is performed on the first letters of the table names.
I'm looking for a way to search all table names in a selected database. Sometimes there can be a lot of tables to sort through. It seems like a feature that would likely be there, but I can't find it.
Found out the answer...
If you type, for example, *.test_table (or the schema name instead of the asterisk), it will filter the tables. The key is that the schema/database must be specified in the search query. The asterisk notation works with table names as well; for example, *.*test* will filter any table in any schema with test anywhere in its name.
You can use the command
SHOW TABLES like '%%';
To always have it at hand, you can add it as a snippet in the SQL Additions panel on the right.
Then you can either bring it into your editor and type your search key between the %%, or just execute it as is (it will fetch all the tables of the database) and then filter using the "Filter Rows" input of the result set.
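The filtering comes from the LIKE pattern: % matches any run of characters, so '%test%' keeps only table names that contain test. SQLite has no SHOW TABLES, so this Python sketch (standard-library sqlite3; the function name is hypothetical) runs the same kind of LIKE filter against sqlite_master to show the pattern semantics. In MySQL you would keep using SHOW TABLES LIKE.

```python
import sqlite3


def tables_like(conn: sqlite3.Connection, pattern: str) -> list[str]:
    """Return table names matching a SQL LIKE pattern, e.g. '%test%'.

    SQLite has no SHOW TABLES, so this queries sqlite_master instead;
    LIKE is case-insensitive for ASCII letters in SQLite.
    """
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE ? ORDER BY name",
        (pattern,),
    ).fetchall()
    return [name for (name,) in rows]
```

Note that _ is also a LIKE wildcard (any single character). A pattern such as 'test_%' therefore matches more than a literal underscore.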

FullText Index - Searching values from another table

Is it possible, in SQL Server 2008, using the full text index syntax, to run a query such as this one?
SELECT *
FROM TABLE_TO_SEARCH S,
TABLE_WITH_STRINGS_TO_SEARCH SS
WHERE
CONTAINS(S.WHOLE_NAME,SS.FIRST_NAME)
OR CONTAINS(S.WHOLE_NAME,SS.LAST_NAME)
I need to search for FIRST_NAME in the WHOLE_NAME column of TABLE_TO_SEARCH, which has a full-text index on it. It doesn't seem to be a valid query, though. Is there any workaround that still uses the full-text index?
LATER EDIT:
Here is the business case: each night I download information about "blacklisted" individuals from several websites and insert it into a table in this format: WholeName, LastName, FirstName, MiddleName. But the data is chaotic. WholeName does not necessarily contain the last, first or middle name. Sometimes WholeName is null while the other three fields have values, sometimes all four fields are null, and so on. The data may also repeat itself, since one blacklisted individual may come from two or more of these websites. What I need to do is compare this data, as chaotic as it is, against our customer data based on each customer's first and last name. Each match then needs a score (rank) against the files we download from these websites.
First I tried CHARINDEX and LIKE, but I couldn't build a scoring algorithm on them. It also took 6 hours just to compare our customers' first and last names with the WholeName column of the TABLE_TO_SEARCH table. I thought a full-text index would make this easier and faster, but apparently I was wrong.
Has anyone dealt with a task like this? And if so, what was the best approach?
After skimming http://technet.microsoft.com/en-us/library/ms187787.aspx and http://technet.microsoft.com/en-us/library/ms142571.aspx I don't think it is possible to do your search in this way. Not only that, but it seems this type of index wouldn't work well with names anyway.
If you only need to check one name at a time, put the values into variables and pass those to CONTAINS. This way you can use the full-text index.
Otherwise, I would suggest splitting the WHOLE_NAME column (if there is a space or unique character between the first and last name) and comparing each part to those other columns. If you are working with a huge data set, you may want to experiment with doing this at a temp table level and creating an index.
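Below is a minimal Python sketch of the split-and-compare idea. It assumes the nightly records are loaded as (WholeName, LastName, FirstName, MiddleName) tuples keyed by a hypothetical record id. The score is the share of the customer's name tokens found anywhere in a record. That is an illustrative choice, not a standard algorithm. The token-to-record map plays the role of the indexed temp table: each customer is compared only with records that share at least one token, instead of with every row.

```python
import re
from collections import defaultdict
from typing import Optional

# (WholeName, LastName, FirstName, MiddleName); any field may be None.
Record = tuple[Optional[str], Optional[str], Optional[str], Optional[str]]


def name_tokens(*parts: Optional[str]) -> set[str]:
    """Lower-cased word tokens from all non-null name fields."""
    tokens: set[str] = set()
    for part in parts:
        if part:
            tokens.update(re.findall(r"\w+", part.lower()))
    return tokens


def build_token_index(records: dict[int, Record]) -> dict[str, set[int]]:
    """Map each token to the ids of the records that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for rec_id, record in records.items():
        for token in name_tokens(*record):
            index[token].add(rec_id)
    return index


def match_score(first: str, last: str, record: Record) -> float:
    """Share of the customer's name tokens found anywhere in the record."""
    customer = name_tokens(first, last)
    if not customer:
        return 0.0
    return len(customer & name_tokens(*record)) / len(customer)


def score_customer(
    first: str,
    last: str,
    records: dict[int, Record],
    index: dict[str, set[int]],
) -> list[tuple[int, float]]:
    """Candidate records sharing a token with the customer, best first."""
    candidates = set().union(
        *(index.get(token, set()) for token in name_tokens(first, last))
    )
    scores = {rid: match_score(first, last, records[rid]) for rid in candidates}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For example, with one record holding only WholeName "SMITH, John Albert" and another holding only LastName "Smith" and FirstName "Jane", the customer John Smith scores 1.0 against the first and 0.5 against the second.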
Good luck!

Grails index field order

Consider a table containing NAME, DATE, and VALUE, (amongst other things) that I want to index on NAME/DATE/VALUE and NAME/VALUE/DATE.
How do I configure this in Grails?
The following always creates both indexes with the column order NAME/DATE/VALUE:
name(index:'name_date_value_idx,name_value_date_idx')
date(index:'name_date_value_idx,name_value_date_idx')
value(index:'name_date_value_idx,name_value_date_idx')
The following ignores the first date index mapping, meaning name_date_value_idx doesn't contain the date field when it is created:
name(index:'name_date_value_idx,name_value_date_idx')
date(index:'name_date_value_idx')
value(index:'name_date_value_idx,name_value_date_idx')
date(index:'name_value_date_idx')
For completeness we're using a PostgreSQL Database. I would hope that this wouldn't matter though.
Thanks very much.
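Outside GORM's mapping block, the goal is simply two composite indexes over the same columns in different orders. The key order is whatever order the CREATE INDEX statement lists. Below is a small sketch of that target DDL, using SQLite from Python's standard library in place of PostgreSQL so it runs without a server. The table name reading is hypothetical, and "date" and "value" are quoted only to stay safe with keywords. The helper reads each index's key order back, so you can check what was actually created. It does not show how to express this in GORM itself.

```python
import sqlite3

# Hypothetical table standing in for the Grails domain class.
DDL = """
CREATE TABLE reading (
    id      INTEGER PRIMARY KEY,
    name    TEXT,
    "date"  TEXT,
    "value" REAL
);
-- Same three columns, two different key orders.
CREATE INDEX name_date_value_idx ON reading (name, "date", "value");
CREATE INDEX name_value_date_idx ON reading (name, "value", "date");
"""


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def index_columns(conn: sqlite3.Connection, index_name: str) -> list[str]:
    """Column names of an index in key order (index_name must be trusted)."""
    rows = conn.execute(f"PRAGMA index_info('{index_name}')").fetchall()
    # Each row is (seqno, cid, name); sorting by seqno gives the key order.
    return [name for _seqno, _cid, name in sorted(rows)]
```

In PostgreSQL, \d reading in psql lists the indexes together with their columns, which gives the same check against the schema Grails generated.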

Postgres full-text search with synonyms

I have a database of restaurants which I do a full-text search on. The code looks something like this:
SELECT * FROM restaurant WHERE restaurant.search_vector @@ plainto_tsquery(:terms);
And search_vector is defined like this:
alter table restaurant add column search_vector tsvector;
create index restaurant_search_index on restaurant using gin(search_vector);
create trigger restaurant_search_update before update or insert on restaurant
for each row execute procedure
tsvector_update_trigger('search_vector',
'pg_catalog.english','title');
Now, a notable problem with this search is the word barbecue. It can be spelled many different ways: barbecue, barbeque, BBQ, B.B.Q., B-B-Q, etc. When somebody searches any of these, I need to search restaurants for all of these terms.
From what I've read online, it seems I need to modify the dictionary (That would be pg_catalog.english, right?), but I'm not sure how to go about this.
Sounds like what you want to do is add a synonym dictionary in front of your english one. This will only work on single words though, so you might have problems with B.B.Q. if it gets parsed as three separate tokens.
See the synonym dictionary section of the text search documentation on postgresql.org.
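The idea behind a synonym dictionary is that every spelling is reduced to one canonical term at both indexing and query time, so any variant matches any other. Below is a small Python emulation of that behaviour. It is not PostgreSQL's configuration, which lives in SQL plus a .syn file. The synonym table and function names are hypothetical. The sketch also collapses dotted or hyphenated acronyms before tokenizing, to illustrate the B.B.Q. concern: otherwise they would split into single letters.

```python
import re

# Hypothetical synonym table: every spelling maps to one canonical term,
# much like a synonym dictionary placed ahead of english_stem.
SYNONYMS = {
    "barbecue": "barbecue",
    "barbeque": "barbecue",
    "bbq": "barbecue",
}

# One-letter groups separated by "." or "-", e.g. "B.B.Q." or "B-B-Q".
DOTTED_ACRONYM = re.compile(r"\b(?:[A-Za-z][.-])+[A-Za-z]\b\.?")


def normalize(text: str) -> list[str]:
    """Tokenize text and map every token through SYNONYMS."""
    # Collapse dotted/hyphenated acronyms first so "B.B.Q." becomes "BBQ"
    # instead of three single-letter tokens.
    collapsed = DOTTED_ACRONYM.sub(
        lambda m: re.sub(r"[.-]", "", m.group()), text
    )
    tokens = re.findall(r"[a-z]+", collapsed.lower())
    return [SYNONYMS.get(token, token) for token in tokens]


def matches(query: str, document: str) -> bool:
    """True when every normalized query term occurs in the document
    (AND semantics, similar in spirit to plainto_tsquery)."""
    return set(normalize(query)) <= set(normalize(document))
```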
When I stumbled over a similar problem, I came across the option of query rewriting; see http://www.postgresql.org/docs/8.3/static/textsearch-features.html, for example section 12.4.2.1.
This is an easier approach than tackling the dictionary, as it lets you extend your rewrite rules instantly by inserting new rules into your rewrite table.
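Query rewriting works on the query rather than the stored text: each query term that has a rule is replaced by a set of alternatives, and these are OR-ed together. In PostgreSQL the rules are rows in a table consumed by ts_rewrite. The Python sketch below only emulates that idea with a dict, and its names are hypothetical. It shows why adding a rule is just adding an entry, with no reindexing.

```python
import re

# Hypothetical rewrite table: a query term is replaced by a set of OR-ed
# alternatives, in the spirit of the (target, substitute) rows for ts_rewrite.
BARBECUE = frozenset({"barbecue", "barbeque", "bbq"})
REWRITE_RULES: dict[str, frozenset[str]] = {term: BARBECUE for term in BARBECUE}


def rewrite(query: str) -> list[frozenset[str]]:
    """Turn a query into AND-ed groups of OR-ed alternatives."""
    terms = re.findall(r"[a-z]+", query.lower())
    return [REWRITE_RULES.get(term, frozenset({term})) for term in terms]


def matches(query: str, document: str) -> bool:
    doc_terms = set(re.findall(r"[a-z]+", document.lower()))
    # Every group needs at least one alternative present in the document.
    return all(group & doc_terms for group in rewrite(query))
```

Adding a rule, for example REWRITE_RULES["grill"] = frozenset({"grill", "grille"}), takes effect on the next query. The stored documents are untouched.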

Postgres full text search across multiple related tables

This may be a very simplistic question, so apologies in advance, but I am very new to database usage.
I'd like to have Postgres run its full text search across multiple joined tables. Imagine something like a model User, with related models UserProfile and UserInfo. The search would only be for Users, but would include information from UserProfile and UserInfo.
I'm planning on using a GIN index for the search. I'm unclear, however, on whether I'm going to need a separate tsvector column in the User table to hold the aggregated tsvectors from across the tables, and to set up triggers to keep it up to date. The alternative would be an index without a tsvector column that keeps itself up to date whenever any of the relevant fields in any of the relevant tables change, if that is possible. Also, any tips on the syntax for creating all this would be much appreciated.
Your best answer is probably to have a separate tsvector column in each table (with an index on it, of course). If you aggregate the data into a shared tsvector, every update to an individual table also causes an update to that shared column.
You will need one index per table. Then, when you query, you need a condition in the WHERE clause for each field. PostgreSQL will then automatically figure out which combination of indexes gives the quickest results, likely using bitmap scans. This makes your queries a little more complex to write (since you need several matching conditions), but it keeps the flexibility to query only some of the fields when you want to.
You cannot create one index that tracks multiple tables. To do that you need the separate tsvector column and triggers on each table to update it.
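Below is a runnable sketch of that trigger approach, using SQLite from Python's standard library so it needs no server. Assumptions: the table and column names are hypothetical, and search_text is plain TEXT with LIKE standing in for what PostgreSQL would use (a tsvector column built with to_tsvector, matched with @@ and indexed with GIN). The point is the shape: one aggregated column on users, plus insert, update and delete triggers on every contributing table that rebuild it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, search_text TEXT);
CREATE TABLE user_profile (user_id INTEGER REFERENCES users(id), bio TEXT);
CREATE TABLE user_info (user_id INTEGER REFERENCES users(id), city TEXT);
"""

# Rebuilds users.search_text for one user from all three tables.
# {uid} is replaced with NEW.id, NEW.user_id or OLD.user_id per trigger.
REFRESH = """
UPDATE users SET search_text = trim(
    coalesce(username, '') || ' ' ||
    coalesce((SELECT group_concat(bio, ' ') FROM user_profile
              WHERE user_id = {uid}), '') || ' ' ||
    coalesce((SELECT group_concat(city, ' ') FROM user_info
              WHERE user_id = {uid}), ''))
WHERE id = {uid};
"""


def trigger_sql() -> str:
    parts = [
        "CREATE TRIGGER users_ai AFTER INSERT ON users BEGIN"
        + REFRESH.format(uid="NEW.id") + "END;",
        "CREATE TRIGGER users_au AFTER UPDATE OF username ON users BEGIN"
        + REFRESH.format(uid="NEW.id") + "END;",
    ]
    for table in ("user_profile", "user_info"):
        parts += [
            f"CREATE TRIGGER {table}_ai AFTER INSERT ON {table} BEGIN"
            + REFRESH.format(uid="NEW.user_id") + "END;",
            # Refresh both users in case user_id itself was changed.
            f"CREATE TRIGGER {table}_au AFTER UPDATE ON {table} BEGIN"
            + REFRESH.format(uid="OLD.user_id")
            + REFRESH.format(uid="NEW.user_id") + "END;",
            f"CREATE TRIGGER {table}_ad AFTER DELETE ON {table} BEGIN"
            + REFRESH.format(uid="OLD.user_id") + "END;",
        ]
    return "\n".join(parts)


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA + trigger_sql())
    return conn


def search(conn: sqlite3.Connection, word: str) -> list[str]:
    """Users whose aggregated text contains word (LIKE stands in for @@)."""
    rows = conn.execute(
        "SELECT username FROM users WHERE search_text LIKE ? ORDER BY id",
        (f"%{word}%",),
    )
    return [name for (name,) in rows]
```

The write cost described in the first answer is visible here: every insert, update or delete on user_profile or user_info also rewrites a row in users.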