Oracle text with multiple phrases and Boolean logic - oracle12c

Am I doing it right? I am querying records that must contain the exact phrase "buy now" and must also contain one of the following:
the exact phrase "lonely beach", or
both words "backpackers" and "camping".
SELECT *
FROM Table1
WHERE contains(Text ,'buy now') >0
AND contains(Text ,'(lonely beach) OR (backpackers and camping)')>0
Can this query be simplified to a single CONTAINS? And am I using the brackets correctly?
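The following small Python model makes the intended Boolean logic concrete. It is not Oracle Text. The tokenizer (lowercase runs of letters and digits) and the phrase rule (words must appear as consecutive tokens) are simplifying assumptions, and Oracle's lexer, stoplist and scoring are ignored. The helper names `tokens`, `has_phrase` and `matches` are hypothetical. Under the same logic, a single call would presumably be written as `contains(Text, '(buy now) AND ((lonely beach) OR (backpackers AND camping))') > 0`, with explicit brackets so that operator precedence does not matter. This form has not been verified against an Oracle 12c instance.

```python
import re

def tokens(text):
    # Simplifying assumption: words are lowercase runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def has_phrase(toks, phrase):
    # A phrase matches when its words appear as consecutive tokens.
    words = phrase.split()
    n = len(words)
    return any(toks[i:i + n] == words for i in range(len(toks) - n + 1))

def matches(text):
    # "buy now" AND ("lonely beach" OR (backpackers AND camping))
    t = tokens(text)
    return has_phrase(t, "buy now") and (
        has_phrase(t, "lonely beach")
        or (has_phrase(t, "backpackers") and has_phrase(t, "camping"))
    )

for sample in [
    "Buy now: a trip to a lonely beach",
    "Buy now! Gear for backpackers who love camping",
    "Buy now, backpackers",
]:
    print(matches(sample), sample)
```

The third sample fails because only one of the two words in the AND branch is present. Wrapping each alternative in its own brackets mirrors the grouping in the original two-CONTAINS query.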

Related

Postgresql Query - Return all matching search terms for each result row when using an ANY query and LIKE

Essentially, what I'm trying to figure out is whether there is a way to return all matching search terms, in addition to the matched row, when running a query that looks up a list of items using ANY or IN. In most cases the search term will exactly match the returned column value, but with text search or certain extensions such as IP4r this is not always the case. In addition, multiple search terms can match a single row.
To make this concrete suppose this is my query:
SELECT id, item_name, description FROM items WHERE description LIKE ANY('{%gaming%, %computer%, %socks%, %men%}');
and it returns the following two rows:
id, item_name, description
1, 'computer', 'super fast gaming computer that will help you win'
5, 'socks', 'These socks are sure to please the men in your family'
What I'd like to know is which original search terms map to the result row that was returned. In other words, I'd like the returned rows to look like this:
id, search_terms, item_name, description
1, '{%gaming%, %computer%}', 'computer', 'super fast gaming computer that will help you win'
5, '{%socks%, %men%}', 'socks', 'These socks are sure to please the men in your family'
Is there a way to efficiently do this in PostgreSQL? In the example above we're using LIKE with strings but in my real-world scenario I'm using the IP4r extension to do IP lookups against CIDR ranges where you can have multiple IP addresses in the same returned CIDR range.
I previously asked this question: PostgreSQL 9.5: Return matching search terms in each result row when using LIKE, which used a CASE statement to almost solve the problem I'm describing here.
The added complexity in the scenario above is that multiple search terms can match a single row (e.g., gaming and computer both match the description super fast gaming computer that will help you win). If you use a CASE statement, only the first match in the CASE statement gets set as the search term, and you miss any other matching search terms.
Thank you for your help!
This would be a way using VALUES:
SELECT i.id, i.item_name, i.description, m.pat
FROM items AS i
JOIN (VALUES ('%gaming%'), ('%computer%'), ('%socks%'), ('%men%')) AS m(pat)
ON i.description LIKE m.pat;
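The join above returns one row per (item, pattern) pair. To get the single-row-per-item shape asked for in the question, those pairs can be grouped by item. In PostgreSQL this would presumably be `array_agg(m.pat)` with `GROUP BY i.id`. The sketch below is not PostgreSQL. It runs the same VALUES-join idea in SQLite through Python's built-in sqlite3 module, using `group_concat` as the stand-in aggregate. One difference to keep in mind is that SQLite's LIKE is case-insensitive for ASCII by default, whereas PostgreSQL's is not. The third row and the function name `search_terms_per_row` are hypothetical.

```python
import sqlite3

ROWS = [
    (1, "computer", "super fast gaming computer that will help you win"),
    (5, "socks", "These socks are sure to please the men in your family"),
    (7, "mug", "plain ceramic mug"),  # hypothetical row that matches nothing
]
PATTERNS = ["%gaming%", "%computer%", "%socks%", "%men%"]

def search_terms_per_row(rows, patterns):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, item_name TEXT, description TEXT)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    # Build a VALUES list with one bound placeholder per pattern.
    values = ", ".join("(?)" for _ in patterns)
    sql = f"""
        WITH m(pat) AS (VALUES {values})
        SELECT i.id, group_concat(m.pat, ','), i.item_name, i.description
        FROM items AS i
        JOIN m ON i.description LIKE m.pat
        GROUP BY i.id, i.item_name, i.description
        ORDER BY i.id
    """
    result = []
    for item_id, terms, name, desc in conn.execute(sql, patterns):
        # group_concat order is unspecified, so sort for stable output.
        result.append((item_id, sorted(terms.split(",")), name, desc))
    conn.close()
    return result

for row in search_terms_per_row(ROWS, PATTERNS):
    print(row)
```

Because the inner join drops rows that match no pattern, the hypothetical row 7 does not appear in the output.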

Convert to SARGable query

I want to write a query that searches for a substring within a column of the table.
Table:
Create table tbl_sarg
(
colname varchar(100),
coladdres varchar(500)
);
Note: I just want to use an Index Seek when searching 300 million records.
Index:
create nonclustered index ncidx_colname on tbl_sarg(colname);
Sample Records:
insert into tbl_sarg values('John A Mak','HNo 102 Street Road Uk');
insert into tbl_sarg values('Shawn A Meben','Church road USA');
insert into tbl_sarg values('Lee Decose','ShopNo 22 K Mark UK');
insert into tbl_sarg values('James Don','A Mall, 90 feet road UAE');
Query 1:
select * from tbl_sarg
where colname like '%ee%'
Actual Execution Plan: [screenshot not available]
Query 2:
select * from tbl_sarg
where charindex('ee',colname)>0
Actual Execution Plan: [screenshot not available]
Query 3:
select * from tbl_sarg
where patindex('%ee%',colname)>0
Actual Execution Plan: [screenshot not available]
How can I force the query processor to use an index seek instead of a table/index scan on a large data set?
All the queries you have posted are, by definition, not SARGable. For instance, a pattern like '%..%' automatically forces the query engine to do a scan. The other case is wrapping your column in a function (such as CHARINDEX or PATINDEX) inside a predicate.
Here is a related post: https://bertwagner.com/2017/08/22/how-to-search-and-destroy-non-sargable-queries-on-your-server/
Kimberly Tripp has written very interesting articles about this. If you really must run this kind of wildcard query, it may be worth looking into the Full-Text Search feature. My point is this: either limit your queries to precise predicates, or change your strategy. Almost forgot: don't try to force a seek with a hint. I can't see that cure being better than the disease.
A search argument, or SARG in short, is a filter predicate that enables the optimizer to rely on index order. The filter predicate uses the following form (or a variant with two delimiters of a range, or with the operand positions flipped):
WHERE <column> <operator> <expression>
Such a filter is sargable if:
You don’t apply manipulation to the filtered column.
The operator identifies a consecutive range of qualifying rows in the index. That’s the case with operators like =, >, >=, <, <=, BETWEEN, LIKE with a known prefix, and so on. That’s not the case with operators like <>, LIKE with a wildcard as a prefix.
In most cases, when you apply manipulation to the filtered column, the optimizer doesn’t try to be too smart and understand the meaning of the calculation, and if index ordering can still be relied on. It simply assumes that the result values might sort differently than the source values, and therefore index ordering can’t be trusted.
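A quick way to see these rules in action without a 300-million-row SQL Server table is to ask a smaller engine for its plan. The sketch below does not use SQL Server. It uses SQLite through Python's sqlite3 module and `EXPLAIN QUERY PLAN`, so plan wording differs from SQL Server's, and SQLite has no NONCLUSTERED keyword. The known prefix 'Lee' is written as an explicit range (`>= 'Lee' AND < 'Lef'`) because SQLite only applies its own LIKE-prefix optimization under specific collation settings. The helper names `build_db` and `plan` are hypothetical.

```python
import sqlite3

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl_sarg (colname VARCHAR(100), coladdres VARCHAR(500))")
    conn.execute("CREATE INDEX ncidx_colname ON tbl_sarg(colname)")
    conn.executemany("INSERT INTO tbl_sarg VALUES (?, ?)", [
        ("John A Mak", "HNo 102 Street Road Uk"),
        ("Shawn A Meben", "Church road USA"),
        ("Lee Decose", "ShopNo 22 K Mark UK"),
        ("James Don", "A Mall, 90 feet road UAE"),
    ])
    return conn

def plan(conn, where):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM tbl_sarg WHERE " + where).fetchall()
    return " | ".join(str(r[-1]) for r in rows)

conn = build_db()
# Known prefix expressed as a range on the bare column: the index order can be used.
prefix_plan = plan(conn, "colname >= 'Lee' AND colname < 'Lef'")
# Leading wildcard, or a function applied to the column: every row must be checked.
like_plan = plan(conn, "colname LIKE '%ee%'")
instr_plan = plan(conn, "instr(colname, 'ee') > 0")
for p in (prefix_plan, like_plan, instr_plan):
    print(p)
```

Only the range predicate reports an index search. The '%ee%' and instr() variants report a full scan, which matches the quoted rules.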
So why doesn’t SQL Server use the index for the %ee% query? Pretend for a moment that you held a phone book in your hand, and I asked you to find everyone whose last name contains the letters %ee%. You would have to scan every single page in the phone book, because the results would include things like:
Anne Lee
Lee Yung
Kathlee
Aleen
When I asked you for all last names containing %ee% anywhere in the name, my query was not sargable – meaning, you couldn’t leverage the indexes to do an index seek.
That’s where SQL Server’s Full Text Search comes in.

How to filter out data with special character in vectorwise

There is a column named Prod_code in my product table.
I need to select only valid prod_code from product table and load it into another table.
Valid prod_code values are codes that don't contain any special characters.
VALID prod_code: WER1234, ASD1345
INVALID prod_code: ABC$123, LPS????, $$$ (these are the ones I need to detect and filter out).
How can I achieve this?
Source list of prod_code values:
WER1234
ASD1345
ABC$123
LPS????
$$$
Target list of prod_code values:
WER1234
ASD1345
This is pretty hideous but it does the job. It uses the LIKE predicate to reject unacceptable strings. You can probably also use the SIMILAR TO predicate to specify acceptable strings but I would guess that LIKE is faster.
select prod_code from products
where prod_code not like '%\['+x'00'+'-'+x'2f'+','+
x'3a'+'-'+x'40'+','+
x'5d'+'-'+x'7f'+'\]%' escape '\'
and prod_code not like '%\%'
and prod_code not like '%[%'
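For ASCII input, the LIKE ranges above read to me as "reject anything that is not a digit or an uppercase letter". The characters [ and \ are excluded by the two extra NOT LIKE lines. The same rule is much easier to state as a positive pattern. The Python sketch below checks that rule against the sample data. It is not Vectorwise syntax. The interpretation of the answer's ranges, the requirement of at least one character, and the names `VALID` and `valid_codes` are assumptions.

```python
import re

# Assumption: a valid code is one or more ASCII uppercase letters or digits.
VALID = re.compile(r"[A-Z0-9]+")

def valid_codes(codes):
    # fullmatch requires the whole string to match, not just a prefix.
    return [c for c in codes if VALID.fullmatch(c)]

SRC = ["WER1234", "ASD1345", "ABC$123", "LPS????", "$$$"]
print(valid_codes(SRC))
```

A positive pattern like this is roughly what the answer means by using SIMILAR TO to specify acceptable strings. Whether that beats the LIKE version in Vectorwise would need measuring.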

Return rows where words match in two columns OR in match in one column and the other column is empty?

This is a follow-up to another question I recently asked.
I currently have a SphinxQL query like this:
SELECT * FROM my_index
WHERE MATCH('@field1 "a few words"/1 @field2 "more text here"/1')
However, I would still like it to match rows in the case where one of the fields in the row is empty.
For example, let's say the following rows exist in the database:
field1 | field2
-----------------------
words in here | text in here
| text in here
The above query would match the first row, but it would not match the second row because the quorum operator specifies that there has to be one or more matches for each field.
Is what I'm asking possible?
The actual query I'm trying to make this work with was provided in Barry Hunter's answer to my previous question:
sphinxQL> SELECT *, WEIGHT() AS w FROM index
WHERE MATCH('@tags "cute hairy happy"/1 @tags2 "one two thee"/1') AND w = 2
OPTION ranker=expr('SUM(IF(word_count>=IF(user_weight=2,tags2_len,tags_len),1,0))'),
field_weights=(tags=1,tags2=2);
The first problem is that Sphinx doesn't index "empty", so you can't search for it. (Well, actually the field_len attribute will be zero, but it can be hard to combine an attribute filter with MATCH().)
... so arrange for an empty field to contain something that can be indexed:
sql_query = SELECT id,...,IF(tags='','_empty_',tags) AS tags FROM ...
Then modify the query. As it happens, your quorum search is easy!
@field1 "a few words _empty_"/1
It's just another word. A more complex query would simply need to be OR'ed with the word.
Then there is making it work within your complex query. As luck would have it, that's really easy too: _empty_ is just another word. When the field is empty, one word will match (i.e. it is the field that has no words, not the query).
So just add _empty_ to both quorums and you're done!
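The placeholder trick can be checked with a toy model of the quorum operator, where `"w1 w2 w3"/1` means that at least one of the listed words occurs in the field. This is a simplified Python sketch, not Sphinx. Tokenizing is plain whitespace splitting, and the ranker expression and `w = 2` filter from the full query are ignored. The helper names `index_field`, `quorum` and `row_matches` are hypothetical.

```python
EMPTY = "_empty_"

def index_field(text, placeholder=True):
    # Mirrors IF(tags='','_empty_',tags): an empty field is indexed as one placeholder word.
    tokens = set(text.lower().split())
    if not tokens and placeholder:
        tokens = {EMPTY}
    return tokens

def quorum(field_tokens, words, n=1):
    # "w1 w2 ..."/n matches when at least n of the listed words occur in the field.
    return sum(1 for w in words.lower().split() if w in field_tokens) >= n

def row_matches(field1, field2, placeholder=True):
    # Add _empty_ to both quorums only when the placeholder scheme is in use.
    extra = " " + EMPTY if placeholder else ""
    return (quorum(index_field(field1, placeholder), "a few words" + extra)
            and quorum(index_field(field2, placeholder), "more text here" + extra))

print(row_matches("words in here", "text in here"))            # both fields have hits
print(row_matches("", "text in here"))                         # empty field1 matches via _empty_
print(row_matches("", "text in here", placeholder=False))      # original behaviour: no match
```

Without the placeholder, the empty field contributes zero matching words, so its quorum can never be satisfied. With it, the empty field contributes exactly one matching word.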

Postgresql full text search on really short documents (filename)

I have a database of filenames that I'm trying to search using PostgreSQL's full-text search facility. I'm running the search query on a table of filenames; the problem is that the ranking functions don't rank the results the way I'd like. For the sake of argument, let's assume the schema looks like this:
create table files (
id serial primary key,
filename text,
filename_ft tsvector
);
The query that I run looks something like this:
select filename, ts_rank(filename_ft, query) as rank
from files, to_tsquery('simple', 'a|b|c') as query
where query @@ filename_ft
order by rank desc limit 5;
This will return the 5 results with the highest rank. However, the search queries come from another process, and in most cases they contain some 'garbage'. For instance, a query for 'a xxxx' might be executed, where xxxx is just a bunch of other terms. In most cases this still returns the correct results, because the suffix is simply not in the database.
However, sometimes a query contains extraneous information that throws off the ranking function. For instance, a query for 'a b c' returns a filename containing the tokens 'b c' as the first result and an exact match on 'a' as the second. My guess is that this is because the first result contains a larger percentage of the actual search tokens.
In most cases (if not all) the most important token appears as the first token in the query, so my question is, is there a way to give the tokens in the query a weight?
is there a way to give the tokens in the query a weight?
Yes, there is. See the documentation; search for "weight".
Whether assigning weights is the right choice is another matter. It sounds to me like you really want to exclude some of the data from the inputs to to_tsvector in index creation and searching, so you just don't include that garbage in the index.
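As far as I can tell, PostgreSQL's documented weights are attached to lexemes in the document tsvector rather than to positions in the query. These mechanisms are setweight, labels such as 'a:A' in a tsquery, and the weights array accepted by ts_rank. They do not directly express "the first query token matters most". One swapped-in alternative, which is not a PostgreSQL feature, is to fetch candidates with the @@ query and then re-rank them in the application. Each matched query token would contribute a weight that decays with its position in the query. The sketch below assumes a decay of 0.5 per position and a tokenizer that splits filenames on non-alphanumeric characters. Both choices, and the names `score_filename` and `rerank`, are hypothetical.

```python
import re

def score_filename(query, filename, decay=0.5):
    # Hypothetical scheme: the i-th query token is worth decay ** i if present.
    q = query.lower().split()
    toks = set(re.findall(r"[a-z0-9]+", filename.lower()))
    return sum(decay ** i for i, tok in enumerate(q) if tok in toks)

def rerank(query, filenames, decay=0.5):
    # Highest score first; ties keep their original (e.g. ts_rank) order.
    return sorted(filenames, key=lambda f: score_filename(query, f, decay), reverse=True)

print(rerank("a b c", ["b_c.txt", "a.txt"]))
```

With these weights, matching only the first token scores 1.0, while matching the second and third scores 0.5 + 0.25 = 0.75. The exact match on 'a' therefore moves ahead of the 'b c' filename.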