I'm learning about Sphinx with the book "Introduction to Search with Sphinx" and the first query with the test data:
mysql> select * from test1 where match('test');
should, according to the book, produce output that includes a weight column, but in the output I am getting there is no weight column.
Why is the weight column missing from my result?
SphinxQL used to do a number of things 'magically'; one of them was to automatically add columns depending on the query. But it was decided that this magic behaviour often confused programs (they couldn't be sure whether the column would be there or not), so now columns are only there if you explicitly ask for them (although you can still use * for id + attributes):
mysql> select *,weight() as weight from test1 where match('test');
I am working on the lab test values mapping (MEASUREMENT table of the OMOP CDM).
My local mapping table (handmade) has my measurement name (in French) and the associated LOINC code.
The LOINC vocabulary has been loaded from Athena (OHDSI community tool) https://athena.ohdsi.org/search-terms/
I load my local concepts into the CONCEPT table, then use an SQL query to associate the equivalent LOINC concept_id (from concept_code mapping/LOINC source codes).
I realised that the join is not being made on the LOINC concept_code.
Indeed, when I filter the CONCEPT table on a LOINC concept_code (e.g. 34714-6), I find no result:
select *
from omop.concept
where concept_code in ('34714-6');
When I filter on the corresponding concept_id (3032080) I find the result with the desired concept_code.
select *
from omop.concept
where concept_id in ('3032080');
I have tested concept_code like '34714__' which returns the expected line.
This is not an encoding issue, because when I copy/paste the concept_code returned when filtering on concept_id = '3032080' into my query (concept_code in ('34714-6')), I get the same problem.
However other LOINC codes work:
select *
from omop.concept
where concept_code in ('14646-4');
When I check which symbol exactly is being used:
select ASCII(substr(concept_code,1,1))
,ASCII(substr(concept_code,2,1))
,ASCII(substr(concept_code,3,1))
,ASCII(substr(concept_code,4,1))
,ASCII(substr(concept_code,5,1))
,ASCII(substr(concept_code,6,1))
,ASCII(substr(concept_code,7,1))
from omop.concept
where concept_id = 3032080 ;
I also checked/removed the whitespaces.
The same process works on drugs (concept_code from ATC).
Can you tell me where this error comes from?
Thank you for your help.
Please check that you're using the latest SQL client and JDBC driver versions.
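If that doesn't change anything, dumping the stored code as raw bytes makes any hidden or look-alike character visible at once. A minimal diagnostic sketch, assuming a PostgreSQL-style dialect (adapt the functions to your RDBMS):
-- compare character count with byte count and show the raw bytes
select concept_code,
       length(concept_code)                            as char_len,
       octet_length(concept_code)                      as byte_len,
       encode(convert_to(concept_code, 'UTF8'), 'hex') as hex_dump
from omop.concept
where concept_id = 3032080;
A plain ASCII hyphen shows up as 2d in the hex dump; any other byte sequence in that position (a non-breaking hyphen, for instance) would explain why the equality comparison fails.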
I am writing dynamic SQL code and it would be easier to use a generic where column in (<comma-separated values>) clause, even when the clause might have only 1 term (it will never have 0).
So, does this query:
select * from table where column in (value1)
have any different performance than
select * from table where column=value1
?
All my tests result in the same execution plans, but if there is some knowledge/documentation that sets it in stone, it would be helpful.
This might not hold true for each and every RDBMS, or for each and every query with its specific circumstances.
The engine will translate WHERE id IN(1,2,3) to WHERE id=1 OR id=2 OR id=3.
So your two ways to articulate the predicate will (probably) lead to exactly the same interpretation.
As always: we should not really worry about the way the engine "thinks"; that part was done pretty well by the developers :-) We tell it, through a statement, what we want to get, not how we want to get it.
Some more details here, especially the first part.
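If you want something more tangible than "probably", compare the plans yourself. A minimal sketch in PostgreSQL syntax (my_table is a placeholder name):
-- a single-value IN list is normalised to a plain equality,
-- so both statements should produce the same plan
EXPLAIN SELECT * FROM my_table WHERE id IN (1);
EXPLAIN SELECT * FROM my_table WHERE id = 1;
In PostgreSQL the first plan typically even prints its filter as id = 1, i.e. the rewrite described above has already happened before execution.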
I think this will depend on the platform you are using (the optimizer of the given SQL engine).
I did a little test using MySQL Server:
When I query select * from table where id = 1; I get 1 row, and the query took 0.0043 seconds.
When I query select * from table where id IN (1); I get 1 row, and the query took 0.0039 seconds.
I know this depends on the server, the PC and so on, but the results are very close.
Note that the claim you sometimes read, that IN is non-sargable (not search-ARGument-able) and cannot use an index while = can, does not apply to a literal value list like this: the list is expanded to equality comparisons and can use the index just like =.
If you want to know which one is best for you, test them in your environment; they both work well.
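A minimal MySQL sketch to check the index behaviour yourself (table and index names are placeholders):
-- both forms should report the same index in the "key" column of EXPLAIN
CREATE INDEX idx_id ON my_table (id);
EXPLAIN SELECT * FROM my_table WHERE id = 1;
EXPLAIN SELECT * FROM my_table WHERE id IN (1);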
I had what I thought was a fairly straightforward SphinxQL query, but it turns out # variables are deprecated (see example below)
SELECT *,#weight AS m FROM test1 WHERE MATCH('tennis') ORDER BY m DESC LIMIT 0,1000 OPTION ranker=bm25, max_matches=3000, field_weights=(title=10, content=5);
I feel like there must be a way to sort the results by strength of match. What is the replacement?
On another note, what if I want to include in it a devaluation if certain other words appear. For example, let's say I wanted to devalue results that had the word "apparel" in them. Could that be executed in the same query?
Thanks!
Well, results are by default sorted by weight descending, so just do...
SELECT * FROM test1 WHERE MATCH('tennis') LIMIT 0,1000 OPTION ...
But otherwise, it's just that the # variables are replaced by 'functions', mainly because it's more 'SQL like'. So #weight is WEIGHT():
SELECT * FROM test1 WHERE MATCH('tennis') ORDER BY WEIGHT() DESC ...
or
SELECT *,WEIGHT() AS m FROM test1 WHERE MATCH('tennis') ORDER BY m DESC ...
For reference, #group is instead GROUPBY(), #count is COUNT(*), #distinct is COUNT(DISTINCT ...), #geodist is GEODIST(...), and #expr doesn't really have an equivalent; either just use the expression directly, or use your own custom named alias.
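For example, an old #group/#count style query could now be written like this (a hedged sketch; the index name idx and the grouping attribute tag are placeholders, not from your setup):
SELECT *, GROUPBY() AS tag_val, COUNT(*) AS cnt
FROM idx
WHERE MATCH('tennis')
GROUP BY tag
ORDER BY cnt DESC;
The aliases (tag_val, cnt) are there because SphinxQL wants expressions named in the SELECT list if you want to sort or filter on them.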
As for the second question: it's kinda tricky, there isn't really a 'negative' weighter. There is a keyword boost operator, but as far as I can tell you can't use it to specifically devalue.
The only way I can think of that might work is if the negative match were made against a specific field, so you could build a complex ranking expression. Basically, to get a negative weight you would need a dedicated field that the ranking expression can single out:
... MATCH('#!(negative) tennis #negative apparel')
... OPTION ranker=expr('SUM(word_count*IF(user_weight=99,-1,1))'), field_weights=(negative=99)
That's a very basic demo expression for illustrative purposes; a real one would probably be a lot more complex. It just shows 99 being used as a marker so that matches in the 'negative' field get multiplied by -1.
You would need to create the new negative field, which could just be a duplicate of the other field(s).
The program just selects everything if the CARRID is OK, even if it does not match lt_spfli. And if there aren't any entries with that CARRID, it gets a runtime error. If I try it with FOR ALL ENTRIES, it just selects absolutely all of SFLIGHT.
PARAMETERS: pa_airp TYPE S_FROMAIRP,
pa_carid TYPE S_CARR_ID.
DATA: lt_spfli TYPE RANGE OF SPFLI,
lt_sflight TYPE TABLE OF SFLIGHT.
SELECT CONNID FROM SPFLI
INTO TABLE lt_spfli
WHERE AIRPFROM = pa_airp.
SELECT * FROM SFLIGHT
INTO TABLE lt_sflight
WHERE CARRID = pa_carid AND CONNID in lt_spfli.
I just suppose that you want every flight connection from a given airport...
Notice that a RANGE structure has two more fields in front of the actual "compare value", so selecting directly into it will result in a rather gibberish table.
Possible Solutions:
Selecting with RANGE
If you really want to use this temporary table, you can have a look at my answer here, where I describe a way to fill RANGEs without any overhead. After this step, your current snippet will work the way you wanted it to. Just make sure that it really has been filled, or everything will be selected.
Selecting with FOR ALL ENTRIES
Before you use this variant you should make absolutely sure that the specified data object is filled. Otherwise it will result in the same mess as the first solution. To do that, you could write:
* select connid
IF lt_spfli[] IS NOT INITIAL.
* select on SFLIGHT
ELSE.
* no result
ENDIF.
Selecting with JOIN
The "correct" approach in this case would be a JOIN like:
SELECT t~*
FROM spfli AS i
JOIN sflight AS t
ON t~carrid = @pa_carid
AND t~connid = i~connid
INTO TABLE @DATA(li_conns)
WHERE i~airpfrom = @pa_airp.
Use a FOR ALL ENTRIES instead of CONNID in lt_SPFLI.
As so:
SELECT *
FROM sflight
INTO TABLE lt_sflight
FOR ALL ENTRIES IN lt_spfli
WHERE carrid = pa_carid
AND connid = lt_spfli-connid.
You are misunderstanding what a "Ranges Table" is. You fill it incorrectly.
This part of your code demonstrates the misunderstanding (with a little debug, you would see the erroneous contents immediately):
DATA: lt_spfli TYPE RANGE OF SPFLI.
SELECT CONNID FROM SPFLI INTO TABLE lt_spfli ...
A "Ranges Table" is an internal table with 4 components (SIGN, OPTION, LOW, HIGH), used in Open SQL to do complex selections on one database column (NB: it can also be used in several ABAP statements to test the value of an ABAP variable).
So, with your SQL statement, you only initialize the first component of the Ranges table, while you should transfer CONNID into the third component.
In "modern" Open SQL, you'd better do:
SELECT 'I' as SIGN, 'EQ' as OPTION, CONNID as LOW FROM SPFLI INTO TABLE @lt_spfli ...
For more information about Ranges Tables, you may refer to the answer here: What actually high and low means in a ranges table
I have a database of filenames which I'm trying to search using PostgreSQL's full text search facility. I'm running the search query on a table of filenames; the problem is that the ranking functions are not ranking the results as I'd like them to. For the sake of argument, let's assume the schema looks like this:
create table files (
id serial primary key,
filename text,
filename_ft tsvector
);
The query that I run looks something like this:
select filename, ts_rank(filename_ft, query) as rank
from files, to_tsquery('simple', 'a|b|c') as query
where query @@ filename_ft
order by rank desc limit 5;
This will return the 5 results with the highest rank. However, those search queries come from another process, and in most cases the queries have some 'garbage' in them. For instance, a query for 'a xxxx' might be executed, where xxxx is just a bunch of other terms. In most cases this still returns the correct results, because the garbage terms are simply not in the database.
However, sometimes a query contains extraneous information that screws with the ranking function. For instance, a query for 'a b c' will return a filename containing the tokens 'b c' as the first result, and an exact match on 'a' as the second result; my guess is that this is because the first result contains a larger percentage of the actual search tokens.
In most cases (if not all) the most important token appears as the first token in the query, so my question is, is there a way to give the tokens in the query a weight?
is there a way to give the tokens in the query a weight?
Yes, there is. See the documentation; search for "weight".
Whether assigning weights is the right choice is another matter. It sounds to me like you really want to exclude some of the data from the inputs to to_tsvector in index creation and searching, so you just don't include that garbage in the index.
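For reference, a minimal sketch of the weight mechanism the documentation describes (the labelling scheme below is an assumption for illustration, not taken from your data): lexemes are labelled with setweight() when the tsvector is built, and ts_rank() can then be given a custom {D,C,B,A} weight array so that A-labelled matches dominate the ranking.
-- label the first word of the filename as weight A, the full filename as D
update files
set filename_ft =
    setweight(to_tsvector('simple', split_part(filename, ' ', 1)), 'A') ||
    setweight(to_tsvector('simple', filename), 'D');
-- the weight array is {D, C, B, A}; here A-matches count ten times as much
select filename,
       ts_rank('{0.1, 0.1, 0.1, 1.0}', filename_ft, query) as rank
from files, to_tsquery('simple', 'a|b|c') as query
where query @@ filename_ft
order by rank desc
limit 5;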