Sphinx not returning results in some cases

I'm using extended2 match mode. Here is an example:
Keyword 'méz': 0 results
Keyword 'mézes': 4657 results
The charset is UTF-8, and min_word_len is 3.
Any suggestions?

Related

PostgreSQL 15 - syntax sensitivity

I've noticed that in older versions of PG (for example, 13), when I had a query like:
select 1 where 1=1and 2=2
everything was OK,
but when I try this in PG 15, I get the error: trailing junk after numeric literal at or near "1a"
Has something changed, or is there perhaps a new configuration option that makes it stricter?
This was changed in version 15.0.
From the release notes:
Prevent numeric literals from having non-numeric trailing characters (Peter Eisentraut)
Previously, query text like 123abc would be interpreted as 123 followed by a separate token abc.
and, similarly:
Adjust JSON numeric literal processing to match the SQL/JSON-standard (Peter Eisentraut)
This accepts numeric formats like .1 and 1., and disallows trailing junk after numeric literals, like 1.type().
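The first release-note item can be illustrated with a toy lexer in Python. This is a simplified model, not PostgreSQL's actual scanner: it only knows integers, ASCII identifiers and single-character operators, and the `strict` parameter exists only in this sketch to switch between the old behavior and the 15.0 behavior.

```python
import re

# Toy lexer: integers, ASCII identifiers and single-character operators only.
NUMBER = re.compile(r"[0-9]+")
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def lex(sql, strict):
    """Split `sql` into tokens. With strict=True, a numeric literal directly
    followed by a letter or underscore is rejected, mimicking PostgreSQL 15."""
    tokens, i = [], 0
    while i < len(sql):
        if sql[i].isspace():
            i += 1
        elif m := NUMBER.match(sql, i):
            end = m.end()
            if strict and IDENT.match(sql, end):
                # Report the literal plus the first junk character, e.g. "1a".
                raise ValueError(
                    f'trailing junk after numeric literal at or near "{sql[i:end + 1]}"')
            tokens.append(m.group())  # old behavior: '1and' -> '1', 'and'
            i = end
        elif m := IDENT.match(sql, i):
            tokens.append(m.group())
            i = m.end()
        else:
            tokens.append(sql[i])  # operators such as "="
            i += 1
    return tokens


query = "select 1 where 1=1and 2=2"
print(lex(query, strict=False))  # ['select', '1', 'where', '1', '=', '1', 'and', '2', '=', '2']
try:
    lex(query, strict=True)
except ValueError as err:
    print(err)  # trailing junk after numeric literal at or near "1a"
print(lex("select 1 where 1=1 and 2=2", strict=True))  # accepted once the space is added
```

Under the stricter rule, the fix for the query in the question is simply to separate the literal from the keyword, as in 1=1 and 2=2.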

Regex query in DB2 LUW

I need a regex query to match any string containing given characters. For example, I tried:
SELECT wt.CHGUSER FROM "CDB"."WTBALL" wt where REGEXP_LIKE (wt.CHGUSER, '^\d*115*$');
I expect it to fetch all the strings having 115 somewhere in between each string. I tried many combinations, but I am getting an empty column or weird combinations.
Are you sure you need a regex? You write "all the strings having 115 somewhere in between each string", but you test for an all-digit string with "115" somewhere...
By the way, this could also be done without a regex:
WHERE LOCATE('115', wt.CHGUSER) > 0
AND TRANSLATE(wt.CHGUSER, '', '0123456789') = '' -- only if you really want an all-digit string: digits become blanks, which compare equal to ''
Why not use the native LIKE predicate?
where wt.CHGUSER like '%115%'
This will give different results than your regexp, because your expression only matches strings made up entirely of digits (and '115*' means "11" followed by zero or more 5s, not "115" followed by anything). A more generic regexp, which matches your question, would be '.*115.*'.
What about:
REGEXP_LIKE (wt.CHGUSER, '^\d*115\d*$');
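To see how the patterns discussed in this thread differ, here is a sketch using Python's re module on a few hypothetical CHGUSER values. It assumes that, for the simple constructs used here (^, $, ., \d, *), Python's re behaves like the regex engine behind DB2's REGEXP_LIKE, and that REGEXP_LIKE, like re.search, succeeds when the pattern matches anywhere in the value unless it is anchored.

```python
import re

# Hypothetical CHGUSER values, chosen to show where the patterns disagree.
SAMPLES = ["115", "9115", "11555", "11", "A115B", "123"]


def regexp_like(values, pattern):
    """Rough stand-in for REGEXP_LIKE: keep values where the pattern matches
    anywhere (use ^ and $ to force a whole-string match)."""
    rx = re.compile(pattern)
    return [v for v in values if rx.search(v)]


def like_contains(values, needle):
    """Equivalent of WHERE col LIKE '%<needle>%' for a needle without wildcards."""
    return [v for v in values if needle in v]


print(regexp_like(SAMPLES, r"^\d*115*$"))    # original: digits, then "11", then zero or more "5"
print(regexp_like(SAMPLES, r"^\d*115\d*$"))  # all-digit strings containing "115"
print(regexp_like(SAMPLES, r".*115.*"))      # any string containing "115"
print(like_contains(SAMPLES, "115"))         # same result as LIKE '%115%'
```

Note that the original pattern accepts "11" (no "115" at all), while the generic pattern and LIKE '%115%' are the only ones that also accept "A115B".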

Does InterSystems Caché have a wildcard to search global nodes?

Sometimes I want to search for characters with a wildcard. I don't want to scan all the global nodes to find specific characters, so I want to know whether there is a wildcard I can use to match specific characters in global nodes, e.g. to find ^G("abc") in ^G with ^G("*s*").
There is no way to do this using the low-level $ORDER/$QUERY functions, as kazamatzuri correctly said, but you can use the %Library.Global:Get class query: its first parameter is the namespace and its second parameter is a pattern string. The pattern syntax is documented in the class itself and here: https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=GGBL_managing#GGBL_managing_view
Here is an example using a CALL statement. Let's assume we want to find all nodes of the ^%SYS global in the USER namespace starting with "D":
DEV>d $system.SQL.Shell()
SQL Command Line Shell
----------------------------------------------------
The command prefix is currently set to: <<nothing>>.
Enter <command>, 'q' to quit, '?' for help.
[SQL]DEV>>call %Library.Global_Get('USER','^%SYS("D":"E"')
1. call %Library.Global_Get('USER','^%SYS("D":"E"')
Dumping result #1
Name Value Name Format Value Format Permissions
^%SYS("DBRefByName","CONFIG-ANALYTICS") ^^f:\trakcare\config\db\analytics\ 1
^%SYS("DBRefByName","CONFIG-APPSYS") ^^f:\trakcare\config\db\appsys\ 1 1
^%SYS("DBRefByName","CONFIG-AUDIT0") ^^f:\trakcare\config\db\audit0\ 1 1
^%SYS("DBRefByName","CONFIG-AUDIT1") ^^f:\trakcare\config\db\audit1\ 1 1
^%SYS("DBRefByName","CONFIG-AUDIT2") ^^f:\trakcare\config\db\audit2\ 1 1
No.
You'll have to implement that yourself using $ORDER or $QUERY. There are pattern-matching (the ? operator) and regex ($MATCH) utilities, though.
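The do-it-yourself $ORDER approach can be sketched in Python, as a model only. ^G is represented as a dict of first-level subscripts, `order` is a hypothetical stand-in for $ORDER that uses plain string sorting (real Caché collation differs, e.g. canonical numbers sort first), and the wildcard is matched with fnmatch rather than ObjectScript's ? operator or $MATCH.

```python
import bisect
import fnmatch

# Hypothetical model of ^G: first-level subscript -> value.
G = {"abc": 1, "bsd": 2, "xyz": 3, "sss": 4}


def order(glob, sub):
    """Stand-in for $ORDER(^G(sub)): the next subscript after `sub`,
    or "" when there is none. Plain string sorting is a simplification."""
    keys = sorted(glob)
    i = bisect.bisect_right(keys, sub)
    return keys[i] if i < len(keys) else ""


def wildcard_search(glob, pattern):
    """Walk every subscript with order() and keep those matching a
    shell-style wildcard such as "*s*"."""
    found = []
    sub = order(glob, "")  # "" starts the walk at the first subscript
    while sub != "":
        if fnmatch.fnmatchcase(sub, pattern):
            found.append(sub)
        sub = order(glob, sub)
    return found


print(wildcard_search(G, "*s*"))  # ['bsd', 'sss']
```

This still visits every node, which is the point of the answer: $ORDER and $QUERY have no way to skip non-matching subscripts, so the filtering happens in your own loop.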
Cheers!

Wildcard searching between words with CRC mode in Sphinx

I use Sphinx in CRC mode with min_infix_len = 1, and I want to use wildcards between the characters of a keyword. Assume I have data like this in my index files:
name
-------
mickel
mick
mickol
mickil
micknil
nickol
nickal
When I search for all records whose name starts with 'mick' and ends with 'l':
select * from all where match ('mick*l')
I expect the results to be:
name
-------
mickel
mickol
mickil
micknil
but nothing is returned. How can I do that?
I know that I can do this in dict=keywords mode, but I have to use CRC mode for other reasons.
I also tried the '^' and '$' operators, and they didn't work.
You can't use 'middle' wildcards with CRC. That's one of the reasons for dict=keywords: the wildcards it supports are much more flexible.
With CRC, the indexer 'precomputes' all the wildcard combinations and injects them into the index as separate keywords. For example, with mickel as a document word and min_prefix_len=1, the indexer will create the words:
mickel
mickel*
micke*
mick*
mic*
mi*
m*
... as words in the index, so all the combinations can match. With min_infix_len, it also has to generate all the combinations at the start as well (so roughly (word_length)^2 + 1 combinations).
... If it had to precompute all the combinations for wildcards in the middle, there would be a lot more again, particularly if it then allowed middle AND start/end wildcards together.
Having said that, you can rewrite
select * from all where match ('mick*l')
as
select * from all where match ('mick* *l')
because with min_infix_len, the start and end will be indexed as separate words; you just need to insist that both match (although I can't think how to make them both match the same word!).
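The precompute-and-lookup idea can be sketched in Python. This is a simplified model, not Sphinx's actual indexer: `crc_tokens` stores the word plus every 'prefix*' form (as in the mickel list above) and, optionally, every '*suffix' form as a hypothetical stand-in for what min_infix_len adds. A query term matches only if it is literally one of the stored keywords, so a middle wildcard like 'mick*l' never matches.

```python
def crc_tokens(word, min_prefix_len=1, with_suffixes=False):
    """Keywords stored for one document word in this simplified CRC model:
    the word itself, every 'prefix*' form and, optionally, every '*suffix' form."""
    tokens = {word}
    for n in range(min_prefix_len, len(word) + 1):
        tokens.add(word[:n] + "*")
        if with_suffixes:
            tokens.add("*" + word[-n:])
    return tokens


def match(doc_words, query_terms, with_suffixes=True):
    """A document matches if every query term is one of its stored keywords."""
    index = set()
    for w in doc_words:
        index |= crc_tokens(w, with_suffixes=with_suffixes)
    return all(term in index for term in query_terms)


names = ["mickel", "mick", "mickol", "mickil", "micknil", "nickol", "nickal"]
print(sorted(crc_tokens("mickel")))
print([n for n in names if match([n], ["mick*l"])])         # no stored keyword 'mick*l'
print([n for n in names if match([n], ["mick*", "*l"])])    # the rewritten query
print(match(["mick", "paul"], ["mick*", "*l"]))             # True: terms hit different words
```

The last line shows the caveat from the answer: in a document with several words, 'mick*' and '*l' can be satisfied by two different words.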

Indexing Euro (€) and Lb (£) in Sphinx

These don't seem to index, even when I explicitly add them to my charset_table:
charset_table=... U+20AC->U+20AC, U+00A3->U+00A3
I even tried mapping them to the dollar sign:
U+0024->U+0024, U+20AC->U+0024, U+00A3->U+0024
Yet in each case they are not recognized. In other words, MATCH('£1000') will not find 'cost is £1000', and if I map them to $ as in the second example, MATCH('$1000') will not find it either.
However, if I do a MySQL search with where field like '%£%', I do get records, which leads me to believe MySQL is encoding UTF-8 correctly. That means the pound and euro characters are being stored correctly in MySQL, but the Sphinx index does not recognize them, even after I explicitly add their Unicode code points to my charset_table.
Relevant portion of config:
min_stemming_len = 1
stopword_step = 0
html_strip = 1
min_word_len = 1
min_infix_len = 0
index_zones = title,description
charset_type = utf8mb4_unicode_ci
charset_table = 0..9, A..Z->a..z, _, a..z, U+0026->U+0026, U+0027->U+0027, U+002E->U+002E, U+002D->U+002D, U+2014->U+002D#, U+2019->U+0027, U+0024->U+0024, U+20AC->U+0024, U+00A3->U+0024
Confirmed that the table/column is using utf8mb4_unicode_ci
Confirmed I can do a MySQL search on the euro sign: Where Title like '%€%'
Confirmed I cannot find same record with SphinxQL: where MATCH('€')
There are three things you should check:
First, look at this question to check your MySQL character encoding.
Second, look in your Sphinx config to check that charset_type matches it.
Lastly, remember that after any change to charset_type or charset_table you need to rebuild your indexes.
If none of the above helps, you could post your Sphinx config here, which might give further clues as to the problem.
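To illustrate why the rebuild matters, here is a toy model of charset_table tokenization in Python. It assumes Sphinx's documented rule that characters not listed in charset_table act as separators, and it parses only a small subset of the charset_table syntax (single characters, U+XXXX codes, ranges and -> mappings); real Sphinx tokenization involves many more settings. An index built with the old table never contains the mapped token, so a query tokenized with the new table cannot find it.

```python
def parse_charset_table(spec):
    """Parse a small subset of charset_table syntax into {char: folded_char}.
    Handles 'x', 'U+XXXX', 'a..z', 'A..Z->a..z' and 'src->dst' entries only."""
    def char(tok):
        tok = tok.strip()
        return chr(int(tok[2:], 16)) if tok[:2].upper() == "U+" else tok

    table = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        src, _, dst = entry.partition("->")
        if ".." in src:
            lo, hi = (char(t) for t in src.split(".."))
            start = char(dst.split("..")[0]) if dst else lo
            for k in range(ord(hi) - ord(lo) + 1):
                table[chr(ord(lo) + k)] = chr(ord(start) + k)
        else:
            table[char(src)] = char(dst) if dst else char(src)
    return table


def tokenize(text, table):
    """Characters missing from the table act as separators; others are folded."""
    tokens, current = [], []
    for c in text:
        if c in table:
            current.append(table[c])
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


OLD = parse_charset_table("0..9, A..Z->a..z, a..z")
NEW = parse_charset_table("0..9, A..Z->a..z, a..z, U+0024, U+20AC->U+0024, U+00A3->U+0024")

doc = "Cost is £1000"
print(tokenize(doc, OLD))  # '£' is a separator here, so only '1000' is indexed
print(tokenize(doc, NEW))  # '£' folds to '$', giving '$1000'
stale_index = set(tokenize(doc, OLD))  # index built before the charset_table change
print(all(t in stale_index for t in tokenize("£1000", NEW)))  # False until rebuilt
```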