Sphinx search normalized words

I am testing keywords with SphinxQL:
call keywords ('azori', 'test', 1);
and I get these results:
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
|    1 | azori     | a260       | 1550 | 1551 |
+------+-----------+------------+------+------+
1 row in set (0.00 sec)
What does a260 mean?

That looks like a Soundex representation:
https://en.wikipedia.org/wiki/Soundex
It seems you have morphology = soundex enabled on that particular index:
http://sphinxsearch.com/docs/current.html#conf-morphology
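For reference, morphology is set per index in sphinx.conf. A minimal sketch (the source name and path below are placeholders; test is the index name used in the CALL KEYWORDS call above):

index test
{
    source     = src_test                # placeholder source name
    path       = /var/lib/sphinx/test    # placeholder path
    morphology = soundex                 # produces normalized forms like "a260" for "azori"
}

Dropping the morphology line (the default is none) and rebuilding the index would make the normalized column show the plain lowercased token instead.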

Output of Show Meta in SphinxQL

I am trying to check whether my config has issues or whether I am not understanding Show Meta correctly.
If I make a regex in the config:
regexp_filter=NY=>New York
then if I do a SphinxQL search on 'NY'
Search Index where MATCH('NY')
and then Show Meta
it should show keyword1=New and keyword2=York, not NY - is that correct?
And if it does not, does that mean my config is not working as intended?
it should show keyword1=New and keyword2=York, not NY - is that correct?
This is correct. When you do MATCH('NY') and have NY=>New York regexp conversion then Sphinx first converts NY into New York and only after that it starts searching, i.e. it forgets about NY completely. The same happens when indexing: it first prepares tokens, then indexes them forgetting about the original text.
To demonstrate (this is in Manticore, a fork of Sphinx, but in terms of how regexp_filter is processed and how it affects searching it works the same way as Sphinx):
mysql> create table t(f text) regexp_filter='NY=>New York';
Query OK, 0 rows affected (0.01 sec)
mysql> insert into t values(0, 'I low New York');
Query OK, 1 row affected (0.01 sec)
mysql> select * from t where match('NY');
+---------------------+----------------+
| id                  | f              |
+---------------------+----------------+
| 2810862456614682625 | I low New York |
+---------------------+----------------+
1 row in set (0.01 sec)
mysql> show meta;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total         | 1     |
| total_found   | 1     |
| time          | 0.000 |
| keyword[0]    | new   |
| docs[0]       | 1     |
| hits[0]       | 1     |
| keyword[1]    | york  |
| docs[1]       | 1     |
| hits[1]       | 1     |
+---------------+-------+
9 rows in set (0.00 sec)
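For a classic on-disk Sphinx index the equivalent setting goes into the index section of sphinx.conf. A minimal sketch (the source/index names and path are placeholders; note that regexp_filter requires Sphinx to be built with the RE2 library, if I remember the docs correctly):

index idx_ny
{
    source        = src_main                 # placeholder source
    path          = /var/lib/sphinx/idx_ny   # placeholder path
    regexp_filter = NY => New York           # applied both at indexing time and to queries
}

If SHOW META still reports NY after reindexing, the filter is most likely not being applied (wrong index section, or a build without RE2 support).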

Tableau - Calculated field for difference between date and maximum date in table

I have the following table loaded in Tableau (it has only one column, CreatedOnDate):
+-----------------+
| CreatedOnDate   |
+-----------------+
| 1/1/2016        |
| 1/2/2016        |
| 1/3/2016        |
| 1/4/2016        |
| 1/5/2016        |
| 1/6/2016        |
| 1/7/2016        |
| 1/8/2016        |
| 1/9/2016        |
| 1/10/2016       |
| 1/11/2016       |
| 1/12/2016       |
| 1/13/2016       |
| 1/14/2016       |
+-----------------+
I want to be able to find the maximum date in the table, compare it with every date in the table and get the difference in days. For the above table, the maximum date in table is 1/14/2016. Every date is compared to 1/14/2016 to find the difference.
Expected Output
+-----------------+------------+
| CreatedOnDate   | Difference |
+-----------------+------------+
| 1/1/2016        |         13 |
| 1/2/2016        |         12 |
| 1/3/2016        |         11 |
| 1/4/2016        |         10 |
| 1/5/2016        |          9 |
| 1/6/2016        |          8 |
| 1/7/2016        |          7 |
| 1/8/2016        |          6 |
| 1/9/2016        |          5 |
| 1/10/2016       |          4 |
| 1/11/2016       |          3 |
| 1/12/2016       |          2 |
| 1/13/2016       |          1 |
| 1/14/2016       |          0 |
+-----------------+------------+
My goal is to create this Difference calculated field. I am struggling to find a way to do this using DATEDIFF.
Any help would be appreciated!
woodhead92, that approach would work, but it means you have to use table calculations. A much more flexible approach (available since version 9) is Level of Detail expressions:
First, define a MAX date for the whole dataset with this calculated field called MaxDate LOD:
{FIXED : MAX(CreatedOnDate) }
This will always calculate the maximum date over the whole table (it will also ignore filters; if you need it to reflect them, make sure you add them to context).
Then you can use pretty much the same calculated field, but no need for ATTR or Table Calculations:
DATEDIFF('day', [CreatedOnDate], [MaxDate LOD])
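If you would rather keep everything in one calculated field, the LOD can also be embedded directly in the DATEDIFF call (a sketch, equivalent to the two separate fields above):
DATEDIFF('day', [CreatedOnDate], { FIXED : MAX([CreatedOnDate]) })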
Hope this helps!

Sphinx query takes too much time

I am making an index on a table with ~90 000 000 rows. Fulltext search must be done on a varchar field, called email. I also set parent_id as an attribute.
When I run queries searching for emails that match words with a small number of hits, they return immediately:
mysql> SELECT count(*) FROM users WHERE MATCH('diedsmiling');
+----------+
| count(*) |
+----------+
|       26 |
+----------+
1 row in set (0.00 sec)
mysql> show meta;
+---------------+-------------+
| Variable_name | Value       |
+---------------+-------------+
| total         | 1           |
| total_found   | 1           |
| time          | 0.000       |
| keyword[0]    | diedsmiling |
| docs[0]       | 26          |
| hits[0]       | 26          |
+---------------+-------------+
6 rows in set (0.00 sec)
Things get complicated when I search for emails that match words with a large number of hits:
mysql> SELECT count(*) FROM users WHERE MATCH('mail');
+----------+
| count(*) |
+----------+
| 33237994 |
+----------+
1 row in set (9.21 sec)
mysql> show meta;
+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| total         | 1        |
| total_found   | 1        |
| time          | 9.210    |
| keyword[0]    | mail     |
| docs[0]       | 33237994 |
| hits[0]       | 33253762 |
+---------------+----------+
6 rows in set (0.00 sec)
Using the parent_id attribute doesn't help:
mysql> SELECT count(*) FROM users WHERE MATCH('mail') AND parent_id = 62003;
+----------+
| count(*) |
+----------+
|    21404 |
+----------+
1 row in set (8.66 sec)
mysql> show meta;
+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| total         | 1        |
| total_found   | 1        |
| time          | 8.666    |
| keyword[0]    | mail     |
| docs[0]       | 33237994 |
| hits[0]       | 33253762 |
+---------------+----------+
Here are my sphinx configs:
source src1
{
    type          = mysql

    sql_host      = HOST
    sql_user      = USER
    sql_pass      = PASS
    sql_db        = DATABASE
    sql_port      = 3306 # optional, default is 3306

    sql_query     = \
        SELECT id, parent_id, email \
        FROM users

    sql_attr_uint = parent_id
}

index test1
{
    source = src1
    path   = /var/lib/sphinx/test1
}
The query that I need to run looks like:
SELECT * FROM users WHERE MATCH('mail') AND parent_id = 62003;
I need to get all emails that match a certain word and have a certain parent_id.
My questions are:
Is there a way to optimize the situation described above? Maybe there is a more convenient matching mode for this type of query? If I migrate to a server with SSD disks, will the performance gain be significant?
Just to get the count you can do
Select id from index where match(...) limit 0 option ranker=none; show meta;
and get the number from total_found.
That will be much more efficient than count(*), which invokes a group by.
Or even call keywords('word','index',1); if it is only single words.
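Applied to the query from the question, that would look something like this (a sketch, reusing the test1 index name and parent_id value shown above):

SELECT id FROM test1 WHERE MATCH('mail') AND parent_id = 62003 LIMIT 0 OPTION ranker=none;
SHOW META;

total_found in the SHOW META output then holds the full match count, without Sphinx having to collect and group a result set.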

Sphinx calculation issue with large numbers

I have connected the mysql client to the Sphinx server. When I issue this query:
select 20130919.0+(15/4),15/4 from [INDEX] limit 1;
I get the following result:
+------+--------+-------------------+----------+
| id   | weight | 20130919.0+(15/4) | 15/4     |
+------+--------+-------------------+----------+
| 7414 |      1 |   20130924.000000 | 3.750000 |
+------+--------+-------------------+----------+
Note that 15/4 returns 3.75, but when it is added to 20130919.0 the result is wrong.
In another case, when I write the following query:
select 2222+15/4,15/4 from [INDEX] limit 1;
it returns the correct result:
+------+--------+-------------+----------+
| id   | weight | 2222+15/4   | 15/4     |
+------+--------+-------------+----------+
| 7414 |      1 | 2225.750000 | 3.750000 |
+------+--------+-------------+----------+
Similarly, in the previous case the third column should have the value 20130922.75. I thought the problem was that Sphinx returns a rounded-off number, but in that case it should have been 20130923.000000, not 20130924.000000.
What I want is for it to return the correct floating-point number, but it is acting strangely. I hope someone here has an explanation for this behaviour.
Sphinx mostly does single-precision float maths:
http://en.wikipedia.org/wiki/Single-precision_floating-point_format
A single-precision float has only a 24-bit significand, which is good for roughly 7 significant decimal digits - 20130919 already has 8. Around 20 million the representable values are spaced 2 apart, so (assuming the usual round-to-nearest-even behaviour) 20130919.0 gets stored as 20130920, and 20130920 + 3.75 = 20130923.75 is then rounded to the nearest representable value, 20130924 - which is exactly what you see. The smaller 2222 + 3.75 fits well within 7 digits, which is why that query comes out exact.
There is a double() function, but I haven't tested it.
Edit: Actually no, double() won't help.
sphinxQL>select double(20130919.0)+(15/4),15/4 from sample2 limit 1;
+---------------------------+----------+
| double(20130919.0)+(15/4) | 15/4     |
+---------------------------+----------+
|           20130924.000000 | 3.750000 |
+---------------------------+----------+
1 row in set (0.03 sec)
sphinxQL>select double(20130919.0+(15/4)),15/4 from sample2 limit 1;
+---------------------------+----------+
| double(20130919.0+(15/4)) | 15/4     |
+---------------------------+----------+
|           20130924.000000 | 3.750000 |
+---------------------------+----------+
1 row in set (0.02 sec)

Numbering the rows in reverse order in an Emacs Org Mode table

I'd like to do something like this:
How to achieve a row index column in Emacs Org Mode using a Calc column rule
but I'd like the rows to be numbered in reverse order. I suspect this should be very easy, and should have something to do with #>, but e.g. $1=#>-## doesn't work.
You can try this example:
| row | data |
|-----+------|
|   8 |      |
|   7 |      |
|-----+------|
|   6 |      |
|   5 |      |
|   4 |      |
|   3 | 5123 |
|   2 |      |
|   1 | 4234 |
#+TBLFM: $1='(- (length org-table-dlines) ##)