How to search for terms containing a slash ("/") with Sphinx / Manticore?

I'm trying to find matches of the term "/book", not just "book", but Manticore returns the same results for both terms. The index type is rt and charset_table includes the slash ("/"). How can I get only "/book" matches?

RT mode
mysql> drop table if exists t; create table t(f text) charset_table = 'non_cjk, /'; insert into t(f) values ('book'), ('/book'); select * from t where match('\\/book'); select * from t where match('book');
--------------
drop table if exists t
--------------
Query OK, 0 rows affected (0.00 sec)
--------------
create table t(f text) charset_table = 'non_cjk, /'
--------------
Query OK, 0 rows affected (0.00 sec)
--------------
insert into t(f) values ('book'), ('/book')
--------------
Query OK, 2 rows affected (0.01 sec)
--------------
select * from t where match('\\/book')
--------------
+---------------------+-------+
| id                  | f     |
+---------------------+-------+
| 1514651075267788906 | /book |
+---------------------+-------+
1 row in set (0.00 sec)
--------------
select * from t where match('book')
--------------
+---------------------+------+
| id                  | f    |
+---------------------+------+
| 1514651075267788905 | book |
+---------------------+------+
1 row in set (0.00 sec)
Plain mode
Plain index
source src {
    type = csvpipe
    csvpipe_command = echo "1,book" && echo "2,/book"
    csvpipe_field = f
}
index idx {
    path = /tmp/idx
    source = src
    charset_table = non_cjk, /
    stored_fields = f
}
searchd {
    listen = 127.0.0.1:9315:mysql41
    log = sphinx_min.log
    pid_file = searchd.pid
    binlog_path =
}
mysql> select * from idx where match('\\/book');
+------+-------+
| id   | f     |
+------+-------+
|    2 | /book |
+------+-------+
1 row in set (0.00 sec)
mysql> select * from idx where match('book');
+------+------+
| id   | f    |
+------+------+
|    1 | book |
+------+------+
1 row in set (0.00 sec)
RT index
index t {
    type = rt
    path = /tmp/idx
    rt_field = f
    charset_table = non_cjk, /
    stored_fields = f
}
searchd {
    listen = 127.0.0.1:9315:mysql41
    log = sphinx_min.log
    pid_file = searchd.pid
    binlog_path =
}
mysql> insert into t(f) values ('book'), ('/book'); select * from t where match('\\/book'); select * from t where match('book');
Query OK, 2 rows affected (0.00 sec)
+---------------------+-------+
| id                  | f     |
+---------------------+-------+
| 1514659513871892482 | /book |
+---------------------+-------+
1 row in set (0.00 sec)
+---------------------+------+
| id                  | f    |
+---------------------+------+
| 1514659513871892481 | book |
+---------------------+------+
1 row in set (0.00 sec)
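The escaping is the crux here: once `/` is in charset_table the slash becomes an indexed character, but `/` is also full-text query syntax, so it must be backslash-escaped in the match clause (and the backslash doubled again inside a SphinxQL string literal sent through a MySQL client, hence `\\/book` above). A small helper for the syntax-level escaping could look like this; the special-character set is an assumption modelled on the client APIs' EscapeString helper and may differ between versions:

```python
# Characters treated as query syntax by Sphinx/Manticore's extended match
# syntax. NOTE: this set is an assumption based on the client APIs'
# EscapeString helper and may vary between versions.
SPECIAL = set(r'\()|-!@~"&/^$=<')

def escape_match_term(term: str) -> str:
    """Backslash-escape syntax characters so they are matched literally."""
    return ''.join('\\' + ch if ch in SPECIAL else ch for ch in term)

print(escape_match_term('/book'))  # prints: \/book
```

Remember that a MySQL string literal consumes one level of backslashes itself, which is why the interactive queries above write the doubled form `'\\/book'`.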

Related

Select from 2 rows in one table into a single row with 2 or more columns in the second table

I have a table with two columns: a type column and a value column. There are only two types. I would like to select rows of this table into another table, combining them into columns based on type and value. For example, the first table may have an order with the two types in two rows; it should be inserted into the second table as one row.
Example:
Table 1
| ID | OrderID | Type | Value |
|:-----|:--------:|:------------:|-------:|
| 1 | 300 | bike | 100 |
| 2 | 300 | skateboard | 150 |
| 3 | 700 | bike | 200 |
| 4 | 700 | skateboard | 50 |
| 5 | 800 | bike | 150 |
| 6 | 800 | skateboard | 100 |
What is the TSQL to have it inserted into the 2nd table with these values?
Table 2
| ID | OrderID | BikeValue | SkateboardValue |
|:----|:--------:|:----------:|-----------------:|
| 1 | 300 | 100 | 150 |
| 2 | 700 | 200 | 50 |
| 3 | 800 | 150 | 100 |
Keep it simple: do it in two SQL statements, one to insert and another to update.
INSERT INTO Table2 (OrderID, BikeValue)
SELECT Table1.OrderID, Table1.Value
FROM Table1 (NOLOCK)
WHERE Table1.Type = 'bike';

UPDATE Table2 SET Table2.SkateboardValue = Table1.Value
FROM Table2
INNER JOIN Table1 ON Table1.OrderID = Table2.OrderID
WHERE Table1.Type = 'skateboard';
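The two-statement approach works; for reference, the same pivot can also be done in a single INSERT ... SELECT using conditional aggregation (MAX(CASE WHEN ...) grouped by OrderID). A minimal sketch, run on SQLite via Python's sqlite3 so it is self-contained; the INSERT ... SELECT pattern itself carries over to T-SQL:

```python
import sqlite3

# One-statement pivot via conditional aggregation, demonstrated on SQLite.
# Table/column names follow the question; types are simplified for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, OrderID INT, Type TEXT, Value INT);
    CREATE TABLE Table2 (ID INTEGER PRIMARY KEY AUTOINCREMENT, OrderID INT,
                         BikeValue INT, SkateboardValue INT);
    INSERT INTO Table1 (OrderID, Type, Value) VALUES
        (300, 'bike', 100), (300, 'skateboard', 150),
        (700, 'bike', 200), (700, 'skateboard', 50),
        (800, 'bike', 150), (800, 'skateboard', 100);
""")
conn.execute("""
    INSERT INTO Table2 (OrderID, BikeValue, SkateboardValue)
    SELECT OrderID,
           MAX(CASE WHEN Type = 'bike'       THEN Value END),
           MAX(CASE WHEN Type = 'skateboard' THEN Value END)
    FROM Table1
    GROUP BY OrderID
""")
for row in conn.execute("SELECT OrderID, BikeValue, SkateboardValue "
                        "FROM Table2 ORDER BY OrderID"):
    print(row)  # (300, 100, 150), then (700, 200, 50), then (800, 150, 100)
```

This keeps the operation atomic, whereas the insert-then-update variant briefly leaves SkateboardValue NULL between the two statements.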

REMOVE_REPEATS is not working in SphinxQL

The documentation says REMOVE_REPEATS ( result_set, column, offset, limit ) removes repeated adjacent rows with the same 'column' value. But when I run select remove_repeats((select * from rt), gid, 0, 10), the record gid=22 appears twice. Shouldn't it appear only once?
mysql> select remove_repeats( (select * from rt),gid,0,10);
+------+------+
| id   | gid  |
+------+------+
|    1 |   11 |
|    2 |   22 |
|    3 |   33 |
|    4 |   22 |
+------+------+
4 rows in set (0.00 sec)
REMOVE_REPEATS() only removes repeated rows that are adjacent to each other. In your case you can eliminate the second occurrence of gid=22 by ordering the sub-query by gid:
mysql> select remove_repeats( (select * from rt order by gid asc),gid,0,10);
+------+------+
| id   | gid  |
+------+------+
|    1 |   11 |
|    2 |   22 |
|    3 |   33 |
+------+------+
3 rows in set (0.00 sec)
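The adjacency rule can be modelled outside of Sphinx: REMOVE_REPEATS() behaves like a "collapse adjacent duplicates" pass, which Python's itertools.groupby reproduces directly (a sketch of the semantics, not of Manticore's implementation):

```python
from itertools import groupby

# (id, gid) pairs exactly as in the question's result set.
rows = [(1, 11), (2, 22), (3, 33), (4, 22)]

def remove_repeats(rows, key):
    """Keep the first row of every run of adjacent rows sharing the same key."""
    return [next(group) for _, group in groupby(rows, key=key)]

gid = lambda r: r[1]
print(remove_repeats(rows, gid))                   # [(1, 11), (2, 22), (3, 33), (4, 22)]
print(remove_repeats(sorted(rows, key=gid), gid))  # [(1, 11), (2, 22), (3, 33)]
```

Unsorted, the two gid=22 rows never touch, so both survive; sorting first makes them adjacent and the second one is dropped, matching the Sphinx results above.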

How to debug PostgreSQL trigger error "query has no destination for result data"?

I have three tables, like this:
data_buku table
+-----------+---+---+-------+
| kode_buku | * | * | stock |
+-----------+---+---+-------+
| 111       | * | * | 50    |
| 222       | * | * | 50    |
| 333       | * | * | 50    |
| 444       | * | * | 50    |
| 555       | * | * | 50    |
| 666       | * | * | 50    |
+-----------+---+---+-------+
data_pinjam table
+--------------+-----------+---+--------+
| no_transaksi | kode_buku | * | jumlah |
+--------------+-----------+---+--------+
| 1            | 111       | * | 3      |
| 1            | 222       | * | 2      |
| 1            | 333       | * | 4      |
+--------------+-----------+---+--------+
data_kembali table
+--------------+---+---+--------+
| no_transaksi | * | * | status |
+--------------+---+---+--------+
| 1            | * | * | back   |
+--------------+---+---+--------+
Based on these tables, I want a function and a trigger on table data_kembali. When a row is inserted into data_kembali, the function should take the jumlah values from data_pinjam for the matching no_transaksi and update the stock in data_buku for each row with the same kode_buku: stock + jumlah.
I have created this function:
CREATE OR REPLACE FUNCTION kembali()
    RETURNS TRIGGER AS
$BODY$
DECLARE
    CURRENT_STOK INT4;
    r data_pinjam%ROWTYPE;
BEGIN
    FOR r IN
        SELECT *
        FROM data_pinjam p
        WHERE p.no_transaksi = new.no_transaksi
    LOOP
        select CURRENT_STOK stock from data_buku where kode_buku = r.kode_buku;
        CURRENT_STOK = CURRENT_STOK + r.jumlah;
        UPDATE data_buku SET STOCK = CURRENT_STOK WHERE kode_buku = r.kode_buku;
    END LOOP;
    UPDATE data_pinjam SET status = 'kembali' WHERE no_transaksi = new.no_transaksi;
    update data_transaksi set status = 'kembali' where no_transaksi = new.no_transaksi;
    RETURN NEW;
END;
$BODY$
LANGUAGE PLPGSQL VOLATILE
COST 100;
but when it runs, I get this output:
ERROR: query has no destination for result data
SQL state: 42601
Hint: If you want to discard the results of a SELECT, use PERFORM instead.
Context: PL/pgSQL function kembali() line 13 at SQL statement
Can someone advise me on this trigger function and its update loop?
The SELECT statement in the cursor loop needs a destination for the selected value:
SELECT stock
INTO CURRENT_STOK
FROM data_buku
WHERE kode_buku = r.kode_buku;
(The variable CURRENT_STOK must be declared before the BEGIN, which the function already does.)
But perhaps what the cursor loop body really intends is simply this:
UPDATE data_buku
SET STOCK = STOCK + r.jumlah
WHERE kode_buku = r.kode_buku;
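Taking that a step further, the whole cursor loop can be replaced by one set-based UPDATE with a correlated subquery that restocks every kode_buku of the transaction at once. A sketch using SQLite via Python's sqlite3 for a self-contained run; in the PL/pgSQL trigger the `?` placeholder would be `new.no_transaksi`, and PostgreSQL could equally use UPDATE ... FROM:

```python
import sqlite3

# Set-based restock: no loop, one UPDATE. Table and column names follow the
# question; only the columns needed for the demo are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data_buku   (kode_buku INT PRIMARY KEY, stock INT);
    CREATE TABLE data_pinjam (no_transaksi INT, kode_buku INT, jumlah INT);
    INSERT INTO data_buku   VALUES (111, 50), (222, 50), (333, 50);
    INSERT INTO data_pinjam VALUES (1, 111, 3), (1, 222, 2), (1, 333, 4);
""")
no_transaksi = 1  # stands in for NEW.no_transaksi inside the trigger
conn.execute("""
    UPDATE data_buku
    SET stock = stock + (SELECT SUM(p.jumlah)
                         FROM data_pinjam p
                         WHERE p.kode_buku = data_buku.kode_buku
                           AND p.no_transaksi = ?)
    WHERE kode_buku IN (SELECT kode_buku FROM data_pinjam
                        WHERE no_transaksi = ?)
""", (no_transaksi, no_transaksi))
print(list(conn.execute(
    "SELECT kode_buku, stock FROM data_buku ORDER BY kode_buku")))
# [(111, 53), (222, 52), (333, 54)]
```

The WHERE ... IN clause matters: without it, books that never appear in the transaction would have their stock set to NULL by the subquery.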

Sphinx query takes too much time

I am building an index on a table with ~90,000,000 rows. Full-text search must be done on a varchar field called email. I also set parent_id as an attribute.
When I run queries searching for emails that match words with a small number of hits, they return immediately:
mysql> SELECT count(*) FROM users WHERE MATCH('diedsmiling');
+----------+
| count(*) |
+----------+
|       26 |
+----------+
1 row in set (0.00 sec)
mysql> show meta;
+---------------+-------------+
| Variable_name | Value       |
+---------------+-------------+
| total         | 1           |
| total_found   | 1           |
| time          | 0.000       |
| keyword[0]    | diedsmiling |
| docs[0]       | 26          |
| hits[0]       | 26          |
+---------------+-------------+
6 rows in set (0.00 sec)
Things get complicated when I search for emails that match words with a large number of hits:
mysql> SELECT count(*) FROM users WHERE MATCH('mail');
+----------+
| count(*) |
+----------+
| 33237994 |
+----------+
1 row in set (9.21 sec)
mysql> show meta;
+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| total         | 1        |
| total_found   | 1        |
| time          | 9.210    |
| keyword[0]    | mail     |
| docs[0]       | 33237994 |
| hits[0]       | 33253762 |
+---------------+----------+
6 rows in set (0.00 sec)
Filtering on the parent_id attribute doesn't help:
mysql> SELECT count(*) FROM users WHERE MATCH('mail') AND parent_id = 62003;
+----------+
| count(*) |
+----------+
|    21404 |
+----------+
1 row in set (8.66 sec)
mysql> show meta;
+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| total         | 1        |
| total_found   | 1        |
| time          | 8.666    |
| keyword[0]    | mail     |
| docs[0]       | 33237994 |
| hits[0]       | 33253762 |
+---------------+----------+
Here are my sphinx configs:
source src1
{
    type = mysql
    sql_host = HOST
    sql_user = USER
    sql_pass = PASS
    sql_db = DATABASE
    sql_port = 3306 # optional, default is 3306
    sql_query = \
        SELECT id, parent_id, email \
        FROM users
    sql_attr_uint = parent_id
}
index test1
{
    source = src1
    path = /var/lib/sphinx/test1
}
The query that I need to run looks like:
SELECT * FROM users WHERE MATCH('mail') AND parent_id = 62003;
I need to get all emails that match a certain word and have a certain parent_id.
My questions are:
Is there a way to optimize the situation described above? Maybe there is a matching mode better suited to this type of query? If I migrate to a server with SSD disks, will the performance gain be significant?
To get just the count, you can do:
SELECT id FROM index WHERE MATCH(...) LIMIT 0 OPTION ranker=none; SHOW META;
and read the count from total_found. That is much more efficient than COUNT(*), which invokes a GROUP BY.
Or, if you only query single words, even CALL KEYWORDS('word', 'index', 1);

SphinxSearch Ranker=matchany on multiple fields

Using Sphinx 2.1.4-id64-dev (rel21-r4324)
I want to search over multiple fields but do not want "duplicate words" to increase the weight.
So I am using the ranker=matchany option.
This works as I want when the duplicates are in a single field:
MySQL [(none)]> select id, val, val2, weight() FROM nptest WHERE match('#(val,val2) bar') OPTION ranker=matchany;
+------+---------+------+----------+
| id   | val     | val2 | weight() |
+------+---------+------+----------+
|    3 | bar     |      |        1 |
|    4 | bar bar |      |        1 |
+------+---------+------+----------+
2 rows in set (0.00 sec)
=> the weights are equal, despite the duplicate word in doc 4.
But that no longer works when the duplicates span multiple fields:
MySQL [(none)]> select id, val, val2, weight() FROM nptest WHERE match('#(val,val2) foo') OPTION ranker=matchany;
+------+------+------+----------+
| id   | val  | val2 | weight() |
+------+------+------+----------+
|    2 | foo  | foo  |        2 |
|    1 | foo  |      |        1 |
+------+------+------+----------+
2 rows in set (0.00 sec)
weight of id-2 > weight of id-1
Is there a way to apply a "matchany" ranking mode on multiple fields?
Here is a sample sphinx.conf file:
source nptest
{
    type = mysql
    sql_host = localhost
    sql_user = myuser
    sql_pass = mypass
    sql_db = test
    sql_port = 3306
    sql_query = \
        SELECT 1, 'foo' AS val, '' AS val2 \
        UNION \
        SELECT 2, 'foo', 'foo' \
        UNION \
        SELECT 3, 'bar', '' \
        UNION \
        SELECT 4, 'bar bar', ''
    sql_field_string = val
    sql_field_string = val2
}
index nptest
{
    type = plain
    source = nptest
    path = /var/lib/sphinxsearch/data/nptest
    morphology = none
}
You need the expression ranker:
http://sphinxsearch.com/docs/current.html#weighting
You can start with the default expression for matchany and tweak it. Using doc_word_count instead of sum(word_count) should help.
After upgrading to Sphinx 2.2.1-id64-beta (r4330) I was able to use the top() aggregate function in a custom expression ranker like this:
MySQL [(none)]> SELECT id, val, val2, weight() FROM nptest WHERE match('#(val,val2) foo') OPTION ranker=expr('top((word_count+(lcs-1)*max_lcs)*user_weight)'), field_weights=(val=3,val2=4);
+------+-------------+------+----------+
| id   | val         | val2 | weight() |
+------+-------------+------+----------+
|    2 | foo         | foo  |        4 |
|    1 | foo         |      |        3 |
|    5 | bar bar foo | bar  |        3 |
+------+-------------+------+----------+
3 rows in set (0.00 sec)
That way, multiple occurrences across multiple fields do not increase the global weight, and if the fields have different weights, the top-weighted field wins.
Many Thanks to barryhunter for his great help!