Sphinx query - right way of querying - sphinx

In my application I'm using this MySQL query:
SELECT DISTINCT * FROM forum_topic \
LEFT JOIN forum_post ON forum_post.id_topic = forum_topic.Id \
WHERE MATCH (forum_post.content) AGAINST ('searching text') \
AND !MATCH (forum_topic.topic_name) AGAINST ('searching text') \
GROUP BY forum_topic.Id
but now I want to migrate to Sphinx. I created a config file and an sph_counter table in the DB. Now my config looks like this:
source main
{
type = mysql
sql_host = localhost
sql_user = root
sql_pass =
sql_db = sphinx
sql_port = 3306 # optional, default is 3306
sql_query_pre = SET NAMES utf8
sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(Id) FROM forum_post
sql_query = SELECT * FROM forum_topic LEFT JOIN forum_post ON forum_post.id_topic = forum_topic.Id \
WHERE forum_post.Id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 ) \
AND MATCH (forum_post.content) AGAINST ('searching text') \
AND !MATCH (forum_topic.topic_name) AGAINST ('searching text') \
GROUP BY forum_topic.Id
sql_attr_uint = id_topic
}
source delta : main
{
sql_query_pre = SET NAMES utf8
sql_query = SELECT * FROM forum_topic LEFT JOIN forum_post ON forum_post.id_topic = forum_topic.Id \
WHERE forum_post.Id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 ) \
AND MATCH (forum_post.content) AGAINST ('searching text') \
AND !MATCH (forum_topic.topic_name) AGAINST ('searching text') \
GROUP BY forum_topic.Id
}
index main
{
source = main
path = /var/data/main_sphinx
charset_type = utf-8
}
index delta : main
{
source = delta
path = /var/data/delta_sphinx
charset_type = utf-8
}
Is that the right way to search with Sphinx? Or do I have to do this from a PHP script?

You don't put the 'query' in the config file. You want the Sphinx index to contain ALL your documents. Sphinx runs the query offline and indexes the results. Sphinx then runs searches against its own index.
So you actually want something like:
sql_query = SELECT p.*, t.* FROM forum_post p INNER JOIN forum_topic t ON p.id_topic = t.Id \
WHERE p.Id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
I wouldn't suggest the GROUP BY id_topic, because that would mean one document per topic. Sphinx would then only see one post per thread, so most of the topic wouldn't be searchable.
I've also moved the tables around so that posts come first, so that the Sphinx document_id - the first column of the SELECT list - is the post id, because that is what is unique.
You have the topic id as an attribute, so you can do grouping in Sphinx if need be.
Now you can run indexer with this index, and index every document.
Then you run queries against the index (like your 'searching text' example):
$cl->setMatchMode(SPH_MATCH_EXTENDED);
$res = $cl->Query('@content searching text','index');
This way, you build one index, then run arbitrary queries against it.
(Using the @content syntax means the query only searches the content column, which is better than searching everything and then having to exclude matches from the topic name.)
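If you do want one result per topic at search time, here is a rough sketch of that grouping in SphinxQL (this assumes searchd also has a SphinxQL listener configured; the index name main is taken from the config above, and the PHP API equivalent is $cl->SetGroupBy('id_topic', SPH_GROUPBY_ATTR)):
SELECT * FROM main WHERE MATCH('@content searching text') GROUP BY id_topic LIMIT 20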

Related

AS400 index configuration table

How can I view the indexes of a particular table on AS400? In which table is the index description of a table stored?
If your "index" is really a logical file, you can see a list of these using:
select * from qsys2.systables
where table_schema = 'YOURLIBNAME' and table_type = 'L'
To complete the previous answers: if your AS400/IBM i files are "IBM old style" physical and logical files, then qsys2.syskeys and qsys2.sysindexes are empty.
==> you retrieve the index info from the QADBXREF (index info) and QADBKFLD (key field list) tables:
select * from QSYS.QADBXREF where DBXFIL = 'YOUR_LOGICAL_FILE_NAME' and DBXLIB = 'YOUR_LIBRARY'
select * from QSYS.QADBKFLD where DBKFIL = 'YOUR_LOGICAL_FILE_NAME' and DBKLB2 = 'YOUR_LIBRARY'
WARNING: YOUR_LOGICAL_FILE_NAME is not your "table name" but the name of the file! You have to join another table, QSYS.QADBFDEP, to match LOGICAL_FILE_NAME / TABLE_NAME.
To find indexes from your table's name:
Select r.*
from QSYS.QADBXREF r, QSYS.QADBFDEP d
where d.DBFFDP = r.DBXFIL and d.DBFLIB=r.DBXLIB
and d.DBFFIL = 'YOUR_TABLE_NAME' and d.DBFLIB = 'YOUR_LIBRARY'
To find all index fields for your table:
Select DBXFIL , f.DBKFLD, DBKPOS , t.DBXUNQ
from QSYS.QADBXREF t
INNER JOIN QSYS.QADBKFLD f on DBXFIL = DBKFIL and DBXLIB = DBKLIB
INNER JOIN QSYS.QADBFDEP d on d.DBFFDP = t.DBXFIL and d.DBFLIB=t.DBXLIB
where d.DBFFIL = 'YOUR_TABLE_NAME' and d.DBFLIB = 'YOUR_LIBRARY'
order by DBXFIL, DBKPOS
If your indexes were created with SQL, you can see the list of indexes in the sysindexes system view:
SELECT * FROM qsys2.sysindexes WHERE TABLE_SCHEMA='YOURLIBNAME' and
TABLE_NAME = 'YOURTABLENAME'
If you want the detail columns for an index, you can join the syskeys table:
SELECT KEYS.INDEX_NAME, KEYS.COLUMN_NAME
FROM qsys2.syskeys KEYS
JOIN qsys2.sysindexes IX ON KEYS.ixname = IX.name
WHERE TABLE_SCHEMA='YOURLIBNAME' and TABLE_NAME = 'YOURTABLENAME'
order by INDEX_NAME
You could also use commands to get the information. Command DSPDBR FILE(LIBNAME/FILENAME) will show a list of the objects dependent on a physical file. The objects that show a data dependency can then be further explored by running DSPFD FILE(LIBNAME/FILENAME). This will show the access paths of the logical file.
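For reference, a rough SQL equivalent of DSPDBR, sketched using only the QADBFDEP columns already shown above (DBFFDP is the dependent file, DBFFIL/DBFLIB the based-on file and library):
select DBFFDP as dependent_file
from QSYS.QADBFDEP
where DBFFIL = 'YOUR_TABLE_NAME' and DBFLIB = 'YOUR_LIBRARY'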

Is there any query to find table structure in Oracle SQL Developer?

Hi, I am new to Oracle SQL Developer. Can you please tell me how to find the table structure and relationships of a database?
You can try
DESC <table_name>
Try this:
select table_name, column_name, data_type
from all_tab_columns
where table_name = '<TABLE_NAME_HERE>'
and owner = '<YOUR_USER_HERE_IN_CAPITAL_LETTERS>'
If you have comments on your table, then to get the columns' comments:
select tc.table_name, tc.column_name, tc.data_type, cc.comments
from all_col_comments cc, all_tab_columns tc
where tc.table_name = '<TABLE_NAME_HERE>'
and tc.owner = '<OWNER_OF_TABLE_HERE>'
and tc.table_name = cc.table_name
and tc.column_name = cc.column_name
and tc.owner = cc.owner
If you are logged in under owner of the table you can write this:
select table_name, column_name, data_type
from user_tab_columns
where table_name = '<TABLE_NAME_HERE>'
or to get columns with comments
select tc.table_name, tc.column_name, tc.data_type, cc.comments
from user_col_comments cc, user_tab_columns tc
where tc.table_name = '<TABLE_NAME_HERE>'
and tc.table_name = cc.table_name
and tc.column_name = cc.column_name
To get relationships between tables, use this query:
select uc1.table_name
, uc1.constraint_name
, cc1.column_name
, uc2.table_name r_table_name
, uc2.constraint_name r_constraint_name
, cc2.column_name r_column_name
from all_constraints uc1
, all_constraints uc2
, all_cons_columns cc1
, all_cons_columns cc2
where 1 = 1
and uc2.constraint_type = 'R'
and uc1.constraint_name = uc2.r_constraint_name
and cc1.table_name = uc1.table_name
and cc1.constraint_name = uc1.constraint_name
and cc2.table_name = uc2.table_name
and cc2.constraint_name = uc2.constraint_name
and uc1.owner = '<YOUR_USER_HERE_IN_CAPITAL_LETTERS>'
and uc2.owner = uc1.owner
and cc1.owner = uc1.owner
and cc2.owner = uc1.owner
order by 1
/
Columns with the "R_" prefix represent the foreign key (referencing) side. As you can see, I used the views with the "ALL_" prefix; to use the similar views with the "USER_" prefix, get rid of the "OWNER" conditions.
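For example, a shortened sketch of the same query against the USER_ views (no OWNER columns or predicates are needed there):
select uc1.table_name
, cc1.column_name
, uc2.table_name r_table_name
, cc2.column_name r_column_name
from user_constraints uc1
, user_constraints uc2
, user_cons_columns cc1
, user_cons_columns cc2
where uc2.constraint_type = 'R'
and uc1.constraint_name = uc2.r_constraint_name
and cc1.constraint_name = uc1.constraint_name
and cc2.constraint_name = uc2.constraint_name
order by 1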
To learn more about the Oracle data dictionary, read this.
1) Type your table name.
2) Right-click on the table name and click Open Declaration.

Rebuild sphinx index fail

We have 4 Sphinx indexes built using data from one table. All indexes have the same source settings except that they take different documents. We use checks like mod(id, 4) = <index number> to distribute documents and document attributes between the indexes.
Question: one of the four indexes (always the same one) fails to rebuild almost every time we rebuild the indexes. The other indexes never have this issue and are rebuilt correctly.
We have partitioned the documents and attributes tables. For example, this is how the documents table is partitioned:
PARTITION BY HASH(mod(id, 4))(
PARTITION `p0` COMMENT '',
PARTITION `p1` COMMENT '',
PARTITION `p2` COMMENT '',
PARTITION `p3` COMMENT ''
);
We think that the indexer hangs after it has received all the documents but before it starts receiving the attributes. We can see this when we check the sessions on the MySQL server.
The index which fails to rebuild is using mod(id, 4) = 0 condition.
We use Sphinx 2.0.4-release on Ubuntu 64bit 12.04.02 LTS.
Data source config
source ble_job_2 : ble_job
{
sql_query = select job_notice.id as id, \
body, title, source, company, \
UNIX_TIMESTAMP(insertDate) as date, \
substring(company, 1, 1) as companyletter, \
job_notice.locationCountry as country, \
location_us_state.stateName as state, \
0 as expired, \
clusterId, \
groupCity, \
groupCityAttr, \
job_notice.cityLat as citylat, \
job_notice.cityLng as citylng, \
job_notice.zipLat as ziplat, \
job_notice.zipLng as ziplng, \
feedId, job_notice.rating as rating, \
job_notice.cityId as cityid \
from job_notice \
left join location_us_state on job_notice.locationState = location_us_state.stateCode \
where job_notice.status != 'expired' \
and mod(job_notice.id, 4) = 1
sql_attr_multi = uint attr from query; \
select noticeId, attributeId as attr from job_notice_attribute where mod(noticeId, 4) = 1
} # source ble_job_2
Index config
index ble_job_2
{
type = plain
source = ble_job_2
path = /var/lib/sphinxsearch/data/ble_job_2
docinfo = extern
mlock = 0
morphology = none
stopwords = /etc/sphinxsearch/stopwords/blockwords.txt
min_word_len = 1
charset_type = utf-8
enable_star = 0
html_strip = 0
} # index_ble_job_2
Any help would be greatly appreciated.
Warm regards.
Luckily we have fixed the issue.
We applied the ranged query setup and this made the index rebuild stable. I think this is because Sphinx runs several queries, each returning a limited, relatively small set of results. This allows MySQL to complete each query normally and send all the results back to Sphinx.
The same issue is described on the Sphinx forum: Indexer Hangs & MySQL Query Sleeps.
The changes in the data source config are:
sql_query_range = SELECT MIN(id),MAX(id) FROM job_notice where mod(job_notice.id, 4) = 1
sql_range_step = 200000
sql_query = select job_notice.id as id, \
...
and mod(job_notice.id, 4) = 1 and job_notice.id >= $start AND job_notice.id <= $end
Please note that no ranges should be applied to the sql_attr_multi query (see Bad query in Sphinx MVA).
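In other words, the MVA attribute keeps its plain query exactly as in the original source, with no $start/$end placeholders:
select noticeId, attributeId as attr from job_notice_attribute where mod(noticeId, 4) = 1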

Sphinx weird behavior

I have a weird problem creating an index on Sphinx 2.0.5-id64-release (r3308).
/etc/sphinx/sphinx.conf
source keywords
{
# ...
sql_query = \
SELECT keywords.lid, keywords.keyword FROM keywords_sites \
LEFT JOIN keywords ON keywords_sites.kid = keywords.kid \
GROUP BY keywords_sites.kid
sql_attr_uint = lid
sql_field_string = keyword
# ...
}
I get this warning:
WARNING: attribute 'lid' not found - IGNORING
But when I change the query to:
sql_query = \
SELECT 1, keywords.lid, keywords.keyword FROM keywords_sites \
LEFT JOIN keywords ON keywords_sites.kid = keywords.kid \
GROUP BY keywords_sites.kid
I don't get any warnings. Why does this happen?
The first column from the sql_query is ALWAYS used as the document_id.
The document_id cannot be defined as an attribute.
If you want to store the primary key in an attribute as well, then you need to include it twice in the query.
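A minimal sketch of what that looks like for the query from the question, assuming keywords.lid is a unique non-zero integer per indexed row (the alias lid_attr is just an illustrative name; you would then declare sql_attr_uint = lid_attr instead of lid):
SELECT keywords.lid, keywords.lid AS lid_attr, keywords.keyword
FROM keywords_sites
LEFT JOIN keywords ON keywords_sites.kid = keywords.kid
GROUP BY keywords_sites.kid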

Sphinx + Postgres + uuid issues

I have a sql_query for a source defined like so:
sql_query = SELECT \
criteria.item_uuid, \
criteria.user_id, \
criteria.color, \
criteria.selection, \
criteria.item_id, \
home.state, \
item.* \
FROM criteria \
INNER JOIN item USING (item_uuid) \
INNER JOIN user_info home USING (user_id) \
WHERE criteria.item_uuid IS NOT NULL
And then an index:
index csearch {
source = csearch
path = /usr/local/sphinx/var/data/csearch
docinfo = extern
enable_star = 1
min_prefix_len = 0
min_infix_len = 0
morphology = stem_en
}
But when I run indexer --rotate csearch I get:
indexing index 'csearch'...
WARNING: zero/NULL document_id, skipping
The idea is that the item_uuid column is the identifier I want, based on some combination of the other columns. The item_uuid column is a uuid type in Postgres: perhaps Sphinx does not support this? Anyway, any ideas here would be greatly appreciated.
Read the docs: the document_id must be a unique, unsigned, non-zero integer.
http://www.sphx.org/docs/manual-1.10.html#data-restrictions
You could try using SELECT row_number(), uuid, etc...
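For example, a sketch of the sql_query from the question rewritten that way (note that row_number() in Postgres needs an OVER() clause, and ids generated like this are not stable across reindexes):
-- row_number() supplies the unique non-zero integer document_id Sphinx requires
SELECT row_number() OVER (ORDER BY criteria.item_uuid) AS id,
criteria.item_uuid,
criteria.user_id,
criteria.color,
criteria.selection,
criteria.item_id,
home.state,
item.*
FROM criteria
INNER JOIN item USING (item_uuid)
INNER JOIN user_info home USING (user_id)
WHERE criteria.item_uuid IS NOT NULL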