How do you get all records from Sphinx? - sphinx

How can I get all the records from an index using Sphinx, the equivalent of SELECT * FROM index?
I know I can get all the records matching a specific keyword with /usr/local/sphinx/bin/search keyword, but what I want is every record in the index.

You can do it by using an empty query or by setting the matching mode to SPH_MATCH_FULLSCAN.
The "search" command-line utility can't do either of these, but the test.py client that comes with the Sphinx source can:
python ./sphinx-0.9.9-rc2/api/test.py -h localhost -i myindex
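If you would rather call the API from your own script, a minimal sketch might look like the following. It assumes the sphinxapi.py module from the same api directory; the helper name full_scan_ids and the limit of 1000 are made up for illustration, and the constant is redefined locally only so the sketch stands alone. The max_matches setting on the server may still cap how many rows come back.

```python
# Value of SPH_MATCH_FULLSCAN in sphinxapi.py; redefined here (assumption)
# so the sketch does not need the module at import time.
SPH_MATCH_FULLSCAN = 5


def full_scan_ids(client, index, limit=1000):
    """Run an empty full-scan query and return the matched document ids.

    `client` is expected to behave like sphinxapi.SphinxClient.
    `full_scan_ids` and `limit` are hypothetical names for this example.
    """
    client.SetMatchMode(SPH_MATCH_FULLSCAN)
    # offset, limit, max matches to keep for this query
    client.SetLimits(0, limit, limit)
    result = client.Query('', index)
    if result is None:
        raise RuntimeError(client.GetLastError())
    return [match['id'] for match in result['matches']]


# Real use (not run here):
#   from sphinxapi import SphinxClient
#   ids = full_scan_ids(SphinxClient(), 'myindex')
```

To go past the first page of results you would have to raise the limits or page with the offset argument, within whatever the server allows.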

Try using an empty query. This worked for me with one of the releases.

In sphinx.conf, docinfo has to be set to extern.

Related

SphinxSearch indextool dumpdict what is the output format?

Executing indextool:
indextool --dumpdict myindex
gives output like this:
anymore,1,1,329756
baltimore,3,5,153685
Obviously the first column is the keyword, what are the other columns?
Umm, isn't there a header row :)
keyword,docs,hits,offset
(The offset column might not be self-explanatory. AFAIK it is just the offset within the doclist file. Sphinx uses it when processing queries, since the dict is a quick lookup table. It is probably of little use to external users and can be ignored.)
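If you want to work with the dump in a script, here is a small sketch that reads those four columns into typed records. It assumes the text you feed it starts with the keyword,docs,hits,offset header row and contains nothing else, so any other lines the tool prints would need stripping first. The function name parse_dumpdict is made up for this example.

```python
import csv
import io


def parse_dumpdict(text):
    """Parse CSV text with a keyword,docs,hits,offset header into dicts."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        rows.append({
            "keyword": row["keyword"],
            "docs": int(row["docs"]),      # number of documents containing the keyword
            "hits": int(row["hits"]),      # total occurrences of the keyword
            "offset": int(row["offset"]),  # position in the doclist file
        })
    return rows


# Sample taken from the question, with the header row added.
sample = "keyword,docs,hits,offset\nanymore,1,1,329756\nbaltimore,3,5,153685\n"
entries = parse_dumpdict(sample)
```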

Executing the query using bq command line in Google Big Query

I execute a query using the below Python script and the table gets populated with 2,564,691 rows. When I run the same query using Google Big Query console, it returns 17,379,353 rows (query is as-is). I was wondering whether there is some issue with the below script. Not sure whether --replace in bq query replaces the past result set instead of appending to it.
Any help would be appreciated.
import time
dateToday = time.strftime("%Y/%m/%d")
dateToday1 = dateToday.replace('/', '')
# Raw string so the backslashes in the Windows path are not read as escape sequences.
commandStr = r"type C:\Users\query.txt | bq query --allow_large_results --replace --destination_table table:dataset1_%s -n 1" % dateToday1
In the Web UI you can use the Query History option to navigate to the respective queries.
Once you locate them, you can expand the entries and see exactly what query was executed.
I am more than sure that just by comparing the query texts you will see the source of the "discrepancy" right away!
Added:
In Query History you can see not only the query text but also all the configuration properties that were used for each query, such as Write Preference. So even if the query text is the same, you may spot a difference in configuration that gives you a clue.

Preserving some characters when using to_tsvector in PostgreSQL 9.3

I need to process strings like this "hello world #mention a #hashtag" and index them for searching using PostgreSQL. I do need to treat #mention and #hashtag specially.
The following produces a tsvector:
select to_tsvector('hello world #mention a #hashtag')
But the output looks like this:
"'a':4 'hashtag':5 'hello':1 'mention':3 'world':2"
What I would like is to see "#" preserved in front of 'mention' and # in front of 'hashtag'. Is there a way for me to do this using PostgreSQL ?
I'm not sure tsearch is the right solution for your use case. Tsearch is good at full-text search, but it sounds like you want relational data. Can you parse the data in your application and create tag/user relationships from #hashtags and #mentions?
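As a rough illustration of parsing on the application side, something like the sketch below pulls the #-prefixed tokens out before the text goes to the database. The regular expression and the name extract_tags are my own assumptions about what counts as a tag, so adjust them to your rules.

```python
import re

# A '#' that does not follow a word character, then one or more word characters.
TAG_RE = re.compile(r"(?<!\w)#(\w+)")


def extract_tags(text):
    """Return the #-prefixed tokens in `text`, without the leading '#'."""
    return TAG_RE.findall(text)


tags = extract_tags("hello world #mention a #hashtag")
# tags == ['mention', 'hashtag']
```

The extracted names can then go into their own table with a foreign key back to the message, while the plain text is still indexed with to_tsvector.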

Understanding SetFilter in sphinx

I am using Sphinx, but not fully understanding how SetFilter works. My assumption as of now is that in the Sphinx Config I have my query:
SELECT ID, Kittens, Puppies FROM db_animals;
And then I put in the attributes I would like to filter on:
sql_attr_uint = puppies
Then when I call Sphinx if I want to filter on one of these attributes I put this:
$cl->SetFilter( 'puppies', array($puppyID));
So, if $puppyID = 7
Sphinx will only return rows where the puppies column is set to 7.
Am I interpreting this correctly? Anything wrong here?
Yes to your first question. And no to the second.

Sqlalchemy with postgres. Try to get 'DISTINCT ON' instead of 'DISTINCT'

I need to generate a query like this:
SELECT DISTINCT ON (article.code) article.code, article.title
First I tried the ORM distinct method, passing it a list of fields, but that didn't work. Then I tried sqlalchemy.sql.select, and it also generated a query like this:
SELECT DISTINCT article.code, article.title
I need SELECT DISTINCT ON (article.code)...
I looked at the source code and found that sqlalchemy.dialects.postgresql.base.PGCompiler.get_select_precolumns has code for generating constructs like 'DISTINCT ON'.
But this method is not called. Another method, sqlalchemy.sql.compiler.get_select_precolumns, is called instead, and it only has code for generating DISTINCT, not DISTINCT ON. Maybe I should configure my session so that the proper method is called?
This bug report suggests that DISTINCT ON works correctly in SQLAlchemy 0.7+. I think an upgrade is in order, unless you've uncovered a bug in 0.7.
Workarounds:
Volunteer to help get the 0.7 package ready for Ubuntu.
Download and install from source.
Rewrite queries to avoid DISTINCT ON. I'm not sure whether that's possible in the most general case.
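For the two-column query in the question, one possible rewrite is GROUP BY with an aggregate, which keeps one row per code. The sketch below uses Python's built-in sqlite3 module only so that it runs without a PostgreSQL server. The table contents are invented, and picking MIN(title) is an assumption about which title you want for each code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (code TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO article (code, title) VALUES (?, ?)",
    [("a1", "Beta"), ("a1", "Alpha"), ("b2", "Gamma")],  # made-up rows
)

# One row per code, taking the alphabetically first title for each.
rows = conn.execute(
    "SELECT code, MIN(title) AS title FROM article GROUP BY code ORDER BY code"
).fetchall()
# rows == [('a1', 'Alpha'), ('b2', 'Gamma')]
```

This stops working once you need several columns taken from the same chosen row, which is where the general case gets harder.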