SphinxQL - how to filter behind match - sphinx

I'm working on a project where I use Sphinx searchengine. But - as I realized - the Sphinx documentation is big but hard to understand.
So I was not able to find any information on how to use the WHERE clause to filter behind a MATCH-statement. What I tried yet is:
"SELECT *, country FROM all_gebrauchte_products WHERE MATCH('#searchtext (".$searchQuery.")') AND country='".$where."' ORDER BY WEIGHT() DESC LIMIT ".$page.", ".$limit." OPTION ranker=expr('sum(lcs)')"
If I use it without the country=$where clause, I get back many GUIDs but from different countries. So somehow I have to filter the country column;
If I use the above statement, I get error:
Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[42000]: Syntax error or access violation: 1064 index all_gebrauchte_products: parse error: unknown column: country'
But I set the index like this:
sql_query_range = SELECT MIN(gebr_id), MAX(gebr_id) FROM all_gebrauchte_products
sql_range_step = 10000
sql_query = \
SELECT a.gebr_id AS guid, 'products' AS data_type, a.gebr_products AS products, a.gebr_user AS username, a.gebr_date AS datadate, CONCAT(a.gebr_hersteller,' ', a.gebr_modell,' ', a.gebr_ukat,' ', a.gebr_kat,' ', a.gebr_bemerkung) AS searchtext, a.gebr_bild1 AS image1, a.gebr_bild2 AS image2, a.gebr_bild3 AS image3, a.gebr_bild4 AS image4, a.gebr_bild5 AS image5, b.h_land AS country, b.h_web AS weblink, b.h_firmenname AS company, b.h_strasse AS street, b.h_plz AS zipcode, b.h_ort AS city, a.gebr_aktiv AS active \
FROM all_gebrauchte_products a, all_haendler b \
WHERE a.gebr_user = b.h_loginname AND a.gebr_id>=$start AND a.gebr_id<=$end
sql_attr_uint = active
Can anybody tell me what is going wrong? Or how do I have to filter for country?
Thnx. in advance for your help.

Any columns in the sql_query you dont make an ATTRIBUTE, is automatically a FIELD (except the first column is always the document-id).
FIELDs are 'full-text' indexed, they are what you can match in the query - ie the MATCH(...) clause.
ATTRIBUTES are what can be 'filtered' in WHERE, sorted by in ORDER BY, grouped in GROUP BY, or retrieved in the SELECT (or even used in ranking expressions).
So you need country to be an ATTRIBUTE to be able use it in WHERE filter
You don't say but guess it's a string. You can use sql_field_string to make a column BOTH a FIELD and ATTRIBUTE, if you are still interested in being able to full-text query the column too.
(also because its a string, need a very recent version of sphinx. Sphinx only recently gained ability to filter by strings attributes)

Related

Sphinx 3 index a relational database

Here is my Database scheme:
Restaurants -> id, title
Categories -> id, restaurant_id, type(enum: cafe, restaurant, fastfood, burger, chiness)
Reviews -> id, restaurant_id, body
Reviews belong to a restaurant and a restaurant belongs to multiple categories
how can we query reviews belong to a category?
here is the current config:
sql_query = select reviews.id as id, reviews.body as body, restaurants.url as url, restaurants.id as restaurant_id from reviews inner join restaurants on(restaurants.id = reviews.restaurant_id)
sql_attr_multi = uint category_id from query; SELECT restaurant_id, categories.id as category_id FROM categories
the problem is in the sql_attr_multi it replaces restaurant_id by the reviews.id! it doesn't know the restaurant_id in the sql_attr_multi and it thinks we mean the reviews.id by it!
Firstly the first column in the resultset in sql_attr_multi, should be the 'document_id' from the main query. In your example the document_id is reviews.id, but then seem to use restaruant_id in the MVA. It can't match arbitrary attributes like that.
So would actually need something like
sql_attr_multi = uint category_id from query; \
SELECT reviews.id, categories.id as category_id \
FROM categories \
INNER JOIN reviews ON(reviews.restaurant_id = categories.restaurant_id) \
ORDER BY reviews.id
To get the review id to perform the join. Split into lines to make it easier to read. Have found it best to explictly order the rows via document_id.
But that still wouldnt let you query via the 'type' , ie 'cafe' etc, because your 'categories.id' seems to be a unique id per restaurant anyway.
(ie each cafe would would a different categories.id!)
So actually seems you would want the 'type' as the MVA value. MVAs are numberic (not string), but luckily, enum is easily converted to a integer!
sql_attr_multi = uint category_type from query; \
SELECT reviews.id, type+0 \
FROM categories \
INNER JOIN reviews ON(reviews.restaurant_id = categories.restaurant_id) \
ORDER BY reviews.id
The column names in the sql_attr_multi dont matter, just uses first column as document_id, second as the MVA value.
THen knowing cafe (for example) is the first value in enum, its 1. So query reivews for cafe... (burger would be 4 for example)
SphinxQL> SELECT * FROM myindex WHERE MATCH('keyword') AND category_type = 1

oracle: grouping on merged columns

I have a 2 tables FIRST
id,rl_no,adm_date,fees
1,123456,14-11-10,100
2,987654,10-11-12,30
3,4343,14-11-17,20
and SECOND
id,rollno,fare,type
1,123456,20,bs
5,634452,1000,bs
3,123456,900,bs
4,123456,700,bs
My requirement is twofold,
1, i first need to get all columns from both tables with common rl_no. So i used:
SELECT a.ID,a.rl_no,a.adm_date,a.fees,b.rollno,b.fare,b.type FROM FIRST a
INNER JOIN
SECOND b ON a.rl_no = b.rollno
The output is like this:
id,rl_no,adm_date,fees,rollno,fare,type
1,123456,14-11-10,100,123456,20,bs
1,123456,10-11-12,100,123456,900,bs
1,123456,14-11-17,100,123456,700,bs
2,Next i wanted to get the sum(fare) of those rollno that were common between the 2 tables and also whose fare >= fees from FIRST table group by rollno and id.
My query is:
SELECT x.ID,x.rl_no,,x.adm_date,x.fees,x.rollno,x.type,sum(x.fare) as "fare" from (SELECT a.ID,a.rl_no,a.adm_date,a.fees,b.rollno,b.fare,b.type FROM FIRST a
INNER JOIN
SECOND b ON a.rl_no = b.rollno) x, FIRST y
WHERE x.rollno = y.rl_no AND x.fare >= y.fees AND x.type IS NOT NULL GROUP BY x.rollno,x.ID ;
But this is throwing in exceptions.
ORA-00979: not a GROUP BY expression
00979. 00000 - "not a GROUP BY expression"
The expected output will be like this:
id,rollno,adm_date,fare,type
1,123456,14-11-10,1620,bs
So could someone care to show an oracle newbie what i'm doing wrong here?
It looks like there's a couple different problems here;
Firstly, you're trying to group by an x.ID column which doesn't exist; it looks like you'll want to add ID to the selected columns in your sub-query.
Secondly, when aggregating with GROUP BY, all selected columns need to be either listed in the GROUP BY statement or aggregated. If you're grouping by rollno and ID, what do you want to have happen to all the extra values for adm_date, fees, and type? Are those always going to be the same for each distinct rollno and ID pair?
If so, simply add them to the GROUP BY statement, ie,
GROUP BY adm_date, fees, type, rollno, ID
If not, you'll need to work out exactly how you want to select which one to be output; If you've got output like your example (adding in an ID column here)
ID,adm_date,fees,rollno,fare,type
1,14-11-10,100,123456,20,bs
1,10-11-12,100,123456,900,bs
1,14-11-17,100,123456,700,bs
Call that result set 'a'. If I run;
SELECT a.ID, a.rollno, SUM(a.fare) as total_fare
FROM a
GROUP BY a.ID, a.rollno
Then the result will be a single row;
ID,rollno,total_fare
1,123456,1620
So, if you also select the adm_date, fees, and type columns, oracle has no idea what you mean to do with them. You're not using them for grouping, and you're not telling oracle how you want to pick which one to use.
You could do something like
SELECT a.ID,
FIRST(a.adm_date) as first_adm_date,
FIRST(a.fees) as first_fees,
a.rollno,
SUM(a.fare) as total_fare,
FIRST(a.type) as first_type
FROM a
GROUP BY a.ID, a.rollno
Which would give the result;
ID,first_adm_date,first_fees,rollno,total_fare,first_type
1,14-11-10,100,123456,1620,bs
I'm not sure if that's what you mean to do though.

How can I orderby id when I use distinct on Web2py

I'm using postgresql database and have a log table. I want to show the ticket information from this log table and want to sort by id. But ticket id has duplicate data, so I use distinct to filter, and then I can't sort by id when I use distinct.
How can I solve this issue? Thanks!
rows = db(db.log.ticket_id != '').select(db.log.ALL, orderby=~db.log.id, distinct=db.log.ticket_id , limitby=((page-1) * PAGE_ROWS, (page*PAGE_ROWS)))
I got the error message:
SELECT DISTINCT ON expressions must match initial ORDER BY expressions
And I try to:
rows = db(db.log.ticket_id != '').select(db.log.ALL, orderby=~db.log.id|db.log.ticket_id, distinct=db.log.ticket_id , limitby=((page-1) * PAGE_ROWS, (page*PAGE_ROWS)))
But still can't work...

PostgreSQL 9.1: Find exact expression in title

I'm looking for a way to search a specific expression - then a part of its - in all documents (and its values associated). The final order should be:
Complete expression (in title or content): using ILIKE and '%expression%'
One of the word (in title or content): using tsquery on tsvector indexes columns
I have two tables:
documents (id [integer], title [character varying], title_search [tsvector])
values (id [integer], content [character varying], content_search [tsvector], id_document [integer])
Here the request I am doing right now:
(SELECT id, title, content, title_search, content_search, ts_rank_cd(title_search, query) AS rank
FROM to_tsquery('lorem&ipsum|(lorem|ipsum)') query, documents
LEFT JOIN "values" ON id_document=id
WHERE (title ILIKE(unaccent('%lorem ipsum%')) OR content ILIKE(unaccent('%lorem ipsum%'))))
UNION (SELECT id, title, content, title_search, content_search, ts_rank_cd(title_search, query) AS rank
FROM to_tsquery('lorem&ipsum|(lorem|ipsum)') query, documents
LEFT JOIN "values" ON id_document=id
WHERE query ## title_search)
UNION (SELECT id, title, content, title_search, content_search, ts_rank_cd(title_search, query) AS rank
FROM to_tsquery('lorem&ipsum|(lorem|ipsum)') query, documents
LEFT JOIN "values" ON id_document=id
WHERE query ## content_search)
ORDER BY rank DESC, title ASC
By doing this, I can get all documents with this expression and/or a part of its but I can't get to have those correctly ordrered. This is because I am relying on ranking with ts_rank on tsvector field which cannot be used to define exact expression.
So my questions are how I can get this to work as I expect? Am I wrong for using full text search?
Thank you.
It's a bit awkward but a solution I've used before is to include an extra "rank" column in your individual subqueries. For instance, the fist query would look like
select 1 as which_rank, id, title, ...
....
where title ILIKE(unaccent('%lorem ipsum%'))
OR content ILIKE(unaccent('%lorem ipsum%')))
then the second would be
select 2 as which_rank, id, title, ...
...
where query ## title_search
and the third would be
select 3 as which_rank, id, title, ...
...
where query ## content_search
If you include that ranking value in your sort order:
ORDER BY which_rank asc, rank DESC, title ASC
you can make sure the first case gets listed first, the second second, and the third third. You can also re-arrange which is 1, 2, 3 depending on your needs.

Select most reviewed courses starting from courses having at least 2 reviews

I'm using Flask-SQLAlchemy with PostgreSQL. I have the following two models:
class Course(db.Model):
id = db.Column(db.Integer, primary_key = True )
course_name =db.Column(db.String(120))
course_description = db.Column(db.Text)
course_reviews = db.relationship('Review', backref ='course', lazy ='dynamic')
class Review(db.Model):
__table_args__ = ( db.UniqueConstraint('course_id', 'user_id'), { } )
id = db.Column(db.Integer, primary_key = True )
review_date = db.Column(db.DateTime)#default=db.func.now()
review_comment = db.Column(db.Text)
rating = db.Column(db.SmallInteger)
course_id = db.Column(db.Integer, db.ForeignKey('course.id') )
user_id = db.Column(db.Integer, db.ForeignKey('user.id') )
I want to select the courses that are most reviewed starting with at least two reviews. The following SQLAlchemy query worked fine with SQlite:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).group_by(models.Review.course_id).\
having(func.count(models.Review.course_id) >1) \ .order_by(func.count(models.Review.course_id).desc()).all()
But when I switched to PostgreSQL in production it gives me the following error:
ProgrammingError: (ProgrammingError) column "review.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT review.id AS review_id, review.review_date AS review_...
^
'SELECT review.id AS review_id, review.review_date AS review_review_date, review.review_comment AS review_review_comment, review.rating AS review_rating, review.course_id AS review_course_id, review.user_id AS review_user_id, count(review.course_id) AS count_1 \nFROM review GROUP BY review.course_id \nHAVING count(review.course_id) > %(count_2)s ORDER BY count(review.course_id) DESC' {'count_2': 1}
I tried to fix the query by adding models.Review in the GROUP BY clause but it did not work:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).group_by(models.Review.course_id).\
having(func.count(models.Review.course_id) >1) \.order_by(func.count(models.Review.course_id).desc()).all()
Can anyone please help me with this issue. Thanks a lot
SQLite and MySQL both have the behavior that they allow a query that has aggregates (like count()) without applying GROUP BY to all other columns - which in terms of standard SQL is invalid, because if more than one row is present in that aggregated group, it has to pick the first one it sees for return, which is essentially random.
So your query for Review basically returns to you the first "Review" row for each distinct course id - like for course id 3, if you had seven "Review" rows, it's just choosing an essentially random "Review" row within the group of "course_id=3". I gather the answer you really want, "Course", is available here because you can take that semi-randomly selected Review object and just call ".course" on it, giving you the correct Course, but this is a backwards way to go.
But once you get on a proper database like Postgresql you need to use correct SQL. The data you need from the "review" table is just the course_id and the count, nothing else, so query just for that (first assume we don't actually need to display the counts, that's in a minute):
most_rated_course_ids = session.query(
Review.course_id,
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
order_by(func.count(Review.course_id).desc()).\
all()
but that's not your Course object - you want to take that list of ids and apply it to the course table. We first need to keep our list of course ids as a SQL construct, instead of loading the data - that is, turn it into a derived table by converting the query into a subquery (change the word .all() to .subquery()):
most_rated_course_id_subquery = session.query(
Review.course_id,
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
order_by(func.count(Review.course_id).desc()).\
subquery()
one simple way to link that to Course is to use an IN:
courses = session.query(Course).filter(
Course.id.in_(most_rated_course_id_subquery)).all()
but that's essentially going to throw away the "ORDER BY" you're looking for and also doesn't give us any nice way of actually reporting on those counts along with the course results. We need to have that count along with our Course so that we can report it and also order by it. For this we use a JOIN from the "course" table to our derived table. SQLAlchemy is smart enough to know to join on the "course_id" foreign key if we just call join():
courses = session.query(Course).join(most_rated_course_id_subquery).all()
then to get at the count, we need to add that to the columns returned by our subquery along with a label so we can refer to it:
most_rated_course_id_subquery = session.query(
Review.course_id,
func.count(Review.course_id).label("count")
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
subquery()
courses = session.query(
Course, most_rated_course_id_subquery.c.count
).join(
most_rated_course_id_subquery
).order_by(
most_rated_course_id_subquery.c.count.desc()
).all()
A great article I like to point out to people about GROUP BY and this kind of query is SQL GROUP BY techniques which points out the common need for the "select from A join to (subquery of B with aggregate/GROUP BY)" pattern.