Sphinx 3 index a relational database - sphinx

Here is my database schema:
Restaurants -> id, title
Categories -> id, restaurant_id, type(enum: cafe, restaurant, fastfood, burger, chiness)
Reviews -> id, restaurant_id, body
Reviews belong to a restaurant, and a restaurant belongs to multiple categories.
How can we query the reviews that belong to a category?
Here is the current config:
sql_query = select reviews.id as id, reviews.body as body, restaurants.url as url, restaurants.id as restaurant_id from reviews inner join restaurants on(restaurants.id = reviews.restaurant_id)
sql_attr_multi = uint category_id from query; SELECT restaurant_id, categories.id as category_id FROM categories
The problem is in sql_attr_multi: Sphinx treats the restaurant_id in its first column as if it were reviews.id. It doesn't know about restaurant_id there and assumes we mean reviews.id!

Firstly, the first column in the result set of sql_attr_multi should be the 'document_id' from the main query. In your example the document_id is reviews.id, but you then use restaurant_id in the MVA. It can't match on arbitrary attributes like that.
So you would actually need something like
sql_attr_multi = uint category_id from query; \
SELECT reviews.id, categories.id as category_id \
FROM categories \
INNER JOIN reviews ON(reviews.restaurant_id = categories.restaurant_id) \
ORDER BY reviews.id
This joins to reviews to get the review id. It is split into lines to make it easier to read. I have found it best to explicitly order the rows by document_id.
But that still wouldn't let you query via the 'type' (i.e. 'cafe' etc.), because your 'categories.id' seems to be a unique id per restaurant anyway.
(i.e. each cafe would have a different categories.id!)
So it actually seems you would want the 'type' as the MVA value. MVAs are numeric (not string), but luckily, an enum is easily converted to an integer!
sql_attr_multi = uint category_type from query; \
SELECT reviews.id, type+0 \
FROM categories \
INNER JOIN reviews ON(reviews.restaurant_id = categories.restaurant_id) \
ORDER BY reviews.id
The column names in the sql_attr_multi query don't matter; it just uses the first column as the document_id and the second as the MVA value.
Then, knowing that cafe (for example) is the first value in the enum, its value is 1 (burger would be 4, for example). So to query reviews for cafe:
SphinxQL> SELECT * FROM myindex WHERE MATCH('keyword') AND category_type = 1
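To see which (document_id, value) pairs the sql_attr_multi query above would feed to the indexer, here is a minimal Python sketch against an in-memory SQLite database. The sample rows are made up for illustration, and because SQLite has no enum type, the 1-based position that MySQL's type+0 returns is emulated by the hypothetical helper enum_index.

```python
import sqlite3

# Enum members in declaration order, copied from the question's schema.
ENUM_VALUES = ["cafe", "restaurant", "fastfood", "burger", "chiness"]

def enum_index(value):
    """Emulate MySQL's enum+0: the 1-based position of the member."""
    return ENUM_VALUES.index(value) + 1

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, restaurant_id INTEGER, type TEXT);
CREATE TABLE reviews (id INTEGER PRIMARY KEY, restaurant_id INTEGER, body TEXT);
-- Hypothetical sample data: restaurant 10 is both a cafe and a burger place.
INSERT INTO categories VALUES (1, 10, 'cafe'), (2, 10, 'burger'), (3, 20, 'fastfood');
INSERT INTO reviews VALUES (100, 10, 'great coffee'), (101, 20, 'quick lunch');
""")

# Same join as the sql_attr_multi query: first column is the document id
# (reviews.id), second column is the value that becomes the MVA entry.
rows = conn.execute("""
    SELECT reviews.id, categories.type
    FROM categories
    INNER JOIN reviews ON reviews.restaurant_id = categories.restaurant_id
    ORDER BY reviews.id, categories.id
""").fetchall()

mva_pairs = [(review_id, enum_index(type_name)) for review_id, type_name in rows]
print(mva_pairs)
```

Review 100 ends up with two MVA values (1 and 4), so a filter such as category_type = 1 would match it.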

Related

Using ANY with raw data work but not subquery

I just can't figure it out why this query work
SELECT id, name, organization_id
FROM facilities
WHERE organization_id = ANY(
'{abc-xyz-123,678-ght-nmp}'
)
But this query won't work; it fails with the error "operator does not exist: uuid = uuid[]":
SELECT id, name, organization_id
FROM facilities
WHERE organization_id = ANY(
SELECT organization_ids
FROM admins
WHERE id = 'jkl-iop-345'
)
When the subquery
SELECT organization_ids
FROM admins
WHERE id = 'jkl-iop-345'
gives the exact result of {abc-xyz-123,678-ght-nmp}.
I'm using postgres (PostgreSQL) 13.3
The subquery produces one row that contains an array.
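With = ANY (SELECT ...), the value on the left is compared against every row the subquery returns, and here that single row is itself an array. In effect you are testing against
{{abc-xyz-123,678-ght-nmp}}
which is an array of arrays, so PostgreSQL looks for a uuid = uuid[] operator and fails.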
You probably want
SELECT id, name, organization_id
FROM facilities
WHERE EXISTS (SELECT 1 FROM admins
WHERE admins.id = 'jkl-iop-345'
AND facilities.organization_id = ANY (admins.organization_ids)
);
Let me remark that storing references to other tables in an array, JSON or other composite data type is an exceptionally bad idea. A normalized schema with a junction table would serve you better.
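As a rough illustration of the junction-table suggestion, here is a minimal Python sketch using an in-memory SQLite database. The table name admin_organizations, the facility names, and the third organization id are assumptions made up for the example; in PostgreSQL the id columns would be of type uuid rather than text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE admins (id TEXT PRIMARY KEY);
CREATE TABLE facilities (id INTEGER PRIMARY KEY, name TEXT, organization_id TEXT);
-- Junction table (hypothetical name): one row per (admin, organization)
-- pair instead of an array column on admins.
CREATE TABLE admin_organizations (
    admin_id TEXT REFERENCES admins(id),
    organization_id TEXT,
    PRIMARY KEY (admin_id, organization_id)
);
INSERT INTO admins VALUES ('jkl-iop-345');
INSERT INTO admin_organizations VALUES
    ('jkl-iop-345', 'abc-xyz-123'),
    ('jkl-iop-345', '678-ght-nmp');
INSERT INTO facilities VALUES
    (1, 'North clinic', 'abc-xyz-123'),
    (2, 'South clinic', '678-ght-nmp'),
    (3, 'Other clinic', 'zzz-zzz-000');
""")

# With one organization per row, the subquery returns scalar values,
# so a plain IN (or = ANY in PostgreSQL) works without any array handling.
facilities = conn.execute("""
    SELECT f.id, f.name, f.organization_id
    FROM facilities AS f
    WHERE f.organization_id IN (
        SELECT ao.organization_id
        FROM admin_organizations AS ao
        WHERE ao.admin_id = ?
    )
    ORDER BY f.id
""", ("jkl-iop-345",)).fetchall()
print(facilities)
```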

SphinxQL - how to filter behind match

I'm working on a project where I use Sphinx searchengine. But - as I realized - the Sphinx documentation is big but hard to understand.
So I was not able to find any information on how to use the WHERE clause to filter behind a MATCH statement. What I have tried so far is:
"SELECT *, country FROM all_gebrauchte_products WHERE MATCH('@searchtext (".$searchQuery.")') AND country='".$where."' ORDER BY WEIGHT() DESC LIMIT ".$page.", ".$limit." OPTION ranker=expr('sum(lcs)')"
If I use it without the country=$where clause, I get back many GUIDs but from different countries. So somehow I have to filter the country column;
If I use the above statement, I get error:
Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[42000]: Syntax error or access violation: 1064 index all_gebrauchte_products: parse error: unknown column: country'
But I set the index like this:
sql_query_range = SELECT MIN(gebr_id), MAX(gebr_id) FROM all_gebrauchte_products
sql_range_step = 10000
sql_query = \
SELECT a.gebr_id AS guid, 'products' AS data_type, a.gebr_products AS products, a.gebr_user AS username, a.gebr_date AS datadate, CONCAT(a.gebr_hersteller,' ', a.gebr_modell,' ', a.gebr_ukat,' ', a.gebr_kat,' ', a.gebr_bemerkung) AS searchtext, a.gebr_bild1 AS image1, a.gebr_bild2 AS image2, a.gebr_bild3 AS image3, a.gebr_bild4 AS image4, a.gebr_bild5 AS image5, b.h_land AS country, b.h_web AS weblink, b.h_firmenname AS company, b.h_strasse AS street, b.h_plz AS zipcode, b.h_ort AS city, a.gebr_aktiv AS active \
FROM all_gebrauchte_products a, all_haendler b \
WHERE a.gebr_user = b.h_loginname AND a.gebr_id>=$start AND a.gebr_id<=$end
sql_attr_uint = active
Can anybody tell me what is going wrong? Or how do I have to filter for country?
Thanks in advance for your help.
Any column in the sql_query that you don't declare as an ATTRIBUTE is automatically a FIELD (except the first column, which is always the document ID).
FIELDs are full-text indexed; they are what you can match in the query, i.e. the MATCH(...) clause.
ATTRIBUTEs are what can be filtered in WHERE, sorted by in ORDER BY, grouped in GROUP BY, or retrieved in the SELECT (or even used in ranking expressions).
So you need country to be an ATTRIBUTE to be able to use it in a WHERE filter.
You don't say, but I guess it's a string. You can use sql_field_string to make a column BOTH a FIELD and an ATTRIBUTE, if you still want to be able to full-text query the column too.
(Also, because it's a string, you need a very recent version of Sphinx. Sphinx only recently gained the ability to filter by string attributes.)

PostgreSQL 9.1: Find exact expression in title

I'm looking for a way to search for a specific expression, and then for parts of it, in all documents (and their associated values). The final order should be:
Complete expression (in title or content): using ILIKE and '%expression%'
One of the words (in title or content): using tsquery on tsvector-indexed columns
I have two tables:
documents (id [integer], title [character varying], title_search [tsvector])
values (id [integer], content [character varying], content_search [tsvector], id_document [integer])
Here is the query I am running right now:
(SELECT id, title, content, title_search, content_search, ts_rank_cd(title_search, query) AS rank
FROM to_tsquery('lorem&ipsum|(lorem|ipsum)') query, documents
LEFT JOIN "values" ON id_document=id
WHERE (title ILIKE(unaccent('%lorem ipsum%')) OR content ILIKE(unaccent('%lorem ipsum%'))))
UNION (SELECT id, title, content, title_search, content_search, ts_rank_cd(title_search, query) AS rank
FROM to_tsquery('lorem&ipsum|(lorem|ipsum)') query, documents
LEFT JOIN "values" ON id_document=id
WHERE query @@ title_search)
UNION (SELECT id, title, content, title_search, content_search, ts_rank_cd(title_search, query) AS rank
FROM to_tsquery('lorem&ipsum|(lorem|ipsum)') query, documents
LEFT JOIN "values" ON id_document=id
WHERE query @@ content_search)
ORDER BY rank DESC, title ASC
By doing this, I can get all documents containing this expression and/or a part of it, but I can't get them correctly ordered. This is because I am relying on ranking with ts_rank_cd on a tsvector field, which cannot identify an exact expression.
So my questions are: how can I get this to work as I expect? Am I wrong to use full text search?
Thank you.
It's a bit awkward, but a solution I've used before is to include an extra "rank" column in your individual subqueries. For instance, the first query would look like
select 1 as which_rank, id, title, ...
....
where title ILIKE(unaccent('%lorem ipsum%'))
OR content ILIKE(unaccent('%lorem ipsum%'))
then the second would be
select 2 as which_rank, id, title, ...
...
where query @@ title_search
and the third would be
select 3 as which_rank, id, title, ...
...
where query @@ content_search
If you include that ranking value in your sort order:
ORDER BY which_rank asc, rank DESC, title ASC
you can make sure the first case gets listed first, the second second, and the third third. You can also re-arrange which is 1, 2, 3 depending on your needs.
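One thing to watch with this approach: once each branch carries a different which_rank, UNION no longer removes a document that matches several branches, because the rows now differ. The following Python sketch, using an in-memory SQLite database with made-up titles and plain LIKE in place of full text search, shows one possible way to keep only the best tier per document. The names tiers and best_rank are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);
-- Hypothetical sample rows.
INSERT INTO documents VALUES
    (1, 'ipsum only'),
    (2, 'lorem ipsum exact'),
    (3, 'lorem by itself');
""")

# Tier 1: the complete expression. Tier 2: any single word.
# Document 2 matches both tiers, so MIN(which_rank) keeps its best one.
rows = conn.execute("""
    SELECT id, title, MIN(which_rank) AS best_rank
    FROM (
        SELECT 1 AS which_rank, id, title
        FROM documents
        WHERE title LIKE '%lorem ipsum%'
        UNION ALL
        SELECT 2 AS which_rank, id, title
        FROM documents
        WHERE title LIKE '%lorem%' OR title LIKE '%ipsum%'
    ) AS tiers
    GROUP BY id, title
    ORDER BY best_rank ASC, title ASC
""").fetchall()
print(rows)
```

The exact-expression match is listed first, followed by the single-word matches in title order.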

Select most reviewed courses starting from courses having at least 2 reviews

I'm using Flask-SQLAlchemy with PostgreSQL. I have the following two models:
class Course(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    course_name = db.Column(db.String(120))
    course_description = db.Column(db.Text)
    course_reviews = db.relationship('Review', backref='course', lazy='dynamic')
class Review(db.Model):
    __table_args__ = (db.UniqueConstraint('course_id', 'user_id'), {})
    id = db.Column(db.Integer, primary_key=True)
    review_date = db.Column(db.DateTime)  # default=db.func.now()
    review_comment = db.Column(db.Text)
    rating = db.Column(db.SmallInteger)
    course_id = db.Column(db.Integer, db.ForeignKey('course.id'))
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))
I want to select the most-reviewed courses, starting with those that have at least two reviews. The following SQLAlchemy query worked fine with SQLite:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).\
    group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).all()
But when I switched to PostgreSQL in production it gives me the following error:
ProgrammingError: (ProgrammingError) column "review.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT review.id AS review_id, review.review_date AS review_...
^
'SELECT review.id AS review_id, review.review_date AS review_review_date, review.review_comment AS review_review_comment, review.rating AS review_rating, review.course_id AS review_course_id, review.user_id AS review_user_id, count(review.course_id) AS count_1 \nFROM review GROUP BY review.course_id \nHAVING count(review.course_id) > %(count_2)s ORDER BY count(review.course_id) DESC' {'count_2': 1}
I tried to fix the query by adding models.Review in the GROUP BY clause but it did not work:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).\
    group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).all()
Can anyone please help me with this issue? Thanks a lot.
SQLite and MySQL both have the behavior that they allow a query that has aggregates (like count()) without applying GROUP BY to all other columns - which in terms of standard SQL is invalid, because if more than one row is present in that aggregated group, it has to pick the first one it sees for return, which is essentially random.
So your query for Review basically returns to you the first "Review" row for each distinct course id - like for course id 3, if you had seven "Review" rows, it's just choosing an essentially random "Review" row within the group of "course_id=3". I gather the answer you really want, "Course", is available here because you can take that semi-randomly selected Review object and just call ".course" on it, giving you the correct Course, but this is a backwards way to go.
But once you get on a proper database like PostgreSQL, you need to use correct SQL. The data you need from the "review" table is just the course_id and the count, nothing else, so query just for that (for now, assume we don't actually need to display the counts; we'll get to that in a minute):
most_rated_course_ids = session.query(
Review.course_id,
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
order_by(func.count(Review.course_id).desc()).\
all()
but that's not your Course object - you want to take that list of ids and apply it to the course table. We first need to keep our list of course ids as a SQL construct, instead of loading the data - that is, turn it into a derived table by converting the query into a subquery (change the word .all() to .subquery()):
most_rated_course_id_subquery = session.query(
Review.course_id,
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
order_by(func.count(Review.course_id).desc()).\
subquery()
one simple way to link that to Course is to use an IN:
courses = session.query(Course).filter(
Course.id.in_(most_rated_course_id_subquery)).all()
but that's essentially going to throw away the "ORDER BY" you're looking for and also doesn't give us any nice way of actually reporting on those counts along with the course results. We need to have that count along with our Course so that we can report it and also order by it. For this we use a JOIN from the "course" table to our derived table. SQLAlchemy is smart enough to know to join on the "course_id" foreign key if we just call join():
courses = session.query(Course).join(most_rated_course_id_subquery).all()
then to get at the count, we need to add that to the columns returned by our subquery along with a label so we can refer to it:
most_rated_course_id_subquery = session.query(
Review.course_id,
func.count(Review.course_id).label("count")
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
subquery()
courses = session.query(
Course, most_rated_course_id_subquery.c.count
).join(
most_rated_course_id_subquery
).order_by(
most_rated_course_id_subquery.c.count.desc()
).all()
A great article I like to point out to people about GROUP BY and this kind of query is SQL GROUP BY techniques which points out the common need for the "select from A join to (subquery of B with aggregate/GROUP BY)" pattern.
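For readers who want to see the SQL shape of that "select from A join to (subquery of B with aggregate/GROUP BY)" pattern without setting up SQLAlchemy, here is a minimal Python sketch using the standard sqlite3 module. The sample courses and reviews and the alias counts are assumptions for illustration; the query is a hand-written approximation of the idea, not the exact SQL that SQLAlchemy emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (id INTEGER PRIMARY KEY, course_name TEXT);
CREATE TABLE review (id INTEGER PRIMARY KEY, course_id INTEGER, user_id INTEGER);
-- Hypothetical data: Algebra has 2 reviews, Biology 3, Chemistry 1.
INSERT INTO course VALUES (1, 'Algebra'), (2, 'Biology'), (3, 'Chemistry');
INSERT INTO review (course_id, user_id) VALUES
    (1, 1), (1, 2),
    (2, 1), (2, 2), (2, 3),
    (3, 1);
""")

# The derived table aggregates only course_id and its count, which is
# valid in any SQL dialect; course is then joined to that derived table.
rows = conn.execute("""
    SELECT course.id, course.course_name, counts.review_count
    FROM course
    JOIN (
        SELECT course_id, COUNT(course_id) AS review_count
        FROM review
        GROUP BY course_id
        HAVING COUNT(course_id) > 1
    ) AS counts ON counts.course_id = course.id
    ORDER BY counts.review_count DESC
""").fetchall()
print(rows)
```

Chemistry is dropped by the HAVING clause because it has a single review, and the remaining courses come back ordered by review count.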

Multiple WHERE clause for same Columns in TSQL

I am trying to query two tables that are in a one-to-many relationship.
What I've done is create a view, knowing that I might end up with multiple records for the first table.
My scenario is as follows: I have a table "Items" and table "Properties".
"Properties" table contains an ItemsId column, PropertyId, PropertyValueId columns.
"Items" table/object contains a list of "Properties".
How would I query that view to get all "Items" records that have a given combination of "PropertyId" and "PropertyValueId" values?
In other words something similar to:
WHERE
(PropertyId = @val1 AND PropertyValueId = @val2) OR
(PropertyId = @val3 AND PropertyValueId = @val4) OR
(PropertyId = @val5 AND PropertyValueId = @val6)
WHERE clause is just a loop over "Items.Properties" collection.
"Items" represents a table of Items being stored in the database. Each & every Item has some dynamic properties, one or more. That's why I have another table called "Properties". Properties table contains columns:
ItemId, PropertyId, PropertyValue
"Item" object has a collection of Properties/Values. Prop1:val1, Prop2:val2, etc ...
Thanks
I may not have understood your requirement (despite the update) - if this or any other answer doesn't solve the problem please add some sample data for Items, Properties and the output and then hopefully it would become clear.
If Items is a specification of the property name-value pairs that you need (and has nothing to do with ItemId on Properties which seems strange...)
select p.itemid
from properties p
where exists (select 1 from items i where i.propertyId = p.propertyId and i.propertyValueId = p.propertyValueId)
group by p.itemid
having count(distinct p.propertyid) = (select count(*) from items)
This returns a set of itemids that have one (and only one) property value for each property defined in items. You can put the items count into a variable if you want.
I would use a query like this:
SELECT ItemId
FROM ItemView
WHERE (PropertyId = @val1 AND PropertyValueId = @val2)
OR (PropertyId = @val3 AND PropertyValueId = @val4)
OR (PropertyId = @val5 AND PropertyValueId = @val6)
GROUP BY ItemId
HAVING COUNT(*) = 3
The WHERE clause is the same as in your question, it only allows a row to be selected if the row has a matching property. You only need to make sure additionally that the items obtained have all the properties in the filter, which is done in the above query with the help of the HAVING clause: you are requesting items with 3 specific properties, therefore the number of properties per item in your result set (COUNT(*)) should be equal to 3.
In a more general case, when the number of properties queried may be arbitrary, you should probably consider passing the arguments in the form of a table and join the view to it:
…
FROM ItemView v
INNER JOIN RequestedProperties r ON v.PropertyId = r.Id
AND v.PropertyValueId = r.ValueId
GROUP BY v.ItemId
HAVING COUNT(*) = (SELECT COUNT(*) FROM RequestedProperties)
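The table-driven variant can be tried out with a small Python sketch against an in-memory SQLite database. The sample rows are made up, and the sketch assumes the view never contains duplicate (ItemId, PropertyId, PropertyValueId) rows; if it could, COUNT(*) would overcount and the comparison would need adjusting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- A plain table stands in for the view here.
CREATE TABLE ItemView (ItemId INTEGER, PropertyId INTEGER, PropertyValueId INTEGER);
CREATE TABLE RequestedProperties (Id INTEGER, ValueId INTEGER);
-- Hypothetical data: items 1 and 3 have both requested pairs, item 2 only one.
INSERT INTO ItemView VALUES
    (1, 10, 100), (1, 20, 200), (1, 30, 300),
    (2, 10, 100), (2, 20, 999),
    (3, 10, 100), (3, 20, 200);
INSERT INTO RequestedProperties VALUES (10, 100), (20, 200);
""")

# Keep only rows that match a requested pair, then require that each item
# matched as many rows as there are requested pairs.
matching = conn.execute("""
    SELECT v.ItemId
    FROM ItemView AS v
    INNER JOIN RequestedProperties AS r
        ON v.PropertyId = r.Id AND v.PropertyValueId = r.ValueId
    GROUP BY v.ItemId
    HAVING COUNT(*) = (SELECT COUNT(*) FROM RequestedProperties)
    ORDER BY v.ItemId
""").fetchall()
print(matching)
```

Item 1 still qualifies even though it has an extra property (30, 300), since only the requested pairs are counted.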