Using Ecto for Postgres fulltext search on GIN indexes - postgresql

I have a simple model:
schema "torrents" do
field :name, :string
field :magnet, :string
field :leechers, :integer
field :seeders, :integer
field :source, :string
field :filesize, :string
timestamps()
end
And I want to search based on the name. I added the relevant extensions and indexes to my database and table.
def change do
create table(:torrents) do
add :name, :string
add :magnet, :text
add :leechers, :integer
add :seeders, :integer
add :source, :string
add :filesize, :string
timestamps()
end
execute "CREATE EXTENSION pg_trgm;"
execute "CREATE INDEX torrents_name_trgm_index ON torrents USING gin (name gin_trgm_ops);"
create index(:torrents, [:magnet], unique: true)
end
I'm trying to search using the search term, but I always get zero results.
def search(query, search_term) do
from(u in query,
where: fragment("? % ?", u.name, ^search_term),
order_by: fragment("similarity(?, ?) DESC", u.name, ^search_term))
end
SELECT t0."id", t0."name", t0."magnet", t0."leechers", t0."seeders", t0."source",
t0."filesize", t0."inserted_at", t0."updated_at" FROM "torrents"
AS t0 WHERE (t0."name" % $1) ORDER BY similarity(t0."name", $2) DESC ["a", "a"]
Is something wrong with my search function?

My initial guess is that, because you're using the % operator, the similarity threshold for a match is too high for your search terms. It defaults to 0.3 (meaning the strings' trigrams must be at least 30% similar). If that threshold isn't met, no results are returned.
If that is the issue, the threshold is configurable in a couple of ways. You can either use set_limit (docs here), or set the limit on a per-query basis.
The set_limit option can be a bit of a hassle, as it needs to be set on every connection. Ecto (through db_connection) has an option to set an after_connect callback function (docs here).
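To get intuition for why a short term like "a" stays below the default threshold, here is a rough Python sketch of pg_trgm-style similarity (an approximation that assumes single-word, alphanumeric inputs; the real extension also handles word boundaries and non-alphanumeric characters):

```python
def trigrams(s):
    # pg_trgm lowercases and pads each word with two leading spaces
    # and one trailing space before extracting three-character windows
    s = "  " + s.lower() + " "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def similarity(a, b):
    # shared trigrams divided by total distinct trigrams (Jaccard-style)
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

print(similarity("a", "abba"))    # ~0.167, well below the 0.3 default
print(similarity("abba", "abba")) # 1.0
```

So with % and the default threshold, a one-character search term will match almost nothing, which is consistent with the zero results above.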
To change the limit per query, you can use the similarity function in the where clause, like this:
def search(query, search_term, limit \\ 0.3) do
from(u in query,
where: fragment("similarity(?, ?) > ?", u.name, ^search_term, ^limit),
order_by: fragment("similarity(?, ?) DESC", u.name, ^search_term))
end
To start, I would try that with a limit of zero to see if you get any results.

Related

activerecord unique constraint on multiple records

How should I write the ActiveRecord migration to reflect this:
CREATE TABLE table (
c1 data_type,
c2 data_type,
c3 data_type,
UNIQUE (c2, c3)
);
This adds a unique constraint on one column, but what I'm looking for is a unique constraint on the combination of 2 columns, as explained in the section Creating a UNIQUE constraint on multiple columns.
EDIT
More precisely: I have a table account and a table balance_previous_month.
class CreateBalance < ActiveRecord::Migration[6.1]
def change
create_table :balance_previous_month do |t|
t.decimal :amount, :precision => 8, :scale => 2
t.date :value_date
t.belongs_to :account, foreign_key: true
t.timestamps
end
end
end
Since we're in January, the value date (i.e. balance at the end of the previous month) is 2020-12-31.
I want to put a constraint on the table balance_previous_month where per account_id, there can be only one value_date with a given amount. The amount can be updated, but a given account can't have 2 identical value_dates.
The link you added to the other post is not exactly equivalent to your request: one answer talks about enforcing uniqueness through the model, while the other talks about using an index, whereas your example uses a constraint. (Check this for more information on the difference between them.)
There are 2 places where you can enforce uniqueness, application and database and it can be done in both places at the same time as well.
Database
So if you want to enforce uniqueness by using an index you can use this:
def change
add_index :table, [:c2, :c3], unique: true
end
If you want to add a constraint as in your example, you will have to run a raw SQL query in your migration, as there is no built-in way in Rails to do that.
def up
execute <<-SQL
ALTER TABLE table
ADD UNIQUE (c2, c3)
SQL
end
Check the link above for more info about the difference between them.
Application
Enforcing uniqueness through the model:
validates :c2, uniqueness: { scope: :c3 }
Thanks to Daniel Sindrestean, this code works:
class CreateBalance < ActiveRecord::Migration[6.1]
def change
create_table :balance_previous_month do |t|
t.decimal :amount, :precision => 8, :scale => 2
t.date :value_date
t.belongs_to :account, foreign_key: true
t.timestamps
end
execute <<-SQL
ALTER TABLE balance_previous_month
ADD UNIQUE (account_id, value_date) -- ActiveRecord creates account_id
SQL
end
end
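For a quick sanity check of what that composite constraint enforces, here is a minimal sketch using Python's stdlib sqlite3 (SQLite enforces multi-column UNIQUE the same way as Postgres; table and column names follow the migration above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE balance_previous_month (
        id INTEGER PRIMARY KEY,
        account_id INTEGER,
        value_date TEXT,
        UNIQUE (account_id, value_date)
    )
""")
ins = "INSERT INTO balance_previous_month (account_id, value_date) VALUES (?, ?)"
conn.execute(ins, (1, "2020-12-31"))
conn.execute(ins, (2, "2020-12-31"))  # same date, different account: allowed
try:
    conn.execute(ins, (1, "2020-12-31"))  # duplicate pair: rejected
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```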

How to write sophisticated subquery as from clause in ecto?

I'm trying to write the following SQL query in Ecto syntax. The hard part is the subquery after FROM hierarchy, since it sits in the FROM clause, and I doubt that is possible in Ecto. Could I get the same effect with table joins, or even lateral joins, without a performance loss?
SELECT routes.id, routes.name
FROM routes
WHERE routes.id IN
(SELECT DISTINCT hierarchy.parent
FROM hierarchy,
(SELECT DISTINCT unnest(segments.rels) AS rel
FROM segments
WHERE ST_Intersects(segments.geom, ST_SetSrid(ST_MakeBox2D(ST_GeomFromText('POINT(1866349.262143 6886808.978425)', -1), ST_GeomFromText('POINT(1883318.282423 6876413.542579)', -1)), 3857))) AS anon_1
WHERE hierarchy.child = anon_1.rel)
I'm stuck on the following code:
hierarchy_subquery =
Hierarchy
|> distinct([h], h.parent)
Route
|> select([r], {r.id, r.name})
|> where([r], r.id in subquery(hierarchy_subquery))
Schemas:
defmodule MyApp.Hierarchy do
use MyApp.Schema
schema "hierarchy" do
field :parent, :integer
field :child, :integer
field :deph, :integer
end
end
defmodule MyApp.Route do
use MyApp.Schema
schema "routes" do
field :name, :string
field :intnames, :map
field :symbol, :string
field :country, :string
field :network, :string
field :level, :integer
field :top, :boolean
field :geom, Geo.Geometry, srid: 3857
end
end
defmodule MyApp.Segment do
use MyApp.Schema
schema "segments" do
field :ways, {:array, :integer}
field :nodes, {:array, :integer}
field :rels, {:array, :integer}
field :geom, Geo.LineString, srid: 3857
end
end
EDIT: I've tested the performance of various queries, and the one below is fastest:
from r in Route,
join: h in Hierarchy, on: r.id == h.parent,
join: s in subquery(
from s in Segment,
distinct: true,
where: fragment("ST_Intersects(?, ST_SetSrid(ST_MakeBox2D(ST_GeomFromText('POINT(1285982.015631 7217169.814674)', -1), ST_GeomFromText('POINT(2371999.313507 6454022.524275)', -1)), 3857))", s.geom),
select: %{rel: fragment("unnest(?)", s.rels)}
),
where: s.rel == h.child,
select: {r.id, r.name}
Results:
Planning time: ~0.605 ms Execution time: ~37.232 ms
The same query as above but join replaced by inner_lateral_join for segments subquery:
Planning time: ~1.353 ms Execution time: ~38.518 ms
Subqueries from answer:
Planning time: ~1.017 ms Execution time: ~41.288 ms
I thought that inner_lateral_join would be faster but it isn't. Does anybody know how to speed up this query?
Here is what I would try. I haven't verified it works but it should point to the proper direction:
segments =
from s in Segment,
where: fragment("ST_Intersects(?, ST_SetSrid(ST_MakeBox2D(ST_GeomFromText('POINT(1866349.262143 6886808.978425)', -1), ST_GeomFromText('POINT(1883318.282423 6876413.542579)', -1)), 3857))", s.geom),
distinct: true,
select: %{rel: fragment("unnest(?)", s.rels)}
hierarchy =
from h in Hierarchy,
join: s in subquery(segments),
where: h.child == s.rel,
distinct: true,
select: %{parent: h.parent}
routes =
from r in Route,
join: h in subquery(hierarchy),
where: r.top and r.id == h.parent
Things to keep in mind:
Start from the inner query and go to the outer one
To access the result of a subquery, you need to select a map in the subquery
Ecto only allows subqueries in from and join. The good news is that you can usually rewrite "x IN subquery" as a join
You can try to run each query individually and see if they work
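The last point about rewriting "x IN subquery" as a join can be illustrated in plain SQL (a toy sketch using Python's stdlib sqlite3 in place of Postgres, with routes/hierarchy trimmed to the relevant columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE routes (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE hierarchy (parent INTEGER, child INTEGER);
INSERT INTO routes VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO hierarchy VALUES (1, 10), (1, 11), (3, 12);
""")

# IN form: filter routes by a subquery of parent ids
in_form = conn.execute(
    "SELECT id, name FROM routes "
    "WHERE id IN (SELECT DISTINCT parent FROM hierarchy)").fetchall()

# JOIN form: join against the de-duplicated derived table instead
join_form = conn.execute(
    "SELECT r.id, r.name FROM routes r "
    "JOIN (SELECT DISTINCT parent FROM hierarchy) h ON r.id = h.parent").fetchall()

print(sorted(in_form) == sorted(join_form))  # both return routes 1 and 3
```

Note the DISTINCT in the derived table: without it, the join would return route 1 twice, whereas IN never duplicates rows.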

How to full text search more than one "model" field using Ecto and PostgreSQL

I'm using this search function in my controller:
def search(query, search_term) do
(from u in query,
where: fragment("to_tsvector(?) @@ plainto_tsquery(?)", u.name, ^search_term),
order_by: fragment("ts_rank(to_tsvector(?), plainto_tsquery(?)) DESC", u.name, ^search_term))
end
It's working for just one field of my model. I would like to search all fields or be able to search a selected few (name_label, contacts, ...) at the same time.
How to do it?
You can use
(to_tsvector(col1) || to_tsvector(col2)) @@ plainto_tsquery(?)
to concatenate text search vectors.

Select most reviewed courses starting from courses having at least 2 reviews

I'm using Flask-SQLAlchemy with PostgreSQL. I have the following two models:
class Course(db.Model):
id = db.Column(db.Integer, primary_key = True )
course_name =db.Column(db.String(120))
course_description = db.Column(db.Text)
course_reviews = db.relationship('Review', backref ='course', lazy ='dynamic')
class Review(db.Model):
__table_args__ = ( db.UniqueConstraint('course_id', 'user_id'), { } )
id = db.Column(db.Integer, primary_key = True )
review_date = db.Column(db.DateTime)#default=db.func.now()
review_comment = db.Column(db.Text)
rating = db.Column(db.SmallInteger)
course_id = db.Column(db.Integer, db.ForeignKey('course.id') )
user_id = db.Column(db.Integer, db.ForeignKey('user.id') )
I want to select the courses that are most reviewed, starting with at least two reviews. The following SQLAlchemy query worked fine with SQLite:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).\
    group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).all()
But when I switched to PostgreSQL in production it gives me the following error:
ProgrammingError: (ProgrammingError) column "review.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT review.id AS review_id, review.review_date AS review_...
^
'SELECT review.id AS review_id, review.review_date AS review_review_date, review.review_comment AS review_review_comment, review.rating AS review_rating, review.course_id AS review_course_id, review.user_id AS review_user_id, count(review.course_id) AS count_1 \nFROM review GROUP BY review.course_id \nHAVING count(review.course_id) > %(count_2)s ORDER BY count(review.course_id) DESC' {'count_2': 1}
I tried to fix the query by adding models.Review in the GROUP BY clause, but it did not work:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).\
    group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).all()
Can anyone please help me with this issue? Thanks a lot.
SQLite and MySQL both have the behavior that they allow a query that has aggregates (like count()) without applying GROUP BY to all other columns - which in terms of standard SQL is invalid, because if more than one row is present in that aggregated group, it has to pick the first one it sees for return, which is essentially random.
So your query for Review basically returns to you the first "Review" row for each distinct course id - like for course id 3, if you had seven "Review" rows, it's just choosing an essentially random "Review" row within the group of "course_id=3". I gather the answer you really want, "Course", is available here because you can take that semi-randomly selected Review object and just call ".course" on it, giving you the correct Course, but this is a backwards way to go.
But once you move to a stricter database like PostgreSQL, you need to use correct SQL. The data you need from the "review" table is just the course_id and the count, nothing else, so query just for that (for now, assume we don't actually need to display the counts; that comes in a minute):
most_rated_course_ids = session.query(
Review.course_id,
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
order_by(func.count(Review.course_id).desc()).\
all()
but that's not your Course object - you want to take that list of ids and apply it to the course table. We first need to keep our list of course ids as a SQL construct, instead of loading the data - that is, turn it into a derived table by converting the query into a subquery (change the word .all() to .subquery()):
most_rated_course_id_subquery = session.query(
Review.course_id,
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
order_by(func.count(Review.course_id).desc()).\
subquery()
one simple way to link that to Course is to use an IN:
courses = session.query(Course).filter(
Course.id.in_(most_rated_course_id_subquery)).all()
but that's essentially going to throw away the "ORDER BY" you're looking for and also doesn't give us any nice way of actually reporting on those counts along with the course results. We need to have that count along with our Course so that we can report it and also order by it. For this we use a JOIN from the "course" table to our derived table. SQLAlchemy is smart enough to know to join on the "course_id" foreign key if we just call join():
courses = session.query(Course).join(most_rated_course_id_subquery).all()
then to get at the count, we need to add that to the columns returned by our subquery along with a label so we can refer to it:
most_rated_course_id_subquery = session.query(
Review.course_id,
func.count(Review.course_id).label("count")
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
subquery()
courses = session.query(
Course, most_rated_course_id_subquery.c.count
).join(
most_rated_course_id_subquery
).order_by(
most_rated_course_id_subquery.c.count.desc()
).all()
A great article I like to point out to people about GROUP BY and this kind of query is SQL GROUP BY techniques which points out the common need for the "select from A join to (subquery of B with aggregate/GROUP BY)" pattern.
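That final pattern — aggregate in a derived table, then join it back so the count is available for both SELECT and ORDER BY — can be sketched in plain SQL (using Python's stdlib sqlite3 here; the tables mirror the models above, and the sample data is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (id INTEGER PRIMARY KEY, course_name TEXT);
CREATE TABLE review (id INTEGER PRIMARY KEY,
                     course_id INTEGER REFERENCES course(id));
INSERT INTO course (id, course_name) VALUES (1, 'SQL'), (2, 'Python'), (3, 'Rust');
INSERT INTO review (course_id) VALUES (1), (1), (1), (2), (2), (3);
""")

# Aggregate per course in a subquery, then JOIN it back to course
# so the count can be selected and ordered on.
rows = conn.execute("""
SELECT c.course_name, agg.cnt
FROM course AS c
JOIN (
    SELECT course_id, COUNT(*) AS cnt
    FROM review
    GROUP BY course_id
    HAVING COUNT(*) > 1
) AS agg ON agg.course_id = c.id
ORDER BY agg.cnt DESC
""").fetchall()
print(rows)  # course 3 has only one review, so it is filtered out
```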

Sunspot Rails to order search results by model id?

Assume that I have the following model and I have made it searchable with sunspot_rails.
class Case < ActiveRecord::Base
searchable do
end
end
The standard schema.xml of Sunspot in Rails declares id as an indexed field. When I use the web interface to access Solr and test queries, a query like:
http://localhost:8982/solr/select/?q=id%3A%22Case+15%22&version=2.2&start=0&rows=10&indent=on
which searches for Cases with id equal to Case 15 works fine and returns results.
The problem is when I carry out the search with Sunspot Rails in the rails console:
s = Case.search do
keywords('id:"Case 15"')
end
I get:
=> <Sunspot::Search:{:fl=>"* score", :rows=>10, :start=>0, :q=>"id:\"Case 15\"", :defType=>"dismax", :fq=>["type:Case"]}>
which show that it correctly puts in :q the correct query value, but the hits are 0:
s.hits
returns
=> []
If keywords only performs full-text search on the text fields and does not handle a field given before the colon :, then I can try the following:
s = Case.search do
with(:id, "Case 15")
end
but this fails with a Sunspot exception:
Sunspot::UnrecognizedFieldError: No field configured for Case with name 'id'
How can I search using the indexed standard solr/sunspot id field of my model?
And to make the question more useful, how can I order by the id. The following does not work:
s = Case.search do
keywords("xxxx")
order_by :id, :desc
end
It fails with the same error: Sunspot::UnrecognizedFieldError: No field configured for Case with name 'id'
The id that you are talking about is a Sunspot internal field and it should not be used directly.
Why not add your own id field (changing the variable name to avoid a name collision):
class Case < ActiveRecord::Base
searchable do
integer(:model_id) { |content| content.id }
end
end
and then
s = Case.search do
keywords("xxxx")
order_by :model_id, :desc
end
Other (messy) option would be to hack directly solr params:
s = Case.search do
keywords("xxxx")
adjust_solr_params(:sort, 'id desc')
end