I am using Sphinx, but not fully understanding how SetFilter works. My assumption as of now is that in the Sphinx Config I have my query:
SELECT ID, Kittens, Puppies FROM db_animals;
And then I put in the attributes I would like to filter on:
sql_attr_uint = puppies
Then when I call Sphinx if I want to filter on one of these attributes I put this:
$cl->SetFilter( 'puppies', array($puppyID));
So, if $puppyID = 7
Sphinx will only return rows where the puppies column is set to 7.
Am I interpreting this correctly? Anything wrong here?
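Yes to your first question, and no to the second: your interpretation is correct and nothing is wrong with the setup. Because SetFilter takes an array of values, you can also pass several IDs to match rows with any of them.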
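To make the filter semantics concrete, here is a small pure-Python sketch that mimics what the filter does to a set of rows. It is only a model of the behaviour, not Sphinx code: the set_filter function and the sample rows are hypothetical, and the exclude flag stands in for the optional exclude argument of SetFilter.

```python
def set_filter(rows, attribute, values, exclude=False):
    """Keep rows whose attribute is in `values` (or not in it when exclude=True)."""
    allowed = set(values)
    return [row for row in rows if (row[attribute] in allowed) != exclude]


# Hypothetical indexed rows, each with a `puppies` integer attribute.
rows = [
    {"id": 1, "puppies": 7},
    {"id": 2, "puppies": 3},
    {"id": 3, "puppies": 7},
]

# Equivalent in spirit to: $cl->SetFilter('puppies', array(7));
matching_ids = [row["id"] for row in set_filter(rows, "puppies", [7])]

# With exclude=True the same values are filtered out instead.
excluded_ids = [row["id"] for row in set_filter(rows, "puppies", [7], exclude=True)]
```

Only rows 1 and 3 survive the first filter, which matches the expectation in the question.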
I have a list of chemicals in my database and I provide our users with the ability to do a live search via our website. I use SQLAlchemy and the query I use looks something like this:
Compound.query.filter(Compound.name.ilike(f'%{name}%')).limit(50).all()
When someone searches for toluene, for example, they don't get the result they're looking for because there are many chemicals that have the word toluene in them, such as:
2, 4 Dinitrotoluene
2-Chloroethyl-p-toluenesulfonate
4-Bromotoluene
6-Amino-m-toluenesulfonic acid
a,2,4-trichlorotoluene
a,o-Dichlorotoluene
a-Bromtoluene
etc...
I realize I could increase my limit, but I feel like 50 is more than enough. Or I could change the ilike(f'%{name}%') to something like ilike(f'{name}%'), but our business requirements don't want this. What I'd rather do is improve the ability for Postgres to return results so that toluene is at the top of the search results.
Any ideas on how to improve Postgres' ilike capability?
Thanks in advance.
One option is to better rank the results. Postgres text search allows you to rank the results.
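A quick and dirty version of preferential ranking is to run three queries, name = ?, ilike(f'{name}%'), and ilike(f'%{name}%'), and combine them with a union, tagging each with a rank to sort on (a union by itself does not guarantee any order). That way exact matches come first, then the ilike(f'{name}%') results, then everything else.
And rather than a hard limit, offer pagination. Flask-SQLAlchemy has paginate to help.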
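The same ranking idea can also be written as one query that orders by a match rank instead of a union. The sketch below is an assumption-laden illustration, not the asker's code: it uses the standard-library sqlite3 module as a stand-in for Postgres (SQLite's LIKE is case-insensitive for ASCII, so it plays the role of ILIKE), and the compound table and search function are hypothetical. Wildcard characters in the search term are not escaped here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compound (name TEXT)")
conn.executemany(
    "INSERT INTO compound (name) VALUES (?)",
    [
        ("4-Bromotoluene",),
        ("Toluene-d8",),
        ("2-Chloroethyl-p-toluenesulfonate",),
        ("Toluene",),
    ],
)


def search(term, limit=50):
    # Rank 0 = exact match, 1 = prefix match, 2 = match anywhere in the name.
    sql = """
        SELECT name
        FROM compound
        WHERE name LIKE '%' || :term || '%'
        ORDER BY CASE
                   WHEN lower(name) = lower(:term) THEN 0
                   WHEN name LIKE :term || '%' THEN 1
                   ELSE 2
                 END,
                 name
        LIMIT :limit
    """
    return [row[0] for row in conn.execute(sql, {"term": term, "limit": limit})]


results = search("toluene")
```

In Postgres the LIKE comparisons would become ILIKE, and the same CASE expression could be used in an ORDER BY.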
ILIKE yields a boolean. It doesn't specify what order to return the results, just whether to return them at all (you can order by a boolean, but if you only return trues there is nothing left to order by). So by the time you are done improving it, it would no longer be ILIKE at all but something else completely.
You might be looking for something like <-> from pg_trgm, which provides a distance score which can be sorted on. Although really, you could just order the result based on the length of the compound name, and return the shortest 50 that contain the target.
"something like ilike(f'{name}%') but our business requirements don't want this"
Isn't your business requirement to get better results?
But at least in my database, this could just return a bunch of names in inverted format, like toluene, 2,4-dinitro, so the results might not be much better, unless you avoid storing such inverted names. Sorting by either <-> or by length would overcome that problem. But they would also penalize toluene, ACS reagent grade 99.99% by HPLC, should you have names like that.
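As a rough illustration of the length-based ordering, and of the penalty it puts on long descriptive names, here is a minimal pure-Python sketch. The shortest_matches function and the sample list are hypothetical; in the database the same effect would come from ordering by the length of the name column.

```python
names = [
    "2-Chloroethyl-p-toluenesulfonate",
    "Toluene, ACS reagent grade 99.99% by HPLC",
    "4-Bromotoluene",
    "Toluene",
    "a-Bromtoluene",
]


def shortest_matches(term, candidates, limit=50):
    # Case-insensitive "contains" test, like ilike(f'%{name}%').
    needle = term.lower()
    hits = [name for name in candidates if needle in name.lower()]
    # Shortest names first; ties are broken alphabetically.
    return sorted(hits, key=lambda name: (len(name), name))[:limit]


ranked = shortest_matches("toluene", names)
```

Plain Toluene comes out on top, while the long reagent-grade name sinks to the bottom even though it starts with the search term.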
code:
SELECT * FROM `detail` WHERE country='Malaysia' or state='' or region='' ORDER BY rand() LIMIT 4
In this query I want to find the records related to Malaysia. My table has only one record with country='Malaysia', but the query returns 4 records. I have no idea why this is happening. How can I solve this issue? Please help me.
Thank You
You are also including records which have an empty string for the state or region, because the conditions are joined with OR. Maybe you should just be checking the country field:
SELECT *
FROM detail
WHERE country = 'Malaysia'
ORDER BY rand()
LIMIT 4;
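To see how the OR conditions widen the result, here is a small sketch using Python's standard-library sqlite3 module as a stand-in for the real database. The table contents are made up for illustration, and the random ordering is left out because it does not affect which rows match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detail (country TEXT, state TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO detail VALUES (?, ?, ?)",
    [
        ("Malaysia", "Selangor", "West"),
        ("India", "", "North"),   # empty state
        ("Japan", "Kanto", ""),   # empty region
        ("Brazil", "", ""),       # both empty
    ],
)

# Original condition: any row with an empty state or region also matches.
with_or = conn.execute(
    "SELECT COUNT(*) FROM detail "
    "WHERE country = 'Malaysia' OR state = '' OR region = ''"
).fetchone()[0]

# Fixed condition: only the country is checked.
country_only = conn.execute(
    "SELECT COUNT(*) FROM detail WHERE country = 'Malaysia'"
).fetchone()[0]
```

With this sample data the original condition matches all four rows, while the country-only condition matches one.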
You can also remove
ORDER BY rand()
if you don't need the rows in random order. It only shuffles the matching rows before LIMIT picks 4 of them, so with a single matching record it makes no difference.
Is it possible to compare two database fields in the query API? For example, I want to compare the fields tstamp and crdate like:
SELECT * FROM tt_content WHERE tstamp > crdate;
I could not find a solution in the query API. Fetching all records and comparing the fields in a loop does not perform well, because there could be over 2 million records (in my real case).
Thanks for your help.
The only way I can think of (and that the query builder supports) is to directly supply the statement. It'll look like this:
$query = $contentElementRepository->createQuery();
$query->statement('SELECT * FROM tt_content WHERE tstamp > crdate');
$matchingContentElements = $query->execute();
This probably breaks the database abstraction layer, so use it with caution. statement() takes the query parameters as a second argument, in case you need some user input in the query.
Maybe there is another way to do this that I don't know of. I'd be really interested in it myself.
There are a few topics about this already with accepted answers, but I couldn't figure out a solution based on those.
E.g.:
Ruby on Rails: must appear in the GROUP BY clause or be used in an aggregate function
GroupingError: ERROR: column must appear in the GROUP BY clause or be used in an aggregate function
PGError: ERROR: column "p.name" must appear in the GROUP BY clause or be used in an aggregate function
My query is:
Idea.unscoped.joins('inner join likes on ideas.id = likes.likeable_id').
select('likes.id, COUNT(*) AS like_count, ideas.id, ideas.title, ideas.intro, likeable_id').
group('likeable_id').
order('like_count DESC')
This is fine in development with SQLite but breaks on Heroku with PostgreSQL.
The error is:
PG::GroupingError: ERROR: column "likes.id" must appear in the GROUP BY clause or be used in an aggregate function
If I put likes.id in my group by then the results make no sense. I tried putting group before select, but that doesn't help. I even tried splitting the query into two parts. No joy. :(
Any suggestions appreciated. TIA!
I don't know why you want to select likes.id in the first place. I see that you basically want the like_count for each Idea; I don't see the point in selecting likes.id. Also, when you already have the ideas.id, I don't see why you would want to get the value of likes.likeable_id since they'll both be equal. :/
Anyway, the problem is that since you're grouping by likeable_id (basically ideas.id), you can't "select" likes.id: each group can contain many different likes.id values, and they are "lost" by the grouping.
SQLite is lax about this. It accepts the query and returns likes.id from an arbitrary row of each group, whereas PostgreSQL rejects it.
ANYWAY(2) =>
Let me propose a cleaner solution.
# model
class Idea < ActiveRecord::Base
  # to save you the effort of specifying the join-conditions
  has_many :likes, foreign_key: :likeable_id
end

# in your code elsewhere
ideas = \
  Idea.
    joins(:likes).
    group("ideas.id").
    select("COUNT(likes.id) AS like_count, ideas.id, ideas.title, ideas.intro").
    order("like_count DESC")
If you still want to get the IDs of likes for each item, then after the above, here's what you could do:
grouped_like_ids = \
  Like.
    select(:id, :likeable_id).
    each_with_object({}) do |like, hash|
      (hash[like.likeable_id] ||= []) << like.id
    end

ideas.each do |idea|
  # selected previously:
  idea.like_count
  idea.id
  idea.title
  idea.intro

  # from the hash
  like_ids = grouped_like_ids[idea.id] || []
end
Other readers: I'd be very interested in a "clean" one-query non-sub-query solution. Let me know in the comments if you leave a response. Thanks.
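One possible single-query approach is to collect the like IDs with an aggregate function, so they no longer conflict with the GROUP BY. The sketch below shows the idea with Python's standard-library sqlite3 module and SQLite's group_concat. The sample tables and rows are made up; on PostgreSQL the analogous aggregate would be array_agg(likes.id), and whether it fits the Rails query above is an assumption to verify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ideas (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE likes (id INTEGER PRIMARY KEY, likeable_id INTEGER);
    INSERT INTO ideas (id, title) VALUES (1, 'First'), (2, 'Second');
    INSERT INTO likes (id, likeable_id) VALUES (10, 1), (11, 2), (12, 2);
""")

rows = conn.execute("""
    SELECT ideas.id,
           ideas.title,
           COUNT(likes.id)        AS like_count,
           group_concat(likes.id) AS like_ids
    FROM ideas
    INNER JOIN likes ON likes.likeable_id = ideas.id
    GROUP BY ideas.id, ideas.title
    ORDER BY like_count DESC
""").fetchall()

# group_concat returns a comma-separated string in no guaranteed order,
# so parse it and sort the IDs for a stable result.
summary = [
    (idea_id, title, count, sorted(int(x) for x in ids.split(",")))
    for idea_id, title, count, ids in rows
]
```

Every selected column is either in the GROUP BY or wrapped in an aggregate, which is exactly what the PostgreSQL error message asks for.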
What differences are there between group by and group sort?
How to implement set select in sphinx SE?
groupby: what function and attribute to use for grouping
groupsort: what order to retrieve the groups in.
Use 'select', but make sure your sphinxSE table has the right columns to receive the output.
(btw, it's sphinxSE, not "sphinx SE")
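The difference between groupby and groupsort can be pictured with a rough pure-Python model. This is not Sphinx code, only a sketch of the behaviour: the group_matches function, the weight field, and the sample matches are hypothetical. Grouping keeps the best match of each group, and the group sort then decides the order in which those groups come back.

```python
def group_matches(matches, group_attr, group_sort_key, descending=True):
    # "groupby": bucket matches by the attribute, keeping the best one per group.
    best = {}
    for match in matches:
        key = match[group_attr]
        if key not in best or match["weight"] > best[key]["weight"]:
            best[key] = match
    # "groupsort": order the resulting groups.
    return sorted(best.values(), key=group_sort_key, reverse=descending)


matches = [
    {"id": 1, "category": 5, "weight": 10},
    {"id": 2, "category": 5, "weight": 30},
    {"id": 3, "category": 9, "weight": 20},
]

# Same grouping, two different group orders.
by_group_value = [m["id"] for m in group_matches(matches, "category", lambda m: m["category"])]
by_weight = [m["id"] for m in group_matches(matches, "category", lambda m: m["weight"])]
```

Both calls produce the same two groups; only the order in which they are returned differs.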