How to get records from more than one index using SphinxQL?
Here is the problem I have faced:
All records except today's are kept in a plain (disk-based) index; today's records are maintained in an RT index.
While fetching, we need to get records from the most recently changed index.
With SphinxAPI, records are returned from the recently changed index (the RT index). How can I do the same with SphinxQL?
SELECT * FROM index1, index2, index3 WHERE ...
SphinxQL is not like MySQL, where a comma means a join; in Sphinx it is closer to a union.
I think the best way to achieve this is to create a distributed index that consists of the indexes you want to use. For example:
index tehindex
{
type = distributed
local = disk_based_index_name_here
local = rt_index_name_here
}
and then query Sphinx with SphinxQL like this:
select * from tehindex where match('test');
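For illustration, a record inserted into the RT index becomes visible through the same distributed-index query (a sketch; the RT index columns used here are hypothetical):
insert into rt_index_name_here (id, title, content) values (1, 'test record', 'today record body');
select * from tehindex where match('test');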
I have the query below to fetch a list of tickets.
EXPLAIN select * from ticket_type
where ticket_type.event_id='89898'
and ticket_type.active=true
and (ticket_type.is_unlimited = true OR ticket_type.number_of_sold_tickets < ticket_type.number_of_tickets)
order by ticket_type.ticket_type_order
I have created the indexes below, but they are not being used:
Index on (ticket_type_order,event_id,is_unlimited,active)
Index on (ticket_type_order,event_id,active,number_of_sold_tickets,number_of_tickets).
The perfect index for this query would be
CREATE INDEX ON ticket_type (event_id, ticket_type_order)
WHERE active AND (is_unlimited OR number_of_sold_tickets < number_of_tickets);
Of course, a partial index like that might only be useful for this specific query.
If the WHERE conditions from the index definition are not very selective, or somewhat slower execution is acceptable, you can omit part or all of the WHERE clause. That makes the index more widely useful.
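For example, keeping only the active condition (a sketch, assuming PostgreSQL-style partial indexes as in the index above) makes the index usable by any query that filters on active tickets of an event:
CREATE INDEX ON ticket_type (event_id, ticket_type_order)
WHERE active;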
What is the size of the table and of the usual query result? The server is usually smart enough to skip indexes if it expects to return more than half of the table.
An index also makes no sense if the result is rather small. If the server has, let's say, 1000 records left after several filtering steps, it stops using indexes: it is cheaper to finish the query with the CPU than to load another index from disk. As a result, indexes are rarely applied to small tables.
ORDER BY is applied at the very end of query processing. The first field in the index should be one of the fields from the WHERE filter.
Boolean fields are seldom useful in an index, since they have only two possible values. Indexes should be created for fields with many distinct values.
Avoid OR filtering. That is easy in your case: store a very large number in number_of_tickets if the tickets are unlimited.
A better index in your case would be just event_id. If the database server supports functional indexes, you can try adding number_of_tickets - number_of_sold_tickets and rewriting the condition as WHERE number_of_tickets - number_of_sold_tickets > 0.
UPDATE: PostgreSQL calls this an "Index on Expression":
https://www.postgresql.org/docs/current/indexes-expressional.html
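A minimal sketch of that approach, assuming PostgreSQL and the column names from the question (the index name is made up):
CREATE INDEX ticket_type_remaining_idx
ON ticket_type (event_id, (number_of_tickets - number_of_sold_tickets));
-- the query then filters on the same expression
SELECT * FROM ticket_type
WHERE event_id = '89898'
  AND number_of_tickets - number_of_sold_tickets > 0
ORDER BY ticket_type_order;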
I have 2 models: Post and PostLike. I want to return a query of Posts filtered by their PostLike created_at field, as well as return the total count of posts in that query efficiently (without using .count()).
Simplified models:
class Post(db.Model):
...
post_likes = db.relationship('PostLike', back_populates="post")
class PostLike(db.Model):
...
created_at = db.Column(db.Float)
post_id = db.Column(UUID(as_uuid=True), db.ForeignKey("post.id"), index=True)
post = db.relationship("Post", back_populates="post_likes", foreign_keys=[post_id])
Here are the queries I'm trying to run:
# Get all posts
posts = Post.query.join(PostLike, Post.post_likes).order_by(PostLike.created_at.desc()).all()
# Get total # of posts
posts = Post.query.join(PostLike, Post.post_likes).order_by(PostLike.created_at.desc()).count()
There are 3 problems with those queries.
I'm not sure those queries are the best for my use case. Are they?
The count query returns the wrong number: it is higher than the number of results from the .all() query. Why?
This is not performant, as it calls .count() directly. How do I implement an efficient query that also retrieves the count? Something like .statement.with_only_columns([func.count()])?
I'm using Postgres, and I'm expecting up to millions of rows to count. How do I achieve this efficiently?
re: efficiency of .count()
The comments in other answers (linked in a comment to this question) appear to be outdated for current versions of MySQL and PostgreSQL. Testing against a remote table containing a million rows showed that whether I used
n = session.query(Thing).count()
which renders a SELECT COUNT(*) FROM (SELECT … FROM table_name), or I use
n = session.query(sa.func.count(sa.text("*"))).select_from(Thing).scalar()
which renders SELECT COUNT(*) FROM table_name, the row count was returned in the same amount of time (0.2 seconds in my case). This was tested using SQLAlchemy 1.4.0b2 against both MySQL version 8.0.21 and PostgreSQL version 12.3.
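Applied to the models in the question, a sketch of that faster form (assuming Flask-SQLAlchemy's db.session; DISTINCT guards against a post being counted once per like because of the join):
from sqlalchemy import distinct, func

n = (
    db.session.query(func.count(distinct(Post.id)))
    .select_from(Post)
    .join(PostLike, Post.post_likes)
    .scalar()
)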
I have a document structure which looks something like this:
{
...
"groupedFieldKey": "groupedFieldVal",
"otherFieldKey": "otherFieldVal",
"filterFieldKey": "filterFieldVal"
...
}
I am trying to fetch all documents that are unique with respect to groupedFieldKey. I also want to fetch otherFieldKey from ANY of these documents. otherFieldKey has minor variations from one document to another, but I am comfortable with getting ANY of those values.
SELECT DISTINCT groupedFieldKey, otherFieldKey
FROM bucket
WHERE filterFieldKey = "filterFieldVal";
This query fetches all the documents because of the minor variations.
SELECT groupedFieldKey, maxOtherFieldKey
FROM bucket
WHERE filterFieldKey = "filterFieldVal"
GROUP BY groupedFieldKey
LETTING maxOtherFieldKey = MAX(otherFieldKey);
This query works as expected, but it takes a long time due to the GROUP BY step. As this query is used to show products in the UI, that is not desirable. I have tried adding indexes, but that has not made it fast.
Actual details of the records:
Number of records = 100,000
Size per record = Approx 10 KB
Time taken to load the first 10 records: 3s
Is there a better way to do this? A way of getting DISTINCT only on particular fields will be good.
EDIT 1:
You can follow this discussion thread in Couchbase forum: https://forums.couchbase.com/t/getting-distinct-on-the-basis-of-a-field-with-other-fields/26458
GROUP BY must materialize all the documents. You can try a covering index:
CREATE INDEX ix1 ON bucket(filterFieldKey, groupedFieldKey, otherFieldKey);
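With that index in place, the grouped query references only indexed fields, so it can be answered from the index without fetching documents (a sketch using the standard aggregate syntax):
SELECT groupedFieldKey, MAX(otherFieldKey) AS maxOtherFieldKey
FROM bucket
WHERE filterFieldKey = "filterFieldVal"
GROUP BY groupedFieldKey;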
I'm investigating how MongoDB would work for us. One of the most used queries is used to get latest (or from a given time) measurements for each station. There is thousands of stations and each station has tens of thousands of measurements.
So we plan to have one collection for stations and another for measurements.
In SQL we would do the query with
SELECT * FROM measurements
INNER JOIN (
SELECT station_id, max(meas_time) meas_time
FROM measurements
WHERE meas_time <= 'time_to_query'
GROUP BY station_id
) t2 ON t2.station_id = measurements.station_id
AND t2.meas_time = measurements.meas_time
This returns one measurement for each station, and the measurement is the newest one before the 'time_to_query'.
What query should be used in MongoDB to produce the same result? We are actually using Rails and Mongoid, but that should not matter.
update:
This question is not about how to perform a JOIN in MongoDB. The fact that in SQL getting the right data out of the table requires a join doesn't necessarily mean that in MongoDB we would also need a join. There is only one table used in the query.
We came up with this query
db.measurements.aggregate([{$group:{ _id:{'station_id':"$station_id"}, time:{$max:'$meas_time'}}}]);
with indexes
db.measurements.createIndex({ station_id: 1, meas_time: -1 });
Even though it seems to give the right data, it is really slow: it takes roughly a minute to get a bit over 3000 documents from a collection of 65 million.
I just found that MongoDB is not using the index for this query, even though we are on version 3.2.
I guess a worst-case solution would be something like this (off the top of my head):
measurements = []
Station.all.each do |station|
  measurement = Measurement.where(station_id: station.id, :meas_time.lte => 'time_to_query')
                           .order_by(meas_time: -1)
                           .first
  measurements << [station.name, measurement.measure, ...]
end
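For comparison, this can also be attempted as a single aggregation that applies the time filter and keeps the newest full document per station (a sketch; field names are taken from the question, and whether the { station_id: 1, meas_time: -1 } index is actually used for the $sort should be checked with explain()):
db.measurements.aggregate([
  // 'time_to_query' stands for the actual cutoff timestamp
  { $match: { meas_time: { $lte: 'time_to_query' } } },
  // newest first within each station
  { $sort: { station_id: 1, meas_time: -1 } },
  // $first then picks the newest matching document per station
  { $group: { _id: '$station_id', newest: { $first: '$$ROOT' } } }
], { allowDiskUse: true });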
It depends on how much time the query is allowed to take. The data should be indexed by station_id and meas_time in any case.
How much time does the SQL query take?
I have the following query:
SELECT * FROM myDB.dbo.myTable
WHERE name = 'Stockholm'
or
SELECT * FROM myDB.dbo.myTable
WHERE name LIKE('Stockholm')
I have created a full-text index, which is used when I write CONTAINS(name, 'Stockholm'), but in the two cases above the plan shows only a clustered index scan. This is way too slow, above 1 second. I'm a little confused, because I only want to search for exact matches, which should be as fast as CONTAINS(), shouldn't it? I've read that LIKE should at least use an index seek, provided you don't use wildcards, or at least no wildcard at the beginning of the search term.
Thank you in advance
I bet you have no indexes on the name column. A full-text index is NOT a regular database index and is NOT used unless you use a full-text predicate like CONTAINS.
As stated by @Panagiotis Kanavos and @Damien_The_Unbeliever, there was no "non-fulltext" index on my name column. I simply had to add an index with the following statement:
CREATE INDEX my_index_name
ON myDB.dbo.myTable (name)
This improves the performance from slightly above one second to under half a second.
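With both kinds of index in place, each type of search has its own access path (a sketch: the exact-match queries can now use an index seek, while word searches inside longer text still go through the full-text index):
-- exact match: served by the new non-clustered index on name
SELECT * FROM myDB.dbo.myTable WHERE name = 'Stockholm';

-- word search: served by the full-text index via CONTAINS
SELECT * FROM myDB.dbo.myTable WHERE CONTAINS(name, 'Stockholm');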