Why Sphinx ranking affect result count

Why Sphinx ranking affect result count - sphinx

If ranking serve to influence the "weight"
It would influence sorting.
So why with my ranking, I have less result than with another ?
Manual say
SPH_MATCH_ANY uses SPH_RANK_MATCHANY ranker;
and later in manual
SPH_RANK_MATCHANY = sum((word_count+(lcs-1)*max_lcs)*user_weight)
ref : http://sphinxsearch.com/docs/current.html#weighting
So
mode=any
and
mode=extended2;ranker=expr:sum((word_count+(lcs-1)*max_lcs)*user_weight)'
would return same results but it doesn't. Why ?
has "ranking" influence on matching ?

It seems not possible with ranking expression
So my solution is to alter search string
$arrWords = explode(' ', $searchString);
$searchString = '"'.implode('"|"', $arrWords).'"';

Related

How to use AND with ANY in NSPredicates while staying efficient

Imagine I have a collection of books in Core Data. Each book can have multiple authors. Here's what this imaginary collection of books could look like (it only has one book, just to simplify things):
[
Book(authors: [
Author(first: "John", last: "Monarch"),
Author(first: "Sarah", last: "Monarch")
])
]
I want to filter my collection of books down to only those that have an author whose name is "Sarah Monarch".
From what I've seen so far, if I wanted to write an NSPredicate to filter my collection and return a filtered collection that only contains this book, I could do it by using a SUBQUERY:
NSPredicate(format: "SUBQUERY(authors, $author, $author.#first == 'Sarah' && $author.#last == 'Monarch').#count > 0")
My understanding is that this operation is essentially the same as:
books.filter {
book in
let matchingAuthors = book.authors.filter {
author in
author.first == "John" && author.last == "Monarch"
}
return matchingAuthors.count > 0
}
My problem is that it seems there's some inefficiency here — SUBQUERY (and the example code above) will look at all authors, when we can stop after finding just one that matches. My intuition would lead me to try a predicate like:
ANY (authors.#first == "Sarah" && authors.#last == "Monarch")
Which, as code, could be:
books.filter {
book in
return book.authors.contains {
author in
author.first == "John" && author.last == "Monarch"
}
}
But this predicate's syntax isn't valid.
If I'm right, and the SUBQUERY-based approach is less efficient (because it looks at all elements in a collection, rather than just stopping at the first match) is there a more correct and efficient way to do this?

The following is an alternative approach, which does not use SUBQUERY, but should ultimately have the same results. I've no idea whether this would in practice be more or less efficient than using SUBQUERY.
Your specific concern is the inefficiency of counting all the matching Authors, rather than just stopping when the first matching Author is found. Bear in mind that there is a lot going on behind the scenes whenever a fetch request is processed. First, CoreData has to parse your predicate and turn it into an equivalent SQLite query, which will look something like this:
SELECT 0, t0.Z_PK, t0.Z_OPT, t0.ZNAME, ... FROM ZBOOK t0 WHERE (SELECT COUNT(t1.Z_PK) FROM ZAUTHOR t1 WHERE (t0.Z_PK = t1.ZBOOK AND ( t1.ZFIRST == 'Sarah' AND t1.ZLAST == 'Monarch')) ) > 0
(The precise query will depend on whether the inverse relationship is defined and if so whether it is one-many or many-many; the above is based on the relationship having a to-one inverse, book).
When SQLite is handed that query to execute, it will check what indexes it has available and then invoke the query planner to determine how best to process it. The bit of interest is the subselect to get the count:
SELECT COUNT(t1.Z_PK) FROM ZAUTHOR t1 WHERE (t0.Z_PK = t1.ZBOOK AND ( t1.ZFIRST == 'Sarah' AND t1.ZLAST == 'Monarch'))
This is the bit of the query which corresponds to your first code snippet. Note that it is in SQLite terms a correlated subquery: it includes a parameter (t0.Z_PK - this is essentially the relevant Book) from the outer SELECT statement. SQLite will search the entire table of Authors, looking first to see whether they are related to that Book and then to see whether the author first and last names match. Your proposition is that this is inefficient; the nested select can stop as soon as any matching Author is found. In SQLite terms, that would correspond to a query like this:
SELECT 0, t0.Z_PK, t0.Z_OPT, t0.ZNAME, ... FROM ZBOOK t0 WHERE EXISTS(SELECT 1 FROM ZAUTHOR t1 WHERE (t0.Z_PK = t1.ZBOOK AND ( t1.ZFIRST == 'Sarah' AND t1.ZLAST == 'Monarch')) )
(It's unclear from the SQLite docs whether the EXISTS operator actually shortcuts the underlying subselect, but this answer elsewhere on SO suggests it does. If not it might be necessary to add "LIMIT 1" to the subselect to get it to stop after one row is returned). The problem is, I don't know of any way to craft a CoreData predicate which would be converted into a SQLite query using the EXISTS operator. Certainly it's not listed as a function in the NSExpression documentation, nor is it mentioned (or listed as a reserved word) in the predicate format documentation. Likewise, I don't know of any way to add "LIMIT 1" to the subquery (though it's relatively straightforward to add to the main fetch request using fetchLimit.
So not much scope for addressing the issue you identify. However, there might be other inefficiencies. Scanning the Author table for each Book in turn (the correlated subquery) might be one. Might it be more efficient to scan the Author table once, identify those that meet the relevant criteria (first = "Sarah" and last = "Monarch"), then use that (presumably much shorter) list to search for the books? As I said at the start, that's an open question: I have no idea whether it is or isn't more efficient.
To pass results of one fetch request to another, use NSFetchRequestExpression. It's a bit arcane but hopefully the following code is clear enough:
let authorFetch = Author.fetchRequest()
authorFetch.predicate = NSPredicate(format: "#first == 'Sarah' AND #last == 'Monarch'")
authorFetch.resultType = .managedObjectIDResultType
let contextExp = NSExpression(forConstantValue: self.managedObjectContext)
let fetchExp = NSExpression(forConstantValue: authorFetch)
let fre = NSFetchRequestExpression.expression(forFetch: fetchExp, context: contextExp, countOnly: false)
let bookFetch = Book.fetchRequest()
bookFetch.predicate = NSPredicate(format: "ANY authors IN %#", fre)
let results = try! self.managedObjectContext!.fetch(bookFetch)
The result of using that fetch is a SQL query like this:
SELECT DISTINCT 0, t0.Z_PK, t0.Z_OPT, t0.ZNAME, ... FROM ZBOOK t0 JOIN ZAUTHOR t1 ON t0.Z_PK = t1.ZBOOK WHERE t1.Z_PK IN (SELECT n1_t0.Z_PK FROM ZAUHTOR n1_t0 WHERE ( n1_t0.ZFIST == 'Sarah' AND n1_t0.ZLAST == 'Monarch')
This has its own complexities (a DISTINCT, a JOIN, and a subselect) but importantly the subselect is no longer correlated: it is independent of the outer SELECT so can be evaluated once rather than being re-evaluated for each row of the outer SELECT.

get data from Moodle Database from code

I have a question on how I can extract data from Moodle based on a parameter thats "greater than" or "less than" a given value.
For instance, I'd like to do something like:
**$record = $DB->get_record_sql('SELECT * FROM {question_attempts} WHERE questionid > ?', array(1));**
How can I achieve this, cause each time that I try this, I get a single record in return, instead of all the rows that meet this certain criteria.
Also, how can I get a query like this to work perfectly?
**$sql = ('SELECT * FROM {question_attempts} qa join {question_attempt_steps} qas on qas.questionattemptid = qa.id');**
In the end, I want to get all the quiz question marks for each user on the system, in each quiz.

Use $DB->get_records_sql() instead of $DB->get_record_sql, if you want more than one record to be returned.

Thanks Davo for the response back then (2016, wow!). I did manage to learn this over time.
Well, here is an example of a proper query for getting results from Moodle DB, using the > or < operators:
$quizid = 100; // just an example param here
$cutoffmark = 40 // anyone above 40% gets a Moodle badge!!
$sql = "SELECT q.name, qg.userid, qg.grade FROM {quiz} q JOIN {quiz_grades} qg ON qg.quiz = q.id WHERE q.id = ? AND qg.grade > ?";
$records = $DB->get_records_sql($sql, [$quizid, $cutoffmark]);
The query will return a record of quiz results with all student IDs and grades, who have a grade of over 40.

Continuing a Query (paginating) on a compound index

I have a (hopefully quick) question about MongoDB queries on compound indexes.
Say I have a data set (for example, comments) which I want to sort descending by score, and then date:
{ "score" : 10, "date" : ISODate("2014-02-24T00:00:00.000Z"), ...}
{ "score" : 10, "date" : ISODate("2014-02-18T00:00:00.000Z"), ...}
{ "score" : 10, "date" : ISODate("2014-02-12T00:00:00.000Z"), ...}
{ "score" : 9, "date" : ISODate("2014-02-22T00:00:00.000Z"), ...}
{ "score" : 9, "date" : ISODate("2014-02-16T00:00:00.000Z"), ...}
...
My understanding thus far is that I can make a compound index to support this query, which looks like {"score":-1,"date":-1}. (For clarity's sake, I am not using a date in the index, but an ObjectID for unique, roughly time-based order)
Now, say I want to support paging through the comments. The first page is easy enough, I can just stick a .limit(n) option on the end of the cursor. What I'm struggling with is continuing the search.
I have been referring to MongoDB: The Definitive Guide by Kristina Chodorow. In this book, Kristina mentions that using skip() on large datasets is not very performant, and recommends using range queries on parameters from the last seen result (eg. the last seen date).
What I would like to do is perform a range query that acts on two fields, but treats the second field as secondary to the first (just like the index is sorted.) Since my compound index is already sorted in exactly the order I want, it seems like there should be some way to jump into the search by pointing at a specific element in the index and traversing it in the sort order. However, from my (admittedly rudimentary) understanding of queries in MongoDB this doesn't seem possible.
As far as I can see, I have three options:
Using skip() anyway
Using either an $or query or two distinct queries: {$or : [{"score" : lastScore, "date" : { $lt : lastDate}}, {'score' : {$lt : lastScore}]}
Using the $max special query option
Number 3 seems like the closest to ideal for me, but the reference text notes that 'you should generally use "$lt" instead of "$max"'.
To summarize, I have a few questions:
Is there some way to perform the operation I described, that I may have missed? (Jumping into an index and traversing it in the sort order)
If not, of the three options I described (or any I have overlooked), which would (very generally speaking) give the most consistent performance under the compound index?
Why is $lt preferred over $max in most cases?
Thanks in advance for your help!

Another option is to store score and date in a sub-document and then index the sub-document. For example:
{
"a" : { "score" : 9,
"date" : ISODate("2014-02-22T00:00:00Z") },
...
}
db.foo.ensureIndex( { a : 1 } )
db.foo.find( { a : { $lt : { score : lastScore,
date: lastDate } } } ).sort( { a : -1 } )
With this approach you need to ensure that the fields in the BSON sub-document are always stored in the same order, otherwise the query won't match what you expect since index key comparison is binary comparison of the entire BSON sub-document.
I would go with using $max to specify the upper bound, in conjunction with $hint to make sure that the database uses the index you want. The reason that $lt is in general preferred over $max is because $max selects the index using the specified index bounds. This means:
the index chosen may not necessarily be the best choice.
if multiple indexes exist on same fields with different sort orders, the selection of the index may be ambiguous.
The above points are covered in further detail here.
One last point: max is equivalent to $lte, not $lt, so using this approach for pagination you'll need to skip over the first returned document to avoid outputting the same document twice.

MongoDB Indexing 2 fields and let index be used for search on 3th field

From MongoDB Documentation
If you have a compound index on multiple fields, you can use it to query on the beginning subset of fields. So if you have an index on
a,b,c
you can use it query on [a] [a,b] [a,b,c]
So lets say i have document with this fields
UserID
Name
Country
ExtraField
My Index order is [UserID,Name,Country]
So if i have query like
var q = (from c in collection.AsQueryable()
where c.UserID == UserID
where Name = "test"
where Country = 1
where ExtraField = "check"
select c);
Does this query use index for first 3 parameters and then search for ExtraField without index?
If yes, then is it same on this query too
var q = (from c in collection.AsQueryable()
where c.UserID == UserID
where ExtraField = "check"
select c);

The answer to both questions is yes.
For your first query, the performance will depend on the selectivity of the result set. So if the 3 fields in the index matched a very large number of documents, performance would be slow as all those documents would need to be scanned to match on ExtraField. If only a few documents were matched, performance would be fast.
Now, if your query didn't include the first field in the index at all, the index would not be used. The following query, for example, would not be able to use the index:
var q = (from c in collection.AsQueryable()
where Name = "test"
where Country = 1
select c);
Have a look here for some interesting facts about finding other combinations of fields in the index.
I would recommend using the explain command when in doubt about questions like this.

Complex SphinxQL Query

I'm trying to write a SphinxQL query that would replicate the following MySQL in a Sphinx RT index:
SELECT id FROM table WHERE colA LIKE 'valA' AND (colB = valB OR colC = valC OR ... colX = valX ... OR colY LIKE 'valY' .. OR colZ LIKE 'valZ')
As you can see I'm trying to get all the rows where one string column matches a certain value, AND matches any one of a list of values, which mixes and matches string and integer columns / values)
This is what I've gotten so far in SphinxQL:
SELECT id, (intColA = intValA OR intColB = intValB ...) as intCheck FROM rt_index WHERE MATCH('#requiredMatch = requiredValue');
The problem I'm running into is in matching all of the potential optional string values. The best possible query (if multiple MATCH statements were allowed and they were allowed as expressions) would be something like
SELECT id, (intColA = intValA OR MATCH('#checkColA valA|valB') OR ...) as optionalMatches FROM rt_index WHERE optionalMatches = 1 AND MATCH('#requireCol requiredVal')
I can see a potential way to do this with CRC32 string conversions and MVA attributes but these aren't supported with RT Indexes and I REALLY would prefer not switch from them.

One way would be to simply convert all your columns to normal fields. Then you can put all this logic inside the MATCH(..). Ie not using attributes.
Yes you can only have one MATCH per query.
Otherwise, yes you could use the CRC trick to make string attributes into integer ones, so can use for filtering.
Not sure why you would need MVA, but they are now supported in RT indexes in 2.0.2

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Why Sphinx ranking affect result count - sphinx

It seems not possible with ranking expression So my solution is to alter search string $arrWords = explode(' ', $searchString); $searchString = '"'.implode('"|"', $arrWords).'"';

Related

How to use AND with ANY in NSPredicates while staying efficient

get data from Moodle Database from code

Continuing a Query (paginating) on a compound index

MongoDB Indexing 2 fields and let index be used for search on 3th field

Complex SphinxQL Query

Categories

Resources