Sphinx seems to force a Order on ID? - sphinx

I added a new field to my index (weight) which is an integer based value I want to sort on.
I added it to the select and invoked it as sql_attr_uint
When I call it in my query it shows up. However when I try to sort on it i get strange behavior. It always sorts on the record ID instead. So Order on ID is identical to Order on Weight.
I've checked the Index pretty thoroughly and can't find a reason why, does sphinx auto-sort on record ID somehow?
I know the details are fairly sparse yet I'm hoping there is some basic explanation I'm missing before asking anyone to delve in further.
As an update: I don't believe the ID field sort has been "imposed" on the index in anyway inadvertently since I can Order by a # of other fields, both integer and text and the results are returned independent of the ID values (e.g sort on last name Record #100 Adams will come before Record #1 Wyatt)
Yet ordering on Weight always returns the same order as ordering by ID whether it is asc or desc. No error about the field or index not existing or being sortable, no ignoring the order request (desc and asc work) it just ignores that particular field value and uses the ID instead.
Further Update: The Weight value is indexed via a join to the main table indexed by sphinx in the following manner:
sql_attr_multi = uint value_Weight from ranged-query; \
SELECT j.id AS ID, IF(s.Weight > 0, 1, 0) AS Weight \
FROM Customers j \
INNER JOIN CustomerSources s ON j.customer_id = s.customer_id \
AND j.id BETWEEN $start AND $end \
ORDER BY s.id; \
SELECT MIN(id), MAX(id) FROM Customers
Once indexed sorting on both id and value_Weight return the same sort whereas Weight and ID are unrelated.

Ah yes, from
http://sphinxsearch.com/docs/current/mva.html
Filtering and group-by (but not sorting) on MVA attributes is supported.
Can't sort by a MVA attribute (which as noted in comments makes sense, as MVAs usually contain many values, sorting by many values is rather 'tricky'.
When you try, it simply fails. So sorting is falling back on the 'natural' order of the index, which is usually by ID.
Use sql_attr_unit instead
http://sphinxsearch.com/docs/current/conf-sql-attr-uint.html
(but will proabbly mean rewriting the sql_query to perform the JOIN on CustomerSources )

Related

Postgres pagination with non-unique keys?

Suppose I have a table of events with (indexed) columns id : uuid and created : timestamp.
The id column is unique, but the created column is not. I would like to walk the table in chronological order using the created column.
Something like this:
SELECT * FROM events WHERE created >= $<after> ORDER BY created ASC LIMIT 10
Here $<after> is a template parameter that is taken from the previous query.
Now, I can see two issues with this:
Since created is not unique, the order will not be fully defined. Perhaps the sort should be id, created?
Each row should only be on one page, but with this query the last row is always included on the next page.
How should I go about this in Postgres?
SELECT * FROM events
WHERE created >= $<after> and (id >= $<id> OR created > $<after>)
ORDER BY created ASC ,id ASC LIMIT 10
that way the events each timestamp values will be ordered by id. and you can split pages anywhere.
you can say the same thing this way:
SELECT * FROM events
WHERE (created,id) >= ($<after>,$<id>)
ORDER BY created ASC ,id ASC LIMIT 10
and for me this produces a slightly better plan.
An index on (created,id) will help performance most, but for
many circumstances an index on created may suffice.
First, as you said, you should enforce a total ordering. Since the main thing you care about is created, you should start with that. id could be the secondary ordering, a tie breaker invisible to the user that just ensures the ordering is consistent. Secondly, instead of messing around with conditions on created, you could just use an offset clause to return later results:
SELECT * FROM events ORDER BY created ASC, id ASC LIMIT 10 OFFSET <10 * page number>
-- Note that page number is zero based

Are there plans to add 'OR' to attribute searches in Sphinx?

A little background is in order for this question since it is on surface too generic:
Recently I ran into an issue where I had to move the attribute values I was pushing into my sphinxql query as full-text because the attribute needed to be part of an 'OR' query.
In other words I was doing:
Select * from idx_test where MATCH('Terms') and name_id in (1,2,3)
When I tried to add an 'OR' to the attributes such as:
Select * from idx_test where MATCH('Terms') and name_id in (1,2,3) OR customer_id in (4,5,6)
it failed because Sphinx 2.* does not support OR in the attribute query.
I was also unable to simply put the name and customer IDs in to the query:
Select * from idx_test where MATCH('Terms ((#(name_id) 1|2|3)|(#customer_id) 4|5|6))')
Because (as far as I can tell) you can't push integer fields into the full_text search.
My solution was to index the id fields a second time appended by _text:
Select name_id, name_id as name_id_text
and then add that to the field list:
sql_attr_uint = name_id
sql_field_string = name_id_text
sql_attr_uint = customer_id
sql_field_string = customer_id_text
So now I can do my OR query as full_text:
Select * from idx_test where MATCH('Terms ((#(name_id_text) 1|2|3)|(#customer_id_text) 4|5|6))')
However recently I found an article that discussed the tradeoff between attribute and full-text searches. The upshot is that "it could reduce performance of queries that otherwise match few records". Which is precisely what my name_id/city_id query does. In an ideal world then I'd be able to go back to:
Select * from idx_test where MATCH('Terms') and name_id in (1,2,3) OR customer_id in (4,5,6)
If Sphinx would only allow for OR between attributes since as far as I can tell once I have a query that is filtering down to a relatively low # of results I'd have a much faster query using attributes vs full_text.
So my two-part question therefor is:
Am I in fact correct that this is the case (a query that would reduce the # of results significantly is better served doing attributes then full-text)?
If so are there plans to add OR to the attribute part of the SphinxQL query?
If so, when?
OR filter has been added in the Sphinx fork (from 2.3 branch) - Manticore, see https://github.com/manticoresoftware/manticore/commit/76b04de04feb8a4db60d7309bf1e57114052e298
For now it's only between attributes, OR between MATCH and attributes is not supported yet.
While yes, OR is not supported directly in WHERE, can still run the query. Your
Select * from idx_test where MATCH('Terms') and name_id in (1,2,3) OR customer_id in (4,5,6)
example can be written as
Select *, IN(name_id,1,2,3) + IN(customer_id,4,5,6) as filter
from idx_test where MATCH('Terms') and filter > 0
It is a bit more cumbersome, but should work. You still get the full benefit of the full-text inverted index, so performance actully shoudnt be bad. The fitler is only executed against docs matching the terms.
(this may look crazy, if coming from say mysql background, but remeber sphinxQL isnt mysql :)
You dont get 'short circuiting (ie customer_id filter, will still be run, even if matches name_id), so perhaps
Select *, IF(IN(name_id,1,2,3) OR IN(customer_id,4,5,6),1,0) as filter
from idx_test where MATCH('Terms') and filter =1
is even better, the if function has an OR operator! (as sphinx could potentially short-circuit, but don't know if it does)
(but also yes, if the 'filter' is highly selective (matching few rows), than including in the full-text query can be good. As it discards the rows earlier in processing. The problem with non-selective filters, is they have lots of matching rows, so a long doclist to process during text-query processing)

Will Postgres' DISTINCT function always return null as the first element?

I'm selecting distinct values from tables thru Java's JDBC connector and it seems that NULL value (if there's any) is always the first row in the ResultSet.
I need to remove this NULL from the List where I load this ResultSet. The logic looks only at the first element and if it's null then ignores it.
I'm not using any ORDER BY in the query, can I still trust that logic? I can't find any reference in Postgres' documentation about this.
You can add a check for NOT NULL. Simply like
select distinct columnName
from Tablename
where columnName IS NOT NULL
Also if you are not providing the ORDER BY clause then then order in which you are going to get the result is not guaranteed, hence you can not rely on it. So it is better and recommended to provide the ORDER BY clause if you want your result output in a particular output(i.e., ascending or descending)
If you are looking for a reference Postgresql document then it says:
If ORDER BY is not given, the rows are returned in whatever order the
system finds fastest to produce.
If it is not stated in the manual, I wouldn't trust it. However, just for fun and try to figure out what logic is being used, running the following query does bring the NULL (for no apparent reason) to the top, while all other values are in an apparent random order:
with t(n) as (values (1),(2),(1),(3),(null),(8),(0))
select distinct * from t
However, cross joining the table with a modified version of itself brings two NULLs to the top, but random NULLs dispersed througout the resultset. So it doesn't seem to have a clear-cut logic clumping all NULL values at the top.
with t(n) as (values (1),(2),(1),(3),(null),(8),(0))
select distinct * from t
cross join (select n+3 from t) t2

TSQL Keyword Previous or Last or something similar

This question is geared for those who have more SQL experience than me.
I am writing a query(that will eventually be a Stored Procedure but this should be irrelevant) where I want to select the count of rows if the most recent entry's is equivalent to the one that was just entered before. And i want to continue to do this until it hits an entry that has a different value. (Poorly explained so I will show the example)
In my table I have a column 'Product_Id' and when this query is run i want it take the product_id and compare it to the previously entered product Id, if its the same I want to add one, and I want it to keep checking the previously entered product_id until it runs into a different product_id
I'm hoping it sounds more complicated than it is, and the query would look something like
Select count(Product_ID)
FROM dbo.myTable
Where Product_Id = previous(Product_Id)
Now, i know that previous isn't a keyword in TSQL, and neither was Last, but I'm hoping of someone who knows a keyword that does what I am asking.
Edit for Sam
USE DbName;
GO
WITH OrderedCount as
(
select ROW_NUMBER() OVER (Order by dbo.Line_Production.Run_Date DESC) as RowNumber,
Line_Production.Product_ID
From dbo.Line_Production
)
Select RowNumber, COUNT(OrderedCount.Product_ID) as PalletCount
From OrderedCount
WHERE OrderedCount.RowNumber + 1 = RowNumber
and Product_ID = Product_ID
Group by RowNumber
The OrderedCount portion works, and it returns the data back how I want it, I'm now having trouble comparing the Product_ID's for different RowNumbers
my Where Clause is wrong
There's no keyword. That would be a nice magic solution, but it doesn't exist, at least in part because there is no guaranteed ordering (okay, you could have the keyword only if there is an ORDER BY...). I can write you a query, but that'll take time, so for now I'll give you a few steps and I'll come back and see if you still need help in a bit.
Figure out an ORDER BY, otherwise no order is guaranteed. If there is a time entered field, that's a good choice, or an index, that works too.
Learn to use Row_Number.
Compare the table (with Row_Number) to itself where instance1.row - 1 = instance2.row.
If product_id is an identity column, couldn't you just do product_id - 1? In other words, if it's sequential, it's the same as using ROW_NUMBER mentioned in the previous comment.

sql date order by problem

i have image table, which has 2 or more rows with same date.. now im tring to do order by created_date DESC, which works fine and shows rows same position, but when i change the query and try again, it shows different positions.. and no i dont have any other order by field, so im bit confused on why its doing it and how can i fix it.
can you please help on this.
To get reproducible results you need to have columns in your order by clause that together are unique. Do you have an ID column? You can use that to tie-break:
ORDER BY created_date DESC, id
I suspect that this is happening because MySQL is not given any ordering information other than ORDER BY created_date DESC, so it does whatever is most convenient for MySQL depending on its complicated inner workings (caching, indexing, etc.). Assuming you have a unique key id, you could do:
SELECT * FROM table t ORDER BY t.created_date DESC, t.id ASC
Which would give you the same result every time because putting a comma in the arguments following ORDER BY gives it a secondary ordering rule that is executed when the first ordering rule doesn't produce a clear order between two rows.
To have consistent results, you will need to add at least more column to the 'ORDER BY' clause. Since the values in the created_date column are not unique, there is not a defined order. If you wanted that column to be 'unique', you could define it as a timestamp.