Sphinx ranged query + killlist

I have a Sphinx instance with two indexes configured: main and delta. Both of them use ranged queries (sql_query_range).
In the delta index I have a kill-list query to remove modified articles from the main index.
Should this query be ranged like the content query?
i.e.
source delta : main {
sql_query_range = SELECT MIN(id), MAX(id) FROM documents
sql_range_step = 1000
sql_query = SELECT * FROM documents WHERE id >= $start AND id <= $end AND id > (SELECT maxID FROM SphinxTable)
sql_query_killlist = SELECT id FROM documents WHERE id >= $start AND id <= $end AND id > (SELECT maxID FROM SphinxTable)
}

Should this query be ranged like the content query?
No. Kill-lists don't support ranged queries; Sphinx just runs the one query.
Incidentally, this:
sql_query_range = SELECT MIN(id), MAX(id) FROM documents
looks wrong. That takes ALL the ids from the documents table, but the sql_query has an additional clause using maxID from SphinxTable.
It should be something like
sql_query_range = SELECT (SELECT maxID FROM SphinxTable), MAX(id) FROM documents
Otherwise you are going to issue lots of range queries to fetch documents that are already in main, which will never match because of that second clause.
So just do
sql_query = SELECT * FROM documents WHERE id >= $start AND id <= $end AND (id > (SELECT maxID FROM SphinxTable) OR updated > (SELECT updatedts FROM SphinxTable))
sql_query_killlist = SELECT id FROM documents WHERE id <= (SELECT maxID FROM SphinxTable) AND updated > (SELECT updatedts FROM SphinxTable)
Note the change in the comparison: you want documents that are already in main in your kill-list, but only the ones updated since the last reindex.
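For completeness, here is a minimal sketch of the bookkeeping this scheme assumes. SphinxTable, maxID, and updatedts are the asker's names; the refresh statement is an assumption about how such a counter table might be maintained, in the spirit of the classic Sphinx main+delta counter-table pattern:
-- hypothetical: refresh the counter table just before re-indexing main,
-- so that delta only picks up documents added or updated after that point
UPDATE SphinxTable
SET maxID = (SELECT MAX(id) FROM documents),
    updatedts = NOW();
One subtlety: once the OR updated clause is in play, the delta's sql_query_range has to keep spanning MIN(id) to MAX(id), since updated rows can sit anywhere below maxID.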

Related

Postgres, update statement from jsonb array with sorting

I have a jsonb column in my table; it contains an array of JSON objects, and one of the fields in these objects is a date.
Now I have added a new column of type timestamp to the table, and I need a statement that updates the new column with the most recent date value from the jsonb array column of the same record.
The following statement works great for selecting the most recent date from the jsonb array column of a certain record:
select history.date
from document,
jsonb_to_recordset(document.history) as history(date date)
where document.id = 'd093d6b0-702f-11eb-9439-0242ac130002'
order by history.date desc
limit 1;
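(For reference, a hypothetical minimal schema and sample row, not taken from the question, that make the statements here runnable:)
-- illustration only; column types are assumptions
CREATE TABLE document (
    id uuid PRIMARY KEY,
    status_recent_change_date timestamp,
    history jsonb
);
INSERT INTO document (id, history) VALUES
    ('d093d6b0-702f-11eb-9439-0242ac130002',
     '[{"date": "2021-01-05"}, {"date": "2021-02-16"}]');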
For the update I have tried the following:
update document
set status_recent_change_date = subquery.history.date
from (
select id, history.date
from document,
jsonb_to_recordset(document.history) as history(date date)
) as subquery
where document.id = subquery.id
order by history.date desc
limit 1;
The last statement does not work.
demo:db<>fiddle
UPDATE document d
SET status_recent_change_date = s.date
FROM (
    SELECT DISTINCT ON (id) *
    FROM document,
         jsonb_to_recordset(document.history) AS history(date date)
    ORDER BY id, history.date DESC
) s
WHERE d.id = s.id;
Using LIMIT would not work because it limits the entire output of your SELECT statement, but you want to limit the output per document.id. This can be done using DISTINCT ON (id): with ORDER BY id, history.date DESC it keeps exactly one row per id, namely the one with the most recent date.
This result can then be used to update each record via its id value.
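Running the subquery alone against the hypothetical sample row sketched earlier shows what feeds the UPDATE:
-- one row per document id, carrying the latest date from the array
SELECT DISTINCT ON (id) id, history.date
FROM document,
     jsonb_to_recordset(document.history) AS history(date date)
ORDER BY id, history.date DESC;
-- d093d6b0-702f-11eb-9439-0242ac130002 | 2021-02-16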
You most likely don't need LIMIT at all; it is enough to do the sorting inside the subquery:
UPDATE document SET status_recent_change_date = subquery.hdate
FROM (
    SELECT id, history.date AS hdate
    FROM document, jsonb_to_recordset(document.history) AS history(date date)
    ORDER BY history.date DESC
) AS subquery
WHERE document.id = subquery.id;

Create createQueryBuilder inside SQL query for TypeOrm

There is a query that is dynamically built through createQueryBuilder; an example query:
SELECT * FROM task
WHERE (("attrs"->>'count')::int > 300 )
ORDER BY id DESC
But I need the following SQL:
WITH tmp as (
SELECT * FROM task
WHERE (("attrs"->>'count')::int > 300 )
ORDER BY id DESC)
SELECT * FROM tmp LIMIT 100;
I know that in TypeORM I can add a LIMIT with qb.limit(100), but I need it applied on top of the WITH query specifically. Is it possible to create this through createQueryBuilder?

postgres select count distinct returning unexpected extra row

If there is one more uid in sessions than there is in users (obviously not supposed to be that way), then I expect a non-empty result set when I run the third select below, but I get no rows returned. This result just doesn't make logical sense to me...
select count(distinct(uid)) from users;
> 108736
select count(distinct(uid)) from sessions;
> 108737
select count(*) from sessions where uid not in (select uid from users);
> 0
and just for completeness:
select count(*) from users where uid not in (select uid from sessions);
> 0
I have checked for nulls:
select count( * ) from sessions where uid is null;
> 0
select count( * ) from users where uid is null;
> 14
The schema is defined in sqlalchemy and includes a foreign key in the session table:
uid = Column(Integer, ForeignKey('users.uid', use_alter=True, name='fk_uid'))
This schema is a static dump for analytics purposes so there is no chance of concurrency issues...
Your third query does not do what you think it does.
The following query illustrates the problem:
SELECT 1 NOT IN (SELECT unnest(ARRAY[NULL]::int[]));
This returns NULL, because it cannot decide whether 1 <> NULL.
So in your query the WHERE condition never evaluates to TRUE: because users contains NULL uids, uid NOT IN (...) yields NULL rather than TRUE for every row.
I recommend using EXCEPT to find the culprit in your sessions table.
SELECT uid from sessions EXCEPT SELECT uid from users;
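As an aside (not part of the original answer), a NULL-safe way to write the third query is NOT EXISTS, which does not fall into the same three-valued-logic trap:
-- unlike NOT IN, NOT EXISTS is unaffected by the NULL uids in users
SELECT count(*)
FROM sessions s
WHERE NOT EXISTS (SELECT 1 FROM users u WHERE u.uid = s.uid);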

How to optimize selecting one random row from a set acquired by JOIN

Query in English:
Retrieve a random row from stuff, such that:
the row is not mentioned in done;
the row belongs to the highest*-scored friend.
*If no rows belonging to the highest-scored friend are found, take the next friend, and so on.
My current query takes too long to complete, because it randomly orders all of stuff, while it should randomly order batch after batch.
Here is an sqlfiddle with tables and data.
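(The fiddle itself is not reproduced here; a hypothetical minimal schema, guessed from the column references in the queries below, would be:)
-- types and keys are assumptions for illustration
CREATE TABLE friends (friend int PRIMARY KEY, score int);
CREATE TABLE stuff (stuff_id int PRIMARY KEY, owner int REFERENCES friends(friend));
CREATE TABLE done (me int, friend int, stuff_id int);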
My query:
WITH ordered_friends AS (SELECT *
FROM friends
ORDER BY score DESC)
SELECT s.stuff_id
FROM ordered_friends
INNER JOIN (SELECT *
FROM stuff
ORDER BY random()) AS s ON s.owner = ordered_friends.friend
WHERE NOT EXISTS(
SELECT 1
FROM done
WHERE done.me = 42
AND done.friend = s.owner
AND done.stuff_id = s.stuff_id
)
-- but it should keep the order of ordered_friends (score)
-- it does not have to reorder all stuff
-- one batch for each friend is enough until a satisfying row is found.
LIMIT 1;
How about this? A lateral join picks one random not-yet-done row per friend, and the outer ORDER BY then walks the friends from the highest score down:
SELECT s.stuff_id
FROM friends
CROSS JOIN LATERAL (SELECT stuff_id
FROM stuff
WHERE stuff.owner = friends.friend
AND NOT EXISTS(SELECT 1
FROM done
WHERE done.me = 42
AND done.friend = stuff.owner
AND done.stuff_id = stuff.stuff_id
)
ORDER BY random()
LIMIT 1
) s
ORDER BY friends.score DESC
LIMIT 1;
The following indexes would make it fast:
CREATE INDEX ON friends(score); -- for sorting
CREATE INDEX ON stuff(owner); -- for the nested loop
CREATE INDEX ON done(stuff_id, friend); -- for NOT EXISTS

Simple SELECT, but adding JOIN returns too many rows

The query below returns 9,817 records. Now I want to SELECT one more field from another table. See the two lines that are commented out, where I've simply selected this additional field and added a JOIN statement to bind the new column. With these lines added, the query returns 649,200 records and I can't figure out why! I guess something is wrong with my WHERE criteria in conjunction with the JOIN statement. Please help, thanks.
SELECT DISTINCT dbo.IMPORT_DOCUMENTS.ITEMID, BEGDOC, BATCHID
--, dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS.CATEGORY_ID
FROM IMPORT_DOCUMENTS
--JOIN dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS ON
--  dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS.ITEMID = dbo.IMPORT_DOCUMENTS.ITEMID
WHERE (BATCHID LIKE 'IC0%' OR BATCHID LIKE 'LP0%')
AND dbo.IMPORT_DOCUMENTS.ITEMID IN
(SELECT dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS.ITEMID FROM
CATEGORY_COLLECTION_CATEGORY_RESULTS
WHERE SCORE >= .7 AND SCORE <= .75 AND CATEGORY_ID IN(
SELECT CATEGORY_ID FROM CATEGORY_COLLECTION_CATS WHERE COLLECTION_ID IN (11,16))
AND Sample_Id > 0)
AND dbo.IMPORT_DOCUMENTS.ITEMID NOT IN
(SELECT ASSIGNMENT_FOLDER_DOCUMENTS.Item_Id FROM ASSIGNMENT_FOLDER_DOCUMENTS)
One possible reason is that one of your tables contains data at a lower level than your join key; for example, there may be multiple records per ITEMID, so the same ITEMID is repeated several times and the join multiplies the rows. I would fix the query as below. Without knowing the data, try running the modified query; if the output is not what you're looking for, convert it into a SELECT within a SELECT...
Hope this helps....
Try this SQL (note that CATEGORY_ID has to be carried up through the subquery for b.CATEGORY_ID to be selectable):
SELECT DISTINCT a.ITEMID, a.BEGDOC, a.BATCHID, b.CATEGORY_ID
FROM IMPORT_DOCUMENTS a
JOIN (
    SELECT DISTINCT ITEMID, CATEGORY_ID
    FROM CATEGORY_COLLECTION_CATEGORY_RESULTS
    WHERE SCORE >= .7 AND SCORE <= .75
      AND CATEGORY_ID IN (
          SELECT DISTINCT CATEGORY_ID
          FROM CATEGORY_COLLECTION_CATS
          WHERE COLLECTION_ID IN (11, 16))
      AND Sample_Id > 0
) b ON a.ITEMID = b.ITEMID
WHERE (a.BATCHID LIKE 'IC0%' OR a.BATCHID LIKE 'LP0%')
  AND a.ITEMID NOT IN (SELECT DISTINCT Item_Id FROM ASSIGNMENT_FOLDER_DOCUMENTS);