Pagination on large data sets? – Abort count(*) after a certain time - oracle10g

We use the following pagination technique here:
get count(*) of given filter
get first 25 records of given filter
-> render some pagination links on the page
This works pretty well as long as count(*) is reasonably fast. In our case the data has grown to a point where a non-indexed query (although most things are covered by indexes) takes more than a minute. So at this point the user waits for a mostly unimportant number (total records matching the filter, number of pages), while the first N records are often ready pretty fast.
Therefore I have two questions:
Can I limit the count(*) to a certain number?
Or would it be possible to limit it by time? (no count known after, say, 20 ms)
Or just in general: are there some easy ways to avoid that problem? We would like to keep the system as untouched as possible.
Database: Oracle 10g
Update
There are several scenarios
a) there's an index -> neither count(*) nor the actual select should be a problem
b) there's no index
count(*) is HUGE, and it takes ages to determine it -> rownum would help
count(*) is zero or very low; here a time limit would help. Or I could simply skip the count(*) if the result set is already below the page limit.

You could use 'where rownum < x' to limit the number of rows to count. And if you need to show the user that there are more records, you could count up to x+1 just to see whether more than x records exist.
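As a rough sketch of that capped count (the orders table, the OPEN filter and the cap of 1000 are hypothetical stand-ins for your own filter and page logic):
select count(*) as capped_cnt
from (select 1
      from orders
      where status = 'OPEN'      -- your filter conditions
        and rownum <= 1001)      -- stop scanning after 1000 + 1 matching rows
;
-- capped_cnt = 1001 means "more than 1000 results"; otherwise it is the exact total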

Related

Getting the total number of records in a single knexjs query when using the limit() method

I use knexjs and PostgreSQL. Is it possible in knexjs to get the total number of records from the same query in which the limit is used?
For example:
knex.select().from('project').limit(50)
Is it possible to somehow get the total number of records in the same query if there are more than 50?
The question arose because my real query is much more complex, with many subqueries and conditions, and I would not like to run it twice: once to get the data and once more (using the .count() method) to get the total number of records.
I do not know your obfuscation layer (knexjs?), but you should be able to add the window version of the count() function to your select list. In plain SQL it looks something like the following, where ... represents your current select list (see demo):
select ..., count(*) over() total_rows
from project
limit 5;
This works because the window count function counts all the selected rows: it is evaluated after the rows have been selected but before the LIMIT clause is applied. Note: this adds a column to the result set with the same value in every row.
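Applied to the question's page size of 50, a sketch of what the generated SQL could look like (the ordering column id is an assumption):
select p.*,
       count(*) over () as total_rows   -- same total repeated on every returned row
from project p
order by p.id
limit 50;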

Is there a way to see if a limit offset query has reached the end with pg-promise?

I have a table of posts. I would like to query these posts as pages. Because I would like to keep my endpoints stateless I would like to do this with offset and limit like this:
SELECT * FROM post ORDER BY id LIMIT 50 OFFSET $1
Where $1 would be the page number times the page size (50). The easy way to check whether we have reached the end would be to see if we got fewer than 50 rows back. The problem, of course, is that if the total number of rows is divisible by 50, we can't be sure.
The way I have solved this until now is by simply fetching 51 posts per query, with the page size still being 50. That way, if the query returns fewer than 51 rows, we have reached the end.
Unfortunately, this seems like a very hacky way to do it. So I was wondering: is there some feature in pg-promise or PostgreSQL that would indicate that I have reached the end of a table without resorting to tricks like this?
The simplest method with the lowest overhead that I found:
You can request pageLimit+1 rows on every page request. In your controller, check whether rowsCount > pageLimit; if it is, you know there is more data available. Before returning the rows, remove the last element and send something like a hasNext boolean along with them.
It is usually far cheaper for the database to retrieve one extra row than to count all rows or to make an extra request for page+1 just to check whether it returns any rows.
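A minimal sketch of that idea against the post table from the question (page size of 50 assumed):
-- ask for one row more than the page size
SELECT * FROM post ORDER BY id LIMIT 51 OFFSET $1;
-- 51 rows back: drop the last row and set hasNext = true; fewer rows: hasNext = false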
Well, there is no built-in process for this directly. But you can count the rows and add that to the results. You could then even give the user the number of items or the number of pages:
-- Item count
with pc(cnt) as (select count(*) from post)
select p.*, cnt
from post p
cross join pc
limit 50 offset $1;
-- page count
with pc(cnt) as (select count(*)/50 + ((count(*)%50)>0)::int from post)
select p.*, cnt
from post p
cross join pc
limit 50 offset $1;
Caution: the count function can be slow, and even when it is not, it adds to the response time. Is it worth the additional overhead? Only you and the user can answer that.
This method works well only in a specific setting (an SPA with caching of network requests and a desire to make pagination feel faster with pre-fetching):
On every page, you make two requests: one for the current page's data and one for the next page's data.
It works if you, for example, use a React single-page application with react-query, where the next page will not be refetched but reused when the user opens it.
It will even make the user interface snappier, as the transition to the next page will always be instant.
Otherwise, if the next page is not reused, it is worse than checking the total number of rows to determine whether any rows are left, since you make two requests for every page.
This method works well if you have a lot of page transitions, because the total number of calls is numberOfPages+1. If users go to 10 pages on average, that is 10+1 calls, or only about 10% overhead. But if your users usually do not go beyond the first page, it makes little sense, as in that case it means 2 calls for a single page.
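For illustration only, these are the two queries a single page view would fire under this scheme (post table and page size taken from the question above, offsets hypothetical):
-- request 1: current page
SELECT * FROM post ORDER BY id LIMIT 50 OFFSET 100;
-- request 2, sent in parallel and cached by the SPA: next page
SELECT * FROM post ORDER BY id LIMIT 50 OFFSET 150;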

MySQL Workbench - script storing return in array and performing calculations?

Firstly, this is part of my college homework.
Now that's out of the way: I need to write a query that will get the number of free apps in a DB as a percentage of the total number of apps, broken down by the category the app is in.
I can get the number of free apps and also the total number of apps by category. Now I need to find the percentage, and this is where it goes a bit pear-shaped.
Here is what I have so far:
-- find total number of apps per category
select @totalAppsPerCategory := count(*), category_primary
from apps
group by category_primary;
-- find number of free apps per category
select @freeAppsPerCategory := count(*), category_primary
from apps
where (price = 0.0)
group by category_primary;
-- find percentage of free apps per category
set @totals = @freeAppsPerCategory / @totalAppsPerCategory * 100;
select @totals, category_primary
from apps
group by category_primary;
It then lists the categories, but the percentage shown for each category is exactly the same value.
I had initially thought to use an array, but from what I have read MySQL does not seem to support arrays.
I'm a bit lost as to how to proceed from here.
Finally figured it out. Since I had been saving the previous results in variables, the calculation was not done on a row-by-row basis, which is why all the percentages were identical; it was effectively one overall value. So the calculation needed to be part of the query.
Here's what I came up with:
SELECT DISTINCT
    category_primary,
    CONCAT(FORMAT(COUNT(CASE WHEN price = 0 THEN 1 END) / COUNT(*) * 100, 1),
           '%') AS FreeAppSharePercent
FROM apps
GROUP BY category_primary
ORDER BY FreeAppSharePercent DESC;
The query then returns one row per category with its FreeAppSharePercent value.

improve performance for postgres

I count the number of users in this way; it takes 5 seconds to produce results, and I am looking for a better solution:
SELECT COUNT(*)
FROM (SELECT user_id
FROM slot_result_primary
WHERE session_timestamp BETWEEN 1590598800000 AND 1590685199999
GROUP BY user_id) AS foo
First of all you can simplify the query:
SELECT COUNT(DISTINCT user_id)
FROM slot_result_primary
WHERE session_timestamp BETWEEN 1590598800000 AND 1590685199999
Most importantly, make sure you have an index on session_timestamp.
Counting is a very heavy operation in Postgres and should be avoided where possible.
It is very difficult to make it much faster, because for each row Postgres needs to go to disk. You can create a better index so that the rows to read are picked from disk faster, but even so the count time will always grow roughly linearly with the size of the data.
Your index should be:
CREATE INDEX session_timestamp_user_id_index ON slot_result_primary (session_timestamp, user_id)
for best results.
Still, an index will not fully solve your count problem. In a similar situation I faced two days ago (a SELECT query running in 3 s and the count in 1 s), dedicated indexes allowed me to push the SELECT time down to 0.3 ms, but the best I could do with the count was 700 ms.
Here you can find a good article with a summary of why count is difficult and the different ways to make it faster:
https://www.citusdata.com/blog/2016/10/12/count-performance/
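One of the approaches that article covers is using the planner's estimate instead of an exact scan; a rough sketch (this is a whole-table estimate only, it cannot honour the session_timestamp filter):
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'slot_result_primary';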

Query distinct values from historical database

If I run this query on a large historical database without specifying a date, will kdb be smart enough to retrieve the status values from an index and not bring the database down?
select distinct status from trades
The only way kdb can possibly determine all the distinct status values is by reading from every partition. Yes, this will take a lot of memory, but unless you want to maintain a cache of all distinct status values yourself, there is nothing else you can do. As previously mentioned, an attribute will speed the query up, but the query time will still only scale with the number of partitions.
To retrieve using an index, kdb provides the g# attribute. distinct alone can take more time, depending on the size of your table (it will be a linear search without the g# attribute).
Check this-> http://code.kx.com/q4m3/8_Tables/#88-attributes
Let's look at a simple example:
q) a: 10000000#1 2 3 5
q) b:`g#a
q) \ts distinct a
68 134217888
q) \ts distinct b
0 288
The difference shows that the g# attribute makes a big difference in the time and space taken during the search. This is because the g# attribute creates and maintains an index on the vector.