I started using Zend_Paginator. Everything works fine, but I noticed that there is one extra query which slows the load time down.
The additional query:
SELECT COUNT(1) AS `zend_paginator_row_count` FROM `content`
The normal query:
SELECT `content`.`id`, `content`.`name` FROM `content` LIMIT 2
PHP:
$adapter = new Zend_Paginator_Adapter_DbSelect($table->select()->from($table, array('id', 'name')));
$paginator = new Zend_Paginator($adapter);
Could I merge the two queries into one (for better performance)?
Make sure you have an index on one or more int-based columns of the selected table, so the count query will not have much of a performance impact. You can also use setRowCount() to provide the count yourself (if you already have it).
From http://framework.zend.com/manual/en/zend.paginator.usage.html :
Note: Instead of selecting every matching row of a given query, the DbSelect and DbTableSelect adapters retrieve only the smallest amount of data necessary for displaying the current page. Because of this, a second query is dynamically generated to determine the total number of matching rows. However, it is possible to directly supply a count or count query yourself. See the setRowCount() method in the DbSelect adapter for more information.
Merging the two wouldn't give much of a performance increase, if any. The actual select has the LIMIT clause in it (since you are only fetching a subset of the table), whereas the count needs to count all rows in the table. It is done this way to avoid having to select a very large result set simply to get the count.
Related
I'm currently working on a project that involves keeping track of users and their actions in my database (PostgreSQL as the RDBMS), and I have run into an issue when trying to perform COUNT(*) on occurrences of each user. What I want is to be able to efficiently count the number of times each user appears across all records, and also to look at those counts over a particular date range.
So the problem is: how do we count the total number of times a user appears in the table's contents, and how do we count that total over a date range?
What I've tried
As you might know, Postgres doesn't support COUNT(*) very well using indices, so we have to consider other ways to reduce the number of records it looks at in order to speed up the query. So my first approach is to create a table that keeps track of the number of times a user has a log message associated with them, and on what day (similar to the idea behind a materialized view, but I don't want to continually refresh a materialized view with my count query). Here is what I've come up with:
CREATE TABLE users_counts(user varchar(65536), counter int default 0, day date);
CREATE RULE inc_user_date_count
AS ON INSERT TO main_table
DO ALSO UPDATE users_counts SET counter = counter + 1
WHERE user = NEW.user AND day = DATE(NEW.date_);
What this does: every time a new record is inserted into 'main_table', we update the users_counts table, incrementing the counter of the row whose day equals the new record's date and whose user name matches.
NOTE: the date_ column in 'main_table' is a timestamp, so I must cast the new record's date_ to a DATE type.
The problem is: if the user doesn't already have a row in 'users_counts' for the current day, nothing gets updated.
Here is my question:
How do I write the rule so that it checks whether a row exists for that user and the current day; if so, increment its counter, otherwise insert a new row with the user, the day, and a counter of 1?
I would also like to know whether my approach makes sense, or whether there are ideas I am missing that I just haven't thought about. As my database grows, counting becomes increasingly inefficient, so I want to avoid any performance bottlenecks.
EDIT 1: I was able to actually figure this out by creating a separate RULE but I'm not sure if this is correct:
CREATE RULE test_insert AS ON INSERT TO main_table
DO ALSO INSERT INTO users_counts(user, counter, day)
SELECT NEW.user, 1, DATE(NEW.date)
WHERE NOT EXISTS (SELECT user FROM users.log_messages WHERE user = NEW.user_);
Basically, an insert happens only if the user doesn't already exist in my cached users_counts table, and the first rule above updates the count.
What I'm unsure of is how I know which rule is called first, the update rule or the insert. And there must be a better way: how do I combine the two rules? Can this be done with a function?
It is true that PostgreSQL is notoriously slow when it comes to count(*) queries. However, if you have a WHERE clause that limits the number of entries, the query will be much faster. If you are using PostgreSQL 9.2 or newer, such a query can be just as fast as it is in MySQL because of index-only scans, which were added in 9.2, but it's best to EXPLAIN ANALYZE your query to make sure.
Does my solution make sense?
Very much so, provided that your EXPLAIN ANALYZE shows that index-only scans are not being used. Trigger-based solutions like the one you have adapted find wide usage. But, as you have realized, the problem of the initial state arises (whether to do an update or an insert).
which rule is called first
Multiple rules on the same table and same event type are applied in alphabetical name order.
from http://www.postgresql.org/docs/9.1/static/sql-createrule.html
The same applies to triggers. If you want a particular rule to be executed first, change its name so that it comes earlier in alphabetical order.
how do I combine the two rules?
One solution is to modify your rule to perform an upsert (look right at the bottom of that page for a sample upsert). The other is to populate the counter table with initial values. The trick is to create the trigger at the same time to avoid errors. This blog post explains it really well.
While the initial setup will be slow, each individual insert will probably be faster. The two opposing factors are the slowness of a WHERE NOT EXISTS query versus the overhead of catching an exception.
Tip: A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one. Therefore, don't use EXCEPTION without need.
Source: the PostgreSQL documentation page linked above.
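For illustration, here is a minimal sketch of that exception-based upsert applied to the question's tables, written as a single trigger function in place of the two rules; it assumes a unique constraint on users_counts("user", day), and the table and column names are taken from the question:
-- Assumes: ALTER TABLE users_counts ADD UNIQUE ("user", day);
-- and that the two rules have been dropped so counts aren't doubled.
CREATE OR REPLACE FUNCTION inc_user_date_count() RETURNS trigger AS $$
BEGIN
    LOOP
        -- Try to increment an existing row for this user and day.
        UPDATE users_counts
        SET counter = counter + 1
        WHERE "user" = NEW."user" AND day = NEW.date_::date;
        IF FOUND THEN
            RETURN NEW;
        END IF;
        -- No row yet: insert one; if another session beat us to it,
        -- the unique constraint fires and we retry the UPDATE.
        BEGIN
            INSERT INTO users_counts("user", counter, day)
            VALUES (NEW."user", 1, NEW.date_::date);
            RETURN NEW;
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- loop back and try the UPDATE again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER inc_user_date_count
AFTER INSERT ON main_table
FOR EACH ROW EXECUTE PROCEDURE inc_user_date_count();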
Scenario:
I am displaying a table of records. It initially displays the first 500 with "show more" at the bottom, which returns the next 500.
Issue:
If, between the initial display and clicking "show more", one record is added, that will cause "order by date, offset 500, limit 500" to overlap the previous page by 1 row.
I'd like to "order by date, offset until 'id of last row shown', limit 500"
My row IDs are UUIDs. I am open to alternative approaches that achieve the same result.
If you can order by ID, you can paginate using
where id > $last_seen_id limit 500
but that's not going to be useful where you're sorting by date.
Sort stability!
I really hope that "date" actually means "timestamp" though, otherwise your ordering will be unstable and you can miss rows in pagination; you'll have to order by date, id to get stable ordering if it's really a date, and should probably do so even for timestamp.
State on client
One option is to push the state out to the client. Have the client remember the last-seen (date,id) tuple, and use:
where (date, id) > ($last_seen_date, $last_seen_id) limit 500
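Spelled out as a complete query, a minimal sketch with placeholder table and column names ($-prefixed values stand for bound parameters):
-- Keyset pagination: the row-value comparison also picks up rows that
-- share the last seen date but have a higher id, which separate
-- "date > ... AND id > ..." conditions would skip.
SELECT id, date, payload
FROM items
WHERE (date, id) > ($last_seen_date, $last_seen_id)
ORDER BY date, id
LIMIT 500;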
Cursors
Do you care about scalability? If not, you can use a server-side cursor. Declare the cursor for the full query, without the LIMIT. Then FETCH chunks of rows as requested. To do this your app must have a way to consistently bind a connection to a specific user's requests, though, and not to reset that connection or return it to the pool between requests. This might not be practical with your pool/framework, but is probably the best solution if you can do it.
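A minimal sketch in SQL, with placeholder table and column names; the cursor is declared WITH HOLD so it survives the transaction, and the connection must stay pinned to the user's session:
-- Declare a holdable server-side cursor over the full, un-LIMITed query.
BEGIN;
DECLARE item_page CURSOR WITH HOLD FOR
    SELECT id, date, payload
    FROM items
    ORDER BY date, id;
COMMIT;

-- Each "show more" request fetches the next chunk on the same connection:
FETCH FORWARD 500 FROM item_page;

-- Release it when the user's session ends:
CLOSE item_page;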
Temp tables
Another even less scalable option is to CREATE TABLE sessiondata.myuser_myrequest_blah AS SELECT .... then paginate that table. It's guaranteed not to change. This avoids the difficulty of needing to keep a consistent connection across requests, but will have a very slow first-request response time and is completely impractical for large user counts or large amounts of data.
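A sketch of that approach, reusing the placeholder name from above and assuming a sessiondata schema exists (the source table and columns are also placeholders); the snapshot is built once per session and then paged with plain LIMIT/OFFSET, since it can no longer change underneath you:
-- Materialize the result set once for this user's session.
CREATE TABLE sessiondata.myuser_myrequest_blah AS
    SELECT id, date, payload
    FROM items
    ORDER BY date, id;

-- Each "show more" request pages through the frozen snapshot:
SELECT *
FROM sessiondata.myuser_myrequest_blah
ORDER BY date, id
LIMIT 500 OFFSET 500;

-- Drop it when the session ends:
DROP TABLE sessiondata.myuser_myrequest_blah;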
Related questions
Handling paging with changing sort orders
Using "Cursors" for paging in PostgreSQL
How to provide an API client with 1,000,000 database results?
I think you can use a subquery in the WHERE clause to accomplish this.
E.g., given you're paginating through a users table and you want the records after a given user:
SELECT *
FROM users
WHERE created_at > (
    SELECT created_at
    FROM users
    WHERE users.id = '00000000-1111-2222-3333-444444444444'
    LIMIT 1
)
ORDER BY created_at DESC
LIMIT 5;
This is similar to this question, which doesn't have any answers. I've read all about how to use cursors with the Twitter, Facebook, and Disqus APIs, and also this article about how Disqus generally built their cursors, but I still cannot seem to grok the concept of how they work and how to implement a similar solution in my own projects. Can someone explain specifically the different techniques and the concepts behind them?
Let's first understand, with an example, why offset pagination fails for large data sets.
Clients provide two parameters: limit for the number of results, and offset for the page offset.
For example, with offset = 40, limit = 20, we can tell the database to return the next 20 items, skipping the first 40.
Drawbacks:
Using LIMIT OFFSET doesn't scale well for large datasets. As the offset increases the farther you go within the dataset, the database still has to read up to offset + count rows from disk before discarding the offset and only returning count rows.
If items are being written to the dataset at a high frequency, the page window becomes unreliable, potentially skipping or returning duplicate results.
How do cursors solve this?
Cursor-based pagination works by returning a pointer to a specific item in the dataset. On subsequent requests, the server returns results after the given pointer.
In this case the client provides two parameters: next_cursor and limit.
Let's assume we want to paginate from the most recent user to the oldest user. When the client requests the first page, suppose we select it with this query:
SELECT * FROM users
WHERE team_id = %team_id
ORDER BY id DESC
LIMIT %limit
Here limit is set to the client's limit plus one, to fetch one more result than the count the client specified. The extra result isn't returned in the result set, but we use its ID as the next_cursor.
The response from the server would be:
{
"users": [...],
"next_cursor": "1234", # the user id of the extra result
}
The client would then provide next_cursor as cursor in the second request.
SELECT * FROM users
WHERE team_id = %team_id
AND id <= %cursor
ORDER BY id DESC
LIMIT %limit
With this, we’ve addressed the drawbacks of offset based pagination:
Instead of the window being calculated from scratch on each request based on the total number of items, we’re always fetching the next count rows after a specific reference point. If items are being written to the dataset at a high frequency, the overall position of the cursor in the set might change, but the pagination window adjusts accordingly.
This will scale well for large datasets. We’re using a WHERE clause to fetch rows with id values less than the last id from the previous page. This lets us leverage the index on the column and the database doesn’t have to read any rows that we’ve already seen.
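If such an index doesn't already exist, a composite index matching the WHERE and ORDER BY above would look something like this (the index name is an assumption; the table and columns follow the example query):
-- Lets the cursor query seek to (team_id, id) and scan backwards
-- to satisfy ORDER BY id DESC without touching already-seen rows.
CREATE INDEX idx_users_team_id_id ON users (team_id, id);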
For a detailed explanation you can read this wonderful engineering article from Slack!
Here is an article about pagination: paginating-real-time-data-cursor-based-pagination
Cursors – we need to have at least one column with unique sequential values to implement cursor based pagination. This can be similar to Twitter’s max_id parameter or Facebook’s after parameter.
In general you should pass the current item or page number in the request as a parameter. Another usual parameter is the batch size of the page. Then, on the server-side backend, you select and return the proper dataset, with an SQL query for example.
Here's what I ended up with. The cursor works as a pointer: it points to an index position, and limit picks that many rows starting from that pointer. Say we are given id 10 and limit 5: it goes to id 10 and picks 5 elements from there.
Some Graph API connections use cursors by default. You can use the 'limit' and 'before'/'after' parameters in your call. If you are still not clear, you can post your code here and I can explain using it.
I need to set up WorldShip to pull from one of our Postgres databases. I need the packages to be sorted by id. I have no way (that I am aware of) of having WorldShip send an ORDER BY clause, so I need the default order of the returned records to be by id.
On a second note, I have no idea how Postgres sorts by default. It looks like it is by the last time the record was changed: if I write two records with ids 1 and 2 and then change record 2, running the query returns record 2 first.
Rows are returned in an unspecified order, per the SQL spec, unless you add an ORDER BY clause. In Postgres, that means you'll basically get rows in the order that live rows are read from disk.
If you want a consistent order without needing to add an order by clause, create a view as suggested in Jack's comment.
There is no such thing as a "default sort". Rows in a table are not sorted.
You could fake this with a view (as suggested by Jack Maney); other than that, there is no way you can influence the order of the rows that are returned.
But if you do that, be aware that adding an additional ORDER BY to a SELECT based on that view will sort the data twice.
Another option might be to run the CLUSTER command on that table to physically order the rows on disk according to the column you want. But this still does not guarantee that the rows are returned in that order, not even with a plain SELECT * FROM your_table (though chances are reasonably high that they will be).
You will need to re-run this statement on a regular basis because the order created by the CLUSTER command is not automatically maintained.
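For illustration, a minimal sketch with placeholder table and index names (CLUSTER requires an existing index on the column you want the physical order to follow):
-- Index on the ordering column, then cluster the table on it.
CREATE INDEX your_table_id_idx ON your_table (id);
CLUSTER your_table USING your_table_id_idx;

-- Re-run periodically: later inserts and updates are not kept in order.
CLUSTER your_table;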
For what it's worth, which probably isn't much, from my testing, it appears that PostgreSQL's "default" ordering is based on the time the records were last updated. The most recently updated records will appear last. Note that I couldn't find any documentation to support this. It's just what I've found from my own testing.
You could eventually use a sorted index, which should guarantee the order of retrieved rows in case the query plan hits the index (or if you force it), but this approach would be more than circuitous :). An ORDER BY clause is the way to go, as mentioned already.
This is my first time using SQL at all, so this might sound basic. I'm making an iPhone app that creates and uses an sqlite3 database (I'm linking against libsqlite3.dylib as well as importing "sqlite3.h"). I've been able to correctly create the database and a table in it, but now I need to know the best way to get stuff back out of it.
How would I go about retrieving all the information in the table? It's very important that I be able to access each row in the order that it is in the table. What I want to do (if this helps) is get all the info from the various fields in a single row, put all that into one object, and then store the object in an array, and then do the same for the next row, and the next, etc. At the end, I should have an array with the same number of elements as I have rows in my sql table. Thank you.
My SQL is rusty, but I think you can use SELECT * FROM myTable and then iterate through the results. You can also use a LIMIT/OFFSET(1) structure if you do not want to retrieve all elements at once from your table (for example due to memory concerns).
(1) Note that this can perform unexpectedly badly, depending on your use case. Look here for more info...
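As a rough sketch of the LIMIT/OFFSET idea, with a placeholder table name; pairing it with an ORDER BY keeps the chunks consistent between queries:
-- Fetch the table in chunks of 100 rows; bump OFFSET by 100 each pass
-- until no rows come back.
SELECT * FROM myTable ORDER BY rowid LIMIT 100 OFFSET 0;
SELECT * FROM myTable ORDER BY rowid LIMIT 100 OFFSET 100;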
How would I go about retrieving all the information in the table? It's very important that I be able to access each row in the order that it is in the table.
That is not how SQL works. Rows are not kept in the table in a specific order as far as SQL is concerned. The order of rows returned by a query is determined by the ORDER BY clause in the query, e.g. ORDER BY DateCreated, or ORDER BY Price.
But SQLite has a rowid virtual column that can be used for this purpose. It reflects the sequence in which the rows were inserted. Except that it might change with a VACUUM. If you make it an INTEGER PRIMARY KEY it should stay constant.
order by rowid
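A minimal sketch of both pieces, with a hypothetical table; declaring id as INTEGER PRIMARY KEY makes it an alias for the rowid, so the value stays stable across VACUUM:
-- Hypothetical table: id becomes an alias for the rowid.
CREATE TABLE items (
    id   INTEGER PRIMARY KEY,
    name TEXT
);

-- Retrieve every row in insertion order.
SELECT id, name FROM items ORDER BY rowid;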