If I am using Varnish to cache entire documents, what mechanism would you advise for incrementing a page view count as well?
For example, let's suppose that I have an auction listing, such as on eBay, and I would like to cache the entire page since I know it is never going to change.
How would you then increase the page view count of this listing?
Let's say that my application is running on Zend Framework.
Would it be correct to make an ESI (Edge Side Include) to a node.js server which increments a page view count in Redis?
I'm looking for something that will be 100% supported and will yield accurate page view numbers. (I'm not concerned about duplicate requests either; I'll handle that in my application logic to prevent one IP from nuking the page view count.)
I would separate your statistics logic from your application. Use a small piece of JavaScript that requests a resource with a unique timestamp (e.g. an image like /statistics?pageId=3&ts=234234249); the timestamp makes every URL unique, so the request is never answered from a cache. You can cache your complete page (no need to bother with ESI) and have the statistics handled by a fast (multiplexing) server like Node.js, Netty, or Tornado.
If you need the page count in your page, request a small piece of JavaScript/JSON data instead of an image and update the DOM in JavaScript.
This way, you can log better statistics (e.g. dimensions of the page), you minimize traffic and keep statistics a separate concern.
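To make the idea concrete, here is a minimal sketch of such a statistics endpoint, written as a plain WSGI app in Python. It is only an illustration: the in-memory Counter, the name statistics_app, and the JSON response shape are assumptions, and a real setup would more likely keep the counts in something like Redis.

```python
import json
from collections import Counter
from urllib.parse import parse_qs

# In-memory store for illustration only (assumption); counts are lost on
# restart, so a real deployment would use a persistent store instead.
view_counts = Counter()


def statistics_app(environ, start_response):
    """WSGI app: count one view for ?pageId=... and return the new total."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    page_id = params.get("pageId", [None])[0]
    if page_id is None:
        start_response("400 Bad Request", [("Content-Type", "application/json")])
        return [b'{"error": "pageId is required"}']

    view_counts[page_id] += 1
    body = json.dumps({"pageId": page_id, "views": view_counts[page_id]}).encode()
    headers = [
        ("Content-Type", "application/json"),
        # Ask caches not to store the counter response.
        ("Cache-Control", "no-store"),
    ]
    start_response("200 OK", headers)
    return [body]
```

The cached page would request /statistics?pageId=3&ts=... from JavaScript and write the returned "views" value into the DOM. For local experiments the app could be served with wsgiref.simple_server.make_server.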
I want to ask about lists and pagination in APIs.
I want to build a long list for the home screen. Because it is the main screen, this request will get a lot of traffic, and I want to build it in a way that can handle that traffic.
I have been researching how to implement it.
Can I rely on PostgreSQL for pagination, or do I need a search engine like Solr?
If I rely on the database and users start visiting the app, this request will send a lot of queries to the database. Will that kill the database?
I'm also using Redis to cache some data, which will absorb some of the traffic, but the home screen response is too large and I can't cache all of it under one key in Redis.
Can anyone explain the best way to implement pagination for this request? Pagination is the only thing I need; I'm not looking to implement full-text search. I only read that a search engine would handle the traffic so that it doesn't affect or kill the database.
Thanks a lot :D
You can do this comfortably with PostgreSQL's built-in pagination features (LIMIT, OFFSET, FETCH).
But let me give you a recommendation.
There are several types of pagination.
In the first type, the number of pages must be known in advance. This approach is outdated and not recommended, because it requires knowing the number of records in the table, and counting records is very slow, especially on large tables.
In the second type, the number of pages is not known in advance. The next page is fetched only when it is needed, just as Google, LinkedIn and other big companies do. In this case, there is no need to count the rows of any table.
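One common way to implement the second type is keyset (cursor) pagination, where each request asks for the rows after the last id the client has seen. The answer does not name a specific technique, so treat this as one possible reading. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL so it runs on its own; the items table, its columns, and the fetch_page helper are hypothetical.

```python
import sqlite3


def fetch_page(conn, after_id=0, page_size=3):
    """Return up to page_size rows whose id is greater than after_id."""
    rows = conn.execute(
        "SELECT id, title FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()
    # A full page may have a successor, so hand back the last id as the
    # cursor; a short page means there is nothing more to fetch.
    next_cursor = rows[-1][0] if len(rows) == page_size else None
    return rows, next_cursor


# Hypothetical sample data: seven rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO items (id, title) VALUES (?, ?)",
    [(i, f"item {i}") for i in range(1, 8)],
)

first_page, cursor = fetch_page(conn)
second_page, cursor2 = fetch_page(conn, after_id=cursor)
```

No COUNT(*) is ever issued, and because the query filters on the indexed id instead of skipping rows with OFFSET, later pages should cost roughly the same as the first one.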
I have a requirement to show around 24k records with 84 columns in one go, because the user wants to filter on the entire data set.
So can we use virtual scrolling with ag-grid without lazy loading? If so, could you please help here? Any examples are most welcome for reference.
Having tried this sort of thing with a similar number of rows and columns, I've found that it's just about impossible to get reasonable performance, especially if you are using things like "framework" renderers. And if you enable grouping, you're going to have a bad time.
What my team has done to enable filtering and sorting across an entire large dataset includes:
We used the client-side row model - the grid's simplest mode
We only load a "page" of data at a time. This involves trial and error with a reasonable sample of data and the actual features that you are using to arrive at the maximum page size that still allows the grid to perform well with respect to scrolling / rendering.
We implemented our own paging. This includes display of a paging control, and fetching the next/previous page from the server. This obviously requires server-side support. From an ag-grid point of view, it is only ever managing one page of data. Each page gets completely replaced with the next page via round-trip to the server.
We implemented sorting and filtering on the server side. When the user sorts or filters, we catch the event, and send the sort/filter parameters to the server, and get back a new page. When this happens, we revert to page 0 (or page 1 in user parlance).
This fits in nicely with support for non-grid filters that we have elsewhere in the page (in our case, a toolbar above the grid).
We only enable grouping when there is a single page of data, and encourage our users to filter their data to get down to one page of data so that they can group it. Depending on the data, page size might be as high as 1,000 rows. Again, you have to arrive at page size on a case-by-case basis.
So, in short, when we have the need to support filtering/sorting over a large dataset, we do all of the performance-intensive bits on the server side.
I'm sure that others will argue that ag-grid has a lot of advanced features that I'm suggesting that you not use. And they would be correct, for small-to-medium sized datasets, but when it comes to handling large datasets, I've found that ag-grid just can't handle it with reasonable performance.
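The server side of the approach described above (filter and sort the whole dataset, then return a single page) can be sketched roughly as follows. This is an assumption-laden illustration, not the answerer's implementation: get_page, its parameters, and the response shape are hypothetical, and a real server would normally push the filtering and sorting into the database query instead of working on an in-memory list.

```python
from typing import Any, Callable, Optional


def get_page(
    rows: list[dict[str, Any]],
    predicate: Optional[Callable[[dict[str, Any]], bool]] = None,
    sort_key: Optional[str] = None,
    descending: bool = False,
    page: int = 0,
    page_size: int = 100,
) -> dict[str, Any]:
    """Filter and sort the full dataset, then return a single page of it."""
    matched = [r for r in rows if predicate is None or predicate(r)]
    if sort_key is not None:
        matched.sort(key=lambda r: r[sort_key], reverse=descending)
    start = page * page_size
    return {
        "page": page,
        "total_rows": len(matched),  # lets the client draw its paging control
        "rows": matched[start:start + page_size],
    }


# Hypothetical data: the grid asks for page 0 of the "EU" rows, newest id first.
data = [{"id": i, "region": "EU" if i % 2 else "US"} for i in range(10)]
result = get_page(
    data,
    predicate=lambda r: r["region"] == "EU",
    sort_key="id",
    descending=True,
    page=0,
    page_size=2,
)
```

When the user changes a sort or filter, the client would call this again with page=0, matching the "revert to page 0" behaviour described above.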
I am going to build a page that is designed to be "viewed" a lot, while far fewer users will "write" to the database. For example, only 1 in 100 users may post news on my site, and the rest will just read the news.
In the above case, 100 SAME QUERIES will be performed when they visit my homepage while the actual database change is little. Actually 99 of those queries are a waste of computer power. Are there any methods that can cache the results of the first query, and when they detect the same query in a short time, can deliver the cached result?
I use MongoDB and Tornado. However, some posts say that MongoDB does not do caching.
Making a static, cached HTML with something like Nginx is not preferred, because I want to render a personalized page by Tornado each time.
"I use MongoDB and Tornado. However, some posts say that MongoDB does not do caching."
I don't know who said that, but MongoDB does keep recently read data in memory. It relies on the OS's LRU page cache, since it does not do memory management itself.
So long as your working set fits into that cache without the OS having to page it out or swap constantly, you should be reading this query from memory most of the time. So, yes, MongoDB can cache, but technically it doesn't; the OS does.
"Actually 99 of those queries are a waste of computer power."
Caching mechanisms for this kind of problem are the same across most technologies, whether MongoDB or SQL. Of course, this only matters if it is actually a problem; if you ask me, you are probably micro-optimising unless you get Facebook, Google or YouTube levels of traffic.
Caching is a huge subject, ranging from caching query results (pre-aggregated in MongoDB, Memcached, Redis, etc.) to caching HTML and other web resources so the server does as little work as possible.
Personally, as I said, I think you are looking at the wasted computing power the wrong way. Even if you were to cache this query in another collection or technology, you would probably use about the same amount of power and resources retrieving the result from there as you would by simply running the query. However, that assumption depends on you having the right indexes, schema, set-up, etc.
I recommend you read some links on good schema design and index creation:
http://docs.mongodb.org/manual/core/indexes/
https://docs.mongodb.com/manual/core/data-model-operations/#large-number-of-collections
"Making a static, cached HTML with something like Nginx is not preferred, because I want to render a personalized page by Tornado each time."
Yeah, I think that by worrying about query caching you are prematurely optimising, especially if you don't want to remove what would be about 90% of the load on your server each time: rendering the page itself.
I would focus on your schema and indexes and then worry about caching if you really need it.
The author of the Motor (MOngo + TORnado) package gives an example of caching his list of categories here: http://emptysquare.net/blog/refactoring-tornado-code-with-gen-engine/
Basically, he defines a global list of categories and queries the database to fill it in; then, whenever he needs the categories in his pages, he checks the list: if it exists, he uses it; if not, he queries again and fills it in. He has it set up to invalidate the list whenever he inserts into the database, but depending on your usage you could create a global timeout variable to keep track of when you next need to re-query. If you're doing something complicated, this could get out of hand, but if it's just a list of the most recent posts or something, I think it would be fine.
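The pattern described above (fill a cached value on first use, invalidate it on insert, or let it expire after a timeout) can be sketched like this. It is not the code from the linked post: TimedCache, its parameters, and load_recent_posts are hypothetical names, and the loader here is a plain synchronous function, whereas with Motor it would be a coroutine.

```python
import time


class TimedCache:
    """Cache one query result and refresh it after ttl_seconds."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # function that runs the real query
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._value = None
        self._expires_at = None    # None means nothing is cached yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache is empty or stale: query again and restart the timer.
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self):
        """Call after a write so the next read goes back to the database."""
        self._expires_at = None


def load_recent_posts():
    # Stand-in for a real database query (assumption).
    return ["post A", "post B"]


recent_posts = TimedCache(load_recent_posts, ttl_seconds=30)
posts = recent_posts.get()  # first call runs the loader; later calls reuse it
```

Because the personalized page is still rendered on every request, only the shared query result is cached, which fits the constraint in the question.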
So we run a downline report, which gathers everyone in the downline of the person who is logged in. Some clients run this with no problem, as it returns fewer than 100 records.
For other clients, however, it returns 4,000 - 6,000 rows, which comes out to about 8 MB worth of information. I actually had to raise the buffer limit on my development machine to handle the large request.
What are some of the best ways to store this large piece of data and help prevent it from being run multiple times consecutively?
Can it be stored in a cookie?
Session is out of the question, as this would eat up way too much memory on the server.
I'm open to pretty much anything at this point; I'm trying to streamline the old process into a much quicker, more efficient one.
Right now, it loads the entire recordset and loops through it, building the data out into return_value cells.
Would it be better to turn this into a jQuery/AJAX call?
The only main requirements are:
classic asp
jquery/javascript
T-SQL
Why not change the report to be paged?
Phase 1: run the entire query, but have the page display only the right set of rows based on the selected page. Now your response buffer problem is fixed.
Phase 2: move the paging into the query using ROW_NUMBER(). Now your database usage problem is fixed.
Phase 3: offer the user an option of "display to screen" (using the above) or "export to csv", where you can most likely export all the data, since CSV is nice and compact.
Using a cookie seems unwise, given the responses to the question "What is the maximum size of a web browser's cookie's key?": browsers typically limit a cookie to about 4 KB, far less than the 8 MB in question.
I would suggest using ASP to create a file on the web server and writing the data to that file. When the user requests the report, you can then determine whether "enough time" has passed for it to be worth running the report again, or whether the cached version is sufficient. The user's login details could presumably be used for naming the file, or the Session.SessionID, or you could store something new in the user's session. The advantage of using their login would be that your cache of the report can exist longer than a user's session.
Taking Brian's answer further: first query the page count, which is the number of records returned divided by the items per page, rounded up. Then request every page and join the results on the client side. Each page starts at an offset provided through the query. Now you have the full result set on the client without overflowing your buffer, and it can be tailored to the interface and a user option (display x per page).
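The arithmetic behind that suggestion is small enough to sketch. The thread itself is about classic ASP and jQuery, so the Python below only illustrates the logic; page_count, fetch_all, and the fake_fetch_page stand-in for the real paged request are all hypothetical.

```python
import math


def page_count(total_records, per_page):
    """Number of pages: records divided by items per page, rounded up."""
    return math.ceil(total_records / per_page)


def fetch_all(fetch_page, total_records, per_page):
    """Request each page by offset and join the results in order."""
    results = []
    for page in range(page_count(total_records, per_page)):
        results.extend(fetch_page(offset=page * per_page, limit=per_page))
    return results


# Hypothetical data source standing in for the paged report query.
records = list(range(4500))


def fake_fetch_page(offset, limit):
    return records[offset:offset + limit]


everything = fetch_all(fake_fetch_page, len(records), per_page=500)
```

Each individual response stays at one page in size, so the server-side buffer limit is never hit even though the client ends up with every row.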
We're using Crystal 11 through its web server. When we run a report, it runs the SQL query and displays the first page of the report in the Crystal web report viewer.
When you hit the next page button, it reruns the SQL query and displays the next page.
How do we stop it from requerying the data?
We also have multiple people running the same reports at the same time (it is a web server, after all), and we don't want to cache data between different instances of the same report. We only want to cache the data within each single instance of the report.
The reason to have pagination is not only a presentation concern. With pagination the single most important advantage is lazy loading of data - so that in theory, depending on given filters, you load only what you need.
Just imagine if you have millions of records in your db and you load all of them. First of all, it's going to be a hell of a lot slower; second, you're fetching a lot of stuff you don't really need. All the web models nowadays are based on lazy loading rather than bulk loading. Think about Google App Engine: you can't retrieve more than 1000 records in a given transaction from the Google Datastore, and you know that if you even try to display them all, your browser will die.
I'll close with a question - do you have a performance issue of any kind?
If so, you probably think loading everything at once will make it better, but that's probably not the case: you'll reduce the number of requests to the server, but each single query will be much more resource-consuming.
If not my advice is to leave it alone! :)