Pagination and listing in APIs - PostgreSQL

I want to ask about lists and pagination in APIs.
I want to build a long list on the home screen, which means this request will get a lot of traffic because it's the main screen, and I want to build it in a way that handles that traffic well.
I've been researching how to implement it.
Can I depend on PostgreSQL for pagination, or do I need to use a search engine like Solr?
If I depend on the database and users start visiting the app, this request will submit a lot of queries to the database. Is that going to kill the database?
I'm also using Redis to cache some data, which will absorb some of the traffic, but the problem with the home screen is that the response is too large and I can't cache the whole response under one key in Redis.
Can anyone explain the best way to implement pagination for this request? The only thing I want is pagination; I'm not looking to implement full-text search. I only read that a search engine would handle the traffic so it doesn't hit or kill the database.
Thanks a lot :D

You can do this with the plain pagination features in PostgreSQL. PostgreSQL has enough functions and capabilities for this (LIMIT, OFFSET, FETCH).
But let me give you a recommendation.
There are several types of pagination.
The first type is where the number of pages must be known in advance (classic page-numbered pagination). This approach is dated and not recommended, because it requires knowing the total count of records in the table, and counting records is a slow operation, especially in large tables.
The second type is where the number of pages is not known in advance: the next page is fetched in chunks only when it is actually needed, just like Google, LinkedIn and other big companies do it. In this case there is no need to count the records of any table.
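Here's a rough sketch of that second approach (keyset pagination) in Python with psycopg2. The posts table, its created_at and id columns, and the page size are placeholders, not something from your actual schema:

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # placeholder connection settings

def first_page(limit=20):
    """First page: newest posts first, no OFFSET and no COUNT(*)."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, title, created_at FROM posts "
            "ORDER BY created_at DESC, id DESC LIMIT %s",
            (limit,),
        )
        return cur.fetchall()

def next_page(last_created_at, last_id, limit=20):
    """Keyset ('seek') pagination: continue after the last row the client saw.

    Cost stays flat however deep users page, provided there is an index
    on (created_at, id).
    """
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, title, created_at FROM posts "
            "WHERE (created_at, id) < (%s, %s) "
            "ORDER BY created_at DESC, id DESC LIMIT %s",
            (last_created_at, last_id, limit),
        )
        return cur.fetchall()
```

The client keeps the created_at and id of the last row it received and sends them back as the cursor for the next page.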

Related

Find out how long the cache has been stored in IndexedDB?

I am using a web application for data entry which has a mechanism for storing the data entry form (an HTML form) in the browser's IndexedDB cache.
I am able to see the form in the browser dev tools.
I want to know how long IndexedDB will be able to store the form in the browser. Is it possible that the cached entry has stayed the same for months? Will closing the browser clear the keys, or is this storage persistent enough to last a few months?
Is it possible to find out when (the exact date or time) the cache entry was made in IndexedDB?
I am asking this because I suspect some discrepancy in the form for some of our users, as the data being sent is a little different than expected.
Any help is appreciated.
Thanks
DHIS2, the application you are referring to, has an application you and other users can use to clear any cached data. This app is named "Browser Cache Cleaner", and gives you a list of different things to clear. I would try this app and see if your users still have these issues.
Databases don't expose the timestamp of when a record was last modified. That is something the developer needs to make the application store in the records itself. For example, one could have created_at and modified_at columns to track when the record was created and when it was last modified.
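As a minimal illustration of that suggestion (using pymongo purely as an example store, since the idea is the same anywhere), you stamp the record yourself on every write; the collection and field names here are made up:

```python
from datetime import datetime, timezone
from pymongo import MongoClient

forms = MongoClient()["app"]["cached_forms"]  # placeholder database/collection

def save_form(form_id, payload):
    now = datetime.now(timezone.utc)
    forms.update_one(
        {"_id": form_id},
        {
            "$set": {"payload": payload, "modified_at": now},   # updated on every write
            "$setOnInsert": {"created_at": now},                # only set the first time
        },
        upsert=True,
    )
```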
IndexedDB is a persistent client storage API, so yes, data will stay permanently unless the user clears the browser's cache.
If there is some discrepancy in the form being sent, I would look at the caching strategy. Offline data caching is a pretty broad topic (and I don't know much about your application), but Google's Offline Cookbook is a good place to start digging into this topic, as well as into caching strategies for your use case.

How to implement a "number of views" counter for a particular page

So basically I want to implement the same functionality as StackOverflow's:
viewed 59344 times
So here is some background information:
I want to count only unique visits. The assumption is that registered users will read the article many times (it is evolving)
I use MongoDB as a store
I would like it to be close to real-time
My system will have a registration, but I want to count the views of anonymous users as well
I understand that the best way to count unique visits is through registration, but the thing is that a big chunk of users will be just passive readers who do not need to create an account to read the information in the application. As far as I understand, the most convenient way is to save the IP address of every user who reads the post. I also understand that IP addresses will not provide uniqueness (different users may share the same IP because they are behind the same ISP, and one user can have different IPs by using proxies, Tor, etc.).
The use of Mongo is not absolutely essential; it's just that everything is written in Mongo right now, so I will switch only if the alternative is much faster or more convenient.
Background
Are you certain you need to track "unique" views?
I actually wouldn't expect popular sites to try to keep the view counts unique - bigger is better, and re-visits for new comments are still additional "views" in the sense of showing new content/comments/ads. There are other possible subtleties to "correctness" that may or may not be important for your use case, such as excluding crawlers or your own company's users/IPs.
Instead of spending time tracking unique views (which isn't overly meaningful), I would look at counting unique user interactions such as voting/liking/commenting on the page. You can then determine the "popularity" of a page with some formula based on those metrics. There is an interesting example of this approach in the Radioactivity module for Drupal, where a "hotness" metric is calculated from the recency of user interactions.
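If you go that route, a recency-weighted score can be as simple as an exponential decay over interactions. This is only a sketch of the general idea, not the Radioactivity module's actual formula; the weights and half-life are made up:

```python
import math
import time

# Illustrative parameters, not tuned values.
WEIGHTS = {"view": 1.0, "vote": 3.0, "comment": 5.0}
HALF_LIFE_SECONDS = 60 * 60 * 24  # an interaction loses half its weight per day

def hotness(interactions, now=None):
    """interactions: iterable of (kind, unix_timestamp) tuples."""
    now = now or time.time()
    score = 0.0
    for kind, ts in interactions:
        age = max(now - ts, 0.0)
        decay = math.exp(-math.log(2) * age / HALF_LIFE_SECONDS)
        score += WEIGHTS.get(kind, 0.0) * decay
    return score
```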
Approaches to consider
1) For a simple view counter in MongoDB, I would just use $inc to bump up the view count when the page is loaded. You can exclude users from logging by role as needed (for example, admin users).
2) For a more accurate view counter I would pass the problem off to a web analytics platform (which you should be using with your site for more detailed analysis anyway). For example, you can use the Google Analytics API or an open-source application like Piwik. Web analytics systems already have solutions in place for determining unique users/views, and the API calls for these can be made asynchronously via JavaScript.
3) If implementing your own unique view tracking is a definite requirement, I would use a separate collection for tracking views and upsert based on your uniqueness criteria (a unique view per (user, article) pair for registered users or per (session_id, article) pair for anonymous users). I would combine this with approach #1 by incrementing the article's view counter only when the upsert results in an insert (sketched below).
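Here is roughly what that combination could look like with pymongo; the collection names (article_views, articles) and the viewer_key field are assumptions for illustration:

```python
from datetime import datetime, timezone
from pymongo import MongoClient

db = MongoClient()["app"]  # placeholder database
# One-time index so each (article, viewer) pair can only exist once.
db.article_views.create_index([("article_id", 1), ("viewer_key", 1)], unique=True)

def record_view(article_id, user_id=None, session_id=None):
    # Uniqueness criterion: (user, article) for registered users,
    # (session_id, article) for anonymous ones.
    viewer_key = f"user:{user_id}" if user_id else f"session:{session_id}"
    result = db.article_views.update_one(
        {"article_id": article_id, "viewer_key": viewer_key},
        {"$setOnInsert": {"first_seen": datetime.now(timezone.utc)}},
        upsert=True,
    )
    # Only bump the denormalised counter (approach #1) when the upsert
    # actually inserted a new document, i.e. a first-time view.
    if result.upserted_id is not None:
        db.articles.update_one({"_id": article_id}, {"$inc": {"views": 1}})
```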
One way you can solve the problem is with cookies: once a user has visited the page, you can add a cookie saying that he has already visited the page and you do not need to count him again. You can keep appending keys to it to track which pages he has visited. I know cookies can be deleted, but every solution has a trade-off.
From the MongoDB perspective, if you want very fast inserts and reads, I would suggest a couple of things you can do.
1) As you create an article, create a document like this in, say, a log collection:
{"_id" : "Article URL" , {"Hit" : 0}}
The reason I am not suggesting adding IP addresses or any other information is that as you add IP addresses, the size of the document keeps changing and MongoDB needs to find newly allocated space for it, which is bad from a performance angle. If you only increment the counter, the document does not grow and does not need to be moved. Plus, there is a limit on the maximum size a document can have.
2) Creating the document in advance lets you issue a direct update statement, with no need to check whether a document already exists for that article ID.
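As a sketch of those two points with pymongo (the hits collection name and the URL key are just examples):

```python
from pymongo import MongoClient

hits = MongoClient()["app"]["hits"]  # placeholder log collection

def create_article(url):
    # Pre-create the fixed-size counter document, as suggested above,
    # so later updates are in-place increments with no document growth.
    hits.insert_one({"_id": url, "Hit": 0})

def count_hit(url):
    # Direct update; no existence check needed because the document
    # was created together with the article.
    hits.update_one({"_id": url}, {"$inc": {"Hit": 1}})
```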

Caching repeating query results in MongoDB

I am going to build a page that is designed to be "viewed" a lot, but far fewer users will "write" into the database. For example, only 1 in 100 users may post his news on my site, and the rest will just read the news.
In the above case, 100 identical queries will be performed when they visit my homepage, while the actual database changes very little. Actually 99 of those queries are a waste of computer power. Are there any methods that can cache the results of the first query and, when they detect the same query within a short time, deliver the cached result?
I use MongoDB and Tornado. However, some posts say that MongoDB does not do caching.
Serving static, cached HTML with something like Nginx is not preferred, because I want Tornado to render a personalized page each time.
I use MongoDB and Tornado. However, some posts say that MongoDB does not do caching.
I don't know who said that, but MongoDB does have a way to cache queries; in fact it uses the OS's LRU cache, since it does not do memory management itself.
As long as your working set fits into the LRU without the OS having to page it out or swap constantly, you should be reading this query from memory most of the time. So, yes, MongoDB can cache, but technically it doesn't; the OS does.
Actually 99 of those queries are a waste of computer power.
Caching mechanisms for these kinds of problems are the same across most technologies, whether MongoDB or SQL. Of course, this only matters if it actually is a problem; you are probably micro-optimising if you ask me, unless you get Facebook, Google or YouTube levels of traffic.
Caching is a huge subject in itself, ranging from caching query results in a pre-aggregated MongoDB collection, Memcache, Redis, etc., to caching HTML and other web resources so that the server does as little work as possible.
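For completeness, a typical query-result cache of that kind is just cache-aside with a short TTL. This is only a sketch; the Redis key, the 30-second TTL, and the news collection with its posted_at field are assumptions:

```python
import json
import redis
from pymongo import MongoClient

cache = redis.Redis()                      # placeholder Redis connection
news = MongoClient()["app"]["news"]        # placeholder collection

def latest_news(limit=20, ttl=30):
    key = f"latest_news:{limit}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)          # cache hit: skip the database entirely
    docs = list(news.find({}, {"_id": 0}).sort("posted_at", -1).limit(limit))
    cache.set(key, json.dumps(docs, default=str), ex=ttl)  # expire after ttl seconds
    return docs
```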
Personally, as I said, your scenario sounds as though you are thinking about the wasted computing power the wrong way. Even if you were to cache this query in another collection or technology, you would probably use about the same amount of power and resources retrieving the result from it as if you just didn't bother. That said, this assumption comes down to you having the right indexes, schema, set-up, etc.
I recommend you read some links on good schema design and index creation:
http://docs.mongodb.org/manual/core/indexes/
https://docs.mongodb.com/manual/core/data-model-operations/#large-number-of-collections
Serving static, cached HTML with something like Nginx is not preferred, because I want Tornado to render a personalized page each time.
Yeah, I think that by worrying about query caching you are prematurely optimising, especially if you don't want to take away what would be 90% of the load on your server each time: loading the page itself.
I would focus on your schema and indexes and then worry about caching if you really need it.
The author of the Motor (MOngo + TORnado) package gives an example of caching his list of categories here: http://emptysquare.net/blog/refactoring-tornado-code-with-gen-engine/
Basically, he defines a global list of categories and queries the database to fill it in; then, whenever he needs the categories in his pages, he checks the list: if it exists, he uses it; if not, he queries again and fills it in. He has it set up to invalidate the list whenever he inserts into the database, but depending on your usage you could create a global timeout variable to keep track of when you need to re-query next. If you're doing something complicated, this could get out of hand, but if it's just a list of the most recent posts or something, I think it would be fine.
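A rough sketch of that pattern with present-day Motor/Tornado (the categories collection, the handler, and the database wiring are assumptions, not the blog post's actual code):

```python
import motor.motor_tornado
import tornado.web

db = motor.motor_tornado.MotorClient()["app"]  # placeholder database
_categories_cache = None                       # module-level cache shared by all requests

async def get_categories():
    global _categories_cache
    if _categories_cache is None:
        # Cache miss: query once and keep the result around.
        _categories_cache = await db.categories.find({}, {"_id": 0}).sort("name", 1).to_list(None)
    return _categories_cache

async def add_category(doc):
    global _categories_cache
    await db.categories.insert_one(doc)
    _categories_cache = None  # invalidate; the next read re-queries

class HomeHandler(tornado.web.RequestHandler):
    async def get(self):
        self.write({"categories": await get_categories()})
```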

Realtime backend platform for reporting / dashboards?

I will build a dashboard system for my apps, where a page will have several widgets that draw charts, tables and glyphs representing potentially unrelated data.
The client will be HTML5 and I can target only modern web browsers.
My big problem is which backend to use for this. I want to store "tables" for use in the charts and update the widgets in real time.
For example, an invoicing widget will show how much money has been collected today. The "table" will have a row for each invoice's total:
inv = 1; total = 50
Total: 50
and the widget will draw that. When new data is pushed:
inv = 2; total = 100
Total: 150
The widget will show the total to the end user in real time.
The data is private to the user's company. Eventually I will need to purge data that is too old (i.e. I only need to keep as much data as is necessary for the end user to properly evaluate the information; for example, keep only 1 month of invoicing totals).
I'm thinking of using something like http://www.firebase.com/ or http://pusher.com/, but I suspect they only solve the "notify in realtime" part of the equation. As far as I understand, they don't let me get past data (i.e. if the data is updated over the weekend and the user then opens his dashboard to see what happened).
Then I saw http://derbyjs.com/ and the possibility of using MongoDB.
I wonder which backend/platform will bring me closest to building this system. I have experience with Python/Django/.NET/Postgres but would accept something else if it is a better fit for this kind of app behaviour.
Firebase offers both the "notify in realtime" part that you mention and persistent data storage. Take a look at the tutorial, which walks you through building a real-time persisted chat app (the past chat messages are stored in Firebase and are sent back to the client every time you reload). And you can do much more complicated things, like the real-time charts / widgets that you mention, as well.
The big limitation with Firebase right now is that we're in closed beta and the data is currently unprotected (anybody can read and write your data). The security features are coming soon though.
Some other backend platforms you may want to evaluate are: Meteor and Simperium. Firebase and Simperium are cloud services where your data is stored in the cloud and you don't have to manage any servers of your own, while Meteor and DerbyJS are platforms that you have to install and run on your own server.
I would recommend SignalR. It's amazing and you can do almost anything with it. Check it out: www.signalr.net, and if you have any problems simply go to www.jabbr.net; you will find a very helpful community there. I implemented a notification mechanism similar to Facebook's, together with real-time monitoring and a small chat, on the same web site.

Cloudant / CouchDB "pull" replication for 600+ documents to iPhone

I'm using Cloudant and I'm struggling to pull/replicate 600 documents from the server to my iPhone. First, it's pretty slow because it has to go one document at a time, and second, Cloudant was giving me "timeouts" after the 100th or so REST request. (I have a ticket with Cloudant about this one, as it's unacceptable!)
I was wondering if anyone has found a way / hack to "bulk" replicate when pulling. I was thinking, perhaps it's possible to "zip up" all of the changes, send them in one file, and fast-forward the iPhone database to the last-change seq.
Any help is great -- thanks!
Can you not hit _all_docs?include_docs=true to get everything in one shot? http://wiki.apache.org/couchdb/HTTP_Document_API#all_docs
I don't know CouchCocoa, but it looks like the API supports this: http://couchbaselabs.github.com/CouchCocoa/docs/interfaceCouchDatabase.html#a49d0904f438587b988860891e8049885
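For what it's worth, the _all_docs call itself is a single HTTP request; here is a sketch using Python's requests just to show the shape of it (the account, database name, and credentials are placeholders):

```python
import requests

BASE = "https://ACCOUNT.cloudant.com/mydb"  # placeholder account and database

resp = requests.get(
    f"{BASE}/_all_docs",
    params={"include_docs": "true"},        # pull the full documents, not just ids
    auth=("API_KEY", "API_SECRET"),         # placeholder credentials
)
resp.raise_for_status()
docs = [row["doc"] for row in resp.json()["rows"]]
print(f"fetched {len(docs)} documents in a single request")
```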
Actually, why not make a view? Make a view that gives you your list and make sure your ID is in it. With the ID, you can then go to the document and get all the rest of the information you need in order to update it, if you need to.
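A sketch of that view-based approach over plain HTTP (the design-doc and view names are made up for illustration):

```python
import requests

BASE = "https://ACCOUNT.cloudant.com/mydb"   # placeholder account and database
AUTH = ("API_KEY", "API_SECRET")             # placeholder credentials

# One-time setup: a design document whose map function emits each document's id.
design_doc = {
    "_id": "_design/listing",
    "views": {"all_ids": {"map": "function(doc) { emit(doc._id, null); }"}},
}
requests.put(f"{BASE}/_design/listing", json=design_doc, auth=AUTH)

# Query the view; include_docs=true pulls the full documents alongside the keys.
resp = requests.get(
    f"{BASE}/_design/listing/_view/all_ids",
    params={"include_docs": "true"},
    auth=AUTH,
)
rows = resp.json()["rows"]
```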
There really is no reason you would ever need to hit every document individually. They have views and Search 2.0 for that. Keep in mind you are using a cloud-based service. This stuff is not sitting in your basement; you can't just hit it a million times per device in a few seconds and expect no one to notice and/or get upset (an exaggeration, yes, I know).
What I do not understand is why you are trying to replicate it to an iPhone. Are you running Apache and CouchDB in your app? Why not just read the JSON data and throw it into a local database, or just throw it into a file if it updates that often and keep overwriting it? There are so many options that are a whole lot less messy.