how to implement number of views of a particular page - mongodb

So basically I want to implement the same functionality as StackOverflow's:
viewed 59344 times
So here is some background information:
I want to count only unique visits. The assumption is that registered users will read the article many times (it is evolving).
I use MongoDB as a store
I would like it to be close to real-time
My system will have a registration, but I want to count the views of anonymous users as well
I understand that the best way to count unique visits is through registration, but a big chunk of users will be just passive readers who do not need to create an account to read the information from the application. As far as I understand, the most convenient way is to save the IP address of every user who reads the post. I also understand that IP addresses will not provide uniqueness (different users can share the same IP because they are behind the same ISP, and one user can have different IPs by using proxies, Tor, etc.).
The use of Mongo is not absolutely essential; it is just that everything is written in Mongo right now, so I will switch only if it will be much faster or more convenient.

Background
Are you certain you need to track "unique" views?
I actually wouldn't expect popular sites to try to keep the view counts unique - bigger is better and re-visits for new comments are still additional "views" in the sense of showing new content/comments/ads. There are other possible subtleties to "correctness" that may or may not be important for your use case, such as excluding crawlers or your own company's users/IPs.
Instead of spending time tracking unique views (which isn't overly meaningful), I would look at counting unique user interactions such as voting/liking/commenting on the page. You can then determine "popularity" of a page with some formula based on those metrics. There is an interesting example of this approach in the Radioactivity module for Drupal, where a "hotness" metric is calculated from the recency of user interactions.
Approaches to consider
1) For a simple view counter in MongoDB, I would just use $inc to bump up the view count when the page is loaded. You can exclude logging users by role as needed (for example admin users).
2) For a more accurate view counter I would pass off the problem to a web analytics platform (which you should be using with your site for more detailed analysis anyway). For example, you can use Google Analytics API or an open source application like Piwik. Web analytics systems already have solutions in place for determining unique users/views, and the API calls for these can be asynchronous via JavaScript.
3) If implementing your own unique view tracking is a definite requirement, I would use a separate collection for tracking views and upsert based on your uniqueness criteria (one view per (user, article) pair for registered users, or per (session_id, article) pair for anonymous users). I would combine this with approach #1 by incrementing the article's view counter only if the upsert results in an insert.
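A minimal sketch of approach 3 combined with approach 1, assuming pymongo-style collection objects are passed in. The collection roles (`views`, `articles`), the field names (`article_id`, `viewer`, `first_seen`, `views`) and the function `record_view` are hypothetical; `viewer` would be a user id for registered users or a session id for anonymous ones. It also assumes a unique index on the (`article_id`, `viewer`) pair so that concurrent upserts cannot create duplicates.

```python
from datetime import datetime, timezone


def record_view(views, articles, article_id, viewer):
    """Count a view only the first time this viewer sees this article.

    `views` and `articles` are assumed to be pymongo-style collections.
    Returns True when the view was new and the counter was incremented.
    """
    result = views.update_one(
        {"article_id": article_id, "viewer": viewer},
        {"$setOnInsert": {"first_seen": datetime.now(timezone.utc)}},
        upsert=True,
    )
    # upserted_id is None when a matching document already existed.
    if result.upserted_id is None:
        return False
    # First visit for this pair: bump the simple counter from approach 1.
    articles.update_one({"_id": article_id}, {"$inc": {"views": 1}})
    return True
```

Reading the count for display then stays a single lookup on the article document, while the separate collection only grows by one small document per unique viewer.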

One way to solve the problem is with cookies: once a user has visited the page, you set a cookie saying that they have already visited it, so you do not need to count them again. You can keep appending keys to the cookie to record which pages they have visited. I know cookies can be deleted, but any solution will involve a trade-off.
From the MongoDB perspective, if you want very fast inserts and reads, I would suggest a couple of things.
1) When you create an article, also create a document like this, for example in a log collection:
{"_id" : "Article URL", "Hit" : 0}
I am not suggesting storing IP addresses or any other information in this document because, as you add IP addresses, the document grows and MongoDB may need to move it to newly allocated space, which is bad for performance. If you only increment the counter, the document does not grow and does not need to be moved. There is also a limit on the maximum size of a document.
2) Creating the document in advance lets you issue a plain update statement without having to check whether a document for the article ID already exists.

Related

Pagination and listing in APIs

I want to ask about lists and pagination in APIs.
I want to build a long list on the home screen. Because it is the main screen, this request will get a lot of traffic, and I want to build it in a way that can handle that traffic.
I have been searching for how to implement it.
Can I rely on PostgreSQL for pagination, or do I need to use a search engine like Solr?
If I rely on the database and users start visiting the app, this request is going to send a lot of queries to the database. Is this going to kill the database?
I am also using Redis to cache some data, which will absorb some of the traffic, but the problem with the home screen is that the response is too large and I can't cache the whole response under one key in Redis.
Can anyone explain the best way to implement pagination for this request? The only thing I want is pagination; I'm not looking to implement full-text search. I read that a search engine would handle the traffic so that it does not affect or kill the database.
Thanks a lot :D
You can do this easily with the pagination features PostgreSQL already provides (LIMIT, OFFSET, FETCH).
But let me give you a recommendation.
There are several types of pagination.
In the first type, the number of pages must be known in advance. This approach is outdated and not recommended, because you need to know the number of records in the table, and counting records is a very slow process, especially on large tables.
In the second type, the number of pages is not known in advance. The next page is fetched only when it is needed, just as Google, LinkedIn and other big companies do. In this case, there is no need to count the rows of any table.
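One common way to get the second type is keyset (cursor) pagination; that is my choice for this sketch, not something the answer above names. It uses Python's built-in sqlite3 as a stand-in so it runs anywhere, but the same `WHERE id < ? ORDER BY id DESC LIMIT ?` query shape works in PostgreSQL. The table `posts`, its columns and the function `fetch_page` are hypothetical. Fetching one extra row tells the client whether there is a next page without ever counting the table.

```python
import sqlite3


def fetch_page(conn, page_size, before_id=None):
    """Return (rows, next_cursor) for a newest-first list.

    next_cursor is the id to pass as before_id for the following page,
    or None when there are no more rows.
    """
    sql = "SELECT id, title FROM posts"
    params = []
    if before_id is not None:
        sql += " WHERE id < ?"
        params.append(before_id)
    # Ask for one extra row to learn whether another page exists.
    sql += " ORDER BY id DESC LIMIT ?"
    params.append(page_size + 1)
    rows = conn.execute(sql, params).fetchall()
    has_more = len(rows) > page_size
    rows = rows[:page_size]
    next_cursor = rows[-1][0] if has_more else None
    return rows, next_cursor


# Demo data: seven posts with ids 1..7 (made up for the example).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO posts (id, title) VALUES (?, ?)",
    [(i, f"post {i}") for i in range(1, 8)],
)

first, cursor = fetch_page(conn, 3)
second, cursor2 = fetch_page(conn, 3, before_id=cursor)
```

Because each page is keyed on the last id seen, the cost of a page should not grow with how deep the user has scrolled, which is the usual weakness of large OFFSET values.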

Best way of passing object information through pages

This question probably already exists, but it is too specific and hard to search for.
So, imagine that we have an e-commerce application.
On page 1 we have a list of products. When one is tapped, we go to page 2, which holds more information about the product you just tapped. Pretty much like any other e-commerce app out there.
Which of these two approaches is better?
When a product is tapped, we pass all the information about it to page 2 via arguments. Then no request to the database is necessary.
When a product is tapped, we pass only its ID, and then make a request to get the product information from the database.
You might think it's obvious that option 1 is better, but with option 2 we pretty much guarantee that all the product information reflects the latest update from the database, because the owner might change the product price milliseconds after you tapped.
image describing user interaction
I would go for the second option most of the time.
As you already said, the information will always be up to date. Also if you request all the products with all of their information it creates quite some overhead, depending on the size of the product page and the information about the different products. Another thing is updating the information live. Maybe you'll decide to add a Stream later on, updating the information while the user is on the product page. Querying each product will make that easier as well.
If you can afford the resources of requesting the product every time and your process isn't too expensive it's in my opinion the better option.

Microservices - Storing user data in separate database

I am building a microservice that has two separate services: a user service and a comments service. The user service stores the user details like email, first/last name, job title, etc., and the comments service stores all comments made by the user.
In the UI, I need to populate the comments (via a REST API) and show the first/last name, email, and job title of the user.
Is it recommended that we store all these user details in the comments database?
If yes, then every time a user changes their details (first/last name or job title) I will have to update their details in all the comments (I don't think this is a good idea).
If no, then if I store just the userid in the comments DB, how am I supposed to get the user details for each comment? Let's say we want to show 20 comments per page in the UI.
First, challenge the architecture. Let's assume that both services in the question are part of a larger ecosystem of microservices that all make use of the user information. Otherwise the separation will almost certainly be overengineered. But from the word "comments" we can at least guess that there is at least one other class of objects, namely the things being commented on. So let's assume a "user service" is a meaningful crumb to break out into a microservice, because at least some other crumbs have enough weight to justify the microservice breakup.
In that case I suggest the following strategy:
Second, implement an abstraction layer into your comments service right away so that most of the code will not have to care about where the user comes from (i.e. don't join or $lookup). This is also a great opportunity for local testing, because you can just create a collection with the data you need and run service level integration tests against it.
Third, for integration with the user service, get the data from there via its API (which should support bulk data selection in any case) every time you need it. Because you have the abstraction layer, you can add caching, cache timeouts, displacement strategies and whatever else you may need below this abstraction without touching the main portion of the code. Add these on an as-needed basis. Keep it simple.
Fourth, when things really get heavyweight and you have to deal with tens of thousands of users, tons of comments and many requests per second, the comments service could, still below the abstraction, implement an upfront replication pattern to get the full user database locally. This will usually be based on an asynchronous message sent by the user service to all subscribers when something changes in the user base. When it suits the subscribers (i.e. the comments service), they can trigger a full or (from time to time) delta replication of the changes. Suitable collections will already be in place from what you did for caching. And the comments service will probably need considerably less information than is stored in the user service (let alone the hashed password, other login options or accounting information).
Fifth, should you still hit performance challenges, you can break the abstraction for the few cases you need to and do the join or $lookup.
Follow the steps in order, and stop as soon as the overall assembly works fine. Every step adds considerable complexity, and when you don't need it, don't implement it.
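To make the second and third steps a little more concrete, here is a rough sketch in Python. All names (`UserDirectory`, `fetch_users_bulk`, `ttl_seconds`, `attach_authors`) are made up for illustration; `fetch_users_bulk` stands in for whatever bulk endpoint the user service exposes, and the cache is just an in-process dict with a timeout.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


class UserDirectory:
    """Abstraction layer: the comments code asks here for user details
    and does not care whether they come from a cache or the user service."""

    def __init__(self, fetch_users_bulk: Callable[[List[str]], Dict[str, dict]],
                 ttl_seconds: float = 60.0):
        self._fetch = fetch_users_bulk      # hypothetical bulk API call
        self._ttl = ttl_seconds
        self._cache: Dict[str, Tuple[float, dict]] = {}

    def get_many(self, user_ids: Iterable[str]) -> Dict[str, dict]:
        now = time.monotonic()
        found = {}
        missing = []
        for uid in set(user_ids):
            entry = self._cache.get(uid)
            if entry is not None and now - entry[0] < self._ttl:
                found[uid] = entry[1]
            else:
                missing.append(uid)
        if missing:
            # One bulk request per page of comments, not one per comment.
            for uid, user in self._fetch(sorted(missing)).items():
                self._cache[uid] = (now, user)
                found[uid] = user
        return found


def attach_authors(comments: List[dict], directory: UserDirectory) -> List[dict]:
    """Return copies of the comments with an 'author' field filled in."""
    users = directory.get_many(c["user_id"] for c in comments)
    return [{**c, "author": users.get(c["user_id"])} for c in comments]
```

For a page of 20 comments this results in at most one call to the user service, and the later steps (replication, or breaking the abstraction) can be introduced behind `get_many` without changing `attach_authors`.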

Proper state management architecture to implement read/unread of items

Context: We are implementing a news app. For now, you can assume the news to be the same across all users, and maintains an order based on the parameters we set (according to trends, and date).
Problem: We are not sure what the best implementation for keeping track of what users read is. We want to be able to configure a way in which we can track what users read and what they didn’t.
Assumption: You can assume that the posts in the database are in a descending order, based on time.
So, the ideal scenario is that: when there are posts: A,B,C,D,E fetched from the server in the app, and the user read A,B. Now the user only gets to see C,D,E when they check for next posts. If they do previous, they see posts in the following order B-> A.
Furthermore, when P,Q is added to the database, now, the user must see next posts in the order of P->Q->C->D->E and so on.
Example: Let us assume there are 20 news posts in our app right now, and Gavin picks up his phone and starts reading from our app. In the midst of his usage, he finds himself occupied with some other work, so he quits the app after reading 5 news posts.
The challenge for us now is to figure the best way to make sure Gavin doesn’t have to re-read the 5 posts he already did.
One way we thought we could solve this problem is through the use of an index. We can assume uniform ordering for our posts as mentioned in the context, so we could use an index to track where Gavin was last in the order of news and show him news based on that index.
However, one problem with that approach is, we could easily have 5 new posts when Gavin picks up his phone and uses our app again. So, if we have the news based on date, technically that indexing approach means that we omit 5 unread new posts instead of the 5 read old ones.
We've also thought of maintaining three lists: Read, Unread and New so that we fetch only posts that are not in our lists. For example, in my initial example: A-B-C-D-E is in unread initially. Then, after user reads A-B, read becomes A-B. Meanwhile, when P-Q is added in the database, P-Q is added to the list of unread posts as P-Q-C-D-E.
How do you solve this problem? Any suggestions are welcome as we kind of think we're not thinking out of box when it comes to a solution for the problem. Thank you! :)
When I first read the problem, the solution that came to my mind was also to keep two different lists, read and unread: new posts are added to the end of the unread list, and the unread list is shown in reverse order so the most recent posts are on top. Is it the most efficient way? That is debatable. For example, if the number of new posts grows a lot, this will be memory inefficient. But I assume the numbers are small in general.
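A small sketch of that idea in Python, under the assumption that each post has an id and that the feed is already in display order. Instead of storing the unread list, this version stores only the ids the user has read and derives the unread list on the fly, so it is a variation on the two-list approach rather than exactly what is described above. The names `ReadTracker`, `mark_read`, `unread` and `previous` are made up.

```python
class ReadTracker:
    """Tracks which posts one user has read."""

    def __init__(self):
        self._read_order = []   # ids in the order they were read
        self._read = set()      # same ids, for fast membership tests

    def mark_read(self, post_id):
        if post_id not in self._read:
            self._read.add(post_id)
            self._read_order.append(post_id)

    def unread(self, posts):
        """posts: ids in display order; returns the unread ones."""
        return [p for p in posts if p not in self._read]

    def previous(self):
        """Already-read posts, most recently read first."""
        return list(reversed(self._read_order))


# The scenario from the question: A..E fetched, A and B read, then P and Q arrive.
tracker = ReadTracker()
feed = ["A", "B", "C", "D", "E"]
tracker.mark_read("A")
tracker.mark_read("B")
feed = ["P", "Q"] + feed          # two new posts arrive on top
```

With this state, `tracker.unread(feed)` yields P, Q, C, D, E and `tracker.previous()` yields B, A. Only the read ids need to be stored per user, so memory grows with what the user has read rather than with the number of new posts.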

Access record - show lock status

I use a datasheet view of a query with aggregate subqueries attached as fields. Of course this is not editable, and that is fine as it's merely an overview listing of all the records along with some sum information from related tables. I have noticed that when a query is not editable, the record selector lock information is not displayed. This made me wonder.
Is there some event that can be captured to display in more or less real time when a record is locked or released by other users?
Alternatively is there any other way to display in my overview list or elsewhere what records are currently locked and if possible by what user?
Access 2010(x64)
For an updatable query, the locked status may be displayed on the left margin as you have noted. But that reflects record-locking by the query engine, not the same thing as whether a data result is updateable under normal circumstances.
For a read-only query, Access won't show a lock icon because in that context it isn't useful information (from most people's point of view).
You could use VBA to check the attribute of the query as a whole, and display a notification when the form is loaded. But that doesn't relate to the record-locking icon.
Is there some event that can be captured to display in more or less real time when a record is locked or released by other users? -- I believe the simple answer is no.
Access 2007 saw the end of the JET user-level security model for the new .accdb file format, so there is no way for you to manage user-level security in .accdb files created using 2007 or later.
The only alternative would be to use the Win API to register users by their NT ids, and to develop your own model which responded to activity. Clearly this would be no mean feat!
[Edit]
As for detecting record locks, it's possible you could implement this using an events handler class together with the ADO library:
http://msdn.microsoft.com/en-gb/library/windows/desktop/ms678373%28v=vs.85%29.aspx
If you don't mind getting your hands dirty with Class Modules (something some pundits never got to grips with), then you can find a lead-in here.