Speed up the database for webpage view counts - MongoDB

How can we optimize the view-count calculation in MongoDB?
We have a huge number of pages that are almost static apart from the view count. We've tried to calculate it from logs, without triggering any DB operation while users are viewing the page, and then process the logs during quiet hours. Is there a more elegant way to optimize this view-count calculation?
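A rough sketch of that log-replay idea: tally views per page offline, then apply all the increments in one bulk pass during quiet hours. The log format, collection and field names below are just placeholders for illustration.

from collections import Counter
from pymongo import MongoClient, UpdateOne

# Tally views per page from the access log first (no DB work at view time).
counts = Counter()
with open("access.log") as log:              # placeholder log: one requested URL per line
    for line in log:
        counts[line.strip()] += 1

# Then apply all the increments in a single unordered bulk write off-peak.
pages = MongoClient().mydb.pages             # placeholder database/collection names
pages.bulk_write(
    [UpdateOne({"url": url}, {"$inc": {"viewcount": n}}, upsert=True)
     for url, n in counts.items()],
    ordered=False,
)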

You could use Google Analytics or something similar to do it for you. Plus you'd get a whole lot of other useful metrics.

How can I handle a very large database without losing performance?

I want to develop an application, and I'm worried about its performance once the number of users and the amount of stored data increase.
Actually, I don't know the best way to implement a program that works with really large data and does things like searching it, finding and retrieving user information, full-text search and so on, in real time and without any delay.
Let me explain the problem in more detail.
For example, I have chosen MongoDB as the database, and suppose we have at least five million users. A user wants to log in to the system and has sent a username and password.
The first thing we should do is find the user with that username and then check the password. In MongoDB we would use something like the 'find' method to get the user's information, something like below:
Users.find({ username: entered_username })
Then we get the user's information and check the password.
But the 'find' method has to search for that username among millions of users, which is a large number, and if every authentication request runs this method, it causes heavy processing on the system.
Unfortunately, this problem is not limited to finding a user: if we want to search text when there are a lot of texts and posts in the database, the problem is even bigger.
I don't know how big companies like Facebook and LinkedIn search through millions of records in such a short span of time. I don't actually want to build something like Facebook, but I do have a large amount of data and I'm looking for a good way to handle it.
Is there a framework or anything else that helps with handling large data in databases, or is there a way to store data in the database so that we can search and find it quickly? Should I use a particular data structure?
I found an open-source project, Elasticsearch, that helps with faster search, but if I find something with Elastic, how can I find it in MongoDB too, for things like updating the data? And if I use Elasticsearch, do I still need MongoDB, or can I use Elastic as a database and a search engine simultaneously?
If I use Elasticsearch and MongoDB together, do I need two copies of my data, one in MongoDB and one in Elasticsearch, kept separately? I wish Elasticsearch could search inside MongoDB so it wouldn't have to create two copies of the data.
Thank you for helping me find a good way forward and understand what I should do.
When you talk about performance, it usually boils down to three things:
Your design
Your definition of "quick", and
How much you're willing to pay
Your design
MongoDB is great if you want to iterate on your data model; it can scale horizontally and is very quick if used properly. Elasticsearch, on the other hand, is not a database, but it is very quick for searching. A traditional relational database will be useful if you know exactly what your data looks like and don't expect it to change much, or if it is relational by nature.
You can, for example, use a relational database for user login, use MongoDB for everything else, and use Elastic for textual, searchable data. There is no rule that tells you to keep everything within a single database.
Make sure you understand indexing, and know how to utilize it to its fullest potential. The fastest hardware will not help you if you don't design your database properly.
Conclusion: use any tool you need, combine if necessary, but understand their strengths and weaknesses.
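As a concrete illustration of the indexing point, here is a minimal pymongo sketch for the login lookup from the question. The connection string, database and collection names are made up:

from pymongo import MongoClient, ASCENDING

users = MongoClient("mongodb://localhost:27017").mydb.users   # placeholder names

# A unique index on username turns the login lookup into an index seek
# instead of a scan over five million documents.
users.create_index([("username", ASCENDING)], unique=True)

# The lookup from the question, now served by the index.
user = users.find_one({"username": "entered_username"})
if user is not None:
    pass  # verify the (hashed) password here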
Your definition of "quick"
How "quick" is quick enough for your application? Is 100ms quick enough? Is 10ms quick enough? Remember that more performance you ask of the machine, more expensive it will be. You can get more performance with a better design, but design can only go so far.
Usually this boils down to what is acceptable for you and your client. Not every application needs a sub-10ms response time. There's plenty of applications that can tolerate queries that return in seconds.
Conclusion: determine what is acceptable, and design accordingly.
How much you're willing to pay
Of course, it all depends on how much you're willing to pay for the hardware that needs to host all that stuff. MongoDB might be open source, but you need some place to host it. Also, you cannot expect magic: you can't throw thousands of queries and updates per second at it and expect it to be blazing fast when you only give it 1 GB of RAM.
Conclusion: never under-provision to save money if you want your application to be successful.

Implement interval analysis on top of PostgreSQL

I have a couple of million entries in a table with start and end timestamps. I want to implement an analysis tool that determines the unique entries for a specific interval, let's say between yesterday and two months before yesterday.
Depending on the interval, the queries take between a couple of seconds and 30 minutes. How would I implement an analysis tool for a web front-end that would allow querying this data fairly quickly, similar to Google Analytics?
I was thinking of moving the data into Redis and doing something clever with intervals and sorted sets, etc., but I was wondering if there's something in PostgreSQL that would allow executing aggregated queries and reusing earlier results, so that, for instance, after querying the first couple of days it does not start from scratch again when looking at a different interval.
If not, what should I do? Export the data to something like Apache Spark or DynamoDB and do the analysis there to fill Redis for quicker retrieval?
Either will do.
Aggregation is a basic task they all can do, and your data is small enough to fit into main memory. So you don't even need a database (although the aggregation functions of a database may still be better implemented than anything you would rewrite, and SQL is quite convenient to use).
Just do it. Give it a try.
P.S. Make sure to set up indexes and choose the right data types. Maybe check the query plans, too.
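A minimal sketch of that advice, assuming a hypothetical entries table with entry_id and start_ts columns and the psycopg2 driver; the SQL is the point, the Python wrapper is incidental:

import psycopg2  # assumed PostgreSQL driver; any client works

conn = psycopg2.connect("dbname=analytics")   # placeholder connection string
cur = conn.cursor()

# An index on the timestamp column lets the range predicate skip most rows;
# run EXPLAIN on the query afterwards to confirm the index is actually used.
cur.execute("CREATE INDEX IF NOT EXISTS entries_start_ts_idx ON entries (start_ts)")

# Unique entries whose start falls in the requested interval.
cur.execute(
    "SELECT count(DISTINCT entry_id) FROM entries "
    "WHERE start_ts BETWEEN %s AND %s",
    ("2015-01-01", "2015-03-01"),
)
print(cur.fetchone()[0])
conn.commit()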

Using xmlpipe2 with Sphinx

I'm attempting to load large amounts of data directly into Sphinx from Mongo, and currently the best method I've found has been xmlpipe2.
I'm wondering however if there are ways to just do updates to the dataset, as a full reindex of hundreds of thousands of records can take a while and be a bit intensive on the system.
Is there a better way to do this?
Thank you!
A main plus delta scheme, where all the updates go to a separate, smaller index, as described here:
http://sphinxsearch.com/docs/current.html#delta-updates
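One way to apply that to an xmlpipe2 setup is to give the delta source an xmlpipe_command that emits only recently changed documents. A rough Python sketch follows; the collection name, field names and the change-tracking field are all assumptions:

# Delta-source exporter for xmlpipe2: emit only the documents modified since
# the last full (main) index build, so the delta index stays small and cheap
# to rebuild. Collection and field names are invented.
import sys
from datetime import datetime, timezone
from xml.sax.saxutils import escape
from pymongo import MongoClient  # assumed driver

LAST_MAIN_BUILD = datetime(2015, 1, 1, tzinfo=timezone.utc)  # persist this somewhere real

docs = MongoClient().mydb.articles.find({"updated_at": {"$gt": LAST_MAIN_BUILD}})

w = sys.stdout.write
w('<?xml version="1.0" encoding="utf-8"?>\n<sphinx:docset>\n')
w('<sphinx:schema><sphinx:field name="title"/><sphinx:field name="body"/></sphinx:schema>\n')
for doc in docs:
    w('<sphinx:document id="%d">' % doc["sphinx_id"])
    w('<title>%s</title><body>%s</body>' % (escape(doc["title"]), escape(doc["body"])))
    w('</sphinx:document>\n')
w('</sphinx:docset>\n')

The delta index can then be rebuilt every few minutes (indexer delta --rotate) and periodically folded into the main one (indexer --merge main delta --rotate), while the expensive full rebuild stays rare.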

Statistics (& money transfers) with mongoDB

1) My first question is regarding the best way to store statistics with MongoDB.
If I want to store large amounts of statistics (let's say visitors on a specific site, down to hourly), a NoSQL DB like MongoDB seems to work very well. But how do I structure the data to get the most out of MongoDB?
I'd increase the visitor count for that specific object id (for example SITE_MONTH_DAY_YEAR_SOMEOTHERFANCYPARAMETER) by one every time a user visits the page. But if the database gets big (>10 GB), doesn't that slow down (like it would on MySQL) because it has to find the object_id and update it? Is the data always accurate when I update it (AFAIK MongoDB does not have any table locking)?
Wouldn't it be faster (and more accurate) to just insert one row for every visitor? On the other hand, reading the statistics would be much faster with my first solution, wouldn't it (especially in terms of grouping by site/date/[...])?
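To make the first option concrete, the per-object counter described above is usually done with an upsert and $inc. A small pymongo sketch, with invented collection, key and field names:

# One document per site/period, bumped with an atomic $inc via an upsert.
from pymongo import MongoClient, ASCENDING

stats = MongoClient().mydb.statistics

# Index the composite key so the upsert can locate its document quickly
# even once the collection grows well past 10 GB.
stats.create_index([("key", ASCENDING)], unique=True)

def count_visit(site, day):
    stats.update_one(
        {"key": f"{site}_{day}"},       # e.g. "SITE_2015-06-01"
        {"$inc": {"visitors": 1}},      # $inc on one document is atomic
        upsert=True,
    )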
2) For every visitor counted, I'd like to make a money transfer between two users. It is crucial that those transfers are always accurate. How would you achieve that?
I was thinking about an hourly cron that picks up the number of visitors from mongoDB.statistics for the last hour and updates the users' balances. I'd prefer doing this directly/live while counting the visitor, but what happens if thousands of visitors are calling the script simultaneously? Is there any risk of getting wrong balances?
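The hourly cron described above might look roughly like the following. The statistics layout, field names and per-visit rate are all assumptions, and this is only a sketch, not an answer to the accuracy question; real money handling should use integer cents and proper transactional bookkeeping.

from datetime import datetime, timedelta, timezone
from pymongo import MongoClient

RATE_PER_VISIT = 1  # placeholder amount credited per visit

db = MongoClient().mydb
since = datetime.now(timezone.utc) - timedelta(hours=1)

# Sum the last hour's visits per recipient from the statistics collection.
per_owner = db.statistics.aggregate([
    {"$match": {"ts": {"$gte": since}}},
    {"$group": {"_id": "$owner_id", "visits": {"$sum": "$visits"}}},
])

for row in per_owner:
    # $inc on a single document is atomic, so concurrent updates won't clobber
    # one balance; a transfer touching two users' documents is still not atomic
    # as a whole without multi-document transactions or a ledger.
    db.users.update_one(
        {"_id": row["_id"]},
        {"$inc": {"balance": row["visits"] * RATE_PER_VISIT}},
    )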

iPhone SDK SQLite lookup performance for 40k+ records

What is the best way to get this thing done:
I have a huge table with 40k+ records (TV show titles) in SQLite and I want to do real-time lookups against this table. For example, if a user searches for a show, then as the user enters search terms I read SQLite and filter records after every keystroke (like Google search suggestions).
My performance benchmark is 100 milliseconds. A few things I have thought of are: creating indexes, splitting the data into multiple tables.
However, I would really appreciate any suggestions to achieve this in the fastest possible time so I can avoid any UI refresh delays; it would be awesome to have feedback from coders who have already done something similar.
Things to do:
Index fields appropriately.
Limit yourself to only 10-15 records on the initial query—that should be enough to populate the top of the table view.
If you don't need to sort, don't. If you do need to sort, sort on an indexed field.
Do as much as you can in SQLite rather than your own code.
Do as little as you can overall.
You'll likely find what I have: SQLite and the iPhone are actually amazingly capable as long as you don't do anything really dumb.
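For what it's worth, here is a minimal sketch of the indexing and LIMIT points above. The schema is invented, and Python's sqlite3 module stands in for the iPhone SQLite calls, since the SQL is the same either way:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")  # hypothetical schema
conn.executemany("INSERT INTO shows (title) VALUES (?)",
                 [("The Office",), ("The Wire",), ("Top Gear",)])

# A NOCASE index lets SQLite turn the case-insensitive prefix LIKE below into
# an index range scan instead of a full table scan (verify with EXPLAIN QUERY PLAN).
conn.execute("CREATE INDEX idx_shows_title ON shows (title COLLATE NOCASE)")

def suggest(prefix, limit=15):
    # Prefix match (no leading wildcard) plus LIMIT: return only as many rows
    # as the first screen of the table view actually needs.
    return [row[0] for row in conn.execute(
        "SELECT title FROM shows WHERE title LIKE ? ORDER BY title LIMIT ?",
        (prefix + "%", limit),
    )]

print(suggest("the"))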
Keep "perceived performance" in mind - doing lookups right after a key is hit is could be somewhat expensive. How many milliseconds does it take a user to hit a key, though? You can probably get away with not updating the resultlist until the user hasn't typed anything for several hundred milliseconds. (For really fast users, perhaps update every X hundred millisecodns while he's still typing).
How do you know the performance will be bad? 40k rows is not that much, even for an iPhone... try it on the phone before you optimize.
Avoid doing any joins, and try to use paging so that you keep the amount of data returned to a minimum. Perhaps you should try loading the whole thing into memory, then sorting and doing a binary search? If it is just a list of show titles, it should fit.