Best database for a Statistics System [closed] - mongodb

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I need to build a statistics system, but I don't know whether MongoDB would be the best solution. The system needs to track a couple of things and then display the information. As an example of something similar: a site where every user who visits for the first time adds a row with information about them. The system needs to store the data as fast as possible and, for example, create a chart of the growth in users viewing the page with Google Chrome. Also, if a user visits again, a field in that user's existing row is updated (say, a field called "Days").
The system needs to handle 200,000 new visits (new records) a day, 20,000,000 repeat visits (updates) a day, and 800,000,000 records in the database. It also needs to output the data fast, for example to create a chart of how many users visit each day from England, using Google Chrome, and so on.
So what would be the best DB to handle this data? Would MongoDB handle this fine?
Thanks!

MongoDB allows atomic updates and scales very well; that's exactly what it's designed for. But keep two things in mind: watch your disk space, which can run out very quickly, and if you need quick stats (region coverage, traffic sources, etc.), you have to precompute them. The fastest way is to build a simple daemon that keeps all the numbers in memory and saves them hourly/daily.
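As a rough sketch of what those atomic updates could look like with the Node.js MongoDB driver; the collection names, the field names (days, browser, country), and the precomputed daily counter are all assumptions for illustration, not a prescribed schema:

```typescript
import { MongoClient } from "mongodb";

// Record a visit: insert the user's row on first visit, bump "days" on repeats,
// and keep a precomputed per-day/per-browser counter for fast charting.
async function recordVisit(uri: string, userId: string, browser: string, country: string) {
  const client = new MongoClient(uri);
  await client.connect();
  try {
    const db = client.db("stats");

    // One atomic upsert per visitor: no read-modify-write race.
    await db.collection("visitors").updateOne(
      { userId },
      {
        $inc: { days: 1 }, // repeat visit
        $setOnInsert: { browser, country, firstSeen: new Date() }, // first visit
      },
      { upsert: true }
    );

    // Precomputed aggregate, so charts don't have to scan hundreds of millions of rows at read time.
    const day = new Date().toISOString().slice(0, 10);
    await db.collection("daily_stats").updateOne(
      { day, browser, country },
      { $inc: { visits: 1 } },
      { upsert: true }
    );
  } finally {
    await client.close();
  }
}
```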

Redis is a very good choice for this, provided you have a lot of RAM or a strategy to shard the data over multiple nodes (see the sketch after this list). It's a good fit because:
It is in-memory, so you can do real-time analytics (I think bit.ly's real-time stats use it). In fact, it was originally created for that.
It is very, very fast and can do hundreds of thousands of updates a second with ease.
It has atomic operations.
It has sorted sets, which are great for time series.
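A minimal sketch of that style of counting, using the ioredis client; the key names (visits:<day>, browsers:<day>, user:<id>) are made up for illustration:

```typescript
import Redis from "ioredis";

const redis = new Redis(); // assumes Redis on localhost:6379

async function trackVisit(userId: string, browser: string): Promise<void> {
  const day = new Date().toISOString().slice(0, 10); // e.g. "2016-01-31"
  // Atomic counters: total visits today and a per-browser breakdown.
  await redis.incr(`visits:${day}`);
  await redis.zincrby(`browsers:${day}`, 1, browser);
  // Per-user day count, mirroring the "Days" field from the question.
  await redis.hincrby(`user:${userId}`, "days", 1);
}

async function topBrowsers(day: string): Promise<string[]> {
  // Sorted sets give an ordered breakdown for charts in a single call.
  return redis.zrevrange(`browsers:${day}`, 0, 9, "WITHSCORES");
}
```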

RDM Workgroup is a database management system for desktop and server environments that also offers in-memory speed.
You can also use its persistence feature: manage the data in memory and transfer it to disk when the application shuts down, so there is no data loss.
It is based on the network model with an intuitive interface, so it scales well and should be able to handle the large load of new visitors you are expecting.

Related

Architecture: Creating a time filterable leaderboard system with mongoose

This question is more of an architecture-based one.
I have a website which has 3 PVP (player vs. player) games. Each game has its own MongoDB collection,
and its documents have properties such as timestamp, amount won (points), and the players involved.
I want to create a leaderboard system that retrieves data from all 3 games and shows who has won the most, top-10 style. This system will most likely be accessed through an HTTP endpoint, and I'd also like the leaderboard to be filterable by time: top 10 from the last week/month/year/all time.
Problems:
As the user base has grown and more games have been created, computing the table each time the endpoint is hit takes longer and longer, and page load times have become very long.
Initial Idea
Technologies
Mongoose, Express, Nuxt(Vue), Socket.io
I would suggest some sort of caching scheme. Two basic methods I would consider:
Create a service that automatically tabulates the leaderboard and caches it or saves it to another Mongo collection; clients are then served the cached version. This option is nice because it creates a historical record, which could make for fun features in the future.
Cache the response in your Express service and refresh it only at some interval, as explained here: https://medium.com/the-node-js-collection/simple-server-side-cache-for-express-js-with-node-js-45ff296ca0f0. The risk with this method is that concurrent requests arriving while the leaderboard is being regenerated could hit your Mongo server hard.
Without knowing all the details, I would go with the first option as it is immune to concurrent requests and could be extended in the future with some sort of historical leaderboard feature.
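A minimal sketch of that first option, assuming Mongoose models for the three games with winner/amountWon/timestamp fields (those names are invented here, not your schema) and a cache rebuilt on a timer rather than per request:

```typescript
import express from "express";
import mongoose from "mongoose";
// Assumes mongoose.connect(...) has been called elsewhere.

const gameSchema = new mongoose.Schema({
  players: [String],
  winner: String,
  amountWon: Number,
  timestamp: Date,
});
const games = ["GameA", "GameB", "GameC"].map((n) => mongoose.model(n, gameSchema));

type Row = { player: string; total: number };
let cachedTop10: Row[] = [];

async function rebuildLeaderboard(since: Date): Promise<void> {
  const totals = new Map<string, number>();
  for (const Game of games) {
    // Group each game's wins by player within the requested time window.
    const rows = await Game.aggregate<Row>([
      { $match: { timestamp: { $gte: since } } },
      { $group: { _id: "$winner", total: { $sum: "$amountWon" } } },
      { $project: { _id: 0, player: "$_id", total: 1 } },
    ]);
    for (const r of rows) totals.set(r.player, (totals.get(r.player) ?? 0) + r.total);
  }
  cachedTop10 = [...totals.entries()]
    .map(([player, total]) => ({ player, total }))
    .sort((a, b) => b.total - a.total)
    .slice(0, 10);
}

const app = express();
app.get("/leaderboard", (_req, res) => res.json(cachedTop10)); // served from cache

// Recompute every 5 minutes (here: a rolling one-week window) instead of on every request.
setInterval(
  () => rebuildLeaderboard(new Date(Date.now() - 7 * 24 * 3600 * 1000)),
  5 * 60 * 1000
);
```

The same rebuilt result could be written to its own collection instead of a variable, which gives you the historical record mentioned above.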
As for filtering, I'd recommend using the tables in vue-bootstrap. Data is easily represented in tables and sorting is built-in.

Best way to use API in app that has limited use? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
My situation is that I am building an app (in Swift) that pulls data from an API and displays it in a table view (say, up to 500 cells). The problem is the API: it is limited to 200 calls/day and 6k/month, and one request returns 100 pieces of data, so displaying 500 cells costs 5 call credits.
I am stuck on how to use this API efficiently. Currently, each time the user refreshes the table view it costs 5 credits, so after 40 refreshes the daily API cap has been reached.
The only solution I have thought of is to have some script in JS/Ruby/Python that pulls the data every x minutes or x hours and saves it to the Firebase database or Firebase Cloud Storage, and then have my app pull the data from Firebase.
My other idea was to run the script on a server and pull the data from there.
Are there any simpler alternatives that I am missing?
To avoid over-consuming the limit, why not run the API calls yourself and save the results to your own DB? Create a custom API specific to your app that pulls from your own storage; that way you control the interval and frequency of how often you hit the premium API.
You can set up a job to auto-update your personal DB with the premium data every x amount of time, updating existing entries and adding new ones as you see fit, while on the client side users pull the same premium data you've already fetched. IMO that is how I would go about it, because without that control you'll find yourself facing a major scaling issue.
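A minimal sketch of that setup as a small Node/TypeScript service; the upstream URL, paging scheme, and refresh interval are placeholders, and it assumes Node 18+ for the built-in fetch:

```typescript
import express from "express";

const UPSTREAM = "https://api.example.com/data"; // hypothetical rate-limited API
let cachedItems: unknown[] = [];

async function refresh(): Promise<void> {
  // 5 upstream pages of 100 items covers the ~500 cells; running this only a
  // few times a day stays well under the 200 calls/day cap.
  const pages = await Promise.all(
    [1, 2, 3, 4, 5].map((p) =>
      fetch(`${UPSTREAM}?page=${p}`).then((r) => r.json() as Promise<unknown[]>)
    )
  );
  cachedItems = pages.flat();
}

const app = express();
app.get("/items", (_req, res) => res.json(cachedItems)); // the iOS app pulls from here

refresh();
setInterval(refresh, 6 * 60 * 60 * 1000); // every 6 hours => 20 upstream calls/day
app.listen(3000);
```

The same refresh function could just as well write into Firebase instead of an in-memory variable; the point is that clients never hit the premium API directly.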

High Volume MongoDB with Twitter Streaming API, Ruby on Rails, Heroku setup

I'm looking to re-code an application to better handle spikes in tweets. I'm moving to Heroku and MongoDB (either MongoLab or MongoHQ) for the database solution.
During certain news events, tweet volume might spike to 15,000/second. Typically, with each tweet, I parse it and store various pieces of data such as user data, etc. My idea is to store the raw tweets in a separate collection and have a separate process grab raw tweets and parse them. The goal is that during a massive spike my application isn't trying to parse everything immediately, but is essentially backlogging the raw tweets in another collection; as the volume slows, the process can work through the backlog over time.
My question is threefold:
Can MongoDB handle this type of volume with regards to inserts into a collection at a rate of 15,000 tweets per second?
Any idea on the better setup: MongoHQ or MongoLab?
Any feedback on the overall setup?
Thanks!
The write volume it can handle depends on many factors: hardware, indexes, the size of each document, etc. Your best bet is to test it in the environment you're planning to use. If the demands of the write load exceed the capacity of a single Mongo server, you can always shard across multiple servers.
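For the "test it yourself" part, a rough sketch of a throughput check with the Node.js driver; the document shape and batch size are arbitrary stand-ins:

```typescript
import { MongoClient } from "mongodb";

// Time a burst of bulk inserts against the real deployment to see what
// writes/second it actually sustains with your documents and indexes.
async function benchmarkInserts(uri: string, total = 100_000): Promise<void> {
  const client = new MongoClient(uri);
  await client.connect();
  const col = client.db("bench").collection("raw_tweets");

  const start = Date.now();
  for (let done = 0; done < total; done += 1000) {
    const batch = Array.from({ length: 1000 }, (_, i) => ({
      text: `tweet ${done + i}`,
      createdAt: new Date(),
    }));
    await col.insertMany(batch, { ordered: false }); // unordered: faster bulk path
  }

  const secs = (Date.now() - start) / 1000;
  console.log(`${total} docs in ${secs.toFixed(1)}s, about ${Math.round(total / secs)}/s`);
  await client.close();
}
```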
They are very similar, but there are some differences in pricing, and the site designs differ quite a bit. There's a thread of discussion about it here: https://webmasters.stackexchange.com/questions/20782/mongodb-hosting-mongolab-vs-mongohq-vs-mongomachine
Overall it seems to make sense. It sounds like you will want to flesh out some details about how you will process the backlog: will you poll it by querying periodically, delete tweets from the backlog as they are processed, etc.?
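A rough sketch of that backlog flow with the Node.js driver (assuming v6+, where findOneAndDelete returns the document directly); the collection and field names are illustrative, not a fixed schema:

```typescript
import { MongoClient } from "mongodb";

// Raw tweets are written untouched to "raw_tweets" by the ingest process;
// this worker drains the backlog at its own pace.
async function runWorker(uri: string): Promise<void> {
  const client = new MongoClient(uri);
  await client.connect();
  const db = client.db("tweets");
  const raw = db.collection("raw_tweets");
  const users = db.collection("users");

  while (true) {
    // Atomically claim and remove one raw tweet, so several workers never
    // double-process the same document.
    const tweet = await raw.findOneAndDelete({});
    if (!tweet) {
      await new Promise((r) => setTimeout(r, 1000)); // backlog empty: wait a bit
      continue;
    }
    // Parse out whatever pieces you care about and upsert the derived data.
    await users.updateOne(
      { userId: tweet.user?.id },
      { $set: { screenName: tweet.user?.screen_name }, $inc: { tweetCount: 1 } },
      { upsert: true }
    );
  }
}
```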
Completely agree on the need to test this. In general, mongo can handle that many writes, but in practice it depends on the size of your set up, other operations, indexes, etc.
I had to take a similar approach for collecting tons of metrics data. I used a lightweight EventMachine process to accept incoming requests in parallel and store them in a simple format; another process would then take those requests and send them up to a central server. The main goal was to make sure no data was lost if the central server was down, but it also allowed me to put in some throttling logic so that spikes in data wouldn't overwhelm the system.
I'd be interested to see how this works out for you price-wise versus a VPS like Linode. (I'm a huge Heroku fan, but with certain architectures it can get pricey quickly.)

Core Data max storage iPhone

Is there a limit to how much persistent storage a single iPhone app may consume?
What does save set the error argument to if the iPhone hits a per-app limit? What if it hits the hardware limit?
Is it possible to limit the number of objects stored for certain entities? If so, what's a good approach to doing this?
acani, an iPhone app I'm working on, downloads the nearest 20 users from the server and saves them to Core Data. After using the app for a while, the users SQLite table could become rather large. How could I limit it? What should I limit it to? Once this table has reached capacity, how could I make it so that newly downloaded users replace the oldest downloaded users?
Thanks!
Matt
I don't know the answer to the limits questions, but I would think you would want to cap the amount of data well before you hit them. There are some iPhone apps (games) which take up a large amount of storage (I think Myst is something like 1.5 GB), but if you allowed your database to grow to those sorts of sizes you might start to impact the storage the user has for their other applications.
I'd be inclined to suggest that your application needs some sort of database housekeeping, either triggered automatically or triggered manually by the user; you will have to write this yourself. For example, you might want to add a settings option where the user can specify how many "old" users to preserve. If users are added automatically based on location, what sort of algorithm would a user most likely want to cull the list with?
There is a 2 GB limit for apps from the App Store, but as far as user data goes, you should be able to basically fill the disk. When that happens, your saves will start to fail, I believe with NSFileWriteOutOfSpaceError bubbled up from the persistent store coordinator (PSC).
As far as limiting entity space, there's no Core Data support for this; you'd have to handle it programmatically. You could extend the validation system to check for certain conditions (free space, number of entities) and fail an insert or update if they didn't match your criteria.
If you want to delete old users, just sort the fetch results by date and delete the oldest ones.

almost live forex currency rates [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I need to get live forex exchange rates for my personal application. I know there's no free service that has this data available for download. I've been using Yahoo Finance, but I've just found out that it has a delay of around 15 minutes. Is there any way I could get fresher rates somewhere, say 5 minutes old instead of 15?
Many forex brokers offer free "informers" that autoload data at intervals of seconds, so maybe there are a few that allow this data to be downloaded at bigger intervals without using their informers, strictly for personal use?
TrueFX has free real-time (multiple updates per second) forex quotes, but only for a limited number of pairs: http://webrates.truefx.com/rates/connect.html?f=html
They also have free downloadable tick data for the same pairs, going back to May 2009: http://truefx.com/?page=downloads
You can get real-time quotes for a larger selection of pairs from FXCM: http://rates.fxcm.com/RatesXML
They also have free downloadable tick data going back to 2007, but you need to create a demo account and use a COM-based Windows API called Order2Go to retrieve it.
They promised that they will make the same tick data available in CSV format for free sometime this year here: http://www.forexcodesource.com/index.php/Category:Historical_Data
Realtime rates for about 40 currency pairs are available here: http://1forge.com/forex-data-api, e.g.: https://1forge.com/forex-quotes/quotes
Here are a bunch of equity/FX data providers; however, they are not free:
http://finviz.com/store/market-data-providers.ashx
If you're trying to keep everything free, then you'll probably have to hack something together.
For example, MT4 has a DDE hook that you can use to broadcast the quotes. You'll need a Windows box (or VM) running MT4 and an app listening to the DDE server that forwards the quotes to your Linux server via a TCP socket, or even HTTP. The lag should be less than a second if done right.
Here's the .net library I use to receive the DDE quotes.
http://www.4xlab.net/cs/forums/136/ShowPost.aspx
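On the receiving side, a minimal sketch of the Linux-server endpoint that the Windows DDE listener could POST quotes to; the route and field names here are invented:

```typescript
import express from "express";

type Quote = { symbol: string; bid: number; ask: number };
const latest = new Map<string, Quote>(); // most recent quote per pair

const app = express();
app.use(express.json());

app.post("/quote", (req, res) => {
  const q = req.body as Quote;
  latest.set(q.symbol, q); // overwrite: only the freshest rate matters here
  res.sendStatus(204);
});

app.get("/quote/:symbol", (req, res) => {
  const q = latest.get(req.params.symbol.toUpperCase());
  q ? res.json(q) : res.sendStatus(404);
});

app.listen(8080);
```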
Also, if you are looking for historical tick data, then this is a great source.
http://ratedata.gaincapital.com/
Download MetaTrader from any broker and write an expert advisor to log all the data you want to a file, then have another process read that file. If you really want to get fancy, you can call C functions from MT4 code; it's not that hard to write some C code that stores the data in a DB instead of logging it to a file.
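A rough sketch (not MT4/MQL code) of what that "other process" could look like, reading the logged file and loading it into a database; the CSV layout (symbol,bid,ask,timestamp) is an assumption:

```typescript
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";
import { MongoClient } from "mongodb";

// Read the expert advisor's log file line by line and store each quote.
async function importLog(path: string, uri: string): Promise<void> {
  const client = new MongoClient(uri);
  await client.connect();
  const quotes = client.db("forex").collection("quotes");

  const lines = createInterface({ input: createReadStream(path) });
  for await (const line of lines) {
    const [symbol, bid, ask, timestamp] = line.split(",");
    if (!symbol) continue; // skip blank lines
    await quotes.insertOne({
      symbol,
      bid: parseFloat(bid),
      ask: parseFloat(ask),
      at: new Date(timestamp),
    });
  }
  await client.close();
}
```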