Architecture: Creating a time-filterable leaderboard system with mongoose - mongodb

This question is more of an architecture-based one.
I have a website which has 3 PvP (player vs. player) games. Each game has its own MongoDB collection,
and its documents have properties such as a timestamp, the amount won (points), and the players involved.
I want to create a leaderboard system that retrieves data from all 3 games and shows who has won the most, in a top-10 style. This system will most likely be accessed through an HTTP endpoint. I'd also like the leaderboard to be filterable by time: top 10 from the last week/month/year/all time.
Problems:
As the user database has grown and more games have been created, computing the table each time the endpoint is hit takes longer and longer, and page load times have become very long.
Initial Idea
Technologies
Mongoose, Express, Nuxt (Vue), Socket.io

I would suggest some sort of caching scheme. Two basic methods I would consider:
Create a service that automatically tabulates the leaderboard and caches it, or saves it to another Mongo document. Clients are then served the cached version. This option is nice as it creates a historical record, which could make for fun features in the future.
Cache the response in your Express service and update it only at some frequency, as explained here: https://medium.com/the-node-js-collection/simple-server-side-cache-for-express-js-with-node-js-45ff296ca0f0 The risk with this method is that if you have concurrent requests while the leaderboard is being generated, they could hit your Mongo server hard.
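For illustration, here is a minimal sketch of that second approach in Express; the `/leaderboard` route, the TTL, and the `computeLeaderboard` helper are placeholders, not code from the question.

```typescript
import express from "express";

const app = express();

// In-memory response cache keyed by URL; entries expire after TTL_MS.
const cache = new Map<string, { body: unknown; expires: number }>();
const TTL_MS = 5 * 60 * 1000; // serve the cached leaderboard for up to 5 minutes

// Placeholder for the expensive tabulation described in the question.
async function computeLeaderboard(period: string): Promise<unknown[]> {
  return []; // stand-in
}

app.get("/leaderboard", async (req, res) => {
  const key = req.originalUrl;
  const hit = cache.get(key);
  if (hit && hit.expires > Date.now()) {
    res.json(hit.body); // cache hit: no Mongo work at all
    return;
  }
  const top10 = await computeLeaderboard(String(req.query.period ?? "all"));
  cache.set(key, { body: top10, expires: Date.now() + TTL_MS });
  res.json(top10);
});

app.listen(3000);
```

Note that concurrent cache misses will still all run the expensive query at once, which is exactly the risk mentioned above.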
Without knowing all the details, I would go with the first option, as it is immune to concurrent requests and could be extended in the future with some sort of historical leaderboard feature.
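Here is a minimal sketch of that first option with Mongoose, under the assumption that each game document has `winner`, `amountWon`, and `timestamp` fields (the exact schema, collection names, and refresh interval are guesses): a timer re-tabulates the top 10 for each period and saves it as a `Leaderboard` document, and the HTTP endpoint only ever reads the latest cached document.

```typescript
import { Schema, model } from "mongoose";

// Assumed shape of each game's documents (field names are a guess based on the question).
const gameSchema = new Schema({
  winner: String,    // player who won the round
  amountWon: Number, // points won
  timestamp: Date,
});

const GameA = model("GameA", gameSchema, "game_a");
const GameB = model("GameB", gameSchema, "game_b");
const GameC = model("GameC", gameSchema, "game_c");

// Cached leaderboard document served by the HTTP endpoint; also a historical record.
const Leaderboard = model(
  "Leaderboard",
  new Schema({
    period: String, // "week" | "month" | "year" | "all"
    generatedAt: Date,
    entries: [{ player: String, total: Number }],
  })
);

const PERIODS_MS: Record<string, number> = {
  week: 7 * 24 * 3600 * 1000,
  month: 30 * 24 * 3600 * 1000,
  year: 365 * 24 * 3600 * 1000,
  all: Infinity,
};

async function tabulate(period: string): Promise<void> {
  const match =
    period === "all"
      ? {}
      : { timestamp: { $gte: new Date(Date.now() - PERIODS_MS[period]) } };

  // Sum winnings per player in each game collection, then merge in memory.
  const totals = new Map<string, number>();
  for (const Game of [GameA, GameB, GameC]) {
    const rows = await Game.aggregate<{ _id: string; total: number }>([
      { $match: match },
      { $group: { _id: "$winner", total: { $sum: "$amountWon" } } },
    ]);
    for (const row of rows) {
      totals.set(row._id, (totals.get(row._id) ?? 0) + row.total);
    }
  }

  const entries = [...totals.entries()]
    .map(([player, total]) => ({ player, total }))
    .sort((a, b) => b.total - a.total)
    .slice(0, 10);

  await Leaderboard.create({ period, generatedAt: new Date(), entries });
}

// Re-tabulate every 10 minutes; the endpoint just reads the latest document per period.
setInterval(() => {
  for (const period of Object.keys(PERIODS_MS)) tabulate(period).catch(console.error);
}, 10 * 60 * 1000);
```

Because each run inserts a new document rather than overwriting, the collection doubles as the historical record mentioned above.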
As for filtering, I'd recommend using the tables in vue-bootstrap. Data is easily represented in tables and sorting is built-in.

Related

Use one Mongo doc to hold thousands of objects, or thousands of docs each holding one?

In my web app, an authenticated user can pick songs from his Spotify playlist to play at a party. I want guests (non-authenticated users) to be able to view the picked songs on a dynamically created React route and vote on their favorite songs on their own device (probably a phone).
I am using a Mongo, Express, React/Redux, Node stack.
Since the guests don't have access to my app's Redux store, the only way they can view the authenticated user's picked songs is through a GET request to my app's database. My initial plan was to just store playlist documents, which users could GET and then use to make a request to the Spotify API. However, guests are unauthorized and need an access token. This means that my database has to store every single one of the songs that the authenticated user picked.
My question has to do with design. I don't think it's a good idea for one document to hold every song, because some people might want to pick thousands of songs and one document won't be able to hold them all. On the other hand, creating a separate document for each song seems a little excessive.
Can anyone help me figure out which option is better, or if there is a different option I haven't thought of that can avoid this problem altogether? Thank you
If you store each song in a separate document, the main disadvantage is space: you'll need more storage to hold all the documents.
But keeping all the song documents in the same collection gives you some advantages: queries and sort operations become more flexible and faster, which saves both processing and development time. A similar logic is shown here.
Using just one document to store all the songs makes your database operations more complex and requires more development time and code to organize the retrieved data properly. Another disadvantage is that it isn't a long-term scalable strategy, mainly because a BSON document is limited to 16 MB.
In my view, a separate document for each song is the more appropriate design, for two reasons:
Space is monetarily cheap.
Saving time should be a priority throughout software development. Database queries are usually the slowest operations in a piece of software, so reducing their cost is a worthwhile goal. Storing the songs as many documents in one collection, instead of inside one document, gives you the data back already organized, with no need to restructure it in code.
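As a rough illustration of the separate-documents design with Mongoose (all field and model names here are invented, not from the question): one document per picked song, referencing its party, with an atomic vote counter.

```typescript
import { Schema, model } from "mongoose";

// One document per picked song (field names are illustrative).
const songSchema = new Schema({
  partyId: { type: Schema.Types.ObjectId, index: true }, // which party/playlist it belongs to
  spotifyTrackId: String,
  title: String,
  votes: { type: Number, default: 0 },
});

const Song = model("Song", songSchema);

// Guests fetch the picked songs for a party, highest-voted first.
async function songsForParty(partyId: string) {
  return Song.find({ partyId }).sort({ votes: -1 }).lean();
}

// A guest votes for a song; $inc keeps the update atomic under concurrent votes.
async function vote(songId: string) {
  return Song.findByIdAndUpdate(songId, { $inc: { votes: 1 } }, { new: true });
}
```

The `$inc` update is the main practical win over a single giant playlist document: each vote touches one small document instead of rewriting a large array.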

PouchDB / Ionic 1 / CouchDB - architecture recommendation

I have a multi-user single-page mobile app developed with Ionic 1, PouchDB and CouchDB. User management is handled with SuperLogin.
I would like to add a feature that computes a score for each user (something similar to the score in the Waze app) based on his current data, and keeps track of the score's value for every past day.
I am wondering about the best way to implement this.
About my app:
it should be able to work offline and then sync with the server when online (this is why I am using PouchDB and CouchDB, which have been working great so far). So on the server I have one CouchDB database per user, storing his own data; the PouchDB database in the app syncs with that user's database on the server.
I am considering various options for the score, but none of them really satisfies me, so your advice would be welcome (possibly for yet another option).
Option 1: The score is computed in the app by the Ionic code. The result is stored as a local database object, with a date and a score value. This happens whenever the user changes his data. As the database is synced with the server, these scores are updated on the server too. However, if the user does not use the application on some days, the score won't be computed for those days. Moreover, if the user runs the app on two different devices and updates some data on one of them, the score is recomputed locally and then propagated to the server. When the changed data propagates to the server and on to the other device, it triggers a new score computation on that device, which might lead to conflicts on the score object on the server. Finally, if at some point I want to change the way the score is computed, the value each user gets will depend on whether he has upgraded to the latest app version.
Option 2: have a server-side process that triggers every day and computes each user's score by connecting to each user's database on the server, reading its data, computing the corresponding score, and storing it (date + value) back in the server database (see the sketch below). This option looks cleaner to me, but it would require further development and an additional process to maintain and keep alive on the server. And if the user inputs data into the application while not connected to the internet, the score will not be updated in the app until he gets connected again (at which point the server process recomputes the score and propagates it back to the app through the CouchDB sync).
Option 3: have some kind of "stored procedure" in the CouchDB server that triggers every time the related data changes and is in charge of computing each user's score. But I don't think this is doable with CouchDB.
So how would you do this score computation?
Many thanks!
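For what it's worth, here is a rough sketch of what option 2's daily server-side job could look like against CouchDB's plain HTTP API; the per-user database naming (SuperLogin-style `userdb-...`), the host, the credentials, and the scoring rule are all assumptions.

```typescript
// Rough sketch of option 2: a daily job that reads each user's CouchDB database,
// computes a score from its documents, and writes a dated score document back.
const COUCH_URL = "http://localhost:5984"; // assumed host
const AUTH = "Basic " + Buffer.from("admin:secret").toString("base64"); // assumed credentials

async function couch<T>(path: string, init?: RequestInit): Promise<T> {
  const res = await fetch(`${COUCH_URL}${path}`, {
    headers: { Authorization: AUTH, "Content-Type": "application/json" },
    ...init,
  });
  if (!res.ok) throw new Error(`CouchDB ${res.status} for ${path}`);
  return (await res.json()) as T;
}

function computeScore(docs: unknown[]): number {
  // Placeholder scoring rule: one point per document the user has created.
  return docs.length;
}

async function runDailyScores(): Promise<void> {
  const dbs = await couch<string[]>("/_all_dbs");
  // SuperLogin typically creates one database per user (e.g. "userdb-..."); adjust the filter.
  for (const db of dbs.filter((name) => name.startsWith("userdb-"))) {
    const body = await couch<{ rows: { doc?: unknown }[] }>(
      `/${db}/_all_docs?include_docs=true`
    );
    const docs = body.rows.map((r) => r.doc).filter((d) => d !== undefined);
    const score = computeScore(docs);

    // Store the daily value; it reaches the app through the normal replication.
    await couch(`/${db}`, {
      method: "POST",
      body: JSON.stringify({
        type: "score",
        date: new Date().toISOString().slice(0, 10),
        value: score,
      }),
    });
  }
}

runDailyScores().catch(console.error);
```

Because the score is written into the user's own database, it propagates back to every device via the existing PouchDB sync, which avoids the multi-device conflicts described in option 1.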

How to ensure that parallel queries to ext. system are executed only once and then cached

Server frameworks: Scala, Play 2.2, ReactiveMongo, Heroku
I think I have quite an interesting brain teaser for you:
In my trip-planning application I want to display a weather forecast on a map (similar to this). I'm using a paid REST service to query weather data. To speed up the user experience and reduce costs, I plan to cache the weather data for each location for one hour.
There are a few not-so-obvious things to consider:
It might require querying up to 100 locations to display one weather map.
The weather must be queried in parallel, because querying it serially would take too long given network latency.
However, launching 100 threads for each user request is not an option either (imagine just 5 users looking at a map at the same time).
The solution is to have, let's say, 50 workers that query the weather for user requests.
Multiple users might be viewing the same portion of the map.
There is a possible race condition where one location is queried multiple times.
However, it should be queried only once and then cached.
The application is running in a clustered environment, meaning there will be several Play instances.
Coming from a Java EE background, I could come up with a pretty good solution using the Java EE stack.
However, I wonder how to do this using something more natural to the Scala/Play stack: Akka. There is an example (google "heroku scala akka") for a similar problem, but it doesn't solve one issue: the race condition when multiple users query the same data at once.
How would you implement this?
EDIT: I have decided that the requirement to ensure that the weather data is updated only once is not necessary. The situation would happen far too infrequently to be a real problem, and all the proposed solutions would add too much overhead and complexity to the system to be viable.
Thanks everyone for your time and effort. I hope the answers to this question will help someone with a similar problem in the future.
In Akka you can choose from multiple routing strategies. ConsistentHashingRoutingLogic could serve you well in this situation. Since actors are single-threaded, you can easily maintain a cache in each actor. This routing logic ensures that two equal messages will always hit the same actor.
Each actor can work in the following way:
1. Check the local cache (for example Apache Commons' LRUMap).
   - If found, return it.
2. Check the global cache (distributed memcache or any other key-value store).
   - If found, store the result in the local cache and return it.
3. Query the REST service.
4. Store the result in both the global and local caches.
You can have a look at this question, which I based my answer on.
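The same per-actor flow, sketched in TypeScript rather than Akka purely for illustration (all names are invented): the in-flight map plays the role that the actor's single-threaded mailbox plays in Akka, so two concurrent requests for the same location share one REST call instead of racing.

```typescript
// Illustrative sketch of the per-actor flow (not Akka itself).
type Weather = { temperatureC: number; fetchedAt: number };

const TTL_MS = 60 * 60 * 1000;                         // cache weather for one hour
const localCache = new Map<string, Weather>();         // per-worker cache (the LRUMap above)
const globalCache = new Map<string, Weather>();        // stand-in for memcache / a key-value store
const inFlight = new Map<string, Promise<Weather>>();  // dedupes concurrent lookups

const fresh = (w?: Weather): w is Weather =>
  w !== undefined && Date.now() - w.fetchedAt < TTL_MS;

// Placeholder for the paid weather REST API call.
async function queryRestService(location: string): Promise<Weather> {
  return { temperatureC: 20, fetchedAt: Date.now() };
}

async function weatherFor(location: string): Promise<Weather> {
  const local = localCache.get(location);              // 1. local cache
  if (fresh(local)) return local;

  const global = globalCache.get(location);            // 2. global cache
  if (fresh(global)) {
    localCache.set(location, global);
    return global;
  }

  const pending = inFlight.get(location);              // already being fetched?
  if (pending) return pending;

  const lookup = queryRestService(location)            // 3. REST service
    .then((w) => {
      globalCache.set(location, w);                    // 4. store in both caches
      localCache.set(location, w);
      return w;
    })
    .finally(() => inFlight.delete(location));
  inFlight.set(location, lookup);
  return lookup;
}
```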
I decided to post my JMS solution as well.
The controller that processes a weather request does the following:
Query the DB for weather data. If there are no locations with out-of-date data, reply immediately. Otherwise, continue:
Start listening on a topic (explained later).
For each location: check whether the weather for that location isn't already being updated.
If not, send a weather-update request message to a queue.
A certain number of workers (50?) listen to that queue.
A worker first marks the location's weather as being updated.
The worker retrieves the updated weather and updates the DB.
The worker sends a message to a topic with the weather data for that location.
When the controller receives (via the topic) weather updates for all out-of-date locations, it combines them with the up-to-date locations and replies.

Is there any value in using core data for iPhone apps?

Can people give me examples of why they would use Core Data in an application?
I ask this because most apps are just clients to a central server where an API of some sort gives you the information you need.
In my case I'm writing a timesheet application for a web app which has an API, and I'm debating whether there is any value in replicating my server's data structure in Core Data (SQLite),
e.g.
Project has many Timesheets
Employee has many Timesheets
It seems to me that I can just connect to the API on every call for lists of projects or existing timesheets for example.
I realize that for some kind of offline mode you could store data locally in Core Data, but this creates way more problems, because you now have a big problem syncing that data back to the web server when you get a connection again, e.g. the project selected for a timesheet no longer exists.
Can any experienced developers shed some light on their experiences of when Core Data is the best-practice approach?
EDIT
I realise, of course, that there is value in local persistence, but the key-value storage of user defaults seems to cover most applications I can think of.
You shouldn't think of Core Data simply as an SQLite database. It's not JUST an SQLite database. Sure, SQLite is an option, but there are other options as well, such as in-memory stores and, as of iOS 5, a whole slew of custom data stores. The biggest benefit of Core Data is persistence, obviously. But even if you are using an in-memory data store, you get the benefits of a very well structured object graph, and all of the heavy lifting with regard to pulling information out of, or putting information into, the data store is handled by Core Data for you, without you necessarily needing to concern yourself with what is backing that store.

Sure, today you don't care too much about persistence, so you could use an in-memory data store. What happens if tomorrow, or in a month, or a year, you decide to add a feature that would really benefit from persistence? With Core Data, you simply change or add a persistent data store, and all of your methods for getting information in and out remain unchanged. The overhead of that sort of addition is minimal compared to accessing SQLite or some other data store directly. IMHO, that's the biggest benefit: abstraction. And, in essence, abstraction is one of the most powerful things behind OOP.

Granted, building the data model just for in-memory storage could be overkill for your app, depending on how involved the app is. But, just as a side note, you may want to consider what is faster: requesting information from your web service every time you want to perform some action, or requesting the information once, storing it in memory, and acting on that stored value for the remainder of the session. An in-memory data store wouldn't persist beyond that particular session.
Additionally, with Core Data you get a lot of other great features like saving, fetching, and undo/redo.
There are basically two kinds of apps: those that provide local functionality (games, professional applications, navigation systems...) and those that grant access to a remote service.
Your app seems to be in the second category. If you access remote services, your users will want to access new or real-time data (you don't want to read two-week-old Facebook posts), but in some cases local caching makes sense (e.g. reading your mail when you're on the train with an unstable network).
I assume that the value of accessing cached entries when not connected to a network is pretty low for your customers (internal or external) compared to the importance of accessing real-time data, so local storage might not be necessary at all.
If you don't have hundreds of entries in your timesheet, "normal" serialization (the NSCoding protocol) might be enough. If you only access some dashboard data, you will be able to get along with simple request/response caching (NSURLCache can do a lot of things...).
Core Data makes more sense if you have complex data structures that should be synchronized with a server. This adds a lot of synchronization logic to your project, as well as complexity from the Core Data integration (concurrency, thread safety, in-app conflicts...).
If you want to create a "client" app with a server-driven user experience, local storage is not necessary at all, so my suggestion is: keep it as simple as possible unless there is a real need for offline storage.
It's ideal if you want to store data locally on the phone.
Seriously though, if you can't see a need for it in your timesheet app, then don't worry about it and don't use it.
Solving the sync problems that you would have with an "offline" mode comes down to the design of your app. For example: don't allow projects to be deleted. Why would you? Wouldn't you want to go back in time and look at previous data for particular projects? Instead, just put a marker on the project to show it as inactive, along with a date/time when it was made inactive. If the data being synced from the device is for that project and predates the date/time it was marked as inactive, then it's fine to sync. Otherwise, display a message and the user will have to sort it out.
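A tiny sketch of that rule (the field names are invented): instead of deleting a project, mark it inactive with a timestamp, and accept a synced timesheet entry only if its date predates that timestamp.

```typescript
interface Project {
  id: string;
  name: string;
  inactive?: boolean;
  inactiveSince?: Date; // when the project was retired; it is never deleted
}

interface TimesheetEntry {
  projectId: string;
  workedAt: Date;
  hours: number;
}

// Accept the synced entry only if the project was still active when the work was logged.
function canSync(entry: TimesheetEntry, project: Project): boolean {
  if (!project.inactive) return true;
  return (
    project.inactiveSince !== undefined &&
    entry.workedAt.getTime() < project.inactiveSince.getTime()
  );
}
```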
Whether you need to store some data locally depends purely on your application's design: is it a real standalone application, or a thin GUI client around your web service? Apart from an "offline" mode, the other reason to cache server data on the client side might be to take traffic load off your server. Just think about what it means for your server to send the whole timesheet data set to the client every time, versus just the changes. Yes, it means more implementation on both sides, but in some cases it has serious advantages.
EDIT: example added
Say you have 1,000 records per user in your timesheet application and one record is circa 1 KB. In this case, every time a user starts your application, it has to fetch ~1 MB of data from your server. If you cache the data locally, the server can tell you that, say, two records were updated since your last sync, so you only have to download 2 KB. Now scale this up to several tens of thousands of users and you will immediately notice the difference in server bandwidth and CPU usage.
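Here is a sketch of what such a delta endpoint could look like on the web app's side (the route, fields, and in-memory store are placeholders): the client remembers its last sync time and asks only for records updated after it.

```typescript
import express from "express";

interface TimesheetRecord {
  id: string;
  projectId: string;
  hours: number;
  updatedAt: Date; // bumped on every change so clients can sync deltas
}

const records: TimesheetRecord[] = []; // stand-in for the real data store

const app = express();

// GET /timesheets?since=2012-01-01T00:00:00Z returns only what changed since then,
// so a client with 1,000 cached records downloads a couple of KB instead of ~1 MB.
app.get("/timesheets", (req, res) => {
  const since = req.query.since ? new Date(String(req.query.since)) : new Date(0);
  res.json(records.filter((r) => r.updatedAt.getTime() > since.getTime()));
});

app.listen(3000);
```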

High Volume MongoDB with Twitter Streaming API, Ruby on Rails, Heroku setup

I'm looking to re-code an application to better handle spikes in tweets. I'm moving to Heroku and MongoDB (either MongoLab or MongoHQ) for the database solution.
During certain news events, tweet volume might spike to 15,000 per second. Typically I parse each tweet and store various pieces of data, such as user data. My idea is to store the raw tweets in a separate collection and have a separate process grab the raw tweets and parse them. The goal is that when there is a massive spike in tweets, my application isn't trying to parse all of them immediately, but is essentially backlogging the raw tweets in another collection. As the volume slows, the process can work through the backlog over time.
My question is three-fold:
Can MongoDB handle this type of volume with regards to inserts into a collection at a rate of 15,000 tweets per second?
Any idea on the better setup: MongoHQ or MongoLab?
Any feedback on the overall setup?
Thanks!
The write volume that it can handle depends on lots of factors: hardware, indexes, size of each document, etc. Your best bet is to test it in the environment you're planning to use. If the demands of the write load exceed the capacity of a single Mongo server, you can always shard across multiple servers.
They are very similar, but there are some differences in pricing and the actual site design has a bunch of differences. There's a thread of discussion about it here: https://webmasters.stackexchange.com/questions/20782/mongodb-hosting-mongolab-vs-mongohq-vs-mongomachine
Overall it seems to make sense. It sounds like you will want to flesh out some details about how you will process the backlog: will you poll it by querying periodically, delete tweets from the backlog as you process them, etc.?
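Purely as an illustration of that backlog pattern (the question's stack is Ruby on Rails, so this sketch uses the Node MongoDB driver instead, and the collection and field names are made up): a worker polls the raw collection in batches, extracts what it needs from each tweet, stores the result, and removes what it processed.

```typescript
import { MongoClient } from "mongodb";

// Illustration only: a worker that drains the raw-tweet backlog in batches
// and writes parsed documents to a second collection.
const client = new MongoClient("mongodb://localhost:27017");
const db = client.db("tweets");
const raw = db.collection("raw_tweets");
const parsed = db.collection("parsed_tweets");

// Pull out the interesting fields from a raw streaming-API tweet document.
function parseTweet(doc: any) {
  return {
    tweetId: doc.id_str,
    user: doc.user?.screen_name,
    text: doc.text,
    createdAt: doc.created_at,
  };
}

async function drainBacklog(batchSize = 500): Promise<void> {
  await client.connect();
  while (true) {
    const batch = await raw.find().limit(batchSize).toArray();
    if (batch.length === 0) break; // backlog empty, sleep until the next poll

    await parsed.insertMany(batch.map(parseTweet));
    // Remove what we processed; a multi-worker setup would claim documents atomically instead.
    await raw.deleteMany({ _id: { $in: batch.map((d) => d._id) } });
  }
}

// Poll every few seconds; during a spike the backlog simply grows and is worked off later.
setInterval(() => drainBacklog().catch(console.error), 5000);
```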
Completely agree on the need to test this. In general, Mongo can handle that many writes, but in practice it depends on the size of your setup, other operations, indexes, etc.
I had to take a similar approach for collecting tons of metrics data. I used a lightweight EventMachine process to accept incoming requests in parallel and store them in a simple format, then another process would take those requests and send them up to a central server. The main goal was to make sure no data was lost if the central server was down, but it also allowed me to put in some throttling logic so that spikes in data wouldn't overwhelm the system.
I'd be interested to see how this works out for you price-wise vs. a VPS like Linode. (I'm a huge Heroku fan, but with certain architectures it can get pricey quickly.)