Firebase database Retrieve high score rank of a user [duplicate] - unity3d

I have a project in which I need to display a leaderboard of the top 20, and if the user is not in the leaderboard, they should appear in 21st place with their current ranking.
Is there an efficient way to do this?
I am using Cloud Firestore as the database. I believe it was a mistake to choose it instead of MongoDB, but I am in the middle of the project, so I must do it with Cloud Firestore.
The app will be used by 30K users. Is there any way to do this without fetching all 30K users?
this.authProvider.afs.collection('profiles', ref => ref
  .where('status', '==', 1)
  .where('point', '>', 0)
  .orderBy('point', 'desc')
  .limit(20));
This is the code I wrote to get the top 20, but what is the best practice for getting the current logged-in user's rank if they are not in the top 20?

Finding an arbitrary player's rank in a leaderboard, in a manner that scales, is a common hard problem with databases.
There are a few factors that will drive the solution you'll need to pick, such as:
Total number of players
Rate that individual players add scores
Rate that new scores are added (concurrent players * above)
Score range: Bounded or Unbounded
Score distribution (uniform, or are there 'hot scores')
Simplistic approach
The typical simplistic approach is to count all players with a higher score, e.g. SELECT count(id) FROM players WHERE score > {playerScore}.
This method works at low scale, but as your player base grows, it quickly becomes both slow and resource expensive (both in MongoDB and Cloud Firestore).
Cloud Firestore doesn't natively support count as it's a non-scalable operation. You'll need to implement it on the client-side by simply counting the returned documents. Alternatively, you could use Cloud Functions for Firebase to do the aggregation on the server-side to avoid the extra bandwidth of returning documents.
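A minimal client-side sketch of that count, assuming the v8 namespaced JavaScript SDK and the profiles/point names from the question:

// Count all players with a strictly higher score. Works at low scale only,
// since it reads every matching document.
async function getPlayerRank(playerScore) {
  const snapshot = await firebase.firestore().collection('profiles')
    .where('point', '>', playerScore)
    .get();
  return snapshot.size + 1; // rank = players above you, plus one
}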
Periodic Update
Rather than giving them a live ranking, change it to only updating every so often, such as every hour. For example, if you look at Stack Overflow's rankings, they are only updated daily.
For this approach, you could schedule a function, or schedule App Engine if it takes longer than 540 seconds to run. The function would write out the player list into a ladder collection, with a new rank field populated with each player's rank. When a player views the ladder, you can now easily get the top X plus the player's own rank in O(X) time.
Better yet, you could further optimize and explicitly write out the top X as a single document as well, so that to retrieve the ladder you only need to read 2 documents (top X and the player), saving money and making it faster.
This approach would really work for any number of players and any write rate, since it's done out of band. You might need to adjust the frequency as you grow, though, depending on your willingness to pay. 30K players each hour would be $0.072 per hour ($1.73 per day) unless you did optimizations (e.g., ignore all 0-score players, since you know they are tied for last).
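One possible shape for that scheduled function, as a hedged sketch (the ladder collection name is an assumption; this uses the v1 firebase-functions scheduling API):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// Recompute the ladder every hour, writing a rank field per player.
exports.rebuildLadder = functions.pubsub.schedule('every 60 minutes')
  .onRun(async () => {
    const db = admin.firestore();
    const snapshot = await db.collection('profiles')
      .where('point', '>', 0)        // optimization: skip 0-score players
      .orderBy('point', 'desc')
      .get();

    let batch = db.batch();
    const commits = [];
    snapshot.docs.forEach((doc, index) => {
      batch.set(db.collection('ladder').doc(doc.id),
                { point: doc.get('point'), rank: index + 1 });
      if ((index + 1) % 500 === 0) { // Firestore batches cap at 500 writes
        commits.push(batch.commit());
        batch = db.batch();
      }
    });
    commits.push(batch.commit());
    await Promise.all(commits);
  });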
Inverted Index
In this method, we'll create somewhat of an inverted index. This method works if there is a bounded score range that is significantly smaller than the number of players (e.g., scores of 0-999 vs 30K players). It could also work for an unbounded score range where the number of unique scores is still significantly smaller than the number of players.
Using a separate collection called 'scores', you have a document for each individual score (non-existent if no one has that score) with a field called player_count.
When a player gets a new total score, you'll do 1-2 writes in the scores collection. One write is to +1 the player_count for their new score, and, if it isn't their first time, -1 the player_count for their old score. This approach works for both "your latest score is your current score" and "your highest score is your current score" style ladders.
Finding out a player's exact rank is as easy as something like SELECT sum(player_count)+1 FROM scores WHERE score > {playerScore}.
Since Cloud Firestore doesn't support sum(), you'd do the above but sum on the client side. The +1 is because the sum is the number of players above you, so adding 1 gives you that player's rank.
Using this approach, you'll need to read a maximum of 999 documents, averaging around 500, to get a player's rank, although in practice this will be less if you delete scores that have zero players.
Write rate of new scores is important to understand as you'll only be able to update an individual score once every 2 seconds* on average, which for a perfectly distributed score range from 0-999 would mean 500 new scores/second**. You can increase this by using distributed counters for each score.
* Only 1 new score per 2 seconds since each score generates 2 writes
** Assuming an average game time of 2 minutes, 500 new scores/second could support 60,000 concurrent players without distributed counters. If you're using a "highest score is your current score" ladder, this will be much higher in practice.
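A hedged sketch of both halves with the Admin SDK; player_count is the field from above, and the extra score field is an assumption added so the range query is easy to express:

const admin = require('firebase-admin');
const db = admin.firestore();

// Move a player from oldScore (or null on their first score) to newScore.
async function recordScore(oldScore, newScore) {
  const batch = db.batch();
  batch.set(db.collection('scores').doc(String(newScore)),
            { score: newScore,
              player_count: admin.firestore.FieldValue.increment(1) },
            { merge: true });
  if (oldScore !== null) {
    batch.set(db.collection('scores').doc(String(oldScore)),
              { player_count: admin.firestore.FieldValue.increment(-1) },
              { merge: true });
  }
  await batch.commit();
}

// Rank = 1 + sum of player_count over all scores above the player's.
async function getRank(playerScore) {
  const snapshot = await db.collection('scores')
    .where('score', '>', playerScore)
    .get();
  let playersAbove = 0;
  snapshot.forEach((doc) => { playersAbove += doc.get('player_count'); });
  return playersAbove + 1;
}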
Sharded N-ary Tree
This is by far the hardest approach, but it could give you both faster and real-time ranking positions for all players. It can be thought of as a read-optimized version of the Inverted Index approach above, whereas the Inverted Index approach is a write-optimized version of this.
You can follow the related article 'Fast and Reliable Ranking in Datastore' for a general approach that is applicable. For this approach, you'll want a bounded score (it's possible with an unbounded one, but it will require changes to the below).
I wouldn't recommend this approach as you'll need to do distributed counters for the top level nodes for any ladder with semi-frequent updates, which would likely negate the read-time benefits.
Final thoughts
Depending on how often you display the leaderboard for players, you could combine approaches to optimize this a lot more.
Combining 'Inverted Index' with 'Periodic Update' at a shorter time frame can give you O(1) ranking access for all players.
As long as, across all players, the leaderboard is viewed more than 4 times over the duration of the 'Periodic Update', you'll save money and have a faster leaderboard.
Essentially, each period, say every 5-15 minutes, you read all documents from scores in descending order. Keep a running total of player_count as you go. Re-write each score into a new collection called scores_ranking with a new field players_above. This new field contains the running total, excluding the current score's player_count.
To get a player's rank, all you need to do now is read the document for the player's score from scores_ranking: their rank is players_above + 1.
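A hedged sketch of that job and the single-document rank read, reusing the scheduled-function shape from the 'Periodic Update' section (all collection and field names follow the text above):

// Periodically materialize ranks from 'scores' into 'scores_ranking'.
exports.rebuildScoresRanking = functions.pubsub.schedule('every 10 minutes')
  .onRun(async () => {
    const db = admin.firestore();
    const snapshot = await db.collection('scores')
      .orderBy('score', 'desc')
      .get();

    let runningTotal = 0;      // players ranked above the current score
    const batch = db.batch();  // split past 500 writes in practice
    snapshot.forEach((doc) => {
      batch.set(db.collection('scores_ranking').doc(doc.id),
                { score: doc.get('score'), players_above: runningTotal });
      runningTotal += doc.get('player_count');
    });
    await batch.commit();
  });

// O(1) rank lookup: one document read.
async function getRank(playerScore) {
  const doc = await admin.firestore()
    .collection('scores_ranking').doc(String(playerScore)).get();
  return doc.get('players_above') + 1;
}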

One solution not mentioned here, which I'm about to implement in my online game and which may be usable in your use case, is to estimate the user's rank if they're not in any visible leaderboard, because frankly the user isn't going to know (or care?) whether they're ranked 22,882nd or 22,838th.
If 20th place has a score of 250 points and there are 32,000 players total, then each point below 250 is worth on average 127 places. You may want to use some sort of curve, though, so that as players move up a point toward the bottom of the visible leaderboard they don't jump exactly 127 places each time; most of the jumps in rank should happen closer to zero points.
It's up to you whether you want to identify this estimated ranking as an estimate or not, and you could add a random salt to the number so it looks authentic:
// Real rank: 22,838
// Display to user:
player rank: ~22.8k   // rounded
player rank: 22,882nd // rounded, with a random salt of 44
I'll be doing the latter.
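A sketch of that estimation, using the illustrative numbers above (the curve is left linear here, and the "salt" is made deterministic so a player always sees the same rank; both are free choices):

// 20th place = 250 points, 32,000 players total.
function estimateRank(playerScore, bottomScore, bottomRank, totalPlayers) {
  const placesPerPoint = (totalPlayers - bottomRank) / bottomScore; // ~127
  const estimated = bottomRank + (bottomScore - playerScore) * placesPerPoint;
  // Deterministic "salt" so the displayed rank looks authentic but is stable.
  const salt = (playerScore * 7919) % 97 - 48;
  return Math.max(bottomRank + 1, Math.round(estimated + salt));
}

// estimateRank(100, 250, 20, 32000) -> roughly 19,249th place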

Alternative perspective: NoSQL and document stores make this type of task overly complex. If you used Postgres, this would be pretty simple using a count function. Firebase is tempting because it's easy to get going with, but use cases like this are where relational databases shine. Supabase (https://supabase.io/) is worth a look: it's similar to Firebase, so you can get going quickly with a backend, but it's open source and built on Postgres, so you get a relational database.
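As a hedged sketch of what that looks like with the supabase-js v2 client (the profiles table and point column mirror the question's data model and are assumptions):

import { createClient } from '@supabase/supabase-js';

const supabase = createClient(SUPABASE_URL, SUPABASE_ANON_KEY);

// Postgres counts server-side; head: true skips returning the rows.
async function getRank(playerScore) {
  const { count, error } = await supabase
    .from('profiles')
    .select('*', { count: 'exact', head: true })
    .gt('point', playerScore);
  if (error) throw error;
  return count + 1;
}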

A solution that hasn't been mentioned by Dan is the use of security rules combined with Google Cloud Functions.
Create the high-scores map. Example:
highScores (top20)
Then:
Give users write/read access to highScores.
Store the smallest score in the leaderboard as a property on the highScores document/map.
Let users write to highScores only if their score > smallest score.
Create a write trigger in Google Cloud Functions that will activate when a new highScore is written. In that function, delete the smallest score.
This looks to me like the easiest option, and it is realtime as well.
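A hedged sketch of that trigger (the document path and map layout are assumptions; the idea is just to trim the map back to 20 entries and refresh the stored smallest score):

// Runs whenever the highScores document is written. The early return on
// 20 or fewer entries also stops the trigger from re-firing endlessly.
exports.trimHighScores = functions.firestore
  .document('leaderboard/highScores')
  .onWrite(async (change) => {
    if (!change.after.exists) return null;
    const scores = change.after.data().scores || {}; // { userId: score }

    const entries = Object.entries(scores);
    if (entries.length <= 20) return null;

    entries.sort((a, b) => b[1] - a[1]);             // descending by score
    const keep = entries.slice(0, 20);
    return change.after.ref.set({
      scores: Object.fromEntries(keep),
      smallestScore: keep[keep.length - 1][1],       // checked by the rules
    });
  });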

You could do something with Cloud Storage: manually maintain a file that has all the users' scores (in order), then just read that file and find the position of the score within it.
Then, to write to the file, set up a cron job that periodically takes all documents with an isWrittenToFile flag of false, adds them all to the file, and marks them as true. That way you won't eat up your writes. And reading a file every time the user wants to view their position is probably not that intensive. It could be done from a cloud function.
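A hedged sketch of the read side (the bucket and file names are made up, and this assumes the cron job writes a JSON array of scores sorted descending):

const { Storage } = require('@google-cloud/storage');
const storage = new Storage();

async function getRankFromFile(playerScore) {
  const [contents] = await storage
    .bucket('my-leaderboard-bucket')     // hypothetical bucket
    .file('scores.json')
    .download();
  const scores = JSON.parse(contents.toString()); // e.g. [9800, 9650, ...]
  const index = scores.findIndex((s) => s <= playerScore);
  return index === -1 ? scores.length + 1 : index + 1;
}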

2022 Updated and Working Answer
To solve the problem of having a leaderboard with users and points, and of knowing your position in this leaderboard in a less problematic way, I have this solution:
1) Structure your Firestore document like this
In my case, I have a document perMission that has a field for each user, with the userId as the property name and that user's leaderboard points as the value.
This makes it easy to update the values from my JavaScript code.
For example, whenever a user completes a mission (updating their points):
import { doc, setDoc, increment } from "firebase/firestore";

// `db` is an initialized Firestore instance; `userId` identifies the player.
const docRef = doc(db, "leaderboards", "perMission");
setDoc(docRef, { [userId]: increment(1) }, { merge: true });
The increment value can be whatever you want. In my case, I run this code every time the user completes a mission, increasing the value.
2) To get your position inside the leaderboard
On the client side, to get your position, you order the values and then loop through them to find your position in the leaderboard.
You could also use the object to get all the users and their respective points, in order. But I am not doing that here; I am only interested in my position.
The code below is commented, explaining each block.
// Values coming from the database.
const leaderboards = {
  userId1: 1,
  userId2: 20,
  userId3: 30,
  userId4: 12,
  userId5: 40,
  userId6: 2
};

// Values coming from your user.
const myUser = "userId4";
const myPoints = leaderboards[myUser];

// Sort the values in descending order.
const sortedLeaderboardsPoints = Object.values(leaderboards).sort(
  (a, b) => b - a
);

// Find your specific position.
// Note: this assumes scores are unique; with ties it would over-count.
const myPosition = sortedLeaderboardsPoints.reduce(
  (previous, current, index) => {
    if (myPoints === current) {
      return index + 1 + previous;
    }
    return previous;
  },
  0
);

// Will print: [40, 30, 20, 12, 2, 1]
console.log(sortedLeaderboardsPoints);

// Will print: 4
console.log(myPosition);
You can now use your position. Note that even if the array is super big, the logic runs on the client side, so be careful with that. You can also improve the client-side code to reduce the array, limit it, etc.
But be aware that you should do the rest of this work on the client side, not the Firebase side.
This answer is mainly to show you how to store and use the database in a "good way".

Related

Firestore query where document id not in array [duplicate]

I'm having a little trouble wrapping my head around how to best structure my (very simple) Firestore app. I have a set of users like this:
users: {
  'A123': {
    'name': 'Adam'
  },
  'B234': {
    'name': 'Bella'
  },
  'C345': {
    'name': 'Charlie'
  }
}
...and each user can 'like' or 'dislike' any number of other users (like Tinder).
I'd like to structure a "likes" table (or Firestore equivalent) so that I can list people who I haven't yet liked or disliked. My initial thought was to create a "likes" object within the user table with boolean values like this:
users: {
  'A123': {
    'name': 'Adam',
    'likedBy': {
      'B234': true
    },
    'disLikedBy': {
      'C345': true
    }
  },
  'B234': {
    'name': 'Bella'
  },
  'C345': {
    'name': 'Charlie'
  }
}
That way if I am Charlie and I know my ID, I could list users that I haven't yet liked or disliked with:
var usersRef = firebase.firestore().collection('users')
.where('likedBy.C345','==',false)
.where('dislikedBy.C345','==',false)
This doesn't work (everyone gets listed), so I suspect that my approach is wrong, especially the '== false' part. Could someone please point me in the right direction on how to structure this? As a bonus question: what happens if somebody changes their name? Do I need to change all of the embedded "likedBy" data? Or could I use a cloud function to achieve this?
Thanks!
There isn't a perfect solution for this problem, but there are alternatives you can do depending on what trade-offs you want.
The options: Overscan vs Underscan
Remember that Cloud Firestore only allows queries that scale independent of the total size of your dataset.
This can be really helpful in preventing you from building something that works in testing with 10 documents, but blows up as soon as you go to production and become popular. Unfortunately, this type of problem doesn't fit that scalable pattern, and the more profiles you have and the more likes people create, the longer it takes to answer the query you want here.
The solution then is to find one or more queries that scale and most closely represent what you want. There are 2 options I can think of that make trade-offs in different ways:
Overscan --> Do a broader query and then filter on the client-side
Underscan --> Do one or more narrower queries that might miss a few results.
Overscan
In the Overscan option, you're basically trading increased cost to get 100% accuracy.
Given your use-case, I imagine this might actually be your best option. Since the total number of profiles is likely orders of magnitude larger than the number of profiles an individual has liked, the increased cost of overscanning is probably inconsequential.
Simply select all profiles that match any other conditions you have, and then on the client side, filter out any that the user has already liked.
First, get all the profiles liked by the user:
var likedUsers = firebase.firestore().collection('users')
  .where('likedBy.C345', '==', true)
Then get all users, checking against the first list and discarding anything that matches.
var allUsers = firebase.firestore().collection('users').get()
Depending on the scale, you'll probably want to optimize the first step, e.g. every time the user likes someone, update an array in a single document for that user containing everyone they have liked. This way you can simply get a single document for the first step.
var likedUsers = firebase.firestore().collection('likedUsers')
.doc('C345').get()
Since this query does scale by the size of the result set (by defining the result set to be the data set), Cloud Firestore can answer it without a bunch of hidden unscalable work. The unscalable part is left to you to optimize (with 2 examples above).
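A sketch of the optimized overscan flow (the likedUsers document with a liked array is the single-document optimization suggested above; the field name is an assumption):

async function getUnseenProfiles(myId) {
  const db = firebase.firestore();

  // One read for the liked set...
  const likedDoc = await db.collection('likedUsers').doc(myId).get();
  const liked = new Set(likedDoc.exists ? likedDoc.data().liked || [] : []);

  // ...then overscan and discard client-side.
  const allUsers = await db.collection('users').get();
  return allUsers.docs.filter((doc) => doc.id !== myId && !liked.has(doc.id));
}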
Underscan
In the Underscan option, you're basically trading accuracy to get a narrower (hence cheaper) set of results.
This method is more complex, so you probably only want to consider it if, for some reason, the ratio of liked to unliked profiles is not as I suspected in the Overscan option.
The basic idea is to exclude someone if you've definitely liked them, and accept the trade-off that you might also exclude someone you haven't yet liked - yes, basically a Bloom filter.
In each user's profile, store a map of true/false values for bits 0 to m (we'll get to what m is later), where everything is set to false initially.
When a user likes the profile, calculate the hash of the user's ID to insert into the Bloom filter and set all those bits in the map to true.
So let's say C345 hashes to 0110 with m = 4; then your map would look like:
likedBy: {
  0: false,
  1: true,
  2: true,
  3: false
}
Now, to find people you definitely haven't liked, you need to use the same concept to query against each bit in the map. For any bit 0 to m that your hash is true on, query for it to be false:
var usersRef = firebase.firestore().collection('users')
.where('likedBy.1','==',false)
Etc. (This will get easier when we support OR queries in the future). Anyone who has a false value on a bit where your user's ID hashes to true definitely hasn't been liked by them.
Since it's unlikely you want to display ALL profiles, just enough to display a single page, you can probably randomly select a single one of the ID's hash bits that is true and just query against it. If you run out of profiles, just select another one that was true and restart.
Assuming most profiles are liked 500 or fewer times, you can keep the false-positive ratio to ~20% or less using m = 1675.
There are handy online calculators to help you work out likes per profile, desired false-positive ratio, and m, for example here.
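A sketch of the hashing side, using m = 1675 from above and k = 3 hash functions built by double hashing two seeded FNV-1a hashes (the specific hash choice is an assumption; any decent string hash works):

const M = 1675;  // bits in the filter
const K = 3;     // hash functions per ID

// Seeded 32-bit FNV-1a hash.
function fnv1a(str, seed) {
  let hash = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// The bit positions user `id` sets in a profile's likedBy map.
function bitsFor(id) {
  const h1 = fnv1a(id, 0);
  const h2 = fnv1a(id, 0x9e3779b9);
  const bits = [];
  for (let i = 0; i < K; i++) bits.push((h1 + i * h2) % M);
  return bits;
}

// On like: set each bit in bitsFor(myId) to true in the profile's likedBy map.
// On query: pick one bit b from bitsFor(myId) and use
//   .where(`likedBy.${b}`, '==', false)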
Overscan - bonus
You'll quickly realize with the Overscan option that every time you run the query, the same profiles the user didn't like last time will be shown. I'm assuming you don't want that. Worse, all the ones the user liked will appear early in the query, meaning you'll end up having to skip them all the time, increasing your costs.
There is an easy fix for that, use the method I describe on this question, Firestore: How to get random documents in a collection. This will enable you to pull random profiles from the set, giving you a more even distribution and reducing the chance of stumbling on lots of previously liked profiles.
Underscan - bonus
One problem I suspect you'll have with the Underscan option is really popular profiles. If someone is almost always liked, you might start exceeding the usefulness of a Bloom filter whose size is reasonable to keep in a single document (you'll want m to be less than, say, 8000 to avoid running into per-document index limits in Cloud Firestore).
For this problem, you want to combine in the Overscan option just for these profiles. Using Cloud Functions, any profile that has more than x% of its map set to true gets a popular flag set to true. Overscan everyone with the popular flag and weave them into your results from the Underscan (remember to do the client-side discard step).

Database schema for a Tinder-like app

I have a database of millions of objects (simply put: a lot of objects). Every day I will present my users with 3 selected objects, and like with Tinder they can swipe left to say they don't like an object, or swipe right to say they like it.
I select each object based on its location (the closest to the user are selected first) and also based on a few user settings.
I'm using MongoDB.
Now the problem: how do I implement the database in a way that can quickly provide a daily selection of objects to show to the end user (skipping all the objects they have already swiped)?
Well, considering you have made your choice of using MongoDB, you will have to maintain multiple collections. One is your main collection, and you will have to maintain user-specific collections which hold user data, say the document ids the user has swiped. Then, when you want to fetch data, you might want to do a $setDifference aggregation. $setDifference does this:
Takes two sets and returns an array containing the elements that only exist in the first set; i.e. performs a relative complement of the second set relative to the first.
Now how performant this is would depend on the size of your sets and the overall scale.
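A hedged sketch of that aggregation in shell/driver syntax (collection and field names are assumptions; the user document is assumed to hold a swipedIds array):

// Candidate ids from the main collection, filtered by location/settings.
const candidateIds = db.objects
  .find({ /* location and user-settings filters */ }, { _id: 1 })
  .limit(100)
  .toArray()
  .map((doc) => doc._id);

// Relative complement: candidates minus everything already swiped.
const result = db.users.aggregate([
  { $match: { _id: currentUserId } },
  { $project: { unseen: { $setDifference: [candidateIds, '$swipedIds'] } } },
]).toArray();

// result[0].unseen: candidate ids the user has not swiped yet.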
EDIT
I agree with your comment that this is not a scalable solution.
Solution 2:
One solution I could think of is to use a graph-based database like Neo4j. You could represent all your 1M objects and all your user objects as nodes, and have relationships between users and the objects they have swiped. Your query would return a list of all objects the user is not connected to.
You cannot shard a graph, which brings up scaling challenges. Graph-based solutions also require that the entire graph be in memory. So the feasibility of this solution depends on you.
Solution 3:
Use MySQL. Have 2 tables: one being the objects table, and the other a (uid, viewed_object) mapping. A join would solve your problem. Joins work well for the longest time, until you hit scale. So I don't think it's a bad starting point.
Solution 4:
Use Bloom filters. Your problem eventually boils down to a set-membership problem: given a set of ids, check whether an id is part of another set. A Bloom filter is a probabilistic data structure that answers set membership. Bloom filters are super small and super efficient. But they are probabilistic: false negatives will never happen, but false positives can, so that's a trade-off. Check out this post for how they can be used: http://blog.vawter.com/2016/03/17/Using-Bloomfilters-to-Avoid-Repetition/
I'll update the answer if I can think of something else.

Player matching in Google Play Services based on player ranking/skill

Is there a way I can match players using Google Play Game Service based on individual players' skill level in the game?
I have locally stored the player level of each player, and want a player to be matched to his/her closest ranked player.
For example: a player ranked 10 (beginner) should be paired with the closest ranked player available (e.g. 5 to 15) instead of an expert level 100 player, so that we can have a balanced competition.
There are two variables that can be set to influence the matchmaking:
First, you can set a variant of the game using RoomConfig.Builder.setVariant(). This method takes a non-negative value indicating the type of match. The variant specified needs to match exactly with other participants in order for auto-matching to take place. I suppose you could be strict in your matchmaking and use the variant as the player's level; in this way, players would only be matched with players of the same level. An alternative would be to group levels together in a range, for example levels 1-5 could play each other, likewise 6-8, etc.; a small sketch of this bucketing follows.
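As an illustrative sketch of that bucketing (shown in JavaScript to match the rest of this page; the real call site would be Java on Android, and the bucket size is arbitrary):

// Map a player level to a match variant; players only auto-match
// within the same bucket (levels 1-5 -> 1, 6-10 -> 2, ...).
function matchVariant(playerLevel) {
  const BUCKET_SIZE = 5;
  return Math.floor((playerLevel - 1) / BUCKET_SIZE) + 1; // must be > 0
}

// matchVariant(3) === 1, matchVariant(7) === 2, matchVariant(100) === 20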
The second variable is the exclusiveBitMask, which is passed in when calling RoomConfig.createAutoMatchCriteria(). This method takes the min and max number of players to match, plus the exclusiveBitMask. This mask, when logically AND'ed with the other players' masks, must equal 0. It is used for things like role-based games (e.g., a match needs 1 offense and 1 defense player). One possible use would be to mask out high-level vs. low-level capabilities so there is no outrageous mismatch.
I think Clayton Wilkinson's answer is all correct and I voted it up.
But I imagine the OP is hoping for some way to do skill-based matching without splitting the player base into segments.
Sadly, the answer is no, you can't. The choices are to use some other matchmaking system or to split up your player base. If you split your player base, you need a lot of concurrent users to avoid making your players wait a long time.
On a recent title we rolled our own matchmaking service based on RakNet because we wanted more nuanced matchmaking. It's a lot of hassle, though, and GPGS is pretty great otherwise, so skill-based matching would have to be a very high priority before you consider abandoning GPGS.

MongoDB: sort by relevance function

I want to create a basic sorting algorithm determined by relevance in MongoDB and Meteor. However, I cannot store the result statically and update it, because it uses the current time as one of its parameters.
Ideally, I'd like to have something like the following:
Post.relevance = function (magnitude) {
  magnitude = magnitude || 1.8;
  check(magnitude, Number);
  var score = this.upvotes - this.downvotes;
  // Hours elapsed since the post was created.
  var hoursAgo = moment().diff(moment(this.createdAt), 'hours');
  return (score - 1) / Math.pow(hoursAgo + 2, magnitude);
};
From what I gather, I will have to use the aggregation pipeline to generate this query, but I can't quite work out the details for my post page.
How would one generate this advanced query using MongoDB?
The algorithm depends on how you define relevance.
Before going on, I'd like to mention that I haven't previously implemented such an algorithm in a production environment and I am only expressing my personal opinion on how I would tackle this.
Personally, according to your schema, I would consider the following methods to be the most common in determining relevance:
Relevance in terms of popularity - this is how search engines determine the relevance of content: the more views a website gets, the more relevant it is
Relevance in terms of quality - in your case, you could go for a dynamically generated score based on the upvotes/downvotes ratio
Time relevance - the way you are currently quantifying relevance, using an algorithm that uses time as a filtering mechanism; still, I wouldn't go with this one, as relevant content will always be valuable
Out of the 3 aforementioned scenarios, I would recommend you go for a mix between the first two.
You have to find a good way to represent the relationship between view popularity, upvotes, and downvotes. This means that you first have to update your database schema so that it holds a view count for each post:
{
  _id: ObjectId(...),
  title: 'A Random Post',
  authorId: ObjectId(...),
  createdAt: '01-01-1900',
  editedAt: '02-01-1900',
  upvotes: 76,
  downvotes: 15,
  viewCount: 8655,
  relevance:
}
Afterwards, you can determine a formula to calculate the relevance. For example, if you assume that the more views a post gets, the more popular it is, you could use the following formula:
Relevance = viewCount * upvotes/downvotes
Still, the most important part is how you choose to store the relevance attribute.
As I see it, you have two possible options:
Store it in the database alongside all the other elements - this means that you would have to constantly update the relevance for each post, while continuously issuing $inc updates for viewCount, upvotes, and downvotes
Determine the relevance after querying the database - only issue $inc updates for viewCount, upvotes, and downvotes; after pulling the data from the database, you would have to parse the resulting array and quantify the relevance, without storing it in the database
Obviously, the first scenario would generate a lot more strain on the server due to a greater number of update operations. Still, it would allow you to query relevant posts by firing a simple query.
//Top 10 most relevant posts
db.posts.find({}).sort({ relevance: -1 }).limit(10);
If you go with the second option, you would first have to pull all the documents from the database and then do some extra work to determine the relevance, before sending the data from the server.
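For completeness, a sketch of the second option as an aggregation pipeline that computes relevance at query time instead of storing it (the $max guards against division by zero; field names follow the schema above):

// Top 10 posts by relevance, computed on the fly.
db.posts.aggregate([
  { $addFields: {
      relevance: {
        $multiply: [
          '$viewCount',
          { $divide: ['$upvotes', { $max: ['$downvotes', 1] }] },
        ],
      },
  } },
  { $sort: { relevance: -1 } },
  { $limit: 10 },
]);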

Why is my identifier collision rate increasing?

I'm using a hash of IP + User Agent as a unique identifier for every user that visits a website. This is a simple scheme with a pretty clear pitfall: identifier collisions. Multiple individuals browse the internet with the same IP + user agent combination. Unique users identified by the same hash will be recognized as a single user. I want to know how frequently this identifier error will be made.
To calculate the frequency, I've created a two-step funnel that should theoretically convert at zero percent: publish.click > signup.complete. (Users have to signup before they publish.) Running this funnel for 1 day gives me a conversion rate of 0.37%. That figure is, I figured, my unique identifier collision probability for that funnel. Looking at the raw data (a table about 10,000 rows long), I confirmed this hypothesis. 37 signups were completed by new users identified by the same hash as old users who completed publish.click during the funnel period (1 day). (I know this because hashes matched up across the funnel, while UIDs, which are assigned at signup, did not.)
I thought I had it all figured out...
But then I ran the funnel for 1 week, and the conversion rate increased to 0.78%. For 5 months, the conversion rate jumped to 1.71%.
What could be at play here? Why does my conversion (collision) rate increase as the experiment period widens?
I think it may have something to do with the fact that unique users typically fire signup.complete only once, while they may fire publish.click multiple times over the course of a period. I'm struggling, however, to put this hypothesis into words.
Any help would be appreciated.
Possible explanations starting with the simplest:
The collision rate is relatively stable, but your initial measurement isn't significant because of the low volume of positives that you got. 37 isn't very many. In this case, you've got two decent data points.
The collision rate isn't very stable and changes over time as usage changes (at work, at home, using mobile, etc.), and the fact that you got three data points showing an upward trend is just a coincidence. This wouldn't surprise me, as funnel conversion rates change significantly over time, especially on a weekly basis. There are also bots that haven't been caught.
If you really do get multiple publishes, while sign-ups are strictly a one-time event, then your collision rate will increase as users who initially only signed up eventually publish. That won't increase their own funnel conversion, but it provides an extra publish for somebody else to collide with. Essentially, every additional publish raises the probability that a new user will be confused with a previous publish event.
Note from OP. Hypothesis 3 turned out to be the correct hypothesis.