How to maximize existing number field in Cloud Firestore - swift

I have several fields in one document that hold the user's records (best scores) for mini games. After the user has played a few of them, I update the records in this document. An existing record must never be overwritten with a smaller value, so effectively I need to take the maximum of the old and new values.
The solutions I have considered:
Transactions. This does not work for me because transactions require an Internet connection.
Cloud Functions. I can trigger a function when the document is updated or created. This works, but it complicates the logic in my application a lot.
Security Rules. I can reject a write if the new value is less than the old one, but this only works well when writing one field at a time.
Ideally, I would like something like the following:
let data: [String: Any] = [
    "game_id0": FieldValue.maximum(newRecord0),
    "game_id1": FieldValue.maximum(newRecord1),
    "game_id2": FieldValue.maximum(newRecord2),
]
let docRef = db.collection("user_records").document(documentId)
docRef.setData(data, merge: true)
But unfortunately the FieldValue class only has the methods increment, arrayUnion, arrayRemove, and delete.
In the description of the protocol I found maximum and minimum methods, but I doubt they can legitimately be used.
Can anyone tell me any other feasible method?
UPD:
Let the following document be stored on the server:
{ "game_id": 13 }
The user plays the game on one device (which is offline) and scores 20 points.
Then the same user plays the same game on another device (which is online) and scores 22 points. An update request is sent, and the server now stores the following:
{ "game_id": 22 }
The first device then comes back online and its write goes through. The document is overwritten and takes the following form:
{ "game_id": 20 }
That is, the user's previously achieved record has been overwritten.
But I need the write to happen only if the new value is greater than the current one; that is, the data after step two should not change.

If you can't use a transaction (which would normally be the right thing to do), then you have to use one of your other methods; I don't think you have any alternatives. You will probably have an easier time with Cloud Functions, and I don't think that will complicate things as much as you fear. It should only take a few lines of code to check that an updated value is not less than the existing one.
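For illustration only, here is a rough TypeScript sketch of what such a function could look like, assuming the user_records collection from the question; the function name and the exact trigger are my own choices, not anything from your code:

import * as functions from "firebase-functions";
import * as admin from "firebase-admin";

admin.initializeApp();

// Sketch: restore any numeric field of user_records/{docId} that was lowered.
// The collection name comes from the question; everything else is illustrative.
export const keepMaxRecords = functions.firestore
    .document("user_records/{docId}")
    .onWrite(async (change) => {
        if (!change.before.exists || !change.after.exists) return null;

        const before = change.before.data()!;
        const after = change.after.data()!;
        const fixes: Record<string, number> = {};

        for (const [field, oldValue] of Object.entries(before)) {
            const newValue = after[field];
            if (typeof oldValue === "number" && typeof newValue === "number" && newValue < oldValue) {
                fixes[field] = oldValue;
            }
        }

        // Only write when something was actually lowered, so the function
        // does not keep re-triggering itself.
        return Object.keys(fixes).length > 0 ? change.after.ref.update(fixes) : null;
    });

Note that this fixes the value on the server after the fact; the offline device will still briefly see its own lower value locally until it syncs and receives the corrected document.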

Related

Firebase analytics - Unity - time spent on a level

Is there any way to get the exact time spent on a certain level in a game via Firebase Analytics? Thank you so much 🙏
I tried to use logEvents.
The best way to do this is to measure the time on the level within your codebase, then log a dedicated event for level completion in which you pass the time spent on the level.
Let's get to details. I will use Kotlin as an example, but it should be obvious what I'm doing here and you can see more language examples here.
firebaseAnalytics.setUserProperty("user_id", userId)
firebaseAnalytics.logEvent("level_completed") {
    param("name", levelName)
    param("difficulty", difficulty)
    param("subscription_status", subscriptionStatus)
    param("minutes", minutesSpentOnLevel)
    param("score", score)
}
Now see how I have a bunch of parameters with the event? These parameters are important since they will allow you to conduct a more thorough and robust analysis later on and answer more questions. Like: Hey, what is the most difficult level? Do people still have trouble with it when the game difficulty is lower? How many times has this level been rage-quit or lost (for that you'd likely need a level_started event)? What about our paid players, are they having similar trouble on this level as well? How many people have rage-quit the game on this level and never played again? That last one would likely be easier to answer with SQL at this point, taking the latest value of the level name from level_started, grouped by user_id. Or you could have levelName as a UserProperty as well as an event parameter, in which case it would be fairly trivial to answer in the default analytics interface.
Note that you're limited in the number of event parameters you can send per event. The total number of unique parameter names is limited too. As well as the number of unique event names you're allowed to have. In our case, the event name would be level_completed. See the limits here.
Because of those limitations, it's important to name your event parameters in a somewhat generic way so that you can efficiently reuse them elsewhere. That's why I named the parameter minutes and not something like minutes_spent_on_the_level. You could then reuse this parameter to send the minutes the player spent actively playing, minutes spent idling, minutes spent on any info page, minutes spent choosing upgrades, etc. The same idea applies to having a name parameter rather than level_name; it could just as well be id.
You need to carefully and thoughtfully stuff your event with parameters. I normally have a wrapper around the Firebase SDK in which I enrich events with dimensions that I always want to be there, like user_id or subscription_status, so I don't have to add them manually every time I send an event. I also usually add some more adequate logging there, since Firebase Analytics' default logging is completely awful. I also do some sanitizing there: lowercasing all values unless I'm passing something case-sensitive like base64, making sure there are no double spaces (replacing \s+ with a single space), and maybe adding the user's local timestamp as another parameter. The latter is very helpful for spotting time-cheating users, especially if your game is an idler.
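My examples here are Kotlin, but just to make that wrapper idea concrete, here is a rough TypeScript sketch against the Firebase web SDK; the names (trackEvent, identify, currentSubscriptionStatus) and the enrichment fields are invented for the illustration, and only the sanitizing rules come from the paragraph above.

import { getAnalytics, logEvent, setUserProperties } from "firebase/analytics";

const analytics = getAnalytics();

function sanitize(value: unknown): unknown {
    if (typeof value !== "string") return value;
    // Lowercase and collapse whitespace, as suggested above.
    return value.toLowerCase().replace(/\s+/g, " ").trim();
}

export function trackEvent(name: string, params: Record<string, unknown> = {}): void {
    const enriched = {
        ...Object.fromEntries(Object.entries(params).map(([k, v]) => [k, sanitize(v)])),
        // Dimensions you always want attached, so you don't add them by hand each time.
        subscription_status: currentSubscriptionStatus(),
        client_timestamp: Date.now(), // helps spot time-cheating users
    };
    console.debug("[analytics]", name, enriched); // your own, more readable logging
    logEvent(analytics, name, enriched);
}

// user_id is a user property, set once rather than on every event
// (mirrors the setUserProperty call in the Kotlin example).
export function identify(userId: string): void {
    setUserProperties(analytics, { user_id: userId });
}

// Placeholder for wherever your app keeps this state.
function currentSubscriptionStatus(): string {
    return "free";
}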
Good. We're halfway there :) Bear with me.
Now you need to go to Firebase and register your eps (event parameters) as cds (custom dimensions and metrics). If you don't register your eps, you won't be able to report on them; once registered, they count towards the global cd limit (roughly 50 custom dimensions and 50 custom metrics). You register the cds in the Custom Definitions section of Firebase.
Now you need to know whether this is a dimension or a metric, as well as the scope of your dimension. It's much easier than it sounds. The rule of thumb is: if you want to be able to run mathematical aggregation functions on your dimension, then it's a metric. Otherwise - it's a dimension. So:
firebaseAnalytics.setUserProperty("user_id", userId) <-- dimension
param("name", levelName) <-- dimension
param("difficulty", difficulty) <-- dimension (or can be a metric, depends)
param("subscription_status", subscriptionStatus) <-- dimension (can be a metric too, but even less likely)
param("minutes", minutesSpentOnLevel) <-- metric
param("score", score) <-- metric
Now another important thing to understand is the scope. Because Firebase and GA4 are still essentially in beta and being actively worked on, you only have user or hit scope for dimensions and only hit scope for metrics. The scope basically indicates how the value persists. In my example, we only need user_id as a user-scoped cd. Because user_id is a user-level dimension, it is set separately from the logEvent function. Although I suspect you can do it there too; haven't tried, though.
Now, we're almost there.
Finally, you don't want to use Firebase to look at your data. It's horrible at data presentation. It's good at debugging, though, because that's what it was initially intended for. Because of how horrible it is, it's always advised to link it to GA4, which lets you look at the Firebase values much more efficiently. Note that you will likely need to re-register your custom dimensions from Firebase in GA4, because GA4 can ingest multiple data streams, of which Firebase is just one source. GA4's cd limits are very close to Firebase's, though. OK, let's be frank: GA4's data model is almost exactly copied from Firebase's, but GA4 has much better analytics capabilities.
Good, you've moved to GA4. Now, GA4 is a very raw, not-officially-beta product, just like Firebase Analytics. Because of that, it's advised to first change your data retention to 12 months and to use only the Explorer for analysis, pretty much ignoring the pre-generated reports; they are just not very reliable at this point.
Finally, you may find it easier to just use SQL for your analysis. For that, you can easily copy your data from GA4 into a sandbox instance of BQ; it's very easy to do, and it's the best, most reliable known way of using GA4 at the moment. I mean, advanced analysts do the export into BQ, then ETL the data from BQ into proper storage like Snowflake, S3, Aurora, or whatever you prefer, and then on top of that use a proper BI tool like Looker, Power BI, Tableau, etc. A lot of people just stay in BQ, though, and that's fine; lots of BI tools have BQ connectors. It's just that BQ gets expensive quickly if you do a lot of analysis.
Whew, I hope you'll enjoy analyzing your game's data. Data-driven decisions rock in games. Well... They rock everywhere, to be honest.

Does Firebase Real time database always read the complete node if you reference it?

On a hypothetical node structure like:
NodeA:
  - Subnode1: 000000001
  - Subnode2: "thisIsAVeeeeeeeeeeeryLoooongString"
I would like to update NodeA every X minutes, just writing it without reading it. Subnode1 would be a timestamp that I set with Server.TimeStamp and Subnode2 would be a changing string.
I would like to know whether just referencing 'NodeA' makes Firebase read the contents of the whole node, and if it does, whether there is a way to avoid it, since Subnode2 can be quite heavy and I would like to control when I read it.
Clarifications:
I'm not reading the node using any querying function. My question arises because I wonder whether, when the app starts, the referenced nodes (using dbReference = fbbase.GetReference(path)) are read automatically.
I know I could use a different reference for each node, but then I would incur different upload costs, since it would mean 2 different connections (yes, uploads also have costs depending on their frequency).
I'm using Firebase SDK for Unity.
Thanks in advance.
If you query NodeA, it will pull down the entire contents of that node, including all of its children.
If you want just a specific child, query it instead. You can certainly build a path to Subnode1 if you want.
There is no way to exclude a certain child from a query, while getting all others. If you don't want all children, you must query each desired child individually.
Firebase rtdb charges on storage volumes and data downloads. If you are simply updating the record in the node you should not incur costs other than minor network costs.
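To make that concrete, here is a small sketch of writing only Subnode1 and reading Subnode2 only on demand. The question is about the Unity SDK, but the idea is the same, so this uses the web SDK in TypeScript; the node names come from the question, everything else is illustrative.

import { getDatabase, ref, update, serverTimestamp, get, child } from "firebase/database";

const db = getDatabase();

// Writing is not a read: this only uploads the new value for Subnode1.
// The Unity SDK would build the same child path with GetReference("NodeA/Subnode1").
async function touchNode(): Promise<void> {
    await update(ref(db, "NodeA"), { Subnode1: serverTimestamp() });
}

// Read the heavy Subnode2 only when you actually want it.
async function readHeavyString(): Promise<string | null> {
    const snap = await get(child(ref(db), "NodeA/Subnode2"));
    return snap.exists() ? (snap.val() as string) : null;
}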
A reference does not incur any fees;
that being said, reads and writes do.
The reason is that a reference is a hypothetical location for a document or a query and does not necessarily exist until its contents have been populated by an update snapshot.
When you read or write to a node, your data plus overhead is charged based on the current per-KB cost model.

Firestore query where document id not in array [duplicate]

I'm having a little trouble wrapping my head around how to best structure my (very simple) Firestore app. I have a set of users like this:
users: {
  'A123': {
    'name': 'Adam'
  },
  'B234': {
    'name': 'Bella'
  },
  'C345': {
    'name': 'Charlie'
  }
}
...and each user can 'like' or 'dislike' any number of other users (like Tinder).
I'd like to structure a "likes" table (or Firestore equivalent) so that I can list people who I haven't yet liked or disliked. My initial thought was to create a "likes" object within the user table with boolean values like this:
users: {
  'A123': {
    'name': 'Adam',
    'likedBy': {
      'B234': true,
    },
    'disLikedBy': {
      'C345': true
    }
  },
  'B234': {
    'name': 'Bella'
  },
  'C345': {
    'name': 'Charlie'
  }
}
That way if I am Charlie and I know my ID, I could list users that I haven't yet liked or disliked with:
var usersRef = firebase.firestore().collection('users')
.where('likedBy.C345','==',false)
.where('dislikedBy.C345','==',false)
This doesn't work (everyone gets listed) so I suspect that my approach is wrong, especially the '==false' part. Could someone please point me in the right direction of how to structure this? As a bonus extra question, what happens if somebody changes their name? Do I need to change all of the embedded "likedBy" data? Or could I use a cloud function to achieve this?
Thanks!
There isn't a perfect solution for this problem, but there are alternatives you can do depending on what trade-offs you want.
The options: Overscan vs Underscan
Remember that Cloud Firestore only allows queries that scale independent of the total size of your dataset.
This can be really helpful in preventing you from building something that works in test with 10 documents, but blows up as soon as you go to production and become popular. Unfortunately, this type of problem doesn't fit that scalable pattern and the more profiles you have, and the more likes people create, the longer it takes to answer the query you want here.
The solution then is to find one or more queries that scale and most closely represent what you want. There are 2 options I can think of that make trade-offs in different ways:
Overscan --> Do a broader query and then filter on the client-side
Underscan --> Do one or more narrower queries that might miss a few results.
Overscan
In the Overscan option, you're basically trading increased cost to get 100% accuracy.
Given your use-case, I imagine this might actually be your best option. Since the total number of profiles is likely orders of magnitude larger than the number of profiles an individual has liked, the increased cost of overscanning is probably inconsequential.
Simply select all profiles that match any other conditions you have, and then on the client side, filter out any that the user has already liked.
First, get all the profiles liked by the user:
var likedUsers = firebase.firestore().collection('users')
  .where('likedBy.C345','==',true)
Then get all users, checking against the first list and discarding anything that matches.
var allUsers = firebase.firestore().collection('users').get()
Depending on the scale, you'll probably want to optimize the first step, e.g. every time the user likes someone, update an array in a single document for that user for everyone they have liked. This way you can simply get a single document for the first step.
var likedUsers = firebase.firestore().collection('likedUsers')
.doc('C345').get()
Since this query does scale by the size of the result set (by defining the result set to be the data set), Cloud Firestore can answer it without a bunch of hidden unscalable work. The unscalable part is left to you to optimize (with 2 examples above).
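Putting the two steps together, here is a rough sketch of the client-side discard; the likedUsers document holding an ids array is an assumed shape following the single-document optimization above, and the code keeps the namespaced firebase.firestore() style used so far.

// Sketch of the Overscan flow: fetch the single "who have I liked" document,
// then fetch a page of profiles and discard matches on the client.
async function getUnseenProfiles(myId: string) {
    const likedSnap = await firebase.firestore()
        .collection('likedUsers').doc(myId).get();
    const likedIds = new Set<string>((likedSnap.data()?.ids ?? []) as string[]);

    const allSnap = await firebase.firestore()
        .collection('users').limit(50).get(); // page through rather than reading everything

    return allSnap.docs.filter(doc => !likedIds.has(doc.id) && doc.id !== myId);
}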
Underscan
In the Underscan option, you're basically trading accuracy to get a narrower (hence cheaper) set of results.
This method is more complex, so you probably only want to consider it if for some reason the liked to unliked ratio is not as I suspect in the Overscan option.
The basic idea is to exclude someone if you've definitely liked them, and accept the trade-off that you might also exclude someone you haven't yet liked - yes, basically a Bloom filter.
In each user's profile, store a map of true/false values indexed from 0 to m (we'll get to what m is later), with everything initially set to false.
When a user likes the profile, hash that user's ID into the Bloom filter and set all the corresponding bits in the map to true.
So let's say C345 hashes to 0110 with m = 4; then your map would look like:
likedBy: {
  0: false,
  1: true,
  2: true,
  3: false
}
Now, to find people you definitely haven't liked, you use the same idea to query against each bit in the map. For every bit from 0 to m that your hash sets to true, query for it to be false:
var usersRef = firebase.firestore().collection('users')
.where('likedBy.1','==',false)
Etc. (This will get easier when we support OR queries in the future). Anyone who has a false value on a bit where your user's ID hashes to true definitely hasn't been liked by them.
Since it's unlikely you want to display ALL profiles, just enough to fill a single page, you can probably randomly select a single one of the ID's hash bits that is true and query against just that. If you run out of profiles, select another bit that was true and restart.
Assuming most profiles are liked 500 times or fewer, you can keep the false-positive ratio to ~20% or less using m = 1675.
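As a sanity check (using the standard Bloom-filter estimate, not anything Firestore-specific): with k hash functions the false-positive rate is roughly p ≈ (1 - e^(-k*n/m))^k, and with n = 500 likes, m = 1675 and k = 2 that works out to about 0.20, i.e. the ~20% above.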
There are handy online calculators to help you work out ratios of likes per profile, desired false positive ratio, and m, for example here.
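To make the mechanics concrete, here is a rough sketch of the bit-setting and the query; the two-hash scheme and the hash function are arbitrary choices for the illustration, not something Firestore prescribes.

// Illustrative Bloom-filter helpers for the Underscan option.
const m = 1675;

function hashes(userId: string, k = 2): number[] {
    // Simple FNV-1a style hash, salted per hash function. Any stable hash works.
    const out: number[] = [];
    for (let i = 0; i < k; i++) {
        let h = 2166136261 ^ i;
        for (const ch of userId) {
            h = Math.imul(h ^ ch.charCodeAt(0), 16777619);
        }
        out.push(Math.abs(h) % m);
    }
    return out;
}

// When a user likes a profile: set that profile's likedBy bits to true.
async function recordLike(profileId: string, likerId: string) {
    const updates: Record<string, boolean> = {};
    for (const bit of hashes(likerId)) {
        updates[`likedBy.${bit}`] = true;
    }
    await firebase.firestore().collection('users').doc(profileId).update(updates);
}

// To find profiles the liker definitely hasn't liked: take one of the liker's
// true bits (here simply the first) and query for profiles where it is still false.
function definitelyNotLikedQuery(likerId: string) {
    const [bit] = hashes(likerId);
    return firebase.firestore().collection('users')
        .where(`likedBy.${bit}`, '==', false);
}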
Overscan - bonus
You'll quickly realize with the Overscan option that every time you run the query, the same profiles the user didn't like last time will show up again. I'm assuming you don't want that. Worse, all the ones the user has liked will be early in the query results, meaning you'll keep having to skip them, which increases your costs.
There is an easy fix: use the method I describe in this question, Firestore: How to get random documents in a collection. It lets you pull random profiles from the set, giving you a more even distribution and reducing the chance of stumbling on lots of previously liked profiles.
Underscan - bonus
One problem I suspect you'll have with the Underscan option is really popular profiles. If someone is liked by almost everyone, the Bloom filter stops being useful unless it grows beyond what is reasonable to keep in a single document (you'll want m to be less than, say, 8000 to avoid running into per-document index limits in Cloud Firestore).
For this problem, you want to combine in the Overscan option just for these profiles. Using Cloud Functions, any profile that has more than x% of its map set to true gets a popular flag set to true. Overscan everyone with the popular flag and weave them into your results from the Underscan (remember to do the client-side discard step from the Overscan option).

Saving only modified fields

I am working on a game where a player is switched between servers based on their location in the game world to give the illusion of a single server while maintaining scalability. When a player joins the server their data is loaded from a database (MongoDB) and when they quit or change server their data is saved.
My problem arises from cases where a player's data is modified by a server other than the one the player is on, which needs to happen occasionally. The data in the database is changed, but when the player leaves or changes server, that change is overwritten by the stale copy.
To solve this problem I was thinking of storing only the modified data, since usually the data you want is the most recently changed. However, while looking for ways to do this, I noticed a lack of cases where it has been done. Are there any good reasons not to do this and to use another method to ensure modified data is not overwritten? The only problem I could think of is data consistency, where some fields are updated but only part of them is written back, potentially putting the player in an invalid state; this could be avoided fairly easily by updating all dependent fields together.
If there are any other reasons against persisting only a selection of an object, or other ways to solve this problem that don't introduce major issues, I would love to hear them.
This is a classic example of underlying state change between DB and code.
Add an integer to your player profile/data document; call it v. Let's assume v = 6.
When the player joins, the server loads the record and knows that its "local" view of the data is v = 6. When the player leaves, the code calls:
findAndModify({
    query: { "userID": "ID1", "v": 6 },
    update: { "$inc": { v: 1 }, "$set": { fldtochange: newval, anotherfldtochange: newval2 } }
});
We show the literal 6 here for simplicity, but it would be a variable populated during the server load. This command succeeds ONLY if the original value v = 6 is still intact. If someone has changed it, no update occurs, and you can take a variety of paths to recover, including re-reading the data and computing a delta against the state on your local server. If v = 6 is still there, it is atomically incremented by 1 (e.g. to 7) and the rest of the fields are set to their new values.
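The same guard with the Node.js driver would look roughly like this; the database, collection and field names are placeholders, not something from your setup.

import { MongoClient } from "mongodb";

// Sketch of the optimistic version check described above.
async function savePlayer(client: MongoClient, userId: string,
                          loadedVersion: number, changes: Record<string, unknown>) {
    const players = client.db("game").collection("players");

    // Succeeds only if nobody bumped v since this server loaded the document.
    const result = await players.updateOne(
        { userID: userId, v: loadedVersion },
        { $inc: { v: 1 }, $set: changes }
    );

    if (result.matchedCount === 0) {
        // Someone else changed the document first: re-read, reconcile, retry.
        throw new Error("stale player data, reload and retry");
    }
}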

Creating new collections vs array properties

Coming from a MySQL background, I've been questioning some of the design patterns for working with Mongo. One question I keep asking myself is: when should I create a new collection vs. a property of an array type? My current situation is as follows:
I have a collection of Users who all have at least 1 Inbox
Each inbox has 0 or more messages
Each message can have 0 or more comments
My current structure looks like this:
{
  username: "danramosd",
  inboxes: [
    {
      name: "inbox1",
      messages: [
        {
          message: "this is my message",
          comments: [
            {
              comment: "this is a great message"
            }
          ]
        }
      ]
    }
  ]
}
For simplicity I only listed 1 inbox, 1 message and 1 comment. Realistically though there could be many more.
An approach that I believe would work better is to use 4 collections:
Users - stores just the username
Inboxes - name of the inbox, along with the UID of User it belongs to
Messages - content of the message, along with the UID of inbox it belongs to
Comments - content of the comment, along with the UID of the message it belongs to.
So which one would be the better approach?
No one can help you with this question, because it is highly dependent on your application:
how many inboxes/messages/comments do you have on average
how often do you write/modify/delete these elements
how often do you read them
a lot of other things that I forgot to mention
When you are selecting one approach over another, you are making trade-offs.
If you store everything together (in one collection, as in your first case), you make it super easy to get all the data for a particular user. Setting aside the fact that you most probably do not need all the information at once, it also makes it super hard to update parts of the elements (try writing a query that adds a comment or removes the third comment). Even where such an update is easy, MongoDB does not handle growing documents well: whenever a document outgrows its padding it is moved to another location (which is expensive) and the padding factor is increased. Also keep in mind that this can potentially hit MongoDB's limit on document size.
It is always a good idea to read the MongoDB use cases before designing any storage schema. Not surprisingly, they have a comprehensive overview of your case as well.
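For contrast, in the four-collection layout from the question, adding a comment is just an insert of a new document with a reference back to its message, so documents never grow. A rough sketch with the Node.js driver (the database and field names are assumptions):

import { MongoClient, ObjectId } from "mongodb";

// Normalized approach: each comment is its own document pointing at its message.
async function addComment(client: MongoClient, messageId: ObjectId, text: string) {
    await client.db("app").collection("comments").insertOne({
        messageId,          // UID of the message this comment belongs to
        comment: text,
        createdAt: new Date(),
    });
}

async function commentsForMessage(client: MongoClient, messageId: ObjectId) {
    return client.db("app").collection("comments")
        .find({ messageId })
        .sort({ createdAt: 1 })
        .toArray();
}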