Trace cause of Firestore reads - google-cloud-firestore

I have been seeing an excessive number of Firestore reads over the past few weeks. My system generally processed about 60k reads per day. About 3 weeks ago it jumped to roughly 10 million a day, and the past 2 days have hit over 40 million reads in a single day! My user base has not grown and my code has not changed, so there is no reason for this spike. I suspect an endpoint is being hit by someone outside the scope of my application who may be trying to penetrate it or retrieve records. I have reached out to Firestore support repeatedly for help with this, as it is becoming a huge loss every day this happens, but they are unable to assist me.
Is there a way to trace the origin of read requests or, more importantly, to see read counts per collection or document? This must be traceable somehow, since Firestore bills you per read, but I cannot seem to find it.

There is currently no source IP address tracking with Cloud Firestore. All reads fall under the same bucket, which is that "they happened".
If you're building a mobile app, now would be a good time to use security rules to limit which authenticated users can read and write which parts of your database, so that it's not just being accessed without limit from anywhere on the internet.
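As a starting point, a minimal security-rules sketch along these lines restricts reads and writes to the signed-in owner. The /users/{userId} layout is an assumption about how the data is organised and would need to be adapted to your actual structure:

    rules_version = '2';
    service cloud.firestore {
      match /databases/{database}/documents {
        // Hypothetical layout: each user's data lives under /users/{userId}
        match /users/{userId}/{document=**} {
          // Only the authenticated owner may read or write these documents
          allow read, write: if request.auth != null && request.auth.uid == userId;
        }
      }
    }

Anything not matched by an allow rule is denied by default, so unauthenticated requests from outside your app get rejected instead of being served (and billed) as reads.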

Related

How to improve CloudKit server latency when uploading data

I am having a hard time uploading data to my CloudKit container in a series of
'modify records' operations. I have an 'uploader' function in my app that can populate the CloudKit private database with a lot of user data. I batch the records into multiple CKModifyRecordsOperations, with a maximum of 300 records in each operation, before I upload them. When I do this with even a modest amount of data (less than 50MB), it can take dozens of minutes to do a simple upload. This includes robust retry logic that takes the CKErrorRetryAfterKey value from any timed-out operations and replays them after the delay (which happens frequently).
I checked the CloudKit dashboard, and in the container's telemetry section the 'server latency' seems very high (over 100,000 at the 95th percentile). It also shows an 'average request size' of about 150KB across the last few days as I've been testing this, which doesn't seem like a lot, but the server response time averages 10 seconds per operation! This seems super slow.
I've tried throttling the requests so only 20 modify operations are sent at a time, but it doesn't seem to help. I have 'query' indexes on the 'recordName' field for each recordType, and 'query, searchable, sortable' on some of the custom fields on the recordTypes (though not all). The CKModifyRecordsOperations' configurations have 'qualityOfService' set to 'userInitiated'. But none of this seems to help. I'm not sure what else I can try to improve the upload times (downloading records seems to happen as expected).
Is there anything else I can try to improve the time it takes to upload a few thousand records? Or is it out of my control?

Firestore ignore limit (on flutter)

I have a simple collection, and to test it, I created 10k documents in this collection.
After that, when I do a simple query with limit(5):
Firestore.instance.collection(myCollection).orderBy(myOrderBy).limit(5).getDocuments();
And I see this in my console:
W/CursorWindow(21291): Window is full: requested allocation 253420 bytes, free space 68329 bytes, window size 2097152 bytes
I/zygote64(21291): Background concurrent copying GC freed 535155(13MB) AllocSpace objects, 5(1240KB) LOS objects, 50% free, 17MB/35MB, paused 60us total 102.836ms
When I go to my Firebase dashboard, I see that I have 10k reads.
So I conclude that my query returns 5 results, but that it reads the entire database. Which can quickly decrease performance and increase the price.
I looked for a solution and found this:
Firestore.instance.settings(persistenceEnabled: false);
It seems to be working, but I have trouble understanding it.
Does Firestore load the entire collection by default so it can serve requests offline?
Would changing the Firestore settings when launching the application be enough, or am I likely to run into surprises?
And if I disable persistence, I assume that if the user makes an offline write request, it will no longer be persisted when they come back online. Is a compromise possible?
Thanks,
Firestore's offline storage behaves as a cache, persisting any documents it has recently seen. It does not pre-load documents you haven't told it to load with a query/read operation, so for the query you show that would be at most 5 documents each time you execute the query.
Did you add the 10K documents from the same client where you are running the query, by any chance? If so, the local cache of that client may/would contain all those documents, since the client added them. You'll want to uninstall/reinstall the client to wipe the cache in that case, to get a more realistic experience of what your users would get.
The fact that you see 10K reads in your usage tab is a separate issue, not explained by the code you shared. One thing to keep in mind is that documents loaded in the console are also charged as reads.
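If you want to double-check how many documents a limited query actually touches, independently of the mobile client's cache, a quick server-side check with the Python client works too. This is only a sketch, assuming default credentials and reusing the collection and field names from the Flutter snippet above as placeholders:

    from google.cloud import firestore

    db = firestore.Client()
    # Same shape as the Flutter query: order by a field, limit to 5
    docs = list(
        db.collection("myCollection").order_by("myOrderBy").limit(5).stream()
    )
    print(len(docs))  # at most 5 documents are returned, and only those are billed as reads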

Save Followers count in a field or query each time if needed [closed]

I want to create an app like Twitter. Now I have a question about this project's database architecture. I want to show each user's Followers/Following count in his/her profile, like Twitter, but I don't know whether I have to query the Followers/Followings table/collection every time, or whether these values can be two small separate fields in the user record. Querying every time would definitely take a lot of time and add database overhead. On the other hand, if I save the counts in two fields for each user, then whenever there is a change I have to do two actions: modify the Followers/Followings table and update these two fields in the user record. My database will be huge, with a very large amount of data.
Which approach is good and standard?
Well, if you want to know what is right, there is only one answer.
Each of the separate fields in the user record contains derived data (data that can be easily derived via a query). Therefore it constitutes a duplication. Therefore it fails Normalisation.
The consequence of failed Normalisation is, you have an Update Anomaly. You no longer have One Fact in One Place, you have one fact in two places. And you have to update them every time one fact changes, every time the Followers/Followed per User changes. Within a Transaction.
That isn't a "trade-off" against performance concerns, that is a crime. When the fact in two places gets "out of synch", your crimes will be exposed. You will have to re-visit the app and database and perform some hard labour to make amends. And you may have to do that several times. Until you remove the causative problem.
Performance
As for the load on the database, if your application is serious, and you expect to be in business next year, get a real SQL platform.
Population or load for this requirement is simply not an issue on a commercial platform. You always get what you pay for, so pay something of value, and get something of value.
Note that if you have millions of Users, that does not mean you have millions of Followers per User. Note that your files will be indexed, so you will not chase down 16 million Users to count 25 Followers, your index will allow you to identify 25 Followers in a maximum of 25 index rows, in very few pages. This kind of concern simply does not exist on a commercial platform, it is the concern of people with no platform.
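To make the "derived via a query" point concrete, here is a minimal sketch using SQLite from Python; the Follower table and its column names are hypothetical, and on a serious platform the same COUNT is answered from the index exactly as described above:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Follower (follower_id INTEGER, followed_id INTEGER)")
    # The index is what lets the count touch only the rows for one followed user
    conn.execute("CREATE INDEX ix_follower_followed ON Follower (followed_id)")
    conn.executemany(
        "INSERT INTO Follower VALUES (?, ?)",
        [(1, 42), (2, 42), (3, 42)],
    )

    # One Fact in One Place: the count is derived on demand, never stored twice
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM Follower WHERE followed_id = ?", (42,)
    ).fetchone()
    print(count)  # 3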
Well, it depends: who is it for?
If it's for your users, so they can see how many followers they have, I would do this Twitter API call only when the user logs in to your service.
If for some reason it must be done for all users, I think the best way would be to do this followers-count API call, for example, once an hour, every second hour, or just daily. This could be achieved by a script that runs in cron.
Do you really need the followers themselves or just the follower count? Or both?
If both, you can request a Twitter user's followers and limit the call to 100 (if your cron runs every minute to every fifteen minutes). Then loop those follower ids against your database and keep inserting them until there is a match. Twitter returns the newest follower ids first by default, so this is possible at the moment.
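A rough Python sketch of that loop (the function, the id values, and the known_ids set are purely illustrative; the fetched ids are assumed to arrive newest first, as the Twitter API returns them):

    def new_follower_ids(fetched_ids, known_ids):
        # fetched_ids: follower ids from the Twitter API, newest first
        # known_ids: follower ids already stored in your database
        new_ids = []
        for follower_id in fetched_ids:
            if follower_id in known_ids:
                break  # everything after this point is already in the database
            new_ids.append(follower_id)
        return new_ids

    print(new_follower_ids([105, 104, 103, 102], {103, 102, 101}))  # [105, 104]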
Just remember you can make only 15 requests per user token against the Twitter API when requesting followers. This limit can vary between different endpoints.
It's worth mentioning that I assumed you are fetching only follower ids; those you can get 5000 at a time. If you want to request full follower objects, the limit is only 200 per request.
Hope this helps :D

Large mongo update queue burst issue

I'm doing some user analytics tracking with mongo. I'm averaging about 200 updates a second to documents (around 400k of them), keyed on a user's email address. There are 3 shards split along email alphabetically. It works pretty well except for the daily user state change scripts, which burst the requests to about 6k per second.
This causes a tailspin effect where it overloads the mongo queue and never seems to catch up again. Scripts fail, bosses get angry, etc. They also won't allow the scripts to be throttled. Since they are update operations and not inserts, they cannot be submitted in bulk. The options I see are:
1:) Finding a way to allocate a large queue to mongo so it can wait for low points and get the data updated
2:) Writing a custom throttling solution
3:) Finding a more efficient indexing strategy (currently just indexing the email address)
Pretty much anything is on the table.
Any help is greatly appreciated
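For what it's worth, option 2 can be as simple as spacing the updates out. A very rough pymongo sketch, where the database/collection/field names and the 200-per-second budget are all placeholders:

    import time
    from pymongo import MongoClient

    client = MongoClient()
    users = client.analytics.users  # hypothetical database and collection names

    MAX_OPS_PER_SECOND = 200  # roughly the steady-state rate the cluster already handles

    def throttled_updates(updates):
        # updates: iterable of (email, state) pairs produced by the daily script
        for i, (email, state) in enumerate(updates, start=1):
            users.update_one({"email": email}, {"$set": {"state": state}})
            if i % MAX_OPS_PER_SECOND == 0:
                time.sleep(1)  # spread the burst out instead of flooding the queue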

Statistics (& money transfers) with mongoDB

1) My first question is regarding the best solution for storing statistics with mongoDB.
If I want to store large amounts of statistics (let's say visitors on a specific site, down to hourly granularity), a NoSQL DB like mongoDB seems to work very well. But how do I structure those collections to get the most out of mongoDB?
I'd increase the visitor count for that specific object id (for example SITE_MONTH_DAY_YEAR_SOMEOTHERFANCYPARAMETER) by one every time a user visits the page. But if the database gets big (>10 GB), doesn't that slow down (like it would on MySQL) because it has to search for the object_id and update it? Is the data always accurate when I update it (AFAIK mongoDB does not have any table locking)?
Wouldn't it be faster (and more accurate) to just insert one row for every visitor? On the other hand, reading the statistics would be much faster with my first solution, wouldn't it (especially in terms of "grouping" by site/date/[...])?
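For reference, the counter approach from question 1 usually maps to an atomic upsert with $inc. A small pymongo sketch, where the database, collection and key format are placeholders:

    from pymongo import MongoClient

    client = MongoClient()
    stats = client.analytics.statistics  # hypothetical database/collection names

    def count_visit(site, hour_bucket):
        # upsert + $inc modifies a single document atomically, so concurrent
        # visitors do not corrupt each other's increments
        stats.update_one(
            {"_id": f"{site}_{hour_bucket}"},
            {"$inc": {"visitors": 1}},
            upsert=True,
        )

    count_visit("example.com", "2012-05-01T13")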
2) For every visitor counted I'd like to make a money transfer between two users. It is crucial that those transfers are always accurate. How would you achieve that?
I was thinking about an hourly cron that picks up the number of visitors from mongoDB.statistics for the last hour and updates the users' balances. I'd prefer doing this directly/live while counting the visitor, but what happens if thousands of visitors are calling the script simultaneously? Is there any risk of getting wrong balances?
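That hourly cron would, in its simplest form, look roughly like the sketch below (the schema with user_id, hour and visitors fields is hypothetical, and as written it must only run once per finished hour for the balances to stay accurate):

    from pymongo import MongoClient

    client = MongoClient()
    db = client.analytics  # hypothetical database name

    def settle_hour(hour_bucket, price_per_visit):
        # credit every user once for the visitors counted in the finished hour
        for doc in db.statistics.find({"hour": hour_bucket}):
            db.users.update_one(
                {"_id": doc["user_id"]},
                {"$inc": {"balance": doc["visitors"] * price_per_visit}},
            )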