Can we check Firestore reads origin?

Is there a way to quantify how many Firestore reads come from clients and how many from Google Cloud Functions?
I'd like to reduce my project's read costs.

Firebase currently does not provide tools to track the origin of document reads; all reads fall into the same bucket: "a read happened". If you need to measure specific reads from your app, you will have to track that yourself. You can add a logger to your app that records whether a request came from the client or from a Cloud Function.
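For illustration, here is a minimal sketch of such a logger on the client, assuming the Firebase web v9 modular SDK; the wrapper function and log format are hypothetical, and a Cloud Function would log the same event with a different origin value:

```typescript
// Minimal sketch (assumption: Firebase web v9 modular SDK; names are placeholders).
import { initializeApp } from "firebase/app";
import { getFirestore, doc, getDoc, DocumentSnapshot } from "firebase/firestore";

const app = initializeApp({ /* your Firebase config */ });
const db = getFirestore(app);

// Hypothetical wrapper: tag every client-side read so the counts can be
// aggregated later (e.g. in Cloud Logging or your own analytics).
async function trackedGetDoc(path: string): Promise<DocumentSnapshot> {
  console.log(JSON.stringify({ event: "firestore_read", origin: "client", path }));
  return getDoc(doc(db, path));
}

// In a Cloud Function, log the same event with origin: "cloud_function" via the
// Admin SDK, then compare the two counts to see where your reads come from.
```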
This documentation may come in handy.

Firestore audit logging information
Google Cloud services write audit logs to help you answer the
questions, "Who did what, where, and when?" within your Google Cloud
resources.
Data Access audit logs
Includes "admin read" operations that read metadata or configuration
information. Also includes "data read" and "data write" operations
that read or write user-provided data.
To receive Data Access audit logs, you must explicitly enable them.
https://cloud.google.com/firestore/docs/audit-logging

Related

Google Cloud Spanner real time Change Data Capture to PubSub/Kafka through Cloud Data Fusion or Others

I would like to achieve a real-time change data capture (log-based preferred) pipeline from Google Cloud Spanner to PubSub/Kafka for my downstream real-time applications. Could you please let me know if there is a good, cost-effective way to achieve that? I would appreciate any advice and recommendations.
In addition, I noticed that Google's Cloud Data Fusion can do real-time replication from MySQL/PostgreSQL to Cloud Spanner, but I did not find a way to go from Cloud Spanner to PubSub/Kafka in real time.
Also, I found two other approaches, listed here for comments or suggestions:
Use Debezium, a log-based change data capture Kafka connector from the link https://cloud.google.com/architecture/capturing-change-logs-with-debezium#deploying_debezium_on_gke_on_google_cloud
Create a polling service (which may miss some data) to poll data from cloud spanner from the link: https://cloud.google.com/architecture/deploying-event-sourced-systems-with-cloud-spanner
If you have any suggestion or comment on this, I will be really grateful.
There's an open-source implementation of a polling service for Cloud Spanner that can also automatically push changes to PubSub: https://github.com/cloudspannerecosystem/spanner-change-watcher
It is, however, not log-based, and it has some inherent limitations:
It can miss updates if the same record is updated twice within the polling interval. In that case, only the last value will be reported.
It only supports soft deletes.
You could have a look at the samples to see if it is something that might suit your needs at least to some degree: https://github.com/cloudspannerecosystem/spanner-change-watcher/tree/master/samples
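As an illustration of the polling approach (this is not the spanner-change-watcher API itself, just a rough sketch under assumptions): suppose a table has a commit-timestamp column and changed rows should be forwarded to a PubSub topic. The table, column, and topic names below are placeholders.

```typescript
// Rough sketch of a commit-timestamp polling watcher (assumptions: a table
// "Singers" with commit-timestamp column "LastUpdated", and an existing
// Pub/Sub topic "spanner-changes"; all names are placeholders).
import { Spanner } from "@google-cloud/spanner";
import { PubSub } from "@google-cloud/pubsub";

const database = new Spanner().instance("my-instance").database("my-db");
const topic = new PubSub().topic("spanner-changes");

let lastSeen: string | Date = new Date(0);

async function pollOnce(): Promise<void> {
  const [rows] = await database.run({
    sql: `SELECT SingerId, FirstName, LastName, LastUpdated
          FROM Singers
          WHERE LastUpdated > @lastSeen
          ORDER BY LastUpdated`,
    params: { lastSeen },
    types: { lastSeen: "timestamp" },
    json: true,
  });
  for (const row of rows as any[]) {
    await topic.publishMessage({ json: row }); // forward the changed row
    lastSeen = row.LastUpdated;                // advance the watermark
  }
}

// Note the limitation mentioned above: two updates to the same row within one
// polling interval are reported only once, with the last value.
setInterval(() => pollOnce().catch(console.error), 1000);
```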
Cloud Spanner has a new feature called Change Streams that would allow building a downstream pipeline from Spanner to PubSub/Kafka.
At this time, there's not a pre-packaged Spanner to PubSub/Kafka connector.
Currently, the way to read change streams is to use the SpannerIO Apache Beam connector, which lets you build the pipeline with Dataflow, or to query the change stream API directly.
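For reference, a rough sketch of the direct-query route with the Node.js client, assuming a change stream named SingersStream has already been created; only the initial partition query is shown, and a real reader would also have to follow the child partition records it returns (which is what the Beam connector handles for you):

```typescript
// Rough sketch: querying a change stream through its read function
// (assumption: a change stream "SingersStream" exists; child partitions
// are not followed here, so this is not a complete reader).
import { Spanner } from "@google-cloud/spanner";

const database = new Spanner().instance("my-instance").database("my-db");

async function readChangeStream(): Promise<void> {
  const start = new Date();
  const end = new Date(start.getTime() + 60_000); // read one minute of changes
  const [rows] = await database.run({
    sql: `SELECT ChangeRecord FROM READ_SingersStream (
            start_timestamp => @start,
            end_timestamp => @end,
            partition_token => NULL,
            heartbeat_milliseconds => 10000
          )`,
    params: { start, end },
    json: true,
  });
  // ChangeRecord entries contain data change records, heartbeats, or child
  // partition records pointing at further partitions that must be read too.
  for (const row of rows as any[]) {
    console.log(JSON.stringify(row.ChangeRecord));
  }
}

readChangeStream().catch(console.error);
```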
Disclaimer: I'm a Developer Advocate that works with the Cloud Spanner team.

Per-collection read/write monitoring in Firestore [duplicate]

I am getting very high counts of Entity Writes in my firestore database.
Write permission on most paths is restricted; those writes are done from a back-end server using the Admin SDK. Only a very few paths have write access, and only for users who are authenticated, registered, and joined/approved in a specific group. So even though the opportunities for abuse appear slim, the source is still hard to identify specifically.
The only way I see is to execute a Cloud Function on every write and have the function log the paths somewhere for analysis, but that introduces further cost and complexity.
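For reference, a minimal sketch of such a write-logging trigger, assuming the 1st-gen firebase-functions Node.js SDK; the wildcard path is a placeholder, and one trigger would be deployed per path to profile:

```typescript
// Minimal sketch of a write-logging trigger (assumptions: 1st-gen
// firebase-functions SDK; "groups/{groupId}/posts/{postId}" is a placeholder).
import * as functions from "firebase-functions";

export const logWrites = functions.firestore
  .document("groups/{groupId}/posts/{postId}")
  .onWrite((change, context) => {
    functions.logger.info("firestore_write", {
      path: change.after.ref.path,            // which document was written
      type: !change.before.exists ? "create"
          : !change.after.exists ? "delete" : "update",
      eventTime: context.timestamp,
    });
    return null;
  });
```

The structured entries could then be aggregated per path with a logs-based metric in Cloud Logging.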
Is there any way/recommendation to monitor/profile where (i.e. the path) and who (UID or any identity) is performing the writes? There are tools for this for the Realtime Database, but I can't find anything for Firestore.
I am also wondering if there is any way to restrict IPs/users automatically in case of abuse (i.e. a high rate of reads/writes)?
What I'm currently doing is going to the Firestore console => Usage => View usage.
It's not the same as a profiler, but it's better than nothing.
I'm also keeping an eye on the video at the link below to see if someone provides an answer; people are asking for a profiler there too.
https://www.youtube.com/watch?v=9CObBsjk6Tc

flutter data storage: local storage vs cloud storage

A question about local and remote storage of user data: is there a best practice for the common situation where a user accesses data from an API and can favourite or otherwise personalise that data?
I have seen tutorials, e.g. a movie-browsing app where the user can make a list of favourite movies, in which this personalised data is stored locally (e.g. in sqflite), and other tutorials where it is stored remotely, e.g. in Firebase. And Firebase has an offline mode, so data can be synced later. In that case, is it common to set up local storage as well as cloud storage? Is there a common practice for this situation?
Thanks for any insights.
This is not specifically a Flutter question; it's more of a general app-development question. It's very common to have both local and cloud "storage", but I wouldn't think of it that way. If you're interacting with an API backend, I wouldn't consider it the cloud storage of your app. Instead, look at it as a different component within your application's overall architecture: the API/backend component. That way it's not a part of your app; it's something your app interacts with.
I assume you know the purpose of your API: it returns the data you want to see and keeps track of user profiles and other sensitive information.
When it comes to local storage, I'd say the most common scenarios are results caching and storing information that the API requires on every session, to make the user experience a bit better. See some examples of both below:
On Instagram, they store your "feed watermark", a string value linked to a specific set of results, so that when you open the app and request again, they return that set of results plus anything new. - Local storage
They also "store locally" (better referred to as caching) a small set of posts from your feed, a list of user profiles that have stories, and your DMs, for instant and offline access. This way, when the app loads it has something to show while fetching new information. - Caching
They also store your login token, which never expires. - Local storage
tl;dr: Yes. If you need data on every session to use your API, store it locally in a secure way and use it to interact with your "cloud storage".

What does eventual or strong mean in the context of Google Cloud Storage consistency?

What does eventual or strong mean in the context of Google Cloud Storage consistency?
From the Consistency section of the documentation:
Google Cloud Storage provides strong global consistency for all
read-after-write, read-after-update, and read-after-delete operations,
including both data and metadata. When you upload a file (PUT) to
Google Cloud Storage, and you receive a success response, the object
is immediately available for download (GET) and metadata (HEAD)
operations, from any location in Google's global network.
Strong consistency here means that a write is not acknowledged as successful until it has been fully replicated, so once you receive a success response the object is immediately visible to readers everywhere; there is never a window in which some readers still see stale data. This matches the statement in the docs that "When you upload an object, the object is not available until it is completely uploaded." It is also why the write latency of a globally consistent, replicated store may be slightly higher than that of a non-replicated or non-committed store: a success response is returned only when multiple writes complete, not just one.
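As a small illustration with the Node.js client (bucket and object names are placeholders), once the upload promise resolves, a follow-up download from any location returns the new contents:

```typescript
// Minimal sketch of read-after-write consistency (bucket/object names are
// placeholders): once save() resolves, the object is immediately readable.
import { Storage } from "@google-cloud/storage";

async function demo(): Promise<void> {
  const file = new Storage().bucket("my-bucket").file("example.txt");

  await file.save("hello");                 // upload (PUT); resolves on success
  const [contents] = await file.download(); // read-after-write: sees "hello"

  console.log(contents.toString());
}

demo().catch(console.error);
```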

Standard way to record reporting information?

I get logs for every call to my API. From these logs, I can retrieve interesting information that I use in a reporting dashboard.
My problem is that the number of these logs keeps growing, and I am now looking for a new solution to store them.
Should I store the raw logs, or only the information I extract from them?
Which database should I choose for storage (MySQL, HBase, MongoDB, Cassandra)?