Firestore - recommended way to write to a document and tell its size? [duplicate] - google-cloud-firestore

From Firestore docs, we get that the maximum size for a Firestore document is:
Maximum size for a document 1 MiB (1,048,576 bytes)
QUESTION
How can I know the current size of a single doc, to check whether I'm approaching
that 1 MiB limit?
Example:
var docRef = db.collection("cities").doc("SF");

docRef.get().then(function(doc) {
    if (doc.exists) {
        console.log("Document data:", doc.data());
        // IS THERE A PROPERTY THAT CAN DISPLAY THE DOCUMENT FILE SIZE?
    } else {
        // doc.data() will be undefined in this case
        console.log("No such document!");
    }
}).catch(function(error) {
    console.log("Error getting document:", error);
});

The calculations used to compute the size of a document are fully documented here. There is a lot of text there, so please navigate there to read it; it's not worthwhile to copy it all here.
If you're having to manually compute the size of a document as it grows, my opinion is that you're probably not modeling your data scalably. If you have lists of data that can grow unbounded, you probably shouldn't be using a list field; instead, put that data in documents in a new collection or subcollection. There are some exceptions to this rule, but generally speaking, you should not have to worry about computing the size of a document in your client code.

I've published an npm package that calculates the size of a Firestore document.
Other packages like sizeof or object-sizeof, which calculate the size of a JS object, will not give you a precise result, because some primitives have a different byte size in Firestore. For example, a boolean in JS is stored in 4 bytes, while in a Firestore document it's 1 byte; null is 0 bytes in JS, but 1 byte in Firestore.
In addition, Firestore has its own unique types with fixed byte sizes: geo point, date, and reference.
A reference is a large object. Packages like sizeof will traverse all the methods/properties of the reference instead of doing the right thing here, which is to sum the string value of the document name, plus its path, plus 16 bytes. Also, if the reference points to a parent doc, sizeof or object-sizeof will not detect the circular reference, which might spell even bigger trouble than an incorrect size.
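As a rough sketch of that last rule (my own illustration, not the package's code), the size of a reference can be computed from its path segments plus a fixed 16-byte overhead:

```javascript
// Sketch of the documented rule: each path segment costs its string
// length + 1 byte, plus 16 bytes of overhead for the reference itself.
function referenceSize(path) {
  return path
    .split("/")
    .reduce((sum, segment) => sum + segment.length + 1, 0) + 16;
}

console.log(referenceSize("users/jeff/tasks/my_task_id")); // 44
```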

For Android users who want to check the size of a document against the maximum quota of 1 MiB (1,048,576 bytes), there is a library I have made that can help you calculate that:
https://github.com/alexmamo/FirestoreDocument-Android/tree/master/firestore-document
In this way, you'll always be able to stay below the limit. The algorithm behind this library is the one explained in the official documentation regarding storage size.

I was looking in the Firebase reference expecting the metadata to have such an attribute, but it doesn't. You can check it here.
So my next approach would be to estimate the weight of the object as an approximation. The sizeOf library seems to have a reasonable API for it.
So it would be something like:
sizeof.sizeof(doc.data());
I wouldn't use the document snapshot, because it contains metadata, such as whether there are pending saves. On the other hand, overestimating could be better in some cases.
[UPDATE] Thanks to Doug Stevenson for the wonderful insight
So I was curious how much the difference would actually be, so with my clunky js I made a dirty comparison, you can see the demo here
Considering this object:
{
    "boolean": true,
    "number": 1,
    "text": "example"
}
And discounting the id, this is the result:
| Method  | Bytes |
|---------|-------|
| FireDoc | 37    |
| sizeOf  | 64    |
So the sizeOf library could be a good predictor if we want to overestimate (assuming the calculations are fine and behave more or less the same for more complex entities). But as explained in the comment, it is a rough estimation.
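To see where the 37 bytes come from, the documented storage-size rules can be applied by hand. This is a quick illustrative sketch of mine, handling only the field types appearing in the example object above:

```javascript
// Per the documented rules: a field name costs length + 1 bytes;
// booleans cost 1 byte, numbers 8 bytes, strings length + 1, null 1 byte.
function fieldSize(name, value) {
  const nameBytes = name.length + 1;
  if (typeof value === "boolean") return nameBytes + 1;
  if (typeof value === "number") return nameBytes + 8;
  if (typeof value === "string") return nameBytes + value.length + 1;
  if (value === null) return nameBytes + 1;
  throw new Error("type not handled in this sketch");
}

const doc = { boolean: true, number: 1, text: "example" };
const total = Object.entries(doc)
  .reduce((sum, [name, value]) => sum + fieldSize(name, value), 0);
console.log(total); // 37
```

That matches the FireDoc figure in the table: "boolean" is 8 + 1, "number" is 7 + 8, and "text" is 5 + 8.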

For Swift users:
If you want to estimate the document size, I use the following. It returns the estimated size of the document in bytes. It's not 100% accurate, but it gives a solid estimate: basically, it converts each key and value in the data map to a string and adds the string's character count + 1. You can see the following link for details on how Firebase determines doc size: https://firebase.google.com/docs/firestore/storage-size.
func getDocumentSize(data: [String: Any]) -> Int {
    var size = 0
    for (k, v) in data {
        size += k.count + 1
        if let map = v as? [String: Any] {
            size += getDocumentSize(data: map)
        } else if let array = v as? [String] {
            for a in array {
                size += a.count + 1
            }
        } else if let s = v as? String {
            size += s.count + 1
        }
    }
    return size
}

You can use this calculator (code snippet), which I wrote myself.
source : https://firebase.google.com/docs/firestore/storage-size
<!DOCTYPE html>
<html>
<head>
  <title>Calculate Firestore Size</title>
</head>
<body>
  <h1>Firestore Document Size Calculator</h1>
  <h2 id="response" style="color:red">This is a Heading</h2>
  <textarea id="id" style="width: 100%" placeholder="Firestore Doc Ref"></textarea>
  <textarea id="json" style="width: 100%; min-height: 200px" placeholder="Firestore Doc Value JSON STRING"></textarea>
  <textarea id="quantity" style="width: 100%;" placeholder="How many times to repeat this value?"></textarea>
  <script>
    document.getElementById("json").value = '{"type": "Personal","done": false , "priority": 1 , "description": "Learn Cloud Firestore"}';
    document.getElementById("id").value = 'users/jeff/tasks/my_task_id';
    calculate();

    // yuzdeBul: "find the percentage" (Turkish)
    function yuzdeBul(total, number) {
      if (number == 0) {
        return 0;
      }
      return Math.ceil(parseInt(number) / (parseInt(total) / 100));
    }

    function calculate() {
      var quantity = parseInt(document.getElementById("quantity").value || 1);
      var firestoreId = document.getElementById("id").value;
      // Document name: each path segment costs its length + 1, plus 16 bytes.
      var refTotal = firestoreId
        .split("/")
        .map((v) => v.length + 1)
        .reduce((a, b) => a + b, 0) + 16;
      var idTotal = 0;
      var parseJson = JSON.parse(document.getElementById("json").value);
      idTotal += calculateObj(parseJson);
      idTotal += 32; // additional per-document overhead
      idTotal *= quantity;
      idTotal += refTotal;
      document.getElementById("response").innerHTML =
        idTotal + "/" + 1048576 + " %" + yuzdeBul(1048576, idTotal);
    }

    function calculateObj(myObj) {
      var total = Object.keys(myObj).map((key) => {
        var keySize = key.toString().length + 1;
        var value = myObj[key];
        var findType = typeof value;
        if (value === null) {
          keySize += 1; // null is 1 byte (typeof null is "object", so check first)
        } else if (findType == "string") {
          keySize += value.length + 1;
        } else if (findType == "boolean") {
          keySize += 1;
        } else if (findType == "number") {
          keySize += 8;
        } else if (findType == "object") {
          keySize += calculateObj(value);
        }
        return keySize;
      });
      return total.reduce((a, b) => a + b, 0);
    }

    document.getElementById("json").addEventListener("change", calculate);
    document.getElementById("id").addEventListener("change", calculate);
    document.getElementById("quantity").addEventListener("change", calculate);
  </script>
</body>
</html>

So I was looking for a way to reduce unnecessary document reads by accumulating data in arrays, and got worried about the size.
Turns out I wasn't even close to the limit.
Here's what you can do:
Create a new collection and add a document with the worst-case scenario for live data, then export that collection using the Cloud console; the export will show you the document size.
Here is a screenshot of my export.
Assuming all the documents are equal in size, each is 0.0003 MB.
You can also see whether any documents exceed the limit (screenshot: a document exceeding the limit, shown in the console).
Note: you can only export when you have enabled billing.

Related

Finding Closest Users Without Looping Through Many Records

When users log in to my app, I want them to be able to see the number of users within a selected radius. I was planning to start storing coordinates in my database and looping through each record (~50,000), running userCoordinates.distance(from: databaseCoordinateValue). However, during testing, I've found that this process takes a long time and is not a scalable solution. Do you have any advice on how to quickly query database items within a defined radius?
I am using:
Swift 4
Firebase (Firestore beta)
Xcode 10
Example of database structure and how data gets stored
database.collection("users").document((Auth.auth().currentUser?.uid)!).setData([
    "available_tags": ["milk", "honey"]
]) { err in
    if let err = err {
        print("Error adding document: \(err)")
    }
}
Take a look at S2 geometry - http://s2geometry.io/. The basic concept is that you encode each location on Earth as a 64-bit number, with locations close to each other having close numbers. You can then look up locations within x distance by finding anything that is +/- a certain number from the location. The actual implementation is a bit more complicated, so you end up needing to create multiple 'cells', i.e. a min and max number for each range, and then do a lookup for each cell. (More info at http://s2geometry.io/devguide/examples/coverings.)
Here's an example of doing this in node.js / javascript. I use this in the backend and have the frontend just pass in the region/area.
const S2 = require("node-s2");

static async getUsersInRegion(region) {
    // create a region
    const s2RegionRect = new S2.S2LatLngRect(
        new S2.S2LatLng(region.NECorner.latitude, region.NECorner.longitude),
        new S2.S2LatLng(region.SWCorner.latitude, region.SWCorner.longitude),
    );
    // find the cells that will cover the requested region
    const coveringCells = S2.getCoverSync(s2RegionRect, { max_cells: 4 });
    // query all the users in each covering region/range in parallel
    const coveringCellQueryPromises = coveringCells.map(coveringCell => {
        const cellMaxID = coveringCell.id().rangeMax().id();
        const cellMinID = coveringCell.id().rangeMin().id();
        return firestore
            .collection("User")
            .where("geoHash", "<=", cellMaxID)
            .where("geoHash", ">=", cellMinID)
            .get();
    });
    // wait for all the queries to return
    const userQueriesResult = await Promise.all(coveringCellQueryPromises);
    // collect the users found in the region
    const users = [];
    // iterate through each cell and each user in it to find those in the range
    userQueriesResult.forEach(userInCoveringCellQueryResult => {
        userInCoveringCellQueryResult.forEach(userResult => {
            // create a cell id from the hash
            const user = userResult.data();
            const s2CellId = new S2.S2CellId(user.geoHash.toString());
            // validate that the user is in the view region,
            // since cells will cover areas outside of the input region
            if (s2RegionRect.contains(s2CellId.toLatLng())) {
                user.id = userResult.id;
                users.push(user);
            }
        });
    });
    return users;
}
S2 geometry has a lot of ways to find the covering cells (i.e. what area you want to look up values for), so it's definitely worth looking at the API and finding the right match for your use case.

How do I add polyline length to a Bing Maps polygon

I am trying to recreate a tax map within my system using Bing Maps. My problem is in listing the length, in feet, of the sides of the polygons I am creating. I have a good idea of how to get the length of polylines I am creating from the MSSQL 2012 geometry or geography items in my database. I cannot figure out how to present it to the user effectively though. I have two ideas for how I would like to do this.
Place the lengths directly on or adjacent to the polyline in question.
Create an emphasized point on the full polygon and list to the side of the map, the lengths of the sides of the polygon based on a clockwise order.
Either of the 2 options would work as an acceptable solution. I used this tutorial to create my current environment so I would be looking to integrate the solution into it in some way:
How to create a spatial web service that connects a database to Bing Maps using EF5
Note that my implementation only uses the countries part of the code so I do not need to deal with single points like cities that are in that tutorial.
The relevant piece of code that handles drawing on the map that I would need to edit can be found here:
Bing Maps v7 WellKnowTextModule
If you want to get the perimeter of a polygon in SQL Server 2012, you can grab its exterior ring. The exterior ring will be a LineString, i.e. "@g.STExteriorRing()". Then measure the length along that line, i.e. "@g.STExteriorRing().STLength()". However, countries are usually not just single polygons; they can be MultiPolygons or GeometryCollections, so to calculate these lengths we have to do a bit more work. Here is a helper method you can add to the service to calculate the perimeters of these shapes:
private double CalculateLength(SqlGeometry geom)
{
    double length = 0;
    if (string.Compare(geom.STGeometryType().Value, "polygon", true) == 0)
    {
        length = geom.STExteriorRing().STLength().Value;
    }
    else if (string.Compare(geom.STGeometryType().Value, "multipolygon", true) == 0)
    {
        int numPolygon = geom.STNumGeometries().Value;
        for (int i = 1; i <= numPolygon; i++)
        {
            length += geom.STGeometryN(i).STExteriorRing().STLength().Value;
        }
    }
    else if (string.Compare(geom.STGeometryType().Value, "geometrycollection", true) == 0)
    {
        int numGeom = geom.STNumGeometries().Value;
        for (int i = 1; i <= numGeom; i++)
        {
            length += CalculateLength(geom.STGeometryN(i));
        }
    }
    return length;
}
To get the length info from the server side to the client add a property to the Country or BaseEntity class like this:
[DataMember]
public double Perimeter { get; set; }
From here you can populate this value after the linq query is used to get the response results using a simple loop that calls the helper method from earlier:
for (int i = 0; i < r.Results.Count; i++)
{
    var geom = SqlGeometry.STGeomFromText(new System.Data.SqlTypes.SqlChars(r.Results[i].WKT), 4326);
    r.Results[i].Perimeter = CalculateLength(geom);
}
As for displaying the information on the map: an easy way to place the information on a polyline is to choose a coordinate along the line, perhaps the middle one. Just get the number of coordinates in the line, find the middle index, and use that coordinate for a pushpin. You can then create a custom pushpin using either a background image with text, or custom HTML:
http://www.bingmapsportal.com/ISDK/AjaxV7#Pushpins4
http://www.bingmapsportal.com/ISDK/AjaxV7#Pushpins15
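As a minimal sketch of that middle-coordinate idea (plain objects here; with a Bing Maps polyline you would pull the array from the shape, e.g. via its getLocations() method):

```javascript
// Pick the middle coordinate of a polyline's location list as the
// anchor position for a label pushpin.
function middleCoordinate(locations) {
  return locations[Math.floor(locations.length / 2)];
}

const line = [
  { latitude: 40.0, longitude: -75.0 },
  { latitude: 40.5, longitude: -75.2 },
  { latitude: 41.0, longitude: -75.4 },
];
console.log(middleCoordinate(line)); // { latitude: 40.5, longitude: -75.2 }
```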
I wanted to add an addendum to the answer I accepted, as I feel it changes things a bit.
While working on this I found that I was not actually able to get each line segment's length via Entity Framework. This is because the query required converting the geography I had back to a geometry, parsing it into its base line segments, and then converting those line segments back to geographies. The query, even in SQL, would take minutes, so it was not an option to run it dynamically in EF.
I ended up creating another table in my database containing the parsed line segments for each side of each polygon I had. Then I could use the centroids of the line segments as faux cities. I added this logic into the DisplayData JavaScript function from the tutorial mentioned in the question, after the for loop in the method:
if (!shape.getLength) {
    var chkPolygon = data.Results[0].WKT.substring(0, data.Results[0].WKT.indexOf('(', 0));
    chkPolygon = chkPolygon.replace(/\s/g, '');
    switch (chkPolygon.toLowerCase()) {
        case 'point':
        case 'polygon':
            var latlonCheck = map.getCenter();
            var setSides = window.location.origin + "/SpatialService.svc/FindNearBy?latitude=" +
                latlonCheck.latitude + "&longitude=" + latlonCheck.longitude +
                "&radius=" + data.Results[0].ID + "&layerName=" + "city" + "&callback=?";
            CallRESTService(setSides, DisplaySides);
            break;
        default:
            break;
    }
}
The data.Results[0].ID lookup finds all the line segments in the new table for that specific country. Then the DisplaySides function overlays the HTML pushpins as "cities" over the appropriate points for each side on the map:
function DisplaySides(getSides) {
    infobox.setOptions({ visible: false });
    if (getSides && getSides.Results != null) {
        for (var i = 0; i < getSides.Results.length; i++) {
            var sideLenFtShort = Math.round(getSides.Results[i].LengthFeet * 100) / 100;
            var htmlLenString = "<div style='font-size:14px;border:thin solid black;background-color:white;font-weight:bold;color:black;'>" + sideLenFtShort.toString() + "</div>";
            var testString = {
                pushpinOptions: { width: null, height: null, htmlContent: htmlLenString }
            };
            var sideCtr = WKTModule.Read(getSides.Results[i].WKT, testString);
            dataLayer.push(sideCtr);
        }
    }
    else if (getSides && getSides.Error != null) {
        alert("Error: " + getSides.Error);
    }
}

Support for basic datatypes in H5Attributes?

I am trying out the beta HDF5 toolkit of ILNumerics.
Currently I see that H5Attributes supports only ILNumerics arrays. Is there any plan to extend it to basic datatypes (such as string) as part of the final release?
Do the ILNumerics H5 wrappers provide a provision for extending functionality to a particular datatype?
ILNumerics internally uses the official HDF5 libraries from the HDF Group, of course. Attributes in HDF5 correspond to datasets, with the limitation of not being capable of partial I/O. Besides that, H5Attributes are plain arrays! Support for basic (scalar) element types is given by assuming the stored array to be scalar.
Strings are a completely different story: strings in general are variable-length datatypes. In terms of HDF5, strings are arrays of element type Char; the number of characters in the string determines the length of the array. In order to store a string into a dataset or attribute, you will have to store its individual characters as elements of the array. In ILNumerics, you can convert your string into ILArray<Char> or ILArray<byte> (for ASCII data) and store that into the dataset/attribute.
Please consult the following test case, which stores a string as the value of an attribute and reads the content back into a string.
Disclaimer: This is part of our internal test suite. You will not be able to compile the example directly, since it depends on the existence of several functions which may not be available. However, you will be able to understand how to store strings into datasets and attributes:
public void StringASCIAttribute() {
    string file = "deleteA0001.h5";
    string val = "This is a long string to be stored into an attribute.\r\n";
    // transfer string into ILArray<Char>
    ILArray<Char> A = ILMath.array<Char>(' ', 1, val.Length);
    for (int i = 0; i < val.Length; i++) {
        A.SetValue(val[i], 0, i);
    }
    // store the string as attribute of a group
    using (var f = new H5File(file)) {
        f.Add(new H5Group("grp1") {
            Attributes = {
                { "title", A }
            }
        });
    }
    // check by reading back
    using (var f = new H5File(file)) {
        // must exist in the file
        Assert.IsTrue(f.Get<H5Group>("grp1").Attributes.ContainsKey("title"));
        // check size
        var attr = f.Get<H5Group>("grp1").Attributes["title"];
        Assert.IsTrue(attr.Size == ILMath.size(1, val.Length));
        // read back
        ILArray<Char> titleChar = attr.Get<Char>();
        ILArray<byte> titleByte = attr.Get<byte>();
        // compare byte values (sum)
        int origsum = 0;
        foreach (var c in val) origsum += (Byte)c;
        Assert.IsTrue(ILMath.sumall(ILMath.toint32(titleByte)) == origsum);
        StringBuilder title = new StringBuilder(attr.Size[1]);
        for (int i = 0; i < titleChar.Length; i++) {
            title.Append(titleChar.GetValue(i));
        }
        Assert.IsTrue(title.ToString() == val);
    }
}
This stores arbitrary strings as 'Char-array' into HDF5 attributes and would work just the same for H5Dataset.
As an alternative solution, you may use the HDF5DotNet (http://hdf5.net/default.aspx) wrapper to write attributes as strings:
H5.open();
Uri destination = new Uri(@"C:\yourFileLocation\FileName.h5");
// Create an HDF5 file
H5FileId fileId = H5F.create(destination.LocalPath, H5F.CreateMode.ACC_TRUNC);
// Add a group to the file
H5GroupId groupId = H5G.create(fileId, "groupName");
string myString = "String attribute";
byte[] attrData = Encoding.ASCII.GetBytes(myString);
// Create an attribute of type STRING attached to the group
H5AttributeId attrId = H5A.create(groupId, "attributeName",
    H5T.create(H5T.CreateClass.STRING, attrData.Length),
    H5S.create(H5S.H5SClass.SCALAR));
// Write the string into the attribute
H5A.write(attrId, H5T.create(H5T.CreateClass.STRING, attrData.Length), new H5Array<byte>(attrData));
H5A.close(attrId);
H5G.close(groupId);
H5F.close(fileId);
H5.close();

Filter getElementsByTagName list by option values

I'm using getElementsByTagName to return all the select lists on a page - is it possible to then filter these based upon an option value, i.e. of the first or second item in the list?
The reason is that for reasons I won't go into here there are a block of select lists with number values (1,2,3,4,5 etc) and others which have text values (Blue and Black, Red and Black etc) and I only want the scripting I have to run on the ones with numerical values. I can't add a class to them which would more easily let me do this however I can be certain that the first option value in the list will be "1".
Therefore is there a way to filter the returned list of selects on the page by only those whose first option value is "1"?
I am pretty sure that there is a better solution, but for the moment you can try something like:
var allSelect = document.getElementsByTagName("select");
var result = filterBy(allSelect, 0 /* 0 == the first option */, "1" /* "1" == the value of the first option */);

function filterBy(allSelect, index, theValue) {
    var result = [];
    for (var i = 0; i < allSelect.length; i++) {
        if (allSelect[i].options[index].value == theValue) {
            result.push(allSelect[i]);
        }
    }
    return result;
}
I managed to get this working by wrapping a simple IF statement around the action to be performed (in this case, disabling options), as follows:
inputs = document.getElementsByTagName('select');
for (i = 0; i < inputs.length; i++) {
    if (inputs[i].options[1].text == 1) {
        // perform action required
    }
}
No doubt there is a slicker or more economical way to do this, but the main thing is that it works for me.

Peculiar Map/Reduce result from CouchDB

I have been using CouchDB for quite some time without any issues, up until now. I recently saw something in my map/reduce results which I had overlooked!
This is before performing a sum on the "avgs" variable. I'm basically trying to find the average of all values pertaining to a particular key. Nothing fancy. The result is as expected.
Note the result for timestamp 1308474660000 (4th row in the table).
Now I sum the "avgs" array, and here is something peculiar about the result: the sum for the key with timestamp 1308474660000 is null!! Why is CouchDB spitting out nulls for a simple sum? I tried with a custom addition function and it's the same problem.
Can someone explain to me why there is this issue with my map/reduce result?
CouchDB version: 1.0.1
UPDATE:
After doing a rereduce I get a reduce overflow error!
Error: reduce_overflow_error
Reduce output must shrink more rapidly: Current output: '["001,1,1,1,1,1,11,1,1,1,1,1,1,11,1,1,1,1,1,1,11,1,1,1,1,1,1,11,1,1,1,1,1,101,1,1,1,1,1,1,11,1,1,1,1'... (first 100 of 396 bytes)
This is my modified reduce function:
function (key, values, rereduce) {
    if (!rereduce) {
        var avgs = [];
        for (var i = values.length - 1; i >= 0; i--) {
            avgs.push(Number(values[i][0]) / Number(values[i][1]));
        }
        return avgs;
    } else {
        return sum(values);
    }
}
UPDATE 2:
Well, now it has gotten worse: it's selectively rereducing. Also, the ones it has rereduced show wrong results. The length of the value in the 4th row for timestamp 1308474660000 should be 2, not 3.
UPDATE 3:
I finally got it to work; I hadn't understood the specifics of rereduce properly. AFAIK, CouchDB itself decides how and when to rereduce. In this example, whenever the array was long enough to process, CouchDB would send it to rereduce. So I basically had to sum twice: once in reduce, and again in rereduce.
function (key, values, rereduce) {
    if (!rereduce) {
        var avgs = [];
        for (var i = values.length - 1; i >= 0; i--) {
            avgs.push(Number(values[i][0]) / Number(values[i][1]));
        }
        return sum(avgs);
    } else {
        // If my understanding of rereduce is correct, it only receives
        // the avgs that are large enough to not be processed by reduce.
        return sum(values);
    }
}
Your for loop in the reduce function is probably not doing what you think it is. For example, it might be throwing an exception that you did not expect.
You are expecting an array of 2-tuples:
// Expectation
values = [ [value1, total1]
         , [value2, total2]
         , [value3, total3]
         ];
During a re-reduce, the function will get old results from itself before.
// Re-reduce values
values = [ avg1
         , avg2
         , avg3
         ];
Therefore I would begin by examining how your code works if and when rereduce is true. Perhaps something simple will fix it (although often I have to log() things until I find the problem).
function (keys, values, rereduce) {
    if (rereduce)
        return sum(values);

    // ... then the same code as before.
}
I will elaborate on my count/sum comment, just in case you are curious.
This code is not tested, but hopefully you will get the idea. The end result is always a simple object {"count":C, "sum":S} and you know the average by computing S / C.
function (key, values, rereduce) {
    // Reduce function
    var count = 0;
    var sum = 0;
    var i;
    if (!rereduce) {
        // `values` stores actual map output
        for (i = 0; i < values.length; i++) {
            count += Number(values[i][1]);
            sum += Number(values[i][0]);
        }
        return {"count": count, "sum": sum};
    }
    else {
        // `values` stores count/sum objects returned previously.
        for (i = 0; i < values.length; i++) {
            count += values[i].count;
            sum += values[i].sum;
        }
        return {"count": count, "sum": sum};
    }
}
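To illustrate the idea outside CouchDB (my own sketch, not view code): combining partial count/sum objects, as the rereduce branch does, and deriving the final average at query time:

```javascript
// Combine partial {count, sum} results into one, then the average is
// simply sum / count. This mirrors the rereduce branch above.
function combine(parts) {
  return parts.reduce(
    (acc, p) => ({ count: acc.count + p.count, sum: acc.sum + p.sum }),
    { count: 0, sum: 0 }
  );
}

const total = combine([{ count: 2, sum: 10 }, { count: 3, sum: 20 }]);
console.log(total.sum / total.count); // 6
```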
I use the following code to compute the average. Hope it helps.
function (key, values) {
    return sum(values) / values.length;
}