Writes from Firestore Function GraphQL query call to Firestore DB sub-collection - google-cloud-firestore

I have a functional prototype of an app that does the following:
A user calls a page on an app that creates a basic record for an object in a Firestore DB collection
A Firebase Function gets triggered by the "create" event
The function calls a GraphQL endpoint with a (pretty dense) query
The return of the query contains arrays of data that I want to feed back into Firebase as sub-collections (and not nested arrays) for the same object
As of now, and for the purpose of my initial tests, I run "for" loops that scan the data returned by the query and write it to the DB.
I am looking at saving on response times and Firestore usage metrics by optimising the code and could do with some help.
Here is an example of what I get back from the GraphQL queries:
{
  "data": {
    "enrichObjectQuery": {
      "guid": "itemRef",
      "oneOfTheDataElements": [
        {
          "date": "2022-01-31",
          "value": 15
        },
        {
          "date": "2022-07-31",
          "value": 18
        },
        {
          "date": "2022-12-16",
          "value": 12
        }
      ]
    }
  }
}
And here is an example of the code I use to inject data into Firestore. I rushed into doing something functional but I find the overall execution time to be long. The architecture is of course far from being optimal but I am pretty sure my code is part of the problem.
This is just an example and I have several of these for different data elements. Hope I did not mess anything up by making a generic example.
// enrich an object
for (let i = 0; i < data.enrichObjectQuery.oneOfTheDataElements.length; i++) {
  const date = new Date(data.enrichObjectQuery.oneOfTheDataElements[i].date);
  const timestamp = admin.firestore.Timestamp.fromDate(date);
  db.collection("enrichObjectQuery")
      .doc(data.enrichObjectQuery.guid)
      .collection("oneOfTheDataElements")
      .doc(data.enrichObjectQuery.guid +
           data.enrichObjectQuery.oneOfTheDataElements[i].date)
      .set({
        date: timestamp,
        value: data.enrichObjectQuery.oneOfTheDataElements[i].value,
      }, {merge: true});
}
Basically, it is doing what it is expected to do at this point.
I am now looking at optimising/refactoring all this to make the code as fast and light as possible... and am looking for suggestions! My idea would be to avoid the loops and write an array directly as a sub-collection in Firestore, but I am not sure how to do this.
(As you have probably guessed by now, I am a bit of a beginner at this.)
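One note while researching suggestions: Firestore has no call that writes an array directly as a sub-collection (each document still has to be written individually), but the Admin SDK's WriteBatch can group those writes into a single commit (classically up to 500 operations per batch). A minimal sketch, assuming the same db, admin and data objects as in the code above, inside an async function handler:
// Minimal sketch: batch the per-element writes into a single commit.
// Assumes `db` is the Admin SDK Firestore instance and `data` is the
// parsed GraphQL response shown above.
const {guid, oneOfTheDataElements} = data.enrichObjectQuery;
const batch = db.batch();
for (const element of oneOfTheDataElements) {
  const ref = db.collection("enrichObjectQuery")
      .doc(guid)
      .collection("oneOfTheDataElements")
      .doc(guid + element.date);
  batch.set(ref, {
    date: admin.firestore.Timestamp.fromDate(new Date(element.date)),
    value: element.value,
  }, {merge: true});
}
await batch.commit(); // one network round trip instead of one per document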

Related

Mongoose mongodb modify data before returning with pagination

So I am fetching data with Mongoose and I would like to modify the data, like applying some date formats. Currently I have
const count = await UserModel.countDocuments();
const rows = await UserModel.find({ name: { $regex: search, $options: 'i' }, status: 10 })
  .sort([["updated_at", -1]])
  .skip(page * perPage)
  .limit(perPage)
  .exec();
res.json({ count, rows });
The above UserModel is a mongoose model
I would like to modify some of the objects, like applying date formats, before the data is returned, while still paginating as above.
Currently I have added the following, which works, but I have to loop through all rows, which will be a performance nightmare for large data.
res.json({ count, rows:rows.map(el=>({...el,created_at:'format date here'})) });
Is there a better option?
As much as I understood your question, if you need to apply some date formats before showing data on the frontend, you just need to pass the retrieved date through a date-formatting library before displaying it, like in JS:
const d = new Date("2015-03-25T12:00:00Z");
However, if you want to get the date in formatted form, then you must format it before storing. I hope that answers your question.
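To illustrate with the built-in formatting helpers (the locales below are just examples):
// Format only at display time; the stored value stays a raw Date/ISO string.
const d = new Date("2015-03-25T12:00:00Z");
console.log(d.toLocaleDateString("en-US")); // e.g. "3/25/2015"
console.log(new Intl.DateTimeFormat("en-GB", { dateStyle: "medium" }).format(d)); // e.g. "25 Mar 2015"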
I think the warning from @Fabian Strathaus in the comments is an important consideration. I would strongly recommend making sure that the approach you are pursuing sets you up for success overall, as opposed to introducing new pain points elsewhere in your project.
Assuming that you want to do this, an alternative approach is to ask the database to do this directly. More specifically, the $dateToString operator sounds like it could be of use here. This playground example demonstrates the basic behavior by adding a formatted date field which will be returned directly from the database. It takes the following document:
{
  _id: 1,
  created_at: ISODate("2022-01-15T08:15:39.736Z")
}
We then execute this sample aggregation:
db.collection.aggregate([
  {
    "$addFields": {
      created_at_formatted: {
        $dateToString: {
          format: "%m/%d/%Y",
          date: "$created_at"
        }
      }
    }
  }
])
The document that gets returned is:
{
  "_id": 1,
  "created_at": ISODate("2022-01-15T08:15:39.736Z"),
  "created_at_formatted": "01/15/2022"
}
You could make use of this in a variety of ways, such as by creating and querying a view which will automatically create and return this formatted field.
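For example, a view could bake the formatted field in so every read returns it (the collection and view names here are illustrative, not from the question):
// Create a view that always exposes the formatted date alongside the original.
// "users" / "users_formatted" are placeholder names for this sketch.
db.createView("users_formatted", "users", [
  {
    "$addFields": {
      created_at_formatted: {
        $dateToString: { format: "%m/%d/%Y", date: "$created_at" }
      }
    }
  }
])
// Reads against the view return created_at_formatted automatically:
db.users_formatted.find({ status: 10 })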
I also want to comment on this statement that you made:
Currently I have added the following, which works, but I have to loop through all rows, which will be a performance nightmare for large data.
It's good to hear that you're thinking about performance upfront. That said, your query includes a query predicate of name:{$regex: search, $options: 'i'}. Unanchored and/or case insensitive regex filters cannot use indexes efficiently. So if your status predicate is not selective, then you may need to take a look at alternative approaches for filtering on name to make sure that the query is performant.
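As a sketch of one such alternative, assuming a prefix match fits the use case and an index exists on name:
// An anchored, case-sensitive prefix regex can use an index on { name: 1 };
// the unanchored, case-insensitive version cannot. Escape user input before
// interpolating it into a regex.
const rows = await UserModel.find({ name: { $regex: '^' + search }, status: 10 })
  .sort([["updated_at", -1]])
  .skip(page * perPage)
  .limit(perPage)
  .exec();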

MapReduce function to return two outputs. MongoDB

I am currently doing some basic mapReduce in MongoDB.
I currently have data that looks like this:
db.football_team.insert({name: "Tane Shane", weight: 93, gender: "m"});
db.football_team.insert({name: "Lily Jones", weight: 45, gender: "f"});
...
I want to create a mapReduce function to group data by gender and show
Total number of each gender, Male & Female
Average weight of each gender
I can create a map/reduce function to carry out each task separately, I just can't get my head around how to show the output for both. I am guessing that since the grouping is based on gender, the map function should stay the same and I just need to alter something on the reduce section...
Work so far
var map1 = function() {
  var key = this.gender;
  emit(key, { count: 1 });
};

var reduce1 = function(key, values) {
  var sum = 0;
  values.forEach(function(value) { sum += value["count"]; });
  return { count: sum };
};

db.football_team.mapReduce(map1, reduce1, { out: "gender_stats" });
Output
db.gender_stats.find()
{"_id" : "f", "value" : {"count": 12} }
{"_id" : "m", "value" : {"count": 18} }
Thanks
The key rule of "map/reduce" in any implementation is basically that the same shape of data needs to be emitted by the mapper as is returned by the reducer. The reason is that "map/reduce" conceptually works by quite possibly calling the reducer multiple times, which means your reducer function can be called on output that was already emitted from a previous pass through the reducer, along with other data from the mapper.
MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.
That said, your best approach to "average" is therefore to total the data along with a count, and then simply divide the two. This actually adds another step to a "map/reduce" operation as a finalize function.
db.football_team.mapReduce(
  // mapper
  function() {
    emit(this.gender, { count: 1, weight: this.weight });
  },
  // reducer
  function(key, values) {
    var output = { count: 0, weight: 0 };
    values.forEach(value => {
      output.count += value.count;
      output.weight += value.weight;
    });
    return output;
  },
  // options and finalize
  {
    "out": "gender_stats", // or { "inline": 1 } if you don't need another collection
    "finalize": function(key, value) {
      value.avg_weight = value.weight / value.count; // take an average
      delete value.weight; // optionally remove the unwanted key
      return value;
    }
  }
)
All fine because both mapper and reducer are emitting data with the same shape and also expecting input in that shape within the reducer itself. The finalize method of course is just invoked after all "reducing" is finally done and just processes each result.
As noted though, the aggregate() method actually does this far more effectively, and in natively coded methods which do not incur the overhead (and potential security risks) of server-side JavaScript interpretation and execution:
db.football_team.aggregate([
  { "$group": {
    "_id": "$gender",
    "count": { "$sum": 1 },
    "avg_weight": { "$avg": "$weight" }
  }}
])
And that's basically it. Moreover, you can continue and do other things after a $group pipeline stage (or any stage for that matter) in ways that you cannot do with a MongoDB mapReduce implementation. Notably, something like applying a $sort to the results:
db.football_team.aggregate([
  { "$group": {
    "_id": "$gender",
    "count": { "$sum": 1 },
    "avg_weight": { "$avg": "$weight" }
  }},
  { "$sort": { "avg_weight": -1 } }
])
The only sorting allowed by mapReduce is that the key used with emit is always sorted in ascending order. You cannot sort the aggregated output in any other way without either performing queries against the output collection afterwards or working "in memory" with the results returned from the server.
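For completeness, that workaround looks like this: since the example above wrote its results to "gender_stats", you query that collection and sort on the nested value after the fact:
// mapReduce output documents have the shape { _id: ..., value: { count, avg_weight } },
// so any further sorting happens on the nested "value" fields.
db.gender_stats.find().sort({ "value.avg_weight": -1 })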
As a "side note" ( though an important one ), you probably should also consider in "learning" that the reality is the "server-side JavaScript" functionality of MongoDB is really a work-around more than being a feature. When MongoDB was first introduced, it applied a JavaScript engine for server execution mostly to make up for features which had not yet been implemented.
Thus to make up for the lack of the complete implementation of many query operators and aggregation functions which would come later, adding a JavaScript engine was a "quick fix" to allow certain things to be done with minimal implementation.
The result over the years is that those JavaScript engine features are gradually being removed. The group() function of the API has been removed. The eval() function of the API is deprecated and scheduled for removal at the next major version. The writing is basically "on the wall" for the limited future of these JavaScript-on-the-server features, as the clear pattern is that where the native features provide support for something, the need to continue supporting the JavaScript engine basically goes away.
The core wisdom here is that focusing on learning these JavaScript-on-the-server features is probably not worth the time invested, unless you have a pressing use case that currently cannot be solved by any other means.

Find DocumentId through Discovery GUI tool

I want to train my Discovery collection where I have already uploaded over 200 documents. I uploaded these documents through the GUI. Looking through the Discovery documentation, I know that I will have to make API calls to train my collection, since the training API has not been exposed through the GUI yet. As part of the training API calls I need to include a document that looks like this:
{
  "natural_language_query": "{natural_language_query}",
  "filter": "{filter_definition}",
  "examples": [
    {
      "document_id": "{document_id_1}",
      "cross_reference": "{cross_reference_1}",
      "relevance": 0
    },
    {
      "document_id": "{document_id_2}",
      "cross_reference": "{cross_reference_2}",
      "relevance": 0
    }
  ]
}
My question is how should I get the documentIds for the documents that I have already uploaded? Is there a way to find this through the GUI? Or perhaps an API call that will return something like:
{
  "document_name": "MyDocument1",
  "documentId": "the_document_id_for_MyDocument1"
},
...
{
  "document_name": "MyDocumentN",
  "documentId": "the_document_id_for_MyDocumentN"
}
Or would the only way to get the documentIds be to create a new collection, upload all of the documents through API calls directly, and track the documentIds as I get them back?
Using the GUI, perform the following steps:
Input term(_id) in the "Group query results (Aggregation)" textbox.
Under "Fields to return", select "Specify" and input extracted_metadata.
Note that the query and filter inputs should remain empty.
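The same lookup can also be done without the GUI via the Discovery query API. The sketch below is a hedged illustration only: the service URL, version date, and authentication style are assumptions you would replace with the values for your own instance:
// Hedged sketch of the equivalent v1 query call (Node 18+ for global fetch);
// URL, version date and credentials are placeholders, not values from the question.
const params = new URLSearchParams({
  version: "2019-04-30",       // assumed version date
  aggregation: "term(_id)",
  return: "extracted_metadata",
});
const url = "https://gateway.watsonplatform.net/discovery/api/v1/environments/" +
    environmentId + "/collections/" + collectionId + "/query?" + params;
const response = await fetch(url, {
  headers: {
    // IAM API key passed as basic auth with user "apikey" (assumed setup).
    Authorization: "Basic " + Buffer.from("apikey:" + apiKey).toString("base64"),
  },
});
const body = await response.json();
// Each entry in body.results carries the document id and its extracted_metadata.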

How can I call beforeSave methods recursively while saving associative data in CakePHP 3?

I am trying to write a REST API using CakePHP 3.
I have two tables, Documents and DocumentImages. When I send a POST request with the body:
{
  "description": "Short Desc.",
  "company_id": 2,
  "department_id": 3,
  "status": 0,
  "document_images": [
    {
      "base64": "xyz"
    },
    {
      "base64": "abc"
    }
  ]
}
It saves to both Documents and DocumentImages and sets the document_id of the images as it should be.
Now, I need to do something after saving the document and before saving the images; however, the beforeSave function for the images is never called, since both entities get saved together in DocumentsController.
What can I do to catch the event after the document is saved but before the images are saved?
By the way, if anyone suggests a solution using CRUD I would very much appreciate it.

MongoDB: Find document given field values in an object with an unknown key

I'm making a database of theses/arguments. They are related to other arguments, which I've placed in an object under a dynamic, completely random key.
{
  _id: "aeokejXMwGKvWzF5L",
  text: "test",
  relations: {
    cF6iKAkDJg5eQGsgb: {
      type: "interpretation",
      originId: "uFEjssN2RgcrgiTjh",
      ratings: [...]
    }
  }
}
Can I find this document if I only know what the value of type is? That is, I want to do something like this:
db.theses.find({ relations['anything']: { type: "interpretation" } })
This could've been done easily with the positional operator, if relations had been an array. But then I cannot make changes to the objects in ratings, as mongo doesn't support those updates. I'm asking here to see if I can keep from having to change the database structure.
Though you seem to have arrived at this structure due to a problem with updates when using nested arrays, you have really only caused another problem by doing something else which is not really supported: there is no "wildcard" concept for searching unspecified keys using the standard (index-friendly) query operators.
The only way you can really search for such data is by using JavaScript code on the server to traverse the keys using $where. This is clearly not a really good idea as it requires brute force evaluation rather than using useful things like an index, but it can be approached as follows:
db.theses.find(function() {
  var relations = this.relations;
  return Object.keys(relations).some(function(rel) {
    return relations[rel].type == "interpretation";
  });
})
While this will return those objects from the collection that contain the required nested value, it must inspect every object in the collection in order to do the evaluation. This is why such evaluation should really only be used when paired with a condition that can directly use an index to narrow down the candidate documents first.
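If you must keep the current structure for now, that pairing looks like the sketch below, using the "text" field purely as an example of a selective, indexable condition:
// The indexed "text" filter narrows the candidates first; the $where function
// then does the brute-force key traversal only on those documents.
db.theses.find({
  text: "test", // assumes an index on { text: 1 } and a selective value
  $where: function() {
    var relations = this.relations;
    return Object.keys(relations).some(function(rel) {
      return relations[rel].type == "interpretation";
    });
  }
})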
Still, the better solution is to consider remodelling the data to take advantage of indexes in search. Where it is necessary to update the "ratings" information, basically "flatten" the structure and treat each "rating" element as the only array data instead:
{
  "_id": "aeokejXMwGKvWzF5L",
  "text": "test",
  "relationsRatings": [
    {
      "relationId": "cF6iKAkDJg5eQGsgb",
      "type": "interpretation",
      "originId": "uFEjssN2RgcrgiTjh",
      "ratingId": 1,
      "ratingScore": 5
    },
    {
      "relationId": "cF6iKAkDJg5eQGsgb",
      "type": "interpretation",
      "originId": "uFEjssN2RgcrgiTjh",
      "ratingId": 2,
      "ratingScore": 6
    }
  ]
}
Now searching is of course quite simple:
db.theses.find({ "relationsRatings.type": "interpretation" })
And of course the positional $ operator can now be used with the flatter structure:
db.theses.update(
  { "relationsRatings.ratingId": 1 },
  { "$set": { "relationsRatings.$.ratingScore": 7 } }
)
Of course this means duplication of the "related" data for each "ratings" value, but this is generally the cost of being able to update by matched position, since only a single level of array nesting is supported.
So you can force the logic to match the way you have it structured, but it is not a great idea to do so and will lead to performance problems. If, however, your main need here is to update the "ratings" information rather than just append to the inner list, then a flatter structure will be of greater benefit and of course be a lot faster to search.