How to batch requests to the same URL without causing memory leaks - axios

I have a system that processes images. Essentially, I provide an ID, it fetches the source image, and then it performs transformations to resize and reformat it.
This system gets quite a bit of usage, and one thing I've noticed is that I tend to get many simultaneous requests for the same ID, arriving as separate requests to the web server.
What I'd like to do is "batch" these requests. For example, if there's 5 simultaneous requests for the image "user-upload.png", I'd like there to be only one HTTP request to fetch the source image.
I'm using NestJS with default scopes for my service, so the service is shared between requests. Requests to fetch the image are done with the HttpModule, which is using axios internally.
I only care about simultaneous requests. Once a request finishes, the result is cached, and that prevents new requests from hitting the HTTP URL.
I've thought about doing something like this (Pseudocode):
@Injectable()
class ImageFetcher {
  // Store in flight requests as a map between id:promise
  inFlightRequests: Record<string, Promise<any>> = {}

  constructor(private readonly httpService: HttpService) {}

  fetchImage(id: string) {
    if (this.inFlightRequests[id]) {
      return this.inFlightRequests[id]
    }
    this.inFlightRequests[id] = new Promise(async (resolve, reject) => {
      const { data } = await this.httpService.get('/images/' + id)
      // error handling omitted here
      resolve(data)
      delete this.inFlightRequests[id]
    })
    return this.inFlightRequests[id]
  }
}
The most obvious issue I see is the potential for a memory leak. This is solvable with more custom code, but I thought I'd see if anyone has any suggestions for doing this without writing more code.
In particular, I've also thought about using an axios interceptor, but I'm not entirely sure how to handle that properly. Any pointers here would be really appreciated.
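One way to keep the dedup behaviour without much extra bookkeeping is to always clear the map entry in a finally block, so the promise is removed whether the fetch succeeds or fails. A minimal sketch, assuming @nestjs/axios (whose HttpService returns an observable, converted here with rxjs firstValueFrom) and a hypothetical /images/:id route that returns binary data:

import { Injectable } from '@nestjs/common';
import { HttpService } from '@nestjs/axios';
import { firstValueFrom } from 'rxjs';

@Injectable()
export class ImageFetcher {
  // id -> promise of the in-flight fetch
  private readonly inFlightRequests = new Map<string, Promise<Buffer>>();

  constructor(private readonly httpService: HttpService) {}

  fetchImage(id: string): Promise<Buffer> {
    const existing = this.inFlightRequests.get(id);
    if (existing) {
      return existing;
    }

    const request = firstValueFrom(
      this.httpService.get(`/images/${id}`, { responseType: 'arraybuffer' }),
    )
      .then((response) => Buffer.from(response.data))
      // Clean up on success *and* failure so the map cannot grow unbounded.
      .finally(() => this.inFlightRequests.delete(id));

    this.inFlightRequests.set(id, request);
    return request;
  }
}

Because the entry is deleted in the finally handler, a failed fetch does not stay stuck in the map; the only memory held is one promise per distinct in-flight id, and all simultaneous callers for the same id share the same resolution or rejection.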

Related

How to do actions when MongoDB Realm Web SDK change stream closes or times out?

I want to delete all of a user's inserts in a collection when they stop watching a change stream from a React client. I'm using the Realm Web SDK for this.
Here's a summary of my code with what I want to do at the end of it:
import * as Realm from "realm-web";

const realmApp: Realm.App = new Realm.App({ id: realmAppId });
const credentials = Realm.Credentials.anonymous();
const user: Realm.User = await realmApp.logIn(credentials);
const mongodb = realmApp?.currentUser?.mongoClient("mongodb-atlas");
const users = mongodb?.db("users").collection("users");
const changeStream = users.watch();

for await (const change of changeStream) {
  switch (change.operationType) {
    case "insert": {
      ...
      break;
    }
    case ...
  }
}

// This pseudo-code shows what I want to do
changeStream.on("close", () => /* delete all user's inserts */)
changeStream.on("timeout", () => /* delete all user's inserts */)
changeStream.on("user closes app thus also closing stream", () => ... )
Realm Web SDK patterns seem rather different from the NodeJS ones and do not seem to include a method for closing a stream or for running a callback when it closes. In any case, they don't fit my use case.
These MongoDB Realm Web docs lead to more docs about Realm. Unless I'm missing it, neither set explains how to monitor a change stream watcher instantiated from the Realm Web SDK for closing or timing out, or how to do something when that happens.
I thought another way to do this would be with Realm Triggers, but that doesn't seem possible based on their docs.
Can this even be done from a front end client? Is there a way to do this on MongoDB itself in a "serverless" way?
If you want to delete the inserts specifically when a (client-side) listener of a change stream stops listening, you have to implement some logic on the client side. There is currently no way to get notified of such an event within MongoDB Realm.
Since a watcher could be closed because the app / browser is closed, I would recommend against running the deletion logic on your client. Instead, notify a server (or call a MongoDB Realm function / HTTP endpoint) to make the deletions.
You can use the Beacon API to reliably send a request to trigger the delete, even when the window unloads.
Client side
const inserts = [];
for await (const change of changeStream) {
  switch (change.operationType) {
    case 'insert': inserts.push(change);
  }
}

// This point is only reached if the generator returns / stream closes
navigator.sendBeacon('url/to/endpoint', JSON.stringify(inserts));

// Might also add a handler to catch users closing the app.
window.addEventListener('unload', () => {
  navigator.sendBeacon('url/to/endpoint', JSON.stringify(inserts));
});
Note that the unload event is not reliable (see MDN). But there are some alternatives which may be good enough for your use case.
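For example, a sketch of two of those alternatives (MDN recommends visibilitychange / pagehide over unload; 'url/to/endpoint' is the same placeholder as above, and the endpoint should tolerate being called more than once):

// Fires when the tab is hidden (switched away, minimised, or being closed).
document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') {
    navigator.sendBeacon('url/to/endpoint', JSON.stringify(inserts));
  }
});

// Fires when the page is unloaded or put into the back/forward cache.
window.addEventListener('pagehide', () => {
  navigator.sendBeacon('url/to/endpoint', JSON.stringify(inserts));
});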
Inside a Realm function you could delete the documents.
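A minimal sketch of what such a function might look like (all names here are hypothetical; it assumes the beacon body, i.e. the JSON array of insert change events from the snippet above, is handed to the function as a string, for example via an HTTPS endpoint):

// Hypothetical Atlas/Realm function, e.g. "deleteUserInserts"
exports = async function (rawBody) {
  const inserts = JSON.parse(rawBody);

  // Each insert change event carries the _id of the inserted document.
  const ids = inserts.map((change) => change.documentKey._id);

  const collection = context.services
    .get("mongodb-atlas")
    .db("users")
    .collection("users");

  return collection.deleteMany({ _id: { $in: ids } });
};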
That being said, maybe there is a better way to do what you want to achieve. Is it really the timeout of the change stream listener that has to trigger the delete, or some other user event?

Is there a way to save a ParseObject without making an HTTP request to the REST API?

I didn't find very much about this topic, so I wonder if it is an easy task to achieve or if it's actually not possible. My problem is that I have a lot of HTTP requests on my server even if a Cloud Function is called only once, so I suppose that all the object updates / saves / queries are made via the REST API. I have so many HTTP requests that several hundred of them time out, I suppose because of the huge traffic that is generated.
Is there a way to save a ParseObject by executing the query directly against MongoDB? If that's not possible at the moment, can you give me some hints on whether there are already helper functions to convert a ParseQuery and a ParseObject to their MongoDB equivalents, so that I can use the MongoDB driver directly?
It's really important for my application to reduce HTTP request traffic at the moment.
Any idea? Thanks!
EDIT:
Here is an example to reproduce the concept:
Make a cloud function:
Parse.Cloud.define('hello', async (req, res) => {
  let testClassObject = new Parse.Object('TestClass');
  await testClassObject.save(null, {useMasterKey: true});
  let query = new Parse.Query('TestClass');
  let testClassRecords = await query.find({useMasterKey: true});
  return testClassRecords;
});
Make a POST request:
POST http://localhost:1337/parse/functions/hello
Capture HTTP traffic on port 1337 using Wireshark:
You can see that for 1 POST request, 2 more are made because of the save / query code. My goal would be to avoid these two HTTP calls and instead make a DB call directly, so that less traffic goes through the whole webserver stack.
Link to the Github question: https://github.com/parse-community/parse-server/issues/6549
The Parse Server directAccess option should do the magic for you. Please make sure you are initializing Parse Server like this:
const api = new ParseServer({
  ...
  directAccess: true
});
...
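For context, a fuller initialization might look like the sketch below. Everything other than directAccess is a placeholder for whatever your setup already uses, and how the instance is mounted onto Express varies between parse-server versions:

const { ParseServer } = require('parse-server');

const api = new ParseServer({
  databaseURI: 'mongodb://localhost:27017/dev', // your MongoDB connection string
  appId: 'myAppId',
  masterKey: 'myMasterKey',
  serverURL: 'http://localhost:1337/parse',
  cloud: './cloud/main.js',
  // Route Parse.Object / Parse.Query calls made inside Cloud Code directly to
  // the data layer instead of looping back through the REST API over HTTP.
  directAccess: true,
});

With directAccess enabled, the save() and find() inside the hello function above should no longer show up as extra HTTP requests on port 1337.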

Extremely high loading times - Requests not running async. Mongoose

Overview
I've built an application with Vue, Express and MongoDB (mongoose ORM).
On loading the landing page, a series of GET requests are made for various bits of data. The loading times are extremely high; I've recorded times as high as 22s for a particular route. It has led me to believe that my requests are running sequentially, despite specifying in my logic that everything should run async.
I've tried reducing the size of the objects being returned from the requests, as well as using the .lean() method. These attempts shaved off a couple of seconds, but the overall issue is not remotely sorted. Times are still stupidly high. To give an example:
From This:
// Method to find all users
var users = await User.find({});
To:
// Method to find all users
var users = await User.find({}, "username, uid").lean();
On the page in question, there are about 5 main components. Each component is making a get request. One of these is a Chat Column and the code for it is as follows:
ChatCol.vue
beforeMount () {
  this.$store.dispatch('retrieve_chat')
}
Store.js (am using Vuex store)
retrieve_chat (context) {
  return new Promise((resolve, reject) => {
    axios({
      url: api.dev + 'api/v1/chat',
      method: 'GET',
    })
    .then(res => {
      context.commit('set_chat', res.data)
      resolve(res);
    }).catch(err => {
      // alert(err)
      reject(err);
    })
  })
},
Requests in this format are being made on all the components. About 5 of them in the page in question.
Backend / Server Code
To give some context into the requests being made.
The client will hit the route 'http://localhost:3000/api/v1/chat'
and the code that makes the request on the server is the following:
var Chat = require("../models/ChatMessage");

module.exports = {
  // LIMIT CHAT TO 100 MESSAGES
  async get_chat(req, res) {
    Chat.find({}, function(err, messages) {
      if (err) {
        return res.status(500).send({
          message: "Internal Server Error",
          type: "MONGO_CHAT_DOCUMENT_QUERY",
          err: err,
        })
      }
      if (!messages) {
        return res.status(400).send({
          message: "Resource not found",
          type: "MONGO_CHAT_DOCUMENT_QUERY",
          details: "!messages - no messages found",
        })
      }
      messages.reverse();
      return res.status(200).json({
        messages,
      });
    }).sort({"_id": -1}).limit(30);
  },
}
If I look at the network tab in Chrome dev tools, this is how the requests appear. Apologies for the long-winded post; I literally have no idea what is causing this.
Important Note:
It was mentioned to me that MongoDB has a feature where it locks while mutating data, and I thought that might be the cause, but there are no mutations taking place. It's just 3-4 GET requests happening in parallel. They are admittedly pretty big requests, but they shouldn't be taking as long as they are.
Screenshot of the network tab:
(ignore the failed req, and some of the duplicate named requests)
Stack Overflow senpais, please help. It's a very big application and I don't know exactly what the issue is, so if I've missed out any details, apologies; I'll clarify anything that needs clarity.
A large amount of base64-encoded data from a previously abandoned and poorly implemented image upload feature was being stored in each chat message, as well as in other places, causing large amounts of data to be loaded in and ultimately leading to huge loading times.
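If purging the stale data outright isn't immediately possible, a projection that excludes the heavy field keeps it out of the query results. A sketch of the chat handler above, assuming the leftover base64 lives in a field called image (the field name is hypothetical):

// Exclude the abandoned base64 field so it is never sent over the wire,
// and return plain objects instead of full Mongoose documents.
const messages = await Chat.find({}, "-image")
  .sort({ _id: -1 })
  .limit(30)
  .lean();

messages.reverse();
return res.status(200).json({ messages });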
Thank you Neil Lunn.

How RestBase wiki handles caching

Following the installation of RestBase using the standard config, I have a working version of the summary API.
The problem is that the caching mechanism seems strange to me.
The piece of code below decides whether to look at a table cache for a fast response. But I cannot make the server-side cache depend on some time constraint (for example, a max-age from when the cache entry was written). It means that the decision to use the cache or not depends entirely on clients.
Can someone explain the workflow of RestBase caching mechanism?
// Inside key.value.js
getRevision(hyper, req) {
    // This one gets the header from the client request and decides whether to
    // use the cache depending on the value. Does it mean server caching is non-existent?
    if (mwUtil.isNoCacheRequest(req)) {
        throw new HTTPError({ status: 404 });
    }

    // If the cache should be used, the code below runs
    const rp = req.params;
    const storeReq = {
        uri: new URI([rp.domain, 'sys', 'table', rp.bucket, '']),
        body: {
            table: rp.bucket,
            attributes: {
                key: rp.key
            },
            limit: 1
        }
    };
    return hyper.get(storeReq).then(returnRevision(req));
}
Cache invalidation is done by the change propagation service, which is triggered on page edits and similar events. Cache-Control headers are probably set in the Varnish VCL logic. See here for a full Wikimedia infrastructure diagram - it is outdated but gives you the general idea of how things are wired together.

cache2k, read through and blocking

I have used cache2k with read-through in a web application to load blog posts on demand. However, I am concerned about blocking for the read-through feature. For example, if multiple threads (requests) ask the cache for the same key, is it possible for the read-through method to be called multiple times to load the same key/value into the cache?
I get the impression from the documentation that the read-through feature does block concurrent requests for the same key until the load has completed, but I may have misread the documentation. I just want to check that this is the behaviour.
The method which initializes the cache looks like this:
private void initializeURItoPostCache()
{
    final CacheLoader<String, PostImpl> postFileLoader = new CacheLoader<String, PostImpl>(){
        @Override public PostImpl load(String uri)
        {
            // Fetch the data and create the post object
            final PostImpl post = new PostImpl();
            //.. code omitted
            return post;
        }
    };

    // Initialize the cache with a read-through loader
    this.cacheUriToPost = new Cache2kBuilder<String, PostImpl>(){}
        .name("cacheBlogPosts")
        .eternal(true)
        .loader(postFileLoader)
        .build();
}
The following method is used to request a post from the cache:
public Post getPostByURI(final String uri)
{
    // Check with the index service to ensure the URI is known (valid to the application)
    if(this.indexService.isValidPostURI(uri))
    {
        // We have a post associated with the given URI, so
        // request it from the cache
        return this.cacheUriToPost.get(uri);
    }
    return EMPTY_POST;
}
Many thanks in advance, and a happy and prosperous New Year to all.
When multiple requests to the same key provoke a cache loader call, cache2k will only invoke the loader once. Other threads wait until the load is finished. This behavior is called blocking read-through. To cite from the JavaDoc:
Blocking: If the loader is invoked by Cache.get(K) or other methods that allow transparent access concurrent requests on the same key will block until the loading is completed. For expired values blocking can be avoided by enabling Cache2kBuilder.refreshAhead(boolean). There is no guarantee that the loader is invoked only for one key at a time. For example, after Cache.clear() is called load operations for one key may overlap.
This behavior is very important for caches, since it protects against a cache stampede. An example: a high-traffic website receives 1000 requests per second. One resource takes quite long to generate, about 100 milliseconds. If the cache did not block out the concurrent requests on a cache miss, there would be at least 100 requests hitting the loader for the same key. "At least" is an understatement, since your machine will probably not handle 100 requests at the same speed as one.
Keep in mind that there is no hard guarantee by the cache. The loader must still be able to perform correctly when called for the same key at the same time. For example, blocking read-through and Cache.clear() lead to competing requirements: Cache.clear() should be fast, which means we don't want to wait for ongoing load operations to finish.