I'm running into an issue with my socket.io implementation and don't know how to solve it. I'm using pg_notify with LISTEN so when a certain value is modified in the db, it emits 'is_logged_in' to a certain client.
That in itself is working fine. My issue is that when I refresh the page, socket.io disconnects the current socket_id and creates a new socket_id as usual, but in doing so it creates a second pgsql client instance and duplicates the work - the 'is_logged_in' event fires twice.
If I refresh the page again and then manually fire the pg "logged_in" trigger, it now emits 3 times, and so on. I have a leak.
const io = require('socket.io')();
const pg = require('pg');
io.on('connection', (socket) => {
  const pgsql = new pg.Client({
    host, port, user, password, database // connection settings elided
  });
  pgsql.connect();
  pgsql.query('LISTEN logged_in');
  pgsql.on('notification', (data) => {
    const json = JSON.parse(data.payload); // the NOTIFY payload carries the target socket_id
    socket.to(json.socket_id).emit('is_logged_in', { status: 'Y' });
  });
  socket.on('disconnect', () => {
    //pgsql.end();
  });
});
I've tried killing the pgsql instance (in the socket.on('disconnect') handler), but for some reason LISTEN stops working when I do that.
I've also tried moving new pg.Client outside io.on('connection'), but when I refresh the page, the old socket_id disconnects, the new one connects, and the code to recreate the pg client never executes.
Any ideas?
Two things are probably causing the problem:
The pgsql instance is created on each socket connection request and is never destroyed on disconnection.
The notification handler is never removed on disconnection.
I'm not that familiar with Postgres, but I have worked extensively with Socket.IO, so something like this should fix your issue:
const io = require('socket.io')();
const pg = require('pg');
const pgsql = new pg.Client({
  host, port, user, password, database // connection settings elided
});
pgsql.connect();
io.on('connection', (socket) => {
  pgsql.query('LISTEN logged_in'); // re-issuing LISTEN on the same channel is a no-op for this session
  const handler = (data) => {
    const json = JSON.parse(data.payload);
    socket.to(json.socket_id).emit('is_logged_in', { status: 'Y' });
    // You could also do pgsql.off('notification', handler) here,
    // or check if pgsql.once() fits, if this handler only needs to fire once.
  };
  pgsql.on('notification', handler);
  socket.on('disconnect', () => {
    pgsql.off('notification', handler); // remove this socket's listener to avoid the leak
    //pgsql.end(); // call this in your server termination logic instead
  });
});
In a normal Cloud Run service, something like the following seems to properly close a Mongoose/MongoDB connection on shutdown:
const cleanup = async () => {
  await mongoose.disconnect()
  console.log('database | disconnected from db')
  process.exit()
}

const shutdownSignals = ['SIGTERM', 'SIGINT']
shutdownSignals.forEach((sig) => process.once(sig, cleanup))
But for Cloud-Functions-managed Cloud Run instances this seems not to be the case: they shut down without waiting the usual 10s that "normal" Cloud Run services give after SIGTERM is sent, so I never see the 'database | disconnected from db' log.
How would one go about this? I don't want to create a connection for every single Cloud Functions call (very wasteful in my case).
Well, here is what I went with for now:
import mongoose from 'mongoose'
import { Sema } from 'async-sema'
// 'functions' is assumed to come from the Functions Framework,
// e.g. import functions from '@google-cloud/functions-framework'

functions.cloudEvent('someCloudFunction', async (event) => {
  await connect()
  // actual computation here
  await disconnect()
})

const state = {
  num: 0, // number of in-flight handlers using the connection
  sema: new Sema(1), // binary semaphore guarding connect/disconnect
}

export async function connect() {
  await state.sema.acquire()
  if (state.num === 0) {
    try {
      await mongoose.connect(MONGO_DB_URL) // MONGO_DB_URL assumed defined elsewhere (e.g. an env var)
    } catch (e) {
      process.exit(1)
    }
  }
  state.num += 1
  state.sema.release()
}

export async function disconnect() {
  await state.sema.acquire()
  state.num -= 1
  if (state.num === 0) {
    await mongoose.disconnect()
  }
  state.sema.release()
}
As one can see, I used a kind of "reference counting" of the handlers that want to use the connection, and ensured proper concurrency with async-sema.
I should note that this works well with my setup: I allow many concurrent requests to a single Cloud Functions instance. In other cases this solution might not improve over just opening (and closing) a connection on every single call. But as https://cloud.google.com/functions/docs/writing/write-event-driven-functions#termination seems to imply, everything has to be handled inside the cloudEvent function.
I wrote the following Node/Express/Mongo script:
const { MongoClient } = require("mongodb");
const stream = require("stream");

async function main() {
  // CONNECTING TO LOCALHOST (REPLICA SET)
  const client = new MongoClient("mongodb://localhost:27018");
  try {
    // CONNECTION
    await client.connect();
    // EXECUTING MY WATCHER
    console.log("Watching ...");
    await myWatcher(client, 15000);
  } catch (e) {
    // ERROR MANAGEMENT
    console.log(`Error > ${e}`);
  } finally {
    // CLOSING CLIENT CONNECTION ???
    await client.close(); // <-- ????
  }
}

main().catch(console.error);

// MY WATCHER. LISTENING FOR CHANGES IN MY DATABASE
async function myWatcher(client, timeInMs, pipeline = []) {
  // TARGET TO WATCH
  const watching = client.db("myDatabase").collection("myCollection").watch(pipeline);
  // WATCHING CHANGES ON TARGET
  watching.on("change", (next) => {
    console.log(JSON.stringify(next));
    console.log(`Doing my things...`);
  });
  // CLOSING THE WATCHER ???
  await closeChangeStream(timeInMs, watching); // <-- ????
}

// CHANGE STREAM CLOSER
function closeChangeStream(timeInMs = 60000, watching) {
  return new Promise((resolve) => {
    setTimeout(() => {
      console.log("Closing the change stream");
      watching.close();
      resolve();
    }, timeInMs);
  });
}
So, the goal is to keep the myWatcher function always active, to watch for any database changes and, for example, send a user notification when an update is detected. The closeChangeStream function closes the watcher X seconds after it starts. So, to keep myWatcher always active, do you recommend not using the closeChangeStream function?
Another thing: with this same goal in mind, if I keep the await client.close(), my code throws an error: Topology is closed, but when I omit await client.close(), my code works perfectly. Do you recommend not using await client.close() to keep myWatcher always active?
I'm a newbie in these topics. Thanks for any advice and help!
MongoDB change streams are implemented in a pub/sub paradigm.
Send your application to a friend in Sudan. Have both you and your friend run the application (with the change stream implemented). If you open mongosh and run db.getCollection('myCollection').updateOne({ _id: ObjectId("6220ee09197c13d24a7997b7") }, { $set: { FirstName: "Bob" } }), both you and your friend will get the console.log from the change stream.
This assumes you're not running on localhost, but you can simulate it locally with two copies of the application.
The issue comes when you go into production and suddenly you have 200 load-balanced instances, 5 developers, etc. running, and your watch fires for a ton of writes from around the globe.
I believe the practice is to wrap it in a function: open the watch when you're about to do the relevant writes, and close it after the associated writes complete, as in the sketch below.
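A minimal sketch of that pattern (the withWatcher helper, the collection name, and the usage filter are illustrative assumptions, not part of the original code):

// Hypothetical helper: open a change stream, run the writes, then close the stream.
async function withWatcher(client, runWrites) {
  const changeStream = client.db("myDatabase").collection("myCollection").watch();
  changeStream.on("change", (next) => {
    console.log(`Change detected: ${JSON.stringify(next)}`);
  });
  try {
    await runWrites(); // perform the associated writes while the watch is live
  } finally {
    await changeStream.close(); // always release the stream when done
  }
}

// Usage:
// await withWatcher(client, () =>
//   client.db("myDatabase").collection("myCollection").updateOne(
//     { _id: someId }, { $set: { FirstName: "Bob" } }));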
I am looking at implementing Row Level Security with our Node Express + pg-promise + Postgres service.
We've tried a few approaches with no success:
a getDb(tenantId) wrapper which runs the SET app.current_tenant = '${tenantId}'; SQL statement before returning the db object
a getDb(tenantId) wrapper that gets a new db object every time - this works for a few requests but eventually causes too many db connections and errors out (which is understandable, as it is not using pg-promise's connection pool management)
a getDb(tenantId) wrapper that uses a name-value map to store a db connection per tenant - this works for a short while but eventually also results in too many db connections
utilising the initOptions > connect event - have not found a way to get hold of the current request object (to then set the tenant_id)
Can someone (hopefully vitaly-t :)) please suggest the best strategy for injecting the current tenant before all sql queries are run inside a connection.
Thank you very much
here is an abbreviated code example:
const promise = require('bluebird');

const initOptions = {
  promiseLib: promise,
  connect: async (client, dc, useCount) => {
    try {
      // "hook" into the db connect event - and set the tenantId so all future sql queries in this connection
      // have an implied WHERE tenant_id = current_setting('app.current_tenant')::UUID (aka Postgres Row Level Security)
      const tenantId = client.$ctx?.cn?.tenantId || client.$ctx?.cnOptions?.tenantId;
      if (tenantId) {
        await client.query(`SET app.current_tenant = '${tenantId}';`);
      }
    } catch (ex) {
      log.error('error in db.js initOptions', {ex});
    }
  }
};
const pgp = require('pg-promise')(initOptions);

const options = tenantIdOptional => {
  return {
    user: process.env.POSTGRES_USER,
    host: process.env.POSTGRES_HOST,
    database: process.env.POSTGRES_DATABASE,
    password: process.env.POSTGRES_PASSWORD,
    port: process.env.POSTGRES_PORT,
    max: 100,
    tenantId: tenantIdOptional
  };
};

const db = pgp(options());

const getDb = tenantId => {
  // how to inject tenantId into the db object?
  // 1. this was getting an error "WARNING: Creating a duplicate database object for the same connection" and "Error: write EPIPE"
  // const tmpDb = pgp(options(tenantId));
  // return tmpDb;
  // 2. this was running the SET app.current_tenant BEFORE the database connection was established
  // const setTenantId = async () => {
  //   await db.query(`SET app.current_tenant = '${tenantId}';`);
  // };
  // setTenantId();
  // return db;
  // 3. this is bypassing the connection pool management - and is not working
  // db.connect(options(tenantId));
  // return db;
  return db;
};

// Exporting the global database object for shared use:
const exportFunctions = {
  getDb,
  db // have to also export db for the legacy non-Row-Level-Security areas of the service
};
module.exports = exportFunctions;
A SET operation is connection-bound, i.e. it only has effect while the current connection session lasts. For fresh connections spawned by the pool, you need to re-apply the settings.
The standard way of controlling the current connection session is via tasks:
await db.task('my-task', async t => {
  await t.none('SET app.current_tenant = ${tenantId}', {tenantId});
  // ... run all session-related queries here
});
Or you can use method tx instead, if a transaction is needed.
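The same pattern with a transaction, as a sketch (using SET LOCAL so the setting is scoped to just that transaction):

await db.tx('my-transaction', async t => {
  await t.none('SET LOCAL app.current_tenant = ${tenantId}', {tenantId});
  // ... all queries here run on one connection, inside a single transaction
});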
But if you have tenantId known globally, and you want it automatically propagated through all connections, then you can use event connect instead:
const initOptions = {
  connect(client) {
    // SET cannot take bind parameters, so use set_config() for a parameterized query
    client.query('SELECT set_config($1, $2, false)', ['app.current_tenant', tenantId]);
  }
};
The latter is kind of an afterthought workaround, but it works reliably, has the best performance, and avoids creating the extra tasks.
have not found a way to get hold of the current request object (to then set the tenant_id)
This should be very straightforward for any HTTP library out there, but is outside of scope here.
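For example, with Express the per-request wiring could look like this (a sketch; the x-tenant-id header, route, and query are assumptions for illustration):

const express = require('express');
const app = express();

app.get('/users', async (req, res) => {
  const tenantId = req.header('x-tenant-id'); // assumed: tenant id arrives in a header
  try {
    const users = await db.task(async t => {
      await t.none('SET app.current_tenant = ${tenantId}', {tenantId});
      return t.any('SELECT * FROM users'); // RLS now filters rows by tenant
    });
    res.json(users);
  } catch (err) {
    res.status(500).json({error: err.message});
  }
});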
When my Lambda function is invoked, it connects to my MongoDB Atlas instance, significantly slowing down the response by 1000-2000ms.
I can cache the DB connection, but the cache only lasts if requests are made quickly after the last one; it would not persist for a request made an hour later.
Do any of the native AWS databases avoid this problem and allow an instant connection every time? (DocumentDB, DynamoDB, etc.)
CODE
import { MongoClient } from 'mongodb'

let response
let cachedDb = null
const uri = 'mongodb+srv://XXXX'

function connectToDatabase(uri) {
  if (cachedDb && cachedDb.serverConfig.isConnected()) {
    console.log('=> using cached database instance')
    return Promise.resolve(cachedDb)
  }
  const dbName = 'test'
  return MongoClient.connect(uri, { useNewUrlParser: true, useUnifiedTopology: true }).then(
    client => {
      cachedDb = client.db(dbName)
      return cachedDb
    }
  )
}

export async function lambdaHandler() {
  try {
    const db = await connectToDatabase(uri)
    const collection = db.collection('users')
    const profile = await collection.findOne({ user: 'myuser' })
    response = profile
  } catch (err) {
    console.log(err)
    return err
  }
  return response
}
We have the same issue with MySQL connections: the cached variables disappear when the Lambda function cold starts.
The only solution I have is to keep the cache alive with function warming.
Just set up a periodic cron job to trigger your function every 5-15 minutes, and rest assured, it will never go idle.
You can also check this one: https://www.npmjs.com/package/lambda-warmer
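With lambda-warmer, the handler just short-circuits the warming pings (a sketch based on that package's documented usage; the real work is elided):

const warmer = require('lambda-warmer')

exports.handler = async (event) => {
  // if this is a warming ping from the scheduled rule, exit early
  if (await warmer(event)) return 'warmed'
  // ... normal request handling (e.g. the cached DB lookup) goes here
  return 'handled'
}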
You are facing a cold start. It's not related to the DB connection.
In order to keep your Lambda function warm, you can set up a CloudWatch Events rule that triggers the Lambda periodically (normally once per 5 minutes should be enough).
Also, if you are using DocumentDB, you must put the Lambda into a VPC. That requires an ENI (Elastic Network Interface) to be provisioned, which adds more time to the start. So, if you can avoid using a VPC, it could give you a performance advantage.
More info:
Good article about cold start
AWS Lambda Execution Context
I have set up an AWS Lambda to do some data saving to MongoDB. I'd like to reuse the connection so I don't have to create a new connection every time the Lambda is invoked. But if I leave the DB connection open, the callback for the Lambda handler doesn't work!
Is there something I'm doing wrong that's creating this behavior? Here is my code:
var MongoClient = require('mongodb').MongoClient

exports.handler = (event, context, callback) => {
  MongoClient.connect(process.env.MONGOURL, function (err, database) {
    //database.close();
    callback(null, "Successful db connection")
  });
}
This is caused by not setting context.callbackWaitsForEmptyEventLoop = false. If left at the default true, the callback does not cause Lambda to return the response because your database connection is keeping the event loop from being empty.
http://docs.aws.amazon.com/lambda/latest/dg/nodejs-prog-model-context.html
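A sketch of the handler with that flag set, so the open connection can be reused across warm invocations (the cachedDb variable and error handling are additions for illustration):

var MongoClient = require('mongodb').MongoClient
var cachedDb = null // survives across warm invocations of the same container

exports.handler = (event, context, callback) => {
  // let Lambda return as soon as the callback fires, even though
  // the open connection keeps the event loop non-empty
  context.callbackWaitsForEmptyEventLoop = false
  if (cachedDb) {
    return callback(null, 'Reused db connection')
  }
  MongoClient.connect(process.env.MONGOURL, function (err, database) {
    if (err) return callback(err)
    cachedDb = database
    callback(null, 'Successful db connection')
  })
}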