AWS Lambda callback being blocked by open mongodb connection?

I have set up an AWS Lambda function to do some data saving for me to MongoDB. I'd like to reuse the connection so I don't have to create a new connection every time the Lambda is invoked. But if I leave the db connection open, the callback for the Lambda handler doesn't work!
Is there something I'm doing wrong that's creating this behavior? Here is my code:
var MongoClient = require('mongodb').MongoClient

exports.handler = (event, context, callback) => {
  MongoClient.connect(process.env.MONGOURL, function (err, database) {
    // database.close();
    callback(null, "Successful db connection")
  });
}

This is caused by not setting context.callbackWaitsForEmptyEventLoop = false. If left at the default true, the callback does not cause Lambda to return the response because your database connection is keeping the event loop from being empty.
http://docs.aws.amazon.com/lambda/latest/dg/nodejs-prog-model-context.html
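For illustration, a minimal sketch of the fixed handler, assuming you also want to cache the client between warm invocations (the caching variable is my addition, not from the question):

var MongoClient = require('mongodb').MongoClient

var cachedDb = null // survives between warm invocations of the same container

exports.handler = (event, context, callback) => {
  // Return as soon as the callback fires, even though the open
  // MongoDB connection keeps the event loop from being empty.
  context.callbackWaitsForEmptyEventLoop = false

  if (cachedDb) {
    return callback(null, "Reused cached db connection")
  }

  MongoClient.connect(process.env.MONGOURL, function (err, database) {
    if (err) return callback(err)
    cachedDb = database
    callback(null, "Successful db connection")
  })
}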

Related

Trying to use Knex onConflict times out my Cloud Function

I am trying to insert geoJSON data into a PostGIS instance on a regular schedule, and there is usually duplicate data each time it runs. I am looping through this geoJSON data and trying to use the Knex.js onConflict modifier to ignore rows whose key field is a duplicate, but it times out my cloud function.
async function insertFeatures() {
  try {
    const results = await getGeoJSON();
    pool = pool || (await createPool());
    const st = knexPostgis(pool);
    for (const feature of results.features) {
      const { geometry, properties } = feature;
      const { region, date, type, name, url } = properties;
      const point = st.geomFromGeoJSON(geometry);
      await pool('observations')
        .insert({
          region: region,
          url: url,
          date: date,
          name: name,
          type: type,
          geom: point,
        })
        .onConflict('url')
        .ignore();
    }
  } catch (error) {
    console.log(error);
    return res.status(500).json({
      message: error + "Poop"
    });
  }
}
The timeout could be caused by a variety of things: the batch size your function is processing, the connection pool size, or database server limits. One thing to check in your cloud function: when setting up the pool, Knex lets you optionally register an afterCreate callback. If you register this callback, make sure you call the done callback that is passed as the last parameter to it; otherwise no connection will ever be acquired, which leads to a timeout.
One way to see what Knex is doing internally is to set the DEBUG=knex:* environment variable before running the code, so that Knex outputs information about queries, transactions, and pool connections while the code executes. It is also advisable to set batch sizes, the connection pool size, and connection limits on the database server to match the workload you are pushing to it; this avoids the most basic timeout issues.
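As a minimal sketch of such a pool configuration (the client, connection string, and SELECT 1 warm-up query are placeholder assumptions, not from the question), note that done must always be called:

const knex = require('knex')({
  client: 'pg',
  connection: process.env.DATABASE_URL, // placeholder
  pool: {
    min: 1,
    max: 5,
    afterCreate: (conn, done) => {
      // Do any per-connection setup here, then ALWAYS call done(err, conn);
      // if done is never called, no connection is ever acquired and every
      // query times out waiting on the pool.
      conn.query('SELECT 1', (err) => done(err, conn));
    },
  },
});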
Also check for similar examples here:
Knex timeout error acquiring connection
When trying to mass insert timeout occurs for knexjs error
Having timeout error after upgrading knex
Knex timeout acquiring a connection

PostgreSQL SET runtime variables with typeorm, how to ensure the session is isolated?

I would like to set runtime variables for each executed query, without using transactions.
For example:
SET app.current_user_id = ${userId};
How can I ensure the session is isolated and prevent race conditions on the DB?
To ensure the session is isolated, you'll need to work with a specific connection from the pool. In Postgres, SESSION and CONNECTION are equivalent.
The relevant typeORM method is createQueryRunner. There is no info about it in the docs, but it is documented in the API:
Creates a query runner used for perform queries on a single database
connection. Using query runners you can control your queries to
execute using single database connection and manually control your
database transaction.
Usage example:
import { EntityManager, getConnection } from 'typeorm';

const foo = async <T>(callback: (em: EntityManager) => Promise<T>): Promise<T> => {
  const connection = getConnection();
  // Each query runner holds one dedicated connection from the pool,
  // so the SET below is visible only to queries made through it.
  const queryRunner = connection.createQueryRunner();
  try {
    await queryRunner.connect();
    await queryRunner.manager.query(`SET app.current_user_id = ${userId};`);
    return await callback(queryRunner.manager);
  } finally {
    // Clean up the session variable and return the connection to the
    // pool, whether the callback succeeded or threw.
    await queryRunner.manager.query(`RESET app.current_user_id`);
    await queryRunner.release();
  }
};
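A hypothetical usage example (the documents table and owner_id column are invented for illustration): every query inside the callback runs on that same dedicated connection, so current_setting sees the value set above.

const docs = await foo((em) =>
  em.query(
    `SELECT * FROM documents
     WHERE owner_id = current_setting('app.current_user_id')::int`
  )
);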
This was also my answer to How to add a request timeout in Typeorm/Typescript?

Initial Request Slow on Lambda Due to DB connection

When my Lambda function is activated it connects to my MongoDB Atlas instance, which significantly slows down the response, by 1000-2000ms.
I can cache the DB connection, but the cache only lasts if requests are made quickly after the last one; it would not persist for a request made an hour later.
Do any of the native AWS databases avoid this problem and allow an instant connection every time (DocumentDB, DynamoDB, etc.)?
CODE
import { MongoClient } from 'mongodb'

let response
let cachedDb = null
const uri = 'mongodb+srv://XXXX'

function connectToDatabase(uri) {
  if (cachedDb && cachedDb.serverConfig.isConnected()) {
    console.log('=> using cached database instance')
    return Promise.resolve(cachedDb)
  }
  const dbName = 'test'
  return MongoClient.connect(uri, { useNewUrlParser: true, useUnifiedTopology: true }).then(
    client => {
      cachedDb = client.db(dbName)
      return cachedDb
    }
  )
}

export async function lambdaHandler() {
  try {
    const client = await connectToDatabase(uri)
    const collection = client.collection('users')
    const profile = await collection.findOne({ user: 'myuser' })
    response = profile
  } catch (err) {
    console.log(err)
    return err
  }
  return response
}
We have the same issue with MySQL connections: the cached variables disappear when the Lambda function cold starts.
The only solution I have is to keep the cache alive with function warming.
Just set up a periodic cron job to trigger your function every 5-15 minutes, and rest assured, it will never go idle.
You can also check this one: https://www.npmjs.com/package/lambda-warmer
You are facing a cold start. It's not related to the DB connection.
In order to keep your Lambda function warm you can set up a CloudWatch event that triggers the Lambda periodically (normally once every 5 minutes should be enough).
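A minimal sketch of that idea, assuming the warmer is a scheduled CloudWatch/EventBridge rule (scheduled events arrive with source: 'aws.events') and reusing connectToDatabase from the question:

export async function lambdaHandler(event, context) {
  // Short-circuit scheduled warm-up invocations so they keep the
  // container (and its cached DB connection) alive without doing real work.
  if (event && event.source === 'aws.events') {
    return 'warmed'
  }
  const client = await connectToDatabase(uri)
  const collection = client.collection('users')
  return collection.findOne({ user: 'myuser' })
}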
Also, if you are using DocumentDB, you must put the Lambda into a VPC. That requires an ENI (Elastic Network Interface) to be provisioned, which adds more time to the start. So if you can avoid using a VPC, that could give you a performance advantage.
More info:
Good article about cold start
AWS Lambda Execution Context

socket.io duplicate emit events on browser refresh

I'm running into an issue with my socket.io implementation and don't know how to solve it. I'm using pg_notify with LISTEN so when a certain value is modified in the db, it emits 'is_logged_in' to a certain client.
That in itself is working fine. My issue is that when I refresh the page, socket.io disconnects the current socket_id and creates a new socket_id as usual, but when this happens it creates a second pgsql client instance and duplicates requests, firing the "logged_in" event 2x.
If I refresh the page again and then manually fire the pg "logged_in" trigger, it will now emit 3 times, etc. I have a leak.
const io = require('socket.io')();
const pg = require('pg');

io.on('connection', (socket) => {
  const pgsql = new pg.Client({
    // host, port, user, password, database
  })
  pgsql.connect()
  pgsql.query("LISTEN logged_in");
  pgsql.on('notification', function (data) {
    socket.to(json.socket_id).emit('is_logged_in', { status: 'Y' });
  });
  socket.on('disconnect', () => {
    //pgsql.end();
  });
});
I've tried killing the pgsql instance (in the socket.on disconnect) but for some reason the LISTEN stops working when I do that.
I've also tried moving the new pg.Client outside the io.on connection but when I refresh the page, the old socket_id disconnects, the new one connects, and it never executes the code to recreate the pg client.
Any ideas?
These are probably creating the problem:
The pgsql instance is created on each socket connection request and is never destroyed on disconnection
The notification handler is not removed on disconnection
I'm not very familiar with Postgres, but I have worked extensively with sockets, so something like this should fix your issue:
const io = require('socket.io')();
const pg = require('pg');

// One shared client and a single LISTEN, instead of one per socket connection.
const pgsql = new pg.Client({
  // host, port, user, password, database
})
pgsql.connect();
pgsql.query("LISTEN logged_in");

io.on('connection', (socket) => {
  const handler = function (data) {
    // 'json' is assumed to come from parsing data.payload, as in the question
    socket.to(json.socket_id).emit('is_logged_in', { status: 'Y' });
    // You could also do pgsql.off('notification', handler) here,
    // or check if pgsql.once is available, if this handler should only fire once
  }
  pgsql.on('notification', handler);
  socket.on('disconnect', () => {
    pgsql.off('notification', handler);
    //pgsql.end(); // Call this in server termination logic instead
  });
});

Streaming from mongodb in AWS lambda times out

I have a lambda function which connects to a mongodb database and streams some records from the database.
const MongoClient = require('mongodb').MongoClient;

exports.handler = (event, context, callback) => {
  let url = event.mongodbUrl;
  let collectionName = event.collectionName;
  MongoClient.connect(url, (error, db) => {
    if (error) {
      console.log(`Error connecting to mongodb: ${error}`);
      callback(error);
    } else {
      console.log("Connected to mongodb");
      let events = [];
      console.log("Streaming data from mongodb...");
      let mongoStream = db.collection(collectionName).find().sort({ _id: -1 }).limit(500).stream();
      mongoStream.on("data", data => {
        events.push(data);
      });
      mongoStream.once("end", () => {
        console.log("Stream ended");
        db.close(() => {
          console.log("Database connection closed");
          callback(null, "Lambda function succeeded!!");
        });
      });
    }
  });
};
When the stream ends I close the database connection and call the callback, which should end the Lambda function. This works locally using node-lambda, but when I run it in AWS Lambda I see all of the logs, including console.log("Database connection closed");, coming through, yet the callback doesn't seem to be called, so the function always times out, despite the last log occurring a few seconds before the timeout.
I can force it to end using context.succeed(), but that seems to be deprecated when using Node version 4, so I want to avoid it. How can I stop this function from timing out in AWS Lambda?
Add the following line at the beginning of your handler function:
context.callbackWaitsForEmptyEventLoop = false
Try the following:
mongoStream.once("end", callback);
This also calls back with err and result, but will not lose the context.