Initial Request Slow on Lambda Due to DB Connection (MongoDB)

When my Lambda function is invoked, it connects to my MongoDB Atlas instance, which slows the response by 1000-2000 ms.
I can cache the DB connection, but the cache only survives if the next request arrives shortly after the last one; it would not persist for a request made an hour later.
Do any of the native AWS databases avoid this problem and allow an instant connection every time (DocumentDB, DynamoDB, etc.)?
Code:
import { MongoClient } from 'mongodb'

const uri = 'mongodb+srv://XXXX'
let cachedDb = null

function connectToDatabase(uri) {
  // Reuse the instance created by a previous invocation of this container.
  // (The old cachedDb.serverConfig.isConnected() check is gone in driver v4+,
  // so a simple truthiness check is used here.)
  if (cachedDb) {
    console.log('=> using cached database instance')
    return Promise.resolve(cachedDb)
  }
  const dbName = 'test'
  return MongoClient.connect(uri, { useNewUrlParser: true, useUnifiedTopology: true }).then(
    client => {
      cachedDb = client.db(dbName)
      return cachedDb
    }
  )
}

export async function lambdaHandler() {
  try {
    const db = await connectToDatabase(uri)
    const collection = db.collection('users')
    return await collection.findOne({ user: 'myuser' })
  } catch (err) {
    console.log(err)
    return err
  }
}

We have the same issue with MySQL connections: the cached variables disappear when the Lambda function cold starts.
The only solution I have found is to keep the cache alive with function warming.
Just set up a periodic cron job to trigger your function every 5-15 minutes, and it will never sit idle long enough for its container to be recycled.
You can also check this one: https://www.npmjs.com/package/lambda-warmer
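For reference, a minimal sketch of how the warming check typically looks with lambda-warmer (based on the package's documented usage; the returned strings are placeholders):
const warmer = require('lambda-warmer')

exports.handler = async (event) => {
  // Short-circuit warming pings so they never touch the database
  if (await warmer(event)) return 'warmed'
  // ... real request handling goes here
  return 'response'
}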

You are facing a cold start; the delay is not specific to the DB connection.
To keep your Lambda function warm, you can set up a CloudWatch Events rule that triggers the Lambda periodically (normally once per 5 minutes should be enough).
Also, if you use DocumentDB you must put the Lambda into a VPC, which requires an ENI (Elastic Network Interface) to be provisioned and therefore adds more startup time. If you can avoid the VPC, that alone can give you a performance advantage.
More info:
Good article about cold start
AWS Lambda Execution Context
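The execution context is also the key to the original question: anything declared outside the handler survives across warm invocations. A minimal sketch of that pattern (the MONGODB_URI environment variable and the db/collection names are my assumptions):
const { MongoClient } = require('mongodb')

// Created once per container and reused across warm invocations
let client = null

exports.handler = async (event, context) => {
  // Return as soon as the handler finishes, even though the open
  // MongoDB connection keeps the event loop non-empty
  context.callbackWaitsForEmptyEventLoop = false

  if (!client) {
    client = await MongoClient.connect(process.env.MONGODB_URI) // assumed env var
  }
  return client.db('test').collection('users').findOne({ user: 'myuser' })
}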

Related

How do I gracefully disconnect MongoDB in Google Functions? Behavior of "normal" Cloud Run and "Functions Cloud Run" seems to be different

In a normal Cloud Run service, something like the following seems to properly close a Mongoose/MongoDB connection.
const cleanup = async () => {
  await mongoose.disconnect()
  console.log('database | disconnected from db')
  process.exit()
}

const shutdownSignals = ['SIGTERM', 'SIGINT']
shutdownSignals.forEach((sig) => process.once(sig, cleanup))
But for a Cloud-Functions-managed Cloud Run this seems not to be the case. The instances shut down without waiting the usual 10 s that "normal" Cloud Run services give after the SIGTERM is sent, so I never see the database | disconnected from db log line.
How would one go about this? I don't want to create a connection for every single Cloud Functions call (very wasteful in my case).
Well, here is what I went with for now:
import mongoose from 'mongoose'
import { Sema } from 'async-sema'
// Assumed: Functions Framework v3 provides the `functions` object
import * as functions from '@google-cloud/functions-framework'

// Assumed to come from the environment
const MONGO_DB_URL = process.env.MONGO_DB_URL

const state = {
  num: 0,            // number of in-flight handlers using the connection
  sema: new Sema(1), // mutex guarding the counter
}

export async function connect() {
  await state.sema.acquire()
  if (state.num === 0) {
    try {
      await mongoose.connect(MONGO_DB_URL)
    } catch (e) {
      process.exit(1)
    }
  }
  state.num += 1
  state.sema.release()
}

export async function disconnect() {
  await state.sema.acquire()
  state.num -= 1
  if (state.num === 0) {
    await mongoose.disconnect()
  }
  state.sema.release()
}

functions.cloudEvent('someCloudFunction', async (event) => {
  await connect()
  // actual computation here
  await disconnect()
})
As one can see, I used a kind of "reference counting" for the handlers that want to use the connection, and ensured proper mutual exclusion with async-sema.
I should note that this works well with the setup I have: I allow many concurrent requests to a single instance of my Cloud Functions. In other cases this solution might not improve over just opening (and closing) a connection every single time the function is called. But as https://cloud.google.com/functions/docs/writing/write-event-driven-functions#termination seems to imply, everything has to be handled inside the cloudEvent function.
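One refinement worth adding (my suggestion, not part of the original answer): wrap the computation in try/finally so disconnect() still runs when the handler throws; otherwise the counter never reaches zero and the connection is never closed:
functions.cloudEvent('someCloudFunction', async (event) => {
  await connect()
  try {
    // actual computation here
  } finally {
    // runs even if the computation throws, keeping the counter balanced
    await disconnect()
  }
})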

Trying to use Knex onConflict times out my Cloud Function

I am trying to insert GeoJSON data into a PostGIS instance on a regular schedule, and there is usually duplicate data each time it runs. I am looping through the GeoJSON features and using the Knex.js onConflict modifier to ignore rows whose key field is a duplicate, but it times out my Cloud Function.
async function insertFeatures() {
  try {
    const results = await getGeoJSON();
    pool = pool || (await createPool());
    const st = knexPostgis(pool);
    for (const feature of results.features) {
      const { geometry, properties } = feature;
      const { region, date, type, name, url } = properties;
      const point = st.geomFromGeoJSON(geometry);
      await pool('observations')
        .insert({
          region: region,
          url: url,
          date: date,
          name: name,
          type: type,
          geom: point,
        })
        .onConflict('url')
        .ignore();
    }
  } catch (error) {
    console.log(error);
    // `res` must come from the surrounding HTTP handler's scope
    return res.status(500).json({
      message: error + "Poop"
    });
  }
}
The timeout could be caused by a variety of reasons: the batch size your function processes per transaction, the connection pool size, or database server limitations. When setting up the pool in your Cloud Function, note that Knex optionally lets you register an afterCreate callback; if you add this callback, you must invoke the done callback that is passed as the last parameter to your registered callback, or else no connection will ever be acquired, leading to a timeout.
One way to see what Knex is doing internally is to set the DEBUG=knex:* environment variable before running the code, so that Knex outputs information about queries, transactions, and pool connections as the code executes. It is also advisable to set batch sizes, the connection pool size, and connection limits on the database server to match the workload you are pushing to the server; this prevents the most common timeout causes. A sketch of the afterCreate pattern follows.
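Here is a minimal sketch of that pattern (the connection string and the SET statement are placeholders, not taken from the question):
const knex = require('knex')({
  client: 'pg',
  connection: process.env.DATABASE_URL, // placeholder
  pool: {
    min: 2,
    max: 10,
    afterCreate: (conn, done) => {
      // Whatever happens here, `done` MUST be called, or the pool
      // never hands out the connection and acquisition times out.
      conn.query("SET timezone = 'UTC';", (err) => done(err, conn))
    },
  },
})

Running the code with DEBUG=knex:* set then logs every query, transaction, and pool acquire/release.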
Also check for similar examples here:
Knex timeout error acquiring connection
When trying to mass insert timeout occurs for knexjs error
Having timeout error after upgrading knex
Knex timeout acquiring a connection

pg-promise and Row Level Security

I am looking at implementing Row Level Security with our Node Express + pg-promise + Postgres service.
We've tried a few approaches with no success:
a getDb(tenantId) wrapper which runs the SET app.current_tenant = '${tenantId}'; SQL statement before returning the db object
a getDb(tenantId) wrapper that gets a new db object every time - this works for a few requests but eventually causes too many DB connections and errors out (which is understandable, as it bypasses pg-promise's connection pool management)
a getDb(tenantId) wrapper that uses a name/value map to store one db connection per tenant - this works for a short while but eventually results in too many DB connections
utilising the initOptions > connect event - we have not found a way to get hold of the current request object there (to then set the tenant_id)
Can someone (hopefully vitaly-t :)) please suggest the best strategy for injecting the current tenant before all SQL queries run inside a connection?
Thank you very much.
Here is an abbreviated code example:
const promise = require('bluebird');

const initOptions = {
  promiseLib: promise,
  connect: async (client, dc, useCount) => {
    try {
      // "hook" into the db connect event and set the tenantId, so all future
      // SQL queries in this connection have an implied
      // WHERE tenant_id = current_setting('app.current_tenant')::UUID
      // (aka Postgres Row Level Security)
      const tenantId = client.$ctx?.cn?.tenantId || client.$ctx?.cnOptions?.tenantId;
      if (tenantId) {
        await client.query(`SET app.current_tenant = '${tenantId}';`);
      }
    } catch (ex) {
      log.error('error in db.js initOptions', {ex});
    }
  }
};

const pgp = require('pg-promise')(initOptions);

const options = tenantIdOptional => {
  return {
    user: process.env.POSTGRES_USER,
    host: process.env.POSTGRES_HOST,
    database: process.env.POSTGRES_DATABASE,
    password: process.env.POSTGRES_PASSWORD,
    port: process.env.POSTGRES_PORT,
    max: 100,
    tenantId: tenantIdOptional
  };
};

const db = pgp(options());

const getDb = tenantId => {
  // how to inject tenantId into the db object?
  // 1. this was getting an error "WARNING: Creating a duplicate database
  //    object for the same connection" and "Error: write EPIPE"
  // const tmpDb = pgp(options(tenantId));
  // return tmpDb;
  // 2. this was running SET app.current_tenant BEFORE the database
  //    connection was established
  // const setTenantId = async () => {
  //   await db.query(`SET app.current_tenant = '${tenantId}';`);
  // };
  // setTenantId();
  // return db;
  // 3. this is bypassing the connection pool management - and is not working
  // db.connect(options(tenantId));
  // return db;
  return db;
};

// Exporting the global database object for shared use:
const exportFunctions = {
  getDb,
  db // also export db for the legacy non-Row-Level-Security areas of the service
};
module.exports = exportFunctions;
The SET operation is connection-bound, i.e. it only has effect while the current connection session lasts. For fresh connections spawned by the pool, you need to re-apply the settings.
The standard way of controlling the current connection session is via tasks:
await db.task('my-task', async t => {
  await t.none('SET app.current_tenant = ${tenantId}', {tenantId});
  // ... run all session-related queries here
});
Or you can use method tx instead, if a transaction is needed.
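In the transaction case, SET LOCAL is worth knowing about: it scopes the setting to the transaction itself, so nothing leaks back into the pooled session. A sketch in the same style (SET LOCAL is standard PostgreSQL; the rest mirrors the task example above):
await db.tx('my-tx', async t => {
  // SET LOCAL reverts automatically when the transaction ends
  await t.none('SET LOCAL app.current_tenant = ${tenantId}', {tenantId});
  // ... run all tenant-scoped queries here
});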
But if you have tenantId known globally, and you want it automatically propagated through all connections, then you can use event connect instead:
const initOptions = {
  connect(client) {
    client.query('SET app.current_tenant = $1', [tenantId]);
  }
};
The latter is kind of an afterthought workaround, but it works reliably, has the best performance, and avoids creating the extra tasks.
have not found a way to get hold of the current request object (to then set the tenant_id)
This should be very straightforward for any HTTP library out there, but is outside of scope here.
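That said, a per-request sketch with Express might look like this (the x-tenant-id header and the route are my assumptions, and db is the shared pg-promise object from the example above):
const express = require('express');
const app = express();

app.get('/users', async (req, res) => {
  const tenantId = req.header('x-tenant-id'); // assumed header
  try {
    const users = await db.task(async t => {
      // scope this pooled session to the tenant before querying
      await t.none('SET app.current_tenant = ${tenantId}', {tenantId});
      return t.any('SELECT * FROM users');
    });
    res.json(users);
  } catch (err) {
    res.status(500).json({error: err.message});
  }
});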

PostgreSQL SET runtime variables with typeorm, how to ensure the session is isolated?

I would like to set runtime variables for each executed query without using transactions.
For example:
SET app.current_user_id = ${userId};
How can I ensure the session is isolated and prevent race conditions on the DB?
To ensure the session is isolated, you'll need to work with a specific connection from the pool; in Postgres, SESSION and CONNECTION are equivalent.
The relevant method in TypeORM is createQueryRunner. There is no info about it in the guides, but it is documented in the API:
Creates a query runner used to perform queries on a single database connection. Using query runners you can control your queries to execute using a single database connection and manually control your database transaction.
Usage example:
const foo = async <T>(userId: string, callback: (em: EntityManager) => Promise<T>): Promise<T> => {
  const connection = getConnection();
  const queryRunner = connection.createQueryRunner();
  // Reserve a dedicated connection from the pool (session == connection)
  await queryRunner.connect();
  try {
    await queryRunner.manager.query(`SET app.current_user_id = '${userId}';`);
    return await callback(queryRunner.manager);
  } finally {
    // Always clear the variable and hand the connection back to the pool
    await queryRunner.manager.query(`RESET app.current_user_id;`);
    await queryRunner.release();
  }
};
This was my answer also for How to add a request timeout in Typeorm/Typescript?
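Usage might look like this (the user id and query are hypothetical):
const rows = await foo('42', (em) => em.query('SELECT * FROM documents'));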

AWS Lambda callback being blocked by open mongodb connection?

I have set up an AWS Lambda to save some data to MongoDB for me. I'd like to reuse the connection so I don't have to create a new one every time the Lambda is invoked. But if I leave the DB connection open, the callback for the Lambda handler doesn't work!
Is there something I'm doing wrong that's creating this behavior? Here is my code:
var MongoClient = require('mongodb').MongoClient

exports.handler = (event, context, callback) => {
  MongoClient.connect(process.env.MONGOURL, function (err, database) {
    //database.close();
    callback(null, "Successful db connection")
  });
}
This is caused by not setting context.callbackWaitsForEmptyEventLoop = false. If it is left at the default of true, the callback does not cause Lambda to return the response, because your open database connection keeps the event loop from being empty.
http://docs.aws.amazon.com/lambda/latest/dg/nodejs-prog-model-context.html
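Applied to the code above, a minimal fix looks like this (same handler with the flag set, plus a cache variable of my own so the connection is actually reused; this assumes the driver 3.x callback API used in the question):
var MongoClient = require('mongodb').MongoClient

var cachedDb = null // survives across warm invocations of this container

exports.handler = (event, context, callback) => {
  // Return as soon as the callback fires, even with the connection open
  context.callbackWaitsForEmptyEventLoop = false

  if (cachedDb) {
    return callback(null, "Successful db connection (cached)")
  }
  MongoClient.connect(process.env.MONGOURL, function (err, client) {
    if (err) return callback(err)
    cachedDb = client.db() // db name taken from the connection string
    callback(null, "Successful db connection")
  })
}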