pg-promise and Row Level Security

I am looking at implementing Row Level security with our node express + pg-promise + postgres service.
We've tried a few approaches, with no success:
- a getDb(tenantId) wrapper that runs the SET app.current_tenant = '${tenantId}'; SQL statement before returning the shared db object
- a getDb(tenantId) wrapper that creates a new db object every time - this works for a few requests but eventually errors out with too many db connections (understandable, as it bypasses pg-promise's connection pool management)
- a getDb(tenantId) wrapper that keeps a map of db objects, one per tenant - this works for a short while but also eventually results in too many db connections
- the initOptions > connect event - we have not found a way to get hold of the current request object there (to then set the tenant_id)
Can someone (hopefully vitaly-t :)) please suggest the best strategy for injecting the current tenant before any SQL queries are run on a connection?
Thank you very much.
Here is an abbreviated code example:
const promise = require('bluebird');

const initOptions = {
    promiseLib: promise,
    connect: async (client, dc, useCount) => {
        try {
            // "hook" into the db connect event and set the tenantId, so all future SQL queries in this connection
            // have an implied WHERE tenant_id = current_setting('app.current_tenant')::UUID (aka Postgres Row Level Security)
            const tenantId = client.$ctx?.cn?.tenantId || client.$ctx?.cnOptions?.tenantId;
            if (tenantId) {
                await client.query(`SET app.current_tenant = '${tenantId}';`);
            }
        } catch (ex) {
            log.error('error in db.js initOptions', {ex});
        }
    }
};
const pgp = require('pg-promise')(initOptions);
const options = tenantIdOptional => {
    return {
        user: process.env.POSTGRES_USER,
        host: process.env.POSTGRES_HOST,
        database: process.env.POSTGRES_DATABASE,
        password: process.env.POSTGRES_PASSWORD,
        port: process.env.POSTGRES_PORT,
        max: 100,
        tenantId: tenantIdOptional
    };
};
const db = pgp(options());
const getDb = tenantId => {
    // how to inject tenantId into the db object?
    // 1. this was getting "WARNING: Creating a duplicate database object for the same connection" and "Error: write EPIPE"
    // const tmpDb = pgp(options(tenantId));
    // return tmpDb;
    // 2. this was running SET app.current_tenant BEFORE the database connection was established
    // const setTenantId = async () => {
    //     await db.query(`SET app.current_tenant = '${tenantId}';`);
    // };
    // setTenantId();
    // return db;
    // 3. this bypasses the connection pool management - and is not working
    // db.connect(options(tenantId));
    // return db;
    return db;
};
// Exporting the global database object for shared use:
const exportFunctions = {
    getDb,
    db // also export db for the legacy non-Row-Level-Security areas of the service
};
module.exports = exportFunctions;

The SET operation is connection-bound, i.e. it only has effect while the current connection session lasts. For fresh connections spawned by the pool, you need to re-apply the settings.
The standard way of controlling the current connection session is via tasks:
await db.task('my-task', async t => {
    await t.none('SET app.current_tenant = ${tenantId}', {tenantId});
    // ... run all session-related queries here
});
Or you can use method tx instead, if a transaction is needed.
But if you have tenantId known globally, and you want it automatically propagated across all connections, then you can use event connect instead:
const initOptions = {
    connect(client) {
        // SET cannot take bind parameters in PostgreSQL, so use set_config() instead:
        client.query('SELECT set_config($1, $2, false)', ['app.current_tenant', tenantId]);
    }
};
The latter is somewhat of an afterthought workaround, but it works reliably, has the best performance, and avoids creating the extra tasks.
"have not found a way to get hold of the current request object (to then set the tenant_id)"
This should be very straightforward for any HTTP library out there, but is outside of scope here.
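For example, with Express you could pull the tenant from each request and run that request's queries inside a task or transaction. A minimal sketch, assuming the db object from the question, a hypothetical x-tenant-id header, and a hypothetical withTenant helper:

const express = require('express');
const app = express();

// Hypothetical helper: runs the callback inside a transaction, with the tenant applied first.
// set_config(..., true) makes the setting local to the transaction, so it cannot
// leak to the next user of the pooled connection.
const withTenant = (tenantId, cb) =>
    db.tx(async t => {
        await t.one('SELECT set_config($1, $2, true)', ['app.current_tenant', tenantId]);
        return cb(t);
    });

app.get('/orders', async (req, res) => {
    const tenantId = req.headers['x-tenant-id']; // assumption: tenant id arrives per request
    const orders = await withTenant(tenantId, t => t.any('SELECT * FROM orders'));
    res.json(orders);
});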

Related

Trying to use Knex onConflict times out my Cloud Function

I am trying to insert GeoJSON data into a PostGIS instance on a regular schedule, and there is usually duplicate data each time it runs. I am looping through the GeoJSON features and trying to use Knex.js's onConflict modifier to ignore rows where a duplicate url key field is found, but it times out my cloud function.
async function insertFeatures() {
    try {
        const results = await getGeoJSON();
        pool = pool || (await createPool());
        const st = knexPostgis(pool);
        for (const feature of results.features) {
            const { geometry, properties } = feature;
            const { region, date, type, name, url } = properties;
            const point = st.geomFromGeoJSON(geometry);
            await pool('observations')
                .insert({
                    region: region,
                    url: url,
                    date: date,
                    name: name,
                    type: type,
                    geom: point,
                })
                .onConflict('url')
                .ignore();
        }
    } catch (error) {
        console.log(error);
        return res.status(500).json({
            message: error + "Poop"
        });
    }
}
The timeout could be caused by a variety of reasons: the transaction batch size your function is processing, the connection pool size, or database server limits. One thing to check in your cloud function: when setting up the pool, knex lets you optionally register an afterCreate callback. If you add this callback, make sure you invoke the done callback that is passed as the last parameter to it, otherwise no connection will ever be acquired and acquisition will time out.
One way to see what knex is doing internally is to set the DEBUG=knex:* environment variable before running the code, so that knex prints information about queries, transactions, and pool connections as the code executes. It is also advisable to set batch sizes, the connection pool size, and connection limits on the database server to match the workload you are pushing to the server; this avoids the most basic timeout issues.
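A minimal sketch of a pool configured with an afterCreate callback that always calls done (the connection string and pool sizes are placeholder assumptions):

const knex = require('knex')({
    client: 'pg',
    connection: process.env.DATABASE_URL, // assumption: connection string kept in env
    pool: {
        min: 0,
        max: 10,
        afterCreate: (conn, done) => {
            // always invoke done(err, conn), or the pool never hands this connection out
            conn.query('SELECT 1', (err) => done(err, conn));
        },
    },
});

Batching the writes also helps with workload: collecting the rows into an array and issuing a single insert(rows).onConflict('url').ignore() call avoids one database round trip per feature.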
Also check for similar examples here:
Knex timeout error acquiring connection
When trying to mass insert timeout occurs for knexjs error
Having timeout error after upgrading knex
Knex timeout acquiring a connection

PostgreSQL SET runtime variables with typeorm, how to ensure the session is isolated?

I would like to set runtime variables for each executed query, without using transactions.
For example:
SET app.current_user_id = ${userId};
How can I ensure the session is isolated, and prevent race conditions on the DB?
To ensure the session is isolated, you'll need to work with a specific connection from the pool. In Postgres, SESSION and CONNECTION are equivalent.
The relevant typeORM method is createQueryRunner. There is no info about it in the docs, but it is documented in the API:
Creates a query runner used for perform queries on a single database
connection. Using query runners you can control your queries to
execute using single database connection and manually control your
database transaction.
Usage example:
import { getConnection, EntityManager } from 'typeorm';

const foo = async <T>(userId: string, callback: (em: EntityManager) => Promise<T>): Promise<T> => {
    const queryRunner = getConnection().createQueryRunner();
    // reserve one connection from the pool; the SET below applies to it alone
    await queryRunner.connect();
    try {
        await queryRunner.manager.query(`SET app.current_user_id = ${userId};`);
        return await callback(queryRunner.manager);
    } finally {
        // always reset the variable and release the connection back to the pool
        await queryRunner.manager.query(`RESET app.current_user_id;`);
        await queryRunner.release();
    }
};
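Usage might then look like this (currentUserId and the query are hypothetical):

const rows = await foo(currentUserId, em =>
    em.query('SELECT * FROM posts') // RLS policies now see app.current_user_id
);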
This was also my answer for How to add a request timeout in Typeorm/Typescript?

Initial Request Slow on Lambda Due to DB Connection

When my lambda function is invoked, it connects to my MongoDB Atlas instance, which significantly slows down the response by 1000-2000 ms.
I can cache the DB connection, but the cache only lasts if requests are made quickly after the last one; it would not persist for a request made an hour later.
Do any of the native AWS DBs avoid this problem and allow an instant connection every time (DocumentDB, DynamoDB, etc.)?
CODE
import { MongoClient } from 'mongodb'

let response
let cachedDb = null
const uri = 'mongodb+srv://XXXX'

function connectToDatabase(uri) {
    if (cachedDb && cachedDb.serverConfig.isConnected()) {
        console.log('=> using cached database instance')
        return Promise.resolve(cachedDb)
    }
    const dbName = 'test'
    return MongoClient.connect(uri, { useNewUrlParser: true, useUnifiedTopology: true }).then(
        client => {
            cachedDb = client.db(dbName)
            return cachedDb
        }
    )
}

export async function lambdaHandler() {
    try {
        const client = await connectToDatabase(uri)
        const collection = client.collection('users')
        const profile = await collection.findOne({ user: 'myuser' })
        response = profile
    } catch (err) {
        console.log(err)
        return err
    }
    return response
}
We have the same issue with MySQL connections; the cached variables disappear when the lambda function cold-starts.
The only solution I have is to keep the cache alive with function warming.
Just set up a periodic cron job to trigger your function every 5-15 minutes, and rest assured, it will never go idle.
You can also check this one: https://www.npmjs.com/package/lambda-warmer
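A minimal sketch of a warm-ping guard, assuming the scheduled event carries a { warmer: true } marker (the field name is an assumption; lambda-warmer documents its own):

export async function lambdaHandler(event) {
    // short-circuit keep-warm pings so they don't touch the database
    if (event && event.warmer) {
        return { warmed: true }
    }
    const client = await connectToDatabase(uri)
    const collection = client.collection('users')
    return collection.findOne({ user: 'myuser' })
}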
You are facing a cold start; it's not related to the DB connection.
In order to keep your lambda function warm, you can set up a CloudWatch event that triggers the Lambda periodically (normally once per 5 minutes should be enough).
Also, if you are using DocumentDB, you must put the Lambda into a VPC. That requires an ENI (Elastic Network Interface) to be provisioned, which adds more time to the start. So if you can avoid using a VPC, that could give you some performance advantage.
More info:
Good article about cold start
AWS Lambda Execution Context

socket.io duplicate emit events on browser refresh

I'm running into an issue with my socket.io implementation and don't know how to solve it. I'm using pg_notify with LISTEN, so when a certain value is modified in the db, it emits 'is_logged_in' to a certain client.
That in itself is working fine - my issue is that when I refresh the page, socket.io disconnects the current socket_id and creates a new socket_id as usual, but when this happens it creates a second pgsql client instance and duplicates requests - it fires the 'logged_in' event 2x.
If I refresh the page again, and then manually fire the pg 'logged_in' trigger, it will now emit 3 times, etc. I have a leak.
const io = require('socket.io')();
const pg = require('pg');

io.on('connection', (socket) => {
    const pgsql = new pg.Client({
        host, port, user, password, database
    });
    pgsql.connect();
    pgsql.query("LISTEN logged_in");
    pgsql.on('notification', function (data) {
        const json = JSON.parse(data.payload); // trigger sends a JSON payload with the socket_id
        socket.to(json.socket_id).emit('is_logged_in', { status: 'Y' });
    });
    socket.on('disconnect', () => {
        //pgsql.end();
    });
});
I've tried killing the pgsql instance (in the socket.on disconnect) but for some reason the LISTEN stops working when I do that.
I've also tried moving the new pg.Client outside the io.on connection but when I refresh the page, the old socket_id disconnects, the new one connects, and it never executes the code to recreate the pg client.
Any ideas?
These are probably what's creating the problems:
- the pgsql instance is created on each socket connection request and is not destroyed on disconnection
- the notification handler is not removed on disconnection
I'm not very familiar with postgres, but I have worked extensively with sockets, so something like this should fix your issue:
const io = require('socket.io')();
const pg = require('pg');

const pgsql = new pg.Client({
    host, port, user, password, database
});
pgsql.connect();

io.on('connection', (socket) => {
    pgsql.query("LISTEN logged_in"); // LISTEN is per session, so re-issuing it is harmless
    const handler = function (data) {
        const json = JSON.parse(data.payload);
        socket.to(json.socket_id).emit('is_logged_in', { status: 'Y' });
        // You could also do pgsql.off('notification', handler) here,
        // or check if pgsql.once is suitable, if this handler should only fire once
    };
    pgsql.on('notification', handler);
    socket.on('disconnect', () => {
        pgsql.off('notification', handler);
        //pgsql.end(); // call this in server-termination logic instead
    });
});
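An alternative sketch: register a single global notification handler and fan out through socket.io itself, so no per-socket listeners are needed at all (this assumes the trigger's payload is JSON carrying the target socket_id):

pgsql.query('LISTEN logged_in');
pgsql.on('notification', (msg) => {
    // one listener for the whole process; socket.io routes to the right client by room
    const { socket_id } = JSON.parse(msg.payload);
    io.to(socket_id).emit('is_logged_in', { status: 'Y' });
});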

Get connection used by DatabaseFactory.GetDatabase().ExecuteReader()

We have two different query strategies that we'd ideally like to operate in conjunction on our site without opening redundant connections. One strategy uses the Enterprise Library to pull Database objects and run Execute_____(DbCommand) calls on the Database, without directly selecting any sort of connection. Effectively like this:
Database db = DatabaseFactory.CreateDatabase();
DbCommand q = db.GetStoredProcCommand("SomeProc");
using (IDataReader r = db.ExecuteReader(q))
{
    List<RecordType> rv = new List<RecordType>();
    while (r.Read())
    {
        rv.Add(RecordType.CreateFromReader(r));
    }
    return rv;
}
The other, newer strategy, uses a library that asks for an IDbConnection, which it Close()es immediately after execution. So, we do something like this:
DbConnection c = DatabaseFactory.CreateDatabase().CreateConnection();
using (QueryBuilder qb = new QueryBuilder(c))
{
    return qb.Find<RecordType>(ConditionCollection);
}
But the connection returned by CreateConnection() isn't the same one used by Database.ExecuteReader(), which is apparently left open between queries. So, when we call a data access method using the new strategy after one using the old strategy inside a TransactionScope, it causes unnecessary promotion to a distributed transaction - promotion that I'm not sure we have the ability to configure for (we don't have administrative access to the SQL Server).
Before we go down the path of modifying the query-builder library to work with the Enterprise Library's Database objects... is there a way to retrieve, if it exists, the open connection last used by one of the Database.Execute_______() methods?
Yes, you can get the connection associated with a transaction. Enterprise Library internally manages a collection of transactions and their associated database connections, so if you are in a transaction you can retrieve the connection associated with a database using the static TransactionScopeConnections.GetConnection method:
using (var scope = new TransactionScope())
{
    IEnumerable<RecordType> records = GetRecordTypes();
    Database db = DatabaseFactory.CreateDatabase();
    DbConnection connection = TransactionScopeConnections.GetConnection(db).Connection;
}

public static IEnumerable<RecordType> GetRecordTypes()
{
    Database db = DatabaseFactory.CreateDatabase();
    DbCommand q = db.GetStoredProcCommand("GetLogEntries");
    using (IDataReader r = db.ExecuteReader(q))
    {
        List<RecordType> rv = new List<RecordType>();
        while (r.Read())
        {
            rv.Add(RecordType.CreateFromReader(r));
        }
        return rv;
    }
}