Static lookup data stored on localhost for 1000+ users (connections) - PostgreSQL

Sometimes you have static data that is used by all customers. I am looking for a solution that serves this data from localhost (127.0.0.1) using some sort of database.
I have done some tests using Golang fetching from a local PostgreSQL database, and it works perfectly. But how does this scale to 1000+ users?
I noticed that only one session was started on the local server, regardless of which computer made the request (since I used 127.0.0.1 in Golang to call Postgres). At some point this may or may not become a bottleneck, with 1000 users sharing a single session.
My questions are:
How many concurrent users can PostgreSQL handle per session before it becomes a bottleneck? Or is this handled by the calling language (Golang)?
Is it even possible to handle many queries per session from different users?
Are there better ways to manage static lookup data for all customers than a local PostgreSQL database (Redis, perhaps)?
I hope this question fits this forum. Otherwise, please point me in the right direction.

Every session creates a new postgres process, which gets forked from the "main" postgres process listening on the port (default 5432).
By default, 100 sessions can be open in parallel, but this can easily be changed via max_connections in postgresql.conf.
No queries are executed in parallel within one session; a session processes one query at a time.
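On the Golang side, the usual way to share those sessions across many users is a single database/sql pool rather than one session per user. A minimal sketch, assuming the lib/pq driver and a hypothetical lookup table; the pool limits are illustrative and should stay below max_connections:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // Postgres driver; pgx's database/sql driver works the same way
)

func main() {
	// One shared *sql.DB for the whole application: it is a pool of sessions, not a single session.
	db, err := sql.Open("postgres", "host=127.0.0.1 port=5432 dbname=lookup sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Keep well below the server's max_connections (default 100).
	db.SetMaxOpenConns(20)
	db.SetMaxIdleConns(10)
	db.SetConnMaxLifetime(30 * time.Minute)

	var label string
	// Each query checks a session out of the pool and returns it when finished.
	if err := db.QueryRow("SELECT label FROM lookup WHERE id = $1", 42).Scan(&label); err != nil {
		log.Fatal(err)
	}
	log.Println(label)
}
```

With this setup, 1000+ concurrent users share at most 20 server sessions, and requests simply queue briefly when every pooled connection is busy.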

Related

How to set up multi-tenancy using row level security on Postgres with knex

I am architecting a database where I expect to have thousands of tenants, and some data will be shared between tenants. I am currently planning on using Postgres with row level security for tenant isolation. I am also using knex and Objection.js to model the database in node.js.
Most of the tutorials I have seen look like this, where you create a separate knex connection per tenant. However, I've run into a problem on my development machine: after I create ~100 connections, I receive this error: "remaining connection slots are reserved for non-replication superuser connections".
I'm investigating a few possible solutions/workarounds, but I was wondering if anyone has been able to make this setup work the way I'm intending. Thanks!
Perhaps one solution would be to cache a limited number of connections and destroy the oldest cached connection when the limit is reached. See this code as an example.
That code should probably be improved, however, to use a Map as the knexCache instead of a plain object, since a Map remembers insertion order.
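The same eviction idea is not knex-specific; sketched in Go, a bounded, insertion-ordered cache of per-tenant pools might look like the following (the per-tenant DSN layout, pool sizes, and cap are assumptions):

```go
package tenantpool

import (
	"container/list"
	"database/sql"
	"fmt"
	"sync"

	_ "github.com/lib/pq"
)

const maxPools = 40 // 40 tenants x 2 connections stays below the default max_connections of 100

type poolCache struct {
	mu    sync.Mutex
	pools map[string]*sql.DB
	order *list.List // remembers insertion order, much like a JS Map
}

var cache = poolCache{pools: map[string]*sql.DB{}, order: list.New()}

// PoolFor returns a cached pool for a tenant, destroying the oldest cached pool once the cap is reached.
func PoolFor(tenant string) (*sql.DB, error) {
	cache.mu.Lock()
	defer cache.mu.Unlock()

	if db, ok := cache.pools[tenant]; ok {
		return db, nil
	}
	if len(cache.pools) >= maxPools {
		oldest := cache.order.Remove(cache.order.Front()).(string)
		cache.pools[oldest].Close() // evict the oldest cached connection pool
		delete(cache.pools, oldest)
	}
	// Hypothetical DSN: adapt to however your tenants are isolated (role, schema, or database).
	dsn := fmt.Sprintf("host=127.0.0.1 dbname=app user=tenant_%s sslmode=disable", tenant)
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		return nil, err
	}
	db.SetMaxOpenConns(2) // keep each tenant's pool small
	cache.pools[tenant] = db
	cache.order.PushBack(tenant)
	return db, nil
}
```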

Is it possible to redirect SELECT statements on different server using postgres foreign data wrapper

We are using PostgreSQL version 13, and our database has a high volume of read requests (about 95%) versus 5% write requests. We also have a read-only replica in slave mode. So, to offload my primary server, I want to devise a strategy where I can redirect reads to the slave server (or maybe some other master-replica server) and writes to the primary server.
To achieve this, I found that the PostgreSQL foreign data wrapper can be used.
To test my idea, I created the remote server and was able to access its data.
So I am looking at whether I can devise a solution that redirects my read requests to the slave database using the foreign data wrapper, and more precisely whether this is a wise solution, since the foreign data wrapper accesses data from another database and requests can get slow.
Any help in this regard is highly appreciated.
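For illustration only, the read/write split the question describes can also be expressed at the application layer rather than through a foreign data wrapper; a minimal Go sketch with placeholder host names (this is an alternative approach, not the FDW mechanism being asked about):

```go
package readwritesplit

import (
	"context"
	"database/sql"

	_ "github.com/lib/pq"
)

// Splitter keeps one pool per role; the host names below are placeholders.
type Splitter struct {
	primary *sql.DB // receives the ~5% of traffic that writes
	replica *sql.DB // receives the ~95% read-only traffic
}

func New() (*Splitter, error) {
	primary, err := sql.Open("postgres", "host=primary.internal dbname=app sslmode=disable")
	if err != nil {
		return nil, err
	}
	replica, err := sql.Open("postgres", "host=replica.internal dbname=app sslmode=disable")
	if err != nil {
		return nil, err
	}
	return &Splitter{primary: primary, replica: replica}, nil
}

// Query routes read-only statements to the replica.
func (s *Splitter) Query(ctx context.Context, q string, args ...any) (*sql.Rows, error) {
	return s.replica.QueryContext(ctx, q, args...)
}

// Exec routes writes to the primary.
func (s *Splitter) Exec(ctx context.Context, q string, args ...any) (sql.Result, error) {
	return s.primary.ExecContext(ctx, q, args...)
}
```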

Postgres replication between 2 databases on same server

I need to create a replica of an existing database that copies every changing operation from master to slave, i.e., creates a mirror of sorts. I found a lot of examples on the web, but they all describe the process when the master and slave are on different servers.
I would like to create a write replica on the same server where the master is located, without spinning up a second instance of Postgres.
Is it possible to do so, and could you point me in a direction where I could find a solution for how to do it?
Thank you.
P.S. I understand that replication on 2 servers is better, but I just need to do it on one common server.
If you want physical replication, you will need to run two instances of PostgreSQL. If they are on the same server machine, they will need to have different port numbers. The different port numbers are the only complexity; otherwise it is just like running on two different servers.
If you want logical replication, you can do that within a single instance, but you will need to jump through some hoops to create the subscription intra-instance, as described in the "Notes" section of the CREATE SUBSCRIPTION documentation.
You could consider using a simple trigger to insert/update/delete data in the other database as soon as the main one gets modified.
A more "professional" way would be to use synchronous replication.

How should I manage postgres database handles in a serverless environment?

I have an API running in AWS Lambda and API Gateway, deployed using Up. My API creates a database connection on startup, and Lambda therefore does this when the function is triggered for the first time. My API is written in Node using Express and pg-promise to connect to and query the database.
The problem is that Lambda creates new instances of the function as it sees fit, and sometimes it appears as though there are multiple instances of it at one time.
I keep running out of DB connections because my Lambda function is using up too many database handles. If I log into Postgres and look at the pg_stat_activity view, I can see lots of connections to the database.
What is the recommended pattern for solving this issue? Can one limit the number of simultaneous instances of a function in Lambda? Can you share a connection pool across instances of a function? (I doubt it.)
UPDATE
AWS now provides a product called RDS Proxy which is a managed connection pooling solution to solve this very issue: https://aws.amazon.com/blogs/compute/using-amazon-rds-proxy-with-aws-lambda/
There are a couple of ways that you can run out of database connections:
You have more concurrent Lambda executions than you have available database connections. This is certainly possible.
Your Lambda function is opening database connections but not closing them. This is a likely culprit, since web frameworks tend to keep database connections open across requests (which is more efficient), but on Lambda they have no opportunity to close them, since AWS will silently terminate the instance.
You can solve 1 by controlling the number of available connections on the database server (the max_connections setting on PostgreSQL) and the maximum number of concurrent Lambda function invocations (as documented here). Of course, that just trades one problem for another, since Lambda will return 429 errors when it hits the limit.
Addressing 2 is more tricky. The traditional and right way of dealing with database connection exhaustion is to use connection pooling. But with Lambda you can't do that on the client, and with RDS you don't have the option to do that on the server. You could set up an intermediary persistent connection pooler, but that makes for a more complicated setup.
In the absence of pooling, one option is to create and destroy a database connection on each function invocation. Unfortunately that will add quite a bit of overhead and latency to your requests.
Another option is to carefully control your client-side and server-side connection parameters. The idea is first to have the database close connections after a relatively short idle time (on PostgreSQL this is controlled by the tcp_keepalives_* settings). Then, to make sure that the client never tries to use a closed connection, you set a connection timeout on the client (how to do so will be framework dependent) that is shorter than that value.
My hope is that AWS will give us a solution for this at some point (such as server-side RDS connection pooling). You can see various proposed solutions in this AWS forum thread.
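As a concrete illustration of the last option above, the client-side half (sketched here with Go's database/sql rather than the asker's pg-promise stack, and assuming a roughly 60-second server-side idle cutoff and a DATABASE_URL environment variable) is simply to keep the pool's idle lifetime shorter than whatever the server enforces:

```go
// Package dbpool is a sketch of client-side connection-lifetime settings only.
package dbpool

import (
	"database/sql"
	"os"
	"time"

	_ "github.com/lib/pq"
)

// Open returns a pool whose idle connections are dropped well before the
// assumed ~60-second server-side cutoff, so the client never reuses a
// connection the server has already closed.
func Open() (*sql.DB, error) {
	db, err := sql.Open("postgres", os.Getenv("DATABASE_URL"))
	if err != nil {
		return nil, err
	}
	db.SetMaxOpenConns(1)                   // a Lambda instance serves one request at a time
	db.SetMaxIdleConns(1)
	db.SetConnMaxIdleTime(30 * time.Second) // shorter than the server-side idle cutoff
	db.SetConnMaxLifetime(5 * time.Minute)  // recycle connections periodically regardless
	return db, nil
}
```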
You have two options to fix this:
You can tweak Postgres to disconnect those idle connections. This is the best way, but it may require some trial and error.
You have to make sure that you connect to the database inside your handler and disconnect before your function returns or exits. In Express, you'll have to connect and disconnect inside your route handlers.
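The asker's stack is Node/Express, but the connect-in-the-handler pattern itself is language-agnostic. A sketch of it in Go with aws-lambda-go, assuming a DATABASE_URL environment variable:

```go
package main

import (
	"context"
	"database/sql"
	"os"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
	_ "github.com/lib/pq"
)

func handler(ctx context.Context, req events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
	// Open the connection inside the handler, not at init time...
	db, err := sql.Open("postgres", os.Getenv("DATABASE_URL"))
	if err != nil {
		return events.APIGatewayProxyResponse{StatusCode: 500}, err
	}
	// ...and close it before the invocation returns, so a frozen or
	// terminated instance never strands an open session.
	defer db.Close()

	var now string
	if err := db.QueryRowContext(ctx, "SELECT now()").Scan(&now); err != nil {
		return events.APIGatewayProxyResponse{StatusCode: 500}, err
	}
	return events.APIGatewayProxyResponse{StatusCode: 200, Body: now}, nil
}

func main() {
	lambda.Start(handler)
}
```

As the answer above notes, this trades some per-request latency for never leaking connections.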

Data sharding based on schemas in PostgreSQL

I would like to develop a multi-tenant web application using a PostgreSQL DB, keeping the data of each tenant in a dedicated schema.
Each query or update will access only a single tenant schema and/or the public schema.
Assuming I will, at some point, need to scale out and have several PostgreSQL servers, is there some automatic way in which I can connect to a single load balancer of some sort that will redirect the queries/updates to the relevant server, based on the required schema?
The challenging part of this question is 'automatic way'. I have a feeling that Postgres is moving that way; maybe 9.5 or later will have multi-master tendencies, with partitioning allowing data to be spread across a cluster so that your frontend doesn't have to change.
Assuming that your tenants can operate in separate databases, and you are looking for a way to run each query against the correct database, perhaps something like DNS could be used when connecting to the database, using the tenant ID as a component of the DNS host name. Something like:
tenant_1.example.com -> 192.168.0.10
tenant_2.example.com -> 192.168.0.11
tenant_3.example.com -> 192.168.0.11
etc.example.com -> 192.168.0.X
Then you could use the connection as a map to the correct DB installation. The tricky part here is the overlapping data that all tenants need access to. If that overlapping data needs to be joined against, it will have to exist in all databases, either copied or reached via dblink. If the overlapping data needs to be updated, then doing so automatically is going to be tough. Good question.
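On the application side, the tenant-in-the-hostname idea could be as simple as building the DSN from the tenant ID and letting DNS (per the mapping above) decide which server answers; the database name, user, and driver here are assumptions:

```go
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/lib/pq"
)

// poolForTenant embeds the tenant ID in the host name, so the DNS mapping
// above determines which PostgreSQL server actually receives the connection.
func poolForTenant(tenantID int) (*sql.DB, error) {
	host := fmt.Sprintf("tenant_%d.example.com", tenantID)
	dsn := fmt.Sprintf("host=%s port=5432 dbname=app user=app sslmode=disable", host)
	return sql.Open("postgres", dsn)
}

func main() {
	db, err := poolForTenant(2) // resolves to 192.168.0.11 in the example mapping
	if err != nil {
		panic(err)
	}
	defer db.Close()
}
```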