Is there a way to Rate Limit or Throttle a user or a connection in PostgreSQL? - postgresql

We have a setup wherein a Database instance is shared between multiple users.
We are trying to implement some form of throttling or rate limiting for the shared PostgreSQL instance so that one user cannot starve the other users by consuming all the resources.
One approach we can think of is adding connection pools and fixing the number of connections that we give each tenant.
But one user can still hog all the resources over a few connections. Is there a way to throttle resource usage per connection or per user in PostgreSQL?

No, the Postgres documentation makes it clear that this is not possible using Postgres alone.
It's usually a (very) bad sign if your application allows one user to starve others of resources - it suggests you've got a bottleneck in your application, and that bottleneck will appear when you least want it to.
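If you do go down the per-tenant pool route from the question, a minimal sketch of what that might look like is below, assuming HikariCP and the PostgreSQL JDBC driver (the host, database name and pool sizes are placeholders). It only caps the number of connections per tenant; as noted above, it does not cap CPU or I/O per user.

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class TenantPools {
    // Hypothetical helper: one small, fixed-size pool per tenant so a single
    // tenant cannot hold more than maxConnections sessions at once.
    public static HikariDataSource poolFor(String tenantUser, String password, int maxConnections) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://db-host:5432/shared_db"); // placeholder host/db
        config.setUsername(tenantUser);
        config.setPassword(password);
        config.setMaximumPoolSize(maxConnections); // hard cap on concurrent connections
        config.setConnectionTimeout(5_000);        // fail fast instead of queueing forever
        // Note: this limits connections, not CPU or I/O -- one long-running query
        // on a single connection can still hog the server.
        return new HikariDataSource(config);
    }
}
```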

Related

Do more instances mean more connections?

I am trying to build a messaging app in which each message is inserted into the database. My backend will also hit the database every second to retrieve the latest messages. I am worried that if I have many users using the messaging feature, I will hit the database's maximum number of connections quickly and the app will stop working for other users. So, how can I make sure this problem will not happen?
I am just wondering: if I create two db-g1-small instances, will I have twice the number of connections (1,000 per instance)? Google's documentation says a db-g1-small instance allows 1,000 maximum connections.
How can I keep track of the number of connections? What will happen if the number of connections to the database reaches the maximum?
https://cloud.google.com/sql/pricing#2nd-gen-instance-pricing
You shouldn't have a unique connection per user to your database. Instead, your backend should use connection pooling to maintain a consistent number of connections to your instance. You can view some examples of how best to do this on the Managing Database Connections page.
It's incredibly unlikely that you'll need 1,000 open connections. Most applications use far, far fewer for optimal performance. You can check out this article about benchmarking different connection pool sizes.
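If you want to keep an eye on the connection count yourself (the "how can I keep track" part of the question), a rough sketch using plain JDBC is below, assuming a Cloud SQL for PostgreSQL instance; the connection details are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ConnectionMonitor {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details -- adjust for your instance.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://db-host:5432/postgres", "monitor_user", "secret");
             Statement st = conn.createStatement()) {

            // Each row in pg_stat_activity is one open backend/session.
            try (ResultSet rs = st.executeQuery("SELECT count(*) FROM pg_stat_activity")) {
                rs.next();
                System.out.println("open connections: " + rs.getLong(1));
            }

            // The configured ceiling -- once it is reached, new clients get
            // "too many connections" errors until sessions are freed.
            try (ResultSet rs = st.executeQuery("SHOW max_connections")) {
                rs.next();
                System.out.println("max_connections: " + rs.getString(1));
            }
        }
    }
}
```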

Revoke JWT session tokens with a blacklist. Should I create another system for the blacklist for performance?

I'm creating a web application (in C++, for performance) where I'm expecting to process a tremendous number of events per second - like thousands.
I've been reading about invalidating JWT tokens in my web sessions, and the most reasonable solution for that is to have a storage place for blacklisted tokens. The list has to be checked for every request, and what I'm wondering about is performance related: should I create a separate system for storing my blacklisted tokens (like Redis)? Or should I just use the same PostgreSQL database I'm using for everything else? What are the advantages of using another system?
The reason I'm asking is that I saw many discussions about invalidating JWT tokens online, and many suggest using Redis (without explaining whether it's just a solution relevant to their design or a replacement for their SQL database server for some reason). Well, why not use the same database you're using for your web application? Is there a reason that makes Redis better for this?
Redis is a lot faster since it keeps the data in the server's memory, rather than opening a connection to the DB, querying and returning the results. So if speed is important, then Redis is what you would want.
The only negative is that if the server restarts, the blacklisted tokens are gone, unless you save them to disk somewhere.
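As a rough sketch of the Redis option (assuming the Jedis client; the key prefix and TTL handling are illustrative), you could key each revoked token by its JWT ID and let the entry expire when the token itself would have expired, so the blacklist cleans itself up:

```java
import redis.clients.jedis.Jedis;

public class TokenBlacklist {
    private final Jedis jedis;

    public TokenBlacklist(Jedis jedis) {
        this.jedis = jedis;
    }

    // Mark a token as revoked until the moment it would have expired anyway.
    // "jti" is the JWT ID claim; ttlSeconds is the token's remaining lifetime.
    public void revoke(String jti, int ttlSeconds) {
        jedis.setex("blacklist:" + jti, ttlSeconds, "revoked");
    }

    // Check on every request whether the presented token has been revoked.
    public boolean isRevoked(String jti) {
        return jedis.exists("blacklist:" + jti);
    }
}

// Usage sketch (host/port and token ID are placeholders):
// try (Jedis jedis = new Jedis("localhost", 6379)) {
//     TokenBlacklist blacklist = new TokenBlacklist(jedis);
//     blacklist.revoke("a1b2c3", 3600);
//     boolean blocked = blacklist.isRevoked("a1b2c3");
// }
```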

MongoDB connection overhead on client side

We are evaluating different alternatives for multi-tenancy in our platform. We think that one database per customer is the way to go as data structure and requirements are completely different from one customer to another, and we want to keep them as isolated as possible.
However we are facing the question of how to manage the connection to multiple databases. We don't want to have one app instance per customer. Instead we want to have a pool of app instances handling requests for all our customers and use the correct database depending on the customer.
Our concern is whether keeping connections open to many (maybe thousands of) databases will cause a performance issue. We are mostly worried about memory usage, so we are wondering what the client-side overhead of a connection to the MongoDB server is.
We are also thinking about moving the database access to a separate service, which would be responsible for handling the database connections for all customers. In this case, is there an existing tool that allows that kind of "multiplexing" of MongoDB databases? (A rough sketch of the idea follows the notes below.)
Some additional notes:
We discarded sharding. It won't fit our needs. We need different databases.
Databases will be on different servers with reserved resources. This means each database runs its own mongod process, and we need separate connections.
We use the Java driver.
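A rough sketch of the "multiplexing" service we have in mind, using the Java driver (the host URIs and tenant lookup are placeholders), would cache one MongoClient per tenant server and reuse it, since each client keeps its own connection pool and getDatabase() is a lightweight call:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TenantMongoRouter {
    // One client (and therefore one connection pool) per tenant server,
    // created lazily and shared by all requests for that tenant.
    private final Map<String, MongoClient> clientsByUri = new ConcurrentHashMap<>();

    public MongoDatabase databaseFor(String tenantServerUri, String tenantDbName) {
        MongoClient client = clientsByUri.computeIfAbsent(
                tenantServerUri, MongoClients::create);
        // getDatabase() does not open new connections by itself.
        return client.getDatabase(tenantDbName);
    }
}

// Usage sketch (URI and database name are placeholders):
// TenantMongoRouter router = new TenantMongoRouter();
// MongoDatabase db = router.databaseFor("mongodb://tenant1-host:27017", "tenant1_db");
```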

Postgres 9.0 and pgpool replication : single point of failure?

My application uses PostgreSQL 9.0 and is composed of one or more stations that interact with a global database: it is like a common client-server application, but to avoid any additional hardware, every station includes both client and server. A main station is promoted to also act as the server, and the others act as clients to it. This solution keeps things scalable: a user may initially need a single station and can decide to expand to more in the future, without a separate server sitting uselessly in the initial phase.
I'm trying to avoid a situation where, if the main station goes down, all the others stop working; the best solution seems to be continuously replicating the main database to an unused database on one or more of the other stations.
Searching around, I've found that pgpool can be used for my needs, but from all the examples and tutorials it seems that the point of failure simply moves from the main database to the server that runs pgpool.
I read something about running multiple pgpool instances with a heartbeat tool, but it isn't clear how to do it.
Considering my architecture, where no separate, specialized servers exist, can someone give me some hints? It seems that pgpool handles failover automatically; can I assume that a failover situation can be handled by a standard user without the intervention of an administrator?
For this kind of application I really like Amazon's Dynamo design. The document at the link is quite big, but it is worth reading. In fact, there are applications that already implement this approach:
mongoDB
Cassandra
Project Voldemort
Maybe others too, but I'm not aware of them. Cassandra started within Facebook, and Voldemort is the one used by LinkedIn. By making things distributed and adding redundancy to your data distribution, you step away from traditional master-slave replication approaches.
If you'd like to stay with PostgreSQL, it shouldn't be a big deal to implement such an approach. You will need to implement an extra layer (a proxy) that decides, based on pre-configured options, how to retrieve/save the data.
The proxying layer can be implemented in:
application (requires lots of work, IMHO);
database;
as a middleware.
You can use PL/Proxy on the middleware layer, a project that originated at Skype. It is deeply integrated into PostgreSQL, so I'd say it is a combination of options 2 and 3. PL/Proxy requires you to use functions for all kinds of queries against the database.
If you hit performance issues, PgBouncer can be used.
Last note: whichever way you decide to go, a certain amount of development will be required.
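As a very rough illustration of option 1 (the proxy layer inside the application), here is a sketch in Java; the read/write routing rule and the data sources are placeholders for whatever pre-configured options you choose:

```java
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;

public class ApplicationSideRouter {
    private final DataSource master;          // all writes go here
    private final List<DataSource> replicas;  // reads are spread across these

    public ApplicationSideRouter(DataSource master, List<DataSource> replicas) {
        this.master = master;
        this.replicas = replicas;
    }

    // Write queries must hit the master.
    public Connection connectionForWrite() throws SQLException {
        return master.getConnection();
    }

    // Read-only queries can go to any replica; a trivial random pick stands in
    // for whatever load-balancing rule you actually configure.
    public Connection connectionForRead() throws SQLException {
        int index = (int) (Math.random() * replicas.size());
        return replicas.get(index).getConnection();
    }
}
```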
EDIT:
It all depends on what you call a “failure” and when you consider the system to be in an interrupted state.
Let's look at the pgpool features.
Connection Pooling: PostgreSQL uses a single process (fork) per session. Obviously, if you have a very busy site, you'll hit the OS limit. To overcome this, connection poolers are used. They also allow you to use your resources evenly, so generally it's a good idea to have a pooler in front of your database. In case of a pgpool outage you'll face a big number of clients unable to reach your database, and if you point them directly at the database, bypassing the pooler, you'll face performance issues.
Replication: All your queries will be auto-replicated to the slave instances; this applies to DML and DDL queries. In case of a pgpool outage your replication will stop and the slaves will not be able to catch up with the master, as there's no change tracking done outside pgpool (as far as I know).
Load Balancing: Your read-only queries will be spread across several instances, achieving nice response times and allowing you to put more load on the system. In case of a pgpool outage your queries will suddenly run much slower, if the system is capable of handling such load at all, and that is assuming the master database picks up the work of the failed pgpool.
Limiting Exceeding Connections: pgpool will queue connections if they cannot be processed immediately. In case of a pgpool outage all such connections will be aborted, which might break the DB/application protocol, i.e. if the application was designed to never get connection aborts.
Parallel Query: A single query is executed on several nodes to reduce response time. In case of a pgpool outage such queries will not be possible, resulting in longer processing times.
If you're fine with facing such conditions and don't treat them as failures, then pgpool can serve you well. But if 5 minutes of outage will cost your company several thousand dollars, then you should look for a more solid solution.
The higher the cost of an outage, the more fine-tuned the failover system should be.
Typically, it is not just single tool used to achieve failover automation.
For each kind of failure you will have to handle:
DNS, unless you want to reconfigure all clients;
re-initializing backups and failover procedures;
making sure the old master will not try to fight for its role in case it comes back (STONITH);
in my experience, it's people from the DBA, SysAdmin, Architecture and Operations departments who decide on the proper strategies.
Finally, in my view, pgpool is a good tool, and I do use it. But it is not designed as a complete failover solution, not without extra thinking, measures taken and scripts written. That's why I've provided links to the distributed databases: they provide a much higher level of availability.
And PostgreSQL can be made distributed with a little effort thanks to its great extensibility.
First of all, I'd recommend checking out pgBouncer rather than pgpool. Next, what level of scaling are you attempting to reach? You might just choose to run your connection pooler on all your client systems (bouncer is light enough for this to work).
That said, vyegorov's answer is probably the direction you should really be looking at in this day and age. Are you sure you really need a database?
EDIT
So, the rather obvious answer is that pgPool creates a single point of failure if you only have one box running it. The obvious solution is to run multiple poolers across multiple boxes. You then need to engineer your application code to handle database disconnections. This is not as easy as it sounds, but basically you need to use two-phase commit for non-idempotent changes, so to the greatest extent possible you should make your changes idempotent.
Based on your comments, I'd guess that maybe you have limited experience dealing with database replication? pgPool does statement-based replication. There are tradeoffs here. The benefit is that it's very easy to set up. The downside is that there is no guarantee that the data on the replicated databases will be identical. It is also (I believe, but haven't checked lately) not compatible with 2PC.
My prior comment asking whether you really need a database was driven by my perception that you have designed a system without going into much detail around this part of it. I have about two decades of experience working on "this part" of similar systems. I expect you will find that there are no out-of-the-box solutions and that the issues involved get very complicated. In other words, I'm suggesting you reconsider your design.
Try reading this blog (with lots of information about PostgreSQL and PgPool-II):
https://www.itenlight.com/blog/2016/05/21/PostgreSQL+HA+with+pgpool-II+-+Part+5
Search for "WATCHDOG" on that same blog. With that you can configure a PgPool-II cluster. Two machines on the same subnet are required, though, and a virtual IP on the same subnet.
Hope this is useful for anyone trying the same thing (even if this answer is quite late).
PGPool certainly becomes a single point of failure, but it is a much smaller one than a Postgres instance.
Though I have not attempted it yet, it should be possible to have two machines with PGPool installed, but only running on one. You can then use Linux-HA to restart PGPool on the standby host if the primary becomes unavailable, and to optionally fail it back again when the primary comes back. You can at the same time use Linux-HA to move a single virtual IP over as well, so that your clients can connect to a single IP for their Postgres services.
Death of the postgres server will make PGPool send queries to the backup Postgres (promoting it to master if necessary).
Death of the PGPool server will cause a brief outage (configurable, but likely in the region of under a minute) until PGPool starts up on the standby, the IP address is claimed, and a gratuitous ARP is sent out. Of course, the client will have to be intelligent enough to reconnect without dying.
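To illustrate the "reconnect without dying" point, here is a simplified sketch assuming JDBC and an idempotent operation; the retry count and backoff are arbitrary:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class RetryingClient {
    // Retry an idempotent operation across the brief outage while the
    // virtual IP and PGPool move over to the standby host.
    public static <T> T withRetries(String jdbcUrl, String user, String password,
                                    SqlWork<T> work) throws SQLException, InterruptedException {
        SQLException last = null;
        for (int attempt = 1; attempt <= 5; attempt++) {
            try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password)) {
                return work.run(conn);
            } catch (SQLException e) {
                last = e;                                        // refused or aborted mid-failover
                if (attempt < 5) Thread.sleep(2_000L * attempt); // back off and try again
            }
        }
        throw last;
    }

    @FunctionalInterface
    public interface SqlWork<T> {
        T run(Connection conn) throws SQLException;
    }
}
```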

mongodb & max connections

I am going to have a website with 20k+ concurrent users.
I am going to use MongoDB with one management node and 3 or more nodes for data sharding.
Now my problem is the maximum number of connections. If I have that many users accessing the database, how can I make sure they don't hit the maximum limit? Also, do I have to change anything, maybe on the kernel, to increase the number of connections?
Basically the database will be used to keep track of which users are connected to the site, so there are going to be heavy read/write operations.
Thank you in advance.
You don't want to open a new database connection each time a new user connects. I don't know if you'll be able to scale to 20k+ concurrent users easily, since MongoDB uses a new thread for each new connection. You want your web app backend to have just one to a few database connections open and just use those in a pool, particularly since web usage is very asynchronous and event driven.
see: http://www.mongodb.org/display/DOCS/Connections
The server will use one thread per TCP connection, therefore it is highly recommended that your application use some sort of connection pooling. Luckily, most drivers handle this for you behind the scenes. One notable exception is setups where your app spawns a new process for each request, such as CGI and some configurations of PHP.
Whatever driver you're using, you'll have to find out how they handle connections and if they pool or not. For instance, Node's Mongoose is non-blocking and so you use one connection per app usually. This is the kind of thing you probably want.
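For example, with the Java driver a single shared client with an explicit pool cap is usually all the connection management you need, since the driver pools connections behind the scenes (the host and the maxPoolSize value here are placeholders):

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;

public class SharedMongo {
    // One client for the whole application; the driver pools connections
    // internally, so 20k concurrent users does not mean 20k connections.
    private static final MongoClient CLIENT = MongoClients.create(
            "mongodb://mongos-host:27017/?maxPoolSize=100"); // placeholder host; explicit pool cap

    public static MongoDatabase db() {
        return CLIENT.getDatabase("app"); // placeholder database name
    }
}
```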