Will PgBouncer reuse the PostgreSQL session sequence cache?

I want to use a Postgres sequence with a cache: CREATE SEQUENCE serial CACHE 100.
The goal is to improve the performance of roughly 3000 calls per second to SELECT nextval('serial'); issued by ~500 connections/application threads concurrently.
The issue is that I am doing intensive autoscaling, so connections are disconnected and reconnected occasionally, leaving "holes" of unused ids in the sequence each time a connection is dropped.
Well, the good news might be that I am using a PgBouncer Heroku buildpack with transaction pooling mode.
My question is: will transaction pooling mode solve the "holes" issue I described? Will it reuse the session so that the next application connection takes that session from the pool and continues using the sequence's cache?

This depends on the setting of server_reset_query. If you have that set to DISCARD ALL, then sequence caches are discarded before a server connection is handed out to a client. But for transaction pooling, the recommended server_reset_query is empty, so you will be able to reuse sequence caches in that case. You can also use a different DISCARD command, depending on your needs.
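For reference, the relevant knobs live in pgbouncer.ini. A minimal fragment along these lines (section layout abbreviated, values illustrative) would keep sequence caches alive across client reconnects, assuming your Heroku buildpack lets you override these settings:

[pgbouncer]
pool_mode = transaction
; leave server_reset_query empty: DISCARD ALL would wipe the sequence cache
; each time a server connection is returned to the pool
server_reset_query =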

Related

Multithreading with PostgreSQL JDBC

I'm still a student and not so experienced with multithreading and databases so I might have missed some obvious stuff - hoping for an answer at a more beginner level.
I'm busy creating a dummy Java application that allows users to submit subway station locations and then look up the nearest station to their location. This is all happening over HTTP.
The backend for this application is PostgreSQL (with PostGis) and I connect to the database via the PostgreSQL JDBC.
I want my application to be as multithreaded as possible. Every time I receive a new HTTP connection, I spin up a new thread and service the user's request. But I'm not sure how much point there is to this if reads and writes to the database themselves cannot happen in parallel.
According to this, PostgreSQL JDBC is not thread-safe. But what does that mean exactly? Does it just mean that reads and writes within a single connection are not thread-safe (i.e. within each instance returned by DriverManager.getConnection())? But what if I made a new connection every time an HTTP request came in? Would that be safe to do in parallel? And would that affect performance badly?
Any other suggestions on broad approach to take?
JDBC in general is not thread-safe, not just the Postgres driver.
This means you cannot run multiple statements created from the same Connection instance in multiple threads at the same time.
If you want to run two statements in parallel, you need to create two different physical connections to the database.
As creating connections isn't cost-free, the usual approach is to have a connection pool (e.g. through a ConnectionPoolDataSource) that keeps a set of connections open. The application then takes connections from the pool and puts them back when the query (or transaction) is done.
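To make that concrete, here is a rough sketch of the pooled approach in your setup, using HikariCP as one common pool implementation (any ConnectionPoolDataSource-backed pool works the same way). The JDBC URL, credentials, and the stations table/columns are placeholders, not something from your app:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class StationDao {
    // One pool for the whole application; sized to bound parallel DB work.
    private static final HikariDataSource POOL;
    static {
        HikariConfig cfg = new HikariConfig();
        cfg.setJdbcUrl("jdbc:postgresql://localhost:5432/subway"); // placeholder
        cfg.setUsername("app");
        cfg.setPassword("secret");
        cfg.setMaximumPoolSize(10);
        POOL = new HikariDataSource(cfg);
    }

    // Safe to call from many HTTP handler threads at once: each call borrows
    // its own Connection and hands it back when the try block exits.
    public String nearestStation(double lon, double lat) throws SQLException {
        String sql = "SELECT name FROM stations "
                   + "ORDER BY geom <-> ST_SetSRID(ST_MakePoint(?, ?), 4326) LIMIT 1";
        try (Connection conn = POOL.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setDouble(1, lon);
            ps.setDouble(2, lat);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}

The key point is that no Connection (or Statement/ResultSet created from it) is ever shared between threads; the pool itself is the only shared object.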

HikariCP connection pool - 'active' - how to debug?

I am building an app using Spring Boot/Hibernate with Postgres as the database. I am using Spring Boot 2.0, so Hikari is the default connection pool provider.
Currently, I am trying to load-test the application with a REST endpoint that does an 'update if exists, insert if new' on an entity in the database. It's a fairly small entity with a 'BIGSERIAL' primary key and no constraints on any other field.
The default connection pool size is 10, and I haven't really tweaked any other parameters, either for HikariCP or for Postgres.
The point where I am stuck at this moment is debugging the connections in the 'active' state: what they are doing and why they are currently stuck.
When I run '10 simultaneous users', it basically translates into 2 or 3 times that many queries, and when I turn on the HikariCP debug logs, it hangs at something like this -
(total=10, active=10, idle=0, waiting=2) - and the 'active' connections never seem to be released, which is what I am trying to figure out, because the queries are fairly simple and the table itself has just 4 fields (including the primary key).
The best practice from the HikariCP folks, and in general, is that increasing the connection pool size is not the right first step towards scaling.
If I do increase the connection pool size to 20, things start working for 10 simultaneous/concurrent users, but then again, I don't believe that is the root cause/solution for the problem.
Is there any way I can log either Hibernate or Postgres messages that might help in knowing what these 'active' connections are waiting on, and why a connection doesn't get released even after I increase the wait time to a large value?
If it is a connection leak (as is reported when the leak-detection-threshold is reduced to a lower value, e.g. 30 seconds), how can I tell whether Hibernate is responsible for this connection leak or whether it is something else?
If it is a lock/wait at the database level, how can I get the root of this?
UPDATE
After help from #brettw, I took a thread dump when the connections were exhausted, and it pointed in the direction of a connection leak. The thread on the HikariCP issues board - https://github.com/brettwooldridge/HikariCP/issues/1030#issuecomment-347632771 - points to Hibernate not closing connections, which then led me to https://jira.spring.io/browse/SPR-14548, which talks about setting Hibernate's connection handling mode, since the default mode holds the connection for too long. After setting spring.jpa.properties.hibernate.connection.handling_mode=DELAYED_ACQUISITION_AND_RELEASE_AFTER_TRANSACTION, the connection pool worked perfectly.
Also, the point made here - https://github.com/brettwooldridge/HikariCP/issues/612#issuecomment-209839908 is right - a connection leak should not be covered up by the pool.
It sounds like you could be hitting a true deadlock in the database. There should be a way to query PostgreSQL for current active queries, and current lock states. You'll have to google it.
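Concretely, the views to look at are pg_stat_activity (one row per backend with its current query and, on PostgreSQL 9.6 or later, the wait event it is blocked on) and pg_locks (rows with granted = false are waiting on a lock). A quick JDBC sketch that dumps the non-idle backends - connection details are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ActiveQueryDump {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "secret");
             Statement st = conn.createStatement();
             // one row per backend that is currently doing something
             ResultSet rs = st.executeQuery(
                 "SELECT pid, state, wait_event_type, wait_event, "
               + "       now() - query_start AS running_for, query "
               + "FROM pg_stat_activity WHERE state <> 'idle'")) {
            while (rs.next()) {
                System.out.printf("pid=%d state=%s wait=%s/%s for=%s%n  %s%n",
                        rs.getInt("pid"), rs.getString("state"),
                        rs.getString("wait_event_type"), rs.getString("wait_event"),
                        rs.getString("running_for"), rs.getString("query"));
            }
            // For blocked lock waiters: SELECT * FROM pg_locks WHERE NOT granted;
        }
    }
}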
Also, I would try a simple thread dump to see where all the threads are blocked. It could be a code-level synchronization deadlock.
If all of the threads are blocked on getConnection(), it is a leak.
If all of the threads are down in the driver, according to the stacktrace for each thread, it is a database deadlock.
If all of the threads are blocked waiting for a lock in your application code, then you have a synchronization deadlock -- likely two locks with inverted acquisition order in different parts of the code.
The HikariCP leakDetectionThreshold could be useful, but it will only show where the connection was acquired, not where the thread is currently stuck. Still, it could provide a clue.
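For what it's worth, the threshold is a one-liner on the pool config; a sketch with made-up connection details (in Spring Boot the same thing is usually exposed as the spring.datasource.hikari.leak-detection-threshold property):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSetup {
    public static HikariDataSource buildPool() {
        HikariConfig cfg = new HikariConfig();
        cfg.setJdbcUrl("jdbc:postgresql://localhost:5432/mydb"); // placeholder
        cfg.setUsername("app");
        cfg.setPassword("secret");
        cfg.setMaximumPoolSize(10);
        // If a connection stays checked out longer than this many milliseconds,
        // HikariCP logs the stack trace of where it was acquired. That shows the
        // acquisition site only, not where the thread is stuck right now.
        cfg.setLeakDetectionThreshold(30_000);
        return new HikariDataSource(cfg);
    }
}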

How do I manage connection pooling to PostgreSQL from sidekiq?

The problem: I have a Rails application that runs a few hundred Sidekiq background processes. They all connect to a PostgreSQL database which is not exactly happy about providing 250 connections - it can, but if all Sidekiq processes accidentally send queries to the db, it crumbles.
Option 1: I have been thinking about adding pgBouncer in front of the db; however, I cannot currently use its transactional mode, since I'm highly dependent on setting the search_path at the beginning of each job to determine which "country" (PostgreSQL schema) to work on (apartment-gem). In that case, I would have to use the session-based connection pooling mode. As far as I know, this would require me to disconnect the connection after each job in order to release it back into the pool, and that would be really costly performance-wise, wouldn't it? Am I missing something?
Option 2: using application-layer connection pooling is of course also an option; however, I'm not really sure how I would do that for PostgreSQL with Sidekiq.
Option 3: something I have not thought of?
Option 1: You're correct, session pooling would require you to drop and reconnect, and that adds overhead. How costly depends on the access pattern, i.e. what fraction of the total work the connection/TCP handshake etc. represents, and what sort of latency you need. Definitely worth benchmarking, but if the connections are short-lived the overhead will be really noticeable.
Option 2/3: You could rate limit or throttle your sidekiq jobs. There are a few projects here tackling this...
Queue limits
Sidekiq Limit Fetch: Restrict number of workers which are able to run specified queues simultaneously. You can pause queues and resize queue distribution dynamically. Also tracks number of active workers per queue. Supports global mode (multiple sidekiq processes). There is an additional blocking queue mode.
Sidekiq Throttler: Sidekiq::Throttler is a middleware for Sidekiq that adds the ability to rate limit job execution on a per-worker basis.
sidekiq-rate-limiter: Redis backed, per worker rate limits for job processing.
Sidekiq::Throttled: Concurrency and threshold throttling.
I got the above from https://github.com/mperham/sidekiq/wiki/Related-Projects
If your application must have a connection per process, and you're unable to restructure it so that more threads can share a connection, then it's pgBouncer or application-based connection pooling. Connection pooling is in effect either going to throttle or limit your app in some way in order to save the DB.
Sidekiq should only require one connection for each worker thread. If you are setting your concurrency to a reasonable value, say 10-25, I don't think you should be using 250 simultaneous database connections. How many worker processes are you running, and what is their concurrency?
Also, you can see on that page that even if you have a high concurrency setting, you can still create a connection pool shared by the threads within that process.

Postgres connection should be always on? or connect before running each query?

I am debating whether I should keep my Postgres connection always open and check/reconnect before running each query, or whether I should open a connection before each query and close it as soon as the query is done. Thanks!
As long as the Postgres server isn't totally jammed with connections (i.e. this is not an app that will be creating a gigantic number of perpetual connections), I don't think it's a problem to maintain the connection. However, I would also recommend checking the connection and handling reconnects prior to each query. Many libraries offer ways to do this. For example, with MyBatis (Java), you can have it issue a test query each time, which can be specified. I use the lightweight SELECT 1 for this.
I would say the key thing to consider is to keep the time the connection spends idle in transaction as short as possible, because that state can affect performance in a variety of ways (such as slowing down other queries, preventing high-turnover tables from being vacuumed in a timely manner, etc.). This is not to say that any time spent idle in transaction is automatically bad, but it should be considered and minimized where possible (e.g. if you have some calculations that are going to take several minutes to run, make sure to either commit or roll back before doing them; which one depends on context).
If you're doing a bunch of SELECTs, and don't have anything you need to commit, I would recommend doing a rollback to help keep the idle in transaction states to a minimum.
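As a rough sketch of the "keep it open, but verify before use" pattern in JDBC (connection details are placeholders; Connection.isValid() does the same job as the SELECT 1 test query mentioned above):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ConnectionKeeper {
    private Connection conn;

    // Hand out the long-lived connection, re-opening it if it has gone away.
    public synchronized Connection get() throws SQLException {
        if (conn == null || conn.isClosed() || !conn.isValid(2 /* seconds */)) {
            conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydb", "app", "secret");
            conn.setAutoCommit(false);
        }
        return conn;
    }

    // Call as soon as a batch of SELECTs is finished so the session does not
    // sit in "idle in transaction": roll back if there is nothing to commit.
    public synchronized void finishReads() throws SQLException {
        if (conn != null && !conn.isClosed()) {
            conn.rollback();
        }
    }
}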
I just realized the Postgres connection string has a bunch of settings for connection pooling, for example:
User ID=root;Password=myPassword;Host=localhost;Port=5432;Database=myDataBase;
Pooling=true;Min Pool Size=0;Max Pool Size=100;Connection Lifetime=0;
So in my code, I can just close the connection after the command finishes execution. But behind the scenes, the connection is actually still alive and stored in the connection pool to be used again.

What are advantages of using transaction pooling with pgbouncer?

I'm having trouble finding a good summary of the advantages/disadvantages of using pgbouncer for transaction pooling vs session pooling.
Does it mean that a transaction-heavy workload is somehow better load-balanced? Is it to reduce the number of connections required from pgbouncer to the database?
Transaction-level pooling will help if you have apps that hold idle sessions. PgBouncer won't need to keep sessions open and idle; it just grabs one when a new transaction is started. Those idle sessions only cost you a pgbouncer connection, not a real idle Pg session with a backend sitting around doing nothing, wasting memory and synchronisation overhead.
The main reason you'd want session pooling instead of transaction pooling is if you want to use named prepared statements, advisory locks, listen/notify, or other features that operate on a session level not a transaction level.
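A small illustration of the session-level point (not from the answer; the port and credentials are placeholders, assuming PgBouncer on its usual 6432): with transaction pooling and autocommit, each statement can land on a different backend, so a session-scoped advisory lock taken by one statement may not be there for the next one:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AdvisoryLockExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:6432/mydb", "app", "secret");
             Statement st = conn.createStatement()) {
            conn.setAutoCommit(true);
            // Session-scoped: the lock belongs to whichever backend ran this.
            st.execute("SELECT pg_advisory_lock(42)");
            // ... work that assumes the lock is still held ...
            // Under transaction pooling this may run on a different backend,
            // which returns false with a warning because it never held the lock,
            // while the original backend keeps the lock until its session ends.
            st.execute("SELECT pg_advisory_unlock(42)");
        }
    }
}

The same reasoning applies to named prepared statements and LISTEN/NOTIFY, which is why those workloads want session pooling.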