We have a use case where we access RDS Postgres from a Java application. We want to run 2 SELECT queries and don't want to run them sequentially, to save latency. If query1's results are non-null and non-empty then we'll use them for further processing; otherwise we'll use the results of query2.
I haven't found a proper resource that explains how to do this. I'm not even sure whether it's possible. Do I need to create 2 different sessions first and then start the async calls, since I think 2 queries cannot run together in one Postgres session?
Please guide me here.
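For what it's worth, a minimal sketch of the two-session approach described above: each SELECT is submitted on its own connection (so each has its own Postgres session), the two run in parallel, and query1's rows are preferred when they are non-empty. The DataSource, the query strings and the single-column row mapping are placeholders, not a recommendation of a specific pool or schema.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.sql.DataSource;

public class ParallelSelects {

    private final DataSource dataSource;                         // e.g. a connection pool pointing at RDS
    private final ExecutorService executor = Executors.newFixedThreadPool(2);

    public ParallelSelects(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public List<String> fetch(String query1, String query2) {
        // Each task borrows its own connection, so the two SELECTs run in two
        // separate Postgres sessions at the same time.
        CompletableFuture<List<String>> f1 = CompletableFuture.supplyAsync(() -> run(query1), executor);
        CompletableFuture<List<String>> f2 = CompletableFuture.supplyAsync(() -> run(query2), executor);

        List<String> primary = f1.join();                        // wait for query1
        if (primary != null && !primary.isEmpty()) {
            return primary;                                      // query1 wins if it returned rows
        }
        return f2.join();                                        // otherwise fall back to query2
    }

    private List<String> run(String sql) {
        List<String> rows = new ArrayList<>();
        try (Connection c = dataSource.getConnection();
             Statement st = c.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                rows.add(rs.getString(1));                       // illustrative: reads only the first column
            }
            return rows;
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }
}

The short answer behind the sketch: yes, it is possible, but each concurrent query needs its own connection/session, since a single Postgres session executes one query at a time.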
I am architecting a database where I expect to have thousands of tenants, with some data shared between tenants. I am currently planning on using Postgres with row-level security for tenant isolation. I am also using knex and Objection.js to model the database in node.js.
Most of the tutorials I have seen look like this, where you create a separate knex connection per tenant. However, I've run into a problem on my development machine: after I create ~100 connections, I receive this error: "remaining connection slots are reserved for non-replication superuser connections".
I'm investigating a few possible solutions/work-arounds, but I was wondering if anyone has been able to make this setup work the way I'm intending. Thanks!
One solution might be to cache a limited number of connections, and destroy the oldest cached connection when the limit is reached. See this code as an example.
That code should probably be improved, however, to use a Map as the knexCache instead of an object, since a Map remembers the insertion order.
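The thread itself is about knex in node.js, but the eviction idea is language-agnostic; sketched in Java (the language used elsewhere on this page), a bounded, insertion-ordered cache might look roughly like this, where TenantDb is a stand-in for whatever per-tenant connection object is being cached:

import java.util.LinkedHashMap;
import java.util.Map;

public class TenantConnectionCache {

    /** Placeholder for a per-tenant connection handle (a knex instance in the original thread). */
    static class TenantDb {
        TenantDb(String tenantId) { /* open this tenant's connections */ }
        void destroy() { /* close this tenant's connections */ }
    }

    private static final int MAX_CACHED = 50;                    // illustrative limit

    // LinkedHashMap keeps insertion order, so the eldest entry really is the oldest one cached.
    private final Map<String, TenantDb> cache =
            new LinkedHashMap<String, TenantDb>(16, 0.75f, false) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, TenantDb> eldest) {
                    if (size() > MAX_CACHED) {
                        eldest.getValue().destroy();             // tear down the evicted tenant's connections
                        return true;
                    }
                    return false;
                }
            };

    public synchronized TenantDb get(String tenantId) {
        TenantDb db = cache.get(tenantId);
        if (db == null) {
            db = new TenantDb(tenantId);
            cache.put(tenantId, db);                             // put() triggers removeEldestEntry when over the limit
        }
        return db;
    }
}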
I have built an application with Spring Boot and JPA to migrate a Jira postgres database.
Basically, I have 5000 users that I need to migrate. Each user means 67 update queries in different tables.
Each query uses the LOWER function to compare ignoring case.
Some pseudo-code:
for (user : users) {
    for (query : queries) {
        jdbcTemplate.execute(query.replace(user....
    }
}
I ignore any errors, so if a single query fails, I still go on and execute the other 66.
I am running this in 10 separate threads, and each user takes roughly 120 seconds to migrate. (20 threads resulted in database deadlocks.)
At this pace, it's going to take more than a day, which is not acceptable (I am running this in a test environment before doing it in production).
The queries look like this:
UPDATE table SET column = 'NEWUSERNAME' where LOWER(column) = LOWER('CURRENTUSERNAME');
Is there anything I can do to try and optimize this migration?
UPDATE:
I changed my approach. First, I select every element with the CURRENTUSERNAME and get its ID. Then I create the UPDATE queries using the ID in the WHERE clause.
Other than that, it is still taking a long time (4+ hours) to execute.
I am running millions of UPDATEs, one at a time. I know jdbcTemplate has a bulk method, but if a single UPDATE fails, I believe it rolls back every successful update too. Also, I am not aware of the performance improvement it would have, if any.
So, to update the question: given that I have millions of UPDATE queries to run, what would be the best way to execute them? (bulk, multithreading, something else)
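For reference, the "bulk method" mentioned above is JdbcTemplate.batchUpdate, which groups many parameterized UPDATEs into far fewer round trips. A rough sketch of updating by the previously selected IDs (table and column names are placeholders) might look like this; whether one failing statement undoes the others depends on the driver and on how transactions are demarcated, e.g. one transaction per batch limits the blast radius:

import java.util.ArrayList;
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

public class UserMigrationBatch {

    private final JdbcTemplate jdbcTemplate;

    public UserMigrationBatch(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // ids were collected beforehand with the SELECT on LOWER(column) described above
    public void renameByIds(String newUsername, List<Long> ids) {
        List<Object[]> batchArgs = new ArrayList<>();
        for (Long id : ids) {
            batchArgs.add(new Object[] { newUsername, id });
        }
        // One JDBC batch instead of millions of single-row statements; table/column names are placeholders.
        jdbcTemplate.batchUpdate("UPDATE some_table SET some_column = ? WHERE id = ?", batchArgs);
    }
}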
I am planning to use AWS RDS Postgres version 10.4 and above for storing data in a single table consisting of ~15 columns.
My use case is to serve:
1. Periodically (every hour) store/update rows in this table.
2. Periodically (every hour) fetch data from the table, say 500 rows at a time.
3. Frequently fetch small amounts of data (10 rows) from the table (hundreds of queries in parallel).
Does AWS RDS Postgres support serving all of the above use cases?
I am aware of Read Replica support, but is there any built-in load balancer to serve the queries that come in parallel?
How many read queries can Postgres process concurrently?
Thanks in advance
Your use cases seem to be a normal fit for any relational database system, so I would say: yes.
The question is how fast the DB can handle the 100 parallel queries (use case 3).
In general, the PostgreSQL documentation is one of the best I have ever read, so give it a try:
https://www.postgresql.org/docs/10/parallel-query.html
But also take into consideration how big your data is!
That said, try w/o read replicas first! You might not need them.
I'm looking to send multiple read queries to a Postgres database in order to reduce the number of trips that need to be made to a painfully remote database. Is there anything in libpq that supports this behavior?
Yes, you can use the asynchronous handling functions in libpq. On the linked page it says:
Using PQsendQuery and PQgetResult solves one of PQexec's problems: If a command string contains multiple SQL commands, the results of those commands can be obtained individually. (This allows a simple form of overlapped processing, by the way: the client can be handling the results of one command while the server is still working on later queries in the same command string.)
For example, you should be able to call PQsendQuery with a string containing multiple queries, then repeatedly call PQgetResult to get the result sets. PQgetResult returns NULL when there are no more result sets to obtain.
If desired, you can also avoid your application blocking while it waits for these queries to execute (described in more detail on the linked page).
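The question is about libpq specifically, but for anyone reading this alongside the Java threads above: the analogous multi-result pattern over JDBC (a different client library, not libpq) sends several statements in one command string and then walks the result sets, roughly like this; the URL and queries are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class MultiResultExample {

    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://remote-host/mydb", "user", "password");
             Statement st = conn.createStatement()) {

            // Two SELECTs in a single command string -> a single network round trip
            boolean hasResultSet = st.execute("SELECT * FROM t1; SELECT * FROM t2");

            while (true) {
                if (hasResultSet) {
                    try (ResultSet rs = st.getResultSet()) {
                        while (rs.next()) {
                            // process the current result set row by row
                        }
                    }
                } else if (st.getUpdateCount() == -1) {
                    break;                                       // no more results of any kind
                }
                hasResultSet = st.getMoreResults();
            }
        }
    }
}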
I'm working with an ETL tool, Business Objects Data Services, which has the capability of specifying parallel execution of functions. The documentation says that before you can do this, you have to make sure that your database, which in our case is Postgres, allows "a stored procedure to run in parallel". Can anyone tell me if Postgres does that?
Sure. Just run your queries in different connections, and they will run in parallel transactions. Beware of locking though.
You can also call different stored procedures from the same connection (and effectively still run them in parallel) by using DBLink.
See this SO answer for an example.
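Roughly what the dblink route can look like when driven from a single JDBC connection: proc_a()/proc_b(), the URL and the dblink connection string are placeholders, and the dblink extension must already be installed in the database.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class DblinkParallel {

    public static void runBoth(String url, String user, String password) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement st = conn.createStatement()) {

            st.execute("SELECT dblink_connect('c1', 'dbname=mydb')");
            st.execute("SELECT dblink_connect('c2', 'dbname=mydb')");

            // dblink_send_query returns immediately, so both remote calls now run in parallel
            st.execute("SELECT dblink_send_query('c1', 'SELECT proc_a()')");
            st.execute("SELECT dblink_send_query('c2', 'SELECT proc_b()')");

            // dblink_get_result blocks until the corresponding query has finished
            st.execute("SELECT * FROM dblink_get_result('c1') AS t(result text)");
            st.execute("SELECT * FROM dblink_get_result('c2') AS t(result text)");

            // the extra connections are simply closed afterwards rather than reused
            st.execute("SELECT dblink_disconnect('c1')");
            st.execute("SELECT dblink_disconnect('c2')");
        }
    }
}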