Pgbadger - report from multiple servers - postgresql

I've read the pgBadger documentation, which says it can analyze multiple log files. Does that mean it can analyze logs from multiple servers, or only from one?
I have to analyze five different PostgreSQL servers (all on AWS RDS) for slow SQL queries, and I was thinking of copying the log files from all of them to one separate server instance that runs only pgBadger and analyzing them there.
That was my thinking, any advice would be valuable.
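One way to try this is to keep the servers apart and build one report per server, so that slow queries stay attributable to the instance they came from. The following is only a sketch under assumptions: the logs are taken to be downloaded into a hypothetical `logs/<server>/` layout, the RDS default `log_line_prefix` of `%t:%r:%u@%d:[%p]:` is assumed to be in effect, and `report_for_server` is a made-up helper name.

```shell
#!/bin/sh
# Assumption: logs for each RDS instance were downloaded into logs/<server>/
# (hypothetical layout) and the RDS default log_line_prefix is in use.
RDS_PREFIX='%t:%r:%u@%d:[%p]:'

# Build one HTML report for one server.
# Arguments: server name, then the log files that belong to that server.
report_for_server() {
    server=$1
    shift
    pgbadger -f stderr -p "$RDS_PREFIX" -o "report-$server.html" "$@"
}

# Usage (one call per server directory):
#   for dir in logs/*/; do
#       report_for_server "$(basename "$dir")" "$dir"*
#   done
```

Passing all five servers' files to a single pgbadger call should also run, but the result would be one merged report.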

Related

pg_dump from Azure Postgres Service which contains large data set

We are facing the well-known pg_dump performance problem: it is slow. We currently have an Azure-hosted PostgreSQL database that holds the resources created and updated by SmileCDR. After three months it has grown considerably because it stores FHIR objects. Now we want a brand-new environment, so the persistent data has to be taken out of the current PostgreSQL database and a new database has to be initialized with the old data set.
Please advise.
pg_dump takes far too long, almost a day. How can we speed up the backup and restore process?
What alternatives to pg_dump could we use to achieve this?
Important notes:
SmileCDR uses Flyway for schema versioning in PostgreSQL.
Everything has to be copied from the old database to the new one.
PostgreSQL version is 11, with 2 vCores and 100 GB of storage.
FHIR objects are kept in PostgreSQL.
Suggestions such as multiple jobs, no compression, and directory format have been tried, but they made no significant difference.
Since you put yourself in the cage of database hosting, you have no alternative to pg_dump. They make it deliberately hard for you to get your data out.
The best you can do is a directory format dump with many processes; that will read data in parallel wherever possible:
pg_dump -F d -j 6 -f backupdir -h ... -p ... -U ... dbname
In this example, I specified 6 processes to run in parallel. This will speed up processing unless you have only one large table and all the others are quite small.
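The restore side can be parallelized in the same way, since `pg_restore -j` works with directory-format dumps. A minimal sketch, assuming the connection details come from the standard libpq environment variables (`PGHOST`, `PGPORT`, `PGUSER`) and that the target database already exists; the function names are hypothetical, and `--no-owner` is an assumption about what a managed service will accept.

```shell
#!/bin/sh
# Parallel dump in directory format, then parallel restore into the new server.
# Connection details are read from the libpq environment variables
# (PGHOST, PGPORT, PGUSER), which the caller is assumed to export.

# dump_parallel DBNAME DUMPDIR JOBS
dump_parallel() {
    pg_dump -F d -j "$3" -f "$2" "$1"
}

# restore_parallel DBNAME DUMPDIR JOBS
# The target database must already exist. --no-owner skips ownership
# commands that may fail on a managed service (assumption).
restore_parallel() {
    pg_restore -j "$3" --no-owner -d "$1" "$2"
}

# Usage:
#   export PGHOST=... PGUSER=...      # old server
#   dump_parallel dbname backupdir 6
#   export PGHOST=... PGUSER=...      # new server
#   restore_parallel dbname backupdir 6
```

With only 2 vCores on the server, a job count well above the core count may not buy much; that is worth measuring rather than assuming.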
Alternatively, you may use smileutil with the synchronize-fhir-servers command, the bulk export API at the system level, or the subscription mechanism. Be warned that these options may be too slow for migrating a 100 GB database.
As you mentioned, if you can replicate the Azure VM, that may be the fastest option.

Running two podman/docker containers of PostgreSQL on a single host

I have two applications, each of which uses several databases. Before the days of Docker, I would have just put all the databases on one host (due to the resource consumption associated with running multiple physical hosts/VMs).
Logically, it seems to me that separating these into groups (1 group of DBs per application) is the right thing to do and with containers the overhead is low and this seems possible. However, I have not seen this use case. I've seen multiple instances of containerized Postgres running so as to maintain multiple versions (hence different images).
Is there a good technical reason why people do not do this (two or more containers of PostgreSQL instances using the same image for purposes of isolating groups of DBs)?
When I tried to do this, I ran into errors having to do with the second instance trying to configure the postgres user. I had to pass in an option to ignore migration errors. I'm wondering if there is a good reason not to do this.
Well, I am not used to working with PostgreSQL, but I do work with MySQL, SQLite and MS SQL, and with Docker, of course.
When I got into Docker I read a lot about microservices, how to develop them and, of course, the DevOps ideas behind Docker and microservices.
In this world I would absolutely prefer to have two containers from the same base image, with a multi-stage build and/or different env files, to run your infrastructure. Docker not only allows this, it encourages it.
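As a rough illustration of two instances from one image, here is a sketch in which each container gets its own name, host port and named volume. The container names, ports, volume names and the `postgres:16` tag are all assumptions, and `start_pg` is a hypothetical helper. One possible cause of the errors described in the question (a guess, since the original command is not shown) is both containers pointing at the same data volume, which this layout avoids.

```shell
#!/bin/sh
# Start one PostgreSQL container per application from the same image.
# Names, ports, volume names and the image tag are illustrative assumptions.
# Each instance gets its own named volume, so data directories never overlap.

# start_pg NAME HOST_PORT
start_pg() {
    docker run -d \
        --name "$1" \
        -e POSTGRES_PASSWORD \
        -p "$2":5432 \
        -v "$1-data":/var/lib/postgresql/data \
        postgres:16
}

# Usage (POSTGRES_PASSWORD must be exported in the calling shell,
# because "-e POSTGRES_PASSWORD" passes the host value through):
#   start_pg app1-db 5433
#   start_pg app2-db 5434
```

podman accepts the same `run` arguments, so replacing `docker` with `podman` in the function should be enough there.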

Using multiple PostgreSQL servers with a single shared network data directory?

How does PostgreSQL handle running multiple servers on different machines using a shared data directory? Does it automatically handle this under-the-hood without problems? Is it possible, but requiring some special configuration? Or is this a bad idea in general?
I'm doing some data science on a high-performance machine cluster, where I submit jobs, each job is run by a random machine, and each machine has access to a shared network drive. Currently, I'm using SQLite, where this use case works fine. A single shared SQLite database file can handle multiple connections from different machines without trouble.
I'm now attempting to switch over to PostgreSQL. Intercommunication between the machines of the cluster is surprisingly not straightforward. So while the obvious solution would be one server that all the other machines connect to, this might not end up being practical. Ideally, I could just continue doing what I've been doing with the SQLite setup, that is, have each machine run its own PostgreSQL server, which then connects to the shared databases.
No, no, no and yes.
A PostgreSQL installation ("cluster" is the term used in the manuals) expects to be in charge of all of its files. It carefully coordinates access between multiple processes accessing those files. You are supposed to access PostgreSQL in a client/server manner over a socket (unix if local, tcp if not).
Running several servers against one shared data directory is not supported by PostgreSQL. It will lead to corruption and data loss. If you can't simplify your networking, then you had best stick with SQLite (assuming it is actually safe with SQLite, something I haven't verified).
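For comparison, the supported client/server pattern looks roughly like this: one machine runs the server, and every job connects to it over TCP instead of touching files on the shared drive. This is only a sketch; the host, database and user names in the usage line are hypothetical, as is the `pg_query` helper, and it assumes the server is reachable on port 5432 and configured to accept remote connections.

```shell
#!/bin/sh
# Every job talks to the single PostgreSQL server over TCP instead of opening
# files on the shared drive. Host, database and user names are hypothetical.

# pg_query HOST DBNAME USER SQL
pg_query() {
    # -X: ignore ~/.psqlrc, -A: unaligned output, -t: rows only, -c: run one command
    psql -h "$1" -p 5432 -d "$2" -U "$3" -X -A -t -c "$4"
}

# Usage from a cluster job:
#   pg_query db-node experiments analyst 'SELECT count(*) FROM results;'
```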

SAS - DB2 - connection- coding

Can anyone let me know how to pull data from DB2 using a SAS program? I have a DB2 query and want to write SAS code that pulls the data from DB2 with that query. Please share your knowledge on how to achieve this (SAS on the mainframe). Any pointers on connecting to DB2 (mainframe) from SAS would also help.
Most likely the issue is with your JCL, not SAS. On the mainframe, jobs are run in LPARs (logical partitions). An analogy would be several computers networked together. Each LPAR (or computer) would be set up with software and networked to hard drives and DB2 servers. Usually one LPAR is set aside to run only production jobs, another for development, etc. It is a way to make sure production jobs get the resources they need without development jobs interfering.
In this scenario, each LPAR would have SAS installed, but only one partition would be networked to the DB2 server you are trying to get your data from. Your JCL would tell the system which LPAR to run your job on. Either the wrong LPAR is coded in your JCL, or your job is running in a default LPAR that is not the one your job needs.
The JCL code to run in the correct LPAR is customized for each system, so only someone who is running jobs on your system will know what the code is. I suggest going to someone who also runs jobs on your system and telling them, as you said, 'SAS program without DB2 connectivity is working fine, but otherwise it is not.' They should be able to point you to the JCL code you need.
Good luck.

SQL Server management scaling

We are playing with a multi-tenant architecture based not on partitions but on having tons of databases. We decided to run some tests.
Generated 5,000 database schemas, each containing ~100 DB objects: 250k tables and 250k other DB objects (keys, indexes) in total.
Found cons:
Tried to open the list of tables in SQL Server Management Studio: it took ~10-15 min, and Management Studio allocated ~700 MB of RAM.
DB utilities don't work: tried Red Gate, DB Forge, Adept SQL Diff.
Any advice when managing and running SQL Server like this?
Try using the sqlcmd utility from the command prompt.
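A small sketch of what that could look like: ask the server for just the number you need instead of letting a GUI load the whole object tree. The server and database names in the usage line are hypothetical, `count_tables` is a made-up helper, and `-E` (trusted authentication) is an assumption that can be swapped for `-U`/`-P`. It is written as a POSIX shell function; the sqlcmd arguments themselves are the same from a Windows command prompt.

```shell
#!/bin/sh
# Count the tables of one database with sqlcmd instead of opening the
# object tree in a GUI. -E (trusted authentication) is an assumption.

# count_tables SERVER DATABASE
count_tables() {
    sqlcmd -S "$1" -d "$2" -E \
        -Q "SET NOCOUNT ON; SELECT COUNT(*) FROM sys.tables;"
}

# Usage:
#   count_tables myserver tenant_0001
```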
You could try writing your own management tool with SMO, targeted specifically at what you need:
Creating SMO Programs - MSDN
That way you could simplify the program and load only what is required, potentially improving performance.