Faster dump load in Progress 9.1E - progress-4gl

How can the dump and load be done faster in Progress?
I need to automate the dump and load process so that I can run it on a weekly basis.

Generally one wouldn't need to do a weekly D&L, as the server engine does a decent job of managing its data. A D&L should only be done when there's an evident concern about performance, when changing versions, or when making a significant organizational change in the data extents.
Having said that, a binary D&L is usually the fastest, particularly if you can make it multi-threaded.

OK, dumping and loading across platforms to build a training system is probably a legitimate use case. (If it were Linux to Linux you could just backup and restore -- you may be able to do that Linux to UNIX if the byte ordering is the same...)
The binary format is portable across platforms and versions of Progress. You can binary dump a Progress version 8 HPUX database and load it into a Windows OpenEdge 11 db if you'd like.
To do a binary dump use:
proutil dbname -C dump tablename
That will create tablename.bd. You can then load that table with:
proutil dbname -C load tablename.bd
Once all of the data has been loaded you need to remember to rebuild the indexes:
proutil dbname -C idxbuild all
You can run many simultaneous proutil commands. There is no need to go one table at a time. You just need to have the db up and running in multi-user mode. Take a look at this for a longer explanation: http://www.greenfieldtech.com/downloads/files/DB-20_Bascom%20D+L.ppt
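As a rough illustration (the database name "mydb" and the table names below are made up, and exact proutil behavior varies by version), a parallel binary D&L can be scripted with nothing more than the shell backgrounding several proutil processes:

# dump several tables in parallel while the db is up in multi-user mode
for t in customer order orderline; do
  proutil mydb -C dump $t &
done
wait

# load the resulting .bd files into the target db, again in parallel
for t in customer order orderline; do
  proutil mydb -C load $t.bd &
done
wait

# rebuild all indexes once every table has been loaded
proutil mydb -C idxbuild all

How many dumps or loads to run at once is a judgement call; start with a handful and watch disk and CPU utilization before adding more.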
It is helpful to split your database up into multiple storage areas (and they should be type 2 areas) for best results. Check out: http://dbappraise.com/ppt/sos.pptx for some ideas on that.
There are a lot of tuning options available for binary dump & load. Details depend on what version of Progress you are running. Many of them probably aren't really useful anyway but you should look at the presentations above and the documentation and ask questions.

Related

Postgres auto tuning

As we all know, Postgres performance depends heavily on config parameters. E.g. if I have an SSD drive or more RAM, I need to tell Postgres that by changing the relevant config parameters.
I wonder if there is any tool (for Linux) which can suggest the best Postgres configuration for the current hardware?
I'm aware of websites (e.g. pgtune) where I can enter the server spec and get a suggested config.
However, each piece of hardware is different (e.g. I might have a better RAID controller, or some processes that consume more RAM, etc.). My wish would be for Postgres to do self-tuning, analysing query execution times, available resources, etc.
I understand there is no such mechanism, so maybe there is some tool/script I can run which can do this job for me (checking e.g. sequential/random disk read speed, available memory, etc.) and tell me what to change in the config.
There are parameters that you can tweak to get better performance from PostgreSQL.
This article gives a good overview of that.
There are a few scripts that can do that. One that is mentioned in the Postgres wiki is this one.
To get a better idea of what further tuning your database needs, you need to log its requests and performance; after analysing those logs you can tune more parameters. For this there is the pgBadger log analyzer.
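For example (a hedged sketch: the database name, log path and thresholds are assumptions, and ALTER SYSTEM needs PostgreSQL 9.4+ plus superuser rights), the logging pgBadger works best with can be switched on and a report generated like this:

psql -d mydb <<'SQL'
ALTER SYSTEM SET log_min_duration_statement = 250;  -- log statements slower than 250 ms
ALTER SYSTEM SET log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h ';
ALTER SYSTEM SET log_checkpoints = on;
ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET log_temp_files = 0;
SELECT pg_reload_conf();
SQL

# after some production traffic has been logged, build an HTML report
pgbadger /var/log/postgresql/postgresql-*.log -o pgbadger-report.html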
After using the database in production, you get a much better idea of what your requirements are and how to approach them, rather than making changes based only on OS or hardware configuration.

Are there any negative performance or functionality downsides to using pg_upgrade with --link option afterwards?

I'm about to upgrade a quite large PostgreSQL cluster from 9.3 to 11.
The upgrade
The cluster is approximately 1.2 TB in size. The database sits on a fast HW RAID 10 array of 8 DC-grade SSDs, with 192 GB RAM and 64 cores. I am performing the upgrade by replicating the data to a new server with streaming replication first, then upgrading that one to 11.
I tested the upgrade using pg_upgrade with the --link option; this takes less than a minute. I also tested a regular upgrade (without --link) with many jobs; that takes several hours (4+).
Questions
Now the obvious choice for me is of course to use the --link option. However, all this makes me wonder - are there any downsides (performance- or functionality-wise) to using that over the regular, slower method? I don't know the internal workings of PostgreSQL's data structures, but I have a feeling there could be a performance difference after the upgrade between rewriting the data entirely and just using hard links - whatever that means?
Considerations
The only drawback of --link I can find in the documentation is that the old data directory cannot be used after the upgrade is performed: https://www.postgresql.org/docs/11/pgupgrade.htm However, that is only a safety concern, not a performance drawback, and it doesn't really apply in my case since I replicate the data first.
The only other thing I can think of is reclaiming space, with whatever performance upsides that might have. However, as I understand it, that can also be achieved by running VACUUM FULL (or CLUSTER?) after the --link upgrade is done. Also, reclaiming space is not very impactful performance-wise on an SSD, as I understand it.
I'd appreciate it if anyone could shed some light on this.
There is absolutely no downside to using hard links (with the exception you noted, that the old cluster is dead and has to be removed).
A hard link is in no way different from a normal file.
A “file” in UNIX is in reality an “inode”, a structure containing file metadata. An entry in a directory is a (hard) link to that inode.
If you create another hard link to the inode, the same file will be in two different directories, but that has no impact whatsoever on the behavior of the file.
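You can see this for yourself with a throwaway file (the file names are arbitrary):

echo hello > original.txt
ln original.txt hardlink.txt        # a hard link, not a copy
ls -li original.txt hardlink.txt    # both names show the same inode number

Deleting one of the names leaves the data fully intact under the other name; the inode is only freed when its last link is gone.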
Of course you must make sure that you don't start both the old and the new server; instant data corruption would ensue. That's why you should remove the old cluster as soon as possible.
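For reference, a hedged sketch of what the link-based upgrade invocation looks like (all paths and the job count below are assumptions and must match the actual 9.3 and 11 installations):

pg_upgrade \
  --old-bindir=/usr/pgsql-9.3/bin \
  --new-bindir=/usr/pgsql-11/bin \
  --old-datadir=/var/lib/pgsql/9.3/data \
  --new-datadir=/var/lib/pgsql/11/data \
  --link \
  --jobs=8 \
  --check

Running with --check first performs only the compatibility checks; drop it to do the real upgrade.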

Dumping postgres DB, time and .sql file weight

I have a big DB (a Nominatim DB, for reverse address geocoding); it is about 408 GB.
Now, to provide an estimate to the customer, I would like to know how long the export/re-import procedure will take and how big the .sql dump file will be.
My PostgreSQL version is 9.4, installed on a CentOS 6.7 virtual machine with 16 GB RAM and 500 GB of disk space.
Can you help me?
Thank you all for your answers. Anyway, to restore the dumped DB I don't use pg_restore but psql -d newdb -f dump.sql (I read this approach in an official doc). This is because I have to set up this DB on another machine to avoid the Nominatim DB indexing procedure! I don't know if anyone here knows Nominatim (it is an OpenStreetMap open-source product), but the DB indexing process for the European map (15.8 GB), on a CentOS 6.7 machine with 16 GB RAM, took me 32 days...
So another possible question would be: is pg_restore equivalent to psql -d -f? Which is faster?
Thanks again
As @a_horse_with_no_name says, nobody will be able to give you exact answers for your environment. But this is the procedure I would use to get some estimates.
I have generally found that a compressed backup of my data is 1/10th or less the size of the live database. You can also usually deduct the on-disk size of the indexes from the backup size as well. Examine the size of things in-database to get a better idea. You can also try forming a subset of the database you have which is much smaller and compare the live size to the compressed backup; this may give you a ratio that should be in the ballpark. SQL files are gassy and compress well; the on-disk representation Postgres uses seems to be even gassier though. Price of performance probably.
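If it helps, a hedged sketch of the in-database sizing (the database name is an example); these system functions all exist in 9.4:

psql -d nominatim <<'SQL'
-- total size of the current database on disk
SELECT pg_size_pretty(pg_database_size(current_database()));

-- biggest tables, with heap and index sizes listed separately
SELECT relname,
       pg_size_pretty(pg_relation_size(oid)) AS table_size,
       pg_size_pretty(pg_indexes_size(oid))  AS index_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 20;
SQL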
The best way to estimate time is just to do some exploratory runs. In my experience this usually takes longer than you expect. I have a ~1 TB database that I'm fairly sure would take about a month to restore, but it's also aggressively indexed. I have several ~20 GB databases that backup/restore in about 15 minutes. So it's pretty variable, but indexes add time. If you can set up a similar server, you can try the backup-restore procedure and see how long it will take. I would recommend doing this anyway, just to build confidence and suss out any lingering issues before you pull the trigger.
I would also recommend you try out pg_dump's "custom format" (pg_dump -Fc) which makes compressed archives that are easy for pg_restore to use.
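To answer the follow-up as well: psql -f replays a plain SQL dump, while pg_restore reads the custom (or directory) format and can run in parallel, which usually makes it the faster path for a database this size. A hedged sketch, with made-up database names and job count:

# compressed, pg_restore-friendly dump of the source database
pg_dump -Fc -f nominatim.dump nominatim

# restore into a freshly created database using 4 parallel jobs
createdb newdb
pg_restore -d newdb -j 4 nominatim.dump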

How to verify large postgresql Databases running different version have the same data without dumping

How would I verify that the data in an 8.3 PostgreSQL DB is the same as the data in a 9.0 DB?
When I did an SQL dump of an example table there were many differences, but this was due to 9.0 truncating zeros at the beginning and end of date fields; also, the order of the dump is not fixed. Even though the output can be sorted with sort (no pun intended), that does not allow validation, because it would lose which table each row belonged to - the sorted SQL dump would be a meaningless splat of SQL commands with dump settings thrown in for good measure.
count(*) is also not adequate.
I would like to be 100% sure that the data in one is equal to the data in the other, despite the version differences and the way that, at the very least, dates are held in 9.0.
I should add that I have several hundred tables and many hundreds of GB of data, so I need an automated process like diff DUMPa.sql DUMP2.sql; a SHA of the data (not the format) would be ideal, but one cannot diff binary dumps of PostgreSQL for well-known reasons. I am aware MySQL has a checksum feature, but I'm using PostgreSQL.
First the bad news: there is really no way to address the full set of concerns you have without loading all the data into an intermediary program and comparing it directly. This will take time and it will drag your system down load-wise, so my recommendation is to set up some sort of replication and compare replicas.
One thing you might be able to do is to use something like Slony or Bucardo to replicate, and then triggers to move data into secondary child partitions and replicate those onto a consolidated server for comparison. You could then compare within PostgreSQL. This would reduce the load and it would mean your reporting data would be relatively easy to manage compared to other approaches. But all the data is going to have to be loaded and compared line-by-line.
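If a lighter-weight, table-by-table spot check is acceptable despite the formatting caveats raised in the question, something along these lines can compare checksums of the text representation (the table names and the id sort column are made up, and it assumes both servers render every value identically in COPY text format - which is exactly the part that may differ between 8.3 and 9.0 for dates):

for t in customers orders invoices; do
  old=$(psql -d olddb -c "COPY (SELECT * FROM $t ORDER BY id) TO STDOUT" | md5sum)
  new=$(psql -d newdb -c "COPY (SELECT * FROM $t ORDER BY id) TO STDOUT" | md5sum)
  [ "$old" = "$new" ] && echo "$t OK" || echo "$t DIFFERS"
done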

Attaching Informix .dat and .idx files

We are trying to duplicate one of our Informix databases on a test server, but without Informix expertise in house we can only guess at what we need to do. I am learning this stuff on the fly myself and am nowhere near the expertise level needed to operate Informix efficiently, or even inefficiently. Anyhow...
We managed to copy the .dat and .idx files from the live server somewhere, installed Linux and the latest Informix Dynamic Server, and have it up and running.
Now what should we do with the .dat and .idx files from the live server? Do we copy them somewhere so the server recognizes them automatically?
Or is there an equivalent of MS SQL Server's attach-database operation to register the database files with the new instance?
I'm at my rope's end...
You've asked a pretty complicated question without realizing it. Informix is architected as a shared-everything database engine, meaning all resources available to the instance are available to every database in that instance. This means that more than one database can store data in any given dbspace - a .dat or .idx file in your case. Most DBAs know better than to do that, but it's something to be aware of. Given this, you now know that the .dat and .idx files do not belong to a database but to the instance. The dbspaces and files were created to contain your database's data, but they technically belong to the instance. It's worth noting that the .dat and .idx files are known to the database by the logical dbspace name.
Armed with this background info, and assuming that the production and development servers are running the same OS and that your hardware is relatively the same (not a combination of PA-RISC, Itanium and x86/x64), I'll throw a couple of options out for you:
1. Create the dbspaces that you need in the new instance and use onunload and onload to copy the database from production to development.
2. Use ontape or onbar to back up the entire production instance and restore it over your development instance.
Option 1 requires that you know what the dbspaces are named and how large they are. Use onstat -d on the production instance to find this out. BTW, the numbers listed in onstat -d are in pages; I believe Linux uses a 2K page.
Option 2 simply requires that the paths for the data files are the same on both servers. This means that the ROOTDBS needs to be the same in both instances. That can be found by executing onstat -c | grep ROOTDBS
There's a lot that has been left out but I hope that this gives you the info that you need to move forward with your task.
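A hedged sketch of option 2 (where the archive lands depends on the TAPEDEV/LTAPEDEV settings in your ONCONFIG, so treat this as an outline rather than a recipe):

# on the production instance: check dbspace layout and ROOTDBS, then take a level-0 archive
onstat -d
onstat -c | grep ROOTDBS
ontape -s -L 0

# on the development instance (same dbspace paths and ONCONFIG layout): restore everything
ontape -r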
The .dat and .idx files are associated with C-ISAM, or, when organized in a directory called dbase.dbs (where dbase is the name of your database), the .dat and .idx files are associated with Informix Standard Engine, aka Informix SE. SE uses C-ISAM to manage its storage. SE is rather different from (and much simpler than) Informix Dynamic Server (IDS). It is not impossible that the .dat and .idx files are associated with IDS; it is just extremely unlikely.
From the information available, it sounds as though your production server is running SE. To get the data from SE to IDS, you will probably want to use DB-Export at the SE end and DB-Import at the Linux/IDS end. Certainly, that is the simplest way to do it.
There are other possible solutions - C-ISAM datablade being one such - but they are more expensive and probably not warranted. There are other possible loading solutions, such as HPL (High-Performance Loader).
For more information about Informix, either use the various web sites already referenced (http://www.informix.com is a link to the Informix section of IBM's web site), or use the International Informix User Group (IIUG) web site. There are mailing lists available (which require you to belong, but membership is free) for discussing Informix in detail.
Those Informix SE data files (.DAT) and their associated index files (.IDX) are useless unless you also have all the associated catalog files, such as SYSTABLES.DAT, SYSTABLES.IDX, SYSCOLUMNS, SYSINDEXES, etc.
Then you also have to worry about which version of Informix-SE created them, as some have a 2K or 4K index file node size.
Your best approach is to obtain all the .DAT and .IDX files from the source db, plus the correct standard engine, installed on the same hardware and operating system it came from.
Long story short: on the source machine, run "dbexport" to unload all the data to ASCII files, and run "dbschema" to generate all the table schemas and indexes. It also wouldn't hurt to run "bcheck" on all the files before unloading them to ASCII flat files.
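A hedged sketch of that sequence (the database name and paths are made up, and exact option syntax varies between SE versions, so check each utility's usage output first):

# on the source (SE) machine
bcheck -y stores_db.dbs/*.idx                       # sanity-check the C-ISAM index files
dbschema -d stores_db -t all stores_db_schema.sql   # capture table and index definitions
dbexport -o /data/export stores_db                  # unload every table to ASCII .unl files

# copy /data/export/stores_db.exp to the target machine, then load it into IDS
dbimport -i /data/export stores_db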
I don't have any Informix-specific advice but for situations like this you can usually find the answer by looking up how to move a database (a common admin task, and usually well described in the manual) and just skipping the steps that would remove the old database.
Also, be careful of problems caused by different system architectures; some DBs fail spectacularly if you move them from a big-endian system (such as Solaris) to a little-endian system (such as x86 Linux). Again, the manual section on moving a DB should cover any extra steps that are needed.