Moving data from a transaction table to a history table to increase insert performance (PostgreSQL)

I have 3 database tables, each containing 6 million rows and growing by 3 million rows every year.
Here is the table information:
Table 1: 20 fields with 50 characters on average in each field. It has 2 indexes, both on timestamp fields.
Table 2: 5 fields, 2 byte-array fields and 1 XML field
Table 3: 4 fields, 1 byte-array field
Here is the usage:
15 to 20 records per second are inserted into each table.
A view is created by joining the first 2 tables, and selects are mostly based on the date field in the first table.
Right now, inserting one record into each of the three tables together takes about 100 milliseconds.
I'm planning to migrate from Postgres 8.4 to 9.2, and I would like to do some optimization for insert performance as well. I'm also planning to create history tables and keep the old records in those tables. I have the following questions in this regard:
Will creating history tables and moving older data into them help increase insert performance?
If it helps, how often do I need to move the old records into the history tables: daily, weekly, monthly, or yearly?
If I keep only one month of data (220,000 rows) instead of one year (3 million rows), will it help improve insert performance?
Thanks in advance,
Sudheer

I'm sure someone better informed than I will show up and provide a better answer, but my impression is that:
Insert performance is mostly a function of your indexing strategy and your hardware
Performance, in general, is better under 9.0+ than 8.4, and this may rub off on insert performance, but I'm not certain of that.
None of your ideas are going to directly affect insert performance
Now, that said, the cost of maintaining a small index is lower than that of a large one, so it may be that creating history tables and moving old data there will improve performance simply by reducing index pressure. But I would expect dropping one of your indexes to have a more direct and greater effect. Perhaps you could have a history table with both indexes and maintain just one of them on the "today" table?
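A minimal sketch of that last idea, with made-up table and column names (nothing here comes from the question): keep only one timestamp index on the hot table, give the history table both, and move old rows across with a periodic job. The writable CTE form needs PostgreSQL 9.1 or later, so it would work on the planned 9.2.

    -- Hypothetical names: txn is the existing hot table, created_at its timestamp column.
    -- History table with the same shape; it can afford to keep both indexes.
    CREATE TABLE txn_history (LIKE txn INCLUDING ALL);

    -- On the hot table, drop the index that only historical reports need,
    -- so every insert maintains one index instead of two.
    -- DROP INDEX txn_updated_at_idx;

    -- Periodic job (e.g. nightly): move rows older than 30 days in one statement.
    WITH moved AS (
        DELETE FROM txn
        WHERE created_at < now() - interval '30 days'
        RETURNING *
    )
    INSERT INTO txn_history
    SELECT * FROM moved;

Whether this actually pays off is something only the kind of production-like benchmark described next can confirm.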
If I were in your shoes, I'd get a copy of production going on my machine running 8.4 with a similar configuration. Then upgrade to 9.2 and see if the insert performance changes. Then try out these ideas and benchmark them, see which ones improve the situation. It's absolutely essential that things be kept as similar to production as possible for this to yield useful information, but it will certainly be better information than any hypothetical answer you might get.
Now, 100ms seems pretty slow for inserting one row IMO. Better hardware would certainly improve this situation. The usual suggestion would be a big striped RAID array with a battery-backed cache. PostgreSQL 9.0 High Performance has more information on all of this.

Related

How to decrease size of a large postgresql table

I have a PostgreSQL table that is "frozen", i.e. no new data is coming into it. The table is strictly used for reading purposes. It contains about 17M records, has 130 columns, and can be queried in multiple different ways. To make the queries faster, I created indices for all combinations of filters that can be used. So I have a total of about 265 indexes on the table. Each index is about 1.1 GB, which makes the total table size around 265 GB. I have vacuumed the table as well.
Question
Is there a way to further bring down the disk usage of this table?
Is there a better way to handle queries for "frozen" tables that never get any data entered into them?
If your table or indexes are bloated, then VACUUM FULL tablename could shrink them. But if they aren't bloated, then this won't do any good. It is not a benign operation: it will lock the table for a period of time (needing to rebuild hundreds of indexes, probably a long period of time) and generate large amounts of IO and WAL, the last of which will be especially troublesome for replicas. So I would test it on a non-production clone to see whether it actually shrinks things, and to see about how long a maintenance window you will need to declare.
Other than that, be more judicious in your choice of indexes. How did you get the list of "all combinations of filters that can be used"? Was it by inspecting your source code, or just by tackling slow queries one by one until you ran out of slow queries? Maybe you can look at snapshots of pg_stat_user_indexes taken a few days apart to see if all of them are actually being used.
Are these mostly two-column indexes?
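A concrete way to take those pg_stat_user_indexes snapshots (standard statistics views only, nothing custom): run this query now and again in a few days; any index whose idx_scan never moves is a candidate for dropping.

    -- Indexes with no index scans since statistics were last reset,
    -- largest first, so the easiest wins are at the top.
    SELECT schemaname,
           relname       AS table_name,
           indexrelname  AS index_name,
           idx_scan,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0
    ORDER BY pg_relation_size(indexrelid) DESC;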

postgres many tables vs one huge table

I am using a PostgreSQL database.
My application manages many objects of the same type.
For each object, my application performs intense database writing: each object has a row inserted into the database at least once every 30 seconds. I also need to retrieve the data by object ID.
My question is: how is it best to design the database? Use one huge table for all the objects (slower inserts) or one table per object (more complicated retrievals)?
Tables are meant to hold a huge number of objects of the same type. So your second option, that is, one table per object, doesn't seem right. But of course, more information is needed.
My tip: start with one table. If you run into problems - mainly performance - try to split it up. It's not that hard.
Logically, you should use one table.
However, the so-called "write amplification" problem exhibited by PostgreSQL seems to have been one of the main reasons why Uber switched from PostgreSQL to MySQL. Quote:
"For tables with a large number of secondary indexes, these
superfluous steps can cause enormous inefficiencies. For instance, if
we have a table with a dozen indexes defined on it, an update to a
field that is only covered by a single index must be propagated into
all 12 indexes to reflect the ctid for the new row."
Whether this is a problem for your workload, only measurement can tell - I'd recommend starting with one table, measuring performance, and then switching to multi-table (or partitioning, or perhaps switching the DBMS altogether) only if the measurements justify it.
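One cheap measurement along those lines, using the standard PostgreSQL statistics views: compare HOT (heap-only tuple) updates to total updates per table. A low HOT percentage on a heavily indexed table means most updates really are touching every index, which is exactly the amplification described in the quote.

    -- Share of updates that were HOT, i.e. did not have to touch any index.
    SELECT relname,
           n_tup_upd,
           n_tup_hot_upd,
           round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1) AS hot_update_pct
    FROM pg_stat_user_tables
    ORDER BY n_tup_upd DESC;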
A single table is probably the best solution if you are certain that all objects will continue to have the same attributes.
INSERT does not get significantly slower as the table grows – it is the number of indexes that slows down data modification.
I'd rather be worried about data growth. Do you have a design for getting rid of old data? Big DELETEs can be painful; sometimes partitioning helps.
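For that last point, range partitioning by time turns the cleanup into a metadata operation instead of a big DELETE. A minimal sketch with hypothetical names, using declarative partitioning (PostgreSQL 11+ syntax for the parent-level index):

    -- One table for all objects, partitioned by month so old data can be
    -- removed by dropping a partition rather than running DELETE.
    CREATE TABLE measurements (
        object_id   bigint      NOT NULL,
        recorded_at timestamptz NOT NULL,
        payload     text
    ) PARTITION BY RANGE (recorded_at);

    CREATE TABLE measurements_2024_01 PARTITION OF measurements
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
    CREATE TABLE measurements_2024_02 PARTITION OF measurements
        FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

    -- Retrieval by object id still goes through the parent table.
    CREATE INDEX ON measurements (object_id, recorded_at);

    -- Retiring an old month is near-instant compared to a mass DELETE.
    DROP TABLE measurements_2024_01;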

Performance improvement for fetching records from a Table of 10 million records in Postgres DB

I have an analytic table that contains 10 million records, and to produce charts I have to fetch records from it. Several other tables are also joined to this table when the data is fetched, but it takes around 10 minutes even though I have indexed the join column and used materialized views in Postgres. Performance is still very low: it takes 5 minutes to execute the select query against the materialized view.
Please suggest a technique to get the result within 5 seconds. I don't want to change the DB storage structure, as too many code changes would be needed to support it. I would like to know if there are built-in methods for query speed improvement.
Thanks in advance
In general, you can take care of this issue by creating a better data structure (most engines do this for you to an extent with keys).
But if you were to create a sorting column of sorts and build a tree-like structure over it, then you'd be looking at a search cost of O(log N) rather than what you may be facing right now. This will ensure you always have a huge speed-up in your searches.
This is in regard to binary trees, red-black trees, and so on.
Another option for a speedup may be to make use of something along the lines of Redis, i.e. a caching layer in front of the database.
For analytical workloads in the past I have also chosen to make use of technologies related to Hadoop, though this may be a larger migration in your case at this point.
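In PostgreSQL terms, the "sorting column plus tree structure" suggested above is simply a B-tree index, and the materialized view the question already mentions can be indexed too. A sketch under stated assumptions: every table and column name below is made up, since the real schema is not shown; the idea is to pre-aggregate to roughly chart granularity, then index the column the charts filter by.

    -- Hypothetical pre-aggregation of the 10M-row analytics table.
    CREATE MATERIALIZED VIEW chart_data AS
    SELECT a.event_date,
           c.name        AS category,
           count(*)      AS events,
           sum(a.amount) AS total_amount
    FROM   analytics a
    JOIN   categories c ON c.id = a.category_id
    GROUP  BY a.event_date, c.name;

    -- B-tree index on the filter column: lookups are O(log N), not a full scan.
    CREATE INDEX ON chart_data (event_date);

    -- Rebuild on a schedule; REFRESH ... CONCURRENTLY (9.4+) needs a unique index.
    REFRESH MATERIALIZED VIEW chart_data;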

Entity Framework Code First - Table name

Hi, I am building a database with a couple of tables (products, orders, customers), and I am wondering whether it is possible to do such a trick: generate a table every day based on the orders table, named after the current day, because the orders table will gain about 1000 or more rows every day and that will hurt application speed.
1000 rows is nothing. What database are you using? Most modern databases can handle millions of rows with no issue, as long as you put some effort into proper indexing of the table.
From your comment, I'm assuming you don't know about database table indexing.
A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of slower writes and increased storage space. Indices can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
From http://en.wikipedia.org/wiki/Database_index
You need to add indexes to your database tables to ensure they can be searched optimally.
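For example (PostgreSQL-flavoured SQL to match the rest of this page; the orders table comes from the question, but the column names here are assumptions), indexing the columns you filter and join on keeps per-day queries fast without creating a table per day:

    -- Index the columns the application filters and joins on.
    CREATE INDEX idx_orders_order_date  ON orders (order_date);
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);

    -- "Today's orders" stays fast even as the table grows.
    SELECT *
    FROM   orders
    WHERE  order_date >= current_date
    AND    order_date <  current_date + 1;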
What you are suggesting is a bad idea IMO, and it's going to make working with the application a pain. Instead, if you really fill this table with vast amounts of data you could consider periodically archiving old data, but don't do this until you really need to.

Improve insert performance massively

In my application I need to massively improve insert performance. Example: a file with about 21K records takes over 100 minutes to insert. There are reasons it can take some time, like 20 minutes or so, but over 100 minutes is just too long.
Data is inserted into 3 tables (many-to-many). IDs are generated from a sequence, but I have already googled and set hibernate.id.new_generator_mappings = true and the allocationSize + sequence increment to 1000.
Also, the amount of data is not anything extraordinary at all: the file is 90 MB.
I have verified with visual vm that most of the time is spent in jdbc driver (postgresql) and hibernate. I think the issue is related to a unique constraint in the child table. The service layer makes a manual check (=SELECT) before inserting. If the record already exists, it reuses it instead of waiting for a constraint exception.
So to sum it up: for this specific file there will be 1 insert per table per record (it could be different, but not for this file, which is the ideal (fastest) case). That means a total of 60k inserts + 20k selects. Still, over 100 minutes seems very long (yes, hardware counts, and it is a simple PC with a 7200 rpm drive, no SSD or RAID). However, this is an improved version of a previous application (plain JDBC) in which the same insert on this hardware took about 15 minutes. Considering that in both cases about 4-5 minutes is spent on "pre-processing", the increase is massive.
Any tips on how this could be improved? Is there any batch-loading functionality?
See spring-data JPA: manual commit transaction and restart new one.
Add entityManager.flush() and entityManager.clear() after every n-th call to the save() method. If you use Hibernate, also add hibernate.jdbc.batch_size=100, which seems like a reasonable choice.
Performance increase was > 10x, probably close to 100x.
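On the batch-loading question itself: if the initial load can bypass Hibernate, PostgreSQL's COPY is the fastest bulk path, and the per-row uniqueness SELECT can become one set-based anti-join. A rough sketch with a hypothetical staging table and column names (COPY FROM a file path runs on the server; use psql's \copy or the JDBC driver's CopyManager for client-side files):

    -- One transaction: bulk-load into a staging table, then insert only the
    -- rows whose natural key is not already present in the child table.
    BEGIN;

    CREATE TEMP TABLE staging (LIKE child_table INCLUDING DEFAULTS);

    COPY staging (natural_key, payload)
    FROM '/path/to/file.csv' WITH (FORMAT csv);

    INSERT INTO child_table (natural_key, payload)
    SELECT s.natural_key, s.payload
    FROM   staging s
    LEFT   JOIN child_table c ON c.natural_key = s.natural_key
    WHERE  c.natural_key IS NULL;   -- reuse existing rows, insert the rest

    COMMIT;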
Sounds like a database problem. Check whether your tables use InnoDB or MyISAM; the latter is, in my experience, very slow with inserts and is the default for new databases. Remove foreign keys as far as you can.
If your problem really is related to a single unique index, InnoDB could do the trick.