I am working with a database of approximately a million rows, using Python to parse documents and populate a table with terms. The INSERT statements work fine, but the UPDATE statements become extremely time-consuming as the table grows.
It would be great if someone could explain this phenomenon and also tell me whether there is a faster way to do updates.
Thanks,
Arnav
Sounds like you have an indexing problem. Whenever I hear about problems getting worse as the table grows, it makes me wonder whether you're doing a table scan every time you interact with the table.
Check to see if you have a primary key and meaningful indexes on that table. Look at the WHERE clause you have on that UPDATE and make sure there's an index on those columns to make finding that record as fast as possible.
UPDATE: Write a SELECT query using the WHERE clause from your UPDATE and ask the database engine for its EXPLAIN plan. If you see a TABLE SCAN, you'll know what to do.
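For example (a rough sketch; the index and column names are hypothetical, and the exact EXPLAIN syntax depends on which database engine you're using):

CREATE INDEX idx_terms_term ON terms (term);         -- index the column used in the UPDATE's WHERE clause
EXPLAIN SELECT * FROM terms WHERE term = 'example';  -- same WHERE clause as the UPDATE

If the plan still shows a full table scan instead of an index lookup, the engine has to read every row to find the one you want to update, which is exactly the behavior that gets worse as the table grows.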
In PostgreSQL,
when I create a table and don't create any index for it, will PostgreSQL automatically create a default index for the table?
When I later update and query the table several times, will PostgreSQL be smart enough to automatically create an index for me based on how and how often I update and query the table?
If not, what commands in PostgreSQL can help me manually choose an index that will improve the performance of the table?
Thanks.
No database engine will create indexes on its own. Indexes have a significant impact on performance (especially when modifying records), and it's your role to weigh the performance gain or loss and make an informed decision. The only index that is created automatically is the primary-key index.
The only thing your database engine will be "smart" about is when and how to use the indexes that already exist. This is the job of the query optimizer, which bases its decisions on complex algorithms and internal statistics.
There are tools that analyze how the database is used and suggest indexes, but the best and simplest approach is to use EXPLAIN.
https://www.postgresql.org/docs/9.5/static/sql-explain.html
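For example, a minimal sketch of creating an index by hand and then checking whether the planner actually uses it (table and column names are made up):

CREATE INDEX orders_customer_id_idx ON orders (customer_id);
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

If the output shows an Index Scan rather than a Seq Scan, the new index is being used for that query.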
I'm trying to insert a couple of million rows into a PostgreSQL database, and I am wondering what the best way to do it is.
CREATE TABLE AS
INSERT INTO
I'd like to know which one is better and why. I have read through some blog posts but still couldn't come to a conclusion.
I think INSERT INTO is a bulk insert operation; please correct me if I'm wrong. Is CREATE TABLE AS SELECT also a bulk insert operation?
Please advise.
CREATE TABLE AS is a bulk insert operation as well. The main difference is that CREATE TABLE AS is easier to optimize for PostgreSQL; it is clear that no WAL information has to be written (unless WAL-based replication is active, of course). See the wal_level documentation and Disable WAL Archival and Streaming Replication for some other cases where this optimization applies.
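As a rough illustration of the two forms (table and column names are hypothetical):

CREATE TABLE terms_copy AS SELECT * FROM terms_staging;                  -- builds a new table from the query result
INSERT INTO terms (term, doc_id) SELECT term, doc_id FROM terms_staging; -- bulk-loads into an existing table

Both statements load all matching rows in a single operation; the CREATE TABLE AS form is the one that can skip writing WAL when wal_level is minimal, as described above.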
I'm currently building an OLAP database in postgres and want to compare the performance of a column-store vs row-store database. CitusDB open-sourced its columnar-store extension cstore_fdw so I'm comparing database performance with and without this extension.
The example shows how to make a test database and query it. I have that example running. But when I try to add indexes to it, I get the error ERROR: cannot create index on foreign table "table_name". It makes sense that I can't add indexes to a foreign table, yet I still need to index that table, or else there's no way it will do well slicing or drilling into the data. How do I do this?
cstore_fdw currently doesn't support PostgreSQL indexes. However, it automatically stores min/max statistics in skip indexes, which makes execution of some queries much more efficient.
To learn more about how to use skip indexes, please consult the documentation.
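For instance, a range filter like the one below (hypothetical table and column) can benefit from those min/max skip indexes, because blocks whose stored ranges fall entirely outside the filter are skipped without being read:

SELECT count(*) FROM events_cstore WHERE event_date BETWEEN '2015-01-01' AND '2015-01-31';

This tends to work best when the data is loaded roughly sorted on the filtered column, so the per-block min/max ranges overlap as little as possible.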
I am trying to add a new column to a table with upwards of 9 million records.
The issue is that the column needs a default value of 'N'. When updating the table, the database runs into problems with the temp space filling up, and the update is also taking a huge amount of time.
I was wondering if anyone knows of any way to make this faster, or a better way of doing this that avoids filling up the temp space.
The database is Oracle 10g.
If you could move to 11g and the column were NOT NULL, Oracle has an optimization where the default value doesn't need to be stored in each row, so you can add the column very quickly. Unfortunately, it sounds like you're stuck on a deprecated version of Oracle where that isn't available.
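On 11g that fast path would look something like the following (the column name is just an example); because the column is declared NOT NULL with a DEFAULT, the default is recorded in the data dictionary rather than written into each of the 9 million rows:

ALTER TABLE big_table ADD (flag CHAR(1) DEFAULT 'N' NOT NULL);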
Most likely, you don't have a lot of really good options other than waiting. Assuming you're doing this during a period of downtime, it may be more efficient to create a new table with the new column, do a direct-path insert of all the data from the old table into the new table, rename the tables, and re-point any constraints at the new table. Whether this is actually more efficient than waiting for the update depends on your hardware and your table, but an INSERT is likely to be more efficient than an UPDATE. On the other hand, for a new single-character column that isn't going to create a lot of migrated rows, you're probably better off waiting for the UPDATE rather than going to this level of effort; there are a lot of things that could potentially go wrong that you'd need to test and validate (e.g. making sure that you updated all the constraints correctly).
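A rough sketch of that rebuild approach (table and column names are hypothetical, and this ignores the indexes, grants, triggers and constraints you would also need to recreate):

CREATE TABLE big_table_new AS SELECT * FROM big_table WHERE 1 = 0;         -- copy the structure only
ALTER TABLE big_table_new ADD (flag CHAR(1) DEFAULT 'N' NOT NULL);         -- new column with its default
INSERT /*+ APPEND */ INTO big_table_new SELECT t.*, 'N' FROM big_table t;  -- direct-path insert
COMMIT;
RENAME big_table TO big_table_old;
RENAME big_table_new TO big_table;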
I've got PostgreSQL 9.2 and a tiny database with just a bit of seed data for a website that I'm working on.
The following query seems to run forever:
ALTER TABLE diagnose_bodypart ADD description text NOT NULL;
diagnose_bodypart is a table with less than 10 rows. I've let the query run for over a minute with no results. What could be the problem? Any recommendations for debugging this?
Adding a column does not require rewriting a table (unless you specify a DEFAULT). It is a quick operation absent any locks. pg_locks is the place to check, as Craig pointed out.
In general, the most likely cause is long-running transactions. I would look at what workflows are hitting these tables and how long the transactions are staying open. Locks of this sort are typically transactional, so committing those transactions will usually fix the problem.
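For example, something like this (a sketch; pg_locks and pg_stat_activity are standard system views, and pg_stat_activity has pid, state and query columns in 9.2) shows which sessions hold or are waiting on locks against the table:

SELECT l.pid, l.mode, l.granted, a.state, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'diagnose_bodypart'::regclass;

The row with granted = false is the blocked ALTER TABLE; the granted rows on the same relation point at the sessions whose open transactions are holding it up.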