If I had a table of products and another table of manufacturers, and I wanted that table to have a count of products, is there a way in postgres to say "this column equals the number of rows in this other table that meet this condition"?
EDIT: I mean to say that the column value will be automatically calculated. So if I have a table with a column for the number of products that are red, I want this column to consistently equal the number of rows that result from doing select * from products where color='red';, without having to consistently perform that query myself.
You should generally not store calculated values in an operational database. If it's a data warehouse, go ahead.
You can use a view to do the calculation for you.
http://sqlfiddle.com/#!15/0b744/1
You can use a materialized view to increase performance, and refresh it with a trigger on the products table.
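To illustrate the plain-view approach, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in so it runs anywhere; the CREATE VIEW statement itself should work the same way in Postgres. The table layout, the view name product_color_counts and the helper red_count are assumptions made up for this example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, color TEXT);
    INSERT INTO products (name, color) VALUES
        ('apple', 'red'), ('cherry', 'red'), ('lime', 'green');

    -- The view stores no data; the count is recomputed on every read.
    CREATE VIEW product_color_counts AS
        SELECT color, COUNT(*) AS product_count
        FROM products
        GROUP BY color;
""")

def red_count(conn):
    """Return the current number of red products as reported by the view."""
    row = conn.execute(
        "SELECT product_count FROM product_color_counts WHERE color = 'red'"
    ).fetchone()
    return row[0] if row else 0

before = red_count(conn)
conn.execute("INSERT INTO products (name, color) VALUES ('tomato', 'red')")
after = red_count(conn)
print(before, after)
```

The count follows the products table without any manual bookkeeping, because the view is just a stored query.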
Related
I have imported a large database into AnyLogic, with various columns. Rows can be selected using the unique primary key of the database table. Similarly, how can I move through columns using integer indexes?
The attached picture shows the selection query for the encircled cell. To get to another cell I need to change the column in the query again, which is surely not efficient.
I have a relation where I'll be querying the table by two columns, e.g. findByXIdAndStatus.
select * from X where XId = '12345' and status = 'INACTIVE'
Status is a column which holds two values, ACTIVE or INACTIVE. Is it sufficient to create an index on XId column or do I need to create a composite index on both XId and Status or should I create multiple indexes on both columns. Currently I am using postgres DB.
The most performant index would probably cover the entire WHERE clause, and so would be on (XId, status). However, if the XId column by itself is already very selective, then Postgres might do just as well with a single-column index on XId. In that case, Postgres would have to visit the table (the heap) for each matching index entry to read the status value and complete the filtering. In any case, you can try both types of index, check the EXPLAIN plans, and then decide which option to use.
The single-column index on xid is probably good enough.
If only a small percentage of the rows are inactive, a partial index would be beneficial:
CREATE INDEX ON x (xid) WHERE status = 'INACTIVE';
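A quick way to see the partial index being picked up is sketched below. It runs against SQLite through Python's sqlite3 module, purely as a runnable stand-in: SQLite also supports partial indexes, but its planner and its EXPLAIN QUERY PLAN output differ from Postgres, so on a real system you would check EXPLAIN there. The index name x_inactive_xid and the sample data are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (xid TEXT, status TEXT)")

# Sample data: only 1 row in 100 is INACTIVE.
rows = [(str(i), "INACTIVE" if i % 100 == 0 else "ACTIVE") for i in range(1000)]
conn.executemany("INSERT INTO x VALUES (?, ?)", rows)

# Partial index: only the INACTIVE rows are indexed.
conn.execute("CREATE INDEX x_inactive_xid ON x (xid) WHERE status = 'INACTIVE'")

# The status condition is written as a literal so the planner can see
# that it matches the index predicate.
query = "SELECT * FROM x WHERE xid = ? AND status = 'INACTIVE'"

plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("500",)).fetchall()
plan_text = " ".join(str(r[-1]) for r in plan)  # last column is the plan detail
matches = conn.execute(query, ("500",)).fetchall()

print(plan_text)
print(matches)
```

Because the index only contains the rare INACTIVE rows, it stays small, and queries that filter on a different status simply cannot use it.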
I have a table in PostgreSQL,
but the problem is that my data isn't organized in proper date order.
For example, the first row of my table is '2017-05-30', and last row is '2017-02-23'.
So I want to "sort" my table by date.
I'm not asking about
SELECT * FROM MY_TABLE ORDER BY DATE;
I want to "update" my table.
How can I do this?
You can't sort a PostgreSQL table in the sense you ask.
In relational algebra, the order of the rows is unimportant, and there is no guarantee that rows in a table are stored in any specific order. There is also no way to ensure that rows are returned in a particular order unless you request it explicitly, e.g. with ORDER BY. Otherwise, you shouldn't rely on the order of the returned rows.
As pointed out in the comments, an RDBMS may rearrange the order of rows in query results for optimization purposes and so on.
You can, if you like, add a new sequence number field using row_number() indicating the ranks of rows with respect to your order (e.g. the date field).
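A minimal sketch of the row_number() idea follows, again using Python's sqlite3 module as a stand-in (window functions need SQLite 3.25 or later; the same SELECT should work in Postgres). The table my_table with columns day and value, and the alias seq, are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (day TEXT, value INTEGER);
    -- Rows are inserted out of date order on purpose.
    INSERT INTO my_table VALUES
        ('2017-05-30', 1), ('2017-02-23', 2), ('2017-03-15', 3);
""")

# ROW_NUMBER() assigns 1, 2, 3, ... following the date order; the outer
# ORDER BY is still what guarantees the order of the result set.
ranked = conn.execute("""
    SELECT day, value, ROW_NUMBER() OVER (ORDER BY day) AS seq
    FROM my_table
    ORDER BY seq
""").fetchall()

for row in ranked:
    print(row)
```

The sequence number only records each row's rank by date. Reading the table without ORDER BY can still return rows in any order.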
I'm having the following Redshift performance issue:
I have a table with ~ 2 billion rows, which has ~100 varchar columns and one int8 column (intCol). The table is relatively sparse, although there are columns which have values in each row.
The following query:
select colA from tableA where intCol = '111111';
returns approximately 30 rows and runs relatively quickly (~2 mins)
However, the query:
select * from tableA where intCol = '111111';
takes an undetermined amount of time (gave up after 60 mins).
I know pruning the columns in the projection is usually better but this application needs the full row.
Questions:
Is this just a fundamentally bad thing to do in Redshift?
If not, why is this particular query taking so long? Is it related to the structure of the table somehow? Is there some Redshift knob to tweak to make it faster? I haven't yet messed with the distkey and sortkey on the table, but it's not clear that those should matter in this case.
The main reason why the first query is faster is that Redshift is a columnar database. A columnar database stores table data per column, writing the data of one column into the same blocks on storage. This is different from a row-based database like MySQL or PostgreSQL. Since the first query selects only colA, Redshift does not need to read the other columns at all (apart from intCol for the filter), while the second query reads all ~100 columns, causing a huge amount of disk access.
To improve the performance of the second query, you may need to set the "sortkey" to the column used in the condition (intCol here). When a column is the sortkey, its data is stored in sorted order on storage. That reduces the cost of disk access when fetching records with a condition on that column.
Does anyone know why the order of the rows changed after I made an update to a table? Is there any way to restore the previous order, or to change to another order, e.g. alphabetical?
This is the update I performed:
update t set amount = amount + 1 where account = accountNumber
After this update, when I go and look at the table, the order has changed.
A table doesn't have a natural row order. Some database systems will actually refuse your query if you don't add an ORDER BY clause at the end of your SELECT.
Why did the order change?
Because the database engine fetches your rows in the physical order in which they come from storage. In PostgreSQL, an UPDATE writes a new version of the row, possibly at a different physical location, so the updated row can show up in a different position in an unordered SELECT. Some engines, like SQL Server, can have a CLUSTERED INDEX which forces a physical order, but it is still never really guaranteed that you get your results in that precise order.
The clustered index exists mostly as an optimization. PostgreSQL has a similar CLUSTER command to change the physical order, but it's a heavy, one-time operation which locks the table, and the order is not maintained by later writes: http://www.postgresql.org/docs/9.1/static/sql-cluster.html
How to force an alphabetical order of the rows?
Add an ORDER BY clause in your query.
SELECT * FROM table ORDER BY column