Index method for a column used only for ordering - postgresql

I have a table product_images with a foreign key product_id and an integer field order to manually set the order of a product's images. Knowing that the table will be used only like this:
SELECT * FROM product_images
WHERE product_id = ?
ORDER BY "order"
-- what is the optimal index method for product_id and order?
Is this enough?
CREATE INDEX product_images_unique_order
ON "product_images"("product_id", "order");

Yes, that should do it.
PostgreSQL might still decide not to use that index, depending on how many rows the table has, how many images a given product_id has, how scattered the rows with the same product_id are across the table, how wide the rows of product_images are, and many other things.
But by having that index you provide PostgreSQL with the opportunity to use it.
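One way to see whether the planner actually picks the index is to ask for the plan. A minimal sketch, assuming the product_images table and index from the question and a made-up product_id of 42:

```sql
-- 42 is a hypothetical product_id; substitute a real one from your data.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM product_images
WHERE product_id = 42
ORDER BY "order";
```

If the plan shows an index scan on product_images_unique_order and no separate Sort node, the index is serving both the filter and the ordering.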

Related

Use index to speed up query using values from different tables

I have a table products, a table orders and a table orderProducts.
Products have a name as a PK (apple, banana, mango) and a price.
orders have a created_at date and an id as a PK.
orderProducts connects orders and products, so they have a product_name and an order_id. Now I would like to show all orders for a given product that happened in the last 24 hours.
I use the following query:
SELECT
    orders.id,
    orders.created_at,
    products.name,
    products.price
FROM orderProducts
JOIN products ON products.name = orderProducts.product
-- "order" is a reserved word, so the column name must be quoted
JOIN orders ON orders.id = orderProducts."order"
WHERE products.name = 'banana'
  AND orders.created_at BETWEEN NOW() - INTERVAL '24 HOURS' AND NOW()
ORDER BY orders.created_at;
This works, but I would like to optimize this query with an index. This index would need to be ordered:
first by the product name, so it can be filtered,
then by the created_at of the order in descending order, so it can select only the orders from the last 24 hours.
The problem is that, from what I have seen, an index can only be created on a single table, without the possibility of joining another table's values into it. Since two individual indexes do not solve this problem either, I was wondering whether there is an alternative way to optimize this particular query.
Here are the table scripts:
CREATE TABLE products
(
name text PRIMARY KEY,
price integer
);
CREATE TABLE orders
(
id SERIAL PRIMARY KEY,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE orderProducts
(
product text REFERENCES products(name),
"order" integer REFERENCES orders(id)
);
First of all, please do not put indexes everywhere: every extra index slows down inserts, updates and deletes.
As proposed by @Laurenz Albe, do not guess, check.
Other than that, note that you already know the product name, and the price is repeated in every row, so you could fetch it once in a separate query. Whether two queries are faster than a single one in your case is something you have to measure.
Please read the docs. I would try this index:
create index orders_id_created_at on orders(created_at desc, id);
Normally id would go first, since it is unique, but here the planner should be able to use the index for both predicates: the range filter in WHERE and the join. I am only guessing here.
For orderProducts I would like to see an index on both columns, although this query should only need one of them. In practice you go either from products to orders or the other way around, and both paths are possible, which is why I suggest indexing both columns. I would use two separate indexes (written here with the column names product_id and order_id; in the question's schema the columns are product and "order"):
create index orderproducts_product_id on orderproducts (product_id) include (order_id);
create index orderproducts_order_id on orderproducts (order_id) include (product_id);
This probably does not change much, but the idea is to let the query be answered from the index alone, without visiting the table itself.
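To follow the "do not guess, check" advice with the schema exactly as posted in the question, something like the following could be used. The index name is made up for illustration, and the columns are product and "order" because that is what the question's orderProducts table defines:

```sql
-- Hypothetical index name; columns as defined in the question's schema.
CREATE INDEX orderproducts_product_order_idx
    ON orderProducts (product, "order");

-- Show the plan actually chosen, with timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT orders.id, orders.created_at, products.name, products.price
FROM orderProducts
JOIN products ON products.name = orderProducts.product
JOIN orders ON orders.id = orderProducts."order"
WHERE products.name = 'banana'
  AND orders.created_at BETWEEN now() - interval '24 hours' AND now()
ORDER BY orders.created_at;
```

Comparing the output before and after creating an index shows whether the planner uses it at all on your data.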
These rules are important in terms of performance:
An integer index is faster than a string index, so try to make primary keys integers, because joins between tables use the primary keys too.
If WHERE clauses always use two fields together, create an index on both fields.
Foreign keys are not indexed automatically; you must create an index on foreign-key columns manually.
So the recommended table scripts are:
CREATE TABLE products
(
id serial primary key,
name text,
price integer
);
CREATE UNIQUE INDEX products_name_idx ON products USING btree (name);
CREATE TABLE orders
(
id SERIAL PRIMARY KEY,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX orders_created_at_idx ON orders USING btree (created_at);
CREATE TABLE orderProducts
(
product_id integer REFERENCES products(id),
order_id integer REFERENCES orders(id)
);
CREATE INDEX orderproducts_product_id_idx ON orderproducts USING btree (product_id, order_id);
---- OR ----
CREATE INDEX orderproducts_product_id ON orderproducts (product_id);
CREATE INDEX orderproducts_order_id ON orderproducts (order_id);
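With integer keys the original query has to join through the new id columns. A sketch of how it might look against the tables above, assuming the same 'banana' filter and 24-hour window as in the question:

```sql
-- Same result columns as the question's query, joined via integer keys.
SELECT o.id, o.created_at, p.name, p.price
FROM products AS p
JOIN orderProducts AS op ON op.product_id = p.id
JOIN orders AS o ON o.id = op.order_id
WHERE p.name = 'banana'
  AND o.created_at BETWEEN now() - interval '24 hours' AND now()
ORDER BY o.created_at;
```

The unique index on products(name) can serve the name lookup, and the (product_id, order_id) index can serve the first join. Whether the planner uses them depends on the data.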

Reading an append-only list from PostgreSQL

I would like to implement an append-only list in PostgreSQL. Basically, this is trivial: Create a table, and only ever INSERT into that table.
However, I would like to be able to read that list again, in the order it was created. How can I do this? Is a simple SELECT * FROM MyTable enough? If not, what do I sort by?
Rows in a relational database have no inherent sort order. The only way to get a guaranteed sort order is to use an order by.
You can either create an identity column that is incremented on every insert or a timestamp column that records the precise time a row was inserted (or do both).
e.g.
create table append_only
(
id bigint generated always as identity,
... other columns ...
created_at timestamp default clock_timestamp()
);
Then use that column for an order by. By having both, you can use the id column as a tie breaker when sorting by the timestamp in case two rows were inserted in exactly the same microsecond.
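Putting this together, a minimal sketch. The table name append_only_demo and the payload column are placeholders standing in for the real table and columns:

```sql
CREATE TABLE append_only_demo
(
    id         bigint GENERATED ALWAYS AS IDENTITY,
    payload    text,                                -- hypothetical column
    created_at timestamp DEFAULT clock_timestamp()
);

INSERT INTO append_only_demo (payload) VALUES ('first');
INSERT INTO append_only_demo (payload) VALUES ('second');

-- Read the list back in insertion order; id breaks ties on equal timestamps.
SELECT id, payload, created_at
FROM append_only_demo
ORDER BY created_at, id;
```

With concurrent writers, both the id and the timestamp reflect when a row was inserted, not when its transaction committed.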
You could create a column with the data type SERIAL (similar to AUTOINCREMENT/SEQUENCE):
CREATE TABLE myTable(id SERIAL, ...);
SELECT * FROM myTable ORDER BY id;

Ideal postgres index for non unique varchar column

I need to create a varchar category column in a table and search for rows that belong to a particular category.
i.e. ALTER TABLE items ADD COLUMN category VARCHAR(30)
The number of categories is very small (repeated across the table)
and the intention is to only use = in the where clause.
i.e. select * from items where category = 'food'
What kind of index would be ideal in postgres?
Especially if the table is never expected to be too big (less than 5,000 rows always)
This is a textbook use case for a hash index: you have a very small number of distinct values and only use the equality operator to query them. A hash index stores a relatively small hash of the value instead of the value itself, which allows for faster querying.
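A sketch of what that could look like for the items table from the question; the index name is made up:

```sql
-- Hypothetical index name.
CREATE INDEX items_category_hash_idx ON items USING hash (category);

-- Check whether the planner actually uses it for the equality lookup.
EXPLAIN ANALYZE
SELECT * FROM items WHERE category = 'food';
```

Hash indexes are only crash-safe from PostgreSQL 10 onward. On a table of a few thousand rows the planner may prefer a sequential scan whichever index exists, so it is worth looking at the plan.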

Compute shared hstore key names in Postgresql

If I have a table with an HSTORE column:
CREATE TABLE thing (properties hstore);
How could I query that table to find the hstore key names that exist in every row?
For example, if the table above had the following data:
properties
-------------------------------------------------
"width"=>"b", "height"=>"a"
"width"=>"b", "height"=>"a", "surface"=>"black"
"width"=>"c"
How would I write a query that returned 'width', as that is the only key that occurs in each row?
skeys() will give me all the property keys, but I'm not sure how to aggregate them so I only have the ones that occur in each row.
The manual gets us most of the way there, but not all the way... way down at the bottom of http://www.postgresql.org/docs/8.3/static/hstore.html under the heading "Statistics", they describe a way to count keys in an hstore.
If we adapt that to your sample table above, you can compare the counts to the # of rows in the table.
SELECT key
FROM (SELECT (each(properties)).key FROM thing) AS stat
GROUP BY key
HAVING count(*) = (select count(*) from thing)
ORDER BY key;
If you want to find the opposite (all those keys that are not in every row of your table), just change the = to < and you're in business!
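For a self-contained way to try this, the sample data from the question can be loaded into a scratch table. The table name thing_demo is made up to avoid clashing with an existing table, and this variant uses skeys(), which the question mentions, instead of each():

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- Hypothetical scratch table holding the question's three sample rows.
CREATE TABLE thing_demo (properties hstore);

INSERT INTO thing_demo (properties) VALUES
    ('"width"=>"b", "height"=>"a"'),
    ('"width"=>"b", "height"=>"a", "surface"=>"black"'),
    ('"width"=>"c"');

-- Keys whose row count equals the total number of rows.
SELECT key
FROM (SELECT skeys(properties) AS key FROM thing_demo) AS k
GROUP BY key
HAVING count(*) = (SELECT count(*) FROM thing_demo)
ORDER BY key;
-- returns a single row: width
```

This works because keys are unique within one hstore value, so each row contributes a given key at most once.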

How multiple indexes in postgres work on the same column

I'm not really sure how multiple indexes work on the same column.
So let's say I have an id column and a country column, with one index on id and another index on id and country. When I look at my query plan, it looks like it's using both of those indexes. How does that work? Can I force it to use just the id and country index?
Also, is it bad practice to do that? When is it a good idea to index the same column multiple times?
It is common to have indexes on both (id) and (country, id), or alternatively (country) and (country, id), if you have queries that benefit from each of them. You might also have (id) and (id, country) if you want the "covering" index on (id, country) to support index-only scans but still need the standalone index to enforce a unique constraint.
In theory you could just have (id,country) and still use it to enforce uniqueness of id, but PostgreSQL does not support that at this time.
You could also sensibly have different indexes on the same column if you need to support different collations or operator classes.
If you want to force PostgreSQL not to use a particular index, to see what happens with it gone, you can drop it in a transaction and then roll it back when done:
BEGIN; drop index table_id_country_idx; explain analyze select * from ....; ROLLBACK;
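Spelled out against a concrete setup, this might look as follows. The table and index names are hypothetical, chosen only to mirror the (id) plus (id, country) case described above:

```sql
-- Hypothetical table and index names, for illustration only.
CREATE TABLE places
(
    id      integer NOT NULL,
    country text    NOT NULL
);
CREATE UNIQUE INDEX places_id_idx ON places (id);
CREATE INDEX places_id_country_idx ON places (id, country);

BEGIN;
-- The drop is only visible inside this transaction.
DROP INDEX places_id_country_idx;
EXPLAIN ANALYZE SELECT id, country FROM places WHERE id = 1;
-- Undo the drop; the index is back exactly as it was.
ROLLBACK;
```

DROP INDEX takes an exclusive lock on the table until the transaction ends, so on a busy system keep the transaction short.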