Create an index over distinct values of a column in PostgreSQL - postgresql

I have a PostgreSQL table that looks like this:
CREATE TABLE items (
name TEXT NOT NULL,
value TEXT NOT NULL,
PRIMARY KEY (name, value)
);
I frequently do a query to see what values are available:
SELECT DISTINCT value FROM items;
How do I create an index in PostgreSQL so that the above query does not have to iterate over the entire items table?

Using a completely different query, you can force PostgreSQL to use an index and get the equivalent of DISTINCT on a column:
https://wiki.postgresql.org/wiki/Loose_indexscan
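For reference, a minimal sketch of the loose index scan technique from that wiki page, adapted to the items table above. It assumes a btree index whose leading column is value; the composite primary key (name, value) will not help here, so an extra index such as CREATE INDEX ON items (value) is assumed:
-- repeatedly jump to the next value greater than the one already found,
-- so each step is a single index probe instead of a full scan
WITH RECURSIVE distinct_values AS (
    (SELECT value FROM items ORDER BY value LIMIT 1)
    UNION ALL
    SELECT (SELECT value FROM items WHERE value > dv.value ORDER BY value LIMIT 1)
    FROM distinct_values dv
    WHERE dv.value IS NOT NULL
)
SELECT value FROM distinct_values WHERE value IS NOT NULL;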

Related

Convert PostgreSQL JSONB column results for use in condition with IN

I have a table with a JSONB column that is used to store multiple tags (integers) that have been applied to a task, e.g. '[123, 456, 789]'.
ALTER TABLE "public"."task" ADD COLUMN "tags" jsonb;
I also have a table dedicated to storing all the tags that can be used, and the primary key of each record is used in my JSONB column of the task table.
CREATE TABLE public.tag (
tag_id serial NOT NULL PRIMARY KEY,
label varchar(50) NOT NULL
);
In this table (tag) I have an index on the tag ID (the primary key), and I want that index to be used in a query that returns the labels of the tags that were used in a task.
SELECT * FROM task, tag WHERE task.tags @> to_jsonb(tag.tag_id)
Using to_jsonb is really bad as it doesn't use my table's index, but if I change the SQL to something like the example below, the index is used and SQL performance is much better.
SELECT * FROM tag WHERE tag.tag_id IN (123, 456, 789)
How do I convert the jsonb column (task table) to a set of integer values that can be used with the IN condition, as in the example below?
SELECT * FROM task, tag WHERE tag.tag_id IN (task.tags);
You can use the PostgreSQL jsonb_array_elements function, which converts the elements of a JSON array into a set of rows. For example:
SELECT * FROM task, tag WHERE tag.tag_id in (
select jsonb_array_elements('[200, 100, 789]'::jsonb)::int4 as json_data
);
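Applied to the actual column rather than a literal, a sketch of the same idea using a LATERAL join (the task and tag tables are the ones from the question; jsonb_array_elements_text is used here to avoid version-specific jsonb-to-integer casts):
-- expand the tags array into one row per tag id, then join against the tag table
SELECT t.*, tg.label
FROM task AS t
CROSS JOIN LATERAL jsonb_array_elements_text(t.tags) AS e(tag_id_text)
JOIN tag AS tg ON tg.tag_id = e.tag_id_text::int4;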
But for the best performance, if the JSON data comes from a table column, you should index that column, and not with the standard btree index type. For JSONB columns PostgreSQL offers a different index type: the GIN index. This index type gives the best performance; I use it on a table with a million records and the performance is very good. Example of creating a GIN index:
CREATE INDEX tag_table_json_index ON tag_table USING gin (json_field_name jsonb_path_ops);
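A GIN index with the jsonb_path_ops operator class supports the containment operator @>, so queries of the following shape can use it. As a sketch, assuming the index is created on the task.tags column from the question:
CREATE INDEX task_tags_gin_idx ON task USING gin (tags jsonb_path_ops);

-- containment queries like this one can use the GIN index
SELECT * FROM task WHERE tags @> '[123]'::jsonb;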

Amazon Redshift COMPOUND SORTKEY - does insertion order matter?

Let's say I've created an empty table in Redshift like this:
CREATE TABLE my_table (
val_1 INT ,
val_2 INT ,
val_3 FLOAT
)
COMPOUND SORTKEY(val_1, val_2)
;
When I first populate the table (let's say with the results of some query), should the records be inserted in the SORTKEY order, using the ORDER BY in the code below:
INSERT INTO my_table
SELECT val_1, val_2, val_3 FROM other_table
ORDER BY val_1, val_2
Or is there no need to do that; i.e. SORTKEY ordering of inserted records is handled physically by Redshift itself? Thx.
Assuming the same behaviour for INSERT INTO as for loading via the COPY command, there is no need to order the records first. According to the AWS docs, all of the following constraints must be fulfilled for the records to be added to the sorted region of the table - in your example you have a COMPOUND SORTKEY of 2 columns:
The table uses a compound sort key with only one sort column.
The sort column is NOT NULL.
The table is 100 percent sorted or empty.
All the new rows are higher in sort order than the existing rows, including rows marked for deletion. In this instance, Amazon Redshift uses the first eight bytes of the sort key to determine sort order.
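If a load does not meet those constraints, the new rows land in the unsorted region. As a sketch (assuming the my_table name from the question), a VACUUM can sort them afterwards, and svv_table_info shows how much of the table is unsorted:
-- re-sort rows that ended up in the unsorted region
VACUUM SORT ONLY my_table;

-- check the percentage of unsorted rows
SELECT "table", unsorted FROM svv_table_info WHERE "table" = 'my_table';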

Indexing PostgreSQL JSONB Array Elements

Like the title says, how can I index a JSONB array?
The contents look like...
["some_value", "another_value"]
I can easily access the elements like...
SELECT * FROM table WHERE data->>0 = 'some_value';
I created an index like so...
CREATE INDEX table_data_idx ON table USING gin ((data) jsonb_path_ops);
When I run EXPLAIN, I still see it sequentially scanning...
What am I missing on indexing an array of text elements?
If you want to support that exact query with an index, the index would have to look like this:
CREATE INDEX ON "table" ((data->>0));
If you want to use the index you have, you cannot limit the search to just a specific array element (in your case, the first). You can speed up a search for some_value anywhere in the array:
SELECT * FROM "table"
WHERE data @> '["some_value"]'::jsonb;
I ended up taking a different approach. I was still having problems getting the search to work with the JSONB type, so I switched my column to a varchar ARRAY:
CREATE TABLE "table" (
data varchar ARRAY NOT NULL
);
CREATE INDEX table_data_idx ON "table" USING GIN (data);
SELECT * FROM "table" WHERE data @> '{some_value}';
This works and is using the index.
I think the problem with my JSONB approach is that the element is actually nested much deeper and is being treated as text, i.e.
data->'some_key'->>'array_key'->>0
and every time I try to search I get all sorts of invalid token errors and other such things.
You may want to create a materialized view that has the primary key (or other unique index of your table) and expands the array field into a text column with the jsonb_array_elements_text function:
CREATE MATERIALIZED VIEW table_mv
AS
SELECT DISTINCT "table".id, jsonb_array_elements_text(data) AS array_elem FROM "table";
You can then create a unique index on this materialized view (primary keys are not supported on materialized views):
CREATE UNIQUE INDEX table_array_idx ON table_mv(id, array_elem);
Then query with a join to the original table on its primary key:
SELECT * FROM "table" INNER JOIN table_mv ON "table".id = table_mv.id WHERE table_mv.array_elem = 'some_value';
This query should use the unique index and then look up the primary key of the original table, both very fast.
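One caveat: a materialized view does not stay up to date by itself, so it has to be refreshed whenever the base table changes. The unique index created above allows the refresh to run without blocking readers:
-- non-blocking refresh; requires the unique index on the materialized view
REFRESH MATERIALIZED VIEW CONCURRENTLY table_mv;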

How to convert a simple postgresql table to hypertable or timescale db table using created_at for indexing

The problem is that when I try to convert a plain PostgreSQL table to a TimescaleDB hypertable, using the created_at field for partitioning, I get the errors shown below. The table name is orders, and cas_admin_db_new is the database name.
I have tried all the possible ways below, but the orders table does not get converted into a hypertable.
SELECT create_hypertable('orders','created_at', chunk_time_interval => 6040800000000);
ERROR: cannot create a unique index without the column "created_at" (used in partitioning)
SELECT create_hypertable('public.orders','created_at', chunk_time_interval => 6040800000000);
ERROR: cannot create a unique index without the column "created_at" (used in partitioning)
cas_admin_db_new=# SELECT create_hypertable('public.orders','created_at', chunk_time_interval => 6040800000000, created_default_indexes=>FALSE);
ERROR: function create_hypertable(unknown, unknown, chunk_time_interval => bigint, created_default_indexes => boolean) does not exist
cas_admin_db_new=# SELECT create_hypertable('"ORDER"','created_at', chunk_time_interval => 6040800000000);
ERROR: relation "ORDER" does not exist
LINE 1: SELECT create_hypertable('"ORDER"','created_at', chunk_time_...
Timescale person here. The issue is that your schema probably lists some other column as a primary key (or UNIQUE index).
TimescaleDB requires that any PK/unique index includes all partitioning keys, in your case, created_at.
That's because we do this heavy underlying partitioning, and don't want to build global lookup structures to ensure uniqueness outside of what we already use for partitioning.
More info:
https://docs.timescale.com/timescaledb/latest/how-to-guides/schema-management/indexing/#best-practices
You need to drop the current primary key on the table and create a new composite primary key, like so:
ALTER TABLE table_name ADD PRIMARY KEY (id, created_at);
But there is a problem: unfortunately, ActiveRecord doesn't support composite primary keys.
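Putting both answers together, a sketch of the full sequence. The constraint name orders_pkey and the id column are assumptions, so adjust them to the actual schema:
-- replace the single-column primary key with one that includes created_at
ALTER TABLE orders DROP CONSTRAINT orders_pkey;
ALTER TABLE orders ADD PRIMARY KEY (id, created_at);

-- the conversion now succeeds because the unique index covers the partitioning column;
-- if the table already contains rows, also pass migrate_data => true
SELECT create_hypertable('orders', 'created_at', chunk_time_interval => 6040800000000);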

Compute shared hstore key names in Postgresql

If I have a table with an HSTORE column:
CREATE TABLE thing (properties hstore);
How can I query that table to find the hstore key names that exist in every row?
For example, if the table above had the following data:
properties
-------------------------------------------------
"width"=>"b", "height"=>"a"
"width"=>"b", "height"=>"a", "surface"=>"black"
"width"=>"c"
How would I write a query that returned 'width', as that is the only key that occurs in each row?
skeys() will give me all the property keys, but I'm not sure how to aggregate them so I only have the ones that occur in each row.
The manual gets us most of the way there, but not all the way... way down at the bottom of http://www.postgresql.org/docs/8.3/static/hstore.html under the heading "Statistics", they describe a way to count keys in an hstore.
If we adapt that to your sample table above, you can compare the counts to the # of rows in the table.
SELECT key
FROM (SELECT (each(properties)).key FROM thing) AS stat
GROUP BY key
HAVING count(*) = (SELECT count(*) FROM thing)
ORDER BY key;
If you want to find the opposite (all those keys that are not in every row of your table), just change the = to < and you're in business!
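To make that concrete, a quick check against the sample data from the question (this assumes the hstore extension is installed, e.g. via CREATE EXTENSION hstore):
-- load the three sample rows from the question
INSERT INTO thing (properties) VALUES
('"width"=>"b", "height"=>"a"'),
('"width"=>"b", "height"=>"a", "surface"=>"black"'),
('"width"=>"c"');

-- the query above then returns a single row: width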