ERROR: data type tstzrange[] has no default operator class for access method "gist" in Postgres 10

I am trying to add an index to a tstzrange[] column in PostgreSQL 10. I created the column via the pgAdmin 4 GUI, set its name and data type to tstzrange[], and marked it NOT NULL, nothing more.
I then did a CREATE EXTENSION btree_gist; for the database and it worked.
Then I saw in the documentation that I should index the range and I do:
CREATE INDEX era_ac_range_idx ON era_ac USING GIST (era_ac_range);
...but then I get:
ERROR: data type tstzrange[] has no default operator class for
access method "gist"
which, frankly, I don't know what it actually means, or how to solve it. What should I do?
PS: that column is currently empty, it has no data yet.
PS2: This table describes chronological eras; there is an id, the era name (e.g. the sixties) and the timestamp range (e.g. 1960–1969).
A date is inserted by the user and I want to check which era it belongs to.

Well, you have an array of timestamp ranges as a single column. You can index an array with a GIN index and a range with (iirc) GiST or SP-GiST. However, I'm not sure how an index on a column that is both would operate. I guess you could model it as an N-dimensional r-tree or some such.
I'm assuming you want to check for overlapping ranges. Could you normalise the data and have a linked table with one range in each row?
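A sketch of what that normalised design could look like (table and column names are assumed, not from the question): one row per era, a plain tstzrange column, and a GiST index. A non-array tstzrange column has a default GiST operator class, so the index creation that failed on the array column works here.

```sql
-- Hypothetical normalized layout: one era per row, one range per row.
CREATE TABLE era (
    era_id    serial PRIMARY KEY,
    era_name  text NOT NULL,
    era_range tstzrange NOT NULL
);

-- A plain tstzrange column has a default GiST operator class:
CREATE INDEX era_range_idx ON era USING GIST (era_range);

-- "Which era does this date belong to?" uses the containment operator:
SELECT era_name
FROM era
WHERE era_range @> timestamptz '1964-07-01';
```

The containment query above is exactly the "which era does a user-supplied date belong to" lookup from the question, and it can use the GiST index.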

Related

GIN Index implementation

Generally, trigram indexes are supposed to store the trigrams of the indexed values.
I understand the structure of a GIN index and how it stores values.
The one thing I am stuck on is whether it stores the trigrams of the given texts or the texts themselves.
I've read some articles, and they all show a GIN index storing whole words, with tsvector.
Now if this is the case, a GIN index shouldn't work for searches like
SELECT * FROM table WHERE data LIKE '%word%';
But it seems to work for such a case too. I have used a database of a million rows where the column I'm searching on is a random text of size 30. I haven't used tsvector since the column is just a single word of size 30.
Example Column Value: bVeADxRVWpCeEHyNLxxfkfVkSAKkKw
But on creating a GIN index on this column with gin_trgm_ops,
the fuzzy search is much faster. It works well.
But if GIN just stores the words as shown in those articles, it shouldn't work for %word%. But it does, which leads me to ask: are GIN indexes made up of the text values themselves, or of the trigrams of the text values?
My whole question can be simplified to this:
If I create an index on a column with values like 'bVeADxRVWpCeEHyNLxxfkfVkSAKkKw', would GIN simply index this value, or would it store the trigrams of the value in its index tree (bVe, VeA, eAD, ..., kKw)?
The G in GIN stands for generalized. It just works with a list of tokens per tuple-field to be indexed, but what that token actually represents depends on the operator class to define and extract. The default operator class for tsvector uses stemmed words, the operator class "gin_trgm_ops" (which is for text, but not the default one for text) uses trigrams. An example based on one will have limited applicability to the other. To understand it in a generalized way, you need to consider the tokens to just be labels. One token can point to many rows, and one row can be pointed to by many tokens. Once you get into what the tokens mean, that is the business of the operator class, not of the GIN machinery itself.
When using gin_trgm_ops, '%word%' breaks down to 'wor' and 'ord', both of which must be present in the index (for the same row) in order for '%word%' to possibly match. But 'ordinary worry' also has both of those trigrams in it, so it would pass the bitmap index scan but then be rejected by the recheck.
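That index-scan-then-recheck behaviour is easy to simulate outside the database. A rough sketch in Python (it uses only interior 3-character substrings, a simplification of pg_trgm's exact extraction rules, which also pad word boundaries):

```python
def trigrams(s):
    """All 3-character substrings of s (a simplification of pg_trgm's extraction)."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

# LIKE '%word%' requires every trigram of 'word' to be present in a row.
required = trigrams("word")  # {'wor', 'ord'}

rows = ["ordinary worry", "password", "sword fight", "world"]

# Bitmap index scan: keep rows whose trigram sets contain every required trigram.
candidates = [r for r in rows if required <= trigrams(r)]

# Recheck: run the actual LIKE '%word%' test against the heap tuple.
matches = [r for r in candidates if "word" in r]

print(candidates)  # ['ordinary worry', 'password', 'sword fight']
print(matches)     # ['password', 'sword fight']
```

'ordinary worry' survives the index scan (it contains both 'wor' and 'ord') but is thrown out by the recheck, while 'world' never even passes the index scan because it lacks 'ord'.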

Redshift Spectrum table doesn't recognize array

I ran a crawler on a JSON S3 file to update an existing external table.
Once it finished, I checked SVL_S3LOG to see the structure of the external table and saw it was updated: I have a new column of type Array<int>, as expected.
When I tried to execute select * on the external table I got this error: "Invalid operation: Nested tables do not support '*' in the SELECT clause."
So I tried to spell out the select statement with all the column names:
select name, date, books.... (books is the Array<int> type)
from external_table_a1
and got this error:
"Invalid operation: column "books" does not exist in external_table_a1;"
I have also checked under "AWS Glue" the table external_table_a1 and saw that column "books" is recognized and have the type Array<int>.
Can someone explain why my simple query is wrong?
What am I missing?
Querying JSON data is a bit of a hassle with Redshift: when parsing is enabled (e.g. using the appropriate SerDe configuration), the JSON is stored as a SUPER type. In your case that's the Array<int>.
The AWS documentation on Querying semistructured data seems pretty straightforward, mentioning that PartiQL uses "dotted notation and array subscript for path navigation when accessing nested data". This doesn't work for me, although I don't find any reasons in their SUPER Limitations Documentation.
Solution 1
What I have to do is set the flags set json_serialization_enable to true; and set json_serialization_parse_nested_strings to true;, which will serialize the SUPER type as JSON (i.e. back to JSON). I can then use JSON functions to query the data. Unnesting gets even crazier, because on SUPER types you can only use the unnest syntax select item from table as t, t.items as item. I genuinely don't think this is the intended way to query and unnest SUPER objects, but it's the only approach that worked for me.
They described that in some older "Amazon Redshift Developer Guide".
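Put together, Solution 1 looks roughly like this (the table and column names follow the question; treat the exact syntax as a sketch, since SUPER behaviour differs between Redshift versions):

```sql
-- Serialize SUPER values back to JSON text in query results:
SET json_serialization_enable TO true;
SET json_serialization_parse_nested_strings TO true;

-- Unnest the SUPER array using Redshift's FROM-clause unnest syntax:
SELECT t.name, book
FROM external_table_a1 AS t, t.books AS book;
```

Each element of the books array comes back as its own row, aliased here as book.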
Solution 2
When you are writing your query or creating a query Redshift will try to fit the output into one of the basic column data types. If the result of your query does not match any of those types, Redshift will not process the query. Hence, in order to convert a SUPER to a compatible type you will have to unnest it (using the rather peculiar Redshift unnest syntax).
For me, this works in certain cases, but I'm not always able to properly index arrays, nor can I access the array index (using the my_table.array_column as array_entry at array_index syntax).

How to SET jsonb_column = json_build_array( string_column ) in Sequelize UPDATE?

I'm converting a one-to-one relationship into a one-to-many relationship. The old relationship was just a foreign key on the parent record. The new relationship will be an array of foreign keys on the parent record.
(Using Postgres dialect, BTW.)
First I'll add a new JSONB column, which will hold an array of UUIDs.
Then I'll run a query to update all existing rows such that the value from the old column is now stored in the new column (as the first element in an array).
Finally, I'll remove the old column.
I'm looking for help with step 2: writing the update statement that will update all rows, setting the value of the new column based on the value of the old column. Basically, I'm trying to figure out how to express this SQL query using Sequelize:
UPDATE "myTable"
SET "newColumn" = json_build_array("oldColumn")
-- ^^ this really works, btw
Where:
newColumn is type JSONB, and should hold an array (of UUIDs)
oldColumn is type UUID
names are double-quoted because they're mixed case in the DB (shrug)
Expressed using Sequelize sugar, that might be something like:
const { models } = require('../sequelize')
await models.MyModel.update({ newColumn: [ 'oldColumn' ] })
...except that would result in saving an array that contains the string "oldColumn" rather than an array whose first element is the value in that row's oldColumn column.
My experience, and the Sequelize documentation, is focused on working with individual rows via the standard instance methods. I could do that here, but it'd be a lot better to have the database engine do the work internally instead of forcing it to transfer every row to Node and then back again.
Looking for whatever is the most Sequelize-idiomatic way of doing this, if there is one.
Any help is appreciated.
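One approach that seems idiomatic (a sketch, not tested against the question's schema) is to pass sequelize.fn together with sequelize.col, so that Sequelize renders the function call into the generated SQL instead of treating it as a literal value:

```javascript
// Sketch: have Sequelize emit json_build_array("oldColumn") in the UPDATE.
// Model and column names are the question's; adjust to your schema.
const { models, sequelize } = require('../sequelize')

await models.MyModel.update(
  { newColumn: sequelize.fn('json_build_array', sequelize.col('oldColumn')) },
  { where: {} } // empty where clause: update every row
)
```

Because the fn/col objects are compiled into the UPDATE statement, the work happens entirely inside the database engine, with no row transfer to Node.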

Postgresql: auto lowercase text while (or before) inserting to a column

I want to achieve case-insensitive uniqueness in a varchar column. But there is no case-insensitive text data type in Postgres. Since the original case of the text is not important, it would be a good idea to convert everything to lowercase/uppercase before inserting into a column with a UNIQUE constraint. It will also require an index for quick search.
Is there any way in Postgres to manipulate data before insertion?
I looked at this other question: How to automatically convert a MySQL column to lowercase.
It suggests using triggers on insert/update to lowercase the text, or using views with lowercased text. But none of the suggested methods ensure uniqueness.
Also, since this data will be read/written by various applications, lowercasing data in every individual application is not a good idea.
You don't need a case-insensitive data type (although there is one, the citext extension). A unique index on the lowercased value enforces case-insensitive uniqueness and doubles as the index for quick searches:
CREATE UNIQUE INDEX idx_lower_unique
ON your_table (lower(the_column));
That way you don't even have to mess around with the original data. If you additionally want to guarantee that the stored values themselves are lowercase, add a check constraint:
ALTER TABLE your_table
ADD CONSTRAINT your_table_the_column_lowercase_ck
CHECK (the_column = lower(the_column));
From the manual:
The use of indexes to enforce unique constraints could be considered
an implementation detail that should not be accessed directly.
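If you do want the data itself rewritten before insertion, as the question asks, a BEFORE trigger can do it (a sketch; the function and trigger names are made up):

```sql
CREATE OR REPLACE FUNCTION lowercase_the_column()
RETURNS trigger AS $$
BEGIN
    -- Rewrite the incoming value before it is stored.
    NEW.the_column := lower(NEW.the_column);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER your_table_lowercase_trg
    BEFORE INSERT OR UPDATE ON your_table
    FOR EACH ROW
    EXECUTE FUNCTION lowercase_the_column();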

DB2 Auto generated Column / GENERATED ALWAYS pros and cons over sequence

Earlier we were using GENERATED ALWAYS to generate the values for a primary key. But now it is suggested that, instead of using GENERATED ALWAYS, we should use a sequence to populate the value of the primary key. What do you think could be the reason for this change? Is this just a matter of choice?
Earlier Code:
CREATE TABLE SCH.TAB1
(TAB_P INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1, NO CACHE),
.
.
);
Now it is
CREATE TABLE SCH.TAB1
(TAB_P INTEGER,
.
.
);
Now, while inserting, the value for TAB_P is generated via a sequence.
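That insert-time generation might look like this (the sequence name is assumed, not from the question):

```sql
-- Independent sequence object replacing the identity column:
CREATE SEQUENCE SCH.TAB1_SEQ
    START WITH 1 INCREMENT BY 1 NO CACHE;

-- Each insert pulls the next value explicitly:
INSERT INTO SCH.TAB1 (TAB_P)
    VALUES (NEXT VALUE FOR SCH.TAB1_SEQ);
```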
I tend to use identity columns more than sequences, but I'll compare the two for you.
Sequences can generate numbers for any purpose, while an identity column is strictly attached to a column in a table.
Since a sequence is an independent object, it can generate numbers for multiple tables (or anything else), and is not affected when any table is dropped. When a table with an identity column is dropped, there is no memory of what value was last assigned by that identity column.
A table can have only one identity column, so if you want to record multiple sequential numbers into different columns in the same table, sequence objects can handle that.
The most common requirement for a sequential number generator in a database is to assign a technical key to a row, which is handled well by an identity column. For more complicated number generation needs, a sequence object offers more flexibility.
This might be to handle IDs in cases where there are lots of deletes on the table.
For example, in the case of an identity column, if your ids are
1
2
3
Now if you delete record 3, your table will have
1
2
And then if you insert a new record, the ids will be
1
2
4
As opposed to this, if you are not using an identity column and are generating the id in code, then after the delete you can calculate the id for the new insert as max(id) + 1, so the ids will be in order
1
2
3
I can't think of any other reason, why an identity column should not be used.
Here's something I found on the publib site:
Comparing IDENTITY columns and sequences
While there are similarities between IDENTITY columns and sequences, there are also differences. The characteristics of each can be used when designing your database and applications.
An identity column has the following characteristics:
An identity column can be defined as part of a table only when the table is created. Once a table is created, you cannot alter it to add an identity column. (However, existing identity column characteristics might be altered.)
An identity column automatically generates values for a single table.
When an identity column is defined as GENERATED ALWAYS, the values used are always generated by the database manager. Applications are not allowed to provide their own values during the modification of the contents of the table.
A sequence object has the following characteristics:
A sequence object is a database object that is not tied to any one table.
A sequence object generates sequential values that can be used in any SQL or XQuery statement.
Since a sequence object can be used by any application, there are two expressions used to control the retrieval of the next value in the specified sequence and the value generated previous to the statement being executed. The PREVIOUS VALUE expression returns the most recently generated value for the specified sequence for a previous statement within the current session. The NEXT VALUE expression returns the next value for the specified sequence. The use of these expressions allows the same value to be used across several SQL and XQuery statements within several tables.
While these are not all of the characteristics of these two items, these characteristics will assist you in determining which to use depending on your database design and the applications using the database.
I don't know why anyone would EVER use an identity column rather than a sequence.
Sequences accomplish the same thing and are far more straightforward. Identity columns are much more of a pain, especially when you want to unload and load the data to other environments. I'm not going to go into all the differences, as that information can be found in the manuals, but I can tell you that the DBAs almost always have to get involved any time a user wants to migrate data from one environment to another when a table with an identity column is involved, because it can get confusing for the users. We have no issues when a sequence is used. We allow users to update any schema objects, so they can alter their sequences if they need to.