Slow UPDATE appending to an array inside a JSONB column - PostgreSQL

I have a table with the following structure (massiveJS default):
CREATE TABLE "myTable" (
id uuid NOT NULL DEFAULT uuid_generate_v4(),
body jsonb NOT NULL,
created_at timestamptz NULL DEFAULT now(),
updated_at timestamptz NULL,
CONSTRAINT mytable_pkey PRIMARY KEY (id)
);
This table has a GIN index on the body column and a btree index on the primary key (default).
The json object stored inside the body column has an array that can grow really large (up to 50k items).
We use this statement to append new items to the array:
UPDATE "myTable"
SET body = jsonb_set(body, '{myArray}', (body->'myArray')
|| '["item1", "item2", "item3", "itemx""]')
WHERE id = 'e6e325da-0e8b-4d2e-9481-bc7e03c195b1'
The system frequently updates this array in several records, and when the system is under heavy load this UPDATE gets really slow: PostgreSQL CPU goes to 100% and the whole application slows down.
Is there any way we could speed up this UPDATE statement so we can append/remove elements from the array without putting so much strain on PostgreSQL?
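One commonly suggested remedy for this pattern (an illustrative sketch, not an answer from this thread; the child table and its columns are assumptions) is to move the fast-growing array into a side table, so that appending items becomes plain INSERTs instead of rewriting the whole jsonb document on every UPDATE:
CREATE TABLE "myTableItems" (
parent_id uuid NOT NULL REFERENCES "myTable" (id),
item text NOT NULL
);
-- appending items no longer touches the large jsonb value at all
INSERT INTO "myTableItems" (parent_id, item)
VALUES
('e6e325da-0e8b-4d2e-9481-bc7e03c195b1', 'item1'),
('e6e325da-0e8b-4d2e-9481-bc7e03c195b1', 'item2'),
('e6e325da-0e8b-4d2e-9481-bc7e03c195b1', 'item3');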

Related

Convert PostgreSQL JSONB column results for use in condition with IN

I have a table with a JSONB column that is used to store the IDs (integers) of multiple tags applied to a task, e.g.: '[123, 456, 789]'.
ALTER TABLE "public"."task" ADD COLUMN "tags" jsonb;
I also have a table dedicated to storing all the tags that can be used, and the primary key of each record is used in my JSONB column of the task table.
CREATE TABLE public.tag (
tag_id serial NOT NULL,
label varchar(50) NOT NULL,
CONSTRAINT tag_pkey PRIMARY KEY (tag_id)
);
In this table (tag) I have an index on the tag ID (the primary key), and I want to use this index in a query that returns the labels of the tags that were used in a task.
SELECT * FROM task, tag WHERE task.tags @> to_jsonb(tag.tag_id)
Using to_jsonb is really bad as it doesn't use my table's index, but if I change the SQL to something like the example below, the index is used and SQL performance is much better.
SELECT * FROM tag WHERE tag.tag_id IN (123, 456, 789)
How do I convert the jsonb column (task table) to a set of integer values that can be used with the IN condition, as in the example below?
SELECT * FROM task, tag WHERE tag.tag_id IN (task.tags);
You can use the PostgreSQL jsonb_array_elements function, which converts the elements of a JSON array into a set of rows. For example:
SELECT * FROM task, tag WHERE tag.tag_id IN (
SELECT jsonb_array_elements('[200, 100, 789]'::jsonb)::int4 AS json_data
);
But for best performance, if the JSON data comes from a table column, you should index that column, and not with the standard btree index type. For jsonb columns PostgreSQL offers a different index type, the GIN index, which gives the best performance for this kind of query. I use such an index on a table with a million records, and the performance is very good. Example of creating a GIN index:
CREATE INDEX tag_table_json_index ON tag_table USING gin (json_field_name jsonb_path_ops);
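As a hedged usage sketch (the index name and the sample value are assumptions, not from the thread; jsonb_array_elements_text is used so the cast to integer goes through text and works across PostgreSQL versions):
CREATE INDEX task_tags_gin ON task USING gin (tags jsonb_path_ops);
-- containment query that can use the GIN index
SELECT * FROM task WHERE tags @> '[123]'::jsonb;
-- expand the jsonb array to join against the tag table's primary key
SELECT tag.*
FROM task, tag
WHERE tag.tag_id IN (SELECT jsonb_array_elements_text(task.tags)::int4);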

Multicolumn index vs single column index for time series data in Postgres

This table started out as short-term storage for meter data before it was validated and added to some long-term storage tables.
It turns out the clients want to keep this data for a long time now that we are saving it, and it is growing fast.
create table metering_meterreading
(
id bigserial not null, -- primary key
created_at timestamp with time zone not null,
updated_at timestamp with time zone not null,
timestamp timestamp with time zone not null, -- BTREE index
value numeric(15, 3) not null,
meter_device_id uuid not null, -- FK to meter_device, BTREE index
series_id uuid not null, -- FK to series, BTREE index
organization_id uuid not null -- FK to org, BTREE index
);
I am planning on dropping the primary key, since (org_id, meter_device_id, series_id, timestamp) makes a row unique. It was just added by my ORM (Django) and I didn't care when we started.
But since I pretty much always filter on organization, meter_device, and series to get a range of time series data, I am wondering if it would be more efficient to have a multicolumn index on (organization_id, meter_device_id, series_id, timestamp) instead of the separate indexes.
I read somewhere that the column used for the range condition should be the rightmost one in the index.
This is still not a super-efficient table for time series data, since it will grow large, but I am planning on fixing that by partitioning on a range, or maybe even using Timescale. Before partitioning, though, I would like lookups in the table to be as efficient as possible.
I also saw an example somewhere that used a separate table to identify the metric:
create table metric
(
id bigserial primary key,
organization_id uuid not null,
meter_device_id uuid not null,
series_id uuid not null,
UNIQUE (organization_id, meter_device_id, series_id)
);
create table metering_meterreading
(
metric_id bigint not null, -- FK to metric, BTREE index
timestamp timestamp with time zone not null, -- BTREE index
value numeric(15, 3) not null,
created_at timestamp with time zone not null,
updated_at timestamp with time zone not null
);
But I am not sure if that is actually better than just putting all the columns in one table. It might hurt the ingestion rate since another table is now involved.
If (org_id, meter_device_id, series_id, timestamp) uniquely determines a table row, you should declare a multi-column primary key over all of these columns. Then you automatically have a four-column index on them. Just make sure that timestamp is last in the column list; that way the index will support your query ideally.
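A hedged sketch of what that could look like, assuming the first table definition above and that the ORM-generated primary key constraint carries the default name:
ALTER TABLE metering_meterreading
DROP CONSTRAINT metering_meterreading_pkey; -- the ORM-generated PK on id (assumed name)
ALTER TABLE metering_meterreading
ADD PRIMARY KEY (organization_id, meter_device_id, series_id, "timestamp");
-- the backing index ends in "timestamp", so it supports queries like
-- WHERE organization_id = ... AND meter_device_id = ... AND series_id = ...
--   AND "timestamp" BETWEEN ... AND ...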

References to multiple tables in PostgreSQL

I have many time series stored in a PostgreSQL database across multiple tables. I would like to create a table 'anomalies' which references time series with particular behaviour, for instance a value that is exceptionally high.
My question is the following: what is the best way to link the entries of 'anomalies' with other tables?
I could create a foreign key in each table referencing an entry in 'anomalies', but then it would not be so obvious to go from the anomaly to the entry that references it.
The other possibility I see is to store the name of the corresponding table in the entries of 'anomalies', but that does not seem like a good idea, as the table name might change or the table might get deleted.
Is there a more elegant solution to do this?
CREATE TABLE type_1(
type_1_id SERIAL PRIMARY KEY,
type_1_name TEXT NOT NULL,
unique(type_1_name)
);
CREATE TABLE type_1_ts(
date DATE NOT NULL,
value REAL NOT NULL,
type_1_id INTEGER REFERENCES type_1(type_1_id) NOT NULL,
PRIMARY KEY(type_1_id, date)
);
CREATE TABLE type_2(
type_2_id SERIAL PRIMARY KEY,
type_2_name TEXT NOT NULL,
unique(type_2_name)
);
CREATE TABLE type_2_ts(
date DATE NOT NULL,
value REAL NOT NULL,
state INTEGER NOT NULL,
type_2_id INTEGER REFERENCES type_2(type_2_id) NOT NULL,
PRIMARY KEY(type_2_id, date)
);
CREATE TABLE anomalies(
anomaly_id SERIAL PRIMARY KEY,
date DATE NOT NULL,
property TEXT NOT NULL,
value REAL NOT NULL,
-- reference to a table_name and an entry id?
table_name TEXT,
data_id INTEGER
);
What I'd like to do at the end is to be able to do:
SELECT * FROM anomalies WHERE table_name = 'type_1';
or simply list the data type corresponding to each entry.
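As a hedged illustration of what the table_name/data_id variant implies (made up for this sketch, not from the thread, and assuming data_id stores the referenced type_1_id and that dates match), going from an anomaly back to its source row needs one join branch per referenced table:
SELECT a.*, t.value AS source_value
FROM anomalies a
JOIN type_1_ts t
ON a.table_name = 'type_1'
AND t.type_1_id = a.data_id
AND t.date = a.date;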

sqlite3 - copy autoincrement field to other column on INSERT

Is there a more performant way to copy the autoincrement field value to another field after a row is inserted?
I'd like to make it automatic with triggers, but I don't know enough about them to be sure of my code.
This answer (for MySQL) doesn't apply, because it does the copy inside the query itself, and I don't know if it would work with multiple rows.
This is the table:
CREATE TABLE IF NOT EXISTS commenti (
id INTEGER PRIMARY KEY AUTOINCREMENT,
content TEXT COLLATE NOCASE,
idcommento INTEGER REFERENCES commenti (id)
ON DELETE CASCADE ON UPDATE CASCADE);
And this is the trigger I'm using: it copies id (the autoincrement field) into idcommento only if no value was supplied in the INSERT.
CREATE TRIGGER set_comment_to_self AFTER INSERT ON commenti
WHEN NEW.idcommento IS NULL
BEGIN
UPDATE commenti SET idcommento = NEW.id WHERE id = NEW.id;
END;
I don't know if I can use some sort of simple NEW.idcommento = NEW.id instead of searching the whole table every time...
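A hedged usage sketch of the trigger above (the sample rows are made up): the UPDATE's WHERE id = NEW.id is a primary-key (rowid) lookup in SQLite, so it does not scan the whole table.
INSERT INTO commenti (content) VALUES ('first comment');      -- idcommento omitted
INSERT INTO commenti (content, idcommento) VALUES ('a reply', 1);
SELECT id, idcommento FROM commenti;
-- expected: 1|1 (filled in by the trigger) and 2|1 (value supplied, trigger skipped)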

Implicit Index for table

I am learning PostgreSQL and databases in general. I have a simple statement like this and I want to understand what it does:
CREATE TABLE adempiere.c_mom(
c_mom_id NUMERIC(10,0) NOT NULL,
isactive character(1) DEFAULT 'Y'::bpchar NOT NULL,
start_date date NOT NULL,
start_time timestamp without time zone NOT NULL,
end_time timestamp without time zone NOT NULL,
CONSTRAINT c_mom_pkey PRIMARY KEY (c_mom_id)
);
So after I execute this I get:
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "c_mom_pkey" for table "c_mom"
Now I know that my PK is c_mom_id, but what is the purpose of creating an implicit index for it under the name c_mom_pkey?
What does DEFAULT 'Y'::bpchar mean, and in general what does :: do in PostgreSQL?
Thank you.
The :: notation is a PostgreSQL-specific type cast notation, in this case to type bpchar (blank-padded char).
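For example (a minimal illustration, not from the original answer), the two cast spellings are equivalent:
SELECT 'Y'::bpchar AS postgres_style_cast, CAST('Y' AS bpchar) AS standard_sql_cast;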
An index is created to back primary keys to make them efficient. If there wasn't an index to back it, each insert statement would have to scan the whole table just to figure out if that insertion would create a duplicate key or not. Using an index speeds that up (dramatically if the table is large).
This is not PostgreSQL specific. A lot of relational databases will create unique indexes to back primary keys.
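As a hedged sketch (assuming the table above), the implicit index shows up in the catalog like any other index:
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'adempiere' AND tablename = 'c_mom';
-- indexname: c_mom_pkey
-- indexdef:  CREATE UNIQUE INDEX c_mom_pkey ON adempiere.c_mom USING btree (c_mom_id)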