Flattened jsonb vs Nested jsonb Performance PostgreSQL 9.4

Flattened:
{"key1": "value1", "key2": "value2", "key3": "value3"}
vs. Nested:
{"id1": {"key1": "value1", "key2": "value2", "key3": "value3"}, "id2": {}}
In the flattened version I would store multiple rows and have to group them; in the nested version all the data would be in one row.
Which way would be faster and is it a huge difference?
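To make the two layouts concrete, here is a rough sketch of how each might be stored and queried (table and column names are invented for illustration):

-- flattened: one row per id, each holding a flat jsonb object
CREATE TABLE docs_flat (
    id  text PRIMARY KEY,
    doc jsonb   -- {"key1": "value1", "key2": "value2", "key3": "value3"}
);
-- regrouping into the nested shape at query time
-- (json_object_agg is available in 9.4; jsonb_object_agg needs 9.5+)
SELECT json_object_agg(id, doc) FROM docs_flat;

-- nested: a single row holding every id as a top-level key
CREATE TABLE docs_nested (
    doc jsonb   -- {"id1": {"key1": "value1", ...}, "id2": {...}}
);
SELECT doc -> 'id1' FROM docs_nested;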

Related

PostgreSQL JSONB data type for a column performance for OLTP database

PostgreSQL v13
I am analyzing the use of the JSONB data type for a column in a table.
JSONB will be one of the columns in the table. The idea is to get flexibility: the information (keys) stored inside the JSON will not be the same every time, and we may add keys over time. The JSON object is expected to be under 50 KB and, once written, will not be changed.
My concerns/questions:
This is an OLTP database and requires high read/write performance. Are there any performance issues with the JSONB data type?
Does having JSONB lead to more bloat in the table, so that we may suffer over time?
In general, please share your experience with JSONB for such a use case.
Thanks in advance!
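For what it's worth, a minimal sketch of the kind of layout being described, with invented table and column names:

CREATE TABLE orders (
    order_id   bigint PRIMARY KEY,
    created_at timestamptz NOT NULL DEFAULT now(),
    attrs      jsonb        -- variable keys, under 50 KB, written once
);
-- only needed if you will search inside the document:
CREATE INDEX orders_attrs_gin ON orders USING gin (attrs);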

Mondrian: Aggregate tables for columnar DB

I'm working with Mondrian on a columnar DB, meaning I have a big, flat, fully denormalized fact table that contains all facts and dimensions. Unfortunately, I am not able to use an aggregate table. When I collapse all dimensions in the aggregate table, Mondrian successfully recognizes the aggregate table. But when I keep e.g. the time dimension, Mondrian does not:
14:06:14,859 WARN [AggTableManager] Recognizer.checkUnusedColumns: Candidate aggregate table 'agg_days_flattened' for fact table 'flattened' has a column 'dayofMonth' with unknown usage.
14:06:14,860 WARN [AggTableManager] Recognizer.checkUnusedColumns: Candidate aggregate table 'agg_days_flattened' for fact table 'flattened' has a column 'month' with unknown usage.
14:06:14,860 WARN [AggTableManager] Recognizer.checkUnusedColumns: Candidate aggregate table 'agg_days_flattened' for fact table 'flattened' has a column 'year' with unknown usage.
Furthermore, the aggregate table is not used when I perform a corresponding MDX query. When I model the same cube with a classical star schema, everything works fine.
For me, it looks like Mondrian needs "true" foreign-key/primary-key mappings to work with aggregate tables, which do not apply to my scenario (big, flat, fully denormalized fact table).
Does anyone have an idea?

Postgres Array column vs JSONB column

Is a Postgres Array column more easily indexed than a JSONB column with a JSON array in it?
https://www.postgresql.org/docs/current/arrays.html
https://www.compose.com/articles/faster-operations-with-the-jsonb-data-type-in-postgresql/
Syntactically, the JSONB array may be easier to use as you don't have to wrap your query value in a dummy array constructor:
where jsonbcolumn ? 'abc';
vs
where textarraycolumn @> ARRAY['abc'];
On the other hand, the planner is likely to make better decisions with the PostgreSQL array, since it collects statistics on array contents but does not do so for JSONB.
Also, you should read the docs for the version of PostgreSQL you are using, which is hopefully greater than 9.4 and really really should be greater than 9.1.
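For reference, both column types can be covered by a GIN index; a sketch, assuming a hypothetical table t with the two columns used above:

CREATE INDEX ON t USING gin (jsonbcolumn);       -- default jsonb_ops
CREATE INDEX ON t USING gin (textarraycolumn);   -- default array_ops
-- either index can serve the corresponding predicate:
SELECT * FROM t WHERE jsonbcolumn ? 'abc';
SELECT * FROM t WHERE textarraycolumn @> ARRAY['abc'];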

Selecting from JSONB field slow

I have a relatively small table (~50k rows). When I select all records, it takes ~40s. The table has 3 JSONB columns. When I select every column except for the JSONBs, the query takes ~700ms.
If I add in just one of the JSONB fields, the query time jumps to nearly 10s.
I'm never using a where clause referencing something inside the JSONB, just selecting *. Even so, I tried adding GIN indexes because I saw them frequently mentioned as a performance booster for JSONB.
I've run a full vacuum.
Postgres version 9.6
explain (analyze, buffers) select * from message;

Seq Scan on message  (cost=0.00..5541.69 rows=52969 width=834) (actual time=1.736..116.183 rows=52969 loops=1)
  Buffers: shared hit=64 read=4948
Planning time: 0.151 ms
Execution time: 133.555 ms
jsonb is a PostgreSQL varlena data type. That means that when a value is longer than about 2 KB, it is stored in an auxiliary table (the TOAST table) and only a pointer to the TOAST table is kept in the main table. So as long as you don't touch the jsonb column, the value is not read.
A GIN index doesn't help in this case; it only helps with searching.
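For example, a GIN index would only be used for existence or containment searches such as the one below; body is a placeholder for one of the jsonb columns:

SELECT * FROM message WHERE body @> '{"status": "sent"}';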
10 seconds for 50K rows is a long time. Maybe your jsonb values are pretty long, or your IO system doesn't perform well. Please check the size of your table and the performance of your IO; cheap cloud machines usually have terrible IO.
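A quick way to check how much of the table lives out of line in TOAST is to compare the heap size with the total size (the relation name message comes from the plan above):

SELECT pg_size_pretty(pg_relation_size('message'))       AS heap_only,
       pg_size_pretty(pg_total_relation_size('message')) AS total_with_toast_and_indexes;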
Another possible reason for the slowdown is the complexity of the jsonb data type. jsonb is a serialized tree of JSON sub-objects. If you don't need any of the special features of jsonb, use the json data type instead: it is just text, and the JSON format is checked on input only. Output of json is faster than output of jsonb, because json is stored internally as text and no conversion is necessary, whereas jsonb has to be serialized back to text, which is more expensive.
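If plain json turns out to be enough, the column type can be changed in place; a sketch, assuming the jsonb column is named body (note that this rewrites the table):

ALTER TABLE message ALTER COLUMN body TYPE json USING body::json;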

Redshift COPY csv array field to separate rows

I have a relatively large MongoDB collection that I'm migrating into Redshift. It's ~600 million documents, so I want the copy to be as efficient as possible.
The problem is, I have an array field in my Mongo collection, but I'd like to insert each value from the array into separate rows in Redshift.
Mongo:
{
  id: 123,
  names: ["market", "fashion", "food"]
}
In Redshift, I want columns for "id" and "names", where the primary key is (id, name). So I should get 3 new Redshift rows from that one mongo document.
Is it possible to do that with a Redshift COPY command? I can export my data as either a csv or json into s3, but I don't want to have to do any additional processing on the data due to how long it takes to do that many documents.
You can probably do it on COPY with triggers, but it'd be quite awkward and the performance would be miserable (since you can't just transform the row and would need to do INSERTs from the trigger function).
It's a trivial transform, though; why not just pass it through any scripting language on export?
You can also import as-is, and transform afterwards (should be pretty fast on Redshift):
CREATE TABLE mydata_load (
    id    int4,
    names text[]
);

-- do the COPY into mydata_load here

CREATE TABLE mydata AS SELECT id, unnest(names) AS name FROM mydata_load;
Redshift does not have support for Arrays as PostgreSQL does, so you cannot just insert the data as is.
However, MongoDB has a simple aggregation stage ($unwind) which lets you unwind arrays exactly as you want, repeating the other fields alongside each array element. So I'd export the result of that aggregation as JSON, and then load it into Redshift using JSONPaths.
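A rough sketch of that route, with placeholder bucket names and IAM role: each exported line is one already-unwound document such as {"id": 123, "name": "market"} (e.g. produced by a $unwind stage on names), and a jsonpaths file maps the fields to columns for COPY:

-- jsonpaths file stored at s3://mybucket/names_jsonpaths.json:
-- {"jsonpaths": ["$.id", "$.name"]}

COPY mydata (id, name)
FROM 's3://mybucket/unwound/'
IAM_ROLE 'arn:aws:iam::123456789012:role/myRedshiftRole'
FORMAT AS JSON 's3://mybucket/names_jsonpaths.json';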