DB Architecture question: Would using a JSONB column for storing row changes be an efficient solution? - postgresql

I have an app that allows for dynamic table and column generation. One feature we are looking to implement is change tracking for each row. Any time a change occurs, we want to record which field changed and what the new value is.
One potential solution we are looking at is adding a "history" JSONB column to the table, which would contain an ongoing array of every change. When one or more fields are updated, we could append a new element to the history field using the || concatenation operator, which appends the element to the existing jsonb value:
UPDATE my_table SET history = history || '{"field": "status", "new_value": "done"}'::jsonb WHERE id = 42;
When viewing the history of changes, everything would be contained in this JSONB field, and we'd have logic to parse out and display the changes over time.
Note: we won't need to query this JSONB column. It's simply the place to store all the row-level changes. It seems like this should be an efficient way of saving updates.
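For concreteness, here is a minimal sketch of the kind of node we'd append, building the change entry server-side; the table name, columns, and JSON keys are just made up for the example:

-- Hypothetical example; assumes history defaults to '[]'::jsonb, since appending to NULL yields NULL.
UPDATE my_table
SET status  = 'done',
    history = history || jsonb_build_object(
        'changed_at', now(),
        'changes',    jsonb_build_object('status', 'done')
    )
WHERE id = 42;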
My question is: is this a viable solution with regard to performance over time? This history field could become large, so would row changes become slower as the field grows? Or does using the || operator to append data to the field mitigate any performance issues?
FYI, I am currently on Postgres 11, but could upgrade to newer versions if that had an impact.
Thanks for any feedback you can provide.

Related

MongoDb Best Practice | Insert "null" fields

I have a question regarding best practices to insert Documents in MongoDb.
In my data source the key "myData2" can be null or a string. Should I add "myData2" as null to my database or is it better to leave the value out if not defined? What is the "clean" way to deal with this?
[{
    "myData1": "Stuff",
    "myData2": null
}]
Since MongoDB permits fields to be added to documents at any time, most (production) applications are written to handle both of the following cases:
A new field is added to the code, but the existing data doesn't have it, and it needs to be added over time to the existing data either on demand or as a background process
A field is no longer used by the code but still contains values in the database
What would your application do if the field is missing, as opposed to if it's set to the null value? If it would do the same thing, then I suggest not setting fields to null values for two reasons:
It streamlines the code because you only need to handle one possibility (missing field) on the reading side, instead of two (field missing or null)
It requires less storage space in the database.

iSQLOutput - Update only Selected columns

My flow is simple and I am just reading a raw file into a SQL table.
At times the raw file contains data corresponding to existing records. I do not want to insert a new record in that case and would only want to update the existing record in the SQL table. The challenge is, there is a 'record creation date' column which I initialize at the time of record creation. The update operation overwrites that column too. I just want to avoid overwriting that column, while updating the other columns from the information coming from the raw file.
So far I have no idea how to do that. Could someone make a recommendation?
I defaulted the creation column to auto-populate in the SQL database itself, and I changed my flow to update only the remaining columns. The Talend job no longer touches that column. Problem solved.
Yet another reminder of 'Simplification is underrated'. :)
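For anyone else hitting this, here is a minimal sketch of the database-side part; the table and column names below are hypothetical:

-- Hypothetical example: let the database populate the creation date itself,
-- so the update flow never has to mention that column.
ALTER TABLE my_records
    ALTER COLUMN created_at SET DEFAULT CURRENT_TIMESTAMP;

-- Updates coming from the flow then list only the data columns and leave created_at alone:
UPDATE my_records
SET col_a = 'new value',
    col_b = 'other value'
WHERE record_key = 'K-123';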

"In place" updates with PostgreSQL

We have a table with relatively large rows (10-40kb) and we have to update a single integer column in this table quite frequently.
As far as I know, an UPDATE in PostgreSQL is effectively a DELETE+INSERT (due to MVCC).
That means PostgreSQL will delete and re-insert the entire row, even if I only want to update a single integer (which requires no additional space).
I would like to know if it's possible to perform an UPDATE operation "in place" without deleting and re-inserting rows?
Only fields stored in-line need to be copied. For fields stored out-of-line in TOAST tables, only the reference to the TOAST entry is copied.
Whether a field is stored out-of-line depends on the size of the value in the field and on the field's data type.
If the tuples are large but only have a few fields - like
some_id integer,
frequently_updated integer,
charblob text
then there's not much point changing anything, because updates of frequently_updated won't generally rewrite the data in charblob, at least if it's big enough to be worth caring about.
OTOH, if you have a table with lots of fields you'll be rewriting a lot more with each update.
HOT will only help you to a limited extent, because a HOT update can only happen when none of the updated columns are part of an index and there's enough free space on the same database page. For wide rows you won't fit many copies on a page even with TOAST, so HOT will be of limited benefit.
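If you want to see how often HOT actually kicks in for a given table, the statistics views show it; the table name below is just a placeholder:

-- Compare total vs. HOT updates for a table (name is hypothetical).
SELECT relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'my_wide_table';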
It can be worth separating such fields out into separate tables if they're really frequently updated but the rest of the table has wide rows that don't change much.
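A rough sketch of that split, reusing the field names from the example above (the table names are made up):

-- Hypothetical split: keep the hot integer in its own narrow table,
-- so frequent updates never rewrite the wide row.
CREATE TABLE big_rows (
    some_id  integer PRIMARY KEY,
    charblob text
);

CREATE TABLE big_rows_counters (
    some_id            integer PRIMARY KEY REFERENCES big_rows (some_id),
    frequently_updated integer NOT NULL DEFAULT 0
);

-- Frequent updates now touch only the narrow table:
UPDATE big_rows_counters SET frequently_updated = frequently_updated + 1 WHERE some_id = 1;

-- Read back with a join when the full row is needed:
SELECT b.some_id, c.frequently_updated, b.charblob
FROM big_rows b
JOIN big_rows_counters c USING (some_id);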

How can I (partially) automate the transfer of a FileMaker database structure and field contents to a second database?

I'm trying to copy some field values to a duplicate database, one record at a time. This is used for history, so I can delete some records in the original database to keep it fast.
I don't want to manually save the values in a variable because there are hundreds of fields. So I want to go to the first field, save the field name and value and then go over to the other database and save the data. Then run a 'Go to Next Field' and loop through all the fields.
This works perfectly, but here is the problem: When a field is a calculation you cannot tab into it and therefore 'Go to Next Field' doesn't work. It skips it.
I thought of doing a 'Go to Object', but then I need to name all the objects, and I can't find a script to name objects.
Can anyone out there think of a solution?
Thanks!
This is one of those problems where I always found it easier to do an export/import.
Export all the data you want from the one database, and then import it into the other database. All you need to do is:
Manually specify which fields you want to copy
Map the data from the export to the right fields in the new database/table
You can even write a script to do these things for you.
There are several ways to achieve this.
To make a "history file", I have found there are several cases out there, so let's take a look.
CASE ONE
Single file: I just want to "keep" a very large file with historical data, because I need to erase all data in my main file.
In this case, you should create a "clone" table (in the same file or in another file, it's the same). Then change any calculation field to the type of the calculation result (number, text, date, and so on). Remove any auto-entered value or calculation from any field (like auto number, auto creation date, etc.). You will have a "plain table" with no calculations or auto-entered data.
Then add a field to control duplicate data. If you have, let's say, an invoice number (unique) for each record, you can use it to achieve this task. But if you do not have a field that identifies the record as unique, then you have to create one...
To create such a field, I recommend adding a new field on the clone table, setting it as an auto-entered calculation, and using a field combination that is unique... something like this: invoiceNumber & "-" & lineNumber & "-" & date.
On the clone table, make sure validation is set to "always", no empty values are allowed, and the value must be unique.
Once you set up the clone table, you can import your records, making sure that the auto-enter option is on. You can do it as many times as you like; new records will be added and no duplicates created.
If you want, you can make a script that moves all the current records to the historical table before deleting them.
NOTE:
This technique works fine when the data you are trying to keep does not change over time. That is, once a record is created, it has no further changes.
CASE TWO
A historical table must be created but some fields are updated.
In the beginning I thought historical data never changes. In some cases I found this is not true, like when I want to keep historical invoices but at the same time track whether they are paid or not...
In this case you may use the same technique above, but instead of importing data, you must update data based on the "unique" field that identifies the record.
Hope this technique helps.
FileMaker's FieldNames() function, along with GetField(), can give you a list of field names and then their values.

Postgres full text search across multiple related tables

This may be a very simplistic question, so apologies in advance, but I am very new to database usage.
I'd like to have Postgres run its full text search across multiple joined tables. Imagine something like a model User, with related models UserProfile and UserInfo. The search would only be for Users, but would include information from UserProfile and UserInfo.
I'm planning on using a GIN index for the search. I'm unclear, however, on whether I'm going to need a separate tsvector column in the User table to hold the aggregated tsvectors from across the tables, and to set up triggers to keep it up to date, or whether it's possible to create an index without a tsvector column that will keep itself up to date whenever any of the relevant fields in any of the relevant tables change. Also, any tips on the syntax of the commands to create all this would be much appreciated.
Your best answer is probably to have a separate tsvector column in each table (with an index on it, of course). If you aggregate the data up into a shared tsvector, that will create a lot of updates on that shared column whenever any of the individual ones update.
You will need one index per table. Then when you query, you obviously need multiple WHERE clauses, one for each field. PostgreSQL will then automatically figure out which combination of indexes to use to give you the quickest results - likely using bitmap scanning. It will make your queries a little more complex to write (since you need multiple column-matching clauses), but it keeps the flexibility to query only some of the fields when you want to.
You cannot create one index that tracks multiple tables. To do that you need the separate tsvector column and triggers on each table to update it.
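For what it's worth, here is a minimal sketch of that setup for two of the tables; every table and column name below is an assumption about your schema, not something from your question:

-- Hypothetical schema: users(id, name) and user_profiles(user_id, bio).
-- One tsvector column, one GIN index, and one trigger per table.
ALTER TABLE users         ADD COLUMN search_tsv tsvector;
ALTER TABLE user_profiles ADD COLUMN search_tsv tsvector;

CREATE INDEX users_search_idx         ON users         USING gin (search_tsv);
CREATE INDEX user_profiles_search_idx ON user_profiles USING gin (search_tsv);

-- tsvector_update_trigger keeps each column in sync on INSERT/UPDATE.
CREATE TRIGGER users_tsv_update
BEFORE INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE PROCEDURE
    tsvector_update_trigger(search_tsv, 'pg_catalog.english', name);

CREATE TRIGGER user_profiles_tsv_update
BEFORE INSERT OR UPDATE ON user_profiles
FOR EACH ROW EXECUTE PROCEDURE
    tsvector_update_trigger(search_tsv, 'pg_catalog.english', bio);

-- Query with one match clause per table; the planner can combine the GIN indexes.
SELECT u.id, u.name
FROM users u
JOIN user_profiles p ON p.user_id = u.id
WHERE u.search_tsv @@ to_tsquery('english', 'example')
   OR p.search_tsv @@ to_tsquery('english', 'example');

Note that existing rows would need a one-time backfill UPDATE of search_tsv, since the triggers only fire on new inserts and updates.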