How to back up a whole table into a single field? - postgresql

I have a few very small tables (a total of ~1000 rows) that I want to back up regularly into the same DB, into a single table. I know it sounds weird, but hear me out.
Let's say that the tables I want to back up are named linux_commands and windows_commands. These two tables have roughly the same columns: id (pkey), name, definition, config (jsonb), commands.
I want to back these up every day into a table called commands_backup, and I want this new table to have a date field, a field for windows_commands, and another one for linux_commands, so three columns in total. Each day, a script would run, write the current date to the date field, fetch the whole linux_commands table and write it to the related field in a single row, and then do the same for windows_commands.
How would you set up something like this? Also, what is the best data type for storing a whole data set in a single item?

In the target table, windows_commands and linux_commands should be type jsonb.
Then you can use:
INSERT INTO commands_backup VALUES (
    current_date,
    -- each subquery folds an entire table into a single jsonb array of row objects
    (SELECT jsonb_agg(to_jsonb(linux_commands)) FROM linux_commands),
    (SELECT jsonb_agg(to_jsonb(windows_commands)) FROM windows_commands)
);
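For completeness, a minimal sketch of what the target table could look like; the column names here are assumptions, so adjust them to your schema (the order matches the INSERT above):
-- Hypothetical backup table definition (names are illustrative).
CREATE TABLE commands_backup (
    backup_date      date PRIMARY KEY DEFAULT current_date,
    linux_commands   jsonb,
    windows_commands jsonb
);
Note that jsonb_agg returns NULL for an empty source table, so an empty table is stored as NULL rather than as an empty array.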

Set a global alias for a table and a column?

I'm working with a huge Firebird database where tables have completely unreadable names like WTF$RANDOM_ABBREVIATION_6792 or RPG$RANDOM_ABBREVIATION_5462, and where columns have names like "rid9312", "1NUM5", "2NUM4", "RNAME8".
I need to give them global aliases so that I can use full-length table names like Document and column names like
Document.CreationDate instead of xecblob.DDATE4
or
TempDoc.MovingOrderID instead of TMP$LINKED_DOC_6101.DID6101
Altering the database, a table, or a column might be a big problem, because the records number in the millions and tens of millions, and moreover, a major part of the Delphi-written front-end for the database is bound to the table names and column names.
Is there a way to do that somehow?
The closest thing there is to a "global alias" is to create views. For example:
create view document
as
select
DDATE4 as creationdate
-- , other columns...
from xecblob;
or
create view document (creationdate /*, other column aliases... */)
as
select
DDATE4
-- , other columns...
from xecblob;
(personally, I find the first variant more readable)
This does require altering the database, but there is no real cost associated with that (it doesn't matter whether the table contains zero, one, thousands, or millions of rows): a view is only a stored query, so creating it touches no data.
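Once the views exist, queries can use the readable names and the engine expands them to the underlying tables. A minimal example, assuming the document view above:
-- The view resolves to xecblob under the hood.
select creationdate
from document
where creationdate > date '2020-01-01';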

Dynamic table selection

Is it possible to dynamically select a table by name?
For example, I have a table, and every time records are uploaded to it, a backup is created first with the date appended to the table name.
table_20191108
table_20191109
table_20191110
table_20191111
What I would like to do is basically write some type of dynamic SQL that always runs
select * from table_MAXDATE
I would like to do this so I can compare the table to the most recent backup (e.g. table_20191111) in order to see what changed between the two.
I haven't tried anything specific yet.
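One possible approach, assuming a PostgreSQL-style database with dynamic SQL available (all names below are illustrative): look up the newest backup name in the catalog, then interpolate it into the query.
-- Hypothetical helper: returns the lexically greatest table_YYYYMMDD name,
-- which is also the newest because the date suffix sorts chronologically.
CREATE OR REPLACE FUNCTION latest_backup_name()
RETURNS text
LANGUAGE sql
AS $$
    SELECT table_name
    FROM information_schema.tables
    WHERE table_schema = 'public'
      AND table_name ~ '^table_[0-9]{8}$'
    ORDER BY table_name DESC
    LIMIT 1;
$$;

-- Expose the newest backup under a fixed name, then diff against it:
DO $$
BEGIN
    EXECUTE format('CREATE TEMP VIEW latest_backup AS SELECT * FROM %I',
                   latest_backup_name());
END $$;

SELECT * FROM my_table EXCEPT SELECT * FROM latest_backup;  -- rows added or changed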

select all columns except two in q kdb historical database

In the output, I want to select all columns except two from a table in a q/kdb historical database.
I tried running the query below, but it does not work on the HDB.
delete colid,coltime from table where date=.z.d-1
It fails with the error below:
ERROR: 'par
(trying to update a physically partitioned table)
I referred to https://code.kx.com/wiki/Cookbook/ProgrammingIdioms#How_do_I_select_all_the_columns_of_a_table_except_one.3F but it did not help.
How can we display all columns except two in a kdb historical database?
You are getting the 'par error because the table is partitioned.
The error is documented here:
trying to update a partitioned table
You cannot directly update or delete anything on a partitioned table (there is a separate DB maintenance script for that).
The query you used as a fix works because it first selects the data into memory (temporarily) and then deletes the columns from that in-memory copy:
delete colid,coltime from select from table where date=.z.d-1
You can try the following functional form (here t is a generic table and `p the column to exclude; for your case it would be cols[table] except `colid`coltime):
c:cols[t] except `p
?[t;enlist(=;`date;2015.01.01);0b;c!c]
You could try a functional select:
?[table;enlist(=;`date;.z.d);0b;{x!x}cols[table]except`colid`coltime]
Here the last argument is a dictionary mapping output column names to the expressions that produce them, which tells the query what to extract. Instead of deleting the columns you specified, this selects all but those two, which amounts to the same query.
To see what the functional form of a query is, you can run something like:
parse"select colid,coltime from table where date=.z.d"
And it will output the arguments to the functional select.
You can read more on functional selects at code.kx.com.
Only select queries work on partitioned tables; you worked around that by first selecting the data into memory and then deleting the columns you did not want.
If you have a large number of columns and don't want to create a bulky select query, you could use a functional select.
?[table;();0b;{x!x}((cols table) except `colid`coltime)]
This shows all columns except the given subset. The column clause expects a dictionary, hence the function {x!x} to convert the list of column names into a dictionary. See more information here:
https://code.kx.com/q/ref/funsql/
As nyi mentioned, if you want to permanently delete columns from a historical database, you can use the deleteCol function in the dbmaint tools: https://github.com/KxSystems/kdb/blob/master/utils/dbmaint.md

"ON UPDATE" equivalent for Amazon Redshift

I want to create a table that has a column updated_date that is updated to SYSDATE every time any field in that row is updated. How should I do this in Redshift?
You should create your table definition like the one below; that will make sure that whenever you insert a record, it populates SYSDATE.
create table test(
id integer not null,
update_at timestamp DEFAULT SYSDATE);
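As a quick illustration (assuming the test table above), inserting without update_at lets the default fill it in:
INSERT INTO test (id) VALUES (1);
SELECT * FROM test;  -- update_at is populated with the insert time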
What about updating a field every time the row changes?
Remember, Redshift is a data-warehouse solution, not a regular OLTP database, so updates should be avoided or minimized.
UPDATE = DELETE + INSERT
Ideally, instead of updating a record, you should delete and re-insert it; the re-insert populates update_at via the default, and the net effect is an update (DELETE + INSERT).
Also, most ETLs stage data first; if you use a stg_sales table to populate your data, the above solution still works, and you could do something like the below.
DELETE FROM sales WHERE id IN (SELECT id FROM stg_sales);
INSERT INTO sales (id) SELECT id FROM stg_sales;  -- update_at falls back to its DEFAULT
Hope this answers your question.
Redshift doesn't support UPSERTs, so you should load your data into a temporary/staging table first and check for IDs in the main table that also exist in the staging table (i.e. rows that need to be updated).
Delete those records, and INSERT the data from the staging table, which will have the new updated_date.
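A minimal sketch of that flow, assuming a sales main table and a stg_sales staging table with matching columns (the column names are illustrative):
BEGIN;
-- Drop the rows that are about to be replaced.
DELETE FROM sales USING stg_sales WHERE sales.id = stg_sales.id;
-- Re-insert them; updated_date picks up the current load time.
INSERT INTO sales (id, name, updated_date)
SELECT id, name, SYSDATE
FROM stg_sales;
COMMIT;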
Also, don't forget to run VACUUM on your tables every once in a while, because your use case involves a lot of DELETEs and UPDATEs.
Refer to this for additional info.

Redshift query a daily-generated table

I am looking for a way to create a Redshift query that will retrieve data from a table that is generated daily. Tables in our cluster are of the form:
event_table_2016_06_14
event_table_2016_06_13
.. and so on.
I have tried writing a query that appends the current date to the table name, but this does not seem to work correctly (invalid operation):
SELECT * FROM concat('event_table_', to_char(getdate(),'YYYY_MM_DD'))
Any suggestions on how this can be performed are greatly appreciated!
I have tried writing a query that appends the current date to the table name, but this does not seem to work correctly (invalid operation)
Redshift does not support that. But you most likely won't need it.
Try the following (expanding on the answer from @ketan):
Create your main table with a DIST key appropriate for your joins, a COMPOUND or simple SORT key on the timestamp column, and proper compression on the columns.
Daily, create a staging table (use CREATE TABLE ... LIKE, which preserves DIST/SORT keys), load it with the daily data, and VACUUM SORT it.
Move the sorted staging table into the main table using ALTER TABLE APPEND (see the sketch below); this appends the data already sorted and reduces the need to VACUUM the main table. You may still need VACUUM SORT after that.
After that, query your main table normally, probably giving it a range on the timestamp. Redshift is optimised for these scenarios, and 99% of the time you don't need to optimise table scans yourself; even on tables with billions of rows, scans take milliseconds to a few seconds. You may need to optimise elsewhere, but that's the second step.
To get insight into the performance of scans, use the STL_QUERY system table to find your query ID, and then use the STL_SCAN (or SVL_QUERY_SUMMARY) table to see how fast the scan was.
Your example is actually the main use case for ALTER TABLE APPEND.
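A minimal sketch of that daily flow, assuming a main table event_table sorted on a timestamp column (the bucket, IAM role, and names are illustrative):
-- Staging table inherits DIST and SORT keys from the main table.
CREATE TABLE event_table_stage (LIKE event_table);

COPY event_table_stage
FROM 's3://my-bucket/events/2016-06-14/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV;

VACUUM SORT ONLY event_table_stage;  -- sort the day's rows before appending

-- Moves (not copies) the data into the main table, already sorted.
ALTER TABLE event_table APPEND FROM event_table_stage;
DROP TABLE event_table_stage;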
I am assuming that you are creating a new table every day.
What you can do is:
Create a view on top of the event_table_* tables (see the sketch below). Query your data using this view.
Whenever you create or drop a table, update the view.
If you want, you can avoid step 2: instead of creating a new table every day, create empty tables for the next 1-2 years up front, so there is no need to update the view every day. However, do remember that there is an upper limit of 9,900 tables in Redshift.
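A minimal sketch of such a view, using two of the daily tables from the question:
-- Recreate this view whenever a daily table is added or dropped.
CREATE OR REPLACE VIEW event_table_all AS
SELECT * FROM event_table_2016_06_13
UNION ALL
SELECT * FROM event_table_2016_06_14;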
Edit: If you always need to query today's table (instead of all tables, as I assumed originally), I don't think you can do that without updating your view.
However, you can modify your design to have just one table, with date as the sort key. Then, whenever the table is queried with some date, all disk blocks that don't contain that date are skipped. That'll be about as efficient as having separate time-series tables.
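A sketch of that single-table design, with illustrative column names:
CREATE TABLE event_table (
    event_date date NOT NULL,
    event_type varchar(64),
    payload    varchar(max)
)
SORTKEY (event_date);

-- A filter on the sort key lets Redshift skip non-matching blocks:
SELECT * FROM event_table WHERE event_date = CURRENT_DATE;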