Get table name of Redshift COPY commands from stl_load_commits

I'm trying to get a list of COPY commands run on a particular date and the table that was updated by each COPY command.
I'm working with this query:
select
slc.query as query_id,
trim(slc.filename) as file,
slc.curtime as updated,
slc.lines_scanned as rows,
sq.querytxt as querytxt
from stl_load_commits slc
join stl_query sq on sq.query = slc.query
where trunc(slc.curtime) = '2020-05-07';
How can we get the table that was updated for each COPY command? Maybe using a Redshift RegEx function on querytxt? Or joining to another system table to find the table id or name?

This regex will extract the table (or schema.table) name from stl_query.querytxt:
select
slc.query as query_id,
trim(slc.filename) as file,
slc.curtime as updated,
slc.lines_scanned as rows,
sq.querytxt as querytxt,
REGEXP_REPLACE(LOWER(sq.querytxt), '^copy (analyze )?(\\S+).*$', '$2') AS t
from stl_load_commits slc
join stl_query sq on sq.query = slc.query
where trunc(slc.curtime) = '2020-05-07';
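If you would rather not rely on a regex, the system tables can resolve the table directly: STL_INSERT records the target table id (tbl) for each query, and SVV_TABLE_INFO maps that id to a schema and table name. A rough sketch along those lines (not from the original answer, and untested):
select distinct
slc.query as query_id,
trim(slc.filename) as file,
trim(sti."schema") as schema_name,
trim(sti."table") as table_name,
slc.curtime as updated
from stl_load_commits slc
join stl_insert si on si.query = slc.query
join svv_table_info sti on sti.table_id = si.tbl
where trunc(slc.curtime) = '2020-05-07';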

Related

Multiple table UPDATE using SELECT that requires JOIN using PGAdmin

I am using PGAdmin III and Postgres 9.6.
I have Postgres schemas consisting of 250 tables with various columns. Two of the tables have similar names and the same columns, but unique data and IDs. I want to UPDATE one column in both tables with a specific number, but choosing which rows get updated is complicated.
I have a script that works correctly on one table at a time, but I wonder if it can be done in a single script. I have tried using DO and LOOP but keep running into roadblocks.
Here's the working single-table script:
UPDATE table1
SET number = 44
WHERE table1.id IN
(SELECT table1.id FROM plan
JOIN name ON name.id = plan.nameid
JOIN type ON type.id = name.typeid
JOIN simul ON simul.id = plan.simulid
LEFT JOIN table1 ON table1.id = tableid
WHERE type.tag = 'X' AND plan.class LIKE '%Table%' AND simul.label IN
('SIMUL90','SIMUL99','SIMUL87'));
"plan.class" contains text which determines class of simul and thus either Table1 or Table2. Table1 and Table2 are of identical column structure but their IDs may be identical so UNION is out of consideration. TH other joins are to fine tune the SELECT so I get only a small subset of the table. The number to update is the same for either table.

Postgres - UNION ALL vs INSERT INTO which is better?

I have two queries: one using UNION ALL, and one inserting into a temp table.
Query 1
select *
from (
select a.id as id, a.name as name from a
union all
select b.id as id, b.name as name from b
) t;
Query 2
drop table if exists temporary;
create temp table temporary as
select id as id, name as name
from a;
insert into temporary
select id as id, name as name
from b;
select * from temporary;
Which one is better for performance?
I would expect the second option to have better performance, at least at the database level. Both versions require doing a full table scan of both the a and b tables. But the first version would create an unnecessary intermediate table, used only for the purpose of the insert.
The only potential issue with doing two separate inserts is latency, i.e. the time it might take some process to get to and from the database. If you are worried about this, you can combine them into one insert statement:
INSERT INTO temporary (id, name)
SELECT id, name FROM a
UNION ALL
SELECT id, name FROM b;
This would just require one trip to the database.
I think UNION ALL is the better-performing option, but I'm not sure; you can try it yourself. The run tab of most SQL clients shows the execution time. I checked it in Oracle; MySQL and SQL Server have the same kind of tool.
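If you want to measure this yourself in Postgres, EXPLAIN ANALYZE on each variant reports the actual run time; a minimal sketch (the temp table's column types are assumed here, not taken from the question):
-- time the plain UNION ALL read
EXPLAIN ANALYZE
SELECT id, name FROM a
UNION ALL
SELECT id, name FROM b;
-- time the temp-table route (EXPLAIN ANALYZE actually executes the insert)
CREATE TEMP TABLE temporary (id int, name text); -- assumed column types
EXPLAIN ANALYZE
INSERT INTO temporary (id, name)
SELECT id, name FROM a
UNION ALL
SELECT id, name FROM b;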

selecting a distinct column with alias table not working in postgres

SELECT pl_id,
distinct ON (store.store_ID),
in_user_id
FROM plan1.plan_copy_levl copy1
INNER JOIN plan1._PLAN_STORE store
ON copy1.PLAN_ID = store .PLAN_ID;
While running this query on the Postgres server I get the error below. How do I use the DISTINCT ON clause? In the code above, plan1 is the schema name.
ERROR: syntax error at or near "distinct"
LINE 2: distinct ON (store.store_ID),
You are missing an ORDER BY whose leading columns match the ones listed in DISTINCT ON. Also, the DISTINCT ON clause has to come at the start of the select list.
Try this:
SELECT distinct ON (store_ID) store.store_ID, pl_id,
in_user_id
FROM plan1.plan_copy_levl copy1
INNER JOIN plan1._PLAN_STORE store
ON copy1.PLAN_ID = store.PLAN_ID
order by store_ID, pl_id;

Amazon Redshift how to get the last date a table inserted data

I am trying to get the last date an insert was performed on a table (on Amazon Redshift). Is there any way to do this using the metadata? The tables do not store any timestamp column, and even if they did, we would need to find this out for 3k tables, so checking each one would be impractical; a metadata approach is our strategy. Any tips?
All insert execution steps for queries are logged in STL_INSERT. This query should give you the information you're looking for:
SELECT sti."schema", sti."table", sq.endtime, sq.querytxt
FROM
(SELECT MAX(query) as query, tbl, MAX(i.endtime) as last_insert
FROM stl_insert i
GROUP BY tbl
ORDER BY tbl) inserts
JOIN stl_query sq ON sq.query = inserts.query
JOIN svv_table_info sti ON sti.table_id = inserts.tbl
ORDER BY inserts.last_insert DESC;
Note: The STL tables only retain approximately two to five days of log history.

Can a database table partition name be used as a part of WHERE clause for IBM DB2 9.7 SELECT statement?

I am trying to select all data from the same specific table partition for 100+ tables using the DB2 EXPORT utility. The partition name is constant across all of my partitioned tables, which makes this approach more convenient than the alternatives.
I cannot detach the partitions as they are in a production environment.
In order to script this for semi-automation, I need to be able to run the query:
SELECT * FROM MYTABLE
WHERE PARTITION_NAME = MYPARTITION;
I am not able to find the correct syntax for utilizing this type of logic in my SELECT statement passed to the EXPORT utility.
You can do something like this by looking up the partition number first:
SELECT SEQNO
FROM SYSCAT.DATAPARTITIONS
WHERE TABNAME = 'YOURTABLE' AND DATAPARTITIONNAME = 'WHATEVER'
then using the SEQNO value in the query:
SELECT * FROM MYTABLE
WHERE DATAPARTITIONNUM(anycolumn) = <SEQNO value>
Edit:
Since it does not matter what column you reference in DATAPARTITIONNUM(), and since each table is guaranteed to have at least one column, you can automatically generate queries by joining SYSCAT.DATAPARTITIONS and SYSCAT.COLUMNS:
select
'select * from', p.tabname,
'where datapartitionnum(', colname, ') = ', seqno
from syscat.datapartitions p
inner join syscat.columns c
on p.tabschema = c.tabschema and p.tabname = c.tabname
where colno = 1
and datapartitionname = '<your partition name>'
and p.tabname in (<your table list>)
However, building a dependency on database metadata into your application is, in my view, not very reliable. You could simply specify the appropriate partitioning key range to extract the data, which would be just as efficient.
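As a sketch of that last suggestion: the bounds of the partition can be read from SYSCAT.DATAPARTITIONS and then used as an ordinary range predicate (the partitioning column PART_KEY_COL below is a placeholder, not from the original question):
-- look up the bounds of the partition of interest
SELECT LOWVALUE, LOWINCLUSIVE, HIGHVALUE, HIGHINCLUSIVE
FROM SYSCAT.DATAPARTITIONS
WHERE TABNAME = 'MYTABLE' AND DATAPARTITIONNAME = 'MYPARTITION';
-- then export with an explicit range predicate on the partitioning column
SELECT * FROM MYTABLE
WHERE PART_KEY_COL >= <low value> AND PART_KEY_COL <= <high value>;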