Log some summary data to the execution results window - pentaho-spoon

I'm reasonably new to Pentaho Spoon and I'm trying to find a way to write the result of a query to the execution results window.
I'm importing a lot of data into a number of Postgres tables and I'd like to log the number of rows that the tables hold at the end of my job.
I can run the following query in postgres to get the information I want:
select relname as table_name, n_live_tup as rows
from pg_stat_user_tables
where pg_stat_user_tables.n_live_tup > 0
order by relname asc;
But I can't figure out how to get the dozen rows returned into the log window.

Related

How to optimize a query that searches a many-to-many table

I have 3 tables:
table1:{id, uid}
table2:{id, uid}
table1_table2:{table1_id, table2_id}
I need to execute the following query:
SELECT 1 FROM table1_table2
LEFT JOIN table1 ON table1.id = table1_table2.table1_id
LEFT JOIN table2 ON table2.id = table1_table2.table2_id
WHERE table1.uid = ? and table2.uid = ?
I have unique indices on the UUID (uid) columns, so I expected the search to be fast. On an almost empty database the select takes 0 ms. With 50,000 records in table1, 100 records in table2 and 110,000 records in table1_table2, the select takes 10 ms, which is a lot because I have to run 400,000 of these queries (roughly 4,000 seconds in total at 10 ms each). Can I get O(1) on the select?
I'm currently using Hibernate (Spring Data) and Postgres.
You have unique indices but have you updated statistics with ANALYZE as well?
What type is used for the uid column, and what type are you passing to it from Java?
Is there any difference between running it from Hibernate/Java and running it from the Postgres console?
Run the query with "EXPLAIN" to get the execution plan, from Java as well as from the Postgres console, and compare the two. See "How to get query plan information from Postgres into JDBC".
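A minimal sketch of those checks, run from the Postgres console. The UUID literals are placeholders, and the index at the end is an assumption (the question does not say whether table1_table2 already has one; the index name is made up). Because the WHERE clause filters on both table1.uid and table2.uid, the LEFT JOINs behave like inner joins, so the sketch writes them that way.

```sql
-- Refresh planner statistics for all three tables.
ANALYZE table1;
ANALYZE table2;
ANALYZE table1_table2;

-- Show the plan the server actually runs, with timings.
EXPLAIN ANALYZE
SELECT 1
FROM table1_table2
JOIN table1 ON table1.id = table1_table2.table1_id
JOIN table2 ON table2.id = table1_table2.table2_id
WHERE table1.uid = '00000000-0000-0000-0000-000000000001'  -- placeholder value
  AND table2.uid = '00000000-0000-0000-0000-000000000002'; -- placeholder value

-- Assumption: the link table has no index on the pair of ids yet.
CREATE INDEX table1_table2_pair_idx  -- hypothetical index name
    ON table1_table2 (table1_id, table2_id);
```

If the plan shows a sequential scan on table1_table2, the link table rather than the uid lookup is the likely bottleneck.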

Bad JDBC select performance with big table

I have a simple select on a big table with PostgreSQL:
select a, b from c order by id asc
I prepare a statement with JDBC and the first result takes very long. It seems that the result is fully materialized immediately. If I execute the query interactively it shows the same behavior. If I add limit 20, the result comes back immediately, so a missing index or a full table scan does not seem to be the cause:
select a, b from c order by id asc limit 20
Normally, the ResultSet is accessed with a cursor and should immediately deliver results. Also, the memory consumption grows constantly during the query execution, supporting the thesis that the ResultSet is materialized immediately.
Any hints about that?
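One thing to check, offered as a sketch rather than a confirmed diagnosis: the PostgreSQL JDBC driver reads the whole result set into memory by default, and only fetches through a server-side cursor when autocommit is off and a fetch size is set on the statement. The SQL below does the equivalent by hand from the console; the cursor name c_cur is made up.

```sql
BEGIN;  -- a plain cursor only lives inside a transaction

-- Declare a server-side cursor over the slow query.
DECLARE c_cur CURSOR FOR
    SELECT a, b FROM c ORDER BY id ASC;

FETCH 20 FROM c_cur;  -- first batch of rows
FETCH 20 FROM c_cur;  -- next batch, continuing where the first stopped

CLOSE c_cur;
COMMIT;
```

In JDBC the matching calls are Connection.setAutoCommit(false) and Statement.setFetchSize(n) before executing the query.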

How can I get the total run time of a query in redshift, with a query?

I'm in the process of benchmarking some queries in Redshift so that I can say something intelligent about changes I've made to a table, such as adding encodings and running a vacuum. I can query the stl_query table with a LIKE clause to find the queries I'm interested in, so I have the query id, but tables/views like stv_query_summary are much too granular and I'm not sure how to generate the summarization I need!
The GUI dashboard shows the metrics I'm interested in, but the format is difficult to store for later analysis/comparison (in other words, I want to avoid taking screenshots). Is there a good way to rebuild that view with SQL selects?
To add to Alex's answer: the stl_query table has the drawback that if the query waited in a queue before running, the queue time is included in the run time, so that run time is not a very good indicator of the query's performance.
To understand the actual runtime of the query, check on stl_wlm_query for the total_exec_time.
select total_exec_time
from stl_wlm_query
where query='query_id'
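A sketch that combines this with the LIKE lookup from the question, so queue time and execution time appear side by side. It assumes total_queue_time and total_exec_time are reported in microseconds, as the Redshift documentation describes; the LIKE pattern is a placeholder.

```sql
select q.query,
       trim(q.querytxt)               as qrytext,
       w.total_queue_time / 1000000.0 as queue_seconds,  -- time spent waiting in the WLM queue
       w.total_exec_time  / 1000000.0 as exec_seconds    -- time spent executing
from stl_query q
join stl_wlm_query w on w.query = q.query
where q.querytxt like '%my_benchmark_marker%'  -- placeholder pattern
order by q.starttime desc;
```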
There are some useful tools/scripts at https://github.com/awslabs/amazon-redshift-utils
Here is one of said scripts stripped out to give you query run times in milliseconds. Play with the filters, ordering etc to show the results you are looking for:
select userid, label, stl_query.query, trim(database) as database, trim(querytxt) as qrytext, starttime, endtime, datediff(milliseconds, starttime,endtime)::numeric(12,2) as run_milliseconds,
aborted, decode(alrt.event,'Very selective query filter','Filter','Scanned a large number of deleted rows','Deleted','Nested Loop Join in the query plan','Nested Loop','Distributed a large number of rows across the network','Distributed','Broadcasted a large number of rows across the network','Broadcast','Missing query planner statistics','Stats',alrt.event) as event
from stl_query
left outer join ( select query, trim(split_part(event,':',1)) as event from STL_ALERT_EVENT_LOG group by query, trim(split_part(event,':',1)) ) as alrt on alrt.query = stl_query.query
where userid <> 1
-- and (querytxt like 'SELECT%' or querytxt like 'select%' )
-- and database = ''
order by starttime desc
limit 100

Amazon Redshift how to get the last date a table inserted data

I am trying to get the date of the last insert into a table (on Amazon Redshift). Is there any way to do this using the metadata? The tables do not store any timestamp column, and even if they did, we need this for 3k tables, so checking each one would be impractical; a metadata approach is our strategy. Any tips?
All insert execution steps for queries are logged in STL_INSERT. This query should give you the information you're looking for:
SELECT sti.schema, sti.table, sq.endtime, sq.querytxt
FROM
(SELECT MAX(query) as query, tbl, MAX(i.endtime) as last_insert
FROM stl_insert i
GROUP BY tbl
ORDER BY tbl) inserts
JOIN stl_query sq ON sq.query = inserts.query
JOIN svv_table_info sti ON sti.table_id = inserts.tbl
ORDER BY inserts.last_insert DESC;
Note: The STL tables only retain approximately two to five days of log history.

ERROR: could not read block 4707 of relation 1663/16384/16564: Success

I am using PostgreSQL 8.1.18 with a GlassFish server. I have a query like this:
select ip,round((select sum(t1.size) from table t1))
from table
where date > '2011.07.29'
and date < '2011.07.30'
and ip = '255.255.255.255'
group by ip;
When I run this query I get this error:
ERROR: could not read block 4707 of relation 1663/16384/16564: Success
However this query works fine:
select ip,round(sum(size)/175)
from table
where date > '2011.07.29'
and l_date < '2011.07.30'
and ip = '255.255.255.255'
group by ip;
I think it might be a database error and that I may need to restore the table from a backup. But first I need to find out where the corrupted data is. Does anyone know how to find relation 1663/16384/16564, or block 4707?
EDIT:
I tried this code:
select relname , relfilenode from pg_class where relname in ('1663','16384','16564');
but it returns:
relname | relfilenode
---------+-------------
(0 rows)
It looks like there are bad blocks in a table or an index.
To find the damaged relation, query pg_class. The three numbers in the error are the tablespace OID (1663), the database OID (16384) and the relation's relfilenode (16564), so only the last one identifies the table or index. That is also why searching relname for those numbers returns no rows:
select oid, relname, relkind from pg_class where relfilenode = 16564 or oid = 16564;
and see what it returns.
If the result is an index (relkind 'i'), just recreate the corrupted index.
If the result is a table (relkind 'r'), some of the table's data is damaged.
You can set the parameter "zero_damaged_pages" to on to bypass the corrupted pages, or
restore the table from your most recent backup.
More information about the parameter "zero_damaged_pages":
http://www.postgresql.org/docs/9.0/static/runtime-config-developer.html
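A rough sketch of the two recovery paths, depending on whether the pg_class lookup turns up an index or a table. The names my_index, my_table and my_table_salvaged are hypothetical. Whether zero_damaged_pages gets past this particular read error is not guaranteed, so treat the second part as a last resort when no usable backup exists.

```sql
-- Case 1: the damaged relation is an index. Rebuild it from the table data.
REINDEX INDEX my_index;  -- hypothetical index name

-- Case 2: the damaged relation is a table.
-- Superuser only. Pages with damaged headers are zeroed as they are read,
-- so every row stored on those pages is lost.
SET zero_damaged_pages = on;

-- Copy whatever rows are still readable into a new table.
CREATE TABLE my_table_salvaged AS  -- hypothetical table names
    SELECT * FROM my_table;

SET zero_damaged_pages = off;
```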