Slow query / disable cache - Sybase Adaptive Server - database-performance

This query seems to be running incredibly slowly (25 seconds for 4 million records!) on Sybase v10 on a client's database:
Select max(tnr) from myTable;
With tnr being the primary key.
If I run it 1000x on our server however, it seems to go fast (15 ms...) which makes me think it's because the query result is cached. Is there a way to disable the cache for this query (or entire database) in Sybase to reproduce this problem?
I tried:
call sa_flush_cache ();
call sa_flush_statistics ();
But didn't seem to do the trick.

Unfortunately dbcc cacheremove will not work, as it does not clear the pages from the cache; it only removes the descriptor and places it back on the free chain.
Aside from restarting the data server, the only way to do this is to bind the object to a cache, run your tests, and then unbind the object, which removes all of its pages from the cache.

Try dbcc cacheremove

Related

DB2 Tables Not Loading when run in Batch

I have been working on a reporting database in DB2 for a month or so, and I have it setup to a pretty decent degree of what I want. I am however noticing small inconsistencies that I have not been able to work out.
Less important, but still annoying:
1) Users claim it takes two login attempts to connect, first always fails, second is a success. (Is there a recommendation for what to check for this?)
More importantly:
2) Whenever I want to refresh the data (which will be nightly), I have a script that drops and then recreates all of the tables. There are 66 tables, each ranging from tens of records to just under 100,000 records. The data is not massive and takes about 2 minutes to run all 66 tables.
The issue is that once it says it completed, there are usually at least 3-4 tables that did not load any data. So the table is dropped and then created, but is empty. The log shows that the command completed successfully, and if I run them independently they populate just fine.
If it helps, 95% of the commands are just CAST functions.
While I am sure I am not doing it the recommended way, is there a reason why a number of my tables are not populating? Are the commands executing too fast? Should I lag the Create after the DROP?
(This is DB2 Express-C 11.1 on Windows 2012 R2; the source DB is remote.)
Example of my SQL:
DROP TABLE TEST.TIMESHEET;
CREATE TABLE TEST.TIMESHEET AS (
SELECT NAME00, CAST(TIMESHEET_ID AS INTEGER(34))TIMESHEET_ID ....
.. (for 5-50 more columns)
FROM REMOTE_DB.TIMESHEET
)WITH DATA;
It is possible to configure DB2 to tolerate certain SQL errors in nested table expressions.
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyfqetnint.html
When the federated server encounters an allowable error, the server allows the error and continues processing the remainder of the query rather than returning an error for the entire query. The result set that the federated server returns can be a partial or an empty result.
However, I assume that your REMOTE_DB.TIMESHEET is simply a nickname, and not a view with nested table expressions, and so any errors when pulling data from the source should be surfaced by DB2. Taking a look at the db2diag.log is likely the way to go - you might even be hitting a Db2 issue.
It might be useful to change your script to TRUNCATE and INSERT into your local tables and see if that helps avoid the issue.
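Whichever load strategy you end up with, a post-refresh check that lists the tables left empty makes the silent failures visible and lets the script re-run just those loads. A minimal sketch, assuming a Python DB-API connection; the helper name find_empty_tables and the table names are made up for illustration, and in-memory SQLite is used only as a stand-in for Db2:

```python
import sqlite3


def find_empty_tables(conn, tables):
    """Return the names in `tables` that contain no rows."""
    empty = []
    cur = conn.cursor()
    for name in tables:
        # Table names come from the refresh script itself, not from user input.
        cur.execute(f"SELECT COUNT(*) FROM {name}")
        if cur.fetchone()[0] == 0:
            empty.append(name)
    return empty


# Demo against in-memory SQLite (a stand-in; the real target would be Db2).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE timesheet (timesheet_id INTEGER)")
conn.execute("CREATE TABLE project (project_id INTEGER)")
conn.execute("INSERT INTO timesheet VALUES (1)")
print(find_empty_tables(conn, ["timesheet", "project"]))  # prints ['project']
```

Running this right after the nightly script would at least tell you which tables to reload and whether it is always the same ones.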
As you say, you may not be doing things the most efficient way. You could consider using cache tables to take a periodic copy of your remote data: https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyvfed_tuning_cachetbls.html

What is the fastest way to insert rows into a PostgreSQL Database with GeoKettle?

Let's say I have a .csv-File with 100 million rows. I import that csv-file into pentaho Kettle and want to write all rows into a PostgreSQL database. What is the fastest insert-transformation? I have tried the normal table output transformation and the PostgreSQL Bulk Loader (which is way faster than the table output). But still, it is too slow. Is there a faster way than using the PostgreSQL Bulk Loader?
Considering that the PostgreSQL Bulk Loader runs COPY table_name FROM STDIN, there's nothing faster for loading data into Postgres. A multi-value insert will be slower, and separate single-row inserts will be slowest. So you can't make it faster.
To speed up COPY you can:
set commit_delay to 100000;
set synchronous_commit to off;
and other server side tricks (like dropping indexes before loading).
NB:
very old but still relevant depesz post
pgloader most probably won't work with Pentaho Kettle, but it is worth checking out
update
https://www.postgresql.org/docs/current/static/runtime-config-wal.html
synchronous_commit (enum)
Specifies whether transaction commit will wait for WAL records to be
written to disk before the command returns a “success” indication to
the client. Valid values are on, remote_apply, remote_write, local,
and off. The default, and safe, setting is on. When off, there can be
a delay between when success is reported to the client and when the
transaction is really guaranteed to be safe against a server crash.
(The maximum delay is three times wal_writer_delay.) Unlike fsync,
setting this parameter to off does not create any risk of database
inconsistency: an operating system or database crash might result in
some recent allegedly-committed transactions being lost, but the
database state will be just the same as if those transactions had been
aborted cleanly. So, turning synchronous_commit off can be a useful
alternative when performance is more important than exact certainty
about the durability of a transaction.
(emphasis mine)
Also notice that I recommend using SET at the session level, so if GeoKettle does not allow you to set config before running commands on Postgres, you can use the pgbouncer connect_query for the specific user/database pair, or think of some other trick. And if you can't do anything to set synchronous_commit per session and you decide to change it per database or user (so that it is applied to the GeoKettle connection), don't forget to set it back to on after the load is over.
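Outside of Kettle, the session-level SET followed by COPY ... FROM STDIN could look roughly like the sketch below. It assumes a psycopg2 connection (not something the question uses), and the helper name copy_csv is made up; the table name is interpolated directly, so it must come from trusted code.

```python
def copy_csv(conn, table, csv_file):
    """Load csv_file into table with COPY, with synchronous_commit off for this session."""
    with conn.cursor() as cur:
        # Session-level setting: affects only this connection.
        cur.execute("SET synchronous_commit TO off")
        # copy_expert streams the file object to COPY ... FROM STDIN.
        cur.copy_expert(f"COPY {table} FROM STDIN WITH (FORMAT csv)", csv_file)
    conn.commit()
```

Called as copy_csv(conn, "my_table", open("data.csv")), this keeps the setting scoped to the loading connection, so nothing has to be switched back afterwards.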

Deal with Postgresql Error -canceling statement due to conflict with recovery- in psycopg2

I'm creating a reporting engine that makes a couple of long queries over a standby server and process the result with pandas. Everything works fine but sometimes I have some issues with the execution of those queries using a psycopg2 cursor: the query is cancelled with the following message:
ERROR: canceling statement due to conflict with recovery
Detail: User query might have needed to see row versions that must be removed
I was investigating this issue
PostgreSQL ERROR: canceling statement due to conflict with recovery
https://www.postgresql.org/docs/9.0/static/hot-standby.html#HOT-STANDBY-CONFLICT
but all solutions suggest fixing the issue making modifications to the server's configuration. I can't make those modifications (We won the last football game against IT guys :) ) so I want to know how can I deal with this situation from the perspective of a developer. Can I resolve this issue using python code? My temporary solution is simple: catch the exception and retry all the failed queries. Maybe could be done better (I hope so).
Thanks in advance
There is nothing you can do to avoid that error without changing the PostgreSQL configuration (from PostgreSQL 9.1 on, you could e.g. set hot_standby_feedback to on).
You are dealing with the error in the correct fashion – simply retry the failed transaction.
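A minimal sketch of such retry logic, for what it's worth. The names run_with_retry and is_recovery_conflict are made up here, and matching on the error text is an assumption made to keep the example driver-independent (with psycopg2 you could inspect the exception's pgcode instead):

```python
import time


def is_recovery_conflict(exc):
    """Return True if the exception text matches the standby conflict error."""
    return "conflict with recovery" in str(exc)


def run_with_retry(run_query, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call run_query() until it succeeds or max_attempts is reached.

    Only errors recognised by is_recovery_conflict() are retried;
    anything else is re-raised immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except Exception as exc:
            if not is_recovery_conflict(exc) or attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Here run_query would be a small function that executes the statement and fetches the result. With psycopg2 it should also call conn.rollback() before the next attempt (or use an autocommit connection), because a failed statement leaves the transaction aborted.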
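The table data on the hot standby slave server is modified while a long-running query is running. A solution (PostgreSQL 9.1+) to make sure the table data is not modified is to suspend replay of the replicated changes on the slave and resume it after the query. (From PostgreSQL 10 on, these functions are named pg_wal_replay_pause() and pg_wal_replay_resume().)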
select pg_xlog_replay_pause(); -- suspend
select * from foo; -- your query
select pg_xlog_replay_resume(); --resume
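If you do this from Python, it is worth making sure that replay is resumed even when the query fails, otherwise the standby keeps falling behind. A rough sketch with a context manager: the name replay_paused is made up, the connection is assumed to be in autocommit mode (so the resume call still runs after a failed statement), and the account is assumed to be allowed to call these functions, which are typically restricted to superusers.

```python
from contextlib import contextmanager


@contextmanager
def replay_paused(cursor, pause_fn="pg_xlog_replay_pause", resume_fn="pg_xlog_replay_resume"):
    """Pause WAL replay on the standby for the duration of the with-block."""
    cursor.execute(f"SELECT {pause_fn}()")
    try:
        yield cursor
    finally:
        # Always resume, even if the query inside the block raised.
        cursor.execute(f"SELECT {resume_fn}()")
```

Usage would be along the lines of "with replay_paused(cur): cur.execute('select * from foo')". On PostgreSQL 10 or later, pass pause_fn="pg_wal_replay_pause" and resume_fn="pg_wal_replay_resume".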
I recently encountered a similar error and was also in the position of not being a dba/devops person with access to the underlying database settings.
My solution was to reduce the time of the query wherever possible. Obviously this requires deep knowledge of your tables and data, but I was able to solve my problem with a combination of a more efficient WHERE filter, a GROUP BY aggregation, and more extensive use of indexes.
By reducing the amount of server side execute time and data, you reduce the chance of a rollback error occurring.
However, a rollback can still occur during your shortened window, so a comprehensive solution would also make use of some retry logic for when a rollback error occurs.
Update: A colleague implemented said retry logic as well as batching the query to make the data volumes smaller. These three solutions have made the problem go away entirely.
I got the same error. What you CAN do (if the query is simple enough) is divide the data into smaller chunks as a workaround.
I did this within a python loop to call the query multiple times with the LIMIT and OFFSET parameter like:
query_chunk = f"""
SELECT *
FROM {database}.{datatable}
LIMIT {chunk_size} OFFSET {i_chunk * chunk_size}
"""
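where database and datatable are the names of your sources.
The right chunk_size depends on your data; keeping it small enough is crucial for each query to finish. Note that without an ORDER BY the row order is not guaranteed to be the same from one call to the next, so chunks can overlap or miss rows.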
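A slightly fuller sketch of the chunked loop, with an ORDER BY added so that the LIMIT/OFFSET windows stay consistent between calls. The function name fetch_in_chunks and the column names are made up, and an in-memory SQLite table stands in for the real standby so the example runs on its own:

```python
import sqlite3


def fetch_in_chunks(conn, table, order_col, chunk_size):
    """Yield lists of rows from `table`, at most chunk_size rows at a time."""
    cur = conn.cursor()
    i_chunk = 0
    while True:
        # ORDER BY keeps the row order stable between chunks.
        # table and order_col are interpolated, so they must be trusted names.
        query = (
            f"SELECT * FROM {table} ORDER BY {order_col} "
            f"LIMIT {int(chunk_size)} OFFSET {i_chunk * int(chunk_size)}"
        )
        cur.execute(query)
        rows = cur.fetchall()
        if not rows:
            break
        yield rows
        i_chunk += 1


# Demo on an in-memory SQLite table (a stand-in for the real source).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datatable (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO datatable (val) VALUES (?)", [(f"row{i}",) for i in range(10)])
chunks = list(fetch_in_chunks(conn, "datatable", "id", chunk_size=4))
print([len(c) for c in chunks])  # prints [4, 4, 2]
```

Each chunk is a separate short query, so a cancelled chunk can be retried on its own instead of restarting the whole read.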

Clear oracle cache between queries

I want to measure the real execution time of my query with different hints and without them. But Oracle caches the data after the first execution, so the second run is much faster. How can I clear this cache after each query execution?
ALTER SYSTEM FLUSH BUFFER_CACHE
More details in the manual:
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_2013.htm#i2053602

SQL queries running slowly or stuck after DBCC DBReindex or Alter Index

All,
SQL 2005 sp3, database is about 70gb in size. Once in a while when I reindex all of my indexes in all of my tables, the front end seems to freeze up or run very slowly. These are queries coming from the front end, not stored procedures in SQL Server. The front end is using a JTDS JDBC connection to access the SQL Server. If we stop and restart the web services sending the queries, the problem seems to go away. It is my understanding that we have a connection pool in which we re-use connections and don't establish a new connection each time.
This problem does not happen every time we reindex. I have tried both ways with dbcc dbreindex and alter index online = on and sort in tempdb = on.
Any insight into why this problem occurs once in a while and how to prevent this problem would be very helpful.
Thanks in advance,
Gary Abbott
When this happens next time, look into sys.dm_exec_requests to see what is blocking the requests from the clients. The blocking_session_id will indicate who is blocking, and the wait_type and wait_resource will indicate what it is blocking on. You can also use the Activity Monitor to the same effect.
On a pre-grown database an online index rebuild will not block normal activity (select/insert/update/delete). The load on the server may increase as a result of the online index rebuild and this could result in overall slower responses, but should not cause blocking.
If the database is not pre-grown though then the extra allocations of the index rebuild will trigger database growth events, which can be very slow if left default at 10% increments and without instant file initialisation enabled. During a database growth event all activity is frozen in that database, and this may be your problem even if the indexes are rebuilt online. Again, Activity Monitor and sys.dm_exec_requests would both clearly show this as happening.
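The sys.dm_exec_requests check could be scripted so it is ready to run the next time the front end freezes. A small sketch, assuming any Python DB-API cursor connected to the server (for example via pyodbc); the helper name blocked_requests is made up:

```python
BLOCKED_REQUESTS_SQL = """
SELECT session_id, blocking_session_id, wait_type, wait_resource, wait_time
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0
"""


def blocked_requests(cursor):
    """Return one dict per request that is currently blocked by another session."""
    cursor.execute(BLOCKED_REQUESTS_SQL)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

Dropping the WHERE clause lists every active request with its wait_type, which is the view you would want when checking for waits caused by a database growth event rather than by another session.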