Does q/kdb+ ever act asynchronously when synchronously instructed to write tables to disk?

I've had a strange bug pop up -- when I write to a partitioned table and then immediately do a select on that same table, I get an error like:
./2018.04.23/ngbarx/cunadj. OS reports: Operation not permitted.
This error does not appear if, after writing the table, I wait a few seconds. To me this points towards some kind of caching situation, where q responds before an operation is complete, but as far as I know everything I am doing should be synchronous.
I would love to understand what I am doing wrong / what is causing this error exactly / which commands are executing asynchronously.
The horrible details:
I am writing from Python, connected to q synchronously using the qpython3 package
The q session is launched with slave threads, i.e. -s 4
To write the partitioned table, I am using the unofficial function .Q.dcfgnt which can be found here
I write to a q session that was initialized with a database directory as is usual when dealing with partitioned tables
After writing the table with .Q.dcfgnt, but before doing the select, I also run .Q.chk`:/db/; system"l /db/"; .Q.cn table, in that order, just to be sure the table is up and ready to use in the q session. These might be both overkill and in the wrong order, but as far as I know they are all synchronous calls; please correct me if I am wrong.
The trigger for the error is 10#select from table; I understand why this is a bad idea in general on a partitioned table, but from my understanding it shouldn't be triggering the particular error that I am getting.
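To make the sequence concrete, here is a minimal sketch assuming the qpython-style API (qconnection.QConnection / sendSync), which the qpython3 fork also exposes; host, port, database path and table name are placeholders, and the .Q.dcfgnt write itself is only indicated with a comment since its arguments are not shown here:
# hypothetical sketch: synchronous round trips to a q session that already has
# the partitioned database loaded and .Q.dcfgnt defined; all names are placeholders
from qpython import qconnection

q = qconnection.QConnection(host='localhost', port=5000)
q.open()

# ... the partitioned table is written here with .Q.dcfgnt (call omitted) ...

# the post-write housekeeping from the question, each sent as a synchronous call
q.sendSync('.Q.chk`:/db/')        # fill any missing partitions
q.sendSync('system"l /db/"')      # re-load the database root
q.sendSync('.Q.cn table')         # per-partition row counts, touching the table

# the select that triggers "OS reports: Operation not permitted" right after the write
rows = q.sendSync('10#select from table')
q.close()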

Related

PostgreSQL: See query results as scan progresses, rather than wait till end and show all

For a simple select query like select column_name from table_name on a very large table, is it possible to have the output being provided as the scan of the table progresses?
If I abort the command after some time, I expect to get at least the output the select has produced thus far.
Think cat, which I believe won't wait till it completes the full read of the file.
Does MySQL or other RDBMS systems support this?
PostgreSQL always streams the result to the client, and usually it is the client library that collects the whole result set before returning it to the user.
The C API libpq has functionality that supports this (single-row mode, via PQsetSingleRowMode). The main disadvantage of this approach is that you could get a run-time error after you have already received some rows, so that's a case you'd have to handle.
The traditional way to receive a query result in parts is to use a cursor and fetch results from it. This is a technique supported by all client APIs.
Cursors are probably what you are looking for, and they are supported by all RDBMS I know in some fashion.
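For example, with psycopg2 in Python a named (server-side) cursor gives you exactly this incremental behaviour; the connection string, query and batch size below are placeholders:
import psycopg2

conn = psycopg2.connect("dbname=mydb")                   # placeholder connection string
with conn:
    with conn.cursor(name="streaming_cursor") as cur:    # named => server-side cursor
        cur.itersize = 10000        # rows fetched per round trip to the server
        cur.execute("SELECT column_name FROM table_name")
        for row in cur:             # rows arrive while the scan is still running
            print(row)
conn.close()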

DB2 Tables Not Loading when run in Batch

I have been working on a reporting database in DB2 for a month or so, and I have it setup to a pretty decent degree of what I want. I am however noticing small inconsistencies that I have not been able to work out.
Less important, but still annoying:
1) Users claim it takes two login attempts to connect: the first always fails, the second succeeds. (Is there a recommendation for what to check for this?)
More importantly:
2) Whenever I want to refresh the data (which will be nightly), I have a script that drops and then recreates all of the tables. There are 66 tables, each ranging from 10's of records to just under 100,000 records. The data is not massive and takes about 2 minutes to run all 66 tables.
The issue is that once it says it has completed, there are usually at least 3-4 tables that did not load any data. So the table is deleted and then created, but is empty. The log shows that the command completed successfully, and if I run them independently they populate just fine.
If it helps, 95% of the commands are just CAST functions.
While I am sure I am not doing it the recommended way, is there a reason why a number of my tables are not populating? Are the commands executing too fast? Should I delay the CREATE after the DROP?
(This is DB2 Express-C 11.1 on Windows 2012 R2, The source DB is remote)
Example of my SQL:
DROP TABLE TEST.TIMESHEET;
CREATE TABLE TEST.TIMESHEET AS (
SELECT NAME00, CAST(TIMESHEET_ID AS INTEGER(34)) TIMESHEET_ID ....
.. (for 5-50 more columns)
FROM REMOTE_DB.TIMESHEET
)WITH DATA;
It is possible to configure DB2 to tolerate certain SQL errors in nested table expressions.
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyfqetnint.html
When the federated server encounters an allowable error, the server allows the error and continues processing the remainder of the query rather than returning an error for the entire query. The result set that the federated server returns can be a partial or an empty result.
However, I assume that your REMOTE_DB.TIMESHEET is simply a nickname, and not a view with nested table expressions, and so any errors when pulling data from the source should be surfaced by DB2. Taking a look at the db2diag.log is likely the way to go - you might even be hitting a Db2 issue.
It might be useful to change your script to TRUNCATE and INSERT into your local tables and see if that helps avoid the issue.
As you say, you are maybe not doing things in the most efficient way. You could consider using cache tables to take a periodic copy of your remote data: https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyvfed_tuning_cachetbls.html
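A hedged sketch of the TRUNCATE/INSERT variant suggested above, written in Python with the ibm_db_dbi DB-API driver (the driver, DSN and statements are assumptions; keep the same SELECT/CAST column list you already use in the CREATE TABLE script):
import ibm_db_dbi   # IBM's DB-API wrapper around ibm_db (assumed available)

# placeholder DSN; fill in your own host, port and credentials
conn = ibm_db_dbi.connect("DATABASE=testdb;HOSTNAME=localhost;PORT=50000;"
                          "PROTOCOL=TCPIP;UID=user;PWD=secret", "", "")
cur = conn.cursor()

# keep the table, just empty it and reload it, instead of DROP/CREATE
cur.execute("TRUNCATE TABLE TEST.TIMESHEET IMMEDIATE")
conn.commit()   # TRUNCATE must be the first statement in its unit of work

# reuse the same SELECT/CAST column list as in the original CREATE TABLE script
cur.execute("INSERT INTO TEST.TIMESHEET SELECT * FROM REMOTE_DB.TIMESHEET")
conn.commit()
cur.close()
conn.close()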

What would happen if I run two SQL commands using the same DB connection?

I'm writing a program to run mass calculations and output the results into PostgreSQL.
My platform is Windows Server 2008, PostgreSQL 10. My program is written in C.
The results are produced group by group, and finishing each group creates an extra thread to write the output.
Now, since the output threads are created one by one, it is possible that two or more SQL commands will be issued simultaneously, or that the previous one is still being processed when a new thread calls the function.
So my questions are:
(1) What would happen if one thread is in the middle of SQL processing and another thread calls PQexec(PGconn *conn, const char *query)? Would they affect each other?
(2) What if I use a different PGconn for each thread? Would it speed things up?
If you try to call PQexec on a connection that is in the process of executing an SQL statement, you would cause a protocol violation. That just doesn't work.
Processing could certainly be made faster if you use several database connections in parallel — concurrent transactions is something that PostgreSQL is designed for.
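The question is about libpq in C, but the same rule holds in any client: give each concurrently running statement its own connection. A rough Python/psycopg2 sketch of that pattern (table name, connection string and the sample groups are placeholders):
import threading
import psycopg2

def write_group(rows):
    # each output thread opens and owns its own connection
    conn = psycopg2.connect("dbname=mydb")       # placeholder connection string
    with conn, conn.cursor() as cur:
        cur.executemany("INSERT INTO results (value) VALUES (%s)", rows)
    conn.close()

groups = [[(1,), (2,)], [(3,), (4,)]]            # placeholder result groups
threads = [threading.Thread(target=write_group, args=(g,)) for g in groups]
for t in threads:
    t.start()
for t in threads:
    t.join()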

Deal with Postgresql Error -canceling statement due to conflict with recovery- in psycopg2

I'm creating a reporting engine that runs a couple of long queries against a standby server and processes the results with pandas. Everything works fine, but sometimes I have issues with the execution of those queries using a psycopg2 cursor: the query is cancelled with the following message:
ERROR: cancelling statement due to conflict with recovery
Detail: User query might have needed to see row versions that must be removed
I was investigating this issue
PostgreSQL ERROR: canceling statement due to conflict with recovery
https://www.postgresql.org/docs/9.0/static/hot-standby.html#HOT-STANDBY-CONFLICT
but all solutions suggest fixing the issue by making modifications to the server's configuration. I can't make those modifications (we won the last football game against the IT guys :) ), so I want to know how I can deal with this situation from the perspective of a developer. Can I resolve this issue using Python code? My temporary solution is simple: catch the exception and retry all the failed queries. Maybe it could be done better (I hope so).
Thanks in advance
There is nothing you can do to avoid that error without changing the PostgreSQL configuration (from PostgreSQL 9.1 on, you could e.g. set hot_standby_feedback to on).
You are dealing with the error in the correct fashion – simply retry the failed transaction.
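A minimal sketch of that retry approach with psycopg2 (the error-message check, retry count and delay are illustrative, not canonical):
import time
import psycopg2

def run_with_retry(conn_str, sql, params=None, attempts=5, delay=1.0):
    for attempt in range(attempts):
        conn = psycopg2.connect(conn_str)
        try:
            with conn, conn.cursor() as cur:
                cur.execute(sql, params)
                return cur.fetchall()
        except psycopg2.Error as exc:
            # retry only the recovery-conflict cancellations, re-raise anything else
            if "conflict with recovery" not in str(exc) or attempt == attempts - 1:
                raise
            time.sleep(delay)    # give WAL replay a moment before retrying
        finally:
            conn.close()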
The table data on the hot standby slave server is modified while a long-running query is running. A solution (PostgreSQL 9.1+) to make sure the table data is not modified is to suspend replication on the slave and resume it after the query.
select pg_xlog_replay_pause(); -- suspend
select * from foo; -- your query
select pg_xlog_replay_resume(); -- resume
I recently encountered a similar error and was also in the position of not being a dba/devops person with access to the underlying database settings.
My solution was to reduce the running time of the query wherever possible. Obviously this requires deep knowledge of your tables and data, but I was able to solve my problem with a combination of a more efficient WHERE filter, a GROUP BY aggregation, and more extensive use of indexes.
By reducing the amount of server side execute time and data, you reduce the chance of a rollback error occurring.
However, a rollback can still occur during your shortened window, so a comprehensive solution would also make use of some retry logic for when a rollback error occurs.
Update: A colleague implemented said retry logic as well as batching the query to make the data volumes smaller. These three solutions have made the problem go away entirely.
I got the same error. What you CAN do (if the query is simple enough) is divide the data into smaller chunks as a workaround.
I did this within a Python loop that calls the query multiple times with the LIMIT and OFFSET parameters, like:
query_chunk = f"""
SELECT *
FROM {database}.{datatable}
LIMIT {chunk_size} OFFSET {i_chunk * chunk_size}
"""
where database and datatable are the names of your sources. The chunk_size has to be chosen individually, and keeping it at a not-too-high value is crucial for each query to finish.
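A fuller sketch of that chunking loop, assuming psycopg2 and pandas; the schema, table, ordering column and chunk size below are placeholders, and an ORDER BY is added so the chunks stay consistent between calls:
import pandas as pd
import psycopg2

conn = psycopg2.connect("dbname=mydb")     # placeholder connection string
database, datatable = "public", "mytable"  # placeholder schema and table names
chunk_size = 100_000                       # tune this so each chunk finishes quickly

frames, i_chunk = [], 0
while True:
    query_chunk = f"""
        SELECT *
        FROM {database}.{datatable}
        ORDER BY id                        -- assumes a stable ordering column
        LIMIT {chunk_size} OFFSET {i_chunk * chunk_size}
    """
    df = pd.read_sql(query_chunk, conn)
    if df.empty:
        break
    frames.append(df)
    i_chunk += 1

result = pd.concat(frames, ignore_index=True)
conn.close()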

PostgreSQL: How to execute an insert trigger without delaying insert response?

Does anyone know how I can set up an insert trigger so that when I perform an insert from my application, the data gets inserted and Postgres returns, even before the trigger finishes executing?
There is no built-in support for this; you will have to hack something up. Options include:
Write the trigger in C, Perl, or Python and have it launch a separate process to do the things you want. This can get tricky and possibly slightly dangerous to your database system, and it only works if the things you want to do are outside of the database.
Write a lightweight trigger function that only records an entry into a log or task table, and have a separate job or daemon that looks into that table on its own schedule and executes things from there. That's more or less how Slony works.
The question is: why do you need it? Triggers should be fast. If you need to do something complicated, write a trigger that sends a notification to a daemon that does the complex part, for example using the LISTEN/NOTIFY feature of PostgreSQL.
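A minimal psycopg2 sketch of that daemon side, assuming the insert trigger simply calls pg_notify('new_row', ...) so the INSERT itself returns immediately; the channel name and connection string are placeholders:
import select
import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=mydb")     # placeholder connection string
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cur = conn.cursor()
cur.execute("LISTEN new_row;")             # channel the trigger is assumed to notify

while True:
    # wait up to 5 seconds for a notification to arrive
    if select.select([conn], [], [], 5) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        # do the expensive post-insert work here, outside the inserting transaction
        print("got notification:", notify.channel, notify.payload)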