How to pass a table by reference to a kdb function

Define the question
Given an empty table myt defined by
myt:([] id:`int$(); score:`int$())
It is trivial to insert one or more records into it, for example
`myt upsert `id`score!1 100
But when it comes to defining a function that inserts into a given table, a different trick seems to be needed.
A first attempt could be
upd:{[t] t upsert `id`score!42 314;}
upd[myt]
Apparently it does not update myt itself, only a local copy of it.
Difficulties with possible solutions
Possible solution 1: using the global variable instead
Let myt be a global variable; the variable can then be accessed inside the function.
upd:{`myt upsert `id`score!42 314;}
upd[]
It looks like a good solution, except when many myts are required. In that situation, one has to provide a separate copy of the upd function for each table, as follows:
upd0:{`myt0 upsert `id`score!42 314;}
upd1:{`myt1 upsert `id`score!42 314;}
upd2:{`myt2 upsert `id`score!42 314;}
...
So the global-variable approach is not a good solution here.
Possible solution 2: amending the table outside the function
One can also solve the problem by amending myt outside the function, returning the modified result by removing the trailing ;.
upd:{[t] t upsert `id`score!42 314} / return the updated table
myt:upd[myt]
It works! But after running this code millions of times, it becomes slower and slower. Because this solution discards the "in-place" property of the upsert operator, the copy overhead grows as the table gets larger.
Pass argument by reference?
Maybe the concept of "pass-by-reference" is the solution here. Or maybe q has its own idiom for this problem and I have not yet grasped the essential idea.
[UPDATE] Solved by adding "`" for call-by-name
As cillianreilly answers, it is enough to add a "`" symbol in front of myt to refer to the global variable by name when passing it into the function. So the solution is direct:
upd:{[t] t upsert `id`score!42 314;}
upd[`myt] / it works

Your first version should achieve what you want. If you pass the table name as a symbol, it will update the global variable and return the table name. If you pass the table itself, it will return the updated table, which you can use in an assignment, as you found in possible solution 2. Note that in that case the actual global table will not have been updated by the operation.
q){[t;x]t upsert x}[myt;`id`score!42 314]
id score
--------
42 314
q)count myt
0
q){[t;x]t upsert x}[`myt;`id`score!42 314]
`myt
q)count myt
1
For possible solution 1, why would you need hundreds of myt tables? Regardless, there is no need to hardcode the table name into the function. You can just pass the table name as a symbol as demonstrated above, which will update the global for you. The official KX kdb-tick example on their GitHub uses insert for exactly this scenario, but in practice a lot of developers use upsert. https://github.com/KxSystems/kdb-tick/blob/master/tick/r.q#L6
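For illustration, a minimal sketch of one generic upd serving several tables by name (myt0 and myt1 are hypothetical globals defined like myt above):
upd:{[t;x] t upsert x}
upd[`myt0;`id`score!42 314]  / updates global myt0, returns `myt0
upd[`myt1;`id`score!43 315]  / the same function reused for another global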
Hope this helps.

Related

Perl DBI: Insert a whole hash in a single call, complement to fetchall_hashref()

I am using Perl DBI with DBD::Informix but I suspect the exact database is not important to this question. Bottom line: I'm looking for a mass-insert method where I can insert the contents of a hash [of hashes] into a table in one call.
Commentary:
I have grown enamored of the Perl DBI method fetchall_hashref(), wherein I can run the query and fetch the whole blessed active set - which may be thousands of rows - into a hash in one call.
I'm reading through the POD on both of these modules, looking for some equivalent of this in an insert statement or PUT call, something like a putall_hashref() method. The best I can find is a simple one-row insert, where I've PREPAREd an INSERT statement with ? placeholders and then execute the PREPAREd statement. I've used a PUT cursor available in ESQL/C (and Informix-4GL) but even those are still one row at a time.
I need a mass-insert method.
Is such a method in there someplace but I've missed it?
I see the comments by Shawn and zdim.
Shawn & zdim: here is an example of my current (untested) code, though I've used similar stuff before:
$db_partns = $extract_smi_p->fetchall_hashref("partition");
# Pulls from temp tab in DB1
...
# Now loop to insert each row from the above hash into a new temp table in another DB
for my $partn (keys %$db_partns)
{
    $put_statement_p->execute(@{$db_partns->{$partn}}{@partn_fields});
}
Note: @partn_fields is an array of keys, i.e. column names. (Using a hash-slice scheme.)
To use execute_array() I'd need to separate the values of each column into a separate array. Thanks for the clever idea; I may use it some day. But setting that up would be even uglier than what I'm already doing.
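For reference, a minimal sketch of the execute_array() alternative described above, assuming the prepared INSERT has two placeholders (the second column, other_field, is hypothetical); this is the column-wise restructuring I'd rather avoid:
my (@partitions, @other_vals);
for my $partn (keys %$db_partns) {
    push @partitions, $db_partns->{$partn}{partition};
    push @other_vals, $db_partns->{$partn}{other_field};   # hypothetical second column
}
$put_statement_p->execute_array({ ArrayTupleStatus => \my @tuple_status },
                                \@partitions, \@other_vals);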

Weak points while comparing columns in trigger as hstore-data

On occasion, before an UPDATE, I'd like to determine which columns have changed. To make it as generic as possible, I don't want to use schema, table, or column names in the function. I found some solutions here on SO and elsewhere, and particularly liked the idea of using hstore from this answer.
The widely noted downside of hstore is that I lose data types this way; everything is stringified.
But using it in the context of a trigger (with no complex columns like json or hstore), where both NEW and OLD have the same set of columns with corresponding datatypes, I could think of just one problem: NULL and empty values will not be distinguishable.
What other problems might I face when I detect changes in the trigger function like this:
changes := hstore(NEW) - hstore(OLD);
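For context, a minimal sketch of the generic trigger around that expression (the function, trigger, and table names are hypothetical; assumes the hstore extension is installed):
CREATE OR REPLACE FUNCTION log_changed_cols() RETURNS trigger AS $$
DECLARE
    changes hstore;
BEGIN
    changes := hstore(NEW) - hstore(OLD);            -- keys whose stringified values differ
    RAISE NOTICE 'changed columns: %', akeys(changes);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_log_changes
    BEFORE UPDATE ON some_table
    FOR EACH ROW EXECUTE PROCEDURE log_changed_cols();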
The alternative seems to be to use jsonb and then write some jsonb_diff function to discover changes. hstore's built-in subtract operation seems much more robust, but maybe I have not considered all its weak points.
I'm using Postgres 9.6.

Does PostgreSQL have the equivalent of an Oracle ArrayBind?

Oracle has the ability to do bulk inserts by passing arrays as bind variables. The database then does a separate row insert for each member of the array:
http://www.oracle.com/technetwork/issue-archive/2009/09-sep/o59odpnet-085168.html
Thus if I have an array:
string[] arr = { "1", "2", "3" };
And I pass this as a bind to my SQL:
insert into my_table(my_col) values (:arr)
I end up with 3 rows in the table.
Is there a way to do this in PostgreSQL w/o modifying the SQL? (i.e. I don't want to use the copy command, an explicit multirow insert, etc)
The nearest thing you can use is:
insert into my_table(my_col) SELECT unnest(:arr)
PgJDBC supports COPY, and that's about your best option. I know it's not what you want, and it's frustrating that you have to use a different row representation, but it's about the best you'll get.
That said, you will find that if you prepare a statement then addBatch and executeBatch, you'll get pretty solid performance. Sufficiently so that it's not usually worth caring about using COPY. See Statement.executeBatch. You can create "array bind" on top of that with a trivial function that's a few lines long. It's not as good as server-side array binding, but it'll do pretty well.
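As a rough sketch of that batching approach (the connection details, table, and column names are placeholders):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsert {
    public static void main(String[] args) throws Exception {
        String[] arr = { "1", "2", "3" };
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/test", "user", "pass");
             PreparedStatement ps = conn.prepareStatement("insert into my_table(my_col) values (?)")) {
            for (String v : arr) {
                ps.setString(1, v);   // one bound row per array element
                ps.addBatch();
            }
            ps.executeBatch();        // the driver sends the whole batch together
        }
    }
}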
No, you cannot do that in PostgreSQL.
You'll either have to use a multi-row INSERT or a COPY statement.
I'm not sure which language you're targeting, but in Java, for example, this is possible using Connection.createArrayOf().
Related question / answer:
error setting java String[] to postgres prepared statement

Is it possible to use a stable function in an index in Postgres?

I've been working on a project at work and have come to the realization that I must invoke a function in several of the queries' WHERE clauses. The performance isn't terrible exactly, but I would love to improve it. So I looked at the docs for indexes which mentioned that:
An index field can be an expression computed from the values of one or more columns of the table row.
Awesome. So I tried creating an index:
CREATE INDEX idx_foo ON foo_table (stable_function(foo_column));
And received an error:
ERROR: functions in index expression must be marked IMMUTABLE
So then I read about Function Volatility Categories which had this to say about stable volatility:
In particular, it is safe to use an expression containing such a function in an index scan condition.
Based on the phrasing "index scan condition" I'm guessing it doesn't mean an actual index. So what does it mean? Is it possible to utilize a stable function in an index? Or do we have to go all the way and ensure this would work as an immutable function?
We're using Postgres v9.0.1.
An "index scan condition" is a search condition, and can use a volatile function, which will be called for each row processed. An index definition can only use a function if it is immutable -- that is, that function will always return the same value when called with any given set of arguments, and has no user-visible side effects. If you think about it a little, you should be able to see what kind of trouble you could get into if the function might return a different value than what it did when the index entry was created.
You might be tempted to lie to the database and declare a function as immutable which isn't really; but if you do, the database will probably do surprising things that you would rather it didn't.
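If the function genuinely qualifies (the same result for the same arguments, no side effects), one option is to re-declare it and then create the index; a minimal sketch, with the argument type as a placeholder:
ALTER FUNCTION stable_function(text) IMMUTABLE;
CREATE INDEX idx_foo ON foo_table (stable_function(foo_column));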
9.0.1 has bugs for which fixes are available. Please upgrade to 9.0.somethingrecent.
http://www.postgresql.org/support/versioning/

Execute statements for every record in a table

I have a temporary table (or, say, a function which returns a table of values).
I want to execute some statements for each record in the table.
Can this be done without using cursors?
I'm not opposed to cursors, but would like a more elegant syntax/way of doing it.
Something like this randomly made-up syntax:
for (select A,B from #temp) exec DoSomething A,B
I'm using Sql Server 2005.
I don't think what you want to do is that easy.
What I have found is that you can create a scalar function taking the arguments A and B and then, from within the function, execute an extended stored procedure. This might achieve what you want, but it seems that this might make the code even more complex.
I think for readability and maintainability, you should stick to the CURSOR implementation, for example as sketched below.
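A minimal cursor version of the loop above (the column types are assumptions):
DECLARE @A int, @B int;
DECLARE c CURSOR LOCAL FAST_FORWARD FOR SELECT A, B FROM #temp;
OPEN c;
FETCH NEXT FROM c INTO @A, @B;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC DoSomething @A, @B;
    FETCH NEXT FROM c INTO @A, @B;
END
CLOSE c;
DEALLOCATE c;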
I would look into changing the stored proc so that it can work against a set of data rather than a single row input.
Would CROSS/OUTER APPLY do what you want if you need RBAR processing?
It's elegant, but depends on what processing you need to do
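A minimal sketch, assuming the per-row work can be moved into a table-valued function (dbo.DoSomethingTVF is hypothetical; APPLY cannot call a stored procedure directly):
SELECT t.A, t.B, r.*
FROM #temp AS t
CROSS APPLY dbo.DoSomethingTVF(t.A, t.B) AS r;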