Performance varies depending on how a PostgreSQL function is called

I'm calling a function that "RETURNS INTEGER".
I've noticed that it executes faster if I call it like this:
1. SELECT * FROM MyFunction();
The other ways to call it, which are slower, are:
2. SELECT MyFunction();
3. SELECT FROM MyFunction();
Does anybody know if there is a reason for this behavior?
I'm thinking there must be something else happening that's unrelated to how the function is called.
I executed this function many, many times, trying every ordering of the three methods (1 2 3, 1 3 2, 2 1 3, 2 3 1, 3 1 2, 3 2 1). Method #1 consistently shows superior performance.
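For reference, here is a minimal sketch of the three call styles against a stand-in function (the function body is made up; note that PostgreSQL requires parentheses to call a function):

CREATE FUNCTION myfunction() RETURNS integer AS $$
    SELECT 42;  -- placeholder body
$$ LANGUAGE sql;

SELECT * FROM myfunction();  -- 1. table-style: one row, one column
SELECT myfunction();         -- 2. scalar-style: function in the select list
SELECT FROM myfunction();    -- 3. empty select list: a row with no columns

All three invoke the function once per statement, which supports the suspicion above that something other than the call syntax itself, such as caching or plan effects, is at play.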

Related

KDB - Is there a limit to the number of functions called at one time when updating tables?

I’ve been running a number of functions to update a table, and I keep adding more as I need to update other items. I have not run into any issues yet (currently at 7 functions), but I’m mindful that there may be a limit. I did find that there is a limit of 8 parameters for a single function, but nothing noting a limit on the below. If not, great; I wanted to be mindful as I scale up.
updateTable: FuncG FuncF FuncE FuncD FuncC FuncB FuncA ::; // max number of functions?
t: updateTable t;
I made a fake update statement with loads of function calls, and it seems like you're fine:
q)t:([]a:1 2 3)
q)f:{x+1}
q)value "update ",(raze 1000#enlist"f "),"a from t"
a
----
1001
1002
1003
One thing you might want to do is make a single function composed from a list of your functions:
q)f:{x+1}
q)g:{2*x}
q)h:{x+1+2}
q)(('[;])/)(f;g;h)
{x+1}{2*x}{x+1+2}
q)composed:(('[;])/)(f;g;h)
q)t:([]a:1 2 3)
q)update composed a from t
a
--
9
11
13
That way you only have a single function in your update statement, and it should scale.

Siddhi: Ignore duplicate events while pattern active

I have a non-occurrence event pattern that detects when a certain condition happens, then alerts me if that condition doesn't change within a time limit. The query below could be described as: "If value 1 appears for a user, alert me if there is not a new value for that user within 5 seconds":
define stream inStream(name string, value int);
partition with (name of inStream)
begin
from every in=inStream[ value == 1 ]
-> not inStream[ not(value == 1) ] for 5 sec
select in.name, in.value
insert into outStream;
end;
This query works exactly as expected: if I don't receive a value different from 1 within 5 seconds, the query is triggered. The issue arises when there are duplicate events with the value 1.
If I send the event {name: "bob", value: 1} every second for 10 seconds, I would like to see the query triggered twice: once at 5 seconds and once at 10 seconds. Right now, however, I see the query being triggered every second starting at 5 seconds in. Essentially, the query (working as it should) is starting the 5-second timer for every event with value 1 that it sees. I would like it not to start that timer (or at least not to produce output) if there is already a timer running.
I attempted to solve this with the following query (simply adding an 'output' line):
define stream inStream(name string, value int);
partition with (name of inStream)
begin
from every in=inStream[ value == 1 ]
-> not inStream[ not(value == 1) ] for 5 sec
select in.name, in.value
output first every 5 sec
insert into outStream;
end;
I also tried output last and output all.
The queries above did not work as expected: in the case of all and last, no events at all were output; in the case of first, only a single event was output, with none of the subsequent ones after the first 5-second block passed.
Is there any way to achieve what I would like? I have a hunch that using time windows or output is the way to solve it, but so far I have not been able to get it to work.
The second query in the original question ends up working as expected. I was previously on Siddhi 4.1.4. After upgrading to Siddhi 5.0.0 the query works as I wanted it to.

SQL Server performance: function vs no function

I have a query (relationship between CONTRACT <-> ORDERS) that I decided to break up into 2 parts (contract and orders) so I can reuse it in another stored procedure.
Before the break-up, the query took around 10 seconds to run; however, when I use a function to get the contract, pump the data into a temp table first, and then join to the other parts, it takes 2m30s. Why the difference in time?
The function takes less than a second to run and returns only one row i.e. details of one contract (contract_id is the parameter supplied to the function).
The part most affecting performance is the ORDERS table, the largest in the query at 4.1 million rows, which joins to a few other tables. However, if I run the orders sub-query in isolation with a particular filter (i.e. the contract id), it takes less than a second and happens to return zero records for the contract I am testing on (due to the filter on the type of order it is looking for).
Based on the above, you would think: 1 sec at most for the function + 1 sec at most to get the orders + summarise = 2 seconds at most, not two and a half minutes!
Where am I going wrong, how do I begin to isolate the issue in time difference?
I know someone is going to tell me to paste the code, but surely it is an issue of the database, or the indexes, or how the optimizer performs when dealing with raw code versus code broken up into parts. Is there an area of the code I can look at before having to post the whole thing? I have tried variations of OUTER APPLY vs LEFT JOIN from the contract temp table to the orders subquery, and both give me about the same result. Any ideas?
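For context, a minimal sketch of the pattern being described (every object name here is a placeholder, not the actual schema):

DECLARE @ContractId int = 123;          -- sample parameter
DECLARE @OrderType varchar(10) = 'X';   -- the order-type filter mentioned above

-- 1. Pull the single contract row from the function into a temp table
SELECT *
INTO #Contract
FROM dbo.GetContract(@ContractId);

-- 2. Join the temp table to the large ORDERS table and summarise
SELECT c.ContractId,
       COUNT(*) AS OrderCount
FROM #Contract AS c
JOIN dbo.ORDERS AS o
    ON o.ContractId = c.ContractId
WHERE o.OrderType = @OrderType
GROUP BY c.ContractId;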
I don't think the issue was with the code but with the network I was running it on. Bizarrely, I had two versions of the proc running side by side: before the weekend one was running in 10 secs, and it is still running in 10 secs three days later, while my new version (using the function) was taking anywhere between 2 and 3 minutes. This morning it is running in 2 or 3 seconds! So I don't know whether the difference was switching from SELECT ... INTO #Contract to declaring my table structure and using a table variable instead, or the network, or precompiling having an effect. Whatever it was, it is no longer an issue. Should I delete this post?
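For anyone comparing, the two variants mentioned look roughly like this (placeholder names again):

-- Variant A: SELECT ... INTO a temp table
SELECT * INTO #Contract FROM dbo.GetContract(@ContractId);

-- Variant B: a declared table variable
DECLARE @Contract TABLE (ContractId int PRIMARY KEY /* plus the other contract columns */);
INSERT INTO @Contract (ContractId)
SELECT ContractId FROM dbo.GetContract(@ContractId);

One known difference: classic table variables carry no statistics, so the optimizer assumes very low row counts; for a single-contract lookup that guess happens to be accurate, which could plausibly contribute to the better plan.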

kdb ticker plant: where to find documentation on .u.upd?

I am aware of this resource, but it does not spell out what parameters .u.upd takes, or how to check whether it worked.
This statement executes without error, although it does not seem to do anything:
.u.upd[`t;(`$"abc";1;2;3)]
If I define the table beforehand, e.g.
t:([] name:"aaa";a:1;b:2;c:3)
then the above .u.upd still runs without error, but does not change t.
.u.upd has the same function signature as insert (see http://code.kx.com/q/ref/qsql/#insert) in prefix form. In the simplest case, .u.upd may be defined as insert.
so:
.u.upd[`table;<records>]
For example:
q).u.upd:insert
q)show tbl:([] a:`x`y;b:10 20)
a b
----
x 10
y 20
q).u.upd[`tbl;(`z;30)]
,2
q)show tbl
a b
----
x 10
y 20
z 30
q).u.upd[`tbl;(`a`b`c;1 2 3)]
3 4 5
q)show tbl
a b
----
x 10
y 20
z 30
a 1
b 2
c 3
Documentation including the event sequence, connection diagram etc. for tickerplants can be found here:
http://www.timestored.com/kdb-guides/kdb-tick-data-store
.u.upd[tableName; tableData] accepts two arguments, for inserting data to a named table. This function will normally be called from a feedhandler. It takes the tableData, adds a time column if one is not already present, inserts it into the in-memory table, appends to the log file and finally increments the log file counter.

How to pick items from a warehouse to minimise travel in TSQL?

I am looking at this problem from a TSQL point of view, however any advice would be appreciated.
Scenario
I have 2 sets of criteria which identify items in a warehouse to be selected.
Query 1 returns 100 items
Query 2 returns 100 items
I need to pick any 25 of the 100 items returned in query 1.
I need to pick any 25 of the 100 items returned in query 2.
The items in queries 1 and 2 will never be the same.
Each item is stored in a segment of the warehouse.
A segment of the warehouse may contain numerous items.
I wish to select the 50 items (25 from each query) in a way as to reduce the number of segments I must visit to select the items.
Suggested Approach
My initial idea was to combine the 2 result sets and produce a list of
Segment ID, NumberOfItemsRequiredInSegment
I would then select 25 items from each query, giving preference to those in segments with the highest NumberOfItemsRequiredInSegment. I know this would not be optimal, but it would be an easy-to-implement heuristic.
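As a sketch, that combined list could be produced along these lines (table and column names are made up for illustration):

SELECT SegmentID,
       COUNT(*) AS NumberOfItemsRequiredInSegment
FROM (
    SELECT SegmentID FROM Query1Items
    UNION ALL
    SELECT SegmentID FROM Query2Items
) AS combined
GROUP BY SegmentID
ORDER BY NumberOfItemsRequiredInSegment DESC;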
Questions
1) I suspect this is a standard combinatorial problem, but I don't recognise it; perhaps multiple knapsack? Does anyone recognise it?
2) Is there a better (easy-ish to implement) heuristic or solution, ideally in TSQL?
Many thanks.
This might also not be optimal, but I think it would at least perform fairly well.
Calculate this set for query 1:
Segment ID, NumberOfItemsRequiredInSegment
Take the top 25, just by sorting by NumberOfItemsRequiredInSegment; call this subset A.
Take the top 25 from query 2, by joining to A and sorting by CASE WHEN A.segmentID IS NOT NULL THEN 1 ELSE 0 END DESC, NumberOfItemsRequiredInSegmentFromQuery2 DESC.
Repeat this, but take the top 25 from query 2 first. Return the better performing of the 2 sets.
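A rough T-SQL sketch of those passes (all names are placeholders; Items1 and Items2 stand for the two query results as (ItemID, SegmentID) pairs):

-- Pass 1: top 25 items from query 1, densest segments first
WITH SegCounts1 AS (
    SELECT SegmentID, COUNT(*) AS ItemsInSegment
    FROM Items1
    GROUP BY SegmentID
),
Ranked1 AS (
    SELECT i.ItemID, i.SegmentID,
           ROW_NUMBER() OVER (ORDER BY s.ItemsInSegment DESC, i.SegmentID) AS rn
    FROM Items1 AS i
    JOIN SegCounts1 AS s ON s.SegmentID = i.SegmentID
)
SELECT ItemID, SegmentID
INTO #PickedFrom1
FROM Ranked1
WHERE rn <= 25;

-- Pass 2: top 25 items from query 2, preferring segments already visited
WITH SegCounts2 AS (
    SELECT SegmentID, COUNT(*) AS ItemsInSegment
    FROM Items2
    GROUP BY SegmentID
),
Ranked2 AS (
    SELECT i.ItemID, i.SegmentID,
           ROW_NUMBER() OVER (ORDER BY
               CASE WHEN p.SegmentID IS NOT NULL THEN 1 ELSE 0 END DESC,
               s.ItemsInSegment DESC) AS rn
    FROM Items2 AS i
    JOIN SegCounts2 AS s ON s.SegmentID = i.SegmentID
    LEFT JOIN (SELECT DISTINCT SegmentID FROM #PickedFrom1) AS p
        ON p.SegmentID = i.SegmentID
)
SELECT ItemID, SegmentID
FROM Ranked2
WHERE rn <= 25;

Run the same two passes with the queries swapped, and keep whichever pair of picks visits fewer distinct segments.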
The one scenario where I think this fails would be if you got something like this:
Segment   Count Query 1   Count Query 2
A         10              1
B         5               1
C         5               1
D         5               4
E         5               4
F         4               4
G         4               5
H         1               5
J         1               5
K         1               10
You need to make sure you choose A, D, and E when choosing the best segments from query 1. To deal with this you'd almost certainly still need to join to query 2, so you can get the count from there to use as a tie-breaker.
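That tie-breaker could look like this (again with placeholder names): rank query-1 segments by their own count, and break ties using the query-2 count.

SELECT q1.SegmentID
FROM (SELECT SegmentID, COUNT(*) AS c1 FROM Items1 GROUP BY SegmentID) AS q1
LEFT JOIN (SELECT SegmentID, COUNT(*) AS c2 FROM Items2 GROUP BY SegmentID) AS q2
    ON q2.SegmentID = q1.SegmentID
ORDER BY q1.c1 DESC, COALESCE(q2.c2, 0) DESC;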