PostgreSQL Rolling Standard Deviation over time in single query - postgresql

This may be easily solvable, but I can't see an immediate solution. I am calling a PostgreSQL function that returns multiple columns, two of which are relevant to this question: a date column and a numeric column of return values. An example of the function call would be:
SELECT curr_date, return_val
FROM schema.function_name($1,$2);
With example output such as
"2014-07-31";0.003767
"2014-08-07";-0.028531
"2014-08-14";0.020051
"2014-08-21";-0.003541
"2014-08-28";0.007766
"2014-09-04";-0.021926
"2014-09-11";0.026330
"2014-09-18";0.008137
"2014-09-25";-0.033303
"2014-10-02";0.030100
"2014-10-09";-0.012116
"2014-10-16";-0.017148
And so on. The function always returns the data with the dates ascending. What I would like to do is apply Postgres's stddev_samp function to every row, but considering only the return_val values from that row's date back in time. Something like:
SELECT curr_date, return_val,
--stddev_samp(return_val) where curr_date <= curr_date of current row
FROM schema.function_name($1,$2);
Naturally, if I calculated the sample standard deviation of the return_val values from 2014-07-31 to 2014-10-02 in the sample provided, it would differ slightly from the result for 2014-07-31 to any other date present. I know I could probably write another function that takes a numeric array as input and returns the standard deviation as output, and then call it in my query above, but I'm hoping someone has a simpler approach that I'm just not seeing. If any other information is required, feel free to ask. I'm using version 10.7.

Using window functions:
SELECT
    curr_date,
    return_val,
    stddev_samp(return_val) OVER (ORDER BY curr_date) AS rolling_stddev
FROM
    mytable;
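To make clear what the window query above computes, here is a minimal Python sketch of the same expanding calculation, assuming the dates are unique and already sorted ascending. The function name expanding_stddev is only illustrative; the rows are the first four from the question.

```python
from statistics import stdev

# First four sample rows from the question, already sorted by date ascending.
rows = [
    ("2014-07-31", 0.003767),
    ("2014-08-07", -0.028531),
    ("2014-08-14", 0.020051),
    ("2014-08-21", -0.003541),
]


def expanding_stddev(values):
    """Sample standard deviation of values[0..i] for every index i.

    The first entry is None: a sample standard deviation needs at least
    two values, which is also why stddev_samp yields NULL for one row.
    """
    return [stdev(values[: i + 1]) if i >= 1 else None for i in range(len(values))]


returns = [value for _, value in rows]
for (curr_date, return_val), rolling in zip(rows, expanding_stddev(returns)):
    print(curr_date, return_val, rolling)
```

Each output row uses only the values from the first date up to that row's date, which is what the question asks for.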

Related

Tableau calculated field that refers to its own previous lagged (-1) value to calculate

I need help on a basic calculation that I'm unable to figure on Tableau.
I am trying to setup a calculated field that has dependency on its previous value to calculate its current value. Here is a simple example from Excel -
Sample Exhibit
As you can see, each value in a row is dependent on its previous value and multiplied by a constant.
In Tableau, when I'm trying to create a calculated field, it is not letting me refer to itself (-1 lagged value) in the code. I'd appreciate any help on how this can be resolved. Thanks in advance!
Tableau can do this client side with a table calc. You’ll have to learn how table calcs operate from the help, especially partitioning and addressing. Then you can use the function PREVIOUS_VALUE() to refer to the previous value. Practice on something simple first to make sure you understand how PREVIOUS_VALUE() works. Hint: the argument to that function doesn’t mean what most people assume it means. It is the value returned for the first row of the partition, not the field to look back on.
If you want to perform this calculation server side instead, then you’ll need to use custom SQL so you can specify an analytic (aka window) query.
Check the LOOKUP function to get the value from the preceding row. For example: LOOKUP(SUM([Value]),-1)
https://help.tableau.com/current/pro/desktop/en-us/functions_functions_tablecalculation.htm#lookupexpression-offset
If you don't get the expected result, you may need to familiarize yourself with table calculation partitioning.
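As a rough illustration of the recurrence described in the question, the Python sketch below mimics a calculation of the form PREVIOUS_VALUE(start) * factor. The start value and the factor are hypothetical, since the exhibit is not reproduced here, and the function name is made up for the example.

```python
def previous_value_calc(start, factor, n_rows):
    """Mimic a table calc of the form PREVIOUS_VALUE(start) * factor.

    On the first row there is no previous result, so `start` is used
    instead; on every later row the previous result is used.
    """
    values = []
    for _ in range(n_rows):
        previous = values[-1] if values else start
        values.append(previous * factor)
    return values


# Hypothetical inputs: start at 100 and double on every row.
print(previous_value_calc(100, 2, 3))  # [200, 400, 800]
```

Note that the first row is already start * factor, which is the detail that tends to surprise people.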

running total using windows function in sql has same result for same data

Every reference I found on how to do a cumulative sum / running total said it's better to use a window function, so I did:
select grandtotal,sum(grandtotal)over(order by agentname) from call
But I realized that the results are only correct as long as the values in each row are different. Here is the result:
Is there any way to fix this?
You might want to review the documentation on window specifications. The default frame is "range between", which defines the frame by the value of the order by key, so all rows with the same key as the current row are included together. You want "rows between":
select grandtotal,
sum(grandtotal) over (order by agentname rows between unbounded preceding and current row)
from call;
Alternatively, you could include an id column in the sort to guarantee uniqueness and not have to deal with the issue of equal key values.
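The difference between the two frame types can be sketched in Python as follows. This is only an emulation of the semantics under the assumption that the rows are already sorted by the key; the data and function names are hypothetical.

```python
def running_total_rows(rows):
    """ROWS frame: sum up to and including the current physical row."""
    total, out = 0, []
    for _, amount in rows:
        total += amount
        out.append(total)
    return out


def running_total_range(rows):
    """RANGE frame: sum every row whose sort key is <= the current key,
    so rows with tied keys all get the same total."""
    return [sum(a for k, a in rows if k <= key) for key, _ in rows]


# Hypothetical (agentname, grandtotal) rows, sorted by agentname.
calls = [("alice", 10), ("alice", 20), ("bob", 5)]

print(running_total_rows(calls))   # [10, 30, 35]
print(running_total_range(calls))  # [30, 30, 35]
```

The two rows for the same agent share one total under the RANGE behavior, which matches the symptom described in the question.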

Deterministic function for getting today's date

I am trying to create an indexed view using the following code (so that I can publish it to replication as a table):
CREATE VIEW lc.vw_dates
WITH SCHEMABINDING
AS
SELECT DATEADD(day, DATEDIFF(day, 0, GETDATE()), number) AS SettingDate
FROM lc.numbers
WHERE number<8
GO
CREATE UNIQUE CLUSTERED INDEX
idx_LCDates ON lc.vw_dates(SettingDate)
lc.numbers is simply a table with one column (number) holding the values 1 to 100, one per row.
However, I keep getting the error:
Column 'SettingDate' in view 'lc.vw_dates' cannot be used in an index or statistics or as a partition key because it is non-deterministic.
I realize that GETDATE() is non-deterministic. But, is there a way to make this work?
I am using MS SQL 2012.
Edit: The hope was to be able to Convert GetDate() to make it deterministic (it seems like it should be when stripping off the time). If nobody knows of a method to do this, I will close this question and mark the suggestion to create a calendar table as correct.
The definition of a deterministic function (from MSDN) is:
Deterministic functions always return the same result any time they are called with a specific set of input values and given the same state of the database. Nondeterministic functions may return different results each time they are called with a specific set of input values even if the database state that they access remains the same.
Note that this definition does not involve any particular span of time over which the result must remain the same. It must be the same result always, for a given input.
Any function you can imagine that always returns the date at the point the function is called will, by definition, return a different result if you run it one day and then again the next day (regardless of the state of the database).
Therefore, it is impossible for a function that returns the current date to be deterministic.
The only interpretation of this question that could allow a deterministic function is one where you are happy to pass the function some information about what day it is as input.
Something like:
select dbo.fn_myDeterministicGetDate('2015-11-25')
But I think that would defeat the point as far as you're concerned.
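The same idea, sketched in Python rather than T-SQL: once the current date is passed in as an argument, the result depends only on the inputs. The function name setting_dates is hypothetical, and the number < 8 filter is taken from the view in the question.

```python
from datetime import date, timedelta


def setting_dates(today, numbers):
    """Return today + n days for every n < 8.

    The result depends only on the arguments, so calling it twice with
    the same `today` always gives the same list.
    """
    return [today + timedelta(days=n) for n in numbers if n < 8]


# The numbers table from the question holds 1..100.
print(setting_dates(date(2015, 11, 25), range(1, 101)))
```

The caller still has to supply the date, which is the trade-off the answer describes.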

kdb/q: use function in a select from partitioned table

I'm trying to get max drawdown from a partitioned table across multiple dates. The query works fine when run with a date constrained to a specific day. E.g.
select {max neg x-maxs x} pnl from trades where date=last date
It's getting map-reduced over multiple dates so the above query no longer works. I can make the query run over multiple dates by adding another aggregation:
select max {max neg x-maxs x} pnl from trades
but then it returns the maximum of the daily drawdowns rather than the max drawdown over the continuous sequence of trades.
I wonder if there's a way to make it work with a single select without chaining selects like
select {max neg x-maxs x} pnl from select pnl from trades
I've got a rather big query to pull a lot of various metrics on the trades where max drawdown is just one of them. Using chained select means that I need to break the big query into two queries, map-reduced and non-map-reduced, and then join them back which would make the query look ugly.
Thanks!
A select query on a partitioned database runs against each date partition, applies the function to that date's values, and finally aggregates the per-date results depending on the call (user-defined functions behave differently from built-in q functions here).
So I don't think you can combine that into one query. But there are ways to make your query more general and reusable for different scenarios.
For example, convert your query to functional form and use variables in it for the column name and the user function. Wrap this in a function that accepts the column name and the user function, so that you can call it with different (column; function) pairs. Something like:
runF:{[col;usrfunc] functional_query_uses_col_userfunc }
All this depends on your use cases. Also check memory usage, as you'll be pulling a lot of data into memory.
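To see why the per-date result differs from the continuous one, here is a small Python sketch of the same drawdown formula (max of running peak minus value, i.e. max neg x-maxs x in q). The two "days" of pnl values are made up for illustration.

```python
from itertools import accumulate


def max_drawdown(pnl):
    """Largest drop below the running peak: max(maxs(x) - x) in q terms."""
    running_peak = accumulate(pnl, max)
    return max(peak - x for peak, x in zip(running_peak, pnl))


# Hypothetical pnl values for two date partitions.
day1 = [1, 5, 3]
day2 = [2, 4]

per_day = max(max_drawdown(day1), max_drawdown(day2))
continuous = max_drawdown(day1 + day2)

print(per_day)     # 2
print(continuous)  # 3
```

The peak reached on the first day is forgotten when each partition is processed on its own, so the per-date maximum can understate the drawdown over the whole sequence.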

n-th row in PostgreSQL for p-quantile

I'm trying to fetch the n-th row of a query result. Other posts suggested using OFFSET or LIMIT, but those forbid the use of variables (ERROR: argument of OFFSET must not contain variables). I also read about cursors, but I'm not quite sure how to use them even after reading their PostgreSQL manual page. Any other suggestions, or examples of how to use cursors?
My main goal is to calculate the p-quantile of a column, and since PostgreSQL doesn't provide this function by default I have to write it on my own.
Cheers
The following returns the 5th row of a result set:
select *
from (
  select <column_list>,
         row_number() over (order by some_sort_column) as rn
  from <table_name>
) t
where rn = 5;
You have to include an order by because otherwise the concept of "5th row" doesn't make sense.
You mention "use of variable", so I'm not sure what you are actually trying to achieve. But you should be able to supply the value 5 as a variable for this query (or even as a sub-select).
You might also want to dig further into windowing functions. Because with that you could e.g. do a sum() over the 3 rows before the current row (or similar constructs) - which could also be useful for you.
If you would like to get the 10th record, the query below also works fine.
select * from table_name order by sort_column limit 1 offset 9
OFFSET simply skips that many rows before beginning to return the rows requested by the LIMIT clause.
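For the stated goal, a p-quantile can be built on top of "n-th row after sorting". The Python sketch below assumes a discrete definition (the smallest value with at least a fraction p of the data at or below it); other definitions interpolate between rows, and the function names are only illustrative.

```python
import math


def nth_row(values, n):
    """1-based n-th value after sorting, like filtering on row_number() = n."""
    return sorted(values)[n - 1]


def p_quantile(values, p):
    """Discrete p-quantile under the assumed definition: the smallest
    value such that at least a fraction p of the data is <= it."""
    if not values or not 0 < p <= 1:
        raise ValueError("need a non-empty input and 0 < p <= 1")
    n = math.ceil(p * len(values))
    return nth_row(values, n)


print(nth_row([30, 10, 20], 2))           # 20
print(p_quantile([5, 1, 4, 2, 3], 0.5))   # 3
```

In SQL terms, n would be computed from p and the row count and then used in the rn = n filter shown earlier.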