Postgres resampling time series data

I have OHLCV data of stocks stored in 1-minute increments inside Postgres.
I am trying to resample the data to 5-minute intervals. I used this answer to generate the following SQL query:
SELECT
    avg('open') AS open,
    avg('high') AS high,
    avg('low') AS low,
    avg('close') AS close,
    avg('volume') AS volume,
    avg('open_interest') AS open_interest,
    to_timestamp(floor(EXTRACT(epoch FROM 'timestamp') / 300) * 300) AS interval_alias
WHERE 'symbol' = 'IRFC-N8'
GROUP BY interval_alias
I am getting this error:
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) function avg(unknown) is not unique
LINE 1: SELECT avg('open') AS open, avg('high') AS high, avg('low') ...
^
HINT: Could not choose a best candidate function. You might need to add explicit type casts.
Could you tell me what went wrong?
Edit 1: Code formatted for better rendering.
Edit 2: According to the answer below, I need to use double quotes around the argument to the avg function. I am using SQLAlchemy to generate the expressions, and it is creating single-quoted strings. Here is the part of the code which generates the avg query:
cols = list()
# self.p.open, self.p.high, etc. hold the column names as plain strings
cols.append(func.avg(self.p.open).label(self.p.open))
cols.append(func.avg(self.p.high).label(self.p.high))
cols.append(func.avg(self.p.low).label(self.p.low))
cols.append(func.avg(self.p.close).label(self.p.close))
cols.append(func.avg(self.p.volume).label(self.p.volume))
cols.append(func.avg(self.p.openinterest).label(self.p.openinterest))
seconds = self._get_seconds()
cols.append(
    func.to_timestamp(
        func.floor(func.extract("epoch", "timestamp") / seconds) * seconds
    ).label("interval_alias")
)
SQLAlchemy should have known to use double quotes, but it is generating single quotes.

db<>fiddle
Your error is the use of single quotes (') instead of double quotes (") in your AVG() calls. Single quotes mark string literals, but you want to reference the columns whose values you are averaging. So you need double quotes, or you can leave the quotes off entirely (both variants are shown in the db fiddle).
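For illustration, here is what the query should look like with identifiers instead of string literals. The FROM clause is not shown in the question, so the table name ohlcv_1min below is a placeholder:
SELECT
    avg("open") AS open,
    avg("high") AS high,
    avg("low") AS low,
    avg("close") AS close,
    avg("volume") AS volume,
    avg("open_interest") AS open_interest,
    to_timestamp(floor(EXTRACT(epoch FROM "timestamp") / 300) * 300) AS interval_alias
FROM ohlcv_1min  -- placeholder table name
WHERE "symbol" = 'IRFC-N8'  -- 'IRFC-N8' stays single-quoted: it is a value, not a column
GROUP BY interval_alias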
Edit (the real problem):
It seems that self.p.columnname yields just the column's name as a string, not the column object itself. In SQLAlchemy the reference to a specific column is table.c.columnname. Please use c instead of p.
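A minimal sketch of the fix, assuming a SQLAlchemy Table object is available as table (that name is an assumption) and written to drop into the same method as the original code; it uses sqlalchemy.extract, which renders the EXTRACT(field FROM expr) syntax:
from sqlalchemy import extract, func

cols = list()
for name in (self.p.open, self.p.high, self.p.low,
             self.p.close, self.p.volume, self.p.openinterest):
    # look up the real Column object by its string name instead of
    # passing the bare string to avg()
    cols.append(func.avg(table.c[name]).label(name))
seconds = self._get_seconds()
cols.append(
    func.to_timestamp(
        func.floor(extract("epoch", table.c["timestamp"]) / seconds) * seconds
    ).label("interval_alias")
)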
Attention:
If you average all your data you may lose important information such as the real minimum and maximum. You may want to aggregate with other functions such as MIN or MAX instead (see the sketch after the links below). Maybe the WITHIN GROUP ordered-set aggregate functions could help you as well.
https://www.postgresql.org/docs/current/static/functions-aggregate.html
https://www.postgresql.org/docs/9.5/static/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE
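To make that concrete for OHLCV data, a hedged sketch that keeps each column's semantics (first open, highest high, lowest low, last close, summed volume); the table name ohlcv_1min is again a placeholder:
SELECT
    to_timestamp(floor(EXTRACT(epoch FROM "timestamp") / 300) * 300) AS interval_alias,
    (array_agg("open" ORDER BY "timestamp"))[1] AS open,         -- first open in the bucket
    max("high") AS high,
    min("low") AS low,
    (array_agg("close" ORDER BY "timestamp" DESC))[1] AS close,  -- last close in the bucket
    sum("volume") AS volume,
    (array_agg("open_interest" ORDER BY "timestamp" DESC))[1] AS open_interest
FROM ohlcv_1min  -- placeholder table name
WHERE "symbol" = 'IRFC-N8'
GROUP BY interval_alias
ORDER BY interval_alias;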

Related

AWS Athena: Handling big numbers

I have files on S3 where two columns contain only positive integers which can be as large as 10^26. Unfortunately, according to the AWS docs, Athena only supports values in a range up to 2^63-1 (approx 10^19). So at the moment these columns are represented as strings.
When it comes to filtering, it is not that big of an issue, as I can use a regex. For example, if I want to get all records between 5e^21 and 6e^21, my query would look like:
SELECT *
FROM database.table
WHERE (regexp_like(col_1, '^5[\d]{21}$'))
I have approx 300M rows (approx 12 GB in Parquet) and it takes about 7 seconds, so performance-wise it is OK.
However, sometimes I would like to perform some math operations on these two big columns, e.g. subtracting one big column from another. Casting these records to DOUBLE wouldn't work due to approximation error. Ideally, I would want to stay within Athena. At the moment, I have about 100M rows that are greater than 2^63-1, but this number can grow in the future.
What would be the right way to approach the problem of having numerical records that exceed the available range? Also, what are your thoughts on using regex for filtering? Is there a better/more appropriate way to do it?
You can cast numbers of the form 5e^21 to an approximate 64-bit DOUBLE or an exact 128-bit DECIMAL. First you'll need to remove the caret ^ with the replace function; then a simple cast will work:
SELECT CAST(replace('5e^21', '^', '') as DOUBLE);
_col0
--------
5.0E21
or
SELECT CAST(replace('5e^21', '^', '') as DECIMAL);
_col0
------------------------
5000000000000000000000
If you are going to query this table often, I would rewrite it with the new data type to save processing time.
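For the subtraction use case from the question, a hedged sketch: col_1 is the column named in the question, col_2 stands in for the second big column, and DECIMAL(38,0) comfortably holds values around 10^26:
SELECT CAST(replace(col_1, '^', '') AS DECIMAL(38,0))
     - CAST(replace(col_2, '^', '') AS DECIMAL(38,0)) AS diff
FROM database.table;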

PostgreSQL Rolling Standard Deviation over time in single query

This may be an easily solvable question, but I can't see an immediate solution. I am calling a PostgreSQL function which returns multiple columns, two of which are relevant to this question: a date column and a numeric field of return values. An example of the function call would be
SELECT curr_date, return_val
FROM schema.function_name($1,$2);
With example output such as
"2014-07-31";0.003767
"2014-08-07";-0.028531
"2014-08-14";0.020051
"2014-08-21";-0.003541
"2014-08-28";0.007766
"2014-09-04";-0.021926
"2014-09-11";0.026330
"2014-09-18";0.008137
"2014-09-25";-0.033303
"2014-10-02";0.030100
"2014-10-09";-0.012116
"2014-10-16";-0.017148
So on, so forth. The data will always come back from this function with the dates ascending. What I would like to do is use Postgres's stddev_samp function on every row, but only considering the return_val values from that row's date back in time. Something like:
SELECT curr_date, return_val,
--stddev_samp(return_val) where curr_date <= curr_date of current row
FROM schema.function_name($1,$2);
Naturally, if I calculated the sample deviation of the return_val values from 2014-07-31 to 2014-10-02 in the sample provided, it would differ slightly from calculating it over the result set from 2014-07-31 to any other date present. I know I could probably write another function which takes a numeric array as input and returns the standard deviation, and then call that in the query above, but I'm hoping someone has a simpler approach which I'm just not seeing. If any other information is required, feel free to ask. I'm using version 10.7.
demo:db<>fiddle
Using window functions:
SELECT
    stddev_samp(return_val) OVER (ORDER BY curr_date)
FROM
    mytable
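Applied to the function call from the question, a sketch might look like the following. Note that the default frame of ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats rows with equal curr_date values as peers; the dates here appear to be unique, so the default is fine, but the frame can be made explicit:
SELECT curr_date, return_val,
       stddev_samp(return_val) OVER (
           ORDER BY curr_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS rolling_stddev
FROM schema.function_name($1, $2);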

How do you divide a single column in a table by a constant in Tableau?

Sorry if this seems trivial, but I am fairly new to Tableau. I have a simple table that has one dimension for columns and one dimension for rows. My marks are the count of a third dimension. I'd like to divide only one of the columns in the table by a constant, but not all of them. When I have tried conditional statements, I receive the error about mixing non-aggregate and aggregate arguments.
What is the best way to divide a single column's values based upon a condition?
Thanks in advance.
Typically the error regarding non-aggregate and aggregate statements can be resolved using the ATTR() function.
SUM([Sales]) / [Constant]
Turns to:
SUM([Sales]) / ATTR([Constant])
Or conversely, which might or might not fit your data:
[Sales] / [Constant]
You just can't mix the two as in the first example.
Edit
This is probably a more accurate place for the ATTR() function given what I'm guessing is your use case:
IF ATTR([Segment]) = 'Corporate'
THEN COUNT([Sales]) / SUM([Constant])
END
Try turning the constant into a discrete measure and see if that works (right-click on the measure and select 'Discrete').
Also, without seeing the conditional code you are using: you probably need to wrap the entire condition with COUNT() in order to avoid the aggregate/non-aggregate error, like this:
Count(If [MyDimension] = "XX" then [MyOtherDimension] else Null End)
NOT like this:
If [MyDimension] = "XX" then Count([MyOtherDimension]) else Null End

UPDATE SQL Command not saving the results

I looked through the forum but I couldn't find an issue like mine.
Essentially I have a table called [p005_MMAT].[dbo].[Storage_Max]. It has three columns: Date, HistValue and Tag_ID. I want all the values in the HistValue column to have 2 decimal places. For example, if a number is 1.1, I want it to be 1.10, or if it is 1, I want it to look like 1.00.
Here is the SQL update statement I am using:
update [p005_MMAT].[dbo].[Storage_Max]
set [HistValue] = cast([HistValue] as decimal (10,2))
where [Tag_ID] = 94
After executing the query it says 3339 rows affected, but when I perform a simple select statement it appears the column was not affected at all. When I use the same cast function in a select statement, it does show two decimal places.
Please advise.
The problem is the datatype and SQL Server: float and real do not keep trailing zeros. You either have to change the datatype of the column or just deal with it and handle the formatting in your queries or application.
You could run something like the following:
select cast([HistValue] as decimal(10,2))
from [p005_MMAT].[dbo].[Storage_Max]
where [Tag_ID] = 94

Running total using window function in SQL has same result for same data

Every reference I found on how to do a cumulative sum / running total said it is better to use a window function, so I did:
select grandtotal, sum(grandtotal) over (order by agentname) from call
but I realized that the results are only okay as long as the ordering values of the rows are different. Here is the result:
Is there any way to fix this?
You might want to review the documentation on window specifications (which is here). The default frame is "range between", which defines the frame by the values in the rows, so all rows with the same ordering value get the same running total. You want "rows between":
select grandtotal,
sum(grandtotal) over (order by agentname rows between unbounded preceding and current row)
from call;
Alternatively, you could include an id column in the sort to guarantee uniqueness and not have to deal with the issue of equal key values, as sketched below.
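For instance, assuming the table has a unique id column (hypothetical here), the tie-breaking variant would be:
select grandtotal,
       sum(grandtotal) over (order by agentname, id) as running_total
from call;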