Order by custom named rows - postgresql

I’d like to sort my PostgreSQL results by some fancy ranking function, but for the sake of simplicity, let’s say that I’d like to add two custom columns and sort by them.
SELECT my_table.*,
extract(epoch from (age(current_date, '2012-09-12 10:43:40'::date)))/3600 AS age_in_hours,
Fancy_function_counting_distance() AS distance
FROM my_table
ORDER BY distance + age_in_hours;
However, it doesn’t work; I get the error: ERROR: column "distance" does not exist.
Is it possible to order my results by those custom-named columns?
I’m running PostgreSQL 9.1.x.

In PostgreSQL, an alias from the SELECT list can be used in ORDER BY only when it stands alone; it is not visible inside an expression such as distance + age_in_hours.
You can use column-position specification (e.g. ORDER BY 1, 2), but that doesn't accept an expression either; ORDER BY 1+2, for example, does not sort by the sum of the first two columns. So you need to use a subquery to generate the result set, then sort it in an outer query:
SELECT *
FROM (
SELECT my_table.*,
extract(epoch from (age(current_date, '2012-09-12 10:43:40'::date)))/3600 AS age_in_hours,
Fancy_function_counting_distance() AS distance
FROM my_table
) x
ORDER BY distance + age_in_hours;
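The shape of that subquery can be tried out with a minimal sketch in Python. Everything here is an assumption for illustration: it uses the standard-library sqlite3 module as a stand-in for PostgreSQL, and the stored columns distance_km and age_h with simple arithmetic replace Fancy_function_counting_distance() and the date calculation. SQLite may resolve aliases differently from PostgreSQL, so this shows only the structure (name the columns inside, sort outside), not the original error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in table: the real query computes these values instead.
conn.execute("CREATE TABLE my_table (id INTEGER, distance_km REAL, age_h REAL)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 5.0, 10.0), (2, 1.0, 2.0), (3, 4.0, 3.0)],
)

# The inner query names the computed columns; the outer query sorts by an
# expression built from those names.
rows = conn.execute(
    """
    SELECT *
    FROM (
        SELECT id,
               age_h * 1.0       AS age_in_hours,
               distance_km * 2.0 AS distance
        FROM my_table
    ) x
    ORDER BY distance + age_in_hours
    """
).fetchall()

ordered_ids = [row[0] for row in rows]
print(ordered_ids)
```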

Related

Pivot function without manually typing values in `for in`?

Documentation provides an example of using the pivot() function.
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN ('prop', 'rudder', 'wing')
);
I would like to use pivot() without having to manually specify each value of partname. I want all parts. I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname);
That gave an error. Then I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN (select distinct partname from part)
);
That also threw an error.
How can I tell Redshift to include all values of partname in the pivot?
I don't think this can be done in a simple single query. The query compiler would have to work without knowing how many output columns will be produced, and I don't think it can do that.
You can do this in multiple queries: use one query to fetch the list of partnames, then use that list to "generate" a second query with the IN list populated. Something needs to issue the first query and generate the second. This can be code external to Redshift (lots of options) or a stored procedure in Redshift. This code, wherever it lives, should take into account that Redshift has a maximum number of columns: 1,600.
The Redshift docs are fairly good on the topic of dynamic SQL for stored procedures. The EXECUTE statement will be used to fire off the second query in a stored procedure. See: https://docs.aws.amazon.com/redshift/latest/dg/c_PLpgSQL-statements.html
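As a rough sketch of the "external code" option, the Python below builds the second query from a list of part names. The function names build_pivot_sql and quote_literal are hypothetical, the quoting is deliberately simple (it only doubles single quotes), and in practice the names would come from running SELECT DISTINCT partname FROM part through whatever database driver you use.

```python
MAX_COLUMNS = 1600  # column limit mentioned in the answer above


def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_pivot_sql(partnames):
    """Build the second (pivot) query from the names returned by the first."""
    names = sorted(set(partnames))  # de-duplicate and fix the column order
    if not names:
        raise ValueError("no part names to pivot on")
    if len(names) > MAX_COLUMNS:
        raise ValueError(
            f"{len(names)} values exceed the {MAX_COLUMNS}-column limit"
        )
    in_list = ", ".join(quote_literal(name) for name in names)
    return (
        "SELECT *\n"
        "FROM (SELECT partname, price FROM part) PIVOT (\n"
        f"AVG(price) FOR partname IN ({in_list})\n"
        ");"
    )


# Hard-coded here; normally the result of SELECT DISTINCT partname FROM part.
pivot_sql = build_pivot_sql(["wing", "prop", "rudder", "prop"])
print(pivot_sql)
```

The generated text is then sent to Redshift as an ordinary query. A stored procedure would do the same string building and run the result with EXECUTE.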

Pyspark: correlated column is not allowed in predicate

I have a table with three columns: EVENT, TIME, and PRICE. For every event I would like to aggregate over the previous events; for simplicity, assume the aggregate is the mean.
What I would like to do is the following,
SELECT (
SELECT COUNT(*), MEAN(ti.PRICE)
FROM table_1 ti
WHERE ti.EVENT = to.EVENT AND ti.TIME < to.TIME
), EVENT
FROM table_1 to
However, if I run this in a pyspark environment or with pyspark.sql(query), I get the error "correlated column is not allowed in predicate".
How can I either change the query so that it runs without errors, or use native pyspark functions (F.filter....) to achieve the same result?
I have read other Stack Overflow questions, but they did not help.
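One rewrite that is often suggested for this kind of error is to replace the correlated subquery with a self-join plus GROUP BY, so that the inequality on TIME becomes an ordinary join condition. The sketch below is an assumption rather than a confirmed fix: it was written against the standard-library sqlite3 module, not Spark, and the aliases cur and prev and the output names n_previous and mean_previous are made up for illustration. Note that rows sharing the same (EVENT, TIME) pair collapse into one output row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (EVENT TEXT, TIME INTEGER, PRICE REAL)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?)",
    [("a", 1, 10.0), ("a", 2, 20.0), ("a", 3, 60.0), ("b", 1, 5.0)],
)

# Each current row is joined to all earlier rows of the same event;
# LEFT JOIN keeps rows that have no earlier event (count 0, mean NULL).
query = """
SELECT cur.EVENT,
       cur.TIME,
       COUNT(prev.PRICE) AS n_previous,
       AVG(prev.PRICE)   AS mean_previous
FROM table_1 cur
LEFT JOIN table_1 prev
       ON prev.EVENT = cur.EVENT
      AND prev.TIME < cur.TIME
GROUP BY cur.EVENT, cur.TIME
ORDER BY cur.EVENT, cur.TIME
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If the same SQL text is accepted by Spark, it could be passed to the session's sql method unchanged. A window function partitioned by EVENT and ordered by TIME is another option worth trying.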

Cannot create materialized view with ORDER BY clause in TimescaleDb 2.7.0

The Timescale docs seem to suggest that since 2.7.0 it should be possible to create materialized views that include an ORDER BY clause. (See the "timescaledb.finalized" option here and "function support" here.)
However, I have not been able to get this to work for me. When I try to create my materialized view I get:
ERROR: invalid continuous aggregate query
DETAIL: ORDER BY is not supported in queries defining continuous aggregates.
HINT: Use ORDER BY clauses in SELECTS from the continuous aggregate view instead.
Is there something fundamental I'm misunderstanding about how this should work?
Here is the full script:
> select extname, extversion from pg_extension where extname = 'timescaledb';
extname | extversion
-------------+------------
timescaledb | 2.7.0
(1 row)
> CREATE TABLE stocks_real_time (
time TIMESTAMPTZ NOT NULL,
price DOUBLE PRECISION NULL
);
CREATE TABLE
> SELECT create_hypertable('stocks_real_time','time');
create_hypertable
-------------------------------
(7,public,stocks_real_time,t)
(1 row)
> CREATE MATERIALIZED VIEW mat_view_stocks_real_time
WITH (timescaledb.continuous)
AS (
SELECT
time_bucket('60 minutes', time) as bucketed_time,
AVG(price) as price
FROM stocks_real_time
GROUP BY bucketed_time
ORDER BY bucketed_time
);
ERROR: invalid continuous aggregate query
DETAIL: ORDER BY is not supported in queries defining continuous aggregates.
HINT: Use ORDER BY clauses in SELECTS from the continuous aggregate view instead.
I still get the same error if I explicitly add "timescaledb.finalized=true" to the with clause.
(NB: I work at Timescale!)
We have an open issue to support this. I think the confusion comes from the fact that we now support aggregates that contain ORDER BY clauses, meaning things like SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY price) or SELECT array_agg(foo ORDER BY time).
A top-level ORDER BY in the view definition is what the open issue covers. In the meantime you can apply the ORDER BY in the SELECT from the continuous aggregate, i.e. SELECT * FROM mat_view_stocks_real_time ORDER BY bucketed_time, and that should work just fine.

How do I sort partition data when using query binding on an SSAS cube?

I'm trying to implement various sorts as described in this article.
I have a typical Sales Measure Group partitioned by fiscal period. If I try to add an order by clause to the query it fails when processing because SSAS wraps the query into a subquery. Is there a way to prevent this from happening? How do I ensure the sort order in a case like this?
Here is the code that is generated for a partition:
SELECT *
FROM
(
SELECT *
FROM [Sales]
WHERE SaleDate between '1/1/2015' and '1/28/2015'
order by SaleDate
)
AS [Sales]
I replaced the field names with * for clarity.
SELECT TOP 100 PERCENT * FROM Sales ORDER BY SaleDate
That is not guaranteed to work: SQL Server is free to ignore the ORDER BY of a TOP 100 PERCENT subquery. The best way to order it is to ensure the clustered index is on the column you want to order by.

PostgreSQL array_agg order

Table 'animals':
animal_name | animal_type
------------+------------
Tom         | Cat
Jerry       | Mouse
Kermit      | Frog
Query:
SELECT
array_to_string(array_agg(animal_name),';') animal_names,
array_to_string(array_agg(animal_type),';') animal_types
FROM animals;
Expected result:
Tom;Jerry;Kermit, Cat;Mouse;Frog
OR
Tom;Kermit;Jerry, Cat;Frog;Mouse
Can I be sure that the order in the first aggregate function will always be the same as in the second?
I mean, I wouldn't like to get:
Tom;Jerry;Kermit, Frog;Mouse;Cat
Use an ORDER BY, like this example from the manual:
SELECT array_agg(a ORDER BY b DESC) FROM table;
If you are on a PostgreSQL version < 9.0 then:
From: http://www.postgresql.org/docs/8.4/static/functions-aggregate.html
In the current implementation, the order of the input is in principle unspecified. Supplying the input values from a sorted subquery will usually work, however. For example:
SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
So in your case you would write:
SELECT
array_to_string(array_agg(animal_name),';') animal_names,
array_to_string(array_agg(animal_type),';') animal_types
FROM (SELECT animal_name, animal_type FROM animals) AS x;
The input to the array_agg would then be unordered but it would be the same in both columns. And if you like you could add an ORDER BY clause to the subquery.
According to Tom Lane:
... If I read it right, the OP wants to be sure that the two aggregate functions will see the data in the *same* unspecified order. I think that's a pretty safe assumption. The server would have to go way out of its way to do differently, and it doesn't.
... So it is documented behavior that an aggregate without its own ORDER BY will see the rows in whatever order the FROM clause supplies them.
So I think it's fine to assume that all the aggregates, none of which uses ORDER BY, in your query will see input data in the same order. The order itself is unspecified though (which depends on the order the FROM clause supplies rows).
Source: PostgreSQL mailing list
Do this, ordering both aggregates by the same key so that the positions line up:
SELECT
array_to_string(array_agg(animal_name ORDER BY animal_name),';') animal_names,
array_to_string(array_agg(animal_type ORDER BY animal_name),';') animal_types
FROM
animals;
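The positions in the two strings only correspond if both aggregates see the rows in the same order. The plain-Python sketch below involves no database: the animals list simply mirrors the table from the question, and the variable names are made up for illustration. It compares sorting each column by its own values with sorting the rows once by a shared key.

```python
animals = [("Tom", "Cat"), ("Jerry", "Mouse"), ("Kermit", "Frog")]

# Each column sorted by its own values, as ORDER BY animal_name in one
# aggregate and ORDER BY animal_type in the other would do.
independent_names = ";".join(sorted(name for name, _ in animals))
independent_types = ";".join(sorted(kind for _, kind in animals))

# Rows sorted once by a shared key (the name), then both columns read off.
by_name = sorted(animals, key=lambda row: row[0])
aligned_names = ";".join(name for name, _ in by_name)
aligned_types = ";".join(kind for _, kind in by_name)

print(independent_names, independent_types)
print(aligned_names, aligned_types)
```

With independent sort keys Jerry ends up paired with Cat. With the shared key every name keeps its own type.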