postgres `order by` argument type - postgresql

What is the argument type for the order by clause in Postgresql?
I came across a very strange behaviour (using Postgresql 9.5). Namely, the query
select * from unnest(array[1,4,3,2]) as x order by 1;
produces 1,2,3,4 as expected. However the query
select * from unnest(array[1,4,3,2]) as x order by 1::int;
produces 1,4,3,2, which seems strange. Similarly, whenever I replace 1::int with whatever function (e.g. greatest(0,1)) or even case operator, the results are unordered (on the contrary to what I would expect).
So which type should an argument of order by have, and how do I get the expected behaviour?

This is expected (and documented) behaviour:
A sort_expression can also be the column label or number of an output column
So the expression:
order by 1
sorts by the first column of the result set (as defined by the SQL standard)
However the expression:
order by 1::int
sorts by the constant value 1, it's essentially the same as:
order by 'foo'
By using a constant value for the order by all rows have the same sort value and thus aren't really sorted.
To sort by an expression, just use that:
order by
case
when some_column = 'foo' then 1
when some_column = 'bar' then 2
else 3
end
The above sorts the result based on the result of the case expression.

Actually I have a function with an integer argument which indicates the column to be used in the order by clause.
In a case when all columns are of the same type, this can work: :
SELECT ....
ORDER BY
CASE function_to_get_a_column_number()
WHEN 1 THEN column1
WHEN 2 THEN column2
.....
WHEN 1235 THEN column1235
END
If columns are of different types, you can try:
SELECT ....
ORDER BY
CASE function_to_get_a_column_number()
WHEN 1 THEN column1::varchar
WHEN 2 THEN column2::varchar
.....
WHEN 1235 THEN column1235::varchar
END
But these "workarounds" are horrible. You need some other approach than the function returning a column number.
Maybe a dynamic SQL ?

I would say that dynamic SQL (thanks #kordirko and the others for the hints) is the best solution to the problem I originally had in mind:
create temp table my_data (
id serial,
val text
);
insert into my_data(id, val)
values (default, 'a'), (default, 'c'), (default, 'd'), (default, 'b');
create function fetch_my_data(col text)
returns setof my_data as
$f$
begin
return query execute $$
select * from my_data
order by $$|| quote_ident(col);
end
$f$ language plpgsql;
select * from fetch_my_data('val'); -- order by val
select * from fetch_my_data('id'); -- order by id
In the beginning I thought this could be achieved using case expression in the argument of the order by clause - the sort_expression. And here comes the tricky part which confused me: when sort_expression is a kind of identifier (name of a column or a number of a column), the corresponding column is used when ordering the results. But when sort_expression is some value, we actually order the results using that value itself (computed for each row). This is #a_horse_with_no_name's answer rephrased.
So when I queried ... order by 1::int, in a way I have assigned value 1 to each row and then tried to sort an array of ones, which clearly is useless.
There are some workarounds without dynamic queries, but they require writing more code and do not seem to have any significant advantages.

Related

ORDER BY CASE & Ordinal not working

Sorting column #7 as an example -
This code does not sort data at all:
ORDER BY CASE WHEN '1'='2' THEN 5
WHEN '1'='1' THEN 7
ELSE 13 END
If I change it to a hard-coded ordinal it works:
ORDER BY 7
As long as the respective expressions in the SELECT list are of the same type, you can do it by using the expressions themselves instead of the SELECT list number:
SELECT expression1, expression2, ...
...
ORDER BY CASE
WHEN 1=2
THEN expression5
WHEN 1=1
THEN expression7
ELSE expression13
END;
If the data types are not the same, season with type casts.
Your query does not work because only integer literals can be used as column numbers in ORDER BY. In all other cases, an integer just stands for its constant value.
If it were not like this, ORDER BY expressions could easily become ambiguous. Look at the following:
... ORDER BY intcol + 3;
Should that mean “add three” or “add expression number three from the SELECT list”?

Create function to compute an average of 3 values

I'm trying to write a function in postgre sql to take an average across three columns. I have written the following function:
create function xcol_avg (col1, col2, col3)
returns numeric as $$
begin
return (coalesce(col1, 0) + coalesce(col2,0) +coalesce(col3, 0))/
case when (col 1 is null or col1 = 0 then 0 else 1 end +
case when (col 2 is null or col2 = 0 then 0 else 1 end +
case when (col 3 is null or col3 = 0 then 0 else 1 end;
end
What is the problem with my code? Also, is there a way to get the function to return null if it ends up dividing by 0? Any help is really appreciated.
Thanks!
Actually, you can make a function that will use a variable number of arguments and depending on their number compute the average. In Postgres there's a word VARIADIC for such things:
SQL functions can be declared to accept variable numbers of arguments, so long as all the "optional" arguments are of the same data type
Function code:
CREATE FUNCTION xcol_avg(numeric, VARIADIC numeric[])
RETURNS numeric
LANGUAGE plpgsql
IMMUTABLE
AS $$
BEGIN
RETURN (SELECT AVG(vals) FROM unnest($2 || ARRAY[$1]) t(vals));
END;
$$;
Use case with different number of arguments:
select xcol_avg(1,6); -- returns 3.5
select xcol_avg(1,5.5,4); -- returns 3.5
select xcol_avg(1,2,3,4,5,6,7); -- returns 4
Click on this Button to try this online.
Explanation:
Marking a function as IMMUTABLE improves the execution time by allowing the optimizer to pre-evaluate the function. Immutable functions cannot modify the database and are guaranteed to always return the same results when called with the same input.
Declaring the last parameter of a function as VARIADIC which has to be of an array type lets you provide optional arguments that will be passed to the function as an array. Note that you don't explicitly write the array, you just list your parameters as you normally would.
unnest() is a function that returns a set of rows by expanding an array. In other words it's "unpacking" the array elements into separate rows
|| is an array operator that provides the array-to-array concatenation. Here it serves the purpose of connecting the first (required) argument with the rest given in a VARIADIC array.
AVG() is an aggregate function that computes an average of all input values. In our case it would take "unpacked" rows from a column named vals and compute the average.
With this solution you don't need to worry about dividing by zero, as at least one argument is required and avg() is doing the job you wanted to do manually by building up the denominator.
Apply it in a query:
This function would also work for computing an average of multiple columns in a row. Consider a table tbl with columns name, cost1, cost2, cost3 and below statement:
SELECT
name, cost1, cost2, cost3,
xcol_avg(cost1, cost2, cost3) AS average_cost
FROM tbl
For more general information about CREATE FUNCTION check the resourceful documentation.

Prepare dynamic case statement using PostgreSQL 9.3

I have the following case statement to prepare as a dynamic as shown below:
Example:
I have the case statement:
case cola
when cola between '2001-01-01' and '2001-01-05' then 'G1'
when cola between '2001-01-10' and '2001-01-15' then 'G2'
when cola between '2001-01-20' and '2001-01-25' then 'G3'
when cola between '2001-02-01' and '2001-02-05' then 'G4'
when cola between '2001-02-10' and '2001-02-15' then 'G5'
else ''
end
Note: Now I want to create dynamic case statement because of the values dates and name passing as a parameter and it may change.
Declare
dates varchar = '2001-01-01to2001-01-05,2001-01-10to2001-01-15,
2001-01-20to2001-01-25,2001-02-01to2001-02-05,
2001-02-10to2001-02-15';
names varchar = 'G1,G2,G3,G4,G5';
The values in the variables may change as per the requirements, it will be dynamic. So the case statement should be dynamic without using loop.
You may not need any function for this, just join to a mapping data-set:
with cola_map(low, high, value) as (
values(date '2001-01-01', date '2001-01-05', 'G1'),
('2001-01-10', '2001-01-15', 'G2'),
('2001-01-20', '2001-01-25', 'G3'),
('2001-02-01', '2001-02-05', 'G4'),
('2001-02-10', '2001-02-15', 'G5')
-- you can include as many rows, as you want
)
select table_name.*,
coalesce(cola_map.value, '') -- else branch from case expression
from table_name
left join cola_map on table_name.cola between cola_map.low and cola_map.high
If your date ranges could collide, you can use DISTINCT ON or GROUP BY to avoid row duplication.
Note: you can use a simple sub-select too, I used a CTE, because it's more readable.
Edit: passing these data (as a single parameter) can be achieved by passing a multi-dimensional array (or an array of row-values, but that requires you to have a distinct, predefined composite type).
Passing arrays as parameters can depend on the actual client (& driver) you use, but in general, you can use the array's input representation:
-- sql
with cola_map(low, high, value) as (
select d[1]::date, d[2]::date, d[3]
from unnest(?::text[][]) d
)
select table_name.*,
coalesce(cola_map.value, '') -- else branch from case expression
from table_name
left join cola_map on table_name.cola between cola_map.low and cola_map.high
// client pseudo code
query = db.prepare(sql);
query.bind(1, "{{2001-01-10,2001-01-15,G2},{2001-01-20,2001-01-25,G3}}");
query.execute();
Passing each chunk of data separately is also possible with some clients (or with some abstractions), but this is highly depends on your driver/orm/etc. you use.

SQL invalid conversion return null instead of throwing error

I have a table with a varchar column, and I want to find values that match a certain number. So lets say that column contains the following entries (except with millions of rows in real life):
123456789012
2345678
3456
23 45
713?2
00123456789012
So I decide I want all the rows which are numerically 123456789012 write a statement that looks something like this:
SELECT * FROM MyTable WHERE CAST(MyColumn as bigint) = 123456789012
It should return the first and last row, but instead the whole query blows up because it can't convert the "23 45" and "713?2" to bigint.
Is there another way to do the conversion that will return NULL for values that can't convert?
SQL Server does NOT guarantee boolean operator short-circuit, see On SQL Server boolean operator short-circuit. So all solution using ISNUMERIC(...) AND CAST(...) are fundamentally flawed (they may work, but hey can arbitrarily fail later dependiong on the generated plan). A better solution is using CASE, as Thomas suggests: CASE ISNUMERIC(...) WHEN 1 THEN CAST(...) ELSE NULL END. But, as gbn pointed out, ISNUMERIC is notoriously finicky in identifying what 'numeric' means and many cases where one would expect it to return 0 it returns 1. So mixing the CASE with the LIKE:
CASE WHEN MyRow NOT LIKE '%[^0-9]%' THEN CAST(MyRow as bigint) ELSE NULL END
But the real problem is that if you have millions of rows and you have to search them like this, you'll always end up scanning end-to-end since the expression is not SARG-able (no matter how we rewrite it). The real issue here is data purity, and should be addressed at the appropriate level, where the data is populated. Another thing to consider is if is possible to create a persisted computed column with this expression and create a filtered index on it which eliminates NULL (ie. non-numeric). That would speed up things a little.
If you are using SQL Server 2012 you can use the 2 new methods:
TRY_CAST()
TRY_CONVERT()
Both methods are equivalent. They return a value cast to the specified data type if the cast succeeds; otherwise, returns null. The only difference is that CONVERT is SQL Server specific, CAST is ANSI. using CAST will make your code more portable (although not sure if any other database provider implements TRY_CAST)
ISNUMERIC will accept empty string and values like 1.23 or 5E-04 so could be unreliable.
And you don't know what order things will be evaluated in so it could still fail (SQL is declarative, not procedural, so the WHERE clause probably won't be evaluated left to right)
So:
you want to accept value that consist only of the characters 0-9
you need to materialise the "number" filter so it's applied before CAST
Something like:
SELECT
*
FROM
(
SELECT TOP 2000000000 *
FROM MyTable
WHERE MyColumn NOT LIKE '%[^0-9]%' --double negative rejects anything except 0-9
ORDER BY MyColumn
) foo
WHERE
CAST(MyColumn as bigint) = 123456789012 --applied after number check
Edit: quick example that fails.
CREATE TABLE #foo (bigintstring varchar(100))
INSERT #foo (bigintstring )VALUES ('1.23')
INSERT #foo (bigintstring )VALUES ('1 23')
INSERT #foo (bigintstring )VALUES ('123')
SELECT * FROM #foo
WHERE
ISNUMERIC(bigintstring) = 1
AND
CAST(bigintstring AS bigint) = 123
SELECT *
FROM MyTable
WHERE ISNUMERIC(MyRow) = 1
AND CAST(MyRow as float) = 123456789012
The ISNUMERIC() function should give you what you need.
SELECT * FROM MyTable
WHERE ISNUMERIC(MyRow) = 1
AND CAST(MyRow as bigint) = 123456789012
And to add a case statement like Thomas suggested:
SELECT * FROM MyTable
WHERE CASE(ISNUMERIC(MyRow)
WHEN 1 THEN CAST(MyRow as bigint)
ELSE NULL
END = 123456789012
http://msdn.microsoft.com/en-us/library/ms186272.aspx
SELECT *
FROM MyTable
WHERE (ISNUMERIC(MyColumn) = 1) AND (CAST(MyColumn as bigint) = 123456789012)
Additionally you can use a CASE statement in order to get null values.
SELECT
CASE
WHEN (ISNUMERIC(MyColumn) = 1) THEN CAST(MyColumn as bigint)
ELSE NULL
END AS 'MyColumnAsBigInt'
FROM tableName
If you require additional filtering, for numerics which are not valid to be cast to bigint, you can use the following instead of ISNUMERIC:
PATINDEX('%[^0-9]%',MyColumn)) = 0
If you need decimal values instead of integers, cast to float instead and change the regex to '%[^0-9.]%'

Postgres query error

I have a query in postgres
insert into c_d (select * from cd where ak = '22019763');
And I get the following error
ERROR: column "region" is of type integer but expression is of type character varying
HINT: You will need to rewrite or cast the expression.
An INSERT INTO table1 SELECT * FROM table2 depends entirely on order of the columns, which is part of the table definition. It will line each column of table1 up with the column of table2 with the same order value, regardless of names.
The problem you have here is whatever column from cd with the same order value as c_d of the table "region" has an incompatible type, and an implicit typecast is not available to clear the confusion.
INSERT INTO SELECT * statements are stylistically bad form unless the two tables are defined, and will forever be defined, exactly the same way. All it takes is for a single extra column to get added to cd, and you'll start getting errors about extraneous extra columns.
If it is at all possible, what I would suggest is explicitly calling out the columns within the SELECT statement. You can call a function to change type within each of the column references (or you could define a new type cast to do this implicitly -- see CREATE CAST), and you can use AS to set the column label to match that of your target column.
If you can't do this for some reason, indicate that in your question.
Check out the PostgreSQL insert documentation. The syntax is:
INSERT INTO table [ ( column [, ...] ) ]
{ DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) | query }
which here would look something like:
INSERT INTO c_d (column1, column2...) select * from cd where ak = '22019763'
This is the syntax you want to use when inserting values from one table to another where the column types and order are not exactly the same.