UNION types text and bigint cannot be matched - postgresql

I'm running a complex stored procedure and I'm getting an error when I have 3 unions, but with 2 unions no error. If I remove either of the top two unions it runs fine. If I make one of the NULLs a 0, it runs fine. The error is "UNION types text and bigint cannot be matched"
```lang-sql
SELECT NULL AS total_time_spent
FROM tbl1
GROUP BY student_id
UNION ALL
SELECT NULL AS total_time_spent
FROM tbl2
GROUP BY student_id
UNION ALL
SELECT sum(cast(("value" ->> 'seconds') AS integer)) AS total_time_spent
FROM tbl3
GROUP BY student_id
```
I've tried all kinds of casting on the sum result or the sum input. The json that I'm pulling from is either NULL, [] or something like this:
[{"date": "2020-09-17", "seconds": 458}]

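According to the SQL standard, the NULL value exists in every data type, but lacking an explicit type cast, the first subquery resolves the data type to text (earlier versions of PostgreSQL would have used unknown here, but we don't want this data type in query results).
The error message is then a consequence of the type resolution rules for UNION in PostgreSQL: a chain of UNIONs is resolved pairwise from left to right, so the two untyped NULLs are combined first and become text, and that text column then cannot be matched with the bigint returned by sum(). With only two branches, the untyped NULL is paired directly with the bigint and takes that type, which is why the shorter queries run.
Use an explicit type cast to avoid the problem: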
SELECT CAST(NULL AS bigint) FROM ...
UNION ...
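To see the pairwise resolution in isolation, here is a minimal sketch that should run in a recent PostgreSQL version without any tables. The VALUES list and the alias t(x) are made up for illustration and stand in for the asker's tbl3; the commented-out statement has the failing shape.

```sql
-- Fails: the first two branches are resolved together and become text,
-- which then cannot be matched with the bigint of the third branch.
-- SELECT NULL AS total_time_spent
-- UNION ALL
-- SELECT NULL
-- UNION ALL
-- SELECT sum(x) FROM (VALUES (1), (2)) AS t(x);

-- Works: the cast in the first branch fixes the column type as bigint,
-- and the untyped NULL in the second branch adopts it.
SELECT CAST(NULL AS bigint) AS total_time_spent
UNION ALL
SELECT NULL
UNION ALL
SELECT sum(x) FROM (VALUES (1), (2)) AS t(x);
```

Only the first branch needs the cast here, because each later branch is matched against the type that has already been established.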

Related

How can I insert union tables to table in PostgreSQL?

I have this query, which inserts rows into a MySQL database and works perfectly:
insert into test(id,user)
select null,user from table2
union
select null,user from table3
But when I run the above query in PostgreSQL, it does not work, and I get this error: column "id" is of type integer but expression is of type text. The individual queries below, however, all work.
This query works properly in PostgreSQL:
insert into test(id,user)
select null,user from table2
So does this one:
insert into test(id,user)
select null,user from table3
And so does this one:
select null,user from table2
union
select null,user from table3
null is not a real value and thus has no data type. The default assumed data type is text, that's where the error message comes from. Just cast the value to int in the first SELECT:
insert into test(id, "user")
select null::int, "user" from table2
union
select null, "user" from table3
Or even better, leave out the id completely so that any default defined for the id column is used. It sounds strange to try and insert null into a column named id
insert into test("user")
select "user" from table2
union
select "user" from table3
Note that user is a reserved keyword and a built-in function, so you will have to quote it to avoid problems. In the long run I recommend finding a different name for that column.
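The second suggestion can be tried end to end with a small sketch like the following. The temporary tables, the serial default on id, and the sample names are assumptions for illustration only; the question does not show the real table definitions.

```sql
-- Hypothetical stand-ins for the tables in the question.
CREATE TEMP TABLE table2 ("user" text);
CREATE TEMP TABLE table3 ("user" text);
CREATE TEMP TABLE test (id serial PRIMARY KEY, "user" text);

INSERT INTO table2 VALUES ('alice');
INSERT INTO table3 VALUES ('bob');

-- id is left out, so its default (here the serial sequence) fills it in.
INSERT INTO test ("user")
SELECT "user" FROM table2
UNION
SELECT "user" FROM table3;

SELECT id, "user" FROM test ORDER BY id;
```

Because no NULL literal appears in the UNION any more, there is nothing left for PostgreSQL to resolve to text.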

Athena - Union tables with incompatible data types

We have two tables with a column that differs in its data type. The column in the first table is of type int, while the same column in the second table is of type float/real. If it were a top-level column I could have CAST it to a common type; the problem here is that these columns are deep inside a struct.
The error I'm getting is:
SYNTAX_ERROR: line 23:1: column 4 in row(priceconfiguration row(maximumvalue integer, minimumvalue integer, type varchar, value integer)) query has incompatible types: Union, row(priceconfiguration row(maximumvalue integer, minimumvalue integer, type varchar, value real))
The query (simplified) is,
WITH t1 AS (
SELECT
"so"."createdon"
, "so"."modifiedon"
, "so"."deletedon"
, "so"."createdby"
, "so"."priceconfiguration"
, "so"."year"
, "so"."month"
, "so"."day"
FROM
my_db.raw_price so
UNION ALL
SELECT
"ao"."createdon"
, "ao"."modifiedon"
, "ao"."deletedon"
, "ao"."createdby"
, "ao"."priceconfiguration"
, "ao"."year"
, "ao"."month"
, "ao"."day"
FROM
my_db.src_price ao
)
SELECT t1.* FROM t1 ORDER BY "modifiedon" DESC
In fact, the real table is more complex than this, and the column priceconfiguration is nested deep inside the tables. So CASTing the column in question directly is not possible unless all the structs are un-nested to CAST the offending column.
Is there a way to UNION these two tables without unnesting and casting?
The solution was to upgrade the Athena Engine Version to v2.
V2 Engine has more support for schema evolution. As per the AWS doc,
Schema evolution support has been added for data in Parquet format.
Added support for reading array, map, or row type columns from
partitions where the partition schema is different from the table
schema. This can occur when the table schema was updated after the
partition was created. The changed column types must be compatible.
For row types, trailing fields may be added or dropped, but the
corresponding fields (by ordinal) must have the same name.
Ref:
https://docs.aws.amazon.com/athena/latest/ug/engine-versions-reference.html

Is a subquery able to select columns from outer query? [duplicate]

I have the following select:
SELECT DISTINCT pl
FROM [dbo].[VendorPriceList] h
WHERE PartNumber IN (SELECT DISTINCT PartNumber
FROM [dbo].InvoiceData
WHERE amount > 10
AND invoiceDate > DATEADD(yyyy, -1, CURRENT_TIMESTAMP)
UNION
SELECT DISTINCT PartNumber
FROM [dbo].VendorDeals)
The issue here is that the table [dbo].VendorDeals has NO PartNumber column; however, no error is detected and the query works with the first part of the union.
What's more, IntelliSense also accepts and recognizes PartNumber. This fails only when inside a complex statement.
It is pretty obvious that if you qualify column names, the mistake will be evident.
This isn't a bug in SQL Server/the T-SQL dialect parsing, no, this is working exactly as intended. The problem, or bug, is in your T-SQL; specifically because you haven't qualified your columns. As I don't have the definition of your table, I'm going to provide sample DDL first:
CREATE TABLE dbo.Table1 (MyColumn varchar(10), OtherColumn int);
CREATE TABLE dbo.Table2 (YourColumn varchar(10), OtherColumn int);
And then an example that is similar to your query:
SELECT MyColumn
FROM dbo.Table1
WHERE MyColumn IN (SELECT MyColumn FROM dbo.Table2);
This, firstly, will parse; it is a valid query. Secondly, provided that dbo.Table2 contains at least one row, then every row from table dbo.Table1 will be returned where MyColumn has a non-NULL value. Why? Well, let's qualify the column with table's name as SQL Server would parse them:
SELECT Table1.MyColumn
FROM dbo.Table1
WHERE Table1.MyColumn IN (SELECT Table1.MyColumn FROM dbo.Table2);
Notice that the column inside the IN is also referencing Table1, not Table2. By default, if a column has its alias omitted in a subquery, it will be assumed to be referencing the table(s) defined in that subquery. If, however, none of the tables in the subquery have a column by that name, then it will be assumed to reference a table in the outer query where that column does exist; in this case Table1.
Let's, instead, take a different example, using the other column in the tables:
SELECT OtherColumn
FROM dbo.Table1
WHERE OtherColumn IN (SELECT OtherColumn FROM dbo.Table2);
This would be parsed as the following:
SELECT Table1.OtherColumn
FROM dbo.Table1
WHERE Table1.OtherColumn IN (SELECT Table2.OtherColumn FROM dbo.Table2);
This is because OtherColumn exists in both tables. As, in the subquery, OtherColumn isn't qualified it is assumed the column wanted is the one in the table defined in the same scope, Table2.
So what is the solution? Alias and qualify your columns:
SELECT T1.MyColumn
FROM dbo.Table1 T1
WHERE T1.MyColumn IN (SELECT T2.MyColumn FROM dbo.Table2 T2);
This will, unsurprisingly, error as Table2 has no column MyColumn.
Personally, I suggest that unless you have only one table being referenced in a query, you alias and qualify all your columns. This not only ensures that the wrong column can't be referenced (such as in a subquery) but also means that other readers know exactly what columns are being referenced. It also stops failures in the future. I have honestly lost count of how many times over the years I have had a process fall over due to the "ambiguous column" error, because a table's definition was changed and a query referencing the table wasn't properly qualified by the developer...
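The behaviour described above can be reproduced with a short T-SQL sketch. It uses temporary tables (#Table1, #Table2) in place of the dbo tables and a few invented rows, purely so that it can be run and thrown away.

```sql
CREATE TABLE #Table1 (MyColumn varchar(10), OtherColumn int);
CREATE TABLE #Table2 (YourColumn varchar(10), OtherColumn int);

INSERT INTO #Table1 VALUES ('a', 1), ('b', 2), (NULL, 3);
INSERT INTO #Table2 VALUES ('z', 9);

-- Unqualified: MyColumn inside the subquery binds to #Table1, so every
-- row of #Table1 with a non-NULL MyColumn comes back ('a' and 'b').
SELECT MyColumn
FROM #Table1
WHERE MyColumn IN (SELECT MyColumn FROM #Table2);

-- Qualified: uncommenting this raises an invalid column name error,
-- because #Table2 has no MyColumn.
-- SELECT T1.MyColumn
-- FROM #Table1 T1
-- WHERE T1.MyColumn IN (SELECT T2.MyColumn FROM #Table2 T2);

DROP TABLE #Table1;
DROP TABLE #Table2;
```

The failing statement is left commented out so that the rest of the batch can run; removing the comment markers shows the error that qualification surfaces.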

"CAST" function with "DISTINCT ON" not changing the type of the field

I have two tables, parent and child. I need to combine these two tables and get the results as one.
The pid column in the parent table may have duplicate entries, and its type is VARCHAR.
But the type of cid in the child table is INTEGER.
As I need distinct values, I used DISTINCT ON in the parent table query. When I take the union with the child table,
the query throws an error because the field types differ (pid and cid).
I used "DISTINCT ON (CAST(pid AS INTEGER))" to make the types the same for both tables.
But the type of pid does not change, and the error still appears.
When I use "DISTINCT CAST(pid AS INTEGER)" instead of "DISTINCT ON", no error occurs, but the result (number of rows) is not correct.
The query I used:
Select DISTINCT ON (pid) pid AS id,
first_name AS first_name,
last_name AS last_name,
email AS email
from parent where pid IS NOT NULL
UNION
Select cid AS id,
child_first_name AS first_name,
child_last_name AS last_name,
child_email AS email
from child where cid IS NOT NULL
Does anyone have an idea of how to use the CAST function with DISTINCT ON?
DISTINCT ON (CAST(pid AS INTEGER)) pid AS id
This will cast the pid value for the DISTINCT calculation, not for the result.
Assuming you don't need to cast the value in order to do a DISTINCT on it, you should do something like:
SELECT DISTINCT ON (pid) pid::INTEGER AS id,
...
UNION
SELECT cid,
...
i.e., cast it when it's being selected, rather than in the DISTINCT calculation. If you do need to cast it in there as well, then you simply have to cast it in both places.
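Filled out against the query from the question, that advice might look like the sketch below (PostgreSQL). The sample rows are invented, and the parentheses with ORDER BY are an addition that is not in the original answer: they only make explicit which of the duplicate pid rows is kept. It also assumes every non-NULL pid is a valid integer string; otherwise the cast itself will fail.

```sql
-- Hypothetical sample data; column names follow the question.
CREATE TEMP TABLE parent (pid varchar, first_name text, last_name text, email text);
CREATE TEMP TABLE child  (cid integer, child_first_name text, child_last_name text, child_email text);

INSERT INTO parent VALUES
    ('1', 'Ann', 'Lee', 'ann@example.com'),
    ('1', 'Ann', 'Lee', 'ann.lee@example.com'),  -- duplicate pid
    ('2', 'Bob', 'Ray', 'bob@example.com');
INSERT INTO child VALUES
    (10, 'Cy', 'Lee', 'cy@example.com');

(SELECT DISTINCT ON (pid)
        pid::integer AS id,   -- cast in the select list, so the output column is integer
        first_name,
        last_name,
        email
 FROM parent
 WHERE pid IS NOT NULL
 ORDER BY pid, email)         -- decides which row per pid survives
UNION
SELECT cid, child_first_name, child_last_name, child_email
FROM child
WHERE cid IS NOT NULL;
```

The DISTINCT ON expression still works on the original varchar value, while the UNION sees an integer column on both sides.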

nested SELECT statements interact in ways that I don't understand

I thought I understood how I can do a SELECT from the results of another SELECT statement, but there seems to be some sort of blurring of scope that I don't understand. I am using SQL Server 2008R2.
It is easiest to explain with an example.
Create a table with a single nvarchar column - load the table with a single text value and a couple of numbers:
CREATE TABLE #temptable( a nvarchar(30) );
INSERT INTO #temptable( a )
VALUES('apple');
INSERT INTO #temptable( a )
VALUES(1);
INSERT INTO #temptable( a )
VALUES(2);
select * from #temptable;
This will return: apple, 1, 2
Use IsNumeric to get only the rows of the table that can be cast to numeric - this will leave the text value apple behind. This works fine.
select cast(a as int) as NumA
from #temptable
where IsNumeric(a) = 1 ;
This returns: 1, 2
However, if I use that exact same query as an inner select, and try to do a numeric WHERE clause, it fails saying cannot convert nvarchar value 'apple' to data type int. How has it got the value 'apple' back??
select
x.NumA
from
(
select cast(a as int) as NumA
from #temptable
where IsNumeric(a) = 1
) x
where x.NumA > 1
;
Note that the failing query works just fine without the WHERE clause:
select
x.NumA
from
(
select cast(a as int) as NumA
from #temptable
where IsNumeric(a) = 1
) x
;
I find this very surprising. What am I not getting? TIA
If you take a look at the estimated execution plan you'll find that it has optimized the inner query into the outer and combined the WHERE clauses.
Using a CTE to isolate the operations works (in SQL Server 2008 R2):
declare @temptable as table ( a nvarchar(30) );
INSERT INTO @temptable( a )
VALUES ('apple'), ('1'), ('2');
with Numbers as (
select cast(a as int) as NumA
from @temptable
where IsNumeric(a) = 1
)
select * from Numbers
The reason you are getting this is fairly simple. When a query is executed, it goes through a series of steps: parse, algebrize, optimize and compile.
The algebrize step in this case binds all the objects you need for this query. The optimize step uses these objects to create the best query plan, which is then compiled and executed.
So, when you look at the plan you will see that it does a table scan on #temptable, and #temptable is defined the way you created it. That you do some computation on it is a different matter: the column still has the nvarchar data type.
To understand how this works you have to know how a query is read. First all the objects are retrieved (FROM table, INNER JOIN table), then the predicates (WHERE, ON), then the grouping and such, then the selection of the columns (with the cast), and then the ORDER BY.
So with that in mind, when you have a combination of selects, the optimizer will still process it that way. Since your SELECT is subordinate to the FROM and JOIN parts of your query, the cast can be evaluated against rows you expected to be filtered out, which is the reason for this error.
I hope that makes it a little clearer.
The optimizer is free to move expressions in the query plan in order to produce the most cost-efficient plan for retrieving the data (the evaluation order of the predicates is not guaranteed). I think using a CASE expression like the one below produces a NULL in the absence of an ELSE clause and thus takes 'apple' out:
select a from #temptable where case when isnumeric(a) = 1 then a end > 1
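Applied to the nested query from the question, the same idea could be sketched as follows. This is one possible arrangement rather than the only fix: the CASE wraps the cast itself, so the conversion is only attempted for rows that pass the ISNUMERIC check, wherever the optimizer places the outer predicate. Note that ISNUMERIC also accepts some strings that cannot be cast to int (currency symbols, for example), so the sketch assumes the data holds only plain integers and ordinary text.

```sql
CREATE TABLE #temptable (a nvarchar(30));
INSERT INTO #temptable (a) VALUES (N'apple'), (N'1'), (N'2');

SELECT x.NumA
FROM (
    -- Non-numeric rows yield NULL instead of raising a conversion error.
    SELECT CASE WHEN ISNUMERIC(a) = 1 THEN CAST(a AS int) END AS NumA
    FROM #temptable
) AS x
WHERE x.NumA > 1;

DROP TABLE #temptable;
```

The NULL produced for 'apple' fails the comparison x.NumA > 1, so that row drops out without any conversion being attempted on it.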